Jensen's Inequality as an Intuition Tool (2021)

(blog.moontower.ai)

95 points | by sebg 124 days ago

7 comments

  • raziel2701 121 days ago
    I have no idea what Jensen's inequality means.

    Goes to wikipedia

    "In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906,[1] building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.[2] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations"

    I still have no idea what it means.

    • programjames 121 days ago
      A convex function is open to the top, like this:

        \      /
         \    /
          \  /
           \/
      
      If you connect any two points on the curve, the connecting line segment lies on or above the curve. That is basically the intuition for Jensen's inequality: partway between the two points the segment is above the curve, so the weighted average of the curve's values at those two points is at least as big as the curve's value at their weighted average.
    • kxyvr 121 days ago
      A convex function is a function that is bowl shaped, such as a parabola, `x^2`. If you take two points on the curve and connect them with a straight line, then Jensen's inequality tells you that the function lies on or below this straight line. Basically, `f(cx+(1-c)y) <= c f(x) + (1-c) f(y)` for `0<=c<=1`. The expression `cx+(1-c)y` provides a way to move between a point `x` and a point `y`. The expression on the left of the inequality is the evaluation of the function along this line. The expression on the right is the straight line connecting the two points.

      There are a bunch of generalizations of this. It works for any convex combination of points. A convex combination of points is a weighted sum of points where the weights are nonnegative and add to 1. If one is careful, eventually this can become an infinite convex combination of points, which means that the inequality holds with integrals.
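
      A minimal sketch of the two-point and finite convex-combination cases described above, using `f(x) = x^2` as the convex function. The helper name `convex_combination` and the specific points and weights are made up for illustration; exact fractions are used so the comparison is not blurred by floating-point rounding.

```python
from fractions import Fraction


def f(x):
    """A convex function: the parabola x^2."""
    return x * x


def convex_combination(weights, values):
    """Weighted sum of values; weights must be nonnegative and sum to 1."""
    assert all(w >= 0 for w in weights) and sum(weights) == 1
    return sum(w * v for w, v in zip(weights, values))


# Two-point case: f(c*x + (1-c)*y) <= c*f(x) + (1-c)*f(y)
x, y, c = Fraction(-1), Fraction(3), Fraction(1, 4)
lhs_two = f(c * x + (1 - c) * y)       # the function evaluated between x and y
rhs_two = c * f(x) + (1 - c) * f(y)    # the chord's height at the same position

# Three-point case: the same inequality for a longer convex combination
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
points = [Fraction(0), Fraction(3), Fraction(6)]
lhs_three = f(convex_combination(weights, points))
rhs_three = convex_combination(weights, [f(p) for p in points])

print(lhs_two, "<=", rhs_two)
print(lhs_three, "<=", rhs_three)
```

      In both cases the left side (function of the weighted average) comes out no larger than the right side (weighted average of the function values), which is the finite form of the inequality.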

      In my opinion, the wiki article is not well written.

    • Ey7NFZ3P0nzAe 121 days ago
      I can recommend this awesome 6-minute YouTube video by the channel Mutual Information: https://www.youtube.com/watch?v=u0_X2hX6DWE
    • belter 121 days ago
      An algorithm for fair distribution of GPUs :-)
  • lupire 121 days ago
    Jensen's inequality says that the average value of a function is biased toward the value where the derivative is closer to 0. That's where moving away from a sample point has the least impact on the output.

    It applies in scenarios that are "convex", which means that the derivative is monotonically increasing or decreasing, so "closer to 0" is a consistent direction.

    • blackeyeblitzar 121 days ago
      Isn’t this just saying things seek a local minimum (or maximum)?
  • fn-mote 121 days ago
    I would have spent longer on this article but it did not define Jensen’s inequality before I got frustrated.

    I skimmed several sections looking for it. :(

    • daveguy 121 days ago
      There is a problem with the organization. At least one of the explanatory paragraphs in "2" of the table of contents should come before "why I find it interesting".

      Definitely recommend reading a little of this first:

      https://en.m.wikipedia.org/wiki/Jensen%27s_inequality

      (Could just provide a link to it near the beginning of the article for reference.)

      • kristopolous 121 days ago
        Although I usually rail against Wikipedia math articles being gleefully esoteric jargon describing concepts in the most aggressively arcane way possible as if it's a competitive sport, the convex function article is fine. People have clearly done work on it.

        (People might say, well isn't it obvious what convex means. My response is no, it's dealer's choice)

    • bilater 121 days ago
      I had the exact same reaction and was about to rant here but I looked up the wikipedia article and to be fair to the author there is no simple easy way to explain this. The article actually does a decent job. :)
      • taneq 121 days ago
        "If a line's getting steeper then its average height up until now is less than its average height will be later."
    • iamcreasy 121 days ago
      The article does it about halfway through:

      ...the inequality the way I learned it²:

      E[f(x)] ≥ f(E[x])

      …if f(x) is convex
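
      A small check of that statement, as a sketch only: the fair six-sided die and `f(x) = x^2` are my own choice of example, not something taken from the article. Exact fractions keep the two sides comparable without rounding.

```python
from fractions import Fraction


def expectation(outcomes, probs, g=lambda v: v):
    """E[g(X)] for a discrete random variable X."""
    return sum(p * g(v) for v, p in zip(outcomes, probs))


def f(x):
    """A convex function."""
    return x * x


# Hypothetical example distribution: a fair six-sided die
outcomes = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

mean_of_f = expectation(outcomes, probs, f)    # E[f(X)]
f_of_mean = f(expectation(outcomes, probs))    # f(E[X])
gap = mean_of_f - f_of_mean                    # the "Jensen gap"

print("E[f(X)] =", mean_of_f)
print("f(E[X]) =", f_of_mean)
print("gap     =", gap)
```

      For `f(x) = x^2` specifically, the gap `E[X^2] - (E[X])^2` is the variance of X, which is one way to see why it can never be negative.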

  • vermarish 121 days ago
    This is really cool! I've seen Jensen's inequality used many times over in my stats/ML classes, but the traffic example here gave me an "aha" moment about how it manifests.

    I like the visualizations of the expected value against the individual probabilistic components as well, though I wish there were more non-uniform distributions visualized. Perhaps if we take the traffic example and tweak the distribution to be non-uniform, that might make for a cool interactive viz.

  • lordofmoria 121 days ago
    This was interesting, especially with the DCF example at the end. It’s pertinent to business sell decisions (assuming your ownership structure allows you to make a decision): should I sell at an 8x multiple of revenue, or hold at an X% growth rate and Y% cash flow? What’s my net after 10 years?

    The point of Jensen’s inequality if I understand correctly is that you’d underestimate the value of holding using a basic estimate approach, because you’ll underestimate the compounding cash flow from growth?
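
    A rough sketch of that effect, under loudly hypothetical assumptions: the growth rate is either 0% or 20% with equal probability, it is fixed once for the whole 10 years, and the starting value is 1. None of these numbers come from the article. Because `(1 + g) ** years` is convex in `g`, valuing at the average growth rate gives less than the average of the outcomes.

```python
YEARS = 10  # hypothetical holding period


def compounded_value(growth_rate, years=YEARS):
    """Value of 1 unit after compounding at a constant annual growth rate."""
    return (1.0 + growth_rate) ** years


# Hypothetical scenarios: (growth rate, probability)
scenarios = [(0.00, 0.5), (0.20, 0.5)]

expected_rate = sum(rate * prob for rate, prob in scenarios)

# "Basic estimate": plug the average growth rate into the formula
value_at_expected_rate = compounded_value(expected_rate)

# Scenario-weighted estimate: average the outcomes themselves
expected_value = sum(compounded_value(rate) * prob for rate, prob in scenarios)

print(f"value at average rate: {value_at_expected_rate:.4f}")
print(f"average of values:     {expected_value:.4f}")
```

    With these made-up inputs the plug-in estimate comes out lower than the scenario-weighted one, which is the underestimate being asked about.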

    • pfortuny 121 days ago
      It depends on whether future returns increase. One tends to draw the optimistic version of Jensen’s inequality (and in general, of convex curves), but it also applies to decreasing functions.
  • vladimirralev 121 days ago
    Looks like a reasonable intuition guide with a couple of caveats. This intuition only works for Gaussian or maybe a few other distributions, not for a general p(x). Also, the EV rows in the tables are not actual EV quantities; only the total is an EV, which can confuse somebody. Overall, I think the point could have been carried better with a few nice charts showing the p(x), f(p(x)) and the E.
    • seanhunter 121 days ago
      > This intuition only works for Gaussian or maybe a few other distributions, not for a general p(x)

      I don't think that's true. It's that if X is a random variable and φ is a convex function, then

          φ(E[X]) ≤ E[φ(X)].
      
      It's not necessary for X to be Gaussian, only that φ is convex.

      An intuitive way of thinking about it: if φ is convex then it is cup-shaped. So if I sample two points x_1 and x_2 from X and draw a line from (x_1, φ(x_1)) to (x_2, φ(x_2)), that line will lie on or above the curve everywhere between x_1 and x_2, right? Jensen's inequality just generalises that to all the points from X: the expectation of φ(X) sits at or above φ(E[X]). E[X] sits somewhere in the middle of X, so φ(E[X]) is down in the middle of the cup and is no larger than (φ(x_1)+φ(x_2)+...+φ(x_n))/n, which is E[φ(X)] for n equally likely samples.
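
      To illustrate that nothing here depends on X being Gaussian, here is a sketch with a lopsided, two-humped discrete distribution. The outcomes, probabilities and the two convex functions are invented for the example.

```python
from fractions import Fraction

# Hypothetical bimodal distribution: mass near 0 and near 10, little in between
outcomes = [0, 1, 9, 10]
probs = [Fraction(3, 10), Fraction(2, 10), Fraction(1, 10), Fraction(4, 10)]
assert sum(probs) == 1


def expectation(g):
    """E[g(X)] under the discrete distribution above."""
    return sum(p * g(x) for x, p in zip(outcomes, probs))


# Two convex functions, one smooth and one with a kink
convex_functions = {
    "square": lambda x: x * x,
    "distance from 5": lambda x: abs(x - 5),
}

mean = expectation(lambda x: x)  # E[X]

# For each phi, store the pair (phi(E[X]), E[phi(X)])
results = {
    name: (phi(mean), expectation(phi))
    for name, phi in convex_functions.items()
}

for name, (phi_of_mean, mean_of_phi) in results.items():
    print(f"{name}: phi(E[X]) = {phi_of_mean} <= E[phi(X)] = {mean_of_phi}")
```

      Note that E[X] lands in the gap between the two humps, where X itself never takes a value, and the inequality still holds for both functions.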

      • vladimirralev 121 days ago
        That's what I mean. The Jensen inequality applies to any distribution, but the intuition presented in the post is only good for simple distributions and all examples are gaussian/binomial-like. It would be difficult to raise the same points with something multimodal or arbitrary.