"In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906,[1] building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.[2] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations"
If you connect any two points on a convex curve with a straight line, that line lies on or above the curve. That is basically the intuition for Jensen's inequality: if you go partway between two points, the line is above the curve, so the weighted average of the curve's values at those two points is at least as big as the curve's value at their weighted average.
A convex function is a function that is bowl shaped, such as a parabola, `x^2`. If you take two points on its graph and connect them with a straight line, then Jensen's inequality tells you that the function lies on or below this straight line between the two points. Basically, `f(cx+(1-c)y) <= c f(x) + (1-c) f(y)` for `0<=c<=1`. The expression `cx+(1-c)y` provides a way to move between a point `x` and a point `y`. The expression on the left of the inequality is the evaluation of the function along this path. The expression on the right is the straight line connecting the two points on the graph.
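A minimal sketch of that two-point inequality for the parabola, in case numbers help. The endpoints `-1` and `3` and the helper name `chord_gap` are arbitrary choices for illustration, not anything from the article.

```python
def f(x):
    """The convex example from the comment above: a parabola."""
    return x * x


def chord_gap(f, x, y, c):
    """Height of the chord minus height of the curve at the point c*x + (1-c)*y.

    Non-negative for every 0 <= c <= 1 when f is convex.
    """
    return c * f(x) + (1 - c) * f(y) - f(c * x + (1 - c) * y)


# Arbitrary illustrative endpoints.
x, y = -1.0, 3.0

# Walk c from 0 to 1 in steps of 0.1 and record the gap at each step.
gaps = [chord_gap(f, x, y, k / 10) for k in range(11)]

for k, gap in enumerate(gaps):
    print(f"c={k / 10:.1f}  chord - curve = {gap:.3f}")
```

The gap is zero at the two endpoints and largest in the middle, which is the "line sits above the bowl" picture.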
There are a bunch of generalizations to this. It works for any convex combination of points. A convex combination of points is a weighted sum of points where the weights are non-negative and add to 1. If one is careful, eventually this can become an infinite convex combination of points, which means that the inequality holds with integrals.
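The finite convex-combination version can be checked numerically as well. This is only a sketch: the 50 random points, the fixed seed, and the helper name `jensen_gap` are assumptions made for the example.

```python
import math
import random


def jensen_gap(f, points, weights):
    """Return sum(w_i * f(x_i)) - f(sum(w_i * x_i)) for a convex combination.

    The weights must be non-negative and sum to 1.
    """
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-9
    mean_point = sum(w * x for w, x in zip(weights, points))
    mean_value = sum(w * f(x) for w, x in zip(weights, points))
    return mean_value - f(mean_point)


rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)

# 50 arbitrary points and 50 arbitrary non-negative weights normalised to sum to 1.
points = [rng.uniform(-2.0, 2.0) for _ in range(50)]
raw = [rng.random() for _ in range(50)]
total = sum(raw)
weights = [r / total for r in raw]

gap_exp = jensen_gap(math.exp, points, weights)            # exp is convex
gap_square = jensen_gap(lambda x: x * x, points, weights)  # x^2 is convex

print(f"exp gap: {gap_exp:.4f}, square gap: {gap_square:.4f}")
```

Both gaps come out non-negative; letting the number of points grow is the informal route to the integral form mentioned above.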
In my opinion, the wiki article is not well written.
Jensen's inequality says that the average value of a function is biased toward the value where the derivative is closer to 0. That's where moving away from a sample point has the least impact on the output.
It applies in scenarios that are "convex", which means that the derivative is monotonic (non-decreasing for a convex function, non-increasing for a concave one), so "closer to 0" is a consistent direction.
There is a problem with the organization. At least one of the explanatory paragraphs in "2" of the table of contents should come before "why I find it interesting".
Definitely recommend reading a little of this first:
Although I usually rail against Wikipedia math articles being gleefully esoteric jargon describing concepts in the most aggressively arcane way possible as if it's a competitive sport, the convex function article is fine. People have clearly done work on it.
(People might say, well isn't it obvious what convex means. My response is no, it's dealer's choice)
I had the exact same reaction and was about to rant here, but I looked up the Wikipedia article and, to be fair to the author, there is no simple, easy way to explain this. The article actually does a decent job. :)
This is really cool! I've seen Jensen's inequality used many times over in my stats/ML classes, but the traffic example here gave me an "aha" moment about how it manifests.
I like the visualizations of the expected value against the individual probabilistic components as well, though I wish there were more non-uniform distributions visualized. Perhaps if we take the traffic example and tweak the distribution to be non-uniform, that might make for a cool interactive viz.
This was interesting, especially with the DCF example at the end. It's pertinent to business sell decisions (assuming your ownership structure allows you to make a decision): should I sell at an 8x multiple of revenue, or hold at an X% growth rate and Y% cash flow? What's my net after 10 years?
The point of Jensen's inequality, if I understand correctly, is that you'd underestimate the value of holding using a basic estimate approach, because you'll underestimate the compounding cash flow from growth?
It depends on whether future returns increase. One tends to draw the optimistic version of Jensen’s inequality (and in general, of convex curves), but it also applies to decreasing functions.
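One way to make both points concrete is a toy calculation. Everything here is hypothetical: the three equally likely rates, the 10-year horizon, and the function names are invented for illustration and are not taken from the article's DCF example.

```python
def terminal_multiple(growth, years=10):
    """Value multiple after compounding at a constant annual growth rate.

    Convex and increasing in `growth` for growth > -1.
    """
    return (1.0 + growth) ** years


def discount_factor(rate, years=10):
    """Present value of 1 unit received after `years`.

    Convex and decreasing in `rate` for rate > -1.
    """
    return 1.0 / (1.0 + rate) ** years


# Hypothetical, equally likely scenarios (not from the article).
rates = [0.00, 0.10, 0.20]
mean_rate = sum(rates) / len(rates)

# "Basic estimate": plug the average rate into the formula.
naive_growth = terminal_multiple(mean_rate)
naive_discount = discount_factor(mean_rate)

# Expected value: average the formula over the scenarios.
expected_growth = sum(terminal_multiple(r) for r in rates) / len(rates)
expected_discount = sum(discount_factor(r) for r in rates) / len(rates)

print(f"growth:   f(E[g]) = {naive_growth:.3f}   E[f(g)] = {expected_growth:.3f}")
print(f"discount: f(E[r]) = {naive_discount:.3f}   E[f(r)] = {expected_discount:.3f}")
```

In both cases plugging in the average rate gives the smaller number, even though one function is increasing and the other decreasing. What matters is the convexity, not the direction of the curve.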
Looks like a reasonable intuition guide, with a couple of caveats. This intuition only works for Gaussian or maybe a few other distributions, not for a general p(x). Also, the EV rows in the tables are not actual EV quantities; only the total is an EV, which can confuse somebody. Overall, I think the point could have been carried better with a few nice charts showing the p(x), f(p(x)) and the E.
> This intuition only works for Gaussian or maybe a few other distributions, not for a general p(x)
I don't think that's true. It's that if X is a random variable and φ is a convex function, then
φ(E[X]) ≤ E[φ(X)].
It's not necessary for X to be Gaussian, only that φ is convex.
An intuitive way of thinking about it is that if φ is convex then it is cup-shaped. So if I sample two points x_1 and x_2 from X and draw a line from (x_1, φ(x_1)) to (x_2, φ(x_2)), then that line will clearly lie above the curve between x_1 and x_2 in the cup, right? Jensen's inequality just generalises that to say: what if I take all the points from X? Then the expectation of φ(X) is going to sit above φ(E[X]). E[X] is just going to sit somewhere in the middle of X, so φ(E[X]) is going to be down in the middle of the cup, and so is going to be no larger than (φ(x_1)+φ(x_2)+...+φ(x_n))/n, which is the sample version of E[φ(X)].
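As a rough check that only the convexity of φ matters, here is a sketch with a deliberately lopsided, two-humped X. The mixture, the choice of φ, the seed, and the sample size are all assumptions made up for this example.

```python
import math
import random


def sample_bimodal(rng):
    """Draw from a hypothetical two-component mixture, chosen to be non-Gaussian."""
    if rng.random() < 0.3:
        return rng.gauss(-2.0, 0.5)      # narrow hump on the left
    return rng.expovariate(1.0) + 1.0    # skewed hump on the right


def phi(x):
    """A convex function picked for illustration."""
    return math.exp(0.25 * x)


rng = random.Random(42)  # fixed seed so the run is repeatable (arbitrary choice)
samples = [sample_bimodal(rng) for _ in range(100_000)]

mean_x = sum(samples) / len(samples)
lhs = phi(mean_x)                                   # estimate of phi(E[X])
rhs = sum(phi(x) for x in samples) / len(samples)   # estimate of E[phi(X)]

print(f"phi(E[X]) ~ {lhs:.4f}   E[phi(X)] ~ {rhs:.4f}")
```

The inequality holds for the sample averages themselves, since a finite sample is just an equally weighted convex combination of its points.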
That's what I mean. Jensen's inequality applies to any distribution, but the intuition presented in the post is only good for simple distributions, and all the examples are Gaussian/binomial-like. It would be difficult to raise the same points with something multimodal or arbitrary.
Goes to wikipedia
I still have no idea what it means.
I skimmed several sections looking for it. :(
Definitely recommend reading a little of this first:
https://en.m.wikipedia.org/wiki/Jensen%27s_inequality
(Could just provide a link to it near the beginning of the article for reference.)
...the inequality the way I learned it²:
E[f(x)] ≥ f(E[x])
…if f(x) is convex