I’d like to know what the advantage is over KL divergence. It seems like the important idea is symmetry? Not clear to me why that matters; I’d love to know what application this is used for.
IIRC (and I could be wrong, this is from memory), JS divergence is what gets minimized in GANs, at least for some training methods. In a GAN we simultaneously train a generator and a real/synthetic classifier, each trying to beat the other, so that training converges on real-looking synthetic data.
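For reference, here is the definition plus the GAN connection as I remember it from the original GAN paper (Goodfellow et al., 2014). The result assumes an optimal discriminator, so treat it as a statement about the idealized objective rather than about what happens in practice:

```latex
\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q)

% Generator objective under the optimal discriminator (original minimax GAN):
C(G) = -\log 4 + 2\,\mathrm{JSD}(p_{\text{data}} \,\|\, p_g)
```

So in that idealized setting, minimizing the generator loss is the same as minimizing the JS divergence between the data distribution and the generator's distribution.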
I don’t think GANs are used much now compared to diffusion models, but as recently as a few years ago they were the standard way to make fake data, a la “this face does not exist”.
I was just reading about JSD the other day after reading about KL divergence. It seems like a nifty measurement device for things like sim-to-real evaluations in robotics (the reason I was going down this rabbit hole).
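I think the appeal over raw KL is that JSD behaves a bit nicer when the simulated and real distributions don't perfectly overlap: KL goes to infinity as soon as one distribution puts mass where the other has none, while JSD stays finite and bounded. And imperfect overlap is basically always the case in the real world!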
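A quick sketch of the overlap point, assuming discrete histograms over shared bins. The names `kl`, `jsd`, `sim`, and `real` are made up for illustration, and the numbers are toy values, not real robot data:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) in nats for discrete distributions on the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # bins where p is zero contribute nothing
    with np.errstate(divide="ignore"):
        # Where q is zero but p is not, the ratio is inf and so is the sum.
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jsd(p, q):
    """Jensen-Shannon divergence in nats: average KL to the mixture m."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy "simulated" and "real" histograms with completely disjoint support.
sim = [0.5, 0.5, 0.0, 0.0]
real = [0.0, 0.0, 0.5, 0.5]

print(kl(sim, real))   # infinite: real has no mass where sim does
print(jsd(sim, real))  # finite, equal to ln 2, the maximum JSD in nats
print(jsd(real, sim))  # same value, since JSD is symmetric
```

The mixture `m` is nonzero wherever either input is nonzero, which is why the two KL terms inside `jsd` can never blow up. That also gives you a score with a fixed range (0 to ln 2 in nats), which is handy when comparing different sim-to-real gaps.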