6 comments

  • byyoung3 2 days ago
    It’s not new and only superior in a very narrow set of categories.
    • heyitsguay 2 days ago
      As a computer vision guy, I'm sad JEPA didn't end up more effective. It makes perfect sense conceptually and would have transferred easily to video, but other self-supervised methods just seem to beat it!
      • turnersr 2 days ago
        Yeah! JEPA seems awesome. Do you mind sharing what other self-supervised methods work better than JEPA?
  • blixt 2 days ago
    Needs a (2023) tag. But the release of ARC2 and image outputs from 4o definitely got me thinking about the JEPA family too.

    I don't know if it's right (and I'm sure JEPA has plenty of performance issues), but it seems good to have a fully latent-space representation, ideally shared across all modalities, so that turning the concept "an apple a day keeps the doctor away" into image/audio/text is a choice of decoder, rather than dedicated token ranges being chosen before the model's actual creation process even begins.
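    A minimal sketch of that "decoder is a choice" idea, with one shared latent and the output modality picked only at decode time. Everything here is hypothetical; the names, sizes, and pooling are assumptions, not anything from the JEPA papers.

      import torch
      import torch.nn as nn

      LATENT_DIM = 512  # hypothetical shared latent size

      class TextEncoder(nn.Module):
          """Maps token ids to a single modality-agnostic latent vector."""
          def __init__(self, vocab_size=32000):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, LATENT_DIM)
              self.proj = nn.Linear(LATENT_DIM, LATENT_DIM)

          def forward(self, token_ids):
              return self.proj(self.embed(token_ids).mean(dim=1))  # mean-pool tokens

      class ImageDecoder(nn.Module):
          """Renders a latent as a small RGB image."""
          def __init__(self):
              super().__init__()
              self.net = nn.Linear(LATENT_DIM, 3 * 64 * 64)

          def forward(self, z):
              return self.net(z).view(-1, 3, 64, 64)

      class TextDecoder(nn.Module):
          """Renders a latent as token logits."""
          def __init__(self, vocab_size=32000, max_len=32):
              super().__init__()
              self.vocab_size, self.max_len = vocab_size, max_len
              self.net = nn.Linear(LATENT_DIM, vocab_size * max_len)

          def forward(self, z):
              return self.net(z).view(-1, self.max_len, self.vocab_size)

      encoder = TextEncoder()
      decoders = {"image": ImageDecoder(), "text": TextDecoder()}

      tokens = torch.randint(0, 32000, (1, 16))  # stand-in for "an apple a day..."
      z = encoder(tokens)                        # one latent, no modality committed yet
      image = decoders["image"](z)               # choosing the modality = choosing the decoder
      text_logits = decoders["text"](z)

    The point is just that z carries the concept and the modality commitment is deferred to whichever decoder you run.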

  • niemandhier 2 days ago
    GPTs are in the “exploit” phase of the “explore-exploit” trade-off.

    JEPA is still in the explore phase, it’s good to read the paper and have an understanding of the architecture to gain an alternative perspective.

  • laughingcurve 2 days ago
    Not new, not notable right now, not sure why it's getting upvoted (just kidding, it's because people see YLC and upvote based on names)
    • MoonGhost 2 days ago
      Even average papers can have a nice overview of the problem and references.
    • Grimblewald 2 days ago
      I don't care about names; I just thought it was an interesting read.
  • justanotheratom 2 days ago
    JEPA is presumably superior to Transformers. Can any expert enlighten us on the implications of this paper?
    • spmurrayzzz 2 days ago
      Transformers are usually part of JEPA architectures. In I-JEPA's case, a ViT is used as the context encoder.
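      To make that concrete, here is a rough I-JEPA-shaped training step. It is only a sketch: the toy ViT, the mean-pooled predictor, and all sizes are placeholder assumptions, not the paper's code. A context encoder sees the visible patches, an EMA target encoder sees the full image, and a predictor regresses the latent features of the held-out patches.

        import copy
        import torch
        import torch.nn as nn

        DIM, N_PATCHES = 256, 196  # e.g. 14x14 patches of a 224x224 image

        class ToyViT(nn.Module):
            """Stand-in for a ViT backbone: per-patch embedding + transformer blocks."""
            def __init__(self):
                super().__init__()
                self.embed = nn.Linear(768, DIM)  # 16x16x3 flattened patches
                layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
                self.blocks = nn.TransformerEncoder(layer, num_layers=2)

            def forward(self, patches):
                return self.blocks(self.embed(patches))

        context_encoder = ToyViT()
        target_encoder = copy.deepcopy(context_encoder)  # updated by EMA, not by gradients
        for p in target_encoder.parameters():
            p.requires_grad_(False)

        # Real I-JEPA uses a narrow ViT predictor with positional mask tokens;
        # a pooled MLP keeps this sketch short.
        predictor = nn.Sequential(nn.Linear(DIM, DIM), nn.GELU(), nn.Linear(DIM, DIM))
        opt = torch.optim.AdamW(
            list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-4
        )

        patches = torch.randn(8, N_PATCHES, 768)      # batch of patchified images
        target_idx = torch.randperm(N_PATCHES)[:40]   # patches whose latents we predict
        context_idx = torch.tensor(
            [i for i in range(N_PATCHES) if i not in set(target_idx.tolist())]
        )

        with torch.no_grad():
            targets = target_encoder(patches)[:, target_idx]   # latent targets, no pixels

        context = context_encoder(patches[:, context_idx])     # encode visible context
        preds = predictor(context.mean(dim=1, keepdim=True)).expand_as(targets)

        opt.zero_grad()
        loss = nn.functional.mse_loss(preds, targets)          # loss lives in latent space
        loss.backward()
        opt.step()

        with torch.no_grad():                                  # EMA update of target encoder
            for pt, pc in zip(target_encoder.parameters(), context_encoder.parameters()):
                pt.mul_(0.996).add_(pc, alpha=0.004)

      In other words, the ViT is the encoder inside the JEPA setup; JEPA is the training objective wrapped around it, not a replacement for transformers.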