I'm really glad that these HNet-inspired approaches are getting traction, I'm a big fan of that paper.
Though I wonder how much of the gains in this case actually come from the 75% extra parameters compared to the baseline, even if the inference FLOPs are matched.
I can't help but see this as just a different twist on the parameter-sparsity idea leveraged by MoE models, which also gain performance at constant forward-pass FLOPs because of extra parameters.
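Rough back-of-the-envelope arithmetic for what I mean by "more parameters at constant forward-pass FLOPs" (the layer sizes and expert counts below are made up for illustration, not taken from the paper):

```python
# Hypothetical sizes: dense FFN vs. MoE FFN with top-k routing.
d_model, d_ff = 4096, 16384      # assumed hidden sizes
n_experts, top_k = 8, 1          # assumed MoE config: 8 experts, 1 active per token

# Dense FFN: two weight matrices, both used for every token.
dense_params = 2 * d_model * d_ff
dense_flops_per_token = 2 * dense_params          # ~2 FLOPs per weight (mul + add)

# MoE FFN: n_experts copies of the FFN weights, but only top_k run per token.
moe_params = n_experts * 2 * d_model * d_ff
moe_flops_per_token = top_k * 2 * (2 * d_model * d_ff)

print(f"dense: {dense_params/1e6:.0f}M params, {dense_flops_per_token/1e9:.1f} GFLOPs/token")
print(f"moe:   {moe_params/1e6:.0f}M params, {moe_flops_per_token/1e9:.1f} GFLOPs/token")
# -> 8x the parameters at the same per-token FLOPs (ignoring the small router cost).
```

The extra capacity lives in the parameters, not in the per-token compute, which is why FLOP-matched comparisons can still hide a big difference in model size.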
Would this enable a model to learn concepts in one language and generate answers about them in another, as long as it learns general translations between the two?
My educated guess:
Not more than any other LLM.
The text-latent encoder and latent-text decoder just find a more efficient representation of the tokens; it's more of a compression than a conversion of words and sentences into abstract concepts.
There will be residuals of the input language in there.