4 comments

  • a_wild_dandan 11 days ago
    Ah, I remember reading this paper! Essentially, they created synthetic data by solving search problems using A*. Trained on this data, transformers unsurprisingly learned to solve these problems. They then improved their synthetic data by repeatedly solving a given problem many times with A*, and keeping only the shortest step solution. Transformers learned to be competitive with this improved search heuristic too!
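
    Roughly, the kind of training example you'd generate looks like the sketch below: run A* on a task, log every time a node enters or leaves the open set, and append the final plan. (A toy Python sketch, not the paper's exact tokenization; as I understand it, the shorter-trace dataset then comes from re-running a nondeterministic A* with randomized tie-breaking and keeping the shortest trace per problem.)

      import heapq

      def astar_trace(grid, start, goal):
          """Run A* on a set of walkable (row, col) cells and log the search
          dynamics as a token sequence. Assumes the goal is reachable; no
          closed-set dedup, so a node may be 'closed' more than once."""
          def h(p):  # Manhattan-distance heuristic
              return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

          tokens = []
          frontier = [(h(start), start)]          # (f-value, node) min-heap
          came_from, g = {start: None}, {start: 0}
          while frontier:
              _, cur = heapq.heappop(frontier)
              tokens += ["close", str(cur)]       # node removed from the open set
              if cur == goal:
                  break
              for d in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                  nxt = (cur[0] + d[0], cur[1] + d[1])
                  if nxt in grid and g[cur] + 1 < g.get(nxt, float("inf")):
                      g[nxt] = g[cur] + 1
                      came_from[nxt] = cur
                      heapq.heappush(frontier, (g[nxt] + h(nxt), nxt))
                      tokens += ["create", str(nxt)]  # node added to the open set

          plan, p = [], goal                      # walk back from goal to start
          while p is not None:
              plan.append(p)
              p = came_from[p]
          return tokens + ["plan"] + [str(q) for q in reversed(plan)]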

    Pretty remarkable stuff. Given the obsession with chatbots, folks often miss how revolutionary transformer sequence modeling is for... well, any sequence application that isn't a chatbot. Looking solely at speeding up scientific simulations by ~10x, it's a watershed moment for humanity. When you include the vast space of other applications, we're in for one wild ride, y'all.

    • electricships 11 days ago
      After AlphaZero, people tried to train various combinatorial solvers

      IIRC, they were unable to exceed the SotA of SAT solvers (which, admittedly, is the result of decades of research).

      maybe we just needed bigger networks, or transformers, or just more training compute

      • lou1306 11 days ago
        SAT appears to be rather impervious to statistical methods because it is very non-linear and non-local (besides, of course, being NP-complete). Flip one single literal anywhere and poof!, your formula might go from SAT to UNSAT, or vice versa, or you might enable a ton of simplifications, etc. So it might be very hard to train a network on such a riotous state space.

        Additionally, in practice SAT solvers are also supposed to provide witnesses (i.e., a satisfying assignment or an unsatisfiability core) to back their verdicts, and that may be an even taller order for ML.
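
        A toy illustration of that fragility, with a brute-force checker that also returns the witness (real solvers are CDCL-based; this is just to make the point concrete):

          from itertools import product

          def solve(cnf, n_vars):
              """Brute-force SAT; returns a satisfying assignment (witness) or None.
              Literals are ints: k means x_k, -k means NOT x_k."""
              for bits in product([False, True], repeat=n_vars):
                  assign = {i + 1: b for i, b in enumerate(bits)}
                  if all(any(assign[abs(l)] == (l > 0) for l in clause) for clause in cnf):
                      return assign
              return None

          f1 = [[1], [-1, 2], [-2]]  # (x1) AND (NOT x1 OR x2) AND (NOT x2) -> UNSAT
          f2 = [[1], [-1, 2], [2]]   # flip the single literal NOT x2 to x2 -> SAT
          print(solve(f1, 2))        # None
          print(solve(f2, 2))        # {1: True, 2: True}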

        • math_dandy 10 days ago
          Since all of these combinatorial optimization problems are, in full generality, as difficult as possible in the complexity-theoretic sense, there is little prospect of AI/ML helping to solve these problems in general. (?)

          But people hope that some subclass of "real-life" instances of one of these optimization problems has enough structure that AI/ML may be able to learn useful heuristics for it. (This is not my field. This is just my take-away from reading some AI/ML-for-CO papers.)

      • davidguetta 11 days ago
        Mm, there's one purely based on position analysis and not tree search that plays at GM level in chess. From DeepMind.
      • radomir_cernoch 11 days ago
        I'd love to read more. Do you know any sources?
      • reindeergmz 10 days ago
        Seems kind of clear to me we're driving towards as little statefulness as possible: compressed symbolic logic that's able to compute as many states as possible. Less resource use on RAM and storage as available chip cycles continue to grow.

        Human biology is a min/max engine, evolved over the years for "enough" food, shelter… and we see daily that there is no static state to reality. Our social behavior of trying to preserve state is some cognitively dissonant "War is Peace" insanity.

        Math is a compressed form of communication. “Add two apples to the set of all apples” is literally more glyphs than the same statement in usual math syntax. Paper and written literacy was hard to come by back in the day, but verbal literacy was cheap and ubiquitous.

        The economy is built on physical statistical measures. Such a framework for keeping TP and food on the shelves is the minimal state needed to keep people from rioting. The economy has nothing to do with historical narratives and political memes at this point.

        We could look further into computing in other substrates, like mycelia: biochemically shaping structure to represent certain states.

        Computing is a concept not coupled to contemporary, ignorant business machines. We made a social monolith around computing by propping up big tech.

    • unraveller 11 days ago
      What are those other applications? So now that everyone has SotA warehouse scheduling in their pockets, no one can be expected to do anything or go anywhere inefficiently.
    • jadbox 11 days ago
      Remarkable indeed. I've been toying with many smaller, unorthodox use cases for LLMs and continue to be surprised.
      • smcin 11 days ago
        I want to see an optimal (2P) speedrun with Sonic the Hedgehog and Tails, as a fun proof. There are two players working together, and it has more degrees of freedom than a Sokoban puzzle.
      • jasonjmcghee 11 days ago
        Can you say more? Where can I read about some of these alternate use cases?
        • ebri 11 days ago
          yes I want to learn about this too! Hadn't thought about it like that.
      • littlestymaar 10 days ago
        It uses transformers, but it's not an LLM though.
  • yeldarb 11 days ago
    > While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard A∗ search. Searchformer is an encoder-decoder Transformer model trained to predict the search dynamics of A∗. This model is then fine-tuned via expert iterations to perform fewer search steps than A∗ search while still generating an optimal plan. In our training method, A∗'s search dynamics are expressed as a token sequence outlining when task states are added and removed into the search tree during symbolic planning. In our ablation studies on maze navigation, we find that Searchformer significantly outperforms baselines that predict the optimal plan directly with a 5-10× smaller model size and a 10× smaller training dataset. We also demonstrate how Searchformer scales to larger and more complex decision making tasks like Sokoban with improved percentage of solved tasks and shortened search dynamics.

    Neat; TIL about Sokoban puzzles. I remember playing Chip's Challenge on Windows 3.1 when I was a kid which had a lot of levels like that.

    • Modified3019 11 days ago
      It's definitely a fun little puzzle.

      I was introduced to it ages ago via "Wyvern" (https://web.archive.org/web/20040225005408/http://www.caboch...) where I'd basically sit around playing it and chatting.

      Wyvern was heavily influenced by Crossfire (https://crossfire.real-time.com/), which had a not-so-functional version. Crossfire itself was influenced by NetHack, which apparently also has Sokoban levels.

    • passion__desire 11 days ago
      Jonathan Blow is working on a "Sokoban"-inspired puzzle game. I am looking forward to its release.
    • smcin 11 days ago
      Sokoban puzzles have a discrete set of actions, typically 8 (move U/D/L/R; drag U/D/L/R), on a discrete grid.
      • airstrike 11 days ago
        more like push than drag tbf
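
        For concreteness, a minimal sketch of the standard push-only move rule (4 directions, positions as (row, col) tuples; illustrative, not taken from the paper):

          def sokoban_move(walls, boxes, player, d):
              """One move in direction d = (dr, dc), push-only semantics.
              walls/boxes are sets of cells; returns (boxes, player),
              unchanged if the move is illegal."""
              nxt = (player[0] + d[0], player[1] + d[1])
              if nxt in walls:
                  return boxes, player                 # blocked by a wall
              if nxt in boxes:
                  beyond = (nxt[0] + d[0], nxt[1] + d[1])
                  if beyond in walls or beyond in boxes:
                      return boxes, player             # box can't be pushed
                  boxes = (boxes - {nxt}) | {beyond}   # push the box one cell
              return boxes, nxt                        # player steps forward

          # solved when every box sits on a goal cell: boxes == goals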
  • nextaccountic 11 days ago
    Due to the no free lunch theorem [0], any search algorithm that makes some problems faster will necessarily make other problems slower. What does the worst case for an algorithm like this look like?

    I think that part of the appeal of A* to me is that I can readily visualize why the algorithm failed at some pathological inputs.

    [0] https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_op...

    • kevindamm 11 days ago
      Well, it's still using A* and state exploration to validate the plans generated. But it's generating plans nondeterministically, so it is possible it could run in O(∞).

      Why didn't they give more details about the Sokoban puzzles being solved? The standard benchmark is the 90 puzzles in XSokoban, and there's a Large Test Suite with a few thousand more, though many of those haven't been solved by any search or planning system. Even the 90 puzzles in XSokoban were only solved a few years ago (2020?) by Festival (using FESS, a feature-space search in addition to the domain/state space). Before that it had been about 20 years of only 57 or so levels solved via search.

      I see that they measure trace lengths, but I would have really liked to see the number of XSokoban levels solved and how many states were examined, to line up with existing research.

    • blamestross 11 days ago
      Well, you have to run the AI, which is significantly more costly than just running A*.

      Also, the AI has a finite size, which means it has to be scaled up for bigger graphs. I doubt it would work at all for non-Euclidean 2D graphs.

      The exciting thing is that they made an AI do a toy problem, not that it was efficient.

      But people are dumb and tech companies have brain control over them. "AI good and solve all problems"

      It reminds me of the "slime mold solves mazes" thing. It was interesting and there were even things to learn from it, but it wasn't magic and couldn't give us better general tools.

    • reaperman 11 days ago
      The "No free lunch" theorem does have some prequisites for it to actually apply.

      > There is no free lunch in search if and only if the distribution on objective functions is invariant under permutation of the space of candidate solutions.
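
      The original form (Wolpert & Macready, paraphrasing from memory): for any two algorithms a_1 and a_2, performance summed over all possible objective functions f is identical,

        \sum_f P(d_m^y \mid f, m, a_1) = \sum_f P(d_m^y \mid f, m, a_2)

      where d_m^y is the sequence of m cost values the algorithm has observed so far. The permutation-invariance condition quoted above is, IIRC, the later sharpened statement of when this actually bites.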

    • IanCal 11 days ago
      NFL theorem is frankly silly.

      It boils down to "you can't predict random numbers".

      I'm sure it's useful to have it formalized, but it has no real impact on real-world situations because most problems we care about have some kind of structure we can exploit.

      • nextaccountic 11 days ago
        Well, when I asked "What does the worst case for an algorithm like this look like?" I fully expected the answer to be "in practice this never happens" (that is, there are bad cases, but since we aren't drawing inputs at random, in practice we never hit them).

        But how likely is that?

      • woopsn 10 days ago
        Yes, significantly. But take, for example, the stock market, which is obviously of interest (to OAI in particular) but which cannot be extrapolated accurately from historical data over the medium to long term.
  • teleforce 11 days ago
    Previous post and discussions on HN:

    Beyond A*: Better Planning with Transformers:

    https://news.ycombinator.com/item?id=39479478