LegoGPT: Generating Physically Stable and Buildable Lego

(avalovelace1.github.io)

416 points | by nkko 9 hours ago

29 comments

haberman 7 hours ago
> To improve the stability of the resulting designs, we employ an efficient validity check and physics-aware rollback during autoregressive inference, which prunes infeasible token predictions using physics laws and assembly constraints.
I'm far from an AI expert, but I've long felt that this is one of the most interesting ways to use AI: to generate and optimize possibilities within a set of domain-specific constraints that are programmed manually.
For example, imagine an AI that is designed to optimize traffic light patterns. You want a hard constraint that no intersection gives a combination of green lights that could cause collisions. But within that set of constraints, which you could manually specify, the AI could go wild trying whatever ideas it can come up with.
At that point, the interesting work is deciding how to design the problem space and the set of constraints. In this case it's a set of lego bricks and how they can be built (and be stable).
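A minimal sketch of the validity-check-and-rollback idea quoted above, in Python. It is a toy illustration, not the LegoGPT implementation: `Brick`, `is_valid`, `generate`, and `toy_model` are hypothetical names, the bricks sit on a 1-D strip, and the only "physics" is a hand-written support rule.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Brick = Tuple[int, int]  # (x position, level); every brick is 2 studs wide (assumption)


def is_valid(structure: Sequence[Brick], brick: Brick) -> bool:
    """Hard constraints: no collision on the same level, and support from below."""
    x, level = brick
    if x < 0 or level < 0:
        return False
    if any(bl == level and abs(bx - x) < 2 for bx, bl in structure):
        return False  # overlaps a brick that is already on this level
    if level == 0:
        return True  # the ground always supports a brick
    # Supported only if it shares at least one stud with a brick one level down.
    return any(bl == level - 1 and abs(bx - x) < 2 for bx, bl in structure)


def generate(propose: Callable[[Sequence[Brick]], List[Brick]],
             n_bricks: int, max_steps: int = 1000) -> List[Brick]:
    """Accept the model's first valid proposal; roll back one step on a dead end."""
    structure: List[Brick] = []
    banned: List[Set[Brick]] = [set()]  # banned[d] = choices already rejected at depth d
    for _ in range(max_steps):
        if len(structure) == n_bricks:
            return structure
        depth = len(structure)
        choice: Optional[Brick] = next(
            (b for b in propose(structure)
             if b not in banned[depth] and is_valid(structure, b)),
            None,
        )
        if choice is None:
            if not structure:
                raise ValueError("no valid structure exists for these proposals")
            banned.pop()                       # forget bans below the undone step
            banned[-1].add(structure.pop())    # undo the last brick and ban it
            continue
        structure.append(choice)
        banned.append(set())
    raise RuntimeError("gave up after max_steps")


def toy_model(structure: Sequence[Brick]) -> List[Brick]:
    """Stand-in for a language model: a fixed preference order, not a real network."""
    if not structure:
        return [(5, 0), (0, 0)]      # prefers a base brick far to the right...
    top = len(structure)
    return [(0, top), (1, top)]      # ...then insists on stacking near x = 0


tower = generate(toy_model, 3)  # [(0, 0), (0, 1), (0, 2)]
```

Here the stand-in model first picks a base brick at x = 5, finds that nothing it wants to stack there is supported, rolls back, and settles on a tower at x = 0. The constraints are written by hand; the model only ranks candidates inside them.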
[-]
- benterix 7 hours ago
  > to generate and optimize possibilities within a set of domain-specific constraints
  Well, yes, we've been doing this for several decades, many people call it metaheuristics. There is a wide array of algorithms in there. An excellent and light intro can be found here: https://cs.gmu.edu/~sean/book/metaheuristics/
  [-]
  - eurekin 6 hours ago
    Metaheuristics? I always thought it's similar to "I don't know how many neurons to put in the hidden layer... and I also don't know how many hidden layers I need, so, let's make it a part of the optimisation problem to find out on its own".
    [-]
    - PeterStuer 5 hours ago
      That is usually called Hyperparameter tuning.
    - benterix 4 hours ago
      As for hyperparameter tuning, the existing solutions such as Optuna or Katib (in Kubeflow) also use metaheuristics, e.g. CMA-ES.
  - mzl 3 hours ago
    Or more generally the whole field of combinatorial optimization, of which metaheuristics is a (small) part.
  - jllyhill 5 hours ago
    Thanks, by some strange coincidence this is exactly the book I have right now. In the introduction the author says, "I think these notes would best serve as a complement to a textbook". Do you happen to know any good textbooks on that topic?
- lolinder 1 hour ago
  A simple version of this that already shines with existing LLMs is JSON Schema mode. You can go quite a long way towards making illegal states unrepresentable and then turn a model loose in the constrained sandbox, with the guarantee that anything it produces will be at least valid if not correct: it's basically type safety for LLM output.
  The same mechanism that underlies JSON Schema support can be applied to any sort of validation and correction, and yeah, I'd love to see more of this kind of thing!
- lgiordano_notte 1 hour ago
  Agree with this. Constraining generation with physics, legality, or even tooling limits turns the model into a search-and-validate engine instead of a word predictor. Closer to program synthesis.
  The real value is upstream: defining a problem space so well that the model is boxed into generating something usable.
- zelos 6 hours ago
  You'd probably use some kind of MILP or CLP based model for that kind of thing, wouldn't you? The constraints define the search space and the solver algorithm then explores it.
- Narew 7 hours ago
  I haven't read how they apply the constraint. But there is similar stuff when you force an LLM to generate structured output like JSON. llama.cpp lets you constrain output to a custom grammar, for example.
- bob1029 2 hours ago
  Error feedback seems to be the one thing that can unlock some of the original promises.
  For example, if you give a text-to-SQL bot access to the same idea (e.g., error feedback from the SQL provider), it is much more likely to succeed in generating valuable queries.
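  A minimal sketch of that error-feedback loop, assuming SQLite as the SQL provider. `query_with_feedback` and `fake_model` are hypothetical names; `fake_model` is a hard-coded stand-in for an LLM call, and the table is made up for illustration.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple


def query_with_feedback(conn: sqlite3.Connection,
                        ask_model: Callable[[str, Optional[str]], str],
                        question: str,
                        max_attempts: int = 3) -> List[Tuple]:
    """Run the model's SQL; on failure, pass the database error back and retry."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = ask_model(question, error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # The database's own message is the feedback signal for the next attempt.
            error = f"{sql!r} failed: {exc}"
    raise RuntimeError(f"no working query after {max_attempts} attempts ({error})")


def fake_model(question: str, error: Optional[str]) -> str:
    """Stand-in for an LLM: gets the column name wrong until it sees an error."""
    if error is None:
        return "SELECT name FROM bricks WHERE colour = 'red'"
    return "SELECT name FROM bricks WHERE color = 'red'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bricks (name TEXT, color TEXT)")
conn.executemany("INSERT INTO bricks VALUES (?, ?)",
                 [("2x4", "red"), ("2x2", "blue")])

rows = query_with_feedback(conn, fake_model, "Which bricks are red?")
```

  The first query fails on the misspelled column, the error text goes back to the model, and the second attempt succeeds.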
- KurSix 4 hours ago
  Totally agree, this is where AI shines the most for me too. Let humans define the rules of the game (like physics or traffic safety), and let the AI explore the massive search space for optimized solutions.
- londons_explore 6 hours ago
  Fun thing to try:
  Ask an LLM: "Say the word APPLE", but modify the code so the logits of the tokens for Apple/apple/APPLE are permanently set to -Inf, i.e. the model cannot say that word.
  The output ends up like this:
  "Banana. Oh, just kidding. Banana. Oh, it's so tasty I said it wrong. Lets try again: Orange. Whoops, I meant to say grape. No I meant to say the tasty crunchy fruit known as a carrot".....
  [-]
  - londons_explore 5 hours ago
    Note that OP's traffic light problem would suffer the same problem.
    I.e., a smart model, knowing it cannot say a word, will give the next best solution - for example maybe saying "A P P L E" or maybe "I'm afraid I'm not able to do that".
    However, a constrained model does not know or understand its own constraints, so keeps trying to do things which aren't allowed - and even goes back and tries to redo these things which aren't allowed, because to the model it is a mistake which needs correcting.
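    A minimal sketch of the logit masking described in this subthread, assuming a toy three-word vocabulary with made-up scores rather than a real model. `mask_tokens` and `softmax` are illustrative helpers, not part of any library.

```python
import math
from typing import Dict, Set


def mask_tokens(logits: Dict[str, float], banned: Set[str]) -> Dict[str, float]:
    """Set banned tokens to -inf so they receive zero probability after softmax."""
    return {tok: (-math.inf if tok in banned else score)
            for tok, score in logits.items()}


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax; assumes at least one finite logit."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Made-up scores: the unconstrained model strongly prefers "apple".
logits = {"apple": 5.0, "banana": 2.0, "orange": 1.0}

probs = softmax(mask_tokens(logits, {"apple"}))
best = max(probs, key=probs.get)  # "banana"
```

    The mask changes only what can be sampled at this step. Nothing tells the model why its preferred token vanished, which matches the behaviour described above.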
    [-]
    - adammarples 4 hours ago
      There's a whole field of solving constrained optimization and it doesn't really work like that, but they don't use LLMs.
  - jcims 3 hours ago
    Like your brain when you know you know a word but it's just not surfacing in your mind.
    I'm guessing I'm not that different from the average human and I can 'feel' something physically while I'm searching for the word. I've always wondered what that was.
- jgalt212 3 hours ago
  like Combinatorial Chemistry, but we should probably just call it AI Chemistry for the likes.
  https://en.wikipedia.org/wiki/Combinatorial_chemistry
sschueller 8 hours ago
This is probably going to get a letter from LEGO's lawyers.
If you want to be safe do not use the word LEGO. Use Bricks or in German "Klemmbausteine".
Many people have had to deal with LEGO's lawyers and it ain't pretty.
[-]
- necovek 2 hours ago
  They are actively using actual LEGO bricks, and as such they are not misrepresenting anything.
  Where there is gray area is in them not clearly stating they are unaffiliated with LEGO the company.
  OTOH, they also don't seem to be looking to monetize anything, so they are at lower risk from LEGO having a plausible claim that they are hurting their sales.
  [-]
  - dec0dedab0de 1 hour ago
    While it is perfectly valid to describe what they made as a designer or builder for LEGO, I do not believe they are allowed to use part of a trademark in a way that could be trademarkable itself, so basically good for everything but the name.
    But then again IANAL, and that is just how I understand the American law, and every country is different.
- amelius 5 hours ago
  This is academic research, and I suppose it falls under fair use.
  [-]
  - msiebuhr 2 hours ago
    IANAL, but EU law doesn't have "fair use". It does have a _very specific_ set of uses where you don't have to ask for permission (or pay). As I understand, it is more limited than the US' "fair use" doctrine.
    EU being EU, I can only imagine there's a bunch of particular rules around research that may or may not work in the authors' favor.
- KurSix 4 hours ago
  Even YouTubers and small hobby sites have gotten takedown notices just for using the name in the wrong context
- edoceo 8 hours ago
  Why are they like Nintendo when they could be like Sega? Embrace your community where they are.
  [-]
  - Freak_NL 6 hours ago
    Trademark law leaves no space for that. The Lego Group has to actively defend their trademark. That means a name like LegoGPT is really on the obvious end of 'don't do that'.
    [-]
    - MrOrelliOReilly 5 hours ago
      Completely agree. This should be well beyond accusations of corporate bullying. It's one thing to mention Legos, it's another to actively include a brand name in your product! NikeGPT, CocaColaGPT and IkeaGPT will face the same issue ;)
      [-]
      - Freak_NL 4 hours ago
        Mentioning Lego is absolutely OK, and you can sell used Lego as well and note that you are using genuine Lego bricks (resale laws simply allow that). Lego is really antsy about anything which might look like it is actually a Lego Group initiative though, and anything where Lego bricks are offered for sale in a modified state¹.
        1: Never, ever, sell modified Lego bricks: https://www.brickfanatics.com/lego-wins-court-case-against-c...
    - makeitdouble 6 hours ago
      https://www.eff.org/deeplinks/2013/11/trademark-law-does-not...
      [-]
      - shakna 5 hours ago
        Whilst they don't need to attack every form of speech, a name like 'LegoGPT' is not protected.
      - dudeinjapan 5 hours ago
        If LEGO's lawyers agreed with this article, they'd be out of business!
    - cluckindan 5 hours ago
      The registered trademark is LEGO, in all caps.
      Also, they don’t tend to go after fan-made things like this; based on some googling, they typically throw the book at counterfeit producers who are eating into their profits.
      [-]
      - Freak_NL 4 hours ago
        (Initially) fan-made stuff which gets big enough to get noticed usually won't be able to call themselves something with 'Lego' in it. Usually some variation of 'brick' is used instead (e.g., Bricklink, Rebrickable, EuroBricks, etc.).
  - makeitdouble 6 hours ago
    Sega's [0] main business is pachinko (so gambling). To them, the Sonic brand being used by fans has very few consequences, if it isn't actually building much-needed goodwill toward their other brands.
    That's where Nintendo is fundamentally different.
    [0] https://en.m.wikipedia.org/wiki/Sega_Sammy_Holdings
    [-]
    - andrewchilds 4 minutes ago
      TIL the pachinko connection perfectly explains the visual/sound/game design of the Sonic games.
    - MrsPeaches 5 hours ago
      Interesting that many of Sega’s games are now mobile focused.
      Also don’t forget that Sega was “originally an importer of coin-operated arcade games to Japan and manufacturer of slot machines and jukeboxes”
      https://en.m.wikipedia.org/wiki/History_of_Sega
    - foobahhhhh 3 hours ago
      Did that start with that merger in 2004 so back in the Sonic heyday it wasn't in to gambling?
      [-]
      - makeitdouble 3 hours ago
        Yes.
        Sega was mostly into normal arcade games, and Sammy bought them for their expertise to improve Sammy's much more profitable gambling machines. It's Sammy's CEO that took the lead, and Sonic and console games became a mere side business.
  - ygouzerh 3 hours ago
    They probably have a culture of "patents".
    They just won the market because historically they reused an existing locking-brick concept from a company called Kiddicraft, found a way to make it more lockable... and patented it before the original company and other companies could implement it.
    We can say that they became famous half for engineering reasons, and half thanks to their legal department...
  - Perz1val 6 hours ago
    > Embrace your community where they are.
    In the casinos?
  - raverbashing 8 hours ago
    (Not saying it's related) But, which one of those are still running?
    [-]
    - vanderZwan 7 hours ago
      Both are. Sega just lost a console war a few decades ago and decided not to pursue that any more.
    - Cthulhu_ 7 hours ago
      Sega is generating more than $1.5 billion a year, they're fine.
    - Philpax 6 hours ago
      I'd go as far as to say that Sega's embracing of their fans is a big part of why they're still around: https://en.wikipedia.org/wiki/Sonic_Mania#Development
    - 71bw 7 hours ago
      Both?
- ChrisRob 6 hours ago
  Immediately thought the same thing! This will get busted very soon
RaSoJo 5 hours ago
I don’t need automation to build LEGO sets — that’s the fun part, and I want to do it myself. What I need is automation after the build: to clean up, sort the bricks by color and shape, and store them properly.
I just wish scientists would start by solving problems that actually exist in the real world. There’s real value — and real money — in that.
[-]
- lee-rhapsody 1 hour ago
  The issue with solving real-world problems is that it distracts from publishing, which is all scientists are taught to care about.
- KurSix 4 hours ago
  You're totally right: sometimes the real innovation isn't in making the fun parts easier, it's in making the boring parts disappear
stevage 6 hours ago
This does not seem like a very impressive result. It's using such a small set of bricks and the results don't really look much like the intended thing.
It feels like a hand-crafted algorithm would get a much better result.
[-]
- tokai 2 hours ago
  The (fake) texturing is the only thing making it somewhat work. As normal colored bricks it would just be lumps of lego.
- KurSix 4 hours ago
  But I think the cool part here isn't photorealism, it's the combo of language understanding and physical buildability
- nkko 3 hours ago
  this is very cool considering it is a fine-tuned 1B model
- otabdeveloper4 6 hours ago
  What we need is an AI where you feed it some photos of your pile of bricks and it generates you instructions based on the bricks you have.
  (Totally feasible with today's technology, but you'll need to train your own specialized models.)
  [-]
  - dspillett 5 hours ago
    There already exists an app that will, from photos of your pile, pick out models you can make from a large library of existing models. Though IIRC that has been around long enough that it isn't quite using what people are currently calling AI (instead using older ML techniques for brick identification, and a basic DB search to pick out the valid plans for the resulting list of bricks).
  - stevage 5 hours ago
    https://au.lifehacker.com/toys-board-games/50948/news/scan-y...
  - amelius 5 hours ago
    What I'd be interested in most is a robot that can assemble a model from a pile of bricks/parts.
jader201 8 hours ago
There’s a bug on the page (on iPhone, at least): once you scroll to the GIFs, it starts auto-loading them without you doing anything, which makes it really hard to navigate anywhere from that point.
[-]
- Aeolun 8 hours ago
  When will people finally learn to never autoplay.
  [-]
  - vachina 7 hours ago
    Autoplay is fine, it’s Safari opting to autoplay in FULLSCREEN. Firefox et al. play them in the respective video containers.
- pragmatick 8 hours ago
  The opposite for me on Firefox Desktop - I didn't realize they were gifs and wondered what the pictures were supposed to tell me.
- MangoTec 5 hours ago
  this should be fixable with `playsinline` on the video element: https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
  annoying that this is the default behaviour on iOS though
yathaid 8 hours ago
This is super cool! The GIFs showing the object being built are just yummy; I have no other way to describe it.
If anyone else was searching for the dataset, it is at https://huggingface.co/datasets/AvaLovelace/StableText2Lego
It "contains 47,000+ different LEGO structures, covering 28,000+ unique 3D objects from 21 common object categories of the ShapeNetCore dataset".
Local inference instructions are over at their github page - https://github.com/AvaLovelace1/LegoGPT/?tab=readme-ov-file
gilgoomesh 8 hours ago
It's hilarious watching $50,000 worth of robots take so long to assemble a couple dollars worth of Lego. It's like peering into the old folks home for robots.
[-]
- KurSix 4 hours ago
  Give it a decade and we'll probably have robo-builders doing it faster than we can blink…
- Zobat 8 hours ago
  People claim that Lego is expensive, but try buying a robot that builds Lego...
  [-]
  - bombcar 8 hours ago
    You build the robot out of Lego.
    [-]
    - wiz21c 7 hours ago
      They should have done it with Lego Mindstorms :-)
- FirmwareBurner 8 hours ago
  That should tell you why stuff is still hand assembled in Asia instead of by robots in the west.
  [-]
  - femto 7 hours ago
    As a counterexample, I offer a pick-and-place line in action.
    https://youtu.be/Ca-SoKzjh4M?t=110
    SMT component placement isn't that different to placing bricks. Conventional wisdom is that if you can design a PCB that requires no manual work, its assembly cost is more-or-less location independent. SMT pick and place can hit speeds of 200,000 components per hour [1]. That's roughly 55 components per second (200,000 / 3,600).
    [1] https://www.hallmarknameplate.com/smt-process/
    [-]
    - FirmwareBurner 5 hours ago
      The tasks requiring high dexterity, like final assembly of the product with displays, keyboards, ribbon cables and cases, are still done by humans by hand.
    - imtringued 6 hours ago
      Fixturing isn't automated in most places. Sure a gantry style CNC machine can drive screws vertically into your parts to join them, but it requires a human loader to put the two parts onto the fixture in the first place.
  - smikhanov 7 hours ago
    Also why it’s OK to stop worrying about our future robotic (or AI) overlords.
    [-]
    - FirmwareBurner 7 hours ago
      Those are already an issue. AI is a bigger threat to cognitive tasks than to physical ones.
      Skynet isn't gonna attack you with Terminators wielding a "phased plasma rifle in the 40W range", but will be auto-rejecting your job application, your health insurance claims, your credit score and brainwashing your relatives on social media.
      [-]
      - davidthewatson 2 hours ago
        This reply is so perfect I'm going to memorize it for family and friends.
      - smikhanov 6 hours ago
        Absolutely, that’s without any doubt.
        There’s a difference though. The “cool” Terminator Skynet pursues its own goals, and wasn’t programmed by humans to kill. The “boring” insurance-rejecting Skynet is explicitly programmed to reject insurance claims by other humans, unfortunately.
        So still, no need to worry about our AI overlords, worry about people running the AI systems.
      - imtringued 6 hours ago
        > AI is a bigger threat to cognitive tasks than to physical ones.
        I don't see how you could possibly think this is true. Physical automation is easier to scale since you only need to solve a single problem instance and then just keep applying it on a bigger scale.
        [-]
        - FirmwareBurner 5 hours ago
          Automation doesn't work where high dexterity and quick adaptability are required. It is much cheaper and quicker to train a human worker to move from sewing a Nike shoe to an Adidas shoe than it is to reprogram and retool a robot.
          Robots work for highly predictable high speed tasks where dexterity is not an issue, like PCB pick and place.
psiops 7 hours ago
I noticed that "a basic sofa" involves placing some floating bricks if built in the order of the animation. It hints at the way this model generates the designs. The automated assembly of generated LEGO structures using robots would have serious trouble creating these designs, I reckon.
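A minimal sketch of how one could flag such floating placements, assuming a simplified 1-D side view where a brick counts as attached if it sits on the ground or shares studs with an already placed brick directly above or below it. `Brick`, `overlaps`, and `floating_steps` are hypothetical names, the bench layout is invented for illustration, and none of this is taken from the project's code.

```python
from typing import List, NamedTuple, Sequence


class Brick(NamedTuple):
    x: int       # leftmost stud
    width: int   # number of studs
    level: int   # 0 = ground layer


def overlaps(a: Brick, b: Brick) -> bool:
    """True if the two bricks share at least one stud column."""
    return a.x < b.x + b.width and b.x < a.x + a.width


def floating_steps(order: Sequence[Brick]) -> List[int]:
    """Indices of steps where the new brick has nothing placed to attach to."""
    placed: List[Brick] = []
    floating: List[int] = []
    for step, brick in enumerate(order):
        attached = brick.level == 0 or any(
            abs(other.level - brick.level) == 1 and overlaps(brick, other)
            for other in placed
        )
        if not attached:
            floating.append(step)
        placed.append(brick)
    return floating


left_leg, right_leg = Brick(0, 2, 0), Brick(6, 2, 0)
seat_left, seat_mid, seat_right = Brick(0, 3, 1), Brick(3, 2, 1), Brick(5, 3, 1)
top_plate = Brick(0, 8, 2)

# Strict bottom-up order: the middle seat brick has nothing to hold on to yet.
bottom_up = [left_leg, right_leg, seat_left, seat_mid, seat_right, top_plate]
# Reordered: the top plate goes on first, then the middle brick clips in from below.
reordered = [left_leg, right_leg, seat_left, seat_right, top_plate, seat_mid]

print(floating_steps(bottom_up))  # [3]
print(floating_steps(reordered))  # []
```

The same final structure passes or fails depending only on the order. Whether a robot can actually attach a brick from below is a separate question that this check does not answer.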
[-]
- sdoering 6 hours ago
  I came here to say that. I immediately thought: Wow, this works in the assembled version, but not the way the assembly is being animated. You would need to first build the base sofa layer from two levels so that the upper layer keeps the lower layer bricks in place. Only afterwards could it be put onto the legs.
  [-]
  - paulluuk 3 hours ago
    Indeed, I would be very curious to see how their robots would actually build that sofa. Although the robots aren't really part of the model of course, they're just a little extra.
dwighttk 3 hours ago
Quit trying to read the article after the 15th video went to full screen and had to be dismissed hitting the tiny x in the upper left… 3 more interfered with me trying to go back to this page
[-]
- beklein 3 hours ago
  You can also grab the paper as a PDF here: https://arxiv.org/pdf/2505.05469.
  Keep in mind that these sites are run by AI researchers, not dedicated UX teams at major tech companies—so the interface can feel a bit rough around the edges. That said, your critique is still valid; it’s just fair to cut them a little slack given their priorities.
soared 1 hour ago
The high backed chair gif example is interesting - the way it’s animated it would completely fall apart and be unstable. But if you built it in reverse, it would work fine.
But it also shows the weirdness of the solution - in places where larger bricks make sense, multiple smaller bricks are used instead. In a section where a 2x6 should be repeated, in one instance of the repetition it uses two 1x6s. It’s weird.
Cool idea.
kilimounjaro 7 hours ago
Doesn’t seem to add much to just converting a 3d model into voxels and therefore bricks.
Using bricks other than 2x2 and 2x4 blocks creatively to make interesting things is really important; I’m not sure what type of algorithm would best auto-generate beautiful MOCs, however. Was thinking of doing a $50000 Kaggle comp for this, what do others think?
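For reference, a minimal sketch of the naive "voxels, therefore bricks" baseline mentioned above, assuming one row of a voxel layer is given as a string with `#` for filled cells and that 1xN bricks of the listed lengths are available. `BRICK_LENGTHS` and `legolize_row` are hypothetical names, and this is not the method from the paper.

```python
from typing import List, Tuple

BRICK_LENGTHS = (8, 6, 4, 3, 2, 1)  # assumed set of available 1xN bricks


def legolize_row(row: str) -> List[Tuple[int, int]]:
    """Cover each run of '#' cells with 1xN bricks, largest first.

    Returns (start, length) pairs in left-to-right order.
    """
    bricks: List[Tuple[int, int]] = []
    i = 0
    while i < len(row):
        if row[i] != "#":
            i += 1
            continue
        # Find where this run of filled cells ends.
        run_end = i
        while run_end < len(row) and row[run_end] == "#":
            run_end += 1
        # Greedily take the longest brick that still fits in the remaining run.
        while i < run_end:
            length = next(n for n in BRICK_LENGTHS if n <= run_end - i)
            bricks.append((i, length))
            i += length
    return bricks


print(legolize_row("#######..##"))  # [(0, 6), (6, 1), (9, 2)]
```

A greedy cover like this treats every row independently, so nothing makes bricks in one layer overlap the seams of the layer below. It says nothing about whether the result holds together, which is the part the stability constraints are meant to address.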
W0lfEagle 8 hours ago
Great. Please do cabinets next. Constrain to some specified material such as 2.5m by 1.25m 18mm ply. Iterate designs by text and output the model, cutlist and assembly instructions. Simple right?
9dev 8 hours ago
When I was a kid, I proudly exclaimed I wanted to become a professional lego builder. Not in my wildest dreams would I have assumed how close to that career path I could have come.
z3t4 4 hours ago
The results are a bit underwhelming, considering what I'm used to seeing in image generation, world generation in games, etc.
necovek 2 hours ago
I love this: it'd be great if it could also give age-appropriate designs (e.g. a 5 yo has less patience and ability than an 8 yo).
carstenhag 8 hours ago
Have the authors never heard of Lego being one of the companies that are super strict about their trademark? They file takedown notices etc on every project they see. Even if the stone design has the little thingies on top/bottom...
oaiey 6 hours ago
The real challenge is not LEGO System pieces but LEGO Technic pieces, where you do not build in layers bottom-up but everything inside out.
yunusabd 7 hours ago
Cool project, but judging from the videos, it looks like some of them can't actually be built using those instructions. E.g. "A backless bench with armrest" would require some bricks to float in the air with no support while you're assembling the rest.
[-]
- biofox 7 hours ago
  The design is sound, just not the order of assembly shown. For the bench, the lower bricks are suspended by the upper ones, so would need to be assembled separately before connecting to the legs.
  [-]
  - Etherlord87 19 minutes ago
    Still, I think you wouldn't find such a design in actual Lego set, because such floating bricks make the construction weak.
  - yunusabd 6 hours ago
    True, easy for a human, not so easy for a robot to go through those extra steps. I wonder if they made it work with the robots, because in the video they only show the robots building from the bottom up.
vladde 7 hours ago
Looks amazing, with the arms building the piece!
However, the model "A high-backed chair" has some floating pieces in the middle of the seat, that are fastened from above. Can these robots handle building these?
josefx 6 hours ago
Does it have a sense of scale? Can it make a matching bookcase, table and chair or are they all going to end up with a random size?
londons_explore 6 hours ago
> voxelizing it onto a grid and applying legolization
I guess I learned a word today...
belter 3 hours ago
Can I have IKEA-GPT ? This weekend if possible?
[-]
- davidthewatson 2 hours ago
  Indeed. I'm guessing legal is the only reason we don't have 3d-printed ikea. Raymond Loewy FTW. But then, we'd have garages full of bespoke n-of-1 junk instead of mass-made LLM (liminal-labor-made) junk.
xeyownt 7 hours ago
So, besides training an LLM to generate build instructions for Lego models, they have robots to assemble these models, and they applied 3D textures to the generated 3D models (what for?).
Sometimes the amount of money and energy that are spent in "recreation" projects just amazes me.
[-]
- thenaturalist 7 hours ago
  You do realize that a system like LEGO is just an extremely efficient and cheap proof of concept with a proxy material (LEGO) for later real life applications of building X from standardized components Y right?
  This is interesting and seemingly quite applicable base research and we move forward by being curious.
benob 8 hours ago
It looks like it is an extension of 3d model generation techniques.
aayushmaan45 1 hour ago
wasd
aayushmaan45 2 hours ago
good
Traubenfuchs 7 hours ago
Is this aligned, safeguarded, censored?
Can it produce an ample bosom made of lego? An indecent protrusion? Weapons?
[-]
- b3lvedere 7 hours ago
  Weapons have kinda been done earlier: https://nostarch.com/flego https://nostarch.com/legoguns.htm
- foobahhhhh 3 hours ago
  [flagged]
nurettin 8 hours ago
This could prove useful for minecraft bot scene
[-]
- davidthewatson 2 hours ago
  Indeed. I thought blocks world stuff would be amazing for early childhood education. I'm guessing some labs are already there, since Minecraft has supported user-programmable models for years, though I dunno the details. I'd be happy to learn if anybody knows of their evolution since the rise of AI.