Fascinating. It touches on the discolored and counterfeit parts but doesn't say how they are detected – I assume there was a lot of manual training of the neural net?
How do you deal with parts that are stuck together? I actually noticed one in your demo video, and was curious. This seems like it would be very difficult to classify, even in a sense to sort them into a "take these apart" bin.
Awesome. Do you have a video of it running at full speed? Also, is the bin at the end for all the pieces except the fake/discolored/technic ones, with the statistics on the PC, or is there a more elaborate sorting scheme? Watching the belt go I was kind of expecting the pieces to be sorted by color or something, which would look neat but isn't very practical, I assume.
Can you publish the details of the h/w-s/w interface? The only piece I grokked was the VGG classifier. How do you go from a physical Lego in the hopper, to jpg, to class label, to the Lego in the correct physical bin? I'd like to do this myself. I don't have 2 tons but definitely some 10k pieces. Thanks.
Thanks for sharing such a cool build and helping keep alive a hope of mine. I dream of a day I have enough time/capital to build/buy a Lego sorter, a robotic Lego brick separator (perhaps using high-resolution ultrasound/radar to detect where to insert the separator and where to push), pair that with an automated storage system in a subterranean vertical tunnel with robot arms similar to a robotic tape library keeping track of all detected parts and minifigs according to BrickLink categorization. Let the system keep it all organized (for example, bin overflows into multiple bins are automatically tracked as a single part and color combination), and I even have the choice to have it dump a random assortment into a big laundry-size bin, and build like a kid again, yet have it clean up after itself once I'm done.
I would imagine that this is a hobby project and you're losing cash on it. But what would be the parameters of a profitable business? At what level of scale (if any) would it have to operate? And is there a lot of competition in this space?
Really awesome. What I'm dying to know, though, is some stats on the profitability. On average, what sort of groupings of parts do you get from the bulk Lego, and what do they sell for versus what you paid for them? Is there a variance in the quality of the bulk lots? I presume once you've sorted out the rare Lego you could just resell the common stuff as another bulk lot, but if everyone does that, how do you avoid buying stuff that has already had the rare pieces filtered out?
Question: Were you able to utilise any data about Lego parts from Lego's own catalogues (current and historical) or technical specifications? It sounds like you trained the classifier manually. I imagine if you want to sort into sets you need to know what makes up a particular set... does Lego provide an API or anything regarding parts/sets?
Further to that, if you have pricing data on sets you have a nice little optimisation problem - given my metric ton of parts, what are the most valuable complete sets I can make?
Thank you very much for this fascinating post, nice work.
Did you use any other resources to learn about deep learning besides http://course.fast.ai/? I'm looking to get started learning, and wondered what the best way forward would be.
Heh, that looks like a ton of fun, sorry you lost your van though! Also interesting to know that the pile of Lego Technic parts I've got from my lego bot building days actually might have some resale value :-).
Lots of interesting questions come to mind though, in that if you have two bits of Lego that are attached, what bin do you put them into? And have you looked at ways to automatically disassemble Legos? And did any of your purchases have Legos that were superglued together? (as is done in some displays.)
I work for a mill that cleans and sorts grains and beans (taking the rocks out, stems out, etc.), and it's fascinating to see the parallel invention of something really similar! We have a bunch of different steps:
1) Air is blown through the product and any dust is taken out.
2) The product is run through a bunch of screens that take out anything too big or too small.
3) The product is put through a gravity separator to separate based on mass.
4) Finally, the product is put through an optical sorter (https://www.youtube.com/watch?v=O0gWUeqzk_o) which uses blasts of air to push out unwanted materials from a stream of falling product.
I'm sure you could use the same process for Legos. Not sure about how to distinguish between branded and unbranded Legos though.
One thought: I'd think creating a similar solution would make an amazing semester course for University students.
Maybe you package stuff up nicely and give it away as a course, or try and sell the plans as a course to one of the coding schools or large Education companies?
Lots of comments on here about the software, but I'm really fascinated by the hardware. Where did you get the conveyor belts and how much did they cost?
For the belt that lifts item up out of the hopper, I notice there's a little white hook (or platform, not sure what to call that) jutting out that does the actual lifting of the legos. How did you get the size of that right? Did you install that jutting-out part, or did it come pre-attached to the belt?
What tools are you using to make a computer do the actual belt rotation? I'm wondering how low-level it is - are you spinning the steppers directly or did the conveyor belts come with some kind of API? I'm guessing the belts don't have a USB port for easy control.
Bootstrapping rocks for speeding up manual labeling. I got to fully unsupervised on coin designs and angles by augmenting with many different lighting angles using WS2811 LED strips and correlating the angles. I can almost do the same with the dates, but it's so easy to finish with bootstrapping. See http://www.GemHunt.com/dates for the 100% unsupervised classes.
My 9yo son is willing to give you his life savings of $41.56 to have an at home kit of this machine :)
I've played with OpenCV and tried for fun to train a Haar cascade classifier to recognise a minifigure. It didn't work, which made me realise one has to really understand what's under the hood of machine learning like this in order to give it good training data.
I am kinda boggled that you thought "Huh, Lego, think I'll get into that" and immediately ordered two metric tons of Lego. o_O
I get that you thought (for some reason) that you would only win some small fraction of your bids, but ordering, say, a quarter-ton of Lego at a go isn't reasonable either. The whole episode is pretty hilarious.
This is amazing! I am currently struggling to properly sort a few Technic sets (roughly equivalent to 5-6 shoeboxes), and one of the biggest challenges besides sorting is finding boxes that are large enough to store the individual types of pieces. Any ideas?
It's basically bootstrapped. Use the machine to sort a few kilos, pick out the mistakes, update the training set, let it train overnight, rinse, repeat. After many such cycles it is getting pretty good. The best part is when I think it has made a mistake but actually it is right and I'm wrong :)
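The cycle described above can be sketched as a simple loop. Everything here (the function names, the toy classifier, the dict-of-labels training set) is illustrative, not the sorter's actual code:

```python
# Sketch of the bootstrapping loop: sort a batch, let a human fix the
# mistakes, fold the corrections back into the training set, retrain.
# All helpers are illustrative stand-ins, not the sorter's real code.

def bootstrap_round(training_set, batch, classify, human_review):
    """Run one correct-and-retrain cycle; returns the grown training set."""
    predictions = {img: classify(img) for img in batch}
    corrections = human_review(predictions)      # {img: true_label}
    # Every image, corrected or confirmed, becomes new training data.
    training_set.update({**predictions, **corrections})
    return training_set

# Toy demonstration with string "images" and a classifier that always
# answers 'brick-2x4'.
train = {"img0.jpg": "brick-2x4"}
batch = ["img1.jpg", "img2.jpg"]
classify = lambda img: "brick-2x4"
review = lambda preds: {"img2.jpg": "plate-1x2"}  # human fixes one mistake

train = bootstrap_round(train, batch, classify, review)
print(len(train))          # 3
print(train["img2.jpg"])   # 'plate-1x2'
```

The overnight retraining step would then consume the grown set; the point of the sketch is only that every sorted batch feeds the next training run.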
They have a category all by themselves, get taken apart after the first run and then simply made to go through again in the next batch. The neural net is surprisingly good at classifying 'mess' as a category!
Sorting by color is useless unless you want to go and sort directly into sets (it can do that too, but that's very much experimental; it would also take lots of bins, and there are some details that are important to get right that are almost invisible to the camera).
A full-speed video is super hard to shoot because you can't follow the parts as they move; they basically disappear, because the air puff is so short that a part is there one frame and gone the next. Right now classification takes about 30 ms, and that's the limiting factor, because the belt keeps moving during those 30 ms, so you need to be 'back' at the camera before it moves so far that you can't stitch the next image to the previous one.
Another limiting factor is the relationship between the two belts. The second belt can only go so fast before the precision of the puffers starts to be insufficient to aim the parts into the right bin (the parts also carry too much forward momentum), and the second belt needs to go many times faster than the first in order to spread out the parts sufficiently. Yet another related problem: if you look at the video you'll notice that one of the little part grabbers on the belt can push quite a bit of stuff ahead of it, and if fortune is against you, all of that lands on the belt in one go. Making that belt go slowly creates just enough of a pause between parts to be able to separate them with air without pushing the wrong part off the belt. It pays to leave some safety margin there, so I tend to set the second belt a bit faster than optimal and the first one a bit slower. That way the accuracy goes up quite a bit.
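To put rough numbers on the 30 ms budget: the belt's travel during classification has to stay below the camera's field of view, or consecutive frames no longer overlap and stitching fails. The belt speed and field of view below are assumed values for illustration, not figures from the post:

```python
# Back-of-the-envelope check on the 30 ms classification budget.
# Belt speed and field of view are assumed values, for illustration only.

classify_s = 0.030        # classification latency from the post (30 ms)
belt_speed_mm_s = 300.0   # assumed camera-belt speed
fov_mm = 40.0             # assumed field of view along the belt

# Distance the belt travels while the classifier is busy.
travel_mm = belt_speed_mm_s * classify_s
print(f"belt moves {travel_mm:.0f} mm during classification")  # 9 mm

# Frames can still be stitched as long as that travel is less than the
# field of view, i.e. consecutive frames still overlap.
overlap_mm = fov_mm - travel_mm
assert overlap_mm > 0, "belt too fast: frames no longer overlap"
print(f"remaining overlap: {overlap_mm:.0f} mm")  # 31 mm
```

Under these assumed numbers, roughly a quarter of the field of view is consumed per classification, which is consistent with the need to be 'back' at the camera in time.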
It took lots of experimenting and tweaking to get to this stage.
About 1 part per second is a practical upper limit right now, it can go way faster than that but then it starts dumping stuff all over the room :)
I'll see if I can shoot a video at a higher speed than the one in the post right now.
Well, yes. It is definitely not going to make money if I count my time. But I learned an awful lot about machine learning and the present state of computer vision, far more than if I had tried to do that on something abstract. And when I look at the Lego bought-and-sold it seems to work out ok.
As for competition, yes, plenty, but all manual. Scaling up now that I have the software working is definitely an option but I have a good set of very well paying customers and not much can compete with that.
Hopper belt to camera belt is just a speed difference that causes the parts to become nicely spread out (at least, you hope so!). The camera stitches frames together to scan the part, so a part larger than a frame can still be scanned. Once the end of the part is detected, it gets fed into the classifier, which returns part id, category, and color. Depending on what sort is set up, a part then gets pushed into one of 7 bins, which are periodically emptied into larger bins and bags.
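That chain (spread out, scan, classify, route to one of 7 bins) could be sketched as follows; the sort plan, the category names, and the stub classifier are all hypothetical:

```python
# Minimal sketch of the part pipeline: stitched scan -> classifier ->
# bin assignment -> puffer. The classifier and sort table are stubs.

BINS = 7  # the machine sorts into 7 bins per pass

# Hypothetical sort set-up: which category goes to which bin this run.
SORT_PLAN = {"brick": 0, "plate": 1, "technic": 2, "minifig": 3,
             "slope": 4, "mess": 5}
OTHER_BIN = 6  # everything not in the current plan

def classify(scan):
    """Stub for the neural net: returns (part_id, category, color)."""
    return ("3001", scan.get("category", "brick"), "red")

def route(scan):
    """Map a stitched scan to the bin its puffer should push it into."""
    part_id, category, color = classify(scan)
    return SORT_PLAN.get(category, OTHER_BIN)

print(route({"category": "technic"}))    # 2
print(route({"category": "baseplate"}))  # 6 -> the 'other' bin
```

A second pass with a different `SORT_PLAN` table would give the two-run sorting mentioned below.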
If necessary, a lot can be pushed through the machine twice, for instance to sort parts by length or to pick out sets (that last bit works in theory, but in practice there are a lot of problems to overcome because of the limited number of bins to deposit into).
As for the hardware: there is a nifty little camera with a 10x macro lens that connects to the USB port (noname Asian stuff), a Pololu servo/GPIO-to-USB card to drive the relays, and a SainSmart 16-port relay board to drive the solenoids for the air valves.
The software is all in python with a generous amount of help from the people who wrote numpy, opencv, keras and theano.
The error rate is between 3 and 5%, depending on how fast I set the machine. There are a number of sources for the errors: obviously classification errors, but also sometimes two parts are too close to each other, and even if the classifier got them both right, the air puff for one pushes the other off the belt as well. To minimize this effect I keep the air puff super short, on the order of 10 ms, which is about as fast as the solenoids can open and close reliably. But it does mean that if it misses even by a bit there is nothing to be done about it, and that part will land in the 'other' bin.
That error rate is still too high but with every run the classification errors go down and that's the main component.
One nasty little problem was that I spaced the puffers too regularly in the first iteration, which meant that sometimes the parts would line up just so, in the order in which they came under the camera, so that more than one puffer would be active at once, leading to a reduction in pressure, and no parts would be pushed off the belt. That was a tricky one!
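A few lines of simulation reproduce the effect: with puffers at a regular pitch, parts that happen to arrive spaced at that same pitch all reach their puffers in the same instant, so the valves open inside one solenoid window and the shared air pressure sags. Irregular spacing breaks the coincidence. All numbers here are made up for the demonstration:

```python
# Demonstration of the puffer-alignment problem. All numbers are
# illustrative; only the effect (simultaneous valve firings) matters.

def fire_times(puffer_pos, part_pos, belt_speed=1.0):
    """Time at which each part reaches its assigned puffer."""
    return sorted((p - x) / belt_speed for p, x in zip(puffer_pos, part_pos))

def max_simultaneous(times, window=0.01):
    """Largest number of valves open within one solenoid window."""
    return max(sum(1 for u in times if abs(u - t) < window) for t in times)

# Regularly spaced puffers, parts that happen to line up at the same pitch:
regular = [100, 200, 300, 400]
parts   = [0, 100, 200, 300]          # same 100 mm spacing
print(max_simultaneous(fire_times(regular, parts)))    # 4: all fire at once

# An irregular pitch breaks the coincidence:
irregular = [100, 190, 310, 430]
print(max_simultaneous(fire_times(irregular, parts)))  # 1
```

With the irregular layout no two fire times can coincide for equally spaced parts, which is one plausible reading of why uneven spacing fixed it.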
> Were you able to utilise any data about Lego parts from Lego's own catalogues (current and historical) or technical specifications?
I tried, but in the end a straight-up train-correct-retrain loop took care of all the edge cases much quicker and much more reliably than any feature engineering and database correlation that I tried before. This is roughly the fourth incarnation of the software and by far the most clean and effective. HN pointed me in the direction of Keras a few weeks ago; that, coupled with Jeremy Howard's course, gave me the keys to finally crack the software in a decisive way.
> It sounds like you trained the classifier manually.
Only the first batch; after that it was mostly corrections. While it classifies one batch it saves a log, which gives me more data to feed the classifier for the next training session. There are so few errors now that I can add another 4K images to the training set in half an hour or so.
> Further to that, if you have pricing data on sets you have a nice little optimisation problem - given my metric ton of parts, what are the most valuable complete sets I can make?
I'm on that one :) And a few others that are not so obvious. There is a lot to know about lego. Far more than you'd think at first glance.
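A first cut at that optimisation could be a greedy pass: repeatedly build the most valuable set that is still completable from the remaining inventory. The catalog, prices, and part names below are invented, and the real problem (thousands of sets sharing parts) would call for a proper integer program rather than a greedy heuristic:

```python
from collections import Counter

# Greedy sketch: build the most valuable still-completable set first.
# Inventory, set contents and prices here are invented examples.

def completable(inventory, parts_needed):
    return all(inventory[p] >= n for p, n in parts_needed.items())

def best_sets(inventory, catalog):
    """catalog: {set_name: (price, {part: count})} -> (builds, value)."""
    built, value = [], 0.0
    while True:
        options = [(price, name, parts)
                   for name, (price, parts) in catalog.items()
                   if completable(inventory, parts)]
        if not options:
            return built, value
        price, name, parts = max(options)   # highest price first
        inventory.subtract(parts)           # consume the parts
        built.append(name)
        value += price

inv = Counter({"brick-2x4": 10, "plate-1x2": 4, "wheel": 4})
catalog = {
    "car":   (30.0, {"brick-2x4": 2, "wheel": 4}),
    "house": (20.0, {"brick-2x4": 6, "plate-1x2": 2}),
}
built, value = best_sets(inv, catalog)
print(built, value)   # ['car', 'house'] 50.0
```

Greedy can of course be beaten by combinations (a cheap set's parts might unlock two better ones), which is exactly what makes the full version a nice optimisation problem.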
2) the 1000 most common lego parts, 'other' and 'mess'. In the end the idea is to get to 20K classes and to sort directly into sets. This is very much a pipe dream at the moment but I think it is doable given a large enough set of samples. The problem is that you have to see all those parts at least a hundred times or so before they get detected reliably.
3) too little :( The training data is still woefully insufficient, but it is now good enough to bootstrap the rest. This took a while to achieve because without any sorted lego to begin with you have nothing to train with. So the first 20 kg or so were sorted by hand and imaged on the sorter without any actual sorting happening (everything into the run-off bin), then I labeled the results by hand until the accuracy on the test set (500 parts or so) went over 80%. That was a week ago and since then it's been improving steadily day by day.
4) one training run per night, typically a few hundred epochs on the current set, but this will change soon. The machine is now expanding the training set rapidly, with an associated improvement in accuracy. This means the training sessions are taking longer and longer, but I'll be running fewer of them. What I'll probably do is offload training to one machine, which will drop off a newly trained net once per week or so, and run inference on another, which does the sorting and captures the new training data.
Checking the logged images for errors still takes up a bit of time, but with the current error rate that is very manageable. (Before, it was an endless nightmare.)
That was my starting point but I'd already played with neural nets when they first came out so that helped and I also did a lot of opencv stuff without any neural nets.
After that it was mostly googling each and every term that I didn't understand until it all started to make sense.
course.fast.ai is probably the fastest way to get something concrete going which is very useful if you need that instant gratification kick to keep going.
There are a lot. There are two main specialized marketplaces, Bricklink and Brickowl, with hundreds of sellers (most of them operate on both sites). Most sellers also have a brick and mortar shop, others only operate online.
Most fun I've had programming in years. Finally something where I don't have to worry right from the get-go if it is secure or not.
> sorry you lost your van though!
So am I. I had a ton of work in that thing and even if the insurance covered the value they did not give me back the many weeks I spent building it.
> Also interesting to know that the pile of Lego Technic parts I've got from my lego bot building days actually might have some resale value :-).
It'd be better if you had some really nice boxed sets from the 60's ;)
> if you have two bits of Lego that are attached, what bin do you put them into?
'Other', then pick them apart and run them through again
> And have you looked at ways to automatically disassemble Legos?
Yes, but this is very hard to do without damage.
> And did any of your purchases have Legos that were superglued together?
I've seen a few bits here and there, but for the most part that doesn't happen. Kids are pretty destructive though, so you have to count on a good percentage of damaged / unusable parts.
I tried faster, but then it is pointless; you simply can't pan the camera fast enough to follow a part from the hopper to the bin where it will end up. Hope that is satisfactory :)
What is the percentage by weight of 'trash' versus 'good stuff' for such a sorter?
I do use screens for various pre-sorting stages, not shown in the article. The sorter is only good for parts up to 40 mm, and only for parts that aren't wheels or otherwise round, since those roll away while being imaged.
That's by far the bulk though so for me if it does that part well it is already more than worth it.
Branded/unbranded: the spectrum is different (far more different than you would say from looking at it with the naked eye), the weight doesn't match for the part (though this can be very close with really good fakes), and the logo on the studs is different.
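Of those three signals, the weight test is the simplest to sketch: once the classifier has named the part, compare the measured weight against a catalog weight and flag outliers. The catalog values and the tolerance below are assumptions, and as noted, good fakes can sit well inside any reasonable tolerance:

```python
# Sketch of a weight test for suspect parts. The catalog weights and
# the tolerance are illustrative assumptions, not the sorter's values.

CATALOG_WEIGHT_G = {"3001": 2.32, "3020": 1.18}  # hypothetical catalog

def weight_suspect(part_id, measured_g, tolerance=0.05):
    """True if the measured weight deviates more than `tolerance` (fraction)."""
    expected = CATALOG_WEIGHT_G[part_id]
    return abs(measured_g - expected) / expected > tolerance

print(weight_suspect("3001", 2.31))  # False: within 5%
print(weight_suspect("3001", 2.05))  # True: ~12% light, flag it
```

In practice this would be one weak signal combined with the spectrum and stud-logo checks, not a verdict on its own.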
I've been thinking about doing that gravity thing, but a bit more fancy, rather than just a binary sort to shoot parts in several directions, an alternative is a spiral slide under a steep angle where parts are fed in at the top and ejected when they reach the right bin.
That's a lot more complicated to make mechanically than what I have right now, and the time available for a classification operation would also be much shorter, but it would allow for a much larger number of output bins without taking up a whole lot of space. So maybe a next generation, if I still need it (this one is going through piles of lego now).
High-speed optical sorters are the sort of behind-the-scenes tech that make me feel like I'm living in the future. I have vivid memories as a child watching people winnow rice by hand using wide, flat bamboo baskets.
Had I seen something like the optical sorter back then, I would have thought it (Arthur C. Clarkeian) magic!
If you look closely at the belt you can see the traces of many failed experiments before I found a shape that worked without accidentally getting stuck on a part.
It is attached with super glue to the belt. I use the narrowest parts because that way it doesn't end up fighting with the curvature of the belt when it goes over the roller.
The belt rotation is done with a 3 phase AC motor hooked up to an inverter for the vertical belt, the camera belt is driven by a DC motor hooked up to a variable power supply.
So no steppers; that would have made life a bit easier, because then I'd know (modulo some slippage) where the belt is positioned. So now I have to reconstruct that optically, hence the wavy line on the belt.
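One way to reconstruct belt position optically is to cross-correlate a one-dimensional intensity profile of the wavy line between consecutive frames: the lag of the correlation peak is the displacement in pixels. Whether the sorter does it exactly this way is an assumption; a minimal numpy sketch with a synthetic signal:

```python
import numpy as np

# Estimate belt displacement by cross-correlating a 1-D intensity
# profile of the wavy line between two frames. Synthetic data below;
# the sorter's actual method may differ.

def pixel_shift(prev, curr):
    """How many pixels the pattern in `prev` moved right to give `curr`."""
    prev = prev - prev.mean()
    curr = curr - curr.mean()
    corr = np.correlate(curr, prev, mode="full")
    return int(np.argmax(corr)) - (len(prev) - 1)

# Synthetic wavy-line profile; the second frame sees the pattern
# shifted 7 pixels further along the belt.
x = np.arange(200)
line = np.sin(x / 9.0) + 0.3 * np.sin(x / 2.3)
frame0 = line[7:127]   # first frame
frame1 = line[0:120]   # pattern has moved 7 px to the right
print(pixel_shift(frame0, frame1))  # 7
```

Calibrating pixels to millimetres then gives belt position without any encoder, at the cost of depending on the line staying visible and unobstructed.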
Every collector of Lego sooner or later becomes a collector of storage systems :) (Don't ask me how I know that...).
I use some relatively cheap plastic sliders stacked 10 high, parts go by length from the top down and by width left to right. Then there are departments for minifigs and associated parts as well as irregular stuff like base-plates and so on. Storage could easily be another blog post all by itself! It's a crazy problem.
For technic, which is many small parts I use small bins and bags inside the larger bins, but you probably could use a raaco rack or equivalent if you don't have too much of it.
You should have seen my face. Also, try explaining to your s.o. that you're about to buy an extra garage solely to house something when you have no idea how it will all work out, or when - and if - it will ever go away again. And that was two years ago.
It really is hilarious. For me it's more or less business as usual though, I take lots of chances. Some work out and some don't. This one is still undecided.
L(ego)S(orting)ASAS would be very popular with the families I know: a little van that pulls up outside, dump the lot in, and get back the bits sorted by kits :)
I work in a steel mill; we use a similar setup with an optical camera to size streams of ore and coke particles. One thing you could look into is a vibrating feeder (sometimes called a 'vibro'); this is what we use to stop screens from 'pegging', a similar issue to the bridging problem mentioned in the article.
> it's fascinating to see the parallel invention of something really similar!
This isn't parallel invention - the principles of optical sorting and air ejection are well known and understood. (Which is not to lessen the achievement in building this, but building and inventing are not the same thing.)
Dunno who here is old enough to remember, but back in the day every bag of chips used to contain a few burnt chips. Well, thanks to those computer-vision air-blasting sorting machines, there are no more burnt ones in the bag! They all get blasted out, and now chips are uniform.