244 points by ks2048 3 days ago | 97 comments on HN
| Mild positive Editorial · v3.7 · 2026-02-28 10:54:01
Summary · Scientific Collaboration
The AdderBoard GitHub repository presents an open-source leaderboard and technical challenge to build minimal transformer models for integer addition, with MIT licensing, public code links, transparent verification methodology, and contributor attribution. The structure enables global scientific collaboration through free public access, no explicit eligibility restrictions, and open submission processes. The content is primarily technical—explaining transformer architectures and AI model optimization—and does not address most UDHR provisions directly, but its practices (openness, attribution, scientific focus, standardized verification) align implicitly with Articles 26-27 on education and science.
> In short: if you can swap in a different set of weights and use the exact same inference code for a different task, your setup is legitimate. If the inference code is inseparable from the algorithm, it's not.
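The rule quoted above can be sketched in a few lines: task-agnostic inference code whose only task knowledge lives in the weights. This is an illustrative toy (a tiny linear model, not a transformer), and all names here are made up for the example, not taken from the AdderBoard repository.

```python
# Sketch of the legitimacy rule: the inference code below knows nothing
# about any particular task; swapping the weights dict retargets it.

def forward(weights, x):
    """Generic inference for a tiny linear model: y_i = w_i * x_i + b."""
    return [w * xi + weights["b"] for w, xi in zip(weights["w"], x)]

# Legitimate setup: two tasks, two weight sets, one shared code path.
doubler = {"w": [2.0, 2.0], "b": 0.0}
negator = {"w": [-1.0, -1.0], "b": 0.0}
print(forward(doubler, [1.0, 3.0]))  # [2.0, 6.0]
print(forward(negator, [1.0, 3.0]))  # [-1.0, -3.0]
```

By the rule, a submission where `forward` itself branched on digits or hard-coded a carry table would be inseparable from the algorithm, and hence not legitimate.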
I wonder why they don't just write the code themselves, so by design the focus can be on the model.
So, hand-coded weights can do it with 36 params and 311 for trained weights - did anyone try the former architecture, but starting with random weights and learning?
I was initially excited until I saw that, because it would reveal some sort of required local-minimum capacity; the further revelation that this was all vibe-coded, with no arXiv paper, makes me feel I should save my attn for another article.
The gap between 36 hand-coded params and 311 trained params is fascinating and honestly underappreciated. It mirrors something we see repeatedly in ML: gradient descent finds solutions in a fundamentally different region of parameter space than a human engineer would design.
When you hand-code the weights, you're essentially implementing a known algorithm (carry-propagation) directly into the network topology. But trained networks often discover distributed representations that spread the computation across more parameters in ways that are harder to interpret but more robust to input distribution shifts.
I'd be curious whether the 311-param trained model generalizes better to bases other than 10, or to addition with different digit counts than it was trained on. In my experience, the 'messier' learned solutions sometimes capture more structural regularity than the clean engineered ones, precisely because they aren't locked into a single algorithmic strategy.
I get that this is technically interesting, for certain, but the sheer amount of energy and associated global warming risk needed to do something with >=99% accuracy that we've been able to do easily for decades with a guaranteed 100% accuracy seems to me to be wasteful to the extreme.
Very cool, but can I suggest the `add` CPU instruction instead? Supports 64-bit numbers, and it's encoded in hardware, and no need to cross a PCIe interface into a beefy, power-hungry GPU and back again. And chances are it's cross-platform, because basically every ISA since the very first has had `add`.
For one, the specific 36-parameter version is impossible without float64, so you might guess the corollary: it is not exactly amenable to being found by gradient descent. I think the open question is how you can structure transformers, and neural nets in general, so that they can both very parsimoniously represent things like this and remain amenable to learning by gradient descent.
I ask this question as someone who can't do much more than confirm that your blog post is written in English by someone who knows math.
Does this result suggest that if we had N clever humans manually building an LLM, they might come up with something as smart as a frontier model, but potentially 45 times smaller? (1644 / 36 ~= 45, N = very large, time not specified)
What would be an acceptable amount of energy to spend on something that someone has done in a different manner before? Would you rather we stick with all of the currently known ways to do things?
Does this boil down to a condemnation of all scientific endeavours if they use resources?
Would it change things if the people who did it enjoyed themselves? Would they have spent more energy playing a first person shooter to get the same degree of enjoyment?
How do you make the calculation of the worth of a human endeavour? Perhaps the greater question is why are you making a calculation of the worth of a human endeavour.
Key Findings section synthesizes community discoveries: 'Parameter cliff at ~800', 'd=4 now works with rank-3 factorization + grokking', 'Hand-coded models can go much smaller...since they don't need to be discoverable by SGD.'
Verification section uses fixed seed (2025) and standardized test harness (10 edge cases + 10,000 random pairs), supporting scientific reproducibility.
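The verification setup described above (fixed seed 2025, 10 edge cases plus 10,000 random pairs) can be sketched as a small harness. The edge-case list and the callable interface here are assumptions for illustration; the repository's actual script will differ in details.

```python
# Hedged sketch of a fixed-seed verification harness. The model under
# test is represented as a plain callable (a, b) -> predicted sum.
import random

# Hypothetical edge cases; the repository's actual list may differ.
EDGE_CASES = [(0, 0), (0, 9), (9, 9), (99, 1), (999, 1),
              (500, 500), (123, 877), (1, 9999), (9999, 9999), (4321, 5678)]

def verify(model_add, n_random=10_000, seed=2025):
    rng = random.Random(seed)  # fixed seed -> everyone tests the same pairs
    cases = EDGE_CASES + [(rng.randint(0, 9999), rng.randint(0, 9999))
                          for _ in range(n_random)]
    correct = sum(model_add(a, b) == a + b for a, b in cases)
    return correct / len(cases)

# A perfect "model" scores 1.0 on the harness.
print(verify(lambda a, b: a + b))  # 1.0
```

Pinning the seed is what makes leaderboard accuracies comparable: every submission is graded against the identical test set.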
Inferences
Explicit focus on 'fundamental question' frames challenge as scientific exploration rather than engineering task.
Public sharing of novel techniques advances global scientific knowledge in neural network efficiency and design.
Standardized verification and public leaderboard support scientific integrity and peer evaluation.
README functions as educational material explaining transformers, attention mechanisms, carry propagation, parameter counting, and verification methodology.
FW Ratio: 67%
Observable Facts
README includes detailed sections explaining 'Addition requires three capabilities: Digit alignment, Per-digit arithmetic, Carry propagation' and 'Transformers solve these using attention (for alignment), MLPs (for arithmetic), and autoregressive generation.'
Verification section includes executable Python script enabling readers to reproduce and learn from testing methodology.
Repository includes 'Context' section explaining fundamental ML concepts ('what is the minimal transformer that can represent integer addition?').
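The three capabilities the README lists (digit alignment, per-digit arithmetic, carry propagation) are exactly the steps of the schoolbook algorithm that a hand-coded model embeds in its topology. A minimal sketch, with illustrative names not taken from the repository:

```python
# The schoolbook addition algorithm, one step per README capability.

def add_digits(a_digits, b_digits):
    """Add two little-endian digit lists (least-significant digit first)."""
    n = max(len(a_digits), len(b_digits))      # digit alignment:
    a = a_digits + [0] * (n - len(a_digits))   # pad to equal length
    b = b_digits + [0] * (n - len(b_digits))
    out, carry = [], 0
    for da, db in zip(a, b):
        s = da + db + carry                    # per-digit arithmetic
        out.append(s % 10)
        carry = s // 10                        # carry propagation
    if carry:
        out.append(carry)
    return out

# 58 + 67 = 125, little-endian: [8, 5] + [7, 6] -> [5, 2, 1]
print(add_digits([8, 5], [7, 6]))  # [5, 2, 1]
```

In the transformer framing, attention handles the alignment step, the MLP handles the per-digit sums, and autoregressive generation threads the carry from one output digit to the next.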
The repository implicitly recognizes contributor dignity through equal treatment and transparent attribution of intellectual work.
FW Ratio: 60%
Observable Facts
All leaderboard entries include named attribution linking to contributor profiles or repositories.
The challenge invites participation from anyone with a GitHub account, with no stated eligibility restrictions based on protected characteristics.
The README states 'Both are valid. Both are interesting.' regarding different approaches, demonstrating respect for diverse intellectual contributions.
Inferences
Transparent attribution system recognizes and honors the inherent worth of contributors' intellectual work.
Equal treatment in submission criteria and leaderboard display suggests structural commitment to contributor dignity.
Submission process contains no stated eligibility restrictions based on protected characteristics; GitHub platform provides accessible participation channels.
build 1ad9551+j7zs · deployed 2026-03-02 09:09 UTC · evaluated 2026-03-02 13:57:54 UTC