11 points by kofdai 5 days ago | 6 comments on HN
Mild positive · Product · v3.7 · 2026-03-01 03:10:28
Summary · Scientific Advancement · Advocates
The page is a GitHub repository for 'verantyx-v6', an LLM-Free Symbolic Reasoning Engine. Its editorial content advocates for unbiased, structurally verified scientific tools for the 'Humanity's Last Exam' (HLE) benchmark. The evaluation finds mild positive advocacy themes related to scientific advancement, freedom of expression, and participation in cultural life, framed within the platform's structural support for open access and collaboration.
You used LLMs to generate code to beat ARC-AGI "without using LLMs"... Uhh, okay then.
LLMs generating code to solve ARC-AGI is literally what they do these days, so as far as I can see, this entire exercise is basically equivalent to just running "Deep Think"-style test-time compute models and committing their output to GitHub?
What exactly was the novel, un-LLMable human input here?
Title:
[Show HN] Verantyx Update: 22.7% on ARC-AGI-2 using Human-Logic + OpenClaw Loop
Body:
Following up on my previous post (where I was at 18.1%), I’ve just reached 22.7% (227/1000) on the ARC-AGI-2 public evaluation set.
I want to address the skepticism regarding my development speed. As an undergraduate student in Japan, I have limited manual coding time. To overcome this, I’ve established a "Human-Architect / AI-Builder" research loop.
How the 24/7 loop works:
Human (Me): I analyze failed tasks to identify underlying geometric patterns and design new DSL primitives (e.g., the new gravity_solver and cross3d_geometry in v62).
AI Agent (OpenClaw/Claude Code): Based on my architectural design, the agent scaffolds the implementation, performs rigorous regression testing across all 1,000 tasks, and refines the code for performance.
This synergy allows for a high-frequency commit cycle that a single developer could never achieve alone, while ensuring the inference engine remains 100% symbolic and deterministic. At test-time, there are zero LLM calls; it's pure structural reasoning.
V62 Key Updates:
Gravity Solver: 4 distinct strategies for object sliding/gravity-based transformations.
Cross3D Geometry Engine: Improved handling of 3D-projected cross structures.
Score: 22.7% (monotonically increasing from 20.1% and 22.4% earlier this week).
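A gravity-style transform can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the v62 `gravity_solver`: the function name `gravity`, the background color 0, and the choice of the four slide directions as the "four strategies" are all hypothetical. Cells fall independently and keep their order along the direction of travel; rotating the grid first lets a single "fall down" routine serve every direction.

```python
import numpy as np

# Counter-clockwise quarter turns (np.rot90 convention) that bring each side to the bottom.
TURNS = {"down": 0, "left": 1, "up": 2, "right": 3}


def gravity(grid: np.ndarray, direction: str = "down", background: int = 0) -> np.ndarray:
    """Hypothetical sketch: slide every non-background cell as far as it can go."""
    k = TURNS[direction]
    rotated = np.rot90(grid, k)  # the target side is now the bottom edge
    out = np.full_like(rotated, background)
    height = rotated.shape[0]
    for col in range(rotated.shape[1]):
        column = rotated[:, col]
        falling = column[column != background]  # keeps top-to-bottom order
        out[height - falling.size:, col] = falling  # stack them at the bottom
    return np.rot90(out, -k)  # undo the rotation
```

Reusing one code path for all directions keeps the number of distinct routines that the regression loop has to exercise small.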
I believe this hybrid development model—where human intuition drives logic and AI agents drive implementation—is the fastest path to 80%+ on the "Humanity's Last Exam".
I'm eager to hear your thoughts on this "System 2" approach and the role of AI agents in building symbolic AI.
I understand the skepticism—the line between "AI-generated" and "AI-assisted" has become incredibly blurry. Let me clarify the architectural distinction.
1. The Inference Engine is 100% Deterministic:
The "solver" is a standalone Python program (26K lines + NumPy). At runtime, it has zero neural dependencies. It doesn't call an LLM, it doesn't load weights, and it doesn't "hallucinate." It performs a combinatorial search over a formal Domain Specific Language (DSL). You could run this on a legacy machine with no internet connection. This is fundamentally different from o1/o3 or Grok-Thinking, where the model is the solver at test-time.
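To make "combinatorial search over a formal DSL" concrete, here is a minimal sketch of enumerative program synthesis: try every composition of primitives up to a fixed depth, shortest first, and keep the first program that reproduces all training outputs exactly. The four whole-grid primitives and the names `PRIMITIVES`, `run`, and `search` are hypothetical stand-ins; the actual verantyx-v6 vocabulary and search order are not shown in this thread.

```python
from itertools import product
from typing import Callable, Optional

import numpy as np

Grid = np.ndarray

# Hypothetical toy DSL: four whole-grid primitives.
PRIMITIVES: dict[str, Callable[[Grid], Grid]] = {
    "flip_lr": np.fliplr,
    "flip_ud": np.flipud,
    "rot90": np.rot90,
    "transpose": np.transpose,
}


def run(program: tuple[str, ...], grid: Grid) -> Grid:
    """Apply the named primitives left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def search(train_pairs: list[tuple[Grid, Grid]], max_depth: int = 3) -> Optional[tuple[str, ...]]:
    """Enumerate programs shortest-first; return the first one that fits every pair."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            # The candidate must reproduce every training output exactly.
            if all(np.array_equal(run(program, x), y) for x, y in train_pairs):
                return program
    return None
```

The search space grows as (number of primitives)^depth, which is why the choice of primitives matters more than raw compute in this style of solver.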
2. The "Novel Human Input" is the DSL Design:
Using an LLM to help write Python boilerplate is trivial. Using an LLM to design a 7-phase symbolic pipeline that solves ARC is currently impossible. My core contributions that an LLM could not "reason" out are:
The Cross DSL: The insight that ~57% of ARC transforms can be modeled by local 5-cell von Neumann neighborhoods.
Iterative Residual Learning: A gradient-free strategy where the system synthesizes a transform, calculates the residual error on the grid, and iteratively synthesizes "correction" programs.
Pruning & Verification: Implementing a formal verification loop where every candidate solution is checked against the 3-5 training examples before being proposed.
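One way to picture a neighborhood-based rule is a lookup table from each cell's 5-cell von Neumann neighborhood (center plus the four orthogonal neighbors) to its output color, learned from the training pairs. This is a hedged sketch, not the repository's Cross DSL: the names `neighborhoods`, `learn_cross_rule`, and `apply_cross_rule`, the `PAD` sentinel for off-grid cells, and the fallback of keeping a cell's color when its neighborhood was never seen are all assumptions. Rejecting any table in which one neighborhood maps to two different outputs acts as a simple verification step; a table that survives reproduces every training output by construction.

```python
import numpy as np

PAD = -1  # hypothetical sentinel for positions outside the grid


def neighborhoods(grid: np.ndarray):
    """Yield ((row, col), (center, up, down, left, right)) for every cell."""
    padded = np.pad(grid, 1, constant_values=PAD)
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            pr, pc = r + 1, c + 1  # coordinates in the padded grid
            key = tuple(
                int(padded[pr + dr, pc + dc])
                for dr, dc in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))
            )
            yield (r, c), key


def learn_cross_rule(train_pairs):
    """Return a neighborhood -> color table, or None if the pairs contradict it."""
    table: dict[tuple[int, ...], int] = {}
    for x, y in train_pairs:
        if x.shape != y.shape:
            return None  # this rule family only covers same-size transforms
        for (r, c), key in neighborhoods(x):
            target = int(y[r, c])
            if table.setdefault(key, target) != target:
                return None  # one neighborhood, two different outputs
    return table


def apply_cross_rule(table, grid: np.ndarray) -> np.ndarray:
    out = grid.copy()
    for (r, c), key in neighborhoods(grid):
        out[r, c] = table.get(key, key[0])  # unseen neighborhood: keep the center color
    return out
```

A raw lookup table only generalizes when test neighborhoods recur from training, so a practical solver would layer more abstract primitives on top of a rule like this.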
3. Scaling through Logic, not Compute:
While the industry spends millions on "Test-time Compute" (GPU-heavy CoT), Verantyx achieves 18.1% (and now 20% in v6) using Symbolic Synthesis on a single CPU. The 208 commits in the repo represent 208 iterations of staring at grid failures and manually expanding the primitive vocabulary to cover topological edge cases that LLMs consistently miss.
If using Copilot to speed up the implementation of a deterministic search algorithm invalidates the algorithm, then we’d have to invalidate most modern OS kernels or compilers written today. The "intelligence" isn't in the typing; it's in the program synthesis architecture that does what pure LLM inference cannot.
The repository is described as an 'LLM-Free Symbolic Reasoning Engine for Humanity's Last Exam (HLE)' — advocating for unbiased tools for humanity.
FW Ratio: 50%
Observable Facts
The repository title states: '⚡ LLM-Free Symbolic Reasoning Engine for Humanity's Last Exam (HLE)'.
Inferences
The framing of a 'Symbolic Reasoning Engine for Humanity's Last Exam' suggests an aspirational goal oriented toward collective human advancement, aligning with the Preamble's ideals of freedom and dignity.
The repository promotes '3.80% bias-free score via structural verification, not statistical guessing', advocating for unbiased scientific contribution.
FW Ratio: 50%
Observable Facts
The repository subtitle states: '— 3.80% bias-free score via structural verification, not statistical guessing'.
The page is hosted on a platform for collaborative software development.
Inferences
The editorial claim advocates for a specific methodological standard in scientific contribution, aligning with the right to benefit from scientific advancement.
The platform's structure facilitates collaborative development, supporting participation in cultural life.
The platform provides equal access for users to create repositories, supporting baseline equality but not directly engaging with concepts of dignity.
FW Ratio: 50%
Observable Facts
The page is a public GitHub repository, accessible to anyone with an internet connection.
Inferences
The structural feature of an open-access code repository offers a space for participation on equal footing, indirectly supporting a practice of equal treatment.
build 1ad9551+j7zs · deployed 2026-03-02 09:09 UTC · evaluated 2026-03-02 10:41:39 UTC