1 story
ppbench.com
Stories: 1 (0 evaluated)
Avg HRCB: ND
Avg SETL: ND
Avg Conf: ND
Poster Karma: 1,052 avg
Submitters: 1
1. Show HN: Pencil Puzzle Bench – LLM Benchmark for Multi-Step Verifiable Reasoning (ppbench.com)
4 points by bluecoconut 7 hours ago | 0 comments | skipped