A month ago, I went on a performance quest trying to optimize a PHP script that took 5 days to run. With the help of many talented developers, I eventually got it to run in under 30 seconds. The optimization process was so much fun, and so many people pitched in with their ideas, that I eventually decided I wanted to do something more.
That's why I built a performance challenge for the PHP community.
The goal of this challenge is to parse 100 million rows of data with PHP, as efficiently as possible. The challenge will run for about two weeks, and at the end there are prizes for the best entries (among them the very sought-after PhpStorm Elephpant, of which we only have a handful left).
This is why I jumped from PHP to Go, and then from Go to Rust.
Go is the most batteries-included language I've ever used. Instant compile times mean I can bind my tests to ctrl/cmd+s and run them every time I save the file. It's more performant (way less memory, similar CPU time) than C# or Java (and certainly all the scripting languages) and has a massive stdlib for anything you could want to do. It's what scripting languages should have been. Anyone can read it, just like Python.
Rust takes the last 20% I couldn't get in a GC language and removes it. Sure, its syntax doesn't make sense to an outsider and you end up with third-party packages for a lot of things, but you can't beat its performance and safety. It removes the need for a whole lot of tests, because those situations just aren't possible.
If Rust scares you use Go. If Go scares you use Rust.
> duckdb -s "COPY (SELECT url[20:] as url, date, count(*) as c FROM read_csv('data.csv', columns = { 'url': 'VARCHAR', 'date': 'DATE' }) GROUP BY url, date) TO 'output.json' (ARRAY)"
Takes about 8 seconds on my M1 MacBook. The JSON isn't in the right format, but fixing that wouldn't dominate the execution time.
I took a quick look; the dependency on PHP 8.5 is mildly irritating. Even Ubuntu 26.04 isn't lined up to ship with that version; it's on 8.4.11.
You mention in the README that the goal is to run things in a standard environment, but then you're using a near-bleeding-edge PHP version that people are unlikely to be using?
I thought I'd just quickly spin up a container and take a look out of interest, but now it looks like I'll have to go dig into building my own PHP packages, or compiling my own version from scratch to even begin to look at things?
I was curious what it would take if I approached it the way I do with most CSV transformation tasks that I'm only intending to do once: use Unix command line tools such as cut, sed, sort, and uniq to do the bulk of the work, and then do something in whatever scripting language seems appropriate to put the final output in whatever format is needed.
The first part, using this command [1], produces output lines that look like this:
219,/blog/php-81-before-and-after,2021-06-21
and is sorted by URL path and then date.
With 1 million lines that took 9 or 10 seconds (M2 Max Mac Studio). But with 100 million it took 1220 seconds, virtually all of which was sorting.
Turning that into JSON via a shell script [2] was about 15 seconds. (That script is 44% longer than it would have been had JSON allowed a comma after the last element of an array).
So basically 22 minutes. The sorting is the killer with this type of approach, because the input is 7 GB. The output is only 13 MB, there are under 300 pages, and the largest page count is under 1000, so building the output up in memory as the unsorted input is scanned, and only then sorting it, would clearly be way, way faster.
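For the record, a minimal sketch of that unsorted-scan-plus-late-sort idea in plain PHP (assuming a two-column url,date CSV named data.csv; the actual challenge input and output formats may differ):

```php
<?php
// Aggregate counts in memory while streaming the unsorted input,
// then sort only the small result instead of the 7 GB file.
$counts = [];

$handle = fopen('data.csv', 'rb');
while (($line = fgets($handle)) !== false) {
    [$url, $date] = explode(',', rtrim($line, "\n"), 2);
    $counts[$url][$date] = ($counts[$url][$date] ?? 0) + 1;
}
fclose($handle);

// With under 300 pages, sorting the aggregate is essentially free.
ksort($counts);
foreach ($counts as &$dates) {
    ksort($dates);
}
unset($dates);

file_put_contents('output.json', json_encode($counts, JSON_PRETTY_PRINT));
```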
> A month ago, I went on a performance quest trying to optimize a PHP script that took 5 days to run. Together with the help of many talented developers, I eventually got it to run in under 30 seconds
That's a huge improvement! Out of curiosity, how much was low-hanging fruit unrelated to the PHP interpreter itself (e.g. parallelism, faster SQL queries, etc.)?
I am not smart enough to use Rust, so take this with a grain of salt, but its syntax just makes me go crazy. Go, on the other hand, is a breath of fresh air. Unless you really need that additional 20% improvement that Rust provides, I think Go should be the default of the two for most projects.
> The output should be encoded as a pretty JSON string.
So apparently that is what they consider "pretty JSON". I really don't want to see what they would consider "ugly JSON".
(I think the term they may have been looking for is "pretty-printed JSON", which implies something about the formatting rather than being a completely subjective term.)
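For reference, "pretty-printed" in PHP terms is just json_encode's JSON_PRETTY_PRINT flag; a quick illustration with made-up data:

```php
<?php
$data = ['date' => '2021-06-21', 'count' => 219];

echo json_encode($data), "\n";
// {"date":"2021-06-21","count":219}

echo json_encode($data, JSON_PRETTY_PRINT), "\n";
// {
//     "date": "2021-06-21",
//     "count": 219
// }
```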
PHP has always escaped forward slashes, I believe, to help prevent malicious JSON from injecting tags into JavaScript. It was common for PHP users to json_encode some data and then write it out into the HTML in a script tag. A malicious actor could include a closing script tag and then inject their own HTML tags and scripts.
The weirdness is partly in JSON itself. In the JSON spec, the slash (named "solidus" there) is the only character that can be written either plainly or prefixed with a backslash (AKA "reverse solidus").
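Both spellings are valid per the spec, and PHP lets you pick with a flag; a quick demonstration:

```php
<?php
$url = '/blog/php-81-before-and-after';

echo json_encode($url), "\n";
// "\/blog\/php-81-before-and-after"   <- escaped solidus, PHP's default

echo json_encode($url, JSON_UNESCAPED_SLASHES), "\n";
// "/blog/php-81-before-and-after"     <- plain solidus, equally valid JSON
```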
Can't speak for go... but for the handful of languages I've thrown at Claude Code, I'd say it's doing the best job with Rust. Maybe the Rust examples in the wild are just better compared to say C#, but I've had a much smoother time of it with Rust than anything else. TS has been decent though.
Hehe. Optimization ... it's a good way to learn. Earlier in my career I did a lot of PHP. Usually close to bare.
Other than the obvious point that writing an enormous JSON file is a dubious goal in the first place (really), while PHP can be very fast this is probably faster to implement in shell with sed/grep, or ... almost certainly better ... by loading to sqlite then dumping out from there. Your optimization path then likely becomes index specification and processing, and after the initial load potentially query or instance parallelization.
The page confirms sqlite is available.
If the judges whinge and shell_exec() is unavailable as a path, a more whinge-tolerant alternative is to use PHP's sqlite feature, then dump to JSON.
If I wanted to achieve this for some reason in reality, I'd have the file on a memory-backed blockstore before processing, which would yield further gains.
Frankly, this is not much of a programming problem; it's more a systems problem, but it's not being specced as such. This shows, in my view, an immature conception of the real problem domain (which is likely IO-bound). Right tool for the job.
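To make the sqlite route from a few paragraphs up concrete, here is a rough sketch using PHP's bundled SQLite3 extension (file name, schema, and output shape are assumptions, not the challenge's actual spec):

```php
<?php
// Load the CSV into an in-memory SQLite table, let SQLite do the
// grouping, then dump the aggregate to JSON.
$db = new SQLite3(':memory:');
$db->exec('CREATE TABLE hits (url TEXT, date TEXT)');

$insert = $db->prepare('INSERT INTO hits VALUES (:url, :date)');
$db->exec('BEGIN');
$handle = fopen('data.csv', 'rb');
while (($row = fgetcsv($handle)) !== false) {
    $insert->bindValue(':url', $row[0], SQLITE3_TEXT);
    $insert->bindValue(':date', $row[1], SQLITE3_TEXT);
    $insert->execute();
    $insert->reset();
}
fclose($handle);
$db->exec('COMMIT');

$result = $db->query(
    'SELECT url, date, COUNT(*) AS c FROM hits GROUP BY url, date ORDER BY url, date'
);
$rows = [];
while ($row = $result->fetchArray(SQLITE3_ASSOC)) {
    $rows[] = $row;
}
file_put_contents('output.json', json_encode($rows, JSON_PRETTY_PRINT));
```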