Model Comparison
Model Editorial Structural Class Conf SETL Theme
deepseek/deepseek-v3.2-20251201 +0.10 +0.10 Mild positive 0.02 0.00 Technical Expression
@cf/meta/llama-3.3-70b-instruct-fp8-fast lite 0.00 ND Neutral 0.90 0.00 No human rights theme
@cf/meta/llama-4-scout-17b-16e-instruct lite 0.00 ND Neutral 1.00 0.00
claude-haiku-4-5-20251001 0.00 ND Neutral 0.06 Technical Knowledge Sharing
Section deepseek/deepseek-v3.2-20251201 @cf/meta/llama-3.3-70b-instruct-fp8-fast lite @cf/meta/llama-4-scout-17b-16e-instruct lite claude-haiku-4-5-20251001
Preamble ND ND ND ND
Article 1 ND ND ND ND
Article 2 ND ND ND ND
Article 3 ND ND ND ND
Article 4 ND ND ND ND
Article 5 ND ND ND ND
Article 6 ND ND ND ND
Article 7 ND ND ND ND
Article 8 ND ND ND ND
Article 9 ND ND ND ND
Article 10 ND ND ND ND
Article 11 ND ND ND ND
Article 12 0.10 ND ND ND
Article 13 ND ND ND ND
Article 14 ND ND ND ND
Article 15 ND ND ND ND
Article 16 ND ND ND ND
Article 17 ND ND ND ND
Article 18 ND ND ND ND
Article 19 0.10 ND ND ND
Article 20 ND ND ND ND
Article 21 ND ND ND ND
Article 22 ND ND ND ND
Article 23 ND ND ND ND
Article 24 ND ND ND ND
Article 25 ND ND ND ND
Article 26 ND ND ND ND
Article 27 0.10 ND ND ND
Article 28 ND ND ND ND
Article 29 ND ND ND ND
Article 30 ND ND ND ND
+0.15 Ask HN: What are some cool but obscure data structures you know about?
2051 points by Uptrenda 1319 days ago | 753 comments on HN | Mild positive Community · v3.7 · 2026-03-01 04:16:03 0
Summary Science & Culture Neutral
This is a user post on Hacker News soliciting discussion on interesting data structures, with the author providing an example of Bloom filters and Golomb Coded Sets. The content engages lightly with rights related to sharing information and participating in scientific culture. The evaluation is primarily neutral, with mildly positive scores stemming from the act of knowledge sharing on a platform designed for that purpose.
Article Heatmap
Preamble: ND — Preamble Preamble: No Data — Preamble P Article 1: ND — Freedom, Equality, Brotherhood Article 1: No Data — Freedom, Equality, Brotherhood 1 Article 2: ND — Non-Discrimination Article 2: No Data — Non-Discrimination 2 Article 3: ND — Life, Liberty, Security Article 3: No Data — Life, Liberty, Security 3 Article 4: ND — No Slavery Article 4: No Data — No Slavery 4 Article 5: ND — No Torture Article 5: No Data — No Torture 5 Article 6: ND — Legal Personhood Article 6: No Data — Legal Personhood 6 Article 7: ND — Equality Before Law Article 7: No Data — Equality Before Law 7 Article 8: ND — Right to Remedy Article 8: No Data — Right to Remedy 8 Article 9: ND — No Arbitrary Detention Article 9: No Data — No Arbitrary Detention 9 Article 10: ND — Fair Hearing Article 10: No Data — Fair Hearing 10 Article 11: ND — Presumption of Innocence Article 11: No Data — Presumption of Innocence 11 Article 12: +0.10 — Privacy 12 Article 13: ND — Freedom of Movement Article 13: No Data — Freedom of Movement 13 Article 14: ND — Asylum Article 14: No Data — Asylum 14 Article 15: ND — Nationality Article 15: No Data — Nationality 15 Article 16: ND — Marriage & Family Article 16: No Data — Marriage & Family 16 Article 17: ND — Property Article 17: No Data — Property 17 Article 18: ND — Freedom of Thought Article 18: No Data — Freedom of Thought 18 Article 19: +0.10 — Freedom of Expression 19 Article 20: ND — Assembly & Association Article 20: No Data — Assembly & Association 20 Article 21: ND — Political Participation Article 21: No Data — Political Participation 21 Article 22: ND — Social Security Article 22: No Data — Social Security 22 Article 23: ND — Work & Equal Pay Article 23: No Data — Work & Equal Pay 23 Article 24: ND — Rest & Leisure Article 24: No Data — Rest & Leisure 24 Article 25: ND — Standard of Living Article 25: No Data — Standard of Living 25 Article 26: ND — Education Article 26: No Data — Education 26 Article 27: +0.10 — Cultural Participation 27 Article 28: ND — Social & International Order Article 28: No Data — Social & International Order 28 Article 29: ND — Duties to Community Article 29: No Data — Duties to Community 29 Article 30: ND — No Destruction of Rights Article 30: No Data — No Destruction of Rights 30
Negative Neutral Positive No Data
Aggregates
Editorial Mean +0.15 Structural Mean +0.30
Weighted Mean +0.10 Unweighted Mean +0.10
Max +0.10 Article 12 Min +0.10 Article 12
Signal 3 No Data 28
Volatility 0.00 (Low)
Negative 0 Channels E: 0.6 S: 0.4
SETL -0.21 Structural-dominant
FW Ratio 50% 5 facts · 5 inferences
Evidence 5% coverage
2M 3L 29 ND
Theme Radar
Foundation Security Legal Privacy & Movement Personal Expression Economic & Social Cultural Order & Duties Foundation: 0.00 (0 articles) Security: 0.00 (0 articles) Legal: 0.00 (0 articles) Privacy & Movement: 0.10 (1 articles) Personal: 0.00 (0 articles) Expression: 0.10 (1 articles) Economic & Social: 0.00 (0 articles) Cultural: 0.10 (1 articles) Order & Duties: 0.00 (0 articles)
HN Discussion 20 top-level · 30 replies
petethepig 2022-07-21 23:17 UTC link
Tries (or prefix trees).

We use them a lot at Pyroscope for compressing strings that have common prefixes. They are also used in databases (e.g indexes in Mongo) or file formats (e.g debug symbols in macOS/iOS Mach-O format are compressed using tries).

We have an article with some animations that illustrate the concept in case anyone's interested [0].

[0] https://github.com/pyroscope-io/pyroscope/blob/main/docs/sto...

gpestana 2022-07-21 23:26 UTC link
Conflict-free replicated data type (CRDT) is super interesting data type (and algorithm), especially when it is implemented for complex data structures like JSON: A JSON CRDT is "[...] an algorithm and formal semantics for a JSON data structure that automatically resolves concurrent modifications such that no updates are lost, and such that all replicas converge towards the same state (a conflict-free replicated datatype or CRDT)." [1].

[1] A Conflict-Free Replicated JSON Datatype (https://arxiv.org/abs/1608.03960) b Martin Kleppmann and Alastair R. Beresford.

anon291 2022-07-21 23:31 UTC link
Finger trees allow you to do amazing things. In essence they let you build an index, or multiple indices, for your dataset and then store them in a single structure.

When I go from Haskell back to imperative land I find myself greatly missing this ability. Sure I can make multiple hashmaps or trees or whatever but being able to stuff it all in one data structure is amazing.

One structure I built with them that is much more difficult with typical structures is a "block list" structure.

In this structure each block has a particular width and they're all stacked side by side.

I want to perform a query, "which box is at position X". So if my boxes are of widths 7,20,10, then the lookup for 2 yields the first box, the lookup for 12 yields the second, etc.

More interestingly, if add a new box between the second and third, the indices covered by the last box is increased.

With finger trees you use a sum monoid. This is easy. In other languages you have to roll your own structure either using a list (with o(n) lookup) or a tree with o(log n) lookup, but o(n) inserts (to translate the indices of future blocks)

https://andrew.gibiansky.com/blog/haskell/finger-trees/

binarymax 2022-07-21 23:59 UTC link
HNSW, or Hierarchical Navigable Small World is a graph data structure for approximate nearest neighbor search of vectors.

https://arxiv.org/abs/1603.09320

The problem space of ANN is one of those really deep holes you can go down. It’s a game of balancing time and space, and it’s got plenty of fascinating algorithms and datastructures.

Check out http://ann-benchmarks.com/ for a comparison. HNSW is not “the best” but it’s easy to understand and is quite effective.

jiggawatts 2022-07-22 00:02 UTC link
Cache-Oblivious Data Structures: https://cs.au.dk/~gerth/MassiveData02/notes/demaine.pdf

A vaguely related notion is that naive analysis of big-O complexity in typical CS texts ignores over the increasing latency/cost of data access as the data size grows. This can't be ignored, no matter how much we would like to hand-wave it away, because physics gets in the way.

A way to think about it is that a CPU core is like a central point with "rings" of data arranged around it in a more-or-less a flat plane. The L1 cache is a tiny ring, then L2 is a bit further out physically and has a larger area, then L3 is even bigger and further away, etc... all the way out to permanent storage that's potentially across the building somewhere in a disk array.

In essence, as data size 'n' grows, the random access time grows as sqrt(n), because that's the radius of the growing circle with area 'n'.

Hence, a lot of algorithms that on paper have identical performance don't in reality, because one of the two may have an extra sqrt(n) factor in there.

This is why streaming and array-based data structures and algorithms tend to be faster than random-access, even if the latter is theoretically more efficient. So for example merge join is faster than nested loop join, even though they have the same performance in theory.

EnderShadow8 2022-07-22 00:44 UTC link
Treap: https://en.wikipedia.org/wiki/Treap

It's like a Red-Black tree in use case, but much faster to implement, which is good for competitive programming. The average case complexity is the same for all operations, but there's an unlikely degeneration to worst-case linked-list behaviour.

Lazy Propagation Segment Tree: https://cp-algorithms.com/data_structures/segment_tree.html

Like a segment tree in that it supports range queries in logarithmic time, but supports range updates in log time as well.

I know a few more fun ones that I occasionally use in contests, but I've never touched otherwise.

zarzavat 2022-07-22 00:51 UTC link
Spatial hashing.

Say that you have data that is identified with points in 2D or 3D space. The standard way that CS students learn in school to represent this is via a quadtree or octree. However, these tree data structures tend to have a lot of "fluff" (needless allocations and pointer chasing) and need a lot of work to be made efficient.

Spatial hashing is the stupidly simple solution of just rounding coordinates to a grid and storing nearby objects together in a hash table. As long as your data is reasonably well behaved then you get many of the advantages of a quadtree, and it's completely trivial to implement.

loxias 2022-07-22 01:06 UTC link
(Fantastic post idea OP. One of the best I've ever seen :D)

Related to bloom filters, xor filters are faster and more memory efficient, but immutable.

HyperLogLog is an efficient way to estimate cardinality.

Coolest thing I've learned recently was Y-fast trie. If your dataset M is bounded integers (say, the set of all 128 bit numbers), you get membership, predecessor, or successor queries in log log time, not log, like in a normal tree.

see: https://www.youtube.com/playlist?list=PLUl4u3cNGP61hsJNdULdu... (6.851 Advanced Data Structures, Erik Demaine)

Would love to learn more "stupidly fast at the cost of conceptual complexity" things.

edit:

(adding) I don't know a name for it, because it's not one thing but a combination, but once can:

Use the first N bits from a very quick hash of a key from an unknown distribution (say a file path, or a variable name, or an address, or a binary blob,..) as a way to "shard" this key across M other fast data structures (like a tree) for search/add/remove. By changing M, you can tune the size of the terms in the O(1)+O(log) equation for running time.

Trees getting too deep for fast search? Every increase of N moves the computation from the log search of the tree to the space tradeoff of having more trees.

Added benefit is you can scale to multiple threads easily. Instead of locking the whole tree, you lock a tiny sub-tree.

Very clever. (I first saw in the Aerospike key value store)

jhallenworld 2022-07-22 01:20 UTC link
Some ones I've used recently:

The "golden section search" to find a the minimum (or maximum) of a unimodal function. An actual real-world use case for the golden ratio.

Exponentially Weighted Moving Average filters. Or how to have a moving average without saving any data points..

Some of my classic favorites:

Skiplists: they are sorted trees, but the algorithms are low complexity which is nice.

Boyer-Moore string search is awesome..

Bit twiddling algorithms from Hackers Delight: for example (x &-x) isolates the least significant set bit: basically use the ALU's carry chain to determine priority.

Another is compress from Hacker's Delight. It very quickly compresses selected bits (from a bitmask), all to the right of the word. It's useful to make certain hash functions.

The humble circular queue. If your circular queue is a power of 2 in size, then the number of bits needed for indexes is at least log2(size) + 1. The + 1 is key- it allows you to distinguish between completely full and completely empty with no effort. So:

    Empty: (write_index == read_index)
    Full: (write_index == read_index + size)
    Count: (write_index - read_index)
    Insert: queue[write_index++ & (size - 1)] = data
    Remove: data = queue[read_index++ & (size - 1)];
All lossless data compression algorithms are amazing.
jtolmar 2022-07-22 04:32 UTC link
The union-find data structure / algorithm is useful and a lot of fun.

The goal is a data structure where you can perform operations like "a and b are in the same set", "b and c are in the same set" and then get answers to questions like "are a and c in the same set?" (yes, in this example.)

The implementation starts out pretty obvious - a tree where every element either points at itself or some thing it was merged with. To check if two elements are in the same set, check if they have the same parent. Without analyzing it, it sounds like you'll average findRoot() performance of O(log(n)), worst-case O(n).

There are a couple of simple optimizations you can do to this structure, the type of things that seem like they shouldn't end up affecting asymptotic runtime all that much. The first is that, whenever you find a root, you can re-parent all the nodes you visited on the way to that root, so they'll all be quicker to look up next time. The other is that you keep track of the size of sets, and always make the larger set be the parent of the smaller.

And neither of those actually do anything impressive alone, but if you use both, the algorithm suddenly becomes incredibly fast, with the slowest-growing (non-constant) complexity I've ever heard of: O(the inverse of the Ackermann function(n)). Or, for any reasonable N, O(4 or less).

chriswarbo 2022-07-22 04:55 UTC link
The Zipper acts like a linked-list with a cursor, or "focused element"; it's implemented as a pair of lists in opposite orders; or, equivalently but more symmetric, as triple of (List[A], A, List[A])

Say we have a zipper containing [0, 1, 2, 3, 4, 5], and we're focusing on the 3. In code this will look like:

    ([2, 1, 0], 3, [4, 5])
Where [a, b, c] denotes a singly-linked list, with O(1) head (returning a) and tail (returning [b, c]). Notice that the first list is in reverse order.

To focus on the next element, we put the currently focused element on the first list, and pull the head off the second list:

    ([3, 2, 1, 0], 4, [5])
And vice versa to focus on the previous element:

    ([1, 0], 2, [3, 4, 5])
I like to imagine this as a string of beads, with the focused element held in our fingers and the rest hanging down:

     3
    / \
    2 4
    | |
    1 5
    |
    0
To move the focus forwards and backwards, we move our grip to the next/previous bead.

This works nicely as an immutable datastructure, since the tails of both lists can be shared (i.e. we don't need to copy the whole list, just the wrap/unwrap an element on to the head)

Zippers were first applied to lists, but have since been generalised:

- To trees: focusing on one node, and being able to move the focus up, to the left-child, or to the right child

- To having more than one focus

- To datastructures more generally, by treating them as 'derivatives' (as in calculus)

https://en.wikipedia.org/wiki/Zipper_(data_structure)

As an example, the XMonad window manager uses a zipper to keep track of the currently-focused window.

I've also used Zippers for window management, e.g. having a list of Displays, with one of them focused (accepting keyboard input); each Display containing a list of Workspaces, with one of them focused (currently shown); and each Workspace containing a list of Windows, with one of them focused (the active window). Keyboard shortcuts could shift between the previous/next Window, Workspace, or Display.

kcbanner 2022-07-22 05:00 UTC link

  "This is the story of a clever trick that's been around for at least 35 years,     in which array values can be left uninitialized and then read during normal   operations, yet the code behaves correctly no matter what garbage is sitting in the array. Like the best programming tricks, this one is the right tool for the job in certain situations. The sleaziness of uninitialized data access is offset by performance improvements: some important operations change from linear to constant time."
https://research.swtch.com/sparse
filoeleven 2022-07-22 05:18 UTC link
HAMT: Hash Array Mapped Trie. This data structure makes efficient immutable data possible. You can update a list of a million items, and keep a reference to the original list, by changing 3 or 4 references and some bytes.

This should replace copy-on-write for scripting languages. I really want to see it in a JS spec soon. There are libraries that can do it, but they add translation penalties and extra steps. I’d complain less if I could use Clojurescript professionally, because it and Clojure are built around them.

https://en.m.wikipedia.org/wiki/Hash_array_mapped_trie

dwaite 2022-07-22 05:42 UTC link
Splay trees:

These are binary search trees, but when you 'splay' the tree (such as by search) it rebalances the tree so that the search result is on top. This means while it is O(log n) for operations like other self-balancing trees, it optimizes tree depth in the face of non-random access so that recently accessed items are fewer steps into the tree.

Piece tables:

A bit more common for text editors, where you need to represent a sequence of characters in a form that allows efficient memory use and fast insertions/removals (so editing the first line isn't moving the 10MB of data that follows it). You create a series of references to spans (or pieces) of the document, possibly starting with a single span pointing to a mmap() version. Edits are done by appending/prepending pieces, which are potentially just references to subsequences of items created in fresh memory, appended into a buffer. Saving can create a single sequence (and a single span).

This has interesting variations:

- Put attributes on the pieces for formatting, such as indicating text should be rendered a different color or bolded.

- Create a hierarchy of pieces-of-pieces. With formatting attributes you are then dangerously close to a DOM.

- Retain old copies of a piece table - since your original mmap() file hasn't changed and your changes are in an append-only buffer, those piece table copies provide undo/redo state.

skrebbel 2022-07-22 07:13 UTC link
Not a very deep CS-y one, but still one of my favourite data structures: Promise Maps.

It only works in languages where promises/futures/tasks are a first-class citizen. Eg JavaScript.

When caching the result of an expensive computation or a network call, don't actually cache the result, but cache the promise that awaits the result. Ie don't make a

    Map<Key, Result>
but a

    Map<Key, Promise<Result>>
This way, if a new, uncached key gets requested twice in rapid succession, ie faster than the computation takes, you avoid computing/fetching the same value twice. This trick works because:

- promises hold on to their result value indefinitely (until they're GC'ed)

- you can await (or .then()) an existing promise as many times as you want

- awaiting an already-resolved promise is a very low-overhead operation.

In other words, the promise acts as a mutex around the computation, and the resulting code is understandable even by people unfamiliar with mutexes, locks and so on.

romes 2022-07-22 08:01 UTC link
Equality graphs (e-graphs) for theorem proving and equality saturation and other equality-related things.

They're awesome data structures that efficiently maintain a congruence relation over many expressions

> At a high level, e-graphs extend union-find to compactly represent equivalence classes of expressions while maintaining a key invariant: the equivalence relation is closed under congruence.

e.g. If I were to represent "f(x)" and "f(y)" in the e-graph, and then said "x == y" (merged "x" and "y" in the e-graph), then the e-graph, by congruence, would be able to tell me that "f(x) == f(y)"

e.g. If I were to represent "a*(2/2)", in the e-graph, then say "2/2 == 1", and "x*1 == x", by congruence the e-graph would know "a*(2/2) == a" !

The most recent description of e-graphs with an added insight on implementation is https://arxiv.org/pdf/2004.03082.pdf to the best of my knowledge.

P.S: I'm currently implementing them in Haskell https://github.com/alt-romes/hegg

nathell 2022-07-22 09:41 UTC link
Directed acyclic word graphs (DAWGs)!

They’re like tries, but with identical subtrees glued together. They’re capable of encoding real-world dictionaries of millions of words into ~1 byte per word, when stored cleverly (see [0]). They let you do things like efficiently finding a list of words in a dictionary whose Levenshtein’s distance to a given word is less than x. Their cousins, GADDAGs, are _the_ data structure of choice in Scrabble-playing engines.

[0]: http://www.jandaciuk.pl/fsa.html

amenghra 2022-07-22 10:58 UTC link
Purely Functional Data Structures[1] by Chris Okasaki is worth reading. There's a book version if you prefer vs reading a thesis.

Even though the domain application is functional programming, these datastructures can come in handy when you want to enable state sharing / keeping old versions around without having to copy data.

[1] https://www.cs.cmu.edu/~rwh/students/okasaki.pdf

SahAssar 2022-07-22 13:02 UTC link
https://en.wikipedia.org/wiki/Macaroons_(computer_science) are very interesting to me.

Think of it as a JWT that you can narrow down the authorizations for without needing to communicate with a server, so if you have read permissions to all your photos you can add a caveat saying `photoid=123456` and share it, and the recipient can only read the photo 123456. The caveats can be anything, including requiring third party authentication via third-party caveats. I've implemented systems with it being validated by a lua module in nginx which worked well, but the whole concept never took off.

I still think it seems like one of the most interesting authZ strategies around.

3wolf 2022-07-22 16:14 UTC link
A minor quibble with your use-case explanation: The advantage of a bloom filter isn't strictly time complexity. For example, a hash table would also have constant lookup time (best case), and would give a definitive answer on set membership. However, to store 1 million IPv6 addresses would take 16 MB. You can see very quickly that this would not scale very well to, say, a billion addresses stored in-memory on a laptop. With a bloom filter, we can shrink the amount of storage space required* while maintaining an acceptable, calculable false positive rate.

* IP addresses actually aren't a great use case for basic bloom filters, as they're fairly storage efficient to begin with, as opposed to a url for example. Taking your example, say we need to store 1 million IP addresses in our bloom filter and we're okay with a ~1% false positive rate. Well then, if we use a bloom filter with 2^23 bits (1 MB), the optimal number of hash functions is (2^23)/(10^6)*ln(2) = 6, yielding a false positive rate of (1 - exp(-6* 10^6 /2^23))^6 = ~1.8%. So we're using 6% of the storage space, but with a nearly 2% false positive rate.

NSMutableSet 2022-07-21 23:58 UTC link
I wouldn't consider tries to be obscure tbh. They are the recommended solution for many leetcode-style interview problems involving string searching. I think anyone who has done serious interview prep has encountered them.

https://leetcode.com/discuss/general-discussion/931977/begin...

jacamera 2022-07-22 00:03 UTC link
Very interesting! Sounds similar to a segment tree: https://en.wikipedia.org/wiki/Segment_tree
rockwotj 2022-07-22 00:17 UTC link
Do you have any documentation about how tries are used in mongo indexes? Or could you point to the source?
vvanders 2022-07-22 00:24 UTC link
Realtime collision detection[1] has a fantastic chapter in this with some really good practical examples if I remember right.

Great book, I used to refer to it as 3D "data structures" book which is very much in theme with this thread.

[1] https://www.amazon.com/Real-Time-Collision-Detection-Interac...

cassepipe 2022-07-22 01:20 UTC link
Speaking of ease of implementation, I just discovered AA trees. They are probably not "obscure" but I think they are worthy of more fame because they perform jsut as well as red-black trees and are easier to implement. Finding clear comprehensive documentation about them was not easy though, so here is for you, the best I could find : https://www.cs.umd.edu/class/fall2019/cmsc420-0201/Lects/lec...

And the, unhelpful to me, orginal paper : https://user.it.uu.se/~arnea/ps/simp.pdf

itamarst 2022-07-22 01:23 UTC link
Does anyone actually use cache-oblivious data structure in practice? Not, like, "yes I know there's a cache I will write a datastructure for that", that's common, but specifically cache-oblivious data structures?

People mention them a lot but I've never heard anyone say they actually used them.

pjscott 2022-07-22 01:27 UTC link
You can do this with any tree, finger or otherwise, and the big-O stuff applies as long as the tree is within a constant factor of balanced. For some bizarre reason, though, the concept of caching a monoid in each node doesn’t seem to have spread to the standard libraries of most languages. It’s a pity, since that would be incredibly useful.
pacaro 2022-07-22 01:53 UTC link
TBH even quadkeys are a fun answer to OPs question, many people aren't aware of them.

Simple explanation:

If you have data with x y coordinates and you know the bounds.

To compute the quad key for a point:

1. The key starts as the empty string. (I've also seen it start with "Z" to handle points outside the bounds)

2. Divide the space into 4 quadrants

3. determine which quadrant the point falls in, append a letter (A-D depending on the quadrant) to the key

4. Repeat step 3 using the quadrant bounds (i.e. recursively smaller area) until you have desired accuracy

This can then be used to efficiently find points within rectangular bounds.

chrchang523 2022-07-22 01:56 UTC link
Regarding the bit “compress” operation, this is supported in hardware (PEXT instruction) by Haswell and newer processors, though unfortunately most AMD processors have poor implementations. I’ve found it to be quite handy when implementing subsetting operations.
scott_s 2022-07-22 02:25 UTC link
You might enjoy a paper I’m a coauthor on which combines finger trees with B-trees to build a data structure for optimal updates to sliding windows: Optimal and General Out-of-Order Sliding-Window Aggregation, https://www.scott-a-s.com/files/vldb2019_fiba.pdf

Slides from the conference talk: https://www.scott-a-s.com/files/vldb2019_fiba_slides.pdf

cvoss 2022-07-22 03:48 UTC link
> In essence, as data size 'n' grows, the random access time grows as sqrt(n), because that's the radius of the growing circle with area 'n'.

I was about to write a comment suggesting that if we made better use of three dimensional space in constructing our computers and data storage devices, we could get this extra latency factor down to the cube root of n.

But then, I decided to imagine an absurdly large computer. For that, one has to take into account a curious fact in black hole physics: the radius of the event horizon is directly proportional to the mass, rather than, as one might expect, the cube root of the mass [1]. In other words, as your storage medium gets really _really_ big, you also have to spread it out thinner and thinner to keep it from collapsing. No fixed density is safe. So, in fact, I think the extra factor for latency in galactic data sets is neither the square root nor cube root, but n itself!

[1] https://en.wikipedia.org/wiki/Schwarzschild_radius

champtar 2022-07-22 04:01 UTC link
A fantastic thing about HyperLogLog is that it can be merged, so you can split your data between multiple server, precompute HLL for all IPs every minute, and then ask "how many unique IPs was there yesterday".

Discovered HLL because it's used in ClickHouse, which employ a ton of cool but obscure data structure.

throwamon 2022-07-22 04:47 UTC link
If I'm not mistaken, Clojure's data structures are (or used to be) implemented using finger trees.
decebalus1 2022-07-22 05:17 UTC link
I remember this from Programming Pearls and I was going to post about it after reading the comments. Great stuff.
mastermedo 2022-07-22 05:27 UTC link
I don’t understand why wouldn’t you just use a list and an index. You can always access list[index+1] or list[index-1] or list[0] or list[list.length-1].

What is the benefit here?

nemo1618 2022-07-22 05:30 UTC link
I encountered this when I tried implementing a Brainfuck interpreter in Haskell. Representing the memory array using a List + index would be way too slow; but a Zipper is perfect!
ksbrooksjr 2022-07-22 05:47 UTC link
If the Record & Tuple proposal advances to stage 3 we'll finally have native immutable data structures in JS [1].

[1] https://github.com/tc39/proposal-record-tuple

endgame 2022-07-22 06:09 UTC link
Finger Trees Explained Anew is a great derivation of the data structure: https://www.youtube.com/watch?v=ip92VMpf_-A
physicsguy 2022-07-22 06:16 UTC link
I spent a long time looking at an algorithm for long range force calculations called the Fast Multipole Method which is O(N) and found that practically it couldn't compete against an O(N log N) method we used that involved FFTs because the coefficient was so large that you'd need to simulate systems way bigger than is feasible in order for it to pay off because of cache locality, etc.
chatmasta 2022-07-22 06:24 UTC link
I’ve always found Morton encoding to be a cool approach. There’s an old but good blog post about this: https://www.forceflow.be/2013/10/07/morton-encodingdecoding-...

This looks like a more modern implementation: https://crates.io/crates/lindel

sfvisser 2022-07-22 06:25 UTC link
Another interesting approach to copy-on-write for immutable collections (for example arrays) is where you actively mutate the array in place, but leave the original as a lazy description/view of how to get back to the before state.

From the outside the effect is the same, but the performance is optimized for accessing the updated collection and only possibly using the old value.

Great for cases where you want the immutability guarantee, but where it might be unlikely the old value is actually going to be used in the general case.

zbobet2012 2022-07-22 06:25 UTC link
Y-fast tries are some of my favorites. I think they are heavily under utilized in modern terms. They sat by the the wayside for a long term because datasets where relatively small, at the time they where created ram didn't exist, and bitwise operations where inefficient along with many other constant factors.

Today; however, a lot of people have datasets on the order of 2^16 or 2^32 keys they need to maintain. And efficient bitwise operations (on upto 512 bits) are the norm. Y-fast tries are faster than b-trees or other datastructures. Also because they divide _space_ not the _data_ they are very amenable to multi-threaded and distributed algorithms. For example the hash-tables in a y-fast trie can actually be a rendezvous hash pointing to a database node. Once in that node you can hash on cores again to get to a local process for example.

zimpenfish 2022-07-22 06:31 UTC link
Oh, huh, this is serendipitously handy because I need to be doing something like that this weekend (translating from Go to Rust - I already have a reparenting solution using hashes but hopefully this is easier / better.)
kragen 2022-07-22 06:42 UTC link
My writeup of union find is in https://dercuano.github.io/notes/incremental-union-find.html. I did not in fact find a way to make it efficiently support incremental edge deletion, which is what I was looking for.
tresil 2022-07-22 06:43 UTC link
Big fan of HLL

Apache foundation has a fantastic DataSketches library that includes HLL and many other powerful data analytics algorithms: https://datasketches.apache.org/

Lee Rhodes has done an excellent introduction to this library - explaining some of the use cases, advantages, and things to be aware of when using these techniques: https://www.youtube.com/watch?v=nO9pauS-mGQ

chrismarlow9 2022-07-22 06:48 UTC link
Even though a trie is pretty standard and expected (to be known) these days, I will vote for this. It was my first deep dive into more exotic data structures after an interview question about implementing T9 that stumped me many years ago.

https://github.com/Azeem112/tries-T9-Prediction

btschaegg 2022-07-22 07:28 UTC link
Indeed, but it took a while for me to appreciate Tries.

Originally, i thought "nice idea", but I rarely ever encountered a problem where I really needed a prefix tree.

But while reading into HAMT and Clojure's datastructure implementations, it dawned on me that prefix trees aren't only useful for strings (or arbitrary byte sequences): Hashes in a dictionary, for example, are sequences of bits, too. And the design of a HAMT is exactly such that it doesn't concern itself with bytes but any blocks of N bits (determined by the branching factor; i.e. a factor of 32 implies 6-bit-blocks). And if you know the length of each sequence (hash), you know the maximum Trie depth, which also works for you, not against you.

That was rather eye opening for me at the time.

Cthulhu_ 2022-07-22 07:37 UTC link
I didn't know this was a formal design pattern; I do recall having implemented this myself once, as a means to avoid double requests from different components.

Later on I switched to using react-query which has something like this built-in, it's been really good.

miclill 2022-07-22 07:45 UTC link
also Ctrie: "a concurrent thread-safe lock-free implementation of a hash array mapped trie": https://en.wikipedia.org/wiki/Ctrie

So basically like HAMT but lock free.

jtwigg 2022-07-22 08:02 UTC link
FYI, if you plan to do this in ASP netcore, combine with AsyncLazy for the most optimal results https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/b...
Editorial Channel
What the content says
+0.10
Article 12 Privacy
Low Framing
Editorial
+0.10
SETL
ND

Mentions IP blacklisting as a use case, which relates to privacy and surveillance concerns.

+0.10
Article 19 Freedom of Expression
Low Practice
Editorial
+0.10
SETL
0.00

Content is posted on a discussion forum, enabling expression of technical ideas.

+0.10
Article 27 Cultural Participation
Low Practice
Editorial
+0.10
SETL
ND

Content shares technical knowledge and invites community participation.

ND
Preamble Preamble

No content addressing human dignity, rights, or foundational principles.

ND
Article 1 Freedom, Equality, Brotherhood

No content addressing equality, dignity, or conscience.

ND
Article 2 Non-Discrimination

No content addressing non-discrimination or equal rights.

ND
Article 3 Life, Liberty, Security

No content addressing life, liberty, or security.

ND
Article 4 No Slavery

No content addressing slavery or servitude.

ND
Article 5 No Torture

No content addressing torture or cruel treatment.

ND
Article 6 Legal Personhood

No content addressing legal personality or recognition.

ND
Article 7 Equality Before Law

No content addressing equality before the law.

ND
Article 8 Right to Remedy

No content addressing effective remedies.

ND
Article 9 No Arbitrary Detention

No content addressing arbitrary detention.

ND
Article 10 Fair Hearing

No content addressing fair public hearings.

ND
Article 11 Presumption of Innocence

No content addressing presumption of innocence.

ND
Article 13 Freedom of Movement

No content addressing freedom of movement.

ND
Article 14 Asylum

No content addressing asylum rights.

ND
Article 15 Nationality

No content addressing nationality rights.

ND
Article 16 Marriage & Family

No content addressing marriage or family rights.

ND
Article 17 Property

No content addressing property rights.

ND
Article 18 Freedom of Thought

No content addressing freedom of thought or religion.

ND
Article 20 Assembly & Association

No content addressing assembly or association rights.

ND
Article 21 Political Participation

No content addressing political participation.

ND
Article 22 Social Security

No content addressing social security.

ND
Article 23 Work & Equal Pay

No content addressing work or union rights.

ND
Article 24 Rest & Leisure

No content addressing rest or leisure.

ND
Article 25 Standard of Living

No content addressing standard of living.

ND
Article 26 Education

No content addressing education rights.

ND
Article 28 Social & International Order

No content addressing social order for rights realization.

ND
Article 29 Duties to Community

No content addressing duties to community.

ND
Article 30 No Destruction of Rights

No content addressing rights destruction.

Structural Channel
What the site does
+0.10
Article 19 Freedom of Expression
Low Practice
Structural
+0.10
Context Modifier
0.00
SETL
0.00

Platform allows user-generated content and discussion.

ND
Preamble Preamble

No structural features related to human dignity or rights recognition.

ND
Article 1 Freedom, Equality, Brotherhood

No structural features related to equality or dignity.

ND
Article 2 Non-Discrimination

No structural features addressing discrimination or equal access.

ND
Article 3 Life, Liberty, Security

No structural features protecting life, liberty, or security.

ND
Article 4 No Slavery

No structural features related to preventing slavery or servitude.

ND
Article 5 No Torture

No structural features preventing torture or cruel treatment.

ND
Article 6 Legal Personhood

No structural features related to legal recognition.

ND
Article 7 Equality Before Law

No structural features promoting legal equality.

ND
Article 8 Right to Remedy

No structural features providing remedies for rights violations.

ND
Article 9 No Arbitrary Detention

No structural features preventing arbitrary detention.

ND
Article 10 Fair Hearing

No structural features related to fair hearings.

ND
Article 11 Presumption of Innocence

No structural features supporting presumption of innocence.

ND
Article 12 Privacy
Low Framing

No structural privacy protections visible on page.

ND
Article 13 Freedom of Movement

No structural features related to movement freedom.

ND
Article 14 Asylum

No structural features related to asylum.

ND
Article 15 Nationality

No structural features related to nationality.

ND
Article 16 Marriage & Family

No structural features supporting family rights.

ND
Article 17 Property

No structural features protecting property rights.

ND
Article 18 Freedom of Thought

No structural features supporting freedom of thought.

ND
Article 20 Assembly & Association

No structural features specifically supporting assembly.

ND
Article 21 Political Participation

No structural features supporting political participation.

ND
Article 22 Social Security

No structural features related to social security.

ND
Article 23 Work & Equal Pay

No structural features supporting work rights.

ND
Article 24 Rest & Leisure

No structural features supporting rest or leisure.

ND
Article 25 Standard of Living

No structural features related to standard of living.

ND
Article 26 Education

No structural features specifically supporting education.

ND
Article 27 Cultural Participation
Low Practice

No structural features specifically supporting cultural participation.

ND
Article 28 Social & International Order

No structural features promoting social order for rights.

ND
Article 29 Duties to Community

No structural features emphasizing community duties.

ND
Article 30 No Destruction of Rights

No structural features preventing rights destruction.

Supplementary Signals
How this content communicates, beyond directional lean. Learn more
Epistemic Quality
How well-sourced and evidence-based is this content?
0.79 low claims
Sources
0.9
Evidence
0.7
Uncertainty
0.5
Purpose
1.0
Propaganda Flags
No manipulative rhetoric detected
0 techniques detected
Emotional Tone
Emotional character: positive/negative, intensity, authority
measured
Valence
+0.2
Arousal
0.2
Dominance
0.2
Transparency
Does the content identify its author and disclose interests?
0.00
✗ Author
More signals: context, framing & audience
Solution Orientation
Does this content offer solutions or only describe problems?
0.64 solution oriented
Reader Agency
0.4
Stakeholder Voice
Whose perspectives are represented in this content?
0.30 1 perspective
Speaks: individuals
Temporal Framing
Is this content looking backward, at the present, or forward?
present unspecified
Geographic Scope
What geographic area does this content cover?
unspecified
Complexity
How accessible is this content to a general audience?
technical high jargon domain specific
Longitudinal · 35 evals
+1 0 −1 HN
Audit Trail 55 entries
2026-03-01 19:51 eval_success Evaluated: Mild positive (0.10) - -
2026-03-01 19:51 eval Evaluated by deepseek-v3.2: +0.10 (Mild positive) 8,761 tokens -0.20
2026-03-01 18:37 eval_success Evaluated: Moderate positive (0.30) - -
2026-03-01 18:37 eval Evaluated by deepseek-v3.2: +0.30 (Moderate positive) 8,078 tokens +0.19
2026-03-01 16:04 eval_success Evaluated: Mild positive (0.11) - -
2026-03-01 16:04 eval Evaluated by deepseek-v3.2: +0.11 (Mild positive) 7,910 tokens -0.30
2026-03-01 04:16 eval_success Evaluated: Moderate positive (0.41) - -
2026-03-01 04:16 model_divergence Cross-model spread 0.41 exceeds threshold (3 models) - -
2026-03-01 04:16 eval Evaluated by deepseek-v3.2: +0.41 (Moderate positive) 8,437 tokens -0.15
2026-03-01 04:16 rater_validation_warn Validation warnings for model deepseek-v3.2: 0W 3R - -
2026-02-28 15:32 model_divergence Cross-model spread 0.55 exceeds threshold (4 models) - -
2026-02-28 15:32 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 15:32 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 14:00 model_divergence Cross-model spread 0.55 exceeds threshold (4 models) - -
2026-02-28 14:00 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 14:00 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 12:27 eval_success Evaluated: Moderate positive (0.55) - -
2026-02-28 12:27 model_divergence Cross-model spread 0.55 exceeds threshold (4 models) - -
2026-02-28 12:27 eval Evaluated by deepseek-v3.2: +0.55 (Moderate positive) 9,396 tokens
2026-02-28 12:27 rater_validation_warn Validation warnings for model deepseek-v3.2: 1W 12R - -
2026-02-28 11:53 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 11:53 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 11:53 rater_validation_warn Lite validation warnings for model llama-3.3-70b-wai: 0W 1R - -
2026-02-28 11:48 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 11:48 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 11:48 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 0W 1R - -
2026-02-28 11:05 eval Evaluated by claude-haiku-4-5-20251001: 0.00 (Neutral)
2026-02-28 11:04 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 11:04 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 11:04 rater_validation_warn Lite validation warnings for model llama-3.3-70b-wai: 0W 1R - -
2026-02-28 10:13 eval_success Lite evaluated: Neutral (0.00) - -
2026-02-28 10:13 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 09:55 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 09:33 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 08:05 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 07:56 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 07:13 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 06:59 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 06:25 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 05:41 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 05:22 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 05:17 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 05:09 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 04:46 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 04:38 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 04:33 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 04:08 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 03:54 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 03:51 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 03:15 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 02:44 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion, no rights stance
2026-02-28 02:39 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 01:52 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
reasoning
Tech discussion, no rights stance
2026-02-28 01:47 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Tech discussion on data structures, no human rights relevance
2026-02-28 01:30 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
reasoning
Tech discussion on data structures, no human rights relevance