324 points by levkk 6 days ago | 64 comments on HN
Happy pgdog user here. I can recommend it from a user perspective as a connection pooler to anyone checking this out. (We're also running tests on sharding and are positive about it, but haven't run it in prod yet, so I can't 100% vouch for that part. That's where we're headed, though.)
@Lev, how is the 2pc coming along? I think it was pretty new when I last checked, and I haven't looked into it much since then. Is it feeling pretty solid now?
> If you build apps with a lot of traffic, you know the first thing to break is the database.
Just out of curiosity, what kinds of high-traffic apps have been most interested in using PgDog? I see you guys have Coinbase and Ramp logos on your homepage -- seems like fintech is a fit?
Congrats, guys! Curious how reliable the read/write splitting is in practice, given replication lag. Do you need to run the underlying cluster with synchronous replication?
Great progress, guys! It's impressive to see all the enhancements: more types, more aggregate functions, cross-node DML, resharding, reliability-focused connection pooling, and more. Very cool! These were really hard problems that took multiple years to build at Citus. Kudos to the shipping velocity.
Some HTTP proxies can do retries -- if a connection to one backend fails, it is retried on a different backend. Can PgDog (or PgBouncer, or any other tool) do something similar -- if there's a "database server shutting down" error or a connection reset, retry it on another backend?
I see the word 'replication' mentioned quite a few times. Is this managed by pgdog? Would I be able to replace other logical replication setups with pgdog to create a High Availability cluster?
(apologies for new account - NDA applies to the specifics)
Nice surprise to see this here today. I was working on a deployment just last week.
Unfortunately for me, I found that it crashed when doing a very specific bulk load (COPY FORMAT BINARY with array columns inside a transaction). The process loads around 200MB of array columns (in the region of 10K rows) into a variety of tables. Very early in the COPY process, PgDog crashes with:
"pgdog router error: failed to fill whole buffer"
So it appears something is not quite right for my specific use case (COPY with array columns). I'm not familiar enough with Rust, but the "failed to fill whole buffer" error seemed to come from Rust itself (rather than PgDog) based on what little I could find with searches.
I was very disappointed, as it looked much simpler to get set up and running than PgPool-II (which I have had to revert to as my backup plan; I'm finding it more difficult to configure, but it does cope with the COPY command without issues).
How do you know when/if it's justified to add additional complexity like PgDog?
Is there a number of simultaneous connection / req per sec that's a good threshold?
Is it easy to get the number of simultaneous connections on my Postgres instance (for instance, if I simulate traffic) to know if I would gain anything from a connection pooler?
1) Is it possible to start off with plain Postgres and add pgdog without scheduled downtime down the road when scaling via sharding becomes necessary?
2) How are schema updates handled when using physical multi-tenancy? Does pgdog just loop over all the databases it knows about and issue the schema update command to each?
Congrats on the progress!
What is the behavior of PgDog if it receives some sort of query it can't currently handle properly? Is there a linter/static analysis tool I can use to evaluate whether my query will work?
Can you elaborate a bit more on the challenges faced in making Postgres shard-able?
I remember that adding sharding to Postgres natively was an uphill battle. There were a few companies who had proprietary solutions for it. What you've been able to achieve is nothing less than a miracle.
How would this product compare to a PostgREST-based approach (this is the cool tech behind the original Supabase) with load balancing at the HTTP level?
As someone who has worked on many-TB-sized "custom" sharded systems with 30-150 shards at multiple (ok, 2) employers, a key challenge to the overall sharding landscape is unsharding all the data back at the analytics layer.
At a minimum, this often involved adding a shard key back to the physical data, or partitioning and/or physically sorting the data in the "OLAP" layer. And a surprising number of CDC and ETL toolkits don't make it easy to parameterize a single code/configuration base, nor do they handle situations like shards being down at different times for maintenance, fetching data from each shard at a time of day dictated by its end-of-day, retransmissions, reconciliation, gaps, or the data quality of a single shard once back in an unsharded landscape. SQL UNION ALL to reunite shards works, until it doesn't.
YMMV but would be curious if you have a story/solution/thoughts along these lines. It's easier if you shard with unified analytics/reporting in mind on day one of a sharded system design, but in the worlds I've lived in, nobody ever does. But maybe you could.
Really exciting to see the progress on this project! I'm not sure I understand the update "we are in production." Is this referencing a particular release or a more general statement about adoption?
It feels better now, but we still need to add crash protection - in case PgDog itself crashes, we need to restore in-progress 2pc transaction records from a durable medium. We will add this very soon.
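(For reference, the stock Postgres primitives underneath 2pc look like this; the table and transaction identifier are illustrative, and max_prepared_transactions must be set above zero on each shard:)

    -- Phase 1: each shard persists the transaction without committing it.
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    PREPARE TRANSACTION 'pgdog_txn_42';  -- survives a server restart

    -- Phase 2: the coordinator finalizes it on every shard.
    COMMIT PREPARED 'pgdog_txn_42';      -- or ROLLBACK PREPARED

    -- After a crash, in-doubt transactions are listed here:
    SELECT gid, prepared, database FROM pg_prepared_xacts;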
We have all kinds; it's not specific to any particular sector. That's kind of the beauty of building for Postgres: everyone uses it in some capacity!
My general advice is, once you see more than 100 connections on your database, you should consider adding a connection pooler. If your primary load exceeds 30% (CPU util), consider adding read replicas. This also applies if you want some kind of workload isolation between databases, e.g. slow/expensive analytics queries can be pushed to a replica. Vertically scaling primaries is also a fine choice, just keep that vertical limit in mind.
Once you're a couple instance types away from the largest machine your cloud provider has, start thinking about sharding.
Not really, replication lag is generally an accepted trade-off. Sync replication is rarely worth it, since you take a 30% performance hit on commits and add more single points of failure.
We will add some replication lag-based routing soon. It will prioritize replicas with the lowest lag to maximize the chance of the query succeeding and remove replicas from the load balancer entirely if they have fallen far behind. Incidentally, removing query load helps them catch up, so this could be used as a "self-healing" mechanism.
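(For reference, stock Postgres already exposes the lag signal this kind of routing would act on:)

    -- On the primary: per-replica replication lag (Postgres 10+):
    SELECT application_name, replay_lag FROM pg_stat_replication;

    -- On a replica: approximate staleness of the last replayed transaction:
    SELECT now() - pg_last_xact_replay_timestamp() AS replica_lag;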
It shards it as well. We handle schema sync, moving table data (in parallel), setting up logical replication, and application traffic cutover. The zero-downtime resharding is currently WIP, working on the PR as we speak: https://github.com/pgdogdev/pgdog/pull/784.
Not currently, but we can add this. One thing we have to be careful of is not to retry requests that are executing inside transactions, but otherwise this would be a great feature.
I'll need a bit more info about your use case to answer. We use logical replication to move data between shards, with the intention of creating new shards.
This is managed by PgDog. We are building a lot of tooling here, and a lot of it is configurable and can be used separately. For example, we have a CLI and admin database commands to set up replication streams between databases, irrespective of their sharded status, so it can be used for other purposes as well, like moving tables or entire databases to new hardware. If you keep the stream(s) running, you can effectively maintain up-to-date logical replicas.
We don't currently manage DDL replication (CREATE/ALTER/DROP) for logically replicated databases - this is a known limitation that we will address shortly. After all, we don't want users to pause schema migrations during resharding. I think once that piece is in, you'll be able to run pretty much any kind of long-lived logical replicas for any purpose, including HA.
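(For the curious, streams like these are driven by stock logical replication primitives, roughly as follows; the publication, subscription, and connection string are illustrative:)

    -- On the source database: publish the tables to move.
    CREATE PUBLICATION move_tables FOR TABLE users, orders;

    -- On the destination: subscribe to start the stream.
    CREATE SUBSCRIPTION move_tables_sub
        CONNECTION 'host=old-primary dbname=app'
        PUBLICATION move_tables;

    -- Leave the subscription running to maintain an up-to-date logical replica.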
Might be worth another try. If not, a GitHub issue with more specifics would be great, and we'll take a look. Also, if binary encoding isn't working out, try using text - it's more compatible between Postgres versions:
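(A sketch; the table and columns are illustrative.)

    -- Binary COPY (the failing case): compact, but a stricter wire format
    -- that's sensitive to exact type layouts.
    COPY items (id, tags) FROM STDIN WITH (FORMAT binary);

    -- Text COPY: the more portable variant across Postgres versions.
    COPY items (id, tags) FROM STDIN WITH (FORMAT text);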
The way we solved it is by checking the LSN on the primary and then waiting for the replica to catch up to that LSN before doing reads on the replica, in various scenarios.
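A minimal sketch of that check using stock Postgres functions (the captured LSN value is a placeholder):

    -- On the primary, right after the write:
    SELECT pg_current_wal_lsn();  -- e.g. returns '0/3000060'

    -- On the replica, poll until replay has reached that position:
    SELECT pg_last_wal_replay_lsn() >= '0/3000060'::pg_lsn AS caught_up;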
1. Yup, we support online resharding, so you don't need to deploy this until you have to.
2. That's right, we broadcast the DDL to all shards in the configuration. If two-phase commit [1] is enabled, you have a strong guarantee that this operation will be atomic. The broadcast is done in parallel, so this is fast.
PostgREST is a translation layer: you use HTTP methods, inputs, and outputs to interact with Postgres, the database. It's a replacement for SQL, the language, that happens to also have a load balancer.
Their load balancer is still at the Postgres layer though. You can think of it as just an application that happens to speak a specific API. Load balancing applications is a solved problem.
1. People don't design schemas to be sharded, although many gravitate towards a common key, e.g. user_id, country_id, tenant_id, or customer_id. Once that happens, sharding becomes easier.
2. Postgres provides a lot of guarantees that are tricky to maintain when sharded: atomic changes, referential integrity, check constraints, unique indexes (and constraints), to name a few. Those have to be built separately by a sharding layer (like PgDog) and come with trade-offs, usually around performance. It's a lot more expensive to check a globally enforced constraint than a local one (network hops aren't free); see the sketch after this list.
3. Online migrations from unsharded to sharded can be tricky: you have to redistribute terabytes of data while the DB continues to serve writes. You can't lose a single row - Postgres is used as a store of record and this can be a serious issue with business impact.
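To make the constraint point concrete, here's a minimal sketch (table and columns are illustrative): a unique constraint that includes the sharding key can be enforced locally on each shard, while one that omits it would need a round trip to every shard on each write.

    -- Enforceable locally: all rows with the same tenant_id live on one shard.
    CREATE UNIQUE INDEX ON invoices (tenant_id, invoice_number);

    -- Not enforceable locally: would require checking every shard on insert.
    -- CREATE UNIQUE INDEX ON invoices (invoice_number);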
We're taking increasingly bigger bites at this apple. We started with basic query routing and are now doing query rewrites as well. We didn't handle data movements previously and now have almost fully automatic resharding. It takes time, elbow grease and most importantly, willing and courageous early adopters to whom we owe a huge debt of gratitude.
The current behavior unfortunately is to just let it through and return an incorrect result. We are adding more checks here and rely heavily on early adopters to have a decent test suite before launching their apps to prod.
That being said, we do have this [1]:
    [general]
    expanded_explain = true
This will modify the output of EXPLAIN queries to return routing decisions made by PgDog. If you see that your query is "direct-to-shard", i.e. goes to only one shard, you can be certain that it'll work as expected. These queries will talk to only one database and don't require us to manipulate the result or assemble results from multiple shards.
For cross-shard queries, you'll need your own integration tests, for now. We'll add checks here shortly. We have a decent CI suite as well, but it doesn't cover everything. Every time we look at that part of the code, we just end up adding more features, like the recent support for LIMIT x OFFSET y (PgDog rewrites it to LIMIT x + y and applies the offset calculation in memory).
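To illustrate that rewrite (the table is hypothetical):

    -- What the application sends:
    SELECT * FROM events ORDER BY created_at DESC LIMIT 10 OFFSET 20;

    -- What each shard receives after the rewrite (LIMIT x + y, OFFSET dropped):
    SELECT * FROM events ORDER BY created_at DESC LIMIT 30;

    -- PgDog then merges and re-sorts the per-shard rows and skips the
    -- first 20 in memory before returning 10 to the client.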
I would say, over 100 Postgres connections, consider getting a connection pooler. Requests per second is highly variable. Postgres can serve a lot of them, as long as you keep the number of server connections low - that's what the pooler is for.
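To the "is it easy to measure" part of the question: yes, a plain catalog query does it, no extensions needed.

    -- Current client connections, broken down by state:
    SELECT state, count(*)
    FROM pg_stat_activity
    WHERE backend_type = 'client backend'
    GROUP BY state;

    -- The configured ceiling, for comparison:
    SHOW max_connections;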
You can use pgbench to benchmark this locally pretty easily. The TPS curve will be interesting: at first, the connection pooler will cause a decrease, but as you add more and more clients (the -c parameter), you should see increasing benefits.
Ultimately, you add connection poolers when you don't have any other option: you have hundreds of app containers with dozens of connections each and Postgres can't handle it anymore, so it's a necessity really.
Load balancing becomes useful when you start adding read replicas. Sharding is necessary when you're approaching the vertical limit of your cloud provider (on the biggest instance or close).
1. Replicate shards into one beefy database and use that (see the sketch after this list). Replication is cheaper than individual statements, so this can work for a while. The sink can be Postgres or another database like ClickHouse. At Instacart, we used Snowflake, with an in-house CDC pipeline. It worked well, but Snowflake was only usable for offline analytics, like BI / batch ML, and quite expensive. We'll add support for this eventually; we're getting pretty good at managing logical replication, including DDL changes.
2. Use the shards themselves and build a decent query engine on top. This is the Citus way and we know it's possible. Some queries could be expensive, but that's expected and can be solved with more compute.
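A minimal sketch of option 1 using stock logical replication, with one subscription per shard feeding a single sink (names and connection strings are illustrative, and each shard is assumed to have a publication called analytics):

    -- On the analytics sink, one subscription per shard:
    CREATE SUBSCRIPTION shard_0_sink
        CONNECTION 'host=shard-0 dbname=app'
        PUBLICATION analytics;

    CREATE SUBSCRIPTION shard_1_sink
        CONNECTION 'host=shard-1 dbname=app'
        PUBLICATION analytics;

    -- Rows from all shards land in the same tables; because shards are
    -- disjoint by sharding key, the streams don't conflict.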
In our architecture, shards going down for maintenance is an incident-level event, so we expect those to be up at all times, and failover to a standby if there is an issue. These days, most maintenance tasks can be done online in-place, or with blue/green, which we'll support as well. Zero downtime is the name of the game.
Technically yes. We only support BIGINT (and all other integers), VARCHAR and UUID for sharding keys, but we'll happily pass through any other data. If we need to process it, we'll need to parse it. To be clear: you can include PostGIS data in all queries, as long as we don't need it for sharding.
It's not too difficult to add sharding for it if we wanted to. For example, we added support for pgvector a while back (L2/IVFFlat-based sharding), so we can add any other data type, e.g., POLYGON for sharding on ST_Intersects, or for aggregates.
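In other words, routing stays on a scalar key while the geometry passes through untouched; a sketch with a hypothetical schema:

    -- Routed by tenant_id (a supported sharding key); the PostGIS value
    -- is simply forwarded to the right shard:
    INSERT INTO places (tenant_id, name, geom)
    VALUES (42, 'HQ', ST_GeomFromText('POINT(-71.06 42.35)', 4326));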
General statement about adoption. Last time we made a Show HN (9 months ago), it was a POC, running on my local. Now we're used in production by some pretty big companies, which is exciting!
You can, I believe. We only support BIGINT, VARCHAR, and UUID for sharding, but all other data types are completely fine for passthrough, i.e., to be included and used in your queries.