707 points by talhof8 396 days ago | 474 comments on HN
Moderate positive
Contested
Editorial · v3.7 · 2026-02-28 13:52:12
Summary · Data Privacy & Security Advocates
Wiz Research documents a critical database exposure at DeepSeek containing over 1 million log entries with chat history, API secrets, and sensitive user data, demonstrating how rapid AI adoption has outpaced security practices. The article strongly advocates for industry-wide security standards comparable to public cloud providers and emphasizes data protection as a fundamental right, positioning infrastructure security as essential to digital rights. The research supports responsible disclosure principles and frames the vulnerability as a systemic human rights issue requiring immediate industry governance changes.
And that's why you run models locally. Or if you want a remote chat model, use something stateless like AWS Bedrock custom model import to avoid having stored chats on the server.
> More critically, the exposure allowed for full database control and potential privilege escalation within the DeepSeek environment, without any authentication or defense mechanism to the outside world.
Not only that: this was a "production-grade" database with millions of users, the app was #1 on the App Store, and ALL text sent there in the prompts was logged in plain text?
It seems fair since all the other AI's scraped copyrighted information, images, video online and from pirated books, etc. without ever asking anyone first.
- Dev infra, observability database (open telemetry spans)
- Logs of course contain chat data, because that's what inevitably happens with logging
The startling rocket-building prompt screenshot that was shared is meant to be shocking, of course, but it was most probably training data used to keep DeepSeek from completing such prompts, as evidenced by the `"finish_reason":"stop"` included in the span attributes.
Still pretty bad obviously and could have easily led to further compromise but I'm guessing Wiz wanted to ride the current media wave with this post instead of seeing how far they could take it. Glad to see it was disclosed and patched quickly.
This is probably an incredibly stupid, off-topic question, but why are their database schemas and logs in English?
Like, when a DeepSeek dev uses these systems as intended, would they also be seeing the columns, keys, etc. in English? Is there usually a translation step involved? Or do devs around the world just have to bite the bullet and learn enough English to be able to use the majority of tools?
I'm realizing now that I'm very ignorant when it comes to non English-based software engineering.
Does DeepSeek have a bug bounty program I'm not aware of with a clearly defined scope? It appears that Wiz took it upon themselves to probe and access DeepSeek's systems without permission and then write about it.
If you do this and the company you're conducting your "research" on hasn't given you permission in some form, you can get yourself in a lot of hot water under the CFAA in the USA and other laws around the world.
Please don't follow this example. Sign up for a bug bounty program or work directly with a company to get permission before you probe and access their systems, and don't exceed the access granted.
The amount of vitriol in these comments is the really surprising data. I've seen the same on Twitter. I can only put it down to the financial pain DeepSeek inflicted on many US retail investors by wiping almost $700 billion off NVidia's stock price. I think a lot of folks didn't see it coming and it hurt them right where it matters most: In the wallet. The anger out there is very real.
Thank you everyone. This was responsibly disclosed to DeepSeek and published after the issue was remediated; we got an acknowledgment of our contribution from their team today.
The second Big Tech was threatened by significant competition (DeepSeek), suddenly that competition is "stealing" (lol) and is under heavy hacking attacks (on its main online inference portal).
There you have it: the real face of Big Tech. Extinguishing the competition by locking a service behind a portal provided for free, then starting to milk the users, is not enough for them... they will also fight dirty, really dirty.
I agree this is really bad but far from unbelievable. I am only 23 and already my SSN and even my freaking DNA have both been leaked by major publicly traded companies.
Did they ever make promises as to confidentiality? What if sharing all chat logs with users is just part of their open source / shānzhài attitude? :)
DeepSeek isn’t a side project or just a bunch of quants; those claims are part of the marketing that people keep repeating blindly for some reason. Building DeepSeek probably requires at least a $1B+ budget. Between their alleged 50,000 H100 GPUs, expensive (and talented) staff, and the sheer cost of iterating across numerous training runs, it all adds up to far, far more than their highly dubious claim of $5.5M. Anyone spending that amount of money isn’t just doing a side project.
The client facing aspect isn’t the problem here. This linked article is talking about the backend having vulnerabilities, not the client facing application. It’s about a database that is accessible from the internet, with no authentication, with unencrypted data sitting in it. High Flyer, the parent company of Deep Seek, already has a lot of backend experience, since that is a core part of the technologies they’ve built to operate the fund. If you’re a quantitative hedge fund, you aren’t just going to be lazy about your backend systems and data security. They have a lot of experience and capability to manage those backend systems just fine.
I’m not saying other companies are perfect either. There’s a long list of American companies that violate user privacy, or have bad security that then gets exploited by (often Chinese or Russian) hackers. But encrypting data in a database seems really basic, and requiring authentication on a database also seems really basic. It would be one thing if exposure of sensitive info required some complicated approach. But this degree of failure raises lots of questions whether such companies can ever be trusted.
Not many non-gamers have hardware capable of running such a model locally - never mind the skills.
For most people, bash is not a tool for interacting with the computer, it is how they express their frustration with the computer (sometimes leaving damaged keyboards).
Someone who worked on a non-English environment years ago here: sometimes you do use the local language in some contexts, but, more often than not, you end up using English for the majority of stuff since it's a bit off-putting to mix another language with the English of programming languages and APIs.
> Or do devs around the world just have to bite the bullet and learn enough English to be able to use the majority of tools?
That is precisely what happens. It is not unusual for code and databases to be written in English, even when the developers are from a non-English speaking country. Think about it: the toolchain, programming language and libraries are all based on English anyway.
> Or do devs around the world just have to bite the bullet and learn enough English to be able to use the majority of tools?
I'm a native English speaker, but from looking at various code bases written by people who aren't, I gather that it's basically this. It wasn't too long ago that one couldn't even reliably feed non-ASCII comments to a lot of compilers, let alone variable and function names.
Almost all software engineers learn a passing amount of English - truly localized programming environments are quite esoteric and not really available for most mainstream use cases I can think of.
Depending on the company culture and policy, the most common thing to see is a mix of English variable and function names with native-language comments. Occasionally you will see native-language variable and function names. This is much more common in Latin character set languages (especially among Spanish and Portuguese speakers) in my experience; almost all Chinese code seems to use approximately-English variable and function names.
That's pretty much the same mistake as in VW's recent "We know where you parked" hack. [0] So while I don't really want to say anything nice about VW, the mistake is not something that only happens to side projects.
> but most probably was training data to prevent deepseek from completing such prompts, evidenced by the `"finish_reason":"stop"` included in the span attributes
As I understand it, the finish reason being “stop” in API responses usually means the AI ended its output normally. In any case, I don't see how training data could end up in production logs, nor why they'd want to prevent such a prompt (one you'd expect a normal user to write) from being responded to.
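For context, here is a minimal sketch of how `finish_reason` reads in an OpenAI-style chat completion payload (the `response` dict below is fabricated for illustration; only the field names mirror the real API shape):

```python
# In OpenAI-compatible APIs, each choice in a chat completion carries a
# finish_reason: "stop" means the model ended generation naturally,
# "length" means it hit the max-token limit, "content_filter" means the
# output was cut off by moderation.
def ended_naturally(choice: dict) -> bool:
    return choice.get("finish_reason") == "stop"

# Fabricated payload in the OpenAI-compatible response shape:
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "..."},
         "finish_reason": "stop"},
    ]
}

print(ended_naturally(response["choices"][0]))  # True
```

So a logged span carrying `"finish_reason":"stop"` is consistent with an ordinary, completed response, not with a refused or filtered one.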
> [...] I'm guessing Wiz wanted to ride the current media wave with this post instead of seeing how far they could take it.
Security researchers are often asked to not pursue findings further than confirming their existence. It can be unhelpful or mess things up accidentally. Since these researchers probably weren't invited to deeply test their systems, I think it's the polite way to go about it.
This mistake was totally amateur hour by DeepSeek, though. I'm not too into security stuff but if I were looking for something, the first thing I'd think to do is nmap the servers and see what's up with any interesting open ports. Wouldn't be surprised at all if others had found this too.
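The "nmap the servers" step above boils down to TCP connect attempts against candidate ports. A minimal Python sketch of that kind of probe (the host in the usage comment is a placeholder; 8123 and 9000 are ClickHouse's HTTP and native-protocol ports, the ones at issue in the write-up):

```python
import socket

# The core of a port scan is just a TCP connect() attempt per candidate
# port: success means something is listening, an OSError (refused,
# timed out, unreachable) means it is not reachable from here.
def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g.: [p for p in (80, 443, 8123, 9000) if port_open("some-host.example", p)]
```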
It might seem less credible to encounter English in a place where it’s less expected, but think of it this way: would a Yandex-developed ClickHouse database be adopted by Chinese devs if everything in it were written in Russian?
There is some merit in asking your question, for there’s an unspoken rule (and a source of endless frustration) that business-/domain-related terms should remain in the language of their origin. Otherwise, (real-life story) "Leitungsauskunft" could end up being translated as "line information" or even "channel interface" ("pipeline inquiry" should be correct, it's a type of document you can procure from the [German] government).
Ironically, I’m currently working in an environment where we decided to translate such terms, and it hasn’t helped with understanding of the business logic at all. Furthermore, it adds an element of surprise and a topic for debate whenever somebody comes up with a "more accurate translation".
So if anything, English is a sign of a battle-hardened developer, until they try to convert proper names.
They left open a publicly exposed database... I'm sure they informed the company about this before publishing their post. Why are you blaming Wiz for this?
'DeepSeek is the side project of a bunch of quants'
I very much doubt that it was only that, and not massively backed by the Chinese state in general.
As with OpenAI, much of this has to do with hype based speculation.
In the case of OpenAI they played with the speculations, that they might have AGI locked up in their labs already and fueled those speculations.
The result, massive investment (now in danger).
And China and the US play a game of global hegemony. I just read articles with the essence of, see China is so great, that a small sideproject there can take down the leading players from the west! Come join them.
It is mere propaganda to me.
Now, DeepSeek in the open is a good thing, but I believe the Chinese state is backing it massively, to help with that success and to help shake off Western dominance. I would also assume the Chinese intelligence services helped directly, with intel straight out of the labs of OpenAI and co.
This is about real power.
Many states are about to decide which side they should take, if they have to choose between West and East. Stuff like this heavily influences those decisions.
It's also deeply damaging to the western ego, especially one rooted in American exceptionalism.
But it's also one that those of us actually working on foundational AI saw coming a mile away, since most of the top research of late has been happening in Chinese labs, not American or European ones.
Can't wait to see what this boneheaded President's tariff on TSMC does to this situation.
I'm sure some people did actually get hurt by NVIDIA's stock dropping, but it's also important to keep the size of the effect in perspective: NVIDIA's stock is back to where it was in September of last year, and still up almost 1900% from 5 years ago and up 103% from a year ago.
NVIDIA's stock has been super bubbly—all DeepSeek did was set off itchy investor trigger fingers that were already worried about its highly inflated price.
It doesn't even need to be a side project, or by a bunch of quants. A bunch of AI researchers working on this as their primary job would still have no real idea what it takes to secure a large-scale, world-usable internet service.
In my opinion, it is not vitriol as much as unfiltered recognition of the significant issues and risks that have become a part of DeepSeek’s story: the Chinese government injecting propaganda into LLMs, the threat of apps from adversaries in US app stores (like TikTok and DeepSeek), the disregard for user privacy (their database was open to the Internet with no authentication and no encryption of data), the misleading claim of quoting the cost of a single final run (which amounts to market manipulation of nvidia stock), the theft of OpenAI’s assets that they’ve not admitted to, the likely evasion of sanctions, and so on.
Every intelligent colleague is an interesting mix of 'sour but intrigued'
Personally, I know I've lost a lot of street cred amongst certain work circles in recent history for my view that shops should pursue local LLM solutions [0], and the '$6000, 4-8 tokens/second local LLM box' post making the rounds [1] hopefully gives orgs a better idea of what LLMs can do if we keep them from being 100% SaaS-like in structure.
I think a big litmus test for some orgs in near future, is whether they keep 'buying ChatGPT' or instead find a good way to quickly customize or at least properly deploy such models.
[0] - I mean, for starters, a locally hosted LLM resolves a LOT of concerns around infosec....
[1] - Very thankful a colleague shared that with me...
AFAIK, open-source Elasticsearch didn't offer any form of authentication on installation for many years, but ClickHouse does, and in fact I'm often surprised at how many authentication mechanisms have been introduced over the years and can be easily configured:
- Password authentication (bcrypt, sha256 hashes)
- Certificate authentication (Fantastic for server to server communication)
- SSH key authentication (Personally, this is my favourite - every database should have this authentication mechanism, to make it easy for devs to work with)
Not very popular, but LDAP and an HTTP authentication server are also great options.
I also wonder how DeepSeek's engineers deployed their ClickHouse instance. When I deployed using yum/apt install, the installation step literally asked me to input a default password.
And if you set it up manually with the ClickHouse binary, the out-of-the-box config seals the instance off from external network access, and the default user is only exposed on localhost, as explained by Alex here - https://news.ycombinator.com/item?id=42871371#42873446.
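To illustrate the scale of the mistake: ClickHouse's HTTP interface (default port 8123) answers plain GET requests, so an instance exposed with a passwordless default user can be queried with nothing more than a URL. A hedged sketch (the hostname in the usage comment is a placeholder, not the actual exposed host):

```python
import urllib.parse

# ClickHouse's HTTP interface accepts SQL as a "query" URL parameter.
# With a passwordless default user bound to a public interface, a plain
# GET is all an outsider needs -- no client library, no credentials.
def clickhouse_query_url(host: str, sql: str, port: int = 8123) -> str:
    return f"http://{host}:{port}/?" + urllib.parse.urlencode({"query": sql})

# A probe like the one described would boil down to URLs like:
#   clickhouse_query_url("exposed-host.example", "SHOW TABLES")
```

That's roughly what the Wiz write-up describes: issuing `SHOW TABLES` against the open HTTP port and browsing the log tables from there.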
I agree with your comment, but there's also probably an unspoken gentleman's agreement: DeepSeek fixed the issue and won't pursue legal action against Wiz, since Wiz was helpful and didn't do anything malicious.
I did the same a while ago: an education platform startup had its web server misconfigured, and I could clone their repo locally because .git was accessible. I immediately sent them an email from a throwaway account, in case they wanted to get me in trouble, and informed them about the configuration issues. They thanked me for the warning and suggestions, and even said they could get me a job at their company.
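The misconfiguration described above is easy to check for: a web server that serves the repository's .git directory will return the HEAD file, which starts with a recognizable `ref:` line. A minimal, hypothetical sketch (function names and the probe itself are illustrative, not any standard tool):

```python
import urllib.request

def looks_like_git_head(data: bytes) -> bool:
    # .git/HEAD normally contains "ref: refs/heads/<branch>". (A detached
    # HEAD holds a bare commit hash instead; this sketch only checks the
    # common case.)
    return data.startswith(b"ref: refs/")

def git_dir_exposed(base_url: str) -> bool:
    # Hypothetical probe: fetch /.git/HEAD and see whether it parses as a ref.
    try:
        with urllib.request.urlopen(f"{base_url}/.git/HEAD", timeout=5) as r:
            return looks_like_git_head(r.read(64))
    except OSError:
        return False
```

If /.git/ is reachable, off-the-shelf tools can walk the object store and reconstruct the whole repository, which is presumably how the commenter was able to clone it.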
Central focus: privacy violations through data exposure. Article frames privacy as fundamental right requiring protection. Detailed documentation of what was exposed: chat history, API keys, backend details, operational data.
FW Ratio: 57%
Observable Facts
Exposed database contained 'chat history, secret keys, backend details, and other highly sensitive information'
'Over a million lines of log streams' with 'Chat History, API Keys' exposed without authentication
Article frames exposure as threat: 'an attacker could retrieve sensitive logs and actual plain-text chat messages'
Advocates: 'protecting customer data must remain the top priority'
Inferences
Privacy violations (exposing chat history, personal data, credentials) are presented as violations of fundamental rights
The article frames privacy protection as non-negotiable corporate responsibility
Detailed exposure documentation emphasizes severity of right violations
Strong advocacy for industry-wide security frameworks as foundational governance infrastructure. Calls for standards 'on par with those required for public cloud providers.' Positions security as essential to digital rights infrastructure.
FW Ratio: 60%
Observable Facts
Article states: 'the industry must recognize the risks of handling sensitive data and enforce security practices on par with those required for public cloud providers'
Emphasizes: 'industry must recognize the risks' and 'enforce security practices' as foundational requirement
Concludes: security frameworks needed 'to accompany' widespread AI adoption
Inferences
Security frameworks are positioned as essential governance infrastructure for digital rights
Industry-wide standards advocacy reflects need for collective responsibility structures
Strong advocacy for corporate duty to protect user rights. 'Protecting customer data must remain the top priority' and 'we're entrusting these companies with sensitive data' implies non-negotiable duty.
FW Ratio: 60%
Observable Facts
Article emphasizes: 'Protecting customer data must remain the top priority'
States: 'by doing so, we're entrusting these companies with sensitive data' — implicit duty
Advocates that companies must 'safeguard data and prevent exposure'
Inferences
Corporate responsibility to protect user rights is explicitly positioned as non-negotiable
Data protection is framed as duty corresponding to entrusted user information
Strong advocacy for transparency through responsible disclosure. Detailed public documentation of vulnerability enables informed decision-making about AI services.
FW Ratio: 60%
Observable Facts
Wiz publicly documented vulnerability discovery in detailed technical blog post
Blog content freely accessible to all readers
Detailed technical walkthrough provided for industry education
Inferences
Responsible disclosure and public transparency support freedom of information principle
Free access to security research enables informed participation in digital economy
'This exposure underscores the fact that the immediate security risks for AI applications stem from the infrastructure' and 'The rapid adoption of AI services without corresponding security is inherently risky'
causal oversimplification
'While much of the attention around AI security is focused on futuristic threats, the real dangers often come from basic risks'
build 1ad9551+j7zs · deployed 2026-03-02 09:09 UTC · evaluated 2026-03-02 10:41:39 UTC