molf 8 hours ago

It would help tremendously if OpenAI would make it possible to apply for zero data retention (ZDR). For many business needs there is no reason to store or log any request at all.

In theory it is possible to apply (it's mentioned in multiple places in the documentation), but in practice requests are simply ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero data retention only for marketing purposes.

We have applied multiple times and have yet to receive ANY response. Reading through the forums this seems very common.

  • miles 2 hours ago

    > I get that approval needs to be given, and that there are barriers to entry.

    Why is approval necessary, and what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?

    OpenAI’s assurances have long been met with skepticism by many, with the assumption that inputs are retained, analyzed, and potentially shared. For those concerned with genuine privacy, local LLMs remain essential.

    • AlecSchueler an hour ago

      > what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?

      Product development?

  • ArnoVW 3 hours ago

    My understanding is that they retain logs for 30 days by default, for bug handling, and that you can request zero-day retention. This is from their documentation.

  • pclmulqdq 5 hours ago

    The missing ingredient is money.

    • jewelry 5 hours ago

      Not just money. How are you going to handle this client’s support ticket if there are no logs at all?

      • ethbr1 3 hours ago

        Don't. "We're unable to provide support for your request, because you disabled retention." Easy.

        • hirsin 3 hours ago

          They don't care; they still want support, and most leadership teams are unwilling to stand behind a stance of telling customers no.

          • abeppu 3 hours ago

            ... but why is not responding to a request for zero retention today better than not being able to respond to a future request? They're basically already saying no to customers who request this capability that they said they support, but their refusal is in the form of never responding.

  • lmm 5 hours ago

    > In theory it is possible to apply (it's mentioned in multiple places in the documentation), but in practice requests are simply ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero data retention only for marketing purposes.

    What's the betting that they just wrote it on the website and never actually implemented it?

    • sigmoid10 4 hours ago

      Tbf the approach seems pretty standard. Azure also only offers zero retention to vetted customers and otherwise retains data for up to 30 days to monitor and detect abuse. Since the possibilities for abuse are so high with these models, it would make sense that they don't simply give that kind of privilege to everyone - if only to cover their own legal position.

  • belter 5 hours ago

    If this stands, I don't think they can operate in the EU.

    • bunderbunder 4 hours ago

      I highly doubt this court order affects people using OpenAI services from the EU, as long as they're connecting to EU-based servers.

      • glookler an hour ago

        >> Does this court order violate GDPR or my rights under European or other privacy laws?

        >> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.

        • danielfoster an hour ago

          They didn’t say which law (the US judge’s order or EU law) they are complying with.

_jab 16 hours ago

> How will you store my data and who can access it?

> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.

> Only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.

So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.

Some of the other language in this post, like repeatedly calling the lawsuit "baseless", really makes this just read like an unconvincing attempt at a spin piece. Nothing to see here.

  • tptacek 16 hours ago

    No, there is a whole news cycle about how chats you delete aren't actually being deleted because of a lawsuit; they essentially have to respond. It's not an attempt to spin the lawsuit; it's about reassuring their customers.

    • VanTheBrand 15 hours ago

      The part where they go out of their way to call the lawsuit baseless is spin though, and mixing that with this messaging presents a mixed message. The NYT lawsuit is objectively not baseless. OpenAI did train on the Times, and ChatGPT does output information from that training. That’s the basis of the lawsuit. NYT may lose, this could end up being considered fair use, it might ultimately be a flimsy basis for a lawsuit, but to say it’s baseless (and with nothing to back that up) is spin and makes this message less reassuring.

      • tptacek 14 hours ago

        No, it's not. It's absolutely standard corporate communications. If they're fighting the lawsuit, that is essentially the only thing they can say about it. Ford Motor Company would say the same thing (well, they'd probably say "meritless and frivolous").

        • bee_rider 14 hours ago

          Standard corporate spin, then?

          • bunderbunder 4 hours ago

            No, this isn't even close to spin; it's just a standard part of defending your case. In the US tort system you need to constantly and publicly say you did nothing wrong. Any wavering on that point could be used against you in court.

            • jmull 3 hours ago

              This is a funny thread. You say "No" but then restate the point with slightly different words. As if anything a company says publicly about ongoing litigation isn't spin.

              • bunderbunder 2 hours ago

                I suppose it's down to how you define "spin". Personally I'm in favor of a definition of the term that doesn't excessively dilute it.

                • bee_rider 43 minutes ago

                  Can you share your definition? This is actually quite puzzling, because as far as I know “spin” has always been associated with presenting things in a way that benefits you. Like, decades ago, they could have the show “Bill O’Reilly’s No Spin Zone” and everybody knew the premise was that they’d argue against guests who were trying to tell a “massaged” version of the story, and that they’d go for some actual truth (fwiw I thought the whole show was full of crap, but the name was not confusing or ambiguous).

                  I’m not aware of any definition of “spin” where being conventional is a defense against that accusation. Actually, that was the (imagined) value-add of the show, that conventional corporate and political messaging is heavily spun.

          • tptacek 13 hours ago

            No? "Spin" implies there was something else they could possibly say.

            • bee_rider 40 minutes ago

              That is unrelated to what the expression means.

            • justacrow 11 hours ago

              They could choose to not say it

              • ethbr1 6 hours ago

                Indeed. Taken to its conclusion, this thread suggests that corporations are justified in saying whatever they want in order to further their own ends.

                Including lies.

                I'd like to aim a little higher, maybe towards expecting correspondence with reality?

                IOW, yes, there is no law saying OpenAI can't try to spin this. But it's still a shitty, non-factually-based choice to make.

            • mrgoldenbrown 5 hours ago

              If you're being held at gunpoint and forced to lie, your words are still a lie. Whether you were forced or not is a separate dimension.

            • mmooss 13 hours ago

              I haven't heard that interpretation; I might call it spin of spin.

      • adamsb6 5 hours ago

        I’m typing these words from a brain that has absorbed copyrighted works.

    • mhitza 10 hours ago

      My understanding is that they have to keep chats based on an order, *as a result of their previous accidental deletion of potential evidence in the case*[0].

      And per their own terms they likely only delete messages "when they want to" given the big catch-alls. "What happens when you delete a chat? -> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless: It has already been de-identified and disassociated from your account"[1]

      [0] https://techcrunch.com/2024/11/22/openai-accidentally-delete...

      [1] https://help.openai.com/en/articles/8809935-how-to-delete-an...

    • ofjcihen 5 hours ago

      They should include the part where the order is a result of them deleting things they shouldn’t have, then. You know, if this isn’t spin.

      Then again I’m starting to think OpenAI is gathering a cult-leader-like following, where any negative comment will result in devoted followers, or those with something to gain, immediately jumping to its defense no matter how flimsy the ground.

      • gruez 4 hours ago

        >They should include the part where the order is a result of them deleting things they shouldn’t have, then. You know, if this isn’t spin.

        From what I can tell from the court filings, prior to the judge's order to retain everything, the request to retain everything was coming from the plaintiff, with OpenAI objecting to the request and refusing to comply in the meantime. If so, it's a bit misleading to characterize this as "deleting things they shouldn’t have", because what they "should have" done wasn't even settled. That's a bit rich coming from someone accusing OpenAI of "spin".

        • ofjcihen 3 hours ago

          Here’s a good article that explains what you may be missing.

          https://techcrunch.com/2024/11/22/openai-accidentally-delete...

          • gruez 3 hours ago

            Your linked article talks about OpenAI deleting training data. I don't see how that's related to the current incident, which is about user queries. The judge's ruling for OpenAI to retain all user queries also didn't reference this incident.

            • ofjcihen 3 hours ago

              Sure.

              Without this devolving into a tit for tat: the article explains, for those following this conversation, why it’s been elevated to a court order and not just an expectation to preserve.

    • mmooss 13 hours ago

      > It's not an attempt to spin the lawsuit; it's about reassuring their customers.

      It can be both. It clearly spins the lawsuit - it doesn't present the NYT's side at all.

      • roywiggins 4 hours ago

        It would be extremely unusual (and likely very stupid) for the defendant in a lawsuit to post publicly that the plaintiff maybe has a point.

      • fallingknife 5 hours ago

        Why does OpenAI have any obligation to present the NYT's side?

        • mmooss an hour ago

          Who said 'obligation'?

    • conartist6 8 hours ago

      It's hard to reassure your customers if you can't address the elephant in the room. OpenAI brought this on themselves by flouting copyright law and assuring everyone else that such aggressive and probably-illegal action would be retroactively acceptable once they were too big to fail.

  • lxgr 16 hours ago

    If the stored data is found to be relevant to the lawsuit during discovery, it becomes available to at least both parties involved and the court, as far as I understand.

  • sashank_1509 16 hours ago

    Obviously OpenAI’s point of view will be their point of view. They are going to call this lawsuit baseless; otherwise they would not be fighting it.

    • ivape 10 hours ago

      To me it's pretty clear how this will play out. You will need to buy additional credits or subscriptions through these LLMs that feed payments back to the likes of the NYT and book publishers. It's all stolen. I don't even want to hear it. This company doesn't want to pay up and is willing to let users' privacy hang in the balance to draw the case out until they get sure footing with their device launches or the like (or additional markets like enterprise, etc.).

      • Workaccount2 4 hours ago

        > It's all stolen.

        LLMs are not massive archives of data. The big models are a few TB in size. No one is forgoing a NYT subscription because they can ask ChatGPT to print out NYT news stories.

        • edbaskerville 3 hours ago

          Regardless of the representation, some people are replacing news consumption generally with answers from ChatGPT.

      • fallingknife 5 hours ago

        Copyright is pretty narrowly tailored to verbatim reproduction of content, so I doubt they will have to pay anything.

  • pritambarhate 13 hours ago

    Maybe because you are not an OpenAI user. I am. I find it useful and I pay for it. I don't want my data to be retained beyond what's promised in the Terms of Use and Privacy Policy.

    I don't think the Judge is equipped to handle this case if they don't understand how their order jeopardizes the privacy of millions of users worldwide who don't even care about NYT's content or bypassing their paywalls.

    • conartist6 8 hours ago

      You live on a pirate ship. You have no right to ignore the ethics and law of that just because you could be hurt in a conflict related to piracy.

    • DrillShopper 41 minutes ago

      The OpenAI Privacy Policy specifically allows them to keep data as required by law.

    • mmooss 13 hours ago

      > who don't even care about NYT's content or bypassing their paywalls.

      Whether or not you care is not relevant, and that is usually the case for customers. If a drug company resold an expensive cancer drug without IP, you might say 'their order jeopardizes the health of millions of users worldwide who don't even care about Drug Co's IP.'

      If the NYT is right - I can only guess - then you are benefitting from the NYT IP. Why should you get that without their consent and for free - because you don't care?

      > (jeopardizes)

      ... is a strong word. I don't see much risk - the NYT isn't going to de-anonymize users and report on them, or sell the data (which probably would be illegal). They want to see if their content is being used.

  • hiddencost 16 hours ago

    > So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.

    I am not an OpenAI stan, but this needs to be responded to.

    The first principle of information security is that all systems can be compromised and the only way to secure data is to not retain it.

    This is like saying "well, I know they didn't want to go skydiving, but we forced them to go skydiving and they died because they had a stroke mid-air; it's their fault they died."

    Anyone who makes promises about data security is at best incompetent and at worst dishonest.

    • JohnKemeny 13 hours ago

      > Anyone who makes promises about data security is at best incompetent and at worst dishonest.

      Shouldn't that be "at best dishonest and at worst incompetent"?

      I mean, would you rather be a competent person telling a lie or an incompetent person believing you're competent?

      • HPsquared 11 hours ago

        An incompetent but honest person is more likely to accept correction and respond to feedback generally.

hombre_fatal 2 hours ago

You know how it's always been a meme that you'd be mortally embarrassed if your browser history ever leaked?

Imagine how much worse it is for your LLM chat history to leak.

It's even worse than your private comms with humans because it's a raw look at how you are when you think you're alone, untempered by social expectations.

  • vitaflo 2 hours ago

    WTF are you asking LLMs and why would you expect any of it to be private?

    • threecheese an hour ago

      This product is positioned as a personal copilot, and future iterations (based on leaked plans, which may or may not be true) as a wholly integrated life assistant.

      Why would a customer expect this not to be private? How can one even know how it could be used against them, when they don’t even know what’s being collected or gleaned from collected data?

      I am following these issues closely, as I am terrified that my “assistant” will some day prevent me from obtaining employment, insurance, medical care, etc. And I’m just a non-law-breaking normie.

      A current-day example would be TX state authorities using third-party social/ad data to identify potentially pregnant women, along with ALPR data purchased from a third party, to identify any who attempt to have an out-of-state abortion, so they can be prosecuted. Whatever you think about that law, it is terrifying that a shift in it could find arbitrary digital signals being used against you in this way.

    • hombre_fatal 2 hours ago

      It's not that the convos are necessarily icky.

      It's that it's like watching how someone might treat a slave when they think they're alone. And how you might talk down to or up to something that looks like another person. And how pathetic you might act when it's not doing what you want. And what level of questions you outsource to an LLM. And what things you refuse to do yourself. And how petty the tasks might be, like workshopping a stupid twitter comment before you post it. And how you copied that long text from your distraught girlfriend and asked it for some response ideas. etc. etc. etc.

      At the very least, I'd wager that it reveals that bit of true helpless patheticness inherent in all of us that we try so hard to hide.

      Show me your LLM chat history and I will learn a lot about your personality. Nothing else compares.

      • alec_irl an hour ago

        > how you copied that long text from your distraught girlfriend and asked it for some response ideas

        good lord, if tech were ethical then there would be mandatory reporting when someone consults an LLM to tell them how they should be responding to their intimate partner. are your skills of expression already that hobbled by chat bots?

        • hombre_fatal an hour ago

          These are just concrete examples to get the imagination going, not an exhaustive list of the ways that you are revealing your true self in the folds of your LLM chat history.

          Note that it doesn't have to go all the way to "he gets Claude to help him win text arguments with his gf" for an uncomfortable amount of your self to be revealed by the chats.

          There is always something icky about someone observing messages you wrote in privacy, and you don't have to have particularly unsavory messages for it to be icky. Why is that?

          • alec_irl an hour ago

            i don't personally see messages with an LLM as being different from, say, terminal commands. it's a machine interface. it sounds like you're anthropomorphizing the chat bot, if you're talking to it like you would a human then i would be more worried about the implications that has for you as a person.

            • AlecSchueler an hour ago

              What does this comment add to the conversation? It feels like a personal attack with no real rebuttal. People who anthropomorphise them will talk to them; the human-like interface is the entire selling point.

            • hombre_fatal an hour ago

              Focusing on how you anthropomorphize the LLM isn't really interacting with the point since it was one example.

              Might someone's google search history be embarrassing even though they don't treat google like a human?

      • Jackpillar an hour ago

        Might have to reemphasize his question again, but: what questions are you asking your LLM? Why are you responding to it and/or "treating" it differently than how you would a calculator or search engine?

        • hombre_fatal an hour ago

          Because it's far more capable than a calculator or search engine and because you interact with it with conversational text, it reveals more aspects about your personality.

          Why might your search engine queries reveal more about you than your keystrokes in a calculator? Now dial that up.

          • Jackpillar an hour ago

            Sure - but I don't interact with it as if it's human, so my demeanor or attitude is neutral, because I'm talking to, you know, a computer. Are you getting emotional with and reprimanding your chatbot?

            • hombre_fatal an hour ago

              I don't get why I'm receiving pushback here. How you treat the LLM was only a fraction of my examples for ways you can look pathetic if your chats were made public.

              You don't reprimand the google search box, yet your search history might still be embarrassing.

              • hackinthebochs an hour ago

                Your points were very accurate and relevant. Some people have a serious lack of imagination. The perpetual naysayers will never have their minds changed.

                • hombre_fatal an hour ago

                  Good god, thank you. I thought I was making an obvious, uncontroversial point when I wrote that first comment.

              • AlecSchueler an hour ago

                It's so tiring to read. You're making a reasonable point. Some people can't believe that other people behave or feel differently to themselves.

    • ofjcihen 2 hours ago

      “Write a song in the style of Slipknot about my dumb inbred dogs. I love them very much but they are…reaaaaally dumb.”

      To be fair the song was intense.

supriyo-biswas 16 hours ago

I wonder whether OpenAI legal can make the case for storing fuzzy hashes of the content, in the form of ssdeep[1] hashes or content-defined chunks[2] of said data, instead of the actual conversations themselves.

After all, since the NYT has a very limited corpus of information, and supposedly people are generating infringing content using their APIs, said hashes could be used to check whether such content has been generated (rough sketch below).

I'd rather have them store nothing, but given the overly broad court order I think this may be the best middle ground. Of course, I haven't read the lawsuit documents and don't know if NYT is requesting far more, or alleging some indirect form of infringement which would invalidate my proposal.

[1] https://ssdeep-project.github.io/ssdeep/index.html

[2] https://joshleeb.com/posts/content-defined-chunking.html
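
To make that concrete, here's a minimal sketch of the comparison, assuming the python-ssdeep bindings (pip install ssdeep) and a made-up 60-point match threshold; ssdeep.compare() scores two hashes from 0 to 100:

    import ssdeep

    def retained_record(completion_text: str) -> str:
        # Keep only the fuzzy hash of the model output, not the text itself.
        return ssdeep.hash(completion_text)

    def possibly_infringing(retained_hash: str, corpus_hashes: list[str],
                            threshold: int = 60) -> bool:
        # Flag hashes that score above the (hypothetical) threshold against
        # any article in the plaintiff's corpus for closer review.
        return any(ssdeep.compare(retained_hash, h) >= threshold
                   for h in corpus_hashes)

    # corpus_hashes would be precomputed once over the articles at issue:
    # corpus_hashes = [ssdeep.hash(article) for article in nyt_articles]

Whether a court would accept hashes in place of the underlying text is a separate question, of course.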

  • paxys 16 hours ago

    Yeah, try explaining any of these words to a lawyer or judge.

    • sthatipamala 14 hours ago

      The judges in these technical cases can be quite sophisticated and absolutely do learn terms of art. See Oracle v. Google (Java API case)

      • anshumankmr 13 hours ago

        I looked up the judge for that one (https://en.wikipedia.org/wiki/William_Alsup), who was a hobbyist BASIC programmer. If that is the bar, one would need a judge who coded MNIST as a pastime hobby.

        • king_magic 9 hours ago

          a smart judge who is minimally tech savvy could learn to train a model to predict MNIST in a day or two

    • fc417fc802 16 hours ago

      I thought that's what GPT was for.

    • m463 16 hours ago

      "you are a helpful law assistant."

    • landl0rd 15 hours ago

      "You are a long-suffering clerk speaking to a judge who's sat the same federal bench for two decades and who believes 'everything is computer' constitutes a deep technical insight."

  • LandoCalrissian 15 hours ago

    Trying to actively circumvent the intention of a judge's order is a pretty bad idea.

    • Aeolun 14 hours ago

      That’s not circumvention though. The intent of the order is to be able to prove that ChatGPT regurgitates NYT content, not to read the personal communications of all ChatGPT users.

    • girvo 15 hours ago

      Deeply, deeply so. In fact, so much so that people who suggest such things show they've (luckily) not had to interact with the legal system much. Judges take an incredibly dim view of that kind of thing haha

  • delusional 14 hours ago

    I haven't been able to find any of the supporting documents, but the court order makes it seem like OpenAI has been unhelpful in producing any alternative during the conversation.

    For example, the judge seems to have asked if it would be possible to segregate data that the users wanted deleted from other data, but OpenAI has failed to answer. Not just denied the request, but simply ignored it.

    I think it's quite likely that OpenAI has taken the PR route instead of seriously engaging with any way to constructively honor the request for retention of data.

  • bigyabai 15 hours ago

    All of that does fit on a real spiffy whitepaper. Let's not fool around, though: every ChatGPT session is sent directly into an S3 bucket that some three-letter spook backs up onto their tapes every month. It's a database of candid, timestamped text interactions from a bunch of rubes that logged in with their Google account - you couldn't ask for a juicier target unless you reinvented email. Of course it's backdoored; you can't even begin to try proving me wrong.

    Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me. It's about equally as reassuring as a singing telegram from Mark Zuckerberg dancing to a song about how secure WhatsApp is.

    • landl0rd 15 hours ago

      Of course I can't even begin trying to prove you wrong. You're making an unfalsifiable statement. You're pointing to the Russell's Teapot of sigint.

      It's well-established that the American IC, primarily NSA, collects a lot of metadata about internet traffic. There are some justifications for this and it's less bad in the age of ubiquitous TLS, but it generally sucks. However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.

      Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo; if they had to tap links, I doubt there's a ton of voluntary cooperation.

      I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.

      • tdeck 15 hours ago

        > Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo; if they had to tap links, I doubt there's a ton of voluntary cooperation.

        The laws have changed since then and it's not for the better:

        https://www.aclu.org/press-releases/congress-passing-bill-th...

        • tuckerman 14 hours ago

          Even if the laws give them this power, I believe it would be extremely difficult for an operation like this to go unnoticed (and therefore unreported) at most of these companies. MUSCULAR [1] was able to be pulled off because of the cleartext inter-datacenter traffic, which was subsequently encrypted. It's hard to see how they could pull off a similar operation without the cooperation of Google, which would also entail a tremendous internal cover-up.

          [1] https://en.wikipedia.org/wiki/MUSCULAR

          • onli 9 hours ago

            Warrantlessly installed backdoors in the log system combined with a gag order, combined with secret courts, all "perfectly legal". Not really hard to imagine.

            • tuckerman 5 hours ago

              You would have to gag a huge chunk of the engineers and I just don’t think that would work without leaks. Google’s infrastructure would not make something like that easy to do clandestinely (trying to avoid saying impossible but it gets close).

              I was an SRE and SWE on technical infra at Google, specifically the logging infrastructure. I am under no gag order.

      • dmurray 13 hours ago

        > You're pointing to the Russel's Teapot of sigint.

        If there were multiple agencies with billion dollar budgets and a belief that they had an absolute national security mandate to get a teapot into solar orbit, and to lie about it, I would believe there was enough porcelain up there to make a second asteroid belt.

      • cwillu 15 hours ago

        > I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.

        The input is what's interesting.

        • Aeolun 14 hours ago

          It doesn’t change the monumental scope of the problem though.

          Though I’m inclined to believe the US gov can if OpenAI can.

      • Yizahi 9 hours ago

        Metadata is spying (c) Bruce Schneier

        If a CIA spook is stalking you everywhere, documenting your every visible move or interaction, you probably would call that spying. Same applies to digital.

        Also, the teapot argument can be applied in reverse. We have all these documented open digital network systems everywhere, and you want to say that one of the most unprofitable, and certainly the most expensive to run, systems is somehow protecting all user data? That belief is based on what? At least selling data is based on the track record of the industry and on the actual ToSes of other similar corpos.

        • jstanley 9 hours ago

          The comment you replied to isn't saying that metadata isn't spying. It's saying that the spies generally don't have free access to content data.

      • rl3 12 hours ago

        >However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.

        Yeah, because the definition of collection was redefined to mean accessing the full content already stored on their systems, post-interception. It wasn't considered "collected" until an analyst viewed it. Metadata was a laughable dog-and-pony show that was part of the same legal shell games at the time, over a decade ago now.

        That said, from an outsider's perspective it sounded like the IC did collectively erect robust guard rails such that access to information was generally controlled and audited. I felt like this broke down a bit once sharing 702 data with other federal agencies was expanded around the same time period.

        These days, those guard rails might be the only thing standing in the way of democracy as we know it ending in the US. AI processing applied to full-take collection is terrifying, just ask the Chinese.

      • Workaccount2 4 hours ago

        My choice conspiracy is that the three letter agencies actively support their omnipresent, omniknowing conspiracies because it ultimately plays into their hand. Sorta like a Santa Claus for citizens.

        • bigyabai an hour ago

          > because it ultimately plays into their hand.

          How? Scared criminals aren't going to make themselves easy to find. Three-letter spooks would almost certainly prefer to smoke-test a docile population rather than a paranoid one.

          In fact, it kinda overwhelmingly seems like the opposite happens. Remember the 2015 San Bernardino shooting that was pushed into the national news for no reason? Remember how the FBI bloviated about how hard it was to get information from an iPhone, 3 years after Tim Cook's assent to the PRISM program?

          Stuff like this is almost certainly theater. If OpenAI perceived retention as a life-or-death issue, they would be screaming about this case from the top of their lungs. If the FBI perceived it as a life-or-death issue, we would never hear about it in our lifetimes. The dramatic and protracted public fights suggest to me that OpenAI simply wants an alibi: some sort of user story that smells like secure and private technology, but in actuality is very obviously neither.

      • zer00eyz 15 hours ago

        > However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.

        This was the point of a lot of the Five Eyes programs. It's not legal for the US to spy on its own citizens, but it isn't against the law for us to do it to the Australians... who are all too happy to reciprocate.

        > Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo...

        Snowden's info wasn't really news for many of us who were paying attention in the aftermath of 9/11: https://en.wikipedia.org/wiki/Room_641A (This was huge on Slashdot at the time...)

      • komali2 15 hours ago

        There's no way to know, but it's safer to assume.

    • 7speter 14 hours ago

      Maybe I’m wrong, and maybe this was discussed previously, but of course OpenAI keeps our data; they use it for training!

      • nl 14 hours ago

        As the linked page points out, you can turn this off in settings if you are an end user, or choose zero retention if you are an API user.
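
        For the API case, this is roughly what it looks like in code (a sketch, assuming the current openai Python SDK and its per-request store flag; note that full zero data retention is an account-level approval, not a request parameter):

            from openai import OpenAI

            client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

            # store=False asks OpenAI not to persist this completion for
            # later retrieval or evals; retention for abuse monitoring is
            # governed separately by account-level data controls.
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": "Hello"}],
                store=False,
            )
            print(resp.choices[0].message.content)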

        • Jackpillar an hour ago

          I wish I could test drive your brain to experience a world where one believes that would stop them from stealing your data.

        • justacrow 11 hours ago

          I mean, they already stole and used all the copyrighted material they could find to train the thing; am I supposed to believe that they won't use my data just because I tick a checkbox?

          • stock_toaster 9 hours ago

            Agreed, I have a hard time believing anything the eye-scanning crypto coin (Worldcoin or whatever) guy says at this point.

    • farts_mckensy 15 hours ago

      Think of all the complete garbage interactions you'd have to sift through to find anything useful from a national security standpoint. The data is practically obfuscated by virtue of its banality.

      • artursapek 14 hours ago

        I’ve done my part cluttering it with my requests for the same banana bread recipe like 5 separate times.

      • brigandish 14 hours ago

        Search engines have been doing this since the mid-90s and have only improved. To think that any data is obfuscated by being part of some huge volume of other data is a fallacy at best.

        • farts_mckensy 13 hours ago

          Search engines use our data for completely different purposes.

          • yunwal 6 hours ago

            That doesn’t negate the GP’s point. It’s easy to make datasets searchable.

            • farts_mckensy 4 hours ago

              Searchable? You have to know what to search for, and you have to rule out false positives. How do you discern a person roleplaying some secret agent scenario vs. a person actually plotting something? That's not something a search function can distinguish. It requires a human to sift through that data.

      • bigyabai 13 hours ago

        "We kill people based on metadata." - National Security Agency Gen. Michael Hayden

        Raw data with time-series significance is their absolute favorite. You might argue something like Google Maps data is "obfuscated by virtue of its banality" until you catch the right person in the wrong place. ChatGPT sessions are the same way, and it's going to be fed into aggregate surveillance systems in the way modern telecom and advertiser data is.

        • farts_mckensy 4 hours ago

          This is mostly security theater, and generally not worth the lift when you consider the steps needed to unlock the value of that data in the context of investigations.

sega_sai 16 hours ago

Strange smear against the NYT. If the NYT has a case, and the court approves it, it's bizarre to use the court order to smear the NYT. If there is no case, "Open"AI will have a chance to prove that in court.

  • lxgr 16 hours ago

    The NYT is, in my view, exploiting a systematic weakness of the US legal system here, i.e. extremely wide reaching discovery laws with almost no regard for the privacy of parties not involved to a given dispute, or aspects of their lives not relevant to the dispute at hand.

    Of course it's out of self-serving interests, but I find it hard to disagree with OpenAI on this one.

    • JumpCrisscross 14 hours ago

      > with almost no regard for the privacy of parties not involved to a given dispute

      Third-party privacy and relevance are a constant point of contention in discovery. Exhibit A: this article.

    • thinkingtoilet 6 hours ago

      The privacy onus is entirely on the company. If Open AI is concerned about user privacy then don't collect that data. End of story.

      • acheron 6 hours ago

        …the whole point of this story is that the court is forcing them to collect the data.

        • taormina 4 hours ago

          No no, they are being forced to KEEP the data they collected. They didn't have to keep it to begin with.

          • pj_mukh 2 hours ago

            Isn't the only way to do that for ChatGPT to run locally on a machine? The moment your chat hits their server, they are legally required to store it?

        • thinkingtoilet 5 hours ago

          You're telling me you don't think Open AI is already collecting chat logs?

          • dghlsakjg 5 hours ago

            Yes.

            That is an explicit option in the API, as well as in the paid consumer product. The amount of business that they stand to lose by maliciously flouting that part of their contract is in the billions.

          • Workaccount2 4 hours ago

            "I'm wrong so here is a conspiracy so I can be right again".

            Large companies lose far more by lying than they would gain from it.

    • Arainach 16 hours ago

      What right to privacy? There is no right to have your interactions with a company (1) remain private, nor should there be. Even if there were, you agree to let OpenAI do essentially whatever they want with your data - including handing it over to the courts in response to a subpoena.

      (1) With limited, well-scoped exclusions for lawyers, medical records, etc.

      • ChadNauseam 15 hours ago

        Given how many important interactions people have with companies in our modern age, saying "There is no right to have your interactions with a company remain private" is essentially equivalent to saying "there is no right to privacy at all". When I talk to my friends over facetime or imessage, that interaction is being mediated by Apple, as well as by my internet service provider and (I assume) many other parties.

        • wvenable 15 hours ago

          > "There is no right to have your interactions with a company remain private" is essentially equivalent to saying "there is no right to privacy at all".

          Legally that is a correct statement.

          If you want that changed, it will require legislation.

          • HDThoreaun 8 hours ago

            Really not so simple. Roe v Wade was decided based on an implied right to privacy. Sure, it's been overturned, but if liberals get back on the court it will be un-overturned.

            • wvenable 2 hours ago

              Roe v Wade refers to the constitutional right to privacy under the Due Process Clause of the 14th Amendment. This is part of individual rights against the state and has nothing to do with private companies. There is no general constitutional right that guarantees privacy in interactions with private companies.

            • maketheman 5 hours ago

              Given the current balance of the court, I'd say it's about even odds we end the entire century without ever having had a liberal court the entire time. Best reasonable case we're a solid couple of decades from it, and even that's not got great odds.

              We'd have a better chance if anyone with power were talking about court reform to make the Supreme Court justices e.g. drawn by lot for each session from the district courts, but approximately nobody is. It'd be damn good and long overdue reform, but oh well.

              And the thing is, we've already had a fairly conservative court for decades. I'm pretty likely to die, even if of old age, never having seen an actually-liberal court in the US my entire life. Like, WTF. Frankly, no wonder so much of our situation is fucked up, backwards, and authoritarianism-friendly. And (sigh) any serious attempts to fix that are basically on hold for many decades more, assuming rule of law survives that long anyway.

              [EDIT] My point, in short, is that "we still have [thing], we just have to wait for a liberal court that'll support it" is functionally indistinguishable from not having [thing].

              • fallingknife 5 hours ago

                A liberal court will probably start drawing exceptions to 1A out of thin air like "misinformation" and "hate speech." I'd rather stick with what we have.

            • nativeit 6 hours ago

              That’s presumably why legislation is needed?

        • whilenot-dev 15 hours ago

          Privacy in that example would be if no party except you and your friends could access the contents of the interaction. I wouldn't want either Apple or my ISP to have that access.

          A company like OpenAI that offers a SaaS is no such friend, and in such power dynamics (individual vs. company) it's probably in your best interest to have everything public if necessary.

          • lxgr 3 hours ago

            You're always free to keep records of your ChatGPT conversations on your end.

            Why tangle the data of people with very different preferences than yours up in that?

        • bobmcnamara 15 hours ago

          > "there is no right to privacy at all"

          First time?

        • Analemma_ 15 hours ago

          > essentially equivalent to saying "there is no right to privacy at all".

          As others have said, in the United States this is, legally, completely correct: there is no right to privacy in American law. Lots of people think the Fourth Amendment is a general right to privacy, and they are wrong: the Fourth Amendment is specifically about government search and seizure, and courts have been largely consistent about saying it does not extend beyond that to e.g. relationships with private parties.

          If you want a right to privacy, you will need to advocate for laws to be changed; the ones as they exist now do not give it to you.

          • tiahura 14 hours ago

            No, that is incorrect. See, e.g., Griswold, Lawrence, etc.

            • Terr_ 13 hours ago

              That's a fallacy of equivocation, you're introducing a different meaning/flavor of the same word.

              As it stands today, a court case (A) affirming the right to use contraception is not equivalent to a court case (B) stating that a phone-company/ISP/site may not sell their records of your activity.

              • tiahura 6 hours ago

                Your response hinges on a fallacy of equivocation, but ironically, it commits one as well.

                You conflate the absence of a statutory or regulatory regime governing private data transactions with the broader constitutional right to privacy. While it’s true that the Fourth Amendment limits only state action, U.S. constitutional law, via cases like Griswold v. Connecticut and Lawrence v. Texas, clearly recognizes a substantive right to privacy, grounded in the Due Process Clause and other constitutional penumbras. This is not a semantic variant; it is a distinct and judicially enforceable right.

                Moreover, beyond constitutional law, the common law explicitly protects privacy through torts such as intrusion upon seclusion, public disclosure of private facts, false light, and appropriation of likeness. These apply to private actors and are recognized in nearly every U.S. jurisdiction.

                Thus, while the Constitution may not prohibit a website from selling your data, it does affirm a right to privacy in other, fundamental contexts. To deny that entirely is legally incorrect.

                • wvenable 2 hours ago

                  You're conflating the existence of specific privacy protections in narrow legal domains with a generalized, enforceable right to privacy which doesn't exist in US law. The Constitution recognizes a substantive right to privacy, but only in carefully defined areas like reproductive choice, family autonomy, and intimate conduct, and critically only against state actors. Citing Griswold, Lawrence, and related cases does not establish a sweeping privacy right enforceable against private companies.

                  Common law torts require a high threshold of offensiveness and are adjudicated case by case in individual jurisdictions. They offer only remedies, not a proactive right to control your data.

                  The original point, that there is no general right in the US to have your interactions with a company remain private, still stands. That's not a denial of all privacy rights but a recognition that US law fails to provide comprehensive privacy protection.

                  • tiahura an hour ago

                    The statement I was referring to is:

                    “As others have said, in the United States this is, legally, completely correct: there is no right to privacy in American law.”

                    That is an incorrect statement. The common law torts I cited can apply in the context of a business transaction, so your statement is also incorrect.

                    If your strawman is that in the US there’s no right to privacy because there’s no blanket prohibition on talking about other people, and what they’ve been up to, then run with it.

                    • wvenable 41 minutes ago

                      > The common law torts I cited can apply in the context of a business transaction, so your statement is also incorrect.

                      I completely disagree. Yes, the Prosser privacy torts exist: intrusion upon seclusion, public disclosure, false light, and appropriation. But they are highly fact-specific, hard to win, rarely litigated, not recognized in all jurisdictions, and completely reactive -- you get harmed first, maybe sue later!

                      They are utterly inadequate to protect people in the modern data economy. A website selling your purchase history? Not actionable. A company logging your AI chats? Not intrusion. These torts are not a privacy regime - they are scraps. Also, when we're talking about basic privacy rights, we're just as concerned with mundane material, not just the "highly offensive" material that the torts would apply to.

                      • tiahura 17 minutes ago

                        Because in the US we value freedom and particularly freedom of speech.

                        If don’t want the grocery store telling people you buy Coke, don’t shop there.

                • jcalvinowens 4 hours ago

                  In practice, the constitution says whatever the supreme court says it says.

                  While these grand theories of traditional implicit constitutional law are nice, they're pretty meaningless in a system where five individuals can (and are willing to) vote to invalidate decades of tradition on a whim.

                  I too want real laws.

      • fc417fc802 15 hours ago

        > There is no right to have your interactions with a company (1) remain private, nor should there be.

        Why should two entities not be able to have a confidential interaction if that is what they both want? Certainly a court order could supersede such a right just as it could most others provided sufficient evidence. However I would expect such things to be both highly justified and narrowly targeted.

        This specific case isn't so much about a right to privacy as it is a more general freedom to enter into contracts with others and expect those to be honored.

        • nativeit 6 hours ago

          Hey man, wanna buy some coke? How about trade secrets? State secrets?

      • bionhoward 16 hours ago

        It’s also a matter of competition… there are other AI services available today with privacy policies ranging from no training by default to opt-outs for training, the ability to turn off data retention, and e2e encryption. A lot of workloads (cough, working on private git repos) logically require private AI to make sense.

      • levocardia 14 hours ago

        But there's a very big difference between "no company is legally required to keep your data private" and "a company that explicitly and publicly wants to protect your privacy is being legally coerced into not keeping your data private".

        • nativeit 6 hours ago

          No room here for the company’s purely self-interested motivations?

      • 1shooner 15 hours ago

        >(1) With limited, well-scoped exclusions for lawyers, medical records, etc.

        Is this referring to some actual legal precedent, or just your personal opinion?

      • lxgr 16 hours ago

        That may be your or your jurisdiction's view, but such privacy rights definitely exist in many countries.

        You might have heard of the GDPR, but even before that, several countries had "privacy by default" laws on the books.

      • davedx 12 hours ago

        Hello. I live in the EU. Have you heard of GDPR?

      • Imustaskforhelp 16 hours ago

        But if both parties agree, then there should be the freedom to stay private.

        Your comment is dystopian given how some people treat AI as their "friend": imagine that no matter what encrypted messaging app or whatever they use, the govt still snoops.

        • fastball 14 hours ago

          Dealer-Client privilege.

  • visarga 15 hours ago

    NYT wants it both ways. When they were the ones putting freelancer articles into a database to rent, they argued against enforcing copyright and for supporting the new industry, and that it was too hard to revert their original assumptions. Now they absolutely love copyright.

    https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-t...

    • moefh 15 hours ago

      Another way of looking at it is that they lost that case over 20 years ago, and have been building their business model for 20 years accordingly.

      In other words, they want everyone to be forced to follow the same rules they were forced to follow 20 years ago.

  • tptacek 16 hours ago

    They're a party to the case! Saying it's baseless isn't a "smear". There is literally nothing else they can say (other than something synonymous with "baseless", like "without merit").

    • lucianbr 16 hours ago

      Oh they definitely can say other things. It's just that it would be inconvenient. They might lose money.

      I wonder if the laws and legal procedures are written considering this general assumption that a party to a lawsuit will naturally lie if it is in their interest. And then I read articles and comments about a "trust based society"...

      • tptacek 15 hours ago

        I'm not taking one side or the other in the case itself, but it's lazy and superficial to suggest that the defendant in a civil suit would say anything other than that the suit has no merit. In the version of this statement where they generously interpret anything the NYT (I subscribe) says, they might as well just surrender.

        I'm not sticking up for OpenAI so much as just for decent, interesting threads here.

      • fastball 14 hours ago

        This is the nature of the civil court system – it exists for when parties disagree.

        Why would a defendant who agrees a case has merit go to court at all? Much easier (and generally less expensive) to make the other party whole, assuming the parties agree on what "whole" is. And if they don't agree on what "whole" is, we are back to square one and of course you'd maintain that the other side's suit is baseless.

      • wilg 14 hours ago

        > They might lose money.

        I expect it's more about them losing the _case_. Silly to expect someone fighting a lawsuit not to try to win it.

    • mmooss 13 hours ago

      They could say nothing about the merits of the case.

  • eviks 15 hours ago

    And if NYT has no case, but the court approves it, is that still bizarre?

  • tootie 3 hours ago

    It's PR. OpenAI stole mountains of copyrighted content and are trying to make NYT look like bad guys. OpenAI would not be in the position of defending a lawsuit if they hadn't done something that is very likely illegal. OpenAI can also end this requirement right now by offering a settlement.

  • wyager 15 hours ago

    Lots of people abuse the legal system in various ways. They don't get a free pass just because their abuse is technically legal itself.

energy123 7 hours ago

> Consumer customers: You control whether your chats are used to help improve ChatGPT within settings, and this order doesn’t change that either.

Within "settings"? Is this referring to the dark pattern of providing users with a toggle "Improve model for everyone" that doesn't actually do anything? Instead users must submit a request manually on a hard to discover off-app portal, but this dark pattern has deceived them into think they don't need to look for it.

  • sib301 6 hours ago

    Can you please elaborate?

    • energy123 6 hours ago

      To opt-out of your data being trained on, you need to go to https://privacy.openai.com and click the button "Make a Privacy Request".

      • alextheparrot 3 hours ago

        in the app: Settings ~> Data Controls ~> Improve the model for everyone

  • curtisblaine 6 hours ago

    Yes, could you please explain why toggling "Improve model for everyone" off doesn't do anything, and provide a link to this off-app portal that you mention?

atleastoptimal 16 hours ago

I've always assumed that anything sent to any company's hosted API will be logged forever. To assume otherwise always seemed naive, like thinking that apps aren't tracking your web activity.

  • lxgr 16 hours ago

    Assuming the worst is wise, settling for the worst case outcome without any fight seems foolish.

  • fragmede 14 hours ago

    privacy nihilism is a decision all on its own

    • morsch 13 hours ago

      I'd only call it nihilism if you are in agreement with the grandparent and then do it anyway. Other choices are pretending it's not true (denialism), or just not thinking about it (ignorance). Or you complicate your life by not uploading your private info.

    • Barrin92 4 hours ago

      not really, it's basically just being antifragile. Consider any corporate entity that interacts with you to be an Eldritch horror from outer space that wants to siphon your soul, because that's effectively what it is, and keep your business with them to a minimum.

      It's just realism. Protect your private data yourself; relying on companies or governments to do it for you is, as the saying goes, letting a tiger devour you up to the neck and then asking it to stop at the head.

nraynaud 9 hours ago

Isn't Altman collecting millions of eye scans? Since when did he care about privacy?

yoaviram 13 hours ago

>Trust and privacy are at the core of our products. We give you tools to control your data—including easy opt-outs and permanent removal of deleted ChatGPT chats and API content from OpenAI’s systems within 30 days.

No you don't. You charge extra for privacy and list it as a feature on your enterprise plan. Not even paying Pro customers get "privacy". Also, you refuse to delete personal data included in your models and training data following numerous data protection requests.

  • that_was_good 12 hours ago

    Except all users can opt out. Am I missing something?

    It says here:

    > If you are on a ChatGPT Plus, ChatGPT Pro or ChatGPT Free plan on a personal workspace, data sharing is enabled for you by default, however, you can opt out of using the data for training.

    Enterprise is just opted out by default...

    https://help.openai.com/en/articles/8983130-what-if-i-want-t...

    • bartvk 12 hours ago

      Indeed. Click your profile in the top right, then click the settings icon. In Settings, select "Data Controls" (not "privacy"), and there's a setting called "Improve the model for everyone" (not "privacy" or "data sharing"); turn it off.

      • bugtodiffer 11 hours ago

        so they technically kind of follow the law but make it as hard as possible?

        • bartvk 8 hours ago

          Personally I feel it's okay, but kinda weird. I mean, why not call it privacy? Gray pattern, IMHO. For example, venice.ai simply doesn't have a privacy setting, because they don't use the data from chats. (They do have basic telemetry, and the setting is called "Disable Telemetry Collection".)

    • atoav 12 hours ago

      Not sharing your data with other users does not mean the data of a deleted chat is gone; those are very likely two completely different mechanisms.

      And whether and how they use your data for their own purposes isn't touched by that either.

    • agos 12 hours ago

      What about all the rest of the data they use for training? There's no opt-out from that.

  • baxtr 12 hours ago

    This is a typical "corporate speak" / "trustwashing" statement. It’s usually super vague, filled with feel-good buzzwords, with a couple of empty value statements sprinkled on top.

paxys 16 hours ago

> Does this court order violate GDPR or my rights under European or other privacy laws?

> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.

That's a lot of words to say "yes, we are violating GDPR".

  • 3836293648 11 hours ago

    No, they're not, because the GDPR has an explicit exception for when a court orders a company to keep data for discovery. It'd only be a GDPR violation if the data is kept after this case is over.

    • lompad 8 hours ago

      This is not correct.

      > Any judgment of a court or tribunal and any decision of an administrative authority of a third country requiring a controller or processor to transfer or disclose personal data may only be recognised or enforceable in any manner if based on an international agreement, such as a mutual legal assistance treaty, in force between the requesting third country and the Union or a Member State, without prejudice to other grounds for transfer pursuant to this Chapter.

      So if, and only if, an agreement between the US and the EU allows it explicitly, it is legal. Otherwise it is not.

  • dragonwriter 15 hours ago

    That's what they are trying to suggest, because they are still trying to use the GDPR as part of their argument challenging the US court order. (It's kind of a longshot to get a US court to agree that the obligation of a US party to preserve evidence, related to a suit filed in US courts under US law by another US party, is mitigated by European regulations, even if they are correct that such preservation would violate obligations the EU has imposed on them.)

  • kelvinjps 16 hours ago

    Maybe they will not store the chats of the European users?

  • esafak 16 hours ago

    Could a European court not have ordered the same thing? Is there an exception for lawsuits?

    • lxgr 16 hours ago

      There is, but I highly doubt a European court would have given such an order (or if they did, it would probably be axed by a higher court pretty quickly).

      There's decades of legal disputes in some European countries on whether it's even legitimate for the government to mandate your ISP or phone company to collect metadata on you for after-the-fact law enforcement searches.

      Looking at the actual data seems much more invasive than that and, in my (non-legally trained) estimate doesn't seem like it would stand a chance at least in higher courts.

      • dragonwriter 14 hours ago

        > There's decades of legal disputes in some European countries on whether it's even legitimate for the government to mandate your ISP or phone company to collect metadata on you for after-the-fact law enforcement searches.

        > Looking at the actual data seems much more invasive than that

        Looking at the data isn't involved in the current order, which requires OpenAI to preserve and segregate the data that would otherwise have been deleted. The reason for segregation is that any challenges OpenAI raises to providing that data in discovery will be heard before anyone other than OpenAI is given access to it.

        This is, in fact, less invasive than the government mandating collection for speculative future uses, since it applies only to not destroying evidence already collected by OpenAI in the course of operating their business, and only for potential use, subject to other challenges by OpenAI, in the present case.

amluto 16 hours ago

It appears that the “Zero Data Retention” APIs they mention are something that customers need to request access to, and that it’s really quite hard to get this access. I’d be more impressed if any API user could use those APIs.

  • JimDabell 16 hours ago

    I believe Apple’s agreement includes this, at least when a user isn’t signed into an OpenAI account:

    > OpenAI must process your request solely for the purpose of fulfilling it and not store your request or any responses it provides unless required under applicable laws. OpenAI also must not use your request to improve or train its models.

    https://www.apple.com/legal/privacy/data/en/chatgpt-extensio...

    I wonder if we’ll end up seeing Apple dragged into this lawsuit. I’m sure after telling their users it’s private, they won’t be happy about everything getting logged, even if they do have that caveat in there about complying with laws.

    • fc417fc802 15 hours ago

      > I’m sure after telling their users it’s private, they won’t be happy about everything getting logged,

      The ZDR APIs are not and will not be logged. The linked page is clear about that.

  • singron 16 hours ago

    If OpenAI cared about our privacy, ZDR would be a setting anyone could turn on.

CjHuber 9 hours ago

Even though how they responded is definitely controversial, I'm glad that they did publish some response to it. After reading about it in the news yesterday and seeing no response on their side yet, I was worried that they would just keep silent.

conartist6 8 hours ago

Hey OpenAI! In your "why is this happening" you left some bits out.

You make it sound like they're mad at you for no reason at all. How unreasonable of them when confronted with such honorable folks as yourselves!

WorldPeas 15 hours ago

So how is this going to impact Cursor's privacy mode, which many companies require for compliant usage of AI editors? For the uninitiated, in the web console this looks like:

Privacy mode (enforced across all seats)

OpenAI Zero-data-retention (approved)

Anthropic Zero-data-retention (approved)

Google Vertex AI Zero-data-retention (approved)

xAI Grok Zero-data-retention (approved)

did this just open another can of worms?

  • qmarchi 15 hours ago

    Likely, they're using OpenAI's Zero-Retention APIs where there's never data stored in the first place.

    So nothing?

    • JumpCrisscross 14 hours ago

      > OpenAI's Zero-Retention APIs

      Do we know if the court order covers these?

      • brigandish 14 hours ago

        Yes, follow the link at the top.

        • JumpCrisscross an hour ago

          > Yes, follow the link at the top

          OpenAI says “this does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.”

  • 8note 15 hours ago

    At least, OpenAI zero-data-retention will by court order become full retention.

    I'm excited that the law is going to push for local models.

    • blerb795 15 hours ago

      The linked page specifically mentions that these ZDR APIs are not impacted.

      > This does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.

Kiyo-Lynn 9 hours ago

Lately I’m not even sure if the things I say on OpenAI are really mine or just part of the platform. I never used to think much when chatting, but knowing some of it might be stored for a long time makes me feel uneasy. I’m not asking for much. I just want what I delete to actually be gone.

dataflow 14 hours ago

> ChatGPT Enterprise and ChatGPT Edu: Your workspace admins control how long your customer content is retained. Any deleted conversations are removed from our systems within 30 days, unless we are legally required to retain them.

I'm confused, how does this not affect Enterprise or Edu? They clearly possess the data, so what makes them different legally?

  • oxw 14 hours ago

    Enterprise has an exemption granted by the judge

    > When we appeared before the Magistrate Judge on May 27, the Court clarified that ChatGPT Enterprise is excluded from preservation.

    • dataflow 14 hours ago

      Oh I missed that part, thanks. I wonder why. I guess the judge assumes it isn't being used for copyright infringement, but other plans might be?

      • bee_rider 14 hours ago

        No idea, but just to speculate—the court’s goal isn’t actually to scare OpenAI’s users or harm their business, right? It is to collect evidence. Maybe they just figured they don’t need to dip into that pool to get enough evidence.

      • Grikbdl 13 hours ago

        Who knows, it's probably the judge's twisted idea of "that'd be too far", as if cancelling basic privacy expectations of all users everywhere wouldn't be.

Caelus9 14 hours ago

Honestly, this incident makes me feel that it is really difficult to draw a clear line between "protecting privacy" and "obeying the law". On the one hand, I am very relieved that OpenAI stood up and said "no"; after all, we all know that these systems collect everything by default, which is a little panic-inducing. But on the other hand, it sounds very strange that the court can directly say "give me all the data", even the data that users explicitly deleted. Moreover, this also shows that everyone actually cares about their information and privacy now. No one wants their data used casually for just anything.

dumbmrblah 15 hours ago

So is this for all chats going forward or does it include conversations retroactively?

  • steve_adams_86 15 hours ago

    Presumably moving forward, because otherwise the data retention policies wouldn't have been followed correctly (from what I understand)

mediumsmart 8 hours ago

It's a newspaper. Newspapers are sold for a price, not to one person, and they don't come with an NDA. They become part of history and society.

wand3r 14 hours ago

Does anyone know how this can be enforced?

The ruling and the situation aside, to what degree is it possible to enforce something like this, and what are the penalties? Even in GDPR and other data-protection cases, enforcement seems super hard. Directives to keep or delete data basically require system-level access, because the company can always CRUD its data whenever it wants and in whatever way serves its interests. Data can be ordered produced to the court periodically and audited, which could maybe catch an individual case, I guess. But there is basically no way to know without literally seizing the servers in an extreme case. Also, the consequence in most cases is just a fine.

  • mmooss 13 hours ago

    This isn't the executive branch of the US government, which has Constitutional powers. It's a private company, and the court can at least impose massive penalties, presumptions against them at trial (causing them to lose), and contempt of court. Talk to a lawyer before you try something like this.

    • imiric 12 hours ago

      > the court can at least enforce massive penalties

      A.k.a. the cost of doing business.

      • mmooss an hour ago

        Businesses care deeply about money. The bravado of many businesspeople these days, that they are immune to criticism, lawsuits, etc. is a bluff. It apparently works, because many people repeat it.

landonxjames 14 hours ago

Repeatedly calling the lawsuit baseless makes OpenAI’s point a lot weaker. They obviously don’t like the suit, but I don’t think you can credibly argue that there aren’t tricky questions around the use of copyrighted materials in training data. Pretending otherwise is disingenuous.

  • sigilis 3 hours ago

    They pay their lawyers, and whoever made this page, a lot for the express purpose of credibly arguing that it is very clearly legal, and very cool, to use any IP they want to train their models.

    Could you argue with a straight face that the NYT could be a surrogate girlfriend for you the way a GPT can? They maintain that this is obviously transformative use and therefore not an infringement of copyright. You and I may disagree with that assertion, but you can see how they could view the suit as baseless, ridiculous, and frivolous when their livelihoods depend on that being the case.

udev4096 4 hours ago

The irony is palpable here

lxgr 16 hours ago

Does anybody know if this also applies to "temporary chats" on ChatGPT?

Given that it's not explicitly mentioned as data not being affected, I'm assuming it is.

dvt 13 hours ago

> Does this court order violate GDPR or my rights under European or other privacy laws?

> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.

So basically no, lol. I wonder if we'll see the GDPR go head-to-head with Copyright Law here, that would be way more fun than OpenAI v NYT.

mosdl 16 hours ago

It's funny that OpenAI is complaining; they don't mind saying copyright doesn't apply to them when it makes them money.

  • ivape 9 hours ago

    In retrospect, Bezos did the smartest thing by buying the Washington Post. In retrospect, Google did a great thing by working out a deal with Reddit. Content repositories/creators are going to sue these LLM companies in the West until they make licensing agreements. If I were OpenAI, I'd work hard to spend the money they raised to literally buy out as many of these outlets as possible.

    How much could the NYT back catalog be worth? Just buy it, ask the Saudis.

  • tptacek 15 hours ago

    [flagged]

    • mmooss 13 hours ago

      People here advocate for private use, not profit-making corporate use.

    • rasengan 15 hours ago

      The internet is the battle of the narratives.

    • gnabgib 15 hours ago

      [flagged]

      • tptacek 15 hours ago

        I'm sorry, do you think I'm kidding about that? "Situational ethics about copyright" seems like a weird charge to throw around here. I'm not being glib.

        • fc417fc802 15 hours ago

          Suppose that you genuinely believe that the majority of HN users suffer from "situational ethics" regarding copyright that work in their favor rather than according to any particular principle. It would remain entirely irrelevant to a discussion about a third party. Absolute best case scenario it's a red herring.

          That said the comment you replied to doesn't even seem to make sense in the first place. What does "violation of copyright for financial gain" (alleged, controversial) have to do with "complaining about the specifics of a court order"?

          • mosdl 3 hours ago

            They only care about rights when it helps them.

        • juped 14 hours ago

          I think you meritlessly flamed someone when you should have flagged their comment and moved on. Copyright isn't remotely relevant to this litigation hold at all, after all, why start a fight over it?

john2x 15 hours ago

Does this mean that if I can get ChatGPT to generate copyrighted text, they'll get in trouble?

vessenes 14 hours ago

This is a massive overreach by the judge: not in the nature of the request ("please don't destroy data that might contain proof my case is strong"), but in its scale. But shame on NYT for asking.

This request also equals: "Please keep a backup of every Senator's private chats, every Senator's spouse's private chats, every military commander's personal chats, every politician in a foreign country, forever."

There is no way that data will stay safe forever. There is no way that, once such a facility is built, it will not be used constantly, by governments all over the world.

The NYT case currently seems to turn on whether OpenAI users use ChatGPT to circumvent paywalls. Maybe they do, although when the suit was filed, 3.5 was definitely not a reliable witness to what NYT articles were about. There are 400 million MAUs at ChatGPT - more than the population of the US.

To my mind there are three tranches of information that we could find out:

1. People's primary use case for ChatGPT is to get NYT articles for free. Therefore oAI is a bad actor making a tool that largely got profitable off infringing NYT's copyright.

2. Some core segment used/uses it for infringement purposes; not a lot, but it's a use case that sells licenses.

3. This happens, but just vanishingly rarely compared to most use cases of the tool.

I'd imagine different rulings and orders to cure in each of these circumstances, but why is it that the court needs to know any more than some percentages?

Assuming a 10k-token system prompt, 500 tokens of chat, 400mm people, and five chats a week, that comes to roughly 67 terabytes of data per week(!) No metadata, just ASCII output.

Nobody, ever, will read all of this. In fact, it would take about 24 hours for a Seagate drive just to push all the bytes down a bus, much less process any of it. Why not agree on representative searches, get a team to spot-check the data, and go from there?
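
A rough sanity check of that arithmetic (a sketch in Python; the bytes-per-token ratio and bus speed are assumptions I've filled in, since they aren't stated above):

  # Back-of-envelope check of the figures above; every input is an assumption.
  tokens_per_chat = 10_000 + 500               # system prompt + one chat
  chats_per_week = 400_000_000 * 5             # 400mm users, five chats each
  bytes_per_token = 3.2                        # assumed ASCII bytes per token

  total_bytes = tokens_per_chat * chats_per_week * bytes_per_token
  print(f"{total_bytes / 1e12:.0f} TB/week")   # -> 67 TB/week

  # Time to push one week's bytes down a ~6 Gb/s (~750 MB/s) bus,
  # ignoring drive internals entirely:
  hours = total_bytes / 750e6 / 3600
  print(f"{hours:.0f} hours to move it once")  # -> ~25 hours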

Personally, I would guess the percentage of "infringement" use cases, IF it is even infringement to get an AI to quote a news article verbatim while it is NOT infringement for Cloudflare to serve a verbatim copy of a news article, is going to be tiny, tiny, tiny.

NYT should back the fuck off, remember it's supposed to be a force for good in the world and not be the cause of massive possible downstream harm to people all over the world.

  • DrillShopper 31 minutes ago

    > There is no way that data will stay safe forever. There is no way that, once such a facility is built, it will not be used constantly, by governments all over the world.

    That's on OpenAI for deciding to retain this data in the first place. They could just not have done that. That was a choice, their choice, and therefore they're responsible for it.

  • fallingknife 4 hours ago

    It's obviously 3, because the entire point of the NYT is that it's a newspaper, and probably 99% of their traffic goes to articles new enough that they haven't had time to make it into the training data. So anybody who wanted to use ChatGPT to breach the NYT paywall couldn't get any new articles. Also, there are so many other ways to breach a paywall that you would have to be insane to try to do it by prompt-engineering ChatGPT. The whole case is a scam, and I hope the court makes them pay OpenAI's legal fees.

delusional 14 hours ago

I have no time for this circus.

The technology anarchists in this thread need perspective. This is fundamentally a case about the legality of the product. In the extreme case, it will render the whole product category of "LLMs trained on copyrighted content" illegal. In that case, you will have been part of copyright infringement on a truly massive scale. The users of these tools do NOT deserve privacy in light of the crimes alleged.

You do not get to claim to protect the privacy of the customers of your illegal venture.

kingkawn 15 hours ago

Once the data is kept, it is only a matter of time until a new must-try use for it is born.

dangus 16 hours ago

I think the court order doesn’t quite go against as many norms as OpenAI is claiming. It’s very reasonable to retain data pertinent to a case, and NYT’s case almost certainly revolves around finding out copyright infringement damages, which are calculated based on the number of violations (how many users queried ChatGPT and were returned verbatim copyrighted material from NYT).

If you don’t retain that data you’re destroying evidence for the case.

It’s not like the data is going to be given to anyone; it’s only going to be used for limited legal purposes in the lawsuit (as OpenAI confirms in this article).

And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem. I saw NYT’s filing and it had very compelling evidence that you could get ChatGPT to distribute verbatim copyrighted text from the Times without citation.

  • lxgr 16 hours ago

    It absolutely goes against norms in many countries other than the US, and the data of residents/citizens of these countries are affected too.

    > It’s not like the data is going to be given to anyone, it’s only gong to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).

    Nobody other than both parties to the case, their lawyers, the court, and whatever case file storage system they use. In my view, that's already way too much given the amount and value of this data.

    • dangus 6 hours ago

      Countries other than the US aren't party to this lawsuit. ChatGPT operates in the US under US law. I don't know if they have separate data storage for other countries.

      I don't believe you would be considered to be violating the GDPR if you are complying with another court order, because you are presumably making a best effort to comply with the GDPR besides that court order.

      You're saying it's unreasonable to store data somewhere for a pending court case? Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information. That's ridiculous, if that was true then it would be impossible to perform discovery and get anything done in court.

      • lxgr 3 hours ago

        > I don't believe you would be considered to be violating the GDPR if you are complying with another court order, because you are presumably making a best effort to comply with the GDPR besides that court order.

        It most likely depends on the exact circumstances. I could absolutely imagine a European court deciding that, sorry, but if you have to answer to a court decision incompatible with European privacy laws, you can't offer services to European residents anymore.

        > You're saying it's unreasonable to store data somewhere for a pending court case?

        I'm saying it can be, depending on how much personal and/or unrelated data gets tangled up in it. That seems to be the case here.

        > Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information.

        I'm only saying that there should be proportionality. A court having access to all facts relevant to a case is important, but it's not the only important thing in the world.

        Otherwise, we could easily end up with a Dirk-Gently-esque court that, based on the principle that everything is connected to everything, will just demand access to all the data in the world.

        • dangus 2 hours ago

          The scope of the data access required by the court is being worked out via due process. That’s why there’s an appeal system. OpenAI is just grandstanding in a public forum so that their customers don’t defect.

          When it comes to the GDPR, US courts have generally taken the stance that it does not override discovery obligations:

          Ironburg Inventions, Ltd. v. Valve Corp.

          Finjan, Inc. v. Zscaler, Inc.

          Corel Software, LLC v. Microsoft

          Rollins Ranches, LLC v. Watson

          In none of these cases was a GDPR fine issued.

  • tptacek 16 hours ago

    And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem

    The whole premise of the lawsuit is that they didn't do anything unlawful, so saying "just do what the NYT wanted you to do" isn't interesting.

    • dangus 6 hours ago

      No, you're misinterpreting how information discovery and the court system works.

      The NYT made an argument to a judge about what they think is going on and how they think the copyright infringement is taking place and harming them. In their filings and hearings they present the reasoning and evidence they have that leads them to believe that a violation is occurring. The court makes a judgment on whether or not to order OpenAI to preserve and disclose information relevant to the case to the court.

      It's not "just do what NYT wanted you to do," it's "do what the court orders you to do based on a lawsuit brought by a plaintiff and argued to the court."

      I suggest you read the court filing: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

  • danenania 16 hours ago

    Putting the merits of this specific case and positive vs. negative sentiments toward OpenAI aside, this tactic seems like it can be used to destroy any business or organization with customers who place a high value on privacy—without actually going through due process and winning a lawsuit.

    Imagine a lawsuit against Signal that claimed some nefarious activity, harmful to the plaintiff, was occurring broadly in chats. The plaintiff can claim, like NYT, that it might be necessary to examine private chats in the future to make a determination about some aspect of the lawsuit, and the judge can then order Signal to find a way to retain all chats for potential review.

    However you feel about OpenAI, this is not a good precedent for user privacy and security.

    • fc417fc802 15 hours ago

      That's not entirely fair. The argument isn't "users are using the service to break the law" but rather "the service is facilitating law-breaking". To fix your Signal analogy, suppose you could use the chat interface to request copyrighted material from the operator.

      • charcircuit 15 hours ago

        That doesn't change the outcome being the same: either way, the app has to hand over everyone's plaintext messages, including the chat history of every user.

        • fc417fc802 14 hours ago

          Right. But requiring logs due to suspicion that the service itself is actively violating the law is entirely different from doing so on the basis that end users might be up to no good entirely independently.

          Also OpenAI was never E2EE to begin with. They were already retaining logs for some period of time.

          My personal view is that the court order is overly broad and disregards potential impacts on end users but it's nonetheless important to be accurate about what is and isn't happening here.

        • dangus 5 hours ago

          Again, keep in mind that we are talking about case-limited analysis of that data within the privacy of the court system.

          For example, if the trial happens to turn up chats in which users committed crimes, the court can't just send police to your door based on that information, since it is only being used in the context of an intellectual property lawsuit.

          Remember that privacy rights are legitimate rights but they change a lot when you're in the context of an investigation/court proceeding. E.g., the right of police to enter and search your home changes a lot when they get a court issued warrant.

          The whole point of E2EE services, from the perspective of privacy-conscious customers, is that a court can get a warrant for data from those companies but they'll only be able to produce encrypted blobs with no access to decryption keys. OpenAI was never an E2EE service, so customers have to expect that a court order could surface their data to someone else's eyes at some point.

    • dangus 6 hours ago

      I'm confused at how you think that NYT isn't going through due process and attempting to win a lawsuit.

      The court isn't saying "preserve this data forever and ever and compromise everyone's privacy," they're saying "preserve this data for the purposes of this court while we perform an investigation."

      IMO, the NYT has a very good argument here that the only way to determine the scope of the copyright infringement is to analyze requests and responses made by every single customer. Like I said in my original comment, the remedies for copyright infringement are on a per-infringement basis. E.g., every time someone on LimeWire downloads Song 2 by Blur from your PC, you've committed one instance of copyright infringement. My interpretation is that NYT wants the court to find out how many times customers have received ChatGPT responses that include verbatim New York Times content.

6510 13 hours ago

The harm this is doing and will do (regardless) seems to exceed the value of the NYT.

If a company is subject to a US court order that violates EU law, the company could face legal consequences in the EU for non-compliance with EU law.

The GDPR mandates specific consent and legal bases for processing data, including sharing it.

Assuming it is legal to share it for legal purposes, one can't sufficiently anonymize the data. It needs to be accompanied by user data that allows requests to download it and have it deleted.

I wonder what the fine would be if they just deleted it per the user agreement.

I also wonder: could one, in the US, legally promise customers that they may delete their data, then choose to keep it indefinitely and share it with others?

FireBeyond 16 hours ago

Sure, OpenAI, I will absolutely trust you.

> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.

That's horse shit and OpenAI knows it. It means no such thing. A legal hold is just a 'preservation order'. It says absolutely nothing about other access or use.

  • mmooss 13 hours ago

    OpenAI's other policies, and other laws and regulations, do have such requirements. Are they nullified because the data is held under a court order?

    • mrguyorama an hour ago

      "The judge and court need to view this information to actually pass justice and decide the case" almost always supersedes other laws.

      The GDPR does not say that you can never be proven to have done something wrong in a court of law.

      • mmooss an hour ago

        Right. The GGP says the information could be used for other purposes.

  • fragmede 16 hours ago

    why is it horse shit that OpenAI is saying they've put the files in a cabinet that only legal has access to?

    • FireBeyond 4 hours ago

      They are saying a “legal hold” means that they have to keep the data but, don't worry, they're not allowed to use it or access it for any other reason.

      A legal hold requires no such thing, and there would be no such restriction in it. They are perfectly free to access and use the data for any reason.

enyt1 15 hours ago

[flagged]

RcouF1uZ4gsC 14 hours ago

[flagged]

  • staticman2 6 hours ago

    That was the incident when Alexander said he didn't want an article about him turning up on the top results in Google when people were considering hiring him as a medical doctor.

    Using Alice in Wonderland logic, this somehow meant his fans were meant to declare holy war on the New York Times but not Google.

  • mmooss 13 hours ago

    The argument that it's doxxing was an attempt to undermine journalism.

    The NYT's job is to uncover information that powerful people don't want uncovered, in the public interest. If someone is wielding significant influence anonymously, it's valuable to the public to know who it is.

    Doxxing is publishing as much info as possible to harm vulnerable people. The NYT didn't publish Alexander's home address, cell number, credit cards, or pictures of his kids, or encourage people to track him down.

DanAtC 16 hours ago

[flagged]

vanattab 16 hours ago

Protect our privacy? Or protect their right to piracy?

  • charrondev 16 hours ago

    I mean, the court is ordering them to retain user conversations at least until resolution of the case (in case copyrighted responses are being generated?).

    So user privacy is definitely implicated.

  • NBJack 16 hours ago

    Agreed. I don't buy the spin.

tiahura 14 hours ago

Every concerned ChatGPT user should file an emergency motion to intervene and a request for a stay of the order. ChatGPT can help you draft the motion and proposed order; just give it a copy of the discovery order. The SDNY has a very helpful pro se hotline.

The order the judge issued is irresponsible. Maybe OpenAI did get too cute in its discovery responses, but the remedy isn’t to trample the rights of third parties.

junto 15 hours ago

This is disingenuous from OpenAI.

They are being challenged because NYT believes that ChatGPT was trained with copyrighted data.

NYT naively pushes to find a way to prove that NYT data is being used in user chats, and how often.

OpenAI spins that into "NYT is invading user privacy."

It’s quite transparent as to what they are doing here.

throwaway6e8f 14 hours ago

Agent-1, I want to legally retain all customer data indefinitely but I'm worried about a backlash from the public. Also, I'm having a bunch of problems with the NYT accusing us of copyright violation. Give me a strategy to resolve these issues so that I win in the long term.