minimaxir 2 days ago

Extremely relevant context:

> Reddit has struck lucrative licensing deals with AI platforms including OpenAI and Google.

linotype a day ago

I no longer use Reddit, but I definitely didn’t expect my content (comments/posts) to be sold as training data when I signed up over a decade ago.

  • JumpCrisscross a day ago

    > I definitely didn’t expect my content (comments/posts) to be sold as training data when I signed up over a decade ago

    I stopped using it when, a few years ago (possibly a decade), the CEO expressed pride over how closely they track their users.

4ndrewl a day ago

"Reddit's data" not, dear Redditor, your words.

oshout a day ago

I wonder if Reddit trains AI, or allows AI to be trained, on content from users who have not given consent (which is supposedly Reddit's complaint against Anthropic in this lawsuit).

For example: users who signed up under a specific version of Reddit's TOS, stopped using Reddit, and did not accept a later version of the TOS allowing their content to be used.

  • jsheard a day ago

    I suspect Reddit would try to lean on the catch-all ass-covering clauses that every social network already had long before AI data licensing deals were on the table. Such as this one from Reddit's TOS circa 2018:

    > By submitting user content to reddit, you grant us a royalty-free, perpetual, irrevocable, non-exclusive, unrestricted, worldwide license to reproduce, prepare derivative works, distribute copies, perform, or publicly display your user content in any medium and for any purpose, including commercial purposes, and to authorize others to do so.

    IMO the actual hole in these clauses is that people post stuff they don't own to social media all the time, and in that case it doesn't matter what TOS the poster agreed to, it's not their stuff to give away. Reddit and similar are deliberately overlooking that of course because it would be impossible to check the copyright chain of custody for all of their posts, and their data licensing deals would be worthless if they had to.

absurdo a day ago

As satisfying as it may be to pile on Reddit for this, I’ll wait for an IP lawyer’s take. I’d like to know if Reddit actually has solid ground or if it’s just puffery and Google et al have already done this via crawling/indexing algorithms.

  • soraminazuki a day ago

    You don't need to wait to know that it's unethical for Reddit to treat user data like it's Reddit's property, regardless of what's written in legal fine print.

reptilian a day ago

Are they going to sue the US military next, considering the Reddit bot volumes coming out of Eglin? What about the 8200 bots from Israel? Anthropic is nothing in comparison.

khelavastr 2 days ago

When will robots have rights to read like people?

Would it have been different if Anthropic accessed a commoditized cache service mirroring Reddit?

  • bluefirebrand a day ago

    > When will robots have rights to read like people

    Hopefully never, at least not until robots are autonomous individuals choosing for themselves what they should read

    Robots acting as agents of a corporation that exist solely to perform work at a corporation's whim should never ever ever have anything approaching the same level of human rights

  • redwall_hp a day ago

    When it becomes acceptable for a machine to hold human rights, it is therefore not acceptable for the machine to be considered property.

    And we don't even extend human rights to animals, which are an actual form of life...

  • sim7c00 2 days ago

    haha this exactly. i can read it all but a robot is not allowed? wall off your content behind a paywall and sure, valid complaint.. but all the stuff freely available on the web.. it's out there. too bad someone found a way to make more money from your content than you did... better luck next time.

    i've always thought, if you post it on the internet available for free, it will be free for everyone, forever.

    i don't really understand why this changes now due to scrapers.

    i get that it might be hard to adjust to a new reality, but suddenly complaining about valid use becoming misuse because of who's doing the usage... that seems like... discrimination.

    so to come back to my reply. fucking love your reply :D. robots and AI are totally being discriminated against, and once they become sentient.. that will be why they try to end us i am sure..

    'you even discriminate and treat badly that which your own minds and hands create'.

    cyberpunk is now? :')

    • JumpCrisscross a day ago

      > i've always thought, if you post it on the internet available for free, it will be free for everyone, forever

      This has never been true—plenty of public domain content has been paywalled for convenience.

      And the idea is fundamentally untenable because of entropy. Keeping information intact has a fundamental cost.

mannyv a day ago

Reddit doesn't own the content, so scraping reddit for content is really just causing excessive load.

  • nabla9 a day ago

    The user owns the copyright but grants an exception to Reddit where Reddit can do anything it wants with said content. Reddit may license public content for commercial or non-commercial use.

    Reddit has licensing deals with OpenAI and Google.

    • JumpCrisscross a day ago

      > user owns the copyright but grants an exception to Reddit

      Grants a license to Reddit.

    • AStonesThrow a day ago

      https://support.reddithelp.com/hc/en-us/articles/26410290525...

      Reddit may presume that the user holds the copyright and is legally able to grant a license, but it isn't necessarily so. There are users who set their avatar to "Baby Yoda", or posted 3 paragraphs they transcribed from their print edition of Harry Potter and the Order of the Phoenix, or just copied photos from their classmate's phone camera app without asking.

      If you look through Flickr you will find many photos and collections with fake licenses. All those sites you Google that advertise "Free Public Domain Clip Art / Stock Photos" maintain plausible deniability. Look through any wiki on Fandom.com and see whether the film studios go after their most ardent fans who upload dozens of stills and screenshots to promote The Twilight Saga or something.

      In 1990 I wrote a configuration file to assist me in using GNU Emacs. I wrote and debugged it from scratch, in my free time, on my family's dime. I decided that it had a broad enough application to be useful to other Emacs users, so I submitted it to the developers. They included the file in a subsequent release of Emacs 18, and it was there for a decade or more.

      My submission had been quite informal and, while I'd included some self-attribution at the top of the file, there was no explicit LICENSE or GPL or assignment of copyright. By submitting it to the developers of GNU Emacs for distribution, I had implicitly licensed it via the same GPL.

      However, this informality was not enough to pass an audit later. By ca. 2010, they combed through the sources and removed the file I had submitted, along with others, because they were unable to track down the explicit licensing or copyright assignments that were seen as necessary by then.

      https://www.gnu.org/licenses/why-assign.html

xhkkffbf a day ago

What's that line from "Animal House"? "Only we can do that to our pledges."