cross-posted from: https://lemmy.world/post/47074737
Spotted in the wild:
Paper from JABDE:
Credit goes to u/TobyWasBestSpiderMan for the original post
Well, then Google shouldn’t have just scraped the site. It’s not JABDE’s responsibility to make their content suitable for LLM training.
It’s everybody’s responsibility not to spray piss in random directions hoping some of it will hit somebody they hate.
One thing about internet sources is that, in general, people engage with them only if they choose to. Your piss-spraying analogy only works if the users don’t have this freedom. At least for now, we the end users still have the choice to engage with LLMs or to navigate elsewhere.
So no, nobody is randomly pissing around hoping that LLM training data is among the things being hit. It’s Big G demanding everything as LLM training data and tossing it on the heap, and someone finding that said heap includes The Onion and individual shitposters and, given their dislike for LLMs, acting accordingly.
Oh one more thing:
Be glad that OP’s site is shitposting.
This could get much worse if it was politically motivated propaganda.
Don’t believe me? Try getting DeepSeek to say anything critical of the CCP.
Ok, so now we make everyone be nice and not post satire or misinformation on the Internet so that the LLMs don’t spread misinformation?
Yeah this will probably work very well…
As if nobody is going to exploit that.
It’s everybody’s responsibility to get fucking literate in the use of media. Examples like these are harmless and just point out how easy it is for malicious actors, be it states, political parties, or other groups and individuals, to spread misinformation.
The use of AI tools such as LLMs makes this even more important.