LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI

geneva_convenience@lemmy.ml · 2 months ago

r00ty@kbin.life · 2 months ago

I blocked the entire ASN for Meta, because they were downright dirty with their scraping: no gradual crawling, fake UAs, and random addresses spread across a large number of subnets.
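
For anyone wondering what blocking by ASN amounts to in practice, here is a minimal sketch. It assumes you have already exported the list of prefixes the ASN announces. The prefixes below are documentation placeholders, not Meta's real ranges, and `BLOCKED_PREFIXES` and `is_blocked` are made-up names for illustration.

```python
import ipaddress

# Placeholder prefixes from the reserved documentation ranges, NOT real
# Meta ranges. In practice this list would be the prefixes announced by
# the ASN you want to block, pulled from a routing data source.
BLOCKED_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked prefix."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to check in one pass.
    return any(addr in net for net in BLOCKED_PREFIXES)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "203.0.113.9", "2001:db8::1"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

The check looks only at which network the request comes from, so a faked UA or hopping between addresses inside the same prefixes makes no difference.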

They weren’t the only ones either. The AI scraping heist is the new gold rush.