OpenAI’s web scraping GPTBot is under attack – here’s why

There is already plenty of controversy surrounding AI, especially over the use of ChatGPT in papers, articles, and elsewhere. Now OpenAI (the company that developed the ChatGPT chatbot) is kicking up even more with GPTBot, a new bot that scrapes the internet and learns from the content published on the web.

OpenAI likely knew the kind of controversy this would cause, because it released GPTBot without much fanfare or even an official announcement, though there is a support page for the bot that walks you through many of the details. Based on what the company has shared, the bot appears to be a web crawler that scrapes content to help improve the company’s language models.

So what’s the big deal? Why are so many people upset about this, and why are websites like The Verge scrambling to block the bot from scraping their content? Well, much of it comes down to the age-old question of consent. A lot of the content shared on websites, especially blogs and similar outlets, is original work.

Image: ChatGPT homepage. Image source: Stanislav Kogiku/SOPA Images/LightRocket via Getty Images

Someone has put time and effort into writing or creating that content, and for many, the fact that a bot can simply come by, scrape that information, and learn from it without asking for consent is a huge problem. Additionally, AI is still very young and tends to reproduce information it finds on the web while presenting it as its own. That is plagiarism, something already rampant throughout the web without AI getting involved.

The other big problem is privacy. Because this bot is scraping the internet, it is also picking up usernames, email addresses, and other information that may have been shared in public places. That information could inadvertently end up somewhere it shouldn’t, especially given the current copy/paste problems in AI models like the one powering ChatGPT. We’ve already seen some privacy investigations into ChatGPT cropping up.

Luckily, OpenAI has made it very easy for websites to block GPTBot, and that’s what many have done. But other bots do similar things, and there aren’t easy ways to block them. Blocking GPTBot also does nothing about the thousands (possibly millions) of aggregating sites that rip off content daily. So GPTBot is simply joining an already impossible battle that content creators and website owners are fighting.
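For site owners who want to see what such a block looks like in practice, here is a minimal sketch. It assumes the block is done with a standard robots.txt rule aimed at the GPTBot user agent token, and it uses Python’s built-in robots.txt parser to check the rule. The example.com URL, the SomeOtherBot name, and the is_allowed helper are made up for illustration. Keep in mind that robots.txt is a request, not an enforcement mechanism, so it only works for crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: a rule that asks GPTBot to stay away
# from the whole site while leaving every other crawler unaffected.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL, used only to exercise the rule above.
    url = "https://example.com/some-article"
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

The second check illustrates the limitation described above: a rule written for one bot says nothing about any other crawler, so each one has to be named separately.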

We’ll likely see lawsuits over this, especially if OpenAI continues developing GPTBot and pushes it harder as a source for its language models to learn from. These concerns are underlined further by the many existing worries around AI, as there are very few laws governing the advancement of AI systems and how they use data to learn and evolve.

Copyright for syndicated content belongs to the linked Source : BGR – https://bgr.com/tech/openais-web-scraping-gptbot-is-under-attack-heres-why/
