How to Stop GPT-3 From Stealing Your Content

Over the course of some 20-odd years, I've witnessed the web explode into something that probably not even Sir Timothy John Berners-Lee could have imagined. From the simple HTML page to the most advanced mobile app, the web has evolved into an impressive force to be reckoned with. But if you're not careful with how your own content is accessed, you could spend more time trying to protect your corner of the web than you spend building it.

For some people, the only alternative is to watch spiders, crawlers, and other bots whore their content out to anyone who'll pay for it, all in the name of "automated content creation."

Say What?!

According to Imperva's 2020 Bad Bot Traffic Report, 37.2 percent of website visits are actually bot hits. That's an 18.1 percent increase from 2019. And it includes bots commonly used to steal copyrighted content. Some of the most popular content generators, including GPT-3, use these bots to illegally copy content and then sell bits and pieces of it to unsuspecting users. Not only is this a blatant disregard for copyright laws; it's a gateway crime that leads to plagiarized material.

This, despite the fact that one of the easiest things to understand about copyright is right there in its name: copy + right. It literally means the right to copy something. And it's a right that always remains with the original creator.

Not a scraper.

And not a bot.

What Do I Mean by Gateway Crime?

Since Internet bots cannot possibly be an original creator, they have no right to copy anything. So each time they're used to pirate someone else's material, they're being used to break the law. That's the crime.

The offenses don't stop there, of course. They continue with people who write with GPT generators and who unknowingly publish someone else's copyrighted material (quite possibly, yours) as their own. That's the plagiarism.

Even if the companies behind GPT generators sell their stolen goods under the guise of "re-written content", "AI content," or worse, "machine learned content" (eye-roll), you, as an original content creator, mustn't be fooled into thinking that what they're doing is anything but committing literary robbery.

Knuckleheads Don't Heed the Consequences

Aside from risking a DMCA defeat, people who use GPT generators risk having to pay royalties when caught or repay the profits that the original creator lost. A bank-breaking copyright lawsuit is always an unpleasant repercussion. So is jail time. It's enough to deter any normal person from plagiarizing.

The companies responsible for them?

Not so much.

They're a special breed that takes hard-headedness to a galactic level. Fortunately, there are a few things you can do to stop them from stealing your content and selling it to unwitting writers now.

How to Thwart Content Thieves

  • MAKE PERISHABLEWEB YOUR FIRST STOP

Read PerishableWeb's article on How to Block Bad Bots. PerishableWeb will show you how to block the bots used by GPT generators and others through your .htaccess file. Blocking them in your robots.txt file, as commonly recommended, is simply not enough.

You have to block them via your .htaccess file because content thieves don't give a damn about what's inside of a robots.txt file. They actively ignore it, in fact. You can tell by the number of sites that teach people how!
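To give you a feel for what a user-agent block in .htaccess looks like, here's a minimal sketch in Python that writes the rules for you. It assumes your server runs Apache with mod_rewrite enabled. The bot names (ExampleScraper, HypotheticalBot) are made up for illustration, so swap in the names you actually find in your own logs or in PerishableWeb's list. This is one common way to do it, not necessarily the exact rules PerishableWeb recommends.

```python
import re

# Only plain names are accepted so that nothing with a special meaning in
# Apache's regex or argument syntax ends up in the generated rules.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def build_user_agent_block(bad_agents):
    """Return .htaccess lines (mod_rewrite) that send 403 Forbidden to any
    request whose User-Agent contains one of the given names."""
    if not bad_agents:
        raise ValueError("need at least one user-agent name")
    for agent in bad_agents:
        if not SAFE_NAME.fullmatch(agent):
            raise ValueError(f"unsupported characters in {agent!r}")
    pattern = "|".join(bad_agents)
    return "\n".join([
        "RewriteEngine On",
        # [NC] makes the match case-insensitive.
        f"RewriteCond %{{HTTP_USER_AGENT}} ({pattern}) [NC]",
        # "-" means no rewrite; [F] answers 403 Forbidden, [L] stops processing.
        "RewriteRule .* - [F,L]",
    ])


if __name__ == "__main__":
    # Hypothetical bot names, for illustration only.
    print(build_user_agent_block(["ExampleScraper", "HypotheticalBot"]))
```

Paste the printed lines into your .htaccess file. Keep in mind that a user-agent string is easy to fake, which is why the traffic-log checks further down still matter.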

  • REGULARLY MONITOR YOUR TRAFFIC LOGS

You'll also have to monitor your traffic logs for suspicious (excessive) activity. If you notice, for example, a bunch of successive hits from one IP address, you're probably looking at a content thief, especially if those hits fail to load a webpage's accompanying images or scripts.
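If you'd rather not eyeball the logs, the check described above can be roughed out in a few lines of Python. This sketch assumes your server writes the common/combined access log format, and it simplifies "successive hits" down to a plain count per IP address. The threshold of 20 hits and the list of asset extensions are my own assumptions, so tune them to your site.

```python
import re
from collections import Counter

# Pulls the client IP and the requested path out of a common/combined log line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)')

# Assumed list of extensions that count as "accompanying images or scripts".
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico")


def find_suspects(log_lines, min_hits=20):
    """Return IPs with at least min_hits page requests and zero asset requests."""
    page_hits = Counter()
    asset_hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, path = match.groups()
        path_only = path.split("?", 1)[0].lower()
        if path_only.endswith(ASSET_EXTENSIONS):
            asset_hits[ip] += 1
        else:
            page_hits[ip] += 1
    return sorted(
        ip for ip, hits in page_hits.items()
        if hits >= min_hits and asset_hits[ip] == 0
    )
```

To run it against a real log, pass an open file, for example `find_suspects(open("access.log"))`, where access.log stands in for wherever your host keeps its logs. Anything it returns is a lead worth investigating, not proof of theft.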

Want to see what's been stolen from you thus far? Have a look at your traffic logs right now and then search for IP addresses that start with 5., 52., 54., or 13. If you see a bunch of them, there's a good chance your content is being stolen and has probably been stolen numerous times in the past too. Many addresses in those ranges belong to Amazon's cloud service, a service that some content generators love using to steal copyrighted material.
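Matching on the first number of an IP address is a blunt instrument, because not every address that starts with 52. or 13. belongs to Amazon. A more precise approach, sketched below, is to test each address against real network ranges using Python's ipaddress module. As far as I know, Amazon publishes its current ranges as a JSON file (ip-ranges.json) with a "prefixes" list whose entries carry an "ip_prefix" key, and that's the shape this sketch assumes. The range in the example is a reserved documentation range, not a real Amazon one.

```python
import ipaddress


def load_networks(ip_ranges):
    """Turn the parsed JSON (assumed shape: {"prefixes": [{"ip_prefix": ...}]})
    into a list of network objects."""
    return [
        ipaddress.ip_network(entry["ip_prefix"])
        for entry in ip_ranges.get("prefixes", [])
    ]


def is_in_networks(ip, networks):
    """Return True if the IP address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


if __name__ == "__main__":
    # Placeholder data: 203.0.113.0/24 is reserved for documentation.
    sample = {"prefixes": [{"ip_prefix": "203.0.113.0/24"}]}
    networks = load_networks(sample)
    print(is_in_networks("203.0.113.9", networks))
    print(is_in_networks("198.51.100.1", networks))
```

Feed it the real file (loaded with json.load) and the IP addresses from your logs, and you'll know which hits actually came from Amazon's cloud rather than guessing from the first octet.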

  • OTHER RECOMMENDED STRATEGIES

Some people rely on CAPTCHA, the Completely Automated Public Turing test to tell Computers and Humans Apart tool. Others may rely on commercial anti-bot services like Cloudflare. Both of those tools have varying success. Personally, I've made reasonable progress by doing a combination of things: (1) using a honeypot, (2) closely monitoring my traffic logs, and (3) blocking badly behaving IP addresses in my site's .htaccess file.
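Here's roughly how those three pieces can fit together, as a sketch rather than my exact setup. It assumes the honeypot is a hidden link to a page no human would ever visit (the path /trap-page/ is hypothetical), that your logs use the common/combined format, and that your server runs Apache 2.4, where "Require not ip" is the current way to deny an address. Older Apache 2.2 setups use different directives.

```python
import ipaddress
import re

# Pulls the client IP and the requested path out of a common/combined log line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)')


def honeypot_visitors(log_lines, trap_path):
    """Return the sorted IPs that requested the honeypot path."""
    visitors = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).split("?", 1)[0] == trap_path:
            visitors.add(match.group(1))
    return sorted(visitors)


def build_deny_block(ips):
    """Return an Apache 2.4 .htaccess block that refuses the given IPs."""
    lines = ["<RequireAll>", "    Require all granted"]
    for ip in ips:
        # ip_address() raises ValueError on anything that is not a valid IP,
        # so junk from the log never reaches the config file.
        lines.append(f"    Require not ip {ipaddress.ip_address(ip)}")
    lines.append("</RequireAll>")
    return "\n".join(lines)
```

Run honeypot_visitors over your log, look the list over by hand, and only then paste the output of build_deny_block into .htaccess. Checking by hand matters because a search engine bot that wanders into the trap would get locked out too.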

Every once in a while, you should copy a sentence from a page on your website and search for it as well. If you see the sentence in a search result that points to a website that isn't yours, your stuff's being stolen, and it's time to start filing DMCAs and/or looking at your legal options.

The Lucky 1%

Who makes up the one percent of bots that shouldn't be blocked, you ask? How about the bots that list your website in their search engine: Bingbot (Bing), Googlebot (Google), Slurp (Yahoo), etc. If you're not sure whether you should block or allow a particular bot, however, ask yourself, "For what reason would this bot need to access all of my site's content?" If you can't think of a good reason, don't allow it.

There's absolutely no reason for just any bot to download all of your content, especially now that you know it could be stolen right from under you.

submitted by /u/josourcing