If you visited my WordPress blog lately you may have noticed it wasn’t there, replaced by an apology: My blog’s returned, and for those who are curious here’s what happened.
“It isn’t the mountain ahead that wears you out; it is the grain of sand in your shoe.”
- Anonymous (from “The Western Underwriter”, May 18, 1916, page 10)
After seeing a sudden increase in visits to my WordPress blog I reviewed my visitor logs and discovered most of the visitors came from Amazon Web Services in Singapore.
Here’s a chart showing activity from Singapore:
On July 19, 2023 seventy-two (72) different pages on my blog were visited in less than an hour, apparently by a visitor using an Android cellphone: That’s one page every 51 seconds. How impressive!
Now it could be that an actual human was reading my blog very quickly, or maybe Amazon was indexing my website to make it easier to find on the interwebs.
But it might also be an automatic bot from Singapore using Amazon AWS Elastic Compute Cloud (EC2) to systematically scrape the content of my WordPress blog, perhaps to create a duplicate website (typically used for bad purposes), extract content for use or sale somewhere else, or increase the popularity of their or someone else’s website. In other words, someone instantly profiting from years of my work?
Let’s check:
My Visitor Log reported the visitor was using an Android phone with a screen size of 375×812. The only phones having that screen size are the Apple iPhones XS, X, and 11 Pro, and they don’t use the Android OS.
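For the curious, that sanity check can be sketched in a few lines of Python. This is only an illustration, not what my visitor log actually runs: the function name `looks_spoofed` and the lookup table `IPHONE_ONLY_SCREENS` are made up for this example, and the table holds only the one screen size mentioned above.

```python
# Hypothetical lookup table: screen sizes this post treats as iPhone-only.
IPHONE_ONLY_SCREENS = {(375, 812)}


def looks_spoofed(claimed_os: str, width: int, height: int) -> bool:
    """Return True when the claimed OS contradicts the reported screen size."""
    # Check both orientations, since a phone may report landscape dimensions.
    is_iphone_size = (
        (width, height) in IPHONE_ONLY_SCREENS
        or (height, width) in IPHONE_ONLY_SCREENS
    )
    # An iPhone-only screen size paired with any OS other than iOS is suspicious.
    return is_iphone_size and claimed_os.strip().lower() != "ios"
```

A visitor claiming Android at 375×812 is flagged, while the same size reported with iOS is not. A mismatch like this is a hint rather than proof, which is why I went on to test further.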
With my Spidey-Sense tingling I took my WordPress blog offline and redirected visitors to an error page I tracked separately: An actual human would quickly discover the blog wasn’t around anymore and stop visiting, but an automated robot designed to simply follow all the links on my blog would just keep right on going.
…and, they did:
Visitors from Singapore reading a page which doesn’t exist:
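The same test can be roughed out in Python for anyone wanting to try it on their own logs. Treat it as a sketch under assumptions: the `(ip, path)` record format, the `/offline.html` path, the `min_hits` threshold, and the name `repeat_visitors` are all invented for this example and are not taken from my actual setup.

```python
from collections import Counter


def repeat_visitors(hits, error_path="/offline.html", min_hits=5):
    """Return {ip: count} for addresses that keep requesting the error page.

    `hits` is an iterable of (ip, path) pairs, a hypothetical log format.
    """
    # Count only the requests that landed on the tracked error page.
    counts = Counter(ip for ip, path in hits if path == error_path)
    # A human gives up after a try or two; a link-following bot does not.
    return {ip: n for ip, n in counts.items() if n >= min_hits}
```

Counting per address is the simplest version. A scraper that rotates through many addresses would slip under the threshold, so grouping by network or by region (as the chart above does) is likely the more useful view.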
Looks like a robot to me. And now for the boring part: Using my website’s traffic logs along with Whois & IP Subnet tools (and far too much of my time), I identified the CIDR IP ranges covering the IP addresses used for the presumed abuse and modified my website’s access file to deny access to anyone – or any ‘bot – using those ranges. The result is that approx. 25,919,744 IP addresses are now denied access to my website because of (I presume) the content-scraping actions of someone in Singapore using 150 different Amazon AWS IP addresses over 8 days:
This Singaporean became the “pebble” in the internet “shoe” of roughly 26 Million users
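If you are wondering how a list of CIDR ranges turns into a count of blocked addresses, here is a minimal sketch using Python's standard `ipaddress` module. The function name `total_blocked` is my own invention, and the ranges in the usage example are reserved documentation ranges, not the ones I actually blocked.

```python
import ipaddress


def total_blocked(cidrs):
    """Count the distinct addresses covered by a list of CIDR ranges."""
    # strict=False accepts ranges written with host bits set, e.g. 10.0.0.5/16.
    networks = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    # Merge overlapping or adjacent ranges so no address is counted twice.
    merged = ipaddress.collapse_addresses(networks)
    return sum(net.num_addresses for net in merged)


# Example with placeholder ranges: the /25 sits inside the first /24.
print(total_blocked(["198.51.100.0/24", "198.51.100.128/25", "203.0.113.0/24"]))  # 512
print(total_blocked(["10.0.0.0/16"]))  # 65536
```

Merging before counting matters because Whois lookups for neighbouring addresses often return overlapping ranges. The second example also shows how quickly the numbers grow: a single /16 range covers 65,536 addresses.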
I’m hoping my actions stopped this overly-inquisitive visitor, but they won’t completely stop those wishing to exploit your website for profit: Protecting your website content from being scraped, copied, spoofed, or stolen requires using a number of strategies — Click the links in this sentence to find out more on each.
Update: I presume those same Singaporeans are at it again, this time using 27 different Amazon AWS IP addresses to scrape 27 more pages on Nov. 3rd, 2023. The result is that an additional 65,536 IP addresses are now denied access to my website.
Thanks for Reading! (and, for not abusing)