One of my sites has been under mass attack by bots for a month now, without cease. It’s cost me (and my developer partner) time, money, and an undue amount of stress. It’s undermined my analytics and stats.
And while there are a couple things we’ve been able to do to minimize the damage, essentially there’s no way to stop it. It just. keeps. coming. And frankly, if it continues, and spreads, there could be big repercussions across the web on ad revenue and analytics.
What I Know
The attack started on February 21, 2012, around noon. I keep an OCD-level eye on my traffic, and I noticed a big jump in direct traffic. This is unusual, because this particular site is less than a year old, and has not had a chance to develop a lot of branding yet. It’s pretty well situated in the search engines for its niche, but not that many people know it by name. Anything more than twenty or thirty percent direct traffic would definitely be odd. And then I started noticing some other strange behaviors:
- All the traffic was reported as Internet Explorer (versions 6 through 9)
- All the traffic was reported as Windows (XP through Win 7)
- The traffic was coming from all over the world (and the site is focused on ONE state in the US) and from thousands of IP addresses and ISPs.
- It was all hitting the home page and leaving immediately. My bounce rate quickly soared to about 99%
- There was nothing – no one thing – that I could pinpoint to block this traffic from coming in. No commonality.
Strangest of all, the traffic was *slow* – drip drip drip. Never enough to come anywhere near a DDoS, or to have an effect on the server, but at any given point, there would be six to ten “visitors” on the site at a time. While it looked very much like actual human browser traffic, it wasn’t difficult to conclude that this was something automated.
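For anyone who wants to measure how much of their own traffic fits this pattern, here is a rough Python sketch. It is not what we ran; it just assumes you have already parsed your access log into dictionaries with hypothetical `path`, `referrer`, and `user_agent` keys. The regex encodes the traits listed above (IE 6 through 9 on Windows XP through Win 7, home page only, no referrer). It will also match plenty of real visitors, so treat it as a way to count, not a way to block.

```python
import re

# Assumed signature: IE 6-9 on Windows XP (NT 5.1), Vista (NT 6.0) or 7 (NT 6.1).
# Real users match this too, so this is for estimating, not for blocking.
IE_ON_WINDOWS = re.compile(r"MSIE [6-9]\.\d+;.*Windows NT (5\.1|6\.0|6\.1)")


def matches_signature(hit):
    """Return True if a parsed log entry looks like the traffic described above.

    `hit` is a dict with hypothetical keys: 'path', 'referrer', 'user_agent'.
    """
    is_home_page = hit["path"] == "/"
    is_direct = hit["referrer"] in ("", "-")  # "-" is how many logs record "no referrer"
    is_ie_windows = IE_ON_WINDOWS.search(hit["user_agent"]) is not None
    return is_home_page and is_direct and is_ie_windows


def suspicious_share(hits):
    """Fraction of hits (0.0 to 1.0) that match the signature."""
    if not hits:
        return 0.0
    flagged = sum(1 for hit in hits if matches_signature(hit))
    return flagged / len(hits)


if __name__ == "__main__":
    ie8_win7 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"
    sample = [
        {"path": "/", "referrer": "-", "user_agent": ie8_win7},
        {"path": "/events", "referrer": "-", "user_agent": ie8_win7},
    ]
    print(f"{suspicious_share(sample):.0%} of sample hits match the signature")
```

A sudden jump in that share, with no matching jump in search or referral traffic, is the kind of thing that tipped me off.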
What I Don’t Know
The first thing I did was turn off my AdSense. Any kind of automated traffic like this would (quite reasonably) be seen as a risk to advertisers, and I had no idea what this was or what it could do. I turned ads off on the entire site. After several days, when it was obvious the traffic was only hitting the home page, I turned the ads back on for everything BUT the home page.
That same afternoon, Roger Dooley posted a thread on WebmasterWorld about the same sort of bot attack on one of his own sites. Over on the Google Analytics help forum, a discussion was forming with more and more reports of this same strange traffic pattern. Most (but not all) started on February 21.
The first thought that went through my head was that I was somehow being targeted (paranoia!). But if someone really wanted to attack me, they probably wouldn’t have done it on this site. Comparing notes with as many people as I could find, there seemed to be no commonality on the receiving end – some sites had AdSense, some did not. Some sites were WordPress, some were static HTML.
Then I thought, maybe it was some kind of a probe, looking for a WordPress exploit. But other (non-WP) sites were being hit. Harvesting email addresses? I don’t list any email addresses on that site, and besides, all this was doing was loading the home page, over and over and over.
More paranoia set in – maybe someone got Pandalyzed and wrote something to trash Google Analytics, and I was unwittingly part of the beta test? After all, Google gets a lot of aggregate information from GA; trashing the stats like this would definitely damage trust in the product.
But whether it was a targeted attack, a coding error, or collateral damage didn’t matter. What mattered was that I might have to shut down my site. It’s a community service event site, and it’s supported by ad revenue, both Google’s and (hopefully) direct local ads. I couldn’t run AdSense on a site with bogus traffic, and it would be fairly difficult to sell direct advertising without decent stats to show potential advertisers. And this particular site sucks up a LOT of resources when it’s at peak; it needs to be able to pay for itself.
I asked my host, TigerTech, to take a look, and they said as far as they could tell, it looked just like human traffic. There were no User Agents or anything else by which we could block this traffic (without blocking real live users).
After a week or so, Roger posted a theory that perhaps Compuware’s Gomez Peer program (which pays users to install a screen saver that tests site and network performance, and collects benchmarking information) might be behind it. Some people had reported contacting Compuware and the traffic mysteriously stopped. But when I contacted them, they opened up an investigation and determined that my site was not in their database, and that the IP addresses I sent them were not part of their peer program. They also told me that Gomez identifies itself in the User Agent. I have no reason not to believe them. (And the people who said their traffic had mysteriously dropped jumped the gun – it came back.) So that was a dead end too.
I did a lot of frantic Googling for other people having the same problem, and we tried a lot of things, none of which ended up panning out. Bill Atchison (@IncrediBILL) of CrawlWall put a lot of time in as well, making suggestions, writing scripts to collect data, and so forth. In the end, he came to the same conclusion as everyone else – it was browser traffic. There was no way to 100% block it without blocking real human users.
Where I Am Now
As of today, March 20, 2012, it has been four weeks since this attack started. It’s still going on. It’s still a slow drip, and it’s gotten a lot slower on my site, although other people are reporting being hit much harder. After a peak of around 10k visits per day, it’s now settled down to a steady 1k per day, give or take a hundred visits. It’s still hitting the home page only. We’ve taken some steps to block ads and analytics, as much as we can; it means we are not showing ads or analytics to some real users too, but that’s our collateral damage. We’re also not allowing non-English browsers, because this site is targeted only to a region of the United States. If it is a virus attack, maybe some of the infected Windows machines were cleaned up, I dunno. As I get closer to my peak season on this site, I’ll have to evaluate what effect this will have on my earnings (the home page being the best earner) and whether or not I’ll be able to keep it going. Fortunately, I don’t rely on this site for my income; if I did, I would be in trouble. I’m just waiting to see if this will end one day, as mysteriously as it began, or if it will be scaled up, or …? I just have no idea. All it appears to do is come hit the page.
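For the curious, here is one way a “non-English browser” check could be sketched in Python. This is an illustration, not our actual code: it assumes the decision is made from the browser’s `Accept-Language` request header, and the function name `prefers_english` is made up for the example.

```python
def prefers_english(accept_language):
    """Return True if the Accept-Language header lists any English variant.

    Assumption: the header looks like 'en-US,en;q=0.8,de;q=0.5'. An entry
    with q=0 means "not acceptable", so it is ignored.
    """
    for part in accept_language.split(","):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        quality = 1.0  # the default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0 and (tag == "en" or tag.startswith("en-")):
            return True
    return False


if __name__ == "__main__":
    for header in ("en-US,en;q=0.8", "de-DE,de;q=0.9", ""):
        print(repr(header), prefers_english(header))
```

Note that a missing or empty header counts as “not English” here, which is exactly the kind of collateral damage I mean: some real visitors will get caught by any rule like this.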
What It Means
This is the hard one. Maybe nothing. Not that I’m an alarmist or anything (who’m I kidding, of course I’m an alarmist), but if this spreads, or is the precursor to some larger attack, it could seriously screw up the web. It could affect ads. It could affect every type of analytics – if you aren’t tracking for conversions, how are you ever gonna know how much of your traffic is real and how much is fake? It could affect end user trust in analytics.
Or it just might go away.
If you’ve seen or experienced anything like this, please chime in below.