"> 14 Aug 11 Filed in Tips and Tricks, Website exploits
This is a follow-up to last week's post about hacked WordPress blogs and poisoned Google Images search results. Cyber-criminals infiltrated 4,000+ self-hosted WP blogs and created doorway pages that would redirect visitors coming from Google Images search to scareware sites. A few days ago I posted a short update to let you know that Google had removed the doorway pages from its index. I also promised to share some interesting new details about that black hat SEO campaign. So here we go!
Cloaked links
To get Google to discover and index the rogue doorway pages, the attackers needed to place links on web pages that Google already knows about and crawls regularly. One popular approach is to create free websites and post links there (many services let you do this). However, in this particular case I couldn't find any such external links.
Then I checked cached versions of legitimate web pages on the hacked sites and found the following code right before the closing </body> tag:
The code cannot be found if you open the same web page in a browser. This means that hackers used cloaking to feed these links to search engine spiders only.
This code defines an invisible style (height:0; width:0) and then lists dozens to hundreds of links to doorway pages on that site inside a block that uses that style. The style's name is a random combination of four letters, and it changes from site to site.
This trick prevents webmasters from seeing the spammy links when they check cached web pages (unless, of course, they scrutinize the HTML code) while still giving Googlebot links that don't look invisible (though I guess Google is well aware of such tricks ;-) ).
The placement of this spammy code makes me think that the hackers injected it into the footer.php file of the blogs' themes. Most likely the actual code is obfuscated (e.g. base64-encoded and decoded at run time with base64_decode, or hidden with some other obfuscation trick), so check the code right before the </body> tag.
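Whatever the obfuscation in footer.php looks like, the HTML it produces has the shape described above: a class with height:0 and width:0, and a block of links that uses that class. A rough way to pull those links out of a saved copy of a page (for example, the cached version) is sketched below. It assumes the class is defined in an inline <style> block, that both properties are literally 0 or 0px, and that the markup is reasonably well-formed; it uses regular expressions instead of a real CSS parser, so treat it as a heuristic rather than a scanner.

```python
import re
from html.parser import HTMLParser

STYLE_BLOCK = re.compile(r"<style[^>]*>(.*?)</style>", re.I | re.S)
# A simple class rule such as ".qzxw{height:0;width:0;overflow:hidden}".
CLASS_RULE = re.compile(r"\.([A-Za-z][\w-]*)\s*\{([^}]*)\}")
ZERO_VALUES = {"0", "0px"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


def zero_size_classes(html):
    """Return names of classes whose rule sets both height and width to 0."""
    found = set()
    for css in STYLE_BLOCK.findall(html):
        for name, body in CLASS_RULE.findall(css):
            decls = {}
            for decl in body.split(";"):
                prop, sep, value = decl.partition(":")
                if sep:
                    decls[prop.strip().lower()] = value.strip().lower()
            if decls.get("height") in ZERO_VALUES and decls.get("width") in ZERO_VALUES:
                found.add(name)
    return found


class HiddenLinkCollector(HTMLParser):
    """Collects hrefs of links nested inside elements that use a hidden class."""

    def __init__(self, hidden_classes):
        super().__init__()
        self.hidden_classes = hidden_classes
        self.stack = []  # one flag per open element: is it (inside) a hidden block?
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        hidden = bool(classes & self.hidden_classes) or (bool(self.stack) and self.stack[-1])
        if tag == "a" and hidden and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append(hidden)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> open no block; ignore them to keep the stack balanced.
        pass

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


def hidden_links(html):
    """Return links inside blocks styled with a zero-height, zero-width class."""
    classes = zero_size_classes(html)
    if not classes:
        return []
    parser = HiddenLinkCollector(classes)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = ('<style>.qzxw{height:0;width:0}</style><a href="/about/">About</a>'
            '<div class="qzxw"><a href="/doorway.html">x</a></div>')
    print(hidden_links(page))
```

Because the class name changes from site to site, the sketch looks for the zero-size rule itself rather than a fixed name.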
SEO Anomaly
I noticed one interesting thing. Every link block on every hacked site has a link to rankexplorer .com. The anchor text is always the same: Poker Software.
The domain was registered on February 21st, 2011 and already has PageRank 5. That was very suspicious: only very popular sites can reach PR5 in such a short time. So I decided to check who linked to the rankexplorer site and how much those links on the hacked sites contributed to its rapid rise.
Yahoo Site Explorer
First, I checked external backlinks using Yahoo Site Explorer:
The report says there are 1,858,186 external links to 7 pages on this site. Impressive!
It was clear that the sites at the top of the list were hacked. But it was not clear how many of those 1,800,000+ links came from hacked sites, or whether there were many (or rather any) legitimate links. Moreover, YSE doesn't distinguish between “doFollow” and “noFollow” links, so it's hard to use this report to tell which links actually contribute to the high PageRank. (For example, there may be many “noFollow” links from spammy blog comments and forum posts.)
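The distinction matters because links marked rel="nofollow" are not supposed to pass PageRank. For a single linking page you can check it yourself. The sketch below uses only Python's standard library and reports, for every link on a page that points to a given domain, whether it carries nofollow; the target domain is just a parameter (example.com in the usage), and nothing here talks to Yahoo or MajesticSEO.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class BacklinkParser(HTMLParser):
    """Records (href, is_nofollow) for every link pointing to target_domain."""

    def __init__(self, target_domain):
        super().__init__()
        self.target = target_domain.lower()
        self.backlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        host = urlparse(href).hostname or ""  # hostname is already lowercased
        if host == self.target or host.endswith("." + self.target):
            # rel can hold several space-separated values, e.g. "nofollow noopener".
            rel_values = (attrs.get("rel") or "").lower().split()
            self.backlinks.append((href, "nofollow" in rel_values))


def classify_backlinks(html, target_domain):
    parser = BacklinkParser(target_domain)
    parser.feed(html)
    parser.close()
    return parser.backlinks


if __name__ == "__main__":
    page = ('<a href="http://www.example.com/">a</a>'
            '<a rel="nofollow" href="http://example.com/x">b</a>')
    for href, nofollow in classify_backlinks(page, "example.com"):
        print(href, "noFollow" if nofollow else "doFollow")
```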
MajesticSEO Site Explorer
So the next step was a more thorough investigation using MajesticSEO Site Explorer. MajesticSEO maintains a fairly fresh index (updated 2-3 times a day) whose size is comparable to Yahoo's (they claim that only Google has a larger index). More importantly, they provide various backlink reports that make it easy to spot interesting patterns and anomalies.
Let's begin with the Domain Information report:
Well, the number of external links here is significantly smaller than in Yahoo Site Explorer. But we should not forget that this is a “fresh index” and that we are dealing with hacked sites, which get cleaned up once their webmasters notice the hack.
The useful information here is:
- very few links are “NoFollow” (0.3%), so comment and forum spam is not the explanation;
- quite a few links have been deleted (webmasters remove spammy links from hacked sites);
- the domains/links ratio suggests that multiple pages of the same site link to rankexplorer, which is quite typical for spammy links;
- most of the linking sites reside on different servers and even on different subnetworks (they don't all come from one hacked server).
The same report has a “Referring Domains” history graph
You can see a spike on July 20th. This matches the beginning of the black hat SEO campaign.
The “Top Pages” report shows that all external links point to the home page only. That's unusual for any site with so many backlinks, even a small one.
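The ratios from the Domain Information and Top Pages reports are easy to recompute from any exported backlink list. The sketch below assumes a hypothetical record layout (source URL, target URL, source IP, nofollow and deleted flags); it is not MajesticSEO's actual export schema. It treats each distinct hostname as a domain and groups IPv4 addresses by their first three octets as a stand-in for "subnetwork".

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Backlink:
    # Hypothetical record layout; adapt it to whatever your backlink export contains.
    source_url: str
    target_url: str
    source_ip: str
    nofollow: bool = False
    deleted: bool = False


def summarize(backlinks):
    """Recompute Domain Information / Top Pages style ratios for a backlink list."""
    if not backlinks:
        raise ValueError("no backlinks to summarize")
    total = len(backlinks)
    domains = {urlsplit(b.source_url).hostname for b in backlinks}
    # Group IPv4 addresses by their first three octets (a /24 "class C" block).
    subnets = {".".join(b.source_ip.split(".")[:3]) for b in backlinks}
    # Links whose target path is empty or "/" point at the home page.
    home = sum(urlsplit(b.target_url).path in ("", "/") for b in backlinks)
    return {
        "links": total,
        "domains": len(domains),
        "links_per_domain": total / len(domains),
        "subnets": len(subnets),
        "nofollow_pct": 100 * sum(b.nofollow for b in backlinks) / total,
        "deleted_pct": 100 * sum(b.deleted for b in backlinks) / total,
        "home_page_pct": 100 * home / total,
    }
```

A high links_per_domain value together with a home_page_pct close to 100 is the kind of combination the reports above point to.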
The most revealing data can be found in the Top Backlinks report. It provides a list of up to 2,500 referring URLs (Majestic Silver plan) in order of their significance for SEO along with the anchor text (!) of the backlinks.
Main insights:
- Out of 2,500 backlinks, 2,426 (97%) have the “poker software” anchor text (the anchor text used on the hacked sites).
- 60 backlinks (2.4%) have the “poker statistics” anchor text. They are hidden links on a few presumably hacked sites (a different attack, though). The spammy code looks like this:
- The remaining 13 links can safely be ignored:
  - One of them comes from Baidu search results (why does MajesticSEO index Baidu SERPs?!).
  - Six “software de poker” and “poker mjukvara” links come from a hacked site that uses some sort of auto-translation, which translated all the spammy links into Spanish and Swedish ;-)
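Tallying anchor texts like this is a simple frequency count. A minimal sketch, assuming you have the anchor text of each backlink as a list of strings; anchors are lowercased and whitespace-collapsed so that variants such as "Poker Software" and "poker  software" are counted together, and the sample list is made up for illustration.

```python
from collections import Counter


def anchor_distribution(anchors):
    """Return (anchor, count, percent) rows, most common first.

    Anchors are lowercased and runs of whitespace collapsed before counting.
    """
    normalized = [" ".join(a.lower().split()) for a in anchors]
    counts = Counter(normalized)
    total = len(normalized)
    return [(text, n, round(100 * n / total, 1)) for text, n in counts.most_common()]


if __name__ == "__main__":
    sample = ["Poker Software", "poker software", "poker statistics", "Poker Software"]
    for row in anchor_distribution(sample):
        print(row)
    # ('poker software', 3, 75.0)
    # ('poker statistics', 1, 25.0)
```

For the numbers above, 2,426 out of 2,500 is 97.04%, which rounds to the 97% shown.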
And finally, the “Referring Domains” report shows that most of the domains can also be found in my list of WordPress sites affected by this black hat SEO attack.
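Checking how many referring domains also appear in a list of known compromised sites comes down to a set intersection. A sketch, assuming both lists are plain hostnames; it strips a leading "www." so that www.example.com and example.com count as the same site, which is a simplification.

```python
def normalize(host):
    """Lowercase a hostname and drop a trailing dot and a leading "www."."""
    host = host.strip().lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def overlap(referring_domains, known_hacked):
    """Return the shared domains and the fraction of referring domains they represent."""
    referring = {normalize(h) for h in referring_domains if h.strip()}
    hacked = {normalize(h) for h in known_hacked if h.strip()}
    shared = referring & hacked
    share = len(shared) / len(referring) if referring else 0.0
    return shared, share


if __name__ == "__main__":
    shared, share = overlap(["www.a.com", "b.org", "c.net", "d.io"], ["a.com", "C.NET"])
    print(sorted(shared), share)
```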
So the backlink analysis clearly shows that rankexplorer .com owes its high PageRank exclusively to black hat techniques.
PageRank vs real SERP positions
Was it worth the effort for rankexplorer? Not really. If we search for [poker software] or even for ["poker software"] on any of the major search engines, rankexplorer is nowhere near the top. The top two Google search results for this query currently link to sites with PageRank 4, and #3 has PR3! As Matt Cutts always says, PageRank is only one of many factors that affect a site's position in search results.
So were all the spammers' efforts futile? Not exactly. For some queries (I wouldn't call them popular) you can find rankexplorer on the first page of search results. Currently it is #4 for the ["poker statistics analyzer"] query.
An interesting side note: of all the major search engines, Baidu (the #1 search engine in China!) is the most susceptible to rankexplorer's black hat SEO campaign:
Previous generation of this campaign
MajesticSEO's reports helped me find some sites where the injected code and doorway pages were different from those in the attack that I described last week. Moreover, some of these sites were not WordPress blogs. After some additional analysis, I figured out that this was a previous generation of the same attack. Here are the details:
Link blocks
Checking cached versions (Google cache) of legitimate pages on the compromised sites, I found familiar cloaked blocks of hidden links that used the style/font trick:
However, instead of linking to doorways on the same site, those blocks linked to doorways on multiple third-party sites (usually about 50 unique sites per block). And this time the rankexplorer link was in the middle of the block.
This cross-linking scheme helped me identify 700+ hacked sites. Most of them can be identified as WordPress blogs, Joomla sites and Zen Cart online stores.
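Because each block links to doorways on dozens of other compromised sites, one infected site leads to many more, and those lead to others in turn. Following that chain is a breadth-first traversal of a link graph. The sketch below assumes a hypothetical fetch_linked_domains(domain) callback that returns the domains found in a site's hidden link block (for example, by parsing its cached copy); in the demo it is replaced by a dictionary so the example runs offline.

```python
from collections import deque


def discover_hacked_sites(seed, fetch_linked_domains, limit=10_000):
    """Breadth-first crawl of the cross-linking network, starting from one known site.

    fetch_linked_domains is a hypothetical callback: given a domain, it returns the
    domains referenced in that site's hidden link block. Expansion stops once at
    least `limit` domains are known (the final set may slightly exceed it).
    """
    seen = {seed}
    queue = deque([seed])
    while queue and len(seen) < limit:
        domain = queue.popleft()
        for linked in fetch_linked_domains(domain):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return seen


if __name__ == "__main__":
    graph = {"a.com": ["b.net", "c.org"], "b.net": ["c.org", "d.se"],
             "c.org": [], "d.se": ["a.com"]}
    print(sorted(discover_hacked_sites("a.com", lambda d: graph.get(d, []))))
    # ['a.com', 'b.net', 'c.org', 'd.se']
```

The seen set keeps the crawl from looping, which matters here because the link blocks point back and forth between the same compromised sites.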
URL patterns
The most common URL patterns of the doorway pages are:
example.com/?[a-z]{3,4}=<random>.<ext>, where <random> is a random combination of characters, digits and hyphens, and <ext> is one of the popular web page file extensions (html|htm|shtml|php|php3|php4|php5|phtml|jsp|asp). The extension part can be missing.
Another popular doorway URL pattern is example.org/[a-z]{3}-<keywords>.<ext>, where <keywords> are hyphen-separated keywords targeted by the doorway page.
Examples:
- example.com/qlv-wallpapers-cowgirl-stock-photos.asp (note: this page is on a Linux server that has no ASP)
- example.net/qxr-trail-of-tears-coloring-pages.php5
- example.se/lck-multiplication-chart-1-500.html
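The two patterns above translate into fairly simple regular expressions. The sketch below is a heuristic built on a few assumptions: the first pattern's 3-4 letter parameter sits in the query string, the prefix in the second pattern is exactly 3 lowercase letters, and the keywords consist of lowercase letters and digits joined by hyphens. Ordinary URLs such as /faq-page.html also match the second pattern, so treat matches as candidates to inspect, not proof of infection.

```python
import re
from urllib.parse import urlsplit

# Page extensions listed in the text; the extension part is optional.
EXT = r"(?:html|htm|shtml|php|php3|php4|php5|phtml|jsp|asp)"

# Pattern 1 (assumed to sit in the query string): ?abc=<random>.<ext>
QUERY_DOORWAY = re.compile(rf"^[a-z]{{3,4}}=[A-Za-z0-9-]+(?:\.{EXT})?$")

# Pattern 2: /abc-<keywords>.<ext>, keywords being lowercase words/digits joined by hyphens
PATH_DOORWAY = re.compile(rf"^/[a-z]{{3}}-[a-z0-9]+(?:-[a-z0-9]+)*(?:\.{EXT})?$")


def doorway_pattern(url):
    """Return "query", "path" or None depending on which doorway pattern the URL fits."""
    # urlsplit needs "//" to recognize the host when the scheme is omitted.
    parts = urlsplit(url if "//" in url else "//" + url)
    if parts.path in ("", "/") and QUERY_DOORWAY.match(parts.query):
        return "query"
    if not parts.query and PATH_DOORWAY.match(parts.path):
        return "path"
    return None


if __name__ == "__main__":
    for url in ["example.net/qxr-trail-of-tears-coloring-pages.php5",
                "example.com/faq-page.html",          # a likely false positive
                "example.com/wp-content/themes/x/style.css"]:
        print(url, doorway_pattern(url))
```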
And a combination of the two patterns above: example.net/?[a-z]{3,4}=<keywords>.<ext>.
Examples:
- example.org/?jyw=make-your-own-art-online.php4
- example.com/?liz=sample-1023-arts-organization.shtml
- example.net/?klb=dem-mac-martial-arts.php
- example.es/?jys=art-of-8000-bce-500-ce
Chronology of the attack
Some of the websites have already been cleaned up. On such sites, I can only find the spammy content in cached copies that are 2-3 months old, which shows that this attack was active around May 2011. Further evidence can be found in the MajesticSEO report for the notorious rankexplorer .com site, based on MajesticSEO's “historic” index.
This graph shows that MajesticSEO began to index links to rankexplorer .com (and we know they all come from hacked sites) in April. Then there was a peak in May (newly indexed domains referencing rankexplorer), almost no new domains in June, and then another uptrend in July (which corresponds to the attack against WordPress blogs that I described last week).
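A graph like this boils down to counting referring domains by the month in which they were first seen. A sketch, assuming you have (domain, first-seen date) pairs from a backlink tool's historic data; the dates in the demo are invented for illustration. Months with no new domains (like June above) simply don't appear in the result.

```python
from collections import Counter
from datetime import date


def new_domains_per_month(first_seen):
    """Count referring domains by the month they were first indexed.

    first_seen: iterable of (domain, datetime.date) pairs; a domain that appears
    several times is counted once, at its earliest date.
    """
    earliest = {}
    for domain, seen in first_seen:
        if domain not in earliest or seen < earliest[domain]:
            earliest[domain] = seen
    counts = Counter(d.strftime("%Y-%m") for d in earliest.values())
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [("a.com", date(2011, 4, 28)), ("b.net", date(2011, 5, 3)),
              ("c.org", date(2011, 5, 19)), ("a.com", date(2011, 7, 20)),
              ("d.se", date(2011, 7, 21))]
    print(new_domains_per_month(sample))
    # {'2011-04': 1, '2011-05': 2, '2011-07': 1}
```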
Still malicious
Although that wave of the black hat SEO campaign has been idle for at least a couple of months now, many of the compromised sites still contain malicious web pages. As in the most recent attack, they only redirect visitors to scareware sites if they come from Google Images search (clicking on regular web search results won't trigger the redirect).
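A redirect that only fires for Google Images visitors is most likely keyed on the referrer, which is why opening a doorway URL directly shows nothing suspicious. When testing a suspect page, it can help to request it twice, with and without a Google Images-style Referer header, and compare the two responses. The sketch below assumes Google Images referrers look like google.<tld>/imgres?... URLs, come from images.google.<tld>, or carry tbm=isch in the query string; the exact checks used by the doorways are not known, and such scripts may also look at the User-Agent or visitor IP. Since the redirect described below happens in the browser, compare the returned HTML rather than expecting an HTTP redirect.

```python
from urllib.parse import parse_qs, urlsplit
from urllib.request import Request

# Assumed shape of a Google Images click-through referrer (illustrative only).
SAMPLE_IMAGES_REFERRER = ("http://www.google.com/imgres?imgurl=http://example.com/pic.jpg"
                          "&imgrefurl=http://example.com/page")


def looks_like_google_images(referrer):
    """Heuristic: does this Referer value look like it came from Google Images search?"""
    parts = urlsplit(referrer)
    host = parts.hostname or ""  # already lowercased by urlsplit
    is_google = host.startswith(("google.", "images.google.")) or ".google." in host
    if not is_google:
        return False
    tbm = parse_qs(parts.query).get("tbm", [])
    return host.startswith("images.google.") or parts.path.startswith("/imgres") or "isch" in tbm


def build_requests(url):
    """Build two requests for the same URL: a plain one and one with a Google Images referrer."""
    ua = {"User-Agent": "Mozilla/5.0"}
    plain = Request(url, headers=ua)
    from_images = Request(url, headers={**ua, "Referer": SAMPLE_IMAGES_REFERRER})
    return plain, from_images
```

Each request can then be passed to urllib.request.urlopen and the two response bodies diffed; a hidden form plus an auto-submit script appearing only in the second one is a strong hint.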
Redirects
For visitors from Google Images, the doorways generate a page with an invisible form and a piece of JavaScript that automatically clicks the form button, which effectively redirects the browser to a Fake AV site: