~ spamfighting ~
Published @ Searchlores in January 2006
Ok, this is neither for newbies nor for people who don't know much about the stuff being discussed.
So what? Excuse us for being -as usual- politically incorrect :-)
Yet those among the readers that enjoy the power of seeking may (may!) find this short "essay in fieri"
quite instructive. Enjoy the read!
A VERY promising spamfighting exercise
here is the assignment:
Starting from this website : http://www.hp-ariba.com/
Try to determine whether we're facing pure spam or just a poor-content website
If it is spam, try to find patterns to gather more information about this spammer, determine what scheme is used to generate money, what are the technical means to achieve that, and finally what can be done to terminate the activity of this spammer.
have fun :)
Local copy of Ariba, saved in case it cowardly disappears after publishing this :-)
Re: spamfighting exercise
Please address your questions to the Internet Marketing and SEO Services.
If unsuccessful, try out email@example.com, located at http://stvincentoffshore.net/about.html
Background information here
Re: spamfighting exercise
Looks like a domain hijacking to me.
Look at the number of marketing "keywords" and the frequency with which they occur. Blatant attempt at boosting their search ranking. This marketing slang never fails to make me laugh/puke. Do they think if they bombard us with buzzwords they will seem any less like liars/crooks/idiots?
Re: spamfighting exercise
aaaa crap i had all that post typed up to go, and then i close the window .. unfortunately opera's undo-close (ctrl-alt-Z, a lifesaver!) does not save formdata .. :-/
so i do it again:
- first thought: aww do i have to look at that crap? :(
- so it's an article surrounded by ads (the article appears to be original)
- money-making #1: google ads around the article
- this is interesting, at the bottom it says: ©2005 Hp-ariba.com (22.214.171.124) what's that IP? surfing to that IP gives me that same mortgage-article page, with one very interesting difference, now at the bottom it says: ©2005 126.96.36.199 (188.8.131.52). oooh automation! can you say ucfirst($_SERVER['HTTP_HOST'])? :) :) the google cache of that site says "Can't found domain in the niches list: 184.108.40.206". "found" ?? they not be very good english. also "niches list", that's SEO-speak. also the ads on the IP-site turned into public-service googleads -> not indexed yet by google mediabot? (will probably turn indexed a short time after i visited it)
- now to the "Advertisement (not connected with this site)" links on the bottom. they all link to <something>.<4-digit number>.info except for the roboform link (got an affiliate ID tagged to it) and showmyipnumber.com (which is a *very* similar page, only it shows your IP too). money-making #2: roboform affiliate.
- roboform is some kind of password-saver application (could be spyware, but seems legit at first glance)
- the <something>.<4-digit number>.info links all redirect to favse.com. this is a typical crapSE. if you search for some words it doesn't know, you get the same results every time: "Go to School - Get an education at a school near you or online. Get more information now." maybe they're trying to be helpful, if you make a typo they tell you to go to school? :) :) btw check out the logo on that school-site, do you think they got the costumes from some kind of carnival shop? :) :)
- following that link brings us to a chain of redirections: via 220.127.116.11(spam?), c.goclick.com, to find.edu-search.com with a webbug from perf.overture.com. money-making #3: more affiliate programs.
- almost every link from favse goes to some site with an affiliate-ID tagged to the URL, so there's money made with clicks. money-making #4: did i mention affiliate programs?
- other links from the favse results point again to sites similar to the mortgage-article page, like http://www.directorize.com/music/mp3-players.htm but without any affiliateID-tags, so either they're monitoring traffic in some other way, or the directorize people are the same as the hp-ariba guys.
- contact info goes to: http://onlinemarketinggroup.biz/contact.php?d=<domain> (either favse.com or hp-ariba.com) see also http://onlinemarketinggroup.biz/
- the strange thing is that a part of the results from favse.com are going to websites without affiliate-IDs but still have the same format: lots of ads, little (or no) text, but the contact-info/style is different enough every time to suspect they are not the same people (partybusrental.info), whois also gives different people. so either they are not making any money from those links (which is strange, then why link to crap sites? perhaps so people will click on the OTHER links, which *are* affiliates??) or perhaps they get something else in return, governed by some global affiliate thing, partly based on trust? that would be the onlinemarketinggroup.biz, but they don't seem to have a connection with each site?
- one more thing, at the bottom of favse.com: "Not satisfied with your results? Try Google": this googleform brings you to a favse.com-skinned version of google. do they get money for these referrals? like opera gets money from their SE referrals? so perhaps money-making #5: search-partner/affiliate with google?
- they use the google-urchin to analyze their ad-profits things (duh)
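
The footer observation above (the page echoing back whatever host name was requested) can be sketched in a few lines. This is only a guess at what the spammer's template does, written in Python rather than PHP; the `footer` helper and the 192.0.2.1 documentation address are made up for illustration.

```python
def ucfirst(text: str) -> str:
    """Upper-case only the first character, like PHP's ucfirst()."""
    return text[:1].upper() + text[1:]


def footer(host_header: str, server_ip: str, year: int = 2005) -> str:
    """Build a copyright line from the requested host name (hypothetical template)."""
    return f"©{year} {ucfirst(host_header)} ({server_ip})"


# Requested by name: the footer shows the capitalised domain.
print(footer("hp-ariba.com", "192.0.2.1"))
# Requested by bare IP: the "domain" in the footer is just the IP again.
print(footer("192.0.2.1", "192.0.2.1"))
```

If the template really works this way, one script can serve any number of domains pointed at the same machine, which would fit the "niches list" error message.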
to partly answer loki's questions:
- scheme + pattern: lots of domains, lots of articles, lots of ads, link them together, attach some crap-SE you found somewhere in the gutter.
hmm wait a minute, looking at the results source of favse.com *every* result has "<!-- google_ad_section_start -->" around it! (all in a separate table, even, which is odd) .. is this some kind of google ad program i'm not aware of? in that case google sucks even more than i thought it does (cause the results are worthless). or would they ... would they? somehow process the targeted ads from their other domains, put them all in a large database, build a simple SE for it, and "fake" the clicks?? (via click.php?id=6fc63b6cb556b81bfe90b81b8aaa253d) in that case it's a clear violation of google's TOS and terminating them should be as easy as notifying google.
one easy way of terminating is of course not so very nice. they are probably already on a watchlist from googleads somewhere, so just faking some 1000s of clicks through a little script would probably get them banned hm? :)
loki you probably already did some research on these guys, tell me if i'm somewhere on the right track..
there are a few points that bug me:
- why does favse link to pages without affiliate-IDs if that's not making them any money
- what is that <!-- google_ad_section_start --> code doing in favse's searchresults if they are not clearly marked as google ads? and if they are faking it, then why bother with those comments in the first place?? adding code in your HTML that looks like google ads but in fact isn't, is probably not a TOS-violation, but the reason why they are doing it, probably IS.
LOL omg they're so lame :-)
how the fuck do you get a typo in your logo? that is equivalent to the domain-name you're using? once again this shows exactly how much time these guys put into their website creation... *sigh*
Re: Re: spamfighting exercise
Heya ritz. Sorry for the delay, I was waiting a bit to see if anybody was willing to join the party.
Looks like nobody's going to add something now, so here is my point of view.
First of all, you found a big part of the 'global scheme' of that specific spammer, and the two other
anonymous posters added enough hints to gather everything together in a unified picture.
You also made some small bad assumptions, but I'll get to those later.
So, let's start at the beginning again, and go through all the signals.
Content on the page is minimalist, an article about mortgage rates refinancing, two adsense blocks before
and after the article, a word count, links in the footer to external sites, a footer signature and a contact link.
Note also the link pointing to that same page, with the anchor text "Tips of Mortgage Rates Refinancing".
As we'll see later, this guy is going heavy on anchor text. This page is definitely optimised for those
keywords. Here follows a table presenting the keyword density of "mortgage rates refinancing" for this page.
It is showing density for the title, the body, headings, links and emphasis.
As you can see, it is decently optimised. As a consequence, if you search for [tips of mortgage rates refinancing]
hp-ariba.com is ranked first out of 3,430,000 results. Not a bad result, especially because this 'niche' is quite
crowded : http://www.nichebot.com/?term=Mortgage+Rates
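
For readers who want to reproduce the density figures themselves, here is a minimal sketch. The definition used (words belonging to an occurrence of the phrase, divided by all words in the zone) is one common convention and only an assumption about how the figures above were obtained; the body text in the example is invented.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Share of the words in `text` that belong to an occurrence of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    # Count every position where the phrase appears word for word.
    hits = sum(
        1
        for i in range(len(words) - len(target) + 1)
        if words[i:i + len(target)] == target
    )
    return hits * len(target) / len(words)


# Hypothetical page zones; only the title comes from the anchor text quoted above.
page = {
    "title": "Tips of Mortgage Rates Refinancing",
    "body": "Mortgage rates refinancing can lower your payment. "
            "Compare lenders before you sign anything.",
}
for zone, text in page.items():
    print(f"{zone}: {keyword_density(text, 'mortgage rates refinancing'):.0%}")
```

Running the same function over the title, body, headings, links and emphasised text of a page gives one figure per zone.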
Moreover, the content as you pointed out is 'organic'. A snippet search gives you only one result, this site.
The 'organicity' of a site is something _really_ important for search engines. Duplicate content is an everyday
fight, and organic content is a really good signal from an algorithm's point of view. But we're pretty sure this
guy didn't write this article himself, aren't we ? Now, a side exercise (a hard one) would be to find the
source of that article, and what linguistic treatment was applied to it (if any).
The aim of every web spammer is nearly always the same : MONEY. (Leave aside the political/activist/propaganda/religious/prank
objectives, which are not really mainstream but are still a reality. If you're interested, tell
me and I can find some examples of so-called guerrilla marketing applied to non-economic issues.)
The money here is generated by Google's AdSense program. Note the AdSense customer ID of our spammer for later :
it can be useful if you want to verify whether two spam sites are owned by the same spammer.
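
A rough way to automate that comparison, as a sketch only: classic AdSense snippets set a `google_ad_client` variable to a `pub-` number, so the ID can be pulled out of the page source with a regular expression. The exact markup on these spam pages is an assumption, and the IDs in the example are fake.

```python
import re

# Matches e.g.  google_ad_client = "pub-0000000000000000";
# An optional "ca-" prefix is skipped so both spellings compare equal.
AD_CLIENT = re.compile(r'google_ad_client\s*=\s*["\'](?:ca-)?(pub-\d+)["\']')


def adsense_ids(html: str) -> set:
    """Return every AdSense publisher ID found in a page's HTML source."""
    return set(AD_CLIENT.findall(html))


def same_publisher(html_a: str, html_b: str) -> bool:
    """True when the two pages share at least one publisher ID."""
    return bool(adsense_ids(html_a) & adsense_ids(html_b))


# Fake page sources for illustration.
site_a = '<script>google_ad_client = "pub-1111111111111111";</script>'
site_b = "<script>google_ad_client = 'ca-pub-1111111111111111';</script>"
site_c = '<script>google_ad_client = "pub-2222222222222222";</script>'
print(same_publisher(site_a, site_b))
print(same_publisher(site_a, site_c))
```

A shared ID is a strong hint of common ownership; different IDs prove nothing, since one person can hold several accounts.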
So, our site here is a typical "made for AdSense" (MFA) site.
Attract traffic from SEs on specific niches, and wait for the fish to click on your prominent ads. And, believe
it or not, it is working. Yeah, people click on those god damn ads :(
I suppose the CTR is also related to the brain power of web surfers.
But.. typical ? Not really.
First, don't you think the domain name is a bit weird for an article about mortgages ?
And it's an MFA site anyway. There's something fishy here.
If you can, check the pagerank of the site. With my SearchStatus extension, I read a PR of 5. Now, if you know a bit
about that algorithm, you know that you don't get a PR5 like that, with a single page and poor content. It represents
quite a lot of backlinks. There's something reallllly fishy with that domain.
As suggested in another post, check the wayback machine : http://web.archive.org/web/*/http://www.hp-ariba.com/
Go back a bit in its history : http://web.archive.org/web/20050128101343/http://hp-ariba.com/
"HP-Ariba enterprise solutions - © 2004 Hewlett-Packard Development Company, L.P."
Wtf is the relation between HP enterprise solutions and mortgage ? Pagerank.
This is a typical "drop catch". The domain wasn't renewed when it reached its expiration date. It already
happened in 2004 : http://web.archive.org/web/20041009162333/http://hp-ariba.com/
Unfortunately for them, in 2005 they again forgot to renew the domain : http://whois.sc/hp-ariba.com
It is now owned by onlinemarketinggroup.biz.
And we can't seem to access the historical changes in the ownership. The domain appears to have been owned by
onlinemarketinggroup.biz since 2001. Which is obviously not true.
Look at the content that is still indexed : http://www.google.com/search?q=site%3Ahp-ariba.com
It's obviously not related to mortgage (beside the main page), all are supplemental results, and if you try to
view them you are redirected to the homepage (don't lose a drop of traffic juice !)
What we are seeing here is 'drop catching' : some people buy domains as soon as they expire. If you want to dig a bit
deeper into drop catchers, have a look at dnforum (http://www.dnforum.com) and learn what an NNN or LNN domain is.
Woohoo ! :|
Drop catchers usually do that to speculate on well-known domains, or to blackmail people. But a new breed of drop catchers
appeared with webspamming, thanks to the infamous pagerank algorithm.
This one, as you all know, uses the link structure of the web to determine the authority of a page, based on
the number of backlinks to it, weighted by the value of the pages those inbound links come from.
HP-Ariba.com was probably heavily linked before, by authoritative websites (probably a lot by HP domains..)
Check the backlinks using Yahoo's tool (better than Google's, which is obfuscated to slow down the spammers) :
Not much, but those links :
both have a PR of 6, and still point to hp-ariba.com
therefore Google assumes that the HP and AribaLive websites _vote_ for hp-ariba.com, and gives it good pagerank.
They still haven't noticed that the domain has expired, been drop-caught, and converted to spam.
I hope you get the point now, and see why those domains are really valuable to spammers.
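
To make the 'voting' idea concrete, here is a toy sketch of the simplified textbook PageRank iteration (damping factor 0.85). It is not Google's actual ranking code, and the page names and link graph are hypothetical: two well-linked sites still point at a dropped domain, while a fresh domain has no backlinks at all.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict {page: [targets]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its rank over everyone.
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        rank = new
    return rank


# Hypothetical graph: the old backlinks to the dropped domain are still in place.
links = {
    "fan1": ["hp"], "fan2": ["hp"],
    "fan3": ["ariba"], "fan4": ["ariba"],
    "hp": ["dropped"], "ariba": ["dropped"],
    "dropped": [], "fresh": [],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:8s} {ranks[page]:.3f}")
```

In this toy graph the dropped domain ends up with the highest score and the fresh one with the lowest, whatever content either of them serves. That inherited score is what the spammer is buying.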
But our guy isn't stopping here.
Let's see if we can find more of those MFA websites built on drop-caught domains. Let's try to find patterns
that will help us build arrows to catch more. I would use the content of the page, but you can also gather
a lot using domain registration info, or the IPs of the hoster.
"Article Word Count: 322"
"Advertisement (not connected with this site)"
that should be enough : ["not connected with this site" "Article Word Count" "More Articles"]
exact same template, all PR5. Our spammer is definitely drop catching domains with high PR and using them for
very specific high-profit niches.
Let's continue to pull the string. Tweak the query a bit : ["not connected with this site" "Last updated"]
Here we can find a different pattern !
No more content, no more adsense. Same template, links at the bottom, and same contact domain at onlinemarketinggroup.biz.
Why are we seeing those after the 90th result ? Perhaps because their pagerank is lower. The spammer bought them too, but is
not bothering to put content and adsense on those. Instead you have strange series of numbers like these :
"1271454894 - 1234814241 - 1241272342 - 1254250691
1247630001 - 1276535194 - 1255696050 - 1251025692"
I have no clue what those numbers are. I thought they were unix timestamps at first, but that makes no sense at all.
We can still use them to gather more domains from that spammer : ["not connected with this site" 1200000000..1300000000]
I suppose this is done to create 'organic' content easily and automatically, and therefore avoid being filtered
out by the duplicate content filters.
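
A quick check of the timestamp idea, for anyone who wants to repeat it: read as seconds since 1 January 1970 (UTC), the numbers quoted above all decode to dates in 2009 and 2010, years after this page was written. The cut-off date in the sketch is simply the end of the publication month.

```python
from datetime import datetime, timezone

# The numbers quoted from the spam page.
numbers = [
    1271454894, 1234814241, 1241272342, 1254250691,
    1247630001, 1276535194, 1255696050, 1251025692,
]

published = datetime(2006, 1, 31, tzinfo=timezone.utc)  # end of January 2006

# Interpret each number as a Unix timestamp in UTC.
decoded = [datetime.fromtimestamp(n, tz=timezone.utc) for n in numbers]
for n, when in zip(numbers, decoded):
    print(n, when.date().isoformat())

print("all in the future:", all(when > published for when in decoded))
```

So if they are timestamps, they are not creation or update times. They could just as well be random filler in a fixed range.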
Now, there is no direct money making on that page. So what's the point ?
At the bottom you still have affiliate links, but who's gonna click there and make the spammer earn money ?
But but but..
There aren't only affiliate program links there. Look a bit further down. As you noticed ritz, there seems to be
a bunch of cheap domains (k002.info, m028.info, m026.info etc..) probably bought and managed by automation software.
They all have subdomains containing porn keywords, that are also present in the anchor text.
eg: "abnormal cocks" points to http://abnormal-cocks.k002.info/
If you click on those, you are redirected with a 302 through an affiliation program to a porn website. Here is the real money.
But now the question is : how would people ever click on those links, when they are at the bottom of a useless page ?
That's where the real scheme (imho) is. Our guy is in fact a porn webspammer. If you do some research
on "online marketing group", from Kingston, you'll find some adult webmaster forums complaining about the activity
of that guy. He's apparently really aggressive, using blackhat SEO techniques (like doorways+cloaking+link spam etc..)
and has cleverly selected his keyword battlefields.
Check the domain k002.info : [site:k002.info]
Booh.. Porn porn porn, spam spam spam. Here is the can of worms. All 302 redirections to movieaccess.com
Funny, the pages are not really indexed, you don't have any cache, keyword spam seems to be obvious.
you to the final commercial site (it's a 302. not client pull). He's perhaps cloaking more precisely
on the IPs of the crawlers, i don't know. But anyway you can't view their cache. So they're not optimised
'on page', otherwise they wouldn't rank.
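
The difference between a server-side 302 and a 'client pull' can be told apart from the raw response. With a 3xx status the server sends a Location header and the browser moves on without needing a page body. With client pull the server answers 200 and the page itself (or a Refresh header) asks the browser to go elsewhere. A minimal sketch of that distinction; the function name and the sample responses are made up:

```python
import re
from typing import Mapping, Optional

# A <meta http-equiv="refresh" ...> tag anywhere in the page body.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*>', re.IGNORECASE
)


def redirect_kind(status: int, headers: Mapping[str, str], body: str = "") -> Optional[str]:
    """Classify how a response sends the visitor elsewhere (None if it doesn't)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if 300 <= status < 400 and "location" in lowered:
        return "server redirect"  # e.g. the 302 seen here
    if "refresh" in lowered or META_REFRESH.search(body):
        return "client pull"  # Refresh header or meta refresh tag
    return None


print(redirect_kind(302, {"Location": "http://example.com/"}))
print(redirect_kind(200, {}, '<meta http-equiv="refresh" content="0;url=http://example.com/">'))
print(redirect_kind(200, {}, "<p>plain page</p>"))
```

A server-side redirect needs no page content at all, which may be part of why there is nothing to see in the cache.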
But they do. Don't they ?
Search for [abnormal cocks]
422,000 hits. k002.info is ranked second !!
Now, that must generate quite a lot of traffic, indeed. Calculate : 94,400 subdomains on k002.info, and
this is only one domain among the hundreds (thousands ?) that this guy may have. All of those pages
rank on competitive, specific porn queries, and redirect automatically to a commercial site through
an affiliation program (and each time this redirection is performed, the spammer earns money).
Those dummy sites are ranking so high BECAUSE they have the previously seen drop catch domains pointing
to them with the appropriate anchor.
My guess is that adsense isn't really giving our spammer a lot of money. It's probably a pathetic attempt to avoid
being blacklisted by saying "look, my site is organic, the article is actually interesting, nothing hidden
on my page, and I AM USING ADSENSE ! I AM A CUSTOMER !". But behind the scenes, this spammer is specialized in
porn, and is probably earning a lot more by boosting the ranking of the dummy sites, which act as traffic
magnets and drive it through affiliation programs.
Hm. I think I've written enough for today. We've gone through the first MFA site, and discovered a big
network of spam sites specialised in porn niches. From there, we could gather even more information
about this 'online marketing group', and draw up a nearly exhaustive list of his activities.
But there's one question left. What can we do now ?
If you have any ideas, post them here, I'd be really interested in hearing your thoughts.
I may find some time later to give you some of my ideas on 'retaliation' :)
That's all for now ! Hope you enjoyed the trip.
Re: Re: Re: spamfighting exercise
You have already done most of the boring stuff, but if you see the link at the bottom of the page - 'contact us' - pointing to:
which, interestingly *is* the name of the domain... we get another way of finding more than 1800 spamdomains probably belonging to the same spammer: linkdomain:onlinemarketinggroup.biz
what surprises me is that the 'content' seems to make sense? and it seems to be what you call unique, but I doubt he knows what he's talking about...
(c) Loki 2006
(c) III Millennium: [fravia+], all rights reserved, reversed, reviled and revealed