~ Loki on Yahoo's filter - a short pointer ~

By Loki

Lately, all eyes have been on the diva Google, and every misstep is
noticed, blogged, commented on, flamed, etc., for good reason
or not. I don't really care. But people tend to forget about the other
big players in this industry, for good and for bad.

Interesting new features coming out of Yahoo's labs are ignored and useful
MSN sliders are underused, yet nobody misses the latest crappy packaged
solution promoted by Google and its partners. The same goes for
all the bad stuff.

Recently, Google launched its Chinese version of web search, generating a
lot of discussion about censoring results for Chinese users at
the request of the Chinese government.

By now everybody knows the visual proof of this censorship, obtained by
comparing the following (infamous) queries:

http://images.google.cn/images?q=Tiananmen

http://images.google.com/images?q=Tiananmen

Shortly after, some guys published a way to bypass this filter by using
capitalised queries, which returned uncensored results
(e.g. [Tienanmen] instead of [tienanmen]).

See here :
http://www.crypticide.com/dropsafe/articles/security/post20060129233439.html

But it was quickly fixed, and this trick no longer works.

But what about Yahoo (or MSN) ? Are they also filtering the results ?

Compare the same query on Yahoo (tld .com) and Yahoo China :

http://images.search.yahoo.com/search/images?p=tiananmen

http://image.yahoo.com.cn/search?p=tiananmen

It's not even filtered. You do not get ANY results at all. Do you really
think that is a better solution ?

The same goes for web search, but instead of getting no results you
are redirected to the news results, where sources are obviously filtered
and subject to censorship.

http://www.yahoo.com.cn/search?p=test

No problem.

http://www.yahoo.com.cn/search?p=tiananmen

Response: HTTP/1.x 302 Found

Location:
http://xinwen.yahoo.com.cn/search.html?p=tiananmen&ei=utf-8&source=ysearch_www_filter_noresult

Bounced to Yahoo news. Also note the source parameter :
ysearch_www_filter_noresult

The usual one is 'ysearch_www_result_topsearch' when it's not filtered.
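
The check on the redirect target can be sketched in a few lines of Python.
This is only an illustration: the helper names (source_of, looks_filtered)
are made up here, and the only marker value it knows about is the one seen
in the Location header above. It parses a URL string and makes no network
request.

```python
from urllib.parse import urlsplit, parse_qs

# Marker value copied from the redirect observed above.
FILTERED_SOURCE = "ysearch_www_filter_noresult"


def source_of(location):
    """Return the 'source' query parameter of a URL, or None if absent."""
    params = parse_qs(urlsplit(location).query)
    values = params.get("source")
    return values[0] if values else None


def looks_filtered(location):
    """True when the redirect target carries the 'filter_noresult' marker."""
    return source_of(location) == FILTERED_SOURCE


location = ("http://xinwen.yahoo.com.cn/search.html"
            "?p=tiananmen&ei=utf-8&source=ysearch_www_filter_noresult")
print(source_of(location))
print(looks_filtered(location))
```

A URL without a source parameter simply yields None, so the check stays
False for it.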

So, is there anything we can do, as some did for Google, to bypass this
filter ? And how long will it take Yahoo's teams to spot and fix
it ? Yahoo and Google's other competitors don't have the
same hype around them, and if you publish something about them,
it won't spread the way Google-related news does.

I tried to bypass the filter using similar "poke around" techniques.
I tried different approaches: mixing caps, adding useless keywords
(-dsfasdfds for example), multiple quotes, etc. Nothing. But finally,
I tried to 'overflow' it by feeding the query parameter a large
number of chars... and it worked !

Apparently, if you add enough '+' characters before your query, the
filter is bypassed, and you get censorship-free output.

[tiananmen] :
http://xinwen.yahoo.com.cn/search.html?p=tiananmen&ei=utf-8&source=ysearch_www_filter_noresult

['+'(338 times) tiananmen] :
http://xinwen.yahoo.com.cn/search.html?p=%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2Btiananmen&ei=utf-8&source=ysearch_www_filter_noresult

['+'(339 times) tiananmen] :
http://www.yahoo.com.cn/search?ei=UTF-8&fr=fp-tab-web-ycn&p=%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2B%2Btiananmen&meta=vl%3Dlang_zh-CN%26vl%3Dlang_zh-TW&pid=ysearch&source1=ysearch_www_hp_button
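
Typing hundreds of '+' by hand is no fun, so here is a rough Python sketch
for building such a padded URL. The function name padded_query_url is
invented for this example, and it only sets the p parameter (the extra
parameters in the URLs above are left out). It builds a string and sends
nothing.

```python
from urllib.parse import quote


def padded_query_url(term, pad_count, base="http://www.yahoo.com.cn/search"):
    """Prefix `term` with `pad_count` literal '+' and percent-encode it.

    safe="" makes quote() encode every reserved character, so each '+'
    becomes the three characters '%2B'.
    """
    raw_query = "+" * pad_count + term
    return base + "?p=" + quote(raw_query, safe="")


url = padded_query_url("tiananmen", 339)
print(url.count("%2B"))          # number of encoded '+' in the URL
print(len(url.split("?p=")[1]))  # length of the encoded p value
```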

'+' is URL-encoded as '%2B' (3 chars)

338 times '+' -> 338*3 = 1014

339 times '+' -> 339*3 = 1017

tiananmen -> 9 chars

338 '+' and tiananmen -> 1014 + 9 = 1023 chars

339 '+' and tiananmen -> 1017 + 9 = 1026 chars

Going from 338 to 339 '+' takes the encoded value past what looks like a
1024-byte limit on the value used by the filter. So this query does
bypass it :)
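
The same arithmetic as a small Python sketch. The 1024 figure is only
inferred from the behaviour observed above, and counting the encoded
form (rather than the decoded one) is an assumption too. The function names
are made up for this example.

```python
from urllib.parse import quote

ASSUMED_LIMIT = 1024  # inferred from the tests above, not a documented value


def encoded_length(pad_count, term="tiananmen"):
    """Length of the percent-encoded value: 3 chars per '+' plus the term."""
    return len(quote("+" * pad_count + term, safe=""))


def min_padding(term="tiananmen", limit=ASSUMED_LIMIT):
    """Smallest number of '+' whose encoded value is longer than `limit`.

    Assumes `term` needs no percent-encoding itself (plain ASCII letters).
    """
    return (limit - len(term)) // 3 + 1


for count in (338, 339):
    size = encoded_length(count)
    print(count, size, "over" if size > ASSUMED_LIMIT else "under")

print(min_padding())
```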

But this is changing quickly: between the time I ran those tests
and now, they seem to have added more limits, and the query field now
seems to be restricted to 1024 chars. But if you feed the
parameter directly into the URL, it will still work (as of late February 2006).

Also, I did not manage to make it work on Yahoo Images.

(c) Loki 2006 nem0@nowhere.org ahem! 'linux'+'mail'