This file dwells at
 http://www.searchlores.org/operators.htm
 
      

~ Search engine operators ~



("lego bricks" for webbits writers)


Version 0.03, updated February 2008





Introduction
yahoo
google
MSNlive
Altavista
Inktomi-type




Introduction
or
Explaining the whys and the hows


...intitle:whatever & inurl:whatever are VERY important parameters, because EVERYTHING on the Web is just a name, and because the very building blocks of the Web are just names... nomen est omen: redimages giotto5.jpg...

Check for instance this recent webbit

-inurl:htm -inurl:html -inurl:jsp -inurl:php -inurl:pdf -inurl:asp -inurl:txt -inurl:shtml -inurl:phtml -inurl:cgi -intitle:free -intitle:download -intitle:archive +intitle:index+of/ +parent-directory +name +"last modified" +size +description (oasis OR shakira) (mp3 OR wma OR m4a) -download

You see the inurl: and intitle: parameters? You understand what they do?
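The webbit above is just a handful of operator "bricks" glued together with spaces. As a minimal sketch (the function name build_webbit and its parameter names are made up for illustration, they are not part of any search engine's API), the same query can be assembled from its parts like this:

```python
def build_webbit(bad_url_words, bad_title_words, required, either_groups, banned=()):
    """Glue operator 'bricks' into one query string (illustrative helper only)."""
    parts = [f"-inurl:{word}" for word in bad_url_words]        # URLs we do not want
    parts += [f"-intitle:{word}" for word in bad_title_words]   # titles we do not want
    parts += [f"+{term}" for term in required]                  # terms that must be there
    parts += ["(" + " OR ".join(group) + ")" for group in either_groups]
    parts += [f"-{word}" for word in banned]                    # plain excluded words
    return " ".join(parts)


query = build_webbit(
    bad_url_words=["htm", "html", "jsp", "php", "pdf", "asp", "txt", "shtml", "phtml", "cgi"],
    bad_title_words=["free", "download", "archive"],
    required=["intitle:index+of/", "parent-directory", "name", '"last modified"', "size", "description"],
    either_groups=[["oasis", "shakira"], ["mp3", "wma", "m4a"]],
    banned=["download"],
)
print(query)
```

Swapping the two OR groups for other names and file extensions gives a new webbit without touching the rest.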

This page is in fieri, and you are seeing -and hopefully helping- its birth. Please excuse its obvious shortcomings.


Yahoo's operators
(Biggest index, small brain)

Yahoo operators:
site:
hostname:
link:
linkdomain: (links that point to one domain)
url:
intitle:
inurl: (a specific keyword as part of indexed URLs, example: inurl:searching)


Google's operators
(Biggest brain, lotta spam)


Google's operators:
site:
allintitle: (all of the query words in the title)
intitle: (that word in the title)
allinurl: (all of the query words in the URL)
inurl: (that word in the URL)
cache:
link:
related: (pages that are "similar" to a specified web page)
info: (Google's info)


Altavista's operators
(Have NEAR, will travel)

Note that Altavista now uses Yahoo's database.
Altavista's most important operator:
NEAR: AltaVista's NEAR operator restricts the search to documents in which the words occur within ten words of each other (in either order).
As searchers know, NEAR is much better than AND when seeking precision through associations between words.


MSNLive's operators
(Powerful macros but small database)

MSN Live's operators:
contains: Restricts results to sites that have links to the file type(s) you specify. For example, to search for websites that contain links to mp3 files, type music contains:mp3.

filetype: Returns only web pages created in the file format you specify. Live Search recognizes html, txt, and pdf extensions, as well as the extensions for primary Office document types. For example, to find reports created in PDF format, type your subject followed by filetype:pdf, as in information filetype:pdf.

inanchor:, inbody:, intitle:, inurl: Return pages that contain the specified term in the anchor, body, title, or web address of the site, respectively. Specify only one term per keyword. You can string multiple keyword entries together as needed. For example, to find pages that contain google in the anchor, and the terms black and blue in the body, type inanchor:google inbody:black inbody:blue.

ip: Finds sites that are hosted by a specific IP address. The IP address must be a dotted quad address. Type the IP: keyword, followed by the IP address of the website. For example, type IP:80.83.47.151.

language: Returns web pages for a specific language. Specify the language code directly after the language: keyword.

link: Finds sites that have links to the specified website or domain. This is useful for determining who links to whom. Do not add a space between link: and the web address. For example, to find pages that contain the word games and that link to searchlores.org, type games link:searchlores.org

linkdomain: Finds sites that link to any page within the specified domain. Use this keyword to determine how many links are being made to a specific page, as well as how those links are made. For example, to see pages that link to searchlores, type linkdomain:searchlores.org.

linkfromdomain: Finds sites that are linked from the specified domain. Use this keyword to determine how many links are being made from a specific page, as well as how those links are made. For example, to see pages that are linked from my site, type linkfromdomain:fravia.com

loc:, location: Returns web pages from a specific country or region. Specify the country or region code directly after the loc: keyword. To focus on two or more countries or regions, use a logical OR and group them. For example, "core python" (loc:RU OR loc:CN)

prefer: Adds emphasis on either a word or another operator. For example, type searching prefer:internet

site: Returns web pages that belong to the specified site. To focus on two or more domains, use a logical OR and group the domains. Do not add a space after the colon (:). You can use site search for web domains, top level domains, and directories that are not more than two levels deep. For example, to see web pages about media reporting from the BBC or CNN websites, type "media reporting" (site:bbc.co.uk OR site:cnn.com). You can also search for web pages that contain a specific search word on a site. For example, to find the library pages on searchlores, type site:www.searchlores.org/library

feed: Finds RSS or Atom feeds on a website. For example, to find RSS or Atom feeds about web searching, type feed:"web searching"

hasfeed: Finds web pages that contain an RSS or Atom feed on a website. You can add search words to narrow your search. For example, to find web pages on the Guardian website that contain RSS or Atom feeds about google, type site:www.guardian.co.uk hasfeed:google

url: Checks whether the listed domain or web address is in the Live Search index. Do not add a space between url: and the domain or web address. For example, to verify that searchlores is in the index, type url:searchlores.org
Most important MSNLive operator:

linkfromdomain: (an outbound links operator)


Inktomi-type's operators
(Working a lot you can do wonders)



Taken verbatim from Inktomi's search syntax by Nemo

Default search

Multiple search terms are processed as an AND operation.

Basic inclusion/exclusion of terms

You can use the + and - signs to include or exclude a term, respectively. To exclude terms in an effective way, read my search engines anti-optimization essay.

Boolean search

Inktomi offers full Boolean searching. Its operators are OR and NOT (as in Google, no operator at all stands for AND), - can be used instead of NOT, and searches can be nested using parentheses (). Operators must be in upper case. You are well advised not to use the OR operator for keyword variants, because your query will attract irrelevant search results (Inktomi gives a higher rank to documents containing all ORed keywords); in those cases you should use stemming whenever you can. Example, compare:

Phrase search

Inktomi lets you search for phrases by enclosing them with quotes ("). You can also use underscores (_) to build phrases (partially discovered by fagan), compare:

The standard way of searching for phrases inside fields, like title:, inurl:, etc., does not work (example title:"index of"). Nevertheless, for every such field you have two ways of searching for phrases. Example for title: (the others are similar):

Phrase searches are often used to search for documents generated by some kind of software (which therefore contain some fixed strings of text). The "index of", or more precisely title:index_of, is a classical example, where you search for open directories, in this case those generated by the Apache server.

Phrase searches are also a valuable tool when you arrive at pages showing a glimpse of some document and trying to sell the whole document... More often than not, that very same document is available somewhere else for free! Let's take these bastards, who have stolen the previous version of this very same document and are trying to sell access to their database for $9.99 a month. There you find the following snippet:

Inktomi is one of the best search engines out there. Unfortunately its search syntax is not well documented, which is a pity, because Inktomi offers one of the richest search syntaxes, with lots of unique features and a ranking algo which works often quite well. The purpose of this essay consists precisely in documenting Inktmi's search syntax and providing examples showing its usefulness. For that purpose old HotBot's search FAQs and others Inktomi's web partners' search FAQs were read. The core syntax present in them was expanded using search engines and the WayBack Machine. Finally, from the source code of old HotBot's advanced search pages, additional search syntax was guessed: feature:homepage, originurlextension: and stem:. Inktomi unveiled Inktomi doesn't provide a public search engine in a way that search engines like AltaVista or Google do. This paper is the property of learnessays.com Copyright 2003-2005

My dear reader, when you find something like this all you have to do is take a phrase and put it into a search engine, example: "Inktomi is one of the best search engines out there". Moral: phrases provide very powerful spells to summon the document you want!

Wildcards

The asterisk * can be used within a phrase search to match any word in that position. Thanks to the * you can do proximity searches on Yahoo! This is a very handy feature for searching images, for instance, because most people follow the "content - name relation" when naming files. For example, if you are searching for a Caravaggio picture, you can run the following search on Yahoo: "caravaggio * jpg". That way you'll get pages linking to or containing images named "caravaggio_2.jpg", "caravaggio 07.jpg", etc. Do not expect as many search results as in Google, because Yahoo does not index the alt attribute of images (which Google does), nor their src attribute, nor the href attribute of <a...> tags.
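To see why "caravaggio * jpg" catches names like caravaggio_2.jpg, here is a rough local simulation. It assumes (this is a guess about the engine, not documented behaviour) that text is split into words at every character that is not a letter or a digit, and that * stands for exactly one word. The helper names tokens and phrase_matches are invented for this sketch.

```python
import re


def tokens(text):
    # Assumption: underscores, dots, spaces, etc. all act as word separators.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(pattern, text):
    """True if the words of `pattern` occur consecutively in `text`; '*' matches any one word."""
    wanted = pattern.lower().split()
    words = tokens(text)
    for start in range(len(words) - len(wanted) + 1):
        window = words[start:start + len(wanted)]
        if all(w == "*" or w == found for w, found in zip(wanted, window)):
            return True
    return False


for name in ["caravaggio_2.jpg", "caravaggio 07.jpg", "caravaggio.jpg"]:
    print(name, phrase_matches("caravaggio * jpg", name))
```

Under these assumptions the first two names match and the third does not, because there is no word between "caravaggio" and "jpg" for the * to stand for.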

Case

Inktomi has no case sensitive searching. Using either lower or upper case results in the same hits.

Truncation

No truncation (*, ?) is currently available, but you can use word stemming (stem:).

Stop words

All words are searched. There are no known stop words.

Ranking

Inktomi was one of the first search engines to let you change its ranking algorithm. This is done by giving each keyword a weight. Weight factors can vary between 0.0 and 9.9 and the syntax is weight*keyword; by default each keyword has weight 1.0, as you can see by comparing these two queries: 1.0*fravia and fravia. The simplest way of using this feature is the 80 - 20 rule, i.e. multiply bad keywords (the highly spammed ones) by 0.2*, multiply context keywords by 0.0* (so as not to disturb the ranking algo: they must be there, but don't rank) and multiply good keywords (those less likely to be spammed) by 1.0*. Example:

As a rule of thumb this Pareto rule is not too shabby...
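A small sketch of the weight*keyword syntax described above. The helper name weighted_query and the three sample keywords are hypothetical; only the syntax and the 0.0 to 9.9 range come from the text.

```python
def weighted_query(weights):
    """Turn {keyword: weight} into 'weight*keyword ...' (weights limited to 0.0 .. 9.9)."""
    parts = []
    for keyword, weight in weights.items():
        if not 0.0 <= weight <= 9.9:
            raise ValueError(f"weight for {keyword!r} must be between 0.0 and 9.9")
        parts.append(f"{weight:.1f}*{keyword}")
    return " ".join(parts)


# Hypothetical keywords: one heavily spammed, one context-only, one "good".
query = weighted_query({"lyrics": 0.2, "song": 0.0, "palimpsest": 1.0})
print(query)
```

The context keyword keeps its place in the query with weight 0.0, so it still has to be on the page but is not meant to influence the ranking.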

depth:[number]

Denotes how deep in a site's directory structure webpages will be searched. The number (0, 1, 2, 3, 4) specifies the maximum number of subdirectories, relative to the host's root directory, that may appear in the URL. As a general rule (not universal! duh:) a webpage's content increases with directory depth; besides, spammers think that webpages in the home directory get a ranking boost and are more likely to be indexed, so they often put their doorway pages there. This useful feature offers a handy way of getting rid of those annoyances... by excluding the root directory's pages!

Example: title:german hear feature:audio -depth:0
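What "depth" means can be illustrated by counting the subdirectories in a URL's path. This is only a sketch of the idea: how Inktomi actually counted is not documented here, and the function url_depth as well as the example.com URLs are assumptions made for illustration.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Number of subdirectories between the host's root and the document."""
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    # Unless the path ends with "/", the last segment is taken to be the file name.
    # (A directory URL written without its trailing slash is therefore counted as a file.)
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


for url in ["http://www.searchlores.org/operators.htm",
            "http://example.com/a/b/page.html",
            "http://example.com/a/"]:
    print(url_depth(url), url)
```

With this counting, -depth:0 in the example above would throw away pages sitting directly in the root directory, such as the first URL.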

domain: (aka site:)

Restricts a search to the selected domain. Domains can be specified up to three levels deep. Once you have found a promising site, this operator provides you a way of building a local search engine and, in that way, of flying directly to the meat. Example of use -constructing a local search engine to searchlores site- : domain:searchlores.org .

feature:acrobat

Searches for pages linking to PDF files, although some do escape. Compare the queries:

As quality documents, like papers, are often written in PDF format, this filter provides a way of getting high quality pages: those linking to such files. Example: "link structure" feature:acrobat. As PDF files may not have been indexed for some reason (examples: robots.txt file or robots meta tags), this feature may provide, in an indirect way, some interesting results.

feature:activex

Detects pages containing embedded content, be it sounds, movies, flash, java, pdf files, powerpoint presentations, etc... almost everything can be embedded in a webpage. The detection is made by verifying the presence of an <object... > tag, as you can see comparing the results of following queries with the original pages:

Content embedded with the <embed...> tag is not matched by feature:activex, as the following example shows:

As the canonical way of embedding content for M$IE is by using activex and as almost every luser uses M$IE, page creators are compelled to also embed content using the <object...> tag which, nowadays, is also the official HTML 4.01 standard. That said, this feature provides a handy way of getting (or excluding:) pages containing precisely that kind of content. Example -searching for pages containing fravia's workshops embedded as movie or sound- : fravia stem:workshop feature:activex.

feature:applet

Detects <applet ...> tags in page's source code, compare:

The tags <object ...> (for Internet Explorer) and <embed ...> (for Netscape) can also be used to embed applets, but Inktomi doesn't detect applets embedded this way. Compare:

Documents containing links to .class or .java files aren't taken into account either, compare:

Example of use -searching for pages where you can play chess interactively: feature:applet title:play title:chess.

feature:audio

Detects if a page contains a link to an audio file. Audio files could be among others: wav, mp3, m3u, mid, midi, au, snd, ... The link could be in a:

feature:audio doesn't match embedded audio files:

If you want to search for embedded audio files you have to resort to the rather coarse feature:activex. Example of use -searching for audio files of fravia's workshops- : fravia feature:audio

feature:flash

Contrary to what we could expect, Inktomi detects neither the existence of <embed ...> tags nor the existence of <object ...> tags. For Inktomi, feature:flash simply means webpages linking to files with the extensions fla, spl or swf, compare:

If you want to search for embedded flash you have to resort to the rather coarse feature:activex.

feature:form

Inktomi's crown jewel. Detects the <form> tag in a page's source code. Inktomi may not index the hidden web, but it offers you a way of knowing where the front doors are! For instance you can use Inktomi to find Laws' Databases, translation services: dutch english translate url feature:form, etc.

feature:frame

Detects pages containing frames.

feature:homepage

Restricts your search to personal pages (identifier ~). Very useful, because it's still the convention for personal pages on educational sites. Example: web search feature:acrobat feature:homepage.

feature:image

Detects <img...> tag in HTML or a link to an image.

Interested in finding images of birds of paradise? Try the following query on Yahoo!:

("bird of paradise" OR "birds of paradise") (papua OR "new guinea") feature:image -stem:travel -stem:hotel

Images are widely used for aesthetic reasons. If an HTML webpage doesn't contain images you may wonder if there's a hidden agenda... probably it's a cloaked/spammed page by a spammer putting in only keywords n' links and not taking the trouble to build a real webpage. You can often trash those annoyances using this useful feature!

feature:index

Restricts your search results to the host's top page. Very useful for finding sites about a given theme! The host's homepage is the site's most valuable real estate: there the site's owner should put a summary of what his site is all about and provide links to his most important pages. Example, searching for FTP search engines: ftp search feature:index feature:form. Inktomi indexes approximately 1,520,000,000 webhosts, cf.: feature:index. 1.5 billion webhosts is quite an odd figure, because Inktomi has 19.2 billion documents in its database, so, on average, Inktomi indexes 13 documents per webserver. Given that some domains spam a lot, for most servers Inktomi indexes only the entry page... maybe there's nothing more to index... Nevertheless it is quite odd. Although, as ritz points out, this is probably anecdotal evidence that the number of sites containing n pages follows a Pareto distribution.
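The "13 documents per webserver" figure is plain division of the two counts quoted above; a quick check (variable names are of course arbitrary):

```python
documents = 19_200_000_000   # documents in Inktomi's database, as quoted above
webhosts = 1_520_000_000     # hosts reported by the feature:index query, as quoted above

average = documents / webhosts
print(f"{average:.1f} documents per webhost")  # the text rounds this up to 13
```

An average says nothing about how the pages are spread: a few huge sites and a long tail of one-page hosts give the same number, which is the point of the Pareto remark.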

feature:javascript

Detects pages containing the <script ...> tag with the attribute language="javascript", compare:

Inktomi doesn't recognize javascript embedded in other tags' attributes, compare:

Webpages linking to javascript files (extension .js) are not considered as containing javascript, compare:

Javascript, with the help of forms, is a cheap, yet powerful, way of providing interactive pages. Sometimes it is the right tool to cut all the bragging pages that do not offer the interactive content they promise. Example:

title:german exercises feature:javascript feature:form

Spitze!

feature:meta

Detects <meta ...> tags in webpage's source code.

feature:shockwave

Detects pages containing links to files with extension dcr, dir, fla, spl or swf, compare:

If you want to search for embedded shockwave you have to resort to the rather coarse feature:activex.

feature:script

Detects <script ...> tags in HTML, in particular detects other script languages than javascript (for instance VB script), compare:

feature:table

Searches for pages containing the <table ...> tag. Tables are widely used to control a page's layout and of course to build tables! If an HTML webpage doesn't contain tables you might wonder if there's a hidden agenda... probably it's a cloaked/spammed page by an SEO fearing that some search engines may not fully support tables, or by a spammer putting in only keywords 'n' links and not taking the trouble to build a real webpage. You can often trash those annoyances using this useful feature!

feature:title

Detects pages containing the <title> tag. As almost all webpages contain a title, this feature gives a good estimate of how many HTML documents are in Inktomi's database. Cf.: feature:title.

feature:video

Searches for pages linking to video files (file extensions: avi, mpg, mpeg, mov, etc.). Videos embedded with <img...> tags with the legacy attribute dynsrc are not matched, compare:

nor are pages with <embed...> or <object...> tags, compare:

If you want to search for embedded video files you have to resort to the rather coarse feature:activex. Interested in finding videos of fravia's workshops? Try the following query on Yahoo!: fravia feature:video.

feature:vrml

Search for pages containing a link to a vrml file (wrl, wrz, vrml). Compare:

Inktomi is unable to see embedded vrml files. Compare:

Example: web links graph feature:vrml

hostname:

Allows one to find all documents from a particular host only. It has similar uses to those already found on domain:.

inurl:

Searches for words in URL, you can also search for phrases, but the syntax isn't the one we would expect: inurl:"keyword1 keyword2", instead it is "inurl:keyword1 inurl:keyword2". Once you have found a promising directory's site, this operator provides you a way of building a local search engine and, in that way, of flying directly to the meat, or of getting a 'directory listing'. Example of use -constructing a local search engine to the Seeker's message board- : domain:2113.ch inurl:mb001.

link:

Finds pages containing hypertext links to the exact specified URL. Comes in handy when you land, using a search engine, on a webpage you like and you want more similar pages from that web site. In those cases you can try to find the 'table of contents' of that specific site. Example, using the URL of this essay to find where searchlores' 'tables of contents' are: link:http://www.searchlores.org/inktomi.html domain:searchlores.org; among other 'tables of contents' you find the following ones (once they get reindexed:) http://www.searchlores.org/news.htm, http://www.searchlores.org/essays.htm and http://www.searchlores.org/main.htm.

Intelligent seekers search the web backwards! Once they have found a good site, they identify the most interesting pages on that site, 'table of contents' pages are good candidates, and see who's linking to them. This strategy provides a way, once we know a good site, of finding more good sites or pages linking to good sites. The rationale is good sites only link to good sites! Searching backwards, once you have found a good site, is the main way of combing the web for good sites which others have already found. Example: link:http://www.searchlores.org/news.htm, link:http://www.searchlores.org/essays.htm and link:http://www.searchlores.org/main.htm.

Inktomi doesn't have an operator which lets you search for keywords on links, so you don't have a direct way of searching for links to a given directory. The only thing you can do in such cases is to use wget to get a directory listing from that directory and feed all those URLs to Yahoo using wget and the link: operator.
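The "one link: query per URL" step can be sketched as follows. It assumes the directory listing has already been fetched (with wget or otherwise) and boiled down to a plain list of URLs; the helper name link_queries is invented, and actually submitting the queries to the search engine is left out on purpose.

```python
def link_queries(urls, restrict_to_domain=None):
    """Build one link: query per URL, optionally narrowed with domain:."""
    queries = []
    for url in urls:
        query = f"link:{url}"
        if restrict_to_domain:
            query += f" domain:{restrict_to_domain}"
        queries.append(query)
    return queries


# URLs taken from the examples above; in practice they would come from the listing.
listing = [
    "http://www.searchlores.org/news.htm",
    "http://www.searchlores.org/essays.htm",
    "http://www.searchlores.org/main.htm",
]
for query in link_queries(listing):
    print(query)
```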

linkdomain:

Searches for pages linking to any page in a given domain up to three levels deep. This operator provides a more versatile way of searching the web backwards (cf. the link: operator). Examples of use:

linkextension:

Searches for pages linking to files with a given extension. This operator provides a way of searching for files which are not downloaded and processed by Inktomi's spiders, such as images, audio, videos and other binary files. One possible use for this operator is searching for blogs. Most blogs, at least most blogs that are nowadays worth reading, have an RSS feed somewhere on them. This operator provides a great way of finding pages containing RSS feeds, which are usually just an RSS, XML, RDF, or ATOM document type. Interested in finding blogs on web search techniques? Try: searchlores linkextension:rss, searchlores linkextension:xml, searchlores linkextension:rdf and searchlores linkextension:atom.

originurlextension:

Restricts documents search by type, aka file extension. Document's type is a good proxy of document's quality. Some examples of high quality documents are .pdf, .doc (word), .xls (spread sheets), .ppt (power point presentations), .ps, .dvi and .rtf files. Example of use: web search originurlextension:pdf.

outgoingurltype:[url_type]

Searches for pages linking to documents with a given MIME type. The MIME type is inferred from the document's extension, as the following example shows: outgoingurltype:image/jpeg -linkextension:jpg -linkextension:jpeg -linkextension:jpe -linkextension:jfif. This operator does more or less the same as linkextension:, although it is a little more general, because it clusters file extensions by type.
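The test query above asks for pages with the MIME type but without any of the extensions believed to belong to it; whatever comes back points at extensions missing from the list. A sketch of how such a "leftover" query is put together (the table EXTENSIONS_BY_TYPE is hand-written for illustration and only holds the four extensions named above; it is not Inktomi's real mapping):

```python
# Hand-written illustration, not Inktomi's actual extension table.
EXTENSIONS_BY_TYPE = {
    "image/jpeg": ["jpg", "jpeg", "jpe", "jfif"],
}


def leftover_query(mime_type):
    """Pages linking to `mime_type` documents through none of the known extensions."""
    exclusions = " ".join(f"-linkextension:{ext}" for ext in EXTENSIONS_BY_TYPE[mime_type])
    return f"outgoingurltype:{mime_type} {exclusions}"


print(leftover_query("image/jpeg"))
```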

path: (aka originurlpath:)

Searches for words in URL's path, you can also search for phrases, but the syntax isn't the one we would expect: path:"keyword1 keyword2", instead it is "path:keyword1 path:keyword2". Once you have found a promising directory's site, this operator provides you a way of building a local search engine and, in that way, of flying directly to the meat, or of getting a 'directory listing'. Example of use -constructing a local search engine to the Seeker's message board- : domain:2113.ch "path:phplab path:mbs.php3" inurl:mb001.
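The odd phrase syntax for fields (the field name repeated in front of every word, the whole thing in quotes) is easy to get wrong by hand, so here is a tiny sketch that writes it for you. Both function names are invented. The underscore variant is confirmed above only for title: (title:index_of); whether it behaves the same for the other fields is an assumption.

```python
def quoted_field_phrase(field, phrase):
    """'path', 'phplab mbs.php3' -> '"path:phplab path:mbs.php3"'"""
    return '"' + " ".join(f"{field}:{word}" for word in phrase.split()) + '"'


def underscore_field_phrase(field, phrase):
    """'title', 'index of' -> 'title:index_of' (seen above for title: only)."""
    return f"{field}:" + "_".join(phrase.split())


print(quoted_field_phrase("path", "phplab mbs.php3"))
print(quoted_field_phrase("inurl", "keyword1 keyword2"))
print(underscore_field_phrase("title", "index of"))
```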

region:name

Restricts your search to a geographical region (africa, centralamerica, downunder aka Oceania, europe, mediterranean, mideast aka Middle East, northamerica, southamerica, southeastasia). You can find which countries are included in each region here. I think that Inktomi assigns, for each domain name, either the information obtained by a whois search when the top-level domain is .com, .org, .net, .biz, .edu, etc., as we can infer from the following two queries on Yahoo:

or the country corresponding to the two-letter top-level domain name (examples: .au, .ca, .de, .es, .fr, .uk, .us, etc). The main use for this operator is restricting your search to a given geographical region, example: stem:laws noise stem:levels region:europe. This field can also be used to get an estimate of how many documents are in Inktomi's database:

region:africa              32,100,000 documents
region:asia               549,000,000 documents
region:centralamerica      13,300,000 documents
region:downunder          242,000,000 documents
region:europe           4,200,000,000 documents
region:mediterranean       25,800,000 documents
region:mideast             51,600,000 documents
region:northamerica    13,000,000,000 documents
region:southamerica       274,000,000 documents
region:southeastasia       45,000,000 documents
Total:                 18,723,000,000 documents
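The per-region figures can be added up to check the total. The sketch below assumes the counts are listed in the same order as the region names. Summed as printed here they come to 18,432,800,000, about 290 million less than the stated total; the page does not say which figure is off, so treat the total as approximate.

```python
region_counts = {
    "africa": 32_100_000,
    "asia": 549_000_000,
    "centralamerica": 13_300_000,
    "downunder": 242_000_000,
    "europe": 4_200_000_000,
    "mediterranean": 25_800_000,
    "mideast": 51_600_000,
    "northamerica": 13_000_000_000,
    "southamerica": 274_000_000,
    "southeastasia": 45_000_000,
}
stated_total = 18_723_000_000  # the "Total:" figure as given in the text

computed_total = sum(region_counts.values())
difference = stated_total - computed_total
print(f"computed: {computed_total:,}  stated: {stated_total:,}  difference: {difference:,}")
```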




(c) 1952-2032: [fravia+], all rights reserved, coupla wrongs reversed