SEARCH ENGINE COMPARISON
Google, AllTheWeb, Northern Light, and Ixquick
Goals of Search
My mother was treated for breast cancer about eight years ago. Two years later, doctors found evidence that it had metastasized to her bones. They changed her treatment, and she continues to survive with no evidence of further spread of the cancer since then. In this search I wanted to discover the range of treatments used for breast cancer metastasized to bone, and to get some idea of the survival expectations for this disease.
I chose to search using the following search engines:
1) Google
2) All The Web
3) Northern Light
and the metasearch engine:
4) Ixquick
Search Terms
After analyzing the subject and the capabilities of the engines I chose, and performing a few simple trial searches, I searched using Boolean methodology, substituting the minus sign (-) for NOT, in the following manner:
treatment “metastasized breast cancer” -brain -pancreas -skin -lung -lungs
In formulating my search terms my goals were 1) to produce a combination of terms that could be used in exactly the same form on all of the engines tested, 2) to produce a manageable set of results (fewer than 100 on each engine), and 3) to maximize highly relevant results while minimizing irrelevant ones. Because all the engines I used had AND as the default, it was not necessary to use it in my search combination. The lack of support for truncation on two of the search engines made it difficult to use any form of the word survive in my search because of its many variant forms, and also made it necessary to include both lung and lungs. Finally, the length of my combination was limited by the fact that one of the engines searched a maximum of only 10 terms.
In test searches, adding some form of the term survive seemed consistently to produce a subset of the results obtained with treatment alone, so not being able to use it probably did not affect the results significantly. I chose to use “metastasized breast cancer” instead of “breast cancer” followed by metastasized because, although both were effective in eliminating a multitude of results covering breast cancer with no metastases, the second (with all the other NOT terms) still produced 500-900 results. It should be noted that without the goal of creating a set of fewer than 100 results, using [treatment “breast cancer” metastasized -brain -pancreas -skin -lung -lungs] and scanning only the top results might give the searcher equal or better results.
Using the term bone in test searches to zero in on that particular metastasis produced results too heavily skewed toward the bone marrow treatments of this disease. It was also less effective in eliminating other metastases than using NOT (in the form of -) with each other organ that sometimes appeared. I realize that there might be some sources that discussed bone metastases in addition to these others, but I was willing to accept this loss of “recall” in the interest of producing a manageable number of results that eliminated most irrelevant ones.
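The way the search string was assembled can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_query and its parameters are hypothetical, straight quotes stand in for the phrase quotes, and it assumes that each word inside a quoted phrase counts toward the 10-term limit, which the engine's documentation may or may not confirm.

```python
def build_query(required, phrases, excluded, max_terms=10):
    """Build a query that relies on implicit AND and uses '-' in place of NOT.

    build_query is a hypothetical helper, not part of any search engine's API.
    """
    parts = list(required)
    parts += [f'"{phrase}"' for phrase in phrases]   # exact-phrase search
    parts += [f"-{term}" for term in excluded]       # '-' stands in for NOT
    # Assumption: every word counts toward the limit, including each word
    # inside a quoted phrase.
    word_count = (
        len(required)
        + sum(len(phrase.split()) for phrase in phrases)
        + len(excluded)
    )
    if word_count > max_terms:
        raise ValueError(f"{word_count} terms exceeds the limit of {max_terms}")
    return " ".join(parts)


# Without truncation support, both "lung" and "lungs" must be listed.
query = build_query(
    required=["treatment"],
    phrases=["metastasized breast cancer"],
    excluded=["brain", "pancreas", "skin", "lung", "lungs"],
)
print(query)
```

Counted this way the combination uses nine terms, which leaves room for only one more exclusion before the limit is reached.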
Methodology
I performed the search on all three engines by typing the search terms directly into the search box provided on the “advanced search” or “power search” page. When a form was presented as a replacement for Boolean search terms, I used it, but then tested again with the exact search combination in the simple search box to verify that the results were the same. This verification process revealed, for example, that in the google.com advanced search form, my multiple NOT terms could not be placed together in the same box. This was not obvious from the instructions at the site, or from the other reviews and tutorials of the engines that I checked (searchenginewatch.com, pandia.com, lib.berkeley.edu, infopeople.org).
After gathering the results from the engines, I analyzed them for relevancy, duplication, and broken links within each engine, and for overlap between the engines. The total results for each engine are as printed on the original results pages, and do not include any results available by removing filters or by clicking on additional links. Google does not count its sponsored links, so I didn’t include them in the analysis. To be judged relevant, a hit did not have to provide information of interest specifically to me, but only had to refer in some way to a treatment of breast cancer metastasized to bone. Although one cannot fault the engines too much for including different URLs that provide exactly the same information, these instances do reduce the usable results, and so they are counted as duplicates. Northern Light includes “special collection” items among its results. These are available for a charge of $2.95 at their site. I chose to compare only free results, and so any special collection items that were not already eliminated as irrelevant or duplicate are eliminated in a separate category in the table below. The table covers the three search engines. The results of the metasearch engine are discussed later.
|                                                  | Google | All The Web | Northern Light |
| Total Hits (+)                                   | 46     | 39          | 49             |
| Irrelevant Hits (-)                              | 3      | 2           | 2              |
| Duplicate Hits (-)                               | 3      | 1           | 11             |
| Hits With Cost (-)                               | 0      | 0           | 0              |
| Total Useable Hits (=)                           | 40     | 36          | 33             |
| % of Total Unique Hits Found (= Useable Hits/69) | 57.97% | 52.17%      | 47.83%         |
The 69 total unique hits discovered in these three searches are distributed as follows:
Results found in all three engines                     9
Results found only at Google                          15
Results found only at All The Web                     15
Results found only at Northern Light                  14
Results found just at Google and All The Web           7
Results found just at Google and Northern Light        5
Results found just at All The Web and Northern Light   4
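A distribution like the one above can be worked out with ordinary set operations once each engine's relevant, de-duplicated hits are collected as a set of URLs. The sketch below is illustrative only: the function name venn_counts and the short URLs in the toy example are hypothetical, while the reported counts and the useable-hit totals are copied from the figures given in this report.

```python
def venn_counts(google, alltheweb, northern):
    """Count hits in each of the seven exclusive regions of a three-way overlap."""
    g, a, n = set(google), set(alltheweb), set(northern)
    return {
        "all three": len(g & a & n),
        "Google only": len(g - a - n),
        "All The Web only": len(a - g - n),
        "Northern Light only": len(n - g - a),
        "Google and All The Web": len((g & a) - n),
        "Google and Northern Light": len((g & n) - a),
        "All The Web and Northern Light": len((a & n) - g),
    }


# Hypothetical URLs, used only to show the function working.
toy = venn_counts(
    google={"u1", "u2", "u3"},
    alltheweb={"u1", "u3", "u4"},
    northern={"u1", "u5"},
)

# Counts as reported in the list above; the regions do not overlap,
# so their sum is the number of unique hits.
reported = {
    "all three": 9,
    "Google only": 15,
    "All The Web only": 15,
    "Northern Light only": 14,
    "Google and All The Web": 7,
    "Google and Northern Light": 5,
    "All The Web and Northern Light": 4,
}
total_unique = sum(reported.values())

# Each engine's share of the unique hits, from the useable totals in the table.
useable = {"Google": 40, "All The Web": 36, "Northern Light": 33}
share = {name: round(100 * hits / total_unique, 2) for name, hits in useable.items()}

print(total_unique)
print(share)
```

Because the seven regions are mutually exclusive, adding them is enough to recover the total number of unique hits used as the denominator for the percentages.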
From these findings alone, it is not possible to definitively present any one of these engines as superior to the others. In this particular search, Google does lead in total useable hits and in % of total unique hits found, but not by a margin that I would consider significant. In addition, each engine found an almost equal number of hits unique to itself, which suggests that if time is available, one would benefit by checking all three. If there is only time to check one, the individual features of each engine discussed below may help in establishing a preference.
Engine Capabilities and Features
Similarities
All three engines allowed some form of advanced Boolean search and used AND as the default. Beyond that, they differed somewhat in form. In this particular search, none of the engines seemed to be particularly helpful in ranking the hits so that the most helpful results came first. Instead, I often found results of particular interest to me at all points in the lists. All three engines allowed searching of a phrase in quotations, but Google required a + before stop words (small words it sees as insignificant by default). All allowed some customization of results, such as the number of results on a page, which can be a big timesaver. All allowed limitation of results by title, url, domain, date, and links. All allowed searching in other languages, but All The Web supported only a few, whereas the other two supported dozens.
Two features I would like to have seen were not available on any of the engines. First, I understand AltaVista has a large search box so that a longer query does not scroll out of sight. All these engines had small boxes, although on All The Web advanced search, the entire form was available at the bottom of each results page. Second, I would have liked the ability to limit my search to hits in which my search terms appeared early in the text rather than being buried. Only a document length feature on All The Web even approached this ability.
Features I Liked at Google
· cache feature – each result came with a link to Google’s cache of the page. A great time saver and a fix for sites no longer available.
· highlighting – on the cached pages search terms were highlighted, making only a quick scan necessary to determine relevance.
· results window – if you actually wanted to visit the url, customization allowed the link to appear in a new window. Also a big timesaver.
· similar pages – this option was available at each result, making it a lot easier to use than at the other engines.
· search within these results – a simple search box at bottom of results
· translation – allowed translation of results appearing in a language other than English
· clean and uncluttered results page
· safesearch – can customize to limit objectionable sites in results. Helpful with children.
Features I Didn’t Like at Google
· limitations to Boolean searches – no truncation, no wild cards, no parentheses, no NEAR, no NOT (except by using -).
· stop words – if you want to include the little words this engine ignores by default, you have to add extra +s and –s, even within a quoted phrase. Can be confusing. As a learning tool it is helpful to put the terms in the simple search box and then go to the advanced page to see how they appear there, and vice versa.
· form in advanced search – although it is supposed to eliminate the need for Boolean terms, it can actually create problems of its own. It says “limit of one phrase per box”, but when I just used a list of individual terms, that also didn’t work.
· limitation on number of search terms – limit of only 10. This was not stated except when you tried a search over the limit, and then it could be missed, allowing you to think some terms were included that weren’t.
· lack of folders
Additional Features at Google
Though I did not use the following features, they might be very helpful in some searches. “I’m Feeling Lucky” gives you just one result. Some words are underlined, allowing you to look up word definitions. You can search maps, phonebooks, stocks and images. The domain field can be used to limit results to that domain, or to exclude the domain from results. One can also use Google to search specific topics: mail order catalogues, Mac, BSD Unix, Linux, government sites, and universities.
Features I Liked at All The Web
· customization pages – right on the home page there is a link to a customization form. You can go there first, create customizations on a considerable number of points and save them. Then you can go to the advanced search form, and either choose to use your saved customization, or use defaults. By far more complete and accessible than at the other engines.
· more Boolean features – allows nesting
· whole language text under results – I actually have a slight preference for this engine’s use of a whole sentence from the text under each result over Google’s phrase reporting the first use of the search terms, and I strongly prefer it to Northern Light’s mixed bag presentation.
· results folders – sorts results into numerous folders, helping the searcher to find relevant ones
· clean – the addition of folders is tiny and does not spoil the clean lines of the site
· document size – a helpful feature
· advanced search form – created none of the problems noted at Google. Nicely combines Boolean limitations and where in the result you want them to apply (title, text, url, or host) in one part of the form. I particularly like the exact phrase box.
· clicking on logo always takes you to home page
· search within the results – the advanced search form at the bottom of the page does this, although it is by no means obvious that this is the case.
Features I Didn’t Like At All The Web
· lack of cache – re-searching each url or a shortened form is much too time consuming.
· need to scroll to the right to see number of results
· includes more than one result from same page without indentation or other notation
· folders can be misleading – one of the best treatment results was in a “prevention and risk factor” folder.
Other Features At All The Web
Although I did not need this feature, it would be indispensable for some searches. One can search just news, pictures, videos, MP3 files, or FTP files, and these options are available right on the home page.
Features I Liked At Northern Light
· folders – sorts results into folders when helpful, as does All The Web
· natural language – can search in natural language, a feature not available at the other engines
· more Boolean features – allows nesting and truncation in multiple positions, stems by default.
· alert account feature – allows the searcher to choose to have the engine send notification of future relevant results. Handy for ongoing research by students or people with medical conditions, etc.
Features I Didn’t Like At Northern Light
· lack of cache – broken links were a definite problem at this engine.
· folders a) – can be misleading, with much overlap, although if you’ve already looked at a result in one folder it will be highlighted in another.
· folders b) – their presentation on this site takes up almost a third of the page, ruining clean lines
· need to scroll right to see number of results.
· special collections – though they might be helpful for people willing to pay for results, I don’t like the inclusion of these. At least in my search, they padded the results with many duplications. However, at least some were also available at commercial sites in the same results.
Other Features At Northern Light
Although I didn’t use these features, they would add utility for many searchers. On the advanced search page, one can limit results by subject, from a considerable list. One can also limit by country of origin or type of document. On a limited basis, one can also search by company and by ticker symbol, and the “special collections” are searchable by publication. There are special pages and forms to help with investment research, industry focused research, and neighborhood searches for people or services.
Metasearch - Ixquick
I chose to search my term combination at Ixquick, a metasearch engine. Here there is only a simple box to place your terms in, but it fully supports all Boolean language, pluses and minuses, and natural language. It then takes your terms and translates them into a form that will be recognized at each database it polls. At least in my case it searched AOL, AltaVista UK, All The Web, Espotting, FindWhat, Hotbot UK, LookSmart UK, Lycos, MSN UK, Mirage, Open Directory, Overture UK, Sprinks, UKPlus, and Yahoo. The results are limited to the top ten from each engine, and duplications are eliminated. I thus received just 18 results for the search, each marked with stars for the number of engines listing it in their top ten. Although Ixquick does not search Google or Northern Light, most of its results were found in one or the other as well. Only three results (pulled from AltaVista) did not appear in my original search of the three engines. This engine can also search news, MP3 files, and pictures. The lib.berkeley.edu site warns that metasearch engines have a “time out” feature, so that some databases are actually not in the results because they took too long to report. This type of search is good for producing a limited number of highly relevant results, or for use in addition to a more comprehensive search of engines. If used at the beginning of a comprehensive search, one could use the results to zero in on engines to check. If used at the end, one can find a few other good results, as I did.
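The general idea of combining top-ten lists, removing duplicates, and awarding a star per listing engine can be sketched as follows. This is not Ixquick's actual algorithm, which is not documented here. The function name merge_top_results, the engine names, and the URLs are all hypothetical, and the tie-breaking rule is an arbitrary choice made for the example.

```python
from collections import Counter


def merge_top_results(results_by_engine, top_n=10):
    """Merge per-engine result lists into one de-duplicated, star-ranked list."""
    stars = Counter()
    for urls in results_by_engine.values():
        # Only the first top_n results from each engine are considered.
        # A set ensures one engine cannot award two stars to the same URL.
        for url in set(urls[:top_n]):
            stars[url] += 1
    # Most stars first; ties broken alphabetically so the order is stable.
    return sorted(stars.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical engines and URLs, for illustration only.
sample = {
    "engine_a": ["a.example/1", "b.example/2", "c.example/3"],
    "engine_b": ["b.example/2", "a.example/1"],
    "engine_c": ["a.example/1", "d.example/4"],
}
merged = merge_top_results(sample)
for url, star_count in merged:
    print("*" * star_count, url)
```

A merge of this kind explains why a poll of many engines can return a short list: heavy agreement among the engines collapses into a few highly starred results.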
Directory Search – Librarians’ Index To the Internet
Although it is not required in the assignment, I searched lii.org, thinking that this directory, evaluated and annotated by librarians, might provide sources not located in my searches. I was correct in this assumption. The lii.org “breast cancer” category listed 5 very comprehensive sources, only one of which turned up in the other searches.
Conclusion
The specific results of this search slightly favor Google as the best of the three search engines. When you add consideration of its special features, especially the highlighted cache available for each result, Google becomes a clear leader for a compound search of this type, at least in my mind. For a sophisticated searcher, Google would be a good choice for a limited search. An unsophisticated searcher might have trouble filling in the advanced search form at Google, with the possible complications noted above, but if assisted with that, anyone could readily use the results. Results from an Ixquick metasearch or a directory subject search at Librarians’ Index to the Internet might be equally satisfying to a patron looking for limited results. Provided the searcher has time to do so, certain aspects of this exercise encourage checking all three engines. Each engine provided either 14 or 15 results unique to itself that would have been missed had it not been checked. Given time, searches of directories and the use of metasearch engines can increase the relevant results as well.