
                               

SEARCH  ENGINE  COMPARISON  

Google, AllTheWeb, Northern Light, and Ixquick

                                                                       

 

Goals of Search

My mother was treated for breast cancer about eight years ago.  Doctors found evidence that it had metastasized to her bones two years after that.  They changed her treatment, and she continues to survive with no evidence of increased spread of the cancer after that point in time.  In this search I wanted to discover the range of treatments used for breast cancer metastasized to bone, and some idea of the survival expectations for this disease.

 

I chose to search using the following search engines:

 

     1) Google

     2) All The Web

     3) Northern Light

 

and the metasearch engine:

 

     4) Ixquick    

 

Search Terms

     After analyzing the subject and the capabilities of the engines I chose, and performing a few simple trial searches, I searched using Boolean methodology, substituting the minus sign (-) for NOT in the following manner:

  

              treatment “metastasized breast cancer” -brain -pancreas -skin -lung -lungs    

 

     In formulating my search terms, my goals were 1) to produce a combination of terms that could be used in exactly the same form on all of the engines tested, 2) to produce a manageable set of results (fewer than 100 on each engine), and 3) to maximize highly relevant results while minimizing irrelevant ones.  Because all the engines I used had AND as the default, it was not necessary to use it in my search combination.  The lack of support for truncation on two of the search engines made it difficult to use any form of the word survive in my search because of its many variants, and also made it necessary to include both lung and lungs.  Finally, the length of my combination was limited by the fact that one of the engines searched a maximum of only 10 terms.
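The constraints described above (implicit AND, a quoted phrase, minus signs in place of NOT, and a ten-term ceiling) can be sketched in a few lines of Python.  This is only an illustration: the function name build_query and its parameters are hypothetical, and counting each word inside the quoted phrase toward the limit is an assumption about how an engine applies its limit, not something confirmed by any engine's documentation.

```python
def build_query(required, phrase, excluded, max_terms=10):
    """Build an implicit-AND query with one quoted phrase and minus-sign exclusions."""
    # Assumption: every word counts toward the limit, including each word
    # inside the quoted phrase.
    word_count = len(required) + len(phrase.split()) + len(excluded)
    if word_count > max_terms:
        raise ValueError(f"{word_count} terms exceeds the limit of {max_terms}")
    # Required words first, then the phrase, then each excluded word with "-".
    parts = list(required) + [f'"{phrase}"'] + [f"-{word}" for word in excluded]
    return " ".join(parts)


query = build_query(
    required=["treatment"],
    phrase="metastasized breast cancer",
    excluded=["brain", "pancreas", "skin", "lung", "lungs"],
)
print(query)
```

Under that counting assumption the combination comes to nine words, so only one more exclusion could be added before reaching a ten-term limit.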

 

      In test searches, the use of some form of the term survive seemed to consistently produce a subset of the results obtained using just treatment, so not being able to use it probably did not affect the results significantly.  I chose to use “metastasized breast cancer” instead of “breast cancer” followed by metastasized because, although both were effective in eliminating a multitude of results covering breast cancer with no metastases, the second (with all other NOT terms) still produced 500-900 results per engine.  It should be noted that without the goal of creating a set of fewer than 100 results, using [treatment “breast cancer” metastasized -brain -pancreas -skin -lung -lungs] and scanning only the top results might give the searcher equal or better results.  Using the term bone in test searches to zero in on that particular metastasis produced results too heavily skewed toward the bone marrow treatments of this disease.  It was also not as effective in eliminating other metastases as was using NOT (substituting -) with each other organ that sometimes appeared.  I realize that there might be some sources that discussed bone metastases in addition to these others, but I was willing to accept this loss of “recall” in the interest of producing a manageable number of results that eliminated most irrelevant ones.

 

Methodology

     I performed the search on all three engines by typing the search terms directly into the search box provided on the “advanced search” or “power search” page.  When a form was presented as a replacement for Boolean search terms, I used it, but then tested again with the exact search combination in the simple search box to verify that the results were the same.  This verification process revealed, for example, that in the google.com advanced search form, my multiple NOT terms could not be placed together in the same box.  This was not obvious from the instructions at the site, or from other reviews and tutorials of the engines that I checked (searchenginewatch.com, pandia.com, lib.berkeley.edu, infopeople.org).

 

     After gathering the results from the engines, I analyzed them for relevancy, duplication, and broken links within each engine, and for overlap between the engines.  The total results for each engine are as printed on the original results pages, and do not include any results available by removing filters or by clicking on additional links.  Google does not count its sponsored links, so I didn’t include them in the analysis.  To be judged relevant, a hit did not have to provide information of interest specifically to me, but only had to refer in some way to a treatment of breast cancer metastasized to bone.  Although one cannot fault the engines too much for including different URLs that provide exactly the same information, these instances do reduce the usable results, and so they are counted as duplicates.  Northern Light includes “special collection” items among its results.  These are available for a charge of $2.95 at their site.  I chose to compare only free results, so any special collection items not already eliminated as irrelevant or duplicate are eliminated in a separate category in the table below.  The table covers the three search engines.  The results of the metasearch engine are discussed later.

 

 

 

                                 Google    All The Web    Northern Light

Total Hits              (+)        46          39              49
Irrelevant Hits         (-)         3           2               2
Duplicate Hits          (-)         3           1              11
Hits With Cost          (-)         0           0               0
Total Useable Hits      (=)        40          36              33

% of Total Unique Hits Found     57.97%      52.17%          47.82%
(= Total Useable Hits / 69)

          The 69 total unique hits discovered in these three searches are distributed as follows:

 

          Results found in all three engines                                         9  

          Results found only at Google                                              15

          Results found only at All The Web                                     15

          Results found only at Northern Light                                  14

          Results found just at Google and All The Web                     7

          Results found just at Google and Northern Light                  5

          Results found just at All The Web and Northern Light         4
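As a check on the distribution above, the breakdown can be expressed as a count of unique results keyed by the exact set of engines that returned each one.  The sketch below is a minimal illustration in Python: the function name overlap_breakdown and the sample URLs are hypothetical, while the dictionary named reported simply restates the seven counts listed above.

```python
from collections import Counter


def overlap_breakdown(results):
    """Count unique URLs by the exact set of engines that returned them."""
    engines_by_url = {}
    for engine, urls in results.items():
        for url in urls:
            engines_by_url.setdefault(url, set()).add(engine)
    return Counter(frozenset(engines) for engines in engines_by_url.values())


# Hypothetical URLs, for illustration only.
sample = {
    "Google": {"a", "b", "c"},
    "All The Web": {"b", "c", "d"},
    "Northern Light": {"c", "e"},
}
breakdown = overlap_breakdown(sample)

# The seven counts reported above, keyed the same way.
reported = {
    frozenset({"Google", "All The Web", "Northern Light"}): 9,
    frozenset({"Google"}): 15,
    frozenset({"All The Web"}): 15,
    frozenset({"Northern Light"}): 14,
    frozenset({"Google", "All The Web"}): 7,
    frozenset({"Google", "Northern Light"}): 5,
    frozenset({"All The Web", "Northern Light"}): 4,
}
total_unique = sum(reported.values())
single_engine = sum(n for engines, n in reported.items() if len(engines) == 1)
print(total_unique, single_engine)
```

The seven reported counts sum to 69, and 44 of those results were found by only one engine.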

 

     Analyzing these findings alone, it is not possible to definitively present any one of these engines as superior to the others.  In this particular search, Google does lead in total useable hits, and % of total hits found, but not by a percentage that I would consider significant.  In addition, each engine found an almost equal number of hits unique to itself, which would lead one to believe that if time is available, one would benefit by checking all three.  If there is only time to check one, the individual features of each engine discussed below would add to establishing a preference.

 

 

Engine Capabilities and Features

 

Similarities

   All three engines allowed some form of advanced Boolean search and used AND as the default; beyond that, they differed somewhat in form.  In this particular search, none of the engines seemed to be particularly helpful in ranking the hits so that the most helpful results came first.  Instead, I found results of particular interest to me at all points in the lists.  All three engines allowed searching of a phrase in quotation marks, but Google required a + before stop words (small words it sees as insignificant by default).  All allowed some customization of results, such as the number of results on a page, which can be a big timesaver.  All allowed limitation of results by title, URL, domain, date, and links.  All allowed searching in other languages, but All The Web supported only a few, whereas the other two supported dozens.  Two features I would like to have seen were not available on any of the engines.  First, I understand AltaVista has a large search box so that a longer query does not scroll out of sight.  All these engines had small boxes, although on the All The Web advanced search the entire form was available at the bottom of each results page.  Second, I would have liked the ability to limit my search to hits in which my search terms appeared early in the text rather than being buried.  Only a document length feature on All The Web even approached this ability.

 

Features I Liked at Google

·        cache feature – each result came with a link to Google’s cache of the page.  A great time saver and a fix for sites no longer available.

·        highlighting – on the cached pages search terms were highlighted making only a quick scan necessary to determine relevance.

·        results window – if you actually wanted to visit the url, customization allowed the link to appear in a new window.  Also a big timesaver.

·        similar pages – this option was available at each result, making it a lot easier to use than on the other engines.

·        search within these results – a simple search box at bottom of results

·        translation – allowed translation of results appearing in a language other than English

·        clean and uncluttered results page

·        safesearch – can customize to limit objectionable sites in results.  Helpful with children.

 

Features I Didn’t Like at Google

·        limitations to Boolean searches – no truncation, no wild cards, no parentheses, no NEAR, no NOT (except by using -).

·        stop words – if you want to include the little words this engine ignores by default, you have to add extra +s and –s, even within a quoted phrase.  Can be confusing.  As a learning tool it is helpful to put the terms in the simple search box and then go to the advanced page to see how they appear there, and vice versa.

·        form in advanced search, which is supposed to eliminate the need for Boolean terms, can actually create problems of its own.  Says “limit of one phrase per box”, but when I just used a list of individual terms, that also didn’t work.

·        limitation on number of search terms – limit of only 10.  This was not stated except when you tried a search over the limit, and then it could be missed, allowing you to think some terms were included that weren’t.

·        lack of folders

 

Additional Features at Google

     Though I did not use the following features, they might be very helpful in some searches.  “I’m Feeling Lucky” gives you just one result.  Some words are underlined, allowing you to look up word definitions.  You can search maps, phonebooks, stocks and images.  The domain field can be used to limit results to that domain, or to exclude the domain from results.  One can also use Google to search specific topics: mail order catalogues, Mac, BSD Unix, Linux, government sites, and universities.

 

Features I Liked at All The Web

·        customization pages – right on the home page there is a link to a customization form.  You can go there first, create customizations on a considerable number of points and save them.  Then you can go to the advanced search form, and either choose to use your saved customization, or use defaults – by far more complete and accessible than the other engines.

·        more boolean features – allows nesting

·        whole language text under results – I actually have a slight preference for this engine’s use of a whole sentence from the text under each result over Google’s phrase report of the first use of the search terms, and I strongly prefer it to Northern Light’s mixed bag presentation.

·        results folders – sorts results into numerous folders, helping searcher to find relevant ones

·        clean – addition of folders is tiny and does not spoil the clean lines of the site

·        document size – a helpful feature

·        Advanced search form – created none of the problems noted at Google.  Nicely combines Boolean limitations, and where in the result you want them to apply (title, text, url, or host) in one part of the form.  Particularly like exact phrase box.

·        clicking on logo always takes you to home page

·        search within the results – advanced search form at bottom of page does this although it is by no means obvious that this is the case.

 

Features I Don’t Like At All The Web

·        lack of cache – re-searching each url or a shortened form is much too time consuming.

·        need to scroll to the right to see number of results

·        includes more than one result from same page without indentation or other notation

·        folders can be misleading – one of the best treatment results was in a “prevention and risk factor” folder.

 

Other Features At All The Web

     Although I did not need this feature, it would be indispensable for some searches.  One can search just news, pictures, videos, MP3 files, or FTP files, and these options are available right on the home page.

 

Features I Liked At Northern Light

·        folders – sorts results into folders as does All The Web when helpful

·        natural language – can search in natural language, a feature not available at other engines

·        more Boolean features – allows nesting and truncation in multiple positions, stems by default.

·        alert account feature – allows searcher to choose to have engine notify of future relevant results – handy for ongoing research of students or people with medical conditions, etc.

 

Features I Didn’t Like At Northern Light

·        lack of cache – broken links were a definite problem at this engine.

·        folders a) – can be misleading, much overlap although if you’ve already looked at a result in one folder it will be highlighted in another.

·        folders b) – their presentation on this site takes up almost a third of the page ruining clean lines

·        don’t like to have to scroll right to see number of results.

·        special collections – though they might be helpful for people willing to pay for results, I don’t like the inclusion of these.  At least in my search, they padded the results with many duplications.  However, at least some were also available at commercial sites in the same results.

 

Other Features At Northern Light

     Although I didn’t use these features, they would add utility for many searchers.  On the advanced search page, one can limit results by subject, a considerable list.  One can also limit by country of origin or type of document.  On a limited basis, one can also search by company and by ticker symbol, and the “special collections” are searchable by publication.  There are special pages and forms to help with investment research, industry focused research, and neighborhood searches for people or services.

 

Metasearch - Ixquick

     I chose to search my term combination at Ixquick, a metasearch engine.  Here there is only a simple box to place your terms in, but it fully supports all Boolean language, pluses and minuses, and natural language.  It then takes your terms and translates them into a form that will be recognized at each database it polls.  At least in my case it searched AOL, AltaVista UK, All The Web, Espotting, FindWhat, Hotbot UK, LookSmart UK, Lycos, MSN UK, Mirage, Open Directory, Overture UK, Sprinks, UKPlus, and Yahoo.  The results are limited to the top ten from each engine, and duplications are eliminated.  I thus received just 18 results for the search, each marked with stars for the number of engines listing it in their top ten.  Although Ixquick does not search Google or Northern Light, most of its results were found in one or the other as well.  Only three results (pulled from AltaVista) did not appear in my original search of the three engines.  This engine can also search news, MP3 files, and pictures.  The lib.berkeley.edu site warns that metasearch engines have a “time out” feature, so that some databases are actually not in the results because they took too long to report.  This type of search is good for producing a limited number of highly relevant results, or for use in addition to a more comprehensive search of engines.  If used at the beginning of a comprehensive search, one could use the results to zero in on engines to check.  If used at the end, one can find a few other good results, as I did.
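The merging behavior described above (take the top results from each engine, drop duplicates, and mark each result with the number of engines that listed it) can be sketched as follows.  This is a simplified illustration in Python, not Ixquick’s actual method: the function name merge_top_results, the engine names, and the URLs are all hypothetical.

```python
def merge_top_results(rankings, top_n=10):
    """Merge per-engine rankings, keeping each URL once with a star count."""
    engines_by_url = {}
    for engine, urls in rankings.items():
        # Only the first top_n results from each engine are considered.
        for url in urls[:top_n]:
            engines_by_url.setdefault(url, set()).add(engine)
    # One star per engine that listed the URL; most-starred first,
    # with ties broken alphabetically so the order is stable.
    starred = [(url, len(engines)) for url, engines in engines_by_url.items()]
    return sorted(starred, key=lambda item: (-item[1], item[0]))


# Hypothetical engines and URLs, for illustration only.
rankings = {
    "EngineA": ["x.example/1", "x.example/2"],
    "EngineB": ["x.example/2", "x.example/3"],
    "EngineC": ["x.example/2", "x.example/1"],
}
merged = merge_top_results(rankings)
for url, stars in merged:
    print("*" * stars, url)
```

Because only each engine’s top results are merged, the combined list stays short, which is consistent with receiving just 18 results from fifteen engines.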

 

Directory Search – Librarians’ Index To the Internet

     Although it is not required in the assignment, I searched lii.org, thinking that this directory, evaluated and annotated by librarians, might provide sources not located in my searches.  I was correct in this assumption.  The lii.org “breast cancer” category listed 5 very comprehensive sources, only one of which turned up in the other searches.

 

Conclusion

     The specific results of this search slightly favor Google as the best of the three search engines.  When you add consideration of its special features, especially the highlighted cache available for each result, Google becomes a clear leader for a compound search of this type, at least in my mind.  For a sophisticated searcher, Google would be a good choice for a limited search.  An unsophisticated searcher might have trouble filling in the advanced search form at Google, with the possible complications noted above, but if assisted with that, anyone could readily use the results.  Results from an Ixquick metasearch or a directory subject search at Librarians’ Index to the Internet might be equally satisfying to a patron looking for limited results.  Provided the searcher has time to do so, certain aspects of this exercise encourage checking all three engines.  Each engine provided either 14 or 15 results unique to itself that would have been missed had it not been checked.  Given time, searches of directories and the use of metasearch engines can increase the relevant results as well.

