The difference between a search engine and a directory is that the directory allows the user to 'climb the abstraction ladder.' That is, the user is given a set of general categories to choose from. After a category is selected, its sub-categories are presented for selection. This process continues until the user has reached the top of the ladder. Consider this example:
Computers, Software, Operating System, OS/2, Configuration, How to
When you climb the ladder (left to right), you are getting more specific. When you go right to left, you are becoming more general.
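The ladder can be pictured as a nested tree of categories. The following is only a minimal sketch: the DIRECTORY data and the function names are hypothetical, and the "Hardware" and "Recreation" branches are invented so that there is something to choose between.

```python
# Hypothetical category tree; the long branch follows the example path above.
DIRECTORY = {
    "Computers": {
        "Software": {
            "Operating System": {
                "OS/2": {
                    "Configuration": {"How to": {}},
                },
            },
        },
        "Hardware": {},   # invented sibling category
    },
    "Recreation": {},     # invented top-level category
}


def subcategories(path):
    """Return the choices offered after following `path` from the root."""
    node = DIRECTORY
    for name in path:
        node = node[name]  # raises KeyError if the category does not exist
    return sorted(node)


def more_general(path):
    """Step one rung back toward the general end of the ladder."""
    return path[:-1]


print(subcategories([]))
print(subcategories(["Computers", "Software"]))
print(more_general(["Computers", "Software", "Operating System"]))
```

Each call to subcategories corresponds to one selection by the user; an empty list of choices means there are no further rungs.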
A search engine, on the other hand, takes a user query and slams through its database until it finds matches for the terms. This requires that the user have a good idea of what they are looking for. The most popular search engines are AltaVista, Webcrawler, and Hotbot.
In this discussion I will use the term search engine to refer to both types of systems. This is appropriate, as all the directory systems employ search engine technology to toss a user directly into relevant categories. The primary difference at this point is that the real search engines are searching their mirrors of documents, whereas the directories are searching their categories.
I attempted searches to determine whether this alternate text was indexed by inputting hey dude, which I knew to be included under an image on my welcome.htm page. This search on AltaVista yielded over 62,000 documents. By adding nerd (randomly selected from the body) to the search, I found a reference to my page dated 04 May, 1997.
This test leads me to feel that use of the ALT attribute would be wise for all images.
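To see what an indexer would have available from the ALT text, one can pull it out of a page with Python's standard html.parser module. This is a sketch only: the class name, the function name, and the sample markup (including the file name face.gif) are assumptions made for illustration, not the contents of welcome.htm or the method any engine actually uses.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects the ALT text of every IMG element in a page."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "img":
            for name, value in attrs:
                if name == "alt" and value:
                    self.alt_texts.append(value)


def collect_alt_text(html):
    parser = AltTextCollector()
    parser.feed(html)
    parser.close()
    return parser.alt_texts


# Hypothetical markup standing in for the image on the welcome page.
sample = '<P>Welcome <IMG SRC="face.gif" ALT="hey dude"> to my page</P>'
print(collect_alt_text(sample))
```

An image with no ALT attribute contributes nothing to the list, which is the case the recommendation above is meant to avoid.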
Using the term Cannon Launched Guided Projectile and its acronym CLGP as a control value, I have discovered several interesting aspects of search engine operation.
On AltaVista, a search on CLGP (July 7, 1997) yielded 39 hits. Several of these hits lead to foreign-language documents where clgp is a Spanish or Czech diphthong. My article, Lasers on the Modern Battlefield, contains this term three times in the text, and appears as item number eight.
By inputting "cannon launched guided projectile" (note the use of quotation marks), the search yields four hits, and my article is number four. By loading all the documents, counting the occurrences of the term, and viewing the HTML source, I found that the search term appeared only once in each doc. Furthermore, none of the articles used the META keywords function.
As I researched these documents, I began to wonder why my main page, welcome.htm, did not show in the search. I tried the search using a comma in the argument ("cannon launched, guided projectile") and by capitalizing the first letter of each word. The search yields the same four hits, indicating that input parameters are not case sensitive or punctuation dependent.
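One way an engine could produce this behavior is to normalize both the query and the document text before comparing them. The sketch below is an assumption about how such matching might work, not a description of AltaVista's internals; the function names are hypothetical.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text):
    """Lowercase the text, replace punctuation with spaces, split into words."""
    return text.lower().translate(_PUNCT_TO_SPACE).split()


def phrase_matches(query, document):
    """True if the normalized query words appear consecutively in the document."""
    q = normalize(query)
    d = normalize(document)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))


doc = "The Cannon Launched Guided Projectile (CLGP) is a laser guided round."
print(phrase_matches("cannon launched guided projectile", doc))
print(phrase_matches("Cannon Launched, Guided Projectile", doc))
print(phrase_matches("guided cannon", doc))
```

Under this scheme the comma and the capital letters disappear before any comparison takes place, so all three spellings of the query tried above would reach the same documents.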
Each of the docs should have carried the same weight, but mine was the last item. My article appeared as the last hit because of the way AltaVista executes its sort: it lists articles with equal weight in alphabetic order based on the first line of text in the document.
The reason my main page is not on the list is that both the acronym and the term appear within a tag. I placed the information on my top page in the form of an alphabetic index, but used the terms in the text of the link. If the search engine were to catalogue data within tags, a huge percentage of its information would be millions of URLs.
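The ordering described here amounts to a two-part sort key: weight first, then the first line of text as a tie-breaker. A minimal sketch follows. The weights and all first lines are invented, including the assumption that my article's first line is its title.

```python
def rank(hits):
    """Order (weight, first_line) pairs by weight, highest first;
    ties are broken alphabetically by the first line of text."""
    return sorted(hits, key=lambda hit: (-hit[0], hit[1].lower()))


# Four hits of equal weight, as in the phrase search above (data is invented).
hits = [
    (1, "Lasers on the Modern Battlefield"),
    (1, "Guided Munitions Summary"),
    (1, "Artillery Notes"),
    (1, "Field Manual Excerpt"),
]

for weight, first_line in rank(hits):
    print(weight, first_line)
```

With equal weights, a first line beginning with "L" lands behind ones beginning with "A", "F", and "G", which matches the position observed for the article.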
To increase my odds, the HTML line that reads:
<A HREF="docs/laser.htm"> CLGP- Cannon Launched, Guided Projectile</A>
needs to be changed to move the text outside of the tag:
<A HREF="docs/laser.htm"></A> CLGP- Cannon Launched, Guided Projectile
Unfortunately, this does not provide a target for the user to select.
It is considered bad form to use 'click here', yet this would solve the
problem. The link could contain an embedded GIF in the form of
a bullet, but displaying the graphics would delay page loading. This
presents us with a classic trade-off situation. To test the ability
to manipulate the search engine, I will recode the HTML with a text bullet:
<A HREF="docs/laser.htm">[_]</A> CLGP- Cannon Launched, Guided Projectile
This will provide the user with a target and load quickly, but will not be
as 'pretty' as a graphical bullet. This is acceptable when considering that
the majority of personal webpage visitors are using low bandwidth modems.
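The effect of moving the text outside the link can be simulated with a small indexer that ignores anything between an opening and closing A tag. This sketch models only the behavior inferred from the search results above; the class and function names are hypothetical, and it is not a claim about how AltaVista is built.

```python
import re
from html.parser import HTMLParser


class BodyTextIndexer(HTMLParser):
    """Collects words from a page, skipping text inside <A>...</A>."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Index only text that is not part of a link.
        if self.anchor_depth == 0:
            self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def indexed_words(html):
    parser = BodyTextIndexer()
    parser.feed(html)
    parser.close()  # flushes any text still buffered after the last tag
    return parser.words


inside = '<A HREF="docs/laser.htm"> CLGP- Cannon Launched, Guided Projectile</A>'
outside = '<A HREF="docs/laser.htm">[_]</A> CLGP- Cannon Launched, Guided Projectile'

print(indexed_words(inside))
print(indexed_words(outside))
```

With the term inside the link nothing is indexed; with only the text bullet inside the link, the acronym and all four words of the term become searchable.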
A search for CLGP on Webcrawler, however, finds the top document
(welcome.htm) and not the article (laser.htm). This would indicate that
Webcrawler can look inside tags, but raises the question as to why the
actual article did not display.
This raises an interesting question: if a document contains more than one keywords META statement, are the subsequent statements ignored?
The usefulness of adverbs is debatable. Some engines include what they call "fuzzy logic" and allow the use of words like "very." This is actually a natural language search that places more weight on the modified word than on the other arguments. A search on "very red apples" would weight the word red, and may yield an article about the sunset in Borneo. Even executing a relative position search would be of no help, as the author may have used the term "extremely red apples." This requires that the engine be programmed to search not simply for "very," but for all its synonyms. This would grind most servers to a halt.
Another search billed as fuzzy logic is actually a phonetic search. In this search, "stashun" would successfully yield "station." Again, this places additional demands on the server, demands engine operators are not prepared to accept.
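A phonetic search can be approximated by reducing every word to a sound-alike key and comparing keys instead of spellings. The respelling rules below are a toy set invented for this illustration, chosen so that the "stashun" example works. They are not the rules of any real engine, nor a standard algorithm such as Soundex.

```python
import re

# Toy respelling rules, applied in order (illustrative assumptions only).
RULES = [
    (r"tion", "shun"),
    (r"ph", "f"),
    (r"ck", "k"),
    (r"(.)\1", r"\1"),  # collapse doubled letters
]


def phonetic_key(word):
    """Reduce a word to a rough sound-alike key."""
    key = word.lower()
    for pattern, replacement in RULES:
        key = re.sub(pattern, replacement, key)
    return key


def phonetic_search(query, vocabulary):
    """Return the vocabulary words whose key matches the query's key."""
    target = phonetic_key(query)
    return [word for word in vocabulary if phonetic_key(word) == target]


print(phonetic_search("stashun", ["station", "stadium", "nation"]))
```

Even in this toy form, a key has to be computed for every word being compared, which hints at where the additional server load would come from.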