What is a Search Engine Spider?

13/02/2012, in Google, Plugins and Tools

We’ve spoken a lot on this blog about how important it is for your website to be fully searchable by the search engine spiders, so now it’s time to get some in-depth knowledge about what a search engine spider actually is.  A spider isn’t a cute little arachnid, as you may have been led to believe; rather, it is an automated software agent that seeks out the content on each and every one of your webpages.  The spider’s findings are then relayed back to the search engine, enabling it to position your webpage correctly in the search results.

How do Spiders Work?

Before a search engine can bring up the right results for your search terms, it needs to know that webpages matching those terms exist and where to find them.  This is the job of the search engine spiders, which crawl through webpages looking for keywords and indexing them.  This process of moving from website to website is called web crawling (yes, this is one of the disadvantages of calling the internet the World Wide Web: we end up with a lot of spider-centred names).  For the search results to be useful to the searcher, the spiders must crawl through millions of pages and index them correctly.
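To make the idea of "indexing keywords" concrete, here is a minimal sketch of an inverted index, a lookup table from each word to the pages that contain it.  The page addresses and text are made up for illustration, and real search engines store far more than this; treat it as a toy model rather than how any particular engine works.

```python
from collections import defaultdict

# Hypothetical pages a spider has already fetched (address -> page text).
PAGES = {
    "example.com/spiders": "search engine spiders crawl webpages",
    "example.com/seo": "search engine results and keywords",
}

def build_index(pages):
    """Map every word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

index = build_index(PAGES)
print(sorted(index["search"]))  # pages that mention "search"
print(sorted(index["crawl"]))   # pages that mention "crawl"
```

Once an index like this exists, answering a search is a quick lookup rather than a fresh read of every page.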

Where Does a Spider’s Job Begin?

A search engine spider will generally begin its search on a popular website or on a heavily used server.  As it crawls through the webpages and indexes them, the spider also follows any links it finds on each page.  By following links, the search engine spiders end up indexing large parts of the internet, and in particular the most popular websites.  When Google was first developed, its creators Lawrence Page and Sergey Brin reported that at peak speeds their system could crawl over 100 pages per second using four crawlers.
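The link-following behaviour described above can be sketched as a breadth-first walk over a tiny, made-up web.  The addresses in TOY_WEB are hypothetical and nothing is fetched over the network; a real spider would also download pages, respect robots.txt and limit its request rate, all of which this sketch leaves out.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the links found on it.
TOY_WEB = {
    "popular.example/": ["popular.example/news", "blog.example/"],
    "popular.example/news": ["popular.example/"],
    "blog.example/": ["blog.example/post-1", "popular.example/"],
    "blog.example/post-1": [],
    "orphan.example/": ["popular.example/"],  # nothing links to this page
}

def crawl(seed, web):
    """Return pages in the order a breadth-first spider would visit them."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        visited.append(page)
        for link in web.get(page, []):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited

order = crawl("popular.example/", TOY_WEB)
print(order)
```

Notice that orphan.example/ is never visited because no crawled page links to it, which is one reason inbound links matter if you want the spiders to find you.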

What Do the Spiders Look For?

The answer to this question varies depending on the search engine.  For example, the Google spiders take into account the titles, subtitles, meta tags and keywords on a page whilst ignoring words like “a, an, of, in” etc.  Other spiders crawl the webpages differently; some may even still pay attention to keyword tags.  It all depends on the search engine they are working for.  Additionally, the search engines are continually updating the way they regulate their searches, so whilst the Google spiders may take certain things into account now, this may all change in a few months’ time.
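As a rough illustration of pulling out titles, subtitles and meta tags while skipping the small words, here is a sketch built on Python’s standard html.parser module.  The PageSignals class, the sample page and the choice of which tags to read are all assumptions made for this example; they are not a description of what Google or any other search engine actually does.

```python
from html.parser import HTMLParser

# The small words named in the text above; real engines use their own lists.
STOP_WORDS = {"a", "an", "of", "in"}

class PageSignals(HTMLParser):
    """Collect words from the title, headings and selected meta tags."""

    def __init__(self):
        super().__init__()
        self.signals = {"title": [], "heading": [], "meta": []}
        self._current = None  # which field the parser is currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = "title"
        elif tag in {"h1", "h2", "h3"}:
            self._current = "heading"
        elif tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") in {"description", "keywords"}:
                self._add("meta", attributes.get("content") or "")

    def handle_endtag(self, tag):
        if tag in {"title", "h1", "h2", "h3"}:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self._add(self._current, data)

    def _add(self, field, text):
        words = [w.strip(".,!?:;").lower() for w in text.split()]
        self.signals[field] += [w for w in words if w and w not in STOP_WORDS]

# Hypothetical page used only for this demonstration.
SAMPLE = """<html><head><title>A Guide to Spiders in Search</title>
<meta name="description" content="An overview of web crawling">
</head><body><h1>Anatomy of a Spider</h1><p>Body text.</p></body></html>"""

parser = PageSignals()
parser.feed(SAMPLE)
print(parser.signals)
```

The body paragraph is deliberately ignored here to keep the sketch short; a fuller version would record body words as well.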

A Spider’s Job is Never Finished

With the World Wide Web changing on a daily basis, a spider’s job is never finished.  After indexing webpages, the search engines use a variety of algorithms to determine which webpage is the most useful for the searcher when they enter a search term.  Each page is given a different amount of “weight” by the search engine, so when a user conducts a search they should, in theory, be presented with a list of the most useful webpages for their search term.  As the number of webpages on the internet increases daily, search engine spiders need to keep crawling the web (both new and existing pages) to ensure they are still presenting the most relevant search results to users.
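To give a feel for what “weight” might mean, here is a deliberately simple sketch that scores pages for a search term, counting a match in the title as worth more than a match in the body.  The weights, page data and scoring rule are invented for illustration; real ranking algorithms are far more elaborate and are not public.

```python
# Hypothetical weights: a title match counts three times as much as a body match.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}

# Hypothetical pages that have already been crawled and indexed.
INDEXED_PAGES = {
    "example.com/spiders": {
        "title": "search engine spiders",
        "body": "how spiders crawl the web",
    },
    "example.com/seo": {
        "title": "seo basics",
        "body": "spiders index keywords for search engines",
    },
    "example.com/cats": {
        "title": "cute cats",
        "body": "no arachnids here",
    },
}

def score(page, term):
    """Sum the weighted number of times the term appears in each field."""
    term = term.lower()
    return sum(
        weight * page[field].lower().split().count(term)
        for field, weight in FIELD_WEIGHTS.items()
    )

def rank(pages, term):
    """Return matching page addresses, highest score first."""
    scored = [(score(page, term), url) for url, page in pages.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for points, url in scored if points > 0]

results = rank(INDEXED_PAGES, "spiders")
print(results)
```

Because pages keep changing, scores like these go stale, which is why the spiders have to revisit pages they have already indexed.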

If you want to see how your website looks to a search engine, you might want to check out the Pingler Spider Viewer Tool.  Whilst the results aren’t intended to be seen by human eyes, they can help you to determine what the spiders see and whether they can crawl your page effectively.