The Basics of Search Engine Spiders

Search engine spiders, also known as crawlers, are scripts or programs that search engines send out to collect information about publicly accessible web pages.  Spiders gather a page’s content, keywords and meta data, and the search engine uses this information to index your site and rank it against other websites for any given search query.  There are several types of spiders, each designed for a specific function and each contributing to how a search engine views your site.  By understanding what these spiders do when they reach your website, you will be better equipped to use them to your advantage.
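To make the idea of "collecting content, keywords and meta data" more concrete, here is a minimal sketch of how a crawler might pull those pieces out of a page. It uses only Python's standard library; the names PageSummary and summarize are made up for this illustration, and real search engine spiders certainly collect far more than this.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the title, named meta tags, and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}    # e.g. {"description": "...", "keywords": "..."}
        self.links = []   # href values of <a> tags, in page order
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            # Meta names are case-insensitive, so normalize them.
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    """Return the title, meta tags, and links found in an HTML string."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "links": parser.links,
    }
```

Calling summarize() on the HTML of one of your own pages gives a rough idea of the raw material a spider has to work with before any ranking happens.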

Selection Spiders

Selection spiders are the most basic form of search engine crawler and are typically what comes to mind when we hear the term ‘search engine spider’.  As these spiders crawl the web, they primarily look for web pages that have not been indexed yet and add those pages’ data to the search engine’s index.  They also look for major changes to already-indexed pages and may apply the update, depending on the rank of the page and the spider’s exact instructions.
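The behavior described above can be sketched as a simple breadth-first crawl. This is only an illustration under stated assumptions: the crawl function, the fetch and extract_links callbacks, and the idea of storing a content fingerprint per URL are all hypothetical choices for this example, not how any particular search engine works.

```python
import hashlib
from collections import deque


def crawl(seed_urls, fetch, extract_links, index, max_pages=100):
    """Breadth-first crawl that indexes new pages and updates changed ones.

    fetch(url) returns the page text and extract_links(url, html) returns a
    list of URLs; both are supplied by the caller. index maps each URL to a
    fingerprint of the content seen on the last visit.
    """
    frontier = deque(seed_urls)   # URLs waiting to be visited
    queued = set(seed_urls)       # every URL ever added to the frontier
    report = {"new": [], "changed": [], "unchanged": []}
    visited = 0

    while frontier and visited < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        visited += 1
        fingerprint = hashlib.sha256(html.encode("utf-8")).hexdigest()

        if url not in index:
            report["new"].append(url)        # never indexed before
        elif index[url] != fingerprint:
            report["changed"].append(url)    # indexed, but content differs
        else:
            report["unchanged"].append(url)
        index[url] = fingerprint

        # Follow links, but queue each URL only once.
        for link in extract_links(url, html):
            if link not in queued:
                queued.add(link)
                frontier.append(link)

    return report
```

Passing fetch and extract_links in as parameters keeps the sketch independent of any network library, so it can be tried against a small in-memory "site" first.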

Re-Visitation Spiders

This type of spider has the exclusive job of doing what some selection spiders also do: visiting already-indexed pages to determine whether any major updates or changes need to be applied to the search engine’s index.  Re-visitation spiders are most useful for larger sites and blogs that add content constantly and need continuous updates for that content to be findable by search engine users.  A re-visitation spider checks any given site at a frequency calculated from factors pertaining both to the specific search engine and to the site itself.
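Search engines do not publish how they calculate re-visit frequency, so the following is purely an assumed rule for illustration: come back sooner after a page has changed, and wait longer when it has not. The function names and the one-hour and thirty-day bounds are invented for this sketch.

```python
from datetime import datetime, timedelta

# Assumed bounds for this sketch; real crawlers choose their own.
MIN_INTERVAL = timedelta(hours=1)
MAX_INTERVAL = timedelta(days=30)


def next_interval(current, changed):
    """Halve the wait after a change, double it otherwise, within fixed bounds."""
    proposed = current / 2 if changed else current * 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))


def next_visit(last_visit, current, changed):
    """Return the time at which the page should be checked again."""
    return last_visit + next_interval(current, changed)
```

Under this rule, a blog that publishes daily would quickly settle into frequent visits, while a page that never changes would drift toward the maximum interval.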

Polite Spiders

This type of spider works in much the same way as other spiders; the main difference is that it alternates data gathering between multiple websites, which keeps the bot from consuming excessive bandwidth on any one site and from crashing because of scripts run on particular sites.  “Impolite” bots often ignore a site’s specifications (the robots.txt file is one example) and end up wasting bandwidth or crashing the site.  Bots that ignore a site’s instructions can also crash themselves by being tricked into a “spider trap”.  Polite bots get their name from being considerate of bandwidth and of an individual website’s specifications.
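Both halves of politeness can be sketched with Python's standard library: checking a URL against a site's robots.txt rules, and ordering requests so that they rotate across websites. The helper names allowed_by_robots and interleave_by_host, and the "ExampleBot" user agent in the usage note, are hypothetical; only RobotFileParser and urlsplit are real standard-library tools.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def interleave_by_host(urls):
    """Reorder URLs so consecutive requests rotate across hosts where possible."""
    queues = defaultdict(deque)
    for url in urls:
        queues[urlsplit(url).netloc].append(url)

    hosts = deque(queues)   # hosts that still have URLs waiting
    ordered = []
    while hosts:
        host = hosts.popleft()
        ordered.append(queues[host].popleft())
        if queues[host]:
            hosts.append(host)   # send this host to the back of the line
    return ordered
```

For example, with a robots.txt containing "User-agent: *" and "Disallow: /private/", allowed_by_robots(robots_txt, "ExampleBot", "https://example.com/private/page") returns False, so a polite bot would skip that page.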

What Are Spiders Seeing?

If you are curious as to what data search engine spiders are collecting about your website, then you should try out Pingler’s Site Spider Viewer, which allows you to insert the URL of any primary, secondary or tertiary webpage and see the same information that spiders are collecting about your website.  This tool is perfect for anyone who wants to see or revise the information being collected about their website.





