
Protecting Your Website with a Robots.txt File




Do You Need A Robots.txt file 

Do you need a robots.txt file? Is it important? Yes, you need a robots.txt file and, yes, the file is important. The main purpose of a robots.txt file is to give spiders/robots instructions on what they can and cannot crawl. In fact, some robots will not venture onto your site unless a robots.txt file is present in the main directory of your website. So what can a robots.txt file do for you?

The robots.txt file can help you to:

1. exclude pages that are still under construction from being crawled
2. exclude directories or pages that you do not want indexed
3. exclude search engines that you may not want your website to appear in (more on that later)

However, not all spiders are polite, and some are just downright rude. Those that mind their manners look first at the robots.txt file on your server before entering.

For information on taking care of the rude spiders, you'll find the article on the htaccess file helpful.

For the moment, let's discuss the robots.txt file.


Building a Robots.txt file

Building a robots.txt file is relatively simple. You use a small set of commands either to tell a robot that it is welcome to visit your site, or to tell a robot/spider to please go away because it is not welcome.

Below are some specific commands that you can use within your robots.txt file to keep control of the different bots.

*If you want all robots to index your pages, you can use the following commands. (An empty Disallow line means nothing is off limits.)

User-agent: *
Disallow:

*If you want to disallow a robot from your site, you would use the following code:

User-agent: specificbadbot
Disallow: /

(The "/" is needed because it stands for the root of your site, so the rule covers every directory and page.)
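If you would like to see how a well-behaved robot reads the two examples above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name parse_rules and the bot name "otherbot" are my own, made up for illustration, and real spiders may interpret rules a little differently.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Build a parser from robots.txt text (hypothetical helper name)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# An empty Disallow line means nothing is off limits.
allow_all = parse_rules("User-agent: *\nDisallow:\n")

# "Disallow: /" shuts the named robot out of the whole site.
block_one = parse_rules("User-agent: specificbadbot\nDisallow: /\n")

print(allow_all.can_fetch("otherbot", "/index.html"))        # True
print(block_one.can_fetch("specificbadbot", "/index.html"))  # False
print(block_one.can_fetch("otherbot", "/index.html"))        # True
```

Notice that the second file only turns away the robot it names. Any robot that is not mentioned is still free to crawl.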



*If you don't want a complete directory to be indexed, you would use the following code. Here "nogoindex" is the directory you don't want indexed. (This example speaks to Googlebot only; use * in place of Googlebot to address all robots.)

User-agent: Googlebot
Disallow: /nogoindex/

The forward slash at the beginning starts the path at the root of your site, and the forward slash at the end limits the rule to that directory and everything inside it.


*If you don't want Google to index a single page, you would use the following code.
(Here "nogoindex" is the directory and "donotenterpage" is the page you don't want Google to index.)

User-agent: Googlebot
Disallow: /nogoindex/donotenterpage.html
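The directory rule and the page rule can be checked the same way. This is only a sketch with Python's standard urllib.robotparser; the paths other than "nogoindex" and "donotenterpage.html", and the bot name "otherbot", are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Rule 1: keep Googlebot out of a whole directory.
directory_rule = RobotFileParser()
directory_rule.parse([
    "User-agent: Googlebot",
    "Disallow: /nogoindex/",
])

# Rule 2: keep Googlebot away from a single page.
page_rule = RobotFileParser()
page_rule.parse([
    "User-agent: Googlebot",
    "Disallow: /nogoindex/donotenterpage.html",
])

# Everything inside the directory is blocked for Googlebot.
print(directory_rule.can_fetch("Googlebot", "/nogoindex/anypage.html"))     # False
# Pages outside the directory are not affected.
print(directory_rule.can_fetch("Googlebot", "/articles/page.html"))         # True
# A robot that the file does not name is not bound by the rule.
print(directory_rule.can_fetch("otherbot", "/nogoindex/anypage.html"))      # True

# The page rule blocks only that one page.
print(page_rule.can_fetch("Googlebot", "/nogoindex/donotenterpage.html"))   # False
print(page_rule.can_fetch("Googlebot", "/nogoindex/anotherpage.html"))      # True
```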


After you have created your robots.txt file, be sure to validate your coding. What can happen if the file is invalid? If you forget a line of code, or type "Disallow: /" where you meant "Disallow:", and Googlebot reads your file, you may find a few days later that your site has been de-indexed. To safeguard yourself against such an incident, there are several places where you can have your robots.txt file validated:

Free Validation tools:

Clockwatchers

Search Engine Promotions


Note: There are also a lot of robots.txt generator tools if you don't feel comfortable creating your own file at the very beginning. Just search Google for "robots.txt generator".
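If you are curious what a validator looks for, here is a rough sketch of the idea in Python. It is not how the tools listed above work internally, and it is no substitute for them. The function name lint_robots_txt, the list of accepted fields, and the wording of the messages are all my own assumptions.

```python
# Fields this sketch accepts. Real validators may accept more or fewer.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text):
    """Return a list of (line_number, message) problems found in the text."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            seen_user_agent = True
            if not value:
                problems.append((number, "User-agent has no robot name"))
        elif field in ("disallow", "allow"):
            if not seen_user_agent:
                problems.append((number, "rule appears before any User-agent line"))
            if value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
    return problems


# A misspelled field is the kind of slip that is easy to miss by eye.
print(lint_robots_txt("User-agent: *\nDisalow: /private/\n"))
```

A file that passes a check like this can still do the wrong thing, for example a stray "Disallow: /" is perfectly valid, so read the rules over yourself as well.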


Google Rules with HTML

I've read other sites that say not to use META tags to tell the search engines that you do not want a page indexed. But when I checked Google Webmaster, they do say to use META tags within the head section for some of the following:

To prevent robots from indexing your page and from following the links on it:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

To allow robots to index the page on your site, but instruct them not to follow the links on it:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">

To index the page on your site but not index the images on that page:

<META NAME="ROBOTS" CONTENT="NOIMAGEINDEX">
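To double-check which of these instructions a page is actually sending, you could read the tag back out of the HTML. Below is a small sketch using Python's standard html.parser module. The class name RobotsMetaParser and the function name robots_directives are my own, and the sample page is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for item in attributes.get("content", "").split(","):
            if item.strip():
                self.directives.add(item.strip().lower())


def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```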


Rules-And Search Engines


In a perfect world, all rules would be followed. But alas, not all spiders follow the rules of the robots.txt file. How do you know that a spider is not following your rules?

Check your stats. Do you find a recurring IP address hitting your site? Or maybe you find a spider hitting your site and consistently pulling information that it shouldn't be. It could be scraping your content, or it could simply be consuming bandwidth. Finally, if you do nothing to keep your visiting robots in check, you may get an email from your web hosting provider that happens to say, "Did you know that your content is being stolen?"
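If your host gives you the raw access log, a few lines of Python can point out the busiest visitors. This is only a sketch: it assumes a log format where the IP address is the first item on each line, the function name top_visitors is my own, and the sample log lines and addresses are invented for the example.

```python
from collections import Counter


def top_visitors(log_lines, limit=3):
    """Count hits per IP address, assuming the IP is the first field on each line."""
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:
            hits[fields[0]] += 1
    return hits.most_common(limit)


# Invented sample lines in a common access-log layout.
sample_log = [
    '203.0.113.5 - - [10/Oct/2010:13:55:36 -0700] "GET /page1.html HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2010:13:55:37 -0700] "GET /page2.html HTTP/1.1" 200 1843',
    '198.51.100.7 - - [10/Oct/2010:13:56:02 -0700] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.5 - - [10/Oct/2010:13:55:38 -0700] "GET /page3.html HTTP/1.1" 200 2011',
]

print(top_visitors(sample_log))  # [('203.0.113.5', 3), ('198.51.100.7', 1)]
```

An address that keeps showing up at the top, pulling page after page within seconds, is a good candidate for a closer look.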

If you are having these types of experiences, you're not alone. It happens to everyone on the Internet at one time or another. So, what can you do if the robots.txt file is not keeping all your visitors in line?

If you find a visiting robot is not following your rules, consuming your bandwidth, stealing your content, and running haphazardly on your website or blog, then it is time to utilize the htaccess file to prevent them from even entering your site at all. 













Copyright © 2005-2010 All Rights Reserved My Affiliate Place.