Protecting Your Website with a Robots.txt File
Do You Need A Robots.txt file
Do you need a robots.txt file? Is it important? Yes, you need a
robots.txt file and, yes, the file is important. The main purpose of a
robots.txt file is to give spiders/robots instructions on what they
can and cannot crawl. In fact, some robots will not venture onto your
site unless a robots.txt file is present in the main directory of your
website. So what can a robots.txt file do for you?
The robots.txt file can help you to:
1. exclude pages that are still under construction from being indexed
2. exclude directories or pages that you do not want indexed
3. exclude search engines that you may not want your website to
appear in (more on that later)
However, not all spiders are polite, and some are downright
rude. Those that mind their manners look first to the
robots.txt file on your server before entering.
For taking care of the rude spiders, you'll find the information
on the htaccess file helpful.
For the moment, let's discuss the robots.txt file.
Building a Robots.txt File
Building a robots.txt file is relatively simple. You use a few
commands either to tell a robot that it is welcome to visit
your site or to tell a robot/spider to please go away because
it is not welcome.
Below are some specific commands that you can use within
your robots.txt file to keep control of the different bots.
*If you want all robots to index your pages
you can use the following commands:
User-agent: *
Disallow:
*If you want to disallow a robot from your site you would use the following code:
User-agent: specificbadbot
Disallow: /
(The "/" is needed because it matches every path on your site. An empty
Disallow line, by contrast, blocks nothing.)
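As a rough check of the two rule sets above, Python's standard urllib.robotparser module can evaluate them without touching a live site. This is only a minimal sketch: the helper build_parser, the bot name "anybot", and the path /index.html are made up for illustration, the second rule set combines the two snippets above into one file, and real crawlers may interpret rules somewhat differently than this parser does.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Everyone is welcome: an empty Disallow blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# One named robot is turned away; every other robot is still welcome.
BLOCK_ONE_BOT = """\
User-agent: specificbadbot
Disallow: /

User-agent: *
Disallow:
"""

allow_all = build_parser(ALLOW_ALL)
block_one = build_parser(BLOCK_ONE_BOT)

print(allow_all.can_fetch("anybot", "/index.html"))          # True
print(block_one.can_fetch("specificbadbot", "/index.html"))  # False
print(block_one.can_fetch("anybot", "/index.html"))          # True
```

Keep in mind that this only shows what a polite robot would do with the rules. It does nothing to stop a robot that ignores them.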
*If you don't want an entire directory to be indexed, use the following code,
where "nogoindex" is the directory you don't want indexed.
User-agent: Googlebot
Disallow: /nogoindex/
The forward slash at the beginning anchors the path at the top of your site, and
the one at the end limits the rule to that directory and everything inside it.
*If you don't want Google to index a single page, use the following code.
("nogoindex" is the directory and "donotenterpage"
is the page you don't want Google to index.)
User-agent: Googlebot
Disallow: /nogoindex/donotenterpage.html
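Disallow rules match paths by their beginning, which is why the trailing slash matters. The sketch below uses Python's standard urllib.robotparser module to show this. The helper is_allowed and the extra paths (such as /nogoindex-archive/) are hypothetical and exist only to show what each rule does and does not cover.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


DIRECTORY_RULE = "User-agent: Googlebot\nDisallow: /nogoindex/\n"
PAGE_RULE = "User-agent: Googlebot\nDisallow: /nogoindex/donotenterpage.html\n"

# The directory rule covers everything under /nogoindex/ ...
print(is_allowed(DIRECTORY_RULE, "Googlebot", "/nogoindex/anything.html"))      # False
# ... but not a sibling directory whose name merely starts the same way.
print(is_allowed(DIRECTORY_RULE, "Googlebot", "/nogoindex-archive/page.html"))  # True
# The page rule blocks only that one page.
print(is_allowed(PAGE_RULE, "Googlebot", "/nogoindex/donotenterpage.html"))     # False
print(is_allowed(PAGE_RULE, "Googlebot", "/nogoindex/otherpage.html"))          # True
# Rules written for Googlebot say nothing to other robots.
print(is_allowed(DIRECTORY_RULE, "OtherBot", "/nogoindex/anything.html"))       # True
```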
After you have created your robots.txt file, be sure to validate it. What can
happen if it is invalid? If you happen to forget a line of code and Googlebot
hits your file, you may find in a few days that your site has been de-indexed.
To safeguard yourself against such an incident, there are several places where
you can have your robots.txt file validated:
Free Validation tools:
Clockwatchers
Search Engine Promotions
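If you would like a quick sanity check on your own machine before using an online validator, something like the following Python sketch can catch the most common slips, such as a misspelled command or a Disallow line with no User-agent line above it. It is not a full validator: the function name lint_robots_txt and the list of known fields are assumptions chosen for this example, and search engines may accept or reject things this check does not know about.

```python
# Field names this sketch recognizes; an assumption, not an official list.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text):
    """Return a list of (line_number, message) problems found in text."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            if not value:
                problems.append((number, "User-agent has no value"))
            seen_user_agent = True
        elif field in ("disallow", "allow"):
            if not seen_user_agent:
                problems.append((number, f"{field} appears before any User-agent line"))
            elif value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
    return problems


good = "User-agent: *\nDisallow:\n"
bad = "Disallow: /\nUser-agent *\nDisalow: /tmp/\n"

print(lint_robots_txt(good))  # []
for number, message in lint_robots_txt(bad):
    print(number, message)
```

The second sample deliberately contains three mistakes (a rule before any User-agent line, a missing colon, and a misspelled command), one per line.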
Note: There are also a lot of robots.txt generator tools if you
don't feel comfortable creating your own at the very beginning. Just
search Google for "robots.txt generator".
Google Rules with HTML
I've read other sites that say not to use meta tags to tell the search engines that you do not want
a page indexed. But when I checked Google Webmaster, they do say to use meta tags within the head section for some of the following:
To prevent robots from indexing your page, you would use:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
To allow robots to index the page on your site, but
instruct them not to follow the links on it:
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
To index the page on your site but not the images on that page:
<META NAME="ROBOTS" CONTENT="NOIMAGEINDEX">
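To confirm that a page really carries the tag you intended, you can read the tag back out of the HTML. The following is a minimal sketch using Python's standard html.parser module. The class name RobotsMetaParser and the helper robots_directives are made up for this example, and it only looks at meta tags named "robots", not at tags aimed at one specific search engine.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for item in attributes.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```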
Rules-And Search Engines
In a perfect world, all rules would be followed. But alas, not all
spiders follow the rules of the robots.txt file. How do you know that a
spider is not following your rules?
Check your stats. Do you find a recurring IP address hitting your
site? Or maybe you find a spider hitting your site and consistently
pulling information that it shouldn't be. It could
be scraping your content, or it could be consuming
bandwidth. Finally, if you do nothing to keep your
visiting robots in check, you may get an email
from your web hosting provider that says, "Did you
know that your content is being stolen?"
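One simple way to spot a recurring IP address is to count how often each one appears in your access log. The Python sketch below assumes each log line begins with the visitor's IP address, as in the common Apache log format. Your host's log layout may differ, and the sample lines and the function name top_client_ips are invented for illustration.

```python
from collections import Counter


def top_client_ips(log_lines, limit=3):
    """Count requests per client IP and return the busiest ones first."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1  # assumes the IP is the first field
    return counts.most_common(limit)


# Made-up sample lines; the addresses come from ranges reserved for documentation.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /a.html HTTP/1.1" 200 512',
    '198.51.100.4 - - [10/Oct/2023:13:55:37 +0000] "GET /b.html HTTP/1.1" 200 734',
    '203.0.113.7 - - [10/Oct/2023:13:55:38 +0000] "GET /c.html HTTP/1.1" 200 290',
    '203.0.113.7 - - [10/Oct/2023:13:55:39 +0000] "GET /d.html HTTP/1.1" 200 611',
]

for ip, hits in top_client_ips(SAMPLE_LOG):
    print(ip, hits)
```

An address that dominates the list and keeps requesting pages you have disallowed is a good candidate for a closer look.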
If you are having these types of experiences, you're not alone. It
happens to everyone on the Internet at one time or another. So, what can
you do if the robots.txt file is not keeping all your visitors in
line?
If you find a visiting robot is not following your rules, consuming
your bandwidth, stealing your content, and running haphazardly on
your website or blog, then it is time to use the htaccess file to keep it from entering your site at all.