Behind the Scenes with Bots Manners

By Vickie J. Scanlon


Do bots have manners? You may be saying to yourself, "I really don't care if bots have manners." But you should. Why? If they behave badly, you pay for it in lost bandwidth, lost traffic, and stolen content and email addresses. There are ways to keep search engine bots honest. How, you ask? The .htaccess file, the robots.txt file, and the robots meta tag can save your traffic, content, and email from being stolen or abused. The first two are simple text files, and the last is a line of code; each directs all bots, or specific bots, that enter your web pages to follow the rules you set.

You may be a little programming challenged, and that possibly means you are balking at the challenge ahead of you. But when you begin to see your website under attack or your traffic coming to a sudden standstill, you may change your mind. With that said, I will keep the information as simple as possible and walk through how to set up the files and put them online.


How do you know you have bad bots?

You can check your traffic stats page or a log file supplied by your server to see how many times each bot is hitting your site. For example purposes, let's say your traffic stats page has identified the search engines that are hitting your site. You notice that several of the bots are hitting your site many times, but you're not familiar with their names. What do you do?

I would take the names of the bots you are not familiar with, or have doubts about, and search for them in Google. In the search box put "BadBotName + bad bot". If it's a bad bot, you will quickly see webmasters screaming about the bot's transgressions. Sometimes the bot will identify itself with a URL. Check it out and see what they are doing. Sometimes the "bad bot" label is subjective, so you will have to decide whether you want to allow the bot or not.

Note: If a bot wants to stay out of the "bad bot" file, its owners should have a page that identifies the bot and its purpose for webmasters, and that explains how to keep the bot out if the webmaster decides this is not the type of search engine they want on their site.
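
If your host gives you a raw log file rather than a stats page, a few lines of Python can tally the hits per bot for you. This is only a sketch: it assumes the log is in the common "combined" format, where the user agent is the last quoted field on each line, and the sample lines and the function name count_user_agents are made up for illustration.

```python
import re
from collections import Counter

# Assumption: combined log format, where the user agent is the last quoted field.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(log_lines):
    """Return a Counter mapping each user-agent string to its number of hits."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines; a real log would be read from the file your host provides.
sample_log = [
    '203.0.113.5 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "gimmeemailbot/1.0"',
    '203.0.113.5 - - [01/Jan/2024:10:00:01 +0000] "GET /contact.html HTTP/1.1" 200 734 "-" "gimmeemailbot/1.0"',
    '198.51.100.7 - - [01/Jan/2024:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

hits = count_user_agents(sample_log)
# Print the busiest user agents first.
for agent, total in hits.most_common():
    print(total, agent)
```

The names at the top of the list that you do not recognize are the ones worth searching for.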


How to set up a Robots.txt file

The good bots will abide by what you put into the robots.txt file. The bad bots, however, will ignore your robots.txt instructions and take your bandwidth, content, and whatever else of quality they find on your website. I'll get to these rule-breaking bad bots later, but first, let's learn how to tame the bots that follow the rules.

There are two ways in which you can tell the bots to either not hit a certain page, or not to bother you at all.

Here is how you set the Robots.txt up to say, "Hey bot, don't bother me at all".

1. Open a Notepad session
2. Put the following info into the file to deny a specific bot from entering your site:

    User-agent: specificbadbot
    Disallow: /

3. For the user-agent, you will need to identify the specific bots, which you can get from your traffic stats page or from your log files.

Example: Let's say your traffic stats show that a bot called "gimmeemailbot" is hitting your site. You did a search and found that this fellow is a bad bot that harvests emails. Here is how you would enter the information to restrict "gimmeemailbot" from entering your website.

     User-agent: gimmeemailbot
     Disallow: /

The "/" means to disallow all. Now that you entered this piece of code into Notepad, save the file and call it robots.txt
4. You will now have to upload the robots.txt file to your main directory.
5. Verify the syntax of your robots.txt file. You can go to Google Search and put in Robots.txt Checker to get a free tool to check your file.
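
If you have Python on your computer, you can also check the rules yourself before uploading. The sketch below uses the robots.txt parser from Python's standard library to ask whether the "gimmeemailbot" rule does what you expect; "someotherbot" and "/index.html" are made-up names for illustration. Keep in mind this only shows how a rule-following bot would read the file.

```python
from urllib.robotparser import RobotFileParser

# The rules from the gimmeemailbot example, one line per list item.
rules = [
    "User-agent: gimmeemailbot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# A rule-following crawler named gimmeemailbot should be refused every page.
bad_bot_allowed = parser.can_fetch("gimmeemailbot", "/index.html")

# Crawlers the file does not mention are still allowed in.
other_bot_allowed = parser.can_fetch("someotherbot", "/index.html")

print("gimmeemailbot allowed:", bad_bot_allowed)   # False
print("someotherbot allowed:", other_bot_allowed)  # True
```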


All Bots Welcome

If you have a new site, and don't have anything at the moment that bots can or would want to steal, you can put in code that says all bots are welcome.

    User-agent: *
    Disallow:


But let me warn you: if you have images, are collecting emails, or have unique copy, you must be ever vigilant and look at your stats daily for bots that may be hitting your site continuously.


Other ways to tell bots not to enter a web page

Another way to tell a robot not to index a web page or follow its links is to put the following meta tag within the head section of that page:

<META name="robots" content="noindex, nofollow">

If the page has useful content that you want indexed, but its links do not have much relevancy and should not be followed, put the following code within the head section of your web page:

<META name="robots" content="nofollow">
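
To confirm that a page really carries the tag you intended, you can read it back with a short script. This is a sketch using Python's built-in HTML parser; the class name RobotsMetaFinder and the sample page are made up for illustration, and on a real site you would feed it the saved source of your own page.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> arrives as "meta".
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.append(part.strip().lower())

# Made-up sample page for illustration.
page = """<html><head>
<title>Example page</title>
<META name="robots" content="noindex, nofollow">
</head><body><p>Hello</p></body></html>"""

finder = RobotsMetaFinder()
finder.feed(page)
print(finder.directives)  # ['noindex', 'nofollow']
```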


Bad bots that do not follow any of the rules

For the bad bots that do not follow any of the rules, and that fly by the robots.txt as if it does not exist, you will need to use a .htaccess file. The .htaccess file will exclude the specific robot from entering your website.

This is how to set up the .htaccess file and how to put it into your main directory.

1) Open up a session of Notepad.
2) Put the following code in:

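#block bad bots

# Flag any request whose user agent starts with one of these names
SetEnvIfNoCase user-agent "^Zeus" bad_bot
SetEnvIfNoCase user-agent "^EmailSiphon" bad_bot
SetEnvIfNoCase user-agent "^EmailWolf" bad_bot

# Let everyone in except the requests flagged as bad_bot
# (Order/Allow/Deny is the Apache 2.2 syntax; Apache 2.4 needs mod_access_compat for it)
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>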

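The first line of text beginning with "#" is a comment stating the purpose of the .htaccess file.

The "^" marks the start of the user-agent string, so any bot whose user agent begins with the name that follows is included in the exclusion. SetEnvIfNoCase ignores upper and lower case when it compares the name.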

3) Save it as htaccess.txt
4) Upload the file to your main directory. Now, this is the tricky part. After FTPing it to your site you will need to change the name of the htaccess file to read: .htaccess

You need to get rid of the .txt file extension or the server will not recognize the .htaccess file. So within the server directory you will hit rename and change the htaccess.txt to .htaccess

5) You're done.

However, if you have a bad bot, don't be surprised if it changes its name to get back into your site. If it does, you will have to add the new name to your .htaccess file to keep it out.
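
You can try out the patterns on your own computer before trusting them on the server. The sketch below uses Python's regular expressions as a stand-in for the server's matching; for simple patterns like these the two should behave alike, but treat that as an assumption. The function name is_bad_bot and the sample user agents are made up for illustration.

```python
import re

# The same patterns used in the SetEnvIfNoCase lines of the .htaccess example.
BAD_BOT_PATTERNS = ["^Zeus", "^EmailSiphon", "^EmailWolf"]

def is_bad_bot(user_agent):
    """Return True if the user agent starts with one of the listed names."""
    return any(
        re.search(pattern, user_agent, re.IGNORECASE)
        for pattern in BAD_BOT_PATTERNS
    )

print(is_bad_bot("EmailSiphon"))                     # True
print(is_bad_bot("emailwolf 1.00"))                  # True, case is ignored
print(is_bad_bot("Mozilla/4.0 (compatible; Zeus)"))  # False, Zeus is not at the start
print(is_bad_bot("NewNameSiphon"))                   # False, a renamed bot slips past
```

The last line shows why you have to keep watching your stats: once a bot changes its name, the old pattern no longer catches it.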


Though it may seem like a lot of work, trust me, it will save you the anguish I felt when I found out the bots were stealing my traffic and my content. Take the time to learn ways to keep the bots honest and your data safe and making money for you.


About the Author:

Vickie J. Scanlon: visit her site at http://www.myaffiliateplace.biz for articles, ebooks, affiliate/internet how-to info, tech accessories, security software, accounting software, and computers for your home office or online business.