Protecting Your Website-Behind The Scenes





Do you need to protect your website? Yes. Whether you are new to the Internet or just bringing your website online, do not neglect protecting it.

Once you have been online for a while, robots will begin spidering your site. You say, "Gee, that's a good thing." It can be. But it can also be a bad thing if their intent is to steal your content, harvest your email addresses, or harvest your images. So how do you protect your online investment? Through the .htaccess file and the robots.txt file.

.htaccess

In most cases you will have to create your own .htaccess file, although some web hosting services are beginning to offer tools that help you create one.

The .htaccess file (hypertext access) lets you customize the server configuration for a directory and the directories below it. With it you can specify security restrictions, password protect areas of your site, deny or allow IP addresses, deny or allow search engines, customize your error responses (such as a 404 page), and redirect or rewrite your URLs.


How do you create an .htaccess file?

The .htaccess file is created as a plain text file (for example, htaccess.txt). After creating the file, FTP it to your server, then rename htaccess.txt to .htaccess.



Custom error pages

Creating error pages is not difficult once you know which error you want to handle. For example, a 404 error is the "page not found" error. To create a custom error page for 404 you will need to do two things:

First, you will need a custom file. For example, let's say you call the error document nofile.html. You would put the following in your .htaccess file:

ErrorDocument 404 /nofile.html
(the general form is: ErrorDocument errornumber /file.html)

The nofile.html file is your custom page for "page not found". Second, upload nofile.html to your server along with your .htaccess file.

To deny access to an obnoxious bot you can use the following method:

#block bad bots
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot



<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

(The ^ means "anything that begins with", so this rule matches any User-Agent that starts with CherryPicker. SetEnvIfNoCase ignores upper and lower case.)
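
To see which visitors a pattern like this would catch before you upload it, the matching can be imitated in a few lines of Python. This is only a rough sketch: the function name and the pattern list are made up for illustration, and Python's regular expressions stand in for the ones the server uses, which behave the same way for a simple ^ pattern.

```python
import re

# Hypothetical list mirroring the SetEnvIfNoCase pattern in the .htaccess rule.
BAD_BOT_PATTERNS = [r"^CherryPicker"]

def is_bad_bot(user_agent: str) -> bool:
    """Return True if the User-Agent matches any pattern, ignoring case."""
    return any(
        re.search(pattern, user_agent, re.IGNORECASE)
        for pattern in BAD_BOT_PATTERNS
    )

# The name at the start of the string is caught, in any letter case.
print(is_bad_bot("CherryPicker/1.0"))
print(is_bad_bot("cherrypicker se"))
# The name in the middle of the string is not caught, because of the ^.
print(is_bad_bot("Mozilla/5.0 CherryPicker"))
```

If a bot hides its name in the middle of the User-Agent string, dropping the ^ from the pattern would match it anywhere in the string.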


Other Helpful Redirects with htaccess

The best way to create a redirect is through the .htaccess file. If you are contemplating using a META redirect (one that occurs within the <HEAD> of your web page), please reconsider. Why? Many search engines have trouble with it, and spammers use it in bad ways. Since a robot does not know a good guy from a bad guy, you may find yourself inadvertently banned by the search engines. With that being said, this is how you can do a redirect.

Redirect of a single page:

Redirect 301 /oldpage.html http://www.example.com/newpage.html

Redirect of a whole site to a new domain:

Redirect 301 / http://www.example.com/
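
The two Redirect lines behave a little differently: the first sends one page to one new address, while the second matches every path on the site and carries the rest of the path over to the new address. The Python below is a simplified model of that behavior, not the server's actual code, and the function name is made up for illustration.

```python
from typing import Optional

def redirect_target(request_path: str, old_path: str, new_url: str) -> Optional[str]:
    """Simplified model of 'Redirect 301 old_path new_url'.

    Returns the address the visitor is sent to, or None if the rule
    does not apply to the requested path.
    """
    if request_path == old_path:
        return new_url
    # The rule also covers anything underneath old_path; the extra part
    # of the path is appended to the new address.
    prefix = old_path if old_path.endswith("/") else old_path + "/"
    if request_path.startswith(prefix):
        return new_url + request_path[len(old_path):]
    return None

# Single page: only /oldpage.html is redirected.
print(redirect_target("/oldpage.html", "/oldpage.html",
                      "http://www.example.com/newpage.html"))
print(redirect_target("/other.html", "/oldpage.html",
                      "http://www.example.com/newpage.html"))
# Whole site: every path is redirected and keeps its page name.
print(redirect_target("/articles/seo.html", "/", "http://www.example.com/"))
```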



Robots.txt file

The main purpose of a robots.txt file is to give spiders/robots instructions on what they can and cannot crawl. In fact, some robots will not venture to your site without a robots.txt file.

The robots.txt file can help you to:

1. exclude pages from being indexed that are still under construction
2. exclude directories or pages that you do not want indexed
3. exclude search engines that you may not want your website to appear in. (more on that later)



What is a robots.txt file?

The robots.txt file is a simple text file that sits in the root directory of your site. And yes, the robots.txt file is important. A robots.txt file can help the spiders index your site faster and more accurately.

Some Robots.txt commands

*If you want all robots to index your pages you can use the following commands. Keep the two lines together, because a blank line ends a group of rules.

User-agent: *
Disallow:


*If you want to disallow a robot from your site you would use the following code:

User-agent: specificbadbot
Disallow: /

(The "/" is needed because it means "all directories". A Disallow: line with nothing after it blocks nothing.)


*If you don't want a complete directory to be indexed you would put the following code. The "nogoindex" is the directory you don't want indexed. This example applies to Googlebot only.

User-agent: Googlebot
Disallow: /nogoindex/

The forward slash at the beginning starts the path at the root of your site, and the forward slash at the end tells the search engine not to include the directory or anything inside it.

*If you don't want Google to index a page you would put the following code. The "nogoindex" is the directory and the "donotenterpage" is the page you don't want Google to index.

User-agent: Googlebot
Disallow: /nogoindex/donotenterpage.html
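
You can try rules like these out before uploading them. The sketch below uses Python's built-in urllib.robotparser module to answer "may this robot fetch this page?" for a robots.txt that combines the examples above. The host name and the "SomeOtherBot" robot are placeholders chosen for illustration, and real robots may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt built from the examples in this article.
# Each group of rules is separated from the next by one blank line.
ROBOTS_TXT = """\
User-agent: specificbadbot
Disallow: /

User-agent: Googlebot
Disallow: /nogoindex/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent: str, path: str) -> bool:
    """Return True if the robot named `agent` may fetch `path`."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

# Print the verdict for a few robot and page combinations.
for agent, path in [
    ("specificbadbot", "/index.html"),
    ("Googlebot", "/nogoindex/donotenterpage.html"),
    ("Googlebot", "/index.html"),
    ("SomeOtherBot", "/nogoindex/donotenterpage.html"),
]:
    print(agent, path, allowed(agent, path))
```

Remember that this only tells you what a well-behaved robot would do. It does not stop a robot that ignores the file.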


Rules and Search Engines

In a perfect world, all rules would be followed. But alas, not all robots follow the rules of the robots.txt file. How do you know that a robot is not following your rules? Look at your stats to see how often robots are hitting your site and what they are pulling from it. Or worse yet, you get an email from your web hosting service saying, "Your content is being stolen."
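
If your host gives you the raw access log instead of a stats page, a short script can count how often each robot is hitting your site. This is a minimal sketch that assumes the common "combined" log format, where the User-Agent is the last quoted item on each line. The sample lines are invented for illustration, not taken from a real log.

```python
import re
from collections import Counter

# In the combined log format the User-Agent is the last quoted field.
USER_AGENT_AT_END = re.compile(r'"([^"]*)"\s*$')

def hits_per_agent(log_lines):
    """Count requests per User-Agent string."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2008:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "CherryPicker/1.0"',
    '203.0.113.5 - - [10/Oct/2008:13:55:37 -0700] "GET /contact.html HTTP/1.1" 200 1180 "-" "CherryPicker/1.0"',
    '198.51.100.7 - - [10/Oct/2008:14:02:10 -0700] "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1"',
]

counts = hits_per_agent(SAMPLE_LOG)
for agent, hits in counts.most_common():
    print(hits, agent)
```

A robot that appears far more often than the rest, or that never requests robots.txt before crawling, is worth a closer look.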

If you find a robot that is ignoring your robots.txt instructions, consuming your bandwidth, and stealing your content, then it is time to use the .htaccess file to keep it from entering your site at all.

After you have created your robots.txt file, be sure to validate it. What can happen if you don't? If you happen to forget a line of code and, say, Googlebot reads your file, you may find in a few days that your site has been de-indexed. You can have your robots.txt file validated at several places:

Free Validation tools:

Clockwatchers
Search Engine Promotions
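
Alongside tools like these, a rough check of your own can catch the most common slips, such as a misspelled command or a Disallow line that has lost its User-agent line. The Python below is only a sketch of that idea: the function name and the list of accepted fields are assumptions made for illustration, and it is not a substitute for a full validator.

```python
# Field names this sketch accepts; anything else is reported as a likely typo.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_robots(text):
    """Return a list of (line_number, problem) pairs for a robots.txt text."""
    problems = []
    in_record = False  # True once a User-agent line has opened the current group
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            in_record = False  # a blank line ends the current group of rules
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            in_record = True
        elif field in ("disallow", "allow") and not in_record:
            problems.append((number, f"'{field}' rule has no User-agent line above it"))
    return problems

# A stray blank line separates the rule from its User-agent line.
print(lint_robots("User-agent: *\n\nDisallow: /\n"))
# A misspelled command.
print(lint_robots("User-agent: *\nDissallow: /private/\n"))
```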

Note: There are also a lot of robots.txt generator tools if you don't feel comfortable creating your own file at the very beginning. Just search Google for "robots.txt generator".

 


Google Rules with HTML

I've read other sites that say not to use META tags to tell the search engines that you do not want a page indexed. But when I checked Google Webmaster, they do say to use META tags within the head section for some of the following:


To prevent robots from indexing your page or following its links you would use:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">


To allow robots to index the page on your site, but instruct them not to follow the links on it:


<META NAME="ROBOTS" CONTENT="NOFOLLOW">

To index the page on your site but not index the images on that page:


<META NAME="ROBOTS" CONTENT="NOIMAGEINDEX">
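
To double-check which instructions a finished page actually carries, you can read the tag back out of the HTML. The sketch below uses Python's built-in html.parser module; the class and function names are made up for illustration, and it only looks at the NAME="ROBOTS" tag, not at tags aimed at one particular search engine.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <META NAME="ROBOTS"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not the values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head><body></body></html>'
print(sorted(robots_directives(page)))
```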






© Copyright 2005-2008 MyAffiliatePlace.com  All Rights Reserved     My Affiliate Place - Website Protection