Magento robots.txt SEO

It is good when search engines crawl your site frequently and index your content. To look at the scenario more closely, consider a small example. When a page has two versions, one for viewing in the browser and one for printing, the print version can be excluded from crawling to avoid the risk of a duplicate content penalty. Another instance is when sensitive data is published on the site that is not intended for the world to see; here it is preferable that search engines do not index those pages. The robots meta tag is one way to tell search engines which files and folders to avoid, but keep in mind that meta tags often go unnoticed, and in that case robots.txt can be used.
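As a quick illustration of that first approach, a robots meta tag like the one below could be placed in the head section of a print-only page to keep it out of the index (this is only a sketch of the idea, not a Magento default):

<!-- keep the print version out of the index, but let bots follow its links -->
<meta name="robots" content="NOINDEX,FOLLOW" />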
What is a robots.txt file?
Robots.txt is a significant aspect of Magento SEO, so let's dig into it a little more. A robots.txt file is nothing perplexing: it is a plain text file that sits on your site and tells search robots which pages to exclude when they visit. It is essential to highlight that this file does not physically prevent robots from accessing your site, but if there is sensitive data present, relying on robots.txt can be a sensible way to keep it from being displayed in search results. Here is a basic robots.txt file that you can edit to your specifications:
# Crawlers Setup
User-agent: *

# Allowable Index
Allow: /*?p=
Allow: /media/

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /errors/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=
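A note on the pattern syntax used above: the asterisk (*) matches any sequence of characters, and the dollar sign ($) anchors the pattern to the end of the URL. These wildcards are honored by major crawlers such as Googlebot, though they are not part of the original robots.txt standard. The example paths below are purely illustrative:

# $ anchors the end of the URL: this blocks /js/app.js but not /js/app.js?v=2
Disallow: /*.js$
# * matches anything: this blocks ?p= pages that also carry extra query parameters
Disallow: /*?p=*&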
How to generate a robots.txt file?
Generally, your Magento Go store already has its search engine robots setting set to "INDEX, FOLLOW". By default this setting directs all search engine bots to index your pages and follow the links, which is in your interest, since you want your content indexed by search engines, especially the product pages.
You can modify these instructions from the Admin Panel of your Magento Go store by going to System -> Configuration -> Design -> Search Engine Robots. If you want to change the Default Robots option, the available settings are as follows:
NOINDEX, FOLLOW: The pages will not be indexed, but search engine bots are allowed to follow links from the applicable pages.*
INDEX, NOFOLLOW: The pages are indexed, but search engine bots do not follow the links.
NOINDEX, NOFOLLOW: The pages are not indexed, and search engine bots do not follow links.
Note (*): Applicable pages are the ones excluded by using the "Disallow:" directive in robots.txt.
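As a rough illustration of what this setting controls, the chosen value ends up rendered as a robots meta tag in the head of each page, along the lines of:

<meta name="robots" content="INDEX,FOLLOW" />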
For your convenience, we are including a widely available robots.txt file for use with your Magento store. It is beneficial as it improves your SEO and reduces bandwidth usage and server load as well.
1. # $Id: robots.txt,v magento-specific 2010/28/01 18:24:19 goba Exp $
2. #
3. # robots.txt
4. #
5. # This file is to prevent the crawling and indexing of certain parts
6. # of your site by web crawlers and spiders run by sites like Yahoo!
7. # and Google. By telling these "robots" where not to go on your site,
8. # you save bandwidth and server resources.
9. #
10. # This file will be ignored unless it is at the root of your host:
11. # Used:  http://example.com/robots.txt
12. # Ignored: http://example.com/site/robots.txt
13. #
14. # For more information about the robots.txt standard, see:
15. # http://www.robotstxt.org/wc/robots.html
16. #
17. # For syntax checking, see:
18. # http://www.sxw.org.uk/computing/robots/check.html
19. # Website Sitemap
20. # Sitemap: http://www.mywebsite.com/sitemap.xml
21.
22. # Crawlers Setup
23. User-agent: *
24. Crawl-delay: 30
25. # Allowable Index
26. Allow: /*?p=
27. Allow: /index.php/blog/
28. Allow: /catalog/seo_sitemap/category/
29. Allow: /catalogsearch/result/
30. Allow: /media/
31. # Directories
32. Disallow: /404/
33. Disallow: /app/
34. Disallow: /cgi-bin/
35. Disallow: /downloader/
36. Disallow: /errors/
37. Disallow: /includes/
38. Disallow: /js/
39. Disallow: /lib/
40. Disallow: /magento/
41. Disallow: /pkginfo/
42. Disallow: /report/
43. Disallow: /scripts/
44. Disallow: /shell/
45. Disallow: /skin/
46. Disallow: /stats/
47. Disallow: /var/
48. # Paths (clean URLs)
49. Disallow: /index.php/
50. Disallow: /catalog/product_compare/
51. Disallow: /catalog/category/view/
52. Disallow: /catalog/product/view/
53. Disallow: /catalogsearch/
54. Disallow: /checkout/
55. Disallow: /control/
56. Disallow: /contacts/
57. Disallow: /customer/
58. Disallow: /customize/
59. Disallow: /newsletter/
60. Disallow: /poll/
61. Disallow: /review/
62. Disallow: /sendfriend/
63. Disallow: /tag/
64. Disallow: /wishlist/
65. # Files
66. Disallow: /cron.php
67. Disallow: /cron.sh
68. Disallow: /error_log
69. Disallow: /install.php
70. Disallow: /LICENSE.html
71. Disallow: /LICENSE.txt
72. Disallow: /LICENSE_AFL.txt
73. Disallow: /STATUS.txt
74. # Paths (no clean URLs)
75. Disallow: /*.js$
76. Disallow: /*.css$
77. Disallow: /*.php$
78. Disallow: /*?p=*&
79. Disallow: /*?SID=
80. Disallow: /*?limit=all
81.
82. # Uncomment if you do not wish for Google to index your images
83. #User-agent: Googlebot-Image
84. #Disallow: /

If you want to install the robots.txt file on your domain, follow these simple steps:
Step 1: Download the robots.txt file.
Step 2: In case your Magento is installed inside a sub-directory, you will need to modify your robots.txt file accordingly. For example, you will need to change 'Disallow: /404/' to 'Disallow: /your-sub-directory/404/' and 'Disallow: /app/' to 'Disallow: /your-sub-directory/app/', and so forth.
Step 3: If you have a sitemap for your domain, you can uncomment line number 20 of the robots.txt file and include the URL to your sitemap.xml.
Step 4: Now upload your robots.txt file to your web-root folder. This can easily be done by placing the robots.txt file within your 'httpdocs/' directory, either by logging into your Plesk Hosting Control Panel with your credentials or through your FTP client of choice.
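Once uploaded, it is worth confirming that the file is actually served from the root of your host, since it is ignored anywhere else. A quick check from the command line (using a placeholder domain) might look like this; you should get an HTTP 200 response back:

curl -I http://www.mywebsite.com/robots.txt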
How to use a robots.txt file?
The purpose of this file is to let the search bots know which files to index and which not to. Mostly it is used to specify the files that should not be indexed by search engines. If you want the search bots to crawl and index your entire site, you can do so by adding these lines to your robots.txt file:
User-agent: *
Disallow:
And, if you want to disallow indexing of your site, add these lines:
User-agent: *
Disallow: /
Benefits of using robots.txt
The main benefit of using robots.txt is its ability to keep your sensitive information private. Say your e-commerce site stores clients' personal information and that information can be reached from the website; in this case, robots.txt tells the search engine spiders to stay away from those pages.
Robots.txt helps avoid canonicalization issues
Robots.txt files help in avoiding canonicalization issues, or the problem of having multiple canonical URLs, better known as duplicate content. This problem arises when the same information appears on more than one page. For example, one page might have the details of a product and another page might have a print-friendly version of the same details, e.g. site.com/product-a.html and site.com/product-a-print.html. The search engine then has to figure out which one is the canonical version.
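Sticking with that example, and assuming the print versions all share a "-print" suffix (an illustrative naming convention, not a Magento default), the duplicates could be kept out of the index with a pattern like:

# keep print-friendly duplicates out of the index
Disallow: /*-print.html$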
