WordPress is truly a great piece of software. It dramatically changed the way people create, manage, and use blogs and websites. But when it comes to duplicate content, boy, did they mess it up.
It just scares me how WordPress handles the categorization of content. Without a few tweaks to deal with duplicate content, a fresh WordPress install is in trouble. Normally the number of pages indexed by Google should equal the number of written posts and pages … well, things are not that simple with WordPress. WordPress creates separate pages for pagination, search results, trackbacks, authors, dates, categories, and archives, each of which can contain either just the excerpt of a post or the entire post.
The way content is arranged and structured in WordPress is useful for readers, no doubt, but it becomes a problem when search engines index three times more pages than they should. On average, a fresh WordPress installation with 10 published posts can end up with close to 30 URLs in Google's index, all pointing back to just 10 unique articles.
You might say more pages in Google's index means more traffic, but you are more likely to get a duplicate-content penalty than a boost. There are obvious advantages to cleaning up this mess: better crawling, better PageRank distribution, and content that stays unique in the index.
The problem can be solved pretty simply, just by adding a “noindex, follow” robots meta tag to the unwanted pages. These pages can still be crawled by spiders; however, the search engines will no longer include them in the index.
Using the built-in WordPress conditional functions, you can add the code below to your theme's header.php file, just before the </head> tag.
<?php
// Add noindex,follow to pagination, author, trackback, search, and date pages.
if ( $paged > 1 || is_author() || is_trackback() || is_search() || is_date() ) {
	echo '<meta name="robots" content="noindex,follow" />';
}
?>
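If you'd rather not edit header.php directly, the same check can be hooked into wp_head from your theme's functions.php. Here is a minimal sketch; the function name myprefix_noindex_meta is just a placeholder:

<?php
// A sketch for functions.php: print the robots meta tag early in wp_head.
// The function name is a placeholder; rename it to fit your theme.
function myprefix_noindex_meta() {
	// is_paged() is true on page 2 and beyond of any post listing.
	if ( is_paged() || is_author() || is_trackback() || is_search() || is_date() ) {
		echo '<meta name="robots" content="noindex,follow" />' . "\n";
	}
}
add_action( 'wp_head', 'myprefix_noindex_meta', 1 );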
On my personal blog, for example, I allow Googlebot and the other search engine spiders to crawl and index my categories, because they contain only the excerpts of my blog posts, so the content is not really the same as the content on the single-post pages.
If you also want to exclude categories from the Google index, you can use the following code.
<?php if ( is_category() ) {
	echo '<meta name="robots" content="noindex,follow" />';
} ?>
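Tag archives raise the same duplicate-content question as categories, so if you use tags heavily, the same treatment applies. A sketch using WordPress's is_tag() conditional:

<?php if ( is_tag() ) {
	echo '<meta name="robots" content="noindex,follow" />';
} ?>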
Google likes pages with large amounts of content, so pages like categories or archives will most likely receive more credit than single-post pages, for example.
After adding the code above you might notice a small decrease in traffic, but it should be temporary, lasting only until the link juice gets redistributed among the pages that remain in the index.
You can check whether everything went well by doing a site:example.com search query and seeing whether the pages you wanted removed from the Google index are still there. Depending on the crawl rate of your website, it can take anywhere from a couple of hours to a few days for the changes to appear in the results page.
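Before waiting for Google to recrawl, you can also confirm locally that the tag is actually being output. A minimal sketch, assuming PHP's allow_url_fopen is enabled and using a hypothetical author archive URL:

<?php
// Quick local check: fetch a page and report whether a robots meta tag
// is present in the HTML. The URL below is a hypothetical example.
$url  = 'https://example.com/author/admin/';
$html = file_get_contents( $url );
if ( $html !== false && stripos( $html, 'name="robots"' ) !== false ) {
	echo "robots meta tag found\n";
} else {
	echo "no robots meta tag found\n";
}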
We hope you find the tips above useful. We would love to hear your feedback regarding this article, so please comment below! For more useful articles like this, please don't forget to subscribe to the RSS feed and follow Inspirationfeed on Twitter + Facebook! If you enjoyed this article, please help us spread the word!