How Regular Expressions Affect SEO
Regular expressions are one of the things the average SEO specialist overlooks, yet they are an important part of technical SEO. In this entry, I’ll cover the basics of regular expressions and one practical use for them.
Case Study
As an SEO specialist, you will mostly use regular expressions (more commonly known as RegEx) to match patterns of words, numbers and symbols in URLs. Personally, I use RegEx to track and tag URLs, or to make smart redirects happen from .htaccess. For this entry, I’ll focus on smart redirects to keep your brain juices flowing.
Here’s a scenario:
I have a client whose website was revamped by some other ‘SEO web development company’. That’s fine. However, when I checked the URL structure, it had a very fundamental mistake. Instead of writing:
http://example.com/consumer-retail-products
it said:
http://example.com/consumer_and_retail_products
And the extremely sad thing about it is that there are subfolders underneath it, all the way down to level 4, which makes the URLs look like this:
http://example.com/consumer_and_retail_products/home/samsung-television
I was extremely ticked off that this web development company even boasted that they’re also SEO experts. I wanted to change all the URLs to look like this:
http://example.com/consumer-retail-products/home/samsung-television
I wanted this because it looks a lot cleaner and it removes the stop words (to learn more about stop words, download our eBook).
Of course, I don’t want to just DELETE all the pages that have URLs like http://example.com/consumer_and_retail_products/home/samsung-television and pray that Google picks up on what I did and fixes my PageRank flow and rankings.
I want to make sure that the existing link juice is effectively passed over to the new URLs. Hence, I need to put in 301 (permanent) redirects from all the old, ugly, unoptimized URLs to the new URLs.
One option is to go brute force and write a separate redirect for EVERY page with the ugly URL. But that would take days and days of time and effort, especially on a big site with lots of pages. So I looked around for RegEx solutions I could apply at the .htaccess level.
After hours and hours of study and searching, I finally found the solution. And it looked like this:
# 301 Redirect for URL Optimization
RewriteEngine On
RewriteRule ^consumer_and_retail_products/(.*?)$ http://example.com/consumer-retail-products/$1 [R=301,L]
Now, don’t go on closing this window yet! I’m going to explain what this jargon means.
# 301 Redirect for URL Optimization is just a comment I placed in there to remind me what the lines below it do.
RewriteEngine On is the .htaccess directive that switches on mod_rewrite’s rewriting engine. Without it, the RewriteRule lines that follow are ignored. This is what gives you the power to rewrite and redirect URLs.
RewriteRule defines a single rule. It takes a pattern that the requested URL path is tested against, the substitution to use when the pattern matches, and optional flags in square brackets.
^ – Anchors the match to the start of the string.
$ – Anchors the match to the end of the string.
( ) – A capture group. It saves whatever the pattern inside it matches so it can be used later.
. – Any single character. This can be A-Z, 0-9 or a symbol (anything except a line break).
* – The item before it can be repeated any number of times, or not appear at all (thus, empty).
? – After an ordinary character, it makes that character optional. After * it does something different: it makes the repetition “lazy”, so it matches as few characters as it can.
(.*?) – This combination is just like saying: “Capture any characters in any quantity, possibly none, and save them for later use.” Because the pattern ends with $, the capture still has to run to the end of the URL, so here (.*?) captures the same text as (.*) would.
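To see these symbols in action, here is a small sketch using Python’s re module as a stand-in for the regex engine behind mod_rewrite. The sample strings and variable names are made up for illustration, and the two engines are not identical, although they agree on these basic tokens.

```python
import re

# "." stands for exactly one character (anything but a line break).
dot_matches = bool(re.fullmatch(r"a.c", "a-c"))    # True
dot_needs_one = bool(re.fullmatch(r"a.c", "ac"))   # False: nothing left for "." to match

# "*" repeats the item before it zero or more times.
star_zero = bool(re.fullmatch(r"ab*", "a"))        # True: zero "b" characters is fine
star_many = bool(re.fullmatch(r"ab*", "abbb"))     # True

# "?" after an ordinary character makes that character optional.
optional_u = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour")]

# "?" after "*" makes the repetition lazy (match as little as possible).
sample = "home/samsung-television/"
greedy = re.match(r"(.*)/", sample).group(1)   # runs up to the LAST slash
lazy = re.match(r"(.*?)/", sample).group(1)    # stops at the FIRST slash

# Between ^ and $ the lazy and greedy forms capture the same text,
# because the match is forced to reach the end of the string.
anchored_greedy = re.match(r"^slug/(.*)$", "slug/home/tv").group(1)
anchored_lazy = re.match(r"^slug/(.*?)$", "slug/home/tv").group(1)

print(greedy)                            # home/samsung-television
print(lazy)                              # home
print(anchored_greedy == anchored_lazy)  # True
```

The last comparison is why the lazy ? in (.*?) does no harm in a rule that ends with $, even though it is not strictly needed there.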
Because I put consumer_and_retail_products/(.*?) between ^ and $, the whole requested path has to look exactly like that, from its first character to its last. (In an .htaccess file in the site root, the path is tested without its leading slash, which is why the pattern doesn’t start with one.) In effect, the RegEx phrase ^consumer_and_retail_products/(.*?)$ is saying:
“Every URL whose path starts with consumer_and_retail_products/ will have everything after that slash captured, whatever the subdirectory level. URLs such as http://example.com/consumer_and_retail_products/home/ or http://example.com/consumer_and_retail_products/home/samsung-television will match and be replaced by the next statement.”
This way, any page with anything after the URL slug consumer_and_retail_products/ has that trailing part captured for use in the next statement.
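If you want to check which paths the pattern catches before touching a live .htaccess file, a quick test like the one below can help. It is only a sketch: Python’s re module stands in for mod_rewrite, captured_tail is a helper name I made up, and the paths are written without the leading slash on the assumption that the rule lives in the .htaccess file of the site root.

```python
import re

# The same pattern as in the RewriteRule.
OLD_SLUG_PATTERN = re.compile(r"^consumer_and_retail_products/(.*?)$")

def captured_tail(path):
    """Return what ( ) captures for this path, or None if the pattern does not match."""
    match = OLD_SLUG_PATTERN.match(path)
    return match.group(1) if match else None

# Matches: everything after the slug is captured, at any depth.
print(captured_tail("consumer_and_retail_products/home/samsung-television"))
# home/samsung-television

# Matches with an empty capture: nothing follows the slash.
print(repr(captured_tail("consumer_and_retail_products/")))  # ''

# No match: the slash after the slug is part of the pattern.
print(captured_tail("consumer_and_retail_products"))  # None

# No match: ^ means the path must START with the old slug.
print(captured_tail("blog/consumer_and_retail_products/home"))  # None

# No match: the already-optimized slug is left alone.
print(captured_tail("consumer-retail-products/home"))  # None
```

Note the third case: a request for the old slug with no trailing slash is not caught by this rule, so it would need its own redirect if that URL is in use.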
Now let me explain this: http://example.com/consumer-retail-products/$1 [R=301,L]
$1 – The text captured by the first ( ) in the pattern is placed exactly here.
[R=301,L] – R=301 makes this a 301 (permanent) redirect. L marks it as the Last rule: once it matches, mod_rewrite stops and does not process the rules below it for this request.
Thus, http://example.com/consumer-retail-products/$1 is the URL that any matching request gets sent to. In effect I’m saying:
“Replace the matched URL with http://example.com/consumer-retail-products/(captured string) and make it a 301 redirect”
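Here is a rough simulation of the whole rule applied to a handful of old paths, to show how one line covers every page under the old slug. It is a sketch only: apply_rule and the list of paths are hypothetical, Python’s re module stands in for mod_rewrite, and the real server does more (query strings, other rules) than this shows.

```python
import re

# Pattern and substitution copied from the RewriteRule.
RULE_PATTERN = re.compile(r"^consumer_and_retail_products/(.*?)$")
SUBSTITUTION = "http://example.com/consumer-retail-products/$1"

def apply_rule(path):
    """Return (status, target) if the rule matches the path, else None."""
    match = RULE_PATTERN.match(path)
    if match is None:
        return None  # the rule does not apply; the request is handled as usual
    # Put the captured text where $1 sits in the substitution.
    target = SUBSTITUTION.replace("$1", match.group(1))
    return 301, target

# Hypothetical old paths, written as .htaccess sees them (no leading slash).
old_paths = [
    "consumer_and_retail_products/home/samsung-television",
    "consumer_and_retail_products/home/",
    "about-us",
]

for path in old_paths:
    print(path, "->", apply_rule(path))
# consumer_and_retail_products/home/samsung-television -> (301, 'http://example.com/consumer-retail-products/home/samsung-television')
# consumer_and_retail_products/home/ -> (301, 'http://example.com/consumer-retail-products/home/')
# about-us -> None
```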
This completes my smart redirect statement. The client was extremely happy with the change and we got to keep our rankings without breaking a sweat.
This is just one very interesting use of Regular Expression for SEO. It saved me days and days and days of clerical work trying to change the URL. Hope you got something out of this.
If you did, leave me some love in the comments section, would you?
P.S: To learn more about the basics of RegEx, I found RegExOne to be especially useful.