Visualizing your site structure in advance of a major change

In our last article, we looked at some interesting ways to visualize your website structure to illuminate how external links and PageRank flow through it. This time, we’re going to use the same tools, but we’re going to look instead at how a major site structure change might impact your site.

Search engine crawlers can determine which pages on your site are the most important, based, in part, on how your internal links are structured and organized. Pages that have a lot of internal links pointing to them — including links from the site’s navigation — are generally considered to be your most important pages. Though these are not always your highest-ranking pages, high internal PageRank often correlates with better search engine visibility.

Note: I use the phrase “internal PageRank,” coined by Paul Shapiro, to refer to the relative importance of each page within a single website based on that site’s internal linking structure. This term may be used interchangeably with “page weight.”

The technique I’ll outline below can be used to consider how internal PageRank will be impacted by the addition of new sections, major changes to global site navigation (as we’ll see below) and most major changes to site structure or internal linking.

Understanding how any major change to a site could potentially impact its search visibility is paramount to determining the risk vs. reward of its implementation. This is one of the techniques I’ve found most helpful in such situations, as it provides numbers we can reference to understand if (and how) page weight will be impacted by a structural adjustment.

In the example below, we’re going to assume you have access to a staging server, and that on that server you will host a copy of your site with the considered adjustments. In the absence of such a server, you can edit the spreadsheets manually to reflect the changes being considered. (However, to save time, it’s probably worth setting up a secondary hosting account for the tests and development.)

It’s worth noting that on the staging server, one need only mimic the structure and not the final design or content. Example: For a site that I’m working on, I considered removing a block of links in a drop-down from the global site navigation and replacing that block of links with a single text link. That link would go to a page containing the links that were previously in the drop-down menu.

When I implemented this site structure change on the staging server, I didn’t worry about whether any of this looked good — I simply created a new page with a big list of text links, removed all the links from the navigation drop-down, and replaced the drop-down with a single link to the new page.

I would never put this live, obviously — but my changes on the staging server mimic the site structure change being considered, giving me insight into what will happen to the internal PageRank distribution (as we’ll see below). I’ll leave it to the designers to make it look good.

For this process, we’re going to need three tools:

- Screaming Frog — The free version will do if your site is under 500 pages or you just want a rough idea of what the changes will mean.
- Gephi — A free, powerful data visualization tool.
- Google Analytics

So, let’s dive in…

Collecting your data

I don’t want to be redundant, so I’ll spare you re-reading about how to crawl and export your site data using Screaming Frog. If you missed the last piece, which explains this process in detail, you can find it here.

Once the crawl is complete and you have your site data, you need simply export the relevant data as follows:

Bulk Export > Response Codes > Success (2xx) Inlinks

You will do this for both your live site and your staging site (the one with the adjusted structure). Once you have downloaded both structures, you’ll need to format them for Gephi. All that Gephi needs to create a visualization is an understanding of your site pages (“nodes”) and the links between them (“edges”).

Note: Before we ready the data, I recommend doing a Find & Replace in the staging CSV file and replacing your staging server domain/IP with that of your actual site. This will make it easier to use and understand in future steps.

As Gephi doesn’t need a lot of the data from the Screaming Frog export, we’ll want to strip out what’s not necessary from these CSV files by doing the following:

1. Delete the first row containing “Success (2xx) Inlinks.”
2. Rename the “Destination” column “Target.”
3. Delete all other columns besides “Source” and “Target.” (Note: Before deleting it, you may want to do a quick Sort by the Type column and remove anything that isn’t labeled as “AHREF” — CSS, JS, IMG and so on — to avoid contaminating your visualization.)
4. Save the edited file. You can name it whatever you’d like. I tend to use domain-live.csv and domain-staging.csv.
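If you'd rather not repeat this cleanup by hand for every crawl, the same steps can be scripted. A minimal Python sketch, assuming the export is a standard CSV with Source, Destination and Type columns as described above (file names are placeholders):

```python
import csv

def clean_inlinks(in_path, out_path):
    """Turn a Screaming Frog 'Success (2xx) Inlinks' export into a
    two-column Gephi edges table (Source, Target)."""
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    # Row 0 is the "Success (2xx) Inlinks" title row; row 1 is the header.
    header = rows[1]
    src = header.index("Source")
    dst = header.index("Destination")
    typ = header.index("Type")
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target"])  # Gephi wants "Target", not "Destination"
        for row in rows[2:]:
            # Keep only true hyperlinks; drop CSS, JS, IMG and other references.
            if row and row[typ] == "AHREF":
                writer.writerow([row[src], row[dst]])
```

Run it once against the live export and once against the staging export to produce both edges files.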

The third set of data we’ll want to have is an Export of our organic landing pages from Google Analytics. You can use different metrics, but I’ve found it extremely helpful to have a visual of which pages are most responsible for my organic traffic when considering the impact of a structural change on page weight. Essentially, if you find that a page responsible for a good deal of your traffic will suffer a reduction in internal PageRank, you will want to know this and adjust accordingly.

To get this information into the graph, simply log into Google Analytics, and in the left-hand navigation under “Behavior,” go to “Site Content” and select “Landing Pages.” In your segments at the top of the page, remove “All Users” and replace it with “Organic Traffic.” This will restrict your landing page data to only your organic visitors.

Expand the data to include as many rows as you’d like (up to 5,000) and then Export your data to a CSV, which will give you something like:

Remove the first six rows so your heading row begins with the “Landing Page” label. Then, scroll to the bottom and remove the accumulated totals (the last row below the pages), as well as the “Day Index” and “Sessions” data.

Note that you’ll need the Landing Page URLs in this spreadsheet to be in the same format as the Source URLs in your Screaming Frog CSV files. In the example shown above, the URLs in the Landing Page column are missing the protocol (https) and subdomain (www), so I would need to use a Find & Replace to add this information.
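With thousands of rows, this Find & Replace can also be scripted. A small sketch, assuming the GA export lists paths like /about/ in its first column (the prefix argument is whatever your live Source URLs actually use):

```python
import csv

def normalize_landing_pages(in_path, out_path, prefix="https://www.yoursite.com"):
    """Prefix GA landing-page paths (e.g., /about/) with the protocol and
    host so they match the Source URLs in the Screaming Frog export."""
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for i, row in enumerate(rows):
            if i > 0 and row and row[0].startswith("/"):
                row[0] = prefix + row[0]  # /about/ -> https://www.yoursite.com/about/
            writer.writerow(row)
```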

Now we’re ready to go.

Getting a visualization of your current site

The first step is getting your current site page map uploaded — that is, letting Gephi know which pages you have and what they link to.

To begin, open Gephi and go to File > Import Spreadsheet.  You’ll select the live site Screaming Frog export (in my case, yoursite-live.csv) and make sure the “As table:” drop-down is set to “Edges table.”

On the next screen, make sure you’ve checked “Create missing nodes,” which will tell Gephi to create nodes (read: pages) for the “Edges table” (read: link map) that you’ve entered. And now you’ve got your graph. Isn’t it helpful?

OK, not really — but it will be. The next step is to get that Google Analytics data in there. So let’s head over to the Data Laboratory (among the top buttons) and do that.

First, we need to export our page data. When you’re in the Data Laboratory, make sure you’re looking at the Nodes data and Export it.

When you open the CSV, it should have the following columns:

- Id (which contains your page URLs)
- Label
- Timeset

You’ll add a fourth column with the data you want to pull in from Google Analytics, which in our case will be “Sessions.” You’ll need to temporarily add a second sheet to the CSV and name it “analytics,” where you’ll copy the data from your analytics export earlier (essentially just moving it into this Workbook).

Now, what we want to do is fill the Sessions column with the actual session data from analytics. To do this, we need a formula that will look through the node Ids in sheet one and look for the corresponding landing page URL in sheet two; when it finds it, it should insert the organic traffic sessions for that page into the Sessions column where appropriate.

Probably my most-used Excel formula does the trick here. In the top cell of the “Sessions” column you created, enter a VLOOKUP against the “analytics” sheet, something like =IFERROR(VLOOKUP(A2,analytics!A$2:B$5001,2,FALSE),0), where the row numbers will change based on the number of rows of data you have in your analytics export.


Once completed, you’ll want to copy the Sessions column and use the “Paste Values” command, which will switch the cells from containing a formula to containing a value.
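If you'd rather script this join than maintain a spreadsheet formula, the same lookup is a few lines of Python. The column names below match the exports described above; adjust them if yours differ:

```python
import csv

def add_sessions(nodes_path, analytics_path, out_path):
    """Fill a Sessions column in the Gephi nodes export using the GA
    organic landing-page export, matching rows on URL."""
    with open(analytics_path, newline="", encoding="utf-8") as f:
        sessions = {row["Landing Page"]: row["Sessions"]
                    for row in csv.DictReader(f)}
    with open(nodes_path, newline="", encoding="utf-8") as f:
        nodes = list(csv.DictReader(f))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Id", "Label", "Timeset", "Sessions"])
        writer.writeheader()
        for node in nodes:
            # Pages with no organic traffic get 0.
            node["Sessions"] = sessions.get(node["Id"], 0)
            writer.writerow({k: node.get(k, "") for k in writer.fieldnames})
```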

All that’s left now is to re-import the new sheet back into Gephi. Save the spreadsheet as something like data-laboratory-export.csv (or just nodes.csv if you prefer). Using the Import feature from within the Data Laboratory, you can re-import the file, which now includes the session data.

Now, let’s switch from the Data Laboratory tab back to the Overview tab. At the moment, the graph looks virtually identical to what we had previously — but that’s about to change. First, let’s apply some internal PageRank. Fortunately, a PageRank feature is built right into Gephi, based on the calculations in the initial Google patents. It’s not perfect, but it’s pretty good for giving you an idea of what your internal page weight flow is doing.

To accomplish this, simply click the “Run” button beside “PageRank” in the right-hand panel. You can leave all the defaults as they are.

The next thing you’ll want to do is color-code the nodes (which represent your site pages) based on the number of sessions and size them based on their PageRank. To do this, simply select the color palette for the nodes under the “Appearance” pane to the upper left. Select sessions from the drop-down and choose a palette you like. Once you’ve chosen your settings, click “Apply.”

Next, we’ll do the same for PageRank, except we’ll be adjusting size rather than color. Select the sizing tool, choose PageRank from the drop-down, and select the maximum and minimum sizes (this will be a relative sizing based on page weight). I generally start with 10 and 30, respectively, but you might want to play around with them. Once you’ve chosen your desired settings, click “Apply.”

The final step of the visualization is to select a layout in the bottom left panel. I like “Force Atlas” for this purpose, but feel free to try them all out. This gives us a picture that looks something like the following:

You can easily reference which pages have no organic traffic and which have the most based on their color — and by right-clicking them, you can view them directly in the Data Laboratory to get their internal PageRank. (In this instance, we can see that one of the highest-traffic pages is a product page with a PageRank of 0.016629.) We can also see that our most-trafficked pages tend to be clustered towards the center, meaning they’re heavily linked within the site.

Now, let’s see what happens with the new structure. You’ll want to go through the same steps above, but with the Screaming Frog export from the staging server (in my case, domain-staging.csv). I’m not going to make you read through all the same steps again, but here’s what the final result looks like:

We can see that there are a lot more outliers in this version (pages that have generally been significantly reduced in their internal links). We can investigate which pages those are by right-clicking them and viewing them in the Data Laboratory, which will help us locate possible unexpected problems.

We also have the opportunity to see what happened to that high-traffic product page mentioned above. In this case, under the new structure, its internal PageRank shifted to 0.02171 — in other words, it got stronger.

There are two things that may have caused this internal PageRank increase: an increase in the number of links to the page, or a drop in the number of links to other pages.

At its core, a page can be thought of as having 100 percent of its PageRank to pass along. Setting aside considerations like Google’s damping of PageRank with each link, or the weighting of links by their position on the page, PageRank flows to other pages via links, and that “link juice” is split among those links. So, if there are 10 links on a page, each will get 10 percent. If you drop the total number of links to five, each will get 20 percent.
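You can see this effect with a toy calculation. The sketch below is a deliberately simplified power-iteration PageRank, not Gephi's exact implementation, but it illustrates how each page's score is split equally across its outgoing links:

```python
def simple_pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank over a {page: [outlinks]} dict.
    Each page's score is split equally across its outgoing links."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    pr = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outs in links.items():
            if outs:
                share = pr[page] / len(outs)  # the "split among the links" step
                for target in outs:
                    new[target] += damping * share
        pr = new
    return pr
```

Run it on a home page linking to 10 pages, then on one linking to only five: each remaining page's score rises, which is exactly the kind of shift we measured on the staging server.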

Again, this is a fairly simplified explanation, but these increases (or decreases) are what we want to measure to understand how a proposed site structure change will impact the internal PageRank of our most valuable organic pages.

Over in the Data Laboratory, we can also order pages by their PageRank and compare results (or just see how our current structure is working out).


This is just the tip of the iceberg. We can swap organic sessions for rankings in the page-based data we import (or go crazy and include both). With this data, we can judge what might happen to the PageRank of ranking (or up-and-coming) pages in a site structure shift. Or what about factoring in incoming link weight, as we did in the last article, to see how the flow of external link equity is impacted?

While no tool or technique can give you 100 percent assurance that a structural change will always go as planned, this technique assists in catching many unexpected issues. (Remember: Look to those outliers!)

This exercise can also help surface unexpected opportunities by isolating pages that will gain page weight as a result of a proposed site structure change. You may wish to (re)optimize these pages before your site structure change goes live so you can improve their chances of getting a rankings boost.

Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.

5 local search tactics your competitors probably aren't using

Local SEO is competitive and fierce. With more and more local businesses vying for the Google local three-pack — and ads and online directories occupying a large percentage of the remaining SERP real estate — your local SEO strategy has to be aggressive.

So, what can you do to outrank your local competitors down the street, especially when you’ve all got the basics down? One approach is to use local SEO tactics that your competitors may not know about or aren’t using. Here are five local SEO tactics you can implement to help get ahead of your competitors.

Google Posts

First, every local business should claim their Google My Business (GMB) listing. It’s a must-do. Non-negotiable. If you don’t claim your Google My Business listing, you essentially don’t exist online! (Okay, that’s an exaggeration — but not claiming your GMB listing will significantly diminish your chances of showing up in local search results.)

Of your competitors who claim their Google My Business listing, most will just walk away and forget about it. However, claiming your listing and letting it sit there gathering dust is like purchasing a new home and not putting any furniture in it. There’s so much more you should do, and this is one way you can outsmart (and outrank) your competitors.

Google has insight into how you and your potential customers are engaging with your Google My Business listing — and generally speaking, the more activity, the better. Does someone use the click-to-call option on their smartphone? Is a potential customer asking you a question using the new Q&A feature? Did you answer the question? Are you updating your business hours for holidays? Are you uploading quality photos of your business or staff?

And are you utilizing Google Posts?

Google Posts are almost like mini-ads with a picture, description, offer, landing page URL and so on. You can create Posts that tell potential customers about a product or service, promote upcoming specials, offer holiday wishes, let customers know about an event you’re having, and more. Having an open house? Create a Post for that event. Offering a free report or white paper? Create a Post about that white paper and add the link to where people can go to download it.

Creating a Post is easy. Simply log in to your Google My Business dashboard, and to the left, you will see the Posts option. Click on it to get started creating your first Post!

Whether you’re creating a post about an upcoming event, sale, special offer, product or service, try to include keywords relevant to your business and city in the copy of the post. (It can’t hurt!) Make your post compelling so that people who see your GMB listing will want to click on the Post to learn more. (Remember, Google is watching those interactions!)

Once you’ve created your post, here’s how it will look on your Google My Business Listing:

To make sure that the Posts are timely, Google removes Posts after seven days (or, if you set up an event, the Post will be removed when the event date has passed). To keep you on your toes, Google will even send you email reminders when it’s time to create a fresh new Post.

Does creating a Google Post help your local rankings? The verdict’s not 100 percent in, but Joy Hawkins and Steady Demand did some research, and they found that Google Posts did appear to have a mild impact on rankings.

Check your Google My Business category

Speaking of Google My Business, selecting the best GMB category for your business can make a huge difference in how your business ranks on Google. If you find your competitors are leapfrogging ahead of you on the local three-pack, scope out what category their business is listed under — you may want to experiment with selecting that same category.

If matching your competitors’ categories doesn’t move the needle for you, try getting more granular. (Yes, this is a case of trial and error. You may need to test until you find the right category that will get you better visibility and/or more qualified leads.) See the example below, where one of my clients jumped up on Google rankings when we changed her category from the more general “Lawyer” category to a more specific category, “Family Law Attorney.”

It’s always best to choose the category that most accurately fits your business type. Sometimes, people select too many categories, which can “dilute” your specialty. Selecting the best category for your business is a strategy that may mean you fall before you rise — but once you find the “sweet spot,” you can outrank your competitors.

Apply URL best practices

URLs are an important part of your search engine optimization and user experience strategy. Not only do URLs tell your site’s visitors and search engines what a page is about, they also serve as guides for the structure of your website. Your URLs should be descriptive, user-friendly and concise. When appropriate, include keywords (like your city, the name of a product, the type of service and so on) in the URL.

If your website runs on a CMS, you may have to adjust the settings to ensure that your page URLs are SEO-friendly. For example, WordPress URLs have a default format of /?p=id-number, which does not adhere to SEO best practices and is not particularly user-friendly.

To fix this issue, you need to create a custom URL structure for your permalinks and archives. This improves the look of the URL for visitors and people that share your link, and it also allows you to add relevant and local keywords to a page’s URL.

To fix this WordPress default setting, log in to your WordPress dashboard and go to Settings and click on Permalinks:

There you will be able to change your setting to “Post Name.” Changing this setting will allow you to create SEO-friendly URLs like:

Please note that after you change the permalink structure on your website, you may need to create redirects from the old URLs to the new ones (assuming your CMS doesn’t do this automatically).
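If your CMS doesn't create those redirects for you, a rule along these lines handles an old default-format URL. This is an Apache .htaccess sketch; the post ID and the new path are placeholders you'd swap for your own:

```
# Send the old /?p=123 URL to its new post-name permalink
RewriteEngine On
RewriteCond %{QUERY_STRING} ^p=123$
RewriteRule ^$ /sample-post/? [R=301,L]
```

The trailing ? in the rule strips the old query string so it isn't carried over to the new URL.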

Make your site secure

If your site isn’t secure (i.e., not HTTPS), making it secure is something you should add to your to-do list. In January 2017, Google started showing “not secure” warnings for Chrome users on HTTP pages with a password or credit card field. And, as of October 2017, they’ve expanded this warning to display when users enter data on an HTTP page, and on all HTTP pages visited in Incognito mode.

Even worse, their goal is to eventually display this warning on all HTTP pages. With all the press about cyber-security and protecting your personal information online, seeing this “Not Secure” warning on your site could scare off potential customers. Google is essentially warning people not to visit your site. Since many people are apt to close a website if they see a security warning, that means you could be losing a lot of business.

The bottom line: If your site’s not secure, you could be losing business to competitors.

(For a primer on making the switch from HTTP to HTTPS, check out this guide by Patrick Stox: “HTTP to HTTPS: An SEO’s guide to securing a website.”)

There are immediate benefits to having a secure site, too. If you have a secure site, the https:// and the green locked padlock that appear next to your URL in Chrome will make your website seem more trustworthy than a competitor’s site that isn’t secure.

And, of course, Google has stated that secure sites receive a slight rankings boost. Though this boost is fairly minor, it could still give you an edge over a competitor, all else being equal.

Write quality content: End writer’s block

Not only does Google like fresh, relevant, high-quality content — your site visitors do, too.

When it comes to writing long-form content, however, some people freeze up with writer’s block. How can you determine what to write about in order to satisfy users and drive relevant traffic?

Rest easy. There are amazing tools out there that can help you find the most popular questions people ask about a particular topic, and these types of questions and answers make for great content fodder.

Each of these tools has a different algorithm they use to find popular questions that need answering, but many pull top-asked questions from Google, various user forums, Quora, e-commerce sites and more. Finding these questions and writing a piece of content that answers those questions can squash writer’s block — fast! Now you can write content that actually answers questions potential customers are really asking.

Here are just a few of the “content crushing” tools I use:

- Question Samurai
- Storybase
- Answer the Public
- BuzzSumo Question Analyzer
- BlogSearchEngine.org
- HubSpot’s Blog Topic Generator

Which local SEO tactics are YOU using to beat your competition?

I’d love to know what local tactics are giving you a competitive edge in rankings. Are you using any of the tactics above? Different ones? Let us know!


How to keep your staging or development site out of the index

One of the most common technical SEO issues I come across is the inadvertent indexing of development servers, staging sites, production servers, or whatever other name you use.

There are a number of reasons this happens, ranging from people thinking no one would ever link to these areas to technical misunderstandings. These parts of the website are usually sensitive in nature and having them in the search engine’s index risks exposing planned campaigns, business intelligence or private data.

How to tell if your dev server is being indexed

You can use Google search to determine if your staging site is being indexed. For instance, to locate a staging site, you might search Google for site:yourdomain.com and look through the results, or add operators like -inurl:www to filter out the live www pages. You can also use third-party tools like SimilarWeb or SEMrush to find the subdomains.

There may be other sensitive areas that contain login portals or information not meant for public consumption. In addition to various Google search operators (also known as Google Dorking), websites tend to block these areas in their robots.txt files, telling you exactly where you shouldn’t look. What could go wrong with telling people where to find the information you don’t want them to see?

There are many actions you can take to keep visitors and search engines off dev servers and other sensitive areas of the site. Here are the options:

Good: HTTP authentication

Anything you want to keep out of the index should include server-side authentication. Requiring authentication for access is the preferred method of keeping out users and search engines.

Good: IP whitelisting

Allowing only known IP addresses — such as those belonging to your network, clients and so on — is another great step in securing your website and ensuring only those users who need to see the area of the website will see it.

Maybe: Noindex in robots.txt

Noindex in robots.txt is not officially supported, but it may work to remove pages from the index. The problem I have with this method is that it still tells people where they shouldn’t look, and it may not work forever or with all search engines.

The reason I say this is a “maybe” is that it can work and could actually be combined with a disallow in robots.txt, unlike some other methods which don’t work if you disallow crawling (which I will talk about later in this article).

Maybe: Noindex tags

A noindex tag either in the robots meta tag or an X-Robots-Tag in the HTTP header can help keep your pages out of the search results.
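For reference, the two forms look like this; the header variant is handy for non-HTML files such as PDFs:

```
In the page's <head>:
    <meta name="robots" content="noindex">

Or as an HTTP response header:
    X-Robots-Tag: noindex
```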

One issue I see with this is that it means more pages for the search engines to crawl, which eats into your crawl budget. I typically see this tag used when there is also a disallow in the robots.txt file. But if you’re telling Google not to crawl the page, it can’t respect the noindex tag, because it can’t see it.

Another common issue is that these tags may be applied on the staging site and then left on the page when it goes live, effectively removing that page from the index.

Maybe: Canonical

If you have a canonical set on your staging server that points to your main website, essentially all the signals should be consolidated correctly. There may be mismatches in content that could cause some issues, and as with noindex tags, Google will have to crawl additional pages. Webmasters also tend to add a disallow in the robots.txt file, so Google once again can’t crawl the page and can’t respect the canonical because they can’t see it.

You also risk these tags not being updated when migrating from the production server to live, which may cause the version you don’t want shown to be treated as the canonical.

Bad: Not doing anything

Not doing anything to prevent indexing of staging sites is usually because someone assumes no one will ever link to this area, so there’s no need to do anything. I’ve also heard that Google will just “figure it out” — but I wouldn’t typically trust them with my duplicate content issues. Would you?

Bad: Disallow in robots.txt

This is probably the most common way people try to keep a staging site from being indexed. With the disallow directive in robots.txt, you’re telling search engines not to crawl the page — but that doesn’t keep them from indexing the page. They know a page exists at that location and will still show it in the search results, even without knowing exactly what is there. They have hints from links, for instance, on the type of information on the page.
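For clarity, this is the directive in question. It blocks crawling of the entire site but, as noted, does not reliably keep URLs out of the index:

```
User-agent: *
Disallow: /
```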

When Google indexes a page that’s blocked from crawling, you’ll typically see the following message in search results: “A description for this result is not available because of this site’s robots.txt.”

If you recall from earlier, this directive will also prevent Google from seeing other tags on the page, such as noindex and canonical tags, because it prevents them from seeing anything on the page at all. You also risk not remembering to remove this disallow when taking a website live, which could prevent crawling upon launch.

What if you got something indexed by accident?

Crawling can take time depending on the importance of a URL (likely low in the case of a staging site). It may take months before a URL is re-crawled, so any block or issue may not be processed for quite a while.

If you got something indexed that shouldn’t be, your best bet is to submit a URL removal request in Google Search Console. This should remove it for around 90 days, giving you time to take corrective actions.


A site migration SEO checklist: Don’t lose traffic

Few things can destroy a brand’s performance in the search results faster than a poorly implemented site migration.

Changing your domain name or implementing HTTPS can be a great business move, but if you fail to consider how search engines will react to this move, you are almost certain to take a major hit in organic search traffic.

Use the following SEO checklist to prepare yourself as you develop a migration game plan for your website.

1. Carefully consider if migration is the right choice

A site migration will almost always result in a temporary loss of traffic — Google needs time to process the change and update its index accordingly. A carefully executed site migration can minimize traffic fluctuations, and in a best-case scenario, Google will ultimately treat the new site as if it were the original.

Still, that is only the best-case scenario. The reality is that site migrations, in and of themselves, typically offer little to no SEO benefit and do not eliminate search engine penalties. (That is why SEOs often use site migrations as an opportunity to make SEO improvements, like streamlining the site structure, fixing broken links, consolidating redundant pages and making content improvements.)

With all of that in mind, when is a site migration worth it?

- When a strong rebranding is in order.
- When migration will generate press and links.
- When the site needs to be moved to HTTPS (one of the few cases in which migration alone offers an SEO gain).

2. Use a sandbox

Never do a site migration without first testing everything on a test server. Verify that the redirects work properly, and do all of the checks that follow in private before going public. Trying to do it all in one go without testing is bound to lead to errors, and if the mistakes are bad enough, they can set your site back by weeks.

3. Plan to migrate during a slow period

A well-planned and monitored migration shouldn’t permanently affect your traffic, but you should plan for a temporary dip. For that reason, it’s best to perform the migration during a slow part of the year, assuming that there is some seasonality to your site’s performance. A site migration during or shortly before the holidays is always a bad idea. While the goal should always be to avoid losing any traffic, it’s important to make sure that if you do lose traffic, you lose it when business is already slow.

4. Crawl your site before the migration

Crawl your site with a tool like Screaming Frog, and be sure to save the crawl for later.

You need to make sure you have a complete list of the URLs on your old site so that nothing ends up getting lost because of the transition.

Use this as an opportunity to identify any crawl errors and redirects that exist on the old site. These have a tendency to creep up over time. I rarely come across a site that doesn’t have at least some broken or redirected links.

You should absolutely remove or replace any links that point to 404 pages during the migration process. In addition, I highly recommend updating any links that point to redirected pages so that they point to the final page. You do not want to end up with redirect chains after the migration.
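Once you have a mapping of each redirecting URL to its target (pulled from the crawl export), flattening chains is mechanical. A sketch:

```python
def flatten_redirects(redirects):
    """Resolve each redirect in a {source: target} mapping to its final
    destination, so no rule points at another redirecting URL."""
    flat = {}
    for start in redirects:
        seen = {start}
        target = redirects[start]
        while target in redirects and target not in seen:
            seen.add(target)  # guard against redirect loops
            target = redirects[target]
        flat[start] = target
    return flat
```

For example, a mapping of /a to /b and /b to /c flattens so that both /a and /b point straight at /c.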

Remember that a site crawl may not be able to identify every single page on your site. For example, if you have pages that aren’t linked from other pages on your site, they won’t show up in a crawl. You can use your own records and databases to find these pages, of course, but if this isn’t possible, you can find these pages in your Google Analytics data, as well as through a link explorer like Ahrefs.

If you find any orphan pages, make sure to update the site, and link to these during the migration. These pages are much less likely to pick up search engine traffic if they aren’t linked to from the rest of your site.

5. Benchmark your analytics

Make a copy of your Google Analytics data; you will need this information so that you can quickly identify if any traffic is lost after the migration.

If any traffic is lost, export the Analytics data from your new site and run a side-by-side comparison with the data from your old site, so that you can identify precisely which pages lost the traffic. In many cases, a loss of traffic will be isolated to individual pages, rather than taking place across the entire site.

You may also want to identify and take note of your top linked-to pages using a tool like Ahrefs. After the migration, you will want to pay special attention to these pages and monitor them closely. If these lose traffic, it is a sign that the authority isn’t being properly transferred from your old site to the new one. These pages contribute the most to your authority, so losses here may affect the overall performance of your site.

6. Map all changed URLs from old to new

You should have a spreadsheet that lists every old URL and every new URL.

Ideally, during a site migration, all of the old pages exist on the new site. Obviously, removing a page removes its ability to capture search engine traffic. On top of that, dropping too many pages during the migration may lead Google to conclude that the new site isn’t the same as the old site, causing you to lose your rankings.

Also, ideally, the URL architecture should be identical to the old one unless you have very strong reasons to change it. If you do plan on changing it, a site migration may seem like the ideal time to do it, but you should be aware that doing so may cause Google to see it as an entirely different site. If you do both at the same time, you will not be able to determine whether any losses in traffic were the result of changing the architecture or of migrating the site.

Another reason to keep the architecture the same is that it allows you to use regex in your .htaccess file to easily redirect from your old pages to the new ones. This puts less load on your server than naming the redirects one by one, and it makes the process of setting up the redirects much less painful.

7. Update all internal links

The HTML links on your new site should point to the new site, not the old one.

This might sound obvious, but as you go through the process, you will quickly realize how tempting it might be to leave the links unchanged, since they will redirect to the new URL anyway. Do not succumb to this temptation. Apart from the server load, which slows down site performance, the redirects may dampen your PageRank.

The ideal way to rewrite the links is by performing a search and replace operation on your database. The operation should be performed so that it updates the domain name without changing the folder structure (assuming you’re keeping your site structure the same).

Write your search and replace operations carefully so that only text containing a URL is updated. You generally want to avoid updating your brand name and your URLs with the same search and replace operation.
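As an illustration of that constraint, a hypothetical replace operation that touches only href values, and therefore leaves brand mentions in body copy alone, might look like the sketch below. The domain names are placeholders, and you should always run this kind of operation against a database backup first.

```python
import re

# Placeholder domains -- substitute your own.
OLD = "old-site.com"
NEW = "new-site.com"

def swap_link_domains(html):
    # Only rewrite the domain when it appears inside an href="..." value,
    # so prose that mentions the old domain or brand is left untouched.
    pattern = re.compile(r'(href="https?://)(?:www\.)?' + re.escape(OLD))
    return pattern.sub(lambda m: m.group(1) + NEW, html)

html = '<a href="https://old-site.com/page/">Visit old-site.com</a>'
print(swap_link_domains(html))
# <a href="https://new-site.com/page/">Visit old-site.com</a>
```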

8. Self-canonicalize all new pages

Verify that canonicalization on the new site references the new site and not the old. Canonicalizing to the old site can be disastrous, as it may prevent the new site from being indexed.

I recommend self-canonicalizing all of your pages on the new site (except, of course, for pages that should canonicalize to another page). In combination with the redirects, this tells Google that the new site is, in fact, the new location of the old site. Sitewide self-canonicalization is recommended anyway, since URL parameters create duplicate content that should always canonicalize to the parameter-free URL.
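For example, a self-referencing canonical on a new-site page looks like this (URL is a placeholder):

```html
<!-- On https://www.new-site.com/widgets/ -->
<link rel="canonical" href="https://www.new-site.com/widgets/" />
```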

9. Resolve duplicate content issues

Various missteps during the migration process can result in duplicate content issues. Be aware of these issues, and take steps to avoid them:

- If multiple versions of a URL are published, the result is duplicate content. If self-canonicalization is put in place properly, this should take care of the issue, but I always recommend setting up redirect rules in .htaccess so that only one version of the page is accessible.
- Make sure that internal links are consistent to avoid redirects from internal links.
- IP addresses should redirect to URLs.
- Look out for folders that lead to the same content, especially “default” folders.
- Verify that only HTTPS or HTTP is used and that only the www or non-www version of the site is accessible. The others should redirect to the proper site.
- If your site has a search function, the search result pages should be noindexed.
- I mentioned this earlier, but self-canonicalization should be in place to avoid duplicate content created by URL query strings.
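As a sketch of the host-level rules above, assuming Apache with mod_rewrite and a placeholder domain, the .htaccess entries might look like this (test on a staging server before deploying):

```apacheconf
# Force HTTPS and the www hostname in a single 301, so only one
# version of each URL is reachable.
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.new-site.com/$1 [R=301,L]
```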

10. Identify and address any removed pages

I mentioned above that you should generally avoid removing any pages during the migration. If some pages simply must be removed for branding purposes, take the following steps:

- Make a list of all the pages.
- Do not redirect the old pages to the new site.
- Remove all links from these pages.
- Remove the pages from the old site and let them return a 404.
- If there is a suitable replacement for the page, set up a redirect and change all of the links to point to the new page. You should only do this if the replacement page serves the same purpose as the old page.
- Do not redirect the removed pages to the home page (also called a “soft 404”). If there is no suitable replacement for a page, it should 404. A 404 is only an error if you link to the page.

11. Ensure that a custom 404 page is in place

A custom 404 page allows users to easily navigate your site and find something useful if they land on a page that no longer exists.

12. Manage and submit sitemaps

Keep your old sitemap in the Google Search Console, and add the sitemap for the new site as well. Requesting Google to crawl the old sitemap and discover the redirects is a good way to accelerate the process.

13. Keep analytics in place at all times

Install Google Analytics on the new domain and get it up and running well before you launch the site to the public. You do not want to have any missing data during the transition, and it’s important to watch for any changes in traffic during the migration.

14. Redirect all changed links

As mentioned above, the ideal way to set up your redirects is with a regex expression in the .htaccess file of your old site. The regex expression should simply swap out your domain name, or swap out HTTP for HTTPS if you are doing an SSL migration.
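Assuming Apache and a straight domain swap with an unchanged URL structure, the old site’s .htaccess might need only a rule like this (placeholder domain; verify on a test server before going live):

```apacheconf
# Redirect every path on the old domain to the same path on the new one.
RewriteEngine On
RewriteRule ^(.*)$ https://www.new-site.com/$1 [R=301,L]
```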

For any pages where this isn’t possible, you will need to set up an individual redirect. Make sure this doesn’t create any conflicts with your regex and that it doesn’t produce any redirect chains.

Test your redirects on a test server and verify that this doesn’t produce any 404 errors. I recommend doing this before the redirects go live on your public site.

Keep in mind that once the redirects go live, your site has effectively been migrated. The new site should be in pristine condition before setting up the redirects.

15. Keep control of the old domain

Unless the purpose of the migration was to sell the original domain, I would strongly advise against giving up control of the old domain. Ideally, the old domain should redirect to the new one, on a page-by-page basis, indefinitely. If those redirects are lost, all of the inbound links earned by the old site will also be lost.

Some industry professionals claim that you can give up control of the old domain once Google stops indexing it, but I would never advise doing this. While it’s possible that Google will attribute links pointed at the old site to the new one, even without the redirect, this is placing far more faith in the search engine than I would ever recommend.

16. Monitor traffic, performance and rankings

Keep a close eye on your search and referral traffic, checking it daily for at least a week after the migration. If there are any shifts in traffic, dive down to the page level and compare traffic on the old site to traffic on the new site to identify which pages have lost traffic. Those pages, in particular, should be inspected for crawl errors and linking issues. You may want to pursue getting any external links pointing at the old version of the page changed to the new one, if possible.

It is equally important to keep a close eye on your most linked pages, both by authority and by external link count. These pages play the biggest role in your site’s overall ability to rank, so changes in performance here are indicative of your site’s overall performance.

Use a tool like SEMrush to monitor your rankings for your target keywords. In some cases, this will tell you if something is up before a change in traffic is noticeable. This will also help you identify how quickly Google is indexing the new site and whether it is dropping the old site from the index.

17. Mark dates in Google Analytics

Use Google Analytics annotations to mark critical dates during the migration. This will help you to identify the cause of any issues you may come across during the process.

18. Ensure Google Search Console is properly set up

You will need to set up a new property in Google Search Console for the new domain. Verify that it is set up for the proper version, accounting for HTTP vs. HTTPS and www vs. non-www. Submit both the old and new sitemaps to solidify the message that the old site has been redirected to the new one.

Submit a change of address in the Google Search Console, request Google to crawl the new sitemap, and use “fetch as Google” to submit your new site to be indexed. It is incredibly important to verify that all of your redirects, canonicalizations and links are error-free before doing this.

19. Properly manage PPC

Update your PPC campaigns so that they point to the correct site. If your PPC campaigns are pointing to the old site, attribution will be lost in Analytics because of the redirect.

20. Update all other platforms

Update all of your social media profiles, bios you use as a guest publisher, other websites you own, forum signatures you use, and any other platforms you take advantage of, so that the links point to the new site and not the old.

21. Reach out for your most prominent links

Contact the most authoritative sites that link to you in order to let them know about the migration, and suggest that they update the link to point to the new website. Not all of them will do this, but those that do will help accelerate the process of Google recognizing that a site migration has occurred.

I wouldn’t recommend doing this with every single link, since this would be extremely time-consuming for most sites, but it is worth doing this for your top links.

22. Monitor your indexed page count

Google will not index all of the pages on your new site immediately, but if the indexed page count is not up to the same value as the old site after a month has passed, something has definitely gone wrong.

23. Check for 404s and redirects

Crawl the new site to verify that there are no 404s or 301s (or any other 3xx, 4xx or 5xx codes). All of the links on the new site should point directly to a functioning page. The 404 and 5xx errors are the biggest offenders and should be taken care of first. If there is a suitable replacement for a 404 page, change the link itself to point to the replacement, and verify that a 301 is in place for anybody who arrives at the missing page through other means.

The second-worst offenders are links to 301 pages that exist on the old site. Even though these redirect to the new site, the server load is bad for performance, and linking back to the old site may lead to confusion over the fact that a site migration has taken place. While all of the other efforts taken should clarify this to Google and the other search engines, these things are best never left to chance.

Any other 301s can be taken care of after this. Always update your internal links to point directly to the correct page, never through a redirect.

24. Crawl your old URLs

Use Screaming Frog or a similar tool to crawl all of your old URLs. Be sure to crawl a list of URLs that you collected before the migration, and make sure the list includes any URLs that were not discoverable by crawling. Do not attempt to crawl the site directly; the 301s will cause it to crawl only the first page.

Verify that all of the old URLs redirect to the new site. There should not be any 404s unless you removed the page during the migration process. If there are any 404s, verify that there are no links to them. If the 404s are not intended, set up a proper redirect.

Check the external URLs to verify that all of the redirects are functional. None of the external URLs should be 301s or 404s. A 301 in the external URLs is indicative of a redirect chain and is bad for performance. A redirect to a 404 will lead to a very frustrating experience for your users and may hurt your SEO in other ways.
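The checks above lend themselves to a simple script. The sketch below classifies results you have already collected (for example, from a list-mode crawl export); the rule names and messages are my own, not from any particular tool.

```python
# Sketch: flag redirect problems from collected crawl results.
# chain is the list of 3xx hop statuses; final_status is the status
# of the last response in the chain.

def classify(chain, final_status):
    if final_status == 404:
        return "redirects to 404 - fix target"
    if len(chain) == 0:
        return "no redirect - old URL should 301"
    if len(chain) > 1:
        return "redirect chain - point the first hop at the final URL"
    return "ok"

print(classify([301], 200))        # ok
print(classify([301, 301], 200))   # redirect chain - point the first hop at the final URL
print(classify([301], 404))        # redirects to 404 - fix target
```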


If a site migration is carried out without taking SEO into account, you can almost bet on losing search engine traffic in the process. Other than clients who have approached me after being penalized by Google, the worst SEO predicaments I’ve come across were the ones caused during a site migration by professionals who didn’t consider how search engines would react to the process. Keep all of the above in mind if you are planning to migrate your site, and it should go off without a hitch.

Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.

How we hacked the Baidu link submission script for better indexation

It is no secret that the Baidu Link Submission Script is an effective tool for surfacing links that are not quite visible to Baidu spiders. In Baidu’s own words (translated from Chinese):

The JavaScript snippet pushes links to Baidu directly; it serves all platforms, as well as both desktop and mobile devices. Your page is discovered by Baidu the moment of its first page view, which accelerates the progress of new content discovery.

By inserting the snippet into the source code of pages that you want Baidu to discover and index, it pushes the URL of the current page to Baidu automatically.

Baidu Link Submission Script

How the script works

The script simply checks the protocol of your page and selectively downloads one of two JavaScript (JS) files from Baidu. It copies the script from that file into a new <script> node on your page. The copied script then takes the URL of the current page, plus the referrer URL of the page, as parameters when requesting a GIF file (a 1×1 pixel GIF that carries the parameters) from Baidu.

Every time a visitor browses the page, the script is executed and Baidu will be notified. For example, if you were to drive a considerable amount of traffic to the pages, Baidu would know how important and popular the content is.

As you can see, the code snippet is straightforward and open to other configurations. For example, you may use it as an impression-tracking pixel. That gives us reason to believe that it won’t get high priority at Baidu. To be honest, we don’t think this will improve your ranking on Baidu. However, it does help the indexation of your site to some extent.

How we improved it

If you are an SEO veteran, you may have noticed that submitting the URL of the current page does not follow best practices, because:

- The URL may not be the canonical URL you want the engine to index.
- The script makes an extra request, which is unnecessary and potentially slows down the page load speed.
- Furthermore, those two JS files are static and mostly identical, except for the URL of the GIF being requested.

In order to tackle those issues, my colleague Woody Chai and I have tweaked the Baidu script a bit. See the code snippet below.

The improved Baidu Link Submission Script by Merkle

In the code snippet above, we added a step that checks whether a canonical directive exists, and we merged those two JS files into a single inline script. Now, we can push the canonical URL to Baidu with only one HTTP(S) request.
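The improved snippet itself was published as an image, so it is not reproduced here; the sketch below shows the same idea: prefer the canonical URL and request the tracking GIF directly. The GIF endpoint is a deliberate placeholder, not Baidu’s real URL; copy the current one from Baidu’s push.js before using anything like this.

```javascript
// Sketch of a canonical-aware push (not Merkle's exact code).
// PLACEHOLDER endpoint -- copy the real GIF URL from Baidu's push.js.
var BAIDU_GIF = "https://example.invalid/baidu-push.gif";

function pickSubmitUrl(canonicalHref, locationHref) {
  // Prefer the canonical URL so Baidu is told about the version
  // you actually want indexed.
  return canonicalHref || locationHref;
}

if (typeof document !== "undefined") {
  (function () {
    var link = document.querySelector('link[rel="canonical"]');
    var url = pickSubmitUrl(link && link.href, location.href);
    var img = new Image(); // single request, no extra push.js download
    img.src = BAIDU_GIF +
      "?url=" + encodeURIComponent(url) +
      "&refer=" + encodeURIComponent(document.referrer);
  })();
}
```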

Last thoughts

In this example of the improved Baidu Link Submission Script, we have demonstrated how we can make the script Baidu gave us more SEO-friendly. There is one thing you should keep in mind: The scripts (including the URLs of the GIFs Baidu put in those JS files) may change in the future, though they haven’t changed since day one. If you find any 404s for those resources, or any sign that the script is slowing down the page load, you should go back to Baidu Webmaster Tools for updated solutions.



Tips to troubleshoot your technical SEO

There are lots of articles filled with checklists that tell you what technical SEO items you should review on your website. This is not one of those lists. What I think people need is not another best practice guide, but some help with troubleshooting issues.

info: search operator

Often, [info:] can help you diagnose a variety of issues. This command will let you know if a page is indexed and how it is indexed. Sometimes, Google chooses to fold pages together in their index and treat two or more duplicates as the same page. This command shows you the canonicalized version — not necessarily the one specified by the canonical tag, but rather what Google views as the version they want to index.

If you search for your page with this operator and see another page, then you’ll see the other URL ranking instead of this one in results — basically, Google didn’t want two of the same page in their index. (Even the cached version shown is the other URL!) If you make exact duplicates across country-language pairs in hreflang tags, for instance, the pages may be folded into one version and show the wrong page for the locations affected.

Occasionally, you’ll see this with hijacking SERPs as well, where an [info:] search on one domain/page will actually show a completely different domain/page. I had this happen during Wix’s SEO Hero contest earlier this year, when a stronger and more established domain copied my website and was able to take my position in the SERPs for a while. Dan Sharp also did this with Google’s SEO guide earlier this year.

&filter=0 added to Google Search URL

Adding &filter=0 to the end of the URL in a Google search will remove filters and show you more websites in Google’s consideration set. You might see two versions of a page when you add this, which may indicate issues with duplicate pages that weren’t rolled together; they might both say they are the correct version, for instance, and have signals to support that.

Adding this parameter also shows you other eligible pages on your website that could rank for this query. If you have multiple eligible pages, you likely have opportunities to consolidate pages or add internal links from these other relevant pages to the page you want to rank.

site: search operator

A [site:domain.com] search can reveal a wealth of knowledge about a website. I would be looking for pages that are indexed in ways I wouldn’t expect, such as with parameters, pages in site sections I may not know about, and any issues with pages being indexed that shouldn’t be (like a dev server).

site: keyword

You can use a [site:domain.com keyword] search to check for relevant pages on your site for another look at consolidation or internal link opportunities.

Also interesting about this search is that it will show whether your website is eligible for a featured snippet for that keyword. You can run it for many of the top websites to see what is included in their eligible featured snippets, and try to find out what your website is missing or why one page may be showing over another.

If you use a “phrase” instead of a keyword, this can be used to check if content is being picked up by Google, which is handy on websites that are JavaScript-driven.

Static vs. dynamic

When you’re dealing with JavaScript (JS), it’s important to understand that JS can rewrite the HTML of a page. If you’re looking at view-source or even Google’s cache, what you’re looking at is the unprocessed code. These are not great views of what may actually be included once the JS is processed.

Use “inspect” instead of “view-source” to see what is loaded into the DOM (Document Object Model), and use “Fetch and Render” in Google Search Console instead of Google’s cache to get a better idea of how Google actually sees the page.

Don’t tell people something is wrong because it looks funny in the cache or something isn’t in the source; it may be you who is wrong. There may be times when you look at the source and say something is right, but when processed, something in the <head> section breaks and causes it to end early, throwing many tags like canonical or hreflang into the <body> section, where they aren’t supported.

Why aren’t these tags supported in the body? Likely because it would allow hijacking of pages from other websites.

Check redirects and header responses

You can make either of these checks with Chrome Developer Tools, or to make it easier, you might want to check out extensions like Redirect Path or Link Redirect Trace. It’s important to see how your redirects are being handled. If you’re worried about a certain path and if signals are being consolidated, check the “Links to Your Site” report in Google Search Console and look for links that go to pages earlier in the chain to see if they are in the report for the page and shown as “Via this intermediate link.” If they are, it’s a safe bet Google is counting the links and consolidating the signals to the latest version of the page.

For header responses, things can get interesting. While rare, you may see canonical tags and hreflang tags here that can conflict with other tags on the page. Redirects using the HTTP Header can be problematic as well. More than once I’ve seen people set the “Location:” for the redirect without any information in the field and then redirect people on the page with, say, a JS redirect. Well, the user goes to the right page, but Googlebot processes the Location: first and goes into the abyss. They’re redirected to nothing before they can see the other redirect.

Check for multiple sets of tags

Many tags can be in multiple locations, like the HTTP Header, the <head> section and the sitemap. Check for any inconsistencies between the tags. There’s nothing stopping multiple sets of tags on a page, either. Maybe your template added a meta robots tag for index, then a plugin had one set for noindex.

You can’t just assume there is one tag for each item, so don’t stop your search after the first one. I’ve seen as many as four sets of robots meta tags on the same page, with three of them set to index and one set as noindex, but that one noindex wins every time.
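For example, a template and a plugin can easily emit conflicting directives like these, and the noindex is the one that wins:

```html
<meta name="robots" content="index,follow">
<meta name="robots" content="noindex">
```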

Change UA to Googlebot

Sometimes, you just need to see what Google sees. There are lots of interesting issues around cloaking, redirecting users and caching. You can change your user agent with Chrome Developer Tools (instructions here) or with a plugin like User-Agent Switcher. If you’re going to do this, I recommend doing it in Incognito mode. You want to check that Googlebot isn’t being redirected somewhere. For example, Googlebot may not be able to see a page intended for another country because it is redirected to a different page based on its US IP address.


Check robots.txt

Check your robots.txt for anything that might be blocked. If you block a page from being crawled and put a canonical on that page to another page or a noindex tag, Google can’t crawl the page and can’t see those tags.

Another important tip is to monitor your robots.txt for changes. There may be someone who does change something, or there may be unintentional issues with shared caching with a dev server, or any number of other issues — so it’s important to keep an eye on changes to this file.

You may have a problem with a page not being indexed and not be able to figure out why. Although not officially supported, a noindex via robots.txt will keep a page out of the index, and this is just another possible location to check.

Save yourself headaches

Any time you can set up any automated testing or remove points of failure — those things you just know that someone, somewhere will mess up — do it. Scale things as best you can because there’s always more work to do than resources to do it. Something as simple as setting a Content Security Policy for upgrade-insecure-requests when going to HTTPS will keep you from having to go tell all of your developers that they have to change all these resources to fix mixed content issues.
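Assuming Apache with mod_headers, that policy is a one-line header (browsers will then upgrade insecure subresource requests on their own):

```apacheconf
Header always set Content-Security-Policy "upgrade-insecure-requests"
```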

If you know a change is likely to break other systems, weigh the outcomes of that change with the resources needed for it and the chances of breaking something and resources needed to fix the system if that happens. There are always trade-offs with technical SEO, and just because something is right doesn’t mean it’s always the best solution (unfortunately), so learn how to work with other teams to weigh the risk/reward of the changes you’re suggesting.

Summing up

In a complex environment, there may be many teams working on projects. You might have multiple CMS systems, infrastructures, CDNs and so on. You have to assume everything will change and everything will break at some point. There are so many points of failure that it makes the job of a technical SEO interesting and challenging.


19 technical SEO facts for beginners

Technical SEO is an awesome field. There are so many little nuances to it that make it exciting, and its practitioners are required to have excellent problem-solving and critical thinking skills.

In this article, I cover some fun technical SEO facts. While they might not impress your date at a dinner party, they will beef up your technical SEO knowledge — and they could help you in making your website rank better in search results.

Let’s dive into the list.

1. Page speed matters

Most think of slow load times as a nuisance for users, but its consequences go further than that. Page speed has long been a search ranking factor, and Google has even said that it may soon use mobile page speed as a factor in mobile search rankings. (Of course, your audience will appreciate faster page load times, too.)

Many have used Google’s PageSpeed Insights tool to get an analysis of their site speed and recommendations for improvement. For those looking to improve mobile site performance specifically, Google has a new page speed tool out that is mobile-focused. This tool will check the page load time, test your mobile site on a 3G connection, evaluate mobile usability and more.

2. Robots.txt files are case-sensitive and must be placed in a site’s main directory

The file must be named in all lower case (robots.txt) in order to be recognized. Additionally, crawlers only look in one place when they search for a robots.txt file: the site’s main directory. If they don’t find it there, oftentimes they’ll simply continue to crawl, assuming there is no such file.

3. Crawlers can’t always access infinite scroll

And if crawlers can’t access it, the page may not rank.

When using infinite scroll for your site, make sure that there is a paginated series of pages in addition to the one long scroll, and make sure you implement replaceState/pushState on the infinite scroll page. This is a fun little optimization that most web developers are not aware of, so make sure to check your infinite scroll for rel="next" and rel="prev" in the code.
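A component page in such a paginated series might carry tags like these (URLs are placeholders):

```html
<!-- On page 2 of the series backing the infinite scroll -->
<link rel="prev" href="https://www.example.com/items?page=1">
<link rel="next" href="https://www.example.com/items?page=3">
```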

4. Google doesn’t care how you structure your sitemap

As long as it’s XML, you can structure your sitemap however you’d like — category breakdown and overall structure is up to you and won’t affect how Google crawls your site.

5. The noarchive tag will not hurt your Google rankings

This tag will keep Google from showing the cached version of a page in its search results, but it won’t negatively affect that page’s overall ranking.

6. Google usually crawls your home page first

It’s not a rule, but generally speaking, Google usually finds the home page first. An exception would be if there are a large number of links to a specific page within your site.

No, but that's commonly the first page we find from a site.

— John ☆.o(≧▽≦)o.☆ (@JohnMu) August 24, 2017

7. Google scores internal and external links differently

A link to your content or website from a third-party site is weighted differently than a link from your own site.

8. You can check your crawl budget in Google Search Console

Your crawl budget is the number of pages that search engines can and want to crawl in a given amount of time. You can get an idea of yours in your Search Console. From there, you can try to increase it if necessary.

9. Disallowing pages with no SEO value will improve your crawl budget

Pages that aren’t essential to your SEO efforts often include privacy policies, expired promotions or terms and conditions.

My rule is that if the page is not meant to rank, and it does not have 100 percent unique quality content, block it.

10. There is a lot to know about sitemaps

- XML sitemaps must be UTF-8 encoded.
- They cannot include session IDs from URLs.
- They must contain no more than 50,000 URLs and be no larger than 50 MB.
- A sitemap index file is recommended instead of multiple sitemap submissions.
- You may use different sitemaps for different media types: video, images and news.
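As a sketch of the sitemap index recommendation above, with placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap-pages.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-images.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-video.xml</loc></sitemap>
</sitemapindex>
```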

11. You can check how Google’s mobile crawler ‘sees’ pages of your website

With Google migrating to a mobile-first index, it’s more important than ever to make sure your pages perform well on mobile devices.

Use Google Search Console’s Mobile Usability report to find specific pages on your site that may have issues with usability on mobile devices. You can also try the mobile-friendly test.

12. Half of page one Google results are now HTTPS

Website security is becoming increasingly important. In addition to the ranking boost given to secure sites, Chrome is now issuing warnings to users when they encounter sites with forms that are not secure. And it looks like webmasters have responded to these updates: According to Moz, over half of websites on page one of search results are HTTPS.

13. Try to keep your page load time to 2 to 3 seconds

Google Webmaster Trends Analyst John Mueller recommends a load time of two to three seconds (though a longer one won’t necessarily affect your rankings).

14. Robots.txt directives do not stop your website from ranking in Google (completely)

There is a lot of confusion over the “Disallow” directive in your robots.txt file. Your robots.txt file simply tells Google not to crawl the disallowed pages/folders/parameters specified, but that doesn’t mean these pages won’t be indexed. From Google’s Search Console Help documentation:

You should not use robots.txt as a means to hide your web pages from Google Search results. This is because other pages might point to your page, and your page could get indexed that way, avoiding the robots.txt file. If you want to block your page from search results, use another method such as password protection or noindex tags or directives.
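To make the distinction concrete (paths are placeholders), a Disallow rule only stops crawling:

```
User-agent: *
Disallow: /private/
```

To keep a page out of search results, leave it crawlable and add a noindex tag instead:

```html
<meta name="robots" content="noindex">
```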

15. You can point canonical tags from new domains to your main domain

This allows you to keep the value of the old domain while using a newer domain name in marketing materials and other places.

16. Google recommends keeping redirects in place for at least one year

Because it can take months for Google to recognize that a site has moved, Google representative John Mueller has recommended keeping 301 redirects live and in place for at least a year.

Personally, for important pages — say, a page with rankings, links and good authority redirecting to another important page — I recommend you never get rid of redirects.
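A single-URL 301 like this costs almost nothing to keep in place indefinitely; in Apache's .htaccess it might be sketched as (paths hypothetical):

```apache
# Permanently redirect an old page to its replacement
Redirect 301 /old-important-page https://www.example.com/new-important-page
```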

17. You can control your search box in Google

Google may sometimes include a search box with your listing. This search box is powered by Google Search and works to show users relevant content within your site.

If desired, you can choose to power this search box with your own search engine, or you can include results from your mobile app. You can also disable the search box in Google using the nositelinkssearchbox meta tag.
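The opt-out tag goes in the page's head:

```html
<!-- Opt out of the sitelinks search box -->
<meta name="google" content="nositelinkssearchbox">
```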

18. You can enable the ‘notranslate’ tag to prevent translation in search

The “notranslate” meta tag tells Google not to offer a translation of the page in other language versions of Google search. This is a good option if you are skeptical about Google’s ability to properly translate your content.
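The tag itself is a one-liner in the page's head:

```html
<!-- Ask Google not to offer translations of this page in search results -->
<meta name="google" content="notranslate">
```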

19. You can get your app into Google Search with Firebase app indexing

If you have an app that you have not yet indexed, now is the time. By using Firebase app indexing, you can enable results from your app to appear when someone who’s installed your app searches for a related keyword.

Staying up to date with technical SEO

If you would like to stay up to date with technical SEO, there are a few great places to do that.

First, I recommend you watch the videos Barry Schwartz does each week. Second, keep your eye on Search Engine Land. Third, jump on every blog post Google publishes on Google Webmaster Central. Finally, it is always a good idea to jump into a Google Webmaster hangout or simply watch the recording on YouTube.

I hope you enjoyed these 19 technical SEO facts. There are plenty more, but these are a few fun ones to chew on.

Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.

3 ways to improve link equity distribution and capture missed opportunities

There’s a lot of talk about link building in the SEO community, and the process can be time-consuming and tedious. As the web demands higher and higher standards for the quality of content, link building is more difficult than ever.

However, few SEOs are discussing how to better utilize what they already have. There seems to be an obsession with constantly building more and more links without first understanding how that equity is currently interacting with the website. Yes, more links may help your website rank better, but your efforts may be in vain if you’re only recouping a small portion of the equity. Much of that work dedicated to link-building efforts would then be wasted.

For many websites, there is a big opportunity to improve upon the link equity that has already been established. The best part about all of this is that these issues can be addressed internally, as opposed to link building which typically requires third-party involvement. Here are some of my favorite ways to reclaim lost link value.

1. Redirect old URL paths

On client websites, I often see discontinued product pages that haven’t been redirected, or entire iterations of old websites where almost all of the URLs return 404 errors. Leaving these pages broken means too much unused link equity is left on the table.

Finding old URL paths and 301 redirecting them can lead to huge wins in search engine visibility. In one fell swoop, you can reactivate the value of hundreds or even thousands of links that are pointing toward your domain.

So the question becomes, how can you surface these old URLs?

There are a few different methods I use, depending on the resources I have at hand. Occasionally, I’ve had clients who just went through a migration that moved their old website to a staging site. If this is the case, you should be able to configure Screaming Frog to crawl the staging environment (you may need to ignore robots.txt and crawl nofollow links). After the crawl is complete, simply export the data to a spreadsheet and use Find/Replace to swap out the staging domain with the root domain, and you should have a comprehensive list of old URL paths.
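The Find/Replace step can also be scripted. A minimal Python sketch, assuming hypothetical staging and live hostnames:

```python
# Swap a hypothetical staging hostname for the live hostname in a list of
# crawled URLs (the equivalent of the spreadsheet Find/Replace step).
STAGING = "https://staging.example.com"
LIVE = "https://www.example.com"

def to_live_urls(staging_urls):
    """Return the crawled staging URLs rewritten onto the live domain."""
    return [url.replace(STAGING, LIVE, 1) for url in staging_urls]

crawled = [
    "https://staging.example.com/old-category/old-product",
    "https://staging.example.com/about-us",
]
print(to_live_urls(crawled))
```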

However, what if you don’t have access to any resources that list old URLs? For these situations, I use a combination of Ahrefs, Google Analytics and Google Search Console (credit to Dan Shure’s article on redirect chains, which helped me refine this process).

First, using Ahrefs, I’ll enter my domain, and then click the “Best Pages By Links” report.

From there, I export the entire report into an Excel file. It’s important that you export all of the URLs Ahrefs gives you, not just the ones it identifies as 404 errors. Ahrefs will only provide the initial status code the URL returns, which can be misleading. Often, I’ll see situations where Ahrefs identifies the status code as a 301, but the URL actually redirects to a 404.

Once I have my Excel file, I run the URLs through Screaming Frog using “List Mode” and export the 404 errors it finds into a master Excel document.

Next, I go to Google Analytics and navigate to the “Landing Pages” report. I’ll typically set the date ranges for as far back as the account tracks, but this varies for each situation. I’ll export all of the data it gives me to a spreadsheet and then add the domain name in front of the relative URL path using Excel’s CONCATENATE function.
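The concatenation step in Excel might look like this, assuming the relative landing-page path sits in column A and a hypothetical live domain:

```text
=CONCATENATE("https://www.example.com", A2)
```

Dragging the formula down the column produces a full list of absolute URLs ready for Screaming Frog's "List Mode."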

I once again run this list through Screaming Frog and add the 404 errors it finds to the master document.

Finally, I log in to Google Search Console, open up the “Crawl Errors” report, and navigate to the “Not Found” tab. I export these URLs and confirm that they do, in fact, return 404 status codes by using Screaming Frog. I add these 404 pages to the master document.

Now there’s one master spreadsheet that contains all of the potential broken URLs in one place. De-dupe this list and run Screaming Frog in “List Mode” and export the URLs that return 404 status codes.
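The de-duplication step is also easy to script; a minimal Python sketch (the sample URLs are hypothetical):

```python
# De-duplicate the combined list of candidate broken URLs while preserving
# the order in which they were collected.
def dedupe(urls):
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique

combined = [
    "https://www.example.com/old-page",
    "https://www.example.com/discontinued-product",
    "https://www.example.com/old-page",  # duplicate from a second source
]
print(dedupe(combined))
```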

To help prioritize which URLs to redirect first, I connect Screaming Frog to the Ahrefs API, which will allow the crawler to gather the link metrics associated with each page. I sort that list by number of linking root domains and assign priority to the redirections that way.

After I have the final list of 404 errors, it’s simply a matter of identifying the destination pages on the client website each URL should redirect to. To scale this effort, I often use a combination of MergeWords and the OpenList Chrome extension.

2. Analyze the .htaccess file

When evaluating how your website distributes link equity, it’s important to understand how your global redirects are working as well. This is where the .htaccess file comes into play. In this file, you can see the syntax that instructs your website how to handle redirect rules.

When using a tool like Ahrefs, if I’m seeing common redirect patterns, this is a good sign that these rules are defined in the .htaccess file.

Often, I’ll see that the .htaccess file is causing 302 redirects that should be 301s, pushing unnecessary redirects (causing redirect chains), or missing redirect rules that should be there. For instance, a common mistake I see is a rule that 302 redirects HTTP URLs to HTTPS instead of using a 301.

Each situation is entirely different, but here are some of the .htaccess rules I commonly look for:

“HTTP” to “HTTPS” rules
Non-WWW to WWW rules
URL capitalization rules
Trailing slash rules
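As an illustration of the first two rules, a typical Apache .htaccess sketch might look like this (mod_rewrite assumed to be enabled; the domain is hypothetical):

```apache
RewriteEngine On

# Force HTTPS with a permanent (301) redirect
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [L,R=301]

# Force the www hostname with a permanent (301) redirect
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
```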

There are many opportunities to better control the directives of the .htaccess file. If you’re noticing similar patterns of improperly configured redirects, it may be worth pulling this file and talking to your developers about how these issues can be fixed.

3. Fix internal 301 redirects

Now that you’ve accumulated as much link equity as possible from external sources, it’s time to ensure that your website is passing it efficiently internally. If your website has a bunch of internal 301 redirects, there’s a chance that your deeper pages may not be receiving as much link equity as they possibly could be. While Google claims there is no link equity lost in 3xx redirects, why leave this up to chance? I would rather be 100 percent sure that internal links are passing their full value throughout the website.

To identify these, I run Screaming Frog in “Spider Mode” on the domain being analyzed. Screaming Frog will crawl the website and gather instances of 301 redirects in the “Redirection (3xx)” report. If you want to determine the order of importance, sort this report by “Inlinks.” You will now see the pages that are internally 301 redirecting the most.

Often, these are instances of internal redirects in key areas such as the primary/secondary navigation, footer or sidebar links. This is great because with one change, you can eliminate a large quantity of these internal 301 redirects. While you’ll want to fix as many as possible, I recommend you start there.

Final thoughts

One thing I’ve learned during my time as an SEO is that webmasters are fantastic at diluting equity. Changes such as website migrations, and the redirects left behind by previous URL structures, can all have a large impact on link equity.

While in an ideal world link equity would be kept in mind during these implementations, that is often not the case. The above steps should serve as a good starting point to getting some of yours back.


SEO 101: Which URL versions to add to Google Search Console

Google Search Console serves as an excellent (not to mention free) source of technical data about your website’s organic visibility and performance. To maximize its usefulness, it’s important to properly set up your website in Search Console by adding all versions of your domain as properties that you manage.

Let’s assume the domain name of the website is

The first step here is to add the following to Google Search Console as a new property:

Make sure to verify the domain name, preferably using a TXT record or CNAME record in the DNS.

Google Search Console “Add a property” form

Next, add the www version as a property (even if it redirects to the non-www version):

In this case, both URLs above redirect to the HTTPS version of the website (learn how to move your website to HTTPS). That means that these variations will also need to be added as two separate properties in Google Search Console:

Note that you must specifically include “https://” when adding these two properties, which you did not have to do with the HTTP version. If no protocol is defined when adding a property to Google Search Console, it defaults to the HTTP protocol.

At this point, the following URLs have been added to Google Search Console as properties, even if the HTTP versions do not serve any content and redirect fully to the HTTPS versions:

To summarize, for any website on its own domain and being served only from the HTTP protocol, at a bare minimum, two versions of your domain need to be present in Google Search Console. For any website on its own domain and being served from the HTTPS protocol, at a bare minimum, four versions of your domain need to be present in Google Search Console.
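To illustrate, for a hypothetical HTTPS site at example.com, the bare-minimum four properties would be:

```text
http://example.com/
http://www.example.com/
https://example.com/
https://www.example.com/
```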

Getting more data from Google Search Console

If the website has any subdomains, or language/country/content or otherwise specific subdirectories, it will be beneficial to add these properties separately in Google Search Console. Doing so will allow you to get more data, set geographic targets or define specific site maps. (Note that this also includes subdomains that are not meant for indexing, such as staging servers, or have no data available, such as an admin login subdomain.)

Let’s assume the website has two additional subdomains (blog and news), two language subdirectories (DE and EN), two content-specific subdirectories (product and amp) and a staging subdomain, all on the HTTPS variation. This means that, in addition to the URLs above, the following additional URLs also need to be added as new properties in Google Search Console:

To be safe, it is best to also add the following as new properties in Google Search Console:

And to be extra, extra safe, the following (www versions) can also be added as new properties to Google Search Console:


Now, Google Search Console can provide additional specific and detailed search-related data, such as Search Analytics data, for each subdomain and subdirectory.

Making the data more useful

If all the URL variations mentioned above are added as properties, there are now 24 separate properties in Google Search Console, each one providing specific and valuable insights on how Google “sees” the website. So it may be hard to know which property to check for ranking data in Google Search Console Search Analytics. Luckily, Google added a new feature called “property sets” last year.

Google Search Console “Add a property set” screen

Property sets combine the data from several properties and present the data in a unified view. To create a property set, go to the Google Search Console and click “Create a set.” Next, give the set a name and add previously verified Google Search Console properties to the set.

There are various property sets you may find useful in terms of data segmentation; below are my suggestions for grouping properties together.

All data property set

To get one source for all ranking data in Google Search Console for the website, add all 24 properties to one property set (highly recommended):

English language data

To narrow the ranking data in Google Search Console for the English part of the website, group the following into another property set:

German language data

To narrow the ranking data in Google Search Console for the German part of the website, group the following into another property set:

News/blog data

To narrow the ranking data in Google Search Console for the news/blog part of the website, group the following into a property set:

Product page data

To narrow the ranking data in Google Search Console for just the product part of the website, group the following into a property set:

Keep track of staging URLs

To make sure none of the staging URLs are indexed, add the following to another property set:

Continue creating new property sets in Google Search Console if it makes sense for your business. Keep in mind that property sets do not show data retroactively — they only start collecting data from the moment they are created, and it can take several days before the first data becomes available for the user. Thus, creating a property set sooner rather than later is in the site owner’s best interest.

Just a start…

A great Google Search Console setup is just the first step towards maximizing your SEO efforts. It is an important one, though.

The sample data provided by Google can help improve your rankings, help Googlebot better understand the website and provide invaluable and otherwise unavailable insights into your organic visibility and performance. It is also possible to download sample data through an API, integrate the data with internal data and bring your SEO to the next level.

Adding the right properties to Google Search Console is a priority because you never know when your business may need the data. And it’s free — so what are you waiting for?


Don't follow the leader: Avoid these 5 common e-commerce SEO mistakes

Competitive research is an important part of any SEO program — after all, it’s a zero-sum game that we’re playing. However, there is often a tendency for companies to become fixated on what dominant competitors in the marketplace are doing. The assumption is that because they’re getting the most SEO traffic, they must be doing things right.

In many industries, it is true that the high SEO traffic sites really are doing an exceptional job. But in the world of e-commerce, this is often not the case. Many of the highest traffic e-commerce sites are doing things that are objectively bad for SEO. It turns out that a strong backlink profile and other prominent brand signals can make up for an awful lot of mistakes.

Getting things right for enterprise e-commerce SEO can be really challenging. You often have to merge very different sources of product data into a single system and make everything work. There are more pages than you could ever curate manually. And in most cases, SEO is not the largest driver of traffic and may have to take a back seat to other priorities. It’s tough.

Eventually, people are going to figure out how to address the issues that make e-commerce SEO so cumbersome and hard to scale. Sites that apply these new techniques will gain an advantage, and then everyone will race to copy them and this article will be outdated. I believe that point is still some years away.

Until then, there are opportunities to gain an SEO advantage over most of the major e-commerce players by simply avoiding their most common mistakes.

1. Faceted navigation disasters

When faceted navigation isn’t controlled, you can often end up with more category URLs, by orders of magnitude, than total products on the site. Clearly, something is wrong with that picture.

On the other end of the spectrum, you have companies that are so scared of creating too many pages that they noindex their entire faceted navigation or canonicalize everything to the root page. Doing this can prevent indexation of potentially valuable pages (usually ones with one or two attributes selected), and it still may not fix the crawl problems that their navigation poses.

There is a middle path, and few try to walk it. While fixing your filtered navigation is an entire topic of its own, a good starting point is to consider using dynamic AJAX sorting for thin attributes, so users can refine the product set without changing the URL.

2. Slow site speed

There is plenty of readily available data about the impact of site speed on conversion and bounce rates. A couple of seconds can make an enormous difference in user engagement. So why do retailers seem to be competing to load the most external scripts? The retail market is underinvested in speed and overinvested in lag-inducing features that often have marginal benefits and may even serve to overwhelm the user.

My experience is that the SEO benefits of page speed are not yet as substantial as the conversion optimization impact. With all the information Google is sharing about the user benefits of fast, streamlined sites, it’s only a matter of time until speed becomes a more prominent ranking factor. However, when UX impact is also taken into account, there’s no reason to wait.

3. Reliance on XML sitemaps for indexation

If there is one simple piece of SEO wisdom that every enterprise manager should remember, it’s that each page needs to have a crawl path to have a chance to rank for competitive queries. There are many unique and exciting ways (from the perspective of someone who is paid to fix websites) that sites are able to orphan a large percentage of their product or other important pages from their browsable architecture.

Possibilities include broken pagination, nearly infinite URL spaces and any form of link-generation logic that doesn’t systematically ensure that every product has a crawl path.

If you’re unsure about whether you have an adequate crawl path, crawl your site and see if all your important pages are showing up. If you are not able to do a complete crawl of your site, that means either that you have too many pages or you need a better crawler. If you have a very large site, you likely need help with both. And if you’re spending lots of time looking at the sitemaps dashboard in Google Search Console, wondering why your pages aren’t being indexed, it’s most likely because they don’t have a good crawl path.
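One quick way to surface orphaned pages is to compare the URLs you expect to be indexable (for example, from your XML sitemap export) against the URLs your crawler actually reached. A minimal Python sketch, with the sitemap and crawl exports assumed to be loaded elsewhere:

```python
# Compare the set of URLs you expect to be indexable (e.g. from the XML
# sitemap) against the set of URLs discovered by crawling the site.
def find_orphans(sitemap_urls, crawled_urls):
    """Return sitemap URLs that no crawl path reaches."""
    return sorted(set(sitemap_urls) - set(crawled_urls))

sitemap = {
    "https://www.example.com/product-a",
    "https://www.example.com/product-b",
}
crawl = {"https://www.example.com/product-a"}
print(find_orphans(sitemap, crawl))
```

Any URL this returns is in your sitemap but unreachable by crawling, which is exactly the situation that keeps pages out of competitive rankings.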

4. Using tags completely wrong

Many e-commerce sites have conflicting tagging signals on their category pages and tagging structures that are suboptimal. I have seen at least two Fortune 500 owned e-commerce sites that were making all the pages on their site canonical to the home page, which is equivalent to telling Google that none of the other pages on the site have anything else to offer. I have seen more sites than I can count on one hand do their pagination tagging incorrectly, which is surprising, because it’s a plainly spelled-out specification.

I suspect that Google’s assumed omniscience sometimes hinders the careful adoption of standards. People think they can get it close enough and Google will figure it out. Sometimes they do. Sometimes they don’t. And sometimes, even if Google can figure out all your mistakes, it’s still a loss — especially if they are having to crawl extra pages to do so.

5. Ugly URLs

Here’s a thought experiment. Let’s set SEO aside for a moment and look at two different URLs that we might see in a SERP:

Site 1:

Site 2:
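As a purely hypothetical illustration of the contrast:

```text
Site 1: https://www.example-store.com/womens-trail-running-shoes
Site 2: https://www.example-retailer.com/webapp/wcs/stores/servlet/ProductDisplay?catalogId=10551&storeId=10151&productId=3074457345
```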

Which site seems more likely to make things easy for their shoppers, and which site seems more likely to make things easy for themselves? What kind of conscious and unconscious assumptions might a shopper make about each?

My experience is that short, clear and concise URLs tend to rank well and get more traffic than long, parameter-laden addresses. There are some correlational studies that support this observation. I don’t consider any of them definitive — but I know what I would choose to do for my site.
