Screaming Frog for URL Auto Updates
Screaming Frog has added a number of enhancements that let you schedule crawls and export XML sitemaps, which you can import into HREFLang Builder to set up Auto Updates. It is not yet a perfect solution, but it is an effective way to build or augment your source files. There are a few things to consider, and some setup suggestions to follow, before you get started.
Single Crawl or Multiple Crawls
Once you set up the format for your country and language files in HREFLang Builder, it does not matter whether the source URLs are in a single file or in one file per country.
Which option you choose depends on how your site is set up, how much memory your computer has, and how you want to organize your data.
If you use a .com with country/language folders, it is easiest to let Screaming Frog crawl all the versions and build a single file. For larger sites, sites on ccTLDs, or computers without much memory, it makes more sense to create an individual file for each country.
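If you crawl a .com with country/language folders into a single file, the URLs can still be separated by market afterwards. The sketch below groups a URL list by its first path segment; the folder names (`de-de`, `fr-fr`) and the function name are hypothetical, and it assumes every market lives in a first-level folder.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_locale_folder(urls, locales):
    """Split a single-crawl URL list into per-market lists.

    `locales` is the set of first-level folder names used for the
    country/language versions (hypothetical values such as "de-de").
    URLs outside those folders, e.g. the root .com version, are
    grouped under "default".
    """
    groups = defaultdict(list)
    for url in urls:
        # "/de-de/produkte/" -> ["de-de", "produkte"]
        segments = [s for s in urlparse(url).path.split("/") if s]
        first = segments[0].lower() if segments else ""
        key = first if first in locales else "default"
        groups[key].append(url)
    return dict(groups)


if __name__ == "__main__":
    crawl = [
        "https://www.example.com/",
        "https://www.example.com/de-de/produkte/",
        "https://www.example.com/fr-fr/produits/",
    ]
    for market, urls in group_by_locale_folder(crawl, {"de-de", "fr-fr"}).items():
        print(market, urls)
```

Grouping on the path keeps one crawl as the single source of truth while still producing the per-country lists that a ccTLD setup would give you directly.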
Screaming Frog Scheduler
With the introduction of its scheduling function, Screaming Frog became a relatively economical and reliable way to ensure that the URLs you provide to HREFLang Builder are valid.
Pro Tip: We set up ours with an individual folder for each country. Because you cannot yet name the output (every file is sitemap.xml), you need separate folders so that one market's file does not overwrite another's. This setup also lets us export not only an XML sitemap but also the master file of URLs we use for diagnostic work. See the example below.
Setup Folder Structures
Create a folder for each country/language version of the site.
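If you manage many markets, the folders can be created with a short script. This is a minimal sketch: the root folder `sf-exports` and the market codes are hypothetical placeholders for your own naming scheme.

```python
from pathlib import Path

# Hypothetical root folder and market codes; replace with your own.
ROOT = Path("sf-exports")
MARKETS = ["us-en", "gb-en", "de-de", "fr-fr"]


def create_market_folders(root, markets):
    """Create one output folder per country/language version.

    Screaming Frog names every XML sitemap export sitemap.xml, so a
    separate folder per market keeps exports from overwriting each other.
    """
    created = []
    for market in markets:
        folder = root / market
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in create_market_folders(ROOT, MARKETS):
        print(folder)
```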
Configuring Screaming Frog Crawls
Screaming Frog has many spider settings, and we suggest you read its help guides in full to understand everything it can do. You can exclude directories or parameters, and you can set unique user agents, crawl depth, and speed limits.
The following are some of the quick settings that we use to get the cleanest output.
Spider Configuration
You can use your normal or the default configuration settings. For the best result, we suggest the following:
Crawl Canonicals – With this option, Screaming Frog will follow the canonical URL and check that it returns a 200 status and is Indexable. If you also set the “Respect Canonical” option on the Advanced tab, Screaming Frog will exclude the non-canonical URL version from the export.
We want to submit only the final canonical version of each URL, so this setting minimizes the number of non-canonical versions collected. Sites sometimes add tracking or feature parameters to URLs while setting the canonical to the root page; this setting ensures that the desired version is the one added to the source file.
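To illustrate the idea behind Respect Canonical, the sketch below reads the `<link rel="canonical">` tag from a page and returns the URL that should go into the source file. It is not Screaming Frog's implementation; `url_to_submit` is a hypothetical helper, and it does not check that the canonical target returns 200 or is indexable, which Screaming Frog does when it crawls the canonical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels and attributes.get("href"):
            self.canonical = attributes["href"]


def url_to_submit(page_url, html):
    """Return the canonical URL if the page declares one, else the page URL."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return page_url
    # Resolve relative canonicals such as "/shoes/" against the page URL.
    return urljoin(page_url, parser.canonical)


if __name__ == "__main__":
    page = "https://www.example.com/shoes/?utm_source=newsletter"
    html = '<head><link rel="canonical" href="https://www.example.com/shoes/" /></head>'
    print(url_to_submit(page, html))  # https://www.example.com/shoes/
```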
The Advanced tab is where you will configure the most settings, so make sure the following options are checked. Together they keep Screaming Frog (SF) focused on valid pages.
- Always Follow Redirects
- Always Follow Canonicals
- Respect Noindex
- Respect Canonical
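The combined effect of these four options on what ends up in your export can be modeled as a simple filter. This is a conceptual sketch, not how Screaming Frog works internally; the `CrawledPage` record and its field names are hypothetical stand-ins for a crawl export.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawledPage:
    """One row of a crawl export (hypothetical field names)."""
    url: str
    status_code: int
    noindex: bool = False
    canonical: Optional[str] = None


def submittable_urls(pages: List[CrawledPage]) -> List[str]:
    """Keep only URLs that return 200, are indexable and are self-canonical."""
    keep = []
    for page in pages:
        # Redirects and errors are never submitted. With Always Follow
        # Redirects on, the redirect target is crawled as its own row.
        if page.status_code != 200:
            continue
        # Respect Noindex: noindexed pages are left out.
        if page.noindex:
            continue
        # Respect Canonical: a URL that canonicalises elsewhere is dropped;
        # its target (reached via Always Follow Canonicals) is kept instead.
        if page.canonical and page.canonical != page.url:
            continue
        keep.append(page.url)
    # Remove duplicates while preserving crawl order.
    return list(dict.fromkeys(keep))
```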
If your URLs carry parameters, session IDs, or other appended items that are not handled by a correct canonical tag, we suggest using Screaming Frog's powerful Exclude function to remove them.
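The Exclude function takes regular expressions, so it can help to test candidate patterns against sample URLs before pasting them in. The sketch below assumes each pattern must match the whole URL (`re.fullmatch`), which is why the examples are wrapped in `.*`; regex dialects and matching rules can differ slightly, so confirm the behavior in the Screaming Frog user guide. The patterns themselves are illustrative, and `.*\?.*` in particular drops every parameterized URL, which may be broader than you want.

```python
import re

# Illustrative exclude patterns; adapt them to your own URL scheme.
EXCLUDE_PATTERNS = [
    r".*\?.*",            # any URL with a query string
    r".*;jsessionid=.*",  # session IDs appended with a semicolon
    r".*/print/.*",       # a hypothetical printer-friendly folder
]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if any pattern matches the entire URL."""
    return any(re.fullmatch(p, url) for p in patterns)


if __name__ == "__main__":
    for url in [
        "https://www.example.com/shoes/",
        "https://www.example.com/shoes/?sessionid=123",
        "https://www.example.com/cart;jsessionid=ABC123",
    ]:
        print(url, "excluded" if is_excluded(url) else "kept")
```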
We also suggest you import your XML sitemaps into the crawl so they can be validated as well.
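To see which sitemap URLs the crawl did not confirm, you can read the `<loc>` entries and compare them with the valid URLs from your crawl. This sketch uses the standard sitemaps.org namespace; `not_validated` is a hypothetical helper, and the list of crawled URLs is assumed to come from your own export.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value in a sitemap or sitemap index file."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def not_validated(sitemap_list, crawled_valid):
    """Sitemap URLs that the crawl did not confirm as valid."""
    valid = set(crawled_valid)
    return [u for u in sitemap_list if u not in valid]
```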
Note: We assume you have followed generally accepted SEO best practices for HTTP status codes, redirects, and canonical tags, which allows Screaming Frog to filter out problem URLs. If you have not, expect some URLs with errors in your source files.
Configuring XML Site Maps
Click Sitemaps in the header menu bar and open the top option (XML Sitemap). We suggest leaving all the options unchecked except 2xx, so that only valid URLs are added.
My wish-list item is an option to include only “Indexable” URLs, which would provide a double check.
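For reference, a sitemap built with the 2xx-only rule simply lists each successful URL in a `<loc>` element. This is a minimal sketch of that output, not Screaming Frog's exporter; the `(url, status_code)` input is a hypothetical stand-in for a crawl export.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML containing only URLs with a 2xx status.

    `pages` is a list of (url, status_code) tuples.
    """
    urlset = ET.Element("urlset", xmlns=NS)
    for url, status in pages:
        if 200 <= status < 300:  # the 2xx-only rule
            loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
            loc.text = url  # ElementTree escapes "&" as "&amp;"
    return ET.tostring(urlset, encoding="unicode")
```

Note that a 2xx filter alone still lets noindexed or canonicalised pages through, which is why an "Indexable" option would be a useful second check.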
Saving your Configuration
To use the scheduler, you must save your configuration. You can set up a master job for XML sitemap creation or use unique filters for each country. Once you have saved config files, you can select them when you set up scheduled crawls.
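The in-app scheduler is the approach described here. If you prefer to trigger crawls from your own scheduler (cron or Windows Task Scheduler), Screaming Frog also has a command-line interface. The sketch below only builds one command per market; the executable name, the `.seospiderconfig` paths, and the market list are assumptions, so check the flags against the CLI help for your installed version before running it with `dry_run=False`.

```python
import subprocess
from pathlib import Path

# Assumption: Linux executable name; on Windows it is ScreamingFrogSEOSpiderCli.exe.
SF_CLI = "screamingfrogseospider"

# Hypothetical markets: start URL and saved configuration file for each.
MARKETS = {
    "de-de": ("https://www.example.com/de-de/", Path("configs/de-de.seospiderconfig")),
    "fr-fr": ("https://www.example.com/fr-fr/", Path("configs/fr-fr.seospiderconfig")),
}


def build_crawl_command(start_url, config_file, output_folder):
    """Build a headless crawl command that writes an XML sitemap."""
    return [
        SF_CLI,
        "--crawl", start_url,
        "--headless",          # run without the user interface
        "--config", str(config_file),
        "--output-folder", str(output_folder),
        "--overwrite",         # replace the previous run's files
        "--create-sitemap",    # write an XML sitemap after the crawl
    ]


def run_all(export_root=Path("sf-exports"), dry_run=True):
    """Build (and optionally run) one crawl per market folder."""
    commands = []
    for market, (start_url, config_file) in MARKETS.items():
        output_folder = export_root / market
        cmd = build_crawl_command(start_url, config_file, output_folder)
        commands.append(cmd)
        if not dry_run:
            output_folder.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_all():
        print(" ".join(cmd))
```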
Save each XML sitemap file with a unique name that identifies the country and/or language it represents. Once your first set of files is saved in Dropbox, you can follow this process to add XML sitemaps from Dropbox into HREFLang Builder.
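Because every export is named sitemap.xml, a small script can copy each market's file into your Dropbox folder under a unique name. This sketch assumes one subfolder per market under the export root (for example `sf-exports/de-de/sitemap.xml`) and writes files such as `sitemap-de-de.xml`; both folder paths and the naming pattern are hypothetical.

```python
import shutil
from pathlib import Path


def collect_sitemaps(export_root, dropbox_folder):
    """Copy each market's sitemap.xml to a uniquely named file.

    The market code is taken from the name of the folder that
    contains sitemap.xml.
    """
    dropbox_folder.mkdir(parents=True, exist_ok=True)
    copied = []
    for sitemap in sorted(export_root.glob("*/sitemap.xml")):
        market = sitemap.parent.name  # e.g. "de-de"
        target = dropbox_folder / f"sitemap-{market}.xml"
        shutil.copy2(sitemap, target)  # overwrite the previous copy
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Hypothetical paths; point these at your export root and Dropbox folder.
    for path in collect_sitemaps(Path("sf-exports"), Path.home() / "Dropbox" / "hreflang"):
        print(path)
```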