This is post 1 of 4 in a series titled Custom Sitemap Files
- Custom Sitemap files – Part One
- Custom Sitemap Files – Part Two
- Custom Sitemap Files – Part Three
- Custom Sitemap Files – Part Four
Sitemap files are a requirement for most websites these days. They help SEO by ensuring that search engines index the files and images that you think are most important, including ones they might not otherwise find. Whilst there are assorted pre-built add-ons for Sitecore that can help with this, that's no fun. It's much more fun to build your own...
Real work is getting in the way of blogging time at the moment, so I'm going to break up my investigations into this into three posts. This week I'll look at some requirements, core configuration and the overall algorithm. The next part will look at the core code. And the final one will look at adding image data to the sitemap files.
When I started looking at this, I had the following requirements to consider:
A common place to put configuration for extension modules like this is under /sitecore/System/Modules – so we'll create a "Sitemap" folder under here and make a note of its ID for later. Within that we're going to want to create items for Sitemap Index files, or for Sitemap files that don't have an index. And they'll need templates.
The Sitemap Index File only needs configuration for its file name and the set of Sitemap files it's going to refer to. Filename is easy – that's just a single line text field. So we can create a simple template for SitemapIndexFile:
The Sitemap Files it's going to contain can be its children – so its insert options need to allow creating an instance of the SitemapFile template:
The SitemapFile template is a bit more complex. It also needs a file name, but it requires some other settings too. When the publish operation runs you don't know what the "context" database is, so we need to record which database the Sitemap file should be generated from. In this case I've made that a Droplist field that points to a set of database names. Most of the time you'd set this to "Web" (since that's the database which holds all the published data), but for testing purposes you might want to change it to "Master".
Next, the template allows editors to specify the root item that sitemap processing will start from. This allows configuring multiple sitemaps for different site roots, or for subsections of a website. The last two fields allow the editor to select a set of language versions that will be included in the output, and a set of templates that will be included. In both cases, selecting nothing will be treated as "include all".
So, with these templates we can set up basic configuration for a couple of sitemap files:
Here "CustomSitemap" is a sitemap file on its own, whereas "TestIndexFile" is a sitemap index that contains one sitemap file.
The other thing that we need to create for configuration is a template to extend your web page items. We need to be able to specify the bits of configuration needed for individual pages:
A checkbox field SitemapInclude lets editors specify whether this item should be considered for sitemap processing or not. The SitemapPriority field lets them specify a relative value of how important this page is on your site, as per the sitemap schema. Finally, editors can choose a Droplist value for the expected change frequency of this page, as per the schema. Note the use of the "Shared" flag for these fields – the requirements I was thinking about needed these settings to be shared between all language versions of each page – but that might not be true for other sites and you might want to think about whether that's applicable in any work you do based on this approach.
This template can then be added to the template for the pages on your site:
Once all that config is defined, the basic behaviour for the code is as follows:
Load the configuration items under /sitecore/System/Modules/Sitemap from the Master database. For each of the Sitemap files we need to process:
Nothing particularly difficult there – but it's quite a few things to do.
One area that can be done in a variety of ways is how to write the data for the sitemap files out to disk in the right schema. My first attempt at this code made use of the XML serialisation infrastructure in the .NET Framework. It worked, but it was quite fiddly and required a lot of mucking about to get the namespaces correct. It was also quite complicated to deal with "empty" attributes of an item in a sitemap file, so I ended up reverting to some custom code to write the file using the XDocument classes from LINQ to XML, as this works more simply.
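As a rough illustration of the XDocument approach, here is a minimal sketch. It is not the code from this series: the SitemapEntry class, the XDocumentSitemapBuilder name and its properties are all hypothetical, invented for this example. The only fixed detail is the namespace URI, which comes from the sitemaps.org protocol. It shows how "empty" values can be dealt with by passing null, which XElement silently ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical data holder for one page in the sitemap (not from the original post).
public sealed class SitemapEntry
{
    public string Location { get; set; }
    public DateTime? LastModified { get; set; }
    public string ChangeFrequency { get; set; }
    public decimal? Priority { get; set; }
}

public static class XDocumentSitemapBuilder
{
    // Namespace defined by the sitemaps.org protocol.
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static XDocument Build(IEnumerable<SitemapEntry> entries)
    {
        var urlset = new XElement(Ns + "urlset",
            entries.Select(e => new XElement(Ns + "url",
                new XElement(Ns + "loc", e.Location),
                // Null content is skipped by XElement, so optional values
                // simply produce no element rather than an empty one.
                e.LastModified.HasValue
                    ? new XElement(Ns + "lastmod",
                        e.LastModified.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
                    : null,
                string.IsNullOrEmpty(e.ChangeFrequency)
                    ? null
                    : new XElement(Ns + "changefreq", e.ChangeFrequency),
                e.Priority.HasValue
                    ? new XElement(Ns + "priority",
                        e.Priority.Value.ToString("0.0", CultureInfo.InvariantCulture))
                    : null)));

        return new XDocument(new XDeclaration("1.0", "utf-8", null), urlset);
    }
}
```

Calling Build(...).Save(path) would then write the file to disk. Because every element is created with the same XNamespace, the namespace is declared once on the root element and the child elements inherit it.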
However, for large sites this approach is not particularly scalable, because it builds the whole document as in-memory objects before anything is written. It would be better to write the data out directly via an XmlTextWriter. If you need to generate sitemaps for big sites it would be sensible to consider this approach as an alternative.
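A streaming version might look something like the sketch below. Note that it swaps in XmlWriter.Create() rather than constructing an XmlTextWriter directly, since that is the factory Microsoft recommends for new code; the idea is the same. The StreamedUrl class and the StreamingSitemapWriter name are hypothetical, made up for this example, and it only writes the loc and lastmod elements to keep things short.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical data holder for one page (not from the original post).
public sealed class StreamedUrl
{
    public string Location { get; set; }
    public DateTime? LastModified { get; set; }
}

public static class StreamingSitemapWriter
{
    // Namespace defined by the sitemaps.org protocol.
    private const string Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static void Write(Stream output, IEnumerable<StreamedUrl> urls)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            Encoding = new UTF8Encoding(false) // UTF-8 without a byte order mark
        };

        using (var writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("urlset", Ns);

            // Each entry is written as soon as it is enumerated, so only
            // one entry needs to be held in memory at a time.
            foreach (var url in urls)
            {
                writer.WriteStartElement("url", Ns);
                writer.WriteElementString("loc", Ns, url.Location);
                if (url.LastModified.HasValue)
                {
                    writer.WriteElementString("lastmod", Ns,
                        url.LastModified.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
                }
                writer.WriteEndElement(); // url
            }

            writer.WriteEndElement(); // urlset
            writer.WriteEndDocument();
        }
    }
}
```

If the urls sequence is produced lazily (for example by an iterator that walks the content tree), memory use stays roughly flat however many pages the site has. The trade-off is that you have to remember to skip empty values yourself, as the lastmod check does here.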
So with all those configuration options available we can start on some code. Usual rules apply – I'm ignoring error handling and patterns like Glass for simplicity. Production code would include those things.
First of all, getting something to happen at publish time is simple – it just requires adding a handler to the publish:end event. To do that you must first define a class that will perform the action for this event. That class must have a method with the following signature:
    using System;

    namespace Testing.Sitemaps
    {
        public class Publisher
        {
            // Called by Sitecore when the configured event is raised
            public void Publish(object sender, EventArgs args)
            {
                // Your code here...
            }
        }
    }
And that can be configured with a simple config patch:
    <?xml version="1.0" encoding="utf-8" ?>
    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
      <sitecore>
        <events>
          <event name="publish:end">
            <handler method="Publish" type="Testing.Sitemaps.Publisher" />
          </event>
        </events>
      </sitecore>
    </configuration>
This will cause the Publish() method to be called every time someone triggers a publish on this server. But if your website has multiple web servers, you will need to enable the "Scalability Settings" configuration that ships disabled with Sitecore by default, and then trigger your Publish() method on the publish:end:remote event as well. This event is fired on all the other servers in your cluster at publish time.
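For the multi-server case, the patch file could be extended along these lines. This is a sketch that assumes you want the same Testing.Sitemaps.Publisher handler to run for both events, and that the scalability settings have already been enabled. I'd expect the event arguments passed to the handler to differ between the local and remote events, so check that if your handler inspects them.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <events>
      <!-- Raised on the server where the publish was triggered -->
      <event name="publish:end">
        <handler method="Publish" type="Testing.Sitemaps.Publisher" />
      </event>
      <!-- Raised on the other servers in the cluster -->
      <event name="publish:end:remote">
        <handler method="Publish" type="Testing.Sitemaps.Publisher" />
      </event>
    </events>
  </sitecore>
</configuration>
```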
So now that our code will get triggered, we need to be able to configure a sitemap.
And we'll look at the code for that in part two...