I noticed the other week that the sitemap file my blog was generating included the URLs, but none of the other metadata that a sitemap entry can carry. To be honest, I'm not sure if search engines pay much attention to this these days, but since the schema for the files includes other options I decided to see if I could add them.
I spent a bit of time trawling through the code for Statiq to see how it generates sitemaps. There's a pipeline defined for generating them which does a few steps. It gathers documents and filters them based on the config for whether a sitemap should be generated and what pages should be included. And then it runs a module called GenerateSitemap which does the work to format the data for writing to the output. So that seemed like a good starting point.
Looking through this code, the ExecuteContextAsync() method is generating a list of strings where each one is a line in the resulting sitemap file. It goes through all the filtered documents passed in and calls GetSitemapItemAsync() to fetch an item, and then AddSitemapItemContent() to format it.
Looking at the formatting code, it clearly knows about the idea of the extra fields I was after:

    private void AddSitemapItemContent(List<string> content, string formattedLocation, SitemapItem sitemapItem, HashSet<string> locations)
    {
        if (!formattedLocation.IsNullOrWhiteSpace() && locations.Add(formattedLocation))
        {
            content.Add($"<url><loc>{formattedLocation}</loc>");

            if (sitemapItem.LastModUtc.HasValue)
            {
                content.Add($"<lastmod>{sitemapItem.LastModUtc.Value.ToString("yyyy-MM-ddTHH:mm:ssZ")}</lastmod>");
            }

            if (sitemapItem.ChangeFrequency.HasValue)
            {
                content.Add($"<changefreq>{ChangeFrequencies[(int)sitemapItem.ChangeFrequency.Value]}</changefreq>");
            }

            if (sitemapItem.Priority.HasValue)
            {
                content.Add($"<priority>{sitemapItem.Priority.Value}</priority>");
            }

            content.Add("</url>");
        }
    }

That implies that the SitemapItem objects being passed to this method don't have values for the other fields. So why might that be? Well the document is getting transformed into a SitemapItem in GetSitemapItemAsync() so I looked at that for a bit, and stuck some breakpoints into the code to see what was happening.
The beginning of that method tries to transform the document into a SitemapItem:

    // Try to get a SitemapItem
    object delegateResult = await _sitemapItemOrLocation.GetValueAsync(input, context);
    SitemapItem sitemapItem = delegateResult as SitemapItem
        ?? new SitemapItem((delegateResult as string) ?? context.GetLink(input));

But when I looked at this in the debugger as it processed my site, I could see that the call to GetValueAsync() here always returned null, so the new SitemapItem was always constructed by just passing in the URL for the page. And that seems to explain why the other fields are null for the formatting code above.
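That fallback chain is easy to try out in isolation. The sketch below is not Statiq's code: SitemapItemSketch and FallbackSketch are hypothetical stand-ins, made up here to mirror the shape of those two lines. It shows the three cases: an item is passed through, a string becomes the location, and null falls back to the default link with every optional field left unset.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for Statiq's SitemapItem, reduced to what the sketch needs.
public sealed class SitemapItemSketch
{
    public SitemapItemSketch(string location) => Location = location;

    public string Location { get; }

    // Optional field: stays null unless something sets it explicitly.
    public double? Priority { get; set; }
}

public static class FallbackSketch
{
    // Same shape as the lines quoted above: use the delegate's result if it is
    // already an item, else treat it as a URL string, else use the default link.
    public static SitemapItemSketch Resolve(object? delegateResult, string defaultLink) =>
        delegateResult as SitemapItemSketch
            ?? new SitemapItemSketch((delegateResult as string) ?? defaultLink);
}
```

With a null result, Resolve(null, "/default") produces an item whose Location is "/default" and whose Priority has no value, which matches what the debugger was showing.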
So to fix this, I need to change the way the module handles this transform from document to SitemapItem.
I spent more time looking at the code, and considered an approach which replaced GenerateSitemap by copying it and changing its behaviour when the result of _sitemapItemOrLocation.GetValueAsync() was null to generate the right data. After poking around with that code for a while I realised this would work, but wasn't the simplest change possible. That class didn't look too friendly to inheriting and modifying due to some private fields and non-virtual methods. So it was going to require copying it into my project to modify rather than inheriting. And that made me think this was probably the wrong approach...
The existing pipeline creates an instance of GenerateSitemap using a parameterless constructor. That means a default Config<object> handler gets assigned to _sitemapItemOrLocation, and that default doesn't produce a SitemapItem with all the properties I want. So I realised I could change the code more simply by constructing GenerateSitemap with a Config<object> that did do the required mapping.
There are factory methods which help create instances of that Config<object>, so it's fairly easy to create a helper class which can create this mapping:

    public static class SitemapLoaderConfig
    {
        public const string Frequency = "SitemapFrequency";
        public const string Priority = "SitemapPriority";

        public static Config<object> FetchSitemapMapper()
        {
            return Config.FromDocument<object>(doc =>
            {
                var sitemapItem = new SitemapItem(doc.GetLink());

                DateTime dt = doc.GetDateTime(WebKeys.Published);
                if (!string.IsNullOrWhiteSpace(doc.GetString(WebKeys.Updated)))
                {
                    dt = doc.GetDateTime(WebKeys.Updated);
                }
                sitemapItem.LastModUtc = dt;

                var pr = doc.Get<double>(Priority, -1);
                if (pr == -1)
                {
                    pr = doc.IsPost() ? 0.8 : 0.5;
                }
                sitemapItem.Priority = pr;

                var freq = doc.Get<string>(Frequency, "Weekly");
                sitemapItem.ChangeFrequency = Enum.Parse<SitemapChangeFrequency>(freq);

                return sitemapItem;
            });
        }
    }

So calling FetchSitemapMapper() here will create a mapper by calling Config.FromDocument<object>() and give it a function to transform the data. This creates the SitemapItem with the page URL as the default code does, but then adds on the remaining properties. It takes data from the document where possible, and falls back to defaults if they're not provided: the published date unless an updated date is set, a priority of 0.8 for posts and 0.5 for other pages, and a weekly change frequency. And these values can all be set in the metadata for the document.
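One thing to watch with the frequency value: Enum.Parse<T>(string) is case-sensitive and throws an ArgumentException for a value it doesn't recognise, so a typo in a page's metadata would stop the build. If that's a concern, a more forgiving parse could look something like the sketch below. ChangeFrequencySketch is a hypothetical stand-in for Statiq's SitemapChangeFrequency enum (its member list is an assumption based on the sitemap schema), and FrequencyParser is a made-up helper name.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for Statiq's SitemapChangeFrequency enum.
public enum ChangeFrequencySketch { Always, Hourly, Daily, Weekly, Monthly, Yearly, Never }

public static class FrequencyParser
{
    // Returns the parsed frequency, or the fallback when the metadata value
    // is missing or isn't a known name.
    public static ChangeFrequencySketch ParseOrDefault(
        string? value,
        ChangeFrequencySketch fallback = ChangeFrequencySketch.Weekly)
    {
        // ignoreCase: true accepts "weekly" as well as "Weekly".
        // IsDefined rejects numeric strings such as "42", which TryParse
        // would otherwise accept as an out-of-range enum value.
        if (Enum.TryParse<ChangeFrequencySketch>(value, ignoreCase: true, out var parsed)
            && Enum.IsDefined(typeof(ChangeFrequencySketch), parsed))
        {
            return parsed;
        }

        return fallback;
    }
}
```

The trade-off is that a misspelt value silently becomes the default rather than failing loudly, so which behaviour is better depends on whether you'd rather catch mistakes at build time.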
The second part of the job is to plumb this new behaviour into the pipeline for sitemaps. Statiq makes this fairly easy. The bootstrap code which sets up the generator has a ModifyPipeline() method that allows you to make changes. So I hacked up a quick extension method to make the change I needed:

    public static class EnhancedSitemapExtensions
    {
        public static Bootstrapper AddEnhancedSitemap(this Bootstrapper bs)
        {
            return bs.ModifyPipeline("Sitemap", p =>
            {
                var executeIf = p.PostProcessModules.First() as ExecuteIf;
                var ifCondition = executeIf.First() as IfCondition;
                ifCondition.ReplaceLast<Module>(new GenerateSitemap(SitemapLoaderConfig.FetchSitemapMapper()));
            });
        }
    }

Looking at the pipeline data with the debugger showed me that the root of the Sitemap pipeline has an ExecuteIf object, which then contains an IfCondition that contains the modules for doing the filtering and generation. (Extracting these should really have some null checks, in case things change in the future - but I'll leave that as an exercise to the reader...) The final module in that IfCondition is the GenerateSitemap one - so that's the one to replace with the new constructor behaviour. And it passes in the result of the helper to generate the mapper above.
And that helper can be used in the main bootstrap call:

    public static async Task<int> Main(string[] args) =>
        await Bootstrapper
            .Factory
            .CreateWeb(args)
            .AddEnhancedSitemap()
            .RunAsync();

And with that in place the site can be regenerated to test the new behaviour.
With those changes in place, the sitemap file data goes from the original:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://blog.jermdavis.dev/posts/2016/a-drink-from-the-gulp-firehose</loc>
      </url>
      ... snip ...
    </urlset>

to the updated:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://blog.jermdavis.dev/posts/2016/a-drink-from-the-gulp-firehose</loc>
        <lastmod>2016-08-25T00:00:00Z</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
      ... snip ...
    </urlset>

Which is just what I was after, and now I can control the last-modified, frequency and priority values per-page as I need.
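For a page that needs something other than the defaults, the front matter might look something like this. It's only a sketch: the title, dates and values are made up for illustration, and it assumes that WebKeys.Published and WebKeys.Updated resolve to keys named Published and Updated, and that Monthly is a valid SitemapChangeFrequency name. The SitemapPriority and SitemapFrequency keys are the two constants defined in the helper class above.

```yaml
---
# Hypothetical page: every value here is an illustrative assumption.
Title: An example post
Published: 2021-03-01
# When present, this is used for <lastmod> in place of Published.
Updated: 2021-06-15
# Overrides the 0.8 (post) / 0.5 (other page) default.
SitemapPriority: 0.9
# Must match an enum member name exactly, since Enum.Parse is case-sensitive.
SitemapFrequency: Monthly
---
```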
Success!