There are certain "rules of programming" that I keep hearing about in my career. One that came up in an interesting work debate recently was "you should never use regular expressions to parse HTML". Don't get me wrong - there can be a lot of useful knowledge wrapped up in these rules, but should we always follow them to the letter? I think it's an interesting question...
I bumped into an interesting redirect-loop issue with a Sitecore instance sitting behind Azure Front Door recently. It's not a product I know a great deal about, so this seemed worth writing down in case I come across it again, or others bump into the same challenge. Turns out it wasn't a Sitecore-specific issue, but it's definitely something which could affect other Sitecore sites...
There was a lot of interesting discussion at SUGCON NA and the MVP Summit towards the back end of last year. I've got piles of notes about things that caught my attention over the course of those events. But out of all the sessions, one specific thing stuck out to me as a vision of our future as Sitecore developers. And it's a topic that's come up a few times in my conversations with people at work and in the general community. So it seemed like it was worth writing about...
The second idea on my "little things I'd meant to add to this blog for a while" list was reading time estimates. Like the reading progress indicator from before, this shouldn't be tricky, and in this case I wanted to write it down in case anyone else working with Statiq was interested in achieving something similar on their site.
I'd had the idea that I should add a "reading progress" indicator to my blog posts for a while now, and I finally got around to adding it the other weekend. What I'd assumed would be a five-minute job had an interesting issue I thought I should document for others...
Time for the final part of my series on controlling a web browser. With code to load a browser, and the overarching State Machine to control it, this part finishes off with the code for some states to load a page and extract its markup. Plus a few conclusions...
Continuing from my previous post about firing up a browser in order to automate it, this post moves on to the overall pattern for how the browser can be controlled.
I bumped into an issue recently where I needed to write some code to scrape a bit of HTML. The usual .NET approach of using an HttpClient didn't work here - the web site in question made use of some client-side JavaScript to generate markup at runtime. So I needed a different approach to fetch the resulting HTML. A while back I'd written some code to grab images of rendered HTML using the Chromium DevTools APIs, and I figured I could play a similar game here...
I wasted a few hours recently when I did something which seemed entirely reasonable with Rule-Based Config in Sitecore and it did not work the way I thought it would. Here's an explanation of what I did and what happened as a result, so you can avoid making the same mistake as me...
There aren't that many places where RSS gets used these days (Shame! It's still good!), but that doesn't stop the occasional requirement for it coming up in projects. Recently I was having some discussions about how a client's site could offer RSS for their content which included custom UTM codes in the feed links. That's not too tricky to achieve with Sitecore, so here's an example of what you might do.