I bumped into an issue recently where I needed to write some code to scrape a bit of HTML. The usual .NET approach of using an HttpClient didn't work here, because the website in question used client-side JavaScript to generate markup at runtime. So I needed a different approach to fetch the resulting HTML. A while back I'd written some code to grab images of rendered HTML using the Chromium DevTools APIs, and I figured I could play a similar game here...
I recently wasted a few hours after doing something that seemed entirely reasonable with Rule-Based Config in Sitecore, only to find it did not work the way I expected. Here's an explanation of what I did and what happened as a result, so you can avoid making the same mistake...