Time for the final part of my series on controlling a web browser. With code to load a browser, and the overarching State Machine to control it, this part finishes off with the code for some states to load a page and extract its markup. Plus a few conclusions...
Continuing from my previous post about firing up a browser in order to automate it, this post moves on to the overall pattern for how the browser can be controlled.
I bumped into an issue recently where I needed to write some code to scrape a bit of HTML. The usual .NET approach of using an HttpClient didn't work here - the website in question used client-side JavaScript to generate markup at runtime. So I needed a different approach to fetch the resulting HTML. A while back I'd written some code to grab images of rendered HTML using the Chromium DevTools APIs, and I figured I could play a similar game here...
While I was battling the jetlag that hit me pretty hard after getting back from Sitecore Symposium, this issue popped up in my bug queue last week. QA reported that a certain component on a test page was not allowing one field to be edited. It had worked in the past, but the behaviour suddenly changed so that one field no longer got the "you can edit this" overlay in Experience Editor. It took me longer than it should have to work out why...
My work sometimes involves picking up projects that were started by other developers or agencies and making changes or enhancements. Sometimes the approaches used by the original developers can make these enhancements harder than they need to be. The HTML, CSS and JavaScript of a recent project I worked on caused some issues that I thought were worth calling out, to try and help developers do better work in the future.
Quick (and not Sitecore-specific) one today, as I've got a very busy week in the office and it's eating into my time for blog posts. To take a break from a few weeks of writing about navigation patterns, here's an idea about something I've found a surprising number of developers don't know about: "Protocol agnostic" or "Schemeless" URLs.