Time for the final part of my series on controlling a web browser. With code to load a browser, and the overarching State Machine to control it, this part finishes off with the code for some states to load a page and extract its markup. Plus a few conclusions...
Continuing from my previous post about firing up a browser in order to automate it, this post moves on to the overall pattern for how the browser can be controlled.
I bumped into an issue recently where I needed to write some code to scrape a bit of HTML. The usual .NET approach of using an HttpClient didn't work here - the website in question made use of some client-side JavaScript to generate markup at runtime. So I needed a different approach to fetch the resulting HTML. A while back I'd written some code to grab images of rendered HTML using the Chromium DevTools APIs, and I figured I could play a similar game here...