Driving browsers: #1 The Browser [by Jeremy Davis]

This is post 1 of 3 in a series titled Driving browsers

Driving browsers: #1 The Browser

Driving browsers: #2 The state machine

Driving browsers: #3 The states

I bumped into an issue recently where I needed to write some code to scrape a bit of HTML. The usual .NET approach of using an HttpClient didn't work here, because the website in question used client-side JavaScript to generate its markup at runtime. An HttpClient only sees the HTML the server sends, not what the scripts produce afterwards, so I needed a different approach to fetch the resulting HTML. A while back I'd written some code to grab images of rendered HTML using the Chromium DevTools APIs, and I figured I could play a similar game here...

Now, some of you are probably thinking "why not just use Selenium or Playwright for this?" and you're right - I absolutely could. But this was one of those cases where I was writing the code for me (not for work), so the learning experience of putting it together was more interesting than reusing someone else's code. But YMMV...

So I set to work trying to make some useful (and perhaps reusable?) code for driving a browser and fetching the resulting markup using the Chromium APIs: something that could work in a console app and be more flexible than the WPF control I'd used in my previous work. (I wanted this as a console app because the tool that would use this approach needed to run from a scheduled task.)

Find yourself a browser

I wanted this code to work on a couple of machines: one with Chrome installed and one with Edge. They're both based on the same Chromium engine, so this isn't too tricky. A factory class that could create the right browser object for a particular machine seemed a sensible approach.

To make use of the Chromium developer tools APIs you need to be able to run the browser, so we need code that can find and execute the browser. The registry can tell us where a browser is, and we can make use of that data to work out what we can run. A base class for this might look like:

using Microsoft.Win32;

public abstract class BrowserDetector : IBrowserDetector
{
    public string Name { get; init; }
    public string AppFolder { get; init; } = string.Empty;
    public string AppExecutable { get; init; } = string.Empty;
    public bool Installed { get; init; } = false;

    protected BrowserDetector(string name, string regKey)
    {
        Name = name;

        // An App Paths key holds the executable's full path as its default value
        // and the folder it lives in as "Path". Dispose the key once we're done.
        using var k = Registry.LocalMachine.OpenSubKey(regKey);

        if (k == null)
        {
            // Key not present: treat this browser as not installed
            return;
        }

        var exec = k.GetValue(string.Empty) as string;
        var path = k.GetValue("Path") as string;

        if (string.IsNullOrEmpty(exec) || string.IsNullOrEmpty(path))
        {
            return;
        }

        Installed = true;
        AppExecutable = exec;
        AppFolder = path;
    }
}

Given a registry key, it can decide if that browser is installed, and extract the appropriate folder and executable to use later. So for Edge, the concrete class might look like:

public class EdgeBrowser : BrowserDetector
{
    public const string RegKey = @"SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\msedge.exe";

    public EdgeBrowser() : base("Edge", RegKey)
    {
    }
}

Chrome just needs a different registry key and name. A factory class can then take a set of these detectors, find the first one that's installed, and create a Browser object from it:

public static class BrowserFactory
{
    public static readonly IBrowserDetector[] Browsers = new IBrowserDetector[] { new ChromeBrowser(), new EdgeBrowser() };

    public static Browser Create()
    {
        foreach(var browser in Browsers)
        {
            if(browser.Installed)
            {
                return new Browser(browser);
            }
        }

        throw new ApplicationException("No browser detected - unable to create an instance.");
    }
}

This tries Chrome first and falls back to Edge, returning an object that describes the required browser - or throws if neither of those exist.
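The post doesn't show IBrowserDetector itself. Judging by the members BrowserDetector exposes and the ones Browser reads, it presumably looks something like the interface below (an assumption; the real one in the GitHub code may differ). Pulling the "first installed detector wins" rule out into its own method also makes it easy to check without touching a real registry. FakeDetector and BrowserSelection are hypothetical names, used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of IBrowserDetector, inferred from the members BrowserDetector exposes.
public interface IBrowserDetector
{
    string Name { get; }
    string AppFolder { get; }
    string AppExecutable { get; }
    bool Installed { get; }
}

// Hypothetical stand-in for a registry-backed detector, handy for testing.
public sealed record FakeDetector(string Name, string AppFolder, string AppExecutable, bool Installed)
    : IBrowserDetector;

public static class BrowserSelection
{
    // Returns the first installed detector in priority (array) order,
    // or null if none are installed. Mirrors the loop in BrowserFactory.Create.
    public static IBrowserDetector? PickFirstInstalled(IEnumerable<IBrowserDetector> candidates)
        => candidates.FirstOrDefault(d => d.Installed);
}
```

BrowserFactory.Create could then wrap whatever this returns in a Browser, or throw when it comes back null. Because the array order is the priority order, putting Edge first would be all it takes to prefer it on machines that have both.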

Connect to the browser

The Browser object mentioned above is going to act as a wrapper around Chromium for us. To talk to the DevTools API we need to spawn an instance with some specific command-line parameters, and then talk to it over WebSockets. So the first step is to take the data from the BrowserDetector that matched above, and get ready to spawn a browser. That involves this object holding a few bits of data:

using System.Diagnostics;
using System.Net.Http;
using System.Text.Json;

public class Browser : IDisposable
{
    private static readonly HttpClient _client = new();

    public string Name { get; init; }
    public string Folder { get; init; }
    public string Executable { get; init; }
    public int DebuggerPort { get; set; } = 9222;
    public string UserFolder { get; set; }
    // The URL and profile folder are quoted so that paths containing spaces survive
    public string Arguments { get; set; } = "--new-window \"{0}\" --remote-debugging-port={1} --user-data-dir=\"{2}\"";

    private Process? _process = null;

    public Browser(IBrowserDetector detector)
    {
        Name = detector.Name;
        Folder = detector.AppFolder;
        Executable = detector.AppExecutable;

        UserFolder = Path.Combine(Path.GetTempPath(), $"Browser-{detector.Name}");
    }
}

The constructor takes the matched detector and stores the info we got from the registry. It also computes a path for a temporary profile folder. If you don't give this to Chrome it will use the profile of the current user - which may or may not work for your scenario. I chose to keep it separate. Note the need to have a browser-type-specific folder here. I did some testing with both browsers on one machine and got odd problems if they didn't use separate temp folders. I guess their common engine means they save some similar data, but not similar enough to avoid problems sharing...

We also need to specify the port that we'll be connecting to the API on later, and a template for the command line parameters to be sent when starting an instance of Chromium.

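And finally, we're going to need to make a data request via HTTP, so there's a shared static HttpClient here (we'll get to the WebSockets later). We'll also need to control the behaviour of some JSON serialisation, but those settings live in a shared Json.Options rather than in the Browser class itself.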

So then we need to execute the browser:

    public void Open(string initialUrl)
    {
        if (_process != null)
        {
            throw new ApplicationException("Browser process is already running.");
        }

        if (!Directory.Exists(UserFolder))
        {
            Directory.CreateDirectory(UserFolder);
        }

        var psi = new ProcessStartInfo()
        {
            WorkingDirectory = Folder,
            FileName = Executable,
            Arguments = string.Format(Arguments, initialUrl, DebuggerPort, UserFolder)
        };

        _process = Process.Start(psi);
    }

That does a test to see if we already have a browser process running, creates the temp folder if needed and then executes the browser with the right command line parameters. The properties discussed above are used to start the process and format the command line. The initialUrl is the page the browser will open to first up, but I'll get to navigating the browser about later on.
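As a quick illustration of what that formatting produces, here's a small sketch. It makes one change from the post's original template, which is my assumption rather than something the post covers: Path.GetTempPath() usually sits under the user's profile folder, and that can contain spaces, so the sketch wraps the URL and the profile folder in quotes to keep each one a single argument. ChromiumArgs and its Build method are hypothetical names.

```csharp
using System.Globalization;

public static class ChromiumArgs
{
    // Same switches as the Browser.Arguments template, with the URL and profile
    // folder quoted. Note: the folder must not end in a backslash, because on
    // Windows a backslash directly before a quote escapes that quote.
    public const string Template =
        "--new-window \"{0}\" --remote-debugging-port={1} --user-data-dir=\"{2}\"";

    // Invariant culture keeps the port number formatting predictable.
    public static string Build(string initialUrl, int debuggerPort, string userFolder)
        => string.Format(CultureInfo.InvariantCulture, Template, initialUrl, debuggerPort, userFolder);
}
```

On .NET Core 2.1 and later, another option is to fill ProcessStartInfo.ArgumentList with one entry per argument and let the runtime take care of the quoting.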

Once that process is started we should have an instance of Chromium, listening on the debugging port for connections.
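In practice there's a small race here. Process.Start returns as soon as the process has been launched, which can be before Chromium has opened its debugging port, so calling Connect (below) immediately after Open may fail with an HttpRequestException. One way to cope, and this is my suggestion rather than something from the post, is to retry a few times with a short pause. Startup.RetryAsync is a hypothetical helper:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class Startup
{
    // Calls 'attempt' until it succeeds or 'maxAttempts' is used up, waiting 'delay'
    // between tries. Only HttpRequestException (e.g. "connection refused" while the
    // browser is still starting) is retried; on the last attempt it propagates.
    public static async Task<T> RetryAsync<T>(Func<Task<T>> attempt, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (var i = 1; ; i++)
        {
            try
            {
                return await attempt();
            }
            catch (HttpRequestException) when (i < maxAttempts)
            {
                await Task.Delay(delay);
            }
        }
    }
}
```

With the post's classes that might be used as `await Startup.RetryAsync(() => browser.Connect(), 10, TimeSpan.FromMilliseconds(500))`. Filtering on HttpRequestException means other failures, such as a JSON parsing error, still surface immediately rather than being retried.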

So the next task is to connect to that port:

    public async Task<BrowserConnection> Connect()
    {
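        var result = await _client.GetAsync($"http://localhost:{DebuggerPort}/json");
        // Surface HTTP failures as HTTP errors, rather than as confusing JSON errors later
        result.EnsureSuccessStatusCode();
        var content = await result.Content.ReadAsStringAsync();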
        var sessions = JsonSerializer.Deserialize<BrowserConnection[]>(content, Json.Options);

        if (sessions == null || sessions.Length < 1)
        {
            throw new ApplicationException("Did not get a valid debug session back from json endpoint");
        }

        return sessions[0];
    }

This uses the HttpClient to make a request to the /json endpoint exposed on the browser's debugging port. That returns a blob of JSON describing the available debugging sessions we can connect to. The data returned can be deserialised (Json.Options here holds some standard settings for the serialiser, shared across all the classes) using this structure:

public class BrowserConnection
{
    public string? Id { get; init; }
    public string? Title { get; init; }
    public string? Url { get; init; }
    public string? WebSocketDebuggerUrl { get; init; }
}

And that gets us the data we'll need in a bit for executing commands against this session. (Note that the JSON returned here includes a few other properties, such as the page's favicon URL, a description, and the URL for the DevTools UI. But those aren't relevant to this process, so I ignored them.)
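To make the deserialisation concrete, here's a sketch that parses a /json response and picks out a target. A few assumptions: Json.Options isn't shown in the post, so I've guessed that case-insensitive property matching is the part that matters (the endpoint uses camelCase names like webSocketDebuggerUrl); and I've added the type field, because the list can contain targets other than pages, such as service workers, so the first entry isn't guaranteed to be the tab you opened. DebugTarget and DebugTargets are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Mirrors BrowserConnection from the post, plus the "type" field.
public class DebugTarget
{
    public string? Id { get; init; }
    public string? Type { get; init; }
    public string? Title { get; init; }
    public string? Url { get; init; }
    public string? WebSocketDebuggerUrl { get; init; }
}

public static class DebugTargets
{
    // Assumed settings: match the endpoint's camelCase names case-insensitively.
    private static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    // Returns the first "page" target in a /json response body, or null if there isn't one.
    public static DebugTarget? FirstPage(string json)
    {
        var targets = JsonSerializer.Deserialize<DebugTarget[]>(json, Options) ?? Array.Empty<DebugTarget>();
        return targets.FirstOrDefault(t => t.Type == "page");
    }
}
```

System.Text.Json skips JSON properties that have no matching C# property by default, which is why the post's BrowserConnection can simply leave out the favicon URL, description and DevTools UI URL.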

At this point we'll be able to issue commands to the debugger. But at some point we'll be finished with the browser process we're controlling here. So the Browser class is IDisposable and when it's disposed it will tidy up the process:

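    public void Dispose()
    {
        if (_process != null)
        {
            // Chromium starts several child processes, so kill the whole tree
            // (unless the browser has already been closed).
            if (!_process.HasExited)
            {
                _process.Kill(entireProcessTree: true);
            }

            _process.Dispose();
            _process = null;
        }
    }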

So a using block gets us convenient lifetime management of the browser we're talking to.

Part one wrap up

That's enough code to get a browser up and running. In the next part of this series we'll make a start on how to control the browser once it's started.

If you can't wait, the code to go with this series is available on GitHub.


Because sometimes reinventing the wheel is fun!
