Jeremy Davis
Sitecore, C# and web development
Article printed from: https://blog.jermdavis.dev/posts/2023/driving-browsers-2-statemachine

Driving browsers: #2 The state machine

Because sometimes reinventing the wheel is fun!

Published 06 November 2023
This is post 2 of 3 in a series titled Driving browsers

Continuing from my previous post about firing up a browser in order to automate it, this post moves on to the overall pattern for how the browser can be controlled.

Sending commands and receiving responses

The next step is to connect to the WebSocket at the WebSocketDebuggerUrl retrieved earlier, send a command and listen for responses. But when I sat down and played with this initially, one of the things I noticed was that the responses which come back are not always the ones you're expecting, or in the order you're expecting them. The API sends you lots of data: some of it is the specific thing you're after, and some may not be relevant to the particular call you just made.
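
As a rough illustration of that mixture: in the Chrome DevTools Protocol, the reply to a command carries the same id the command was sent with, while events pushed by the browser have a method but no id. This sketch uses System.Text.Json's JsonDocument to sort incoming messages into those buckets. The MessageKinds.Classify() helper and the sample messages are hypothetical, not part of this series' code.

```csharp
using System.Text.Json;

// Hypothetical helper, for illustration only: works out what kind of
// DevTools Protocol message a piece of incoming JSON is.
public static class MessageKinds
{
    public static string Classify(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        // Replies to a command echo back the "id" that the command was sent with.
        // A failed command has an "error" object where "result" would otherwise be.
        if (root.TryGetProperty("id", out var id))
        {
            return root.TryGetProperty("error", out _)
                ? $"error reply to command {id.GetInt32()}"
                : $"reply to command {id.GetInt32()}";
        }

        // Events are sent by the browser unprompted: they have a "method" but no "id"
        if (root.TryGetProperty("method", out var method))
        {
            return $"event {method.GetString()}";
        }

        return "unknown";
    }
}
```

Given {"method":"Page.loadEventFired","params":{"timestamp":1.5}} this returns "event Page.loadEventFired", while {"id":1,"result":{}} gives "reply to command 1". A State can use this sort of check to skip everything except the reply it's waiting for.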

So the logic for driving the browser needs a way to send a command, and then handle responses appropriately until the correct response state occurs. Now you could do this with a big old pile of methods and if statements, but that will quickly become unmaintainable in any reasonably complex "conversation" with the browser. So what's a better approach?

Well, design patterns to the rescue - the State Machine pattern fits a lot better here. That works roughly like this:

sequenceDiagram
  participant SM as State Machine
  participant B as Browser
  participant CS as Current State

  SM->>CS: Initialise the current State
  activate CS
  CS->>SM: Create a command to send
  SM->>B: Send the command
  activate B
  loop Process until State sees right data
    B->>SM: Send responses back
    SM->>CS: Process incoming response
  end
  deactivate B
  CS->>SM: Specify next state
  deactivate CS

There's an overall "State Machine" object which holds a connection to a browser and a current State object that does the processing for the point we've reached in our flow. The State Machine initialises the current State, which gives back a command to send to the browser. The browser sends back whatever responses the API generates, and the State Machine passes each of these to the current State to be processed. Eventually the State sees the response it wants, extracts the right data from it, and finishes by telling the State Machine what the next State should be. That whole process continues until the State Machine has no more States to process.

So what does that look like in code?

Well we need a base type for a State first off:

public abstract class State
{
    public abstract Task Enter(StateMachine owner);
    public abstract Task Update(StateMachine owner, DebuggerResult data);
    public abstract Task Leave(StateMachine owner);
}

States do three things. When the State Machine first changes its current state it calls Enter() on the new State. That will deal with any setup and issuing the right command. Then every time the State Machine receives data from the WebSocket connection to the API it will call Update() and pass in the data received. And finally when the State Machine is told to change to another State it will call Leave() in case the outgoing state needs to tidy anything up. And since WebSocket operations are all async these methods return Task so they can be implemented as async later.

There will be a concrete State type for each of the different operations we need the debugger to perform. The State Machine assumes that none of these States store any internal data themselves, so each one can expose a single static instance to be shared. (That saves allocating new instances of the State classes every time we change state, which is a performance benefit in complex or repeated flows.)
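
The NullState used as a starting value and end marker later in this post isn't listed here, so this is a guess at how a field-free State with one shared instance could look, assuming the abstract State class above. The empty StateMachine and DebuggerResult classes are placeholders purely so the sketch compiles on its own; the real types come from this series' code.

```csharp
using System.Threading.Tasks;

// Placeholders so this compiles standalone; not the real implementations
public class StateMachine { }
public class DebuggerResult { }

public abstract class State
{
    public abstract Task Enter(StateMachine owner);
    public abstract Task Update(StateMachine owner, DebuggerResult data);
    public abstract Task Leave(StateMachine owner);
}

// A state with no fields can be shared: one static instance serves every machine
public sealed class NullState : State
{
    public static readonly NullState Instance = new();

    // Private constructor stops anyone creating extra copies
    private NullState() { }

    // Nothing to set up, process or tidy away
    public override Task Enter(StateMachine owner) => Task.CompletedTask;
    public override Task Update(StateMachine owner, DebuggerResult data) => Task.CompletedTask;
    public override Task Leave(StateMachine owner) => Task.CompletedTask;
}
```

Because NullState.Instance is the only instance, a check like state == NullState.Instance is a simple reference comparison.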

So the State Machine itself needs a class. The core of that is defining some fields:

public class StateMachine
{
    private State _currentState = NullState.Instance;
    private readonly BrowserConnection _connection;
    private readonly ClientWebSocket _ws = new();
    private readonly CancellationTokenSource _ct = new();
    private bool _running = true;

    public Dictionary<string, object> State { get; init; } = new Dictionary<string, object>();

    public StateMachine(State initialState, BrowserConnection connection)
    {
        _currentState = initialState;
        _connection = connection;
    }
}

Creating a StateMachine requires an initial State and the BrowserConnection data retrieved earlier.

Internally it needs storage for those, plus a ClientWebSocket object, a CancellationTokenSource to allow aborting communications with the web browser, and a _running flag that keeps the receive loop going.

Once the StateMachine exists it can be started:

    public async Task Start()
    {
        ArgumentException.ThrowIfNullOrEmpty(_connection.WebSocketDebuggerUrl);

        await _ws.ConnectAsync(new Uri(_connection.WebSocketDebuggerUrl), _ct.Token);

        var _ = Receive().
            ContinueWith(t => Console.WriteLine($">RECEIVE EXCEPTION: {t.Exception?.Message}"),
            TaskContinuationOptions.OnlyOnFaulted);

        await _currentState.Enter(this);
    }

That connects the WebSocket (allowing for cancellation later if required), starts listening and then calls Enter() on the initial state.

The call to Receive() is one of those slightly odd async constructs that's worth a bit more explanation. We'll get to Receive() itself in a bit - but because it works with the WebSocket it has to be async to allow awaiting data. But we don't actually want to wait for it to complete here - no data's going to arrive until after we send a command. So we really want the code to continue straight to the Enter() call.

That can be achieved by calling Receive() without awaiting it. But that leads to two further issues. First, you'll get a compiler warning about the ignored Task return value, so assigning it to _ tells the compiler "please ignore and discard this value". And secondly, it makes error conditions harder to detect: any exception would be raised in the background and silently lost. So the call to ContinueWith() tells the runtime that when the Task completes, if it's in an error state, it should do something with the error.
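
Here's that fire-and-forget shape pulled out into a small standalone sketch so the behaviour is easy to see in isolation. The BackgroundTasks.LogIfFaulted() name is hypothetical. It differs slightly from the code above in two ways: it passes TaskScheduler.Default explicitly (common guidance, as in the CA2008 analyzer rule, because ContinueWith() otherwise uses whatever TaskScheduler.Current is), and it reports the InnerException, since t.Exception is an AggregateException wrapping the original error.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundTasks
{
    // Hypothetical helper: attach an error handler to a task that isn't going to be awaited.
    // The continuation only runs if 'work' faults; if it succeeds (or is cancelled) the
    // returned continuation task ends up cancelled instead.
    public static Task LogIfFaulted(Task work, Action<Exception> onError) =>
        work.ContinueWith(
            t => onError(t.Exception!.InnerException ?? t.Exception),
            CancellationToken.None,
            TaskContinuationOptions.OnlyOnFaulted,
            TaskScheduler.Default);
}
```

In the post's style the returned Task would simply be discarded, e.g. _ = BackgroundTasks.LogIfFaulted(Receive(), ex => Console.WriteLine(ex.Message));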

For simplicity this is just displaying the error - but real code would probably do something more sensible here...

The code for Receive() needs to gather the data sent back by the browser, and for each message it gets, send that information back to the current State for processing. That involves a couple of fields for buffering the incoming information:

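    // Raw bytes from each ReceiveAsync() call, plus room for the characters they decode to
    private readonly byte[] _byteBuffer = new byte[512];
    private readonly char[] _charBuffer = new char[1024];
    // A Decoder keeps state between calls, so a multi-byte UTF-8 character that is
    // split across two reads still decodes correctly
    private readonly Decoder _decoder = Encoding.UTF8.GetDecoder();
    private readonly StringBuilder _messageBuffer = new();

    private async Task Receive()
    {
        while (_running)
        {
            _messageBuffer.Clear();
            var done = false;

            // One message can arrive in several chunks, so keep reading until EndOfMessage
            while (!done)
            {
                var result = await _ws.ReceiveAsync(_byteBuffer, _ct.Token);
                // Flush the decoder on the final chunk so no partial character is left pending
                var charCount = _decoder.GetChars(_byteBuffer, 0, result.Count, _charBuffer, 0, result.EndOfMessage);
                _messageBuffer.Append(_charBuffer, 0, charCount);
                done = result.EndOfMessage;
            }

            var json = _messageBuffer.ToString();

            try
            {
                var data = JsonSerializer.Deserialize<DebuggerResult>(json, Json.Options) ?? throw new ApplicationException("Unable to deserialise incoming data");

                if (data.Error != null)
                {
                    var _ = Error(data).
                        ContinueWith(t => Console.WriteLine($">ERROR HANDLER EXCEPTION: {t.Exception?.Message}"),
                        TaskContinuationOptions.OnlyOnFaulted);
                }
                else
                {
                    var _ = Update(data).
                        ContinueWith(t => Console.WriteLine($">UPDATE HANDLER EXCEPTION: {t.Exception?.Message}"),
                        TaskContinuationOptions.OnlyOnFaulted);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($">Error: {ex.Message}");
            }
        }
    }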

As long as the state machine is running, this will sit in a loop. On each iteration it clears out its _messageBuffer and then receives bytes from the WebSocket. A message can exceed the buffer length, so it reads chunks of bytes, decodes them into text, and appends that to the _messageBuffer until the EndOfMessage flag is set.
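
Because each 512-byte read is an arbitrary slice of the message, a multi-byte UTF-8 character (anything outside ASCII, which can easily turn up in page markup) may be split across two reads. This standalone sketch contrasts decoding each chunk on its own with using a single stateful System.Text.Decoder for the whole message. The ChunkDecoding class and its method names are hypothetical.

```csharp
using System.Text;

public static class ChunkDecoding
{
    // Decodes every chunk on its own, like calling Encoding.UTF8.GetString() per read
    public static string PerChunk(byte[][] chunks)
    {
        var text = new StringBuilder();
        foreach (var chunk in chunks)
        {
            text.Append(Encoding.UTF8.GetString(chunk));
        }
        return text.ToString();
    }

    // Uses one Decoder for the whole message; it holds on to any incomplete
    // character at the end of a chunk and finishes it with the next chunk's bytes
    public static string WithDecoder(byte[][] chunks)
    {
        var decoder = Encoding.UTF8.GetDecoder();
        var text = new StringBuilder();
        for (var i = 0; i < chunks.Length; i++)
        {
            var chunk = chunks[i];
            var isLast = i == chunks.Length - 1;   // flush only on the final chunk

            // GetCharCount() doesn't change the decoder's state, so it can size the buffer first
            var chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length, isLast)];
            var count = decoder.GetChars(chunk, 0, chunk.Length, chars, 0, isLast);
            text.Append(chars, 0, count);
        }
        return text.ToString();
    }
}
```

Splitting the UTF-8 bytes of "café" after the fourth byte separates the two bytes that make up "é": PerChunk() then produces replacement characters in its place, while WithDecoder() returns "café".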

Each time it gets a complete message, it deserialises it (using those default serialiser options again). If the received data says there was a browser-side error, it triggers a handler for that:

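    public Task Error(DebuggerResult data)
    {
        // Nothing here is awaited, so return a completed Task rather than marking the method async
        Console.WriteLine($">>Error: {data.Error?.ToJsonString(Json.Options)}>>");
        return Task.CompletedTask;
    }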

(Again, you'd probably want better error handling here in real life)
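
As one small step in that direction: the DevTools Protocol reports a failed command with an error object holding a numeric code and a message, along the lines of {"code":-32601,"message":"..."}. This sketch assumes DebuggerResult.Error is a System.Text.Json JsonNode (the ToJsonString() call above suggests so, but the type isn't shown in this post), and the DebuggerErrors.Describe() helper is hypothetical.

```csharp
using System.Globalization;
using System.Text.Json.Nodes;

public static class DebuggerErrors
{
    // Hypothetical helper: turns a DevTools "error" object into a readable one-liner.
    // Assumes the error has arrived as a JsonNode, e.g. {"code":-32601,"message":"..."}
    public static string Describe(JsonNode? error)
    {
        if (error is null)
        {
            return "unknown error";
        }

        var code = error["code"]?.GetValue<int>();
        var message = error["message"]?.GetValue<string>() ?? "(no message)";

        // Invariant culture so negative codes always format with a plain '-'
        return code is null
            ? message
            : $"{code.Value.ToString(CultureInfo.InvariantCulture)}: {message}";
    }
}
```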

If the browser response is valid, then it passes the data to the Update() handler. In both cases these handlers are started without being awaited, and if they raise errors those will be reported (the same pattern as used above for Receive() itself).

That Update() call when the data is valid does two things:

    public async Task Update(DebuggerResult data)
    {
        await _currentState.Update(this, data);

        if (_newState != null)
        {
            await _currentState.Leave(this);
            _currentState = _newState;
            await _currentState.Enter(this);

            _newState = null;
        }
    }

First it passes the data on to the Update() method of the current State so it can be processed. Secondly it checks whether a request for a new State was recorded. If so, it leaves the current State and enters the new one, so that the new State is used for future processing. (This is how States can define a flow between them based on what data they see.)
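
To see that enter/update/leave cycle in isolation, here's a cut-down, browser-free sketch of the same idea. Strings stand in for DebuggerResult messages, a Log list stands in for real commands, and messages are fed in one at a time rather than by a background receive loop. All the type names (DemoState, DemoMachine, FetchState, Done) are hypothetical. FetchState holds two read-only configuration fields, so unlike the post's States it's created per use rather than shared.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, browser-free stand-ins for State and StateMachine
public abstract class DemoState
{
    public abstract Task Enter(DemoMachine owner);
    public abstract Task Update(DemoMachine owner, string message);

    // Most states have nothing to tidy up, so Leave() defaults to doing nothing
    public virtual Task Leave(DemoMachine owner) => Task.CompletedTask;
}

public class DemoMachine
{
    private DemoState _current;
    private DemoState? _next;

    public List<string> Log { get; } = new();
    public bool Finished => _current is Done;

    public DemoMachine(DemoState initial) => _current = initial;

    // Enter the first state, which "sends" its command
    public Task Start() => _current.Enter(this);

    // States call this once they've seen the data they were waiting for
    public void TransitionTo(DemoState next) => _next = next;

    // Stand-in for a message arriving from the browser: let the current state
    // process it, then switch states if a transition was requested
    public async Task Receive(string message)
    {
        await _current.Update(this, message);

        if (_next != null)
        {
            await _current.Leave(this);
            _current = _next;
            _next = null;
            await _current.Enter(this);
        }
    }
}

// "Sends" a request on entry, then ignores everything except the matching reply
public class FetchState : DemoState
{
    private readonly string _what;
    private readonly DemoState _then;

    public FetchState(string what, DemoState then)
    {
        _what = what;
        _then = then;
    }

    public override Task Enter(DemoMachine owner)
    {
        owner.Log.Add($"send: get {_what}");
        return Task.CompletedTask;
    }

    public override Task Update(DemoMachine owner, string message)
    {
        if (message.StartsWith(_what + ":", StringComparison.Ordinal))
        {
            owner.Log.Add($"got {message}");
            owner.TransitionTo(_then);
        }
        return Task.CompletedTask;
    }
}

// Terminal state with no data, so one shared instance is enough
public class Done : DemoState
{
    public static readonly Done Instance = new();

    public override Task Enter(DemoMachine owner)
    {
        owner.Log.Add("done");
        return Task.CompletedTask;
    }

    public override Task Update(DemoMachine owner, string message) => Task.CompletedTask;
}
```

Starting a machine with new FetchState("title", new FetchState("body", Done.Instance)) and feeding it "event: noise", "title: Hello" and then "body: <p>Hi</p>" walks through both FetchStates into Done, with the unrelated first message simply ignored.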

Individual states can signal the need for a new State by calling TransitionToNewState() passing the instance of the next State to use:

    private State? _newState = null;
    private readonly AutoResetEvent _waitFlag = new(initialState: false);

    public void TransitionToNewState(State state)
    {
        _newState = state;
        if (state == NullState.Instance)
        {
            _waitFlag.Set();
        }
    }

If the new State is the NullState, this indicates the process is complete and there are no more States to process. At this point we need a convenient way to flag to the program that we're done and there's no further reason to wait for any background operations. For that it sets an AutoResetEvent to indicate that the "waiting" code can now proceed and tidy up.
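
If AutoResetEvent is unfamiliar, this standalone sketch (the CompletionSignal name is hypothetical) shows the behaviour being relied on here: one thread blocks in WaitOne() until another calls Set(). It also shows where the "auto reset" in the name comes from: once a waiting thread has been released, the event goes back to unsignalled by itself.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CompletionSignal
{
    // Returns whether the first wait was signalled, and whether a second,
    // immediate wait found the event still signalled afterwards
    public static async Task<(bool First, bool Second)> Demo()
    {
        using var done = new AutoResetEvent(initialState: false);

        // Background work that signals completion when it finishes,
        // much as TransitionToNewState() does when it sees NullState
        var worker = Task.Run(async () =>
        {
            await Task.Delay(50);
            done.Set();
        });

        // Blocks this thread until Set() is called (with a generous timeout as a safety net)
        var first = done.WaitOne(TimeSpan.FromSeconds(5));

        // "Auto reset": releasing one waiter put the event back to unsignalled
        var second = done.WaitOne(0);

        // Make sure the worker has finished before the event is disposed
        await worker;
        return (first, second);
    }
}
```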

Otherwise, the new state takes over for the next call to Update().

The overall StateMachine needs one more method: one that can wait for the overall set of States to complete, as mentioned before. This method gets called by the code using the StateMachine so that it pauses until everything is done. It waits for the AutoResetEvent mentioned above, and once that gets set, it cancels the CancellationTokenSource, then aborts and disposes of the WebSocket to clean up:

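    public async Task Wait()
    {
        // WaitOne() blocks its thread, so run it on the thread pool and await that,
        // rather than blocking the caller's thread inside an async method
        await Task.Run(() => _waitFlag.WaitOne());
        _waitFlag.Dispose();

        _running = false;
        _ct.Cancel();
        _ws.Abort();
        _ws.Dispose();
    }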

Between the Start() and Wait() methods here we now have some basic flow control for the code which owns the StateMachine.

The code for a State needs access to one more method - one which can issue a command to the browser:

    public async Task SendCommand<T>(T command, int id = 0) where T : IDebuggerCommandProperties
    {
        var cmd = new DebuggerCommand<T>(command) { Id = id };
        var data = JsonSerializer.Serialize<DebuggerCommand<T>>(cmd, Json.Options);
        var bytes = Encoding.UTF8.GetBytes(data);
        await _ws.SendAsync(bytes, WebSocketMessageType.Text, true, _ct.Token);
    }

When called, this takes the parameters for a command, puts a wrapper around it, serialises the data, and sends it to the browser over the websocket. The interface for the properties is simple:

public interface IDebuggerCommandProperties
{
    string CommandName { get; }
}

All it does is define the name for the command we're sending. All the browser commands have text names, as laid out in Chromium's documentation. Individual command parameter classes can implement this interface and add the specific fields they need. Then the wrapper class goes around these properties:

public class DebuggerCommand<T> where T : IDebuggerCommandProperties
{
    public int Id { get; set; }
    public string Method { get; private set; }
    public T Params { get; private set; }

    public DebuggerCommand(T parameters)
    {
        Method = parameters.CommandName;
        Params = parameters;
    }
}

This allows SendCommand<T>() to serialise the data and have the naming and parameters come out in the right structure for the browser's API.
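
To check what actually goes over the wire, this sketch repeats the interface and wrapper from above so it compiles standalone, adds a hypothetical parameters class for the DevTools Page.navigate command, and serialises it. The camelCase naming policy is an assumption standing in for the Json.Options set up earlier in the series. The [JsonIgnore] on CommandName is an addition here: without it, System.Text.Json would also write commandName into params, even though the name is already sent as method.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Copied from above so this compiles standalone
public interface IDebuggerCommandProperties
{
    string CommandName { get; }
}

public class DebuggerCommand<T> where T : IDebuggerCommandProperties
{
    public int Id { get; set; }
    public string Method { get; private set; }
    public T Params { get; private set; }

    public DebuggerCommand(T parameters)
    {
        Method = parameters.CommandName;
        Params = parameters;
    }
}

// Hypothetical parameters for the DevTools "Page.navigate" command
public class NavigateParams : IDebuggerCommandProperties
{
    // The name is already sent as "method", so keep it out of "params"
    [JsonIgnore]
    public string CommandName => "Page.navigate";

    public string Url { get; init; } = "";
}

public static class CommandJson
{
    // Assumed stand-in for the series' Json.Options: camelCase property names
    public static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };

    // Same wrap-then-serialise steps as SendCommand<T>(), minus the WebSocket
    public static string Serialize<T>(T parameters, int id) where T : IDebuggerCommandProperties =>
        JsonSerializer.Serialize(new DebuggerCommand<T>(parameters) { Id = id }, Options);
}
```

With those options the result should look something like {"id":1,"method":"Page.navigate","params":{"url":"https://example.com/"}}, which is the id/method/params envelope the DevTools Protocol expects.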

Part two wrap up

So with the overall pattern for controlling the browser in place, the next part of this series will look at the details of the states that the state machine above can make use of. That will include navigating to a specific page, and fetching its markup.

But if you can't wait, the code to go with this series is available on GitHub.
