Getting pipelines from config [by Jeremy Davis]

In my last post I was thinking about a more functional approach to defining pipelines, after having heard about some interesting new code that Sitecore had been working on. Since writing that I've had a few conversations where the question "but what if I want a pipeline to come from configuration?" has come up. I've been away from work for the last week doing my civic duty on Jury Service, but I've spent some of the time between court sessions thinking about how last week's ideas and configuration files might be combined.

So here's one way it could work:

First step is having some configuration...

A very simple structure to store a pipeline might look like this:

<pipeline name="example">
  <step type="StronglyTypedPipelines.DoubleStep, StronglyTypedPipelines" />
  <step type="StronglyTypedPipelines.ToStringStep, StronglyTypedPipelines" />
  <step type="StronglyTypedPipelines.DuplicateStep, StronglyTypedPipelines" />
</pipeline>

It's just a root element for the pipeline, containing steps. And each <step> element then defines the type which must be instantiated to process that bit of the pipeline.

In Sitecore's world, the config for pipelines is commonly "patched" by add-on modules and the code you deploy for your site. Based on XML like this, it would be possible to implement some sort of config patching process that allows the same sort of "insert-before" or "replace" options as Sitecore does. I'm not going to go into the detail of that for this post, however. For our purposes we can assume that code exists somewhere that can deal with any modification of the raw configuration before we try to instantiate the pipeline.
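To give a feel for what such patching might involve, here is a minimal sketch using LINQ to XML. The `PipelinePatcher` class and its `InsertBefore` and `Replace` methods are hypothetical names invented for illustration. This is not how Sitecore's own patching works, just one way the two operations could be expressed against the XML above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: not part of Sitecore or of the code in this post.
public static class PipelinePatcher
{
    // Inserts a new <step> immediately before the step with the given type attribute.
    public static void InsertBefore(XElement pipelineXml, string existingType, string newType)
    {
        FindStep(pipelineXml, existingType)
            .AddBeforeSelf(new XElement("step", new XAttribute("type", newType)));
    }

    // Swaps the type of an existing <step> for a different one.
    public static void Replace(XElement pipelineXml, string existingType, string newType)
    {
        FindStep(pipelineXml, existingType).SetAttributeValue("type", newType);
    }

    // Finds the first <step> whose type attribute matches exactly, or throws.
    private static XElement FindStep(XElement pipelineXml, string type)
    {
        var step = pipelineXml.Elements("step")
            .FirstOrDefault(s => (string)s.Attribute("type") == type);

        if (step == null)
        {
            throw new InvalidOperationException("No step with type '" + type + "' to patch.");
        }

        return step;
    }
}
```

Because the patching works on the `XElement` directly, it can run before the pipeline class ever sees the configuration.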

So the next task is to load the XML and get it ready for processing:

Loading the XML

Ideally there shouldn't be any significant differences between a "loaded from config" pipeline and a "created by code" one. So the code can start from a new abstract class that inherits from our original pipeline type:

public abstract class ConfigBasedPipeline<INPUT, OUTPUT> : Pipeline<INPUT, OUTPUT>
{
}

The functional difference between config and code based pipelines is basically how they're initialised, so the logic to do the loading can be put into the constructor of this type. Since the config will be written in XML, this code can be based on receiving the XML that describes the pipeline being loaded, and we can assume that any patching required has been done before the point that our class receives it.

So the constructor might look like:

public ConfigBasedPipeline(XElement pipelineXml)
{
    if (pipelineXml == null)
    {
        throw new ArgumentNullException(nameof(pipelineXml));
    }

    var pipelineSteps = parsePipelineSteps(pipelineXml);
    validatePipelineSteps(pipelineSteps);

    PipelineSteps = input => processPipelineSteps(pipelineSteps, input);
}

It checks that we got a valid XML Element, tries to parse it into pipeline step objects and then tries to validate that these meet the type requirements of the pipeline. Finally we can use those objects to initialise the Func<INPUT,OUTPUT> which actually runs the pipeline process that we saw described in the code-first constructor from the previous post.

Parsing and validating

Parsing the XML into objects is fairly simple:

private IList<IPipelineStep> parsePipelineSteps(XElement pipelineXml)
{
    var pipeline = new List<IPipelineStep>();

    foreach (var xStep in pipelineXml.Elements("step"))
    {
        string typeName = xStep.Attribute("type").Value;

        var type = Type.GetType(typeName);
        var ctr = type.GetConstructor(Type.EmptyTypes);
        // A parameterless constructor takes no arguments, so pass an empty argument array
        var obj = (IPipelineStep)ctr.Invoke(new object[0]);

        pipeline.Add(obj);
    }

    return pipeline;
}

The result of parsing is going to be a list of pipeline steps. In the previous post, the interface defining a step had generic parameters for the input and output. That makes it a bit difficult to handle here, as we don't know a step's type parameters until after it's been created, which in turn makes defining a list to hold it harder. So to make this easier, I went back and added a base interface without type parameters that all the steps inherit from:

public interface IPipelineStep
{
}

public interface IPipelineStep<INPUT, OUTPUT> : IPipelineStep
{
    OUTPUT Process(INPUT input);
}

That allows creating a simple List<IPipelineStep> to store the result of the configuration.

The steps themselves can be generated by looping through each step element and creating an object from the type attribute that they have. In the real world this code probably needs some tests to ensure that the elements and attributes are all correct for parsing - but that's left out here for clarity.
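As a rough idea of what those checks could look like, here is a sketch of creating a single step defensively. The `StepFactory` class and its `CreateStep` method are hypothetical names, and the marker interface is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Xml.Linq;

// The marker interface from above, repeated so this sketch is self-contained.
public interface IPipelineStep { }

// Hypothetical helper showing the checks that parsePipelineSteps() leaves out.
public static class StepFactory
{
    public static IPipelineStep CreateStep(XElement xStep)
    {
        // Casting an XAttribute to string yields null when the attribute is missing.
        string typeName = (string)xStep.Attribute("type");
        if (string.IsNullOrWhiteSpace(typeName))
        {
            throw new InvalidOperationException("A <step> element has no 'type' attribute.");
        }

        // Type.GetType(string) returns null when the type cannot be found.
        Type type = Type.GetType(typeName);
        if (type == null)
        {
            throw new InvalidOperationException("Step type '" + typeName + "' could not be loaded.");
        }

        if (!typeof(IPipelineStep).IsAssignableFrom(type))
        {
            throw new InvalidOperationException("Step type '" + typeName + "' does not implement IPipelineStep.");
        }

        if (type.IsAbstract || type.GetConstructor(Type.EmptyTypes) == null)
        {
            throw new InvalidOperationException("Step type '" + typeName + "' must be a concrete class with a public parameterless constructor.");
        }

        return (IPipelineStep)Activator.CreateInstance(type);
    }
}
```

Each failure gets its own message, so a mistake in the config file points at the offending step rather than surfacing as a `NullReferenceException`.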

But since last time we were talking about pipeline steps which can change the type of the input, just getting the right set of objects isn't really enough here. We need to be sure that the data type going in will be accepted, and that the right result type will be generated. It seems better to test that before we try to run the pipeline, so some validation is required.

That's handled by the validatePipelineSteps() method:

private void validatePipelineSteps(IList<IPipelineStep> pipelineSteps)
{
    int stepNumber = 0;

    Type pipelineBaseInterface = this.GetType().GetInterface("IPipelineStep`2");
    Type currentInputType = pipelineBaseInterface.GenericTypeArguments[0];
    Type outputType = pipelineBaseInterface.GenericTypeArguments[1];
    foreach (var step in pipelineSteps)
    {
        stepNumber += 1;

        Type stepBaseInterface = step.GetType().GetInterface("IPipelineStep`2");
        Type stepInType = stepBaseInterface.GenericTypeArguments[0];
        Type stepOutType = stepBaseInterface.GenericTypeArguments[1];

        if (currentInputType != stepInType)
        {
            string msg = "Step #{0} {1} input type {2} does not match current type {3}.";
            throw new InvalidOperationException(string.Format(msg, stepNumber, step.GetType().Name, stepInType.Name, currentInputType.Name));
        }
        currentInputType = stepOutType;
    }
    if (currentInputType != outputType)
    {
        string msg = "Final step #{0} {1} output type {2} does not equal output of pipeline {3}.";
        throw new InvalidOperationException(string.Format(msg, stepNumber, pipelineSteps.Last().GetType().Name, currentInputType.Name, outputType.Name));
    }
}

This needs to iterate through each of the objects we generated from the XML and check its inputs and outputs. It starts by looking at the type parameters for the overall pipeline object. Fetching the base generic pipeline step interface allows us to work out what the type parameters for the input and output of the overall pipeline are. So we can assume that the first step we encounter needs to have an input that is the same type as the overall pipeline.

Then the code can loop through each subsequent step, checking that the output type of one step matches the input type of the next. Again, it can use the type parameters of the base step interface to work this out. And finally, the output of the very last step must match the output of the overall pipeline.

If all of that matches up then all is well. Otherwise, the code raises exceptions that try to specify which step has been found to be incorrect and why.
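To make the reflection part of that concrete, here is a small sketch of reading a step's input and output types from the generic interface. `StepTypes.Describe` is a hypothetical helper, and the body of `ToStringStep` is my assumption about what a step of that name does. Note that `GetInterface()` returns null for a class that only implements the marker interface, so the sketch checks for that case, which the validation method above does not.

```csharp
using System;

public interface IPipelineStep { }

public interface IPipelineStep<INPUT, OUTPUT> : IPipelineStep
{
    OUTPUT Process(INPUT input);
}

// Assumed implementation of one of the steps named in the example config.
public class ToStringStep : IPipelineStep<int, string>
{
    public string Process(int input)
    {
        return input.ToString();
    }
}

// Hypothetical helper, for illustration only.
public static class StepTypes
{
    // Returns the input and output types a step declares via IPipelineStep<INPUT, OUTPUT>.
    public static (Type In, Type Out) Describe(IPipelineStep step)
    {
        // "`2" is the mangled name suffix for an interface with two type parameters.
        Type iface = step.GetType().GetInterface("IPipelineStep`2");
        if (iface == null)
        {
            throw new InvalidOperationException(step.GetType().Name + " does not implement IPipelineStep<INPUT, OUTPUT>.");
        }

        return (iface.GenericTypeArguments[0], iface.GenericTypeArguments[1]);
    }
}
```

For `ToStringStep` this gives `int` as the input type and `string` as the output type, which is the pair the validation loop compares against the neighbouring steps.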

And executing the steps...

If the steps validate, all that remains is to provide the code which can execute them. We can't easily express the `Func` from the previous post because we don't know the actual step objects involved this time around. But we can iterate them and make use of reflection to get them to do their job anyway:

private OUTPUT processPipelineSteps(IList<IPipelineStep> pipelineSteps, INPUT input)
{
    object output = input;

    foreach (IPipelineStep step in pipelineSteps)
    {
        MethodInfo mi = step.GetType().GetMethod("Process", BindingFlags.Public | BindingFlags.Instance);
        output = mi.Invoke(step, new[] { output });
    }

    return (OUTPUT)output;
}

We need to declare the output as object because we know its type is going to change over the course of the execution. It starts out with the value of the input. Then, for each step in the pipeline, we can use the reflection API to get a reference to the Process() method we know a pipeline step will define. This can be invoked by passing in the current state of the pipeline, and the result is recorded as our new output state. Finally, we know that the object holding the result state must have the same type as OUTPUT (since we validated the steps earlier), so we're safe to cast it to return the "right" data.
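One side effect of calling steps through reflection is that an exception thrown inside a step's Process() method reaches the caller wrapped in a `TargetInvocationException`. If that matters, the invocation could unwrap it, as in the sketch below. `StepInvoker` and `FailingStep` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Reflection;
using System.Runtime.ExceptionServices;

// Hypothetical step that always fails, used only to demonstrate the behaviour.
public class FailingStep
{
    public string Process(int input)
    {
        throw new FormatException("Cannot process " + input);
    }
}

// Hypothetical helper, for illustration only.
public static class StepInvoker
{
    // Calls step.Process(input) via reflection and rethrows the step's own exception.
    public static object InvokeProcess(object step, object input)
    {
        MethodInfo mi = step.GetType().GetMethod("Process", BindingFlags.Public | BindingFlags.Instance);
        try
        {
            return mi.Invoke(step, new[] { input });
        }
        catch (TargetInvocationException ex) when (ex.InnerException != null)
        {
            // Rethrow the original exception while preserving its stack trace.
            ExceptionDispatchInfo.Capture(ex.InnerException).Throw();
            throw; // never reached; satisfies the compiler's return-path check
        }
    }
}
```

With this in place, a caller that catches `FormatException` around the pipeline sees the same exception type it would have seen from a code-first pipeline.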

Finally, we need to wrap up our abstract config-based pipeline class in a concrete type that specifies what the input and output are:

public class ExampleConfigBasedPipeline : ConfigBasedPipeline<int, string>
{
    public ExampleConfigBasedPipeline(XElement pipelineXml) : base(pipelineXml)
    {
    }
}

With that done, a config-based pipeline can be called with fairly similar code to the code-based ones:

var input = 13;

XDocument xd = XDocument.Load("ConfigBasedPipeline.xml");
//
// Patching the configuration data would go here
//
XElement pipelineXml = xd.Element("pipeline");

var pipeline = new ExampleConfigBasedPipeline(pipelineXml);

var output = pipeline.Process(input);

Other than the need to fetch (and potentially patch) the XML to create the pipeline, this can behave in the same way as the code-first approach.

In conclusion...

So it turns out (if you have some XML patching code to use) you can have a simple approach to config-based pipelines fairly easily. The need for the data type passing through to vary does require some extra code, but it's not particularly complex.

One issue that I've not addressed here is creating the concrete pipeline object based on configuration data, rather than manually creating it as shown above. I'm not entirely sure whether real code would need that or not – since code usually wants to run a specific pipeline, and hence would know what concrete type it needed anyway. But if it was necessary, that could be done by putting the type data into the XML so that a factory method could instantiate the correct object.
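As a sketch of how such a factory method might look: the `PipelineFactory` class is a hypothetical name, and it assumes the `<pipeline>` element has gained a `type` attribute naming a class whose public constructor takes an `XElement` (as `ExampleConfigBasedPipeline` does).

```csharp
using System;
using System.Xml.Linq;

// Hypothetical factory, not part of the gist.
public static class PipelineFactory
{
    // Creates the pipeline named by the "type" attribute and returns it as TPipeline.
    public static TPipeline Create<TPipeline>(XElement pipelineXml) where TPipeline : class
    {
        string typeName = (string)pipelineXml.Attribute("type");
        if (string.IsNullOrWhiteSpace(typeName))
        {
            throw new InvalidOperationException("The <pipeline> element has no 'type' attribute.");
        }

        // throwOnError: true gives a descriptive exception if the type cannot be found.
        Type type = Type.GetType(typeName, throwOnError: true);

        if (!typeof(TPipeline).IsAssignableFrom(type))
        {
            throw new InvalidOperationException("Type '" + typeName + "' is not a " + typeof(TPipeline).Name + ".");
        }

        // Passes the XML to the constructor, matching the shape of ExampleConfigBasedPipeline.
        return (TPipeline)Activator.CreateInstance(type, new object[] { pipelineXml });
    }
}
```

The calling code would then ask for something like `PipelineFactory.Create<ConfigBasedPipeline<int, string>>(pipelineXml)`, so it still states the input and output types it expects, but leaves the choice of concrete class to the configuration.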

I've put the source for this bit of experimentation into a gist in case anyone wants to tinker with it.
