Jeremy Davis
Sitecore, C# and web development
Article printed from: https://blog.jermdavis.dev/posts/2022/where-is-solr-living

Where is Solr living these days?

Lots of choices, some confusion...

Published 29 August 2022
Updated 30 August 2022

One thing we don't seem to be short of these days is options for deploying Solr. I've had to do a bit of thinking about this recently, as I draw up plans for a work project. So I figured I'd write some of it down: if I'm having to explain it to people, chances are there are plenty of others out there in Internet Land who are having to think about these issues too.

So where can it live?

In the traditional world of Sitecore XM/XP, broadly we have three choices:

  • IaaS Patterns: Install it on VMs or physical hardware
    This is where it all started off. The original deployment approach for all Solr/SolrCloud/Zookeeper installs was to deploy them directly to one or more servers. Historically they would have been physical - sitting in your rack cabinet somewhere - but these days it's much more likely to be a VM running in Azure or similar. The servers can be running Windows or Unix here. Broadly, a Unix VM will need fewer infrastructure resources to deliver the same scale of Solr install. Solr is natively a Unix application, so if you have the skills to deploy and manage Unix VMs then you can reduce your infrastructure costs. But a lot of Sitecore people are mainly experienced with Windows, so that's not always a practical approach. For Windows users, I built a basic script library to help with these installs some time back. It will need some updates for the latest Solr versions by now, but it may still help if this is your approach.

    Pros: It's what the majority of people know. It's what most of the documentation describes for deployments. So it's probably the simplest way for many people to approach a deployment.

    Cons: It's the least flexible approach. All the work for maintenance and scaling is down to you, plus most of the work for deployment. And when problems occur, much of the work of fixing them falls to you as well.

  • SaaS Patterns: Pay someone else to do it for you
    There are lots of things we outsource these days, so why not search too? A number of providers will sell you Solr as a service. Generally you get to pick which cloud infrastructure provider you want your instance based in, plus the server specs and scale you need, and the vendor will provision this for you. Once that's done there are some further steps to deploy the relevant Sitecore config and indexes into the SolrCloud instance they've given you - but generally the vendors that support Sitecore provide documentation or scripting for this. (The provider commonly used for Sitecore here is SearchStax.)

    Pros: You don't need to worry about infrastructure yourself at all. There's a support process if you need help, or you have issues. So you need the least amount of knowledge to make it work.

    Cons: Likely the highest cost - outsourcing companies will charge you the cloud hosting costs plus a mark-up for their work and profit.

  • Container Patterns: The "modern" alternative for hosting it yourself
    If you want to run it yourself, the modern approach is to deploy your instance of SolrCloud into containers. This puts the effort for setup and management back on you, but it gives you a lot more automation, and it fits more closely with modern deployment patterns for Sitecore itself. The simplest choice here is to run Solr using the Solr Operator toolset, which I wrote about recently. This provides automation and Kubernetes config files to quickly set up instances of SolrCloud. It makes use of Linux containers, which are generally lightweight and fairly easy to run. But you can run Solr under Windows containers too, which might be appropriate if your admin skills and knowledge for Windows are better than for Linux. You will need to pay for more compute resources, though. And while Sitecore provide a non-production Windows-based image, you'd need to source an alternative if you want your live site to run this way.

    Perhaps the key advantage here is that Kubernetes is very good at scaling things for you. You can configure the amount of CPU and memory you want to be available to individual Solr nodes, and you can also tell it to scale out to more nodes if required. That's much harder to achieve with IaaS patterns. It's also good at detecting issues with nodes and restarting them if necessary. Plus you have the choice between running Kubernetes on your internal kit (on-prem) or out in the cloud with something like Azure Kubernetes Service.

    Pros: You're in control of all the power for automation and scaling. And there are lots of options for flexibility of hosting and config here.

    Cons: Even with tools like Operator there can be a big learning curve for running and managing containers.
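To make the Kubernetes scaling point a bit more concrete, here's a minimal sketch in Python that builds a SolrCloud resource for the Solr Operator, with a node count and per-node CPU and memory. It assumes the Operator's solr.apache.org/v1beta1 SolrCloud resource and its replicas, solrImage and customSolrKubeOptions.podOptions.resources fields, so check these against the Operator version you actually install. The name, sizes and image tag in the usage example are illustrative placeholders.

```python
import json


def solrcloud_manifest(name: str, nodes: int, cpu: str, memory: str,
                       solr_tag: str) -> dict:
    """Build a SolrCloud resource for the Solr Operator.

    Assumes the Operator's solr.apache.org/v1beta1 SolrCloud schema; check
    field names against the Operator version you actually install.
    """
    if nodes < 1:
        raise ValueError("a SolrCloud needs at least one node")
    # The same CPU/memory figures are used for both requests and limits.
    resources = {"cpu": cpu, "memory": memory}
    return {
        "apiVersion": "solr.apache.org/v1beta1",
        "kind": "SolrCloud",
        "metadata": {"name": name},
        "spec": {
            # Number of Solr nodes (pods); increase this to scale out.
            "replicas": nodes,
            "solrImage": {"tag": solr_tag},
            # Per-node resources applied to each Solr pod.
            "customSolrKubeOptions": {
                "podOptions": {
                    "resources": {
                        "requests": dict(resources),
                        "limits": dict(resources),
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    # Illustrative values only: the JSON output can be piped to `kubectl apply -f -`.
    print(json.dumps(solrcloud_manifest("sitecore-solr", nodes=3, cpu="1",
                                        memory="4Gi", solr_tag="8.11.2"),
                     indent=2))
```

Scaling out is then a matter of re-applying the resource with a bigger nodes value. Setting requests equal to limits just keeps the sketch simple: each Solr pod is scheduled with the resources it's allowed to use. It deliberately leaves out things like ZooKeeper and storage settings, which a real deployment would need to think about.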

Now there is a fourth option, which is to run it under Azure PaaS App Services. I've left it out of the main list because it's a bit controversial. There are various articles on the internet describing approaches for doing this - but they generally point out that this isn't a supported approach for Sitecore. You're likely to hit some issues you need to work around (like this one, for example). So while it can be helpful for development or test instances, it's generally not a good option for production.
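Whichever of those routes you take, the SolrCloud instance still needs Sitecore's indexes creating as collections before Sitecore can use it. Vendor scripts can wrap this up for you, but underneath it typically comes down to Solr's Collections API. Here's a minimal Python sketch that builds the CREATE calls. It assumes a Sitecore configset has already been uploaded to ZooKeeper under the hypothetical name sitecore_configset, and it uses three of Sitecore's standard index names as an illustrative subset rather than the full list a real install needs.

```python
from urllib.parse import urlencode

# Illustrative subset of Sitecore index names; a real install needs more.
SITECORE_INDEXES = ["sitecore_core_index", "sitecore_master_index", "sitecore_web_index"]


def create_collection_url(solr_base: str, name: str, config_set: str,
                          shards: int = 1, replicas: int = 1) -> str:
    """Build a Collections API CREATE URL for one collection.

    solr_base is the Solr root, e.g. "https://solr.example.com:8983/solr".
    config_set must already exist in ZooKeeper (the name used below is hypothetical).
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": shards,
        "replicationFactor": replicas,  # copies of each shard
        "collection.configName": config_set,
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    for index in SITECORE_INDEXES:
        # Issue each URL with an HTTP GET (e.g. urllib.request.urlopen)
        # once you're happy with what it prints.
        print(create_collection_url("http://localhost:8983/solr", index,
                                    "sitecore_configset", replicas=2))
```

replicationFactor is the setting that matters most for resilience. With a value above 1, SolrCloud can keep copies of each shard on different nodes, so losing a single node doesn't have to take an index offline.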

What about the future?

That's all well and good - but our world is being shaken up at present by the move to headless tech and Sitecore starting to offer their classic CMS in a SaaS style with XM Cloud. That has a pretty big effect on the "where to put Solr" issue, and you can think about this in two key areas:

  • For Content Management
    Based on what's been discussed publicly so far, when you click the "give me a new instance!" button for XM Cloud, it's going to spin up an instance of SolrCloud for the editing APIs to use. Which makes sense: In order to fire up an XM editing instance for you, all its dependencies need creating. So there's nothing for you to do here - it should all be automated for you.
    It's worth noting, however, that a key part of the XM Cloud concept is that none of the Content Management bit is exposed to general internet traffic. That means there's no concept of scaling here, as it's not expected to be under significant load. And fundamentally, that means you can't use the CM (and hence the Solr instance it uses) for Content Delivery tasks.

  • For Content Delivery
    The headless deployment model used by XM Cloud means there's a significant change here. There are no traditional Content Delivery servers, and all the deployment patterns differ from "normal" XM. The delivery "head" could be Vercel, a .NET Core App Service, or something else you set up. Content is deployed out to Experience Edge when it's published from the XM Cloud CM instance. Edge exposes APIs that provide data to your headless rendering host, and it does allow some querying of data via GraphQL. That's as close to "search" as Sitecore's core patterns get, and there's no Solr here by default.
    If you want proper search in your website-head, over data you control, then there are broadly two options. You can provide something yourself, using whatever technology you want to build or procure that fits your requirements (which might be your own use of SolrCloud - but it could be something entirely different). But you need to do the integration of your chosen search tech with your website-head's code yourself, and you need to manage its deployment and configuration too.
    Or if you want something more pre-packaged, you could look into Sitecore Search - the new composable product from their Reflektion acquisition. We don't know many details of this yet (though I'm expecting some more detailed announcements around Symposium this year), but the broad concept is that it will be a composable/SaaS search product for content in much the same way that the Discover product is for e-commerce products. So in this scenario you're integrating some APIs rather than hosting or running any search services. I'm interested to find out where the indexing plugs into your website and its content here, as that is likely to affect which use cases this product will work well for.
    (And other providers are available in the SaaS space for headless search too - Pete Navarra reminds me that SearchStax have "Search Studio" to complement their more traditional SaaS-Solr products mentioned above.)
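For the Experience Edge route, a search-style query is just a GraphQL POST. Here's a minimal Python sketch that builds one. It assumes the public Edge delivery endpoint, an Edge API key sent in an sc_apikey header, and Edge's search query with a _path CONTAINS condition to find items under a root item. Those details come from my understanding of the Edge schema rather than from anything above, so check them against the current Experience Edge documentation. The root item ID and page size are placeholders.

```python
import json
import urllib.request

# Assumed Experience Edge delivery endpoint; confirm it for your environment.
EDGE_URL = "https://edge.sitecorecloud.io/api/graphql/v1"

# Assumed Edge search syntax: items whose _path contains the root item ID.
SEARCH_QUERY = """
query ItemsUnder($rootItem: String!, $pageSize: Int) {
  search(
    where: { AND: [{ name: "_path", value: $rootItem, operator: CONTAINS }] }
    first: $pageSize
  ) {
    total
    results { name url { path } }
  }
}
"""


def build_edge_request(api_key: str, root_item: str,
                       page_size: int = 10) -> urllib.request.Request:
    """Build (but don't send) a GraphQL POST to Experience Edge."""
    body = json.dumps({
        "query": SEARCH_QUERY,
        "variables": {"rootItem": root_item, "pageSize": page_size},
    }).encode("utf-8")
    return urllib.request.Request(
        EDGE_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "sc_apikey": api_key},
    )


def run_search(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling run_search(build_edge_request(key, root_id)) would return the usual GraphQL data/errors envelope. This is handy for listing content, but as I understand it, it's condition-based filtering rather than relevance-ranked free-text search, which is the gap the options above are there to fill.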
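If you go the do-it-yourself route with your own SolrCloud, much of the integration work in your website-head comes down to calling Solr's select handler. Here's a minimal Python sketch that builds such a query. It assumes a collection with the hypothetical name site_content and a hypothetical language_s field to filter on. Your schema and field names will differ.

```python
from typing import Sequence
from urllib.parse import urlencode


def solr_select_url(solr_base: str, collection: str, text: str,
                    rows: int = 10, filters: Sequence[str] = ()) -> str:
    """Build a Solr /select URL for a text query with optional filters."""
    params = [
        ("q", text),
        ("rows", rows),   # page size
        ("wt", "json"),   # ask for a JSON response
    ]
    # Each fq restricts the result set without affecting relevance scores,
    # and Solr caches filter queries separately from the main query.
    params += [("fq", f) for f in filters]
    return f"{solr_base.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical collection and field names, for illustration only.
print(solr_select_url("http://localhost:8983/solr", "site_content",
                      "solr hosting", rows=5, filters=["language_s:en"]))
```

For raw user input, the edismax query parser (defType=edismax, with a qf list of fields to search) is usually more forgiving than the default parser. Either way, the head also needs to decide how it authenticates to Solr and how it maps results back onto pages.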

Plenty of choices, huh?
