I've got a project on the cards that I'd like to use Docker containers for, but we're talking about using SolrCloud for search. Right now there isn't a SolrCloud container in the Sitecore community container repo, so I started thinking about what it would take to make one.
At their core Solr and SolrCloud are the same software, with some different configuration settings and data storage. So a key part of getting a single-node instance of SolrCloud to run for a developer is adding an extra command-line parameter when you start it up. More complex, however, is creating the indexes Sitecore will need. Ordinary Solr uses Cores, and you can create them easily just by copying files. But SolrCloud uses Collections – which are made up of both cores and data stored by ZooKeeper. Because the data is split out, you can't easily just drop files to make new collections – you need to use some sort of API.
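To give a flavour of the difference: where a core can appear just by dropping a conf folder and a core.properties file on disk, a collection has to be requested via Solr's Collections API. A minimal sketch of that call might look like this (the host, port and collection name here are just examples):

# Sketch: creating a SolrCloud collection via the Collections API.
# The URL, port and collection name are illustrative placeholders.
$solrUrl = "http://localhost:8983/solr"
$createUrl = "$solrUrl/admin/collections?action=CREATE&name=sitecore_master_index&numShards=1&replicationFactor=1"
Invoke-WebRequest -UseBasicParsing -Uri $createUrl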
As an aside, if you're new to SolrCloud you might want to watch (or read) my Symposium presentation about getting started with SolrCloud and deploying it to production. That explains more about why we have collections, and how they help you.
What this means is that, to make SolrCloud work, we're going to need to replace the process the existing scripts use to create the default set of indexes. I've already spent a load of time on scripts that automate creating SolrCloud collections as part of the presentation linked above – so those seem like a good starting point...
The first thing I noticed was that the base image for the existing Solr container is the standard Microsoft Nanoserver image. That's small – but it doesn't include PowerShell. While I think it would be possible to do this setup using batch files, it would be a lot of effort to re-write my SolrCloud scripts to avoid PowerShell. So the easy answer for me is to find a base image that does have the scripting engine included. Microsoft offer a PowerShell-on-top-of-Nanoserver image which seems ideal for this purpose. Swapping over is easy: just change the arguments passed into the Dockerfile that builds the base for the Solr container. That lives in the "build.json" file for the Java Runtime image that Solr sits on top of:
{ "tags": [ { "tag": "sitecore-openjdk:8-nanoserver-${nanoserver_version}", "build-options": [ "--build-arg BUILD_IMAGE=mcr.microsoft.com/windows/servercore:${windowsservercore_version}", "--build-arg BASE_IMAGE=mcr.microsoft.com/powershell:nanoserver:${nanoserver_version}" ] } ], "sources": [] }
So now when that container builds, PowerShell will be available for use.
In the existing scripts, the core creation is in two parts. The Dockerfile for the Solr image creates the set of empty core files. And then the entrypoint script that runs when the container starts can copy those files into the Solr folders if no cores exist. So the first step in removing this behaviour is to strip out the bit of the Dockerfile that's creating these base files.
In the Dockerfile, the change is to remove this bit:
RUN New-Item -Path 'C:\\clean' -ItemType Directory | Out-Null; `
    Copy-Item -Path 'C:\\solr\\server\\solr\\*' -Destination 'C:\\clean' -Force -Recurse; `
    # $env:CORE_NAMES -split ',' | ForEach-Object { `
    #     $name = $_.Trim(); `
    #     $schema = @{$true=('C:\\temp\\{0}' -f $env:MANAGED_SCHEMA_XDB_NAME);$false=('C:\\temp\\{0}' -f $env:MANAGED_SCHEMA_DEFAULT_NAME)}[$name -like '*xdb*']; `
    #     Copy-Item -Path 'C:\\clean\\configsets\\_default\\conf' -Destination ('C:\\clean\\{0}\\conf' -f $name) -Recurse -Force; `
    #     Copy-Item -Path $schema -Destination ('C:\\clean\\{0}\\conf\\managed-schema' -f $name); `
    #     Set-Content -Path ('C:\\clean\\{0}\\core.properties' -f $name) -Value ('name={0}{1}config=solrconfig.xml{1}update.autoCreateFields=false{1}dataDir=data' -f $name, [Environment]::NewLine); `
    #     New-Item -Path ('C:\\clean\\{0}\\data' -f $name) -ItemType Directory | Out-Null; `
    # }; `
    Remove-Item -Path 'C:\\clean\\README.txt'; `
    Remove-Item -Path 'C:\\clean\\configsets' -Recurse;
(It's commented out here, because it's easier to show the change that way – this can be deleted)
Removing the start-up behaviour that copies these files is easy – because the next step is going to replace the entrypoint anyway...
The simplest way to get Solr to start up in SolrCloud mode for a developer is to add "-c" to the command line for running it. That could be done in the entrypoint file that already exists – but that's a batch file, and we need PowerShell here. So instead, let's change the Dockerfile to have a PowerShell entrypoint. If we name this file "Boot.ps1" to match the existing pattern, it can get copied in. There's also a second PowerShell script that contains the logic for creating collections – more on that later. Plus we need to make the container fire up PowerShell and run the new boot script with the appropriate parameters. That all happens at the end of the Dockerfile:
...snip...

EXPOSE 8983

COPY Boot.ps1 .
COPY MakeCollections.ps1 .

CMD ["c:\\program files\\powershell\\pwsh.exe", "-f", "Boot.ps1", "c:\\solr", "8983", "c:\\clean", "c:\\data"]
The Boot.ps1 file needs to do some of the same stuff that the old batch file did, and we'll extend it. First up, it needs to receive the parameters:
param(
    [string]$solrPath,
    [string]$solrPort,
    [string]$installPath,
    [string]$dataPath
)
The "copy files" behaviour from the original script can stay, but in PowerShell flavour – and it's doing something with lock files too:
$solrConfig = "$dataPath\solr.xml" if(Test-Path $solrConfig) { Write-Host "### Config exists!" } else { Write-Host "### Config does not exist, creating..." Copy-Item "$installPath\\*" "$dataPath" -force -recurse } Write-Host "### Preparing Solr cores..." Push-Location $dataPath if(Test-Path "write.lock") { Write-Host "### Removing write.lock" Remove-Item "write.lock" -Force } Pop-Location
And finally, fire up Solr in Cloud mode:
Write-Host "### Starting Solr..." & "$solrPath\bin\solr.cmd" start -port $solrPort -f -c
So earlier, the Dockerfile copied in a second PowerShell script. That is largely my original collection creation script from my SolrCloud scripting, but I've made a couple of changes. The first is that the Sitecore containers are set up without SSL for Solr – so all the API endpoints in the script need to change from https:// to http://. That's a quick search and replace operation. And secondly, it needs to include the logic to decide what to do when it's run.
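As an aside, that search and replace can itself be scripted if you prefer – something along these lines (the script file name is assumed here):

# Rewrite the endpoint scheme throughout the collection script.
# The file name is illustrative - adjust to wherever your copy lives.
(Get-Content ".\MakeCollections.ps1") -replace 'https://', 'http://' | Set-Content ".\MakeCollections.ps1"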
The logic starts by waiting for Solr to be running:
# wait for it to start
Write-Host "### Waiting on $solrPort..."
Wait-ForSolrToStart "localhost" $solrPort
Write-Host "### Started..."
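Wait-ForSolrToStart comes from my earlier scripting work. If you don't have that to hand, a minimal stand-in might just poll one of Solr's admin endpoints until it answers – a sketch, not the real function:

function Wait-ForSolrToStart($solrHost, $solrPort)
{
    $done = $false
    while(-not $done)
    {
        try
        {
            # Solr responds on its admin system-info endpoint once it's up
            Invoke-WebRequest -UseBasicParsing -Uri "http://$($solrHost):$solrPort/solr/admin/info/system" | Out-Null
            $done = $true
        }
        catch
        {
            # Not ready yet - pause briefly and try again
            Start-Sleep -Milliseconds 500
        }
    }
}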
Because SolrCloud stores its data in both disk files and ZooKeeper, collections have to be set up through API calls – which is why the script can't do anything until Solr is responding. Once it is, we need to check whether there's any work to do. This should really get refactored out to a function, but it asks the API how many collections exist at present:
# check for collections - /solr/admin/collections?action=LIST&wt=json
$url = "http://localhost:$solrPort/solr/admin/collections?action=LIST"
$response = Invoke-WebRequest -UseBasicParsing -Uri $url
$obj = $response | ConvertFrom-Json
$collectionCount = $obj.Collections.Length

Write-Host "Collection count: $collectionCount"
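For what it's worth, that refactoring might look something like this (a sketch – the function name is my invention, not part of the existing scripts):

function Get-CollectionCount($solrHost, $solrPort)
{
    # Ask the Collections API for the current list, and count the results
    $url = "http://$($solrHost):$solrPort/solr/admin/collections?action=LIST"
    $response = Invoke-WebRequest -UseBasicParsing -Uri $url
    $obj = $response | ConvertFrom-Json
    return $obj.Collections.Length
}

$collectionCount = Get-CollectionCount "localhost" $solrPort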
If there are no collections, then we need to create them. Otherwise, we can assume it's already been done and skip this bit. If we do have to create them, then that calls the function I wrote when I was originally automating SolrCloud. This also needs a bit of further hacking: the set of collections and aliases created is fixed right now, when it should really be driven by data passed in from the json config – but it does the job for a demo:
if($collectionCount -eq 0)
{
    Write-Host "Need to create"
    Configure-SolrCollection "c:\clean" "localhost" "$solrPort" 1 1 1 ""
}
else
{
    Write-Host "Already exists"
}
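For illustration, a more data-driven version might take a comma-separated list of collection names (reusing the CORE_NAMES variable the core-based image already has) and create each one via the Collections API. Very much a sketch rather than finished code:

# Sketch: drive collection creation from a comma-separated environment variable.
# CORE_NAMES is borrowed here as an illustration - the real config may differ.
$env:CORE_NAMES -split ',' | ForEach-Object {
    $name = $_.Trim()
    Write-Host "Creating collection $name"
    $url = "http://localhost:$solrPort/solr/admin/collections?action=CREATE&name=$name&numShards=1&replicationFactor=1"
    Invoke-WebRequest -UseBasicParsing -Uri $url | Out-Null
}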
The last step is making sure this script runs. This bit was a bit of a head-scratcher. The script needs to run in parallel with Solr, because Solr has to run with the "-f" flag, which keeps it in the foreground forever. So the collection creation has to be kicked off before Solr starts, but must not block the startup. And it also needs to be able to send its output back to the Docker console if we're running attached. After a bit of hackery I settled on PowerShell's Start-Process command. That runs the script in parallel, but doesn't wait for it to end. So "Boot.ps1" can be updated:
Write-Host "### Starting Solr..." Start-Process "c:\\program files\\powershell\\pwsh.exe" -ArgumentList "-f",".\MakeCollections.ps1 $solrPort" & "$solrPath\bin\solr.cmd" start -port $solrPort -f -c
It starts the process to make collections in the background, and then starts Solr in the foreground – allowing both parts to run in parallel.
So we can now build an image for SolrCloud, and starting the container gives you a copy of SolrCloud with some collections created. Success!
And the data folder now includes both the core data and the ZooKeeper data.
(The prefix on the collection names here was part of my testing – it wouldn't be necessary in a real deployment of this)
But I'm aware there's a load more to do here:

- The image build still passes in the CORE_NAMES variable, but it's not used by the code above right now.
- Sitecore's Solr connection strings will need changing to work against SolrCloud – adding the ;SolrCloud=true parameter to them for a start.

So I need to find some more time to work on this – but it's a promising start...