I've got a project on the cards that I'd like to use Docker containers for, but we're talking about using SolrCloud for search. Right now, there isn't a SolrCloud container in the Sitecore community container repo, so I started thinking about what it would take to make one.
As an aside, if you're new to SolrCloud you might want to watch (or read) my Symposium presentation about getting started with SolrCloud and deploying it to production. That explains more about why we have collections, and how they help you.
What this means is that to make SolrCloud work, we're going to need to replace the process the existing scripts use to create the default set of indexes. I've already spent a load of time on scripts that automate creating SolrCloud collections as part of the presentation linked above – so those seem like a good starting point. Since those scripts are PowerShell, the first job is making sure PowerShell is actually available in the Solr image. The base Nano Server image doesn't include it, but Microsoft publish a PowerShell-enabled Nano Server image which can be swapped in via the build configuration:
{ "tags": [ { "tag": "sitecore-openjdk:8-nanoserver-${nanoserver_version}", "build-options": [ "--build-arg BUILD_IMAGE=mcr.microsoft.com/windows/servercore:${windowsservercore_version}", "--build-arg BASE_IMAGE=mcr.microsoft.com/powershell:nanoserver:${nanoserver_version}" ] } ], "sources": [] }
So now when that container builds, PowerShell will be available for use.
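As a quick sanity check (the tag here assumes a 1809-based build – adjust it to match your build variables), running pwsh in the finished image from a PowerShell prompt should print a version number:

docker run --rm sitecore-openjdk:8-nanoserver-1809 "c:\program files\powershell\pwsh.exe" -Command '$PSVersionTable.PSVersion'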
In the Dockerfile, the change is to remove this bit:
RUN New-Item -Path 'C:\\clean' -ItemType Directory | Out-Null; `
    Copy-Item -Path 'C:\\solr\\server\\solr\\*' -Destination 'C:\\clean' -Force -Recurse; `
#    $env:CORE_NAMES -split ',' | ForEach-Object { `
#        $name = $_.Trim(); `
#        $schema = @{$true=('C:\\temp\\{0}' -f $env:MANAGED_SCHEMA_XDB_NAME);$false=('C:\\temp\\{0}' -f $env:MANAGED_SCHEMA_DEFAULT_NAME)}[$name -like '*xdb*']; `
#        Copy-Item -Path 'C:\\clean\\configsets\\_default\\conf' -Destination ('C:\\clean\\{0}\\conf' -f $name) -Recurse -Force; `
#        Copy-Item -Path $schema -Destination ('C:\\clean\\{0}\\conf\\managed-schema' -f $name); `
#        Set-Content -Path ('C:\\clean\\{0}\\core.properties' -f $name) -Value ('name={0}{1}config=solrconfig.xml{1}update.autoCreateFields=false{1}dataDir=data' -f $name, [Environment]::NewLine); `
#        New-Item -Path ('C:\\clean\\{0}\\data' -f $name) -ItemType Directory | Out-Null; `
#    }; `
    Remove-Item -Path 'C:\\clean\\README.txt'; `
    Remove-Item -Path 'C:\\clean\\configsets' -Recurse;
(It's commented out here, because it's easier to show the change that way – this can be deleted)
Removing the start-up behaviour that copies these files is easy – because the next step is going to replace the entrypoint anyway...
...snip...

EXPOSE 8983

COPY Boot.ps1 .
COPY MakeCollections.ps1 .

CMD ["c:\\program files\\powershell\\pwsh.exe", "-f", "Boot.ps1", "c:\\solr", "8983", "c:\\clean", "c:\\data"]
The Boot.ps1 file needs to do some of the same stuff that the old batch file did, and we'll extend it. First up, it needs to receive the parameters:
param(
    [string]$solrPath,
    [string]$solrPort,
    [string]$installPath,
    [string]$dataPath
)
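Those bind positionally to the arguments in the Dockerfile's CMD line, so inside the container the script effectively runs as:

# Effective command line inside the container, given the CMD shown earlier:
& 'c:\program files\powershell\pwsh.exe' -f Boot.ps1 'c:\solr' '8983' 'c:\clean' 'c:\data'
# ...which binds $solrPath='c:\solr', $solrPort='8983', $installPath='c:\clean', $dataPath='c:\data'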
The "copy files" behaviour from the original script can stay, but in PowerShell flavour – and it's doing something with lock files too:
$solrConfig = "$dataPath\solr.xml" if(Test-Path $solrConfig) { Write-Host "### Config exists!" } else { Write-Host "### Config does not exist, creating..." Copy-Item "$installPath\\*" "$dataPath" -force -recurse } Write-Host "### Preparing Solr cores..." Push-Location $dataPath if(Test-Path "write.lock") { Write-Host "### Removing write.lock" Remove-Item "write.lock" -Force } Pop-Location
And finally, fire up Solr in Cloud mode – "-f" keeps it running in the foreground, and "-c" starts it as a SolrCloud node using the embedded Zookeeper:
Write-Host "### Starting Solr..." & "$solrPath\bin\solr.cmd" start -port $solrPort -f -c
Over in MakeCollections.ps1, the logic starts by waiting for Solr to be running:
# wait for it to start
Write-Host "### Waiting on $solrPort..."
Wait-ForSolrToStart "localhost" $solrPort
Write-Host "### Started..."
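Wait-ForSolrToStart comes from the scripts in that earlier automation work – it's essentially a retry loop against Solr's HTTP endpoint, along these lines (a simplified sketch rather than the exact original):

function Wait-ForSolrToStart
{
    param(
        [string]$solrHost,
        [string]$solrPort
    )

    $done = $false
    while($done -eq $false)
    {
        try
        {
            # Any successful response means Solr is up and listening
            Invoke-WebRequest -UseBasicParsing -Uri "http://$($solrHost):$solrPort/solr/" | Out-Null
            $done = $true
        }
        catch
        {
            # Not up yet - pause briefly and try again
            Start-Sleep -Milliseconds 500
        }
    }
}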
Because SolrCloud stores its data both on disk and in Zookeeper, collections have to be created through Solr's API endpoints rather than by copying files around – which is why the rest of the script can only run once Solr is up. Next, it checks whether there's any work to do. This really should be refactored out into a function, but it asks the API how many collections currently exist:
# check for collections - /solr/admin/collections?action=LIST&wt=json
$url = "http://localhost:$solrPort/solr/admin/collections?action=LIST"
$response = Invoke-WebRequest -UseBasicParsing -Uri $url
$obj = $response | ConvertFrom-Json
$collectionCount = $obj.Collections.Length
Write-Host "Collection count: $collectionCount"
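If you did pull that out into a function, it might look something like this (a hypothetical Get-SolrCollectionCount helper, not part of the current script):

function Get-SolrCollectionCount
{
    param(
        [string]$solrHost,
        [string]$solrPort
    )

    # Ask the Collections API for the list of collections and count the results
    $url = "http://$($solrHost):$solrPort/solr/admin/collections?action=LIST"
    $response = Invoke-WebRequest -UseBasicParsing -Uri $url
    $obj = $response.Content | ConvertFrom-Json

    # Wrap in @() so a single collection still counts as one, not its string length
    return @($obj.collections).Count
}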
If there are no collections, then we need to create them. Otherwise, we can assume that's already been done and skip this bit. If we do have to create them, then that calls the function I wrote when I was originally automating SolrCloud. This also needs a bit of further hacking, as the set of collections and aliases created is fixed right now when it should really be driven by data passed in from the JSON config – but it does the job for a demo:
if($collectionCount -eq 0)
{
    Write-Host "Need to create"
    Configure-SolrCollection "c:\clean" "localhost" "$solrPort" 1 1 1 ""
}
else
{
    Write-Host "Already exists"
}
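One possible shape for that configurability – a sketch only, calling Solr's Collections API directly and driven by the CORE_NAMES variable the Dockerfile already receives – would be something like:

# Hypothetical: create one collection per entry in CORE_NAMES via the Collections API
$env:CORE_NAMES -split ',' | ForEach-Object {
    $name = $_.Trim()
    $create = "http://localhost:$solrPort/solr/admin/collections?action=CREATE&name=$name&numShards=1&replicationFactor=1"
    Invoke-WebRequest -UseBasicParsing -Uri $create | Out-Null
    Write-Host "Created collection $name"
}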
The last step is making sure this script runs – and this bit was a bit of a head-scratcher. The script needs to run in parallel with Solr, but Solr has to run with the "-f" flag, which keeps it in the foreground forever. So the collection creation has to be launched before Solr is started, but must not block the startup. It also needs to be able to send its output back to the Docker console when the container is running attached. After a bit of hackery I settled on PowerShell's
Start-Process
command. That runs the script in parallel, but doesn't wait for it to end. So "Boot.ps1" can be updated:
Write-Host "### Starting Solr..." Start-Process "c:\\program files\\powershell\\pwsh.exe" -ArgumentList "-f",".\MakeCollections.ps1 $solrPort" & "$solrPath\bin\solr.cmd" start -port $solrPort -f -c
It starts the process to make collections in the background, and then starts Solr in the foreground – allowing both parts to run in parallel.
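Because the background process writes back to the same console, you can watch the collection creation interleaved with Solr's own output when attached, or afterwards with the usual:

docker logs -f <container-id>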
And the data folder now includes the core data and the Zookeeper data:
(The prefix on the collection names here was part of my testing – it wouldn't be necessary in a real deployment of this)
But I'm aware there's a load more to do here:

- The set of collections to create should be configurable – there's a CORE_NAMES variable available to the container, but it's not used by the code above right now.
- Sitecore's connection strings need changes to talk to SolrCloud – adding the ;SolrCloud=true parameter to them, for a start.

So I need to find some more time to work on this – but it's a promising start...