I had to deal with a bug report in some Sitecore 6.6 / Advanced Database Crawler search code recently, relating to items with publishing restrictions not disappearing from search results until another publish occurred. It struck me that there's not much written about how publishing restrictions interact with search, so I figured I should take a bit of time to write down what I'd found while sorting the bug.
Clicking the "Change" button on the "Publish" ribbon opens the Publishing Restrictions dialog. The editor can choose between the "version" and "item" tabs depending on which sort of restriction they wish to set. In both cases, the editor can choose whether the item should ever be published (with the checkbox) or set a date/time range for publishing with the date and time selectors.
If you set "do not publish" on an item via the Publishing Restrictions dialog then you will not get a version in the Web Database – hence you won't get an index entry for that item. Also, items with future publishing dates appear to be ignored by the indexer.
The items that are added to an index can also be affected by the configuration for indexing – you may set up an index to receive items from only one database for example.
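When an item is indexed, the restriction fields go into Lucene fields named after the Sitecore fields. The two that matter for the code later in this post are __unpublish (the item-level expiry) and __valid to (the version-level expiry).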
No times go into Lucene, so you can only filter search results on a "per day" basis. It's also worth noting that if you don't set a value for a publishing restriction then the value added to the index is "00010101", which is 01/01/0001 in Lucene's format.
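To make the "per day" point concrete, here is a minimal sketch in plain C# that mimics the day-resolution strings without referencing Lucene at all. The yyyyMMdd format is an assumption inferred from the "00010101" value above, and the DayResolutionDates class and its method names are made up for illustration; they are not part of Sitecore or Lucene.Net.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only: mimics the day-resolution
// date strings described above without depending on Lucene.Net.
public static class DayResolutionDates
{
    // Assumed format, inferred from the "00010101" value quoted in the post.
    private const string Format = "yyyyMMdd";

    // Formats a date the way the index appears to store it: the time of day is dropped.
    public static string ToIndexDay(DateTime value)
    {
        return value.ToString(Format, CultureInfo.InvariantCulture);
    }

    // True when the indexed value is the "no date" marker (01/01/0001).
    public static bool IsUnset(string indexedValue)
    {
        return indexedValue == ToIndexDay(DateTime.MinValue);
    }

    // Strings in this format sort in date order, which is what makes
    // a term range comparison on them meaningful.
    public static bool IsBefore(string left, string right)
    {
        return string.CompareOrdinal(left, right) < 0;
    }
}
```

Two restrictions set for 09:00 and 17:30 on the same day produce the same string here, so a filter built on these values cannot tell them apart.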
    using System;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Search;

    // Extension methods must be static and live in a static class.
    // The class name here is only illustrative.
    public static class PublishingRestrictionQueryExtensions
    {
        private const string _unpublishedField = "__unpublish";
        private const string _validToField = "__valid to";

        // The value indexed when no restriction date has been set (01/01/0001)
        private static readonly string _noDate =
            DateTools.DateToString(new DateTime(1, 1, 1), DateTools.Resolution.DAY);

        // An arbitrary date far enough in the future to act as an upper bound
        private static readonly string _futureDate =
            DateTools.DateToString(new DateTime(2100, 1, 1), DateTools.Resolution.DAY);

        public static void AddPublishingRestrictionsTerm(this BooleanQuery query)
        {
            string today = DateTools.DateToString(DateTime.Now, DateTools.Resolution.DAY);

            BooleanQuery clause = new BooleanQuery();

            // clause for __unpublish: no date set, or a date after today
            BooleanQuery unpubTerm = new BooleanQuery();
            unpubTerm.Add(new TermQuery(new Term(_unpublishedField, _noDate)), BooleanClause.Occur.SHOULD);
            unpubTerm.Add(new TermRangeQuery(_unpublishedField, today, _futureDate, false, true), BooleanClause.Occur.SHOULD);
            clause.Add(unpubTerm, BooleanClause.Occur.MUST);

            // clause for __valid to: no date set, or a date after today
            BooleanQuery validToTerm = new BooleanQuery();
            validToTerm.Add(new TermQuery(new Term(_validToField, _noDate)), BooleanClause.Occur.SHOULD);
            validToTerm.Add(new TermRangeQuery(_validToField, today, _futureDate, false, true), BooleanClause.Occur.SHOULD);
            clause.Add(validToTerm, BooleanClause.Occur.MUST);

            // both field clauses must match for a result to be returned
            query.Add(clause, BooleanClause.Occur.MUST);
        }
    }
The _noDate and _futureDate fields declare two values that will be used for comparisons later. As mentioned before, Lucene stores "no date" as 01/01/0001, and we can format that appropriately with the DateTools.DateToString() method. The future date is an arbitrary value in the far future.
In the AddPublishingRestrictionsTerm() method, we add two new clauses to the search. Both must evaluate as true for a result to be returned. Both clauses follow the same code pattern, but refer to different index fields. To cover both Item and Version expiries, we need to look at both the __unpublish and __valid to fields.
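For each of these we test two things. First, is the value of the field equal to the "no date" value? Second, is the value of the field after today and no later than our "future" date? If either of these is true, this field is valid for display. Note that the lower bound of the range is exclusive, so an item whose expiry date is today is already filtered out.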
Applying these clauses should mean that once an item or version's publishing restrictions expire, it will no longer be included in search results.
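The per-field test described above can also be sketched as a plain C# predicate, which is handy for checking the logic without building an index. This is only an illustration under stated assumptions: the RestrictionCheck class and its method names are hypothetical, the date strings are assumed to be in the yyyyMMdd form shown earlier, and ordinal string comparison stands in for the term ordering that the range query relies on.

```csharp
using System;

// Hypothetical stand-in for the query logic, for illustration only.
// It mirrors the two SHOULD tests per field and the MUST across both fields.
public static class RestrictionCheck
{
    private const string NoDate = "00010101";     // value indexed when no date is set
    private const string FutureDate = "21000101"; // same arbitrary upper bound as the query

    // A single field passes if it has no date, or a date in (today, FutureDate].
    public static bool FieldAllows(string indexedValue, string today)
    {
        if (indexedValue == NoDate)
        {
            return true;
        }

        bool afterToday = string.CompareOrdinal(indexedValue, today) > 0;         // exclusive lower bound
        bool withinUpper = string.CompareOrdinal(indexedValue, FutureDate) <= 0;  // inclusive upper bound
        return afterToday && withinUpper;
    }

    // An entry is shown only when both the item and version expiry fields pass.
    public static bool IsVisible(string unpublish, string validTo, string today)
    {
        return FieldAllows(unpublish, today) && FieldAllows(validTo, today);
    }
}
```

With today as "20130501", an entry with neither date set passes, one whose __unpublish value is "20130502" passes, and one whose __valid to value is "20130430" does not.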