
How to replatform Endeca rules to Solr


In the previous article, we discussed the Endeca rules model and explained how to re-implement it using Elasticsearch. We needed inverted search to trigger our rules, and we leveraged Elasticsearch's powerful percolator feature, which greatly simplified the implementation. In this blog post, we will discuss how to approach the implementation of Endeca rules if you are running Solr.

Unfortunately, Solr does not currently offer percolator-like functionality. We believe it will be available soon, as support for this functionality has already been merged into Lucene 8.2. Meanwhile, we can employ an alternative approach and implement inverted search purely with Solr queries. We will use the same example as in the previous article for illustration.

Quick refresher on Endeca triggers

Firstly, let’s recall the particular trigger types that we will have to implement:

Match phrase: the search phrase contains the search terms sequentially, in strict order, but may also contain other words before or after them.

Example: The rule is configured with search terms = “how to”. The search phrase “how to make an order” will trigger this rule. At the same time, the search phrase “how can I get to the store” will not trigger the rule.

Match all: the search phrase contains all search terms in any order with optional additional words in any position.

Example: The rule is configured with search terms = “oven best pizza”. The search phrase “what is the best oven for cooking pizza” will trigger this rule.

Match exact: the rule will be triggered if and only if the search phrase is exactly equal to the search terms. No additional words are allowed.

Example: The rule is configured with search terms “order status”. Only the search phrase “order status” will trigger this rule, not any other.

We will use these triggers as our example, and we will stick with the default Solr configuration for simplicity. So, let's roll up our sleeves and get inverted search up and running!

First, after launching Solr, we need to create a new core/collection named rules. We can do this from the core admin page or from the terminal by executing the bin/solr create -c rules command.

Solr rules model

We are going to use the same logical rule structure as in the previous post: the rule is modeled as a parent document, with triggers represented as child documents. So how will the example from the previous post look in Solr?

In this post, we will extend our simple rule engine with two essential features: rule collapsing and sorting. Rule collapsing refers to the situation when multiple rules of the same type fire and we have to select the one with the highest priority (represented by the lowest priority number).

Let’s start with the basic rule structure. Note that we use the *_i, *_s and *_t suffixes to map to the integer, string and text field types respectively.

{
 "id": "1",
 "priority_i": 1, // 1
 "action_t": "<some serialized action>", // 2
 "actionType_s": "<REDIRECT/FACET/BOOST/BURY...>",
 "scope_s": "rule",
 "_childDocuments_": [
   {
     "id": "1",
     "keyword_s": "<phrase to be triggered on>",
     "keyword_t": "<phrase to be triggered on>", // 3
     "keyword_words_count_i": <integer: count of words in the keyword field>, // 3
     "matchmode_s": "<MATCHEXACT/MATCHPHRASE/MATCHALL>",
     "scope_s": "trigger"
   },
   {
     .....
   }
 ]
}
  1. Priority, used to sort rules of the same type during collapsing. The highest priority is represented by the lowest value: when multiple rules of the same type fire, the rule with the lowest priority value is selected.
  2. The action payload. In the case of REDIRECT it is a URL; for other, more complicated rules it can be JSON with a serialized rule object.
  3. Technical derivatives of the keyword field. These are usually created on the Solr side using custom Update Request Processors, but in this example (for simplicity) we produce the derivatives on the client side.
    1. keyword_s – untokenized keyword representation, used for MATCHEXACT and MATCHPHRASE triggers.
    2. keyword_t – tokenized keyword representation, used for MATCHALL trigger matching.
    3. keyword_words_count_i – the number of words in the keyword, used for MATCHALL trigger matching.
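Producing these derivatives on the client side can be sketched in a few lines. The following Python helper is an illustration under the assumptions above (whitespace tokenization, derivative field names as defined in this post), not part of any Solr API:

```python
def build_trigger_fields(keyword):
    """Derive the trigger child-document fields from a raw keyword.

    In production these derivatives would typically be produced by a
    custom Update Request Processor on the Solr side; here, as in the
    article, we compute them on the client for simplicity.
    """
    tokens = keyword.split()  # simple whitespace tokenization
    return {
        "keyword_s": keyword,                  # untokenized: MATCHEXACT/MATCHPHRASE
        "keyword_t": keyword,                  # tokenized copy: MATCHALL
        "keyword_words_count_i": len(tokens),  # expected match count for MATCHALL
    }

fields = build_trigger_fields("oven best pizza")
```

The resulting dictionary can be merged into the trigger child document before indexing.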

Now, let's convert our sample rules into the Solr input structure:

{
 "id": "1",
 "priority_i": 2,
 "action_t": "http://retailername.com/FAQ",
 "actionType_s": "REDIRECT",
 "scope_s": "rule",
 "_childDocuments_": [
   {
     "id": "tr1",
     "keyword_s": "how to",
     "matchmode_s": "MATCHPHRASE",
     "scope_s": "trigger",
     "keyword_t":"how to",
     "keyword_words_count_i":2
   }
 ]
},
{
 "id": "2",
 "priority_i": 1,
 "action_t": "http://retailername.com/orders",
 "actionType_s": "REDIRECT",
 "scope_s": "rule",
 "_childDocuments_": [
   {
     "id": "tr2",
     "keyword_s": "order status",
     "matchmode_s": "MATCHEXACT",
     "scope_s": "trigger",
     "keyword_t":"order status",
     "keyword_words_count_i":2
   }
 ]
},
{
 "id": "3",
 "priority_i": 1,
 "action_t": "http://retailername.com/top10ovens",
 "actionType_s": "REDIRECT",
 "scope_s": "rule",
 "_childDocuments_": [
   {
     "id": "tr3",
     "keyword_s": "oven best pizza",
     "matchmode_s": "MATCHALL",
     "scope_s": "trigger",
     "keyword_t": "oven best pizza",
     "keyword_words_count_i": 3
   }
 ]
}

After indexing, our core will be filled with the sample rules.

Matching rules using Solr queries

Since we have three different match modes, we need to build our inverted search as a disjunctive boolean query. We will show you the final result first and then walk you through every part of the query.

Let’s use the keyword “how to cook” as an example. Below is the complete request that matches rules against the “how to cook” user keyword.

http://localhost:8983/solr/rules/select?exactQuery=keyword_s:"how to cook"&fq={!collapse field=actionType_s sort='priority_i asc'}&matchAllQuery={!frange l=0 u=0 incl=true incu=true v='sub(sum(max(0, query({!lucene v="keyword_t:how^=1"})),max(0, query({!lucene v="keyword_t:to^=1"})),max(0, query({!lucene v="keyword_t:cook^=1"}))),field(keyword_words_count_i))'}&phraseQuery=keyword_s:"how to" OR keyword_s:"to cook"&q={!parent which=scope_s:rule v=$triggerQuery}&triggerQuery=+(({!lucene v=$exactQuery} AND filter(matchmode_s:MATCHEXACT)) OR ({!lucene v=$phraseQuery} AND filter(matchmode_s:MATCHPHRASE)) OR ({!lucene v=$matchAllQuery} AND filter(matchmode_s:MATCHALL))) AND filter(scope_s:trigger)

As you can see, this request correctly returns rule no. 1, associated with the matchPhrase trigger configured on “how to”.

So, let's analyze all the parts of this complex query.

http://localhost:8983/solr/rules/select? 

This is a request to the regular select request handler.

q={!parent which=scope_s:rule v=$triggerQuery}&  

The ToParentBlockJoinQuery is needed to match a rule (parent document) by its matched triggers (child documents).

triggerQuery=+(({!lucene v=$exactQuery} AND filter(matchmode_s:MATCHEXACT))  OR ({!lucene v=$phraseQuery} AND filter(matchmode_s:MATCHPHRASE))  OR ({!lucene v=$matchAllQuery} AND filter(matchmode_s:MATCHALL)))  AND filter(scope_s:trigger)&  

This is the main query for matching triggers. As you can see, it is a disjunction with three clauses, one per match mode. The specific queries for each type are extracted into separate nested params: exactQuery, phraseQuery and matchAllQuery.

exactQuery=keyword_s:"how to cook"& 

The matchExact query is very straightforward: we just check whether the keyword field content is exactly the same as the user's query. As we are only looking for an exact match, the untokenized string field is used.

phraseQuery=keyword_s:"how to" OR keyword_s:"to cook"& 

The matchPhrase query. Here we need to cut all possible n-grams from the user search phrase. As our example keyword is very short, we have only two n-grams: “how to” and “to cook”. Using this approach, we match only those triggers whose keyword is a subphrase of the user keyword.
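Generating those n-grams on the client side is mechanical. The following Python sketch (an illustration with hypothetical helper names) emits all contiguous sub-phrases of length 2 up to n-1, matching the examples in this post: unigrams are excluded as too noisy, and the full phrase is excluded because the matchExact clause already covers it.

```python
def sub_phrases(keyword):
    """All contiguous sub-phrases of length 2 .. n-1 of the keyword."""
    tokens = keyword.split()
    n = len(tokens)
    return [
        " ".join(tokens[i:i + size])
        for size in range(2, n)         # sub-phrase lengths 2 .. n-1
        for i in range(n - size + 1)    # all start positions for that length
    ]

def phrase_query(keyword):
    """Build the phraseQuery disjunction over keyword_s."""
    return " OR ".join('keyword_s:"%s"' % p for p in sub_phrases(keyword))
```

For “how to cook”, `phrase_query` yields exactly the clause shown above; for “best oven for pizza”, it yields the five sub-phrases used in the request later in this post.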

matchAllQuery={!frange l=0 u=0 incl=true incu=true v='sub(sum(max(0, query({!lucene v="keyword_t:how^=1"})),max(0, query({!lucene v="keyword_t:to^=1"})),max(0, query({!lucene v="keyword_t:cook^=1"}))),field(keyword_words_count_i))'}&

The matchAll query is the trickiest one, as it is what leads to the inverted search problem. We will discuss it separately to properly explain all the details.

fq={!collapse field=actionType_s sort='priority_i asc'}

A collapse filter query, so that we fetch only one rule of each type: the one with the lowest priority value.
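Assembling the whole request by hand is error-prone because of the local-params syntax and URL escaping. As a sketch (not part of the article's original tooling; host, port and core name follow the examples here and may differ in your environment), the parameter set for the “how to cook” example can be built like this, letting urlencode handle the percent-escaping:

```python
from urllib.parse import urlencode

trigger_query = (
    "+(({!lucene v=$exactQuery} AND filter(matchmode_s:MATCHEXACT)) "
    "OR ({!lucene v=$phraseQuery} AND filter(matchmode_s:MATCHPHRASE)) "
    "OR ({!lucene v=$matchAllQuery} AND filter(matchmode_s:MATCHALL))) "
    "AND filter(scope_s:trigger)"
)

# frange function query for the MATCHALL clause (explained below)
match_all = (
    "{!frange l=0 u=0 incl=true incu=true v='sub(sum("
    'max(0, query({!lucene v="keyword_t:how^=1"})),'
    'max(0, query({!lucene v="keyword_t:to^=1"})),'
    'max(0, query({!lucene v="keyword_t:cook^=1"}))'
    "),field(keyword_words_count_i))'}"
)

params = {
    "q": "{!parent which=scope_s:rule v=$triggerQuery}",
    "triggerQuery": trigger_query,
    "exactQuery": 'keyword_s:"how to cook"',
    "phraseQuery": 'keyword_s:"how to" OR keyword_s:"to cook"',
    "matchAllQuery": match_all,
    "fq": "{!collapse field=actionType_s sort='priority_i asc'}",
}

url = "http://localhost:8983/solr/rules/select?" + urlencode(params)
```

Sending a GET to this URL reproduces the request shown above.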

Matchall query

Formally speaking, a matchAll query means that we have to find rules where the tokens configured in the trigger are a subset of the tokens in the user query. We don't know which tokens will match, but we know that the number of matched tokens should be exactly equal to the total number of tokens in the trigger.

We conveniently store the number of tokens in the keyword in the field keyword_words_count_i.

We will use the Solr function query framework to perform this precise matching. Function queries were designed for match scoring, but with some simple tricks we can use them for precise filtering as well:

{!frange l=0 u=0 incl=true incu=true v=' //5
sub( // 4
sum( // 2
max(0, query({!lucene v="keyword_t:how^=1"})),// 1
max(0, query({!lucene v="keyword_t:to^=1"})),
max(0, query({!lucene v="keyword_t:cook^=1"}))
),
field(keyword_words_count_i)) // 3
'}

We will unwind this query from inside out, so follow the numbers in the listing:

  1. This clause returns a score of 1 if the term is present in the trigger. We count every match as a score of 1.
  2. We sum all the scores, getting the total number of matches.
  3. We retrieve the value of the field keyword_words_count_i, which contains the expected number of matches.
  4. We subtract the expected number of matches from the actual number of matches. If we get zero, the rule should be retrieved.
  5. We use frange to create a range query over the inner function value, setting both the upper and lower bounds to 0 so that only zero results are selected.
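The steps above are purely mechanical, so the matchAllQuery param can be generated from the user tokens. The following Python helper is a sketch under the same assumptions as before (whitespace tokenization, field names from this post), with a hypothetical function name:

```python
def match_all_query(keyword):
    """Assemble the frange function query for MATCHALL trigger matching.

    Each user token contributes a constant score of 1 (^=1) when present
    in keyword_t; the sum of matches minus keyword_words_count_i must be
    exactly 0 for the trigger to fire.
    """
    clauses = ",".join(
        'max(0, query({!lucene v="keyword_t:%s^=1"}))' % token
        for token in keyword.split()
    )
    return (
        "{!frange l=0 u=0 incl=true incu=true "
        "v='sub(sum(%s),field(keyword_words_count_i))'}" % clauses
    )
```

For “how to cook”, this reproduces the matchAllQuery param shown earlier character for character.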

That’s it. Now we are able to perform inverted search and match our matchAll triggers.

Let's consider some more examples.

The request for the “order status” keyword, which correctly matches rule no. 2, associated with the matchExact trigger configured on the phrase “order status”, goes as follows:

http://localhost:8983/solr/rules/select?exactQuery=keyword_s:%22order%20status%22&fq={!collapse%20field=actionType_s%20sort=%27priority_i%20asc%27}&matchAllQuery={!frange%20l=0%20u=0%20incl=true%20incu=true%20v=%27sub(sum(max(0,%20query({!lucene%20v=%22keyword_t:order^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:status^=1%22}))),field(keyword_words_count_i))%27}&phraseQuery=keyword_s:%22order%20status%22&q={!parent%20which=scope_s:rule%20v=$triggerQuery}&triggerQuery=+(({!lucene%20v=$exactQuery}%20AND%20filter(matchmode_s:MATCHEXACT))%20OR%20({!lucene%20v=$phraseQuery}%20AND%20filter(matchmode_s:MATCHPHRASE))%20OR%20({!lucene%20v=$matchAllQuery}%20AND%20filter(matchmode_s:MATCHALL)))%20+filter(scope_s:trigger)

The request for the “best oven for pizza” keyword, which correctly matches rule no. 3, associated with the matchAll trigger configured on the word set “oven best pizza”, goes as follows:

http://localhost:8983/solr/rules/select?exactQuery=keyword_s:%22best%20oven%20for%20pizza%22&matchAllQuery={!frange%20l=0%20u=0%20incl=true%20incu=true%20v=%27sub(sum(max(0,%20query({!lucene%20v=%22keyword_t:best^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:oven^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:for^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:pizza^=1%22}))),field(keyword_words_count_i))%27}&phraseQuery=keyword_s:%22best%20oven%22%20OR%20keyword_s:%22oven%20for%22%20OR%20keyword_s:%22for%20pizza%22%20OR%20keyword_s:%22best%20oven%20for%22%20OR%20keyword_s:%22oven%20for%20pizza%22&q={!parent%20which=scope_s:rule%20v=$triggerQuery}&triggerQuery=+(({!lucene%20v=$exactQuery}%20AND%20filter(matchmode_s:MATCHEXACT))%20OR%20({!lucene%20v=$phraseQuery}%20AND%20filter(matchmode_s:MATCHPHRASE))%20OR%20({!lucene%20v=$matchAllQuery}%20AND%20filter(matchmode_s:MATCHALL)))%20AND%20filter(scope_s:trigger)&fq={!collapse%20field=actionType_s%20sort=%27priority_i%20asc%27}

We can also consider the keyword “how to oven best pizza”, which matches both the “how to” matchPhrase trigger and the “oven best pizza” matchAll trigger. However, because of the collapsing filter query (fq), we get only rule no. 3, which has the higher priority (the lower priority value).

http://localhost:8983/solr/rules/select?exactQuery=keyword_s:%22how%20to%20oven%20best%20pizza%22&matchAllQuery={!frange%20l=0%20u=0%20incl=true%20incu=true%20v=%27sub(sum(max(0,%20query({!lucene%20v=%22keyword_t:how^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:to^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:oven^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:best^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:pizza^=1%22}))),field(keyword_words_count_i))%27}&phraseQuery=keyword_s:%22how%20to%22%20OR%20keyword_s:%22to%20oven%22%20OR%20keyword_s:%22oven%20best%22%20OR%20keyword_s:%22best%20pizza%22%20OR%20keyword_s:%22how%20to%20oven%22%20OR%20keyword_s:%22to%20oven%20best%22%20OR%20keyword_s:%22oven%20best%20pizza%22%20OR%20keyword_s:%22how%20to%20oven%20best%22%20OR%20keyword_s:%22to%20oven%20best%20pizza%22&q={!parent%20which=scope_s:rule%20v=$triggerQuery}&triggerQuery=+(({!lucene%20v=$exactQuery}%20AND%20filter(matchmode_s:MATCHEXACT))%20OR%20({!lucene%20v=$phraseQuery}%20AND%20filter(matchmode_s:MATCHPHRASE))%20OR%20({!lucene%20v=$matchAllQuery}%20AND%20filter(matchmode_s:MATCHALL)))%20AND%20filter(scope_s:trigger)&fq={!collapse%20field=actionType_s%20sort=%27priority_i%20asc%27}

Conclusion

In this blog post, we discussed the trickiest part of an Endeca rule migration: the matchAll trigger implementation. A full-fledged implementation should also include other aspects, such as:

  • Matching rules by selected filters. We split trigger matching into two cases based on the exactLocation flag: the exactLocation=true implementation is trivial, while the exactLocation=false case is very similar to the inverted search approach used for matchAll trigger matching.
  • Matching default rules.
  • Applying normalizations, such as spelling correction, to ensure that if the “jeans” product is retrieved by the misspelled “jeanz” phrase, the rule configured for “jeans” fires as well.
  • Splitting triggers by Browse and Search navigation types.

Happy searching!
