Posted by: Carole-Ann | March 14, 2011

## Rete Algorithm Demystified! — Part 2

In part 1 of this blog series, I covered the Rete Algorithm and its origin and even the origin of its name.  Now as promised, I am going to explain how it works.  It is a challenge to keep it simple not to lose our less technical audience and yet interesting enough for the techies.  I hope I will reach that goal.  I am trying to convince Carlos to do a Part 3 that will speak to the hard-core techies on more advanced considerations.

And without further due, let me introduce the Rete algorithm!

How it works

The brilliant idea that Dr. Charles Forgy developed was to decouple the evaluation of hypothesis from execution sequencing.

Why is that brilliant you might ask?  When large volumes of business rules need to be assessed against facts or data, the constant screening for applicability can be costly — both in terms of evaluation (and re-evaluation) and in terms of ordering the statements properly.  This innovative approach for inference allows savings in both areas.  Let’s explore how.

In absence of this algorithm, or derivatives, the traditional approach is to sequence and nest the IF-THEN-ESE statements of the rules in such a way that you capitalize on the test evaluations — you reuse them as much as possible.  When you think about it, it would be akin to creating the leanest decision tree possible, in a graphical form or more likely in straight code.  This “sequential” approach breaks (mostly in terms of performance) when rules need to execute as a result of the execution of other rules.  This is what we call inference.  In a sequential approach, you would have to restart the complete ruleset evaluation after each rule execution (if rules are prioritized) or after a full pass through the ruleset (in absence of prioritization).  Keep in mind too that prioritization might prevent you from reusing test evaluations all together in a sequential approach: if the conditions are not co-located in the logic, then you will likely have to perform those tests multiple times.  Systems have become much more sophisticated than they used to be back then, and most BRMS products provide a sequential deployment option which is still pretty good when inference is not needed.  The beauty of those products is that you can still manage the rules independently of each other and the sequential engine takes care of optimizing the generated spaghetti code.

For illustration, I’ll use a simple set of rules that we are mostly familiar with: airline miles calculations.  The decision service is responsible for processing the frequent flyer’s activity.

if award miles for last year or current year > 25,000 then status = Silver

if award miles for last year or current year > 100,000 then status = Gold

If flight is less than 500 miles then award 500 miles

If flight is 500 miles or more then award flight miles

if category is business or first then award 50% bonus miles

if status is Gold and airline is not partner then award 100% bonus miles

if status is Silver and airline is not partner then award 20% bonus miles

if member signed up for 3-flights-for-5k-in-March and number of return flights in March 2011 = 3 then award 5,000 additional miles

if status becomes Gold then award 8 upgrade certificates

I will live it up to the creative minds of the Airline marketers to add more promotions to our short list here: hotel or rental car bonuses, incentive programs, lifetime members, accelerator programs, etc.

This example does not use a lot of inference but that is enough to illustrate our point.  If you run through the rules, sequentially — let’s keep the current order — and this Frequent Flyer member reaches Gold status in that very transaction, we need to process the rules once more to update the status and award the proper bonus miles and other incentives.

Inference engines assess all the applicable rules and then fire the first applicable rule, propagate the changes and fire again.  How does that work?

Constructing the Rete Network

The Rete network is the heart of the Rete algorithm.  It is made of nodes that each hold a list of objects that satisfy the associated condition.  The original Rete algorithm worked out of facts but I will simplify the description to refer to objects since all commercial engines have evolved to be object-oriented nowadays.

The Rete network is constructed when you compile a rules project — only once when you start that service (or class for simpler designs) – and then shared across all invocations.

The discrimination tree is the first part of the Rete network.  It starts with Alpha nodes that are associated with Classes (in the object-oriented sense).  All instances of a given class will be listed in any given Alpha node.  The discrimination happens by adding conditions as single nodes attached to the Alpha node or to another parent node.

Let’s review what that means for our Airline example:

Alpha nodes are created for each class: Frequent Flyer Account and Flight.

Conditions are then appended:

• Frequent Flyer
• status is Gold
• status is Silver
• award miles for last year or current year > 25k
• award miles for last year or current year > 100k
• signed up for “3-flights-for-5k-in-March” promotion
• number of flights in March = 3
• Flight
• miles >= 500
• miles < 500
• airline is not a partner
• airline is partner
• category is business or first

etc.

Each node represents an additional test to the series of conditions applied upstream.  If you follow a path from top to bottom, you should be able to read the complete set of conditions THAT APPLY TO ONE GIVEN CLASS OF OBJECTS.

Finally the nodes are connected across classes.  This is where we can combine the conditions for a non-partner flight taken by a Gold member.  We call those “joints” as we will combine the list of objects that verify conditions on one branch with the list of objects that verify the conditions on another branch.

On the performance side, you may have been warned in the past not to combine too many patterns in a single rule.  It is clear looking under the hood that the product of all possible accounts by all possible flights could result into a combinatorial explosion. The more discrimination upfront, the better obviously.

The path eventually end with the action part of the rules.  The content of the actions is irrelevant for the Rete network.  you could replace the labels here with the name of the rule (I did not name the rules in our example so I displayed the actual actions).  The rules can be reconstituted by following the incoming paths.  All nodes are AND-ed.  This is an interesting point. ORs are typically not natively supported: rules are duplicated for each OR-ed flavor.  If you were to add a rule to grant the same bonus for SILVER or GOLD customers traveling to Hawaii then you would just have nodes connected to the existing GOLD and SILVER nodes as if they were written as separate rules.  In terms of performance or maintenance, it does not matter though since the Rete algorithm handles the supposed duplication as efficiently as any manual code would.

It is a great time to emphasize the great savings you would get when the number of rules increases.  Let’s say that you want to introduce a new promotion for GOLD customers to/from specific destinations or for reached milestones (50k and 75k award miles).  The nodes that test for GOLD status do not need to be created or duplicated.  The test as well as the execution of the test is leveraged and reused transparently regardless of the order in which the rules need to be executed.

Rete Cycle: Evaluate

Let’s look now at what happens at runtime when you invoke that service with customer data.  Your design will dictate whether those transactions are applied “real-time” (or on-demand) or whether you want to execute a batch of transactions at once, at night when you have access to your Mainframe window for example.  Rules can actually be used either way and in many projects, they are actually used both ways by two different systems.  For an Insurance underwriting project, you might deploy the same rules online in your self-service website as well as your older batch application that processes the applications coming from your brokers.

Let’s assume that we are processing transactions one by one for simplicity.

The evaluation phase consist in running the data through the Rete network to identify the applicable rules, for which all conditions are satisfied.  The nodes in the network will hold lists of objects that satisfy the conditions in the incoming path.

In our airline example, Joe flies from Washington DC (IAD) to San Francisco (SFO).  Flight accounts for 2,419 miles I recall.  Starting with already 150k award miles, Joe should bank double miles (4,838) for his GOLD status.  How does Rete propagates those facts to make the same decision?

The Account Alpha node references Joe as a Frequent Flyer.  Because of his 150k award miles, he also qualifies for the “> 100k” condition.  He is referenced in this node too.  My handwriting did not allow me to write down Joe in the small real estate so I just highlighted the node with a X.  In reality, Joe would be referenced specifically as we need to keep track of who qualifies in case we process multiple objects at once.  For this exercise, I also assume that service does not store the GOLD status and computes it every time so the associated node still has an empty list.

Similarly, the IAD-SFO flight is referenced in the list of all flights in the transaction.  The flight is more than 500 miles and on the airline.  The associated lists reference the flight accordingly.

At that point in time we do not have any joint to worry about since the account status is not known yet to qualify for GOLD.

2 rules are satisfied.  All facts have been propagated.  We are ready for execution.

Rete Cycle: Execute

The rules for which all conditions are satisfied are said to be active in the agenda.  The agenda contains the list of all rules that should be executed, along with the list of objects that are responsible for the conditions to be true.  I insist on the word “SHOULD”.  The agenda will sort them according to their priorities (and other conflict resolution methods).  If the execution of a rules invalidates a rule with a lower priority, then this second rule will logically never execute.  This would have been the case if Joe had started with 99k award miles – he would have qualified for the SILVER bonus until we bank the IAD-SFO miles and granted him GOLD status.  Joe can qualify for SILVER or GOLD bonus miles but certainly not both.

Priorities are a touchy subject in the Rules community as some purists are set against its use.  We had an interesting discussion on this topic in the community.

The agenda lists the 2 rules (GOLD status and 500+ Flight Miles).  The first one is executed.  In our airline example, order does not matter for those 2 rules.  Let’s assume that flight miles are awarded first.  Joe gets 152,419 miles in his account so far.

Do it again and again…

After the execution of the first rule, facts are propagated again.  Flights have not changed so none of that subtree is re-evaluated.  Joe’s account has changed so the conditions related to award miles are re-assessed, with no change in our example.  The GOLD status rule remains active in the agenda.

We are ready to execute the following rule: GOLD Status which is the only rule left in the agenda.

As a result, the GOLD Status node is updated for Joe, which leads to the GOLD status and not partner node to update as well, which leads to the 100% GOLD bonus rule to become activated in the agenda.

In the end, Joe gets the proper credit of 154,839 in his award miles account.

As you add more rules that apply to GOLD or SILVER customers, the RETE network will grow always reusing the same nodes as much as possible — which is the part that takes time to do properly by hand if you are still coding it manually.  Those spaghetti in the “Joint” part of the algorithm are a nightmare to maintain properly and can cost you significant debug time.  This trivial example can become quickly messy when you consider the many different ways you can qualify for status with eligible segments rather than eligible miles, with accelerators, with lifetime miles, as well as the incentive promotions that you might get by default and those that you need to sign up for.  Reusing those conditions — and their execution — regardless of when and where they show up is a performance boost that largely compensate for the initial overhead when inference is needed.

When inference is not needed and you only go through one cycle of propagation, it is debatable of course.

In my opinion, Charles opened the door for an algorithmic solution to properly ensuring the enforcement of many rules in an uncertain order.  In other words, he created an approach where very large volumes of rules could be assessed in a short time — much faster than any other algorithms when rules need to be executed repeatedly on a given data set. The further performance improvements he introduced in Rete II, Rete III and lately in Rete-NT have also changed the initial “ideal” characteristics of a Rete project to better support much larger data set — what was marketed as the Rete wall by the non-Rete vendors.

It is somewhat surprising that no new algorithm break-through (other than Charles’s) were made since then.  The basis for Business Rules execution is vastly based on Rete, or sequential, but hardly anything else.  It reminds me of the sharks and their evolution.  I remember, in my life science days, learning that sharks have not evolved at all in a very long time… likely because they are perfectly adapted to their environment.

## Responses

1. I am just noticing now that Rete is our 100th post on this blog! We are excited about the milestone. Thank you all for your continued readership!

• how do we actually implement the RETE algorithm? like in JAVA?

• Hi Sabbir,

BRMS vendors have implemented the Rete algorithm in Java and .NET, most frequently. Now having end-users do your own implementation of the algorithm is not what I recommend of course. Use a commercial engine.

I would not build my own database engine… It would be the same for the inference engine…

The reason I wanted to share details on how it works, is that, if you understand that part, then you will most likely better leverage it, better tune it, and use it for what it is good at.

Many people are passionate about their cars, and study how the engine works. Very few build their own…

I am evidently biased, but I think it is a sound advice nevertheless.

Carole-Ann

2. Carole-Ann,… Just a quick comment. I think educational posts, such as this, are a great benefit to the industry… to practitioners looking for deeper understanding of the fundamentals of business rules engines and decision management technologies, and also to adopting companies struggling for insights beyond the ever present marketing fluff. I hope that you continue with such posts. Thanks!

• Thanks, Michael! I am glad that you enjoy our contribution to the community of practitioners. We love sharing what we have learned over our many many years in that space. Feel free to shoot questions at us too!

3. Okay… Since you invited questions… and risking that this will show I have missed something foundational… which may lead to another post as a reply (which is cool ;-))…

While I realize that for some reason there is much less press these days on Complex Event Processing than there was just a year ago, the ‘need’ resonated with me. I am curious how you see the Rete algorithm, a condition/action approach in my understanding, addresses, or ‘can’ address, an event/action rules. Is this even an algorithm question… or is it beyond the algorithm and more up to the external mechanisms of ‘feeding’ the conditions to ‘an’ algorithm?

• Interesting question. It looks like we have our topic identified for part 3!

4. I have missed this post for 2 days. How can I?

There is at least one challenger for RETE: The TREAT algorithm. There were many papers on the comparison RETE vs. TREAT. There are also several other variants.

Each RETE is a specific one, many places in the RETE can be tailored or optimized.
For example, you do not always need an entire agenda. You do not need to always keep a join memory, and the test sharing can be more or less sophisticated.

It is important to understand the distinction between alpha node (or intra-object tests) and join-nodes (or inter objects tests). Intra-object tests are those which test the object against a constant (the person has age > 23), while inter-objects tests involve 2 objects or more. It is proven that intra-object tests are 4-5 times faster to execute than inter-objects tests, and this is why they are performed first and serve as “discrimination” to avoid doing too much evaluation.

• Changhai,

What about (person’s billing zip code == person’s shipping zip code)? Is this an intra-object test?

Thanks.

Kenny

• This is an inter-objects test. This kind of test can be significantly optimized using an index over each memory, and by looking at the right place.

• I might have responded to Kenny too quickly.

If you are dealing with a same person (the rule language will tell if this is the same variable, for example), then this is an intra-object test.

If you are doing pattern matching on people’s zipcodes in general, the test is rather an inter object one, and my previous reply applies.

In all the cases, there is a way to organize the engine’s memories such that this kind of equality tests can be performed very fast.

• Thanks Changhai.

You’ve read my mind. I was going to say that no matter it’s intra- or inter-object tests, there should be ways to optimize the speed of matching. :) Thanks.

5. I have couple of questions from the above explanation:

1) In the example, we assumed that flight miles rule will execute first from list of the 2 rules in agenda (GOLD status AND 500+ Flight Miles). After the flight miles execution, we understand that the GOLD status rule remains active in the agenda. Since Joe’s account has changed so the conditions related to award miles are re-assessed, in turn GOLD status rule is satisfied again, here my question is, Is the algorithm duplicates the ACTION node of GOLD status rule for execution? OR how it happens?

2) We understand that in a result of flight miles rule execution the award miles for Joe have been changed which leads to re-assessing of condition nodes involving award miles which leads to GOLD status rule and 100% GOLD bonus rule. here my question is while executing the 100% GOLD bonus rule how the the engine distinguish the actual award miles property value (150k) from the changed value (152,419) from the flight miles rule execution to set the final award miles value?

• Phani,

For your first question, the algorithm only reacts to state change in the node. the GOLD Status rule condition node was true and hence executed (150k). when the additional flight miles are credited, the state of that node is STILL true (152+k) so the rule does NOT fire again.

If we had started with 99k award miles then the SILVER Status rule would have fired initially — with the 25k condition node set at true and the 100k condition node set at false. After the credit of flight miles, the states of those 2 nodes would have been re-evaluated as a result of propagation and would have changed state. Then the GOLD Status rule would have overridden the status to GOLD.

On your second question… I am not sure what you mean by “distinguishing”… The flight rule adds the flight mileage to the award miles. The GOLD bonus rule action would likely be written as “award miles = award miles + flight miles” (it would be up to the marketer to decide if they would actually give at least 500 award miles for short trips). Does that make sense?

Carole-Ann

• Great article! (and I am a little late to the party) I am a little confused though, some of the Award Miles terminal nodes, don’t appear to have a link node that propagates the account to them for the award to be applied (miles > 500 for example). I’m wondering how that part works. If the terminal node only receives a flight and not a token(flight + account) then how do these terminal nodes change the account? I’m sure I’m overlooking something really simple, still getting my head around RETE.

• MF, great catch. You would likely need to link to the Account to have access to it in the action side. In this example, I could claim that we are only processing one account at the time and have a global variable pointing to it, but that would be ugly, wouldn’t it?

6. Yes Carole-Ann. I understood the explanation.

7. Hello,

Thanks for this refreshing blog, I bookmarked it on my Rete references :)

I’m not sure you are allowed to speak about this, but as FICO has extensive cooperation with Dr Forgy, will Rete-NT be incorporated in the forthcoming Blaze Advisor 7?

Paul

• Paul, thank you very much for the bookmark! It is always a pleasure to know we provide information that is helpful.

I left FICO over a year ago so I am not privy to their roadmap any longer. All I can say is that they have terminated their agreement with Charles in 2010.

8. [...] Our RETE Series: Part 1, answer and Part 2! [...]

9. Very good post! So when the part 3 is coming?

• Thank you! Soon if you twist my arm ;-)

10. [...] Rete Algorithm Demystified blog series attracted a huge crowd.  I want to thank you for your readership!  Following the [...]