Friday, July 30, 2010

Business rules and rules engines, oh my

This is a bit of a meandering post – thoughts and ideas that I am putting down with regards to our use of a rules engine within our environment. For the past five years I have been involved in initially using, and later promoting the use of, a third-party rules engine within the company that I am at (I became the rules ‘evangelist’ within the company due to the work that was done within the rules environment). The rules engine (no, not one of the open source ones) was not a big name player, but it was flagged as an ‘up and coming’ challenger in the business rules engine space. Unfortunately I am not sure whether one can use the name of the engine. The engine provided not only a RETE engine, but also some BRMS capability.

Unfortunately, a couple of years ago the company that built the rules engine was bought out by a very large corporate (no, not the one that starts with ‘O’ and ends in ‘racle’ – for a change it was another one. Think ‘Tree fluid’). As it was a no-no to do business with said big conglomerate, our version of the rules engine has never been updated. However, system development and investment in rules technology continued, with – at some stage – a conversion to another rules engine in the making.

So, over the past five years there has been a big learning curve when it comes to the business rules engine space. Before I started at my employer, a business rules engine was something one would occasionally read about, with some seemingly lofty promises attached. I had played with Drools and Jess, dabbled with expert systems, thought to myself that it was sweet and that there would be some place to use it – but I never really saw the need for one. And then I joined my current employer, who was ‘using’ a rules engine in a very interesting(*) manner.

Rules were developed in Java code – originally in neatly segmented rules classes, but with copy/paste sickness came a spread of the rules all through the code base. Some (yes, only some) of the rules were called from the rules engine, as it could invoke methods on Java classes supplied to it. We thus had a system where a third-party rules engine was called, and that immediately called back into the very application that had called it, to execute logic and return the outcome to the rules engine. The outcome would then be returned to the calling application that had calculated it in the very first place… Yes, this did not make sense at all. Just because the rules engine being used could integrate nicely into a Java environment and execute Java code does not mean that one should abuse it like this – there was no clear separation of concerns and no use of the power of RETE.

As architect I faced a choice – either get rid of the rules engine and the superfluous calls to it, or use it properly. A project was identified to try out the rules technology and see whether it could assist us. The outcome was an embracing of the rules engine as a concept, and after five years the rules engine has become a cornerstone of the architecture of the system. And no, rules are not simply ‘shells’ over Java code – a couple of simple standards have been instituted to combat this and to help us with our use of the rules engine.

What standards were introduced?
  • Minimise the calling of pure Java code from within the rules engine. Unfortunately we couldn’t get all the way to a strict ‘do not call Java code in the rules engine’ approach. However, we try to keep it to a minimum, limiting it to convenience methods that make it easier to obtain data. What is not allowed, for example, is manipulation of flows or the saving (or retrieving) of data within a rule. The question that should arise out of this is: how were the rules used then, if business requires a rule to direct information to a specific group? This leads to the next point…
  • Rules determine outcomes only; they do nothing more. The rules engine is one component of the system, and we needed components to be able to communicate with each other without necessarily knowing of each other. What we did on the rules side is that the rules define outcomes, where each outcome has a unique identifier. The identifier is a grouping code that other components can use as an input to react to. This allowed us to separate the business rules, and the effect (outcome) of the business rules, from the actual components that react to them and do the hard work. It also means that if there is a requirement change (send information to a different target, for example) an outcome can be added, removed, or its unique identifier changed.
  • Use of decision tables. Way back in the past rules engines used only the RETE algorithm, and if…then… rules were the mainstay. Most rules engines these days implement decision tables in one form or another. What made them particularly attractive in our environment was the integration with Excel – a tool that a business user understands. We ‘trained’ some of our business users to define rules in an Excel spreadsheet that we could import into, and export from, the business rules engine.
  • Use of flow rules. Another new addition, and something which appealed to the modeler in me. Flow rules allowed us to graphically model flows and the order in which rulesets are to be called. After a meeting with the architects of the rules engine company, we also started using flow rulesets as the entry point to all our rules projects – thus shielding internal changes to rulesets from the outside world (in this case, other components in the system). We could therefore change ruleset names without affecting our calling code.
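The outcome-identifier idea from the second point can be sketched in plain Java. This is only an illustration of the pattern, not our engine's (proprietary) API – every class and outcome name below is hypothetical:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the 'rules determine outcomes only' pattern. The rules engine
// produces outcome codes; components register for the codes they care about
// and never see the rules themselves. All names are hypothetical.
public class OutcomeDispatcher {

    // A component that reacts to an outcome code.
    public interface OutcomeHandler {
        void handle(String outcomeCode);
    }

    private final Map<String, List<OutcomeHandler>> handlers =
            new HashMap<String, List<OutcomeHandler>>();

    // Components subscribe to the outcome codes they react to.
    public void register(String outcomeCode, OutcomeHandler handler) {
        List<OutcomeHandler> list = handlers.get(outcomeCode);
        if (list == null) {
            list = new ArrayList<OutcomeHandler>();
            handlers.put(outcomeCode, list);
        }
        list.add(handler);
    }

    // Called with the outcome codes the rules engine produced for a request;
    // codes nobody has registered for are simply ignored.
    public void dispatch(List<String> outcomeCodes) {
        for (String code : outcomeCodes) {
            List<OutcomeHandler> list = handlers.get(code);
            if (list == null) {
                continue;
            }
            for (OutcomeHandler h : list) {
                h.handle(code);
            }
        }
    }
}

A requirement change such as ‘send this information to a different group’ then becomes either a change to the outcome a rule emits, or a change to which component registers for it – neither side knows about the other.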


We learned some lessons over the five years of active usage. What were the top ones?
  • Business ownership. This is an aspect that we are – in all honesty – still battling with. Many business rules engines sell themselves on the promise that business can define rules and thus take ownership of 'their work'. We did not experience this, and are still trying to get business to take ownership of – and even understand – their rules. This may sound strange, but I have been in meetings where you hear ‘What are the rules in the system?’ or ‘If we define it, what would you guys do?’. It has certainly improved: rule definitions are being shifted to more business-focused people (the business analysts, for example). It will be a long road still. Other things that have helped us along are the format in which a rule is defined (we have a formal definition format), the rules ‘bibles’ (the hefty rules documents I mention later) and the ability to publish rules more easily.
  • Rules visibility. Another aspect that I believe we can improve, but one that has paid dividends in the past. One can report more easily on the rules, in terms of generating rules overviews as well as tracking which rules have fired. We developed a simple interceptor that saves the appropriate information for our analysts to drill into (yes, the rules engine has a stunning little API that allows all sorts of information to be retrieved). I personally believe we should go further and expose the rules on our internal website, for example, for members of business to view.
  • Separation of concerns. The separation between Java code and what is executed by the rules engine. I personally believe that our use of outcomes, and the minimising of any Java code executed from within the rules engine, has made our maintenance and continued support a lot simpler.
  • Treat rules seriously. Once we started to understand the worth of the rules engine and to build on it, rules changed from an afterthought to a major part of the system. At one stage a mantra in the team was ‘prove it cannot be done in the rules engine’, as a way to almost force the development of rules in the engine. That said, it was also refreshing to see how the developers, and later the business analysts, took to the use and understanding of the rules engine.


I am (still) passionate about the use of a rules engine, or rather about what a rules engine provides. I am, after all, in the business of building business applications in an environment that is rather heavy on the rules side – rules stemming not only from the company and business area, but also from various national and international standards as well as legislation. Our current rules documents are over 350 pages each, and there are two of them. Yes, there is a lot of white space to improve readability, but they are still rather hefty. Has the investment in rules technology paid off? In my mind it has, especially when I compare some of the maintenance tasks that occur within the legacy system to those in the system using the rules engine.

The current rules engine will be phased out and replaced due to its age and the company’s lack of interaction with the provider. It will likely be replaced by one of the open source equivalents (ooo, I wonder which one). It certainly has been an interesting road…



(*) In this case interesting is used as a euphemism for poor, bad, not interesting at all but rather scary.

Thursday, July 22, 2010

Tables, Keys and Hibernate fun

Okay, this is one of those blog posts where one has to walk the line between not saying too much, for fear of giving away corporate secrets, and not saying enough. So excuse any vagueness when it comes to specifics on the internals. I started digging into the corporate ODS (Operational Data Store) for a couple of reasons: I want to understand its structure better, see what it can actually provide to us in terms of business value, start utilising it and, well, just do a bit of technical playing and get the ole gray matter working on more than specs.

So, there I sat, looking at a structure similar to, but with some subtle differences from, what is in the day to day source system. Arming myself with Hibernate and MyEclipse, I decided to reverse engineer the structures - easy enough to do with the tools at hand. Imagine my surprise when, after the first import, the mappings showed that the unique identifier for every ODS table is a composite id of all of the columns in the table. Surprising at first, considering that the source tables were built from the word go according to the guidelines and standards of the company, and were designed from the ground up to have keys, referential integrity and all the good stuff that is always thrown about.

A quick investigation into the ODS tables showed that there weren't primary keys defined on any of them. Now, as I can't give away any corporate secrets, I've created a little sample table to illustrate what happens - so I give you the SAMPLE table (created in Derby) that emulates what I typically saw in the ODS.


create table "EXAMPLE"."SAMPLE" (
    "ID" INTEGER,
    "TITLE" VARCHAR(128) not null,
    "AVALUE" VARCHAR(255),
    "AKEY" INTEGER not null,
    "POSTTIME" TIMESTAMP
);
create unique index "SAMPLEINDEX" on "EXAMPLE"."SAMPLE"("ID");


Interesting, but when one asks the MyEclipse Hibernate generator to generate a mapping file, one gets the following:


<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
    "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<!-- Mapping file autogenerated by MyEclipse Persistence Tools -->
<hibernate-mapping>
    <class name="za.co.passif.mappings.Sample" table="SAMPLE" schema="EXAMPLE">
        <composite-id name="id" class="za.co.passif.mappings.SampleId">
            <key-property name="id" type="java.lang.Integer">
                <column name="ID" />
            </key-property>
            <key-property name="title" type="java.lang.String">
                <column name="TITLE" length="128" />
            </key-property>
            <key-property name="avalue" type="java.lang.String">
                <column name="AVALUE" />
            </key-property>
            <key-property name="akey" type="java.lang.Integer">
                <column name="AKEY" />
            </key-property>
            <key-property name="posttime" type="java.sql.Timestamp">
                <column name="POSTTIME" length="26" />
            </key-property>
        </composite-id>
    </class>
</hibernate-mapping>


Not quite what I was expecting initially, but it makes sense based on the table definition. A quick visit to the ODS architect and I garnered the following:

  • Due to how the ODS is populated, they will never add a primary key to a table. Entries in the ODS are only ever added, so an update to a value in the source table translates to a new row in the corresponding ODS table. This enables one to see 'change over time', but it does place an onus on any query using dates to derive the appropriate value as at a specific moment in time.
  • There is a column in every ODS table that sounds like, tastes like and feels like a primary key - I represented it as the AKEY column in my example. It is a unique column that uses a sequence to generate its value, and can thus be used as a primary key; however, it will never be indicated as such, as the sequence generation is handled in code. On a personal note, I would have loved it if this column could have been indicated as a primary key at table level, but personal preferences and corporate standards do not necessarily mix.
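The 'value as at a moment in time' derivation from the first point can be sketched in plain Java. This is only an illustration of the idea - the Row class below is a hypothetical stand-in for a mapped ODS entity, borrowing the column names of the SAMPLE table above:

import java.sql.Timestamp;
import java.util.List;

// Sketch of deriving the appropriate value 'as at' a moment in time from an
// insert-only table: of all versions of a row posted on or before the moment,
// the one with the latest POSTTIME wins. Row is a hypothetical stand-in for
// a mapped ODS entity.
public class AsAtLookup {

    public static class Row {
        final int id;             // business key carried over from the source system
        final String avalue;
        final Timestamp posttime; // when this version landed in the ODS

        public Row(int id, String avalue, Timestamp posttime) {
            this.id = id;
            this.avalue = avalue;
            this.posttime = posttime;
        }
    }

    // Returns the version of row 'id' that was current as at 'moment',
    // or null if no version had been posted yet.
    public static Row asAt(List<Row> rows, int id, Timestamp moment) {
        Row best = null;
        for (Row r : rows) {
            if (r.id != id || r.posttime.after(moment)) {
                continue; // different row, or posted after the moment of interest
            }
            if (best == null || r.posttime.after(best.posttime)) {
                best = r;
            }
        }
        return best;
    }
}

In SQL terms this is the classic 'latest row with POSTTIME on or before :asAt' pattern that every date-based query against the ODS has to implement.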


So, how to get past this? There must be some way to instruct Hibernate that the AKEY column can be used as a primary key. And there it was, cunningly hiding in plain sight - a way to define a custom strategy for reverse engineering the mappings that will be generated.



MyEclipse has a very short, but nice, overview of how to configure and set up a custom strategy.

So, at this stage, what did I know?

  • Every ODS table has a column that can be used to uniquely identify the row called AKEY

  • The AKEY column uses a sequence to generate a unique id

  • I can use the AKEY column as a primary key


Armed with this knowledge, the solution proved to be quite simple. One of the methods on the DelegatingReverseEngineeringStrategy class is getPrimaryKeyColumnNames. It returns a list of all the columns that make up the primary key for the specified table.

public List getPrimaryKeyColumnNames(TableIdentifier identifier)


I overrode the method with the following - not the most elegant, but it has the desired result based on the above information:


package za.co.passif.tools;

import java.util.ArrayList;
import java.util.List;

import org.hibernate.cfg.reveng.DelegatingReverseEngineeringStrategy;
import org.hibernate.cfg.reveng.ReverseEngineeringStrategy;
import org.hibernate.cfg.reveng.TableIdentifier;

public class SampleStrategy extends DelegatingReverseEngineeringStrategy {

    public SampleStrategy(ReverseEngineeringStrategy delegate) {
        super(delegate);
    }

    /**
     * Forces the AKEY column to be returned as the primary key for any table
     * being reverse engineered.
     */
    public List getPrimaryKeyColumnNames(TableIdentifier identifier) {
        List aList = new ArrayList();
        aList.add("AKEY");
        return aList;
    }
}


After rerunning the reverse engineering my mapping file now looks much more like what I was expecting in the first place.


<hibernate-mapping>
    <class name="za.co.passif.mappings.Sample" table="SAMPLE" schema="EXAMPLE">
        <id name="akey" type="java.lang.Integer">
            <column name="AKEY" />
            <generator class="assigned" />
        </id>
        <property name="id" type="java.lang.Integer">
            <column name="ID" unique="true" />
        </property>
        <property name="title" type="java.lang.String">
            <column name="TITLE" length="128" not-null="true" />
        </property>
        <property name="avalue" type="java.lang.String">
            <column name="AVALUE" />
        </property>
        <property name="posttime" type="java.sql.Timestamp">
            <column name="POSTTIME" length="26" />
        </property>
    </class>
</hibernate-mapping>


Mission accomplished. The mapping files now define the intent of the ODS table appropriately and correctly. Overkill for a single table, but when you look at thirty or so tables the internal lazy developer kicks in. Or maybe that is just Brian's influence...

As an aside, if one chooses to also generate id classes, the id class itself does not contain all of the definitions for the data structure (this is what originally started me off on this road to explore the strategy). One of the comments I came across was that 'MyEclipse only works with what it knows', which is true, but it was nice to discover that one can dictate its behaviour and get a consistent approach.

And now for this old dog to learn how the syntax highlighting works. Darn new-fangled technology - in my day a marker and paper was enough... grumble...grumble...

Wednesday, July 21, 2010

Why start Blogging

Well, this is it - the moment that a blank canvas needs to be filled with something that is interesting, entertaining and even - dare one say - capable of generating some ... controversy? High and lofty goals indeed.

No, not really, but seeing the fun that Brian is having on http://java-it-zen.blogspot.com/ I have decided to try this new-fangled blogging technology to post items that I come across in my day to day tasks that I find of interest and that will - hopefully - be of interest to others. At the very least I will be able to say that I have tried it.

Oh and the title of the blog? Quite simple, for a 44 year old scooter driver, what better title could there be...

I promise that the next entry should be of more IT interest, spinning a tale of intrigue and daring with regards to data stores, Hibernate and some rather interesting standards...