Building an Object Persistence Framework

Jim Cooper - Tabdee Ltd


What is an Object Persistence Framework, and why should I care?

Overview of my OPF

Business objects

Mappers

Choosing the mapper

Filtering

Sorting

Presentation layers

Restrictions and possible enhancements

Examples of commercial and freeware OPFs

References

Conclusion

Source code for this paper.


What is an Object Persistence Framework, and why should I care?

An Object Persistence Framework (OPF) is a layered architecture that allows developers to exploit the power and expressiveness of object orientation without having to be concerned with the details of the persistence of objects.

In the last couple of decades, object orientation has emerged as the most powerful language and architectural tool to write complex software[1]. Research in areas like patterns has given us elegant ways to solve difficult problems. However, most software has some information that needs to be persisted. If we want to store an object in Delphi, this presents us with a few problems.

One way we could do it is to use Delphi’s object streaming, in much the same way that Delphi stores forms in .dfm files. For prototyping, or even small single-user applications, this approach can be very valuable[2]. We could write something similar that used an XML file. However, we normally want to store our valuable data in some sort of database. If we had an object-oriented database (OODBMS) to hand, then that would be perfect. Sadly these are uncommon and usually we have to store data in a relational-style database (RDBMS).

Out of the box Delphi doesn’t provide any direct support for this. Some people have attempted to use subclasses of datasets or data modules as their business objects (BOs). Success has been limited. It is possible to encapsulate data and operations this way, but in both cases developers using the classes can access the database directly, thus by-passing the business logic. While it is possible to have some business logic in the database, this is normally less maintainable, and is often fired too late to give the sorts of messages we want to end users. Some logic is extremely difficult to code that way.

A better approach is to have software that takes our persistent objects and maps them to and from the database. This is an OPF. Typically we also need some sort of presentation layer so that users can interact with the objects. We end up with a system like this:

[Figure: OPF Layers]

Normally we have one or more presentation layers, which connect user interfaces, reports, web interfaces and so on to the business objects in a system. These business objects typically contain both data and business rules. There will also be objects which are not persisted but which implement various program behaviours. The presentation layer often uses patterns like Observer, Mediator, Model View Controller or Model View Presenter to control the interaction between interface and objects. This is a topic all to itself, and we won’t be talking about it in much detail at all.

We then have our business objects. They are really no different from any other type of object; “business object” is just a common term to use. They often have a common ancestor with internal properties used by the OPF. The most common of these is some form of object ID (OID) which uniquely identifies the object. There are a number of schemes for generating OIDs. I use GUIDs, as I know my applications will only be required to run on Windows, and some users will need to run disconnected from the central database.
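As a rough illustration of the GUID scheme, the sketch below generates identifiers with the standard SysUtils routines CreateGUID and GUIDToString. The helper name NewOidString is an assumption made for this example; the framework itself wraps the GUID in a TstgOid object, whose implementation is not shown here.

```pascal
program OidSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper, not part of the framework: returns a new GUID
// formatted as a string of the form '{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}'.
function NewOidString : string;
var
  Guid : TGUID;
begin
  // CreateGUID returns zero on success
  if CreateGUID(Guid) <> 0 then begin
    raise Exception.Create('Could not create a GUID');
  end;

  Result := GUIDToString(Guid);
end;

begin
  // Each call yields a different identifier without consulting a database
  WriteLn(NewOidString);
  WriteLn(NewOidString);
end.
```

Because no central counter is involved, a disconnected client can create objects and merge them into the central database later without OID clashes.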

Then we have our persistence framework, which handles mappings between objects and databases (or other storage). There are a number of ways this can be done, and we’ll discuss some as we go.

Note that these are all logical, not physical, layers. Discussion of how they can be physically separated is beyond the scope of this paper.

Advantages of using an OPF include:

  • Can use high-level objects like TCustomer, say, in application level code
  • It separates business rules from event handlers on data-aware controls and TDatasets
  • Separating data access from application level code creates more maintainable code
  • It removes the need for complex data modules
  • Separating business objects and data access code means they can be worked on separately
  • Not allowing direct access to the persistence mechanism means that business rules cannot be flouted so easily
  • Prototyping can be very quick and easy as there is no need to implement the persistence layer
  • Changing the type of persistence used is just a matter of using a different persistence layer. The business objects are not affected, and indeed, the change can be made at run-time

I’m no great fan of data-aware controls either. They are complex things that are difficult to modify if their behaviour, either visually or in terms of their data accessing, is not what you want, or is inefficient. They also allow a programmer direct access to the underlying database.

Disadvantages:

  • User interfaces are more difficult to write, as you have to provide all the communication with the BOs yourself
  • Third-party reporting tools may not support BOs
  • If other applications access the same database, some business rules may need to be mirrored in the database, or in the other applications
  • Retrofitting an OPF on top of a legacy database structure may be difficult

Overview of my OPF

The framework we will be examining is a cut-down version of one I wrote that is in use in a number of applications tracking the performance of elite British athletes[3]. As such it is tailored to the systems we build, and lacks some things that would be required in other systems (like transaction support, say). I therefore make no pretence that this is even close to being the best OPF available, or that I have always chosen the best way to do everything. Hopefully, though, you will be able to get an idea of what an OPF does, and what issues you should consider when building or evaluating an OPF yourself.

The presentation layer code that accompanies my OPF is also quite specific, as the UIs make heavy use of list views, and have no save or cancel buttons and so on. As was mentioned earlier, it is also very complex, and we wouldn’t have time to go through it anyway. Certain other elements have been simplified to aid understanding.

There are 3 essential layers to consider. The first is the business objects themselves. The next is the factory that decides which mapper to use for a given class of object, and the last is the mapping layer, which does the actual work.

If you’re interested, the design I used was based mostly on the Larman and Fowler references given below.

Business objects

We should start with the business object base classes. In my system, everything that needs to be persisted is derived from TCollectionItem or TOwnedCollection (it’s better for streaming and so on than a TCollection):

  TstgCustomCollectionItem = class(TCollectionItem)
  public
    constructor Create(Collection : TCollection); override;
    constructor CreateWithOwner(AOwner : TstgCustomCollectionItem); overload;
    constructor CreateWithOwner(AOwner : TstgSystem); overload;
    destructor  Destroy; override;

    procedure Assign(Source : TPersistent); override;
    procedure CheckConstraints(ConstraintType : TstgConstraintType); virtual;

    property State : TstgObjectState read FState write SetState;
    property Proxy : Boolean read FProxy write FProxy;
    property Owner : TstgCustomCollectionItem read FOwner;
  published
    property Oid      : TstgOid read FOid write SetOid;
    property OwnerOid : TstgOid read GetOwnerOid;
    property Version  : Integer read FVersion write FVersion;
  end;


  TstgCustomCollection = class(TOwnedCollection)
  public
    constructor Create(AOwner : TPersistent;
                       ItemClass : TCollectionItemClass); virtual;

    function  Add : TstgCustomCollectionItem;
    function  FindItemID(ID : Integer) : TstgCustomCollectionItem;
    function  Insert(Index : Integer) : TstgCustomCollectionItem;

    procedure Clear;
    procedure Delete(Index : Integer);

    function  Find(Oid : TstgOid) : TstgCustomCollectionItem; overload;
    function  Find(const Value : string;
                   NonIncremental : Boolean = False;
                   CheckSortAssigned : Boolean = True) : Integer; overload;
    function  Find(const Value,PropertyName : string; FindFirstIfDuplicates : Boolean = False) : Integer; overload;

    procedure Assign(Source : TPersistent); override;

    procedure CheckConstraints(ConstraintType : TstgConstraintType); virtual;

    procedure Sort(SortOrder : TstgSortOrder); overload;
    procedure Sort(const SortPropName : string = '';Ascending : Boolean = True); overload;

    property Items[Index : Integer] : TstgCustomCollectionItem read GetItem write SetItem; default;
    property State            : TstgObjectState read FState write SetState;
    property Proxy            : Boolean read FProxy write FProxy;
    property HasOwner         : Boolean read GetHasOwner;
    property OwnerOid         : TstgOid read GetOwnerOid;
    property ServerSideFilter : TstgServerSideFilter read FServerSideFilter write FServerSideFilter;
    property CurrItem         : TstgCustomCollectionItem read GetCurrItem write SetCurrItem;
  end;

The base class for a business object is TstgCustomCollectionItem, and for a collection of business objects is TstgCustomCollection. Normally I create a collection class to match every BO class, which provides type-safe access to the items in the collection. That way no type-casting is needed in any application code. The collection item class has several important properties.
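A matching collection class might look something like the sketch below. To keep it self-contained it descends from the plain VCL classes TCollectionItem and TOwnedCollection rather than the framework base classes, and the TPerson and TPersons declarations here are simplified stand-ins, not the ones from the accompanying source.

```pascal
program TypedCollectionSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Stand-in business object; the real classes descend from
  // TstgCustomCollectionItem rather than TCollectionItem.
  TPerson = class(TCollectionItem)
  private
    FSurname : string;
  published
    property Surname : string read FSurname write FSurname;
  end;

  // Matching collection class that keeps the type-casts in one place
  TPersons = class(TOwnedCollection)
  private
    function  GetItem(Index : Integer) : TPerson;
    procedure SetItem(Index : Integer; Value : TPerson);
  public
    constructor Create(AOwner : TPersistent);
    function Add : TPerson;
    property Items[Index : Integer] : TPerson read GetItem write SetItem; default;
  end;

constructor TPersons.Create(AOwner : TPersistent);
begin
  inherited Create(AOwner,TPerson);
end;

function TPersons.Add : TPerson;
begin
  Result := TPerson(inherited Add);
end;

function TPersons.GetItem(Index : Integer) : TPerson;
begin
  Result := TPerson(inherited GetItem(Index));
end;

procedure TPersons.SetItem(Index : Integer; Value : TPerson);
begin
  inherited SetItem(Index,Value);
end;

var
  Persons : TPersons;
begin
  Persons := TPersons.Create(nil);
  try
    Persons.Add.Surname := 'Smith';
    // No cast is needed in application code
    WriteLn(Persons[0].Surname);
  finally
    Persons.Free;
  end;
end.
```

The casts still exist, but they live inside the collection class, so a mistake there is fixed once rather than in every form or report that touches the collection.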

The first is Oid, which is essentially a GUID wrapped into an object. This uniquely identifies the object, even if it was created on a system not connected to any other. There is also an OwnerOid which is the OID of the owning object. All objects in a collection will have the same OwnerOid. This leads to one of the limitations of the system. It cannot deal with an object that has two collections of the same type of object, as all elements of both collections will have the same object as owner, and the mapper will not be able to tell to which collection they belong.

The State property is used to tell whether the object has been modified (and whether the modification was to a property that is itself a persistable object), is new, and so on. The state is important for the presentation layer, so it knows when to persist things, and also to the mapper, so it knows whether to do an insert or update operation. We’ll look at the Proxy property when we get to the mapping layer.

Typically, you will need to validate various property values. This is done either in the accessor methods for a property (i.e. the Get and Set methods), or if it relies on several property values, the CheckConstraints method is overridden. You may need business logic on saving or deleting, so CheckConstraints is called by the mappers in both cases. We’ll see an example of this later.

The collection class does the standard overrides to make a type-safe collection of one class of item, and adds a few useful methods like searching and sorting. It also has State and Proxy properties, which work in a similar fashion to those on a BO. CurrItem is a property introduced to speed up operations on large collections. References in code like MyCollection[3000] meant that the collection code in the VCL counted 3000 items in every time it got called. CurrItem is essentially a cache of the current object (see the source if you’re really interested).

ServerSideFilter is used to transfer values needed in filtering operations in the mapper layer. For instance, it can be used to generate complex where clauses and joins in SQL statements. There is a corresponding mapper class that knows how to use each TstgServerSideFilter subclass, and it is also matched by the same sort of factory that chooses the proper BO mapper. This will all become a little clearer as we see that process in action. You can use the Iterator pattern for client-side filtering. You can have filters as complex as you need, and it’s quick as you don’t need another round-trip to the database.
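Client-side filtering with an iterator could be sketched roughly as follows. The TFilteredIterator class, the TItemFilter function type and the Town property on TPerson are all inventions for this example and do not appear in the framework; a plain TCollection stands in for TstgCustomCollection.

```pascal
program ClientFilterSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Simplified stand-in business object
  TPerson = class(TCollectionItem)
  private
    FSurname : string;
    FTown    : string;
  published
    property Surname : string read FSurname write FSurname;
    property Town    : string read FTown write FTown;
  end;

  // Hypothetical predicate type: True if the item passes the filter
  TItemFilter = function(Item : TCollectionItem) : Boolean;

  // Hypothetical iterator that skips items rejected by the filter
  TFilteredIterator = class(TObject)
  private
    FCollection : TCollection;
    FFilter     : TItemFilter;
    FIndex      : Integer;
  public
    constructor Create(ACollection : TCollection; AFilter : TItemFilter);
    function Next(out Item : TCollectionItem) : Boolean;
  end;

constructor TFilteredIterator.Create(ACollection : TCollection; AFilter : TItemFilter);
begin
  inherited Create;
  FCollection := ACollection;
  FFilter     := AFilter;
  FIndex      := -1;
end;

function TFilteredIterator.Next(out Item : TCollectionItem) : Boolean;
begin
  Result := False;
  Item   := nil;

  // Advance until an item passes the filter or the collection runs out
  while FIndex < FCollection.Count - 1 do begin
    Inc(FIndex);

    if FFilter(FCollection.Items[FIndex]) then begin
      Item   := FCollection.Items[FIndex];
      Result := True;
      Exit;
    end;
  end;
end;

function LivesInLeeds(Item : TCollectionItem) : Boolean;
begin
  Result := TPerson(Item).Town = 'Leeds';
end;

procedure AddPerson(People : TCollection; const ASurname,ATown : string);
var
  Person : TPerson;
begin
  Person         := TPerson(People.Add);
  Person.Surname := ASurname;
  Person.Town    := ATown;
end;

var
  People   : TCollection;
  Iterator : TFilteredIterator;
  Item     : TCollectionItem;
begin
  People := TCollection.Create(TPerson);
  try
    AddPerson(People,'Smith','Leeds');
    AddPerson(People,'Jones','York');
    AddPerson(People,'Brown','Leeds');

    Iterator := TFilteredIterator.Create(People,LivesInLeeds);
    try
      // Writes Smith and then Brown; Jones is filtered out
      while Iterator.Next(Item) do begin
        WriteLn(TPerson(Item).Surname);
      end;
    finally
      Iterator.Free;
    end;
  finally
    People.Free;
  end;
end.
```

The filter runs entirely against objects already in memory, which is why no further round-trip to the database is needed.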

Note that there are no calls to load, save or delete the object. There is no reason for a BO not to have these, except that I believe a BO should not know anything at all about persistence, not even whether it can be persisted. Other OPFs have methods like these on the BO and BO collection classes.

As we will see later, all objects in the system need an owner, so we will need an object to be the topmost owner of all other objects and collections in an application. This is always a Singleton, and is not persisted itself:

  TstgSystem = class(TComponent)
  public
    constructor Create(AOwner : TComponent); override;
    destructor  Destroy; override;
  published
    property Oid     : TstgOid read FOid write SetOid;
    property Version : Integer read FVersion write FVersion;
  end;

This simple class makes working with the object hierarchy during persistence operations much easier, since we never have to deal with orphan or unrelated objects. We’ll see how that works shortly.

Mappers

The job of a mapper is to take a BO or BO collection, and handle the saving, loading and deleting operations for a given persistence mechanism. Most commonly this will be some sort of relational-style database, so the examples will be based on that, but you would use similar techniques if you wanted to use some other type of storage.

I’m going to use the word “table” a lot in the following discussion, but I will mean tables, views, queries and so on. I don’t know of a general term that covers all these things, so don’t get too hung up on the terminology.

Handling composition

There are three sorts of relationships between objects that we need to handle in our mappers. The first is composition: objects may contain other objects. An example might be that a person object has an address object as a property. These are truncated definitions of the address and person classes:

  TAddress = class(TstgCustomCollectionItem)
  public
    procedure Assign(Source : TPersistent); override;
  published
    property HouseNumber : string read FHouseNumber write SetHouseNumber;
    property Street      : string read FStreet write SetStreet;
    property Town        : string read FTown write SetTown;
    property Postcode    : string read FPostcode write SetPostcode;
  end;

  TPerson = class(TstgCustomCollectionItem)
  public
    procedure CheckConstraints(ConstraintType : TstgConstraintType); override;
  published
    property Title     : string read FTitle write SetTitle;
    property FirstName : string read FFirstName write SetFirstName;
    property Surname   : string read FSurname write SetSurname;
    property Address   : TAddress read FAddress write SetAddress;
    property FullName  : string read GetFullName;
  end;

I’ve left out some of the more obvious private and public fields and methods. There are a couple of important things to note here. One is that most properties I might want to persist, or have show in reports and so on, are published rather than public. This allows me to use RTTI to generalise these behaviours. We’ll see an example of that when we examine sorting. The mappers themselves also use RTTI to handle composition and inheritance.

All the properties have a setter accessor method, which generally has the form:

procedure TPerson.SetFirstName(const Value : string);
begin
  if FFirstName <> Value then begin
    FFirstName := Value;
    State      := osModifiedSelf;
  end;
end;

Notice how the State property is set for the object. In this case, it attempts to set the State to a value indicating that the object has changed, but the change is not in an object or collection that itself needs to be saved. If a new address object was to be assigned then this code would be called:

procedure TPerson.SetAddress(const Value : TAddress);
begin
  FAddress.Assign(Value);
  State := osModifiedProperty;
end;

This sets a different modified flag[4] that indicates a subobject has been modified (note that actually doing these sorts of assignments is rare, and you need to be a bit careful with memory leaks). In addition, if we set one of the properties of the Address object:

procedure TAddress.SetHouseNumber(const Value : string);
begin
  if FHouseNumber <> Value then begin
    FHouseNumber := Value;
    State        := osModifiedSelf;
  end;
end;

then the SetState method of the base BO class gets called:

procedure TstgCustomCollectionItem.SetState(const Value : TstgObjectState);
begin
  if (FState = osNew) and (Value in [osModifiedSelf,osModifiedProperty]) then begin
    // Need to know to create a new entry or modify an existing one
    Exit;
  end;

  if (FState = osModifiedSelf) and (Value = osModifiedProperty) then begin
    Exit;
  end;

  FState := Value;

  // Let the collection know that there are items requiring action
  if FState in [osNew,osModifiedSelf,osModifiedProperty] then begin
    Changed(False);

    if Assigned(FOwner) and (FOwner is TstgCustomCollectionItem) then begin
      TstgCustomCollectionItem(FOwner).State := osModifiedProperty;
    end;
  end;
end;

If the object is new (i.e. it has not yet been added to the database), then attempts to change the State to one of the modified flags are ignored. The mapper will always need to know that a new record needs to be added (e.g. a SQL Insert statement will be generated rather than an Update). If we are setting the State to be either new or modified, then the owner object also has its State set to show that a persistable property has been modified. We shall see how that helps the mappers in a minute.

So now we know what objects need saving into the database, how do we go about it? The base class for all of our mappers is shown below (again, I’ve only included the most important declarations here, see the source for the full version):

  TstgCustomMapper = class(TObject)
  private
  protected
    class procedure AncestorOperation(Oid,OwnerOid : TstgOid;
                                     Item : TstgCustomCollectionItem;
                                     Op : TstgPersistOp);
    class procedure PropertyOperation(Oid : TstgOid;
                                     Item : TstgCustomCollectionItem;
                                     Op : TstgPersistOp);

    // Methods to override to create a new type of mapper
    class procedure DoLoad(Oid : TstgOid;
                           Item : TstgCustomCollectionItem); overload; virtual; abstract;
    class procedure DoSave(Item : TstgCustomCollectionItem;
                           OwnerOid : TstgOid); overload; virtual; abstract;
    class procedure DoDelete(Item : TstgCustomCollectionItem); overload; virtual; abstract;
    class procedure DoLoad(OwnerOid : TstgOid;
                           Collection : TstgCustomCollection;
                           FilterClass : TstgServerSideFilterMapperClass); overload; virtual; abstract;
  public
    // Procedures to act on collection items
    class procedure Load(Oid : TstgOid;
                         Item : TstgCustomCollectionItem); overload;
    class procedure Save(Item : TstgCustomCollectionItem;
                         OwnerOid : TstgOid); overload;
    class procedure Delete(Item : TstgCustomCollectionItem); overload;
    // Procedures to act on collections
    class procedure Load(OwnerOid : TstgOid;
                         Collection : TstgCustomCollection;
                         FilterMapperClass : TstgServerSideFilterMapperClass); overload;
    class procedure Save(Collection : TstgCustomCollection); overload;
    class procedure Delete(Collection : TstgCustomCollection);  overload;
  end;

Each of the public methods uses the Template Method pattern. For instance, the Save method for a BO looks like this:

class procedure TstgCustomMapper.Save(Item : TstgCustomCollectionItem;
                                      OwnerOid : TstgOid);
begin
  if Assigned(Item.Oid) and (not Item.Oid.IsValid) then begin
    // Not yet assigned an OID, so generate one
    Item.Oid.GenerateOID;
  end;

  if not (Item.State in [osNew,osModifiedSelf,osModifiedProperty]) then begin
    Exit;
  end;

  if Item.State in [osNew,osModifiedSelf] then begin
    // Only save the item when strictly necessary
    DoSave(Item,OwnerOid);
    AncestorOperation(nil,OwnerOid,Item,poSave);
  end;

  PropertyOperation(Item.Oid,Item,poSave);
  Item.State := osClean;
end;

The same basic operations need to be carried out for all BOs. However, the exact details will vary depending on the persistence mechanism used, so that is deferred to the DoSave method of descendent classes. Because DoSave is abstract, subclasses must implement that behaviour. Note that all these methods are class procedures, so an instance of the mapper is not necessary, we only have to know its class.

In the example I’ve implemented a mapper using ADO data access components (it is equally at home on Access and SQL Server databases). The DoSave method is:

class procedure TstgADOItemMapper.DoSave(Item : TstgCustomCollectionItem;
                                         OwnerOid : TstgOid);
  var
    OwnerOidStr : string;
begin
  OwnerOidStr := GuidToString(NullGuid);

  with Self.Create do begin
    try
      AdoQuery.Close;

      if Item.State in [osNew,osModifiedSelf] then begin
        if Item.State = osNew then begin
          AdoQuery.SQL.Text := InsertQuery;
        end else begin
          AdoQuery.SQL.Text := UpdateQuery;
        end;

        if Assigned(OwnerOid) then begin
          OwnerOidStr := OwnerOid.AsString;
        end;

        MapObjectToQuery(Item,AdoQuery,OwnerOidStr);

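        AdoConnection.BeginTrans;
        try
          AdoQuery.ExecSQL;
          AdoConnection.CommitTrans;
        except
          // Roll back so a failed statement does not leave a transaction open
          AdoConnection.RollbackTrans;
          raise;
        end;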
      end;
    finally
      Free;
    end;
  end;
end;

Three members used in this method, InsertQuery, UpdateQuery and MapObjectToQuery, need to be implemented in all the concrete mapper classes. In the attached example is an Access database that contains a Persons and an Addresses table. The mapping of classes to tables is the obvious one. The structures of these tables are shown below.

[Figure: Table mapping]

So, for example, the insert query is generated by this code:

function TPersonMapper.GetTableName : string;
begin
  Result := 'Persons';
end;


function TPersonMapper.GetInsertQuery : string;
begin
  Result := 'insert into ' +
            TableName +
            '(OID, Owner_OID, Version, Pers_Title, Pers_FirstName,' +
            ' Pers_Surname) ' +
            'Values(:OID, :Owner_OID, :Version, :Pers_Title, ' +
            '       :Pers_FirstName, :Pers_Surname)';
end;

The parameters in the query are set in MapObjectToQuery:

procedure TPersonMapper.MapObjectToQuery(Item : TstgCustomCollectionItem;
                                         Query : TAdoQuery;
                                         OwnerOidStr : string);
begin
  with TPerson(Item) do begin
    Query.Parameters.ParamByName('OID').Value            := Oid.AsString;
    Query.Parameters.ParamByName('Owner_OID').Value      := OwnerOidStr;
    Query.Parameters.ParamByName('Version').Value        := Version;
    Query.Parameters.ParamByName('Pers_Title').Value     := Title;
    Query.Parameters.ParamByName('Pers_FirstName').Value := FirstName;
    Query.Parameters.ParamByName('Pers_Surname').Value   := Surname;
  end;
end;

In this example, and in most of my experience, classes correspond exactly to tables (using the term specifically to mean "table" this time), but this is not a requirement, particularly when dealing with legacy databases and completely changing the structure is not an option. When we see how to handle inheritance, we’ll see an example of more complex SQL involving joins. And as I said at the start, there is nothing to preclude you using views (if they are updateable on the database you’re using), calling stored procedures, using triggers, or whatever other facilities your database offers.

All the other methods in the base mapper class work in a similar fashion, except the collection save and delete procedures. In both of those instances, the collection is iterated over, with each item being saved or deleted separately. For the systems my OPF was designed for, this does not impose significant performance problems, as it’s turned out we don’t delete whole collections. You may want to modify the framework so that it can generate the appropriate SQL to delete a collection. Saving is done this way so that the client-side constraint checking is called, as we’ll see a little later.
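If you did want to delete a whole collection in one statement, the generated SQL could be as simple as the sketch below. BuildCollectionDeleteQuery is a hypothetical helper, not part of the framework; the Owner_OID column and the Persons table name are the ones used in the example database.

```pascal
program CollectionDeleteSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: builds one statement that removes every row
// owned by a given object, rather than deleting item by item.
function BuildCollectionDeleteQuery(const TableName : string) : string;
begin
  Result := 'delete from ' + TableName + ' where Owner_OID = :Owner_OID';
end;

begin
  // Writes: delete from Persons where Owner_OID = :Owner_OID
  WriteLn(BuildCollectionDeleteQuery('Persons'));
end.
```

Bear in mind that a bulk statement like this skips the per-item CheckConstraints call and does nothing about rows belonging to sub-objects, so it only suits collections where neither matters.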

Note also that there are only ever one TAdoQuery and one TAdoConnection component used. They are shared by all mappers. Bit of a drop in component count from the normal data-aware control based application, eh? I can get away with this because I haven’t needed to do operations in multiple threads, so if you need to do that you will need to take care of creating at least the query component in each thread.

As it stands, we can save a person object, but what if the address was changed? Well, if we go back and examine TstgCustomMapper.Save, we’ll see that we only call DoSave on the TPerson object if it is new, or the state is osModifiedSelf. In either of these situations, one of the non-BO properties of the object may have changed. After that, we call PropertyOperation on the item, which is another use of Template Method.

It’s too big to reproduce here, but essentially what it does is use RTTI to iterate through the published properties of a BO that is passed to it, looking for properties that are subclasses of TstgCustomCollectionItem or TstgCustomCollection. When it finds one, it calls the appropriate Save, Load or Delete method on the property. This way we can call Save on a top-level object and be sure it will save anything down the tree of subobjects that needs saving. It also only updates those objects that have actually changed. In our example code, if we change the house number on an address say, only the address mapper gets used when we save the person.
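The heart of that RTTI walk can be sketched with the standard TypInfo routines GetPropList, PropType and GetObjectProp. This is only an outline of the idea, not the framework’s PropertyOperation: the VisitSubObjects procedure and the simplified TPerson and TAddress classes are assumptions for the example, and the WriteLn stands in for the call to Save, Load or Delete.

```pascal
program PropertyWalkSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes, TypInfo;

type
  // Simplified stand-ins for the business objects
  TAddress = class(TCollectionItem)
  private
    FTown : string;
  published
    property Town : string read FTown write FTown;
  end;

  TPerson = class(TCollectionItem)
  private
    FSurname : string;
    FAddress : TAddress;
  public
    constructor Create(Collection : TCollection); override;
    destructor  Destroy; override;
  published
    property Surname : string read FSurname write FSurname;
    property Address : TAddress read FAddress;
  end;

constructor TPerson.Create(Collection : TCollection);
begin
  inherited Create(Collection);
  FAddress := TAddress.Create(nil);
end;

destructor TPerson.Destroy;
begin
  FAddress.Free;
  inherited Destroy;
end;

// Hypothetical outline: finds published properties that hold sub-objects
procedure VisitSubObjects(Item : TPersistent);
var
  PropList : PPropList;
  Count,I  : Integer;
  PropName : string;
  SubObj   : TObject;
begin
  Count := GetPropList(PTypeInfo(Item.ClassInfo),PropList);

  if Count <= 0 then begin
    Exit;
  end;

  try
    for I := 0 to Count - 1 do begin
      PropName := string(PropList^[I]^.Name);

      // Only class-type properties can hold a persistable sub-object
      if PropType(Item,PropName) = tkClass then begin
        SubObj := GetObjectProp(Item,PropName);

        if SubObj is TCollectionItem then begin
          // A real mapper would call Save, Load or Delete here
          WriteLn(PropName,' : ',SubObj.ClassName);
        end;
      end;
    end;
  finally
    FreeMem(PropList);
  end;
end;

var
  Person : TPerson;
begin
  Person := TPerson.Create(nil);
  try
    // Writes: Address : TAddress
    VisitSubObjects(Person);
  finally
    Person.Free;
  end;
end.
```

The Surname property is skipped because it is a string, which is why only published properties of class type need to be examined.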

There are a couple of last points on handling composition. One is that the owner OID of all collection items is the OID of the owner object. This limits my OPF to one collection per BO for each collection item class, as the mapper will be unable to determine to which collection an item belongs. A more flexible arrangement would be to have a link table between the items table and the owner BO table. Of course, the generated SQL would be different, but it isn’t difficult. The other point is that a BO property that is itself a BO has the same OID as the owner object. So a person object and its address object have the same OID (and the owner OID of the address is the same as the person OID as well).

The reason this is handy is because we do lazy loads. That is, when we load collections we don’t necessarily load all the properties of each object. We don’t load any of the properties that are persistable BOs or collections. These things only get loaded when they are required, so that we can display lists of objects more quickly. On the main form of the example, we display the address of each person in a memo control as we move around in the list. The code that handles this is:

procedure TMainDlg.RefreshAddress(ARow : Integer);
  var
    TempContact : TContact;
begin
  if not InRange(ARow,1,SystemObject.Contacts.Count) then begin
    Exit;
  end;

  AddressMemo.Clear;
  TempContact := SystemObject.Contacts[ARow - 1];

  if TempContact.Address.Proxy then begin
    Persistence.Load(TempContact.Oid,TempContact.Address);
  end;

  AddressMemo.Lines.Text := TempContact.Address.FullAddress;
end;

The important lines are the check of Address.Proxy and the call to Persistence.Load. If the address is still a proxy it has not been fully loaded (in this instance it will not have been loaded at all). So we make sure we load the address before continuing. At this point, we don’t know its OID, so I found it useful if the OID was the same as its owner - the TContact object in this case (TContact is a subclass of TPerson).

The final point to note is that the OPF will do a cascading delete, that is, if you delete an object that contains other persistable objects or collections, those objects and collections will also be deleted.

Handling inheritance

Handling inheritance is a problem when mapping to a RDBMS, as the concept is completely foreign. If we are mapping classes to tables, then there are basically three approaches we can use. We’ll use the following class definitions as an example:

  TPerson = class(TstgCustomCollectionItem)
  published
    property Title     : string read FTitle write SetTitle;
    property FirstName : string read FFirstName write SetFirstName;
    property Surname   : string read FSurname write SetSurname;
  end;

  TContact = class(TPerson)
  published
    property Description : string read FDescription write SetDescription;
    property PhoneNumber : string read FPhoneNumber write SetPhoneNumber;
  end;

  TRelative = class(TPerson)
  published
    property Relationship : string read FRelationship write FRelationship;
  end;

The first method we could use is to have one table containing all TPerson objects. This means that there would need to be a column for every one of the properties in each of the three classes above, some of which would not be used by certain types of objects. So we would need a table with six columns (Title, FirstName, Surname, Description, PhoneNumber, Relationship). This approach gets unwieldy if there are many subclasses. There is also the problem of identifying the type of object a row in the table represents.

A cleaner alternative is to have one table per class or subclass, so we would have three tables in our example. Each table has a column for each property in the class, and also has a column for each property in all superclasses. So the person table would have three columns (Title, FirstName, Surname), the contact table would have five (Title, FirstName, Surname, Description, PhoneNumber), and the relative table would have four (Title, FirstName, Surname, Relationship). The main problem with this approach is that changes to the base class or table must be reflected in the tables and mappers for each of the subclasses.

The final approach, and the one I used, is to have three tables, but with each only containing columns for the properties defined in the relevant class. So the person table would have three columns (Title, FirstName, Surname), the contact table would have two (Description, PhoneNumber), and the relative table would have one (Relationship). When loading a contact, say, the mapper could generate SQL that did a join between the contact and person tables, and loaded the relevant properties. To provide some resilience against changes to the base class though, I don’t normally do that when loading, saving or deleting a single object.
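For comparison, a joined load for a contact might generate SQL along the lines of the sketch below. The Contacts table and the Cont_ column names are guesses that follow the naming used for the Persons table, and the join assumes that the contact row and its person row share the same OID; check the example database before relying on either.

```pascal
program ContactJoinSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical query builder: loads a contact and its inherited person
// columns in one statement. Contacts, Cont_Description and
// Cont_PhoneNumber are assumed names, not taken from the example database.
function BuildContactLoadQuery : string;
begin
  Result := 'select Persons.OID, Persons.Owner_OID, Persons.Version,' +
            ' Persons.Pers_Title, Persons.Pers_FirstName,' +
            ' Persons.Pers_Surname, Contacts.Cont_Description,' +
            ' Contacts.Cont_PhoneNumber ' +
            'from Contacts inner join Persons' +
            ' on Persons.OID = Contacts.OID ' +
            'where Contacts.OID = :OID';
end;

begin
  WriteLn(BuildContactLoadQuery);
end.
```

The drawback is visible in the column list: every time a property is added to TPerson, this query and its counterparts in every other subclass mapper must change as well.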

In the TstgCustomMapper.Save method shown earlier, there is a call to AncestorOperation just after we save the object. AncestorOperation looks like this:

class procedure TstgCustomMapper.AncestorOperation(Oid,OwnerOid : TstgOid;
                                                   Item : TstgCustomCollectionItem;
                                                   Op : TstgPersistOp);
  var
    Ancestor : TstgCustomCollectionItem;
    PClass   : TClass;
    IClass   : TstgCustomCollectionItemClass;
    OldState : TstgObjectState;
begin
  // Ensure that any persistent ancestors get acted upon
  PClass := Item.ClassParent;

  if (PClass <> nil) and (PClass.InheritsFrom(TstgCustomCollectionItem)) then begin
    IClass   := TstgCustomCollectionItemClass(PClass);
    Ancestor := IClass.Create(nil);

    try
      Ancestor.Assign(Item);

      case Op of
        poLoad   : Persistence.Load(Oid,Ancestor);

        poSave   : begin
          // Save state because want to force both updates 
          // to use same state
          OldState := Ancestor.State;
          Persistence.Save(Ancestor,OwnerOid);
          // Restore state because saving will have cleared it
          Ancestor.State := OldState;
        end;

        poDelete : Persistence.Delete(Ancestor);
      end;

      Item.Assign(Ancestor);
    finally
      FreeAndNil(Ancestor);
    end;
  end;
end;

This method looks to see if the ancestor class of the item passed to it is also a BO, and if it is, it calls the mapper for that class (with a bit of mucking around to preserve the State). This process is therefore recursive, and Load, Save or Delete will get called for every class in the inheritance hierarchy of the object, stopping when it hits TstgCustomCollectionItem. This way changes to base classes automatically propagate to subclasses. Admittedly there are more queries this way, but it is only done when loading, saving or deleting individual objects, and performance is normally acceptable.
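To make the recursion concrete, here is a minimal sketch of the class walk on its own. It is not the OPF's code: TBaseItem, TPerson and TContact are hypothetical stand-ins for TstgCustomCollectionItem and the demo classes, and PersistentAncestors is an invented helper that just lists the classes whose mappers would be visited, most derived first.

```pascal
program AncestorWalkSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

type
  // Hypothetical stand-ins for the classes in the paper
  TBaseItem = class(TPersistent);  // plays the role of TstgCustomCollectionItem
  TPerson   = class(TBaseItem);
  TContact  = class(TPerson);

// Lists the classes whose mappers would be called for AClass,
// starting with AClass itself and stopping before the base item class
function PersistentAncestors(AClass : TClass) : string;
begin
  Result := '';

  while (AClass <> nil) and (AClass <> TBaseItem) and
        AClass.InheritsFrom(TBaseItem) do begin
    if Result <> '' then begin
      Result := Result + ',';
    end;

    Result := Result + AClass.ClassName;
    AClass := AClass.ClassParent;
  end;
end;

begin
  // A contact touches the contact table first, then the person table
  WriteLn(PersistentAncestors(TContact));
end.
```

The sketch uses a loop where the OPF recurses through the Persistence object, but the order in which the classes are visited is the same.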

Remember that earlier on I said that when I load a collection, I often only load a small subset of properties for each BO in the collection? This is to save time and memory and is an example of the Proxy pattern. In the attached demo program, we have a collection of contacts, where TContact is defined similarly to the example above. Typically in this situation, the information needed in a proxy load is that which is needed to display the collection items in a list. For contacts, this would normally be the title, first name and surname. This means that the mapper for a contacts collection and a persons collection could actually be the same. We define which mapper to use by registering it with the Persistence object (we’ll examine this in more detail later):

    Persistence.RegisterClass(TPersons,TPersonsMapper);

So we could register the same TPersonsMapper for use on TContacts collections:

    Persistence.RegisterClass(TContacts,TPersonsMapper);

However, I wanted to show you that you can use more complex queries if you want, so the TContactsMapper class loads all the non-BO properties of each contact by using a join. Rather than show you all the code, I’ll let you examine that yourself, and we’ll just look at the generated query:

select 
  Cont_Description, Cont_PhoneNumber, 
  Pers_Title, Pers_FirstName, Pers_Surname, 
  Contacts.OID, Contacts.Owner_OID 
from 
  Contacts inner join Persons on 
  (Contacts.Oid = Persons.Oid) 
where 
  (Contacts.Owner_OID = '{00000000-0000-0000-0000-000000000000}')

As you can see, this lets us load all the properties of a TContact, except the Address property (and we could even do that if we wanted to). Normally, when doing a collection load, there are enough BO and BO collection properties that performance would suffer if we did this whenever we just wanted a quick display of a collection. However, sometimes performance is actually better with complex joins, even using outer joins to get all the members of subcollections. The most common time we need to do that is in reports, and for that reason, the OPF allows a different collection mapper to be registered for reports (if there is no report mapper registered then reports load collections using the normal mapper).

Handling aggregation

Aggregation is an ill-defined term, but what I mean here is handling those situations where an object refers to other objects, but does not own them. An example is a cricketer object that might contain a list of all the matches in which the cricketer has played. Deleting the cricketer should not result in a cascading delete in this case; we want the match objects to remain intact (although if they have references to the cricketer object, those should be removed). Surprisingly, given that the applications this OPF is mainly used for are sporting applications, this type of situation has rarely come up. I’m not going to go deeply into this aspect, at least partly because I’m not satisfied with the way my OPF handles this situation.

However, the general strategy is this. There is a different collection class that is used to hold this sort of relationship, that privately stores the OIDs and type of the related objects. The mappers for this type of collection use a link table in the database that has two columns of OIDs. Because we use lazy, or proxy, loading, the presentation layer has to take care to make sure the related objects are loaded before using them. Deleting is also an issue if there are links both ways between objects (e.g. the cricketer objects contain matches, and the match objects contain cricketers).

My code for handling this is not elegant, and I can’t help but think that I need a different approach. Delphi does this sort of thing when one component refers to another (like a TDatasource having a TDataset reference), and has a notification system to handle changes to the linked objects, and I think I need to build in something along those lines.

Validation and constraints

There are two main places to do validation and constraint checking. One is when a property value is being set, which normally happens when the presentation logic decides values need to be read in from the user interface. In our demo program, the PhoneNumber field will only accept numbers and spaces, so the SetPhoneNumber method looks like this:

procedure TContact.SetPhoneNumber(const Value : string);
  var
    i : Integer;
begin
  // Check phone number only contains numbers and spaces
  for i := 1 to Length(Value) do begin
    if not (Value[i] in ['0'..'9',' ']) then begin
      raise Exception.Create('A phone number can only contain ' + 
                             'numbers and spaces');
    end;
  end;

  if FPhoneNumber <> Value then begin
    FPhoneNumber := Value;
    State        := osModifiedSelf;
  end;
end;

It is also possible to check business rules when an object is about to be saved or deleted by overriding the CheckConstraints method, as we do when checking that a person (and therefore a contact) has either a first name or a surname before it can be saved:

procedure TPerson.CheckConstraints(ConstraintType : TstgConstraintType);
begin
  if ConstraintType = ctSave then begin
    if (Trim(FirstName) = '') and (Trim(Surname) = '') then begin
      raise Exception.Create('You must enter a first name or surname');
    end;
  end;
end;

The rules may be more complex. For example, in accounting systems you normally would not want customers deleted if they had unpaid invoices or unfulfilled orders, or if they were also a supplier, and so on. In a real-world situation, your objects and business logic will be more complex than our simple demo program, but hopefully this gives you the gist of how objects are persisted.

Choosing the mapper

Observant readers may have noticed that we have not yet shown where CheckConstraints is called. We have also not yet shown how the relevant mapper is chosen for a BO or BO collection. All of this is handled by the Persistence object.

The Persistence object is a Singleton, and also a Facade, as it hides the complexities of the persistence layer behind a simple interface (again, somewhat abbreviated for clarity):

  TstgPersistenceFacade = class(TObject)
  public
    // Collection item persistence methods
    procedure Load(Oid : TstgOid;Item : TstgCustomCollectionItem); overload;
    procedure Save(Item : TstgCustomCollectionItem;OwnerOid : TstgOid); overload;
    procedure Delete(Item : TstgCustomCollectionItem); overload;

    // Collection persistence methods
    procedure Load(OwnerOid : TstgOid;Collection : TstgCustomCollection;DoReportLoad : Boolean = False); overload;
    procedure Save(Collection : TstgCustomCollection); overload;
    procedure Delete(Collection : TstgCustomCollection); overload;

    // Call this to register a class as persistable
    procedure RegisterClass(AClass : TPersistentClass;MapperClass : TstgMapperClass);
    // Call this to register a mapper for use in reporting 
    procedure RegisterReportClass(AClass : TPersistentClass;MapperClass : TstgMapperClass);
  end;

I often use classes with only class methods when I need a Singleton, but in this case that is not possible, as the Persistence object needs to keep track of the BO and BO collection class to mapper class mappings[5]. Internally, lists are kept of BO/mapper, BO collection/mapper and BO collection/report mapper pairs. These are built using the RegisterClass and RegisterReportClass methods. For instance, the contacts mappers are registered like this:

initialization
  Persistence.RegisterClass(TContact,TContactMapper);
  Persistence.RegisterClass(TContacts,TContactsMapper);

Doing the registering during initialisation of a unit means it is done before any loading of objects takes place. When an operation is requested on a BO or collection, the following routine is ultimately called:

procedure TstgPersistenceFacade.DoOperation(Oid,OwnerOid : TstgOid;
                                            Item : TPersistent;
                                            Op : TstgPersistOp;
                                            DoReportOperation : Boolean);
  var
    MClass : TstgMapperClass;
    FClass : TStgServerSideFilterClass;
begin
  if not Assigned(Item) then begin
    Exit;
  end;

  MClass := GetMapperClass(TPersistentClass(Item.ClassType),
                           DoReportOperation);

  if Assigned(MClass) then begin
    try
      if Item is TstgCustomCollectionItem then begin
        case Op of
          poLoad   : MClass.Load(Oid,TstgCustomCollectionItem(Item));

          poSave   : begin
            TstgCustomCollectionItem(Item).CheckConstraints(ctSave);
            MClass.Save(TstgCustomCollectionItem(Item),OwnerOid);
          end;

          poDelete : begin
            TstgCustomCollectionItem(Item).CheckConstraints(ctDelete);
            MClass.Delete(TstgCustomCollectionItem(Item));
          end;
        end;
      end else if Item is TstgCustomCollection then begin
        case Op of
          poLoad   : begin
            if Assigned(TstgCustomCollection(Item).ServerSideFilter) then begin
              FClass := TStgServerSideFilterClass(TstgCustomCollection(Item).ServerSideFilter.ClassType);
            end else begin
              FClass := nil;
            end;

            MClass.Load(OwnerOid,TstgCustomCollection(Item),GetFilterMapperClass(FClass));
            TstgCustomCollection(Item).Sort;
          end;

          poSave   : begin
            TstgCustomCollection(Item).CheckConstraints(ctSave);
            MClass.Save(TstgCustomCollection(Item));
          end;

          poDelete : begin
            TstgCustomCollection(Item).CheckConstraints(ctDelete);
            MClass.Delete(TstgCustomCollection(Item));
          end;
        end;
      end;
    except
      on EStgVerboseError do begin
        // Just reraise the exception because this routine can get 
        // called recursively, and we just want to exit the call 
        // stack while preserving the extra error information
        raise;
      end;

      on E : Exception do begin
        MClass.RaiseException(E,Item.ClassType,Op);
      end;
    end;
  end;
end;

In many ways, this routine is the heart of the OPF. It looks for a mapper for the item (which may be a BO or BO collection) that is passed to it using GetMapperClass. If it is for a reporting operation, then a different list of collection mappers will be searched. Depending on what we are trying to do, the constraint checking is done, and then the loading, saving or deleting is done.
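The lookup itself is not shown in this paper, so the following is only a sketch of how a class-to-mapper registry of this kind might work. All the names in it (TMapper, TPersonMapper, TUnmapped, RegisterClassMapper and the Mappings array) are hypothetical, and the real GetMapperClass also takes the report flag, which is left out here.

```pascal
program MapperRegistrySketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes;

type
  // Hypothetical stand-ins; the real classes have many more members
  TMapper       = class(TObject);
  TMapperClass  = class of TMapper;
  TPersonMapper = class(TMapper);
  TPerson       = class(TPersistent);
  TUnmapped     = class(TPersistent);

  // One BO class / mapper class pair
  TMapping = record
    BOClass     : TPersistentClass;
    MapperClass : TMapperClass;
  end;

var
  Mappings : array of TMapping;

// Adds a BO class / mapper class pair to the registry
procedure RegisterClassMapper(AClass : TPersistentClass;
                              AMapper : TMapperClass);
begin
  SetLength(Mappings,Length(Mappings) + 1);
  Mappings[High(Mappings)].BOClass     := AClass;
  Mappings[High(Mappings)].MapperClass := AMapper;
end;

// Returns the mapper registered for exactly this class, or nil if none
function GetMapperClass(AClass : TPersistentClass) : TMapperClass;
  var
    i : Integer;
begin
  Result := nil;

  for i := 0 to High(Mappings) do begin
    if Mappings[i].BOClass = AClass then begin
      Result := Mappings[i].MapperClass;
      Exit;
    end;
  end;
end;

begin
  RegisterClassMapper(TPerson,TPersonMapper);
  WriteLn(GetMapperClass(TPerson).ClassName);
  // No mapper registered, so an operation on TUnmapped would do nothing
  WriteLn(GetMapperClass(TUnmapped) = nil);
end.
```

Matching on the exact class, rather than with InheritsFrom, fits the way AncestorOperation asks for each ancestor's mapper separately.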

Note that if there is no mapper class, then nothing happens. The big advantage of this is that work can proceed on the business objects completely separately to the mappers and database. You can develop the BOs first, and only implement persistence later when you’re happy with them. And you are completely insulated from changes in the persistence layer, be that modifications to the database structure, or the use of a completely different database. It is even possible to change backend databases at runtime. I do this by also registering the backend with the persistence facade:

  // Run locally on Access
  Persistence.RegisterBackend(TstgADOMapperBackend,ExtractFilePath(Application.ExeName) + 'Demo.mdb');

  Persistence.Initialise;  
  // Load all the contacts
  Persistence.Load(SystemObject.Oid,SystemObject.Contacts);

The call to Persistence.Initialise performs any setup and connection needed for the registered backend database. Registering a SQL Server database is similar, except that the ADO connection string is passed as a parameter. I’ve used this method when transferring data from one database to another; I read in all the data using one backend (marking it all as new), changed to the new one, and saved all the data.

Filtering

The DoOperation method also handles sorting and filtering when loading collections. Filtering is handled in a similar fashion to the persistence operations. A TstgServerSideFilter descendant is declared for any collection that might need a filter. For instance, this is the filter for a TPersons collection:

  TSurnameFilter = class(TstgServerSideFilter)
  private
    FSurname : string;
  public
    property Surname : string read FSurname write FSurname;
  end;

They are usually simple placeholders for values or ranges to filter on. If you want to apply a filter to a collection, then create an instance of the filter class, fill in the relevant values and assign it to the ServerSideFilter property of your collection, and load the collection. In our demo program we have a TSurnameFilter object defined on the main form. This is how it is used:

  FSurnameFilter.Surname                 := FilterEdit.Text;
  SystemObject.Contacts.ServerSideFilter := FSurnameFilter;
  Persistence.Load(SystemObject.Oid,SystemObject.Contacts);
  RefreshGrid;

To clear the filter, assign the ServerSideFilter property back to nil and load the collection again. (If you want to do client-side filtering, the easiest way is to use an Iterator on the collection.)

Each filter class needs a corresponding filter mapper class. For our surname filter, this is:

  TSurnameFilterMapper = class(TstgADOServerSideFilterMapper)
  protected
    function DoGetWhereClause : string; override;
  end;

implementation

function TSurnameFilterMapper.DoGetWhereClause : string;
begin
  Result := 'Pers_Surname = "' + TSurnameFilter(Filter).Surname + '"';
end;

As you can see in DoOperation, the Load procedure for a collection gets passed the relevant filter mapper, and the filter mapper generates extra SQL which is used in the where clause of the query that loads the collection. You can follow this through the demo application with the Delphi debugger to see how it all works.
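One caution about building a where clause by string concatenation: a surname such as O'Brien, or one containing a double quote, can break the generated SQL or alter its meaning. The sketch below shows one way to guard against that using QuotedStr from SysUtils, which wraps a string in single quotes and doubles any single quotes inside it. SurnameWhereClause is a hypothetical helper, not part of the OPF, and it assumes the backend accepts single-quoted string literals.

```pascal
program WhereClauseSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: builds the filter condition with the
// user-supplied value safely quoted
function SurnameWhereClause(const Surname : string) : string;
begin
  Result := 'Pers_Surname = ' + QuotedStr(Surname);
end;

begin
  // The embedded apostrophe is doubled rather than ending the literal
  WriteLn(SurnameWhereClause('O''Brien'));
end.
```

Where the query component supports them, parameterised queries are generally the safer choice, but that would mean the filter mapper supplying parameter values as well as SQL text.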

Sorting

Sorting is done a little differently. Originally, all sorting was done by creating a TstgSortOrder subclass, and assigning it to a collection. It turned out that most of the sorting we needed was an ascending or descending sort on one property, so we added this method to the base collection class:

procedure TstgCustomCollection.Sort(const SortPropName : string;Ascending : Boolean);
  var
    OldSortOrder : TstgSortOrder;
    TempPropName : string;
begin
  TempPropName := SortPropName;

  // If the SortPropName is empty then use the default 
  // sort property name
  if TempPropName = '' then begin
    TempPropName := DefaultSortProp;
  end;

  // If the default sort property name is blank then don't sort
  if TempPropName = '' then begin
    Exit;
  end;

  OldSortOrder := FSortOrder;
  FSortOrder   := TstgGenericSortOrder.Create;

  try
    FSortOrder.Ascending                          := Ascending;
    TstgGenericSortOrder(FSortOrder).PropertyName := TempPropName;
    Sort(FSortOrder);
  finally
    FreeAndNil(FSortOrder);
    FSortOrder := OldSortOrder;
  end;
end;

The important lines are the ones that create a sort order object, assign it the property name we want to sort on and the sort direction, and sort the collection using that sort order object. (Sort objects are also used for searching, by the way, but work in the same fashion.) The Sort routine itself is simple, but a little strange:

function Compare(Item1,Item2 : TstgCustomCollectionItem) : Integer;
  var
    SortOrder : TstgSortOrder;
begin
  Result    := 0;
  SortOrder := TstgCustomCollection(Item1.Collection).FSortOrder;

  if Assigned(SortOrder) then begin
    Result := SortOrder.DoCompare(Item1,Item2);
  end;
end;


procedure TstgCustomCollection.Sort(SortOrder : TstgSortOrder);
begin
  FSortOrder := SortOrder;
  SetCurrItemAsInvalid;

  // Hacking to TCollection TList - see interface for explanation
  GetListFromCollection(Self).Sort(@Compare);
end;

We found that no matter what sort algorithm we used, performance on large collections was awful. This was finally found to be due to the way collections reference their items – time to access an item was directly proportional to its index in the collection. The internal TList in a collection seems to step through all the items until it gets to the one it wants, rather than jumping straight there. However, if you call TList.Sort, the sort is lightning fast. So that’s what we did, and the GetListFromCollection routine is a hack to get at the private, internal TList. My paper “The Path Less Travelled” explains how that works in more detail.

The TList.Sort routine takes a function as a parameter (not a method of a class; although the two might look the same, a method is not the same thing as a plain unattached function). This function takes two items as parameters, and should return a negative, zero or positive result if the first item is less than, equal to, or greater than the second item, respectively. The Compare routine used to sort collections looks for the owner collection of the items being compared, gets any sort object attached to that collection and uses its DoCompare method for the actual comparison.

In the case of a TstgGenericSortOrder object, RTTI is used to get the value of the named property for each item, and a relevant comparison routine is called on those values. All this is also explained in “The Path Less Travelled”. Obviously this technique has limits; the two items must both have a property of the given name, and it should be of the same type in each case. It is not essential that the items be of the same type for this technique to work, but as we are sorting a homogeneous collection, they will be. This also assumes that the sort property is of a base type like string, integer, enumerated type and so on. Without more work it will not work on complex types like records, arrays, sets or classes.
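As an illustration of the two ideas together (a plain comparison function passed to TList.Sort, and RTTI used to read a named property), here is a small self-contained sketch. It is not the TstgGenericSortOrder code: TPersonItem, CompareByProp and SortedSurnames are hypothetical, it handles string properties only, and a global variable stands in for the sort order object that the real Compare routine finds through the owner collection.

```pascal
program PropertySortSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes, TypInfo;

type
  // Hypothetical item class. TPersistent is compiled with $M+, so the
  // published properties of its descendants are visible through RTTI
  TPersonItem = class(TPersistent)
  private
    FSurname : string;
  published
    property Surname : string read FSurname write FSurname;
  end;

var
  // Stands in for the sort order object attached to the collection
  SortPropName : string = 'Surname';

// A plain function, not a method, as TList.Sort requires
function CompareByProp(Item1,Item2 : Pointer) : Integer;
begin
  Result := CompareText(GetStrProp(TObject(Item1),SortPropName),
                        GetStrProp(TObject(Item2),SortPropName));
end;

// Builds a list of items, sorts it, and returns the surnames in order
function SortedSurnames(const Surnames : array of string) : string;
  var
    List : TList;
    Item : TPersonItem;
    i    : Integer;
begin
  Result := '';
  List   := TList.Create;

  try
    for i := Low(Surnames) to High(Surnames) do begin
      Item         := TPersonItem.Create;
      Item.Surname := Surnames[i];
      List.Add(Item);
    end;

    List.Sort(@CompareByProp);

    for i := 0 to List.Count - 1 do begin
      if i > 0 then begin
        Result := Result + ',';
      end;

      Result := Result + TPersonItem(List[i]).Surname;
    end;
  finally
    // TList does not own its items, so free them explicitly
    for i := 0 to List.Count - 1 do begin
      TObject(List[i]).Free;
    end;

    List.Free;
  end;
end;

begin
  WriteLn(SortedSurnames(['Smith','adams','Jones']));
end.
```

CompareText gives a case-insensitive comparison; a version that handled integer or enumerated properties would need to inspect the property's type and pick a matching comparison, which is the extra work the generic sort order takes on.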

The demo application shows sorting and filtering in action.

Presentation layers

The presentation layer (using the term loosely!) in the demo code is extremely simple, with everything hard-coded into event handlers and so on. However, it should be obvious that editing dialogs will often take the same form as the TEditDlg, and that it could be refactored into a more generic solution, perhaps even to the point of generating the editing controls on the fly. In the production version of my OPF, the Mediator and Observer patterns were heavily used to create a particular style of UI, mostly using listview and treeview components. The code is complicated, highly specific and not pretty, so I won’t present it here[6].

User interfaces

Alternative solutions that have been used include the Model View Controller and Model View Presenter patterns, loading objects into in-memory datasets and using data-aware controls, and writing components that know how to deal with BOs (Bold uses Bold-aware controls). It’s too complicated a subject for this paper and I’m going to refer you to the references for more information.

Reporting

Another problem when using business objects is how to deal with reporting requirements. Report writers like Crystal Reports won’t know anything about your BOs, and most of the Delphi reporting components also normally assume they will have a direct connection to a database. My major client uses ReportBuilder, and it’s a common tool, so we’ll have a look at how to create a simple report with it. The principles will be similar with other vendors’ offerings.

The first thing we do is to create a new contacts collection mapper specifically for reporting. This mapper will make it possible to load the address at the same time by generating the following SQL:

select
  Cont_Description, Cont_PhoneNumber,
  Pers_Title, Pers_FirstName, Pers_Surname, 
  Add_HouseNumber, Add_Street, Add_Town, Add_Postcode, 
  Contacts.OID, Contacts.Owner_OID 
from 
  (Contacts inner join Persons on (Contacts.Oid = Persons.Oid))
            inner join Addresses on (Contacts.Oid = Addresses.Oid) 
where 
  (Contacts.Owner_OID = '{00000000-0000-0000-0000-000000000000}')

Each contact item can then be fully loaded from the resultset. We register the new mapper for use during reporting:

  Persistence.RegisterReportClass(TContacts,TContactsReportMapper);

I’ve found it best to load a separate copy of a collection when doing a report. That way there is no impact on the user interface. To create a simple report only requires a TppJITPipeline and a TppReport component on a form, with the pipeline hooked to the report component (see ReportForm.pas – this assumes you have ReportBuilder installed). The demo program has a simple two column report defined in the TppReport component showing names and addresses. The entire source code for the report is shown below:

class procedure TReportDlg.Print;
begin
  with Self.Create(nil) do begin
    try
      FContacts := TContacts.Create(nil,TContact);
      Persistence.Load(SystemObject.Oid,FContacts,True);
      Pipeline.RecordCount := FContacts.Count;
      Report.PrintReport;
    finally
      FContacts.Free;
      Free;
    end;
  end;
end;


procedure TReportDlg.FieldGetText(Sender : TObject;aLines : TStrings);
begin
  if not InRange(Pipeline.RecordIndex,0,FContacts.Count - 1) then begin
    Exit;
  end;

  if Sender = NameMemo then begin
    aLines.Text := FContacts[Pipeline.RecordIndex].FullName;
  end else begin
    aLines.Text := FContacts[Pipeline.RecordIndex].Address.FullAddress;
  end;
end;

Again I’m using a class procedure. This Print method creates an instance of the form and a new contacts collection field local to the form. The contacts are then loaded by the OPF. You could use a filter and/or a sort here of course – I often use the same ones as are currently used in the displayed collection. The pipeline component feeds data to the report generating code in ReportBuilder, so we tell it how many records there are to display, and then print the report.

The report display components each have an event to get the text for the current record. The one event handler deals with both components to get the text for each column. The result is something like this:

[Figure: the printed report, with one column of names and one of addresses]

Restrictions and possible enhancements

That pretty much covers the design and use of my OPF. I do have specialist classes for reporting, exporting, user interfaces and so on, and you will no doubt find a need for similar classes yourself. It is a simple OPF, designed for applications with highly hierarchical data, where there will not be enormous demands on the database, either from the amount of data or the number of users. As such, there are a number of limitations to the framework, and some other improvements that could be made. In no particular order, some of these are:

  • Use RTTI in mappers. Currently I have a wizard that generates code for my BOs, their collections and mappers for both. In most cases, all that would really be necessary in the mapper is the table name and a prefix for the field names. Everything else could be generated using RTTI from the BO definition. One day I will get time to write a general purpose mapper.
  • You can only have one collection of a given type in a class. We discussed this earlier. A workaround is having two types of items that only differ in class name (i.e. a new subclass with no new properties), with the new type mapping to a new table. A more general solution is to give collections an OID, and have a link table that stores that OID against each of the OIDs of the objects in the collection. This makes the database and mappers a bit more complex, and I haven’t needed to do it, but it is perfectly possible.
  • Collections are homogeneous only. You may need to store lists of different types of objects.
  • As we also saw earlier, the code I’ve attached doesn’t deal with collections of objects not owned by the owner of the collection.
  • Transaction support may need to be implemented. The Persistence object is the most likely candidate.
  • The OPF is not thread-safe. The lists used to store the mappings are thread-safe, but it has not yet proved necessary to develop multi-threaded applications, so that’s about as far as I went down that road. At the very least, the ADO mapper class would need to use different query components in each thread.

Examples of commercial and freeware OPFs

  • Borland’s Model-Driven Architecture (MDA). This used to be known as Bold in Delphi 7, and is the most sophisticated of the commercial offerings for Win32 versions of Delphi. Currently it is only available as part of Delphi Architect products (this actually makes it much cheaper than it used to be). Work on Bold has ceased, and effort is now focussed on the .NET version known as ECO.
  • TechInsite OPF (tiOPF) is a freeware offering
  • Spider Object Database is a commercial product that, despite the name, is an OPF
  • InstantObjects is a product that integrates with ModelMaker
  • There are a couple of slowly evolving efforts on SourceForge and JEDI, one of which is DePO

Conclusion

I hope that this tour of my Object Persistence Framework has given you some insights into the workings and advantages of this technique. The demo program is rather small and contrived, but you should be able to extrapolate from it to get some sense of what a larger application would be like (no matter how large or how many tables, there will still be only two data access components, and you won’t see either one on a data module!). This framework may well not be suitable for your needs, but it may inspire you to investigate one of the other options, or even write your own. It’s a rewarding exercise.

References

Fowler, Martin. Patterns of Enterprise Application Architecture, 2003, Addison-Wesley ISBN 0-321-12742-0
This was originally a set of articles at http://www.martinfowler.com.

Larman, Craig. Applying UML and Patterns, 2002 Prentice-Hall PTR ISBN 0-13-092569-1

Joanna Carter’s articles on object persistence and the MVP pattern: http://www.btinternet.com/~joannac/

Philip Brown's articles entitled "An Object-Oriented Persistence Layer Design" at CodeCentral on the Borland Developers Network

Scott Ambler’s articles




[1] Let’s call this ToD:ADS, or just ToD for short.
[2] See www.prevayler.org for a more sophisticated version of this that will work in multi-user situations.
[3] Note for Australians: Yes, I know that’s an oxymoron, but we need to let them salvage a little pride.
[4] Originally, there was only one modified flag, but in use we found that having two allowed us to optimise the mappers when objects got very complex (they didn’t have to check objects and collections down the composition hierarchy). Most of the data used with this system is extremely hierarchical, so any optimisations tend to reflect that.
[5] Damn, that’s an ugly sentence. We have pairs of classes, e.g. a BO class and a mapper class, and we need to keep track of which mapper class deals with which BO class.
[6] I’m not particularly proud of it either, to be honest. While I was able to make use of DUnit to test the OPF, that was next to impossible for the presentation stuff, and it shows. The code was much buggier, and much uglier because it’s too risky to refactor such important code without unit tests.