Mapping Objects 2 Tables

Introduction for Mapping Objects to Tables

Object-orientation and the relational model are different paradigms of programming. When objects need to be stored in relational databases, the gap between the two different sights needs to be bridged. If only data abstraction modules have to be mapped to a relational database, life is comparably easy [Kel+97 ]. With full blown object models the concepts of object oriented programming have to be mapped to relational table structures. These are:

Aggregation,
inheritance and polymorphism,
associations between classes,
and data types – smarter than SQL data types.

Each of the above concepts may be mapped using different solutions for the same problem. We will describe each different solution as a separate pattern. This approach allows us to clearly demonstrate the consequences of using a solution with respect to the general forces presented next. Mapping data types requires a larger effort [Kär95 ] and may be treated as a pattern language of its own. We will therefore leave it to future pattern mining work.

General Forces

Performance: One of the major forces that you should take into account when mapping objects to tables is performance. The way objects are mapped to tables has significant influence on the number of database accesses that occur in a system. Database accesses that have to be executed using hard disks or other external media are measured in milliseconds (10^-3 sec.). Processor cycles on the other hand are measured in nanoseconds (10^-9 sec). It is therefore a good idea to waste a few processor cycles and some RAM memory to economize on slow IO.
Read versus write/update performance: The solutions we will present for various mapping problems have different characteristics if it comes to read versus update/write performance. Some mappings allow you to read everything needed in a single database access while it takes several database operations to write the same object, due to the mapping used for inheritance. Therefore be sure you know the frequency of read and write/update operations before you commit yourself to a certain table design.
Flexibility and maintenance cost: Sometimes you might want to use a database mapping in a prototyping process. In this case flexibility is more important than performance as you will often insert or delete attributes, add or delete classes and restructure your class hierarchy. Once the hierarchy and classes become stable you may want to switch to a mapping with optimal performance.
Performance and redundancy versus maintenance cost and normal forms: The relational calculus helps you eliminate redundancy using normal forms and factorization. Relational database applications on the other hand show best performance with a minimal number of accesses to the database. The expensive factor is seek time for a certain record and not bandwidth for reading the data from a disk once they are located on the disk. Hence these applications perform best if they are able to retrieve all data needed for a use-case with a single access to the database or if they hit a cluster of data.
- Accesses to the database can be reduced by eliminating factorization and ignoring normal forms – which has negative consequences on the maintainability of the application [Kel+97
- Clustering can be influenced by database administration
- Maintainability of a data model and performance are two conflicting goals. Therefore: The harder you optimize your data model for performance, the higher your maintenance cost in case of changes to the application. Redundancy and all kinds of anomalies, normally prevented by the use of normal forms, have to be taken care of by maintenance.
Space consumption versus Performance: There are mappings that use no surplus database space (fields with null values and the like) and others that leave large portions of a database record unused. It is not surprising that the worst space hogs are often the fastest performers.
Query processing: There are two conflicting purposes, data have to serve in a business information system.
- Data have to be ready for online transaction processing with good performance. This implies restructuring of data for optimal performance. See [Kel+97 ] on how to optimize table schemes for good performance.
- Data have to be ready for data warehouse purposes. This implies that data have to be represented in a form that is well suited for ad hoc queries. Normal forms, no redundancy and maximum factorization serve this purpose.
- Building a data warehouse often implies separating the queryable data from the data needed for fast online processing. If you design a table mapping for objects, check whether the application is a data warehouse or an online processing application.
Application style: Besides database driven business information systems there are other types of information systems. Using a relational database as persistence mechanism for some of these might end in disaster. Some examples are.
- CAD applications: CAD applications are used to manipulate large sets of very complex, interrelated objects. Transactions are long. A CAD designer typically checks out a design, works on it for hours and then checks it back into some data store. Building such applications on top of a relational database, using an object relational database mapping is doomed to fail. Simple pointer dereferencing in working storage is faster by a factor 10⁶ than joins. Relational databases are not intended for very long transactions with zero collision rate.
- CASE Tools: CASE tools have characteristics similar to CAD systems. IBM’s negative experience with the AD/Cycle repository is a prominent example of what happens if such applications are implemented on top of a relational database.
- Any check in / check-out persistence applications: The above example can be generalized to applications that use complex, interrelated objects, allow direct manipulation and allow the user to check them out of a database for a longer period of time. Such systems should be built using non-relational data stores.
- Check you do not build one of the above applications before you map objects to relations.
Integration of legacy systems: Business information systems are seldom developed from scratch. Instead, you have to connect to legacy systems, which you are not allowed to touch. In our case you might have to build objects on top of legacy data. In this case you have to use whatever mapping patterns that fit your legacy data. You may then apply the consequences sections of the patterns. They inform you about performance implications of the mappings you have to use.