CRUD, Only When You Can Afford It
Summary: Microsoft Architect Maarten Mullender examines the CRUD (Create, Read, Update, Delete) model, and determines that CRUD requires a deep understanding of its limits, specifically when dealing with concurrency issues. (11 printed pages)
Table of Contents
After my article on Dealing with Concurrency: Designing Interaction Between Services and Their Agents and my talks at TechEd in Amsterdam, I needed to express my thoughts on CRUD (Create, Read, Update, and Delete). I concluded that article with the following recommendations:
- Confine pessimistic resource usage
- Make information predictable:
- Manage the time of validity
- Personalize by assigning ownership
- Limit optimistic updates, for instance, add rows instead of update fields
- Design business actions:
- State your purpose
- State your preconditions
- Design postings
- Use the journal pattern
When I wrote "Limit optimistic updates," one of the things I meant to imply is to limit the use of CRUD, because you shouldn't use CRUD if you don't have to. Here, I'll elaborate on this and discuss those cases where it makes sense to use it, because despite its limitations using replication (CRUD) is a great way to go for a certain set of problems.
Hearing people talking about CRUD makes me cringe; I feel the urge to react. So, let me get that off my chest first.
Let me try to explain what I mean with CRUD and what not. CRUD to me means that the entities are simply read, created, updated and deleted. CRUD works with entities and those entities are changed by retrieving data from a service, modifying the data by setting properties, and then sending the data back to the service for update. CRUD is data-oriented. I don't ask for a specific business action, I just ask the service to create an entity, or delete an entity, or I ask it to update the entity by giving the service the new data for that entity. If more needs to be done, the service derives from the data changes what action to execute.
In contrast, most order processing is not CRUD, or at least not according to my definition. For example, an order can be created offline and then sent (replicated if you will) to a service for processing. Processing of that order will affect many of the related entities. The service may update the customer information, potentially changing more than just the year-to-date totals. For instance, the customer might have reached the critical order mass and be upgraded, updating properties used for price and discount calculations; products may or may not be available; delivery dates may or may not have been realistic; and so forth. These changes are important to both parties, but with CRUD, the customer's copy of the order would not reflect them.
Regardless of the physical implementation, an order has the semantics of requesting the service accept and process a new order. Such an order is a complex request for a complex business transaction and I regard such a request as a "journal." A journal being a set of business actions combined in a single request document.
The reason for my allergic reaction to CRUD may stem from the way I see some developers misuse Datasets. Such a developer will write a service that returns a Dataset to the user, where it is modified and sent back. The service or even a stored-procedure checks the before and after image, the timestamp or the version – ouch! That is a very crude way of checking the validity of the request Will my request fail to execute if someone changed something irrelevant in this record, and can it execute even though someone changed related data? Am I prevented from changing the telephone number because someone else added an e-mail address? Can I still add the order line even though someone changed the price or the product description?
Instead, why not have the request itself specifically state what it entails and specify when the update should be executed or refused? For example, rather than "Buy when the version of the information retrieved in an earlier request was 30256917," have the condition be "Buy when the price is at most x." Or why not instead of: "the old address was y, the new address is z" have a request that specifies the reason for change. Did the customer move? Then we better start a complete process. Or did we just correct a typo in the contact information?
Let's take an example that clearly does not work and analyze why it doesn't work. I keep my local check register and I transfer money from my U.S. checking account to my Dutch Checking account. If I specify that change by specifying the changed values of my checking account:
|From||654321-90||U.S. checking||USD 1200.00||Rate 3.41%|
|To||654321-90||U.S. checking||USD 1000.00||Rate 3.41%|
And the changes of my Dutch checking account:
|From||654321-10||Dutch checking||EUR 45000.00||Rate 3.56%|
|To||654321-10||Dutch checking||EUR 45244.09||Rate 3.56%|
This fails because it:
- Implies dependencies you may not want. Suppose the interest percentage changes. Do I care? If I use all the values of the entity in my request, I have concurrency issues that I don't want. In this case, I should omit the dependency on the interest rate that I do not want.
- Omits dependencies you may need. If something related changes, such as the currency conversion rate, the amount I deposited would not be correct. If I only use the values of the entity in question, I do not check other dependencies. To avoid conflict, I should add the foreign currency rate from the currency table if I want to brace for change.
- Introduces dependencies that may have been avoided. Specifying absolute values increases the chance of optimistic failure. I only want to use the values that do not change such as the account-number in my request, but not the balance. Instead of relying on the balance being USD 1200.00, I could use a posting and specify a change of –USD 200.00.
- Loses the context of the change. How does the U.S. bank know that it needs to transfer money to another bank? How does the Dutch bank know where the money is supposed to come from? By specifying only entity values, I cannot request a more complex change. Unless I introduce a specific entity for this purpose—in this example, a transfer-money request entity.
- Does not guarantee consistency. By requesting per instance changes, the service cannot guarantee consistency—which delegates consistency to the service's consumer. How can I make sure that either both actions are executed or neither is, if I send two updates? I have to maintain that consistency by other means.
Is there a better way to do this? A much better way to do this is to send the data to the bank as a series of requests framed as business actions such as: "Add USD 200 to my Dutch checking." This way, there is only one source for the data and the two do not need to be rectified. Also, I let the business logic check for the other dependencies, such as the conversion rate.
Yet a lot of solutions have been built using the principle of CRUD and at least some of them were very successful. CRUD has been used successfully for entity aggregation as well as for offline scenarios. Why did the designers choose CRUD?
Replication is the main reason for designers to use CRUD. Ease of interface design and implementation is another. People want to use replication as offered by e-mail systems and databases. Even though I will not go into the various ways to configure databases for data replication, I will look into how to use replication to keep two or more systems synchronized while changes may occur on each. As we are all aware, in computer science there are only three numbers: zero, one and more. Hence, when I write about replication between two systems, feel free to extrapolate this to more.
When designing for replication, it is wise to assign ownership of the data. We often talk about "single master." The essence is that for each piece of data a single owner exists, making conflict resolution easier. This applies to services that have overlapping data, such as the customer information in an ERP system and in a CRM system.
When replicating between multiple clients and a service, the service obviously owns the data.
However, when changes can be made on multiple client machines, this single master concept doesn't help much. Making sure that changes stem from only one source, be it a system or a real person, helps avoiding conflict. And it helps providing clear error messages in case a conflict occurs after all.
When two systems, each with its own data store and each with its own business logic, need to synchronize their information I can, using CRUD, either replicate at the data level or I can replicate at the service level.
This type of replication can be seen between databases, between file systems and in email systems. This type of replication implies CRUD. It uses the changes to the data—the entities—for the replication. When replicating at the data level, the business logic in the services on either side have no easy way of protecting the consistency of the data; the data will just be replicated and the business logic will only execute when a conflict is detected, if at all. Since the data in the system gets changed without much scrutiny, for a workable solution, this implies a high degree of trust and leads to tight coupling between the systems.
Database-to-database–level replication has the advantage that you can use standard mechanisms such as database replication. This immediately leads to a couple of other advantages. These mechanisms are typically fast, they sometimes support synchronization through firewalls and, the design and coding effort is greatly reduced.
The alternative is to provide some additional logic to walk through the changed data on both sides and use the service's published interface to replicate the changes. Normally this takes more development work and may carry a performance penalty. When you choose this method, you are not necessarily constrained to using CRUD and you may use any mechanism you want.
Even though service-to-service replication needn't go through CRUD, often entity services will offer CRUD semantics because in general they make replication mechanisms easier to build. When a request comes in, the before image of the entity is retrieved, the changes are applied, the after image is retrieved and both before and after image are stored in a change log for later use during the replication. This can easily be provided as a piece of reusable code in the service facade.
I can make our example even more complex by combining these two mechanisms. I can do this by adding a data store in the middle. This would be useful in a scenario in which I use Microsoft Outlook 2003 as the front end to a CRM system for instance. Outlook would have the data in its local store, for offline use and Exchange Server 2003 would sit in the middle and synchronize the data with the CRM system on the one side and with Outlook on the other. Outlook and Exchange Server would replicate the data using their own established mechanisms and Exchange and the CRM service would replicate the entities through the business logic of the CRM system. I will get back to this scenario later.
By using CRUD, you choose for an optimistic concurrency mechanism. You retrieve an entity, you change it and then you send it back. If the entity was changed in the service in the mean time, you get an optimistic failure. CRUD does not work well, when there is a good chance of a concurrency conflict, nor when it is hard notifying the user of such a conflict.
However, concurrency is not always an issue. The contact information in a customer relations management (CRM) system will not change often. The appointments in my calendar are typically only changed by me. Even if our administrator has complete change rights to my calendar, the chance of both of us changing the same appointment is slim, because of the common business practice that she just does not change my calendar unless she has to.
This is a good example of how in many cases, concurrency issues are guarded better by business practice or habit, policies and process than by the software. The software should support these and possibly enforce these, but without them, the software enforcement makes little sense.
Using CRUD for replication solutions seems very reasonable. So can we use CRUD or can't we? Let's compare the CRUD-based service interfaces with a few other cases and see if we can draw parallels.
CRUD and OOD
In object-oriented design (OOD) a class typically has a set of properties. These properties are accessors (Get/Set) to the private data for that class. Properties are generally used to keep the internal state consistent—they are not used to trigger complex business logic. The Set methods typically contain little code and they typically have little or no effect on other classes and instances. Otherwise, changing the value of a property would lead to a cascading effect of property changes and the system would become unmanageable. This means that not all properties can be set by all levels of the design. The inventory quantity of a product cannot be changed simply by a user or even by another class. Usually the design will have a posting method that handles incoming goods, outgoing goods, or an inventory correction. It makes sure that the auditing requirements are met and that the general ledger and the cost accounting will be updated etc. The SetInventoryQuantity property will be private, not public, and it will definitely not be exposed outside of the component.
Using CRUD compares to using public properties and using business actions compares to using public methods. Only CRUD may change multiple properties at once. An update will retrieve the set of properties of an entity (or object), change one or more of these properties and send the property-set back to the service for update. (Luckily it allows us to send all properties at once otherwise performance alone would make CRUD unworkable.)
Property changes that have side effects (effects that transcend keeping internal state consistent) are not CRUD candidates. For these, you should define requests that convey the business meaning of those requests as well as the conditions under which to execute them. The inventory quantity cannot be changed through CRUD; it can only be changed indirectly, for instance, by submitting an incoming goods order.
CRUD and Databases
In working with databases, CRUD is the normal way of dealing with data, regardless of whether or not we offer CRUD at the service interface. If you offer a CRUD service interface, the service will in most cases take the property changes and map them to logical requests that are executed and, deep down in the service, use CRUD to do the database updates. If you do not use a CRUD interface, but offer business actions such as "upgrade the customer" or "set the new quantity for inventory," the service will execute the request and then, again, deep inside the bowels of that service, still use CRUD on the database.
But there are a few notable differences between talking to a database and talking to a service, even when talking to a service using CRUD:
Data in the database is typically normalized. That means that properties aren't duplicated and consequently a single update typically suffices to maintain consistency. Services often offer a denormalized view on the entities. But, if we look at a single service I don't think it's a major difference. A well-designed service could (and should) provide the logic to keep the data consistent. It becomes different if the same information is held in multiple services, for instance changes to my bank accounts in the above example. Then logic external to these services has to keep the services synchronized. In our CRM example, the contact information may be stored both in the CRM system and in the e-mail system.
- Database updates are transactional. This means that multiple database updates that are, at least from the database's view, unrelated can be combined in a single all-or-nothing, or ACID (Atomic, Consistent, Isolated, and Durable), transaction. When one of them fails, all fail. In contrast, multiple updates to the service cannot be combined – and thus logic external to the service has to maintain consistency. Otherwise we see problems like in our money transfer example where transactions could be lost. If the service only provides a CRUD interface, a business action that updates multiple entities must be split into multiple updates, each on a single entity. The service will not offer transactional consistency between them, simply because it cannot. Without additional external checking logic by the user only part of a business transaction may be executed. For example, if I want to change the contract conditions for a wireless telephone contract for my family and I send each update separately, the service has no way of knowing that this is a single business transaction. The user has the responsibility to maintain consistency. But, in replication scenarios, the user cannot take that responsibility either, therefore, complex business requests don't make good CRUD candidates.
- Databases do not provide business logic to protect updates. The end user or client is not allowed to update the database directly. Applications or services provide that business logic to keep the state consistent and they update the data. Therefore databases can expose CRUD logic, they don't need to know anything more about the data. Services on the other hand do provide that business logic and therefore they often, not always, need to have more information to execute that logic. This means I cannot just change the balance on my bank account: I need to say where the money goes or where it came from; otherwise it will not be accepted. I cannot just change the address in my customer record: I need to say why I changed it to start the correct process to change it in the other parts of my company (this service and other services).
I think all three points lead to the same conclusions. Services are not databases, but by only providing CRUD interfaces from a user's perspective they offer a poorer form of database-like behavior. Restricting yourself to CRUD is like designing classes that only expose properties and no methods. And, it is like designing a service that acts like a database. There is a very narrow range of problems for which this is acceptable. Offering only CRUD is not sufficient in cases where you change multiple entity instances in one business transaction and require consistency between these entities. In such cases, services should offer business actions and business logic and be given the information needed to execute those business actions and ensure that business logic.
Think about it this way. A service encapsulates entities and it protects consistency through business logic. If the original business request affects multiple entities, a service can only provide complete service if it gets enough information about the original request. Then it can execute business logic, apply that business logic even across multiple entities, and maintain both information consistency and transactional consistency. CRUD does not offer this.
Now, this may all be true, but is it important? As is always the case, sometimes it is and sometimes it isn't. Architecture and design are all about making trade offs. So, when is it important to expose business actions instead of CRUD?
- You cannot afford CRUD if you need to maintain the context of the change.
Why did the account value increase EUR 244.09? Was it a deposit? Transfer from another account? Payroll payment? Is this amount taxable?
- You cannot afford CRUD if it implies dependencies you may not want.
Because it uses more attributes than needed. The chance of conflict increases steeply with the number of attributes specified. Why can't I add the Dutch translation of the product description, when someone else added the German translation?
- You cannot afford CRUD if it omits dependencies you may need.
Many transactions depend on values in other entities. You cannot set the home currency balance of a foreign currency account without making sure the rate didn't change.
- You cannot afford CRUD if it is important to maintain consistency between changes on different entities.
Booking a trip where it is important to have a flight, hotel and car, or booking an order where both products and service are required at a specific time are examples of requests that require consistency.
- You cannot afford CRUD if there is a reasonable chance of concurrency issues and if it is not easy to resolve those issues.
The simplest examples of concurrency issues are the general ledger and the stock-on-hand information. They are hardly ever, if at all, exposed through CRUD. Not because of dependencies, but because of the concurrent nature of updates. The Venetians already knew this in the fourteenth century and that is why they invented postings and journals.
- You might afford CRUD if, on the other hand, updates:
- Are infrequent.
- Are only done by a single source (person or machine).
- Consist of adding new instances rather than changing existing instances.
The contact information in a CRM system will not change often. The appointments in my calendar are typically only changed by me. Even if our administrator has complete change rights to my calendar, the chance of both of us changing the same appointment is slim, because of the common business practice that she just does not change my calendar unless she has to. In many cases, concurrency issues are guarded better by business practice or habit, policies and process than by the software. The software should support these and possibly enforce these, but without them, the software enforcement makes little sense.
To summarize this: Only use CRUD when concurrency issues are rare and can be resolved.
Concurrency issues may be rare because:
- Updates are seldom.
- Updates have only one source (person or system) – the "single source" concept.
You can reduce concurrency issues by minimizing changes to entities and by creating new instances instead. Think of keeping multiple address records and itemization of totals like in general ledger or stock on hand.
Granularity or chattiness is neither an argument for nor against either approach. When the service exposes methods that accept complex requests and even combinations of requests (as in the journal pattern in Dealing with Concurrency), requests may even be less chatty than when using CRUD. On the other hand, the synchronization mechanisms are often well tuned, so there shouldn't be much difference between the two approaches. And both approaches are obviously infinitely better than sending individual set-property requests.
At Microsoft, the people in the field use a CRM system, but for most of their work they only use a subset of what that system has to offer. They keep their contacts up to date; they add opportunities and activities and keep these up to date as well; and they read the customer information. They would like to do that in the environment they use to communicate with the customers and with their colleagues and they would like to be able to do that while at the customer site. What if we were to provide this to them via Outlook? Outlook clearly only supports replication with Exchange, so that all changes made to local data would be replicated to the Exchange server and from there into the CRM service. While limited to CRUD, the solution would be limited in functionality, but I believe it would be a good, even recommended, solution.
Let's look at the entities involved:
- Customer: Customer information is provided in Outlook. The solution need not provide changes to this information. It would be hard to allow changes through Outlook, because there may be too many side effects. But, most users will not change the customer information anyway, only the account manager may want to.
- Contact: Creating and updating contacts is not critical. Users share contact information, but when they change contact information, it is a minor change without much business logic (like correcting my checking account balance by adding in interest earned). When an update fails the user may receive a synchronization error and deal with it as he or she deems appropriate. The chance of such a failure would typically be small.
- Opportunity: Creating an opportunity triggers a business process or workflow. But creating the opportunity in Outlook provides enough information to that process and this creation does not depend on any other changes. Even if these other changes are related, the opportunity does not have a "transactional" dependency.
- Activity: Adding activities is uncritical. Changing activities might lead to conflicts, but much like appointments, activities can be thought of as personal information. That is, the owner of the information is known and that ownership is respected by common practice. The activity may be created by one person and then owned by another, but that is normally arranged outside of the software.
More complex cases don't have to be offered through Outlook. The original solution provides these cases. Most users, and especially occasional users, will have sufficient functionality through Outlook and can hook up to the corporate network in the seldom case they need more. More demanding users can still use the original solution and have more overhead, but also more functionality.
Until here I've discussed CRUD as an all-or-nothing approach. It doesn't have to be that way. That would only be the case if you were to rely entirely on standard replication mechanisms. A good way of combining approaches is to replicate information from the service and to queue requests to the service. (You can optionally change the local data to reflect those requests, but if those changes are ignored and overwritten after the requests have been processed by the service these local changes don't affect the solution.) The business actions in the queue will then be processed and these actions can follow all recommendations made in Dealing with Concurrency.
Instead of entering the requests in a queuing system, I can store the requests in a database table that I use as a queue and have them replicated. This would reduce my infrastructure requirements by using one mechanism (database replication) instead of two (replication and queuing). By doing this, I use replication as a messaging mechanism. Processing the replicated requests is exactly the same as processing requests that have been sent to the service using mechanisms such as SOAP, only the transport mechanism is different.
A variation on this is to model the requests as entities. The order is a good example of this. The order is essentially a request to the service. Modeling the order as an entity has the additional advantage that it becomes part of the business system and that it now can be tracked and audited. Another example would be the address change of the customer. Instead of designing a request for address change, I could design an address change form that has all the required information. The form would state the reason, such as correcting incorrect information, a new telephone number, or moving to a new location; it would have the desired date of change and so forth. Now, if the customer calls me six months later, I have the original request and the activities that I executed that relate to that request. The address change form is now an entity in my system. I can use many of the standard replication mechanisms to replicate orders and address change forms. They are new instances in my system or database and they do not introduce concurrency issues. Note however, that neither the local order entry nor the local address changes need to have any effect on the other local data. Processing the request (or new entity) by the service may affect the other entities and those changes will then be replicated back to the local store.
Suppose you want to write a solution for a service technician. The technician needs to fill out hours on the road, hours spent servicing, diagnosing and repairing. He or she needs to specify the material used, the condition of the devices, and such. The information about the customers, the devices, the maintenance contract per device, the prices for the materials, all can be replicated to the notebook. However the per customer service information the technician enters consists of more than just an invoice, it can be a complex set of requests to the service that may contain a few problems on synchronization that need to be resolved before the complete set is accepted.
CRUD is not all bad, but it requires a deep understanding of its limits and when to use it. You cannot blindly apply it in all situations requiring multiple copies of data. You must make sure that your problem does not contain concurrency issues or require complex business requests. Even in these cases, CRUD can sometimes be used with other methods as a way of streamlining the solution.