Materialized View Pattern
Generate prepopulated views over the data in one or more data stores when the data is formatted in a way that does not favor the required query operations. This pattern can help to support efficient querying and data extraction, and improve application performance.
Context and Problem
When storing data, the priority for developers and data administrators is often focused on how the data is stored, as opposed to how it is read. The chosen storage format is usually closely related to the format of the data, requirements for managing data size and data integrity, and the kind of store in use. For example, when using NoSQL Document store, the data is often represented as a series of aggregates, each of which contains all of the information for that entity.
However, this may have a negative effect on queries. When a query requires only a subset of the data from some entities, such as a summary of orders for several customers without all of the order details, it must extract all of the data for the relevant entities in order to obtain the required information.
To support efficient querying, a common solution is to generate, in advance, a view that materializes the data in a format most suited to the required results set. The Materialized View pattern describes generating prepopulated views of data in environments where the source data is not in a format that is suitable for querying, where generating a suitable query is difficult, or where query performance is poor due to the nature of the data or the data store.
These materialized views, which contain only data required by a query, allow applications to quickly obtain the information they need. In addition to joining tables or combining data entities, materialized views may include the current values of calculated columns or data items, the results of combining values or executing transformations on the data items, and values specified as part of the query. A materialized view may even be optimized for just a single query.
A key point is that a materialized view and the data it contains is completely disposable because it can be entirely rebuilt from the source data stores. A materialized view is never updated directly by an application, and so it is effectively a specialized cache.
When the source data for the view changes, the view must be updated to include the new information. This may occur automatically on an appropriate schedule, or when the system detects a change to the original data. In other cases it may be necessary to regenerate the view manually.
Figure 1 shows an example of how the Materialized View pattern might be used.
Issues and Considerations
Consider the following points when deciding how to implement this pattern:
- Consider how and when the view will be updated. Ideally it will be regenerated in response to an event indicating a change to the source data, although in some circumstances this may lead to excessive overheads if the source data changes rapidly. Alternatively, consider using a scheduled task, an external trigger, or a manual action to initiate regeneration of the view.
- In some systems, such as when using the Event Sourcing pattern to maintain a store of only the events that modified the data, materialized views may be necessary. Prepopulating views by examining all events to determine the current state may be the only way to obtain information from the event store. In cases other than when using Event Sourcing it is necessary to gauge the advantages that a materialized view may offer. Materialized views tend to be specifically tailored to one, or a small number of queries. If many queries must be used, maintaining materialized views may result in unacceptable storage capacity requirements and storage cost.
- Consider the impact on data consistency when generating the view, and when updating the view if this occurs on a schedule. If the source data is changing at the point when the view is generated, the copy of the data in the view may not be fully consistent with the original data.
- Consider where you will store the view. The view does not have to be located in the same store or partition as the original data. It could be a subsets from a few different partitions combined.
- If the view is transient and is used only to improve query performance by reflecting the current state of the data, or to improve scalability, it may be stored in cache or in a less reliable location. It can be rebuilt if lost.
- When defining a materialized view, maximize its value by adding data items or columns to the view based on computation or transformation of existing data items, on values passed in the query, or on combinations of these values where this is appropriate.
- Where the storage mechanism supports it, consider indexing the materialized view to further maximize performance. Most relational databases support indexing for views, as do Big Data solutions based on Apache Hadoop.
When to Use this Pattern
This pattern is ideally suited for:
- Creating materialized views over data that is difficult to query directly, or where queries must be very complex in order to extract data that is stored in a normalized, semi-structured, or unstructured way.
- Creating temporary views that can dramatically improve query performance, or can act directly as source views or data transfer objects (DTOs) for the UI, for reporting, or for display.
- Supporting occasionally connected or disconnected scenarios where connection to the data store is not always available. The view may be cached locally in this case.
- Simplifying queries and exposing data for experimentation in a way that does not require knowledge of the source data format. For example, by joining different tables in one or more databases, or one or more domains in NoSQL stores, and then formatting the data to suit its eventual use.
- Providing access to specific subsets of the source data that, for security or privacy reasons, should not be generally accessible, open to modification, or fully exposed to users.
- Bridging the disjoint when using different data stores based on their individual capabilities. For example, by using a cloud store that is efficient for writing as the reference data store, and a relational database that offers good query and read performance to hold the materialized views.
This pattern might not be suitable in the following situations:
- The source data is simple and easy to query.
- The source data changes very quickly, or can be accessed without using a view. The processing overhead of creating views may be avoidable in these cases.
- Consistency is a high priority. The views may not always be fully consistent with the original data.
Figure 2 shows an example of using the Materialized View pattern. Data in the Order, OrderItem, and Customer tables in separate partitions in a Microsoft Azure storage account are combined to generate a view containing the total sales value for each product in the Electronics category, together with a count of the number of customers who made purchases of each item.
Creating this materialized view requires complex queries. However, by exposing the query result as materialized view, users can easily obtain the results and use them directly or incorporate them in another query. The view is likely to be used in a reporting system or dashboard, and so can be updated on a scheduled basis such as weekly.
|Although this example utilizes Azure table storage, many relational database management systems also provide native support for materialized views.|
Related Patterns and Guidance
The following patterns and guidance may also be relevant when implementing this pattern:
- Data Consistency Primer. It is necessary to maintain the summary information held in a materialized view so that it reflects the underlying data values. As the data values change, it may not be feasible to update the summary data in real time, and instead an eventually consistent approach must be adopted. The Data Consistency primer summarizes the issues surrounding maintaining consistency over distributed data, and describes the benefits and tradeoffs of different consistency models.
- Command and Query Responsibility Segregation (CQRS) Pattern. You may be able to use this pattern to update the information in a materialized view by responding to events that occur when the underlying data values change.
- Event Sourcing Pattern. You can use this pattern in conjunction with the CQRS pattern to maintain the information in a materialized view. When the data values on which a materialized view is based are modified, the system can raise events that describe these modifications and save them in an event store.
- Index Table Pattern. The data in a materialized view is typically organized by a primary key, but queries may need to retrieve information from this view by examining data in other fields. You can use the Index Table pattern to create secondary indexes over data sets for data stores that do not support native secondary indexes.