by Larry Clarkin and Josh Holmes
Summary: A mashup is a technique for building applications that combine data from multiple sources to create an integrated experience. Many mashups available today are hosted as sites on the Internet, providing visual representations of publicly available data. This article describes the history and architecture of mashups, and explores how you can create mashups for use in your enterprise. We also impart some wisdom gained from projects with customers and systems integrators who have implemented mashups for the enterprise.
Mashups have gained popularity within the last few years, swept along with the momentum around Web 2.0. Early mashups took data from sources such as Craigslist (http://www.craigslist.org) and combined them with mapping services or photo services to create visualizations of the data (for example, http://housingmaps.com). Many of these early mashups were consumer-focused, although interest in, and acceptance of, mashups in the enterprise has recently started to grow. Organizations are starting to realize that they can combine their well-defined services, each of which performs a discrete bit of business logic, with other existing services, internal or external to the organization, to provide new and interesting views on the data.
As techniques for creating mashups have matured, we are starting to see companies build business models around mashups. For the real estate market in the United States, both Redfin (http://www.redfin.com) and Zillow (http://www.zillow.com) use large amounts of public and private real estate data (from sources such as county record offices and the Multiple Listing Service), combined with internal “value added” services, the result of which is displayed to the user on a map (using Microsoft’s Virtual Earth and Google Maps, respectively). There are many possible types of information that you could add to a real estate site: other similar listings, information on local schools, local hospitals, recent crime rates, classifieds for job postings, and more.
Although there is a great variation in the user interface and the sources of data for many mashups, we can still derive common architectural patterns that they all share. For example, all mashups are RESTful in nature (they conform to the Representational State Transfer principles). Figure 1 shows an architectural rendering of a typical mashup.
Figure 1: The architecture of a typical mashup application
The core element of any mashup is the data being aggregated and presented to the user. Although the above diagram depicts the source of the data as a database, the concept of a mashup does not require a database that is local to the mashup software or the client. The data can come strictly from Web services where data is serialized to XML or JSON (this is the most common pattern in Internet-based mashups). There are architectural trade-offs to be made between storing the primary data in a local data store and retrieving it from its source with every request. As mashups move from being Internet-based applications to internal to the enterprise, they tend to depend less on external data stores.
Use of RSS (Really Simple Syndication) feeds is a common source of primary or supplemental data for mashups. RSS feeds are easy to consume as they are XML documents, and many libraries exist to manipulate the feeds. The format and specification for RSS is well documented and understood with only a few variations from version to version. The extensibility of RSS is also well known, as demonstrated by the number of extensions in use today, such as adding attachments to the feeds, creative commons licensing information and location information.
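To illustrate how little code is needed to consume a feed, here is a minimal sketch that uses only the Python standard library. The sample feed, its URLs, and the function name `parse_rss_items` are invented for illustration; a real mashup would fetch the feed over HTTP and would probably handle more elements and feed variations.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real mashup would download this over HTTP.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Customer Service Alerts</title>
    <item>
      <title>Recall notice for model A</title>
      <link>http://example.com/alerts/1</link>
    </item>
    <item>
      <title>New service center opened</title>
      <link>http://example.com/alerts/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(rss_text):
    """Return a list of {'title', 'link'} dicts, one per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items
```

Calling `parse_rss_items(SAMPLE_RSS)` yields one dictionary per feed item, which the mashup can then join with data from other sources.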
Calls to Web services are also common within mashups. Both WSDL-based and REST-based Web services are used, with some services exposing both styles. Web services can be used to provide additional data or to transform the data being mashed up. For a map-based mashup, the data may contain only street addresses, and a call to a WSDL-based or REST-based Web service may be required to translate each street address to a latitude/longitude coordinate for the map.
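The address-to-coordinate step can be sketched as follows. The geocoding service is passed in as a plain function so that the example stays independent of any particular provider. `add_coordinates`, `fake_geocode`, and the sample addresses and coordinates are all hypothetical; a real mashup would replace `fake_geocode` with a call to whichever WSDL-based or REST-based service it uses.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def add_coordinates(
    records: List[Dict],
    geocode: Callable[[str], Optional[Coordinates]],
) -> List[Dict]:
    """Return copies of the records with 'latitude' and 'longitude' added.

    Records whose address cannot be resolved are kept, with both values set
    to None, so the caller can decide how to present them.
    """
    enriched = []
    for record in records:
        coords = geocode(record["address"])
        lat, lon = coords if coords is not None else (None, None)
        enriched.append({**record, "latitude": lat, "longitude": lon})
    return enriched


def fake_geocode(address: str) -> Optional[Coordinates]:
    """Stand-in for a real geocoding Web service (illustrative values only)."""
    known = {"1 Main St, Springfield": (39.80, -89.64)}
    return known.get(address)
```

Keeping unresolved records in the result, rather than dropping them, avoids silently hiding data from the user.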
Figure 2 depicts a special class of services that are used to create mashups. We are calling these platform services because they provide functionality beyond the typical request/response model of traditional Web services. A typical example of this is the mapping service provided by Virtual Earth. Virtual Earth includes an entire array of server-side and client-side processing capabilities, as well as the “services in the cloud.” We see the emergence of cloud-based building block services that begin to create value. For example, the Amazon S3 service offers storage “in the cloud”; this makes it easier to expose any static data by uploading it to a hosted storage provider. Microsoft’s BizTalk Services is a platform service that provides a different capability – the ability to relay communications from the Internet across a corporate firewall, thus exposing internal services for consumption by business partners or third parties building their own enterprise mashups. Relayed communication as provided by BizTalk Services is also useful even within a single enterprise, with many business units or with numerous network segments. An Internet-based communications relay can eliminate physical network topology as a communications obstacle.
Figure 2: Using BizTalk Services as a platform service to relay information
Thus far we have identified many of the types of services that can be used to create mashups, but we have not addressed the importance of the software that creates and delivers the mashup experience. Think of the mashup application as a combination of middle-tier services and some lightweight business logic. For Internet-based mashups, the software is usually written using Web technologies (like PHP or ASP.NET), but we are starting to see the line between server processing and client application blur with the emergence of Rich Internet Applications (RIAs). RIAs are applications that run inside the browser with rich functionality similar to that of many desktop applications. These typically do not require a client-side installation beyond a generic plug-in such as Adobe Flash or Microsoft Silverlight.
As time has passed and the development process has matured, a lot of the tedious coding has been replaced by frameworks and better codification of standards. Custom scripts on the server side are starting to be replaced by standardized libraries that will automatically generate the required client-side script. We are also seeing standardization in the message formats. An example of this is the GeoRSS extension to the RSS standard that allows you to specify the Longitude and Latitude that is related to the items in the feed. All three major mapping service providers (Google, Microsoft, and Yahoo) support GeoRSS, which means mashups that use this RSS extension require almost no coding.
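For readers who do need to process such a feed themselves, the sketch below reads the GeoRSS Simple `georss:point` element, which holds a latitude and a longitude separated by whitespace. The sample feed and the function name `extract_points` are invented for illustration, and only the point element is handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by the GeoRSS Simple encoding.
GEORSS_NS = {"georss": "http://www.georss.org/georss"}

# Hypothetical sample feed with one located item and one without a location.
SAMPLE_GEORSS = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Service centers</title>
    <item>
      <title>Downtown service center</title>
      <georss:point>42.28 -83.74</georss:point>
    </item>
    <item>
      <title>Item without a location</title>
    </item>
  </channel>
</rss>"""


def extract_points(feed_text):
    """Return (title, latitude, longitude) for every item that has a georss:point."""
    root = ET.fromstring(feed_text)
    points = []
    for item in root.findall("./channel/item"):
        point_text = item.findtext("georss:point", namespaces=GEORSS_NS)
        if not point_text:
            continue  # the item carries no location, so it cannot be mapped
        lat_text, lon_text = point_text.split()
        title = item.findtext("title", default="")
        points.append((title, float(lat_text), float(lon_text)))
    return points
```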
The creation of mashups was once only the domain of the developer, but there is a movement to put the ability to create mashups directly into the hands of the end customer. As the frameworks to create mashups are becoming simpler to use and the message formats are becoming more standardized, the next logical step is to build tools that can create mashups. Some of these tools will be targeted at the end consumer of the mashups. Pipes by Yahoo and Popfly by Microsoft are examples of frameworks and tools for allowing users to create their own mashups.
We are seeing an increase in the importance of common schema and metadata in the development of mashups. We described in the previous section how the common schema of RSS feeds makes it easy to incorporate them into mashups. The same principles will need to be applied to other types of data as well (as robust as RSS is, we cannot model all of our data into that format). We are already seeing the emergence of other standard schemas, such as KML (Keyhole Markup Language) to describe geospatial data. Even more interesting will be Microformats, a promising framework for delivering semantic meaning which can easily be read by software such as mashups.
A great way of exploring mashups in the enterprise is with an example. Let’s imagine that you are an application architect for a call center system that receives calls about warranty and parts service. Using the phone number of the caller, we could display the records for that user, including a purchase history. This interesting application has already been implemented today in most call centers. But what if, in addition to looking up the customer information, we plotted the phone number on a map using a publicly available service and also displayed a list of local service centers or parts suppliers for our products overlaid on the map? With this data in hand, we might be able to answer the customer’s questions in seconds. What if we also looked up the current weather conditions in that area or the local sports teams and their recent games for a conversation starter or filler on long running calls?
Using this example, Figure 3 shows a mock customer service mashup that could be used by a representative when they receive a call. Combining data from public services (such as weather and news) with internal sources of data (a customer service alerts blog and a database of service locations), this application combines the mashup elements with the accessibility of a portal.
Figure 3: An example mashup for our call center
As shown in Figure 3, the embedded mashup puts a tremendous amount of ready information at the call service agent’s fingertips, ranging from the top service items, so that they can answer questions quickly, to reverse lookups on the phone number to get conversation starters. This type of mashup is strictly internal but provides tremendous value on the initial call with any client.
There are several key elements to making this mashup successful. First and foremost, it is contextual to the task at hand. Second, it prioritizes the information in the order in which it is most likely to be useful. You have to know who you are talking to, what products they have registered, and what the top service items are for those products, and then start trying to answer questions that they may have, such as where the local service centers are for the customer’s location. Third, once we have the customer information and locale, each of the panes of content is standalone and does not require interaction with the other parts. This allows the gathering and aggregation of the data for each of the parts to be performed asynchronously.
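Because the panes are independent, they can be loaded concurrently. The following minimal sketch uses Python's `asyncio`. The three fetch functions are hypothetical stand-ins for real service calls (the news one fails on purpose), and `load_panes` is an illustrative name rather than part of any framework.

```python
import asyncio


async def fetch_weather(locale):
    await asyncio.sleep(0.01)  # simulates network latency
    return f"Weather placeholder for {locale}"


async def fetch_service_centers(locale):
    await asyncio.sleep(0.01)
    return [f"Service center placeholder near {locale}"]


async def fetch_news(locale):
    await asyncio.sleep(0.01)
    raise ConnectionError("news service unavailable")  # simulated outage


async def load_panes(locale):
    """Fetch every pane concurrently; a failed pane becomes None."""
    sources = {
        "weather": fetch_weather,
        "service_centers": fetch_service_centers,
        "news": fetch_news,
    }
    results = await asyncio.gather(
        *(fetch(locale) for fetch in sources.values()),
        return_exceptions=True,  # one failing pane must not cancel the others
    )
    panes = {}
    for name, result in zip(sources, results):
        panes[name] = None if isinstance(result, Exception) else result
    return panes
```

Running `asyncio.run(load_panes("Ann Arbor"))` returns a dictionary keyed by pane name. The design choice here is that a failure in one source leaves the remaining panes usable.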
Realizing Return on Investment (ROI) is a common concern for Service-Oriented Architecture (SOA) deployments. Many organizations find it difficult to justify the upfront investment for creating services for different functionality when it would be easier to create a single application. Enterprise mashups are a great example of how an investment in SOA can provide great value. (See the sidebar “Mashups deliver ROI for IDV Solutions.”)
Immediate return for a SOA can be realized when organizations start mixing and matching these services for new and exciting purposes. It can be exciting to start leveraging services or applications in ways that couldn’t have been imagined when they were written. (See the sidebar, “Quicken Loans leverages mashups for fast results.”)
It’s important to realize that you can’t own all of the information in the world, and there’s a fairly high return on investment when you can simply leverage someone else’s hard work instead of reinventing that particular wheel.
Another opportunity for enterprises, as mashups become more and more mainstream, is to build services that can easily be consumed by mashup applications. Going back to the example we cited, the customer service representative can make her customer happy by providing nearby store locations. But imagine if the stores themselves exposed their inventory and product availability to the mashup. Now the service representative can provide even more detailed and valuable information to the customer on the phone. That kind of service would be valuable for the customer, the call center, as well as the store itself.
Figure 4: Quicken Loans leverages mashups for fast results
If you are leveraging existing Web services and robust platform services, the development time of mashups can be measured in hours or days, rather than weeks or months. The quick turnaround time for mashups can change the way that IT departments interact with the users in their organizations. Joint development and multiple iterations of the applications being delivered in a short time can become more realistic goals.
Perhaps the greatest benefit achieved from mashups is the visualization that they create for the users. Data that is visual in nature is far easier to understand and can create greater meaning for the customer. This is not limited to the mapping examples that we have discussed so far. Other techniques such as heat maps and tree maps can also be created as data visualizations in mashups.
Look for utilitarian services to consume (simple before complex). The more things that a given service does, the harder it is to mash it in with other services. The right services do one thing and do that one thing well. For example, a service that returns people data with all of their addresses, loans, cars, trains, planes and automobiles, may be offering too much data. You might want to create a subset service that returns just name and address to mash with, as it’s a waste of bandwidth and processing to pull data that you are not going to use.
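A subset service of this kind can be as small as a projection over the full record. The sketch below assumes a hypothetical person record and field names; in practice the projection would sit behind its own service endpoint so that the unused fields never cross the network.

```python
def project_fields(record, fields):
    """Return a copy of the record that contains only the requested fields."""
    return {name: record[name] for name in fields if name in record}


# Hypothetical "too much data" record returned by a broad people service.
full_person = {
    "name": "Pat Example",
    "address": "1 Main St, Springfield",
    "loans": ["loan-1", "loan-2"],
    "cars": ["car-1"],
}

# The narrow view that a map-based mashup actually needs.
person_for_map = project_fields(full_person, ["name", "address"])
```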
Mashups are agile views into the data that they present. They are not an all-powerful editing surface capable of editing any and all data thrown at them. If someone wants to edit that data, they should go back to the application that created the data to do the editing. The mechanism for doing this should be obvious as well. This will dramatically reduce the complexity (and improve the agility) of your mashup applications.
Business decisions will be made based on enterprise mashups. This is dangerous when pulling data from multiple sources as the data might be stale. As with any report that you want to make decisions on, it’s important to have the timestamp for when the data was last refreshed. That will greatly improve the clarity and usefulness of your mashed data, and the confidence with which the decisions can be made.
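One way to carry the refresh time alongside the data is to wrap each source's result with the moment it was retrieved. This is a minimal sketch; the class `Stamped`, its fields, and the helper `oldest_refresh` are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class Stamped:
    """Data from one source together with the time it was last refreshed."""
    source: str
    data: Any
    retrieved_at: datetime

    def is_stale(self, max_age: timedelta, now: Optional[datetime] = None) -> bool:
        """True when the data is older than max_age."""
        now = now or datetime.now(timezone.utc)
        return now - self.retrieved_at > max_age


def oldest_refresh(results: Iterable[Stamped]) -> datetime:
    """The mashed-up view is only as fresh as its oldest source."""
    return min(result.retrieved_at for result in results)
```

Displaying the value returned by `oldest_refresh` next to the combined view tells the user how far the data can be trusted for a decision.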
As mashups move into the enterprise, it is inevitable that you will run into data fragmentation and normalization issues that are a result of years of legacy application development. For instance, to mashup sales by geographical area might seem like a logical mashup to create for your general sales manager. But what if the data is stored in three different systems by product line and each has different data structures and business rules governing them? Our advice is to find a better candidate for mashup technologies and let a more traditional project (such as a systems consolidation effort) resolve the more complex problems.
Authentication and authorization can be huge impediments to doing a mashup of several different services if they all require authentication with different schemes. For example, issues can arise if you have one service that requires username and password and another that requires an X509 certificate. This is not insurmountable, but can be a large roadblock that has to be overcome. There are several strategies to attacking this problem. You can avoid services that require authentication or that require authentication via methods that you don’t support. While you can get started this way, the reality is that there are services that you need to leverage that you can’t get to without authentication. You could try to force others, or pay them, to offer the type of authentication that you are comfortable with. The reality, however, is that your application will most likely have to support service authentication using many different mechanisms, which all need to be taken into consideration.
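Supporting several mechanisms is easier when each one is isolated behind a common interface. The sketch below shows one possible arrangement, not a prescribed design. The request is modeled as a plain dictionary, and the class names, the `client_cert` key, and the example URL are all hypothetical. HTTP Basic authentication travels in an `Authorization` header, whereas an X509 client certificate is presented during the TLS handshake, so that credential only records which files the HTTP client should use.

```python
import base64
from dataclasses import dataclass


@dataclass
class BasicCredential:
    """Username and password sent as an HTTP Basic Authorization header."""
    username: str
    password: str

    def apply(self, request):
        raw = f"{self.username}:{self.password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        headers = {**request.get("headers", {}), "Authorization": f"Basic {token}"}
        return {**request, "headers": headers}


@dataclass
class ClientCertificateCredential:
    """X509 client certificate; used at the TLS layer rather than in a header."""
    cert_path: str
    key_path: str

    def apply(self, request):
        return {**request, "client_cert": (self.cert_path, self.key_path)}


def prepare_request(url, credential):
    """Build a request description and let the credential decorate it."""
    return credential.apply({"url": url, "headers": {}})
```

With this shape, adding another scheme means adding one more class with an `apply` method, and the code that calls each service does not change.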
When implementing mashups, four areas of risk should be considered:
One of the major risks in creating enterprise mashups is when you create a dependency on services that are external to your company (such as the “services in the cloud”). The terms of the service agreements should be investigated before the dependency is created. For example, some services require that the software using the service be a public facing Internet site; this might occur when the service has an ad-based revenue model. The terms of service may also be subject to change in some cases in ways that could be detrimental to your use of that service. To mitigate this concern, look for service providers that have a model that fits your usage.
Loss of fidelity in the data being displayed is another key risk. As data is visualized, there is a tendency to make the data fit the confines of the presentation surface. There will be a natural tendency to not visualize small amounts of data or to group data into larger collections in order to conserve space on the presentation surface. This has the potential to “warp” the end user’s view of the data.
Figure 5: Mashups deliver ROI for IDV Solutions
Politics can also be a hurdle when creating mashups. If you didn’t create the service, it might not do exactly what you want, and it might take too long to get the originator of the service to change it to meet your needs. Rejecting services on those grounds is the Not-Invented-Here (NIH) mentality, which is fatal to mashups. The same hurdle also manifests itself in trust: if you don’t trust the provider of the service, then you will not rely on that service in your mission-critical application.
Consumer technologies are increasingly being used inside the enterprise without the awareness or governance of corporate IT, according to a recent report from Gartner, Inc. Consumer-facing tools are tightly focused on the creation of a mashup and the visualization, so it can be very easy to create the initial mashup, but longer-term maintenance of the mashup is not taken into account. Also consider how dangerous it would be to have an end user upload corporate data into a public mashup tool like Pipes or Popfly.
There are several ways to mitigate these risks. First, for internal or external services, put into place a Service Level Agreement (SLA) that clearly describes responsibilities of both parties, response time for change requests, uptime requirements, bandwidth restrictions, and all other relevant details. Second, lay out the possible fall back requirements in your application when there is a failure calling a given service. For some services, it might be acceptable to just not show that data; with other services, you could cache data that you can fall back on anytime there’s a failure. You might need a secondary service lined up as a backup to call in case something goes horribly wrong.
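The fallback order described above (primary service, then a backup service, then cached data) can be sketched as follows. The function name `call_with_fallback`, the in-memory dictionary used as a cache, and the status strings are assumptions made for illustration; a production version would also log failures and expire old cache entries.

```python
def call_with_fallback(providers, cache, key):
    """Try each provider in order, then fall back to the last cached value.

    providers: zero-argument callables, primary first and backups after it.
    cache: a dict-like store holding the last successful value per key.
    Returns (value, status), where status is 'live', 'cached', or 'unavailable'.
    """
    for provider in providers:
        try:
            value = provider()
        except Exception:
            continue  # this service failed, so try the next one
        cache[key] = value  # remember the last good answer
        return value, "live"
    if key in cache:
        return cache[key], "cached"
    return None, "unavailable"
```

Returning the status with the value lets the user interface show that a pane is displaying cached data, or hide the pane when nothing is available.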
Finally, you will need to address the consumerization issues and the politics. Unlike increasing the reliability and redundancy of services, this requires a governance process. If your organization has a mature process for governing the use of services, then you should leverage that process for mashup creation and consumption as well.
A January 2007 survey by McKinsey asked corporate customers about their adoption of Web 2.0 technologies. As you would expect, a great many of them had either invested or were planning on investing in one or more Web 2.0 technologies. The surprising part of the survey to us was that mashups were only in use or under consideration by 21 percent of the respondents and a majority of respondents, 54 percent, were not considering their adoption at all.
We think that the low response to mashups in the enterprise is due to the relative newness of the technology compared to the others that were included in the survey (like Web services, podcasts, and RSS feeds). The fact that tools to build mashups are just starting to emerge is a factor as well (many of the ones mentioned in this article are in alpha or beta stage as we write this article). We are certain that the techniques will become more common and the tools will mature. Concepts such as the Internet Service Bus (page 2) should make building these enterprise mashups both easier and more useful. We feel that mashups have a place in the enterprise and we encourage you to investigate their adoption.
· “Consumerization Gains Momentum: The IT Civil War,” Gartner Special Report, 2007 (summary) http://www.gartner.com/it/products/research/consumerization_it/consumerization.jsp
· “How Businesses are Using Web 2.0: A McKinsey Global Survey,” The McKinsey Quarterly, August 2007 http://www.mckinseyquarterly.com/article_abstract_visitor.aspx?ar=1913
· Mashup (Web application hybrid), Wikipedia http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29
· Redfin http://redfin.com
· “Web 2.0 in the Enterprise,” Michael Platt, The Architecture Journal, Journal 12
· Zillow http://zillow.com
Larry Clarkin is an architect evangelist with Microsoft. Larry has over 15 years of experience in the design and construction of applications in a variety of technologies and industries. He specializes in integrating Web technologies with Legacy and ERP systems, which he has been doing for over 10 years. When not working with customers, you will find Larry talking technology at local User Groups. You can contact Larry through his blog at http://larryclarkin.com.
Josh Holmes is an architect evangelist with Microsoft. Prior to joining Microsoft last October, Josh was a consultant working with a variety of clients ranging from large Fortune 500 firms to smaller sized companies. Josh is a frequent speaker and lead panelist at national and international software development conferences focusing on emerging technologies, software design and development with an emphasis on mobility and RIA (Rich Internet Applications). Community focused, Josh has founded and/or run many technology organizations from the Great Lakes Area .NET Users Group to the Ann Arbor Computer Society and was on the forming committee for CodeMash. You can contact Josh through his blog at http://www.joshholmes.com.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.