Communications as a Service

Joseph Hofstader

November 2007

Summary: Being an architect on a CaaS solution over the last few years has given me some interesting perspective on the unique challenges of developing and deploying these applications. As I mention throughout this article, CaaS applications require considerations beyond those of hosted multi-tenant solution development, whose design challenges tend to reside in the orchestration layer and the security services. The SIP services layer requires knowledge unique to communications solutions, and a lack of understanding of the patterns required to implement these solutions can hinder the effort.


Converged Communications
Session Initiation Protocol
IP Multimedia Subsystem
CaaS Reference Architecture
Thoughts on CaaS
About the Author

Internet-based communications are not a new phenomenon. For many years, we have sent video, voice, and data across the public Internet to correspond with others free of charge or at a low cost. Paradoxically, mainstream adoption of communications services over the public Internet has been relatively slow. Even though the cost of service is dropping, lingering concerns about the quality and reliability of these largely unregulated communications systems mean that the vast majority of companies and individuals still communicate over the same circuit-switched network that has existed since the invention of the telephone.

It is important to address a common misperception about IP communications: that they always take place over the public Internet. For many years, IP has been used for voice communications on dedicated circuits or local area networks. One example is the routing of long distance phone calls over dedicated circuits in a telecommunications carrier network. Another example of IP communications that do not use the public Internet is the IP-based telephone system used for enterprise telephony. In other instances, like some software-based “phone” systems running on a personal computer (PC), IP communications do take place over the public Internet.

In the case of enterprise communications, IP communications is largely provided through customer-premise Private Branch Exchange (PBX) systems implementing a signaling protocol that runs over IP: H.323 or Session Initiation Protocol (SIP). These IP-based PBX systems offer benefits to companies, such as the ability to run a corporate telephony system on the same network used for data. The convergence of voice and data over the corporate network reduces the costs of installing and operating separate networks for voice and data. Another benefit is the reduction of administrative costs realized by allowing users of the telephone system to relocate their telephones within the network without the aid of an administrator. Unfortunately, these systems are expensive to purchase, configure, and manage; this makes them costly for a small- to medium-sized business (SMB) to implement.

As a result, there is an underserved market for providing cost-effective corporate telephony to SMBs. Over the last few years, many incumbent enterprise communications providers have worked to offer corporate communications in a hosted environment, similar to the environment in which corporate back-office applications, such as Microsoft Exchange, are hosted. Communications applications deployed in this environment are increasingly referred to as Communications as a Service (CaaS). CaaS builds on the basic foundation of Software as a Service (SaaS), with some requirements unique to communications applications.

This article is intended for solution architects interested in learning about CaaS. To keep the article focused on CaaS, I assume that the reader has, at a minimum, a cursory understanding of SaaS and an awareness of the basic concepts implicit in architecting SaaS solutions, such as using metadata services to support a single-instance multi-tenant deployment and implementing a federated security model. The article begins with a brief discussion about converged communications. I will then discuss SIP, covering the concepts supported by the protocol that is becoming standard in communications software. After the introduction to SIP, I will discuss the IP Multimedia Subsystem (IMS), which opens additional possibilities for communications applications by introducing a solution for fixed-mobile convergence (FMC). I will then provide a conceptual architecture of a CaaS solution. I will conclude with some thoughts based on my experience as an architect on a CaaS solution.

Converged Communications

Over the last decade, converged communications emerged as a popular phrase in the telecommunications industry. Although the phrase has different definitions depending on the context in which it is used, one theme runs through all contexts: multiple services (video, voice, and data) accessible over multiple devices (wire-line telephone, mobile or smart phone, and PC, to mention a few).

As more services become available over multiple devices, the line between software applications and communications applications begins to blur. Many of the traits traditionally associated with communications applications, such as event-driven connections established between endpoints (point-to-point, conference, multicast) using communications protocols, will also become requirements of business or entertainment software.

The aforementioned requirements create a paradigm shift for architects of software solutions. Architects of CaaS solutions will be required to understand the networks and protocols that will be used to access the CaaS solution. As with any new technology, until frameworks and tools are created to abstract the details of the network environment, application developers will also need to understand the protocols that will be used within their applications.

Session Initiation Protocol

SIP is the signaling protocol used by the early entrants in the CaaS space. SIP is an application layer protocol that allows a SIP-based application to be plugged into a SIP infrastructure without concern for the underlying transport, such as wireless or wire-line. The SIP infrastructure provides an environment for creating event-driven applications that contact the end user on the device of their preference. In other words, applications using SIP can communicate with end users wherever they are, on whatever device they specify, using multiple forms of media (video, voice, and data).

Context-based Software

Even though the initial implementations of SIP have been created by companies focused on corporate communications, there is a lot of buzz around the development of SIP applications that bear a closer resemblance to Internet applications than to traditional telephony. The Internet offers many examples, from applications that stream sports highlights to a cell phone to SIP implementations in emergency services scenarios. Herein lies the true potential of SIP: the ability to create event-driven software solutions that communicate with end users wherever they choose to be contacted.

There is an endless number of scenarios in which people may want to be notified, or to notify others, after an event occurs. Imagine receiving a notification, wherever you are, about possible traffic problems on your route to work, with alternate routes to take based on your current location. Or imagine registering at a hotel and having customized sports highlights appear on the television when you turn it on. Context-based software implementing SIP can make these solutions a reality.

SIP Messaging

As mentioned earlier, SIP is an application-level protocol that creates a media session between two or more user agents (UAs), which are SIP-enabled endpoints. SIP is a “rendezvous” protocol, which extends it beyond signaling and allows a UA to communicate with other UAs in ways that traditional telephony cannot. SIP allows a UA to request information about or provide information to another UA. A UA may range from a SIP phone to a multimedia device, like a laptop or a smart phone.

A SIP request is used to establish a dialog between two UAs. A SIP response is a three-digit numerical code, where the first digit indicates the class of the response (for example, a success response would be a 2xx response code). A SIP transaction is a SIP request and the final SIP response to that request. Figure 1 shows the establishment of a Real-Time Transport Protocol (RTP) session within a SIP transaction.


Figure 1. SIP transaction
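
As an illustration of the response classes mentioned above, the following minimal Python sketch maps a status code to its class and tells provisional responses apart from final ones. The class names follow the SIP specification (RFC 3261); the function names are hypothetical and chosen only for this example.

```python
# Response classes keyed by the first digit of the status code (RFC 3261).
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(status_code: int) -> str:
    """Return the class of a SIP response from its three-digit status code."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a valid SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """A transaction ends with a final response: any class other than 1xx."""
    return response_class(status_code) != "Provisional"
```

For example, 180 (Ringing) is provisional and does not complete a transaction, whereas 200 (OK) and 486 (Busy Here) are both final responses.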

SIP messages are similar to e-mail messages, with a request Uniform Resource Identifier (URI) as the destination and a body in Multipurpose Internet Mail Extensions (MIME) format. The header of a SIP message contains information used by the SIP application. The SIP message body supports multiple content types, from plain text and HTML to presence and conferencing state information. To negotiate the media session when creating a dialog, SIP uses the Session Description Protocol (SDP), with one party making an SDP offer and the other replying with an SDP answer.
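
To make the e-mail-like layout concrete, here is a minimal Python sketch that assembles an INVITE request carrying an SDP offer and then splits it back into start line, headers, and body. It is an illustration only: the addresses, tags, branch value, and Call-ID are placeholder values, the helper names are hypothetical, and a real UA would rely on a SIP stack rather than string handling.

```python
def build_sdp_offer(host: str, audio_port: int) -> str:
    """Build a minimal SDP offer for one audio stream (placeholder session values)."""
    lines = [
        "v=0",
        f"o=alice 2890844526 2890844526 IN IP4 {host}",
        "s=-",
        f"c=IN IP4 {host}",
        "t=0 0",
        f"m=audio {audio_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller: str, callee: str, host: str, sdp: str) -> str:
    """Assemble an INVITE: request line, headers, blank line, then the MIME body."""
    user = caller.split("@")[0]
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host};branch=z9hG4bK776asdhds",  # placeholder branch
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",  # placeholder tag
        f"Call-ID: a84b4c76e66710@{host}",  # placeholder Call-ID
        "CSeq: 314159 INVITE",
        f"Contact: <sip:{user}@{host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('utf-8'))}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


def parse_message(raw: str):
    """Split a SIP message into (start line, header dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body
```

The request URI in the start line is the destination, the headers carry the routing and dialog information, and the Content-Type header identifies the body as an SDP offer.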

SIP Infrastructure

SIP contains an infrastructure far larger than the UAs that are establishing the session. Devices in this infrastructure may include SIP proxy servers, SIP redirect servers, and SIP registrar servers. It is important to note that all of these devices are logical, meaning that they do not need to be physically deployed as separate processing elements.

The SIP proxy server receives a SIP request and forwards the request. A common usage for the SIP proxy server would be for Network Address Translation (NAT) or firewall access control. The SIP redirect server receives a SIP request and performs a query, returning the results to the requesting UA. The SIP registrar server allows a user to associate a UA with his or her identity.


In SIP, a UA has a URI associated with it in order to be located during a SIP transaction. A SIP interaction may contain more than one URI. The use of multiple URIs supports context-based solutions by decoupling the user from the UA that SIP messages are routed to. SIP solutions can use resource lists as a way of supporting group operations, such as conferencing.

There are three types of SIP URIs: user, device, and service. The user URI is the identity of the user in the communications system; it is the address of record (AOR) for the end user. The device URI identifies a single SIP UA instance. A device can be temporarily associated with a user. For incoming addressing, SIP binds a device URI to the AOR. The service URI represents a many-to-many SIP interaction, such as a conferencing session.
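
The binding between a user URI (the AOR) and one or more device URIs can be sketched as a small in-memory registrar. This Python example is a simplification under stated assumptions: the class and method names are hypothetical, and a real registrar would also handle authentication and binding expiry.

```python
from typing import Dict, List


class Registrar:
    """Toy registrar that keeps AOR -> device URI bindings in memory."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[str]] = {}

    def register(self, aor: str, device_uri: str) -> None:
        """Temporarily associate a device with a user's address of record."""
        contacts = self._bindings.setdefault(aor, [])
        if device_uri not in contacts:
            contacts.append(device_uri)

    def unregister(self, aor: str, device_uri: str) -> None:
        """Remove a binding, for example when the user signs out of a device."""
        contacts = self._bindings.get(aor, [])
        if device_uri in contacts:
            contacts.remove(device_uri)

    def lookup(self, aor: str) -> List[str]:
        """Return the device URIs an incoming request for this AOR could be routed to."""
        return list(self._bindings.get(aor, []))
```

Because callers address the AOR rather than a device, the user can move between a desk phone and a laptop without the caller needing to know which one is currently bound.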

SIP Interactions

Figure 2 shows a SIP interaction that establishes an RTP session between two UAs in separate domains.

Figure 2. Establishing an inter-domain RTP session

Figure 2 shows a UA establishing a dialog by sending an INVITE request to another UA. The INVITE request contains the URI of the user who is being contacted by the request. The proxy servers route the request between domains and to the UAs. After the media session is established, the proxy servers can be taken out of the interaction—it is the responsibility of the individual who architects the SIP solution to decide which SIP devices, other than the UAs, need to exist in the media session path.

Common SIP Interaction Patterns

Figure 2 shows a basic SIP interaction involving two UAs and two proxy servers. This interaction pattern can be used for UA-to-UA VoIP communications. To support advanced communications-based services, SIP also supports a few other key interaction patterns: peer-to-peer, multicast, and conferencing.

Peer-to-peer SIP interactions do not involve any intermediaries; by using dynamic DNS, a UA can use a SIP URI without registering with a SIP server. Multicasting involves a request being sent to a URI with any registrar available to respond to the request. Conferencing involves the establishment of a focus, to which each participant establishes a dialog. The focus logically groups the set of dialogs to provide a conferencing service.

SIP Events

SIP events allow a SIP element to subscribe to events at another SIP element. Using this subscribe/notify pattern, a SIP UA can be notified when a state change occurs in the remote application. The concept of reactive (event-driven) applications is nothing new, but the ability to route events to a UA based on presence information is a powerful communications mechanism.


Presence is a critical component of context-based software. Presence is such a significant concept that it has been referred to as the “dial tone of the 21st century.” Presence brings context to communications. Instant messaging (IM) services use presence to route messages to the correct computer. If you are logged into an IM account on one computer and then sign into your account on another computer, your messages will be routed to the computer you are currently using based on presence information. The main functionality that is implemented to support presence is known as “rendezvous,” which allows somebody to contact someone using an AOR without knowing how they are attached to the network.

There are a number of architectural patterns for systems that use presence. In the peer-to-peer presence architecture pattern, each UA subscribes directly to the UA from which it wants to receive events. The opposite of the peer-to-peer pattern can be implemented by deploying a presence server as a broker that handles all subscriptions and notifications from UAs.

A common pattern used to architect SIP-based systems is the Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE) architecture pattern. This architecture pattern is based on open standards, and it defines a number of roles among different processing elements. As with the broker architecture, SIMPLE contains a centralized presence server that controls all subscriptions. Another component in the SIMPLE architecture is the presentity, or presence entity. The presentity is a person whose presence can be described, for example as “busy” in an instant messaging program. This architecture also contains an entity known as a watcher, which is an endpoint that subscribes to presence changes. Another role is the presence agent (PA), which notifies the watcher of changes in the presence state of a resource. The final role is the presence user agent (PUA), which is a program that knows the current presence of the presentity.

The following diagram shows all of the software elements in the SIMPLE architecture and describes how they interrelate.

Figure 3. SIMPLE architecture roles
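
The roles described above can be sketched in a few lines of Python. This is a simplified, in-process illustration, not a SIMPLE implementation: the presence server plays the presence agent role, watchers are plain callbacks, the `publish` call stands in for a PUA, and all names are hypothetical. No SIP SUBSCRIBE, NOTIFY, or PUBLISH messages are actually exchanged.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# A watcher is modeled as a callback taking (presentity, status).
Watcher = Callable[[str, str], None]


class PresenceServer:
    """Centralized broker: stores presence state and notifies watchers."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}
        self._watchers: DefaultDict[str, List[Watcher]] = defaultdict(list)

    def subscribe(self, presentity: str, watcher: Watcher) -> None:
        """Register a watcher and send it the current state, if any is known."""
        self._watchers[presentity].append(watcher)
        if presentity in self._state:
            watcher(presentity, self._state[presentity])

    def publish(self, presentity: str, status: str) -> None:
        """Called on behalf of a PUA; the most recent publication replaces the old state."""
        self._state[presentity] = status
        for watcher in self._watchers[presentity]:
            watcher(presentity, status)
```

A watcher that subscribes to a presentity receives every subsequent status change without knowing which device published it, which is the decoupling the broker pattern is meant to provide.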

To publish presence information across domain boundaries, an edge proxy is added to the architecture. In this scenario, the presence server in each domain acts as the presence agent for the users in the other domain and is notified of any changes in the state of those users.

Figure 4. Publishing presence information across domains

When a user can be associated with multiple user agents, there is always a chance of conflict. When publishing presence information, the normal rule is that the last UA to publish presence is the one associated with the AOR.

IP Multimedia Subsystem

Along with understanding SIP, architecting a CaaS solution requires an understanding of the network context into which the solution will be deployed. In the section about SIP, two deployment scenarios were presented: an intra-domain scenario and a cross-domain scenario. However, that section did not address how to extend CaaS capabilities to the mobile network and the public switched telephone network (PSTN). IMS is a network component that may have an impact on how communications solutions are architected. IMS is the element of Third Generation (3G) networks that facilitates the convergence of the Internet and cellular networks.

Considering wireless networks already offer forms of Internet communications, like text messaging and web browsing, it is reasonable to question the necessity of IMS. IMS adds value to the mobile network in two key areas:

  • Quality of Service (QoS) can be guaranteed in a 3G packet-switched network, improving the quality of the media session.
  • Services can be integrated into a network environment, facilitated by IMS-defined standard interfaces used by service developers.

These standard interfaces create a “plug and play” environment for communications services, such as a voicemail service.

IMS is based on a very detailed specification, on which volumes have been written. The intent of this article is to describe only the high-level concepts of IMS that are required to understand the potential impact on the CaaS architecture.

IMS Architecture

IMS contains the following functions within its infrastructure:

  • Home Subscriber Server(s) (HSS) database
  • Call Session Control Function (CSCF) SIP server(s)
  • Application Server(s) (AS)
  • Media Resource Functions (MRF)
  • Breakout Gateway Control Functions (BGCF)
  • PSTN Gateway(s)

Each of these functions supports standard interfaces and can be deployed as a separate node or physically co-located with other functions on the same node.

The HSS is the repository for user related information. User information stored in the HSS includes security information (authentication and authorization); user profile information, including services the user is subscribed to; and user location information.

The CSCF processes SIP signals. There are three types of CSCFs:

  • Proxy (P-CSCF). The P-CSCF is the first point of contact between the user endpoint and the IMS network. The basic functions of the P-CSCF focus around security and message compression/decompression.
  • Interrogating Call Session Control Function (I-CSCF). The purpose of the I-CSCF is to retrieve user information and route the SIP request to the appropriate destination.
  • Serving Call Session Control Function (S-CSCF). The S-CSCF is a SIP server that performs session control. The S-CSCF maintains a binding between the user URI and device URI.

The AS is a SIP entity that hosts and executes services. Much of what the AS does is mediate SIP messages and interact with the IMS infrastructure. The AS also acts as a host for applications that interface using SIP. This application hosting functionality is the layer that proxies the SIP call to the CaaS application.

IMS Message Flow

Figure 5 shows the message flow through the infrastructure. In this example, there is a home network and a visited network to illustrate the main concepts in the IMS infrastructure. This is similar to a cellular network in which the visited network would be a network roamed into and the home network would be the cellular service provider. First, the user equipment (UE) sends a SIP message to the P-CSCF in the visited network. The P-CSCF then passes the SIP request to the I-CSCF, which retrieves user information from the HSS and passes the request to the appropriate destination, in this case the S-CSCF. The S-CSCF inspects the SIP message and interfaces with the HSS to retrieve user information. The S-CSCF determines whether the SIP message should be passed to the AS or to a SIP proxy.

Figure 5 shows the S-CSCF passing the message to the AS. There are a number of roles the AS can fulfill: a SIP proxy, a SIP UA, or a concatenation of two SIP UAs, known as a back-to-back user agent (B2BUA). The AS then sends a SIP message back to the S-CSCF for routing.

Figure 5. IMS message flow
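
The hops in this flow can be traced with a short Python sketch. It is a conceptual model only: the HSS is a hypothetical dictionary, the `uses_as` flag is an assumed stand-in for whatever profile data the S-CSCF actually evaluates, and the host names are placeholders. Real IMS nodes exchange SIP and Diameter messages rather than function calls.

```python
from typing import Dict, List

# Hypothetical HSS contents: the serving CSCF for each user and whether the
# user's profile calls for an application server (an assumed simplification).
HSS: Dict[str, Dict[str, object]] = {
    "sip:alice@home.example.com": {"s_cscf": "scscf1.home.example.com", "uses_as": True},
    "sip:bob@home.example.com": {"s_cscf": "scscf1.home.example.com", "uses_as": False},
}


def route_request(user_uri: str, hss: Dict[str, Dict[str, object]] = HSS) -> List[str]:
    """Return the list of nodes a SIP request passes through for this user."""
    # The UE's first point of contact is the P-CSCF in the visited network.
    path = ["UE", "P-CSCF (visited network)"]

    # The I-CSCF queries the HSS to find where the request should go next.
    path.append("I-CSCF")
    profile = hss.get(user_uri)
    if profile is None:
        raise LookupError(f"user not found in HSS: {user_uri}")

    # The S-CSCF performs session control and decides on the next hop.
    path.append(f"S-CSCF ({profile['s_cscf']})")
    path.append("AS" if profile["uses_as"] else "SIP proxy")
    return path
```

Calling `route_request("sip:alice@home.example.com")` yields a path that ends at the AS, while the same call for the second user ends at a SIP proxy.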

CaaS Reference Architecture

Architecting CaaS solutions is a complex task because it combines the inherent complexities of architecting a multi-tenant solution with the additional requirements of interacting with a communications network across domain boundaries. Because there are many good references on architecting multi-tenant solutions, this section will focus on the requirements unique to CaaS solutions.

Requirements for Communications Solutions

One key differentiator between communications applications and line-of-business (LOB) applications is the high priority given to a few non-functional requirements, such as 99.999% availability (commonly referred to as 5-9s), high performance (sub-second response times for method invocations), a highly secured environment, interfaces to monitor service usage, and the ability to configure and administer services.
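
To put the availability figure in perspective, the following Python sketch converts an availability percentage into the downtime it allows per year. It assumes a 365-day year and counts all downtime, planned or unplanned; the function name is chosen only for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a service may be unavailable at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


print(f"{allowed_downtime_minutes(99.999):.2f}")  # prints 5.26
```

At 5-9s, the entire annual downtime budget is a little over five minutes, which is why availability shapes so many design decisions in communications solutions.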

Another characteristic of communications applications is their mission-critical nature and their ubiquitous deployment within an organization. This characteristic makes it nearly impossible to have control of the environment in which a solution will be deployed. This creates dependencies that are somewhat unique to communications applications, such as being required to interface with an existing PBX system to interoperate with desktop phones or a network dependency in the environment in which the solution will be deployed.

Another variable inherent in hosted solutions, including CaaS applications, is the availability of operations support systems (OSS) in the deployment environment. Each deployment of the solution can, and most likely will, have investments in different OSS products. The deployment environment makes architecting a solution dynamic, and decisions need to be made about the different protocols and interaction patterns that will be used to integrate the solution into the hosted environment.

Conceptual Architecture for CaaS Solution

The following diagram contains the conceptual architecture for a CaaS solution. The colors in the diagram highlight different components of the solution:

  • Blue represents the CaaS application itself.
  • Yellow represents components that may have to be provided by the creator of the CaaS solution.
  • Red represents components that will be in place in the service provider environment.

As stated earlier, the deployment environment will influence the requirements for the components that are not part of the CaaS application itself.

Figure 6. CaaS reference architecture

It is important to note that the conceptual architecture is high level and each component will be decomposed into specific services to fulfill the solution requirements. Another important thing to note is that the conceptual diagram does not imply any physical implementation.

CaaS Application

The CaaS application is the set of services that fulfill a communications function, such as IP telephony. Because this set of services contains the business value of the CaaS solution, this is the part of the architecture that most development organizations will focus on.

The CaaS application is implemented using a layered architecture. A layered architecture abstracts different parts of the application at different levels, allowing layers to be reused in different contexts; for example, the communications service layer, if correctly decomposed, can be customized in different ways to fulfill specific requirements for different customers. In the conceptual architecture, the CaaS application comprises the following layers:

  • Management interface layer
  • Communications service orchestration layer
  • Communications service layer

This layering is purely logical, because different components may run in the same process as other components. The next sections describe each of these layers in more detail.

Management Interface Layer

Every communications application requires the ability to have the implementer of the solution (service provider, enterprise, or hosting provider) perform the tasks related to the following functions on the solution: fault, configuration, accounting, performance, and security (FCAPS). This layer provides interfaces for those functions.

Different functions of the components in the management interface layer will interface with external applications in different fashions. Some interfaces may be implemented with a request/response message exchange pattern (MEP) exposed as a Web service; other interfaces may report events by sending Simple Network Management Protocol (SNMP) traps. Current and emerging technical trends will dictate the method and protocol of the interface.

The following list describes the components contained in the management interface layer of the CaaS application.

  • Configuration interface. These interfaces are for the configuration of the hardware and software that facilitate the CaaS application. The interfaces permit the configuration of the physical environment, such as hardware and networking, and the configuration of the logical environment, such as the services that are running on the communications solution.
  • Billing interface. These interfaces provide billing information to the service provider. This interface may be event-driven, providing a data stream, or polled regularly for formatted files.
  • Reporting interface. These interfaces provide information about faults and performance. Depending on the content, information can be sent as events, such as SNMP traps, or polled by the consumer of the data.

Communications Service Orchestration Layer

The communications service orchestration layer orchestrates the communications services in the communications service layer. The orchestration layer provides a level of agility in the CaaS solution by allowing the autonomous services in the communications service layer to be composed into customized communications processes based on customer requirements. The communications service orchestration layer helps provide a customized communications experience in a multi-tenant environment. Customization is often a requirement of corporate communications solutions.



  • Communications orchestration. The communications orchestrations compose communications services to provide a communications function. For example, an Automatic Call Distribution (ACD) application may have specific routing requirements based on the keypad input of the user, such as “Dial 0 to reach the operator.” These routing requirements can be implemented as an orchestration: (1) answer call, (2) prompt for input, (3) process input, and (4) route call to appropriate destination.
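
The four-step ACD orchestration can be sketched as a composition of small, independent services. The Python example below is purely illustrative: the routing table, the service functions, and the dictionary used to represent a call are all hypothetical, and a production orchestration would invoke services in the communications service layer rather than local functions.

```python
from typing import Callable, Dict

# Hypothetical routing table: keypad digit -> destination.
ROUTES: Dict[str, str] = {"0": "operator", "1": "sales", "2": "support"}


def answer_call(call: dict) -> None:
    call["state"] = "answered"


def prompt_for_input(call: dict, read_digit: Callable[[], str]) -> None:
    # read_digit stands in for whatever service collects keypad input.
    call["digit"] = read_digit()


def process_input(call: dict, routes: Dict[str, str], default: str) -> None:
    # Unknown digits fall back to a default destination.
    call["destination"] = routes.get(call["digit"], default)


def route_call(call: dict) -> None:
    call["state"] = "routed to " + call["destination"]


def acd_orchestration(read_digit: Callable[[], str],
                      routes: Dict[str, str] = ROUTES,
                      default: str = "operator") -> dict:
    """Compose the four services in order: answer, prompt, process, route."""
    call = {"state": "ringing"}
    answer_call(call)
    prompt_for_input(call, read_digit)
    process_input(call, routes, default)
    route_call(call)
    return call
```

Because each step is a separate service, a tenant-specific orchestration could swap the routing table or insert an extra step without changing the services themselves.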

Communications Service Layer

The communications service layer contains autonomous services that perform communication tasks; for example, the routing logic of a “forward call” function would be implemented in this layer of the CaaS solution. There are no specific technical requirements of this layer. Services can be implemented as class libraries or they can even be an entire communications system, such as a PBX or a voicemail system.



  • Communications service. A service that performs a set of communications tasks, such as call routing or voicemail.

SIP Services

The SIP services give the CaaS solution the ability to communicate using SIP. These services understand the capabilities of SIP, such as how to route a call to a user based on presence information and how to change the state of a user based on SIP events. These services also contain IMS services that interface with the HSS and the S-CSCF. In initial CaaS implementations, the solution architect will most likely have a high degree of influence over the choice of provider for the SIP services.

Management Application

The CaaS management application is optional, depending on the needs of the service provider. The management application can supplement the CaaS application by providing additional services that are typically contained in the business support systems (BSS) and operations support systems (OSS). These services may include the management of the physical environment (servers, processes and networking), such as taking remedial action for faults; mapping the logical model (customers and services) to the physical environment, such as Customer X’s service is running on server 1234; and providing billing information.

Security Services

The CaaS security services may also be part of the existing service provider environment. In a CaaS environment, there is a need to federate security from the customer’s location to the service provider. If a federated security system is not part of the service provider environment, the provider of the CaaS application may have to add these capabilities to their solution.

Directory Services

Similar to the security services, federated directory services are a requirement of a CaaS solution. If a service provider does not have federated directory services, the provider of the CaaS application may have to add these services to its CaaS solution.

Business Support Systems/Operations Support Systems

BSSs/OSSs are carrier-grade systems that automate a service provider’s business processes and operations. The functions of these systems include, among others, billing, provisioning, and fault management. BSSs/OSSs are almost always part of a service provider environment, and CaaS architects will have very little influence over these systems. CaaS architects will have to define interfaces for their solutions to interoperate with the BSS/OSS. There are standards bodies, such as the TeleManagement Forum (TMF), that define standards for BSS/OSS interoperability.

IMS Infrastructure

The IMS infrastructure will be supplied by the communications service provider. As with the network infrastructure for a Web site, CaaS architects will have no influence over the choice of IMS equipment vendor and will have to adhere to the standards that are part of the IMS specification.

Remote Domain

The remote domain in the conceptual architecture illustrates branch office or enterprise-to-enterprise (E2E) SIP communications. Early CaaS implementations may be able to restrict branch office or E2E communications to enterprises using SIP services from a single vendor. As the implementation of Internet-based communications evolves, the influence of a CaaS architect over the SIP servers at a remote domain will diminish.

Thoughts on CaaS

Being an architect on a CaaS solution over the last few years has given me some interesting perspective on the unique challenges of developing and deploying these applications. As I mention throughout this article, CaaS applications require considerations beyond those of hosted multi-tenant solution development, whose design challenges tend to reside in the orchestration layer and the security services. The SIP services layer requires knowledge unique to communications solutions, and a lack of understanding of the patterns required to implement these solutions can hinder the effort.

Another unique consideration of CaaS solutions is the implementation environment. Communications service providers all have unique environments and different criteria for qualifying and deploying a solution into their environments. Getting a CaaS solution qualified for deployment may take a significant amount of time after the software itself has been developed. As mentioned in the description of the conceptual architecture, parts of the application, such as security services and the management application, may be required for some deployments and not others. These services will have to be developed and tested as part of the initial implementation, even though they may not be part of the initial deployments.

After those words of caution, you may wonder whether developing a CaaS solution is worth it. From a financial standpoint, it will be. According to a Gartner, Inc. worldwide forecast, “CaaS is projected to total $251.9 million in 2007, a 37.6% increase from last year. The market is expected to total $2.3 billion in 2011, representing a compound annual growth rate at more than 105% for the period.”

Along with the potential financial windfall, architecting CaaS solutions should appeal to the most fervent of alpha geeks. Beyond the appeal of defining a solution that implements a pure service-oriented architecture (SOA), the technologies involved in architecting a CaaS solution (the hardware, networking, and software) should pique our technical interests for quite some time. The potential for CaaS applications that permit multiple services (voice, video, and data) over multiple devices (mobile, wire-line, and computer) is boundless and will surely produce some killer applications in the near future.

About the Author

Joseph Hofstader is an Architect/Evangelist for Microsoft’s Communications Sector of North America. Joseph has spent his career architecting, designing, and developing applications in the telecommunications industry. Joseph has spent the majority of the last few years as an architect on a solution that would now be classified as CaaS.


Gartner Forecasts Worldwide Communications-as-a-Service Revenue to Total $252 Million in 2007; September, 2007.