August 2014

Volume 29 Number 8

Microsoft Azure: Microsoft Azure Media Services

Gregory Prentice

Over the years, Microsoft has helped design and support large-scale, live streaming video delivery. One prime example is the 2014 Sochi Olympics. This required a huge amount of technological resources, including hardware servers, source video streams, encoding, server redundancy (dual datacenters), output-adaptive video streams, Content Delivery Networks (CDNs), partner companies and, of course, Microsoft staff.

All these resources—along with custom software development—are needed to ensure the end-to-end live video event is executed with few or no incidents. The cost of orchestrating these resources reflects the scale of the live streaming event. There’s a high capital expense incurred to purchase the required servers, switches and related technology. This leads to obvious risks and the following questions:

  • Was enough or too much hardware purchased to handle the scale of the live event?
  • What do you do with said hardware until the next event?
  • How long will the acquired technology still be relevant?
  • Were the correct technology partners included to ensure the event is orchestrated flawlessly?

Considering Microsoft’s involvement, knowledge and success in live video events, the Microsoft Azure Media Services team developed live streaming to mitigate these risks. The team successfully streamed live global video feeds of the 2014 Sochi Olympics, running on Microsoft Azure Media Services. While that was an extremely large event, Microsoft Azure provides on-demand hardware to address scalability.

Depending on the size of the live video event, a request to Azure Media Services may create a scale unit—small, medium or large—to provide streaming live video to hundreds of thousands of viewers or just hundreds. The pay-as-you-go pricing lets you procure, use and release the live streaming service, so the costs are known prior to an event. The hardware and infrastructure are continually upgraded and refreshed, and Microsoft maintains strategic relationships relevant to delivering live streaming solutions.

In this article, I’ll focus on an example scenario that will use the new live streaming from the Microsoft Azure Media Services team (currently in private preview), address the aforementioned risks and discuss the relationship of the live streaming offering to the existing video-on-demand (VOD) service.

Concepts and Terminology

It’s important to understand some general concepts and terminology regarding Azure Media Services. The term “channel” refers to the developer’s view of the full end-to-end path that one live video stream takes through Azure Media Services. A channel can be in various states, with the two most important being “stopped” and “running.” A channel includes the following components that coordinate the video streams through the system:

  • Ingest URI: Represents an ingress point at which the channel receives one or more video bitrate streams for delivery.
  • Preview URI: Represents an egress point of the live stream as received by the channel, which should only be used for monitoring purposes.
  • Program: Associated with a channel and represents a portion of the live stream persisted to Blob storage as an asset. At least one program should be created on a channel and in a running state to enable a live video stream. The program will actively capture video while in a running state.
  • Asset: Associated with a program, this represents data stored in Blob storage. An asset might contain one or more files, including video, audio, images, thumbnail collections and a manifest. The asset may still exist even after its program is deleted.
  • Locator: Associated with the asset and an origin. A locator provides a URI used as the egress point of the live program stream.
  • Origin: Associated with a locator, this represents the scalable egress point for delivering the video streams from an asset’s locator URI in multiple bitrate (MBR) formats, such as Smooth Streaming, HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH).

Azure Media Services ensures proper redundancy is created and available for reliable delivery of video at scale. That delivery could involve many devices that consume varying streaming formats, such as Smooth, HLS and DASH. Figure 1 shows what constitutes a channel.

Figure 1 A Delivery Channel Is the Full End-to-End Path of a Live Video Stream
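
Before looking at the API itself, it can help to see those relationships as a rough sketch in C#. These types are purely illustrative; the actual SDK exposes interfaces such as IChannel, IProgram, IAsset, ILocator and IOrigin, which appear in the code later in this article:

// Conceptual sketch only; these aren't the SDK's types. The real API
// exposes IChannel, IProgram, IAsset, ILocator and IOrigin interfaces.
class Channel
{
  public Uri IngestUri;           // Ingress point for the MBR video streams
  public Uri PreviewUri;          // Egress point used only for monitoring
  public List<Program> Programs;  // Time slots captured from the live feed
}
class Program
{
  public Asset Asset;             // Captured video persisted to Blob storage
}
class Asset
{
  public List<string> Files;      // Video, audio, thumbnails, manifest
  public List<Locator> Locators;  // Egress URIs served through an origin
}
class Locator
{
  public Uri OriginUri;           // Streamed to devices as Smooth, HLS or DASH
}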

It’s important to understand the relationship between a channel and the one or more programs created within it. In Figure 2, a live video event has been mapped onto a channel representation that’s in a running state from 5 p.m. until 2 a.m. Note that the channel’s transition from a stopped state to a running state can take 10 to 20 minutes. The interval markers—Concert1, Vip1, Act1, Act2 and Vip2—represent programs and their planned start and stop times. Finally, the interval markers named Transition represent the time between the current program and the next, where one is stopped and the other is started.

Figure 2 A Live Video Event Mapped to Start at 5 p.m. and End at 2 a.m.

This is one example of how a timeline might map onto a channel using programs. As defined in Figure 2, one program—Concert1—spans the entire event.

Concert1 will be in a running state from 5:15 p.m. until 2 a.m., with a sliding video buffer of 10 minutes. The buffering lets users rewind while viewing with a media player. The Concert1 program provides the primary live video stream of the entire event. The CDNs are provided with URIs for mapping prior to the event.

The remaining programs—Vip1, Act1, Act2 and Vip2—should be in a running state during their defined start and stop times, with a variance allowed for transitions. A program doesn’t start or stop immediately, so you need to account for transitions when planning a live event. Multiple programs can be in a running state at the same time, all capturing video; during most of the event, Concert1 and one other program will be running concurrently. To show how you might use a transition, you could issue a program start request for Vip2 at 6:45 p.m., then a program stop request for Vip1 at 7 p.m., providing a 15-minute transition. During those 15 minutes, three programs would be in a running state.
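
As a quick illustration of that transition arithmetic, the following sketch (hypothetical planning code, not part of the Azure Media Services API) computes the overlap from the Vip1/Vip2 example:

// Hypothetical planning helper; times come from the Vip1/Vip2 example.
// Start the next program before stopping the current one so the
// capture never has a gap.
DateTime vip2Start = DateTime.Today.AddHours(18).AddMinutes(45); // 6:45 p.m.
DateTime vip1Stop = DateTime.Today.AddHours(19);                 // 7:00 p.m.
TimeSpan overlap = vip1Stop - vip2Start;
// Prints "Transition overlap: 15 minutes"
Debug.WriteLine("Transition overlap: {0} minutes", overlap.TotalMinutes);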

Having described general concepts and terminology, I’ll focus on using the event timeline illustrated in Figure 2, mapped onto a user scenario to develop coding examples using the Azure Media Services API.

The Theme: Contoso’s Blues Bar

A well-known U.S. venue for up-and-coming musicians, Contoso’s Blues Bar has a tight deadline to coordinate an upcoming live concert for a large European band called Contoso and the Shock Wallets. The band’s fan base is largely in Europe. The show at Contoso’s will be the band’s U.S. debut. Therefore, the show managers need to stream the concert live so the band’s European fan base can watch on computers and mobile devices.

The CIO of Contoso’s Blues Bar calls a meeting and asks his technology team to research and recommend a solution that will meet the following requirements:

  • It must be easy to implement.
  • It must stream video to a large number of different device configurations.
  • It must meet unknown scalability needs.
  • Its cost must be pay-as-you-go, based on what’s used during the event.

Sprint One

The Contoso’s Blues Bar event planning team spends the next few days defining their user stories. These user stories will be the basis for sprint planning to deliver video streaming of their live concert. During the meeting, the team defines some knowledge gaps that will force a number of spikes—or questions demanding resolution—during sprint one. The team also defines the following list of the larger user stories, typically referred to as Epics, in the standard “as a ... I want ... so that ...” format:

As a fan of Contoso and the Shock Wallets, I want to watch the live concert on my device at the highest video quality possible so that I can enjoy their U.S. debut.

As a fan of Contoso and the Shock Wallets, I want to watch a video of the concert on my device at the highest possible video quality at a later date and time so that I can watch the event at my leisure.

As a Contoso’s Blues Bar representative, I want to deliver a live broadcast of our concerts and reduce the expense of delivering live streaming video from our venue so that we can attract more musicians and customers while saving money.

The user stories and associated spike investigations include:

User Story 1.1: As an event production staffer, I want one or more cameras so that we can capture the live concert.

Spike 1.1.1: What camera type is being used and what is the output video stream?

The production staff learns they can purchase high-quality used video cameras to produce a high definition (HD) video/audio stream over a Serial Digital Interface (SDI).

User Story 1.2: As an event production staffer, I want to stream live video to the Internet so that fans can watch a concert.

Spike 1.2.1: How will the camera video feed get delivered to the production station?

The production staff learns Contoso Switch Company produces an HD SDI-to-fiber-optics video switch. They can use this switch with up to four cameras.

Spike 1.2.2: How will that video feed be delivered to the IT department?

The production staff learns the fiber-optics video switch will output an HD SDI video/audio stream they can connect to an audio/video switcher. They can then control which camera signal is active through a broadcast panel. The live video feed is ultimately sent from the broadcast panel by an HD SDI connection.

User Story 1.3: As an IT staffer, I want to deliver the video stream to Windows, iOS and Android devices so the concert video has the maximum device support.

Spike 1.3.1: What type of video feed am I receiving in order to deliver video to the Internet?

The HD SDI video feed will be sent to the IT department.

Spike 1.3.2: What type of video feeds are needed for the target devices?

The IT staff discovers they can use the following adaptive video protocols to deliver video to each of the target devices:

  • Smooth Streaming: Windows 8 and Windows Phone 8
  • HLS: iOS and Android
  • DASH: Windows 8.1, Windows Phone 8 and Xbox One

Spike 1.3.3: How am I delivering that video to the Internet in a scalable fashion?

The IT staff learns Microsoft has announced a new feature on Azure that specifically addresses streaming live video over the Internet. With some research, the team discovers Azure Media Services can provide the middle layer between their venue and the targeted devices. Most important, they learn the following details about Azure Media Services:

  • Once you’ve signed up for an Azure account and added a Media Services account, you can create a live streaming channel. A live channel is analogous to a TV channel: all allocated server resources are dedicated to that channel for the delivery of your programs. A channel may have many programs. A program is the definition of a time slot and the associated asset. An asset is the storage location of the streamed video in Azure Media Services.
  • The server allocation ensures redundancy is built into the video path with no single point of failure.
  • A live video stream can be received as RTMP or MPEG TS by Azure Media Services via an ingest URI.
  • Devices can request a video stream that’s natively supported. Microsoft Azure Media Services will ensure the video is packaged in the proper format based on a device-specific URI (see the sketch after this list).
  • Support is built into the live streaming origin servers for secure access by a CDN provider, such as Akamai Technologies.
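
Here’s the sketch referenced in the device-specific URI bullet: with dynamic packaging, a single origin URL is varied by a format selector to serve each protocol. The patterns below follow the Azure Media Services VOD conventions of the time; treat them, along with the placeholder host and asset names, as assumptions to verify for the live preview:

// Assumed dynamic-packaging URL patterns; host/asset names are placeholders
string baseUrl = "http://yourorigin.example.net/yourLocatorPath/concert.ism";
string smoothUrl = baseUrl + "/manifest";                      // Windows 8/WP8
string hlsUrl = baseUrl + "/manifest(format=m3u8-aapl)";       // iOS/Android
string dashUrl = baseUrl + "/manifest(format=mpd-time-csf)";   // Win 8.1/Xbox One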

Additional Results of Spikes 1.3.1 and 1.3.2: The IT staff chooses an encoder that can receive an HD SDI video/audio stream and use it to dynamically generate multiple bitrate streams for delivery to Azure Media Services, which provides an ingress URI accepting RTMP or MPEG TS as the input format.

Sprint Two

After their morning coffee rituals, the developers begin developing code that will facilitate the live streaming event.

User Story 1.4.1: As a developer, I want to create the proper Azure accounts so that I can begin writing code to stream video.

Go to the Azure Web site (azure.microsoft.com), click the Try for free button and follow the steps. Learn more about setting up an account at bit.ly/1mfacft. During the sign-up process, you create a Microsoft account that’s used as the administrator credentials. To create an Azure Media Services account, you must first create an Azure Storage account, as the video assets for a live event will be stored in Blob storage. The Azure Media Services documentation also recommends you create the storage account in the same datacenter from which you procure the Media Services account—for example, create both the storage account and the media account in the US-WEST datacenter.

User Story 1.4.2: As a developer, I want to write code to manage a channel so that I can deliver a live streaming event.

Create a base project using Visual Studio 2012. Next, install the NuGet packages for Azure Media Services. Create a function called CreateContosLiveEvent and start with a CloudMediaContext object, which is always used to manage your live streams:

private void CreateContosLiveEvent()
{
  string liveAccount = "yourAzureMediaAccount";
  string liveKey = "yourAzureMediaAccountKey";
  CloudMediaContext LiveServices = new CloudMediaContext(
    liveAccount,  // Account name
    liveKey );    // Account key

Write the code to ensure an origin service is running. The origin service will deliver the live streams to the CDN providers, as shown in Figure 3.

Figure 3 An Origin Service Delivers Live Video Streams to Content Delivery Network Providers

string originName = "abcdefg";
// Reuse the origin if one with this name already exists
IOrigin origin = LiveServices.FindOrigin(originName);
if (origin == null)
{
  // Create the origin; the second argument sets its scale
  Task<IOrigin> originTask = 
    LiveServices.Origins.CreateAsync(originName, 2);
  originTask.Wait();
  origin = originTask.Result;
}
// Make sure the origin is running before the event
if (origin.State == OriginState.Stopped)
{
  origin.StartAsync().Wait();
}

Then you need to write the code to create a channel, as shown in Figure 4. A channel requires security settings that allow authentication of the inbound video stream source. The source is typically an IP-configured encoder that converts the HD SDI signal into the required MBR video streams.

Figure 4 Create a Channel for Your Video Feed

string channelName = "ContosoBluesChannel";
IChannel channel = LiveServices.FindChannel(channelName);
if (channel != null)
{
  Debug.WriteLine("Channel already exists!");
  return;
}
ChannelSettings settings = new ChannelSettings();
Ipv4 ipv4 = new Ipv4();
// Currently setting IP to 0.0.0.0/0 allows all connections
//  Don't do this for production events
ipv4.IP = "0.0.0.0/0";
ipv4.Name = "Allow all connections";
// Protect the ingest URI
settings.Ingest = new IngestEndpointSettings();
settings.Ingest.Security = new SecuritySettings();
settings.Ingest.Security.IPv4AllowList = new List<Ipv4>();
settings.Ingest.Security.IPv4AllowList.Add(ipv4);
// Protect the preview URI
settings.Preview = new PreviewEndpointSettings();
settings.Preview.Security = new SecuritySettings();
settings.Preview.Security.IPv4AllowList = new List<Ipv4>();
settings.Preview.Security.IPv4AllowList.Add(ipv4);
// Create the channel
Task<IChannel> taskChannel = LiveServices.Channels.CreateAsync(
  channelName, "video streaming", ChannelSize.Large, settings);
taskChannel.Wait();
channel = taskChannel.Result;

The sample code configures the ingest and preview URIs with a Classless Inter-Domain Routing (CIDR)-formatted IP address set to “0.0.0.0/0,” which allows all IP addresses access to the ingest and preview URIs. For production channels, it’s recommended that you restrict access to known values, such as the IP addresses of your encoders. Also use a unique channel name; an exception is thrown if a channel with the same name already exists.
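
For example, a production configuration might replace the wide-open entry with just the encoder’s address. This is a minimal sketch reusing the same preview types; 203.0.113.10 is a documentation placeholder for your encoder’s public IP:

// Restrict ingest to a single known encoder; a /32 mask matches one address
Ipv4 encoderOnly = new Ipv4();
encoderOnly.IP = "203.0.113.10/32";  // Placeholder; use your encoder's IP
encoderOnly.Name = "Broadcast encoder only";
settings.Ingest.Security.IPv4AllowList = new List<Ipv4> { encoderOnly };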

Write the code that creates the programs, associated assets and locators, as shown in Figure 5. The names and times used relate to the values presented in Figure 2. A true enableArchive value indicates the live video stream captured during the program’s running state remains persisted with the associated asset for later consumption as VOD (in this preview sample, the code forces the flag to true for every program). A locator with a 30-day access policy is created for the asset’s manifest file, providing a URI to the streaming video.

Figure 5 Create the Programs That Will Run During Your Feed

// Define the program name, DVR window and estimated duration in minutes.
// NOTE: DVR window value most likely will be removed when service 
// reaches public preview.
Tuple<string, int, int>[] programSettings = new Tuple<string, int, int>[]
{
  new Tuple<string,int,int>( "Concert1", 60, 60 ),
  new Tuple<string,int,int>( "Vip1", 75, 75 ),
  new Tuple<string,int,int>( "Act1", 90, 90 ),
  new Tuple<string,int,int>( "Act2", 165, 165 ),
  new Tuple<string,int,int>( "Vip2", 120, 120 ),
};
foreach (Tuple<string, int, int> programSetting in programSettings)
{
  IAsset asset = null;
  // To persist specific program's asset for Video on Demand (VOD) this
  // code tests if the DVR window and Event duration equal in length.
  // If true it sets the enable archive flag to true, which tells the
  // system a program's asset should not be deleted automatically.
  bool enableArchive = programSetting.Item2 == programSetting.Item3;
  try
  {
    // Create the asset that is used to persist the video streamed
    // while the program is running and VOD.
    asset = LiveServices.Assets.Create(programSetting.Item1 + "_asset",
      AssetCreationOptions.None);
    Task<IProgram> taskProgram = channel.Programs.CreateAsync(
      programSetting.Item1,
      "program description",
      // Enable archive; note: forced to true for now
      true, // enableArchive,
      // NOTE: The DVR window value isn't used and will most likely
      // be removed when the service reaches public preview
      TimeSpan.FromMinutes(programSetting.Item2),
      // Estimated duration
      TimeSpan.FromMinutes(programSetting.Item3),
      asset.Id);
    taskProgram.Wait();
    LiveServices.CreateLocators(asset, TimeSpan.FromDays(30));
  }
  catch (Exception exp)
  {
    Debug.WriteLine(exp.Message);
  }
}

You can obtain an origin URI from a locator, as with an Azure Media Services VOD asset. You would then provide this to the CDN provider of choice. Because the enableArchive flag was set, once a program has been stopped, you can use the same origin URI to deliver the live stream as a VOD asset.
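
A minimal sketch of that handoff, assuming the preview ILocator exposes the same Path property as the VOD API (an assumption worth verifying), might look like this:

// Emit each asset's locator base path for CDN mapping
// (assumes VOD-style ILocator.Path on the live preview)
foreach (ILocator locator in asset.Locators)
{
  Debug.WriteLine("Origin egress base: " + locator.Path);
}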

The remaining task is to write the code to start a channel and the programs when needed. The request to start a channel signals Azure Media Services to begin allocating resources for the ingress services, redundancy and load balancing. When a channel starts, the time between the starting and running states can take 10 to 20 minutes.

You should start the channel far enough ahead of the event that the startup delay doesn’t interfere with viewing the live video stream. It’s also important to note that billing charges begin to accumulate once your channel has started. If you don’t plan on running a 24x7 live channel, you should stop the channel and its associated programs when they’re not being used. The actual code needed to start a channel is quite simple:

// Start channel
Task start = channel.StartAsync();
start.Wait();

The code you need to start the program is also straightforward:

start = program.StartAsync();
start.Wait();

One primary requirement for the event is the ability to control when the channel and each program start and stop. Therefore, the next item up for development is a simple UI that creates the event in its entirety by calling the CreateContosLiveEvent method, and then permits starting and stopping the channel and each corresponding program as needed.
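
For the stop side of that UI, a teardown sketch might look like the following. It assumes the preview API exposes StopAsync counterparts to StartAsync and a ProgramState enumeration, both extrapolated from the API’s symmetry rather than confirmed:

// Stop each running program, then the channel, so billing charges
// stop accruing (StopAsync and ProgramState are assumed preview APIs)
foreach (IProgram program in channel.Programs)
{
  if (program.State == ProgramState.Running)
  {
    program.StopAsync().Wait();
  }
}
channel.StopAsync().Wait();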

To close out the scenario: while you’ve been developing the application with Azure Media Services, the other team members have been busy setting up cameras, running cables, installing a video encoder and configuring CDNs. Finally, the staff performs a series of tests to ensure the end-to-end scenario works.

When the actual concert is about to begin, the live video stream starts, and different user devices, desktops and tablets connect to Azure Media Services to view it. A few problems always surface during test runs, but the staff works out the kinks, and all the hard work pays off in time to stream the concert live.

A Daunting Task Made Easier

Hosting live video events at scale can be a daunting task. Before solutions such as Azure Media Services, delivering live streaming video often required significant capital expenditure for infrastructure equipment. Other critical decisions included the quantity of systems needed to support different events that vary in size and scale. Such purchases were often either over- or under-provisioned. Microsoft Azure Media Services responds to that complexity and cost. The Live offering is built atop the Azure Media Services VOD feature set, providing a single, unified platform and APIs for delivering both live and VOD content to the industry at scale.


Gregory Prentice is a seasoned software architect with more than 25 years of experience designing and creating applications for various startup companies. He started working for Microsoft developing the following projects: Locadio, Microsoft Hohm and Microsoft Utility Rates Service. Most recently he helped the Microsoft Azure Media Services and Delta Tre teams deliver the 2014 Sochi Olympics live video stream. He’s currently an evangelist in the Developer and Platform Evangelism group at Microsoft. Read his blog at blogs.msdn.com/b/greg-prentice.

Thanks to the following Microsoft technical experts for reviewing this article: Steven Goulet, Principal PM, Live Services; and Jason Suess, Live Services for the 2014 Sochi Olympics.