March 2012

Volume 27 Number 03

Touch and Go - Streaming Audio in Windows Phone

By Charles Petzold | March 2012

Whether running on the desktop, on the Web or in your hand, computer programs sometimes need to play sounds or music. Most often, this audio will be entirely encoded in MP3 or WMA files. The big advantage of this approach is that the OS itself usually knows how to decode and play these files. The application can then focus on the relatively easy job of providing a UI for pausing, restarting and perhaps navigating among tracks.

But life isn’t always so convenient. Sometimes a program needs to play an audio file in a format not supported by the OS, or even to generate audio data dynamically, perhaps to implement electronic music synthesis.

In the parlance of Silverlight and Windows Phone, this process is known as audio “streaming.” At run time the application provides a stream of bytes that comprise the audio data. This occurs through a class derived from MediaStreamSource, which feeds the audio data to the OS audio player on demand. Windows Phone OS 7.5 can stream audio in the background, and I’ll show you how to do it.

Deriving from MediaStreamSource

The essential first step in dynamically generating audio data in a Windows Phone program is deriving from the abstract class MediaStreamSource. The code involved is rather messy in spots, so instead of writing it from scratch, you’ll probably want to copy somebody else’s.

The SimpleAudioStreaming project in the downloadable source code for this article shows one possible approach. This project contains a MediaStreamSource derivative named Sine440AudioStreamSource that simply generates a sine wave at 440 Hz. This is the frequency corresponding to the A above middle C that’s commonly used as a tuning standard.

MediaStreamSource has six abstract methods that a derivative needs to override, but only two of them are crucial. The first is OpenMediaAsync, in which you need to create a couple of Dictionary objects and a List object, as well as define the type of audio data your class provides and various parameters describing this data. The audio parameters are represented as fields of a Win32 WAVEFORMATEX structure with all the multibyte numbers in little-endian format (least significant byte first) and converted to a string.

I’ve never used MediaStreamSource for anything other than audio in the pulse-code modulation (PCM) format, which is used for most uncompressed audio, including CDs and the Windows WAV file format. PCM audio involves constant-size samples at a constant rate called the sample rate. For CD-quality sound, you’ll use 16 bits per sample and a sample rate of 44,100 Hz. You can choose either one channel for monaural sound or two channels for stereo.

The Sine440AudioStreamSource class hardcodes a single channel and a 16-bit sample size but allows the sample rate to be specified as a constructor argument.
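
Here’s a sketch of what an OpenMediaAsync override for this setup might look like. This is a reconstruction rather than the project’s actual code: the ToLittleEndianString helper is hypothetical, and the sampleRate and mediaStreamDescription fields are assumed to be the same ones used in Figure 1.

protected override void OpenMediaAsync()
{
  // Describe 16-bit mono PCM as a WAVEFORMATEX structure serialized
  // to a hex string, least significant byte first
  short channels = 1;
  short bitsPerSample = 16;
  short blockAlign = (short)(channels * bitsPerSample / 8);
  int avgBytesPerSec = sampleRate * blockAlign;
  string waveFormatEx =
    ToLittleEndianString((short)1) +        // wFormatTag = WAVE_FORMAT_PCM
    ToLittleEndianString(channels) +        // nChannels
    ToLittleEndianString(sampleRate) +      // nSamplesPerSec
    ToLittleEndianString(avgBytesPerSec) +  // nAvgBytesPerSec
    ToLittleEndianString(blockAlign) +      // nBlockAlign
    ToLittleEndianString(bitsPerSample) +   // wBitsPerSample
    ToLittleEndianString((short)0);         // cbSize
  // One audio stream, described by the WAVEFORMATEX string
  Dictionary<MediaStreamAttributeKeys, string> streamAttributes =
    new Dictionary<MediaStreamAttributeKeys, string>();
  streamAttributes[MediaStreamAttributeKeys.CodecPrivateData] = waveFormatEx;
  mediaStreamDescription =
    new MediaStreamDescription(MediaStreamType.Audio, streamAttributes);
  List<MediaStreamDescription> availableStreams =
    new List<MediaStreamDescription>();
  availableStreams.Add(mediaStreamDescription);
  // The source as a whole: indefinite duration, no seeking
  Dictionary<MediaSourceAttributesKeys, string> sourceAttributes =
    new Dictionary<MediaSourceAttributesKeys, string>();
  sourceAttributes[MediaSourceAttributesKeys.Duration] = "0";
  sourceAttributes[MediaSourceAttributesKeys.CanSeek] = "false";
  ReportOpenMediaCompleted(sourceAttributes, availableStreams);
}
// Hypothetical helper: hex-encode a value, least significant byte first
static string ToLittleEndianString(short value)
{
  return ToLittleEndianString(BitConverter.GetBytes(value));
}
static string ToLittleEndianString(int value)
{
  return ToLittleEndianString(BitConverter.GetBytes(value));
}
static string ToLittleEndianString(byte[] bytes)
{
  StringBuilder builder = new StringBuilder();
  foreach (byte b in bytes)
    builder.Append(b.ToString("X2"));
  return builder.ToString();
}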

Internally, the audio pipeline maintains a buffer of audio data, the size of which is specified by the AudioBufferLength property of MediaStreamSource. The default setting is 1,000 ms, but you can set it as low as 15 ms. To keep this buffer full, calls are made to the GetSampleAsync method of your MediaStreamSource derivative. Your job is to shovel a bunch of audio data into a MemoryStream and call ReportGetSampleCompleted.

The Sine440AudioStreamSource class is hardcoded to provide 4,096 samples per call. With a sample rate of 44,100, that’s not quite one-tenth of a second of audio per call. If you’re implementing a user-controlled synthesizer that must respond quickly to input, you’ll need to play around with this value and the AudioBufferLength property. For minimum latency you’ll want to keep the buffer sizes small, but not so small that gaps appear in the playback.
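
For example, the buffer can be shrunk when the source is created; the 100 ms figure here is an arbitrary assumption for illustration, not a value recommended by the project:

// AudioBufferLength must be set before SetSource, because the
// buffer is established when the media pipeline opens the source
Sine440AudioStreamSource streamSource = new Sine440AudioStreamSource(44100);
streamSource.AudioBufferLength = 100;  // in milliseconds; the default is 1,000
mediaElement.SetSource(streamSource);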

Figure 1 shows the Sine440AudioStreamSource implementation of the GetSampleAsync override. Within the loop, a 16-bit sample value is obtained by calling the Math.Sin method and scaling the result to the range of a short:

short amplitude = (short)(short.MaxValue * Math.Sin(angle));

Figure 1 The GetSampleAsync Method in Sine440AudioStreamSource

protected override void GetSampleAsync(MediaStreamType mediaStreamType)
{
  // Reset MemoryStream object
  memoryStream.Seek(0, SeekOrigin.Begin);
  for (int sample = 0; sample < BufferSamples; sample++)
  {
    short amplitude = (short)(short.MaxValue * Math.Sin(angle)); 
    memoryStream.WriteByte((byte)(amplitude & 0xFF));
    memoryStream.WriteByte((byte)(amplitude >> 8));
    angle = (angle + angleIncrement) % (2 * Math.PI);
  }
  // Send out the sample
  ReportGetSampleCompleted(new MediaStreamSample(mediaStreamDescription,
    memoryStream,
    0,
    BufferSize,
    timestamp,
    mediaSampleAttributes));
  // Prepare for next sample
  timestamp += BufferSamples * 10000000L / sampleRate;
}

That amplitude is then split into 2 bytes and stored in the MemoryStream, low byte first. For stereo, each sample requires two 16-bit values, alternating between left and right.

Other calculations are possible. You can switch from a sine wave to a sawtooth wave by defining amplitude like this:

short amplitude = (short)(short.MaxValue * angle / Math.PI + short.MinValue);

In both cases, a variable named “angle” ranges from 0 to 2π radians (or 360 degrees), so it represents a single cycle of a particular waveform. After each sample, angle is increased by angleIncrement, a variable calculated earlier in the class from the sample rate and the frequency of the waveform to be generated, which here is hardcoded as 440 Hz:

angleIncrement = 2 * Math.PI * 440 / sampleRate;

Notice that as the frequency of the generated waveform approaches half the sample rate, angleIncrement approaches π, or 180 degrees. At half the sample rate, the generated waveform is based on just two samples per cycle, and the result is a square wave rather than a smooth sine curve. However, all the harmonics that distinguish this square wave from a sine wave are above half the sample rate, so they’re removed when the stream is converted back to analog.

Also notice that you can’t generate frequencies greater than half the sample rate. If you try, you’ll actually generate “aliases” that are below half the sample rate; attempting a 30,000 Hz tone at a 44,100 Hz sample rate, for example, actually produces a tone at 44,100 − 30,000, or 14,100 Hz. Half the sample rate is known as the Nyquist frequency, named after Harry Nyquist, an engineer who worked for AT&T when he published an early paper on information theory in 1928 that laid the foundations for audio sampling technology.

For CD audio a sample rate of 44,100 Hz was chosen partially because half of 44,100 is greater than the upper limit of human hearing, commonly regarded as 20,000 Hz.

Aside from the Sine440AudioStreamSource class, the remainder of the SimpleAudioStreaming project is fairly easy: The MainPage.xaml file contains a MediaElement named mediaElement, and the MainPage OnNavigatedTo override calls SetSource on this object with an instance of the MediaStreamSource derivative:

mediaElement.SetSource(new Sine440AudioStreamSource(44100));

I originally had this call in the program’s constructor, but I discovered that the music playback couldn’t be resumed if you navigated away from the program and then back without the program being tombstoned.

The SetSource method of MediaElement is the same method you call if you want MediaElement to play a music file referenced with a Stream object. Calling SetSource would normally start the sound playing, but this particular MediaElement has its AutoPlay property set to false, so a call to Play is also necessary. The program includes Play and Pause buttons in its application bar that perform these operations.
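
The handlers amount to a single method call each. Hypothetically (the handler names here are mine, not necessarily the project’s):

void OnAppbarPlayClick(object sender, EventArgs args)
{
  mediaElement.Play();
}
void OnAppbarPauseClick(object sender, EventArgs args)
{
  mediaElement.Pause();
}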

While the program is running, you can control the volume from the phone’s Universal Volume Control (UVC), but if you terminate the program or navigate away from it, the sound stops.

Moving to the Background

It’s also possible to use this same MediaStreamSource derivative to play sounds or music in the background. Background music continues to play on the phone if you navigate away from the program or even terminate it. For background audio, you can use the phone’s UVC not only to control the volume but also to pause and restart the audio and (if applicable) jump ahead or back to other tracks.

You’ll be happy to discover that much of what you learned in last month’s installment of this column (msdn.microsoft.com/magazine/hh781030) continues to apply to background audio streaming. In that column I described how to play music files in the background: You create a library project that contains a class that derives from AudioPlayerAgent. Visual Studio will generate this class for you if you add a new project of type Windows Phone Audio Playback Agent.

The application must have a reference to the DLL containing the AudioPlayerAgent but doesn’t access this class directly. Instead, the application accesses the BackgroundAudioPlayer class to set an initial AudioTrack object and to call Play and Pause. You’ll recall that the AudioTrack class has a constructor that lets you specify a track title, an artist and an album name, as well as specify which buttons should be enabled on the UVC.

Typically the first argument to the AudioTrack constructor is a Uri object that indicates the source of the music file you wish to play. If you want this AudioTrack instance to play streaming audio rather than a music file, you’ll set this first constructor argument to null. In that case BackgroundAudioPlayer will look for a class that derives from AudioStreamingAgent in a DLL referenced by the program. You can create such a DLL in Visual Studio by adding a project of type Windows Phone Audio Streaming Agent.
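
Here’s a sketch of how the application side might start streaming playback; the title and artist strings are placeholders of my own:

// A null source Uri tells BackgroundAudioPlayer to hand playback
// over to the AudioStreamingAgent derivative in the referenced DLL
AudioTrack track = new AudioTrack(
  null,                               // source: null means streaming
  "Sine Wave at 440 Hz",              // title (placeholder)
  "SimpleBackgroundAudioStreaming",   // artist (placeholder)
  null,                               // album
  null);                              // album art
BackgroundAudioPlayer.Instance.Track = track;
BackgroundAudioPlayer.Instance.Play();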

The SimpleBackgroundAudioStreaming solution in the downloadable code for this article shows how this is done. The solution contains the application project and two library projects, one named AudioPlaybackAgent containing an AudioPlayer class that derives from AudioPlayerAgent, and the other named AudioStreamAgent containing an AudioTrackStreamer class that derives from AudioStreamingAgent. The application project contains references to both these library projects, but it does not attempt to access the actual classes. For reasons I discussed in my previous column, doing so is futile.

Let me emphasize again that the application must contain references to the background agent DLLs. It’s easy to omit these references because you won’t see any errors, but the music won’t play.

Much of the logic in SimpleBackgroundAudioStreaming is the same as for a program that plays music files in the background, except that whenever BackgroundAudioPlayer tries to play an AudioTrack with a null Uri object, the OnBeginStreaming method in the AudioStreamingAgent derivative will be called. Here’s the exceptionally simple way in which I handle that call:

protected override void OnBeginStreaming(AudioTrack track, AudioStreamer streamer)
{
  streamer.SetSource(new Sine440AudioStreamSource(44100));
}

That’s it! It’s the same Sine440AudioStreamSource class I described earlier, but now included in the AudioStreamAgent project.

Although the SimpleBackgroundAudioStreaming program creates only one track, you can have multiple track objects, and you can mix AudioTrack objects that reference music files and those that use streaming. Notice that the AudioTrack object is an argument to the OnBeginStreaming override, so you can use that information to customize the particular MediaStreamSource you want to use for that track. To let you provide more information, AudioTrack has a Tag property that you can set to any string you want.
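
For instance, a hypothetical OnBeginStreaming could switch on the Tag to pick a stream source. The tag string and the Sawtooth440AudioStreamSource class here are my inventions:

protected override void OnBeginStreaming(AudioTrack track, AudioStreamer streamer)
{
  // Choose a MediaStreamSource based on the track's Tag property
  if (track.Tag == "sawtooth")
    streamer.SetSource(new Sawtooth440AudioStreamSource(44100));  // hypothetical
  else
    streamer.SetSource(new Sine440AudioStreamSource(44100));
}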

Building a Synthesizer

A steady sine curve at 440 Hz can be pretty boring, so let’s build an electronic music synthesizer. I’ve put together a very rudimentary synthesizer consisting of 12 classes and two interfaces in the Petzold.MusicSynthesis library project in the SynthesizerDemos solution. (Some of these classes are similar to code I wrote for Silverlight 3 that appeared on my blog in July 2009, accessible from charlespetzold.com.)

At the center of this synthesizer is a MediaStreamSource derivative named DynamicPcmStreamSource that has a property named SampleProvider, defined like so:

public IStereoSampleProvider SampleProvider { get; set; }

The IStereoSampleProvider interface is simple:

public interface IStereoSampleProvider
{
  AudioSample GetNextSample();
}

AudioSample has two public fields of type short named Left and Right. In its GetSampleAsync method, DynamicPcmStreamSource calls the GetNextSample method to obtain a pair of 16-bit samples:

AudioSample audioSample = SampleProvider.GetNextSample();
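
The rest of the method presumably mirrors Figure 1, except that each sample contributes two 16-bit values to the MemoryStream. Here’s a sketch of that inner loop, assuming the same BufferSamples field as before:

for (int sample = 0; sample < BufferSamples; sample++)
{
  AudioSample audioSample = SampleProvider.GetNextSample();
  // Left channel first, each channel low byte first
  memoryStream.WriteByte((byte)(audioSample.Left & 0xFF));
  memoryStream.WriteByte((byte)(audioSample.Left >> 8));
  memoryStream.WriteByte((byte)(audioSample.Right & 0xFF));
  memoryStream.WriteByte((byte)(audioSample.Right >> 8));
}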

One class that implements IStereoSampleProvider is named Mixer. Mixer has an Inputs property that’s a collection of objects of type MixerInput. Each MixerInput has an Input property of type IMonoSampleProvider, defined like so:

public interface IMonoSampleProvider
{
  short GetNextSample();
}
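
Given these two interfaces, the heart of Mixer is presumably a loop that pulls one sample from each MixerInput, pans it according to that input’s Space property and clamps the sum. What follows is my reconstruction of that logic, not the library’s actual code; in particular, the panning formula and the assumption that Space runs from -1 (full left) to 1 (full right) are mine:

public AudioSample GetNextSample()
{
  int left = 0;
  int right = 0;
  foreach (MixerInput input in Inputs)
  {
    short sample = input.Input.GetNextSample();
    left += (int)(sample * (1 - input.Space) / 2);
    right += (int)(sample * (1 + input.Space) / 2);
  }
  // Clamp the sums to the range of a short
  AudioSample audioSample = new AudioSample();
  audioSample.Left = (short)Math.Max(short.MinValue, Math.Min(short.MaxValue, left));
  audioSample.Right = (short)Math.Max(short.MinValue, Math.Min(short.MaxValue, right));
  return audioSample;
}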

One class that implements IMonoSampleProvider is named SteadyNoteDurationPlayer, an abstract class that can play a series of notes of the same duration at a particular tempo. It has a property named Oscillator that also implements IMonoSampleProvider to generate the actual waveforms. Two classes derive from SteadyNoteDurationPlayer: Sequencer, which plays a series of notes repetitively, and Rambler, which plays a random stream of notes. I use these two classes in two different applications in the SynthesizerDemos solution, one that runs only in the foreground and another that plays music in the background.
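
The oscillators that plug into that property come down to the same math shown earlier. Here’s a hypothetical minimal one; the library’s actual oscillators derive from an Oscillator base class, but the essential shape is this:

// A sketch of a fixed-frequency sine oscillator (not a library class)
public class SineOscillator : IMonoSampleProvider
{
  double angle;
  readonly double angleIncrement;

  public SineOscillator(int sampleRate, double frequency)
  {
    angleIncrement = 2 * Math.PI * frequency / sampleRate;
  }

  public short GetNextSample()
  {
    short amplitude = (short)(short.MaxValue * Math.Sin(angle));
    angle = (angle + angleIncrement) % (2 * Math.PI);
    return amplitude;
  }
}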

The foreground-only application is called WaveformManipulator, and it features a control that lets you interactively define a waveform used for playing back the music, as shown in Figure 2.

Figure 2 The WaveformManipulator Program

The points defining the waveform are transferred to an Oscillator derivative named VariableWaveformOscillator. As you move the round touch-points up and down, you’ll notice about a one-second delay before you actually hear a change in the music’s timbre. This is a result of the default 1,000 ms buffer size defined by MediaStreamSource.

The WaveformManipulator program uses two Sequencer objects loaded with the same series of notes, which are E minor, A minor, D minor and G major arpeggios. The tempi for the two Sequencer objects are slightly different, however, so they drift out of synchronization, first with a type of reverb or echo effect, and then more like counterpoint. (This is a form of “process music” inspired by the early work of American composer Steve Reich.) The initialization code from MainPage.xaml.cs that “wires up” the synthesizer components is shown in Figure 3.

Figure 3 Synthesizer Initialization Code in WaveformManipulator

// Initialize Waveformer control
for (int i = 0; i < waveformer.Points.Count; i++)
{
  double angle = (i + 1) * 2 * Math.PI / (waveformer.Points.Count + 1);
  waveformer.Points[i] = new Point(angle, Math.Sin(angle));
}
// Create two Sequencers with slightly different tempi
Sequencer sequencer1 = new Sequencer(SAMPLE_RATE)
{
  Oscillator = new VariableWaveformOscillator(SAMPLE_RATE)
  {
    Points = waveformer.Points
  },
  Tempo = 480
};
Sequencer sequencer2 = new Sequencer(SAMPLE_RATE)
{
  Oscillator = new VariableWaveformOscillator(SAMPLE_RATE)
  {
    Points = waveformer.Points
  },
  Tempo = 470
};
// Set the same Pitch objects in the Sequencer objects
Pitch[] pitches =
{
  ...
};
foreach (Pitch pitch in pitches)
{
  sequencer1.Pitches.Add(pitch);
  sequencer2.Pitches.Add(pitch);
}
// Create Mixer and MixerInput objects
mixer = new Mixer();
mixer.Inputs.Add(new MixerInput(sequencer1) { Space = -0.5 });
mixer.Inputs.Add(new MixerInput(sequencer2) { Space = 0.5 });

The Mixer, DynamicPcmStreamSource and MediaElement objects are connected together in the OnNavigatedTo override:

DynamicPcmStreamSource dynamicPcmStreamSource =
  new DynamicPcmStreamSource(SAMPLE_RATE);
dynamicPcmStreamSource.SampleProvider = mixer;
mediaElement.SetSource(dynamicPcmStreamSource);

Because WaveformManipulator uses MediaElement for playing the music, it only plays when the program is running in the foreground. 

Background Limitations

I gave a lot of thought to making a version of WaveformManipulator that played the music in the background using BackgroundAudioPlayer. Obviously you would only be able to manipulate the waveform when the program was in the foreground, but I couldn’t get past an obstacle that I discussed in last month’s column: The background agent DLLs your program supplies to handle background processing run in a different task than the program itself, and the only way I can see that these two tasks can exchange arbitrary data is through isolated storage.

I decided not to pursue this job, partially because I had a better idea for a program that played streaming audio in the background. This was a program that would play a random tune, but with notes that would change slightly when the Accelerometer registered a change in phone orientation. Shaking the phone would create a new tune entirely.

This project got as far as my attempt to add a reference to the Microsoft.Devices.Sensors assembly to the AudioStreamAgent project. That move invoked a message box with a red X and the message, “An attempt has been made to add a reference unsupported by a background agent.” Apparently background agents can’t use the accelerometer. So much for that program idea!

Instead, I wrote a program called PentatonicRambler, which uses background streaming to play a never-ending melody in a pentatonic scale consisting of only the five black notes of the piano. The notes are randomly chosen by a synthesizer component called Rambler, which restricts each successive note to either one step up or one step down from the previous note. The lack of large jumps makes the resultant stream of notes sound more like a composed (or improvised) melody rather than a purely random one.
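
Purely as an illustration of the one-step constraint (this is not the library’s actual code, and the random, noteIndex and notes members are assumptions), the note selection could be a clamped random walk over the allowed notes:

// Move one step up or down the list of allowed notes
int step = random.Next(2) == 0 ? -1 : 1;
noteIndex = Math.Max(0, Math.Min(notes.Count - 1, noteIndex + step));
Note nextNote = notes[noteIndex];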

Figure 4 shows the OnBeginStreaming override in the AudioStreamingAgent derivative.

Figure 4 The Synthesizer Setup for PentatonicRambler

protected override void OnBeginStreaming(AudioTrack track, AudioStreamer streamer)
{
  // Create a Rambler
  Rambler rambler = new Rambler(SAMPLE_RATE,
    new Pitch(Note.Csharp, 4),  // Start
    new Pitch(Note.Csharp, 2),  // Minimum
    new Pitch(Note.Csharp, 6))  // Maximum
  {
    Oscillator = new AlmostSquareWave(SAMPLE_RATE),
    Tempo = 480
  };
  // Set allowable note values
  rambler.Notes.Add(Note.Csharp);
  rambler.Notes.Add(Note.Dsharp);
  rambler.Notes.Add(Note.Fsharp);
  rambler.Notes.Add(Note.Gsharp);
  rambler.Notes.Add(Note.Asharp);
  // Create Mixer and MixerInput objects
  Mixer mixer = new Mixer();
  mixer.Inputs.Add(new MixerInput(rambler));
  DynamicPcmStreamSource audioStreamSource =
    new DynamicPcmStreamSource(SAMPLE_RATE);
  audioStreamSource.SampleProvider = mixer;
  streamer.SetSource(audioStreamSource);
}

I would have preferred to define the assemblage of the synthesizer components in the program itself, and then transfer this setup to the background agent, but given the process isolation between the program task and the background agent tasks, that would require a bit of work. The synthesizer setup would have to be defined entirely in a text string (perhaps XML-based) and then passed from the program to the AudioStreamingAgent derivative through the Tag property of AudioTrack.

Meanwhile, my wish list for future enhancements to Windows Phone includes a facility that lets programs communicate with the background agents they invoke.


Charles Petzold is a longtime contributor to MSDN Magazine. His Web site is charlespetzold.com.

Thanks to the following technical experts for reviewing this article: Eric Bie, Mark Hopkins and Chris Pearson