December 2011

Volume 26 Number 12

UI Frontiers - Video Feeds on Windows Phone 7

By Charles Petzold | December 2011

The modern smartphone is packed with electronic sensory organs through which it can obtain information about the outside world. These include the cell phone radio itself, Wi-Fi, GPS, a touchscreen, motion detectors and more.

To the application programmer, these sensory organs are only as useful as the APIs associated with them. If the APIs are deficient, the hardware features become much less valuable or even worthless.

From the perspective of an application programmer, one of the features missing from the initial release of Windows Phone was a good eyeball. Although a camera has always been part of the Windows Phone hardware, the only API available in the original release was CameraCaptureTask. This class essentially spawns a child process that lets the user take a photo, and then returns that picture to the application. That’s it. The application can’t control any part of the process, nor can it obtain the live video feed coming through the lens.

That deficiency has now been corrected with two sets of programming interfaces.

One set of APIs concerns the Camera and PhotoCamera classes. These classes allow an application to assemble an entire photo-taking UI, including flash options; live preview video feed; shutter key presses and half-presses; and focus detection. I hope to discuss this interface in a future column.

The APIs I’ll be discussing in this column were inherited from the Silverlight 4 webcam interface. They let an application obtain live video and audio feeds from the phone’s camera and microphone. These feeds can be presented to the user, saved to a file or—and here it gets more interesting—manipulated or interpreted in some way.

Devices and Sources

The webcam interface of Silverlight 4 has been enhanced just a little for Windows Phone 7, and consists of about a dozen classes defined in the System.Windows.Media namespace. You’ll always begin with the static CaptureDeviceConfiguration class. If the phone supports multiple cameras or microphones, these are available from the GetAvailableVideoCaptureDevices and GetAvailableAudioCaptureDevices methods. You might want to present these to the user in a list for selection. Alternatively, you can simply call the GetDefaultVideoCaptureDevice and GetDefaultAudioCaptureDevice methods.

The documentation mentions that these methods might return null, probably indicating that the phone doesn’t contain a camera. This is unlikely, but it’s a good idea to check for null anyway.

These CaptureDeviceConfiguration methods return instances of VideoCaptureDevice and AudioCaptureDevice or collections of instances of these two classes. These classes provide a friendly name for the device, a SupportedFormats collection and a DesiredFormat property. For video, the formats involve the pixel dimensions of each frame of video, the color format and frames per second. For audio, the format specifies the number of channels, the bits per sample and the wave format, which is always Pulse Code Modulation (PCM).
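Here’s a minimal sketch of that enumeration (the PickCamera helper name is mine, purely for illustration): list each camera’s friendly name, fall back to the default device and dump the formats it supports:

VideoCaptureDevice PickCamera()
{
  // Every camera the phone reports; suitable for presenting in a selection list
  foreach (VideoCaptureDevice device in
    CaptureDeviceConfiguration.GetAvailableVideoCaptureDevices())
  {
    System.Diagnostics.Debug.WriteLine(device.FriendlyName);
  }
  // Or simply take the default device; check for null as noted above
  VideoCaptureDevice camera =
    CaptureDeviceConfiguration.GetDefaultVideoCaptureDevice();
  if (camera != null)
  {
    foreach (VideoFormat format in camera.SupportedFormats)
    {
      // Frame size, pixel format and frame rate of each supported format
      System.Diagnostics.Debug.WriteLine(String.Format("{0}x{1} {2} @ {3} fps",
        format.PixelWidth, format.PixelHeight,
        format.PixelFormat, format.FramesPerSecond));
    }
  }
  return camera;
}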

A Silverlight 4 application must call the CaptureDeviceConfiguration.RequestDeviceAccess method to obtain permission from the user to access the webcam. This call must be in response to user input, such as a button click. If the CaptureDeviceConfiguration.AllowedDeviceAccess property is true, however, then the user has already given permission for this access and the program needn’t call RequestDeviceAccess again.

Obviously, the RequestDeviceAccess method serves to protect the privacy of the user. But the Web-based Silverlight and Silverlight for Windows Phone 7 seem to be a little different in this respect. The idea of a Web site surreptitiously accessing your webcam is decidedly creepy, but much less so for a phone program. It’s my experience that for a Windows Phone application, AllowedDeviceAccess always returns true. Nevertheless, in all the programs described in this column, I’ve defined a UI to call RequestDeviceAccess.

The application must also create a CaptureSource object, which combines a video device and an audio device into a single stream of live video and audio. CaptureSource has two properties, named VideoCaptureDevice and AudioCaptureDevice, that you set to instances of VideoCaptureDevice and AudioCaptureDevice obtained from CaptureDeviceConfiguration. You needn’t set both properties if you’re interested in only video or only audio. In the sample programs in this column, I’ve focused entirely on video.

After creating a CaptureSource object, you can call the object’s Start and Stop methods. In a program dedicated to obtaining video or audio feeds, you’ll probably want to call Start in the OnNavigatedTo override and Stop in the OnNavigatedFrom override.

In addition, you can use the CaptureImageAsync method of CaptureSource to obtain individual video frames in the form of WriteableBitmap objects. I won’t be demonstrating that feature in the sample programs, but the basic pattern is simple.
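Here’s a sketch (not taken from the downloadable code) that hooks the CaptureImageCompleted event and requests a frame; the snapshot arrives as a WriteableBitmap:

void CaptureSnapshot(CaptureSource captureSource, Image targetImage)
{
  // CaptureImageCompleted fires with the captured frame
  captureSource.CaptureImageCompleted += (sender, args) =>
  {
    targetImage.Source = args.Result;  // args.Result is a WriteableBitmap
  };
  // The CaptureSource must already be started for the capture to succeed
  captureSource.CaptureImageAsync();
}

In a real program you’d hook the event just once rather than on every call.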

Once you have a CaptureSource object, you can go in one of two directions: You can create a VideoBrush to display the live video feed, or you can connect CaptureSource to a “sink” object to get access to raw data or to save to a file in isolated storage.

The VideoBrush

Definitely the easiest CaptureSource option is the VideoBrush. Silverlight 3 introduced the VideoBrush with a MediaElement source, and Silverlight 4 added the CaptureSource alternative for VideoBrush. As with any brush, you can use it to color element backgrounds or foregrounds.

In the downloadable code for this column is a program called StraightVideo that uses VideoCaptureDevice, CaptureSource and VideoBrush to display the live video feed coming through the default camera lens. Figure 1 shows a good chunk of the MainPage.xaml file. Notice the use of landscape mode (which you’ll want for video feeds), the definition of the VideoBrush on the Background property of the content Grid and the Button for obtaining user permission to access the camera.

Figure 1 The MainPage.xaml File from StraightVideo

<phone:PhoneApplicationPage
  x:Class="StraightVideo.MainPage"
  xmlns="https://schemas.microsoft.com/winfx/2006/xaml/presentation"
  xmlns:x="https://schemas.microsoft.com/winfx/2006/xaml"
  xmlns:phone="clr-namespace:Microsoft.Phone.Controls;assembly=Microsoft.Phone"
  xmlns:shell="clr-namespace:Microsoft.Phone.Shell;assembly=Microsoft.Phone"
  ...
  SupportedOrientations="Landscape" Orientation="LandscapeLeft"
  shell:SystemTray.IsVisible="True">
  <Grid x:Name="LayoutRoot" Background="Transparent">
    <Grid.RowDefinitions>
      <RowDefinition Height="Auto"/>
      <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <StackPanel x:Name="TitlePanel" Grid.Row="0" Margin="12,17,0,28">
      <TextBlock x:Name="ApplicationTitle" Text="STRAIGHT VIDEO"
        Style="{StaticResource PhoneTextNormalStyle}"/>
    </StackPanel>
    <Grid x:Name="ContentPanel" Grid.Row="1" Margin="12,0,12,0">
      <Grid.Background>
        <VideoBrush x:Name="videoBrush" />
      </Grid.Background>
      <Button Name="startButton"
        Content="start"
        HorizontalAlignment="Center"
        VerticalAlignment="Center"
        Click="OnStartButtonClick" />
    </Grid>
  </Grid>
</phone:PhoneApplicationPage>

Figure 2 shows much of the codebehind file. The CaptureSource object is created in the page’s constructor, but it’s started and stopped in the navigation overrides. I also found it necessary to call SetSource on the VideoBrush in OnNavigatedTo; otherwise the image was lost after a previous Stop call.

Figure 2 The MainPage.xaml.cs File from StraightVideo

public partial class MainPage : PhoneApplicationPage
{
  CaptureSource captureSource;
  public MainPage()
  {
    InitializeComponent();
    captureSource = new CaptureSource
    {
      VideoCaptureDevice =
        CaptureDeviceConfiguration.GetDefaultVideoCaptureDevice()
    };
  }
  protected override void OnNavigatedTo(NavigationEventArgs args)
  {
    if (captureSource != null && CaptureDeviceConfiguration.AllowedDeviceAccess)
    {
      videoBrush.SetSource(captureSource);
      captureSource.Start();
      startButton.Visibility = Visibility.Collapsed;
    }
    base.OnNavigatedTo(args);
  }
  protected override void OnNavigatedFrom(NavigationEventArgs args)
  {
    if (captureSource != null && captureSource.State == CaptureState.Started)
    {
      captureSource.Stop();
      startButton.Visibility = Visibility.Visible;
    }
    base.OnNavigatedFrom(args);
  }
  void OnStartButtonClick(object sender, RoutedEventArgs args)
  {
    if (captureSource != null &&
        (CaptureDeviceConfiguration.AllowedDeviceAccess ||
        CaptureDeviceConfiguration.RequestDeviceAccess()))
    {
      videoBrush.SetSource(captureSource);
      captureSource.Start();
      startButton.Visibility = Visibility.Collapsed;
    }
  }
}

You can run this program on the Windows Phone Emulator, but it’s much more interesting on a real device. You’ll notice that the rendering of the video feed is very responsive. Evidently the video feed is going directly to the video hardware. (More evidence for this supposition is that my customary method of obtaining screenshots from the phone by rendering the PhoneApplicationFrame object to a WriteableBitmap didn’t work with this program.) You’ll also notice that because the video is rendered via a brush, the brush is stretched to the dimensions of the content Grid and the image is distorted.

One of the nice features of brushes is that they can be shared among multiple elements. That’s the idea behind FlipXYVideo. This program dynamically creates a bunch of tiled Rectangle objects in a Grid. The same VideoBrush is used for each, except every other Rectangle is flipped vertically or horizontally or both, as shown in Figure 3. You can increase or decrease the number of rows and columns from ApplicationBar buttons.

Figure 3 Sharing VideoBrush Objects in FlipXYVideo

void CreateRowsAndColumns()
{
  videoPanel.Children.Clear();
  videoPanel.RowDefinitions.Clear();
  videoPanel.ColumnDefinitions.Clear();
  for (int row = 0; row < numRowsCols; row++)
    videoPanel.RowDefinitions.Add(new RowDefinition
    {
      Height = new GridLength(1, GridUnitType.Star)
    });
  for (int col = 0; col < numRowsCols; col++)
    videoPanel.ColumnDefinitions.Add(new ColumnDefinition
    {
      Width = new GridLength(1, GridUnitType.Star)
    });
  for (int row = 0; row < numRowsCols; row++)
    for (int col = 0; col < numRowsCols; col++)
    {
      Rectangle rect = new Rectangle
      {
        Fill = videoBrush,
        RenderTransformOrigin = new Point(0.5, 0.5),
        RenderTransform = new ScaleTransform
        {
          ScaleX = 1 - 2 * (col % 2),
          ScaleY = 1 - 2 * (row % 2),
        },
      };
      Grid.SetRow(rect, row);
      Grid.SetColumn(rect, col);
      videoPanel.Children.Add(rect);
    }
  fewerButton.IsEnabled = numRowsCols > 1;
}
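
Figure 3 doesn’t show the ApplicationBar button handlers that change numRowsCols; here’s a hedged sketch of what they might look like (the handler names are mine and may differ from the downloadable code):

void OnMoreAppBarButtonClick(object sender, EventArgs args)
{
  numRowsCols += 1;
  CreateRowsAndColumns();
}

void OnFewerAppBarButtonClick(object sender, EventArgs args)
{
  numRowsCols = Math.Max(1, numRowsCols - 1);
  CreateRowsAndColumns();
}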

This program is fun to play with, but not as much fun as the kaleidoscope program, which I’ll discuss shortly.

Source and Sink

The alternative to using a VideoBrush is connecting a CaptureSource object to an AudioSink, VideoSink or FileSink object. The use of the word “sink” in these class names is in the sense of “receptacle” and is similar to the word’s use in electronics or network theory. (Or think of a “heat source” and a “heat sink.”)

The FileSink class is the way to save video or audio streams to your application’s isolated storage without any intervention on your part. If you need access to the actual video or audio bits in real time, you’ll use VideoSink and AudioSink. These two classes are abstract. You derive a class from one or both of these abstract classes and override the OnCaptureStarted, OnCaptureStopped, OnFormatChange and OnSample methods.
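Using FileSink itself requires almost no code. Here’s a minimal sketch, assuming the FileSink API as documented for Windows Phone OS 7.1 (the file name is hypothetical); the sink must be attached while the CaptureSource is stopped:

FileSink fileSink = new FileSink();
fileSink.CaptureSource = captureSource;           // attach while capture is stopped
fileSink.IsolatedStorageFileName = "capture.mp4"; // hypothetical file name
captureSource.Start();
// ... everything captured between Start and Stop is written to isolated storage ...
captureSource.Stop();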

The class that you derive from VideoSink or AudioSink will always get a call to OnFormatChange before the first call to OnSample. The information supplied with OnFormatChange indicates how the sample data is to be interpreted. For both VideoSink and AudioSink, the OnSample call provides timing information and an array of bytes. For AudioSink, these bytes represent PCM data. For VideoSink, these bytes are rows and columns of pixels for each frame of video. This data is always raw and uncompressed.

Both OnFormatChange and OnSample are called in secondary threads of execution, so you’ll need to use a Dispatcher object for tasks within these methods that must be performed in the UI thread.

The StraightVideoSink program is similar to StraightVideo except the video data comes through a class derived from VideoSink. This derived class (shown in Figure 4) is named SimpleVideoSink because it merely takes the OnSample byte array and transfers it to a WriteableBitmap.

Figure 4 The SimpleVideoSink Class Used in StraightVideoSink

public class SimpleVideoSink : VideoSink
{
  VideoFormat videoFormat;
  WriteableBitmap writeableBitmap;
  Action<WriteableBitmap> action;
  public SimpleVideoSink(Action<WriteableBitmap> action)
  {
    this.action = action;
  }
  protected override void OnCaptureStarted() { }
  protected override void OnCaptureStopped() { }
  protected override void OnFormatChange(VideoFormat videoFormat)
  {
    this.videoFormat = videoFormat;
    System.Windows.Deployment.Current.Dispatcher.BeginInvoke(() =>
    {
      writeableBitmap = new WriteableBitmap(videoFormat.PixelWidth,
        videoFormat.PixelHeight);
      action(writeableBitmap);
    });
  }
  protected override void OnSample(long sampleTimeInHundredNanoseconds,
    long frameDurationInHundredNanoseconds, byte[] sampleData)
  {
    if (writeableBitmap == null)
      return;
    int baseIndex = 0;
    for (int row = 0; row < writeableBitmap.PixelHeight; row++)
    {
      for (int col = 0; col < writeableBitmap.PixelWidth; col++)
      {
        int pixel = 0;
        if (videoFormat.PixelFormat == PixelFormatType.Format8bppGrayscale)
        {
          byte grayShade = sampleData[baseIndex + col];
          pixel = (int)grayShade | (grayShade << 8) |
            (grayShade << 16) | (0xFF << 24);
        }
        else
        {
          int index = baseIndex + 4 * col;
          pixel = (int)sampleData[index + 0] | (sampleData[index + 1] << 8) |
            (sampleData[index + 2] << 16) | (sampleData[index + 3] << 24);
        }
        writeableBitmap.Pixels[row * writeableBitmap.PixelWidth + col] = pixel;
      }
      baseIndex += videoFormat.Stride;
    }
    writeableBitmap.Dispatcher.BeginInvoke(() =>
      {
        writeableBitmap.Invalidate();
      });
  }
}

MainPage uses that WriteableBitmap with an Image element to display the resultant video feed. (Alternatively, it could create an ImageBrush and set that to the background or foreground of an element.)

Here’s the catch: That WriteableBitmap can’t be created until the OnFormatChange method is called in the VideoSink derivative, because that call indicates the size of the video frame. (It’s usually 640x480 pixels on my phone but conceivably might be something else.) Although the VideoSink derivative creates the WriteableBitmap, MainPage needs to access it. That’s why I defined a constructor for SimpleVideoSink that contains an Action argument to call when the WriteableBitmap is created.
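Here’s a hedged sketch of how MainPage might wire this up (StraightVideoSink’s MainPage isn’t reproduced in this column, so the field and element names are illustrative):

public MainPage()
{
  InitializeComponent();
  captureSource = new CaptureSource
  {
    VideoCaptureDevice =
      CaptureDeviceConfiguration.GetDefaultVideoCaptureDevice()
  };
  // The Action is invoked (on the UI thread) once the sink has created the bitmap
  videoSink = new SimpleVideoSink(writeableBitmap =>
  {
    image.Source = writeableBitmap;  // "image" is an Image element in MainPage.xaml
  });
  videoSink.CaptureSource = captureSource;
}

Starting and stopping the CaptureSource would follow the same navigation-override pattern as Figure 2.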

Notice that the WriteableBitmap must be created in the program’s UI thread, so SimpleVideoSink uses a Dispatcher object to queue up the creation for the UI thread. This means that the WriteableBitmap might not be created before the first OnSample call. Watch out for that! Although the OnSample method can access the Pixels array of the WriteableBitmap in a secondary thread, the call to Invalidate the bitmap must occur in the UI thread because that call ultimately affects the display of the bitmap by the Image element.

The MainPage class of StraightVideoSink includes an ApplicationBar button to toggle between color and gray-shaded video feeds. These are the only two options, and you can switch from one to the other by setting the DesiredFormat property of the VideoCaptureDevice object. A color feed has 4 bytes per pixel in the order blue, green, red and alpha (which will always be 255). A gray-shade feed has only 1 byte per pixel. In either case, a WriteableBitmap always has 4 bytes per pixel, where every pixel is represented by a 32-bit integer with the highest 8 bits for the alpha channel, followed by red, green and blue. The CaptureSource object must be stopped and restarted when switching formats.
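Here’s a sketch of what that toggle might look like (the actual StraightVideoSink handler may differ); the point is to stop the capture, pick the matching entry from SupportedFormats and restart:

void ToggleFormat()
{
  VideoCaptureDevice device = captureSource.VideoCaptureDevice;
  bool wantGray = device.DesiredFormat == null ||
    device.DesiredFormat.PixelFormat != PixelFormatType.Format8bppGrayscale;
  captureSource.Stop();
  foreach (VideoFormat format in device.SupportedFormats)
  {
    if (format.PixelFormat == (wantGray ? PixelFormatType.Format8bppGrayscale :
                                          PixelFormatType.Format32bppArgb))
    {
      device.DesiredFormat = format;
      break;
    }
  }
  captureSource.Start();
}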

Although StraightVideo and StraightVideoSink both display the live video feed, you’ll probably notice that StraightVideoSink is considerably more sluggish as a result of the work the program is doing to transfer the video frame to the WriteableBitmap.

Making a Kaleidoscope

If you just need a real-time video feed with occasional frame captures, you can use the CaptureImageAsync method of CaptureSource. Because of the performance overhead, you’ll probably restrict the use of VideoSink to more specialized applications involving the manipulation of the pixel bits.

Let’s write such a “specialized” program that arranges the video feed into a kaleidoscopic pattern. Conceptually, this is fairly straightforward: The VideoSink derivative gets a video feed in which each frame is probably 640x480 pixels (although it might be something else). You want to reference an equilateral triangle of image data from that frame, as shown in Figure 5.

I decided on a triangle with its base on the top and its apex at the bottom to better capture faces.

Figure 5 The Source Triangle for a Kaleidoscope

The image in that triangle is then duplicated on a WriteableBitmap multiple times with some rotation and flipping so the images are tiled and grouped into hexagons without any discontinuities, as shown in Figure 6. I know the hexagons look like pretty flowers, but they’re really just many images of my face (perhaps too many images of my face).

Figure 6 The Destination Bitmap for a Kaleidoscope

The pattern of repetition becomes more apparent when the individual triangles are delimited, as shown in Figure 7. When rendered on the phone, the height of the target WriteableBitmap will be the same as the phone’s smaller dimension, or 480 pixels. Each equilateral triangle thus has a side of 120 pixels. This means that the height of the triangle is 120 times the square root of 0.75, or about 104 pixels. In the program, I use 104 for the math but 105 for sizing the bitmap to make the loops simpler. The entire resultant image is 630 pixels wide.
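Restating that arithmetic compactly (nothing new, just the numbers above gathered in one place):

h = 120 \cdot \frac{\sqrt{3}}{2} = 120\sqrt{0.75} \approx 103.9 \approx 104 \ \text{pixels}, \qquad \text{total width} = 6 \times 105 = 630 \ \text{pixels}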

Figure 7 The Destination Bitmap Showing the Source Triangles

I found it most convenient to treat the total image as three identical vertical bands 210 pixels wide. Each of those vertical bands has reflection symmetry around the vertical midpoint, so I reduced the image to a single 105x480-pixel bitmap repeated six times, half of those with reflection. That band consists of just seven full triangles and two partial triangles.

Even so, I was quite nervous about the calculations necessary to assemble this image. Then I realized that these calculations wouldn’t have to be performed at the video refresh rate of 30 times per second. They need only be performed once, when the size of the video image becomes available in the OnFormatChange override.

The resultant program is called Kaleidovideo. (That name would be considered an etymological abomination by traditionalists because “kaleido” comes from a Greek root meaning “beautiful form,” but “video” has a Latin root, and when coining new words you’re not supposed to mix the two.)

The KaleidoscopeVideoSink class overrides VideoSink. The OnFormatChange method is responsible for computing the members of an array called indexMap. That array has as many members as the number of pixels in the WriteableBitmap (105 multiplied by 480, or 50,400) and stores an index into the rectangular area of the video image. Using this indexMap array, the transfer of pixels from the video image to the WriteableBitmap in the OnSample method is pretty much as fast as conceivably possible.
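The Kaleidovideo source isn’t reproduced here, but a hedged sketch conveys the idea, assuming fields like those in Figure 4 plus an int[] indexMap field whose entries are byte offsets into a color (4-byte-per-pixel) frame:

protected override void OnSample(long sampleTimeInHundredNanoseconds,
  long frameDurationInHundredNanoseconds, byte[] sampleData)
{
  if (writeableBitmap == null || indexMap == null)
    return;
  int[] pixels = writeableBitmap.Pixels;
  for (int i = 0; i < indexMap.Length; i++)
  {
    // indexMap[i] locates the source pixel for destination pixel i
    int index = indexMap[i];
    pixels[i] = sampleData[index + 0] | (sampleData[index + 1] << 8) |
      (sampleData[index + 2] << 16) | (sampleData[index + 3] << 24);
  }
  writeableBitmap.Dispatcher.BeginInvoke(() => writeableBitmap.Invalidate());
}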

This program is lots of fun. Everything in the world looks rather more beautiful in Kaleidovideo. Figure 8 shows one of my bookcases, for example. You can even watch TV through it.

Figure 8 A Typical Kaleidovideo Screen

Just in case you see something on the screen you’d like to save, I added a “capture” button to the ApplicationBar. You’ll find the images in the Saved Pictures folder of the phone’s photo library.           


Charles Petzold is a longtime contributing editor to MSDN Magazine. His recent book, Programming Windows Phone 7 (Microsoft Press, 2010), is available as a free download at bit.ly/cpebookpdf.

Thanks to the following technical experts for reviewing this article: Mark Hopkins and Matt Stroshane