June 2012

Volume 27 Number 06

Kinect - Starting to Develop with Kinect

By Leland Holmquest | June 2012

Welcome to the inaugural article devoted to developing Kinect for Windows applications. In the April and May issues of MSDN Magazine, I introduced you to Lily, my virtual assistant (https://msdn.microsoft.com/en-us/magazine/hh882450.aspx and https://msdn.microsoft.com/en-us/magazine/hh975374.aspx). In those articles, I demonstrated how to use some of the capabilities described in the Kinect for Windows SDK (Beta 2) to create a virtual assistant that uses multimodal communication. To determine the action a virtual assistant like Lily executes, the user points to an option while speaking a command. The combination of the two modes of communication—the gesture and the audio command—determines the action.

In this article, I’ll start with the basics and run through a few how-to’s for starting to develop with Kinect. First, however, I want to offer a note of encouragement: If you assume that programming the Kinect and incorporating a natural user interface into your applications is beyond your capability, think again. You’ll soon find out how to use the skeleton-tracking capability of the Kinect in a Windows Presentation Foundation (WPF) application without writing a single line of code! It doesn’t get any easier than that.

Getting the Kinect for Windows

Obviously, the first thing you need is the Kinect for Windows. The Beta 2 worked with Kinect for Xbox 360 (although depending on the unit, a power supply might be required). As of version 1, the SDK supports only the Kinect for Windows hardware. There are a couple of differences between the Kinect for Xbox 360 and the Kinect for Windows. First, the Kinect for Windows offers a Near mode that lets users be as close as 40 centimeters (15.75 inches) from the Kinect. Second, the Kinect for Windows is specifically built and tested with an improved USB interface for PCs. Microsoft also has dedicated a large team of engineers to continually improving the hardware and software associated with Kinect for Windows and is committed to providing ongoing access to its deep investment in human tracking and speech recognition. To find out where to purchase a Kinect for Windows (hereafter referred to simply as Kinect), go to microsoft.com/en-us/kinectforwindows/purchase/ and click Learn More.

Next you need to download the SDK. The v1.5 version was released in late May 2012. This update offers some amazing new features. Check out some of the big-ticket items:

  • Kinect can track faces and facial features, making it possible to apply a 3D mesh to the user’s face in real time.
  • In previous versions, Kinect had to see the entire skeleton prior to actively tracking it. In v1.5, a seated skeleton mode provides the 10 joints that represent the upper body. This feature is useful when the user is sitting at a desk or when objects are blocking the lower half of the body. (Lily badly needs this improvement!)
  • Skeleton tracking has been improved. As I mentioned earlier, Kinect can now use Near mode to track users as close as 40 cm, and skeleton-tracking performance has been enhanced as well. (A short code sketch for switching on seated tracking and Near mode follows this list.)
  • This SDK provides components, libraries, tools, samples and the Kinect Studio (see next entry), making it much easier to learn to build Kinect-enabled applications.
  • Kinect Studio works in conjunction with any Kinect app being developed. If you have already done some Kinect for Windows developing, you’ll really value this new feature. Just start Kinect Studio and begin running the Kinect application. The user can then act out any desired scenario. Kinect Studio captures the events and data. Once complete, Kinect Studio enables playing back the data on subsequent runs, making it much easier to run multiple tests without having to provide the physical actor.
  • Kinect has broken the language barrier. It can now understand English, French, Spanish, Italian and Japanese, as well as distinguish some regional differences, such as English/Great Britain, English/Australia, French/France and French/Canada.
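As promised, here’s a minimal sketch of those two switches. It assumes a KinectSensor instance named sensor, obtained as shown later in this article (Near mode requires the Kinect for Windows hardware):

// Sketch: seated-skeleton mode (new in v1.5) tracks the 10 upper-body joints
sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;

// Sketch: Near mode lets the depth sensor track users as close as 40 cm
sensor.DepthStream.Range = DepthRange.Near;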

Setting Up Your Kinect

Setting up the Kinect for Windows is simple. Start by installing the SDK you just downloaded so that the correct drivers are in place for the interface to work. After the SDK is deployed, connect the power supply provided with Kinect to the unit. Then connect the USB connector to a USB port on the computer. Once connected, if all is well, a green LED light should show on the front of the Kinect.
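If the LED leaves you in doubt, you can also confirm from code that the runtime sees the sensor (jumping ahead slightly to the SDK’s KinectSensor class). A quick sketch:

// Sketch: verify that at least one Kinect is connected and report its status
if (KinectSensor.KinectSensors.Count == 0)
{
  MessageBox.Show("No Kinect detected; check the power supply and USB connection.");
}
else
{
  // KinectStatus.Connected means the sensor is ready for use
  MessageBox.Show("Sensor status: " + KinectSensor.KinectSensors[0].Status);
}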

Beginning to Program with Kinect

Now I’m going to take you through a quick, simple application to demonstrate how to program the Kinect. You can start from scratch by creating a WPF application or whatever other project type you want, but that will require you to spend time coding what is essentially boilerplate. Another alternative is to use KinectContrib, which you can find at CodePlex (kinectcontrib.codeplex.com/). Installing KinectContrib provides Visual Studio 2010 with a few new project templates specifically for developing Kinect solutions for Windows. Figure 1 shows the project types available with KinectContrib.

Figure 1 Templates for Kinect development

For the purpose of this article, I have selected the KinectSkeletonApplication. This project template creates a WPF application that has all the basic components prebuilt to make an application that uses the Kinect. The application generates three ellipses, representing the head, the right hand and the left hand. Extending this model to the other joints that make up the skeleton is simple. Here are the 20 available joints:

  • Right and left ankles
  • Right and left elbows
  • Right and left feet
  • Right and left hands
  • Head
  • Hip center
  • Right and left hips
  • Right and left knees
  • Right and left shoulders
  • Shoulder center
  • Spine
  • Right and left wrists

This collection of joints gives you the ability to accurately render the skeleton of a human user. Figure 2 shows the various joints relative to the human body.

Figure 2 Collection of joints
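Each joint in Figure 2 corresponds to a member of the SDK’s JointType enumeration (Head, HandLeft, HipCenter, WristRight and so on), so extending the sample mostly means looking up more joints. As a sketch, assuming a tracked Skeleton instance like the currentSkeleton variable you’ll meet in Figure 5, you can walk them all:

// Sketch: iterate every joint of a tracked skeleton rather than hard-coding three
foreach (Joint joint in currentSkeleton.Joints)
{
  if (joint.TrackingState == JointTrackingState.Tracked)
  {
    // joint.JointType identifies the joint; joint.Position carries its
    // X, Y and Z coordinates in skeleton space
  }
}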

The code for this sample is relatively straightforward. I’ll cover only a couple of the key concepts you need to understand to get a skeleton-tracking project underway. The first is getting a reference to the Kinect unit, which can be accomplished in a single line of code:

KinectSensor sensor = KinectSensor.KinectSensors[0];

Notice the indexer. If you’re coming from the Beta versions, this way of obtaining the sensor is one of the big differences. By calling on KinectSensors[0], the program gets a reference to the first available Kinect unit. More advanced applications can work with multiple sensors, and the SDK shows you how to iterate through all the available ones.
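Grabbing index zero assumes a sensor is plugged in and ready. A slightly more defensive pattern, along the lines of what the SDK samples do (a sketch, not the only way), is to take the first sensor whose status is Connected:

// Sketch: select the first connected sensor rather than blindly taking index 0
KinectSensor sensor = null;
foreach (KinectSensor candidate in KinectSensor.KinectSensors)
{
  if (candidate.Status == KinectStatus.Connected)
  {
    sensor = candidate;
    break;
  }
}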

With a reference to the Kinect unit, you also need to tell the application what capabilities you want to leverage from the sensor. The sample app does this in the constructor:

sensor.ColorStream.Enable();
sensor.SkeletonStream.Enable();

This tells the sensor to activate the ColorStream component (which provides the RGB video stream) and SkeletonStream (which provides the skeleton-tracking capability). I’ll cover other capabilities, such as DepthStream, in subsequent articles.
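SkeletonStream.Enable also has an overload that takes a TransformSmoothParameters value, which applies filtering to the joint data before it reaches your event handlers. A sketch with moderate settings (the numbers here are illustrative, not from the template):

// Sketch: enable skeleton tracking with smoothing applied to joint positions
sensor.SkeletonStream.Enable(new TransformSmoothParameters
{
  Smoothing = 0.5f,           // Higher values smooth more but add latency
  Correction = 0.5f,
  Prediction = 0.5f,
  JitterRadius = 0.05f,       // In meters; jitter beyond this radius is clamped
  MaxDeviationRadius = 0.04f  // In meters; caps how far filtered data may deviate
});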

Figure 3 shows the MainWindow_Loaded and MainWindow_Unloaded event handlers.

Figure 3 Window events

void MainWindow_Loaded(object sender, 
  RoutedEventArgs e)
{
  sensor.SkeletonFrameReady += runtime_SkeletonFrameReady;
  sensor.ColorFrameReady += runtime_VideoFrameReady;
  sensor.Start();
}

void MainWindow_Unloaded(object sender, RoutedEventArgs e)
{
  sensor.Stop();
}

In the Loaded event, handlers are registered for the SkeletonFrameReady and ColorFrameReady events. In addition, the sensor is started so that the application can begin receiving data from the Kinect unit. Subsequently, in the Unloaded event, the sensor is stopped.

The MainWindow.xaml includes a grid that contains an Image control (videoImage) and a canvas. The videoImage control displays the RGB video delivered by the VideoFrameReady event, and the canvas is used to draw the ellipses representing the head and hands.
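The article doesn’t reproduce the XAML, but the markup the template generates looks roughly like this sketch (the element names match the code in Figures 4 and 5; sizes and brushes are illustrative):

<Grid>
  <!-- Displays the RGB stream produced in runtime_VideoFrameReady -->
  <Image Name="videoImage" Width="640" Height="480" />
  <!-- Hosts the ellipses positioned by SetEllipsePosition -->
  <Canvas>
    <Ellipse Name="head" Width="30" Height="30" Fill="Red" />
    <Ellipse Name="leftHand" Width="30" Height="30" Fill="Blue" />
    <Ellipse Name="rightHand" Width="30" Height="30" Fill="Blue" />
  </Canvas>
</Grid>

Figure 4 shows the code that converts the RGB feed into a usable format for the Image control.

Figure 4 VideoFrameReady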

void runtime_VideoFrameReady(object sender, 
  ColorImageFrameReadyEventArgs e)
{
  bool receivedData = false;
  using (ColorImageFrame CFrame = e.OpenColorImageFrame())
  {
    if (CFrame == null) 
    {
      // The image processing took too long. More than 2 frames behind.
    }
    else 
    {
      // pixelData is a byte[] field declared on the window class
      pixelData = new byte[CFrame.PixelDataLength];
      CFrame.CopyPixelDataTo(pixelData);
      receivedData = true;
    }
  }

  if (receivedData) 
  {
    // 640x480 at 96 dpi in both axes; Bgr32 is 4 bytes per pixel, so stride = 640 * 4
    BitmapSource source = BitmapSource.Create(640, 480, 96, 96, 
      PixelFormats.Bgr32, null, pixelData, 640 * 4);

    videoImage.Source = source;
  }
}
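Creating a brand-new BitmapSource 30 times a second works, but it allocates a fresh bitmap per frame. If you see garbage-collection pressure, a common alternative (a sketch, not part of the template; videoBitmap is a hypothetical WriteableBitmap field) is to allocate once and rewrite the pixels each frame:

// Sketch: initialize once, for example in MainWindow_Loaded
// videoBitmap = new WriteableBitmap(640, 480, 96, 96, PixelFormats.Bgr32, null);
// videoImage.Source = videoBitmap;

// Then, in place of BitmapSource.Create in runtime_VideoFrameReady:
videoBitmap.WritePixels(
  new Int32Rect(0, 0, 640, 480), // Region to update: the whole frame
  pixelData,                     // Bytes copied from the ColorImageFrame
  640 * 4,                       // Stride: width * 4 bytes per Bgr32 pixel
  0);                            // Offset into pixelData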

The last thing needed is to handle the SkeletonFrameReady event and draw the simplified skeleton on the canvas, as shown in Figure 5.

Figure 5 SkeletonFrameReady

void runtime_SkeletonFrameReady(object sender, 
  SkeletonFrameReadyEventArgs e)
{
  bool receivedData = false;

  using (SkeletonFrame SFrame = e.OpenSkeletonFrame())
  {
    if (SFrame == null) 
    {
      // The image processing took too long. More than 2 frames behind.
    }
    else 
    {
      // skeletons is a Skeleton[] field declared on the window class
      skeletons = new Skeleton[SFrame.SkeletonArrayLength];
      SFrame.CopySkeletonDataTo(skeletons);
      receivedData = true;
    }
  }

  if (receivedData)
  {
    Skeleton currentSkeleton = (from s in skeletons
                                where s.TrackingState == 
                                SkeletonTrackingState.Tracked
                                select s).FirstOrDefault();

    if (currentSkeleton != null) 
    {
      SetEllipsePosition(head, 
        currentSkeleton.Joints[JointType.Head]);
      SetEllipsePosition(leftHand, 
        currentSkeleton.Joints[JointType.HandLeft]);
      SetEllipsePosition(rightHand, 
        currentSkeleton.Joints[JointType.HandRight]);
    }
  }
}
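The SetEllipsePosition helper ships with the template, so the article doesn’t show it. Conceptually, it has to map a joint’s skeleton-space position onto the 640x480 color image and move the ellipse there. A minimal sketch, assuming the sensor field is in scope:

// Sketch: map a joint to color-image coordinates and center the ellipse on it
private void SetEllipsePosition(Ellipse ellipse, Joint joint)
{
  ColorImagePoint point = sensor.MapSkeletonPointToColor(
    joint.Position, ColorImageFormat.RgbResolution640x480Fps30);

  Canvas.SetLeft(ellipse, point.X - ellipse.Width / 2);
  Canvas.SetTop(ellipse, point.Y - ellipse.Height / 2);
}

From there, tracking, say, the elbows is just two more ellipses in the XAML and two more SetEllipsePosition calls with JointType.ElbowLeft and JointType.ElbowRight.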

So, I downloaded and installed the SDK, connected my Kinect sensor and, leveraging KinectContrib, built and ran a WPF application that renders a three-point skeleton over the top of the video imagery captured by Kinect. This took about five minutes. Microsoft has done a phenomenal job of putting really sophisticated capabilities in our hands, with a straightforward API and relatively inexpensive hardware.

In the next few articles, I’ll cover more of the Kinect basics. Once I have gone through the full foundation, I’ll begin to demonstrate some of the more advanced capabilities, as well as show you what others are doing with Kinect. I hope you’ll be encouraged to start incorporating Kinect into your projects if you haven’t already. The possibilities of this technology are unlimited. Imagine what you will Kinect!


Leland Holmquest is a senior consultant at Microsoft.