Microsoft Windows Media and DirectShow: Options for Application Developers
Dennis Flanagan, Stan Pennington, and Nick Vicars-Harris
Microsoft Windows Digital Media Division
December 2001
Summary: This article discusses the capabilities of the Microsoft Windows Media 7.1 SDK and the Microsoft DirectShow 8.1 SDK. Each software development kit offers options for playing and creating Windows Media content. The purpose of this article is to help application developers decide which SDK is best suited to their particular needs. (14 printed pages)
Introduction
Microsoft provides a number of software development kits (SDKs) for developers of digital media applications for the Microsoft® Windows® platform. Currently, the two primary SDKs are Microsoft DirectShow® and the Windows Media™ SDK. DirectShow supports the creation of many types of digital media applications and handles many different media formats, including Windows Media. The Windows Media SDK is actually a suite of SDKs focused on specific support for the Windows Media Format. In some ways, the capabilities of these SDKs overlap. For example, simple playback of Windows Media Format files can be performed using the DirectShow SDK, the Windows Media Format SDK, or the Windows Media Player SDK; each option has its advantages and limitations. In other ways, each SDK offers unique capabilities. The purpose of this article is to compare the capabilities and limitations of each SDK in order to help developers decide which SDK is best suited to their needs.
About the Windows Media Format
The Windows Media Format is an extensible file format that enables audio and video content to be compressed and distributed through streaming, downloading, or local storage such as CD-ROM. The format comprises a file container called the Advanced Systems Format (ASF) along with audio and video data compressed with any of the following codecs:
- Windows Media Audio 8 codec for high-fidelity music and mixed audio.
- Windows Media Video 8 codec for low bandwidth or high-quality video delivery.
- The ISO MPEG-4 Version 1 video codec, which provides standards-based codec implementation for situations where standards are important. (This codec does not offer the same quality as the later Windows Media Video codecs but is provided for compatibility with this long-established industry standard.)
Windows Media Format supports additional data, such as:
- Metadata about the content, such as its title or author.
- A script stream that allows content to be synchronized with other information such as HTML, captioning, and so on. This stream also enables you to control the user interface in a browser from within the encoded stream itself.
Note Third parties may use the ASF file container to package any other type of data, including audio and video data that has been compressed with some other codec. Such files are not considered Windows Media Format files and cannot use the .wmv and .wma extensions, which are reserved for Windows Media Video and Windows Media Audio files. ASF files that use non–Windows Media content should use the .asf extension.
The Windows Media Format is designed to arrange and organize synchronized multimedia data. It is optimized for streaming data over networks and rendering the data on a client computer. Windows Media Format is suitable for streaming live presentations, as well as prerecorded files.
Windows Media Format offers the following advantages to developers and users:
- The size of ASF files is virtually unlimited, with a practical limitation based on the particular file system and operating system being used.
On Microsoft Windows® 95, Windows 98, and Windows NT® 4, the maximum file size (using FAT and FAT32 file systems) is 4 gigabytes (GB).
On Windows 2000 the NTFS file system has a very large theoretical file size limitation, but can support ASF files with a maximum practical file size of about 50 GB.
- Windows Media Format uses Windows Media Audio and Windows Media Video codecs to compress and decompress the data. Both codecs combine high-quality audio and video output with significant improvements in file compression, so that audio and video data uses less storage space and/or can provide greater quality over low bandwidth connections into the home or office.
- Many different types of software and hardware players can decode content stored in Windows Media Format, including Windows Media Player on Windows, Macintosh, and Unix operating systems, portable audio devices, and set-top decoders. In the future, it is likely that home and car stereo devices, cellular phones, and many other types of hardware will support Windows Media Format, providing users with access to their content wherever they may be.
- Windows Media Format provides robust and reliable synchronization of audio and video.
- Windows Media Format also guarantees the user experience will be of high quality.
- Content in Windows Media Format can be secured through digital rights management. Content providers can license their Windows Media data to users and protect it from copyright infringement and piracy.
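The file size limits mentioned in the list above can be checked with simple arithmetic before encoding starts. The following is a minimal sketch rather than part of any SDK: the function names are hypothetical, the stream is assumed to have a constant total bit rate, ASF container overhead is ignored, and the per-file limit is taken to be the FAT32 maximum of 4 GB minus one byte.

```cpp
#include <cstdint>

// Largest file a FAT32 volume can hold: 4 GB (2^32 bytes) minus one byte.
const std::uint64_t kFat32MaxFileBytes = (1ULL << 32) - 1;

// Approximate size in bytes of content encoded at a constant total bit rate.
// Container overhead (headers, index) is ignored in this sketch.
inline std::uint64_t EstimateFileBytes(std::uint64_t bitsPerSecond,
                                       std::uint64_t seconds) {
    return bitsPerSecond * seconds / 8;
}

// True if the estimated file stays within the FAT32 per-file limit.
inline bool FitsOnFat32(std::uint64_t bitsPerSecond, std::uint64_t seconds) {
    return EstimateFileBytes(bitsPerSecond, seconds) <= kFat32MaxFileBytes;
}
```

Under these assumptions, one hour of content at 1 megabit per second comes to 450,000,000 bytes, well inside the limit, while one hour at 10 megabits per second does not fit.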
Overview of the Windows Media SDK
The Windows Media SDK is an umbrella term for a suite of SDKs specifically geared toward creation, playback (including network streaming), and DRM protection of Windows Media content. The individual SDKs are described briefly below.
Windows Media Format SDK
Offers a low-level interface designed for natively integrating Windows Media Format into applications. Both Windows Media Encoder and the DirectShow ASF filters use this SDK internally. The Windows Media Format SDK enables applications to read, write, edit, and transfer Windows Media Format files, including playback of DRM-protected content and streaming of content over a network. This SDK also provides the only way to package non–Windows Media content inside the ASF file container. Here are some points to consider in deciding whether the Windows Media Format SDK is appropriate for your needs:
- While the Windows Media Format SDK provides low-level access to the media stream, applications are entirely responsible for capturing data from hardware devices and rendering it to the screen or the speakers. This gives developers complete control over the media streaming functions of their code, which can be useful for application-specific optimization and debugging. If such control is not required and your application must perform either capture or rendering, especially of video content, consider using the Windows Media Encoder SDK or the DirectShow SDK for those tasks to reduce development time. Audio rendering is relatively simple using the Win32® waveOut functions and is illustrated in the Code Examples section later in this article.
- Because Windows Media Format application programming interfaces (APIs) are focused exclusively on Windows Media Format, they might be easier to learn for developers who are already familiar with streaming media applications and whose applications are not based on DirectShow.
- The Windows Media Format SDK files that applications redistribute are relatively small compared to the Microsoft DirectX® files that are necessary for DirectShow. As a result, applications written using the Windows Media Format SDK may be smaller. This is an advantage when download time is important.
Windows Media Encoder SDK
Offers an Automation interface for the Windows Media Encoder, a free software tool available from Microsoft at the Windows Media page of the Microsoft Web site. The Windows Media Encoder captures audio and video from live sources such as TV cards and digital video (DV) camcorders, converts existing files, and encodes audio and video content streamed over a network. Developers can use the SDK to extend the capabilities of Windows Media Encoder, for example to perform automated batch encoding operations, or create a custom user interface. If your application requirements do not extend beyond the capabilities of Windows Media Encoder, this ready-to-run application is probably the most cost-effective solution for creating or streaming Windows Media content.
Windows Media Player SDK
Offers a scripting interface for controlling Windows Media Player, which is available at no cost from the Windows Media page of the Microsoft Web site. Developers who need simple audio or video playback capabilities in COM-based documents, applications, or Web pages can embed the Windows Media Player control and control it with Automation. Windows Media Player supports not only Windows Media content, but also any format supported by DirectShow, such as AVI, WAV, AIFF, MP3 and MPEG-1. This includes non-standard formats, assuming that the required codecs or DirectShow filters have been installed on the local system. Developers can use the Windows Media Player SDK to create custom skins and visualizations for the 7.x versions of the Player, and to manage playlists. The SDK also contains scripting interfaces for Windows Media Player version 6.4, which is installed with Microsoft Internet Explorer 5, Microsoft Windows 98 Second Edition, Microsoft Windows 2000, and Microsoft Windows XP Home Edition and Windows XP Professional.
Windows Media Services SDK (not discussed in this article)
Supports programmatic management of streaming services. Servers running Windows Media Services support unicasting (sending a stream to each client requesting it) and multicasting (broadcasting a single stream across the network so that it can be received by many clients at the same time).
Windows Media Rights Manager SDK (not discussed in this article)
Enables content owners to protect Windows Media files with encryption, and issue licenses for protected files using digital rights management (DRM) technology. Whether you create Windows Media content with Windows Media Encoder, the Windows Media Format SDK, or DirectShow, if you require DRM protection, you must use the Windows Media Rights Manager SDK to apply it.
Windows Media Embedded Product Adaptation Kit (WMEPAK) (not discussed in this article)
Adds Windows Media playback to portable digital music players, Internet appliances, and other embedded systems. Also enables encoding.
In this article we will be discussing primarily the Windows Media Format SDK, the Windows Media Encoder SDK, and the Windows Media Player SDK. These are the SDKs that overlap to some degree with each other, and with DirectShow, in their capabilities for Windows Media playback and file creation.
Overview of the DirectShow SDK
The DirectShow SDK offers a powerful digital media streaming architecture, high-level APIs, and an extensive library of plug-in components called filters that have been field-tested over several years of use. Filters separate the processing of digital media data into discrete steps; each filter represents one (or sometimes more than one) processing step, such as file reading or writing, parsing, decoding, rendering, and so on. Third parties can create their own filters to perform any type of custom processing.
The DirectShow architecture provides a generalized and consistent approach for creating virtually any type of digital media application. You write a DirectShow application by creating an instance of a high-level object called the filter graph manager, and then use it to create configurations of filters (called filter graphs) that work together to perform some desired task on a media stream. The filter graph manager and the filters themselves handle all buffer creation, synchronization, and connection details, so the application developer needs only to write code to build and operate the filter graph; there is no need to touch the media stream directly, although access to the raw data is provided in various ways for those applications that require it. Construction of many standard types of filter graphs can be accomplished with only one or two method calls.
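The idea of a filter graph can be illustrated with a toy model. The sketch below is plain C++ and does not use the DirectShow API: ToyFilter, ToyFilterGraph, and BuildToyPlaybackGraph are hypothetical names, each filter is reduced to a function that transforms a string, and pins, media types, threading, and synchronization are all left out.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a filter: one named processing step.
// Real DirectShow filters are COM objects that connect through pins.
struct ToyFilter {
    std::string name;
    std::function<std::string(const std::string&)> process;
};

// Hypothetical stand-in for a filter graph: filters run in the order added.
class ToyFilterGraph {
public:
    void AddFilter(ToyFilter filter) { filters_.push_back(std::move(filter)); }

    // Pushes one buffer through every filter, from upstream to downstream.
    std::string Run(std::string buffer) const {
        for (const ToyFilter& filter : filters_) {
            buffer = filter.process(buffer);
        }
        return buffer;
    }

private:
    std::vector<ToyFilter> filters_;
};

// Builds a three-stage playback chain: file reader, decoder, renderer.
inline ToyFilterGraph BuildToyPlaybackGraph() {
    ToyFilterGraph graph;
    graph.AddFilter({"Reader",   [](const std::string& s) { return "read(" + s + ")"; }});
    graph.AddFilter({"Decoder",  [](const std::string& s) { return "decode(" + s + ")"; }});
    graph.AddFilter({"Renderer", [](const std::string& s) { return "render(" + s + ")"; }});
    return graph;
}
```

Calling BuildToyPlaybackGraph().Run("test.avi") returns render(decode(read(test.avi))), which mirrors how data flows from the source filter downstream to the renderer. The application only assembles and runs the chain; it never handles the intermediate buffers itself.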
DirectShow also includes a set of high-level APIs, known as DirectShow Editing Services (DES), which enables the creation of non-linear video editing applications. DES brings the following features to DirectShow:
- A timeline model that organizes video and audio tracks into nested layers, making it easy to manipulate the final production.
- Support for reading from and writing to Windows Media files.
- The ability to preview a video project on the fly.
- Project persistence through an XML-based format.
- Support for audio and video effects and transitions, such as fades and wipes.
- Almost 100 standard wipes, as defined by the Society of Motion Picture and Television Engineers (SMPTE).
- Keying based on hue, luminance, RGB value, or alpha value.
- Automatic conversion of frame rates and audio sampling rates, enabling a production to use a wide variety of sources.
- Resizing or cropping of video.
DirectShow support for Windows Media files is provided through two filters, the ASF Reader and the ASF Writer. (Actually, there are two versions of the ASF Reader filter; we'll discuss the differences between them in a moment.) The ASF Reader is used for playing Windows Media files. It acts as both a file reader and a stream parser. It reads Windows Media Audio or Windows Media Video files and passes the encoded content downstream to the audio and/or video decoders, which in turn pass the decoded content to the renderer(s). The ASF Writer is used for creating Windows Media files. Both of these filters use the Windows Media Format SDK internally to do their work, but as with any DirectShow filter, they handle all the low-level streaming details such as synchronization and buffer allocation. For an application, construction of a Windows Media playback filter graph is trivial. Creating a file-writing filter graph is slightly more complex, because the application must use the Windows Media Format SDK directly in this case to specify a profile, or set custom parameters such as bit rate, seconds per key frame, and so on.
As stated before, DirectShow has two ASF Reader filters. The original filter is maintained to ensure backward compatibility with Windows Media Player Version 6.4. This filter is geared toward network playback and provides support for fast-forwarding but not rate control. It is the default filter for playing Windows Media files and will be selected by the filter graph manager during automatic graph construction unless the new filter is specifically added to the graph by the application. The newer filter supports rate control, and should be used in most new applications. The DirectShow SDK explains how to ensure that the newer ASF Reader is used for file playback.
For playback of Windows Media, the advantage of DirectShow over the Windows Media Player SDK is that DirectShow allows you to integrate the playback window in containers other than a Web page. For creating Windows Media files, the advantage of DirectShow over the Windows Media Encoder SDK is that DirectShow has a smaller footprint in host memory. If your application is already based on DirectShow and is not focused solely on creating or streaming Windows Media content, it probably makes sense to use DirectShow rather than the Windows Media Encoder. On the other hand, if the sole purpose of the application is to create or stream Windows Media content either from files or live sources, the Windows Media Encoder has already done most or all of your work for you. For both reading and writing, the advantage of DirectShow over the Windows Media Format SDK is ease of use, especially if you are familiar with DirectShow, or your application is already using DirectShow for other tasks such as video capture or non-linear video editing.
However, there are some limitations in DirectShow's support for Windows Media:
- The DirectShow ASF Reader filters that shipped prior to DirectX 8.1 do not handle files that have been protected with digital rights management.
- The ASF Writer cannot be used to create ASF files using non-Microsoft codecs.
DirectShow is a component of the Microsoft DirectX SDK. Using DirectShow with Windows Media requires the presence of Windows Media Format run-time components, and development requires the Windows Media Format SDK with proper licensing.
Code Examples
The following sections give examples of the code necessary to read and write Windows Media Format files by using the Windows Media Format SDK and then by using the DirectShow SDK.
Windows Media Format SDK Code Examples
Windows Media Format files are highly configurable so that they can be optimized for different kinds of source content and delivery scenarios. Using the Windows Media Format SDK, you can configure parameters such as overall bit rate and quality individually, or for convenience, you can use a predefined profile. A profile defines the expected bandwidth and bit rate of the output, along with video size, pixel depth, and the codec used to compress the content. The Windows Media Format SDK enumerates only Microsoft-licensed codecs and, therefore, compresses content only with these codecs. To use third-party compression, the developer must compress the content, and then hand off the compressed content to the Windows Media Format SDK. In turn, the Windows Media Format SDK, with the correctly defined output stream, will place this content directly into an ASF file. Reading, or decoding, will be discussed later in this article.
To set up the writer part of an application, use the following code as a guideline:
// Create a Writer object. WMCreateWriter returns the IWMWriter
// interface directly, so no further QueryInterface call is needed.
IWMWriter *pWriter = NULL;
hr = WMCreateWriter( NULL, &pWriter );
// Set the chosen/specified profile.
hr = pWriter->SetProfile( pProfile );
hr = pWriter->SetOutputFilename( outfile );
// Initialize the Writer.
hr = pWriter->BeginWriting( );
// Put code here to WriteSample(s).
hr = pWriter->EndWriting( );
After the IWMWriter::BeginWriting method is called, the content can be written to the format file using the WriteSample method. The IWMWriterAdvanced interface supports WriteStreamSample. The difference between the WriteSample and the WriteStreamSample methods is that WriteStreamSample enables the writing of compressed samples to disk, for instance when using third-party codecs (see Third-party Codec Support later in this article).
Setting up a profile can be confusing because many options need to be configured. To help, the Windows Media Format SDK provides the Profile Manager, which enables the use of predefined profiles installed with the Windows Media Encoder and the Windows Media Format SDK runtime on a user's computer. The following code example shows how to load one of these system profiles:
// Set up some Interfaces.
IWMProfileManager* pIWMProfileManager = NULL;
IWMProfileManager2* pIPM2 = NULL;
IWMProfile* pIWMProfile = NULL;
// Get the manager.
hr = WMCreateProfileManager( &pIWMProfileManager );
if( FAILED( hr ) ) break;
// Query for the version 2 manager.
hr = pIWMProfileManager->QueryInterface(IID_IWMProfileManager2,
(void**)&pIPM2 );
if( FAILED( hr ) ) break;
// Use v7-based profiles.
pIPM2->SetSystemProfileVersion( WMT_VER_7_0 );
pIPM2->Release();//cleanup
// Get the required profile.
hr = pIWMProfileManager->LoadSystemProfile( dwProfileIndex,
&pIWMProfile);
if( FAILED( hr ) ) break;
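The break statements in the listing above only make sense inside an enclosing loop, which the excerpt does not show. A common way to structure such code is a do/while block that executes once, so that any failure jumps straight to shared cleanup. The sketch below illustrates that idiom in portable C++; it assumes this is the structure the excerpt was taken from, and Result, kOk, kFail, Failed, DoStep, and RunSteps are hypothetical stand-ins for HRESULT, S_OK, a failure code, the FAILED macro, and real SDK calls.

```cpp
#include <cstdint>

using Result = std::int32_t;   // stand-in for HRESULT
const Result kOk = 0;          // stand-in for S_OK
const Result kFail = -1;       // stand-in for any failure code

// Mirrors the FAILED macro: negative values indicate failure.
inline bool Failed(Result r) { return r < 0; }

// Records what happened so the control flow can be inspected.
struct StepLog {
    int stepsRun = 0;
    bool cleanedUp = false;
    Result result = kOk;
};

// Hypothetical step: fails only when its number matches failingStep.
inline Result DoStep(int step, int failingStep, StepLog& log) {
    ++log.stepsRun;
    return step == failingStep ? kFail : kOk;
}

// Runs up to three steps; the first failure skips the remaining ones.
inline StepLog RunSteps(int failingStep) {
    StepLog log;
    do {
        log.result = DoStep(1, failingStep, log);
        if (Failed(log.result)) break;
        log.result = DoStep(2, failingStep, log);
        if (Failed(log.result)) break;
        log.result = DoStep(3, failingStep, log);
    } while (false);
    // Cleanup is reached on every path, success or failure.
    log.cleanedUp = true;
    return log;
}
```

With this structure, RunSteps(2) stops after the second step yet still performs cleanup, which is where an application would release interfaces such as the profile manager and the loaded profile.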
Third-party codec support
The profile that defines the format and properties of the output file has some specific requirements for supporting third-party (non-Microsoft) codecs. First, you must identify a FourCC code (ID or Type). It is used to identify the codec at decoding time in the Windows Media Player (or other Windows Media Format SDK–based player). Then use the following steps to set up this profile prior to writing:
- Obtain an IWMWriterAdvanced interface by querying the IWMWriter interface.
- Define the media types for each stream using the IWMMediaProps interface.
- Specify the output file using the IWMWriter::SetOutputFilename method.
- Set the pInput parameter to NULL for each pre-compressed stream by using the IWMWriter::SetInputProps method.
- Create a WM_MEDIA_TYPE structure by using the IWMMediaProps::SetMediaType method:
- For the majortype member, use WMMEDIATYPE_Audio or WMMEDIATYPE_Video.
- For the subtype member, use the first DWORD from the codec GUID. Common terms are Audio ID for audio and FourCC for Video.
- For formattype member, use WMFORMAT_VideoInfo or WMFORMAT_WaveFormatEx.
The purpose of this procedure is to disable compression for the precompressed streams and to prevent the Windows Media Format SDK from trying to load your codec. The Windows Media Format SDK does not support third-party codecs directly and would report an invalid format type if these steps were not followed in sequence.
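The relationship between a FourCC code and the first DWORD of a subtype GUID can be sketched in portable C++. The packing order below, with the first character in the low byte, matches what the Win32 MAKEFOURCC and mmioFOURCC macros produce, and the remaining GUID fields follow the usual media subtype pattern XXXXXXXX-0000-0010-8000-00AA00389B71. SimpleGuid, MakeFourCC, and SubtypeFromFourCC are hypothetical names used in place of the Windows GUID type and macros, and whether a particular third-party codec follows this GUID convention is an assumption to verify against that codec's documentation.

```cpp
#include <cstdint>

// Stand-in for the Windows GUID structure so the sketch compiles anywhere.
struct SimpleGuid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// Packs four characters into a 32-bit code, first character in the low byte.
inline std::uint32_t MakeFourCC(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))        |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)  |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Builds a subtype GUID whose first DWORD is the FourCC (or audio format ID)
// and whose remaining fields are the fixed media subtype suffix.
inline SimpleGuid SubtypeFromFourCC(std::uint32_t fourcc) {
    SimpleGuid g = { fourcc, 0x0000, 0x0010,
                     { 0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71 } };
    return g;
}
```

For a made-up codec identified as ABCD, MakeFourCC('A', 'B', 'C', 'D') yields 0x44434241, and that value becomes the first DWORD of the subtype GUID.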
Reading Windows Media files with the Windows Media Format SDK
The Windows Media Format SDK does not directly support rendering of audio or video; before implementing these features of your application, you should investigate the following section, which contains a code example for rendering using DirectShow. However, it is possible to access uncompressed and unencrypted content using the Windows Media Format SDK and to render the content.
The process of reading and decoding Windows Media is supported directly through the Windows Media Format SDK using codecs that are implemented as either Audio Codec Manager (ACM) or Video Codec Manager (VCM) installable codecs, or DirectX Media Objects (DMOs). (The ACM and VCM are standard components of most versions of Windows, and DMOs comprise a new set of interfaces in the DirectX 8.0 SDK.) The Windows Media Format SDK uses the FourCC code in the digital media stream to find the installed and relevant codec in the registry. It then loads this DLL and delivers compressed content directly from the file to the codec for decompression. Pulse code modulation (PCM) or bitmap data is then rendered, for example, in a player. Using the IWMReaderAdvanced interface, it is possible to return compressed samples if required.
Reading and decoding Windows Media files is straightforward. The samples included with the Windows Media Format SDK explain the details, but the most relevant sections of code for decompressing streams are reiterated here.
This function is taken from the AudioPlayer sample and demonstrates a simple application to render audio in two parts. The first part loads and initializes the relevant types, and the second part renders the audio to the audio device. (See the sample for the complete code.)
HRESULT Play (LPCWSTR pszURL, HANDLE hCompletionEvent,
HRESULT *phrCompletion)
{
IWMReader *m_pReader;
WMCreateReader (NULL, 0, &m_pReader);
// Open is asynchronous; wait until the open event is signaled.
m_pReader->Open (pszURL, this, NULL);
WaitForSingleObject (m_hOpenEvent, INFINITE);
IWMOutputMediaProps *pProps;
m_pReader->GetOutputProps (0, &pProps);
BYTE pBuffer [1024];
DWORD cbBuffer = 1024;
WM_MEDIA_TYPE *pMediaType = (WM_MEDIA_TYPE *)pBuffer;
pProps->GetMediaType (pMediaType, &cbBuffer);
pProps->Release ();
// Audio playback.
WAVEFORMATEX *pwfx = (WAVEFORMATEX *)pMediaType->pbFormat;
memcpy(&m_wfx, pwfx, sizeof (WAVEFORMATEX)+pwfx->cbSize);
waveOutOpen (&m_hWaveOut, WAVE_MAPPER, &m_wfx, (DWORD)WaveProc,
(DWORD)this, CALLBACK_FUNCTION);
// Start read, triggers render.
HRESULT hr = m_pReader->Start(0,0,1.0,NULL);
return hr;
}
The following code is the implementation of the IWMReaderCallback::OnSample callback method. After the reader object begins reading a file, the decompressed samples are delivered to this callback. Any operations to be performed on samples, including rendering, must be included in this method.
HRESULT CAudioPlay::OnSample( /* [in] */ DWORD dwOutputNum,
/* [in] */ QWORD cnsSampleTime,
/* [in] */ QWORD cnsSampleDuration,
/* [in] */ DWORD dwFlags,
/* [in] */ INSSBuffer __RPC_FAR *pSample,
/* [in] */ void __RPC_FAR *pvContext )
{
// Be sure it is an Audio sample.
DWORD dwIn;
m_pReader->GetOutputCount (&dwIn);
HRESULT hr = S_OK ;
BYTE *pData = NULL ;
DWORD cbData = 0 ;
hr = pSample->GetBufferAndLength( &pData, &cbData ) ;
if( FAILED( hr ) )return hr;
LPWAVEHDR pwh = (LPWAVEHDR) new BYTE[sizeof(WAVEHDR)+cbData];
if( NULL == pwh )
return( HRESULT_FROM_WIN32( GetLastError() ) ) ;
pwh->lpData = ( LPSTR )&pwh[1] ;
pwh->dwBufferLength = cbData;
pwh->dwBytesRecorded = cbData ;
pwh->dwUser = (DWORD)cnsSampleTime ;
pwh->dwLoops = 0 ;
pwh->dwFlags = 0 ;
CopyMemory( pwh->lpData, pData, cbData );
// Prepare the header for playing.
hr = waveOutPrepareHeader( m_hWaveOut, pwh, sizeof(WAVEHDR) );
// Render the Audio.
if( hr == MMSYSERR_NOERROR )
hr = waveOutWrite( m_hWaveOut, pwh, sizeof( WAVEHDR ) ) ;
return S_OK ;
}
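The cnsSampleTime and cnsSampleDuration parameters of the callback are expressed in 100-nanosecond units, and the listing stores the sample time in a 32-bit field. The helper functions below are a small portable sketch of the arithmetic involved; their names are hypothetical, and the byte-rate parameter is assumed to come from the nAvgBytesPerSec member of the WAVEFORMATEX structure obtained when the file was opened.

```cpp
#include <cstdint>

// Number of 100-nanosecond units in one millisecond.
const std::uint64_t kHundredNsPerMs = 10000;

// Converts a sample time in 100-nanosecond units to milliseconds.
inline std::uint64_t SampleTimeToMs(std::uint64_t cns) {
    return cns / kHundredNsPerMs;
}

// Playing time in milliseconds of a PCM buffer of cbData bytes, given the
// average byte rate of the wave format. Returns 0 for an unknown byte rate.
inline std::uint64_t PcmBufferMs(std::uint32_t cbData,
                                 std::uint32_t avgBytesPerSec) {
    if (avgBytesPerSec == 0) return 0;
    return static_cast<std::uint64_t>(cbData) * 1000 / avgBytesPerSec;
}

// True if a sample time no longer fits in 32 bits, which is the point at
// which a cast to a 32-bit field would discard the high-order bits.
inline bool ExceedsThirtyTwoBits(std::uint64_t cns) {
    return cns > 0xFFFFFFFFULL;
}
```

Because 2^32 units of 100 nanoseconds is roughly 429 seconds, a sample time stored in a 32-bit field wraps a little over seven minutes into a file. An application that relies on that stored value for positioning may want to keep the full 64-bit time elsewhere.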
DirectShow Code Example
The following code fragment provides a simple implementation of the main processing loop of a media player using DirectShow. This application can be used to play any audio or video file for which there is a DirectShow file parser and format decompressor. The application requires very little code because the DirectShow streaming infrastructure does the work of parsing the media file, decoding it, rendering and synchronizing the media, and streaming it.
#include <DShow.h>
#include <windows.h>
void main(void)
{
IGraphBuilder *pGraph;
IMediaControl *pMediaControl;
CoInitialize(NULL);
CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC,
IID_IGraphBuilder, (LPVOID *)&pGraph);
pGraph->QueryInterface(IID_IMediaControl, (void **)&pMediaControl);
pGraph->RenderFile(L"test.avi", NULL);
pMediaControl->Run();
MessageBox(NULL,"Click me to end playback.","DirectShow",MB_OK);
pMediaControl->Release();
pGraph->Release();
CoUninitialize();
}
To use the new ASF Reader filter, add the following two method calls to the listing before the call to RenderFile:
IBaseFilter *pASFReader = NULL;
CoCreateInstance(CLSID_WMAsfReader, NULL, CLSCTX_INPROC_SERVER,
IID_IBaseFilter, (void **)&pASFReader);
pGraph->AddFilter(pASFReader, L"ASF Reader");
With the current version of the Windows Media Format SDK, DirectShow requires a software certificate (also called a key) to play Windows Media-based content. The application implements a lightweight COM object, called a key provider, and uses the IObjectWithSite interface to pass this object to DirectShow:
IObjectWithSite* pObjectWithSite = NULL;
hr = pGraph->QueryInterface(IID_IObjectWithSite,
(void**)&pObjectWithSite);
if (SUCCEEDED(hr))
{
hr = pObjectWithSite->SetSite((IUnknown*)(IServiceProvider*)&prov);
pObjectWithSite->Release();
}
For more information, refer to the DirectShow SDK documentation.
Conclusion: When To Use Which SDK
This section summarizes the information provided in the preceding sections from the perspective of user scenarios, and recommends which SDKs to use for which types of projects.
- To add Windows Media playback to an application or Web site:
Use the Windows Media Player SDK, which contains the ActiveX® control that is the basis of the Windows Media Player. Using this control you can create a custom player by writing XML script. Use the 6.4 version of the OCX control for an application and either version 6.4 or 7.0 for a Web site.
- To embed a branded player in your Web site:
Use the Windows Media Player 7 SDK. The Windows Media Player ActiveX control allows you to design custom skins to give the Player a unique look or brand. You can also insert custom branding and promotional or advertising information inside the Player window.
- To add Windows Media Format support to an application not based on DirectShow (or where you prefer access to the full range of features in the Windows Media Format SDK):
Use the Windows Media Format SDK, which gives you direct access to the creation and management of streams that use Windows Media Format.
- To add Windows Media Format support to a DirectShow-based application:
Use the DirectX 8 SDK. This SDK contains DirectShow filters for reading and writing Windows Media, which you can easily integrate into your application. Although the filters use the Windows Media Format SDK internally, they do not expose many of its features; this will change in future updates of the filters. For instance, digital rights management is available only by using DirectX 8.1 (included in Windows XP, and available for Windows 98, Windows Millennium Edition, and Windows 2000), or by using the Windows Media Format SDK directly.
- To build a Windows-based universal-format player:
Use the DirectX 8 SDK to write a DirectShow application. DirectShow provides a plug-in model that lets you support a wide range of codecs, file formats, and network protocols by adding filters for these operations. You can create your own filters for these proprietary components or license them from other parties.
- To build an application for editing Windows Media Format files:
Use the DirectX 8 SDK. This SDK contains the DirectShow Editing Services APIs, which simplify the task of editing Windows Media by presenting a timeline paradigm and operations for compositing media, inserting transitions and effects, and rendering from the timeline. DirectShow Editing Services APIs offer persistence of projects using XML and support "smart" recompression of Windows Media files, making it easier to edit or combine files that are already compressed.