Audio and video performance (HTML)

When you develop an app that uses audio and video, you should be aware of some important performance considerations. This document summarizes key design areas for getting high-performing media playback in Windows Store apps using JavaScript.

HTML5 media elements, specifically the audio tag and video tag, are becoming the standard for media playback on the web. They are also the standard for media playback in a Windows Store app using JavaScript. While the standard APIs are consistent across web platforms, there are specific areas you should pay attention to when you develop a Windows Store app using JavaScript on Windows 8. The Windows 8 specific extensions to the audio and video tags add feature-level and performance enhancements that you should be aware of when designing apps.

Enable full-screen video

Full-screen video playback is a typical scenario in an immersive user experience. This can be easily achieved by using the video tag and Cascading Style Sheets (CSS) styles. The following is a sample CSS style.

video {
    position: fixed;
    width: 100%;
    height: 100%;
}

For the best system performance when the video is full-screen, the app should automatically hide all unused web elements, such as the transport controls. Leaving them visible can interfere with the optimized rendering process. For example, apps can use CSS styles to set "visibility: hidden" or "display: none" on the transport controls. Alternatively, apps can use CSS z-index to place the video on top of all other elements.
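The hiding step can be sketched as follows. This is a minimal sketch, assuming the app keeps references to its own overlay elements; the function name setOverlaysHidden and the element list are hypothetical, and only the "display: none" technique comes from the text.

```javascript
// Hypothetical helper: hide or restore app-owned overlay elements
// (for example, custom transport controls) around full-screen playback.
function setOverlaysHidden(elements, hidden) {
    for (var i = 0; i < elements.length; i++) {
        // "display: none" takes the element out of rendering;
        // an empty string falls back to the value from the style sheet.
        elements[i].style.display = hidden ? "none" : "";
    }
}
```

An app might call setOverlaysHidden(overlays, true) when it enters full-screen playback and setOverlaysHidden(overlays, false) when it leaves.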

You should avoid using other web elements to make your own letterboxes on the video. This disables some of the optimized rendering enhancements.

When in full-screen playback mode, apps should stop any animation in the background. Timers running in the background could unnecessarily wake up the CPU, even when the animation itself is not visible.

The msIsLayoutOptimalForPlayback property

The msIsLayoutOptimalForPlayback property is a read-only property that was introduced as a Windows 8 specific extension to the video tag. In a Windows Store app using JavaScript, it indicates whether or not a video tag is in an optimized rendering path. The optimized code path allows for improved system performance, better data protection, and Stereo 3D video rendering. It can be helpful to keep track of this property during app implementation and the debugging process. You can listen to the onMSVideoOptimalLayoutChanged event to be notified when msIsLayoutOptimalForPlayback changes. Please note, the optimized rendering path is not limited to full-screen video playback.
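For debugging, a small watcher can log each change of this property. The sketch below assumes the event can be registered with addEventListener under the name "MSVideoOptimalLayoutChanged" (derived from the onMSVideoOptimalLayoutChanged handler named above); watchOptimalLayout and the report callback are hypothetical names.

```javascript
// Hypothetical helper: report the optimized-rendering state of a video tag.
// Assumption: the event name is "MSVideoOptimalLayoutChanged".
function watchOptimalLayout(video, report) {
    function onChanged() {
        report(video.msIsLayoutOptimalForPlayback);
    }
    video.addEventListener("MSVideoOptimalLayoutChanged", onChanged, false);
    onChanged(); // report the initial state as well
    // Return a function that stops the reporting.
    return function stopWatching() {
        video.removeEventListener("MSVideoOptimalLayoutChanged", onChanged, false);
    };
}
```

During development, the report callback could simply write the value to the console so that a layout change that drops the video out of the optimized path is noticed early.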

You should avoid the things in the following list.

  • Video elements with CSS outlines. This forces the video rendering to not use the optimized code path that is implemented in Windows 8.

  • Video elements rendered through a Canvas. This involves extensive memory copies in the rendering process, which are not desirable in a high-performance playback experience.

  • Video elements embedded in Scalable Vector Graphics (SVG). This is similar to the Canvas case, and will invoke extensive memory copies that are not desirable in a high-performance playback experience.

  • Setting the msRealTime property on the video or audio tag to true. This causes the system to enter into a low-latency mode, which is desirable for communication scenarios but is less power-efficient.

  • Video playing in the background. When an app playing video is put into the background, the app should pause the video unless there is a specific reason for the video to continue playing in the background. This reduces the performance impact on the overall system.
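The last item in the list can be sketched as follows. This is one possible approach, assuming the document object exposes the Page Visibility API (the hidden property and the "visibilitychange" event); pauseWhenHidden is a hypothetical name, and an app could equally react to its own lifecycle events.

```javascript
// Hypothetical helper: pause a video whenever the page becomes hidden.
// Assumption: `doc` supports `hidden` and the "visibilitychange" event.
function pauseWhenHidden(doc, video) {
    function onVisibilityChange() {
        if (doc.hidden && !video.paused) {
            video.pause();
        }
    }
    doc.addEventListener("visibilitychange", onVisibilityChange, false);
    return onVisibilityChange; // returned so the caller can remove it later
}
```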

Leverage Windows 8 specific extensions

Sometimes there are things you would like to do to the video as part of the presentation, such as flipping the video or zooming into a section of the video. We encourage you to look at the extensions added for the video tag before you choose a Canvas option.

Here is a short list of simple rendering options. They are implemented at the native media pipeline level and exposed as properties or methods on the video tag.

  • msZoom: when set to true, crops all letterboxes or pillar boxes around the video.
  • msSetVideoRectangle: selects a specific rectangular sub-region of the video to be rendered on the video tag. This can be used to zoom into a specific sub-region of the video.
  • msHorizontalMirror: flips the video horizontally when set to true.
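The three options above can be applied from script as in the following sketch. The helper name applyNativeRenderingOptions and the shape of the options object are hypothetical. The argument order (left, top, right, bottom) passed to msSetVideoRectangle and the units of those values are assumptions that should be checked against the API reference.

```javascript
// Hypothetical helper: apply the native rendering options to a video tag.
function applyNativeRenderingOptions(video, options) {
    if (typeof options.zoom === "boolean") {
        video.msZoom = options.zoom; // crop letterboxes and pillar boxes
    }
    if (typeof options.mirror === "boolean") {
        video.msHorizontalMirror = options.mirror; // flip horizontally
    }
    if (options.rect) {
        // Assumed argument order: left, top, right, bottom.
        var r = options.rect;
        video.msSetVideoRectangle(r.left, r.top, r.right, r.bottom);
    }
}
```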

For more advanced or complicated video and audio Digital Signal Processing (DSP) operations, you should consider writing Microsoft Media Foundation (MF) based media plugins. These can perform better than DSP operations written directly using JavaScript.

Here are a few tips for writing DSP plugins.

  • For video DSP, you should consider using the graphics processing unit (GPU), for example through DirectX shader code, and avoid software-based implementations as much as possible.

  • If you are writing multiple DSP filters, you should consider including them in a single Media Foundation Transform (MFT). This reduces the overhead of having two DSP MFTs chained together (although chaining is allowed in the Windows 8 platform).

Here are some considerations when you need to implement other media plugins, such as MF Media Sources or Decoders.

  • You should keep in mind that codecs for proprietary media formats can typically only operate in software mode. Therefore, they won’t be able to leverage hardware acceleration that is available to other standard media formats, such as H.264.

  • The plugin components should implement functions to handle Quality-Management (QM) messages. Then the overall media pipeline doesn’t have to take on tasks beyond the system capacity.

Media format selection can be a sensitive topic and is often driven by business decisions. From a Windows 8 performance perspective, we recommend H.264 video as the primary video format and AAC and MP3 as the preferred audio formats. For local file playback, MP4 is the preferred file container for video content. As we mentioned in the previous section, H.264 decoding is accelerated through most recent graphics hardware. It is also worth mentioning that although hardware acceleration for VC-1 decoding is broadly available, for a large set of graphics hardware on the market the acceleration is limited in many cases to a partial acceleration level (or IDCT level), rather than a full hardware offload (that is, VLD mode). On the audio side, we expect hardware offload solutions will be available for AAC and MP3 on the upcoming Windows 8 SoC devices.

One thing you should consider for your apps or services, if you have full control of the video content generation process, is how to keep a good balance between compression efficiency and GOP structure. Relatively smaller GOP size with B pictures can increase the performance in seeking or trick modes.

When including short, low-latency audio effects, for example in games, you should consider using WAV files with uncompressed PCM data to reduce processing overhead that is typical for compressed audio formats.

Tips for building transport controls with high-performing user experiences

While the video and audio implementation in the Windows 8 web platform has built-in transport controls, it is expected that some Windows Store apps using JavaScript will include their own custom controls that match the style or character of the particular app.

Scrubbing is always a tough task for media platforms to make really responsive. Here are a couple of tips on how to make scrubbing in a custom control as efficient as with the native transport controls.

  • Set the playbackRate property of the video tag to "0" during scrubbing and reset it back to the pre-scrubbing value afterwards.

  • Make sure all pixels (or positions) on the slider count. One common mistake is to make apps that use sliders with only a hundred or so valid positions, even though there may be 1200 pixels on the slider. A slider with pixel-precision is always necessary for smooth scrubbing.
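Both tips can be combined in a small scrubbing helper. This is a sketch under stated assumptions: createScrubber and its method names are hypothetical, and the slider is assumed to report a horizontal pixel offset together with its width in pixels.

```javascript
// Hypothetical scrubbing helper for a custom slider.
function createScrubber(video) {
    var savedRate = null;
    return {
        // Call when the user starts dragging the slider.
        begin: function () {
            if (savedRate === null) {
                savedRate = video.playbackRate; // remember the pre-scrubbing rate
                video.playbackRate = 0;
            }
        },
        // Map a pixel offset to a media time so every pixel is a valid position.
        moveTo: function (pixelX, sliderWidthPx) {
            var clamped = Math.min(Math.max(pixelX, 0), sliderWidthPx);
            video.currentTime = (clamped / sliderWidthPx) * video.duration;
        },
        // Call when the drag ends; restores the saved rate.
        end: function () {
            if (savedRate !== null) {
                video.playbackRate = savedRate;
                savedRate = null;
            }
        }
    };
}
```

Mapping the pixel offset directly to a time, rather than to a coarse 0 to 100 slider value, is what gives a 1200-pixel slider 1200 distinct seek positions.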

Another tip is that apps should avoid layout changes and use msTransform to update the web element positions. This invokes the optimized rendering path internally rather than recalculating the layout again. The slider pointer on a transport control can be a good example to consider.
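For the slider pointer, this can look like the following sketch. The helper name moveThumb is hypothetical, and the thumb is assumed to be an element positioned at the left edge of the slider track.

```javascript
// Hypothetical helper: move the slider thumb with a transform
// instead of changing left/margin, which would trigger layout.
function moveThumb(thumb, offsetPx) {
    thumb.style.msTransform = "translateX(" + offsetPx + "px)";
}
```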

As mentioned previously, a general recommendation is that you should make the transport controls automatically fade away once there is no user interaction for several seconds. This is extremely helpful during full-screen video playback, though you should consider this even when the video is only a subset of your overall app layout.
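An idle timer is one way to implement this. The sketch below is an assumption-laden outline: createAutoHide, notifyInteraction, and the optional timers parameter are hypothetical names, and the fade animation itself is left out (the controls are simply hidden with "display: none").

```javascript
// Hypothetical auto-hide controller for custom transport controls.
// `timers` is optional; it only exists so scheduling can be swapped out.
function createAutoHide(controls, delayMs, timers) {
    var t = timers || {
        setTimeout: function (fn, ms) { return setTimeout(fn, ms); },
        clearTimeout: function (h) { clearTimeout(h); }
    };
    var handle = null;
    function hide() {
        handle = null;
        controls.style.display = "none";
    }
    return {
        // Call on every pointer or key event over the player.
        notifyInteraction: function () {
            if (handle !== null) {
                t.clearTimeout(handle); // restart the idle countdown
            }
            controls.style.display = "";
            handle = t.setTimeout(hide, delayMs);
        }
    };
}
```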

Subtitles support

The Windows 8 web platform provides basic subtitle functionality based on the <track> element defined in the World Wide Web Consortium (W3C) spec. We adopted Web Video Text Track (WebVTT) and SMPTE Timed Text (SMPTE-TT) as the formats natively supported in HTML5 apps. For the best system performance, we encourage you to leverage what is supported by our web platform. Keep in mind the possible performance impact if you need to pick a different subtitle data format and handle the parsing and rendering at the app level.
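A track can be declared in markup or attached from script. The following sketch shows the script form; addSubtitleTrack is a hypothetical helper, and the file path and labels used with it are placeholders.

```javascript
// Hypothetical helper: attach a subtitle file through a <track> element.
// `doc` is the document object that owns the video element.
function addSubtitleTrack(doc, video, src, language, label) {
    var track = doc.createElement("track");
    track.kind = "subtitles";
    track.src = src;          // for example, a WebVTT file
    track.srclang = language; // language tag such as "en"
    track.label = label;      // text shown in a track picker
    video.appendChild(track);
    return track;
}
```

A call might look like addSubtitleTrack(document, video, "subtitles/en.vtt", "en", "English"), where the path is an illustrative placeholder.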

Using poster images to improve network usage

The video tag has an optional attribute that allows an app to show a poster image before the video data has actually downloaded. This is a great way to make sure network and system resources are used only when they are needed, such as when the user decides to actually play the video content. To use this functionality, apps should set the “preload” attribute of the video tag to “none”. The user might have to wait a few seconds more for the content to download, but this is more user-friendly when the app is used in a metered network environment.
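From script, the two attributes can be set as in this minimal sketch. The helper name deferVideoDownload and the poster URL passed to it are hypothetical.

```javascript
// Hypothetical helper: show a poster and defer the media download.
function deferVideoDownload(video, posterUrl) {
    video.preload = "none";   // do not fetch media data until playback is requested
    video.poster = posterUrl; // image displayed until playback starts
}
```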

Reusing audio and video tags

It is common for apps to create audio and video tags dynamically during runtime. However, each system has limited resources and can support only a certain number of media elements simultaneously. The resources held by audio and video tags might not be freed right away when the tags are removed from the Document Object Model (DOM) tree, so they continue to hold onto system resources, such as system memory or GPU memory. One recommendation to mitigate the impact on system resources is to reuse video and audio tags from a pool of already created tags by setting new sources on them. If you need to release a media element, reset the src attribute and remove all “source” elements embedded in the corresponding media element.
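A pool along these lines could look like the sketch below. Everything here is illustrative: createMediaPool, acquire, and release are hypothetical names, the factory function is supplied by the caller, and the call to load() after clearing the source is an assumption about how to prompt the platform to drop buffered data.

```javascript
// Hypothetical pool of reusable media elements.
// `createElement` is a caller-supplied factory, for example
// function () { return document.createElement("video"); }
function createMediaPool(createElement) {
    var idle = [];
    function reset(media) {
        media.pause();
        media.removeAttribute("src");
        // Remove embedded <source> children (assumed to be direct children).
        var sources = media.getElementsByTagName("source");
        for (var i = sources.length - 1; i >= 0; i--) {
            media.removeChild(sources[i]);
        }
        media.load(); // assumption: reloading with no source releases buffers
    }
    return {
        // Reuse an idle element when one exists; otherwise create one.
        acquire: function (src) {
            var media = idle.length > 0 ? idle.pop() : createElement();
            media.src = src;
            return media;
        },
        // Clear the element's sources and keep it for later reuse.
        release: function (media) {
            reset(media);
            idle.push(media);
        }
    };
}
```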

Display deactivation and conserving power

To prevent the display from being deactivated when user action is no longer detected, such as when an app is playing video, you can call DisplayRequest.RequestActive.

To conserve power and battery life, you should call DisplayRequest.RequestRelease to release the display request as soon as it is no longer required.

Here are some situations when you should release the display request:

  • Video playback is paused, for example by user action, buffering, or adjustment due to limited bandwidth.
  • Playback stops. For example, the video is done playing or the presentation is over.
  • A playback error has occurred. For example, network connectivity issues or a corrupted file.
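The situations above map naturally onto media events. The sketch below is one possible wiring, with these assumptions: the display request object is created by the app as new Windows.System.Display.DisplayRequest(), its methods are exposed to JavaScript as requestActive and requestRelease, and the standard "playing", "pause", "waiting", "ended", and "error" media events are used as triggers. The names createDisplayKeeper and bindDisplayKeeper are hypothetical.

```javascript
// Hypothetical wrapper that keeps requestActive/requestRelease calls balanced.
function createDisplayKeeper(displayRequest) {
    var active = false;
    return {
        keepOn: function () {
            if (!active) {
                displayRequest.requestActive();
                active = true;
            }
        },
        allowOff: function () {
            if (active) {
                displayRequest.requestRelease();
                active = false;
            }
        }
    };
}

// Request the display while playing; release it on pause, buffering,
// end of playback, or error.
function bindDisplayKeeper(video, keeper) {
    video.addEventListener("playing", keeper.keepOn, false);
    ["pause", "waiting", "ended", "error"].forEach(function (name) {
        video.addEventListener(name, keeper.allowOff, false);
    });
}
```

Tracking the active flag means repeated events, such as "pause" followed by "ended", never release a request that was not made.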

Hardware audio offloading

For hardware audio offload to be applied automatically, the msAudioCategory property must be set to ForegroundOnlyMedia or BackgroundCapableMedia. Hardware audio offload optimizes audio rendering, which can improve functionality and battery life.
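Setting the category from script might look like the following sketch. The helper name prepareOffloadedAudio is hypothetical, and assigning the category before the source is an assumption about ordering rather than something stated above.

```javascript
// Hypothetical helper: choose an offload-eligible category, then set the source.
function prepareOffloadedAudio(audio, src, playsInBackground) {
    // The two category names come from the text above.
    audio.msAudioCategory = playsInBackground
        ? "BackgroundCapableMedia"
        : "ForegroundOnlyMedia";
    audio.src = src; // assumption: the category is applied before the source
}
```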