Audio and video performance (HTML)
Full-screen video playback is a typical scenario in an immersive user experience. You can achieve it by using the video tag together with Cascading Style Sheets (CSS) styles that size the video element to fill the screen.
For the best system performance when the video is full-screen, the app should automatically hide all unused web elements, such as the transport controls, because visible extra elements can interfere with the optimized rendering process. For example, apps can use CSS to set "visibility: hidden" or "display: none" on the transport controls. Alternatively, apps can use the CSS z-index property to place the video on top of all other elements.
You should avoid using other web elements to make your own letterboxes on the video. This disables some of the optimized rendering enhancements.
When in full-screen playback mode, apps should stop any animation in the background. Timers running in the background could unnecessarily wake up the CPU, even when the animation itself is not visible.
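The recommendations above (hide unused elements, stop background animation) could be combined in one helper. The following is a minimal sketch, not a platform API: the function name enterFullScreenPlayback and its parameters are hypothetical, and it assumes that background animations are driven by setInterval.

```javascript
// Sketch only: the function and parameter names are hypothetical.
// video           - the <video> element that is going full-screen
// unusedElements  - array of elements that are not needed during playback
// animationTimers - array of ids returned by setInterval for background animations
function enterFullScreenPlayback(video, unusedElements, animationTimers) {
  // Hide the native transport controls and every other unused element.
  video.controls = false;
  unusedElements.forEach(function (el) {
    el.style.display = "none";
  });

  // Cancel background animation timers so they no longer wake up the CPU.
  animationTimers.forEach(function (id) {
    clearInterval(id);
  });
  animationTimers.length = 0;
}
```

An app that animates with requestAnimationFrame instead would cancel those callbacks in the same place.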
Avoid the following:
- Video elements with CSS outlines. This forces the video rendering to not use the optimized code path that is implemented in Windows 8.
- Video elements rendered through a Canvas. This involves extensive memory copies in the rendering process, which are not desirable in a high-performance playback experience.
- Video elements embedded in Scalable Vector Graphics (SVG). This is similar to the Canvas case, and will invoke extensive memory copies that are not desirable in a high-performance playback experience.
- Setting the msRealTime property on the video or audio tag to true. This causes the system to enter a low-latency mode, which is desirable for communication scenarios but is less power-efficient.
- Video playing in the background. When an app playing video is put into the background, the app should pause the video unless there is a specific reason for the video to continue playing in the background. This reduces the performance impact on the overall system.
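Pausing video when the app goes into the background could look like the following. This is a minimal sketch that assumes the standard Page Visibility API (document.hidden and the visibilitychange event) is available; the function name pauseWhenHidden and the allowBackgroundPlayback flag are hypothetical.

```javascript
// Sketch only: names are hypothetical. `doc` is the document object.
// Pauses the video whenever the document becomes hidden, unless the app
// has a specific reason to keep playing in the background.
function pauseWhenHidden(doc, video, allowBackgroundPlayback) {
  function onVisibilityChange() {
    if (doc.hidden && !allowBackgroundPlayback && !video.paused) {
      video.pause();
    }
  }
  doc.addEventListener("visibilitychange", onVisibilityChange, false);
  // Returned so that the caller can remove the listener later.
  return onVisibilityChange;
}
```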
Sometimes there are things you would like to do to the video as part of the presentation, such as flipping the video or zooming into a section of the video. We encourage you to look at the extensions added for the video tag before you choose a Canvas option.
Here is a short list of simple rendering options. They are implemented at the native media pipeline level and exposed as properties or methods on the video tag.
- msZoom: when set to true, crops all letterboxes or pillar boxes around the video.
- msSetVideoRectangle: selects a specific rectangular sub-region of the video to be rendered on the video tag. This can be used to zoom into a specific sub-region of the video.
- msHorizontalMirror: flips the video horizontally when set to true.
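The three extensions listed above could be applied together as follows. This is a minimal sketch: the helper name applyRenderingOptions and the shape of its options object are hypothetical, and the argument order (left, top, right, bottom) and normalized 0 to 1 coordinates for msSetVideoRectangle are assumptions that you should check against the API reference.

```javascript
// Sketch only: the helper and its options object are hypothetical.
// options.cropLetterbox - boolean, mapped to msZoom
// options.mirror        - boolean, mapped to msHorizontalMirror
// options.region        - { left, top, right, bottom }, assumed to be
//                         fractions of the video frame between 0 and 1
function applyRenderingOptions(video, options) {
  if (options.cropLetterbox !== undefined) {
    video.msZoom = !!options.cropLetterbox;
  }
  if (options.mirror !== undefined) {
    video.msHorizontalMirror = !!options.mirror;
  }
  if (options.region) {
    var r = options.region;
    video.msSetVideoRectangle(r.left, r.top, r.right, r.bottom);
  }
}
```

For example, passing a region of { left: 0.25, top: 0.25, right: 0.75, bottom: 0.75 } would, under the assumptions above, zoom into the center of the frame without involving a Canvas.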
Here are a few tips for writing digital signal processing (DSP) plugins.
- For video DSP, you should consider using the graphics processing unit (GPU) (that is, DirectX shader code) and avoid software-based implementations as much as possible.
- If you are writing multiple DSP filters, you should consider including them in a single Media Foundation Transform (MFT). This avoids the overhead of chaining two DSP MFTs together (although chaining is allowed on the Windows 8 platform).
Here are some considerations when you need to implement other media plugins, such as Media Foundation (MF) media sources or decoders.
- You should keep in mind that codecs for proprietary media formats can typically operate only in software mode. Therefore, they won’t be able to leverage the hardware acceleration that is available to standard media formats, such as H.264.
- The plugin components should implement functions to handle Quality-Management (QM) messages. That way, the overall media pipeline doesn’t take on work beyond the system's capacity.
Media format selection can be a sensitive topic and is often driven by business decisions. From a Windows 8 performance perspective, we recommend H.264 as the primary video format and AAC and MP3 as the preferred audio formats. For local file playback, MP4 is the preferred file container for video content. As mentioned in the previous section, H.264 decoding is accelerated by most recent graphics hardware. Hardware acceleration for VC-1 decoding is also broadly available, but on a large set of graphics hardware on the market it is limited in many cases to partial acceleration (the IDCT level) rather than a full stream-level hardware offload (that is, VLD mode). On the audio side, we expect hardware offload solutions to be available for AAC and MP3 on the upcoming Windows 8 SoC devices.
If you have full control of the video content generation process for your apps or services, consider how to balance compression efficiency against group of pictures (GOP) structure. A relatively small GOP size with B pictures can improve performance in seeking and trick modes.
When including short, low-latency audio effects, for example in games, you should consider using WAV files with uncompressed PCM data to reduce processing overhead that is typical for compressed audio formats.
Scrubbing is always a tough task for media platforms to make really responsive. Here are a couple of tips on how to make scrubbing in a custom control as efficient as with the native transport controls.
- Set the playbackRate property of the video tag to 0 during scrubbing and reset it to the pre-scrubbing value afterwards.
- Make sure all pixels (or positions) on the slider count. One common mistake is to use a slider with only a hundred or so valid positions, even though there may be 1200 pixels on the slider. A slider with pixel precision is always necessary for smooth scrubbing.
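The two scrubbing tips above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the slider reports the pointer position in pixels relative to its left edge.

```javascript
// Sketch only: function names are hypothetical.

// Maps a pixel offset on the slider to a media time, so that every pixel
// is a distinct, valid seek position.
function sliderPositionToTime(pixelX, sliderWidthPx, duration) {
  var clamped = Math.min(Math.max(pixelX, 0), sliderWidthPx);
  return (clamped / sliderWidthPx) * duration;
}

// Freezes playback for the duration of the scrub and returns the rate
// that was in effect before, so that it can be restored later.
function beginScrub(video) {
  var previousRate = video.playbackRate;
  video.playbackRate = 0;
  return previousRate;
}

// Seeks to the time that corresponds to the current pointer position.
function scrubTo(video, pixelX, sliderWidthPx) {
  video.currentTime = sliderPositionToTime(pixelX, sliderWidthPx, video.duration);
}

// Restores the pre-scrubbing playback rate.
function endScrub(video, previousRate) {
  video.playbackRate = previousRate;
}
```

A custom control would call beginScrub when the pointer goes down on the slider, scrubTo on every pointer move, and endScrub when the pointer is released.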
Another tip is that apps should avoid layout changes and use msTransform to update the positions of web elements. This invokes the optimized rendering path internally rather than recalculating the layout. The slider pointer on a transport control is a good example to consider.
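For the slider pointer, this could be as small as the following. This is a minimal sketch: the helper name moveSliderThumb is hypothetical, and it assumes the pointer only needs to move horizontally.

```javascript
// Sketch only: the helper name is hypothetical.
// Moves the slider pointer with a transform instead of changing layout
// properties such as left or margin-left.
function moveSliderThumb(thumb, offsetPx) {
  thumb.style.msTransform = "translateX(" + offsetPx + "px)";
}
```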
As mentioned previously, a general recommendation is that you should make the transport controls automatically fade away once there is no user interaction for several seconds. This is extremely helpful during full-screen video playback, though you should consider this even when the video is only a subset of your overall app layout.
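One possible way to hide the controls after a period of inactivity is sketched below. The factory name createControlsAutoHider, the idle delay parameter, and the injectable timers object are all hypothetical. The sketch toggles the visibility style, and an app could layer a CSS opacity transition on top to produce the fade.

```javascript
// Sketch only: all names are hypothetical.
// controls - the element that wraps the custom transport controls
// idleMs   - how long to wait after the last interaction before hiding
// timers   - optional { set, clear } pair, injectable for testing
function createControlsAutoHider(controls, idleMs, timers) {
  timers = timers || {
    set: function (fn, ms) { return setTimeout(fn, ms); },
    clear: function (id) { clearTimeout(id); }
  };
  var pending = null;

  function hide() {
    pending = null;
    controls.style.visibility = "hidden";
  }

  // Call this from pointer, touch, and keyboard event handlers.
  function onUserActivity() {
    controls.style.visibility = "visible";
    if (pending !== null) {
      timers.clear(pending);
    }
    pending = timers.set(hide, idleMs);
  }

  return { onUserActivity: onUserActivity };
}
```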
The Windows 8 web platform provides basic subtitle functionality based on the <track> element defined in the World Wide Web Consortium (W3C) spec. We adopted Web Video Text Track (WebVTT) and SMPTE Timed Text (SMPTE-TT) as the formats natively supported in HTML5 apps. For the best system performance, we encourage you to leverage what is supported by our web platform. You should keep in mind the possible performance impact if you need to pick a different subtitle data format and handle the parsing and rendering at the app level.
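Attaching a natively supported subtitle file from script might look like this. This is a minimal sketch: the helper name addSubtitleTrack and the shape of its track argument are hypothetical, while kind, src, srclang, and label are the standard properties of the track element.

```javascript
// Sketch only: the helper and its `track` argument are hypothetical.
// doc   - the document object
// track - { src, srclang, label } describing a WebVTT or SMPTE-TT file
function addSubtitleTrack(doc, video, track) {
  var el = doc.createElement("track");
  el.kind = "subtitles";
  el.src = track.src;
  el.srclang = track.srclang; // language tag such as "en"
  el.label = track.label;     // text shown to the user when choosing a track
  video.appendChild(el);
  return el;
}
```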
The video tag has an optional poster attribute that lets an app show a poster image before the video data has actually downloaded. This is a great way to make sure network and system resources are used only when they are needed, such as when the user decides to actually play the video content. To use this functionality, apps should set the “preload” attribute of the video tag to “none”. The user might have to wait a few seconds more for the content to download, but the app will be more user-friendly on a metered network.
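From script, the same configuration could be applied as follows. This is a minimal sketch: the helper name setUpDeferredVideo is hypothetical, and setting preload before src is a cautious ordering choice rather than something the text above requires.

```javascript
// Sketch only: the helper name is hypothetical.
// Shows a poster image and defers the media download until playback starts.
function setUpDeferredVideo(video, src, posterUrl) {
  video.preload = "none";   // do not fetch media data up front
  video.poster = posterUrl; // image shown until playback begins
  video.src = src;
}
```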
It is common for apps to create audio and video tags dynamically during runtime. However, each system has limited resources and can support only a certain number of media elements simultaneously. The resources of audio and video tags might not be freed right away when the tags are removed from the Document Object Model (DOM) tree, so they continue to hold system resources, such as system memory or GPU memory. One way to mitigate the impact on system resources is to reuse video and audio tags from a pool of already created tags by setting new sources on them. If you need to release a media element, reset the src attribute and remove all “source” elements embedded in the corresponding media element.
To prevent the display from being deactivated when user action is no longer detected, such as when an app is playing video, you can call DisplayRequest.RequestActive.
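A pool along these lines could be sketched as follows. The factory name createMediaPool and its acquire and release methods are hypothetical. The reset step follows the description above (clear src and remove embedded source elements), and pausing the element first is an added assumption.

```javascript
// Sketch only: all names are hypothetical.
// createElement - function that creates a new <video> or <audio> element
function createMediaPool(createElement) {
  var idle = [];

  // Releases the media held by an element so that it can be reused.
  function reset(el) {
    el.pause(); // assumption: stop playback before clearing the source
    el.removeAttribute("src");
    Array.prototype.slice.call(el.querySelectorAll("source")).forEach(function (s) {
      el.removeChild(s);
    });
  }

  return {
    // Reuses an idle element when one exists; creates one otherwise.
    acquire: function (src) {
      var el = idle.length > 0 ? idle.pop() : createElement();
      el.src = src;
      return el;
    },
    // Resets the element and returns it to the pool.
    release: function (el) {
      reset(el);
      idle.push(el);
    }
  };
}
```

In an app, createElement would typically be a function that returns document.createElement("video").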
To conserve power and battery life, you should call DisplayRequest.RequestRelease to release the display request as soon as it is no longer required.
Here are some situations when you should release the display request:
- Video playback is paused, for example by user action, buffering, or adjustment due to limited bandwidth.
- Playback stops. For example, the video is done playing or the presentation is over.
- A playback error has occurred. For example, network connectivity issues or a corrupted file.
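The request and release points listed above can be wired to media events as in the following sketch. The helper name manageDisplayRequest is hypothetical, and the mapping of buffering to the waiting event is an assumption. In the JavaScript projection the methods are camel-cased (requestActive and requestRelease), and the displayRequest argument would be created with new Windows.System.Display.DisplayRequest().

```javascript
// Sketch only: the helper name is hypothetical.
// Keeps the display active only while the video is actually playing.
function manageDisplayRequest(video, displayRequest) {
  var active = false; // tracks whether a request is currently outstanding

  function request() {
    if (!active) {
      displayRequest.requestActive();
      active = true;
    }
  }

  function release() {
    if (active) {
      displayRequest.requestRelease();
      active = false;
    }
  }

  video.addEventListener("playing", request, false);
  // Paused, buffering, finished, or failed: the display may turn off again.
  ["pause", "waiting", "ended", "error"].forEach(function (type) {
    video.addEventListener(type, release, false);
  });

  return { request: request, release: release };
}
```

The active flag keeps the calls balanced, so that repeated events never release a request that was not made.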
For hardware audio offload to be automatically applied, the msAudioCategory attribute must be set to ForegroundOnlyMedia or BackgroundCapableMedia. Hardware audio offload optimizes audio rendering, which can improve functionality and battery life.
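A small guard can make sure that only an offload-capable category is used. This is a minimal sketch: the helper name setSourceWithOffload is hypothetical, and setting the category before src is an assumption about ordering that you should confirm in the API reference.

```javascript
// Sketch only: the helper name is hypothetical.
// The two categories that the text above says enable hardware audio offload.
var OFFLOAD_CATEGORIES = ["ForegroundOnlyMedia", "BackgroundCapableMedia"];

function setSourceWithOffload(media, category, src) {
  if (OFFLOAD_CATEGORIES.indexOf(category) === -1) {
    throw new Error("Category does not enable hardware audio offload: " + category);
  }
  media.msAudioCategory = category; // assumption: set before src
  media.src = src;
}
```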