Scripting the track element

You can create customized subtitle displays for your video webpages and manipulate the track element, methods, and properties through scripting.

Introduction

Using the track element solely in HTML provides a basic captioning experience, but the track API can add flexibility to your webpages. The HTML-only solution requires that you use the controls attribute with the video element. If your app has a custom user interface to control playback, as shown in Using JavaScript to control the HTML5 video player, the built-in video controls will most likely be turned off. Anything you can do with the built-in controls, including caption selection and display, you can also do from script. This article shows how to access caption or subtitle text, display text in styled caption blocks, select tracks, and use special tracks to create a webpage slide show to go with your videos.

Get and use the track objects in JavaScript

The track API supports several objects that give you access to the track elements and the content they represent. Here you can see the basic text track objects you can use to return content from the track and video elements.

Object | Method or property to use to get it | What it's used for
Video | <video> </video> as element, var oVideo = document.getElementById(videoElementId); as object. | Use to get track lists, or manipulate playback.
track | <track> as element, oTrack = document.getElementById(trackElementId); as object. | Track element and object. Use to get and set properties like kind, src, or label on specific tracks.
TextTrackList | oTrackList = document.getElementById(videoElementId).textTracks; | Represents a list of all the TextTracks associated with a Video object. Use the length property to see how many tracks the list contains.
TextTrack | oTextTrack = document.getElementById(trackElementId).track; or oTextTrack = oTrackList[i]; // Where i is the index of the track to get | Use to get cues and timing values. It's returned by the track property on a track object, or you can select it from a TextTrackList.
TextTrackCueList | oCueList = oTextTrack.cues; | This is the list of TextTrackCue objects that contain the individual subtitles for a track.
TextTrackCue | var oFirstCue = oCueList[0]; | The TextTrackCue (cue) object gives you access to the start and end times, and text of a subtitle segment.

 

Get a cue

To get a track's TextTrackCue content, you can start from either the track or video elements. It can get confusing with the generous use of the track and TextTrack names. In HTML, the track element (a child of the video element) defines the file that contains the times and captions. The track element provides the language, kind of file, and other attributes. The track object represents the track element in JavaScript, and can be used to get and set attributes in HTML, and to get the track property. The track property returns the TextTrack object. The TextTrack object represents the content and returns the TextTrackCueList, which consists of a series of TextTrackCue objects. The TextTrackCue objects provide the startTime, endTime, and text properties.
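As a rough sketch of that chain in script, the helper below walks from a video object to the first cue of one of its tracks. The name firstCueOf is hypothetical and not part of the track API. It assumes the track file may not have loaded yet, in which case the cue list can be null or empty and the helper returns null.

```javascript
// Hypothetical helper: walk video -> textTracks -> TextTrack -> cues -> cue.
function firstCueOf(video, trackIndex) {
  var trackList = video.textTracks;        // TextTrackList
  if (!trackList || trackIndex < 0 || trackIndex >= trackList.length) {
    return null;                           // no such track
  }
  var textTrack = trackList[trackIndex];   // TextTrack
  var cueList = textTrack.cues;            // TextTrackCueList, can be null
  if (!cueList || cueList.length === 0) {
    return null;                           // cues not available (yet)
  }
  var cue = cueList[0];                    // TextTrackCue
  return { text: cue.text, startTime: cue.startTime, endTime: cue.endTime };
}

// In a page, this might be called as:
//   firstCueOf(document.getElementById("myVideo"), 0);
```

Returning a plain object keeps the display code independent of whether the cue was reached through the video element or through a track element.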

The following example shows both routes. It defines a video element, a source element, and three track elements in the HTML. The buttons call JavaScript functions that return the cue text, start time, and end time using either the video or track elements.

<!DOCTYPE html>
<html>
<head>
    <title>How to get there from here</title>
    <!-- only force Internet Explorer 10 standards for testing on local machine -->
    <meta http-equiv="X-UA-Compatible" content="IE=10" />
    <script type="text/javascript">

    //  Get a textTrack object through the video element
    function cuesFromVideoElement(){
       var oVideo = document.getElementById("myVideo");  // get video object
       var oTrackList = oVideo.textTracks;  // get list of tracks on video object
       var oTextTrack = oTrackList[0];      // get first textTrack object (English)
       getCues(oTextTrack);                 // get cues from textTrack object and display
    }

    //  Get a textTrack object through the track element
    function cuesFromTrackElement() {
        var oTrack = document.getElementById("entrack");  // get track object from English track
        var oTextTrack = oTrack.track;                    // get textTrack object from track object
        getCues(oTextTrack);                 // get cues from textTrack object and display
    }

    //  Get cue content from a textTrack object
    function getCues(oTextTrack){
       var oCueList = oTextTrack.cues;  // get list of cues associated with textTrack
       var oMyCue = oCueList[0];       // get first cue in the list
       var sText = oMyCue.text;        // get text from first cue
       var sTime = oMyCue.startTime;   // get start time from first cue
       var eTime = oMyCue.endTime;     // get end time from first cue
       show(sText, sTime,eTime);     
    }

    //  Display the content in a div element
    function show(sText, sTime,eTime) {
        var dsp = document.getElementById("display"); // get display area
        dsp.innerHTML = "Text: " + sText;  // show text 
        dsp.innerHTML += "<br />StartTime: "+ sTime; // show start time
        dsp.innerHTML += "<br />End Time: " + eTime; // show end time
    }
    </script>
</head>
<body>
<h1>Get text, start and end time, from a track</h1>
<video id="myVideo" controls >
   <source src="http://ie.microsoft.com/testdrive/Videos/BehindIE9ModernWebStandards/Video.mp4" />
   <track id="entrack" label="English subtitles" kind="captions" src="entrack.vtt" srclang="en" default>
   <track id="estrack" label="Spanish subtitles" kind="captions" src="estrack.vtt" srclang="es">
   <track id="detrack" label="German subtitles" kind="captions" src="detrack.vtt" srclang="de">
   HTML5 video not supported
   </video>
   <br />
   <button onclick="cuesFromVideoElement();">Get a cue from video element</button>
   <button onclick="cuesFromTrackElement();">Get a cue from track element</button>
   <div id="display">   
   </div>
</body>
</html>

In this example, if you click Get a cue from video element, it calls the "cuesFromVideoElement()" function. This function first gets the video object, and then returns a list of tracks from the video object using the textTracks property. Though three tracks are returned, only the first track is used. The first track (oTextTrack) is then passed to the "getCues()" function.

The "getCues()" function uses the cues property to return a TextTrackCueList object. The function then gets the first TextTrackCue object from the TextTrackCueList object. The TextTrackCue object provides the text, startTime, and endTime properties, which are then passed to the "show()" function. The "show()" function displays the content in a div element on the webpage.

If you click Get a cue from track element, it calls the "cuesFromTrackElement()" function, which uses the id of the track element ("entrack") to get the track object. The track property returns the TextTrack object from the track object. At this point, the TextTrack object is the same object that was returned in the "cuesFromVideoElement()" function, and the "getCues()" function is called to get and display the content.

The previous example retrieved only the first cue of a track element. This next example loops through all the cues and displays the captions in a scrollable div element.

    <script type="text/javascript">
      function getCues() {          
        var myTrack = document.getElementById("entrack").track; // get text track from track element          
        var myCues = myTrack.cues;   // get list of cues                    
        for (var i = 0; i < myCues.length; i++) {
            //   document.getElementById("display").innerHTML += (myCues[i].getCueAsHTML().textContent + "<br/>");  // alternative: use getCueAsHTML
            document.getElementById("display").innerHTML += (myCues[i].text + "<br/>");  // append cue text
        }
      }
  </script>
  </head>
  <body>
    <video id="video1" controls  >
      <source src="video.mp4">
      <track id="entrack" label="English subtitles" kind="captions" src="entrack.vtt" srclang="en" default>
    </video>
    <p>
    <button onclick="getCues();">Show tracks</button>
    </p>
    <div style="display:block; overflow:auto; height:200px; width:650px;" id="display"></div>

These examples use the text property to get the text portion of a cue. You can also get the cue's text content using the getCueAsHTML method. This method returns the text as a document fragment. This fragment can be added to your webpage using appendChild or replaceChild methods.

Important  For security reasons, Internet Explorer 10 removes most tags and scripting from caption text, but some elements are parsed into HTML nodes, such as br tags inserted for line breaks. For an example, see the getCueAsHTML reference page.
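A minimal sketch of the getCueAsHTML approach follows. The helper name appendCuesAsHtml and its parameters are hypothetical; it assumes each cue exposes getCueAsHTML, and it takes the document object as an argument so that each cue can be wrapped in its own div element.

```javascript
// Hypothetical helper: append every cue in a list to a container element,
// using the document fragment returned by getCueAsHTML instead of raw text.
function appendCuesAsHtml(cueList, container, doc) {
  var appended = 0;
  for (var i = 0; i < cueList.length; i++) {
    var line = doc.createElement("div");       // one block per cue
    line.appendChild(cueList[i].getCueAsHTML()); // fragment with parsed cue content
    container.appendChild(line);
    appended++;
  }
  return appended;                             // number of cues added
}

// In a page, this might be called as:
//   var cues = document.getElementById("entrack").track.cues;
//   appendCuesAsHtml(cues, document.getElementById("display"), document);
```

Because the fragment is appended as nodes rather than concatenated into innerHTML, the caption text is not reparsed as markup by the page.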

 

Using events to get synchronized text cues

When a video is playing, the cuechange event fires with each new time segment of a synchronized text track file. This enables an app to know when a new caption is being displayed in the video player, or the app can handle the caption itself. The activeCues property enables the app to get the cue that goes with the current time segment of the video. A text track can have times that overlap, so it's possible to have more than one caption for a given point in video time. The activeCues property returns a TextTrackCueList object that represents one or more cues that go with the time segment when activeCues is read. The length property returns the number of cues that are contained in the active track cue list.
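Where cue times can overlap, one way to handle every active cue is to collect their text in a loop. The sketch below is only an illustration; activeCueTexts is a hypothetical name, and the null check assumes that activeCues might not be available, for example when a track is turned off.

```javascript
// Hypothetical helper: return the text of every currently active cue.
function activeCueTexts(textTrack) {
  var cues = textTrack.activeCues;   // TextTrackCueList, can be null
  var texts = [];
  if (!cues) {
    return texts;                    // nothing active
  }
  for (var i = 0; i < cues.length; i++) {
    texts.push(cues[i].text);        // keep cues in list order
  }
  return texts;
}

// Inside a cuechange handler on a track element, this might be used as:
//   disp.innerText = activeCueTexts(this.track).join("\n");
```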

The following example creates a handler for the cuechange event that uses the activeCues property to get the current caption and displays it in a div element. When the cuechange event fires, the handler checks the length property to determine if there are cues available. If there are one or more cues, the first cue in the activeCues list is displayed. For this example, it's assumed that captions are created end-to-end, so there's no provision for multiple active cues. Your app can use the length property value to drive a for loop if you want to get multiple cues and handle them appropriately.

<!DOCTYPE html >
<html >
  <head>
  <title>Cuechange event example</title>
    <!-- only force Internet Explorer 10 standards for testing on local machine -->
    <meta http-equiv="X-UA-Compatible" content="IE=10" />

    <script type="text/javascript">
        document.addEventListener("DOMContentLoaded", function () {  // don't run this until all DOM content is loaded 
            var track = document.getElementById("track1");
            track.addEventListener("cuechange", function () {
                var myTrack = this.track;             // track element is "this" 
                var myCues = myTrack.activeCues;      // activeCues is an array of current cues.                                                    
                if (myCues.length > 0) {
                    var disp = document.getElementById("display");
                    disp.innerText = myCues[0].text;
                }
            }, false);
        }, false);      
    </script>
  </head>
  <body>
    <video id="video1" controls>
      <source src="video.mp4"  >
      <track id='track1' label='English captions' src="entrack.vtt" kind='subtitles' srclang='en' default >    
    </video>
    <div id="display">      
    </div>
  </body>
</html>

The previous example displays the caption both in the video player and in our own caption display. The next example builds on this code but turns off the video player's captioning display and applies CSS styles. Because the caption is text in a div element, you can position it anywhere on the screen, including over the video, and change its font and color.

The following example overlays CSS-styled caption text on the video player.

<!DOCTYPE html >
<html >
  <head>
  <title>Styled text example</title>
    <!-- only force Internet Explorer 10 standards for testing on local machine -->
    <meta http-equiv="X-UA-Compatible" content="IE=10" />
   <style type="text/css">
    
    #display
    {       
      position:absolute;    
      display:block;   
      text-align :center;
      color:yellow;
      font-size:24px;
      font-family:Comic Sans MS;
      text-shadow: 0.1em 0.1em 0.15em #333;
      z-index:100; /* set z-index to be sure div is on top */
    }    
   </style>
    <script type="text/javascript">        
        var video;
        document.addEventListener("DOMContentLoaded", function () {  // don't run this until all DOM content is loaded 
            //  get objects associated with the video, track, and div elements  
            video = document.getElementById("video1");
            var disp = document.getElementById("display");
            var track = document.getElementById("track1");           

            video.addEventListener("loadedmetadata", function () {
                document.getElementById("player").style.width = video.videoWidth + "px";  // make enclosure div width == video width
                disp.style.top = (video.offsetTop + (video.videoHeight * .05)) + "px"; // set the text to appear at 5% from the top of the video
                disp.style.left = video.offsetLeft + "px";  // align the text with the left edge of the video
                disp.style.width = video.videoWidth + "px"; // set text box to the width of the video
            }, false);

            track.addEventListener("cuechange", function () {
                var myTrack = this.track;             // track element is "this" 
                var myCues = myTrack.activeCues;      // activeCues is an array of current cues.                                                    
                if (myCues.length > 0) {
                    disp.innerText = myCues[0].text;   // write the text
                }
            }, false);
        }, false);
                    
    </script>
  </head>
  <body>
    <div id="player" style="width:640px;"> <!-- container div that is sized by script -->
      <video id="video1" controls style="width:100%;">
        <source src="video.mp4"  >
        <!-- by using "metadata" as the kind, it will suppress the video player's own caption display -->
        <track id="track1" label="English captions" src="entrack.vtt" kind="metadata" srclang="en" default>    
      </video>
      <div id="display">      
     </div>    
    </div>
  </body>
</html>

The event listeners for the video and track elements are created within the DOMContentLoaded handler. This ensures that all the elements in the page (including video and track) are loaded before you try to access them. If an app tries to use events before the elements load, Internet Explorer 10 throws an exception.

The div element that displays the captions needs to be set to the same width as the video display. The loadedmetadata event provides a notification when the dimensions of the video are known. This enables the app to set the width and location of the caption area div. While the video element might be loaded, if the app tries to get the videoWidth and videoHeight before the content has loaded, it returns default or zero values. The style object is used to set the width, left, and top of the text and other elements. The pixel calculations are done first, and then the result is converted to a string by appending px (pixels) to work correctly with the style object.

Similar to the previous example, the cuechange event drives the captions that are displayed as the video runs. Unlike the earlier example, the kind of track is set to metadata to suppress caption display in the video player. The track file itself is unchanged. If you want to also show built-in captions, change the kind attribute to kind="subtitles".

Note  Overlaying text using Cascading Style Sheets (CSS) won't work when the video is in full-screen mode. If you need captions to be visible when the player is in full-screen mode, use the built-in display.
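The pixel calculation for the caption area described in this section can be sketched as a small function that does the arithmetic first and appends px last. The name captionBox and its parameters are hypothetical; the 5% offset is the one used in the example, and rounding to whole pixels is an added assumption.

```javascript
// Hypothetical helper: compute style values for a caption div that sits
// 5% below the top edge of the video and spans its full width.
function captionBox(videoTop, videoLeft, videoWidth, videoHeight) {
  var top = videoTop + Math.round(videoHeight * 0.05); // numbers first
  return {
    top: top + "px",          // then convert to strings for the style object
    left: videoLeft + "px",
    width: videoWidth + "px"
  };
}

// In a loadedmetadata handler, this might be used as:
//   var box = captionBox(video.offsetTop, video.offsetLeft,
//                        video.videoWidth, video.videoHeight);
//   disp.style.top = box.top;
//   disp.style.left = box.left;
//   disp.style.width = box.width;
```

Passing the position in as numbers avoids mixing strings such as "10px" into the arithmetic.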

 

Changing tracks from script

In Getting started with the Track element, an example shows how to use multiple track elements to provide the user with language choices for subtitles. To choose a different subtitle, the user picks it from the captioning selection menu on the built-in controls. To select a track this way, the controls attribute must be present on the video element.

If the controls attribute isn't used, you can programmatically change tracks using the mode property. The mode property offers three states defined as constants in the track object, SHOWING, HIDDEN, and OFF. The following example shows how to use modes to switch between three language tracks.

When the page loads, the list of text tracks associated with the video element is obtained using the textTracks property. The app uses createElement in a for loop to create a div element for each track label. Each div element has an onclick event that points to the "changeTrack()" function. To simplify the code, the id attribute of each div is set to the current value of i in the for loop. The i value is also hard coded into the onclick event to be passed as a parameter. The "changeTrack()" function's for loop compares its own counter (i) with the incoming index parameter and, on a match, sets the track mode to SHOWING to show it's the active track. Angle brackets (>like this<) are also added to the track label text that's displayed. If the index doesn't match, the track's mode is set to OFF, and the track label in the innerText is updated to remove any angle brackets.

<!DOCTYPE html >
<html >
<head>
    <title>Switch track example</title>
    <style type="text/css">
    #display
{
    display:block;
    width:640px;
    height:100px;
    color:Blue;
    background-color:#e7f1fd;
    border-radius:20px;
    border: 1px solid blue;
    text-align:center;
    font-size:18pt;
    font-weight:bold;          
 }
    
    </style>
    <script type="text/javascript">
    //  When the DOM is loaded, add a handler that displays the labels of the text tracks once the media data is loaded
        document.addEventListener("DOMContentLoaded", function () {
            document.getElementById("video1").addEventListener("loadeddata", function () {
                getTracks();
            }, false);
        }, false);
       
      function getTracks() {
        var allTracks = document.getElementById("video1").textTracks; // get list of tracks   
        document.getElementById("display").innerHTML = ""; // clear text       
        for (var i = 0; i < allTracks.length; i++) {
          //  append track label with a click event 
          var temp = document.createElement("div");
          //  create labels that highlight the active track
          if (allTracks[i].mode == allTracks[i].SHOWING) {
            temp.innerText = "> " + allTracks[i].label + " <";
          } else {
            temp.innerText = allTracks[i].label;
          }
          temp.setAttribute("onclick", "changeTrack(" + i + ")");
          temp.setAttribute("role", "button");
          temp.setAttribute("id", i);
          document.getElementById("display").appendChild(temp);
      }
    }

    function changeTrack(index) {
      var allTracks = document.getElementById("video1").textTracks; // get list of tracks
      for (var i = 0; i < allTracks.length; i++) {
        if (i == index) {
          allTracks[i].mode = allTracks[i].SHOWING; // show this track
          document.getElementById(i).innerText = "> " + allTracks[i].label + " <";
        } else {
          allTracks[i].mode = allTracks[i].OFF; // hide all other tracks
          document.getElementById(i).innerText = allTracks[i].label;
        }
      }
    }
    </script>
</head>
  <body>
    <video id="video1" autoplay controls>
      <source src="http://ie.microsoft.com/testdrive/Videos/BehindIE9ModernWebStandards/Video.mp4" type="video/mp4"/>
      <!-- The English text track is used by default -->
      <track id="entrack" label="English subtitles" kind="captions" src="entrack.vtt" srclang="en" default>
      <track id="sptrack" label="Spanish subtitles" kind="captions" src="estrack.vtt" srclang="es">
      <track id="detrack" label="German subtitles" kind="captions" src="detrack.vtt" srclang="de">
      HTML5 Video not supported 
    </video>
    <div id="display"></div>
  </body>
</html> 
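The example above relies on the numeric mode constants (SHOWING, HIDDEN, OFF) that it reads from the TextTrack object. The current HTML standard instead defines mode as one of the strings "showing", "hidden", or "disabled". The following is a minimal sketch of a track switcher that tries to handle both; modeValue and selectTrack are hypothetical names, and detecting the older API by checking for a numeric SHOWING property is an assumption.

```javascript
// Hypothetical helper: translate a state name into the value the
// track's mode property expects on this browser.
function modeValue(track, state) {
  if (typeof track.SHOWING === "number") {
    // Numeric-constant API, as used in the example above.
    if (state === "showing") { return track.SHOWING; }
    if (state === "hidden") { return track.HIDDEN; }
    return track.OFF;
  }
  return state; // String API: "showing", "hidden", or "disabled".
}

// Hypothetical helper: show the track at index and turn off all others.
function selectTrack(trackList, index) {
  for (var i = 0; i < trackList.length; i++) {
    var state = (i === index) ? "showing" : "disabled";
    trackList[i].mode = modeValue(trackList[i], state);
  }
}

// In a page, this might be called as:
//   selectTrack(document.getElementById("video1").textTracks, 1);
```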

Create a webpage slide show companion for your video

The track element can be used for more than just showing translations or commentary. By using the kind="metadata" attribute on a track element, you can specify that a track is active (synchronized with the video), but not displayed in the player. An earlier example used the characteristic behavior of kind="metadata" to turn off the built-in text display so that the text could be displayed separately. But the metadata track can be used for much more.

In the next example, the video has two tracks associated with it: one contains captions, and the other is a list of URLs. When the video is played, the caption track is displayed in the player, and the URLs are displayed in an iframe element to create a slide show. The URL is also printed under the video player as a live link that opens in a separate tab or window using the target="_blank" attribute on the anchor tag.

The format of the metadata file is a standard WebVTT file (see Create WebVTT or TTML files with Caption Maker), except that URLs are used instead of captions. This example shows the first few lines of a sample file (called "control1.vtt"):

WEBVTT

00:00:00.700 --> 00:00:06.342
https://msdn.microsoft.com/en-us/ie

00:07.210 --> 00:15.441
https://msdn.microsoft.com/en-us/library/ie/hh828809(v=vs.85).aspx

00:15.441 --> 00:34.861
https://msdn.microsoft.com/en-us/library/ie/hh771820(v=vs.85).aspx

00:34.861 --> 00:57.491
http://ie.microsoft.com/testdrive/

The "control1.vtt" file was created by copying and pasting URLs into the caption field of Caption Maker for the segments of the video where the content was appropriate.

<!DOCTYPE html>
<html>
  <head>
    <title>Metadata track driven slideshow</title>
    <style type="text/css">
      #slideshow
      {
        /* set the iframe dimensions large enough to see */  
        width:100%;
        height:400px;
      }       
    </style>
    <script type="text/javascript">
      document.addEventListener("DOMContentLoaded", function () {  // don't run this until all DOM content is loaded 
        var tracks = document.getElementById("video1").textTracks;
        // Set the metadata track to be active
        tracks[1].mode = tracks[1].SHOWING;  // Make sure it's active 
        var track2 = document.getElementById("control1"); // get the track object to add an event to 
        
        //  This is the main event that drives the slide show
        track2.addEventListener("cuechange", function () {          
          var myCues = this.track.activeCues;   // get an array of current cues.
          if (myCues.length > 0) {              // test to be sure there's cues
            document.getElementById("slideshow").src = myCues[0].text;  // get the text part of the cue (URL)
            // build an anchor tag and set it to the innerHTML of the span tag 
            document.getElementById("currentURL").innerHTML = "<a href='" + myCues[0].text + "' target='_blank' >" + myCues[0].text + "</a>";
          }
        }, false);
      }, false);      
</script>
</head>
  <body>
    <video id="video1" autoplay controls>
      <source src="http://ie.microsoft.com/testdrive/Videos/BehindIE9ModernWebStandards/Video.mp4" type="video/mp4"/>
      <track id="track1" label="English captions" src="entrack.vtt" kind="subtitles" srclang="en" default >    
      <track id="control1" label="command track" src="control1.vtt" kind="metadata">
    </video>
    <div>URL: <span id="currentURL"></span></div>
    <iframe id="slideshow" ></iframe>
  </body>
</html>

This example uses the cuechange event, which is attached to the metadata track element (id="control1", held in the track2 variable), and the activeCues property gets the current cue text that contains the URL. The src attribute for the iframe is set to the URL contained in the current cue text to display the webpage. The URL is also used to build an anchor link that's used as the innerHTML of a span element. This displays as a live link.

The mode property is set to SHOWING for the metadata track to ensure that it's active. However, while the mode is set to SHOWING, a metadata track will never be visible.

A metadata track can also be used to provide more info about the caption track, such as adding speaker information, or notes about who's talking. See Internet Explorer 10 video captioning demo for an example.
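Because the slide show writes cue text straight into an iframe src and into innerHTML, it may be worth checking that the text looks like a plain web address before using it. The following is a sketch under that assumption, not part of the original sample; isSlideUrl is a hypothetical name, and the pattern is deliberately strict (http or https only, no spaces, quotes, or angle brackets).

```javascript
// Hypothetical helper: accept only simple http/https URLs from cue text.
function isSlideUrl(cueText) {
  var text = String(cueText).replace(/^\s+|\s+$/g, ""); // trim whitespace
  return /^https?:\/\/[^\s"'<>]+$/i.test(text);
}

// Inside the cuechange handler, this might be used as:
//   if (myCues.length > 0 && isSlideUrl(myCues[0].text)) {
//     document.getElementById("slideshow").src = myCues[0].text;
//   }
```

A cue that fails the check is simply skipped, so the previous slide stays in place until the next valid URL arrives.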

Hosting track files and going further

As mentioned in Create WebVTT or TTML files with Caption Maker, make sure that the server hosting your track files is configured to serve them with the correct MIME type.

Track file to serve | Extension setting | MIME type setting
Timed Text Markup Language (TTML) | .ttml | application/ttml+xml
Web Video Text Track (WebVTT) | .vtt | text/vtt
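If track files are delivered by custom server-side script rather than through server configuration, the same mapping could be expressed as a small lookup. This is only an illustration; trackMimeType and TRACK_MIME_TYPES are hypothetical names, and the two entries come from the table above.

```javascript
// Extension-to-MIME-type mapping for the two track file formats.
var TRACK_MIME_TYPES = {
  ".ttml": "application/ttml+xml",
  ".vtt": "text/vtt"
};

// Hypothetical helper: return the MIME type for a track file name,
// or null if the extension is not a known track format.
function trackMimeType(fileName) {
  var dot = fileName.lastIndexOf(".");
  if (dot === -1) {
    return null;                                  // no extension
  }
  var ext = fileName.slice(dot).toLowerCase();    // for example ".vtt"
  if (Object.prototype.hasOwnProperty.call(TRACK_MIME_TYPES, ext)) {
    return TRACK_MIME_TYPES[ext];
  }
  return null;
}
```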

 

To test with working files, you can substitute a ".txt" extension for the ".vtt" on WebVTT files and have it still work. You'll still need to make sure that the internal format is correct as described in Create WebVTT or TTML files with Caption Maker. However, this is considered an unsupported format for Windows Internet Explorer, so only try this in a testing environment. It might stop working at some time.

The examples shared here show the basics of adding accessibility with closed captioning, subtitles and comments, and scripted external content to your HTML5 video webpages. Hopefully these examples inspire you to try them on your own webpages.

Create WebVTT or TTML files with Caption Maker

Getting started with the track element

How to create effective fallback strategies

HTML5 Timed Text Track sample

HTML5 audio and video

Internet Explorer 10 video captioning demo

Internet Explorer 10 Samples and Tutorials

Make your videos accessible with Timed Text Tracks