1.3 Overview

Initiating a multimedia session or conference requires the parties involved to exchange media details, transport addresses, and other metadata. This exchange allows the parties to negotiate the media characteristics of the session. The media characteristics are described using the Session Description Protocol (SDP), and the exchange and negotiation of session properties follow the SDP offer/answer model. Applications use these protocols together to negotiate and establish a multimedia session.
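The negotiation step can be sketched as follows. This is a deliberately simplified illustration of the offer/answer idea, not the procedure defined by this protocol: the function name answer_formats, the dictionary representation, and the sample format lists are assumptions made for the example, and real answers carry full SDP media descriptions rather than bare format names.

```python
from typing import Dict, List


def answer_formats(offered: Dict[str, List[str]],
                   supported: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each offered media type, keep the offered formats the answerer supports.

    The offerer's ordering is preserved. An empty list stands for a
    rejected stream in this simplified model.
    """
    answer: Dict[str, List[str]] = {}
    for media, formats in offered.items():
        local = set(supported.get(media, []))
        answer[media] = [fmt for fmt in formats if fmt in local]
    return answer


# Hypothetical offer and local capabilities, for illustration only.
offer = {"audio": ["PCMU", "G722", "SILK"], "video": ["H264"]}
local_caps = {"audio": ["SILK", "PCMU"]}
result = answer_formats(offer, local_caps)
print(result)
```

In this sketch the audio stream is accepted with the two formats both sides share, and the video stream is rejected because the answerer lists no video formats.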

Media exchanged in a multimedia session commonly has to be protected by some form of encryption. When the Real-Time Transport Protocol (RTP) carries the media, the media can be protected using the Secure Real-Time Transport Protocol (SRTP), which requires the parties to exchange SRTP-related attributes, such as cryptographic parameters. These characteristics can also be negotiated using SDP, in which case the cryptographic characteristics of the media stream are described by an SDP attribute named crypto, as described in [RFC4568].
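To make the shape of the crypto attribute concrete, the following is a minimal parsing sketch based on the general layout given in [RFC4568]: a tag, a crypto-suite, key parameters, and optional session parameters. The class and function names are hypothetical, the key value is illustrative, and the sketch does not reflect which parameters and values this protocol actually supports (see section 3 for those).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CryptoAttribute:
    tag: int
    suite: str
    key_method: str
    key_info: str              # base64 key and salt, left undecoded here
    lifetime: Optional[str]
    mki: Optional[str]         # "value:length" form
    session_params: List[str]


def parse_crypto(line: str) -> CryptoAttribute:
    """Parse one 'a=crypto:' line (hypothetical helper, first key only)."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto line")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("expected tag, crypto-suite and key-params")
    tag, suite, key_params = fields[0], fields[1], fields[2]
    # RFC 4568 allows several key parameters separated by ';'.
    # Only the first one is examined in this sketch.
    first_key = key_params.split(";")[0]
    method, _, info = first_key.partition(":")
    parts = info.split("|")
    lifetime: Optional[str] = None
    mki: Optional[str] = None
    for extra in parts[1:]:
        # The MKI field contains ':'; the lifetime field does not.
        if ":" in extra:
            mki = extra
        else:
            lifetime = extra
    return CryptoAttribute(int(tag), suite, method, parts[0],
                           lifetime, mki, fields[3:])


sample = ("a=crypto:1 AES_CM_128_HMAC_SHA1_80 "
          "inline:WVNfX19zZW1jdGwgKCkgewkyMjA7fQp9CnVubGVz|2^20|1:4")
attr = parse_crypto(sample)
print(attr.suite, attr.lifetime, attr.mki)
```

The sketch keeps the key material as an opaque string because decoding and validating it depends on the negotiated crypto-suite.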

After the session is established, media can flow between the participating parties. Often, intermediate network components, such as network address translation (NAT) devices, sit between the two parties and prevent media from passing between them. In such cases, Interactive Connectivity Establishment (ICE) can be used to facilitate media traversal through these components. ICE specifies a protocol for setting up the audio/video RTP streams in a way that allows the streams to traverse NATs. For information about the proprietary extension to the ICE protocol, see [MS-ICE].

This protocol uses and extends the protocols described above to support multimedia sessions. The extension consists of the following additions, enhancements, and restrictions:

  • Specifies the parameters and values that are supported for the attribute a=crypto, as specified in the security description for media streams.

  • SRTP and Scale Secure Real-Time Transport Protocol (SSRTP) encryption parameters are not renegotiated once the session is established.

  • An SDP attribute named a=cryptoscale is used for the negotiation of all the cryptographic parameters associated with SSRTP.

  • The RTAudio, G722-Stereo, SILK, RTVideo and H.264UC codecs, in addition to RTData, are supported.

  • A media format for video forward error correction (FEC), ULPFEC-UC.

  • SRTP encryption is optional. This allows for support of remote peers that do not support SRTP.

  • Transmission Control Protocol (TCP) media addresses in SDP are not used when ICE is not used. This is a deviation from the specification of TCP-based media transport in the SDP.

  • TCP media addresses in SDP are not used in the first Session Initiation Protocol (SIP) INVITE method when ICE is used.

  • Addresses are not used for the rtcp attribute.

  • Limited support for the setup attribute. This is a deviation from the TCP-based media transport in the SDP.

  • Limited support for the connection attribute. This is a deviation from the TCP-based media transport in the SDP.

  • This protocol supports early media only in specific scenarios and in a constrained manner; early media is not supported in any other scenario. See section 3 for details about these scenarios and constraints.

  • Special handling of Internet Protocol version 6 (IPv6) addresses in SDP.

  • Several deviations from the ICE specifications.

  • The session version on the o= line is not incremented.

  • The format for dual-tone multi-frequency (DTMF) in SDP.

  • A restriction on the name of the RTP payload for redundant audio data.

  • A restriction on the name and sampling rate for comfort noise.

  • A media type m=applicationsharing, which identifies a Remote Desktop Protocol (RDP)-based application sharing media stream, or session, over RTP. In the context of the application sharing media stream, m=applicationsharing, five new attributes are defined:

    • a=x-applicationsharing-session-id: Identifies an RDP session.

    • a=x-applicationsharing-role ("sharer" or "viewer"): Determines the party sharing role.

    • a=x-applicationsharing-media-type: Negotiates the RDP media type.

    • a=mid: An identifier of the media described by the containing m= line.

    • a=x-applicationsharing-contentflow: Determines the direction of flow of RDP content.

  • A media level attribute a=tty, which indicates that a user agent has been configured to optimize the transfer of tones used in text telephony.

  • A video media level attribute a=x-caps, which indicates the video capabilities supported by a video receiver.

  • A Multipurpose Internet Mail Extensions (MIME) type application/GW-SDP is defined. This MIME type holds the gateway SDP or SIP trunk SDP.

  • A media level attribute a=x-bypassid, which is a declarative attribute used to indicate the location of the media endpoint or media processor associated with this SDP. It is used to optimize the media path with a gateway in the same location.

  • A media level attribute a=x-bypass, which is a declarative attribute that signifies that the media line with which it is associated involves bypass. It is a media level attribute sent in an answer when the answerer has chosen the bypass path.

  • A media level attribute a=x-mediasettings, which contains the stream capabilities of the sender.

  • A media level attribute a=x-ms-SDP-diagnostics, which is used to notify the recipient of additional diagnostic information for that media line.

  • A media level attribute a=feature:MoH, associated with an m=audio media type, which is a declarative attribute indicating that the media channel is streaming music-on-hold media.

  • A session level attribute a=x-mediabw to declare the maximum send and receive bandwidth available for a modality.

  • A session level attribute a=x-devicecaps to declare the device capabilities on an endpoint.

  • A media level attribute a=rtcp-fb to declare that certain RTCP-based feedback messages can be sent and received for the media stream.

  • A media level attribute a=rtcp-rsize to indicate that certain RTCP-based feedback messages are transmitted in a Reduced-Size format, as defined in [MS-RTP].

  • A media level attribute a=x-ssrc-range to declare the range used for Synchronization Source (SSRC) identifiers for the send stream.

  • A media level attribute a=x-source-streamid to declare a Media Source ID (MSI) for a media stream.

  • A media level attribute a=x-source to declare the name of a media source for a media description.

  • A media level attribute a=x-sourceid to declare that the modality of a media stream is panoramic-video.

  • A protocol to negotiate multiplexed media streams.

  • A protocol to negotiate a multi-channel main-video modality.

  • A media level attribute a=x-candidate-info to declare additional attributes associated with a transport address or ICE candidate.

  • A protocol to negotiate a video-based applicationsharing modality.

For details about these extensions and deviations, see section 3.
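One item in the list above, that the session version on the o= line is not incremented, has a practical consequence that can be sketched in code. The o= line has six fields in standard SDP: username, session ID, session version, network type, address type, and unicast address. The helper names below are hypothetical, the sample descriptions use documentation addresses, and comparing description content to detect a change is an assumption made for illustration rather than behavior described in this section.

```python
from typing import List, NamedTuple


class Origin(NamedTuple):
    username: str
    session_id: str
    session_version: str
    net_type: str
    addr_type: str
    address: str


def parse_origin(sdp: str) -> Origin:
    """Return the fields of the first o= line in an SDP body."""
    for line in sdp.splitlines():
        if line.startswith("o="):
            fields = line[2:].split()
            if len(fields) != 6:
                raise ValueError("o= line must have six fields")
            return Origin(*fields)
    raise ValueError("no o= line found")


def _normalized(sdp: str) -> List[str]:
    # Strip whitespace and drop empty lines before comparing.
    return [line.strip() for line in sdp.splitlines() if line.strip()]


def description_changed(old_sdp: str, new_sdp: str) -> bool:
    """Detect a change by comparing content, not session versions."""
    return _normalized(old_sdp) != _normalized(new_sdp)


first = "v=0\r\no=- 0 0 IN IP4 192.0.2.1\r\ns=session\r\nm=audio 5000 RTP/AVP 0\r\n"
second = "v=0\r\no=- 0 0 IN IP4 192.0.2.1\r\ns=session\r\nm=audio 5002 RTP/AVP 0\r\n"

same_version = (parse_origin(first).session_version
                == parse_origin(second).session_version)
print(same_version, description_changed(first, second))
```

Here the two descriptions carry the same session version even though the audio port differs, so a receiver that relied only on the version field would miss the change.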