1.1 Glossary

This document uses the following terms:

audio healer: One or more digital signal processing algorithms designed to mask or conceal human-perceptible audio distortions that are caused by packet loss and jitter.

audio video profile (AVP): A Real-Time Transport Protocol (RTP) profile that is used specifically with audio and video, as described in [RFC3551]. It provides interpretations of generic fields that are suitable for audio and video media sessions.

codec: An algorithm that is used to convert media between digital formats, especially between raw media data and a format that is more suitable for a specific purpose. Encoding converts the raw data to a digital format. Decoding reverses the process.

Comfort Noise payload: A description of the noise level of comfort noise. The description can also contain spectral information in the form of reflection coefficients for an all-pole model of the noise.

Common Intermediate Format (CIF): A picture format, described in the H.263 standard, that is used to specify the horizontal and vertical resolutions of pixels in YCbCr sequences in video signals.

conference: A Real-Time Transport Protocol (RTP) session that includes more than one participant.

connectionless protocol: A transport protocol that enables endpoints (2) to communicate without a previous connection arrangement and that treats each packet independently as a datagram. Examples of connectionless protocols are Internet Protocol (IP) and User Datagram Protocol (UDP).

connection-oriented transport protocol: A transport protocol that enables endpoints (2) to communicate after first establishing a connection and that treats each packet according to the connection state. An example of a connection-oriented transport protocol is Transmission Control Protocol (TCP).

contributing source (CSRC): A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts a list of the synchronization source (SSRC) identifiers of the sources that contributed to the generation of a particular packet into the RTP header of that packet. This list is called the CSRC list. An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer). See [RFC3550] section 3.

datagram: A style of communication offered by a network transport protocol where each message is contained within a single network packet. In this style, there is no requirement for establishing a session prior to communication, as opposed to a connection-oriented style.

dominant speaker: A participant whose speech is both detected by a mixer and perceived to be dominant at a specific moment. Heuristics typically are used to determine the dominant speaker.

dual-tone multi-frequency (DTMF): In telephony systems, a signaling system in which each digit is associated with two specific frequencies. This system typically is associated with touch-tone keypads for telephones.

encryption: In cryptography, the process of obscuring information to make it unreadable without special knowledge.

endpoint: (1) A client that is on a network and is requesting access to a network access server (NAS).

(2) A device that is connected to a computer network.

FEC distance: A number that specifies an offset from the current packet to a previous audio packet that is to be sent as redundant audio data.

forward error correction (FEC): A process in which a sender uses redundancy to enable a receiver to recover from packet loss.

I-frame: A video frame that is encoded as a single image, such that it can be decoded without any dependencies on previous frames. Also referred to as Intra-Coded frame, Intra frame, and key frame.

instantaneous decoder refresh (IDR): A video frame that can be decoded without reference to previous video frames.

Interactive Connectivity Establishment (ICE): A methodology that was established by the Internet Engineering Task Force (IETF) to facilitate the traversal of network address translation (NAT) by media.

Internet Protocol version 4 (IPv4): An Internet protocol that has 32-bit source and destination addresses. IPv4 is the predecessor of IPv6.

Internet Protocol version 6 (IPv6): A revised version of the Internet Protocol (IP) designed to address growth on the Internet. Improvements include a 128-bit IP address size, expanded routing capabilities, and support for authentication and privacy.

jitter: A variation in a network delay that is perceived by the receiver of each packet.

Media Source ID (MSI): A 32-bit identifier that uniquely identifies an audio or video source in a conference.

mixer: An intermediate system that receives a set of media streams of the same type, combines the media in a type-specific manner, and redistributes the result to each participant.

network address translation (NAT): The process of converting between IP addresses used within an intranet, or other private network, and Internet IP addresses.

packetization time (P-time): The amount, in milliseconds, of audio data that is sent in a single Real-Time Transport Protocol (RTP) packet.

participant: A user who is participating in a conference or peer-to-peer call, or the object that is used to represent that user.

Real-Time Transport Control Protocol (RTCP): A network transport protocol that enables monitoring of Real-Time Transport Protocol (RTP) data delivery and provides minimal control and identification functionality, as described in [RFC3550].

Real-Time Transport Protocol (RTP): A network transport protocol that provides end-to-end transport functions that are suitable for applications that transmit real-time data, such as audio and video, as described in [RFC3550].

RTCP packet: A control packet consisting of a fixed header part similar to that of RTP packets, followed by structured elements that vary depending upon the RTCP packet type. Typically, multiple RTCP packets are sent together as a compound RTCP packet in a single packet of the underlying protocol; this is enabled by the length field in the fixed header of each RTCP packet. See [RFC3550] section 3.

RTP packet: A data packet consisting of the fixed RTP header, a possibly empty list of contributing sources, and the payload data. Some underlying protocols may require an encapsulation of the RTP packet to be defined. Typically one packet of the underlying protocol contains a single RTP packet, but several RTP packets can be contained if permitted by the encapsulation method. See [RFC3550] section 3.

RTP payload: The data transported by RTP in a packet, for example audio samples or compressed video data. For more information, see [RFC3550] section 3.

RTP session: An association among a set of participants who are communicating by using the Real-Time Transport Protocol (RTP), as described in [RFC3550]. Each RTP session maintains a full, separate space of Synchronization Source (SSRC) identifiers.

RTVideo: A video stream that carries an RTVC1 bit stream.

Session Description Protocol (SDP): A protocol that is used for session announcement, session invitation, and other forms of multimedia session initiation. For more information see [MS-SDP] and [RFC3264].

Session Initiation Protocol (SIP): An application-layer control (signaling) protocol for creating, modifying, and terminating sessions with one or more participants. SIP is defined in [RFC3261].

silence suppression: A mechanism for conserving bandwidth by detecting silence in the audio input and not sending packets that contain only silence.

stream: A flow of data from one host to another host, or the data that flows between two hosts.

Super P-frame (SP-frame): A special P-frame that uses the previous cached frame instead of the previous P-frame or I-frame as a reference frame.

synchronization source (SSRC): The source of a stream of RTP packets, identified by a 32-bit numeric SSRC identifier carried in the RTP header so as not to be dependent upon the network address. All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback. Examples of synchronization sources include the sender of a stream of packets derived from a signal source such as a microphone or a camera, or an RTP mixer. A synchronization source may change its data format (for example, audio encoding) over time. The SSRC identifier is a randomly chosen value meant to be globally unique within a particular RTP session. A participant need not use the same SSRC identifier for all the RTP sessions in a multimedia session; the binding of the SSRC identifiers is provided through RTCP. If a participant generates multiple streams in one RTP session, for example from separate video cameras, each MUST be identified as a different SSRC. See [RFC3550] section 3.

Transmission Control Protocol (TCP): A protocol used with the Internet Protocol (IP) to send data in the form of message units between computers over the Internet. TCP handles keeping track of the individual units of data (called packets) that a message is divided into for efficient routing through the Internet.

Traversal Using Relay NAT (TURN): A protocol that is used to allocate a public IP address and port on a globally reachable server for the purpose of relaying media from one endpoint (2) to another endpoint (2).

TURN server: An endpoint (2) that receives Traversal Using Relay NAT (TURN) request messages and sends TURN response messages. The protocol server acts as a data relay, receiving data on the public address that is allocated to a protocol client and forwarding that data to the client.

User Datagram Protocol (UDP): The connectionless protocol within TCP/IP that corresponds to the transport layer in the ISO/OSI reference model.

video encapsulation: A mechanism for transporting video payload and metadata in Real-Time Transport Protocol (RTP) packets.

video frame: A single still image that is shown as part of a quick succession of images in a video.

MAY, SHOULD, MUST, SHOULD NOT, MUST NOT: These terms (in all caps) are used as defined in [RFC2119]. All statements of optional behavior use either MAY, SHOULD, or SHOULD NOT.