NAT Traversal and VoIP

03/24/2010


Windows Mobile: Not Supported. Windows Embedded CE: Supported.

8/27/2008

The Challenge to Peer-to-Peer Internet Communication

In an ideal world, all Internet devices would be able to communicate with each other without restriction. The only intermediaries in end-to-end communication would be routers. Each device would have a routable IP address, giving it a publicly reachable Internet identity. Unfortunately, this isn't an ideal world or an ideal Internet. It is an Internet connected to private intranets with private (not publicly routable) IP addresses. These private networks use firewalls to keep out malicious attackers, and use NAT (Network Address Translation) on NAT routers or NAT firewall/routers to pass traffic between the public IP space and the private, internal one. This passing of traffic through NAT is called NAT Traversal.

[Figure: VoIP calls are Peer-to-Peer]

The following bulleted item refers to the above figure, "VoIP calls are Peer-to-Peer":

The IP address:port exposed by each Voice over IP (VoIP) client is its public Internet address. No NAT or firewall changes or interferes with the signaling setup or with media transmission and reception.

In brief, NAT works as follows. When a device on the internal, private network initiates a connection with a device on the public Internet, the initiating device sends all traffic to the NAT router first. The NAT router replaces the source address, which is the device's unroutable private address, with the NAT router's own public address (and usually replaces the source port as well) before passing the traffic on to its Internet destination. When a response is received, the NAT router searches its translation tables, also known as mapping tables, to find the original internal IP address and port of the initiating device. It then passes the response back to that device.
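The translation table described above can be pictured with a minimal sketch. This is an illustration only, not how any particular NAT router is implemented; the class name SimpleNat, the starting port 40000, and the sample addresses are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class SimpleNat:
    """Hypothetical, simplified NAT keeping one mapping per internal endpoint."""
    public_ip: str
    next_port: int = 40000  # assumed start of the public port pool
    out_map: Dict[Endpoint, int] = field(default_factory=dict)  # internal -> public port
    in_map: Dict[int, Endpoint] = field(default_factory=dict)   # public port -> internal

    def translate_outbound(self, src: Endpoint) -> Endpoint:
        """Replace a private source endpoint with the NAT's public endpoint."""
        if src not in self.out_map:
            port = self.next_port
            self.next_port += 1
            self.out_map[src] = port
            self.in_map[port] = src
        return (self.public_ip, self.out_map[src])

    def translate_inbound(self, public_port: int) -> Optional[Endpoint]:
        """Look up the internal endpoint for a response; None means no mapping."""
        return self.in_map.get(public_port)
```

An inbound packet for a public port with no mapping returns None here, which corresponds to the router discarding traffic it cannot resolve.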

As you might suspect, while the NAT router can keep a mapping of the internal devices that are calling out to the Internet and fix up the addresses and ports, it has a more difficult problem when a device on the Internet calls into the internal network. That external device knows only the public information: it knows the address of the NAT router, not the internal address of the device it is seeking. In this case there needs to be some rule that tells the NAT router what to do with the message and how to resolve the address so that the message can be routed to the appropriate internal address. Without such a mechanism, the router simply discards the message and disallows the connection. Firewalls are meant to stop unknown connections from getting through, so if a connection can't be resolved, the safest course is not to let it through the firewall at all.

Different mechanisms are used to deal with this issue. The most common are what is referred to as a "Software Perimeter Network" and a technique called "port forwarding." In the first case, the NAT supports simple rules that tell it what to do in specific instances, such as "pass all incoming connection requests to the device with address XXX.XXX.X.X." In port forwarding, the NAT router passes incoming connection requests to different devices on the internal network depending on the type of connection, such as web connections or email connections. If several devices on the internal network may each need to receive the same type of connection from outside, the rules break down and neither mechanism will do the trick.
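The two rule styles above, and the point where they break down, can be sketched roughly as follows. The function names, the perimeter_host parameter, and the sample addresses are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


def build_forwarding_table(rules: Iterable[Tuple[int, Endpoint]]) -> Dict[int, Endpoint]:
    """Build a port-forwarding table: one public port maps to one internal endpoint."""
    table: Dict[int, Endpoint] = {}
    for public_port, target in rules:
        if public_port in table:
            # Two internal devices cannot both own the same public port.
            raise ValueError(
                f"public port {public_port} is already forwarded to {table[public_port]}"
            )
        table[public_port] = target
    return table


def route_inbound(table: Dict[int, Endpoint], public_port: int,
                  perimeter_host: Optional[str] = None) -> Optional[Endpoint]:
    """Resolve an unsolicited inbound connection, or return None to discard it."""
    if public_port in table:
        return table[public_port]            # port forwarding rule
    if perimeter_host is not None:
        return (perimeter_host, public_port)  # "pass everything to one device" rule
    return None
```

Trying to forward the SIP port 5060 to two different phones raises ValueError in this sketch, which is the situation the paragraph describes: static rules cannot serve several internal devices that expect the same kind of inbound connection.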

These are the challenges that all peer-to-peer communications, including VoIP, must overcome. The challenge is industry-wide in scope and far from trivial.

For more information about NAT, please see Network Address Translation and its subsections.

For more information about NAT Traversal, please see NAT Traversal and also, under the RTC Client API documentation, Firewalls and NATs.

For an overview of Voice over IP services, please see VoIP Technology Overview and also, VoIP Technology and Protocol Overview.

NAT Traversal and VoIP

VoIP also adds a new wrinkle to the problem of NAT Traversal. Conventional VoIP protocols deal only with the signaling portion of the telephone call. The audio portion is handled by another protocol, and the port used for audio can be random. A NAT router might be able to handle the signaling traffic (the SIP protocol), but it has no way of knowing that the audio traffic is related to the signaling traffic and should therefore be passed to the same device to which it is passing the signaling traffic. Hence, audio traffic is not translated properly between the address spaces. The result is a situation where things seem to be working at first. The caller and the call receiver appear to be making contact. The phone rings and the caller hears ringback. However, once the receiver picks up the phone, the audio may not be present. Either one party cannot hear the other (one-way audio), or neither caller nor call receiver can hear the other (two-way audio loss).

[Figure: Normal call path, SIP and Media]

The following bulleted items refer to the above figure, "Normal call path, SIP and Media":

Client A registers with a Registrar, which is usually a SIP Proxy server as well.
Client A makes a call to Client B using one or more SIP Proxy servers that find and route the signaling message to Client B.
Once a connection has been negotiated, media is transferred directly between Client A and Client B.
There are two media streams, since RTP is one-directional. One media stream runs from Client A to Client B. The other media stream is going from Client B to Client A.

Why doesn't NAT know about the media portion of the call? Many of the communication parameters in SIP, the signaling protocol, are transmitted within the SIP message itself, such as the IP address and port numbers that are to be used for both signaling and media transport. A SIP device behind a NAT does not know how it will be perceived on the Internet. All it knows is its own IP address and the port numbers to be used in SIP. A VoIP device relies on the NAT to do the translation and to do it correctly. However, the NAT only translates the address information in the IP header. It does not translate the IP address and port information inside the SIP packet. Yet that is the information the receiving device will use. The receiving device (and the proxy servers along the way) will use the internal IP addresses and ports obtained from inside the SIP packet for the media streams. That information is wrong, and the audio connection cannot be made.
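To make the problem concrete, the following sketch inspects a small SDP body, of the kind carried inside a SIP message, and reports media endpoints that advertise a private address. The sample offer, the function names, and the simplifying assumption that a single session-level "c=" line precedes the "m=" lines are all illustrative.

```python
import ipaddress

# Hypothetical SDP offer as a client behind a NAT might generate it.
SDP_OFFER = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.168.1.10",
    "s=-",
    "c=IN IP4 192.168.1.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
]) + "\r\n"


def media_endpoints(sdp):
    """Return the (address, port) pairs advertised for media in an SDP body."""
    address = None
    endpoints = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            address = line.split()[2]      # connection address
        elif line.startswith("m="):
            port = int(line.split()[1])    # media port
            endpoints.append((address, port))
    return endpoints


def unroutable(endpoints):
    """Keep only endpoints whose address is private and so unreachable from outside."""
    return [ep for ep in endpoints if ipaddress.ip_address(ep[0]).is_private]
```

A NAT that rewrites only the IP header leaves this body untouched, so the remote party would try to send audio to 192.168.1.10:49170.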

Also, when the device initiates a session, making itself available to send or receive calls, it registers its current location with a SIP Registrar. Again, if the client doesn't use any sort of NAT traversal technique, the only information it has is its private, internal IP address and ports, so it registers with its private IP address and port. This unroutable information is what the SIP Registrar stores. When calling applications contact the Registrar to get a valid address for the device, meaning a public Internet address, they get the unroutable one instead and the call fails.

The registration itself may also fail, because the Registrar may not be able to reach the device through the firewall while the device is behind a NAT.

Even if a connection were made, the same situation arises when it comes time to transmit and receive the media streams. Starting with the SIP INVITE message, the devices negotiate a common set of media parameters. The initial negotiation uses the Session Description Protocol (SDP) to convey information about the media streams, the addresses to use, the codec types, bandwidth requirements and so on. The problem is that the SDP conveys the wrong information as well if the VoIP client has not used NAT discovery and traversal techniques beforehand. During that negotiation the client again knows only the private, internal IP address of the device and therefore does not have a valid, routable address and port that can be used in peer-to-peer communications.

With NAT, another hurdle to overcome is the need to refresh the NAT mapping. A NAT removes mappings that have been idle for some time. This is done for security reasons, so that an unused connection cannot hang around and possibly be misused by an outside attacker. For VoIP, however, a connection may be dropped simply because it wasn't being used frequently enough to keep the mapping alive. VoIP devices and applications must refresh the connection to keep their NAT mapping alive and usable, so that a call in progress or an incoming call is not dropped.
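The relationship between the NAT's idle timeout and the client's refresh interval can be sketched with a small simulation. The function name and the timing values used with it are assumptions for illustration; real NAT timeouts vary by device and are generally not known to the client in advance.

```python
def survives_idle(timeout_s, keepalive_s, idle_s):
    """Return True if a NAT mapping survives idle_s seconds without call traffic.

    timeout_s   -- idle time after which the NAT removes the mapping (assumed)
    keepalive_s -- interval between refresh packets, or None for no refresh
    idle_s      -- length of the otherwise silent period
    """
    if keepalive_s is not None and keepalive_s <= 0:
        raise ValueError("keepalive interval must be positive")
    interval = keepalive_s if keepalive_s is not None else idle_s
    last_refresh = 0.0
    now = 0.0
    while now < idle_s:
        now = min(now + interval, idle_s)
        if now - last_refresh >= timeout_s:
            return False       # mapping expired before the next packet went out
        last_refresh = now     # a packet crossed the NAT and refreshed the mapping
    return True
```

With an assumed 30-second timeout, a 20-second refresh keeps the mapping alive through a 5-minute silence, while a 45-second refresh or no refresh at all loses it.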

[Figure: Firewall with NAT, no NAT Traversal used by clients behind wall]

The following bulleted items refer to the above figure, "Firewall with NAT, no NAT Traversal used by clients behind wall":

Client A, inside the firewall, makes a call but uses its internal, private IP address and port, in the SIP message. Client B receives the information, but replies using the private IP:port, not the address mapping used by NAT. The address is not valid and the call fails. (It likely does not even reach the firewall at all.)
Client C has the correct mapped address for Client A and tries to make a call to Client A. The call still fails since the Firewall rejects an unsolicited inbound UDP message.
Client A can call out, but Clients B & C cannot call in.
Therefore, Client A cannot receive incoming calls or complete outgoing calls. It must determine and use the address mappings and also find a way to allow incoming unsolicited calls.

A final issue to be dealt with is the nature of firewalls. A firewall will allow a device to call out but will not allow an unknown party to call in. It blocks unknown, uninvited UDP traffic, and UDP is the usual means by which Voice over IP data is sent. This is simply good security, of course. Unknown, unrequested connections must be regarded with suspicion, and this is good practice for most networks.

However, the nature of VoIP, and of peer-to-peer connections in general, is just the opposite. At any time, a phone call may come in to a VoIP device, and from an unknown caller to boot. It is only after the caller is allowed in, and the device is ringing, so to speak, that the device being called decides whether or not to accept the connection.

Ways of Dealing with NAT Traversal in VoIP

There are two ways of dealing with NAT Traversal:

Ignore the problem. Don't use NAT at all. Only use public Internet IP addresses with your VoIP device.
Find a workaround to both the NAT Traversal issue and the Firewall UDP blocking problem.

The first solution is self-explanatory. It is also unlikely to solve the problem, since VoIP is becoming more prevalent and consumer and enterprise networks behind NATs are becoming more common. We will focus on the second alternative.

There are several methods for dealing with NAT Traversal. Their success rates vary, and so do the implementation issues each one raises.

Smart Firewall using an Application Layer Gateway (ALG)

The most comprehensive solution is one that is transparent to the VoIP device. If a firewall were simply aware of how to handle VoIP traffic, then calls could proceed as if they were public IP peer-to-peer communication even if they had to cross a firewall and a NAT. The role of the Firewall is to protect the network from unauthorized sources. Therefore, the Firewall blocks traffic based on three types of information: the source address, the destination address and the traffic type. The Firewall also makes decisions based on the direction of traffic flow, meaning that outbound connections are easier to make than inbound ones. The Firewall would prefer to allow only sessions initiated from a device on the trusted, private network.

[Figure: Enhanced Firewall with NAT, SIP aware Application Layer Gateway (ALG)]

The following bulleted items refer to the above figure, "Enhanced Firewall with NAT, SIP aware Application Layer Gateway (ALG)":

An enhanced firewall/NAT understands the needs of VoIP, the signaling messages and their relationship with the media streams.
SIP Proxy/Registrar server not shown in this diagram.

An Application Layer Gateway is an enhanced or smart Firewall/NAT. The ALG understands the signaling mechanisms and their relationship with the media transport streams. The ALG processes signaling and media streams, modifying the signal streams so that the public IP addresses and ports are used, for both signaling and media transport. The peer-to-peer connection now has the correct and routable IP address and ports that it needs for both the signaling side and the media transport part of the connection.
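The kind of fix-up an ALG performs on the signaling payload can be sketched as follows. This is a simplified illustration, not a description of any real ALG product: the function name rewrite_sdp and the port_map argument are hypothetical, only the "c=" and "m=" lines of an SDP body are handled, and other fields that can carry addresses (such as the "o=" line or SIP headers) are left out.

```python
def rewrite_sdp(sdp, public_ip, port_map):
    """Replace private media addresses and ports in an SDP body with public ones.

    public_ip -- the NAT's public address (assumed known to the ALG)
    port_map  -- private media port -> public port chosen by the NAT
    """
    out = []
    for line in sdp.split("\r\n"):
        if line.startswith("c=IN IP4 "):
            # Connection address: advertise the public address instead.
            line = "c=IN IP4 " + public_ip
        elif line.startswith("m="):
            # Media line: "m=<media> <port> <proto> <formats>"
            fields = line.split(" ")
            private_port = int(fields[1])
            fields[1] = str(port_map.get(private_port, private_port))
            line = " ".join(fields)
        out.append(line)
    return "\r\n".join(out)
```

Because the ALG sits on the NAT itself, it can create the media port mapping at the same moment it rewrites the message, which is what ties the media stream to the signaling session.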

This method requires that you update your Firewall/NAT to one that supports ALG functionality. It also requires more advanced configuration and management skills. Therefore, you cannot count on all VoIP devices behind a NAT to be behind an ALG. Larger corporations are likely to adopt ALGs faster than smaller, less staffed businesses.

Static Mapping using Fixed IP addresses and ports for clients

With Static Mapping, the VoIP device is configured manually with the public IP addresses and ports that the NAT will use for signaling and media. The NAT is also manually configured to use static mappings or bindings for each client. This gives the client VoIP device a fixed IP address and fixed ports so that it can receive SIP and media transport traffic without having to worry about the NAT dynamically configuring the connection or timing out and removing the mapping.

This method is only suitable for very small networks and ones where there is a great deal of experience on the part of the network support staff in configuring and managing NATs and Firewalls.

UPnP will likely supersede this static mapping method as it grows in popularity.

Tunneling

Tunneling is a technique that creates a tunnel through the existing Firewall/NAT. With this method, there is a new server within the private network and another in the public network. SIP traffic is passed between the two. The external server modifies the signaling to reflect the external server's outbound port details. This allows the VoIP device and the VoIP infrastructure of SIP Proxy servers, Registrars and so forth to make outgoing calls and accept incoming calls. The routing happens between the two tunnel endpoints rather than directly between the two VoIP devices, which can add delay in the media path and affect voice quality.

[Figure: Tunneling]

The following bulleted items refer to the above figure, "Tunneling":

Client A calls the regular SIP Proxy/Registrar, except that the tunnel/firewall configuration will modify the SIP message such that the tunnel servers will be the designated endpoints and therefore a relay for signaling flow and media flow.
All traffic travels through the tunnel between the internal tunnel server and the external tunnel server, which communicate with each other directly.
Since the tunnel servers are both in the media flow, they handle the mapping and exposing of real, public addresses and ports. As the source/destination, they can provide both source and destination IP:port information.
They relay both the signaling and the media sent by Client A to Client B and vice versa.

The Firewall does require some reconfiguring and changes to the Firewall's security policy, which will create additional security risks. In particular, since the tunnel through the Firewall/NAT is generally not encrypted, the external server side of the tunnel is a point of vulnerability.

Smart Proxy Servers and Session Border Controllers (SBCs)

This solution is often considered two different solutions based on who you talk to, which may have more to do with competing companies trying to differentiate their products. There is enough debate to warrant discussing both methods. Essentially, both methods are the same, or at least a very similar, workaround to the NAT Traversal issue.

In essence, this solution involves having a smart proxy server, one that is capable of reading and fixing up the internal IP addresses and ports to match up with the actual public IP address and port used by the NAT. This smart SIP Proxy server or Session Border Controller (SBC) sits outside the Firewall/NAT and receives the signaling traffic. It then replaces the source IP address and port, the private unroutable address, with either the exposed public IP address and port used by the NAT, or its own IP address and port. In the latter case, it will keep track of the mapping and act as a relay between the calling VoIP device and the receiver. It does the same procedure with the destination IP address and port, having the receiver relay any return traffic through the smart proxy/SBC.

[Figure: Session Border Controller/Smart Proxy]

The following bulleted items refer to the above figure, "Session Border Controller/Smart Proxy":

Client A calls the SBC/Smart Proxy as it might a regular SIP Proxy/Registrar, except that the SBC will modify the SIP message such that it will be the relay for signaling flow and media flow.
The SBC sits in the media flow and possibly the SIP flow.
Each SBC server, one for signaling and one for media, can be part of a combination unit in the same hardware device, so that they can talk to each other and be highly integrated.
Signaling streams can be passed along to in-between proxies by the SBC on the way to the receiver, Client B, just as with a normal SIP proxy, except the SBC is part of the transmission path.
The media SBC acts as a relay for the two media streams.
Since the SBC is in the media flow, it is the actual destination and can provide both source and destination IP:port information. It relays the media sent to it by Client A to Client B and vice versa.

The same thing happens with the media stream. It passes through the proxy, which acts as a relay: the smart media proxy fixes up the addresses and makes sure the media is routed correctly.
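A media relay of this kind can be sketched as a tiny state machine that learns each party's real source address from the packets it receives and forwards media to the other party. The class name, the leg labels "a" and "b", and the address-learning behavior are assumptions for illustration, not a description of a specific SBC or smart proxy.

```python
class MediaRelay:
    """Hypothetical two-legged media relay for a single call."""

    def __init__(self):
        # Last observed (IP, port) for each leg; None until a packet arrives.
        self.legs = {"a": None, "b": None}

    def receive(self, leg, src, payload):
        """Handle a packet arriving on one leg.

        Returns (destination, payload) to forward, or None if the other
        party's address has not been learned yet and the packet is dropped.
        """
        # Learn the sender's address as seen after NAT translation, rather
        # than trusting the private address advertised in the signaling.
        self.legs[leg] = src
        other = "b" if leg == "a" else "a"
        dst = self.legs[other]
        if dst is None:
            return None
        return (dst, payload)
```

Because the relay replies to the address each packet actually came from, the client must send and receive media on the same port, which is why such implementations depend on symmetric media in the VoIP device.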

There are wrinkles of course.

In one implementation, the SIP Proxy is modified to be the smart proxy as well, so that the Firewall/NAT configuration does not need to be altered. This smart proxy server is then responsible for fixing up the IP addresses and ports and assigning a media relay for the media transport. In order to reduce any media relay delays and loss in Quality of Service (QoS), this implementation puts the smart SIP proxy server near the network and distributes media relays throughout the Internet. It would then assign the relay according to geographic location and load balancing issues. It also requires symmetric signaling and media in the SIP, or rather VoIP, devices. It keeps the current NAT mapping alive by periodically refreshing the connection, just as a STUN implementation does.

Another implementation puts an SBC in the path of the signaling stream, before the stream reaches the SIP Proxy server. In this situation, the SBC is a relay, modifying the IP addresses and ports so that all media is redirected through it. It then handles the relay back to the caller or on to the SIP proxy server. We could refer to SIP proxy servers as Call Agents, for a more general discussion of the process. However, Call Agents generally refer to multiple protocol systems with a gateway between VoIP and PSTN scenarios.

The SBC can be used just for signaling traffic or just for media traffic or for both, with both "servers" part of the same server even though they are effectively two different entities that can communicate with each other.

Both implementations need to keep the NAT mapping alive and both may have delay and QoS issues.

Universal Plug and Play (UPnP)

UPnP is designed to allow simple configuration of networks without having to be an expert. It lets client applications discover and configure network components, including NATs and Firewalls, if the systems are UPnP capable.

This allows VoIP devices to discover and use the external IP address and port that the NAT selects for signaling and media transport. The VoIP device can then use this information when identifying itself in the SIP header and the SDP. The call can then be established using the correct publicly routable IP address and port and proceed smoothly through the NAT.

In Windows Embedded CE RTC 1.5, UPnP traversal support in RTC has been disabled. It is supported by default in earlier versions of RTC only. Instead, RTC 1.5 allows you greater control and access to the NAT mappings through the Port Manager APIs, including two new additions to the interface set, IRTCPortManager3 and IRTCProfilePortManagement. For more information, please see Using STUN with the RTC Client API.

For more information on UPnP, please see Universal Plug and Play (UPnP).

Simple Traversal of UDP through NAT (STUN)

The STUN protocol allows a client to discover whether it is behind a NAT and, if so, what kind of NAT. STUN works with only three of the four types of NAT and, depending on the server implementation, may support only UDP-based SIP devices and not TCP-based ones. Most STUN servers these days should include TCP support as well, but it is always best to check first.

The STUN client generates STUN requests and receives responses from the STUN server. The client is not necessarily part of the VoIP application, but it is assumed to be part of the VoIP device in some way if STUN is used. The STUN server is generally outside the Firewall/NAT, so that it is part of the public Internet. The STUN client sends exploratory STUN messages to the STUN server. The server uses those messages to determine the public IP address and ports in use and then informs the client. Unlike a relay system, an SBC or a TURN system, the STUN server does not sit in the actual path of the signaling or media data streams.

[Figure: STUN server used]

The following bulleted items refer to the above figure, "STUN server used":

Client A queries the STUN server for the exposed, public IP address and port mapped by the NAT.
Client A then uses that information in registering itself and making calls beyond the firewall.
The SIP Proxy can route the signaling correctly to Client B, with the correct symmetrical UDP return path back to Client A (provided the NAT is not a Symmetric NAT).
Again, STUN can only provide the source IP address and port of Client A (mapped), not the ultimate destination IP address and port of Client B.
For the query, the STUN server is the destination, so that is the only destination IP:port it knows.
The media streams, not shown, are peer-to-peer, now that Client B knows the correct Client A IP address and port to send media to and vice versa.
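The query in the first bullet above can be sketched at the packet level. The sketch assumes the message layout of the later STUN specification (RFC 5389), which places a fixed magic cookie in the header and adds the XOR-MAPPED-ADDRESS attribute alongside the original MAPPED-ADDRESS. It handles IPv4 only, does not verify the transaction ID, and its function names are hypothetical. It does not use the RTC Client API.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442          # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_RESPONSE = 0x0101
ATTR_MAPPED_ADDRESS = 0x0001
ATTR_XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Build a 20-byte Binding request with no attributes."""
    tid = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def parse_binding_response(data):
    """Return the (public IP, public port) reported in a Binding response."""
    msg_type, length, _cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_RESPONSE:
        raise ValueError("not a Binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        is_address = attr_type in (ATTR_MAPPED_ADDRESS, ATTR_XOR_MAPPED_ADDRESS)
        if is_address and value[1] == 0x01:          # family 0x01 = IPv4
            port, addr = struct.unpack("!HI", value[2:8])
            if attr_type == ATTR_XOR_MAPPED_ADDRESS:
                port ^= MAGIC_COOKIE >> 16
                addr ^= MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)        # values are padded to 4 bytes
    raise ValueError("no mapped address attribute found")


def discover_public_endpoint(server, local_port=0, timeout=2.0):
    """Ask a STUN server (host, port) which public endpoint this socket maps to."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.bind(("", local_port))
        sock.sendto(build_binding_request(), server)
        data, _ = sock.recvfrom(2048)
        return parse_binding_response(data)
```

The discovered mapping belongs to the local port the request was sent from, so a client would need to send its later traffic from that same local port for the result to be useful.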

Using STUN, a device can find out its public IP address and port and use that information in its configuration, before it begins a session or registers with a Registrar. STUN relies on the fact that once the outgoing port has been mapped for the STUN server traffic it can be used in the reverse direction. This means that the VoIP device calling that same IP address and port mapping, that same source IP address and port obtained by STUN, will be able to use the mapping in the reverse direction and reach the client inside the NAT.

This is the case for all but the most restrictive kind of NAT, Symmetric NAT.

Of the four types of NAT, those being Full Cone, Restricted Cone, Port Restricted Cone and Symmetric, the last is the most restrictive. That type of NAT uses all four components of the tuple <source IP address, source port, destination IP address, destination port> in order to create a binding or mapping between the internal address and port and the external, public address and port. Since the STUN server is not directly in the path between source and destination, the destination address and port will be different. Therefore, a different mapping will result when the VoIP device makes the call to the real destination.
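The difference in how mappings are keyed can be shown with a minimal model. It covers only the mapping behavior described above, not the inbound filtering rules that distinguish the three cone types from one another; the class name and the sample addresses are assumptions for illustration.

```python
from itertools import count


class ModelNat:
    """Hypothetical NAT model that differs only in how a binding is keyed."""

    def __init__(self, symmetric):
        self.symmetric = symmetric
        self._ports = count(40000)   # assumed public port pool
        self._bindings = {}

    def public_port(self, src, dst):
        """Return the public port used for traffic from src to dst."""
        # Symmetric NAT: the destination is part of the key, so every new
        # destination gets a new public port. Cone NATs key on source only.
        key = (src, dst) if self.symmetric else src
        if key not in self._bindings:
            self._bindings[key] = next(self._ports)
        return self._bindings[key]
```

With a cone NAT in this model, the port learned by querying a STUN server is the same port a peer later sees. With a Symmetric NAT the two ports differ, so the address obtained through STUN is of no use for the real call.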

Still, using Symmetric UDP and STUN, a VoIP device behind a firewall/NAT can use the Port Manager API of the RTC Client API to do effective NAT Traversal in the other three NAT scenarios.

For more information on how to use the RTC Client API with a STUN server, please see Using STUN with the RTC Client API as well as Configuring RTC for Symmetric UDP for NAT Traversal.

Traversal Using Relay NAT (TURN) or STUN relay server

TURN, increasingly also known as a STUN relay server, was designed to deal with the issue of Symmetric NATs. A TURN server is inserted into the media path or, less commonly, into both the media and signaling paths. The latter arrangement is possible only because TURN was envisioned as part of a larger combination created by the service provider: a TURN server plus a SIP Proxy server (and/or Registrar) and/or a DNS server. TURN determines the address and port information, but passes that information on to a SIP Proxy for signaling control. Session Border Controllers do much the same thing. The server can be in either the customer's Perimeter Network or the Service Provider network.

[Figure: TURN server used]

The following bulleted items refer to the above figure, "TURN server used":

Client A queries the TURN server for the exposed, public IP address and port mapped by the NAT, and sets up its links to the TURN server for media relay and TURN messages, bindings and authentication.
Client A then uses that information in registering itself and making calls beyond the firewall.
The TURN server sits in the media flow and possibly the SIP flow. Alternatively, it takes part only in the initial step of determining the source and destination information, and then provides that information either to a related SIP Proxy or to Client A for use with a SIP Proxy server.
Signaling streams can be passed to the appropriate SIP proxy either by the internal SIP Proxy associated with the TURN combination system or by Client A, while the TURN server itself simply acts as a relay for the media streams.
The SIP Proxy can route the signaling correctly to Client B, with the correct symmetrical UDP return path.
Since TURN is in the media flow, it is the actual destination and can provide both source and destination IP:port information. It relays the media sent to it by Client A to Client B and vice versa.
Client A is still linked to the TURN server, receiving TURN messages from it, including authentication related calls, error codes for failed connections and data indicators for possible new connections.

The TURN-enabled SIP client, like the STUN client, sends an exploratory packet to the TURN server, again allowing the server to determine the public IP address and port being used for this session. This information is then used in the SIP call establishment messages as well as for the media streams with the TURN server acting as a relay. The TURN server can handle the media flow as a relay and potentially also the signaling flow if it has been designed to do so as well as part of a larger combination unit. Typically, if it intercepts the signaling path as well, it will likely just pass the signaling information on to a SIP proxy server linked with it, with the correct IP addresses and ports this time, and let the proxy server establish the signaling path independently. Implementations can vary widely, especially as the ability to host several servers, including both SIP proxy servers and media relays, makes combination units more common across network implementations.

Since the client always sends to the TURN server, the destination address does not change, and even a Symmetric NAT mapping remains valid. TURN is not yet widely implemented, and many details of what it supports and should support are still being discussed, as are the overhead and time lag costs associated with a TURN relay and authentication system. TURN also may not work with other systems that modify the packets of the messages, including TURN messages. It may interpret such interference as an Internet attack.

Currently, the RTC Client API does not support TURN, so a VoIP application would have to implement TURN functionality and messaging on its own and bypass using RTC altogether.

Interactive Connectivity Establishment (ICE)

ICE is a framework rather than a protocol like STUN and TURN. In fact, ICE seeks to pull together several different techniques, including STUN and TURN, so that a client can investigate its environment and choose the best method for peer-to-peer communications.

ICE clients will likely rely on one or more STUN and TURN servers as well as SIP extensions in order to achieve connections. Using the information a client obtains, it will negotiate to find the best connection path to its destination as well as alternatives should that connection fail or drop below some client standard for latency, jitter and other QoS issues.
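One way a client can rank the paths it has discovered is the candidate priority formula from the ICE specification as later standardized (RFC 8445), sketched below with that document's recommended type preferences. The function names and the candidate tuples are assumptions for illustration, and a complete ICE implementation also performs connectivity checks on candidate pairs, which this sketch omits.

```python
# Recommended type preferences from RFC 8445: direct (host) addresses are
# preferred, STUN-discovered (server reflexive) addresses come next, and
# TURN-relayed addresses are the last resort.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def order_candidates(candidates):
    """Sort (type, address, port) candidates from most to least preferred."""
    return sorted(candidates, key=lambda c: candidate_priority(c[0]), reverse=True)
```

Ordering the candidates this way means a relayed path through a TURN server is tried only when the direct and STUN-discovered paths are unavailable.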

If the device being called is not an ICE-enabled client as well, then the call setup process will revert to a conventional SIP call requiring NAT Traversal by another mechanism.

Currently, the RTC Client API does not support ICE, so a VoIP application would have to implement ICE functionality and messaging on its own and bypass using RTC altogether.

"Early UDP" Dealing with UDP blocking in "early media" scenarios

Finally, even if you can now find a way to make NAT Traversing phone calls with VoIP, how do you just sit around and wait for phone calls when the Firewall is going to be blocking UDP connections? Again, if you have initialized your session and are listening, you've already established a mapping with the NAT and are keeping the mapping alive with refresh calls to the Registrar. However, this final case is for "early media" scenarios, where a mapping has not yet been established on the NAT.

Early UDP on the media side is required in order to receive media without first sending any media. During an early media scenario, you are not sending any media - just receiving it. Once the call is fully set up, you are sending media as well and no longer need "Early UDP."

SDP information is often exchanged in provisional messages such as 183 responses and PRACKs for "early media" scenarios. In this situation, a device can start getting media from the other side before the call is connected (that is, before the INVITE is accepted). Examples of usage include hearing announcements or special ring tones for that caller.

If the VoIP device is behind a NAT, then until a mapping is established on the NAT, no traffic will be allowed through the Firewall from the outside. Therefore, if an RTC device needs to support early media from behind a NAT, then the "Early UDP" feature must be turned on first.

RTC's media stack sends "Early UDP" packets to the remote IP address and port. In doing this, it is essentially punching holes through the NAT. Once that is done, an RTC-enabled device can start receiving early media. This early UDP behavior is off by default and is registry configurable.
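The general hole-punching idea can be sketched with plain UDP sockets. This is not the RTC media stack's implementation and does not reflect the actual contents of its "Early UDP" packets; the function name, the one-byte payload, and the packet count are assumptions for illustration.

```python
import socket


def punch_hole(sock, remote, count=3, payload=b"\x00"):
    """Send a few small UDP packets from the local media socket to the remote media address.

    The outbound packets cause the NAT to create a mapping for this socket, so
    media the remote side then sends back to the mapped public address and port
    can be let through (subject to the NAT's filtering rules).
    """
    for _ in range(count):
        sock.sendto(payload, remote)
```

A client would call this on the same socket it will later use to receive early media, with the remote IP address and port taken from the SDP in the provisional response.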

For information on the registry key, EnableEarlyUDPPackets, please see RTC Client API General Registry Settings.
