Path Maximum Transmission Unit (PMTU) Discovery

4/8/2010

Path maximum transmission unit (PMTU) discovery is described in RFC 1191. When a connection is established, the two hosts involved exchange their TCP maximum segment size (MSS) values. The smaller of the two MSS values is used for the connection. Historically, the MSS for a host has been the MTU at the link layer minus 40 bytes for the IP and TCP headers. However, support for additional TCP options, such as timestamps, has increased the typical TCP+IP header to 52 or more bytes.
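The arithmetic above can be sketched in a few lines. This is only an illustration of the relationship between MTU, header overhead, and MSS; the function names mss_from_mtu and negotiated_mss are hypothetical, and the 52-byte figure assumes a 12-byte timestamp option on top of the 40-byte base headers.

```python
BASE_HEADER_BYTES = 40             # 20-byte IP header + 20-byte TCP header, no options
HEADER_BYTES_WITH_TIMESTAMPS = 52  # assumed: base headers plus 12 bytes of TCP options


def mss_from_mtu(mtu, header_bytes=BASE_HEADER_BYTES):
    """Return the TCP payload that fits in one unfragmented datagram."""
    return mtu - header_bytes


def negotiated_mss(local_mss, remote_mss):
    """The smaller of the two exchanged MSS values is used for the connection."""
    return min(local_mss, remote_mss)


ethernet_mss = mss_from_mtu(1500)      # historical Ethernet value
small_link_mss = mss_from_mtu(576)     # host on a link with a 576-byte MTU
print(ethernet_mss, small_link_mss, negotiated_mss(ethernet_mss, small_link_mss))
print(mss_from_mtu(1500, HEADER_BYTES_WITH_TIMESTAMPS))
```

With an Ethernet host on one side and a 576-byte link on the other, the connection uses the smaller value.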

The following illustration shows the relationship between the MTU, the MSS, and the datagram.

[Illustration: relationship between the MTU, the MSS, and the datagram]

When TCP segments are destined for a non-local network, the "do not fragment" bit is set in the IP header. Any router or media along the path can have an MTU that differs from that of the two hosts. If a media segment has an MTU that is too small for the IP datagram being routed, the router attempts to fragment the datagram and then finds that the "do not fragment" bit is set in the IP header. At this point, the router should inform the sending host that the datagram cannot be forwarded further without fragmentation. This is done with an ICMP Destination Unreachable message. Most routers also specify the MTU that is allowed for the next hop by putting its value in the low-order 16 bits of the ICMP header field that is labeled unused in the ICMP specification. See RFC 1191, section 4, for the format of this message.

Upon receiving this ICMP error message, TCP adjusts its MSS for the connection to the specified MTU minus the TCP and IP header size, so that any further packets sent on the connection are no larger than the maximum size that can traverse the path without fragmentation.

The minimum MTU permitted by RFCs is 68 bytes, and Windows Embedded CE TCP enforces this limit.
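The MSS adjustment described above can be sketched as follows. This is not the Windows Embedded CE implementation; the function name mss_after_frag_needed is hypothetical, and clamping a reported MTU below 68 bytes up to 68 is an assumption about how the limit is enforced. The sketch also assumes the router reported a nonzero next-hop MTU.

```python
MIN_MTU = 68                 # minimum MTU permitted by the RFCs
IP_TCP_HEADER_BYTES = 40     # 20-byte IP header + 20-byte TCP header, no options


def mss_after_frag_needed(current_mss, next_hop_mtu,
                          header_bytes=IP_TCP_HEADER_BYTES):
    """Return the MSS to use after an ICMP 'fragmentation needed' error.

    Assumption: a reported MTU below MIN_MTU is raised to MIN_MTU.
    The MSS is never increased by this function.
    """
    mtu = max(next_hop_mtu, MIN_MTU)
    return min(current_mss, mtu - header_bytes)


print(mss_after_frag_needed(1460, 576))   # router reports a 576-byte next hop
print(mss_after_frag_needed(1460, 40))    # implausibly small report is clamped
```

A router that reports no next-hop MTU at all needs a different strategy, which this sketch does not cover.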

Some non-compliant routers silently drop IP datagrams that cannot be fragmented, or do not correctly report their next-hop MTU. If this occurs, it may be necessary to change the settings of the PMTU detection algorithm. The registry entry EnablePMTUDiscovery enables or disables the PMTU discovery mechanism as a whole. When PMTU discovery is disabled, an MSS of 536 bytes is used for all non-local destination addresses.
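The effect of disabling PMTU discovery on the starting MSS can be illustrated with a short sketch. The function initial_mss and its parameters are hypothetical, and treating "non-local" as "outside the sender's own subnet" is an assumption; this page does not state how the stack makes that decision.

```python
import ipaddress

DEFAULT_NONLOCAL_MSS = 536   # used for non-local destinations when discovery is off
IP_TCP_HEADER_BYTES = 40


def initial_mss(destination, local_network, link_mtu, pmtu_discovery_enabled):
    """Pick a starting MSS for a new connection.

    Assumption: a destination is 'local' when it lies inside local_network.
    """
    is_local = ipaddress.ip_address(destination) in ipaddress.ip_network(local_network)
    if not pmtu_discovery_enabled and not is_local:
        return DEFAULT_NONLOCAL_MSS
    return link_mtu - IP_TCP_HEADER_BYTES


print(initial_mss("10.99.99.10", "10.99.99.0/24", 1500, False))  # local destination
print(initial_mss("192.0.2.1", "10.99.99.0/24", 1500, False))    # non-local destination
```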

The PMTU between two hosts can be discovered manually using the ping command with the -f (do not fragment) switch, as follows.

ping -f -n <number of pings> -l <size> <destination ip address>

As shown in the example below, the size parameter can be varied until the MTU is found. Note that the size parameter used by ping is the size of the data buffer to send, not including headers. The ICMP header consumes 8 bytes, and the IP header would normally be 20 bytes. In the case below (Ethernet), the link layer MTU is the maximum-sized ping buffer plus 28, or 1500 bytes.

C:\>ping -f -n 1 -l 1472 10.99.99.10
Pinging 10.99.99.10 with 1472 bytes of data:
Reply from 10.99.99.10: bytes=1472 time<10ms TTL=128
C:\>ping -f -n 1 -l 1473 10.99.99.10
Pinging 10.99.99.10 with 1473 bytes of data:
Ping transmit failed. Error code 11009.

In the example shown above, the IP layer returned an ICMP error message that ping interpreted. If the router had been a black hole router, ping would simply not be answered once its size exceeded the MTU that the router could handle. Ping can be used in this manner to detect such a router.
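Varying the size parameter by hand amounts to a search for the largest buffer that still gets a reply. The sketch below shows one way to automate that with a binary search. The probe callable is hypothetical: it stands for "send one ping with the -f switch and this buffer size, and report whether a reply arrived". Here it is simulated so the example runs without a network, and the default upper bound assumes the sending host is on Ethernet.

```python
ICMP_AND_IP_HEADER_BYTES = 28   # 8-byte ICMP header + 20-byte IP header


def find_path_mtu(probe, low=0, high=1472):
    """Binary-search the largest ping buffer size for which probe(size) is True.

    probe is a caller-supplied function (hypothetical here) that returns True
    when a don't-fragment ping of that size is answered. Returns the MTU
    (buffer size plus headers), or None if even the smallest probe fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if probe(mid):
            low = mid                 # mid fits; search larger sizes
        else:
            high = mid - 1            # mid is too large; search smaller sizes
    return low + ICMP_AND_IP_HEADER_BYTES


def simulated_probe(path_mtu):
    """Stand-in for a real ping: succeeds when the datagram fits the path."""
    return lambda size: size + ICMP_AND_IP_HEADER_BYTES <= path_mtu


print(find_path_mtu(simulated_probe(1500)))
print(find_path_mtu(simulated_probe(576)))
```

A failed probe does not distinguish between an ICMP error and a silent drop, so the same search applies when a black hole router is on the path.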

A sample ICMP destination unreachable error message is shown here.

****************************************************************************
	Src Addr       Dst Addr     Protocol  Description
	10.99.99.10    10.99.99.9   ICMP      Destination Unreachable: 10.99.99.10   
	See frame 3
	+ FRAME: Base frame properties
	+ ETHERNET: ETYPE = 0x0800 : Protocol = IP:  DOD Internet Protocol
	+ IP: ID = 0x4401; Proto = ICMP; Len: 56
	  ICMP: Destination Unreachable: 10.99.99.10   See frame 3
		  ICMP: Packet Type = Destination Unreachable
		  ICMP: Unreachable Code = Fragmentation Needed, DF Flag Set
		  ICMP: Checksum = 0xA05B
		  ICMP: Next Hop MTU = 576 (0x240)
		  ICMP: Data: Number of data bytes remaining = 28 (0x001C)
		  ICMP: Description of original IP frame
			  ICMP: (IP) Version = 4 (0x4)
			  ICMP: (IP) Header Length = 20 (0x14)
			  ICMP: (IP) Service Type = 0 (0x0)
				  ICMP: Precedence = Routine
				  ICMP: ...0.... = Normal Delay
				  ICMP: ....0... = Normal Throughput
				  ICMP: .....0.. = Normal Reliability
			  ICMP: (IP) Total Length = 1028 (0x404)
			  ICMP: (IP) Identification = 45825 (0xB301)
				  ICMP: Flags Summary = 2 (0x2)
				  ICMP: .......0 = Last fragment in datagram
				  ICMP: ......1. = Cannot fragment datagram
			  ICMP: (IP) Fragment Offset = 0 (0x0) bytes
			  ICMP: (IP) Time to Live = 32 (0x20)
			  ICMP: (IP) Protocol = ICMP - Internet Control Message
			  ICMP: (IP) Checksum = 0xC91E
			  ICMP: (IP) Source Address = 10.99.99.9
			  ICMP: (IP) Destination Address = 10.99.99.10
			  ICMP: (IP) Data: Number of data bytes remaining = 8 (0x0008)
			  ICMP: Description of original ICMP frame
				  ICMP: Checksum = 0xBC5F
				  ICMP: Identifier = 256 (0x100)
				  ICMP: Sequence Number = 38144 (0x9500)
	00000:  00 AA 00 4B B1 47 00 AA 00 3E 52 EF 08 00 45 00   ...K.G...>R...E.
	00010:  00 38 44 01 00 00 80 01 1B EB 0A 63 63 0A 0A 63   .8D........cc..c
	00020:  63 09 03 04 A0 5B 00 00 02 40 45 00 04 04 B3 01   c....[...@E.....
	00030:  40 00 20 01 C9 1E 0A 63 63 09 0A 63 63 0A 08 00   @. ....cc..cc...
	00040:  BC 5F 01 00 95 00                                 ._....

This error was generated by using ping -f -n 1 -l 1000 on an Ethernet-based host to send a large datagram across a router interface that only supports an MTU of 576 bytes. When the router tried to place the large frame onto the network with the smaller MTU, it found that fragmentation was not allowed, so it returned the error message indicating that the largest datagram that could be forwarded is 0x240, or 576 bytes.
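The fields in the trace can be read directly from the hex dump above. The following sketch decodes those bytes; the function name parse_frag_needed is hypothetical, and the code assumes an untagged Ethernet frame with a 14-byte link-layer header, as in this capture.

```python
import struct

# Bytes copied from the hex dump of the sample frame.
FRAME_HEX = """
00 AA 00 4B B1 47 00 AA 00 3E 52 EF 08 00 45 00
00 38 44 01 00 00 80 01 1B EB 0A 63 63 0A 0A 63
63 09 03 04 A0 5B 00 00 02 40 45 00 04 04 B3 01
40 00 20 01 C9 1E 0A 63 63 09 0A 63 63 0A 08 00
BC 5F 01 00 95 00
"""

ETHERNET_HEADER_BYTES = 14      # assumed: no VLAN tag
ICMP_DEST_UNREACHABLE = 3
ICMP_CODE_FRAG_NEEDED = 4


def parse_frag_needed(frame):
    """Return (next_hop_mtu, original_total_length), or None for other messages."""
    # The low nibble of the first IP byte is the header length in 32-bit words.
    ip_header_len = (frame[ETHERNET_HEADER_BYTES] & 0x0F) * 4
    icmp = frame[ETHERNET_HEADER_BYTES + ip_header_len:]
    # Type, code, checksum, unused 16 bits, next-hop MTU (RFC 1191, section 4).
    icmp_type, code, _checksum, _unused, next_hop_mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (ICMP_DEST_UNREACHABLE, ICMP_CODE_FRAG_NEEDED):
        return None
    # The original IP header follows; its total length is at bytes 2-3.
    (original_total_length,) = struct.unpack("!H", icmp[10:12])
    return next_hop_mtu, original_total_length


frame = bytes.fromhex("".join(FRAME_HEX.split()))
print(parse_frag_needed(frame))
```

The decoded values match the trace: a next-hop MTU of 576 bytes and an original datagram of 1028 bytes, which is the 1000-byte ping buffer plus 28 bytes of ICMP and IP headers.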

See Also

Concepts

Transmission Control Protocol (TCP)