1.3 Overview

The WebSocket Protocol [RFC6455] creates an asynchronous, bidirectional communication channel that works across existing network intermediaries such as web proxies and firewalls. A client first uses HTTP [RFC2616] to perform an opening handshake with a server. Both sides then switch to the WebSocket Protocol, which runs over the same underlying protocol on which HTTP is layered, such as TCP or SSL over TCP. The goal is to use HTTP to traverse network intermediaries and then use the established end-to-end TCP/SSL channel for bidirectional application communication.
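The switch from HTTP to the WebSocket Protocol can be illustrated with a minimal sketch of the opening handshake defined in [RFC6455] section 4. The function names, host, and path below are hypothetical and chosen only for illustration; the GUID and the SHA-1/base64 computation come from [RFC6455].

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def build_upgrade_request(host: str, path: str, key: str) -> str:
    """Return the HTTP request a client sends to ask for the protocol switch.

    host, path, and key are illustrative inputs, not values from this document.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines)


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value the server echoes back."""
    digest = hashlib.sha1(
        (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from RFC 6455 section 1.3.
request = build_upgrade_request("server.example.com", "/chat",
                                "dGhlIHNhbXBsZSBub25jZQ==")
accept = compute_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

After the server answers with status 101 and the matching Sec-WebSocket-Accept value, both sides stop speaking HTTP and exchange WebSocket frames on the same connection.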

The WebSocket Protocol requires that all frames sent from the client to the server be masked with a randomly chosen masking key, so that intermediaries cannot confuse WebSocket traffic with HTTP. Some intermediaries continue to parse HTTP requests even if the beginning of the byte stream does not match the HTTP grammar. When they encounter WebSocket traffic, such intermediaries can skip the frame header and interpret the application payload as an HTTP request. This deficiency allows an attacker to inject malicious data into the intermediary, for example to poison its cache, as discussed in [RFC6455] section 10.3, by sending data that is crafted to look like HTTP requests through the WebSocket Protocol. Masking prevents the attack because the attacker cannot predict the bytes that actually appear on the wire.
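The masking step itself is a simple XOR, as described in [RFC6455] section 5.3. The following is a minimal sketch; the function name apply_mask and the sample payload are assumptions made for illustration.

```python
import os


def apply_mask(payload: bytes, masking_key: bytes) -> bytes:
    """XOR each payload byte with masking_key[i % 4].

    The same function both masks and unmasks, because XOR is its own inverse.
    """
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


# Hypothetical payload that an attacker crafts to look like an HTTP request.
payload = b"GET /poisoned HTTP/1.1\r\nHost: victim.example\r\n\r\n"

# A fresh, unpredictable key is chosen for every frame.
key = os.urandom(4)
masked = apply_mask(payload, key)

# The receiver recovers the original bytes by applying the same key again.
assert apply_mask(masked, key) == payload
```

Because the key is unpredictable and changes per frame, the attacker cannot choose a payload whose masked form is a well-formed HTTP request.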

However, masking can have a significant performance impact, because the sender and the receiver must each process every byte of the payload. If the WebSocket Protocol is used in a controlled environment, such as an enterprise network in which there are no intermediaries or in which all intermediaries recognize the WebSocket Protocol, masking might not be needed. Turning off masking in such cases improves performance.
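The difference between a masked and an unmasked frame can be sketched with the base framing from [RFC6455] section 5.2. The build_frame function and its mask parameter are hypothetical; how the two peers agree to omit masking is not shown here.

```python
import os
import struct


def build_frame(payload: bytes, mask: bool = True, opcode: int = 0x2) -> bytes:
    """Encode a single unfragmented frame (FIN=1) with or without masking."""
    header = bytearray([0x80 | opcode])      # FIN bit plus opcode
    mask_bit = 0x80 if mask else 0x00        # MASK bit in the second byte
    length = len(payload)
    if length < 126:
        header.append(mask_bit | length)
    elif length < 65536:
        header.append(mask_bit | 126)
        header += struct.pack("!H", length)  # 16-bit extended length
    else:
        header.append(mask_bit | 127)
        header += struct.pack("!Q", length)  # 64-bit extended length
    if not mask:
        # Unmasked: the payload is copied as-is, with no per-byte work.
        return bytes(header) + payload
    # Masked: 4 extra key bytes plus one XOR per payload byte.
    key = os.urandom(4)
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + key + masked


unmasked_frame = build_frame(b"Hello", mask=False, opcode=0x1)
masked_frame = build_frame(b"Hello", mask=True, opcode=0x1)
```

The unmasked frame is 4 bytes shorter and requires no per-byte transformation on either side, which is where the performance gain comes from.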

If the WebSocket Protocol is used by a sandboxed application, such as one running in a browser whose sandbox allows the application to communicate only over HTTP, the cache-poisoning attack described in [RFC6455] section 10.3 can have serious consequences, because it lets a malicious application bypass the restrictions imposed by the sandbox. However, non-sandboxed applications that can use TCP directly can already perform the same actions; for such applications, disabling masking does not introduce additional risk.