Secure Cross-Domain Communication in the Browser
by Danny Thorpe
Summary: A shopper can walk into virtually any store and make a purchase with nothing more than a plastic card and photo ID. The shopper and the shopkeeper need not share the same currency, nationality, or language. What they do share is a global communications system and global banking network that allows the shopper to bring their bank services with them wherever they go and provides infrastructure support to the shopkeeper. What if the Internet could provide similar protections and services for Web surfers and site keepers to share information?
Developing applications that live inside the Web browser is a lot like window shopping on Main Street: lots of stores to choose from, lots of wonderful things to look at in the windows of each store, but you can't get to any of it. Your cruel stepmother, Frau Browser, yanks your leash every time you lean too close to the glass. She says it's for your own good, but you're beginning to wonder if your short leash is more for her convenience than your safety.
Web browsers isolate pages living in different domains to prevent them from peeking at each other's notes about the end user. In the early days of the Internet, this isolation model was fine because few sites placed significant application logic in the browser client, and even those that did were only accessing data from their own server. Each Web server was its own silo, containing only HTML links to content outside itself.
That's not the Internet today. The Internet experience has evolved into aggregating data from multiple domains. This aggregation is driven by user customization of sites as well as sites that add value by bringing together combinations of diverse data sources. In this world, the Web browser's domain isolation model becomes an enormous obstacle hindering client-side Web application development. To avoid this obstacle, Web app designers have been moving more and more application logic to their Web servers, sacrificing server scalability just to get things done. Meanwhile, the end user's 2GHz, 2GB dumb terminal sits idle.
If personal computers were built like a Web browser, you could save your data to disk, but you couldn't use those files with any other application on your machine, or anyone else's machine. If you decided to switch to a different brand of photo editor, you wouldn't be able to edit any of your old photos. If you complained to the makers of your old photo editor, they would sniff and declare "We don't know what that other photo editor might do with your data. Since we don't know or trust that other photo editor, then neither should you! And no, we won't let you use 'your' photos with them, because since we're providing the storage space for those photos, they're really partly our photos."
You couldn't even find your files unless you knew first which application you created them with. "Which photo editor did I use for Stevie's birthday photos? I can't find them!"
And what happens when that tragically hip avant-garde photo editor goes belly up, never to be seen again? It takes all your photos with it!
Sound familiar? It happens to all of us every day using Internet Web sites and Web applications. Domain isolation prevents you from using your music playlists to shop for similar tunes at an independent online store (unrelated to your music player manufacturer) or at a kiosk within a retail store.
Domain isolation also makes it very difficult to build lightweight low-infrastructure Web applications that slice and dice data drawn from diverse data servers within a corporate network. A foo.bar.com subdomain on your internal bar.com corpnet is just as isolated from bar.com and bee.bar.com as it is from external addresses like xyz.com.
Nevertheless, you don't want to just tear down all the walls and pass around posies. The threats to data and personal security that the browser's strict domain isolation policy protects against are real, and nasty. With careful consideration and infrastructure, there can be a happy medium that provides greater benefit to the user while still maintaining the necessary security practices. Users should be in control of when, what, and how much of their information is available to a given Web site. The objective here is not free flow of information in all directions, but freedom for users to use their data where and when it serves their purposes, regardless of where their data resides.
What is needed is a way for the browser to support legitimate cross-domain data access without compromising end user safety and control of their data.
One major step in that direction is the developing standards proposal organized by Ian Hickson to extend XMLHttpRequest to support cross-domain connections using domain-based opt-in/opt-out by the server being requested. (See Resources.) If this survives peer review and if it is implemented by the major browsers, it offers hope of diminishing the cross-domain barrier for legitimate uses, while still protecting against illegitimate uses. Realistically, though, it will be years before this proposal is implemented by the major browsers and ubiquitous in the field.
Until then, a long-standing workaround is the iframe URL data-passing technique: a page writes a URL carrying data into the src property of an iframe that displays a page from another domain. This technique has been used by Web designers since iframes were first introduced into HTML, but uses are typically primitive, purpose-built, and hastily thrown together. What's worse, passing data through the iframe src URL can create an exploit vector, allowing malicious code to corrupt your Web application state by throwing garbage at your iframe. Any code in any context in the browser can write to the iframe's .src property, and the receiving iframe has no idea where the URL data came from. In most situations, data of unknown origin should never be trusted.
This article will explore the issues and solution techniques of the secure client-side cross-domain data channel developed by the Windows Live Developer Platform group.
An iframe is an HTML element that encapsulates and displays an entire HTML document inside itself, allowing you to display one HTML document inside another. We'll call the iframe's parent the outer page or host page, and the iframe's content the inner page. The inner page is specified by assigning a URL to the iframe's src property.
When the inner page comes from a different domain than the host page, domain isolation applies: the host cannot see the inner page's content or read its current location, and the inner page cannot see anything in the host. Even so, the host can still write to the iframe's src property. The host page doesn't know what the iframe is currently displaying, but it can force the iframe to display something else.
Each time a new URL is assigned to the iframe's src property, the iframe will go through all the normal steps of loading a page, including firing the onLoad event.
We now have all the pieces required to pass data from the host to the iframe on the URL. (See Figure 1.) The host page in domain foo.com can place a URL-encoded data packet on the end of an existing document URL in the bar.com domain. The data can be carried in the URL as a query parameter using the ? character (http://bar.com/receiver.html?datadatadata) or as a bookmark (also called a fragment identifier) using the # character (http://bar.com/receiver.html#datadatadata). There's a big difference between these two URL types which we'll explore in a moment.
Figure 1. iframe URL data passing
The host page assigns this URL to the iframe's src property. The iframe loads the page and fires the page's onLoad event handler. The iframe page's onLoad event handler can look at its own URL, find the embedded data packet, and decode it to decide what to do next.
That's the iframe URL data passing technique at its simplest. The host builds a URL string from a known document URL plus a data payload, assigns it to the src property of the iframe, and the iframe "wakes up" in the onLoad event handler and receives the data payload. What more could you ask for?
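As a concrete illustration, the host-side and receiver-side halves of the technique might look like the following TypeScript sketch. The function names buildMessageUrl and readFragmentPayload are hypothetical, and the code deliberately avoids DOM APIs so it runs anywhere: in a real page the host would assign the result of buildMessageUrl to the iframe's src property, and the receiver's onLoad handler would pass its own location.hash to readFragmentPayload.

```typescript
// Hypothetical helpers; the article does not define an API for this.
type Placement = "query" | "fragment";

// Host side: append a URL-encoded payload to a known receiver document URL,
// after "#" (bookmark/fragment) or "?" (query parameter).
function buildMessageUrl(
  receiverUrl: string,
  payload: string,
  placement: Placement = "fragment"
): string {
  const separator = placement === "fragment" ? "#" : "?";
  return receiverUrl + separator + encodeURIComponent(payload);
}

// Receiver side: the iframe's onLoad handler would pass in its own
// location.hash (for example "#hello%20world") to recover the payload.
function readFragmentPayload(hash: string): string | null {
  if (hash.length <= 1 || hash.charAt(0) !== "#") {
    return null; // this load carried no message
  }
  return decodeURIComponent(hash.slice(1));
}
```

Encoding the payload with encodeURIComponent keeps characters such as #, ? and & from being misread as URL delimiters. A receiver using query parameters would read location.search instead of location.hash.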
A lot more, actually. There are many caveats with this simple technique:
· No acknowledgement of receipt—The host page has no idea if the iframe successfully received the data.
· Message overwrites—The host doesn't know when the iframe has finished processing the previous message, so it doesn't know when it's safe to send the next message.
· Capacity limits—A URL can be only so long, and the length limit varies by browser family. Firefox supports URLs as long as 40k or so, but IE limits URLs to 2,083 characters. Anything longer than that will be truncated or ignored.
· Data has unknown origin—The iframe has no idea who put the data into its URL. The data might be from our friendly foo.com host page, or it might be evil.com lobbing spitballs at bar.com hoping something will stick or blow up.
· No replies—There's no way for script in the iframe to pass data back to the host page.
· Loss of context—Because the page is reloaded with every message, the iframe inner page cannot maintain global state across messages.
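The capacity caveat is easy to underestimate, because the limit applies to the encoded URL rather than the raw payload. The sketch below assumes Internet Explorer's documented 2,083-character maximum as the budget (the constant and function names are illustrative) and measures a payload after encodeURIComponent has escaped it. A single non-ASCII character such as € grows to nine characters (%E2%82%AC).

```typescript
// Assumed budget: Internet Explorer's documented maximum URL length.
const IE_MAX_URL_LENGTH = 2083;

// Length of the payload after URL encoding. Characters that
// encodeURIComponent leaves alone (letters, digits, - _ . ! ~ * ' ( ))
// cost one character each; everything else becomes one or more %XX escapes.
function encodedLength(payload: string): number {
  return encodeURIComponent(payload).length;
}

// True if receiverUrl + "#" + encoded payload stays within the limit.
function fitsInUrl(
  receiverUrl: string,
  payload: string,
  limit: number = IE_MAX_URL_LENGTH
): boolean {
  return receiverUrl.length + 1 + encodedLength(payload) <= limit;
}

// Upper bound on payload characters per message for text that needs no
// escaping; every escaped character lowers the real figure.
function maxUnescapedPayload(
  receiverUrl: string,
  limit: number = IE_MAX_URL_LENGTH
): number {
  return Math.max(0, limit - receiverUrl.length - 1);
}
```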
Should we use ? or # to tack data onto the end of the iframe URL? Though innocuous enough on the surface, there are actually a few significant differences in how the browsers handle URLs with query params versus URLs with bookmarks. Two URLs with the same base path but different query params are treated as different URLs. They will appear separately in the browser history list, will be separate entries in the browser page cache, and will generate separate network requests across the wire.
URL bookmarks were designed to refer to specially marked anchor tags within a page. The browser considers two URLs with the same base path but with different bookmark text after the # char to be the same URL as far as browser history and caches are concerned. The different bookmarks are just pointing to different parts of the same page (URL), but it's the same page nonetheless.
The URLs http://bar.com/page.html#one, http://bar.com/page.html#two, and http://bar.com/page.html#three are considered by the browser to be cache-equivalent to http://bar.com/page.html. If we used query params, the browser would see three different URLs and three different trips across the network wire. Using bookmarks, however, we have at most one trip across the network wire; subsequent requests will be filled from the local browser cache. (See Figure 2.)
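The cache behavior can be modeled in a few lines. This TypeScript sketch is a simplification that assumes the cache key is just the URL with its fragment removed; real browser caches consider more than that, and the helper names are hypothetical. It counts how many network fetches a series of message URLs would trigger under that assumption.

```typescript
// Simplified model (an assumption, not any browser's exact algorithm):
// the fragment is dropped before a URL is used as a cache key, while the
// query string is kept as part of it.
function cacheKey(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

// Number of network fetches for a sequence of URLs if each distinct
// cache key is fetched once and later requests are served from cache.
function networkFetches(urls: string[]): number {
  const cached: { [key: string]: boolean } = Object.create(null);
  let fetches = 0;
  for (const url of urls) {
    const key = cacheKey(url);
    if (!cached[key]) {
      cached[key] = true;
      fetches++;
    }
  }
  return fetches;
}
```

Under this model the three bookmark URLs from the example cost one fetch, while the three query-parameter variants cost three.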
Figure 2. Cache equivalence of bookmark URLs
For cases where we need to send a lot of messages across the iframe URL using the same base URL, bookmarks are perfect. The data payloads in the bookmark portion of the URL will not appear in the browser history or browser page cache. What's more, the data payloads will never cross the network wire after the initial page load is cached!
The data passed between the host page and the iframe cannot be viewed by any other DOM elements on the host page because the iframe is in a different domain context from the host page. The data doesn't appear in the browser cache, and the data doesn't cross the network wire, so it's fair to say that the data packets are observable only by the receiving iframe or other pages served from the bar.com domain.
Perhaps the biggest security problem with the simple iframe URL data-passing technique is not knowing with confidence where the data came from. Embedding the name of the sender or some form of application ID is no solution, as those can be easily copied by impersonators. What is needed is a way for a message to implicitly identify the sender in such a way that could not be easily copied.
There is another way, which takes advantage of the critical importance of domain name identity in the browser environment. If I can send a secret message to you using your domain name, and I later receive that secret as part of a data packet, I can reasonably deduce that the data packet came from your domain.
The only way for the secret to come from a third-party domain is if your domain has been compromised, the user's browser has been compromised, or my DNS has been compromised. All bets are off if your domain or your browser has been compromised. If DNS poisoning is a real concern, you can use https to validate that the server answering requests for a given domain name is in fact the legitimate server.
If the sender gives a secret to the receiver, and the receiver gives a secret to the sender, and both secrets are carried in every data packet sent across the iframe URL data channel, then both parties can have confidence in the origin of every message. Spitballs thrown in by evil.com can be easily recognized and discarded. This exchange of secrets is inspired by the SSL/https three-phase handshake.
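A minimal sketch of the secret check, assuming a hypothetical packet shape with from and to fields, might look like this. It omits the handshake itself: acceptPeerSecret stands in for whatever step delivers the peer's secret through a URL in this endpoint's own domain. In a browser the secrets should come from a cryptographically strong source such as crypto.getRandomValues; here they are passed in so the logic can be exercised directly.

```typescript
// Hypothetical packet shape; the article does not specify a wire format.
interface Packet {
  from: string; // the sender's own secret, handed to the receiver earlier
  to: string; // the receiver's secret, handed to the sender earlier
  data: string;
}

class ChannelEndpoint {
  readonly mySecret: string;
  private peerSecret: string | null = null;

  // In a browser, mySecret should be generated with a strong random source
  // (e.g. crypto.getRandomValues); it is injected here for clarity.
  constructor(mySecret: string) {
    this.mySecret = mySecret;
  }

  // Stand-in for the handshake: record the secret the peer delivered
  // through a URL in this endpoint's own domain.
  acceptPeerSecret(secret: string): void {
    this.peerSecret = secret;
  }

  // Every outgoing packet carries both secrets.
  seal(data: string): Packet {
    if (this.peerSecret === null) {
      throw new Error("handshake not complete");
    }
    return { from: this.mySecret, to: this.peerSecret, data: data };
  }

  // Returns the payload only if both secrets match; a packet from a domain
  // that lacks either secret yields null and is discarded.
  open(packet: Packet): string | null {
    const trusted =
      this.peerSecret !== null &&
      packet.from === this.peerSecret &&
      packet.to === this.mySecret;
    return trusted ? packet.data : null;
  }
}
```

Because each endpoint learns the other's secret only through a URL in its own domain, a packet carrying both secrets could only have been built by code that received this endpoint's secret, which is the property the exchange is meant to establish.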
Most of the problems associated with the iframe URL data passing technique boil down to reply generation. Acknowledging packets requires the receiver to send a reply to the sender. Exchanging secrets requires replies in both directions. Message throttling and breaking large data payloads into multiple smaller messages require receipt acknowledgement.
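Throttling on acknowledgements can be as simple as stop-and-wait: keep at most one message in flight and send the next only when the previous one has been acknowledged. The TypeScript class below is an illustrative sketch, not the Windows Live channel code. The transmit callback is an assumed stand-in for writing a URL to the iframe's src property, and acknowledge would be called by whatever reply path brings acknowledgements back to the sender.

```typescript
// Hypothetical stop-and-wait sender: one message in flight at a time.
class StopAndWaitSender {
  private readonly transmit: (seq: number, data: string) => void;
  private readonly queue: string[] = [];
  private inFlight: number | null = null; // sequence number awaiting an ack
  private nextSeq = 0;

  // transmit stands in for writing a URL to the receiving iframe's src.
  constructor(transmit: (seq: number, data: string) => void) {
    this.transmit = transmit;
  }

  // Queue a message; it goes out immediately only if nothing is in flight.
  send(data: string): void {
    this.queue.push(data);
    this.pump();
  }

  // Called by the reply path when the receiver acknowledges a message.
  acknowledge(seq: number): void {
    if (seq !== this.inFlight) {
      return; // stale or unexpected ack: ignore it
    }
    this.inFlight = null;
    this.pump();
  }

  // Send the next queued message if the channel is idle.
  private pump(): void {
    if (this.inFlight !== null || this.queue.length === 0) {
      return;
    }
    const data = this.queue.shift() as string;
    const seq = this.nextSeq++;
    this.inFlight = seq;
    this.transmit(seq, data);
  }
}
```

Stop-and-wait also addresses the message-overwrite caveat: since the sender never writes a new URL until the previous message is acknowledged, an unprocessed message is never clobbered. The cost is one round trip per message.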
Figure 3. Message in a Klein Bottle
So, how can the iframe communicate back up to the host page? Not by going up, but by going down. The iframe can't assign to anything in its parent because the iframe and the parent reside in different domain contexts. But the bar.com iframe (A) can contain another iframe (B), and A can assign to B's src property a URL in the domain of the host page (foo.com). The result is a three-level nesting: the foo.com host page contains bar.com iframe A, which in turn contains foo.com iframe B.
The host page can pass data to iframe A by writing a URL to A's src property. A can process the data and send an acknowledgement to the host by writing a URL to B's src property. B wakes up in its onLoad event and passes the message up to its parent's parent, the host page; because B and the host page are both in the foo.com domain, the browser allows B to call into the host directly. Voilà. Round-trip acknowledgement from a series of one-way pipes connected together in a manner that would probably amuse Felix Klein, mathematician and bottle washer.
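The round trip can be simulated without a browser. In the TypeScript sketch below, SimFrame is a hypothetical stand-in for an iframe whose src setter immediately runs an onLoad handler (real iframes load asynchronously), and the page names receiver.html and relay.html are placeholders. Frame A plays the bar.com receiver, frame B the foo.com relay nested inside it, and B's handler hands the reply straight to the host, as a same-domain page could through window.parent.parent.

```typescript
// Hypothetical iframe stand-in: assigning src "reloads" the frame and
// fires its onLoad handler synchronously.
class SimFrame {
  private readonly onLoad: (url: string) => void;
  constructor(onLoad: (url: string) => void) {
    this.onLoad = onLoad;
  }
  set src(url: string) {
    this.onLoad(url);
  }
}

// Decode the payload carried after "#" in a message URL.
function payloadOf(url: string): string {
  const i = url.indexOf("#");
  return i === -1 ? "" : decodeURIComponent(url.slice(i + 1));
}

// Host (foo.com) -> iframe A (bar.com) -> iframe B (foo.com) -> host.
// Returns the replies that reached the host page.
function roundTrip(message: string): string[] {
  const hostInbox: string[] = [];
  // B shares the host's domain, so it may deliver the reply directly.
  const frameB = new SimFrame((url) => {
    hostInbox.push(payloadOf(url));
  });
  // A processes the message, then acknowledges it by writing a foo.com
  // URL into its child frame B.
  const frameA = new SimFrame((url) => {
    const received = payloadOf(url);
    frameB.src = "http://foo.com/relay.html#" + encodeURIComponent("ack:" + received);
  });
  // The host sends the message down to A.
  frameA.src = "http://bar.com/receiver.html#" + encodeURIComponent(message);
  return hostInbox;
}
```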
To maintain global state in the bar.com context across multiple messages sent to the iframe, use two iframes with bar.com pages. Use one of the iframes as a stateless message receiver, reloading and losing its state with every message received. Place the stateful application logic for the bar.com side of the house in the other iframe. Reduce the messenger iframe page logic to the bare minimum required to pass the received data to the stateful bar.com iframe.
An iframe cannot enumerate the children of its parent to find other bar.com siblings, but it can look up a sibling iframe using window.parent.frames if it knows the name of the sibling iframe. Each time it reloads to receive new data on the URL, the messenger iframe can look up its stateful bar.com sibling iframe using window.parent.frames and call a function on the stateful iframe to pass the new message data into the stateful iframe. Thus, the bar.com domain context in browser memory can accumulate message chunks across multiple messages to reconstruct a data payload larger than the browser's maximum URL length.
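Chunking and reassembly might be sketched as follows; the index/total: chunk header and both names are illustrative assumptions. One design choice worth noting: the payload is URL-encoded once, before it is split, so each chunk can go straight into the URL fragment and the receiver decodes only after joining the pieces. Splitting the raw string instead could cut a surrogate pair in half, which encodeURIComponent rejects with a URIError. The Reassembler would live in the stateful bar.com iframe, and the messenger iframe would hand it each chunk after looking it up through window.parent.frames.

```typescript
// Sender side: URL-encode the whole payload once, then cut the encoded text
// into chunks of at most maxChunk characters (plus a hypothetical
// "index/total:" header). A cut may fall inside a %XX escape; that is
// harmless because decoding happens only after reassembly.
function splitIntoChunks(payload: string, maxChunk: number): string[] {
  if (maxChunk <= 0) {
    throw new Error("maxChunk must be positive");
  }
  const encoded = encodeURIComponent(payload);
  const total = Math.max(1, Math.ceil(encoded.length / maxChunk));
  const chunks: string[] = [];
  for (let i = 0; i < total; i++) {
    const body = encoded.slice(i * maxChunk, (i + 1) * maxChunk);
    chunks.push(i + "/" + total + ":" + body);
  }
  return chunks;
}

// Receiver side, living in the stateful bar.com iframe. Assumes one message
// is reassembled at a time (for example, under stop-and-wait flow control).
class Reassembler {
  private parts: (string | undefined)[] = [];
  private count = 0;

  // Returns the decoded payload once all chunks have arrived, else null.
  receiveChunk(chunk: string): string | null {
    const colon = chunk.indexOf(":");
    const header = chunk.slice(0, colon).split("/");
    const index = Number(header[0]);
    const total = Number(header[1]);
    // Store each chunk once; duplicate deliveries are ignored.
    if (this.parts[index] === undefined) {
      this.parts[index] = chunk.slice(colon + 1);
      this.count++;
    }
    if (this.count < total) {
      return null;
    }
    const encoded = this.parts.join("");
    this.parts = [];
    this.count = 0;
    return decodeURIComponent(encoded);
  }
}
```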
Our goal is to groom this channel code into a reusable library, available to internal Microsoft partners as well as third-party Web developers. While the code is running well in its current contexts, we still have some work to do in the area of self-diagnostics and troubleshooting; when you get the channel endpoints configured correctly, it works great, but it can be a real nightmare to figure out what isn't quite right when you're trying to get it set up the first time. The main obstacle is the browser itself—trying to see what's (not) happening in different domain contexts is a bit of a challenge when the browser won't show you what's on the other side of the wall.
Hardly 40 years ago, a shopper on Main Street USA had to go to considerable effort to convince a shopkeeper to accept payment. If you didn't have cash (and lots of it), you were most likely out of luck. If you had foreign currency, you'd need to find a big bank in a big city to exchange for local currency. Checks from out of town were rarely accepted, and store credit was offered only to local residents.
Today, shoppers and shopkeepers share a global communications system and global banking network that allows the shopper to bring their bank services with them wherever they go, and helps the shopkeeper make sales they otherwise might miss. The banking network also provides infrastructure support to the shopkeeper, helping with currency conversion, shielding from credit risk and reducing losses due to fraud.
Now, why can't the Internet provide similar protections and empowerments for the wandering Web surfer, and infrastructure services for Web site keepers? Bring your data and experience with you as you move from site to site (the way a charge card brings your banking services with you as you shop), releasing information to the site keepers only at your discretion. The Internet is going to get there; it's just a matter of how well and how soon.
Kudos to Scott Isaacs for the original iframe URL data-passing concept. Many thanks to Yaron Goland and Bill Zissimopoulos for their considerable contributions to the early implementations and debugging of the channel code, and to Gabriel Corverra and Koji Kato for their work in the more recent iterations. "It's absolute insanity, but it just might work!"
Resources
XMLHttpRequest 2, Ian Hickson
Anne van Kesteren's Weblog
About the author
Danny Thorpe is a developer on the Windows Live Developer Platform team. His badge says "Principal SDE," but he prefers "Windows Live Quantum Mechanic," as he spends much of his time coaxing stubborn little bits to migrate across impenetrable barriers. In past lives, he worked on "undisclosed browser technology" at Google, and, before that, he was a Chief Scientist at Borland and Chief Architect of the Delphi compiler. At Borland, he had the good fortune to work under the mentorship of Anders Hejlsberg, Chuck Jazdzewski, Eli Boling, and many other Borland legends. Prior to joining Borland, he was too young to remember much. Check out his blog at http://blogs.msdn.com/dthorpe.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.