It might sound like an obscure branch of theoretical physics, and possibly have some similar-looking enthusiasts, but the mysteries of cookies are quick to penetrate.
Cookies enhance Web requests
Netscape's own specification is at http://www.netscape.com/newsref/std/cookie_spec.html, but the most official specification is easily readable here:
http://www.cis.ohio-state.edu/htbin/rfc/rfc2109.html. Most of the basic points of the specification are covered in the following sections.
Each URL or HTTP request made by a browser user is turned into lines of text called headers for sending to the web server. When the web server issues a response, the same happens. Cookies are just extra header lines containing cookie-style information. This is all invisible to the user. So user requests and web server responses may occur with or without invisible cookies riding piggyback.
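As a rough illustration of those extra header lines, the sketch below uses Python's standard http.cookies module to build the line a web server might add to a response and the line a browser might add to a later request. The cookie name "visitor" and its value are invented for the example.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie: the name "visitor" and value "abc123" are invented.
cookie = SimpleCookie()
cookie["visitor"] = "abc123"
cookie["visitor"]["path"] = "/jscript"

# Header line a web server could add to its response.
response_header = cookie.output()

# Header line a browser could add to a later request: name=value pairs only.
request_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in cookie.items()
)

print(response_header)
print(request_header)
```

The response line carries the cookie's attributes along with its value, while the request line carries only the name and value back to the server.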
Anatomy of a Cookie
A cookie has the following attributes: name, value, domain, path, expiry time and secure.
The value part of a cookie is a string of any characters. That string must follow the rules for URLs, which means that characters not allowed in a URL must be encoded. There are no 'null' values for cookies, but zero-length strings are possible.
If two different web sites are viewed in a browser, they shouldn't be able to affect each other's cookies. Cookies have a domain property that restricts their visibility to one or more web sites.
Consider an example URL, http://www.altavista.yellowpages.com.au/index.html. Any cookies with domain 'www.altavista.yellowpages.com.au' are readable from this page. Domains are also hierarchical: cookies with these domains: '.yellowpages.com.au' and '.com.au' could all be picked up by that URL in the browser. The leading full stop is required for partial addresses. To prevent bored university students making a cookie visible to every web page in the world, at least two domain portions must be specified.
In practice, the domain attribute isn't used much, because it defaults to the domain of the document it piggybacked into the browser on (very sensible), and because it's unlikely that you would want to share a cookie with another web site anyway.
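The domain rules described above can be sketched as a small matching function. This is a simplified reading of those rules, not a browser's actual algorithm; the function name domain_matches is made up for the example.

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Return True if a cookie with cookie_domain is visible to host.

    Simplified sketch of the rules in the text, not a real browser's logic.
    """
    host = host.lower()
    cookie_domain = cookie_domain.lower()

    if not cookie_domain.startswith("."):
        # A full address must match the host exactly.
        return host == cookie_domain

    # A partial address needs at least two portions, such as ".com.au".
    portions = [p for p in cookie_domain.split(".") if p]
    if len(portions) < 2:
        return False

    # The host must end with the partial address, leading full stop included.
    return host.endswith(cookie_domain)


host = "www.altavista.yellowpages.com.au"
for domain in (host, ".yellowpages.com.au", ".com.au", ".au", ".microsoft.com"):
    print(domain, domain_matches(host, domain))
```

Only the first three domains in the loop are visible to the example host: '.au' has too few portions and '.microsoft.com' belongs to a different site.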
In a similar manner to domains, the path attribute of a cookie restricts a cookie's visibility to a particular part of a web server's directory tree. A web page such as http://www.microsoft.com/jscript might have a cookie with path '/jscript', which is only relevant to the JScript pages of that site. If a second cookie with the same name and domain also exists, but with the path '' (equivalent to '/'), then the first cookie takes precedence for that web page, because its path is a closer match to the URL's path.
Paths represent directories, not individual files, so '/usr/local/tmp' is correct, but '/usr/local/tmp/myfile.htm' isn't. Forward slashes ('/' not '\') should be used. Trailing slashes as in '/usr/local/tmp/' should be avoided. That is why the top-level path is '' (a zero-length string), not '/'.
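Path matching can be sketched in the same spirit. The code below follows the convention used here, where '' is the top-level path, and ranks the matching paths so that the closest match comes first. The function names are invented, and the ranking by path length is an assumption about what "closer match" means.

```python
def path_matches(url_path: str, cookie_path: str) -> bool:
    """Return True if cookie_path is a directory at or above url_path."""
    cookie_path = cookie_path.rstrip("/")  # treat '/' the same as ''
    if cookie_path == "":
        return True  # the top-level path matches every URL on the site
    return url_path == cookie_path or url_path.startswith(cookie_path + "/")


def closest_first(url_path: str, cookie_paths: list) -> list:
    """Return the matching cookie paths, longest (closest) match first."""
    matching = [p for p in cookie_paths if path_matches(url_path, p)]
    return sorted(matching, key=len, reverse=True)


print(closest_first("/jscript/index.htm", ["", "/jscript", "/vbscript"]))
```

Note that the comparison works on whole directory names, so a cookie with path '/jscript' does not match a URL path such as '/jscriptx'.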
The name, domain and path combine to fully identify an individual cookie.
The expiry time provides one of two cleanup mechanisms for cookies (see the next section for the other). Without such mechanisms, cookies might just build up in the browser forever, until the user's computer fills up.
The expiry time is optional. It is a moment in time. Without one, a cookie will survive only while the browser is running. With one, a cookie will survive even if the browser shuts down, but it will be discarded at the time dictated. If the time passes when the browser is down, the cookie is discarded when it next starts up. If the time dictated is zero or in the past, the cookie will be discarded immediately.
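The three cases in that paragraph can be written down as a small decision function. This is only a sketch: the function name and the three labels it returns are invented, and the expiry time is assumed to be a number of seconds since 1 January 1970.

```python
import time
from typing import Optional


def cookie_state(expires: Optional[int], now: Optional[float] = None) -> str:
    """Classify a cookie by its expiry time (seconds since 1 January 1970)."""
    if now is None:
        now = time.time()
    if expires is None:
        # No expiry time: the cookie lasts only while the browser is running.
        return "session"
    if expires <= now:
        # Zero or a time in the past: the cookie is discarded.
        return "discard"
    # A future time: the cookie survives a browser shutdown until then.
    return "persistent"


print(cookie_state(None), cookie_state(0), cookie_state(int(time.time()) + 3600))
```

A browser that was shut down when the time passed would reach the "discard" case for that cookie the next time it starts up.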
The secure attribute is a true/false flag, which hints whether the cookie is too private for plain URL requests. The browser should only make secure (SSL) URL requests when sending this cookie. This attribute is less commonly used.
Browser Cookie restrictions
Browsers place restrictions on the number of cookies that can be held at any one time. The restrictions are:
- 20 cookies maximum per domain.
- 4096 bytes per cookie description.
- 300 cookies overall maximum.
RFC 2109 treats these figures as minimums: a browser should support at least this much. Netscape's specification and browsers treat them as maximums, in an attempt to guarantee that cookies won't consume all your disk space.
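A minimal sketch of the three limits, read as maximums in the Netscape manner, might look like the following. The "jar" structure (a dictionary mapping each domain to a list of cookie description strings) and the function name are assumptions made for the example, and the sketch only reports whether a new cookie fits; what a browser does when a limit is reached is not modelled.

```python
MAX_PER_DOMAIN = 20   # cookies per domain
MAX_BYTES = 4096      # bytes per cookie description
MAX_TOTAL = 300       # cookies overall


def within_limits(jar: dict, domain: str, description: str) -> bool:
    """Return True if one more cookie for domain would fit within the limits.

    jar is a hypothetical structure: {domain: [cookie description, ...]}.
    """
    if len(description.encode("utf-8")) > MAX_BYTES:
        return False
    if len(jar.get(domain, [])) >= MAX_PER_DOMAIN:
        return False
    if sum(len(cookies) for cookies in jar.values()) >= MAX_TOTAL:
        return False
    return True


jar = {"www.geocities.com": ["GeoId=2035695874900187870"]}
print(within_limits(jar, "www.geocities.com", "plugs=yes"))
```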
The Netscape file that the cookie data resides in when the browser is shut down is called cookies.txt on Windows and Unix, and resides in the Netscape installation area (under each user for Netscape Communicator). It is a plain text file, automatically generated by the browser on shutdown, similar to the prefs.js file. The user can always delete this file while the browser is shut down, which removes all cookies from their system. An example file:
# Netscape HTTP Cookie File
# http://www.netscape.com/newsref/std/cookiespec.html
# This is a generated file! Do not edit.
www.geocities.com FALSE / FALSE 937972424 GeoId 2035695874900187870
.linkexchange.com TRUE / FALSE 942191819 SAFE_COOKIE 3425efc81808cebe
www.macromedia.com FALSE FALSE 877627211 plugs yes
The large number in the middle of each line is the expiry time in seconds since 1 January 1970. From this example, you can see that most web sites set only one cookie, and that it contains nothing but a unique ID. Web sites often use this ID to look up their own records on the visitor holding the ID.
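To make the expiry number concrete, the sketch below parses one line of such a file and converts the number into a calendar date. It assumes that the fields are separated by tabs and that the seven columns are, in order, domain, a subdomain flag, path, secure flag, expiry time, name and value; only the expiry column is described above, so the other column names are assumptions. The function name is invented.

```python
from datetime import datetime, timezone

# Assumed column order; only the expiry column is described in the text.
FIELDS = ("domain", "subdomains", "path", "secure", "expires", "name", "value")


def parse_cookie_line(line: str):
    """Parse one tab-separated line, or return None for comments and blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    parts = line.split("\t")
    if len(parts) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    # Convert seconds since 1 January 1970 into a UTC date and time.
    record["expires"] = datetime.fromtimestamp(int(record["expires"]), tz=timezone.utc)
    return record


sample = "www.geocities.com\tFALSE\t/\tFALSE\t937972424\tGeoId\t2035695874900187870"
record = parse_cookie_line(sample)
print(record["name"], record["expires"].isoformat())
```

For the first line of the example file, the expiry time falls in September 1999.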
The equivalent files for Internet Explorer are stored by default in the C:\WINDOWS\COOKIES directory in .TXT files with the user's name. These are almost readable. Ironically, if you copy the files to a Unix computer, they are easily readable.