I'll answer my own question:
Static public content
Date: <current time>
Expires: <current time + one year>
Rationale: This is compatible with HTTP/1.0 proxies and with RFC 2616 Section 14: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21
The Last-Modified header is not needed for correct caching (because conforming user agents follow the Expires header) but may be included for end-user consumption. Including the Last-Modified header may also decrease server data transfer when the user hits the Reload/Refresh button. If a Last-Modified header is added, it should reflect real data instead of a made-up value. If you want to decrease server data transfer (when the user hits the Reload/Refresh button) and cannot include a real Last-Modified header, you may add an ETag header to allow conditional GET (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.26). If you already include Last-Modified, also adding ETag is just a waste. Note that Last-Modified is clearly superior because it's supported by HTTP/1.0 clients and proxies, too. A suitable value for ETag in the case of dynamic pages is the SHA-1 of the contents of the page/resource. Note that using Last-Modified or ETag will not help with the server load, only with the server's outgoing internet pipe / data transfer rate.
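A minimal sketch of the static public case, assuming a Python server where you build the response headers yourself. The function names static_public_headers and is_not_modified are made up for illustration, and the ETag is the quoted hex SHA-1 of the body as suggested above (used here on the assumption that no real Last-Modified time is available). The If-None-Match parsing is deliberately simple and assumes entity tags that contain no commas.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict, Optional

ONE_YEAR = timedelta(days=365)


def static_public_headers(body: bytes, now: datetime) -> Dict[str, str]:
    """Date/Expires pair for static public content, plus a SHA-1 ETag.

    `now` must be an aware datetime in UTC so it can be rendered as GMT.
    """
    return {
        "Date": format_datetime(now, usegmt=True),
        "Expires": format_datetime(now + ONE_YEAR, usegmt=True),
        # Entity tags are quoted strings; the value is the body's SHA-1.
        "ETag": '"' + hashlib.sha1(body).hexdigest() + '"',
    }


def is_not_modified(if_none_match: Optional[str], etag: str) -> bool:
    """Return True if the request's If-None-Match header matches `etag`.

    A True result means the server may answer 304 without a body.
    """
    if not if_none_match:
        return False
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates:
        return True

    def strip_weak(tag: str) -> str:
        # Weak comparison: ignore a leading W/ prefix.
        return tag[2:] if tag.startswith("W/") else tag

    return strip_weak(etag) in {strip_weak(tag) for tag in candidates}
```

Note that the server still has to produce the body to compute the hash, which is why this only saves outgoing data transfer and not server load.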
Static non-public content
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=31536000, s-maxage=0
Vary: Cookie
Rationale: The Date and Expires headers are for HTTP/1.0 compatibility; because HTTP/1.0 has no sensible way to specify that the response is private, these headers communicate that the response may not be cached. The Cache-Control header tells that this response may be cached by a private cache but not by a shared cache. The s-maxage=0 directive is added because private may not be supported by all proxies that support Cache-Control (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3 - I have no idea which proxies are broken). The max-age is set to 60*60*24*365 (1 year); the HTTP/1.1 specification does not define any upper limit for this parameter, so I guess that this is implementation dependent. The Expires header SHOULD be limited to one year in the future, so using the same logic here should be okay. The Vary: Cookie header is required because the session that is used to check whether the visitor is allowed to see the content is transferred in a cookie; because the returned response depends on the cookie value, the cache may not use a cached response if the Cookie header has changed.
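The header set for static non-public content could be generated along these lines. This is only a sketch: static_private_headers is a hypothetical helper name, and `now` is assumed to be an aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict

# 60*60*24*365 seconds, the same one-year limit that applies to Expires.
ONE_YEAR_SECONDS = 60 * 60 * 24 * 365


def static_private_headers(now: datetime) -> Dict[str, str]:
    """Headers for static content that only private caches may keep."""
    stamp = format_datetime(now, usegmt=True)
    return {
        # Date == Expires: HTTP/1.0 caches treat the response as stale.
        "Date": stamp,
        "Expires": stamp,
        # HTTP/1.1 private caches may keep it for a year; shared caches
        # that ignore `private` still see s-maxage=0.
        "Cache-Control": f"private, max-age={ONE_YEAR_SECONDS}, s-maxage=0",
        "Vary": "Cookie",
    }
```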
I might personally break the last part. By not including the Vary: Cookie header I can improve caching a lot. For example: I have a profile image at http://example.com/icon/12 which is returned only for selected authenticated users. I have a visitor X with session id 5f2 and I allow the image to that user. Visitor X logs out and then later logs in again. Now X has session id 2e8 stored in his session cookie. If I have Vary: Cookie, the user agent of X cannot use the cached image and is forced to reload it into its cache. Because the content varies by Cookie, a conditional GET with the last modification time cannot be used. I haven't tested whether using ETag could help in this case, because then the server response would be the same (it would match the SHA-1 ETag computed from the contents of the response). Be warned that Internet Explorer (at least up to version 9) always forces a conditional GET for resources that include Vary: Cookie even if a suitable response is already in the cache (source: http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx). This is because the internal cache implementation of MSIE does not remember which Cookie it sent the first time, so it cannot know whether the current Cookie is the same one.
However, here's an example of a problem caused by dropping the Vary: Cookie header, to show why it is indeed required for technically correct behavior: see the example above and imagine that after X has logged out, visitor Y logs in with the same user agent (the user agent may have been restarted between X and Y, it does not matter). If Y views a page that includes a link to http://example.com/icon/12 then Y will see the icon embedded inside the page even though Y wouldn't be able to see the icon if X had not been using the same user agent previously. In my case I don't consider this a big enough problem because Y would be able to access the icon manually by inspecting the user agent cache regardless of whether Vary: Cookie was sent. However, this issue may prevent Y from noticing that he doesn't technically have access to this content (this may be important e.g. if Y is co-authoring the content). If the content is considered sensitive, the server must send no-store regardless of the problems caused by this Cache-Control directive.
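To make the trade-off concrete, here is a toy model of how a cache selects a stored response when Vary is present. ToyCache is invented for this illustration and is not how any real browser cache is implemented; it ignores freshness, validation and everything except the Vary matching. The session ids are the ones from the example with visitor X.

```python
from typing import Dict, List, Optional, Tuple


def _lower_keys(headers: Dict[str, str]) -> Dict[str, str]:
    """Header names are case-insensitive, so normalise them."""
    return {name.lower(): value for name, value in headers.items()}


class ToyCache:
    """Stores responses per URL, keyed by the request headers named in Vary."""

    def __init__(self) -> None:
        self._store: Dict[str, List[Tuple[Dict[str, Optional[str]], bytes]]] = {}

    def put(self, url: str, request_headers: Dict[str, str],
            response_headers: Dict[str, str], body: bytes) -> None:
        request = _lower_keys(request_headers)
        vary = _lower_keys(response_headers).get("vary", "")
        names = [n.strip().lower() for n in vary.split(",") if n.strip()]
        # Remember the values of the selecting request headers.
        selected = {name: request.get(name) for name in names}
        self._store.setdefault(url, []).append((selected, body))

    def get(self, url: str, request_headers: Dict[str, str]) -> Optional[bytes]:
        request = _lower_keys(request_headers)
        for selected, body in self._store.get(url, []):
            # Reusable only if every selecting header has the same value.
            if all(request.get(n) == v for n, v in selected.items()):
                return body
        return None


URL = "http://example.com/icon/12"

with_vary = ToyCache()
with_vary.put(URL, {"Cookie": "session=5f2"}, {"Vary": "Cookie"}, b"icon")
miss_after_relogin = with_vary.get(URL, {"Cookie": "session=2e8"})

without_vary = ToyCache()
without_vary.put(URL, {"Cookie": "session=5f2"}, {}, b"icon")
hit_after_relogin = without_vary.get(URL, {"Cookie": "session=2e8"})
```

With Vary: Cookie the new session id misses the cache; without it the same lookup hits, which is both the caching win for X and the leak for Y.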
Here too, adding a Last-Modified header will help with users hitting the Reload/Refresh button (see discussion above).
Volatile public content
Date: <current time>
Expires: <current time>
Cache-Control: public, max-age=0, s-maxage=0
Last-Modified: <real-last-modification-time>
Rationale: Tell HTTP/1.0 clients and proxies that this response should be considered stale immediately. The Last-Modified time is included to allow skipping content data transmission when the resource is accessed again and the client supports conditional GET. If Last-Modified cannot be used, ETag may be used as a replacement (see discussion above). It's critical to use Last-Modified to allow conditional GET with HTTP/1.0 compatible clients.
If the content may be delayed even slightly, then Expires, max-age and s-maxage [sic] should be adjusted suitably. For example, adding 5 seconds to those might help a lot for a highly popular site, as suggested by symcbean's answer. Note that unlike conditional GET, increasing the expiry time will decrease server load instead of just decreasing server outgoing data traffic (because the server will see fewer requests in total).
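A sketch of the volatile public case, with the optional short freshness window and the If-Modified-Since check that lets the server skip sending the body. The names volatile_public_headers, not_modified_since and the freshness_seconds parameter are assumptions made for this example; both datetimes are assumed to be aware and in UTC.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional


def volatile_public_headers(now: datetime, last_modified: datetime,
                            freshness_seconds: int = 0) -> Dict[str, str]:
    """Headers for public content that goes stale (almost) immediately."""
    expires = now + timedelta(seconds=freshness_seconds)
    return {
        "Date": format_datetime(now, usegmt=True),
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": (f"public, max-age={freshness_seconds}, "
                          f"s-maxage={freshness_seconds}"),
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }


def not_modified_since(if_modified_since: Optional[str],
                       last_modified: datetime) -> bool:
    """Return True if a conditional GET may be answered with 304."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable date: ignore the header and send the full response.
        return False
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    return last_modified.replace(microsecond=0) <= since
```

Calling volatile_public_headers(now, last_modified, 5) gives the "5 seconds" variant mentioned above; the default of 0 gives the header set listed at the top of this section.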
Volatile non-public content
Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=0, s-maxage=0
Last-Modified: <real-last-modification-time>
Vary: Cookie
Rationale: Tell HTTP/1.0 clients and proxies that this response should be considered stale immediately. The Last-Modified time is included to allow skipping content data transmission when the resource is accessed again and the client supports conditional GET. If Last-Modified cannot be used, ETag may be used as a replacement (see discussion above). It's critical to use Last-Modified to allow conditional GET with HTTP/1.0 compatible clients. Also note that Cache-Control must not include no-cache, must-revalidate or no-store, because using any of these directives will break the back button in at least one user agent. However, if the content the server is transferring contains sensitive material that should not be stored in permanent storage, the no-store flag MUST be used regardless of breaking the back button. Warning: note that the use of no-store cannot prevent sensitive material from ending up unencrypted on the hard disk if the operating system has swapping enabled and the swap is not encrypted! Also note that using no-store makes very little sense unless the connection is encrypted (HTTPS/SSL).
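The volatile non-public headers, including the sensitive-content exception, might be put together like this. The helper name volatile_private_headers and its `sensitive` flag are hypothetical, and appending no-store to the existing directives (rather than replacing them) is my assumption, not something the text above prescribes.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def volatile_private_headers(now: datetime, last_modified: datetime,
                             sensitive: bool = False) -> Dict[str, str]:
    """Headers for per-user content that is stale immediately.

    Both datetimes must be aware and in UTC.
    """
    stamp = format_datetime(now, usegmt=True)
    cache_control = "private, max-age=0, s-maxage=0"
    if sensitive:
        # Breaks the back button in some user agents, but is required
        # when the content must not reach permanent storage.
        cache_control += ", no-store"
    return {
        "Date": stamp,
        "Expires": stamp,
        "Cache-Control": cache_control,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Vary": "Cookie",
    }
```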