06 July 2011

Caching in IIS, TMG and the browser - How it all fits together Part II - TMG proxy cache

In Part I - Browser Cache, I explored how the local browser reduces the number of internet calls and the amount of downloaded data by caching valid content.

Proxy Cache
The second part of this is the proxy server.  The most common kind is a client-side proxy that allows users to browse the internet.  It keeps a cache that is used to fill all subsequent requests for the same content.  One significant difference from the browser cache is that this is a shared cache that can be called on by all users of the proxy.  This is called forward caching.

The same principle is used for a reverse proxy.  Web applications published to the internet can take advantage of the proxy cache to reduce calls to the web servers for cacheable content.  This is reverse caching.

Browser and proxy caching are cumulative: the browser only requests content that it cannot fill from its local cache, and if that content is in the proxy cache it is returned from there.
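To make the layering concrete, here is a minimal Python sketch of the lookup order described above.  It is only an illustration: the `fetch` function, the dictionaries standing in for caches and the sample URL are all hypothetical, and expiry and revalidation are deliberately ignored.

```python
def fetch(url, browser_cache, proxy_cache, origin):
    """Return (body, source), trying the browser cache, then the proxy cache, then the origin."""
    if url in browser_cache:
        # Served locally; no request leaves the client.
        return browser_cache[url], "browser"
    if url in proxy_cache:
        # Shared cache hit; the origin server is not contacted.
        body, source = proxy_cache[url], "proxy"
    else:
        # Cache miss everywhere; fetch from the origin and store in the shared cache.
        body, source = origin[url], "origin"
        proxy_cache[url] = body
    # The browser keeps its own copy for next time.
    browser_cache[url] = body
    return body, source


origin = {"/logo.png": b"PNG..."}   # hypothetical web server content
proxy_cache = {}                    # shared by all users of the proxy
alice, bob = {}, {}                 # one browser cache per user

print(fetch("/logo.png", alice, proxy_cache, origin)[1])  # origin
print(fetch("/logo.png", alice, proxy_cache, origin)[1])  # browser
print(fetch("/logo.png", bob, proxy_cache, origin)[1])    # proxy
```

The third call shows the benefit of the shared cache: the second user has never requested the object, yet the request does not reach the origin.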

Content Retrieval
When looking at the TMG logs you can easily see where content is served from by adding the Object Source column.  The object source can be one of the following:


  • Internet - Returned from the internet, and the object is added to cache
  • Cache - Object is returned from cache
  • Not Modified - Returned from cache. An If-Modified-Since request found that the object had not been modified
  • Verified Cache - Returned from cache. Object verified with the source as not modified
  • Verified Failed Internet - Returned from the internet. Verification with the source indicated a change
  • Not Verified Cache - Returned from cache. Object could not be verified
  • Upstream - Returned from an upstream proxy cache (web chaining)



Something to note is that when the source is Internet, the request goes out to the public network.  If it is Not Modified, Cache, etc., it only goes as far as the TMG cache.
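If you export the Object Source column, a rough cache hit ratio is easy to work out.  The Python sketch below assumes you already have the values as a list of strings; the sample data is made up, and treating Upstream as "not served from the local cache" is my own assumption rather than anything TMG defines.

```python
from collections import Counter

# Object Source values that the list above describes as "returned from cache".
CACHE_SOURCES = {"Cache", "Not Modified", "Verified Cache", "Not Verified Cache"}


def cache_hit_ratio(object_sources):
    """Return the fraction of requests whose object source is a local cache value."""
    counts = Counter(object_sources)
    total = sum(counts.values())
    hits = sum(n for source, n in counts.items() if source in CACHE_SOURCES)
    return hits / total if total else 0.0


# Hypothetical sample of Object Source values taken from a log export.
sample = ["Internet", "Cache", "Cache", "Not Modified",
          "Internet", "Upstream", "Verified Cache", "Cache"]

print(f"{cache_hit_ratio(sample):.1%}")  # 62.5%
```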

Cache Control
Content retrieval is based on the cache control information in the HTTP header.  Since there are various web servers that differ in their header generation there are a number of possible cache information values.

TMG assigns a caching information code based on the headers.

  • 0x00000001  - Request should not be served from the cache. 
  • 0x00000002  - Request includes the IF-MODIFIED-SINCE header. 
  • 0x00000004  - Request includes one of these headers: CACHE-CONTROL: NO-CACHE or PRAGMA: NO-CACHE. 
  • 0x00000008  - Request includes the AUTHORIZATION header. 
  • 0x00000010  - Request includes the VIA header. 
  • 0x00000020  - Request includes the IF-MATCH header. 
  • 0x00000040  - Request includes the RANGE header. 
  • 0x00000080  - Request includes the CACHE-CONTROL: NO-STORE header. 
  • 0x00000100  - Request includes the CACHE-CONTROL: MAX-AGE, or CACHE-CONTROL: MAX-STALE, or CACHE-CONTROL: MIN-FRESH header. 
  • 0x00000200  - Cache could not be updated. 
  • 0x00000400  - IF-MODIFIED-SINCE time specified in the request is newer than cached LASTMODIFIED time. 
  • 0x00000800  - Request includes the CACHE-CONTROL: ONLY-IF-CACHED header. 
  • 0x00001000  - Request includes the IF-NONE-MATCH header. 
  • 0x00002000  - Request includes the IF-UNMODIFIED-SINCE header. 
  • 0x00004000  - Request includes the IF-RANGE header. 
  • 0x00008000  - More than one VARY header. 
  • 0x00010000  - Response includes the CACHE-CONTROL: PUBLIC header. 
  • 0x00020000  - Response includes the CACHE-CONTROL: PRIVATE header. 
  • 0x00040000  - Response includes the CACHE-CONTROL: NO-CACHE or PRAGMA: NO-CACHE header. 
  • 0x00080000  - Response includes the CACHE-CONTROL: NO-STORE header. 
  • 0x00100000  - Response includes either the CACHE-CONTROL: MUST-REVALIDATE or CACHE-CONTROL: PROXY-REVALIDATE header. 
  • 0x00200000  - Response includes the CACHE-CONTROL: MAX-AGE or S-MAXAGE header. 
  • 0x00400000  - Response includes the VARY header. 
  • 0x00800000  - Response includes the LAST-MODIFIED header. 
  • 0x01000000  - Response includes the EXPIRES header. 
  • 0x02000000  - Response includes the SET-COOKIE header. 
  • 0x04000000  - Response includes the WWW-AUTHENTICATE header. 
  • 0x08000000  - Response includes the VIA header. 
  • 0x10000000  - Response includes the AGE header. 
  • 0x20000000  - Response includes the TRANSFER-ENCODING header. 
  • 0x40000000  - Response should not be cached.
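Each value in the list is a distinct power of two, which suggests that a logged caching information code is a bitwise combination of them.  Working on that assumption, here is a small Python sketch that breaks a code into its parts.  The table is copied from the list above with shortened descriptions; the function name and the sample code are my own.

```python
# Flag values taken from the list above; descriptions shortened.
CACHE_INFO_FLAGS = {
    0x00000001: "Request should not be served from the cache",
    0x00000002: "Request: IF-MODIFIED-SINCE",
    0x00000004: "Request: CACHE-CONTROL: NO-CACHE or PRAGMA: NO-CACHE",
    0x00000008: "Request: AUTHORIZATION",
    0x00000010: "Request: VIA",
    0x00000020: "Request: IF-MATCH",
    0x00000040: "Request: RANGE",
    0x00000080: "Request: CACHE-CONTROL: NO-STORE",
    0x00000100: "Request: CACHE-CONTROL: MAX-AGE, MAX-STALE or MIN-FRESH",
    0x00000200: "Cache could not be updated",
    0x00000400: "IF-MODIFIED-SINCE time is newer than cached LASTMODIFIED time",
    0x00000800: "Request: CACHE-CONTROL: ONLY-IF-CACHED",
    0x00001000: "Request: IF-NONE-MATCH",
    0x00002000: "Request: IF-UNMODIFIED-SINCE",
    0x00004000: "Request: IF-RANGE",
    0x00008000: "More than one VARY header",
    0x00010000: "Response: CACHE-CONTROL: PUBLIC",
    0x00020000: "Response: CACHE-CONTROL: PRIVATE",
    0x00040000: "Response: CACHE-CONTROL: NO-CACHE or PRAGMA: NO-CACHE",
    0x00080000: "Response: CACHE-CONTROL: NO-STORE",
    0x00100000: "Response: CACHE-CONTROL: MUST-REVALIDATE or PROXY-REVALIDATE",
    0x00200000: "Response: CACHE-CONTROL: MAX-AGE or S-MAXAGE",
    0x00400000: "Response: VARY",
    0x00800000: "Response: LAST-MODIFIED",
    0x01000000: "Response: EXPIRES",
    0x02000000: "Response: SET-COOKIE",
    0x04000000: "Response: WWW-AUTHENTICATE",
    0x08000000: "Response: VIA",
    0x10000000: "Response: AGE",
    0x20000000: "Response: TRANSFER-ENCODING",
    0x40000000: "Response should not be cached",
}


def decode_cache_info(code):
    """Return the descriptions of every flag bit that is set in code, lowest bit first."""
    return [text for bit, text in sorted(CACHE_INFO_FLAGS.items()) if code & bit]


# Hypothetical code: a conditional request answered with a LAST-MODIFIED header.
for line in decode_cache_info(0x00800002):
    print(line)
```

If the code arrives as text from a log export, `int("0x00800002", 16)` converts it to a number first.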

Configuring Proxy Cache
When you enable Web Caching from the web access policy, there are a number of settings to specify.  Before configuring them, consider the following:

  • The cache on TMG is stored both in memory and on disk.  
  • By default 10% of the RAM is reserved for cache.  
  • Cache drives should therefore be larger than 10% of the RAM size.  
  • The maximum cache file size is 64GB per drive.  
  • Files bigger than 512MB are not retained in cache after a reboot.
  • CARP
  • Cache Rules and exclusions

TMG arrays make use of the Cache Array Routing Protocol (CARP).  It essentially reduces overhead by serving content from the array member that already holds the cached item.  If the client is using a proxy configuration script (wpad), client-side CARP is used, which eliminates the need for array members to forward requests between each other.  If clients are not using a configuration script, CARP is performed by the array members and web-chained proxies.
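The core idea behind this kind of routing is that every party can work out, from the URL alone, which array member should hold a given object.  The Python sketch below illustrates that idea with highest-score hashing.  It is not TMG's implementation: CARP defines its own hash and load-factor calculations, SHA-256 is used here purely as a stand-in, and the member names are hypothetical.

```python
import hashlib


def score(member, url):
    """Combine a member name and a URL into a deterministic numeric score."""
    digest = hashlib.sha256(f"{member}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(members, url):
    """Pick the array member with the highest score for this URL."""
    return max(members, key=lambda member: score(member, url))


members = ["tmg-a", "tmg-b", "tmg-c"]          # hypothetical array members
url = "http://example.com/styles/site.css"     # hypothetical request

chosen = owner(members, url)
print(chosen)

# The answer does not depend on who asks or in what order the members are listed,
# so a client script and every array member reach the same conclusion.
print(owner(list(reversed(members)), url) == chosen)  # True
```

A useful property of this approach is that removing a member only moves the URLs that member owned; everything else stays where it is already cached.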

General tab
Here you can only enable or disable the rule.

Cache Drives
You need to specify the drive and the amount of space that you want to allocate to the cache.  Remember that this should at the very least be bigger than 10% of the TMG server's RAM.  This setting needs to be specified per array member.  The default location of this file is C:\urlcache\dir1.cdat
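As a quick sanity check of those sizing guidelines, the arithmetic can be written down in a few lines of Python.  The 10% and 64GB figures come from this post; the function name and the sample sizes are made up for illustration.

```python
RAM_CACHE_FRACTION = 0.10   # default share of RAM reserved for cache (from this post)
MAX_CACHE_FILE_GB = 64      # maximum cache file size per drive (from this post)


def check_cache_drive(ram_gb, drive_cache_gb):
    """Return (RAM reserved for cache in GB, list of problems with the proposed drive cache size)."""
    ram_cache_gb = ram_gb * RAM_CACHE_FRACTION
    problems = []
    if drive_cache_gb <= ram_cache_gb:
        problems.append(f"disk cache should be larger than {ram_cache_gb:.1f} GB")
    if drive_cache_gb > MAX_CACHE_FILE_GB:
        problems.append(f"disk cache exceeds the {MAX_CACHE_FILE_GB} GB per-drive limit")
    return ram_cache_gb, problems


# Hypothetical server with 16 GB of RAM and three candidate cache sizes.
for size_gb in (1, 10, 100):
    reserved, problems = check_cache_drive(16, size_gb)
    print(size_gb, "GB:", problems or "ok")
```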

Cache Rules
You should have three rules by default.

  • Microsoft Update Cache Rule - This is an exclusion rule
  • Web access scenario cache rule - This is an inclusion rule
  • Default Rule

To tab
At first glance this is a little misleading, since here you need to specify "Cache content requested from these destinations".  You can also specify exceptions.

The Cache Store and retrieval tab 
This allows you to specify the behavior of the cache. 


Currently the default settings are indicated.  If you wish to store additional items in cache, I would suggest setting up a specific cache rule for that URL set / network.  Move this rule up above the "Web access scenario cache rule."

Content download tab
From here you can create jobs that run on a schedule to pre-download items and store them in cache.  You can specify that the download should exclude external links, and set the maximum depth of links to follow per page.  When specifying content caching you need to define how you want to cache that pre-downloaded data.  You have the ability to overwrite the caching information you receive from the web server and replace it with your own.  Bear in mind that your settings will filter down to the browser cache and affect how it stores content.


On a personal note, I have never needed to implement this.  If a site is frequently visited by your proxy users, most of it will already be cached.  But it is there should you need it.

Advanced tab 
You get to set cache settings for content that does not have explicit cache settings.  You can cache objects that did not return a successful download code (200).
You can also configure additional settings for content that was marked to expire.

Also quite important, but hidden away at the bottom of the last page, is the setting for how much RAM to commit to caching.


Inspecting, refreshing and clearing the cache
Occasionally you might have to clear the cache.  If, for instance, a specific site had incorrectly specified content to be cached, TMG would normally use the same caching parameters.  You could then have an object that is supposed to be dynamic "stuck in cache" with no way for the users to refresh it.

The crude but very effective way of fixing this issue is to clear the cache by deleting the cache files.  To do this you need to remove the cache drive from the array members, after which the services need to restart.  Then manually delete the cache files on all the array members and add the cache drives back in again.  You will however lose every single cached item, so users will have to start building the cache up from scratch.

If, however, you want to expire only a specific item, you can use the Cache Directory Tool. http://www.microsoft.com/download/en/details.aspx?id=11183

To avoid the error "The program can't start because msfpc.dll is missing on your computer." you need to copy cachedir.exe into the TMG installation directory.

When you launch the tool, give it a minute to go through the cache file.  Once loaded, you can see all the items and the following details:
  • Page name - object name
  • Server - URL
  • Content size
  • Content MIME type
  • Expires 
  • Last Modified
  • Age
  • Protocol
  • Port #
  • Encoding
You can clear an item out of the cache file by marking it as obsolete.


One important thing to keep in mind though: this will only clear it out of cache for that array member.  It will also only delete it from the file system and not from RAM.

Conclusion
TMG provides a flexible caching solution.  The default settings work just fine for most applications, as they do not attempt to overwrite the caching information received from the originating web server.  You can force items to be cached, but it is a cumbersome task to get them out again should there be a problem.

For reverse caching you can offload some of the cache functionality from the web servers.  The effectiveness of this, and the saving it brings, is largely determined by the application.  If an item that would normally be static with a long TTL, say a style sheet, gets changed, that change only becomes visible to the user once the web server, proxy and browser caches have all expired.
