Server-side Caching of Embeds
Caching is a standard technique for speeding up the delivery of pages to users. It can occur in various places, for example in the browser or at intermediate proxies, but this document is concerned with caching at the server itself.
Caching makes no visual difference to a site, but it matters for usability because it speeds the site up, and it can also help SEO, as search engines may rank faster sites higher. The compromise, considered in detail below, is that users may not always be shown up-to-date information. How much that matters depends on the site and the particular page, and caching can easily be tuned to balance freshness against speed.
Note: For how to make the setting changes discussed here, see: Configuring caching
Different types of caching have different strengths and weaknesses:
Browser caching:
Pro: caches personalised content
Con: of no benefit for the first request by that user
Proxy caching:
Pro: several users share the cache, so some will get a first-request benefit
Con: doesn't handle personalised content
Con: can lead to under-reporting in traffic stats
Server caching:
Pro: caches personalised content
Pro: all users share the cache
Pro: all page hits are recorded
Con: uses disk space on the server
How server-side caching works
Server caching comes in different forms. To understand the differences, we first need to look at how pages are put together.
When a browser requests a page, the system has to build it before it can deliver it for display. To build the page, it runs a series of queries against the underlying database. For example, some queries find out which navigation links should be shown, and others retrieve the body content of the page. A complex page may involve hundreds of queries, each of which takes time to run.
Server caching works on the individual query embeds on the page. Once the underlying queries for an embed have been run and the resulting HTML has been generated, that HTML is stored in a cache. Subsequent requests for the same embed are given the cached content instead of running the queries again.
Since using the cached content is faster than running the queries, the page is delivered faster.
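The flow described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the system's actual implementation: the names get_embed_html and render_navigation are hypothetical, and an in-memory dictionary stands in for the server's on-disk cache.

```python
from typing import Callable, Dict, List

# Hypothetical store: embed identifier -> generated HTML.
# (A dictionary is used here for brevity; the real cache lives on disk.)
_embed_cache: Dict[str, str] = {}

# Records each time the "queries" are actually run, for demonstration.
query_runs: List[str] = []


def get_embed_html(embed_id: str, render: Callable[[], str]) -> str:
    """Return the HTML for an embed, running its queries only on a cache miss."""
    if embed_id in _embed_cache:
        return _embed_cache[embed_id]  # hit: no queries are run
    html = render()                    # miss: run the queries, build the HTML
    _embed_cache[embed_id] = html      # store it for subsequent requests
    return html


def render_navigation() -> str:
    """Stand-in for the database queries behind a navigation embed."""
    query_runs.append("navigation")
    return "<ul><li>Home</li><li>Contact</li></ul>"


first = get_embed_html("navigation", render_navigation)
second = get_embed_html("navigation", render_navigation)
```

Both calls return the same HTML, but the rendering function, and therefore the queries, runs only for the first one.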
Choosing a cache duration
Because cached content may not reflect the current information, a freshness setting is available in the Embed dialogs, where you can specify how old a cache item can be before it is discarded.
Caching is not suitable for every query: for some, it is essential that the user is shown the current data.
For example, if you are showing users their account details, stale data would be confusing: a user who has just changed their details would be shown the previous version.
Choose a cache duration with care, as you need to strike a balance:
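As a rough sketch of what a freshness setting implies, the check below discards a cache item once it is older than the permitted age. The CacheItem type, the read_if_fresh function and the use of seconds as the unit are all assumptions made for illustration; the Embed dialogs may express the setting differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheItem:
    html: str
    stored_at: float  # time the item was cached, in seconds


def read_if_fresh(item: Optional[CacheItem], now: float,
                  max_age_seconds: float) -> Optional[str]:
    """Return the cached HTML, or None if it is missing or too old."""
    if item is None:
        return None                       # nothing cached yet
    if now - item.stored_at > max_age_seconds:
        return None                       # too old: discard and re-run the query
    return item.html


item = CacheItem(html="<p>Latest news</p>", stored_at=1_000.0)
ten_minutes_later = read_if_fresh(item, now=1_600.0, max_age_seconds=3_600.0)
two_hours_later = read_if_fresh(item, now=8_200.0, max_age_seconds=3_600.0)
```

With a one-hour limit, the item is still served ten minutes after it was stored, but two hours later the function returns None and the caller would have to run the query again.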
A long cache duration will provide the most benefit from cache hits, but data shown may become increasingly out of date.
A short cache duration will provide fresher versions of the data, but will provide less benefit.
Consider the 'cache hit rate'. This is the proportion of requests that are served from the cache rather than by running the query. So if the query is run for the first request and the next 99 requests use the cache, 99 out of 100 requests are hits: a 99% hit rate, which is very good.
However, if the page is only requested once per hour, reaching that hit rate would need a cache duration of over 4 days (100 requests at one per hour take 100 hours). If your data never changes, or staleness doesn't matter too much, then such a long duration may be fine; however, if the data changes frequently you may have to reduce the cache duration to prevent people seeing obviously out-of-date data, even if this means your cache hit rate is reduced.
Naturally, for busy pages that are requested more frequently this is not an issue, and the benefit of caching in reducing server load becomes even greater.
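The arithmetic behind these figures can be checked with a short calculation. The sketch below assumes requests arrive at an even rate and that exactly one request per cache lifetime is a miss; real traffic is burstier, so treat the results as rough estimates. The function names are hypothetical.

```python
def expected_hit_rate(requests_per_hour: float, cache_hours: float) -> float:
    """Approximate hit rate: one miss per cache lifetime, the rest are hits."""
    requests_per_lifetime = requests_per_hour * cache_hours
    if requests_per_lifetime <= 1:
        return 0.0  # the item expires before anyone can reuse it
    return (requests_per_lifetime - 1) / requests_per_lifetime


def hours_needed(requests_per_hour: float, target_hit_rate: float) -> float:
    """Cache duration in hours needed to reach a target hit rate (below 1.0)."""
    return 1 / (requests_per_hour * (1 - target_hit_rate))


quiet_page_hours = hours_needed(requests_per_hour=1, target_hit_rate=0.99)
quiet_page_days = quiet_page_hours / 24
busy_page_hours = hours_needed(requests_per_hour=600, target_hit_rate=0.99)
```

For a page requested once per hour, a 99% hit rate needs roughly 100 hours, a little over 4 days. For a page requested 600 times per hour, the same hit rate needs only about 10 minutes.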
Personalised caching
When a query on a page gets cached, the cache also notes the circumstances in which the page was called, to ensure that the cached version is only shown to subsequent page requests with matching conditions.
For example, if the query contains a criterion based on who the current user is, it is important to show that cached version only to subsequent requests from the same user. In that situation, a separate cache is stored for each user who visits the page. Other factors that can cause a separate cache to be generated automatically include: domain name, browser version, whether JavaScript is supported by the browser, cookies and associate ID.
Whilst all of this is handled automatically, it is important to understand what is happening, as it affects both the overall cache hit rate and the amount of disk space needed to store the cache.
Indeed, for highly personalised pages that are not visited repeatedly, it may be better not to cache at all.
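One way to picture personalised caching is as a cache key built from the embed plus whichever request conditions the query depends on. The sketch below is an assumption about the general technique, not a description of how the system derives its keys; cache_key and the field names "user" and "domain" are hypothetical.

```python
import hashlib
from typing import Iterable, Mapping


def cache_key(embed_id: str, request: Mapping[str, str],
              depends_on: Iterable[str]) -> str:
    """Build a key from the embed and only the conditions its query uses."""
    parts = [embed_id]
    for name in sorted(depends_on):  # sorted so the key is order-independent
        parts.append(f"{name}={request.get(name, '')}")
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


alice = {"user": "alice", "domain": "example.com"}
bob = {"user": "bob", "domain": "example.com"}

# A query that ignores the user: both visitors share one cache entry.
shared_a = cache_key("latest-news", alice, depends_on=["domain"])
shared_b = cache_key("latest-news", bob, depends_on=["domain"])

# A query that filters on the user: each visitor gets a separate entry.
personal_a = cache_key("my-account", alice, depends_on=["user", "domain"])
personal_b = cache_key("my-account", bob, depends_on=["user", "domain"])
```

The more conditions a query depends on, the more distinct keys are produced, which means more cache files on disk and fewer hits per file.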
To protect the system, if there is less than half a gigabyte of space left on the server's hard drive, new cache files will not be written.
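A guard of this kind could look something like the following sketch, which uses Python's standard shutil.disk_usage to read the free space. The function names are hypothetical, and treating half a gigabyte as 512 MiB is an assumption; the system's exact threshold may be measured differently.

```python
import shutil

# Assumption: "half a gigabyte" taken as 512 MiB.
MIN_FREE_BYTES = 512 * 1024 * 1024


def has_room_for_cache(free_bytes: int, minimum: int = MIN_FREE_BYTES) -> bool:
    """True if enough free space remains to allow new cache files."""
    return free_bytes >= minimum


def can_write_cache(path: str = ".") -> bool:
    """Check the drive holding `path` before writing a new cache file."""
    return has_room_for_cache(shutil.disk_usage(path).free)


plenty = has_room_for_cache(10 * 1024 * 1024 * 1024)  # 10 GiB free
nearly_full = has_room_for_cache(100 * 1024 * 1024)   # 100 MiB free
```

When the check fails, existing cache files can still be read; only the writing of new ones is skipped, so pages continue to work but gain less from caching.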
In order to reduce unnecessary cache personalisation:
- ensure that links to pages use the minimum number of parameters
- ensure embedded queries use the minimum number of parent or page parameters
Monitoring overall effectiveness
Consider using Google Analytics' Site Speed report to get an assessment of how fast pages are responding to users, and Google Webmaster Tools to see how your pages respond to Google.
Bear in mind that many other factors affect the page load time experienced by users, but this should give you a general comparison as you turn on or adjust your cache settings.