
There are 11 root caches active in the US. There is one at MAE-WEST and another at PAIX.
The total web is about 10 Terabits. The cache is saving about 2.25 Terabits.
The caches are supposed to be organized in a hierarchy, but it is not strict. The NLANR caches should only have clients that are caches themselves. The hit rate is around 40%, which is pretty high.
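As a rough check on the figures above: 2.25 out of 10 terabits is 22.5% of the traffic, which is lower than the 40% hit rate. That would be consistent if the hit rate counts requests rather than bytes, though the notes do not say which it is. A minimal sketch of the arithmetic, assuming both totals cover the same traffic and time period (the function name is illustrative only):

```python
def savings_fraction(saved, total):
    """Return saved/total, the share of traffic served from cache.

    Both arguments must use the same unit (here: terabits).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return saved / total


# Figures quoted in the notes; assumed to describe the same period.
byte_savings = savings_fraction(2.25, 10.0)
print(f"Share of traffic saved by the cache: {byte_savings:.1%}")
```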
What kind of documents are cached?
Images have the highest hit rate as well.
Plankton is a tool to visualize the relationships between caches.
What kind of return on investment should someone expect from doing caching?
Allan asks: How do I know how many hits I get via the cache?
There is a practice called "cache busting" that keeps some URLs from being cached. Jeff Mogul has an Internet-Draft to make hit counting through caches possible. There are also some privacy issues on both sides.
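The notes do not say how sites actually bust caches. One common way, shown here only as a sketch, is for the origin server to send response headers that tell caches not to reuse the reply. The header names are standard HTTP; the function name and the choice to send all three together are assumptions made for illustration.

```python
from email.utils import formatdate


def cache_busting_headers():
    """Return response headers that ask caches not to reuse a reply.

    Illustrative only: real sites may use just one of these, or a
    different trick such as a unique query string on each URL.
    """
    return {
        # HTTP/1.1: a cache must revalidate with the origin before reuse.
        "Cache-Control": "no-cache",
        # Older HTTP/1.0 caches often honor this as well.
        "Pragma": "no-cache",
        # An expiry date in the past marks the reply as already stale.
        "Expires": formatdate(0, usegmt=True),
    }


for name, value in cache_busting_headers().items():
    print(f"{name}: {value}")
```

Every request for such a URL then reaches the origin server, which is what makes the hit count visible there and also what removes the bandwidth savings for that URL.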
Do you have any performance data on hits to misses?
I don't really have any solid data on this. I can say that the growth is about 3 cache sites per day. I would say that caches should be "in-line" with the web sites you want to cache.
Could there be issues of using a cache to avoid buying bandwidth? Yes, certainly. That's how it is supposed to work. That does not mean that you won't have to pay to use the cache hierarchy, though. We are trying a new model by putting a cache at MAE-WEST, to see how those that are at MAE-WEST might be able to make use of it.
Cache busting is used to provide statistics, but it does not have to be continuous.
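One way to read "does not have to be continuous" is that a site could bust caches only during short sampling windows and scale the counts up. The sketch below is an assumption about how that might work, not something described in the talk; the window and period lengths and both function names are hypothetical.

```python
def should_bust(now_seconds, window=300, period=3600):
    """True during the first `window` seconds of every `period` seconds.

    Hypothetical schedule: bust caches 5 minutes out of each hour.
    """
    return (now_seconds % period) < window


def estimate_total_hits(hits_seen_while_busting, window=300, period=3600):
    """Scale hits counted during busting windows up to the whole period.

    Assumes requests arrive at a roughly even rate across the period.
    """
    return hits_seen_while_busting * period / window


print(should_bust(120))    # 2 minutes into the hour
print(should_bust(1800))   # 30 minutes into the hour
print(estimate_total_hits(50))
```

The estimate is only as good as the assumption of an even request rate, so a real deployment would probably want to move the window around rather than always sampling the start of the hour.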
The root caches have 30 GB of disk and 1/2 GB of memory. Right now, we don’t have any problems with cache performance. SQUID is very configurable and can monitor lots of things.
Transparent proxying can be done.
Some providers charge their customers based on bytes transferred. Caches make this lower.
Yes, it does. The IETF work can provide a possible solution.