Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

Cache Invalidation: Is there a General Solution?

"There are only two hard problems in Computer Science: cache invalidation and naming things."

Phil Karlton

Is there a general solution or method for invalidating a cache, that is, for knowing when an entry is stale, so that you are guaranteed to always get fresh data?

For example, consider a function getData() that reads data from a file. It caches the data based on the file's last-modified time, which it checks every time it is called.
Then you add a second function, transformData(), which transforms the data and caches its result for the next time it is called. It has no knowledge of the file, so how do you add the dependency that, if the file changes, this cache becomes invalid?

You could call getData() every time transformData() is called and compare the result with the value that was used to build the cache, but that could end up being very costly.
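The scenario in the question, together with the compare-on-every-call workaround, can be sketched in Python as follows. All names are hypothetical stand-ins (`get_data`, `transform_data_unchecked`, `transform_data_checked`, and `str.upper` as the transformation), and using `st_mtime_ns` as the freshness check is one assumed way to implement "the last modified time of the file".

```python
import os

_data_cache = {}  # path -> (mtime_ns, contents)

def get_data(path):
    """Return the file's contents, re-reading only when its mtime changes."""
    mtime = os.stat(path).st_mtime_ns  # cheap check, done on every call
    cached = _data_cache.get(path)
    if cached is None or cached[0] != mtime:
        with open(path) as f:
            cached = (mtime, f.read())
        _data_cache[path] = cached
    return cached[1]

def _transform(data):
    return data.upper()  # hypothetical stand-in for the real transformation

_unchecked = {}  # path -> transformed result

def transform_data_unchecked(path):
    """The problem: this cache knows nothing about the file, so it can go stale."""
    if path not in _unchecked:
        _unchecked[path] = _transform(get_data(path))
    return _unchecked[path]

_checked = {}  # path -> (input the result was built from, transformed result)

def transform_data_checked(path):
    """The workaround: re-fetch the input on every call and compare it."""
    data = get_data(path)
    cached = _checked.get(path)
    # Correct, but every call pays for get_data() plus an equality check that,
    # for equal but distinct values, has to scan the whole input.
    if cached is None or cached[0] != data:
        cached = (data, _transform(data))
        _checked[path] = cached
    return cached[1]
```

With the unchecked version, changing the file leaves the transformed result stale even though get_data() itself notices the change; the checked version stays fresh, but only by doing extra work on every call.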


1 Reply


What you are talking about is lifetime dependency chaining: one thing depends on another that can be modified outside of its control.

If you have a deterministic function from (a, b) to c (whenever a and b are the same, c is the same), but checking b is expensive, then you either:

  1. accept that you will sometimes operate on out-of-date information, and do not always check b
  2. do your level best to make checking b as fast as possible

You cannot have your cake and eat it too.
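The two options can be sketched in Python as below. Everything here is an illustrative assumption rather than part of the answer: `compute` stands for whatever produces c, `version` is a hypothetical callable returning a token that changes whenever b changes (for a file, its modification time), and the injectable `clock` exists only so the time-based behaviour can be tested. `AlwaysCheck` is option 2; `CheckAtMostEvery` is option 1 expressed as a time-to-live, which is just one of many ways to decide when to skip the check.

```python
import time
from typing import Callable, Hashable


class AlwaysCheck:
    """Option 2: check b on every call; never stale if version() is accurate."""

    def __init__(self, compute: Callable[[], object], version: Callable[[], Hashable]):
        self._compute = compute    # produces c from the current state of b
        self._version = version    # hypothetical cheap token that changes when b changes
        self._token = None
        self._value = None
        self._filled = False

    def get(self):
        token = self._version()    # runs on every call, so it must be fast
        if not self._filled or token != self._token:
            self._value = self._compute()
            self._token = token
            self._filled = True
        return self._value


class CheckAtMostEvery(AlwaysCheck):
    """Option 1: after a check, trust the cached value for `ttl` seconds; may be stale."""

    def __init__(self, compute, version, ttl: float, clock=time.monotonic):
        super().__init__(compute, version)
        self._ttl = ttl
        self._clock = clock
        self._checked_at = None

    def get(self):
        now = self._clock()
        if self._checked_at is not None and now - self._checked_at < self._ttl:
            return self._value     # b is not consulted at all inside the window
        value = super().get()      # performs the real check of b
        self._checked_at = now     # only record the check once it has succeeded
        return value
```

Reading the token before computing is deliberately conservative: if b changes mid-computation, the stored token is the older one, so the next call recomputes instead of keeping a result built from a mixed state.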

Layering an additional cache keyed on a over the top does not change this underlying problem at all. If you choose option 1, you have whatever freedom you granted yourself and can therefore cache more, but you must still account for the validity of the cached value of b. If you choose option 2, you must still check b every time, but you can fall back on the cache for a whenever b checks out.

If you layer caches, you must consider whether their combined behaviour violates the 'rules' of the system.

If you know that a is always valid whenever b is, you can arrange your cache like so (pseudocode):

private map<b,map<a,c>> cache   // b -> (a -> c)
private func realFunction    // (a,b) -> c

get(a, b) 
{
    c result;
    map<a,c> endCache;
    if (cache[b] expired or not present)
    {
        remove all b -> * entries in cache;   
        endCache = new map<a,c>();      
        add to cache b -> endCache;
    }
    else
    {
        endCache = cache[b];     
    }
    if (endCache[a] not present)     // important line
    {
        result = realFunction(a,b); 
        endCache[a] = result;
    }
    else   
    {
        result = endCache[a];
    }
    return result;
}
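A runnable Python version of the pseudocode above, as a sketch under one extra assumption: "expired" is decided by a hypothetical `version_of(b)` callable that returns a cheap token which changes whenever b changes (for a file path, something like its modification time). As in the pseudocode, a itself is never checked, which is only correct if a is always valid whenever b is.

```python
from typing import Callable, Dict, Hashable, Tuple


class LayeredCache:
    """Caches real_function(a, b), assuming a is valid whenever b is."""

    def __init__(self, real_function: Callable[[Hashable, Hashable], object],
                 version_of: Callable[[Hashable], Hashable]):
        self._real_function = real_function
        self._version_of = version_of  # hypothetical: cheap token that changes when b changes
        # b -> (version of b when the inner map was created, {a -> c})
        self._cache: Dict[Hashable, Tuple[Hashable, Dict[Hashable, object]]] = {}

    def get(self, a: Hashable, b: Hashable) -> object:
        version = self._version_of(b)
        entry = self._cache.get(b)
        if entry is None or entry[0] != version:
            # b is new or has changed: start a fresh inner map, dropping every
            # a -> c result that was built on the old b.
            end_cache: Dict[Hashable, object] = {}
            self._cache[b] = (version, end_cache)
        else:
            end_cache = entry[1]
        if a not in end_cache:  # the "important line": a is only checked for presence
            end_cache[a] = self._real_function(a, b)
        return end_cache[a]
```

Replacing the whole inner map when b's token changes is what implements "remove all b -> * entries in cache": every result built on the old b is discarded in one step, without visiting them individually.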

Adding further layers (say, a third input x) is straightforward, so long as, at each stage, the newly added input stands in the same validity relationship to the existing inputs as a does to b; that is, x:b and x:a both behave like a:b.

However, it is quite possible to have three inputs whose validity is entirely independent (or cyclic), in which case no layering is possible. The line marked // important line would then have to change to:

if (endCache[a] expired or not present)
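With that change, each inner entry has to carry its own validity information. A sketch in the same style as the pseudocode, again assuming hypothetical cheap version callables `version_a` and `version_b` for the two inputs: b is still checked at the outer level, and a is now checked per entry instead of merely tested for presence.

```python
from typing import Callable, Dict, Hashable, Tuple


class IndependentInputsCache:
    """Caches real_function(a, b) when a can go stale even while b is still valid."""

    def __init__(self, real_function: Callable[[Hashable, Hashable], object],
                 version_a: Callable[[Hashable], Hashable],
                 version_b: Callable[[Hashable], Hashable]):
        self._real_function = real_function
        self._version_a = version_a  # hypothetical cheap token for a
        self._version_b = version_b  # hypothetical cheap token for b
        # b -> (version of b, {a -> (version of a, c)})
        self._cache: Dict[Hashable, Tuple[Hashable, Dict[Hashable, Tuple[Hashable, object]]]] = {}

    def get(self, a: Hashable, b: Hashable) -> object:
        vb = self._version_b(b)
        entry = self._cache.get(b)
        if entry is None or entry[0] != vb:
            entry = (vb, {})  # b changed: drop all results built on the old b
            self._cache[b] = entry
        end_cache = entry[1]
        va = self._version_a(a)
        hit = end_cache.get(a)
        if hit is None or hit[0] != va:  # "expired or not present": a is checked too
            hit = (va, self._real_function(a, b))
            end_cache[a] = hit
        return hit[1]
```

Every read now pays for both checks, so this is the same trade-off as option 2, applied to each input separately.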

