According to LeetCode:
Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations:
get
andset
.
get(key)
- Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
set(key, value)
- Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.
When dealing with (key, vlaue) pairs, the most straight-forward data structure is hashmap(map or unordered_map in C++). However, at least for C++, it is difficult to control the insertion of new items: (1) the position you can specify is just a hint and does not force the new element to be inserted at that position within the map/unordered_map container, and (2) there is no push_back
or push_front
methods provided for map/unordered_map.
So if you implement the LRU cache only using hashmap, you may find that the order of inserted/deleted items is a mess. If you at first insert 10 items(keys 0~9) into the LRU cache of size 10, then enter another 10 items(keys 10~19), you will find that old items are not replaced in order.
Initial LRU cache: Key Value 9: 54 8: 71 7: 49 6: 12 5: 42 4: 10 3: 37 2: 12 1: 35 0: 83 Erase pair <9,54> Erase pair <10,2> Erase pair <8,71> Erase pair <7,49> Erase pair <6,12> Erase pair <5,42> Erase pair <15,87> Erase pair <16,35> Erase pair <17,72> Erase pair <18,36> Updated LRU cache: Key Value 19: 61 4: 10 14: 20 3: 37 13: 72 2: 12 12: 4 1: 35 11: 53 0: 83
To fix the order issue, there are three candidates: vector
, list
, queue
. Since queue doesn’t support deleting an item at arbitrary position and deleting an item requires O(n) time cost for vector, the best choice is list. List is flexible enough to support inserting items into any position and erazing an arbitrary item in constant time.
However, the potential issue for list is that searching for an item requires O(n) time. To find an item in list in constant time, the best way is constructing a hashmap to record the key and its corresponding address in list. In C++, the builtin(actually, from C++ STL) hashmap data structure is unordered_map
.
In order to update the values in the cache according to keys, list
also needs to record both keys and values. There are two data structures can be used: pair<int, int>
or struct{int key;int val;}
. There is a saying, when dealing with large data, the STL pair
struct will be much slower than the self-defined struct
, because std::pair<int, int>::pair() constructor initializes the fields with default values and initializing requires writing to each field which requires a whole lot of memory accesses that are relatively time consuming. But my tests showed that the pair
implementation always performs better.
I implemented LRU cache using both pair<int, int>
and struct{int key;int val;}
, and I also tested the time cost on my Mac. When setting the LRU capacity to 100, with the same test code, the pair<int, int>
implementation is faster.
macmini:LRU boyang$ time ./lru_pair real 0m0.010s user 0m0.007s sys 0m0.003s macmini:LRU boyang$ time ./lru_struct real 0m0.020s user 0m0.016s sys 0m0.003s
When setting the LRU capacity to 10000:
macmini:LRU boyang$ time ./lru_pair real 0m0.575s user 0m0.547s sys 0m0.003s macmini:LRU boyang$ time ./lru_struct real 0m0.584s user 0m0.580s sys 0m0.003s
When setting the LRU capacity to 100000:
macmini:LRU boyang$ time ./lru_pair real 1m23.615s user 1m23.509s sys 0m0.079s macmini:LRU boyang$ time ./lru_struct real 2m1.782s user 2m1.552s sys 0m0.167s
Following are my implementations of LRU cache. There is one tricky thing of implementing set(key,value)
: if you check if the cache size surpasses the capacity for every input and move the while
clause out of the if
block, then your code won’t pass the LeetCode tests, and you will receive an error of exceeding time limit. I think the cache size checking should be trivial, but I still cannot really understand why it matters so much in large data tests.
struct
implementation:
class LRUCache{
public:
struct Node {
int key;
int val;
Node(int k, int v):key(k),val(v) {}
};
LRUCache(int capacity) {
cap=capacity;
}
int get(int key) {
unordered_map<int,list<Node>::iterator>::iterator got=hash.find(key);
if(got!=hash.end()) {
// update key&value
Node ptr=*(got->second);
cache.erase(got->second);
cache.push_front(ptr);
hash[key]=cache.begin();
return ptr.val;
} else {
return -1;
}
}
void set(int key, int value) {
Node ptr(key, value);
unordered_map<int,list<Node>::iterator>::iterator got=hash.find(key);
if(got!=hash.end()) {
cache.erase(got->second); // erase so as to update key&value
hash.erase(key);
} else {
// Assume that least recently used items are stored at the end of the cache
while(cache.size()>=cap) {
Node it=cache.back();
//cout<<"Erase pair <"<<key<<","<<it.val<<">"<<endl; // TEST ONLY
hash.erase(it.key);
cache.pop_back();
}
}
cache.push_front(ptr);
hash[key]=cache.begin();
}
void print() {
cout<<"Key Value"<<endl;
for(auto& x: cache)
cout<<x.key<<": "<<x.val<<endl;
}
private:
list<Node> cache; // <value>
unordered_map<int,list<Node>::iterator> hash; // <key, iterator>
int cap;
};
pair
implementation:
class LRUCache{
public:
LRUCache(int capacity) {
cap=capacity;
}
int get(int key) {
unordered_map<int,list<pair<int,int>>::iterator>::iterator got=hash.find(key);
if(got!=hash.end()) {
// update key&value
int val=got->second->second;
cache.erase(got->second);
cache.push_front(pair<int,int>(key,val));
hash[key]=cache.begin();
return val;
} else {
return -1;
}
}
void set(int key, int value) {
// Assume that least recently used items are stored at the end of the cache
unordered_map<int,list<pair<int,int>>::iterator>::iterator got=hash.find(key);
if(got!=hash.end()) {
cache.erase(got->second); // erase so as to update key&value
hash.erase(key);
} else {
while(cache.size()>=cap) { // for big data, must run here
pair<int,int> it=cache.back();
//cout<<"Erase pair <"<<it.first<<","<<it.second<<">"<<endl; // TEST ONLY
hash.erase(it.first);
cache.pop_back();
}
}
cache.push_front(pair<int,int>(key,value));
hash[key]=cache.begin();
}
void print() {
cout<<"Key Value"<<endl;
for(auto& x: cache)
cout<<x.first<<": "<<x.second<<endl;
}
private:
list<pair<int,int> > cache; // <key, value>
unordered_map<int,list<pair<int,int>>::iterator> hash; // <key, iterator>
int cap;
};