Cookoo Table
[Hashing Home][Home]
Cuckoo hashing involves a method that avoids having collisions in a hash table. Overall we in our table we have a hashkey-value pair, and where we take a look up term and hash it to provide a lookup hash. This hash then has an associated value. With a cuckoo hashing table, a new key inserted into a cuckoo hashing table can push an older key into a different location in the table. In a traditional hashing table, two keys could produce the same hash and thus cause a collision. With the cuckoo table, we create two hashes to avoid this. They were first proposed by Pagh and Rodler [1].
|
Background
In cybersecurity we often hear of a canary, and which is a file which is used to probe for a response. But where would we find a cuckoo? Well, cuckoo hashing involves a method that avoids having collisions in a hash table. Overall we in our table we have a hashkey-value pair, and where we take a lookup term and hash it to provide a lookup hash. This hash then has an associated value. With a cuckoo hashing table, a new key inserted into a cuckoo hashing table can push an older key into a different location in the table. In a traditional hashing table, two keys could produce the same hash, and thus cause a collision. With the cuckoo table, we create two hashes to avoid this. They were first proposed by Pagh and Rodler [1].
The two hashes provide two possible locations in the hash table for each key. We can either split into two tables, or can have an index which merges the two hash functions. Let’s say we have four words (“apple”, “coconut”, “pear” and “banana”, we can use the BitHash() function to create two hash values for each:
v1 = BitHash(word1); v2 = BitHash(word1, v1); print(f"{word1}: {v1}, {v2}") v1 = BitHash(word2); v2 = BitHash(word2, v1); print(f"{word2}: {v1}, {v2}") v1 = BitHash(word3); v2 = BitHash(word3, v1); print(f"{word3}: {v1}, {v2}") v1 = BitHash(word4); v2 = BitHash(word4, v1); print(f"{word4}: {v1}, {v2}")
This gives two 64-bit values for each word:
apple: 13219638140771231075, 6469225138027186453 coconut: 6101404271930826019, 209953061413735561 pear: 774049672829110574, 11628957429084199886 banana: 1390890086164481159, 13841432658089043267
The lookup then involves inspecting two location in the hash table. Once a key is created, there will be otwo possible cells for it. If both are empty, the value will be placed in the first value. If this is already occupied, one of the keys will be moved to their second location, and where the target is then filled with the new key. This will continue until a space is found for the new key, and all the other keys have moved. If this can’t happen the table will be rebuilt, as it would go into an infinite loop.
Overall we have a process of “kicking out” a key to a new location, in the way that a cuckoo would push an existing egg out of the nest and replace it with its own egg. Some sample code is:
from bithash import BitHash, ResetBitHash import string import random import time from cookoo import CuckooHash import sys word1="apple" word2="orange" word3="banana" word4="pear" if (len(sys.argv)>1): word1=str(sys.argv[1]) if (len(sys.argv)>2): word2=str(sys.argv[2]) if (len(sys.argv)>3): word3=str(sys.argv[3]) if (len(sys.argv)>4): word4=str(sys.argv[4]) h = CuckooHash(100) h.insert(word1, "0") h.insert(word1, "5") h.insert(word2, "1") h.insert(word3, "2") h.insert(word4, "3") v1 = BitHash(word1); v2 = BitHash(word1, v1); print(f"{word1}: {v1}, {v2}") v1 = BitHash(word2); v2 = BitHash(word2, v1); print(f"{word2}: {v1}, {v2}") v1 = BitHash(word3); v2 = BitHash(word3, v1); print(f"{word3}: {v1}, {v2}") v1 = BitHash(word4); v2 = BitHash(word4, v1); print(f"{word4}: {v1}, {v2}") print(f"\n{word1}:\t{h.find(word1)}") print(f"{word2}:\t{h.find(word2)}") print(f"{word3}:\t{h.find(word3)}") print(f"{word4}:\t{h.find(word4)}") print ("\nDeleting first word, and adding again") h.delete(word1) h.insert(word1, "999") print(f"{word1}:\t{h.find(word1)}") print(f"{word2}:\t{h.find(word2)}") print(f"{word3}:\t{h.find(word3)}") print(f"{word4}:\t{h.find(word4)}")
A sample run is:
apple: 13219638140771231075, 6469225138027186453 orange: 7860921420059864069, 2950141556808904030 banana: 1390890086164481159, 13841432658089043267 pear: 774049672829110574, 11628957429084199886 apple: 0 orange: 1 banana: 2 pear: 3 Deleting first word, and adding again apple: 999 orange: 1 banana: 2 pear: 3
References
[1] Pagh, R., & Rodler, F. F. (2004). Cuckoo hashing. Journal of Algorithms, 51(2), 122-144 [here].