Order-preserving encryption (OPE)Order-preserving encryption (OPE) is a symmetric key encryption method that can be used to create an order on encrypted content. Obviously, we will lose some information in supporting the ordering of the cyphers, but the security level should still stay acceptable for the application area. If we encrypt a value of A with a key of \(k\) to get \(E_k(A)\), and then encrypt \(B\) to get \(E_k(B)\). Then if \(A \lt B\), then \(E_k(A) \lt E_k(B)\). |
Code
Order-preserving encryption (OPE) is a symmetric key encryption method that can be used to create an order on encrypted content. Obviously, we will lose some information in supporting the ordering of the cyphers, but the security level should still stay acceptable for the application area. If we encrypt a value of A with a key of \(k\) to get \(E_k(A)\), and then encrypt \(B\) to get \(E_k(B)\). Then if \(A \lt B\), then:
\(E_k(A) \lt E_k(B)\)
and if \(A\) is equal to \(B\), we get:
\(E_k(A) = E_k(B)\)
One of the best papers on the subject is [1]. First, we let M and N as positive integers, and where M is less than N. We next create a random order-preserving that ranges from M to N. Then: [M]={1,2,3,…M}and [N]={1,2,3,…N}.
We can choose M random elements from [N], and arrange them in order. To encrypt a value of i, we take the i-th element of M. In real-life, this would be difficult to implement, as we would need to generate many values in memory. The focus is then to create a function to generate the ith element.
To model, we can think of a large bin with a nuber of balls. Overall, there will be M marked balls, and M-N unmarked balls. The number of marked balls that we draw can then be defined by x, and after y samples. This is defined by a hypergeometric distribution:
Figure 1
In Figure 1, we make a sample of \(y\) balls, and then plot the probability of drawing x marked balls. In this case, HyperGeometric (100, 700, 1000) defines that we sample 100 balls, and which has 700 marked balls and 300 unmarked balls. The peak probability will at 70 marked balls and 30 unmarked balls. Overall, we have a generalised form of ProbHyperGeometric(\(k\), \(trials\), \(posEvents\), \(size\)), and where \(k\) is the target number of balls, trails is the number of balls that are drawn, \(posEvents\) is the number of marked balls, and the \(size\) is the total number of balls. The probability of drawing \(k\) marked balls is then:
\( p(k)=\frac{ \binom{posEvents}{k} \binom{size-postEvents}{trails-k}} { \binom{size}{trails} } \)
For example, if we have 20 marked balls out of 30 balls, and draw 10 balls at a time (but do not replace them). The probability of drawing seven marked balls can then be computed with Python using:
from math import comb k=7 trails=10 posEvents=20 size=30 a=comb(posEvents,k) b=comb(size-posEvents,trails-k) c=comb(size,trails) print (a*b/c)
The result is 0.3096. We thus have around a 3-in-10 probability in drawing seven marked balls.
We can now use this to determine the number of points that lie below a given point in our range, and do this by defining the range point to be the number of samples we take. and where the number of marked balls is 𝑀 and unmarked balls is 𝑁−𝑀.
More details are defined in [1]. Now, let's implement some code:
from pyope.ope import OPE, ValueRange import sys val1=100 val2=200 val3=300 range=1000 if (len(sys.argv)>1): val1=int(sys.argv[1]) if (len(sys.argv)>2): val2=int(sys.argv[2]) if (len(sys.argv)>3): val3=int(sys.argv[3]) if (len(sys.argv)>4): range=int(sys.argv[4]) random_key = OPE.generate_key() cipher = OPE(random_key) print (f"Key: {random_key.decode()}") print (f"\nVal={val1}, Cipher={cipher.encrypt(val1)}") print (f"Val={val2}, Cipher={cipher.encrypt(val2)}") print (f"Val={val3}, Cipher={cipher.encrypt(val3)}") print ("Val 1 decrypted: ",cipher.decrypt(cipher.encrypt(val1))) print ("Val 2 decrypted: ",cipher.decrypt(cipher.encrypt(val2))) print ("Val 3 decrypted: ",cipher.decrypt(cipher.encrypt(val3))) cipher = OPE(b'long key' * 2, in_range=ValueRange(0, val3),out_range=ValueRange(0, range)) print(f"\nRange input range 0 to {val3}, and output range 0 to {range}") print (f"\nVal1={val1}, Cipher={cipher.encrypt(val1)}") print (f"Val2={val2}, Cipher={cipher.encrypt(val2)}") print (f"Val3={val3}, Cipher={cipher.encrypt(val3)}")
A sample run shows that the values of 100, 200 add 300, are encrypted with 7,117,590, 14,119,730 and 20,705,532. With this, the encrypted values are in the same order as the input [here]:
Key: jazlAhZoQwLaruz2sEhGA9G5H7tI2RmFIxjpZn8x46g= Val=100, Cipher=7117590 Val=200, Cipher=14119730 Val=300, Cipher=20705532 Val 1 decrypted: 100 Val 2 decrypted: 200 Val 3 decrypted: 300
We can also defines an input range and an output range for the encrypted values [here]:
Range input range 0 to 300, and output range 0 to 100000 Val1=100, Cipher=33171 Val2=200, Cipher=70719 Val3=300, Cipher=99982
Overall, the paper [1] defines that an attacker could discovery half of the bits in the plaintext when the ciphertext is analysed. So, we have to define our ranges properly.
Examples
We live in a 20th-century world of data, where data processing and storage are often done with little trust and little privacy. Increasingly, too, we are moving our data storage and processing into the cloud, and this is often done by external parties. So, how could we integrate a privacy-aware method into database queries that are conducted by a third-party database provider? Well, Order Preserving Encryption (OPE) is one way to do this.
With this, we can encrypt values using a symmetric key, and able be to preserve the order of the values. We can now implement a simple use case.
Let’s say that Alice uses Bob as an SQL database provider, and wants to search for students who have a range of marks. If the marks of her students are:
She could then use OPE, to encrypt the grade for each student:
Overall, we have preserved the order of the grades, and where Wendy Rome has the highest value, and Cleopatra Smith has the lowest value. She can now pass the encrypted database to Bob:
Now, if Alice wants to find the students who have a grade between 70% and 100%:
SELECT Name WHERE Grade BETWEEN Ek(70) TO Ek(100)
If the value of Ek(70) is 3,584,251, and Ek(100) is 52,945,23, then the query would be:
SELECT Name WHERE Grade BETWEEN 3584251 TO 5294523
The records returned by Bob would then be:
Now, Alice will decrypt the grades with her symmetric key to give:
And, so, Alice now the records that match the query, but Bob cannot determine who the records relate to, or what the query that is being run.
References
[1] Boldyreva, A., Chenette, N., & O'Neill, A. (2011). Order-preserving encryption revisited: Improved security analysis and alternative solutions. In Advances in Cryptology–CRYPTO 2011: 31st Annual Cryptology Conference, Santa Barbara, CA, USA, August 14–18, 2011. Proceedings 31 (pp. 578–595). Springer Berlin Heidelberg.