What is Secure Multiparty Computation?
Secure multiparty computation (MPC / SMPC) is a cryptographic protocol that distributes a computation across multiple parties where no individual party can see the other parties’ data.
Secure multiparty computation protocols can enable data scientists and analysts to compliantly, securely, and privately compute on distributed data without ever exposing or moving it. The main application for secure multiparty computation is to enable the utilisation of data without compromising privacy.
One of the most common examples of application of secure multiparty computation is the millionaires’ problem:
Three millionaires are at a dinner party. They’ve never met before, and therefore, they don’t trust each other. The first millionaire, Jane, wants to prove that they have a higher net worth than James and Janet, the other two millionaires; however neither Jane, James or Janet want to reveal their net worths to each other, nor to anyone else.
Using a calculator, Jane chooses a random number to add to her salary. Once the random number is added, Jane passes her phone to James. James then adds on his net worth, and passes the phone to Janet, who then adds her net worth.
Once everyone has added their salaries to the calculator, the phone is passed back to Jane, who subtracts the random number she originally added. Then Jane can calculate their average net worth and share that, without learning net worth of any one individual. If one millionaire’s net-worth is lower than the average, then they will know they are not the richest.
Why use SMPC?
SMPC is incredibly useful in performing low-trust multi-party computations where the parties involved want to answer a relatively simple question without leaking any data to the other parties. This is why it is commonly used commercially in different types of acquisition decisions or situations. For example:
1. M&A: A company is considering acquiring another firm and wishes to compare the two firms’ databases in order to understand potential customer synergies from the acquisition. They perform an overlap calculation and get back the matching records or a count of matching records (a common technique for this using secure multi-party computation is to execute a private set intersection).
2. Data acquisition: A company is considering purchasing a 3rd party dataset, but wants to assess the additional value it will bring to the company’s activities on top of the company’s own data assets. The company executes a match test using SMPC without either party leaking data to assess the “fit” of the 3rd party dataset prior to purchase.
3. Clinical data assessment: A researcher is seeking datasets to add to their training dataset which have comparable qualities to existing training data but do not contain an overwhelming number of duplicative records. The researcher uses SMPC to assess dataset compatibility without leaking patient information.
SMPC techniques can also be used for other types of calculations, however it is important to note that they can require significant computational power to perform.
How do we use Secure Multiparty Computation at Bitfount?
At Bifount, we use this cryptographic primitive for the SecureAggregation of the model parameters in federated model training and federated SQL queries, and for our PrivateSetIntersection protocol.
Based on the same idea that solves the millionaires’ problem, the main building block that enables our SecureAggregation is the ability to split a piece of data into multiple encoded parts known as secret shares. On their own, the shares reveal nothing about the original data. However, if two parties perform the same operation on a set of shares and then recombine them, it is as if that operation was performed on the original data. This process is also known as secret sharing.
The SecureShare algorithm we implement works as follows:
1. First every worker shares a securely generated random number (between 0 and a given prime q) with every other worker such that every worker ends up with one number from every other worker. These numbers are known as shares as they will form part of the secret (the state dictionary) which will be shared.
2. The values in the state dictionary are then converted to positive integer field elements of a finite field bounded by the prime q.
3. The random numbers generated are used to compute a final share for every value in the state dictionary. This final share has the same shape as the secret state dictionary.
4. This final share is then reconstructed using the shares retrieved from the other workers. At this point, the final share from each worker is meaningless until averaged with every other state dictionary.
5. This final share is sent to the modeller where it will be averaged with the state dictionaries from all the other workers (all the while in the finite fieldspace).
6. After averaging, the state dictionaries are finally decoded back.
Another way we integrate secure multi-party computation into our product is by enabling private set intersection. We have implemented a Private Set Intersection(PSI) protocol which currently supports the RSA + blinding PSI algorithm.
In a nutshell, PSI enables two parties, each with their own dataset, to privately find out which items from their own dataset they have in common with the other party without exposing their raw data to each other. As an example, a recent application of PSI has been in the context of contact tracing for COVID-19. PSI enables matching of location data with locations of previously diagnosed patients.
The most common scenario is a server-client scenario, in which only the clients will receive the resulting intersection data set of elements shared between the two original datasets. In the Bitfount setup, the Modeller has the role of the client, and the Pod has the role of the server. Each of them inputs their own dataset, and the Modeller learns the intersection between the two datasets, while the Pod learns nothing about the modeller’s dataset.
In more detail, the RSA+ binding algorithm we have implemented works as follows:
1. Modeller sends the task request to the worker, containing the columns on which the intersection should be performed.
2. Worker generates an RSA key pair, and sends the public key to the modeller, keeping the private key to themselves.
3. (Worker Side). Worker hashes and uses his private key to encrypt each row for the relevant columns, and then hashes again the obtained values.
3. (Modeller Side). Modeller receives the public key from the Pod. Then they choose random factors (as many as the size of the rows in his data), which they encrypt using the public key. They then hash each row of their input and multiply it with the encrypted random factors, which is referred to as blinding.
4. Modeller sends their blinded inputs to the worker.
5. Worker receives the blinded inputs from the modeller. They first encrypt the blinded values using their private key, and then hash each of the results. They then send the list of blinded hashed values and the list of hashed values from step 3 to the modeller.
6. Modeller receives the two lists from the Pod. They then can compute the private set intersection.