Abstract
Linear regression is a widely used machine learning model, with applications such as personalized healthcare prediction, recommendation systems, and policy making. An important trend in applying this model (and others) is privacy-preserving linear regression, in which multiple parties, each possessing part of a dataset, jointly perform the learning process while preserving the privacy of their data. Consequently, a number of works achieving this goal with various properties have appeared in recent years.
In this paper, we present a new privacy-preserving linear regression protocol for the scenario where the dataset is distributed horizontally; it is highly efficient in particular when the training dataset is of low dimension. Our protocol uses two non-colluding servers: multiple data providers split their private data, i.e., a d × d matrix where d denotes the dimension of the dataset, into two shares and send one share to each server; the two servers then jointly and securely perform the training process.
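As a minimal illustration of the data providers' step, the following sketch (ours, not the paper's implementation; it assumes additive secret sharing over the ring of 64-bit integers) splits a matrix into two shares such that each share on its own is uniformly random:

#include <cstdint>
#include <random>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<uint64_t>>;

// Split m entrywise as m = s0 + s1 (mod 2^64): s0 is uniformly random,
// so neither server's share leaks anything about m on its own.
// (std::mt19937_64 is for illustration only; a deployment needs a CSPRNG.)
std::pair<Matrix, Matrix> share_matrix(const Matrix& m, std::mt19937_64& rng) {
    Matrix s0 = m, s1 = m;
    for (std::size_t i = 0; i < m.size(); ++i)
        for (std::size_t j = 0; j < m[i].size(); ++j) {
            s0[i][j] = rng();               // uniformly random share for server 0
            s1[i][j] = m[i][j] - s0[i][j];  // wraps mod 2^64; share for server 1
        }
    return {s0, s1};
}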
Our technical novelties are as follows. We observe that in the solving-linear-systems (SLS) method, one of the two mainstream methods for linear regression (the other being stochastic gradient descent (SGD)), the time complexity of the solving step depends only on the dimension, regardless of the number of samples. Known works using SLS employ garbled circuits for the entire linear system and are thus computationally heavy. We implement SLS via the share-computation method, which is known for securely implementing SGD but, to our knowledge, has not been applied to SLS; we thereby inherit the advantages of both: SLS admits fast computation when the dataset is of low dimension, and share computation is faster than garbling the entire linear system.
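To see why the cost of the solving step depends only on d, recall the standard normal equations of least squares (textbook background, not specific to our protocol):

\[
(X^{\top} X)\, w = X^{\top} y, \qquad X \in \mathbb{R}^{n \times d},\ y \in \mathbb{R}^{n}.
\]

Once the d × d matrix and the d-dimensional vector on the two sides have been aggregated over the n samples, solving for w costs O(d³), independent of n.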
In the share computation for SLS, we propose a hybrid method combining garbled circuits and secret sharing to realize secure and round-efficient division over fixed-point numbers (8 + 2θ rounds, where θ is a small number of iterations, set to 5 in our experiments). We then use this division protocol to implement our new linear regression protocol. As a consequence, our protocol is highly efficient for datasets of low dimension. We implement our protocol in C++, and our experiments show that it is much more efficient than state-of-the-art implementations of privacy-preserving linear regression on low-dimensional datasets.
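The abstract does not fix the concrete iteration; as one plausible plaintext reference (an assumption on our part, not the paper's protocol), Newton-Raphson reciprocal refinement realizes fixed-point division with θ refinement rounds; in the actual protocol, each step would run on secret shares, with the non-linear parts handled by garbled circuits:

#include <cstdint>

constexpr int F = 16;                          // fractional bits
constexpr int64_t ONE = int64_t(1) << F;       // fixed-point 1.0

int64_t fp_mul(int64_t x, int64_t y) {         // fixed-point product
    return (int64_t)(((__int128)x * y) >> F);
}

// Plaintext reference for fixed-point division a/b via Newton-Raphson
// reciprocal refinement (our assumption; the paper's iteration may differ).
int64_t fp_div(int64_t a, int64_t b, int theta) {
    // Seed x0 = 1.0 converges for b in (0, 2); a real protocol would first
    // normalize b into such a range with a cheap bit-level circuit.
    int64_t x = ONE;
    for (int i = 0; i < theta; ++i)            // theta = 5 in our experiments
        x = fp_mul(x, 2 * ONE - fp_mul(b, x)); // x <- x * (2 - b*x)
    return fp_mul(a, x);                       // a * (1/b)
}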