1. Introduction
Digital signatures are a very important cryptographic primitive in the modern world. Among the most popular, there are, for instance, schemes based on the RSA assumptions, discrete logarithm (DSA) and its elliptic curves version (ECDSA), all included in the FIPS standard 186-3 [
1]. Many schemes based on coding theory have been proposed over the years that either follow a “direct” hash-and-sign approach like the Courtois-Finiasz-Sendrier scheme (CFS) [
2] and the Kabatianskii-Krouk-Smeets scheme (KKS) [
3], or rely on the Fiat–Shamir transform [
4] to convert an identification scheme into a signature scheme. The latter schemes are usually built via a 3-pass protocol [
5] or, more recently, a 5-pass protocol [
6], in turn relying on the work of Stern [
7,
8]. Unfortunately, many of the various proposals have been broken, and all those that are still considered secure suffer from one or more flaws, be that a huge public key, a large signature or a slow signing algorithm, which make them highly inefficient in practical situations. This is particularly evident in the identification schemes, where it is usually necessary to repeat the protocol many times in order to guarantee correctness or security.
In [
9], we introduced a code-based signature scheme following a different approach, inspired by the work of Lyubashevsky [
10,
11]. Such a proposal had been attempted before (see [
12]) without success, the main issue being the choice of the setting (random binary codes), which proved to be too restrictive. Choosing quasi-cyclic codes allows for taking advantage of the innate ring metric and makes the scheme viable in practice.
Our Contribution
This full version features a detailed security analysis, including a proof of security that guarantees one-time existential unforgeability against chosen-message attacks, i.e., 1-EUF-CMA. While one-time signatures are not used directly in most applications, they are still relevant since they can be embedded in a Merkle tree structure to obtain a full-fledged signature scheme, which allows for signing up to a predetermined number of times. Our scheme compares very well to other one-time code-based proposals, obtaining what are, to date, the smallest sizes for both signature and public data in the code-based setting.
The paper is organized as follows: in the next section, we give some preliminary notions about codes and code-based cryptography, as well as identification schemes. In
Section 3, we describe the framework on which our scheme will be based, including the previous code-based proposal by Persichetti. Our scheme is presented in
Section 4, together with a detailed security analysis (
Section 5), and its performance and comparison with other code-base schemes are discussed in
Section 6. We conclude in
Section 7.
2. Preliminaries
2.1. Coding Theory
Let be the finite field with q elements. An linear code is a subspace of dimension k of the vector space . Codewords are usually measured in the Hamming metric: the Hamming weight of a word is the number of its non-zero positions, and the Hamming distance between two words is the number of positions in which they differ, that is, the weight of their difference. We denote those respectively by and .
Linear codes can be efficiently described by matrices. The first way of doing this is essentially choosing a basis for the vector subspace. A
generator matrix is a matrix
G that generates the code as a linear map: for each message
, we obtain the corresponding codeword
. Of course, since the choice of basis is not unique, so is the choice of generator matrix. It is possible to do this in a particular way, so that
. This is called
systematic form of the generator matrix. Alternatively, a code can be described by its
parity-check matrix: this is nothing but a generator for the
dual code of
, i.e., the code comprised of all the codewords that are “orthogonal” to those of
. The parity-check matrix describes the code as follows:
The product is known as syndrome of the vector x. Note that, if is a generator matrix in systematic form for , then is a systematic parity-check matrix for .
Code-based cryptography usually relies more or less directly on the following problem, connected to the parity-check matrix of a code.
Problem 1 (Syndrome Decoding Problem (SDP)).
Given: , and .
Goal: find with such that .
This problem is well known and was proved to be NP-complete by Berlekamp, McEliece and van Tilborg in [
13]. Moreover, it is proved that there exists a unique solution to SDP if the weight
w is below the so-called
GV Bound.
Definition 1. Let be an linear code over . The Gilbert–Varshamov (GV) Distance
is the largest integer d such that If this is not the case, multiple solutions exist (see, for example, Overbeck and Sendrier, [
14]). It follows that SDP is of particular interest when the weight
w is “small”.
Quasi-Cyclic Codes
A special subfamily of linear codes is that of cyclic codes.
Definition 2. Let be an linear code over . We call cyclic
if Clearly, if the code is cyclic, then all the right shifts of any codeword have to belong to as well. An algebraic characterization can be given in terms of polynomial rings. In fact, it is natural to build a bijection between cyclic codes and ideals of the polynomial ring . We identify the vector with the polynomial , and then the right shift operation corresponds to the multiplication by X in the ring.
Because of this correspondence, it is possible to see that both the generator matrix and the parity-check matrix of a cyclic code have a special form, namely circulant form, where the i-th row corresponds to the cyclic right shift by i positions of the first row.
Cyclic codes have been shown to be insecure in the context of cryptography, as they introduce too much recognizable structure. A subfamily, known as quasi-cyclic codes, has then been proposed with some success, mostly in the context of encryption.
Definition 3. Let be an linear code over . We call Quasi-Cyclic if there exists such that, for any codeword a, all the right shifts of a by positions are also codewords.
When , it is again possible to have both matrices in a special form, composed of circulant blocks. The algebra of quasi-cyclic codes can be connected to that of the polynomial ring , where each codeword is a length- vector of elements of the ring.
For the remainder of the paper, we consider only binary codes; thus, we set , and we restrict our attention to the case . We have the following ring-based formulation of SDP.
Problem 2 (Quasi-Cyclic Syndrome Decoding Problem (QC-SDP)).
Given: and .
Goal: find with such that .
This was shown to be NP-complete in [
15]. When
, it has been proved in [
16] that random quasi-cyclic codes lie on the GV bound with overwhelming probability. Moreover, the impact of cyclicity on SDP has been studied, for example in [
17], revealing no substantial gain.
2.2. Identification Schemes and Signatures
An identification scheme is a protocol that allows a party , the Prover, to prove to another party , the Verifier, that he possesses some secret information x, usually called witness, without revealing to the verifier what that secret information is. The paradigm works as follows: is equipped with a public key and some public data D. To start, chooses some random data y and commits to it by sending to , where f is usually a trapdoor one-way function or a hash function. then chooses a random challenge c and sends it to . After receiving c, computes a response z as a function of and y and transmits z. Finally, checks that z is correctly formed using and D.
A signature scheme is defined by a triple , respectively the key generation algorithm, the signing algorithm and the verification algorithm. The key generation algorithm takes as input a security parameter and outputs a signing key and a verification key . The private signing algorithm receives as input a signing key and a message m and returns a signature . Finally, the public verification algorithm uses a verification key to verify a signature that is transmitted together with the message m: it outputs 1, if the signature is recognized as valid, or 0 otherwise.
The standard notion of security for digital signatures schemes is Existential Unforgeability under Chosen-Message Attacks (EUF-CMA), as described, for example, in [
18]. In this scenario, the goal of an attacker is to produce a valid message/signature pair, and the attack model allows the attacker to obtain a certain, predetermined, number of signatures on arbitrarily chosen messages (signing queries). In particular, if the attacker is only allowed to obtain a single signature, we talk about 1-EUF-CMA security. Since this is the security target of this work, we give a precise definition below.
Definition 4. An adversary is a polynomial-time algorithm that acts as follows:
- 1.
Query a key generation oracle to obtain a verification key .
- 2.
Choose a message m and submit it to a signing oracle. The oracle will reply with .
- 3.
Output a pair .
The adversary succeeds if and . We say that a signature scheme is 1-EUF-CMA secure if the probability of success of any adversary is negligible in the security parameter, i.e., Fiat and Shamir in [
4] showed how to obtain a full-fledged signature scheme from an identification scheme. With this paradigm, the signer simply runs the identification protocol, where, for the purpose of generating the challenge, the verifier is replaced by a random oracle
(usually a cryptographic hash function). The signature is then accepted according to the validity of the response in the identification scheme. We report this in
Table 1.
Note that several signature schemes, including [
11] and this work, use a slightly modified version of the above paradigm, where the signature is
instead of
. The verifier can then calculate
Y from
z and the public key, and check the equality between
c and the hash digest obtained using this newly-generated
Y and
m.
4. The New Scheme
The core of our idea is to use quasi-cyclic codes in the framework that we have described above. The use of quasi-cyclic codes in cryptography is not a novelty: these have been proposed before in the context of encryption (e.g., [
15]). Their originally suggested use (i.e., with GRS codes) was cryptanalyzed in [
21] and it is thus not recommended, but other variants based on Low-Density Parity-Check (LDPC) and Moderate-Density Parity-Check (MDPC) codes are still considered safe. In both cases, the issue is that introducing the extra algebraic structure can compromise the secrecy of the private matrix used for decoding.
A big advantage of our proposal is that this issue does not apply. In fact, since there is no decoding involved, an entirely random code can be used, and the code itself is public, so there is no private matrix to hide. In this sense, our scheme is closer, to an extent, to the work of [
22], which is centered on
random quasi-cyclic codes.
As far as signature schemes go, Gaborit and Girault in [
23] propose a variant of Stern’s ID scheme that uses quasi-cyclic codes (called “double-circulant” by the authors). While this proves to be more efficient than the classical Stern scheme, the protocol still features the same flaw, i.e., a non-trivial cheating probability. This leads to the necessity of repeating the protocol several times, with an obvious impact on the efficiency of the scheme.
In our setting, we use 2-quasi-cyclic codes where words are vectors in . For a word , the syndrome function associated to is defined as , following the notation that takes a parity-check matrix in systematic form (and hence defined by h) as in Problem 2. For a more general formulation, we also adapt the notation from the previous section, indicating with and the distributions that sample uniformly at random vectors of having weight respectively less or equal to and . Our signature scheme is presented below. The scheme uses a hash function that outputs bit strings of fixed weight , which is one of the system parameters.
KeyGen
Input: parameters and a vector .
1. Sample .
2. The signing key is x.
3. The verification key is .
Sign
Input: a message m and the signing key x.
1. Sample .
2. Compute .
3. Compute .
4. Compute .
5. The signature is .
Ver
Input: a message m, a signature and the verification key .
1. Compute .
2. Use the verification key to compute .
3. Compute .
4. Accept if and .
5. Else, reject.
Like before, we have a constraint on the weight of the response vector z: in this case, , since c is no longer a constant. Then, w is required to be below the GV bound to ensure that the response z is the unique solution to the corresponding QC-SDP instance. This is a consequence of the security requirements, as we will see next.
To conclude, note that it is easy to check that an honest verifier always gets accepted. In fact, in an honest run of the protocol, then . Due to the transitivity of the syndrome computation, this is the same as . Therefore, and the verification is passed.
5. Security
The change of metric in our proposal means that our scheme is substantially different from the “naïve” SDP-based proposal of
Section 3.2, and in fact resembles the lattice setting much more. In fact, as in the lattice case, our objects are “vectors of vectors”, namely in this case a length-2 vector of length-
p binary vectors. Due to the inherent arithmetic associated to the ring
, this allows us to choose
c in the same realm, and perform an operation (ring multiplication) that is still compatible with the verification operation, but does affect the weight of the response vector. Polynomial multiplication simultaneously increases and scrambles the error positions, and in so doing prevents the simple attack based on statistical analysis that affected the previous proposal. Unfortunately, this is still not enough to hide the private information. The following procedure [
24] shows that it is still possible to recover the private key with a polynomial number of signatures.
Procedure 1. Start by obtaining a polynomial number ℓ of signatures, i.e., pairs for . For each pair, is chosen uniformly at random among the vectors of weight δ, and where is also chosen uniformly at random (sampled from ). For each i, write , that is, as a polynomial of weight δ in . Then, calculatewhere and . Since is just a shift of x and is just a shift of , and their support will likely have little to no intersection with the support of x (due to the weight of the vectors), it is possible to reveal the support of x simply by looking at the bits that belong to the support of a large enough number of .
Note that the above procedure is in fact a refinement of the simple statistical analysis attack encountered before: in both cases, the problem is that the weight of the vectors is simply too low to properly mask the private vector. It is then clear that it is impossible to sign multiple times and preserve security. It follows that our scheme only achieves one-time security. To prove the one-time security of our scheme, we follow the paradigm for a generic one-time signature scheme of Pointcheval and Stern, which was already employed in the code-based setting in [
25]. In this paradigm, signature schemes are treated in a unified way, as a protocol that outputs triples of the form
, where
represents the commitment, or a sequence of commitments, if the protocol needs to be repeated multiple times.
is the response, or a sequence of responses, and h is the hash value, as in the Fiat–Shamir scheme. To obtain security, it is necessary that
is sampled uniformly at random from a large set and that
only depends on
, the message m and the hash value h.
In our scheme, the first element is sampled uniformly at random from , which has size . Note that, even though this value is not explicitly output as part of the signature, it is immediate to recover it from the signature, as shown in Step 2 of the verification algorithm. The vector c is exactly the hash value obtained from the message m and , i.e., the element h in the Pointcheval–Stern notation. We clearly use c from now on, to avoid confusion as h is used to denote the vector defining the parity-check matrix in a QC code. Finally, we show that indeed only depends on the message m, and c. The dependence is obvious, given that z is computed using only the private key, c itself and y, which is in a one-to-one correspondence with (due to being below the GV bound). Furthermore, z is uniquely determined by those values. In fact, suppose there existed a distinct valid triple with . Since the triple is valid, it needs to satisfy the verification equation, thus . This is clearly not possible because both z and have weight below the GV bound, which implies that there exists only one vector having syndrome , i.e., .
The next step is to show that, in our signature scheme, it is possible to simulate the target triples without knowing the private key, unbeknownst to the adversary.
Lemma 1. It is possible to obtain artificially-generated triples of the form which are indistinguishable from honestly-generated triples, unless the adversary is able to solve an instance of QC-SDP.
Proof. To begin, notice that any valid triple is required to satisfy two constraints. First, the weight of z has to be below the GV bound; in fact, is expected to be statistically close to the bound . Second, the triple needs to pass the verification equation, and so . Then, to simulate a valid triple, it is enough to sample two elements at random and set the third to match. More precisely, one would sample and , the second one chosen such that . Then, one would proceed by setting to be exactly , which is possible since the public key is known.
Now, it is easy to see that all honestly-generated triples correspond to syndromes where y has weight below the GV bound, while, for simulated triples, the syndrome is obtained from a vector which has expected weight above the GV bound with overwhelming probability. This is because both c and z are generated independently and at random, and so the expected weight is simply , which is bigger than the bound with overwhelming probability.
In conclusion, distinguishing a simulated triple from an honest one corresponds to solving a QC-SDP instance as claimed. ☐
The last piece necessary for our proof is the well-known
forking lemma. We report it below, as formulated in [
26].
Theorem 1 (General Forking Lemma). Let be a signature scheme with security parameter λ. Let be an adversary, running in time T and performing at most q random oracle queries and ℓ signing queries. Suppose that is able to produce a valid signature with probability . If the triples can be simulated without knowing the private key with only a negligible advantage for , then there exists a polynomial-time algorithm that can simulate the interaction with and is able to produce two valid signatures and , for , in time .
We are now ready for our security result.
Theorem 2. Let be a polynomial-time 1-EUF-CMA adversary for the signature scheme with parameters , running in time T and performing at most q random oracle queries. Let the probability of success of be . Then, the QC-SDP problem with parameters can be solved in time .
Proof. We have seen in Procedure 1 that it is possible to recover the private key using a polynomial number ℓ of signatures. The forking lemma can be iterated so that it is guaranteed to produce ℓ distinct, valid signatures in time less or equal to . The thesis naturally follows from the combination of these two facts. ☐
6. Performance and Comparison
To properly evaluate the performance, we start by recalling the main components of our scheme. First of all, the public data consists of the vector
h (of length
p) and the syndrome
(also of length
p), for a total of
bits. The signature, on the other hand, is given by the challenge string
c and the response
z. In our scheme, this corresponds respectively to a vector of length
p and a vector of length
. It is possible to greatly reduce this size thanks to a storing technique [
27] which allows for representing low-weight vectors in a compact manner. Namely, a binary vector of length
n and weight
w is represented as an index, plus an indication of the actual vector weight, for a total of
. Note that in our case this applies to both
c and
z.
We now provide (
Table 2) some parameters for the codes in our scheme. These are normally evaluated with respect to general decoding algorithms such as Information-Set Decoding [
28,
29,
30,
31,
32]: the amount of security bits is indicated in the column “Security”.
The first two rows report well-known parameters suggested in the literature for QC-MDPC codes; however, since our codes do not need to be decodable, we are able to slightly increase the number of errors introduced. The last two rows, instead, are parameters chosen ad hoc, in order to optimize performance.
6.1. Existing Code-Based Solutions
We are now going to briefly discuss the three main approaches to obtain code-based signatures, and related variants. This will give an insight into why designing an efficient code-based signature scheme is still an open problem.
6.1.1. CFS
The CFS scheme [
2] follows the “hash and sign” paradigm, which is a very natural approach for code-based cryptography, and thus it retains most of its traits, both good and bad. For instance, the verification consists of a single matrix-vector multiplication and so it is usually very fast. On the other hand, the scheme features a very large public key (the whole parity-check matrix). Structured instances as proposed for example in [
33] reduce this size drastically and are therefore able to deal with this issue, although with a potential few security concerns. However, the main downfall of CFS is the extremely slow signing time. This is a consequence of the well-known fact that a random word is in general
not decodable, thus finding a decodable syndrome requires an incredibly high number of attempts (at least
in the simplest instances). To lower this number, the common solution is to use codes with very high rate, which in itself could lead to potential insecurities (e.g., the distinguisher of). Thus, it seems unrealistic to obtain an efficient signature scheme in this way.
6.1.2. KKS
The KKS approach [
3] still creates signatures in a “direct” way, but without decoding. Instead, the scheme relies on certain aspects of the codes such as a carefully chosen distance between the codewords, and uses a secret support. Unfortunately, the main drawback of KKS-like schemes is the security. In fact, it has been shown in [
34] that most of the original proposals can be broken after recovering just a few signatures. Furthermore, not even a one-time version of the scheme (e.g., [
25]) is secure, as shown by Otmani and Tillich [
35], who are able to break all proposals in the literature without needing to know any message/signature pair. It is therefore unlikely that the KKS approach could be suitable for a credible code-based signature scheme.
6.1.3. Identification Schemes
All of the code-based identification schemes proposed so far are 3-pass (or 5-pass) schemes with multiple challenges. Thus, the prover sends two or three entirely different responses depending on the value of the challenge (usually a bit or {0,1,2}). In this sense, our proposal represents a big novelty. In fact, multiple challenges allow for a malicious user to be able to cheat in some instances. For example, in the original proposal by Stern [
7], it is possible to choose any two out of three possible responses and pass verification for those even without knowing the private key, thus leading to a cheating probability of 2/3. This cheating probability is subsequently lowered in most recently proposals, approaching 1/2. Nevertheless, this causes a huge issue, since the protocol needs to be repeated several times in order for an honest prover to be accepted. The 35 repetitions of the original scheme can be lowered to approximately 16 repetitions in recent variants, but, even so, communication costs prove to be very high, leading to a very large signature size. In
Table 3 below, we report a comparison of parameters for different variants of the scheme, where the column Véron refers to [
5], CVE to [
6] and AGS to [
36]. Note that all of these parameters refer to a cheating probability of
, a weak authentication level.
In the latest proposal (column AGS), the size of the public matrix is considerably smaller thanks to the use of double-circulant codes. However, the signature size is still very large (about 93 Kb). Moreover, for a signature to be considered secure, one would expect computational costs to produce a forgery to be no less than
; this would require, as claimed by the authors in [
36], to multiply all the above data by 5, producing even larger sizes.
6.2. Comparison
A comparison of our scheme with the full-fledged schemes described above would not be entirely accurate. We can however compare (
Table 4) our scheme to other code-based proposals that are one-time secure, such as [
25,
37]. Both of these schemes follow the KKS approach, and therefore come with some potential security concerns, as mentioned in the previous section. For simplicity, we will refer to [
25] as BMS and to [
37] as GS. Note that the latter comes in two variants, which use respectively quasi-cyclic codes, and a newly-introduced class of codes called “quadratic double-circulant” by the authors. All the parameters and sizes (in bits) are reported in the following table, and correspond to a security level of
.
It is immediate to notice that our scheme presents the smallest amount of public data (which groups together public key and any additional public information) and the smallest signature size. To be fair, the BMS scheme employs the same indexing trick used in this work, while this is not the case for the other scheme. Since the signature of the GS scheme (in both variants) also includes a low-weight vector, we expect that it would be possible to apply the same technique to the GS scheme as well, with the obvious reduction in size. We did not compute this explicitly, but it is plausible to assume it would be very close to that of our scheme. Nevertheless, the size of the public data remains much larger even in the most aggressive of the two variants (GS 2).
6.3. Implementation
To confirm the practicality of our scheme, we have developed a simple implementation in C. The implementation is a straightforward translation to C with the addition of the steps for generating public and private keys. The hash function used was SHA-256. We ran the protocol on a small microprocessor, namely a 580 MHz single-core MIPS 24KEc. The choice of this microprocessor was made based on the usage of it, since this type of microprocessor is commonly used in the Internet of Things (IoT) applications. The measurements are reported in
Table 5 below.
Note that key generation is dominated by the syndrome computation necessary to obtain the verification key, while sampling the signing key has a negligible cost. The signing operation is the most expensive, which makes sense, while the verification is of the same order of magnitude as the key generation. Both signing and verification algorithm are relatively fast but could be sped up even further, since the hash function used was, at the time the measurements were taken, not optimized to run in such a small device.