Safe Short Hash Scheme

When you exchange email addresses with someone and you wish to communicate with that person securely, you also need to exchange key fingerprints. Before you encrypt or sign the first message to or from the other person, you additionally need to look up the key and compare fingerprints. Nobody wants to check 256bit fingerprint values against printed ones. And it is even worse if you do not have business cards ready or you want to exchange email addresses over the phone - you don't want to dictate or write a 256bit hash.

As a first layer of security that prevents unprivileged outsiders from feeding you bad keys when you do a key lookup by email address, the responsibility for looking up keys by email address needs to be moved over to mailservers. This could e.g. be implemented by letting the mailserver serve users' keys from https://[domain]/s2h_user_keys/[email address], thus ensuring that, if it is known that the mailserver supports this key lookup method, only the mailserver and rogue CAs are able to feed bad keys to clients.

Now you could get rid of birthday attacks by using this instead of a normal key fingerprint: $$ fingerprint = \operatorname{hashfn}(email\_address \| seperator \| keydata) $$ Because the birthday paradox doesn't matter anymore, now you only need 128 bits.

Classical fingerprints have two important properties:

In the scheme I'm proposing, you use an authenticator instead of a fingerprint. That authenticator can only have one of these two properties: Either it has to be secret or it has to be used in an interactive verification process. However, the authenticator is only 22 bits long.

An authenticator consists of two parts, a 9-bit hash and a 13-bit salt. Every time you want to give your email address to someone, you have to generate a new authenticator as follows:

  1. Choose a random salt.
  2. Calculate $ hash = \operatorname{hashfn}(fingerprint \| salt) $, where $fingerprint$ is the fingerprint as defined above.
  3. Put them together: $authenticator = salt \| hash$

A MUA can now, given the mail address, the key and the authenticator, verify the correctness of the hash for the salt contained in the authenticator. The remaining question is how much security this provides against an attacker. It is clearly impossible for an attacker to spoof an identity with full reliability because doing so would require bruteforcing a total of $ 2^{13} * 9 = 73728 $ independent bits, but it is also clear that with a lot of luck, the attacker only needs to get one of the possible authenticators right, requiring only 9 bits of bruteforcing.

In the following section, a lower bound for the number of hash function iterations required to perform a successful attack in 1% of all cases is calculated. $S$ is the size of the salt in bits, $H$ is the size of the hash in bits, $w$ is the wanted success probability.
The number of possible salts is $$ n = 2^S = 2^{13} = 8192 $$. Of those, the attacker needs to get $$ k_{min} = \left\lfloor n * w \right\rfloor = \left\lfloor 8192 * 0.01 \right\rfloor = 81 $$ right.
For every salt, the chance of the corresponding hash being correct is $$p = \frac{1}{2^H} = \frac{1}{2^9} = \frac{1}{512}$$.
The chance of a random fingerprint having enough correct authenticators is the binomial cumulative probability $$ q = \sum_{k=k_{min}}^{n} \binom{n}{k} * p^k * (1-p)^{n-k} = \sum_{k=81}^{8192} \binom{8192}{k} * \left(\frac{1}{512}\right)^k * \left(1-\frac{1}{512}\right)^{8192-k} = 6.37638 \times 10^{-31} $$. This is equivalent to a security level of $$B = \left\lfloor -\log_2(q) \right\rfloor = \left\lfloor -\log_2(6.37638 \times 10^{-31}) \right\rfloor = 100 $$ bits.

However, the total security level should ideally be 128 bits. This can be accomplished by demanding that $\operatorname{hashfn}(fingerprint \| x)$, where $x$ is a value that can be chosen freely by the party generating the key and needs to be stored together with the key, starts with $128-100=28$ bits that are zero, causing the generation of one new fingerprint to require $2^{28}$ hash operations on average. Because $fingerprint$ contains the email address, these fingerprints can not be reused for attacks against another user. This construct is essentially a proof-of-work mechanism and does introduce a nontrivial computational cost on key generation, but the cost is tolerable - according to the bitcoin mining hardware comparison for non-specialized hardware, generating $2*390000=780000$ SHA256 hashes takes one second on a Pentium 3 processor, meaning that on average, $$\frac{2^{28}}{780000} = 344$$ seconds are required to generate a valid $x$. Modern CPUs like Intel's Core i7 3930k can generate $2*66600000 = 133200000$ SHA256 hashes per second and therefore only need $$\frac{2^{28}}{133200000} = 2$$ seconds to generate a valid $x$.

The security of this scheme depends on the attacker not knowing which salt the user will use to check a key. As described above, this can be archieved by either transmitting the authenticator over a secret channel - for example, by writing it on a piece of paper and giving it to someone - or by transmitting the authenticator after the key to check has been delivered to the communication partner, which is e.g. suitable for use over the telephone.

A very powerful attacker might be able to perform successful attacks in up to 1% of all cases. Therefore, it is important that all clients alert the user whenever an authenticator can not be verified.

If you can print business cards with different email addresses, maybe you could even put the authenticator into the email address in Base32, marked with two plus signs or so and with a 8-bit checksum, like this: user++SEK3S3@example.org. Of course, the business cards would then have to be kept secret, and it would require mail clients to recognize the syntax.

A nice side effect is that, as long as an attacker doesn't know whether you actually have an authenticator or not, substituting the server's response would be dangerous for him. Therefore, even if a user does not supply an authenticator string to his mailclient, the client can retrieve a somewhat trustworthy key from the mailserver.

The code used for the calculations, which is horribly slow, but should be very precise, is here.