A system call for random numbers: getrandom()
Posted Jul 25, 2014 22:13 UTC (Fri) by giraffedata (guest, #1954)
In reply to: A system call for random numbers: getrandom() by jimparis
Parent article: A system call for random numbers: getrandom()
> An attacker might exhaust file descriptors maliciously, just to get some software to pick a bad random number,
How would exhausting file descriptors get some software to pick a bad random number? The natural result of that would be for software that uses random numbers to refuse to continue.
But regardless of whether it's a valid expectation of the attacker, it doesn't explain why LibreSSL needs to have a fallback other than "return -1" for exhausted file descriptors. No other software does.
Posted Jul 25, 2014 23:41 UTC (Fri)
by dlang (guest, #313)
if the program zeroes a buffer, then tries to read random data into that buffer and doesn't check the error codes properly, the result is that it continues on with zeros instead of its random seed.
This is an advantage for the bad guy.
Yes, in theory this is handled by properly checking all error conditions, but in practice we all know that such checks are not always done.
Also, note that shutting down the service is a DoS, which is also to the bad guy's advantage.
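The failure mode described above can be sketched in C. This is an illustrative example, not LibreSSL code; the function name `get_seed_careless` and the path parameter are invented for the sketch. The buffer is zeroed first, the result of read() is ignored, and the caller ends up with an all-zero "seed" while believing everything succeeded.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Illustrative sketch, not LibreSSL code: a careless seeding routine
 * that zeroes its buffer and never checks whether the read worked. */
static int get_seed_careless(const char *dev, unsigned char *seed, size_t len)
{
    memset(seed, 0, len);              /* buffer starts out all zeros */
    int fd = open(dev, O_RDONLY);
    if (fd >= 0) {
        read(fd, seed, len);           /* return value silently ignored */
        close(fd);
    }
    /* If open() failed -- say, with EMFILE because file descriptors are
     * exhausted -- the caller proceeds with an all-zero "seed" and no
     * error in sight. */
    return 0;                          /* reports success regardless */
}
```

When open() fails (EMFILE under descriptor exhaustion, or ENOENT in a chroot with no /dev/urandom), the routine still reports success, which is exactly the advantage the bad guy is after.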
Posted Jul 26, 2014 1:42 UTC (Sat)
by giraffedata (guest, #1954)
So that still doesn't shed any light on how the fact that file descriptors could be exhausted means LibreSSL needs a fallback method of generating random numbers. LibreSSL does check the error condition -- that's how it knows to fall back.
And yet, no other program under the sun avoids DoS attacks by working around inability to open files. In fact, the program using LibreSSL most probably uses files other than /dev/urandom, so the bad guy can kill it by exhausting file descriptors regardless of what LibreSSL does.
It looks to me like the article is simply mistaken about the relevance of file descriptor exhaustion attacks. I think the reason LibreSSL has alternatives to /dev/urandom is that /dev/urandom might just be broken or not implemented on that system.
Posted Jul 26, 2014 4:03 UTC (Sat)
by jake (editor, #205)
so, this comment that was quoted in the article:
> or consider providing a new failsafe API which
> works in a chroot or when file descriptors are exhausted.
(which comes from the LibreSSL source) was not enough to convince you that the LibreSSL folks (at least) are worried about file descriptor exhaustion?
> I think the reason LibreSSL has alternatives to /dev/urandom is
> that /dev/urandom might just be broken or not implemented on that
> system.
interesting, but it certainly isn't what they *say* ...
jake
Posted Jul 26, 2014 15:55 UTC (Sat)
by giraffedata (guest, #1954)
OK, I missed that. So the article is not mistaken. It's more like the developers were really confused, thinking it's worth adding a whole new system call to the kernel just to make a program progress a little further before succumbing to file descriptor exhaustion. Or there's some totally nonobvious attack vector I'm missing.
(I do understand that there are other, sensible, reasons to have getrandom()).
Posted Jul 26, 2014 21:18 UTC (Sat)
by dlang (guest, #313)
well, that sort of thinking is par for the course for people who get tightly absorbed into security thinking. They start to see the small things that can fail and forget that the overall system is probably going to be down first.
Posted Jul 27, 2014 11:57 UTC (Sun)
by gioele (subscriber, #61675)
Is it that hard to create a side program that uses some technique to force the exhaustion of fds during the entropy gathering (to create some weakness in a cryptographic step) and then stops, leaving the attacked programs with plenty of fds, as if nothing ever happened?
Posted Jul 27, 2014 16:11 UTC (Sun)
by giraffedata (guest, #1954)
It doesn't matter because even if it's possible to create such a program, it's impossible for it to achieve its goal of creating weakness in a cryptographic step if LibreSSL refuses to proceed when the open of /dev/urandom fails.
That's what we've been talking about: the design choice of LibreSSL refusing to proceed in that case (the easy, natural, conventional thing to do) versus getting random numbers in some way that doesn't require file descriptors (which involves wishing for a new kind of system call) and proceeding.
Posted Jul 27, 2014 17:49 UTC (Sun)
by jimparis (guest, #38647)
But what does "refuse to proceed" mean? Return an easily-ignored error code? Terminate the process? Sit in a busy loop? You'll get different answers based on who you ask. I generally agree with your point, but it's not as simple as you make it out to be. Making it so that the problem can never occur is just another way of fixing it.
Posted Jul 28, 2014 22:50 UTC (Mon)
by giraffedata (guest, #1954)
It really doesn't matter that there are options, because at least one of them is an entirely reasonable response to a catastrophic failure such as file descriptor exhaustion - a more reasonable response than designing a new kernel interface or computing entropy some other way. As a practical matter, I think it's obvious in this case that "refuse to proceed" should just mean "return -1" when the open fails, which would ultimately cause LibreSSL to return failure to the user instead of creating a connection. The user can ignore that failure, but there's no way he can leak private information to an eavesdropper over a connection that does not exist.
I'm really just asking why a developer would single out this one particular catastrophic failure for heroic action to avoid it. I'll bet the same code allocates memory in various places and just "refuses to proceed" if the allocation fails. And at some point it creates a socket and likely just "refuses to proceed" if that fails because of file descriptor exhaustion. Maybe it even uses a temporary file somewhere, and just "refuses to proceed" if the filesystem is full.
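For concreteness, "refuse to proceed" in this sense is nothing more than the following C sketch (a hypothetical illustration, not actual LibreSSL code; the name `get_seed_strict` is invented): every failure, including an open() that dies of file descriptor exhaustion, simply propagates -1 to the caller.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sketch, not actual LibreSSL code: "refuse to proceed"
 * just means every failure is reported to the caller as -1. */
static int get_seed_strict(const char *dev, unsigned char *seed, size_t len)
{
    int fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -1;                     /* fd exhaustion lands here: fail */

    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, seed + got, len - got);
        if (n <= 0) {
            close(fd);
            return -1;                 /* short or failed read: also fail */
        }
        got += (size_t)n;
    }
    close(fd);
    return 0;
}
```

The caller sees -1, gives up on the operation, and no connection ever exists for an eavesdropper to exploit - exactly the "easy, natural, conventional thing to do."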
Posted Jul 28, 2014 23:13 UTC (Mon)
by jimparis (guest, #38647)
> As a practical matter, I think it's obvious in this case that "refuse to proceed" should just mean "return -1" when the open fails, which would ultimately cause LibreSSL to return failure to the user instead of creating a connection.
This has nothing to do with "creating a connection"; existing code calls RAND_bytes() all the time for all sorts of things and doesn't always check the return code.
> I'm really just asking why a developer would single out this one particular catastrophic failure for heroic action to avoid it.
Because this is only a problem on Linux. Because the discussion was triggered by an article entitled "LibreSSL's PRNG is Unsafe on Linux". Because, as a developer points out in the comments there, "we really want to see linux provide the getentropy() syscall, which fixes all the mentioned issues."
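The interface the thread is asking for did eventually ship. Here is a minimal sketch using getrandom() as it landed in Linux 3.17 (exposed by glibc 2.25+ via sys/random.h); the `fill_random` wrapper is an invented name for illustration. No file descriptor is involved at any point, so descriptor exhaustion cannot interfere.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>   /* getrandom(): Linux 3.17+, glibc 2.25+ */

/* Minimal sketch: fill a buffer with kernel randomness without ever
 * opening a file, so fd exhaustion cannot affect it. The wrapper name
 * fill_random is hypothetical, chosen for this example. */
static int fill_random(unsigned char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = getrandom(buf + got, len - got, 0);
        if (n < 0)
            return -1;    /* e.g. ENOSYS on pre-3.17 kernels */
        got += (size_t)n;
    }
    return 0;
}
```

With flags set to 0, getrandom() draws from the urandom pool and blocks only until that pool has been initialized once at boot, which also closes the early-boot weakness the cited article describes.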