This article summarizes the RSA-CRT leak developments since Red Hat published a technical report on this subject in early September 2015.
Several vendors released software updates to harden their cryptographic libraries against RSA-CRT leaks, among them Eldos (whose announcement has since been deleted from their web pages). Preliminary patches for Nettle are available as well. We are still waiting for a fix for the Go standard library.
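The hardening in these updates is essentially a verify-before-release check: after the CRT computation, verify the signature against the public exponent, and never emit a value that fails to verify. A minimal sketch of the idea in Python follows; the key layout, function name, and toy key are mine, so this illustrates the principle rather than any particular library's patch.

    from collections import namedtuple

    RSAKey = namedtuple("RSAKey", "n e d p q q_inv")

    # Toy key (p=61, q=53) for illustration only.
    KEY = RSAKey(n=3233, e=17, d=2753, p=61, q=53, q_inv=38)

    def rsa_crt_sign_hardened(m, key, fault=False):
        # Standard RSA-CRT signing (Garner's recombination).
        s_p = pow(m, key.d % (key.p - 1), key.p)
        s_q = pow(m, key.d % (key.q - 1), key.q)
        if fault:
            s_p ^= 1  # simulate a one-bit glitch in the mod-p half
        s = s_q + key.q * ((key.q_inv * (s_p - s_q)) % key.p)

        # The hardening step: verify against the public exponent
        # before releasing the signature.  On mismatch, fall back to
        # the slower non-CRT computation instead of emitting a value
        # that would give away the private key.
        if pow(s, key.e, key.n) != m % key.n:
            s = pow(m, key.d, key.n)
        return s

    # Even with a simulated fault, the released signature verifies.
    assert pow(rsa_crt_sign_hardened(42, KEY, fault=True), KEY.e, KEY.n) == 42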
Keep in mind that lack of RSA-CRT hardening by itself does not cause any key leaks. To this day, no key leaks from free software whose source code is available to the general public have been observed. However, without this hardening, it is difficult to be certain that no such leaks can happen.
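The reason for that caution is how cheaply a single fault turns into a full key compromise: a corrupted CRT half produces a signature that is correct modulo one prime but not the other, so anyone who sees it can factor the modulus with one gcd (Lenstra's attack). A sketch with the same toy key as above, redefined here so the snippet is self-contained (the modular inverse via pow needs Python 3.8 or later):

    from math import gcd

    # Toy RSA key (p=61, q=53); illustration only.
    p, q = 61, 53
    n, e, d = p * q, 17, 2753
    q_inv = pow(q, -1, p)  # Python 3.8+

    m = 42
    # Unhardened RSA-CRT signing with a simulated one-bit fault in
    # the mod-p half of the computation.
    s_p = pow(m, d % (p - 1), p) ^ 1
    s_q = pow(m, d % (q - 1), q)
    s_faulty = s_q + q * ((q_inv * (s_p - s_q)) % p)

    # The faulty signature is still correct mod q but not mod p, so
    # s^e - m is a multiple of q only, and the gcd exposes a factor.
    recovered = gcd((pow(s_faulty, e, n) - m) % n, n)
    assert recovered == q  # the private key is compromised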
ZyXEL published a statement saying that they are “not affected” by RSA-CRT key leaks. As far as I understand it, this only applies to current software releases. As explained in the Red Hat report, ZyXEL disabled RSA-CRT hardware acceleration on some devices to work around a defect (occasional HTTPS connection failures) which they perceived as a pure functionality issue, not realizing its security implications. To this day, ZyXEL-branded devices which leak their private RSA keys remain on the Internet, which is not surprising because universal adoption of upgrades is difficult to achieve.
Towards the end of September 2015, I restarted the crawler, now as a side project (due to recent developments). I also monitor the ZMap data set published by the University of Michigan for RSA-CRT leaks. Noteworthy developments are:
A VPN device from a vendor which had already been identified as affected leaked the private key of its HTTPS server. Somewhat unusually for such a device, it had an X.509 certificate signed by a browser-recognized CA. The certificate has been revoked, but the private key of the replacement certificate leaked as well. The device vendor has identified the root cause, but firmware updates are not yet available to end users. I am not sure what to do here; cycling the server certificate every few weeks (or days) is not even a good interim solution.
I have observed key leaks from two additional devices which were not known to be vulnerable at the time of the report: a network security appliance and what appears to be a web camera. Both vendors are somewhat difficult to contact.
Perhaps most worryingly, a TLS-terminating load balancer (of the kind commonly used in front of large web sites) leaked two of its private keys. The leaks happened during a user-visible outage and were observed only by the ZMap crawler; I do not have independent handshake traces. This incident led to the revocation of two server certificates in the browser PKI. The device vendor has been contacted (in fact, they had been notified prior to publication of the report, and I had assumed they were not affected). In consultation with the affected organization, I am running a dedicated crawler (at a slow rate) that monitors their public-facing web servers, and the key leaks have not returned.
In short, in the last two months, I saw four key leaks for certificates in the browser PKI (for three domains), and encountered three new device models not known to be affected before.
The two certificate authorities I had to contact were much more relaxed than those I dealt with in 2008, during the fallout from the Debian OpenSSL vulnerability. Neither was especially nervous about handling private subscriber key material, which made proving key compromises a lot easier. In 2008, we had to resort to zero-knowledge-style proofs (such as signing a challenge with the compromised private key) to demonstrate key compromise without revealing the key to the certificate authorities.
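For illustration, such a proof of possession boils down to signing a CA-supplied nonce and verifying it against the certified public key. A sketch using the Python cryptography package (the generated key stands in for the actual leaked subscriber key; this is a generic illustration, not the procedure any particular CA used):

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    # Stand-in for the compromised subscriber key.
    leaked_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    # CA side: issue a fresh random challenge.
    challenge = os.urandom(32)

    # Reporter side: sign the challenge, demonstrating possession of
    # the private key without handing it over.
    signature = leaked_key.sign(challenge, padding.PKCS1v15(), hashes.SHA256())

    # CA side: verify against the public key from the certificate;
    # verify() raises InvalidSignature on failure.
    leaked_key.public_key().verify(
        signature, challenge, padding.PKCS1v15(), hashes.SHA256())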
Regarding the already-known devices, the number of affected installations is still too high to begin notifying individual device owners, particularly those on dial-up/consumer Internet connections without any identifying WHOIS information.
I have written my own implementation of the early parts of the TLS handshake (up to the Server Key Exchange message). My crawler initially used OpenSSL with callback functions to intercept the TLS handshake messages. In retrospect, the savings from not writing the handshake parser myself in the first place appear minuscule: OpenSSL provides the Client Hello generation and the parsing of the Certificate message, and it removes the TLS fragmentation layer before invoking my callback. But even in my OpenSSL-based implementation, I have to parse parts of the Client Hello and Server Hello messages (to get access to the client and server randoms), and the complete Server Key Exchange message. On top of that, the OpenSSL integration costs have to be taken into account.
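To give an idea of what the parsing involves, here is a rough sketch of the record and handshake layers (field layout per RFC 5246; the function names are hypothetical, not the crawler's actual code):

    import struct

    HANDSHAKE = 22  # TLS record content type for handshake messages

    def iter_handshake_messages(records):
        """Yield (msg_type, body) pairs from an iterable of raw TLS
        records, reassembling handshake messages that span records."""
        buf = b""
        for data in records:
            pos = 0
            while pos + 5 <= len(data):
                # Record header: type(1), version(2), length(2).
                ctype, _version, length = struct.unpack_from("!BHH", data, pos)
                if ctype != HANDSHAKE:
                    raise ValueError("unexpected record type %d" % ctype)
                buf += data[pos + 5 : pos + 5 + length]  # drop fragmentation
                pos += 5 + length
            # Handshake header: type(1), length(3).
            while len(buf) >= 4:
                msg_type = buf[0]
                msg_len = int.from_bytes(buf[1:4], "big")
                if len(buf) < 4 + msg_len:
                    break  # message continues in the next record
                yield msg_type, buf[4 : 4 + msg_len]
                buf = buf[4 + msg_len :]

    def server_random(server_hello_body):
        # Server Hello body: version(2), random(32), session ID, ...
        return server_hello_body[2:34]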
Integration of my minimal TLS implementation with the main crawler code is still pending. It is difficult because I want to revisit the (completely insecure) fallback code in the crawler. The idea is to retry the handshake in response to a failure (a TLS fatal alert, a connection reset, or a similar issue) and apply some sort of simplification: try an older TLS protocol version, or do not send any extensions. Doing this properly is quite complicated. In the end, I may opt for retrying all choices (TLS 1.2, TLS 1.1, TLS 1.0 without extensions, SSL 3.0), or skip fallback altogether, assuming that no one would care about leaks from already quite broken TLS implementations.
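A retry-all-choices loop could look roughly like this; attempt_handshake and TLSFatalAlert are hypothetical stand-ins for the crawler's connect-and-parse routine and its alert handling:

    class TLSFatalAlert(Exception):
        """Raised (hypothetically) when the server sends a fatal alert."""

    # Progressively simpler configurations: (version, send extensions?).
    FALLBACK_LADDER = [
        ("TLS 1.2", True),
        ("TLS 1.1", True),
        ("TLS 1.0", False),  # some broken servers choke on extensions
        ("SSL 3.0", False),
    ]

    def handshake_with_fallback(host, port, attempt_handshake):
        last_error = None
        for version, extensions in FALLBACK_LADDER:
            try:
                return attempt_handshake(host, port, version, extensions)
            except (ConnectionResetError, TLSFatalAlert) as exc:
                last_error = exc  # simplify and retry
        raise last_error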
This brings me to the next point: bandwidth and CPU usage reduction. My handshake implementation stops after receiving a Server Key Exchange message (or a handshake message that confirms that no Server Key Exchange message will be sent as part of this handshake). This saves a Diffie-Hellman computation (either finite-field or elliptic-curve) in the forward secrecy case, and all public-key cryptography otherwise. The client also sends just the Client Hello and none of the larger handshake messages that usually follow the Server Hello Done message. This should result in a nice saving of upstream bandwidth for the crawler (although the CPU savings are probably more significant if I end up running the crawler in some cloud).
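The stopping rule itself is a small filter over the handshake message types (codes per RFC 5246); a sketch, with a hypothetical function name:

    SERVER_KEY_EXCHANGE = 12
    SERVER_HELLO_DONE = 14

    def collect_server_key_exchange(messages):
        """messages: iterable of (msg_type, body) pairs.  Returns the
        Server Key Exchange body, or None once a Server Hello Done
        proves that none is coming (e.g. static RSA key exchange).
        Either way, the handshake can be abandoned at this point."""
        for msg_type, body in messages:
            if msg_type == SERVER_KEY_EXCHANGE:
                return body  # got what we came for; hang up
            if msg_type == SERVER_HELLO_DONE:
                return None  # no Server Key Exchange in this handshake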
I also wrestled with my SQLite transaction monitor, which is a topic for another article. The goal is to make sure that a transaction eventually completes (without reporting an error to the application) as long as all the error conditions encountered are temporary. This turned out to be more difficult than expected.
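The core of such a monitor is a loop that classifies failures and retries with backoff; a crude sketch with Python's sqlite3 module (the real monitor's error classification is certainly more involved):

    import sqlite3
    import time

    def run_transaction(conn, work, max_delay=10.0):
        """Retry the transaction in work(conn) until it commits, as
        long as every failure looks temporary (e.g. SQLITE_BUSY)."""
        delay = 0.05
        while True:
            try:
                with conn:  # BEGIN ... COMMIT, or ROLLBACK on error
                    return work(conn)
            except sqlite3.OperationalError as exc:
                msg = str(exc)
                if "locked" not in msg and "busy" not in msg:
                    raise  # permanent: report to the application
                time.sleep(delay)  # temporary: back off and retry
                delay = min(delay * 2, max_delay)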
I hope that both the SQLite changes and the new TLS handshake implementation will eventually allow completely unattended operation of the crawler.
2015-11-02: Published.
2016-03-06: Corrected document path.