I saw Alex tweeting about Zoom's end-to-end encryption (e2ee) plans the other day, and I've since made it through the first couple of chapters of their whitepaper. TL;DR: based on the first two sections, it looks pretty good.
First, I have to give credit to the authors of the whitepaper. It's really good so far, especially section 1.2, which lays out the actors, what needs to be protected, and who should have access to what.
Section 1.3 is also good. It acknowledges, for example, that a malicious participant colluding with the server could masquerade as another user. It also calls out metadata and traffic analysis attacks and software flaws. It's important to understand what "secure" actually means, and they do a good job of identifying the types of attacks they are not intending to address. When a section like this is missing, it's unclear whether a proposed solution leaves certain attacks unhandled on purpose or because of a flaw in the design.
Section 2 is the real meat of the paper. It covers the phases planned for rolling out e2ee and the security guarantees at each step. For Phase 1, it claims the design defeats passive attackers, but it fails to mention that TLS already does that.
Section 2.2 acknowledges that Phase 1 allows the server to swap keys and let unauthorized people into the meeting. Phase 2 more or less fixes this: with SSO, the problem is fixed outright; without SSO, it relies on a trust-on-first-use (TOFU) model to detect the swap.
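To make the TOFU idea concrete, here's a minimal sketch of a client-side key pin store. It's not Zoom's implementation; the class name, the SHA-256 fingerprinting, and the string return values are all hypothetical. It shows why TOFU gives detection rather than prevention: the first key seen for a user is trusted blindly, and only later changes get flagged.

```python
import hashlib


class TofuKeyStore:
    """Hypothetical trust-on-first-use store: pins the first key seen per user ID."""

    def __init__(self) -> None:
        self._pinned: dict[str, str] = {}

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        # Compare hashes of keys rather than raw key bytes.
        return hashlib.sha256(public_key).hexdigest()

    def check(self, user_id: str, public_key: bytes) -> str:
        fp = self.fingerprint(public_key)
        pinned = self._pinned.get(user_id)
        if pinned is None:
            # First contact: nothing to compare against, so trust and pin it.
            self._pinned[user_id] = fp
            return "pinned"
        if pinned == fp:
            return "match"
        # A known user presenting a different key: a possible server-side key swap.
        # The original pin is kept so the user can be warned.
        return "MISMATCH"


store = TofuKeyStore()
print(store.check("alice@example.com", b"alice-key-1"))          # pinned
print(store.check("alice@example.com", b"alice-key-1"))          # match
print(store.check("alice@example.com", b"server-injected-key"))  # MISMATCH
```

The weak spot is the first call: if the server swaps keys before a participant has ever seen the real one, the swapped key is what gets pinned.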
Alex suggests keeping a ring buffer of the last X seconds of content that can be submitted to Zoom for triage. This is interesting. If it's the reporter who submits the content, this seems like a reasonable solution.
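A time-windowed ring buffer like the one Alex describes is easy to sketch. This is purely illustrative: the class, the `window_seconds` parameter, and the idea that frames are opaque byte strings are all assumptions, not anything from the whitepaper. Old content is evicted as new content arrives, so a reporting participant could only ever submit the last X seconds they already saw.

```python
from collections import deque


class ContentRingBuffer:
    """Hypothetical buffer holding only the last `window_seconds` of content."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._frames: deque = deque()  # (timestamp, frame) pairs, oldest first

    def append(self, timestamp: float, frame: bytes) -> None:
        self._frames.append((timestamp, frame))
        # Evict anything older than the window, measured from the newest frame.
        cutoff = timestamp - self.window_seconds
        while self._frames and self._frames[0][0] < cutoff:
            self._frames.popleft()

    def snapshot(self) -> list:
        """What a reporting participant could choose to submit for triage."""
        return [frame for _, frame in self._frames]


buf = ContentRingBuffer(window_seconds=30)
for t in range(0, 100, 10):
    buf.append(float(t), f"frame@{t}".encode())
print(buf.snapshot())  # [b'frame@60', b'frame@70', b'frame@80', b'frame@90']
```

Because the buffer lives on the participant's device and is only read when that participant files a report, nothing about it requires Zoom to hold meeting keys.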
He also says that Zoom's Trust and Safety team can enter meetings, without saying anything about any of the participants consenting to this. That doesn't match section 2 at all, so I assume this is about the current system (no e2ee).
Alex then immediately says that Zoom employees would be prevented from entering a meeting, even visibly, and he specifically says there will not be a backdoor to allow this. That's why I suspect his previous comment was about the current system.
I did jump ahead a little to preview how joining a meeting works, which section 3.7.2 covers. The inputs to the meeting key are a nonce (32-byte seed), a static string, the meeting ID, and the meeting UUID (from Zoom infra).
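Here's a rough sketch of what "derive a key from those four inputs" looks like. I haven't checked which KDF or encoding the whitepaper actually specifies, so this assumes HKDF-SHA256 (RFC 5869, built here from `hmac` and `hashlib`) with length-prefixed fields; the static label string and function names are hypothetical placeholders. One thing the sketch makes obvious: the meeting ID and UUID come from the server, so they add no secrecy, and whatever secrecy the key has comes from the seed.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256, using only the standard library."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(n) = HMAC(PRK, T(n-1) | info | n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def _lp(field: bytes) -> bytes:
    # Length-prefix each field so ("ab", "c") and ("a", "bc") encode differently.
    return len(field).to_bytes(4, "big") + field


def derive_meeting_key(seed: bytes, meeting_id: str, meeting_uuid: str) -> bytes:
    """Hypothetical meeting-key derivation; the real construction may differ."""
    if len(seed) != 32:
        raise ValueError("expected a 32-byte seed")
    label = b"HYPOTHETICAL_STATIC_LABEL"  # stands in for the whitepaper's static string
    info = _lp(label) + _lp(meeting_id.encode()) + _lp(meeting_uuid.encode())
    return hkdf_sha256(ikm=seed, salt=bytes(32), info=info)


seed = os.urandom(32)
key = derive_meeting_key(seed, "123-456-789", "uuid-from-zoom-infra")
print(key.hex())
```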
All participants and the server have all of this information, and the list of participants also comes from the server. At a glance, it looks like a malicious or compromised server could join as the leader, but section 3.8 seems to be what keeps this from being a meaningful attack. Each non-leader computes the meeting key using mk, and mk contains "Meta", which includes the leader's public key. I haven't gone through everything, but the fact that the leader's key is involved makes me think some step will require an attacker to hold the leader's private key before legitimate participants will believe the meeting is legitimate.
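To illustrate why the leader's public key matters, here's a toy sketch in which participants accept "Meta" only if it carries a valid signature under the leader's key. I haven't checked which signature scheme the whitepaper uses; as a stand-in, this uses a Lamport one-time signature because it needs nothing beyond `hashlib` and `secrets`. It's a toy: each key pair can safely sign exactly one message, and the check only helps if participants already trust `leader_pk` (for example via the SSO or TOFU mechanisms from Phase 2). All names and the Meta contents are hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> list:
    # The 256 bits of the message digest, most significant bit first.
    return [(byte >> (7 - i)) & 1 for byte in _h(message) for i in range(8)]


def keygen():
    # 256 pairs of random secrets; the public key is the hash of each secret.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk


def sign(sk, message: bytes) -> list:
    # Reveal one secret from each pair, selected by the digest bits.
    # ONE-TIME ONLY: signing a second message with the same sk leaks it.
    return [pair[bit] for pair, bit in zip(sk, _bits(message))]


def verify(pk, message: bytes, signature: list) -> bool:
    if len(signature) != 256:
        return False
    return all(_h(s) == pair[bit] for s, pair, bit in zip(signature, pk, _bits(message)))


leader_sk, leader_pk = keygen()
meta = b"hypothetical Meta: meeting_id=123-456-789"
print(verify(leader_pk, meta, sign(leader_sk, meta)))  # True

# A server that lacks leader_sk can only sign with its own key, which fails.
server_sk, _ = keygen()
print(verify(leader_pk, meta, sign(server_sk, meta)))  # False
```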
Keep in mind, I've only skimmed section 3. You should not take my skimming of custom cryptographic protocols as proof that the design is actually safe. If you're going to rely on this, read the spec, or have someone read it for you.
As the community continues to explore Zoom's evolving security posture and its effects on corporate and personal security, there may be follow-up posts on this topic.
If you need help threat modeling systems you've built or rely on, feel free to reach out to us!