Everything about distributed systems is terrible

Distributed systems are hard. How do you test your system when it's spread across three services and four languages? Unit testing and type systems only take us so far. At some point we need new tools.

Enter TLA+. TLA+ is a specification language that describes your system and the properties you want. This makes it a fantastic complement to testing: not only can you check your code, you can check your design, too! TLA+ is especially effective for testing concurrency problems, like crashes, race conditions, and dropped messages.

This talk will introduce the ideas behind TLA+ and how it works, with a focus on practical examples. We'll also show how it caught complex bugs in our systems, as well as how you can start applying it to your own work.

OBJECTIVES

Attendees will understand what - fundamentally - makes concurrency a difficult problem, how TLA+ can help them, how to start learning TLA+, and where they can most easily apply TLA+ to help them in their current project.