Peter Van Roy

Distributed systems guru

Peter Van Roy is a professor in the ICTEAM Institute at the Université Catholique de Louvain (Belgium). In his spare time, he coordinates the LightKone Horizon 2020 project which is working on an Erlang-based programming platform for general-purpose computing on edge networks based on CRDTs and hybrid gossip algorithms.

He is the author of "Concepts, Techniques, and Models of Computer Programming" and of "Programming Paradigms for Dummies".

Past Activities

Peter Van Roy
Code Mesh V

Tutorial/ 04 Nov 2020
13.00 - 16.30

Highlights and Insights in Distributed Programming

There are three certainties in life, death, taxes, and node failures in distributed systems. This tutorial will explain some useful but less understood concepts and techniques for designing distributed systems. The tutorial is not intended to be exhaustive but to improve your knowledge and be entertaining. I will first go over some of the basic concepts and explain what they are all about. Then I will present several important problems: replicated data structures, laws of large systems, how to use failure, and self management. Example systems will be taken from projects that I have worked in. Prerequisite for this tutorial is some modest experience with concurrent and distributed programming, in particular experience with Erlang will help you get the most from this tutorial.

EXPERTISE

Intermediate

COURSE DURATION

3h30 minutes

Basic concepts. We will present partially synchronous systems, the Internet, and eventually perfect failure detectors. These concepts are ubiquitous but are not always well understood. We will explain causality and when to use it and not use it. We will talk about time-outs inside large systems and how to use them correctly.
Replicated data structures. An important distributed pattern is the consistent replicated data structure. We will talk about two important techniques for implementing such: the RSM (Replicated State Machine), which uses consensus to achieve consistency, and the CRDT (Conflict-free Replicated Data Type), which uses clever mathematics to achieve consistency without needing consensus. As part of the presentation on CRDTs, we will explain different kinds of consistency, including eventual consistency and convergent consistency, and when they are useful.

Laws of large systems. Large system design is a general discipline that is especially useful for distributed systems. We will talk about the CAP theorem and how to use it as a system design tool. We will talk about Buridan’s theorem and its consequences for big systems. We will talk about the induction fallacy, where relying on induction gives exactly the opposite effect to the desired result. We will introduce our good friend Dijkstra’s demon. We will present the next exponential wave, Internet of Things, and give some ideas on how to ride it.
Failure is good. Failure is very familiar to Erlang programmers. Failure is quite normal, and is even good. In large systems, if there are thousands of nodes, then there will always be failures. We will explain how to build systems that survive and even thrive (improve) with failures. Failures allow to continually renew the system, which is an approach that has been used by biological systems since the beginning of life on Earth, and which is especially useful in distributed systems. We will present software rejuvenation as one technique that takes advantage of failure.
Self management. Large systems must be able to manage themselves. We will talk about the two basic approaches to self management, namely control-oriented autonomic computing and data-oriented autonomic computing. We will give a general approach for control autonomy called WIFS (weakly interacting feedback structures). We will give a general approach for data autonomy called LiRA (LightKone Reference Architecture) and its two data management principles. Both data and control autonomy are important; different parts of a large system may use one or the other or both together. We will also present hybrid gossip, which is a sweet spot that combines the advantage of classical distributed algorithms (efficiency) with the advantage of gossip algorithms (resilience).

Back to Conference

Peter Van Roy
Code BEAM STO 2019

16 May 2019
09.15 - 10.00

Why time is evil in distributed systems and what to do about it

Building distributed systems is hard, even using languages such as Erlang that support them well. There are many problems that have to be solved: partial failure, nondeterminism, observable delays, event ordering, global state, distributed consistency, performance (latency and throughput), and so on. Progress has been made in solving these problems, but often in isolation and without realizing how the solutions are related.

In this talk, Peter goes to the heart of the matter and explain why almost all of these hard problems are avatars of one problem, namely real-world time. Peter will explain how to avoid time when building distributed systems. There are some cases when time cannot be avoided, even in principle, and he will explain those as well. In both situations, using and avoiding time, Peter will try to present general solutions at a correct level of abstraction. All ideas will be illustrated by examples of real distributed systems from my experience (edge computing, consistent replication, etc.).

This talk will give you a fresh outlook on distributed systems and help you design better ones.

OBJECTIVES

To explain the importance of real-world time for distributed systems. To help you master time in your own distributed systems.