ZenMonitor: Scaling Distributed Monitoring at Discord

Discord uses BEAM to power real-time communication between tens of millions of processes across dozens of servers. Running a full-mesh network at this scale presents unique challenges when scaling out the native monitoring capabilities of the BEAM VM.

Learn about ZenMonitor, a new library developed at Discord that acts as a highly scalable drop-in replacement for process monitoring. It reduces network traffic and improves reliability while retaining the core guarantees of BEAM.

OBJECTIVES

  • Provide a high-level understanding of process monitoring in a distributed BEAM system.
  • Explore how scale affects BEAM-provided monitoring, and the approach and design of a drop-in replacement.

TARGET AUDIENCE

To get the most out of the talk, familiarity with how BEAM processes and process monitoring work in practice is helpful, though a brief review will be provided. The talk is best suited to people working on BEAM projects at scale who are facing a similar scaling issue now or may face one in the near future.