Parsing safely, from 500MB/s to 2GB/s

To handle low-level data, we now have a few safe languages and good parsing libraries that make sure untrusted data never oversteps its bounds. Unfortunately, when we need performance, we too often fall back on handwritten state machines, generally in C, with maybe a little assembly thrown in.

Using one of the most annoying formats to parse (HTTP), we will see how to write a naive parser in Rust and transform it until it beats state-of-the-art handwritten C parsers, while keeping it as readable and safe as the original.
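To make the starting point concrete, here is a minimal sketch of the kind of naive, dependency-free Rust parser such a transformation could begin from; the RequestLine type and parse_request_line function are hypothetical and not taken from the talk's actual code.

```rust
// A deliberately naive HTTP request-line parser: split on spaces and
// validate nothing beyond the basic shape. This is the sort of readable
// but unoptimised starting point the abstract refers to.
#[derive(Debug)]
struct RequestLine<'a> {
    method: &'a str,
    uri: &'a str,
    version: &'a str,
}

fn parse_request_line(line: &str) -> Option<RequestLine<'_>> {
    // "GET /index.html HTTP/1.1\r\n" -> three space-separated fields.
    let mut parts = line.trim_end_matches("\r\n").splitn(3, ' ');
    Some(RequestLine {
        method: parts.next()?,
        uri: parts.next()?,
        version: parts.next()?,
    })
}

fn main() {
    let req = parse_request_line("GET /index.html HTTP/1.1\r\n");
    println!("{:?}", req);
}
```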

OBJECTIVES

This talk will describe a few common issues in parsers and how they interact (often badly) with performance, such as looking for a specific token or handling partial parsing. We will see how they are commonly handled in C parsers, with techniques like goto-based state machines, lookup tables and vectorisation. The goal of this talk is to show that, by using Rust and parser combinators, it is possible to match the performance of handwritten C parsers without compromising on readability and maintainability.
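As an illustration of the parser-combinator style, the sketch below parses the same request line out of small, composable functions instead of a handwritten goto state machine. It assumes the nom crate (version 7 API); the abstract does not name a specific library, so this is one possible choice rather than the talk's own code.

```rust
use nom::{
    bytes::complete::{tag, take_while1},
    character::complete::space1,
    IResult,
};

// A token is any run of bytes that is not a space or a line ending.
fn token(input: &[u8]) -> IResult<&[u8], &[u8]> {
    take_while1(|c: u8| c != b' ' && c != b'\r' && c != b'\n')(input)
}

// Request line grammar: method SP uri SP version CRLF.
fn request_line(input: &[u8]) -> IResult<&[u8], (&[u8], &[u8], &[u8])> {
    let (input, method) = token(input)?;
    let (input, _) = space1(input)?;
    let (input, uri) = token(input)?;
    let (input, _) = space1(input)?;
    let (input, version) = token(input)?;
    let (input, _) = tag(&b"\r\n"[..])(input)?;
    Ok((input, (method, uri, version)))
}

fn main() {
    // Leftover input (here, the headers) is returned for further parsing.
    let res = request_line(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n");
    assert!(res.is_ok());
}
```

Each combinator consumes part of the input and hands the rest to the next one, so the grammar stays visible in the code while the library handles bounds checks and incomplete-input reporting.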