Using Selective Memoization to Defeat Regular Expression Denial of Service

Regexes across the system stack. ReDoS may occur when a slow regex meets unsanitized input on a slow regex engine.




Regex operators and NFA equivalents for character, concatenation, repetition, and disjunction (OR). Edges are labeled with the input characters that must be consumed in order to traverse them. The funny “e” symbols are called epsilon-edges and can be taken without consuming a character.
Regex with exponential worst-case behavior in a Spencer-style backtracking regex engine.

Why would you use backtracking and not lockstep?



Reducing memoization costs

(Left) In-degree>1 memoization for (a|a). (Right) Ancestor memoization for (a|a)+.

Table encodings

Extended regexes



Time cost

Case studies of applying our techniques to regular and extended regexes (K- and E-). The regexes are: (a) Exponential Microsoft username regex (responsibly disclosed); (b) Cloudflare regex; (c,d) Hand-crafted examples. All K-regexes and REWZWA can be handled in linear-in-n. For REWBR, memoization reduces the degree of the exponent.

Space cost

Sizes of the vertex sets for the selective memoization schemes. Whiskers indicate the (1, 99)th percentiles. Outliers are not shown. Among the supported regexes, the results are similar for all regexes and the super-linear subset.
Selective memoization significantly reduces the per-query storage cost. The overheads of a hash table appear to outweigh the benefits of the “only track what you’ve seen” approach (orange bars). For RLE, the 90th percentile cost was about 10 runs, resulting in effectively constant costs in this experiment. This experiment used input strings of length 20KB to imitate the Stack Overflow outage.


More information




Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
James Davis

James Davis

I am a professor in ECE@Purdue. I hold a PhD in computer science from Virginia Tech. I blog about my research findings and share tips for engineering students.