SoK: A Literature and Engineering Review of Regular Expression Denial of Service (ReDoS)
This blog post summarizes our recent work, “SoK: A Literature and Engineering Review of Regular Expression Denial of Service (ReDoS),” which appeared at the ACM ASIA Conference on Computer and Communications Security 2025 (ACM ASIACCS’25). This work was done in collaboration with Cris Staicu and his student Masud Bhuiyan at Germany’s CISPA Helmholtz Center for Information Security. This post was written by the co-lead author, my student Berk Çakar. You can find the preprint of the full paper here.
Background: Regexes and ReDoS
Regular expressions are everywhere. They are used for input validation, search, parsing, and nearly every software system processes text. But what if a seemingly harmless regex in your codebase, or even one hidden away in an imported library, could be exploited by an attacker to bring your whole system down?
This threat is known as Regular Expression Denial of Service (ReDoS), and dignified by MITRE in 2021 as CWE-1333 [1]. High-profile services have experienced outages due to CWE-1333: In 2016, Stack Overflow went dark for 34 minutes because of one regex [2]. The same issue happened to Cloudflare in 2019 for 27 minutes [3]. For JavaScript applications, some authors have found that ReDoS is the fourth most commonly reported server-side vulnerability [4].
The following figure from our paper shows the frequency of ReDoS CVEs by programming language package ecosystem (e.g. NPM/JS, PyPI/Python, etc.), with the OWASP Top-10 for comparison. ReDoS CVEs represent ~1% of CVEs on average across the ecosystems.
A Quick Regex Refresher: Why Can Regex Matching Become Tricky?
Simply put, regexes define textual patterns. The original “Kleene’s regexes” (K-regexes) had the basic building blocks of matching a sequence of characters (/foo/), matching a choice of two sequences (OR, /foo|bar/), and matching a repeated sequence (/foo*/). These can be processed very efficiently, with an optimal linear-time algorithm ( the processing time grows proportionally to the length of the text you are checking).
But over time, regexes got empowered, with “Extended regexes” (E-regexes) adding complex features like backreferences (“match a previously-seen sequence”, /foo(bar)\1/) and lookarounds (check what’s before/after without consuming it, /(?<=foo)bar/ to match “bar” only if it is preceded by “foo” ). These add expressive power but can complicate matching done under the hood. You might ask, why would anyone need such features? The answer is that regexes are a common element of a search API, and sometimes people want to search for surprisingly complex things. You put the two together, and regexes have gotten saddled with loads of features over the years.
What Is ReDoS?
Now, speaking of the “under the hood”, a system component called a regex engine is used to process regexes. Regex engines typically implement one of two main approaches:
- Thompson’s algorithm-based engines [5]: These strictly stick to formal automata theory and are fast with guaranteed linear-time performance. The trade-off is that they have limited support for complex E-regex features. RE2 and the Rust engine are notable examples.
- Spencer’s algorithm-based engines [6]: These use a technique called “backtracking”. Imagine the engine trying one path to match; if it fails, it “backtracks” to try another. At various points in time, Python, C# (.NET), JavaScript (V8), and Ruby have all used this algorithm.
It is this second kind of engine that concerns us. The backtracking technique is really flexible. That makes backtracking a great way to implement the E-regex features. It also raises a problem: that flexibility reduces the optimization potential, resulting in worse-than-linear match times on certain regexes and certain inputs.
CWE-1333 (whether triggered by an attacker or inadvertently, as in the case of Stack Overflow and Cloudflare) happens when a particular kind of input string reaches a vulnerable regex in the system and triggers “catastrophic backtracking” on the regex engine. In this scenario the cost is asymmetric: the problematic input is small, but the server may still spend massive amounts of CPU time trying to match the regex, effectively denying service to legitimate users. That imbalance between the small cost for the attacker and the high cost for the server makes ReDoS an asymmetric threat [7].
For a ReDoS attack to work, three key ingredients are usually needed [8]:
- A Backtracking Regex Engine: The system uses an engine that can suffer from catastrophic backtracking.
- A Vulnerable (Super-Linear) Regex: The application uses a regex pattern that is “problematically ambiguous” and can lead to super-linear (as discussed above) matching time.
- Attacker-Controlled Input: The attacker needs a way to craft and send the malicious string so it reaches and gets processed by that vulnerable regex.
For more on ReDoS, I recommend checking out this blog post from my lab.
Why We Wrote This Paper
The knowledge gap
Although the academic literature has many research papers about ReDoS written over the years, nobody has asked: What do we know about ReDoS, and what exactly matters to software practitioners? We aimed to document what is known about ReDoS, the gaps, and where we should go next.
The personal side
This project was supervised by James C. Davis and Cris Staicu. Davis and Staicu led two of the major works on ReDoS in practice — Davis’s FSE’18 paper on “The Impact of ReDoS in Practice” and Staicu’s USENIX’18 paper on “Freezing the Web”. We ourselves built on prior work providing measurement tools, and since then we’ve seen dozens of follow-up works at top computing research venues. We’ve chatted at conferences about the state of the field, and finally decided we should write a review paper.
Method and Results
We proceeded down two prongs of research.
- First, we went over the existing academic research (i.e., the literature review part of the title), examining dozens of papers on how ReDoS is detected, prevented, and mitigated.
- Second, we conducted an engineering review. We surveyed the latest regex engines in popular programming languages to see if and how ReDoS defenses have been implemented and how effective they are in practice. This is the part we think software engineers and developers would find especially interesting.
What Do We Know About ReDoS? (The Literature Review)
In our paper, we systematically reviewed 35 academic papers focusing on ReDoS. The next figure shows the categories we identified.
Here is a brief overview of each category:
- Spotting ReDoS (Detection): How do we find these vulnerable regexes? Some authors use heuristics, others use formal automata. Others use dynamic detectors (e.g., fuzzing with runtime instrumentation) and ML models.
- Stopping ReDoS Before It Happens (Prevention): Can we design things to be ReDoS-proof? Some authors have pursued better regex matching algorithms or optimizations such as memoization to avoid catastrophic backtracking. Others have developed tools to automatically rewrite vulnerable regexes into safer, equivalent ones.
- Shielding Against ReDoS (Mitigation): What if an attack gets through? Some authors focus on anomaly detection, using ML or other techniques to spot when a regex is taking suspiciously long and then taking action. Others devise caps on how long regexes should be permitted to run, or how many backtracking steps can be taken.
During our review, we noticed that the academic community makes strong assumptions about ReDoS and attacker capabilities (e.g., assuming that attackers can send arbitrarily long inputs or always reach the vulnerable part of a regex), which may not always hold in practice due to common server-side protections like input validation & length limits, rate limiting, and application firewalls. We also found that definitions of what counts as a “ReDoS vulnerability” vary widely. Some studies consider a one-second slowdown significant, while others require much longer. These inconsistencies make it difficult to assess the actual risk in practice.
What Are We Doing Against ReDoS? (The Engineering Review)
It is great that researchers are working on ReDoS, but have their ideas made it into the regex engines developers use? We reviewed the regex engines used in nine popular programming languages: JavaScript, Ruby, C#, Java, PHP, Perl, Python, Rust, and Go. We found that some language runtimes modernized their regex engines in recent years to address ReDoS concerns, some utilized Thompson-style engines from the start to not give a chance for ReDoS, and some still use unoptimized Spencer-style engines that are vulnerable to ReDoS. The table below summarizes the current state of ReDoS defenses in these languages; a detailed description follows.
- JavaScript (Node.js — V8): Introduced an opt-in Thompson-style non-backtracking engine in 2021 for regexes that do not use E-regex features [9]. For those that do, the old Spencer-style backtracking engine is the only option. The change shipped in V8 v8.8, and Node.js adopted it as an experimental feature since v16.0.0. Engineering effort to implement this change was about 1558 lines of code (LOC), which is 10.6% of the original regex engine codebase.
- Ruby (MRI/CRuby): Adopted Davis et al.’s [6] memoization scheme for its Spencer-style backtracking engine for a defense that is on by default, and offers an optional timeout mechanism for E-regexes that can’t be memoized. Both features are available since Ruby 3.2 (2022) [10,11]. In total, both defenses took 1100 LOC to implement, corresponding to 4.7% of the original regex engine codebase.
- C# (.NET): Added an optional non-backtracking engine based on Brzozowski derivatives [12,13] in .NET 7 (2022) [14] and also had an optional timeout feature since .NET 4.5 (2012) [15]. However, the non-backtracking engine does not support many E-regex features similar to Thompson-style engines. The new non-backtracking engine is implemented in 5417 LOC, which is 34.7% of the original regex engine codebase.
- Java (OpenJDK): Introduced some bounded caching mechanisms to its Spencer-style backtracking engine, which can help in certain ReDoS scenarios. This ReDoS defense has been available and enabled by default since Java 9 (2016). 1712 LOC were added to the original regex engine codebase (35.5% expansion) to implement this change.
- Rust and Go: These languages were designed with ReDoS safety in mind from the beginning, using Thompson-style engines that guarantee linear time matching. As a consequence, they do not support most of the E-regex features.
- PHP (Zend Engine), Perl (perl5), and Python (CPython): PHP, Perl, and Python also use Spencer-style backtracking regex engines and are likely vulnerable to ReDoS attacks. PHP [16] and Perl utilize [17] “counter-based caps” (i.e., limits on the number of backtracking steps) for more than 20 years to prevent catastrophic backtracking as an alternative to C# and Ruby’s “time-based caps” (i.e., timeouts).
Lastly, Python has neither built-in engine optimizations against ReDoS vulnerabilities nor a native timeout-like mechanism for regex evaluations.
Now, it is great that engines are being updated, but how much safer are we? Or, reversely, how much more vulnerable are we if we use an engine that does not have ReDoS defenses?
To figure this out, we designed an experiment. We grabbed a set of ~500K regexes collected from open-source software [18]. Using vuln-regex-detector, we identified regexes that were predicted to have exponential or polynomial worst-case behavior. For these candidates, we generated input strings designed to trigger ReDoS.
Then, we ran these regex-input pairs on two versions of the reviewed regex engines: an older version (before any major ReDoS fixes) and the latest version (with any new defenses turned on, even if they were not default). We timed how long the matches took and categorized them:
- Exponential: The worst-case scenario. The processing time exceeds five seconds with 500 or more pumps of an attack string.
- High-Polynomial: Huge improvement but still pretty bad. The processing time exceeds five seconds with 500 or more pumps of an attack string.
- Low-Polynomial: Getting better… The processing time exceeds 0.5 seconds on an attack string but does not meet the exponential or high-polynomial criteria.
- Linear: Nice and speedy. The regex match never times out on the attack string.
OK, how did it go?
The figure shows a substantial decrease in super-linear behavior in the engines that had implemented ReDoS defenses. Notably, exponential behavior is resolved in C# and JavaScript but persists in Ruby and Java, while polynomial behavior continues to manifest in JavaScript and Java. Therefore, the answer is yes! The defenses appear to be effectively working in addressing the common causes of super-linear behavior.
What Does All Mean?
Our work is not only an academic exercise but also a call to action for developers and software engineers, and for researchers. Here are some key takeaways:
- ReDoS is Contextual: A regex that is dangerous in one deployment can be harmless in another because engine versions, thread models, and resource limits differ. For example, a super-linear regex might be more serious in a single-threaded web application architecture like Node.js.
- Question the Threat Model: Much prior work assumes the adversary can send huge payloads or hit any input point without throttling. Production systems often enforce size caps, rate limits, or WAF filters. Before triaging a “vulnerable” regex, ask whether an external user can realistically reach it with a suitably long string.
- Expect Developer Pushback: Engineers sometimes label ReDoS reports as noise because few real outages have been traced to deliberate attacks, and some CVEs have been disputed. When filing bugs or reviewing reports, accompany them with a reproducible proof-of-concept under realistic circumstances; otherwise, the issue may be dismissed.
- Know Your Regex Engine: Understand the default behavior of the regex engine in your programming language and whether it offers safer modes or ReDoS-specific defenses. Are they on by default? Which regex APIs do you need to use to get the safest behavior? If you are using a Spencer-style regex engine, be more cautious.
- Look Beyond Regexes: Some parts of your program take longer than others to execute. If an attacker can repeatedly reach these slow spots, the service can stall, much like in a ReDoS attack. While most denial-of-service cases still rely on big traffic spikes, attackers may switch to these “complexity” tricks. Regular profiling, removing needless loops, and adding timeouts on heavy work can save CPU cycles and leave attackers with less to exploit.
Final Thoughts
Our exploration of ReDoS research and regex engine defenses shows a promising trend: awareness is growing, and solutions are emerging.
In fact, we think this is a clear success stories for software engineering and cybersecurity research: researchers identified the problem, analyzed its causes, proposed solutions, and, in many cases, helped shape practical defenses now used in production systems.
However, the ReDoS threat still continues. By understanding the nuances of ReDoS, knowing your tools, and thinking critically about exploitability, you can help taking ReDoS under control.
We hope our work provides a valuable map for researchers studying ReDoS as well as for software engineers on the front lines using regexes in their applications.
We thank the US National Science Foundation (NSF) for supporting this work through NSF-SaTC-2135156.
References
[1] CWE-1333: Inefficient Regular Expression Complexity. https://cwe.mitre.org/data/definitions/1333.html.
[2] Stack Exchange Network Status. 2016. Outage Postmortem — July 20, 2016. https://stackstatus.tumblr.com/post/147710624694/outage-postmortem-july-20–2016
[3] John Graham-Cumming. 2019. Details of the Cloudflare outage on July 2, 2019. https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019
[4] Masudul Hasan Masud Bhuiyan, Adithya Srinivas Parthasarathy, Nikos Vasilakis, Michael Pradel, and Cristian-Alexandru Staicu. 2023. SecBench.js: An Executable Security Benchmark Suite for Server-Side JavaScript. In Proceedings of the 45th International Conference on Software Engineering (Melbourne, Victoria, Australia) (ICSE ’23). IEEE Press, 1059–1070. doi:10.1109/ICSE48619.2023.00096
[5] Ken Thompson. 1968. Programming Techniques: Regular expression search algorithm. Commun. ACM 11, 6 (June 1968), 419–422. doi:10.1145/363347.363387
[6] Henry Spencer. 1994. A Regular-Expression Matcher. In Software Solutions in C. Academic Press Professional, Inc., USA, 35–71
[7] Georgios Mantas, Natalia Stakhanova, Hugo Gonzalez, Hossein Hadian Jazi, and Ali A. Ghorbani. 2015. Application-layer denial of service attacks: taxonomy and survey. International Journal of Information and Computer Security 7, 2/3/4 (Nov. 2015), 216–239. doi:10.1504/IJICS.2015.073028
[8] James C. Davis, Francisco Servant, and Dongyoon Lee. 2021. Using Selective Memoization to Defeat Regular Expression Denial of Service (ReDoS). In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 1–17. doi:10.1109/SP40001.2021.00032
[9] Martin Bidlingmaier. 2021. An Additional Non-Backtracking RegExp Engine. https://v8.dev/blog/non-backtracking-regexp
[10] Yui Naruse. 2022. Ruby 3.2.0 Released. https://www.ruby-lang.org/en/news/2022/12/25/ruby-3-2-0-released/.
[11] Victor Shepelev. 2023. Ruby 3.2 Changes. https://rubyreferences.github.io/rubychanges/3.2.html#regexp-redos-vulnerability-prevention.
[12] Dan Moseley, Mario Nishio, Jose Perez Rodriguez, Olli Saarikivi, Stephen Toub, Margus Veanes, Tiki Wan, and Eric Xu. 2023. Derivative Based Nonbacktracking Real-World Regex Matching with Backtracking Semantics. Proc. ACM Program. Lang. 7, PLDI, Article 148 (June 2023), 24 pages. doi:10.1145/3591262
[13] Olli Saarikivi, Margus Veanes, Tiki Wan, and Eric Xu. 2019. Symbolic Regex Matcher. In Tools and Algorithms for the Construction and Analysis of Systems, Tomás Vojnar and Lijun Zhang (Eds.). Springer International Publishing, Cham, 372–378. doi:10.1007/978–3–030–17462–0_24
[14] Stephen Toub. 2022. Regular Expression Improvements in .NET 7. http://devblogs.microsoft.com/dotnet/regular-expression-improvements-in-dotnet-7.
[15] NET Contributors. 2023. What’s New in NET Framework. https://learn.microsoft.com/en-us/dotnet/framework/whats-new/#whats-new-in-net-framework-45.
[16] PHP Documentation Group. 2024. PHP: Runtime Configuration Manual. https://www.php.net/manual/en/pcre.configuration.php.
[17] Joerg Ludwig. 2009. Complex regular subexpression recursion limit. https://www.perlmonks.org/?node_id=810857.
[18] James C. Davis, Louis G. Michael IV, Christy A. Coghlan, Francisco Servant, and Dongyoon Lee. 2019. Why aren’t regular expressions a lingua franca? an empirical study on the re-use and portability of regular expressions. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Tallinn, Estonia) (ESEC/FSE 2019). Association for Computing Machinery, New York, NY, USA, 443–454. doi:10.1145/3338906.3338909