A Sense of Time for JavaScript and Node.js

James Davis
Aug 21, 2018

Introduction

This is a brief for the research paper A Sense of Time for JavaScript and Node.js: First-Class Timeouts as a Cure for Event Handler Poisoning, published at USENIX Security 2018. I was the first author; Eric R. Williamson and Dongyoon Lee rounded out the team.

Summary

We defined a class of denial-of-service (DoS) attack called Event Handler Poisoning. This attack affects web servers that use the event-driven architecture. Web servers written in Node.js use this architecture and are thus vulnerable to these attacks. As part of our study we examined Snyk.io’s vulnerability database and found that this type of vulnerability is quite common: about 35% of the reported vulnerabilities in npm modules can be used as an Event Handler Poisoning vector.

We built a prototype defense, First-Class Timeouts, against this type of attack. Our prototype is for Node.js so we called it Node.cure. The prototype defeats all known examples of Event Handler Poisoning attacks in the wild, though of course it incurs some performance overhead.

Node.js

What is it?

Node.js lets JavaScript escape from the web browser.

For many years developers would only use JavaScript on the client-side, running in web browsers. This led to inefficiencies and code duplication, since the server-side code that this JavaScript would talk to would have to work with the same constructs, validate them in the same ways, and so on, but in another programming language like PHP or Java. In 2009 Ryan Dahl published the Node.js framework which makes it easy to write JavaScript that can run on the server-side instead.

Node.js supports:

  • The same Event Loop / Workers style common in web browsers.
  • But with access to system calls (fork/exec, file system, network access, DNS, etc.)
  • And handy built-in tools for web server development, like native support for HTTP/S, crypto, compression, etc.

The vision is that developers can use a single language — JavaScript — for a uniform application that stretches across both client- and server-side.

Why study it?

We studied Node.js because it’s A Big Deal (7M application developers, 700K modules in npm, 1B framework downloads, 24B module downloads/month, etc.), but there has not been a lot of academic research on it. Although there are plenty of opinions in “grey literature” (blog posts, whitepapers, etc.), I believe an academic perspective is valuable too.

I have previously studied Node.js from the perspective of correctness (free pdf). In this paper we studied it from the perspective of security.

Which Node.js applications?

Although you can build non-server applications with Node.js (e.g. command-line tools like npm), from a security perspective the most interesting class of applications is internet-facing web servers.

Web server architectures

Web servers have been around for decades now. There are two classic architectures: One Thread per (Active) Client, and Event-Driven.

One Thread per (Active) Client (OTPC)

In this architecture, a Dispatcher catches incoming requests and passes them off to a Threadpool. The generation of responses proceeds concurrently thanks to the operating system’s preemptive scheduler.

The classic example of the One Thread per Client architecture is the Apache HTTP Server.

One Thread per Client architecture. Each client request is dispatched to a separate thread.

Event-Driven (ED)

In this architecture, the client requests are multiplexed onto the same small set of threads: the Event Loop and the Worker Pool. The Event Loop is single-threaded and the Worker Pool has dozens, not thousands, of threads.

In the Event-Driven architecture, the application developer is responsible for ensuring that each stage (callback/CB or task) of response generation does not take more than its share of the CPU. This is called cooperative multitasking.

Node.js is the most prominent server-side framework that uses this architecture. There have been others, like Twisted (Python), Vert.x (Java), and EventMachine (Ruby).

(Asymmetric Multi-Processing) Event-Driven architecture. Every client request is multiplexed onto the same set of threads, called Event Handlers.

Event Handler Poisoning (EHP) Attacks

What happens if the application developer makes a mistake, and permits a single stage of a client request to take a long time?

In the OTPC architecture, nothing very bad happens. The thread to which this client was assigned will take a long time handling the request, that’s all. It may consume an abnormally large amount of resources (CPU, I/O bandwidth, etc.) while it does so, but thanks to the preemptive multitasking of the OTPC architecture, other clients will continue to be handled.

In the ED architecture, this situation is a disaster. Here is an illustration of what happens when client A sends an input that triggers a long-running callback on the Event Loop.

Event Handler Poisoning Attack on the Event Loop. Client A’s long-running request A blocks the handling of client B’s request.

The result is a denial of service (DoS) to current and future clients. If a malicious client identifies this behavior then they can launch DoS attacks driven by resource exhaustion — the exhausted resource is the set of Event Handlers.
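
The blocking effect described above can be reproduced in a few lines on stock Node.js. This is a minimal sketch, not code from the paper: `busyWait` and `measureDelay` are hypothetical names, and the busy loop stands in for any long-running callback an attacker might trigger.

```javascript
// Hypothetical stand-in for "client A": a synchronous loop that never yields.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Spin. Nothing else can run on the Event Loop during this loop.
  }
}

// Resolves with how late (in ms) a 0 ms timer fires when a blocking
// computation of `blockMs` holds the Event Loop in the meantime.
function measureDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    // "Client B": a trivial request that should be served almost immediately.
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    // "Client A": runs synchronously and holds the Event Loop for blockMs.
    busyWait(blockMs);
  });
}

measureDelay(200).then((delay) => {
  console.log(`client B waited ${delay} ms`);
});
```

Client B's timer cannot fire until client A's computation returns, so the reported wait is at least as long as the blocking time.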

Defenses

Suppose an attacker identifies an EHP vulnerability on your Node.js-based web server and launches an EHP attack.

  • This isn’t that unlikely. We analyzed Snyk.io’s vulnerability database and found that about 35% of the reported vulnerabilities in npm modules can be used as an EHP vector, so EHP vulnerabilities are pretty common in practice.
  • And Staicu and Pradel showed that you can map some of these vulnerabilities into attacks on popular websites.

What should you do?

Bad idea 1: Heartbeats

Idea

You could put a heartbeat on the Event Loop and the Workers, and monitor for failures. If they fail, restart the server.

Problems

  • Every time you restart the server, existing connections are severed — one-time DoS.
  • And there’s nothing preventing the attacker from launching the attack again, so now the one-time DoS becomes an ongoing DoS.

Bad idea 2: Partitioning

EHP vulnerabilities happen when your application doesn’t regularly yield between handling different clients. For example, if a single callback might take 10 seconds, then the client associated with that callback is getting an unfair amount of time on the Event Loop.

Idea

Your exposure to EHP is determined by the weakest link, the least fair portion of your code. So you could find all of the callbacks in your code that might “take a while” and break them up into smaller pieces. For example, you could make sure every loop takes at most a constant number of steps before yielding and re-entering later (e.g. using generators).

Problems

  • This will require a huge amount of refactoring in your application, plus a hefty maintenance burden as your application evolves.
  • Unless you plan to rewrite all of the 3rd-party modules from npm, you won’t be able to guarantee that the EHP vulnerabilities in your dependencies are fixed.
  • What about I/O? This scheme seems fine for computation, but for I/O we have to ask the OS to handle the request and we have no guarantees about how long it will take.
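
To make the partitioning idea concrete, here is a minimal sketch of a partitioned loop. It uses `setImmediate` rather than generators to yield, and the function name `partitionedSum` and the chunk size are assumptions chosen for illustration.

```javascript
// Sum the integers 0..n-1, yielding to the Event Loop every `chunk` steps.
function partitionedSum(n, chunk = 10000) {
  return new Promise((resolve) => {
    let i = 0;
    let total = 0;

    function step() {
      // Do a bounded amount of work in this callback.
      const stop = Math.min(i + chunk, n);
      for (; i < stop; i++) {
        total += i;
      }
      if (i < n) {
        // Yield so that callbacks for other clients can run, then re-enter.
        setImmediate(step);
      } else {
        resolve(total);
      }
    }

    step();
  });
}

partitionedSum(1000000).then((total) => console.log(total));
```

Even for a loop this simple, the state (`i`, `total`) has to be hoisted out of the loop body and the control flow restructured, which hints at the refactoring cost listed above.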

First-Class Timeouts

In our paper we propose First-Class Timeouts as a solution.

Idea

Just as an OutOfBoundsException protects against a buffer overflow attack, we suggest that a TimeoutException can protect against an EHP attack. We believe it is easier for developers to reason about timeouts than for them to reason about the computational and I/O costs of running millions of lines of 3rd-party libraries.

Instead of permitting callbacks on the Event Loop to run forever, an ED framework should deliver a TimeoutException to make sure the application knows it is being unfair and to let it make a decision about whether to proceed or not. The same is true for tasks on the Worker Pool.

In effect, we are proposing that ED frameworks should not rely on application developers to correctly implement pure cooperative multitasking — it’s really hard! Instead, these frameworks should offer developers a time-aware cooperative multitasking environment.

Problems

First-Class Timeouts permit detection of, and response to, EHP attacks. Each step asks something of the application developer:

  1. Detection: The application developer must choose a timeout threshold.
  2. Response: The application developer must decide how to respond to a TimeoutException. But we think this refactoring is a lot easier than it would be for partitioning. If you’re already using Promises or Async/Await, then put a trailing try-catch handler for Timeouts and then blacklist the (presumably malicious) client.
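
The response step might look roughly like the sketch below. Stock Node.js does not deliver a TimeoutException, so `TimeoutError` here is a hypothetical stand-in that the example handler throws itself; `handleRequest`, `blocklist`, and the status codes are likewise assumptions for illustration, not the Node.cure API.

```javascript
// Hypothetical stand-in for the exception a time-aware framework would deliver.
class TimeoutError extends Error {}

// Clients that previously triggered a timeout (presumably malicious).
const blocklist = new Set();

// Run an async request handler with a trailing catch for timeouts.
async function handleRequest(clientId, handler) {
  if (blocklist.has(clientId)) {
    return { status: 403 };
  }
  try {
    const body = await handler();
    return { status: 200, body };
  } catch (err) {
    if (err instanceof TimeoutError) {
      // This request was unfair to other clients: refuse this client from now on.
      blocklist.add(clientId);
      return { status: 503 };
    }
    throw err; // Unrelated errors are not our concern here.
  }
}
```

The handler logic itself is untouched; only the outermost layer needs to know about timeouts, which is why this refactoring is lighter than partitioning.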

Node.cure

To implement First-Class Timeouts in Node.js, we made the Event Loop and the Worker Pool time-aware.

At a high level, we introduced watchdog helper threads that sit alongside the Event Loop and each of the Workers, and ensure that these Event Handlers take no more than the requested timeout handling a single user request. If they do, the watchdog threads deliver a TimeoutException. The details get a bit complicated and they depend on the internals of Node.js, so that’s about all I’ll say on this.

This solution is similar to the “heartbeats” idea mentioned above. But where heartbeats are an external measure of liveness, First-Class Timeouts are an internal measure of liveness and can address DoS without harming existing connections. That’s why we call them “First-Class”: we made a sense of time a core part of the ED framework.
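
Stock Node.js has no First-Class Timeouts, but its built-in `vm` module offers a loosely similar facility that gives a feel for the programming model: a `timeout` option that interrupts synchronous code and throws an error the caller can catch. The sketch below is only an analogy and is not how Node.cure is implemented; `runWithBudget` is a hypothetical helper name.

```javascript
const vm = require('vm');

// Run `source` in a fresh context, interrupting it after `budgetMs` milliseconds.
function runWithBudget(source, budgetMs) {
  try {
    const value = vm.runInNewContext(source, {}, { timeout: budgetMs });
    return { timedOut: false, value };
  } catch (err) {
    // Node.js reports an interrupted script with this error code.
    if (err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT') {
      return { timedOut: true };
    }
    throw err;
  }
}

console.log(runWithBudget('1 + 1', 50));
console.log(runWithBudget('while (true) {}', 50));
```

Unlike First-Class Timeouts, this only bounds synchronous code evaluated through `vm`; it does not cover ordinary callbacks on the Event Loop or tasks on the Worker Pool.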

Community engagement

I believe research should impact practice. In some disciplines, like Physics or Chemistry, it may take a long time before anyone other than researchers benefits from the work. But in Computer Science a lot of research can be applied right away. This work is no exception.

We therefore engaged with the Node.js community!

  1. We wrote a guide for nodejs.org. Our guide describes how to avoid Event Handler Poisoning attacks in Node.js. Our pull request benefited from helpful feedback from community members.
  2. We partitioned the implementation of fs.readFile in the core fs module. Before our change, fs.readFile would stat the file and then submit a single read spanning the entire file. If the file were large, this would block the Worker Pool. Our pull request partitions the read into chunks, with the same overall memory cost but improved sharing of the Worker Pool. The pull request was accepted after a months-long discussion on the performance-security tradeoff involved.
  3. We documented several “Vulnerable APIs”, potential EHP DoS vectors among the core APIs. These include fs.readFile (before our patch), crypto.randomBytes and crypto.randomFill, and child_process.spawn.
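
The idea behind the fs.readFile change can be illustrated at the application level. The sketch below is not the patch that landed in Node.js core; it is a hypothetical helper, `readFileInChunks`, with an assumed 64 KiB chunk size, that submits many small reads instead of one read spanning the whole file.

```javascript
const fs = require('fs');

// Read a whole file as a series of bounded reads rather than one large read.
async function readFileInChunks(path, chunkSize = 64 * 1024) {
  const handle = await fs.promises.open(path, 'r');
  try {
    const chunks = [];
    let position = 0;
    while (true) {
      const buffer = Buffer.alloc(chunkSize);
      // Each read is a separate, short task, so other work can be
      // scheduled on the Worker Pool between chunks.
      const { bytesRead } = await handle.read(buffer, 0, chunkSize, position);
      if (bytesRead === 0) break; // end of file
      chunks.push(buffer.subarray(0, bytesRead));
      position += bytesRead;
    }
    return Buffer.concat(chunks);
  } finally {
    await handle.close();
  }
}
```

The tradeoff is the one discussed in the pull request: more, smaller reads share the Worker Pool more fairly but add per-read overhead.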

More information

  1. The official paper, slides, and presentation video are available here.
  2. The prototype is on GitHub here.
  3. A version of the slides in PowerPoint is here.
  4. Our guide on nodejs.org is here.
  5. I wrote a blog post about the history of this paper here, covering its many rejections, rebuttals, revisions, and resubmissions.

Written by James Davis

I am a professor in ECE@Purdue. My research assistants and I blog here about research findings and engineering tips.