Advice for aspiring software engineers

Tools are helpful — but only if you use them well.

This blog post describes an anti-pattern in how some aspiring software engineers use the Internet. My observations are my own, and I have made little effort to connect them to scientific studies. Nevertheless, I hope they are helpful to someone.

As a professor of Computer Engineering, I teach and mentor many aspiring software engineers. These proto-engineers face a temptation that I did not have to struggle with when I was a student: Stack Overflow and its brethren.

Stack Overflow emerged while I was in college, and my friends and…


Regexes across the system stack. ReDoS may occur when a slow regex meets unsanitized input on a slow regex engine.

This is a brief for the research paper Using Selective Memoization to Defeat Regular Expression Denial of Service (ReDoS), published at IEEE S&P 2021. I led the work, with help from Francisco Servant and Dongyoon Lee.

In this article I use the word “regex” as shorthand for “regular expression”.

Summary

Attackers can use regex-based denial of service (ReDoS) attacks to damage vulnerable web services. These attacks take advantage of the slow algorithm used by regex engines to evaluate regexes. We present novel optimizations to provably improve the worst-case behavior of these engines to linear time. Nothing in life is free, so these…
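To make the "slow algorithm" concrete, here is a toy sketch in Python. It is not the paper's technique or any real engine's implementation: it hand-codes a backtracking matcher for one hypothetical pattern, `^(a|aa)*$`, and counts how many positions it visits. The function name `backtracking_steps` and the `memoize` flag are assumptions made for illustration. With memoization off, the matcher re-explores the same failing positions over and over; with it on, each failing position is recorded once and never explored again.

```python
def backtracking_steps(text, memoize):
    """Match the toy pattern ^(a|aa)*$ against text by backtracking.

    Returns (matched, steps), where steps counts calls to match_from.
    This is an illustrative matcher for one fixed pattern only.
    """
    steps = 0
    known_failures = set()  # positions from which the match cannot succeed

    def match_from(i):
        nonlocal steps
        steps += 1
        if i == len(text):
            return True  # consumed the whole input
        if memoize and i in known_failures:
            return False  # already proved this position is a dead end
        # Two overlapping alternatives: consume "a" or consume "aa".
        for width in (1, 2):
            if text.startswith("a" * width, i) and match_from(i + width):
                return True
        if memoize:
            known_failures.add(i)
        return False

    matched = match_from(0)
    return matched, steps


if __name__ == "__main__":
    attack = "a" * 20 + "!"  # almost matches, then fails at the last character
    print("no memo:", backtracking_steps(attack, memoize=False))
    print("memo:   ", backtracking_steps(attack, memoize=True))
```

On the mismatching input, the step count without memoization grows exponentially with the number of `a` characters, while the memoized count grows linearly. The memo table is the extra space paid for that speedup.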


Why you should measure the cost of your GraphQL queries, and how you should do it.

Proposed applications of our query analysis. The client’s malicious query requests an exponentially large result from GitHub’s GraphQL API. At the time of our study, GitHub permitted the shown query, but halted its execution after it exceeded a time limit. Using our techniques, client-side query inspection can provide feedback during composition (see “Complexities” inset). Server-side query enforcement can reject queries and update rate limits based on provider-defined policies.

This is a brief for the research paper A Principled Approach to GraphQL Query Cost Analysis, published at ESEC/FSE 2020. Alan Cha led the work, with help from Erik Wittern, Guillaume Baudart, me, Louis Mandel, and Jim Laredo. Most of these authors are affiliated with IBM Research or IBM’s product teams, as part of IBM’s ongoing involvement with GraphQL.

This project is a follow-up to our previous work studying GraphQL schemas.

Summary

The state of practice: The landscape of Web APIs is evolving to meet new client requirements and to facilitate how providers fulfill them. The latest web API model is…


Introduction

This post is intended as a “technical two-pager” to summarize a security vulnerability called Regex-based Denial of Service (AKA Regex DoS, ReDoS). There are a variety of write-ups about ReDoS, but I’m not aware of a good one-stop-shop with a higher-level treatment of all aspects of the subject. I have included links at the end to more detailed treatments.

I have used headings liberally to help you navigate to your issue.

What is a regular expression?

A regular expression (regex) is a tool that your engineering team uses to manipulate strings. They probably use it to impose some kind of order on unstructured input, e.g…


My wife Kirsten Davis and I just finished up a two-person academic job search. We were successful!

The (Professors and) Doctors Davis

This essay shares our experiences solving the dreaded “two-body problem”. I hope that it helps another couple in the future.

One note before we begin: My wife studies Engineering Education, and I study Computer Science. The job market in 2020 was pretty good for both of these fields, with a “large” number of openings relative to applicants. This afforded us some luxuries that may not be available to couples in other disciplines.

The two-body problem

My wife Kirsten Davis and I were both interested in tenure-track…


Introduction

In May 2018, I got some pleasant news: an academic paper accepted to USENIX Security 2018 (full version here, summary here).

This notification was thrilling for two reasons:

  1. This was the first paper I had owned from start to finish.
  2. The paper had been rejected a lot of times.

This post presents the saga of the paper, and includes the different stages of the manuscript and the reviews each version received. I conclude with some reflections about the process.

My intention in writing the post is to give a behind-the-scenes look at the life of an oft-rejected paper. I have…


This is a brief for the research paper An Empirical Study of GraphQL Schemas, presented at ICSOC 2019. Erik Wittern led the work, with help from Alan Cha (implementation), myself (experimental design), Guillaume Baudart (theoretical analysis), and Louis Mandel (theoretical analysis). Most of these authors are affiliated with IBM, which is involved with GraphQL on an ongoing basis as part of the GraphQL Foundation.

Since GraphQL is unfamiliar to many readers, I’ve included a bit more introductory material and illustrations than I usually do.

Summary

GraphQL is a query language for data that can be represented as a graph, and reportedly offers…


This is a brief for the research paper Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions, presented at ASE 2019. Mischa Michael led this project, with support from James Donohue, myself, Dongyoon Lee, and Francisco Servant.

In this article, I use the word “regex” as shorthand for “regular expression”.

Summary

This paper describes the first large-scale qualitative examination of the ways software engineers interact with regexes. We surveyed 279 professional developers and conducted 17 interviews. …


This is a brief for the research paper Testing Regex Generalizability And Its Implications: A Large-Scale Many-Language Measurement Study, presented at ASE 2019. I was the first author, alongside Daniel Moyer, Ayaan Kazerouni, and Dongyoon Lee.

In this article I use the word “regex” as shorthand for “regular expression”.

Summary

This paper is about methodology and measurements. We don’t describe any (particularly) new techniques or shocking results. Instead, we take a step back and rigorously evaluate whether the methodologies followed by different research groups are comparable, and whether their results can be generalized to new places.

Background and Motivation

What are regexes?

Regexes are a tool to…


This is a brief for the research paper Node.fz: Fuzzing the Server-Side Event-Driven Architecture, presented at the European Conference on Computer Systems (EuroSys) 2017. I was the first author, supported by Arun Thekumparampil and Dongyoon Lee.

Summary

The event-driven architecture (EDA) has reached the mainstream on the server side.

To use the EDA, developers register callbacks for client events, stringing together complex response generation using asynchronous ordering primitives like nested callbacks, async/await, futures, or promises. The event-driven architecture is typically implemented with a single thread, avoiding explicit race conditions due to concurrency. …
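As a rough illustration of this programming model, the sketch below uses Python's `asyncio` as a stand-in for a server-side event loop such as Node.js. It is not taken from the paper; the handler name `handle_request` and the client names "A" and "B" are made up. Both handlers run on one thread, and each hands control back to the event loop at its `await`, so their steps interleave even though no two run at the same instant.

```python
import asyncio


async def handle_request(name, log):
    """A hypothetical handler: do some work, wait on I/O, then finish."""
    log.append(f"{name}: start")
    # Stand-in for asynchronous I/O; yields control to the event loop.
    await asyncio.sleep(0)
    log.append(f"{name}: finish")


async def main():
    log = []
    # Handle two client events concurrently on a single thread.
    await asyncio.gather(
        handle_request("A", log),
        handle_request("B", log),
    )
    return log


if __name__ == "__main__":
    for entry in asyncio.run(main()):
        print(entry)
```

Each stretch of code between `await` points runs without interruption, but the order in which pending handlers resume is up to the event loop. Code that silently assumes one particular order can therefore still misbehave, even with only one thread.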

James Davis

I am a professor in ECE@Purdue. I hold a PhD in computer science from Virginia Tech. I try to summarize my research findings in practitioner-friendly ways.
