Photo by Alina Grubnyak on Unsplash

This is a brief for the research paper “A Partial Replication of ‘DeepBugs: A Learning Approach to Name-based Bug Detection’”, published in the artifact track of ESEC/FSE 2021 [0]. The paper grew out of a project in my course ECE 595: Advanced Software Engineering at Purdue University. The team members contributed equally during the semester, but Jordan M. Winkler stayed up late one night to trim the report into a 2-page abstract, so he is the first author on the artifact.

Original paper

In 2018, Pradel & Sen published a paper called DeepBugs that described a software defect detection tool [1].


Follow-up note on 19 May 2021: This post was written concurrently with discussions across the cybersecurity research community. Since then, the authors withdrew their paper, the conference chairs described significant changes for the next edition of the conference, and the Linux community issued a statement. There is also a related comment from Ted Ts’o (Linux contributor) at the bottom of this blog post.

In April 2021, the Linux developer community issued a blanket ban on contributions from the University of Minnesota. This remarkable outcome occurred as a result of a research project by a team at UMN. The headlines from the…


Power tools are helpful — but use them safely.

This blog post describes an anti-pattern in how some aspiring software engineers use the Internet. My observations are my own, and I have made little effort to connect them to scientific studies. Nevertheless, I hope they are helpful to someone.

The Internet Anti-Pattern

As a professor of Computer Engineering, I teach and mentor many aspiring software engineers. These proto-engineers face a temptation that I did not have to struggle with when I was a student: Stack Overflow and its brethren.

Most programming-themed Internet help sites emerged while I was in college, and my friends…


Regexes across the system stack. ReDoS may occur when a slow regex meets unsanitized input on a slow regex engine.

This is a brief for the research paper Using Selective Memoization to Defeat Regular Expression Denial of Service (ReDoS), published at IEEE S&P 2021. I led the work, with help from Francisco Servant and Dongyoon Lee.

In this article I use the word “regex” as shorthand for “regular expression”.

Summary

Attackers can use regex-based denial of service (ReDoS) attacks to damage vulnerable web services. These attacks take advantage of the slow algorithm used by regex engines to evaluate regexes. We present novel optimizations to provably improve the worst-case behavior of these engines to linear time. Nothing in life is free, so these…
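To make the “slow algorithm” concrete, here is a minimal sketch of a backtracking matcher with and without memoization. It is not the algorithm from the paper: the automaton (a hand-built, epsilon-free one for the regex `(a|aa)*b`), the function name `backtrack_match`, and the step counter are all assumptions made for illustration. The memoized variant records every (state, position) pair it has already tried, so each pair is explored at most once.

```python
# Toy automaton for the regex (a|aa)*b, written without epsilon transitions.
# State 0 is the start state; state 2 is the only accepting state.
# Each entry maps a state to its list of (character, next_state) edges.
TOY_NFA = {
    0: [("a", 0), ("a", 1), ("b", 2)],  # "a" alone, first half of "aa", or final "b"
    1: [("a", 0)],                      # second half of "aa"
    2: [],
}
ACCEPT = 2


def backtrack_match(text, memoize):
    """Return (matched, steps) for a full match of text against TOY_NFA.

    steps counts how many (state, position) pairs are explored, as a
    machine-independent stand-in for running time.
    """
    steps = 0
    seen = set()  # (state, position) pairs already explored; used only if memoize

    def visit(state, i):
        nonlocal steps
        if memoize:
            # A revisited pair must have failed earlier; a success would
            # already have ended the whole search.
            if (state, i) in seen:
                return False
            seen.add((state, i))
        steps += 1
        if state == ACCEPT and i == len(text):
            return True
        for ch, nxt in TOY_NFA[state]:
            if i < len(text) and text[i] == ch and visit(nxt, i + 1):
                return True
        return False

    return visit(0, 0), steps


if __name__ == "__main__":
    evil = "a" * 20  # no trailing "b", so every path eventually fails
    print("plain:   ", backtrack_match(evil, memoize=False))
    print("memoized:", backtrack_match(evil, memoize=True))
```

On the mismatching input, the plain search explores a number of paths that grows exponentially with the input length, while the memoized search is bounded by the number of states times the number of positions. The price is the memory for the `seen` table.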


Proposed applications of our query analysis. The client’s malicious query requests an exponentially large result from GitHub’s GraphQL API. At the time of our study, GitHub permitted the shown query, but halted its execution after it exceeded a time limit. Using our techniques, client-side query inspection can provide feedback during composition (see “Complexities” inset). Server-side query enforcement can reject queries and update rate limits based on provider-defined policies.

This is a brief for the research paper A Principled Approach to GraphQL Query Cost Analysis, published at ESEC/FSE 2020. Alan Cha led the work, with help from Erik Wittern, Guillaume Baudart, me, Louis Mandel, and Jim Laredo. Most of these authors are affiliated with IBM Research or IBM’s product teams, as part of IBM’s ongoing involvement with GraphQL.

This project is a follow-up to our previous work studying GraphQL schemas.

Summary

The state of practice: The landscape of Web APIs is evolving to meet new client requirements and to facilitate how providers fulfill them. The latest web API model is…


Introduction

This post is intended as a “technical two-pager” to summarize a security vulnerability called Regex-based Denial of Service (AKA Regex DoS, ReDoS). There are a variety of write-ups about ReDoS, but I’m not aware of a good one-stop-shop with a higher-level treatment of all aspects of the subject. I have included links at the end to more detailed treatments.

I have used headings liberally to help you navigate to your issue.

What is a regular expression?

A regular expression (regex) is a tool that your engineering team uses to manipulate strings. They probably use it to impose some kind of order on unstructured input, e.g…


My wife Kirsten Davis and I just finished up a two-person academic job search. We were successful!

The (Professors and) Doctors Davis

This essay shares our experiences solving the dreaded “two-body problem”. I hope that it helps another couple in the future.

One note before we begin: My wife studies Engineering Education, and I study Computer Science. The job market in 2020 was pretty good for both of these fields, with a “large” number of openings relative to applicants. This afforded us some luxuries that may not be available to couples in other disciplines.

The two-body problem

My wife Kirsten Davis and I were both interested in tenure-track…


Introduction

In May 2018, I got some pleasant news: an academic paper accepted to USENIX Security 2018 (full version here, summary here).

This notification was thrilling for two reasons:

  1. This was the first paper I had owned from start to finish.
  2. The paper had been rejected a lot of times.

This post presents the saga of the paper, and includes the different stages of the manuscript and the reviews each version received. I conclude with some reflections about the process.

My intention in writing the post is to give a behind-the-scenes look at the life of an oft-rejected paper. I have…


This is a brief for the research paper An Empirical Study of GraphQL Schemas, presented at ICSOC 2019. Erik Wittern led the work, with help from Alan Cha (implementation), myself (experimental design), Guillaume Baudart (theoretical analysis), and Louis Mandel (theoretical analysis). Most of these authors are affiliated with IBM, reflecting IBM’s ongoing involvement with GraphQL as part of the GraphQL Foundation.

Since GraphQL is unfamiliar to many readers, I’ve included a bit more introductory material and illustrations than I usually do.

Summary

GraphQL is a query language for data that can be represented as a graph, and reportedly offers…


This is a brief for the research paper Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions, presented at ASE 2019. Mischa Michael led this project, with support from James Donohue, myself, Dongyoon Lee, and Francisco Servant.

In this article, I use the word “regex” as shorthand for “regular expression”.

Summary

This paper describes the first large-scale qualitative examination of the ways software engineers interact with regexes. We surveyed 279 professional developers and conducted 17 interviews. …

James Davis

I am a professor in ECE@Purdue. I hold a PhD in computer science from Virginia Tech. I try to summarize my research findings in practitioner-friendly ways.
