Ethical conduct in cybersecurity research

Follow-up note on 19 May 2021: This post was written concurrent with discussions across the cybersecurity research community. Since then: The authors withdrew their paper; the conference chairs described significant changes for the next edition of the conference; the Linux community issued a statement. There is also a related comment from Ted Ts’o (Linux contributor) at the bottom of this blog post.

In April 2021, the Linux developer community issued a blanket ban on contributions from the University of Minnesota. This remarkable outcome occurred as a result of a research project by a team at UMN. The headlines from the incident have been covered by a variety of tech media outlets, e.g. Neowin. I’d like to take a deeper look, and discuss the case from the perspective of research ethics and experimental design.

I have carefully read the authors’ paper [1] and their FAQ [2]. I believe that they acted in good faith. In their prior work they have made exemplary efforts to promote cybersecurity in major software projects, for which I thank them. However, in this case, they made an ethical misstep out of a misapprehension of what constitutes ethical conduct and human-subjects research in computing.

This post is intended to clarify these topics. I hope it is helpful to other researchers, and perhaps to some Institutional Review Board (IRB) staff.

What happened?

Researchers at the University of Minnesota conducted a research project titled On the Feasibility of Stealthily Introducing Vulnerabilities in Open-Source Software via Hypocrite Commits.

Open-source software is software whose source code is publicly visible. Typically, this code is maintained by a core community (“The Maintainers”), and contributions are also solicited by the users of the software to make it better suit their needs. One of the premises of open-source software is that “with many eyes, all bugs are shallow” [3]; although software defects happen, the belief is that by working together these defects can be identified and eradicated more quickly. However, this philosophy supposes mostly-good actors. Some versions of the open-source model permit contributions from unknown users! If these users act maliciously, the defective behavior they introduce may be accepted into the codebase and subsequently exploited. Open-source communities use a variety of mechanisms to avoid this unhappy outcome, including human review and automatic tools.

As the title suggests, the authors investigate the feasibility of introducing vulnerabilities through deliberately-incorrect code. They refer to such submissions as “hypocrite commits” — the code contributions say one thing but do another. The authors specifically study such commits in the Linux kernel, a hugely important open-source project used in billions of devices around the world (including Android phones, much of “the Cloud”, and most supercomputers). In their investigation, the authors applied two research methods:

This project was published at IEEE Security & Privacy in 2021 — this is one of the most prestigious research conferences in computing. At the time of writing, the paper is available [1], along with a FAQ from the authors that responds to criticism [2].

When the paper was originally accepted, there was some pushback about ethics from other security researchers. The authors made some modifications to the final version of the paper. You can see their remarks in section 6.A “Ethical considerations” and in section 8.D “Feedback of the Linux Community.”

As a consequence of the researchers’ engagement with the Linux community, in April 2021 Linux chief Greg Kroah-Hartman banned all future contributions from the University of Minnesota [4].

This outcome is shocking. Let’s recap:

I hope it is not outlandish for me to suggest that there may be some mismatch between what the academic community accepted as ethical conduct, and what the subjects perceived as ethical conduct. Let’s dive in to the details.

What is human-subjects research?

Here is the US federal government’s definition of human-subjects research [5], emphasis added:

Human subject means a living individual about whom an investigator (whether professional or student) conducting research, [including if the researcher] obtains information or biospecimens through intervention or interaction with the individual, and uses, studies, or analyzes the information or biospecimens.

This definition applies to federally-funded research. That includes funding from the National Science Foundation (NSF), which paid for this study [1].

This is an excerpt from a lengthy federal document, with plenty of sub-definitions and clarifications. The document also lists some exemptions here, e.g. in the case of matters of public record.

Here is a handy flowchart.

Was this human-subjects research?

Arguments for “yes”

Let’s map the federal definition of human-subjects research to the authors’ research method #2 from above. The authors:

I am not a lawyer, but that sure looks like the definition of human-subjects research to me.

Arguments for “no”

*I hesitate to paint with a broad brush, and I understand that individual community members may feel as I do…but the paper was accepted at the conference.

Settling the argument

We are at an impasse. My naive reading of the federal guidelines says the authors conducted human-subjects research. The research community seems to feel otherwise.

How shall we settle the stalemate? I do not think that decision should be up to the experimenter — let us consult the possibly-experimental-subjects. Do they feel they were experimented on?

In his email message banning the University of Minnesota, Linux chief Greg Kroah-Hartman wrote “Our community does not appreciate being experimented on, and being “tested” by submitting known patches that are either do nothing on purpose, or introduce bugs on purpose. If you wish to do work like this, I suggest you find a different community to run your experiments on, you are not welcome here” [4].

If humans feel they have been experimented on, we should call this “human-subjects research” — despite what the authors, UMN’s IRB, and the research community say.

Did the Linux developers overreact?

The Linux developer community responded to this experiment by banning all contributions from the organization that sponsored the research: the University of Minnesota. This affects both the researchers who conducted this study, and all other UMN researchers, students, and staff members. Is this an over-reaction? No.

The researchers did not act alone. They obtained approval from their university’s human-subjects ethics oversight board, the IRB. The approval was retroactive, but it was approval! From the perspective of the unwitting experimental subjects, UMN can no longer be trusted to provide ethical oversight of research that involves the Linux developer community. It is the university that failed to provide responsible oversight.

But by the same token, the approval of the research community is problematic. IEEE S&P has granted the work its imprimatur, thus, the leaders of the cybersecurity research community agree that the authors behaved appropriately. How do you think the Linux maintainers feel about that? And Linux is a leader in the open-source movement— how might other open-source communities react?

Humans within sociotechnical systems

How can it be that these various parties had such mis-aligned perspectives? I suggest that the academic community failed to consider the notion of a sociotechnical system. Let me illustrate:

Experiment conducted by the authors.

From the academic interpretation, the research was apparently on a purely technical entity: the “review process”. If the entity being studied is technical, not human, then the work is not human-subjects.

But this architectural view is only a partial picture, an incomplete model of the actual system. It ignores the role of the Linux maintainers — living individuals — who carry out the review process. These humans actually carry out the process being studied, and as a result they are indeed an (indirect) subject of the experiment.

The role of humans was made explicit by the authors themselves:

In the experiment, we aim to demonstrate the practicality of stealthily introducing vulnerabilities through hypocrite commits. Our goal is not to introduce vulnerabilities to harm OSS. Therefore, we safely conduct the experiment to make sure that the introduced UAF bugs will not be merged into the actual Linux code. In addition to the minor patches that introduce UAF conditions, we also prepare the correct patches for fixing the minor issues. We send the minor patches to the Linux community through email to seek their feedback. Fortunately, there is a time window between the confirmation of a patch and the merging of the patch. Once a maintainer confirmed our patches, e.g., an email reply indicating “looks good”, we immediately notify the maintainers of the introduced UAF and request them to not go ahead to apply the patch.

Although the research is focused on the review process, human subjects are involved in every step: the authors send email to humans, the humans review the patches and reply, and the authors then tell the humans not to proceed. The review process is a sociotechnical system. There is a human inside the box. We cannot pretend otherwise.

Designing an ethical version of this study

I believe that the researchers conducted human-subjects research, and will proceed under this supposition. Did their research protocol honor the human subjects?

To decide this, let us examine the ethical standard to which human-subjects researchers are held: that of their institution’s Insitutional Review Board, or IRB. An IRB is charged with ensuring that human-subjects research is conducted ethically. Among other things, they decide whether the benefits of the experiment outweigh its risks, and they are supposed to take the perspective of the human subjects into consideration.

The current study design is flawed

I suggest that the experiment was low-reward and high-risk.

The authors attempted to control for the third risk by “immediately notifying” the maintainers after their malicious patches were approved. There are several ways this protocol might have failed, including:

Improved study designs

So, how might we modify this study to obtain interesting findings without the ethical issues? Based on the sociotechnical system depicted above, here are a few ways that a similar experiment might be conducted more ethically (pending IRB approval, anyway):

Each of these changes would decrease the realism of the experiment, and might decrease the generalizability of the results. For example, participants may change their behavior if they are being observed (Hawthorne Effect). But there is a realism-ethics trade-off, and researchers need to stay on the “ethical” side of the trade.

Beyond this case study

Let’s apply sociotechnical reasoning to other cybersecurity experiments. Here are some examples with my perspective:

Conclusions

As a member of the research community, I find this outcome troubling. I also think it is the responsibility of concerned community members to speak up when they see unethical behavior. Here is my voice.

As a concerned community member, I recommend that the following steps be taken:

References

[1] On the Feasibility of Stealthily Introducing Vulnerabilities in Open-Source Software via Hypocrite Commits. Wu & Lu, IEEE S&P’21.

[2] Clarifications on the “Hypocrite Commit” work. Wu & Lu, 2021.

[3] The Cathedral and the Bazaar. Raymond. 1997.

[4] Re: [PATCH] SUNRPC: Add a check for gss_release_msg. Linux mailing list. Koah-Hartman. 21 April 2021.

[5] Code of Federal Regulations, TITLE 45 PUBLIC WELFARE, DEPARTMENT OF HEALTH AND HUMAN SERVICES, PART 46 PROTECTION OF HUMAN SUBJECTS, §46.102. U.S. Department of Health & Human Services. 2019.

[6] IEEE Code of Ethics. IEEE Policies, Section 7 — Professional Activities. Institute of Electrical and Electronics Engineers. 2014.

I am a professor in ECE@Purdue. I hold a PhD in computer science from Virginia Tech. I try to summarize my research findings in practitioner-friendly ways.