Ethical conduct in cybersecurity research

What happened?

Researchers at the University of Minnesota conducted a research project titled “On the Feasibility of Stealthily Introducing Vulnerabilities in Open-Source Software via Hypocrite Commits” [1]. The project used two research methods:

  1. A historical analysis of Linux, examining the process by which previous problematic commits entered the codebase without being caught by human or automatic review. This research method is a specific form of “mining software repositories”, wherein researchers seek to learn by analyzing the history of a software project.
  2. An experiment in which they determined whether the Linux maintainers would detect three security vulnerabilities submitted by the authors. This research method is a specific form of human-subjects research, in which the researchers learn something from human behavior.
  • Human-subjects research in cybersecurity was conducted under the oversight of UMN’s IRB, reviewed favorably by academic peers, and accepted to a (prestigious) conference under the aegis of the sponsoring professional organization, the IEEE.
  • The human subjects involved felt that the researchers had experimented on them unethically, and banned the researchers’ sponsoring organization from future involvement with their open-source project.

What is human-subjects research?

Here is the US federal government’s definition of human-subjects research [5], emphasis added:

Was this human-subjects research?

Arguments for “yes”

Let’s map the federal definition of human-subjects research to the authors’ research method #2 from above. The authors:

  • Interacted with living individuals (the Linux maintainers)
  • Obtained information (whether the maintainers would approve a malicious commit)
  • Analyzed that information (discussion of factors that lead to patch acceptance, and, as discussed in section 8.D, the maintainers’ perspectives on this experiment).

Arguments for “no”

  • In their FAQ, the authors write “This is not considered human research. This project studies some issues with the patching process instead of individual behaviors” [2].
  • The UMN IRB agreed with the authors. It (retroactively) approved the research protocol, and according to the authors this approval stated that the work did not involve human subjects [2].
  • The IEEE S&P review committee agreed with the authors. According to the conference’s call for papers, which includes “Ethical Considerations for Human Subjects Research”, the reviewers reserve the right to reject papers that do not meet ethical standards. Since the paper was accepted even though the research was conducted without oversight from the institution’s human-subjects review board (the IRB), it appears that the research community agrees that this was not human-subjects research, or at least that it was not unethical human-subjects research.

Settling the argument

We are at an impasse. My naive reading of the federal guidelines says the authors conducted human-subjects research. The research community seems to feel otherwise.

Did the Linux developers overreact?

The Linux developer community responded to this experiment by banning all contributions from the organization that sponsored the research: the University of Minnesota. This affects both the researchers who conducted this study and all other UMN researchers, students, and staff members. Is this an overreaction? No.

Humans within sociotechnical systems

How can it be that these various parties had such misaligned perspectives? I suggest that the academic community failed to consider the notion of a sociotechnical system. Let me illustrate:

[Figure: Experiment conducted by the authors.]

Designing an ethical version of this study

I believe that the researchers conducted human-subjects research, and will proceed under this supposition. Did their research protocol honor the human subjects?

The current study design is flawed

I suggest that the experiment was low-reward and high-risk.

  1. Low-reward. Let’s recall that the authors began with a historical analysis of problematic commits in Linux. The authors concluded this analysis by noting that many “potentially-hypocrite commits” had already entered the Linux project. It is not clear what additional knowledge the research community would gain from creating and submitting new malicious commits.
  2. High-risk. First, the protocol involved human subjects who were being deceived. This is an unusual requirement, and should be scrutinized for the way in which it treats the subjects. Second, the human subjects are non-consenting. Their time is valuable; from their perspective they are being volunteered to waste it on the researchers’ game. Third, the protocol could have resulted in security vulnerabilities reaching the Linux core and spreading to billions of devices. Linux is part of the critical infrastructure for systems across the globe.
  • The maintainer might have lost or discarded their follow-up email. Emails are lost and ignored all the time.
  • The authors sent their patches to the general Linux mailing list. Although the patches might not have been merged to mainline Linux, since Linux uses a distributed programming model, any community member could have incorporated the patches into their own versions. [Credit: Santiago Torres-Arias pointed this out to me.]
  • The authors themselves might have been incapacitated after the patch was approved. Given the timing, the work was presumably conducted during the COVID-19 pandemic. It’s not a great stretch of the imagination to see the whole research team laid low by COVID-19, just in time for the malicious patch to be merged, published, and exploited.

Improved study designs

So, how might we modify this study to obtain interesting findings without the ethical issues? Based on the sociotechnical system depicted above, here are a few ways that a similar experiment might be conducted more ethically (pending IRB approval, anyway):

  • Change the patch. Submit non-critical commits, e.g. non-functional problems like typos instead of security vulnerabilities. See if these commits are accepted. This still involves deceit and non-consent, but removes the risk of damaging a critical global system.
  • Inform some of the participants: “CTO-approved pentesting”. Obtain approval from the Linux chiefs (e.g. Greg K-H), who will retroactively explain to the experimented-on maintainers. This still incorporates elements of deceit and non-consent, but obtains organizational buy-in and substantially decreases the risk of merging malicious commits to the Linux codebase.
  • Inform the participants. Involve the Linux maintainer community throughout the experiment. Everyone consents to the experiment, and there is limited risk of malicious commits reaching the Linux codebase.
  • Simulate. Ask the Linux maintainers to separately review a bunch of commits “in the style of their Linux review process”, with most commits benign and a few malicious. Again, everyone consents, and this time there is no risk of damaging the Linux codebase.

Beyond this case study

Let’s apply sociotechnical reasoning to other cybersecurity experiments. Here are some examples with my perspective:

  • Finding vulnerabilities in source code or binaries: These studies examine technical artifacts. They need not involve a social component. However, these studies sometimes include discussion with developers. If the researchers report on these interactions, then IRB approval may be necessary. The cybersecurity and systems research communities generally do not seek IRB approval for this class of low-grade interactions. Often these interactions occur publicly in the spirit of the open-source community. Although the data are public, the humans involved are responding to the researchers’ stimulus. I am not certain whether the research community’s practice is consistent with the IRB’s aims here.
  • Mining software repositories or public discussions: These studies examine human-generated artifacts (e.g. code, comments) and human data (e.g. posts on Stack Overflow or Twitter). The data are publicly accessible, so the research is likely exempt. The authors might consult the IRB to ensure their analysis plan is acceptable.
  • Probing sandboxed systems: These studies set up a software system under the control of the researchers, in a “research sandbox”. Only the research team interacts with the system. No human subjects are involved; I suggest no IRB oversight is needed.
  • Probing systems in the wild: These studies probe a live system operated by some external entity, e.g. a REST API hosted by a company. Live systems are sociotechnical systems. If the researchers’ investigation is “read-only” and at a limited scale, this smacks of a purely technical study. However, if the researchers’ experiment involves either (a) “writing to” — interacting with — the live system, or (b) an intensive workload, e.g. attempting to crawl the entire Internet, then the social side of the system may be called into play. Perhaps an on-call beeper goes off, or perhaps a legitimate user cannot access a service because it is being probed intensively by the research team. I do not know how an IRB might treat this case, but I suggest they be consulted. Ethical norms within the research community should also govern your behavior here.
  • Observing malicious behavior: These studies may set up a honeypot that is made deliberately insecure, and then observe how it is exploited. The exploit may be automated (a technical artifact) or manual (human behavior). The researcher cannot know in advance whether the data will come from a human subject, so they should consult their institution’s IRB.


As a member of the research community, I find this outcome troubling. I also think it is the responsibility of concerned community members to speak up when they see unethical behavior. Here is my voice.

  • The authors should retract the paper. Although they acted in good faith, their study was unethical. A retraction is the right choice to honor the integrity of the research process.
  • IEEE S&P should remove the paper from the proceedings — not as a result of the retraction, but as a separate step.
  • IEEE, the conference sponsor, should weigh issuing a statement about proper research conduct [6].
  • The University of Minnesota should repair relations with the Linux maintainer community. They have already acknowledged an internal investigation, which is a promising first step.
  • The academic cybersecurity community should clarify its standards for human-subjects research and for ethical research. These standards should be drafted promptly — hopefully in time for discussion at S&P’21 in May! — and be the subject of a keynote at each of the upcoming “Big Four” meetings of the community. Going forward, these standards should be clearly listed on each of the “Big Four” cybersecurity conference websites as part of the call for papers. Clearly we have failed to communicate these standards to at least one research group. Let’s not wait for more mistakes.


[1] On the Feasibility of Stealthily Introducing Vulnerabilities in Open-Source Software via Hypocrite Commits. Wu & Lu, IEEE S&P’21.



James Davis


I am a professor in ECE@Purdue. I hold a PhD in computer science from Virginia Tech. I blog about my research findings and share tips for engineering students.