Incorporating Failure Knowledge into Design Decisions for IoT Systems: A Controlled Experiment on Novices
This is a brief for the research paper “Incorporating Failure Knowledge into Design Decisions for IoT Systems: A Controlled Experiment on Novices”, published at the 2023 5th International Workshop on Software Engineering Research & Practices for the Internet of Things (SERP4IoT’23). This work was led by Dharun Anandayuvaraj. The full paper is available here.
Background
In prior work (see blog post), we reviewed 20 news stories about failures in Internet of Things (IoT) systems. All of these stories involved a software failure. One example is illustrated below, based on an article by the New York Times.
In that study we noted that many of these failures were recurring: design flaws at one company were repeated, either by the same company or by another company. Some of these failures recurred within the same industry, and others recurred across multiple industries.
The following figure shows our general thesis: If we can learn from past failures, we will improve future systems. Historically, lessons from failures have influenced system design in engineering disciplines such as civil, mechanical, and aeronautical engineering. While we hope the thesis is unobjectionable, its use in software engineering is unclear. Agile methods and the “move fast and break things” software culture are somewhat in opposition to careful documentation of knowledge from past failures. With the proliferation of IoT systems, software systems are increasingly safety-critical; thus, we advocate for the practice of learning from system failures within software engineering.
Our Approach
We ask: Could we improve software design decisions by using design failures as a learning treatment? For complex systems, design decisions greatly impact outcomes. Design decision rationales capture the justifications behind a decision, the alternatives considered, and the trade-offs evaluated. Current materials to guide decision making are limited to guidelines, such as those from government, industry, researchers, and textbooks.
The following figure illustrates our core technique. The state-of-the-art approach is to inform design using guidelines. For example, “Use redundancy when designing critical systems” is a common recommendation. However, guidelines typically do not include a story showing what can happen when the guideline is violated. Such a story may affect the decisions that engineers make. We therefore compared this guideline-only approach to one where an illustrative story accompanies the guideline, as shown in the next figure.
To evaluate whether engineers make different decisions (or give different rationales) when presented with a failure story, we developed a design scenario for a hypothetical IoT-enabled robotic warehouse. The scenario had 8 design decisions, each subject to a budgetary constraint. Four of these decisions matched the failure stories (higher criticality), while the other four were deemed less crucial by the research team. Thus, spending the budget on the higher-criticality decisions was the “correct” answer to the design scenario.
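To make the scoring idea concrete, here is a minimal sketch of how a response to such a scenario could be scored. It is not the instrument used in the study: the decision names, the binary "invest or not" choice, and the `score` function are all assumptions made for illustration. The only details taken from the scenario are that there are 8 decisions, 4 of higher criticality and 4 of lower criticality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    name: str               # hypothetical label, not a subsystem from the study
    high_criticality: bool  # True if the decision is in the higher-criticality set


# Placeholder scenario: 4 higher-criticality and 4 lower-criticality decisions.
DECISIONS = (
    [Decision(f"high_{i}", True) for i in range(1, 5)]
    + [Decision(f"low_{i}", False) for i in range(1, 5)]
)


def score(decisions, invested):
    """Count the decisions answered 'correctly'.

    Assumption: a decision is correct when the subject spends budget on it
    if and only if it is in the higher-criticality set.
    """
    return sum(1 for d in decisions if (d.name in invested) == d.high_criticality)


# A subject who funds three critical decisions and one non-critical one.
example_score = score(DECISIONS, {"high_1", "high_2", "high_3", "low_1"})
print(example_score)  # 6
```

Under this assumed rule, a subject who spends nothing still scores 4, because the four lower-criticality decisions are "correctly" left unfunded. A real instrument would need to decide whether that baseline is acceptable.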
We recruited subjects from computer engineering students (hence the “novice” in the title). The following figure illustrates our protocol.
Results
To summarize our results: All subjects achieved similar scores on the design scenario (decisions weren’t affected by treatment), but the treatments did influence the rationales that the subjects provided.
1. Decisions don’t change
We did not observe a statistically significant difference in task performance between the groups. In each group of subjects, the average participant answered ~6 questions correctly (see next figure).
2. Rationales do change
We did, however, observe interesting differences in the rationales that subjects provided along with their decisions. We examined the rationales and categorized them into 4 categories: (1) reasoning about criticality; (2) reasoning about safety; (3) reasoning about cost; and (4) reasoning about performance.
The following figure shows the differences in decision rationales by group, most notably in criticality and safety. Both the Treatment 1 and Treatment 2 groups reasoned more about the criticality of the subsystems than the Control group did. The Treatment 2 group reasoned more about safety than the Control group, and the Treatment 1 group mentioned safety least. The Control group was more concerned with cost and performance. It is also noteworthy that, relative to our assessment, more responses from Treatment 2 than from Treatment 1 judged criticality incorrectly. We conjecture that this might be due to excess caution about criticality: the failure stories emphasized the catastrophic impacts of bad design, possibly priming the subjects to be conservative.
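A comparison of this kind can be sketched as a simple tally of coded rationales per group. The code below is an illustration only, not the analysis from the paper: it assumes each rationale has already been hand-coded with exactly one of the four categories, and the sample pairs are placeholders rather than study data. The function name `category_shares` is hypothetical.

```python
from collections import Counter

# The four rationale categories named above.
CATEGORIES = ("criticality", "safety", "cost", "performance")


def category_shares(coded_rationales):
    """Return {group: {category: share}} from (group, category) pairs.

    Assumption: every rationale carries exactly one category code.
    """
    counts = {}
    for group, category in coded_rationales:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        counts.setdefault(group, Counter())[category] += 1
    return {
        group: {c: tally[c] / sum(tally.values()) for c in CATEGORIES}
        for group, tally in counts.items()
    }


# Placeholder codes for illustration only; these are NOT results from the study.
sample = [
    ("Control", "cost"),
    ("Control", "cost"),
    ("Control", "performance"),
    ("Treatment 2", "safety"),
    ("Treatment 2", "safety"),
    ("Treatment 2", "criticality"),
]

shares = category_shares(sample)
for group, by_category in shares.items():
    print(group, {c: round(s, 2) for c, s in by_category.items()})
```

Reporting shares rather than raw counts keeps groups of different sizes comparable. If a rationale can carry several codes, the shares would no longer sum to 1 and the tally would need a different denominator.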
Future Directions
We propose an investigation into the effectiveness of guideline-based practices at enabling developers to reason about holistic (e.g., safety, security, performance) implications of their design decisions. While we compared the influence of a guideline-based design practice against a failure-aware design practice, there is limited knowledge on the effectiveness of guideline-based design processes in the first place.
We advocate for experimental methods to measure and understand design rationales. Qualitative analysis of rationales seems an appropriate path, and a taxonomy of rationales could be a useful aid in experimental design. Measuring rationale in the context of the more systematic engineering techniques used in IoT design (e.g., FMEA, STAMP) is an open problem.
We propose a broader investigation of processes for learning from design failures. If failure-based learning treatments are effective at instilling good design practices, then we need processes to identify failures and apply the knowledge. Software industry leaders advocate for improved postmortem practices, but lack empirical evidence. First, we need to investigate the current processes used by organizations. For example, practitioners could provide insight into whether and how organizations currently document and learn from system failures. Second, the knowledge transfer processes used to share system postmortems across teams and organizations could be studied. How effectively do teams generalize from a specific failure to the broader class? What are the limits of generalization? Third, we need to study the use of these failure stories in internal training (e.g., during onboarding). When and why are failure stories shared with team members? What is learned, and how effective is the learning?
Conclusions
We investigated the influence of failure stories on design decisions. We found that failure stories yielded outcomes similar to those of simple design guidelines, but that they influenced the rationales that engineers used to justify their decisions. Since rationales play an important role in what designs are actually implemented, we believe this result suggests the value of communicating not just WHAT to do in design (guidelines), but also WHY to do it (failure stories). Our observations about failure-aware design decisions motivate new research directions into failure-aware design processes.
The full paper is available here. The design scenario and results are available here.