Notes on Sarter and Woods’ “Autonomy, Authority, and Observability: Properties of Advanced Automation and Their Impact on Human-Machine Coordination”
For background on the purpose of these notes, and other notes (eventually), see my entry “Learning in public.”
About
- Media: Paper
- Availability: free, electronically
How I came to study this
I’ve been interested in the topics of software engineering and the development of software-centric systems since my first year of undergrad at Penn State University in 1996. As IBM started its aggressive moves to SaaS in the early 2010s, I became both interested in and responsible for the operations of software-centric systems, which led me to the topic of “DevOps.”
For the past ten years, I’ve tried to gain some level of expertise with the topics that swirl around DevOps, including things like continuous delivery, monitoring and alerting, challenges of interdisciplinary collaboration, etc. I also came to follow the writings, observations, and sharing of folks in the DevOps community whose work spoke to me.
One of the key folks I’ve come to follow is John Allspaw, who helped popularize the practice of continuous deployment in the late 2000s, curated a canonical book on web operations, and later sought relevant insights from other, more mature domains, and worked to bring them back to the web operations community. Over the years I’ve had the good fortune of forming a friendship with John, and it’s had a massive impact on both my learning and my work at IBM; as just one example, I can trace our transformative adoption of modern tools at scale to things John shared with me in the mid-2010s when he was at Etsy.
Recently I’ve been working on an article on the topic of the adoption of machine learning-centric systems in large organizations that lack experience with machine learning both as a technology and as an integral part of a complex socio-technical system. One of the key claims relates to how I believe the naive adoption of machine learning, especially the popular deep learning variant, will compound the problem of poor observability in a complex socio-technical system.
One problem: Since I am intellectually honest with myself, I realized that my understanding of the concept of observability is quite shallow, as I’ve never formally studied it. As the concept seemed more and more central to my article, I figured I better study up, but where to start?
John to the rescue!
I Googled “Allspaw observability Twitter” based on a vague memory of John Tweeting about this topic from time to time, and search result number two led me to this tweet from John, which recommended the Sarter/Woods paper that’s the topic of this post.
One final interesting point on finding this paper: John noted in a previous tweet:
Which meaning of “observability” do you mean? The one with empirically supported research behind it in a number of engineering domains, or the more recent one popularized in software? I’m guessing you’re asking about the latter, in which there is very few.
For those who don’t know John, he is allergic to the sort of “folk models” that are often espoused around our young software and web operations fields, which is one of the reasons he dove deep into other domains; this is a warning to me that as I continue my study of observability, I should be extra cautious about the underpinnings of the claims.
Notes and observations
First a meta-observation: this paper is really short and concise as far as these sorts of papers usually go—only four pages, including references. This was a pleasant surprise as I am new to this topic, and long papers on new topics can be overwhelming. If you’re game, you might consider reading or re-reading the paper before continuing here.
Here’s the abstract from the beginning of the paper:
In a variety of domains, problems with human-machine coordination are a matter of considerable concern. These problems are the result of a mismatch between human information processing strategies and the communication skills of automated systems. Results of recent empirical research on pilot interactions with advanced cockpit automation on the Airbus A-320 illustrate how problems with man-machine coordination are changing in nature in response to the ongoing evolution of modern systems from passive tools to active yet ‘silent’ agents. The highly autonomous and powerful nature of modern technology in combination with low system observability fails to support operators’ expectation-driven approach to monitoring.
Structure
The paper has four main sections, and then references. The sections are:
- Introduction
- Pilot interaction with highly advanced cockpit automation
- Contributing factors to breakdowns in human coordination with advanced automation
- Conclusion
Since the introduction and conclusion provide different framings of the same basic content, there are really only three distinct sections:
- Key concepts and problem space overview (1 & 4)
- A case study from the domain of aviation, based on actual empirical research (2)
- Enumeration and analysis of several key factors that tend to cause the man-machine breakdowns that are the focus of the paper (3)
Mode errors and automation surprises
The paper describes the phenomenon of “automation surprises,” which, in my layman’s mind, is when a piece of software, often embodied in hardware (e.g. an airplane or iPhone), behaves in a way you didn’t expect. The authors also describe the phenomenon of “mode errors,” which I’ve studied before in books on human-computer interaction design, where the user’s expectations don’t match up with the actual state of the system.
The following sentence is key re: mode errors:
A lack of mode awareness can be viewed as the consequence of a mismatch between required and available feedback on system behavior, between operators’ monitoring strategies and the communication skills of the automation.
For programmers, a simple example of this is the vi editor, which has two distinct modes, command and edit, and keystrokes produce completely different outcomes depending on which mode you’re in. If you think you’re in edit mode and you’re actually in command mode … hijinks quickly ensue 😅. Now, if the system you’re operating is not a text editor but a production SaaS service, a 747 jet, or a space shuttle … the potential consequences of mode errors and automation surprises are obviously much, much higher.
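To make the vi example concrete, here’s a tiny Python sketch (a toy state machine, not real vi) of how the same keystroke produces different outcomes depending on mode, and how the surprise arises when the operator’s assumed mode and the system’s actual mode diverge.

```python
# Toy sketch of a mode error: the same keystroke means different things
# depending on the editor's current mode. Not real vi, just the idea.

class ToyEditor:
    def __init__(self):
        self.mode = "command"   # the system's actual state
        self.text = ""

    def keystroke(self, key):
        if self.mode == "command":
            if key == "i":
                self.mode = "insert"        # mode change, no announcement
                return "entered insert mode"
            if key == "x":
                self.text = self.text[:-1]
                return "deleted a character"
            return f"ran command {key!r}"
        else:  # insert mode
            self.text += key
            return f"typed {key!r}"

editor = ToyEditor()
operator_assumes = "insert"                 # the operator's mental model

result = editor.keystroke("x")              # operator expects an 'x' to appear
if operator_assumes != editor.mode:
    print(f"Automation surprise: expected to type 'x', but the editor {result}")
```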
Going back to the quote, the most interesting concepts to me were “operators’ monitoring strategies” and “communication skills of the automation.” In layman’s terms, I’d characterize monitoring strategies as “how an operator decides what to pay attention to in a (typically) information-overloaded environment.” It’s interesting to think about an automation having “communication skills.” I hadn’t previously thought about things like alerts as a type of “communication skill,” but it makes perfect sense when you think about it. Alerts can be very helpful (well-timed, good context, worthy of your attention) or very unhelpful or even misleading and counterproductive.
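As a rough illustration of what “communication skills” might mean for the kind of automation I work with, here’s a sketch contrasting a bare alert with a context-rich one. The field names, values, and runbook URL are all my own invented examples, not anything from the paper.

```python
# Sketch: two ways an automation can "speak" to its operator.
# The metric names, values, and URL below are illustrative assumptions.

def bare_alert(metric, value):
    # Communicates almost nothing: no baseline, no impact, no next step.
    return f"ALERT: {metric} = {value}"

def contextual_alert(metric, value, baseline, impact, runbook_url):
    # Gives the operator enough context to decide whether this deserves
    # attention right now, and what to do about it.
    return (
        f"ALERT: {metric} = {value} (baseline {baseline})\n"
        f"Impact: {impact}\n"
        f"Suggested next step: {runbook_url}"
    )

print(bare_alert("error_rate", 0.12))
print(contextual_alert(
    "error_rate", 0.12, baseline=0.01,
    impact="checkout requests failing for ~5% of users",
    runbook_url="https://example.internal/runbooks/error-rate",  # hypothetical
))
```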
When you frame your thinking about system design in terms of good alignment between “operators’ monitoring strategies” and “communication skills of the automation,” it’s … thought-provoking 🤔.
The increasing autonomy and authority of automated agents
My next key insight was the paper’s discussion of how automations (computer programs etc.) are becoming increasingly autonomous and are being granted increasing authority. I thought the authors’ description of these phenomena was helpful:
… increasingly autonomous in the sense that they are capable of carrying out long sequences of actions without requiring operator input once they have been programmed and activated. They are increasingly powerful as they are capable under certain circumstances to override or modify operator input.
I had never before thought of this definition of autonomy, in which autonomy can be measured in terms of uninterrupted sequences of actions. One way to think about this is the autopilot functionality on some modern cars, where under certain, relatively simple conditions, the car can “drive itself” or “is autonomous,” e.g. on a highway, in stop-and-go traffic, but also at least attempts to know when it’s out of its depth and needs to return control to the human operator.
I had also never thought of this framing around “authority of agents.” We use the term “agency” to talk about this concept: what may we do without asking for permission or seeking guidance? Automation designers increasingly attempt to make their systems better / safer / more differentiated by giving the automation more control. Going back to cars, some cars will not just warn you that a collision is imminent, but may proactively take countermeasures like activating the braking system. This is all well and good … when it works correctly and doesn’t surprise the human operator. One of the key themes of the paper, though, is that if the automation is making decisions of which the human operator is not aware, the human operator may make his or her own decisions based on faulty assumptions (i.e. the aforementioned “mode error”), which in the best case results in momentary confusion, but in the worst case could literally lead to death and destruction. While it’s not discussed explicitly in this paper, I’ve read other books by Woods, e.g. Behind Human Error, that describe several famous plane crashes that were at least partially caused by this sort of mode error.
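Here’s a toy Python sketch of that authority/override idea; it’s entirely my own illustration (the thresholds and sensor values are invented, not from the paper or any real vehicle system), but it captures why a silent override sets up a mode error while an announced one keeps the operator’s mental model in sync.

```python
# Toy model of an automation with the authority to override operator input.
# All numbers and behaviors here are illustrative assumptions.

class CollisionAvoidance:
    def __init__(self, announce):
        self.announce = announce          # does the automation "speak up"?
        self.messages = []                # what the operator actually sees

    def decide(self, operator_throttle, distance_to_obstacle_m):
        if distance_to_obstacle_m < 10:
            # Automation overrides the operator: full braking.
            if self.announce:
                self.messages.append(
                    f"AUTO-BRAKE engaged: obstacle at {distance_to_obstacle_m} m"
                )
            return {"throttle": 0.0, "brake": 1.0}
        return {"throttle": operator_throttle, "brake": 0.0}

silent = CollisionAvoidance(announce=False)
action = silent.decide(operator_throttle=0.4, distance_to_obstacle_m=8)
# The car slows down and the operator sees no explanation -> automation surprise.
print(action, silent.messages)

chatty = CollisionAvoidance(announce=True)
action = chatty.decide(operator_throttle=0.4, distance_to_obstacle_m=8)
# Same override, but now the operator's mental model can keep up.
print(action, chatty.messages)
```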
For completeness, along with the situation where an external “sensor trigger” puts the system in an unexpected state, the paper also observes that many systems are multi-user, and another user’s input might put the system in an unexpected state for other users, e.g. if a co-pilot makes a change to the flight controls and the pilot doesn’t realize it. Communication breakdowns obviously don’t merely occur between humans and machines; they’re a struggle in the human-to-human world too (ask me how I know!).
A short word on the case study
The case study backing the observations of the paper is quite interesting. It involves observing the behavior of pilots in a simulation that introduces conditions expected to produce mode errors, in an attempt to understand how automation surprises manifest and how they impact the human operator’s perceptions and behaviors. I won’t recapitulate it here (you can just read it yourself), but it helps me understand why John respects this paper.
What causes man-machine communication breakdowns?
The third section, and the last I’ll cover here, discusses why man-machine communication breakdowns occur. This paragraph was especially interesting:
The (human operator’s) monitoring of an automated system requires an adequate mental model of the functional structure of the system. A mental model allows for the formation of expectations of system responses to input from various sources. In that sense, the model is a necessary prerequisite for the timely allocation of attention to critical information, as it guides the exploration of subsets of the usually large amount of available data.
I think a lot about mental models and much of the work I do with “culture change” involves working to understand mental models, especially tacit assumptions, and then figuring out how to change them, at scale. (Spoiler: it’s hard 😅.)
What threw me off a bit in this section was the concept of “the functional structure of a system.” I tend to think of systems in terms of their structure (static) and behavior (dynamic), and I don’t think I’ve ever used the term “functional structure.” Based on the surrounding context, I think this just means that we tend to construct mental models of how a system works in terms of things like input/output, workflow logic, states, etc. E.g. whenever I think about a SaaS service, I often picture an incoming HTTP request going into the ingress for authentication/authorization/etc., and then being passed to a relevant downstream system based on load, request nature, etc. But I guess a simpler way of thinking about this is “If I do <x>, what would I expect the system to do in response?” E.g. if I push down on my car’s gas pedal, I expect the car to accelerate.
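Here’s how I’d sketch that “If I do <x>, what do I expect?” framing in code: a mental model as a mapping from actions to expected responses, with monitoring as checking the actual response against it. Treat it as my own toy model of expectation-driven monitoring, not the paper’s; the actions and responses are invented examples.

```python
# Sketch: a mental model as a mapping from actions to expected system
# responses; expectation-driven monitoring compares actual vs. expected.

mental_model = {
    "press_gas_pedal": "car accelerates",
    "send_http_request": "ingress authenticates, then routes downstream",
}

def monitor(action, observed_response):
    expected = mental_model.get(action, "no expectation (gap in the model)")
    if observed_response != expected:
        # A mismatch is where attention should go first.
        return f"Surprise: expected {expected!r}, observed {observed_response!r}"
    return "Matches expectation, no attention needed"

print(monitor("press_gas_pedal", "car accelerates"))
print(monitor("press_gas_pedal", "car slows down"))  # e.g. auto-brake engaged silently
```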
Anyway, the other key part of this paragraph has to do with how one determines what to pay attention to. This is a big problem in everyday life (too many emails/Slacks/etc., too many entertainment options), but it becomes much higher stakes in some operational environments where, if attention isn’t focused on the right thing at the right time, bad things can happen. An everyday case is being distracted by your iPhone or kids right as the person in front of you slams on the brakes.
This section concludes by going through the actual contributors to man-machine communication breakdowns, which are:
- Input from multiple sources
- System coupling
- Inconsistent system design
I thought all of these sections made sense and I encourage you to read them directly.
My conclusion
Since I’m not young, this was yet another example where a well-researched paper doesn’t introduce a ton of new concepts to me, but it does help crisp up my understanding of the concepts and the dynamics underlying observable phenomena. While the concept of “observability” is still somewhat fuzzy in my head, I think that as I reflect on this paper, study other items on observability, and let my brain do its synthesizing thing, I’ll get there.
Thanks to John again for the reference and for suggesting I write down these thoughts. I’m sure I got things wrong and missed essential points, so I welcome folks’ feedback.