
The first two questions are short answer and require reading.

The third question is the discussion question.

No citations needed.

Mary Shaw Beyond Objects: A Software Design Paradigm Based on Process Control 1

Beyond Objects:
A Software Design Paradigm Based on Process Control

Mary Shaw
Carnegie Mellon University

Pittsburgh PA 15213



A standard demonstration problem in object-oriented programming is the design of an automobile cruise
control. This design exercise demonstrates object-oriented techniques well, but it does not ask whether the
object-oriented paradigm is the best one for the task. Here we examine the alternative view that cruise
control is essentially a control problem. We present a new software organization paradigm motivated by
process control loops. The control view leads us to an architecture that is dominated by analysis of a
classical feedback loop rather than by the identification of discrete stateful components to treat as objects.
The change in architectural model calls attention to important questions about the cruise control task that
aren’t addressed in an object-oriented design.

1. Design Idioms for Software Architectures

Explicit organization patterns, or idioms, increasingly guide the composition of modules into complete systems.
This stage of the design is usually called the architecture, and a number of common patterns are in general, though
quite informal, use [Garlan and Shaw 93]. One of these, the object-oriented architecture [Booch 86], is the subject
of much current attention. Although several architectural idioms have strong advocates, no single paradigm
dominates. The choice of an architecture should instead depend on the computational character of the application.

Here we explore a software idiom based on process control loops. This system organization is not widely
recognized in the software community; nevertheless it seems to quietly appear within designs dominated by other
models. Unlike object-oriented or functional design, which are characterized by the kinds of components that
appear, control loop designs are characterized both by the kinds of components and the special relations that must
hold among the components.

The paper first explains process control models and derives a software paradigm for control loop organizations.
Then it applies the result to a well-known problem, the design of a cruise control system. The differences between
the control-loop-based and the object-oriented designs reveal relative strengths of the models for problems of this
kind. The control view clarifies the different roles played by various problem inputs; further, it helps the designer
recognize a safety problem and a system limitation. Drawing on the knowledge of process control also offers
prospects for design guidance and quantitative analysis of system response characteristics.

1.1. Process control paradigms

Continuous processes of many kinds convert input materials to products with specific properties by performing
operations on the inputs and on intermediate products. The values of measurable properties of system state
(materials, equipment settings, etc.) are called the variables of the process. Process variables that measure the
output materials are called the controlled variables of the process. The properties of the input materials,
intermediate products, and operations are captured in other process variables. In particular, the manipulated
variables are associated with things that can be changed by the control system in order to regulate the process.
Process variables must not be confused with program variables; this error can lead to disaster [Åström and
Wittenmark 84, Leveson 86, Perry 84, Seborg et al 89].


Process Control Definitions
Process variables: properties of the process that can be measured; several specific kinds are often distinguished. Do not confuse process variables with program variables.
Controlled variable: process variable whose value the system is intended to control
Input variable: process variable that measures an input to the process
Manipulated variable: process variable whose value can be changed by the controller
Set point: the desired value for a controlled variable
Open loop system: system in which information about process variables is not used to adjust the system.
Closed loop system: system in which information about process variables is used to manipulate a process variable to compensate for variations in process variables and operating conditions.
Feedback control system: the controlled variable is measured and the result is used to manipulate one or more of the process variables
Feedforward control system: some of the process variables are measured and disturbances are compensated for without waiting for changes in the controlled variable to be visible.

The purpose of a control system is to maintain specified properties of the outputs of the process at (sufficiently
near) given reference values called the set points. If the input materials are pure, if the process is fully-defined, and
if the operations are completely repeatable, the process can simply run without surveillance. Such a process is
called an open loop system. Figure 1 shows such a system, a hot-air furnace that uses a constant burner setting to
raise the temperature of the air that passes through. A similar furnace that uses a timer to turn the burner off and
on at fixed intervals is also an open loop system.

Figure 1: Open loop temperature control (diagram labels: Furnace, Return Air, Hot Air)

The open-loop assumptions are rarely valid for physical processes in the real world. More often, properties such
as temperature, pressure and flow rates are monitored, and their values are used to control the process by changing
the settings of apparatus such as valves, heaters, and chillers. Such systems are called closed loop systems. A
home thermostat is a common example: the air temperature at the thermostat is measured, and the furnace is turned
on and off as necessary to maintain the desired temperature (the set point). Figure 2 shows the addition of a
thermostat to convert Figure 1 to a closed-loop system.

Figure 2: Closed loop temperature control (diagram labels: Furnace, Return Air, Hot Air, Temperature sensor, Temperature-setting control, Gas valve control)
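To make the open/closed distinction concrete, the following Python sketch simulates a toy version of the furnace in Figures 1 and 2. The room model, burner output, and heat-loss coefficient are invented for illustration and are not from the paper; the point is only that the open loop result depends on operating conditions (here the outside temperature), while the thermostat holds the air near its set point.

```python
def simulate(closed_loop: bool, steps: int = 200, set_point: float = 20.0,
             outside: float = 5.0) -> float:
    """Toy furnace (hypothetical parameters); returns the final air temperature."""
    temp = outside            # air starts at the outside temperature
    burner_on = True          # open loop: a constant burner setting
    for _ in range(steps):
        if closed_loop:
            # Thermostat: measure the air temperature and switch the burner
            burner_on = temp < set_point
        heat_in = 2.0 if burner_on else 0.0      # invented burner output per step
        loss = 0.05 * (temp - outside)           # invented heat loss to outside
        temp += heat_in - loss
    return temp


for outside in (5.0, -5.0):
    print(f"outside={outside}: open loop {simulate(False, outside=outside):.1f}, "
          f"closed loop {simulate(True, outside=outside):.1f}")
```

With these made-up numbers the open loop furnace settles near 45 degrees when the outside air is at 5 and near 35 when it is at -5, while the closed loop version cycles in a narrow band around 20 in both cases.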


There are two general forms of closed loop control. Feedback control, illustrated in Figure 3, adjusts the process
based on measurements of the controlled variable. The important components of a feedback controller are the
process definition, the process variables (including designated input and controlled variables), a sensor to obtain the
controlled variable from the physical output, the set point (target value for the controlled variable), and a control
algorithm. Figure 2 corresponds to Figure 3 in the following ways: The furnace with burner is the process; the
thermostat is the controller; the return air temperature is the input variable; the hot air temperature is the
controlled variable; the thermostat setting is the set point; and the temperature sensor is the sensor.

Figure 3: Feedback Control (diagram labels include: Set Point, Input Variables)

Feedforward control, shown in Figure 4, anticipates future effects on the controlled variable by measuring other
process variables whose values may be more timely; it adjusts the process based on these variables. The
important components of a feedforward controller are essentially the same as for a feedback controller except that
the sensor(s) obtain values of input or intermediate variables. Feedforward control is valuable when lags in the
process delay the effect of control changes.

Figure 4: Feedforward Control (diagram labels include: Set Point, Input Variables)
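A minimal sketch of the difference between the two forms, assuming a deliberately simple static furnace model (hot air temperature equals return air temperature plus the heat added) that is not from the paper. The feedforward rule measures the input variable (return air) and inverts the assumed model; the feedback rule sees only the controlled variable and corrects after an error appears.

```python
def process(return_air: float, heat: float) -> float:
    """Hypothetical static furnace: hot air = return air + heat added."""
    return return_air + heat


def feedforward(set_point: float, return_air: float) -> float:
    # Measure the input variable and invert the (assumed) process model;
    # no need to wait for the controlled variable to change.
    return set_point - return_air


def feedback(set_point: float, hot_air: float, heat: float, gain: float = 0.5) -> float:
    # Measure only the controlled variable and correct the existing heat
    # setting in proportion to the observed error.
    return heat + gain * (set_point - hot_air)


set_point, return_air = 40.0, 10.0   # return air has just dropped from 15
heat = 25.0                          # setting that was right for 15
print("feedforward:", process(return_air, feedforward(set_point, return_air)))
for step in range(4):
    heat = feedback(set_point, process(return_air, heat), heat)
    print("feedback step", step, process(return_air, heat))
```

Here the feedforward rule restores 40 immediately, while the feedback rule halves the error on each step. Feedforward is only as good as its process model, which is one reason practical controllers combine the two.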

These are simplified models. They do not deal with complexities such as properties of sensors, transmission
delays, and calibration issues. They ignore the response characteristics of the system such as gain, lag, and
hysteresis. They don’t show how to combine feedforward and feedback or choose which process variables to
manipulate. Chemical engineering provides excellent quantitative models for predicting how processes will react
to various control algorithms; indeed there are a number of standard strategies [Perry 84, Section 22]. These are
mentioned in Section 3.4, but detailed discussion is beyond the scope of this paper.

1.2. Software paradigm for process control

We usually think of software as algorithmic: we compute outputs (or execute continuous systems) solely on the
basis of the inputs. This normal model does not allow for external perturbations; if non-input values of a
computation change spontaneously, this is regarded as a hardware error. The normal software model corresponds to
an open loop system; in most cases it is entirely appropriate. However, when the operating conditions of a
software system are not completely predictable—especially when the software is operating a physical system—the
purely algorithmic model breaks down. When the execution of a software system is affected by external
disturbances (forces or events that are not directly visible to or controllable by the software), this is an indication
that a control paradigm should be considered for the software architecture.

We can now establish a paradigm for software that controls continuous processes. The elements of this pattern
incorporate the essential parts of a process control loop, and the methodology requires explicit identification of
these parts:

Computational elements: separate the process of interest from the control policy.

Process definition, including mechanisms for manipulating some process variables.

Control algorithm to decide how to manipulate process variables, including a model for how the
process variables reflect the true state.

Data elements: continuously updated process variables and sensors that collect them.

Process variables, including designated input, controlled, and manipulated variables and knowledge
of which can be sensed.

Set point, or reference value for controlled variable.

Sensors to obtain values of process variables pertinent to control.

The control loop paradigm establishes the relation that the control algorithm exercises surveillance:
it collects information about the actual and intended states of the process and tunes the process
variables to drive the actual state toward the intended state.

The two computational elements separate issues about desired functionality from issues about responses to
external disturbances. For a software system, we can bundle the process and the process variables; that is, we can
regard the process definition together with the process variables and sensors as a single subsystem whose input and
controlled variables are visible in the subsystem interface. We can then bundle the control algorithm and the set
point as a second subsystem; this controller has continuous access to current values of the set point and the
monitored variables (for a feedback system, the monitored variable is the controlled variable). There are two
interactions between these two subsystems: the controller receives values of process variables from the process, and
the controller supplies continuous guidance to the process about changes to the manipulated variables.
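The two-subsystem bundling can be sketched as follows. This is a hypothetical Python illustration, not code from the paper: the process subsystem exposes its controlled variable through `sense()` and accepts a manipulated variable, and the controller subsystem holds the set point and the control algorithm. The toy process dynamics and the gain of 0.8 are assumptions chosen only so that the loop settles.

```python
from dataclasses import dataclass


@dataclass
class ProcessSubsystem:
    """Process definition bundled with its variables and sensor (toy dynamics)."""
    controlled: float = 0.0    # controlled variable, visible in the interface
    manipulated: float = 0.0   # manipulated variable, set by the controller

    def sense(self) -> float:
        # Sensor: report the current value of the controlled variable
        return self.controlled

    def step(self, disturbance: float = 0.0) -> None:
        # Invented dynamics: relax toward the manipulated value, plus a disturbance
        self.controlled += 0.5 * (self.manipulated - self.controlled) + disturbance


@dataclass
class ControllerSubsystem:
    """Control algorithm bundled with the set point."""
    set_point: float
    gain: float = 0.8

    def update(self, measured: float, current: float) -> float:
        # Surveillance: compare actual and intended state, then change the
        # manipulated variable to drive the actual state toward the set point
        return current + self.gain * (self.set_point - measured)


def run(steps: int, set_point: float, disturbance: float = 0.0) -> float:
    proc, ctrl = ProcessSubsystem(), ControllerSubsystem(set_point)
    for _ in range(steps):
        measured = proc.sense()                                     # process -> controller
        proc.manipulated = ctrl.update(measured, proc.manipulated)  # controller -> process
        proc.step(disturbance)
    return proc.controlled


print(run(60, 10.0), run(60, 10.0, disturbance=-0.3))
```

Because this controller adjusts the manipulated variable incrementally, the loop still settles at the set point under a constant disturbance; only the process subsystem needs to know how the disturbance arises.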

The result is a particular kind of dataflow architecture. The primary characteristic of dataflow architectures is that
the components interact by providing data to each other, each component executing when data is available. Most
dataflow architectures involve independent (often concurrent) processes and pacing that depends on the rates at
which the processes provide data for each other. The control loop paradigm assumes further that data related to
process variables is updated continuously.

Other, perhaps more familiar, members of the dataflow family are batch sequential processing and pipe-and-filter
architectures. Both are largely linear: data enters the system and is processed progressively by a number of distinct
computations. In batch sequential architectures each phase runs to completion and delivers the result (historically
as a magnetic tape!) to the next. In pipe-and-filter architectures, on the other hand, each filter processes its input
stream incrementally (in Unix, by characters or lines) so the filters can operate concurrently, at least in principle.
The control loop architecture described here differs from both in its commitment to a dataflow loop and in the
intrinsic asymmetry between the control element and the process element.

It is appropriate to consider a control loop design when:

• the task involves continuing action, behavior, or state

• the software is embedded; that is, it controls a physical system

• uncontrolled, or open loop, computation does not suffice, usually because of external perturbations
or imprecise knowledge of the external state

2. Cruise control

2.1. The cruise control problem

Disciplines often work out the details of their methods through type problems, common examples used by many
different people to compare their models and methods [Shaw et al nd]. Booch and others have used the cruise
control problem to explore the differences between object-oriented and functional (traditional procedural)
programming [Atlee and Gannon 91, Booch 86, Ward 84]. As given by Booch, this problem is:


A cruise control system exists to maintain the speed of a car, even over varying terrain. In Figure 5 we see
the block diagram of the hardware for such a system.

Figure 5: Booch block diagram for cruise control (inputs shown include: System on/off, Engine on/off, Pulses from wheel, Increase/decrease speed, Resume speed)

There are several inputs:

• System on/off: If on, denotes that the cruise-control system should maintain the car speed.

• Engine on/off: If on, denotes that the car engine is turned on; the cruise-control system is only active if the engine is on.

• Pulses from wheel: A pulse is sent for every revolution of the wheel.

• Accelerator: Indication of how far the accelerator has been pressed.

• Brake: On when the brake is pressed; the cruise-control system temporarily reverts to manual control if the brake is pressed.

• Increase/Decrease Speed: Increase or decrease the maintained speed; only applicable if the cruise-control system is on.

• Resume: Resume the last maintained speed; only applicable if the cruise-control system is on.

• Clock: Timing pulse every millisecond.

There is one output from the system:

• Throttle: Digital value for the engine throttle setting.

The problem does not clearly state the rules for deriving the output from the set of inputs. Booch provides a
certain amount of elaboration in the form of a data flow diagram, but some questions remain unanswered. In the
design below, missing details are supplied to match the apparent behavior of the cruise control on the author’s car.
Moreover, the inputs provide two kinds of information: whether the cruise control is active, and if so what speed it
should maintain.

The problem statement says the output is a value for the engine throttle setting. In classical process control the
corresponding signal would be a change in the throttle setting; this avoids calibration and wear problems with the
sensors and engine. A more conventional cruise control requirement would thus specify control of the current
speed of the vehicle. However, current speed is not explicit in the problem statement, though it does appear
implicitly as “maintained speed” in the descriptions of some of the inputs. If the requirement addresses current
speed, throttle setting remains an appropriate output from the control algorithm. To avoid unnecessary changes
in the problem we assume accurately calibrated digital control and achieve the effect of incremental signals by
retaining the previous throttle value in the controller.
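The device of retaining the previous throttle value can be sketched as a small Python class. The class name and the 0 to 1 throttle range are assumptions for illustration: the control algorithm reasons in changes, and the class converts them into the absolute setting the problem statement demands.

```python
class RetainedThrottle:
    """Converts throttle *changes* into the absolute setting the problem requires.

    Hypothetical sketch: the 0..1 range is an assumed normalization.
    """

    def __init__(self, initial: float = 0.0, low: float = 0.0, high: float = 1.0):
        self.setting = initial   # previous throttle value retained by the controller
        self.low, self.high = low, high

    def apply_change(self, delta: float) -> float:
        # Add the incremental signal, then clamp to the physical range
        self.setting = min(self.high, max(self.low, self.setting + delta))
        return self.setting


throttle = RetainedThrottle(initial=0.3)
print(throttle.apply_change(+0.1), throttle.apply_change(-0.05))
```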

The problem statement also specifies a millisecond clock. In the object-oriented solution, the clock is used
only in combination with the wheel pulses to determine the current speed. Presumably the process that
computes the speed will count the number of clock pulses between wheel pulses. A typical automobile tire has a
circumference of about 6 feet, so at 60 mph (88 ft/sec) there will be about 15 wheel pulses per second. The
problem is overspecified in this respect: a slower clock or one that delivered current time on demand with sufficient
precision would also work and would require less computing. Further, a single system clock is not required by the
problem, though it might be convenient for other reasons.
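The arithmetic in this paragraph can be checked with a short Python sketch, assuming (as the text does) a 6-foot tire circumference, one pulse per revolution, and a 1 ms clock; the function names are hypothetical.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def pulses_per_second(mph: float, circumference_ft: float = 6.0) -> float:
    """Wheel pulses per second at a given speed (one pulse per revolution)."""
    feet_per_second = mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return feet_per_second / circumference_ft


def speed_mph(ticks_between_pulses: int, circumference_ft: float = 6.0,
              tick_seconds: float = 0.001) -> float:
    """Estimate speed from clock ticks counted between two consecutive wheel pulses."""
    seconds_per_revolution = ticks_between_pulses * tick_seconds
    feet_per_second = circumference_ft / seconds_per_revolution
    return feet_per_second * SECONDS_PER_HOUR / FEET_PER_MILE


print(pulses_per_second(60))   # about 14.7 pulses per second
print(speed_mph(68))           # about 60.2 mph
```

At 60 mph one wheel revolution takes 6/88 s, about 68 clock ticks, which is what the speed model would count between pulses.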


These considerations lead to a restatement of the problem: Whenever the system is active, determine the desired
speed and control the engine throttle setting to maintain that speed.

2.2. Object view of cruise control

Booch structures an object-oriented decomposition of the system around objects that exist in the task description.
This yields a decomposition whose elements correspond to important quantities and physical entities in the
system. The result appears in Figure 6, where the blobs represent objects and the directed lines represent
dependencies among objects. Although the target speed did not appear explicitly in the problem statement, it does
appear in Figure 6 as “Desired Speed”.

Figure 6: Booch’s object-oriented design for cruise control

2.3. Process control view of cruise control

Section 1.2 suggests the selection of a control loop architecture when the software is embedded in a physical
system that involves continuing behavior, especially when the system is subject to external perturbations. These
conditions hold in the case of cruise control: the system is supposed to maintain constant speed in an automobile
despite variations in terrain, vehicle load, air resistance, fuel quality, etc. To develop a control loop architecture
for this system, we begin by identifying the essential system elements as described in Section 1.2:

Computational elements
• Process definition: Since the cruise control software is driving a mechanical device (the engine), the
details are not relevant. For our purposes, the process receives a throttle setting and turns the car’s
wheels. There may in fact be more computers involved, for example in controlling the fuel-injection
system. From the standpoint of the cruise control subsystem, however, the process takes a throttle
setting as input and controls the speed of the vehicle.

• Control algorithm: This algorithm models the current speed based on the wheel pulses,
compares it to the desired speed, and changes the throttle setting. The clock input is needed to
model current speed based on intervals between wheel pulses. Since the problem requires an
exact throttle setting rather than a change, the current throttle setting must be maintained by the
control algorithm. The policy decision about how much to change the throttle setting for a given
discrepancy between current speed and desired speed is localized in the control algorithm.

Data elements
• Controlled variable: For the cruise control, this is the current speed of the vehicle.
• Manipulated variable: For the cruise control, this is the throttle setting.
• Set point: The desired speed is set and modified by the accelerator input and the
increase/decrease speed input, respectively. Several other inputs help control whether the cruise
control is currently controlling the car: System on/off, engine on/off, brake, and resume.
These interact: resume restores automatic control, but only if the entire system is on. These inputs
are provided by the human driver (the operator, in process terms).

• Sensor for controlled variable: For cruise control, the current state is the current speed, which is
modeled on data from a sensor that delivers wheel pulses using the clock. However, see
discussion below about the accuracy of this model.
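The three steps of the control algorithm element above (model the current speed, compare it with the desired speed, change the throttle) might look like the following hypothetical Python function. The proportional gain of 0.02 per mph and the 0 to 1 throttle range are assumptions, not values from the paper; the point is that the policy decision sits in one place and can be swapped.

```python
def control_step(ticks: int, desired_mph: float, throttle: float,
                 gain: float = 0.02, circumference_ft: float = 6.0,
                 tick_seconds: float = 0.001) -> float:
    """One evaluation of a hypothetical cruise-control algorithm.

    ticks: 1 ms clock ticks counted between the last two wheel pulses.
    throttle: previous throttle setting retained by the controller (0..1 assumed).
    Returns the new absolute throttle setting.
    """
    # 1. Model current speed from the wheel-pulse interval (ft/s -> mph)
    current_mph = circumference_ft / (ticks * tick_seconds) * 3600 / 5280
    # 2. Compare with the set point
    error = desired_mph - current_mph
    # 3. Policy (localized here): change the throttle in proportion to the error
    new_throttle = throttle + gain * error
    # Clamp to the assumed physical range of the throttle
    return min(1.0, max(0.0, new_throttle))


print(control_step(68, 65.0, 0.3))   # car at about 60 mph, so the throttle opens a little
```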

The restated control task was, “Whenever the system is active determine the desired speed and control the engine
throttle setting to maintain that speed.” Note that only the current speed output, the wheel pulses input, and
the throttle manipulated variable are used outside the set point and active/inactive determination. This leads
immediately to two subproblems: the interface with the driver, concerned with “whenever the system is active
determine the desired speed” and the control loop, concerned with “control the engine throttle setting to maintain
that speed.”

The latter is the actual control problem; we’ll examine it first. Figure 7 shows a suitable architecture for the
control system. The first task is to model the current speed from the wheel pulses; the designer should
validate this model carefully. The model could fail if the wheels spin; this could affect control in two ways. If the
wheel pulses are being taken from a drive wheel and the wheel is spinning, the cruise control would keep the
wheel spinning (at constant speed) even if the vehicle stops moving. Even worse, if the wheel pulses are being
taken from a non-drive wheel and the drive wheels are spinning, the controller will be misled to believe that the
current speed is too slow and will continually increase the throttle setting. The designer should also consider
whether the controller has full control authority over the process. In the case of cruise control, the only
manipulated variable is the throttle; the brake is not available. As a result, if the automobile is coasting faster
than the desired speed, the controller is powerless to slow it down.
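The loss of control authority on a downhill grade can be illustrated with a toy longitudinal model. Everything numeric here (the throttle effect, drag coefficient, and grade term) is invented for illustration. The controller may only adjust the throttle, so once the throttle reaches zero it has no further way to slow the car.

```python
def simulate(set_point: float = 60.0, grade: float = 1.5, steps: int = 300):
    """Toy vehicle model with invented coefficients (speed in mph, one update per step).

    grade > 0 means downhill. Returns (final speed, final throttle).
    """
    speed, throttle = set_point, 0.3
    for _ in range(steps):
        # The controller can only move the throttle, clamped to [0, 1]; no brake.
        throttle = min(1.0, max(0.0, throttle + 0.02 * (set_point - speed)))
        speed += 4.0 * throttle - 0.02 * speed + grade
    return speed, throttle


print("flat:", simulate(grade=0.0))
print("downhill:", simulate(grade=1.5))
```

On the flat road the speed stays at the 60 mph set point, but on the modeled downhill the throttle is pinned at zero and the car drifts toward this model's coasting speed of 75 mph.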

The controller also receives two inputs from the set point computation: the active/inactive toggle, which
indicates whether the controller is in charge of the throttle, and the desired speed, which only needs to be valid
when the vehicle is under automatic control. All this information should be either state or continuously updated
data, so all lines in the diagram represent data flow. The controller is implemented as a continuously-evaluating
function that matches the dataflow character of the inputs and outputs. Several implementations are possible,
including variations on simple on/off control, proportional control, and more sophisticated disciplines. Each of
these has a parameter that controls how quickly and tightly the control tracks the set point; analysis of these
characteristics is discussed in Section 3.4. As noted above, the engine is of little interest here; it might very well
be implemented as an object or as a collection of objects.

Figure 7: Control Architecture for Cruise Control (diagram labels include: Pulses From Wheel, Wheel Rotation)
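To illustrate how the choice of control discipline affects tracking, the sketch below compares simple on/off control with a proportional law on a toy flat-road vehicle model with invented coefficients. The bias of 0.3 is a hypothetical throttle trim that happens to match this model's load at 60 mph; with a mismatched bias, a pure proportional law settles with a steady offset. The gain is the kind of parameter the text describes as controlling how quickly and tightly the control tracks the set point.

```python
def on_off(error: float) -> float:
    # Full throttle whenever the car is below the set point, none otherwise
    return 1.0 if error > 0 else 0.0


def proportional(error: float, bias: float = 0.3, gain: float = 0.1) -> float:
    # Throttle = bias + gain * error, clamped to [0, 1]; bias is a hypothetical trim
    return min(1.0, max(0.0, bias + gain * error))


def run(policy, set_point: float = 60.0, speed: float = 50.0, steps: int = 400):
    """Apply a policy to a toy flat-road vehicle model; return (min, max) of the last 100 speeds."""
    history = []
    for _ in range(steps):
        throttle = policy(set_point - speed)
        speed += 4.0 * throttle - 0.02 * speed   # invented vehicle coefficients
        history.append(speed)
    tail = history[-100:]
    return min(tail), max(tail)


print("on/off:", run(on_off))
print("proportional:", run(proportional))
```

Under these assumptions on/off control keeps chattering across a band a few mph wide, while the proportional law settles at the set point.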

The set point calculation divides naturally into two parts: (a) determining whether or not the automatic system is
active (in control of the throttle) and (b) determining the desired speed for use by the controller in automatic mode.

Some of the inputs in the original problem definition capture state (system on/off, engine on/off,
accelerator, brake) and others capture events (wheel pulses, increase/decrease speed, resume, clock).
We will treat the accelerator as state, specifically as a continuously updated value. However, the determination of
whether the automatic cruise control is actively controlling the car is cleaner if everything it depends on is of the
same kind. We will therefore use transitions between states for system on/off, engine on/off, and brake.
For simplicity we assume brake application is atomic so other events are blocked when the brake is on. A more
detailed analysis of the system states would relax this assumption [Atlee and Gannon 91].

The active/inactive toggle is triggered by a variety of events, so a state transition design is natural. It is shown in
Figure 8. The system is completely off whenever the engine is off. Otherwise there are three inactive states and one
active state. In the first inactive state no set point has been established. In the other two, the previous set point
must be remembered: when the driver accelerates to a speed greater than the set point, the manual accelerator
controls the throttle through a direct linkage (note that this is the only use of the accelerator position in this
design, and it relies on relative effect rather than absolute position); when the driver uses the brake, the control
system is inactivated until the resume signal is sent. The active/inactive toggle input of the control system
is set to active exactly when this state machine is in state Active.

Figure 8: State Machine for Activation (diagram labels: states OFF and Inactive; transitions Engine On, System On, Accel ↑ Set Point, Accel ↓ Set Point, Brake, Resume; Engine Off from all states; System Off from all states except OFF)
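One possible encoding of the activation state machine is sketched below in Python. The state and event names are hypothetical, and because Figure 8 survives here only as labels, the exact transitions are an assumed reading of the text: Engine Off always leads to OFF, System Off returns any running state to the first inactive state, System On activates control, braking (from Active or during an accelerator override) suspends control until Resume, and all other events are ignored.

```python
from enum import Enum, auto


class State(Enum):
    OFF = auto()              # engine off: the whole system is off
    INACTIVE = auto()         # engine on, no set point established yet
    ACTIVE = auto()           # the controller is in charge of the throttle
    ACCEL_OVERRIDE = auto()   # driver accelerated above the set point (set point kept)
    BRAKE_SUSPENDED = auto()  # brake applied; waiting for Resume (set point kept)


def next_state(state: State, event: str) -> State:
    """Hypothetical transition function for the activation state machine."""
    if event == "engine_off":
        return State.OFF                                   # from all states
    if state is State.OFF:
        return State.INACTIVE if event == "engine_on" else State.OFF
    if event == "system_off":
        return State.INACTIVE                              # all states except OFF
    if event == "system_on":
        return State.ACTIVE
    if state is State.ACTIVE and event == "accel_above_set_point":
        return State.ACCEL_OVERRIDE
    if state is State.ACCEL_OVERRIDE and event == "accel_below_set_point":
        return State.ACTIVE
    if state in (State.ACTIVE, State.ACCEL_OVERRIDE) and event == "brake":
        return State.BRAKE_SUSPENDED
    if state is State.BRAKE_SUSPENDED and event == "resume":
        return State.ACTIVE
    return state                                           # other events are ignored


def toggle(state: State) -> bool:
    # The controller's active/inactive toggle is "active" exactly in state ACTIVE
    return state is State.ACTIVE


s = State.OFF
for event in ["engine_on", "system_on", "brake", "resume"]:
    s = next_state(s, event)
    print(event, "->", s.name, "active" if toggle(s) else "inactive")
```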

Determining the desired speed is simpler, since it does not require state other than the current value of desired
speed (the set point). Any time the system is off, the set point is undefined. Any time the system on signal
is given (including when the system is already on) the set point is set to the current speed as modeled by wheel
pulses. The driver also has a control that increases or decreases the set point by a set amount. This, too, can be
invoked at any time (define arithmetic on undefined values to yield undefined values). Figure 9 summarizes the
events involved in determining the set point. Note that this process requires access to the clock in order to
estimate the current speed based on the pulses from the wheel.

Event                    Effect on desired speed
Engine off, system off   Set to “undefined”
System on                Set to current speed as estimated from wheel pulses
Increase speed           Increment desired speed by constant
Decrease speed           Decrement desired speed by constant

Figure 9: Event Table for Determining Set Point
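The event table translates almost directly into a lookup table. In this hypothetical Python sketch, `None` stands for "undefined", arithmetic on `None` stays `None` as the text requires, and the increment of 1 mph per press is an assumption (the paper says only "a constant").

```python
from typing import Callable, Dict, Optional

STEP_MPH = 1.0   # hypothetical size of one increase/decrease step


def _add(set_point: Optional[float], delta: float) -> Optional[float]:
    # Arithmetic on an undefined set point yields undefined
    return None if set_point is None else set_point + delta


# Event -> effect on desired speed; None represents "undefined"
EVENT_TABLE: Dict[str, Callable[[Optional[float], float], Optional[float]]] = {
    "engine_off":     lambda sp, speed: None,
    "system_off":     lambda sp, speed: None,
    "system_on":      lambda sp, speed: speed,   # current speed from wheel pulses
    "increase_speed": lambda sp, speed: _add(sp, STEP_MPH),
    "decrease_speed": lambda sp, speed: _add(sp, -STEP_MPH),
}


def update_set_point(set_point: Optional[float], event: str,
                     current_speed: float) -> Optional[float]:
    handler = EVENT_TABLE.get(event)
    # Events not in the table (brake, resume, ...) leave the set point unchanged
    return set_point if handler is None else handler(set_point, current_speed)


sp = None
for event, speed in [("increase_speed", 50.0), ("system_on", 55.0), ("increase_speed", 57.0)]:
    sp = update_set_point(sp, event, speed)
    print(event, sp)
```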

We can now (Figure 10) compose the control architecture, the state machine for activation, and the event table for
determining the set point into an entire system. Although there is no need for the control unit and set point
determination to use the same clock, we do so to minimize changes to the original problem statement. Then,
since current speed is used in two components, it would be reasonable for the next elaboration of the design to
encapsulate that model in a reusable object; this would encapsulate the clock.

All of the objects in Booch’s design (Figure 6) have clear roles in the resulting system. It is entirely reasonable to
look forward to a design strategy in which the control loop architecture is used for the system as a whole and a
number of other architectures, including objects and state machines, are used in the elaborations of the elements of
the control loop architecture.


The shift from an object-oriented view to a control view of the cruise control architecture raised a number of design
questions that had previously been slighted: The separation of process from control concerns led to explicit choice
of the control discipline. The limitations of the control model also became clear, including possible inaccuracies
in the current speed model and incomplete control at high speed. The dataflow character of the model showed
irregularities in the way the input was specified, for example the mixture of state and event inputs and the
inappropriateness of the absolute position of the accelerator.

Figure 10: Complete cruise control system (diagram labels include: Wheel Rotation, Control, and elements for the toggle and the set point event table)

3. Analysis and Discussion

3.1. Correspondence between architecture and problem

The selection of an architecture commits the designer to a particular view of a problem. Like any abstraction, this
emphasizes some aspects of the problem and suppresses others. Booch [Booch 86] characterizes the views inherent
in object-oriented and functional architectures:

Simply stated, object-oriented development is an approach to software design in which the decomposition
of a system is based upon the concept of an object. An object is an entity whose behavior is characterized
by the actions that it suffers and that it requires of other objects. Object-oriented development is
fundamentally different from traditional functional methods, for which the primary criteria for
decomposition is that each module in the system represents a major step in the overall process.

The issue, of course, is deciding which abstractions are most useful for any particular problem. We have argued
that the control view is particularly appropriate for a certain class of problems. In this case, the control view
clarifies the design in several ways:

• The control view leads us to respecify the output as the actual speed …

To read
• [required] M. Shaw. Beyond objects: A software design paradigm based on process control. ACM Software Engineering Notes, 20(1):27-38, Jan.

• [required] M. Shaw. Comparing architectural design styles. IEEE Software, 12(6):27-41, Nov. 1995.

• [required] J. Cambre and C. Kulkarni. One Voice Fits All? Social Implications and Research Challenges of Designing Voices for Smart Devices. Proc. ACM Hum.-Comput. Interact., 3(CSCW), Article 223, Nov. 2019.

To turn in
Prepare a brief written answer to each of the following questions. Write up your
answer using MS Word or LaTeX. One well-presented paragraph for each
question is sufficient.

1. What characteristics of a software design problem would make it a
good match for a process control solution style? (150 words)

2. Looking at the designs in the readings and how each attempts to be
flexible, what kinds of potential change/adaptability can you identify
for the cruise control design problem? (150 words)

3. Do you think that there should be one universal virtual assistant
template (same design in voice, gender, etc.) or that there should be
some variation? Why? (100 words)



One Voice Fits All? Social Implications and Research
Challenges of Designing Voices for Smart Devices

JULIA CAMBRE, Human-Computer Interaction Institute, Carnegie Mellon University
CHINMAY KULKARNI, Human-Computer Interaction Institute, Carnegie Mellon University

When a smart device talks, what should its voice sound like? Voice-enabled devices are becoming a ubiquitous
presence in our everyday lives. Simultaneously, speech synthesis technology is rapidly improving, making it
possible to generate increasingly varied and realistic computerized voices. Despite the flexibility and richness
of expression that technology now affords, today’s most common voice assistants often have female-sounding,
polite, and playful voices by default. In this paper, we examine the social consequences of voice design, and
introduce a simple research framework for understanding how voice affects how we perceive and interact with
smart devices. Based on the foundational paradigm of computers as social actors, and informed by research in
human-robot interaction, this framework demonstrates how voice design depends on a complex interplay
between characteristics of the user, device, and context. Through this framework, we propose a set of guiding
questions to inform future research in the space of voice design for smart devices.

CCS Concepts: • Human-centered computing → HCI theory, concepts and models.

Additional Key Words and Phrases: voice interface; voice assistants; human-robot interaction; IoT; voice
design; speech interfaces; intelligent personal assistant; voice user interface

ACM Reference Format:
Julia Cambre and Chinmay Kulkarni. 2019. One Voice Fits All? Social Implications and Research Challenges of
Designing Voices for Smart Devices. Proc. ACM Hum.-Comput. Interact. 3, CSCW, Article 223 (November 2019),
19 pages.

The human voice is rich in social information. Independent of content, the sound of a voice
conveys several signals that humans are naturally attuned to recognize, such as the gender, age,
and personality of the speaker [35, 47, 59]. As Nass and Brave (2005) note in their book Wired for
Speech, these powerful responses to voice evolved to facilitate human-human conversation, raising
a crucial research question: “How will a voice-activated brain that associates voice with
social relationships react when confronted with technologies that talk or listen?” [59].

In the years since, voice technology has become ubiquitous: already, 46% of adults in the United
States use a voice assistant on a daily basis [62], and estimates suggest that there will be upwards
of 8 billion voice assistants worldwide by 2023 [69]. While smart speakers and smartphones may be
largely driving this growth, there is also a growing trend towards embedding voice assistants in a
diverse range of “smart” devices: these range from in-car navigation and entertainment systems, to
microwaves, thermostats, and even toilets [15, 51, 81]. At the same time, speech synthesis technology
has also advanced considerably in recent years; new models based on deep neural networks, such
as WaveNet, can now generate more varied and more human-sounding speech than prior approaches
like concatenative or parametric synthesis [30, 63]. This explosion in the popularity and
pervasiveness of voice interfaces, along with rapid improvements in speech technology, adds new
urgency and complexity to the question Nass and Brave raised nearly 15 years ago.

Authors’ addresses: Julia Cambre, [email protected], Human-Computer Interaction Institute, Carnegie Mellon University,
5000 Forbes Avenue, Pittsburgh, Pennsylvania, 15213; Chinmay Kulkarni, [email protected], Human-Computer
Interaction Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania, 15213.

© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2573-0142/2019/11-ART223 $15.00

Proc. ACM Hum.-Comput. Interact., Vol. 3, No. CSCW, Article 223. Publication date: November 2019.

223:2 Julia Cambre & Chinmay Kulkarni

Within the Human-Computer Interaction and Computer-Supported Cooperative Work community,
this trend has not gone unnoticed. In recent years, researchers have studied voice assistants
from a number of angles. Several papers have explored users’ patterns of everyday use with common
voice assistants like Alexa, Siri, and the Google Assistant [3, 44, 66, 70]. Others have considered
usability challenges posed by natural language processing errors [58, 74] and future use scenarios
such as leveraging speech to navigate videos [13] or promote workplace reflection [36]. There have
also been efforts to establish a more theoretical or vision-setting perspective on voice technology:
for example, Cohen et al. [18] and Shneiderman [72] have weighed in on the merits of voice as
an interaction medium, while Murad et al. [56] proposed an initial set of design guidelines for
voice interfaces. Within the CSCW community specifically, voice interactions have also received
considerable attention in recent years, with papers and workshops on topics ranging from
accessibility [12] to automated meeting support [49], Wizard of Oz prototyping techniques [45],
privacy [37], multi-user interaction [67], and more. While these papers all offer useful perspectives
on voice interface design, their focus has almost exclusively been on what voice assistants say in
conversation, rather than on how they say it.
This paper poses a seemingly straightforward question: What should the voices of our smart
devices sound like? Specifically, as we move towards a future in which users interact through speech
with not just smartphones and smart speakers, but with an increasing array of everyday objects,
selecting a voice identity for these smart devices remains an open design challenge with important
social consequences.

This paper introduces a research framework for understanding the social implications of voice
design decisions. To demonstrate the utility of this framework, we both summarize existing
research using it and discuss a sampling of new research questions it generates. To generate
this framework, we consider the design space of smart device voices, and organize the literature
around what we know about how the features of a synthesized voice shape our interactions with
speech-enabled technology. In doing so, we rely heavily on research in human-robot interaction
(HRI), while still incorporating research from other fields such as social psychology and design.

We are not the first to propose a framework for voice design. For example, Clark et al. [16]
mapped out the existing space of research on voice in HCI through a recent review of 68 papers.
Through this review, the authors suggest a set of open challenges for the field, including a need
for further design work and studies of multi-user interaction contexts. Importantly, however, their
review deliberately excluded papers focusing on embodied interfaces. Our framework complements
Clark et al.’s review by focusing explicitly on this area of embodied voice design.

Our HRI-based perspective also distinguishes this paper from recent work that studies the design
of speech interfaces with voice in isolation. For example, Sutton et al. propose a framework based
on findings from socio-phonetics [71]. While studying voices in isolation avoids confounding the
effects of voice with the effects of embodiment, in practice embodiment, form factor, and contexts
of use do influence how people perceive voice interfaces and social robots [23, 28, 34, 50]. In
our work, we hold that these attributes are not undesirable confounds but necessary dimensions of
analysis: smart devices will necessarily possess form and contexts of use, and perhaps even
human-like embodiment. Precisely because embodiment, form, and voice together affect perception,
they should be studied in a holistic fashion. An HRI-based perspective that combines embodiment
and voice therefore offers guidance that would be difficult to obtain if these factors were studied
in isolation.

Social Implications and Research Challenges of Designing Voices for Smart Devices 223:3

The lack of research frameworks that consider embodiment might also be responsible in part
for current practice that seems to be moving towards a “one voice fits all” approach, with large
companies embedding their respective assistant services across as many supporting devices as
possible. Recent reports from Amazon indicate that there are over 28,000 Alexa-enabled smart
home devices [10], meaning that a given user could own a microwave, car, smoke detector, and
more, all of which speak with the same synthesized voice.

On the one hand, companies may favor this design choice as it helps solidify their brand identity
and ensures a more consistent experience across products. Early work on speech interfaces has also
suggested that using the same voice for multiple services can increase perceptions of intelligence in
the voice persona [59, p.105-112]. On the other hand, the trend towards uniform assistant identities
has drawn repeated criticism from the popular press [76]. Feminist HCI researchers have also justly
criticized these decisions [77], particularly because today’s main voice assistants (e.g. Siri, Alexa,
the Google Assistant) take on female, polite, and friendly voices by default in many locales. As
journalist Chandra Steele writes, “companies have repeatedly launched these products with female
voices and, in some cases, names. But when we can only see a woman, even an artificial one, in
that position, we enforce a harmful culture” [76]. Indeed, a recent report by the UN cited artificial
intelligence—and particularly the personification of many voice assistants as young women—as
responsible for perpetuating harmful gender stereotypes [82]. To us, these design decisions and their
corresponding critiques underscore the need for a framework that carefully considers embodiment,
paralinguistic aspects of voice design, and their social implications together.

This paper contributes a novel research framework for understanding the design of smart device
voices. Drawing upon literature from human-robot interaction and other fields, we synthesize three
lenses that we believe are particularly useful in voice design: user, device, and context. Through
this work, we hope to open the conversation and inform new research directions in the rapidly
evolving space of voice-based interaction.

A rich history of research suggests that human-computer interactions largely parallel human-human
interactions. In their 1994 paper, Nass et al. [61] asserted that humans engage with computers
in ways that are consistently and fundamentally social: a user behaves towards computers as
they might towards other human beings, despite knowing intuitively that computers are not
animate [61]. This theory, known as the “Computers are Social Actors” paradigm, has become an
influential blueprint for a long line of subsequent research in social computing. Through a series of
five experimental studies, Nass et al. systematically replicated several key findings from the
sociology and psychology literature on well-established patterns of interpersonal behavior.
Particularly relevant to this discussion of voice interaction, their findings suggested that people
naturally use voice (rather than a device’s physical “box”) to differentiate computer identities,
and that people automatically ascribe gender stereotypes to computers as well. The conclusion
that voice serves as the key feature for distinguishing between computers was elaborated over
two experiments, where the authors first found that users considered different computer boxes
that spoke with different voices as distinct intelligences, and built upon this to find that “subjects
responded to different voices as if they were distinct social actors, and to the same voice as if it
were the same social actor, regardless of whether the different voice was on the same or different
computer” [61]. In the context of today’s smart device ecosystem, this finding has important
implications, suggesting that users may consider all the devices that share a common voice (e.g. all
Alexa-enabled objects) to have a common intelligence [50].

The Computers are Social Actors (CASA) paradigm suggests that models of human-human interaction
could inform how users respond to social forms of technology. For example, it is possible that models of collaboration,
trust, or even social support between people may apply to interactions between people and voice
interfaces. This suggests a central role in this research for the CSCW community, which has long
studied these models.
Within the scope of this paper, we focus specifically on aspects of impression formation and
management (i.e. how users form initial impressions of others, and how they manage others’
impressions of themselves). Clearly, impression formation and management have immediate and
profound effects on interaction. Consider, for instance, the well-established phenomenon from the
social psychological literature of thin slicing, which suggests that people make rapid but often
accurate judgments from only brief glimpses of behavior. In a famous study by Ambady
and Rosenthal [2], participants were able to judge the teaching ratings of college professors from
short, silent video clips (from 10 seconds, down to even 2 seconds) with high accuracy relative
to end-of-semester student ratings [2]. Others have shown similar effects for speech. McAleer
et al. [47] investigated how robustly people could infer personality traits from an extremely
brief sample of a speaker’s voice. Participants listened to audio clips of various speakers saying the
word “hello,” resulting in sub-second exposure to each voice (recordings were 390 ms on average).
Listeners were highly consistent in how they rated perceived personality traits of the voices [47].
These results suggest that people form rapid judgments about a person’s characteristics through
their voice.

Does impression formation by thin slicing also apply to voice-based agents? Results by Chang et al.
suggest this may be the case [14]. They presented participants with 10-second clips of eight
“candidate” voices for a caregiving robot, which varied in gender, age, and personality
characteristics. Impression formation theory suggests that these short clips should be enough for
participants to associate the voices with personality traits; indeed, despite the range of potential
voice options, participants overwhelmingly preferred extroverted female voices, in line with
stereotypes of humans who take on caregiving roles.

Results such as these, and others presented in the sections that follow, suggest that because
voice-based agents, like other computers, are treated as social actors, impression formation and
management processes might have immediate and profound implications for how a device is perceived.
This observation also suggests a framework for voice design: designing voices can be seen as
analogous to impression management. Just as human impression management is mediated by
physical characteristics, traits, and behaviors in context (or on-stage) [26], voice design can be
seen as mediated by device characteristics, interactional traits with users, and contextual issues.
Other scholars have similarly hypothesized that voice design is analogous to designing for
performance (i.e. on-stage behaviors), while noting that “we are a long way from realizing a sense of
performance from speech systems” [6]. With our framework, we hope to help fill this gap.

While theories of impression management and performance provide an overall guiding principle,
what concrete features must researchers and designers focus on when designing voice interfaces?

A large body of related work from nearby fields—particularly in human-robot interaction (HRI)—
considers aspects of impression management and performance, albeit often indirectly as questions
of embodiment and paralinguistics. We draw upon these results here to inform our design space,
and to make concrete recommendations for future research. Specifically, we considered the broader
space of research on human-agent interactions (where an agent might be a robot or a disembodied
voice), with an eye towards studies on impression formation and voice characteristics.

Fig. 1. Overview of the conceptual model for smart device voice design. Here, we argue that voice design
should be considered through three lenses: user (representing aspects of the user’s identity, such as gender
and personality), device (concerning the smart device’s appearance and functionality), and context (aspects of
the situation in which the device is used, such as language and culture, or longitudinal changes). The amount
of overlap depicted between user and device characteristics (light grey) may vary depending on the designer’s
goals for the interaction.

One point of departure from our impression management metaphor is that nearly every aspect of a
voice interface is malleable, whereas humans find it challenging to change physical attributes such
as their height. Therefore, as an organizing framework we eschew the rich, nuanced models of
impression formation and take inspiration from a simple model introduced by Mutlu et al. [57].
Mutlu et al. suggest that social interaction between humans and robots emerges from three
components: user attributes, robot attributes, and task structure. In their model, user attributes
constitute demographic information like the user’s age or gender; robot attributes are aspects of
the robot’s appearance or other features that suggest personality, such as voice; and task structure
refers to whether the activity that the user and robot perform together involves cooperation,
competition, or other shared behavior like planning.

Taking these three elements as a starting point, we propose a slightly modified version of the
model for the particular use case of designing voices for smart devices, consisting of user, device,
and context. The following sections define and discuss related literature for each of these lenses.

These lenses offer one way to organize this literature, but they are not intended to be mutually
exclusive; instead, we conceptualize them as modeled in Figure 1, where the user and device lenses
may overlap to some degree, and both are situated within the broader contextual concerns we will
describe. Note that this paper introduces our framework, but the task of filling it in is far from
complete: the most obvious omission is how linguistic content affects speech-based interaction,
which is thoroughly studied in [16]. Specific linguistic aspects will enrich each of the lenses we
describe.

Finally, even though the majority of our examples below come from the HRI and Communications
communities, a growing number of studies directly address voice design in the context of smart
devices. We hope our paper offers a guiding framework for such work in the future.


3.1 User
One of the most pervasive themes that emerged from the literature was a focus on the user’s
identity, and on how personal attributes affect responses to an agent (robot or voice). Through this
lens, a person’s characteristics (e.g. their gender, personality traits, etc.) serve as an anchor; studies
that take this approach generally measure the user’s attributes and look for interaction effects based
on whether the agent’s attributes match them.
Within the domain of human-robot interaction, prior work has found that users can not only
identify personality traits in a robot based on verbal and non-verbal behavior, but are also attracted
to robots that have a personality complementary to their own [39]. In a study by Lee et al.,
participants played with an AIBO, a social robot that resembles and plays like a dog. Equal numbers
of introverted and extroverted participants were randomly assigned to interact with either an
introverted or extroverted version of the AIBO; to simulate the AIBO’s introversion / extroversion,
the researchers adjusted features like the loudness and pitch of its synthesized voice, and
manipulated the AIBO’s physical movements to match personality traits (e.g. making larger and faster
movements to signal extroversion). The study found strong evidence for a “complementarity attraction
effect” with the AIBO: participants felt more positively towards an AIBO that complemented their own
personality in introversion / extroversion, as measured by ratings of intelligence and social
attractiveness [39].

Interestingly, these findings are somewhat inconsistent with earlier work on disembodied computerized
voices, which found that users preferred voices that exhibited a personality similar to
their own [60]. As Lee et al. discuss, this discrepancy may be a consequence of how much sensory
information people have when interacting with a voice versus a robot: “we believe that there is a
fundamental difference between the interaction with disembodied agents and the interaction with
embodied agent” [39].
Other work has investigated whether a user’s age may also influence their perception of agent
voices. Chang et al. [14] explored how different synthetic voices were perceived by baby boomers in
Taiwan; their focus was on voices embedded within social robots, given the potential future
caregiving applications. In the study, participants first watched a prototype video of “ELLIQ” (a
care-giving robot that reminds users to take medicine, call family, and so on), presented with
Chinese language subtitles. Participants were then presented with 10-second clips of eight
“candidate” voices, which were prerecorded human voices. The voices were chosen to vary in gender,
age, and personality characteristics. Despite the range of potential voice options, participants
overwhelmingly tended to prefer extroverted female voices; there was no significant difference in
preference for younger- versus older-sounding voices [14].

3.2 Open questions about user-centric voice design
Individualization of voices. To what extent should the voice of a smart device be tailored to its
user? Voice assistants currently take a largely “one size fits all” approach in which each instance
of a given device takes on the same voice by default; indeed, the same voice is often used across
devices powered by the same company’s assistant software (e.g. Alexa-enabled or Google
Assistant-enabled devices). However, these studies on user characteristics suggest that the
alignment between user demographics and the demographics suggested by a device’s voice likely plays
a crucial role in shaping interaction. This invites two open questions. First, if a device is used
by more than one person, how might it adaptively individualize its voice to match multiple people?
Previous work in HRI has found that robots which engage in vocal entrainment (changing the pitch,
speaking rate, intensity, and other features of speech to mirror the user) have positive social
outcomes, improving perceived rapport and trustworthiness as well as learning [42, 43]. Such
real-time voice adaptations between smart devices and the user or users might yield similar
benefits. Second, individualization may call for voices with attributes that are either similar or
complementary to the user’s. We examine this in more detail below.

Similarity vs. complementarity. Following from the discrepant results on personality
alignment preferences with robotic agents versus disembodied computer voices, one rich area for
research is how the degree of embodiment affects similarity versus complementarity attraction
effects in voice characteristics. Lee and Nass used a desktop computer with headphones as the
source of their disembodied voice [60]. In the 18 years since, voice devices have vastly increased in
diversity. Future research may thus investigate which characteristics of embodiment work
better with similar or complementary voices.

User preferences with multiple devices. In the above studies, users were exposed to only a
single robot or voice agent. One area that remains less explored within the user-level framing is
understanding how robust these effects and preferences are across multiple devices. One possibility
for future research is to investigate voice preferences for users who are surrounded by multiple
robots or devices capable of interacting through speech. For instance, if an individual owns several
smart home devices, should the devices all take on the same voice identity, or each speak with a
subtly different voice?

3.3 Device
How might features of the device influence preferences and expectations for the device’s voice? In
what ways could a device’s appearance or stereotypes associated with its functionality affect how
people perceive it?

3.3.1 Appearance. The human-robot interaction community has long been interested in studying
anthropomorphic tendencies towards robots. As one example, Kalegina et al. [32] systematically
examined 157 robots with screen-based faces and coded for 76 nuanced features like eye color,
mouth shape, and the presence of eyebrows. Through two surveys, they identified correlations
between facial features and anthropomorphized traits; for example, robots that had cheeks were
perceived as significantly more feminine and childlike than those without, whereas robots lacking
a mouth were perceived as unfriendly and creepy [32]. Similarly, other studies have found that
minimal visual cues can activate automatic stereotypes with robots. Replicating prior results showing
that cues in a robot’s appearance affect its perceived gender, one study [31] found that robots
fashioned in gender-stereotypical ways (with pink earmuffs or with a black hat) were perceived as
female and male, respectively.

These inclinations to anthropomorphize based on superficial characteristics of a robot’s design
can also reveal implicit biases that extend from human-human interactions into human-agent
interactions. In a recent paper, Bartneck et al. found that people automatically attributed a race
to a robot based on superficial physical characteristics, and found evidence of a bias against both
Black individuals and robot agents racialized as Black in the context of the study. The experiment
adopted the “shooter bias” method from social psychology: participants were presented with a series
of images in rapid succession and asked to take the role of a police officer deciding whether
to “shoot” at the subject in the picture, who is carrying either a gun or a harmless object,
such as a cell phone or wallet [7]. Previous studies depicting human subjects have found a tendency
to shoot Black subjects more readily than White subjects. The authors also interpret their results
as pointing to a troublesome trend in which most robots are stylized in ways that lack implied
racial diversity. Whether and how people attribute race to robots is an active area of study, and
the relationships among designed color, interpreted race, and the social consequences of either
will likely become clearer over the next several years.


Similar research has specifically investigated how a robot’s appearance shapes expectations
and perceptions of the robot’s voice. From both theoretical and empirical standpoints, much of
the prior work in this space points to a fundamental mismatch between …
