Do Architects and Designers Think about
Interactivity Differently?
DAVID KIRSH, UCSD
This essay has three parts. In Part 1, I review six biases that frame the way architects and human–computer
interaction (HCI) practitioners think about their design problems. These arise from differences between working on procedurally complex tasks in peripersonal space like writing or sketching and being immersed in
larger physical spaces where we dwell and engage in body-sized activity like sitting, chatting, and moving
about. In Part 2, I explore three types of interface: classical HCI, network interfaces such as context-aware
systems, and socio-ecological interfaces. An interface for an architect is a niche that includes the very people
who interact with it. In HCI, people are still distinct from the interface. Because of this difference, architectural conceptions may be a fertile playground for HCI. The same holds for interactivity. In Part 3, I discuss
why interactivity in HCI is symmetric and transitive. Only in ecological and social interaction is it also reflexive. In ecological interfaces, people co-create bubbles of joint awareness where they share highly situated
values, experience, and knowledge.
CCS Concepts: • Human-centered computing → Human computer interaction (HCI); HCI theory, concepts and models;
Additional Key Words and Phrases: Interactivity, interface, direct manipulation, networked interaction, embodied interaction, ecological interfaces, socio-ecological, architecture, control transparency, transparency,
seeing through, control effectiveness, joint activity
ACM Reference format:
David Kirsh. 2019. Do Architects and Designers Think about Interactivity Differently? ACM Trans. Comput.-Hum. Interact. 26, 2, Article 7 (April 2019), 43 pages.
https://doi.org/10.1145/3301425
1 INTRODUCTION
University curricula in Architecture and human–computer interaction (HCI) are different, so much
so that it is hard to find evidence of cross-pollination. Why is that? Is it because one field is about
designing buildings where people dwell, while the other is about making interfaces for people to
control things or to receive services?
In what follows, I argue that the conceptual separation between the two fields is more interesting than the historical distinction between inhabiting vs. controlling, though admittedly it is
related. I distinguish six ways architects and HCI practitioners think differently about their design
problems, and argue, moreover, that these six are themselves a reflection of a fairly profound division in
how the two disciplines think about interface, interactivity, and meaning/readability. In the latter
Author’s address: D. Kirsh, Cognitive Science, UCSD, La Jolla, CA 92093-0515; email: kirsh@ucsd.edu.
two-thirds of this article, I explore what interface and interaction mean in each field, where the concepts overlap, and how they differ.
To foreshadow my conclusion, I argue that architects work with a more embodied and social
notion of humans than HCI does. HCI practitioners might do well to consider what is missing from their models of humans if they indeed take a less embodied approach. We need to explicitly ask deeper
questions about what is special about the human side of the human–computer relation. Likewise,
foundational questions about interface and interactivity have also to be asked because when we
design for humans we want to be informed by our best scientific theories of what it means for
humans to interact. These fundamental concepts shape a field. By clarifying and systematizing
these ideas, I hope to open a conversation that will lead to reframing existing views in both HCI
and Architecture.¹
A few words about my approach. For several years, I have been resident at the Bartlett School of
Architecture trying to understand the distinctive elements of architectural thought. I have had the
help of many architects who explained their views on dozens of questions over scores of hours,
related to their ways of thinking about buildings and people’s relations to them. My goal here is to
present ideas that architects often seem to assume without explicit discussion. If the result appears
to border on the philosophical, it is because the major theme of this inquiry is how embodiment
plays out in the foundations of architectural thinking and how this can inform HCI. The points I
make about interface, interactivity, and architectural readability are not ones to be found in books
or architectural articles; they are the outcome of my effort, as an outsider, to make sense of how
architects speak. The result is a somewhat novel theory of interface and interactivity and a view
of architecture that has my own cognitive bias. In the end, many of the differences between HCI
and architecture stem from the undisguisedly embodied and social nature of architecture. Social
interaction is between flesh-based humans in proximity. Although virtual and digitally mediated
beings are becoming more sophisticated and common in buildings, and we soon may share our
activity with digitally clothed colleagues, the soul of architecture remains tied to space-consuming, proximate humans – people we can shake hands with and embrace. What this means for HCI and
Architecture is inevitably a bit philosophical.
What follows next are what I believe are foundational differences in the way architects and HCI
practitioners think about design. Those impatient to see how the two fields think differently about
interfaces and interactivity may skip to Part 2 or Part 3.
2 PART 1: DIFFERENCES IN THINKING BETWEEN HCI AND ARCHITECTURE
Here are six ways architects and practitioners of a somewhat idealized form of HCI think differently about their design problems. Of course, HCI as practiced today is broad, with specialists working on topics that look very different than those of the old days of designing interfaces for computers.
Nonetheless, I believe a vestigial mindset remains that keeps architects and HCI designers from
fully understanding the other’s perspective. This difference in mindset also represents a challenge
I personally had to grapple with in order to see things the way my architectural informants do.
2.1 Social vs. Task Focus
How would you design a room like that in Figure 1: room shape, lighting, windows, ceiling, and
passageways? To design a space, an architect needs to know who will be there, how long, what
they are likely to do, how they get there and how they leave. Architects design for activities of
all sorts, from manufacturing and assembly, to professional cooking, working in foundries and
1. After completion of this article, the author found a provocative earlier effort to jump-start a dialogue between architecture and HCI. See [17].
Fig. 1. A space designed for different forms of social interaction and work. Image credit: Gensler Microsoft
London.
offices, and taking enjoyment in retail shopping. Still, it is often said that the heart of building
design is about supporting social activity rather than task performance per se. In virtually every
functional context, social encounter is important. It is a tenet of modern design that much of value
creation comes through meaningful social exchange and team work. “In Silicon Valley the tight
correlation between personal interactions, performance, and innovation is an article of faith” [84].
Architects are expected to design to facilitate personal interaction. They must focus on how space
and surfaces can be structured to help people interact. This holds as much for shopping, as for
office work, individual creative work, and even production lines, where observers and managers
must pass through.
What is special about social interaction? In physical space, it is typically face-to-face, or within
speaking distance, and it is embodied in that it takes place between space-occupying agents with
faces and backs and shared situation and social awareness. When two people are in the same space,
they share knowledge of what is around them, they can see where each is pointing [80], they can
act jointly on nearby things, and they can engage in full body interaction – what some have called
performative actions [33]. These are the starting assumptions of architects.
In much of HCI, social interaction is important too. Yet, historically, the standard interface was
designed for single users working with digital input and output [61]. Requirements were based on
functionality and task needs of that individual at her station. In Computer-supported cooperative
work (CSCW), where the focus is explicitly on collaboration and social interaction, the mindset
is group oriented and indeed joint activity is fundamental. Central questions, though, are less
about physical presence, and embodied social interaction than on sharing digital information and
coordinating computer-mediated activity. The metrics that matter most in CSCW have been (and
remain) related to task completion and effectiveness [59]. Even context-aware systems operating
ubiquitously and invisibly still have as objectives the notion of delivering a functionally specified
service [25]. Architects, perhaps to their loss, rarely have specifications that involve such task- or service-related metrics² because their focus is more on the embodied and social use of space.
2.2 Immersive
Buildings are essentially 3D forms and much larger than humans. Experience suggests that the
“spatio-temporally immersive” experiences that users have in buildings are very different than
users’ experience of artifacts that are manually controlled [2]. Working on something is different
than being in some place or working in there. When interaction is hand sized, it takes place in
peripersonal space – the space within hand and arm’s reach [72]. This certainly applies to hand-sized input devices and monitors whose WYSIWYG style reinforces the sense of hand control.
Things have changed in HCI. But the imprint of peripersonal thinking is evident even in today’s
more contextual HCI. The inspiration of direct manipulation is strong.
In buildings, interaction tends to be body-sized, taking place in extrapersonal space, where people interact among themselves and move about [81]. Navigation is not like direct manipulation;
nor is social interaction. Naturally, there are moments when people are touching knobs, handles,
and switches – situations where interaction is peripersonal – and at such times architects must
think about direct manipulation. But supporting how people act in open space requires a different
model of interaction.
Along with the neuropsychological differences between acting in peri- and extrapersonal space
[30], there is a cognitive difference related to perspective that comes when a person adopts a
first person point of view (POV) when acting [12]. When objects are small relative to us, they
can be seen from above, and from multiple angles. In principle, we could manipulate them in our
peripersonal workplace where we can see them from all angles by rotating them. Monitors bolster
this feeling of acting in a manual workspace. When things are large relative to us, however –
big pieces of furniture, or the spaces we traverse – we cannot survey them as if from heaven.
Think of the difference between playing with a doll house where one can see all corridors and
spaces from above and we are able to move “large” furniture by hand vs. moving real sofas and
walking through open space from room to room. Or consider the difference between playing with
a toy car on the floor and driving on the highway. When we play with small things, we assume
a more cosmic, god’s eye perspective that includes a feeling of control we do not have inside real
traffic or real buildings [14]. David Marr captured the spatial portion of this idea by distinguishing
object-centered representations – where we conceptualize an object as a 3D form rotatable in
any dimension (as if in a 3D graphics system) – from subject-centered representations, his 2½D
representations that are shaped 2D surfaces, viewed from a point in 3D space, like the skin on our
face viewed from a specified point in the room [53, 54]. Architects think about interactivity and
interfaces as being experienced in a world of subject-centered 2½D surfaces and social encounter
but created from countless points in 3D geometry. What does it look like from this angle, and this
perspective? See Figure 2. People inhabiting space always have a POV.
2. The absence of metrics for the efficiency and effectiveness of a designed space has often been held as a sign that architecture has yet to reach the evidence-based stage. Part of this challenge reflects the credit assignment problem: how well a building is working for its occupants, as measured by the success of activities, depends not only on design but on work practices, technology, and furniture, all acting over time, making it hard to assign credit or blame to any one component or any one time period. The result: almost all post-occupancy studies confine their tests and evaluations to light and air quality, energy efficiency, and other physical design parameters that may affect workers and building performance, even though these effects may be small compared with the impact of structural layout, interior design, architectural beauty, and novelty, not to mention team composition, work practices, social happiness, and so on. Without a good measure of occupant efficiency, effectiveness, and happiness on the one hand, and a good method of describing the meaningful structural differences of spaces on the other, architecture remains an evaluation-challenged design field.
Fig. 2. A staircase in the Villa Savoye, Poissy, France, designed by Le Corbusier and his cousin, Pierre Jeanneret, 1928–1931, is a perfect example of how architects think of structure from a ‘multi-point’ first-person point of view. Here, we see how Le Corbusier used black handrails to create Mondrian-like limned forms visible from certain angles. Image Credit: europaconcorsi.com.
People handling objects can take a more 3D view, conceptualizing the whole shape almost at once because manual objects can be so easily rotated.
The sooner HCI’s God’s eye approach is swapped for a “multi-point” first-person POV, the more
relevant an architectural perspective becomes to HCI design. It has already happened. The HCI that
increasingly matters now is woven into the fabric of our everyday lives. Interaction is moving fast
from dedicated input devices, like the mouse, joystick, wand, mobile screen, and watch, which are
all hand sized and under a user’s intentional and manual control, to devices that are less visible and
more appropriate to inhabiting space, such as ambient, sensor-based technologies that are in-body,
on-body, or distributed throughout our environments. Interaction is becoming more embodied and
tangible [32]. More implicit too [37]. Understanding new notions of interaction and embodiment is
bound to be useful as HCI pushes toward a more inhabiting, immersive model. Architecture might
offer some lessons. Certainly, notes should be compared.
2.3 Unmediated
When two people collaborate, much of their time is spent in unmediated interaction: just doing things together. There is no interface boundary between people; the relation is social. When people interact with digital systems, by contrast, their actions must be mediated in order to cross the
digital–physical boundary. Because feedback from the digital side is fast and often perceptually
realistic, it engenders a sense of direct manipulation and agency – a feeling that interaction and
control is unmediated. Nonetheless, control is always mediated because we use tools, input devices, to effect change. Even when the input device is a camera and there is no special action we
must perform to create a signal, it is still clear where the input surface is. It is the camera lens. It
constitutes a boundary that must be crossed.
There are further differences between interacting socially and interacting through a digital interface. First, when people interact, they choose where and how: with coffee nearby, a table between
them, while sitting on chairs, or a couch, with a whiteboard in easy reach or while they jointly
look at their phones. These physical elements are part of the social interface they inventively
Fig. 3. (a) There is a clear physical interface between hand and screwdriver, screwdriver and wood. A tool
mediates our actions on screws. (b) A digital mouse is a digital–physical interface point. By moving a mouse,
we interact with objects represented by the images on the screen. (c) When two people interact face to face,
it is less apparent where their interface is. Spoken language and gesture are obvious mediators and might be
said to provide an interface that both parties use as tools to pass thought and express feelings. But that view
is limited because proxemics, the shape of space and one’s surroundings all affect how people interact and
what their words and gestures mean. Nearby things are props and shareable referents that challenge the
simple idea of mediation. (d) What mediates two people shaking hands? Their joint hand action constitutes
the process of shaking hands. By definition it is unmediated.
create. If we ask where the interface is, given that there is no set of physical things that must
serve to transmit signals, or that are required to mediate social exchange, it is likely to be found
in the comportment of people and their changing relation to physical objects and others in the
space. For instance, chairs and coffee function like props on a stage; they help create the setting in
which people interact. But they don’t mediate social interaction the way mouse and touchscreen
mediate human–computer interaction; they scaffold it. There is no physical–digital boundary between people. “Props” and their layout are part of the live context where people make sense of
each other, determining the kind of person the other is, what their demeanor means, their level
of engagement. This happens through embodied presence in shared physical space. The objects in
that space can help to shape the process of social interaction [7] such as when a person moves a
chair. But these objects don’t create a boundary-like interface that must be crossed for interaction
to take place. See Figure 3.
Second, when people communicate via speech and gesture, the semantics of these communicative media are far more complicated than the semantics of input devices, where signal and interpretation are predetermined. It is tempting to suppose that social interaction is mediated by language in a precise way, and that this mediation is analogous to sending a signal in a fixed code through an input channel. As many years of research have shown, this downplays the importance of context in interpretation. The meaning of an interpersonal action, even a speech action, is highly contextual and relies on people sharing an understanding of situation and context. Speech is situated and
indexical. No simple code or input medium completely mediates this shared understanding. It is
built up, negotiated, and dependent on dozens of social cues. It relies on shared understanding and
meaning making.
Accordingly, people in buildings communicate and share action quite differently than humans
do with computers. Sometimes an object – e.g., a seesaw between children – does indeed mediate activity. But what mediates the joint activity of shaking hands? Hands are not tools. In HCI
systems, mediation is required for interaction. In social interaction, it is not required, and when
props and tools are present their mediating role is far more complex than just carrying signals.
This difference runs deep.
Fig. 4. An extreme example of appropriating an object for a purpose it was never designed for is seen in
Duchamp’s famous urinal displayed in 1917. This became known as found or appropriated art.
2.4 State Space vs. Social Space
Architectural space is not usefully conceptualized as an environment of forced choices – i.e., a network of choice points where people must choose among a small set of possible ways of occupying a
space or doing things in the space. In software-driven systems, even in networked and distributed
systems, the functionality of tools, apps, sensors, and actuators is well specified in advance. In classical HCI, specification is even narrower: it is the space of physical inputs and digital outputs defining the use of tools, buttons, sliders, and filters. Even in invisible context-aware systems, services are well specified. This means that when interacting with digital systems our actions are not
just constrained, they are state space driven. Where we have choice, our decisions come down
to choosing this or that parameterized action. This has the effect that we can’t use a tool in a
completely novel, unprogrammed manner.
How different our everyday world is. In addition to our freedom in speech, gesture, and social
action, we also are free in how we use the objects around us. If the urge takes us, we can throw
pillows from a couch to the ground where they are not meant to be placed and use them there as
backs, seats or bolsters; we can pile them on stairs and make mock chairs; we can take hot coffee
and threaten someone with it, or use it as a prop in a discussion. Although some architectural
spaces have preferred uses – kitchens, bathrooms, studies, and TV rooms – people still co-opt
space relentlessly. We inhabit space, and we appropriate the artifacts around us for our social and
practical ends [9]. See Figure 4. When one inhabits and appropriates, the relation between body,
space, cause, and effect is quite special. It is situated, embodied, embedded, distributed, enactive,
and often extended [41]. We can be inventive and creative. The result is that Markov models and AI
state space models are descriptively inadequate because those formalisms require that all options
must be represented in advance and we do not know the full range of options [34].
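To make "state space driven" concrete, here is a minimal sketch of the point in code. It is my own illustration, not any particular system's API, and all the names are invented: a digital pillow in a hypothetical furniture app can only be used in ways enumerated at design time, whereas a physical pillow can be appropriated endlessly.

```python
# A minimal sketch of a state-space-driven digital object: every possible
# action is enumerated at design time as a named, parameterized operation.
# All names here are invented for illustration.

from dataclasses import dataclass
from typing import Callable

@dataclass
class DigitalPillow:
    """A 'pillow' in a hypothetical furniture app."""
    x: float = 0.0
    y: float = 0.0
    angle: float = 0.0

    def move(self, dx: float, dy: float) -> None:   # parameterized action 1
        self.x += dx
        self.y += dy

    def rotate(self, degrees: float) -> None:       # parameterized action 2
        self.angle = (self.angle + degrees) % 360

# The object's entire option set is fixed in advance:
ACTIONS: dict[str, Callable] = {
    "move": DigitalPillow.move,
    "rotate": DigitalPillow.rotate,
}

pillow = DigitalPillow()
ACTIONS["move"](pillow, 1.0, 2.0)   # choosing "this or that parameterized action"

# A physical pillow can be piled on stairs to make a mock chair or
# brandished as a prop; this object supports nothing outside ACTIONS.
# That closed, pre-represented option set is what makes the system
# state-space describable, and everyday space not.
```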
This freedom to change things and innovate is only weakly duplicated by reconfiguration and
personalizing in digital systems. Change is lightweight in the social world. It’s nothing to reposition a chair, leave a door open, or alter the lighting. Our spaces have been designed with reconfiguration in mind. Physical space may be inhabited in multiple ways in the course of a single day.
Consider how a breakfast nook in a shared house might be used: for eating, reading a newspaper,
private studying, intimate discussion, repacking a backpack, and working on a computer. It is a
place to do many different things, and often each activity requires laying things out and moving
things around. Reuse of space is intuitive, not contrived.
Because alteration of physical space is lightweight, it happens far more frequently than personalization or customization in HCI [8]. And it is collaborative. In buildings, people regularly tailor-make their interpersonal interfaces. They move chairs to face each other or to face sideways to
avoid the feeling of confrontation. They reposition a chair to rest their foot on a ledge. Such small
changes can have outsize effects on the success of a social encounter, suggesting intimacy rather
than formality. Owing to the centrality of social activity in buildings, some architects think of an
interface as a 3D spatial ensemble, the volumes and forms that shape a communicative context.³
These sorts of interfaces come and go. They are configured by the participants in the short run,
but there is no doubt that initial room design plays a major role. And there is no doubt that architects have views about how different spaces should be architecturally arranged to support the
customary unfolding of social activity.
For instance, it is accepted wisdom that in most buildings there is an “intimacy gradient” [3]. A
personal powder room shouldn’t be by the front door, or the public space for guests right at the
back of the house, forcing them to make their way through intimate areas. These ideas are present
in architectural thinking when decisions about ceiling height, room size, and amount of natural
light must be made – all factors that affect our sense of seclusion, quietness, and privacy.
Once we accept that there are emergent interfaces – places of social encounter – that arise because of how architecture, interior design, and the momentary desires of occupants meet, we have
to wonder whether the coming wave of AI-driven context-aware systems will operate with a rich
enough model of users. Architects have learned to balance their ideas about how space should be
used with an appreciation that occupants reconfigure and repurpose.
2.5 Procedural Complexity
HCI arose from the need to tame complexity skillfully [77]. Complex systems like power plants or
plane cockpits needed to be controlled more reliably. In application environments, the problem is
often to stay in control when many steps and sub-goals are required for task performance. Almost
nothing we do in buildings is like this. A person may, at times, have to stop and think about the
steps needed to rearrange furniture; or they may take 5 minutes to create complex lighting that requires sequencing of switches. But these are exceptions, not the rule. Cooking can be complicated;
its complexity, though, is not rightly seen as architectural, since so many of the parts interacted
with are pots and pans brought into architectural space. Running machinery in factories is complex, but it is not part of the architecture. Everyone agrees that architects play an important role
in designing the space for machines to operate and pass products. Designing a production line or
a robotic playpen regularly requires architectural involvement. But the job of an architect stops
before the design of the controller interface.
So when does an architect think about facilitating long chains of interaction? To date, aside
from rooms for controlling the building, not often. Again, the reason is not that people don’t do
complex things in buildings; it is that they do not interact with a building or any of its niches in
such complex ways. That is not what building interaction or building interfaces are about. So far!
It is one explanation why the fields have historically had such separate concerns and training, and
a major reason why architecture will increasingly need HCI as greater digital interpenetration of
the physical raises the complexity of building interaction.
2.6 Sense Making
People don’t read buildings the way they do HCI interfaces. Each medium draws on different
metaphors and narrative inspirations. This idea of meaning making goes beyond the semiotics of a given design, where symbolic meaning is front and center. In Figure 5, Frank Gehry’s
Binocular building, made with the sculptor Claes Oldenburg, trumpets the idea of surveillance, seeing into the distance, being a visionary. Perhaps it is fitting that it is now owned by Google.
This is building semiotics.
3. Several of my informants promoted this view.
Fig. 5. The Binocular building in Los Angeles by Gehry and Oldenburg designed for the ad agency Chiat/Day
is a design driven by semiotics. It is more sculptural than functional. In this case, the exterior is more concerned with narrative than efficiency. Photo courtesy of wallyg, Flickr.
There is another sense of reading a building, however, that is deeper and less showy. Architects
are always looking for a good story to explain the design decisions they make about a building’s
morphology.⁴ In architecture class, the narrative that students tell about their design is often as
important as the design itself. What idea does the building implement? What was the guiding
thought? One reason that buildings often incorporate a narrative is that the design space of architecture, as any architecture professor will tell you, is so large that architects must give themselves
radical constraints to make their task doable [11]. This is a point my architectural informants
stressed. They need a guiding idea or a few distinctive features around which to create their structure. This doesn’t usually apply to designing a small extension to a residential building, or to redesigning a kitchen, where efficiency and functionality, as in HCI, drive imagination. In designing
larger structures, the architectural question is less about efficiency and functionality; it is more about narrative and making a space meaningful and special for people.
The design space in HCI is large but orders of magnitude smaller than in architecture. Narrative is important
in product design, at times, but the scope of the narrative is also smaller, the ambition less grand.
So far, people do not read HCI interfaces the way they read buildings.
Still, there are important overlaps. Part of the idea of making a space meaningful is indeed
similar to the HCI notion of affordance and making an environment readable. People need to be
able to see what they can do in a space: Which part of an HCI space, such as that defined by a
4. For a beautiful example of a building’s narrative, see [49].
‘window’, is workspace and which parts are for displaying digital tools? What can those tools do
and what must you do with them to have them execute their function? This looks like reading a
church, where one can see where to expect the sermon to be preached, or the choir to stand. Or
how one experiences a restaurant with its booths, bar stools, and interior tables. This is prosaic reading off of function, but it is an element the two fields share.
This completes my account of six areas where there are differences in the way architects and
HCI practitioners frame their problems. The two groups come at their problems differently, with
different design goals, values, and presuppositions. Context-aware computing is bound to change
this. The question is: will it change HCI into something more like architecture, with its very different notions of interface, sense making and interactivity? Or will it go its own way? It is to these
big questions I now turn.
In Part 2, I first articulate what I believe is a pervasive but implicit view among architects of what an interface is. I then contrast it with two other concepts of interface before returning to complete the architectural view. The implications of these concepts for interactivity are taken up in Part 3, where three types of interactivity are distinguished.
3 PART 2: INTERFACES
3.1 Architecture
In architecture, the interface of greatest interest is an emergent structure. It is not predetermined as
in HCI with its well-defined input and output devices. Rather, it arises from the way the volumes,
surfaces, material, furniture, colors and, above all, the location and nature of the people who occupy a space dynamically determine the structure of their interaction. How? By determining who
and what can be seen, heard, touched, what can be used in joint activity, what is in shared peripersonal and extrapersonal space and what can be assumed to be shared epistemic state. When people
choose where to sit and chat, how close to sit beside each other, the type of lighting, they co-create
an interface they are an essential part of. People constitute part of the interface for each other.
This is nothing like the concept of an interface in classical HCI or even in networked, distributed
AI-based HCI, where an interface is where inputs are captured and outputs distributed. It is an embodied and ecological view incorporating the ideas of joint activity theory and social co-presence.
A striking example of how architecture partly shapes the interface or social niche in which
people interact is seen in the way a serving hatch constrains social interaction between kitchen
and dining room in a 1960s suburban American house [22].
A hatch, also known in French as a passe-plat, narrows the space of interaction between those
who are working in the kitchen – typically the wife in that era – and the husband and guests in the
dining or drinks area. See Figure 6(a). The hatch obviously structures the passing of food and drink,
but it also structures who and what can be seen from each area – that is, the situational awareness
of each participant; it determines how well sound travels and, accordingly, the degree of full social
exchange that can take place between the areas. One would not expect a serious exchange on the
meaning of life through a hatch, nor a private tête-à-tête. In Figure 6(b), the hatch now includes a
place to sit and drink or snack. In deciding to add a hatch to a house, an architect enshrines a set of
social conventions concerning who interacts with whom, when, and how they interact – i.e., social
roles. This architectural element thus defines a social, epistemic, and functional interface. Once a
cast of people populates the space, the interface is complete and more precisely defined. Figure 6
shows how varying the design of a hatch and cooking area defines a social interface.
Materials are another component of an architectural interface. Transparent glass creates an
interface surface where events on each side can be seen but not heard; steel creates another. When
a wall is opaque and soundproof, the things on the other side are outside the interface. Floor
Fig. 6. (a) A serving hatch partially defines the social interfaces afforded by this space. It carries meaning too,
in terms of who may be expected to be in the kitchen, the social roles they play, and the scope and tenor of
conversation. (b) The hatch now includes a place to sit and drink or snack. The social relationship between
cook and diner is closer because stools allow a sushi bar style relationship. (c) The kitchen is completely
open to the main room. Food preparation is essentially a social activity, a show where guests watch or even
participate. Conversation is unconstrained.
Fig. 7. A room like this offers the possibility of many forms of social interaction: conversation, cleaning,
play, and sharing the closet. It contains many potential interfaces. Each comes into being when people with
certain relations animate the space. It is like a system of multiple overlapping niches in ecology.
materials like carpet, tile, or marble also affect the interface. Each material affects acoustics, line
of sight, ease of movement, comfort, privacy, how things feel – all attributes that help shape social
interaction. “God is in the details.”
None of these effects are deterministic. Look at the room in Figure 7. The space, surfaces, and
furniture define a collection of possible interfaces. In the same room, there is not one possible
interface; there are many, depending on who is in the room, the social conventions, the activity
the participants are engaged in, and the part of the room they occupy. Each interface, once created
by people in place, shapes the actions that are possible or invited and the actions that are denied
or inhibited. In the spirit of Simon and Newell [62], one might try representing these interfaces as
a set of task environments: one for tidying up (if the participants are cooperatively cleaning up),
another for dressing, chatting, or playing on the computer. Each task environment would define
a set of fixed choice points and option sets. When immersed in a task, subjects, according to this
model, would choose one among the task relevant options as they move from one choice point to
another.
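For readers unfamiliar with the formalism, here is a sketch of what such a task-environment representation commits one to. The encoding is my own illustration, not Newell and Simon's notation, and every name is invented:

```python
# A sketch of a task environment: fixed choice points, each with a
# pre-enumerated option set. Invented encoding, for illustration only.

TIDYING_UP = {
    "clothes_on_floor": ["hang_in_closet", "put_in_hamper", "fold_on_bed"],
    "bed_unmade":       ["make_bed", "ignore"],
    "desk_cluttered":   ["stack_papers", "file_papers"],
}

def choose(task_env: dict[str, list[str]], point: str, option: str) -> str:
    # The model admits only options listed in advance.
    if option not in task_env[point]:
        raise ValueError("option not representable in this task environment")
    return f"at {point}: {option}"

print(choose(TIDYING_UP, "bed_unmade", "make_bed"))        # fine
# choose(TIDYING_UP, "bed_unmade", "flop_down_and_chat")   # ValueError:
# speech actions and behavioral surprises fall outside every option set,
# which is why the formalism feels forced for inhabited space.
```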
As valuable as this perspective has been in studying decision making and problem solving in
some domains, such as games and well-defined problems [48], in architecture, it seems utterly
forced. Imagine two people, one sitting on the bed, the other sitting at the desk. How many different
kinds of actions are possible – including speech actions and behavioral surprises? The option set is
ill-defined [78]. No architect would try to impose task environment formalism. It is too constrictive.
Still, everyone knows that not just anything can happen in a space. Over time it is quite clear what
the distribution of different types of behavior is. Clearly, the environment promotes and inhibits
certain actions. The twist is that the people who occupy a space co-create and redefine the actions
it affords, so it is not just tasks that matter, it is the setup and the sort of interactions that the
participants tacitly agree are acceptable. This tacit negotiation of acceptable or plausible activity
is the respect in which social interfaces are organic, like biological niches. The material properties
of the niche initially constrain activity, but things often change in the course of interaction. The
implication: interaction can change the interface!
The idea that interaction might change an interface runs contrary to HCI conceptions where the
point of an interface is to define where agent and system meet and where each may have effects
on the other. The interface itself is not changed because it is already predetermined. In biology,
however, a niche is a dynamic notion that can be and usually is changed by the population that
occupies it. For instance, cows both fertilize and dry out the land they live on as their population
grows. When a new organism enters the “same” terrain, the other organisms adapt to accommodate
the “intruder” and each niche realigns as realities change and they all dynamically rejig what they
eat and how they live [69]. Organisms co-create their niche. Hence, we cannot define a niche
independently of the agent or agents who inhabit the niche and the effects of their behavior over
time. It emerges; it cannot be predetermined.
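The structural difference can be put in quasi-computational terms. Below is a toy sketch, entirely my own and purely illustrative, contrasting an interface fixed at design time with a niche whose affordances are a running function of its occupants' behavior:

```python
# Toy contrast between a predetermined interface and an emergent niche.
# All names are invented for illustration.

FIXED_INTERFACE = {"inputs": ["ok_button", "cancel_button"]}
# Set at design time; interacting with it never changes it.

class Niche:
    """A niche's affordances are co-created by whoever occupies it."""
    def __init__(self) -> None:
        self.occupants: list[str] = []
        self.affordances: set[str] = {"sit", "read"}

    def enter(self, who: str) -> None:
        self.occupants.append(who)
        if len(self.occupants) >= 2:         # each arrival realigns the niche
            self.affordances.add("joint_activity")

    def act(self, action: str) -> None:
        if action == "move_chairs_to_face":  # acting changes the niche itself
            self.affordances.add("intimate_conversation")

room = Niche()
room.enter("Ana")
room.enter("Ben")
room.act("move_chairs_to_face")
print(room.affordances)
# The affordance set grows as occupants interact: the "interface" cannot
# be fully specified independently of the agents who inhabit it.
```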
In human habitats, when one group moves furniture, their niche changes subtly. This means
that we cannot fully understand the interface that a group creates – the niche it creates – unless
we understand what the physical setup is, who the group is and the tasks or activities they do in
that space. We need to know their cultural predispositions, their etiquette, and so on. Architects
design with people in mind; they have ideas about the types of interface these people are likely
to want and create. An architect’s job is to design the flexible substrate that supports these social
and activity niches – the spatio-socio-technological interfaces that people co-create. See Figure 8.
In HCI, the model of an interface and the interaction it supports has changed as the field has
advanced [see 35, 36, 31]. In the next section, I distinguish two models: a model typical of classical HCI, also applicable to tool use more generally, and a model that fits context-aware HCI, AI-based
HCI and distributed systems HCI. Both are significantly different than the niche model of interface
that I suggest is the driver of much architectural thinking.
3.2 Direct Manipulation Interface: Classical HCI
In the early days of HCI, an interface was understood as the boundary between physical and
digital systems – where two systems meet. The original idea was that an interface is the set of I–O
channels through which a person A and a digital system B communicate and interact. The two
abut at a physical/functional surface, where user activity on input devices crosses over into the
digital world.
This boundary is most naturally thought of as an n-dimensional I–O surface. Each dimension is
an information channel – an input or output stream – each with its defined capacity and expressive
power [13]. On the human side, a person acts on known input devices that transduce physical activity into electrical signals that cross the computer boundary and are then collapsed into a small finite
language of discrete impulses that represents the signal. Interpretation of signals proceeds through
a chain of programming contexts. For example, moving a mouse may be interpreted initially as
ACM Transactions on Computer-Human Interaction, Vol. 26, No. 2, Article 7. Publication date: April 2019.
Do Architects and Designers Think about Interactivity Differently?
7:13
Fig. 8. (a) Represents an ecological niche that dynamically reshapes because of the effects the inhabitants
themselves have on their terrain and food and the arrival or departure of other species whose niches spatially overlap. (b) Represents the dependency of a niche on neighboring species. The niche of one species is
constrained by the niches of other species who live nearby. It is not predetermined. These types of dynamic
models of how a niche changes better fit the notion of an architectural interface where the social bubble
people construct depends on how many are in it, who else is around, who enters the bubble’s space, where
things are positioned in the bubble and what’s nearby, as in an open office layout. Removing a table or chair,
or opening a door, would change the socio-spatial interface, as would introducing a mobile phone or a group
printer or water cooler. Such a niche remains a social construct even if there is only one person in it, though
at such times the task and physical layout are the dominant constraints of the niche.
Fig. 9. In the classical view, an interface is an n-dimensional surface where inputs enter through one set of
dimensions and outputs exit through another set. The agent or user drives the system by choosing inputs.
The system displays feedback on the input actions as well as program results. User and system behave as a
closed loop.
moving the cursor from position (x1, y1) on the screen to position (x2, y2); at that point, it might mean grabbing a corner of a geometric cube and rotating it, which is represented internally as a cube of a certain size and color, relative position in a 3D space, and so on. Likewise, a keyboard tap on lowercase “t” might be interpreted to mean add “t” to this text field or assign value “t” to a variable, depending on programming context. Output from the digital side emerges as a change in the
visual or auditory channel of the display, or a page being physically printed, and so on. See Figure 9.
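The chain of interpretation just described can be sketched as dispatch through programming contexts: the same raw signal acquires a different meaning at each layer and in each context. The sketch below is my own minimal illustration, with invented names:

```python
# Sketch of context-dependent interpretation of input signals in a
# classical interface. The same raw event means different things in
# different programming contexts. All names are invented.

from dataclasses import dataclass

@dataclass
class RawEvent:
    device: str      # e.g., "mouse" or "keyboard"
    payload: tuple   # e.g., (dx, dy) for mouse, ("t",) for keyboard

def interpret(event: RawEvent, context: str) -> str:
    if event.device == "mouse":
        dx, dy = event.payload
        meaning = f"cursor moved by ({dx}, {dy})"      # first-level meaning
        if context == "3d_editor_drag":                # reinterpreted higher up
            meaning = "rotate the grabbed corner of the cube"
        return meaning
    if event.device == "keyboard":
        (char,) = event.payload
        if context == "text_field":
            return f"append '{char}' to the text"
        if context == "variable_entry":
            return f"assign value '{char}' to the variable"
    return "unhandled"

tap = RawEvent("keyboard", ("t",))
print(interpret(tap, "text_field"))      # append 't' to the text
print(interpret(tap, "variable_entry"))  # same keystroke, different meaning
```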
Nowhere in this old HCI model is there room to think of where the agent is. The relevance of
his sitting in a chair or lying in a bed is not part of this type of HCI interface. There is no need
to characterize the presence – social, physical, and epistemic – of anyone because people are a
construct defined over the set of predetermined actions that they can submit through the input
channels. They could be robots or remote controllers modulating mouse and other inputs. Effectively, a person is a function that takes their own goals and objectives and maps these into actions
to be performed on input devices in light of the output and feedback provided by the digital system.
Person and computer therefore form a loop: the user intentionally acts on input devices according to goals; the system reacts according to its operating system and active programs, including providing feedback on display or audio channels. The cycle recurs.
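Put schematically, the classical model treats each side as a function and the pair as a closed loop. The toy sketch below is mine, with invented stand-ins for both sides; its point is only the loop structure and what the model leaves out:

```python
# Schematic of the classical closed loop: the user maps goals plus system
# output to the next input; the system maps inputs to new output.
# Toy stand-ins, for illustration only.

def user(goal: int, feedback: int) -> int:
    """The 'person as function': choose the next input action."""
    return 1 if feedback < goal else 0      # press "increment" until done

def system(state: int, action: int) -> int:
    """Deterministic response to input, per the classical assumptions."""
    return state + action

goal, state = 5, 0
while state < goal:
    action = user(goal, feedback=state)     # user reads output, chooses input
    state = system(state, action)           # system reacts, provides feedback
print("loop settled at", state)             # the cycle recurs until the goal

# Note what the model omits: where the user sits, who else is present,
# anything not carried over the input and output channels.
```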
Architects, on the other hand, quite naturally care about where a person might be and the
sort of social, physical, and epistemic presence they may have. Architecture is about embodied
interaction between sentient, space-filling beings. When a person is viewed as a goal-directed system independent of bodily presence, then their location, inner feelings, social relations, and other non-input
interactions become irrelevant.
Another assumption of this very classical interface model is that agents know what things are
input devices, how they are to be used, what their input actions mean – as long as feedback is
well-designed and revealing. They therefore act on these devices intentionally and users feel they
are in control. Input devices are typically hand sized. This hand-based control paradigm of “I do – it does” led the community to see interaction sequences in classical HCI (really interaction patterns)
as akin to dialog where turn taking is required. Another metaphor was to see interaction as akin to
working with objects or tools, as in assembly or sketching, where agents can directly manipulate
things to cause change. Richer interfaces soon supported more types of actions, each requiring
a shift in metaphor – queries, browsing, complex tool use, controlling the parameters of a process, and visualizing. Nonetheless, the direction of interaction continued to be primarily from user
to system. As the complexity of tasks increased, the metaphor often changed from dialog to one
of support. When a task requires managing many complicated steps, where some of these steps
are dependent on completing others and sub-goal interactions become complicated, a thoughtful
interface can help the user by maintaining visibility of system and task state; it might offer recommendations or provide buffers that store interim results. In such situations, the functionality of
interfaces goes well beyond dialog to something closer to epistemic management. Nevertheless,
despite the need to change interaction metaphors, the basic structure of such an interface still
seems most naturally understood as a boundary where the user intentionally controls things via
input actions s/he understands and is aware of using; the system responds with feedback, recommendations, and operational performance from its side. The feeling of control is important.
A key step in the phenomenological feeling of being in control is that users move beyond registering a boundary between system and human [19, 57]. This is well known when applied to tools, e.g., screwdrivers, hammers, and pencils, but it also applies to classical interfaces.
To dissolve a computer boundary, two phenomenological conditions must be satisfied: (a) control transparency – a computer user must feel that her input devices, her mouse and keyboard, are “transparent” – she sees through them and controls through them to operate “directly” on their
onscreen counterparts; and (b) semantic transparency – users see beyond the immediate onscreen
effects of their actions and take a second leap of intuition by seeing through onscreen icons to
their meaning.
Here are the two components in detail:
(a) Transparent control – feeling in control: Input devices feel like gloves or glasses – they
are things a user perceives through, not things they perceive. This happens when there is
a sufficiently tight coupling with the system that users feel their actions are operating on
the onscreen digital entities directly – the icons – rather than on a mouse or screen. The
analogy usually given is with a blind person’s white cane: its distinct physical presence as a
cane dissolves as it is absorbed into the sensory system of the acting person [55]. The tip’s
contact with the ground is what is sensed, not the mass of the cane or its inertia. Just as we
rarely feel our eyes move when we saccade or look sideways, so a cane user rarely feels her
cane move because her attention is on the forces at the tip, and through those to features
of the terrain. The target of attention lies beyond the physical cane.
(b) Transparent semantics – interpreting screen events semantically: Onscreen entities are intuitively meaningful, so users see past the icons or letters to their referents or meaning.
When reading a book, no one looks at words as letter sequences like we do when looking
at non-words such as uhsbxn mckdjg. We see in semantic units. Likewise, a user, call her
Jane, doesn’t think she is dropping an icon of a file onto an icon of a folder; she thinks she
is dropping a real digital file into a real digital folder. It feels to her as if her mouse operates not just on a cursor but on a functioning cursor, one that lets her act on application objects.
This is possible, in part, because classical HCI systems are closed. Exogenous inputs are not
supposed to occur. Jane is in control. So when she drops the icon correctly, it does what
it should. Likewise, outputs are supposed to be rule determined. These deterministic rules
make it easy to learn the semantic function that maps screen entities to their referents.
We can summarize the normative model of a classical interface as a human–system relation that
satisfies the conditions shown in Table 1.
These conditions are not met in architecture where people are an intrinsic part of their interface.
Other than our individual interaction with knobs and switches etc., where a direct manipulation
model clearly does apply, there is no discernible n-dimensional interface in architecture with channels, bandwidth, and signal meanings. Because we never think of other people as input devices,⁵ it
makes no sense to think of an interface in architecture as a place where one side, a person, meets
another side, the thing controlled. Social interaction is not like that. For example, when people dance, the distinction between input and output is blurred because dancing is a joint activity without
clear directionality at each moment; control can move back and forth, or be truly joint. Or when
people use a nearby object as a prop in a conversation, they are not using it as an input device;
they jointly breathe meaning into it. So, it is absorbed into the social bubble where meaning is
situationally created.
A third respect in which architectural interfaces differ is that not all interface actions are intentional and explicit. This follows because in social interaction some interactions are implicit: body
stance, orientation, mutual distance, and facial gesture [23]. In virtue of having a body, participants always have a location, an orientation, and a relation to other people and building surfaces, whether
they want to or not. People can’t help but generate “input” in the sense of doing things that carry
meaning for others. These “inputs” are often unintentional, implicit, and non-transparent. They
are not part of the direct manipulation paradigm where agents feel they are the cause of events
across the interface boundary.
Last, architectural interfaces are not closed, deterministic, and independent. The interface
changes as people move about, as more enter, as new tools or props are brought in – all elements that violate closure. Actions that occupants perform change the interface – violating independence. And people are unpredictable, so in social interfaces input–output functions appear to have non-deterministic elements.

5. Ears and eyes might be thought of as input devices, but importantly we do not control the full signal entering these. The senses of others are not direct manipulation input devices. Moreover, were we to start thinking in this mechanistic way, the range of input devices we would have to say a person incorporates is broad and not semantically transparent. If I stroke, shove, or trip someone, for instance, how many channels of their input “system” have I acted on? All have effects, but some effects are non-informational. The ones that are informational, such as words and gestures, usually do not provide transparent feedback from the receiver to actor, echoing the meaning of the signal. We don’t tend to repeat the message or say: “copy that.” Moreover, our words are ambiguous, highly contextual, and their meaning or impact on a listener is a complex function of many things. Accordingly, the degree of control agent A has over input signals to B falls far short of the requirements for direct manipulation.
Table 1. Attributes of a Classical Interface (what each attribute means in classical HCI – direct manipulation)

Recognizable input devices – Humans know what and where all the input devices are.
Manual control – Input devices are typically hand sized and directly manipulable.
Feeling of control – Humans feel they are in control.
Explicit understanding – They explicitly know what their actions mean to the system, at least superficially. There are learnable semantic rules linking digital images onscreen to digital entities deep inside the system.
Intentional control – Humans intentionally control input according to their goals and their understanding of the meaning of actions. They are the initiators.
Closed – The entire I–O system is closed to outside factors. Only the user can cause changes in input devices and only the digital system can cause changes in output modalities.
Circular – Users can be understood as a function that maps system outputs to inputs to the system in a goal-directed manner. Changes in the ambient context of a user do not change how an optimal user would behave given their goals and beliefs, so a system need not monitor more than the signals a user provides via input devices. The low-level meaning of input is not affected by external context.
Deterministic – The system does not respond in an indeterminate or stochastic manner. It is programmed to respond to inputs in a pre-determined manner. (This also includes programs that are designed to have a random response, as long as it is pre-determined.)
Independence – Users are not part of the interface; they interact with it. Hence interactions with the interface do not change it (in the short run).
Direction of adaptation – Humans must acquire the skills necessary to use input devices appropriately. It is humans who adapt to interfaces rather than the digital systems, though of course good design makes it easier for people to master the system, and systems may make small peripheral adaptations to users. See Figure 10.
3.3 Network Interfaces: Context-Aware/Ubiquitous Systems
The differences between HCI and architectural interfaces partially fade when the system a person interacts with is context aware. Because the overall system is distributed, partly invisible, and system output may change the participants’ activity space, it may seem to us as if we are not
Fig. 10. Humans adapt themselves to classical HCI interfaces when they master the skills necessary to control digital events via input tools. Image credit: http://www.kellychiropractic.co.nz/.
on completely different sides. The boundaries seem to blur further when the “system” contains
multiple people distributed remotely, all of whom interact via local devices networked together.
Now, the “HCI” system no longer feels closed and deterministic. It does not feel closed because any one person’s actions may not have their expected effects. If two of us speak at the same time, or attempt to act on the same shared object, we cannot predict how the system will behave. Prediction fails not just because algorithms are too complex – though they often are; it fails because of the impossibility of knowing the timing of signals. We cannot predict how a network will react owing to the unpredictability of who will do what and when, what traffic there is on the net, and how dynamic adjustments are made through adaptive programming. What happens next inevitably depends on moment-to-moment exogenous factors that are unpredictable. Hence, the system feels open and non-deterministic because it partly is. Actions on the system have probabilistic outcomes. This is one profound difference between old-fashioned HCI and distributed HCI. It does not technically dissolve the boundary between physical and digital, but it feels as if it is beginning to.
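The timing point can be made concrete with a toy simulation. This is purely illustrative, not a model of any real networked system: two remote users edit a shared object, and the result depends on an arrival order that neither of them controls.

    # A toy illustration of why distributed interaction feels non-deterministic:
    # the outcome depends on message timing, which no single user controls.
    # Purely illustrative; not a model of any real networked system.
    import random

    def shared_document(events):
        doc = ""
        for user, edit in events:   # arrival order decides the result
            doc += edit
        return doc

    events = [("A", "hello "), ("B", "world ")]
    for trial in range(3):
        random.shuffle(events)      # network latency scrambles arrival order
        print(shared_document(events))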
Another difference between classical and pervasive, ubiquitous, or context-aware systems arises from passive sensors. Once sensors become an important input source, users no longer have explicit knowledge of what their actions mean to the system. Do I know what a Kinect™ represents me as doing when I pull a long espresso? Kinects often compute stick-figure movement based on infrared motion and depth detection. Other AI-based vision systems use additional methods. Do I know how to act to create input that will lead such systems to represent my current action correctly? For instance, when I want a system to know that I am filling the portafilter with ground coffee, what must I do? And what should I assume it knows? My grasp of the semantics of its input representations is incomplete. And my grasp may be gappy because not only am I uncertain what the sensors are capturing (since I may never see their output), I may also be unaware of how many sensors there are, where they are, and what they are doing right now. Sensors can “manipulate” themselves – auto-on or auto-off. Do I know which are on now and which are off? This further weakens my sense of agency and threatens my capacity to control input explicitly and intentionally. In fact, who initiates interaction, me or it?
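The gap between my action vocabulary and the system’s can be sketched in a few lines. The stick-figure encoding and the label set below are invented for illustration, not the Kinect API; the point is only that a system’s internal categories need not match the vocabulary in which I understand my own action.

    # A sketch of the semantic gap between what a person is doing and what a
    # sensing system represents. The stick-figure encoding and label set are
    # invented for illustration; this is not the Kinect API.

    frame = {                      # what the system "sees": joint coordinates
        "right_wrist": (0.41, 1.02),
        "right_elbow": (0.35, 1.20),
        "torso":       (0.30, 1.00),
    }

    def classify(frame):
        """Learned categories need not match human action vocabulary."""
        wrist_y = frame["right_wrist"][1]
        return "gesture_cluster_17" if wrist_y > 1.0 else "gesture_cluster_4"

    # The person thinks: "I am pulling a long espresso."
    # The system records something else entirely:
    print(classify(frame))         # -> gesture_cluster_17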
This radically changes the idea of an interface. If people are unaware of pervasive sensing, why should the system offer them feedback on what it is tracking? Showing such content might be distracting, like watching oneself walk upstairs.6 Worse, what the system knows or derives from
sensor input is opaque to the user, even if the raw capture were typical video and not infrared
motion. It is opaque because video input must be interpreted. Why suppose that a camera represents the current situation in the way we see it, even when it computes stick figure movement, or
that the system collecting video categorizes things in the same vocabulary we use to characterize actions, beliefs, expectations, and predictions? Its classifications may be incomprehensible to
humans. This certainly would be the case if the system relied on a classification of camera input
that was derived from machine learning of big data. Meanwhile, if people at the other end of the
sensors are observing displays, we don’t know how much of the context they see. We don’t know
much, if anything at all, about what a system is taking in about us.
Footnote 6: In retail stores, where continuous camera feeds are broadcast on visible monitors, the objective of the display is not to tell occupants what they have just done; it is to remind them that they are being recorded. This is quite a different purpose than providing feedback to facilitate transparent control.
On the output/actuator side, we do not always know what the system does in response to inputs. Perhaps it changes the heating, ventilation, and air conditioning (HVAC), load-balances electricity, or alters lighting to better see what we are doing. When humans do not know how a system responds to their actions, or they are unaware that a system is sensing and responding to them, why would they think they are interacting with something? What would be their grounds? They don’t feel in control; they don’t feel they are agents in dialogue or participants in joint activity; and often they don’t even know they are in the “sights” of a system and part of an interactive chain of action and reaction. Arguably, the only reason to say they are interacting with the system at all is that it responds to their actions and adapts to their reactions, whether they know it or not. If someone suspects they are in the presence of an adaptive or context-aware system, they may try things out. That might justify the claim of interaction. But if the system’s response relation is too complex, or the form of its response is to alter appliances they know little about, it is more of an open question whether the two are interacting. At best, the interaction would be implicit; that is, the human would implicitly know things are happening and are somehow connected to their action, but this knowledge occurs without explicit awareness. System output is too hard for us to explicitly detect and tie to input.
All this suggests that treating interfaces as an n-dimensional surface won’t quite work for context-aware, distributed systems, or for sensor-based AI-driven systems. There is still a distinction between agent and system, but the key features of the classical interface – knowing the input devices, knowing their semantics, being in intentional control of what they do and when – no longer hold for context-aware distributed systems. Because of the complex interconnections between multiple participants and processor(s), we need the expressive power of something more complex, more like a network model. See Figure 11.
See Table 2 for a comparison of the classical and network/distributed notions of an interface.
Table 2. Attributes of a Network Interface (each attribute’s value under direct manipulation vs. a network interface, and what the attribute means for ubiquitous or context-aware systems)

Recognizable input devices (direct manipulation: YES; network interface: NO): Not all input devices need be recognizable. Sensors often are hidden or invisible. We don’t know which are on.

Manually controlled (direct: YES; network: NO): Input devices are not typically hand-sized or hand controlled – they are mostly sensors.

Feeling of control (direct: YES; network: NO): You can’t feel in control if you don’t know the boundaries of a system: where the sensors are, what is being registered and recorded, or how inputs are affected by implicit parameters like the time of day, details of the situation, and whether other people are currently working with the system.

Intentional control (direct: YES; network: NO): In complex systems, how can agents know what side effects their actions may have? Or what may happen given the other players, or how the system balances everything? Smart systems make decisions for us. So how can we have intentional control of output or system behavior? Often the direction of control is from system to human. Systems can initiate interaction.

Explicit understanding (direct: YES; network: sometimes): Occasionally, actions have clear meaning/results and agents explicitly know what their actions mean to the system. But actions can cascade, lead to surprising events in the distance, or set off unintended actions by others, thereby limiting predictability and understanding. Sometimes there is explicit understanding, sometimes implicit understanding, and sometimes no understanding.

Closed (direct: YES; network: NO): The entire I–O system is open on many fronts: from multiple humans, effects of timing, or random intrusions in a sensor’s environment.

Circular (direct: YES; network: NO): Users are not assumed to implement a function that maps system outputs to the inputs they create through acting. Too often they do not know when they are creating inputs, and they have no idea what a system is doing in response. Moreover, the system is context aware and reacting to more than just that user’s behavior. There is too much openness, uncertainty, and user ignorance to treat users as a closed function from system output to user behavior.

Deterministic (direct: YES; network: NO): From any participant’s point of view, the system behaves stochastically, with unpredictable events happening because of timing, collisions, load balancing, and so on. Even if all these are under the operating system’s control, distant people can turn their own ports on or off, and this is not predictable in advance.

Independence (direct: YES; network: YES): Interactions with the interface do not change it. A system may adapt to agents by improving its service, but it does not change what inputs it registers and what outputs it can produce, nor even where the digital–physical boundaries of input and output are located.

Direction of adaptation (direct: human to system; network: system to human): The system adapts its output to users rather than user to system. Its services are in response to user needs. Hence, advanced systems will learn users’ behavior in context and adapt to maximize service quality.
With a network model, has HCI now reached the concept of an architectural interface? Not quite. What is left out is full embodiment and the way humans interact with each other in a space they share. In particular, interaction in buildings is connected to co-presence – social, physical, epistemic, and normative. This means that in buildings, agents are an intrinsic part of their interface. They co-constitute the place they are in. In a network model, co-present agents are technically still distinct from their context-aware interfaces. They are located in one place; sensors and outputs in another. Space is still the space of physics; it is not intrinsically a place [51]. Although it is not helpful to think about a context-aware interface as an n-dimensional surface, it nonetheless remains an n-dimensional interface from a purely formal viewpoint because a physical–digital boundary is still present. This changes somewhat when we incorporate robotic systems into our pervasive system. Robots reach into our world more ambiguously. Depending on their form factor, they begin to enter the edge of our social world. And some may have an autonomy that runs contrary to the notion of a ubiquitous system. But context-aware systems without robots still interface with our commonsense world through digital input and output ports.
Fig. 11. (a) In a network interface, there are many lines of connectivity between input, apps, and human agents. Because human agents do not know which nodes are active at any moment, there is no semantic transparency. They do not know what they are transmitting, what it might mean, or what its effect might be, and they do not always know when the system is producing output. There is too much unpredictable interaction for them to really know when and how they are interacting with the system. In extreme cases, the two interact only in an implicit sense. (b) Illustrates a context-aware system tracking people’s actions in a specific region. But it may also capture their speech, body temperature, posture, and other biometric information. They probably don’t know, or don’t know precisely, what is being read off of them. If they are completely ignorant, they can’t explicitly interact, and they can be said to implicitly interact only when they tacitly know things change somewhere as a consequence of walking in this area.
The bottom line is that in a network interface there is no room for the role of presence and
co-presence in all its complexity. For that, we must turn to a more embodied, ecological notion
consistent with architecture.
3.4 Architectural Interfaces: Embodied and Ecological
In architecture, but not in HCI (yet), people are part of their own interface. They are causally
embedded in the interface, constituents of it. They have a place and presence in the interface,
rather than being causes of input and recipients of output from an independent interface. This
is because when two or more people are present each sees the other as one of the entities to
interact with. They are players both on a stage and partly creating the stage. Following the stage
metaphor, when someone opens their mobile phone or moves a chair, they alter the interface. This
has weighty consequences. Since we co-create our interface, there is no full separation between
interaction and interface. Whoever or whatever else is in a place – other creatures (pets), artifacts,
and especially conspecifics – they are factors in the dynamics of interaction. They co-habit with us. When they are humans, we negotiate our interface. This means we tacitly set weak limits on
what we can do.
The idea that we co-construct our living space plays out in interesting ways. Constraints come
in epistemic, social, physical, and cultural forms. For example, where one can sit depends on where
others are sitting – a physical constraint. What one can say, read, or do depends in part on who else
is in the room and our culture. How long one might have to wait depends on the queue. Most of
these influences on our possible actions are not part of the network model, where agents interact
with the network through very specific channels and the network neither co-constructs their space
nor co-habits with them, though it may try to construct parts of it. The difference is clear because
if someone walks out of an architectural interface it substantially changes the interface, whereas
if someone walks out of range of a context-aware system the network interface remains exactly
as it was. Inputs have changed but the interface remains.
One way of thinking about a more niche-like model of interface is to reflect on joint activity.
Although not the standard way of thinking about joint activity, it is often constructive to think
about people being together as sharing bubbles: the bubble of shared situation awareness, shared
agency, shared knowledge, and common ground. Every person has social presence, epistemic
presence, normative presence, and personal agency. When others are physically close, they share
social co-presence, epistemic co-presence (distributed situated knowledge), and joint or distributed
agency. See Figure 12. Two co-present agents share non-conceptual knowledge [16]. They each
know where to look when a sound startles them. They know where the sound is, relative to themselves and relative to the other. And they know that the other knows it too. This happens without
representing the sound in a Cartesian space. It is represented in each person’s activity space. They
also have the capacity to coordinate attention, to jointly “know where” as when both listen to the
steps of an intruder [60].
Fig. 12. In social interfaces, the participants unavoidably share bubbles of joint awareness. Each has their own body and peripersonal awareness. But each also shares knowledge of much of what the other person knows about the space they share. Because they are co-present, and much like each other, they know roughly what the other experiences, what each is attending to, what each can be expected to do and not do, and the conventions, norms, do’s and don’ts that come from participation in a social group, even ad-hoc participation. Image credit: Getty free images.
Likewise, because we are human and knowledgeable, we can reason causally about events.
Causal reasoning is more than statistical Bayesian reasoning [68]. That unexpected sound: what in
the environment might have caused it? Knowledge of the other person: what can we infer from their
implicit social interaction – their stance, aggressiveness in acting, facial gesture, position in the
room, yawning [27]? Has someone entered the room? We likely know if they have, and we know
that the newcomer likely knows we know it too. What can that person see? We rapidly triangulate
lines of sight. We share much that is related to knowledge and situation awareness just because we
are both humans and in the same space at the same time and about the same size. Context-aware
systems do not share our embodied perspective. As Wittgenstein said, much of our knowledge and
capacity to coordinate depends on our having a common “form of life” [85], something that robots and context-aware systems do not share.
How does this play out in the attributes of an architectural interface model? First, to talk of people themselves as having input and output dimensions seems forced, inappropriate. We know where others are and a great deal about them, but we do not know them as things outfitted with a set of input devices and sensors. Calling people devices, or any part of them a “device,” is a category error [74]. Similar problems arise when we consider other conditions of direct manipulation, such as the semantic transparency of the effects of our actions on them, our feeling of being in control [45], or the two of us being a closed deterministic system. These do not apply because when we interact with other people, we do not directly manipulate them; they have freedom of will and interpretation. Nor do our fellow agents behave like context-aware
systems, whose raison d’etre is to provide services for people. Humans, as Kant [39] famously said,
are ends in themselves. Even human slaves have hidden moments of freedom when they are not
servants to others. We all are free to think as we like in our interior life and scratch, cough, choose
the words we speak, and so on. Machines are not. And when we do things for people, often those
things are not really services. They are things we do together; or things we implicitly do for them
without realizing it. In sitting by a fire with others, we may have no intention to interact, but just
by being close in space we may give others comfort or pleasure. We implicitly interact. The list of
differences continues. When we act with others as a joint system, there is no obvious directionality of adaptation and control, as if one party adapts to the other while the other goes about her
business deprived of awareness, or as if one party controls the other while the other is passive. The
others are by definition ‘involved’. Everything is “co”: co-adaptation, co-ordinated action, shared knowledge, and co-presence. Further, although there are always surprises when people interact – some degree of indeterminacy and unpredictability – people are different from network systems because we strive for interpredictability when acting together [43]. Networks are happy to do their work without our knowledge. To date, human–human systems are essentially different from human–computer systems. Social interfaces are not HCI interfaces. See Table 3 for a list of these differences.
Table 3. Attributes of an Architectural Interface (each attribute’s value under direct manipulation, a network interface, and an architectural interface, and what the attribute means in architectural interfaces)

Recognizable input devices (direct: YES; network: NO; architectural: N/A): Social interfaces can exist without digital devices. If devices are present on or in people, then we assume that at least the agent wearing the device recognizes it. These devices may be part of a social interface, but they are not the focus of interaction.

Manual (direct: YES; network: NO; architectural: N/A): If there are no input devices, then there are no manual input devices.

Feeling of control (direct: YES; network: NO; architectural: yes, but jointly with others if present): In familiar social contexts, we feel in control. But social interaction is a joint activity, and hence our own control is limited by the autonomy of others in the group.

Intentional (direct: YES; network: NO; architectural: mostly): It is a convention of social life that we act intentionally on others. Implicit social interaction, such as stance, intonation, and facial gesture, happens all the time, however, implying that some interaction is unintentional and implicit.

Explicit understanding (direct: YES; network: NO; architectural: partial): People tend to understand their social interactions to a first order. In the context of joint activity, consequences are reasonably well defined and predictable. The track record is less impressive for people knowing the effects of their actions on others when the point of action is unconstrained by task. Actions can lead to surprising downstream events.

Closed (direct: YES; network: NO; architectural: NO): The system of participants, furniture, props, and ambient “stuff” is open on many fronts: from multiple humans, effects of timing, and random intrusions. Social interaction is not closed to outside and unpredictable influences.

Circular (direct: YES; network: NO; architectural: NO): Users are not comprehensible as functions that map system outputs to inputs in a goal-directed manner. Joint activity results in surprises.

Deterministic (direct: YES; network: NO; architectural: NO): From any participant’s point of view, the social interface is largely predictable because social agents, when acting in a coordinated manner, try to be predictable. But other participants still do unexpected things. As a system, the interface is permeable to events from outside, so it is not deterministic.

Independence (direct: YES; network: YES; architectural: NO): Interactions with the interface often change it. Agents change where the interface is when they move, introduce props, or change their activity.

Direction of adaptation (direct: human to system; network: system to human; architectural: circular or joint): Humans co-adapt.
We turn now to the different ideas of interaction that fit these different notions of interface.
How does each interface determine a set of possible interactions? What are the formal properties
of interaction in each interface?
4 PART 3: INTERACTIVITY
I have been arguing that architects have a third concept of interaction, one that goes beyond the
interaction of direct manipulation and networked mediation. Cognition (and brains) are different
when people are nearby and share a physical space [1]. This difference provides the basis for
a type of interaction that is not tied to an interface boundary and certainly does not rely on a
digital/physical boundary. What is needed is a notion of interaction that works in social interfaces
and reflects a non-boundary orientation.
In this part, I set out to clarify what is special about human–human co-present interaction by
contrasting it with interaction as found in direct manipulation and networked mediation. I begin
with a discussion of how the different conceptions of the relation “to interact with” differ with
respect to symmetry, transitivity, and reflexivity, the formal attributes of any relation.
4.1 Formal Properties of Interaction
4.1.1 Symmetry. A natural starting point for any modern theory of interactivity is with Newton and his ideas of physical interaction. In Principia [64], Newton states that interaction occurs when two bodies reciprocally act on each other. Body a exerts a force on b, and b in turn exerts an equal and opposite force on a: a acts on b, causing b’s state to change, and b acts on a, causing a’s state to change. Call this Newtonian condition “causal bidirectionality.” Formally, it states that interaction is symmetric: whenever a interacts with b, b interacts with a. See Figure 13.
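Read as a formal relation, the Newtonian condition is easy to state and check. A minimal sketch, with relations represented as sets of ordered pairs and the example relations invented:

    # The Newtonian condition as a formal check: a relation R is symmetric
    # iff whenever (a, b) is in R, (b, a) is too. Example relations invented.

    def is_symmetric(R: set) -> bool:
        return all((b, a) in R for (a, b) in R)

    touching = {("hand", "knob"), ("knob", "hand")}   # reciprocal forces
    watching_tv = {("viewer", "screen")}              # one-way: no return action

    print(is_symmetric(touching))     # True  -> counts as interaction
    print(is_symmetric(watching_tv))  # False -> mere reaction, not interaction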
Symmetry is evident when we interact with knobs, walls, floors, or anything we touch. Touching is symmetric. When a physical attribute of the thing we touch changes, our next interaction
often changes. A floor when slippery creates a different interaction between floor and walker than
a floor when rough. Change a key property and interaction patterns change. Stairs cause a different interaction than walls, or escalators. How a person moves in each is different because of the
different shapes and textures that must be accommodated. We may expect to discover that materials, shapes, tools, and furniture give rise to characteristic patterns of interaction, and a science of architecture ought one day to tell us about these patterns.
Fig. 13. Symmetry in interaction means causal bidirectionality or reciprocity. This rules out one-way causation as a form of interaction, such as A being a stimulus that prompts a reaction in B. Interaction is always two-way, though the return action of B need not be equal and opposite. E.g., when a switch is turned a light may illuminate it, hence reciprocity; when a speaker chats with a signer, the reply to speech may be a sign.
One consequence of causal bidirectionality is that if interaction is necessarily bidirectional then
we do not interact directly with many of the things we might think we do. For example, we do
not interact with TV, movies, or media when we watch or listen, no matter how attentively, unless we change the volume or channel, unlike what some in media studies historically have suggested. Without our direct interference, our media continue to behave the way they would without us. Nor do we directly interact with buildings when we navigate through them, other than in the very basic interaction of walking on the floor, making sound, or creating moving shadows as we move – though some might reasonably argue that this proves bidirectionality. If accepted, this means that responding to signage, passing through open doors, and avoiding cordoned-off areas is not an interaction with signage, doors, or barricades; it is a one-way response by us to cues. Action omissions, such as not hitting the door, avoiding restricted areas, or not going down the wrong passage, are not interactions. They are just actions. Hence, causal bidirectionality implies that we do not directly interact with many of the things in buildings that architects take pains to design. Well-designed features provide guidance rather than pushing or pulling. Good design seems to make interaction unnecessary.
Symmetry is a strong condition. It may exclude activities we might think are interactive.7 On
the other hand, if symmetry on its own is sufficient for interactivity, then it is possible we interact
with many things we do not currently think we interact with. For instance, when we speak in a
closed room, the room’s acoustics affect our sense of how we sound. Our vocal cords affect the air, which affects the walls, and when we hear the sonic rebound our internal state is affected. Speech production is affected by speech perception, and that is affected by acoustics [73]. Are we interacting with the room? Most people, I expect, would require intentionality: if we intentionally test acoustics, then we do interact, but not otherwise. When speaking in order to test acoustics, we are explicitly listening to the response. And we get one. So yes, when we speak in a room we are interacting with the room if we intend to adapt. Others, however, might argue that intentionality is not required. We interact with a room, whether we consciously test it or not, because we
implicitly react to acoustics by unconsciously modifying how we speak. In that case, there is a loop of causation: from us to the room, from the room back to us, and then through our adaptation of speech back to the room. It can be argued that this is sufficient for interaction because, in a contrived situation where we cannot hear ourselves speak, our natural flow may be partially derailed, showing that we rely on adapting to acoustics to implicitly control our volume and clarity. Realize it or not, we implicitly interact with rooms when we speak.
Footnote 7: Symmetry is an even stronger condition if we assume that interaction is non-instantaneous. Here’s why. If A → B takes time, then B changes state after A started the interaction. That means that B → A will change A later. Since symmetry means that it doesn’t matter whether we start with B → A or A → B, isn’t there a danger that we are committed to interaction (or causation) being an unending loop – A interacts with B, which means that B interacts with A, which means that A interacts with B – not just logically but physically, leading A and B to loop interactively forever, like gravity? This raises the natural question whether interaction implies that two things must have more than one-shot bidirectional causation. That’s a worthy question, since we tend to assume that if A interacts with B then A also responds to B’s return action on A, and in a manner that leads to B’s renewed reaction. But we need not let a concern with eternal interaction derail us if we focus on a logical analysis of the meaning of the relation in abstraction from time. If interaction is a relation, it can be both symmetric and have a starting direction; only were it a function or operator would it be commutative, implying that no side has causal priority.
These two consequences of causal bidirectionality – that necessity rules out some cases we
might intuitively include and that sufficiency rules in some cases we might intuitively exclude –
highlight the complex role that intention, awareness, and sensitivity play in interaction, especially
in implicit interaction. In HCI, the classic account assumes agency because it is inspired by direct
manipulation: a user intentionally acts on an input device to create changes in a target system
[36]. This presupposes that users also have the sensitivity to be aware of the effects of their acting.
It is a good start and works well for knobs and elevators, drawers and bathrooms. But it works less
well for many architectural elements that architects care about because no one thinks of things
like materials, textures, colors, openings, and shapes as input devices. They are indeed features of
the interface, but they are not devices that we act on or try to change. We are typically passive to
them. We register them and respond to them (often unconsciously) rather than try to change them
explicitly. These adaptations, as with our speech example, might mean we do interact with architectural elements, albeit implicitly, despite our being unaware that we act on them, and unaware
of the effects our actions have on them. If we do interact with them, it is likely that the nature of
our interaction is subtler, less obvious, and might well require scientific study to discover and understand. The same concerns about causal bidirectionality and explicit agency arise for pervasive
computing where contextual sensors are ubiquitous and for everyday human–social interaction,
where a network notion or something ecological may be required rather than an explicit “I push
on it and it pushes back” notion. Explicit agency may be overrated for interaction.
One reason explicit agency is indeed overrated, I believe, is that it is easy to underappreciate how
widespread implicit agency is in our social life where we both transmit and receive implicit (unintentional) social cues and social attitudes all the time. These cues are displayed in posture, movement dynamics, interruptions, pauses, prosody, and joint gaze. Members of a co-present group
do not have to be aware of the cues they each are transmitting for those cues to have social significance and affect how each member reacts [71]. The process recurses, with reactions feeding
and amplifying other reactions. In the theory of social interaction, such unconscious, unintentional cueing is an active area of research [23, 86]. It is clear that in the social world we implicitly
interact.
Why should implicit interaction be acceptable for social interaction but unacceptable for other
forms of interaction? Isn’t it more reasonable to assume that neither intention nor awareness nor
a sense of agency is necessary for interaction as long as all parties involved in the interaction are
relevantly affected and, if human, then implicitly “aware”?
Where does this leave us? Bidirectionality has given us implicit interaction. Good. But it has
rejected most building navigation as a form of interaction. Possibly bad. At least half of my architecture informants, and about half of the architecture grad students polled (about 75 students), believed that navigating through a building is a major way of interacting with it. Assuming those architects are not just “the half that got it wrong,” is there any way of saving navigation while
retaining causal bidirectionality? To put this in concrete terms, can we explain how it makes sense
to say that inhabitants of a building interact with walls and doors when they seamlessly move
through, never touching or bumping them?
One possible resolution is to see the problem arising from a simplified notion of causation. Move beyond that and we can rethink causal bidirectionality, network interaction, and human
co-present interaction. The first step in that direction is to clarify transitivity and explain how
mediated interaction works.
4.1.2 Transitivity. Interaction is transitive when we control things through the use of other things and receive feedback or a return action in some form. Transitivity means that when a interacts with b, and b interacts in the relevant way with c, then a interacts with c: ∀a, b, c (aRb ∧ bRc) ⇒ aRc, given appropriate conditions.8 Interaction can be mediated. It often is. For example, when a tennis player makes an enviable shot, placing the ball in the cross-court corner, she may justifiably be said to have interacted with the ball. But not directly. The racket is a mediating instrument through which the player interacts with the ball. When playing is fluent, the racket is transparent to the player, since her entire concentration is on what to do with the ball: where to place it, what sort of spin or bounce to make it have. Causation and control pass through the racket en route to the ball.
The phrase “appropriate conditions” in the transitivity condition plays a vital role. Many actions that support interaction are transitive only under certain conditions. If a walks with b and b
walks with c, then a walks with c, assuming their walking is simultaneous. Walking-with is a joint
activity, and it is symmetric9 like playing tennis together or participating in a tug of war. Does
the joint activity of walking-with apply to a hundred people or to people so distributed that there
is no control, coordination, or communication between them? At some point, the transitivity of
walking-with breaks down. The conditions of application must be defined.
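The walking-with example suggests how such conditions of application might be rendered formally. A minimal sketch, assuming an invented Walk record with a time interval: composition succeeds only when the two walks share a participant and overlap in time.

    # Conditional transitivity, sketched for walking-with: composition holds
    # only while the walks overlap in time. The Walk record is invented.
    from dataclasses import dataclass

    @dataclass
    class Walk:
        a: str
        b: str
        start: float
        end: float

    def walks_with(w1: Walk, w2: Walk) -> bool:
        """w1.a walks with w2.b only if w1.b == w2.a and the walks overlap."""
        return w1.b == w2.a and w1.start < w2.end and w2.start < w1.end

    ab = Walk("a", "b", start=0, end=10)
    bc = Walk("b", "c", start=5, end=15)   # simultaneous: transitivity holds
    bd = Walk("b", "d", start=20, end=30)  # later walk: transitivity fails

    print(walks_with(ab, bc))  # True  -> a walks with c
    print(walks_with(ab, bd))  # False -> a never walks with d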
Consider these problems:
• Long causal chaining. Can a cause something that ripples transitively from b → c → d → . . . → z, such as a’s veering sideways or increasing pace? If a’s impact on b washes out before reaching z, how can a possibly interact with z? What determines the length of a causal chain? If a cannot have an effect on z, how can a and z interact?
• Epistemic requirements of forming an intention. Suppose a has a causal effect on z but doesn’t know it. Can a interact with z if a does not know of the existence of z? We’re assuming that a doesn’t need to know much. It would be enough that a is able to refer to z as “the person on the extreme right” or “the person just after the farthest one I can see.” But if a does not know that z is present, or if a lacks the referential capacity to think about z as an individual thing or person, how could a form an intention to interact with z? Even implicit intentions require the capacity to know (albeit implicitly) the difference between affecting z and not. At a minimum, a must be able to detect feedback from z. Is there some symmetric ripple-back effect that a could pick up, even implicitly, that would serve to close the loop on interaction? It could be something as small as a complaint from z, or a groan, or a stumble that somehow, in some form, gets back to a and carries enough signal for a to know that s/he was in causal connection with z (wherever z might be).
Footnote 8: Although causation lies near the heart of interaction, and causation is usually thought to be non-transitive, this is one respect in which interaction is not just bidirectional causation [Halpern & Pearl, 2005; Halpern, 2016a]. See Lewis (1973) for a defense of causal transitivity.
Footnote 9: All interactivity is symmetric, but it is symmetric in the sense that b’s return action must change a’s state, though not necessarily in the way a changes b’s state. When two people walk together, a walks with b iff b walks with a. They symmetrically act in the narrowest sense: each acts on the other through walking-with. If a directs b, then a interacts with b, but b’s action back on a may simply be that b’s changed activity is visible to a. This satisfies the condition that b provides a with feedback. Feedback may be perceptual or non-perceptual, e.g., it may change a’s non-perceptual physiological state. Walking-with also provides feedback in that b must walk within certain bounds of a and a must walk within certain bounds of b, so the two dynamically entrain each other. Typically, this will involve perception, but entrainment per se does not require perception. So the feedback from b might be in terms of force, pressure, or holding hands, and this might support long-term interaction past the point of perception.
The same question arises for every action we might consider as being interactive. If a is talking
to b and b to c, can we infer that a is talking to c through b, or that a, b, c form a group where each
speaks to every other? It depends. Talking is also a joint activity, but the rules of who, how, and
when people can “talk to” each other are different than the rules of who, how, and when people
can “walk with” each other. We need constraints on when talking to is transitive. These too are
part of the conditions of application we will have to define.
What about purely mechanical actions like steering? To control a car, a driver must interact with the steering system. The steering wheel is connected through linkages and gears (often with hydraulic or electric assist) eventually to the angle of the front tires. As the car turns, the driver receives feedback. Changes in the car’s angular velocity cause the feeling of acceleration; force feedback from the steering wheel often gives a sense of wheel resistance; its arc gives a sense of turning sharpness; and the movement as the car itself turns changes optic flow. This raises an odd question. Because of the chain of internal parts, all playing a role in turning the tires, it follows that a driver cannot turn a car without affecting all the intermediate links in the chain. Does the driver interact with those internal parts explicitly, implicitly, or not at all? The answer, predictably, depends on the nature of feedback or return action from those intermediate parts. Defining the requirements for chaining of mechanical interaction is another instance of the need for a deeper analysis of the conditions under which transitivity holds.
Our analysis of transitivity is becoming complex because interaction has several components which until now we have only incidentally distinguished. To be more precise, if a explicitly interacts with z through intermediaries b, c, . . . , y – regardless of whether z is nearby or distant – then a must have the following (rendered as a small predicate in the sketch after this list):
(1) The capacity to refer to z. a must be able to form a thought about z, a must be able to form an intention to interact with z, and this epistemic capacity implies that a has concepts or tacit categories that fit z in some way.
(2) The capacity to discern feedback from z. a must also be able to pick up information from z, so a can distinguish doing things that have an effect on z from doing things that have no effect on z. This is a second, though related, epistemic capacity that enables a to correctly identify feedback as feedback from z.
(3) Causal power over z. a’s actions must actually have a causal effect on z. This is an objective fact about a’s control over things like z.
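These three conditions can be compressed into a predicate, anticipating the driver example discussed below. The field names and referents are invented for illustration; note that an implicit referent such as “the thing making that odd sound” counts as a referent here.

    # The three conditions rendered as a predicate. All field names and
    # referents are invented; implicit referents count as referents.

    def interacts_with(agent: dict, z: str) -> bool:
        return (
            z in agent["can_refer_to"]           # (1) capacity to refer to z
            and z in agent["discerns_feedback"]  # (2) can identify feedback from z
            and z in agent["causally_affects"]   # (3) actions really affect z
        )

    driver = {
        "can_refer_to":      {"steering_wheel", "odd_sound_source"},
        "discerns_feedback": {"steering_wheel", "odd_sound_source"},
        "causally_affects":  {"steering_wheel", "odd_sound_source", "tie_rod"},
    }

    print(interacts_with(driver, "odd_sound_source"))  # True (implicitly)
    print(interacts_with(driver, "tie_rod"))           # False: no referent, no feedback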
Applying these conditions or capacities to the car driver: if we learn a is ignorant of the existence
of deeply internal steering parts, regardless of how significant their activity is for proper control,
how can a have thoughts about those parts, and so have explicit agency? To those who are not
car mechanics, the inner parts of a steering assembly are conceptually invisible. They are just
steering stuff. Without any notion of what the stuff is, how can someone set out to interact with
a specific part of the stuff? Explicit agency requires explicit intentions. And explicit intentions
require having an explicit object of thought.
What about implicit interaction: might a driver interact implicitly with inner parts of the steering system even without knowing the parts? The implicit-knowledge part of interaction mostly concerns making sense of feedback. A person who knows little about cars may still unconsciously pick up cues about moving parts inside the steering system. Perhaps the car pulls to one side in specific circumstances. Or a funny sound seems to come from the wheel area when turning suddenly. These cues may be tuned into by a person’s implicit knowledge systems [82] long before they are grasped as something identifiable by the explicit system. They can have an effect on intentionality too, because without realizing it a person may adapt their driving to prevent those sounds.
So someone may have an implicit idea of “the thing making that odd sound” and without thinking
about it explicitly they drive to minimize the sound. Condition one can be satisfied. Such drivers
also are sensitive to feedback because they can implicitly discern feedback from “the thing that
makes that odd sound.” Condition two can be satisfied. The final condition of causal efficacy can
also be met because by definition turning the steering wheel affects internal parts. Causal power
is an objective fact of causal connectivity, and again by the way we have posed the problem, they
are causally connected. Accordingly, a driver might under these conditions interact implicitly with
internal parts without realizing it. By satisfying all three conditions, a driver interacts with hidden
parts of the steering assembly. As to whether the interaction is explicit or implicit: the answer
will depend on whether the person’s beliefs about noisy internal parts are explicit or implicit – if
implicit, then they are things the driver knows about his car without realizing that he knows it.
These conditions sharpen the notion of interaction as happening, at times, through mediation. But there is more to be said about mediation. We still have not explained the difference between a person who interacts with z and feels s/he is interacting directly with it and a person who feels s/he is interacting with z only mediately, through a set of intermediaries. Both are cases of mediated interaction, but one feels direct whereas the other does not. The idea of control transparency discussed earlier requires feeling as if one is acting directly on the end target, z. There is a chain of intermediaries, but the chain is phenomenologically transparent. We see right through the chain, and act right through it, with the sense that we are acting on the end target, z, directly, much the way we see and act through a glove when picking up a board.
4.1.2.1 Transitivity and Transparency. Control transparency is the counterpart to perceptual
transparency – seeing through – that Merleau-Ponty [57] discussed with his example of long canes.
The two are connected because one never just sees through an intermediary – seeing right past it
to some more distant perceptual object – without also transparently controlling the intermediary.
And one never transparently controls through an intermediary without also transparently sensing
feedback from it. Control transparency implies transparency of seeing and seeing through implies
transparently controlling through.
Take the case of binoculars. When we look through binoculars, after a brief moment of adaptation, we become a binocular control system: moving our hands to ensure the binoculars scan or track items far beyond what we can see with the unaided eye. We don’t interpret our hand movements in measures appropriate to the space around us, where things are understood in dimensional units geared to hands and arms, the actuators intrinsic to peripersonal movement. Our sensori-motor control of binoculars is tuned to the features of the target world, the remote world. Thus, we don’t think about inching the binoculars to the right or left in local inches. We think about the target domain: moving a few inches along the branch, or a meter over to the next tree, while watching the movement of a bird. Any explicit intentions we have are rooted in concepts that are meaningful in the target domain. If the binoculars could somehow magnify like a microscope, then our distant domain would be the tiny world of bacteria and textures specific to that world, and our interpretation of lens adjustments and focus would be in terms of clarity in the microscopic domain. Only if things go wrong do we momentarily shift attention from the remote to the local world [28].
Binoculars and long sticks are extensions of our sensory system. We interact with them, through
them, to perceive, to extend sight. But they were not designed to let us cause change in remote
things. Binoculars, hearing aids and microscopes leave their targets unchanged. Hence we do not
interact with the birds we see when we track them with binoculars. Binoculars show that when we
see through something we inevitably transparently control it. But they do not show the opposite
side of the equation: when we control through something we also see through it.
Fig. 14. (a) We see through a laser pointer because when pointing, we don’t think about our hand, just the remote surface. In robotics, there are two problems: planning the position and orientation of the end-effector in Cartesian space (the pinpoint of light on a surface), and solving the inverse kinematics problem of computing, from the Cartesian path the light traces on a shaped surface, the corresponding trajectory of the manipulator in configuration space. (b) A robot arm with four joints in its start and goal configuration. The obstacle constrains the viable paths. Once a path is chosen in Cartesian space, it must be translated into a path in the space of multidimensional joint angles. Forces too must be computed at joints. In humans, the analogy is with thinking about the path the laser will mark out and letting our motor cortex determine how to move the laser pointer to achieve that effect. Image credit: http://www.coppeliarobotics.com.
To explain this, we must go beyond the standard discussion of the seeing through type of transparency. When do we feel we can act at a distance with the same control that we have with things
that are at hand? Transparent control must feel intuitive.
Let us define a system coupling as the conditions that ensure there is a tight enough causal
connection between us and the target that we can both see through and directly control outcomes
through intermediate links in a transitive chain of interaction. We can interpret our actions in
terms of their distal effects on the remote target. The feedback we receive about our actions is
reliable and complete enough that it is comparable to acting on local objects. This means that we
know how to change the remote thing without thinking, and we understand those remote changes
as the natural outcome of our actions. Thus, when we turn the TV off by hitting ‘off’ on our remote,
we understand our local action in terms of its remote effects, despite the extensive electronic mediation.
A formal account of this sort of ‘controlling through’ algorithm is found in robotics, where there is a distinction between thinking in the “goal” space and thinking in the “configuration” space [47]. In using a laser pointer to trace a contour on a screen, my thoughts are on deciding where to shine the light, not how to move the physical pointer using my shoulder, arm, elbow, wrist, and fingers. I don’t give a thought to my motor system or how it solves the inverse kinematics problem of translating my chosen pointer path in the external goal space to internal joint angles and muscle forces [40]. Unless there were an appropriate system coupling between the pointer path and my muscular control of the pointer’s angle over time, we couldn’t have this transparency. There must be a mapping from joint configurations to goal space that supports the desired movements in goal space. And the computations necessary to solve the inverse kinematics problem must be reliable and unconscious, or else we will shift our awareness from the end point of the light to our joint positions and movements. Only when things just seem to take care of themselves do I have control transparency [4]. See Figure 14. As in our binocular example, our sensori-motor sense of where we are pointing and how far things are from each other presupposes the inverse-kinematics problem is solved, for planning, executing, and interpreting. It explains why we can see through our
joints and laser pointer to the dot on the screen, and why we can transparently control that dot. The only difference between binoculars and a laser pointer is that with a laser pointer we have a remote effect on the thing we are monitoring.
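The goal-space/configuration-space division can be made concrete with the standard two-link planar arm of robotics textbooks: we pick a target in Cartesian (goal) space, and a solver returns joint angles in configuration space, much as the motor system does for us unconsciously. A minimal sketch, with arbitrary link lengths, showing one of the two textbook solutions:

    # Goal space vs. configuration space, made concrete with the standard
    # two-link planar arm from robotics textbooks. Link lengths and the
    # target are arbitrary; only the "elbow-down" solution is shown.
    from math import acos, atan2, sin, cos

    def inverse_kinematics(x, y, l1=1.0, l2=1.0):
        """Return one (shoulder, elbow) joint solution reaching (x, y), if any."""
        c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
        if not -1 <= c2 <= 1:
            return None              # target outside the reachable workspace
        elbow = acos(c2)
        shoulder = atan2(y, x) - atan2(l2 * sin(elbow), l1 + l2 * cos(elbow))
        return shoulder, elbow

    # We think in goal space ("point there"); the solver supplies the
    # configuration-space answer, as our motor system does unconsciously.
    print(inverse_kinematics(1.2, 0.8))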
Let’s apply this seeing and controlling through analysis to the transitivity of hammering a nail.
First, I must have in mind how I want the nail to move through the wood, the angle, the speed,
what to watch out for – these are my seeing through concerns. They only bear on the goal space:
hammering that nail in correctly. My attention soon shifts, however, to my grip and how to swing
the hammer to impact the nail. I can’t transparently control through the hammer yet because I
have concerns about controlling through the hammer that must be resolved before I feel I can
act at a distance – before I can transparently control the nail through the hammer. Because a
nail can shift position, I also must hold the nail rigidly to preserve the system coupling between
nail, hammer, and wood. That means that to feel in control of the coupling, there is a chain
of additional interactions beyond hand gripping hammer that I need to regulate, specifically,
ensuring the hammer strikes the nail correctly and the nail remains correctly aligned with the
wood. See Figure 15. Controlling through refers to the extra control I must exert to ensure the
causal chain works right. It depends on maintaining the right causal coupling and registering
feedback from links in the chain. I have to create and maintain a system coupling.
Fig. 15. (a) Perceiving through hammering is more complex than perceiving through a white cane because all parts must be appropriately coordinated – made into a human-hammering control system. Controlling through is a new term to describe the capacity and accompanying feeling that people have when they feel in charge of a process because of their skill in creating a reliable set of couplings. They feel they are tightly coupled with each link in a causal chain – hammer, nail, wood – hence, they feel they are responsible for the process. They are the controlling part of a causal system. (b) We see the forward side of hammering. One hand holds the hammer, the other the nail against the wood. When things are under control, causal forces move forward as they should. (c) We see the feedback side of hammering that is also necessary for control. Vision enables us to directly see the hammer striking the nail, while haptic sensing via our nail-holding hand (hand 2) enables us to directly feel the nail’s position on the wood. Haptic sensing also lets us discern distal forces as these propagate back from wood to nail to hammer. As our skill in hammering increases, we eventually sense and control things well enough to become a transparent hammering control system. It feels to us as if we see and control through the hammer so effortlessly and so successfully that we transparently act on the nail – we act at a distance.
This highlights two points: (a) transitivity presupposes a reliable system coupling; (b) sometimes this happens automatically and becomes transparent without our conscious involvement or
concern, as is the case with the joints in our arm, the modern laser mouse, the steering system
in a car. At other times, we must consciously create and maintain a reliable system coupling between multiple parts, as is the case with hammer and nail, where we need to control three things:
our grasp, the hammer-nail dynamics, and the nail-wood dynamics. When everything goes well,
and we execute the coupling well, we may begin to attain control transparency and see through
ACM Transactions on Computer-Human Interaction, Vol. 26, No. 2, Article 7. Publication date: April 2019.
Do Architects and Designers Think about Interactivity Differently?
7:31
hammer and nail to track the nail’s progress in wood. Of course, our eyes help here. Our visual
monitoring of the nail in wood is independent of our tactile sensori-motor control of hammering.
Mediated interactivity may depend on many modalities. The moral, though, is that creating and
maintaining a system coupling is central to reliably controlling through. When it becomes automatized, it enables control transparency. But it may never become automatized and we may never get
to the point where we make unconscious adjustments to keep things on track. Well-designed tools,
as most hardware tools are by now, provide the necessary affordances for achieving a measure of
control transparency. But as the number of links in a control process increases, we may need to
bring in helper tools to create a system coupling. This system coupling is the causal matrix that
guarantees chained feedback and a reliable form of chained forward force. In such cases, a person
who knows how to manage the links in the coupling will feel in charge of the coupling; they will
have a sense of controlling-through the couplings, producing an alignment between intention and
execution. They have explicit agency and explicit transitive interactivity. In the best cases, these
system couplings, despite their complexity, become transparent.
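The chained structure of controlling through can be sketched as a toy control loop. All quantities and thresholds below are invented; the point is that forward force propagates only while every coupling in the chain holds, and that feedback propagates back along the same chain.

    # A toy control loop for the chained couplings of hammering. All values
    # and thresholds are invented for illustration.

    def hammer_strike(grip_ok: bool, aligned: bool, nail_depth: float):
        """One swing: force moves forward only while every coupling holds."""
        if not (grip_ok and aligned):
            return nail_depth, "coupling broken: re-grip or re-align"
        new_depth = nail_depth + 0.2   # forward causation: the nail sinks
        feedback = "solid thud" if new_depth < 1.0 else "nail flush"
        return new_depth, feedback     # return path: felt and heard feedback

    depth = 0.0
    for swing in range(6):
        depth, fb = hammer_strike(grip_ok=True, aligned=True, nail_depth=depth)
        print(f"swing {swing}: depth={depth:.1f}, feedback={fb}")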
4.1.3 Reflexivity. Reflexivity is the last formal attribute to discuss before we can apply our analysis of interactivity to HCI and architecture. When a system interacts with itself – as when we scratch ourselves, talk to ourselves, or warm ourselves up by running – we interact with ourselves reflexively: ∀x: xRx. Reflexivity is a form of unmediated action on oneself. Walking is not a reflexive type of interaction because we cannot walk on ourselves or with ourselves. Most models of interactivity do not support reflexivity because causation is not reflexive – things are rarely self-causing.
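The formal contrast with symmetry and transitivity is small but real: reflexivity requires the pair (x, x) itself, which mirror-mediated dancing, discussed next, never supplies. A minimal sketch with invented example relations:

    # Reflexivity as a formal check: the relation must contain (x, x) itself.
    # Example relations are invented.

    def is_reflexive(R: set, domain: set) -> bool:
        return all((x, x) in R for x in domain)

    scratching = {("body", "body")}   # a part acts on the whole, unmediated
    mirror_dance = {("dancer", "mirror"), ("mirror", "dancer")}

    print(is_reflexive(scratching, {"body"}))      # True
    print(is_reflexive(mirror_dance, {"dancer"}))  # False: mediated, not reflexive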
It is easy to think that reflexivity is nothing special. Doesn’t a system interact with itself by
relying on symmetry and transitivity? For instance, dancers regularly seem to dance with themselves when in front of a mirror. But not the way two physical partners would. Their dancing with
themselves is mediated by their image: They act on the mirror, the mirror acts on them through
their perception, and they react to what they see. Owing to transitivity they act on themselves
mediately. So dancing with oneself via a mirror cannot be reflexive interaction. It is still a form of
interacting with oneself; it is just not reflexively dancing with oneself.
Dancing with another person is different. When two people dance together each is dancing
with the other, so they can change the system consisting of the two dancing by changing their
own movement. An individual dancer’s modification (stepping left instead of right) has a global
effect on joint dancing. Especially since the other is likely to adapt. When they change themselves,
they change the system they are part of.
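The part–whole point lends itself to code. In this toy model of mine (the Dancer and Duet classes are invented), a dancer's change to her own state is, without any mediating device, a change in a global property of the duet:

```python
from dataclasses import dataclass, field

@dataclass
class Dancer:
    position: float = 0.0

@dataclass
class Duet:
    a: Dancer = field(default_factory=Dancer)
    b: Dancer = field(default_factory=Dancer)

    @property
    def separation(self):        # a global property of the whole system
        return abs(self.a.position - self.b.position)

duet = Duet()
before = duet.separation
duet.a.position += 1.0            # dancer A acts only on herself...
assert duet.separation != before  # ...and has thereby changed the duet
print(duet.separation)
```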
Few systems support genuine reflexive interaction. In particular, neither classical nor networked
HCI does, because both those types of interface distinguish the agent from the system. There is still
a distinction, a boundary, between where input enters a networked system and the agent creating
the input. People cannot be a component part of a network interface; they are on the outside. It
is the network that determines where the interface is. Anything people do with themselves, say
scratch or mutter, may be picked up by sensors as inputs to the network system. But that is not
good enough for reflexivity. Interaction with the system via its digital sensors is mediated; it is no
different than acting on any other input device. True reflexivity, by contrast, requires that the very
act of interacting with oneself is an act of interacting with the system without mediation. Only
when a person is a constituent of the interface – a part of the overall system – can they change the
system without mediation. That is why scratching oneself is reflexive. An arm is part of a body, so
one part’s action affects the whole without mediation. It is also why playing tug of war is reflexive,
since one person’s extra tug is a change to the whole system as well as to themselves, or why two
people talking is reflexive because any one person’s increase in volume is an increase in the volume
of the two, or one person's quickened speech is a change in the system's communication rate. See Figure 16.
Fig. 16. (a) Reflexive interaction occurs when an entity acts on itself, thereby changing its own state. It typically requires that a part of a system act on the system as a whole. (b) Scratching oneself is reflexive interaction because one acts on oneself without the help of tools, such as a comb. Hands are part of the bigger system – the body. Scratching is both a change in a body part (it is moving) and a change in the body – scratching gives relief. (c) As one dancer changes her steps or posture, she changes her relations to her partner and their joint position. In social interaction, individual people are part of a bigger system – a social group. Some of their individual actions constitute social interactions with the group, possibly changing it. (d) Schematic of how dancer A, when moving to the right or when putting on new shoes, effectively acts on his own state; this has an effect on dancer B, whose response further affects A. The complete set of actions and reactions shapes their duet, as represented by the outer ellipse.
With these properties of interaction in mind, we can now characterize the type of interaction that goes with each interface type. Since buildings involve all three types of interface, and all three types of interaction, we will then have a more complete account of how humans interact with buildings.
4.2 Direct Manipulation Interaction
In architecture, direct manipulation correctly characterizes our everyday causal interaction with manipulable elements – switches, furniture, plumbing, and doors. We have other ways of interacting with buildings and building parts, but when facing the purely physical parts of a building in one-on-one interaction, we primarily interact directly.
Direct manipulation is symmetric and transitive, but not reflexive. The symmetry of our physical interaction is obvious: Newtonian bidirectionality. When we act on our own, without anyone nearby, interaction is essentially push-me-pull-you. Transitivity is obvious too because so much of what we do involves causal systems. Turning a knob turns the oven on; the oven begins heating the interior and starts to cook the casserole. It is a system. To make that chaining of process transparent, our devices behave predictably. Well-designed direct manipulation systems have predictable system couplings; they maintain a lawful connection between input actions, system state, and appropriate feedback. No surprises here. Ovens, voice-controlled TVs, light systems, and so on all fit the direct manipulation model. Do this here, that happens there. Feedback comes through the immediate visibility of effects, through delayed but reliable response (the oven heats up), through sound, or through onscreen displays.
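As a toy illustration of such a lawful coupling between input action, system state, and feedback, consider the following sketch; the Oven class and its numbers are my invention, not a real appliance API:

```python
class Oven:
    """Toy direct-manipulation device: lawful input -> state -> feedback."""
    def __init__(self):
        self.setpoint, self.temp = 0, 20

    def turn_knob(self, setpoint):              # input action
        self.setpoint = setpoint                # lawful state change
        return f"display: set to {setpoint} C"  # immediate feedback

    def tick(self):                             # delayed but reliable response
        if self.temp < self.setpoint:
            self.temp += 10
        return f"interior now {self.temp} C"

oven = Oven()
print(oven.turn_knob(180))   # do this here...
print(oven.tick())           # ...that happens there, predictably
```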
Reflexive interaction, however, is not supported in direct manipulation systems. Physical interaction could, in principle, be reflexive if it is part of a bigger system of interaction. Sometimes it is. But typically, physical interaction occurs at an interface where an agent a acts on physical thing b and b in turn acts on a. The two interactants are distinct. One is a human capable of self-caused actions, the other a physical thing. Both have separate trajectories. Thus, interaction between person and thing either happens at an interface – the knobs, floor, seats, or windows – or it happens through mediation, as is the case with digital systems, where we interact with our TVs, HVACs, and
wireless lights through input devices that mediate our control. The only way we might interact
with physical things in a reflexive manner is if we act on them without an interface. For that to happen, the tool, object, or artifact must be absorbed into the neural representation of the agent's body schema [52]. At that point, it becomes inappropriate to speak of direct manipulation because, to paraphrase Merleau-Ponty [57], how we control our hands is not how we control things outside us. Body parts are not in space in the same way that things outside us are.
4.3 Network Interaction
In network interaction, multiple components act and sense non-sequentially. It is hard to know
when and where input is being gathered and when and where output is being returned. The more sensors are incorporated into buildings, the more our interaction with buildings becomes network interaction.
A familiar complaint about sensor-based and context-aware systems is that humans do not feel
in control [26]. How can someone feel in control if they don’t know where sensors are, whether
they are on, or what they are capturing? Is their gesture being interpreted by cameras? Are inferences made from their speech intonation? More often than not, input is initiated by the system; people are frequently the patient, not the agent. On the output side, things are no more intelligible.
A network may respond to an input in diverse ways. It might do any one of A or B or C, or all of
them. Disjunctive output is hard to notice, and disjunctive concepts are intrinsically more difficult
to learn [21]. This makes a system's response to one's action – on those occasions when one knows that one's actions are being captured by a sensor – hard to detect and understand. Indeed, the response function of a context-aware system is usually unlearnable without
large learning sets, which agents may never encounter. Essentially, occupants of a context-aware
system are either ignorant of the system’s actions or they feel like they are flying blind. They
regularly have no idea how to make the system serve their needs.
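A small simulation suggests why disjunctive responses resist learning. In this entirely hypothetical sketch (the contexts, actions, and rule are made up), the same user action yields different outputs depending on a hidden context, so a handful of observations reveals no obvious rule:

```python
import random

def system_response(action, hidden_context):
    # a disjunctive rule the occupant never sees spelled out
    if hidden_context in ("morning", "weekend"):
        return "A"
    return "B" if action == "enter" else "C"

observations = []
for _ in range(5):
    ctx = random.choice(["morning", "afternoon", "weekend"])
    observations.append(("enter", system_response("enter", ctx)))
print(observations)   # the same action yields A or B: no obvious rule
```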
Why say that person and system interact if the person is unaware of their effect on a system, and
unaware of its response? People need to know how to introduce inputs and cause change if they
are to sensibly form explicit intentions. Without explicit intention there can be no explicit agency.
The simplest notion of human non-human interaction is when the human acts intentionally as a
voluntary agent. Yet in the social world, interaction does not have to be voluntary, intentional, or
explicit; people can interact implicitly. When people are together, whether or not they are aware
of their own implicit action, or of the other’s, or of the bidirectional relation holding between their
joint implicit actions, it is still correct to say the two are interacting. They implicitly interact when
they share space together because they mutually affect each other in a recursive manner. Think
how meaning-laden body posture, word choice, body noises, and social distance are. People can
interact without explicitly knowing they are.
Why should it be different when the thing acted on is an invisible context-aware system? This is
the question that arises for network interaction. The obvious reason not to reserve implicit interaction for human–human interaction alone is that bidirectional causation – symmetry – is sufficient
for interactivity. But that reason is not quite adequate. A caveat is needed: the return action of the
affected system must in principle be discernible and intelligible. The system or thing responding
to an agent’s action – the interactant – must produce a learnable response. This extra condition
is needed because symmetry on its own says nothing about the detectability and comprehensibility of return actions. Bidirectional causation requires only that there be a causal law governing
how the interactant reacts and that this reaction must reach the original agent. The law might be arbitrarily complex, the reaction arbitrarily small. In extreme cases, the return action might
be undetectable or so chaotic that it is humanly unlearnable. But then how could a human know
that something has occurred? The main idea of implicit interaction is that explicit knowledge of
bidirectional causation is not necessary for bidirectionality to occur. Knowledge of bidirectionality may be implicit. This still requires that the function describing the interactant’s response be
statistically learnable. It may be hard to learn and may elude explicit learning strategies. But
if agents manage to learn it implicitly then they can respond adaptively and so control or partly
control the interactant unconsciously. This means that a person doesn’t have to have explicitly
intentional control to have control – implicit control [58]. Accordingly, bidirectional causation is
sufficient as long as the function describing that causation is learnable. Interaction is an objective
fact of reciprocal responsiveness, not of awareness or explicit belief.
If this analysis is on the right track, it means that people can interact in context-aware systems
even when the system is pervasive and seemingly invisible. It is irrelevant how many steps a
network transitions through or how many steps there are in its transitive chaining of b causes c
causes d. As long as the overall function that maps inputs to the eventual outputs that reach people
is reliably implemented and is implicitly learnable, people can still interact with it despite being
unaware that they are able to. Agents can interact implicitly.
Here is an example to support this view. Imagine a case where an agent is unable to explicitly
detect whether some of his actions cause system outputs. Perhaps those outputs are lagged or
disjunctive or come in a form that is unexpected. The network system reliably produces a response;
it just goes unnoticed. The person, in fact, is impacted. But he doesn’t register that impact and so
does not see it as a response to anything he did. For example, in some buildings, the last person
to leave and the first person to arrive trigger an auto-off/auto-on switch in the HVAC. Suppose
that last person is me: how would I know I caused the heating or AC to go off? I wouldn't. I'd
know when I closed the door; but why think that triggered anything? If I am the early bird, the
lag between the HVAC turning on and my zone cooling or heating up would likely prevent me
from recognizing my effect on the system. Imagine now that the HVAC does make a detectable
sound; but either I fail to notice it or fail to connect that forgettable sound with the HVAC or
with something I did. If this happens regularly, my implicit system may pick up a regularity and
implicitly learn there is a connection between my early arrival and the HVAC. In that case, I’ve
interacted with the building implicitly. I acted on the building and it symmetrically acted on me.
Although I had no idea that I was affecting the HVAC, I did implicitly detect it. According to current
theories of explicit vs. implicit agency, having an active concept of what one wants to control is
necessary for explicit agency [79]. To form an explicit intention to do something requires explicitly
knowing what you want to do and achieve. Here, we have no such explicit judgment about cause
and effect. But we do have correlational knowledge. This is sufficient to form implicit intentions and implicit agency, because making an implicit connection does not require explicit, articulable understanding.
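The HVAC story can be caricatured as simple correlation tracking. In this sketch, the probabilities and the stored co-occurrence counts are my assumptions, not a model of implicit learning from the literature; the point is only that the statistical lift of click-after-arrival is learnable without any articulable rule:

```python
import random

counts = {"arrivals": 0, "clicks": 0, "both": 0}
DAYS = 200
for _ in range(DAYS):
    arrived_first = random.random() < 0.3
    click = arrived_first or random.random() < 0.05  # rare other causes
    counts["arrivals"] += arrived_first
    counts["clicks"] += click
    counts["both"] += arrived_first and click

# the implicit system needs only the statistical lift, not a stated rule
p_click_given_arrival = counts["both"] / max(1, counts["arrivals"])
p_click_overall = counts["clicks"] / DAYS
print(p_click_given_arrival > p_click_overall)   # True: a learnable regularity
```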
Let’s take the case further. Now I arrive early in the morning, and this time, I do in fact notice a
sound and notice that the heat comes on soon after I open and close the door. On a hunch, I
check in the evening. I hear a click, and then I hear the fans powering down. Suddenly, what was
tacit is no longer. I have a conjecture about the effects of my action, and I have confirmed it. My
explicit knowledge has changed. I now intentionally interact ever after. Of course, the feedback
is no more observable than it was before. It was there the whole time. The difference is that now
I notice it. This shows that we need to dissociate interaction from explicit agency. Otherwise, explicit knowledge alone would change the truth about interaction while leaving control and adaptive response to feedback intact.
Following this thought, let us define implicit interaction as interaction that does not require
explicit agency. It still requires implicit recognition of an association between action and some
sort of feedback. What extra does agency – explicit control – confer? In a word, planning – the
apparatus for deliberate action and intervention that comes with conceptualization and reasoning
about effects. Once I know what my actions do, I know how to cause the lights to turn on and off.
Had my interaction been implicit, I could not have made plans involving turning the lights on or
off via entry. My capacity to do things with the system increases as soon as I move from implicit
to explicit interaction.
Being in a context-aware system is not the only occasion for thinking about interfaces and interactivity in network terms, or about implicit interactivity. Arguably, network interactivity
defines much of the everyday context of inhabiting a building where we interact in both explicit
and implicit ways.
Imagine your own living room. Surrounding you is furniture, walls, functional artifacts, and
various things you might use for entertainment or as props during social interaction. Direct manipulation nicely covers your manual interactions, but what about interaction at a larger scale? For
instance, suppose you move furniture around. Which type of interaction is this: direct or network?
It depends. Often when a person shifts a chair or side table, their real concern is with the arrangement. Arrangements usually include more pieces than the few one might touch. An arrangement is a property of a collection – a global property. This invites a question: when you touch one or two pieces, are you
interacting with the whole configuration – the collection – or just the pieces you actually touch?
One reason to think you are interacting with the whole configuration is that in musical performance one interacts with the sounds coming from the group – all the other instruments – and not solely with one's own instrument. Jointly, the group creates chords, harmonies, or syncopation
when the sounds combine. Emergent musical structures would not exist without the contribution
of each performer. Moreover, the interaction is social and involves distributed cognition [44]; players co-adapt to each other’s style and activity. Even when practicing alone in a practice room with
a recording of others playing – a situation more like moving furniture because canned music is not
responsive – the sound still blends with the others, suggesting that I interact with more than just
my own instrument. Perceptually, both live and music-minus-one10 situations are broadly similar.
The walls and other items in both rooms are acoustically affected in a similar way, supporting the
idea that I am interacting in an objective sense with the whole system of walls, space, sound, and
my instrument. The interaction with other sounds is not just in my brain; it is an objective property.
Might the same also be said about moving furniture? I touch one or two pieces, but I decide
where I have to move the pieces and how to adjust them on the basis of the whole configuration.
By acting on one piece, I affect its relations to all, and so transitively one might argue I interact
with the group too. They constitute a system. In actor network theory, non-digital artifacts can
combine to form artifact ecologies, where changes in one are thought to propagate to others [46].
The Internet of Things (IoT), with its constant message passing, makes that position more credible, but the position may still apply without the IoT in many physical contexts.
Do network models of interaction offer an answer to the architectural dispute about whether
occupants interact with walls when they navigate? The question is more than cocktail amusement.
Architects passionately believe that when they design the shape of a room – say, making it round
vs. square – they affect the interaction of inhabitants. This has always puzzled me. Do they mean
(a) the shape affects interaction at certain times, while at other times it does not; or (b) the shape
affects interaction every moment but in a probabilistic way? There are possible arguments for each
side.
The obvious version of position (a) is trivial. We know that architects keep wall collision in mind
because they design walls without protrusions to prevent injury. Whenever a person walks from
10 Music-minus-one is the name given to recordings of concerti and other classical pieces where the part of the soloist or other key player is not recorded, so that musicians at home have the chance of feeling they are the performer with orchestral or ensemble backup.
Fig. 17. (a) Are the people in this art gallery interacting with the walls when they move through the space without bumping into walls? (b) Are the people in the center of this crush of people now interacting with the wall when they shift about in contact with each other and in partial response to how the outer folk bump into and move along the walls? Images: 17a, Fotolia/Dmitry Vereshchagin; 17b, Geoff Robinson.
one point to another in a building, there is a small but real chance of scuffing or bumping into a
wall. When there is a collision, an interaction occurs that architects want their design to avoid.
Wall shape affects interaction at certain times, at other times it does not.
Yet, if this were all there is to it, why should an architect be proud of a wall design that does nothing more than minimize occupants' chance of collision? Surely their pride, and their belief in a wall's interactivity during navigation, is based on something stronger, such as the way the wall
design affects the occupants’ social behavior or state of mind. This more reasonable ground for
pride suggests that walls can have additional effects on us beyond physical impact. They may shape
inhabitants’ navigational experience – the pleasure of following a slow curve or discovering an
elegant shortcut. Or perhaps, the interaction is socially mediated? I choose a corner of the room to
stand in and talk with a friend. Neither of us touches the wall though we come close. Clearly, the
wall acts on us because, for example, we always sense it and so it partly shapes how we interact
with each other, how we position ourselves, how we move relative to it and others. Our response
may not be called navigational, but it is certainly in the architectural spirit of interaction. But the
question remains: how do we affect the wall?
Here is where our position might get subtle. We interact with the wall cumulatively, a tiny bit
each time, but unnoticeably. If we move one way because the wall is round and another because
it is rectangular, we no doubt cause different wear patterns on the floor, we dirty it in different
ways, and in the long run, we may force a change in the lighting or sound or some other way we
might affect the wall. These effects are not noticeable one-off; they are cumulative and lagged. Why
should interaction with insensate objects require that bidirectional causation be large enough for
the human agent to detect it each time?
Position (b) is more compelling. We affect the wall in a more substantial way each time, but probabilistically. The argument now is that we always interact with the wall because there is always a
chance that we might touch it. The central idea behind probabilistic theories of causation is that
“causes change the probability of their effects” [29]. A wall’s shape partly determines the probability of hitting it, or hitting it in certain places, or in a certain manner. This is especially clear
if we consider what might happen if, counterfactually, we were in a group of twenty. As group
size increases, the chance that one of us will touch the wall rises significantly. See Figure 17. Since
social interaction in groups in a single room is transitive, it is enough that anyone in the group
touch the wall to warrant saying that everyone in the group interacts with (though not touches)
the wall. When an outer person touches the wall, they will move away, thereby sending a shudder
of adaptive interaction through everyone as they jointly accommodate that person’s positional
change. This physical adaptation may be unconscious for some, but it is still present. If the group size
shrinks, the probability of physical contact also shrinks. But from a probabilistic view, everyone
interacts with walls in this physical way, even when they are alone; it is just that their chance of
collision is rather low. This is the view that we “probabilistically interact” at every moment. It assumes
that to interact with something we do not need to always overtly cause changes in it in response
to its actions on us as long as periodically we do cause changes in it. The room partly determines
the probability distribution of behavior.
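The probabilistic reading is easy to quantify. Assuming, purely for illustration, that each occupant independently has a 2% chance of touching the wall in a given interval, the chance that someone in the room does so rises quickly with group size:

```python
def p_some_contact(n, p=0.02):
    """Chance that at least one of n occupants touches the wall."""
    return 1 - (1 - p) ** n   # complement of "nobody touches it"

for n in (1, 4, 20):
    print(n, round(p_some_contact(n), 3))
# 1 0.02   4 0.078   20 0.332 - causes change the probability of effects
```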
If either of these arguments is compelling, it offers support to those architects who argue that
we interact with walls when we navigate through a building, or when we collect in corners or
alcoves, even when, as individuals, we never touch their surface.
Related cases are easy to invent. Suppose in our building the lights come on across the whole
floor (as opposed to the local area where we are active) only when there are enough people over
the course of an hour, regardless of where they are. Two of us arrive first. Only the nearby lights
come on. Two more arrive. The threshold is reached and lighting for the whole floor comes on. We
are all partial causes, so presumably we all have interacted with the building. Suppose now some
of us leave before the threshold is reached. As others arrive within the hour, the lights come on
throughout. Did the early leavers interact? Any one of us might have been the last straw, but only
some of us were present to experience the effect. Must we all experience the output on every occasion
to be said to interact? Not if probabilistic interaction is acceptable. It is enough that causes change
the probability of their effects. By coming into the space, we changed the probability that the lights
would come on throughout. We don't have to receive feedback on each occasion to be a cause and
to be a probabilistic recipient – a sometimes experiencer – of an effect. If causal invariance were
required then in artistic installations, where there is only a probability that a system will respond
to any of our actions, we could not be said to interact with the system, except in those moments
when it actively responds. Returning to the lighting case, if we never know our effect, we cannot
explicitly intend to help cause the lights to go on. On the other hand, as long as we implicitly know
there is a probabilistic connection, we are able to interact implicitly with the lights.
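A sketch of the threshold-lighting case (the threshold, names, and event sequence are all invented) shows how early leavers remain partial causes of the whole-floor lights coming on:

```python
THRESHOLD = 4
events = [("alice", "in"), ("bob", "in"), ("alice", "out"),
          ("carol", "in"), ("dana", "in")]   # dana is the "last straw"

arrived, lights_on = set(), False
for person, direction in events:
    if direction == "in":
        arrived.add(person)          # cumulative arrivals within the hour
    if not lights_on and len(arrived) >= THRESHOLD:
        lights_on = True
        print(f"whole-floor lights on after {person} arrived")
print("partial causes:", sorted(arrived))   # alice counts, though she left
```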
The network model of interaction deserves much fuller development. It can provide
commentary on many of the difficult intuitions architects and others have about interacting with
collectives, holistic structures like buildings, walls when we are in groups, and so on. It still falls
short of the ecological model, though, in that it does not support reflexive interaction. We turn
now to the last model of interaction: full-blooded architectural interaction.
4.4 Interaction in Architectural Space
I have argued that only in systems containing social interfaces, or things just like them, do we
find support for reflexive interaction. That is one of the special features that architects need to bear
in mind when thinking about interaction in and with buildings. Every sort of interaction can be
found in architecture: direct manipulation, network interaction, and ecological interaction. But it is co-present social interaction – a type of ecological interaction – that differentiates the architectural design problem from network and direct manipulation design problems.
Why is interaction sometimes reflexive in architectural space? Because the very act of moving oneself can create a social interface where none existed. When two people are close enough
in physical space, especially physical space with walls, furniture and “props,” their biochemistry
changes – literally [27]. We don’t have to speak or generate behavior that resembles device input to
change our social interface; the change happens because we create and share bubbles of commonality: epistemic, value-based, experiential, and practical. By moving into range of another human,
we become a small social group, a social bubble, something bigger than each of us, but composed of
us and our context. It follows that any change we make to ourselves is a change to the group bubble.
We can act on our bubble from the inside. Changing our stance, physical distance, or gaze, or fidgeting nervously – these are all reflexive actions, and they carry social meaning that can change the bubble.
To my knowledge, reflexive interaction is not yet a topic of study in HCI. It is a marker for a type
of interactivity not subsumable in current HCI models. Overall, though, despite it being a welldefined attribute that can be used to distinguish interfaces, I suspect it will not be as important or
interesting to designers as the idea of embodied interaction and niche-style ecological interfaces,
which refer to an entire style of interaction. If there is gold in this inquiry, here is where it lies.
Why? Because architectural intuitions about interface and interaction raise hard questions about
how people share physical space. There is so much we don’t know about how people share their
social, physical, and epistemic bubbles. How they co-create a space, or a place. How they see space
as a resource to share, to coordinate activity in, to extend their mind into and to use in cognitive
ways [42]. Indeed, what is the space of shared activity [66]? Yet, despite our conceptual naiveté,
this embodied sharing is the very thing driving the new model of interaction; it constitutes the
situated aspects of being in a place that context-aware systems cannot recover or recreate.
Here is an example of a bubble that has all the aspects we need to think about. To get the
possibilities for interaction that being in this bubble offers you, you have to be an embodied human
in this very space.
When a mother clothes her child the two participate in an intensely personal joint activity. The
mother, without intending to, is teaching her child about coordination, cooperation, and shared
attention. Cartesian space is far from the mind and brain of each as spatial points of interest are
jointly defined by the two as they touch and move, together coping with the awkwardness of dressing. There is an odd sort of joint first-person perspective [83, 50] – a shared peripersonal space [10].
To clothe oneself or another, it is necessary to comply with shapes and forms. Mother and child
choreograph their movements to overcome the hassles of putting limbs in sleeves, and socks on
feet. For the participants, space is grasped in an embodied manner where both parties understand
what is happening in a highly indexical and situational manner, often using non-conceptual skills
[20]. The two construct an interdependent social reality. Those outside the duet can appreciate
but not grasp that constructed social reality in the same way owing to the neural consequences of
sharing peripersonal space [75] and the feeling of personal involvement [75]. This social reality
also includes a profound empathy for the experience of the other [65, 76]. People share experience
almost as much as they share activity. To participate in these special activities, you have to be in
close quarters with another, grasp shared peripersonal space and develop feelings of involvement.
These are hard to digitally recreate.
The embodied and situated nature of dressing highlights the complex way space is understood
by humans. Our knowledge of “where” is often grasped through relations defined by our bodies. Someone snaps their fingers. Our knowledge of where the sound comes from is not initially
encoded in objective Cartesian space. We encode it relationally; our knowledge may be better understood by our capacities to orient, to look in the right place, to be primed for additional sound
from “there” [56]. This same type of indexical, embodied knowledge holds for my knowing where
I am in relation to you, or my knowing where you are looking [15], knowing how you are moving
your limbs and how I am moving mine. When two people are together, they assume that the other
roughly hears what they hear, and each knows the other assumes that too. It is recursive. This
is the sense in which participants jointly create an epistemic bubble that goes beyond what can
be explicitly articulated in common ground11 because it includes non-conceptual elements. Simon
Penny has a thoughtful account of how non-conceptual, sensori-motor interaction lies at the heart
11 See Clark's original work on common ground (1995) and extensions by Klein et al. (2005).
of the best new media interactive installations [70]. The interactive possibilities that come from
sharing a bubble with another person are not readily recreated.
A second bubble has to do with experience and empathy [6]. When we stand beside a friend,
we likely share much of the same experience. We feel the same ambient temperature, we smell the
same smells, see the same colors and things around us. The list is longer than you might think. That
people differ at times and discuss their individual reactions is, if anything, an indication that we share so much that our differences are noteworthy. Sharing experience is possible in part because we
engage the world through active and interactive perception [24]. We are not passive; our grasp of
things around us is through sensori-motor interaction [18]. So not only do we encounter a similar
world because we are a few feet from each other; we encounter it because in all likelihood we are
participating in some form of joint activity that shapes a salience structure of the world around us
[63, 67]. Once in this bubble unique interactive possibilities open up.
A third bubble, also related to experience, is about valuation [38]. People have different tastes
and preferences, but our differences are drawn up against a background of similarity. Wittgenstein
spoke of the necessity for intersubjective sense-making bottoming out in our common “form of
life” [85]. Without knowing what it’s like to be human, wanting and liking what humans like,
feeling what humans feel, seeing the world through senses like our own, how could we establish
the sort of deep embodied experiential common ground we need to understand each other? Again,
being inside this bubble opens up interactive possibilities that differ from those outside the bubble.
To date, our digital creations do not participate in these social, co-agency bubbles where interaction is far more complex. The simplest reason is embodiment. Motor understanding of space
– shared space – and the haptic feel of materials and structure in local space are necessary for
full participation in bubbles. Joint activity often happens in joint space and sometimes this requires discovering that we share peripersonal space – it’s easy to collide. We need to fine tune our
coordination with others, and this requires acquired sensitivities. Imagine two people juggling
or carrying a couch up the stairs. The timing of force variation depends on our bodies and also
on the spatial properties of the material mediators we need for embodied interaction. Right now,
there are limitations on how successfully digital substitutes can recreate the sense of embodied
understanding that spatial co-presence gives rise to.
This body-based understanding is not reducible to intellectual understanding of problem space,
task structure, and sub-goal structure. There are other things going on that have to do with touch,
timing, force, and constructing real-time understanding of others. Architects have a deep responsibility to understand how humans jointly interact. In large halls, they can subtly change how
involved people feel by changing how visible others are and how easy it is to see the effect of
one’s single voice or body on others. There is a reason fascist and brutalist architecture is so cavernous: it overwhelms individuals, reinforcing the idea that the state is more important than the
single citizens who jointly constitute it. Does HCI face similar possibilities if it operates with a
shallow understanding of the psychology of human co-presence?
5 CONCLUSION
I have argued that HCI has pushed our notions of interface and interactivity far beyond direct
manipulation. With context-aware systems, we need to think about network interaction where
interaction can be implicit – invisible to the agent – probabilistic, operative at the group level, and bewilderingly complex. Despite this rise in complexity, interaction in network
models still resembles direct manipulation in being symmetric and transitive. But the variety of
ways interaction can be transitive and how transitivity works in networks is far more complicated
than anything we find in object-based interaction, the direct manipulation model of the classical
account. It is also significant that in network interaction, agents oftentimes are not only unaware
that they are interacting; they may not even have been the initiators of the interaction. The role of agency,
intention, and control may have to be rethought.
I argued, further, that network models and direct manipulation models of interaction do not
support interaction that is reflexive: cases analogous to a person acting on themselves, such as
laughing at oneself, scratching oneself, talking to oneself. They are not reflexive because if there is
a digital boundary between human and other, humans must interact with the digital other through
input sensors and devices. By definition, humans are separate from those digital things. There is
an interface at the digital divide. This separation between us and digital systems means that when
we close our eyes, or when we talk to ourselves – both being actions that are reflexive – we do not
engage the digital system in a reflexive manner. The system, if it is context aware, may record us,
it may react to us, but our interaction with it is mediated by its sensors.
This implies that we can interact with ourselves reflexively but with the system only mediately.
The one way we can reflexively interact with something other than ourselves is if we are an integral
part of the very system we interact with. In such cases, a simple reflexive action such as closing
one's eyes would be a reflexive interaction with something beyond us just if we are part of that
system. Only then would self-directed actions be actions on the whole system. In duets, a move
by one person has an effect on the other, and simultaneously on the duet as a whole. No tools, no
mediation, no action through an input device. In co-present social interaction, we are an intrinsic
part of the whole; this often happens in an interface provided by the built environment. I suggested
that architects have a line on social interaction and its role in design that is not yet standard in
HCI. This makes it interesting. The metaphor of this interaction is of organisms in a niche: when
we interact together in a building, we change the properties of our jointly created niche in the
process of interacting with each other.
Metaphors aside, what makes face-to-face social interaction interesting is that we have a special
relation to those we socially interact with. Specifically, we are sensitized to what others are feeling,
registering, valuing, and doing. This is a consequence of being embodied humans evolved for social
interaction. When we act jointly, we coordinate in a mutually created space conceptualized with
situationally specific concepts. These concepts may be jointly created and ad hoc [5], a type of
shared epistemic bubble. Further, our grasp of many of the things in this bubble is sensori-motor
based and non-conceptual. This makes it hard for AI and even robots to socially interact with
us in a full-blooded manner. We can interact with these inventions in a direct manipulation way,
or a network way. We can even interact in weakly social ways, exchanging words, working on
projects together. But there is something missing: empathy, shared understanding of momentary
experience, shared valuation, a sense of place. It will be interesting to see how architects change
their designs when they expect humans to be interacting with robots much more. My own guess
is that there will be less spatial intrigue in our built spaces and fewer nooks for social interaction.
Owing to the more embodied/embedded view of humans we find in architectural thinking, it
would be surprising if there were not significant differences in the way architects and HCI designers approach their design problems. I mentioned six at the outset and see these as reflecting
different requirements each field has faced historically. As the fields interpenetrate, we may expect
new metaphors of interaction and interface to emerge, to the benefit of both.
ACKNOWLEDGMENTS
I gratefully acknowledge my principal architect informants Alan Penn, Niall McLaughlin, Abel Maciel, Martha Tsigkari, Peter Scully, Ava Fatah gen Schieck, Ted Kroeger, Vasileios Papalexopoulos,
Ruairi Glynn, Chris Leung, Bob Sheil, and Sean Hanna. Among my key HCI informants I include
Yvonne Rogers, Steve Whittaker, and Don Norman. I am also grateful for the thoughtful feedback I received from Mikael Wiberg and Sarah Goldhagen, for the excellent challenges of two anonymous
TOCHI reviewers, and to Hamed Alavi for provocative conversations. I gratefully acknowledge the
financial support of the Leverhulme Trust through their Visiting Professor Award VP1-2016-011 (2016–2018) for research at The Bartlett School of Architecture.
REFERENCES
[1] R. Adolphs. 2003. Cognitive neuroscience: Cognitive neuroscience of human social behaviour. Nature Reviews Neuroscience 4, 3 (2003), 165.
[2] H. S. Alavi, E. Churchill, and D. Lalanne. 2017. The evolution of human-building interaction: An HCI perspective (Preface). Interaction Design & Architectures, 3–6.
[3] C. Alexander. 1977. A Pattern Language: Towns, Buildings, Construction. Oxford University Press.
[4] M. A. Arbib, J. B. Bonaiuto, S. Jacobs, and S. H. Frey. 2009. Tool use and the distalization of the end-effector. Psychological Research PRPF 73, 4 (2009), 441–462.
[5] L. Barsalou. 1983. Ad hoc categories. Memory and Cognition 11, 3 (1983), 211–227.
[6] B. C. Bernhardt and T. Singer. 2012. The neural basis of empathy. Annual Review of Neuroscience 35 (2012), 1–23.
[7] M. J. Bitner. 1992. Servicescapes: The impact of physical surroundings on customers and employees. Journal of marketing, 56, 2 (1992), 57–71.
[8] J. Blom. 2000. Personalization: A taxonomy. In Proceedings of the Extended Abstracts on Human Factors in Computing
Systems. ACM, 313–314.
[9] P. Bourdieu. 2017. Habitus. In Habitus: A Sense of Place. Routledge, 59–66.
[10] C. Brozzoli, G. Gentile, L. Bergouignan, and H. H. Ehrsson. 2013. A shared representation of the space near oneself
and others in the human premotor cortex. Current Biology 23, 18 (2013), 1764–1768.
[11] R. Buchanan. 1992. Wicked problems in design thinking. Design Issues 8, 2 (1992), 5–21.
[12] V. Caggiano, L. Fogassi, G. Rizzolatti, P. Thier, and A. Casile. 2009. Mirror neurons differentially encode the peripersonal and extrapersonal space of monkeys. Science 324, 5925 (2009), 403–406.
[13] S. K. Card, J. D. Mackinlay, and G. G. Robertson. 1991. A morphological analysis of the design space of input devices.
ACM Transactions on Information Systems 9, 2 (1991), 99–122.
[14] L. A. Carlson-Radvansky and D. E. Irwin. 1993. Frames of reference in vision and language: Where is above? Cognition
46, 3 (1993), 223–244.
[15] M. Chita-Tegmark. 2016. Social attention in ASD: A review and meta-analysis of eye-tracking studies. Research in
Developmental Disabilities 48 (2016), 79–93.
[16] A. Cussins. 2002. Experience, thought and activity. In Essays on Nonconceptual Content, Y. H. Gunther (Ed.). MIT
Press, 133–163.
[17] N. S. Dalton, H. Schnädelbach, M. Wiberg, and T. Varoudis. 2016. Architecture and Interaction. Springer, Cham, 978–3.
[18] F. P. De Lange, M. Spronk, R. M. Willems, I. Toni, and H. Bekkering. 2008. Complementary systems for understanding
action intentions. Current Biology 18, 6 (2008), 454–457.
[19] P. Dourish. 2004. Where the Action is: The Foundations of Embodied Interaction. MIT press.
[20] G. Evans. 1982. The Varieties of Reference. Oxford University Press.
[21] J. Feldman. 2003. The simplicity principle in human concept learning. Current Directions in Psychological Science 12,
6 (2003), 227–232.
[22] A. T. Friedman. 2006. Women and the Making of the Modern House: A Social and Architectural History. Yale University
Press.
[23] C. D. Frith and U. Frith. 2008. Implicit and explicit processes in social cognition. Neuron 60, 3 (2008), 503–510.
[24] V. Gallese. 2003. The roots of empathy: The shared manifold hypothesis and the neural basis of intersubjectivity.
Psychopathology 36, 4 (2003), 171–180.
[25] T. Gu, H. K. Pung, and D. Q. Zhang. 2005. A service-oriented middleware for building context-aware services. Journal
of Network and Computer Applications 28, 1 (2005), 1–18.
[26] B. Hardian, J. Indulska, and K. Henricksen. 2008. Exposing contextual information for balancing software autonomy
and user control in context-aware systems. In Proceedings of the Workshop on Context-Aware Pervasive Communities:
Infrastructures, Services and Applications.
[27] R. Hari, L. Henriksson, S. Malinen, and L. Parkkonen. 2015. Centrality of social interaction in human brain function.
Neuron 88, 1 (2015), 181–193.
[28] M. Heidegger. 1962. Being and Time. 1927. (Trans. John Macquarrie and Edward Robinson). Harper, New York.
[29] C. Hitchcock. 2018. Probabilistic causation. In The Stanford Encyclopedia of Philosophy (Fall 2018 Edition), E. N. Zalta (Ed.). Retrieved from https://plato.stanford.edu/archives/fall2018/entries/causation-probabilistic/.
[30] N. P. Holmes and C. Spence. 2004. The body schema and multisensory representation(s) of peripersonal space. Cognitive Processing 5, 2 (2004), 94–105.
[31] K. Hornbæk and A. Oulasvirta. 2017. What is interaction? In Proceedings of the 2017 CHI Conference on Human Factors
in Computing Systems. ACM, 5040–5052.
[32] E. Hornecker. 2011. The role of physicality in tangible and embodied interactions. Interactions 18, 2 (2011), 19–23.
[33] E. Hornecker and J. Buur. 2006. Getting a grip on tangible interaction: A framework on physical space and social
interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 437–446.
[34] R. A. Howard. 2012. Dynamic Probabilistic Systems: Markov Models, Vol. 1. Courier Corporation.
[35] L. E. Janlert and E. Stolterman. 2017a. Things That Keep Us Busy: The Elements of Interaction. MIT Press.
[36] L. E. Janlert and E. Stolterman. 2017b. The meaning of interactivity—Some proposals for definitions and measures.
Human–Computer Interaction 32, 3 (2017b), 103–138.
[37] W. Ju and L. Leifer. 2008. The design of implicit interactions: Making interactive systems less obnoxious. Design Issues
24, 3 (2008), 72–84.
[38] J. W. Kable and P. W. Glimcher. 2007. The neural correlates of subjective value during intertemporal choice. Nature
Neuroscience 10, 12 (2007), 1625.
[39] I. Kant. 1993. Grounding for the Metaphysics of Morals: With on a Supposed Right to Lie Because of Philanthropic Concerns. Hackett Publishing.
[40] O. Khatib. 1987. A unified approach for motion and force control of robot manipulators: The operational space formulation. IEEE Journal on Robotics and Automation 3, 1 (1987), 43–53.
[41] D. Kirsh. 2013. Embodied cognition and the magical future of interaction design. ACM Transactions on Computer-Human Interaction 20, 1 (2013), 3.
[42] D. Kirsh. 1995. The intelligent use of space. Artificial Intelligence 73, 1–2 (1995), 31–68.
[43] G. Knoblich, S. Butterfill, and N. Sebanz. 2011. Psychological research on joint action: Theory and data. In Psychology
of Learning and Motivation, Vol. 54. Academic Press, 59–101.
[44] J. Krueger. 2014. Affordances and the musically extended mind. Frontiers in Psychology 4 (2014), 1003.
[45] S. Kühn, M. Brass, and P. Haggard. 2013. Feeling in control: Neural correlates of experience of agency. Cortex 49, 7
(2013), 1935–1942.
[46] B. Latour. 2005. Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford University Press.
[47] T. Lozano-Perez. 1990. Spatial Planning: A Configuration Space Approach. In Autonomous Robot Vehicles, Ingemar J.
Cox and Gordon T. Wilfong (Eds.). Springer, New York, NY, 259–271.
[48] G. F. Luger. 2005. Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Pearson Education.
[49] W. L. MacDonald. 2002. The Pantheon: Design, Meaning, and Progeny. Harvard University Press.
[50] L. Maister, F. Cardini, G. Zamariola, A. Serino, and M. Tsakiris. 2015. Your place or mine: Shared sensory experiences
elicit a remapping of peripersonal space. Neuropsychologia 70 (2015), 455–461.
[51] J. Malpas. 2018. Place and Experience: A Philosophical Topography. Routledge.
[52] A. Maravita and A. Iriki. 2004. Tools for the body (schema). Trends in Cognitive Sciences 8, 2 (2004), 79–86.
[53] D. Marr. 1976. Early processing of visual information. Philosophical Transactions of the Royal Society London B 275,
942 (1976), 483–519.
[54] D. Marr and E. Hildreth. 1980. Theory of edge detection. Proceedings of the Royal Society B 207, 1167 (1980), 187–217.
[55] M. Martel, L. Cardinali, A. C. Roy, and A. Farnè. 2016. Tool-use: An open window into body representation and its
plasticity. Cognitive Neuropsychology 33, 1–2 (2016), 82–101.
[56] J. J. McDonald, W. A. Teder-Sälejärvi, and S. A. Hillyard. 2000. Involuntary orienting to sound improves visual perception. Nature 407, 6806 (2000), 906.
[57] M. Merleau-Ponty. 1945/1962. Phenomenology of Perception. C. Smith (trans.). Routledge, New York and London.
Originally published in French as Phénoménologie de la Perception.
[58] J. W. Moore, D. Middleton, P. Haggard, and P. C. Fletcher. 2012. Exploring implicit and explicit aspects of sense of
agency. Consciousness and Cognition 21, 4 (2012), 1748–1753.
[59] M. J. Muller, J. Freyne, C. Dugan, D. R. Millen, and J. Thom-Santelli. 2009. Return on contribution (ROC): A metric
for enterprise social software. In Proceedings of the European Conference on Computer-Supported Cooperative Work.
Springer, London, 143–150.
[60] P. Mundy and L. Newell. 2007. Attention, joint attention, and social cognition. Current Directions in Psychological
Science 16, 5 (2007), 269–274.
[61] B. Myers, S. E. Hudson, and R. Pausch. 2000. Past, present, and future of user interface software tools. ACM Transactions on Computer-Human Interaction 7, 1 (2000), 3–28.
[62] A. Newell and H. A. Simon. 1972. Human Problem Solving. Prentice-Hall, Englewood Cliffs, NJ.
[63] A. Nowak, R. R. Vallacher, M. Zochowski, and A. Rychwalska. 2017. Functional synchronization: The emergence of
coordinated activity in human systems. Frontiers in Psychology 8 (2017).
[64] I. Newton, A. Motte, and N. W. Chittenden. 1850. Newton’s Principia: The Mathematical Principles of Natural Philosophy. George P. Putnam.
[65] K. N. Ochsner, K. Knierim, D. H. Ludlow, J. Hanelin, T. Ramachandran, G. Glover, and S. C. Mackey. 2004. Reflecting
upon feelings: An fMRI study of neural systems supporting the attribution of emotion to self and other. Journal of
Cognitive Neuroscience 16, 10 (2004), 1746–1772.
[66] E. Pacherie. 2011. The Phenomenology of Joint Action: Self-Agency versus Joint Agency. In Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience, Axel Seemann (Ed.). MIT Press.
[67] T. Parr and K. J. Friston. 2017. Working memory, attention, and salience in active inference. Scientific Reports 7, 1
(2017), 14678.
[68] J. Pearl and D. Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.
[69] P. B. Pearman, A. Guisan, O. Broennimann, and C. F. Randin. 2008. Niche dynamics in space and time. Trends in
Ecology & Evolution 23, 3 (2008), 149–158.
[70] S. Penny. 2017. Making Sense: Cognition, Computing, Art, and Embodiment. MIT Press.
[71] U. J. Pfeiffer, K. Vogeley, and L. Schilbach. 2013. From gaze cueing to dual eye-tracking: Novel approaches to investigate the neural correlates of gaze in social interaction. Neuroscience & Biobehavioral Reviews 37, 10 (2013), 2516–2528.
[72] F. H. Previc. 1998. The neuropsychology of 3-D space. Psychological Bulletin 124, 2 (1998), 123.
[73] L. J. Raphael, G. J. Borden, and K. S. Harris. 2007. Speech Science Primer: Physiology, Acoustics, and Perception of Speech.
Lippincott Williams & Wilkins.
[74] G. Ryle. 1949. The Concept of Mind. Hutchinson & Co. Ltd., London.
[75] L. Schilbach, A. M. Wohlschlaeger, N. C. Kraemer, A. Newen, N. J. Shah, G. R. Fink, and K. Vogeley. 2006. Being with
virtual others: Neural correlates of social interaction. Neuropsychologia 44, 5 (2006), 718–730.
[76] M. Schulte-Rüther, H. J. Markowitsch, G. R. Fink, and M. Piefke. 2007. Mirror neuron and theory of mind mechanisms
involved in face-to-face interactions: A functional magnetic resonance imaging approach to empathy. Journal of
Cognitive Neuroscience 19, 8 (2007), 1354–1372.
[77] B. Shneiderman. 2010. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Pearson Education, India.
[78] H. A. Simon. 1973. The structure of ill structured problems. Artificial Intelligence 4, 3–4 (1973), 181–201.
[79] M. Synofzik, G. Vosgerau, and A. Newen. 2008. Beyond the comparator model: A multifactorial two-step account of
agency. Consciousness and Cognition 17 (2008), 219–239.
[80] M. Tomasello. 1995. Joint attention as social cognition. In Joint Attention: Its Origins and Role in Development, C.
Moore and P. Dunham (Eds.), 103–130.
[81] G. Vallar and A. Maravita. 2009. Personal and extra-personal spatial perception. In Handbook of Neuroscience for the
Behavioral Sciences, Gary Berntson and John Cacioppo (Eds.). John Wiley and Sons, Inc., 322–336.
[82] T. L. van Zuijen, V. L. Simoens, P. Paavilainen, R. Näätänen, and M. Tervaniemi. 2006. Implicit, intuitive, and explicit
knowledge of abstract regularities in a sound sequence: an event-related brain potential study. Journal of Cognitive
Neuroscience 18, 8 (2006), 1292–1303.
[83] K. Vogeley and G. R. Fink. 2003. Neural correlates of the first-person-perspective. Trends in Cognitive Sciences 7, 1
(2003), 38–42.
[84] B. Waber, J. Magnolfi, and G. Lindsay. 2014. Workspaces That Move People. Harvard Business Review.
[85] L. Wittgenstein. 1953. Philosophical Investigations. Basil Blackwell.
[86] K. Yun, K. Watanabe, and S. Shimojo. 2012. Interpersonal body and neural synchronization as a marker of implicit
social interaction. Scientific Reports 2 (2012), 959.
Received January 2018; revised December 2018; accepted December 2018