VO Thesis Proposal 082716
Victor Okhoya
August 2016
Thesis Committee:
Ramesh Krishnamurti, PhD. Professor, CMU School of Architecture
John Haymaker, PhD. Director of Research, Perkins+Will
Aarti Singh, PhD. Associate Professor, CMU Machine Learning Department
Daniel Cardoso Llach, PhD. Assistant Professor, CMU School of Architecture
Contents
1. ABSTRACT
2. INTRODUCTION
3. PROBLEM STATEMENT
4. PURPOSE
5. METHODOLOGIES
6. LITERATURE REVIEW
7. CASE STUDIES
8. EXPERIMENT
9. OUTLINE
10.
11.
12. BIBLIOGRAPHY
1. ABSTRACT
Data science can be defined as a set of fundamental principles that support and guide the principled
extraction of information and knowledge from data (Hosack et al., 2015). It is one of the fastest
growing areas of computer science and is finding applications in several fields like healthcare,
finance and manufacturing. Glassdoor, a popular website where employees rank companies and
their management, rated data scientist as the best job in the United States in 2016 1. However, data
science methods are not yet being vigorously engaged in either architectural research or practice.
This thesis investigates whether data science methods can be applied to solve problems and support
decision making in architectural practice.
The thesis is motivated by three ideas. First, disciplines closely related to architecture are using data
science methods to good effect. In particular, the thesis will review data science examples from
Construction Management and Building Performance Analysis. Second, architectural decision
making in practice can be shown to lack rigor in some circumstances. This thesis will argue that data
science methods can improve the rigor and accuracy of architectural decision making. It will discuss
examples of decision making in architecture that data science methods can impact. Third, data
science methods point to a potential paradigm shift in digital design technology. This is because data
science methods herald the possibility of autonomously intelligent design, and the application of
computational creativity to architectural design.
Applying data science to architecture requires clear definitions of both data science methods and
architectural practice. A conceptual framework is developed both to help define these terms and to
provide a context for relating them to the broader research question. The conceptual framework is
formulated based on identified sources of data in large contemporary architectural practices like
Perkins+Will, the definition of architectural services given in the American Institute of Architects
(AIA) handbook of professional practice, and a description of data science methods derived from
the data science field guide published by the management consultants Booz Allen Hamilton.
The thesis will examine four case studies based on projects and research undertaken at Perkins+Will
between 2015 and 2017. The first is a case study of micro-polling as a client engagement strategy and
statistical analysis of micro-polling survey results. The second studies the generation, analysis and
visualization of parametric energy analysis data, and design optimization based on the data analysis.
The third examines the utilization of Autodesk Revit journal data for anomaly detection. The last case
study explores using Revit model data for project performance monitoring.
Finally, a validating experiment will be performed where early stage design is undertaken first using
conventional design methods and then using data science methods. A comparative analysis
methodology will be used to evaluate the impact of the data science methods on the design process.
Conclusions based on the case study analyses and experiment will be drawn.
2. INTRODUCTION
Data science is an amalgamation of several disciplines. These include machine learning, statistical
analysis and data visualization. As such, data science is an important emergent discipline. According
to the McKinsey Global Institute, digital data is now in every sector, in every economy, and in every
organization that uses digital technology. The ability to store and analyze data has become more
accessible with improvements in computing and storage such as cloud computing. This data can be
used to generate value (Table 1). For example, it is estimated that the potential value of data to the
US health care industry could be as much as $300 billion per year (Roberts & Sikes, 2011).
Table 1. Data by the numbers (Roberts & Sikes, 2011).
$600                  Cost of a disk drive that can store all the music in the world.
5 billion             Mobile phones in use in 2010.
30 billion            Pieces of content shared on Facebook every month.
40%                   Projected growth in data generated per year.
235 terabytes         Amount of data collected by the US Library of Congress by April 2011.
15 out of 17          Sectors in the US have more data per company than the US Library of Congress.
$600 billion          Potential annual consumer surplus from using personal location data globally.
60%                   Potential increase in retailers' operating margins possible with big data.
140,000 to 190,000    Deep analytical talent positions needed to take full advantage of data in the US.
1.5 million           Data savvy managers needed to take full advantage of data in the US.
Data science is concerned with the collection, preparation, analysis, visualization, management and
preservation of large collections of information (Stanton, 2012). As mentioned, data science is
related to several disciplines including statistics, artificial intelligence (AI), data analytics, business
intelligence and data mining. Given that we are generating large amounts of data through the
internet as well as in industry, data science seeks to transform this data into value.
This is already happening in several disciplines. According to Kaggle, a leading data science website,
industries using data science methods include: healthcare, finance, retail, insurance, construction, life
sciences, hospitality, manufacturing, travel, education and utilities. These industries are using data
science methods for marketing, sales, logistics, risk analysis, customer support and human
resources 2. The question then arises: how does data science impact architectural practice? This thesis
is an investigation of the relationship between data science methods and architectural practice.
3. PROBLEM STATEMENT
The thesis will seek to answer the question: can data science methods be applied to architectural
practice? The justification for this question is based on the author's personal observations and
experiences working for fifteen years in an architectural practice environment in North America. In
the author's experience, architectural practitioners have not yet embraced data science methods. In
addition, while architectural researchers have begun to investigate data science methods in
architecture they are not yet as engaged as researchers from related disciplines, for example, civil
engineering.
Two pieces of evidence are given for these claims. First, a review of thirty projects undertaken in the
last five years by two large North American practices, Kasian Architecture and Interior Designs
(Kasian) 3 and Perkins+Will 4, shows that compared to other recent computational technologies like
Building Information Modeling (BIM) and computational design, very few architectural projects
have applied data science methods in their execution (Table 2). Second, a review of an international
architectural computation research publication, the International Journal of Architectural
Computing (IJAC) 5, shows far fewer data science related entries than a
comparable international civil engineering computation publication, the Journal of Computing in
Civil Engineering (JCCE) 6, over the last three years (Table 3). Taken together these two observations
suggest a gap in architectural research and practice with respect to data science methods.
This gap is at odds with the fact that architects have historically contemplated aspects of data science
like artificial intelligence. Christopher Alexander in Notes on the Synthesis of Form (Alexander, 1964), for
example, tried to apply AI thinking to solve the problem of growing complexity in architecture.
Nicholas Negroponte in The Architecture Machine (Negroponte, 1975) envisioned an architecture
machine with which a designer could have a creative, symbiotic dialogue. Cedric Price, according to
Royston Landau (Landau, 1984), sought in his Generator project to create an intelligent machine that
allowed users to set the terms of their interaction with architecture as opposed to accepting the
imposed will of the designer. 7 Investigating the impact of data science on architectural practice can
be seen in the light of this tradition.
Table 2. Use of data science methods on 30 recent projects at Kasian and Perkins+Will
Columns: Project; Description; Data Science Methods; Computational Design; BIM
Projects: Okanagan Integrated Health Care Facility; BC Hydro, Vernon; Ifrane Palace; Joseph Brant Hospital; Qatar Ministry of Interior General Hospital; Porsche Vancouver; Kelowna Downtown Hotel; RCMP Headquarters, Kelowna; Sport Check, Vancouver; Willow Park; Armory; Vandusen Botanical Gardens; Shannon Mews; Chinook Hospital; True North; Marine Gateway; Ryerson University Residence; YVR Miller Road
Descriptions: Industrial warehouse; Luxury residential; Design Build Finance hospital redevelopment; 150 bed general hospital; Car dealership; Hotel; Regional hospital; Academic building; Research centre; Car dealership; Police station; Retail store; Commercial building; K-12 School; University residence; Light industrial
Table 3. Data science related topics from journals of computing in architecture and civil engineering.
                                IJAC                  JCCE
                                2013   2014   2015    2013   2014   2015
Data Science Related Entries       2      3      5      34     47     66
4. PURPOSE
4.1 Goals
The thesis question can be broken down into the following goals:
- Define data science and data science methods in the context of architectural practice
- Provide a rationale for researchers and practitioners to engage data science methods in architecture
- Establish a conceptual framework for analyzing and describing data science methods in architectural practice
- Analyze case studies of research and projects that have sought to apply data science methods in architectural practice
- Perform an experiment that validates the benefits of data science in architectural practice compared to conventional design methods
- Develop conclusions in response to the thesis question based on the analyses and experiments undertaken
4.2 Rationale
The thesis will begin by providing a literature-based rationale for architectural researchers and
practitioners to vigorously engage data science methods. It will give three arguments why data
science should be of interest to architectural researchers and practitioners. First, the thesis will
demonstrate that data science methods are being used to good effect in disciplines closely related to
architecture. In particular, the thesis will look at machine learning and genetic algorithm methods
being used in construction management, and machine learning methods being used to solve building
performance analysis problems.
Second, the thesis will argue that data science methods represent a more rigorous approach to
analysis and decision making in architecture. Using examples from practice the thesis will argue that
architectural decision making, in certain contexts, is in need of improved rigor and that data science
methods can provide such improvements including the ability to analyze complex problems more
accurately, the ability to improve the quality of decision making, and the ability to visualize problems
and solutions more effectively.
Third, the thesis will claim that architects need to be concerned about data science methods because
data science represents a potential paradigm shift in machine aided human cognition.
Whereas other historical information technologies in architecture like Computer Aided Design
(CAD) and Building Information Modelling (BIM) have seen the machine assist people in decision
making and task performance, data science is ushering in an age of autonomous machine
intelligence. It is likely that creative problem solving by the machine will come to the fore in this
new paradigm and we will move from trying to solve problems ourselves to trying to create
machines that will solve our problems for us. This fundamental shift in the person-machine
relationship makes it important for architects to begin investigating data science methods in design.
The thesis is written primarily from a design software expert's perspective. That being said, the
thesis should be of interest to architectural practitioners, architectural researchers as well as
computer scientists. For architectural practitioners it provides an opportunity to improve process
and outcomes while leveraging the mountains of digital data they now routinely generate. For
architectural researchers it contributes to cross-disciplinary research between architecture and
computation. For computer scientists it helps open a unique niche sector for exploration in terms of
finding applications for innovative methods in data science.
4.3 Definitions
In order to effectively discuss the application of data science methods on architectural projects,
some definitions are required. In particular, it is useful to provide definitions of data science
methods as well as of architectural practice as they are discussed here. Data science methods are
defined as activities associated with the data science process (Figure 1). Data science methods are
thus distinct from general data processing activities. Data processing can be defined as the collection
and manipulation of items of data to produce meaningful information (French, 1996). Data
processing is, as such, broadly defined and includes many data science activities in its definition.
However, it is clear that not all data processing activities are data science methods. For example,
relational database theory is not considered a part of data science according to this view. Architects,
like many other professionals, have been involved with data processing activities but not as much
with data science methods.
In this thesis the data science process is defined as comprising four steps: data collection,
data preparation, data analysis and data visualization. Each of these steps has a range of associated
activities (Table 4). It is these activities that we refer to as data science methods.
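The four steps defined above can be illustrated with a minimal sketch. The room records, function names and the choice of min-max normalization are assumptions made purely for illustration; they are not drawn from any of the case studies.

```python
from statistics import mean


def collect():
    # Data collection: hypothetical records of (room name, floor area in m2).
    # None marks a missing value in the source data.
    return [("Office A", 18.0), ("Office B", None), ("Lab", 42.0), ("Lobby", 30.0)]


def prepare(records):
    # Data preparation, cleaning: drop records with missing values.
    cleaned = [(name, area) for name, area in records if area is not None]
    # Data preparation, normalization: rescale areas to [0, 1] (min-max scaling).
    areas = [area for _, area in cleaned]
    lo, hi = min(areas), max(areas)
    return [(name, (area - lo) / (hi - lo)) for name, area in cleaned]


def analyze(prepared):
    # Data analysis: a single summary statistic stands in for a real model.
    return mean(value for _, value in prepared)


def visualize(prepared, width=20):
    # Data visualization: a plain-text bar chart, one line per record.
    return [f"{name:<10}{'#' * round(value * width)}" for name, value in prepared]


prepared = prepare(collect())
average = analyze(prepared)
chart = visualize(prepared)
print("\n".join(chart))
print(f"mean normalized area: {average:.2f}")
```

Each function corresponds to one step of the process, so a more sophisticated method from the same step (for example a machine learning model in place of the mean) could be substituted without changing the others.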
architectural services identified in Table 5 as well as on the description of data science methods given
in Table 4.
As in most other industries, data sources in architecture have increased tremendously in recent years.
This can be attributed to several factors. Faster computers generate data at a faster rate and produce
more of it. Cloud computing frameworks allow for the processing of larger volumes of data. There
are many more data authoring tools and applications than ever before and there are also many more
proficient users of these data authoring tools. All this leads to an explosion of data in today's
architectural practices. Table 4 shows a classification of the common data sources identified at
Perkins+Will.
derived from the sources of data in architecture. To apply the framework, a service is selected from
Table 5 and data sources applicable to that service are identified from Table 4. A description of the
data science process applied to the service, and based on the data sources, is then developed.
The thesis documents an example of the application of the conceptual framework based on a
research project named Building Data Analytics undertaken at Carnegie Mellon University (CMU)
School of Architecture (SOA) by Lasternas and Aziz 8 (Figure 3). The service provided is an energy
monitoring service from the Operations and Maintenance category. The data source is post
occupancy data. Data collection is achieved using internet of things sensors, metering data and utility
data. A sophisticated pipeline involving Microsoft Azure and scripting in Java is used for data
preparation. Data analysis is performed using machine learning algorithms that enable predictive
building performance monitoring. Finally, web-based dashboard interfaces are used for visualization
and reporting.
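The Building Data Analytics example above can be recorded as a simple data structure, which makes the elements of the conceptual framework explicit. This is a sketch only: the class name `FrameworkApplication` and its field names are hypothetical, while the values are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkApplication:
    """One application of the conceptual framework (hypothetical structure)."""
    service_category: str
    service: str
    data_sources: tuple
    data_collection: tuple
    data_preparation: tuple
    data_analysis: tuple
    data_visualization: tuple

    def describe(self):
        """Return a plain-text summary, one line per framework element."""
        rows = [
            ("Service category", (self.service_category,)),
            ("Service", (self.service,)),
            ("Data sources", self.data_sources),
            ("Data collection", self.data_collection),
            ("Data preparation", self.data_preparation),
            ("Data analysis", self.data_analysis),
            ("Data visualization", self.data_visualization),
        ]
        return "\n".join(f"{label}: {', '.join(values)}" for label, values in rows)


# Values transcribed from the Building Data Analytics description.
building_data_analytics = FrameworkApplication(
    service_category="Operations and Maintenance",
    service="Energy Monitoring",
    data_sources=("Post occupancy data",),
    data_collection=("Internet of things sensors", "Metering data", "Utility data"),
    data_preparation=("Microsoft Azure pipeline", "Java scripting"),
    data_analysis=("Machine learning for predictive building performance monitoring",),
    data_visualization=("Web-based dashboards",),
)

summary = building_data_analytics.describe()
print(summary)
```

The same structure could hold each of the four case studies, with empty tuples for framework elements that a case study does not address.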
Four case studies will be discussed within the context of the conceptual framework as shown in
Table 6. Two case studies will involve services from Pre-design and Planning as well as Design and
Construction. The other two case studies will involve BIM data sources. The case studies represent
research projects and real world project applications of data science methods at Perkins+Will
between the years 2015 and 2017.
4.6 Limitations
This thesis will focus on the application of data science methods in the context of research and on
projects at a single large North American practice, Perkins+Will. In discussing data science
methods, the thesis will restrict its focus to the list of methods identified as part of the data science
conceptual framework in section 4.5. This list is certainly not a comprehensive list of data science
methods such as is found in The Field Guide to Data Science (Booz Allen Hamilton, 2016), but it does
represent data science methods as observed by the author in practice at Perkins+Will. It is assumed
that these methods, and the conceptual framework, can generalize to other similar architectural
practices. Similarly, in discussing architectural practice, the thesis is restricted to the services
architects provide as defined in the Architect's Handbook of Professional Practice. Other aspects of
architectural practice could be amenable to data science research but this thesis will not consider that
question. Finally, there are other topics within the broad definition of data science such as Big Data
Analytics or Business Intelligence that the thesis will not explicitly address.
8 Lasternas, B. & Aziz, A. have undertaken the Building Data Analytics project at the CMU School of Architecture since 2013.
DATA COLLECTION: (Data Sources)
DATA PREPARATION: Filtering, Cleaning, Querying, Transformation, Normalization, Dimensional Reduction
DATA ANALYSIS: Machine Learning, Bayesian Networks, Statistical Analysis, Genetic Algorithms, Markov Decision Process, Design of Experiments
DATA VISUALIZATION
Planning and Pre-Design: Programming, Research Services, Site Analysis, Strategic Facility Planning, Zoning Process Assistance.
Design and Construction: Accessibility Compliance, Architectural Acoustics, Building Design, Code Compliance, Construction Documentation Drawings, Construction Documentation Specifications, Construction Management, Construction Procurement, Contract Administration, Design-Build, Energy Analysis and Design, Environmental Graphic Design, Historic Preservation, Interior Design, Lighting Design, Seismic Analysis and Design, Space Planning, Sustainable Building Design.
Operations and Maintenance: Commissioning, Construction Defect Analysis, Energy Monitoring, Facility Management, Indoor Air Quality Consulting, Move Management, Post Occupancy Evaluation.
Micro-polling
Service Category: Planning and Pre-Design
Service/Data Source: Strategic Facilities Planning
Description: Gathering survey data during client engagement and establish a framework for statistical analysis of survey data
Data Collection: Micro-polling using Current technology
Data Preparation: Transformation of micro-poll survey data for analysis.
Data Analysis: Establish a framework for statistical analysis of survey data
Data Visualization: Use Watson Analytics and MS Excel for analysis and visualization

Parametric Energy Analysis
Service Category: Design and Construction
Service/Data Source: Energy Analysis
Description: Generate large parametric energy analysis datasets and use them for interactive visualization and design optimization
Data Collection: Cloud based data generation using Microsoft Azure
Data Visualization: Use Parallel Coordinates Plots, Bayesian Networks and Pivot Charts with Slicers for visualization

BIM Based Anomaly Detection
Service/Data Source: Building Information Modeling

Project Performance Monitoring
Service/Data Source: Building Information Modeling
5. METHODOLOGIES
In discussing research methodology for this thesis, Groat & Wang's book Architectural Research
Methods will be referenced 9. Groat & Wang recognize three levels of research activity: systems of
inquiry, research methodology and research methods. They identify systems of inquiry with research
philosophies like positivism, constructivism and critical theories. They identify research
methodologies with research strategies and research methods with research tactics. Based on their
exposition, three principal research methods will be used in this thesis: literature review, case studies
and experiment.
Since data science is a technical discipline the system of inquiry for this thesis is a post-positivist
approach. Post-positivist approaches are an evolution of positivist approaches. Positivist approaches
believe in an objective reality that can be fully understood. Post-positivist approaches are more
nuanced. They believe, rather, in an objective reality that can be known up to a level of probability.
Post-positivism is particularly suited to data science since data science methods are themselves
stochastic. The ontological assumption of post-positivist systems of inquiry is that reality is
objective, while the epistemological assumption is that the researcher is independent
of the research, and observes research variables in a dispassionate manner.
While the underlying philosophy of the research will be post-positivist, the specific research
methods will be literature review, case studies and experiment. Literature review will be used to
define data science and provide the rationale for the thesis. Four case studies will be used to
investigate whether architects can apply data science to architectural practice. An experiment will be
conducted to compare the conventional design process to a data science driven design process.
Finally, conclusions will be drawn based on the case studies and experiment.
6. LITERATURE REVIEW
6.1 Definition and importance of data science
According to O'Neil and Schutt (O'Neil & Schutt, 2013), data science involves statistics (traditional
analysis), data munging (parsing, scraping and formatting data) as well as visualization (graphs,
interactive tools, etc.). They cite Drew Conway's Data Science Venn Diagram (Figure 4) as a pithy
depiction of what data science entails. In the diagram, data science is depicted as an overlap between
mathematical and statistical skills, computer science skills (hacking skills) as well as domain
knowledge (substantive expertise). They explain that data science is emerging as an important
discipline at this point in history because of datafication. They describe datafication as the process of
taking all aspects of life and converting them into data.
9 Discussion is based on the book Architectural Research Methods (Groat & Wang, 2002).
Table 7. The Business Impacts of Data Science (Booz Allen Hamilton, 2016)
The Business Impacts of Data Science
17-49% increase in productivity when organizations increase data usability by 10%
11-42% return on assets (ROA) when organizations increase data access by 10%
241% increase in ROI when organizations use big data to improve competitiveness
1000% increase in ROI when deploying analytics across most of the organization, aligning daily operations
with senior management's goals, and incorporating big data
5-6% performance improvement for organizations making data-driven decisions
Deutsch identifies five factors that compel architectural practitioners to leverage data driven
methods. First is technology. The ability to process large quantities of data, access to cloud
computation and less expensive storage have made data driven methods easier to adopt. Second,
people are an important catalyst to change as a new generation embraces computation in all aspects
of life and develops new processes to leverage data driven methods in design. Third, although there
is more data than ever before, it is also easier to access this data than ever before through cloud
frameworks, web portals, company intranets, social media and traditional websites. Fourth, building
performance is becoming more important with increasing global concerns about sustainability.
Building performance analysis methods tend to be heavily data driven. Fifth, architects have begun
to understand that theirs is a fragmented industry with equally fragmented processes to the
detriment of their project delivery methods. They are increasingly looking to technology, including
data driven technology, to help improve their disjointed process.
Deutsch also recognizes five trends leading to the increase of data in the AEC industry. First,
instrumentation is being added to almost everything. The internet of things, as the network of
sensors and instruments is often referred to, is a massive source of real time data. Second,
datafication, described earlier, prescribes the conversion of all aspects of practice to data. Analog
content and processes are everywhere being converted to digital content and data driven processes.
Third, production methods and the demands of the supply chain require construction components
to be represented as data. This abets fabrication, procurement, tracking and installation. Fourth, data
is being relied upon more and more for the validation of designs and design decisions. Fifth, the
generating, analysing and visualizing of data leads to deeper insights into problems and their
potential solutions.
6.3 Data science in related fields
Data science is having an impact in fields closely related to architecture. In construction
management data science methods have been used to predict project success as well as to estimate
the cost at completion for construction projects. In Project success prediction using an evolutionary support
vector machine inference model, Cheng et al. (Cheng, Wu, & Wu, 2010) describe a model to predict project
success using a tool that integrates a support vector machine (SVM) with a genetic algorithm. In
Estimate at completion for construction projects using Evolutionary Gaussian Process Inference Model, Cheng et al.
(Huang & Cheng, 2011) employ a data driven artificial intelligence method of Estimate at
Completion (EAC) that extracts historical data from previous projects, inputs the data into a Gaussian
Process algorithm for learning and then uses Particle Swarm Optimization to optimize the
process.
In Building Performance Analysis, data science methods have been used to solve occupant modeling
problems as well as improve efficiency of building systems during building operations. In Improving
Efficiency and Reliability of Building Systems using Machine Learning and Automated Online Evaluation (Wu et
al., 2012) the authors present an approach that uses machine learning and automated online
evaluation of historical and real time building data to improve efficiency of building operations. In
An Occupant Behavior Model Based on Artificial Intelligence for Energy Building Simulation, Bonte et al. (Bonte,
Thellier, Lartigue, & Perles, 2014) propose a new method aimed at reducing the uncertainty created
by oversimplified occupant models. Behavioural adaptation is considered the most important
occupant influence on building energy performance and thermal comfort is one of its main aspects.
The authors believe that statistical analysis is insufficient for the complex task of analyzing thermal
comfort based on human behaviour. They believe that a better model forecasts occupant behaviour
using AI. Their specific approach uses Reinforcement Learning.
6.4 Data science as a rigorous analysis method
Architects make decisions all the time. Many of these decisions are critical to the appropriate and
successful design and construction of their projects. In a world where the need for improved
decision making is increasingly important, how can architects improve the quality and accuracy of
design decision making? This thesis argues that data science can improve decision making and
that this should be an incentive for architects and researchers to pursue data science methods.
Two examples from the author's experience in practice are discussed: one is the analysis of survey
data during client engagement and the second is the selection of best performing design options
during early stage design.
As part of the client engagement process for a corporate administrative campus project, survey data
was gathered from end users with the goal of developing a basis of design document. The survey
sought to capture user experiences, concerns and departmental priorities, among other things. The
gathered data was analyzed by the architect and the results included in a report shared with the
client. Unfortunately, the analysis methods used were rudimentary with more emphasis placed on
the graphic outputs than on appropriate statistical methods. Indeed, some of the conclusions were
found to be in conflict with the actual data.
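As an indication of what a more appropriate statistical method might look like for this kind of survey data, the sketch below runs a Pearson chi-square test of independence on a 2x2 table of responses. The counts, the departments and the survey question are entirely hypothetical, and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With one degree of freedom the chi-square tail probability
    # has the closed form erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical responses to "Do you prefer an open plan workspace?"
# Rows: two departments. Columns: yes, no.
responses = [[30, 10], [15, 25]]
statistic, p_value = chi_square_2x2(responses)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

A small p-value would indicate that the difference in preference between the two departments is unlikely to be due to chance alone, which is the kind of check that a purely graphical summary cannot provide.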
The second example involves early stage decision making with respect to energy and daylight
analysis. Architects are becoming more conscious of the need to improve building performance in
line with the global movement for a more sustainable planet. Many architects are performing
building performance analysis early in the design process to obtain guidance for critical early
decisions. However, it is known that some of the key design drivers in early stage design are
antagonistic. For example, improving daylight performance often adversely affects thermal
performance and vice-versa. Unfortunately, many architects do not use formal methods to arrive at
optimized decisions in resolving this type of conflict.
Data science methods can provide a data analysis framework for the statistical analysis of survey data
that improves the quality of the analysis and validates design hypotheses by performing statistical
significance tests. This improves the quality of design by using more accurate information for
decision making as well as allowing for the discovery of insights that are not evident without mining
the data. Data science methods can also assist with the optimization process of early stage building
performance analysis by developing objective functions that capture the interactive combinations of
critical design factors like daylighting and thermal comfort. Optimization methods like genetic
algorithms or the design of experiments can then be applied to these objective functions to
maximize desirability. The resulting combinations of inputs that yield optimized objective functions
are important information for the designer to possess and should lead to better performing design
outcomes.
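The optimization idea described above can be sketched with a very small genetic algorithm. The two scoring functions are hypothetical stand-ins for real daylight and thermal simulation results, the single design variable (window-to-wall ratio) is an assumption chosen for simplicity, and the geometric mean is just one possible way of combining the two criteria into a desirability value.

```python
import random


def daylight_score(wwr):
    # Hypothetical: daylight improves linearly with window-to-wall ratio
    # and saturates once the ratio reaches 0.6.
    return min(1.0, wwr / 0.6)


def thermal_score(wwr):
    # Hypothetical: thermal performance degrades linearly as glazing increases.
    return 1.0 - wwr


def desirability(wwr):
    # Geometric mean: a design scores well only if both criteria score well.
    return (daylight_score(wwr) * thermal_score(wwr)) ** 0.5


def genetic_search(objective, low, high, pop_size=20, generations=40, seed=0):
    """Maximize a one-variable objective with a minimal genetic algorithm."""
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the best individual always survives to the next generation.
        children = [max(population, key=objective)]
        while len(children) < pop_size:
            # Tournament selection: the fittest of three random individuals.
            parent_a = max(rng.sample(population, 3), key=objective)
            parent_b = max(rng.sample(population, 3), key=objective)
            # Averaging crossover followed by Gaussian mutation.
            child = (parent_a + parent_b) / 2 + rng.gauss(0, 0.05)
            # Keep the child inside the allowed range.
            children.append(min(high, max(low, child)))
        population = children
    return max(population, key=objective)


best_wwr = genetic_search(desirability, 0.05, 0.95)
print(f"best window-to-wall ratio: {best_wwr:.2f}")
print(f"desirability: {desirability(best_wwr):.3f}")
```

With these particular scoring functions the desirability peaks where the two antagonistic criteria balance, which is the trade-off the designer would otherwise have to resolve informally.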
6.5 Data science as a paradigm shift
Paradigm shifts are important because they have deeper impacts than the normal progress of
scientific development through incremental discovery. Paradigm shifts tend to be revolutionary in
character, with far-reaching implications not just for the discipline in question but for the entire
endeavour of human inquiry, and sometimes for human history itself. They are, therefore,
particularly worthy of attention whenever they are perceived to occur. In this thesis the claim is that
data science methods are of particular interest to the architect, as they are to others, because they
represent a potential paradigm shift in computation and human cognition.
In what way does data science represent a paradigm shift? To understand this, we first need to
understand that included under the broad umbrella of data science are several artificial intelligence
methods and models. While data science and AI are not synonymous, there is substantial overlap,
with many modern AI methods being explicitly data-driven. There are approaches to AI, such as
decision theory, that are not part of data science, and there are also parts of data science that are
separate from AI. Nonetheless, AI methods like machine learning, Bayesian networks and artificial
neural networks are an important part of data science.
From its inception in the 1950s, AI promised to be one of the most significant
paradigm shifts in the history of human cognition. For the first time, humans would have another
sapient being to help with the challenging business of knowledge acquisition. Mankind had devised
machines, many of them marvelous, to help do work, but had yet to devise a machine that could help
acquire knowledge or autonomously solve problems. The big promise of early AI was the prospect
of autonomous intelligence.
However, early AI soon proved to be a failure. Many of its promises went undelivered and the
discipline lapsed into what has been described as an AI winter (Russell & Norvig, 2009). Only in the
last two decades has there been a resurgence in AI interest and research. This resurgence has been
driven in large part by data-driven intelligence; in short, by data science.
The arrival of true AI will undoubtedly be a monumental achievement, and if we are now in the
birth throes of this event then it is hard to think of any more significant subject for investigation. At
the very least, the advent of intelligent computation will allow machines to solve problems that
humans cannot by virtue of sheer complexity. We are already seeing this beginning to happen with
cognitive computation in healthcare (Kelly, 2015). In architecture, we must also start to ask what
impact data science in general, and AI methods in particular, will have on the design process.
7. CASE STUDIES
7.1 Micro-polling and statistical analysis
Micro-polling is used as a survey technique during client engagement. Micro-polling has two main
benefits for architectural surveys. First, it is mobile-based, which means it is possible to capture user
responses at different times of the day and while users are at different locations within a
building. Time and location make a difference to user experience, and capturing this nuance is
useful for design decision making. Second, micro-polling is typically set up to be repetitive, sending
the same questions repeatedly over a prescribed survey duration. This increases
participation, fidelity and sample size, and makes the end results more accurate.
The micro-polling process allows design teams to collect insights related to the thermal, acoustic and
visual comfort of occupants. In addition, issues like productivity, engagement and wellness can also
be tracked. At Perkins+Will micro-polling involves the use of a proprietary cloud-based computing
platform (CBCP) called Current (Figure 5). Current lets the design team send out polls to the client in
real time using email or text messages. This allows the users to respond on their mobile devices
anywhere and at any time. Current thus has the advantage over traditional surveys of capturing the
user's response at exactly the moment they are experiencing a particular space or design condition.
In addition, this case study reviews three data visualization strategies for parametric energy analysis
data: Parallel Coordinates Plots, Pivot Charts and Bayesian Networks. Parallel Coordinate Plots are a
graphic representation of data with every instance in a dataset represented by a polyline that
intersects several vertical axes, each axis representing a variable in the design space (Figure 7). Pivot
Charts are a graphic representation of Microsoft's pivot tables, while slicers allow users to
interactively input values into the pivot chart (Figure 8). Finally, Bayesian Networks are a graphic
reasoning tool that uses probability networks to compute the values of interdependent variables
(Figure 9).
Okhoya¹¹ developed a Revit journal parser dubbed the Revit Journal Reader (Figure 11). This parser
iterates through the journal files at specified network locations and reads in their text. It
uses regular expressions to identify specified text patterns corresponding to a user command and
returns the session data associated with the command: Date, User, Project and View. In this way the
Journal Reader obtains a record of all instances of specific commands executed during all the Revit
sessions for all users on a project. The data extracted by the Revit Journal Reader can be exported to
CSV format for further analysis.
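The parsing approach described above can be sketched in Python as follows. This is only an illustration of regular-expression extraction followed by CSV export, not the Revit Journal Reader itself: the log layout in `SAMPLE_JOURNAL`, the field names and the helper functions are hypothetical, and real Revit journal files are structured differently.

```python
import csv
import io
import re

# Hypothetical, simplified log layout used only for illustration; real
# Revit journal files are structured differently.
SAMPLE_JOURNAL = """\
2015-03-02 09:14:05 user=jdoe project=Tower_A view=Level 1 command=Wall
2015-03-02 09:15:40 user=jdoe project=Tower_A view=Level 1 command=Door
2015-03-02 10:02:11 user=asmith project=Tower_A view=Section 3 command=Wall
"""

LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} "
    r"user=(?P<user>\S+) project=(?P<project>\S+) "
    r"view=(?P<view>.+?) command=(?P<command>\S+)$"
)


def find_command(journal_text, command):
    """Return Date, User, Project and View for each use of `command`."""
    records = []
    for line in journal_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match and match.group("command") == command:
            records.append({
                "Date": match.group("date"),
                "User": match.group("user"),
                "Project": match.group("project"),
                "View": match.group("view"),
            })
    return records


def to_csv(records):
    """Serialise the extracted records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Date", "User", "Project", "View"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling `to_csv(find_command(SAMPLE_JOURNAL, "Wall"))` would yield one CSV row per matching session line, which is the tabular form needed for later analysis.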
Gathering the Revit journal files is, however, not a trivial task, as they are typically spread out on user
machines across a network. Hunter¹² developed a Microsoft PowerShell script that can trawl user
machine locations across a network and gather all journal files into a single location. Once this is
done it is easier to perform a journal read on the files.
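The gathering step was implemented as a PowerShell script; the sketch below shows the same idea in Python purely for illustration and is not a translation of that script. The function name, the `journal*.txt` file pattern and the naming scheme for copied files are assumptions.

```python
import shutil
from pathlib import Path


def gather_journals(source_roots, destination, pattern="journal*.txt"):
    """Copy every file matching `pattern` under each root into `destination`.

    Each copy is prefixed with the name of the root it came from, so that
    journals with the same file name under differently named roots do not
    overwrite one another.
    """
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for root in map(Path, source_roots):
        if not root.is_dir():
            continue  # skip machines or shares that are unreachable
        for path in sorted(root.rglob(pattern)):
            target = destination / f"{root.name}_{path.name}"
            shutil.copy2(path, target)
            copied.append(target)
    return copied
```

Once the files sit in one folder, a single parsing pass can be run over the whole collection instead of visiting each machine in turn.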
This case study will begin by briefly describing the method used to collect Revit journal data at
Perkins+Will. It will then focus on the structure of Revit journal data and the techniques used to
parse and extract the data into tabular data formats. It will then describe how parsed data can be
introduced into HTM Studio, an anomaly detection tool, in order to detect anomalies within the data.
Finally, it will describe how anomalies can be visualized in HTM Studio (Figure 12). It will also discuss
how the number of anomalies in a project can be used to flag the file for model management review.
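To make the flagging idea concrete, the sketch below counts anomalous days in a series of daily command counts and flags a model for review when the count exceeds a limit. It deliberately substitutes a simple z-score rule for HTM Studio's detection method, which is not reproduced here; the function names, thresholds and the notion of "daily command counts" are assumptions for illustration.

```python
from statistics import mean, pstdev


def flag_anomalies(daily_counts, threshold=2.0):
    """Return indices of days whose count deviates from the mean by more
    than `threshold` population standard deviations."""
    mu = mean(daily_counts)
    sigma = pstdev(daily_counts)
    if sigma == 0:
        return []  # a constant series has no outliers under this rule
    return [i for i, count in enumerate(daily_counts)
            if abs(count - mu) / sigma > threshold]


def needs_review(daily_counts, max_anomalies=1, threshold=2.0):
    """Flag a model for management review when it has more anomalous
    days than `max_anomalies`."""
    return len(flag_anomalies(daily_counts, threshold)) > max_anomalies
```

The review limit is a policy choice rather than a statistical one, so it is kept as a separate parameter from the detection threshold.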
¹¹ Revit Journal Reader is a log file parsing application developed by Victor Okhoya in 2009.
¹² PowerShell script developed by Mathew Hunter, Site IT Lead, Perkins+Will in 2011.
Revit model data extraction exercise conducted by Matt Petermann, Digital Practice Manager, Perkins+Will in 2015.
floor volumes, curtain wall areas, ceiling areas, door counts, window counts and stair counts. Next,
an SQL query will be used to extract distinct project names from the project name table and each
target model category will be aggregated by project name. Finally, Vision total hours for each project
name identified from Revit will be used to label the data. This labeled data will be used for machine
learning and predictive classification.
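The aggregation and labeling steps described above might look like the following sketch, which uses Python's built-in `sqlite3` module. The table names, column names, categories and all inserted values are hypothetical and exist only to make the example runnable; the actual schema of the extracted Revit data and of the hours records is not given in the text.

```python
import sqlite3

# Hypothetical schema, invented purely for illustration.
SCHEMA = """
CREATE TABLE project_names (project_name TEXT);
CREATE TABLE model_quantities (project_name TEXT, category TEXT, quantity REAL);
CREATE TABLE vision_hours (project_name TEXT, total_hours REAL);
"""

# One row per distinct project: aggregated quantities plus the hours label.
LABELED_QUERY = """
SELECT p.project_name,
       SUM(CASE WHEN q.category = 'door_count' THEN q.quantity ELSE 0 END) AS door_count,
       SUM(CASE WHEN q.category = 'window_count' THEN q.quantity ELSE 0 END) AS window_count,
       v.total_hours
FROM (SELECT DISTINCT project_name FROM project_names) AS p
JOIN model_quantities AS q ON q.project_name = p.project_name
JOIN vision_hours AS v ON v.project_name = p.project_name
GROUP BY p.project_name, v.total_hours
ORDER BY p.project_name
"""


def build_labeled_rows(conn):
    """Aggregate model quantities per project and attach the hours label."""
    return conn.execute(LABELED_QUERY).fetchall()


def demo_connection():
    """Create an in-memory database populated with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO project_names VALUES (?)",
                     [("Alpha",), ("Alpha",), ("Beta",)])
    conn.executemany("INSERT INTO model_quantities VALUES (?, ?, ?)", [
        ("Alpha", "door_count", 10), ("Alpha", "door_count", 5),
        ("Alpha", "window_count", 20), ("Beta", "door_count", 3),
        ("Beta", "window_count", 8),
    ])
    conn.executemany("INSERT INTO vision_hours VALUES (?, ?)",
                     [("Alpha", 1200.0), ("Beta", 300.0)])
    return conn
```

Each returned row is a feature vector for one project with its total hours as the label, which is the shape a supervised learning routine would expect.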
The DEAM process will be undertaken on Sprout Space (Figure 15), a research initiative at
Perkins+Will looking at learning environments. A fixed set of design parameters will be provided
with each parameter being constrained to a given range. The designer will be permitted to use
variations of the design parameters that satisfy the constraints to develop their design. They will also
be free to use any tools to perform energy or daylighting simulations using conventional methods.
Separately, the design will be analyzed using parametric energy analysis with data driven design
optimization. The outcomes of both approaches will be analyzed using DEAM in order to evaluate
the impact of the data science process.
9.1
9.2
Definitions
Data science defined
Why is data science important?
What does data science involve?
Data science in architecture
Rationale
Data science in related disciplines
Data science methods in construction management
Data science methods for building performance analysis
Data science as a rigorous analysis method
Data science as a paradigm shift
Conceptual Framework
Sources of data in architectural practice
Architectural services
The data science process
9.3
Case Studies
Micro-polling and Statistical Analysis
Service: Planning Pre-design: Strategic Facility Planning
Data Source: Client Engagement Data
Data Collection: Micro-polling survey
Data Preparation: Transformation of survey data
Data Analysis: Statistical Analysis of survey data
Data Visualization: Survey data visualization strategy
Parametric Energy Analysis
Service: Design Construction: Energy Analysis and Design
Data Source: Energy Analysis Data
Data Collection: Cloud based data generation in MS Azure
Data Preparation: Computational Design data generation in Grasshopper
Data Analysis: Multi-objective optimization, Design of Experiments
Data Visualization: Parallel Coordinates Plots, Bayesian Networks, Pivot Charts
9.6
Validation
Perkins+Will Research Sprout Space
The Design Exploration Assessment Methodology
The Sprout Space project
Using DEAM on Sprout Space
Conclusions
[Schedule table not recoverable from the source. Surviving column headers: TITLE, DESCRIPTION, TASK, DURATION. Surviving entries: "Experiment Reports:" and durations of 4, 4, 4, 8, 8, 8, 8 and 4 weeks.]
12. BIBLIOGRAPHY
12.1 Works Cited
Alexander, C. (1964). Notes on the Synthesis of Form. October, 57(2), 216. http://doi.org/10.1086/601876
Bonte, M., Thellier, F., Lartigue, B., & Perles, A. (2014). An occupant behavior model based on artificial
intelligence for energy building simulation. In Proceedings of the 13th International IBPSA Conference BS2013,
Chambery, France.
Booz Allen Hamilton. (2016). The Field Guide to Data Science (2nd ed.). McLean, Virginia: Booz Allen Hamilton.
Cheng, M.-Y., Wu, Y.-W., & Wu, C.-F. (2010). Project success prediction using an evolutionary support
vector machine inference model. Automation in Construction, 19(3), 302–307.
Clevenger, C. M., Haymaker, J. R., & Ehrich, A. (2013). Design exploration assessment methodology: testing
the guidance of design processes. Journal of Engineering Design, 24(3), 165–184.
http://doi.org/10.1080/09544828.2012.698256
David Wang, L. G. (2002). Architectural Research Methods. Wiley.
Davis, D. (2015). How Big Data is Transforming Architecture. Architect. Retrieved from
http://www.architectmagazine.com/technology/how-big-data-is-transforming-architecture_o
Demkin, J. (Ed.). (2001). The Architect's Handbook of Professional Practice (13th ed.). New York: John Wiley &
Sons.
Deutsch, R. (2015). Data-Driven Design and Construction: 25 Strategies for Capturing, Analyzing and Applying Building
Data. Wiley. Retrieved from https://books.google.ca/books?id=uyKsBwAAQBAJ
French, C. (1996). Data Processing and Information Technology (10th ed.). London: Thomson.
Hosack, B., Sagers, G., Provost, F., Fawcett, T., McKinsey & Company, Wang, Y., Demirkan, H. (2015).
Applied doctorates in IT: A case for designing data science graduate programs. Journal of the Midwest
Association for Information Systems, 1(1), 61–68. http://doi.org/10.1080/01443610903114527
Huang, C.-C., & Cheng, M.-Y. (2011). Estimate at completion for construction projects using evolutionary
gaussian process inference model. In Multimedia Technology (ICMT), 2011 International Conference on (pp.
4414–4417).
Karaguzel, O. T., Zhang, R., & Lam, K. P. (2014). Coupling of whole-building energy simulation and multidimensional numerical optimization for minimizing the life cycle costs of office buildings. Building
Simulation, 7(2), 111–121. http://doi.org/10.1007/s12273-013-0128-5
Kelly, J. E. (2015). Computing, cognition and the future of knowing. IBM White Paper, 7.
Landau, R. (1984). A Philosophy of Enabling. In The Square Book. London: Architectural Association.
Milkman, K. L., Chugh, D., & Bazerman, M. H. (2009). How can decision making be improved? Perspectives on
Psychological Science, 4(4), 379–383.
Negroponte, N. (1975). The architecture machine. Computer-Aided Design, 7(3), 190–195.
http://doi.org/10.1016/0010-4485(75)90009-3
Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and data-driven decision
making. Big Data, 1(1), 51–59.
Roberts, R. P., & Sikes, J. (2011). McKinsey Global Survey results: A rising role for IT. McKinsey Global Survey
Results, (Exhibit 1), 1–9.
Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach, 3rd edition. Prentice Hall.
http://doi.org/10.1017/S0269888900007724
Stanton, J. (2012). Data Science. An Introduction, 1–157. Retrieved from
http://jsresearch.net/groups/teachdatascience/wiki/welcome/attachments/72f24/DataScienceBook1_1.pdf
Wu, L., Kaiser, G., Solomon, D., Winter, R., Boulanger, A., & Anderson, R. (2012). Improving efficiency and
reliability of building systems using machine learning and automated online evaluation. In Systems,
Applications and Technology Conference (LISAT), 2012 IEEE Long Island (pp. 1–6).
12.2 References
Alexander, C. (1964). Notes on the Synthesis of Form. October, 57(2), 216. http://doi.org/10.1086/601876
Alfonsi, E., Capolongo, S., & Buffoli, M. (2014). Evidence based design and healthcare: an unconventional
approach to hospital design. Annali Di Igiene, 26(2), 137–143.
Aussem, A. (2010). Bayesian networks. Neurocomputing (Vol. 73). http://doi.org/10.1016/j.neucom.2009.11.001
Bell, G., Hey, T., & Szalay, A. (2009). Computer science. Beyond the data deluge. Science (New York, N.Y.),
323(5919), 1297–1298. http://doi.org/10.1126/science.1170411
Bonte, M., Thellier, F., Lartigue, B., & Perles, A. (2014). An occupant behavior model based on artificial
intelligence for energy building simulation. In Proceedings of the 13th International IBPSA Conference BS2013,
Chambery, France.
Booz Allen Hamilton. (2016). The Field Guide to Data Science (2nd ed.). McLean, Virginia: Booz Allen Hamilton.
Brewka, G. (1996). Artificial intelligence: a modern approach by Stuart Russell and Peter Norvig, Prentice Hall. Series in
Artificial Intelligence, Englewood Cliffs, NJ. The Knowledge Engineering Review (Vol. 11).
http://doi.org/10.1017/S0269888900007724
Carbonari, A., Vaccarini, M., & Giretti, A. (2014). Bayesian Networks for Supporting Model Based Predictive
Control of Smart Buildings. http://doi.org/10.5772/58470
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys
(CSUR), 41(September), 1–58. http://doi.org/10.1145/1541880.1541882
Cheng, M.-Y., Wu, Y.-W., & Wu, C.-F. (2010). Project success prediction using an evolutionary support
vector machine inference model. Automation in Construction, 19(3), 302–307.
Clayton, M., Kunz, J., & Fischer, M. (1998). The Charrette Test Method.
Clevenger, C. M., Haymaker, J. R., & Ehrich, A. (2013). Design exploration assessment methodology: testing
the guidance of design processes. Journal of Engineering Design, 24(3), 165–184.
http://doi.org/10.1080/09544828.2012.698256
Cochrane, A. L. (1971). Effectiveness and Efficiency: Random reflections on health services. The Nuffield
Provincial Hospitals Trust. http://doi.org/10.1136/bmj.328.7438.529
Corbusier, L. (1986). Towards a new architecture. Design.
David Wang, L. G. (2002). Architectural Research Methods. Wiley.
Davis, D. (2015). How Big Data is Transforming Architecture. Architect. Retrieved from
http://www.architectmagazine.com/technology/how-big-data-is-transforming-architecture_o
Delen, D., & Demirkan, H. (2013). Data, information and analytics as services. Decision Support Systems, 55(1),
359–363. http://doi.org/10.1016/j.dss.2012.05.044
Demkin, J. (Ed.). (2001). The Architect's Handbook of Professional Practice (13th ed.). New York: John Wiley &
Sons.
Deutsch, R. (2015). Data-Driven Design and Construction: 25 Strategies for Capturing, Analyzing and Applying Building
Data. Wiley. Retrieved from https://books.google.ca/books?id=uyKsBwAAQBAJ
Euclid, Heath, T. L., & Densmore, D. (2002). Euclid's Elements: all thirteen books complete in one volume: the Thomas
L. Heath translation. Green Lion Press. Retrieved from
https://books.google.ca/books?id=nc1UAAAAYAAJ
French, C. (1996). Data Processing and Information Technology (10th ed.). London: Thomson.
Friedow, B. (2012). An Evidence Based Design Guide for Interior Designers. University of Nebraska-Lincoln.
Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques, 3rd Edition. Morgan Kaufman.
Heschong Mahone Group. (1999). Daylighting in Schools.
Hosack, B., & Sagers, G. (2015). Applied doctorates in IT: A case for designing data science graduate
programs. Journal of the Midwest Association for Information Systems, 1(1), 61–68.
Hosack, B., Sagers, G., Provost, F., Fawcett, T., McKinsey & Company, Wang, Y., Demirkan, H. (2015).
Applied doctorates in IT: A case for designing data science graduate programs. Journal of the Midwest
Association for Information Systems, 1(1), 61–68. http://doi.org/10.1080/01443610903114527
Huang, C.-C., & Cheng, M.-Y. (2011). Estimate at completion for construction projects using evolutionary
gaussian process inference model. In Multimedia Technology (ICMT), 2011 International Conference on (pp.
4414–4417).
Jencks, C. (1977). The language of post-modern architecture. Notes (Vol. 0).
Karaguzel, O. T., Zhang, R., & Lam, K. P. (2014). Coupling of whole-building energy simulation and multidimensional numerical optimization for minimizing the life cycle costs of office buildings. Building
Simulation, 7(2), 111–121. http://doi.org/10.1007/s12273-013-0128-5
Kelly, J. E. (2015). Computing, cognition and the future of knowing. IBM White Paper, 7.
Korolija, I., Marjanovic-Halburd, L., Zhang, Y., & Hanby, V. I. (2013). UK office buildings archetypal model
as methodological approach in development of regression models for predicting building energy
consumption from heating and cooling demands. Energy and Buildings, 60, 152–162.
http://doi.org/10.1016/j.enbuild.2012.12.032
Kricheff, R. (2014). Data Analytics for Corporate Debt Markets: Using Data for Investing, Trading, Capital Markets, and
Portfolio Management. Pearson.
Kuhn, T. S. (1996). The Structure of Scientific Revolution. Economy and Society (Vol. 29).
Landau, R. (1984). A Philosophy of Enabling. In The Square Book. London: Architectural Association.
Liu, C. (2008). A Simulation-Based Experience in Learning Structures of Bayesian Networks to Represent
How Students Learn Composite Concepts. International Journal of Artificial Intelligence in Education, 18, 237–285.
Retrieved from http://iospress.metapress.com/content/3074000428p22130/
Lorentz, H. A., Einstein, A., Minkowski, H., Weyl, H., & Sommerfeld, A. (1952). The Principle of Relativity: A
Collection of Original Memoirs on the Special and General Theory of Relativity. Dover. Retrieved from
https://books.google.ca/books?id=S1dmLWLhdqAC
Manning, H. P. (2013). Introductory Non-Euclidean Geometry. Dover Publications. Retrieved from
https://books.google.ca/books?id=EOa_ykDmmLUC
Margaritis, D., Thrun, S., Faloutsos, C., Moore, A. W., & Cooper, G. F. (2003). Learning Bayesian Network
Model Structure from Data. Learning, (May).
Mattmann, C. A. (2013). Computing: A vision for data science. Nature, 493(7433), 473–475.
http://doi.org/10.1038/493473a
McKinsey & Company. (2011). Big data: The next frontier for innovation, competition, and productivity.
McKinsey Global Institute, (June), 1–156. http://doi.org/10.1080/01443610903114527
Milkman, K. L., Chugh, D., & Bazerman, M. H. (2009). How can decision making be improved? Perspectives on
Psychological Science, 4(4), 379–383.
Negroponte, N. (1975). The architecture machine. Computer-Aided Design, 7(3), 190–195.
http://doi.org/10.1016/0010-4485(75)90009-3
Newton, I., Motte, A., & Chittenden, N. W. (1850). Newton's Principia: The Mathematical Principles of Natural
Philosophy. Geo. P. Putnam. Retrieved from https://books.google.ca/books?id=N-hHAQAAMAAJ
Nguyen, A.-T., & Reiter, S. (2015). A performance comparison of sensitivity analysis methods for building
energy models. Building Simulation, 8(6), 651–664. http://doi.org/10.1007/s12273-015-0245-4
Nightingale, F. (1960). What is and what is not. London: Harrison.
Oh, J., Hwang, J., Smith, S. F., & Koile, K. (2006). Learning from Main Streets. Artificial Intelligence, 325–340.
O'Neil, C., & Schutt, R. (2013). Doing Data Science. O'Reilly. Retrieved from
http://proquest.safaribooksonline.com.proxy.library.cmu.edu/book/databases/9781449363871
Petrov, T. P. (n.d.). Application of bayesian believe networks for continuous risk evaluation and decision
support of safety management in mining.
Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and data-driven decision
making. Big Data, 1(1), 51–59.
Roberts, R. P., & Sikes, J. (2011). McKinsey Global Survey results: A rising role for IT. McKinsey Global Survey
Results, (Exhibit 1), 1–9.
Russell, A. D., Chiu, C.-Y., & Korde, T. (2009). Visual representation of construction management data.
Automation in Construction, 18(8), 1045–1062. http://doi.org/10.1016/j.autcon.2009.05.006
Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach, 3rd edition. Prentice Hall.
http://doi.org/10.1017/S0269888900007724
Stanton, J. (2012). Data Science. An Introduction, 1–157. Retrieved from
http://jsresearch.net/groups/teachdatascience/wiki/welcome/attachments/72f24/DataScienceBook1_1.pdf
Studio, H. A. (2011). Energy Modeling: A Guide For The Building Professional. Energy, (May), 15. Retrieved
from http://rechargecolorado.org
Suppes, P. (1960). Axiomatic Set Theory. Dover Publications. Retrieved from
https://books.google.ca/books?id=sxr4LrgJGeAC
Ulrich, R. (1984). View through a window may influence recovery. Science, 224(4647), 420–421.
Venturi, R. (1977). Contradiction in Architecture. New York. http://doi.org/10.1080/10464883.2012.714912
Wang, Y. (2009). On cognitive computing. Int. J. Software Sci. Comput. Intell., 1(3), 1–15.
Wang, Y., Baciu, G., Yao, Y., Kinsner, W., Chan, K., Zhang, B., Zhu, H. (2010). Perspectives on Cognitive
Informatics and Cognitive Computing. International Journal of Cognitive Informatics and Natural Intelligence,
4(1), 1–29. http://doi.org/10.4018/jcini.2010010101
Wu, L., Kaiser, G., Solomon, D., Winter, R., Boulanger, A., & Anderson, R. (2012). Improving efficiency and
reliability of building systems using machine learning and automated online evaluation. In Systems,
Applications and Technology Conference (LISAT), 2012 IEEE Long Island (pp. 1–6).
Zhang, Y., & Korolija, I. (2010). Performing complex parametric simulations with jEPlus. SET2010-9th
International Conference on Sustainable . Retrieved from
http://www.iesd.dmu.ac.uk/~yzhang/wiki/lib/exe/fetch.php?media=software:java:set2010-shanghaise102.pdf
Ziga-Can, C. L., & Burguillo, J. C. (2014). Advances in Artificial Intelligence -- IBERAMIA 2014: 14th
Ibero-American Conference on AI, Santiago de Chile, Chile, November 24-27, 2014, Proceedings. In L.
C. A. Bazzan & K. Pichara (Eds.), (pp. 698–709). Cham: Springer International Publishing.
http://doi.org/10.1007/978-3-319-12027-0_56