The Role of Experimentation in Software
Abstract
Research proposals need to be validated either by formal proofs or by empirical methods (e.g. controlled experiments). Many authors have pointed out that the level of experimentation in software engineering is not satisfactory. The quantity of experimentation is too low, as many software engineering publications do not contain any empirical validation at all. The quality of the software engineering experiments conducted so far is also often weak, because no proper methodological approach is applied and statistical methods are misused.
This paper provides an overview of the status of experimentation in software engineering.
First, the role of experimentation among other types of research is clarified. Several research
paradigms are introduced, a classification of different types of experiments in software engineering is provided, and a comparison with experimentation in other research disciplines is drawn.
Afterwards, the current state of experimentation in software engineering is analysed in more detail. Discussion points raised by various researchers are summed up, and their recommendations for improving the state of experimentation are presented, together with possible future directions of experimentation in software engineering.
1 Introduction
Any research proposal in computer science needs to be validated properly to check its claims, its improvements, and its applicability in practice. To conduct such a validation, the proposal either has to be proven formally or empirical methods have to be used to gather evidence. The fact that formal proofs are only seldom possible in software engineering (SE) has led researchers to emphasize the role of empirical methods, which are traditionally more common in disciplines like physics, the social sciences, medicine, or psychology.
Several different empirical methods are known. Quantitative methods try to measure a certain effect, while qualitative methods search for the reasons behind an observed effect. Examples of quantitative methods are experiments, case studies, field studies, surveys, and meta-studies. Examples of qualitative methods are interviews and group discussions.
The controlled experiment is the method with the highest degree of confidence in the results. In such an experiment, researchers try to control every variable influencing the outcome except the variable they want to analyse. For example, this involves testing a product or method with a larger group of persons, so that differences in the qualifications of the participants are reduced by averaging and thus do not influence the result. A less strict method is the case study, in which a (possibly artificial) example is analysed and initial interpretations of the observed effects can be drawn, but the results are normally not generalizable to other examples. Surveys include searching the literature or passing out questionnaires to experts to gather evidence. Meta-studies analyse other studies and try to gain knowledge by comparing different approaches. A more detailed introduction to empirical methods in computer science can be found in several books that have appeared recently [WRH+ 00, Pre01, JM01].
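To make the idea of a controlled comparison concrete, the outcome of such an experiment is typically analysed with a statistical test on the measurements of the two groups. The following sketch is illustrative only: the task times and the choice of Welch's t statistic are assumptions for the example, not taken from any study discussed here.

```python
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with
    possibly unequal variances (the larger |t|, the stronger
    the evidence for a difference between the group means)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = (var_a / len(sample_a) + var_b / len(sample_b)) ** 0.5
    return (mean_a - mean_b) / standard_error

# Hypothetical task completion times (minutes) of two participant
# groups, one using technique A and one using technique B.
group_a = [32, 41, 35, 38, 44, 30, 36]
group_b = [45, 50, 42, 48, 39, 47, 51]

print(f"t = {welch_t(group_a, group_b):.2f}")
```

In a real experiment the statistic would be compared against the t distribution (with Welch's degrees of freedom) to obtain a p-value; the sketch only shows the core computation.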
Many authors have pointed out that the level of experimentation in SE is not satisfactory. The quantity of experimentation is low, as many researchers are reluctant to validate their approaches empirically. The quality of experimentation is also often insufficient, as statistical methods are applied inappropriately or proper conclusions are not drawn from the experiments. Based on these observations, this paper discusses the role of experimentation in SE in more detail.
This paper is organised as follows: Section 2 contains an overview of general research approaches and describes how research is conducted in other disciplines; afterwards, the focus shifts specifically to experimentation and its application in SE. Section 3 analyses the status quo of experimentation in SE, while Section 4 sums up several common fallacies on this topic. Section 5 summarizes future directions of experimentation and presents some ideas on how to improve the conduct of empirical studies. Section 6 contains a critical reflection on some of the statements found in the literature analysed before. Finally, Section 7 concludes the paper.
As survey inquiries have hardly been conducted in SE and not enough background knowledge
has been collected, it is not yet possible to construct theoretical models in SE. Empirical models
are a necessary step towards theoretical models for SE.
An overview of the empirical methods most common in computer science and SE (controlled experiments, case studies, surveys, meta-studies) can be found in [Pre01, WRH+ 00].
Basili provides a classification of several kinds of experiments [Bas96], which are distinguished by:

Type of environment
In vivo: in the field (i.e. the software industry) under realistic conditions
In vitro: in the laboratory (i.e. the university) under controlled conditions

Level of control
Controlled experiment: typically in vitro, mostly with students; strong statistical confidence in the results, but expensive and difficult to control
Quasi-experiment: typically in vivo, with practitioners; qualitative character
Perry et al. [PPV00] mention that both the amount of empirical studies and their quality have been rising over the last 10-20 years. Empirical validation is still not a standard part of research papers, yet a powerful addition. Especially in the testing community, empirical studies are quite common.
US funding agencies such as the National Science Foundation (NSF) and the National Academy
Tichy also states that intuition and personal experience are not sufficient to claim the applicability of a product or process in a mature engineering discipline. Several examples are known in computer science in which intuition falsely favoured an opinion (the need for meetings in code reviews, the lower failure probabilities of multi-version programs). It is also dangerous to simply trust well-known experts and not to demand hard evidence for their claims. A fundamental precondition of science is that it is based on healthy scepticism.
has been made explicit by the original authors. Dependent variables are variables regarding the focus of the study. An example of a replication with a change of the dependent variables is using other metrics or measurements for the effects that are studied. Varying the context of an experiment might help in identifying influencing environmental factors. For example, an experiment might be carried out with a group of professionals as opposed to a group of students to analyse the influence of personal experience on the studied effects.

Replications that extend the theory: Replications in this category change a large part of the process or product under analysis to determine its limits of effectiveness.
6 Critical Reflection
After reviewing the literature about the role of experimentation in SE, some critical remarks on this topic are summed up in this section.

Tichy tried to refute common arguments of researchers who neglect empirical evaluation of their work, and tried to encourage scientists to put more emphasis on experimentation. However, the organisational effort of proper controlled experiments is still a major obstacle. Much research is conducted by PhD students, who simply do not have the means and time to conduct elaborate experiments. Including the software industry in experimentation, as suggested by Tichy, is also very difficult, because practitioners are hard to motivate to spend resources and money on testing unproven new methods if the direct value for their customers cannot be made evident. Also, as pointed out by Tichy, it is not necessary to evaluate every small research proposal with large controlled experiments. But the criteria for the necessity of experiments are still informal and hard to determine.
One problem sometimes mentioned by the authors cited here is researchers' inadequate knowledge of empirical methods in software engineering. This point is clearly underestimated by the empirical software engineering community. The researchers' knowledge is inadequate because universities do not teach empirical methods, and such courses are not part of the standard curricula. If experiments are conducted, the methods of experimentation have often been learned only ad hoc by the researchers, if they have laid emphasis on a methodological approach at all. Young researchers are seldom taught how to conduct an experiment properly and have to collect experience on their own. The facts that books about methods like experiments or case studies for SE have appeared only recently (in the last 5 years) and that the methods are still not well enough established also contribute to this situation. Additionally, the inappropriate application of statistical methods in experimentation can be seen as a result of the missing courses in the computer science curriculum.
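A typical instance of such statistical misuse is applying a parametric test to small samples whose distribution is unknown. A rank-based test such as the Mann-Whitney U test makes weaker assumptions. The following sketch of its U statistic is a hypothetical illustration (the defect counts are invented; the simple pair-counting form shown here omits the tie correction used by statistical packages):

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic of sample_a in the Mann-Whitney rank-sum test,
    computed by counting, over all pairs (a, b), how often a value
    from sample_a exceeds one from sample_b (ties count half)."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical numbers of defects found by two groups of reviewers,
# one using a systematic inspection technique, one reviewing ad hoc.
inspection = [3, 7, 5, 9, 6]
ad_hoc = [2, 4, 3, 5, 1]

u = mann_whitney_u(inspection, ad_hoc)
# Under the null hypothesis, U is expected near n_a * n_b / 2.
print(f"U = {u}, null expectation = {len(inspection) * len(ad_hoc) / 2}")
```

A U value far from the null expectation (here, far from 12.5 out of a maximum of 25) indicates that one group systematically outperforms the other; a real analysis would convert U to a p-value.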
As seen in the literature analysed above, researchers often criticize experiments that are carried out with students as the subjects under analysis, claiming that the results are not transferable to experienced practitioners. Apart from the organisational difficulties and high effort of including practitioners, there are other reasons to counter this criticism. First, experienced undergraduate students often become practitioners just a short time later, so their qualification and performance are not as different from practitioners' as it seems. Second, the experience of practitioners might actually distort the results of an experiment, because specialists might have an unusual advantage over common developers. Thus, results obtained with a few experienced specialists might not be generalisable to the average developer.
Another reason why researchers are reluctant to experiment, which has been neglected in the literature reviewed, is simply the fact that some researchers do not like to experiment and consider it an inconvenient but necessary task. Most scientists would rather create new ideas than spend time on validating old ones. Computer scientists in particular are more interested in technical problems and how to solve them. If their solutions are intuitively correct, most of them do not
bother to conduct further evaluation of them. They might even be scared of empirically proving their solutions wrong. A stronger motivation for experimentation has to be created, possibly by documenting popular experiments in SE that revealed unexpected results.
In the future, experiments will be conducted with higher quality, and more experiments will be conducted. But whether a level of experimentation like that of other disciplines (physics, medicine, psychology) can and needs to be established in SE remains doubtful.
7 Conclusions
In this paper the situation of experimentation in SE in the past, present, and future has been analysed. Experimentation is central to the scientific process [Tic98]. This statement is especially true for SE, because most research proposals cannot be proven formally. It has been discussed that experimentation is vital for SE to become a mature scientific discipline. Multiple data collection methods are known for empirical SE; controlled experiments are the method with the highest degree of confidence in the results. When analysing experimentation in SE, the special characteristics of the discipline (like the human factor and the high variability) have to be kept in mind.

In the past, experimentation in SE has not been sufficient, as two studies from the mid-nineties have shown, although an improving trend can be observed. The reluctance of researchers to experiment can be refuted with multiple arguments. Researchers in SE should always remember that empirical validation is an essential part of their work and that their proposals are not valid unless empirical evidence has been provided. In the future, hopefully not only the quantity but also the quality of experiments can be improved. It should be easier to conduct good empirical studies in SE because an increasing body of literature about the topic has been published (e.g. [WRH+ 00, Pre01, JM01]). Additionally, replicated experiments and families of studies should help to create larger bodies of knowledge.
References
[Bas93]

[Bas96]

[BCM+ 92] Basili, V.; Caldiera, G.; McGarry, F.; Pajerski, R.; Page, G.; Waligora, S.: The software engineering laboratory: an operational software experience factory. In: ICSE '92: Proceedings of the 14th International Conference on Software Engineering, New York, NY, USA: ACM Press, 1992, ISBN 0-89791-504-6, pp. 370-381, doi:http://doi.acm.org/10.1145/143062.143154

[BSH86]

[BSL99] Basili, V. R.; Shull, F.; Lanubile, F.: Building Knowledge through Families of Experiments. In: IEEE Trans. Softw. Eng. 25 (1999), 4, pp. 456-473, ISSN 0098-5589, doi:http://dx.doi.org/10.1109/32.799939

[FPG94] Fenton, N.; Pfleeger, S. L.; Glass, R. L.: Science and Substance: A Challenge to Software Engineers. In: IEEE Softw. 11 (1994), 4, pp. 86-95, ISSN 0740-7459, doi:http://dx.doi.org/10.1109/52.300094

[Gla94] Glass, R. L.: The Software-Research Crisis. In: IEEE Softw. 11 (1994), 6, pp. 42-47, ISSN 0740-7459, doi:http://dx.doi.org/10.1109/52.329400

[JM01]

[JM03]

[KPP+ 02] Kitchenham, B. A.; Pfleeger, S. L.; Pickard, L. M.; Jones, P. W.; Hoaglin, D. C.; Emam, K. E.; Rosenberg, J.: Preliminary guidelines for empirical research in software engineering. In: IEEE Trans. Softw. Eng. 28 (2002), 8, pp. 721-734, ISSN 0098-5589, doi:http://dx.doi.org/10.1109/TSE.2002.1027796

[LHKS92] Lewis, J. A.; Henry, S. M.; Kafura, D. G.; Schulman, R. S.: On the relationship between the object-oriented paradigm and software reuse: An empirical investigation. In: Journal of Object-Oriented Programming 5 (1992), pp. 35-41

[LHPT94] Lukowicz, P.; Heinz, E. A.; Prechelt, L.; Tichy, W. F.: Experimental Evaluation in Computer Science: A Quantitative Study. Tech. rep., Fakultät für Informatik, Universität Karlsruhe, August 1994

[MWB99] Murphy, G. C.; Walker, R. J.; Baniassad, E. L. A.: Evaluating Emerging Software Development Technologies: Lessons Learned from Assessing Aspect-Oriented Programming. In: IEEE Trans. Softw. Eng. 25 (1999), 4, pp. 438-455, ISSN 0098-5589, doi:http://dx.doi.org/10.1109/32.799936

[Pfl99] Pfleeger, S. L.: Albert Einstein and Empirical Software Engineering. In: Computer 32 (1999), 10, pp. 32-38, ISSN 0018-9162, doi:http://dx.doi.org/10.1109/2.796106

[PK04] Port, D.; Klappholz, D.: Empirical Research in the Software Engineering Classroom. In: CSEET '04: Proceedings of the 17th Conference on Software Engineering Education and Training, Washington, DC, USA: IEEE Computer Society, 2004, ISBN 0-7695-2099-5, pp. 132-137

[Pop59]

[Pot93]

[PPV00] Perry, D. E.; Porter, A. A.; Votta, L. G.: Empirical studies of software engineering: a roadmap. In: ICSE - Future of SE Track, 2000, pp. 345-355

[Pre96] Pregibon, D.: Statistical Software Engineering. National Academy of Sciences: Washington D.C., 1996

[Pre01]

[Spi88]

[Tic98] Tichy, W. F.: Should Computer Scientists Experiment More? In: IEEE Computer 31 (1998), 5, pp. 32-40

[TLPH95] Tichy, W. F.; Lukowicz, P.; Prechelt, L.; Heinz, E. A.: Experimental evaluation in computer science: a quantitative study. In: J. Syst. Softw. 28 (1995), 1, pp. 9-18, ISSN 0164-1212, doi:http://dx.doi.org/10.1016/0164-1212(94)00111-Y

[WBN+ 03] Walker, R. J.; Briand, L. C.; Notkin, D.; Seaman, C. B.; Tichy, W. F.: Panel: empirical validation: what, why, when, and how. In: ICSE '03: Proceedings of the 25th International Conference on Software Engineering, Washington, DC, USA: IEEE Computer Society, 2003, ISBN 0-7695-1877-X, pp. 721-722

[WRH+ 00] Wohlin, C.; Runeson, P.; Höst, M.; Ohlsson, M. C.; Regnell, B.; Wesslén, A.: Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, 2000

[ZW97] Zelkowitz, M. V.; Wallace, D. R.: Experimental Validation in Software Engineering. In: Information and Software Technology 39 (1997), pp. 735-743

[ZW98]