Business Research Methodology-Tata McGraw-Hill Education (2011) (Z-Lib - Io)
ABOUT THE AUTHORS
Dr T N Srivastava has been a visiting faculty in NMIMS University for about 12 years,
and has been on their board of studies for Decision Sciences. He is a retired Chief
General Manager, Reserve Bank of India. He obtained his doctorate degree in Statistics
under the guidance of the late Padma Bhushan Dr V S Huzurbazar. He has authored/edited
five books, and published 35 articles/papers in reputed American and Indian journals
and in the Economic Times. He has about 35 years of teaching experience, and has taught a
wide spectrum of officers from Banking and Finance, Defence, Corporate World, Indian
Economic and Indian Statistical Services, as also students of management programmes.
He has guided hundreds of participants of various programmes in carrying out studies
involving use of statistical techniques.
T N Srivastava
Visiting Faculty, NMIMS
Mumbai
Shailaja Rego
Chairperson,
Department of Operations and Design Sciences
Mumbai
Copyright © 2011, by Tata McGraw Hill Education Private Limited. No part of this publication may be reproduced or
distributed in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise or stored in a
database or retrieval system without the prior written permission of the publishers. The program listings (if any) may
be entered, stored and executed in a computer system, but they may not be reproduced for publication.
ISBN-13: 978-0-07-015910-5
ISBN-10: 0-07-015910-6
Vice President and Managing Director—McGraw-Hill Education: Asia/Pacific Region: Ajay Shukla
Head—Higher Education Publishing and Marketing: Vibha Mahajan
Publishing Manager—B&E/HSSL: Tapas K Maji
Associate Sponsoring Editor: Piyali Ganguly
Assistant Manager (Editorial Services): Anubha Srivastava
Senior Copy Editor: Sneha Kumari
Senior Production Manager: Manohar Lal
Production Executive: Atul Gupta
Deputy Marketing Manager: Vijay S Jagannathan
Senior Product Specialist: Daisy Sachdeva
General Manager—Production: Rajender P Ghansela
Assistant General Manager—Production: B L Dogra
Information contained in this work has been obtained by Tata McGraw-Hill, from sources believed to be reliable.
However, neither Tata McGraw-Hill nor its authors guarantee the accuracy or completeness of any information
published herein, and neither Tata McGraw-Hill nor its authors shall be responsible for any errors, omissions, or
damages arising out of use of this information. This work is published with the understanding that Tata McGraw-Hill
and its authors are supplying information but are not attempting to render engineering or other professional services.
If such services are required, the assistance of an appropriate professional should be sought.
Typeset at The Composers, 260, C.A. Apt., Paschim Vihar, New Delhi 110 063 and printed at
Lalit Offset Printer, 219, F.I.E., Patpar Ganj, Industrial Area, Delhi 110 092
Dedicated to
My parents,
wife Nita, our jewels Pankaj, Meeta, Vijay and Poonam
and our grand jewels Simran, Sajay, Saluni and Sumil
—T N Srivastava
The motivation for writing this book is the need that has been felt by a number of MBA students
as well as faculty members for a text that
has simple language and lucid presentation
provides comprehensive coverage of both theory and practice, oriented towards the business
environment
provides tools for manual as well as modern computing.
Thus, the book should be complete in all respects to undertake a research study, collect and
analyse the relevant data, and prepare and present a report.
Accordingly, this book is designed to contain a judicious blend of the theory and practice of
business research and the understanding and application of statistical methodology. It has
reader-friendly illustrations, especially on the use of statistical packages for statistical
analysis. The book should be self-sufficient for MBA students to understand and apply research
methods for carrying out complete research projects from concepts to conclusions and, finally,
report writing.
Objectives
We have tried to meet several objectives during creation of this text. They are
1. To create an interest and motivation for studying the subject of research
2. To explain the concepts associated with research, and highlight its importance in a business
environment
3. To provide the skills necessary for conducting research projects, which are an integral part of
the curriculum
4. To provide expertise in the use of requisite statistical techniques, both manually and with
computer packages, as used in research projects.
5. To develop competence and confidence among students in identifying managerial issues that
could be resolved by organising an appropriate research project and its subsequent
implementation; in fact, the contents of the book are designed to enable a student to play
a pivotal as well as an advisory role in future.
For Whom
This book is intended to serve as a textbook for MBA students who pursue the subject of Business
Research Methodology.
This book is beneficial also for students in the field of education and behavioural sciences and
other professional courses like M Phil.
The book should also be useful to faculty members for conducting the course on Business Research
Methodology. They would benefit from exclusive teaching aids like the PowerPoint presentations
given on the website of the book.
Contents
1. Chapter 1 encompasses the relevance and importance of research, in general, and business
research, in particular. It also presents an overview of the entire process of conducting
research, called the 'research process', in simple language, to provide a general understanding
of the various aspects of business research. In fact, it lays the foundation for easy learning
of various concepts and a thorough understanding of the topics in subsequent chapters.
2. Chapter 2 clarifies various concepts, viz. constructs and concepts, variables, and
deductive and inductive logic, that are an integral part of a research project. It also describes
quantitative and qualitative research, and the case study method of research.
Two innovative features of this chapter are the discussions on two very important topics, viz.
Creativity and PERT/CPM, which are highly relevant in conducting a research study but are
generally not covered in books on BRM. While creativity is integral to any research or
management activity, or for that matter to any activity in life or the work environment,
PERT/CPM is essential for managing any activity or project, including a research study.
Contrary to the wrong notion that it is a complicated technique useful only for big projects,
it is a simple technique applicable to any work, right from preparing a cup of tea, to
conducting a research study, to building a stadium or even organising the Olympic Games.
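The CPM idea mentioned here, that a project's total duration is fixed by its longest chain of dependent activities (the critical path), can be sketched in a few lines. The activities and durations below are hypothetical, chosen to match the tea-making example in the text.

```python
# Minimal critical-path computation over a toy activity network.
# Activities and durations (in minutes) are hypothetical, for illustration only.
durations = {"boil_water": 5, "get_cups": 1, "brew_tea": 3, "add_milk": 1, "serve": 1}
predecessors = {
    "boil_water": [],
    "get_cups": [],
    "brew_tea": ["boil_water"],          # brewing needs boiled water
    "add_milk": ["brew_tea"],
    "serve": ["add_milk", "get_cups"],   # serving needs both tea and cups
}

def earliest_finish(task, memo={}):
    """Earliest finish time of `task`: longest timed path from the start."""
    if task not in memo:
        start = max((earliest_finish(p) for p in predecessors[task]), default=0)
        memo[task] = start + durations[task]
    return memo[task]

# The project duration is the latest earliest-finish over all activities.
project_duration = max(earliest_finish(t) for t in durations)
print(project_duration)
```

The longest chain here is boil_water, brew_tea, add_milk, serve, so the parallel get_cups activity does not affect the total duration.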
3. Chapter 3 describes various topics that are relevant for the actual conduct of research in an
organisation. It provides an exhaustive and comprehensive view of the various steps of a research
process, from problem identification to hypothesis development, i.e. developing the statement
to be tested for acceptance or rejection. This facilitates the conduct of further study,
involving collection of data, carrying out relevant analysis, etc., and ultimately resolving
the problem that necessitated the research study.
It also provides guidelines to students for carrying out research projects that are an integral
part of their curriculum.
4. Chapter 4 provides comprehensive knowledge of the various experimental and research designs,
and their applications in the business environment, in general, and in conducting a research
study, in particular.
5. Chapter 5 provides a comprehensive understanding of the four types of measurement scales, viz.
Nominal, Ordinal, Interval and Ratio. It is intended to equip the reader with a basic toolkit of
various comparative and non-comparative scales for research.
6. Chapter 6 explains primary and secondary types of data, with their respective advantages and
limitations. It describes the various sources of primary and secondary data, as also the various
methods of collecting such data, including guidance for web-based searches.
7. Chapter 7 explains the process of collecting primary and secondary data. It provides requisite
knowledge about various aspects associated with designing a questionnaire for collection of
primary data. It also provides guidance in collecting/recording data from secondary sources
and also outlines the steps for preparation of data.
8. Chapters 8 to 13 relate to statistical topics that are covered, to a varying degree, in various
institutions prior to the discussion on the subject of Business Research Methodology. These
topics have been discussed in the book with an orientation towards conducting business
research studies in various areas/fields.
9. Chapter 14 contains the Multivariate Analysis techniques which are included in the syllabi of
most management institutes. However, different institutes prescribe different subsets of
these techniques, out of the entire set covered in the book. Incidentally, all these techniques
are highly useful in the designing and marketing of products and services. Many research
studies remain incomplete without the use of these techniques, mainly due to unawareness of
them or lack of expertise in the use of the SPSS software package for arriving at ultimate
conclusions. The chapter tries to fulfil this need.
10. Chapter 15 provides guidelines for preparing a report of the research study.
11. Chapter 16 deals with the ethical issues associated with various levels of hierarchy involved
with a research project, and with various stages of a research process.
12. Appendix 1 describes the Indicative Topics for Business Research.
13. Appendix 2 describes the role of EXCEL in statistical calculations. Templates have been
provided that take the drudgery out of calculations: one is required only to input the data in
the corresponding template, and the computer automatically calculates and displays the result
on the screen. One is not even required to remember the statistical formulas, as these are
built into the templates.
14. Appendix 3 describes the role of the SPSS statistical package in business research.
Exclusive Features
The contents of the book are perceived to be an ideal blend of the theory and practice of
Business Research and Research Methodology, not only in size but also in depth. The exclusive
features are indicated below:
Business Research in an Organisation
Relevance of Research for MBA Students
Guide to Conducting Research Projects by Students
Research in Management Institutions: Some Thoughts
Dissemination of Research
Indicative Topics for Business Research
Time Scheduling: PERT and CPM
Creativity and Research in an Organisation
Research at Corporate and Sectoral Levels
Guide for Conducting Good Business Research
A Consultant’s Approach to Problem-solving
Cross-sectional Studies
Longitudinal Studies
Simulation
Use of Graphs as Management Tools
EXCEL
Relevant Details of the Package
Templates for Various Statistical Formulas
Using Templates for Solving Exercises
SPSS
Relevant Details of the Package
Choice of Technique and Inputting Data
Interpretation of Output Generated by the Package
CD
Examples
Data Sets
Excel Templates (to facilitate numerical calculations)
Cases
Faculty Resource
PowerPoint Presentations
Additional examples to explain the concepts/topics
Solutions to questions/ problems
Acknowledgements
We are grateful to Dr Rajan Saxena, Vice Chancellor, NMIMS Deemed University, for the permission
and encouragement to Shailaja Rego for writing the book, as also for writing the 'Foreword'
to the book. We are also thankful to Prof Kavita Laghate of the Bajaj Institute of Management
Studies for encouraging and supporting the project by her critical evaluation of the manuscript,
as also for providing some illustrations and cases. We also express our thanks to Mr Leslie Rego
and Mr Pankaj Srivastava for going through some parts of the manuscript and offering their
valuable suggestions.
Reputed magazines like Business Today, Businessworld, Business India and India Today, and
newspapers like the Economic Times, Hindustan Times and the Times of India publish live data
about individuals and companies in well-researched articles. While some of this data is analysed
by them, the rest is published just for dissemination among the readers. We have used their data
to indicate the use of statistics in analysing live data that could arise in any organisation.
However, while doing so, we have restricted ourselves only to analysing the data, without making
any comments on the companies
and the individuals. We would like to add that while we have taken due care to avoid any errors
or omissions in recording the names of companies and individuals and the data relating to them,
if any errors or omissions have occurred, these are totally inadvertent, and we extend our
sincere apology for the same. The data and analysis provided by us could even stimulate thinking
about collecting similar data for facilitating decision-making in the work environment.
We express our sincere appreciation of the efforts and contribution of Rohit Kumar Singh,
Sarbani Choudhuri and Rohit Jain, students of MBA at NMIMS Deemed University, Mumbai, towards
critically going through the various parts of the manuscript, and of their valuable advice. We
are thankful to the students Akshay Cotha, Udit Sharma, Afreen Firdaus, Sajan John, Annu Asthana,
Kinshuk Awasthi, Saurav Kumar, Gautam M, Anushital, Suman, Ramuni, Manu Priya, Mittal Modi,
Devang Shah, Nirbhay Singhal, Sumit, Ahlawat, Kamal Kant Kaushik, Surya Sridhar A, Rishav Garg,
Namit Saigal, and Sachchida Anand Sudhansu for helping in the case studies and sharing their
project data. We are also thankful to R Vishwanathan and Prateek Gala, MBA students, for helping
in the conduct of surveys.
We are also thankful to Mrs Carol Lobo for the valuable support provided by her.
We conclude the acknowledgements by recording our grateful thanks to the publishers and their
dedicated team of officials, viz. Ms Vibha Mahajan, for approval as well as generous support for
the project, and Mr Tapas K Maji, for inspiration right from the conceptual stage. We are
especially thankful to Ms Piyali Ganguly for her guidance, in general, and editorial support, in
particular. Her encouraging support, guidance, patience and prompt response to our queries have
been largely responsible for this book being published in the stipulated time. Mr Manohar Lal,
Ms Anubha Srivastava, and Ms Sneha Kumari have contributed significantly by their untiring
efforts in bringing out this book.
We would also like to acknowledge the following reviewers for their invaluable feedback:
1. Sanjiwani Kumar, K.J.Somaiya College of Management, Maharashtra
2. Durga Surekha, SIES College of Management, Maharashtra
3. Vikas Nath, Jaipuria Institute of Management, Uttar Pradesh
4. S.Sivaiah, Malla Reddy PG College, Andhra Pradesh
5. K.Bharathi, St.Peters Engg College, Andhra Pradesh
6. Sourabh Bishnoi, Birla Institute of Management, Uttar Pradesh
7. Nisha Agarwal, Institute of Foreign Trade & Management, Uttar Pradesh
8. Sunita Tanwar, Ansal Institute of Technology, Haryana
9. Sanjeev Sharma, Apeejay School of Management, New Delhi
10. Susan Das, Asian School of Business Management, Orissa
11. RN Subudhi, KIIT School of Management, Orissa
12. Vijaya Bandyopadhyaya, KIIT School of Management, Orissa
13. P.Tony Joseph, Hindustan University, Tamilnadu
14. Tanmoy De, Institute of Management & Information Science, Orissa
T N SRIVASTAVA
SHAILAJA REGO
LEARNING OBJECTIVES
The main purpose of this chapter is to provide a basic understanding of the objective of and
motivation for conducting research. The concept of research, as also the various types of
research, are described in this chapter.
The criteria, characteristics and challenges of conducting any research are indicated. All the
steps for conducting a research study constitute what is called a 'research process'. This
chapter provides an overview of the complete process of a research study, in simple language,
so as to lay the foundation for easy learning of various concepts and a thorough understanding
of the topics in subsequent chapters.
A complete guide is provided to students, especially those pursuing an MBA, to instill in them
the confidence, and to develop in them the competence, for understanding a research study.
Relevance
Mr Raj Kumar retired gracefully, earning a great deal of respect from his juniors as well as
seniors, as evidenced by the grand farewell function which he had just attended.
While returning home with a cheque of Rs 50,00,000, which he received as payment of PF,
gratuity, etc., he was thinking about his wife, who, unlike him, was not mentally prepared for
this unavoidable day. He reached home and took his wife out for dinner. On the way to the
restaurant, at his wife's favourite jewellery shop, he purchased a diamond earring to boost her
morale. While the day thus ended peacefully, the next morning Mr Kumar himself started pondering
over the ways to manage his future through financial planning. Even though retirement was a
reality of life, he had never given a serious thought to it.
He started thinking of various investment options to ensure a monthly inflow of income
that would be sufficient to live a reasonable quality of life. He thought about fixed
deposits of banks, post offices and corporate bodies, monthly income schemes, mutual funds,
etc. One simple investment, in the MIS of a bank, came to his mind; if he invested all Rs 50 lakh
there, he would get a monthly income of about Rs 33,500 (assuming 8% interest). While this
amount was sufficient at present, he prudently visualised that, at an annual inflation of about
7%, the purchasing power of Rs 33,500 would be equivalent to only Rs 19,546.93 after 10 years
and Rs 9,054.01 after 20 years. He figured out that this was not the ideal investment, and that
he had to collect detailed information about all types of investments, enumerating their
advantages and disadvantages. Since he could not invest on a trial-and-error basis, he had to
make a thorough study of all the options. While some investments assured interest/amount on
maturity or on a yearly basis, other investments like mutual funds indicated higher expected
returns.
He also had to be financially prepared for contingencies like sickness, recession, etc.
He had to finalise a judicious mix of various types of deposits and investments in mutual
funds, etc. so as to ensure a regular inflow of income to assure a financially secure life for him
and his wife.
How the subject of Business Research Methodology (BRM) can help in providing a solution
to such problems is indicated later, in Section 1.7.
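Mr Kumar's back-of-the-envelope arithmetic can be reproduced with a short calculation. This is an illustrative sketch: the interest rate (8%), inflation rate (7%) and annual compound discounting are assumptions, so the exact figures it prints will differ slightly from the rounded figures quoted in the story.

```python
# Sketch of Mr Kumar's monthly-income and purchasing-power arithmetic.
# Assumptions: Rs 50 lakh principal, 8% annual interest paid out monthly,
# 7% annual inflation, annual compound discounting.

def monthly_income(principal, annual_rate):
    """Monthly interest income from a deposit paying annual_rate."""
    return principal * annual_rate / 12

def real_value(nominal, inflation, years):
    """Purchasing power of a nominal amount after `years` of inflation."""
    return nominal / (1 + inflation) ** years

principal = 50_00_000  # Rs 50 lakh (underscores group digits Indian-style)
income = monthly_income(principal, 0.08)

print(f"Monthly income today:            Rs {income:,.2f}")
print(f"Purchasing power after 10 years: Rs {real_value(income, 0.07, 10):,.2f}")
print(f"Purchasing power after 20 years: Rs {real_value(income, 0.07, 20):,.2f}")
```

The point of the example survives any reasonable choice of rates: a fixed nominal income loses roughly half its purchasing power every decade at 7% inflation.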
Name Research
Abraham Harold Maslow Hierarchy of Needs (Physiological, Safety, Social, Esteem, and Self-Actualisation)
Albert Einstein Relativity Theory
Albert Humprey SWOT Analysis (Strength, Weakness, Opportunities, Threats)
Alex Osborn Brainstorming
Alfred Nobel Dynamite. Instituted the Nobel Prizes in the areas of chemistry, physics,
literature, international peace and medicine
Archimedes The Archimedes Principle
Aryabhata Digit ‘0’
Bill Smith at Motorola (later adopted at GE) Six Sigma Model
Chandrasekhara Venkata Raman Raman Effect (Scattering of Light)
Charles Darwin Darwin's Theory of Evolution
CK Prahalad Fortune at the Bottom of the Pyramid Strategy
EJ McCarthy Four ‘Ps’ (Product, Price, Place, Promotion) classification Model in
Marketing
Galileo Galilei Used the telescope to provide evidence that the Earth revolves around the Sun
CK Prahalad and Gary Hamel Core Competence Model
Hammer and Champy Business Process Reengineering (BPR)
Herman Ebbinghaus Learning Curve
IBM Smart Planet Project
James Maxwell Electromagnetic Theory
Japanese Business Companies Kaizen Philosophy of Continuous Incremental Improvements
John Logie Baird Television
John Nash Game Theory (Nash Equilibrium)
Marie Curie Theory of radioactivity (the first person honoured with two Nobel
Prizes)
Michael Porter Competition and Company Strategy (Five Competitive Forces Framework, Value
Chain), Competition and Economic Development (Clusters, Diamond Model), and Competition and
Societal Issues
Myron Samuel Scholes Black–Scholes Equation (Valuing Derivatives/Options)
Norman Borlaug Green Revolution in India (Wheat)
Peter Drucker Management by Objectives, Knowledge Work Productivity
Raymond Vernon Product Life Cycle
Sir Isaac Newton Newton's Laws of Motion
Stephen R. Covey Seven Habits of Highly Effective People
Taiichi Ohno, Toyota Corporation Lean Manufacturing (Just-In-Time Supply Chain)
Thomas Alva Edison Motion Picture Camera and Electric Bulb
Wilhelm Conrad Rontgen X-rays
Wright Brothers, Orville and Wilbur Airplane
We believe that, in the broadest sense, research is simply the process of finding solutions to a
problem after a thorough study and analysis of the situation.
If we take the above concept to its logical conclusion, all those who are engaged in finding
new items and/or better ways of doing things could be said to be doing research.
Research Perceptions
“Deep dive into a subject; thirst for knowledge”
“Comprehensively analysing data to look for trends, themes, revelations”
“Curiosity and how it fuels the passion to discover or uncover truths, myths, trends, themes,
etc.”
“Piles and Piles of data–can get too mired in details”
“Needs someone with a strong ability to synthesise all the information and data to make it
meaningful, cogent and relevant (relevance can often get overlooked)”
“Scientific research, labs, focus groups, numbers, statistics”
“Ways of working out market research, production planning, budgeting, manpower forecasting”
“Looking for data, and/or information to provide meaningful insights towards a business
problem”
“Ph.D.”
“Ability to deliver insight”
“Doing analysis and background checks, examining options and alternatives in order to arrive
at best choice for a business challenge/requirement”
“A person in a white coat working in a lab”
“Something for the betterment of civilisation”
(iv) To continuously improve the effectiveness of present systems and procedures in any field. For
example, compensation, recruitment and retention policies.
(v) To test or challenge existing beliefs, notions, etc., which have not been empirically proved
so far, or whose relevance may have changed with the flux of time, and which therefore need to
be tested again in the changed context/environment. For example, the relationship between
intelligence and creativity.
(vi) To explore new areas that might have become relevant, or might even become relevant in the
near future. For example, alternative sources of energy to reduce carbon emissions.
(vii) To analyse past data for discovering trends, patterns and relationships. For example,
business performance, prices of stock, oil, etc.
(viii) To expand the sphere of knowledge (K), and increase the horizon of vision (V). However,
incidentally, this simultaneously increases the realisation of ignorance (I). This phenomenon
is explained through the following diagram:
While standing at the top of the knowledge sphere, we cannot see the area below the surface of
the sphere, but we can see farther towards the horizon.
From the above diagram, we note that as the sphere of knowledge grows, the vision increases but,
simultaneously, the realisation of ignorance also increases. It is perhaps because of this
reason that great philosophers and intellectuals are very humble, as they realise how much they
do not know!
(ix) Economic
Economic factors also play an important role in conducting a research study. For example, high
prices of petrol led to the development of cars with diesel engines. Higher value of dollar led to
product substitution for imported items. Volatility in stocks and foreign exchange markets led to
research in management of risk in financial investments.
(x) Infrastructure
Infrastructure is also important for conducting a research study. For example, increasing prices
of residential property and higher home loan rates led to designing and development of low-cost
housing projects.
(xi) Operations/Process Driven
With discerning customers opting for higher-quality products and services, companies started
implementing the Six Sigma system in operations. Research is definitely needed to achieve the
desired objectives in this direction. As regards processes, one illustration is from the
mechanical harvesting of tomato crops in the USA. When it was observed that many tomatoes got
spoiled due to their thin skin, research was conducted to increase the thickness of the tomato
skin, resulting in a considerably lower percentage of spoiled tomatoes.
(xii) Coping with Changes
It is said that
“The world is moving so fast that if you just want to remain where you are, you have to run”.
“The only thing that is constant in the world is the CHANGE”.
We would like to add that:
The only thing that is certain in the world is that Rate of Change will keep on
increasing.
The changes could occur in any one or all the factors described above.
The above factors are depicted in Fig. 1.1 on the next page.
Research is also classified as Primary or Secondary research, depending on the type of data
used. If the data is primary, i.e. collected specifically for the study, the research is called
primary research; but if the data is copied/recorded from published sources, the Internet, etc.,
the research based on such data is called secondary research. Incidentally, market research
involves both primary and secondary research, as indicated below:
Primary Research (based on):
Data collected by government agencies, like data on prices, production, bank deposits, etc.
Data collected through consumer surveys, opinion polls, interviews, etc.
Financial data published in annual reports of companies
Original reports published by the collectors of data, like the Annual Survey of Industries by
the Government and the Annual Report by the Reserve Bank of India

Secondary Research (based on):
Reports/publications analysing and evaluating data collected by others
Tabular, pictorial and graphical presentations based on primary data
Compilation of data from a number of sources (as done by the Centre for Monitoring Indian
Economy), and then publishing it
Articles interviewing those who conducted primary research
The details of collecting primary and secondary data are explained in Chapters 6 and 7.
Further, research could be termed Quantitative or Qualitative, depending on the topic/aspect
of research and the type of data collected for analysis. If the data involves quantitative
aspects like measurement and counting, the research is termed quantitative. However, if the
research involves the study of behaviour, attitude, etc., it is termed qualitative. These are
discussed, in detail, in Chapter 2.
The failure rate is highest in the beginning, and decreases with time. Thereafter, it is constant for
quite some time, and then it starts rising.
Many applied fields like electronics, medicine, pharmaceuticals and biotechnology owe their
growth to basic research in the basic sciences such as physics and chemistry.
* Incidentally, the failure rate of physical items, as well as of living beings, increases with
time or age.
Yet another example is the frequency of occurrence of the different letters of the alphabet in
English text. All the letters, viz. a, b, c, ..., y, z, are not used equally frequently in the
English language. It may be noted that the letter ‘e’ is used most frequently, to the extent of
about 13%, while the letter ‘z’ is used least, with a frequency of only about 0.1%.
Samuel F. B. Morse used this empirical analysis in designing codes for the various letters for
the transmission of messages. These codes comprise two symbols, viz. dots (‘.’) and dashes
(‘–’). The assignment of codes to the different letters was based on the criterion that the more
frequently a letter is used, the lesser should be the time for its transmission. That is how he
assigned the code ‘.’ to the letter ‘e’, the most frequently used letter, so that it would take
the least time to transmit. Present-day variable-length binary codes comprising two digits, viz.
zeros (0s) and ones (1s), which are ideal for transmission through computer media, are also
designed on the above criterion.
Anyone engaged in printing, manufacturing or using the various letters for any other purpose
also uses the above information, as the letters need not be produced or used in equal
quantities.
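The frequency analysis described above is easy to reproduce: count the letters in any stretch of English text and normalise by the total. The short sample string below is only an illustration; the familiar figures (about 13% for 'e', about 0.1% for 'z') come from large corpora, and any small sample will deviate from them.

```python
from collections import Counter

def letter_frequencies(text):
    """Relative frequency of each letter appearing in `text` (case-insensitive)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {letter: count / total for letter, count in counts.items()}

# Illustrative sample only; corpus-level estimates need far more text.
sample = ("research is simply the process of finding solutions to a "
          "problem after a thorough study and analysis of the situation")
freqs = letter_frequencies(sample)

# Print the three most frequent letters in this particular sample.
for letter, share in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{letter}: {share:.1%}")
```

Running this on a large corpus instead of the sample string recovers the ordering that Morse exploited when assigning the shortest codes to the commonest letters.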
Assessment of Demand for the Product or a Service; Customer Profiling; Customer Relationship
Management (CRM)
Finance and Banking: Risk Management; Credit Rating; Evaluation of Investment; Quality of
Assets; Brand Evaluation; Equity Research; Derivatives/Futures and Options; Mergers and
Acquisitions
Operations: Operations Research; Six Sigma (Controlling and Improving Production Process and
Quality); Technology Absorption; Supply Chain Management
HRD: Recruitment and Retention Policies; Performance Appraisal and Reward System; Training
IT: Website Management; Network Management; Decision Support System; Business Intelligence and
Data Mining
Retail Management: Identifying Customer Buying Behaviour (Preferences and Patterns)
Insurance: Designing Policies of Various Types; Impact of Different Factors on Health and Life
Telecom: Criteria for Selection of Phone and Service Provider among Different
Age/Income/Professional Groups
A detailed list of indicative topics for business research is given at the end of the book.
For example, a shirt manufacturer sponsored a survey to find the percentage of executives
purchasing different sizes of a shirt. The interviewer (researcher) was asked to record the
sizes 36, 38, 40, 42 and 44 as indicated by the executives. The exploratory survey indicated
that quite a good percentage of executives indicated their size as 39 or 41 (shirts which were
either imported or tailor-made). This information led to a change in the questionnaire to
include these options.
In a telecom survey, respondents were required to indicate the criteria for selecting a service
provider. A number of categories of users were listed; however, while conducting an exploratory
survey, it was observed that about 5% of the respondents (selected by the criterion of
contacting every 20th person entering the mall) were retired persons. Thereafter, the categories
of user were modified to include the retired category.
Similarly, exploratory analysis is considered mandatory while designing Management Informa-
tion System (MIS) for an organisation.
One interesting feature of exploratory research is the flexibility in the selection of units for
recording information. For example, information/data may be collected through interviewing
persons who could be either experts in the area, or subjects/participants for whom the research
is being conducted. In such cases, the selection of personnel is somewhat flexible and not
rigid. It depends on the researcher's perception as to who would be able to provide the relevant
and requisite information.
In the context of exploratory research involving use of quantitative data, the following features
may be studied:
Range of the variables being studied.
Variability in the data – helps in deciding the sample size; greater variability requires a larger
sample.
Proportion of units having a particular characteristic – a proportion close to 0.5 indicates the
need for a bigger sample. As indicated in Chapter 10, the sample size required for estimating
a proportion is maximum when the proportion is 0.5. Thus, if in the exploratory sample the
proportion of units having the particular characteristic is 0.5, the required sample size will be
at its maximum; it goes on decreasing as the observed sample proportion moves away from
0.5 on either side.
Proportion of units in various categories/groups (a significantly higher or lower proportion may
lead to redefining the categories/groups).
For example, in the credit card survey of a bank, the variable 'average amount respondents
would be ready to spend per month if a cash-back of 5% were offered' was categorical, with
categories: less than 5000, 5000 to 10000, 10000 to 20000, 20000 to 50000 and above 50000.
For conducting exploratory research, a preliminary sample of cardholders was selected. The
exploratory analysis found that the category 'less than 5000' had 40% of the responses, while
'above 50000' had none. This prompted the researcher to redesign the categories as: less than
2500, 2500 to 5000, 5000 to 10000 and above 10000.
Trend Analysis/Pattern – for example, plotting the sales of selected items on various days of
the week may indicate the trend and pattern of sales.
Association – a study at a garment store may be used to discover an association between the
colour of shirts and the age group of customers.
1.18 Business Research Methodology
Distribution – return on equity of a group of companies, or stock indices like Sensex and
Nifty – symmetrical or skewed.
Study the impact of proposed changes
Study the factors contributing to substantial decrease or increase in business
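Some of these features lend themselves to quick computation. For instance, the point about proportions can be sketched numerically using the common sample-size formula n = z²p(1−p)/e². The confidence level (95%, so z = 1.96), the ±5% margin of error and the illustrative proportions below are assumptions for the sketch, not values from the text.

```python
import math

def sample_size_for_proportion(p, margin=0.05, z=1.96):
    """Approximate sample size needed to estimate a population
    proportion p within +/- margin at roughly 95% confidence."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# The requirement is largest at p = 0.5 and falls away on either side.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, sample_size_for_proportion(p))
```

With these assumptions, the requirement is 385 respondents at p = 0.5 and only 139 at p = 0.1 or 0.9, which is why an exploratory estimate of the proportion helps fix the main survey's sample size.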
We may associate three issues with exploratory research, viz.
Why is it conducted?
When is it conducted?
How is it conducted?
These are explained in the following table.
Why
– To formulate the precise problem for a more detailed or deeper study
– To arrive at some theory or hypotheses to be tested in the detailed study
– Discovering ideas, insights, trends and patterns
– Designing a questionnaire/schedule
– Designing an MIS
When
– There is no prior research in the similar field, or the field is ever changing
– There are limitations of resources like time and money
How
– Methodology of conducting the study: qualitative or quantitative research, using primary or secondary data
had a work experience of 2.5 years, at the time of joining MBA as compared to 0.5 years for 2009
batch. Further, the employment scenario in 2009 was not as good as in 2008.
* The blueprint of appropriate study and methodology for collection of data, measurement of data elements/
units, and analysis of data.
7. Data Analysis: Calculation of fixed income and estimation of income associated with invest-
ments in equity, mutual fund and real estate, etc.
8. Interpreting and Conclusion: In the form of various options, their liquidity and returns with
risk assessment.
9. Final Decision by Raj Kumar
(b) This enables another researcher to repeat or replicate the research without any
ambiguity.
(iii) Establishes credibility of research and its conclusions.
(iv) All the assumptions are stated and the sources of data indicated. This enables readers to get
further insights by referring to the sources and to assess the applicability of the conclusions.
(v) Scope and limitations are clearly brought out, enabling conclusions to be drawn in a realistic
manner.
Challenges of Research Research faces challenges at every stage, as illustrated below:
The first challenge is to clearly define the objective so that it can be translated into hypotheses.
Many times it is quite difficult to find the cause of a problem from a symptom. Just as a doctor
tries to find the root cause of a symptom, a researcher should aim to identify the root cause
and formulate the right hypotheses.
The literature review poses the following challenges:
Identifying which literature is relevant for the study;
Deciding how much to review (when to stop and proceed to the next stage).
The process of developing hypotheses faces the following challenge:
Formulating the right hypothesis that suits the objective.
At data collection stage, the major challenges are:
Availability of adequate, reliable and relevant data;
Maintaining the accuracy with limited resources;
Not changing the methodology to suit some vested interests, or using a new methodology
merely because it is 'comfortable' though not applicable;
Having the courage to explain or accept deviations from the plan, and to own the responsibil-
ity;
Choosing an appropriate sampling design.
At analysis stage, the challenges are:
Using right tools and techniques;
Considering and testing whether the assumptions of the tools/techniques are satisfied. For
example, if one uses regression analysis, one has to test the assumptions of linearity and
normality (refer to Chapter 10).
At conclusions/interpretation stage, the challenges are:
To avoid the temptation to suppress or exaggerate conclusions to make the study
'sensational';
Not to manipulate results to serve one's own purpose;
To withstand pressure to change conclusions to suit vested interests.
Even though the above points are guiding principles for any research, in practice, a research
study carried out for or within an organisation is as good as it is understood by decision-makers
and implemented to get desired results. The researchers may like to keep this in mind without
compromising on ethical standards described in Chapter 16.
Introduction—Scope and Applications of Research 1.25
Relevance of MBA
While the Late Mr. Dhirubhai Ambani was having a chat with MBA students after delivering
a convocation address, he was asked why doing an MBA was necessary when he could set
up such a big business empire without an MBA degree. Mr. Ambani is reported to have replied,
in all humility, that he was blessed by God with business acumen, a visionary approach,
etc., and therefore he could succeed in life. If they (the students) were similarly blessed,
they need not go for an MBA; but if they were not, then doing an MBA would certainly help
them manage their pursuits better.
In addition to successful research carried out by the students, there are certain cases of unsuc-
cessful research projects due to various factors such as the use of inappropriate methodology,
non-availability of desired data, collection of data not amenable to relevant software package, etc.
Compilation of even such unsuccessful experiences could provide a ‘learning’ experience for the
students. This aspect is illustrated with a research project undertaken by a group of students at a
management institute.
S U M M A RY
The genesis of any research lies in the urge to improve/innovate—either on one’s own or necessitated
by outside environment. Similar is the case with an organisation. Various motivating factors that
cause an individual or an organisation to conduct research are described in detail. These factors
may arise on one's own initiative, or may be caused, sometimes even 'forced', by an external
environment that is beyond control.
Research is of various types like basic, applied, business, exploratory, descriptive, causal and
normative. Various kinds of research are used in studying various aspects of the functioning of a
business organisation or an entity.
For any research to be effective and useful, it has to fulfill certain criteria, have certain charac-
teristics and meet certain challenges.
Any research is as good as the researcher(s). Therefore, one has to possess certain qualities that
enable him/her to pursue the mission successfully.
DISCUSSION QUESTIONS
1. Discuss the various motivating factors for conducting a research study. Explain with an illustra-
tion.
2. Discuss the various types of research with examples.
3. Describe the various criteria, characteristics and challenges of a good research.
4. Describe the qualities desired in a researcher.
5. Describe the different steps of research process and illustrate with a suitable example.
Concepts and Tools for
Business Research
2
1. Concepts:
(a) Introduction
(b) Research Methodology and Research Methods
(c) Selection of Appropriate Research Methods
(d) Constructs and Concepts
(e) Variables
Contents (f) Deductive and Inductive Logic
(g) Quantitative and Qualitative Research
(h) Case Study Method of Research
(i) Goal Setting for a Research Project
2. Tools
(a) Time Scheduling – PERT and CPM
(b) Creativity and Research in an Organisation
LEARNING OBJECTIVES
Business Research Methodology (BRM) is an improvement over earlier methods of conducting research
that relied merely on statistical methods and common sense. Over a period of time, researchers, drawing
on their experience of certain shortcomings and with an urge to make research more productive and
useful, extended the scope of statistical research to make optimum use of resources.
BRM is also more comprehensive, offering in-depth situation analysis that is useful for decision-making.
For example, a company, apart from analysing the varying demand for its various products vis-a-vis that
for its competitors, would also like to study the behavioural aspects of its employees with respect to their
performance of duties and their perception of customers. It could also gather suggestions from employees
about improving systems and procedures, and expectations and preferences from customers and
dealers. Further, through such studies, the company could explore ways and means of improving quality
and reducing costs in view of the competition.
In the process of improving the scope and utility of BRM, several tools and techniques have been
developed. The purpose of this chapter is to describe some of these separately, for their better
understanding. If they were discussed only where they happen to be relevant in the text, their wider
application and potential would be marginalised, since they are useful, beyond BRM, in other areas
relevant for a researcher/executive. For example, the concepts and techniques of PERT and CPM are
useful in any sphere of managing work, in addition to BRM.
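As a foretaste of the PERT/CPM material taken up later in the chapter, the critical-path idea can be sketched on a tiny research project. The activity names, durations and dependencies below are invented for illustration, not taken from the text.

```python
# Critical Path Method (CPM), minimal sketch: the earliest finish of
# each activity is its duration plus the latest earliest-finish among
# its predecessors; the project cannot finish before the maximum of
# these values.
activities = {  # name: (duration in days, list of predecessors)
    "define problem":     (3, []),
    "literature review":  (5, ["define problem"]),
    "design survey":      (4, ["define problem"]),
    "collect data":       (7, ["literature review", "design survey"]),
    "analyse and report": (4, ["collect data"]),
}

earliest_finish = {}
for name, (dur, preds) in activities.items():
    # relies on predecessors being listed before their successors
    earliest_finish[name] = dur + max(
        (earliest_finish[p] for p in preds), default=0)

project_duration = max(earliest_finish.values())
print(project_duration)  # 19 days for this toy network
```

The activities on the path that attains this maximum (define problem, literature review, collect data, analyse and report) are the critical ones: delaying any of them delays the whole project.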
Relevance
Mr. Anand had taken over as CEO of Modern Electronics only a day earlier when he was in-
formed that a senior partner of a multinational consultancy firm was to visit him to present
the report prepared by the firm on the new MIS proposed by it. The meeting had been fixed
by the earlier CEO, who had to leave immediately to take up a foreign assignment.
During the presentation, Mr. Anand, to hide his ignorance, nodded at several suggestions
without fully understanding much of the terminology used by the team of consultants. He was
especially at sea when reference was made to the studies carried out by the consultants within
Modern Electronics. After the meeting was over, Mr. Anand called the Head of the MIS Department
to make a presentation explaining all the terminology before he went through the report.
In fact, he issued a circular that, in future, before any discussion in a management committee
meeting, the concerned executive should come to him and explain the relevant concepts and
terminology, so that he could guide the discussions with full understanding.
Incidentally, it was reported that one chairman confessed that he had sanctioned several
projects without knowing what ‘IRR’(Internal Rate of Return) actually meant!
2.1 INTRODUCTION
There are several concepts and terminologies used in Business Research Methodology (BRM).
These are described in this chapter on the premise that when they are mentioned later, while
discussing the detailed aspects and process of conducting a research study, they will be easily
appreciated and integrated with the corresponding text. In addition, there are two tools, namely
PERT/CPM and creativity, that are immensely useful in enhancing the quality of a research project,
as also in planning and implementing it so that the resources, including money and time, are
utilised optimally.
Case Studies
Presentations
Role Play
Workshops and Seminars
It also includes training evaluation methods.
Selection Methodology
It includes:
Written tests of various types for testing verbal and non-verbal communication
Group Discussion
Interview
Research Methodology
It includes:
Listing of appropriate Research Methods
Logic behind the selection of methods used
Context and objective of each method
Scope and limitation of each method
As a broader philosophy, research methodology can be viewed as seeking answers to the
questions:
Why?
How?
When?
Who?
What?
Where?
Which?
For example,
In a research study conducted to improve productivity and efficiency of employees of an
organisation, the above types of questions, could be worded as follows:
Why is the study being conducted?
How is the productivity of the employees to be measured?
When did the productivity start declining?
What are the factors that led to the declining productivity?
What are the different types of skills desired for different categories of employees?
Who are the employees needing immediate training?
Where will the training be conducted?
Which set of persons – internal or external – will conduct the study and the training?
As an illustration, suppose the Director of a management institute desires to know the percentage
of MBA students with more than one year of work experience before joining the MBA programme.
In such a situation, the requisite information could be obtained by any one of the following
methods:
Browsing through the application forms of the students and noting the number of students with
the suitable experience.
Scanning the computerised records of all the students.
Personally contacting a sample of students, selected randomly, and enquiring about their work
experience. Once the percentage of students with more than one year of experience is found,
it can be used as an estimate of the percentage of such students in the population i.e. all the
MBA students at the institute.
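The third method above is a sampling estimate. A minimal sketch of how such an estimate, together with an approximate 95% confidence interval, might be computed is given below; the counts (36 of 60 students) are hypothetical.

```python
import math

def proportion_with_ci(successes, n, z=1.96):
    """Sample proportion and an approximate 95% confidence interval
    (normal approximation; adequate for moderately large n)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, (p - z * se, p + z * se)

# Hypothetical sample: 36 of 60 randomly contacted students report
# more than one year of work experience before joining the MBA.
p_hat, (low, high) = proportion_with_ci(36, 60)
print(f"{p_hat:.2f} ({low:.2f}, {high:.2f})")  # 0.60 (0.48, 0.72)
```

The interval shows what the estimate buys: the Director learns the population percentage is plausibly between about 48% and 72%, a width that shrinks as the sample grows.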
In general, people tend to make little distinction between methodology and methods in the
context of research studies, and the two terms are often used interchangeably.
14 (Factor Analysis), scales have also been developed for various unobservable characteristics or
constructs such as attitudes, sales aptitude, image, personality and patriotism.
Through scaling techniques discussed in Chapter 5, one could measure the above constructs.
One may wonder as to how these are related to business entities. One simple logic is that any
business is run by the people (employees), for the people (customers) and it is essential to study
the above constructs for successful running of a business.
Like human beings, a business organisation also has physical characteristics: employees,
sales, offices, etc. Being physical in nature, these are easily measurable. However, there are certain
abstract characteristics (constructs), like reputation, image of the entity, motivation, work culture,
commitment, and customers' perception and trust. Some other examples are truth, honesty, intelligence,
happiness, achievement, satisfaction, personality, ambience, décor, beauty, justice and values.
All these perceptions and feelings of customers are extremely important because they help the
company to stay afloat and grow. Therefore, it is essential for the companies to consider the above
constructs relating to employees and customers.
A construct is based on 'concepts', or can be thought of as a conceptual model that has measurable
aspects. This allows a researcher to "measure" the concept and to have a commonly accepted platform
when other researchers carry out similar research. For example, advertising effectiveness
is a construct, and the related concepts would be brand awareness and consumer behaviour. Quality
of a TV is a construct, while picture, sound, contrast ratio, etc. are concepts that could be
measured to define quality. In general, concepts are mental representations, typically based
on experience, and relate to real phenomena (students, customers).
In general, constructs are abstract and concepts are components of constructs and are concrete,
and are, therefore, measurable.
Some examples are:
Construct: Job Competence – Concepts: Knowledge, Skills, Attitude
Construct: Mental Ability – Concepts: Memory, Analytical Ability, Logical Power
Construct: Language Skill – Concepts: Vocabulary, Syntax, Spelling
Such traits or characteristics are not directly observed but get reflected in observed human behaviour.
For instance, intelligence is reflected by marks obtained in verbal, analytical and logical tests.
It is understood that software has been developed for assessing life skills, which will generate a
grade based on feedback developed by teachers.
2.4 VARIABLES
A business research study, invariably, involves study of characteristic(s) of an individual/item/unit/
entity, etc. These characteristic(s) are represented by variables. As the name suggests, a variable
takes different values for different individuals/items at the same time (e.g. incomes of individuals
for the year 2009–10, prices of stocks on a day) or for the same individual/item at different times
(income of an individual over the years, sales of a company).
For example, the income of an individual is a quantitative variable, gender is a qualitative
variable.
In a study, data is, generally, collected for relevant variables. These are classified in five categories
as follows:
(i) Independent Variable
(ii) Dependent Variable
Concepts and Tools for Business Research 2.7
But the grasping power of a student has strong impact on the relationship. Two students having dif-
ferent grasping power may study for the same time, but may not get the same marks. In this case,
‘grasping power’ becomes the moderating variable. Some more examples are as follows:
Independent: Quality of teaching in a classroom – Dependent: Performance of students in exams – Moderating: Motivation for sitting in competitive exams
Independent: Percentage of discount in a store – Dependent: Sales – Moderating: Aptitude of sales personnel
The various aspects like measurement and scaling relating to variables are described in
detail in Chapter 5.
2.5.1 Deduction
The basic concept in deduction is from
‘Many to One’
or
‘Population to Sample’
In this type of logic, we are given information about a population, and we deduce the information
about a sample or just one unit.
A few examples/illustrations of deduction are given in Table 2.1.
(iv) Premise (Law): According to Newton's law of gravitation, any object thrown up will come down.
Given Information: Saluni throws a ball in the air.
Deduction/Conclusion: The ball will come down. (Valid: proven law.)
(v) Premise: For all companies manufacturing retail products, advertisements have a favourable
impact on their sales.
Given Information: ALPHA is a company manufacturing mobile phones.
Deduction/Conclusion: Therefore, the advertisement by ALPHA will improve its sales. (Valid:
proven past record for all companies.)
From the above examples, we may note that deductive reasoning works from the 'General to
the Specific'. It may also be termed a 'top-down' approach.
It may be noted that this is analogous to 'Brand Image', wherein conclusions are drawn just
from the name of the brand. That is how brand image works: we infer the quality of an indi-
vidual/product/service from the image of the company where the individual works, or from the
name of the company that produced the product or provides the service. In the case of an
individual, we may even draw inferences about him/her from the country or race to which
he/she belongs.
2.5.2 Induction
The basic concept of induction is from:
One to Many
or
Sample to Population
A few examples/illustrations of induction are given in Table 2.2.
(i) Observation The average work experience of a sample of MBA students at a manage-
ment institute is 18 months.
Induction/Conclusion We induce that the average work experience of all the MBA students at
the institute is 18 months.
(ii) Observation While cooking rice, a cook picks one piece of rice and finds if it is
cooked
Induction/Conclusion All the rice pieces are cooked.
(iii) Observation One biscuit from a packet is stale
Induction/Conclusion All the biscuits in the packet are stale
(iv) Observation In GMAT 2009, the average score of Indian students was 562 as compared
to 539 for the world
Induction/Conclusion: Indians are very competitive and hardworking.
It may be noted that this inductive conclusion is not valid for the entire
population of Indians. At most, one may say that "Indian students appearing
in GMAT are very competitive and hardworking".
Induction, in simple terms, could also refer to ‘Generalisation’ from what we observe or know.
Induction involves reasoning about the future from the past, but in a broad sense, it involves reach-
ing conclusions about unobserved things on the basis of what is actually observed.
Induction starts from 'specific' observations, or a set of observations, and moves to a generalised
theory or law. It could be termed a 'bottom-up' approach. A classical example: when Newton
observed an apple falling from a tree, he generalised the observation into the theory of gravitation.
Induction can also be considered as divergent thinking. It is used when nothing or little is known,
and we wish to expand our knowledge. The following example illustrates such a process:
A company is approached by a new management institute, to select its graduates.
The company, through a written test and interview, selects 5 students.
The company finds those students rendering excellent service to the company.
The company selects 5 students again in the next year, by the same process.
Once the company repeats such selection for some years, it forms an opinion or rather induces
that the students of the institute, selected through the same process, would prove to be highly
useful for the company. In fact, in due course of time, the company might do away with the
written test, and select students just on the basis of interview.
Based on the experience of this company, the other companies also start recruiting from the
institute, and that is how, the image (brand) of the institute is created.
The explanation for treating deduction as a 'top-down' approach and induction as a 'bottom-up'
approach may be depicted as follows:
Deduction (top-down): Theory → Hypothesis → Observation → Confirmation
Induction (bottom-up): Observation → Pattern → Hypothesis → Theory
Illustration 2.2
Deduction
Daily Rates of Return on Sensex
Theory Rates of Return follow symmetrical distribution with reference to some mean
Hypothesis The rates of return follow normal distribution (Hypothesis is the statement
which can be tested)
Observations Collect Data
Confirmation: Fit a normal distribution to the observed data, calculate the expected frequencies
for various intervals within the range of rates of return, and confirm the hypothesis
by the chi-square test of significance described in Chapter 11.
χ² = Σ (O – E)² / E
Induction
Example:
For studying daily rates of return on Sensex, we may collect observations, say, for a year.
Making a histogram (discussed in Chapter 8) of these returns, we may observe a pattern as
follows:
From the above pattern, we may develop the hypothesis that the distribution of rates of return fol-
lows normal distribution.
From this hypothesis, we may develop a theory that rates of return on Sensex follow a symmetrical
pattern with respect to some mean. So, here we are going from observations in terms of histogram
to come to one theory; therefore, it is an induction.
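The confirmation step described in the deduction illustration above can be sketched in code. The daily returns below are simulated, not actual Sensex data, and the choice of eight intervals is arbitrary; the sketch only shows the mechanics of comparing observed and expected frequencies with the chi-square statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated daily rates of return (an assumption, not real Sensex data)
returns = rng.normal(loc=0.0005, scale=0.015, size=250)

# Observed frequencies in 8 intervals, and the frequencies a normal
# distribution fitted to the data would predict for the same intervals
observed, edges = np.histogram(returns, bins=8)
mu, sigma = returns.mean(), returns.std(ddof=1)
expected = np.diff(stats.norm.cdf(edges, mu, sigma)) * len(returns)
expected *= observed.sum() / expected.sum()  # equalise the totals

# chi^2 = sum of (O - E)^2 / E, losing 2 degrees of freedom for the
# two estimated parameters (mu and sigma)
chi2, p_value = stats.chisquare(observed, expected, ddof=2)
print(round(chi2, 2), round(p_value, 3))
```

A large p-value would confirm the hypothesis of normality for the observed data; a small one would reject it.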
published by Tata McGraw-Hill. However, the salient aspects of statistics relevant for BRM are
given in Chapter 8 to Chapter 13.
Quantitative research answers the questions that need a quantitative answer like:
‘How Much’ (Measurement: How much growth has taken place in the retail outlets during the
year 2010)
‘How Many’ (Counting: How many MBA students in an institute have work experience of more
than 2 years?)
‘How Often’ (Frequency of Occurrence: How many times the production process went out of
control during January 2010?)
It is also useful for testing any assumption which is
Descriptive (Average increase in the equity prices of SENSEX companies during 2009 is 22%),
or
Causative (Coaching helps in better performance in CAT), or
Associative (The salaries offered in campus placement and the final grade in MBA are
correlated).
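An associative assumption like the last one can be tested with a correlation coefficient (see Chapter 10). The grades and salaries below are invented for illustration.

```python
from scipy import stats

# Hypothetical data: final MBA grade points and campus-placement
# salaries (in lakhs per annum) for eight students
grades   = [6.5, 7.0, 7.2, 7.8, 8.1, 8.4, 8.9, 9.3]
salaries = [6.0, 7.5, 7.0, 9.0, 8.5, 10.0, 11.0, 12.5]

# Pearson's r measures the strength of the linear association;
# the accompanying p-value tests whether it differs from zero
r, p_value = stats.pearsonr(grades, salaries)
print(round(r, 3))  # strongly positive for this invented data
```

A coefficient near +1 with a small p-value would support the associative assumption; real survey data would of course be needed before drawing any conclusion.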
As a matter of recapitulation, all the topics listed next, that we study in statistics are studied
through Quantitative Research.
Measures of Location and Dispersion (Mean, Variance, etc.)
Statistical Distributions (Normal, Binomial, etc.)
Correlation and Regression Analysis (Correlation Coefficient and Regression Equation, etc.)
Statistical Inference—Estimation and Testing of Hypothesis
ANOVA and Design of Experiments
Time Series Analysis
Forecasting
Decision Theory
Incidentally, it may be mentioned that the discussions in this book are predominantly on
Quantitative Research Methodology, which generally connotes Research Methodology.
For example,
Why: Why has the market share of the company been declining?
How: How can the competition arising from the opening of another store in the neighbourhood be met?
What: What are the factors that motivate the employees in an organisation?
Qualitative research was first used in social sciences, and later on its use spread to other fields
like education, psychology, communication and even management studies, especially those relat-
ing to consumer behaviour, for new products and services. It investigates the ‘why’ and ‘how’ of
decision-making in addition to what, where and when. Therefore, qualitative research relies on
smaller but focused samples rather than large and random samples.
The Qualitative Research is broadly classified into the studies relating to:
(i) Reaction and Feelings
(ii) Learning (improving attitudes, knowledge)
(iii) Changes in Skills (improving effectiveness of lectures/workshop, etc.)
(iv) Effectiveness (improved performance attributed to improved behaviours)
(v) Response from existing and potential customers
Some of the situations needing qualitative research are:
CEO of a company wishes to assess the reaction and feelings of employees for the health care
scheme announced by the company
Director (HRD) of a company wishes to assess the impact of the two-week training for middle-
level executives, conducted by a consulting agency
A consultant during his assignment relating to organisation structure, in a company, observes
the need for improving behavioural skills of young MBA executives, and communication skills
of old executives
The Board of a company wants to have an assessment of the impact of the endorsement of its
main product by a celebrity.
A car manufacturer wishes to revise the prices of its different models based on the likely re-
sponse by the new and potential buyers
2.6.2.1 Sources of Data for Qualitative Research Following are sources of data for conduct-
ing a qualitative research:
People—individual or groups.
Organisations—an individual or a representative of a group of organisations
External environmental factors like regulatory, economic, social or technological
Internal environment in an organisation encompassing work culture, incentive systems for hiring
and retention, etc.
Texts (Published or Internet)
2.6.2.2 Terminologies Used in Qualitative Research Qualitative research uses several ter-
minologies which are specific to such research. Since the list is quite exhaustive, we have included it
in the glossary.
2.6.2.3 Applications of Qualitative Research The following are some of the applications of
Qualitative Research:
(i) Used for exploratory purposes or to investigate ‘how’ and ‘why’ of happening of a situation.
(ii) Used for pilot testing to design quantitative surveys of large scale.
(iii) Used for more diversity in responses as also flexibility to adapt to new developments or issues
during the research process.
(iv) Used for better and deeper understanding of the situation/phenomenon. For example, how a
customer goes about selecting a mobile phone and a service provider.
(v) Often used for policy and programme evaluation research since it can answer certain important
questions more efficiently and effectively.
(vi) Used when a set of qualitative data is difficult to graph or display pictorially; the data are
instead categorised into patterns which provide the basis for analysis and drawing conclusions.
(vii) Used for explaining and interpreting quantitative results like in Factor Analysis described in
Chapter 14.
Some of the above applications are reflected in the following case:
The following case illustrates yet another situation involving the use of qualitative research:
of 3 years. Accordingly, the salaries of canteen employees were suitably revised. However, there
was no perceptible change in the quality of food and service. The General Manager in charge of
the administration then had personal interaction with the representatives of the canteen staff, and
called a meeting of the canteen employees in the dining hall. He was profusely welcomed and
offered a bouquet by the canteen staff. The General Manager asked for their free and frank com-
ments and suggestions. The gist of the deliberations was that they perceived differential attitude of
management with the canteen employees and the other staff of the organisation. For instance,
They were not invited to any function organised by the management
Their meritorious children were not awarded scholarships like children of other staff of the
organisation
The canteen was not air-conditioned like the office
They also gave some suggestions for improving the menu without much cost implication.
The General Manager agreed to these suggestions. Later on, it was found that both the categories
of the staff had developed cordial relations.
(iv) Though the sample size is smaller, the technique of data collection is more elaborate, involving
the researcher's personal involvement and generally requiring more time and money.
Objective
Quantitative: Describe, explain and predict
Qualitative: Understand and interpret
Researcher's Involvement
Quantitative: Limited; controlled to prevent bias
Qualitative: High; researcher is participant or catalyst
Research Purpose
Quantitative: Describe or predict; build and test theory
Qualitative: In-depth understanding; theory building
Research Design
Quantitative: Rigid design and framework – determined before commencing the project; uses a single method or mixed methods; consistency is critical, involving either a cross-sectional or a longitudinal approach
Qualitative: Flexible design or framework – may evolve or adjust during the course of the project; often uses multiple methods simultaneously or sequentially; consistency is not expected; involves a longitudinal approach
Desired Sample Design
Quantitative: Probability
Qualitative: Non-probability; purposive
Sample Size
Quantitative: Large
Qualitative: Small
Participant's Preparation
Quantitative: No preparation desired, to avoid biasing the participant
Qualitative: Pre-tasking is common
Data Type and Preparation
Quantitative: Verbal descriptions reduced to numerical codes for computerised analysis
Qualitative: Verbal or pictorial descriptions reduced to verbal codes (sometimes with computer assistance)
Data Analysis
Quantitative: Computerised analysis – statistical and mathematical methods dominate; analysis may be ongoing during the project; maintains a clear distinction between facts and judgements
Qualitative: Human analysis following computer or human coding, primarily non-quantitative; forces the researcher to see the contextual framework of the phenomenon being measured; the distinction between facts and judgements is less clear; analysis is always ongoing during the project
Insights and Meaning
Quantitative: Limited by the opportunity to probe respondents and the quality of the original data collection instrument; insights follow data collection and data entry, with limited ability to re-interview participants
Qualitative: A deeper level of understanding is the norm, determined by the type and quantity of free-response questions; the researcher's participation in data collection allows insights to form and be tested during the process
Feedback Turnaround
Quantitative: Larger sample sizes lengthen data collection; Internet methodologies are shortening turnaround but are inappropriate for many studies; insight development follows data collection and entry, lengthening the research process; interviewing software permits some tallying of responses as data collection progresses
Qualitative: Smaller sample sizes make data collection faster, for a shorter possible turnaround; insights are developed as the research progresses, shortening data analysis
Conclusions
Quantitative: Generalising from sample to population is valid, subject to well-specified accuracy and judgemental error
Qualitative: Generalisation from small-sized research has to be done with caution because of the subjectivity involved in data collection
This table is based on Table 8.2 of the book titled “Business Research Methods” by Donald R
Cooper and Pamela S Schindler, published by Tata McGraw-Hill.
Understand and explain the causes that could have brought about an adverse situation in one
group/office, so as to avoid a similar situation in other groups/offices
Test new products, services, systems or strategies in a couple of groups/offices, and if success-
ful, use them in others.
The advantages of a case study are:
It is a cost-effective methodology for learning through a real life set-up without the need for
simulation
It helps in a better understanding of a complex or unusual situation, phenomenon or issue
It is an effective tool and mechanism for generating lively discussion and subsequent ideas to
improve upon the offered suggestions or the practised system.
For deriving the previously discussed advantages, a case study is developed by collecting in-
formation through observation, recording, interacting and suitably analysing it to pose issues for
discussion. The details of solutions, if any, not included in the case are sought from the selected
group. All the suggestions/solutions are thoroughly discussed to arrive at a consensus. Such
consensus, along with the earlier solution(s), if any, is deliberated further to reach a much broader
consensus. Deliberation on multiple solutions helps to analyse the situation from different
perspectives, and hence the case can be used for either ‘learning’ or ‘testing’ in future.
Sometimes, one may conduct a number of case studies to have a better understanding of the
issues involved by covering a number of units. For example, the branches of a commercial bank
can be divided into four categories, viz. metropolitan, urban, semi-urban and rural. These branches
are quite different from one another. In such a situation, if the objective is to study the behaviour of
defaulting borrowers, one may select one branch from each of the four categories. Case studies at
these branches might reveal some aspects which could be valid for the bank as a whole.
Similarly, if the objective of the bank was to study the pattern of car loans in Metro cities, one
may select a branch each from the four Metros, and conduct four case studies at these branches.
We now describe two live case studies that were conducted primarily to bring out the relatively
new concepts and practices introduced by some individuals who ultimately reached the top level in
their careers.
and attracted new customers. The staff was highly appreciative of the branch manager as they felt
here was one person who was sensitive to their personal needs, and was leading from the front.
The study involved interaction with the staff and customers as also the analysis of business re-
cords at the branch to ensure that the bank’s interest was well-protected. The study revealed the
following strategies used by the branch manager to develop excellent business at the branch:
Set up consultative committee comprising high-valued customers
Conduct weekly meetings of staff
Conduct monthly meetings of customers
Establish personal rapport with the staff to ensure support and commitment of staff
Flexible timing at the branch, leading by example
Communication channel for customers
Meeting genuine and urgent demands of customers without sacrificing the interest of the
bank
Goodwill measures and personal interaction, like birthday celebrations at the branch and
inviting families of staff for tea at his residence
The following illustration is one more case where an unconventional approach solved a problem
that was in the process of being resolved through a conventional approach.
2.9 PERT/CPM
(A Tool for Time Management and Optimum Utilisation of Resources)
PERT and CPM are acronyms.
PERT stands for ‘Program Evaluation and Review Technique’, and
CPM stands for ‘Critical Path Method’.
There is a subtle difference, to be pointed out later, between the two, but for our purpose we treat
them as the same. This is because both these techniques tell us:
“How to do a work in a systematic and scientific manner?”
The work may be conducting a study, organising admission process for MBA, writing a
book, building a mall, etc. Contrary to the popular notion that these techniques are useful
only for large-scale projects, we would like to emphasise that both these techniques are
simple enough to be useful for any work where there is a concern for completing it within a
given time and at a given cost.
Both techniques were developed almost at the same time, in the year 1958, in different projects.
While PERT was developed for the project relating to the Polaris missile system in the USA, CPM
was developed for projects involving the overhauling of chemical and electricity generation plants
in the USA and UK, respectively.
The emphasis in PERT was on reducing the ‘time’ for development of the system, while the
emphasis in CPM, used in the other two cases mentioned above, was on reducing time with the
ultimate objective of reducing ‘cost’. It may be appreciated that reducing the time for any work
almost invariably leads to a reduction in cost.
These techniques were soon adopted for the management of big projects to cope with the problem
of overruns on time and cost, which were an integral part of almost all projects. The significant
advantages of using these techniques propagated their use in almost all spheres of office work
and industrial activities. In fact, in India, L&T is known for using these techniques effectively to
complete projects like building the Nehru Football Stadium in Chennai within the stipulated time.
The stadium was built in just 9 months, 9 days ahead of schedule! The first author of this book has
successfully used this technique for publishing a book and organising annual board meetings.
2.9.1 Terminologies
In the context of managing projects, PERT is now more popular as ‘Project Evaluation and Review
Technique’. Project is defined as ‘any work which has a definite beginning and a definite end,
and which consumes resources’; and as such PERT can be used for managing any ‘project’ like:
Conducting a research study
Building a bridge
Selection process for MBA students at a management institute
Finalising of annual accounts of a company
Launching a new product or service
In the context of BRM, the use of PERT/CPM helps in managing the conduct of a research
study in less time, with optimum use of given resources, without compromising on the
quality of the research.
Scheduling
It implies indicating the starting and completion times for all the activities. It is a very important
managerial part of project management, and it decides the extent to which one can derive the
benefits of PERT/CPM for any project.
Implementing
This is the physical part of carrying out the project as per the schedule.
Controlling
This is the managerial part of overseeing that the project is completed as per the schedule, and of
taking appropriate corrective action if something does not go as planned. The objective is to
complete the project within the time and cost budgeted for it. However, if due to unavoidable
reasons there is an overrun on time and cost, it is to be kept at a minimum.
A point of caution may be noted. Mere use of PERT/CPM does not ensure that there will
be no overrun on time and cost. The overrun can be minimised only to the extent it is
humanly possible.
The manager of an office in the Fort area (Mumbai) sends out two persons, call them ‘A’ and ‘B’,
for some office work. The following diagram indicates the locations where these people are required
to go in connection with the assigned jobs.
‘A’ goes to BTC for collecting some documents, and from there goes to Dadar Station in Mumbai.
‘B’ goes to Dadar Station, and together, they go to the destination. In a literal sense, the ‘project’
starts with ‘A’ and ‘B’ leaving the office and ends with both of them reaching the destination. The
activities or jobs, for our purpose, are as follows:
‘A’ going from office to BTC
‘A’ going from BTC to Dadar Station
‘B’ going from office to CST
‘B’ going from CST to Dadar Station
‘A’ and ‘B’ going together from Dadar Station to the destination
The above diagram is known as a PERT chart. The times shown in it are in minutes.
It comprises circles and arrows. While arrows indicate the activities, circles represent ‘events’ or
milestones. Every activity is bounded by two events, one marking the start and the other the
completion or finish. All the events are numbered from 1 to 5; the names of the activities are
indicated above the arrows, and the times taken by the activities below the arrows. It may be noted
that the length of an arrow is not proportional to time, though it can be made so if one desires. We
explain the phases of a project through the above example as follows:
The first step in planning is to list the activities. The manager has identified the jobs as:
For ‘A’:
Going from office to BTC
Doing the assigned job at BTC
Going from BTC to Dadar Station
For ‘B’:
Going from office to CST
Going from CST to Dadar Station
Going together with ‘A’ from Dadar to the destination
Thus, the project comprises all the above activities. For the sake of keeping the PERT chart
simple, we have ignored the job “Doing the assigned job at BTC” in the chart. The sequencing of
activities is clearly indicated. The co-ordination is specified in the form of both ‘A’ and ‘B’ going
together from Dadar to the destination.
As regards scheduling, we may consider the schedule shown in the chart that follows:
The chart shows the time required for completion of each activity, as also the start and finish
times for the various jobs.
At first look at the chart, one may think that the scheduling is defective in the sense that ‘A’
is wasting his time to the tune of 35 minutes, as he has to wait for ‘B’ to arrive at Dadar. But there
is a positive aspect also: there is a cushion for ‘A’. Even if he/she is stuck with the non-availability
of a train or whiles away his/her time, the manager will not be overly concerned about his/her
movement. He/She will concentrate on ‘B’, and might even tell him/her to phone when he/she
reaches Dadar Station.
Further, if, for some reason after the project starts, the manager is required to expedite the project
by 15 minutes, he/she may authorise taxi fare for ‘B’ to go from CST to Dadar Station and
reach in 5 minutes. He/She need not incur any additional expenditure on ‘A’.
However, if we accept the point of view that ‘A’ is wasting his time, he/she may be asked to start
at 10.35 so that both will reach Dadar at the same time i.e. 11.20 as shown in the following:
However, now the manager will have to control the movements of both ‘A’ and ‘B’. If any one
of the two is delayed even by 5 minutes, the project will be delayed. Further, if he/she is required
to expedite the project by 15 minutes; either he/she has to expedite the movement of both ‘A’ and
‘B’ or he/she expedites the movement from Dadar to the destination by telling them to go by taxi
rather than bus. Thus, there are both pros and cons for the two schedules indicated above. There
is a possibility of yet another schedule, and that is to ask ‘A’ to leave office at 10.15. It provides
a cushion of 20 minutes to ‘A’. In fact, ‘A’ could be asked to leave any time between 10.00 and 10.35.
That is why, scheduling is considered as a managerial activity rather than a routine job. One can
only guess what could be the scheduling concerns in a bigger project!
In fact, one of the biggest advantages of a PERT chart is that it identifies independent jobs
which can be done simultaneously, and thus saves time. Indeed, this is the biggest secret of
reducing the time for a project through a PERT chart.
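The time saved by doing independent jobs in parallel can be seen in a two-line computation. The durations used here are hypothetical, chosen only to illustrate the point; they do not come from the chart:

```python
# Two independent jobs followed by one joint leg (assumed, illustrative
# durations in minutes - not taken from the PERT chart in the text).
job_a, job_b, joint_leg = 45, 80, 15

serial = job_a + job_b + joint_leg        # one person doing everything in sequence
parallel = max(job_a, job_b) + joint_leg  # independent jobs done simultaneously

print(serial, parallel)  # 140 vs 95
```

The saving comes entirely from `max()` replacing `+`: the project waits only for the slower of the two independent jobs, not for their sum.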
Event:
An event is said to be completed if all the jobs leading to it are completed. For example, event ‘4’
is said to be completed only when both the jobs 2–4 and 3–4 are completed as then only the job
4–5 can start. Incidentally, the event ‘4’ marks the attainment of that milestone as also the
beginning of the activities leading to the next milestone.
The times taken for the jobs, indicated above the arrows, as also the TE and TL values, are given in weeks.
Earliest Expected Time (TE):
It relates to an event. It is the earliest time by which an event is expected to happen. An event
is said to have happened when all the activities that are leading to it are completed. The TEs for
various events are given above the circle representing the event. The explanation for the TE of event
4 is as follows:
As mentioned earlier, this event is said to have happened when both the jobs 2–4 and 3–4 are
completed. The job 2–4 will get completed at the earliest by 8 weeks, and the job 3–4 at the
earliest by 6 weeks. Hence, TE for event 4 is 8 weeks, as the activity starting from event 4,
i.e. 4–5, can start only after 8 weeks.
Latest Allowable Time (TL):
It also relates to an event. It is the latest time by which an event must be completed if the project
is not to be delayed. For the last event, TL indicates the project completion time.
For the last event, TE and TL are equal.
For event 5, the TL is 12 weeks.
For event 4, it is 12 – 4 = 8 weeks.
For event 2, since TL for event 4 is 8 weeks and the job 2–4 takes 4 weeks, TL is 8 – 4 = 4 weeks.
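The two computations above follow a simple pair of recurrences, stated here in compact form. The notation is an assumption for brevity: t(i, j) denotes the time for the job i–j, event 1 is the start and event n the last event:

```latex
\begin{align*}
TE(1) &= 0, & TE(j) &= \max_{i \to j}\bigl[\,TE(i) + t(i,j)\,\bigr] \quad\text{(forward pass)}\\
TL(n) &= TE(n), & TL(i) &= \min_{i \to j}\bigl[\,TL(j) - t(i,j)\,\bigr] \quad\text{(backward pass)}
\end{align*}
```

The maximum in the forward pass is taken over all jobs ending at event j, and the minimum in the backward pass over all jobs starting at event i.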
Critical Path
Starting from event 1, we can reach the last event 5 via various paths. The path which consumes
maximum time should really be called the ‘longest’ path. But it is called the ‘Critical Path’ because
on this path, even the slightest delay in any job will delay the project. It is generally indicated in
red ink, as it demands maximum attention. In the chart given earlier, the path 1-2-4-5
is the critical path. The term, Critical Path Method implies managing a project based on analysis
using this path.
It may be noted that on the critical path, the earliest expected times and the latest allowable
times are equal.
Slack or Cushion Time for Jobs:
The slack or cushion time for a job is calculated by subtracting the TE for the previous (tail) event
plus the expected time for the job from the TL for the next (head) event. For example, for the job 3–4,
TL for the head event 4 is 8 weeks. From this, we subtract 2 (TE for event 3) and the time taken for
job 3–4, i.e. 4 weeks, thus getting 2 weeks as the cushion time for activity 3–4. The calculation is
shown below:
8 (TL for event 4) – 2 (TE for event 3) – 4 (time for job 3–4) = 2
It may be noted that there is no cushion time for jobs lying on the critical path. These jobs
are called ‘critical jobs’. Similarly, the events lying on the critical path are termed as ‘critical
events’.
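The forward pass, backward pass and slack computation above can be sketched in a few lines. The network below is an assumption reconstructed from the numbers quoted in the text (the chart itself is not reproduced here), chosen so that it reproduces TE for event 4 as 8 weeks, TL for event 2 as 4 weeks, a slack of 2 weeks for job 3–4, and 1-2-4-5 as the critical path:

```python
# Assumed reconstruction of the example network; durations in weeks.
# Each job is (tail event, head event): duration.
t = {(1, 2): 4, (1, 3): 2, (2, 4): 4, (3, 4): 4, (4, 5): 4}

events = sorted({e for job in t for e in job})  # [1, 2, 3, 4, 5]

# Forward pass: TE(j) = max over all jobs (i, j) of TE(i) + duration
TE = {events[0]: 0}
for j in events[1:]:
    TE[j] = max(TE[i] + d for (i, k), d in t.items() if k == j)

# Backward pass: TL(i) = min over all jobs (i, j) of TL(j) - duration
TL = {events[-1]: TE[events[-1]]}
for i in reversed(events[:-1]):
    TL[i] = min(TL[j] - d for (k, j), d in t.items() if k == i)

# Slack of a job = TL(head) - TE(tail) - duration; critical jobs have zero slack
slack = {(i, j): TL[j] - TE[i] - d for (i, j), d in t.items()}
critical = [job for job, s in slack.items() if s == 0]

print("TE:", TE)                    # {1: 0, 2: 4, 3: 2, 4: 8, 5: 12}
print("Slack of 3-4:", slack[3, 4]) # 2, as computed in the text
print("Critical jobs:", critical)   # (1, 2), (2, 4), (4, 5) -> path 1-2-4-5
```

Note that the project completion time of 12 weeks and the zero-slack jobs fall out of the same two passes; no separate path enumeration is needed.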
Activity Depends on
A ---
B A
C A
D C
E B and D
It may be mentioned that a PERT chart is only a statement of logic showing sequencing and
co-ordination, and its shape does not matter e.g. whether the arrows are slanting or are horizontal/
vertical.
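The dependency table above fixes only the logic of the chart, i.e. which activity must precede which. As an illustration, a valid working order can be derived mechanically from such a table; the ordering routine below is a generic sketch (a simple topological sort), not a procedure from the text:

```python
# Derive a valid sequencing of activities from the dependency table
# (Activity: list of activities it depends on); ties broken alphabetically.
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"], "E": ["B", "D"]}

order = []
done = set()
while len(order) < len(deps):
    # pick the next activity all of whose prerequisites are completed
    ready = sorted(a for a in deps if a not in done and all(p in done for p in deps[a]))
    order.append(ready[0])
    done.add(ready[0])

print(order)  # ['A', 'B', 'C', 'D', 'E'] - one valid sequence
```

The same table also shows which activities are independent: B and C both become ready as soon as A is done, so on a PERT chart they would appear as parallel arrows that can be worked on simultaneously.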
It may be noted that the level of complexity depends on size and type of a project. This PERT
chart is only a guideline for preparing the PERT chart of an actual study.
It may also be mentioned that in a PERT chart, the emphasis is on management of time of various
activities.
The dotted arrows are called ‘dummy’ activities. These are used just to connect events.
The above PERT chart for the research project indicates how the project can be conducted
in teams. The project starts with identifying goals and defining the problem. In the next stage,
teams can be formed and allocated different tasks like collecting information through interviews,
discussing the problem in groups, and reviewing literature. The outcomes of all these activities
can be combined to refine and finalise the problem, develop the hypothesis and research design,
and conduct a pilot study. After incorporating the findings
of the pilot study, the team again may split into sub-teams for collecting data. After checking the
data of different teams, the data can be compiled together and the analysis and interpretation can
be distributed again in different teams. Finally, the entire analysis is compiled together in the form
of a research report.
A PERT chart is a simple yet powerful device for managing a project in a systematic and
scientific manner. It is a universal technique for managing projects in any field right from
making tea to setting up a nuclear plant. Of course, the shape of the chart will go on getting
complicated depending on the complexity of the project. The decision to use a PERT chart
for a project does not depend on the size of the project; it depends on how serious one is about
minimising overruns on time and cost and making optimum use of resources.
It may appear that the real use of a PERT chart arises only when certain activities
are parallel, i.e. they can be done simultaneously. That is true to some extent, but the ‘Planning’
phase of a PERT chart is useful in all situations. In the case of BRM, collection and analysis
of data is one activity which can be subdivided into a suitable number of parts, as shown above,
each of which can be done simultaneously.
Creativity, in simple terms, is the trait of a person to think, create or do something new. Creativity
is also defined as the ability to generate new ideas or concepts. As such both research and creativity
are highly interrelated. In the context of application in business organisations, creativity is described
as a process of developing and expressing novel ideas for solving problems, in general, and meeting
specific needs of an organisation, in particular.
Normally, we associate creativity with art, music and literature but it may be appreciated that it
is also associated with innovation and inventions. Thus, it is equally important in professions like
business, architecture, industrial design, advertising, software engineering, video gaming, etc. It is
due to this lack of appreciation that creativity, as a subject, is yet to get its due place in academics
and in business.
In the context of the business environment, as mentioned earlier, creativity is highly useful in
solving problems arising at various stages like planning, analysing and controlling. Sometimes,
given a problem, creativity helps in looking at it from an angle quite different from the normal
or routine one, and thus solves it. This concept is best illustrated with the help of the following
two classical examples. The first example relates to joining the nine dots arranged in a square shape
on a paper as follows:
The conditions for joining the nine dots are that the dots should be joined by drawing four straight
lines by a pen ‘but without lifting the pen from the paper’. The conventional approach for solving
the problem is to try joining the dots while remaining within the square. The solution is, however,
easily obtained when one goes outside the square, and joins the dots as follows:
There is, however, a difference between the two, and the difference is attributed to creativity.
While a human being's decay cannot be postponed indefinitely, an organisation's decay can be,
through creativity, if it keeps on innovating new products, services, processes and
systems. A classical example is ‘The Times of India’ newspaper which is now more than 170 years
old, and still dominates the circulation of English newspapers in India. However, many publications
started after ‘The Times of India’ have become history. Another example is the Punjab National
Bank which is more than 100 years old, and is still growing stronger steadily, and is now the second
largest public sector bank in India. Several other banks which started much later do not exist; some
of them are: New Bank of India, Hindustan Commercial Bank and Global Trust Bank.
There are two gates—one for entry to the institute and the other for entry to the hostel building.
Both gates are manned by one security guard each. As a part of the economy drive, the manage-
ment wanted to reduce the number of security guards. After a study of the pattern of usage of
the gates, it was decided to open only one gate at a time; and the timings for each gate were
announced. This led to a lot of inconvenience for the staff of the institute and hostel residents,
especially for those who used cars or taxis for entry to the two buildings. Those hostel residents who
came by car and had to enter through the institute gate could not reach the entrance of the hostel
building, causing inconvenience in the rainy season and requiring luggage to be carried a good
distance of about 30 metres. On the other hand, the faculty and staff who came by car, entering through
the hostel gate could not park their cars in the institute campus and had to park cars outside the
campus due to limited parking capacity in the hostel compound. This went on for about a year.
Thereafter, a new security officer joined the institute. He suggested reducing the width of
the flower beds ‘A’ and ‘B’ in front of the institute's building by 1 foot and 2 feet, respectively.
This increased the width of the pathway from 5 feet to 8 feet. Thus, a car could pass through the
pathway. With this simple solution, the hostel gate was closed (except for utility vehicles) and the
institute's gate was kept open for 24 hours. Thus, the hostel residents could come any time, enter
through the institute's gate and reach the hostel building. Thus, the ideal solution was evolved by
changing the original problem of deciding times for opening/closing of the two gates.
Yet another classical example of changing the perspective of a problem came during the Second
World War, when the UK’s merchant ships were being attacked and destroyed or damaged
by German bombers in the Mediterranean Sea. A debate was going on in the British navy about
fitting ships with anti-aircraft (AA) guns, which were available only in limited numbers and were
required for protecting many civil/military installations in the country. Accordingly, only a limited
number of ships were fitted with AA guns. However, these guns could not shoot down a sufficient
number of bombers, and thus their utility was considered doubtful by the military authorities.
At this juncture, the U.K. government appointed a committee headed by Nobel laureate Prof. P.M.S.
Blackett to look into the matter and offer its recommendations. The team collected and analysed the
data. The analysis revealed that even though the guns fitted on ships did not destroy the bombers,
their firing did scare the bombers and affected their accuracy of hitting the target, with the
result that many bombs did not hit the ships; and even when they did hit, many ships were not sunk,
and suffered only some damage that could be repaired at sea itself. Prof. Blackett pointed out that
the criterion for judging the efficacy of installing AA guns on ships should not be the destruc-
tion of the bombers by the guns but the extent to which they were able to affect the bombers’
accuracy of hitting the ships and thus protecting the ships. This view was accepted by the U.K.
Government, and the guns continued to be fitted on ships. Thus, the perspective of the problem
was changed to arrive at the solution. We could also say that the originally perceived problem
was redefined.
Incidentally, creativity has been associated with right-brain activity or with lateral thinking,
while the left brain is associated with memory and analytical ability.
Creative ideas are often generated when one discards preconceived notions and tries to follow a
new approach which could be unthinkable or even ‘laughable’ for others! Incidentally, the present
design of the famous Tower Bridge in London (proposed by Sir W. G. Armstrong Mitchell & Company),
which has stood the test of time, was not approved at the first instance; it was approved only after
discarding the suggestions that were received later.
In the context of a business organisation, the term innovation is used to refer to generating
creative ideas and converting them into viable products, services and systems across the various
components of the entire process, while the term creativity refers to the generation of new ideas by
individuals or groups, as a necessary step within the innovation process. This leads to the saying,
“Innovation begins with creative ideas”.
An organisation’s staff has to be creative for it to continue to be relevant and remain fit to compete.
Creativity can be nurtured in individuals through practical training, as described in the following:
Creativity is best practised in day-to-day decision making through the well-known ‘brainstorming’
technique introduced by Alex Osborn. Another strategy for promoting creativity in the corporate
world is the setting up of ‘think tanks’. In fact, both these techniques, described in the following,
are instrumental in deriving the above-mentioned benefits for an organisation.
2.10.3 Brainstorming
Brainstorming is a group creativity technique, introduced in the 1930s by Alex Osborn, designed
to generate a large number of ideas for the solution of a problem.
It is especially useful when the problem involves co-ordination and mutual support among the
members of a team. It encourages innovative ideas in a group setting and generates a sense of in-
volvement among the group/team members. While it leads to a sense of collective ownership of the
emerging solution(s), it also ensures commitments by the participating members.
One such example was the meeting of faculty members convened by the Director of a management
institute to discuss the support system required for developing and using more case studies as
teaching pedagogy.
Another example is that of the General Manager of the credit card division in a bank, who called
a meeting of the executives, using this technique, to discuss different avenues for increasing the
profitability of the division.
In fact, without being named so, all the meetings that are conducted in an organisation to solve a
specific problem could be termed brainstorming sessions if the following tenets of brainstorming
are followed. These relate to maximising the number of ideas, removing inhibitions among people
in offering ideas, stimulating expression of even the wildest ideas, and deriving synergy from ideas.
The tenets are as follows:
Emphasise on quantity as per maxim ‘quantity contains quality’
No criticism of any idea
Welcome all ideas even if they appear weird or eccentric
Combine and improve—combination of two or more ideas might lead to a new idea that could
be better than the original idea
The steps in the conduct of a typical brainstorming session are outlined as follows:
Specifying the problem
Preparing and circulating an agenda containing the specific problem
Selecting the participants—preferably in the range of 6 to 10, in number
Conducting the sessions—encouraging each participant to offer suggestions without any inhibi-
tion, and recording the same on a board
Stimulating the idea generation process, if needed
Evaluating the suggestions subsequently, and reaching consensus
It may be useful to discuss the difference between creativity and innovation, terms that are so
commonly used in business organisations. While creativity is associated with the generation of new
ideas, innovation relates to the translation of ideas into physical forms like products and services.
In fact, innovation begins with creative ideas or, in other words, creativity is the starting point for
innovation. However, it may be noted that creativity is a necessary but not a sufficient condition
for innovation.
Intelligence and creativity are the hallmarks of successful personalities. While behavioural
scientists have evolved a measure of intelligence called IQ (Intelligence Quotient), there is no
measure of creativity as of now. However, the two are correlated to some extent.
“British give more importance to what they do than what they think,
while
Americans give more importance to what they think than what they do.”
It is because of the importance attached to creativity that the USA is the No. 1 economy,
leading in pharmaceuticals, health care products and services, entertainment and information
technology, among several other fields.
SUMMARY
The difference between Research methodology and Research Methods is summed up as:
“The term ‘methodology’ connotes a much wider concept. It encompasses a complete gamut
of activities required to complete a research study including research methods that are used in the
study.”
The criterion for selecting an appropriate research method is to choose the method which
gathers the most useful information in the most cost-effective manner, leading to cost-effective
decision-making.
The terms constructs and concepts used in the context of business research have been explained
with suitable examples.
The chapter describes various types of variables that are the raw material for a research study.
These are:
Independent Variable
Dependent Variable
Moderating Variable
Intervening Variable
Extraneous Variable
While the basic concept in deductive logic is ‘Many to One’ or ‘Population to Sample’, the basic
concept in inductive logic is ‘One to Many’ or ‘Sample to Population’.
Collaborative approach is considered essential when a research study involves team work.
Creativity is important for exploring research areas and in conducting research activities.
PERT/CPM is an indispensable tool for completing any assignment, including the conduct of a
research study, within a given amount of time and cost.
DISCUSSION QUESTIONS
3
Contents
1. Introduction
2. Identifying Research Problem
3. Formulating Research Problem
(a) Defining Research Problem
(b) Refining/Redefining Research Problem
Individual and Group Discussions (Exploiting Creativity)
Literature Survey
4. Hypotheses Development
5. Research Proposal
6. Request for Proposal (RFP)
7. Flow Chart for Conducting Research
8. External and Internal Research
9. Sponsored Research
10. Research at Corporate and Sectoral Levels
11. Guide for Conducting Good Business Research
12. A Consultant’s Approach to Problem Solving
LEARNING OBJECTIVES
The main purpose of this chapter is to provide an exhaustive and comprehensive view of the various steps of a re-
search process, from problem identification to hypothesis development, i.e. developing the statement to be tested
for acceptance or rejection. This facilitates the conduct of further study involving collection of data, carrying out
relevant analysis, etc., and ultimately resolving the problem.
Further, the objective is also to facilitate preparing a research proposal, after the hypothesis is formulated, keeping
in view the criteria for conducting a good business research.
Subsequent to finalisation of the research proposal, the issue of conducting the research either in-house, i.e. by
a team of researchers from within the organisation, or through an outside consulting agency, is deliberated in
detail, with the pros and cons of both approaches.
3.2 Business Research Methodology
Relevance
National Dairy Products Ltd. is a small-scale enterprise with two dairies in Kolhapur district
of Maharashtra, built up by Mr Vasant from a modest small dairy. The business has been
growing rapidly, and the setup that was adequate for the small business was not able to cope
with the growth. Ms Charu, an enthusiastic MBA from a reputed B-School, joined her father's
business with an ambition to take the business to new heights.
After joining the business, she immediately felt the need for an in-house research department.
She got her father's approval for it, on the condition that she would utilise only internal
talent for the department. She chose some of the talented staff from the other departments to
set up the research department.
She felt the need to orient these people towards the research process.
She conducted a seminar to lay the foundation of research methodology for the new team.
In the seminar, she explained the importance of research for the company in general, and the
importance of a systematic process for conducting research. She also explained every step
involved in research, and its importance.
At the end of the seminar, the selected employees, who initially had low morale, were
highly motivated to conduct the research projects.
This gave Ms Charu a great deal of satisfaction, as it indicated that the seminar
was successful.
Ms Charu thus succeeded in setting up an internal research department in
National Dairy Products Ltd.
3.1 INTRODUCTION
In Chapter 2, we discussed certain terminologies/topics that are used extensively in business
research methodology. In this chapter, we shall discuss the initial steps of the research process, viz.
defining/refining/redefining, i.e. formulation of the problem to be solved; formulating issues/questions
that are relevant at various hierarchical levels, viz. managerial, research, investigative and measurement;
and development of the hypothesis, i.e. the statement to be tested to resolve the problem.
As mentioned in Chapter 1, conducting research follows a well-structured process and it comprises
following broad steps:
(i) Specifying the area and the objective of the study
(ii) Defining and Refining a Problem
Defining Problem:
Refining/Redefining the Problem through
Literature Review
Interviewing relevant people
Group discussion with relevant people
(iii) Hypotheses Development
(iv) Preparing Research Design
(v) Collection of data
(vi) Analysing the data
Research Process 3.3
(iii) Competition
(iv) Customer Driven
(v) Failure
(vi) Technological Innovations
(vii) Environmental Considerations
(viii) Social
(ix) Economic
(x) Infrastructure
(xi) Operations / Process Driven
(xii) Coping with Changes
A problem might also arise due to
Experiences in the field or operations
Study of related literature, which might lead people to try out similar experiments, e.g. expanding
the scope of the problem
Study of 'Request for Proposals', detailed in Section 3.7, or similar matter appearing in newspapers,
magazines, etc.
New visible opportunities
(ii) Impact of the Problem
If the problem is not solved, what would be the consequences?
The consequences could be:
increase in costs/expenditure
loss of revenue
missing out on the opportunity for expansion or diversification, or consequences as serious as
uncertainty over the existence of the company, e.g. due to non-compliance with regulatory
directives
loss of reputation
erosion in USP (Unique Selling Points).
Questions’ Hierarchy
In the context of BRM, a hypothesis could be defined as a proposition formulated for empirical
testing: a descriptive statement that describes
(i) The value of a parameter representing a variable
(ii) A relationship between two or more variables or attributes
Hypotheses are an integral part of many research studies. Their role in a study is to:
Authenticate or strengthen the confidence in any conclusion that could be drawn from the
study
Help in deciding upon the type of research design to be used
Help in drawing up a sampling plan
Help in deciding the issue relating to collection of data
Before we deliberate further on hypotheses, we would like to explain the concept of 'population'
in the context of BRM.
A population is a set of individuals, items, units, entities, etc.
Some examples are:
Students at a management institute
Cars manufactured in a factory
Houses in a colony
Companies, e.g. those included in the SENSEX of the Bombay Stock Exchange
A population is characterised by one or more variables. For example, a human population
is characterised by income, religion, nationality, etc. An item like an electric bulb is characterised
by 'life', wattage, etc. Students in a management institute are characterised by age, work experience,
area of specialisation, etc. Companies are characterised by market capitalisation, P/E (price-to-earnings)
ratio, etc. Each characteristic is represented by a variable.
In business research, each characteristic like sales, profit, EPS (earnings per share), etc. is
represented by a variable that is amenable to measurement.
In consonance with its dictionary meaning, a hypothesis in the context of research methodology
is an assumption or statement about
A characteristic of a population
Two or more characteristics (like background and specialisation of MBA students) of a population
The same characteristics of two or more populations
The same variable(s) of two or more populations
which can be tested to be true or false.
Incidentally, it may be noted that the null hypothesis is in the form of an equation like
μ = 10, or an inequality like μ ≤ 10 or μ ≥ 10. The alternative hypothesis can, however,
be in the form of
not equal to (≠), less than (<), or greater than (>).
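As a concrete sketch of this notation, the fragment below tests H0: μ ≤ 10 against the directional alternative H1: μ > 10 using a one-sample t-statistic. The data, the significance level and the critical value (about 1.833 for a one-tailed test with 9 degrees of freedom at α = 0.05) are illustrative assumptions; only the Python standard library is used.

```python
# One-sample, one-tailed t-test sketch for H0: mu <= 10 vs H1: mu > 10.
# The sample values below are hypothetical.
import math
import statistics

sample = [10.8, 11.2, 9.9, 10.5, 11.0, 10.7, 10.1, 11.4, 10.9, 10.6]
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)          # sample standard deviation (divisor n - 1)

# t-statistic for the hypothesised mean of 10
t = (mean - 10) / (sd / math.sqrt(n))

# One-tailed critical value for alpha = 0.05 and df = 9 is about 1.833
reject_h0 = t > 1.833
print(round(t, 2), reject_h0)
```

If the computed t exceeds the critical value, H0 is rejected in favour of the directional alternative; with a two-sided alternative (≠), the rejection region would lie in both tails of the t-distribution.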
The following table explains the difference between hypotheses developed for quantitative data and
research questions framed for qualitative data:
Qualitative research question: Is the level of motivation among the workers less in the second shift?
Quantitative hypothesis: Training method 'A' is better than training method 'B', as reflected in the
improved sales performance of the salesmen trained by the two methods.
Corresponding qualitative research question: What makes training method 'A' superior to training
method 'B'?
We have illustrated all the concepts associated with hypotheses development through a live
illustration given below.
Illustration 3.1 Hypothesis Development
It was the last day of the interviews of MBA candidates of a management institute. At the end of the
day all the faculty and external members of the interview panels had assembled in the conference
room, and were eagerly waiting for the registrar to signal that all the interview related papers were
in order so that they could disperse for the day.
While waiting, they shared some of their experiences of the interviews. Following are the remarks
made by some of them.
“The quality of this year’s students is very good”
“Yes I agree. This year, the performance has been better than last year”
“Yes I also feel, the quality of applicants is much better than last year”
“I feel, this year the students were well prepared”
“It could also be that now more and more talented students are opting for MBA rather than other
professional courses”
“Or may be that many of them were engineers”
“But the engineers that we get are not from reputed colleges, so how could they be better than
B.Sc. and B.Com. students?”
“This year, the girls have fared much better than boys”
“The better quality could be due to the fact that many more students have work experience and
thus they fare better in interviews”
“Yes, this year the work experience seems to be more”
“The number of engineering graduates is much more than last year”
At this time, the registrar remarked “A quick analysis shows that the candidates in the morning
batches have scored better than the students in the afternoon batches”
He added “I overheard some candidates commenting that lady members of interview panel were
more considerate than male members”
Remark 3: "The quality of applicants is much better than last year"
Hypotheses: Same as Remark 1.

Remark 4: "This year the students were well prepared"
H0: There is no difference in the preparedness of this year's and last year's students
H1: This year's students are better prepared than last year's students
Testability: Cannot be tested directly unless data is collected on preparedness; could be tested if relevant data is available

Remark 5: "More talented students are opting for MBA rather than other professional courses"
H0: There is no difference in the talent of students opting for MBA and other courses
H1: More talented students are opting for MBA
Testability: Cannot be tested unless 'talent' is defined in terms of some measurable variables and data is collected on the same

Remark 6: "Many of them were engineers"
H0: There is no difference in the proportions of engineers and non-engineers
H1: The proportion of engineers is more than that of non-engineers
Testability: Testable directly; directional hypothesis

Remark 7: "How could engineers be better than B.Sc. and B.Com. students?"
H0: Engineers are the same as non-engineers
H1: Engineers are better than non-engineers
Testability: Cannot be tested, since 'better' is vague; unless it is defined properly and quantified, the hypothesis cannot be tested

Remark 8: "This year, girls have fared much better than boys"
H0: There is no difference in the performance of girls and boys
H1: Girls have performed better than boys
Testability: Testable if performance can be quantified (marks in GD/PI, written test, etc.); directional hypothesis

Remark 9: "The better quality could be due to more and more students having work experience, so they face the interview better"
(a) H0: There is no relationship between the experience of students and quality
H1: The more the experience, the better the quality
Testability: Testable if quality can be quantified; directional hypothesis
(b) H0: There is no relationship between the experience of students and their performance in the interview
H1: The more the experience, the better the performance in the interview
Testability: Testable directly; directional hypothesis

Remark 10: "Yes, this year the work experience seems to be more"
H0: There is no difference in the work experience of students of this year and last year
H1: This year's students have more work experience than last year's
Testability: Testable directly; directional hypothesis

Remark 11: "The candidates in the morning batches have scored better than the students in the afternoon batches"
H0: There is no difference in the marks of the students of the morning batch and the afternoon batch
H1: Candidates in the morning batches have scored better than students in the afternoon batches
Testability: Testable directly; directional hypothesis

Remark 12: "Female members of the interview panel were more considerate than male members"
H0: There is no difference between female and male panel members
H1: Female panelists are more considerate than male panelists
Testability: Cannot be tested, since 'considerate' is vague; unless it is defined properly and quantified, the hypothesis cannot be tested
A good-quality proposal, in addition to increasing the chances of acceptance by the concerned
authorities, also creates a good impression and establishes the credibility of the researcher. It is,
therefore, necessary to put in one's best efforts to ensure a high degree of acceptance of the proposal and
its smooth execution.
A research proposal serves the purpose of convincing the sponsor that the research is worthwhile and that
the researcher has the requisite competence and ability to complete the project as per schedule.
It should reflect a good grasp of the various issues related to the topic, supported by a survey of relevant
literature. Accordingly, it should answer the following questions:
What is the objective to be achieved?
What is its relevance and importance?
What is the methodology to be used?
What is the plan and schedule of completion?
What are the scope and limitations?
What is the extent to which the objectives might be achieved?
Bidding Process
The bidding process provides for two steps for selection of the successful bidder: first, eligibility
determination, and second, techno-financial evaluation of the proposal submitted by the
bidder. Detailed terms and conditions of the bid process are specified in the RFP document.
Issue and Receipt of Bids
The RFP document can be obtained from EPCH's office at EPCH House, Pocket 6 & 7, Sector-C,
LSC, Vasant Kunj, New Delhi-110070 on all working days from 7.4.2009 [TUESDAY], from
11.00 am to 5.00 pm, on payment of Rs. 10,000/-. The payment shall be in the form of a Demand
Draft from any Nationalised/Scheduled Bank drawn in favour of 'Export Promotions Council
for Handicrafts' payable at New Delhi. The RFP document can also be downloaded from
www.epch.com, subject to the condition that the payment for the RFP document shall be made
along with the bid. Bidders who have paid earlier vide the advertisements dated 15.10.2008 &
23.1.2009 can obtain the REVISED RFP documents on presentation of the receipt issued
earlier.
The sealed envelope containing the bids should be superscribed as 'Bid for Development
of International Crafts Complex at New Delhi'. Bidders' proposals in response to the RFP should
reach EPCH at the address given below on or before 1400 hours (IST) on 28.4.2009
[TUESDAY].
A pre-bid conference will be held on 17.4.2009 [FRIDAY] at 11.30 am in the Committee
Room of the Office of the Development Commissioner (Handicrafts) at West Block 7,
R. K. Puram, New Delhi-110 066.
These bids are being invited on behalf of the Ministry of Textiles, Govt. of India.
In the context of research projects, RFP describes the problem, the context in which it arose, the
recommended approach for solving the problem, the amount to be paid for the assignment, etc. The
formal document for the project is prepared, and issued for competitive bidding by the interested
agencies. A typical format of the document, called ‘Request for Proposal’, is as follows:
(i) Background: Overview of the organisation
(ii) Research Project (Problem): Overview of the research project and general expectations like
time schedule, etc.
(iii) Vendor Information:
(a) General profile of the bidding company
(b) Record of previous experience in conducting such type of research
(c) Proposed strategy for conducting the research, including research design, human resources
to be deployed, time schedule, etc. It may also include seeking support from within the
organisation in terms of manpower, internal team composition, infrastructure support,
etc.
(d) Implementation schedule in case of acceptance of the bid. For example, if some specific
information system is to be set up in the organisation, how much time would it
take?
(e) Cost Proposal
(f) Statutory and Regulatory commitments or/and obligations for the vendor.
In a typical RFP, the format of information to be supplied by the company that floats the proposal,
and the format in which the vendor is to supply the information are as follows:
From the Interested Company
An Overview of the Company
Details of the Project
Details of All Approvals and Clearances from Regulatory Authorities
Estimated Cost of the Project, including hardware and software for the Information Technology
content of the project, with clauses for installation and overrun
Ethical Issues, containing inter alia the clauses for confidentiality and privacy
From the Vendor Side
Profile of the Company
History and Description
Legal Status and Constraints, if any
Partners and Alliances to Participate in the Project
Proposed Approach/Solution
Services and Support
Cost Proposal–Pricing of Resources including men, material and machines
Time Schedule (PERT CHART)
Contractual Terms and Conditions for Implementation and Maintenance
So far, we have discussed the conduct of business research within an organisation. However, owing to
the liberalisation policy and the global integration of economies, an organisation is no longer insulated
from other organisations, especially those in the same sector, like automobile, pharmaceutical, etc.
What is happening in the sector as a whole is bound to impact individual organisations. Therefore,
while conducting research on an aspect of an organisation, we also have to conduct research on that
sector. This leads us to the need for discussing certain salient aspects of research at both the corporate
(organisation) and sectoral levels.
an external agency. Both strategies have certain advantages and disadvantages. In this section we
deliberate on the pros and cons and other relevant issues.
The above case highlights the importance of having a qualified and competent team of researchers
who have credibility in the organisation and command respect for their professional approach. In
organisations where a formal organisational setup exists for researchers, two approaches are followed,
viz.
Centralised Research
Decentralised Research
3.9.1.1 Centralised Approach In such a setup, there is an exclusive department devoted to research
activities. All the problems that are perceived or arise at any branch, region or corporate level are referred
to this department, which is generally equipped with adequate expertise to undertake the assignments. The
researchers in this department usually have multidisciplinary competence, supplemented by specialists
in relevant areas.
3.9.1.2 Advantages of Centralised Research
(i) Research is not duplicated. The research done for one department/region/branch could be used
for others with suitable modification.
(ii) Only a limited number of experts is required in the central department.
(iii) Experience gained by researchers in one department can be put to use in the other departments.
(iv) Researchers follow unbiased approach to all departments.
(v) The staff of the department can have free and frank interaction with the researchers.
3.9.1.3 Disadvantages of Centralised Research
(i) Researchers are far from the scene of action and may not be able to comprehend a true picture
of the situation.
(ii) Researchers may not have thorough knowledge of the systems and procedures of all the departments.
3.9.1.4 Decentralised Approach
In such a system, each department has its own research team to take care of its research require-
ments that arise from time to time.
3.9.1.5 Advantages of Decentralised Research
(i) Researchers in an individual department are well-versed with the systems and procedures of the
department, as also the behavioural aspects of the staff, so the chances of success of the research
increase.
(ii) Researchers are nearer to the problem area and could have direct feel of the situation.
(iii) Researchers can easily try out various solutions before selecting the most suitable alternative.
(iv) It is easier to develop expertise in one particular area than to develop expertise in several
areas.
3.9.1.6 Disadvantages of Decentralised Research
(i) The researchers may not have a detached approach; they may instead align with various
other sections of the department and its personnel, and at times may hold certain preconceived
notions.
(ii) Researchers may overlook shortcomings of an individual or a system in the department for the
sake of its reputation.
(iii) Researchers may have only limited exposure, and thus the solution may be limited in approach.
(iv) Employees may hesitate in having free and frank interaction, especially if their views are not
shared by their bosses.
(v) A department may try to camouflage anything going wrong within the department, and might
influence its researchers. Similarly, a claim made by it may not be easily verifiable.
(i) The external team may not be well-versed with the environment, systems and procedures,
culture, etc. prevalent in the organisation.
(ii) They may take time to grasp and comprehend the various aspects of the problem/study.
(iii) The selection of external agency plays a crucial role, and has to be done carefully to ensure
credibility and acceptability of their research in the organisation.
Therefore, a comprehensive list of criteria for the selection process is as follows.
3.9.2.3 Criteria for Selection of an External Research Agency The following criteria may
serve as a guide to select an external agency for assigning a research assignment.
(i) Reputation—Overall
The overall reputation of an agency in the market is an important criterion for selection.
(ii) Record of completing assignments in time
This information is also easily available. However, it has to be ensured that delays, if any, were
primarily caused by the agency; sometimes delays take place because of non-cooperation
from the clients or because of factors over which the agency had no control.
(iii) Credibility in maintaining ethical standards
The agency should have the reputation of maintaining privacy and confidentiality of the
assignments undertaken by it.
(iv) Flexibility in approach
The agency should have flexibility, which depends on its competence and experience
in diversified areas. This ability develops over a period of time, through varied experience and
growing maturity.
(v) Quality of past assignments
This can be verified by interacting with the earlier clients of the agency.
(vi) Experience – Overall experience and/or experience in similar assignments
This is very important for ensuring the quality of report by the agency.
(vii) Quality of Staff
The agency's staff should have relevant qualifications and experience/expertise, including
communication skills.
(viii) Sharing of ideology and value systems
Through intense interaction, it has to be ensured that the company and the external agency
share similar ideology and value system.
Objectives are clearly defined: Brief statement of the business objective that the researcher is seeking to accomplish (e.g. increase revenue, reduce cost, improve customer satisfaction, etc.)
Translation of objectives to hierarchical questions: On the basis of the genesis of a problem, the researcher identifies the manager's perception of the problem, and formulates the research problem
Research process detailed: The researcher defines the complete research process, with inputs/outputs at each stage and, where appropriate, an illustration of the process
Research design thoroughly planned: The need for exploration is assessed and the type of exploration described; exploratory procedures are outlined; the sample unit is clearly described along with the sampling methodology; the questionnaire is designed in accordance with the required data and the desired analysis; data collection procedures are selected and designed
Deviations revealed: The desired procedure is compared with the actual procedure in the report; the desired sample is compared with the actual sample in the report; the impact on findings and conclusions is detailed; if the deviations are likely to significantly reduce confidence in the conclusions, additional steps are taken to remove those limitations
Findings presented unambiguously: Findings are clearly presented in words, tables and graphs, and are logically organised to facilitate reaching a decision about the manager's problem in line with the original objectives of the research project; an executive summary of conclusions is outlined
Conclusions justified: Decision-based conclusions are matched with detailed findings
Researcher's experience reflected: The researcher ensures reflection of expertise through the quality of the report
Source: Based on the table given in the book titled "Business Research Methods" by Donald R Cooper and Pamela
S Schindler, published by Tata McGraw-Hill.
The intelligence of a consultant is like the power of a car engine but the problem-solving
approach is the steering wheel that guides the car in the desired direction.
We conclude this chapter by describing an approach followed by some Strategy Consultants. The
reason for including this in the book is that many times, a corporation’s business managers have to
play this role while carrying out an assignment given to them by the senior management. We use the
word ‘client’ for the department, section, or group leader who has given the assignment. However,
we would like to mention that this approach should not be interpreted as the only approach followed
by consultants—a consultant starts with the textbook approach and adds his or her own ideas and
experience to evolve the approach to suit the business problem.
In Strategy Consulting, problem-solving requires a structured way of thinking and communicating
about problems. The approach can be defined as a series of key steps.
3.13.1 Define the Problem with a Crisp, Clear and Concise Statement
The problem statement should be phrased in such a manner that it is thought-provoking and specific.
It should not be phrased in a manner that re-states an obvious fact, is beyond dispute, or is
too general.
Illustrative Examples:
Problem Statement: The company should grow revenue and become profitable. Evaluation: Obviously!
Problem Statement: The company should be managed differently to increase profitability. Evaluation: Too general!
Problem Statement: The company should grow revenue and profit by using distribution partners instead of its
own salesmen. Evaluation: Ideal!
is that we may never finish the analysis in the timeframe expected for making a decision. So it is
important to understand the level of accuracy needed in the analysis.
(iv) Understand Decision-Makers’ Motivation
In conducting the analysis, it is also important to think about the decision-makers and their motivations.
What might be their concerns and issues around the hypothesis? These could be based on past
experience (e.g. they have tried changing their distributors before and have never been successful).
Sometimes decision-makers are swayed by political concerns (e.g. the Head of Marketing may not
easily accept a solution that relies on putting more power in the hands of the Head of Customer Care).
It is important to recognise these issues and address them in the analysis.
3.13.6 Illustration
Problem Statement:
Company should grow revenue and profit by using distribution partners instead of its own
salesmen.
Key Issues/Key Questions:
(i) Using distribution partners has not been successful in the past.
(a) Why did past attempts fail? What were the root causes? (e.g. lack of focus, bad selection
of partners, poor negotiation, etc.)
(b) Does the company know the distribution partners it would like to use?
(c) Are distribution partners still receptive to working with the company? Which ones?
(ii) Distribution partners may take a long time to start producing the revenue growth needed.
(a) What is typical ‘ramp-up’ time for a distribution partner? Can it be accelerated?
(b) How much revenue growth can be expected from distribution partners?
(iii) Distribution deals can be expensive.
(a) What is typical revenue sharing agreement in the industry?
(b) Can the company negotiate deals that are better than those of its competitors?
(c) What is the expense/revenue ratio that can be achieved with distribution partners and
how does it compare to that of its own sales force?
(iv) The company has a strong culture of growing revenue by growing the sales force.
(a) What is the productivity of sales force?
(b) What would it cost to increase sales force to reach revenue growth target?
(c) Would the increased sales expense allow the company to meet profit targets?
APPENDIX
Background AGROTECH India Limited has been a pioneer in biotechnology research-based agro
products. The organisation has been serving the farming community in India for the past two decades and
has achieved impressive growth, from Rs. 6 cr in 1996 to Rs. 200 cr in 2007. However, for the last 2 years
the top line has seen a downward trend and was at Rs. 170 cr in 2009, though the overall industry has seen
healthy growth in the same period. The bottom line has also eroded, and the PBT is down from a level of
Rs. 20 cr in 2007 to Rs. 10 cr in 2009. The management has taken several initiatives in the current year,
including setting up the strategic agenda for the organisation, and aspires to be a $150 million company by
2012.
The organisation now wants to align its key business processes, such as sales and operations
planning, demand generation, and annual budgeting, with the overall business objective. Visionary
Consultants (VC) have been partnering with AGROTECH India Limited over the last one year in
several initiatives in the HR space, and have been requested to submit a proposal for streamlining the key
business processes using their operations consulting expertise.
Our Understanding of the Current Situation A team from VC visited the corporate office and
interacted with key people from HR, Finance, Supply Chain and Planning, and Sales, including a few
Territory Managers. The following were the key observations:
The actual revenues were 30% less than the budgeted revenues for FY08-09 and are about
15% less for the current year till Oct-09.
In the current year, actual monthly sales have been only about 60% of the forecast.
There is a heavy skew in sales towards the end of the month – 50% of the sales happen on the
last day.
The products are highly seasonal and there is a very short selling season.
The team subsequently studied a few key business processes and identified the following issues/
concerns:
Issues/Concerns in Sales and Operations Planning Process The entire Sales and Operations
Planning Process is not aligned to the Manufacturing and Procurement Lead Times.
The plan for every month is finalised by the 7th, and the last dispatch needs to be done by the 27th/28th;
this leaves the supply side with only about 20 to 22 days to cope with the entire month's
requirement.
Dispatches of goods from all manufacturing locations generally begin only from the middle of
the month. About 50% of FG (finished goods) is received by the C&FAs on the last 2 days of the month.
This may result in loss of sales, as the selling window is very small.
The planning for procurement of domestic RM & PM (raw and packing materials) is based on the
forecast given for the next month. However, this forecast is made on an ad hoc basis, leading to a
variation of more than ±50% between forecast and actual requirement. This results in either RM/PM
not being available or a pile of non-moving stock.
The planning of imported RM (LT more than 60 days) is based on the corporate budget, while
the actual requirement is far different from the corporate budget.
The metrics used for reviewing performance indicate more than 95% availability (based on
Rs dispatched vs Rs planned), while for more than three-fourths of the month the required SKU may
not be available at the C&FA.
Availability of stocks at C&FA is a no man’s territory
Inventory norms for FG/RM/PM have not been scientifically arrived at.
Issues/Concerns in Sales Processes
There is variation in the way sales are forecast, and there is also high variation in
forecast vs. actual sales across the 4 units.
The focus is higher on lag indicators, such as sales in Rs lakh, collections, outstandings, etc.,
while lead indicators such as details of farmer meets, number of dealers visited, and dealer-wise
secondary sales, which are key for generating demand, are not adequately covered.
The processes related to capturing market intelligence such as share of competitors at dealer
level, product level, crop level need to be further strengthened.
There is no structured review and standardised review mechanism across all units.
The sales return at 30% is significantly higher than the industry average.
Issues/Concerns in the Annual Budgeting Process
The budget focuses more on financial measures; performance indicators such as the number/details
of new dealers to be appointed, new villages to be targeted, new crops to be targeted,
etc. also need to be a part of the budgeting exercise.
The final annual budget is not deployed in a structured manner across the organisation. Thus,
actions/initiatives to ensure achievement of budget are not identified.
The budget review mechanism focuses more on the top line and bottom line than on the status of the actions/initiatives needed to achieve them.
Suggested Approach for Improving the Business Processes AGROTECH should re-look
at some of its key business processes in a phased manner as under:
3.32 Business Research Methodology
Phase 1 The 1st phase will be for a duration of 4 months and the objective will be to redesign and implement the Sales and Operations Planning Process, which will result in
Bridging the gap between the forecast and actual sales;
Reducing the skew towards month-end in both dispatches as well as sales;
Ensuring that the right stock keeping units (SKUs) are available throughout the month.
Phase 2 The 2nd phase will be for a duration of 3 months with the following two objectives:
The first objective will be to redesign all sales processes such as demand generation, daily order fulfilment, territory planning, market intelligence gathering, managing promotion activities, etc. Subsequently, the entire sales force will be trained on the new processes and related standards/formats/policies.
The second objective will be to redesign and implement the Production Planning and Procurement Planning Processes at all the three manufacturing locations.
Phase 3 The 3rd phase will be for a duration of 3 to 4 months, during which a senior member from VC will provide handholding and coaching on a part-time basis and participate in monthly reviews.
Engagement Objective for the 1st Phase The engagement objective for the 1st phase will be to
design and implement the Sales and Operations Planning Process for the Agri and Pesticide division.
VC Methodology VC would form a cross-functional team comprising members from Supply Chain, Sales, Production, Marketing and Finance. This team will use a structured methodology, as stated below, to redesign and implement the Sales and Operations Planning Process.
The team would work rigorously over a span of 4 months along with the VC team and would
focus on the following:
Resource Requirements
Resource Deployment from VC Limited VC will deploy a team of three resources as under:
A full-time consultant with appropriate skill sets will be deployed for a duration of 3 months, who will work along with the team from AGROTECH for redesigning and implementing the Sales and Operations Planning Process.
A Principal Consultant will be deployed on a part-time basis for the entire duration of 4 months.
He would mastermind the initiative, guide the teams and participate in all reviews.
A Chief Consultant will provide the necessary thought leadership to all teams including AGRO-
TECH Top Management.
Resource Requirement from AGROTECH AGROTECH will have to commit the following
resources:
One of the Senior Managers will have to champion the entire initiative and will have to commit
about 20% of his/her time on this initiative.
A middle-level manager, who will work with the VC Consultant, will have to be committed
on a full-time basis for the entire duration of 4 months.
A cross-functional team comprising members from Supply Chain, Production, Marketing and Sales will have to commit about 15% of their time over the next 4 months.
Professional Fees
VC would charge a professional fee of Rs. 24 lakh (Rupees Twenty-four Lakh only) for the first phase for a duration of 4 months. The total Professional Fee will be paid as per the following schedule:
Rs. 5 lakh will be paid as Mobilisation fee before the start of the assignment.
Rs. 5 lakh will be paid as monthly fee for the first three months.
Rs. 4 lakh will be paid as monthly fee for the fourth month.
Service tax, as per prevailing government rules (presently 12.24%), will be payable on the invoice
amount. Any new tax, applicable in future, will be similarly treated.
All out-of-pocket expenses such as local travel, travel to various plants/regional offices, lodging
and boarding at plants/regional offices, communication expenses will be payable by AGROTECH
India Limited.
All invoices and debit notes will be payable within 7 days from the date of receipt.
SUMMARY
DISCUSSION QUESTIONS
EXERCISE
1. Think of some problem in any area, and proceed to discuss the various steps that are required
to refine or redefine the problem, and resolve it in a systematic manner.
Research Design
4
1. Introduction
(a) Types of Research Designs
Exploratory
Descriptive
Explanatory/Causal/Relational
2. Validity of a Research Design
3. Experimental Designs
(a) Relevance and Historical Development
(b) Types of Experimental Designs
One-factor Experiments
Two-factor Experiments
Two-factor Experiments with Interaction
Latin Square Design
Factorial Designs
Quasi-experimental Design
Ex Post Facto Designs
4. Cross-sectional Studies
5. Longitudinal Studies
6. Action Research
7. Sampling Schemes
8. Simulation
LEARNING OBJECTIVES
The main purpose of this chapter is to provide comprehensive knowledge about the various experimental and research designs, and their applications in the business environment.
In addition to conventional designs and sampling schemes, the objective is also to acquaint the reader with two other types of designs that are highly useful in a research environment, so as to provide complete knowledge about the conduct of research studies.
One type of study relates to the conduct of research on a particular phenomenon for a cross-section of entities, like companies or institutions, at one point of time or over a period of time.
The other type of study relates to an environment where it is advisable to simulate or generate data for a futuristic scenario, and use it for managerial decisions.
Relevance
Modern Electronics, a leading seller of electronic products, has a chain of several stores all over
India. It has been growing at about 20% for the last 3 years. However, the opening of stores
of a multinational brand of electronic products has affected the sales of Modern Electronics
considerably during the last six months. The management of the company has been toying
with various options to deal with this unprecedented situation. The CEO thought of seeking
answers to the following questions before drawing out a strategy for future:
Have the sales of all products been affected equally?
Have the sales of the six zones been affected equally?
Is there any interaction between the decline of sales of products and the zones i.e. whether
any particular products have been affected more in some zones than others?
What is the perception of customers towards Modern Electronics, and what are their sug-
gestions for greater acceptability of its products?
What are the employees’ suggestions to boost sales?
The CEO engaged the services of a consultant for this purpose. The consultant used the
research designs and experimental designs discussed in this chapter to provide answers to
the above questions which helped the CEO to draw up a competitive strategy to successfully
compete with other multinational companies.
institutions in India. Although the information was useful and led to setting up an appropriate infor-
mation system, it was not available at the time when urgently needed for policy formulation.
We consider different designs for these three different types of studies as there are different requirements of design for the three phases. For example, a flexible design is suited to exploratory studies, and a rigid design is suited to explanatory studies. Descriptive studies may have a mixed approach. A flexible design is amenable to the required changes in the research. In the case of an exploratory study, not much is known about the research topic, and therefore frequent changes are required; the design should allow such changes. In the case of an explanatory research study, the research has generally matured through the previous two stages, and the design should be rigid enough to avoid any bias that might creep into the study.
We will discuss these phases, in brief, in the next section.
4.3.1 Exploratory
We have discussed the exploratory studies in Chapter 1. Thus, we limit our discussion on exploratory
studies as a type of research design.
Exploratory studies are generally carried out:
When not much is known about the situation – prevailing or encountered – and yet we want to have some assessment;
When we want to solve a problem but no information is available as to how the same or a similar problem was solved in the past.
An exploratory study is conducted to explore a problem at the preliminary stage of a research study, to get some basic idea about the solution. It is usually conducted when there is no earlier theory or model to guide us, or when we wish to have some preliminary ideas to understand the problem to be studied, as also the approach towards arriving at the solution. It might help in modifying the original objective of the study, or might even lead to a new perspective on the earlier perceived problem.
4.3.2 Descriptive
Such studies deal with:
Description of a phenomenon like accidents in a city
Describing a variable like revenue, life of an item, return on investment etc., representing a
characteristic of a population under study
Estimation of the proportion of the population having certain characteristic(s) like colour prefer-
ence of cars, specified qualification or experience, etc.
The main goal of this type of study is to describe the data and characteristics of what is being
studied. The idea behind conducting descriptive study is to study frequencies, averages and other
statistical calculations. Although this study is highly accurate, it does not explain the causes
behind a situation.
Unlike exploratory research, descriptive research may be more analytic. It often focuses on a
particular variable or factor.
4.3.3 Explanatory/Causal/Relational
Such studies involve studying the impact of one variable on the other and also the relationship
between two variables.
The relevance of causal study arises only when there exists correlation between two variables.
For example, if there is correlation between two variables, say sales and advertising expenses, one
may like to study which of the two is the ‘cause’ and which is the ‘effect’. In this case, advertis-
ing expenses is the cause (called independent variable) and sales (called dependent variable) is the
effect. Incidentally, causal variable is also called ‘explanatory’ variable as it explains the effect or
impact on the dependent variable. Similarly, if one finds the existence of association between two factors, one may investigate which of the two factors is the ‘cause’ and which one is the ‘effect’, but the study cannot proceed further than that. However, in the case of a correlation study relating two variables, once we know the ‘cause’ and the ‘effect’, the study can proceed further to find out the relationship between the two variables.
When we talk about association or correlation, we refer to association or correlation between
two factors or variables. A factor is an attribute or characteristic like colour, qualification, area of
specialisation in MBA, and is usually not measurable. A variable is a measurement of some char-
acteristic like income, age, etc.
While association is used in the context of studying factors, correlation is used in the context of
studying variables.
Some examples of association study are:
Association between
Academic background (like commerce, science, engineering, etc.) of an MBA student and
his/her option of area like marketing, finance, operations, etc.
Motivation and Performance of Salesmen
Some examples of correlation study are:
Correlation between
Income and the insurance cover by individuals
Expenditure on advertisement and sales turnover of a company
In the case of association, we can conclude only whether
There is an association, or
There is no association
However, in the case of correlation, we can also find whether the correlation is positive or negative.
If the correlation is positive, both variables move in the same direction like sales and advertising
expenses. If the correlation is negative, both variables move in the opposite direction like availability
of a commodity and its price.
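The direction of co-movement described above can be checked numerically with the sample correlation coefficient. The following sketch (plain Python, using hypothetical figures) computes Pearson's r: a positive value indicates that both variables move in the same direction, a negative value that they move oppositely.

```python
def pearson_r(x, y):
    # Sample (Pearson) correlation coefficient of two equal-length series.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical data: advertising spend and sales rise together (r > 0),
# while availability of a commodity and its price move oppositely (r < 0).
r_sales = pearson_r([1, 2, 3, 4, 5], [30, 45, 55, 70, 80])
r_price = pearson_r([10, 20, 30, 40, 50], [95, 80, 72, 60, 50])
```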
Once causal study establishes cause and effect relationship between two or more variables, the
relational study attempts to measure the relationship in a mathematical equation derived by using
statistical methodology. The relationship could be in the form
y (sales) = 25 + 10 x (Advertising Expenses)
It may be interpreted as follows:
For a unit (say, Rs. in crore) change in the value of x, y changes by 10 units (Rs. in crore). It
implies that, on an average, an increase of Rs. 1 crore in advertising expenses causes an increase
of Rs. 10 crore in sales.
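The interpretation above can be expressed as a one-line prediction function. A minimal sketch, assuming the fitted equation y = 25 + 10x with both variables in Rs crore:

```python
def predicted_sales(advertising_expenses):
    # Relational model from the text: sales = 25 + 10 * advertising expenses (Rs crore).
    return 25 + 10 * advertising_expenses

# A unit (Rs 1 crore) increase in advertising raises predicted sales by Rs 10 crore.
change = predicted_sales(3) - predicted_sales(2)
```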
If the effect on the dependent variable is only due to variation in the independent variable, then we may conclude that internal validity is achieved.
To achieve high internal validity, a researcher must consider all the factors that affect the depen-
dent variable and control them appropriately. It ensures that these factors do not interfere with the
results.
It may be noted that internal validity is only relevant in studies that aim at establishing causal
relationship. It is not relevant in most of the exploratory, observational or descriptive studies.
Some of the threats to internal validity are:
History
Maturation
Testing
Selection of Respondents
Statistical Regression
Experimental Mortality
A brief discussion on each of these threats is given below.
History
History refers to the events that are beyond the control of the experiment. These events may change
the attitude of the respondents irrespective of whether the independent variable is changed or not.
Thus, it is impossible to determine whether any change on the dependent variable is due to the
independent variable, or the historical event.
Maturation
The respondents may not have the same level of responses at the later part of experiment as in the
beginning of the study. The permanent changes, such as physical growth and temporary changes
such as fatigue, may provide alternative explanations; thus, they may change the way a respondent
would react to the independent variable. So upon completion of the study, the researcher may not
be able to determine if the cause of the discrepancy is due to time or the independent variable.
Testing
When the respondents are subjected to repeated tests, i.e. in experiments where the respondents are tested more than once, bias could creep in as the respondents may remember what they are being tested for. The mental ability tests in the MBA selection process are an example. The respondents may learn the techniques by practising and may get higher scores than before. Such improvement cannot be attributed to the independent variable, thus leading to bias.
Selection of Respondents
Inappropriate selection of respondents may lead to bias in an experimental design. If the selected respondents are not comparable across groups, i.e. if proper randomisation has not taken place, selection bias may creep in.
Statistical Regression
Statistical regression refers to the bias that may creep in due to some respondents giving extreme responses; on repeated measurement, such extreme responses tend to move towards the mean (regression towards the mean).
Experimental Mortality
This can occur when the respondents drop out during the experiment especially in the experiment
involving pre-test and post-test. The same respondents who take up the pre-test may not be avail-
able for the post-test. This results in excluding the entire pre-test data from the analysis for the
dropped-out respondents.
Before we proceed further, we would like to define the following terms used in Design of Experiments, so that their usage will be understood and appreciated in the subsequent paragraphs. The word ‘treatment’ and its levels are used in experiments as follows:
Experimental Unit It is the object on which the experiment is to be conducted. Examples: Plots (of land),
Students, Salesmen, Patients
Response It is the dependent variable of interest.
Examples: Yield (of plot), Return on investment, Performance scores (like marks, grades,
sales), Quality of product/service
Treatment or Factor Those independent variables whose effect on the response is of interest to the experi-
menter
Quantitative Factor Measured on a numerical score like discount in price
Qualitative Factor Not measured numerically like gender, colour, location
Levels of a Factor Different values of a factor. Values of the factor that are used like 10 gms, 12 gms and
15 gms per sq. metre of land, dosages of medicine, duration of training, percentages of
discounts (5%, 10%, 15%)
Types of Factors Different varieties of fertiliser, different medicines, same training by different institutes,
different mutual funds or different schemes of mutual fund, different makers of machines
used in a factory
Experimental Designs The experimenter/researcher controls the specification of treatments and the method of assigning experimental units to each treatment
Completely Randomised Experiment Herein, the experiment is concerned with the study of only one factor. Each treatment/factor level is assigned or applied to the experimental units at random, without any other consideration
Block Each block (analogy with agricultural plots) comprises the same number of experimental units as the number of treatments under experimentation
Randomised Block Design Herein, the experiment involves the study of two or more factors. One experimental unit from each of the blocks, say ‘n’ in number, is assigned to each of the, say, ‘m’ treatments. Thus, ‘n’ blocks have ‘m’ treatments in each block. For example, suppose the experiment involves comparing I.Q.s of students (experimental units) of each of the three areas (‘treatments’) viz. Marketing, Finance and Operations
Blocking Blocking implies control of factors which are either not of interest or whose effect is removed/filtered/averaged out.
Control Group This term is explained with the help of an example.
One may like to study the productivity of a particular fertiliser. This can be studied by
taking some, say 12, plots of similar type. While the fertiliser could be used only on 6
plots, the other 6 plots could be cultivated without the fertiliser. Thereafter, the yields
of the two sets of plots may be compared to see the effect of the use of the fertiliser.
In Experimental Designs terminology, the fertiliser is called ‘Treatment’ and the plots
are called ‘experimental units’ or simply ‘Units’ which are subjected to treatment for
generating the desired data. The plots which are not treated with the fertiliser are called
‘control’ group. They are used to evaluate impact of the treatment.
Replication and Randomisation In addition to the concept of control, explained above, two other concepts, viz. Replication and Randomisation, are very relevant and important for designing experiments. Replication, as the name implies, means repeating. Obviously, no conclusion can be drawn by conducting an experiment on just one unit, just as, in statistics, no worthwhile or meaningful conclusion can be drawn by taking just one observation in the sample. Similarly, just as in a statistical study the units on which the observations are to be recorded must be selected randomly, in DOE the units must be selected randomly, and the treatments under consideration should also be applied to the units in a random manner. Formally, randomisation is defined as the random assignment of treatments to experimental units.
We may reiterate that control, replication and randomisation play a very important role, and,
in fact, are at the core of Design of Experiments.
Following are some of the types of experiments which can be conducted for studying:
Productivity of two or more types of fertilisers, seeds, irrigation systems
Productivity of different amounts of the same fertilisers per unit of land
Effectiveness of two or more types of medicines
Effectiveness of two or more doses of the same medicine
Effectiveness of two or more types of training systems
Impact of various marketing strategies
Impact of various incentives for improving sales
Impact of responses to products and services in different cities, regions, etc.
Evaluation of different returns on various stocks or market indices
The statistical technique used for analysing the data collected for conduct of experiments is ANOVA,
discussed in Chapter 12.
We have used the concepts of ‘Design of Experiments’ with the help of agricultural experiments,
as these were the first such experiments, and more important, they are simple to understand.
4.5.2.1 One-Factor Experiment This can be better explained by the following example:
The yield of a crop depends on several factors like fertilisers, variety and quality of seeds, type and
quality of soil, methods of cultivation, amount of water made available, climate including tempera-
ture, humidity etc., method of harvesting, etc. Out of the several factors contributing to the yield of
a crop, say rice, suppose we are interested in any one factor, say varieties of rice.
In such cases, the type of treatment is one, e.g. variety of rice. Let there be three varieties of
rice. The data is collected about the yield of rice for each variety on a number of plots of equal size
and similar type of soil. The care to be taken while designing the experiment is that the yield from
plot to plot should vary only due to variety and not due to other factors. A typical table giving data
collected through an experiment is given as follows.
(Yield in Quintals per Unit of Plot)
Using the appropriate ANOVA technique, it can be concluded that there is significant difference
among the yields of the three varieties of rice. This example is solved in Chapter 12, Section
12.5.1.
Illustration 4.1: One Factor
Three groups of five salesmen each were imparted training related to marketing of consumer products by three Management Institutes. The amount of sales made by each of the salesmen, during the first month after training, is recorded and given in Table 4.2.
                 Salesmen
Institutes    1    2    3    4    5
    1        65   68   64   70   71
    2        73   68   73   69   64
    3        61   64   64   66   69
The problem posed here is to ascertain whether the three institutes’ training programmes are
equally effective in improving the performance of trainees. If m1, m2 and m3 denote the mean ef-
fectiveness of the programmes of the three institutes, then, statistically, the problem gets reduced
to test the following null hypothesis, i.e.
H0: m1 = m2 = m3
against the alternative hypothesis that it is not so, i.e.
H1: Not all means are equal
This situation is resolved by using ANOVA – One Factor, explained in Chapter 12. The conclu-
sion reached is that all the training institutes are not equal with respect to the training programmes
conducted by them.
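The one-factor ANOVA computation behind this illustration can be sketched in plain Python. This is only an illustrative computation of the F statistic for the Table 4.2 data; whether H0 is rejected depends on comparing F with the tabulated critical value for (2, 12) degrees of freedom, as explained in Chapter 12.

```python
def one_way_f(groups):
    # One-factor ANOVA: F = between-group mean square / within-group mean square.
    n = sum(len(g) for g in groups)   # total number of observations
    k = len(groups)                   # number of treatments (institutes)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Sales of the five salesmen trained by each of the three institutes (Table 4.2).
f_stat = one_way_f([[65, 68, 64, 70, 71],
                    [73, 68, 73, 69, 64],
                    [61, 64, 64, 66, 69]])
```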
4.5.2.2 Two-Factor Experiments Suppose it is claimed that the yield of any variety of rice depends not only on the variety itself, but also on the type of fertiliser used. Let there be four types of fertilisers under consideration. Thus, we would also like to test whether the yields due to all the four fertilisers are equal. Such an experiment is called a two-factor experiment, and the data collected are in the following format.
                Fertilisers
Varieties    I    II   III   IV
   A         6     4     8    6
   B         7     6     6    9
   C         8     5    10    9
Total       21    15    24   24
If we take the total of variety ‘A’ over all fertilisers, i.e. 24, it removes the effect of the fertilisers, and indicates the yield of only variety ‘A’. Thus, the totals of the three varieties, i.e. 24, 28 and 32, indicate only the differences among ‘A’, ‘B’ and ‘C’. The impact of fertilisers has been averaged out. We can also say that the impact of the factor fertiliser has been ‘blocked’.
Similarly, the totals of the four fertilisers, i.e. 21, 15, 24 and 24 indicate only the differences in
fertilisers. The impact of varieties has been ‘averaged out’ or ‘blocked’.
The analysis for a two-factor experiment is the same as the analysis for two-way ANOVA, and is given in Chapter 12. Using two-way ANOVA, it can be concluded that there is no significant difference among the three varieties of rice, as also among the four types of fertilisers.
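The two-way ANOVA behind this conclusion can be sketched in plain Python for the varieties × fertilisers table above. This is an illustrative computation only; the two F values are compared with the tabulated critical values for (2, 6) and (3, 6) degrees of freedom to decide significance.

```python
def two_way_f(table):
    # Two-factor ANOVA without replication: rows and columns are the two factors.
    r, c = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_error = ss_error / ((r - 1) * (c - 1))
    return (ss_rows / (r - 1)) / ms_error, (ss_cols / (c - 1)) / ms_error

# Yields: rows = varieties A, B, C; columns = fertilisers I, II, III, IV.
f_varieties, f_fertilisers = two_way_f([[6, 4, 8, 6],
                                        [7, 6, 6, 9],
                                        [8, 5, 10, 9]])
```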
Illustration 4.2
The following table gives the number of subscribers added by four major telecom players in India
in the months of August, September, October and November 2005. The data are given in 000’s and
are rounded off to the nearest 100, and are thus in lakhs.
Additions to Subscribers
(In lakhs)
                          Company
Months       Bharti   BSNL   Tata Indicom   Reliance
August          6       6          2            5
September       7       6          2            3
October         7       6          6            4
November        7       8          7            4
V2    7    6    6    9
      6    7    7    8
V3    8    5   10    9
      7    5    9   10
It may be noted that in the data given in the table in Section 4.5.2.2, only one value is available for each of these combinations, which does not permit analysing the variation due to any particular combination. For isolating the interaction factor, we should have a minimum of two values for each combination. Accordingly, the above table gives data with a minimum of two yields, obtained on two different plots of the same type, for each of the combinations.
The analysis of such data is carried out with the ANOVA analysis for two-factor interaction, as
will be explained in Chapter 12. It can be concluded that:
The yields of the three varieties of rice are significantly different.
The yields due to the four types of fertilisers are significantly different.
The interaction between the varieties of rice and the types of fertilisers is significant.
Illustration 4.5: Two Factors with Interaction
It has been observed that there are variations in the pay packages offered to MBA students. These
variations could either be due to specialisation in a field or due to the institute wherein they study.
The variation could also occur due to interaction between the institute and the field of specialisation.
For example, it could happen that the marketing specialisation at one institute might fetch a better pay package than marketing at another institute. These presumptions could be tested by collecting the following type of data for a number of students with different specialisations from different institutes. However, for the sake of simplicity of calculations and illustration, we have taken only two students for each combination of institute and field of specialisation.
The data is presented below in a tabular format:
Institute A Institute B Institute C
Marketing 8 9 9
10 10 8
Finance 9 10 6
11 11 7
HRD 9 8 6
7 6 6
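A quick way to see whether an interaction is plausible in the table above is to compute the cell means (two students per cell) and compare the institute profiles across specialisations. A minimal sketch in plain Python:

```python
# Pay packages from the table: (specialisation, institute) -> two students' figures.
data = {
    ("Marketing", "A"): [8, 10], ("Marketing", "B"): [9, 10], ("Marketing", "C"): [9, 8],
    ("Finance", "A"): [9, 11],   ("Finance", "B"): [10, 11],  ("Finance", "C"): [6, 7],
    ("HRD", "A"): [9, 7],        ("HRD", "B"): [8, 6],        ("HRD", "C"): [6, 6],
}
cell_means = {cell: sum(v) / len(v) for cell, v in data.items()}
# Finance has the highest mean at institutes A and B but falls below Marketing
# at institute C: the profiles cross, hinting at an interaction between
# institute and field of specialisation.
```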
4.5.2.4 Latin Square Design The following arrangement of three treatments A, B and C is an example of a Latin square:
A B C
B C A
C A B
With the Latin Square design, a researcher is able to control variation in two directions. It may be
noted that:
Treatments are arranged in rows and columns.
Each row contains every treatment but only once.
Each column contains every treatment but only once.
Latin square designs were developed in the context of agricultural experiments. Suppose there is
a big agricultural plot available for experimentation. This plot is to be divided into several smaller
plots (experimental units) for experimenting to compare the yield of different fertilisers.
If the plot is such that its fertility changes along with its length as well as its breadth, then alloca-
tion of fertilisers has to be done in such a way that variations are averaged out in both directions.
If there are 4 levels of fertilisers, the plot is divided into 4 × 4 (=16) plots and the different types
of fertilisers are assigned to different plots as follows:
F1 F2 F3 F4
F2 F3 F4 F1
F3 F4 F1 F2
F4 F1 F2 F3
It is better if the shape of the plot is square; the square given above may be made smaller to fit the available area. Such a design is called a Latin Square design.
It may be noted that:
Every treatment appears in all rows and columns
Each treatment appears only once in each row
Each treatment appears only once in each column
Because of such allocation, the variation in fertility along length and breadth does not matter for
comparing the yields of different fertilisers.
The above allocation is only one of several possible allocations satisfying the above criterion.
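One standard way of generating such an allocation is the cyclic construction used in the 4 × 4 layout above: each row is the previous row shifted left by one position. A minimal sketch:

```python
def cyclic_latin_square(treatments):
    # Each row is the treatment list cyclically shifted one position further.
    n = len(treatments)
    return [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]

square = cyclic_latin_square(["F1", "F2", "F3", "F4"])
# Latin square property: every treatment appears exactly once per row and column.
rows_ok = all(len(set(row)) == len(row) for row in square)
cols_ok = all(len(set(col)) == len(col) for col in zip(*square))
```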
4.5.2.5 Factorial Designs A factorial experiment is an experiment whose design consists of two
or more factors, each with discrete possible values or ‘levels’, and whose experimental units take on all
possible combinations of these levels across all such factors. Such an experiment allows studying the ef-
fect of each factor on the response variable, as well as the effects of interactions between factors on the
response variable.
Factorial experiments allow for investigation of the interaction of two or more factors or independent variables. A factorial design allows for testing of two or more treatments (factors) at various levels, and also their interaction. For this reason, factorial designs are more efficient, i.e. they provide more information with fewer resources.
In the simplest and most common factorial experiments, each factor has only two levels. For example, with two factors A and B, each taking two levels, viz. A1 and A2, and B1 and B2, a factorial experiment would have four treatment combinations in total, and is called a 2 × 2 factorial design.
A1B1  A1B2
A2B1  A2B2
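The treatment combinations of a factorial design are simply the Cartesian product of the factor levels, which is easy to enumerate. A sketch for the 2 × 2 case above:

```python
from itertools import product

levels_a = ["A1", "A2"]
levels_b = ["B1", "B2"]
# Every experimental unit takes one of the level combinations below.
combinations = [a + b for a, b in product(levels_a, levels_b)]
```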
method is use of time series analysis to ascertain whether the impact of any factor has undergone change
over a period of time. Experiments designed in this manner are referred to as having quasi-experimental
design.
One such example is examining whether the annual rates of return on the stocks of Infosys, TCS and WIPRO have been the same over the last 5 years. The data are available in an organised form, and as such no randomisation is required.
Since quasi-experimental designs are used when randomisation is impossible and/or impractical,
they are easier to set up than true experimental designs; it takes much less effort to study and compare
subjects or groups of subjects that are already naturally organised than to have to conduct random
assignment of subjects. Additionally, utilising quasi-experimental designs minimises threats to ex-
ternal validity as natural environments do not suffer the same problems of artificiality as compared
to well-controlled laboratory settings. Since quasi-experiments are natural experiments, findings
in one study may be applied to other subjects and settings. Also, this experimentation method is
useful in longitudinal research that involves longer time periods which can be followed in different
environments.
Illustration 4.6
A consulting agency has been helping an investment company recruit about 20 MBA students as business analyst executives from two management institutes each year. The company offers a lucrative compensation package aimed at attracting the best talent.
One year, the consulting agency thought of organising an online competition between the two management institutes from which they were recruiting the executives. It was prompted by the debate in academic circles about the superiority of one institute over the other.
They picked the top 20 students, based on their latest academic scores, from each of the institutes, who agreed to participate in the competition. We may notice that this was not a random selection.
They divided the 20 students of each institute into 5 groups of 4 students each. The groups comprised students with ranks 1 to 20, as follows:
Institute A:
Group 1 1 6 11 16
Group 2 2 7 12 17
Group 3 3 8 13 18
Group 4 4 9 14 19
Group 5 5 10 15 20
Institute B:
Group 1 1 6 11 16
Group 2 2 7 12 17
Group 3 3 8 13 18
Group 4 4 9 14 19
Group 5 5 10 15 20
It may be noted that the assignment of the students to the groups is not done in a random
manner.
4.18 Business Research Methodology
The business investment game was played between the corresponding groups from the two
institutes: Group 1 of Institute A competed with Group 1 of B, Group 2 of A with Group 2 of B, and
so on, and the scores were recorded for all the groups. The institute whose groups recorded three
or more wins was declared the winner.
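The rank-based, non-random assignment used in this illustration can be sketched in Python; the helper function name is illustrative, not from the text:

```python
# Sketch: assign ranks 1..20 cyclically to 5 groups, so group g receives
# ranks g, g+5, g+10, g+15 -- a fixed pattern, not a random assignment.
def assign_groups(ranks, n_groups=5):
    groups = {g: [] for g in range(1, n_groups + 1)}
    for r in ranks:
        groups[((r - 1) % n_groups) + 1].append(r)
    return groups

institute_a = assign_groups(range(1, 21))
print(institute_a[1])   # [1, 6, 11, 16]
```

Because every rank is placed by a fixed rule, re-running the assignment always yields the same groups, which is exactly what distinguishes this quasi-experimental design from random assignment.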
4.5.2.7 Ex Post Facto Design The literal meaning of ex post facto is “from what is done afterwards”
or “after the fact”. It means something done or something occurring after an event with a retroactive effect
on the event.
In an experimental approach, an investigator has direct control over, or can manipulate, at least one
independent variable. He can choose his experimental units at random and assign treatments to
groups at random.
In ex post facto approach, one cannot control the independent variable or variables because they
have already occurred, and cannot assign subjects or treatments at random. In this situation, an
investigator must take things as they are and do his best in trying to sense and disentangle them.
In variables’ language, ex post facto research means that an investigator starts by observing a
dependent variable(s), and the possible causes for it, i.e. independent variable(s), and then he studies
the independent variable(s) retrospectively for its possible effect on the dependent variable(s). For
example, if the sales (dependent variable) of a product have declined then one may like to study as
to whether it was due to change in price (independent variable) or change in quality (independent
variable) or some other factors.
Ex post facto method has been used in all fields of social sciences, dealing with problems which
do not lend themselves to experimental inquiry. As a matter of fact a large number of researches
in sociology, education and psychology are ex post facto. The method, in effect, has offered a
valuable tool to:
sociologists, who, for instance, wanted to study the causes of crime, drug addiction, delinquency,
family breakdown, and many other social ills that afflict every society;
psychologists who wanted to study individual and group behaviour, roots of adult personality,
racial discrimination, conflicts and disagreements, and child-rearing practices;
educational scientists who wanted to study school achievement, teaching methods, intelligence,
teacher personality, home environment, etc.
Many of the above studies could not have been made through the normal way of merely collecting
the data and interpreting them, simply because they cannot be subjected to true experimentation.
Yet another example could be the study of the highest or average marks obtained in the paper on
Business Research Methodology by students in each area of specialisation, like marketing, finance,
operations, IT and HR.
A cross-sectional study could also be made to study the relationship among several variables
relating to an entity over a group of similar entities, like all automobile manufacturers. The variables
in this case could be sales, net profit, advertising expenses, etc. For example, one could study the
changes in ‘sales’ and ‘net profit’ and their interrelationship in several similar companies in a
particular sector, like cement, in a particular year.
The major advantage of cross-sectional research is that data can be collected on many entities of
different kinds in a short span of time.
Since the data is collected at one point of time, it can be collected easily and at lower cost. For
example, if we want to study the changes in share prices of some selected companies due to the
announcement of the budget, we may collect the data from just one newspaper of the next day. Similarly,
if we wish to compare business parameters like deposits, credit, etc. of commercial banks for the
year 2009-10 over the year 2008-09, we may refer to just one annual publication of Reserve Bank
of India.
Thus, it may be noted that the cross-sectional data may give the position at one point of time,
during the same period of time or it may indicate the change at one point of time over the previous
day/week/year.
The main advantage of a cross-sectional study is that it is cheaper and faster to conduct.
However, the main disadvantage of such a study is that it reveals little as to how the
changes occur.
Longitudinal studies are quite popular in social and behavioural sciences, socio-economic research,
banking and finance, etc. These are conducted over a period of time. Such studies can be made to
study changes in an individual or a group of individuals, or a country or a group of countries, over
a period of time. Following are some more examples of longitudinal studies:
Expenditure pattern over a period of time of an individual or a group of individuals
‘Quality of Life’ parameters of a state or a country
NPAs (Non-performing Assets) of an individual bank or the banking industry
R&D Expenditure by a sector of companies like pharmaceuticals
Communication among people over a period of time in one or more regions/countries (postal,
telephonic, e-mail, etc.)
It may be noted that all variables studied under longitudinal studies are measured over a period
of time.
For example, in one of the opinion poll surveys conducted in a country during Presidential
elections, an agency selected the prospective voters for ascertaining their opinion with the help of
the listings in the telephone directory. The agency did not realise that the persons listed in
the telephone directory were not representative of the entire voting population. In fact, as could be
well imagined, the listed persons belonged to the affluent section of society, whose percentage
in the entire voting population could be small. Further, it is observed, in general, that the percentage
of affluent persons going for voting is lower than that of the other classes of voters. Because of
these factors, the prediction made by the agency about the poll results was totally off the mark. The
agency suffered a big setback in its reputation, and eventually had to stop its publication.
If due care is taken in selecting a representative sample from the population, the results obtained
will, generally, not only be more reliable and accurate but will also consume fewer resources
in terms of manpower, time, money, etc.
Sampling Frame: We ordinarily select a sample of units by selecting the numbers that identify them,
e.g. savings accounts at a bank’s branch. Generally, we first identify each unit of the population by
giving it a distinct number, from 1, 2, 3, …, N, where N is the population size.
Sampling With or Without Replacement: Let a population consist of N units. If a sample of size
n units is obtained by first selecting one of the N units, replacing it, then making a second selection
and replacing the unit before making a third selection, etc. until n selections are made, then the
sample is said to be selected with replacement. Since there are N possible results in each of the n
selections, the total number of possible ordered samples of size n is N^n. It is to be noted that a
unit could appear more than once in the sample.
If a sample of n units is obtained by first selecting one of the N units, and, without replacing it,
selecting one of the remaining (N – 1) units, and so on, so that at the nth selection there are
(N – n + 1) units to choose from, then we say that the sample has been selected without replacement.
It may also be obtained by the first method (i.e. with replacement) if the selections are continued
till n distinct (different) units are selected and all repetitions are ignored. In this case, the total
number of possible samples is NCn.
But these are unordered samples. Each of these NCn samples has n! ordered samples, and hence
the total number of all possible ordered samples is n! NCn.
Illustration 4.7:
Let a, b and c be three units in the population, and we want to select a sample of 2 units, i.e. N = 3 and
n = 2. The possible samples in two cases (i.e. with and without replacement) are given below:
(i) Sampling With Replacement
In this type of sampling, the total number of possible ordered samples = 3^2 = 9. Ignoring the order
of selection, the distinct samples of size 2 are:
aa ab ac bb bc cc
(ii) Sampling Without Replacement
In this type of sampling,
Total number of possible samples = 3C2 = 3!/(2!(3 – 2)!) = 6/((2)(1)) = 3
and all possible samples of size 2 are:
ab ac bc
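These counts can be verified by direct enumeration. A small Python check follows; note that, counting ordered selections, sampling with replacement gives N^n samples, of which the six listed above remain when the order of selection is ignored:

```python
from itertools import product, combinations, permutations
from math import comb, factorial

units = ['a', 'b', 'c']   # N = 3
n = 2                     # sample size

# With replacement, ordered: N**n samples (aa, ab, ba, ...)
with_repl = list(product(units, repeat=n))
print(len(with_repl))                       # 9 = 3**2

# Ignoring order, the distinct with-replacement samples of size 2:
distinct = {tuple(sorted(s)) for s in with_repl}
print(len(distinct))                        # 6: aa ab ac bb bc cc

# Without replacement, unordered: NCn samples
print(len(list(combinations(units, n))))    # 3 = 3C2

# Each unordered sample has n! orderings, giving n! * NCn ordered samples
print(len(list(permutations(units, n))))    # 6 = 2! * 3C2
```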
Now, we proceed to discuss some commonly used sampling schemes with their salient features.
These are:
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
These are described as follows.
First, we decide the number of digits of the random numbers to be used, which depends on the
population size N. If N is a one-digit number, then one-digit random numbers are used; if N is a
two-digit number, then two-digit random numbers are used, and so on. Examples of three-digit
random numbers are given in Section 4.9. Then n random numbers, each less than or equal to N,
are chosen from the random number table by starting, usually, from the first row of any one selected
column of the table and reading down that column. If the column is completely exhausted before
the required n random numbers are found, then we continue with the next column, and so on.
Step 3: Finally, the units corresponding to the selected n random numbers are listed separately
which constitute the sample and further required information is collected from them according to
the plan of the survey.
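The selection procedure can be sketched in Python, with `random.sample` standing in for the random number table; the population and sample sizes below are illustrative:

```python
import random

N, n = 500, 10                      # illustrative population and sample sizes
frame = list(range(1, N + 1))       # Step 1: the sampling frame, numbered 1..N

random.seed(42)                     # fixed seed so the sketch is reproducible
selected = random.sample(frame, n)  # Step 2: n distinct random numbers <= N

# Step 3: the units bearing these numbers constitute the sample
print(sorted(selected))
```

`random.sample` draws without replacement, so no unit can appear twice, matching the usual survey requirement.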
Estimation of the Population Mean
The estimate of the population mean is the sample mean, calculated as
x̄ = Σxi / n
where xi is the value of the characteristic of the ith unit in the sample (i = 1, 2, 3, …, n), and n
is the sample size.
The sample mean may change from sample to sample. The extent of variation in these sample
means is measured by a quantity called standard error (s.e.), and is equal to
Standard Error of x̄ = s(x̄) = (s/√n) √((N – n)/(N – 1))
where n is the sample size, N is the population size, and s is the s.d. of the population.
The ratio of sample size to population size, i.e. n/N, is called the sampling fraction.
If N is very large, say 10,000, and n is comparatively small, say 100, then (N – n)/(N – 1) is
9900/9999, which is approximately equal to 1 (and so is its square root), and therefore
s(x̄) = s/√n
Thus, if the population size is infinite or large, the s.d. of the sample mean, also called standard
error, is
s(x̄) = s/√n
but if the population size is finite, i.e. not large, then the s.d. of the sample mean is to be multiplied
by the term
√((N – n)/(N – 1))
For this reason, the above term is called the finite population correction.
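The two cases can be combined into one small function; the value s = 5 used in the demonstration is illustrative:

```python
from math import sqrt

def standard_error(s, n, N=None):
    """s.d. of the sample mean: s/sqrt(n), multiplied by the finite
    population correction sqrt((N - n)/(N - 1)) when N is finite."""
    se = s / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

# With N = 10,000 and n = 100, (N - n)/(N - 1) = 9900/9999 is nearly 1,
# so the corrected value barely differs from s/sqrt(n):
print(standard_error(5.0, 100))          # 0.5
print(standard_error(5.0, 100, 10000))   # about 0.4975
```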
4.9.7.1 Systematic Sampling In simple random sampling, the units in the sample are selected with
the help of the random number table. However, there is another method of sampling in which only the first
unit of the sample is selected with the help of the random number table, and the rest are selected
automatically according to a pre-determined pattern. The method is known as Systematic Sampling.
For instance, there are 50 students in a class, and each of them has the roll number from 1 to 50.
Suppose, we wish to select 10% i.e. 5 of the students for assessment of their views on the library
facilities, then, we may select one random number from single digit random numbers varying from
0 to 9, say 5. Thus, the first student in the sample would be the one with roll number 5. The
number of the next student would be obtained by adding 10 to the first number 5, i.e. 15, and so
on. Thus, the five students selected in the sample would be those with roll numbers 5, 15, 25, 35
and 45. If the first random number selected was 0, then the first student would have been the one
with roll number 10, and the subsequent students in the sample would have been 20, 30, 40
and 50. The detailed procedure is outlined below:
Selection Procedure
Step 1: In systematic sampling also, first the sampling frame is prepared. However, the units are
not just randomly identified like in simple random sampling, but they are first arranged in a fixed
order with the help of some information and according to the purpose in mind. For example, in the
illustration cited above, the roll number was used to arrange the students in a particular order.
Step 2: Depending on the sample size desired, the sampling interval is worked out. The interval
gives the difference between the serial numbers of successive units to be selected in the sample.
The sampling interval, say I, is the ratio N/n. If N/n is not an integer, its integral part could be
treated as I for the purpose of selecting the sample. Then a random number, say R, is selected
from the appropriate random number table such that 1 ≤ R ≤ I.
Step 3: The units corresponding to the serial number R, R + I, R + 2I, …, R + (n – 1) I would
constitute the sample.
The above method of sampling is known as Linear Systematic Sampling.
If N/n is not an integer, the sample size will vary, being either n or n + 1, depending on the ran-
dom number selected.
For example, if N = 10 and n = 3, then N/n = 10/3 = 3.33, and therefore I = 3 (the integral part of
3.33). Now, a number is to be selected from 1 to 3. Suppose the number is 3; then the sample to be
selected would be the units corresponding to 3, 6 and 9. However, if the random number selected
between 1 and 3 is 1, then the units in the sample would be the units corresponding to 1, 4, 7 and
10 i.e. a sample of size 4.
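The linear procedure can be sketched as follows; note how, for N = 10 and n = 3, the realised sample size depends on the random start, exactly as described above:

```python
def linear_systematic(N, n, R):
    """Linear systematic sample: interval I = integral part of N/n,
    random start R in 1..I, then every Ith unit up to N."""
    I = N // n
    return list(range(R, N + 1, I))

print(linear_systematic(10, 3, 3))   # [3, 6, 9]      -- sample of size 3
print(linear_systematic(10, 3, 1))   # [1, 4, 7, 10]  -- sample of size 4
```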
One way of avoiding the difficulty of varying sample size is to select the sample through Circular
Systematic Sampling, described below.
Select a random number R from 1 to N. Start at the unit corresponding to this number, and there-
after, select cyclically every Ith unit (I is the integer nearest to N/n) until n units are chosen for the
sample. The cyclic selection involves assigning the number N + 1 to the first unit in the sampling
frame, N + 2 to the second unit, and so on. This is done to continue sampling even when the nth
unit has been reached. For example, in the above illustration when the population size was 10 and
the sample size required was 3, I worked out to be 3. Now, if the random number selected between
1 and 10 is, say 6, then the numbers to be included in the sample would be 6, 9 and 12. However,
since the number of units in the population is 10, the unit corresponding to the number 12 would
be 2 (= 12 – 10), and this would be included in the sample. Thus, the sample would comprise the
units 2, 6 and 9. This procedure is known as Circular Systematic Sampling.
In the above example of 50 students in the class, suppose the sample size desired is 8 students.
Then the sampling interval I is equal to the integral part of 50/8 i.e. 6. Instead of taking a random
number between 1 and 6 as in linear systematic sampling, to start the circular systematic sampling,
one takes a random number between 1 and 50. Suppose it is 30. We start sampling with the unit
numbered 30, and get the subsequent numbers in the sample as 36 (30 + 6), 42 (36 + 6), 48 (42 + 6),
54 i.e. 4 (54 – 50), 10 (4 + 6), 16 (10 + 6) and 22 (16 + 6). Thus, the sample of eight students
would comprise those with numbers 4, 10, 16, 22, 30, 36, 42 and 48.
The arithmetic mean of the sample is an estimator of the population mean.
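The circular procedure can be sketched similarly; for N = 10, n = 3 and random start 6 it reproduces the sample {2, 6, 9} worked out above:

```python
def circular_systematic(N, n, R):
    """Circular systematic sample: start at R (1 <= R <= N) and take
    every Ith unit cyclically, I being the integer nearest to N/n;
    a number above N wraps around (unit N + k is unit k)."""
    I = round(N / n)
    sample, pos = [], R
    for _ in range(n):
        sample.append(pos)
        pos += I
        if pos > N:
            pos -= N
    return sample

print(sorted(circular_systematic(10, 3, 6)))   # [2, 6, 9]
print(sorted(circular_systematic(50, 8, 30)))  # 8 distinct students
```

Whatever the random start, the realised sample size is always n, which is the whole point of the circular variant.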
4.9.7.2 Stratified Sampling Stratified sampling involves classifying the population into a certain
number of non-overlapping homogeneous groups called strata, and then selecting samples independently
from each stratum (singular form of strata). For example, in India the entire population can be classified into
two strata as rural and urban; into three strata as ‘lower income group’, ‘middle income group’ and
‘higher income group’; or into three strata as ‘children’ (up to 18 years of age), ‘adults’ (more than
18 but up to 60 years of age) and ‘senior citizens’ (more than 60 years of age).
The strata should be as homogeneous as possible within each stratum, and as heterogeneous
as possible among various strata.
The main advantage of using stratified sampling is that it increases the efficiency (a concept
explained in Chapter 11) of the estimators of the population characteristics. The other advantages
are as follows:
(i) When estimates are required with given precision not only for the population as a whole but
also for the various strata, stratified sampling is used.
(ii) When an organisation has field offices in various zones in which the country may have been
divided for administrative purposes, it might be desirable to treat zones as strata for facilitating
the organisation of fieldwork.
(iii) When there are some periodical variations in the population, as at hill stations, religious places,
etc., stratification is useful for reducing the chances of getting bad samples.
Further, stratification is highly useful when the population is skewed e.g. income of individuals,
business at branches of a commercial bank, productivity of rice (yield per acre) in various parts of
the country, etc.
Selection Procedure
Step 1: First, the population of N units is divided into ‘k’ strata with the help of the prior knowledge,
intuition, etc. such that the strata are as homogeneous as possible. Let the number of units in the ith
stratum be Ni; i varying from 1 to k. Thus,
N = ΣNi
Research Design 4.29
Step 2: The total sample size n is allocated to each stratum in such a way as to provide an
estimate of the population mean with maximum precision for a given cost. Such an allocation is
referred to as the principle of Optimum Allocation.
But due to practical difficulties in arriving at these allocations, mostly Proportional Allocation
or sometimes Equal Allocation is used in practice. Let ni denote the sample size allocated to the ith
stratum, then
n = Σni
Step 3: After the strata and the sample sizes in each stratum are determined, the samples are drawn
independently from the various strata following any of the above-mentioned sampling schemes, viz.
Simple Random, Systematic, etc., whichever is most efficient and/or convenient for the stratum concerned.
One of the biggest advantages of stratification is that one can use different sampling schemes
in different strata.
Estimation of Population Mean
Let X̄i denote an unbiased estimator of the mean of the ith stratum; then an estimator of the
population mean is given by:
X̄st = ΣWiX̄i
where,
Wi = Ni/N
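A sketch of this estimator in Python; the stratum sizes and means below are made-up figures for illustration only:

```python
def stratified_mean(strata):
    """Population-mean estimate X_st = sum of W_i * xbar_i, where the
    weight W_i = N_i / N. `strata` is a list of (N_i, xbar_i) pairs."""
    N = sum(Ni for Ni, _ in strata)
    return sum((Ni / N) * xbar for Ni, xbar in strata)

# Hypothetical strata: sizes 500, 300 and 200 with sample means 12, 15, 20
print(stratified_mean([(500, 12.0), (300, 15.0), (200, 20.0)]))   # 14.5
```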
4.9.7.3 Cluster Sampling Sometimes, the entire population is grouped into clusters which comprise
a number of units each. For example, the entire rural area in the country or in a district could be divided
into villages – the villages constituting the clusters.
In cluster sampling, the sampling unit is a cluster. Thus, cluster sampling involves the formation of
suitable clusters of units, and then selecting a sample of clusters, treating them as units, by an
appropriate sampling scheme. It is to be noted that all the units of each selected cluster are enumerated.
The advantage of cluster sampling from the point of view of cost arises mainly due to the fact that
collection of data for nearby units is easier, faster, cheaper and more convenient than observing
units scattered over a wide area.
As an illustration, a bank could be divided into its branches—the branches constituting the clusters—
if the intended study is to exclude regional offices and the head office. For example, a bank wanted
to assess the computer training needs of its employees working in branches. There were three types
of branches—manually operated, partially computerised and totally computerised. Their numbers
were 560, 250 and 75. The bank decided to follow cluster sampling approach i.e. selecting samples
of branches of each type and then ascertaining the needs of all the employees at those branches
rather than taking samples of individual staff members from those posted at all the branches.
4.9.7.3.1 Multi-stage Sampling It is to be noted that if the number of units is fixed because of resource
constraints and the size of clusters is large, then only a limited number of clusters could be selected for
100% observations in those clusters. For example, suppose the number of units, which could be included
in the sample for observation, is 500. Further, assume that the size of clusters is about 100 each, and in all
there are 40 clusters. In such a case only 5 clusters could be selected for recording observations on each
unit in those clusters, and there will be no information available for the remaining 35 clusters.
It can be proved mathematically that for a given number of units, distributing the units over a
large number of clusters leads to greater precision than by taking a small number of clusters
and completely enumerating them.
In consonance with the above, a modification of cluster sampling called Two-Stage Sampling
or Sub-sampling was evolved. Such a scheme envisages first selecting clusters and then choosing
a specified number of units from each selected cluster. This implies that more clusters are included
in the sample, but instead of recording observations for each and every unit in these clusters, only
samples of units are selected.
The clusters which are selected randomly at the first stage are called the First Stage units, and
the units or the groups of units within clusters which are selected subsequently are called Second
Stage units. Such a scheme could be extended to three or more stages and is termed Multi-stage
sampling.
Two-stage sampling, as used in the above illustration relating to 40 clusters, could work like this:
first, random sampling is used to select, say, 10 clusters out of the 40 clusters. Then, from each
selected cluster, a sample of 50 units is selected at random.
Number of Clusters = 40, Number of units in each cluster = 100
Number of units in the population = 40 × 100 = 4000
Thus, in cluster sampling, a sample of 500 units was selected with all the 100 units from each
of the 5 clusters selected at random out of the 40 clusters. In Two-stage sampling, the number of
clusters was increased to 10 to get better representation of clusters, but the sample size from each
cluster was reduced from 100 to 50, thus maintaining the overall sample size at 500. This can also
be illustrated through the tabular presentation given below.
Cluster Sampling: Number of clusters selected = 5; number of units selected from each of the
5 clusters = 100; sample size = 500.
Two-stage Sampling: Number of clusters selected = 10; number of units selected from each of the
10 clusters = 50; sample size = 500.
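The two designs compared above can be sketched in a few lines, with `random.sample` doing the random selection at each stage:

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """First stage: select clusters at random.
    Second stage: select units at random within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(clusters, n_clusters)
    return [rng.sample(c, n_per_cluster) for c in chosen]

# 40 clusters of 100 units each, as in the illustration above
clusters = [list(range(c * 100 + 1, c * 100 + 101)) for c in range(40)]

two_stage = two_stage_sample(clusters, n_clusters=10, n_per_cluster=50, seed=1)
print(sum(len(s) for s in two_stage))    # 500 units from 10 clusters

# Cluster sampling is the special case where whole clusters are taken:
cluster = two_stage_sample(clusters, n_clusters=5, n_per_cluster=100, seed=1)
print(sum(len(s) for s in cluster))      # 500 units from 5 clusters
```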
Purposive Sampling
As the name implies, under this type of sampling, units of the population are selected deliberately,
according to their relevance to the purpose of the study and their representativeness. For example,
if one wants to assess the likely reaction of employees to certain new measures contemplated in an
organisation, it might be better to include in the sample those employees who are likely to influence
the thinking and actions of a vast majority of the employees. Incidentally, the sample size in such
cases is not fixed. We may terminate the sampling, i.e. the recording of information, when we feel
that no further information or suggestion is being obtained.
Quota Sampling
Such sampling is, sometimes, considered a type of purposive sampling. It is usually resorted to when
some quota for the number of units to be included in the sample is fixed. The quota is fixed due
to constraints on the availability of time and/or cost. Within the stipulated quota, one has to select
a sample which is representative of the entire population. For example, within the overall quota
of interviewing 100 persons for some opinion poll, one may contact some persons from various
categories like college students, housewives, shopkeepers, office-goers, daily wage earners, etc.
Similarly, in an organisation, one might include persons from all categories of staff cadre-wise as
well as function-wise, department-wise, etc.
Judgement Sampling
In this type of sampling, the selection of units to be included in the sample depends on the
judgement or assessment of the person(s) collecting the sample. The sample is selected based on their
judgement/assessment as to what would constitute a representative sample. This is specially useful
when the sample size is small, and if random sampling is adopted, then the units which are more
important and critical to the objective of the study might not get included in the sample.
For example, in a training institute, the teaching staff numbered 30. However, for urgent academic or
administrative matters, the Director used to get the opinion of one particular faculty member, as he
was known to have balanced views, did not belong to any group, and was frank enough to express
his views. Thus, the Director used to rely on a sample of size one.
Convenience Sampling
Such sampling is dictated by the needs of convenience rather than any other consideration.
For example, one may select some persons from a telephone directory for getting their opinion on
some issue, provided the views of those who own phones are relevant to the issue. For instance, their
views on TV programmes might be relevant, but their views on some party or person in a general
election may not be very relevant, as they represent only the relatively affluent class of people.
Similarly, one could select a sample of persons from the list of credit card holders.
Another example relates to opinion poll when one may find it easier to get the opinion of those
in the shops or restaurants or walking on pavement rather than going from house to house.
Snowball Sampling
Snowball sampling—also known as chain referral sampling—is considered a type of purposive
sampling. In such sampling, the sampling units are not fixed in advance but are decided as the
sampling proceeds. We may move to sample the units one after the other depending on the response
received from the previous units. If the units are human beings, one individual might refer to another
individual who, in turn, might refer to some other individuals. That is why it is called ‘chain
referral’ sampling. In this method, participants or informants with whom contact has already been
made use their influence/social networks to refer the researcher to other people who could potentially
participate in or contribute to the study. Snowball sampling is often used to find and recruit ‘hidden
populations’, that is, groups not easily accessible to researchers through other sampling strategies.
Inverse Sampling
In normal sampling, we take a sample of units and estimate the characteristics of the units in
the population. However, if the proportion of units of a certain type is very small, as with fake notes
in circulation, then the method may not work. For instance, if we do not find any fake note in a sample
of 1000 pieces examined at random, can we say that the proportion of fake notes is zero? In such
cases, inverse sampling could be used.
Inverse sampling is a method of sampling which requires that drawings of random samples shall
be continued until certain specified conditions dependent on the results of the earlier drawings have
been fulfilled, e.g. until a given number of units of specified type have been found.
Thus, referring to the above problem of estimating fake notes, we may continue taking samples
till we reach a certain number, say 10, of fake notes of, say, Rs. 100 denomination. Suppose we find
10 fake notes in a total sample of 10 lakh pieces; it implies that the chance of a note being fake is
10/10 lakh, i.e. 1 in 1 lakh. Thus, if the number of Rs. 100 notes in circulation is 100 crore, i.e.
10,000 lakh, then about 10,000 lakh × (1/1,00,000) = 10,000 notes are fake. This is only a
rough approximation, but better than a pure guess.
In the case of human beings, this method may be used to estimate the number of persons with
some rare characteristic like say, having 6 fingers on a palm, or some rare disease.
For illustrating the process of inverse sampling, a simplistic approach for estimating fake notes
is described as follows:
Starting from a day, say 1st January, a daily record could be kept of the number of notes (coming
back to the Reserve Bank after circulation) examined at random, and the number of fake notes
detected. When the number of fake notes reaches 5, this number 5 divided by the number of notes
examined would provide a quick estimate of the proportion of fake notes. Selection of number 5 is
arbitrary, and is based on the assumption that proportion of fake notes in the system is very small.
When the number of fake notes reaches 10, it will provide a better estimate. The estimate will keep
on improving, and might stabilise at certain level which could be considered a fair estimate of the
proportion of fake notes which have successfully passed through the system and have come back
to the Reserve Bank.
Of course, there are certain assumptions implicit in this approach, and it can be improved by
taking into consideration several other factors which are beyond the scope of this book.
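The record-keeping procedure above can be mimicked with a small simulation; the true fake-note proportion is of course unknown in practice, and is assumed here only to generate data:

```python
import random

def inverse_sample(true_p, stop_at=10, seed=None):
    """Examine simulated notes one by one until `stop_at` fakes are
    found; estimate the fake proportion as stop_at / notes examined."""
    rng = random.Random(seed)
    fakes = examined = 0
    while fakes < stop_at:
        examined += 1
        if rng.random() < true_p:   # this note happens to be fake
            fakes += 1
    return stop_at / examined

# With an assumed 1 fake note per lakh, the estimate should come out
# in the right order of magnitude:
print(inverse_sample(1e-5, stop_at=10, seed=7))
```

Unlike ordinary sampling, the number of notes examined is random here while the number of fakes is fixed, which is what makes the estimate usable even for very rare events.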
(Contd)
Systematic — Description: Beginning with a random start and following the sampling fraction (n/N),
selects every Ith element of the population, where I (the integral part of N/n) is the sampling
interval. Linear: a random number R is selected between 1 and I; the sample comprises R, R + I,
R + 2I, …. Circular: a random number R is selected between 1 and N; the elements selected are
R, R + I, … (here I is the integer nearest to N/n). Advantages: Simple to design; easier to use than
simple random sampling; less expensive than simple random sampling.
Stratified — Description: Divide the population into subpopulations or strata and use simple random
sampling for each stratum. Results may be weighted and combined. Advantages: Sample size can
be allocated to each stratum according to criteria of size, cost, etc.; increased statistical efficiency;
provides data to represent and analyse subgroups; enables use of different methods in different strata.
Cluster — Description: The population is divided into appropriate subgroups; some are randomly
selected for further study, either in full or in part (two-stage). Advantages: Provides an unbiased
estimate of population parameters if properly done; economically more efficient than simple random
sampling; lowest cost per sample, especially with geographic clusters; easy to do without a
population list.
For numbers greater than 50, we may divide the number by 50 and take the remainder. Thus, the
numbers are
49, 20, 8, 25, 6, 48, 40, 25, 42 and 30
Since one number 25 is repeated, we may take the first number in the third column i.e. 55 (first
two digits of 555). Dividing this number by 50, we get the remainder as 5. Thus, the sample of 10
students would be
5, 6, 8, 20, 25, 30, 40, 42, 48 and 49.
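The remainder rule can be written out explicitly; treating a remainder of 0 as 50 is an assumption, consistent with the earlier convention of reading a random 0 as the largest unit:

```python
def to_student_number(random_no, N=50):
    """Map a random number to the range 1..N by taking the remainder
    on division by N; a remainder of 0 is read as N itself (assumed
    convention, matching the earlier treatment of a random 0)."""
    r = random_no % N
    return r if r != 0 else N

print(to_student_number(55))   # 5, as in the example above
print(to_student_number(49))   # 49: numbers up to 50 are unchanged
print(to_student_number(100))  # 50: remainder 0 is read as 50
```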
4.10 SIMULATION
So far, we have discussed four methods of collecting data:
Observing e.g. model of a car, colour of a shirt, gender of a person, etc.
Recording e.g. income of an individual, salary offered to MBA students in campus placements,
sales turnover of a company, price of a stock, etc.
Soliciting (in writing) e.g. through questionnaire/schedule filled by a respondent
Experimenting and recording e.g. impact of a medicine on a patient, yield of a variety of rice
by using a particular fertiliser, marks obtained by students through examination, etc.
However, there is yet another method of collecting data, and it is through simulation.
In the context of BRM, simulation is used to generate data for carrying out the desired research
study without actually observing, recording, collecting or conducting an experiment.
For example, suppose we want to assess the likely impact of variation in ‘sale price’ per unit of an item on the profit of the company manufacturing that product. The traditional approach would be to fix a certain price and record the resulting sales, increase the price by 5% and record the sales again, then increase the price by 10% and record the sales once more. But this would be a time-consuming process. A simpler way is to set up a mathematical formula
relating ‘sale price’ (S.P.) to the profit. Let the formula be
Profit = n × S.P – n × C.P. – 200
where n is the number of units sold (say, 100), C.P. is the cost per unit, say Rs. 2, and 200 is the
fixed cost. Let S.P. be Rs. 10 per unit.
Using the above formula, we can see the impact of varying S.P. by increasing S.P. to 12, 15 and
18, in the following table:
S.P. Profit
10 600
12 800
15 1100
18 1400
It may be noted that the above profit values have been ‘generated’ by using the given formula,
and help us to assess the impact of S.P. on the profit of the company. What we have described here
is a very simplistic situation just for illustration. In real life, simulation is more complex.
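The profit table above can be generated in a few lines of Python, using the same formula and the same values (n = 100, C.P. = Rs. 2, fixed cost = Rs. 200):

```python
def profit(sp, n=100, cp=2, fixed_cost=200):
    """Profit = n * S.P. - n * C.P. - fixed cost, as in the text's example."""
    return n * sp - n * cp - fixed_cost

# Reproduce the table of S.P. versus profit
for sp in (10, 12, 15, 18):
    print(sp, profit(sp))
```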
Incidentally, in the above case we also get an idea of how sensitive the profit is to changes in S.P. This is called ‘Sensitivity Analysis’. However, there is a difference between sensitivity analysis and simulation. As we shall see later, in simulation the independent variables like S.P. are selected randomly rather than deterministically, as we have done.
Research Design 4.35
When we use the word simulation, we refer to any analytical method meant to imitate a real-life
system, especially when other analyses are too mathematically complex or too difficult to repro-
duce.
Simulation is defined by T. H. Taylor as “A numerical technique for conducting experiments on
a digital computer, which involves certain types of mathematical and logical relationships neces-
sary to describe the behaviour and structure of a complex real world system over extended periods
of time.”
Even though simulation is a very sophisticated technique, as indicated by its formal definitions
given above, some simple aspects of simulation are described in this section.
A Monte Carlo simulation typically involves the following steps:
Step 1: Create the model
Step 2: Generate random inputs
Step 3: Evaluate the model
Step 4: Run the simulation
Step 5: Analyse the results using histograms, summary statistics, confidence intervals
These steps can be explained by the following example:
Reliance Company wants to know how profitable it will be to market their new gadget, realising
there are many uncertainties associated with market size, expenses and revenue. It is planning to use
the Monte Carlo simulation to estimate profit and evaluate risk. The simulation exercise comprises
the following steps:
Step 1: Creating the Model
We use a top-down approach to create the sales forecast model, starting with:
Profit = Income – Expenses
Both income and expenses are uncertain parameters, but we aren't going to stop here, because one
of the purposes of developing a model is to try to break the problem down into more fundamental
quantities. Ideally, we want all the inputs to be independent. Does income depend on expenses? If
so, our model needs to take this into account somehow.
We assume that income comes solely from the number of sales (S) multiplied by the profit per
sale (P) resulting from an individual purchase of a gadget, so that
Income = S*P
The profit per sale takes into account the sale price, the initial cost to manufacture or purchase the
product wholesale, and other transaction fees (bank credit, shipping, etc.). For our purposes, we
assume that P may fluctuate between Rs. 2350 and Rs. 2650.
We could just leave the number of sales as one of the primary variables, but for this example, the
company generates sales through purchasing leads. The number of sales per month is the number
of leads per month (L) multiplied by the conversion rate (R) (the percentage of leads that result in
sales). So our final equation for income is:
Income = L*R*P
We assume the expenses to be a combination of fixed overhead (H) plus the total cost of the leads.
For this model, the cost of a single lead (C) varies between Rs. 10 and Rs. 40. Based upon some
market research, the Reliance Company expects the number of leads per month (L) to vary between
1200 and 1800. Thus, the final model for the Reliance Company sales forecast is:
Profit = L*R*P – (H + L*C)
It may be noted that H is also a part of the equation, but for illustration purpose it may be treated
as a constant. The inputs to the Monte Carlo simulation are just the uncertain parameters L, C, R
and P.
Normally, all the inputs in the model should be independent variables. In this case, L appears in both the income term and the expenses term, so income and expenses are not independent. However, for the sake of simplicity, we shall assume that L, R, P, H, and C are all independent.
Step 2: Generating Random Inputs
The key to Monte Carlo simulation is generating the set of random inputs. As with any modelling
and prediction method, the principle of "garbage in, garbage out" applies equally here.
Input Values
The table above uses "Min" and "Max" to indicate the uncertainty in L, C, R, and P. To generate
a random number between "Min" and "Max", we use the following formula in Excel (Replacing
"min" and "max" with cell references):
= min + RAND()*(max – min)
We can also use the Random Number Generation tool in Excel's Analysis ToolPak Add-In to generate
a bunch of static random numbers for a few distributions. However, in this example we make use
of Excel's RAND() formula so that every time the worksheet recalculates, a new random number
is generated.
Suppose we want to run n = 5000 evaluations of our model. Incidentally, this is a fairly moderate
number when it comes to Monte Carlo simulation.
A very convenient way to organise the data in Excel is to make a column for each variable as
shown in the following Excel snapshot.
Cell A2 contains the formula:
=Model!$F$14+RAND()*(Model!$G$14-Model!$F$14)
Note that the reference Model!$F$14 refers to the corresponding Min value for the variable L on
the Model worksheet, as shown in Figure 4.1.
To generate 5000 random numbers for L, one can simply copy the formula down 5000 rows. We
repeat the process for the other variables (except for H, which is constant).
Step 3: Evaluating the Model
Since our model is very simple, all we need to do, to evaluate the model for each run of the simula-
tion, is to put the equation in another column next to the inputs, as shown in Figure 4.1 (the Profit
column).
Cell G2 contains the formula:
=A2*C2*D2-(E2+A2*B2)
Step 4: Run the Simulation
We do not need to write a macro for this example in order to iteratively evaluate our model. We simply copy the formula for profit down 5000 rows, making sure that we use relative references in the formula.
Rerun the Simulation: F9
Although we still need to analyse the data, we have essentially completed a Monte Carlo simula-
tion. Because we have used the volatile RAND() formula, to re-run the simulation all we have to
do is recalculate the worksheet (F9 is the shortcut).
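Steps 1 to 4 of the spreadsheet exercise can also be sketched in Python. The ranges for L, C and P below come from the text; the range assumed for the conversion rate R and the value of the fixed overhead H are not given in this extract and are placeholders for illustration only:

```python
import random
import statistics

def simulate_profit(n_runs=5000, seed=42):
    """Monte Carlo sketch of Profit = L*R*P - (H + L*C).
    L, C and P ranges are from the text; R's range and H's value
    are assumed placeholders, not figures from the example."""
    rng = random.Random(seed)
    H = 20000                            # assumed fixed overhead (placeholder)
    profits = []
    for _ in range(n_runs):
        L = rng.uniform(1200, 1800)      # leads per month (from the text)
        C = rng.uniform(10, 40)          # cost per lead (from the text)
        P = rng.uniform(2350, 2650)      # profit per sale (from the text)
        R = rng.uniform(0.01, 0.03)      # conversion rate (assumed range)
        profits.append(L * R * P - (H + L * C))
    return profits

profits = simulate_profit()
print("mean profit  :", statistics.mean(profits))
print("std deviation:", statistics.stdev(profits))
print("P(profit > 0):", sum(p > 0 for p in profits) / len(profits))
```

Re-running the simulation here simply means calling `simulate_profit()` again with a different seed, the analogue of pressing F9 in Excel.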
Figure 4.1 Screen capture from the example sales forecast spreadsheet.
With the steps above, we have completed the actual simulation and ended up with a column of 5000 possible values (observations) for our single response variable, profit. The last step is to analyse the results. We create a histogram in Excel, a graphical method for visualising the results.
We can draw many conclusions from this histogram:
Profit will be positive most of the time.
The result of performing a Monte Carlo Simulation with 5,000 iterations across 5 variables is
shown below:
In order to analyse the characteristics of a project’s net present value (NPV), the cash flow components that are heavily impacted by uncertainty are modelled, mathematically reflecting their "random characteristics". These results are then combined in a histogram of NPV (i.e. the project’s probability distribution), and the average NPV of the potential investment, as well as its volatility and other sensitivities, is observed. This distribution allows, for example, for an estimate of the probability that the project has a net present value greater than zero (or any other value).
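The NPV analysis just described can be sketched as follows. The initial outlay, cash-flow ranges and discount rate below are hypothetical, chosen only to illustrate the mechanics:

```python
import random

def npv(cash_flows, rate):
    """Net present value of cash flows at t = 0, 1, 2, ... at a given discount rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def simulate_npv(n_runs=5000, rate=0.10, seed=0):
    """Draw uncertain yearly cash flows from assumed ranges and collect NPVs.
    The outlay of 1000 and the 200-500 yearly range are hypothetical."""
    rng = random.Random(seed)
    npvs = []
    for _ in range(n_runs):
        flows = [-1000] + [rng.uniform(200, 500) for _ in range(5)]
        npvs.append(npv(flows, rate))
    return npvs

npvs = simulate_npv()
print("P(NPV > 0):", sum(v > 0 for v in npvs) / len(npvs))
```

The final line estimates exactly the probability mentioned in the text: the share of simulated scenarios in which the project's NPV exceeds zero.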
Some other areas where simulation is used are:
(i) Assembly Line and Maintenance Scheduling
(ii) Bank Counters’ Allocation and Scheduling
(iii) City Bus Scheduling
(iv) Telephone Traffic Routing
(v) Consumer Behaviour Forecasting
(vi) Brand Selection and Sales Promotion
(vii) Warehouse Location
Instead of actually tossing a coin, and observing head (H) or tail (T), we can, for convenience,
simulate the experiment of tossing a coin by selecting a one-digit number from any column of the
table, reproduced here.
Thus, the ten numbers are
2, 0, 2, 0, 5, 6, 8, 9, 1 and 2
If a number is odd, it may be taken as H, and if it is even, it may be taken as T.
Thus, the simulated sequence of H and T is
T, T, T, T, H, T, T, H, H and T
This sequence may be taken as the simulated result of ten tosses of a coin.
Another example is to simulate the experiment of throwing a die and noting the number that turns up. We can select two-digit numbers from the first column of the above random number table. Thus, the numbers are
29, 06, 20, 09, 56, 66, 87, 94, 11 and 22
We divide each of these numbers by 6 and match the remainder 0 with number 1 on the die, remainder 1 with number 2 on the die, and so on; finally, the remainder 5 is matched with number 6 on the die. Thus, the sequence of numbers is
6, 1, 3, 4, 3, 1, 4, 5, 6 and 5
This sequence may be taken as the simulated result of the experiment of throwing a die.
The above two examples illustrate how a Random Number Table can help in simulating an
experiment without actually conducting it.
We can also generate random observations from any distribution or pattern which can be described
mathematically.
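As an illustrative sketch (not from the text), the two mappings above can be coded directly, so that any list of random digits or two-digit numbers, whether from a table or a generator, yields a simulated experiment:

```python
def digits_to_tosses(digits):
    """Map random digits to coin tosses: odd -> 'H', even -> 'T' (rule from the text)."""
    return ['H' if d % 2 == 1 else 'T' for d in digits]

def numbers_to_die(numbers):
    """Map two-digit random numbers to die faces: divide by 6 and take the
    remainder, with remainder 0 -> face 1, ..., remainder 5 -> face 6."""
    return [n % 6 + 1 for n in numbers]

print(digits_to_tosses([2, 0, 2, 0, 5, 6, 8, 9, 1, 2]))
print(numbers_to_die([29, 6, 20, 9, 56, 66, 87, 94, 11, 22]))
```

Running this on the digits and numbers from the two examples reproduces the simulated coin-toss and die-throw sequences given above.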
S U M M A RY
Descriptive
Explanatory/Causal and Association
The experimental designs are:
One-factor Experiments
Two-factor Experiments
Two-factor Experiments with Interaction
Latin Square Design
Factorial Design
Quasi-experimental Design
Ex Post Facto Designs
In addition to conventional designs and sampling schemes, two other types of designs are highly useful in a research environment and complete the picture of how research studies are conducted.
One type of study involves researching a particular phenomenon for a cross-section of entities, such as companies or institutions, at one point of time or over a period of time.
The other type of study relates to an environment where it is advisable to simulate or generate data for a futuristic scenario, and use it for managerial decisions rather than being limited to the available actual data, which might be irrelevant!
DISCUSSION QUESTIONS
EXERCISES
1. A research firm wants to conduct “A study of the effect of tips given by brokers to retail inves-
tors on stock investments”.
(a) Write two objectives of this study.
(b) Identify major variables of the study.
(c) Suggest appropriate design for the study giving justification.
2. A company wants to study the effect of managerial control on the company’s performance.
(a) Identify and classify the variables in the study.
(b) Identify major variables of the study.
(c) Suggest suitable design for the study.
Measurement Scales
5
When you can measure what you are speaking about
and express it in numbers, you know something about it;
but when you cannot express it in numbers, your
knowledge is of a meager and unsatisfactory kind.
—Lord Kelvin
1. Introduction
2. Qualitative and Quantitative Measures
3. Classification or Types of Measurement Scales
(a) Nominal
(b) Ordinal
(c) Interval
(d) Ratio
4. Properties of Scales
(a) Distinctive Classification
(b) Order
(c) Equal Distance
(d) Fixed Origin
5. Statistical Analysis Based on Scales
6. Characteristics or Goodness of Instruments/Measurement Scales
(a) Accuracy and Precision
(b) Reliability
(c) Validity
(d) Practicality
7. Errors in Measurements
(a) Researcher
(b) Implementer/Measurer/Interviewer
(c) Participants/Subjects/Respondents
(d) Tool/Instrument
(e) Circumstantial Error
8. Types of Scales or Scaling Techniques
9. Comparative Scaling Techniques
(a) Paired Comparison
(b) Rank Order
(c) Constant Sum
LEARNING OBJECTIVES
Providing a comprehensive understanding of the four types of measurement scales
Creating an awareness of measurement errors at the four hierarchical levels of a study
Enabling the reader to distinguish between comparative and non-comparative measurement scales, and their different sub-types
Equipping the reader with a basic tool-kit of scales for research
Relevance
Mr. Jitendar and Mr. Jay, students of a reputed B-School, joined a rapidly growing mobile
service provider company as summer interns. After completing the initial formalities, they
were asked to meet their corporate guide, Mr. Dixit. He accorded a warm welcome to them,
and explained the project assigned to them. It related to conducting a research study relating
to the penetration of the GPRS system in Indian Mobile Phone User Segment.
Mr. Dixit assured them that adequate resources would be provided to them, and the concerned
staff would provide them full co-operation and support. Mr. Dixit, however, smilingly told
them that the management had great expectations, and hoped they would enhance reputation
of their institute.
Jitendar and Jay started the project with great excitement. However, the excitement waned when they were asked to take up two additional projects that the company accorded a higher priority.
After completing the other two projects, when they reverted to the first project, they realised that there was hardly any time left for it. This resulted in a hurried design of the requisite questionnaire. The data was collected from 350 participants. After the data was coded and they started the analysis, they realised the blunder they had committed while using scales in the questionnaire. Most of the questions asked were purely qualitative, and the scales used were inappropriate for the quantitative analysis they were planning to carry out.
At this stage, they realised that they should have given more time and thought to defining measurements and scales in the questionnaire appropriate for the project. As a result, they could not fulfill the envisaged objectives.
After the presentation of the project report, Jitendar and Jay had mixed feelings. Though the
presentation was not impressive, they could salvage the situation as they had established good
reputation in the company, thanks to their two successful projects. But the lesson they learnt
from the failed project was much more than what they learnt from the successful ones.
5.1 INTRODUCTION
The characteristics of individuals and business entities vary from individual to individual and from
entity to entity.
In the case of human beings, there are certain physical and/or quantitative characteristics like
height, weight, complexion, etc., and there are certain abstract or qualitative characteristics like
intelligence, integrity, creativity, attitude, etc.
Like human beings, a business organisation also has physical characteristics, such as its employees, sales, offices, etc. Being physical in nature, these are easily measurable. However, there are certain abstract characteristics (known as constructs) like reputation, image of the entity, motivation, work culture, commitment, customer’s perception and trust.
All these perceptions and feelings of customers and employees are extremely important because
they help the company to stay afloat and grow. Therefore, it is essential for the companies to consider
the above constructs relating to employees and customers.
As mentioned in Chapter 2, constructs are abstract and concepts are components of construct that
are concrete, and, therefore, measurable.
Some concepts that are normally relevant for understanding the psychology of employees and
customers are:
Achievement
Aptitude
Attitude
Intelligence
Personality
It may be appreciated that even abstract characteristics or concepts such as those mentioned earlier have to be measured for their meaningful assessment. This is reflected in the quotation of Lord Kelvin given at the beginning of this chapter.
Accordingly, behavioural scientists have evolved the following instruments to measure the above
concepts:
Achievement Tests
Aptitude Tests
Attitude Scales
Intelligence Quotient (IQ) Tests
Personality Profiling
This chapter discusses the various types of measurement scales that have been evolved which
led to development of the above instruments. The advantages and limitations of the scales have also
been elaborated.
The variables associated with a study are classified in two basic categories viz.
Quantitative/Numeric/Metric
Qualitative/Categorical/Non-Metric
This forms the basis for the classification of the measurement scales. These have been elaborated
in the next section.
Distinctive Classification
A measure that can be used to classify objects or their characteristics into distinctive classes/categories
is said to have this property. This is a minimum requirement for any measure. For example, gender
classifies the individuals into two distinctive groups, males and females. The individuals may also
be classified on the basis of their occupation, like student, salaried, businessman, etc. Similarly, the
qualification of an individual could be used to classify individuals into various categories such as
undergraduate, postgraduate, professional, etc.
Order
A measure is said to have an order if the objects or their characteristics can be arranged in a mean-
ingful order. For example, marks of a student can be arranged in an ascending or a descending order.
As another example, a consumer may rank four telecom service providers on connectivity; the result
will be the order of companies as 1, 2, 3 and 4.
It may be noted that all quantitative measures have implied order. Qualitative measures may also
have order. The first example described earlier is a quantitative measure and the second is a qualita-
tive measure.
Equal Distance
If, for a measure, the difference between any two consecutive categories (generally termed as values
for numeric variables) of a measured attribute, are equal, then the measure is said to have equal
distance. For example, the time difference between 2.00 pm to 3.00 pm is same as the difference
between 3.00 pm and 4.00 pm i.e. 1 hour. Another example could be the temperature as a measure;
the difference between 40oC and 50oC is same as between 60oC and 70oC.
All numeric measures satisfy this property.
Fixed Origin
A measurement scale for measuring a characteristic is said to have a fixed origin if there is a mean-
ingful zero or ‘absence’ of the characteristic. Examples are: income of an individual, sales of a
company, etc. These scales have a meaningful zero or ‘absence’ of the characteristic; zero income
signifies no income or absence of income, and zero sales signifies no sales or absence of sales.
Having described these properties, we now discuss the four types of measurement scales in detail.
Nominal Scale
A qualitative scale without order is called a nominal scale. Objects on this scale can only be categorised; the scale does not satisfy the other three properties described above. It is termed ‘nominal’ because, though one may represent the categories using numbers, the numbers are just ‘nominal’ or namesake; they do not carry any value, order or meaning. For example, the colour of bikes is a nominal measure. The different possible answers to the question ‘Which colour would you prefer for a bike?’ could be blue, black, red, etc. One may number these colours as 1, 2, 3, 4 or as 100, 200, 300, 400, in any sequence, i.e. this scale neither has any specific order nor any value.
The nominal scale involves classification of measured objects into categories such as ’Yes’ or ‘No’, ‘Pass’ or ‘Fail’; type of population group, viz. metropolitan, urban, semi-urban; vehicle used for going to office, viz. bus, car, motorcycle, etc. Numeric values are assigned to these categories, and such numbers are used only for identifying individuals.
The data collected through a nominal measure scale is called nominal data.
The data obtained through a nominal scale is of a type that can be classified into categories or
groups, and given labels to describe them. Examples are: house number, telephone number, car
number, roll number (of a student), model (of a TV), etc.
Sometimes, instead of numbers, codes are used for classification like STD codes for cities, bar
codes for items in departmental stores, codes for various subjects in a university, codes for books
in a library, blood group of individuals, etc.
Ordinal
An ordinal scale is a scale that does not measure values of the characteristic(s) but indicates only the order or rank, like 1st, 2nd, 3rd, etc., as in a beauty competition. Even when objects or their characteristics are measured quantitatively, the scale converts them into ranks, like the ranks of students in a class.
As another definition, a qualitative scale with order is called an ordinal scale.
This scale possesses first two of the four properties of the scales, viz. the properties of distinctive
classification as well as order.
Rank as a measure is always considered ordinal. The difference between any two ranks is not necessarily equal; the difference between the first and second ranks does not connote the same differential as that between the second and third. For example, if in a class of students the highest mark is 95, the next 85 and the next 84, converting marks to ranks will lead to 1, 2 and 3. Incidentally, it may be noted that the difference in performance between the 1st and 2nd rankers is not the same as that between the 2nd and 3rd rankers. Thus, one can only conclude that the 1st ranker has performed better than the 2nd, and the 2nd better than the 3rd.
The data obtained using ordinal scale is termed as ordinal data.
Ordinal data is essentially the same as nominal data, except that there is now an order within the
groups into which the data is classified. However, as we are dealing with qualitative data, we are
unable to say by how much they differ from each other.
Some examples are:
Ratings of hotels, restaurants and movies. We can say a 5 star hotel is better than a 4 star hotel,
but we cannot say that a 4 star hotel is twice as good as a 2 star hotel
Class of travel in a train or an aeroplane
Grades of students in a class
Interval
A measurement scale whose successive values represent equal value or amount of the characteristic
that is being measured, and whose base value is not fixed, is called an interval scale.
This is a quantitative scale of measure without a fixed or true zero.
The data obtained from an interval scale is termed as interval data.
Interval data is quantitative data that can be measured on a numerical scale. However, the zero point
does not mean the absence of the characteristic being measured. Some examples are: temperature,
time, longitude, latitude, etc.
Ratio
Ratio scales are quantitative measures with fixed or true zero. Ratio scale has all the four properties
of scales that are described in the Section 5.3.1.
The data obtained from ratio scales are referred to as ratio data.
Ratio data is also quantitative data that can be measured on a numerical scale, but here the zero point is fixed and implies the absence of what is being measured. In fact, if a scale has all the features of an interval scale and there is a true zero point, it is called a ratio scale. For example, a weighing scale is a ratio scale. Some other examples are: height, life, price, length, sales, revenue, etc. In all these cases, zero implies absence of the characteristic.
The following table depicts which of the four properties each type of scale satisfies:
Type of Scale    Distinctive Classification    Order    Equal Distance    Fixed Origin
Nominal          Yes                           No       Not fixed         Not fixed
Ordinal          Yes                           Yes      Not fixed         Not fixed
Interval         Yes                           Yes      Fixed             Not fixed
Ratio            Yes                           Yes      Fixed             Fixed
It may be noted again that the ratio scale follows all the four properties of a scale.
Some more examples of the measurement scales/data are as follows:
twenty-fifth of an inch (1 inch ≈ 25 mm), while the ‘inch’ scale can measure precisely only up to one-eighth of an inch. Similarly, the Fahrenheit scale of measuring temperature is more precise than the Celsius scale. Incidentally, those using eye lenses wish that the precision of lenses could be finer than one-quarter of a dioptre, i.e. 0.25!
An analogy could be had with the examination conducted to measure the knowledge and under-
standing of the students as also to distinguish the students from one another. The marks scored out
of, say 100, would provide better accuracy and precision than simply grading the students A+, A,
B+, B and C. However, this grading system would provide more accuracy and precision than grading
the students merely as A, B and C.
(ii) Reliability
Reliability indicates the confidence one could have in the measurement obtained with a scale. It
tests how consistently a measuring instrument measures a given characteristic or concept i.e. if
the same object/characteristic/attitude, etc. is measured again and again, it would lead to about the
same conclusion.
However, it may be emphasised that reliability does not necessarily imply that the measuring instrument is also accurate. All it means is consistency in drawing conclusions. The following diagram explains this clearly:
The first diagram exhibits reliability as well as accuracy. In the second diagram, however, the
measuring instrument is consistent but not accurate.
(iii) Validity
The validity of a measuring instrument indicates the extent to which an instrument/scale tests or
measures what it is intended to measure. For example, if we intend to measure intelligence, the
instrument, say question paper, ought to be such that it results in measuring true intelligence; if the
paper tests only general knowledge, the instrument is not valid. As another example, if the reward
system for measuring performance of salesmen is based only on the sales figures, irrespective of
the territory, the reward system may not be valid, if the territories do impact sales.
Types of Validity
There are three types of validity. These are:
Content Validity
Criterion Validity
Construct Validity
A brief description of these is given below:
(a) Content Validity
It indicates the extent to which it provides adequate coverage of the issues that are under
study.
(b) Criterion Validity
These are of two types. One, called predictive validity, indicates the success of the measuring instrument in predicting. The other, called concurrent validity, is used to estimate the present status.
a particular brand of products, the interviewer may influence respondent by nodding or smil-
ing at his preferred brand, thus, introducing error. While analysing data, inappropriate coding,
wrong calculations, etc. can also introduce errors.
Tool/Instrument
The flaws in the instrument itself may cause errors. The flaws could be in the form of inap-
propriate words/language used to ask questions, ambiguous meanings, incorrect order of
questions, incorrectly designed questionnaire, not giving enough choices for respondents also
called response choice omission, etc. For example, if in the questionnaire it is asked “What is
your income?” without mentioning monthly/yearly individual/family gross/net income from
salary or total income from all sources, then each respondent might respond according to his
definition of income, thus, leading to error.
Poor printing, not providing enough space for answers, etc. are also considered as instrument
errors.
In addition to the above four levels, there is yet another source of error viz. ‘Circumstantial’.
There could be errors due to circumstances like presence of someone, while answering the ques-
tions, which could influence the answers. For example, if the participant of a survey is an employee,
then the presence of his/her boss or any senior may influence the responses, thus, adding error.
Another example of circumstantial error is the error that could crop in, if a participant is not taken
into confidence or not assured anonymity as he/she might give guarded responses with caution,
suppressing the real responses.
All these errors need to be controlled or neutralised by using an appropriate design, understanding the subject under consideration in depth, training the interviewer, simplifying the tool, etc.
punctuality, food, flying returns programme, etc. The consumer has to assign rank 1 to the most
preferred factor and the last rank to the least preferred factor. A sample question is indicated in the
box as follows:
Please rank the following factors in order of importance to you, while choosing an airline. As-
sign Rank 1 to most important and Rank 5 to least important factor. Do not repeat the ranks.
Factor Rank
Price
Punctuality
Food
Flying Returns Programme
Seat Comfort
The response will be indicative of the relative importance attached to different factors. Such com-
parative scales generally use ordinal scale and are interpreted only in relative terms, and as such
they generate non-metric/non-numerical data.
Please rate the following factors according to the importance attached by you while choosing an airline. Indicate your rating by a tick against the appropriate value. A rating of 1 corresponds to the least important and 7* to the most important factor.
Factor 1–Least Important 2 3 4 5 6 7*–Most Important
Price
Punctuality
Food
Flying Returns Pro-
gramme
Seat Comfort
* rating could be on any scale. In this illustration, we have used a 7-point scale.
The data generated through non-comparative scaling technique is usually in interval scale. The
advantage of non-comparative scales is that they can be continuous, metric/numeric variables. This
allows wide avenues for analysis as compared to non-numeric variables. This advantage makes it
the most preferred scaling technique.
Paired Comparison
In paired comparison scales, the respondent is asked to select one object from a pair of two objects on the basis of some criterion. This forces the respondent to compulsorily select one of the two. Such scales are used when the study requires distinguishing between the two specified objects.
An example is given below:
In a study of consumer preferences about two brands of Glucose biscuits viz. Parle-G and Tiger
Glucose, the following paired comparisons were solicited on three characteristics.
Select only one of the two brands.
Which Glucose biscuits do you prefer on the basis of ‘TASTE’?
Parle – G Tiger
Which Glucose biscuits do you prefer on the basis of ‘PRICE’?
Parle – G Tiger
Which Glucose biscuits do you prefer on the basis of ‘PACKAGING’?
Parle – G Tiger
It may be noted that the data generated through this scaling technique is ordinal in nature.
This scaling technique is useful when the researcher wants to compare two or more objects. It may be noted that in the above example we have compared two brands on three factors; hence, the number of comparisons is three. If the number of objects to be compared is larger, this can lead to too many comparisons, requiring more time on the part of the respondent.
In general, if there are n objects to be compared on k factors, the total number of comparisons is
k × n × (n – 1)/2
which can be a high number for large values of n and k. Hence, such scales are generally used only when the number of objects to be compared is small.
Measurement Scales 5.15
For example, in the above case, if instead of two brands one considers five brands to be ranked on three factors, the total number of paired comparisons to be given by the respondent is
3 × 5 × (5 − 1) / 2 = 30
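Although the text does not include software, the comparison count can be sketched in Python (the function name is ours, not from the text):

```python
# Total pairwise comparisons a respondent must make when n objects are
# compared pairwise on each of k factors: k * n * (n - 1) / 2.

def paired_comparisons(n_objects: int, n_factors: int) -> int:
    """Total pairwise comparisons for n_objects rated on n_factors."""
    return n_factors * n_objects * (n_objects - 1) // 2

# The two worked examples from the text:
print(paired_comparisons(2, 3))   # 2 brands, 3 factors -> 3
print(paired_comparisons(5, 3))   # 5 brands, 3 factors -> 30
```

The quadratic growth in n is what makes paired comparison impractical beyond a handful of objects.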
In general, the following factors may be taken into account while using Paired Comparisons:
The method is most effective when actual choices in the real situation are always between two
objects
The researcher must be willing to forgo measurement of the distance between items in each
pair, as this scale is ordinal and not numeric
The number of objects to be compared may be limited to avoid making the respondent’s task
too difficult
Rank Order Scaling
Rank the following services in the order of importance attached by you, while selecting a new
mobile service provider. The most preferred can be ranked 1, the next as 2 and so on. The least
preferred will have the last rank. Do not repeat the ranks.
Feature Rank
1. Connectivity _____
2. Minimum Call Drops _____
3. Value Added Services _____
4. SMS _____
5. Roaming _____
6. Ring tone/Caller tune _____
7. Alerts _____
8. Downloads _____
9. Internet _____
A specimen filled-in response is shown below:
Feature Rank
1. Connectivity __1___
2. Minimum Call Drops __3___
3. Value Added Services __5___
4. SMS __4___
5. Roaming __2___
6. Ring tone/Caller tune __8___
7. Alerts __7___
8. Downloads __9___
9. Internet __6___
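Since the instructions forbid repeated ranks, a rank-order response such as the one above can be screened programmatically. A minimal sketch (function and variable names are ours):

```python
def is_valid_ranking(ranks, n_items):
    """True if ranks use each of 1..n_items exactly once (no ties, no gaps)."""
    return sorted(ranks) == list(range(1, n_items + 1))

# The filled-in mobile-service example above, in questionnaire order:
response = [1, 3, 5, 4, 2, 8, 7, 9, 6]
print(is_valid_ranking(response, 9))   # True -> usable response
print(is_valid_ranking([1, 1, 2], 3))  # False -> rank 1 repeated, discard
```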
Constant Sum Scaling
Allocate the amount you would like to spend on your birthday on the following items, out of a total amount of Rs 5000/- (please note that the total amount allocated should be exactly 5000).
Item Amount
1. Cosmetics _____
2. Clothes _____
3. Accessories _____
4. Jewellery _____
5. Dinner _____
6. Movie _____
______________________________________
Total 5000
Allocate the amount you would like to spend on your birthday on the following items, out of
total amount of Rs 5000/-
Item Amount
1. Cosmetics _____
2. Clothes 1000_
3. Accessories 1500_
4. Jewellery _____
5. Dinner 2000_
6. Movie _500_
______________________________________
Total 5000
The data generated from this scaling technique can sometimes be considered numeric. The amount assigned to each object in the list is purely numeric, but generalisation of the amounts beyond the listed objects is not possible. Due to this limitation, it is appropriate to treat the data as ordinal. The advantage of this method is that it can distinguish the respondent's preferences between objects in less time than the other comparative scaling methods.
The following factors may be taken into account while using Constant Sum Scaling:
The respondent may not allocate exactly the specified total amount; they may allocate either less or more. In such cases, the data may have to be discarded.
Only up to about 10 objects may be used in the list. Allocation over a large number of objects may confuse the respondent, leading to respondent error.
This cannot be used as a response strategy with children or uneducated people.
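The first caution above — discarding responses that do not add up to the specified total — is easy to automate. An illustrative sketch (names are ours, not from the text):

```python
def validate_constant_sum(allocation, total=5000):
    """True if the allocated amounts add up exactly to the required total."""
    return sum(allocation.values()) == total

# The filled-in birthday-spend example from the text:
response = {"Cosmetics": 0, "Clothes": 1000, "Accessories": 1500,
            "Jewellery": 0, "Dinner": 2000, "Movie": 500}
print(validate_constant_sum(response))           # True  -> keep the response
print(validate_constant_sum({"Clothes": 4800}))  # False -> discard
```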
Continuous Rating Scale
Rate the following criteria, while choosing an LCD TV, on the basis of the importance attached by you (mark × at an appropriate distance).
1. Price Most _______________________________________ Least
2. Picture Most _______________________________________ Least
Quality
3. Sound Most _______________________________________ Least
Quality
4. Service Most _______________________________________ Least
Rate following aspects, while choosing a LCD TV, on the basis of importance attached by you.
(mark × at appropriate distance)
1. Price Most _______________________________________ Least
100 90 80 70 60 50 40 30 20 10 0
2. Picture Most _______________________________________ Least
Quality 100 90 80 70 60 50 40 30 20 10 0
3. Sound Most _______________________________________ Least
Quality 100 90 80 70 60 50 40 30 20 10 0
4. Service Most _______________________________________ Least
100 90 80 70 60 50 40 30 20 10 0
Theoretically, an infinite number of ratings is possible if the respondents are qualified enough to understand and accurately differentiate the objects. The exact score can be obtained by measuring the distance of the mark from either end of the line.
The data generated from this scale can be treated as numeric, interval data.
The disadvantage of this scale is that it is more time consuming and difficult to edit, code and analyse compared to the other rating scales. Graphic scales are often used with children, since their limited vocabulary prevents the use of scales dominated by words.
The following factors may be taken into account while using a continuous rating scale:
This method is most applicable where evaluative responses are to be arrayed on a single di-
mension.
Scale extremes (both ends of the scale) may be labelled to define the dimension, and the labels used must be bipolar opposites.
In the vast majority of cases, the intermediate scale values should not be labelled with words; only numbers spaced at equal intervals may be used.
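Scoring a continuous rating scale amounts to converting the measured position of the respondent's mark into a number. A hedged sketch, assuming the mark's distance is measured in millimetres from the "Least" end of a printed line:

```python
def continuous_score(mark_position_mm, line_length_mm=100.0):
    """Convert the distance of the mark from the 'Least' end of the line
    into a 0-100 score. Line length defaults to 100 mm (an assumption)."""
    if not 0 <= mark_position_mm <= line_length_mm:
        raise ValueError("mark lies outside the printed line")
    return 100.0 * mark_position_mm / line_length_mm

print(continuous_score(73.0))         # mark 73 mm along a 100 mm line -> 73.0
print(continuous_score(40.0, 80.0))   # mark 40 mm along an 80 mm line -> 50.0
```

Because the score is a proportion of the line length, the resulting data is interval, as the text notes.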
Itemised Rating Scales
The respondents are asked to select one of the categories that best describes the product, brand, company or any other attribute being studied.
The commonly used itemised rating scales are:
Likert Scale
Semantic Differential Scale
Stapel Scales
Likert Scale
The Likert Scale is the most frequently used variation of the summated rating scale, commonly used in studies relating to attitudes and perceptions.
Summated rating scales comprise statements that express either a favourable or an unfavourable attitude toward the object of interest, rated on a 5-point, 7-point or other numerical scale. The respondents are given a list of statements and asked to agree or disagree with each statement by marking the numerical value that best fits their response. The scores may be summed up to measure the respondent's overall attitude. It is not necessary to sum up the scores; the scale can also be used in isolation, without summing up. The summing up may be misleading, especially if there are statements designed to avoid leanings towards either side; the summed-up score in such cases does not reflect the actual attitude towards the objects.
The following illustration relates to a retail store:
Please rate the statements given below. 1 – Disagree …. 5 – Agree
Statement (Disagree 1 … 5 Agree)
The ambience at this store is good. 1 2 3 4 5
This store has clean, attractive and convenient public areas (restrooms, trial rooms). 1 2 3 4 5
This store has merchandise available when the customers want it. 1 2 3 4 5
Employees in this store have the knowledge to answer customers' questions. 1 2 3 4 5
The behaviour of employees in this store instills confidence in customers. 1 2 3 4 5
Customers feel safe in their transactions with this store. 1 2 3 4 5
Employees in this store give prompt service to customers. 1 2 3 4 5
Employees in this store are too busy to respond to customers' requests. 1 2 3 4 5
This store gives customers individual attention. 1 2 3 4 5
This store willingly handles returns and exchanges. 1 2 3 4 5
Employees of this store are able to handle customer complaints directly and immediately. 1 2 3 4 5
This store offers high quality merchandise. 1 2 3 4 5
This store accepts most major credit cards. 1 2 3 4 5
The Likert Scale has several advantages that make it popular. It is relatively easy and quick to construct and compute. Further, it is more reliable and provides more data for a given amount of the respondent's time than other scales.
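Summing a Likert response is straightforward, but statements worded against the favourable direction (such as "Employees in this store are too busy to respond to customers' requests" in the questionnaire above, which is arguably negatively worded) should be reverse-scored first. An illustrative sketch; the abbreviated item names and the sample ratings are ours:

```python
def reverse_score(rating, scale_max=5):
    """Reverse a rating on a 1..scale_max Likert scale (5 -> 1, 4 -> 2, ...)."""
    return scale_max + 1 - rating

# One respondent's ratings for four of the store statements (hypothetical):
ratings = {"ambience": 4, "clean_areas": 5, "too_busy": 2, "prompt_service": 4}
negatively_worded = {"too_busy"}

total = sum(reverse_score(r) if item in negatively_worded else r
            for item, r in ratings.items())
print(total)   # 4 + 5 + (6 - 2) + 4 = 17
```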
Semantic Differential Scale
Rate the ATM you have just used in respect of the indicated parameters. Mark × at appropriate
location that best suits your answer.
The ATM was _____________ for operations
Easy :___:___:___:___:___:___:___: Difficult
The processing time was
Slow :___:___:___:___:___:___:___: Fast
The security person was
Cordial :___:___:___:___:___:___:___: Indifferent
The advantage of the Semantic Differential Scale is that it is versatile and multidimensional. It is widely used to compare the images of brands, products, services and companies.
The data generated from this scale can be considered numeric in some cases, and can be summed to arrive at total scores. If the items all reflect a single store, product, etc., the data is considered ordinal.
The following factors may be taken into account while using semantic differential scale:
Adjectives must define a single dimension, and each pair must be bipolar opposites labelling
the extremes.
Precisely what the respondent is to rate must be clearly stated in the introductory instructions.
Stapel Scales
These scales are named after Jan Stapel, who developed them. The Stapel scale is a unipolar rating scale with 10 categories numbered from –5 to +5, without a neutral or zero point. The respondents are asked to rate how accurately each term describes the object. A positive rating indicates that the term describes the object accurately, and a negative rating indicates that it describes the object inaccurately. Fewer response categories may also be used in certain cases. The scale is usually presented in vertical form, as against other scales, which are generally presented in horizontal form.
This scale is an alternative to the semantic differential scale, especially when it is difficult to find bipolar adjectives that match the question.
Example:
Rate the outlet on the following factors. +5 indicates that the factor is most accurate for you
and – 5 indicates that the factor is most inaccurate for you.
+5 +5 +5
+4 +4 +4
+3 +3 +3
+2 +2 +2
+1 +1 +1
Good Ambience Quality Products Excellent Service
-1 -1 -1
-2 -2 -2
-3 -3 -3
-4 -4 -4
-5 -5 -5
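Since Stapel ratings run from –5 to +5 with no zero, analysis typically starts by screening and averaging the ratings for each factor. A minimal sketch (names and sample data are ours, not from the text):

```python
def mean_stapel(ratings):
    """Average a list of Stapel ratings, rejecting 0 and out-of-range values."""
    for r in ratings:
        if r == 0 or not -5 <= r <= 5:
            raise ValueError(f"invalid Stapel rating: {r}")
    return sum(ratings) / len(ratings)

# Five respondents' ratings of 'Good Ambience' for the outlet (hypothetical):
print(mean_stapel([3, 4, -1, 5, 4]))   # (3 + 4 - 1 + 5 + 4) / 5 = 3.0
```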
Single and Multiple Category Scales
In the first type, there are multiple options for the respondent, but only one answer can be chosen. The scale used is called the multiple-choice, single-response scale.
In the second type, as a variation, the researcher may use a multiple-choice, multiple-response scale, also termed a Check List, wherein the respondent is given a list of choices and can choose more than one from the list.
Simple attitude scales are easy to develop, inexpensive, can be highly specific, and provide useful
information, if developed skillfully.
The following factors may be considered while using Single/Multiple category scale:
The category names may define a set of discrete alternatives, with clear distinctions in the minds of interviewers and/or respondents. The named categories should be mutually exclusive, so that a response does not fit into more than one category.
It may be ensured that the labelled alternatives capture the majority (about 90%) of the answers that are likely to be given by the respondents. As an abundant caution, an "Others" category may be listed at the end to include any answers that do not fit into the named categories.
The checklist scale allows the respondent to select one or several alternatives. It is simple to
understand and saves considerable time.
The following factors may be considered while using the Multiple Category Multiple Response or Checklist scale:
The instructions and response task are quick and simple, and many options can be included.
The scale yields data only in the form of discrete, nominal, dichotomous data.
The data gathered in single category scale is nominal.
Multiple category scale (single response) is either nominal or ordinal.
The checklist data is nominal, and each object in the list is coded as a separate variable of the 'Yes'/'No' type ('Yes' means the object was ticked by the respondent; 'No' means it was not ticked).
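The Yes/No coding described above can be sketched as follows (the option list is an illustration of ours, not an example from the text):

```python
def code_checklist(options, ticked):
    """Code a multiple-response checklist as one Yes/No variable per option."""
    return {option: ("Yes" if option in ticked else "No") for option in options}

options = ["Newspaper", "TV", "Radio", "Internet"]   # hypothetical checklist
print(code_checklist(options, ticked={"TV", "Internet"}))
# {'Newspaper': 'No', 'TV': 'Yes', 'Radio': 'No', 'Internet': 'Yes'}
```

Each option thus becomes a dichotomous nominal variable, matching the data properties listed above.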
The following factors may be considered while using the verbal frequency scale:
The scale is most appropriate when respondents are unable or unwilling to compute exact
frequencies.
This scale is used when only an approximation of frequency is desired.
This scale generates ordinal data.
Data Properties
The properties of the data generated by scales are among the important factors in deciding the scales. Each data type has certain properties and can be used for only selected types of analysis. For example, nominal and ordinal data are not amenable to numerical analysis, vide Section 5.4.
When the research design is planned, the corresponding analysis for each variable is also decided. This aspect should be taken into consideration while deciding the scales.
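The restriction on permissible analysis can be illustrated with the summary statistic appropriate to each data type (the sample data is ours):

```python
from statistics import mode, median, mean

# Nominal data: only counts and the mode are meaningful.
colours = ["red", "blue", "red", "green"]
print(mode(colours))                   # 'red'

# Ordinal data: order is meaningful, so the median can also be used.
ranks = [1, 3, 2, 5, 4]
print(median(ranks))                   # 3

# Interval/ratio data: arithmetic is meaningful, so the mean is valid.
temperatures = [21.0, 23.5, 22.0]
print(round(mean(temperatures), 2))    # 22.17
```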
Number of Dimensions
Measurement scales can be unidimensional or multidimensional. In a unidimensional scale, only one attribute of the participant or object is measured. For example, the ambience of a retail store can be measured by a single measure like layout, or as a combination of multiple measures in a single measure called 'store ambience'.
A multidimensional scale presumes that an object might be better described by several dimensions than by a single dimension. In the above example, the store's ambience can be defined on different dimensions like store design, store décor, friendly environment, lighting, layout, lobby area, etc.
Balanced and Unbalanced Rating Scales
An unbalanced rating scale has an unequal number of favourable and unfavourable response choices.
The scale does not allow participants who are unfavourable to express the intensity of their attitude as fully as respondents with a favourable viewpoint. An unbalanced rating scale is justified in studies where researchers know, in advance, that nearly all participants' responses will lean in one direction or the other.
When researchers know that one side of the scale is not likely to be used, they try to achieve
precision on the side that will receive the participant’s attention.
Indicate your level of agreement or disagreement with the following statements by placing a tick mark in the relevant grid (5 = Strongly agree, 4 = Agree, 3 = Neutral, 2 = Disagree, 1 = Strongly disagree).
Statement (Strongly Agree / Agree / Neutral / Disagree / Strongly Disagree)
Price is of core importance
Opinions of family and friends are not important
Advertisements accurately depict product features
Consumers are developing mental blocks to advertisements
I try most of the new products in the market when they are launched.
SUMMARY
There are two types of measures viz. qualitative and quantitative. Further, the four properties of
scales are classification, order, equal distance and fixed origin.
The characteristics of a good measurement scale are accuracy, precision, reliability, validity and
practicality.
Various types of measurement errors could creep in while conducting a study.
The four categories of measurement scales/data are nominal, ordinal, interval and ratio.
The measurement scales are of two types, viz. conventional and unconventional. The conventional scales are further classified into two categories, viz. comparative scaling techniques and non-comparative scaling techniques. Almost all important conventional scales that are generally used in research studies are illustrated with suitable examples.
DISCUSSION QUESTIONS
1. Describe various aspects of quantitative and qualitative measures associated with a research
study.
2. Discuss the four categories of measurement scales. Explain each of these scales with each of
the three hypothetical situations.
3. Describe the characteristics of various measurement scales with suitable examples.
4. Describe various sources of errors in measurement as also ways to avoid or minimise such er-
rors.
5. Discuss with examples the three comparative scaling techniques.
6. Describe the relevance and applications of Likert scale with an application.
7. Discuss the guidelines for deciding the use of scales in research studies.
8. Write short notes on:
(i) Statistical analysis based on various scales
(ii) Continuous Rating Scales
(iii) Semantic differential Scale
(iv) Stapel Scale
Primary and Secondary
Data and Their Sources
6
Contents
1. Introduction
2. Primary and Secondary Data
(a) Advantages and Limitations of Primary and Secondary Data
3. Primary Data Sources
(a) Surveys
(b) Questionnaire
(c) Observation
(d) Interview – Structured, Semi-structured and Unstructured
(i) Focus Groups
(ii) Projective Techniques
4. Secondary Data Sources
(a) Publications
(b) Projects and Research Reports
(c) ERP/Data Warehouses and Mining
(d) Internet/Web
(i) Some of the Important Websites
(ii) Searching Databases/Web pages
LEARNING OBJECTIVES
To explain primary and secondary types of data with respective advantages and limitations
To acquaint the reader with the various sources of primary and secondary data and the various methods to collect such data
To guide web-based searches
Relevance
Mr. Anil, a senior consultant at ABC Consultant Ltd., had gone to meet one of its clients
Mr. Arjun, the owner of a reputed luxury hotel chain, at his Nariman Point Building. Mr.
Arjun was working on his new ambitious project of setting up a hotel with a budget of about
Rs. 350 crore. He was looking for a firm that had a previous experience of research in the hotel
business. ABC Consultant, having already worked for three reputed hotel chains successfully,
was the obvious choice.
The meeting was arranged to discuss the detailed research design prepared by the firm and
also to discuss the different sources for collecting the data required for the research. The firm,
in its research proposal had mentioned about the requirement of considering qualitative as well
as quantitative methods of data collection. The qualitative method included observations, semi-
structured or unstructured interviews; and the quantitative method included questionnaires and
structured interviews. Since the study also required some economic parameters to assess the
feasibility of the project, it was also felt necessary to consider some secondary data for the
analysis.
After fruitful discussions in the meeting, the different strategies of data collection were
finalised with the tentative dates of execution. Mr. Anil remarked in the end, “Most research-
ers find these matters trivial and do not invest enough time and thinking for such decisions,
but this casual attitude towards such decisions may prove to be fatal at the later part of the
research study.”
Mr. Arjun agreed with the statement and nodded with appreciation.
6.1 INTRODUCTION
Data is the raw material for almost all research studies. The type of data and methodology of its
collection vary according to the requirements of the study. It also depends upon the ‘unit of study’
i.e. an object, an individual, an entity, etc. and the type of data that may be required for the study
i.e. quantitative or qualitative, primary or secondary, etc. Incidentally, the concepts of primary and
secondary data are explained in the next section. Further, the decisions on type and methodology of
collection of data may also depend on the type of planned research design. This chapter deals with
various methods of data collection like observations, focus groups, interview, survey, etc. and the
various types of data sources like primary and secondary data sources.
The data relating to banking in India is collected by the Reserve Bank of India, and so it is
primary data for the Reserve Bank. However, the same data when published in the Reserve Bank
publications becomes secondary for the person or organisation that uses the data.
The most common examples of secondary data are the data collected from published sources like
newspapers, magazines/journals, books, reports, institutional publications like ‘Handbook of Statis-
tics on the Indian Economy’ published by Reserve Bank of India, Economic Survey and Economic
Census of Data, published by Government of India, etc. The data provided by the organisations on
their websites could be primary data collected by them but for the visitor to the website, the data
is secondary.
In subsequent sections, we shall discuss the advantages and limitations, and the different sources
of primary and secondary data.
As the data collected is mostly in the form of notes, recorded tapes, etc., it requires highly qualified and experienced people.
Most researchers follow a mixed approach, wherein initially the qualitative methods of data
collection are used by a researcher to understand the problem/issue under consideration. Once the
insights are obtained using qualitative methods, these could be included to prepare a comprehensive
questionnaire and then surveys can be conducted. Most market researchers use this approach to
understand and analyse the consumer preferences.
We shall discuss some of the quantitative as well as qualitative methods of data collection in
the following sections.
6.3.1 Surveys
Survey is a method of data collection, usually on a large scale. This is a structured method of col-
lection of data. A survey, generally, has a fixed questionnaire containing a set of specific questions
that are close-ended, and the responses are analysed statistically.
A survey can be administered using the following methods:
Personally
Telephonic
Mail
Electronic Media
These are described as follows:
6.3.1.1 Personally Administered Survey or Structured Interview The set of designed questions is personally asked by the researcher or interviewer. In this method, either the questionnaire is handed over personally and taken back after completion by the respondent, or he/she is asked the questions orally and the responses are noted down by the interviewer. The first method is easier for the researcher and takes less time, and many questionnaires can be filled in a limited time by different respondents. In fact, the presence of the researcher is only to clarify any doubts that might be raised by the respondent while answering the questions. The disadvantage is that the respondent may take a longer time, because of which one may not find enough persons to fill the questionnaires.
In the second method, wherein the questions are asked orally and the researcher notes down the answers, the survey is easier for the respondent, who has only to speak. This method, also termed a Structured Interview, may be adopted in two ways. First, the interviewer may hand over the questionnaire to the respondent, and the respondent may speak out the answers, which are noted by the interviewer. Second, the interviewer may hold the questionnaire, ask the questions orally and also note down the respondent's answers.
The disadvantage of personally administered survey is that it requires personal involvement,
on the part of the researcher, to conduct the process. This becomes a limitation if the samples are
distributed over large geographic area or the sample size is very large.
6.3.1.2 Telephonic Survey Often, it is not possible to personally conduct the survey for each unit in the sample. This drawback is overcome by conducting a telephonic survey, in which the data is collected through telephonic interaction. The questions are asked over the phone by professional callers, who note down the responses. The advantage of this method is that it can cover a larger geographic area than personally administered surveys. It takes less time for the interviewer and can be convenient for the respondents too. The high telephone penetration in
India has made this method more convenient. This method also gives a considerable cost advantage over the personally administered survey. The telephone responses may be entered immediately, saving the time and cost of data entry. The time spent in the telephone interview method is much less than in most other methods. If necessary, one can conduct a telephone survey within a day's time, which is not possible with other survey methods. Interviewer bias caused by the physical appearance, body language and actions of the interviewer is also reduced in a telephonic survey.
Major Limitations
The questionnaire should be specially designed for a telephonic survey. It should be precise, clear and short. If the questions are not understood by the respondent, it may create bias. Certain types of scaling questions, like rank-order questions, can make the interview difficult for the respondents.
The interviewer has to be given proper training to conduct the interview.
The respondent may not be willing to respond. One should respect the respondents' choice if they are not willing to co-operate. The conversion rate, i.e. the proportion of willing respondents, is much lower for a telephonic interview than for a personally administered survey.
The length of the survey questionnaire is very important in telephone survey. If the survey is
too long, the phone could be disconnected by the respondent even before completion of the
survey.
6.3.1.3 Mail Survey This method is used when either the respondents are geographically dispersed and too far to call, or the survey is too detailed/extensive to be conducted over the phone. In some cases, especially in rural areas, telephones may not be available. In such cases, the questionnaire is mailed to the respondents, and detailed instructions about filling up the questionnaire and sending it back are explained in the mail. This is also termed a Self-Administered Survey. The respondent has to voluntarily fill the questionnaire and send it back to the researcher. The postage is paid by the researcher.
Mail surveys are typically perceived as providing more anonymity than other communication
modes, including other methods for collecting data.
Limitations
The conversion rate is very low. Participants may not co-operate with a long and/or complex
questionnaire unless they perceive a personal benefit.
This could be more expensive than the electronic surveys, described later.
This method is time consuming, as there is no certainty as to when the response will be received back.
Due to lack of direct communication, if any questions are not understood by the respondent,
either the questions will be left unanswered or would be answered inappropriately.
6.3.1.4 Survey Using Electronic Media The self-administered surveys discussed earlier can also be sent through electronic media. This gives maximum reach; practically any location on the globe can be reached through this medium.
Electronic mail and the Internet are common in most countries, and the reach of Information Communication Technology (ICT) is increasing at an exponential rate. This format of data collection is the most efficient and cost-effective compared to the other formats. The questionnaire is sent through electronic media in two ways:
By sending a document file containing the questions through the e-mail services.
Using on-line survey services and making an on-line survey, and sending the link of the survey
to the respondents by e-mail.
In the first method, though electronic media is used, the process is not automated, and the compilation of data from different documents may still consume time and resources.
In the second method, the survey is made online. It is very convenient for the respondents to participate in an online survey: it takes minimum time to respond, with the use of popups, drop-down menus, checkboxes, etc.
It is also easy for the researcher, as most survey providers give a readymade data file in either Excel or SPSS format. This saves data-coding effort and also maximises accuracy, as the process is automated.
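Once a provider's data file is downloaded, analysis can begin immediately. The services mentioned export Excel or SPSS files; the sketch below assumes instead a plain CSV export with one row per respondent, and the column names and sample values are ours:

```python
import csv
import io

# A hypothetical CSV export: one row per respondent, 1-5 ratings per feature.
export = io.StringIO(
    "respondent,connectivity,roaming\n"
    "R1,5,4\n"
    "R2,3,2\n"
)

rows = list(csv.DictReader(export))
avg_connectivity = sum(int(r["connectivity"]) for r in rows) / len(rows)
print(avg_connectivity)   # (5 + 3) / 2 = 4.0
```

In practice one would read the provider's actual file (e.g. with a spreadsheet library), but the compilation step the text describes is the same.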
Some of the sites that provide these facilities are:
www.SurveyMonkey.com
www.surveypro.com
www.esurveyspro.com
www.surveygizmo.com/
Limitations
This method can give biased results, as the people who willingly respond to the survey may come from only one section of society, thus restricting the sample to that section. For example, most students are willing to respond to such surveys. If a consumer study is conducted through this medium, the majority of respondents will come from only one section of society, and thus the results of the study may not be valid in general.
The surveys are limited to educated people who have access to e-mail. In India, the majority of people still do not have this privilege, which again restricts the use of this medium. Especially if the study requires data from rural/semi-urban areas, this medium may not be very useful.
6.3.2 Observation
Observation is a qualitative method of data collection. In this technique, the information is captured
by observing objects, human behaviour, systems and processes, etc.
For example, in a study about a product, instead of asking consumers a fixed set of questions as in a questionnaire, one can appoint a person in a store to observe and analyse their behaviour. This could be a more expensive method than the questionnaire method, but the results obtained through it are more reliable. The respondent error discussed in Chapter 5 could be eliminated through this process. The observer generally does not interfere in the process, and to that extent, observer bias is also reduced. The observation technique may reveal some critical information that otherwise is not disclosed in the other forms of data collection. This method requires a qualified person for data collection. The most common use of this method is to assess a process, where a qualified person walks through the process, observes it critically and notes down observations about it.
Advantages
The data collected through this method is original, first hand, accurate and authentic.
One may capture information that the participant might ignore if asked in a questionnaire, assuming the information is too trivial or not important for the study.
Certain types of information, obtained by assessing a process, procedure or event, can be collected only through this method.
This method causes the least intrusion for the participants.
Limitations
This method requires physical presence of a qualified observer.
The process is very slow, and the researcher may have to spend considerable time before ac-
quiring any valuable information.
The process is also more expensive than other methods.
The information captured is solely dependent on the skills of the observer, and is subjective.
The quantitative analysis of the information is generally not possible.
It is difficult to find the logic or rationale behind the behaviour from this method. The method only assesses the behaviour and not the rationale behind it. This enables the researcher to conclude about the results, but not about the reasons behind them.
It may be noted that the selection of the observer is very critical to the success of this method.
We shall, therefore, discuss some of the qualities desired in an observer.
6.3.3 Interview
Interviewing is the most commonly used method of data collection. Interviews can be of three types:
Structured Interviews
Semi-Structured Interviews
Unstructured Interviews
Structured interviews are a quantitative method of data collection, whereas semi-structured and unstructured interviews are qualitative methods of data collection.
Structured Interviews
A survey can be conducted by using a structured interview. This method is generally termed as per-
sonally administered survey. We have discussed this method in Section 6.3.1.1. This is a quantitative
method of data collection and the analysis of the data collected is generally quantitative.
Semi-Structured Interviews
This method is used when the researcher asks the respondent some basic questions, and then lets the respondent answer, intervening whenever necessary. In this method, the interviewer sets some
6.10 Business Research Methodology
guidelines for the questions to be asked. The succeeding questions are generally framed on the basis of the responses to the preceding ones.
The most common example for semi-structured interview is ‘Job Interview’.
Unstructured Interviews
Unstructured interviews are those that allow the interviewer to get opinions and a feel of the general attitudes of the respondents. Respondents are exposed to a lesser degree of bias, and based on their inputs and explanations, a researcher can get deeper insights. This can help researchers interpret the respondents' answers better than in structured questions. Unstructured interviews are generally used in exploratory research, and are generally time consuming. The biggest challenge of this method is that the data generated is in an unstructured format, making quantitative analysis difficult. The coding of such data has to be extensive to allow methodical analysis, and needs expertise. Though there are specialised software tools like Computer Assisted Qualitative Data Analysis Software (CAQDAS) available for this purpose, their scope is limited.
Adequate attention has to be paid while interviewing respondents who may not be very adept at
expressing themselves.
6.3.3.1 Focus Groups This is yet another variation of the interviewing technique. A focus group is a small, selected group of participants who are interviewed by a trained researcher. The participants are from a target research audience whose opinion is of interest to the researcher and the client. The discussions are generally in the form of an exchange of experiences, opinions and ideas on a specific topic.
The researcher generally guides the discussion in a direction that will lead the participants to opine on the relevant issue. Such interactions allow a free and open discussion, wherein the researcher might pick up a varied line of thought that was not previously perceived and could be of advantage.
The selection of a focus group has to be given due importance. Smaller groups are preferred to achieve natural and well co-ordinated discussions. The participants are to be selected, as far as possible, from a similar economic, social and cultural background. This minimises any conflict that could arise within the group, and contributes towards achieving the set objectives.
The researcher’s skills are very important in keeping the discussions alive and smooth, without getting entangled in any controversy or bias. In fact, he/she has to play the role of a moderator, and thus should have the ability to put the participants at ease. His/her timely intervention and probing to get more information in a congenial atmosphere are very important. The researcher should, therefore, have a fair knowledge of the topic to be discussed, and should be able to understand and effectively utilise the group dynamics or group behaviour.
In the context of market research, this method is used to elicit participants' perceptions, opinions, beliefs and attitudes towards a product, service, concept, advertisement, idea or packaging. In particular, focus groups allow companies wishing to develop, package, name or test-market a new product to discuss, view and/or test the new product before it is made available to the public.
In fact, focus groups could provide more reliable information, and could be less expensive than
other forms of traditional marketing research.
Some other variants of focus groups are as follows:
Dual moderator focus group – One moderator ensures the session progresses smoothly, while
another ensures that all the topics are covered
Word Association: The respondents are asked to read a list of words and to recollect and indicate the first word that comes to their mind.
Pictures and Words Association: The respondents are given a number of words and pictures and are asked to choose those they associate with a brand or product and to explain their choice.
Sentence/Story Completion: Respondents are given an incomplete sentence, or a story, and asked to complete it.
Thematic Apperception Test (Cartoons or Empty Balloons): Respondents are required to give opinions of other people’s actions, feelings or attitudes. “Bubble” drawings or cartoon tests provide an opportunity to fill in the thought or speech bubbles of the characters depicted.
Expressive: Respondents are required to role-play, act, draw or paint a specific concept or situation.
Brand Mapping: Respondents are presented with different brands and are required to offer their perceptions with respect to certain criteria.
6.4.1 Publications
The material appearing in the print media can be labelled as a publication. Daily newspapers, magazines, encyclopaedias, textbooks, handbooks, reports, journals, etc. can be termed as publications.
These are also termed as reference material, and constitute widely available sources of data. Most researchers use these media at the literature review stage of a research study to get a wide perspective of the topic. The indexes, bibliographies, catalogues, etc. help in searching the topic in a systematic manner. Proper referencing should be given to identify the source appropriately. Some of the suggested publications are:
Government of India publications like Economic Surveys presented before the budget, budget
speeches, etc.
Publications of statutory bodies like the Reserve Bank of India, such as Annual Reports, Report on Currency and Finance, Handbook of Statistics on Indian Economy, Report on Trend and Progress of Banking in India, etc.
CMIE publications encompassing a wide range of economic parameters relating to the Indian
economy
Publications of
— Associations of segments of industries like Indian Banks Association
— Confederation of Indian Industry (CII)
— All-India Association of Industries (AIAI)
— Association of Chambers of Commerce and Industries in India
— National Association of Software & Service Companies (NASSCOM)
— Telecom Regulatory Authority of India (TRAI), etc.
Incidentally, the Government and the statutory bodies are generally considered to be more authentic sources of secondary data.
Some reputed consultants like Gartner Group, McKinsey, etc. also bring out reports containing
valuable data that could also be considered as secondary data sources.
Dun & Bradstreet (D&B) is a leading provider of international, including Indian, business information. HOOVERS, also a leading database company, provides data about companies at the global level.
6.4.4 Internet/Web
One of the best ways to collect any secondary data is to search on the web. Specific related words can be entered to activate the search process. The World Wide Web (WWW) is a collection of millions of web pages containing information on practically all topics. The current trend is to store publications in the web form. These may include directories, dictionaries, encyclopaedias, newspapers, journals, e-books, Government reports, etc. One may choose the relevant website from the vast list displayed on the screen. Some of the important websites are given in the next section.
The Internet is also responsible for a data explosion. The Internet is flooded with data, and the challenging task is to identify or locate the relevant or useful data from this huge amount. This task is generally not easy. Though there are intelligent search engines available, like Google, Yahoo, etc., even these yield thousands of options, and one needs to be patient to locate the exact data one is looking for. To get over these limitations, one can also use online databases like EBSCO, ECCH, CMIE, Capitaline, manupatra, legalpundits, etc. These are specialised databases, generally available on paid subscription. Searches in these databases can be easier and more specific than on websites. There are a few free online databases, like Wikipedia, which is a most useful site for researchers. Online journals also form a major source of secondary data.
There are also some journal database providers like Elsevier, Emerald, JSTOR, ProQuest, ScienceDirect, etc. These subscribe to different journals and provide combined access to their subscribers. This saves an individual subscriber the trouble of subscribing separately to the different journals. It also saves money, as one has to subscribe only to the journal database rather than to the individual journals.
6.4.4.1 Some of the Important Websites
Owner/Sponsored Site | Address | Description
Reserve Bank of India | www.rbi.org.in | Economic Data, Banking Data
Securities and Exchange Board of India | www.sebi.gov.in | Data
Bombay Stock Exchange | www.bseindia.com | Data
National Stock Exchange | www.nseindia.com | Data
Finance Ministry | www.finmin.nic.in | Data/Publications
Glossary of Statistics* | www.wikipedia.com | Definitions and Brief details*
Princeton University | www.dss.princeton.edu | Data and Statistical Services
Statistics Help Us | statsdirect.com | Data and Statistical Services (Paid Service)
The OECD's Online Library | http://caliban.sourceoecd.org | Statistical Databases, Books and Periodicals
Nationmaster | http://www.nationmaster.com/index.php | Sources such as the CIA World Factbook
India Stat | www.indiastat.com | Statistical data and useful information on India
Federation of Indian Chambers of Commerce and Industry | http://www.ficci.com | Commerce and Economic data
Iassist | iassistdata.org | Indian Publications
Bepress | www.bepress.com | Electronic journals (paid site)
International Telecommunication Union | http://www.itu.int/ | Publications and case studies
World Bank | www.worldbank.org | Data
World Health Organisation | Health topics: www.who.int/topics/en/; Countries: www.who.int/countries/en/; About WHO: www.who.int/about/en/; Publications: www.who.int/publications/en/ | Data
ISI (Indian Statistical Institute) | www.isical.ac.in/~library/ | Web library
Telecom Regulatory Authority of India | http://www.trai.gov.in | Data on telephone network
Centre for Monitoring Indian Economy | Industry Analysis Service: www.cmie.com/.../industry-analysis-service.htm; Economic Intelligence Service: www.cmie.com/database/?service=database-p...; Products: www.cmie.com/database/?service=database-products.htm |
Euromonitor | Euromonitor.com | International market intelligence on industries, countries and consumers
India Infoline | http://www.indiainfoline.com/ | Information on stocks and other financial services
Modify the query if it fails to give the desired results, or if it gives too many or too few results.
It is always advisable to save the results at an appropriate location with detailed references, so that the researcher can trace them to the source and give appropriate references at the end of the report.
One could also repeat this process on the websites after searching the databases.
S U M M A RY
Both primary and secondary sources of data have their advantages and limitations.
The two basic methods of data collection, viz. qualitative and quantitative, can be differentiated by their respective advantages and limitations.
The following are the methods of conducting a survey:
Personally Administered Survey
Telephonic Survey
Mail Survey
Survey Using Electronic Media
The other methods of collecting primary data are observation, interview, projective techniques
and focus groups. There are guidelines to collect the data from various sources such as:
Publications
Project and Research Reports
ERP/Data Warehouses and Mining
Internet/Web
DISCUSSION QUESTIONS
LEARNING OBJECTIVES
To explain the process of collecting primary and secondary data
To provide requisite knowledge about various aspects associated with designing a questionnaire
for collection of primary data
To outline the steps for preparation of data
To provide guidance in collecting/recording data from secondary sources
Relevance
The XYZ Consultants had bagged a research project from one of their clients, a leading player in the telecom sector. The project related to assessing the impact of the Mobile Number Portability to be introduced by TRAI in the Indian telecom sector.
Mr. Uday, the chief consultant, met the client’s representatives to discuss various strategies for collection of data, and to finalise the questionnaire.
The discussion was important as the study was to be conducted across a wide geographical area, covering a wide range of telecom users. The span of the study was gigantic. It was to cover rural and urban areas, different states, users speaking different languages, etc.
There were many issues to be discussed in the meeting, such as:
What is going to be the unit, i.e. object, entity, system, etc., of the study?
How will the data be collected i.e. personally administered, telephonic, mail, or e-mail survey
or mix of these methods?
What questions are to be asked?
What should be the ideal length of the questionnaire?
What should be the sequencing of the questions?
How should the questions be framed?
Is there any requirement of multiple questionnaires?
What could be the advantages and limitations of having multiple questionnaires? etc.
After the fruitful discussion, the team jointly decided to go ahead with a mixed approach
of data collection. It was also decided to have multiple questionnaires, different for rural and
urban customers, and analyse them separately for the two groups.
7.1 INTRODUCTION
We have discussed in Chapter 6 the different sources of data, their respective advantages and limitations, and various methods of collection of data. In this chapter, we shall discuss the methodology of collecting data using various tools/instruments or schedules. We shall also discuss preparing the data for presentation and analysis.
This has been done separately for primary data and secondary data. For each of the methods, we shall discuss collecting, coding and preparing the data for subsequent presentation and analysis.
7.2.1 Questionnaire
Most research studies use questionnaires. Surveys use questionnaires to collect data in a systematic format. A questionnaire is a set of questions asked to individuals to obtain useful information about a given topic of interest. If properly constructed and responsibly administered, questionnaires can become a vital instrument by which inferences can be made about specific groups of people or entire populations. The advantages of a questionnaire are its ease of use, its ‘to the point’ answers, and the ease with which statistical analysis can be carried out. In this method of collecting data, a wide range of information can be collected from a large number of individuals.
The designing of the questionnaire forms the most important part of a survey. Adequate question-
naire design is critical to the success of the survey. Any errors at the design stage of the questionnaire
could prove to be fatal at a later stage. The questionnaire is at the centre of any quantitative research
study, and, therefore, due attention should be paid to design the questionnaire. Inappropriately asked
questions, inappropriate order of questions, format, scaling, etc. could generate errors in the research
study as the responses may not actually reflect the opinions of the respondents. Most of these errors
can be avoided by conducting a pilot study or pretesting the questionnaire. After the questionnaire
is constructed, it is tested on a set of people on various aspects like completeness, time taken to
answer, clarity of questions, etc. If executed appropriately, pretesting can reduce design errors that
otherwise could have made the research study worthless. The questionnaire is generally revised after incorporating the learnings from the pretesting.
It may be noted that even if the pretested questionnaire is retained as such in the final study, the responses obtained during pretesting are generally not included in the final analysis.
7.2.1.1 Questionnaire Modes of Responses Different methods by which the questionnaire can be administered have been discussed earlier. In this section, we shall discuss guidelines for designing the questionnaire for the different modes.
Telephonic Survey
In case of a telephonic survey, the interviewer interacts with the respondents, but is not personally present. The questionnaire is also not seen by the respondent. These form the limitations for collection of data using this method, and they restrict the questionnaire design. The questionnaire should not be too long, and should not have questions that take a long time on the phone. The instructions for answering questions should be simple, so that the respondent understands them easily. There are restrictions on using some of the scales, like fixed sum scales, diagrammatic scales, semantic differential scales, etc. The interviewer should be trained for the survey. There should be a mechanism to verify whether the survey really took place. The data can be entered directly in a form or an Excel sheet to save coding time.
Mail Survey
Mail surveys are self-administered. Though the questionnaire can be seen by the respondent, there is no interaction with the interviewer. This limitation should be considered while designing the survey. The questionnaire must have clear instructions, clearly stated questions, etc. Since the survey is self-administered, the questions should be designed so as to hold the respondent's interest till the end of the questionnaire. The questionnaire should not be so lengthy that it leads to non-response. The flow of the questions should follow a pattern: easy questions in the beginning, followed by difficult questions, followed by easy questions at the end. This ensures maximum responses. The pattern assumes that the concentration level of an individual is low in the beginning, increases over time and reduces after some time. The time spent in answering questions is a very important factor for these surveys: the lesser the time spent, the more will be the responses. This factor is primarily assessed during the pretesting of the questionnaire.
E-mail Survey
As discussed earlier, a survey through electronic media can be done in two ways: sending the soft form through e-mail, or using online survey services. In the first method, i.e. sending the soft version of the form through e-mail, the questionnaire design is no different from that used in mail surveys, as only the medium of collecting data is different. In the latter case, the survey is constructed using HTML format and tools. This makes the questionnaire easy to construct, easy to answer (by clicking answers), and easy to code. But this method imposes some limitations on the types of questions. The online survey providers do not provide all types of scales, for example, the semantic differential scale, graphic rating scale, etc.
Open-ended questions are also difficult to answer and code. A sample e-mail survey is displayed at the end of the chapter as Sample Questionnaire 3.
7.2.1.2 Designing of a Questionnaire We have discussed in the previous section the considerations, specific to the method of survey, in designing a survey. In this section, we shall discuss some general guidelines for designing a questionnaire.
The questionnaire design often faces a trade-off between the depth of the study to be conducted and the likely receipt of responses. The greater the depth of the study, the fewer could be the responses received. The researcher should consider this aspect and design a questionnaire that strikes an adequate balance between these two factors. There are many other factors to be considered for improving the responses. These are discussed briefly in Section 7.2.1.3.
The questionnaire for a research study is designed based on the questions hierarchy discussed in Chapter 3. These questions form the building blocks of questionnaire design. The survey strategy is designed only after understanding the connection between investigative questions and possible measurement questions.
Some of the strategies are:
Deciding the type of scale needed
The method of data collection to be used
Collection and Preparation of Data 7.5
Wordings of Questions
This is the most important aspect of designing a questionnaire. Often, questions are not worded appropriately to get uniform or the desired responses. This may result in different respondents interpreting the same question differently, or the respondents interpreting the question in a way different from that meant by the designer of the questionnaire. This could result in response errors. The problem may arise due to different levels of vocabulary between the respondents and the questionnaire designer, or due to long, ambiguous questions. This error can be avoided by getting the questionnaire screened by others (other than the ones who designed it).
Types of Questions—Structured/Unstructured
The questionnaire generally contains more structured questions than unstructured ones. If the study requires extensive use of unstructured questions, the preferred method could be interviewing. Respondents generally avoid lengthy open-ended answers to unstructured questions.
There is always a trade-off between open-ended questions and the ease of analysis. The more the open-ended or unstructured questions, the more difficult the analysis using quantitative methods will be. The unstructured questions should be analysed separately to understand the pattern of responses. Such questions also consume more time of the respondents, and there should be an appropriate balance between these two objectives.
To maximise the response rate, the researcher may consider the following:
One needs to consider carefully how to administer the questionnaire
Establishing rapport goes a long way to increase response rate
Explaining the purpose and the importance of the survey helps
The researcher needs to remind those who have not responded
The length of the questionnaire should be appropriate
To obtain accurate, relevant and necessary information, a researcher may consider the following
aspects:
Asking precise questions
Using short and simple sentences is important for the respondent's understanding; only one piece of information should be asked for at a time
The questionnaire should be planned as to what questions be asked, and in what order, how
to ask them, etc. according to the objectives of the study
Sometimes, additional questions can be used to detect the consistency of the respondent's answers. For example, some respondents may tend to tick either "agree" or "disagree" to all the questions. Additional contradictory statements may be used to detect such tendencies
Selecting the respondents who have the requisite knowledge
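The use of contradictory statements described above can be automated once the responses are coded. A minimal Python sketch follows; the item names, the 1-5 Likert coding and the flagging threshold are all illustrative assumptions, not prescriptions from the text:

```python
# Sketch: detect respondents who tick "agree" (or "disagree") to everything,
# using a pair of deliberately contradictory (reversed) items.
# Item names, the 1-5 Likert coding and the threshold are illustrative.

def is_inconsistent(response, item, reversed_item, scale_max=5):
    """Flag a response whose answer to a reversed item does not mirror
    its answer to the original item."""
    a = response[item]
    b = response[reversed_item]
    expected_b = scale_max + 1 - a   # agreement with one statement implies
                                     # disagreement with its contradiction
    return abs(b - expected_b) >= 3  # large gap => inconsistent answering

responses = [
    {"id": 1, "service_good": 5, "service_poor": 1},  # consistent
    {"id": 2, "service_good": 5, "service_poor": 5},  # agreed to everything
]
flagged = [r["id"] for r in responses
           if is_inconsistent(r, "service_good", "service_poor")]
print(flagged)  # [2]
```

Flagged respondents are candidates for editing out, or for a follow-up check, rather than automatic rejection.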
7.2.1.4 Types of Response Questions We have discussed in Chapter 5 various scales and types
of questions that are appropriate for questionnaire design. The questions generally evolve from the ques-
tions’ hierarchy, the hypothesis developed and the analysis planned for the study discussed in Chapter 3.
Appropriate questions should be used keeping those aspects in mind.
7.2.2 Observation
The data collected through the observation method needs to be transformed into an appropriate form before the requisite analysis can be performed. The data collected using this method is generally in the form of notes by the observer, or sound/video clips recording the process. This makes it difficult to analyse directly. Some artificial intelligence software can be used for compiling such data.
7.2.3 Interview
If the interview is structured, the data collected is in a form similar to that of a questionnaire, and can be easily compiled. In case of unstructured interviews, elaborate compilation skills are required to analyse the data.
7.2.3.1 Focus Groups The moderator of the focus group collects the data, which may be in the form of notes, or audio or video recordings of the discussion. This makes it difficult to analyse directly. Some artificial intelligence software can be used for compiling such data. The extraction of information from the data requires relevant expertise for compilation and interpretation.
7.2.3.2 Projective Techniques Since these methods of data collection require psychological skills, the data needs to be collected by a skilled person. The compilation and analysis of the data also require relevant expertise. These methods are generally used by specialists, and the data is collected and compiled by experts.
Editing
The purpose of editing is to ensure consistency in the responses, and to locate omission(s) of any
response(s) as also to detect extreme responses, if any. Further, editing also checks legibility of all
the responses and seeks clarifications in the responses wherever felt necessary.
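The editing checks described above can be sketched in code. A minimal Python example follows; the field names and the 1-7 rating range are illustrative assumptions:

```python
# Sketch of editing: locate omitted responses and flag extreme ones for a
# second look. Field names and the 1-7 rating range are illustrative.

def edit_check(record, rating_fields, valid_range=(1, 7)):
    """Return a list of problems found in one response record."""
    problems = []
    lo, hi = valid_range
    for field in rating_fields:
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: omitted response")
        elif value in (lo, hi):
            # not an error as such, but extreme responses merit scrutiny
            problems.append(f"{field}: extreme response ({value})")
    return problems

record = {"q1": 7, "q2": None, "q3": 4}
print(edit_check(record, ["q1", "q2", "q3"]))
# ['q1: extreme response (7)', 'q2: omitted response']
```

Records with omissions would then be routed back for clarification, as described above.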
Coding
The process of converting responses into numeric symbols, called codes, is termed coding. The purposes of coding are:
Easy to input and store in computerised systems
Easily amenable for computerised processing and analysis
Easily amenable for sorting and tabulation
In fact, coding of data is decided according to the needs of inputting, storing, processing, sorting
and analysis of data.
Incidentally, the codes could be alphanumeric also but for using software packages like Excel
and SPSS, the codes should preferably be numeric.
Validation
After the data is coded, it is validated for data entry errors before being used for further analysis. The purpose of validating the data is to ensure that it has been entered as per the specifications in the prescribed format or questionnaire.
For example, if the respondent is asked to rate a particular aspect on a scale of 1 to 7, then the only valid responses are 1, 2, ….., or 7. Any other inputted number is not considered valid. In validation, the above data will be restricted to the integers between 1 and 7. This minimises the errors. Other validations include checking that age lies within a limit like 100, and that dates such as birth dates, joining dates, etc. are not future dates.
Incidentally, while editing is done after the receipt of the responses, validation is done after the coded responses have been inputted. Validation is especially used to reduce data entry errors.
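The validation rules above (a rating restricted to 1-7, age within a limit like 100, no future dates) can be sketched as follows; the record layout is an illustrative assumption:

```python
# Sketch of validation after data entry: restrict a rating to the integers
# 1..7, keep age within a plausible limit, and reject future dates.
# The record layout is an illustrative assumption.
from datetime import date

def validate(record):
    errors = []
    if record["rating"] not in range(1, 8):    # only 1, 2, ..., 7 are valid
        errors.append("rating out of range")
    if not 0 < record["age"] <= 100:           # age within a limit like 100
        errors.append("implausible age")
    if record["joining_date"] > date.today():  # dates must not be future
        errors.append("joining date in the future")
    return errors

good = {"rating": 5, "age": 34, "joining_date": date(2010, 6, 1)}
bad = {"rating": 9, "age": 140, "joining_date": date(2010, 6, 1)}
print(validate(good))  # []
print(validate(bad))   # ['rating out of range', 'implausible age']
```

In practice, such checks are often configured directly in the data entry tool (e.g. Excel data validation or SPSS value labels) rather than written by hand.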
We shall confine the discussion to aspects related to the collection of secondary data that is authentic and relevant to the study being conducted.
The relevance of the secondary data is the most important factor to be considered. The second important factor is to ascertain the authenticity or reliability of the secondary data.
Authenticity of Data
While collecting or recording data from a secondary source, a researcher should ensure that the data has been published by an agency that is authorised to either collect the primary data or get it collected directly from relevant sources.
For example, the Government of India is the competent authority to collect data relating to prices and production. Thus, the data published by the Government can be taken as an authentic source of secondary data relating to prices and production. The same data can also be taken as authentic when published by a statutory body like the Reserve Bank of India. Similarly, the Reserve Bank is authorised to collect data on banking-related parameters like deposits, advances, etc., and, therefore, the data published by the Reserve Bank of India may be taken as authentic. In fact, the Reserve Bank of India disseminates a variety of data relating to the Indian economy through various publications (listed in Chapter 6), and all those data are considered authentic secondary sources for any study of the Indian economy.
The data collected and disseminated by associations and federations of various industries could
also be taken as authentic secondary data by a researcher.
At the global level, authentic data related to various aspects of different countries—either in
consolidated form or country/region wise is published by United Nations, World Bank, International
Monetary Fund, etc. Such data could also be taken as authentic secondary data by a researcher.
The other reliable sources could be the database services, which regularly conduct surveys and
elaborate the methodology of collecting data and the error control. Published company data can
be also reliable to a certain extent. Other sources of data should be first verified for the reliability
before considering the use of the data.
While referring to the data published by private bodies, consultants, newspapers, magazines, etc.,
it may be ensured that they refer to the official publications as mentioned above before accepting
the data as authentic.
Some other relevant issues while referring to secondary data are the definitions used for various parameters like deposits and profit, the period for which the data is available, the extent of coverage of the data, etc. In fact, the footnotes given with the data should be read carefully to assess their impact, if any, on the research study.
SAMPLE QUESTIONNAIRE 1
2. Age:
3. Gender Male Female
4. Occupation:
Student Working/Business Housewife Others
5. What is the purpose for your using the Internet? (Please rate 5-maximum and 1-minimum)
Purpose 5 4 3 2 1
Work
Entertainment
8. Which is your most favourite website? (Please rate 5-maximum and 1-minimum)
Visited Preferred Useful
Orkut 5 4 3 2 1 5 4 3 2 1 5 4 3 2 1
Facebook
Ibibo
Hi5
Myspace
Bigadda
Others_____
9. What is it that makes you like the website? (Please rate 5-maximum and 1-minimum)
Features 5 4 3 2 1
Interface
Crowd
Simplicity
Speed
Other features _______________(Please specify)
12. Do you feel safe to upload your picture and disclose your true self?
Yes, absolutely
No, absolutely not
Maybe, if there is restriction on the viewers
13. Has your attitude towards these sites changed after the Adnan Patrawala case?
Yes
No
I don’t know about it
14. What is your take on the number of social networking sites that have flooded the internet?
Yes, I’m all for it
No, I think they are shady
Well, each one to his own
15. Do you think these sites are short-lived? Why?
Yes
No
I Don’t Know
________________________________________________________
16. Would you prefer a networking site with a purpose or some use apart from just networking and fun? What kind of purpose would you like it to serve?
Yes
No
Networking with a purpose! (Please rate 5-maximum and 1-minimum)
5 4 3 2 1
Business networking
Intellectual networking i.e. meeting likeminded people
Project sharing
Purposive networking, ex- Marriage, Jobs
Carpooling
Meeting for common purposes like social work, animal activists
Others _____________________________________________________________________
____________________________________________________________________
__________________________________________________________________
(Please specify)
SAMPLE QUESTIONNAIRE 2
Consumer Study for Two Wheelers
1. Name:
2. Age:
(a) 18-20
(b) 21-25
(c) 26-30
(d) above 30
3. Income:
(a) < 1 lakh
(b) 1-2 lakhs
(c) 2-3 lakhs
(d) > 3 lakhs
4. Rate the following modes on the basis of your preference. (Tick against the number) (1-least preferred, 5-most preferred)
Mode 1 2 3 4 5
Auto
Train
Bus
Car
Two wheeler
Other (Specify) _______
14. Rate the following on the basis of your preference for a two-wheeler. (Tick against the number) (1-least preferred, 5-most preferred)
Initial Attraction 1 2 3 4 5
Small and Compact
Low Maintenance Cost
Affordable Price
Other (Specify) _______
SAMPLE QUESTIONNAIRE 3
Mobile Users Web-based Questionnaire
Telecom Survey
1. Age ___________
2. Gender
Male Female
3. Relationship status
Single Committed Married
4. Education Qualification
Under Graduate Graduate Post Graduate
5. Current Occupation
Student Salaried Self-employed
Professional Retired Home-maker
Others (Please Specify) __________
6. Salary/Income/Pension/Pocket money per month (in Rupees)
Less than Rs. 2000
Between 2001–5000
Between 5001–10000
Between 10001–20000
Between 20000–50000
Above Rs. 50000
7. For how many years have you been using a mobile phone?
8. Which service provider are you using? (If you have more than one phone, select the one you
use most)
Vodafone
Airtel
MTNL (Dolphin)
MTNL (Trump)
Reliance CDMA
Reliance GSM
TATA CDMA
Spice
Virgin
Idea
BPL
Aircel
BSNL
Others (Please Specify) __________
9. Have you ever changed your service provider?
Yes
No
10. What type of connection do you use?
Post paid
Prepaid
Lifetime Prepaid
11. What is your average monthly cell phone expenditure? __________
12. How do you pay your phone bills?
Cash
Cheque
Credit Card
Other (Please Specify) __________
13. Who pays for your bills?
Self
Parents
Company
Children
Other (Please Specify)
14. Do you use a corporate plan?
Yes
No
15. Rate the following services on the basis of your preference (1 – Least preferred; 5 – Most preferred)
1-Least Preferred 2 3 4 5-Most Preferred
SMS 1 2 3 4 5
Ring tones/Caller tunes 1 2 3 4 5
Alerts 1 2 3 4 5
Downloads 1 2 3 4 5
Internet 1 2 3 4 5
16. Rate the following according to whom you call the most.
1-Least called 2 3 4 5-Most called
Family 1 2 3 4 5
Friends 1 2 3 4 5
Work related 1 2 3 4 5
One Specific number 1 2 3 4 5
Making Enquiries 1 2 3 4 5
21. What factors do you consider while selecting a service provider? Please rank them (1-Least considered, 5-Most considered; 0 – if NA)
Parameter: Price
0-Not Applicable 1-Least Considered 2 3 4 5-Most Considered
Local Calling Rates 0 1 2 3 4 5
STD Calling Rates 0 1 2 3 4 5
ISD Calling Rates 0 1 2 3 4 5
SMS Charges 0 1 2 3 4 5
Value-added Charges 0 1 2 3 4 5
Roaming Charges 0 1 2 3 4 5
Credit Limits 0 1 2 3 4 5
Payment options available 0 1 2 3 4 5
22. What factors do you consider while selecting a service provider? Please rank them (1-Least considered; 5-Most considered; 0 – if NA)
PARAMETER: Connectivity/Performance
0-Not Applicable 1-Least Considered 2 3 4 5-Most Considered
Least call drops (Lesser the call 0 1 2 3 4 5
drops, better the coverage)
Voice Clarity 0 1 2 3 4 5
Network Coverage 0 1 2 3 4 5
Roaming Network 0 1 2 3 4 5
Down Time 0 1 2 3 4 5
23. What factors do you consider while selecting a service provider? Please rank them (1-Least considered; 5-Most considered; 0 – if NA)
PARAMETER: Value-added Services (VAS)
0-Not Applicable 1-Least Considered 2 3 4 5-Most Considered
SMS 0 1 2 3 4 5
MMS 0 1 2 3 4 5
Voice SMS 0 1 2 3 4 5
Ring Tones 0 1 2 3 4 5
Caller Tunes 0 1 2 3 4 5
Internet 0 1 2 3 4 5
Voice Conferencing 0 1 2 3 4 5
24. What factors do you consider while selecting a service provider? Please rank them (1-Least
considered; 5-Most considered; 0 – if NA)
PARAMETER : Customer Care
0-Not Applicable 1-Least Considered 2 3 4 5-Most Considered
Bill Payment Ease 0 1 2 3 4 5
Call Centre Efficiency 0 1 2 3 4 5
Recharge Availability 0 1 2 3 4 5
Branch Availability 0 1 2 3 4 5
25. What factors do you consider while selecting a service provider? Please rank them (1-Least considered; 5-Most considered; 0 – if NA)
PARAMETER: Other Reasons
0-Not Applicable 1-Least Considered 2 3 4 5-Most Considered
Advertisements 0 1 2 3 4 5
Peer Effect 0 1 2 3 4 5
Prior Experience 0 1 2 3 4 5
Word of mouth publicity 0 1 2 3 4 5
SUMMARY
Primary data is collected directly by the researcher for some specific purpose or study. Secondary
data is data disseminated through some medium such as reports, newspapers, etc. The next stage
after the collection of data is the preparation of data. The various steps involved are editing, coding
and validating.
DISCUSSION QUESTIONS
1. Describe the questionnaire modes of responses with illustration for each of the modes.
2. Describe the various issues to be considered for designing a questionnaire.
3. Describe the activities for preparation of data with illustrative examples.
4. Evaluate the three questionnaires critically, and offer suggestions for improvement in the
same.
5. Write short notes on:
(i) Focus Groups
(ii) Projective Techniques
(iii) Sources of Secondary Data
(iv) General Guidelines for a Questionnaire Relating to Collection of Data
Presentation of Data
8
1. Exploratory Analysis
2. Classification and Tabulation
Frequency and Cumulative Frequency Tables
3. Diagrammatic/Graphical Presentation
Stem and Leaf Diagram
Bar Chart
Pareto Chart
Pie Chart
Histogram
Ogive
Line Graph
Lorenz Curve
4. Use of Graphs as Management Tool
LEARNING OBJECTIVES
The main objective of this chapter is to provide the methodology and scope of various modes of
presentation of data. This is intended to help in making effective presentation of data and conclusions
relating to analysis and evaluation for an assignment or a project. Some of the charts and graphs
discussed are:
Bar Chart
Pareto Chart
Pie Chart
Histogram
Line Graph
Two distinguishing features of the chapter are:
Use of live data to describe the various charts and graphs, thus helping the reader apply them to
any live data in the work environment.
Highlighting the use of graphs as a management tool through a live case study, so as to facilitate
using the same approach, or innovating another, to improve the effectiveness of managerial functions.
It is said that instead of our looking at them, they look at us! In fact, they reflect our performance.
There is sufficient scope for making effective use of graphs and charts for managerial functions. We
have illustrated this through a case study in Section 3.3 of this chapter.
Various aspects relating to classification and presentation of the data are described in this chapter
with the help of live data taken from newspapers and magazines. Incidentally, this is referred to as
Exploratory Analysis, which comprises easy-to-construct tables and diagrams that summarise and
describe the data.
The above data will be referred to as ‘Billionaires Data’ in subsequent sections and chapters in
this book.
It may be observed that the data arranged in ascending order is more meaningful and conveys
more information than the raw data. For example, one can immediately comprehend that:
(i) The minimum equity holding of the billionaires is Rs 2,717 million and the maximum is
Rs 6,874 million
(ii) The number of billionaires holding equity less than 3,000 is only two
(iii) The number of billionaires holding equity more than 6,000 is only three, and so on.
Similarly, the data can be arranged in descending order, and similar types of conclusions drawn.
Stem and Leaf Diagram A stem and leaf diagram presents a visual summary of a data set. The
diagram sorts the data and helps in detecting its distributional pattern.
The stem and leaf diagram for the billionaires' equity data is given below. While the stem units 2,
3, 4, 5 and 6 represent '000s in the data values, the leaf values on the right side indicate '00s.
Thus, e.g., 2,796 has 2 as its stem and 8 as its leaf (796 is approximated as 800). Similarly, the
last value, 6,874, has 6 as its stem and 9 as its leaf (874 is approximated as 900).
Stem-and-Leaf Display
Stem unit: 1000
2 78
3 11559
4 235789
5 0146
6 579
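The display above can also be built programmatically. The following Python sketch is illustrative only; the variable name `equity` and the rounding rule are taken from the description in the text:

```python
equity = [2717, 2796, 3098, 3144, 3527, 3534, 3862, 4187, 4310, 4506,
          4745, 4784, 4923, 5034, 5071, 5424, 5561, 6505, 6707, 6874]

display = {}
for value in sorted(equity):
    stem = value // 1000            # thousands digit
    leaf = round(value / 100) % 10  # hundreds digit, rounded as in the text
    display.setdefault(stem, []).append(leaf)

# Prints the same stem-and-leaf display as shown above
for stem, leaves in sorted(display.items()):
    print(stem, "".join(str(leaf) for leaf in leaves))
```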
8.2.3 Classification and Tabulation
Classification is the first step in Tabulation. Classification implies bringing together the items which
are similar in some respect(s). For example, students of a class may be grouped together with respect
to their marks obtained in an examination, their age or area of specialisation, etc.
After classification, tabulation is done to condense the data in a compact form which can be easily
comprehended.
Specific advantages or objectives of Tabulation are that it:
Summarises data into rows and columns.
Gives an appropriate classification with the number of data items in cells (intersections of rows and
columns), subtotals of rows and columns, etc. This helps in drawing useful interpretations about
the data.
Reveals significant features of the data, including comparisons.
For example, the following tabulation of marks of 200 students, classified at intervals of 20 marks
each, reveals that the maximum number of students have obtained marks between 61 and 80.
If marks are also classified with respect to the sex of the students, the tabulation, given below, might
reveal that the percentage of girls with marks more than 60% is higher than the percentage of boys
with such marks, as in the following table.
Distribution of Marks among Boy and Girl Students
Procedure for Classification and Tabulation The procedure for classification and tabulation of
a data is illustrated in this section through the example of data on billionaires.
The first step in classification and tabulation is to group data into suitable number of class inter-
vals. This has been done for the billionaires’ data, as follows:
Class Intervals@ Actual Observations Tally Sheet* Number of Observations
2000–3000 2717, 2796 || 2
3000–4000 3098, 3144, 3527, 3534, 3862 |||| 5
4000–5000 4187, 4310, 4506, 4745, 4784, 4923 |||| | 6
5000–6000 5034, 5071, 5424, 5561 |||| 4
6000–7000 6505, 6707, 6874 ||| 3
@ Whenever the data is grouped in class intervals, some information is lost. In ungrouped data, we know the
individual observations, but in grouped data we know only the number of observations in an interval and not their
individual values. However, this disadvantage is compensated by the advantage of comprehending the data in grouped
form, and by the ease of calculations if the number of observations is large. The number of class intervals is decided
by considering both the advantages and the disadvantages. If the number of intervals is large, the advantages get
reduced, and if the number of intervals is small, the disadvantages increase. Therefore, one has to decide the number
of intervals rather judiciously. In the above example, the number of intervals seems to be optimum. The intervals can
be continuous or discontinuous. In continuous intervals, the upper value of one interval is the same as the
lower value of the next interval, as in the above case. In such cases, it has to be stated clearly in which interval an
observation exactly equal to the common boundary value is to be counted. For example, in the above case, it could
be stated that when an observation is exactly equal to the upper value of one interval and the lower value of the
next, it will be included in the next interval. The only care to be taken, in such a case, is that no observation should
be equal to the upper value of the last interval.
*One tally mark means one observation. Every fifth observation is represented by horizontal line-cutting through
all the tally marks.
The width of the class intervals is generally taken as equal, but it is not a must.
The intervals are sometimes open either for the lower-most interval or for the upper-most interval
or for both the intervals. For example, in the case of the data relating to annual income of individu-
als, the lower interval could be less than Rs 50,000, and the upper class interval could be more than
Rs 10 lakhs, as in the following table:
The most commonly used presentation of grouped data is in the form of a frequency table given
below for billionaire’s data:
*The frequency is represented by the small letter f with a subscript i indicating the frequency of the ith interval. Thus:
f1 = 2, f2 = 5, f3 = 6, f4 = 4, and f5 = 3.
@fci represents the cumulative frequency up to the ith interval. Thus fc1 = 2, fc2 = 7, fc3 = 13, fc4 = 17, and fc5 = 20.
It may be noted that the cumulative frequency up to the last class interval is the total frequency, i.e. the number
of observations.
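The frequencies fi and cumulative frequencies fci can be verified with a short Python sketch (illustrative only), using intervals that include the upper class value but not the lower, as stated above:

```python
equity = [2717, 2796, 3098, 3144, 3527, 3534, 3862, 4187, 4310, 4506,
          4745, 4784, 4923, 5034, 5071, 5424, 5561, 6505, 6707, 6874]
edges = [2000, 3000, 4000, 5000, 6000, 7000]

# f_i: count the observations with lower < x <= upper in each interval
freq = [sum(1 for x in equity if lower < x <= upper)
        for lower, upper in zip(edges[:-1], edges[1:])]

# fc_i: running totals of the frequencies
cum = []
running = 0
for f in freq:
    running += f
    cum.append(running)

print(freq)   # [2, 5, 6, 4, 3]
print(cum)    # [2, 7, 13, 17, 20]
```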
Subdivided Bar Chart A Subdivided Bar Chart is a bar chart wherein each bar is divided into
further components. In the above example, suppose the information about the cities from which the
students have graduated is also available, as given below.
The entire information is presented with the help of a subdivided bar chart, depicted as follows.
Multiple Bar Chart A multiple bar chart is one in which two or more bars are placed together for
each entity. The bars are placed together to give a comparative assessment of the values of some
parameter over two periods of time, at two different locations, etc. This chart is normally used when
we wish to present a visual comparison of two years' data for several entities, brands, etc.
As an illustration of the Multiple Bar Chart, we consider the following data giving sales of top
market brands among pain killers in India.
(Rs. in Crores)
Pain Killer 2005 2006
Voveran 16.5 23.2
Calpol 13.2 18.2
Nise 15.2 18.6
Combiflam 9.4 14.1
Dolonex 6.8 10.3
Sumo 5.1 7.4
Volini 6.9 9.6
Moov 3.8 4.9
Nimulid 3.5 4.9
Source: Economic Times dt. 9th October 2006.
The multiple bar chart for presentation of the above data is as follows:
Pie Chart
When a question is asked as to why the pie chart is called by that term, a majority of
persons who have some idea about the chart respond that it has something to do with π,
because they feel that the chart is circular in shape, and the circumference of a circle is
2πr. They forget that the symbol π is spelt "pi" while the chart is a pie chart. The
widely held belief is that since a pie is a dish like a cake, circular in shape, and it is cut into
pieces which resemble the chart, it is called a pie chart. However, there is one more belief.
Pie was the name of a cook in a royal palace in France. Instead of arranging different
dishes in different plates, as we do at home or in parties, he used to place all the dishes
in each and every plate so that one could pick up all the items from one plate instead of
moving from plate to plate. He used to arrange the dishes on the plates just like the pie
chart, voluminous items like chips getting more space than heavier items like biscuits.
Because of this, it is believed that the chart is named after him.
The above information relating to sources of funds is presented in a tabular form as follows:
Sources of Funds Percentage of Total * Size of Segment (Degrees)
Excise 17 61.2
Customs 12 43.2
Corporate Tax 21 75.6
Income Tax 13 46.8
Service Tax 7 25.2
Borrowings & Others 30 108
Total 100 360
*The size of a segment is derived by dividing the percentage value by 100 and then multiplying by 360°. For example,
for Income Tax, the segment = (13/100) × 360° = 46.8°. The pie chart for the above data is presented below:
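The segment sizes in the table can be checked with a few lines of Python (an illustrative sketch; the dictionary of shares simply restates the table):

```python
shares = {"Excise": 17, "Customs": 12, "Corporate Tax": 21,
          "Income Tax": 13, "Service Tax": 7, "Borrowings & Others": 30}

# Segment angle = (percentage / 100) x 360 degrees
degrees = {source: share * 360 / 100 for source, share in shares.items()}
print(round(degrees["Income Tax"], 1))   # 46.8
```

The angles necessarily add up to 360°, since the percentages add up to 100.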
The polygon formed by joining the middle points of the above rectangles is known as the frequency
polygon. It gives an idea of the shape of the distribution of frequencies.
(v) Ogive
An ogive graph gives an idea about the number of observations less or greater than the values in the
range of the variable. Accordingly, there are two types of ogives, viz. ‘Less Than’ and ‘Greater
Than’.
most important characteristic of median is that it divides the data into two equal parts such
that 50% of the observations have value less than this, and the other 50% have values more
than this.
(Rs in Lakhs)
Bank Business Per Employee Business Per Employee
2005-06 2001-02
Allahabad Bank 336 153
Andhra Bank 426.75 195.96
Bank of Baroda 396 222.76
Bank of India 381 218.74
Bank of Maharashtra 306.18 191.44
Canara Bank 441.57 214.88
Central Bank of India 240.46 148.77
Corporation Bank 527 290.44
Dena Bank 364 221
Indian Bank 295 156
Indian Overseas Bank 354.73 175.41
Source: Economic Times dt. 23rd October 2006.
In the above data if the equity holding was equally distributed, 20% of equity would have been
held by 20% of billionaires, 40% of equity would have been held by 40% of billionaires, and so on.
However, it is not so in the above data because of inequality in the distribution of equity holdings. The
extent of this inequality is shown in the Lorenz curve depicted in the graph given below. The gap
between the line of perfect equality and the line of actual equity holding can be shaded to indicate
the departure from equality. The shaded area indicates the extent of departure from perfect equality
in the data; the larger the area, the greater the inequality.
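The coordinates of the Lorenz curve can be computed as follows. This Python sketch is illustrative (`points` is an assumed name); it pairs the cumulative share of billionaires with the cumulative share of equity held:

```python
equity = [2717, 2796, 3098, 3144, 3527, 3534, 3862, 4187, 4310, 4506,
          4745, 4784, 4923, 5034, 5071, 5424, 5561, 6505, 6707, 6874]
total = sum(equity)

# (x, y) = (cumulative share of billionaires, cumulative share of equity);
# the line of perfect equality is y = x.
points = [(0.0, 0.0)]
cum = 0
for i, value in enumerate(sorted(equity), start=1):
    cum += value
    points.append((i / len(equity), cum / total))

# Share of total equity held by the poorer half of the billionaires
print(round(points[10][1], 3))   # 0.391, i.e. about 39% instead of 50%
```

Plotting these points against the line y = x and shading the gap gives the Lorenz curve described above.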
After some time, Dr. Damle was posted as a branch manager. He looked forward to demonstrating
the use of information systems even at the branch level.
There were 10 officers at the branch. He made each officer responsible for the different types
of business, comprising deposits, advances and other miscellaneous business. He asked each one
of them to prepare a graph depicting cumulative business at the end of every month for their slice
of business during the next year. Of course, he provided requisite guidance to them in facilitating
finalisation of their individual goals. Dr. Damle, then finalised the budget with his regional manager
in the presence of all the officers.
Thereafter, he asked each of the officers to make a large graph showing the growth of business
each month, and pin it on the board in his cabin. Thus, there were 10 graphs in his cabin within the
first week of the first month of the new financial year.
Dr. Damle instructed that after the end of the month, each officer should plot the actual business
value vis-à-vis the budgeted value as shown in the concerned graph. Dr. Damle felt that no officer
would like to plot an actual value less than the budgeted value, especially since the budget was decided
by the officer himself. And, it really happened that way. As a result, the branch showed a phenomenal
growth in its business. When the news reached the chairman, he was surprised, and visited the
branch to appreciate the approach and efforts of Dr. Damle. He also gave a letter of appreciation
to Dr. Damle, and promoted him as regional manager. Dr. Damle was happy that he had proved his
conviction that the combination of psychology and graphs could do wonders. He continued to adopt
the same strategy with the branch managers of his region. This approach, which inter alia included
the use of graphs as a management tool, did wonders, and soon he was picked up as a top-level
executive in another organisation.
Important developments having impact on business were recorded under the date / week / month
when the development took place. Thus while looking at the graph, one could make out the impact
of the development, and other factors including seasonality, if any.
Normally, graphs are not used as an integral part of a Management Information System. However,
these can be used very effectively as planning and monitoring tools for effective management of a
system.
Growth and fluctuations in the volume of any business activity like sales, profit, etc. or any
other parameter of environment like prices of commodities and stocks, etc. do get reflected in the
graphs.
As mentioned earlier, it is said that instead of our looking at them, they ‘look’ at us. They speak
the truth. They reflect performance of the management or the system and provide motivation to
improve or take corrective action, if necessary.
Basic Analysis of Data
9
1. Measures of Central Tendency
(a) Mean
(b) Median
(c) Mode
2. Measures of Variation
(a) Range
(b) Semi Inter-Quartile Range
(c) Standard Deviation (Variance)
(d) Coefficient of Variation
3. Measures of Skewness
4. Standardised Variables and Scores
5. Using Excel
LEARNING OBJECTIVES
This chapter provides an understanding of relevance and need for calculation of:
Various Measures of Location such as average or mean, median, mode, etc. as also their relative
advantages and limitations.
Various Measures of Variation or Dispersion such as range, standard deviation, mean deviation,
etc., as also their relative advantages and limitations.
Various measures of uniformity, consistency, disparity, volatility, etc. For example, an investor,
in addition to expected return from an investment, may also like to assess the volatility or risk in
returns.
Measure of symmetry/skewness in data. In real life not all data are symmetrical with reference
to their central value, and it is worthwhile to have a measure of the deviation from symmetry for
comparing two or more sets of data.
Relevance
The 'Reliable' company had two plants manufacturing a type of bearing. Because of the competition
and the criticality of the item, the company decided to pay utmost attention to ensuring that
the quality was as per the specification. However, due to staff problems, the company started
noticing a high rate of rejection of the item by the users. On analysing the quality of the bearings,
the company noted that the bearings from the two plants were getting rejected for two
different reasons. While the bearings from one plant were getting rejected because they were
not measuring up to the specified dimension (in fact, it was less than specified), the bearings
from the other plant were getting rejected because of variation in the dimension: some were less
than the specification and some were more. When the company referred
the problem to a consultant, he pointed out that it was necessary to ensure quality not only in
terms of the specified average of the dimension but also in terms of the variation in the dimension from
bearing to bearing. Further investigations revealed that in the first plant, the problems were
mostly on account of the technical staff while in the second plant, they were mostly due to unrest
among the operators. The problems were sorted out accordingly, and the company was able to
retrieve the loss of image it had suffered.
9.2.1 Mean
There are three types of means, viz.
Arithmetic Mean
Harmonic Mean
Geometric Mean
These are described below along with their relative advantages and limitations:
Arithmetic Mean
The arithmetic mean is defined for ungrouped data as well as for grouped data. We shall discuss
both separately.
(a) Ungrouped (Raw) Data
For ungrouped data, the arithmetic mean is defined as follows:
The Arithmetic Mean (A.M.) of a set of n values, say, x1, x2, x3, …, xi, …, xn, is defined as
x̄ = (Sum of Observations)/(Number of Observations)   (9.1)
= (x1 + x2 + … + xi + … + xn)/n = (Σ xi)/n,
where the summation Σ runs over i = 1 to n.
Illustration 9.1
The following data, given in Section 8.2.1, gives the value of equity holdings of 20 of India's
billionaires, in ascending order.
(Contd)
Shashi Ruia 3527
K. K. Birla 3534
B. Ramalinga Raju, Rama Raju & Family 3862
Habil F. Khorakiwala 4187
The Murthy family 4310
Keshub Mahindra 4506
The Kirloskar family 4745
M.V. Subbiah & family 4784
Ajay G. Piramal 4923
Uday Kotak 5034
S.P. Hinduja 5071
Subhash Chandra 5424
Adi Godrej 5561
Vijay Mallya 6505
V.N. Dhoot 6707
Naresh Goyal 6874
(Rs in Millions)
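Formula (9.1) applied to these 20 values can be verified with a short Python sketch (illustrative only; Excel's AVERAGE function would serve equally well):

```python
equity = [2717, 2796, 3098, 3144, 3527, 3534, 3862, 4187, 4310, 4506,
          4745, 4784, 4923, 5034, 5071, 5424, 5561, 6505, 6707, 6874]

mean = sum(equity) / len(equity)   # formula (9.1)
print(mean)   # 4565.45
```

Python's `statistics.mean(equity)` returns the same value.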
then
x̄ = (Σ fi xi)/(Σ fi), with both summations running over i = 1 to k   (9.2)
where xi is the middle point of the ith class interval, fi is the frequency of the ith class interval, fixi
is the product of fi and xi, and k is the number of class intervals.
The above formula is justified as follows:
The basic definition of the mean remains the same i.e.
Sum of Observations
x =
Number of Observation
To find the sum of all the observations, all we know is that fi observations are in the ith class
interval. Since individual observations are not available, we assume that all the fi observations are
equal to the middle point (xi) of the ith class interval. Thus the total of the fi observations in the ith class
interval is equal to fi xi. It is well appreciated that all the fi observations cannot be equal to xi; rather,
some observations will be less than xi and some will be more than xi. Thus, on the average, positive
and negative errors will cancel each other out. Further, this is the best that can be assumed when the
individual observations are not known. The sum of all the observations in the data is thus taken as
the summation of all the products fi xi, i.e. Σ fi xi. To get the A.M., this sum is divided by the total
of the frequencies of all class intervals, viz. Σ fi.
The following illustration explains calculation of arithmetic mean from a grouped data.
Illustration 9.2
The calculation is illustrated with the data relating to equity holdings of the group of 20 billionaires
given in Illustration 9.1.
Class Interval* Frequency (fi) Mid Value of Class Interval (xi) fixi [Col.(4) = Col.(2) × Col.(3)]
(1) (2) (3) (4)
2000 – 3000 2 2500 5000
3000 – 4000 5 3500 17500
4000 – 5000 6 4500 27000
5000 – 6000 4 5500 22000
6000 – 7000 3 6500 19500
Sum Σfi = 20 Σfixi = 91000
*(Intervals include the upper class value but not the lower)
Substituting the values of Σfi and Σfixi in formula (9.2), we get
x̄ = 91,000 ÷ 20 = 4550
It may be noted that the A.M. of the ungrouped data, worked out as 4565.45, is not the same as
the value obtained in the case of the grouped data. In fact, it need not be so, because while calculating
the A.M. from grouped data, it is assumed that all the observations in a class interval have the same
value, viz. the middle point of the interval. The value of the A.M. obtained from the grouped data is
only an approximation of the value obtained from the ungrouped data.
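The grouped-data calculation of formula (9.2) can be sketched in Python as follows (illustrative; `freq` and `mid` restate columns (2) and (3) of the table):

```python
freq = [2, 5, 6, 4, 3]                # f_i, column (2)
mid = [2500, 3500, 4500, 5500, 6500]  # x_i, column (3)

# Formula (9.2): sum of f_i * x_i divided by sum of f_i
grouped_mean = sum(f * x for f, x in zip(freq, mid)) / sum(freq)
print(grouped_mean)   # 4550.0
```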
Combined A.M. of Two Sets of Data
Let there be two sets of data with
Number of observations = n1 and n2
A.M.s = x̄1 and x̄2
If these two data sets are combined, the combined mean x̄ is given by
x̄ = (n1 x̄1 + n2 x̄2)/(n1 + n2)
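The combined mean is the weighted average of the two means, weighted by the numbers of observations. A small Python sketch (the figures below are hypothetical, chosen only for illustration):

```python
def combined_mean(n1, m1, n2, m2):
    """Combined A.M. of two sets with sizes n1, n2 and means m1, m2."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)

# Hypothetical example: 5 observations averaging 10 combined with
# 15 observations averaging 20
print(combined_mean(5, 10, 15, 20))   # 17.5
```

Note that the result lies between the two means, closer to the mean of the larger set.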
from Rs 10 to 11 per kg. It may be intuitively noted that the impact of the price rise in rice is more
than the impact of the price rise in sugar, as the consumption of rice is more. Mathematically, this is
shown below. It may be noted that the price rise in sugar has been given a weight of 5, and the price
rise in rice has been given a weight of 20, based on their monthly consumption.
Advantages
Easy to understand and calculate.
Makes use of the full data.
Only the sum of the values and their number need be known for its determination.
The sum of the observations can be calculated from knowledge of the A.M. and the number of observations. For example, if the average of 5 observations is 10, then the sum of all the 5 observations is 5 × 10 = 50.
It is amenable to mathematical treatment. For example, given the means of two sets of data, we can find the combined mean for the data formed by combining the two sets. This property of the A.M. forms the basis for most statistical analysis.
It is least susceptible to change from sample to sample.

Disadvantages
Unduly influenced by extreme values.
Cannot be calculated if the values of all the observations are not known. For example, in the ungrouped data 5, 8, 10, 15, > 20, all we know about the fifth observation is that it is more than 20; we do not know its actual value. Thus, it is not possible to find the sum of all the observations, and hence the A.M. cannot be calculated.
Cannot be calculated for grouped data if even one class interval is open-ended.
Cannot be located or comprehended graphically.
It is suitable mostly for data which are symmetrical or nearly symmetrical around some central value.
9.8 Business Research Methodology
Geometric Mean
While calculating the arithmetic mean, equal weight is attached to all the observations, except while
calculating a weighted mean. This may, at times, lead to the arithmetic mean not being a true representative
of the observations. For example, the arithmetic mean of the 4 observations 1, 2, 3 and 10 is 16/4
= 4, which is obviously not a true representative of the data. One of the measures of location that
could be used in such a situation is the Geometric Mean.
The Geometric Mean (G.M.) of a series of observations x1, x2, x3, …, xn is defined as the
nth root of the product of these values. Mathematically,
G.M. = {(x1)(x2)(x3) … (xn)}^(1/n)   (9.5)
It may be noted that the G.M. is not meaningful if any value of x is zero, as the whole product
of the values becomes zero.
For example, the G.M. of 0, 2, 4, 5, 9 and 10 works out to 0.
Further, G.M. is also not defined for negative values of observations.
If it is found difficult to calculate the value of the G.M. by the above formula, one may take logarithms
on both sides to get
log G.M. = (1/n){log x1 + log x2 + log x3 + … + log xn} = (1/n) Σ log xi
or
G.M. = Antilog{(1/n) Σ log xi}   (9.6)
This formula is generally found to be more convenient.
Illustration 9.4
For the data with values 2, 4, and 8,
G.M. = (2 × 4 × 8)^(1/3) = (64)^(1/3) = 4
It can also be calculated by taking logarithms on both sides, as illustrated below:
log G.M. = (1/3) log 64 = (1/3) × 1.8062 = 0.6021
Therefore, G.M. = antilog (0.6021) = 4
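Both routes of Illustration 9.4 can be verified with a short Python sketch (illustrative only):

```python
import math

values = [2, 4, 8]

# Directly by formula (9.5): nth root of the product
gm = math.prod(values) ** (1 / len(values))

# Via logarithms, as in formula (9.6)
gm_via_logs = 10 ** (sum(math.log10(x) for x in values) / len(values))

print(round(gm, 6), round(gm_via_logs, 6))   # 4.0 4.0
```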
Average Rate of Growth of Production/Business or Increase in Prices
If P1 is the production in the first year and Pn is the production in the nth year, then the average
rate of growth is given by (G – 100)%, where
G = 100 (Pn/P1)^(1/(n–1))   (9.7)
or, log G = log 100 + {1/(n – 1)}(log Pn – log P1)   (9.8)
The annual growth rate calculated with the help of the G.M. is also called the Compound Annual
Growth Rate or the Average Rate of Return on Investment.
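A Python sketch of formula (9.7), with hypothetical figures (production growing from 100 in year 1 to 200 in year 5):

```python
p1, pn, n = 100, 200, 5   # hypothetical production figures

# Formula (9.7): the average growth rate is (G - 100) per cent
G = 100 * (pn / p1) ** (1 / (n - 1))
print(round(G - 100, 2))   # 18.92, i.e. about 18.92% per annum
```

A doubling over four annual steps thus corresponds to a compound growth rate of roughly 19% per annum.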
Basic Analysis of Data 9.9
Advantages
Makes use of the full data.
Can be used to indicate the rate of change.
Extreme values have a lesser impact on its value as compared to the arithmetic mean.
Specially useful for studying rates of growth.

Disadvantages
Cannot be determined if any value is zero.
More difficult to calculate and less easily understood.
If any one value is negative, it cannot be calculated.
Harmonic Mean
The harmonic mean (H.M.) is defined as the reciprocal of the arithmetic mean of the reciprocals
of the observations.
For example, if x1 and x2 are two observations, then the arithmetic mean of their reciprocals,
viz. 1/x1 and 1/x2, is
{(1/x1) + (1/x2)}/2 = (x1 + x2)/(2 x1 x2)
The reciprocal of this arithmetic mean is 2 x1 x2/(x1 + x2). This is called the harmonic mean.
Thus, the harmonic mean of two observations x1 and x2 is
H.M. = 2 x1 x2/(x1 + x2)   (9.9)
This mean is used in averaging rates when the time factor is variable and the act being performed
is the same.
The situation in the example relating to the average speed of a car, discussed earlier in this section,
is of this type.
It may be noted that the distance travelled from A to B and from B to A is the same, but the time
taken is different because of the different speeds of 40 and 60 km/hr. Therefore, the harmonic mean
giving the average speed can be calculated as follows:
Harmonic Mean or Average Speed = 2/{(1/40) + (1/60)}
= (2 × 40 × 60)/(40 + 60)
= 4800/100
= 48 km/hr
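The same calculation in a short Python sketch (illustrative; Python's `statistics.harmonic_mean` gives the same result):

```python
speeds = [40, 60]   # same distance covered at each speed

# Reciprocal of the arithmetic mean of the reciprocals
hm = len(speeds) / sum(1 / s for s in speeds)
print(round(hm, 6))   # 48.0
```

Note that the harmonic mean (48) is less than the arithmetic mean of the two speeds (50), as it always is for unequal positive values.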
9.2.2 Median
While the mean is an appropriate measure of location in most applications, there are situations
that have extreme values either on the lower side or on the higher side. For example, if the data
comprises the values 2, 8, 9 and 11, the mean works out to be 30/4 = 7.5, which may not be considered
representative of the data, as three out of the four values are more than this value. Similarly, if the
data comprises the values 8, 9, 11 and 22, the mean works out to be 12.5, which again may not be
considered a good representative of the data.
Further, sometimes, exact values may not be available at either end of the range of values. For
example, the land holdings of 5 farmers could be:
then also A.M. cannot be calculated because we cannot find out the middle point of the 1st interval,
in the first case, and middle point of the last interval, in the second case.
Thus, whenever there are some extreme values in the data, calculation of the A.M. is not desirable.
Further, whenever exact values of some observations are not available, the A.M. cannot be
calculated. In both situations, another measure of location called the Median is used.
Median of a set of values is defined as the middle most value of a series of values arranged in
ascending/descending order. In general, if there are n observations arranged in ascending or descend-
ing order, median is defined as the value corresponding to the (n + 1)/2th observation.
If the number of observations is odd, the value corresponding to the middle observation is the
median. For example, if the observations are 2, 3, 5, 8 and 10, the median is the value correspond-
ing to the 3rd observation i.e. 5.
If the number of observations is even then the average of the two middle most values is the
median. For example, if the observations are 2, 3, 6, 8, 10 and 12, then the median is the value
corresponding to the 3.5th observation or the average of the 3rd and 4th observations viz. 6 and 8
i.e. 7.
The methodology of calculating the median for a given data is described below.
(a) Ungrouped Data
First the data is arranged in ascending/descending order.
In the earlier example relating to equity holdings data of 20 billionaires given in Table 9.1, the data
is arranged as per ascending order as follows
2717 2796 3098 3144 3527 3534 3862 4187 4310 4506 4745 4784 4923
5034 5071 5424 5561 6505 6707 6874
Here, the number of observations is 20, and therefore there is no single middle observation. However,
the two middle-most observations are the 10th and 11th, with values 4506 and 4745. Therefore, the
median is their average, i.e. (4506 + 4745)/2 = 4625.5.
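The ungrouped median can be verified directly with the standard library (a sketch; the 20 holding values are those listed above):

```python
from statistics import median

# Equity holdings of the 20 billionaires, already in ascending order
holdings = [2717, 2796, 3098, 3144, 3527, 3534, 3862, 4187, 4310, 4506,
            4745, 4784, 4923, 5034, 5071, 5424, 5561, 6505, 6707, 6874]

# With an even count, median() averages the two middle values (10th and 11th)
print(median(holdings))  # 4625.5
```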
Basic Analysis of Data 9.11
We have to find out the value corresponding to the (13/2)th observation. From the above table, we
conclude that the median – the value corresponding to the 6.5th observation lies in the second interval.
Since 4 observations are in the earlier viz. first interval, the median is the value corresponding to
the 2.5th observation in the interval 10 to 20. Since, there are 5 observations in that interval, we can
presume that, all the five observations are equally spaced at 11,13,15,17 and 19, as shown below
_______*______*_______*_______*_______*_______
10 11 12 13 14 15 16 17 18 19 20
This implies that the 5th observation is at 11, the sixth observation is at 13, and the 7th observa-
tion is at 15, and so on.
Thus, the median corresponding to the 6.5th observation is midway between 13 and 15 i.e. 14.
This also amounts to saying that only two parts of the five parts (frequency of median interval being
5) are added to the lower point of the class interval. Thus, in grouped data, the median is really the
value corresponding to the sixth (i.e.12/2) and not 6.5th observation. This justifies the value (n/2) in
the numerator of the formula for calculating the median from the grouped data.
The calculation of median for the above grouped data is illustrated below.
    Median = Lm + [(n/2 − fc) / fm] × wm
           = 10 + [(12/2 − 4) / 5] × 10
           = 10 + [(6 − 4) / 5] × 10
           = 10 + (2/5) × 10 = 14
We shall now calculate the median for the grouped equity holdings data given in Illustration 9.2.
The data is presented in tabular form as follows:
Here, n = 20, the median class interval is from 4000 to 5000 as the 10th observation lies in this
interval.
Further,
Lm = 4000
fm = 6
fc = 7
wm = 1000
Therefore,
    Median = 4000 + [(20/2 − 7) / 6] × 1000
           = 4000 + (3/6) × 1000 = 4000 + 500 = 4500
It may be recalled that the A.M. calculated from the same grouped data is 4550.
It may be added that the median divides the data into two parts such that the number of
observations less than the median are equal to the number of observations more than it. This
property makes median a very useful measure when the data is skewed like income distribution
among persons/households, marks obtained in competitive examinations like that for admission
to Engineering/Medical Colleges, etc.
Frequencies as Percentages:
Sometimes, in grouped data, we are given the percentage of total frequency in each class interval
instead of the frequencies themselves.
Advantages and Disadvantages of Median
The following table gives the advantages and disadvantages of using median as the measure of
location.
Advantages:
- Simple to understand—divides the sample/population into two parts such that 50% of the values are less than it and 50% are more than it.
- Simple to calculate, especially from ungrouped data.
- Extreme values do not affect its value.
- Can be determined even when all the values are not known.
- Specially useful where data is skewed, like income/asset distribution.

Disadvantages:
- Arrangement of data in ascending or descending order may be tedious if the number of values is large.
- Cannot be used to find out the total of the values.
- Does not lend itself to mathematical operations.
- May not be representative of the data if the number of observations is small. For example, if there are 3 observations, say 1, 2 and 10, the median is 2, which is not representative.
- It is susceptible to more variation from sample to sample.
9.2.3 Quartiles
Median divides the data into two parts such that 50% of the observations are less than it and 50%
are more than it. Similarly, there are “Quartiles”. There are three Quartiles viz. Q1, Q2 and Q3. These
are referred to as first, second and third quartiles. The first quartile, Q1, divides the data into two
parts such that 25% (Quarter) of the observations are less than it and 75% more than it. The second
quartile, Q2, is the same as median. The third quartile divides the data into two parts such that
75% observations are less than it and 25% are more than it. All these can be determined, graphically,
with the help of the Ogive curve as shown below (equity holdings data).
As regards determining the quartiles mathematically, just like Median is defined as the value
corresponding to the {(n + 1)/2}th observation for ungrouped data, and the value corresponding to
the (n/2)th observation in the grouped data, Q1 and Q3 are defined as values corresponding to an
observation given below:
Ungrouped Data Grouped Data
(arranged in ascending order)
Lower Quartile Q1 {(n + 1)/4}th (n/4)th
Median Q2 {(n + 1)/2}th (n/2)th
Upper Quartile Q3 {3 (n + 1)/4}th (3n/4)th
Calculations of Q1 and Q3 for grouped data are similar to median, and are illustrated below (for
equity holdings data)
    Q1 = LQ1 + [(n/4 − fc) / fQ1] × wQ1

    Q3 = LQ3 + [(3n/4 − fc) / fQ3] × wQ3
For equity holdings data, the first and third quartiles are calculated as follows:
    Q1 = 3000 + [(20/4 − 2) / 5] × 1000
       = 3000 + [(5 − 2) / 5] × 1000
       = 3000 + 3000/5 = 3000 + 600
       = 3600
The interpretation of this value of Q1 is that 25% of the billionaires have equity holdings less than
Rs 3,600 million.
    Q3 = 5000 + [(3 × 20/4 − 13) / 4] × 1000
       = 5000 + [(15 − 13) / 4] × 1000 = 5000 + (2/4) × 1000
       = 5500
The interpretation of this value of Q3 is that 75% of the billionaires have equity holdings less than
Rs 5,500 million.
Incidentally, the average of first and third quartiles could also be considered as a measure of loca-
tion.
Thus (Q1 + Q3)/2 = (3600 + 5500)/2 = 9100/2 = 4550
can also be considered as a measure of location for the equity holdings data.
9.2.4 Percentiles
Just as quartiles divide the data into four quarters, percentiles split the data into several
parts, expressed in percentages. A percentile, also known as a centile, divides the data in such a way
that a given percent of the observations are less than it. For example, 95% of the observations are
less than the 95th percentile. It may be noted that the 50th percentile, denoted as P50, is the same
as the median. The percentiles can be calculated just like the median and quartiles.
As an illustration,

    P95 = LP95 + {[(95/100) × n − fc] / fP95} × wP95
where LP95 is the lower point of the class interval containing the 95th percentile, fc is the
cumulative frequency up to that interval, fP95 is the frequency of that interval, and wP95 is the
width of that interval. Referring to the table in Illustration 9.2, and substituting the respective
values, we get
    P95 = 6000 + {[(19/20) × 20 − 17] / 3} × 1000
        = 6000 + (2/3) × 1000
        = 6000 + 667
        = 6667
The interpretation of this value is that 95% of the billionaires have equity holdings less than
Rs 6,667 million.
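The same interpolation idea covers quartiles and percentiles alike for grouped data. A minimal sketch (`grouped_percentile` is a hypothetical helper, not from the text) that reproduces Q1, Q3 and P95 for the equity holdings data:

```python
def grouped_percentile(p, lower, width, freqs):
    """Value below which p% of the grouped observations lie:
    L + ((p/100 * n - f_c) / f) * w, interpolated within the class."""
    n = sum(freqs)
    target = p / 100 * n
    cum = 0
    for i, f in enumerate(freqs):
        if cum + f >= target:
            return lower + i * width + (target - cum) / f * width
        cum += f

freqs = [2, 5, 6, 4, 3]   # equity holdings, intervals of width 1000 from 2000
print(grouped_percentile(25, 2000, 1000, freqs))         # Q1 = 3600.0
print(grouped_percentile(75, 2000, 1000, freqs))         # Q3 = 5500.0
print(round(grouped_percentile(95, 2000, 1000, freqs)))  # P95 = 6667
```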
9.2.5 Mode
In addition to mean and median, Mode is yet another measure of location or central tendency.
Mode is a French word meaning fashion. Accordingly, it is defined in such a way that it represents
the ‘fashion’ of the observations in a data set. Mode is defined as the ‘most fashionable’ value, which
the maximum number of observations have, or tend to have, as compared to any other value. For ex-
ample, if, in a group of 10 boys, 3 are wearing white shirts, 4 are wearing blue shirts, 1 is wearing
a red shirt and 2 are wearing yellow shirts, the fashion or mode could be taken as ‘blue’, as it is the
colour of the shirts of the maximum number of boys. Similarly, if the observations are 2, 4, 4, 5, 8, 8, 8,
and 9, the mode is the number 8 because 3 observations have this value. It is possible to have more
than one mode in a data. For instance, in the data comprising of the observations, 3, 5, 5, 9, 9 and
10, there are 2 modes viz. 5 and 9.
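For ungrouped data, Python's `statistics` module implements both notions directly (a sketch using the two small data sets above; `multimode` needs Python 3.8+):

```python
from statistics import mode, multimode

# Single mode: 8 occurs three times, more than any other value
print(mode([2, 4, 4, 5, 8, 8, 8, 9]))   # 8

# Bimodal data: both 5 and 9 occur twice
print(multimode([3, 5, 5, 9, 9, 10]))   # [5, 9]
```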
In a grouped data, the mode is calculated by the following formula:
    Mode = Lm + [(fm − f0) / (2fm − f0 − f2)] × wm        (9.11)
where,
Lm is the lower point of the modal class interval
fm is the frequency of the modal class interval
f0 is the frequency of the interval just before the modal interval
f2 is the frequency of the interval just after the modal interval
wm is the width of the modal class interval
The method of calculating mode from a grouped data is illustrated with the help of the equity
holdings data given in Illustration 9.2, and reproduced below:
It may be noted that the modal interval i.e., the class interval with the maximum frequency (6)
is 4000 to 5000. Further,
Lm = 4000
wm = 1000
fm = 6
f0 = 5
f2 = 4
Therefore,

    Mode = 4000 + [(6 − 5) / (2 × 6 − 5 − 4)] × 1000
         = 4000 + (1/3) × 1000 = 4000 + 333.3
         = 4333.3
Thus the modal equity holdings of the billionaires is Rs 4333.3 millions.
It may be recalled that the mean and median from the same data are 4550 and 4500.
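Formula (9.11) can be sketched as a one-line function (`grouped_mode` is an illustrative name; the inputs are the equity-data values listed above):

```python
def grouped_mode(L_m, w_m, f_m, f0, f2):
    """Mode from grouped data: L_m + (f_m - f0) / (2*f_m - f0 - f2) * w_m."""
    return L_m + (f_m - f0) / (2 * f_m - f0 - f2) * w_m

# Modal class 4000-5000: f_m = 6, preceding f0 = 5, following f2 = 4
print(round(grouped_mode(4000, 1000, 6, 5, 4), 1))  # 4333.3
```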
The calculation of mode is further explained through an illustration below.
Geometric Mean

Advantages:
- Makes use of the full data.
- Extreme large values have lesser impact.
- Useful for data relating to ratios and percentages.
- Useful for rate of change/growth.

Disadvantages:
- Cannot be calculated if any observation has the value zero or is negative.
- Difficult to calculate and interpret.

Median

Advantages:
- Simple to understand.
- Extreme values do not have any impact.
- Can be calculated even if values of all observations are not known, or the data has open-end class intervals.
- Used for measuring qualities and factors which are not quantifiable.
- Can be approximately determined with the help of a graph (ogives).

Disadvantages:
- Arranging values in ascending/descending order may sometimes be tedious.
- Sum of the observations cannot be found out if only the median is known.
- Not amenable to mathematical operations.
Thus, for the equity holdings data of 20 billionaires, the five-number summary is

    Minimum = 2717,  Q1 = 3600,  Median = 4500,  Q3 = 5500,  Maximum = 6874
9.3.1 Range
It is the simplest measure of variation, and is defined as the difference between the maximum and
the minimum values of the observations:
Range = Maximum Value – Minimum Value
The range for the above four sets of data are 0, 2, 20 and 80, respectively. This does appeal to
the common sense about the comparative presence of variation in the four sets of data.
However, since the range depends only on two values, viz. the minimum and the maximum,
and does not utilize the full information in the given data, it is not considered very reliable or ef-
ficient, as brought out in Chapter 11 on Statistical Inference.
However, because of the simplicity or ease of its calculation, it is widely used in control charts
for controlling the quality of manufactured items.
Coefficient of Scatter is another measure based on the range of a data. It is defined as the
ratio,
    Range / (Maximum + Minimum) = (Maximum − Minimum) / (Maximum + Minimum)
and is called “Relative Range”, “Ratio of the Range” or “Coefficient of Scatter”, and gives an
indication about variability in the data. For the above four sets of data, the coefficients of scatter
are 0, 0.02, 0.2 and 0.8, respectively. It may be noted that, being the ratio, this measure is a pure
number and has no unit of measurement.
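Both measures are one-liners in code. A sketch (the data set here is illustrative, chosen so that the range is 20 and the coefficient of scatter 0.2, matching one of the values quoted above):

```python
def value_range(xs):
    """Range = Maximum - Minimum."""
    return max(xs) - min(xs)

def coeff_of_scatter(xs):
    """(Max - Min) / (Max + Min): a unit-free relative range."""
    return (max(xs) - min(xs)) / (max(xs) + min(xs))

data = [40, 50, 60]
print(value_range(data))       # 20
print(coeff_of_scatter(data))  # 0.2
```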
within which the middle 50 % observations lie. With reference to the equity holdings data, it works
out to
Q3 – Q1 = 5500 – 3600 = 1,900
This implies that 50% of the billionaires have equity holdings between 3,600 to 5,500, the range
of their holdings being 1,900.
However, a much more popular measure of variation is Semi Inter-Quartile Range or Quartile
Deviation, and is defined as
    (Q3 − Q1) / 2        (9.12)
This is especially useful if sample mean cannot be calculated because of the open ended class
interval(s).
For the equity holdings data in Table 9.1, its value, vide Section 9.2.3, is (5500 − 3600)/2 =
1900/2 = 950.
    Variance = (1/n) Σ (xi − x̄)²        (9.13)

It is also written as

             = (1/n) Σ xi² − x̄²        (9.14)
In real life situations, whenever a data is collected or obtained as a sample from a popula-
tion, the variance of the sample values of the observations is defined as
    s² = Σ (xi − x̄)² / (n − 1)
for certain theoretical reasons, discussed in detail in Chapter 11 on Statistical Inference.
In fact many of the calculating devices provide variances for both population as well as
sample. Even the template titled ‘Elementary Statistics Chapter 9’ for this chapter gives
values of both the variances. However, in this chapter for all the manual calculations, we
have assumed the given data as the population itself, unless mentioned as sample s.d.
For calculation of variance, we can use either of the above two formulas. If it is easy to compute
(xi − x̄), the first formula can be used; otherwise, the second formula is to be used.
The standard notation for variance is σ², where σ (small sigma) is the Greek letter analogous to the small s in
English. It may be recalled that Σ (capital sigma) is the Greek letter analogous to the capital S in English.
The square root of σ², i.e. σ, is known as the standard deviation. For the above example,

    Variance (σ²) = (1/3) Σ (xi − x̄)²,  the sum running over i = 1 to 3
                  = (1/3) × 2 = 2/3
                  = 0.67

    Standard Deviation (σ) = √0.67 = 0.82
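These values can be checked with Python's `statistics` module, which provides the population variance and standard deviation directly (a sketch for the observations 1, 2 and 3):

```python
from statistics import pvariance, pstdev

data = [1, 2, 3]
# Population variance: divisor n (use variance()/stdev() for the n-1 sample forms)
print(pvariance(data))         # 0.666...
print(round(pstdev(data), 2))  # 0.82
```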
There is another measure of deviation known as the Root Mean Square Deviation, abbreviated as
RMSD, and defined as follows:

    RMSD = √[(1/n) Σ (xi − A)²]

where A is some arbitrary value. It can be proved that

    RMSD² = Variance + d²,  where d = A − x̄

Hence, the RMSD is minimum when the deviations are measured from the arithmetic
mean, in which case it equals the standard deviation.
It may be noted that, as mentioned earlier, while Mean Deviation is minimum when the
deviations are measured from the median, the standard deviation is minimum when the devia-
tions are measured from the mean.
Impact of Change in Origin and Scale
If a constant c is added to each of observations x1, x2,….., xn, the standard deviation of new obser-
vations (x1 + c), ….(xn + c) is unchanged.
For example, as shown above, standard deviation of the observations 1, 2 and 3 is 0.82. Suppose
10 is added to all the three observations so that the new observations become 11, 12, and 13. It can
be verified that the standard deviation of 11, 12 and 13 remain unchanged as 0.82.
Thus, in general, if a constant is added or subtracted from original observation, the s.d. of
new observations remains the same as of original observations.
For example, if s.d. of the variable x is 7, then the s.d. of the variable x + 5 is also 7.
However, if all the original observations are multiplied by a constant, say c, the s.d. of new
observations is c times the s.d. of original observations.
As another example, if the s.d. of variable x is 7, then the s.d. of variable 5x is 5 × 7 = 35, and
the s.d. of variable x/5 is 7/5 = 1.4.
As yet another example, since the s.d. of 1, 2 and 3, as shown above, is 0.82, the s.d. of these ob-
servations each multiplied by 100, viz. 100, 200 and 300, also gets multiplied by 100, and is
100 × 0.82 = 82. Now, if these observations are each divided by 10, so that the new observations
are 10, 20 and 30, the s.d. of these observations also gets divided by 10, and is 8.2.
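Both effects are easy to verify numerically (a sketch for the observations 1, 2 and 3):

```python
from statistics import pstdev

x = [1, 2, 3]
print(round(pstdev(x), 2))                     # 0.82
# Change of origin: adding a constant leaves the s.d. unchanged
print(round(pstdev([v + 10 for v in x]), 2))   # 0.82
# Change of scale: multiplying by c multiplies the s.d. by c
print(round(pstdev([v * 100 for v in x]), 2))  # 81.65 (i.e. 100 x 0.8165)
```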
(b) Calculation of Standard Deviation from Grouped Data
The formulae for calculation of the variance and s.d. from grouped data are as follows:

    Variance = Σ fi (xi − x̄)² / Σ fi        (9.15)

             = Σ fi xi² / Σ fi − x̄²        (9.16)

    Standard Deviation = √Variance
Calculation of Variance and Standard Deviation for the Billionaires Data Given in
Illustration 9.2
Class Interval Mid Point of Frequency fi xi f i x i2 (xi – x ) (xi – x )2 fi (xi – x )2
Class Interval (xi) (fi)
2000–3000 2500 2 5000 12500000 -2050 4202500 8405000
3000–4000 3500 5 17500 61250000 -1050 1102500 5512500
4000–5000 4500 6 27000 121500000 -50 2500 15000
5000–6000 5500 4 22000 121000000 950 902500 3610000
6000–7000 6500 3 19500 126750000 1950 3802500 11407500
Sum 20 91000 443000000 10012500 28950000
Average (x̄) = 4550                                          Variance = 1447500
Thus the variance for the given data by using the formula (9.15) is 1447500 and Standard
Deviation is 1203.1.
The variance can also be calculated by the formula (9.16) by substituting the relevant val-
ues.
It may be verified that the standard deviation of the equity holdings for the ungrouped data is 1217.8.
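Formula (9.15) can be sketched as a short function that reproduces the table's result (the function name is illustrative; the midpoints and frequencies are from the table above):

```python
def grouped_variance(midpoints, freqs):
    """Sum f_i (x_i - mean)^2 / Sum f_i, using class midpoints x_i."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    return sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n

mids = [2500, 3500, 4500, 5500, 6500]
counts = [2, 5, 6, 4, 3]
var = grouped_variance(mids, counts)
print(var)                   # 1447500.0
print(round(var ** 0.5, 1))  # 1203.1  (the standard deviation)
```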
Combining Variances of Two Populations Suppose a set of data has n1 values whose variance
is σ1², another set of data has n2 values whose variance is σ2², and both sets of data have the
same mean. If these two sets of data are combined to have n1 + n2 values, then the variance
σ² of the combined data is

    σ² = (n1σ1² + n2σ2²) / (n1 + n2)
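A sketch of this pooling rule (the function name and the sample numbers are illustrative, not from the text):

```python
def combined_variance(n1, var1, n2, var2):
    """Variance of two combined data sets that share the same mean:
    (n1*var1 + n2*var2) / (n1 + n2)."""
    return (n1 * var1 + n2 * var2) / (n1 + n2)

# e.g. 10 values with variance 4 pooled with 30 values with variance 8
print(combined_variance(10, 4.0, 30, 8.0))  # 7.0
```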
Interpretation of S.D.
It may be noted that while s.d. is most useful in describing or analysing a data, there is no
physical interpretation of s.d. like that of mean or median. It is useful only for comparative
purposes. For example, if s.d. of population ‘A’ is 4 and the s.d. of population ‘B’ is 6, we
conclude that dispersion among data values in population ‘B’ is more than dispersion among
data values in population ‘A’.
Thus, if standard deviation alone were to be taken as an index of variation, one would conclude
that all the three sets have the same extent of variation. But it may be noted that the variation in the
first set is around 1 in an average value of 10, in the second set around 1 in an average value
of 50, and in the third set around 1 in an average value of 100. Thus, it appears intuitively that
the variation is least in the third set and maximum in the first set. It was, therefore, felt necessary
to evolve a measure of variation which takes this aspect into account, and which is a pure
number without any units. Such measures are described in the next paragraph.
Coefficient of Variation (C.V.) or Coefficient of Dispersion (C.D.) or Relative Dispersion is
a statistical measure introduced by Karl Pearson. It helps in studying the relative dispersion of two
or more sets of data. It is defined as the ratio of standard deviation to the mean, and is, usually,
expressed in % form. If m and σ are the mean and standard deviation of a data set, then its

    C.V. = (σ/m) × 100        (9.17)

For example, if m = 10 and σ = 2, then

    C.V. = (2/10) × 100 = 20%
For the above three sets of data with three observations each, the coefficients of variation are:
    = (0.82/10) × 100 = 0.0820 × 100 = 8.20% (for values 9, 10 and 11)
    = (0.82/50) × 100 = 0.0164 × 100 = 1.64% (for values 49, 50 and 51)
    = (0.82/100) × 100 = 0.0082 × 100 = 0.82% (for values 99, 100 and 101)
Thus, the maximum value of C.V. is for the first set, and the minimum value is for the third set,
and it gives true extent of variation in the three sets of data.
For the equity holdings data, given in Illustration 9.1, C.V. is worked out below.
For ungrouped data,
    C.V. = (1217.8/4565.4) × 100 = 26.7%
For grouped data,
    C.V. = (1203.1/4550) × 100 = 26.44%
Incidentally, because of the above relationship among C.V., m and s, given any two of these quanti-
ties, the third quantity can be found easily.
For example, if the mean is 10 and C.V. is 5, then σ can be found as follows:

    5 = (σ/10) × 100

Therefore, 50 = 100σ,

    or σ = 50/100 = 0.5
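A sketch of these calculations (the helper name is illustrative; the ungrouped equity-data figures are those quoted in the text):

```python
def coeff_of_variation(mean, sd):
    """C.V. = (s / m) * 100, expressed as a percentage."""
    return sd / mean * 100

print(coeff_of_variation(10, 2))                     # 20.0
print(round(coeff_of_variation(4565.4, 1217.8), 1))  # 26.7 (ungrouped equity data)

# Given C.V. and mean, recover the s.d.: s = C.V. * m / 100
print(5 * 10 / 100)  # 0.5
```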
Interpretations of C.V. The coefficient of variation can have different interpretations in dif-
ferent applications.
While studying the performance of individuals (like cricketers) and teams, coefficient of varia-
tion can be defined as a measure of consistency in performance. Incidentally, it may be noted that,
lesser the coefficient of variation, more the consistency. It may be interesting to have a look at the
runs scored by some of the Indian batsmen in the Cricket World Cup-2003 as also the measures of
their consistency in the box given below:
[Table: match-wise runs of Indian batsmen in World Cup 2003 — against AUS (FINAL), KEN (SEMI), NEWZ, SRILA, KEN, PAK, ENG, NAMI, ZIM, AUS and HOLL — with each player's total runs, number of matches, average, SD and CV]
It may be noted that Dravid was the most consistent Indian batsman in the World Cup 2003.
While studying the per capita income of different states in India or different countries of the world,
it can be defined as a measure of disparity. Disparity indices can be used for various developmental
parameters like literacy, life expectancy, income, bank credit as a proportion of deposits, etc. This can
be attempted among districts, states, countries, etc.
While studying the return on equity capital invested in some shares, it could be defined as a
measure of volatility or risk (the lesser the C.V., the lesser the risk).
While studying the workload on different counters in banks/booking offices, or for studying wages
in different organisations, or the compensation offered to MBA students in different institutes, etc., it
may be defined as a measure of uniformity.
If the distribution is skewed, the extent of skewness can be measured by the following measure,
known as "Bowley's Coefficient of Skewness":

    Skb = [(Q3 − Med) − (Med − Q1)] / (Q3 − Q1)        (9.18)
It varies from –1 to +1.
If this measure is greater than zero, it is called positively skewed.
If this measure is lesser than zero, it is called negatively skewed.
Pearson's Measure of Skewness
Karl Pearson has also defined a measure of skewness as follows:

    = (Mean − Mode) / σ

If calculation of the mode is difficult, it can be calculated as

    = 3(Mean − Median) / σ
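Both skewness measures can be sketched in a few lines (function names are illustrative; the quartiles, median, mean and s.d. used are the grouped equity-data values derived earlier):

```python
def bowley_skewness(q1, med, q3):
    """((Q3 - Med) - (Med - Q1)) / (Q3 - Q1); ranges from -1 to +1."""
    return ((q3 - med) - (med - q1)) / (q3 - q1)

def pearson_skewness(mean, median, sd):
    """Median-based form: 3 * (Mean - Median) / sd."""
    return 3 * (mean - median) / sd

# Grouped equity holdings: Q1 = 3600, Median = 4500, Q3 = 5500,
# Mean = 4550, s.d. = 1203.1 — both measures are slightly positive
print(round(bowley_skewness(3600, 4500, 5500), 3))    # 0.053
print(round(pearson_skewness(4550, 4500, 1203.1), 3)) # 0.125
```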
Such a standardised variable, usually denoted by the letter 'u' or 'z', has mean 0 and standard
deviation 1, as shown below for the observations 1, 2 and 3 (m = 2, σ = √(2/3)):

    xi            xi − m    (xi − m)²    zi = (xi − m)/σ    zi²
    1             −1        1            −√(3/2)            3/2
    2             0         0            0                  0
    3             1         1            √(3/2)             3/2
    Sum = 6       0         2            0                  3
    Average = 3   0         2/3          0                  3/3 = 1
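The standardisation can be verified numerically (a sketch for the observations 1, 2 and 3):

```python
from statistics import mean, pstdev

x = [1, 2, 3]
m, s = mean(x), pstdev(x)
z = [(v - m) / s for v in x]               # standardised values

print([round(v, 2) for v in z])            # [-1.22, 0.0, 1.22]
# The z-values have mean 0 and standard deviation 1
print(round(mean(z), 2), round(pstdev(z), 2))  # 0.0 1.0
```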
    A = (65/80) × 100 = 81.25%

    B = (80/85) × 100 = 94.12%

    C = (85/95) × 100 = 89.47%
The merit list was prepared on the basis of these standardised marks derived for all applicants.
It may be noted that standardised marks for the student ‘B’ is 94.12% as compared to 89.47% for
student ‘C’. Thus, the performance of student ‘B’ may be considered better than that of ‘C’. However,
such comparison of students is valid for the limited purpose of comparing relative performance in
the examination, and does not reflect the relative intelligence of students.
In the above snapshot of the template for ungrouped data, one could enter the data in the cell A4
downwards. Once the data is entered, the template automatically calculates the values for different
measures given in the boxes above.
The various measures given in the template are:
Measure Description
Sum Sum of all observations
Count Number of observations
Max Maximum of the observations
Min Minimum of the observations
Arithmetic Mean Mean of the observations
Median Median of the observations
Mode Mode of the observations
Geometric Mean Geometric mean of the observations
Quartile 1 First quartile of the observations
Quartile 2 Second quartile or Median of the observations
Quartile 3 Third quartile of the observations
Percentile Percentile of the observations for given percentage
Range Range of the observations
IQR Inter Quartile Range of the observations
Population variance Population Variance of the observations
Sample Variance Sample Variance of the observations
Population Std Deviation Population Std Dev of the observations
Sample Std Deviation Sample Std Dev of the observations
Coefficient of Variation Coefficient of variation of the observations
Karl Pearson's Skewness (If Mode Defined) Karl Pearson's Skewness of the observations*
Five Number Summary Five number summary of the observations
*Note: Karl Pearson’s Skewness can be calculated only if mode is defined.
We may recall the Illustration 9.2 relating to Billionaire’s data. We have solved this illustration
with the help of the above template.
The above snapshot gives the worksheet containing the template for grouped data. In this template,
if one enters the lower limit, the upper limit and the frequency of the class interval, the template
would automatically calculate the rest of the columns like mid value, f.x, cumulative frequency, and
the basic statistics.
This template can be used to solve all the problems given in this chapter.
EXERCISES
1. Following is the list of countries having a GDP of more than one trillion dollars for the year
2006-07.
Calculate the coefficient of variation among the above countries with respect to GDP. Interpret
the coefficient of variation in the given context.
If the GDP of each country increases by 10% next year, what would be the value of this
coefficient of variation?
2. The following data relates to sales of top market brands among pain killers in India.
Calculate the following indicators for all the ten countries as a whole:
(i) Average life expectancy
(ii) Average school enrollment
(iii) Average GDP per capita
(iv) Average percentage of urban population
5. Jyoti and Anuj are the two eligible candidates for the position of Chief Manager (HRD). Their
overall rating (expressed in %) for the last five years are summarised below.
Find out the extent to which the incentive scheme has been successful. Also, has the variability
in ‘Number of Days Outstanding’ reduced?
8. The following Table gives the return on investment (ROI) of 60 companies.
Find the mean, median, mode, and coefficient of variation for the ROI. Give the rating, on a
scale from 0 to 100, to a company whose ROI is 12%.
9. Given below is the pattern of deposits of a bank and the corresponding interest rates. Calculate
the appropriate average interest rate cost of deposits.
    Maturity period         Interest rate (%)    Deposits
    Less than 6 months      5.0                  61
    6 months – 1 year       6.0                  46
    1 year – 3 years        7.0                  32
    3 years – 5 years       8.0                  10
    Over 5 years            9.5                  32
10
Simple Correlation and Regression

Contents
1. Introduction
2. Scatter Diagram
3. Simple Correlation Coefficient and Coefficient of Determination
4. Rank Correlation
5. Regression Analysis
   (a) Principle of Least Squares
   (b) Linear Regression Equation
6. Beta of a Stock
7. Using Excel
LEARNING OBJECTIVES
This chapter provides the requisite knowledge and expertise to:
Understand the relevance and applications of the relationship between two variables. For example,
one could explore whether the advertising expenses of a company are related to the overall sales of the
company.
Determine the nature of the mathematical relationship, if it exists; for example, whether it is linear, i.e.
a straight line.
Derive the exact equation, if the relationship is found to be linear or otherwise. The equation gives
an idea of the extent to which advertising can influence the sales.
Do similar analysis and derive another equation to measure the extent of the influence of another
variable like R&D expenses on the sales.
Compare the influences of two variables on one variable like advertising and R&D expenses on
sales.
Forecast one variable with the help of the other variable, like forecasting the sales given the
budget for advertising or the budget for R&D.
Measure the extent of association between two variables which are available in categorical form.
For example, to assess the taste of a new toothpaste among users, the users may be divided into three
categories viz. "young", "middle aged" and "old", and the response may be categorised as "liked",
"indifferent" and "did not like".
Measure ‘rank correlation’ between two variables whose ranks on some criteria are available
rather than their numerical values. For instance, ranks of 10 companies as per turnover and profit
in a year could be analysed to calculate correlation between ‘sales turnover’ and ‘profit’ of these
companies. Similarly, overall ranks of ten companies could be studied for the years 2005 and
2006 to assess correlation in the performance of these companies in the years 2005 and 2006.
Relevance
The new CEO of a “Healthcare” pharmaceutical company called a meeting of the Heads of
various departments to discuss the strategy for future. While he expressed satisfaction over
the growing sales of the company, he also emphasised the need for giving a further boost to
the sales and image of the company. The Head of the R&D unit suggested more funds for
innovating new products and improving the existing ones. He pointed out that R&D had made the
most significant contribution to the sales of the company. The Head of the Marketing Department
emphasized the importance of marketing strategies for boosting the sales of the company.
He, therefore, wanted more funds to be made available for the purpose. The Head of the HRD
Department suggested the need for more staff, as also new training programmes for improving the
sales skills and motivation of the sales force. This, he said, would improve the sales of the
company very significantly. The CEO agreed, in principle, with them but wanted some analysis of
quantitative facts and figures to evaluate the claims of the Heads of Departments, and commit
funds for the new strategies. The job was entrusted to a consultant who analysed the data, using
statistical techniques, in general, and correlation and regression analysis, in particular to assess
the impact of R&D, Marketing and HRD initiatives in boosting the sales of the company, and
thus facilitated the CEO in taking appropriate decisions based on analytical approach.
x y
x1 y1
x2 y2
— —
— —
xi yi
— —
xn yn
These values of x and y can be depicted with the help of the rectangular co-ordinate system by
plotting the observed pairs of values of x and y, as shown below. This diagram is known as the
scatter diagram.
Fig. 10.1 Scatter Diagram of Sales and Advertisement Expenses in Various Years
The diagram shows the joint variation among the pairs of values (x1, y1), (x2, y2), ……, (xn, yn),
and gives an idea of the relationship between x and y.
If the points are scattered around a straight line, the correlation is linear (as shown above), and if
the points are scattered around a curve, the correlation is non-linear as shown below. If the points
are scattered all over without any pattern, there may not be any correlation between the variables x
and y.
Two variables may have a positive correlation or negative correlation or they may be uncorrelated.
The variables are said to be positively correlated, if they tend to change in the same direction i.e.
if they tend to increase or decrease together, as shown below:
On the other hand, two variables are said to be negatively correlated if they tend to change in the
opposite direction i.e. when one increases, the other decreases or if one decreases the other increases.
For example the demand of a non-essential product goes down if its price is increased, as depicted
below:
One interesting feature to be noted in day-to-day life is that a mother's joy increases as the level
of milk in the feeding bottle decreases! Yet another example of negative correlation could be the
correlation between contentment and the urge to progress in a person!
However, two variables are uncorrelated when they tend to change with no connection to each
other as shown below in the case of Intelligent Quotient (I.Q.) and Height of persons.
The following scatter diagrams, indicating positive, negative and zero correlation, are placed together
for easy comprehension.
Fig. 10.6 Scatter Diagrams Depicting Positive, Negative and Zero Correlations
It may be noted that all the points in the scatter diagram tend to lie near a line or a curve with
a positive slope when two variables are positively correlated. Similarly, the points in the scatter
diagram tend to lie near a line or curve with a negative slope when two variables are negatively
correlated. However, in case of zero correlation, the points do not tend to lie on a line or a curve.
In the above Fig. 10.6, both the scatter diagrams with positive and negative correlation indicate
linear relationship. However, there are situations when the scatter diagram suggests curvilinear cor-
relation. A couple of situations are shown below:
Fig. 10.7 Scatter Diagram Indicating Relationship between Levels of Stress and Performance
It may be noted that as the stress increases, the performance also increases, but beyond a point
when stress increases further, the performance drops. Similar curvilinear pattern is observed while
studying relationship between efforts and results, as indicated below.
Fig. 10.8 Scatter Diagram Indicating Relationship between Efforts and Results
It may be noted that initially when the efforts increase, the results also improve; but after a certain
level, when efforts are increased further, the results start declining. For example, we may improve
our results by studying for more and more hours; but can we improve results by increasing the study
time to, say 20 or more hours?
In this chapter, however, we confine to only linear correlation.
r = Σ(xi − x̄)(yi − ȳ) / √{Σ(xi − x̄)² · Σ(yi − ȳ)²}    (10.3)
However, if the variables xi and yi are measured from their means, i.e. x̄ and ȳ respectively, the
formula for r can be written as

r = (1/n) ΣXiYi / √{((1/n) ΣXi²) · ((1/n) ΣYi²)}
  = ΣXiYi / √{(ΣXi²)(ΣYi²)}    (10.4)

where Xi = (xi − x̄) and Yi = (yi − ȳ) are the variables measured from their means.
The correlation coefficient can also be calculated by the following formula derived from formula
(10.3)
r = (Σxiyi − n x̄ȳ) / √{(Σxi² − n x̄²)(Σyi² − n ȳ²)}    (10.5)

using the results:
Σ(xi − x̄)(yi − ȳ) = Σxiyi − n x̄ȳ
Σ(xi − x̄)² = Σxi² − n x̄²
Σ(yi − ȳ)² = Σyi² − n ȳ²
It may be noted that the correlation coefficient is just a ratio, and has no dimension like
time, money, etc. It is independent of the dimensions of variables x and y.
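The dimensionless character of r can be checked numerically. The sketch below (with made-up data) computes r from formula (10.5) and confirms that rescaling x, say from metres to centimetres, leaves r unchanged:

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient via formula (10.5):
    r = (Σxy − n·x̄·ȳ) / √((Σx² − n·x̄²)(Σy² − n·ȳ²))"""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
    sxx = sum(a * a for a in x) - n * xbar * xbar
    syy = sum(b * b for b in y) - n * ybar * ybar
    return sxy / sqrt(sxx * syy)

x = [1.2, 2.5, 3.1, 4.8, 5.0]    # made-up observations
y = [2.0, 4.9, 6.2, 9.5, 10.1]

r1 = pearson_r(x, y)
r2 = pearson_r([100 * v for v in x], y)  # x rescaled, e.g. metres → centimetres
assert abs(r1 - r2) < 1e-12              # r is unaffected by the change of units
```

The same invariance holds for shifts of origin, which is exploited later in the chapter to simplify hand calculation.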
Illustration 10.1
The following data gives Sales and Net Profit for some of the top auto-makers during the quarter
July–September 2006. Find the correlation coefficient.
Company Average Sales Year to Year Average Net Profit Year to Year
Estimates (Rs Crores) Growth (%) Estimates (Rs Crores) Growth (%)
Tata Motors 6,484.8 36 466.0 38
Hero Honda 2,196.5 1.4 224.2 6
Bajaj Auto 2,444.7 31 345.4 19
TVS Motor 1,032.9 31 35.1 10
Bharat Forge 461.6 23 63.4 22
Ashok Leyland 1,635.8 31 94.7 26
M&M 2,365.5 24 200.6 28
Maruti Udyog 3,426.5 13 315.7 20
Company Average Sales Estimates (Rs Hundred Crores) Average Net Profit Estimates
(Rs Ten Crores)
Tata Motors 65.00 47
Hero Honda 22.00 22
Bajaj Auto 24.00 34.5
TVS Motor 10.00 3.5
Bharat Forge 5 6
Ashok Leyland 16.00 9
M&M 24.00 20
Maruti Udyog 34.00 32
If one wants to calculate the correlation coefficient between the average sales and net profit of
these automobile companies, one may take the average sales as variable x and average net profit as
y. The scatter diagram is given below:
It may be noted that all the points representing the pairs of values lie around a straight line with
a positive slope. This indicates that the correlation is positive and is linear.
For calculating the correlation coefficient by the formula (10.1), one requires the following
expressions to be calculated:
Σxi (for calculating x̄), Σyi (for calculating ȳ), Σxi², Σyi² and Σxiyi.
The calculations are detailed below:
Automobile Company Average Sales xi Average Profit yi xi2 yi2 x iy i
Tata Motors 65 47 4,225 2209 3055
Hero Honda 22 22 484 484 484
Bajaj Auto 24 34.5 576 1190.25 828
TVS Motor 10 3.5 100 12.25 35
Bharat Forge 5 6 25 36 30
Ashok Leyland 16 9 256 81 144
M&M 24 20 576 400 480
Maruti Udyog 34 32 1,156 1024 1088
Sum (Σ) 200 174 7,398 5,436.50 6,144
Average: x̄ = 200/8 = 25, ȳ = 174/8 = 21.75

r = (6144 − 4350) / √{(7398 − 5000) (5436.5 − 3784.5)}
  = 1794 / √(2398 × 1652)
  = 1794 / 1990.35
  = 0.90135
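The arithmetic above can be cross-checked in a few lines of Python (a sketch using the scaled sales and profit figures from the table):

```python
from math import sqrt

x = [65, 22, 24, 10, 5, 16, 24, 34]      # average sales (Rs hundred crores)
y = [47, 22, 34.5, 3.5, 6, 9, 20, 32]    # average net profit (Rs ten crores)
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n      # 25 and 21.75

# Formula (10.5): numerator Σxy − n·x̄·ȳ, denominator √((Σx² − n·x̄²)(Σy² − n·ȳ²))
num = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar          # 6144 − 4350 = 1794
den = sqrt((sum(a * a for a in x) - n * xbar ** 2) *
           (sum(b * b for b in y) - n * ybar ** 2))               # √(2398 × 1652)
r = num / den
print(round(r, 5))  # 0.90135
```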
However, in real life situations, when the variables are random variables, the correlation coeffi-
cient does not equal the value 1 (+1 or –1). Its value nearing 1, indicates strong linear relationship,
and its value nearing 0 indicates absence of linear relationship.
It is to be emphasized that the value of the correlation coefficient r indicates the extent or intensity
of only linear relationship. A low or near-zero value of r merely means that the relationship
is not linear; there could be other types of relationship.
The real reason was that over the period, the population was increasing leading to increased
number of marriages. Also, the yield of potatoes was increasing due to better methods of cultivation
and better quality of seeds and fertilisers. Thus both the variables were increasing due to different
factors and not due to interdependence on each other. Such correlation is termed as spurious or
nonsense correlation.
Simple Correlation and Regression 10.11
Yet another example could be the correlation between the number of students getting graduate
degrees every year and the number of auto accidents in the country!
It is, therefore, necessary that before using correlation analysis in any situation, and
using it for studying cause – effect relationship, one must ascertain that there is some
prima-facie reason to believe that the two chosen variables are interrelated. However, if
two variables are observed to be increasing/decreasing together or if one is increasing,
the other is decreasing or vice-versa, time and again, then even if there is no apparent
known relationship, it may be worth investigating the reasons for the observed pattern.
In this example, we may take I.Q. as the independent variable as x, and Marks in Decision Sci-
ence as dependent variable y. This is so, because the marks obtained, would generally depend on
the I.Q. of a student.
is given in the form of the ranks of two variables based on some criterion. In this Section, we
discuss the correlation between ranks of the two variables rather than their absolute values.
For example, instead of the final grades and salary offered in campus placement of the top 10
students, one may have the data about their final grade ranks from 1 to 10, and rank in terms of
salary offered, from 1 to 10. Such data, for a Management Institute for the Batch (Marketing) of
2006, is given below. Incidentally, all these students are those who had no work experience:
With this type of data, we can have an idea of the correlation between ‘Grade’ and ‘Salary’ of the
students through a measure called ‘Spearman Rank Correlation’, introduced by Charles Spearman
in 1904, and described in this section.
Spearman defined the rank correlation as
rs = 1 − 6Σdi² / {n(n² − 1)}    (10.6)
where di is the difference in the ranks of the ith individual or unit, and n is the number of individuals
or units. In the above example, n = 10. The value of d1 for Simran is 0, the value of d2 for Sajay
is 3 − 2 = 1, the value of d3 for Saluni is 3 − 2 = 1, and the value of d4 for Sumil is 4 − 4 = 0. It
may be noted that while taking the difference between the ranks, only its absolute value matters;
the sign is immaterial, since the differences are squared in the formula.
For calculating the value of rs, the following table is prepared:
rs = 1 − 6Σdi² / {n(n² − 1)}
   = 1 − (6 × 18) / {10 × (100 − 1)}
   = 1 − 108/990
   = 1 − 0.11
   = 0.89
Thus, the value of Spearman’s rank correlation between Final Grade and Salary offered is 0.89.
The value of rs , like correlation coefficient, lies between –1 and +1. The value + 1 implies that
there is perfect correlation in the ranks i.e. the ranks are the same for all the individuals/units. The
value –1 implies that the two sets of ranking are just reverse of each other. Thus, if there are three
students ranked 1, 2 and 3 on one parameter, they would be ranked 3, 2 and 1 on the second pa-
rameter to get a rank correlation of –1. In the above example, if the data were as follows:
i.e. Simran, who was ranked first as per final grade, was ranked last (#10) as per salary offered; Sajay,
who was ranked second as per final grade, was ranked #9; and so on. The Spearman’s rank correlation
would then have been –1. This can be verified by calculating the rank correlation for the above table.
Sometimes, it may happen that two individuals or entities may have the same rank. For instance,
in the above example, suppose Saluni and Sumil had the same CGPA; in that case both of them have
to be given the same rank. This is done by splitting the total of their ranks equally between them.
Thus, both Saluni and Sumil would get the rank (3 + 4)/2 = 3.5. The rest of the methodology remains
the same. The formula does undergo a change in such cases, but if the number of observations having
the same rank is not large, the above formula is a good approximation. It may be verified that in
the above case, the rank correlation would work out to be 0.881. As a further possibility, suppose
three students, say Simran, Sajay, and Saluni had the same CGPA, the rank allocated to them would
have been total of their positions viz. 1 + 2 + 3 divided by 3. Thus each one would have been as-
signed the rank 2. It may be verified that in such a case, the rank correlation would work out to be
0.8807.
The above methodology assures justice to all the individuals whose ranks are tied without chang-
ing the sum of ranks for all the individuals or entities as the case might be.
The Spearman’s correlation coefficient is, in general, easier to calculate than Karl Pearson’s cor-
relation coefficient. However, it is less reliable. Besides, it has the following limitations:
(i) It is quite cumbersome to calculate if the number of observations is large.
(ii) It cannot be calculated from grouped data. Incidentally, Pearson’s correlation coefficient can
be calculated from grouped data, but that has not been discussed in this book.
However, it does not require any assumption about the distributions of x and y, while Pearson’s
correlation coefficient requires the assumption of normal distributions for both x and y. That is how,
rank correlation is said to be a distribution free or non-parametric method of assessing correlation.
This is a positive point, if one is not sure of the distributions of the variables x and y.
It is interesting to note that the value of Spearman’s rank correlation can be found by taking
the two ranks as two variables x and y, and calculating Pearson’s correlation coefficient be-
tween x and y. However, Spearman’s formula is simpler to calculate as compared to Pearson’s
formula.
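This equivalence is easy to verify numerically. In the sketch below (made-up ranks, no ties), formula (10.6) and Pearson’s formula applied to the ranks give the same value:

```python
from math import sqrt

gx = [1, 2, 3, 4, 5]    # hypothetical ranks on final grade
gy = [2, 1, 4, 3, 5]    # hypothetical ranks on salary offered
n = len(gx)

# Spearman's rank correlation via formula (10.6)
d2 = sum((a - b) ** 2 for a, b in zip(gx, gy))
rs = 1 - 6 * d2 / (n * (n * n - 1))

# Pearson's correlation coefficient computed on the same ranks
xbar, ybar = sum(gx) / n, sum(gy) / n
num = sum((a - xbar) * (b - ybar) for a, b in zip(gx, gy))
r = num / sqrt(sum((a - xbar) ** 2 for a in gx) *
               sum((b - ybar) ** 2 for b in gy))

assert abs(rs - r) < 1e-12   # the two coincide when there are no ties
print(rs)  # 0.8
```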
We have discussed several examples of rank correlation to indicate its application in a variety
of situations, and because one can have a quick assessment of the correlation between two
variables without the use of any calculating aid.
Example 10.1
As per a study, the following are the ranks of priorities for ten factors taken as ‘Job Commitment
Drivers’ among the executives in Asia Pacific (AP) and India. Calculate the rank correlation between
priorities of ‘Job Commitment Drivers’ among executives from India and Asia Pacific.
2006 2005
Rank Company Market Capitalisation Rank Company Market Capitalisation
(Rs Crores) (Rs Crores)
1 RIL 1,62,971 1 ONGC 1,46,835
2 ONGC 1,61,536 2 RIL 1,08,011
3 Infosys 1,12,180 3 NTPC 85,094
4 NTPC 1,07,521 4 Infosys 71,550
5 TCS 1,05,973 5 TCS 69,022
7 Bharti 90,159 6 Bharti 64,717
8 Wipro 78,164 7 Wipro 54,923
9 ITC 69,886 8 Indian Oil 53,431
10 Indian Oil 66,309 9 SBI 48,606
11 ICICI Bank 61,628 10 ITC 48,202
12 BHEL 56,969 11 ICICI Bank 38,932
13 SBI 54,177 12 HLL 38,686
6 HLL 51,222 13 BHEL 28,776
14 HDFC 37,432 14 HDFC 24,516
15 L&T 35,613 15 SAIL 23,688
16 Tata Motors 34,708 16 Tata Motors 20,332
17 SAIL 34,551 17 L&T 19,479
Note: Reliance Communications, Suzlon Energy and Bajaj Auto (ranked 8th, 16th and 20th) are not
included in 2006, and GAIL, HDFC Bank and Tata Steel (ranked 16th, 17th and 18th) are not included
in 2005, as these companies were not ranked within the top 20 in both the years.
Use the above data to calculate Spearman’s rank correlation as well as Pearson’s correlation
coefficient and comment on the results.
It may be verified that while the Pearson’s correlation coefficient is 0.9897, the Spearman’s cor-
relation coefficient is 0.9314.
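A sketch of the computation in Python follows. It re-ranks the market-cap values directly, so the Spearman value can differ somewhat from the one quoted above, which uses the printed rank columns:

```python
from math import sqrt

# (2006 market cap, 2005 market cap) in Rs crores, per the table
caps = {
    "RIL": (162971, 108011), "ONGC": (161536, 146835), "Infosys": (112180, 71550),
    "NTPC": (107521, 85094), "TCS": (105973, 69022), "Bharti": (90159, 64717),
    "Wipro": (78164, 54923), "ITC": (69886, 48202), "Indian Oil": (66309, 53431),
    "ICICI Bank": (61628, 38932), "BHEL": (56969, 28776), "SBI": (54177, 48606),
    "HLL": (51222, 38686), "HDFC": (37432, 24516), "L&T": (35613, 19479),
    "Tata Motors": (34708, 20332), "SAIL": (34551, 23688),
}
x = [v[0] for v in caps.values()]
y = [v[1] for v in caps.values()]

def pearson(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    num = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    return num / sqrt(sum((a - xb) ** 2 for a in x) *
                      sum((b - yb) ** 2 for b in y))

def ranks(v):
    """Rank 1 = largest value (no ties in this data)."""
    order = sorted(v, reverse=True)
    return [order.index(a) + 1 for a in v]

r = pearson(x, y)                  # Pearson's r on the market-cap values
rs = pearson(ranks(x), ranks(y))   # Spearman = Pearson applied to the ranks
```

Both coefficients come out well above 0.9, confirming the strong agreement between the two years’ rankings.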
or a straight line, as shown below, then the linear function is called the simple regression equation,
and the topic under study is referred to as simple regression analysis.
Actually when we talk of relationship between two variables, the scenario could be as
follows – in the order of ignorance to knowledge:
Ignorance: We just do not know whether there exists any relationship at all
We know that the two are related but do not know anything beyond this
We know that the two are positively or negatively related. Positive relation implies that if
one increases, the other also increases, or if one decreases the other also decreases. Nega-
tive relation implies that if one increases the other decreases or vice versa. But, beyond this
aspect of positive and negative correlation, we are ignorant
We know only the nature of relationship; whether it is linear or curvilinear.
Knowledge: We know exactly the mathematical equation of this relationship, so that if one
of the variables is known, the other can be derived from the equation.
In real life, the ‘knowledge’ type of situation is very rare, and we have to be content
with the next best, i.e. statistical relationships. These can be estimated by the Principle of Least
Squares, as indicated later in this chapter.
and mathematically as
y = a + bx
where ‘b’ is the inclination or slope of the straight line and ‘a’ is the intercept of the line on the
y-axis. The value of ‘b’ is the amount of increase in y when x increases by 1. The graph is as fol-
lows:
Principle of Least Squares For a given value of x as xi, the observed value of y, as per the
sample, is yi. For the same value xi, the estimated value of y, say ŷi, as per the equation y = a +
bx, is ŷi = a + bxi.
Fig. 10.11 Difference Between Observed and Estimated Values as per Equation
The Principle of Least Squares provides the criterion for selecting that line for which the sum
of squares of differences between the observed values and the estimated values is minimum.
The values of ‘a’ and ‘b’ are obtained as
b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²    (10.7)

or, b = (Σxiyi − n x̄ȳ) / (Σxi² − n x̄²)    (10.8)

and, a = ȳ − b x̄    (10.9)
The above equation y = a + bx is called the regression equation of y on x. It indicates the sen-
sitivity of y to changes in x i.e. how y changes with respect to changes in x.
The slope ‘b’ is called the regression coefficient of y on x. It gives the amount of change in y
when x changes by one unit. Thus, when x changes by one unit, y changes by ‘b’ units. For example,
if the relationship is
y (yield of rice in kgs.) = 200 + 5x (rainfall in cms.)
it implies that if rainfall increases by 1 unit i.e. 1 cm., yield of rice increases by 5 units i.e. by
5 Kgs. Thus, one has to be careful while interpreting the estimated or forecasted value of y for a
given value of x.
The intercept ‘a’ has only mathematical interpretation in the sense that it is the value of y for x
= 0, but it need not have any physical interpretation. In fact, sometimes its physical interpretation
can lead to nonsensical conclusion.
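Formulas (10.8) and (10.9) translate directly into code. A minimal sketch, with made-up rainfall and yield readings lying roughly on the line y = 200 + 5x:

```python
def fit_line(x, y):
    """Least-squares line y = a + b·x, via formulas (10.8) and (10.9)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum(p * q for p, q in zip(x, y)) - n * xbar * ybar) / \
        (sum(p * p for p in x) - n * xbar ** 2)
    a = ybar - b * xbar
    return a, b

x = [50, 60, 70, 80, 90]          # made-up rainfall (cm)
y = [452, 498, 551, 602, 648]     # made-up yield of rice (kg)
a, b = fit_line(x, y)
print(round(a, 1), round(b, 2))   # 203.0 4.96
```

The fitted slope of about 5 kg per cm of rainfall recovers the pattern built into the data; the intercept, as noted above, need not have any physical meaning.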
Illustration 10.3
Referring to the data given in Illustration 10.2 on studying the correlation between I.Q. and Marks,
one can estimate the regression equation of Marks on I.Q. of the type
y (Marks) = a + bx (I.Q.)
by representing Marks by the variable y and I.Q. by the variable x.
This is illustrated below.
For estimating the value of b by the Formula (10.8), we require the values of x̄, ȳ, Σxiyi and
Σxi². These quantities are already calculated above as:
x̄ = 20, ȳ = 7, Σxiyi = 960, Σxi² = 2650
Substituting these values, we get
b = 0.48
(Note: when we calculated the correlation coefficient in this example, we changed the origin for x by 100
and for y by 80 by subtracting the respective numbers. The regression coefficient remains the same
under a change of origin, but “a” will be different; hence we find “a” by substituting the actual means.)
and a = ȳ − b x̄ = 87 − 0.48 × 120 = 29.4
The physical interpretation of the regression coefficient ‘b’ is that an increase of I.Q. by one will
increase the marks by 0.48.
The intercept a = 29.4, which gives the value of y when x = 0, has no physical interpretation, as
it would mean that a student with an I.Q. of even 0 will score 29.4 marks.
However, instead of the variables x and y, if we consider the variables Xi (= xi − x̄) and
Yi (= yi − ȳ), i.e. the variables measured from their means, then the regression equation of Y on X is

Y = bX    (10.10)

and the regression coefficient ‘b’ is

b = ΣXiYi / ΣXi²    (10.11)
It may be noted that there is no intercept in the equation (10.10), as its value is 0. The value of
‘b’, however, remains unchanged.
It may be added that whether the variables X and Y are measured from x̄ and ȳ, respectively, or
even from some arbitrary values like 5 and 25, the value of ‘b’ remains unchanged. This property
can be used to simplify calculations when the values of x and y are large, as in Example 10.2, which
will be solved using this property.
The formula (10.11) can also be written as

b = syx / sx²    (10.12)

where syx is the sample covariance of x and y, and sx² is the sample variance of x. For the popula-
tion values, the value of b is written as

b = σyx / σx²    (10.13)

where σyx is the population covariance of x and y, and σx² is the population variance of x.
If the variables are standardised as X = (x − x̄)/sx and Y = (y − ȳ)/sy,
then the correlation coefficient between X and Y is the same as between x and y. However, the re-
gression equation of Y on X is
Y = BX (10.14)
where the value of B is different from that of ‘b’. In fact,
B = b (sx / sy)    (10.15)
In computerised output such as SPSS, the Bs, i.e. the regression coefficients between standardised
variables, are referred to as ‘beta’. It may be clarified that this ‘beta’ is different from the ‘beta’ of
a stock described in Section 10.7.
It may be noted that X and Y, being standardised variables, have mean 0 and s.d. 1. Further,
they have no units of measurement like time, weight, money, etc. Thus, an increase in X by 1
causes a change in Y by B.
In fact, correlation and regression between standardised variables solves the problem of
dealing with different units of measurements of x and y. The values of ‘Bs’ in different regres-
sion equations can be compared to assess the impact of the change in Y due to change in X.
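In simple regression the standardised coefficient works out to b·(sx/sy), which equals the correlation coefficient r. A quick numerical check with made-up data:

```python
from math import sqrt

x = [2, 4, 6, 8, 10]    # made-up observations
y = [1, 3, 2, 5, 4]
n = len(x)
xb, yb = sum(x) / n, sum(y) / n

sxy = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / n   # covariance of x and y
sx = sqrt(sum((a - xb) ** 2 for a in x) / n)               # s.d. of x
sy = sqrt(sum((c - yb) ** 2 for c in y) / n)               # s.d. of y

b = sxy / sx ** 2        # ordinary regression coefficient, as in (10.12)
B = b * sx / sy          # standardised ('beta') coefficient
r = sxy / (sx * sy)      # correlation coefficient
assert abs(B - r) < 1e-12
```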
r² = 1 − (Unexplained Variation / Total Variation)
   = 1 − se² / sy²

The standard error can also be written in terms of the correlation coefficient as follows:

σe² = σy² (1 − r²)    (10.17)
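Relation (10.17) can be checked numerically: fit the least-squares line, compute the residual variance, and compare 1 − se²/sy² with r². A sketch with made-up, roughly linear data:

```python
from math import sqrt

x = [1, 2, 3, 4, 5]              # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.3]
n = len(x)
xb, yb = sum(x) / n, sum(y) / n

# Least-squares fit y = a0 + b·x
b = sum((p - xb) * (q - yb) for p, q in zip(x, y)) / \
    sum((p - xb) ** 2 for p in x)
a0 = yb - b * xb
resid = [q - (a0 + b * p) for p, q in zip(x, y)]

se2 = sum(e * e for e in resid) / n          # unexplained (residual) variance
sy2 = sum((q - yb) ** 2 for q in y) / n      # total variance of y
r = (sum((p - xb) * (q - yb) for p, q in zip(x, y)) /
     sqrt(sum((p - xb) ** 2 for p in x) * sum((q - yb) ** 2 for q in y)))

assert abs((1 - se2 / sy2) - r * r) < 1e-9   # coefficient of determination = r²
```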
11-10-2006 1143.20 1143.30
12-10-2006 1169.50 1168.75
13-10-2006 1190.15 1190.70
16-10-2006 1213.40 1215.20
17-10-2006 1216.05 1213.40
18-10-2006 1208.00 1208.50
The relationship between BSE and NSE stock prices could also be studied by taking into consid-
eration the stock prices of each of a number of stocks, say 10 stocks, on a particular day, say 5th
October 2006, and recording the data as follows:
Closing Prices of Some Stocks on BSE and NSE on 5th October 2006
This correlation between BSE and NSE prices gives an idea of correlation for a wider cross-section
of industry.
in the ‘market’ index such as BSE or NSE is taken as the independent variable. Thus the regression
equation fitted to such data is of the form y = a + b x, as follows:
% daily change in the stock price = a + b (% daily change in the market index) (10.18)
A stock’s beta measures the relationship between the stock’s rate of return (variable y) and the
average rate of return for the market as a whole (variable x). If beta is > 1, the stock is said to be
‘aggressive’.
[Figure: regression lines with slopes b > 1, b = 1 and b < 1]
From the above regression equation (10.18), it may be noted that ‘beta’ of a stock is the covari-
ance between the returns on a stock and index (like BSE or NSE), divided by the variance of index
returns i.e.
b = Covariance(x, y) / Var(x)
where, x represents the index returns and y represents the stock returns.
The coefficient of determination i.e. r2 derived from the data on percentage daily changes in a
stock and percentage daily changes in market index provides a measure of volatility explained in
a stock’s price by the market.
Incidentally ‘beta’ values of stocks are available on the website of Stock Exchange Mum-
bai (BSE).
Example 10.3
The following data relates to the closing BSE Sensex and the stock price of RIL for 10 trading
days during the period from 5th October to 18th October 2006. Calculate the beta measure of the
stock of RIL.
Date BSE Stock Price of RIL
5.10.06 12389 1155.05
6.10.06 12373 1163.05
9.10.06 12366 1154.1
10.10.06 12364 1150.5
11.10.06 12353 1143.2
12.10.06 12538 1169.5
13.10.06 12736 1190.15
16.10.06 12928 1213.4
17.10.06 12884 1216.05
18.10.06 12858 1208
The following table gives the percentage changes in BSE (xi) as well as RIL (yi), and the calcula-
tions required for computing the covariance of xi and yi, and the variance of x. It may be noted that
the percentage change in the BSE index on 6/10/2006 is worked out as:
{(value of BSE index on 6/10/2006 − value of the index on 5/10/2006) × 100 ÷ value of BSE index
on 5/10/2006}
The other values of xi and yi have been derived accordingly.
b = Covariance(x, y) / Var(x)
  = (Σyixi − n ȳ x̄) / (Σxi² − n x̄²)

Substituting the values of ȳ, x̄, Σyixi and Σxi² in the formula, we get

b = (9.2536 − 9 × 0.4168 × 0.5059) / (7.194 − 9 × 0.4168 × 0.4168)
  = (9.2536 − 1.8977) / (7.194 − 1.559)
  = 1.306
This implies that the RIL stock was 30.6% more aggressive than BSE SENSEX.
The objective of the above analysis is only to explain the calculations involved in determining
the beta of the stock. The objective is not to draw any inference for RIL stock. For drawing any
valid conclusions about a stock, the price of the stock is to be studied over a much longer period.
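The steps above can be sketched in Python using the ten closing values from the table; working with unrounded percentage changes gives the same beta to three decimals:

```python
# Closing values, 5th to 18th October 2006, from the table above
index = [12389, 12373, 12366, 12364, 12353, 12538, 12736, 12928, 12884, 12858]           # BSE Sensex
stock = [1155.05, 1163.05, 1154.1, 1150.5, 1143.2, 1169.5, 1190.15, 1213.4, 1216.05, 1208]  # RIL

def pct_changes(series):
    """Day-to-day percentage changes."""
    return [(b - a) * 100 / a for a, b in zip(series, series[1:])]

x = pct_changes(index)   # % daily change in the market index
y = pct_changes(stock)   # % daily change in the stock price
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# beta = Cov(x, y) / Var(x) = (Σxy − n·x̄·ȳ) / (Σx² − n·x̄²)
beta = (sum(a * b for a, b in zip(x, y)) - n * xbar * ybar) / \
       (sum(a * a for a in x) - n * xbar ** 2)
print(round(beta, 3))  # 1.306
```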
Incidentally, as per a study published in Economic Times dated 25th February 2007, there is a trend
in the increase in correlation of Indian stock markets with the other stock markets in the world. It
is mentioned therein that the link-up with equity indices is helping people take a call on how the
Indian market will perform that day. The data containing the correlation of Indian markets with
some of the other markets over the period 2000 to 2006 is given below:
Sticking Together
Year FTSE Nikkei Dow Jones NYSE COM NASDAQ Strait Times CAC40
–UK –Japan –US –US –US –Singapore –France
2000 0.15 0.22 -0.07 0.01 0.07 0.28 0.1
2001 0.21 0.31 0.18 0.15 0.11 0.4 0.17
2002 0.14 0.2 0.03 0.01 0.03 0.22 0.11
2003 0.24 0.25 0.11 0.14 0.11 0.35 0.25
2004 0.24 0.3 0.14 0.18 0.16 0.42 0.27
2005 0.23 0.29 0.07 0.08 0.06 0.3 0.3
2006 0.37 0.39 0.1 0.23 0.11 0.53 0.38
It may be noted that the correlations in 2006 are more than the correlations in the year 2000.
The implication of the first assumption is that the errors are symmetrical with both positive and
negative values. The second assumption implies that the sum of positive and negative errors is zero,
and thus they cancel out each other. The third assumption means that the variance or fluctuations
in all error terms are of the same magnitude. The fourth assumption implies that the error terms are
uncorrelated with each other, i.e. one error term does not influence the other error term.
These are discussed in detail in Chapter 14 on Multivariate Statistical Techniques.
G L O S S A RY
Scatter Diagram Values of x and y depicted with the help of the rectangular co-ordinate
system, by plotting the observed pairs of values of x and y
Correlation Coefficient The quantitative measurement of the degree of linear relationship between
two variables, say ‘x’ and ‘y’
Coefficient of The square of the correlation coefficient.
Determination.
Spurious Correlation When two variables change due to different factors and not due to in-
terdependence on each other
Rank Correlation Association between two variables where data is given in the form of
the ranks of two variables based on some criterion
Regression Analysis Study of the relationship among two or more variables
Simple Regression Study of the relationship between two variables
Analysis
Multiple Regression Study of the relationship among more than two variables
Analysis
Principle of Least Sum of squares of differences between the observed values and the
Squares estimated values is minimum
Beta of Stock A statistical measure which reflects the sensitiveness of a stock to move-
ment in the stock market index like BSE – SENSEX or NSE – NIFTY,
as a whole
Beta Regression coefficients of standardised independent variables
Concurrent Deviation Measure of correlation that depends only on the sign (and not magnitude)
of the deviations of the two variables, say x and y, recorded at the same
point of time from their values at the preceding point of time
EXERCISES
1. The following data was published by Economic Times in the publication ET 500 containing
salient details of top 500 companies:
Sr. No. Name of Net Sales Net Profit P/E Ratios on
the Company (Sept, 2005) (Sept, 2005) 31st Oct., 2005
(Rs Cr) (Rs Cr)
1. Infosys 7836 2170.9 32
2. TCS 8051 1831.4 30
3. WIPRO 8211 1655.8 31
4. Bharti 9771 1753.5 128
5. Hero Honda 8086 868.4 16
6. ITC 8422 2351.3 20
7. Satyam computers 3996 844.8 23
8. HDFC 3758 1130.1 21
9. Tata Motors 18363 1314.9 14
10. Siemens 2753 254.7 38
Fit regression equations of (i) Net Profit on Net Sales, and (ii) P/E Ratio on Net Sales for group
of these companies.
2. In the list of top 500 companies published as ET 500 in February 2006, the following are the
ranks of the top ten companies according to their (i) Overall rating (ii) Market capitalisation,
and (iii) Net Profit.
Name of Over all Rank Rank as per Rank as per
the Company in February Market Capitalisation Net profit
(Abbreviated) 2006 in Oct., 2005
(Rank within these (Rank within these
10 companies) 10 companies)
Infosys 1 1 2
TCS 2 2 3
WIPRO 3 4 5
Bharti 4 3 4
Hero Honda 5 5 1
ITC 6 8 8
Satyam Computers 7 9 9
HDFC 8 6 7
Tata Motors 9 7 6
Siemens 10 10 10
Calculate
(i) the rank correlation coefficients between the overall rank and market capitalisation rank
(ii) the overall rank and rank as per Net Profit, and
(iii) market capitalisation rank and rank as per Net Profit.
3. The following data gives the closing prices of BSE Sensex, and the stock prices of three indi-
vidual companies, viz. ICICI Bank, L&T and Reliance Industries Ltd., during the 10 trading
days from 6th to 21st March 2006.
(a) Find the following correlation coefficients between the stock prices of the companies and
comment.
(i) ICICI Bank and Reliance Industries
(ii) ICICI Bank and L&T, and
(iii) L&T and Reliance Industries
(b) Calculate the ‘Beta’ measures of all the three stocks, and comment.
4. A company’s past records contain the following data relating to sales revenue and expenditure
on advertisements for six years, as follows:
Calculate the appropriate regression equation, and estimate the sales in the next year when the
advertisement expenses are budgeted as Rs 30 Crores.
5. A company wanted to assess the consistency between two HRD executives who were to recruit
MBA students for summer placements. They were asked to assess the 12 trainee executives
recruited from the last batch, and give their rankings. The rankings given by the two executives
are as follows:
Trainee Executive 1 2 3 4 5 6 7 8 9 10 11 12
Executive 1 1 11 8 2 12 10 3 4 7 5 6 9
Executive 2 4 12 11 2 5 10 1 3 9 8 6 7
(a) Find the regression equation of profit on R&D Expenditure.
(b) Estimate the profit when expenditure on R&D is budgeted at Rs 1 Crore.
(c) Find the correlation coefficient.
(d) What proportion of variability in profit is explained by variability in expenditure on R&D?
7. Following are the ranks of ten different banks on the basis of customer satisfaction and their
market share of deposits.
Customer satisfaction 3 2 1 5 6 4 7 8 9 10
Market Share 1 3 4 5 6 2 10 7 8 9
Sample No. Academic Score Score on Job Sample No. Academic Score Score on Job
1 65 37 7 91 46
2 72 54 8 72 47
3 78 42 9 56 30
4 84 58 10 92 52
5 89 44 11 68 32
6 52 40 12 77 50
Calculate the correlation coefficient between job performance and academic score in manage-
ment programme. Also, find the coefficient of determination, and offer comments.
11
1. Introduction
2. Estimation—Point and Interval Estimation, Confidence Intervals for Mean and Proportion
3. Determination of Sample Size for Estimating Mean and Proportions
4. Testing of Hypothesis
(a) Types of Errors
(i) Type – I
(ii) Type – II
(b) Methodology of Carrying out Tests of Significance
(i) Level of Significance
(ii) Choosing and Calculation of Appropriate Statistic
(iii) Critical and Acceptance Regions
(iv) Power of a Test
(c) Tests of Significance
(i) Mean(s)
(ii) Proportion(s)
(iii) Regression Coefficient(s)
(iv) Correlation Coefficient
(v) Rank Correlation
(vi) Association or Independence
(vii) Goodness of Fit
(viii) Variances
5. Using Excel
LEARNING OBJECTIVES
This chapter provides the requisite knowledge and expertise to
Understand the two aspects of statistical inference, viz. ‘Estimation’ and ‘Testing of Hypothesis’,
based on a sample of observations from a population. While estimation involves estimating some
parameter of a population like ‘mean life of a brand of car battery’, testing of hypothesis implies
testing some assumption about the population, such as whether the mean life of a brand of car
battery is 3 years.
Understand the properties of good estimators, and scope of estimation with respect to accuracy
and confidence in an estimate. It is to be appreciated that an estimate based on a sample from
population cannot be 100% accurate, and one cannot swear by it with 100% confidence. A
compromise is struck by defining confidence intervals with a certain extent of accuracy, say ±5%,
and a certain level of confidence, say 99%, in the statement that the true value lies within ±5% of the
sample estimate. Thus a typical conclusion could read like “I am 99% confident that the true value
of the population mean lies within ±5% of the sample mean.”
Understand the types of errors committed while drawing conclusions with the help of a sample
from a population about some assumption made about the population parameter. While the Type-I
error is “to reject a hypothesis when it is true”, the Type-II error is “to accept a hypothesis when
it is false”. These errors can only be minimised, not eliminated.
Conduct various tests of significance. A test of significance implies the procedure for testing
whether the assumption made about the population parameter is true or not. This is based on
the analysis of observations in the sample. There are many tests of significance, for hypotheses
like: (i) the mean life of a brand of car battery is 3 years, (ii) the proportion of defectives in a
manufactured lot of electric bulbs is 2%, (iii) the mean lives of car batteries manufactured in two
different plants are equal, (iv) whether the expenses on R&D have any impact on the sales of a
company, (v) whether the daily sales of a retail outlet follow any particular pattern or distribution, etc.
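To make the estimation idea concrete, here is a minimal sketch (with a made-up sample of battery lives) of a 99% confidence interval for a mean, using the normal approximation:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

lives = [2.9, 3.1, 3.4, 2.8, 3.0, 3.2, 2.7, 3.3, 3.1, 2.9]  # made-up battery lives (years)
n = len(lives)
xbar, s = mean(lives), stdev(lives)     # sample mean and sample s.d.

z = NormalDist().inv_cdf(0.995)         # 99% two-sided => 0.5% in each tail, z ≈ 2.576
half = z * s / sqrt(n)                  # half-width of the interval
print(f"99% CI for mean life: {xbar - half:.2f} to {xbar + half:.2f} years")
```

With a small sample like this, Student's t distribution (discussed later) would give a slightly wider and more appropriate interval; the normal z-value is used here only to keep the sketch simple.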
Relevance
The new General Manager of Everbright Light Company manufacturing tube lights is concerned
about the dwindling profits of the company. The main reason is that the company provides a
guarantee of 1 year of life, and undertakes to replace a tube light if it fails within one year.
Since a good number of tube lights are failing in less than a year and are being replaced free
of cost, they are lowering the company’s profitability and also causing loss of reputation. The
General Manager intuitively feels that the guaranteed life must be such that the percentage
of tube lights failing within that period is quite small; say 5% or 10%, so as to keep the cost
of replacement low. Since it may not be appropriate to reduce the guaranteed life, the only
alternative is to increase the life of the tube light. After careful consideration, he outlines the
following steps:
Estimate the average life of tube lights, as well as the variation in their lives.
Take action to increase the life of tube light with the help of improved technology and better
management of the production process.
Test whether the actions taken have increased the life, and by how much?
Fix the price and guarantee period in such a way as to ensure adequate increase in profits.
The subject of statistical inference, as described in this chapter, could play a useful role in
these steps.
company that the average life of bulbs is, say 2,500 hours. Similarly, either one could estimate the
percentage of defective bulbs produced in the factory, or one could test the claim of the factory
that the percentage of defective bulbs is less than or equal to 5%. As another example, suppose an
investor is looking for avenues to invest his recently acquired wealth into mutual funds and stocks
of reputed companies. After discussions with friends, he decides to invest into those stocks and
funds which:
(a) Provide higher yield/returns as compared to other stocks, funds and the overall market.
(b) Are less volatile as compared to other stocks and funds. This criterion is meant to minimise
risk.
Statistical inference provides adequate tools to facilitate decision-making in the above situa-
tion.
In this chapter, we shall describe various methods of estimation, as also various methods of testing
hypotheses. We shall also discuss the criteria evolved to measure their 'goodness'. This helps
in differentiating among various estimators or tests of hypotheses, and selecting the most
appropriate ones. However, before discussing the theoretical aspects, we indicate below the relevance of
estimation and testing of hypothesis in a business environment.
The Reliable Company is doing extremely well in terms of growth in sales and profits. However,
the Head of Finance feels that the company can generate more profits if the ever-growing expen-
diture on advertisements is reduced. The Head of Marketing feels that one of the main reasons for
the growth of the company is the aggressive advertising campaigns. To resolve this contentious
issue, Statistical Inference can be used to create an analytical approach that assesses the impact of
advertisements on sales, and provides various scenarios of sales with various levels of expenditure
on advertisement.
The new Chairman of Evershine Detergent Company realised that the sales force in the com-
pany was not performing up to the desired standards. He asked the Human Resources Department
to organise a two-week comprehensive training programme to increase the marketing skills as well
as motivation of the sales force. The company obtained two proposals from two Management In-
stitutes with about the same financial implications. The Chairman felt that before giving the entire
assignment to one particular Institute, the HR Department could organise two training programmes,
one by each of the two Institutes. Based on the comparative effectiveness, evaluated with the help
of statistical inference, of the two programmes, the contract could then be awarded to one of the
Management Institutes.
A pharmaceutical company has developed a medicine for curing insomnia. However, before
introducing it in the market, the company would like to test the effectiveness of the medicine on a
certain number of patients. Such tests could be designed, analysed and evaluated with the help of
statistical inference.
11.2 ESTIMATION
The topic of estimation in Statistics deals with estimation of population parameters like mean of a
statistical distribution. It is assumed, that the concerned variable of the population follows a certain
distribution with some parameter(s). For instance, it may be assumed that the life of electric
bulbs follows a normal distribution, which has two parameters, viz. mean (μ) and standard deviation
(σ). While one of the parameters, say the standard deviation, is known to be equal to 200 hours from
past experience, the other parameter, viz. the mean life of the bulbs, is not known, and we
wish to estimate it.
Given a sample of observations x1, x2, x3, …, xn, one is required to determine with the aid of
these observations, an estimate in the form of a specific number like 2500 hrs., in the above case.
This number can be taken to be the best value of the unknown mean. Such single value estimate
is called ‘Point’ estimate. The estimation could also be in the form of an interval, say 2,300 to
2,700 hrs. This can be taken to include the value of the unknown mean. This is called ‘Interval
Estimation’. An example of point and interval estimation could be provided from our day-to-day
conversation when we talk about commuting time to office. We do make statements like “It takes
about 45 minutes ranging from 40 to 50 minutes depending on the traffic conditions.” The statistical
details of these two types of estimation are described below.
the width of the interval has to be within reasonable limits. This is where the subject of Statistics
plays a role in finding out these limits. The intervals or limits, so arrived, are referred to as confi-
dence intervals or confidence limits. The word confidence implies the degree of confidence one has
that the population value would lie in the interval or lie within the limits. This aspect is discussed,
in detail in the next Section.
Confidence Interval for Mean (population standard deviation σ known) Suppose one
wants a confidence interval estimate for the mean of a variable which follows the normal distribution
with mean μ and standard deviation σ.
The value of the estimator, the sample mean, is given by

x̄ = Σ xi / n

The distribution of x̄ is normal with mean μ and s.d. σ/√n.

Now we want the confidence interval such that we are 95% confident that the actual mean would lie
within that range. This interval has to be around the estimated value x̄, and is derived below.

We want that the absolute difference between x̄ and the population mean μ, i.e. |x̄ - μ|, be
less than a value, say d, with probability 0.95, i.e.

P(|x̄ - μ| ≤ d) = 0.95
It may be derived that the 95% confidence interval for the mean is

x̄ - 1.96 σ/√n   to   x̄ + 1.96 σ/√n

The interpretation of this confidence interval is that if there is a variable x which is normally
distributed with mean μ and s.d. σ, and the mean of n sample observations of the variable is
calculated as x̄, then we are 95% confident that the mean μ would lie in the above interval.
Similarly, the 99% confidence interval can be derived by finding out the values of z in the
standard normal curve, such that the area between those points is 0.99. From Table T1, relat-
ing to area under standard normal curve, we see that the values are –2.575, and +2.575.
In general, the confidence interval for the mean is

x̄ ± zα/2 σ/√n    (11.1)

where,
n is the sample size
σ is the s.d. of the population
zα/2 is the point on the standard normal curve, the area beyond which is α/2, and
(1 - α) is the level of confidence.
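Formula 11.1 is easy to compute directly. The following is a minimal Python sketch (the function name is ours; the illustrative figures x̄ = 8500, σ = 2000, n = 64 anticipate the example worked out below):

```python
import math

def mean_ci(xbar, sigma, n, z):
    """Confidence interval for the mean when sigma is known (formula 11.1)."""
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# 95% limits for a sample mean of 8500 with sigma = 2000 and n = 64
lo, hi = mean_ci(8500, 2000, 64, 1.96)   # z(0.025) = 1.96
print(round(lo, 2), round(hi, 2))  # 8010.0 8990.0
```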
It may be noted that most samples taken in practice are without replacement. In such cases, if the
population is finite and the sample size is greater than 5% of the population size, then the standard
error is multiplied by the finite population correction factor √((N - n)/(N - 1)), where N is the
population size.
(i) 90% confidence limits:

x̄ ± 1.645 σ/√n
8500 ± 1.645 × 2000/√64
8500 ± 3290/8
Rs. 8500 ± 411.25

(ii) 95% confidence limits:

x̄ ± 1.96 σ/√n
8500 ± 1.96 × 2000/√64
8500 ± 3920/8
Rs. 8500 ± 490

(iii) 99% confidence limits:

x̄ ± 2.575 σ/√n
8500 ± 2.575 × 2000/√64
8500 ± 5150/8
Rs. 8500 ± 644
It may be noted that the interval or limits get wider as the desired level of confidence is in-
creased.
Confidence Interval for Mean (population standard deviation σ unknown) In the above
examples, the value of the population standard deviation was given. If it is not given, it has to be
estimated from the sample itself. In that case, however, the distribution of x̄ is not normal but
Student's 't' distribution. So, instead of referring to the Table for the normal distribution, we refer
to Table T2 for the 't' distribution.

Thus the 100(1 - α)% confidence interval for the population mean is:

x̄ - tα/2,(n-1) s/√n   to   x̄ + tα/2,(n-1) s/√n    (11.2)

where x̄ and s are the mean and the standard deviation of the observations in a sample of size n,
and tα/2,(n-1) is the value of Student's 't' with (n - 1) d.f., the area beyond which, on either side, is α/2.
Example 11.2
For assessing the number of monthly transactions in credit cards issued by a bank, transactions in
25 cards were analysed. The analysis revealed an average of 7.4 transactions and sample standard
deviation of 2.25 transactions. Find confidence limits for the monthly number of transactions by all
the credit card holders of the bank?
Solution:
When the population standard deviation is not known, and the values of the sample mean and sample
standard deviation are given, the 95% confidence interval for the population mean is given by

x̄ ± t0.025,(n-1) s/√n
or, 7.4 ± (2.06 × 2.25)/√25   (t0.025,24 ≈ 2.06 from Table T2)
or, 7.4 ± 4.635/5
or, 7.4 ± 0.927
or, 6.473 to 8.327

For large values of n, say ≥ 30, the 't' distribution may be approximated by the normal
distribution. Thus, the limits are given by formula 11.1, using s in place of σ.
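Formula 11.2 may be checked for Example 11.2 with a similar sketch; the critical value t0.025,24 ≈ 2.06 is read from Table T2 (the function name is ours):

```python
import math

def mean_ci_t(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown (formula 11.2)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Example 11.2: n = 25, sample mean 7.4, sample s.d. 2.25, t(0.025, 24) = 2.06
lo, hi = mean_ci_t(7.4, 2.25, 25, 2.06)
print(round(lo, 3), round(hi, 3))  # 6.473 8.327
```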
Confidence Interval for Proportion The confidence intervals for the proportion are worked out
on the assumption that the number of 'successes' in the sample follows a Binomial distribution. If
the proportion in the population is p0, the sample proportion p̂ has mean p0 and s.d. √(p0 q0/n),
where q0 = 1 - p0, and n is the size of the sample, i.e. the number of observations in it. Since
tables showing the probabilities for various values of p and n are not feasible (they would require
many tables for different values of n), we use the property that a Binomial distribution approaches
the normal distribution as the value of n becomes large and p is not near 0 or 1. Thus, if p̂
is the proportion in a sample of size n, then

(p̂ - p0)/√(p0 q0/n)   is distributed as N(0, 1)

where p0 is the proportion in the population.
In general, the confidence interval for the population proportion at α level of significance is

p̂ ± zα/2 √(p̂ q̂/n)    (11.3)

Since p0 is not known, its estimate p̂ is used in the expression, and thus the 95% confidence
limits are

p̂ - 1.96 √(p̂ q̂/n)   to   p̂ + 1.96 √(p̂ q̂/n)
The calculation of limits is illustrated through the following numerical example:
An insurance company sells foreign travel policies to those going abroad. The company is
reputed to settle claims within a period of 2 months. However, the new CEO of the company
came to know about delays in settling claims. He, therefore, ordered the concerned official to take
a sample of 100 claims, and report the proportion of cases which were settled within 2 months. The
CEO received the proportion as 0.6. What are the 95% confidence limits for this proportion?
Here, n = 100, and p̂ = 0.6.
Therefore, the 95% confidence limits are

0.6 - 1.96 √(0.6 × 0.4/100)   to   0.6 + 1.96 √(0.6 × 0.4/100)

or, 0.6 - 0.096 to 0.6 + 0.096
or, 0.504 to 0.696
Thus, the CEO could be 95% confident that about 50 to 70% of claims are settled within 2
months.
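The limits just obtained may be verified with a short sketch of formula 11.3 (the function name is ours):

```python
import math

def prop_ci(p_hat, n, z):
    """Confidence interval for a proportion (formula 11.3)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Claims example: sample proportion 0.6, n = 100, 95% confidence
lo, hi = prop_ci(0.6, 100, 1.96)
print(round(lo, 3), round(hi, 3))  # 0.504 0.696
```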
It may be noted that most samples taken in practice are without replacement. In such cases, if the
population is finite and the sample size is greater than 5% of the population size, then the standard
error is multiplied by the finite population correction factor √((N - n)/(N - 1)), where N is the
population size.
Illustration 11.1
A company wants to determine the average time to complete a certain job. The past records show
that the s.d. of the completion times for all the workers in the company has been 10 days, and there
is no reason to believe that this would have changed. However, the company feels that because of the
procedural changes, the mean would have changed. Determine the sample size so that the company
may be 95% confident that the sample average remains within ±2 days of the population mean.
The company wants to be 95% confident that the difference between the sample mean (x̄) and the
actual population mean (μ) is ≤ 2. This can be mathematically expressed as

P(|x̄ - μ| ≤ 2) = 0.95

Dividing both sides of the inequality within the brackets by σ/√n, we get

P( |x̄ - μ|/(σ/√n) ≤ 2/(σ/√n) ) = 0.95

Since (x̄ - μ)/(σ/√n) is distributed normally with mean 0 and s.d. 1, and is usually represented
by z, we have

P( |z| ≤ 2/(σ/√n) ) = 0.95

From the standard normal distribution Table T1, giving the area under the curve, we see that the value
of z, the area beyond which is 2.5% or 0.025, is 1.96, i.e.

P(|z| ≤ 1.96) = 0.95

Therefore,

2/(σ/√n) = 1.96
or, 2√n = 1.96 σ

Squaring both sides and substituting the value of σ as 10, we get

4n = (1.96)² (100)
or, n = 96.04
Thus, a sample size of at least 97 is required to be 95% confident that the difference between
the sample and population means will be less than 2.
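The arithmetic of this illustration can be reproduced as follows (a minimal sketch; the function name is ours):

```python
import math

def sample_size_mean(sigma, margin, z):
    """Minimum n so that the sample mean lies within +/- margin of mu (formula 11.4)."""
    return math.ceil((z * sigma / margin) ** 2)

# Illustration 11.1: sigma = 10 days, margin = 2 days, 95% confidence
print(sample_size_mean(10, 2, 1.96))  # 97
```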
P( |z| ≤ 0.01 √n/√(p0 q0) ) = 0.95

(As mentioned earlier in Section 11.3, (p̂ - p0)/√(p0 q0/n) is distributed as N(0, 1).)

From the property of the normal distribution, we know that the value of z is 1.96. Therefore,

1.96 = 0.01/√(p0 q0/n)
or, 1.96 √(p0 q0/n) = 0.01

Squaring both sides, we have

(1.96)² p0 q0/n = 0.0001
or, n = (1.96)² p0 q0/0.0001

For the most conservative estimate of n, i.e. the value of n that satisfies the criterion whatever
the value of p0, the product p0 q0 has to be taken at its maximum. It can be shown mathematically
that the product of two fractions whose sum is 1 is maximum when both fractions are equal to ½.
The maximum value of p0 q0 is thus equal to ½ × ½ = ¼.
It is also shown in the table given below.

p0     q0     p0 q0
0.1    0.9    0.09
0.2    0.8    0.16
0.3    0.7    0.21
0.4    0.6    0.24
0.5    0.5    0.25
0.6    0.4    0.24
0.7    0.3    0.21
0.8    0.2    0.16
0.9    0.1    0.09

Thus,

n = 3.8416 × (1/4)/0.0001 = 9604

Thus the minimum or the most conservative sample size for ascertaining the percentage of defectives
to be within ±1% of the true value with 95% confidence is 9604.
It may be verified that if the margin is increased from 1% to 3%, the required sample size comes
down to 1067.
It may also be verified that if the margin remains the same i.e. 1%, and if the level of confidence
is increased from 95% to 99%, the requisite sample size increases to 16,641.
Formulas for Calculation of Sample Size In fact, the value of the sample size can be readily
found from the following formulas:
(i) For Mean:

n = (zα/2 σ/Margin of Error)²    (11.4)

Thus, if we want to estimate the mean of a population whose s.d. is 200, within a margin of 40 of
the population value at 99% confidence, the minimum sample size required is 166.

(ii) For Proportion:

n = (zα/2)² p(1 - p)/(Margin of Error)²    (11.5)

Thus, if we want to estimate the proportion in a population, within a margin of 2% of the population
value at 95% confidence (with the conservative choice p = 0.5), the minimum sample size required
is 2401.
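Formula 11.5 can be sketched in the same spirit; with the conservative choice p = 0.5, a 2% margin and 95% confidence it reproduces the figure of 2401 (the function name is ours):

```python
import math

def sample_size_prop(p, margin, z):
    """Minimum n to estimate a proportion within +/- margin (formula 11.5)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size_prop(0.5, 0.02, 1.96))  # 2401
```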
It may be noted that there is no lower limit or upper limit for critical or rejection regions.
As an illustration, suppose, one makes a statement that a particular High School has tall
students, and the average height of its Class X students is 165 cms. One alternative, to test this
statement, is to measure the height of all the High school students and accept or reject the statement.
However, if the time and other resources to measure the height of all students are not available,
only a sample could be drawn from the ‘population’ of all High school students, and decision taken
to accept or reject the hypothesis based on this sample. Now, let a sample of size, say 25, be taken
from the total population of, say, 250 students. Obviously the common sense approach would be
to calculate the average height of the students in the sample and compare it with the stated average
of 165 cms.
Now, it is too much to expect that, even if the statement is true, the average height of the sample
would be exactly 165 cms.
Even without the knowledge of Statistics, the common sense approach dictates that if the sample
mean is close to 165 cms we can accept the hypothesis that the mean for the whole school is 165,
but if it is not so close, we can reject the hypothesis.
The following diagram can illustrate this. If the sample mean lies between A and B (we can call
this the 'acceptance region'), we accept the hypothesis. However, if the sample mean lies beyond
this region, i.e. either less than A or greater than B (this region can be called the rejection or
critical region), we reject the hypothesis.
Now the problem is to find out the values of A and B. Here the commonsense fails to
specify the exact values of A and B. This can be done with the help of the theory of Statistical
Inference.
Before we proceed to find out the values of A and B, a few points have to be considered. First
of all, whenever we take a decision about a population (all Class X students, in this case) based
on a sample, the decision cannot be 100% foolproof or reliable. The following possibilities
exist:
(i) The hypothesis might be true i.e. the average height of all Class X students is 165 cms but
the 25 students in the sample could be relatively shorter, and, therefore, their average may
work out to less than A or they could be relatively taller, and their average may work out to
more than B. In such a situation, we would reject the hypothesis even if it is true. This type
of judgmental error is called Type–I error.
(ii) The hypothesis might be false i.e. the average height of the Class X students is not 165 cms.
but the 25 students in the sample could be such that their average height works out to be in
the region from A to B, and, therefore, we would accept the hypothesis that the average height
of Class X students is 165 cms. This type of judgmental error is called Type–II error.
Incidentally, in military applications such as radar or sonar, Type-I error is called a ‘miss’ (e.g.
an enemy missile has been missed by the detection system), Type-II error is called a ‘false alarm’
(e.g. detection system is sensing an enemy missile even though it is not there).
We can represent various possibilities in decision making about a population from a sample with
the help of the following diagram.
                          Hypothesis
                     True              False
Decision   Accept    Right Decision    Type-II Error
           Reject    Type-I Error      Right Decision
Whenever, a decision about a population is based on a sample, these two errors cannot be elimi-
nated. These can only be minimised. However, it is not possible to reduce both the errors simultane-
ously, for the same sample size; the moment we want to reduce one error, the other gets increased.
This is further explained in Section 11.5.5.
It is, therefore, customary, while evolving various tests, to fix the Type-I error and minimise
the Type-II error. All the tests of significance discussed in this book are based on this premise.
As a convention in Statistics, the Type-I error is always denoted by α, and the Type-II error by β.
The quantity 1 - β is called the 'power' of a test, signifying its ability to reject a hypothesis
when it is false. α is also referred to as the level of significance, and 1 - α is called the confidence
coefficient.
Type-I Error as 5%
It is customary to fix the Type-I error, in most cases, as 5%. Sometimes it is taken as 1%,
or some other percentage. However, in Statistics, unless otherwise stated, it is taken as 5%. This
convention started in UK/USA. It is interesting to note that, in India, it is quite common
in Hindi/Urdu speaking regions to make a statement about similarity of quality, while
comparing qualities of two items, “Oh!, bus unnis-bees ka fark hai”. In English, the state-
ment means that it is just a difference between 19 and 20, signifying that a difference of
5% is considered negligible or tolerable! Incidentally, it may be noted that on the computer
keyboard, 5 and % are on the same key!
After setting up the null hypothesis, one has to set up an alternative hypothesis i.e. a statement
which is intended to be accepted if the null hypothesis is rejected. It is denoted by H1. In the above
case relating to height of students, the alternative statement could be that the average height is not
165 cms. These hypotheses could be written as:
Null Hypothesis : H0 : μ = 165 cms.
Alternative Hypothesis : H1 : μ ≠ 165 cms.
Obviously, one of the two hypotheses is to be accepted based on the calculations from the values
obtained through the sample.
In this case, let the sample values (heights of 25 students selected in the sample) be:
161 163 168 162 163
165 164 170 168 167
167 166 166 169 166
169 160 166 169 164
165 168 172 166 166
Now, an estimate of the population mean is obtained from this sample. Thus

x̄ = Σ xi/25 = 166

where x is the variable representing the characteristic, i.e. height of students in this case, and xi
is the height of the i-th student in the sample.
Further, let us assume that the height of the students follows the normal distribution with mean
μ (which is unknown) and standard deviation 3. That is,

x ~ N(μ, 3²)

As mentioned in Section 10.3, the sample mean x̄, based on sample size n, is distributed as Normal
with mean μ and s.d. 3/√n, i.e. 3/√25 = 3/5 in the above case.
Now the test statistic is formed as follows:

z = (x̄ - 165)/(3/5)

It may be noted that the test statistic z is formed by subtracting the assumed population mean from
the sample mean, i.e. (x̄ - 165), and then dividing the same by the standard deviation of x̄. Thus, z is
a standardised variable with mean 0 and s.d. 1. This is abbreviated as z ~ N(0, 1). Further, let the
Type-I error be fixed as 0.05, i.e. 5%.
The acceptance and critical regions are derived as follows. It may be recalled that if a variable
x has mean μ and s.d. σ, then the sample mean x̄ of n values of this variable is distributed normally
with mean μ and s.d. σ/√n. Thus, the variable

(x̄ - μ)/(σ/√n)

is distributed normally with mean 0 and s.d. 1. Similarly, the above variable z is also distributed
normally with mean 0 and s.d. 1.
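The test statistic for the height example can be computed directly from the 25 sample values:

```python
import math

heights = [161, 163, 168, 162, 163,
           165, 164, 170, 168, 167,
           167, 166, 166, 169, 166,
           169, 160, 166, 169, 164,
           165, 168, 172, 166, 166]

mu0, sigma = 165, 3
n = len(heights)
xbar = sum(heights) / n                    # 166.0
z = (xbar - mu0) / (sigma / math.sqrt(n))  # (166 - 165) / (3/5)
print(round(z, 3))  # 1.667
```

Since |z| = 1.667 is less than 1.96, the sample mean falls in the acceptance region, and the hypothesis μ = 165 cms would be accepted at the 5% level.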
Let us assume that the life of the bulbs represented by the random variable x follows the normal
distribution with mean 650 hrs. and standard deviation 25, and the sample of 100 bulbs is from the
same population. It reduces to the statement that the mean value of the life, in the population (from
which the sample is selected) is 650 and its s.d. is 25.
Now, to set up the null hypothesis and the alternative hypothesis, we note that the null hypoth-
esis is that the average life is 650 hrs. and the alternative hypothesis, desired by the Production
Manager, is that the average life is more than 650 hrs. We may add that even though H0 is that
μ = 650, because H1 is μ > 650, what H0 actually implies is that μ ≤ 650. Thus, if H0 is
rejected, it means that the average life of bulbs is > 650 hrs. Common sense dictates that for H0
to be rejected and H1 accepted, the average life of sample bulbs has to be more than 650 hrs. If the
sample average life is less than 650, the test need not even be conducted, and one could accept H0
and reject H1.
Thus we have, in this case

H0 : μ = 650 hrs.
H1 : μ > 650 hrs.
The alternative hypothesis is set up depending on the requirement of the situation. Here, the
manager wants the assurance that the average life of all the bulbs manufactured by the unit, i.e.
the population mean, is more than 650 hrs, and, therefore, the alternative hypothesis H1 is that μ
exceeds 650 hrs.
The statistic to be used in this case is

z = (x̄ - μ0)/(σ/√n)

Its sampling distribution is normal with mean 0 and s.d. 1.
From the sample values, it is given that

x̄ = 670, μ0 = 650, n = 100, σ = 25

Thus, we have

z = (670 - 650)/(25/√100) = 20/2.5 = 8
Since z is a standard normal variable with mean 0 and s.d. 1, the point for critical region at 5%
level of significance is the point the area to the right of which is 5%. Referring to the Table T1,
giving area under the standard normal curve, we note that this point, as shown below, is 1.645.
Now since the calculated value of z is more than the tabulated value, it falls in the critical region.
Therefore, the null hypothesis is to be rejected, i.e. the mean life of the bulbs is not 650 hrs, and the
alternative hypothesis, i.e. that the mean life is greater than 650 hrs, is to be accepted. Thus the
Production Manager should be convinced that the average life of the bulbs being made by the existing
production process is up to his level of expectation, i.e. exceeding 650 hrs.
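The computation can be sketched as:

```python
import math

# Bulb example: H0: mu = 650 vs H1: mu > 650; sigma = 25, n = 100, xbar = 670
xbar, mu0, sigma, n = 670, 650, 25, 100
z = (xbar - mu0) / (sigma / math.sqrt(n))
print(z)          # 8.0
print(z > 1.645)  # True, so H0 is rejected at the 5% level (one-tailed)
```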
If the form of alternative hypothesis has < sign (e.g. m < mo), the entire critical region is taken
in the left tail of the distribution.
If the form of alternative hypothesis has > sign (e.g. m > mo), the entire critical region is taken
on the right side of the distribution.
(vi) Ascertain Tabulated Values Find out the tabulated values of the statistic based on the value
of the level of significance and indicate the critical region—it can be one-sided, i.e, the entire region
is on one side or it can be both sided as shown below.
(vii) Calculate the value of the statistic from the given data It is to be emphasised that the data
should be collected only after setting up the hypotheses. Further, a statistic is always calculated
on the assumption that the null hypothesis is true.
(viii) Accept or Reject the Null Hypothesis If the calculated value of the statistic falls in the
critical region, reject the null hypothesis; otherwise accept the null hypothesis.
All the above steps are explained with the help of several examples given below. However, before
proceeding further, we would like to explain why the two types of errors, viz. Type I and Type II,
cannot be reduced simultaneously, through an illustration in Section 11.5.5.
Incidentally, it may be noted that the null hypothesis is in the form of an equation like
μ = 10, or an inequality like μ ≤ 10 or μ ≥ 10. The alternative hypothesis can, however, be
in the form of either not equal to (≠), less than (<), or greater than (>).
Now, suppose the mean is actually 32 so that the distribution is as shown below
Now, even if the alternative hypothesis is true, i.e. μ = 32, if the sample mean is less than 33.29,
we will accept the null hypothesis, i.e. we will commit a Type II error. Thus the shaded region marked
by horizontal lines gives the Type II error, i.e. β. The two diagrams given above indicate that if we
reduce α, i.e. the shaded area marked by slanting lines, then β, i.e. the shaded area marked by
horizontal lines, increases.
Example 11.3
A random sample of 100 students from the current year's batch gives the mean CGPA as 3.55 and
variance 0.04. Can we say that this is the same as the mean CGPA of the previous batch, which was
3.5?
Solution:
Since we have to test the hypothesis that the population mean is equal to 3.5, the null and
alternative hypotheses are set up as

H0 : μ = 3.5
H1 : μ ≠ 3.5

Even though the population variance is not given, the sample size is so large that the sample
variance can be taken to be equal to the population variance, i.e. 0.04. Thus, the test statistic used is

z = (x̄ - μ0)/(σ/√n), which is distributed as N(0, 1)

We are given

x̄ = 3.55, μ0 = 3.5, n = 100 and σ = √0.04 = 0.2

Therefore,

z = (3.55 - 3.5)/(0.2/√100) = 0.05/0.02 = 2.5

Since the calculated value of z (2.5) is more than the tabulated value of z at 5% level of significance,
1.96, the null hypothesis is rejected, as shown below:
Fig 11.9
Thus the current year’s mean CGPA is not the same as the last year’s mean CGPA.
(b) H0 : μ = μ0
    H1 : μ < μ0
Example 11.4
It has been found from experience that the mean breaking strength of a brand of thread is 500 gms.
with a s.d. of 40 gms. From the supplies, received during the last month, a sample of 16 pieces
of thread was tested which showed a mean strength of 450 gms. Can we conclude that the thread
supplied is inferior?
Solution:
In this case

H0 : μ = 500
H1 : μ < 500

The test statistic is z = (x̄ - μ0)/(σ/√n), which is distributed as N(0, 1).

We are given that

x̄ = 450, μ0 = 500, n = 16 and σ = 40

Therefore,

z = (450 - 500)/(40/√16) = -50/10 = -5.0

Here the test statistic falls in the rejection region, and hence we reject the null hypothesis, i.e. the
sample indicates that the thread is inferior.
(c) H0 : μ = μ0
    H1 : μ > μ0
Example 11.5
A telephone company's records indicate that individual customers pay, on an average, Rs 155 per
month for long-distance telephone calls, with standard deviation Rs 45. A random sample of 40
customers' bills during a given month produced a sample mean of Rs 160 for long-distance calls.
At 5% significance, can we say that the company's records indicate a lesser mean than the actual,
i.e. that the actual mean is more than Rs 155?
Solution:
Here the two hypotheses are set up as follows.
H0 : μ = 155
H1 : μ > 155
The test statistic is

z = (x̄ - μ0)/(σ/√n), which is distributed as N(0, 1)

Thus,

z = (160 - 155)/(45/√40) = 0.703

Here, the test statistic falls in the acceptance region. Hence we do not reject the null hypothesis,
i.e. there is no evidence to infer that the records indicate a lesser mean than the actual.
(ii) Mean (σ unknown)
Here again, we have illustrated three tests for three different types of alternative hypotheses, viz.
(a) Not equal to μ0
(b) Greater than μ0
(c) Less than μ0
(a) H0 : μ = μ0
    H1 : μ ≠ μ0

The test statistic formed is:

't' = (x̄ - μ0)/(s/√n) ~ tα/2,(n-1)    (11.7)

where s is the sample s.d., i.e.

s = √( Σ(xi - x̄)²/(n - 1) )
If the calculated value of t lies in the critical (shaded) region then the null hypothesis is rejected,
otherwise it is accepted.
Example 11.6
A sample of size 10 drawn from a normal population has mean 31 and variance 2.25. Is it
reasonable to assume that the mean of the population is 30? Assume α = 0.01.
Solution:
This is the situation corresponding to testing the hypothesis that the population mean has a
specified value μ0 when the s.d. (σ) of the population is not known. The appropriate test is Student's
't' test with:
H0 : μ = 30
H1 : μ ≠ 30

The test statistic is

t = (x̄ - μ0)/(s/√n) ~ Student's 't' distribution with (n - 1) d.f.

where,
x̄ is the sample mean, and
s is the sample standard deviation

In the given case

n = 10
x̄ = 31
s = √2.25 = 1.5

Thus,

't' = (31 - 30)/(1.5/√10) = 1/0.474 = 2.11

The tabulated value of 't' with (n - 1) = (10 - 1) = 9 d.f. at 1% level of significance, vide Table T2,
is 3.25.

Since the calculated value 2.11 is less than 3.25 and falls in the acceptance region, we do not
reject the null hypothesis H0. It is therefore reasonable to assume that the population mean is 30.
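The calculation in Example 11.6 can be sketched as:

```python
import math

# Example 11.6: n = 10, sample mean 31, sample variance 2.25, H0: mu = 30
n, xbar, mu0 = 10, 31, 30
s = math.sqrt(2.25)                    # sample s.d. = 1.5
t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 2))  # 2.11
# Tabulated t with 9 d.f. at 1% level (two-tailed) is 3.25, so H0 is accepted
```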
If the sample size is large, say ≥ 30, the 't' distribution may be approximated by the normal
distribution, and instead of the t statistic one may use the z statistic = (x̄ - μ0)/(s/√n) ~ N(0, 1).
(iii) Proportions:
There are three tests for three different types of alternative hypotheses, viz.:
(a) Not equal to p0
(b) Greater than p0
(c) Less than p0
(a) H0 : p = p0
    H1 : p ≠ p0

Calculate p̂ = r/n, where r is the number of 'successes' in the sample. Then

z = |p̂ - p0|/√(p0 q0/n) ~ N(0, 1)    (11.8)

If z ≤ 1.96, accept H0
If z > 1.96, reject H0
(b) H0 : p = p0
    H1 : p > p0
(c) H0 : p = p0
    H1 : p < p0
Example 11.7
A manufacturer of LCD TVs claims that they are becoming quite popular, and that about 5% of
homes have an LCD TV. However, a dealer of conventional TVs claims that the percentage of homes
with LCD TVs is less than 5%. A sample of 400 households is surveyed, and it is found that only
18 households have an LCD TV. Test at 1% level of significance whether the claim of the company
is tenable.
Solution
Let p be the proportion of homes having an LCD TV.
In this case, we take the null hypothesis as the company's claim that the proportion is 5%, to be tested against the alternative that the percentage of homes with LCD TVs is less than 5%. Thus, the null and alternative hypotheses are set up as follows:
H0 : p = 0.05
H1 : p < 0.05
The appropriate test statistic to be used is:
z = (p̂ − p0)/√(p0q0/n) ~ N(0, 1)
Here, p̂ = 18/400 = 0.045, so that
z = (0.045 − 0.05)/√(0.05 × 0.95/400) = −0.46
Since the calculated value of z (−0.46) is more than the tabulated value (−2.575) and falls in the
acceptance region, we conclude that the null hypothesis is not to be rejected. Thus, the company's
claim is tenable.
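The z statistic for this example can be verified directly. A minimal sketch (standard library only); carrying full precision gives z ≈ −0.46, and small differences from rounded intermediate values do not affect the conclusion:

```python
import math

# One-sample proportion test: z = (phat - p0) / sqrt(p0 * q0 / n)
def prop_z(successes, n, p0):
    phat = successes / n
    return (phat - p0) / math.sqrt(p0 * (1 - p0) / n)

z = prop_z(successes=18, n=400, p0=0.05)
print(round(z, 2))  # -0.46; well inside the acceptance region (1% critical value -2.575)
```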
Assuming the level of significance to be 5%, accept H0 if |z| ≤ 1.96; otherwise, reject H0 and accept H1.
(b) H0 : μ1 = μ2
H1 : μ1 ≠ μ2
(σ1 = σ2 but unknown)
Since σ1 and σ2 are unknown, these have to be estimated from the samples. However, it is assumed that the variances in the two populations are equal, say σ². The estimate of this common variance is obtained as s² by pooling the variances of the two samples, as shown below:
s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
where,
s1² = [1/(n1 − 1)] Σ(x1i − x̄1)²  (variance of the first sample)
s2² = [1/(n2 − 1)] Σ(x2i − x̄2)²  (variance of the second sample)
If the absolute value of the calculated 't' is more than the tabulated value of 't' at (n1 + n2 − 2) d.f., H0 is rejected.
Example 11.8
A car manufacturer is procuring car batteries from two companies. For testing whether the two
brands of batteries, say ‘A’ and ‘B’, had the same life, the manufacturer collected data about the
lives of both brands of batteries from 20 car owners – 10 using ‘A’ brand and 10 using ‘B’ brand.
The lives were reported as follows:
Lives in Months
Battery ‘A’ : 50 61 54 60 52 58 55 56 54 53
Battery ‘B’ : 65 57 60 55 58 59 62 67 56 61
Test whether both the brands of batteries have the same life.
Solution:
The hypotheses to be tested are:
H0 : μ1 = μ2 (μ1 and μ2 being the average lives of batteries 'A' and 'B', respectively)
H1 : μ1 ≠ μ2
s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
where,
s1² = [1/(n1 − 1)] Σ(x1i − x̄1)²  (variance of the first sample)
s2² = [1/(n2 − 1)] Σ(x2i − x̄2)²  (variance of the second sample)
t = (x̄1 − x̄2) / [s √(1/n1 + 1/n2)]
(with n1 + n2 − 2 = 10 + 10 − 2 = 18 d.f.)
where n1 and n2 are the numbers of observations for 'A' and 'B'.
Note: Here we use u = x − 50 and v = y − 60, for simplicity of calculations. We use the property that variance is unaffected by a change of origin.
s² = 13.56, so that s = 3.68
x̄1 = ū + 50 = 5.3 + 50 = 55.3 and x̄2 = v̄ + 60 = 0 + 60 = 60
Thus, t = (55.3 − 60)/(3.68 × √(1/10 + 1/10)) = −2.85
The absolute calculated value of 't' (2.85) is more than the tabulated value of t with 18 d.f. at 5% level of significance (2.101), and falls in the rejection region. Therefore, the null hypothesis that μ1 = μ2 is rejected. Thus, the lives of the two brands of batteries are not the same.
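The pooled-variance t test for the battery data can be reproduced as follows. A sketch using only the standard library; `statistics.variance` uses the n − 1 divisor, matching s1² and s2² above:

```python
import math
import statistics

a = [50, 61, 54, 60, 52, 58, 55, 56, 54, 53]  # brand 'A' lives (months)
b = [65, 57, 60, 55, 58, 59, 62, 67, 56, 61]  # brand 'B' lives (months)

n1, n2 = len(a), len(b)
# Pooled variance: s^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)
s2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
t = (statistics.mean(a) - statistics.mean(b)) / (math.sqrt(s2) * math.sqrt(1 / n1 + 1 / n2))
print(round(t, 2))  # -2.85; |t| exceeds 2.101, the 5% critical value for 18 d.f.
```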
(c) No Significant Change (Paired ‘t’ Test)
On several occasions, we are required to find out the effectiveness of a ‘treatment’. The treatment
could be in the form of a medicine for curative purpose, training salesmen for improving sales, ad-
vertisement campaign for boosting sales, an exercise for improving, say, a swimmer’s performance,
etc. For testing whether a ‘treatment’ has been effective or not, certain number of persons are se-
lected and their performance recorded before the ‘treatment’ is given. The performance is recorded
again after the ‘treatment’. Based on the analysis, as described below, a conclusion is reached as to
whether the ‘treatment’ has been effective or not.
It is to be appreciated that the ‘treatment’ might cause some change but what is tested is whether
the change is statistically significant or not.
H0 : μ1 = μ2 (no significant change)
H1 : μ1 ≠ μ2 (significant change, i.e. increase or decrease)
t = d̄/(sd/√n) ~ t with (n − 1) d.f.   (11.11)
where,
di = xi (after) − xi (before)
is the difference between the 'before' and 'after' values for the ith person, measured by the variable x; d̄ is the mean of the dis, and sd is the s.d. of the dis.
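The paired test can be sketched with a few lines of code. The before/after scores below are purely illustrative, not from the text:

```python
import math
import statistics

# Hypothetical 'before' and 'after' scores for five persons
before = [72, 75, 70, 68, 74]
after = [74, 74, 73, 68, 76]

d = [x_after - x_before for x_before, x_after in zip(before, after)]
dbar = statistics.mean(d)            # mean of the differences
sd = statistics.stdev(d)             # s.d. of the differences (n - 1 divisor)
t = dbar / (sd / math.sqrt(len(d)))  # compare with t at (n - 1) = 4 d.f.
print(round(t, 2))  # 1.63
```

With only five hypothetical pairs, 1.63 is below the 5% two-tailed value 2.776 for 4 d.f., so this illustrative 'treatment' would not be judged effective.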
Illustration 11.6
Suppose a pharmaceutical company wants to test the effectiveness of a medicine to reduce the level
of blood-sugar in diabetic patients. Obviously, the company would like to administer the medicine
to some patients and record the level of blood-sugar in those patients before the start of taking
medicine and certain days after taking the medicine.
For such a situation, the data is collected in the form:

xi (Before)    xi (After)    di = xi (Before) − xi (After)    di²
x1 ( )         x1 ( )        d1                               d1²
:              :             :                                :
xi ( )         xi ( )        di                               di²
:              :             :                                :
xn ( )         xn ( )        dn                               dn²
Sum                          Σdi                              Σdi²
Average                      d̄
Example 11.9
As per the ET-TNS Consumer Confidence Survey, published in the Economic Times dated 10th November 2006,
the consumer confidence indices for some of the cities changed from December 2005 to September
2006 as follows. Is the difference significant?
Solution:
The following table gives the data collected and also the calculations required to test the hypothesis
that the consumer confidence indices did not change from December 2005 to September 2006.
Thus,
sd² = Σ(di − d̄)²/7 = (Σdi² − 8d̄²)/7
    = (2067 − 8 × (−3.875)²)/7 = 278.13
Therefore,
sd = 16.68, so that sd/√8 = 16.68/2.828 = 5.90
't' = d̄/(sd/√n) = −3.875/5.90
    = −0.657
Since the absolute calculated value (0.657) is less than the tabulated value of 't' with 7 d.f. at 5% level of significance (2.365), the null hypothesis is not rejected: the change in the consumer confidence indices is not statistically significant.
Example 11.10
A firm wanted to choose a popular actor as the brand ambassador for its product. However,
before taking the final decision, the firm conducted a market survey to know the opinion of its
customers in Mumbai and Delhi. The surveys revealed that while 290 out of 400 customers in
Mumbai favoured the choice, only 160 out of 300 customers in Delhi favoured it. Can the firm
conclude that the proportions of customers who favour the actor in Mumbai and Delhi are the same?
Solution:
Let p̂1 denote the proportion of customers favouring the choice of the actor in Mumbai: p̂1 = 290/400 = 0.725.
Let p̂2 be the proportion of customers favouring the choice of the actor in Delhi: p̂2 = 160/300 = 0.533.
The pooled proportion is p̄ = (290 + 160)/(400 + 300) = 0.643, and q̄ = 1 − p̄ = 0.357.
Therefore,
z = (0.725 − 0.533) / √[0.23 × (1/400 + 1/300)]
  = 0.192/0.037
  = 5.24
Since the level of significance is not given, we presume it to be 0.05. From Table T1 of areas under the normal curve, we note the two-tail critical value for level of significance 0.05 as 1.96.
Since the calculated value of z is in the rejection region, the null hypothesis may be rejected. It
indicates that there is a significant difference in the customers’ responses in the two cities for the
choice of the actor.
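The two-sample proportion test above can be verified with a short computation (standard library only):

```python
import math

# Pooled two-sample proportion z test
def two_prop_z(r1, n1, r2, n2):
    p1, p2 = r1 / n1, r2 / n2
    pbar = (r1 + r2) / (n1 + n2)  # pooled proportion
    se = math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_prop_z(290, 400, 160, 300)  # Mumbai vs Delhi
print(round(z, 2))  # 5.24, far beyond 1.96, so the proportions differ
```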
(b) One Sided or One Tailed:
H0 : p1 = p2
H1 : p1 > p2
Illustration 11.7
From a hospital record, the following data was obtained about the births of newborn babies on
various days of the week during the past year:
Monday Tuesday Wednesday Thursday Friday Saturday Sunday Total
184 148 145 153 150 154 116 1050
Can we conclude that the birth of a child is independent of the day in a week?
Solution:
In this case, the null and alternative hypotheses are
H0 : The birth of a child is not associated with any day of the week, i.e. it is independent of the day of the week.
H1 : The birth of a child is associated with the days of the week.
In such cases, the test statistic to be used is χ², which is evaluated as follows:
χ²(k − 1) = Σ(i = 1 to k) (Oi − ei)²/ei   (11.13a)
          = Σ(i = 1 to k) Oi²/ei − n   (11.13b)
where, Oi is the observed frequency, ei, is the expected frequency for each of the seven days of
the week, and n is the total number of observations (1050 in this case). The expected values ei are
derived from the equation
ei = n pi
where pi is the probability of an observation lying in the ith interval (ith day of the week in this
case).
One can compute χ² with either of the above two formulas, depending on the values of
(Oi − ei) and the ease of finding the squares of the Ois. If the eis are whole numbers and their differences from the Ois can be found easily, then the first expression may be more convenient, as we have to square smaller quantities. However, if we can square the Ois conveniently, then the second expression can be used.
In the above data relating to birth of children, under the null hypothesis that the birth of a child
is independent of the day of the week, the probability that a child is born on any one particular day
of the week is 1/7. Thus all pis are equal to 1/7. Thus the expected frequency, ei ,for each day of
the week is 1050 × 1/7 = 150. If we want to use the first expression for calculating the value of χ²,
then the table is prepared as follows:

Oi     ei     Oi − ei    (Oi − ei)²    (Oi − ei)²/ei
184    150    34         1156          7.71
148    150    −2         4             0.03
145    150    −5         25            0.17
153    150    3          9             0.06
150    150    0          0             0
154    150    4          16            0.11
116    150    −34        1156          7.71
                                       χ² = 15.77
Now, the test statistic χ² is calculated by substituting the values of (Oi − ei)²/ei in formula (11.13a) above; we get
χ² = 15.77
The number of d.f. for this χ² is (k − 1), where k is the number of classes (days, in this case). Since the number of days is 7, the d.f. are 7 − 1 = 6. Since the level of significance is not specified, it is presumed to be 5%. The tabulated value of χ² with 6 d.f. at 5% level of significance is 12.59. As the calculated value (15.77) is more than the tabulated value and falls in the rejection region, the null hypothesis is rejected: the birth of a child is associated with the day of the week.
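The χ² computation for the birth data is easy to replicate; a minimal sketch (standard library only):

```python
# Chi-square goodness of fit: uniform births across the 7 days of the week
observed = [184, 148, 145, 153, 150, 154, 116]
n = sum(observed)             # 1050 births in all
expected = n / len(observed)  # 150 per day under H0
chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi2, 2))  # 15.77; exceeds 12.59, the 5% value for 6 d.f.
```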
χ² = Σi Σj (Oij − eij)²/eij, with i, j = 1 to 3   (11.14)
where,
Oij is the observed frequency of the ith row (i = 1 for 'High', i = 2 for 'Medium' and i = 3 for 'Low') and the jth column (j = 1 for 'Upto 5', j = 2 for '5 to 10' and j = 3 for 'More than 10'),
eij is the expected frequency of the ith row and jth column, and is derived by the following expression:
eij = (Total observations in ith row × Total observations in jth column) / Total number of observations
n = Total number of observations
n = Total number of observations
Applying this formula for the expected frequencies, we get, for instance:
e11 = (Row 1 total × Column 1 total)/Total number of observations = (100 × 30)/200 = 15
e32 = (70 × 60)/200 = 21
e33 = (30 × 60)/200 = 9
The following table is prepared with the help of the above expected frequencies.
Since the calculated value of χ² (16.93) is more than the tabulated value (9.49) and falls in the critical
region, the null hypothesis is rejected.
Thus, it is concluded that annual salary and the level of satisfaction are not independent
of each other.
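The rule for expected frequencies can be expressed as a small helper; the three cells worked out above, with the row and column totals as quoted in the text, serve as a check:

```python
# Expected frequency for a contingency-table cell:
# e_ij = (row total) * (column total) / (grand total)
def expected(row_total, col_total, grand_total):
    return row_total * col_total / grand_total

assert expected(100, 30, 200) == 15  # e11
assert expected(70, 60, 200) == 21   # e32
assert expected(30, 60, 200) == 9    # e33
```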
Example 11.12: Uniform Distribution: Market Shares of Different Types of Cars
A marketing manager wants to test his belief that four different categories of cars share the auto
market in a particular segment equally in Delhi city. These four categories are brand A,
brand B, brand C, and imported cars. He stood at a busy intersection and collected the following data
on 1,000 such cars:
With the support of this data, we can help him in solving the problem. We first set up the hypotheses:
H0 : Same market shares (uniform distribution)
H1 : Different market shares (non-uniform distribution)
The chi-square test statistic defined in formula (11.13a) can be used to perform the hypothesis test.
When the null hypothesis is true, there should be 250 cars of each category, i.e. the expected
frequency is 250 for each. The requisite calculations are shown in the following table.
Brand      Observed Frequency (Oi)    Expected Frequency (ei)    Oi − ei    (Oi − ei)²    (Oi − ei)²/ei
A          235                        250                        −15        225           0.9
B          255                        250                        5          25            0.1
C          240                        250                        −10        100           0.4
Imported   270                        250                        20         400           1.6
Total      1,000                      1,000                                               χ² = 3
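Replicating the table takes only a couple of lines (standard library only). Since the computed χ² of 3 is below 7.81, the 5% value of χ² with 3 d.f., the belief of equal market shares is not rejected:

```python
# Chi-square test for equal market shares of four car categories
observed = [235, 255, 240, 270]  # A, B, C, Imported
expected = sum(observed) / 4     # 250 each under H0
chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(chi2)  # 3.0
```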
H0 : σ1² = σ2²
H1 : σ1² ≠ σ2²
Some of the applications of this test are in the situations described below. All these situations involve
comparing the variances in two populations.
Quality of items, as measured by some indicators like diameter, length, etc., manufactured by two
machines. The machine with lower variance, implying more consistent quality, is preferred.
Price or Earnings Per Share (EPS) of two shares on a day-to-day basis, say for a week or a month.
The share with lower variance, implying lesser volatility, could be considered less risky
and consequently preferred by some investors.
Service time taken by two systems or agencies; the one with lower variance is preferred.
Illustration 11.8
Let A and B be two agricultural regions. The data below presents the yields, in quintals, of 10 plots
(of equal area) from each of the two regions.
Region A : 12 7 15 10 13 8 7 10 10 8
Region B : 10 9 6 7 8 7 10 15 12 9
Let us now test whether the above random samples taken from the two regions have the same
variance at 5% level of significance.
Solution:
F-Test is the appropriate test for testing equality of variances of the two populations. In the given
case, regions are treated as populations, and the yields of plots as individual observations.
The null and alternative hypotheses are formulated as follows:
H0 (null hypothesis) : σ1² = σ2² (the two populations have the same variance)
H1 (alternative hypothesis) : σ1² ≠ σ2² (the two populations do not have equal variance)
For carrying out the test of significance, we calculate the statistic 'F', which is defined as the ratio
of the sample variances:
F = s1²/s2²   (11.15)
where,
s1² = Σ(x1i − x̄1)²/(n1 − 1) = (Σx1i² − n1x̄1²)/(n1 − 1)
s2² = Σ(x2i − x̄2)²/(n2 − 1) = (Σx2i² − n2x̄2²)/(n2 − 1)
It may be recalled that in the 'F' ratio, the numerator is greater than the denominator, i.e.
F = Larger estimate of population variance / Smaller estimate of population variance
Thus the larger of the two sample variances is to be taken in the numerator. Let us assume that
s1² > s2². However, if s2² > s1², we would define the 'F' ratio as s2²/s1².
The ‘F’ statistic has two degrees of freedom: one for the numerator and one for the denominator.
These are written as d1, d2 where
d1 = n1 – 1 = degrees of freedom for sample having larger variance.
d2 = n2 – 1 = degrees of freedom for sample having smaller variance.
Since the F test is based on the ratio of two variances, it is also known as the Variance Ratio
Test. Now we calculate both s1² and s2² to see which of the two is larger. For Region A,
s1² = 64/9 = 7.11, and for Region B, s2² = 64.1/9 = 7.12.
Therefore, F = 7.12/7.11 = 1.001
The value of F for (9, 9) d.f. at 5% level of significance is 3.18. Thus the calculated value falls in
the acceptance region.
We therefore accept the null hypothesis and conclude that the samples come from populations having
equal variance.
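The variance-ratio test for the two regions can be reproduced as a check; the larger sample variance goes in the numerator:

```python
import statistics

region_a = [12, 7, 15, 10, 13, 8, 7, 10, 10, 8]
region_b = [10, 9, 6, 7, 8, 7, 10, 15, 12, 9]

v1 = statistics.variance(region_a)  # sample variance, n - 1 divisor
v2 = statistics.variance(region_b)
f = max(v1, v2) / min(v1, v2)       # larger variance in the numerator
print(round(f, 3))  # close to 1; well below 3.18, the 5% value for (9, 9) d.f.
```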
Assumptions for F-test:
The following assumptions are required for the validity of the ‘F’ test for comparing variances of
two populations.
Normality: i.e. the values in each population are normally distributed.
Homogeneity: i.e. the variances within the populations are equal (σ1² = σ2² = σ²). This assumption is needed in order to combine or pool the variances within the populations into a single source of variation 'within populations'.
Independence of errors: it stipulates that the error, i.e. the variation of each value around its own population mean, should be independent of the value.
ŝb = √[ Σ(yi − ŷi)² / ((n − 2) Σ(xi − x̄)²) ]
and ŷi is the estimated value of yi for x = xi as per the fitted regression equation.
Illustration 11.9
Let the observations on a pair of variable x and y be as follows:
x y
x1 y1
x2 y2
xi yi
xn yn
Let the regression equation for the above data be:
y = a + bx
Then, testing the significance of the regression coefficient implies testing the null hypothesis
H0 : b = 0
against the alternative hypothesis
H1 : b 0
The statistic for testing this hypothesis is Student's 't' with (n − 2) d.f., defined as:
t = (b̂ − 0)/ŝb
where,
b̂ = estimate of 'b' derived from the sample data
   = Σ(yi − ȳ)(xi − x̄) / Σ(xi − x̄)²
n = sample size, i.e. the number of pairs of observations on the two variables x and y
ŝb = standard error of b̂
   = √[ Σ(yi − ŷi)² / ((n − 2) Σ(xi − x̄)²) ]
ŷi = â + b̂xi (the estimated value of yi for x = xi)
â = ȳ − b̂x̄ (estimated value of 'a' from the given data)
The test is explained with an example relating to the correlation between the sales and advertising expenses of a company.
Advertising Expenses (x)    Sales (y)
(Rs in Crores)              (Rs in Crores)    x²       xy        y²
1                           60                1        60        3600
1.5                         62                2.25     93        3844
2                           65                4        130       4225
2.5                         68                6.25     170       4624
3.5                         72                12.25    252       5184
4.5                         75                20.25    337.5     5625
Sum      15                 402               46       1042.5    27102
Average  2.5                67
x      y     ŷ = 55.97 + 4.41x    y − ŷ     (y − ŷ)²     xi − x̄    (xi − x̄)²
1      60    60.38                −0.38     0.1444       −1.5       2.25
1.5    62    62.585               −0.585    0.342225     −1         1
2      65    64.79                0.21      0.0441       −0.5       0.25
2.5    68    66.995               1.005     1.010025     0          0
3.5    72    71.405               0.595     0.354025     1          1
4.5    75    75.815               −0.815    0.664225     2          4
x̄ = 2.5                          Sum       2.559        Sum        8.5
Now,
t = b̂/ŝb = 4.41/0.2744 = 16.07
The tabulated value of ‘t’ with (n – 2) i.e., 6 – 2 = 4 d.f., at 5% level of significance, is 2.776.
Since the calculated value is more than the tabulated value, and falls in the critical region, we
reject the null hypothesis H0, and accept the alternative hypothesis H1. This implies that the regres-
sion coefficient is significant.
It may be mentioned that in correlation and regression analysis, it is presumed that both x
and y are normally distributed. This is necessary for testing the significance of the regression
coefficient either by t or F test, and finding its confidence limits.
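The slope, its standard error and the t statistic for the sales–advertising data can be recomputed as follows (standard library only). Full precision gives t ≈ 16.1; the 16.07 above reflects rounded intermediate values:

```python
import math

x = [1, 1.5, 2, 2.5, 3.5, 4.5]  # advertising expenses (Rs crore)
y = [60, 62, 65, 68, 72, 75]    # sales (Rs crore)
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = sxy / sxx                          # slope estimate b-hat
a = ybar - b * xbar                    # intercept a-hat
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sb = math.sqrt(sse / ((n - 2) * sxx))  # standard error of b-hat
t = b / sb                             # Student's t with n - 2 = 4 d.f.
print(round(b, 2), round(t, 1))  # 4.41 16.1
```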
where r is the correlation coefficient calculated from n pairs of values of the two variables, say x
and y.
We illustrate the test with the above example of the relationship between Sales and
Advertisement Expenses. The value of r for the above example is
r = (Σxiyi − n x̄ ȳ) / √[(Σyi² − n ȳ²)(Σxi² − n x̄²)]
  = (1042.5 − 6 × 2.5 × 67) / √[(27102 − 6 × 4489)(46 − 6 × 6.25)]
  = 0.9924
Once the value of r is known, we calculate the 't' statistic as
t = r √(n − 2) / √(1 − r²)
  = 0.9924 × √4 / √(1 − (0.9924)²) = 1.9848/0.123
  = 16.123
The tabulated value of ‘t’ for 4 d.f. at 5% level of significance is 2.78.
Since the calculated value (16.123) is more than the tabulated value, and falls in the rejection
region, we reject the hypothesis that r = 0.
Thus there is significant correlation between Sales and Advertisement Expenses.
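The same conclusion follows from computing r and its t statistic at full precision. The t value comes out ≈ 16.1, essentially identical to the slope test above, as it must be; the 16.123 in the text reflects the rounded r = 0.9924:

```python
import math

x = [1, 1.5, 2, 2.5, 3.5, 4.5]
y = [60, 62, 65, 68, 72, 75]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
r = sxy / math.sqrt(sxx * syy)                     # correlation coefficient
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # t with n - 2 d.f.
print(round(r, 4), round(t, 1))  # 0.9924 16.1
```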
Since z is distributed as normal with mean 0 and s.d. 1, we can use Table T1 of areas
under the standard normal curve to take an appropriate decision. Since the value of z (2.09) is more
than 1.96 and lies in the critical region, we reject H0 and conclude that the correlation is significant.
Thus, if n = 2, even a value of r = 1 may not be significant, and thus may not imply correlation
between x and y.
In general, for lower values of n, even a high value of the correlation coefficient may not be significant. On the other hand, for high values of n, even a low value of the correlation coefficient, calculated from n pairs of observations, may be significant. The minimum value of r which can be considered significant at 5% level of significance is given by the following formula:
r ≥ t0.025,(n−2) / √[(n − 2) + t²0.025,(n−2)]   (11.18)
For ready reference, the following table gives the minimum values of r that are considered
significant at 5% level of significance for various values of n.
n r
3 0.997
4 0.950
5 0.878
… …
… …
10 0.633
The minimum values of r, for various values of n, at 5% and 1% levels of significance are given
in Table T8. It has been derived by the formula (11.18).
11.10 p-VALUE
Computer-Based Approach to Acceptance or Rejection of a Hypothesis
The p-value refers to a level of significance, and is used in computer outputs. Instead of calculating the test statistic and critical values, as is customary in manual calculation, the computer generally calculates the p-value and compares it with the corresponding level of significance. This value indicates the maximum level of significance at which the null hypothesis would be accepted.
If the level of significance is specified as 5% and the p-value generated by the computer is 0.02, i.e.
less than 0.05, the null hypothesis is rejected.
The p-value is the probability, when the null hypothesis is true, of obtaining a sample result at least as extreme as the one observed. The smaller the p-value, the less likely it is that the observed sample would have come from the assumed population, i.e. when the null hypothesis is true. Computer packages provide the p-value associated with any test of significance. Thus, one can compare the p-value with the level of significance, and conclude without referring to statistical tables.
The use of p-value in testing of hypotheses is explained with the help of the following ex-
ample.
A random sample of 100 students from the current year's batch gives the mean CGPA as 3.55 and
variance as 0.04. Can we say that this is the same as the mean CGPA of the previous batch, which was 3.5?
Solution:
Since we have to test the hypothesis that the population mean is equal to 3.5, the null and alterna-
tive hypotheses are set up as
H0 : m = 3.5
H1 : m 3.5
Even though the population variance is not given, the sample size is so large that the sample variance can be taken to be equal to the population variance, i.e. 0.04. Thus, the test statistic used is
z = (x̄ − μ0)/(σ/√n), which is distributed as N(0, 1)
We are given
x̄ = 3.55, μ0 = 3.5, n = 100 and σ = √0.04 = 0.2
Therefore,
z = (3.55 − 3.5)/(0.2/√100) = 0.05/0.02 = 2.5
Since the calculated value of z (2.5) is more than the tabulated value of z at 5% level of significance (1.96), the null hypothesis is rejected.
Thus the current year's mean CGPA is not the same as the last year's.
The p-value for this example works out to 0.0124 which is less than 0.05 (assumed level of
significance). Hence the null hypothesis is rejected by this method as well.
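The p-value quoted above can be obtained from the standard normal c.d.f.; `math.erfc` gives the two-tailed area directly, since 2(1 − Φ(z)) = erfc(z/√2):

```python
import math

# Two-tailed p-value for a standard normal test statistic
def p_value_two_tailed(z):
    return math.erfc(abs(z) / math.sqrt(2))

p = p_value_two_tailed(2.5)
print(round(p, 4))  # 0.0124, less than 0.05, so H0 is rejected
```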
1. Mean Estimation
This worksheet is basically divided into two parts, viz.:
Estimation with Raw Data
Estimation with Sample Statistics
Estimation with Raw Data
This is at the left side of the worksheet. One has to enter the raw data in column A from cell A4
downwards. The template automatically calculates the sample statistics, like the sample mean, sample
standard deviation and sample size.
For computation of confidence interval from raw data, one could use the upper left part of the
worksheet if the population standard deviation is known, and lower left part if population standard
deviation is not known.
If the population standard deviation is known, one enters the s.d. and the confidence level in
cells D8 and D9, respectively, and the template automatically computes the confidence intervals.
If the population standard deviation is not known, once the confidence level is entered in cell
D27, the template automatically computes the confidence intervals. We have provided a drop-down box for entering the confidence level, in which one could select from different values.
Estimation with Sample Statistics
This is at the right side of the worksheet. Here, one could directly enter the known sample statis-
tics.
For computation of confidence interval from sample statistics, one could use the upper right part
of the worksheet if the population standard deviation is known, and lower right part if population
standard deviation is not known.
If the population standard deviation is known, one enters the sample mean, sample size, population
s.d. and the confidence level in cells I6, I7, I8 and I9, respectively, and the template automatically
computes the confidence intervals.
If the population standard deviation is not known, one enters the sample mean, sample size, sample
s.d. and the confidence level in cells I24, I25, I26 and I27, respectively, and the template automatically computes the confidence intervals.
If the finite population correction is applicable, one could enter the population size in the cell
D11, if using raw data and in the cell I11, if using sample statistics.
2. Proportion Estimation
If one enters the sample size, sample proportion and the confidence level in the cells B5, B6 and
B7, respectively, the template would automatically calculate the confidence intervals in the cells
B10 and C10.
Above is the snapshot of the worksheet One Sample Test – Mean of the template, Statistical
Inference. This worksheet is basically divided into two parts viz.
Testing for population mean with Raw Data
Testing for population mean with Sample Statistics
Testing for Population Mean with Raw Data
This is at the left side of the worksheet. One has to enter the raw data in the column A from cell A4
downwards. The template automatically calculates the sample statistics like sample mean, sample
standard deviation and sample size.
For testing the hypothesis from raw data, one could use the upper left part of the worksheet if
the population standard deviation is known, and lower left part if population standard deviation is
not known.
If the population standard deviation is known, one needs to enter the s.d. and the value of α in
cells D8 and G11, respectively. One also has to enter the 'Alternate Hypothesis'. The null
hypothesis is assumed universally as H0: μ = μ0. We have given all three types of tests: two tailed,
left tailed and right tailed. One can enter the hypothesised value of μ in cell E9. The template
automatically tests all three hypotheses (all three tailed tests). One can consider the
appropriate test for the conclusion.
If the population standard deviation is not known, once the value of α is entered in cell G36 and
the hypothesised value of μ is entered in cell E37, the template automatically tests the hypotheses. The rest of the explanation is the same as for the above case.
Testing for Population Mean with Sample Statistics
This is at the right side of the worksheet. Here, one could directly enter the known sample statis-
tics.
For testing of hypothesis from sample statistics, one could use the upper right part of the worksheet
if the population standard deviation is known, and lower right part if population standard deviation
is not known.
If the population standard deviation is known, and one enters the sample size, sample mean, population s.d., value of α and the hypothesised value of the mean in cells I6, I7, I8 and D9, respectively,
the template carries out all three tests.
If the population standard deviation is not known, one enters the sample mean, sample size, sample
s.d., value of α and the hypothesised mean in cells J5, J6, J8, L11 and J12, respectively, and the template automatically tests the hypotheses.
If the finite population correction is applicable, one could enter the population size in the cell
G18 if using raw data and in the cell L18 if using sample statistics.
We have solved the example on heights of students, using raw data and Illustration 11.5, using
sample statistics template.
This worksheet could be used for testing of equality of means for two samples.
Above is the snapshot of the worksheet Two Sample Mean Pop Sds Known of the template,
Statistical Inference. This worksheet is basically divided into two parts viz.
Testing for equality of means of two samples with Raw Data
Testing for equality of means of two samples with Sample Statistics
Testing for Equality of Means of Two Samples with Raw Data
This is at the left side of the worksheet. One has to enter the raw data in columns B and C from
cells B6 and C6 downwards. The template automatically calculates the sample statistics, like the sample
mean, sample standard deviation and sample size, for the two samples.
One needs to enter the s.d.s and the value of α in cells F9, G9 and H15, respectively.
One also has to enter the 'Alternate Hypothesis'. The null hypothesis is assumed universally as
H0: μ1 − μ2 = d0. We have given all three types of tests: two tailed, left tailed and right tailed. One
can enter the hypothesised value of the difference, d0, in cell F16. The template automatically
tests all three hypotheses (all three tailed tests). One can consider the appropriate test
for the conclusion.
Testing for Equality of Means of Two Samples with Sample Statistic
This is at the right side of the worksheet. Here, one could directly enter the known sample statistics
for the two samples.
One needs to enter the sample means, sample sizes, sample s.d.s, the value of α and the hypothesised
difference in cells L6, M6, L7, M7, L9, M9, N15 and L16, respectively; the template then automatically
tests the hypotheses.
This worksheet could be used for testing the equality of means of two samples when the population standard deviations are unknown.
Above is the snapshot of the worksheet Two Sample Mean Pop Sds unknown of the template,
Statistical Inference. This worksheet is basically divided into two parts viz.
Testing for equality of means of two samples with Raw Data
Testing for equality of means of two samples with Sample Statistics
Testing for Equality of Means of Two Samples with Raw Data
This is at the left side of the worksheet. One has to enter the raw data in columns B and C from
cells B5 and C5 downwards. The template automatically calculates the sample statistics, like the sample
mean, sample standard deviation and sample size, for the two samples.
One needs to enter the value of α in cell G15. One also has to enter the 'Alternate
Hypothesis'. The null hypothesis is assumed universally as H0: μ1 − μ2 = d0. We have given all
three types of tests: two tailed, left tailed and right tailed. One can enter the hypothesised value of
the difference, d0, in cell E16. The template automatically tests all three hypotheses
(all three tailed tests). One can consider the appropriate test for the conclusion.
We have solved the Example 11.8 relating to car batteries using raw data template.
7. TWO SAMPLE MEAN PAIRED DIFF
This worksheet could be used for testing of equality of means for paired or dependent
samples.
Above is the snapshot of the worksheet Two Sample Mean Paired Diff of the template, Statistical
Inference. One needs to enter the data in columns B and C from cells B4 and C4 downwards.
The template automatically computes the sample statistics, like the differences, the sample size, the mean
of the differences and the standard deviation of the differences. One also needs to enter the value of α and the hypothesised value of the difference in cells I10 and G11, respectively. The template then computes
the test statistic and the p-value, and tests the hypotheses.
We have solved Example 11.9 relating to consumer confidence survey, using the above
template.
8. CHI SQUARE TEST FOR GOODNESS OF FIT
Above is the snapshot of the worksheet Chi Square Goodness of Fit Test of the template, Statistical
Inference. In this template, one gives the observed and the expected frequencies in columns
A and B from cells A4 and B4, respectively. The template automatically calculates the columns,
like the deviations and squared deviations, and the χ² statistic. After the value of α is entered in cell H7, the
template carries out the test and gives the result in cell H8.
We have solved Illustration 11.7 with the help of above template.
9. F TEST – EQUALITY OF VARIANCES
This worksheet could be used for testing of equality of variances for two samples.
Above is the snapshot of the worksheet F Test–Equality of Variances of the template, Statistical
Inference. This worksheet has two parts viz.
Testing for population variance with Raw Data
Testing for population variance with Sample Statistics
Testing for Population Variance with Raw Data
This is at the left side of the worksheet. One has to enter the raw data in the columns B and C from
cells B6 and C6 downwards. The template automatically calculates the sample statistics like sample
sizes and sample variances of the two samples.
If the value of a is entered in the cell G13, the template carries out the required test and gives
the results.
Testing for Population Variance with Sample Statistics
This is at the right side of the worksheet. Here, one could directly enter the known sample statis-
tics.
If one enters the sample sizes and the sample variances in the cells, J5, K5, J6, K6, respectively,
and also the value of a in the cell K13, the template carries out the required test and gives the re-
sults.
We have solved Illustration 11.8 relating to Agricultural Regions, using raw data.
GLOSSARY
1. Which of the following factors does not usually affect the width of a confidence interval?
(a) Sample size
(b) Confidence desired
(c) Variability in the population
(d) Population size
2. Given level of confidence as 95% and margin of error as 2%, the minimum sample size required
to estimate the proportion is:
(a) 1256 (b) 2009 (c) 2401 (d) 2815
3. Type-I error is defined as the probability to:
(a) accept a hypothesis when it is true
(b) accept a hypothesis when it is false
(c) reject a hypothesis when it is true
(d) reject a hypothesis when it is false
4. Which of the following is not an alternative hypothesis?
(a) H1 : μ ≠ μ0 (b) H1 : μ > μ0 (c) H1 : μ < μ0 (d) H1 : μ = μ0
5. If the alternative hypothesis is μ1 > μ2, the critical region will be on:
(a) the left side (b) the right side
(c) on both sides (d) any one of the above
6. Which of the following statements about confidence limits for the population mean is not true?
(a) 50% confidence limits are wider than 95%
(b) 90% confidence limits are wider than 95%
(c) 95% confidence limits are wider than 99%
(d) 99% confidence limits are widest
7. If, on the basis of a sample, a hypothesis is to be rejected at 5% level of significance, then the
p-value will be:
(a) = 0.05 (b) < 0.05 (c) > 0.05 (d) < 0.025
8. Which one of the following statements is false?
(a) α is called Type-I error
(b) 1 – α is called power of the test
(c) β is called Type-II error
(d) 1 – β is called power of the test.
9. Which one of the following is not a step in conducting a test of significance?
(a) Set up the Null hypothesis
(b) Decide the level of significance
(c) Decide the power of the test
(d) Decide on the appropriate statistic.
10. The p-value indicates the:
(a) Minimum level of significance at which the Null hypothesis would be rejected
(b) Maximum level of significance at which the Null hypothesis would be accepted
(c) Maximum level of significance at which the Null hypothesis would be rejected
(d) Minimum level of significance at which the Null hypothesis would be accepted
EXERCISES
1. A department store wants to determine the percentage of shoppers who buy at least one item.
A random sample of 500 shoppers leaving the shop showed that 150 did not buy any item. What
is the 96% confidence interval for the true percentage of buyers?
2. A manager wants to determine the average time required to complete a job. As per the past data
about completion of the job, the standard deviation is 5 days. How large should the sample be
so that he may be 99% confident that the sample mean may lie within ±2 days of the actual
mean?
3. An oil company has purchased a new machine which fills 1 litre tins with a type of oil. If
the fill exceeds 1000 ml, there will be wastage of oil. If the fill is under 1000 ml, there will
be complaints from the customers. To check the filling operation of the machine, 36 tins
are chosen at random and found to have a mean fill of 999.2 ml. The s.d. of the machine is known
to be 1.2 ml. What is the hypothesis for such a test? Test the hypothesis using 1% level of
significance.
4. A sample of 50 pieces of a certain type of string was tested. The mean breaking strength turned
out to be 15 kgs. Test whether the sample is from a batch of strings having a mean breaking
strength of 15.6 kgs. and standard deviation of 2.2 kgs. Use 1% level of significance.
5. A claim is made that a batch of bulbs has a mean life of 2000 hrs. From past experience, it
is known that the s.d. of lives is 100 hrs. A buyer specifies that he wants to test the claim
against the alternative hypothesis that the mean burning time is, in fact, below 2000 hrs at 2%
significance level. A sample of size 25 is drawn, and the sample average is found to be 1950
hrs. What conclusion should the buyer make? Should the buyer accept the hypothesis that the
mean life of all bulbs in the population is at least 2000 hrs?
6. A manufacturer of a patent medicine claims that it is effective in curing 90% of the people
suffering from the disease. In a sample of 200 people using this medicine, 160 were relieved
of suffering. Determine whether his claim is justified?
7. The owner of a workshop wants to know which of the 2 brands of hand gloves used in the
workshop has longer lasting life than the other. He selected, at random, 40 workers who wear
gloves of National firm, and their gloves lasted on an average for 80 days with s.d. 5.0 days;
while another 40 randomly selected workers wear out the gloves of Liberty firm on an average
in 84 days with s.d. of 4.0 days. Can he feel 95% confident that the difference between the
two brands is significant?
8. The manager of a workshop wishes to determine if a new process would reduce the working
time per unit manufactured on a given machine. He recorded the initial timings in minutes
taken by 5 workers and the new timings by the same workers after introducing the process.
He wishes to draw inference about the usefulness of the process from the observations given
below:
Workers                        1   2   3   4   5
Working time per unit  Before  8   4   9   8   6
                       After   5   3   8   6   8
11.66 Business Research Methodology
Use α = 0.05 and test whether the new process has resulted in the reduction of mean working
time.
9. A new petrol additive is being tested, in the hope that it will increase kms per litre. A series
of trials is carried out, with and without the additive. One hundred trials on a brand of car
without the additive show an average petrol consumption of 15 km per litre, with s.d. of 1.2.
With the additive, the average of another 150 trials is 16.5 km per litre, with s.d. of 1.4. Do
these figures establish, at 5% significance level, that the additive has increased the kms per
litre?
10. A pharmaceutical company wants to estimate the mean life of a particular drug under typical
weather conditions. The following results were obtained from a simple random sample of 25
bottles of the drug. The population s.d is given to be 3 months.
Sample Mean = 30 months
Population s. d. = 3 months
Find interval estimates with confidence level of (i) 90% (ii) 95% and (iii) 99%.
11. The Value for Money, a consumer products firm, interested in promoting a new product, wishes
to test the effectiveness of sponsoring a major TV movie. Of the 300 individuals surveyed,
during the week preceding the movie, 45% were familiar with the new product. After the
screening, a sample of 400 individuals were surveyed and the brand awareness found to be
51%. Can the firm conclude that the brand awareness was improved by sponsoring the movie,
at α = 0.05?
12. A new washing machine liquid detergent was introduced in the market, by using only cash
discount incentive as a promotional drive. After about a month, a sample of 60 housewives were
requested to rate the new detergent. After a month of intensive TV advertising, the same women
were asked to rate the detergent once again. Using a scoring system, based on perceptions of
product effectiveness, the difference in scores had mean 1.6 and s.d. 0.4. Is there evidence
that perceptions of the product's effectiveness changed during the period of the advertising?
Carry out the test at 0.05 level of significance.
13. In a management institute, the A+, A and B Grades allocated to students in their final examina-
tion, were as follows:
Specialisation          Grades
                 A+      A      B
Finance          20     25     10
Marketing        15     20      8
Operations        5     15      7
Using 5% level of significance, determine whether the grading scale is independent of the spe-
cialisation?
14. The Progressive Bank allocates the loan applications in the order they are received to its four
loan approval officers, one after the other in the same sequence, to avoid any bias of process-
ing. The following data shows the loan approvals by the four officers:
Decision Ms. Simran Mr. Sajay Ms. Saloni Mr. Sumil
Approved 24 17 35 11
Rejected 16 18 15 20
Use an appropriate statistical test to determine if the loan approval decision is independent of the
loan officer processing the loan application. Carry out the test at 5% level of significance.
15. The following results were obtained while studying the service time taken by two operators
while serving the customers, selected at random:
Operator   Number of    Mean Service Time   Sum of Squares of
           Customers    (in seconds)        Deviations from Mean
'A'        10           160                 900
'B'        12           140                 1080
Test the equality of variances of the service times of the two operators at 5% level of
significance.
Analysis of Variance
12
Contents
1. One Way/Factor ANOVA
2. Post hoc Tests – Pair-wise Comparison of Means
   Tukey's HSD Test
   Fisher's LSD Test
3. Two Way/Factor ANOVA
4. Two Way/Factor ANOVA with Interaction
5. Use of ANOVA for Testing Significance of Regression Equation
6. Using Excel
LEARNING OBJECTIVES
The main objective of this chapter is to help in understanding the type of analysis that is required to
test:
Equality of means of more than two variables/factors representing life of picture tubes, returns on
investments, effectiveness of training, impact of promotional strategies
Validity or significance of multiple regression equation
Equality of interactions between two factors like three advertising campaigns and three categories
of populations like urban, semi-urban and rural.
Relevance
Mr. Pankaj, the CEO of a relatively new brand of television, was enjoying the bliss of contin-
ued phenomenal growth in sales during the last two years. However, to give further impetus
to the sales, he planned to recruit about 200 field staff who could provide guidance to the staff
of retail outlets as also the potential customers about the features of televisions, innovations
taking place, etc., and thus bring out the cost-benefit aspect of the company’s televisions. Mr.
Pankaj intended to take fresh science graduates and provide them four weeks training at one
of the three institutions which were equipped to provide such training. However, before awarding
the contract, he thought of comparing the effectiveness of the training imparted by the three
institutions. Out of the first batch of 30 officers, he deputed 10 officers to each of the three
institutions where they were given training in three different modules, viz. technical, financial
and behavioural. After the training, he hired the services of a reputed consulting organisation to
conduct a quantitative assessment of the training imparted to the field officers. The consulting
agency used ANOVA to evaluate the impact of training in the three modules as also the three
institutions. Such an analysis helped Mr. Pankaj to award the training contract on objective
basis, without any prejudice.
              Salesmen
Institutes    1    2    3    4    5
1            67   70   65   71   72
2            73   68   73   70   66
3            61   64   64   67   69
The problem posed here is to ascertain whether the three Institutes' training programmes are
equally effective in improving the performance of trainees. If μ1, μ2 and μ3 denote the mean
effectiveness of the programmes of the three Institutes, statistically, the problem gets reduced to
testing the null hypothesis, i.e.
H0 : μ1 = μ2 = μ3
against the alternative hypothesis that it is not so, i.e.,
H1 : All the means are not equal
It may be appreciated that the sales figures of all the 15 salesmen would have been varying from
each other, even if they had attended the same training programme. This is due to inherent variations
that exist from person to person. Therefore, the variation in the 15 observations could be attributed
to two factors – one, the training received at different institutes and the other, due to inherent factors
present in the different salesmen and some other miscellaneous factors like their areas of operations,
etc. The Analysis of Variance technique helps us to find out the variation due to both the factors, and
also to assess whether the variation among the three Institutes is significantly greater than that due
to the other factors. If it is so, the programmes of the three Institutes are not equally effective.
All the above 15 observations could be represented by a variable xij indicating the score at the ith
Institute for the jth salesman. It may be noted that while i varies from 1 to 3, j varies from 1 to 5.
Further, let the mean of the salesmen at the ith Institute be
x̄i = (1/5) Σj xij,    j = 1, …, 5
and the mean of the jth salesman be
x̄j = (1/3) Σi xij,    i = 1, 2, 3
The methodology of carrying out the test is illustrated below.
The marginal row and column totals of the values in Table 12.1, as also the means of the three groups
of salesmen trained at each Institute, are worked out and presented in Table 12.2.

Institute          1     2     3     4     5    Total
1                 67    70    65    71    72    345          x̄1 = 69
2                 73    68    73    70    66    350          x̄2 = 70
3                 61    64    64    67    69    325          x̄3 = 65
Total for
Three Salesmen   201   202   202   208   207   1020   Grand Total
Mean for
Three Salesmen    67   67.3  67.3  69.3   69     68   Grand Mean
The grand mean sale of all the salesmen is worked out below:
Grand Mean (x̄) = Grand Total / Total Number of Observations = 1020 / 15 = 68
Now, the total variation among all the 15 observations
= Sum of the squares of deviations of all the 15 observations from their grand mean, viz. 68
= (67 – 68)² + … + (61 – 68)² + … + (69 – 68)²
= 180
It may be noted that if all the salesmen were equally good with respect to their performance, and
the training of all the three Institutes were equally effective, all the observations would have been
identical, and this sum would have been zero.
Mathematically, this sum is expressed as:
Total Variation or Total Sum of Squares = Σi Σj (xij – x̄)²     (12.1)
where xij is the sales by the jth salesman (j = 1, 2, 3, 4, 5) trained by the ith Institute (i = 1, 2, 3).
This can also be written as
= Σi Σj (xij – x̄i + x̄i – x̄)²
It may be noted that, on expanding, the first term in the above expression indicates the sum of
squares of the deviations of the sales by the jth salesman trained by the ith Institute from the
average sales of the five salesmen trained by the ith Institute. Further, the second term indicates
the sum of squares of the deviations of the mean of all five salesmen trained by the ith Institute
from the grand mean for all 15 salesmen.
The first term, termed the sum of squares within institutes, is evaluated numerically as follows:
Σi Σj (xij – x̄i)² = {(x11 – x̄1)² + (x12 – x̄1)² + (x13 – x̄1)² + (x14 – x̄1)² + (x15 – x̄1)²}
+ {(x21 – x̄2)² + … + (x25 – x̄2)²}
+ {(x31 – x̄3)² + … + (x35 – x̄3)²}
= (67 – 69)² + (70 – 69)² + … + (72 – 69)²
+ (73 – 70)² + … + (66 – 70)²
+ (61 – 65)² + … + (69 – 65)²
= 110
Further, the second term, representing the sum of squares between institutes, is evaluated as
follows:
Σi Σj (x̄i – x̄)² = 5 Σi (x̄i – x̄)²
= 5 {(x̄1 – x̄)² + (x̄2 – x̄)² + (x̄3 – x̄)²}
= 5 {(69 – 68)² + (70 – 68)² + (65 – 68)²}
= 5 (14)
= 70
Thus, we note that
Total Sum of Squares (180) = Sum of Squares within Institutes (110)
+ Sum of Squares between Institutes (70)
Now, the sum of squares Σi Σj (xij – x̄i)² in the expression (12.1) is called the Sum of Squares
(among salesmen) Within Institutes, and if all the salesmen in each Institute had been the same,
this would have been zero, as each of xi1, xi2, xi3, xi4 and xi5 in the ith Institute would then have
been equal to x̄i.
The sum of squares Σi Σj (x̄i – x̄)² in the expression (12.1) is called the Sum of Squares Between
Institutes, and would have been zero if all the three Institutes had been equally effective, because
then x̄1 = x̄2 = x̄3 = x̄.
Thus, we can say that
Total Sum of Squares = Sum of Squares Within Institutes
+ Sum of Squares Between Institutes (12.2)
The above application is referred to as one-way or one-factor ANOVA, as we have tested differences
due to only one factor, i.e. the Institute. The three Institutes are referred to as three levels or
treatments.
Assuming all salesmen to be equally competent, the observations vary from each other due to one
factor viz. training imparted by the institutes – the three Institutes being referred to as three levels
of the factor. In general, the observations are collected in the following format:
Treatment    Observations            Total
1            x11  x12  …  x1n1       T1
2            x21  x22  …  x2n2       T2
3            x31  x32  …  x3n3       T3
…
k            xk1  xk2  …  xknk       Tk
It may be noted that the total number of treatments is equal to k, and there are ni observations for
the ith treatment, so that the total number of observations is n = Σ ni.
The general result is of the form;
Total Sum of Squares = Sum of Squares Between Treatments or Due to Treatments
+ Sum of Squares Within Treatments or
‘Sum of Squares Due to Error’ or ‘Error Sum of Squares’
There is a convenient mechanical way of performing the various calculations and carrying out the
requisite procedure for testing the equality of treatments, which is explained below:
Calculation of Grand Mean:
x̄ = Σ xij / n     (12.3)
Correction Factor (abbreviated as CF) = (Σ xij)² / n     (12.4)
Total Sum of Squares (TSS) = Σ xij² – CF
Sum of Squares Between Treatments (SST) = Σi (Ti² / ni) – CF
The expression (12.2) is generalised as follows:
Total Sum of Squares = Sum of Squares Between Treatments
+ Sum of Squares Within Treatments
The Sum of Squares within Treatments is referred to as Sum of Squares due to Error or Error
Sum of Squares, and is abbreviated as SSE. This terminology is used to indicate that total variation
is either due to differences among treatments or due to several other unknown and random factors.
All these unknown and random factors are combined to cause 'Error', and the sum of squares due
to these unknown and random factors is called 'Error Sum of Squares' or 'Sum of Squares due to
Error'.
Sum of Squares due to Error (SSE) = Total Sum of Squares
– Sum of Squares Due to Treatments
SSE = TSS – SST
For testing the equality of means, or the equal effectiveness of all the k treatments, we compute
Mean Sum of Squares Due to Treatments (MSST) = SST / (k – 1)     (12.6)
Mean Sum of Squares Due to Error (MSSE) = SSE / (n – k)     (12.7)
The Fisher's ratio 'F' statistic is defined as
F = MSST / MSSE     (12.8)
and is distributed as F with (k – 1, n – k) d.f.
If the calculated value of F is more than the tabulated value of F at (k – 1, n – k) d.f., we
reject the null hypothesis that all the means are equal. If, on the other hand, the calculated
value is less than the tabulated value, we accept the null hypothesis.
It may be noted that the greater the variation among the Institutes, the greater will be SST and
MSST, and accordingly the greater the value of 'F', implying greater chances of rejection of the
hypothesis about equality of means.
For the above example of the training programmes at the three Institutes,
Correction Factor (CF) = (Σ xij)² / n = 1020² / 15 = 69360
Sum of Squares Between Treatments (SST) = (345² + 350² + 325²) / 5 – CF
= 69430 – 69360
= 70
Therefore,
SSE = TSS – SST
= 180 – 70
= 110
Thus,
MSST = SST / (k – 1) = 70 / (3 – 1) = 35
MSSE = SSE / (n – k) = 110 / (15 – 3) = 9.167
F = MSST / MSSE = 35 / 9.167 = 3.82, with (2, 12) d.f.
The tabulated value of F at 2, 12 d.f. vide Table T4 is 3.89. Since the calculated value is less than
the tabulated value, and falls in the acceptance region, we do not reject the null hypothesis that all
the training Institutes are equal with respect to the training programmes conducted by them.
All the above results are summarised in an ANOVA Table given below.
Thus, there is no significant difference among the means of sales of the salesmen trained at the
three different institutes.
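The whole computation above can be reproduced directly from the raw scores of Table 12.1. A short Python sketch (assuming SciPy is available; it is not part of the book's Excel template) follows, and SciPy's f_oneway yields the same F statistic of about 3.82:

```python
# One-way ANOVA for the three Institutes' trainees, as worked out in the text.
from scipy.stats import f_oneway

inst1 = [67, 70, 65, 71, 72]
inst2 = [73, 68, 73, 70, 66]
inst3 = [61, 64, 64, 67, 69]

# Manual sums of squares, following the chapter's method
all_obs = inst1 + inst2 + inst3
n, k = len(all_obs), 3
grand_mean = sum(all_obs) / n                       # 68
tss = sum((x - grand_mean) ** 2 for x in all_obs)   # total SS = 180
sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
          for g in (inst1, inst2, inst3))           # between SS = 70
sse = tss - sst                                     # within (error) SS = 110

F = (sst / (k - 1)) / (sse / (n - k))               # MSST / MSSE
print(f"TSS = {tss}, SST = {sst}, SSE = {sse}, F = {F:.2f}")

# Cross-check with SciPy
F_scipy, p = f_oneway(inst1, inst2, inst3)
print(f"scipy F = {F_scipy:.2f}, p-value = {p:.4f}")
```

Since the tabulated F at (2, 12) d.f. is 3.89, the computed F of about 3.82 again leads to not rejecting the null hypothesis.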
Rs. per month. Further, the data represents independent random samples of total compensation for
eight banking executives belonging to each of the 3 banking sectors.
We use the above data for illustrating the Tukey’s test for comparing means of the three pairs of
variables, viz.:
Corporate banking and Retail banking
Corporate banking and Personal banking
Retail banking and Personal banking
The data is also used to illustrate calculation of 95% confidence limits.
Before applying the Tukey's test, one has to ascertain that the differences among the means are
significant, because if the differences among the means are insignificant, there is no point in using
Tukey's test to look for a significant difference between any pair of means. Thus, first we use
ANOVA to test the following null hypothesis
H0 : μ1 = μ2 = μ3
where μ1, μ2 and μ3 are the mean compensations for corporate, retail and personal banking.
The Tukey's test is to be used only if the above hypothesis is rejected, i.e. if the mean
compensations of executives in the above sectors are not equal.
It may be verified that the ANOVA yields the following results:
F = MS Between Sectors / MS Within Sectors
The tabulated value of F at (2, 21) d.f. vide Table T4 is 3.47.
It may be noted that the null hypothesis about equality of means of the three sectors is re-
jected. We can, therefore, now proceed to illustrate Tukey’s test. It may be appreciated that,
if the null hypothesis for equality of means was accepted, there would have been no need to
proceed further.
For calculating the value of the statistic
w = q × s / √n    (q is called the studentised range statistic)
we note from the following Table 12.6 that, for n = 8, the value of q at (3, 21) d.f. at 5% level of
significance is 3.56.
         k = 3         k = 4
n = 6    15 (3.67)     20 (3.96)
n = 8    21 (3.56)     28 (3.86)
n = 10   27 (3.51)     36 (3.81)
(Each entry gives the error degrees of freedom k(n – 1), with the corresponding 5% value of q in
parentheses.)
For calculating the value of s, we prepare the following Table to calculate the sample variances of
the sectors:

Corporate Banking x1   (x1i – x̄1)²   Retail Banking x2   (x2i – x̄2)²   Personal Banking x3   (x3i – x̄3)²
755            41108       520        12377      438        18564
712            60393       295       113064      828        64389
845            12713       553         6123      622         2280
985              743       950       101602      453        14702
1300          117135       930        89252      562       150.06
1143           34318       428        41311      348        51189
733            50513       510        14702      405        28646
1189           53477       864        54173      938       132314
x̄1 = 957.75   Sum = 370398   x̄2 = 631.25   Sum = 432601   x̄3 = 574.25   Sum = 312233
The value of s² can also be obtained from the ANOVA Table 12.5, as s² is equal to MSSE, i.e. the
mean sum of squares within sectors. From the table, it is found to be equal to 53106, which is the
same value as obtained from the above Table 12.7.
Now all the pairs of differences between the sample means are compared with the value
w = q × s / √n = 3.56 × √53106 / √8 = 290.05. The values greater than this will imply a significant
difference between the corresponding group means.
From the above table, the differences in group means are as follows:
Difference between Corporate Banking and Retail Banking: x̄1 – x̄2 = 957.75 – 631.25 = 326.5
Difference between Corporate Banking and Personal Banking: x̄1 – x̄3 = 957.75 – 574.25 = 383.5
Difference between Retail Banking and Personal Banking: x̄2 – x̄3 = 631.25 – 574.25 = 57
Thus, we observe that the compensations between corporate banking and retail banking execu-
tives are significantly different and so is the compensation between corporate banking and personal
banking executives. However, the difference in compensation between personal banking and retail
banking executives is not significant.
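The cut-off w used above can be computed directly. The sketch below (illustrative, assuming SciPy 1.7 or later for the studentized_range distribution) reproduces the three pair-wise comparisons:

```python
# Tukey's HSD cut-off w = q * s / sqrt(n) for the banking-compensation data:
# s^2 = 53106 (MSSE from the ANOVA table), n = 8 per group, q at (3, 21) d.f.
import math
from scipy.stats import studentized_range

s = math.sqrt(53106)                      # pooled within-group s.d.
n, k, dfe = 8, 3, 21
q = studentized_range.ppf(0.95, k, dfe)   # about 3.56, as in Table 12.6
w = q * s / math.sqrt(n)                  # about 290

means = {"Corporate": 957.75, "Retail": 631.25, "Personal": 574.25}
pairs = [("Corporate", "Retail"), ("Corporate", "Personal"), ("Retail", "Personal")]
for a, b in pairs:
    diff = abs(means[a] - means[b])
    verdict = "significant" if diff > w else "not significant"
    print(f"{a} vs {b}: |difference| = {diff:.2f} ({verdict})")
```

As in the text, the two differences involving Corporate Banking exceed w, while the Retail–Personal difference of 57 does not.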
It may be added that the 95% confidence limits for all the sector means have been derived, and are
given in tabular as well as pictorial form purely for better comprehension.
Corporate Banking: 957.75 ± 169.44
Retail Banking: 631.25 ± 169.44
Personal Banking: 574.25 ± 169.44
The Least Significant Difference is given by
LSD = t(α/2, k(n – 1)) × √(MSSE × (1/ni + 1/nj))
where n = number of observations in each of the three groups, k is the number of groups (equal to
3), and MSSE is the Within Groups Mean Sum of Squares obtained from the ANOVA Table 12.5
prepared from the given data; t(α/2, k(n – 1)) is the value of Student's 't' at α% level of significance
and k(n – 1) d.f. In our illustration, n = 8, k = 3, and the within groups mean sum of squares is
calculated as 53106 in the ANOVA Table 12.5.
For the given example,
√(53106 × (1/8 + 1/8)) = √13276.5 = 115.2
Value of 't' at 5% level of significance and 3(8 – 1) = 21 d.f. = 2.08
Therefore,
LSD = 2.08 × 115.2 = 239.6
This is the Least Significant Difference between group means x̄i and x̄j for the null hypothesis
μi = μj to be rejected. It implies that, to be significant, the difference between two sample means
must be at least 239.6. In the above example,
|x̄1 – x̄2| = 326.5
|x̄1 – x̄3| = 383.5
|x̄2 – x̄3| = 57
We note that the first two differences are greater than the LSD (239.6) but the third one is less than
the LSD. Thus, while the hypotheses H0: μ1 = μ2 and H0: μ1 = μ3 are rejected, the hypothesis
H0: μ2 = μ3 is not rejected.
It may be noted that the conclusions for comparison of group means are the same for Fisher’s
LSD test as for Tukey’s HSD test.
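Fisher's LSD calculation can likewise be sketched programmatically (SciPy assumed; an illustration, not the book's template):

```python
# Fisher's LSD as in the text: LSD = t(a/2, k(n-1)) * sqrt(MSSE*(1/ni + 1/nj))
import math
from scipy.stats import t

msse, n, k = 53106, 8, 3
dfe = k * (n - 1)                                  # 21 degrees of freedom
t_crit = t.ppf(0.975, dfe)                         # about 2.08 (two-sided, 5%)
lsd = t_crit * math.sqrt(msse * (1 / n + 1 / n))   # about 239.6
print(f"t = {t_crit:.3f}, LSD = {lsd:.1f}")

for diff in (326.5, 383.5, 57.0):
    print(diff, "significant" if diff > lsd else "not significant")
```

The verdicts match Tukey's HSD: only the Retail–Personal difference of 57 fails to reach the threshold.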
Such type of analysis is called One Way or One Factor ANOVA. In this section, we discuss Two
Way or Two Factor ANOVA. Here, the variation in the data is caused by two factors. This is illustrated
with an example below, wherein the variation in the number of additional mobile phone subscribers
is caused by different telecom companies as well as different periods of time, say months.
Example 12.1
The following Table gives the number of subscribers added by four major telecom players in India,
in the months of August, September, October and November 2005. The data are given in 000’s and
are rounded off to the nearest 100, and are thus in lakhs.
Additions to Subscribers
(In Lakhs)
Company
Months Bharti BSNL Tata Indicom Reliance
August 6 6 2 5
September 7 6 2 3
October 7 6 6 4
November 7 8 7 4
(Source: Indiainfoline.com on India Mobile Industry)
Additions to Subscribers
(in Lakhs)
Company Month
Months Bharti BSNL Tata Indicom Reliance Total Average
August 6 6 2 5 19 4.75
September 7 6 2 3 18 4.50
October 7 6 6 4 23 5.75
November 7 8 7 4 26 6.50
Company Total 27 26 17 16 Grand Total 86
Average 6.75 6.5 4.25 4 Grand Mean 5.375
From the above table, we work out the requisite sums of squares for preparing the ANOVA Table,
as follows:
Correction Factor (CF) = (Σ xij)² / 16 = 86² / 16 = 462.25
Total Sum of Squares (TSS) = Σ xij² – CF
= 6² + 6² + … + 7² + … + 4² – 462.25
= 514 – 462.25
= 51.75
S.S. Between Months = Σ (Ti² / 4) – CF
= (19² + 18² + 23² + 26²) / 4 – 462.25
= 472.5 – 462.25
= 10.25
S.S. Between Companies = (27² + 26² + 17² + 16²) / 4 – 462.25
= 487.5 – 462.25
= 25.25
Residual or Error Sum of Squares = 51.75 – 10.25 – 25.25 = 16.25
The above results are presented in the following ANOVA Table.
ANOVA Table
With the calculated values of 'F' given in the table, we can derive conclusions as follows:
(i) Since the calculated value of F (4.65) is greater than 3.86, the tabulated value of F at 5% level
of significance and (3, 9) d.f., it is concluded that there is a significant difference between
the companies in terms of adding subscribers.
(ii) Since the calculated value of F (1.89) is less than 3.86, the tabulated value of F at 5% level of
significance and (3, 9) d.f., it is concluded that there is no significant difference between the
months in terms of total additional subscribers.
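The sums of squares and F ratios of Example 12.1 can be cross-checked with a short computation. The Python sketch below (an illustration, not the book's Excel template) reproduces them from the raw table:

```python
# Two-way ANOVA for Example 12.1: rows are months (Aug-Nov), columns are the
# four telecom companies; subscriber additions in lakhs.
data = [
    [6, 6, 2, 5],   # August
    [7, 6, 2, 3],   # September
    [7, 6, 6, 4],   # October
    [7, 8, 7, 4],   # November
]
r, c = 4, 4
total = sum(sum(row) for row in data)                    # grand total = 86
cf = total ** 2 / (r * c)                                # CF = 462.25
tss = sum(x * x for row in data for x in row) - cf       # 51.75
ss_rows = sum(sum(row) ** 2 for row in data) / c - cf    # 10.25 (months)
col_sums = [sum(row[j] for row in data) for j in range(c)]
ss_cols = sum(s * s for s in col_sums) / r - cf          # 25.25 (companies)
sse = tss - ss_rows - ss_cols                            # 16.25 (error)

mse = sse / ((r - 1) * (c - 1))                          # error mean square, 9 d.f.
f_rows = (ss_rows / (r - 1)) / mse                       # about 1.89
f_cols = (ss_cols / (c - 1)) / mse                       # about 4.66
print(cf, tss, ss_rows, ss_cols, sse, round(f_rows, 2), round(f_cols, 2))
```

Both F ratios agree with the ANOVA table's conclusions: the company effect is significant at (3, 9) d.f., the month effect is not.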
The variation could also occur due to interaction between the institute and the field of specialisation.
For example, it could happen that the marketing specialisation at one institute might fetch a better
pay package than that at another institute. These presumptions could be tested by collecting the
following type of data for a number of students with different specialisations and different
institutes. However, for the sake of simplicity of calculations and illustration, we have taken only
two students each for each interaction between institute and field of specialisation.
Illustration 12.3
The data is presented below in a tabular format.
Institute Institute Institute
A B C
Marketing 8 10 8
10 11 7
Finance 9 11 5
11 12 6
HRD 9 9 8
7 7 5
From the above table, we work out the requisite sums of squares for preparing the ANOVA Table,
as follows:
Correction Factor (CF) = (Sum of All Observations)² / Total Number of Observations
= (153)² / 18
= 1300.5
Total Sum of Squares (TSS) = (Sum of Squares of All Observations) – CF
= 1375 – 1300.5
= 74.5
Sum of Squares Between or due to Fields of Specialisation (Rows: SSR)
= (54² + 54² + 45²) / 6 – 1300.5
= 9
Sum of Squares due to Institutes (Columns: SSC)
= (54² + 60² + 39²) / 6 – 1300.5
= 39
Sum of Squares due to Interaction between Institutes and Fields of Specialisation (SSI)
= n Σi Σj (x̄ij – x̄i. – x̄.j + x̄..)²
where n is the number of observations for each interaction (equal to 2 in this example),
x̄ij is the mean of the observations of the ith row and jth column,
x̄i. is the mean of the observations in the ith row,
x̄.j is the mean of the observations in the jth column, and
x̄.. is the grand mean of all the observations.
These terms can be calculated by first calculating the means of all the interactions, as also the
means of the corresponding rows and columns, and then calculating the sum of squares for each
interaction by the above formula, as follows.
Thus,
Interaction SS (SSI) = 2 × 6 = 12
Sum of Squares due to Error (SSE) = TSS – SS due to Specialisation (SSR)
– SS due to Institutes (SSC) – SS due to Interaction (SSI)
= 74.5 – 9 – 39 – 12
= 14.5
Now, the ANOVA Table can be prepared as follows.
ANOVA Table
From the above table, we conclude that while the pay packages among the institutes are
significantly different, there is no significant difference among the pay packages for the fields of
specialisation, as also among the interactions between the institutes and the fields of specialisation.
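The whole decomposition for Illustration 12.3 can be verified numerically. A minimal Python sketch (not part of the book's Excel template) follows; each cell holds the two students' pay packages for one (specialisation, institute) interaction:

```python
# Two-way ANOVA with interaction for Illustration 12.3:
# SSR = 9, SSC = 39, SSI = 12, SSE = 14.5 as derived in the text.
cells = {
    ("Marketing", "A"): [8, 10], ("Marketing", "B"): [10, 11], ("Marketing", "C"): [8, 7],
    ("Finance", "A"): [9, 11],   ("Finance", "B"): [11, 12],   ("Finance", "C"): [5, 6],
    ("HRD", "A"): [9, 7],        ("HRD", "B"): [9, 7],         ("HRD", "C"): [8, 5],
}
rows = ["Marketing", "Finance", "HRD"]
cols = ["A", "B", "C"]
nrep = 2                                        # observations per cell

all_x = [x for v in cells.values() for x in v]
N = len(all_x)                                  # 18 observations
cf = sum(all_x) ** 2 / N                        # correction factor = 1300.5
tss = sum(x * x for x in all_x) - cf            # total SS = 74.5

row_tot = [sum(sum(cells[(r, c)]) for c in cols) for r in rows]   # 54, 54, 45
col_tot = [sum(sum(cells[(r, c)]) for r in rows) for c in cols]   # 54, 60, 39
ssr = sum(t * t for t in row_tot) / (nrep * len(cols)) - cf       # specialisations
ssc = sum(t * t for t in col_tot) / (nrep * len(rows)) - cf       # institutes
ss_cells = sum(sum(v) ** 2 for v in cells.values()) / nrep - cf   # between-cell SS
ssi = ss_cells - ssr - ssc                                        # interaction SS
sse = tss - ssr - ssc - ssi                                       # error SS
print(tss, ssr, ssc, ssi, sse)
```

The interaction sum of squares is obtained here as the between-cell sum of squares minus the row and column sums of squares, which is algebraically the same as the deviation formula used in the text.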
Example 12.2
RELIABLE tyre dealer wishes to assess the equality of lives of three different brands of tyres sold
by it. It also wants to assess whether the lives of these tyres are the same for the four brands of
cars on which they are being used. Thus, each brand of tyre was tested on each of the four brands
of cars. Further, the dealer wishes to ascertain the equality of lives of tyres for each combination
of brand of tyre and car. The mileages obtained are given as follows:
(Mileage in '000 Kms)
                       Car Brands
                  A     B     C     D
Tyre     I       32    30    34    36
Brands           31    29    33    38
                 33    28    36    39
                 31    30    35    40
         II      38    39    40    41
                 37    40    41    39
                 38    41    42    40
                 39    39    43    42
         III     32    33    40    45
                 30    32    42    43
                 31    30    41    42
                 33    31    40    46
This example is solved through the template on ANOVA with Interaction, and is given in Section 12.7
titled USING EXCEL.
where, k is the number of independent variables (equal to 2 in the above equation), and n is the
number of observations for each of the independent variables. Further details are given in Chapter
14 on Multivariate Statistical Techniques.
ONE WAY
This worksheet provides the calculations for one-way or one factor ANOVA.
The above is a snapshot of the worksheet 'One Way' of the template 'ANOVA'. In the above
worksheet, one has to enter the data in columns B, C, D, E, F, from the 5th row downwards. The
maximum number of variables possible is five. If one enters data in the above-mentioned columns,
the template automatically calculates the rest of the computations and gives the ANOVA Table at
the right side of the data.
The ANOVA Table has following columns:
Source : Source of variation.
SS : Sum of squares for respective source
df : Degrees of freedom for the source
MS : Mean Sum of Squares
F : Calculated value of F statistic
F critical : Table value of F statistic
p-value : Probability value of the F statistic
Result : The decision of accepting or rejecting the hypothesis
The source of variation contains three factors, columns, rows and total.
We have solved Illustration 12.1, relating to salesmen training from different institutes, using the
above worksheet.
TWO WAY
This worksheet provides the calculations for two way or two factor ANOVA.
The above is a snapshot of the worksheet ‘Two Way’ of the template ‘ANOVA’. In the above
worksheet, one has to enter the data in columns B, C, D, E, F, from 5th row downwards. The maxi-
mum number of columns possible is five. If one enters data in the above mentioned columns, the
template automatically calculates the rest of the computations and gives ANOVA Table on the right
side of the data.
The explanation of ANOVA Table is the same as in the previous worksheet, except that now the
source contains four different factors, ‘columns, rows, error and total’.
We have solved Example 12.1, relating to telecom players, using the above worksheet.
This worksheet provides the calculations for two-way or two factor ANOVA with inter-
action.
The above is a snapshot of the worksheet 'Two Way with Interaction' of the template 'ANOVA'. In
the above worksheet, one has to enter the data in columns B, C, D, E, F, from the 5th row downwards.
The maximum number of columns possible is five. Here, each row has more than one observation. If one
enters data in the above mentioned columns and rows, the template automatically calculates the rest
of the computations and gives ANOVA Table at right side of the data.
The explanation of ANOVA Table is the same as in the previous worksheet, except that now the
source contains five different factors, columns, rows, interaction, error and total.
We have solved Example 12.2, relating to mileage of tyres, using the above worksheet.
GLOSSARY
Analysis of Variance (ANOVA) : The process for splitting the variation of a group of observations into assignable causes and setting up various significance tests
One-Way or One-Factor ANOVA : When the source of variation in the observations is primarily due to one factor
Two-Way or Two-Factor ANOVA : When there are two factors as sources of variation in the observations
Interaction of Factors : Joint impact of two factors
Post hoc Test : Test carried out based on the result of the earlier test
EXERCISES
1. In order to promote use of credit cards by a bank, the users of all three types of card holders
viz. ‘Gold International’, ‘Gold’ and ‘Silver’ were offered 5% discount on their bills. After the
offer period, five cards were selected at random from each category. Percentage of increase in
the bills of each type are given below:
Are the increases in the bill amounts for the different types of cards equal? What does it imply? Use Tukey’s HSD and Fisher’s LSD tests to carry out pairwise comparison of the three types of cards.
2. Three groups of almost equally effective salesmen for consumer products were deputed to sales training programmes conducted by three different training institutes. The amounts of sales made by each of the 15 salesmen during the first week after completing the training are as follows:
Institute A : 65, 68, 64, 70, 71, 75
Institute B : 73, 68, 73, 69, 64
Institute C : 64, 64, 66, 69
Can the difference among mean sales by the three groups be attributed to chance at the level
of significance = 0.05?
3. The following data gives monthly rates of returns for the shares of three companies over the
six month period from October 2006 to March 2007.
Month A B C
October ’06 3.5 5.2 4.0
November ’06 –2.5 –4.0 3.6
December ’06 –5.6 5.4 6.0
January ’07 4.0 –4.6 –3.5
February ’07 5.0 6.6 6.0
March ’07 7.5 8.0 5.2
Can we conclude that the average monthly rates of returns on all the shares have been the
same?
4. A company dealing with office equipment has offices in 3 cities. The company wants to com-
pare the volume of sales in the offices during the five day promotional period. The data about
sales on each of the five days are given below:
Office Sales in Rs. 1000’s
Mumbai 85 125 110 93 160
Delhi 124 75 82 135 105
Chennai 95 130 145 190 170
Test for differences between the sizes of sales among the 3 offices.
5. The following table gives the retail prices of a certain commodity in some selected shops in
four cities.
City Prices
A 62, 58, 60, 59
B 50, 48, 52
C 70, 65, 68, 64, 63
D 80, 85, 82, 78
Can we say that the prices of the commodity differ in the four cities?
6. It is required to assess the life of three different types of tyres. To eliminate the effect of the brand of car on which they are used, each type of tyre was tested on each brand of car. The mileages obtained are given as follows:
Tyre Brand Mileage (in ’000 Kms.)
Car Brand I II III
A 32 38 32
B 30 39 33
C 35 40 40
D 38 41 45
Use an appropriate test to assess whether there is any association between car brands and
types of tyres.
7. A paint manufacturing company is marketing paint tins whose maximum retail price is Rs 600. The salesmen are free to negotiate the price with the retailers, subject to a minimum of Rs 400. Three of its salesmen, viz. ‘A’, ‘B’ and ‘C’, have reported the following prices negotiated with five shops each:
Sales Person
Retail Shop A B C
1 450 430 420
2 435 410 200
3 425 405 400
4 400 420 410
5 440 425 415
Test whether the average prices negotiated by the three salesmen are equal.
Non-Parametric Tests
13
Contents
1. Relevance – Advantages and Disadvantages
2. Tests for
   Randomness of a Series of Observations – Run Test
   Change in Value or Preference – Sign Test
   Specified Mean or Median of a Population – Signed Rank Test
   Goodness of Fit of a Distribution – Kolmogorov-Smirnov Test
   Comparing Two Populations – Kolmogorov-Smirnov Test
   Equality of Two Means – Mann-Whitney (‘U’) Test
   Equality of Several Means
   — Wilcoxon-Wilcox Test
   — Kruskal-Wallis Rank Sum (‘H’) Test – One-Way ANOVA
   — Friedman’s (‘F’) Test – Two-Way ANOVA
   Rank Correlation – Spearman’s
   Rank Correlation – Kendall’s Tau
LEARNING OBJECTIVES
This chapter aims to
Highlight the importance of non-parametric tests when the validity of assumptions in the tests of significance, described in Chapters 11 and 12 on Statistical Inference and ANOVA, respectively, is doubtful.
Describe certain non-parametric tests of significance relating to randomness, the mean of a population, the means of two or more populations, rank correlation, etc.
Relevance
The General Manager of Evergrowing Corporation was going through the report of a con-
sultant who had suggested certain measures to accelerate the growth of the Corporation. He
was a bit surprised that the report highlighted the importance of training the staff and contained several options to impart the training. However, the selection of options was left to the
Corporation. The General Manager requested the Chief of HRD Department to evaluate the
effectiveness of the various options suggested and recommend appropriate training strategies,
at the earliest. The HRD Chief was aware of the use of statistical tests to compare the effec-
tiveness of training programmes, but was not too sure of the assumptions that were required
for application of those tests. He, therefore, discussed the matter with his friend who was
teaching Statistics at a management institute. His friend was also not too sure of the validity of the assumptions in the training environment of the Corporation. He, however, advised that, since the time available was short and the effectiveness of training was not as precise a variable as the physical measurements of an item or the monetary evaluation of an option, the HRD Chief could use non-parametric tests to evaluate and compare the effectiveness of the various training options. This advice helped the HRD Chief evaluate the effectiveness of the various training programmes and submit the report, well in time, to the General Manager.
It may be noted that the number of (+) observations is equal to the number of (–) observations, both being equal to 10. As defined above, a succession of values with the same sign is called a run. Thus, the first run comprises the observations 58 and 61, with the – sign; the second run comprises only one observation, i.e. 78, with the + sign; the third run comprises three observations, 72, 69 and 65, with the – sign; and so on.
Total number of runs R = 10.
This value of R lies inside the acceptance interval found from Table T11 as from 7 to 15 at 5% level
of significance. Hence the hypothesis that the sample is drawn in a random order is accepted.
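Counting runs can also be sketched in a few lines of Python. This is not part of the original text, and the +/– sequence below is hypothetical, chosen only to illustrate the counting; the acceptance interval still comes from Table T11.

```python
def count_runs(signs):
    """Count successions of identical signs in a sequence."""
    runs = 1 if signs else 0
    for prev, cur in zip(signs, signs[1:]):
        if cur != prev:          # a change of sign starts a new run
            runs += 1
    return runs

# hypothetical sequence of 20 observations, 10 (+) and 10 (-)
seq = list("--+---++-++--+++--++")
R = count_runs(seq)  # compare R with the acceptance interval from Table T11
```

For this hypothetical sequence R works out to 10, which would again lie inside the 5% acceptance interval of 7 to 15.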
Applications
(i) Testing Randomness of Stock Rates of Return The number-of-runs test can be applied to a series of a stock’s rates of return, for each trading day, to see whether the rates of return are random or exhibit a pattern that could be exploited for earning profit.
(ii) Testing the Randomness of the Pattern Exhibited by Quality Control Data Over Time If
a production process is in control, the distribution of sample values should be randomly distributed
above and below the center line of a control chart. We can use this test for testing whether the pattern of, say, 10 sample observations, taken over time, is random.
(Contd.)
7 2 –
8 2 –
9 1 +
10 1 +
11 2 –
12 1 +
13 2 –
14 2 –
15 1 +
16 1 +
17 1 +
18 1 +
19 1 +
20 1 +
(Contd.)
Number of + Signs    Probability
4 0.0046
5 0.0148
6 0.0370
7 0.0739
8 0.1201
9 0.1602
10 0.1762
11 0.1602
12 0.1201
13 0.0739
14 0.0370
15 0.0148
16 0.0046
17 0.0011
18 0.0002
19 0.0000
20 0.0000
The binomial probability distribution as shown above can be used to provide the decision rule for
any sign test up to a sample size of n = 20. With the null hypothesis p = 0.5 and the sample size
n, the decision rule can be established for any level of significance. In addition, by considering the
probabilities in only the lower or upper tail of the binomial probability distribution, we can develop
rejection rules for one-tailed tests.
The previous table gives the probability of the number of plus signs under the assumption that H0
is true, and is, therefore, the appropriate sampling distribution for the hypothesis test. This sampling
distribution is used to determine a criterion for rejecting H0. This approach is similar to the method
used for developing rejection criteria for hypothesis testing given in the Chapter 11 on Statistical
Inference.
For example, let a = 0.05, and the test be two-sided. In this case, the alternative hypothesis will be
H1 : p ≠ 0.50
and we would have a critical or rejection region area of 0.025 in each tail of the distribution. Starting at the lower end of the distribution, we see that the probability of obtaining zero, one, two, three, four or five plus signs is 0.0000 + 0.0000 + 0.0002 + 0.0011 + 0.0046 + 0.0148 = 0.0207. Note that we stop at 5 + signs because adding the probability of six + signs would make the area in the lower tail equal to 0.0207 + 0.0370 = 0.0577, which substantially exceeds the desired area of 0.025. At the upper end of the distribution, we find the same probability of 0.0207 corresponding to 15, 16, 17, 18, 19 or 20 + signs. Thus, the closest we can come to a = 0.05 without exceeding it is 0.0207 + 0.0207 = 0.0414. We therefore adopt the following rejection criterion:
Reject H0 if the number of + signs is less than 6 or greater than 14.
Since the number of + signs in the given illustration is 12, we cannot reject the null hypothesis, and thus the data reveals that the students are not against the option.
In case the sample size is greater than 20, we can use the large-sample normal approximation of binomial probabilities to determine the appropriate rejection rule for the sign test.
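The tail probabilities used in the rejection criterion above can be reproduced with a short Python sketch (not part of the original text) using exact binomial probabilities from the standard library.

```python
from math import comb

def binom_cdf(c, n):
    """P(X <= c) for X ~ Binomial(n, 0.5), computed exactly."""
    return sum(comb(n, k) for k in range(c + 1)) / 2 ** n

n = 20
lower = binom_cdf(5, n)        # P(0 to 5 plus signs), about 0.0207
upper = 1 - binom_cdf(14, n)   # P(15 to 20 plus signs), same by symmetry
# Rejection criterion: reject H0 if + signs < 6 or > 14.
# With 12 + signs, 6 <= 12 <= 14, so H0 is not rejected.
```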
Non-Parametric Tests 13.7
(72, 70) (82, 79) (78, 69) (80, 74) (64, 66) (78, 75)
(85, 86) (83, 77) (83, 88) (84, 90) (78, 72) (84, 82)
We convert the above data into the following tabular form, where ‘+’ indicates values of xi > yi.
more than 0.05, and, therefore, the null hypothesis cannot be rejected. It may be noted that if the number of observations for which (xi > yi) had been 9 or more, the null hypothesis would have been rejected.
xi (Marks) : 55 58 63 78 72 69 66 79 75 80
xi – m0 : –15 –12 –7 +8 +2 –1 –4 +9 +5 +10
|xi – m0|* : 15 12 7 8 2 1 4 9 5 10
Ranks of |xi – m0| (ignoring + or – signs) : 1 2 6 5 9 10 8 4 7 3
Ranks with signs of respective (xi – m0) : –1 –2 –6 +5 +9 –10 –8 +4 +7 +3
Any sample values equal to m0 are to be discarded from the sample.
*Absolute value or modulus of (xi – m0).
For example, if the Director wanted to test whether the median of the marks was 70, the test would have resulted in the same values and the same conclusions. It may be verified that the Director does not have sufficient evidence to contradict the Professor’s guess of the median marks as 68.
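The signed ranks in the table above can be reproduced with a short Python sketch (not part of the original text). Note that the table’s convention ranks the largest |xi – m0| as 1.

```python
marks = [55, 58, 63, 78, 72, 69, 66, 79, 75, 80]
m0 = 70
d = [x - m0 for x in marks]

# rank the absolute differences, largest |d| getting rank 1 (as in the table)
order = sorted(range(len(d)), key=lambda i: -abs(d[i]))
ranks = [0] * len(d)
for r, i in enumerate(order, start=1):
    ranks[i] = r

# attach the sign of (xi - m0) to each rank
signed_ranks = [r if di > 0 else -r for r, di in zip(ranks, d)]
```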
Example 13.1
The following table gives the ‘real’ annual income that senior managers actually take home in cer-
tain countries, including India. These have been arrived at, by US based Hay Group, in 2006, after
adjusting for the cost of living, rental expenses and purchasing power parity.
Test whether (1) the mean income is equal to 70,000, and (2) the median value is 70,000.
It may be verified that both the hypotheses are not rejected.
The following data gives the number of students from each of the five institutes viz. A, B, C, D
and E. These students were out of 50 students selected randomly from each institute.
The relevant data and calculations for the test are given in the following table.
Institutes
A B C D E
Out of Groups of 50 students 5 9 11 16 19
Observed Cumulative Distribution Function Fo (x) 5/60 14/60 25/60 41/60 60/60
Expected Cumulative Distribution Function Fe (x) 12/60 24/60 36/60 48/60 60/60
|Fo (x) – Fe (x)| 7/60 10/60 11/60 7/60 0
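The K-S statistic for this table can be checked with a few lines of Python (not part of the original text), using exact fractions; the decision still rests on comparing D with the tabulated critical value.

```python
from fractions import Fraction

observed = [5, 9, 11, 16, 19]        # students from institutes A to E
n = sum(observed)                     # 60
cum, Fo = 0, []
for x in observed:
    cum += x
    Fo.append(Fraction(cum, n))       # observed cumulative distribution
Fe = [Fraction(i, 5) for i in range(1, 6)]  # expected CDF: equal shares
D = max(abs(a - b) for a, b in zip(Fo, Fe))  # the K-S statistic, 11/60
```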
Here, we wish to test that the CGPAs for students with Commerce and Engineering backgrounds
follow the same distribution.
The value of the statistic ‘D’ is calculated by preparing the table given below.
The calculation of values in different columns is explained below.
Col. (iii) The first cumulative score is the same as the score in Col. (ii). The second cumulative score is obtained by adding the first score to the second score, and so on. The last, i.e. 15th, cumulative score is obtained by adding the first fourteen scores to the fifteenth score.
Col. (iv) The cumulative distribution function for any observation in the second column is obtained by dividing the cumulative score for that observation by the cumulative score of the last, i.e. 15th, observation.
(Contd.)
Sr. No.  CGPA (Commerce)  Cumulative Score  Fi(C)  CGPA (Engineering)  Cumulative Score  Fi(E)  Di = |Fi(C) – Fi(E)|
6 3.14 19.44 0.41143 3.11 17.59 0.38373 0.027703
7 3.06 22.5 0.47619 3.33 20.92 0.45637 0.01982
8 3.17 25.67 0.54328 2.65 23.57 0.51418 0.029101
9 2.97 28.64 0.60614 3.14 26.71 0.58268 0.023459
10 3.14 31.78 0.67259 2.97 29.68 0.64747 0.025123
11 3.69 35.47 0.75069 3.39 33.07 0.72142 0.029265
12 2.85 38.32 0.81101 3.08 36.15 0.78861 0.022393
13 2.92 41.24 0.8728 3.3 39.45 0.8606 0.012202
14 2.79 44.03 0.93185 3.25 42.7 0.9315 0.000351
15 3.22 47.25 1 3.14 45.84 1 0
Values in Col. (vi) and Col. (vii) are obtained just like the values in Col. (iii) and Col. (iv).
The test statistic D is calculated from Col. (viii) of the above table as
D = Maximum of Di = Maximum of |Fi(C) – Fi(E)| = 0.0293
Since the calculated value of the statistic ‘D’, i.e. 0.0293, is less than its tabulated value of 0.338 at the 5% level (for n = 15), the null hypothesis that both samples come from the same population is not rejected.
Salesman Sr. No. Training Method A Salesman Sr. No. Training Method B
Sales Sales
1 1,500 1 1,340
2 1,540 2 1,300
3 1,860 3 1,620
4 1,230 4 1,070
5 1,370 5 1,210
6 1,550 6 1,170
7 1,840 7 950
8 1,250 8 1,380
9 1,300 9 1,460
10 1,710 10 1,030
Solution:
Here, the hypothesis to be tested is that both the training methods are equally effective, i.e.
H0 : m1 = m2
H1 : m1 ≠ m2
where m1 is the mean sales of salespersons trained by Method A, and m2 is the mean sales of salespersons trained by Method B.
The following table, giving the sales values for both the groups as also the combined rank of sales for each salesman, is prepared to carry out the test.
Training Method A    Training Method B
Salesman Sales Combined Salesman Sales Combined
Sr. No. 5 kg. Tins Rank Sr. No 5 kg. Tins Rank
1 1,500 14 1 1,340 10
2 1,540 15 2 1,300 8.5
3 1,860 20 3 1,620 17
4 1,230 6 4 1,070 3
5 1,370 11 5 1,210 5
6 1,550 16 6 1,170 4
7 1,840 19 7 950 1
8 1,250 7 8 1,380 12
9 1,300 8.5 9 1,460 13
10 1,710 18 10 1,030 2
Average Sales 1,515 — — 1,253 —
Sum of Ranks — R1 = 134.5 — — R2 = 75.5
‘U’ values — 20.5 — — 79.5
The ‘U’ statistic for both the training methods is calculated as follows:
U1 = n1n2 + n1(n1 + 1)/2 – R1 = 10 × 10 + 10(10 + 1)/2 – 134.5 = 20.5
U2 = n1n2 + n2(n2 + 1)/2 – R2 = 10 × 10 + 10(10 + 1)/2 – 75.5 = 79.5
The tabulated or critical value of U with n1 = n2 = 10, for a = 0.05, is 23, for a two-tailed test, vide Table T14.
It may be noted that in this test too, like the signed rank test in Section 13.4, the calculated value
must be smaller than the critical value to reject the null hypothesis.
Since the calculated value 20.5 is smaller than the critical value 23, the null hypothesis that the two training methods are equally effective is rejected. It implies that Method A is superior to Method B.
From the values of ‘Rank Sum’, we calculate the net difference in ‘Rank Sum’ for every pair of
cities, and tabulate as follows:
Difference in Ranks D M K A B
D 0 12 11 16* 1
M 12 0 1 4 11
K 11 1 0 5 10
A 16* 4 5 0 15*
B 1 11 10 15* 0
The critical value for the difference in ‘Rank Sums’, for number of cities = 5, number of observations for each city = 5, and 5% level of significance, is 13.6, vide Table T15.
Comparing the calculated differences in rank sums with 13.6, we note that the difference in rank sums between A (Ahmedabad) and D (Delhi), as also the difference between A (Ahmedabad) and B (Bangalore), is significant.
Note: In the above case, if the data had been available in terms of actual values rather than ranks alone, ANOVA would just lead to the conclusion that the means of D, M, K, A and B are not equal, but would not have gone beyond that. However, the above test concludes that the mean of Ahmedabad is not equal to the mean of Delhi, as also the mean of Bangalore. Thus, it gives a comparison of all pairs of means.
H = [12 / (n(n + 1))] Σ(j=1 to k) (Tj² / nj) – 3(n + 1)    (13.4)
where
Tj = Sum of ranks for treatment j
nj = Number of observations for treatment j
n = Σnj = Total number of observations
k = Number of treatments
The test is illustrated through an example given below.
Example 13.4
A chain of departmental stores opened three stores in Mumbai. The management wants to compare the sales of the three stores over a six-day promotional period. The relevant data is given below.
Use the Kruskal-Wallis test to compare the equality of mean sales in all the three stores.
Solution:
The combined ranks to the sales of all the stores on all the six days are calculated and presented in
the following Table.
Store ‘A’ Store ‘B’ Store ‘C’
Sales Combined Rank Sales Combined Rank Sales Combined Rank
16 1 20 5.5 23 10
17 2 20 5.5 24 11
21 7.5 21 7.5 26 13
18 3 22 9 27 14
19 4 25 12 29 16.5
29 16.5 28 15 30 18
T1 = 34.0 T2 = 54.5 T3 = 82.5
It may be noted that the sales value 21 in Store ‘A’ and the sales value 21 in Store ‘B’ are given equal ranks of 7.5. Since there are six ranks below, the rank for 21 would have been 7; but since the value 21 is repeated, both values get the average of ranks 7 and 8, i.e. 7.5. The next value, 22, has been assigned the rank 9. If there had been three values of 21, the rank assigned to each would have been the average of 7, 8 and 9, i.e. 8, and the next value would have been ranked 10.
Now the H statistic is calculated as
H = [12 / (18(18 + 1))] × [(34.0² + 54.5² + 82.5²) / 6] – 3(18 + 1)
  = [12 / (18 × 19)] × (10932.5 / 6) – 57
  = (12 × 10932.5) / (342 × 6) – 57
  = 63.93 – 57 = 6.93
2
The value of χ² at 2 d.f. and 5% level of significance is 5.99.
Since the calculated value of H is 6.93, which is greater than the tabulated value and falls in the critical region, we reject the null hypothesis that all the three stores have equal sales.
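The H statistic can be reproduced in Python directly from the sales figures (this sketch is not part of the original text).

```python
stores = {
    "A": [16, 17, 21, 18, 19, 29],
    "B": [20, 20, 21, 22, 25, 28],
    "C": [23, 24, 26, 27, 29, 30],
}
values = sorted(v for vals in stores.values() for v in vals)
n = len(values)  # 18 observations in all

def avg_rank(v):
    # tied values share the average rank (the two 21s both get 7.5)
    lo = values.index(v) + 1
    hi = lo + values.count(v) - 1
    return (lo + hi) / 2

# rank sums per store; every store has nj = 6 observations
T = {k: sum(avg_rank(v) for v in vals) for k, vals in stores.items()}
H = 12 / (n * (n + 1)) * sum(t ** 2 / 6 for t in T.values()) - 3 * (n + 1)
```

H is then compared with the χ² value 5.99 at 2 d.f.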
In this case, the null hypothesis is that there is no significant difference among the growth rates
of the three brands. The alternative hypothesis is that at least two samples (two brands) differ from
each other.
Under the null hypothesis, Friedman’s test statistic is:
F = [12 / (nk(k + 1))] Σ(j=1 to k) Rj² – 3n(k + 1)    (13.5)
where,
k = Number of samples (brands) = 3 (in the illustration)
n = Number of observations for each sample (brand) = 6 (in the illustration)
Rj = Sum of ranks of jth sample (brand)
It may be noted that this ‘F’ is different from Fisher’s ‘F’.
Statistical tables exist for the sampling distribution of Friedman’s ‘F’, but these are not readily available for various values of n and k. However, the sampling distribution of ‘F’ can be approximated by a χ² (chi-square) distribution with k – 1 degrees of freedom. The chi-square distribution table shows that, with 3 – 1 = 2 degrees of freedom, the chi-square value at 5% level of significance is χ² = 5.99.
If the calculated value of ‘F’ is less than or equal to the tabulated value of chi-square (χ² at 5% level of significance), the growth rates of the brands are considered statistically the same. In other words, there is no significant difference in the growth rates of the brands. In case the calculated value exceeds the tabulated value, the difference is termed significant.
For the above example, the following null hypothesis is framed:
Ho: There is no significant difference in the growth rates of the three brands of refrigerators.
For calculation of F, the following table is prepared. The figures in brackets indicate the rank
of growth of a brand in a particular year—the lowest growth is ranked 1 and the highest growth is
ranked 3.
Total Ranks
Year Brand ‘A’ Brand ‘B’ Brand ‘C’ (Row Total)
1 15 14 32 6
(2) (1) (3)
2 18 15 30 6
(2) (1) (3)
3 15 11 27 6
(2) (1) (3)
4 13 19 38 6
(1) (2) (3)
5 20 18 33 6
(2) (1) (3)
6 27 20 22 6
(3) (1) (2)
With reference to the above table, Friedman’s test amounts to testing that sums of ranks (Rj) of
various brands are all equal.
The value of F is calculated as
F = [12 / (6 × 3(3 + 1))] × (12² + 7² + 17²) – 3 × 6 × (3 + 1)
  = 482/6 – 72 = 80.3 – 72 = 8.3
It is observed that the calculated value of the ‘F’ statistic is greater than the tabulated value of χ² (5.99 at 5% level of significance and 2 d.f.). Hence, the hypothesis that there is no significant difference in the growth rates of the three brands is rejected.
Therefore, we conclude that there is a significant difference in the growth rates of the three
brands of refrigerators, during the period under study. The significant difference is due to the
best growth rate of brand ‘C’.
In the above example, if the data were given for six showrooms instead of six years, the test
would have remained the same.
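The value F = 8.3 can be checked in Python from the growth-rate table (this sketch is not part of the original text).

```python
growth = [  # (brand A, brand B, brand C) growth rates for the six years
    (15, 14, 32), (18, 15, 30), (15, 11, 27),
    (13, 19, 38), (20, 18, 33), (27, 20, 22),
]
k, n = 3, len(growth)

# rank sums per brand; within each year the lowest growth gets rank 1
R = [0] * k
for row in growth:
    ordered = sorted(row)
    for j, v in enumerate(row):
        R[j] += ordered.index(v) + 1

F = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in R) - 3 * n * (k + 1)
```

The resulting F is compared with χ² = 5.99 at 2 d.f., leading to the same rejection as above.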
where n is the number of pairs of ranks given to individuals or units or objects, and di is the difference in the two ranks given to the ith individual/unit/object.
There is no statistic to be calculated for testing the significance of the rank correlation. The cal-
culated value of rs is itself compared with the tabulated value of rs, given in Table T9, at 5% or 1%
level of significance. If the calculated value is more than the tabulated value, the null hypothesis
that there is no correlation in the two rankings is rejected.
Here, the hypotheses are as follows:
H0 : rs = 0
H1 : rs ≠ 0
In Example 10.1 of Chapter 10 on Correlation and Regression, the rank correlation between the priorities of ‘Job Commitment Drivers’ among executives from India and Asia Pacific was found to be 0.9515. Comparing this value with the tabulated value of rs for n = 10 at 5% level of significance, i.e. 0.636, we find that the calculated value is more than the tabulated value, and hence we reject the null hypothesis that there is no correlation between the priorities of ‘Job Commitment Drivers’ among executives from India and Asia Pacific.
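A short sketch of this test in Python follows (not part of the original text). The two rankings below are hypothetical, made up for illustration; the data of Example 10.1 is not reproduced here.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation: rs = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# hypothetical rankings of ten drivers by two groups of executives
rx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ry = [2, 1, 3, 4, 6, 5, 7, 8, 10, 9]
rs = spearman_rs(rx, ry)
reject_h0 = rs > 0.636   # tabulated value for n = 10 at the 5% level
```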
13.11.1 Test for Significance of Spearman’s Rank Correlation for Large Sample Size
If the number of pairs of ranks is more than 30, the distribution of the rank correlation rs, under the null hypothesis that rs = 0, can be approximated by a normal distribution with mean 0 and s.d. 1/√(n – 1). This can be expressed in symbolic form as follows:
for n > 30, rs ~ N(0, 1/(n – 1))    (13.6)
It may be verified that the rank correlation between the rankings in Mumbai and Bangalore is 0.544. Thus the value of z, i.e. the standard normal variable, is
z = (0.544 – 0) / (1/7) = 0.544 × 7 = 3.808
which is more than 1.96, the value of z at 5% level of significance. Thus, it may be concluded that the rankings in Mumbai and Bangalore are significantly correlated.
where s = nk(k² – 1)/12    (13.10)
Here, n = 3 and k = 10, so that
s = [3 × 10 × (10² – 1)] / 12 = 247.5
sd = 29.2
s1² = 29.2 / [3(10 – 1)] = 1.08    (d.f. = 9)
s2² = (247.5 – 29.2/3) / [10(3 – 1)] = 11.89    (d.f. = 20)
F = s2² / s1²    (because s2² > s1²)
It may be recalled that in Fisher’s F ratio of two sample variances, the greater one is taken in the numerator, and the d.f. of F are taken accordingly, vide Section 10.11.
F = 11.89 / 1.08 = 11.00
The tabulated value of ‘F’ at (20, 9) d.f. and a = 0.05, as per Table T4, is
F(20, 9) = 2.94
Since the calculated value is more than the tabulated value of F, we reject the null hypothesis, and conclude that the rankings on the given parameter are not equal.
used when the data is available in ordinal form. It is more popular as Kendall’s Tau, and is denoted by the Greek letter τ (corresponding to ‘t’ in English). It measures the extent of agreement or association between rankings, and is defined as
τ = (nc – nd) / (nc + nd)    (13.11)
where
nc : Number of concordant pairs of rankings
nd : Number of discordant pairs of rankings
The maximum value of nc + nd is the total number of pairs of ranks given by two different persons or by the same person on two different criteria. For example, if the number of observations is, say, 3 (call them ‘a’, ‘b’ and ‘c’), then the pairs of observations will be: ab, ac and bc. Similarly, if there are four observations, the possible pairs are six in number, as follows:
ab, ac, ad, bc, bd, cd
It may be noted that the more the number of concordant pairs, the greater the value of the numerator and the higher the value of the coefficient, indicating a higher degree of consistency in the rankings.
A pair of subjects, i.e. persons or units, is said to be concordant if the subject that ranks higher on one variable also ranks higher (or equal) on the other variable. On the other hand, if the subject that ranks higher on one variable ranks lower on the other variable, the pair of subjects is said to be discordant.
The concepts of concordant and discordant pairs, and the calculation of τ, are explained through an example given below.
As per a study published in Times of India dated 4th September 2006, several cities were ranked
as per the criteria of ‘Earning’, ‘Investing’ and ‘Living’. It is given in the CD in the chapter relating
to ‘Simple Correlation and Regression’. An extract from the full table is given as follows:
City Ranking as per ‘Earning’ Ranking as per ‘Investing’
Chennai 3 2
Delhi 1 3
Kolkata 4 4
Mumbai 2 1
Now, we form all possible pairs of cities. In this case, total possible pairs are 4C2 = 6, viz. DM,
DC, DK, MC, MK and CK.
The status of each pair i.e. whether, it is concordant or discordant, along with reasoning is given
in the following table:
It may be noted that, for a pair of subjects (cities in the above case), when the subject that ranks higher on one variable also ranks higher (or equal) on the other variable, the pair is said to be concordant. On the other hand, if a subject ranks higher on one variable and lower on the other variable, the pair is said to be discordant.
From the previous table, we note that
nc = 3 and nd = 3
Thus, Kendall’s coefficient of correlation or concordance is
τ = (3 – 3) / (3 + 3) = 0
As regards the possible values and interpretation of the value of τ, the following results can be stated:
If the association between the two rankings is perfect (i.e., the two rankings are the same), the coefficient has value +1.
If the disassociation between the two rankings is perfect (i.e., one ranking is the reverse of the other), the coefficient has value –1.
In other cases, the value lies between –1 and +1, and increasing values imply increasing association between the rankings. If the rankings are completely independent, the coefficient has value 0.
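The properties listed above can be checked with a small Python function (not part of the original text; this simple form of τ assumes there are no tied ranks).

```python
from itertools import combinations

def kendall_tau(rank_x, rank_y):
    """tau = (nc - nd) / (nc + nd), assuming no tied ranks."""
    nc = nd = 0
    for i, j in combinations(range(len(rank_x)), 2):
        s = (rank_x[i] - rank_x[j]) * (rank_y[i] - rank_y[j])
        if s > 0:
            nc += 1      # concordant pair: same direction on both variables
        elif s < 0:
            nd += 1      # discordant pair: opposite directions
    return (nc - nd) / (nc + nd)
```

For identical rankings, e.g. kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]), the function returns 1; for fully reversed rankings it returns –1.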
Incidentally, Spearman’s and Kendall’s correlations are not comparable, and their values could be different.
Kendall’s Tau also measures the strength of association in cross tabulations when both variables are measured at the ordinal level, and, in fact, is the only such measure of association for cross tabulation data available in ordinal form.
GLOSSARY
Non-Parametric Tests : Tests of significance used when certain assumptions about the usual tests of significance are not valid or doubtful. Also, these tests are for aspects like randomness, rank correlation, etc.
Run : A succession of values with the same sign or type (in case of qualitative data)
Signed Rank : The ranks assigned to observations are attached a sign, viz. + or –
Rank Sum : Sum of ranks
10. The zero value of rank correlation between two pairs of rankings about the same subjects implies that:
(a) The two rankings of subjects are not related to each other
(b) 6Σdi² = n(n² – 1)
(c) Σdi² = 0
(d) The rankings are biased
EXERCISES
1. The following data gives the increase (I) or decrease (D) in daily rate of return of a share on
30 consecutive days.
I I I D D I D I I I
D D D I I D D I I D
I I I D I I D I D D
Test whether the increase or decrease in the daily rate of return follows a random pattern.
2. A car manufacturer claims that its new car gives an average mileage of 12 kms. per litre
(km.p.l) of petrol. A sample of 12 cars is taken, and their mileage recorded as follows (in
km.p.l):
13.2, 12.7, 13.3, 13.0, 12.8, 12.7, 12.6, 12.6, 12.7, 12.4, 11.8, 13.2
Use the signed rank test to ascertain the claim of the manufacturer. Also use this test to
ascertain whether the median mileage of the car is 12 km.p.l.
3. Following is the data for number of guest faculty visiting a management institute during the
academic year 2006-07 comprising 280 academic days.
Number of Guest Faculty Number of Days
0 100
1 105
2 55
3 15
4 5
Use Kolmogorov–Smirnov Test to test whether the above data follows a Poisson distribution
with mean equal to 1.
4. Following is the data of the percentage of MBA girl students in five similar level management
institutes.
Management Institutes Percentage of MBA Girl Students
A 21
B 20
C 19
D 18
E 22
Use Kolmogorov-Smirnov Test to test whether the percentage of MBA girl students in all the
five institutes is the same.
5. The management of an organisation wanted to assess whether there is any difference between the officers who were recruited from campus (MBAs) and those who were recruited from the open market, with respect to their knowledge and application relevant to the organisation. Accordingly, a suitable test was designed and conducted for 10 officers each from the two categories. Their scores on a scale of 10 are given below:
Srl. No. Campus Recruited Officers Srl. No. Open Market Recruited Officers
1 9.7 1 9.2
2 9.2 2 8.9
3 8.9 3 7.2
4 7.9 4 8.5
5 9.5 5 8.1
6 9.6 6 7.8
7 9.0 7 8.5
8 9.5 8 9.0
9 8.8 9 9.3
10 8.5 10 8.0
Use Mann-Whitney test to test whether the mean scores of the two categories of officers are
equal.
6. A car manufacturer is procuring car batteries from two companies. For testing whether the two
brands of batteries, say ‘A’ and ‘B’, had the same life, the manufacturer collected data about
the lives of both brands of batteries from 20 car owners–10 using ‘A’ brand and 10 using ‘B’
brand. The lives were reported as follows:
Lives in Months
Battery ‘A’ : 50 61 54 60 52 58 55 56 54 53
Battery ‘B’ : 65 57 60 55 58 59 62 67 56 61
Use Mann-Whitney test to test the equality of lives of the two types of batteries.
7. For the data given in Exercise 3 of Chapter 10, about the closing stock prices of three individual companies, viz. ICICI Bank, L&T and Reliance Industries Ltd., test whether the daily rates of return are equal for all the above three companies, using the Kruskal-Wallis test.
8. For the data given in Exercise 2 of Chapter 10, about ranks of some selected companies as per
net profit, market capitalisation and overall rank; test whether the ranks are equal for all the
three parameters of all the companies.
9. For the same data as in the Exercise 10 below, use the Wilcoxon-Wilcox Test for Comparison
of Multiple Treatments to test equality of daily rate of returns for all the three companies as
also the three pairs of companies.
10. Given the following monthly rates of return on BSE 30, BSE 100, BSE 200 and BSE 500, for
the period January 2005 to November 2005, use Friedman’s test to test whether the rates of
return are equal for all the above indices of Bombay Stock Exchange.
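The monthly return figures for this exercise are not reproduced here. As a template, the following sketch applies scipy's Friedman test to illustrative (made-up) returns for four indices; the actual BSE data should be substituted:

```python
# Friedman's test template for comparing four related samples.
# NOTE: the returns below are hypothetical placeholders, not the actual
# BSE data of Exercise 10; substitute the real monthly figures.
from scipy.stats import friedmanchisquare

# Each list holds one index's return for the same six months (blocks).
bse_30  = [2.1, -0.5, 1.8, 3.2, 0.9, 1.1]
bse_100 = [2.3, -0.4, 1.6, 3.0, 1.0, 1.2]
bse_200 = [2.2, -0.6, 1.7, 3.1, 0.8, 1.0]
bse_500 = [2.4, -0.3, 1.9, 3.3, 1.1, 1.3]

stat, p = friedmanchisquare(bse_30, bse_100, bse_200, bse_500)
print(stat, p)
```

A small p-value would indicate that the rates of return of the indices are not all equal.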
13.28 Business Research Methodology
LEARNING OBJECTIVES
There are several situations in real life or work environment when we are required to collect data
on several variables. The situation is analogous to the examination for selection in IAS and other
Central services when a candidate is examined in several subjects and marks in these subjects are
summarised for deciding the suitability of a candidate. The medical fitness of a candidate for entry to
Defence Services is based on several health-related parameters. Similar is the situation relating to
studies conducted for designing and marketing of physical as well as financial products and services.
This chapter describes various techniques that are available for reducing the data on many variables
and summarising it with a few indicators that can be interpreted to derive meaningful conclusions relating
to the designing and marketing of products and services. Incidentally, these techniques are also used
in many behavioural and social studies. In addition to providing a comprehensive understanding and
applications of the techniques, the chapter also illustrates, with examples, the use of SPSS, step by
step, in conducting the studies in their entirety. The chapter is aimed at generating confidence in using
SPSS for arriving at final conclusions/solutions in a research study. This should provide motivation
for learning these techniques, which have so far not received due importance, mainly due to the lack of
confidence as well as awareness in using the SPSS package.
Relevance
Nowadays, there are several features associated with any product or service. With time, these
features are increasing and providing several options to the consumers. Like other mobile
phone manufacturing companies, the MOCEL company was also coming out with new models
and new features. However, the CEO of the company was not too satisfied with the moderate
growth in business in view of the exponential growth in the industry. He, therefore, hired the
services of a consultant to advise the company about the preferences of potential customers and
suggestions of the existing customers. The consultant conducted a survey among the various
strata of the society, to ascertain the needs, preferences, price sensitiveness, etc. He also col-
lected similar information from dealers. The vast amount of data was analysed with the help
of techniques such as factor analysis, cluster analysis, conjoint analysis and multidimensional
scaling, described in this chapter, and suitable recommendations made to the CEO of MOCEL
company. MOCEL could improve its market share, substantially.
Artificial intelligence software analyses the credentials of the candidate (contesting an election)
and gives a rating on how successful he or she will be as a politician.
Dr APJ Abdul Kalam (in Hindustan Times dt. 25th March, 2007)
Statistical Techniques, Their Relevance and Uses for Designing and Marketing of
Products and Services
(This Table may be referred to when one begins to read about the particular techniques described in various
Sections of the Chapter)
Logistic Regression
It is a technique that assumes the errors are drawn from a binomial distribution. In logistic regression, the dependent variable is the probability that an event will occur; hence it is constrained between 0 and 1. The predictors can be all binary, a mixture of categorical and continuous, or all continuous.
Uses: Logistic regression is highly useful in biometrics and health sciences. It is used frequently by epidemiologists for the probability (sometimes interpreted as risk) that an individual will acquire a disease during some specified period of vulnerability.
Credit Card Scoring: Various demographic and credit history variables could be used to predict if an individual will turn out to be a 'good' or 'bad' customer.
Market Segmentation: Various demographic and purchasing information could be used to predict if an individual will purchase an item or not.
Multivariate Analysis of Variance (MANOVA)
It simultaneously explores the relationship between several non-metric independent variables (Treatments, say Fertilisers) and two or more metric dependent variables (say, Yield and Harvest Time). If there is only one dependent variable, MANOVA is the same as ANOVA.
Uses: Determines whether statistically significant differences of the means of several variables occur simultaneously between two levels of a variable. For example, assessing whether
(i) a change in the compensation system has brought about changes in sales, profit and job satisfaction.
(ii) geographic region (North, South, East, West) has any impact on consumers' preferences, purchase intentions or attitudes towards specified products or services.
(iii) a number of fertilisers have equal impact on the yield of rice as also on the harvest time of the crop.
Principal Component Analysis (PCA)
A technique for forming a set of new variables that are linear combinations of the original set of variables, and are uncorrelated. The new variables are called Principal Components. These variables are fewer in number than the original variables, but they extract most of the information provided by the original variables.
Uses: One could identify several financial parameters and ratios, exceeding ten, for determining the financial health of a company. Obviously, it would be extremely taxing to interpret all such pieces of information for assessing the financial health of a company. However, the task could be much simpler if these parameters and ratios could be reduced to a few indices, say two or three, which are linear combinations of the original parameters and ratios.
A multiple regression model may be derived to forecast a parameter like sales, profit, price, etc. However, the variables under consideration could be correlated among themselves, indicating multicollinearity in the data. This could lead to misleading interpretation of the regression coefficients, as also an increase in the standard errors of the estimates of the parameters. It would be very useful if new uncorrelated variables could be formed as linear combinations of the original variables. These new variables could then be used for developing the regression model, for appropriate interpretation and better forecasts.
Common Factor Analysis (CFA)
It is a statistical approach that is used to analyse interrelationships among a large number of variables (indicators) and to explain these variables (indicators) in terms of a few unobservable constructs (factors). In fact, these factors impact the variables, which are reflective indicators of the factors. The statistical approach involves finding a way of condensing the information contained in a number of original variables into a smaller set of constructs (factors), mostly one or two, with a minimum loss of information.
It identifies the smallest number of common factors that best explain or account for most of the correlation among the indicators. For example, the intelligence quotient of a student might explain most of the marks obtained in Mathematics, Physics, Statistics, etc. As another example, when two variables x and y are highly correlated, only one of them could be used to represent the entire data.
Uses: Helps in assessing
– the image of a company/enterprise
– attitudes of sales personnel and customers
– preference or priority for the characteristics of
  – a product like television, mobile phone, etc.
  – a service like a TV programme, air travel, etc.
Canonical Correlation Analysis (CCA)
An extension of multiple regression analysis (MRA involves one dependent variable and several metric independent variables). It is used for situations wherein there are several dependent variables and several independent variables. It involves developing linear combinations of the two sets of variables (both dependent and independent) and studies the relationship between the two sets. The weights in the linear combinations are derived based on the criterion that maximises the correlation between the two sets of variables.
Uses: Used in studying the relationship between the types of products purchased and consumer life styles and personal traits. Also, for assessing the impact of life styles and eating habits on health, as measured by a number of health-related parameters.
Given the assets and liabilities of a set of banks/financial institutions, it helps in examining the interrelationship of variables on the asset and liability sides.
An HRD department might like to study the relationship between the set of behavioural, technological and social skills of a salesman and the set of variables representing sales performance, discipline and cordial relations with staff.
The central bank of a country might like to study the relationship between sets of variables representing several risk factors and the financial indicators arising out of a bank's operations. Similar analysis could be carried out for any organisation.
Cluster Analysis
It is an analytical technique that is used to develop meaningful subgroups of entities which are homogeneous or compact with respect to certain characteristics. Thus, observations in each group would be similar to each other. Further, each group should be different from the other groups with respect to the same characteristics, and therefore, observations of one group would be different from the observations of the other groups.
Uses: It helps in classifying a given set of entities into a smaller set of distinct entities by analysing similarities among the given set of entities. Some situations where the technique could be used are:
A bank could classify its large network of branches into clusters (groups) of branches which are similar to each other with respect to specified parameters.
An investment bank could identify groups of firms that are vulnerable to takeover.
A marketing department could identify similar markets where products or services could be tested or used for target marketing.
An insurance company could identify groups of motor insurance policy holders with high claims.
Conjoint Analysis
Involves determining the contribution of variables (each with several levels) to the choice preference over combinations of variables that represent realistic choice sets (products, concepts, services, companies, etc.).
Uses: Useful for analysing consumer responses and using them for the designing of products and services. Helps in determining the contributions of the predictor variables and their respective levels to the desirability of the combinations of variables. For example, how much does the quality of food contribute to the continued loyalty of a traveller to an airline? Which type of food is liked most?
Multidimensional Scaling
It is a set of procedures for drawing pictures of data so as to visualise and clarify the relationships described by the data. The requisite data is typically collected by having respondents give simple one-dimensional responses. It transforms consumer judgments/perceptions of similarity or preference into, usually, a two-dimensional space.
Uses: Useful for the designing of products and services. It helps in:
– illustrating market segments based on indicated preferences.
– identifying the products and services that are more competitive with each other.
– understanding the criteria used by people while judging objects (products, services, companies, advertisements, etc.).
Dependent Variable : Independent Variables
Manpower in a Sales Organisation : Number of Sales Offices + Business per Sales Office
EPS (time series) or EPS (cross-sectional) : Sales + Dividend + Price
Sales of a Company : Expenditure on Advertisement + Expenditure on R&D
Return on BSE SENSEX : Return on Stock of Reliance Industries + Return on Stock of Infosys Technologies
The degree of relationship between the dependent variable and the independent variables is measured by the multiple correlation coefficient.
The methodology of deriving the multiple regression equation and calculating the multiple correlation
coefficient is illustrated below.
Illustration 14.1
Let the dependent variable of interest be y, which depends on two independent variables, say x1 and x2.
The linear relationship, among y, x1 and x2 can be expressed in the form of the regression equa-
tion of y on x1 and x2, in the following form:
y = bo + b1 x1 + b2 x2 (14.1)
where bo is referred to as ‘intercept’ and b1 and b2 are known as regression coefficients.
The sample comprises ‘n’ triplets of values of x1, y and x2, in the following format:
y x1 x2
y1 x11 x21
y2 x12 x22
. . .
yn x1n x2n
The values of constants, bo, b1 and b2 are estimated with the help of Principle of Least Squares just
like values of a and b were found while fitting the equation y = a + bx in Chapter 10 on Simple
Correlation and Regression Analysis. These are calculated by using the above sample observations/
values, and with the help of the formulas given below:
These formulas and manual calculations are given for illustration only. In real life, these are easily
obtained with the help of personal computers wherein the formulas are already stored.
b1 = [(Σyi x1i − n ȳ x̄1)(Σx2i² − n x̄2²) − (Σyi x2i − n ȳ x̄2)(Σx1i x2i − n x̄1 x̄2)] /
     [(Σx1i² − n x̄1²)(Σx2i² − n x̄2²) − (Σx1i x2i − n x̄1 x̄2)²]

b2 = [(Σyi x2i − n ȳ x̄2)(Σx1i² − n x̄1²) − (Σyi x1i − n ȳ x̄1)(Σx1i x2i − n x̄1 x̄2)] /
     [(Σx1i² − n x̄1²)(Σx2i² − n x̄2²) − (Σx1i x2i − n x̄1 x̄2)²]

b0 = ȳ − b1 x̄1 − b2 x̄2
The effectiveness or the reliability of the relationship, thus, obtained is judged by the multiple
coefficient of determination, usually denoted by R2, and is defined as the ratio of variation explained
by the regression equation 14.1 and the total variation of the dependent variable y. Thus,

R² = Explained Variation in y / Total Variation in y    (14.4)

R² = 1 − (Unexplained Variation / Total Variation)    (14.5)

   = 1 − [Σ(yi − ŷi)²] / [Σ(yi − ȳ)²]    (14.6)
It may be recalled from Chapter 10 that the total variation in the variable y is equal to the variation
explained by the regression equation plus the variation left unexplained by the regression equation. Mathematically, this is expressed as:

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
Total Variation = Explained Variation + Unexplained Variation

where yi is the observed value of y, ȳ is the mean of all the yi, and ŷi is the estimate of the value yi by the
regression equation 14.1. It may be recalled that Σ(ŷi − ȳ)² is the variation of y explained by the
estimate of y, and Σ(yi − ŷi)² is the variation of y left unexplained by the estimate ŷ. If each yi is equal to
its estimate ŷi, then all the variation is explained by ŷi, and, therefore, the unexplained variation is zero.
In such case, total variation is fully explained by the regression equation, and R2 is equal to 1.
The square root of R², viz. R, is known as the coefficient of multiple correlation and is always
between 0 and 1. In fact, R is the correlation between the dependent variable and its estimate
derived from the multiple regression equation, and as such it has to be positive.
All the calculations and interpretations for the multiple regression equation and coefficient of
multiple correlation or determination have been explained with the help of an illustration given
below:
Example 14.1
The owner of a chain of ten stores wishes to forecast net profit with the help of next year’s projected
sales of food and non-food items. The data about current year’s sales of food items, sales of non-
food items as also net profit for all the ten stores are available as follows:
Table 14.1 Sales of Food and Non-Food Items and Net Profit of a Chain of Stores
Supermarket No.   Net Profit (Rs. Crore) [y]   Sales of Food Items (Rs. Crore) [x1]   Sales of Non-Food Items (Rs. Crore) [x2]
1 5.6 20 5
2 4.7 15 5
3 5.4 18 6
4 5.5 20 5
5 5.1 16 6
6 6.8 25 6
7 5.8 22 4
8 8.2 30 7
9 5.8 24 3
10 6.2 25 4
Substituting the values of bo, b1 and b2, the desired relationship is obtained as
y = 0.233 + 0.196 x1 + 0.287 x2 (14.7)
This equation is known as the multiple regression equation of y on x1 and x2, and it indicates
how y changes with respect to changes in x1 and x2. The interpretation of the value of the coefficient
of x1, viz. b1, i.e. 0.196, is that if x2 (sales of non-food items) is held constant, then for every
additional crore of rupees of sales of food items, the net profit increases by Rs. 0.196 crore, i.e. Rs. 19.6 lakh.
Similarly, the interpretation of the value of the coefficient of x2, viz. b2, i.e. 0.287, is that if the sales of non-food items
increase by one crore rupees, the net profit increases by Rs. 0.287 crore, i.e. Rs. 28.7 lakh.
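These coefficients can be cross-checked outside SPSS. The following sketch, assuming Python with the numpy library is available (the book itself works through SPSS), fits equation 14.7 to the Table 14.1 data by least squares:

```python
# Least-squares fit of net profit on food and non-food sales (Table 14.1).
import numpy as np

y  = np.array([5.6, 4.7, 5.4, 5.5, 5.1, 6.8, 5.8, 8.2, 5.8, 6.2])
x1 = np.array([20, 15, 18, 20, 16, 25, 22, 30, 24, 25], dtype=float)
x2 = np.array([5, 5, 6, 5, 6, 6, 4, 7, 3, 4], dtype=float)

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Coefficient of multiple determination, R^2 (equation 14.6).
y_hat = X @ np.array([b0, b1, b2])
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(round(b0, 3), round(b1, 3), round(b2, 3), round(r2, 4))
# → 0.233 0.196 0.287 0.9943
```

The output reproduces equation 14.7 and the R² value of 0.9943 discussed below.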
The effectiveness or the reliability of this relationship is judged by the multiple coefficient
of determination, usually denoted by R², as defined in equation 14.4. The necessary computations are tabulated below:

yi        ŷi*       yi − ŷi     (yi − ŷi)²     (yi − ȳ)²
(1)       (2)       (3)         (4)            (5)
5.6 5.587 0.0127 0.0002 0.0961
4.7 4.607 0.0928 0.0086 1.4641
5.4 5.482 –0.082 0.0067 0.2601
5.5 5.587 –0.087 0.0076 0.1681
5.1 5.09 0.0099 0.0001 0.6561
6.8 6.854 –0.054 0.0029 0.7921
5.8 5.693 0.1075 0.0116 0.0121
8.2 8.121 0.0789 0.0062 5.2441
5.8 5.798 0.0023 0.0000 0.0121
6.2 6.281 –0.081 0.0065 0.0841
Sum = 59.1   59.1   0   Sum = 0.0504 (Unexplained Variation)   Sum = 8.789 (Total Variation)
ȳ = 5.91
* Derived from the earlier fitted equation, y = 0.233 + 0.196 x1 + 0.287 x2
The scatter diagram indicates a positive linear correlation between the net profit and the sales of
food items.
As stated earlier, the effectiveness/reliability of the regression equation (here, of net profit on sales of food items alone) is judged by the coefficient
of determination, which can be obtained as
r² = 0.876
This value of r2 = 0.876 indicates that 87.6% of the variation in net profit is explained by the variation
in sales of food items, and thus one may feel quite confident in forecasting net profit with the help
of the sales of food items. However, before doing so, it is desired that one examines the possibility
of considering some other variables also either as an alternative or in addition to the variable (sales
of food items) already considered, to improve the reliability of the forecast.
As mentioned in Chapter 10, the correlation coefficient is also defined as
It may be noted that the unexplained variation or residual error is 1.109 when the simple regression
equation 14.9 of net profit on sales of food items is fitted, but it reduces to 0.0504
when the multiple regression equation 14.7 is used, i.e. when one more variable,
sales of non-food items (x2), is added.
It may also be noted that when only one variable, viz. sales of food items, is considered, r² is 0.876,
i.e. 87.6% of the variation in net profit is explained by the variation in sales of food items; but when both
the variables, viz. sales of food as well as non-food items, are considered, R² is 0.9943, i.e. 99.43%
of the variation in net profit is explained by the variation in both these variables.
If there are three variables x1, x2 and y, then a simple correlation coefficient can be defined between
every pair of x1, x2 and y. When there are more than two variables in a study, the simple correlation
between any two variables is known as total correlation. All nine such possible pairs can be
represented in the form of the matrix:

ryy     ryx1     ryx2
rx1y    rx1x1    rx1x2
rx2y    rx2x1    rx2x2

Since ryy, rx1x1 and rx2x2 are all equal to 1, the matrix can be written as

1       ryx1     ryx2
rx1y    1        rx1x2
rx2y    rx2x1    1

Further, since ryx1 and rx1y are equal, and so are the other symmetric pairs, it is sufficient to write the
matrix in the following (upper triangular) form:

1    ryx1    ryx2
–    1       rx1x2
–    –       1
The adjusted coefficient of multiple determination, denoted by R̄², takes into account n (number of observations) and k (number
of independent variables) for comparison between two situations, and is calculated as

R̄² = 1 − [(n − 1) / (n − k − 1)] (1 − R²)    (14.10)

where n is the sample size or the number of observations on each of the variables, and k is the
number of independent variables. For the above example,

R̄² = 1 − [(10 − 1) / (10 − 2 − 1)] (1 − 0.9943) = 0.9927
To start with, when an independent variable is added, i.e. the value of k is increased, the
value of R̄² increases; but when the addition of another variable does not contribute towards
explaining the variability in the dependent variable, the value of R̄² decreases. This implies
that the addition of that variable is redundant.
The adjusted R², i.e. R̄², is less than R², and the gap widens as the number of observations per independent variable
decreases. However, R̄² tends to be equal to R² as the sample size increases for a given number of
independent variables.
R̄² is useful in comparing two regression equations having different numbers of independent variables,
or having the same number of variables but based on different sample sizes.
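Equation 14.10 is a one-line computation; the sketch below (in Python, as an aside to the SPSS-based presentation) reproduces the worked figure:

```python
# Adjusted R-squared as per equation 14.10.
def adjusted_r2(r2, n, k):
    """Penalise R^2 for the number k of independent variables, given n observations."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# The worked example: R^2 = 0.9943 with n = 10 observations and k = 2 predictors.
print(round(adjusted_r2(0.9943, 10, 2), 4))  # → 0.9927
```

Note that adding a predictor (increasing k) shrinks the denominator n − k − 1, so R̄² rises only if the accompanying gain in R² outweighs the penalty.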
It may be verified that the multiple regression equation with amount of insurance premium as
dependent variable and income as well as marital status as independent variables is
Premium = 5.27 + 0.091 Income + 8.95 Marital Status
The interpretation of the coefficient 0.091 is that for every additional thousand rupees of income,
the premium increases by 1000 × 0.091 = Rs. 91.
The interpretation of the coefficient 8.95 is that a married person pays an additional premium
of Rs. 8,950 as compared to a single person.
The total correlation coefficients indicate the relationship between two variables ignoring the presence
or effect of the third variable. The multiple correlation coefficient Ry·x1x2 indicates
the correlation between y and the estimate of y obtained by the regression equation of y on x1 and
x2. The partial correlation coefficients are defined as the correlation between any two variables when
the effect of the third variable on these two variables is removed, or when the third variable is held
constant. For example, ryx1·x2 means the correlation between y and x1 when the effect of x2 on y and
x1 is removed, or x2 is held constant.
The values of the partial correlation coefficients, ryx1·x2, ryx2·x1 and rx1x2·y, are 0.997, 0.977 and
0.973, respectively.
The interpretation of ryx2·x1 = 0.977 is that it indicates the extent of linear correlation between y
and x2 when x1 is held constant.
Srl. No.   yi    x1i    x2i    x1i²    Yi    X1i    X2i    (Y, X1, X2 being the standardised values of y, x1, x2)
1 5.6 20 5 400 –0.331 –0.34 –0.09
2 4.7 15 5 225 –1.291 –1.48 –0.09
3 5.4 18 6 324 –0.544 –0.8 0.792
4 5.5 20 5 400 –0.437 –0.34 –0.09
5 5.1 16 6 256 –0.864 –1.25 0.792
6 6.8 25 6 625 0.949 0.798 0.792
7 5.8 22 4 484 –0.117 0.114 –0.97
8 8.2 30 7 900 2.443 1.937 1.673
9 5.8 24 3 576 –0.117 0.57 –1.85
10 6.2 25 4 625 0.309 0.798 –0.97
Sum 59.1 215 51 4815
mean 5.91 21.5 5.1
Variance 0.88 19.25 1.29
s.d. 0.937 4.387 1.136
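The quoted partial correlations can be verified from the total correlations using the standard first-order formula ryx1·x2 = (ryx1 − ryx2 · rx1x2) / √[(1 − r²yx2)(1 − r²x1x2)]. A Python sketch (assuming numpy is available) applied to the Table 14.1 data:

```python
# Partial correlations for the store data, computed from total correlations.
import numpy as np

y  = np.array([5.6, 4.7, 5.4, 5.5, 5.1, 6.8, 5.8, 8.2, 5.8, 6.2])
x1 = np.array([20, 15, 18, 20, 16, 25, 22, 30, 24, 25], dtype=float)
x2 = np.array([5, 5, 6, 5, 6, 6, 4, 7, 3, 4], dtype=float)

def r(a, b):
    """Total (simple) correlation between a and b."""
    return np.corrcoef(a, b)[0, 1]

def partial(a, b, c):
    """Correlation of a and b with the effect of c removed (c held constant)."""
    return (r(a, b) - r(a, c) * r(b, c)) / np.sqrt(
        (1 - r(a, c) ** 2) * (1 - r(b, c) ** 2))

print(round(partial(y, x1, x2), 3))  # r_yx1.x2 → 0.997
print(round(partial(y, x2, x1), 3))  # r_yx2.x1 → 0.977
```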
14.2.9 Properties of R2
As mentioned earlier, the coefficient of multiple correlation R is the ordinary or total correlation
between the dependent variable and its estimate as derived by the regression equation, i.e. R = r(yi, ŷi),
and as such is always positive. Further,
(i) R² ≥ the square of the total correlation coefficient of y with any one of the variables x1, x2, …, xk.
(ii) R² is high if the correlation coefficients between the independent variables, viz. the rxixj's, are all low.
(iii) If rxixj = 0 for each i ≠ j, then R² = r²yx1 + r²yx2 + r²yx3 + … + r²yxk.
14.2.10 Multicollinearity
Multicollinearity refers to the existence of high correlation between independent variables. Even
if the regression equation is significant for the equation as a whole, it could happen that due to
multicollinearity, the individual regression coefficients could be insignificant indicating that they do
not have much impact on the value of the dependent variable. When two independent variables are
highly correlated, they basically convey the same information, and it logically appears that only one
of the two variables could be used in the regression equation.
If the value of R2 is high and the multicollinearity problem exists, the regression equation
can still be used for prediction of dependent variables given values of independent variables.
However, it should not be used for interpreting partial regression coefficients to indicate impact
of independent variables on the dependent variable.
The multicollinearity among independent variables can be removed with the help of Principal
Component Analysis, discussed later in this chapter. It involves forming a new set of independent variables
which are linear combinations of the original variables, in such a way that there is no multicollinearity
among the new variables.
If there are two variables, sometimes the exclusion of one may result in an abnormal change in
the regression coefficient of the other variable; sometimes even the sign of the regression coefficient
may change from + to – or vice versa, as demonstrated for the data given below.
y x1 x2
10 12 25
18 16 21
18 20 22
25 22 18
21 25 17
32 24 15
It may be verified that the correlation between x1 and x2 is –0.91, indicating the existence of multicollinearity.
It may be verified that the regression equation of y on x1 is
y = – 3.4 + 1.2 x1 (i)
the regression equation of y on x2 is
y = 58.0 – 1.9 x2 (ii)
and the regression equation of y on x1 and x2 is
y = 70.9 – 0.3 x1 – 2.3 x2 (iii)
It may be noted that the coefficient of x1 (1.2) in (i) which was positive when the regression equation
of y on x1 was considered, has become negative (–0.3) in equation (iii), when x2 is also included in
the regression equation. This is due to high correlation of –0.91 between x1 and x2. It is, therefore,
desirable to take adequate care of multicollinearity.
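The sign flip described above can be reproduced with a few lines of Python (assuming numpy; the variable names are ours):

```python
# Reproducing the sign flip in the coefficient of x1 caused by
# multicollinearity (data from the text).
import numpy as np

y  = np.array([10, 18, 18, 25, 21, 32], dtype=float)
x1 = np.array([12, 16, 20, 22, 25, 24], dtype=float)
x2 = np.array([25, 21, 22, 18, 17, 15], dtype=float)

# The two "independent" variables are strongly (negatively) correlated.
print(round(np.corrcoef(x1, x2)[0, 1], 2))  # → -0.91

# Simple regression of y on x1 alone gives a positive slope.
slope_simple = np.polyfit(x1, y, 1)[0]
print(round(slope_simple, 1))  # → 1.2

# In the multiple regression on x1 and x2, the x1 coefficient turns negative.
X = np.column_stack([np.ones_like(x1), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b1, 1), round(b2, 1))  # → -0.3 -2.3
```

The same data thus yield equations (i) and (iii) of the text, with the coefficient of x1 reversing sign once x2 enters the model.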
type of paint. Further, even if the four variables mentioned above are found to be significantly con-
tributing, as a whole, to the sales of the paint, but one or some of these might not be influencing
the sales in a significant way.
For example, it might happen that the sales are insensitive to the advertising expenses i.e. increas-
ing the expenditure on advertising might not be increasing the sales in a significant way. In such
case, it is advisable to exclude this variable from the model, and use only the other three variables.
As explained in the next section, it is not advisable to include a variable unless its contribution to
variation in the dependent variable is significant. These issues will be explained with examples in
subsequent sections.
14.2.13.3 Stepwise Method This method is used when a researcher wants to find out, which
independent variables significantly contribute in the regression model, out of a set of independent
variables. This method finds the best fit model, i.e. the model which has a set of independent variables
that contribute significantly in the regression equation.
For example, if a researcher has identified some three independent variables that may affect the
dependent variable, and wants to find the best combination of these three variables which contribute
significantly in the regression model, he or she may use stepwise regression. The software would
give the exact set of variables that contribute or are worth keeping in the model.
There are three most popular stepwise regression methods, namely forward regression, backward
regression and stepwise regression. In forward regression, one independent variable is entered with
the dependent variable, and the regression equation is arrived at, along with other tests like ANOVA, t-tests,
etc.; in the next iteration, one more independent variable is added and the result is compared with the previous
model. If the new variable contributes significantly to the model, it is kept; otherwise, it is dropped
from the model. This process is repeated for each remaining independent variable, thus arriving
at a significant model containing all the contributing independent variables. The backward
method is exactly the opposite: initially, all the variables are considered, and they are removed
one by one if they do not contribute to the model.
The stepwise regression method is a combination of the forward selection and backward elimina-
tion methods. The basic difference between this and the other two methods is that in this method,
even if a variable is selected in the beginning or gets selected subsequently, it has to keep on com-
peting with the other entering variables at every stage to justify its retention in the equation.
These steps are explained in the next section, with an example.
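The forward-selection loop described above can be sketched in a few lines of Python. Note that the retention criterion used here is improvement in R̄², rather than the F/t significance tests SPSS applies, so this is an illustrative sketch under that simplifying assumption, not a replica of the SPSS procedure:

```python
# A minimal forward-selection sketch using the adjusted R-squared criterion.
# (SPSS's stepwise procedure uses F/t significance tests; adjusted R2 is a
# simpler stand-in that illustrates the same add-one-variable-at-a-time idea.)
import numpy as np

def adj_r2(X, y):
    n, k = X.shape[0], X.shape[1] - 1          # k excludes the intercept column
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

def forward_select(predictors, y, tol=1e-6):
    """Greedily add the predictor (by index) that most improves adjusted R2."""
    selected, best = [], -np.inf
    ones = np.ones_like(y)
    while True:
        candidates = [j for j in range(len(predictors)) if j not in selected]
        if not candidates:
            break
        scores = {j: adj_r2(np.column_stack(
                      [ones] + [predictors[i] for i in selected + [j]]), y)
                  for j in candidates}
        j_best = max(scores, key=scores.get)
        if scores[j_best] > best + tol:      # keep only a genuine improvement
            selected.append(j_best)
            best = scores[j_best]
        else:
            break
    return selected

# Toy check: y depends on the first predictor only, so only index 0 is kept.
x1 = np.array([1., 2, 3, 4, 5, 6, 7, 8])
x2 = np.array([3., 1, 4, 1, 5, 9, 2, 6])
y = 3 + 2 * x1
print(forward_select([x1, x2], y))  # → [0]
```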
Sr. No.   Company   M-Cap Oct. '05 (Amount, Rank)   Net Sales Sept. '05 (Amount, Rank)   Net Profit Sept. '05 (Amount, Rank)   P/E as on Oct. 31, '05 (Amount, Rank)
1 Infosys Technologies 68560 3 7836 29 2170.9 10 32 66
2 Tata Consultancy Services 67912 4 8051 27 1831.4 11 30 74
3 Wipro 52637 7 8211 25 1655.8 13 31 67
4 Bharti Tele-Ventures * 60923 5 9771 20 1753.5 12 128 3
In the above example, we take Market Capitalisation as the dependent variable, and Net Profit,
P/E Ratio and Net Sales as independent variables.
We may add that this example is to be viewed as an illustration of selection of optimum number
of independent variables, and not the concept of financial analysis.
The notations used for the variables are as follows:
Y Market Capitalisation
x1 Net Sales
x2 Net Profit
x3 P/E Ratio
Step I:
First of all, we calculate the total correlation coefficients among all the independent
variables, as also their correlations with the dependent variable. These are tabulated below.
1 2 3
Net Sales Net Profit P/E Ratio
Net Sales 1.0000
Net Profit 0.7978 1.0000
P/E Ratio –0.5760 –0.6004 1.0000
Market Cap 0.6874 0.8310 –0.2464
We note that the correlation of y with x2 is the highest. We, therefore, start by taking only this
variable in the regression equation.
Step II:
The regression equation of y on x2 is
y = 15465 + 7.906 x2
The values of R² and R̄² are: R² = 0.6906, R̄² = 0.6734
Step III:
Now, we derive two regression equations, one by adding x1 and one by adding x3, to see which
combination, viz. x2 and x1 or x2 and x3, is better.
The regression equation of y on x2 and x1 is
Y = 14989 + 7.397 x2 + 0.135 x1
The values of R² and R̄² are: R² = 0.6922, R̄² = 0.656
Step IV:
The regression of y on x2 and x3 gives R² = 0.7903 and R̄² = 0.7656, the better of the two combinations.
It may be noted that the further inclusion of x1 in this model increases the value of R² only very marginally,
from 0.7903 to 0.8016, while the adjusted value, R̄², comes down from 0.7656 to 0.7644.
Thus, it is not worthwhile to add the variable x1 to the regression model having the variables x2 and x3.
Step V:
The advisable regression model is obtained by including only x2 and x3:
Y = – 19823 + 10.163 x2 + 1352.4 x3    (14.14)
This is the best regression equation fitted to the data on the basis of the R̄² criterion, as discussed
above.
We have discussed the same example to illustrate the method using SPSS in Section 14.2.17.
The next box that appears is shown in the following SPSS snapshot:
SPSS Snapshot MRA 2
If the method of selection is the General method, 'Enter' should be selected from the drop-down box. If
the method is stepwise, 'Stepwise' should be selected from the list. We have explained the criteria
for selecting the appropriate method in Section 14.2.14.
The next step is to click on the 'Statistics' button at the bottom of the box. When one clicks on
Statistics, the following box will appear.
SPSS Snapshot MRA 3
The Durbin–Watson Statistic is used to check the assumption of regression analysis which states
that the error terms should be uncorrelated. While its desirable value is 2, the desirable range is 1.5
to 2.5.
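The statistic itself is simple to compute from a model's residuals, d = Σ(eₜ − eₜ₋₁)² / Σeₜ²; a Python sketch (assuming numpy):

```python
# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
import numpy as np

def durbin_watson(residuals):
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Values near 2 indicate uncorrelated errors; values towards 0 indicate
# positive autocorrelation, and values towards 4 negative autocorrelation,
# as these alternating residuals illustrate.
print(durbin_watson([1, -1, 1, -1]))  # → 3.0
```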
After clicking ‘Continue’, SPSS will return to screen as in SPSS Snapshot MRA 2.
In this snapshot, click on the plots button at the bottom. The next box that would appear is given
in the following Snapshot MRA 4.
SPSS Snapshot MRA 4
The residual analysis is done to check the assumptions of multiple regression that the residu-
als should be normally distributed. This assumption can be checked by viewing the histogram and
normal probability plot.
After clicking ‘Continue’, SPSS will take the user back to Snapshot MRA 3.
By clicking ‘OK’, SPSS will carry out the analysis and give the output in the Output View.
We will discuss two outputs using the same data. One, by using General method for entering
variables, and the other by selecting stepwise method for entering variables.
The ‘Part and partial correlations matrix’ is useful in understanding the relationships between the
independent and dependent variables. The regression coefficients can be interpreted reliably only if
the independent variables are not interrelated among themselves. If they are related to each other,
the regression equation may be misinterpreted. This is termed multicollinearity, and its impact is
described in Section 14.2.10. The above correlation matrix is useful in checking the interrelationships
between the independent variables. In the above table, the correlations of the independent variables
with the dependent variable (0.687 and 0.831) are high, which means that these variables are related
to the dependent variable. However, the correlations between the independent variables themselves
(0.798, –0.576 and –0.6) are also high, which means that this data may have multicollinearity.
Generally, very high correlations between the independent variables, say more than 0.9, may make
the entire regression analysis unreliable for interpreting the regression coefficients.
Variables Entered/Removedb
Since the method selected was the Enter (General) method, this table does not convey any additional
information.
Model Summaryb
Model R R Square Adjusted R Square Std. Error of the Estimate Durbin–Watson
1 0.895a 0.802 0.764 16682.585 0.982
a. Predictors: (Constant), peratio_oct05, Net_Sal_sept05, Net_Prof_sept05
b. Dependent Variable: m_cap_amt_oct05
This table gives the model summary for the set of independent and dependent variables. R2 for
the model is 0.802 which is high and means that around 80% of variation in dependent variable
(market capitalisation) is explained by the three independent variables (net sale, net profit and P/E
ratio). The Durbin–Watson statistic for this model is 0.982, which is very low; the desired value
is in the range 1.5 to 2.5. A caution may, therefore, be added that the assumption of uncorrelated
residuals does not appear to hold here.
ANOVAb
Model Sum of Squares df Mean Square F Sig.
1 Regression 1.80E + 10 3 5996219888 21.545 0.000a
Residual 4.45E + 09 16 278308655.0
Total 2.24E + 10 19
a. Predictors: (Constant), peratio_oct05, Net_Sal_sept05, Net_Prof_sept05
b. Dependent Variable: m_cap_amt_oct05
The ANOVA table for the regression analysis indicates whether the model is significant and valid.
The ANOVA is significant if the ‘Sig.’ value in the above table is less than the level of significance
(generally taken as 5% or 1%). Since 0.000 < 0.01, we conclude that this model is significant.
If the model is not significant, it implies that no relationship exists between the set of variables.
Coefficientsa
Model Unstandardized Standardized Correlations
Coefficients Coefficients
B Std. Error Beta t Sig. Zero-order Partial Part
1. (Constant) –23531.5 13843.842 –1.700 0.109
Net_Sal_sept05 0.363 0.381 0.180 0.953 0.355 0.687 0.232 0.106
Net_Prof_sept05 8.954 1.834 0.941 4.882 0.000 0.831 0.774 0.544
peratio_oct05 1445.613 486.760 0.422 2.970 0.009 –0.246 0.596 0.331
a. Dependent Variable: m_cap_amt_oct05
This table gives the regression coefficients and their significance. The regression equation can be
written as:
Market capitalisation = –23531.5 + 0.363 × Net Sales + 8.954 × Net Profits + 1445.613 × P/E Ratio
Residuals Statisticsa
Charts
The above chart is to test the validity of the assumption that the residuals are normally distributed.
Looking at the chart, one may conclude that the residuals are more or less normal. This can be
formally tested using the Chi-square goodness-of-fit test.
Correlations
                            M_cap_amt_oct05  Net_Sal_sept05  Net_Prof_sept05  peratio_oct05
Pearson     M_cap_amt_oct05      1.000           0.687            0.831          –0.246
Correlation Net_Sal_sept05       0.687           1.000            0.798          –0.576
            Net_Prof_sept05      0.831           0.798            1.000          –0.600
            peratio_oct05       –0.246          –0.576           –0.600           1.000
Sig.        M_cap_amt_oct05        ⋅             0.000            0.000           0.148
(1-tailed)  Net_Sal_sept05       0.000             ⋅              0.000           0.004
            Net_Prof_sept05      0.000           0.000              ⋅             0.003
            peratio_oct05        0.148           0.004            0.003             ⋅
N           (all variables)       20              20               20              20
Variables Entered/Removeda
In the previous method, there was only one model. Since this is a stepwise method, SPSS reports
the model obtained at each step, a significant variable being entered at each step. The Durbin–Watson
statistic has improved over the previous model but is still below the desired range (1.5 to 2.5).
The second (final) stepwise model, with market capitalisation as the dependent variable and Net
Profit and P/E Ratio as the independent variables, is the best model, as can be verified from its
higher adjusted R2.
The following table gives coefficients for the best model:
The following table gives ANOVA for all the iterations (in this case 2), and both are
significant:
ANOVAc
Coefficientsa
Unstandardized Standardized
Coefficients Coefficients Correlations
Model B Std. Error Beta t Sig. Zero-order Partial Part
1. (Constant) 15465.085 5484.681 2.820 0.011
Net_Prof_sept05 7.906 1.247 0.831 6.338 0.000 0.831 0.831 0.831
2. (Constant) –19822.8 13249.405 –1.496 0.153
Net_Prof_sept05 10.163 1.321 1.068 7.691 0.000 0.831 0.881 0.854
peratio_oct05 1352.358 475.527 0.395 2.844 0.011 –0.246 0.568 0.316
a. Dependent Variable: m_cap_amt_oct05.
It may be noted in the above Table that the values of the constant and regression coefficients are
the same as in equation 14.14, derived manually. The SPSS stepwise regression did this automati-
cally, and the results we got are the same.
The following table gives summary of excluded variables in the two models:
Excluded Variablesc
There may be a situation where a researcher would like to divide the data into two parts, using
one part to derive the model and the other to validate it. SPSS allows the user to split the data into
two groups, termed the estimation group and the validation group. The estimation group is used to
fit the model, which is then validated using the validation group. This improves the validity of the
model, and the process is called cross-validation. It can be used only if the data set is large enough.
Random variable functions in SPSS can be used to select the cases randomly from the SPSS file.
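A minimal sketch of such a random split, assuming an illustrative 70/30 division of 20 case ids (both the proportion and the data are assumptions, not from the text):

```python
# Randomly split a data set into an estimation group and a validation
# group, as described above. The 70/30 proportion is an assumption.
import random

def split_sample(cases, estimation_share=0.7, seed=42):
    rng = random.Random(seed)      # fixed seed for reproducibility
    shuffled = cases[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * estimation_share)
    return shuffled[:cut], shuffled[cut:]

cases = list(range(1, 21))         # 20 illustrative case ids
estimation, validation = split_sample(cases)
print(len(estimation), len(validation))  # 14 6
```

The model would then be fitted on the estimation cases and its classification or prediction accuracy checked on the validation cases.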
An example is the variable ‘marks’ (percentage or percentile) in an examination, which is used to
classify students into two or more categories. As is well known, even marks cannot guarantee 100%
accurate classification.
Discriminant analysis is used to analyse relationships between a non-metric dependent variable
and metric or dichotomous (Yes/No type or Dummy) independent variables. Discriminant analysis
uses the independent variables to distinguish among the groups or categories of the dependent vari-
able. The discriminant model can be valid or useful only if it is accurate. The accuracy of the model
is measured on the basis of its ability to predict the known group memberships in the categories of
the dependent variable.
Discriminant analysis works by creating a new variable called the discriminant function score
which is used to predict to which group a case belongs. The computations find the coefficients for
the independent variables that maximise the measure of distance between the groups defined by the
dependent variable.
The discriminant function is similar to a regression equation in which the independent variables
are multiplied by coefficients and summed to produce a score.
The general form of discriminant function is:
D = b0 + b1X1 + b2X2 + … + bkXk (14.16)
where
D = Discriminant score
bi = Discriminant coefficients or weights
Xi = Independent variables
The weights bi are calculated using the criterion that the groups differ as much as possible on the
discriminant function.
If the dependent variable has only two categories, the analysis is termed (two-group) discriminant analysis.
If the dependent variable has more than two categories, then the analysis is termed as Multiple
Discriminant Analysis.
In case of multiple discriminant analysis, there will be more than one discriminant function. If
the dependent variable has three categories like high risk, medium risk, low risk, there will be two
discriminant functions. If dependent variable has four categories, there will be three discriminant
functions. In general, the number of discriminant functions is one less than the number of categories
of the dependent variable.
It may be noted that in the case of multiple discriminant functions, each function needs to be
significant in order to draw conclusions from the results.
The following illustrations explain the concepts and the technique of deriving a Discriminant
function, and using it for classification. The objective in this example is to explain the concepts in
a popular manner without mathematical rigour.
Illustration 14.2
Suppose, we want to predict whether a science graduate, studying inter alia the subjects of Phys-
ics and Mathematics, will turn out to be a successful scientist or not. Here, it is premised that the
performance of a graduate in Physics and Mathematics, to a large extent, contributes in shaping up
a successful scientist. The next step is to select some successful and some unsuccessful scientists,
and record the marks obtained by them in Mathematics and Physics in their graduate examination.
While in a real-life application we have to select a sufficient number, say 10 or more, of students in
both categories, just for the sake of simplicity, let the data on two successful and two unsuccessful
scientists be as follows:
Averages: M̄s = 10, P̄s = 9, M̄u = 8, P̄u = 8
S: Successful U: Unsuccessful
It may be mentioned that marks such as 8, 10, etc. have been taken just for the sake of ease in
calculation.
The discriminant function assumed is
Z = w1 M + w2 P
The requisite calculations on the above data yield
w1 = 9 and w2 = 23
Thus, the discriminant function works out to be
Z = 9 M + 23 P
and the discriminant score works out to be
ZC = [(9 M̄s + 23 P̄s) + (9 M̄u + 23 P̄u)] / 2
   = (9 × 10 + 23 × 9 + 9 × 8 + 23 × 8) / 2
   = 276.5
This discriminant score helps us predict whether a graduate student will turn out to be a successful
scientist or not. The score for the two successful scientists is 292 and 302, both being more than the
cutoff score of 276.5, while the score for the two unsuccessful scientists is 214 and 270, both being
less than 276.5. If a young graduate gets 11 marks in Mathematics and 9 marks in Physics, his or
her score as per the discriminant function is 9 × 11 + 23 × 9 = 306. Since this is more than the
cutoff score of 276.5, we can predict that this graduate will turn out to be a successful scientist.
This is pictorially depicted in the following:
It may be noted that both the successful scientists’ scores are above the discriminant line and the
scores of both the unsuccessful scientists are below it.
The student with assumed marks is classified in the category of successful scientist.
This example illustrates that, with the help of past data about objects (entities, individuals, etc.)
and their classification in two categories, one could derive the discriminant function and the
discriminant score. Subsequently, if the same type of data is given for some other object, the
discriminant score could be worked out for that object, and the object thus classified in either of
the two categories.
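The classification rule of the illustration can be sketched directly, using the weights (9 and 23) and the cutoff score (276.5) derived above:

```python
# Classifying a graduate with the discriminant function from the
# illustration above: Z = 9*M + 23*P, with cutoff score 276.5.
# Weights and cutoff are taken from the text's Illustration 14.2.

W1, W2, CUTOFF = 9, 23, 276.5

def discriminant_score(maths, physics):
    return W1 * maths + W2 * physics

def classify(maths, physics):
    z = discriminant_score(maths, physics)
    return "successful" if z > CUTOFF else "unsuccessful"

# The graduate from the text: 11 marks in Mathematics, 9 in Physics.
z = discriminant_score(11, 9)
print(z, classify(11, 9))  # 306 successful
```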
However, no discriminant function can ensure 100% accurate classification. It is, therefore, logical
to measure the goodness of a function, which would indicate the extent of confidence one could
attach to the obtained results.
14.3.2.1 Assumptions of Discriminant Analysis The first requirement for using discriminant
analysis is that the dependent variable should be non-metric and the independent variables should
be metric or dummy.
The ability of discriminant analysis to derive discriminant function that provides accurate classi-
fications is enhanced when the assumptions of normality, linearity and homogeneity of variance are
satisfied. In discriminant analysis, the assumption of linearity applies to the relationships between
pairs of independent variables. This can be verified from the correlation matrix, defined in Section
14.2.3. Like multiple regression, multicollinearity in discriminant analysis is identified by examin-
ing ‘tolerance’ values. The multicollinearity problem can be resolved by removing or combining
the variables with the help of Principal Component Analysis discussed in Section 14.6.3.
The assumption of homogeneity of variance is important in the classification stage of discriminant
analysis. If one of the groups defined by the dependent variable has greater variance than the others,
more cases will tend to be classified in that group. Homogeneity of variance is tested with Box's M
test, which tests the null hypothesis that the group variance–covariance matrices are equal. If we
fail to reject this null hypothesis and conclude that the variances are equal, we may use a pooled
variance–covariance matrix in classification.
14.3.2.2 Tests Used for Measuring Goodness of a Discriminant Function There are two
tests for judging goodness of a discriminant function:
1. An F test on Wilks' lambda (Λ) is used to test whether the discriminant model as a whole is
significant. Wilks' Λ for each independent variable is calculated using the formula:
Wilks' Λ = (Within-Group Sum of Squares) / (Total Sum of Squares)
It lies between 0 and 1. A large value of Λ indicates that there is no difference in the group
means for the independent variable; small values of Λ indicate that the group means differ.
The smaller the value of Λ, the more the discriminating power of the variable in the group.
2. If the F test shows significance, then the individual independent variables are assessed to see
which of these differ significantly (in mean) by group and are subsequently used to classify
the dependent variable.
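The Wilks' Λ ratio above can be computed directly for one variable; a minimal sketch using two small made-up groups:

```python
# Wilks' lambda for one independent variable across groups:
# lambda = within-group SS / total SS. Small values indicate that the
# group means differ. The two small groups below are illustrative.

def wilks_lambda(groups):
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    within_ss = 0.0
    for g in groups:
        gm = sum(g) / len(g)
        within_ss += sum((v - gm) ** 2 for v in g)
    return within_ss / total_ss

successful = [10, 12, 11, 13]
unsuccessful = [7, 8, 6, 9]
lam = wilks_lambda([successful, unsuccessful])
print(round(lam, 3))
```

Here the two group means (11.5 and 7.5) are well apart, so Λ comes out well below 1, indicating good discriminating power for this variable.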
Eigenvalue The eigenvalue for each discriminating function is defined as the ratio of the between-
groups sum of squares to the within-group sum of squares. The higher the eigenvalue,
the better the differentiation, and hence the model. There is one eigenvalue for each
discriminant function. For two-group DA, there is one discriminant function and one
eigenvalue, which accounts
for all of the explained variance. If there is more than one discriminant function, the
first will be the largest and most important, the second next most important in explana-
tory power, and so on.
Relative Percentage The relative percentage of a discriminant function equals a function's eigenvalue di-
vided by the sum of all eigenvalues of all discriminant functions in the model. Thus it is
the percent of discriminating power for the model associated with a given discriminant
function. Relative % is used to tell how many functions are important.
The Canonical It measures the extent of association between the discriminant scores and the groups.
Correlation, R* When R* is zero, there is no relation between the groups and the function. When the
canonical correlation is large, there is a high correlation between the discriminant
functions and the groups. It may be noted that for two-group DA, the canonical cor-
relation is equivalent to the Pearson’s correlation of the discriminant scores with the
grouping variable.
Centroid Mean values for discriminant scores for a particular group. The number of centroids
equals the number of groups, being one for each group. Means for a group on all the
functions are the group centroids.
Discriminant Score The Discriminant Score, also called the DA score, is the value resulting from apply-
ing a discriminant function formula to the data for a given case. The Z score is the
discriminant score for standardised data.
Cutoff If the discriminant score of the function is less than or equal to the cutoff, the case
is classified as 0, or if above the cutoff, it is classified as 1. When group sizes are
equal, the cutoff is the mean of the two centroids (for two-group DA). If the groups
are unequal, the cutoff is the weighted mean.
Standardised Discriminant Also termed as standardised canonical discriminant function coefficients, they are used
Coefficients to compare the relative importance of the independent variables, much as beta weights
are used in regression. Note that the importance is assessed relative to the model be-
ing analysed. Addition or deletion of variables in the model can change discriminant
coefficients markedly.
Functions at Group The mean discriminant scores for each of the dependent variable categories for each of
Centroids the discriminant functions in MDA. Two-group discriminant analysis has two centroids,
one for each group. We want the means to be well apart to show the discriminant func-
tion is clearly discriminating. The closer the means, the more errors of classification.
(Model) Wilks' lambda Used to test the significance of the discriminant function as a whole. The "Sig." level
for this function is the significance level of the discriminant function as a whole. The
smaller the lambda, the more likely it is that the function is significant. A significant
lambda means one can reject the null hypothesis that the two groups have the same
mean discriminant function scores and conclude that the model is discriminating.
ANOVA Table for Another overall test of the DA model. It is an F test, where a "Sig." p value < 0.05
Discriminant Scores means the model differentiates discriminant scores between the groups significantly.
(Variable) Wilks' lambda It can be used to test which independent variables contribute significantly to the discriminant
function. The smaller the value of Wilks' lambda for an independent variable, the more
that variable contributes to the discriminant function. Lambda varies from 0 to 1, with
0 meaning group means differ (thus, the more the variable differentiates the groups),
and 1 meaning all group means are the same.
Classification Matrix or Also called assignment, or prediction matrix or table, it is used to assess the performance
Confusion Matrix of DA. This is a table in which the rows are the observed categories of the dependent
and the columns are the predicted categories of the dependents. When prediction is
perfect, all cases will lie on the diagonal. The percentage of cases on the diagonal is
the percentage of correct classifications. This percentage is called the hit ratio.
Expected hit ratio The hit ratio is judged not relative to zero but relative to the percentage that would
have been correctly classified by chance alone. For two-group discriminant analysis
with a 50–50 split in the dependent variable, the expected percentage is 50%. For
two-way groups of unequal sizes, the expected percentage is computed in the "Prior
Probabilities for Groups" table in SPSS, by multiplying the prior probabilities by the
group sizes, summing over all groups, and dividing the sum by N. The simplest chance
strategy is to assign all cases to the largest group; the expected percentage is then the
largest group size divided by N.
Cross-validation Leave-one-out classification is available as a form of cross-validation of the classifi-
cation table. Under this option, each case is classified using a discriminant function
based on all cases except the given case. This is thought to give a better estimate of
what classification results would be in the population.
Measures of association Can be computed by the crosstabs procedure in SPSS if the researcher saves the pre-
dicted group membership for all cases.
Mahalanobis D-Square, Indices other than Wilks' lambda indicating the extent to which the discriminant func-
Rao’s V, Hotelling's trace, tions discriminate between criterion groups. Each has an associated significance test.
Pillai's trace, and Roy’s A measure from this group is sometimes used in stepwise discriminant analysis to
gcr (greatest characteristic determine if adding an independent variable to the model will significantly improve
root) classification of the dependent variable. SPSS uses Wilks' lambda by default but also
offers Mahalanobis distance, Rao's V, unexplained variance and smallest F ratio on
selection.
Structure Correlations Also known as discriminant loadings, these can be defined as the simple correlations
between the independent variables and the discriminant functions.
After opening the file bankloan.sav, one can click on ‘Analyse’ and ‘Classify’ as shown in the
following snapshot:
SPSS Snapshot DA 1
The next box that will appear is given in the following snapshot:
After entering the dependent variable and clicking on the ‘Define Range’ as shown above, SPSS
will open the following box:
SPSS Snapshot DA 2
After defining the variable, one should click on ‘Continue’ button as shown above. SPSS will go
back to the previous box shown below:
SPSS Snapshot DA 3
After selecting the dependent and independent variables and the method of entering variables, one
may click on Statistics; SPSS will open a box as shown below:
SPSS Snapshot DA 4
After selecting the descriptives SPSS will go back to the previous box shown below:
SPSS Snapshot DA 5
SPSS Snapshot DA 6
After clicking ‘Continue’, SPSS will be back to the previous box as shown in the Snapshot DA 6,
then click on the Save button at the bottom. SPSS will open a box as shown below:
SPSS Snapshot DA 7
After clicking ‘Continue’, SPSS will again go back to the previous window shown in Snapshot DA 6;
at this stage, one may click the OK button. This will lead SPSS to analyse the data, and the output
will be displayed in the output view of SPSS.
Output for Enter Method
We will discuss interpretation for each output.
Discriminant
Analysis Case Processing Summary
This table gives the case processing summary, i.e. how many valid cases were selected, how many
were excluded (due to missing data), the total, and their respective percentages.
Group Statistics
Previously defaulted Mean Std. Deviation Valid N (listwise)
Unweighted Weighted
No Age in years 35.5145 7.70774 517 517.000
Years with current employer 9.5087 6.66374 517 517.000
Years at current address 8.9458 7.00062 517 517.000
Household income in thousands 47.1547 34.22015 517 517.000
Debt to income ratio (x100) 8.6793 5.61520 517 517.000
Credit card debt in thousands 1.2455 1.42231 517 517.000
Other debt in thousands 2.7734 2.81394 517 517.000
Yes Age in years 33.0109 8.51759 183 183.000
Years with current employer 5.2240 5.54295 183 183.000
Years at current address 6.3934 5.92521 183 183.000
Household income in thousands 41.2131 43.11553 183 183.000
Debt to income ratio (x100) 14.7279 7.90280 183 183.000
Credit card debt in thousands 2.4239 3.23252 183 183.000
Other debt in thousands 3.8628 4.26368 183 183.000
Total Age in years 34.8600 7.99734 700 700.000
Years with current employer 8.3886 6.65804 700 700.000
Years at current address 8.2786 6.82488 700 700.000
Household income in thousands 45.6014 36.81423 700 700.000
Debt to income ratio (x100) 10.2606 6.82723 700 700.000
Credit card debt in thousands 1.5536 2.11720 700 700.000
Other debt in thousands 3.0582 3.28755 700 700.000
This table gives the group statistics of the independent variables for each category (here Yes and
No) of the dependent variable.
Tests of Equality of Group Means
Analysis 1
Box's Test of Equality of Covariance Matrices
Log Determinants
Previously defaulted Rank Log Determinant
No 7 21.292
Yes 7 24.046
Pooled within-groups 7 22.817
The ranks and natural logarithms of determinants printed are those of the group covariance
matrices.
Test Results
Box's M 563.291
F Approx. 19.819
df1 28
df2 431743.0
Sig. 0.000
This table gives the summary of the canonical discriminant function. It indicates that the eigenvalue
for this model is 0.404 and the canonical correlation is 0.536. Since there is a single discriminant
function, all the explained variation is contributed by that function. The squared canonical correlation
(0.536² ≈ 0.287) indicates that about 28.7% of the variation in the dependent variable is explained
by the model.
Wilks' Lambda
Test of Function(s) Wilks’ Lambda Chi-square df Sig.
1 0.712 235.447 7 0.000
This table tests the significance of the model. As seen in the Sig. column, the model is significant.
Standardised Canonical Discriminant Function Coefficients
Function 1
Age in years 0.122
Years with current employer –0.829
Years at current address –0.310
Household income in thousands 0.215
Debt to income ratio (×100) 0.603
Credit card debt in thousands 0.564
Other debt in thousands –0.178
Structure Matrix
Function
1
Debt to income ratio (×100) 0.666
Years with current employer –0.464
Credit card debt in thousands 0.397
Years at current address –0.262
Other debt in thousands –0.232
Age in years –0.219
Household income in thousands –0.112
Canonical Discriminant Function Coefficients (Unstandardised)
Function 1
Age in years 0.015
Years with current employer –0.130
Years at current address –0.046
Household income in thousands 0.006
Debt to income ratio (×100) 0.096
Credit card debt in thousands 0.275
Other debt in thousands –0.055
(Constant) –0.576
This table gives the unstandardised discriminant function coefficients. A negative sign indicates an
inverse relation. For example, the coefficient for years with current employer is –0.130; it means
that the more the number of years spent with the current employer, the lesser the chance that the
person will default.
Function
Previously defaulted 1
No –0.377
Yes 1.066
Classification Statistics
Classification Processing Summary
Processed 850
Excluded Missing or out-of-range 0
group codes
At least one missing
discriminating variable 0
Used in Output 850
Previously Defaulted
No Yes
Age in years 0.803 0.825
Years with current employer –0.102 –0.289
Years at current address –0.294 –0.360
Household income in thousands 0.073 0.081
Debt to income ratio (×100) 0.639 0.777
Credit card debt in thousands –1.004 –0.608
Other debt in thousands –1.044 –1.124
(Constant) –15.569 –16.898
Fisher’s linear discriminant functions
This is the classification matrix or confusion matrix. It gives the percentage of cases that are
classified correctly, i.e. the hit ratio. This hit ratio should be at least 25% more than the chance
classification rate.
In the above example, 532 of the 700 cases are classified correctly; overall, 76% of the cases are
classified correctly, and 139 of the 183 defaulters were identified correctly.
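A sketch of the hit-ratio computation from such a matrix. The individual cell counts below are assumed so as to be consistent with the figures above (517 non-defaulters and 183 defaulters in the analysis, 532 of 700 on the diagonal, 139 defaulters identified correctly):

```python
# Hit ratio from a classification (confusion) matrix: the share of
# cases on the diagonal. Rows = observed (No, Yes), columns = predicted.
# Cell counts are assumed, consistent with the totals quoted in the text.
matrix = [[393, 124],   # observed No : 517 cases
          [ 44, 139]]   # observed Yes: 183 cases

total = sum(sum(row) for row in matrix)
hits = sum(matrix[i][i] for i in range(len(matrix)))
hit_ratio = hits / total
print(hits, total, f"{hit_ratio:.0%}")  # 532 700 76%
```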
SPSS Snapshot DA 8
After clicking on method, SPSS will open a window as shown in the following:
SPSS Snapshot DA 9
Stepwise Statistics
Variables Entered/Removed a,b,c,d
Min. D Squared
Exact F
Step Entered Statistics Between Statistic df1 df2 df3
Groups
1 Debt to income ratio (×100) 0.924 No and Yes 124.889 1 698.000 0.000
2 Years with current employer 1.501 No and Yes 101.287 2 697.000 0.000
3 Credit card debt in thousands 1.926 No and Yes 86.502 3 696.000 0.000
4 Years at current address 2.038 No and Yes 68.572 4 695.000 0.000
At each step, the variable that maximizes the Mahalanobis distance between the two closest groups is entered.
(a) Maximum number of steps is 14.
(b) Minimum partial F to enter is 3.84.
(c) Maximum partial F to remove is 2.71.
(d) F level, tolerance, or VIN insufficient for further computation.
Wilks' Lambda
Number of
Step Variables Lambda df1 df2 df3 Exact F
These tables give a summary of the variables in the analysis, the variables not in the analysis, and
the model at each step, along with its significance.
It can be concluded that the variables Debt to income ratio (×100), Years with current employer,
Credit card debt in thousands and Years at current address remain in the model, and the others are
removed. This means that only these variables contribute to the model.
The logistic equation (8) can be reduced to a linear form by converting the probability p into
log[p/(1 – p)], or logit, as follows:
y = log[p/(1 – p)] = a + bx (2)
The logarithm, here, is the natural logarithm to the base ‘e’. The logarithm of any number to
this base is obtained by multiplying the logarithm to the base 10 by log of 10 to the base ‘e’ i.e.
2.303.
This equation is similar to a regression equation. However, here a unit change in the independent
variable causes a change in the logit, rather than in the dependent variable p directly. Such
regression analysis is known as Logistic Regression.
The fitting of a logistic regression equation is explained through an illustration wherein data was
recorded on the CGPA (up to first semester in the second year of MBA) of 20 MBA students, and
their success in the first interview for placement. The data collected was as follows where Pass is
indicated as “1” while Fail is indicated as “0”.
Student (Srl. No.) 1 2 3 4 5 6 7 8 9 10
CGPA 3.12 3.21 3.15 3.45 3.14 3.25 3.16 3.28 3.22 3.41
Result of First Interview 0 1 0 0 0 1 1 1 0 1
Student (Srl. No.) 11 12 13 14 15 16 17 18 19 20
CGPA 3.48 3.34 3.25 3.46 3.32 3.29 3.42 3.28 3.36 3.31
Result of First Interview 1 1 0 1 1 1 1 1 1 0
Now, given this data, can we find the probability of a student succeeding in the first interview given
the CGPA?
Fitting an ordinary linear regression to this data yields y = 1.83x – 5.37; however, such a straight
line can predict values outside the 0–1 range of a probability.
Instead, let us now attempt to fit a logistic regression to the student data. We will do this by comput-
ing the logits and then fitting a linear model to the logits. To compute the logits, we will regroup
the data by CGPA into intervals, using the midpoint of each interval for the independent variable.
We calculate the probability of success based on the number of students that passed the interview
for each range of CGPAs. This results in the following data:
We plot the Logit against the CGPA and then look for the linear fit which gives us the equation:
y = 3.667 x – 11.852
Thus, if p is the probability of passing the interview and x is the CGPA, the logistic regression
can be expressed as:
ln[p/(1 – p)] = 3.667x – 11.852
Converting the logarithm to an equivalent exponential form, this equation can also be expressed as:
p = e^(3.667x – 11.852) / (1 + e^(3.667x – 11.852))
x 2.5 2.6 2.7 2.8 2.9 3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4
y* –2.68 –2.32 –1.95 –1.58 –1.22 –0.85 –0.48 –0.12 0.25 0.62 0.98 1.35 1.72 2.08 2.45 2.82
p 6% 9% 12% 17% 23% 30% 38% 47% 56% 65% 73% 79% 85% 89% 92% 94%
From this regression model, we can see that probability of success at the interview is below 25%
for CGPAs below 2.90 but is above 75% for CGPAs above 3.60.
While one could apply logistic regression to a number of situations, it has been found useful
particularly in the following situations:
Credit – Study of the creditworthiness of an individual or a company. Various demographic and
credit history variables could be used to predict whether an individual will turn out to be a
‘good’ or ‘bad’ customer.
Marketing/Market Segmentation – Study of purchasing behaviour of consumers. Various demo-
graphic and purchasing information could be used to predict if an individual will purchase an
item or not.
Customer loyalty – The analysis could be done to identify loyal or repeat customers using vari-
ous demographic and purchasing information.
Medical – Study of risk of diseases/body disorder.
Stepwise logistic regression In stepwise logistic regression, the three methods available are enter, backward
and forward. In the enter method, all variables are included in the logistic regression,
irrespective of whether each variable is significant or insignificant. In the backward
method, the model starts with all the variables and removes the non-significant
variables from the list. In the forward method, the logistic regression starts with a
single variable, adds variables one by one, tests their significance, and removes
insignificant variables from the model.
Measures of Effect Size In logistic regression, the ordinary R2 is not used, because R2 measures the variance
in the dependent variable extracted by the independent variables, which does not
carry over to a binary outcome. The maximum value of the Cox and Snell r-squared
statistic is actually somewhat less than 1; the Nagelkerke r-squared statistic is a "cor-
rection" of the Cox and Snell statistic so that its maximum value is 1.
Classification Table The classification table shows the practical results of using the logistic regression
model. It is useful for understanding the validity of the model.
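The pseudo-R2 measures mentioned under ‘Measures of Effect Size’ can be computed directly from the log-likelihoods of the null and fitted models. A sketch; the log-likelihood values used at the end are illustrative, not taken from any output in this chapter:

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """Cox & Snell and Nagelkerke pseudo-R-squared from log-likelihoods.

    ll_null  : log-likelihood of the intercept-only (null) model
    ll_model : log-likelihood of the fitted model
    n        : sample size
    """
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)   # < 1, hence the "correction"
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke

# Illustrative values only: Nagelkerke rescales Cox & Snell towards [0, 1].
cs, nk = pseudo_r2(ll_null=-400.0, ll_model=-300.0, n=700)
print(round(cs, 3), round(nk, 3))
```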
LR Snapshot 1
LR Snapshot 2
LR Snapshot 3
SPSS will take you back to the window displayed in LR Snapshot 2; at this stage, click on Options.
The following window will open.
LR Snapshot 4
SPSS will be back to the window shown in LR Snapshot 2. At this stage, click OK. The following
output will be displayed.
Logistic Regression
Case Processing Summary
This table gives the case processing summary: 700 out of the 850 cases are used for the analysis;
the remaining 150 are ignored as they have missing values.
Dependent Variable Encoding
Original Value Internal Value
No 0
Yes 1
This table indicates the coding for the dependent variable: 0 => not defaulted, 1 => defaulted.
14.58 Business Research Methodology
Model Summary
The Hosmer–Lemeshow statistic indicates a poor fit if the significance value is less than 0.05.
Here since the value is above 0.05, the model adequately fits the data.
Classification Tablea

                                       Predicted
                             Previously defaulted   Percentage
  Observed                     No          Yes       Correct
Step 1  Previously     No      490          27         94.8
        defaulted      Yes     137          46         25.1
        Overall Percentage                             76.6
Step 2  Previously     No      481          36         93.0
        defaulted      Yes     110          73         39.9
        Overall Percentage                             79.1
Step 3  Previously     No      477          40         92.3
        defaulted      Yes      99          84         45.9
        Overall Percentage                             80.1
Step 4  Previously     No      478          39         92.5
        defaulted      Yes      91          92         50.3
        Overall Percentage                             81.4
a. The cut value is .500
This table is the classification table. It indicates the number of cases correctly classified as well
as incorrectly classified. Diagonal elements represent correctly classified cases and non-diagonal
elements represent incorrectly classified cases.
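The cross-tabulation behind such a classification table can be reproduced from predicted probabilities at the cut value of 0.5. A sketch on made-up data, not the bank-loan data above:

```python
import numpy as np

def classification_table(y_true, p_hat, cut=0.5):
    """2 x 2 classification table at a given cut value, as in the SPSS output."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(p_hat) >= cut).astype(int)
    table = np.zeros((2, 2), dtype=int)          # rows = observed, cols = predicted
    for obs, pred in zip(y_true, y_pred):
        table[obs, pred] += 1
    overall = np.trace(table) / table.sum() * 100   # % correctly classified
    return table, overall

# Four hypothetical cases: observed outcome and predicted probability of default.
table, overall = classification_table([0, 0, 1, 1], [0.2, 0.6, 0.7, 0.3])
print(table)
print(overall)
```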
It may be noted that at each step, the number of correctly classified cases improves over the
previous step. The last column gives the percentage of correctly classified cases, which also
improves at each step.
Variables in the Equation

                        B       S.E.     Wald     df   Sig.   Exp(B)
Step 1a  debtinc      0.132    0.014    85.377    1   0.000   1.141
         Constant    –2.531    0.195   168.524    1   0.000   0.080
Step 2b  employ      –0.141    0.019    53.755    1   0.000   0.868
         debtinc      0.145    0.016    87.231    1   0.000   1.156
         Constant    –1.693    0.219    59.771    1   0.000   0.184
Step 3c  employ      –0.244    0.027    80.262    1   0.000   0.783
         debtinc      0.088    0.018    23.328    1   0.000   1.092
         creddebt     0.503    0.081    38.652    1   0.000   1.653
         Constant    –1.227    0.231    28.144    1   0.000   0.293
Step 4d  employ      –0.243    0.028    74.761    1   0.000   0.785
         address               0.020    17.183    1   0.000   0.922
         debtinc      0.088    0.019    22.659    1   0.000   1.092
         creddebt     0.573    0.087    43.109    1   0.000   1.774
         Constant    –0.791    0.252     9.890    1   0.002   0.453
a. Variable(s) entered on step 1: debtinc.
b. Variable(s) entered on step 2: employ.
c. Variable(s) entered on step 3: creddebt.
d. Variable(s) entered on step 4: address.
The best model is usually the last model, i.e. Step 4. It contains the variables: years with current
employer, years at current address, debt-to-income ratio and credit card debt. All other variables
are insignificant in the model.
Variables not in the Equation

                                Score    df   Sig.
Step 1  Variables   age        16.478    1   0.000
                    employ     60.934    1   0.000
                    address    23.474    1   0.000
                    income      3.219    1   0.073
                    creddebt    2.261    1   0.133
                    othdebt     6.631    1   0.010
        Overall Statistics    113.910    6   0.000
Step 2  Variables   age         0.006    1   0.939
                    address     8.407    1   0.004
                    income     21.437    1   0.000
                    creddebt   64.958    1   0.000
                    othdebt     4.503    1   0.034
        Overall Statistics     84.064    5   0.000
Step 3  Variables   age         0.635    1   0.426
                    address    17.851    1   0.000
                    income      0.773    1   0.379
                    othdebt     0.006    1   0.940
        Overall Statistics     22.221    4   0.000
Step 4  Variables   age         3.632    1   0.057
                    income      0.012    1   0.912
                    othdebt     0.320    1   0.572
        Overall Statistics      4.640    3   0.200
The above table gives the score statistics for the variables not included in the model at each step.
A significant score (p-value less than 0.05) indicates that adding that variable would significantly
improve the model; after Step 4, none of the remaining variables is significant (Overall Statistics
p-value 0.200).
In this case, one of the conclusions drawn was that both the programmes had positive impact on
both knowledge and motivation but there was no significant difference between classrooms based
and job based training programmes.
As another example, we could assess whether a change in Compensation System-1 to Compensa-
tion System-2 has brought about changes in sales, profit and job satisfaction in an organisation.
MANOVA is typically used when there is more than one dependent variable and the independent
variables are qualitative/categorical.
It may be noted that the above example is of MANOCOVA as we have selected some categorical
variables and some metric variables.
We are assuming in the above example that the dependent variables are the investments in the com-
modity market and in the share market, and the categorical independent variables are occupation and
how long the respondents block their investments. The metric independent variables are age and the
respondent’s ratings for the commodity market and the share market. Here, we assume that their
investments depend on their ratings, occupation, age and how long they block their investments.
The following output will be displayed:
This table indicates that the null hypothesis that the investments are equal for all occupations is re-
jected, since the significance value (p-value) is less than 0.05, as indicated by the circles. Thus, we
may conclude at the 5% Level of Significance (LOS) that both investments (share market and
commodity market) differ significantly across the occupations of the respondents.
The null hypothesis that the investments are equal for different lengths of time for which the
investment is blocked is also rejected, since the significance value (p-value) is less than 0.05, as
indicated by the circles. Thus, we may conclude at the 5% LOS that both investments (share
market and commodity market) differ significantly across the periods for which the respondents
would like to block their money.
The other hypotheses, about age, the ratings of CM and the ratings of SM, are not rejected (as the
p-values are greater than 0.05); this means there is no significant difference in the investments for
these variables.
Tests of Between-Subjects Effects
Mr Pankaj is a Director in one of the most prestigious companies in the world. His parents
went from Pune to Delhi for a visit when he was about 2 years old. A few days after the visit,
his grandfather casually enquired what he would like to become when he grew up. Pankaj
promptly replied “Bus Conductor”. All those present on the occasion were taken aback with
surprise, and asked the natural question “Why?” Prompt came the reply, “Because he controls
the people entering the bus and allows only some of them to enter”. A few years later, when he
was about 4, his grandfather repeated the question. Prompt came the reply, “Jockey”. When
asked to explain why, he replied that he had seen a film in which a jockey controlled his horse
so well that he made it win the race. It was just a matter of chance that the grandfather repeated
the question to Pankaj when he was about 7 years old. This time, the prompt reply, without any
hesitation, was “Pilot”. The justification given was that he had observed, in an air show, how
well a pilot controls the plane. One might infer that, in his thoughts, Pankaj laid a lot of
importance on the aspect of ‘Control’, which could be termed, in BRM terminology, a
‘Construct’ as defined in Chapter 2. This unobserved abstract feature could be called a ‘Fac-
tor’ that was responsible for the observed responses. Or, alternatively, one could say that the
observed responses reflected the inner desire to control. This is the essence of Factor Analysis.
It attempts to find the factors responsible for observed human responses.
Communality The amount of variance an original variable shares with all the other variables included
in the analysis. A relatively high communality indicates that a variable has much in common
with the other variables taken as a group.
Eigenvalue The eigenvalue of a factor is the total variance explained by that factor.
Factor A linear combination of the original variables. A factor also represents the underlying
dimensions (constructs) that summarise or account for the original set of observed
variables.
Factor Loadings The factor loadings, or component loadings in PCA, are the correlation coefficients be-
tween the variables (given in the output as rows) and the factors (given in the output as
columns). These loadings are analogous to Pearson’s correlation coefficient r; the squared
factor loading is the percentage of variance in the respective variable explained by the factor.
Factor Matrix This contains the factor loadings of all the variables on all the factors extracted.
Factor Plot or Rotated Factor Space A plot in which the factors form the axes and the variables are drawn
on these axes. This plot can be interpreted only if the number of factors is 3 or less.
Factor Scores Each individual observation has a score, or value, associated with each of the original
variables. Factor analysis procedures derive factor scores that represent each observation’s
calculated values, or score, on each of the factors. The factor score will represent an
individual’s combined response to the several variables representing the factor.
The component scores may be used in subsequent analysis in PCA. When the factors
are to represent a new set of variables that they may predict or be dependent on some
phenomenon, the new input may be factor scores.
Goodness of a Factor How well can a factor account for the correlations among the indicators?
One could examine the correlations among the indicators after the effect of the factor
is removed. For a good factor solution, the resulting partial correlations should be near
zero, because once the effect of the common factor is removed, there is nothing to link
the indicators.
Bartlett’s Test of Sphericity The test statistic used to test the null hypothesis that there is no correlation
between the variables.
Kaiser Meyer Olkin (KMO) Measure of Sampling Adequacy An index used to test the appropriateness
of factor analysis. High values of this index, generally more than 0.5, indicate that factor
analysis is appropriate, whereas lower values (less than 0.5) indicate that factor analysis
may not be appropriate.
Scree Plot A plot of eigenvalues against the factors in the order of their extraction.
Trace The sum of squares of the values on the diagonal of the correlation matrix used in the
factor analysis. It represents the total amount of variance on which the factor solution
is based.
There are a number of financial parameters/ratios for predicting health of a company. It would
be useful if only a couple of indicators could be formed as linear combination of the original pa-
rameters/ratios in such a way that the few indicators extract most of the information contained in
the data on original variables.
Further, in the regression model, if the independent variables are correlated, implying multi-
collinearity, then new variables can be formed as linear combinations of the original variables
which are themselves uncorrelated. The regression equation can then be derived with these new
uncorrelated independent variables, and used for interpreting the regression coefficients as well
as for predicting the dependent variable with their help. This is highly useful in marketing and
financial applications involving forecasting of sales, profit, price, etc. with the help of regression
equations.
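This idea — regressing on uncorrelated principal components instead of the correlated originals — can be sketched as follows. The data are synthetic, with two nearly collinear predictors; none of this comes from the book's datasets:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)          # nearly collinear with x1
y = 2 * x1 + 3 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([x1, x2])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)         # standardise the predictors

# Principal components are the eigenvectors of the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Xs, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The first component carries almost all the variance; use it as the regressor.
pc1 = Xs @ eigvecs[:, :1]
A = np.column_stack([np.ones(n), pc1])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(eigvals)    # first eigenvalue close to 2 => one component dominates
print(beta)       # intercept and slope on the (uncorrelated) component
```

Because the component is uncorrelated with any others that might be retained, its coefficient can be interpreted without the instability that multicollinearity causes.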
Further, analysis of principal components often reveals relationships that were not previously sus-
pected and thereby allows interpretations that would not be ordinarily understood. A good example
of this is provided by stock market indices.
Incidentally, PCA is a means to an end and not the end in itself. PCA can be used for inputting
principal components as variables for further analysing the data using other techniques such as
cluster analysis, regression and discriminant analysis.
The quantitative ability factor explains marks in subjects like Mathematics, Physics and Chemistry
and verbal ability explains marks in subjects like Languages and History.
In another study, a detergent manufacturing company was interested in identifying the major
underlying factors or dimensions that consumers used to evaluate various detergents. These factors
are assumed to be latent; however, management believed that the various attributes or properties of
detergents were indicators of these underlying factors. Factor analysis was used to identify these
underlying factors. Data was collected on several product attributes using a five-point scale. The
analysis of responses revealed existence of two factors viz. ability of the detergent to clean and its
mildness.
In general, the factor analysis performs the following functions:
Identifies the smallest number of common factors that best explain or account for the correlation
among the indicators.
Identifies a set of dimensions that are latent (not easily observed) in a large number of vari-
ables.
Devises a method of combining or condensing a large number of consumers with varying pref-
erences into a small number of distinctly different groups.
Identifies and creates an entirely new smaller set of variables to partially or completely replace
the original set of variables for subsequent regression or discriminant analysis from a large
number of variables. It is especially useful in multiple regression analysis when multicollinear-
ity is found to exist as the number of independent variables is reduced by using factors and
thereby minimising or avoiding multicollinearity. In fact, factors are used in lieu of original
variables in the regression equation.
In yet another study, another group of students of a management institute conducted a survey to
identify the factors that influence the purchasing decision of a motorcycle in the 125 cc category.
Through the use of Principal Component Analysis and factor analysis using computer software, the
group concluded that the following three parameters are most important:
Comfort
Assurance
Long-term Value
After opening the file Car_sales.sav, one can click on Analyze – Data Reduction and Factor as
shown in the following snapshot:
FA Snapshot 1
SPSS will take you back to the window shown in FA Snapshot 6. Click on the button ‘Scores’. This
will open a new window as shown below:
FA Snapshot 8
This will take you back to the window shown in FA Snapshot 7; in this window, now click on ‘OK’.
SPSS will give the following output. We shall explain each part in brief.
Factor Analysis
Correlation Matrix
                     Vehicle  Price in   Engine  Horse-  Wheel-  Width   Length  Curb    Fuel      Fuel
                     type     thousands  size    power   base                    weight  capacity  efficiency
Vehicle type          1.000   –0.042     0.269   0.017   0.397   0.260   0.150   0.526   0.599    –0.577
Price in thousands   –0.042    1.000     0.624   0.841   0.108   0.328   0.155   0.527   0.424    –0.492
Engine size           0.269    0.624     1.000   0.837   0.473   0.692   0.542   0.761   0.667    –0.737
Horsepower            0.017    0.841     0.837   1.000   0.282   0.535   0.385   0.611   0.505    –0.616
Wheelbase             0.397    0.108     0.473   0.282   1.000   0.681   0.840   0.651   0.657    –0.497
Width                 0.260    0.328     0.692   0.535   0.681   1.000   0.706   0.723   0.663    –0.602
Length                0.150    0.155     0.542   0.385   0.840   0.706   1.000   0.629   0.571    –0.448
Curb weight           0.526    0.527     0.761   0.611   0.651   0.723   0.629   1.000   0.865    –0.820
Fuel capacity         0.599    0.424     0.667   0.505   0.657   0.663   0.571   0.865   1.000    –0.802
Fuel efficiency      –0.577   –0.492    –0.737  –0.616  –0.497  –0.602  –0.448  –0.820  –0.802    1.000
This is the correlation matrix. The PCA can be carried out if the correlation matrix for the vari-
ables contains at least two correlations of 0.30 or greater. It may be noted that the correlations >0.3
are marked in circle.
KMO and Bartlett’s Test
Kaiser-Meyer-Olkin Measure of Sampling
0.833
Adequacy
Bartlett’s Test of Sphericity Approx. Chi-Square 1578.819
df 45
Sig. 0.000
The KMO measure of sampling adequacy for this data is 0.833 and the chi-square statistic is
significant (<0.05). This means that principal component analysis is appropriate for this data.
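Bartlett's statistic reported above can be computed from the correlation matrix alone, using the standard chi-square approximation chi2 = –(n – 1 – (2p + 5)/6) ln|R| with p(p – 1)/2 degrees of freedom. A sketch on a small hypothetical matrix, not the Car_sales data:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: H0 is that R is an identity matrix.

    R : p x p sample correlation matrix
    n : number of observations
    """
    p = R.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return chi2, df, stats.chi2.sf(chi2, df)

# Hypothetical correlation matrix for three variables.
R = np.array([[1.0, 0.8, 0.7],
              [0.8, 1.0, 0.6],
              [0.7, 0.6, 1.0]])
chi2, df, p_value = bartlett_sphericity(R, n=100)
print(chi2, df, p_value)   # small p-value => factor analysis is appropriate
```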
Communalities
Initial Extraction
Vehicle type 1.000 .930
Price in thousands 1.000 .876
Engine size 1.000 .843
Horsepower 1.000 .933
Wheelbase 1.000 .881
Width 1.000 .776
Length 1.000 .919
Curb weight 1.000 .891
Fuel capacity 1.000 .861
Fuel efficiency 1.000 .860
Extraction Method: Principal Component Analysis.
Extraction communalities are estimates of the variance in each variable accounted for by the
components. The communalities in this table are all high, which indicates that the extracted com-
ponents represent the variables well. If any communalities are very low in a principal components
extraction, you may need to extract another component.
Total Variance Explained
The scree plot gives the number of components against the eigenvalues and helps to determine
the optimal number of components.
Incidentally, "scree" is the geological term referring to the debris which gets deposited on the
lower part of a rocky slope.
The components having a steep slope indicate that a good percentage of the total variance is
explained by those components; hence they are justified. A shallow slope indicates that the
contribution to total variance is small, and the component is not justified. In the above plot, the
first three components have a steep slope and thereafter the slope is shallow. This indicates that
the ideal number of components is three.
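The scree decision can also be checked numerically from the eigenvalues of the correlation matrix; components with eigenvalue above 1 (the Kaiser criterion) are the usual candidates for retention. A sketch on a hypothetical four-variable correlation matrix, not the Car_sales data:

```python
import numpy as np

# Hypothetical correlation matrix: variables 1-2 and 3-4 form two blocks.
R = np.array([[1.0, 0.8, 0.1, 0.2],
              [0.8, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.7],
              [0.2, 0.1, 0.7, 1.0]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # scree-plot ordering
n_keep = int((eigvals > 1).sum())                # Kaiser criterion
print(eigvals)
print(n_keep)    # two components: one per block of correlated variables
```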
Component Matrixa
Component
1 2 3
Vehicle type .471 .533 –.651
Price in thousands .580 –.729 –.092
Engine size .871 –.290 .018
Horsepower .740 –.618 .058
Wheelbase .732 .480 .340
Width .821 .114 .298
Length .719 .304 .556
Curb weight .934 .063 –.121
Fuel capacity .885 .184 –.210
Fuel efficiency –.863 .004 .339
Extraction Method: Principal Component Analysis.
a. 3 components extracted.
This table gives the component loadings of each variable, but it is the next table which is easier
to interpret.
Rotated Component Matrix
Component
1 2 3
Vehicle type –0.101 0.095 0.954
Price in thousands 0.935 –0.003 0.041
Engine size 0.753 0.436 0.292
Horsepower 0.933 0.242 0.056
Wheelbase 0.036 0.884 0.314
Width 0.384 0.759 0.231
Length 0.155 0.943 0.069
Curb weight 0.519 0.533 0.581
Fuel capacity 0.398 0.495 0.676
Fuel efficiency –0.543 –0.318 –0.681
This table is the most important table for interpretation. The maximum of each row (ignoring
sign) indicates that the respective variable belongs to the respective component. The variables
‘price in thousands’, ‘engine size’ and ‘horsepower’ are highly correlated and contribute to a single
component. ‘Wheelbase’, ‘width’ and ‘length’ contribute to second component. And ‘vehicle type’,
‘curb weight’, ‘fuel capacity’ contribute to the third component.
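The row-maximum rule just described can be applied programmatically. A sketch using the rotated loadings from the table above:

```python
import numpy as np

variables = ["vehicle type", "price in thousands", "engine size", "horsepower",
             "wheelbase", "width", "length", "curb weight", "fuel capacity",
             "fuel efficiency"]
# Rotated component matrix from the output above (rows = variables).
loadings = np.array([[-0.101,  0.095,  0.954],
                     [ 0.935, -0.003,  0.041],
                     [ 0.753,  0.436,  0.292],
                     [ 0.933,  0.242,  0.056],
                     [ 0.036,  0.884,  0.314],
                     [ 0.384,  0.759,  0.231],
                     [ 0.155,  0.943,  0.069],
                     [ 0.519,  0.533,  0.581],
                     [ 0.398,  0.495,  0.676],
                     [-0.543, -0.318, -0.681]])

# Each variable belongs to the component with the largest absolute loading.
assignment = np.abs(loadings).argmax(axis=1) + 1   # 1-based component number
for var, comp in zip(variables, assignment):
    print(f"{var}: component {comp}")
```

Note that by this rule ‘fuel efficiency’ also falls on the third component, since its largest absolute loading is –0.681.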
Component Transformation Matrix
Component 1 2 3
1 0.601 0.627 0.495
2 –0.797 0.422 0.433
3 –0.063 0.655 –0.753
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
Component Score Coefficient Matrix

                          1        2        3
Vehicle type –0.173 –0.194 0.615
Price in thousands 0.414 –0.179 –0.081
Engine size 0.226 0.028 –0.016
Horsepower 0.368 –0.046 –0.139
Wheelbase –0.177 0.397 –0.042
Width 0.011 0.289 –0.102
Length –0.105 0.477 –0.234
Curb weight 0.070 0.043 0.175
Fuel capacity 0.012 0.017 0.262
Fuel efficiency –0.107 0.108 –0.298
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
Component Scores.
This table gives the component score coefficients for each variable. The component scores can be
saved for each case in the SPSS file. These scores are useful for replacing the internally related
(correlated) variables in a regression analysis. In the above table, the coefficients are given
component-wise. The score for each component is calculated as a linear combination of the
standardised variables weighted by these coefficients.
Component Score Covariance Matrix
Component 1 2 3
1 1.000 0.000 0.000
2 0.000 1.000 0.000
3 0.000 0.000 1.000
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
Component Scores.
CASE 14.1
In the year 2009, the TRAI (Telecom Regulatory Authority of India) was assessing the require-
ments for number portability. Number portability is defined as switching the service provider
without changing the number. That year had seen a fierce price war in the telecom sector. Some
of the oldest service providers were still, to a certain extent, immune to this war, as most of their
consumers would not like to change their numbers. Number portability will intensify the price
war and give an opportunity to relatively new service providers. The price war is so fierce that
industry experts comment that the future lies in how one differentiates in terms of services
rather than price.
With this background, a TELECOMM company conducted a research study to find the factors
that affect consumers while selecting/switching a telecom service provider. The survey was con-
ducted on 35 respondents. They were asked to rate 12 questions about their perception of the
factors important to them while selecting a service provider, on a 7-point scale (1 = completely
disagree, 7 = completely agree).
18 3 5 5 4 6 2 5 5 5 5 5 4
19 3 4 5 4 5 4 3 4 4 4 3 4
20 3 3 6 4 5 7 1 1 1 4 1 5
21 1 7 7 2 7 1 7 6 7 6 7 7
22 6 5 3 7 4 4 4 5 5 3 5 2
23 5 6 7 6 6 2 6 6 6 5 6 7
24 2 6 7 3 7 3 6 2 5 7 6 6
25 1 3 4 2 5 2 7 4 5 3 6 3
26 3 7 7 4 6 2 7 5 6 7 7 7
27 4 6 6 5 6 1 7 6 7 6 6 5
28 4 5 7 5 6 3 4 5 5 5 4 6
29 4 6 5 5 4 2 6 2 6 4 6 4
30 5 5 4 6 3 3 5 2 5 3 5 3
31 3 6 7 4 6 2 7 5 6 7 7 7
32 3 3 6 4 5 7 1 1 1 4 2 5
33 5 5 2 6 3 5 3 4 4 2 4 1
34 1 3 6 2 5 3 6 5 5 5 5 5
35 7 2 5 7 3 2 6 2 6 5 6 5
Carry out relevant analysis and write a report to discuss the findings for the above data.
The initial process of conducting common factor analysis is exactly the same as for principal com-
ponent analysis, except for the method of extraction selected in FA Snapshot 5.
We will discuss only the steps that differ from the principal component analysis shown
above.
The following steps are carried out to run factor analysis using SPSS:
1. Open file telcom.sav
2. Click on Analyse ->Data Reduction ->Factor as shown in FA Snapshot 1.
3. Following window will be opened by SPSS.
FA Snapshot 9
4. Click on Descriptives, select Coefficients and Initial solution, select KMO and Bartlett’s test
   of sphericity, and also select Anti-image, as shown in FA Snapshot 3.
   It may be noted that we did not select Anti-image in PCA, but we are required to select it
   here.
5. Click on Extraction; the following window will be opened by SPSS.
FA Snapshot 10
6. SPSS will take you back to the window shown in FA Snapshot 9 at this stage. Click on Rotation;
   the window SPSS will open is shown in FA Snapshot.
7. Select Varimax rotation, select Display rotated solution and click Continue, as shown in FA
   Snapshot 7.
8. It may be noted that in the PCA of FA Snapshot 8 we chose to store some variables, which is
   not required here.
Following output will be generated by SPSS:
Factor Analysis
Descriptive Statistics
Mean Std. Deviation Analysis N
Q1 4.80 1.568 35
Q2 3.20 1.410 35
Q3 2.83 1.671 35
Q4 3.89 1.605 35
Q5 3.09 1.245 35
Q6 3.49 1.772 35
Q7 3.23 1.734 35
Q8 3.86 1.611 35
Q9 3.46 1.633 35
Q10 3.74 1.615 35
Q11 3.17 1.505 35
Q12 3.60 1.866 35
These are the descriptive statistics given by SPSS. They give a general understanding of the
variables.
Correlation Matrix
      Q1      Q2      Q3      Q4      Q5      Q6      Q7      Q8      Q9      Q10     Q11     Q12
Q1    1.000  –0.128  –0.294   0.984  –0.548  –0.017  –0.188  –0.093   0.129  –0.242  –0.085  –0.259
Q2   –0.128   1.000   0.302  –0.068   0.543   0.231   0.257   0.440   0.355   0.359   0.344   0.378
Q3   –0.294   0.302   1.000  –0.282   0.558   0.148   0.258   0.056   0.148   0.898   0.164   0.930
Q4    0.984  –0.068  –0.282   1.000  –0.510  –0.052  –0.223  –0.063   0.099  –0.227  –0.113  –0.251
Q5   –0.548   0.543   0.558  –0.510   1.000   0.101   0.195   0.387   0.067   0.538   0.149   0.559
Q6   –0.017   0.231   0.148  –0.052   0.101   1.000   0.901   0.159   0.937   0.230   0.906   0.096
Q7   –0.188   0.257   0.258  –0.223   0.195   0.901   1.000   0.202   0.866   0.379   0.943   0.211
Q8   –0.093   0.440   0.056  –0.063   0.387   0.159   0.202   1.000   0.204   0.042   0.192   0.156
Q9    0.129   0.355   0.148   0.099   0.067   0.937   0.866   0.204   1.000   0.258   0.889   0.091
Q10  –0.242   0.359   0.898  –0.227   0.538   0.230   0.379   0.042   0.258   1.000   0.309   0.853
Q11  –0.085   0.344   0.164  –0.113   0.149   0.906   0.943   0.192   0.889   0.309   1.000   0.119
Q12  –0.259   0.378   0.930  –0.251   0.559   0.096   0.211   0.156   0.091   0.853   0.119   1.000
This is the correlation matrix. The Common Factor Analysis can be carried out if the correlation
matrix for the variables contains at least two correlations of 0.30 or greater. It may be noted that
some of the correlations >0.3 are marked in circle.
KMO and Bartlett’s Test
The KMO measure of sampling adequacy is an index used to test the appropriateness of factor
analysis. The minimum required KMO is 0.5. The above table shows that the index for this data is
0.658 and the chi-square statistic is significant (0.000 < 0.05). This means that factor analysis is
appropriate for this data.
Communalities
Initial Extraction
Q1 0.980 0.977
Q2 0.730 0.607
Q3 0.926 0.975
Q4 0.978 0.996
Q5 0.684 0.753
Q6 0.942 0.917
Q7 0.942 0.941
Q8 0.396 0.379
Q9 0.949 0.942
Q10 0.872 0.873
Q11 0.934 0.924
Q12 0.916 0.882
Extraction Method: Principal Axis Factoring.
Initial communalities are the proportion of variance accounted for in each variable by the rest of
the variables. Small communalities for a variable indicate that the proportion of variance that this
variable shares with other variables is too small. Thus, this variable does not fit the factor solution.
In the above table, most of the initial communalities are very high indicating that all the variables
share a good amount of variance with each other, an ideal situation for factor analysis.
Extraction communalities are estimates of the variance in each variable accounted for by the fac-
tors in the factor solution. The communalities in this table are all high. It indicates that the extracted
factors represent the variables well.
Total Variance Explained
This output gives the variance explained by the initial solution. This table gives the total vari-
ance contributed by each component. We may note that the percentage of total variance contributed
by first component is 39.908, by second component is 25.288 and by third component is 19.960.
It may be noted that the percentage of total variances is the highest for first factor and it decreases
thereafter. It is also clear from this table that there are total three distinct factors for the given set
of variables.
The scree plot gives the number of factors against the eigenvalues, and helps to determine the
optimal number of factors. The factors having a steep slope indicate that a larger percentage of
total variance is explained by those factors. A shallow slope indicates that the contribution to
total variance is small. In the above plot, the first four factors have a steep slope and thereafter
the slope is shallow. It may also be noted from the plot that the number of factors with
eigenvalue greater than one is four. Hence, the ideal number of factors is four.
Factor Matrixa
Factor
1 2 3 4
Q1 0.410 0.564 0.698 0.066
Q2 0.522 –0.065 0.139 0.557
Q3 0.687 –0.515 0.423 –0.245
Q4 –0.413 0.528 0.725 0.141
Q5 0.601 –0.500 –0.067 0.371
Q6 0.705 0.630 –0.111 –0.102
Q7 0.808 0.486 –0.176 –0.144
Q8 0.293 0.005 –0.031 0.540
Q9 0.690 0.682 0.039 0.019
Q10 0.727 –0.372 0.401 –0.213
Q11 0.757 0.575 –0.137 –0.040
Q12 0.644 –0.523 0.432 –0.090
Extraction Method: Principal Axis Factoring.
a. 4 factors extracted. 14 iterations required.
This table gives the factor loadings of each variable, but it is the next table which is easier to interpret.
Rotated Factor Matrixa
Factor
1 2 3 4
Q1 0.001 –0.147 0.972 –0.109
Q2 0.195 0.253 –0.002 0.711
Q3 0.084 0.970 –0.146 0.074
Q4 –0.042 –0.137 0.987 –0.035
Q5 0.006 0.456 –0.430 0.600
Q6 0.953 0.053 –0.004 0.078
Q7 0.939 0.161 –0.162 0.086
Q8 0.117 –0.008 –0.040 0.603
Q9 0.936 0.068 0.164 0.186
Q10 0.210 0.899 –0.101 0.103
Q11 0.943 0.078 –0.059 0.158
Q12 0.022 0.909 –0.110 0.206
Extraction Method: Principal Axis Factoring.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 5 iterations.
This table is the most important table for interpretation. The maximum in each row (ignoring sign)
indicates that the respective variable belongs to the respective factor. For example, in the first row
the maximum is 0.972, which is for Factor 3; this indicates that Q1 contributes to the third factor.
In the second row, the maximum is 0.711, for Factor 4, indicating that Q2 contributes to the fourth
factor, and so on.
The variables ‘Q6’, ‘Q7’, ‘Q9’ and ‘Q11’ are highly correlated and contribute to a single factor
which can be named as Factor 1 or ‘Economy’.
The variables ‘Q3’, ‘Q10’ and ‘Q12’ are highly correlated and contribute to a single factor which
can be named as Factor 2 or ‘Services beyond Calling’.
The variables ‘Q1’ and ‘Q4’ are highly correlated and contribute to a single factor which can be
named as Factor 3 or ‘Customer Care’.
The variables ‘Q2’, ‘Q5’ and ‘Q8’ are highly correlated and contribute to a single factor which
can be named as Factor 4 or ‘Anytime Anywhere Service’.
We may summarise the above analysis in the following table:
Factors Questions
Factor 1 Q.6. Call rates and Tariff plans
Economy Q.7. Additional features like unlimited SMS, lifetime prepaid, 2 phones free calling, etc.
Q.9. SMS and Value Added Services charges
Q.11. Roaming charges
This implies that the telecom service provider should consider these four factors, which customers
feel are important while selecting/switching a service provider.
The practical significance of a canonical correlation is that it indicates how much variance in one
set of variables is accounted for by another set of variables.
Squared canonical correlations are referred to as canonical roots or eigenvalues.
If X1, X2, X3, ………, Xp and Y1, Y2, Y3, ……., Yq are the observable variables then canonical
variables will be:
U1 = a1X1 + a2X2 + ……….. + apXp V1 = b1Y1 + b2Y2 + ………. + bqYq
U2 = c1X1 + c2X2 + …………. + cpXp V2 = d1Y1 + d2Y2 + ……….. + dqYq
and so on
Then the Us and Vs are called canonical variables, and the coefficients are called canonical coefficients.
The first pair of sample canonical variables is obtained in such a way that
Var (U1) = Var (V1) = 1
and Corr (U1, V1) is maximum.
The second pair U2 and V2 are selected in such a way that they are uncorrelated with U1 and V1,
and the correlation between the two is maximum, and so on.
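The construction described above can be sketched numerically: whitening each set by the inverse square root of its covariance matrix reduces the problem to a singular value decomposition, whose singular values are the sample canonical correlations. The data below are synthetic, not from the book:

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sample canonical correlations between variable sets X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx, Syy = X.T @ X / (n - 1), Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):                      # inverse symmetric square root
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(K, compute_uv=False)

# Synthetic data: the first Y variable is nearly a copy of the first X variable,
# so the leading canonical correlation should be close to 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=300),
                     rng.normal(size=300)])
r = canonical_correlations(X, Y)
print(r)
```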
DA and CRA
CR2, which is defined as the ratio of SSB to SST, is a measure of the strength of the discriminant
function. If its value is 0.84, it implies that 84% of the variation between the two groups is explained
by the two discriminating variables.
In MDA, the objective is not to account for maximum variance in the data (i.e. maximum SST),
but to maximise the between-group to within-group sum of squares ratio (i.e. SSB/SSW) that results
in the best discrimination between the groups. The new axis or the new linear combination that
is identified is called Linear Discriminant Function. The projection of an observed point onto this
discriminant function (i.e. the value of the new variable) is called the discriminant score.
Canonical Snapshot 1
Canonical Snapshot 2
It may be noted that the above example is discussed in Section 14.5.1. The difference between MANCOVA and canonical correlation is that MANCOVA can have both factors and metric independent variables, whereas canonical correlation can have only metric independent variables; factors (categorical independent variables) are not possible in canonical correlation.
We assume in the above example that the dependent variables are the investments in the commodity market and in the share market. The metric independent variables are age, the respondent's ratings for the commodity market, share market and mutual funds, and the respondent's perception of risk for the commodity market, share market and mutual funds. Here, we assume that their investments depend on their ratings, age and their risk perceptions for mutual funds, commodity markets and share markets.
The following output will be displayed:
Multivariate Tests(b)

Effect     Statistic            Value   F        Hypothesis df   Error df   Sig.
Rate_SM    Pillai's Trace       0.026   0.280a   3.000           32.000     0.839
           Wilks' Lambda        0.974   0.280a   3.000           32.000     0.839
           Hotelling's Trace    0.026   0.280a   3.000           32.000     0.839
           Roy's Largest Root   0.026   0.280a   3.000           32.000     0.839
Rate_SM    Pillai's Trace       0.123   1.497a   3.000           32.000     0.234
           Wilks' Lambda        0.877   1.497a   3.000           32.000     0.234
           Hotelling's Trace    0.140   1.497a   3.000           32.000     0.234
           Roy's Largest Root   0.140   1.497a   3.000           32.000     0.234
risky_SM   Pillai's Trace       0.044   0.490a   3.000           32.000     0.692
           Wilks' Lambda        0.956   0.490a   3.000           32.000     0.692
           Hotelling's Trace    0.046   0.490a   3.000           32.000     0.692
           Roy's Largest Root   0.046   0.490a   3.000           32.000     0.692
Age        Pillai's Trace       0.152   1.914a   3.000           32.000     0.147
           Wilks' Lambda        0.848   1.914a   3.000           32.000     0.147
           Hotelling's Trace    0.179   1.914a   3.000           32.000     0.147
           Roy's Largest Root   0.179   1.914a   3.000           32.000     0.147
Rate_FD    Pillai's Trace       0.031   0.338a   3.000           32.000     0.798
           Wilks' Lambda        0.969   0.338a   3.000           32.000     0.798
           Hotelling's Trace    0.032   0.338a   3.000           32.000     0.798
           Roy's Largest Root   0.032   0.338a   3.000           32.000     0.798
Rate_MF    Pillai's Trace       0.092   1.075a   3.000           32.000     0.373
           Wilks' Lambda        0.908   1.075a   3.000           32.000     0.373
           Hotelling's Trace    0.101   1.075a   3.000           32.000     0.373
           Roy's Largest Root   0.101   1.075a   3.000           32.000     0.373
Rate_FD    Pillai's Trace       0.145   1.814a   3.000           32.000     0.164
           Wilks' Lambda        0.855   1.814a   3.000           32.000     0.164
           Hotelling's Trace    0.170   1.814a   3.000           32.000     0.164
           Roy's Largest Root   0.170   1.814a   3.000           32.000     0.164
Rate_MF    Pillai's Trace       0.001   0.012a   3.000           32.000     0.998
           Wilks' Lambda        0.999   0.012a   3.000           32.000     0.998
           Hotelling's Trace    0.001   0.012a   3.000           32.000     0.998
           Roy's Largest Root   0.001   0.012a   3.000           32.000     0.998
a. Exact statistic
b. Design: Intercept+Rate_CM+Rate_SM+risky_CM+risky_SM+Age+Rate_FD+Rate_MF+risky_FD+risky_MF
This table indicates that the hypotheses about age, the ratings of CM, SM and MF, and the riskiness of CM, SM and MF are not rejected (as the p-values are greater than 0.05). This means there is no significant difference in the investments for these variables.
Multivariate Statistical Techniques 14.91
Source            Dependent Variable   Sum of Squares   df
Total             Invest_CM            2.780E+010       44
                  Invest_SM            2.109E+011       44
                  Invest_MF            2.302E+011       44
Corrected Total   Invest_CM            1.703E+010       43
                  Invest_SM            1.221E+011       43
                  Invest_MF            1.375E+011       43
(a) R Squared = 0.385 (Adjusted R Squared = 0.222)
(b) R Squared = 0.313 (Adjusted R Squared = 0.131)
(c) R Squared = 0.251 (Adjusted R Squared = 0.053)
The above table gives three different models, namely a, b and c. Model a is for the first dependent variable, Invest_CM; model b is for the dependent variable Invest_SM; and model c is for the dependent variable Invest_MF.
The table also indicates the individual relationship between each dependent–independent variable pair. Only two pairs, namely Risky CM and Invest CM, and Age and Invest CM, are significant (p-value less than 0.05, indicated by circles). This indicates that the independent variable, consumers' perception of the riskiness of commodity markets (variable name Risky CM), significantly affects the dependent variable, i.e. their investment in commodity markets, meaning that the riskiness perceived by consumers affects their investments in the market. Similarly, the variable Age also impacts their investments in commodity markets. All other combinations are not significant.
The question is: how do we determine how similar or dissimilar each row of data is from the others?
This task of measuring similarity between entities is complicated by the fact that, in most cases, the data in its original form are measured in different units and/or scales. This problem is solved by standardising each variable: subtracting its mean from each value and then dividing by its standard deviation. This converts each variable to a pure number.
The measure used to define similarity between two entities, i and j, is computed as

Dij = (xi1 − xj1)² + (xi2 − xj2)² + … + (xik − xjk)²

The smaller the value of Dij, the more similar the two entities.
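As a sketch, the standardisation step and the similarity measure Dij can be computed as follows; the four branches and their deposit/credit figures are hypothetical, chosen only so that two branches are clearly alike.

```python
import numpy as np

# Hypothetical data: 4 bank branches described by deposits and credits (Rs lakh)
data = np.array([[900.0, 700.0],
                 [880.0, 680.0],
                 [120.0,  90.0],
                 [100.0, 110.0]])

# Standardise each variable: subtract its mean, divide by its standard deviation
z = (data - data.mean(axis=0)) / data.std(axis=0)

def d_sq(i, j):
    # Squared Euclidean distance D_ij between entities i and j (0-based indices)
    return float(np.sum((z[i] - z[j]) ** 2))

# Branches 1 and 2 should be far more similar than branches 1 and 3
print(d_sq(0, 1), d_sq(0, 2))
```

Because both variables are converted to pure numbers first, neither deposits nor credits dominates the distance merely through its scale.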
The basic method of clustering is illustrated through a simple example given as follows:
Let there be four branches of a commercial bank each described by two variables viz. deposits
and loans/credits. The following chart gives an idea of their deposits and loans/credits:
From the above chart, it is obvious that if we want two clusters, we should group branches 1 and 2 (High Deposit, High Credit) into one cluster, and 3 and 4 (Low Deposit, Low Credit) into another, since such grouping produces clusters within which the entities (branches) are most similar. However, this graphical approach is not convenient for more than two variables.
In order to develop a mathematical procedure for forming the clusters, we need a criterion upon
which to judge alternative clustering patterns. This criterion defines the optimal number of entities
within each cluster.
Now we shall illustrate the methodology of using distances among the entities from clusters. We
assume the following distance similarity matrix among three entities:
Distance or Similarity Matrix
1 2 3
1 0 5 10
2 5 0 8
3 10 8 0
Thus, the best clustering would be to cluster entities 1 and 2 together. This would yield the minimum distance within clusters (= 5) and, simultaneously, the maximum distance between clusters (= 18). Obviously, if the number of entities is large, it is a prohibitive task to construct every possible cluster pattern, compute each 'within-cluster distance' and select the pattern which yields the minimum. If the number of variables and dimensions is large, computers are needed.
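The three-entity example above can be verified by enumerating the possible two-entity clusters; the distance matrix is the one given in the text.

```python
from itertools import combinations

# Distance/similarity matrix from the text (entities 1, 2, 3)
D = {(1, 2): 5, (1, 3): 10, (2, 3): 8}

def dist(i, j):
    return D[(min(i, j), max(i, j))]

entities = [1, 2, 3]
best = None
for pair in combinations(entities, 2):
    within = dist(*pair)                        # distance inside the 2-entity cluster
    lone = next(e for e in entities if e not in pair)
    between = sum(dist(e, lone) for e in pair)  # distances to the singleton cluster
    if best is None or within < best[1]:
        best = (pair, within, between)

print(best)  # ((1, 2), 5, 18): cluster entities 1 and 2 together
```

The exhaustive search confirms the conclusion in the text: pairing entities 1 and 2 minimises the within-cluster distance (5) and maximises the between-cluster distance (18).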
The criterion of minimising the within-cluster distances to form the best possible grouping into 'k' clusters assumes that 'k' clusters are to be formed. If the number of clusters is not fixed a priori, the criterion will not specify the optimal number of clusters. If the objective is simply to minimise the sum of the within-cluster distances and the number of clusters is free to vary, then all that is required is to make each entity its own cluster, and the sum of the within-cluster distances will be zero. Obviously, the more the number of clusters, the lesser will be the sum of within-cluster distances. Thus, making each entity its own cluster is of no value, and this issue is, therefore, resolved intuitively.
(ii) Agricultural Clusters A study was conducted by one of the officers of the Reserve Bank
of India, to form clusters of geographic regions of the country based on agricultural parameters like
cropping pattern, rainfall, land holdings, productivity, fertility, use of fertilisers, irrigation facilities,
etc. The whole country was divided into 9 clusters. Thus, all the 67 regions of the country were al-
located to these clusters. Such classification is useful for making policies at the national level as also
at regional/cluster levels.
The most common methods of clustering are agglomerative methods. These can be further divided into:
Linkage Methods – these use distance measures between entities. There are three linkage methods: single linkage – minimum distance or nearest-neighbour rule; complete linkage – maximum distance or furthest neighbour; and average linkage – average distance between all pairs of objects. These are explained in Diagram 14.
Centroid Methods – this method considers the distance between the two centroids. The centroid is the vector of means of all the variables.
Variance Methods – commonly termed Ward's method; it uses the squared distance from the cluster means.
Diagram 14
(a) Single Linkage (Minimum Distance/Nearest Neighbourhood)
(b) Complete Linkage (Maximum Distance/Furthest Neighbourhood)
(c) Average Linkage (Average Distance)
[Each panel of the diagram shows two groups, Cluster 1 and Cluster 2, with the distance used by that linkage method drawn between them.]
C ASE 14.2
INVESTMENT AWARENESS AND PERCEPTIONS
A study of investment awareness and perceptions was undertaken with the aim of achieving a
better understanding of investor’s behaviour. The main objective was to conduct an analysis of
the various investment options available, behaviour patterns and perceptions of the investors in
relation to the available options. For simplicity, four of the most popular investment options were
selected for analysis, viz.
Investment in Commodity Markets
Investment in Stock Markets
Investment in Fixed Deposits
Investment in Mutual Funds
Special focus was on the study of the levels of awareness among the investors about the com-
modity markets and also perceptional ratings of the investment options by the investors.
This study was undertaken with the intention of gaining a better perspective on investor behaviour patterns, and of assisting the general public, individual/small investors, brokers and portfolio managers to analyse the scope of investments and make informed decisions while investing in the above-mentioned options. However, the limitation of the study is that it considers investors from Mumbai only, and hence might not be representative of the entire country.
An investment is a commitment of funds made in the expectation of some positive rate of return.
If properly undertaken, the return will be commensurate with the risk the investor assumes.
An analysis of the backgrounds and perceptions of the investors was undertaken in the report.
The data used in the analysis was collected by e-mailing and distributing the questionnaire among friends, relatives and colleagues. 45 people were surveyed and asked various questions relating to their backgrounds and their knowledge of the investment markets and options. The raw data contains a wide range of information, but only the data relevant to the objective of the study was considered.
The questionnaire used for the study is as follows:
Questionnaire
Age: _________
Occupation:
SELF EMPLOYED
GOVERNMENT
STUDENT
HOUSEWIFE
DOCTOR
ENGINEER
CORPORATE PROFESSIONAL
OTHERS (PLEASE SPECIFY) : ________________________
Gender:
MALE
FEMALE
SAFE RISKY
8. ON A SCALE OF 1-10, HOW RISKY DO YOU THINK IS THE STOCK MARKET?
SAFE RISKY
9. ON A SCALE OF 1-10, HOW RISKY DO YOU THINK ARE FIXED DEPOSITS?
SAFE RISKY
10. ON A SCALE OF 1-10 HOW RISKY DO YOU THINK ARE MUTUAL FUNDS?
SAFE RISKY
CA Snapshot 2
SPSS will take you back to the window displayed in CA Snapshot 2. At this stage, click on 'Method'. SPSS will open the following window:
CA Snapshot 4
The next step is to select the clustering measure. The most common measure is the squared Euclidean distance.
CA Snapshot 5
SPSS will take you back to the window shown in CA Snapshot 2. At this stage, click on Save; the following window will be displayed.
CA Snapshot 6
Click on Continue, and SPSS will return to the window shown in CA Snapshot 2. At this stage, click OK. The following output will be displayed.
We shall discuss this output in detail.
Proximities
Case Processing Summary (a)
Cases
Valid Missing Total
N Per cent N Per cent N Per cent
44 97.8% 1 2.2% 45 100.0%
a. Squared Euclidean Distance used
This table gives the case processing summary and its percentages. It indicates that there are 44 valid cases out of 45. Since one case has some missing values, it is excluded from the analysis.
Cluster
Single Linkage
This is the method we selected for cluster analysis.
Agglomeration Schedule
30 1 33 8.994 29 0 31
31 1 11 9.066 30 23 32
32 1 24 9.071 31 19 33
33 1 17 9.245 32 0 34
34 1 28 9.451 33 0 35
35 1 31 9.483 34 0 36
36 1 37 9.946 35 0 38
37 7 36 9.953 20 0 38
38 1 7 10.561 36 37 39
39 1 25 10.705 38 0 40
40 1 23 11.289 39 0 41
41 1 39 12.785 40 0 42
42 1 3 12.900 41 0 43
43 1 6 15.449 42 1 0
This table gives the agglomeration schedule, i.e. the details of the clusters formed at each stage. It indicates that cases 6 and 42 were combined at the first stage, cases 7 and 43 at the second stage, cases 2 and 10 at the third stage, and so on. The last stage (stage 43) indicates the two-cluster solution; the stage above the last (stage 42) indicates the three-cluster solution, and so on. The column Coefficients indicates the distance coefficient. A sudden increase in the coefficient indicates that the clusters combined at that stage were relatively dissimilar, so stopping at that stage is more appropriate. This is one of the indicators for deciding the number of clusters.
Agglomeration Schedule

Stage   Cluster Combined        Coefficients   Stage Cluster First Appears   Next    Difference in
        Cluster 1   Cluster 2                  Cluster 1   Cluster 2         Stage   Coefficients
1 6 42 0 0 0 43
2 7 43 0.828734 0 0 20 0.828734
3 2 10 2.189939 0 0 16 1.361204
4 40 41 2.360897 0 0 6 0.170958
5 8 44 2.636238 0 0 20 0.275341
6 38 40 3.002467 0 4 15 0.366229
7 30 32 3.749321 0 0 14 0.746854
8 1 14 3.808047 0 0 10 0.058726
9 26 29 3.89128 0 0 12 0.083232
10 1 16 4.1447 8 0 13 0.253421
11 34 35 4.325831 0 0 12 0.181131
12 26 34 4.587371 9 11 15 0.26154
13 1 18 4.697703 10 0 16 0.110332
14 19 30 5.105442 0 7 23 0.407739
15 26 38 5.750915 12 6 19 0.645473
16 1 2 5.921352 13 3 17 0.170437
17 1 20 6.052442 16 0 18 0.13109
18 1 9 6.236206 17 0 21 0.183764
19 24 26 6.389243 0 15 32 0.153037
20 7 8 6.790893 2 5 37 0.40165
21 1 5 7.297921 18 0 22 0.507028
22 1 15 7.480892 21 0 24 0.182971
23 11 19 7.631185 0 14 31 0.150293
24 1 21 7.710566 22 0 26 0.079381
25 4 12 7.735374 0 0 28 0.024808
26 1 13 8.288569 24 0 27 0.553195
27 1 27 8.510957 26 0 28 0.222388
28 1 4 8.656467 27 25 29 0.14551
29 1 22 8.807405 28 0 30 0.150937
30 1 33 8.994409 29 0 31 0.187004
31 1 11 9.066141 30 23 32 0.071733
32 1 24 9.070588 31 19 33 0.004447
33 1 17 9.244565 32 0 34 0.173977
34 1 28 9.450741 33 0 35 0.206176
35 1 31 9.483015 34 0 36 0.032274
36 1 37 9.946286 35 0 38 0.463272
37 7 36 9.953277 20 0 38 0.00699
38 1 7 10.56085 36 37 39 0.607572
39 1 25 10.70496 38 0 40 0.144109
40 1 23 11.28888 39 0 41 0.583924
41 1 39 12.78464 40 0 42 1.495759
42 1 3 12.90033 41 0 43 0.115693
43 1 6 15.44882 42 1 0 2.548489
We have replicated the table with one more column added, called "Difference in the coefficients"; this is the difference in the coefficient between the current stage and the previous stage. The highest difference indicates the most likely number of clusters. In the above table, the highest difference is 2.548, which corresponds to the 2-cluster solution. The next highest difference, 1.496, corresponds to the 3-cluster solution. This indicates that there could be 3 clusters for the data.
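The stopping rule described above can be sketched as follows, using the coefficients from the final stages of the single-linkage schedule shown earlier.

```python
# Coefficients from the final stages (38-43) of the single-linkage schedule above
coeffs = {38: 10.56085, 39: 10.70496, 40: 11.28888,
          41: 12.78464, 42: 12.90033, 43: 15.44882}

n_cases = 44  # stage s of the schedule leaves n_cases - s clusters after its merge

# Difference in coefficients between consecutive stages: the "cost" of each merge
diffs = {s: coeffs[s] - coeffs[s - 1] for s in coeffs if s - 1 in coeffs}
jump_stage = max(diffs, key=diffs.get)

# A large jump at stage s says the merge at s was expensive, so stop just before it:
# the clusters existing before stage s number n_cases - s + 1
suggested = n_cases - jump_stage + 1
print(jump_stage, round(diffs[jump_stage], 3), suggested)  # 43 2.548 2
```

Applied to the full schedule, the largest jump (2.548) occurs at the final merge, pointing to the 2-cluster solution, with the next largest jump flagging the next candidate solution, exactly as read off the table in the text.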
The icicle table also gives a summary of the cluster formation. It is read from bottom to top: the topmost row is the single-cluster solution, and the bottommost has all cases separate. The cases in the table are in the columns, and the first column indicates the number of clusters at that stage. Each case is separated by an empty column: a 'cross' in the empty column means the two cases are combined, while a 'gap' means the two cases are in separate clusters. If the number of cases is huge, this table becomes difficult to interpret.
The diagram given below is called the dendrogram. A dendrogram is the most commonly used tool for understanding the number of clusters and the cluster memberships. The cases are in the first column, and they are connected by lines at each stage of clustering. The graph is read from left to right: the leftmost position shows every case as a separate cluster, and the rightmost the one-cluster solution.
The graph also has a distance scale from 0 to 25. The greater the width of the horizontal line for a cluster, the more appropriate is the cluster.
The graph shows that the 2-cluster solution is a better solution, indicated by the thick dotted line.
Dendrogram

* * * * * * H I E R A R C H I C A L  C L U S T E R  A N A L Y S I S * * * * * *

Dendrogram using Single Linkage (Rescaled Distance Cluster Combine, 0 to 25)

[The text dendrogram lists the 44 cases down the left (Case 7, Case 43, Case 41, Case 42, Case 39, …, Case 3) and joins them with horizontal lines at the rescaled distance of each merge.]
The above solution is not decisive, as the differences are very close. Hence, we shall try a different method, i.e. furthest neighbourhood.
The entire process is repeated, and this time the method selected (as shown in CA Snapshot 4) is the furthest neighbourhood.
The output is as follows:
Proximities
Case Processing Summary (a)
Cases
Valid Missing Total
N Per cent N Per cent N Per cent
44 97.8% 1 2.2% 45 100.0%
a. Squared Euclidean Distance used
Cluster
Complete Linkage
Agglomeration Schedule
2 7 43 .829 0 0 16
3 2 10 2.190 0 0 25
4 40 41 2.361 0 0 10
5 8 44 2.636 0 0 16
6 30 32 3.749 0 0 15
7 1 14 3.808 0 0 11
8 26 29 3.891 0 0 19
9 34 35 4.326 0 0 19
10 38 40 4.386 0 4 31
11 1 16 5.796 7 0 22
12 5 18 7.298 0 0 18
13 20 21 7.711 0 0 26
14 4 12 7.735 0 0 33
15 11 30 7.770 0 6 28
16 7 8 7.784 2 5 31
17 9 15 7.968 0 0 18
18 5 9 8.697 12 17 23
19 26 34 8.830 8 9 24
20 23 28 11.289 0 0 37
21 13 31 11.457 0 0 25
22 1 22 11.552 11 0 29
23 5 33 12.218 18 0 33
24 24 26 13.013 0 19 32
25 2 13 13.898 3 21 30
26 17 20 14.285 0 13 36
27 36 37 14.858 0 0 35
28 11 19 14.963 15 0 34
29 1 27 16.980 22 0 37
30 2 39 18.176 25 0 34
31 7 38 18.897 16 10 38
32 24 25 19.342 24 0 36
33 4 5 21.241 14 23 40
34 2 11 25.851 30 28 39
35 6 36 26.366 1 27 41
36 17 24 26.691 26 32 40
37 1 23 29.225 29 20 39
38 3 7 30.498 0 31 41
39 1 2 36.323 37 34 42
40 4 17 37.523 33 36 42
41 3 6 55.294 38 35 43
42 1 4 63.846 39 40 43
43 1 3 87.611 42 41 0
Dendrogram

[The complete-linkage dendrogram labels the four groups, from top to bottom, as Cluster 4, Cluster 2, Cluster 1 and Cluster 3.]
The above dendrogram clearly shows that the longest horizontal lines occur for the 4-cluster solution, shown by the thick dotted line (the dotted line intersects four horizontal lines). It indicates the cluster containing cases 7, 43, 37 and 38, named cluster 4; the cluster containing cases 41, 42, 39, 8, 44, 9, 45 and 3, named cluster 2; and so on.
We shall run the cluster analysis again with the same method, and this time we shall save the cluster membership for a single solution of 4 clusters, as indicated in CA Snapshot 6.
The output will be the same as discussed, except that a new variable named 'CLU4_1' is added to the SPSS file. This variable takes values from 1 to 4; each value indicates the cluster membership.
We shall conduct an ANOVA test on the data, where the dependent variables are all the variables that were included while performing the cluster analysis, and the factor is the cluster membership indicated by the variable CLU4_1. This ANOVA will indicate whether the clusters really differ on the basis of the list of variables, i.e. which variables significantly distinguish the clusters and which do not.
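Outside SPSS, the same check can be sketched per variable with a one-way ANOVA. The membership vector and the age values below are hypothetical stand-ins shaped like the four clusters found above (sizes 16, 8, 16 and 4), not the actual survey file.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(2)
# Hypothetical stand-in for the saved membership CLU4_1 (values 1-4)
membership = np.repeat([1, 2, 3, 4], [16, 8, 16, 4])
# Hypothetical ages drawn around the four cluster means reported below
age = np.concatenate([rng.normal(21, 2, 16), rng.normal(46, 4, 8),
                      rng.normal(34, 3, 16), rng.normal(26, 2, 4)])

# Split the variable by cluster membership and test equality of means
groups = [age[membership == k] for k in (1, 2, 3, 4)]
stat, p = f_oneway(*groups)
# A small p-value means the variable differs significantly across the clusters
print(round(stat, 2), p < 0.05)
```

Repeating this for each clustering variable reproduces the logic of the SPSS one-way ANOVA described next.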
The ANOVA procedure is as follows:
Select ‘Analyze’ – Compare Means – One way ANOVA from the menu as shown below.
CA Snapshot 7
This gives the list of post hoc tests for ANOVA. The most common are LSD and HSD (discussed in Chapter 12); we shall select LSD and click on Continue.
SPSS will take you back to CA Snapshot 8. Click on Options, and the following window will be opened.
CA Snapshot 10
SPSS will take you back to the window shown in CA Snapshot 8; at this stage, click on OK.
The following output will be displayed:
Oneway
Descriptives
3 16 33.56 9.063 2.266 28.73 38.39 21 50
4 4 26.00 4.761 2.380 18.42 33.58 23 33
Total 44 30.43 11.571 1.744 26.91 33.95 18 55
Rate_CM 1 16 3.69 0.873 0.218 3.22 4.15 2 5
2 8 1.50 0.535 0.189 1.05 1.95 1 2
3 16 1.88 0.719 0.180 1.49 2.26 1 3
4 4 2.50 0.577 0.289 1.58 3.42 2 3
Total 44 2.52 1.171 0.177 2.17 2.88 1 5
Rate_SM 1 16 3.94 0.680 0.170 3.58 4.30 3 5
2 8 2.13 0.354 0.125 1.83 2.42 2 3
3 16 2.31 0.602 0.151 1.99 2.63 1 3
4 4 3.50 0.577 0.289 2.58 4.42 3 4
Total 44 2.98 1.000 0.151 2.67 3.28 1 5
Rate_FD 1 16 2.81 0.911 0.228 2.33 3.30 2 5
2 8 4.13 0.354 0.125 3.83 4.42 4 5
3 16 4.44 0.727 0.182 4.05 4.83 3 5
4 4 3.00 1.414 0.707 0.75 5.25 2 5
Total 44 3.66 1.098 0.166 3.33 3.99 2 5
Rate_MF 1 16 4.00 1.155 0.289 3.38 4.62 2 5
2 8 2.75 0.463 0.164 2.36 3.14 2 3
3 16 2.94 0.998 0.249 2.41 3.47 1 5
4 4 4.50 1.000 0.500 2.91 6.09 3 5
Total 44 3.43 1.149 0.173 3.08 3.78 1 5
Invest_CM 1 16 5937.50 9681.382 2420.346 778.66 11096.34 0 30000
2 8 36250.00 8762.746 3098.098 28924.16 43575.84 25000 50000
3 16 4593.75 7289.762 1822.440 709.31 8478.19 0 23000
4 4 57500.00 11902.381 5951.190 38560.66 76439.34 50000 75000
Total 44 15647.73 19901.671 3000.290 9597.07 21698.39 0 75000
Invest_SM 1 16 20156.25 18716.943 4679.236 10182.70 30129.80 3000 60000
2 8 111875.00 60999.854 21566.705 60877.85 162872.15 50000 185000
3 16 18656.25 24993.145 6248.286 5338.34 31974.16 1500 100000
4 4 115000.00 41231.056 20615.528 49392.19 180607.81 70000 150000
Total 44 44909.09 53293.535 8034.303 28706.38 61111.81 1500 185000
Invest_FD 1 16 6718.75 7061.560 1765.390 2955.91 10481.59 0 25000
2 8 95625.00 43951.069 15539.049 58880.99 132369.01 70000 200000
3 16 20500.00 18071.156 4517.789 10870.56 30129.44 2500 60000
4 4 63750.00 7500.000 3750.000 51815.83 75684.17 55000 70000
Total 44 33079.55 39780.056 5997.069 20985.30 45173.79 0 200000
Invest_MF 1 16 18593.75 21455.550 5363.887 7160.89 30026.61 2500 75000
2 8 117500.00 54837.422 19387.956 71654.77 163345.23 50000 180000
3 16 17781.25 15921.651 3980.413 9297.20 26265.30 0 50000
4 4 124250.00 72126.625 36063.312 9480.44 239019.56 22000 175000
Total 44 45886.36 56550.540 8525.315 28693.43 63079.30 0 180000
how_much_time_ 1 16 2.19 1.047 0.262 1.63 2.75 1 4
block_your_money 2 8 6.13 1.553 0.549 4.83 7.42 4 8
3 16 4.31 1.887 0.472 3.31 5.32 1 7
4 4 4.25 0.500 0.250 3.45 5.05 4 5
Total 44 3.86 2.030 0.306 3.25 4.48 1 8
risky_CM 1 16 5.31 1.887 0.472 4.31 6.32 2 9
2 8 6.63 0.916 0.324 5.86 7.39 6 8
3 16 6.00 1.966 0.492 4.95 7.05 3 9
4 4 3.00 0.816 0.408 1.70 4.30 2 4
Total 44 5.59 1.921 0.290 5.01 6.17 2 9
risky_SM 1 16 5.38 2.527 0.632 4.03 6.72 1 9
2 8 7.25 0.886 0.313 6.51 7.99 6 8
3 16 7.19 1.559 0.390 6.36 8.02 3 9
4 4 4.75 2.062 1.031 1.47 8.03 3 7
Total 44 6.32 2.122 0.320 5.67 6.96 1 9
risky_FD 1 16 1.94 0.772 0.193 1.53 2.35 1 3
2 8 1.13 0.354 0.125 0.83 1.42 1 2
3 16 1.50 0.516 0.129 1.22 1.78 1 2
4 4 1.00 0.000 0.000 1.00 1.00 1 1
Total 44 1.55 0.663 0.100 1.34 1.75 1 3
risky_MF 1 16 5.13 2.187 0.547 3.96 6.29 2 10
2 8 6.50 1.195 0.423 5.50 7.50 4 8
3 16 6.56 2.159 0.540 5.41 7.71 4 10
4 4 4.75 2.363 1.181 0.99 8.51 3 8
Total 44 5.86 2.120 0.320 5.22 6.51 2 10
The above table gives descriptive statistics for the dependent variables for each cluster. A short summary of the above table is displayed below:
Descriptives (Mean)

Cluster  N   Age       Rate CM    Rate SM    Rate FD    Rate MF    Spend CM   Spend SM   Spend FD   risky_mutual_funds_one_to_ten
1        16  20.5625   3.6875     3.9375     2.8125     4          5937.5     20156.25   6718.75    5.125
2        8   46.125    1.5        2.125      4.125      2.75       36250      111875     95625      6.5
3        16  33.5625   1.875      2.3125     4.4375     2.9375     4593.75    18656.25   20500      6.5625
4        4   26        2.5        3.5        3          4.5        57500      115000     63750      4.75
Total    44  30.43182  2.522727   2.977273   3.659091   3.431818   15647.73   44909.09   33079.55   5.863636
It may be noted that these four clusters have average ages of 20.56, 46.13, 33.56 and 26, which clearly form four different age groups. The other descriptives are summarised as follows:
Cluster 1: Average age 20.56 (Young non-working)
Sr No: 31, 33, 12, 20, 2, 11, 14, 32, 40, 24, 29, 1, 15, 17, 23, 28
Prefers investing in the commodity market, share market and mutual funds; does not prefer to invest in fixed deposits; invests less money, blocks money for a shorter period, and finds share market and commodity market investments the least risky (relative to the other respondents).

Cluster 2: Average age 46.13 (Oldest)
Sr No: 41, 42, 39, 8, 44, 9, 45, 3
Least prefers investing in the commodity market, share market and mutual funds; prefers to invest in fixed deposits; invests more money, blocks money for a longer period, and finds share market and commodity market investments the most risky (relative to the other respondents).

Cluster 3: Average age 33.56 (Middle age)
Sr No: 4, 13, 6, 19, 10, 16, 34, 21, 22, 18, 27, 30, 35, 36, 25, 26
Has a lower preference for investing in the commodity market, share market and mutual funds; prefers to invest in fixed deposits; invests less money, blocks money for a shorter period (but not shorter than people of cluster 1), and finds share market and commodity market investments riskier.

Cluster 4: Average age 26 (Young working)
Sr No: 7, 43, 37, 38
Has a lower preference for the commodity market and share market but prefers to invest in mutual funds and fixed deposits; invests more money, blocks money for a moderate period, and finds share market and commodity market investments riskier.
It may be noted that these clusters are named in the dendrogram on the basis of the above criteria.
Test of Homogeneity of Variances
Rate_MF 1.136 3 40 0.346
Invest_CM 0.369 3 40 0.775
Invest_SM 17.591 3 40 0.000
Invest_FD 4.630 3 40 0.007
Invest_MF 15.069 3 40 0.000
how_much_time_ 3.390 3 40 0.027
block_your_money
risky_CM 1.995 3 40 0.130
risky_SM 4.282 3 40 0.010
risky_FD 5.294 3 40 0.004
risky_MF 2.118 3 40 0.113
This table gives Levene's homogeneity test, which is a must for ANOVA, as ANOVA assumes that the different groups have equal variances. If the significance is less than 5% (LOS), the null hypothesis that the variances are equal is rejected, i.e. the assumption does not hold; in such a case, ANOVA cannot be used. In the above table, rejection of the assumption is indicated by circles, which means ANOVA could be invalid for those variables.
It may be noted that when ANOVA is invalid, the test that can be performed is the non-parametric Kruskal–Wallis test discussed in Chapter 13.
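The decision rule just described — test homogeneity of variances first, and fall back to Kruskal–Wallis when it fails — can be sketched as follows; the three groups are hypothetical, constructed with deliberately unequal variances.

```python
import numpy as np
from scipy.stats import levene, kruskal, f_oneway

rng = np.random.default_rng(3)
# Hypothetical cluster stand-ins; the third group has a much larger variance
g1 = rng.normal(10, 1.0, 40)
g2 = rng.normal(12, 1.0, 40)
g3 = rng.normal(14, 8.0, 40)

_, p_levene = levene(g1, g2, g3)
if p_levene < 0.05:
    # Equal-variance assumption rejected: use the non-parametric test instead
    stat, p = kruskal(g1, g2, g3)
    test_used = "Kruskal-Wallis"
else:
    # Assumption holds: the ordinary one-way ANOVA is valid
    stat, p = f_oneway(g1, g2, g3)
    test_used = "one-way ANOVA"

print(test_used, p < 0.05)
```

For this data Levene's test rejects equal variances, so the comparison of groups is carried out with the Kruskal–Wallis test.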
Anova
Invest_CM Between Groups 1E+010 3 4621914299 58.403 0.000
Within Groups 3E+009 40 79138671.88
Total 2E+010 43
Invest_SM Between Groups 8E+010 3 2.545E+010 22.243 0.000
Within Groups 5E+010 40 1144289844
Total 1E+011 43
Invest_FD Between Groups 5E+010 3 1.624E+010 33.585 0.000
Within Groups 2E+010 40 483427734.4
Total 7E+010 43
Invest_MF Between Groups 9E+010 3 3.005E+010 25.377 0.000
Within Groups 5E+010 40 1184108594
Total 1E+011 43
h o w _ m u c h _ t i m e _ Between Groups 89.682 3 29.894 13.666 0.000
block_your_money
Within Groups 87.500 40 2.188
Total 177.182 43
risky_CM Between Groups 39.324 3 13.108 4.394 0.009
Within Groups 119.313 40 2.983
Total 158.636 43
risky_SM Between Groups 43.108 3 14.369 3.821 0.017
Within Groups 150.438 40 3.761
Total 193.545 43
risky_FD Between Groups 5.097 3 1.699 4.920 0.005
Within Groups 13.813 40 0.345
Total 18.909 43
risky_MF Between Groups 24.744 3 8.248 1.959 0.136 — not rejected, as p > 0.05
Within Groups 168.438 40 4.211
Total 193.182 43
The above ANOVA table tests the difference between means for the different clusters. The null hypothesis states that there is no difference between the clusters for a given variable; if the significance is less than 5% (p-value less than 0.05), the null hypothesis is rejected.
It may be noted that in the above table, the null hypothesis that the variable is equal for all clusters is rejected for all variables except Risky_MF. This means all other variables vary significantly across the clusters. It also indicates that the four-cluster solution is a good solution.
K-means Cluster
This method is used when one knows in advance, how many clusters to be formed. The procedure
for k-means cluster is as follows:
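The K-means idea itself (choose k well-spaced initial centres, assign each case to its nearest centre, recompute centres, repeat) can be sketched in a few lines. The data below are hypothetical, and the farthest-point initialisation is only one simple way of obtaining well-spaced starting centres; SPSS has its own selection rule.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical standardised data: two well-separated groups of 20 cases each
X = np.vstack([rng.normal(-2, 0.4, size=(20, 2)),
               rng.normal(2, 0.4, size=(20, 2))])

def init_centres(X, k):
    # Well-spaced starting centres: begin with the first case, then repeatedly
    # take the case farthest from every centre chosen so far
    centres = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[d.argmax()])
    return np.array(centres)

def kmeans(X, k, n_iter=50):
    centres = init_centres(X, k)
    for _ in range(n_iter):
        # Assign each case to the nearest centre (squared Euclidean distance)
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centre as the mean of its assigned cases
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centres):  # centres stopped moving: converged
            break
        centres = new
    return labels, centres

labels, centres = kmeans(X, k=2)
print(sorted(np.bincount(labels).tolist()))  # [20, 20]
```

The loop stopping when the centres no longer move mirrors the iteration history SPSS reports, which ends once there is no change in the cluster centres.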
CA Snapshot 11
SPSS will take you back to the window shown in CA Snapshot 12. Click on Options, and the following window will appear:
CA Snapshot 14
Cluster
1 2 3 4
Age 55 22 54 45
Rate_CM 1 4 1 1
Rate_SM 2 4 2 2
Rate_FD 4 4 4 4
Rate_MF 2 3 3 3
Invest_CM 45000 5000 50000 25000
Invest_SM 60000 3000 60000 185000
Invest_FD 75000 1000 200000 100000
Invest_MF 75000 2500 50000 155000
how_much_time_block_your_money 8 4 8 6
risky_CM 8 3 8 6
risky_SM 8 6 8 7
risky_FD 1 2 1 2
risky_MF 7 3 4 7
This table gives the initial cluster centres. The initial cluster centres are the variable values of k well-spaced observations.
Iteration History (a)
The iteration history shows the progress of the clustering process at each step. This table has only three steps, as the process stopped because there was no change in the cluster centres.
Final Cluster Centres
Cluster
1 2 3 4
Age 33 27 54 34
Rate_CM 2 3 1 2
Rate_SM 3 3 2 3
Rate_FD 4 4 4 4
Rate_MF 4 3 3 4
Invest_CM 31000 2981 50000 36667
Invest_SM 58182 11769 60000 161667
Invest_FD 41636 11827 200000 81667
Invest_MF 60636 10846 50000 170000
how_much_time_block_your_money 4 3 8 5
risky_CM 5 6 8 5
risky_SM 7 6 8 5
risky_FD 1 2 1 1
risky_MF 6 6 4 5
ANOVA

                 Cluster              Error
                 Mean Square    df    Mean Square    df    F      Sig.
Age 318.596 3 120.025 40 2.654 0.062
Rate_CM 2.252 3 1.306 40 1.725 0.177
Rate_SM 0.599 3 1.029 40 0.582 0.630
Rate_FD 0.377 3 1.269 40 0.297 0.827
Rate_MF 0.722 3 1.366 40 0.528 0.665
Invest_CM 3531738685 3 160901842.9 40 21.950 0.000
Invest_SM 3.750E+010 3 240364627.0 40 156.032 0.000
Invest_FD 1.819E+010 3 336746248.5 40 54.022 0.000
Invest_MF 4.225E+010 3 268848251.7 40 157.162 0.000
how_much_time_block_your_money 10.876 3 3.614 40 3.010 0.041
risky_CM 3.654 3 3.692 40 0.990 0.407
risky_SM 3.261 3 4.594 40 0.710 0.552
risky_FD 0.603 3 0.427 40 1.411 0.254
risky_MF 1.949 3 4.683 40 0.416 0.742
The F tests should be used only for descriptive purposes because the clusters have been chosen
to maximize the differences among cases in different clusters. The observed significance levels are
not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means
are equal.
The ANOVA indicates that the clusters differ only on the investment variables — invest in CM, invest in SM, invest in FD and invest in MF — as well as on block money, since the significance is less than 0.05 only for these variables.
Number of Cases in each Cluster
Cluster 1 11
2 26
3 1
4 6
Valid 44
Missing 1
The above table gives the number of cases in each cluster.
It may be noted that this solution is different from the hierarchical solution, and the hierarchical clustering is more valid for this data as it considers standardised scores, whereas this method does not standardise the variables.
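For illustration, z-scores (standardised scores) can be computed as below. This is a sketch, not the SPSS run; the four figures are simply the Invest_CM cluster centres from the table above, used as example values.

```python
# Sketch: z-score standardisation (value minus mean, divided by the sample
# standard deviation). Example values: the four Invest_CM cluster centres above.
def standardise(values):
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

z = standardise([31000, 2981, 50000, 36667])
print([round(v, 2) for v in z])  # [0.04, -1.37, 1.0, 0.33]
```

Without such standardisation, variables measured in rupees dominate the Euclidean distances over variables measured on rating scales.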
In conjoint analysis, each level of an attribute is assigned a utility value. Utility can be defined as a number which represents the value that consumers place on specific attributes. A low utility indicates less value; a high utility indicates more value. In other
words, it represents the relative ‘worth’ of the attribute. This helps in designing products/services that
are most appealing to a specific market. In addition, because conjoint analysis identifies important
attributes, it can be used to create advertising messages that are most appealing.
The process of data collection involves showing respondents a series of cards that contain a writ-
ten description of the product or service. If a consumer product is being tested, then a picture of the
product can be included along with a written description. Several cards are prepared describing the
combination of various alternative sets of features of a product or service. A consumer's response is collected as his/her selection of a number between 1 and 10, where '1' indicates the strongest dislike and '10' the strongest liking for the combination of features on the card. Such data becomes the input for the final analysis, which is carried out through computer software.
The concepts and methodology are elaborated in the case study given below.
Sr. No   Transaction Time (min)   Card Fees (Rs)    Interest Rate       Rating*
Levels:  1, 1.5, 2                0, 1000, 2000     1.5%, 2.0%, 2.5%    27 to 1
1 1 0 1.5 27
2 1.5 0 1.5 26
3 1 1000 1.5 25
4 1.5 1000 1.5 24
5 2 0 1.5 23
6 2 1000 1.5 22
7 1 0 2 21
8 1.5 0 2 20
9 1 2000 1.5 19
10 1.5 2000 1.5 18
11 1 1000 2 17
12 1.5 1000 2 16
13 1 2000 2 15
14 2 2000 1.5 14
15 1.5 2000 2 13
16 2 0 2 12
17 2 1000 2 11
18 1 0 2 10
19 1.5 0 2.5 9
20 1 1000 2.5 8
21 2 1000 2.5 7
22 2 2000 2 6
23 2 0 2.5 5
24 2 1000 2.5 4
25 1 2000 2.5 3
26 1.5 2000 2.5 2
27 2 2000 2.5 1
* Rating 27 indicates the most preferred and rating 1 the least preferred option by the customer.
Conduct appropriate analysis to find the utility for these three factors.
The data is available in credit card.sav file, given in the CD.
Thus, six variables, i.e. X1 to X6, are used to represent the 3 levels of transaction time (1, 1.5, 2), the 3 levels of fees (0, 1000, 2000) and the 3 levels of interest rate (1.5, 2, 2.5). All six variables are independent variables in the regression run. Another variable, Y, which is the rating of each combination given by the respondent, forms the dependent variable of the regression model.
Thus, we generate the regression equation as: Y = a + b1X1 + b2X2 + b3X3 + b4X4 + b5X5 + b6X6
Input data for the regression model:
Sr No Transaction time Fees Interest Rate Y X1 X2 X3 X4 X5 X6
1 1 0 1.5 27 1 0 1 0 1 0
2 1.5 0 1.5 26 0 1 1 0 1 0
3 1 1000 1.5 25 1 0 0 1 1 0
4 1.5 1000 1.5 24 0 1 0 1 1 0
5 2 0 1.5 23 –1 –1 1 0 1 0
6 2 1000 1.5 22 –1 –1 0 1 1 0
7 1 0 2 21 1 0 1 0 0 1
8 1.5 0 2 20 0 1 1 0 0 1
9 1 2000 1.5 19 1 0 –1 –1 1 0
10 1.5 2000 1.5 18 0 1 –1 –1 1 0
11 1 1000 2 17 1 0 0 1 0 1
12 1.5 1000 2 16 0 1 0 1 0 1
13 1 2000 2 15 1 0 –1 –1 0 1
14 2 2000 1.5 14 –1 –1 –1 –1 1 0
15 1.5 2000 2 13 0 1 –1 –1 0 1
16 2 0 2 12 –1 –1 1 0 0 1
17 2 1000 2 11 –1 –1 0 1 0 1
18 1 0 2 10 1 0 1 0 0 1
19 1.5 0 2.5 9 0 1 1 0 –1 –1
20 1 1000 2.5 8 1 0 0 1 –1 –1
21 2 1000 2.5 7 –1 –1 0 1 –1 –1
22 2 2000 2 6 –1 –1 –1 –1 0 1
23 2 0 2.5 5 –1 –1 1 0 –1 –1
24 2 1000 2.5 4 –1 –1 0 1 –1 –1
25 1 2000 2.5 3 1 0 –1 –1 –1 –1
26 1.5 2000 2.5 2 0 1 –1 –1 –1 –1
27 2 2000 2.5 1 –1 –1 –1 –1 –1 –1
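The X1..X6 columns in the table can be generated with a small sketch. This is an illustrative Python version of the effects-coding rule used above (first level gets columns 1, 0; second level gets 0, 1; third level gets -1 in both), not the book's SPSS run:

```python
# Sketch: effects coding of a three-level attribute into two columns,
# mirroring the X1..X6 columns in the table above.
def effects_code(levels, value):
    """Return the two effects-coded columns for a three-level attribute."""
    i = levels.index(value)
    if i == 0:
        return [1, 0]
    if i == 1:
        return [0, 1]
    return [-1, -1]          # the third level carries -1 in both columns

# Row 5 of the table: 2 min, Rs 0 fees, 1.5% interest
row = (effects_code([1, 1.5, 2], 2)
       + effects_code([0, 1000, 2000], 0)
       + effects_code([1.5, 2.0, 2.5], 1.5))
print(row)  # [-1, -1, 1, 0, 1, 0]
```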
This table indicates that the R square for the above model is 0.963, which is close to one. This indicates that 96.3% of the variation in the rating is explained by the six independent variables (X1 to X6). We conclude that the regression model is a good fit and explains the variation in the dependent variable quite well.
The SPSS output includes the ANOVA table and the Coefficients table, the latter reporting unstandardized coefficients (B) with standard errors, standardized coefficients (Beta), t and Sig. The B coefficients, circled in the output, give the utility values for each variable.
The Regression equation is as follows:
Y = 13.857 + 1.377X1 + 1.326X2 + 2.265X3 + 1.480X4 + 8.143X5 – 0.121X6
Combination Utilities
The total utility of any combination can be calculated by picking up the attribute levels of our
choice.
For example,
The combined utility of the combination of 1.5 min + 1000 Fees + 2% Interest
= 1.326 + 1.480 – 0.121
= 2.685
To know the BEST COMBINATION, it is advisable to pick the highest utilities from each attribute
and then add them.
Individual Attributes
The difference in utility with the change of one level in one attribute can also be checked.
1. Transaction Time
For the time 1 min to 1.5 min, there is a decrease in utility value of 0.051 units.
But the next level, that is, 1.5 min to 2 min, has a decrease in utility of 4.029 units.
2. Annual Fees
Increasing the fees from 0 to Rs 1000 induces a utility drop of 0.785 units.
Whereas from Rs 1000 to Rs 2000, there is a decrease in utility of 5.225 units.
3. Interest Rates
Interest rate increase from 1.5% to 2.0% induces a drop of 8.264 units in utility.
Interest rate increase from 2.0% to 2.5% induces a drop of 7.901 units in utility.
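The utility arithmetic above can be checked with a short sketch. It assumes the coefficients reported earlier (taking the Rs 1000 fee coefficient as +1.480, consistent with the worked combination example) and the effects-coding property that the three level utilities of an attribute sum to zero:

```python
# Sketch: level utilities implied by the regression coefficients; within each
# attribute the three level-utilities sum to zero, so the omitted third
# level's utility is -(b_level1 + b_level2).
def level_utilities(b1, b2):
    return [b1, b2, -(b1 + b2)]

time_u = level_utilities(1.377, 1.326)    # 1 min, 1.5 min, 2 min
fees_u = level_utilities(2.265, 1.480)    # Rs 0, Rs 1000, Rs 2000
rate_u = level_utilities(8.143, -0.121)   # 1.5%, 2.0%, 2.5%

# Combined utility of 1.5 min + Rs 1000 fees + 2.0% interest
print(round(time_u[1] + fees_u[1] + rate_u[1], 3))          # 2.685

# Best combination: the highest-utility level of each attribute
print(round(max(time_u) + max(fees_u) + max(rate_u), 3))    # 11.785
```

The best combination is thus 1 min transaction time, Rs 0 fees and 1.5% interest, as one would expect.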
Thus, MDS reveals relationships that appear to be obscure when one examines only the numbers
resulting from a study.
It attempts to find the structure in a set of distance measures between objects. This is done by
assigning observations to specific locations in a conceptual space (2 to 3 dimensions) such that the
distances between points in the space match the given dissimilarities as closely as possible.
If objects A and B are judged by the respondents as being most similar compared to all other
possible pairs of objects, the multidimensional scaling technique positions these objects in the space in such a
manner that the distance between them is smaller than that between any other two objects.
Suppose, data is collected for perceiving the differences or distances among three objects say A,
B and C, and the following distance matrix emerges:
A B C
A 0 4 6
B 4 0 3
C 6 3 0
However, if the data comprises only ordinal or rank data, then the same distance matrix could be
written as:
A B C
A 0 2 3
B 2 0 1
C 3 1 0
If the actual magnitudes of the original similarities (distances) are used to obtain a geometric
representation, the process is called Metric Multidimensional Scaling.
14.132 Business Research Methodology
When only this ordinal information in terms of ranks is used to obtain a geometric representation,
the process is called Non-metric Multidimensional Scaling.
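The conversion from a distance matrix to the ranks used by non-metric MDS can be sketched as follows (rank 1 for the most similar pair, i.e. the smallest distance):

```python
# Sketch: converting pairwise distances to ranks for non-metric MDS,
# using the A, B, C distances from the matrix above.
dist = {("A", "B"): 4, ("A", "C"): 6, ("B", "C"): 3}
ranked = {pair: r + 1
          for r, (pair, _) in enumerate(sorted(dist.items(), key=lambda kv: kv[1]))}
print(ranked)  # {('B', 'C'): 1, ('A', 'B'): 2, ('A', 'C'): 3}
```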
It is observed that two zonal managers viz. B and E exhibit high concern for both the organisa-
tion as well as staff. If these criteria are critical to the organisation, then these two zonal managers
could be the right candidates for higher positions in the Head Office.
A similar study could be conducted for a group of companies to obtain an assessment of the perception of investors about the attitude of companies towards the interests of their shareholders.
Example 14.5
A small team of students of a management institute conducted a study to decide upon the position-
ing of the new brand vis-à-vis existing brands of 125 cc motorcycles. They collected data through
a questionnaire from 40 users of such motorcycles. They followed the multidimensional scaling
(MDS) approach for positioning of a new brand. Through the use of SPSS software, they derived
MDS – stimulus configuration as depicted by perceptual mapping in the following two-dimensional
graph:
SUMMARY
A number of statistical techniques are especially useful in designing products and services. These techniques basically involve reduction of data and its subsequent summarisation, presentation and interpretation.
These techniques (with their abbreviations in brackets) coupled with the appropriate computer
software like SPSS, play a very useful role in the endeavour of reduction and summarisation of data
for easy comprehension.
A brief idea about these techniques is as follows:
Multiple Regression Analysis (MRA): It deals with the study of the relationship between one metric dependent variable and two or more metric independent variables.
Principal Component Analysis (PCA): A technique for forming a set of new variables that are linear combinations of the original set of variables and are uncorrelated. The new variables are called Principal Components.
Canonical Correlation Analysis (CCA): An extension of multiple regression analysis (MRA involving one dependent variable and several metric independent variables). It is used for situations wherein there are several dependent variables and several independent variables.
Discriminant Analysis: It is a statistical technique for classification or determining a linear function,
called discriminant function, of the variables which helps in discriminating between two groups of
entities or individuals.
Multivariate Analysis of Variance (MANOVA): It explores, simultaneously, the relationship between several non-metric independent variables and two or more metric dependent variables.
Factor Analysis (FA): It is a statistical approach that is used to analyse inter-relationships among a large number of variables (indicators) and to explain these variables (indicators) in terms of a few unobservable constructs (factors). In fact, these factors impact the variables, and the variables are reflective indicators of the factors.
Cluster Analysis: It is an analytical technique that is used to develop meaningful subgroups of
entities which are homogeneous or compact with respect to certain characteristics.
Conjoint Analysis: It involves determining the contribution of variables (each at several levels) to the choice preference over combinations of variables that represent realistic choice sets (products, concepts, services, companies, etc.).
Logistic Regression: In logistic regression, the dependent variable is the probability that an event
will occur, hence it is constrained between 0 and 1. All of the predictors can be binary, a mixture
of categorical and continuous or just continuous.
Canonical Correlation: It relates a set of dependent variables with a set of independent variables. It involves developing linear combinations of the two sets of variables (both dependent and independent) and studies the relationship between the two sets.
Multidimensional Scaling: It is a set of procedures for drawing pictures of data so as to visualise and clarify the relationships described by the data. It transforms consumer judgments/perceptions of similarity or preference into distances, usually represented in a two-dimensional space.
DISCUSSION QUESTIONS
1. Distinguish between the dependence and interdependence techniques, with suitable examples.
2. Write short notes on the following bringing out their relevance and applications:
15
Report Writing
Contents
1. Introduction and Relevance
2. Format of a Report
3. A Classification of the Sections of the Report
4. Power Point Presentations
5. Power of Revision
LEARNING OBJECTIVES
The objective of this chapter is to provide comprehensive guidelines for preparing and presenting a
research study report.
The format of the report should be such as to hold sustained interest in reading the entire report.
The scope and limitations of the approaches followed and of the findings should be outlined. If the study relates to estimation of parameters (like sales or profit), estimation of an association or relationship (like that between advertising and sales), or testing of an assumption (like the lives of two brands of car batteries being the same), then an idea about the faith one can place in the findings has to be provided.
One may use creativity to come up with a unique design indicating the basic theme of the research. The idea is that the reader should start with a pleasant feeling. This assumes greater significance if the readers do not know the researcher(s).
A few samples are given at the end of the chapter.
For MBA Students:
The Institute’s logo may be given at the top
Title and Researcher’s or Team Members’ names
‘MBA Batch Year (e.g. 2010)’ may be mentioned
Title Page
The title page covers
(i) Title of the Project (to arouse interest)—should be exactly the same as on the cover sheet.
(ii) Broad Description of the project, background or logic for selection—whether by the researcher
or assigned by the faculty
Letter of Transmittal
If the project is given to the researcher by an internal authority or external agency through an of-
ficial letter, called Letter of Authorisation, then along with the title indicating its relevance to the
assignment, the researcher has to submit what is called ‘Letter of Transmittal’, forwarding the
report to the concerned authority giving reference of the letter received from them. These types of
letters are given on the next two pages after the title page.
Index of Contents with Pages
It includes the headings/titles of sections and subsections, like 2, 2.1, 2.1.1, and their corresponding page numbers.
Summary (Executive Summary, if the report is submitted to higher authorities)
Elaborate a bit on the title of the project, bringing out the background and objective of the study, and the extent of its fulfilment through the conclusions drawn from the study. If the study requires the researcher to make recommendations, the recommendations are to be highlighted. The scope of the conclusions/recommendations and their limitations are to be mentioned. If the limitations are of a serious nature, the summary may fail to invoke interest in the full report.
As mentioned in Chapter 8, a picture is worth a thousand words. The same is true of charts, graphs and sometimes even tables. Therefore, the summary may contain a couple of these, if relevant.
Executive Summary is prepared when the report is submitted to the top management. It has to
provide them a sense of receiving important inputs that could be useful for their decision-making.
But due care has to be taken so that the summary is unbiased and without any exaggeration.
Acknowledgement for Guidance
The names of the persons who suggested the topic or/and provided guidance are to be mentioned
here.
(ii) Cohesiveness among various sections. As far as possible, each one following from the previous
and leading to the next section
(iii) All the statements, not flowing from analysis, should be validated by giving references
Preamble/Preface/Introduction
Normally, the details of the conduct of a study start with ‘Introduction’ which means the ‘first
section of a communication’. However, it is preferable to title the first section as Preamble as it
means ‘stating the reasons and intent of what follows’. This section is to contain reasons that led
to the conduct of the study. Thus, one has to give reference to the available literature containing
earlier work in this area, bring out the need for extending that work or even suggest a new approach
to deal with the current situation. Incidentally, it could also be termed as ‘Relevance’ of the study
conducted.
The section contains the literature review, earlier developments, etc. While describing these, care has to be taken that only the literature and developments that have a direct bearing on the topic of the study are included. The focus should not be lost through the researcher's enthusiasm to impress with the vast literature surveyed or the efforts made by him/her, by discussing broader issues.
Appropriate Methodology or Research Design for the conduct of the study, justifying its use in the context of the research topic
How it is considered superior to the methodology used in the earlier researches, if any
Sampling Design—with justification
Collection of data—Primary and Secondary: Questionnaire, Telephone/Mail/On-line Survey/In-
terviews/Focus Group
Presentation of data through Tables, Charts, Graphs, etc., and their interpretation
Qualitative/Quantitative analysis of the data
Conclusion (and recommendations if applicable)
The interpretations and conclusions should flow from the presentation and analysis, without even
an iota of bias on the part of the researcher.
However, the recommendations could be based on the implications of the conclusions for the
organisation which had sponsored the study, and could contain some aspects not covered in the
analysis. Some of these could relate to computerisation, setting up a website, management information
system, etc. The specific recommendations could be followed by suggesting an action plan for the
sponsor. Care is to be taken that the action plan is practical, and should be presented in the order
of importance to the organisation. The action plan could also be divided into short- and long-term
plans.
It may be added that the presentation/conclusions/interpretation/recommendations could encompass the following points:
(i) The objective and scope of the study
(ii) Findings and results with reference to the analysis
(iii) Justifying validity through the findings/results
(iv) Viability or practicability of recommendations
Report Writing 15.5
Epilogue
This is the concluding section relating to the research study. It reiterates the main findings in popu-
lar terms, highlighting the main ‘achievement’/contribution by the researcher. It also indicates the
scope and limitations, and suggests approaches that could be followed for broadening the scope
and/or reducing the limitations, in future. It may also indicate the need for a further study in the
related area. For example, if the study has been conducted for criteria for preference of features
of motorcycles of different brands in one segment, one could suggest that a similar study could be
conducted for motorcycles in the other segment. Further, suppose the study has been conducted in
Mumbai, one could suggest a similar study being conducted in other metro cities. One could also
add that the sample size of customers was restricted to only 100 users in and around a certain area
in Mumbai due to paucity of resources. More reliable results could be obtained by taking a larger
sample from all over Mumbai.
15.2.4 Formats for Various Types of Reports for Different Types of Research
Studies
While the formats of the report comprising the preliminary components, from the cover sheet to the table of contents, and the final components, from the glossary to the appendices, are about the same, the formats of the Preamble/Introduction, Research Design and Findings, Conclusions and Recommendations could vary depending on the type of research/study. Thus, the reports for each of the following types of studies would be different:
• Career progress of MBA graduates (Women versus Men): Basic Research*; Descriptive—Comparative—Fact finding
• Factors leading to less progress for women: Exploratory
• How to improve profitability of credit card business in a bank: This would involve extensive collection and analysis of qualitative as well as quantitative data, setting up and testing of hypotheses, etc.
*If the same study is to be done for a particular company, it would be called 'Applied Research.'
Prefatory
It comprises:
Title Page giving main objective of the research study
Letter of Transmittal (where relevant as explained earlier)
Letter of Authorisation (where relevant as explained earlier)
Table of Contents
Summary/Executive Summary containing main results/conclusions/recommendations
Main Body
It comprises:
Relevance/Introduction
Methodology—Methods used
Results, Conclusions, Recommendations
Scope and Limitations of Data, Methods and Conclusions
Appended Part
It comprises:
Terminologies used in the Report
Presentation of Data in the form of Tables/Charts/Graphs
Calculations used for drawing conclusions
Bibliography and References
In yet another classification, some portions of the report could be termed ‘Cosmetic’ parts wherein
the researcher can use his/her artistic creativity to make the appearance of the report attractive enough
for the reader. However, it is no substitute for the quality of the report, but like cosmetics, it gives
a pleasant feeling to the reader, and motivates the reader to read on.
Therefore, the preparation of such a presentation is similar, in a way, to report writing, and demands as much attention as the report writing itself. Some tips for this purpose are as follows:
It demands more of creativity as compared to report writing.
Apart from font type and size, colour combination is quite important as is use of contrast—light
on dark or dark on light.
Font size should be easily readable even from the last row of the audience.
First page should start with ‘Welcome to the Presentation of …..’
Title of the report may be condensed in 2 to 3 words, and may be on all pages. The pages may
also contain logo of the Institute (for students), Company (for consultants and executives).
A slide should contain only a few points, with lines containing only the most distinguishing words.
Charts and graphs may be preferred over text, for better grasping.
The points may come on the screen, one after the other, next point coming only after the previ-
ous one is discussed.
Verbs and adjectives may be avoided in the slides; these could be supplied verbally by the presenter.
The last slide should express ‘Thanks’ to the audience.
A sample is given in the CD provided with the book.
One of the Ph.D. students went to show his first paper to his Ph.D. guide viz. Late Padmab-
hushan Dr. V. S. Huzurbazar. Dr. Huzurbazar took the paper in his hands, and even without
opening the pages, returned the paper, and asked the student to revise the paper, and meet him
after two weeks. The student was a bit shaken for a moment. However, as directed, he worked
on the paper again, and revised it. When he showed the paper to Dr. Huzurbazar, he took the paper in his hands, turned a few pages at random, and returned the paper saying that it needed further revision. The student was rather disappointed but, being a sincere student, worked
on the paper once again, and presented the paper to his guide after improving the language
and presentation of contents. Dr. Huzurbazar asked the student about the difference in the first
and third version of the paper. The student admitted that the third version was much better
than the first one. Dr. Huzurbazar, then read the entire paper, and complimented the student
for the good work, advised to make a few changes and then sent the paper for publication to
one of the most prestigious journals in USA. The student was thrilled to receive acceptance
of the paper for publication in the journal.
However, the revisions should not come in the way of timely submissions of the report.
(Sample report cover page)
Prepared by:
Mayank Aggarwal #03, Shabnam Charaniya #12, Akshay Kant #26, Suhaib Sayeed #46, Arpit Shah #49
NMIMS University
(Sample report cover page)
PROJECT WORK
Vishesh Agrawal: 4, Aditi Agarwal: 8, Shreyansh Dedhia: 17, Prateek Chamaria: 18, Akansha Mandora: 43
(Sample report cover page)
nichel alexander 05, ashish borar 10, nayan ranjan 27, kaustav maiti 30, ishan somaiya 54
NMIMS University
SUMMARY
Comprehensive guidelines for preparing and presenting a research study report are provided.
The chapter starts by outlining the relevance of a research study report and the important criteria for fulfilling its objectives.
The format of a report is given, with guidelines for each of the components of the report.
In addition to the general features of a report, the specific requirements of the research report for various types of research studies have been provided.
The chapter concludes with guidance for preparing a PowerPoint presentation.
DISCUSSION QUESTIONS
1. Describe the format of a typical research study report with the help of a hypothetical study.
2. Describe various types of research studies and their corresponding relevance for a research
report.
3. Prepare a list of ten points as a guidance for writing a research report.
16
Ethics in Business Research
Contents
1. Introduction: Ethics at Individual Level
2. Ethics—Definitions and Norms
Ethical Norms for Professionals
3. Ethical Issues in Business Research
(a) Sponsoring Research
(b) Consultant/Researcher Level
(c) Individual/Group Sponsored Research
(d) Research Team
4. Ethical Standards in Qualitative and Quantitative Research
(a) Qualitative Research
(i) Ethical Obligations for Researchers
(ii) Ethical Obligations for Respondents
(b) Quantitative Research
5. Research Ethics in an Organisation
6. Ethical Issues at Various Levels of a Research Process
LEARNING OBJECTIVES
The objective of this chapter is to acquaint the readers with the relevant ethical issues associated with business research. A tabular presentation of the ethical issues involved at each step of the research process has been provided to facilitate easy comprehension of the various issues associated with the conduct of research.
“The truly wise man will know what is right, do what is good, and therefore be happy.”
— Socrates (Greek Philosopher)
(Incidentally, Socrates was the first to draw attention to the intrinsic qualities of a person)
“For the right person, virtue denotes doing the right thing, at the right time, to the proper extent, in
the correct fashion.”
— Aristotle (Greek Philosopher)
“If you have integrity, nothing else matters. If you do not have integrity, nothing else matters.”
—Alan K. Simpson (Statesman; USA)
“Commerce is as a heaven, whose sun is trustworthiness and whose moon is truthfulness.”
—Baha’u’llah, Persian founder of the Baha’i religion
The case relates to a company which was being run professionally and had a good reputation. It was getting its properties insured through an insurance agency. As per the prescribed procedure,
every year a note was initiated by the ‘Organisation and Methods’ Department recommending
the name of an insurance agency for insuring all the properties. Based on the note, and the
comments/suggestions of senior officials, the final decision was taken by the Chairman.
One year, the job of initiating the note was entrusted to a newly recruited officer who had
requisite specialisation and experience. He noted that for the last 3 years, the award was be-
ing given to the same insurance company. He collected the relevant data for all the properties
of the company and from all the eligible insurance companies. The analysis revealed that a
substantial premium amount could be saved if the contract was given to another insurance
company. The note was duly endorsed at all the levels, and finally reached the Chairman. The
Chairman thought that he would discuss with a few top level executives the next day before
taking the decision. It so happened, that the same evening, he went to a party with his wife.
At the party, his wife was pleasantly surprised to meet her old friend, who incidentally was
the wife of the Chairman of the insurance company which was being awarded the contract for
the last 3 years. Next day, the Chairman came to the office, discussed the matter with a couple
of top executives, and ruled that there was no need to change the insuring company!
However, in view of the emphasis of this chapter on ethics relating to the conduct of business research, we would like to highlight some of the reported unethical research behaviours by individuals that led to the development of conscientiousness about ethics in research:
Researcher-altered data
Reporting of research findings for research studies that were not conducted
Non-existent co-author
Fabricating scientific data about drug tests on hyperactive children
Recommending, based on tests, a medical device for smooth functioning of the heart, while the scientist had a fortune invested in the company making that device
Not reporting risks to heart patients about use of a medicine
Ethics in Business Research 16.3
Conducting tests on human beings without indicating the true reason for conducting the test, which involved serious side effects
Against the above backdrop, we shall discuss the ethical issues relating to conduct of business
research.
A Director of an institute wanted to remove one faculty member and appoint another (identified) faculty member in his place. He knew that the faculty member was taking far fewer lectures than others at his level but was publishing papers and writing a book. He ordered a study to assess the workload
of faculty members, and suggest norms for lectures at the Institute. As a part of the study, data
was collected about the lectures taken by different faculty members. He used that data about
lectures only to convince the governing board that the concerned faculty should be removed.
In one particular year, there was a phenomenal growth in the business of a company. However,
the figures were ordered to be toned down to show only good growth—the balance was to be
used next year to show good growth even if there was decline or marginal growth.
Not encouraging or directing inappropriate analysis merely to get the desired results
Using appropriate analytical tools relevant for the scope and objective of the research
Bringing out a fair and true picture of the organisation, and making recommendations without any inhibition
There are instances where it could be obvious to the consultant, during the exploratory study, that the stated objective of the research could be achieved by a study that requires scaled-down resources as compared to the original assignment, and, therefore, deserves lesser compensation. The ethical consultant would inform the sponsor about this, irrespective of the reduced profit, and would continue with the original assignment only if mandated by the sponsor after deliberating on the advice
In qualitative research, there is a close interaction between the researcher and the participants/
subjects/respondents, and so the ethical obligations to be met by both the researchers and the participants or subjects or respondents in a study assume greater significance. We have discussed the ethical standards for both types of research separately.
In an all India organisation, an exhaustive database was to be prepared about the employees
comprising factual data about their postings in various departments and various cities, training
attended, etc. as also the data about the training required by them, area of responsibilities for
which they could be well utilised, their preference for postings for the next 5 years, etc.
After a couple of years, the employees noticed that the only information being utilised by the organisation was that for ordering the place (city) of posting of those employees who had served in a city for more than 5 years. It caused a lot of discontentment among the employees, resulting in the issue being taken up by their association with the top management. Ultimately,
the management was ‘forced’ to evolve transparent policies about their assignments and
training.
(iii) Salient aspects of the findings—both quantitative and qualitative—should be made known to
the respondents. This would give them the satisfaction that the information and suggestions
made by them have been found useful, and motivates the respondents to co-operate with further
studies.
The need to disguise the exact purpose of research, in order to prevent a bias in the response, may to some extent require camouflaging using a cover. It is ethical to inform the respondents in advance that the procedure is indirect, to enable an unbiased response. It is also preferable that the true purpose is clarified after receiving the responses
Any use of video or voice recording equipment during the interviews should be disclosed to the respondents beforehand
It is unethical to conduct the interview with the respondent in a manner designed to receive the desired response
Use of leading questions, which bias the respondent's fair and appropriate response, is unethical. The research document should be designed to elicit a genuine response, rather than what is convenient for the researcher
It is unethical for the researcher to disclose the names of the respondents, especially if they
have been assured anonymity. It is correct to inform the respondents in advance and seek their
approval if their identity is planned to be disclosed. The names of the respondents should also
not be used to promote sales without their explicit permission.
Special care should be taken when the population size is small, since it is easier to guess a
respondent's identity than in samples drawn from a larger population. It is also desirable that
quotations and opinions of respondents are included delicately, or avoided altogether, in the
report when the sample is small, to avoid compromising the respondent's identity.
It is a standard practice to calculate the sample size based on an estimate of the standard
deviation, so it is possible to increase the sample size by inflating that estimate. It is unethical
for the researcher to inflate the standard deviation, and thereby the sample size, in order to
enhance revenue.
Ethics in Information Technology
There are many ethical issues that arise from the use of information technology. The most
important traits of IT department personnel are honesty and trustworthiness. This assumes
greater significance because what they do is not easily verifiable.
Coding data
Inputting data
Maintaining security and confidentiality of data
Software for generating reports
It is easy to gain access to information, which could lead to invasion of privacy.
The researcher has an ethical obligation to use only those data that are relevant
to the research. The decision to use secondary or primary data to arrive at the
conclusion should be judicious and not driven by profit alone. As primary data
can be more expensive, secondary data should not be used just to reduce costs.
Presentation of Data: The data should not be presented so as to deliberately cause visual deception.
Year    Sales (Company A)    Sales (Company B)
2005    2000                 2000
2006    2300                 3000
2007    2800                 3500
2008    3400                 4500
2009    4000                 5000
It may be noted that, just by manipulating the width of the time interval on the x-axis,
company A's performance can be made to appear better than company B's.
Analysis of Data
The analysis of data has to be as per the planned methodology. If any change
is contemplated after collecting the data, it has to be justified.
It is an ethical responsibility of the researcher to ensure that the correct method
is utilised to control the errors caused by variables. A pilot study is recommended
wherever possible, as it allows an initial evaluation before the bulk of the research
is conducted. Continuous monitoring of the research work at intervals is also
recommended, to allow improvement where necessary.
The assumptions made in the model used for the data need to be validated
before arriving at conclusions. For example, regression analysis requires the
assumptions of linearity, normality, etc. These should be validated before using
the regression equation for prediction.
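As an illustration of such validation, the sketch below fits a simple linear regression on synthetic data with NumPy and runs two basic residual checks. It is a generic diagnostic sketch, not a procedure prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: a truly linear relationship plus normal noise.
x = np.linspace(0, 10, 50)
y = 3.0 * x + 5.0 + rng.normal(0, 1.0, size=x.size)

# Fit a simple linear regression.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Two basic diagnostics before trusting the fitted line for prediction:
# 1. Residuals should average to approximately zero.
# 2. Residuals should be uncorrelated with x (no leftover linear pattern).
print(abs(residuals.mean()))
print(abs(np.corrcoef(x, residuals)[0, 1]))
```

A normality check of the residuals (e.g. a normal probability plot or a Shapiro–Wilk test) would typically accompany these two checks before the equation is used for prediction.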
Data values that might lead to results contrary to the contemplated findings
should not be ignored.
Conclusions
Interpreting Conclusions and Report Writing
Interpretations have to flow directly from the conclusions, giving valid arguments.
Technical language and jargon should not be used to contrive the desired results
and conclusions.
It is ethical for a researcher to exclude data that are irrelevant or of questionable
quality before conducting an analysis. The basis for excluding such data, and the
percentage of the sample affected, should preferably be included in the report.
Conclusions have to be factual, stating the margin of error and the confidence
that can be placed in them. It is not judicious to present only that part of the
analysis which supports the intended hypothesis.
It is important to note that the confidence interval is impacted by the standard
deviation. If, in a study, the standard deviation is too large, it may widen the
confidence interval. It is ethical for the researcher to share this with the client
and jointly decide the course of action, instead of miscommunicating the
confidence interval.
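The dependence of the interval on the standard deviation is easy to demonstrate. A minimal sketch, using the usual normal-approximation half-width z·σ/√n with hypothetical figures:

```python
import math

def ci_half_width(sigma, n, z=1.96):
    """Half-width of an approximate 95% confidence interval
    for a mean: z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)

# With n fixed, the interval width grows linearly with sigma:
narrow = ci_half_width(sigma=5, n=100)   # 1.96 * 5 / 10 = 0.98
wide = ci_half_width(sigma=15, n=100)    # 1.96 * 15 / 10 = 2.94
print(narrow, wide)
```

A tripled standard deviation triples the interval width; communicating the interval honestly, or jointly agreeing to a larger sample, are the ethical responses.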
Web-Assisted Research
With the advent of information technology, research work has been assisted by
data available on the net. Data warehousing and mining have contributed largely
to the ease of research work. The ability of the user to subscribe to cloud
computing, which offers an option to request additional resources on demand and
just as easily release them when no longer needed, has allowed greater control
and flexibility over research expenses. While traditional data offerings can also
include this facility, in cloud computing it is automated.
There are several instances when data could be collected from routine
transactions. Biometrics data such as fingerprints, digital computer monitoring,
smart cards, etc. are sources of data. The other sources of data are the application
forms submitted for credit cards and shopping mall memberships, etc.
Individuals should have the right to online privacy, and should expect a choice
in how their information is collected, used and shared. It is, therefore, important
that raw data are protected from prying eyes. Data security, data quality and
measures to maximise the accuracy of personal information are ethical
obligations of those who put together information used by others.
While filling in data online, routine pop-ups could be generated by the hosting
agencies, advising that the data could be accessed by others. This is a way of
securing informed consent from the user.
S U M M A RY
The ethical issues at individual as well as organisational levels have been highlighted. Further,
there are ethical issues that are relevant in business research at various hierarchical levels, ethical
standards in qualitative and quantitative research, and ethical issues relevant at every step in the
research process. A tabular presentation indicating the extent of variation in research reports that
are suitable for different types of research studies is given.
DISCUSSION QUESTIONS
1. Highlight the importance of ethics, in general, and in business research, in particular.
2. Why are the standards different in Qualitative and Quantitative research? Explain with refer-
ence to a particular study.
3. Describe the ethical issues in business research at various hierarchical levels, with reference to
a particular research study.
4. Describe, in brief, the ethical issues at various steps of a research process.
5. Write short notes on:
(i) Ethics at individual level
(ii) Ethics at organisation level
(iii) Ethical norms for professionals
Appendix I: Indicative Topics for Business Research
Several research studies have been reported in books and other media. Equally relevant are the re-
search studies conducted by students at Management Institutes. Here is a list of research studies which
have been or could be conducted by students. The list is only indicative and not comprehensive.
Market development in joint replacement implants
Marketing 'Home Automation' in India: the cost divide
Marketing in remittance hub (money transfer)
Marketing potential of LED lighting and emerging technologies
Petrochemical marketing
Relationship between advertising expenditure and sales
Sales force automation
Social networking websites as a marketing tool
Effectiveness of out-of-home advertising in Indian market
Assessing brand awareness in the market—for specific brand
Trends in FMCG industry
Finance/Economics and Banking
Bailout, a right solution by the government for the economy
Bank’s money transfer service in India: Qualitative study of two banks
Basel II: Advantages and challenges for the Indian Banking Industry
Brand Equity Analysis of acquisition strategies of a bank
Credit derivatives—Risk mitigation or amplification
Credit rating procedure of banks/financial institutions
Credit Risk Management—Credit ratings in banking industry evolved with the era
Currency derivatives and its impact on the corporate India
Current trends and challenges in the mortgage industry
Analysis of debt to equity ratio of a selected group of companies
Derivatives/futures and options
Derivative products in international markets
Different mortgage products offered by Indian Mortgage Industry and the options of
introducing new mortgage products in India
Economics of data acquisition and information security spending in the financial sector
in India
Effect of subprime lending on financial institutions
Effects of dollar fluctuation on Indian Trade (EXIM)
Equity research
Evaluation of performance of credit cards in India
Factors affecting users' choice of bank
FDI or FII—which is a better option for Indian economy
Feasibility of implementing mobile banking systems for urban India
Feasibility of V.L.S.I. design services in India
Feasibility study: should RBI introduce Credit Default Swaps (CDS) in India?
Financial re-engineering of a company
Fixed income and money market in India
Forex management
Further liberalisation of financial markets in India
Game theory: applications in commerce and industry
Government treasury bills in an economy
Identifying critical factors responsible for the rising level of NPA in Indian banking
system
Identifying key factors and financial ratios for successful and unsuccessful companies
Identifying traits of good and defaulting borrowers
— auto loan
— credit cards
— home loan
Identifying relevant factors for success in mergers and acquisitions
Impact of transportation system on economic progress of a country
Implementation of Basel II in Indian banks with particular focus on credit risk management
Indian Mutual Funds Performance 2001-2010
India's export policy and its impact on the country's economic development
IPO: means of finance for SMEs: benefits vs. disadvantages
Issue management (investment banking) and launch of NFO scheme analysis
Issues concerning valuation of venture capital and private equity funds
Key account management
Mergers and acquisitions—Opportunities and challenges
Recovery of education loans
Mutual funds for retail/individual investor
Study of mutual funds in India
Operational asset effectiveness
Process innovation in retail banking
Project finance for major infrastructure projects
Quality of assets
Receivables and fixed asset management
Relationship between interest rates and bank deposit patterns of customers
Risk in international trade
Risk management
Setting up private equity venture capital funding for small to medium enterprises
Short-term trading strategies for equity markets
Significance of volatilities in derivative markets
Study of credit ratings in India
Study on currencies of emerging market as a potential alternative to dollar
Study of security issues and risk management in online banking
Technical analysis of stocks and commodities
Application of clustering techniques to physical representation in asset performance
benchmarking systems
Applications of modern portfolio theory to multi-asset management efficiency in property
portfolios
Distortion of India FDI policies in the new era: coalition between foreign investor and
local government
Economic effects of India's foreign policies
Impact of Internet on banking
The planning, evaluation, financing and implementation of projects: a critical review of
experience in India
Trends in billing cycle management of SME contracting firms
Trends in working capital management of SME contracting firms
Venture capital for real estate funds
What are the consequences of the Internet for banks and brokerage firms?
Operations
Study of import purchasing decision behaviour
Comparative analyses of transport provision, land-use patterns and travel conditions in
cities in developing and developed countries
Decision-making framework for managers: Profit by forecasting, costs and price management
Determining the procurement quantity and time of procurement
EAI—Enterprise Application Integration
Effective Resource Management in projects
Evaluation of the organisation with reference to Organisational Project Management
Maturity Model (OPM3)
Impact of GST implementation on logistics cost for pesticides industry: Bayer perspective
Inventory Management through Kanban systems
Logistics management of chemicals in refineries
Monitoring and analysis of road traffic speeds using GPS technology
Operations research
Optimising operational efficiency
Process management
Process migrations and outsourcing
Reduction in the turnaround time for a mortgage offer
Role of logistics in supply chain management
Scope of supply chain and procurement in retail
Service innovations
Six Sigma—Controlling and improving production process and quality
Six Sigma Methodology in service sector
Supply chain management
Technology absorption
Temperature control warehousing
Development of transportation modes in India
Evolution of ERP in our new global economy
Influence of highway geometry and traffic conditions on vehicle speeds and driver behaviour
Provision, impacts and funding of concessionary public transport fares for elderly and
disabled people
Role of market forces and regulations on transport policy
Track record of Build, Operate and Transfer (BOT) Projects in India
Third party inspection services in engineering industry in India
Impact of on time performance on passenger preference in airline industry
TQM implementation by training initiatives
Transportation issues in less developed countries like India
Use of remote process monitoring as an optimisation tool
HRD
A positive progressive approach to training and learning development
Absenteeism and motivation in production units
Attracting, selecting and retaining IT professionals
Conflict management in the workplace
Contingent workforces in the hospitality industry
Diversity of workforce contributes more to organisational efficiency
Does increased fitness cause a greater reduction in stress?
Employers' experiences of shortages of skilled workers in India
Exploring the linkage between managerial cognitions and organisational effectiveness
Facilitation of action learning groups: An action research and grounded theory investigation
Factors affecting performance of employees
HR challenges during mergers and acquisitions
Identifying traits of successful and not so successful employees
Impact of downsizing
Impact of training on employee performance
Leadership: being an effective project manager
Information Technology
Impact of information technology on governance, i.e. e-Governance
Impact on QMS of a software company
Internet privacy and institutions
IT in BPO/KPO space
IT security and risk management
Moving IT infrastructure offshore
Network management
Neural networks/fuzzy logic as decision support systems
Online auction assisting potential bidders through information systems
Online banking: advantages and disadvantages
QA activities on in-house financial software
Remote infrastructure management in IT industry
Safety and privacy: Internet security for the private user
Sailing through rough weather: Journey of Indian 3D animation industry
Study of online gaming in India
The business of open source software
The growth of E-commerce
Trends in the IT industry
Trends in IT outsourcing
Users’/client’s perception of a good web design
Value addition of quality assurance processes to software testing
Web 2.0—opportunity or a fad
Website design and management
Strategy
BPO boom in India
Enterprise application and integration
Evaluation of cost cutting strategy
Feasibility of setting up computer education centres
Impact of US recession on Indian BPO industry
Innovation as part of business strategy
Knowledge management in BPO space—opportunities and challenges
Setting up an enterprise in India
Strategy formulation in IT/ITES industry in response to different economic/business
scenarios
Study of Indian outsourcing industry
Study of process improvement tools for BPO industries in India
The BPO industry—The call routing process and the factors affecting its revenue
The impact of high attrition in BPO industry
Supplier partnership for quality management
Transition management in BPOs—A review
Insurance
Indian Life Insurance Companies' foray into health and pension products
Technological breakthroughs as a source of competitive advantage for Indian insurance
companies
Analysis of the life insurance industry in India
Bancassurance—Global trends in insurance
Designing insurance policies of various types
Future of insurance in India
Growing insurance industry in India
Impact of different factors on health and life
Impact of PFRDA regulation on insurance pension policies
Market leaders in insurance—ULIP products of insurance company
Pros and cons of centralisation/decentralisation in general insurance back office
operations
Service excellence—A source of competitive advantage for life insurance companies in
India
Understanding the modern insurance to design better consumer stimuli
Telecom
Criteria for selection of phone and service provider among different age/income/professional groups
Declining ARPUs—A concern for Indian telecom industry
Effects of changing economic scenario/business environment on telecom industry
FDI in India and its effect on telecom sector
Managed services prospects in telecom industry in India
M-Commerce—landscape, applications, trends and future
Opportunities with 3G technology for Mobile VAS in India
Sustainability in Indian telecom market
The growth of the wireless communications industry
The Indian Telecom Industry—Focus on cellular market growth strategy and upcoming
technologies
Retail Management
Evaluating declining sales of a retail outlet
Growth of luxury retail in India
Identifying customer buying behaviour: Preferences and patterns
Impact of economic slowdown on organised retail in India
Impact of Foreign Direct Investment on the retail industry
Research on consumer behaviour in malls
Organised retailing in India—Opportunities and challenges ahead
Fund raising: India v/s other countries
Future of background screening in India
Future of batteries in Industrial applications
Future of entertainment industry in India
Future of gamma radiation processing in India
Industrial applications of nuclear radiation
Future of public transport system in Mumbai
Future of SMART GRID Solution in India—with special focus on Pvt. Distribution Co.
in metro cities
Future of BTL as a mode of communication with the Consumer in India
Global warming: Implications on the world environment
Growth of India's Contract Research Organisation industry
Holding the line: Call centres, resistance, accommodation and self
Impact of newspapers on children
Impact of research and development on real estate industry
India housing market analysis
Indian processed foods industry: A vision
India's green revolution: Forecasting its agriculture success
Investment in gold during recessions
Issues and prospects of the real estate sector in India
Media—Its avenues and effects
New Topic: Evolution of ticketing in aviation industry
Past, present and future of cancer treatment in India
Private equity real estate market in India
Public private partnership for infrastructure development in India
Real estate: Project feasibility study
Research in job portal industry
Reuse of STP treated water for nonpotable application
Setting of energy in rural India
Sources of power generation and its effectiveness in the Indian context
Study of a gold rated green building in Mumbai under the category core and shell
The current and future status of women in India
The dynamics of an "Industry" called cricket
The emergence of stronger individualism in India
Assessing casual clothing preference of youth
Popularity of Indian fast food v/s continental fast food in Indian market
Bottlenecks in iron-ore exports from Indian sea ports
Bottlenecks in using "Renewable Electric Generation" resources
Trends in clinical research outsourcing
Trends in Indian automotive paint industry
Wine market in India
B-School
Association between educational background and performance in terms of grades for PGD students
Creativity in training systems
Economic growth in India: The role of human capital and education
Factors contributing to the satisfaction/dissatisfaction levels in MBA programme
Future of e-learning in India
Work stress experienced by students in management institutes
Appendix: Excel—A Tool for Statistical Analysis
A.1 INTRODUCTION
Microsoft Excel, commonly known as MS Excel, is one of the popular software applications in-
cluded in the MS Office package of Microsoft Corporation. Excel is popular mainly for its use in
financial analysis. Excel also comes with in-built statistical functions and various other applications
that help statisticians carry out statistical analysis. However, Excel's capability to perform statistical
analysis is not as well known as its financial analysis. Here we focus on the use of Excel for the
various types of statistical analysis described in this book.
The current worksheet is called the active worksheet. To view a different worksheet in a work-
book, one may click the appropriate worksheet tab (given as point 10 above).
One can access and execute commands from the main menu or by pointing to one of the toolbar
buttons. When one places the cursor over these toolbar buttons, Excel displays the name/action
of the button.
Moving Around the Worksheet
It is important to be able to move around the worksheet effectively because one can only enter or
change data at the position of the cursor. One can move the cursor by using the arrow keys or by
moving the mouse to the required cell and clicking. Once selected, the cell becomes the active cell
and is identified by a thick border; only one cell can be active at a time.
To move from one worksheet to another one may click the worksheet tabs. The name of the ac-
tive sheet is shown in bold.
Moving Between Cells
Some commonly used keyboard shortcuts to move the active cell are:
Home—moves to the first column in the current row
Ctrl+Home—moves to the top left corner of the document
End and then Home—moves to the last cell in the document
Ctrl+End—moves to the last cell in the document
Home and then End—moves to the last column of the current row
To move between cells on a worksheet, one may click any cell or use the arrow keys. To see a
different area of the sheet, one may use the scroll bars and click on the arrows or the area above/
below the scroll box (indicated in point 11) in either the vertical or horizontal scroll bars.
Formulae automatically do calculations on the values in other specified cells and display the
result in the cell in which the formula is entered. It may be noted that the formula itself is not
displayed in the cell; only its result is. One could look into the formula bar (part 4 mentioned
above) to read the formula.
A snapshot of an Excel workbook is given below:
In this workbook, the active cell is A1 which is indicated by highlighting of Column A and
Row 1. To enter information into a cell, one may select the cell and begin typing. As seen above,
one may select the cell A1 and type “Item of Expenditure” in the cell.
It may be noted that as one types some information into the cell, it is also displayed in the formula
bar.
After one completes typing in the cell, one could either press 'Enter' to move to the next cell
below (in the above case, A2), or press the Tab key, which would take the cursor to cell B1.
One could also select any cell by clicking on it.
Unless the information one enters is classified as a value or a formula, Excel interprets it as a
label. The default alignment for a text or label is ‘left’.
Entering Values:
A value is a number, date, or time, plus a few symbols if necessary to further define the numbers
(like +, –, ( ), %, $, /, etc.).
Numbers are assumed to be positive; to enter a negative number, one may use a minus sign “-”
or enclose the number in parentheses “( )”.
Dates are stored as MM/DD/YYYY, but one does not have to enter them precisely in that format.
If one enters "jan 1" or "jan-1", Excel would recognise it as January 1 of the current year, and
store it as 1/1/2007.
The default alignment for a number is 'right'. Excel identifies a value as a number only if it does
not contain any non-numeric characters; e.g. '34a' would not be considered a number.
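The value-versus-label distinction can be mimicked in a few lines of Python. The helper below is purely illustrative and does not reproduce Excel's actual parsing rules:

```python
def classify_cell(entry: str) -> str:
    """Rough sketch of how a spreadsheet might classify typed input:
    values (numbers) are right-aligned, labels (text) left-aligned.
    This is an illustration, not Excel's real input parser."""
    text = entry.strip()
    # A leading "=" marks a formula.
    if text.startswith("="):
        return "formula"
    # Parentheses are an accounting convention for negatives: (5) means -5.
    if text.startswith("(") and text.endswith(")"):
        text = "-" + text[1:-1]
    try:
        float(text.replace(",", "").rstrip("%"))
        return "value"
    except ValueError:
        return "label"

print(classify_cell("34a"))          # label: contains a non-numeric character
print(classify_cell("-12.5"))        # value
print(classify_cell("(5)"))          # value: negative by the parentheses convention
print(classify_cell("=SUM(B2:B6)"))  # formula
```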
In the above snapshot, the active cell is B7 and the formula entered in the cell is
=SUM (B2:B6)
If this formula is copied to the next cell on the right, i.e. to the cell C7, it would get copied
relatively. It would now be
=SUM (C2:C6)
and the value of this formula would become 100.
If the same formula is copied down from B7, i.e. into the cell B8, it would now be
=SUM(B3:B7)
and the value of this formula would become 18500 (it excludes 1500 and includes 10000).
Excel shifts the reference of the cell as it is copied. It would increase the column address if
copied to the right, and increase the row address if copied down. This is very useful, especially
when one wants to use the same formula for different rows and columns.
Absolute address is the address which remains the same irrespective of the direction (left, right,
up or down) in which it is copied. To make an address as absolute, one may assign a $ sign before
the row as well as column. For example, in the above case, if the formula entered in the cell B7
was,
= SUM($B$2:$B$6)
then, if one copies this formula anywhere in the worksheet, it would remain the same, and the
value at that place would also be the same, i.e. 10000.
One could also have a partial absolute address. When one puts $ for either the row or the column
address, it is called a partial absolute address. For example,
$B2 – means the column is absolute and the row is relative. This reference, when copied to the
next column, would remain the same, i.e. $B2, but when copied to the next row, would become $B3.
B$2 – means the column is relative and the row is absolute. This reference, when copied to the
next row, would remain the same, i.e. B$2, but when copied to the next column, would become C$2.
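The copying rules for relative, absolute and partial absolute addresses can be sketched as follows. Here `shift_ref` is a hypothetical helper that handles single-letter columns only; it is not Excel's actual implementation:

```python
import re

def shift_ref(ref: str, d_rows: int, d_cols: int) -> str:
    """Shift a single-letter-column cell reference (e.g. B2, $B$2, $B2, B$2)
    the way a spreadsheet adjusts it when a formula is copied:
    a part prefixed with $ is absolute and stays fixed."""
    m = re.fullmatch(r"(\$?)([A-Z])(\$?)(\d+)", ref)
    col_abs, col, row_abs, row = m.group(1), m.group(2), m.group(3), int(m.group(4))
    if not col_abs:
        col = chr(ord(col) + d_cols)  # relative column shifts with the copy
    if not row_abs:
        row += d_rows                 # relative row shifts with the copy
    return f"{col_abs}{col}{row_abs}{row}"

# Copying one cell to the right (d_cols=1) and/or one cell down (d_rows=1):
print(shift_ref("B2", 1, 1))    # C3: fully relative
print(shift_ref("$B$2", 1, 1))  # $B$2: fully absolute, unchanged
print(shift_ref("$B2", 0, 1))   # $B2: copied to next column, unchanged
print(shift_ref("$B2", 1, 0))   # $B3: copied to next row
print(shift_ref("B$2", 0, 1))   # C$2: copied to next column
```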
The above snapshot shows the Edit menu of the menu bar. It shows the options as well as their
shortcut keys. For example, for the first option, 'Undo Clear', the shortcut key is Ctrl+Z. Excel
provides shortcut keys for most options in the menu bar. We have provided a list towards the
end of this appendix.
Statistical Functions
(Arranged in alphabetic order)
Function Name Function Utility
AVEDEV Returns (gives) the average of the absolute deviations of data points from their mean
AVERAGE Returns the average of its arguments
AVERAGEA Returns the average of its arguments, including numbers, text, and logical values
BINOMDIST Returns the individual term of a binomial distribution
CHIDIST Returns the one-tailed probability of the chi-squared distribution
CHITEST Returns the test for independence
CONFIDENCE Returns the confidence interval for a population mean
CORREL Returns the correlation coefficient between two data sets
COUNT Counts how many numbers are in the list of arguments
COUNTA Counts how many values are in the list of arguments
COVAR Returns covariance, the average of the products of paired deviations
DEVSQ Returns the sum of squares of deviations
EXPONDIST Returns the exponential distribution
FDIST Returns the F probability distribution
FISHER Returns the Fisher transformation
FORECAST Returns a value along a linear trend
FREQUENCY Returns a frequency distribution as a vertical array
FTEST Returns the result of an F-test
GEOMEAN Returns the geometric mean
GROWTH Returns values along an exponential trend
HARMEAN Returns the harmonic mean
INTERCEPT Returns the intercept of the linear regression line
KURT Returns the kurtosis of a data set
LARGE Returns the k-th largest value in a data set
LINEST Returns the parameters of a linear trend
LOGEST Returns the parameters of an exponential trend
LOGINV Returns the inverse of the lognormal distribution
LOGNORMDIST Returns the cumulative lognormal distribution
MAX Returns the maximum value in a list of arguments
MEDIAN Returns the median of the given numbers
MIN Returns the minimum value in a list of arguments
MODE Returns the most common value in a data set
NORMDIST Returns the normal cumulative distribution
NORMSDIST Returns the standard normal cumulative distribution
PEARSON Returns the Pearson product moment correlation coefficient
PERCENTILE Returns the k-th percentile of values in a range
PERCENTRANK Returns the percentage rank of a value in a data set
PERMUT Returns the number of permutations for a given number of objects
POISSON Returns the Poisson distribution
PROB Returns the probability that values in a range are between two limits
QUARTILE Returns the quartile of a data set
RANK Returns the rank of a number in a list of numbers
RSQ Returns the square of the Pearson product moment correlation coefficient
SKEW Returns the skewness of a distribution
SLOPE Returns the slope of a linear regression line
SMALL Returns the k-th smallest value in a data set
STANDARDIZE Returns a normalized value
STDEV Estimates standard deviation based on a sample
STDEVA Estimates standard deviation based on a sample, including numbers, text, and logical values
STDEVP Calculates standard deviation based on the entire population
STDEVPA Calculates standard deviation based on the entire population, including numbers, text, and
logical values
STEYX Returns the standard error of the predicted y-value for each value of x in the Regression
equation
TDIST Returns the Student’s t-distribution
TREND Returns values along a linear trend
TRIMMEAN Returns the mean of the interior of a data set
TTEST Returns the probability associated with a Student’s t-test
VAR Estimates variance based on a sample
VARA Estimates variance based on a sample, including numbers, text, and logical values
VARP Calculates variance based on the entire population
VARPA Calculates variance based on the entire population, including numbers, text, and logical
values
ZTEST Returns the two-tailed p-value of a z-test
Source: Microsoft Excel Help available with the software. For using the above functions, one can refer to tutorials,
which are integrated parts of the Excel software.
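For readers reproducing such calculations outside Excel, several of these functions have direct counterparts in Python's standard `statistics` module. The mappings below are our own illustration, not part of Excel's documentation:

```python
import statistics

data = [1, 2, 3, 4]

# AVEDEV: average of the absolute deviations from the mean.
mean = statistics.mean(data)
avedev = sum(abs(x - mean) for x in data) / len(data)

# MEDIAN, GEOMEAN and HARMEAN have stdlib equivalents:
median = statistics.median(data)
geomean = statistics.geometric_mean([1, 4, 16])  # cube root of 64 = 4.0
harmean = statistics.harmonic_mean([1, 4, 4])    # 3 / (1 + 1/4 + 1/4) = 2.0

# STDEV vs STDEVP: sample vs population standard deviation.
stdev_sample = statistics.stdev(data)  # divides by n - 1
stdev_pop = statistics.pstdev(data)    # divides by n

print(avedev, median, geomean, harmean)
```

Note the sample/population distinction (STDEV vs STDEVP, VAR vs VARP in the table above) carries over directly: `stdev` divides by n − 1 while `pstdev` divides by n.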
F3 key Search for a file or a folder
ALT+ENTER View the properties for the selected item
ALT+F4 Close the active item, or quit the active program
ALT+ENTER Display the properties of the selected object
ALT+SPACEBAR Open the shortcut menu for the active window
CTRL+F4 Close the active document in programs that enable you to have
multiple documents open simultaneously
ALT+TAB Switch between the open items
ALT+ESC Cycle through items in the order that they had been opened
F6 key Cycle through the screen elements in a window or on the desktop
F4 key Display the Address bar list in My Computer or Windows Explorer
SHIFT+F10 Display the shortcut menu for the selected item
ALT+SPACEBAR Display the System menu for the active window
CTRL+ESC Display the Start menu
ALT+Underlined letter in a menu name Display the corresponding menu—Underlined letter in a command
name on an open menu (Perform the corresponding command)
F10 key Activate the menu bar in the active program
RIGHT ARROW Open the next menu to the right, or open a submenu
LEFT ARROW Open the next menu to the left, or close a submenu
F5 key Update the active window
BACKSPACE View the folder one level up in My Computer or Windows Explorer
ESC Cancel the current task
SHIFT while inserting a CD-ROM Prevent the CD-ROM from automatically playing
Windows Logo+E Open My Computer
Windows Logo+F Search for a file or a folder
CTRL+Windows Logo+F Search for computers
Windows Logo+F1 Display Windows Help
Windows Logo+L Lock the keyboard
Windows Logo+R Open the Run dialog box
Windows Logo+U Open Utility Manager
CTRL+W Close the current window
CTRL+ALT+Minus sign (-) Place a snapshot of the active window in the client on the Terminal server clipboard and provide the same functionality as pressing ALT+PRINT SCREEN on a local computer.
CTRL+ALT+Plus sign (+) Place a snapshot of the entire client window area on the Terminal server clipboard and provide the same functionality as pressing PRINT SCREEN on a local computer.
A.5 EPILOGUE
We have provided a brief overview of Excel, considered necessary for understanding the use of the Excel templates for statistical calculations discussed in this book. The contents and coverage of Excel here have been decided in consonance with this objective. For detailed coverage of Excel, one may refer to the books written exclusively on Excel. A few of these are included in the list of books given under the title ‘Some Other Useful Books…’ for all the topics covered in this book.
Appendix: Introduction to IBM SPSS Statistics 18
IBM SPSS Statistics is the preferred choice of many government, academic, research and corporate organisations in India for data analysis and reporting. SPSS can be used for performing basic as well as advanced analysis of data. In this appendix, the interface of version 18 of IBM SPSS Statistics is explained in brief.
SPSS Statistics has three views: the Data View, the Variable View and the Output View. The Data View shows the data entered into the file; the data is entered such that each row represents a record and each column represents a variable. The Variable View is used for defining and labelling variables, while the Output View displays the results of all analyses. These are shown in the following snapshots.
The result of any analysis done is displayed separately in the output window. A typical output win-
dow is displayed as follows:
The data in SPSS can either be entered directly or imported from an Excel or a database file. The following snapshots explain importing data from an Excel file:
Most analysis can be done using the ‘Analyze’ option from the menu.
Common Factor Analysis A statistical approach that is used to analyse inter-relationships among a large number of variables (indicators) and to explain these variables in terms of a few unobservable constructs (factors).
Communality The amount of variance an original variable shares with all the other variables included in the analysis. A relatively high communality indicates that a variable has much in common with the other variables taken as a group.
Comparative Scaling Techniques The comparative scales involve direct comparison of the different objects.
Completely Randomised Experiment An experiment concerned with the study of only one factor. Each treatment is assigned or applied to the experimental units at random, without any other consideration.
Concept Concepts are components of constructs; they are concrete, and are therefore measurable.
Concurrent Deviations A measure of correlation that depends only on the sign (and not the magnitude) of the deviations of the two variables, say x and y, recorded at the same point of time, from their values at the preceding point of time.
Confidence Coefficient Complement of level of significance
Confidence Interval or Limits The interval or limits within which the true value of the parameter lies.
Confidence Level It is expressed as a percentage, e.g. 95%, and indicates the degree of confidence that the true value of the parameter lies in the specified interval.
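For example, a 95% confidence interval for a population mean can be sketched as follows (Python; the data are hypothetical, and the normal critical value 1.96 is used for simplicity; for small samples a t critical value is more appropriate):

```python
import math
import statistics

data = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]  # hypothetical sample
n = len(data)
mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean

z = 1.96                                     # normal critical value for 95%
lower, upper = mean - z * se, mean + z * se  # the 95% confidence limits
```

With 95% confidence, the true population mean lies between `lower` and `upper`.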
Confirmatory Factor Analysis (CFA) This technique is used when the researcher has prior knowledge (on the basis of some pre-established theory) about the number of factors the variables will be indicating. This makes the analysis easier, as no decision has to be taken about the number of factors; the number is indicated in the computer-based tool while conducting the analysis.
Conjoint Analysis Involves determining the contribution of variables (each at several levels) to the choice preference over combinations of variables that represent realistic choice sets (products, concepts, services, companies, etc.).
Consistent Estimator A property which implies that the estimate tends to the true value
of the parameter as the sample size increases
Constant Sum Scaling In this technique a respondent is asked to allocate points, out of a fixed sum of points, to each object according to the importance he/she attaches to that object. If an object is not important, the respondent can allocate zero points; if an object is most important, he/she may allocate the maximum points out of the fixed sum.
Construct A construct is an abstraction based on concepts, or can be thought of as a conceptual model that has measurable aspects.
Glossary
Divergent Thinking Thinking that is quite different from the usual ways of doing and observing things. It helps to develop insights and new ideas.
Dummy Activities The activities which do not consume any resources. These are used
just to connect events.
Economy of a Questionnaire Defined as the time spent by a respondent in answering the questionnaire.
Editing Ensures consistency in the responses, locates omission(s) of any response(s), and detects extreme responses, if any. It also checks the legibility of all the responses and seeks clarifications in the responses wherever felt necessary.
Efficient Estimator A property which implies that the variance of the estimator is minimum
as compared to any other estimator.
Eigenvalue for Discriminant Function The eigenvalue for a discriminant function is defined as the ratio of the between-groups to the within-group sum of squares.
Eigenvalue in Factor Analysis The eigenvalue for each factor is the total variance explained by that factor.
Empirical Research Research based on observed data without the support of any theory
or model.
Estimator A function of sample values to estimate a parameter of a
population.
Ethics (also known as moral philosophy) Defined as a branch of philosophy which seeks to address questions about morality; that is, about concepts like good and bad, right and wrong, justice, virtue, etc.
Ex Post Facto Design The design used for studying a phenomenon/event which has already occurred.
Experiment A study conducted by manipulating the independent variables to understand the relationships between the independent and the dependent variables.
Experimental Designs Involves specification of treatments and the method of assigning
experimental units to each treatment
Experimental Unit The object on which the experiment is to be conducted.
Exploratory Factor Analysis (EFA) Used when there is no prior knowledge about the number of factors that the variables will be indicating. In such cases, computer-based techniques are used to indicate the appropriate number of factors.
Exploratory Research Research conducted to explore a problem at its preliminary stage, to get some basic idea about the solution.
External Research Research conducted for an organisation by an outside agency such as a consultant, a consultancy firm or a professional.
External Validity Related to generalisability of the findings/results to other situations.
Extraneous variable It is outside or external to the situation under study, and its impact on
dependent variable is beyond the scope of the study.
Haphazard Sampling In such sampling, the units from the population are selected without
any set criteria. They are selected based on the preference, prejudice
or bias of the person(s) selecting the sample.
Historical Research The process of systematically examining past events to give an ac-
count; may involve interpretation to recapture the nuances, person-
alities, and ideas that influenced these events; to communicate an
understanding of past events.
Hypothesis Educated or informed guess about the answer to a question framed
in a particular study.
A statement or assumption or a claim about a parameter of a
population.
Icicle Diagram It displays information about how cases are combined into clusters at
each iteration of the analysis.
Implementing This is the physical part of carrying out the project as per the
schedule.
Independent/Explanatory/Causal Variable A variable which influences the other variables under consideration in the study. The value of this variable can be decided or controlled by the researcher.
Independent Jobs/Activities A job which can be started on its own is called an ‘independent’ job. Two jobs are said to be independent of each other if both can be done independently or simultaneously.
Induction/Inductive Logic/Inductive Approach Inductive reasoning starts from ‘specific’ observations, or a set of observations, and moves to a generalised theory or law. It could be termed a ‘bottom-up’ approach.
Inductive A form of reasoning in which a generalized conclusion is formulated
from particular instances
Inductive Analysis A form of analysis based on inductive reasoning; a researcher using
inductive analysis starts with answers, but forms questions throughout
the research process.
Interaction of Factors Simultaneous impact of two factors
Interdependence Interdependence techniques do not assume any variable as independent
Techniques /dependent variables.
Internal Consistency The extent to which all questions or items assess the same characteristic, skill, or quality.
Internal Research A research conducted by a team of in-house experts of the
organisation.
Internal Validity The rigour with which the study is conducted (e.g., the study's design, the care taken to conduct measurements, and decisions concerning what is to be measured), and the extent to which alternative explanations for any causal relationships are taken into account. Internal validity describes the ability of the research design to unambiguously test the research hypothesis.
Interval Scale A measurement scale, whose successive values represent equal value
or amount of the characteristic that is being measured, and whose base
value is not fixed, is called an interval scale.
Intervening Variable In a study involving independent and dependent variables, there could be a variable/factor which affects the dependent variable but cannot be directly observed or measured; such a variable is considered an intervening variable.
Interviews A research tool in which a researcher asks questions of participants; interviews are often audio/video-taped for later transcription and analysis.
Inverse Sampling A method of sampling which requires that drawings of random samples
shall be continued until certain specified conditions dependent on the
results of the earlier drawings have been fulfilled, e.g. until a given
number of units of specified type have been found.
Investigative Questions/Issues The next level of the question hierarchy. These questions disclose specific information that is useful to answer the research question.
Itemised Rating Scale In an itemised scale, respondents are provided with a scale having numbers and/or brief descriptions associated with each category. The categories are usually ordered in terms of scale position. The respondents are asked to select the category that best describes the product, brand, company or any other attribute being studied.
Judgement Sampling In such type of sampling, the selection of units, to be included in the
sample, depends on the judgement or assessment of the person(s)
collecting the sample.
Kaiser Meyer Olkin (KMO) Measure of Sampling Adequacy An index used to test the appropriateness of factor analysis. High values of this index (generally more than 0.5) indicate that factor analysis is appropriate, whereas lower values (less than 0.5) indicate that factor analysis may not be appropriate.
Levels of a Factor Different values of a factor that are used in an experiment, e.g. 10 gms, 12 gms and 15 gms per sq. metre of land, dosages of a medicine, durations of training, or percentages of discount (5%, 10%, 15%).
Likert Scale/Summated Rating Scales Comprise statements that express either a favourable or an unfavourable attitude toward the object of interest on a 5-point, 7-point or other numerical scale. The respondents are given a list of statements and asked to agree or disagree with each statement by marking the numerical value that best fits their response. The scores may be summed up to measure the respondent’s overall attitude.
Longitudinal Studies Studies which are conducted over a period of time
Management Question/Issues The management dilemma gets translated into management questions, which convert the dilemma into question form.
Maximum Likelihood Estimation This method is used in logistic regression to predict the odds ratio for the dependent variable. In least squares estimation the squared error is minimised, whereas in maximum likelihood estimation the log-likelihood is maximised.
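A minimal sketch of the idea (Python; the coin-toss data are hypothetical): for a Bernoulli parameter p, the log-likelihood is maximised at the sample proportion, which even a simple grid search recovers.

```python
import math

def log_likelihood(p, successes, n):
    # Bernoulli log-likelihood of 'successes' out of n independent trials
    return successes * math.log(p) + (n - successes) * math.log(1 - p)

successes, n = 7, 10
grid = [i / 1000 for i in range(1, 1000)]  # candidate values of p
p_hat = max(grid, key=lambda p: log_likelihood(p, successes, n))
# p_hat lands on 0.7, the sample proportion 7/10
```

In logistic regression the same principle applies, but the log-likelihood is maximised over the regression coefficients numerically.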
Measurement Questions/Is- These questions allow the researcher to collect specific information
sues required for the research study. For each investigative question, the
measurement questions are asked.
Moderating Variable In a study involving an independent and a dependent variable, a relationship could be established through a variable. However, we may come across a third variable which is not an independent variable but has a strong contingent/contextual effect on the relationship between the independent and the dependent variable; this is the moderating variable.
Monte Carlo Simulation This technique uses modeling of key variables with defined
random distributions to cover potential values in solving analytical
problems.
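As a simple illustration of the technique (a Python sketch; the example, estimating π by sampling random points in a unit square, is not from the text):

```python
import random

random.seed(42)                      # fixed seed for a reproducible run
n = 200_000                          # number of simulated points
inside = sum(
    1 for _ in range(n)
    if random.random() ** 2 + random.random() ** 2 <= 1.0
)
pi_estimate = 4 * inside / n         # fraction inside the quarter circle, times 4
```

The same pattern, drawing key variables from defined random distributions and summarising the simulated outcomes, applies to business problems such as demand or project-completion-time analysis.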
Multicollinearity Refers to the existence of high correlation between independent
variables.
Multidimensional Scaling A set of procedures for drawing pictures of data so as to visualise and clarify the relationships described by the data.
Multiple Choice, Multiple-Response Scale/Check List In this scale the respondent is given a list of choices and can choose more than one from the list.
Multiple Choice, Single-Response Scale In such scales there are multiple options for the respondent, but only one answer can be chosen.
Multiple Regression Analysis Deals with the study of the relationship between one metric dependent variable and two or more metric independent variables.
Multivariate Analysis of Variance (MANOVA) Explores, simultaneously, the relationship between several non-metric independent variables and two or more metric dependent variables.
Nominal Scale A qualitative scale without any order is called nominal scale.
Non-Participant Observations A qualitative method of data collection where the researcher does not become a part of the group under observation.
Non-Comparative Scaling Techniques The non-comparative scales involve scaling of each object independently of the other objects.
Non-Parametric Tests Tests of significance used when certain assumptions about the usual
tests of significance are not valid or doubtful.
Non-Random/ Non- In this type of sampling scheme, the selection of units is subjective
Probability Sampling and not based on any probability considerations.
Normative Exploratory Research Normative research of an exploratory nature.
Normative Research Usually conducted while developing a new product, service or system, to assess whether the desired objective/standard has been achieved.
Null Hypothesis An assumption or claim about a specific parameter
Qualitative/Categorical/Non-Metric Variables The variables which cannot be quantified into some numeric value, and on which arithmetic calculations like addition, subtraction, etc. cannot be performed.
Qualitative Factor A factor not measured numerically, e.g. gender, colour, location.
Qualitative Research Research in which the researcher is immersed in the phenomenon to be studied, gathering data which provide a detailed description of events, situations and interactions between people and things, providing depth and detail. The researcher explores relationships using textual, rather than quantitative, data. A case study is considered a form of qualitative research. Results are not usually considered generalisable, but are often transferable.
Quantitative/Numeric/Metric Variables The variables which can be represented with some numeric value, and on which arithmetic calculations like addition, subtraction, etc. can be performed.
Quantitative Factor A factor measured on a numerical scale, like discount in price.
Quantitative Research Research based on quantitative data relating to measurement, counting, frequency of occurrences and other statistical analysis, in which the researcher explores relationships using numeric data. A survey is generally considered a form of quantitative research. Results can often be generalised, though this is not always the case.
Quasi-experiment A scientific research method primarily used in the social sciences.
“Quasi” means likeness or resembling, and therefore quasi-experiments
share characteristics of true experiments which seek interventions or
treatments.
Questionnaire A set of questions asked to individuals to obtain useful information
about a given topic of interest.
Quota Sampling Such sampling is sometimes considered a type of purposive sampling. It is usually resorted to when some quota for the number of units to be included in the sample is fixed.
Random/Probability Sampling A sampling procedure in which each and every unit of the population has some predefined probability of being selected in the sample.
Randomised Block Design Such an experimental design involves the study of two or more factors. One experimental unit from each of the blocks, say ‘n’ in number, is assigned to each of the, say, ‘m’ treatments. Thus, each of the ‘n’ blocks has ‘m’ treatments.
Rank Order Scaling Rank order scaling technique prompts respondents to rank a given
list of objects
Rank Sum Sum of ranks
Ratio Scale A quantitative scale with a fixed or true zero.
Refining Problem This is a stage of research process where the problem is redefined by
further investigation in the research study.
Regression Analysis Study of the relationship among two or more variables
Relational Hypotheses Such hypotheses are concerned with studying or analysing relationship
or correlation between two variables (characteristics) or among more
than two variables (characteristics) of a population
Relative Percentage of a Discriminant Function Equals a function’s eigenvalue divided by the sum of the eigenvalues of all discriminant functions in the model. Thus it is the percent of discriminating power for the model associated with a given discriminant function.
Reliability Reliability indicates the confidence one could have in the measurement
obtained with a scale. It tests how consistently a measuring instrument
measures a given characteristic or concept.
Request for Proposal A document relating to a project that is prepared and issued by the
(RFP) promoter, for competitive bidding by the interested agencies.
Research An organised, systematic, data-based scientific inquiry or investigation into a specific problem, undertaken with the purpose of finding answers or solutions to it.
Research Design A comprehensive plan of the sequence of operations that a researcher
intends to carry out to achieve the objectives of a research study. It
provides the conceptual structure or blue print for the conduct of the
research study.
Research Methodology “It is the analysis of the principles of methods, rules and postulates
used in a field of study.”
“It encompasses the systematic study of methods that are useful in a
field of study."
Research Process The methodology of conducting a research assignment/project/study in a scientific and systematic manner. The step-by-step scientific process that is followed to conduct research.
Research Proposal It encompasses the methodology of conducting the research to solve
the formulated research problem.
Research Questions/Issues The questions that are selected by the researcher for further analysis, out of the several management questions.
Response It is the dependent variable of interest
Run A succession of values with the same sign or type (in case of qualitative data).
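For example, the sign sequence + + - - - + + + - contains four runs. Counting runs is straightforward (a Python sketch; the sequence is illustrative):

```python
from itertools import groupby

signs = ['+', '+', '-', '-', '-', '+', '+', '+', '-']
# consecutive equal signs collapse into one group; each group is one run
runs = sum(1 for _sign, _group in groupby(signs))
print(runs)  # 4 runs: ++, ---, +++, -
```

The number of runs is the statistic used in the runs test for randomness.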
Sample One or more units, selected from a population according to some
specified procedure.
Sampling Frame Assigning a unique number to each and every unit of the population.
Scatter Diagram Values of x and y depicted with the help of the rectangular co-ordinate
system, by plotting the observed pairs of values of x and y.
Scheduling It implies indicating the starting and completion times for all the
activities.
Scientific Research The research conducted in Science subjects such as Physics, Chemistry,
Biology, etc., or research study conducted using scientific process.
Scree Plot A plot of eigenvalues against the factors in the order of their extraction.
Secondary Data The data which is disseminated through some media
Secondary Research Reports/Publications analyzing and evaluating data collected by
others
Semantic Differential Scale This scale provides a measure of the psychological meaning of an attitude or an object, using bipolar adjectives. The respondents mark in the blank spaces provided between the two adjectives, indicating how they would best rate the object. Commonly this is rated on a 7-point scale.
Semi-Structured Interviews This method is used when the researcher asks the respondent some basic questions, and then lets the respondent answer, intervening whenever necessary. In this method, the interviewer sets some guidelines for the questions to be asked. The succeeding questions are generally based on the preceding ones.
Signed Rank The ranks assigned to observations are attached a sign viz. + or –
Significant The difference between the sample value and the population value is said to be significant if the calculated statistic falls in the rejection or critical region.
Similarity/Distance Coefficient Matrix A matrix containing the pairwise distances between the cases.
Simple Category Scales This scale is also termed as a dichotomous scale. It offers two mu-
tually exclusive response choices, typically a ‘Yes’ or ‘No’ type of
response.
Simple Random Sampling A sampling procedure in which each and every unit of the population has an equal probability of being selected in the sample.
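A sketch of drawing such a sample (Python; the population of 100 numbered units is hypothetical):

```python
import random

random.seed(7)                          # fixed seed for a reproducible illustration
population = list(range(1, 101))        # units numbered 1 to 100
sample = random.sample(population, 10)  # 10 distinct units, chosen without bias
```

`random.sample` selects without replacement, so every subset of 10 units is equally likely.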
Simple Regression Study of the relationship between two variables
Analysis
Snowball Sampling/ Chain In such sampling, the sampling units are not fixed in advance but are
Referral Sampling decided as the sampling proceeds.
Social/Behavioural Refers to research conducted by social and behavioural researchers
Research into sociology, political science, behavioural science, education, etc.
Spurious Correlation Correlation observed when two variables change due to different factors, and not due to interdependence on each other.
Rank Correlation Association between two variables where data is given in the form of
the ranks of two variables based on some criterion.
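When there are no tied ranks, this coefficient (Spearman's rho) equals 1 - 6*sum(d^2)/(n(n^2 - 1)), where d is the difference between the two ranks of each unit. A Python sketch with hypothetical marks:

```python
def spearman_rho(x, y):
    # rank each series (assumes no ties), then apply
    # rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

marks_test1 = [56, 75, 45, 71, 62]   # hypothetical marks in two tests
marks_test2 = [66, 70, 40, 60, 65]
rho = spearman_rho(marks_test1, marks_test2)
print(rho)  # 0.6
```

A value of +1 indicates perfect agreement of the rankings, and -1 perfect disagreement.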
Standard Error Standard Deviation of an Estimate from a Sample
Standardised Discriminant Also termed as standardised canonical discriminant function coef-
Coefficients ficients. Used to compare the relative importance of the independent
variables, like beta weights in regression.
Unbiased Estimator A property which implies that its expected value is equal to the population value.
Unforced-Choice Rating This allows participants to express no opinion when they are unable
Scale to make a choice among the alternatives offered.
Unstructured Interviews Such interviews allow the interviewer to get opinions and get a feel
of general attitudes of the respondents. In this type of interviewing
method, the questions asked are not structured, and different questions
may be asked to different participants.
Validation Process of checking and correcting data for data entry errors. Also to
ensure that the data has been collected as per the specifications in the
prescribed format or questionnaire.
Validity The degree to which a study accurately reflects or assesses the specific
concept that the researcher is attempting to measure.
Validity of a Measuring Indicates the extent to which an instrument/scale tests or measures
Instrument what it is intended to measure.
Variable Observable characteristic that varies among individuals/items/ units/
entities, etc.
Wilks' lambda Used to test the significance of the discriminant function as a whole.
The "Sig." level for this function is the significance level of the dis-
criminant function as a whole
Some Other Useful Books
on Business Research
Methodology
Aczel Amir D. and Jayavel Sounderpandian: Complete Business Statistics, Tata McGraw-Hill Pub-
lishing Company Ltd (2006) Sixth Edition.
Anderson, David R., Sweeney Dennis J and Williams, Thomas A : Statistics for Business and Eco-
nomics. Thomson Asia Pte. Ltd (2002) Eighth Edition.
Anderson, T.W.: An introduction to Multivariate Statistical Analysis, John Wiley & Sons Inc. (2003)
Third Edition.
Beri, G.C.: Business Statistics, Tata McGraw-Hill Publishing Company Ltd., (2005), Second Edition.
Cochran, William G and Cox, G.M.: Experimental Designs, John Wiley & Sons, Inc. (1977) Third
Edition
C.R. Kothari: Research Methodology—Methods and Techniques, New Age Publications (2002)
Second Revised Edition.
Donald R. Cooper and Pamela S. Schindler: Business Research Methods, Tata McGraw-Hill Pub-
lishing Company Ltd. (2009), Ninth Edition.
Fred N. Kerlinger and Howard B. Lee: Foundations of Behavioural Research, Harcourt College
Publishers (2000), Fourth Edition.
Frye, Microsoft® Excel Version 2002 Step By Step, Microsoft Press (2002)
Guy Hart-Davis: How to do Everything with Microsoft Excel 2007, Tata McGraw-Hill Ltd. (2007)
Joseph F. Hair, Jr., William C. Black, Barry J. Babin, Rolph E. Anderson and Ronald L. Tatham: Multivariate Data Analysis, Pearson Prentice Hall (2006), Sixth Edition.
Hastings NAJ and Peacock JB: Statistical Distributions, London Butterworth (1975)
Johnson Richard A. and Wichern Dean W.: Business Statistics – Decision Making with Data, John
Wiley & Sons, Inc. (2003)
Kanji, Gopal K.: 100 Statistical Tests, SAGE Publications Ltd., (1999), New Edition
Lee Cheng F., Lee John and Lee, Alice C L: Statistics for Business and Financial Economics, World
Scientific Publishing Company Pte. Ltd (2000) Second Edition
Levin Richard I. and Rubin David S.: Statistics For Management, Prentice Hall of India Pvt. Ltd.,
(2002) Seventh Edition.
Malhotra, Naresh K.: Marketing Research—An Applied Orientation, Pearson Prentice Hall (2006)
Fifth Edition
Murray R. Spiegel: Schaum’s Outline Series—Theory and Problems of Statistics, McGraw-Hill
Book Company (1972)
Pande Peter S, Neuman Robert P and Cavanaugh Ronald R: The Six Sigma Way, Tata McGraw-Hill
(2003)
Paul McFedries: Formulas and Functions with Microsoft Office Excel 2007, Pearson Education
Asia (2007)
Rajendra Nargundkar: Marketing Research – Text and Cases, Tata McGraw-Hill Publishing Company Ltd. (2009) Third Edition.
Richard A. Johnson and Dean W. Wichern: Applied Multivariate Analysis, Pearson Prentice Hall
(2007) Sixth Edition.
Sharma Subhash: Applied Multivariate Techniques, John Wiley & Sons (1996) First Edition
Siegel Sidney and Castellan, Jr. N. John: Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill International Edition (1988).
Uma Sekaran: Research Methods for Business–A Skill Building Approach, John Wiley and Sons
(2003) Fourth Edition.
Vohra N.D: Quantitative Techniques in Management, Tata McGraw-Hill (2007) Third Edition
William G. Zikmund: Business Research Methods, Thomson South Western (2006) Seventh
Edition.
Yule G U and Kendall M.G: An Introduction to the Theory of Statistics, Charles Griffin & Co. Ltd.,
(1973) Eleventh Edition.
Statistical Tables
Areas under the Standard Normal Curve from 0 to z
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0753
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2258 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2518 .2549
0.7 .2580 .2612 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2996 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 .4993 .4993
3.2 .4993 .4993 .4994 .4994 .4994 .4994 .4994 .4995 .4995 .4995
3.3 .4995 .4995 .4995 .4996 .4996 .4996 .4996 .4996 .4996 .4997
3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4998
3.5 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998
3.6 .4998 .4998 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.7 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.8 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.9 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000
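The tabled values are areas under the standard normal curve between 0 and z. Individual entries can be reproduced from the error function in Python's standard library (a sketch, useful for checking a value when the table is not at hand):

```python
import math

def area_0_to_z(z):
    # P(0 <= Z <= z) for a standard normal Z:
    # Phi(z) = (1 + erf(z / sqrt(2))) / 2, so the tabled area is Phi(z) - 0.5
    return math.erf(z / math.sqrt(2)) / 2

print(round(area_0_to_z(1.0), 4))  # 0.3413, the entry in row 1.0, column .00
```

For example, area_0_to_z(1.96) is approximately 0.4750, the familiar 95% two-tailed value.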
5% Points of the F-Distribution (n1: degrees of freedom for the numerator, columns; n2: degrees of freedom for the denominator, rows)
n2\n1 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 241.9 243.9 245.9 248.0 249.1 250.1 251.1 252.2 253.3 254.3
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40 19.41 19.43 19.45 19.45 19.46 19.47 19.48 19.49 19.50
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.36
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.28 3.22 3.15 3.12 3.08 3.04 3.01 2.97 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.91 2.85 2.77 2.74 2.70 2.66 2.62 2.58 2.54
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.40
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.13
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.01
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.96
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.84
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.81
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.78
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27 2.20 2.13 2.05 2.01 1.96 1.91 1.86 1.81 1.76
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.73
25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.71
Table T4 (Contd.) ‘F’-Distribution-Values of ‘F’
Level of Significance α = 0.01
n1 = degrees of freedom for numerator
n1 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
n2
1 4052 4999.5 5403 5625 5764 5859 5928 5982 6022 6056 6106 6157 6209 6235 6261 6287 6313 6339 6366
2 98.50 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39 99.40 99.42 99.43 99.45 99.46 99.47 99.47 99.48 99.49 99.50
3 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.35 27.23 27.05 26.87 26.69 26.60 26.50 26.41 26.32 26.22 26.13
4 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.55 14.37 14.20 14.02 13.93 13.84 13.75 13.65 13.56 13.46
5 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 10.05 9.89 9.72 9.55 9.47 9.38 9.29 9.20 9.11 9.02
6 13.75 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.72 7.56 7.40 7.31 7.23 7.14 7.06 6.97 6.88
7 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62 6.47 6.31 6.16 6.07 5.99 5.91 5.82 5.74 5.65
8 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.67 5.52 5.36 5.28 5.20 5.12 5.03 4.95 4.86
9 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26 5.11 4.96 4.81 4.73 4.65 4.57 4.48 4.40 4.31
10 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85 4.71 4.56 4.41 4.33 4.25 4.17 4.08 4.00 3.91
11 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54 4.40 4.25 4.10 4.02 3.94 3.86 3.78 3.69 3.60
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30 4.16 4.01 3.86 3.78 3.70 3.62 3.54 3.45 3.36
13 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10 3.96 3.82 3.66 3.59 3.51 3.43 3.34 3.25 3.17
14 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94 3.80 3.66 3.51 3.43 3.35 3.27 3.18 3.09 3.00
15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.67 3.52 3.37 3.29 3.21 3.13 3.05 2.96 2.87
16 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69 3.55 3.41 3.26 3.18 3.10 3.02 2.93 2.84 2.75
17 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59 3.46 3.31 3.16 3.08 3.00 2.92 2.83 2.75 2.65
18 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 3.51 3.37 3.23 3.08 3.00 2.92 2.84 2.75 2.66 2.57
19 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.30 3.15 3.00 2.92 2.84 2.76 2.67 2.58 2.49
20 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37 3.23 3.09 2.94 2.86 2.78 2.69 2.61 2.52 2.42
21 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 3.31 3.17 3.03 2.88 2.80 2.72 2.64 2.55 2.46 2.36
22 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 3.12 2.98 2.83 2.75 2.67 2.58 2.50 2.40 2.31
23 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21 3.07 2.93 2.78 2.70 2.62 2.54 2.45 2.35 2.26
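Each entry above is the value cut off by the upper tail of the F distribution at the stated level of significance. The entries can be checked numerically by integrating the F density; a rough stdlib-Python sketch (a verification aid, not how the table was produced):

```python
from math import lgamma, log, exp

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_b = lgamma(d1 / 2) + lgamma(d2 / 2) - lgamma((d1 + d2) / 2)
    log_pdf = ((d1 / 2) * log(d1 / d2) + (d1 / 2 - 1) * log(x)
               - ((d1 + d2) / 2) * log(1 + d1 * x / d2) - log_b)
    return exp(log_pdf)

def f_cdf(x, d1, d2, steps=20000):
    """P(F <= x) by simple trapezoidal integration (adequate for spot checks)."""
    h = x / steps
    total = 0.5 * (f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2))
    for i in range(1, steps):
        total += f_pdf(i * h, d1, d2)
    return total * h

# Tabulated critical values for n1 = 2, n2 = 10 should leave about 5% and 1%
# in the upper tail, so the CDF at those points should be about 0.95 and 0.99
p05 = f_cdf(4.10, 2, 10)
p01 = f_cdf(7.56, 2, 10)
```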
Critical Values of Spearman’s Rank Correlation Coefficient
Level of Significance
n 0.10 0.05 0.025 0.01
4 0.8000
5 0.8000 0.9000 0.9000
6 0.7714 0.8286 0.8857 0.9429
7 0.6786 0.7450 0.8571 0.8929
8 0.6190 0.7143 0.8095 0.8571
9 0.5833 0.6833 0.7667 0.8167
10 0.5515 0.6364 0.7333 0.7818
11 0.5273 0.6091 0.7000 0.7455
12 0.4965 0.5804 0.6713 0.7273
13 0.4780 0.5549 0.6429 0.6978
14 0.4593 0.5341 0.6220 0.6747
15 0.4429 0.5179 0.6000 0.6536
16 0.4265 0.5000 0.5824 0.6324
17 0.4118 0.4853 0.5637 0.6152
18 0.3994 0.4716 0.5480 0.5975
19 0.3895 0.4579 0.5333 0.5825
20 0.3789 0.4451 0.5203 0.5684
21 0.3688 0.4351 0.5078 0.5545
22 0.3597 0.4241 0.4963 0.5426
23 0.3518 0.4150 0.4852 0.5306
24 0.3435 0.4061 0.4748 0.5200
25 0.3362 0.3977 0.4654 0.5100
26 0.3299 0.3894 0.4564 0.5002
27 0.3236 0.3822 0.4481 0.4915
28 0.3175 0.3749 0.4401 0.4828
29 0.3113 0.3685 0.4320 0.4744
30 0.3059 0.3620 0.4251 0.4665
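A computed Spearman coefficient is judged against the critical value for the given n and level of significance. A minimal sketch with hypothetical rankings (the data below are illustrative, not from an exercise):

```python
def spearman_rho(x, y):
    """Spearman's rank correlation via rho = 1 - 6*sum(d^2)/(n(n^2 - 1)),
    assuming no tied ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical rankings of 8 candidates by two executives
rho = spearman_rho([1, 2, 3, 4, 5, 6, 7, 8], [2, 1, 4, 3, 6, 5, 8, 7])
# Compare with the tabulated critical value for n = 8 at the 0.05 level (0.7143)
significant = rho > 0.7143
```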
Table T8 Acceptance Region for Values of ‘R’ in the RUN Test for Randomness
Table T9 Minimum (Critical) Value of ‘T’ for Signed Rank Test for Mean
k
n 3 4 5
4 6.6 9.4 15.1
5 7.4 10.5 13.6
6 8.1 11.5 14.9
8 9.4 13.3 17.3
10 10.5 14.8 19.3
ANSWERS TO EXERCISES
Chapter 9
1. CV = 109.1143. It is a measure of disparities in GDP among various countries and remains unchanged at 109.1143.
2. Total sales (2005) = 80.4 Crores. Total sales (2006) = 111.2 Crores. Growth: 38.3%.
3. Median = 76,075, IQR = 13,366, Sample Std Dev = 13,582.67, Coefficient of Variation = 20%
4. (i) 68.58 (ii) 67.1% (iii) 7855.82 (iv) 30.86%
5. Jyoti: Mean = 80% CV = 9.048481
Anuj : Mean = 79% CV = 3.10062
Their average is about the same but Anuj has more consistent rating than Jyoti.
6. 18.92%
7. While the mean has reduced by 1.14 days, the s.d. has slightly increased by 0.48. Consequently, the CV has increased, i.e. variability has increased by about 1%.
8. Mean = 10.25, Median = 9.00, Coefficient of Variation = 57%
ROI 12%, Rating: 66.33%
9. Average Rate of Interest or Cost of Funds = 5.366%
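The coefficient-of-variation comparisons in answers 1, 5 and 7 follow one pattern: the series with the smaller CV is the more consistent. A minimal sketch with illustrative figures (not the exercise data):

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV = (sample standard deviation / mean) * 100."""
    return stdev(data) / mean(data) * 100

# Two hypothetical rating series with the same mean but different spread
jyoti = [70, 75, 80, 85, 90]
anuj = [78, 79, 80, 81, 82]
more_consistent = ("Anuj" if coefficient_of_variation(anuj)
                   < coefficient_of_variation(jyoti) else "Jyoti")
```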
Chapter 10
1. (a) Net profit = 983.4 + 0.055 Net Sales, r = 0.3663, r2 = 0.1342
(b) Net sales = 7818.07 + 3.02 P/E Ratio, r = 0.0231, r2 = 0.0005.
2. (i) 0.8909 (ii) 0.7576 (iii) 0.8667
3. (a) (i) 0.104378 (ii) 0.146146 (iii) 0.073593
(b) (i) Beta of ICICI stock = 1.0273
This implies that the ICICI Stock is 2.73% more aggressive than BSE
(ii) Beta of Reliance Industries stock = 0.8523
This implies that the Reliance Industries Stock gives 85.23 % returns if the return on BSE is
100%.
(iii) Beta of L & T stock = 0.981
This implies that the L & T Stock gives 98.1 % returns if the return on BSE is 100%.
4. Sales Revenue = 60.94737 + 4.302632 Advertising Expenses
Sales Revenue = 190.0263 Crores for Advertising expenses = 30 Crores.
Answers to Exercises AN.3
5. r = 0.685315.
It is desirable that the correlation be high for consistency in rankings by the two executives. Since it is only moderate, they may not be able to recruit the right candidates.
6. (a) Profit = 6.5625 + 121.875 Expenditure on R&D
(b) 128.4375 (c) 0.97745 (d) 95.54%
7. r = 0.818182. Only 67% of variation in deposits is explained by customer satisfaction. The bank might be getting deposits due to other reasons as well, such as higher rates on deposits, confidence reposed by customers, flexibility in deposit schemes, etc.
8. (i) Demand = 188.522 – 7.33 Price (ii) 78,560 units (iii) 90.86%
9. (a) Marks = 1 + 0.654 I.Q. (b) 79.515 (c) 0.9014
(d) 81.25%
10. r = 0.6232 r2 = 0.3884. Thus, only about 39% of variation in ‘Score on Job’ is explained by
‘Academic Score’
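The regression answers above all come from the least-squares formulas. A self-contained sketch (with hypothetical x/y pairs chosen to lie exactly on a line, so the arithmetic is easy to check):

```python
def simple_regression(x, y):
    """Least-squares fit y = a + b*x and the correlation coefficient r."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(u * v for u, v in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    r = (n * sxy - sx * sy) / ((n * sxx - sx * sx) * (n * syy - sy * sy)) ** 0.5
    return a, b, r

# Hypothetical advertising expenses (x) and sales revenue (y)
a, b, r = simple_regression([10, 20, 30, 40], [100, 190, 280, 370])
# The points lie exactly on y = 10 + 9x, so r = 1 and the explained
# variation r^2 is 100%
```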
Chapter 11
1. 0.7 ± 0.0421 = (0.6579, 0.7421)
2. Sample size = 42
3. H0 : μ = 1000 ml
H1 : μ ≠ 1000 ml
Test Statistic z = – 4. z tabulated = – 2.576. Reject H0. The machine does not fill 1000 ml.
4. H0 : μ = 15.6
H1 : μ ≠ 15.6
Test Statistic z = – 1.93, z tabulated = ± 2.576. Do not reject H0. The mean breaking strength of the lot could be 15.6.
5. H0 : μ = 2000
H1 : μ < 2000
Test Statistic z = –2.5. z tabulated = –2.05. Reject H0. The population mean is less than 2000 hrs.
6. H0 : p = 0.90
H1 : p < 0.90
Test Statistic z = – 4.714. z tabulated = – 1.645. Reject H0. The claim that the medicine is effective for 90% of people is not justified.
7. H0 : μ1 = μ2
H1 : μ1 ≠ μ2
Test Statistic = –3.95. Reject H0. There is a significant difference between the two brands of gloves.
8. H0 : μ1 = μ2
H1 : μ1 > μ2
Test Statistic = 6.3246. Reject H0. The process is effective in reducing time.
9. H0 : μ1 = μ2
H1 : μ1 < μ2
Test Statistic = 2.1921. Reject H0. The additive has increased the mileage of cars.
10. 90% : (29.013,30.987), 95% : (28.824,31.176), 99% : (28.4545, 31.5455)
11. H0 : p1 = p2
H1 : p1 < p2
Test Statistic = 1.572. z tabulated = 1.645. Do not reject H0. There is no significant improvement due to the sponsorship of the movie.
12. H0 : d = 0
H1 : d ≠ 0
Test Statistic = 30.98. Reject H0. There is a significant change in the perception of the product’s effectiveness.
13. Test statistic χ² = 2.9946. Do not reject the hypothesis that they are independent.
14. Test statistic χ² = 10.302. Reject the hypothesis that approval/rejection is independent of the officer.
15. H0 : σ1² = σ2²
H1 : σ1² ≠ σ2²
F = 1.01. Do not reject H0. There is no significant difference in the variances of the service times of the two operators.
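The z-tests above all share the same computation. A minimal sketch; the sample figures below are hypothetical, chosen so that z = –4 as in answer 3:

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic for H0: mu = mu0 with known (or large-sample) s.d."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Does the machine fill 1000 ml on average? (hypothetical sample figures)
z = z_statistic(sample_mean=998, mu0=1000, sigma=5, n=100)
reject_h0 = abs(z) > 2.576  # two-tailed test at the 1% level
```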
Chapter 12
1. ANOVA Table
Source SS df MS F Fcritical (α = 5%) Result
Between Card Types 1203.33 2 601.67 18.513 3.8853 Reject
Within Card Types 390 12 32.5
Total 1593.33 14
There is a difference in the billing amounts of the different cards. This implies that the amount spent depends on the type of card.
2. ANOVA Table
Source SS df MS F Fcritical (α = 5%) Result
Between Institutes 33.617 2 16.808 1.2865 3.8853 Do not reject
Within Institutes 156.783 12 13.065
Total 190.4 14
5. ANOVA Table
Source SS df MS F Fcritical (α = 5%) Result
Between Cities 1845.5 3 615.17 95.252 3.4903 Reject
Within Cities 77.5 12 6.4583
Total 1923 15
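Each ANOVA table above partitions the total sum of squares into between-group and within-group parts. A minimal sketch of the computation, with illustrative data (not the exercise data):

```python
def one_way_anova(groups):
    """One-way ANOVA: returns (SS between, SS within, df between, df within, F)."""
    all_obs = [v for g in groups for v in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_b, df_w = k - 1, n - k
    f = (ss_between / df_b) / (ss_within / df_w)
    return ss_between, ss_within, df_b, df_w, f

# Hypothetical billing amounts for three card types
ssb, ssw, dfb, dfw, f = one_way_anova([[10, 12, 14], [20, 22, 24], [30, 32, 34]])
# Here SSB = 600 and SSW = 24, so F = (600/2)/(24/6) = 75, far above the
# 5% critical value with (2, 6) d.f.
```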
Chapter 13
1. Critical values of ‘R’ for number of ‘+’ signs = 17 and number of ‘–’ signs = 13, as per Table T11, are 10 and 22. Since R = 16 lies in the acceptance region, accept the hypothesis that the pattern of returns is random.
2. H0 : μ = 12, H1 : μ ≠ 12
T = Min. (11, 55) = 11
Minimum (critical) value of ‘T’ at α = 0.05 from Table T12 is 8. Since T (= 11) > 8, accept H0 that the mean is equal to 12.
3. Tabulated value of ‘D’ = 0.0813. Since Max ‘D’ (= 0.0107) < tabulated value of ‘D’, H0 is accepted, and the data follows a Poisson distribution.
4. Tabulated value of ‘D’ = 0.136. Since Max ‘D’ (= 0.02) < tabulated value of ‘D’, H0 is accepted, i.e. the percentage of MBA girl students is the same in all five Institutes.
5. ‘U’ (for Company Recruited Officers) = 10 × 10 + (10 × 11)/2 – 130.5 = 24.5
‘U’ (for Market Recruited Officers) = 10 × 10 + (10 × 11)/2 – 79.5 = 75.5
Tabulated value of ‘U’ with n1 = n2 = 10 and α = 0.05 is 23. Since Minimum (24.5, 75.5) = 24.5 > 23, reject H0. Thus, scores of company recruited and market recruited officers are not equal.
6. ‘U’ = 10 × 10 + (10 × 11)/2 – 82.5 = 72.5
‘U’ = 10 × 10 + (10 × 11)/2 – 127.5 = 27.5
Tabulated value of ‘U’ with n1 = n2 = 10 and α = 0.05 is 23.
Since Minimum (72.5, 27.5) = 27.5 > 23, reject H0. Thus, the lives of the two types of batteries are not equal.
7. Data about daily rates of returns and their ranks among the three companies
Date BSE ICICI Bank Reliance Industries L&T
6/3/2006 — — — —
7/3/2006 –0.093 –2.047 –0.007 3.331
8/3/2006 –2.014 –1.682 –1.694 –2.181
9/3/2006 0.619 1.897 1.008 0.121
10/3/2006 1.806 1.853 0.75 2.878
13/3/2006 0.362 –1.599 0.027 0.939
16/3/2006 0.685 0.73 4.936 –1.139
17/3/2006 –0.165 –0.37 0.826 –1.619
20/3/2006 0.746 0.025 0.174 –0.215
21/3/2006 –0.329 –1.255 0.528 0.187
Assigning rank 1 to lowest rate of return (–2.181), and 27 to highest rate (4.936)
Sum of ranks for ICICI = 105
Sum of ranks for Reliance Industries = 147
Sum of ranks for L&T = 126
Calculated value of ‘H’ = 8.14 > 5.99 (tabulated value of χ² at α = 0.05)
Therefore, reject the null hypothesis that the daily rates of return for the three companies are equal.
8. Since the tabulated value of ‘F’ at (27, 9) d.f. (2.88) is less than the calculated value of ‘F’ (4.09), reject equality of ranks of companies on the three parameters.
9. From the table of net differences in rank sums of the pairs of the four indices, it is observed that none of the differences is significant, i.e. greater than the critical value of the difference in rank sums for 4 indices, 11 observations per index, and a 5% level of significance. Thus, there is no significant difference in the returns on BSE 30, BSE 100, BSE 200 and BSE 500.
10. Since the calculated value of ‘F’ (2.9) < 7.815 (tabulated value of χ² at a 5% level of significance and 3 d.f.), there is no significant difference among the daily rates of return on the four BSE indices.
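The ‘U’ values in answers 5 and 6 come from the Mann-Whitney formula U = n1·n2 + n2(n2 + 1)/2 – R, where R is a rank sum. A sketch using the figures of answer 6:

```python
def mann_whitney_u(rank_sum, n1, n2):
    """U = n1*n2 + n2*(n2 + 1)/2 - R, the form used in answers 5 and 6."""
    return n1 * n2 + n2 * (n2 + 1) / 2 - rank_sum

# Answer 6: rank sums 82.5 and 127.5 with n1 = n2 = 10
u1 = mann_whitney_u(82.5, 10, 10)   # 72.5
u2 = mann_whitney_u(127.5, 10, 10)  # 27.5
# The two U values always sum to n1*n2, a useful arithmetic check
total = u1 + u2  # 100
```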
INDEX

Customer Driven Research, 1.8
Data Validation, 2.19
Deductive and Inductive Logic, 2.9
— Approach, 2.9
Definition and Wording of a Hypothesis, 3.7
Definitions of Research, 1.4
Dependent Variable, 2.7
Descriptive Hypotheses, 3.9
— Research, 1.18, 1.19
— Research Design, 4.4
Design of Experiments (DOE), 4.7
Developing or Setting up Hypotheses, 3.10
Discriminant Analysis Using SPSS, 14.39
Discriminant Analysis, 14.33
— Function, 14.33, 14.34
— Variable, 14.33
Divergent Thinking, 2.38
Dummy Variable, 14.15
— Activities, 2.34
Earliest Expected Time (TE), 2.31
Editing Data, 7.8
E-mail Survey, 7.4
Empirical Research, 1.12
ERP/Data Warehouses and Mining, 6.12, 6.13
Errors in Measurements, 5.10
Estimation of Multiple Regression Equation and Calculation of Multiple Correlation Coefficient, 14.7
Ethics—Definitions and Norms, 16.3
Ethical Issues at Various Levels of Research Process, 16.8
Ethical Issues in Business Research, 16.4
— Norms for Professionals, 16.3
— Standards in Qualitative and Quantitative Research, 16.6
Ethical Obligations for Researchers, 16.7
Ethical Obligations for Respondents, 16.7
Experimental Designs, 4.7
Explained Variation, 10.20
Explanatory Variable, 2.7, 4.4
— Causal/Relational, 4.4
External Research, 3.24
— Criteria for Selection of an External Research Agency, 3.25
— Limitations of External Research, 3.24
External Validity, 4.5, 4.7
Extraneous Variables, 2.8
Factor Analysis, 14.66
Factor Analysis on Data Using SPSS, 14.71
Faculty Sponsored Research, 3.26
Features of a Good Statistical Average, 9.16
Fisher’s Least Significant Difference (LSD) Test, 12.11
Five Number Summary, 9.17
Flow Chart for Conducting Research, 3.20
Forced Ranking, 5.15
Forced versus Unforced Scales, 5.25
Forced-Choice Scale, 5.25
Formats for Various Types of Reports for Different Types of Research Studies, 15.5
Formulating Research Problem, 3.4
Friedman’s Test, 13.18
Generalised Regression Model, 14.24
— Assumptions for the Multiple Regression Model, 14.24
Geometric Mean, 9.8
Glimpses of Past Research, 1.2
Goal Setting for a Research Project, 2.24
Government/Corporate Sponsored Research, 3.26
Graphic Rating, 5.17
Graphs as Management Tool, 8.1
Guidance for Good Business Research, 3.27
Guide to Conducting Research Projects by Students, 1.26
Guidelines for Deciding Scales, 5.23
Haphazard Sampling, 4.30
Harmonic Mean, 9.9
Histogram/Frequency Polygon, 8.10
Historical Research, 1.15
H-test, 13.16
Human Behaviour and Preferences, 2.5
Hypothesis Development, 3.6
Idea Room, 2.40
Identifying Research Problem, 3.3
Illustration of a PERT Chart, 2.28
Independent and Dependent Jobs/Activities, 2.31
Independent Variable, 2.7
Individual Attributes, 14.130
Individual/Group Sponsored Research, 3.26, 16.6
Internal and External Research, 3.21, 3.22
Internal Validity, 4.5
Internet/Web, 6.13
— Searching Databases/WebPages, 6.15
— Some of the Important Websites, 6.14
Inter-Quartile Range (IQR), 9.18
Interval Estimation, 11.4
Single/Multiple Category Scales, 5.21
Size of a Sample, 4.24
Slack or Cushion Time for Jobs, 2.32
Snowball Sampling, 4.31
Social Factors, 1.8
Social/Behavioural Research, 1.14
Spearman’s Rank Correlation, 10.11
Specific Observations, 2.11
Sponsoring Research, 3.25, 16.4
Spurious Correlation, 10.10
Standard Deviation, 9.19
— Error of Estimator, 10.20
Standardised Marks, 9.26
— Score, 9.25
— Variable, 9.25
Stapel Scales, 5.21
Statistical Analysis Based on Scales, 5.8
Statistical Inference, 11.2
Statistics for Management, 2.14
Stem and Leaf Diagram, 8.3
Steps for Conducting Tests of Significance for Mean, 11.21
Steps for Monte Carlo Simulation, 4.35
Stepwise Method for Entering Variables, 14.31
Stratified Sampling, 4.28
Structured Interviews, 6.9
Subdivided Bar Chart, 8.6
System Simulation, 4.35
Systematic Sampling, 4.27
Test for Regression Model and Regression Coefficients, 14.19
Tests for Measuring Goodness of a Discriminant Function, 14.37
Transaction Processing (TP) Systems or Enterprise Resource Planning (ERP) Systems, 6.12
Tukey’s Honestly Significant Difference (HSD) Test for Multiple Comparison of Means, 12.8
Two Way or Two Factor ANOVA, 12.12
Two-Way or Two-Factor ANOVA with Interaction, 12.14
Type-I Error, 11.15
Type-II Error, 11.15
Type-I and Type-II Errors from Indian Epics, 11.16
Types of Experimental Designs, 4.9
— Ex Post Facto Design, 4.18
— Factorial Designs, 4.15
— Latin Square Design, 4.14
— One-factor Experiment, 4.10
— Quasi-experimental Design, 4.16
— Two-factor Experiments, 4.11
— Two-factor Experiments with Interaction, 4.13
Types of Hypotheses, 3.8
Unbalanced Rating Scale, 5.24
Unforced-Choice Rating Scale, 5.25
Ungrouped (Raw) Data, 8.2
Unstructured Interviews, 6.10