Research Methodology
DCOM408/DMGT404
www.lpude.in
DIRECTORATE OF DISTANCE EDUCATION
RESEARCH METHODOLOGY
LPU is reaching out to the masses by providing an intellectual learning environment that is academically rich with the most
affordable fee structure. Supported by the largest University in the country, LPU, the Directorate of Distance Education (DDE)
is bridging the gap between education and the education seekers at a fast pace, through the usage of technology which
significantly extends the reach and quality of education. DDE aims at making Distance Education a credible and valued mode
of learning by providing education without a compromise.
DDE is a young and dynamic wing of the University, filled with energy, enthusiasm, compassion and concern. Its team strives
hard to meet the demands of the industry, to ensure quality in curriculum, teaching methodology, examination and evaluation
system, and to provide the best of student services to its students. DDE is proud of its values, by virtue of which it strives to
make an impact on the education system and its learners.
Through affordable education, online resources and a network of Study Centres, DDE intends to reach the unreached.
SYLLABUS
Research Methodology
Objectives: The general objective of this course is to introduce students to methods of research. The specific objectives are: To
develop understanding of the fundamental theoretical ideas and logic of research; To develop understanding of the issues
involved in planning, designing, executing, evaluating and reporting research; To introduce students to many of the technical
aspects of how to do empirical research using some of the main data collection and analysis techniques.
Sr. No.  Description
1.
2. Sampling Design: Steps of Sampling Design, Characteristics of Good Sampling Design, Different Types of
Sampling Design.
3. Measurement and Scaling Technique: Tools of Sound Measurement, Techniques of Developing Measurement
Tools, Meaning of Scaling and Important Scaling Techniques.
4. Data Collection: Primary (Interview, Observation and Questionnaire) and Collection of Secondary Data.
5. Data Analysis: Measures of Central Tendency, Dispersion, Correlation and Regression Analysis, Time Series
and Index Numbers.
6.
7. Multivariate Analysis: Classification, Important Methods of Factor Analysis, Factor Analysis, Rotation in
Factor Analysis, Overview of Cluster Analysis, Discriminant Analysis, Multi Dimensional Scaling, Conjoint
Analysis.
8. Report Writing: Technique and Precaution of Interpretation, Significance of Report Writing, Layout and Types
of Report.
CONTENTS
Unit 1: Introduction to Research
Unit 2: Research Problem
Unit 3: Research Design
Unit 4: Sampling Design
Unit 5:
Unit 6:
Unit 7: Secondary Data
Unit 8: Descriptive Statistics
Unit 9:
Unit 10: Time Series
Unit 11: Index Numbers
Unit 12: Hypothesis Testing
Unit 13: Multivariate Analysis
Unit 14: Report Writing
Statistical Tables
Notes
CONTENTS
Objectives
Introduction
1.1
Research Objectives
1.1.2
Marketing Research
1.2
Defining Research
1.3
Research Process
1.3.1
Problem Formulation
1.3.2
1.3.3
1.3.4
1.3.5
1.3.6
1.3.7
1.3.8
1.4
Types of Research
1.4.1
Exploratory Research
1.4.2
Descriptive Research
1.4.3
Applied Research
1.4.4
1.4.5
Conceptual Research
1.4.6
Causal Research
1.4.7
Historical Research
1.4.8
1.4.9
Action Research
1.4.10
Evaluation Research
1.4.11
Library Research
1.5
Summary
1.6
Keywords
1.7
Review Questions
1.8
Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
Research means a technical and organized search for relevant information on a particular topic. It
is defined as an academic activity that involves identifying the research problem, formulating a
hypothesis, collecting and analyzing data and reaching specific conclusions in the form of
solutions or general theories. The primary objective of research is to find solutions to problems
in a methodical and systematic way. The type of research depends on the field in which the work
is performed. Various types of research can be done in different fields, such as fundamental
research for identifying the important principles of the field and applied research for
solving an immediate problem. However, all research primarily follows one of two approaches,
quantitative or qualitative. The quantitative approach attempts to quantify phenomena by collecting and
analysing numerical data, while the qualitative approach is concerned with the quality, or nature, of the
phenomena studied.
1.
2.
3.
4.
1.
To Identify and Find Solutions to the Problem: To understand the problem in depth
Example: "Why is the demand for a product falling?" "Why is there a business fluctuation
once in three years?" By identifying the problem in this way, it becomes easy to collect the relevant data
to solve the problem.
2.
Example: Should we maintain the advertising budget same as last year? Research will
answer this question.
3.
To Find Alternative Strategies: Should we follow pull strategy or push strategy to promote
the product.
4.
Did u know? Most large banks have their own market research departments that evaluate
not only products, but their Brick and Mortar branch banking networks through which
most banking products are sold.
Self Assessment
Fill in the blanks:
1.
2.
3.
...................... methods are concerned with attempts to quantify social phenomena and
collect and analyse numerical data.
Self Assessment
Fill in the blanks:
4.
The purpose of research is to find solutions through the application of ...................... and
...................... methods.
5.
6.
Notes The research process itself involves identifying, locating, assessing, analyzing, and
then developing and expressing your ideas. These are the same skills you will need
outside the academic world when you write a report or proposal for your boss.
There are nine steps in the research process that can be followed while designing a research
project. They are as follows:
1.
2.
3.
4.
5.
Data collection
6.
7.
8.
9.
Defining the research problem and formulating the hypothesis are the hardest steps in the research
process.
2.
3.
4.
1.
Determine the objective: The objective may be general or specific. A general objective might be: "We would like to
know how effective the advertising campaign was."
This looks like a statement of an objective, but in reality it is far from one. There are two
ways of pinning down the objective precisely. (a) The researcher should clarify with the MR
manager what "effective" means. Does it mean awareness, an increase in sales, an improvement
in the audience's knowledge, or a change in the audience's perception
of the product? In each of these cases, the questions to be
asked of the audience vary. (b) Another way is to ask the MR
manager, "What action will be taken, given the specified outcome of the study?"
Example: If the research finds that the company's previous advertisement was indeed
ineffective, what course of action does the company intend to take? (a) Increase the budget for the next
ad (b) Use a different appeal (c) Change the media (d) Go to a new agency.
Caution: If the objectives are proper, the research questions will be precise. However, we should
remember that objectives do undergo change.
2.
Example: Assume that the company wants to introduce a new product like Iced tea or
frozen green peas or ready to eat chapathis.
The following are the environmental factors to be considered:
(a)
(b)
Who are the present competitors in the market with the same or a similar product?
(c)
What is the perception of the people about the other products of the company, with
respect to price and the image of the company?
(d)
All the above factors could influence the decision. Therefore, the researcher must work very
closely with the client.
3.
Nature of the problem: By understanding the nature of the problem, the researcher can
collect relevant data and help suggest a suitable solution. Every problem is related to
one or more variables. Before starting data collection, a preliminary investigation
of the problem is necessary for a better understanding of it. The initial investigation
could use a focus group of consumers or sales representatives.
If a focus group is carried out with consumers, some of the following questions will help the
researcher to understand the problem better:
(a)
Did the customer ever include this company's product in his mental map?
(b)
If the customer is not buying the company's product, what are the reasons?
(c)
(d)
4.
State the alternatives: It is better for the researcher to generate as many alternatives as
possible during problem formulation hypothesis.
Example: Whether to introduce a Sachet form of packaging with a view to increase sales. The
hypothesis will state that, acceptance of the sachet by the customer will increase the sales by
20%. Thereafter, the test marketing will be conducted before deciding whether to introduce
sachet or not. Therefore for every alternative, a hypothesis is to be developed.
Example: Company 'X' wants to launch a product. The company's intuitive feeling is that
the probability of product failure is 35%. However, if research is conducted and appropriate data
is gathered, the chance of failure can be reduced to 30%. The company has also calculated that the
loss would be 3,00,000 if the product fails. The company has received a quote from an MR agency: the
cost of research is 75,000. The question is, "Should the company spend this money to conduct
research?"
Solution:
Loss without research = 3,00,000 × 0.35
= 1,05,000
Loss with research = 3,00,000 × 0.30
= 90,000
Value of research information = 1,05,000 - 90,000
= 15,000
Since the value of the information (15,000) is lower than the cost of research (75,000),
conducting research is not recommended.
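The arithmetic in the Company 'X' example can be checked with a short Python sketch. The function names and the choice to pass probabilities as whole-number percentages are illustrative assumptions, not part of the original text.

```python
def expected_loss(loss_if_failure: int, failure_pct: int) -> float:
    """Expected loss = loss if the product fails x probability of failure."""
    return loss_if_failure * failure_pct / 100


def value_of_research(loss_if_failure: int, pct_without: int, pct_with: int) -> float:
    """Reduction in expected loss obtained from the research information."""
    return (expected_loss(loss_if_failure, pct_without)
            - expected_loss(loss_if_failure, pct_with))


# Figures from the Company 'X' example (3,00,000 is written here as 300_000).
loss = 300_000
research_cost = 75_000
value = value_of_research(loss, pct_without=35, pct_with=30)

# Research is worth buying only if the information is worth more than it costs.
worth_doing = value > research_cost

print(f"Value of research information: {value:,.0f}")
print("Conduct research" if worth_doing else "Do not conduct research")
```

With the figures from the example, the value of the information is smaller than the quoted cost, so the sketch reaches the same decision as the worked solution.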
Example: What is the overall industry demand? What is the share of the competitor? The
above information will help the management to estimate the overall demand and its own share of
the market.
2.
Distribution coverage:
Example:
(a)
(b)
3.
Example: "What percentage of target population are aware of firm's product"? "Do customers
know about the product"? "What is the customers' attitude towards the product"? "What percentage
of customers repurchased the product"?
4.
Marketing expenditure:
Example: "What has been the marketing expenditure"? "How much was spent on promotion"?
5.
Example: "Causes for decline in sales of a specific company's product in a specific territory
under a specific salesman".
The researcher may explore all the possible reasons why sales are falling:
(a)
(b)
Higher price
(c)
Less discount
(d)
Less availability
(e)
Inefficient advertising/salesmanship
(f)
(g)
Less awareness
3.
4.
Example: In a test of advertising copy, the respondents can first be interviewed to measure
their present awareness of, and attitudes towards, certain brands. Then they can be shown a
pilot version of the proposed advertisement copy. Following this, their attitudes are
measured once again to see if the proposed copy had any effect on them.
If a questionnaire is used, the following are to be determined well in advance: (a) What are the contents of the questionnaire? (b) What type of questions
are to be asked, such as pointed questions or general questions? (c) In what sequence should they be
asked? (d) Should there be a fixed set of alternatives, or should the questions be open ended? (e) Should the
purpose be made clear to the respondents, or should it be disguised?
Task Prepare a questionnaire to find if the consumers appreciate your new product as
compared to the older ones or not.
Caution While selecting the sample, the sample unit has to be clearly specified.
(b)
(c)
(d)
Week, day and time to meet the specific respondents etc., are to be decided.
The data collected should be scanned to make sure that it is complete and that all the instructions
have been followed. This process is called editing. Once the forms have been edited, they must
be coded.
2.
Coding means assigning numbers to each of the answers so that they can be analyzed.
3.
The final step is data tabulation. It is the orderly arrangement of the data in a
tabular form. Also, at the time of analyzing the data, the statistical tests to be used, such as the t-test, Z-test, chi-square test and ANOVA, must be
finalized.
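The three steps described above (editing, coding and tabulation) can be sketched in a few lines of Python. The code book, the sample answers and the function names are hypothetical and chosen only for illustration; a real study would define its own codes and editing rules.

```python
from collections import Counter

# Hypothetical code book: each permitted answer is assigned a number (coding).
CODE_BOOK = {"yes": 1, "no": 2, "not sure": 3}


def edit(responses):
    """Editing: keep only complete answers that the code book recognises."""
    cleaned = []
    for answer in responses:
        if answer is None:          # incomplete form, answer missing
            continue
        answer = answer.strip().lower()
        if answer in CODE_BOOK:     # instructions followed, answer is valid
            cleaned.append(answer)
    return cleaned


def code(responses):
    """Coding: replace each edited answer with its number."""
    return [CODE_BOOK[answer] for answer in responses]


def tabulate(codes):
    """Tabulation: frequency of each code, listed in code order."""
    counts = Counter(codes)
    return [(value, counts.get(value, 0)) for value in sorted(CODE_BOOK.values())]


# Made-up raw answers to a single closed question.
raw = ["Yes", "no", None, " yes ", "maybe", "Not sure", "NO"]
table = tabulate(code(edit(raw)))

for code_value, frequency in table:
    print(f"{code_value}\t{frequency}")
```

Keeping the steps as separate functions mirrors the order in the text: the data are edited first, then coded, and only then arranged in tabular form for statistical testing.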
Self Assessment
Fill in the blanks:
7.
8.
9.
10.
11.
It is better for the researcher to generate as many alternatives as possible during problem
.............................
12.
[Figure: Classification of research. Qualitative research: direct approach (non-disguised), comprising focus group and depth interview; indirect approach (disguised), comprising projective techniques (association, completion, construction and expressive techniques). Quantitative research: descriptive research (single cross-sectional design, multiple cross-sectional design) and causal research.]
1.
Inefficient service
2.
Improper price
3.
4.
Ineffective promotion
5.
Improper quality
The research executives must examine such questions to identify the most useful avenues for
further research. Preliminary investigation of this type is called exploratory research. Expert
surveys, focus groups, case studies and observation methods are used to conduct the exploratory
survey.
Descriptive research deals with the demographic characteristics of the consumer, for example,
trends in the consumption of soft drinks with respect to socio-economic characteristics such
as age, family, income and education level. Another example is the degree of viewing of
TV channels and its variation with the age, income level and profession of the respondent, as well as the time
of viewing. Hence, the degree of use of TV by different types of respondents will be of
importance to the researcher. There are three types of players who decide the usage of
TV: (i) television manufacturers, (ii) broadcasting agencies of the programmes, (iii) viewers.
Therefore, research pertaining to any one of the following can be conducted:
(a)
The manufacturer can come out with facilities which will make the television more
user-friendly. Some of these facilities are (i) remote control, (ii) child lock,
(iii) different models for different income groups, (iv) Internet compatibility and
(v) wall mounting.
(b)
Similarly, broadcasting agencies can come out with programmes, which can suit
different age groups and income.
(c)
Ultimately, the viewers who use the TV must be aware of the programmes appearing
in different channels and can plan their viewing schedule accordingly.
2.
Descriptive research deals with specific predictions, for example, sales of a company's
product during the next three years, i.e., forecasting.
3.
Descriptive research is also used to estimate the proportion of population who behave in
a certain way.
Example: "Why do middle income groups go to Food World to buy their products?"
A study can be commissioned by a manufacturing company to find out various facilities that can
be provided in television sets based on the above discussion.
Similarly, studies can be conducted by broadcasting stations to find out the degree of utility of
TV programmes.
Example: The following hypotheses may be formulated about the programmes:
1.
The programmes on various channels are useful to the viewers by way of entertainment.
2.
Viewers feel that TV is a boon for their children in improving their knowledge, especially
fiction and cartoon programmes.
2.
Example: Investors in the share market study the past records or prices of the shares that
they intend to buy. Studying the share prices of a particular company enables an investor to
decide whether to invest in the shares of that company.
Crime branch police/CBI officers study the past records or the history of the criminals and
terrorists in order to arrive at some conclusions.
The main objective of this study is to derive explanation and generalization from the past trends
in order to understand the present and anticipate the future.
There are however, certain shortcomings of historical research:
1.
2.
3.
Task List the records to be considered while conducting a historical research in analyzing
the sales aspect of a television brand
Self Assessment
Fill in the blanks:
13.
14.
15.
16.
17.
18.
19.
20.
In exploratory research, all possible reasons which are ................... are eliminated.
1.5 Summary
Research question is further subdivided, covering various facets of the problem that need
to be solved.
The role and scope of research has greatly increased in the field of business and economy
as a whole.
The study of research methods provides you with the knowledge and skills you need to solve
the problems and meet the challenges of today's modern pace of development.
1.6 Keywords
Ad Tracking: It is periodic or continuous in-market research to monitor a brand's performance
using measures such as brand awareness, brand preference, and product usage.
Advertising Research: It is a specialized form of marketing research conducted to improve the
efficacy of advertising.
Concept Testing: To test the acceptance of a concept by target consumers.
Copy Testing: It predicts in-market performance of an ad before it airs by analyzing audience
levels of attention, brand linkage, motivation, entertainment, and communication, as well as
breaking down the ad's flow of attention and flow of emotion.
2.
What would be the instances in which you might undertake causal research in your organization?
3.
It is said that action research is conducted to solve a problem. Why are the other researches
conducted then?
4.
What type of research would you undertake in order to find why middle income groups go
to a particular retail store to buy their products?
5.
Which research would you undertake if you have got a practical problem?
6.
Which type of research would you conduct when the problem is not clear and all the
possible reasons are eliminated? Why?
7.
How does a research help the managers to determine the pattern of consumption?
8.
Do you think that a market research helps the marketer to identify brand loyalty and
establish it with further strength? Why/why not?
9.
When records exist in authenticated form, why does their verification remain a
big issue?
10.
Is there any difference between pure research and ex-post facto research? Support your answer
with suitable reasons.
1.
Social science
2.
3.
Quantitative
4.
Systematic, scientific
5.
New knowledge
6.
Purposeful
7.
Data tabulation
8.
sample unit
9.
non-probability
10.
researcher
11.
formulation hypothesis
12.
nine
13.
Action research
14.
Ex-post Facto
15.
Conceptual
16.
demographic
17.
applied
18.
Library
19.
basic
20.
very obvious
Books
Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: Johns Hopkins
University Press, 1943.
Bernal, J.D., The Social Function of Science, London: George Routledge and Sons,
1939.
Chase, Stuart, The Proper Study of Mankind: An inquiry into the Science of Human
Relations, New York, Harper and Row Publishers, 1958.
S. N. Murthy and U. Bhojanna, Business Research Methods, Excel Books.
CONTENTS
Objectives
Introduction
2.1
Research Problem
2.2
2.3
2.4
2.5
2.6
Summary
2.7
Keywords
2.8
Review Questions
2.9
Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
In all organizations, some kind of research is required to support decision-making, for example,
examination of circulation records to determine if fund allocations should be changed. A manager
exists in three time dimensions: past, present and future. The past specifies an accurate sense of
what was achieved and what was not, while the present specifies what is being achieved. On the
other hand, the future time dimension specifies what a manager should achieve.
Research is used to provide facts on the first two, which supports the decisions that will have an
impact on the future. These decisions are made on the basis of collected data or facts. The
importance of the decisions and their impact on the organization will determine the importance
of research.
There is a famous saying that "a problem well defined is half solved". This statement is strikingly
true in market research, because if the problem is not stated properly, the objectives will not be
clear. If the objectives are not clearly defined, data collection becomes meaningless.
Research problem is a condition that causes a researcher to feel anxious, uneasy and confused. It
involves the complete analysis of the problem area involving who, what, where, when and why
of the problem situation.
2.
3.
Notes Problem formulation is the key to the research process. For a researcher, problem
formulation means converting the management problem into a research problem. In order
to attain clarity, the M.R. manager and the researcher must articulate clearly so that each
fully understands the other.
In the research process, the first and foremost step is selecting and properly
defining a research problem. A researcher must find the problem and formulate it so that it
becomes susceptible to research. To define a problem correctly, a researcher must know what a
problem is.
Did u know? Like a medical doctor, a researcher must examine all the symptoms (presented
to him or observed by him) concerning a problem before he can diagnose correctly.
Self Assessment
Fill in the blanks:
1.
In order to attain clarity, the manager and researcher must ...................... clearly.
2.
3.
Caution Remember that research must yield a publication for it to have meaning.
You may wish to query likely periodical editors to see if they might be interested in an article
on your research topic.
In some cases, as with a thesis or a dissertation, some sort of preliminary study may be needed
to see if the problem and the study are feasible and to identify snags. Such a PILOT STUDY can
be quite valuable.
Task Analyse what problems you might encounter while selecting a problem.
Selection Criteria
1.
2.
3.
The degree to which research on this problem benefits the profession and society.
4.
The degree to which research on this problem will assist your professional goals and
career objectives.
5.
6.
The degree to which this research will interest superiors and other leaders in the field.
7.
The degree to which the research builds on your experience and knowledge.
8.
Ease of access to the population to be studied and the likelihood that they will be cooperative.
Affordability.
9.
Likelihood of publication.
10.
11.
12.
13.
Degree to which the research builds on and extends existing knowledge.
Before the final selection of a problem is made, a researcher must ask himself the following questions:
(a)
Whether he is well equipped in terms of his background to carry out the research?
(b)
(c)
Whether the necessary cooperation can be obtained from those who must participate
in research as subjects?
Self Assessment
Fill in the blanks:
4.
5.
A ......................... should always avoid selecting the first problem that he encounters.
6.
Did u know? Marketing problem which needs research can be classified into two categories:
1.
2.
Opportunity related problems. While the first category produces negative results,
such as a decline in market share or sales, the second category provides benefits.
Problem definition might refer to either a real-life situation or it may also refer to a set of
opportunities. Market research problems or opportunities will arise under the following
circumstances: (1) Unanticipated change (2) Planned change. Many factors in the environment
can create problems or opportunities. Thus, change in the demographics, technological and
legal changes affect the marketing function. Now the question is how the company responds to
new technology, or product introduced by the competitor or how to cope with the changes in
life-styles. It may be a problem and at the same time, it can also be viewed as an opportunity. In
order to conduct research, the problem must be defined accurately.
2.
3.
4.
Example: "Why does the upper-middle class of Bangalore shop at Life-style during the
Diwali season?"
Here all four of the above aspects are covered. We may be interested in a number of variables due
to which shopping is done at a particular place. The characteristics of interest to the researcher
may be (1) the variety offered at Life-style, (2) the discounts offered by way of promotion, (3) the ambience
at Life-style and (4) the personalised service offered. In some cases, the cause of the problem is
obvious, whereas in others it is not so obvious. An obvious cause is the product being
on the decline. A not so obvious cause could be a bad first experience for the customer.
Self Assessment
Fill in the blanks:
7.
Changes in the demographics, technological and legal changes affect the .......................
function.
8.
9.
Notes A proper definition of research problem will enable the researcher to be on the
track whereas an ill-defined problem may create hurdles.
What are the sources of problem identification?
Research students can adopt the following ways to identify the problems:
1.
2.
3.
4.
5.
Cultural and technological changes can act as a source for research problem identification.
6.
Self Assessment
Fill in the blanks:
10.
........................ and ...................... changes can act as a source for research problem
identification.
11.
12.
When you define a research problem you are trying to ...................... the outcome of an
answer.
13.
2.
3.
4.
5.
6.
7.
What exactly will be the difficulties in conducting the study, and hurdles to be overcome?
8.
Managers often want the results of research in accordance with their expectation. This satisfies
them immensely. If one were to closely look at the questionnaire, it is found that in most cases,
there are stereotyped answers given by the respondents.
Self Assessment
Fill in the blanks:
14.
Managers often want the results of research in accordance with their ........................
15.
2.6 Summary
Defining the research problem is vital, and any error in defining it can result in wastage of time
and money.
The task of defining a research problem, very often, follows a sequential pattern.
The problem is stated in a general way, the ambiguities are resolved, and a process of thinking and
rethinking results in a more specific formulation of the problem.
This is done so that the problem is realistic in terms of the available data and resources and
is also analytically meaningful.
All this results in a well defined research problem that is not only meaningful from an
operational point of view but is equally capable of paving the way for the development of working hypotheses and
for means of solving the problem itself.
2.7 Keywords
Marketing Research Problem: It is a situation where your company intends to sell a product or
service that fills a specific gap.
Objective of Research: It means what the researcher aims to achieve.
Pilot Study: A small scale preliminary study conducted before the main research in order to
check the feasibility or to improve the design of the research.
Problem Definition: The process of arriving at a clear understanding (explanation) of what the
problem is.
Research Problem: It focuses on the relevance of the present research.
The objective of research problem should be clearly defined; otherwise the data collection
becomes meaningless. Discuss with suitable examples.
2.
Cultural and technological changes can act as a source for research problem identification.
Why/why not?
3.
4.
5.
If you are appointed to do a research for some problem with the client, what would you
take as the sources for problem identification?
6.
It may be a problem and at the same time, it can also be viewed as an opportunity. Why/
why not?
7.
In some cases, some sort of preliminary study may be needed. Which cases are being
referred to and why?
8.
9.
10.
What do you think is the reason why specialists suggest avoiding the first
problem that you encounter?
1.
articulate
2.
formulation
3.
what a problem is
4.
conclusive
5.
researcher
6.
carefully
7.
marketing
8.
negative
9.
problem
10.
Cultural, technological
11.
specific problem
12.
reduce
13.
defined
14.
expectation
15.
identify
Books
CONTENTS
Objectives
Introduction
3.1
3.2
3.3
An Overview
3.1.1
3.1.2
Exploratory Research
3.2.1
3.2.2
3.2.3
3.2.4
Secondary Data
3.2.5
Qualitative Research
3.3.2
3.3.3
Survey
3.3.4
Observation Studies
3.4
3.5
3.6
Experimentation
3.6.1
Experimental Designs
3.7
Summary
3.8
Keywords
3.9
Review Questions
Objectives
After studying this unit, you will be able to:
Introduction
Research design is simply a plan for a study. It is used as a guide in collecting and analyzing
the data. It can be called a blueprint for carrying out the study. It is like the plan made by an architect
to build a house. If research is conducted without a blueprint, the result is likely to be
different from what was expected at the start. The blueprint includes (1) the interviews to be conducted,
observations to be made, experiments to be conducted and data analysis to be made, (2) the tools used
to collect the data, such as a questionnaire, and (3) the sampling methods to be used.
3.1 An Overview
Research design can be thought of as the structure of research - it is the "glue" that holds all of the
elements in a research project together. A successful design stems from a collaborative process
involving good planning and communication.
Research design is mainly of three types, namely exploratory, descriptive and causal research.
Exploratory research is used to seek insights into the general nature of the problem. It identifies the
relevant variables that need to be considered. In this type of research, there is no previous
knowledge; research methods are flexible, qualitative and unstructured.
Notes The researcher in this method does not know "what he will find".
Descriptive research is very widely used in marketing research. Generally, in a
descriptive study there will be a hypothesis, and with respect to this hypothesis we ask questions
about size, distribution, etc.
Causal research is concerned with finding cause and effect relationships.
Normally, experiments are conducted in this type of research.
It helps to plan in advance the methods and techniques to be used for collecting and
analysing data.
It helps in achieving the objectives of the research within the available staff, time and
money.
The researcher should consider the following factors before creating a research design:
Problem objectives
Self Assessment
Fill in the blanks:
1.
........................... research is used to seek insights into general nature of the problem.
2.
Research design helps to plan in advance the methods and techniques to be used for
collecting and ........................... data.
2.
3.
To list all possibilities. Among the several possibilities, we need to prioritize those
which seem likely.
4.
Did u know? Exploratory study is also used to increase the analyst's familiarity with the
problem. This is particularly true, when the analyst is new to the problem area.
Example: A market researcher (a new entrant) working for a company for the first time.
5.
6.
Exploratory studies may be used to clarify concepts and help in formulating precise
problems.
Example: The management is considering a change in the contract policy, which it hopes,
will result in improved satisfaction for channel members.
An exploratory study can be used to clarify the present state of channel members' satisfaction
and to develop a method by which satisfaction level of channel members is measured
7.
8.
In general, exploratory research is appropriate to any problem about which very little is
known. This research is the foundation for any future study.
2.
3.
4.
5.
6.
Sometimes, it may not be possible to develop any hypothesis at all, if the situation is
being investigated for the first time. This is because no previous data is available.
2.
3.
In other cases, most of the data is available and it may be possible to provide answers to
the problem.
Research Question
Hypothesis
1.
No hypothesis
formulation is possible.
2.
3.
Impersonalization is the
problem.
In example 1: The research question is posed to determine "What benefit do people seek from the
Ad?" Since no previous research is done on consumer benefit for this product, it is not possible
to form any hypothesis.
In example 2: Some information is currently available about packaging for a soft drink. Here it
is possible to formulate a hypothesis which is purely tentative. The hypothesis formulated here
may be only one of the several alternatives available.
In example 3: The root cause of customer dissatisfaction is known, i.e. lack of personalised
service. In this case, it is possible to verify whether this is a cause or not.
The company's market share has declined but industry's figures are normal.
(b)
The industry is declining and hence the company's market share is also declining.
(c)
If we accept the situation that our company's sales are down despite the market showing
an upward trend, then we need to analyse the marketing mix variables.
Example:
(a)
A TV manufacturing company feels that its market share is declining whereas the overall
television industry is doing very well.
(b)
Due to a trade embargo imposed by a country, textile exports are down and hence the sales
of a company making garments for export are on the decline.
The above information may be used to pinpoint the reason for declining sales.
2.
Experience Survey: In experience surveys, it is desirable to talk to persons who are well
informed in the area being investigated. These people may be company executives or
persons outside the organisation. Here, no questionnaire is required. The approach adopted
in an experience survey should be highly unstructured, so that the respondent can give
divergent views.
Caution Since the idea of using experience survey is to undertake problem formulation,
and not conclusion, probability sample need not be used. Those who cannot speak freely
should be excluded from the sample.
Example:
(a)
A group of housewives may be approached for their choice for a "ready to cook product".
(b)
A publisher might want to find out the reason for poor circulation of newspaper introduced
recently. He might meet (i) Newspaper sellers (ii) Public reading room (iii) General
public (iv) Business community, etc.
Focus Group: Another widely used technique in exploratory research is the focus group.
In a focus group, a small number of individuals are brought together to study and talk
about some topic of interest. The discussion is co-ordinated by a moderator. The group
usually is of 8-12 persons. While selecting these persons, care has to be taken to see that
they should have a common background and have similar experiences in buying. This is
required because there should not be a conflict among the group members on the common
issues that are being discussed. During the discussion, future buying attitudes, present
buying opinion, etc., are gathered.
Most of the companies conducting the focus groups first screen the candidates to determine
who will compose the particular group. Firms also take care to avoid groups, in which
some of the participants have their friends and relatives, because this leads to a biased
discussion. Normally, a number of such groups are constituted and the final conclusions of
the various groups are used for formulating the hypothesis. Therefore, a key factor in focus
group research is to have similar groups. Normally there are 4-5 groups, although some studies
may have 6-8. The guiding criterion is whether the later groups are generating
additional ideas or repeating the same ones with respect to the subject under study. When the
groups show diminishing returns, the discussions are stopped. The typical focus
group lasts for 1 hour 30 minutes to 2 hours. The moderator of the focus group has a key role.
His job is to guide the group to proceed in the right direction.
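The stopping rule for adding further focus groups (stop once later groups only repeat earlier ideas) can be illustrated with a short sketch. This is only one possible reading of "diminishing returns": the function name, the threshold min_new_ideas and the session data are all hypothetical and are not taken from the text.

```python
def groups_until_saturation(ideas_per_group, min_new_ideas=1):
    """Return (groups_used, ideas_heard).

    Stops after the first group, other than the opening one, that
    contributes fewer than `min_new_ideas` ideas not already heard.
    The threshold is an illustrative assumption, not a standard value.
    """
    seen = set()
    for count, ideas in enumerate(ideas_per_group, start=1):
        new_ideas = set(ideas) - seen   # ideas no earlier group raised
        seen |= new_ideas
        if count > 1 and len(new_ideas) < min_new_ideas:
            return count, seen
    return len(ideas_per_group), seen


# Hypothetical ideas raised in five successive focus groups.
sessions = [
    {"price", "taste", "packaging"},
    {"taste", "shelf life", "price"},
    {"packaging", "availability"},
    {"price", "taste"},           # nothing new: discussions stop here
    {"brand image"},              # never reached
]

groups_used, ideas = groups_until_saturation(sessions)
print(groups_used, sorted(ideas))
```

In practice the judgement is qualitative; the sketch only makes the logic of "stop when nothing new is being added" explicit.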
The following should be the characteristics of a moderator/facilitator:
(a)
Listening: He must have a good listening ability. The moderator must not miss the
participant's comment, due to lack of attention.
(b)
Permissive: The moderator must be permissive, yet alert to the signs that the group
is disintegrating.
(c)
Memory: He must have a good memory. The moderator must be able to remember
the comments of the participants. Example: A discussion is centered around a new
advertisement by a telecom company. The participant may make a statement early
and make another statement later, which is opposite to what was said earlier.
Example: The participant may say that s(he) never subscribed to the views expressed
in the advertisement by the competitor, but subsequently may say that the "current
advertisement of competitor is excellent".
(d)
(e)
(f)
Sensitivity: The moderator must be sensitive enough to guide the group discussion.
(g)
(h)
4.
1.
Respondent moderator group: Under this method, the moderator will select one of the
participants to act as a temporary moderator.
2.
Dueling moderator group: In this method, there are two moderators. They purposely
take opposing positions on a given topic. This will help the researcher to obtain the
views of both groups.
3.
Two way focus group: Under this method one group will listen to the other group.
Later, the second group will react to the views of the first group.
4.
Dual moderator group: Here, there are two moderators. One moderator will make
sure that the discussion moves smoothly. The second moderator will ask a specific
question.
Case Studies: Analysing a selected case sometimes gives an insight into the problem
which is being researched. Case histories of companies which have undergone a similar
situation may be available. These case studies are well suited to carry out exploratory
research. However, the results of investigating case histories are always considered
suggestive, rather than conclusive. In case of preference to "ready to eat food", many case
histories may be available in the form of previous studies made by competitors. We must
carefully examine the already published case studies with regard to other variables such
as price, advertisement, changes in the taste, etc.
Quantitative: Census, housing, social security as well as electoral statistics and other
related databases.
2.
Notes Secondary data can also be helpful in the research design of subsequent primary
research and can provide a baseline with which the collected primary data results can be
compared to. Therefore, it is always wise to begin any research activity with a review of
the secondary data.
Secondary data is classified in terms of its source - either internal or external. Internal, or
in-house, data is secondary information acquired within the organization where research is being
carried out. External secondary data is obtained from outside sources.
2.
Sales and marketing reports: These can include such things as:
(a)
(b)
(c)
Method of payment
(d)
(e)
Sales territory
(f)
Salesperson
(g)
Date of purchase
(h)
Amount of purchase
(i)
Price
(j)
Application by product
(k)
Location of end-user
Accounting and financial records: These are often an overlooked source of internal
secondary information and can be invaluable in the identification, clarification and
prediction of certain problems. Accounting records can be used to evaluate the success of
various marketing strategies such as revenues from a direct marketing campaign.
There are several problems in using accounting and financial data. One is timeliness:
it is often several months before accounting statements are available. Another is
the structure of the records themselves. Most firms do not set up their accounts
to provide the answers that their research questions need. For example, the
accounting system should capture project/product costs in order to identify the company's
most profitable (and least profitable) activities.
Companies should also consider establishing performance indicators based on financial
data. These can be industry standards or unique ones designed to measure key performance
factors that will enable the firm to monitor its performance over a period of time and
compare it to its competitors. Some examples are sales per employee, sales per square
foot and expenses per employee (salesperson, etc.).
3.
Miscellaneous reports: These can include such things as inventory reports, service calls,
number (qualifications and compensation) of staff, production and R&D reports. Also the
company's business plan and customer calls (complaints) log can be useful sources of
information.
Federal government
2.
Provincial/state governments
3.
Statistics agencies
4.
Trade associations
5.
6.
7.
Annual reports
8.
Academic publications
9.
Library sources
10.
Computerized bibliographies
11.
Syndicated services.
The two major advantages of using secondary data in market research are time and cost savings.
1.
2.
When secondary data is available, the researcher need only locate the source of the data
and extract the required information.
3.
Secondary research is generally less expensive than primary research. The bulk of
secondary research data gathering does not require the use of expensive, specialized,
highly trained personnel.
4.
There are also a number of disadvantages of using secondary data. These include:
1.
Secondary information pertinent to the research topic is either not available, or is only
available in insufficient quantities.
2.
Some secondary data may be of questionable accuracy and reliability. Even government
publications and trade magazines statistics can be misleading.
3.
4.
Much secondary data is several years old and may not reflect the current market conditions.
Trade journals and other publications often accept articles six months before they appear in
print. The research may have been done months or even years earlier.
Did u know? Many trade magazines survey their members to derive estimates of market
size, market growth rate and purchasing patterns, then average out these results. Often
these statistics are merely average opinions based on less than 10% of their members.
2.
3.
collects evidence
4.
5.
produces findings that are applicable beyond the immediate boundaries of the study
2.
In-depth interviews are optimal for collecting data on individuals' personal histories,
perspectives, and experiences, particularly when sensitive topics are being explored.
3.
Focus groups are effective in eliciting data on the cultural norms of a group and in
generating broad overviews of issues of concern to the cultural groups or subgroups
represented.
Task Enlist the basic differences between quantitative and qualitative research methods.
Self Assessment
Fill in the blanks:
3.
4.
5.
In experience surveys, it is desirable to talk to persons who are well informed in the area
being .............................
6.
Most of the companies conducting the ........................... groups first screen the candidates to
determine who will compose the particular group.
7.
8.
Who? Who is regarded as a shopper responsible for the success of the shop, whose
demographic profile is required by the retailer?
2.
3.
4.
5.
Should the measurement be made while the shopper is shopping or at a later time?
6.
7.
Should it be outside the stores, soon after they visit or should we contact them at their
residence?
8.
9.
What is the purpose of measurement? Based on the information, are there any strategies
which will help the retailer to boost the sales? Does the retailer want to predict future sales
based on the data obtained?
10.
Answer to some of the above questions will help us in formulating the hypothesis.
11.
(b)
(c)
(d)
(e)
2.
3.
To make a prediction. We might be interested in sales forecasting for the next three years,
so that we can plan for training of new sales representatives.
4.
Research problem
Hypothesis
1.
Longitudinal study
2.
Cross-sectional study
1.
Longitudinal Study: These are the studies in which an event or occurrence is measured
again and again over a period of time. This is also known as 'Time Series Study'. Through
longitudinal study, the researcher comes to know how the market changes over time.
Longitudinal studies involve panels. Panel once constituted will have certain elements.
These elements may be individuals, stores, dealers, etc. The panel or sample remains
constant throughout the period. There may be some dropouts and additions. The sample
members in the panel are being measured repeatedly. The periodicity of the study may be
monthly or quarterly etc.
Example: For a longitudinal study, assume that market research is conducted on ready-to-eat
food at two different points of time, T1 and T2, with a gap of 4 months. At each of these two
times, a sample of 2000 households is chosen and interviewed. The brand used most in each
household is recorded as follows.
Brands        At T1          At T2
Brand X       500 (25%)      600 (30%)
Brand Y       700 (35%)      650 (32.5%)
Brand Z       400 (20%)      300 (15%)
Brand M       200 (10%)      250 (12.5%)
All others    200 (10%)      200 (10%)
Total         2000 (100%)    2000 (100%)
As can be seen, between T1 and T2 Brand X and Brand M have improved their
market share. Brand Y and Brand Z have lost market share, whereas the "all others" category
remains the same. This shows that Brands X and M have gained market share at the cost of Y and
Z.
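The comparison between T1 and T2 is simple arithmetic on the household counts, and a short sketch makes the share changes explicit. The counts are those of the example, with "All others" taken as unchanged at 200 at T2 (an assumption that follows the statement that this category remains the same and keeps the T2 total at 2000); the function and variable names are illustrative only.

```python
SAMPLE_SIZE = 2000  # households interviewed at each point of time

counts_t1 = {"Brand X": 500, "Brand Y": 700, "Brand Z": 400,
             "Brand M": 200, "All others": 200}
# Assumption: "All others" stays at 200 at T2, so the T2 counts sum to 2000.
counts_t2 = {"Brand X": 600, "Brand Y": 650, "Brand Z": 300,
             "Brand M": 250, "All others": 200}


def share_change(before, after, n):
    """Change in market share, in percentage points, for each brand."""
    return {brand: 100.0 * (after[brand] - before[brand]) / n
            for brand in before}


changes = share_change(counts_t1, counts_t2, SAMPLE_SIZE)
for brand, delta in changes.items():
    print(f"{brand}: {delta:+.1f} percentage points")
```

Because both samples have the same size, the gains of X and M must equal the losses of Y and Z, which is why the changes sum to zero.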
There are two types of panels: (a) True panel (b) Omnibus panel.
(a)
True panel: This involves repeat measurement of the same variables. Example: Perception
towards frozen peas or iced tea. Each member of the panel is examined at a different time,
to arrive at a conclusion on the above subject.
(b)
Omnibus panel: In omnibus panel too, a sample of elements is being selected and
maintained, but the information collected from the member varies. At a certain point of
time, the attitude of panel members "towards an advertisement" may be measured. At
some other point of time the same panel member may be questioned about the "product
performance".
We can find out what proportion of panel members bought our brand and what proportion did
not. This is computed using the brand switching matrix.
(b)
The study also helps to identify and target the group which needs promotional effort.
(c)
Panel members are willing persons, hence a lot of data can be collected. This is because
becoming a member of a panel is purely voluntary.
(d)
Panel data is more accurate than cross-sectional data because it is free from the error
associated with reporting past behaviour. Errors occur in past behaviour because of
time that has elapsed or forgetfulness.
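The brand switching matrix mentioned among the advantages of panel data can be sketched as follows. The panel records, the brand names and the function switching_matrix are hypothetical; each row gives, for the buyers of one brand at the first measurement, the proportion found with each brand at the second measurement.

```python
from collections import Counter


def switching_matrix(purchases):
    """Build a brand switching matrix from panel data.

    `purchases` is a list of (brand_at_t1, brand_at_t2) pairs, one per
    panel member. Returns {origin: {destination: proportion}}, where each
    row sums to 1 for brands that had at least one buyer at T1.
    """
    counts = Counter(purchases)
    brands = sorted({brand for pair in purchases for brand in pair})
    matrix = {}
    for origin in brands:
        row_total = sum(counts[(origin, dest)] for dest in brands)
        matrix[origin] = {
            dest: (counts[(origin, dest)] / row_total if row_total else 0.0)
            for dest in brands
        }
    return matrix


# Hypothetical panel of 20 members measured at two points of time.
panel = ([("X", "X")] * 8 + [("X", "Y")] * 2 +
         [("Y", "X")] * 3 + [("Y", "Y")] * 7)

matrix = switching_matrix(panel)
for origin, row in matrix.items():
    print(origin, row)
```

The diagonal entries measure loyalty (repeat purchase) and the off-diagonal entries measure switching, which is information a cross-sectional sample cannot provide because it does not follow the same members over time.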
The sample may not be representative. This is because sometimes, panels may be selected
on account of convenience.
(b)
The panel members who provide the data, may not be interested to continue as panel
members. There could be dropouts, migration, etc. Members who replace them may differ
vastly from the original member.
(c)
Remuneration given to panel members may not be attractive. Therefore, people may not
like to be panel members.
(d)
(e)
A lengthy period of membership in the panel may cause respondents to start imagining
themselves to be experts and professionals. They may start responding like experts and
consultants and not like respondents. To avoid this, no one should be retained as a member
for more than 6 months.
2.
Field study: This involves an in-depth study of a problem, such as the reaction of young
men and women towards a product.
Example: Reaction of Indian men towards branded ready-to-wear suits. A field study is carried
out in a real-world setting. Test marketing is an example of a field study.
(b)
Field survey: Large samples are a feature of this type of study. The biggest limitations of a
field survey are cost and time. Also, if the respondent is cautious, he might answer
the questions in a different manner. Finally, a field survey requires good knowledge
of questionnaire construction, sampling techniques, etc.
(ii)
(iii)
Urban population which does not use the product - Category III
(iv)
Semi-urban population which does not use the product - Category IV.
Here, the hypothesis needs to be supported and tested by the sample data,
i.e., the proportion of urbanites using the product should exceed the proportion of the
semi-urban population using the product.
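One common way to test such a hypothesis on sample data is a one-sided two-proportion z-test. The sketch below is an illustration under stated assumptions, not a procedure prescribed by the text: the counts of users and the sample sizes are hypothetical, and the pooled standard error is the usual large-sample approximation.

```python
import math


def two_proportion_z(users_a, n_a, users_b, n_b):
    """One-sided z-test of H1: proportion in group A > proportion in group B.

    Uses the pooled proportion for the standard error (large-sample
    approximation). Returns (z, p_value).
    """
    p_a = users_a / n_a
    p_b = users_b / n_b
    pooled = (users_a + users_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical sample: 120 of 200 urban respondents and 90 of 200
# semi-urban respondents use the product.
z, p_value = two_proportion_z(120, 200, 90, 200)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

A small p-value would support the hypothesis that product usage is higher among the urban population; a large one would mean the sample does not support it.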
3.3.3 Survey
The survey is a research technique in which data are gathered by asking questions of respondents.
Survey research is one of the most important areas of measurement in applied social research.
The broad area of survey research encompasses any measurement procedures that involve
asking questions of respondents. A "survey" can be anything from a short paper-and-pencil
feedback form to an intensive one-on-one in-depth interview.
Types of Surveys
Surveys can be divided into two broad categories: the questionnaire and the interview.
Questionnaires are usually paper-and-pencil instruments that the respondent completes.
Interviews are completed by the interviewer based on what the respondent says. Sometimes, it's hard
to tell the difference between a questionnaire and an interview. For instance, some people think
that questionnaires always ask short closed-ended questions while interviews always ask broad
open-ended ones. But you will see questionnaires with open-ended questions (although they do
tend to be shorter than in interviews) and there will often be a series of closed-ended questions
asked in an interview.
Survey research has changed dramatically in the last ten years. We have automated telephone
surveys that use random dialing methods. There are computerized kiosks in public places that
ask people for input. A whole new variation of group interview has evolved as focus
group methodology. Increasingly, survey research is tightly integrated with the delivery of
service. Your hotel room has a survey on the desk. Your waiter presents a short customer
satisfaction survey with your check. You get a call for an interview several days after your last
call to a computer company for technical assistance. You're asked to complete a short survey
when you visit a web site.
Population Issues
The first set of considerations has to do with the population and its accessibility.
1.
2.
read to some degree, your questionnaire may contain difficult or technical vocabulary.
Clearly, there are some populations that you would expect to be illiterate. Young children
would not be good targets for questionnaires.
3.
4.
5.
Sampling Issues
The sample is the actual group you will have to contact in some way. There are several important
sampling issues you need to consider when doing survey research.
1.
What data is available?: What information do you have about your sample? Do you
know their current addresses? Their current phone numbers? Are your contact lists up to
date?
2.
Can respondents be found?: Can your respondents be located? Some people are very busy.
Some travel a lot. Some work the night shift. Even if you have an accurate phone or
address, you may not be able to locate or make contact with your sample.
3.
Who is the respondent?: Who is the respondent in your study? Let's say you draw a sample
of households in a small city. A household is not a respondent. Do you want to interview
a specific individual? Do you want to talk only to the "head of household" (and how is that
person defined)? Are you willing to talk to any member of the household? Do you state
that you will speak to the first adult member of the household who opens the door? What
if that person is unwilling to be interviewed but someone else in the house is willing?
How do you deal with multi-family households? Similar problems arise when you sample
groups, agencies, or companies. Can you survey any member of the organization? Or, do
you only want to speak to the Director of Human Resources? What if the person you
would like to interview is unwilling or unable to participate? Do you use another member
of the organization?
4.
Can all members of population be sampled?: If you have an incomplete list of the population
(i.e., sampling frame) you may not be able to sample every member of the population.
Lists of various groups are extremely hard to keep up to date. People move or change their
names. Even though they are on your sampling frame listing, you may not be able to get
to them. And, it's possible they are not even on the list.
5.
Are response rates likely to be a problem?: Even if you are able to solve all of the other
population and sampling problems, you still have to deal with the issue of response rates.
Some members of your sample will simply refuse to respond. Others have the best of
intentions, but can't seem to find the time to send in your questionnaire by the due date.
Still others misplace the instrument or forget about the appointment for an interview.
Low response rates are among the most difficult of problems in survey research. They can
ruin an otherwise well-designed survey effort.
Question Issues
Sometimes the nature of what you want to ask respondents will determine the type of survey
you select.
1.
2.
3.
4.
5.
6.
Content Issues
The content of your study can also pose challenges for the different survey types you might
utilize.
1.
2.
Bias Issues
People come to the research endeavor with their own sets of biases and prejudices. Sometimes,
these biases will be less of a problem with certain types of survey approaches.
1.
2.
3.
Administrative Issues
Last, but certainly not least, you have to consider the feasibility of the survey method for your
study.
1.
Costs: Cost is often the major determining factor in selecting survey type. You might
prefer to do personal interviews, but can't justify the high cost of training and paying for
the interviewers. You may prefer to send out an extensive mailing but can't afford the
postage to do so.
2.
Facilities: Do you have the facilities (or access to them) to process and manage your study?
In phone interviews, do you have well-equipped phone surveying facilities? For focus
groups, do you have a comfortable and accessible room to host the group? Do you have
the equipment needed to record and transcribe responses?
3.
Time: Some types of surveys take longer than others. Do you need responses immediately
(as in an overnight public opinion poll)? Have you budgeted enough time for your study
to send out mail surveys and follow-up reminders, and to get the responses back by mail?
Have you allowed for enough time to get enough personal interviews to justify that
approach?
4.
Clearly, there are lots of issues to consider when you are selecting which type of survey you
wish to use in your study. And there is no clear and easy way to make this decision in many
contexts. There may not be one approach which is clearly the best. You may have to make
tradeoffs of advantages and disadvantages. There is judgment involved. Two expert researchers
may, for the very same problem or issue, select entirely different survey methods. But, if you
select a method that isn't appropriate or doesn't fit the context, you can doom a study before you
even begin designing the instruments or questions themselves.
Self Assessment
Fill in the blanks:
9.
10.
11.
12.
Exploratory Research
Descriptive Research
Focus group
Longitudinal
Literature Searching
Cross-sectional studies
Case study
Self Assessment
Fill in the blanks:
13.
14.
studied. There are many reasons for this, one of them being that true random assignment is not
possible in many cases. The three main reasons why you can't test everything deal with:
1.
2.
Ethics, because we can't randomly assign that some people receive a virus to test its
effects, or that some participants have to act as slaves and others as masters to test a
hypothesis, and
3.
Resources, if a researcher does not have the money or the equipment needed to perform a
study, then it won't be done.
Causal design is the study of cause and effect relationships between two or more variables.
William J. Goode & Paul K. Hatt in Methods in Social Research define cause and effect relationship
as:
"when two or more cases of given phenomenon have one and only one condition in common, that condition
may be regarded as the cause and effect of that phenomenon."
The set of causes generated to predict their effects can be deterministic or probabilistic in
nature. A deterministic cause is one which is essential and adequate for stimulating the
occurrence of another event. A probabilistic cause is one that is essential, but is not the
only one responsible for stimulating the occurrence of another event.
If the objective is to determine which variable might be causing certain behaviour, i.e., whether
there is a cause and effect relationship between variables, causal research must be undertaken.
This type of research is very complex and the researcher can never be completely certain that
there are not other factors influencing the causal relationship, especially when dealing with
people's attitudes and motivations. There are often much deeper psychological considerations
that even the respondent may not be aware of.
In marketing decision making, all the conditions allowing the most accurate causal statements
are not usually present, but in these circumstances causal inferences will still be made by
marketing managers, because they want to be able to make causal statements about the
effects of their actions.
Example: The new advertising campaign a company developed has resulted in a percentage
increase in sales, or the sales discount strategy a company followed has resulted in a percentage
increase in sales. In both of these examples, marketing managers are making a causal statement.
However, the scientific concept of causality is complex and differs substantially from the one
held by the common person on the street. The common sense view holds that a single event (the
cause) always results in another event (the effect) occurring. In science, we recognize that an
event has a number of determining conditions or causes which act together to make the event
probable. Note that the common sense notion of causality is that the effect always follows the
cause. This is deterministic causation, in contrast to the scientific notion, which specifies the
effect only as being probable. This is termed probabilistic causation. The scientific notion holds
that we can only infer causality and never really prove it. That is, the chance of an incorrect
inference is always thought to exist. The world of marketing fits the scientific view of causality.
Marketing effects are probabilistically caused by multiple factors and we can only infer a causal
relationship.
The conditions under which we can make causal inferences are:
(a)
(b)
Concomitant variation
(c)
Causal research designs are used to provide a stronger basis for the existence of a causal
relationship between variables. The researcher is able to control the influence of one or more
extraneous variables on the dependent variable. If it is not possible to control the influence of
an extraneous variable on the dependent variable, that variable is called a confounding variable.
Caution Gender cannot be randomly assigned, and for this reason alone not all
causal hypotheses can be tested.
Defining the Problem: In defining the problem of the research objective, definition of key
terms, general background information, limitations of the study and order of presentation
should be mentioned in brief.
2.
Review of Existing Literature: In this head, researcher should study the summary of different
points of view on the subject matter as found in books, periodicals and approach to be
followed at the time of writing.
3.
Conceptual Framework and Methodology: Under this head the researcher should first
make a statement of the hypothesis. Discussion on the research methodology used, duly
pointing out the relationship between the hypothesis and objective of the study and
finally discussions about the sources and means of obtaining data should also be made. In
this head the researcher should also point out the limitations of methodology, if any, and
the natural crises from which the research is bound to suffer for such obvious limitations.
4.
Analysis of Data: Analysis of the data involves testing of hypothesis from data collected
and key conclusions thus arrived.
5.
Finally the researcher should mention about the bibliographies and appendices. The above
format is drawn after a standard framework followed internationally in preparation of a synopsis.
However, in our country, keeping in view the object of research, style and structure of synopsis
varies and quite often it is found that the research guide exercises his own discretion in synopsis
preparation rather than following some acceptable international norms. A standard format for
preparation of synopsis commonly used in management and commerce research in India may
be drawn as follows:
1.
Introduction: This includes definition of the problem and its review from a historical
perspective.
2.
Objective of the Study: It defines the research purpose and its speciality from the existing
available research in the related field.
3.
Literature Review: It includes among other things, different sources from which the required
abstract is drawn.
4.
Methodology: It is intended to draw out the sequences followed in research and ways and
manners of carrying out the survey and compilation of data.
5.
6.
Model: It underlies the nature and structure of the model that the researcher is going to
build in the light of survey findings.
Self Assessment
Fill in the blanks:
15.
........................... research is a way of seeing how actions now will affect a business in the future.
16.
Synopsis is an abstract form of research which underlines the research procedure followed
and is presented before the guide for evaluating its ...........................
3.6 Experimentation
Experimental research is also known as causal research. Descriptive research will suggest
the relationship, if any, between the variables, but it will not establish a cause and effect
relationship between them. Example: The data collected may show that the number of people who
own a car and their income have both risen over a period of time. Despite this, we cannot say "The
increase in the number of cars is due to the rise in income". Maybe improved road conditions or an
increase in the number of banks offering car loans have caused the increase in the ownership of cars.
To find the causal relationship between the variables, the researcher has to do an experiment.
Example:
1.
Which print advertisement is more effective? Is it front page, middle page or the last page?
2.
Input
Output
Test units
Explanatory variable
(Independent variable)
Dependent variable
Test Units
These are units, on which the experiment is carried out. It is done, with one or more independent
variables controlled by a person to find out its effect, on a dependent variable.
Explanatory Variable
These are the variables whose effects the researcher wishes to examine. For example, explanatory
variables may be advertising, pricing, packaging, etc.
Dependent Variable
This is the variable under study, for example sales, consumer attitude, brand loyalty, etc.
Example: Suppose a particular colour TV manufacturer reduces the price of the TV by 20%.
Assume that this reduction is passed on to the consumer and that the manufacturer expects sales
to go up by 15% in the next year. Experiments of this type are done by leading TV companies
during the festival season.
Causal research finds out whether the price reduction causes an increase in sales.
Extraneous Variables
These are also called blocking variables. Extraneous variables affect the result of the
experiment.
Example:
1.
2.
A company introduces a product in two different cities and would like to know the impact
of its advertising on sales. During the same period, a competitor's product is not available
in one of the cities due to a strike in the factory. The researcher cannot now conclude
that sales of the product in that city have increased due to the advertisement. Therefore this
experiment is confounded. In this case, the strike is the confounding variable.
1.
History
2.
Maturation
3.
Testing
4.
Instrument variation
5.
Selection bias
6.
Experimental mortality
1.
History: History refers to events that are external to the experiment but occur at the same
time as the experiment is being conducted. These may affect the result. Example: Let us say
that a manufacturer makes a 20% cut in the price of a product and monitors sales in the
coming weeks. The purpose of the research is to find the impact of price on sales. Meanwhile,
if production of the product declines due to a shortage of raw materials, sales will not
increase. We therefore cannot conclude that the price cut had no influence on sales, because
external events occurred during the period. Such events cannot be controlled; they can only
be identified.
2.
Maturation: Examples:
(a)
Pepsi is consumed when young. With the passage of time the consumer grows older and
might prefer Diet Pepsi, or might even avoid it.
(b)
Assume that a training programme is conducted for salesmen and the company wants to measure
the impact of the programme. If the company finds that sales have improved, it may
not be due to the training programme. It may be because the salesmen have more experience
now and know the customers better. Better understanding between salesmen and customers
may be the cause of the increased sales.
The maturation effect is not limited to test units composed of people. Organizations also
change: dealers grow, become more successful, diversify, etc.
3.
Testing: The pre-testing effect occurs when the same respondents are measured more than
once. Responses given during the earlier measurement will have a direct bearing on the
responses given at a later stage.
4.
Instrument Variation:
(b)
The interviewers for pre-testing and post-testing are different.
The measurement in an experiment depends upon the instrument used to measure. Results
may also vary in the way the instrument is applied when there are several interviewers.
It is very difficult to ensure that all the interviewers will ask the same questions in
the same tone and develop the same rapport. There may be a difference in response because
each interviewer conducts the interview differently.
5.
Selection Bias: Selection bias occurs because the 2 groups selected for an experiment may not
be identical. If the 2 groups are asked various questions, they will respond differently. This
error can occur whenever multiple groups are participating. Example: There are two promotional
advertisements, A and B, for "Ready to eat food". The idea is to find the effectiveness of the
two advertisements. Assume that the respondents exposed to 'A' are dominant users of the
product. Now suppose 50% of those who saw 'Advertisement A' bought the product and
only 10% of those who saw 'Advertisement B' bought the product. From this, one
should not conclude that advertisement 'A' is more effective than advertisement 'B'. The
main difference may be due to food preference habits between the groups. Even in this
case, internal validity might suffer, but to a lesser degree.
6.
Experimental Mortality: Some members may leave the original group and some new
members may join it, for example because some members migrate to another
geographical area. This change in membership alters the composition of the group.
Example: Assume that a vacuum cleaner manufacturer wants to introduce a new version. He
interviews a hundred respondents who are currently using the older version. Let us assume that
these 100 respondents have rated the existing vacuum cleaner on a 10-point scale (1 for lowest
and 10 for highest). Let the mean rating of the respondents be 7.
Now the newer version is demonstrated to the same hundred respondents and the equipment is left
with them for 2 months. At the end of two months only 80 participants respond, since the
remaining 20 refuse to answer. Suppose the mean score of the 80 respondents is 8 on the same
10-point scale. Can we conclude from this that the new vacuum cleaner is better?
The answer to the above question depends on the composition of the 20 respondents who dropped
out. If the 20 respondents who dropped out had a negative reaction to the product, then the
mean score of all 100 would not have been 8. It may even have been lower than 7. The difference
in mean rating does not give a true picture. It does not indicate that the new product is better
than the old product.
One might wonder why we do not leave out the 20 respondents from the original group, calculate
the mean rating of the remaining 80 and compare. But this method also will not solve the
mortality effect. The mortality effect will occur in an experiment irrespective of whether human
beings are involved or not.
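The dependence on the dropouts can be checked with a little arithmetic. The sketch below is only an illustration: the function name and the dropout ratings tried (1, 3, 5 and 8) are assumptions, since the source does not say how the 20 non-respondents would have rated the product.

```python
def overall_mean(responder_mean, n_responders, dropout_mean, n_dropouts):
    """Mean rating of the full group if the dropouts had also answered."""
    total = responder_mean * n_responders + dropout_mean * n_dropouts
    return total / (n_responders + n_dropouts)

# 80 respondents rated the new version 8 on average; the ratings of the
# 20 dropouts are unknown, so a few hypothetical values are tried.
for hypothetical_dropout_mean in (1, 3, 5, 8):
    mean_all = overall_mean(8, 80, hypothetical_dropout_mean, 20)
    print(hypothetical_dropout_mean, mean_all)
```

Under these assumed values, a dropout mean of 3 brings the overall mean back to the old rating of 7, and anything lower pushes it below 7.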
Concomitant Variable
Concomitant variable is the extent to which a cause "X" and the effect "Y" vary together in a
predicted manner.
Example:
1.
The electric car is new to India. People may or may not hold a positive attitude towards
electric cars. Assume that the company has undertaken a new advertising campaign "to change
the attitude of the people towards this car", so that sales of the car can increase. Suppose
that, in testing the result of this campaign, the company finds that both aims have been
achieved, i.e., the attitude of the people towards the electric car has become positive and
sales have also increased. Then we can say that there is concomitant variation between attitude
and sales. Both variables move in the same direction.
2.
Assume that an educational institute introduces a new elective which it claims is job oriented.
The college authorities advertise this course in leading newspapers. They would like to
know the perception of students towards this course and how many are willing to enrol. Now
if, on testing, it is found that the perception towards this course is positive and the majority
of the respondents are willing to enrol, then we can say that there is concomitant variation
between perception and enrolment. Both variables move in the same direction.
2.
Before-after design
3.
Factorial design
4.
5.
Before-after Design
In this method, measurements are made before as well as after.
Example: Let us say that an experiment is conducted to test an advertisement which is
aimed at reducing alcoholism.
Attitude and perception towards consuming liquor are measured before exposure to the
advertisement. The group is then exposed to an advertisement which tells them the consequences,
and attitude is measured again after several days. The difference, if any, shows the
effectiveness of the advertisement.
The above example of "Before-after" design suffers from validity threats due to the following:
1.
Before measure effect: It alerts the respondents to the fact that they are being studied. The
respondents may discuss the topic with friends and relatives and change their behaviour.
2.
Instrumentation effect: This can arise when two different instruments are used, one
before and one after, or when the interviewers change between the before and after
measurements.
Factorial Design
Factorial design permits the researcher to test two or more variables at the same time. Factorial
design helps to determine the effect of each of the variables and also measure the interacting
effect of the several variables.
Example: A departmental store wants to study the impact of price reduction for a product.
At the same time, there is also promotion (POP) being carried out in the stores (a) near the
entrance and (b) at the usual place. Now assume that there are two price levels, namely regular
price A1 and reduced price A2. Let there be three types of POP, namely B1, B2 and B3. There are
3 × 2 = 6 combinations possible: B1A1, B1A2, B2A1, B2A2, B3A1, B3A2.
The researcher is interested in which of these combinations is best suited. Suppose there are
60 departmental stores of the chain, divided into groups of 10 stores. Now, randomly assign the
above combinations to these groups of 10 stores as follows:
Combinations    Sales
B1A1            S1
B1A2            S2
B2A1            S3
B2A2            S4
B3A1            S5
B3A2            S6
S1 to S6 represent the sales resulting from each combination. The data gathered will provide
details on product sales on account of the two independent variables.
The two questions that will be answered are:
1.
2.
Is the display at the entrance more effective than the display at usual location? Also the
research will tell us about the interaction effect of the two variables.
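The layout of the factorial example can be sketched in code. This is a minimal illustration, not part of the original design: the store numbers 1 to 60, the fixed seed and the function name `assign_stores` are assumptions made only so that the sketch is reproducible.

```python
import itertools
import random

price_levels = ["A1", "A2"]        # regular price, reduced price
pop_types = ["B1", "B2", "B3"]     # three types of POP

# 3 x 2 = 6 treatment combinations, written as in the text (B first, then A).
combinations = [b + a for b, a in itertools.product(pop_types, price_levels)]

def assign_stores(n_stores=60, seed=0):
    """Randomly split the stores into equal groups, one group per combination."""
    rng = random.Random(seed)
    stores = list(range(1, n_stores + 1))   # hypothetical store numbers
    rng.shuffle(stores)
    size = n_stores // len(combinations)
    return {
        combo: sorted(stores[i * size:(i + 1) * size])
        for i, combo in enumerate(combinations)
    }

assignment = assign_stores()
for combo, group in assignment.items():
    print(combo, group)
```

Each store receives exactly one combination, so the sales S1 to S6 can later be compared across price levels, across POP types, and for their interaction.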
1.
2.
3.
4.
No promotion
2.
3.
Window display
Which of the 3 will be effective? The outcome may be affected by the size of the stores and the
time period. If we choose 3 stores and 3 time periods, the total number of combinations is
3 × 3 = 9. The arrangement is as follows:
Time period
Store
1
Self Assessment
Fill in the blanks:
17.
Explanatory variables are the variables whose effects the researcher wishes to .........................
18.
19.
.................... design helps to determine the effect of each of the variables and also
measure the interacting effect of the several variables.
3.7 Summary
There are primarily four types of research, namely exploratory research, descriptive
research, causal research and experimental research.
Exploratory research helps the researcher to become familiar with the problem. It helps to
establish the priorities for further research. It may or may not be possible to formulate
Hypothesis during exploratory stage.
Literature search, experience surveys, focus groups and selected case studies assist in
gaining insight into the problem.
The role of moderator or facilitator is extremely important in focus group. There are
several variations in the formation of focus group.
Descriptive research is used to describe the characteristics of groups. It can also be used
for forecasting or prediction.
Panel data is used in longitudinal studies. There are two different types of panels: the true
panel and the omnibus panel. In a true panel, the same measurements are made over a period of
time. In an omnibus panel, different measurements are made over a period of time.
Cross-sectional studies involve field study and field survey, the difference being the size
of the sample.
Causal research is conducted mainly to prove the fact that one factor "X" the cause was
responsible for the effect "Y".
While conducting experiment, the researcher must guard against extraneous source of
error. This may confound the experiment.
3.8 Keywords
Causal Research: A research designed to determine cause and effect relationship.
Conclusive Research: This is a research having clearly defined objectives. In this type of research,
specific courses of action are taken to solve the problem.
Concomitant Variation: It is the extent to which cause and effect vary together.
Descriptive Research: It is essentially a research to describe something.
Ex-post Facto Research: Study of the current state and factors causing it.
Extraneous Variable: These variables affect the response of test units. Also known as confounding
variable.
Field Study: Field study involves an in-depth study of a problem, such as reaction of young men
and women towards a product.
Literature Research: It refers to "referring to a literature to develop a new hypothesis".
Longitudinal Study: These are the studies in which an event or occurrence is measured again
and again over a period of time.
2.
For each of the situation mentioned below, state whether the research should be
exploratory, descriptive or causal and why
(a)
(b)
To find out the consumer reaction regarding use of new detergents which are
economical
(c)
(d)
Estimate the sales potential for ready-to-eat food in the northeastern parts of India.
3.
In your analysis, what are the advantages and disadvantages of panel data?
4.
What do you see as the reason behind Latin Square Design testing only one variable?
5.
Do you see any benefit of factorial design over that of before-after design? Support your
answer with reasons.
6.
Is it necessary for the researcher to mention about the bibliographies and appendices?
Why/why not?
7.
8.
9.
Which type of research would you use to generate new product ideas and why?
10.
Which type of research study would you use to determine the characteristics of market?
Exploratory
2.
analyzing
3.
4.
flexible, versatile
5.
investigated
6.
focus
7.
participant
8.
unresponsive
9.
Longitudinal
10.
11.
repeat
12.
cost, time
13.
Descriptive
14.
exploratory
15.
Causal
16.
potentiality
17.
examine
18.
Test units
19.
Factorial
Books
Sampling An Introduction
4.1.1
4.2
4.3
4.3.2
4.3.3
4.4
Fieldwork
4.5
Errors in Sampling
4.5.1
Sampling Error
4.5.2
Non-sampling Error
4.5.3
4.5.4
Non-response Error
4.5.5
Data Error
4.6
4.7
Sampling Distribution
4.8
Summary
4.9
Keywords
Objectives
After studying this unit, you will be able to:
Introduction
Sampling is the process of selecting units (e.g., people, organizations) from a population of
interest so that by studying the sample we may fairly generalize our results back to the population
from which they were chosen. Each observation measures one or more properties (weight,
location, etc.) of an observable entity enumerated to distinguish objects or individuals. Survey
weights often need to be applied to the data to adjust for the sample design. Results from
probability theory and statistical theory are employed to guide practice.
2.
3.
Specifying a sampling method for selecting items or events from the frame
4.
5.
6.
7.
products industry. These industries are limited in number, so a census will be suitable.
2.
2.
3.
4.
Self Assessment
Fill in the blanks:
1.
2.
Sampling .................... is the list of elements from which the sample is actually drawn.
3.
4.
2.
3.
4.
5.
6.
7.
Selection of sample
1.
Elements
(b)
Sampling units
(c)
Extent
(d)
Time.
Example: If we are monitoring the sale of a new product recently introduced by a company,
say (shampoo sachet) the population will be:
(a)
(b)
(c)
(d)
2.
Telephone Directory
(b)
(c)
Example: You want to learn about scooter owners in a city. The RTO will be the frame,
which provides you names, addresses and the types of vehicles possessed.
3.
Specify the sampling unit: Individuals who are to be contacted are the sampling units. If
retailers are to be contacted in a locality, they are the sampling units.
Sampling unit may be husband or wife in a family. The selection of sampling unit is very
important. If interviews are to be held during office timings, when the heads of families
and other employed persons are away, interviewing would under-represent employed
persons, and over-represent elderly persons, housewives and the unemployed.
4.
5.
probability or
(b)
Determine the sample size: This means we need to decide "how many elements of the
target population are to be chosen?" The sample size depends upon the type of study that
is being conducted. For example: If it is an exploratory research, the sample size will be
generally small. For conclusive research, such as descriptive research, the sample size will
be large.
The sample size also depends upon the resources available with the company.
Did u know? Sample size depends on the accuracy required in the study and the permissible
errors allowed.
6.
Specify the sampling plan: A sampling plan should clearly specify the target population.
Improper defining would lead to wrong data collection.
Select the sample: This is the final step in the sampling process.
Goal orientation: This suggests that a sample design "should be oriented to the research
objectives, tailored to the survey design, and fitted to the survey conditions". If this is
done, it should influence the choice of the population, the measurement as also the
procedure of choosing a sample.
2.
Measurability: A sample design should enable the computation of valid estimates of its
sampling variability. Normally, this variability is expressed in the form of standard
errors in surveys. However, this is possible only in the case of probability sampling. In
non-probability samples, such as a quota sample, it is not possible to know the degree of
precision of the survey results.
3.
Practicality: This implies that the sample design can be followed properly in the survey,
as envisaged earlier. It is necessary that complete, correct, practical, and clear instructions
should be given to the interviewer so that no mistakes are made in the selection of
sampling units and the final selection in the field is not different from the original sample
design. Practicality also refers to simplicity of the design, i.e. it should be capable of being
understood and followed in actual operation of the field work.
4.
Economy: Finally, economy implies that the objectives of the survey should be achieved
with minimum cost and effort. Survey objectives are generally spelt out in terms of
precision, i.e. the inverse of the variance of survey estimates. For a given degree of precision,
the sample design should give the minimum cost. Alternatively, for a given per unit cost,
the sample design should achieve maximum precision (minimum variance).
It may be pointed out that these four criteria come into conflict with each other in most
cases.
Caution The researcher should carefully balance the conflicting criteria so that he is able to
select a really good sample design.
Self Assessment
Fill in the blanks:
5.
6.
The sample size depends upon the .................... available with the company.
Random sampling.
2.
3.
4.
Cluster sampling.
5.
Multistage sampling.
Random Sampling
Simple random sample is a process in which every item of the population has an equal probability
of being chosen.
2.
Using random number table: A random number table consists of a group of digits that are
arranged in random order, i.e., any row, column, or diagonal in such a table contains
digits that are not in any systematic order.
Tippet's table
(b)
(c)
40743
39672
80833
18496
10743
39431
88103
23016
53946
43761
31230
41212
24323
18054
Example: Taking the earlier example of stores. We first number the stores.
1
Equal Probability: This is also called random sampling with replacement.
Example: Put 100 chits, numbered 1 to 100, in a box. Pick one number at random. The box now
has 99 chits, so a second pick would be made from 99 chits rather than 100. In order
to keep the probability equal on every draw, the chit selected is put back into the box
before the next draw.
(b)
Varying Probability: This is also called random sampling without replacement. Once a
number is picked, it is not included again. Therefore, the probability of selecting a unit
varies from one draw to the next. In our example, the probabilities are 1/100, 1/99, 1/98
and 1/97 if we select four units out of 100.
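The two cases can be compared draw by draw. The following is a small sketch; the function name `draw_probabilities` is an assumption, and exact fractions are used so that the values match the ones quoted in the text.

```python
from fractions import Fraction

def draw_probabilities(population_size, draws, replace):
    """Probability of picking any one remaining unit on each successive draw."""
    probabilities = []
    remaining = population_size
    for _ in range(draws):
        probabilities.append(Fraction(1, remaining))
        if not replace:
            remaining -= 1   # the picked chit is not returned to the box
    return probabilities

print(draw_probabilities(100, 4, replace=True))    # equal probability
print(draw_probabilities(100, 4, replace=False))   # varying probability
```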
2.
One unit between the first and Kth unit in the population list is randomly chosen.
3.
Example: Suppose N = 1000 and n = 50. Calculate K = N/n = 1000/50 = 20.
To select the first unit, we randomly pick one number between 1 and 20, say 17. So our
sample is 17, 37, 57, and so on. Please note that only the first item was randomly
selected. The rest are systematically selected. This is a very popular method because we
need only one random number.
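The procedure can be sketched as follows. This is an illustrative sketch only: the function name and the optional `start` and `seed` parameters are assumptions, added so that the example from the text (start at 17) can be reproduced.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Pick one random start in 1..K, then take every K-th unit after it."""
    k = population_size // sample_size           # sampling interval K = N/n
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

sample = systematic_sample(1000, 50, start=17)
print(sample[:3], len(sample))
```

With N = 1000 and n = 50 the interval is 20, so a start of 17 gives the units 17, 37, 57 and so on up to 997.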
Proportionate stratified sampling: The number of sampling units drawn from each stratum
is in proportion to the population size of that stratum.
2.
Disproportionate stratified sampling: The number of sampling units drawn from each
stratum is based on the analytical consideration, but not in proportion to the size of the
population of that stratum.
1.
2.
Size of stores    No. of stores    Percentage of stores
Large stores       2,000            20
Medium stores      3,000            30
Small stores       5,000            50
Total             10,000           100
Suppose we need 12 stores; then choose four from each stratum, at random. If there were no
stratification, simple random sampling from the population would be expected to choose
about two large stores (20% of 12), about four medium stores (30% of 12) and about six small
stores (50% of 12).
As can be seen, each store can be studied separately using the stratified sample.
Marketing Stream
Finance Stream
HR Stream
32
11
33
34
36
12
13
35
37
40
15
17
38
39
43
18
20
41
42
46
19
21
44
45
47
22
24
49
48
60
23
25
59
58
10
57
28
26
60
56
14
50
27
29
52
51
16
53
31
30
55
54
Third step - Determine how many are to be selected from the marketing stream (say n1).
Sample to be selected from the marketing stratum: n1 = 30 × 1/10 = 3
Now we can select 3 numbers from among the 30 numbers at random, say 7, 60, 22.
Similarly, we can select n2 and n3:
n2 = 20 × 1/10 = 2
The 2 numbers selected at random from the finance stream are 13, 59.
n3 = 10 × 1/10 = 1
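The selection with a sampling fraction of 1/10 can be sketched in code. The roll numbers used below (1 to 30, 31 to 50, 51 to 60) are hypothetical placeholders, not the allotment shown in the table, and the function name and seed are likewise assumptions.

```python
import random

def stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        size = round(len(units) * fraction)
        sample[name] = sorted(rng.sample(units, size))
    return sample

# Hypothetical roll numbers: 30 marketing, 20 finance and 10 HR students.
strata = {
    "Marketing": list(range(1, 31)),
    "Finance": list(range(31, 51)),
    "HR": list(range(51, 61)),
}
selected = stratified_sample(strata, fraction=0.1)
print({name: len(units) for name, units in selected.items()})
```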
Stratified sampling can be carried out with:
1.
2.
Size of stores    No. of stores    Sample            Sample
                  (Population)     Proportionate     Disproportionate
Large              2,000            20                25
Medium             3,000            30                35
Small              5,000            50                40
Total             10,000           100               100
Size of stores    Sample mean         No. of stores    Percent of stores
                  (sales per store)
Large             200                  2,000            20
Medium             80                  3,000            30
Small              40                  5,000            50
Total                                 10,000           100
The population mean of monthly sales is calculated by multiplying each sample mean by its
relative weight and adding the results:
200 × 0.2 + 80 × 0.3 + 40 × 0.5 = 84
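The same weighted mean can be computed from the store counts rather than from pre-computed percentages. This is a short sketch; the variable names are assumptions.

```python
# (stratum, sample mean of sales per store, number of stores in the population)
strata = [("Large", 200, 2000), ("Medium", 80, 3000), ("Small", 40, 5000)]

total_stores = sum(count for _, _, count in strata)

# Weight each stratum mean by the stratum's share of the population.
population_mean = sum(mean * count / total_stores for _, mean, count in strata)
print(population_mean)
```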
Sample Proportionate
If N is the size of the population,
n is the size of the sample, and
i represents 1, 2, 3, ..., k [number of strata in the population],
then under proportionate sampling
$$P = \frac{n_1}{N_1} = \frac{n_2}{N_2} = \cdots = \frac{n_k}{N_k} = \frac{n}{N}$$
$$\frac{n_1}{N_1} = \frac{n}{N} \;\Rightarrow\; n_1 = \frac{n}{N} \times N_1, \text{ and so on.}$$
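Solution:
Total population, $N = 10{,}000$
Population in the stratum of Hindus, $N_1 = 6{,}000$
Population in the stratum of Muslims, $N_2 = 2{,}000$
Population in the stratum of Christians, $N_3 = 1{,}000$
Population in the stratum of Sikhs, $N_4 = 500$
Population in the stratum of Jains, $N_5 = 500$
Sample size, $n = 200$
$$P = \frac{n_1}{N_1} = \frac{n_2}{N_2} = \frac{n_3}{N_3} = \frac{n_4}{N_4} = \frac{n_5}{N_5} = \frac{n}{N}$$
$$n_1 = \frac{n}{N} \times N_1 = \frac{200}{10{,}000} \times 6{,}000 = 120$$
$$n_2 = \frac{n}{N} \times N_2 = \frac{200}{10{,}000} \times 2{,}000 = 40$$
$$n_3 = \frac{n}{N} \times N_3 = \frac{200}{10{,}000} \times 1{,}000 = 20$$
$$n_4 = \frac{n}{N} \times N_4 = \frac{200}{10{,}000} \times 500 = 10$$
$$n_5 = \frac{n}{N} \times N_5 = \frac{200}{10{,}000} \times 500 = 10$$
$$n = n_1 + n_2 + n_3 + n_4 + n_5 = 120 + 40 + 20 + 10 + 10 = 200.$$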
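The allocation in this solution can be reproduced with a few lines of code. This is a minimal sketch with an assumed function name; rounding is applied because stratum sizes do not always divide evenly, in which case the rounded parts may not add up exactly to n.

```python
def proportionate_allocation(stratum_sizes, sample_size):
    """Allocate the sample to strata in proportion to stratum size: n_i = (n/N) * N_i."""
    total = sum(stratum_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in stratum_sizes.items()
    }

sizes = {"Hindus": 6000, "Muslims": 2000, "Christians": 1000, "Sikhs": 500, "Jains": 500}
allocation = proportionate_allocation(sizes, sample_size=200)
print(allocation, sum(allocation.values()))
```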
Sample Disproportion
Let $\sigma_i^2$ be the variance of stratum i,
where
i = 1, 2, 3, ..., k.
The formula to compute the sample size of stratum i is
$$n_i = \frac{r_i \sigma_i}{\sum_{i=1}^{k} r_i \sigma_i} \times n$$
where
$N_i$ = size of stratum i,
$n_i$ = sample size of stratum i, and
$$r_i = \frac{N_i}{N}$$
Solution:
Total population, $N = 1{,}500$
$N_1 = 600$
$N_2 = 500$
$N_3 = 400$
Variance of stratum 1, $\sigma_1^2 = 8^2 = 64$
Variance of stratum 2, $\sigma_2^2 = 5^2 = 25$
Variance of stratum 3, $\sigma_3^2 = 4^2 = 16$
Sample size, $n = 100$
Stratum    Size of the     ri = Ni/N    ri σi    ni = ri σi n / Σ ri σi
number     stratum Ni
1            600           0.4          3.2      54
2            500           0.33         1.65     28
3            400           0.26         1.04     18
Total      1,500                        5.89    100
Example: Let us consider a case of 3 strata of income groups with given stratum variances.
Stratum         No. of households    Stratum variance
0 - 5000          300                4.00
5001 - 10,000     450                9.00
> 10,000          750                2.25
Total            1500
Find the number of households to be drawn from each stratum for a given sample size of 50.
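Solution:
Disproportional Stratified Sampling
Stratum         No. of elements/    Stratum     Stratum standard    Sample       Sampling ratio
                households (Ni)     variance    deviation           size (ni)    (ni/Ni)
0 - 5000          300               4.00        2.0                 10           0.033
5001 - 10,000     450               9.00        3.0                 22           0.049
> 10,000          750               2.25        1.5                 18           0.024
Total            1500                                               50
Multiplying each stratum size by its standard deviation gives 300 × 2.0 = 600, 450 × 3.0 = 1350
and 750 × 1.5 = 1125, which add up to 3075. Hence
$$n_1 = 50 \times \frac{600}{3075} \approx 10$$
$$n_2 = 50 \times \frac{1350}{3075} \approx 22$$
$$n_3 = 50 \times \frac{1125}{3075} \approx 18$$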
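Both disproportionate examples above follow the same rule: the sample is shared out in proportion to stratum size times stratum standard deviation, an allocation often called Neyman (optimum) allocation. The sketch below is an illustration under that reading, with an assumed function name. Because N cancels, weighting by N_i σ_i gives the same result as weighting by r_i σ_i.

```python
import math

def disproportionate_allocation(stratum_sizes, variances, sample_size):
    """Share the sample in proportion to N_i * sigma_i for each stratum."""
    weights = [size * math.sqrt(var) for size, var in zip(stratum_sizes, variances)]
    total = sum(weights)
    return [round(sample_size * w / total) for w in weights]

# First example: N = 1,500 with standard deviations 8, 5 and 4; n = 100.
print(disproportionate_allocation([600, 500, 400], [64, 25, 16], 100))

# Household income example: variances 4.00, 9.00 and 2.25; n = 50.
print(disproportionate_allocation([300, 450, 750], [4.00, 9.00, 2.25], 50))
```

Rounding each stratum separately happens to preserve the totals of 100 and 50 here; in general a final adjustment of one unit may be needed.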
Stratified Sampling in Practice: The main reasons for using stratified sampling for managerial
applications are:
1.
It can obtain information about different parts of the universe, i.e., it allows separate
conclusions to be drawn for each stratum.
2.
It often provides universe estimates of greater precision than other methods of random
sampling say simple random sampling.
However, the price paid for these advantages is high because of the complexity of design and
analysis.
Cluster Sampling
The following steps are followed:
1.
2.
3.
Step 1: The above-mentioned step of cluster sampling is similar to the first step of stratified
random sampling, but the two sampling methods are different. The key to cluster sampling is
how homogeneous or heterogeneous the clusters are.
A major advantage of simple cluster sampling is the ease of sample selection. Suppose we have
a population of 20,000 units from which we wish to select 500 units. Choosing a sample of that
size is a very time-consuming process if we use a random number table. If the entire
population is divided into 80 clusters of 250 units each, we can easily choose two sample
clusters (2 × 250 = 500) by using cluster sampling. The most difficult job is to form the
clusters. In marketing, the researcher forms clusters so that he can deal with each cluster
differently.
Example: Assume there are 16 households in a locality.
Cross    Houses
1        X1     X2     X3     X4
2        X5     X6     X7     X8
3        X9     X10    X11    X12
4        X13    X14    X15    X16
We need to select eight houses. We can choose eight houses at random. Alternatively, two
clusters, each containing four houses, can be chosen. In this method, every house has a
known probability of being chosen, i.e. a chance of one in two, since two of the four clusters
are selected. We must remember that in the cluster, each house has the same characteristics.
With cluster sampling, it is impossible for certain random samples to be selected. For example,
in the cluster sampling process described above, the following combination of houses could not
occur: X1 X2 X5 X6 X9 X10 X13 X14. This is because the original universe of 16 houses has been
redefined as a universe of four clusters. So only clusters can be chosen as a sample.
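The restriction can be verified by listing every sample that cluster sampling can produce. The sketch below assumes the grouping implied by the example, with four consecutive houses per cross; the cluster labels are assumptions.

```python
from itertools import combinations

# Assumed grouping: four crosses (clusters) of four consecutive houses each.
clusters = {
    f"Cross {c + 1}": [f"X{4 * c + h}" for h in range(1, 5)]
    for c in range(4)
}

# Every sample of eight houses obtainable by picking two whole clusters.
possible_samples = [
    frozenset(clusters[a] + clusters[b]) for a, b in combinations(clusters, 2)
]

# The combination that the text says cannot occur.
mixed = frozenset(["X1", "X2", "X5", "X6", "X9", "X10", "X13", "X14"])

# In how many of the possible samples does each house appear?
appearances = {
    house: sum(house in s for s in possible_samples)
    for houses in clusters.values()
    for house in houses
}

print(len(possible_samples), mixed in possible_samples, set(appearances.values()))
```

Only six samples are possible, the mixed combination is not among them, and every house appears in three of the six, which is the one-in-two chance mentioned above.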
Example: Suppose, we want to have 7500 households from all over the country. In such a
case, from the first stage, District, say 30 districts out of 600 are selected from all over the
country.
I Stage - Cities: Suppose 5 cities are selected out of each 30 districts; and
II Stage - Wards/Localities: say 10 wards/localities are selected from each city
III Stage - Households: 50 households are selected from each ward/locality.
In stage I, we can employ stratified sampling
In stage II, we can use cluster sampling
In stage III, we can have simple random sampling.
Caution The various methods used each contribute individually towards accuracy,
cost, time, etc. This leads us to conclude that multistage sampling leads to saving of time,
labour and money. Apart from this, wherever an appropriate frame is not available, the
use of multistage sampling has universal appeal.
Multistage Sampling
The name implies that sampling is done in several stages. This is used with stratified/cluster
designs.
An illustration of double sampling is as follows.
The management of a newly-opened club solicits new memberships. During the first round,
all corporates were sent details so that those who were interested could enrol. Once they have
enrolled, the second round concentrates on how many are interested in enrolling for the various
entertainment activities that the club offers, such as billiards, indoor sports, swimming, gym,
etc. After obtaining this information, you might stratify the interested respondents. This will
also tell you the reaction of new members to the various activities. This technique is
considered to be scientific, since there is no possibility of ignoring the characteristics of
the universe.
Task What are the advantages and disadvantages of multistage sampling? Enlist.
Area Sampling
This is a probability sampling, a special form of cluster sampling.
Example: If someone wants to measure the sales of toffee in retail stores, one might choose
a city locality and then audit toffee sales in retail outlets in those localities.
The main problem in area sampling is the non-availability of lists of shops selling toffee in a
particular area. Therefore, it would be impossible to choose a probability sample from these
outlets directly. Thus, the first job is to choose a geographical area and then list out outlets
selling toffee. Then follows the probability sample for shops among the list prepared.
Example: You may like to choose shops which sell the brand-Cadbury dairy milk. The
disadvantage of the area sampling is that it is expensive and time-consuming.
Deliberate sampling
2.
3.
Sequential sampling
4.
Quota sampling
5.
Snowball sampling
6.
Panel samples
Sequential Sampling
This is a method in which the sample is formed on the basis of a series of successive decisions.
They aim at answering the research question on the basis of accumulated evidence. Sometimes,
a researcher may want to take a modest sample and look at the results. Thereafter, he or she
will decide if more information is required, for which larger samples are considered. If the
evidence
is not conclusive after a small sample, more samples are required. If the position is still
inconclusive, still larger samples are taken. At each stage, a decision is made about whether
more information should be collected or the evidence is now sufficient to permit a conclusion.
Example: Assume that a product needs to be evaluated.
A small probability sample is taken from among the current users. Suppose it is found that
average annual usage is between 200 and 300 units. It is known that the product is economically
viable only if the average consumption is 400 units. This information is sufficient to take a
decision to drop the product. On the other hand, if the initial sample shows a consumption level
of 450 to 600 units, additional samples are needed for further study.
Quota Sampling
Quota sampling is quite frequently used in marketing research. It involves the fixation of
certain quotas, which are to be fulfilled by the interviewers.
Suppose 2,00,000 students are appearing for a competitive examination. We need to select 1% of
them based on quota sampling. The classification of quota may be as follows:
Category         Quota
General merit    1,000
Sport              600
NRI                100
SC/ST              300
Total            2,000
The population is divided into segments on the basis of certain characteristics. Here, the
segments are termed as cells.
2.
Snowball Sampling
This is a non-probability sampling method. In this method, the initial group of respondents is
selected randomly. Subsequent respondents are selected on the basis of the opinions or referrals
provided by the initial respondents. Further referrals lead to more referrals, thus producing a
snowball effect. The referrals will have demographic and psychographic characteristics that are
relatively similar to those of the person referring them.
Example: College students bring in more students on the consumption of Pepsi. The major
advantage of snowball sampling is that it monitors the desired characteristics in the population.
Panel Samples
Panel samples are frequently used in marketing research. To give an example, suppose that one
is interested in knowing the change in the consumption pattern of households. A sample of
households is drawn. These households are contacted to gather information on the pattern of
consumption. Subsequently, say after a period of six months, the same households are approached
once again and the necessary information on their consumption is collected.
Here, each member of a universe has a known chance of being selected and included in the
sample.
2.
Any personal bias is avoided. The researcher cannot exercise his discretion in the selection
of sample items.
Example: Random sample and cluster sample.
Non-probability Sample
In this case, the likelihood of choosing a particular universe element is unknown. The sample
chosen in this method is based on aspects like convenience, quota etc.
Example: Quota sampling and Judgment sampling.
When natural groupings are clear in a statistical population, the cluster sampling technique is
used. Stratified sampling, by contrast, is a method in which the members of a population are grouped
into relatively homogeneous groups.
Cluster sampling can be chosen if the population consists of homogeneous members. On the
other hand, for a population with heterogeneous members, stratified sampling is a good
option.
The benefit of cluster sampling over other sampling methods is that it is cheaper.
The benefit of stratified sampling is that it ignores
the irrelevant sub-populations and focuses on the vital ones. Another advantage of the
stratified random sampling method is that, for different sub-populations, the researcher
can opt for different sampling techniques. Stratified sampling also helps in
improving the efficiency and accuracy of the estimation and facilitates greater balancing
of the statistical power of tests.
The major disadvantage of cluster sampling is that it introduces higher sampling error. This
sampling error may be represented as the design effect. The disadvantage of the stratified random
sampling method is that it calls for a choice of relevant stratification variables, which can be
tough at times. When there are homogeneous subgroups, simple random sampling is not
very useful. Implementing stratified random sampling is expensive and, if correct information
about the population is not available, an error may be introduced.
All strata are represented in the sample; but only a subset of clusters are in the sample.
Self Assessment
Fill in the blanks:
7.
8.
9.
10.
11.
Stratified sampling can be carried out with ...................... proportion across the strata
proportionate stratified sample.
4.4 Fieldwork
Fieldwork consists of informal conversations as well as formal standardized interviews,
including projectives or questionnaires. Initially, a single person conducted the research. Changes
in society have shifted research for the most part into teamwork. However, a single person can
still conduct effective research. Traditionally, educational researchers began their research with
a set of hypotheses, whereas the fieldworker's hypothesis emerges through the fieldwork.
Fieldwork at its inception may seem to be disorganized. The notes may be scattered, and information
may come from all over the place. That is because the hypothesis has not yet emerged,
although at times the hypothesis may become clear very rapidly. Once the hypothesis becomes
evident, the fieldworker maintains an open mind, thus allowing other hypotheses to emerge.
Another important difference between the types of research is the "nature of the proposition
sought: his propositions are rarely of the A causes B type, the usual causal interrelationships
between two or more variables dealt with in an experimental research".
Much of the naturalistic data is collected as raw material: notes stating the actual responses
given. In order to be accurate, recorders are often used. Experienced researchers create their own
techniques and develop the ability to remember the information that needs to be recorded.
Did u know? How does a fieldworker know when the Enquiry should finish?
The fieldworker knows when the inquiry should finish by analyzing the data as it is gathered.
The end arrives when the fieldworker sees patterns and no new significant changes.
Three important points that must be included are:
1.
2.
Most practitioners of the method probably consider its products to have full status as
actual studies
3.
Self Assessment
Fill in the blanks:
12.
13.
...................... researchers create their own techniques and develop the ability to remember
the information that needs to be recorded.
also be done by choosing a sample without covering the entire population. There will be a
difference between the two methods with regard to monthly expenditure.
An MNC bank wants to pick up a sample among the credit card holders. They can readily
get a complete list of credit card holders, which forms their data bank. From this frame,
the desired individuals can be chosen. In this example, sample frame is identical to ideal
population namely all credit card holders. There is no sampling error in this case.
2.
Assume that a bank wants to contact the people belonging to a particular profession over
phone (doctors, lawyers) to market a home loan product. The sampling frame in this case
is the telephone directory. This sampling frame may pose several problems: (1) People
might have migrated. (2) Numbers have changed. (3) Many numbers were not yet listed.
The question is "Are the residents who are included in the directory likely to differ from
those who are not included"? The answer is yes. Thus in this case, there will be a sampling
error.
1.
Intimate the respondents in advance through a letter. This will improve the preparedness.
2.
3.
4.
5.
In case of a personal interview, an I.D. card is essential to prove the interviewer's bona fides.
6.
7.
8.
Self Assessment
Fill in the blanks:
14.
The only way to guarantee the minimization of sampling error is to choose the appropriate
...............................
15.
A ............................... is a specific list of population units, from which the sample for a study
is chosen.
16.
The first factor that must be considered in estimating sample size is the permissible error.
2.
3.
The higher the confidence level in the estimate, the larger the sample must be. There is a trade-off
between the degree of confidence and the degree of precision with a sample of fixed size.
4.
The greater the number of sub-groups of interest within the sample, the greater its size
must be.
5.
6.
The issue of response rate: Another issue to be considered in deciding the necessary sample
size is the actual number of questionnaires that must be sent out. Calculation-wise, we
may send questionnaires to the required number of people, but we may not receive a
response from all of them. For example, we may like to obtain the family income level from a mail survey,
but the researcher may not receive a response from everyone. If the researcher expects the
response rate to be 40%, then he needs to despatch enough extra questionnaires for the expected
returns to equal the required sample size, i.e., the required number divided by 0.40. A low
percentage of response can cause serious problems to the researcher. This is known as the
non-response error.
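The response-rate adjustment described above amounts to one division. The sketch below is only an illustration: the function name and the required sample of 200 completed questionnaires are assumptions made for the example, while the 40% response rate is the figure used in the text.

```python
def questionnaires_to_send(required_responses: int, response_rate_percent: int) -> int:
    """Number of questionnaires to despatch so that the expected number
    of returns is at least `required_responses`.

    Integer arithmetic is used so that the result is rounded up exactly.
    """
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response rate must be between 1 and 100 percent")
    return (required_responses * 100 + response_rate_percent - 1) // response_rate_percent


# Hypothetical requirement of 200 completed questionnaires at a 40% response rate.
print(questionnaires_to_send(200, 40))
```

With these assumed figures, 500 questionnaires would have to be despatched, that is, 300 more than the required sample.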
Non-response error may be due to (1) failure to locate, (2) flat refusal.
Failure to locate: People move to new destinations. However, if the sample frames used are of
recent origin, this problem can be overcome.
Flat refusal: We do not know if those who did not respond hold different views or opinions
from those who responded.
This implies that those who don't respond should be motivated. It can be done in any one of the
following ways:
1.
An advance letter informing the respondents that they will receive a questionnaire and
requesting their cooperation. This will generally increase the rate of response.
2.
Monetary incentive or gift given to respondents will yield a larger response rate.
3.
Proper follow-up is necessary after the potential respondent has received the questionnaire.
Example: Determine the sample size if standard deviation of the population is 3.9, population
mean is 36 and sample mean is 33 and the desired degree of precision is 99%.
Solution:
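Given,
$$\sigma = 3.9, \quad \mu = 36, \quad \bar{X} = 33, \quad z = 2.576 \ \text{(99\%)}$$
$$n = \left(\frac{z\,\sigma}{d}\right)^2, \quad \text{where } d = \mu - \bar{X}$$
i.e.
$$n = \left(\frac{2.576 \times 3.9}{36 - 33}\right)^2 = 11.21 \approx 11$$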
Example: Determine the sample size if the standard deviation of population is 12 and the
standard error (standard deviation of the sampling distribution) is 3.69.
Solution:
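Given the standard deviation of population $\sigma = 12$ and the standard error $\sigma_{\bar{X}} = 3.69$,
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$
$$n = \frac{\sigma^2}{\sigma_{\bar{X}}^2} = \frac{12^2}{3.69^2} = 10.57 \approx 11$$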
Example: Determine the sample size, if sample proportion p = 0.4 and standard error of
proportion is 0.043.
Solution:
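Given that $p = 0.4$, so that $q = 1 - p = 0.6$, and $\sigma_p = 0.043$.
We know that
$$\sigma_p = \sqrt{\frac{pq}{n}} \quad\Rightarrow\quad \sigma_p^2 = \frac{pq}{n}$$
$$n = \frac{pq}{\sigma_p^2} = \frac{0.4 \times 0.6}{(0.043)^2} \approx 129.8 \approx 130$$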
Example: Determine the sample size if the standard deviation of population is 8.66, sample
mean is 45, population mean 43 and the desired degree of precision is 95%.
Solution:
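Given that
$$\mu = 43, \quad \bar{X} = 45, \quad \sigma = 8.66$$
At the 5% l.o.s., $z = 1.96$.
$$n = \left(\frac{z\,\sigma}{d}\right)^2, \quad \text{where } d = \mu - \bar{X}$$
$$n = \left(\frac{1.96 \times 8.66}{43 - 45}\right)^2 = 72.03 \approx 72$$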
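The four worked examples above use three formulas: $n = (z\sigma/d)^2$, $n = \sigma^2/\sigma_{\bar{X}}^2$ and $n = pq/\sigma_p^2$. The following sketch recomputes them; the function names are illustrative only. It rounds to the nearest whole number, as the examples do, although rounding up is the more cautious choice in practice.

```python
def n_for_mean(z: float, sigma: float, d: float) -> float:
    """Sample size from n = (z * sigma / d)^2, where d is the permissible error."""
    return (z * sigma / abs(d)) ** 2


def n_from_standard_error(sigma: float, standard_error: float) -> float:
    """Sample size from standard_error = sigma / sqrt(n)."""
    return (sigma / standard_error) ** 2


def n_for_proportion(p: float, standard_error: float) -> float:
    """Sample size from standard_error = sqrt(p * q / n), with q = 1 - p."""
    return p * (1 - p) / standard_error ** 2


# The four examples from the text.
print(round(n_for_mean(2.576, 3.9, 36 - 33)))    # first example
print(round(n_from_standard_error(12, 3.69)))    # second example
print(round(n_for_proportion(0.4, 0.043)))       # third example
print(round(n_for_mean(1.96, 8.66, 43 - 45)))    # fourth example
```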
Self Assessment
Fill in the blanks:
17.
18.
There is a trade-off between the degree of confidence and the degree of ____________ with
a sample of fixed size.
where σ is the standard deviation of the population distribution of that quantity and n is the size
(number of items) in the sample.
Self Assessment
Fill in the blanks:
19.
20.
The standard deviation of the sampling distribution of the statistic is referred to as the
............................... of that quantity
4.8 Summary
The most important factors in deciding whether to choose a sample or a census are cost and
time. There are seven steps involved in selecting the sample.
There are two types of sampling, namely, probability sampling and non-probability sampling.
A random sample can be chosen by the lottery method or by using a random number table.
In systematic random sampling, only the first number is randomly selected. The remaining
numbers are then generated by adding a constant "K".
In stratified sampling, random samples are drawn from several strata, each of which has more or
less the same characteristics.
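The systematic random sampling rule summarised above (a random first number, then a constant "K" added repeatedly) can be sketched as follows. This is a minimal illustration which assumes that K is taken as the population size divided by the sample size, rounded down; the function name and the population of 100 numbered units are hypothetical.

```python
import random


def systematic_sample(population, sample_size, rng=None):
    """Pick a random start within the first K units, then every K-th unit."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    rng = rng or random.Random()
    k = len(population) // sample_size      # the sampling interval "K"
    start = rng.randrange(k)                # only this first position is random
    return [population[start + i * k] for i in range(sample_size)]


units = list(range(1, 101))                 # hypothetical frame of 100 numbered units
sample = systematic_sample(units, 10, random.Random(0))
print(sample)
```

Every selected unit after the first is fully determined by the starting point, which is why only one random draw is needed.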
4.9 Keywords
Census: It refers to complete inclusion of all elements in the population. A sample is a sub-group
of the population.
Deliberate Sampling: The investigator uses his discretion in selecting sample observations from
the universe. As a result, there is an element of bias in the selection.
Multistage Sampling: The name implies that sampling is done in several stages.
Quota Sampling: Quota sampling is quite frequently used in marketing research. It involves the
fixation of certain quotas, which are to be fulfilled by the interviewers.
Random Sampling: Simple random sampling is a process in which every item of the population
has an equal probability of being chosen.
Sample Frame: The sampling frame is the list of elements from which the sample is actually drawn.
Stratified Random Sampling: A probability sampling procedure in which simple random subsamples are drawn from within different strata that are more or less equal on some characteristics.
2.
Which method of sampling would you use in studies, where the level of accuracy can vary
from the prescribed norms and why?
3.
4.
Quota sampling does not require prior knowledge about the cell to which each population
unit belongs. Does this attribute serve as an advantage or disadvantage for Quota
Sampling?
5.
6.
One mobile phone user is asked to recruit another mobile phone user. What sampling
method is this known as and why?
7.
8.
Determine the sample size if the standard deviation of population is 20 and the standard
error is 4.1.
9.
What do you see as the reason behind purposive sampling being known as judgement
sampling?
10.
Suppose, the population consists of 45,000 households, divided into five (5) strata on the
basis of monthly income. This can be illustrated as below:
0 to 1000
1001 to 5000
5001 to 7500
7501 to 10,000
Above 10,000
Then
(a)
Find out the number of units from each stratum if the sample constitutes 1% of the
population.
(b)
If selection is for 150 items, selecting equally from each stratum, find out the number of
sample units from each stratum.
1. target
2. frame
3. large, homogeneous
4. small
5. target
6. resources
7. probability, non-probability
8. two
9. Equal Probability
10. Varying Probability
11. same
12. disorganized
13. Experienced
14. sample size
15. sampling frame
16. non response
17. larger
18. precision
19. population
20. standard error
Books
CONTENTS
Objectives
Introduction
5.1
5.1.1 Nominal Scale
5.1.2
5.1.3 Interval Scale
5.1.4 Ratio Scale
5.2
5.3 Scaling Meaning
5.4
5.4.1
5.4.2 Non-comparative Scale
5.5
5.5.1 Reliability Analysis
5.5.2 Validity Analysis
5.6 Summary
5.7 Keywords
5.8 Review Questions
5.9 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
Measurement is assigning numbers or other symbols to characteristics of objects being measured,
according to predetermined rules. Concept (or Construct) is a generalized idea about a class of
objects, attributes, occurrences, or processes.
Relatively concrete constructs comprise aspects such as age, gender, number of children,
education and income. Relatively abstract constructs take into account aspects such as brand
loyalty, personality, channel power and satisfaction.
Scaling is the generation of a continuum upon which measured objects are located.
A scale is a quantifying measure: a combination of items that is progressively arranged according
to value or magnitude. The purpose is to quantitatively represent an item's, person's, or event's
place in the scaling continuum.
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale
Characteristics
1.
2.
3.
Use: This scale is generally used in conducting surveys and ex-post-facto research.
Example: Have you ever visited Bangalore?
Yes-1
No-2
'Yes' is coded as 'One' and 'No' is coded as 'Two'. The numeric attached to the answers has no
meaning, and is a mere identification. If numbers are interchanged as one for 'No' and two for
'Yes', it won't affect the answers given by respondents. The numbers used in nominal scales
serve only the purpose of counting.
The telephone numbers are an example of nominal scale, where one number is assigned to one
subscriber. The idea of using nominal scale is to make sure that no two persons or objects receive
the same number. Similarly, bus route numbers are an example of a nominal scale.
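The point that nominal codes are mere labels can be checked with a small sketch. The five responses below are made up for illustration; the two codings are the ones discussed in the text (Yes = 1, No = 2) and its reversal. Swapping the codes leaves the most frequent answer unchanged, whereas an arithmetic mean of the codes changes and therefore has no meaning.

```python
from collections import Counter
from statistics import mean

responses = ["Yes", "No", "Yes", "Yes", "No"]    # hypothetical answers
coding_a = {"Yes": 1, "No": 2}                   # coding used in the text
coding_b = {"Yes": 2, "No": 1}                   # codes interchanged


def modal_answer(answers, coding):
    """Return the most frequent answer, found via its numeric code."""
    codes = [coding[a] for a in answers]
    label_of = {code: label for label, code in coding.items()}
    most_common_code, _count = Counter(codes).most_common(1)[0]
    return label_of[most_common_code]


mode_a = modal_answer(responses, coding_a)
mode_b = modal_answer(responses, coding_b)
mean_a = mean(coding_a[a] for a in responses)
mean_b = mean(coding_b[a] for a in responses)
print(mode_a, mode_b)    # same answer under both codings
print(mean_a, mean_b)    # different values, so the mean is not meaningful
```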
Caution: It should be kept in mind that the nominal scale has certain limitations, viz.
1.
2.
3.
Statistical implication - Calculation of the standard deviation and the mean is not
possible. It is possible to express the mode.
1. Lux
2. Liril
3. Cinthol
4. Lifebuoy
5. Hamam
Rank | Item | Number of respondents
I | Cinthol | 150
II | Liril | 300
III | Hamam | 250
IV | Lux | 200
V | Lifebuoy | 100
Total | | 1,000
Attribute | Rank
Company Image | 5
Functions | 3
Price | 2
Comfort | 1
Design | 4
The ordinal scale is used to arrange things in order. In qualitative research, rank ordering is used
to rank characteristics or units from the highest to the lowest.
Characteristics
1.
The ordinal scale ranks the things from the highest to the lowest.
2.
3.
4.
5.
Scales involve the ranking of individuals, attitudes or items along the continuum of the
characteristics being scaled.
From the information provided by ordinal scale, the researcher knows the order of preference
but nothing about how much more one brand is preferred to another i.e., there is no information
about the interval between any two brands. All of the information, a nominal scale would have
given, is available from an ordinal scale. In addition, positional statistics such as the median,
quartile and percentile can be determined. It is possible to test for order correlation with ranked
data. The two main methods are Spearman's Ranked Correlation Coefficient and Kendall's
Coefficient of Concordance which shall be discussed later in the unit.
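As a preview of order correlation on ranked data, the sketch below computes Spearman's rank correlation with the standard formula $\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$, which holds when there are no tied ranks. The two sets of ranks are made up for illustration and do not come from the text.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rank correlation for two rankings without ties."""
    if len(ranks_x) != len(ranks_y) or len(ranks_x) < 2:
        raise ValueError("need two rankings of equal length (at least 2 items)")
    n = len(ranks_x)
    # d is the difference between the two ranks given to the same item.
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks given to five brands by two respondents.
respondent_1 = [1, 2, 3, 4, 5]
respondent_2 = [2, 1, 4, 3, 5]
rho = spearman_rho(respondent_1, respondent_2)
print(rho)
```

A value near +1 indicates that the two rankings largely agree, and a value near -1 that they are nearly reversed.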
Did u know? What is the difference between nominal and ordinal scales?
In a nominal scale, numbers can be interchanged because they serve only for the purpose of identification and counting.
Numbers in an ordinal scale carry meaning (order), so they cannot be interchanged.
1.
2.
Teachers are ranked in the University as professor, associate professors, assistant professors
and lecturers, etc.
3.
Professionals in good organizations are designated as GM, DGM, AGM, SR.MGR, MGR,
Dy. MGR., Asstt. Mgr. and so on.
4.
Ranking of two or more households according to their annual income or expenditure, e.g.
Household | A | B | C | D | E
Income | 5,000 | 9,000 | 7,000 | 13,000 | 21,000
Rank: E (1), D (2), B (3), C (4), A (5)
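The household ranking can be reproduced with a short sketch. It assumes that the five incomes listed belong to households A to E in that order, which is inferred from the ranks E(1) to A(5) rather than stated explicitly. The median is also shown, since positional statistics are meaningful for ordinal data.

```python
from statistics import median

# Assumed pairing of households and incomes, inferred from the stated ranks.
incomes = {"A": 5000, "B": 9000, "C": 7000, "D": 13000, "E": 21000}

# Rank 1 goes to the household with the highest income.
ranking = sorted(incomes, key=incomes.get, reverse=True)
ranks = {household: position for position, household in enumerate(ranking, start=1)}

print(ranking)
print(median(incomes.values()))
```

The ranks say only that E earns more than D; they say nothing about how much more, which is the limitation of ordinal data noted in the text.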
One can ask respondents questions on the basis of one or more attributes such as flower, colour,
etc., and ask about liking or disliking, e.g., whether the respondent likes soft drinks or not.
I strongly like it | +2
I like it | +1
I am indifferent | 0
I dislike it | -1
I strongly dislike it | -2
In this manner, ranking can be obtained by asking the respondent their level of acceptability.
One can then combine the individual ranking and get a collective ranking of the group.
Interval scale uses the principle of "equality of interval" i.e., the intervals are used as the basis for
making the units equal assuming that intervals are equal.
It is only with an interval scaled data that researchers can justify the use of the arithmetic mean
as the measure of average. The interval or cardinal scale has equal units of measurement thus,
making it possible to interpret not only the order of scale scores but also the distance between
them. However, it must be recognized that the zero point on an interval scale is arbitrary and is
not a true zero. This, of course, has implications for the type of data manipulation and analysis
we can carry out on data collected in this form. It is possible to add or subtract a constant to all
of the scale values without affecting the form of the scale but one cannot multiply or divide the
values. It can be said that two respondents with scale positions 1 and 2 are as far apart as two
respondents with scale positions 4 and 5, but not that a person with score 10 feels twice as
strongly as one with score 5. Temperature is interval scaled, being measured either in Centigrade
or Fahrenheit. We cannot speak of 50°F being twice as hot as 25°F since the corresponding
temperatures on the centigrade scale, 10°C and -3.9°C, are not in the ratio 2:1.
Interval scales may be either numeric or semantic.
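The temperature argument can be verified numerically. The sketch below uses the standard conversion from Fahrenheit to Centigrade, C = (F - 32) × 5/9; the variable names are illustrative. Ratios of readings change with the unit, while ratios of intervals do not, which is exactly what an arbitrary zero point implies.

```python
import math


def fahrenheit_to_celsius(f: float) -> float:
    """Standard conversion between the two interval scales."""
    return (f - 32) * 5 / 9


c_50 = fahrenheit_to_celsius(50)
c_25 = fahrenheit_to_celsius(25)
c_75 = fahrenheit_to_celsius(75)

ratio_in_f = 50 / 25                 # looks like "twice as hot"
ratio_in_c = c_50 / c_25             # a completely different ratio

# Equal intervals stay equal after conversion (25 degrees F apart each).
interval_ratio = (c_50 - c_25) / (c_75 - c_50)

print(c_50, round(c_25, 1))
print(ratio_in_f, round(ratio_in_c, 2))
print(math.isclose(interval_ratio, 1.0))
```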
Characteristics
1.
2.
3.
4.
5.
Use: Most of the common statistical methods of analysis require only interval scales in order
that they might be used. These are not recounted here because they are so common and can be
found in virtually all basic texts on statistics.
Example:
1. Suppose we want to measure the rating of a refrigerator using an interval scale. It will appear
as follows:
(a) Brand name: Poor ........ Good
(b) Price: High ........ Low
(c) Service after-sales: Poor ........ Good
(d) Utility: Poor ........ Good
The researcher cannot conclude that the respondent who gives a rating of 6 is 3 times more
favourable towards a product under study than another respondent who awards the rating
of 2.
2.
(a) < 30 min.
(b) 30 min. to 1 hr.
(c) 1 hr. to 1½ hrs.
(d) > 1½ hrs.
Characteristics
1.
2.
For measuring central tendency, geometric and harmonic means are used.
Self Assessment
Fill in the blanks:
1.
....................... scale may tell us "How far the objects are apart with respect to an attribute?"
2.
Ratio scale is a special kind of interval scale that has a meaningful .................................
Determining the level of the involved data; identifying whether it is nominal, ordinal,
interval or ratio.
There are two primary scale construction techniques, comparative and non-comparative. The
comparative technique is used to determine the scale values of multiple items by performing
comparisons among the items. In the non-comparative technique, scale value of an item is
determined without comparing with another item. Furthermore, these two techniques are also
of many types. The various types of comparative techniques are:
1. Pairwise comparison scale: This is an ordinal level scale construction technique, where a
respondent is provided with two items and then asked to select his/her choice.
2. Rasch model scale: In this technique, multiple respondents are simultaneously involved
with several items and from their responses comparisons are derived to determine the
scale values.
3. Rank-order scale: This is also an ordinal level scale construction technique,
where a respondent is provided with multiple items, which he needs to rank accordingly.
4. Constant sum scale: In this scale construction technique, a respondent is usually provided
with a constant amount of money, credits or points that he needs to allocate to various
items for determining the scale values of the items.
The various types of non-comparative techniques are:
1. Continuous rating scale: In this technique, respondents generally use a series of numbers
known as scale points for rating an item. This technique is also known as graphic rating
scaling.
2. Likert scale: This technique allows the respondents to rate the items on a scale of five to
seven points depending upon the amount of their agreement or disagreement on the item.
3. Semantic differential scale: In this technique, respondents are asked to rate the different
attributes of an item on a seven-point scale.
Self Assessment
Fill in the blanks:
3.
Scale construction techniques are used for measuring the ...................... of a group.
4.
In the above figure, we assess the attitude of an individual by analysing his thoughts
about drinkers. You can see that as you move down the list, the attitude of people towards
drinkers becomes more provisional. If an individual agrees with a statement in the list, then it is
likely that he will also agree with all of the statements above it. Thus, in this
example, the rule is a cumulative one. This is called scaling. Scaling is done in the research process
to test hypotheses. Sometimes, scaling can also be used as a part of exploratory research.
Self Assessment
Fill in the blanks:
5.
6.
2.
[Figure: classification of scaling techniques]
Scaling Techniques
Comparative Scales: Paired Comparison, Rank Order, Constant Sum
Non-comparative Scales: Continuous Rating Scales; Itemized Rating Scales (Likert, Semantic Differential, Stapel)
Example: Here a respondent is asked to show his preferences from among five brands of
coffee, A, B, C, D and E, with respect to flavour. He is required to indicate his preference in
pairs. The number of pairs is calculated as follows. The brands to be rated are presented two at a
time, so each brand in the category is compared once to every other brand. In each pair, the
respondents are asked to divide 100 points on the basis of how much they like one compared
to the other. The scores are then totalled for each brand.
$$\text{No. of pairs} = \frac{N(N-1)}{2}$$
In this case, it is
$$\frac{5(5-1)}{2} = 10$$
The pairs are: A&B, A&C, A&D, A&E, B&C, B&D, B&E, C&D, C&E, D&E.
If there are 15 brands to be evaluated, then we have 105 paired comparisons; this rapid growth
in the number of pairs is the limitation of this method.
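The pair count and the list of pairs can be generated directly. The sketch below uses Python's standard `itertools.combinations`; the helper name `number_of_pairs` is illustrative.

```python
from itertools import combinations


def number_of_pairs(n: int) -> int:
    """Number of paired comparisons among n items: n(n - 1)/2."""
    return n * (n - 1) // 2


brands = ["A", "B", "C", "D", "E"]
pairs = list(combinations(brands, 2))     # every brand against every other, once

print(pairs)
print(number_of_pairs(len(brands)), number_of_pairs(15))
```

Tripling the number of brands from 5 to 15 raises the number of comparisons more than tenfold, from 10 to 105, which shows why the method becomes impractical for long lists.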
Example: For each pair of professors, please indicate the professor from whom you prefer
to take classes with a 1.
 | Cunningham | Day | Parker | Thomas
Cunningham | | | |
Day | | | |
Parker | | | |
Thomas | | | |
# of times Preferred | | | |
2.
3.
4.
Example: Please rank the instructors listed below in order of preference. For the instructor
you prefer the most, assign a "1", assign a "2" to the instructor you prefer the 2nd most, assign a
"3" to the instructor that you prefer 3rd most, and assign a "4" to the instructor that you prefer the
least.
Instructor | Ranking
Cunningham | ____
Day | ____
Parker | ____
Thomas | ____
1.
Respondents are asked to allocate a constant sum of units among a set of stimulus objects
with respect to some criterion
2.
3.
4.
Example: Listed below are 4 marketing professors, as well as 3 aspects that students typically
find important. For each aspect, please assign a number that reflects how well you believe each
instructor performs on the aspect. Higher numbers represent higher scores. The total of all the
instructors' scores on an aspect should equal 100.
Instructor | Availability | Fairness | Easy Tests
Cunningham | 30 | 35 | 25
Day | 30 | 25 | 25
Parker | 25 | 25 | 25
Thomas | 15 | 15 | 25
Sum Total | 100 | 100 | 100
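A constant sum response is valid only if the points given on each aspect add up to the constant, here 100. The sketch below checks the figures from the example; the function name `aspect_totals` and the dictionary layout are illustrative choices.

```python
CONSTANT = 100

# Points allocated in the example, per instructor and aspect.
scores = {
    "Cunningham": {"Availability": 30, "Fairness": 35, "Easy Tests": 25},
    "Day":        {"Availability": 30, "Fairness": 25, "Easy Tests": 25},
    "Parker":     {"Availability": 25, "Fairness": 25, "Easy Tests": 25},
    "Thomas":     {"Availability": 15, "Fairness": 15, "Easy Tests": 25},
}


def aspect_totals(allocation):
    """Sum the points given to all instructors on each aspect."""
    totals = {}
    for per_aspect in allocation.values():
        for aspect, points in per_aspect.items():
            totals[aspect] = totals.get(aspect, 0) + points
    return totals


totals = aspect_totals(scores)
invalid = [aspect for aspect, total in totals.items() if total != CONSTANT]
print(totals)
print(invalid)    # an empty list means the response is usable
```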
[Figure: rating line marked from 10 to 100 in steps of 10]
Likert Scale
It is also known as a summated rating scale. It consists of a series of statements concerning an
attitude object. Each statement is rated on a 5-point scale running from agree to disagree. These are
called summated scales because the scores of individual items are summed to produce a total
score for the respondent. The Likert Scale consists of two parts: an item part and an evaluation part.
The item part is usually a statement about a certain product, event or attitude. The evaluation part is a list
of responses ranging from "strongly agree" to "strongly disagree". The five-point scale is used here, with
numbers such as +2, +1, 0, -1, -2. Now, let us see with an example how the attitude of a
customer is measured with respect to a shopping mall.
Table 5.1: Evaluation of Globus-the Super Market by Respondent
S.No.
Strongly
disagree
Strongly
agree
1.
2.
3.
4.
5.
6.
The respondent's overall attitude is measured by summing up his (her) numerical ratings on the
statements making up the scale. Since some statements are favourable and others unfavourable,
one important task must be done before summing up the ratings: the "strongly
agree" category attached to a favourable statement and the "strongly disagree" category attached to
an unfavourable statement must always be assigned the same number, such as +2 or -2. The
success of the Likert Scale depends on how well the statements are generated. The higher the
respondent's score, the more favourable is the attitude. For example, if there are two shopping
malls, ABC and XYZ, and if the scores using the Likert Scale are 30 and 60 respectively, we can
conclude that the customers' attitude towards XYZ is more favourable than towards ABC.
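The summation with reversed scoring for unfavourable statements can be sketched as follows. The middle label "neutral", the function name and the four sample answers are assumptions made for illustration; the +2 to -2 weights follow the five-point scale described in the text.

```python
# Weights for a favourable statement; "neutral" is an assumed middle label.
WEIGHTS = {
    "strongly agree": 2,
    "agree": 1,
    "neutral": 0,
    "disagree": -1,
    "strongly disagree": -2,
}


def likert_score(answers, favourable_flags):
    """Total attitude score; unfavourable statements are reverse-scored."""
    total = 0
    for answer, is_favourable in zip(answers, favourable_flags):
        weight = WEIGHTS[answer]
        # Disagreeing with an unfavourable statement signals a favourable attitude.
        total += weight if is_favourable else -weight
    return total


# Hypothetical respondent: statements 1 and 3 are favourable, 2 and 4 are not.
answers = ["strongly agree", "disagree", "agree", "strongly disagree"]
favourable_flags = [True, False, True, False]
score = likert_score(answers, favourable_flags)
print(score)
```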
Caution: The Likert Scale must contain an equal number of favourable and unfavourable
statements.
Notes: Some items have the favourable description on the right side, while others have it
on the left side. The reason for the reversal is to have a combination of both favourable
and unfavourable statements.
Scale items | -3 | -2 | -1 | 0 | +1 | +2 | +3 |
1. Not reliable | | | | | | | | Reliable
2. Expensive | | | | | | | | Not expensive
3. Trustworthy | | | | | | | | Not trustworthy
4. Untimely delivery | | | | | | | | Timely delivery
5.
The respondents are asked to tick one of the seven categories which describes their views on the
attitude object. Computation is done in exactly the same way as in the Likert Scale. Suppose we are
trying to evaluate the packaging of a particular product. The seven-point scale will be as follows:
"I feel ..
1. Delighted
2. Pleased
3. Mostly satisfied
4.
5. Mostly dissatisfied
6. Unhappy
7. Terrible.
Thurstone Scale
This is also known as an equal appearing interval scale. The following are the steps to construct
a Thurstone Scale:
Step 1: Generate a large number of statements relating to the attitude to be measured.
Step 2: These statements (75 to 100) are given to a group of judges, say 20 to 30, who are asked
to classify them according to their degree of favourableness or unfavourableness.
Step 3: The judges sort the statements into 11 piles. The piles vary from "most unfavourable" in pile 1,
through neutral in pile 6, to "most favourable" in pile 11.
Step 4: Study the frequency distribution of ratings for each statement and eliminate those
statements to which different judges have given widely scattered ratings.
Step 5: Select one or two statements from each of the 11 piles for the final scale. List the selected
statements in random order to form the scale.
Step 6: The respondents whose attitudes are to be scaled are given the list of statements and
asked to indicate their agreement or disagreement with each statement. Some may agree with
one statement, while others may agree with more than one.
Example:
1.
(b)
(c)
(d)
(e)
Watching a movie with crime and violence does not interfere with my routine life.
(f)
I have no opinion one way or the other, about watching movies with crime and
violence.
(g)
(h)
Most movies with crime and violence are interesting and absorbing.
(i)
(j)
People learn "how to be safe and protect oneself" by seeing a movie on crime.
(k)
Conclusion: A respondent might agree with statements 8, 9 and 10. Such agreement
represents a favourable attitude towards crime and violence. On the contrary, if items 1, 3,
4 are chosen by respondents, it shows that respondents are unfavourably disposed towards
crime in movies. If the respondent chooses 1, 5 and 11, it could be interpreted to indicate
that s(he) is not consistent in his(her) attitude about the subject.
2.
One should live for the present and not the future. So, savings are absolutely not
required.
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i)
Some amount of savings and investments are a must for every individual.
(j)
(k)
Multidimensional Scaling
This is used to study consumer attitudes, particularly with respect to perceptions and preferences.
These techniques help identify the product attributes that are important to the customers and to
measure their relative importance. Multi-Dimensional Scaling is useful in studying the following:
1.
(a) What are the major attributes considered while choosing a product (soft drinks, modes
of transportation)? (b) Which attributes do customers compare to evaluate different brands
of the product? Is it price, quality, availability etc.?
2.
Which is the ideal combination of attributes according to the customer? (i.e., which two or
more attributes consumer will consider before deciding to buy.)
3.
Which advertising messages are compatible with the consumer's brand perceptions?
Notes: Multidimensional scaling is used to describe the similarity and preference of brands.
The respondents are asked to indicate their perception of the similarity between various
objects (products, brands, etc.) and their preference among the objects. This scaling is also known as
perceptual mapping.
There are two ways of collecting the input data to plot perceptual mapping:
1.
Non-attribute method: Here, the researcher asks the respondent to make a judgment
about the objects directly. In this method, the criteria for comparing the objects are decided
by the respondent himself.
2.
Attribute method: In this method, instead of the respondents selecting the criteria, they are
asked to compare the objects based on the criteria specified by the researcher.
For example, to determine the perception of a consumer: Assume there are five insurance
companies to be evaluated on two attributes namely (1) convenient locality (2) courteous personal
service. Customers' perception regarding the five insurance companies are as follows:
Figure 5.3
2.
3.
4.
5.
Packages such as SPSS, SAS and Excel are used for MDS. Brand positioning research
is one of the important applications of SPSS. SAS is business intelligence software. Excel is also used to
a certain extent.
Stapel Scales
1.
Modern versions of the Stapel scale place a single adjective as a substitute for the semantic
differential when it is difficult to create pairs of bipolar adjectives.
2.
The advantage and disadvantages of a Stapel scale, as well as the results, are very similar
to those for a semantic differential.
Table: Comparison of rating scales

Scale: Continuous Rating Scale
Basic Characteristics: Place a mark on a continuous line
Examples: Reaction to TV commercials
Advantages: Easy to construct
Disadvantages: Cumbersome scoring unless computerized

Scale: Likert Scale
Basic Characteristics: Degree of agreement on a numbered scale
Examples: Measurement of attitudes, perceptions
Advantages: Easy to construct, administer, & understand
Disadvantages: More time consuming

Scale: Semantic Differential
Basic Characteristics: Numbered scale with bipolar labels
Examples: Brand, product, & company images
Advantages: Versatile
Disadvantages: Difficult to construct appropriate bipolar adjectives

Scale: Stapel Scale
Basic Characteristics: Unipolar numbered scale, no neutral point
Examples: Measurement of attitudes & images
Advantages: Easy to construct, can administer over telephone
Disadvantages: Confusing, difficult to apply
Self Assessment
Fill in the blanks:
7.
The advantage and disadvantages of a Stapel scale, as well as the results, are very similar
to those for a ...................... differential.
8.
9.
10.
11.
12.
13.
1.
Reliability; and
2.
Validity
Figure: Scale Evaluation
Reliability: Test-Retest, Alternative Forms, Internal Consistency
Validity: Content, Criterion, Construct
Construct Validity: Convergent Validity, Discriminant Validity, Nomological Validity
2.
3.
4.
The consistency with which each item represents the construct of interest
(b)
(c)
Split-half Reliability
5.
Items constituting the scale divided into 2 halves, and resulting half scores are correlated:
Coefficient alpha (most common test of reliability)
6.
Average of all possible split-half coefficients resulting from different splitting of the scale
items.
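The two internal consistency measures above can be sketched in a few lines. The example assumes a small table of responses in which each row is one respondent and each column is one scale item; the data and the function names (pearson_r, split_half, coefficient_alpha) are hypothetical. Coefficient alpha is computed here with the usual variance-based formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), rather than by literally averaging every possible split-half coefficient.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_half(responses):
    """Correlate the half scores from odd-numbered and even-numbered items."""
    first_half = [sum(row[0::2]) for row in responses]
    second_half = [sum(row[1::2]) for row in responses]
    return pearson_r(first_half, second_half)


def coefficient_alpha(responses):
    """Variance-based coefficient alpha for k items."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to a four-item, five-point scale.
RESPONSES = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 1],
]

print("Split-half correlation:", round(split_half(RESPONSES), 3))
print("Coefficient alpha:", round(coefficient_alpha(RESPONSES), 3))
```

Values close to 1 suggest that the items measure the same construct consistently. Both functions fail with a division by zero if all respondents give identical answers, since there is then no variance to analyse.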
Construct Validity: A sales manager believes that there is a clear relation between job
satisfaction for a person and the degree to which a person is an extrovert and the work
performance of his sales force. Therefore, those who enjoy high job satisfaction, and have
extrovert personalities should exhibit high performance. If they do not, then we can
question the construct validity of the measure.
2.
Content Validity: A researcher should define the problem clearly, identify the item to be
measured and evolve a suitable scale for this purpose. Despite these steps, the scale may be criticised
for lacking content validity. Content validity is also known as face validity. An
example is the introduction of a new packaged food that represents a major change in taste.
Thousands of consumers
may be asked to taste the new packaged food. Overwhelmingly, people may say that they
liked the new flavour. Despite such a favourable reaction, the product, when introduced on a
commercial scale, may still meet with failure. So, what went wrong? Perhaps a crucial question
was omitted. The people may have been asked if they liked the new packaged food, to which the
majority might have said "yes", but the same respondents were not asked, "Are you willing to
give up the product which you are consuming currently?" In this case, the problem was not
clearly identified and the item to be 'measured' was left out.
3.
Predictive Validity: This pertains to how well a researcher can predict future
performance from knowledge of the attitude score.
Example: An opinion questionnaire, which is the basis for forecasting the demand for a
product, has predictive validity. The procedure for predictive validity is to first measure the
attitude and then predict the future behaviour. This is followed by the measurement of
the actual behaviour at an appropriate later time. The two results (the earlier attitude score and
the later behaviour) are then compared. If the two
scores are closely associated, then the scale is said to have predictive validity.
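The procedure for predictive validity can be sketched as a simple correlation check. The example assumes that the association between the attitude score and the later behaviour is measured with the Pearson correlation coefficient; this is one common choice, not the only one. The scores, the purchase counts and the function name pearson_r are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical data for five respondents.
# Step 1: attitude scores measured today with the opinion questionnaire.
attitude_scores = [2, 3, 5, 6, 8]
# Step 2: purchases by the same respondents, measured at a later date.
later_purchases = [1, 2, 2, 4, 5]

r = pearson_r(attitude_scores, later_purchases)
print("Correlation between attitude and later behaviour:", round(r, 3))
```

A coefficient close to +1 indicates that the two sets of scores are closely associated, which supports the predictive validity of the scale. A coefficient near zero suggests that the attitude score says little about future behaviour.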
4.
Criterion Validity:
(a)
(b)
5.
Convergent Validity: Extent to which scale correlates positively with other measures of
the same construct.
6.
Discriminant Validity: Extent to which a measure does not correlate with other constructs
from which it is supposed to differ.
7.
Figure: Old Rifle, Low Reliability (Target A); New Rifle, High Reliability (Target B)
Self Assessment
Fill in the blanks:
14.
An ....................... questionnaire, which is the basis for forecasting the demand for a product
has predictive validity.
15.
Those who enjoy high job satisfaction, and have extrovert personalities should exhibit
.................... performance.
16.
17.
There are two criteria to decide whether the scale selected is good or not, viz. .......................
and ....................
5.6 Summary
The scales show the extent of likes/dislikes, agreement/disagreement or belief towards an
object.
There are four types of scales used in market research, namely paired comparison, Likert,
semantic differential and Thurstone scale.
Likert is a five-point scale whereas the semantic differential scale is a seven-point scale.
The Thurstone scale is used to assess the attitude of a group of respondents regarding any issue of
public interest.
The validity and reliability of the scale are verified before the scale is used for measurement.
There are three methods to check validity. Which type of validity is required depends
on "what is being measured".
5.7 Keywords
Interval Scale: Interval scale may tell us "How far the objects are apart with respect to an
attribute?"
Likert Scale: This consists of a series of statements concerning an attitude object. Each statement
is rated on a five-point scale ranging from agree to disagree.
Ordinal Scale: The ordinal scale is used for ranking in most market research studies.
Ratio Scale: Ratio scale is a special kind of interval scale that has a meaningful zero point.
Reliability: It means the extent to which the measurement process is free from errors.
2.
3.
Which do you find to be more favorable of the attribute and non-attribute method of
perceptual mapping and why?
4.
5.
One of the limitations of MDS can be that it keeps changing from time to time. What else
than this do you see as the major drawbacks it has?
6.
What can be the reasons for which you think that maintaining reliability can become
difficult?
7.
Does measurement scale always perform as expected in relation to other variables selected
as meaningful criteria? Why/why not?
8.
On an average, how many cups of tea do you drink in a day and why? Reply technically.
9.
Likert scale
(b)
(c)
Thurstone scale
10.
11.
Identify the type of scale you will use in each of the following (ordinal, nominal, interval,
ratio). Justify your answer.
1.
Interval
2.
zero point
3.
attitude
4.
multiple
Attitude
6.
hypothesis
7.
semantic
8.
Multidimensional
9.
10.
Likert
11.
12.
Rank Order
13.
two or more
14.
opinion
15.
high
16.
accuracy, consistency
17.
reliability, validity
Books
6.2 Observation Method
6.2.1
6.2.2
6.2.3
6.3
6.3.2 Characteristics of Survey
6.3.3 Purpose of Survey
6.3.4 Advantages of Survey
6.3.5 Disadvantages of Survey
6.4 Survey Methods
6.4.1 Personal Interviews
6.4.2 Telephone Surveys
6.4.3
6.4.4 E-mail Surveys
6.4.5
6.4.6 Mail Questionnaire
6.5 Questionnaire
6.5.1
6.6 Summary
6.7 Keywords
6.8 Review Questions
6.9 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
The data directly collected by the researcher, with respect to the problem under study, is known
as primary data. Primary data is also the firsthand data collected by the researcher for the
immediate purpose of the study.
Primary data is the data that is collected by the researchers for the purpose of investigation. This
data is original in character and generated by surveys. Primary data is the information collected
during the course of experiment in an experimental research. It can also be obtained through
observations or through direct communication with the persons associated with the selected
subject by performing surveys or descriptive research.
Validity: Validity is one of the major concerns in a research. Validity is the quality of a
research that makes it trustworthy and scientific. Validity is the use of scientific methods
in research to make it logical and acceptable. Using primary data in research can improve
the validity of research. First-hand information obtained from a sample that is
representative of the target population will yield data that will be valid for the entire
target population.
2.
3.
Reliability: Reliability is the certainty that the research is true enough to be trusted.
For example, if a research study concludes that junk food consumption does not increase
the risk of cancer and heart diseases, this conclusion should be drawn from a
sample whose size, sampling technique and variability are not questionable. Reliability
improves with the use of primary data. In the research mentioned above, if the researcher
uses the experimental method and questionnaires, the results will be highly reliable. On the
other hand, if he relies on the data available in books and on the internet, he will collect
information that does not represent the real facts.
One limitation of primary data collection is that it consumes a lot of time. The researchers will
need to make certain preparations in order to handle the different demands of the processes and
at the same time, manage time effectively. Besides time consumption, the researchers will
collect large volumes of data when they collect primary data. Since they will interact with
different people, they will end up with large volumes of data, which they will need to go
through when analyzing and evaluating their findings. Primary data collection also requires a greater
proportion of the workforce to be engaged in the collection and analysis of information, which
increases the complexity of operations. A large amount of resources is required to collect
primary data.
There are several methods of collecting the primary data, which are as follows:
Observation Method
Interview Method
Through Questionnaires
Through Schedules
Other methods such as warranty cards, distributor audits, pantry audits, consumer panels, using
mechanical devices, through projective techniques, depth interviews and content analysis.
Observation and questioning are two broad approaches available for primary data collection.
The major difference between the two approaches is that in the questioning process, the
respondents play an active role because of their interaction with the researcher.
Self Assessment
Fill in the blanks:
1.
2.
The major difference between the observation and questioning approaches is that in the
...................... process
Example: Suppose a Road Safety Week is observed in a city and the public is made aware of
advance precautions while walking on the road. After one week, an observer can stand at a street
corner and observe the number of people walking on the footpath and those walking on the
road during a given period of time. This will tell him whether the campaign on safety is
successful or unsuccessful.
Sometimes, observation will be the only method available to the researcher.
Example: Behaviour or attitude of the children, and also of those who are inarticulate.
There are several methods of observation of which any one or a combination of some of them,
could be used by the observer. Some of these are:
Direct-indirect observation
Human-mechanical observation
Structured-Unstructured Observation
Whether the observation should be structured or unstructured depends on the data needed.
Example: A manager of a hotel wants to know how many of his customers visit the hotel
with their families and how many come as single customers. Here, the observation is structured,
since it is clear "what is to be observed". He may instruct his waiters to record this. This information
is required to decide the requirements for chairs and tables, and also the ambience.
Suppose, the manager wants to know how single customers and those with families behave and
what their attitudes are like. This study is vague, and it needs a non-structured observation.
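A structured observation of the kind the hotel manager needs can be recorded as a simple tally, because the categories are fixed before observation begins. The sketch below assumes that each waiter notes one of two predefined categories per arriving party; the category labels, the sample records and the function names (tally, shares) are hypothetical.

```python
from collections import Counter

# Categories are decided before observation starts; this is what makes
# the observation structured. The labels are assumptions for illustration.
CATEGORIES = ("family", "single")


def tally(observations):
    """Count observations per category, rejecting anything unplanned."""
    counts = Counter({category: 0 for category in CATEGORIES})
    for observation in observations:
        if observation not in CATEGORIES:
            raise ValueError(f"Unplanned category: {observation!r}")
        counts[observation] += 1
    return counts


def shares(counts):
    """Convert the counts into proportions of all observed parties."""
    total = sum(counts.values())
    return {category: counts[category] / total for category in CATEGORIES}


# Hypothetical records noted by the waiters during one evening.
evening_records = ["family", "single", "family", "family"]

counts = tally(evening_records)
print(dict(counts))
print(shares(counts))
```

An unstructured observation, such as noting how customers behave, cannot be reduced to a tally in advance, which is why it needs free-form notes instead.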
Did u know? The observation method is the only method applicable to study the growth of
plants and crops.
!
Caution To use a more structured approach, it would be necessary to decide precisely what
is to be observed and the specific categories and units that would be used to record the
observations.
Disguised-Undisguised Observation
In disguised observation, the respondents do not know that they are being observed. In undisguised observation, the respondents are well aware that they are being observed. In disguised
observation, observers often pose as shoppers. They are known as "mystery shoppers" and are
paid by research organisations. The main strength of disguised observation is that it allows for
registering the true behaviour of the individuals.
In the undisguised method, observations may be restrained due to induced error by the objects
of observation. The ethical aspect of disguised observations is still open to question and debate.
Direct-Indirect Observation
In direct observation, the actual behaviour or phenomenon of interest is observed. In indirect
observation, the results of the consequences of the phenomenon are observed. Suppose, a
researcher is interested in knowing about the soft drinks consumption of a student in a hostel
room. He may like to observe empty soft drink bottles dropped into the bin. Similarly, the
observer may seek the permission of the hotel owner to visit the kitchen or stores. He may carry
out a kitchen/stores audit, to find out the consumption of various brands of spice items being
used by the hotel. It may be noted that the success of an indirect observation largely depends on
"how best the observer is able to identify physical evidence of the problem under study".
Human-Mechanical Observation
Most of the studies in marketing research are based on human observation, wherein trained
observers are required to observe and record their observation. In some cases, mechanical
devices such as eye cameras are used for observation. One of the major advantages of electrical/
mechanical devices is that their recordings are free from any subjective bias.
The original data can be collected at the time of occurrence of the event.
2.
Observation is done in natural surroundings. Therefore, the facts emerge more clearly,
whereas questionnaires and experiments have environmental as well as time constraints.
3.
Sometimes, the respondents may not like to part with some of the information. Such
information can be obtained by the researcher through observation.
4.
5.
Any bias on the part of the researcher is greatly reduced in the observation method.
1.
The observer might wait for a long period at the point of observation, and yet the desired
event may not take place. Observation may be required over a long period of time, and even then
the event may not occur.
2.
3.
4.
External observation provides only superficial indications. To delve beneath the surface is
very difficult. Only overt behaviour can be observed.
5.
Two observers may observe the same event, but may draw different inferences.
6.
Tasks What observation technique would you use to gather the following information?
1.
What kind of influence do children have on the purchase behaviour of their parents?
2.
3.
A study to find out the potential location for a snack bar in a city.
Self Assessment
Fill in the blanks:
3.
4.
5.
2.
3.
4.
5.
2.
3.
4.
5.
6.
7.
It is systematic, follows specific set of rules, a formal and orderly logic of sequence.
8.
Information gathering: It collects information for a specific purpose. For example, polls,
census, customer satisfaction, attitude, etc.
2.
Theory testing and building: Surveys are also used for the purpose of testing and building
theory. For example, personality and social psychology theories.
Lack of control.
Self Assessment
Fill in the blanks:
6.
7.
8.
Advantages
The ability to let the Interviewee see, feel and/or taste a product.
The ability to find the target population. For example, you can find people who have seen
a film much more easily outside a theater in which it is playing than by calling phone
numbers at random.
Longer interviews are sometimes tolerated, particularly with in-home interviews that
have been arranged in advance. People may be willing to talk longer face-to-face than to
someone on the phone.
Disadvantages
Personal interviews usually cost more per interview than other methods.
Advantages
People can usually be contacted faster over the telephone than with other methods.
You can dial random telephone numbers when you do not have the actual telephone
numbers of potential respondents.
Skilled interviewers can often elicit longer or more complete answers than people will
give on their own to mail or e-mail surveys.
Disadvantages
Many telemarketers have given legitimate research a bad name by claiming to be doing
research when they start a sales call.
The growing number of working women often means that no one is at home during the
day. This limits calling time to a "window" of about 6-9 p.m. (when you can be sure to
interrupt dinner or a favorite TV program).
Advantages
Respondents answer sensitive questions more accurately on a computer than to a person or
on a paper questionnaire.
Interviewer bias is eliminated. Different interviewers can ask questions in different ways,
leading to different results. The computer asks the questions the same way every time.
Ensuring skip patterns are accurately followed. The Survey System can ensure people are
not asked questions they should skip based on their earlier answers. These automatic
skips are more accurate than relying on an Interviewer reading a paper questionnaire.
Response rates are usually higher as it looks novel and interesting to some people.
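The automatic skip patterns mentioned above can be sketched as follows. This is a minimal illustration of the general idea, not the logic of any particular survey package: each question may carry a condition on the earlier answers, and a question whose condition is not met is never asked. The question wording, the ask_if key and the function run_interview are hypothetical.

```python
# Each question is a dictionary. The optional "ask_if" entry is a function
# of the answers collected so far; it is a made-up convention for this sketch.
QUESTIONS = [
    {"id": "owns_car", "text": "Do you own a car? (yes/no)"},
    {
        "id": "car_brand",
        "text": "Which brand of car do you own?",
        "ask_if": lambda answers: answers.get("owns_car") == "yes",
    },
    {"id": "age_group", "text": "Which age group do you belong to?"},
]


def run_interview(questions, answer_fn):
    """Ask the questions in order, skipping those whose condition fails."""
    answers = {}
    for question in questions:
        condition = question.get("ask_if")
        if condition is not None and not condition(answers):
            continue  # skip pattern applies: the question is not asked
        answers[question["id"]] = answer_fn(question)
    return answers


# A scripted respondent who does not own a car, used in place of live input.
scripted = {"owns_car": "no", "age_group": "25-34"}
result = run_interview(QUESTIONS, lambda question: scripted[question["id"]])
print(result)
```

Because the program, not the interviewer, decides which question comes next, the respondent without a car is never shown the brand question.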
Disadvantages
The interviewees must have access to a computer or it must be provided for them.
As with mail surveys, computer direct interviews may have serious response rate problems
in populations with low literacy levels.
Advantages
Speed: An email questionnaire can gather several thousand responses within a day or two.
There is practically no cost involved once the set up has been completed.
The novelty element of an email survey often stimulates higher response levels than
ordinary mail surveys.
Disadvantages
Some people will respond several times or pass questionnaires along to friends to answer.
Many people dislike unsolicited email even more than unsolicited regular mail.
Findings cannot be generalised with email surveys. People who have email are different
from those who do not, even when matched on demographic characteristics, such as age
and gender.
!
Caution Software selection is especially important in internet surveys, so the software should be selected
with proper care and after analysis through feasibility studies.
Advantages
Web page surveys are extremely fast. A questionnaire posted on a popular Web site can
gather several thousand responses within a few hours. Many people who will respond to
an email invitation to take a Web survey will do so the first day, and most will do so
within a few days.
There is practically no cost involved once the set up has been completed.
Pictures can be shown. Some Web survey software can also show video and play sound.
Web page questionnaires can use complex question skipping logic, randomizations and
other features that are not possible with paper questionnaires. These features can assure
better data.
Web page questionnaires can use colors, fonts and other formatting options not possible
in most email surveys.
A significant number of people will give more honest answers to questions about sensitive
topics, such as drug use or sex, when giving their answers to a computer, instead of to a
person or on paper.
Disadvantages
Current use of the Internet is far from universal. Internet surveys do not reflect the
population as a whole. This is true even if a sample of Internet users is selected to match
the general population in terms of age, gender and other demographics.
People can easily quit in the middle of a questionnaire. They are not as likely to complete
a long questionnaire on the Web as they would be if talking with a good interviewer.
Depending on your software, there is often no control over people responding multiple
times to bias the results.
Advantages
1.
2.
Since the interviewer is not present face to face, the influence of interviewer on the
respondent is eliminated.
3.
This is the only kind of survey you can do if you have the names and addresses of the
target population, but not their telephone numbers.
4.
Mail surveys allow the respondent to answer at their leisure, rather than at the often
inconvenient moment they are contacted for a phone or personal interview. For this
reason, they are not considered as intrusive as other kinds of interviews.
5.
Where the questions asked are such that they cannot be answered immediately and need
some thinking on the part of the respondent, the respondent can think them over at leisure and
give the answer.
6.
7.
8.
9.
The questionnaire can include pictures - something that is not possible over the phone.
Limitations
1.
It is not suitable when questions are difficult and complicated. Example: "Do you believe in
value price relationship?"
2.
3.
In case of a mail questionnaire, it is not possible to verify whether the respondent himself/
herself has filled the questionnaire. If the questionnaire is directed towards the housewife,
say, to know her expenditure on kitchen items, she alone is supposed to answer it. Instead,
if her husband answers the questionnaire, the answer may not be correct.
4.
Example: Prorated discount, product profile, marginal rate, etc., may not be understood
by the respondents.
5.
If the answers are not correct, the researcher cannot probe further.
6.
7.
In populations of lower educational and literacy levels, response rates to mail surveys are
often too small to be useful.
2.
3.
If a lengthy questionnaire has to be made, first write a letter requesting the co-operation
of the respondents.
4.
5.
Self Assessment
Fill in the blanks:
9.
10.
11.
6.5 Questionnaire
What is Questionnaire?
A questionnaire is a research instrument consisting of a series of questions and other prompts
for the purpose of gathering information from respondents. The questionnaire was invented by
Sir Francis Galton.
Characteristics of Questionnaire
1.
2.
3.
It should be specific, so as to allow the interviewer to keep the interview to the point.
4.
5.
2.
Get additional information on the research issue, from secondary data and exploratory
research. The exploratory research will suggest "what are the relevant variables?"
3.
4.
The type of information required. There are several types of information such as
(a) awareness, (b) facts, (c) opinions, (d) attitudes, (e) future plans, (f) reasons.
Example: Which television programme did you see last Saturday? This requires a reasonably
good memory and the respondent may not remember. This is known as recall loss. Therefore,
questioning the distant past should be avoided. Memory of events depends on (1) Importance of
the events, and (2) Whether it is necessary for the respondent to remember. In the above case,
both the factors are not fulfilled. Therefore, the respondent does not remember. On the contrary,
a birthday or wedding anniversary of individuals is remembered without effort since the event
is important. Therefore, the researcher should be careful while asking questions about the past.
First, he must make sure that the respondent has the answer.
Example: Do you go to the club? He may answer 'yes', though it is untrue. This may be
because the respondent wants to impress upon the interviewer that he belongs to a well-to-do
family and can afford to spend money on clubs. To obtain facts, the respondents must be
conditioned (by good support) to part with the correct facts.
2.
3.
4.
1.
Cost is less
(b)
Lasts longer
(c)
Better fragrance
(d)
(d)
Example: "Subjects attitude towards Cyber laws and the need for government legislation
to regulate it".
Certainly, not needed at present
Certainly not needed
I can't say
Very urgently needed
Not urgently needed
2.
(b)
(c)
(d)
Depending on which answer the respondent chooses, his knowledge on the subject is
classified.
In a disguised type, the respondent is not informed of the purpose of the questionnaire.
Here the purpose is to hide "what is expected from the respondent?"
Example: "Tell me your opinion about Mr. Ben's healing effect show conducted at
Bangalore?"
"What do you think about the Babri Masjid demolition?"
3.
Non-Structured and Disguised Questionnaire: The main objective is to conceal the topic of
enquiry by using a disguised stimulus. Though the stimulus is standardized by the
researcher, the respondent is allowed to answer in an unstructured manner. The assumption
made here is that individual's reaction is an indication of respondent's basic perception.
Projective techniques are examples of non-structured disguised technique. The techniques
involve the use of a vague stimulus, which an individual is asked to expand, describe or
build a story around. Three common types under this category are (a) Word association (b) Sentence
completion (c) Story telling.
4.
Non structured and Non disguised Questionnaire: Here the purpose of the study is clear,
but the responses to the question are open-ended. Example: "How do you feel about the
Cyber law currently in practice and its need for further modification"? The initial part of
the question is consistent. After presenting the initial question, the interview becomes
very unstructured as the interviewer probes more deeply. Subsequent answers by the
respondents determine the direction the interviewer takes next. The question asked by the
interviewer varies from person to person. This method is called "the depth interview". The
major advantage of this method is the freedom permitted to the interviewer. By not
restricting the respondents to a set of replies, the experienced interviewers will be able
to get the information from the respondent fairly and accurately. The main disadvantage
of this method of interviewing is that it takes time, and the respondents may not cooperate.
Another disadvantage is that coding of open-ended questions may pose a challenge. For
example, a researcher may ask the respondent, "Tell me something about your experience
in this hospital". The answer may be "Well, the nurses are slow to attend and the doctor is
rude." 'Slow' and 'rude' are different qualities needing separate coding. This type of
interviewing is extremely helpful in exploratory studies.
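The coding problem in the hospital example can be illustrated with a small sketch. It assumes a keyword-based codebook in which each code is triggered by certain words in the answer; the code names, the keywords and the function code_response are hypothetical. Real coding of open-ended answers usually needs human judgment, so this shows only why one answer can require more than one code.

```python
# Hypothetical codebook: each code lists the keywords that trigger it.
CODEBOOK = {
    "nurse_responsiveness": ("slow",),
    "doctor_courtesy": ("rude",),
}


def code_response(text, codebook=CODEBOOK):
    """Return every code whose keywords appear in the answer, sorted by name."""
    lowered = text.lower()
    return sorted(
        code
        for code, keywords in codebook.items()
        if any(keyword in lowered for keyword in keywords)
    )


answer = "Well, the nurses are slow to attend and the doctor is rude."
print(code_response(answer))
```

The single answer receives two separate codes, one for each quality mentioned. Simple substring matching is crude: it would also match "slowly" or "not rude", which is one reason why coding open-ended answers poses a challenge.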
Dichotomous Question
These questions have only two answers: 'yes' or 'no', 'true' or 'false', 'use' or 'don't use'.
Do you use toothpaste? Yes ..... No .....
There is no third answer. However sometimes, there can be a third answer:
Close-Ended Questions
There are two basic formats in this type:
Young ........ Old
Single ........ Married
Modern ........ Old fashioned
2.
Rating Scale
(i)
(ii)
(b)
(c)
(d)
Based on what you saw in the commercial, how interested do you feel you would be in
buying the products?
(a)
Definitely
(b)
(c)
(d)
(e)
Closed-ended questionnaires are easy to answer. They require less effort on the part of the
interviewer. Tabulation and analysis are easier. There are fewer errors, since the same questions
are asked to everyone. The time taken to respond is shorter. We can compare the answer of one
respondent with that of another.
Notes One basic criticism of closed-ended questionnaires is that middle alternatives, such as
"don't know", are not included. This forces the respondents to choose among
the given alternatives.
Example: "Don't you think that Brazil played poorly in the FIFA cup?" The answer will be
'yes'. Many of them, who do not have any idea about the game, will also most likely say 'yes'. If
the question is worded in a slightly different manner, the response will be different.
Example: "Do you think that, Brazil played poorly in the FIFA cup?" This is a straightforward
question. The answer could be 'yes', 'no' or 'don't know' depending on the knowledge the
respondents have about the game.
Example: "Do you think anything should be done to make it easier for people to pay their
phone bill, electricity bill and water bill under one roof"?
Example: "Don't you think something might be done to make it easier for people to pay
their phone bill, electricity bill, water bill under one roof"?
A change of just one word as above, can generate different responses by respondents.
Example: Instead of using the word 'reasonably', 'usually', 'occasionally', 'generally', 'on
the whole'.
Example: "How often do you go to a movie?" "Often, may be once a week, once a month,
once in two months or even more."
Example: "Do you feel that firms today are employee-oriented and customer-oriented?"
There are two separate issues here - [yes] [no]
Example: "Are you happy with the price and quality of branded shampoo?" [yes] [no]
Leading Questions: A leading question is one that suggests the answer to the respondent.
The question itself will influence the answer, when respondents get an idea that the data
is being collected by a company. The respondents have a tendency to respond positively.
Example: "How do you like the programme on 'Radio Mirchy'?" The answer is likely to
be 'yes'. The unbiased way of asking is "Which is your favorite F.M. radio station?" The
answer could be any one of the four stations, namely (1) Radio City (2) Mirchy (3) Rainbow
(4) Radio-One.
Example: Do you think that offshore drilling for oil is environmentally unsound? The most
probable response is 'yes'. The same question can be modified to eliminate the leading factor.
What is your feeling about the environmental impact of offshore drilling for oil? Give choices
as follows:
(a)
(b)
(c)
No opinion.
2.
Example: "Do you own a Kelvinator refrigerator?" A better question would be "What brand
of refrigerator do you own?" "Don't you think the civic body is 'incompetent'?" Here the word
'incompetent' is loaded.
(a)
Are the Questions Confusing? If a question is unclear or confusing, the
respondent becomes more biased rather than enlightened. Example: "Do you think
that the government publications are distributed effectively"? This is not the correct way,
since the respondent does not know what 'effective distribution' means.
This is confusing. The correct way of asking the question is "Do you think that the government
publications are readily available when you want to buy?" Example: "Do you think whether
value price equation is attractive"? Here, respondents may not know the meaning of value
price equation.
(b)
Applicability: "Is the question applicable to all respondents?" Respondents may try to answer
a question even though they do not qualify to do so or lack any meaningful
opinion.
Example:
1.
2.
3.
"From which bank have you taken a housing loan" (assuming he has taken a loan).
2.
Would you prefer to have a job, or do you prefer to do just domestic work?
Even though, we may say that these two questions look similar, they vary widely. The difference
is that Q-2 makes explicit the alternative implied in Q-1.
Example: "Why do you use Ayurvedic soap"? One respondent might say "Ayurvedic soap is
better for skin care". Another may say "Because the dermatologist has recommended". A third
might say "It is a soap used by my entire family for several years". The first respondent answers
the reason for using it at present. The second respondent answers how he started using it. The third
respondent cites "the family tradition for using" it. As can be seen, different reference frames are used.
The question may be balanced and rephrased.
Complex Questions?
In which of the following do you like to park your liquid funds?
i.
Debenture
ii.
Preferential share
iii.
Equity linked MF
iv.
IPO
v.
Fixed deposit
If this question is posed to the general public, they may not know the meaning of liquid fund.
Most of the respondents will guess and tick one of them.
Are the Questions Too Long? As a rule of thumb, it is advisable to keep the number of
words in a question to no more than 20. The question given below is too long for the respondent
to comprehend, let alone answer.
Example: Do you accept that the people whom you know, and associate yourself have been
receiving ESI and P.F. benefits from the government accept a reduction in those benefits, with a
view to cut down government expenditure, to provide more resources for infrastructural
development?
Yes...................
No...................
Can't say...................
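The 20-word rule of thumb can be checked mechanically while drafting a questionnaire. The following is a minimal sketch in Python, assuming that words are simply the whitespace-separated tokens of the question; the function names `word_count` and `is_too_long` are illustrative and not part of any standard tool.

```python
def word_count(question: str) -> int:
    """Count the whitespace-separated words in a question."""
    return len(question.split())


def is_too_long(question: str, limit: int = 20) -> bool:
    """Return True when the question exceeds the word limit (20 by default)."""
    return word_count(question) > limit


# The over-long example question quoted above.
long_question = (
    "Do you accept that the people whom you know, and associate yourself "
    "have been receiving ESI and P.F. benefits from the government accept "
    "a reduction in those benefits, with a view to cut down government "
    "expenditure, to provide more resources for infrastructural development?"
)

# A short question of the kind used elsewhere in this unit.
short_question = "How old is your bike?"

for q in (long_question, short_question):
    print(word_count(q), "words; too long:", is_too_long(q))
```

Such a check only flags length; whether a question is leading, loaded or confusing still has to be judged by the researcher.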
Task Give one example for each of the following type of the questions:
1. Leading question
2. Double-barreled question
3. Close-ended question
4.
5. Split-ballot question
1. Basic information
2. Classification
3. Identification information.
Items such as age, sex, income, education, etc., are asked in the classification section. The
identification part involves the body of the questionnaire. Always move from general to specific
questions on the topic. This is known as the funnel sequence. Sequencing of questions is
illustrated below:
(1)
(2)
(3)
The above three questions follow a funnel sequence. If we reverse the order of the questions and
first ask "Which show was watched last week?", the answers may be biased. This example shows the
importance of sequencing.
Layout: How the questionnaire looks or appears.
Example: Clear instructions, gaps between questions, answers and spaces are part of
layout. Two different layouts are shown below:
Layout - 1 How old is your bike?
........ Less than 1 year ........ 1 to 2 years ........ 2 to 4 years ........ more than 4 years.
Layout - 2 How old is your bike?
........ Less than 1 year
........ 1 to 2 years.
........ 2 to 4 years.
........ More than 4 years.
From the above example, it is clear that layout - 2 is better, because the likelihood of
respondent error due to confusion is minimised.
Therefore, while preparing a questionnaire start with a general question. This is followed by a
direct and simple question. This is followed by more focused questions. This will elicit maximum
information.
1. How do you think this country is getting along in its relations with other countries?
2. How do you think we are doing in our relations with the US?
3.
4.
5. Some say we are very weak on the nuclear deal with the US, while some say we are OK.
What do you feel?
The first question introduces the general subject. In the next question, a specific country is
mentioned. The third and fourth questions are asked to seek views. The fifth question is to seek
a specific opinion.
i.
ii.
iii.
iv.
v.
Generally, as a rule of thumb, it is advisable to keep the number of words in a question not
exceeding ....................
13.
14.
15.
6.6 Summary
Primary data may pertain to life style, income, awareness or any other attribute of
individuals or groups.
There are mainly two ways of collecting primary data namely: (a) Observation (b) By
questioning the appropriate sample.
Observation method has a limitation i.e., certain attitudes, knowledge, motivation, etc.
cannot be measured by this method. For this reason, researcher needs to communicate.
Structured questionnaire is easy to administer. This type is most suited for descriptive
research. If the researcher wants to do an exploratory study, the unstructured method is better.
In the unstructured method, questions will have to be framed based on the answers given by the
respondent. A questionnaire can be administered in person, online or by mail.
Each of these methods has advantages and disadvantages.
Questions in a questionnaire may be classified into (a) Open question (b) Close ended
questions (c) Dichotomous questions, etc.
While formulating questions, care has to be taken with respect to question wording and
vocabulary; leading, loaded and confusing questions should be avoided. Further, it is
desirable that questions should be neither complex nor too long.
It is also implied that proper sequencing will enable the respondent to answer the question
easily. The researcher must maintain a balanced scale and must use a funnel approach.
6.7 Keywords
Computer Direct Interview: This is the method in which the respondents key in (enter) their
answers directly into a computer.
Dichotomous Question: These questions have only two answers, like 'Yes' or 'No'.
Disguised Observation: The observation under which the respondents do not know that they are
being observed.
Loaded Question: A question in which special emphasis is given to a word or a phrase, which
acts as a lead to respondent.
Non-disguised Observation: The observation in which the respondents are well aware that they
are being observed.
2. What are the various methods available for collecting primary data?
3.
4. What are the several methods used to collect data by observation method?
5. What are the advantages and limitations of collecting data by observation method?
6.
7.
8.
9.
10.
1. Observation, questioning
2. questioning
3. identify
4. direct
5. External
6. non-experimental
7. directly
8. potential
9. sample
10. verify
11.
12. 20
13. balanced
14. limits
15. exploratory
Books
Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: Johns Hopkins
University Press, 1943.
Bernal, J.D., The Social Function of Science, London: George Routledge and Sons,
1939.
Chase, Stuart, The Proper Study of Mankind: An inquiry into the Science of Human
Relations, New York, Harper and Row Publishers, 1958.
S. N. Murthy and U. Bhojanna, Business Research Methods, Excel Books.
CONTENTS
Objectives
Introduction
7.1
7.2
Secondary Data
7.1.1
7.1.2
7.1.3
7.2.2
7.2.3
Advertising Data
7.3
7.4 Summary
7.5 Keywords
7.6 Review Questions
7.7 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
In research, secondary data is data collected and possibly processed by people other than the
researcher in question. Common sources of secondary data for social science include censuses,
large surveys, and organizational records. In sociology, primary data is data you have collected
yourself and secondary data is data you have gathered from primary sources to create new
research. In terms of historical research, these two terms have different meanings. A primary
source is a book or set of archival records. A secondary source is a summary of a book or set of
records.
1.
2.
Example: Sales in units, credit outstanding, call reports of sales persons, daily production
report, monthly collection report, etc.
Census data
2.
3.
4.
Miscellaneous data
Did u know? Census data is the most important data among the sources of data.
The following are some of the data that can be obtained from census records:
Population Census
Product finder
undertaken a large scale survey, or even a census, this is likely to yield far more accurate results
than custom designed and executed surveys when these are based on relatively small sample
sizes.
It should not be forgotten that secondary data can play a substantial role in the exploratory
phase of the research when the task at hand is to define the research problem and to generate
hypotheses. The assembly and analysis of secondary data almost invariably improve the
researcher's understanding of the marketing problem, the various lines of inquiry that could or
should be followed and the alternative courses of actions which might be pursued.
Secondary sources help define the population. Secondary data can be extremely useful both in
defining the population and in structuring the sample to be taken. For instance, government
statistics on a country's agriculture will help decide how to stratify a sample and, once sample
estimates have been calculated, these can be used to project those estimates to the population.
Limitations
1.
Definition: The researcher, when making use of secondary data, may misinterpret the
definitions used by those responsible for its preparation and draw erroneous conclusions.
2.
3.
Source bias: Researchers face the problem of vested interests when they consult secondary
sources. Those responsible for their compilation may have reasons for wishing to present
a more optimistic or pessimistic set of results for their organization i.e., exaggerated
figures or inflated estimates may be stated.
4.
Reliability: The reliability of published statistics may vary over time, because the systems
of collecting data or the geographical or administrative boundaries may be changed, or the
basis for stratifying a sample may have altered. Other aspects of research methodology
that affect the reliability of secondary data are the sample size, response rate, questionnaire
design and modes of analysis, any of which may change without any indication of this to the
reader of published statistics.
5.
Time scale: The time period during which secondary data was first compiled may have a
substantial effect upon the nature of the data. For example, most censuses take place at
ten-year intervals, so data from this and other published sources may be out-of-date at the
time the researcher wants to make use of the statistics.
Self Assessment
Fill in the blanks:
1.
2.
Those data that have been compiled by some agency other than the user are known as
.................... data.
3.
4.
5.
.................... Secondary Data is the data collected by the researcher from outside the
company.
Rating
2.
3.
These organisations also provide TRP ratings, i.e., television rating points, on a regular basis.
This provides:
(i)
Viewership figures
(ii)
Notes Organizations involved in collecting syndicated data provide NRS called National
Readership Survey to the sponsors and advertising agencies.
There is also a study called FSRP which covers children in the age group of 10-19 years. Besides
their demographics and psychographics, the study covers areas such as:
Children as decision-makers
Media reviews
Syndicated sources consist of market research firms offering syndicated services. These market
research organisations collect and update information on a continuous basis. Since data is
syndicated, its cost is spread over a number of client organisations and hence is cheaper.
Example: A client firm can give certain specific questions to be included in the questionnaire
that is used routinely to collect syndicated data. The client will have to pay extra charges for
these. The data generated from the additional questions and analysis will be revealed only to the
firm submitting the questions.
Therefore we can say that the customization of secondary data is possible. Some areas of
syndicated services are newspapers, periodical readership, popularity of TV channels, etc. Data
from syndicated sources are available on a weekly or monthly basis.
(b)
(c)
Advertising data.
Most of these data collection methods as mentioned above are also known as syndicated data.
Syndicated data can be classified into:
Notes Panels provide data on consumer buying habits on petrol, auto parts, sports goods
etc.
Limitations
Some people do not want to take the trouble of keeping records of their purchases.
Therefore, relevant data is not available.
Advantages
Use of scanner tied to the central computer helps the panel members to record their
purchases early (almost immediately).
We also have the Consumer Mail Panel (CMP). This consists of members who are willing to
answer mail questionnaires. A large number of such households are kept on the panel. This
serves as a universe through which panels are selected.
It involves inspection of goods delivered between visits. If the stock of any product in the shop
is accurately counted during both the visits and data on deliveries are accurately taken from the
records, the sales of the product over that period can be determined accurately as
follows:
Initial stock + Deliveries between successive visits - Second time stock = Sales
If this information is obtained from different shops in a representative sample of shops,
then accurate estimates of sales of the product can be made. To do this, some shops can be
taken as a "Panel of shops" representing the universe.
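The audit relation above is simple arithmetic and can be sketched in a few lines of Python. The stock and delivery figures below are hypothetical, chosen only for illustration, and the function name `audit_sales` is an assumption, not a term used by any audit agency.

```python
def audit_sales(initial_stock: int, deliveries: int, closing_stock: int) -> int:
    """Sales between two audit visits:
    initial stock + deliveries between visits - second-time (closing) stock."""
    sales = initial_stock + deliveries - closing_stock
    if sales < 0:
        # A negative figure means the stock counts or delivery records disagree.
        raise ValueError("closing stock exceeds initial stock plus deliveries")
    return sales


# Hypothetical panel of three shops: (initial stock, deliveries, closing stock).
panel = [(120, 200, 90), (80, 150, 60), (40, 100, 35)]

shop_sales = [audit_sales(*shop) for shop in panel]
print(shop_sales)                         # sales per shop in units
print(sum(shop_sales) / len(shop_sales))  # average sales per panel shop
```

The per-shop average from the panel is what would then be projected to the universe of shops, provided the panel is representative.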
Advantages
It provides information between audits on consumer purchase over the counter in specific
units, for example, kilograms, bottles, numbers, etc.
It provides data on shop purchases i.e., the purchases made by the retailer between audits.
Disadvantages
It is time consuming.
Did u know? With the help of wholesale and retail data, the manufacturer comes to know
how competitors are doing.
Task List some major secondary sources of information for the following:
(a)
(b)
M.T.R. has several product ideas on ready-to-eat products. It wishes to convert ideas
into products and enter the market. Before entering, the company needs to find
necessary information to assess the market potential.
(c)
An MNC wishes to open a showroom in a Metro. The first step that the company
would like to take is to collect the information about suitability.
Self Assessment
Fill in the blanks:
6.
7.
Retail and Wholesale data provides information between .................... on consumer purchase
over the counter in specific units.
8.
9.
10.
The cost of syndicated data is .................... since it is spread over a number of client organisations.
They provide information, which retailers may not be willing to reveal to researcher.
Disadvantages
Because secondary data has been collected for some other project, it may not fit in with the
problem that is being defined. In some cases, the fit is so poor that the data becomes completely
inappropriate. It may be ill-suited because of the following three reasons:
Unit of measurement
Definition of a class
Recency
Unit of Measurement
It is common for secondary data to be expressed in units.
Example: Size of the retail establishments, for instance, can be expressed in terms of gross
sales, profits, square feet area and number of employees. Consumer incomes can be expressed
for the individual, the family, the household, etc. The secondary data available may not fit in
easily, for instance when the class intervals are quite different from those which are needed.
Problem of Accuracy
The accuracy of secondary data available is highly questionable. A number of errors are possible
in the collection and analysis of the data. Accuracy of secondary data depends upon:
(a)
(b)
(a)
Who has collected the data: The reliability of the source determines the accuracy of the
data. Assume that a publisher of a private periodical conducts a survey of his readers. The
main aim of the survey is to find out the opinion of readers about advertisements appearing
in it. This survey is done by the publisher in the hope that other firms will buy this data
before inserting advertisements.
Assume that a professional MR agency has conducted a similar survey and has sold its
syndicated data on many periodicals.
If you are an individual who wants information on a particular periodical, you would buy the
data from the MR agency rather than from the periodical's publisher. The reason for this is trust
in the MR agency. The reasons for trusting the MR agency are as follows:
(b)
1.
2.
The data quality of MR agency will be good since they are professionals.
2.
3.
4.
What was the time period of data collection? Example: days of the week, time of the
day.
!
Caution Before using the secondary data, the source of data must be verified in order to
ensure accuracy and reliability of data.
Recency
This pertains to "how old is the information?" If it is five years old, it may be useless. Therefore,
the publication lag is a problem.
Self Assessment
Fill in the blanks:
11.
12.
Secondary data may be ill-suited because of three reasons, which are ....................,
Definition of a class and Recency.
13.
14.
15.
7.4 Summary
Secondary data may not be readily used because these data are collected for some other
purpose.
There are two types of secondary data (1) Internal and (2) External secondary data.
Syndicated data may be classified into (a) Consumer purchase data (b) Retailer and
wholesale data (c) Advertising data. Each has advantages and disadvantages.
7.5 Keywords
External Data: The data collected by the researcher from outside the company.
Internal Data: Internal data are those that are found within the organisation.
Panel Type Data: This is one type of syndicated data in which there are consumer panels.
Secondary Data: Secondary data is data collected and possibly processed by people other than the
researcher in question.
Syndicated Data: Data collected by this method is sold to interested clients on payment.
2.
3.
4.
5.
6.
7.
8.
9. Discuss the sources of secondary data for the study on "consumer purchasing a white
good".
10.
1. Internal, External
2. secondary
3. company's
4. four
5. External
6. market research
7. audits
8. target
9. weekly, monthly
10. cheaper
11. field staff
12. Unit of measurement
13. accuracy
14. units
15. Recency
Books
CONTENTS
Objectives
Introduction
8.1
8.2
Arithmetic Mean
8.2.2
8.2.3
Median
8.2.4
8.2.5
Mode
8.2.6
8.3 Measures of Dispersion
8.4 Summary
8.5 Keywords
8.6 Review Questions
8.7 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
Let's take a look at the most basic form of statistics, known as descriptive statistics. This branch
of statistics lays the foundation for all statistical knowledge. Descriptive statistics are used to
describe the basic features of the data gathered from an experimental study in various ways.
Descriptive statistics are distinguished from inductive statistics. They provide simple summaries
about the sample and the measures. Together with simple graphics analysis, they form the basis
of virtually every quantitative analysis of data. It is necessary to be familiar with primary
methods of describing data in order to understand phenomena and make intelligent decisions.
There may be two objectives for formulating a summary statistic: (1) to choose a statistic that
shows how different units seem similar; statistical textbooks call one solution to this objective
a measure of central tendency; and (2) to choose another statistic that shows how they differ. This
kind of statistic is often called a measure of dispersion.
Notes An average is a single value which can be taken as representative of the whole
distribution.
Functions of an Average
1.
To present a huge mass of data in a summarised form: It is very difficult for the human mind to
grasp a large body of numerical figures. A measure of average is used to summarise such
data into a single figure which makes it easier to understand and remember.
2.
3.
Example: If the average monthly sales of a company are falling, the sales manager may have
to take certain decisions to improve it.
2.
3.
4.
5.
6.
7.
Self Assessment
Fill in the blanks:
1.
2.
2.
3.
Mathematical Averages:
(a)
(b)
Geometric Mean
(c)
Harmonic Mean
(d)
Quadratic Mean
Positional Averages:
(a)
Median
(b)
Mode
Commercial Average:
(a)
Moving Average
(b)
Progressive Average
(c)
Composite Average
The above measures of central tendency will be discussed in order of their popularity. Out of
these, the Arithmetic Mean, Median and Mode, being most popular, are discussed in that order.
$$X_1 + X_2 + \dots + X_n = \sum_{i=1}^{n} X_i$$
where $\Sigma$ (called sigma) denotes the summation sign. The subscript of X, i.e., i, is a positive
integer which indicates the serial number of the observation. Since there are n observations, i
varies from 1 to n. This is indicated by writing the limits below and above $\Sigma$, as written
earlier. When there is no ambiguity in the range of summation, this indication can be skipped and
we may simply write $X_1 + X_2 + \dots + X_n = \sum X_i$.
Arithmetic Mean is defined as the sum of observations divided by the number of observations.
It can be computed in two ways:
1.
2.
In case of simple arithmetic mean, equal importance is given to all the observations while in
weighted arithmetic mean, the importance given to various observations is not same.
Calculation of simple arithmetic mean can be done in following ways:
1.
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
(b)
Shortcut Method: This method is used when the magnitude of individual observations
is large. The use of the shortcut method helps simplify the calculation work.
Let A be any assumed mean. We subtract A from every observation. The difference
between an observation and A, i.e., $X_i - A$, is called the deviation of the ith observation
from A and is denoted by $d_i$. Thus, we can write $d_1 = X_1 - A$, $d_2 = X_2 - A$, ....., $d_n = X_n - A$.
On adding these deviations and dividing by n we get
$$\frac{\sum d_i}{n} = \frac{\sum (X_i - A)}{n} = \frac{\sum X_i}{n} - \frac{nA}{n}$$
or
$$\bar{d} = \bar{X} - A \quad \text{where } \bar{d} = \frac{\sum d_i}{n}$$
On rearranging, we get
$$\bar{X} = A + \bar{d} = A + \frac{\sum d_i}{n}$$
Notes Theoretically we can select any value as assumed mean. However, for the purpose
of simplification of calculation work, the selected value should be as near to the value of
$\bar{X}$ as possible.
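Example: The following figures relate to monthly output of cloth of a factory in a given
year:

Months                  : Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Output (in '000 metres) :  80   88   92   84   96   92   96  100   92   94   98   86

Using the direct method:
$$\bar{X} = \frac{80 + 88 + 92 + 84 + 96 + 92 + 96 + 100 + 92 + 94 + 98 + 86}{12} = 91.5$$
Using the shortcut method with assumed mean A = 90:

Xi          :  80   88   92   84   96   92   96  100   92   94   98   86   Total
di = Xi - A : -10   -2    2   -6    6    2    6   10    2    4    8   -4   18

$$\sum d_i = 18, \qquad \bar{X} = 90 + \frac{18}{12} = 91.5$$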
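The two methods can be compared numerically. The following is a minimal Python sketch using the monthly output figures of the example above; the function names `direct_mean` and `shortcut_mean` are illustrative, and the assumed mean of 90 is the one used in the example.

```python
# Monthly output of cloth (in '000 metres), January to December.
output = [80, 88, 92, 84, 96, 92, 96, 100, 92, 94, 98, 86]


def direct_mean(values):
    """Direct method: sum of observations divided by their number."""
    return sum(values) / len(values)


def shortcut_mean(values, assumed_mean):
    """Shortcut method: assumed mean A plus the mean of the deviations X - A."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


print(direct_mean(output))        # direct method
print(shortcut_mean(output, 90))  # shortcut method with A = 90
print(shortcut_mean(output, 50))  # any other assumed mean gives the same mean
```

Both methods must agree whatever value is chosen for A; a value close to the true mean only keeps the deviations small for hand calculation.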
2.
Let there be n values X1, X2, ..... Xn out of which X1 has occurred f1 times, X2 has occurred f2
times, ..... Xn has occurred fn times. Let N be the total frequency, i.e.,
$$N = \sum_{i=1}^{n} f_i$$
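Values    : X1   X2   .....   Xn   Total Frequency
Frequency : f1   f2   .....   fn   N

(a)
Direct Method: The arithmetic mean of these observations using direct method is
given by
$$\bar{X} = \frac{\overbrace{X_1 + X_1 + \dots + X_1}^{f_1 \text{ times}} + \overbrace{X_2 + \dots + X_2}^{f_2 \text{ times}} + \dots + \overbrace{X_n + \dots + X_n}^{f_n \text{ times}}}{f_1 + f_2 + \dots + f_n}$$
Since X1 + X1 + ... + X1 added f1 times can also be written as f1X1, and similarly for the
other observations, we have
$$\bar{X} = \frac{f_1X_1 + f_2X_2 + \dots + f_nX_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_iX_i}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_iX_i}{N} \qquad ...(1)$$
(b)
Shortcut Method: With $d_i = X_i - A$, we have
$$\sum f_id_i = \sum f_i(X_i - A) = \sum f_iX_i - A\sum f_i = \sum f_iX_i - A \cdot N$$
Dividing both sides by N we have
$$\frac{\sum f_id_i}{N} = \frac{\sum f_iX_i}{N} - A = \bar{X} - A$$
or
$$\bar{X} = A + \frac{\sum f_id_i}{N} = A + \bar{d}$$
3.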
!
Caution Here u1 may or may not be equal to l2, i.e., the upper limit of a class may or may
not be equal to the lower limit of its following class.
It may be recalled here that, in a grouped frequency distribution, we only know the
number of observations in a particular class interval and not their individual magnitudes.
Therefore, to calculate mean, we have to make a fundamental assumption that the
observations in a class are uniformly distributed. Under this assumption, the mid-value of
a class will be equal to the mean of observations in that class and hence can be taken as
their representative. Therefore, if Xi is the mid-value of ith class with frequency fi, the
above assumption implies that there are fi observations each with magnitude Xi (i = 1 to n).
Thus, the arithmetic mean of a grouped frequency distribution can also be calculated by
the use of the formula.
Class Intervals     Frequency (f)
l1 - u1             f1
l2 - u2             f2
.                   .
ln - un             fn
Total Frequency     Σfi = N
Notes The accuracy of arithmetic mean calculated for a grouped frequency distribution
depends upon the validity of the fundamental assumption. This assumption is rarely met
in practice. Therefore, we can only get an approximate value of the arithmetic mean of a
grouped frequency distribution.
Example: The following table gives the distribution of weekly wages of workers in a
factory. Calculate the arithmetic mean of the distribution.
Weekly Wages   : 240-269  270-299  300-329  330-359  360-389  390-419  420-449
No. of Workers :    7        19       27       15       12       12        8
Solution:
It may be noted here that the given class intervals are inclusive. However, for the computation
of mean, they need not be converted into exclusive class intervals.
Class Intervals   Mid-values (X)   Frequency (f)   d = X - 344.5      fd
240 - 269             254.5              7              -90           -630
270 - 299             284.5             19              -60          -1140
300 - 329             314.5             27              -30           -810
330 - 359             344.5             15                0              0
360 - 389             374.5             12               30            360
390 - 419             404.5             12               60            720
420 - 449             434.5              8               90            720
Total                                  100                            -780

$$\bar{X} = A + \frac{\sum fd}{N} = 344.5 - \frac{780}{100} = 336.7$$
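Let us define $u_i = \frac{X_i - A}{h}$. Multiplying both sides by $f_i$ and taking sum over all the
observations we have,
$$\sum_{i=1}^{n} f_iu_i = \frac{1}{h}\sum_{i=1}^{n} f_i(X_i - A)$$
or
$$h\sum_{i=1}^{n} f_iu_i = \sum_{i=1}^{n} f_iX_i - A\sum_{i=1}^{n} f_i = \sum_{i=1}^{n} f_iX_i - A \cdot N$$
Dividing both sides by N, we have
$$h \cdot \frac{\sum f_iu_i}{N} = \frac{\sum f_iX_i}{N} - A = \bar{X} - A$$
$$\therefore \bar{X} = A + h \cdot \frac{\sum_{i=1}^{n} f_iu_i}{N} \qquad ...(2)$$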
Using this relation we can simplify the computations of the above Example, as shown below.

u = (X - 344.5)/30 :  -3   -2   -1    0    1    2    3   Total
f                  :   7   19   27   15   12   12    8    100
fu                 : -21  -38  -27    0   12   24   24    -26

$$\bar{X} = 344.5 - \frac{30 \times 26}{100} = 336.7$$
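The direct and step-deviation calculations for the weekly wages example can be cross-checked with a short Python sketch. It assumes, as the text does, that the mid-value represents every observation in a class; the function names are illustrative, and A = 344.5 and h = 30 are the values used in the example.

```python
# Weekly wage classes (inclusive limits) and number of workers in each class.
classes = [(240, 269), (270, 299), (300, 329), (330, 359),
           (360, 389), (390, 419), (420, 449)]
freqs = [7, 19, 27, 15, 12, 12, 8]

# Mid-value of each class: (lower limit + upper limit) / 2.
mid_values = [(lower + upper) / 2 for lower, upper in classes]


def grouped_mean_direct(mids, f):
    """Direct method: sum of f*X divided by total frequency N."""
    return sum(fi * xi for fi, xi in zip(f, mids)) / sum(f)


def grouped_mean_step_deviation(mids, f, assumed_mean, width):
    """Step-deviation method: A + h * (sum of f*u) / N, with u = (X - A) / h."""
    n_total = sum(f)
    u = [(xi - assumed_mean) / width for xi in mids]
    return assumed_mean + width * sum(fi * ui for fi, ui in zip(f, u)) / n_total


print(grouped_mean_direct(mid_values, freqs))
print(grouped_mean_step_deviation(mid_values, freqs, 344.5, 30))
```

Both calls give the same mean wage, which serves as a check on the hand computation in the tables above.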
When the arithmetic mean of a frequency distribution is calculated by shortcut or step-deviation
method, the accuracy of the calculations can be checked by using the following
formulae, given by Charlier.
mean wage of the workers of a factory, it would be wrong to compute simple arithmetic mean
if there are a few workers (say managers) with very high wages while majority of the workers
are at low level of wages. The simple arithmetic mean, in such a situation, will give a higher
value that cannot be regarded as representative wage for the group. In order that the mean wage
gives a realistic picture of the distribution, the wages of managers should be given less importance
in its computation. The mean calculated in this manner is called weighted arithmetic mean. The
computation of weighted arithmetic mean is useful in many situations where different items are of
unequal importance, e.g., the construction of index numbers, computation of standardised death
and birth rates, etc.
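1. $\bar{X}_w = \frac{\sum w_iX_i}{\sum w_i}$
2. $\bar{X}_w = A + \frac{\sum w_id_i}{\sum w_i}$ (where $d_i = X_i - A$)
3. $\bar{X}_w = A + \frac{\sum w_iu_i}{\sum w_i} \times h$, where $u_i = \frac{X_i - A}{h}$ (Using step-deviation method)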
Example: From the following results of two colleges A and B, find out which of the two is
better:
                    College A              College B
Examination     Appeared   Passed     Appeared   Passed
M.Sc.              60        40          200       160
M.A.              100        60          240       200
B.Sc.             200       150          200       140
B.A.              120        75          160       100
Solution:
Performance of the two colleges can be compared by taking weighted arithmetic mean of the
pass percentage in various classes. The calculation of weighted arithmetic mean is shown in the
following table.
                     College A                                  College B
Class    Appeared  Passed  Pass %     wAXA      Appeared  Passed  Pass %     wBXB
           (wA)             (XA)                  (wB)             (XB)
M.Sc.       60       40    66.67     4000.2       200      160    80.00    16000.0
M.A.       100       60    60.00     6000.0       240      200    83.33    19999.2
B.Sc.      200      150    75.00    15000.0       200      140    70.00    14000.0
B.A.       120       75    62.50     7500.0       160      100    62.50    10000.0
Total      480      325             32500.2       800      600             59999.2

$\bar{X}_w$ for College A $= \frac{\sum w_AX_A}{\sum w_A} = \frac{32500.2}{480} = 67.71\%$
$\bar{X}_w$ for College B $= \frac{\sum w_BX_B}{\sum w_B} = \frac{59999.2}{800} = 75\%$
Since the weighted average of pass percentage is higher for college B, hence college B is better.
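The comparison of the two colleges can be reproduced with a small Python sketch of the weighted mean formula, taking the number of students appeared as the weight and the pass percentage as the variable. The function name `weighted_mean` and the variable names are illustrative.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum of w*X divided by sum of w."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Students appeared (weights) and passed, for M.Sc., M.A., B.Sc. and B.A.
appeared_a, passed_a = [60, 100, 200, 120], [40, 60, 150, 75]
appeared_b, passed_b = [200, 240, 200, 160], [160, 200, 140, 100]

# Pass percentage of each class.
pct_a = [100 * p / a for p, a in zip(passed_a, appeared_a)]
pct_b = [100 * p / a for p, a in zip(passed_b, appeared_b)]

mean_a = weighted_mean(pct_a, appeared_a)
mean_b = weighted_mean(pct_b, appeared_b)
print(round(mean_a, 2), round(mean_b, 2))
```

With these weights the weighted mean equals the overall pass percentage of each college (325 out of 480 and 600 out of 800), which is why it is the appropriate basis for the comparison.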
Notes If $\bar{X}$ denotes simple mean and $\bar{X}_w$ denotes the weighted mean of the same data, then
1.
2.
$\bar{X} > \bar{X}_w$, when items of small magnitude are assigned greater weights and items of
3.
8.2.3 Median
The median of a distribution is that value of the variate which divides it into two equal parts.
In terms of the frequency curve, the ordinate drawn at the median divides the area under the curve
into two equal parts. The median is a positional average because its value depends upon the
position of an item and not on its magnitude.
Median can be determined under various situations like:
When Individual Observations are Given
The following steps are involved in the determination of median:
1.
The given observations are arranged in either ascending or descending order of magnitude.
2.
(a)
The size of the $\left(\frac{n+1}{2}\right)$th observation, when n is odd.
(b)
The mean of the sizes of the $\frac{n}{2}$th and $\left(\frac{n}{2} + 1\right)$th observations, when n is even.
Solution:
Writing the observations in ascending order, we get 15, 16, 18, 20, 25, 28, 30.
Since n = 7, i.e., odd, the median is the size of the $\frac{7+1}{2}$th, i.e., 4th observation.
Hence, Md = 20.
Notes The same value of Md will be obtained by arranging the observations in descending
order of magnitude.
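The rule for individual observations can be sketched in Python as follows. The function name `median_individual` is illustrative, and the unsorted order of the seven observations is an arbitrary assumption, since only their sorted order is given above.

```python
def median_individual(observations):
    """Median of individual observations.

    Odd n : the ((n + 1) / 2)th observation in sorted order.
    Even n: the mean of the (n / 2)th and (n / 2 + 1)th observations.
    """
    ordered = sorted(observations)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]   # list positions start at 0
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# The seven observations of the example, in an arbitrary order.
data = [20, 15, 25, 28, 18, 16, 30]
print(median_individual(data))

# Sorting in descending order first makes no difference to the result.
print(median_individual(sorted(data, reverse=True)))

# A small even-sized set, chosen only for illustration.
print(median_individual([1, 2, 3, 4]))
```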
Task Find median of data: 245, 230, 265, 236, 220, 250.
X    :  0    1    2    3    4    5    6    7
f    :  7   14   18   36   51   54   52   20
c.f. :  7   21   39   75  126  180  232  252
Solution:
Here N = 252, so $\frac{N}{2} = 126$ and $\frac{N}{2} + 1 = 127$.
Therefore, Median is the mean of the size of 126th and 127th observation. From the table we note
that 126th observation is 4 and 127th observation is 5.
$$M_d = \frac{4 + 5}{2} = 4.5$$
Alternative Method: Looking at the frequency distribution we note that there are 126 observations
which are less than or equal to 4 and there are 252 - 75 = 177 observations which are greater than
or equal to 4. Similarly, observation 5 also satisfies this criterion. Therefore,
median = $\frac{4 + 5}{2} = 4.5$.
Class Intervals : 0 - 10   10 - 20   20 - 30   30 - 40   40 - 50   50 - 60
Frequency       :    5        12        14        18        13         8
Solution:
The median of a distribution is that value of the variate which divides the distribution into two
equal parts. In case of a grouped frequency distribution, this implies that the ordinate drawn at
the median divides the area under the histogram into two equal parts. Writing the given data in
a tabular form, we have:
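Class Intervals   Frequency (f)   Less than type c.f.   Frequency Density
0 - 10                  5                  5                   0.5
10 - 20                12                 17                   1.2
20 - 30                14                 31                   1.4
30 - 40                18                 49                   1.8
40 - 50                13                 62                   1.3
50 - 60                 8                 70                   0.8

[Figure: Histogram of Frequency Density against Class Intervals, with the median Md marked on the X-axis between 30 and 40]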
Since the ordinate at median divides the total area under the histogram into two equal parts,
therefore we have to find a point (like Md as shown in the Figure) on X-axis such that an ordinate
(AMd) drawn at it divides the total area under the histogram into two equal parts.
We may note here that area under each rectangle is equal to the frequency of the corresponding
class.
Since area = Length × Breadth = Frequency density × Width of class = $\frac{f}{h} \times h = f$.
Thus, the total area under the histogram is equal to total frequency N. In the given example
N = 70, therefore $\frac{N}{2} = 35$. We note that area of first three rectangles is 5 + 12 + 14 = 31
and the area of first four rectangles is 5 + 12 + 14 + 18 = 49. Thus, median lies in the fourth
class interval which is also termed as median class. Let the point, in median class, at which
median lies be denoted by Md. The position of this point should be such that the ordinate AMd (in
the above histogram) divides the area of median rectangle so that there are only 35 - 31 = 4
observations to its left. From the histogram, we can also say that the position of Md should be
such that
$$\frac{M_d - 30}{40 - 30} = \frac{4}{18} \qquad ...(1)$$
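Thus, $M_d = \frac{40}{18} + 30 = 32.2$
Writing equation (1) in general notation, we have
$$\frac{M_d - L_m}{h} = \frac{\frac{N}{2} - C}{f_m} \quad \text{or} \quad M_d = L_m + \frac{\frac{N}{2} - C}{f_m} \times h \qquad ...(2)$$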
Where, Lm is lower limit, h is the width and fm is frequency of the median class and C is the
cumulative frequency of classes preceding median class. Equation (2) gives the required formula
for the computation of median.
Remarks:
1.
2.
The above formula is also applicable when classes are of unequal width.
3.
Median can be computed even if there are open end classes because here we need to know
only the frequencies of classes preceding or following the median class.
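or
$$M_d = U_m - \frac{\frac{N}{2} - C}{f_m} \times h \qquad ...(3)$$
Note that here $U_m$ is the upper limit of the median class and C denotes the greater than type
cumulative frequency of classes following the median class. Applying this formula to the above
example, we get
$$M_d = 40 - \frac{35 - 21}{18} \times 10 = 32.2$$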
Example: The following table gives the distribution of marks by 500 students in an
examination. Obtain median of the given data.
Marks           : 0 - 9  10 - 19  20 - 29  30 - 39  40 - 49  50 - 59  60 - 69  70 - 79
No. of Students :   30      40       50       48       24      162      132       14
Solution:
Since the class intervals are inclusive, therefore, it is necessary to convert them into class
boundaries.
Class Intervals   Class Boundaries   Frequency   Less than type c.f.
0 - 9              -0.5 - 9.5            30              30
10 - 19             9.5 - 19.5           40              70
20 - 29            19.5 - 29.5           50             120
30 - 39            29.5 - 39.5           48             168
40 - 49            39.5 - 49.5           24             192
50 - 59            49.5 - 59.5          162             354
60 - 69            59.5 - 69.5          132             486
70 - 79            69.5 - 79.5           14             500

Since $\frac{N}{2} = 250$, the median class is 49.5 - 59.5 and, therefore, Lm = 49.5, h = 10, fm = 162, C = 192.
Thus, $M_d = 49.5 + \frac{250 - 192}{162} \times 10 = 53.08$ marks.
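The interpolation formula for the median of a grouped distribution can be sketched in Python as below. The function name `grouped_median` is illustrative; the data are those of the marks example, expressed as class boundaries, and of the earlier 0 - 60 example with N = 70.

```python
def grouped_median(boundaries, freqs):
    """Median of a grouped frequency distribution:
    Md = Lm + (N/2 - C) / fm * h,
    where Lm, h and fm are the lower boundary, width and frequency of the
    median class and C is the cumulative frequency of the preceding classes."""
    half = sum(freqs) / 2
    cumulative = 0                      # C: frequency of classes seen so far
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= half:      # first class whose c.f. reaches N/2
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")


# Marks of 500 students, with inclusive classes converted to class boundaries.
marks_boundaries = [(-0.5, 9.5), (9.5, 19.5), (19.5, 29.5), (29.5, 39.5),
                    (39.5, 49.5), (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]
marks_freqs = [30, 40, 50, 48, 24, 162, 132, 14]
print(round(grouped_median(marks_boundaries, marks_freqs), 2))

# The earlier histogram example (N = 70).
hist_boundaries = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
hist_freqs = [5, 12, 14, 18, 13, 8]
print(round(grouped_median(hist_boundaries, hist_freqs), 1))
```

Because the class width is taken from each class's own boundaries, the same function also works when the classes are of unequal width, as noted in the remarks above.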
Class Intervals : 30 - 40   40 - 50   50 - 60   60 - 70   70 - 80
No. of Workers  :   120        ?        200        ?        185
Solution:
Let f1 and f2 be the frequencies of the classes 40 - 50 and 60 - 70 respectively.
Class Intervals   Frequency   Less than type c.f.
30 - 40              120         120
40 - 50              f1          120 + f1
50 - 60              200         320 + f1
60 - 70              f2          320 + f1 + f2
70 - 80              185         900

$$\frac{450 - (120 + f_1)}{200} \times 10 + 50 = \frac{330 - f_1}{20} + 50$$
Example: The following table shows the daily sales of 230 footpath sellers of Chandni
Chowk:
Sales (in Rs.)     No. of Sellers
0 - 500                 12
500 - 1000              18
1000 - 1500             35
1500 - 2000             42
2000 - 2500             50
2500 - 3000             45
3000 - 3500             20
3500 - 4000              8
2.
Both, the less than and the greater than type ogives.
Solution:
To draw ogives, we need to have a cumulative frequency distribution.
Class Intervals    Frequency    Less than c.f.    Greater than c.f.
0 - 500               12             12               230
500 - 1000            18             30               218
1000 - 1500           35             65               200
1500 - 2000           42            107               165
2000 - 2500           50            157               123
2500 - 3000           45            202                73
3000 - 3500           20            222                28
3500 - 4000            8            230                 8
1.
[Figure: less than type ogive, with Cumulative Frequency on the vertical axis; the median is marked as Median = 2080.]
The value N/2 = 115 is marked on the vertical axis and a horizontal line is drawn from this point to meet the ogive at point S. Drop a perpendicular from S. The point at which this meets the X-axis is the median.
2.
[Figure: less than and greater than type ogives drawn together, with Cumulative Frequency on the vertical axis; the median is marked as Median = 2080.]
A perpendicular is dropped from the point of intersection of the two ogives. The point at which
it intersects the X-axis gives median. It is obvious from Figure 8.2 and 8.3 that median = 2080.
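Reading the median off the less than type ogive amounts to linear interpolation between two plotted points. The following Python sketch mimics that graphical step numerically; the function name `median_from_ogive` and the parameter `lower_start` (the lower limit of the first class) are illustrative assumptions, not notation from the text.

```python
def median_from_ogive(upper_limits, less_than_cf, lower_start=0):
    """Locate N/2 on a less than type ogive by linear interpolation."""
    half = less_than_cf[-1] / 2  # N/2 on the vertical axis
    prev_x, prev_cf = lower_start, 0
    for x, cf in zip(upper_limits, less_than_cf):
        if cf >= half:
            # Point S lies on the segment joining (prev_x, prev_cf) and (x, cf).
            return prev_x + (half - prev_cf) / (cf - prev_cf) * (x - prev_x)
        prev_x, prev_cf = x, cf
    raise ValueError("cumulative frequencies must not be empty")


# Daily sales of 230 footpath sellers (upper class limits and less than c.f.).
upper_limits = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000]
less_than_cf = [12, 30, 65, 107, 157, 202, 222, 230]
median_sales = median_from_ogive(upper_limits, less_than_cf)
print(median_sales)  # 2080.0
```

The result agrees with the value read from the figures, since the ogive is made of straight segments between the plotted points.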
Properties of Median
1.
It is a positional average.
2.
It can be shown that the sum of absolute deviations is minimum when taken from median.
This property implies that median is centrally located.
Quartiles
The values of a variable that divide a distribution into four equal parts are called quartiles. Since
three values are needed to divide a distribution into four parts, there are three quartiles, viz. Q1,
Q2 and Q3, known as the first, second and the third quartile respectively.
For a discrete distribution, the first quartile (Q1) is defined as that value of the variate such that
at least 25% of the observations are less than or equal to it and at least 75% of the observations
are greater than or equal to it.
For a continuous or grouped frequency distribution, Q1 is that value of the variate such that the
area under the histogram to the left of the ordinate at Q1 is 25% and the area to its right is 75%.
After locating the first quartile class, the formula for Q1 is as follows:
$$Q_1 = L_{Q_1} + \frac{\frac{N}{4} - C}{f_{Q_1}} \times h$$
Here, LQ1 is lower limit of the first quartile class, h is its width, f Q1 is its frequency and C is
cumulative frequency of classes preceding the first quartile class.
By definition, the second quartile is median of the distribution. The third quartile ( Q3) of a
distribution can also be defined in a similar manner.
For a discrete distribution, Q3 is that value of the variate such that at least 75% of the observations
are less than or equal to it and at least 25% of the observations are greater than or equal to it.
For a grouped frequency distribution, Q3 is that value of the variate such that area under the
histogram to the left of the ordinate at Q3 is 75% and the area to its right is 25%. The formula for
computation of Q3 can be written as
$$Q_3 = L_{Q_3} + \frac{\frac{3N}{4} - C}{f_{Q_3}} \times h,$$
where the symbols have their usual meaning.
Deciles
Deciles divide a distribution into 10 equal parts and there are, in all, 9 deciles denoted as D1, D2,
...... D9 respectively.
For a discrete distribution, the ith decile Di is that value of the variate such that at least (10i)% of the observations are less than or equal to it and at least (100 - 10i)% of the observations are greater than or equal to it (i = 1, 2, ...... 9).
For a continuous or grouped frequency distribution, Di is that value of the variate such that the area under the histogram to the left of the ordinate at Di is (10i)% and the area to its right is (100 - 10i)%. The formula for the ith decile can be written as
$$D_i = L_{D_i} + \frac{\frac{iN}{10} - C}{f_{D_i}} \times h \qquad (i = 1, 2, ...... 9)$$
Percentiles
Percentiles divide a distribution into 100 equal parts and there are, in all, 99 percentiles denoted
as P1, P2, ...... P25, ...... P40, ...... P60, ...... P99, respectively.
For a discrete distribution, the kth percentile Pk is that value of the variate such that at least k% of the observations are less than or equal to it and at least (100 - k)% of the observations are greater than or equal to it.
For a grouped frequency distribution, Pk is that value of the variate such that the area under the histogram to the left of the ordinate at Pk is k% and the area to its right is (100 - k)%. The formula for the kth percentile can be written as
$$P_k = L_{P_k} + \frac{\frac{kN}{100} - C}{f_{P_k}} \times h \qquad (k = 1, 2, ...... 99)$$
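Quartiles, deciles and percentiles of a grouped distribution all follow one pattern: locate the class in which the required fraction of N falls and interpolate within it. A minimal Python sketch of that common pattern is given below; the name `partition_value` and its `fraction` argument are illustrative, and the data are the marks of 500 students used earlier for the median.

```python
from itertools import accumulate


def partition_value(boundaries, frequencies, fraction):
    """Value below which the given fraction of a grouped distribution lies.

    fraction = 1/4 gives Q1, i/10 gives Di and k/100 gives Pk.
    """
    target = fraction * sum(frequencies)
    cumulative = list(accumulate(frequencies))  # less than type c.f.
    for i, cf in enumerate(cumulative):
        if cf >= target:
            lower, upper = boundaries[i]
            c = cumulative[i - 1] if i > 0 else 0
            return lower + (target - c) / frequencies[i] * (upper - lower)
    raise ValueError("frequencies must not be empty")


boundaries = [(-0.5, 9.5), (9.5, 19.5), (19.5, 29.5), (29.5, 39.5),
              (39.5, 49.5), (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]
frequencies = [30, 40, 50, 48, 24, 162, 132, 14]

q1 = partition_value(boundaries, frequencies, 1 / 4)
q2 = partition_value(boundaries, frequencies, 2 / 4)
q3 = partition_value(boundaries, frequencies, 3 / 4)
d5 = partition_value(boundaries, frequencies, 5 / 10)
p50 = partition_value(boundaries, frequencies, 50 / 100)
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 30.54 53.08 61.09
```

Here `q2`, `d5` and `p50` coincide with the median of 53.08 marks obtained earlier.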
Remarks:
1.
We may note here that P25 = Q1, P50 = D5 = Q2 = Md, P75 = Q3, P10 = D1, P20 = D2, etc.
2.
In continuation of the above, the partition values are known as Quintiles (Octiles) if a
distribution is divided in to 5 (8) equal parts.
3.
The formulae for various partition values of a grouped frequency distribution, given so
far, are based on less than type cumulative frequencies. The corresponding formulae
based on greater than type cumulative frequencies can be written in a similar manner, as
given below:
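$$Q_1 = U_{Q_1} - \frac{\frac{3N}{4} - C}{f_{Q_1}} \times h \qquad Q_3 = U_{Q_3} - \frac{\frac{N}{4} - C}{f_{Q_3}} \times h$$
$$D_i = U_{D_i} - \frac{N - \frac{iN}{10} - C}{f_{D_i}} \times h \qquad P_k = U_{P_k} - \frac{N - \frac{kN}{100} - C}{f_{P_k}} \times h$$
Here $U_{Q_1}$, $U_{Q_3}$, $U_{D_i}$, $U_{P_k}$ are the upper limits of the corresponding classes and C denotes the greater than type cumulative frequencies.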
8.2.5 Mode
Mode is that value of the variate which occurs the maximum number of times in a distribution and around which other items are densely distributed. In the words of Croxton and Cowden, "The mode of a distribution is the value at the point around which the items tend to be most heavily concentrated. It may be regarded as the most typical of a series of values." Further, according to A.M. Tuttle, "Mode is the value which has the greatest frequency density in its immediate neighbourhood."
If the frequency distribution is regular, then mode is determined by the value corresponding to
maximum frequency. There may be a situation where concentration of observations around a
value having maximum frequency is less than the concentration of observations around some
other value. In such a situation, mode cannot be determined by the use of maximum frequency
criterion. Further, there may be concentration of observations around more than one value of
the variable and, accordingly, the distribution is said to be bimodal or multi-modal depending
upon whether it is around two or more than two values.
The concept of mode, as a measure of central tendency, is preferable to mean and median when
it is desired to know the most typical value, e.g., the most common size of shoes, the most
common size of a ready-made garment, the most common size of income, the most common
size of pocket expenditure of a college student, the most common size of a family in a locality,
the most common duration of cure of viral-fever, the most popular candidate in an election, etc.
Mode can be determined in the following situations:
When Data are either in the form of Individual Observations or in the form of Ungrouped
Frequency Distribution
Given individual observations, these are first transformed into an ungrouped frequency
distribution. The mode of an ungrouped frequency distribution can be determined in two ways,
as given below:
1.
By inspection or
2.
By method of grouping.
1.
[Figure: Frequency plotted against values of the variable from 10 to 20; the mode is marked as Mode = 10.]
Notes
1.
If the frequency of each possible value of the variable is same, there is no mode.
2.
If there are two values having maximum frequency, the distribution is said to be
bimodal.
Example: Determine the mode of the following distribution:
X :   10   11   12   13   14   15   16   17   18   19
f :    8   15   20  100   98   95   90   75   50   30
Solution:
This distribution is not regular because there is sudden increase in frequency from 20 to 100.
Therefore, mode cannot be located by inspection and hence the method of grouping is used.
Various steps involved in this method are as follows:
1.
2.
In the first column, write the frequencies against various values of X as given in the
question.
3.
In second column, the sum of frequencies, starting from the top and grouped in twos, are
written.
4.
In third column, the sum of frequencies, starting from the second and grouped in twos, are
written.
5.
In fourth column, the sum of frequencies, starting from the top and grouped in threes are
written.
6.
In fifth column, the sum of frequencies, starting from the second and grouped in threes are
written.
7.
In the sixth column, the sum of frequencies, starting from the third and grouped in threes
are written.
The highest frequency total in each of the six columns is identified and analysed to determine
mode. We apply this method for determining mode of the above example.
Each total is written against the first value of its group.
X     f (1)    (2)    (3)    (4)    (5)    (6)
10       8      23            43
11      15             35           135
12      20     120                         218
13     100            198    293
14      98     193                  283
15      95            185                  260
16      90     165           215
17      75            125           155
18      50      80
19      30
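Analysis Table
A mark (1) shows that the value belongs to the highest total of that column.
                        VARIABLE
Columns    10   11   12   13   14   15   16   17   18   19
1                          1
2                               1    1
3                          1    1
4                          1    1    1
5                               1    1    1
6                                    1    1    1
Total       0    0    0    3    4    4    2    1    0    0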
Since the value 14 and 15 are both repeated maximum number of times in the analysis table,
therefore, mode is ill defined. Mode in this case can be approximately located by the use of the
following formula, which will be discussed later, in this unit.
Mode = 3 Median - 2 Mean
Calculation of Median and Mean
X         f      C.f.      fX
10        8        8        80
11       15       23       165
12       20       43       240
13      100      143      1300
14       98      241      1372
15       95      336      1425
16       90      426      1440
17       75      501      1275
18       50      551       900
19       30      581       570
Total   581               8767
Median = Size of $\left(\frac{581 + 1}{2}\right)$th, i.e., 291st observation = 15. Mean = $\frac{8767}{581}$ = 15.09.
Hence, Mode = 3 × 15 - 2 × 15.09 = 14.82.
Notes If the most repeated values, in the above analysis table, were not adjacent, the
distribution would have been bimodal, i.e., having two modes
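The method of grouping is mechanical enough to be sketched in Python. The helper name `grouping_analysis` is an illustrative choice; it forms the six columns (single frequencies, pairs from the first and second value, triples from the first, second and third value), finds the highest total in each, and counts how often each value of X takes part in a highest total. The second half applies Mode = 3 Median - 2 Mean to the same data.

```python
from collections import Counter


def grouping_analysis(values, freqs):
    """Count, for each value, the columns whose highest total includes it."""
    counts = Counter()
    # (group size, starting index) for columns (1) to (6) of the grouping table
    for size, start in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [(sum(freqs[i:i + size]), values[i:i + size])
                  for i in range(start, len(freqs) - size + 1, size)]
        best = max(total for total, _ in groups)
        for total, members in groups:
            if total == best:
                counts.update(members)
    return counts


values = list(range(10, 20))
freqs = [8, 15, 20, 100, 98, 95, 90, 75, 50, 30]
counts = grouping_analysis(values, freqs)
top = max(counts.values())
candidates = sorted(v for v, c in counts.items() if c == top)
print(candidates)  # [14, 15]: the mode is ill defined

# Approximate location of the mode: Mode = 3 Median - 2 Mean.
n = sum(freqs)
mean = sum(v * f for v, f in zip(values, freqs)) / n
position = (n + 1) / 2  # the 291st observation
running = 0
for v, f in zip(values, freqs):
    running += f
    if running >= position:
        median = v
        break
empirical_mode = 3 * median - 2 * mean
print(median, round(mean, 2), round(empirical_mode, 2))  # 15 15.09 14.82
```

The values 14 and 15 each appear four times in the analysis, which is why the empirical relation is used to locate the mode approximately.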
Determination of modal class: It is the class in which mode of the distribution lies. If the
distribution is regular, the modal class can be determined by inspection, otherwise, by
method of grouping.
2.
Exact location of mode in a modal class (interpolation formula): The exact location of
mode, in a modal class, will depend upon the frequencies of the classes immediately
preceding and following it. If these frequencies are equal, the mode would lie at the
middle of the modal class interval. However, the position of mode would be to the left or
to the right of the middle point depending upon whether the frequency of preceding class
is greater or less than the frequency of the class following it. The exact location of mode
can be done by the use of interpolation formula, developed below:
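[Figure 8.4: histogram showing the modal class and the two adjacent classes, with Frequency on the vertical axis and Classes on the horizontal axis; the points Lm, Mo and Um are marked.]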
Let the modal class be denoted by Lm - Um, where Lm and Um denote its lower and upper limits respectively. Further, let fm be its frequency and h its width. Also let f1 and f2 be the respective frequencies of the immediately preceding and following classes.
We assume that the width of all the class intervals of the distribution are equal. If these are not
equal, make them so by regrouping under the assumption that frequencies in a class are uniformly
distributed.
Make a histogram of the frequency distribution with height of each rectangle equal to the
frequency of the corresponding class. Only three rectangles, out of the complete histogram, that
are necessary for the purpose are shown in the above Figure.
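Let $\Delta_1 = f_m - f_1$ and $\Delta_2 = f_m - f_2$. Then the mode, denoted by Mo, will divide the modal class interval in the ratio $\frac{\Delta_1}{\Delta_2}$.
To derive a formula for mode, the point Mo in the Figure should be such that
$$\frac{M_o - L_m}{U_m - M_o} = \frac{\Delta_1}{\Delta_2} \quad \text{or} \quad M_o\Delta_2 - L_m\Delta_2 = U_m\Delta_1 - M_o\Delta_1$$
$$(\Delta_1 + \Delta_2)M_o = L_m\Delta_2 + U_m\Delta_1 = L_m\Delta_2 + (L_m + h)\Delta_1 = (\Delta_1 + \Delta_2)L_m + \Delta_1 h$$
Dividing both sides by $\Delta_1 + \Delta_2$, we have
$$M_o = L_m + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h \qquad ...(1)$$
By slight adjustment, the above formula can also be written in terms of the upper limit (Um) of the modal class.
$$M_o = U_m - h + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h = U_m - \left(1 - \frac{\Delta_1}{\Delta_1 + \Delta_2}\right) h = U_m - \frac{\Delta_2}{\Delta_1 + \Delta_2} \times h \qquad ...(2)$$
Replacing $\Delta_1$ by $f_m - f_1$ and $\Delta_2$ by $f_m - f_2$, the above formulae can also be written as
$$M_o = L_m + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h \qquad ...(3)$$
and
$$M_o = U_m - \frac{f_m - f_2}{2f_m - f_1 - f_2} \times h \qquad ...(4)$$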
Notes The above formulae are applicable only to a unimodal frequency distribution.
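A small Python sketch of the interpolation formulae for mode follows. The function names and the numerical values (a modal class 20 - 30 with fm = 27, f1 = 18 and f2 = 20) are hypothetical and are used only to show that the lower limit and upper limit forms give the same answer.

```python
def grouped_mode(lower, width, f_modal, f_prev, f_next):
    """Mode from the lower limit: Mo = Lm + d1 / (d1 + d2) * h."""
    d1 = f_modal - f_prev  # excess of modal frequency over the preceding class
    d2 = f_modal - f_next  # excess of modal frequency over the following class
    if d1 + d2 <= 0:
        raise ValueError("the modal class must have the highest frequency")
    return lower + d1 / (d1 + d2) * width


def grouped_mode_from_upper(upper, width, f_modal, f_prev, f_next):
    """Mode from the upper limit: Mo = Um - (fm - f2) / (2fm - f1 - f2) * h."""
    return upper - (f_modal - f_next) / (2 * f_modal - f_prev - f_next) * width


# Hypothetical modal class 20 - 30 with neighbouring frequencies 18 and 20.
mode_lower = grouped_mode(20, 10, 27, 18, 20)
mode_upper = grouped_mode_from_upper(30, 10, 27, 18, 20)
print(mode_lower, mode_upper)  # 25.625 25.625
```

Since the preceding class has the smaller frequency here, the mode lies to the right of the mid-point 25, as the discussion above predicts.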
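[Figure 8.5: symmetrical distribution, with Frequency on the vertical axis and $\bar{X}$ = Md = Mo.]
[Figure 8.6: skewed distributions, with Frequency on the vertical axis; in one panel the order along the horizontal axis is Mo, Md, $\bar{X}$ and in the other it is $\bar{X}$, Md, Mo.]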
Self Assessment
Fill in the blanks:
3.
4.
5.
6.
In a grouped frequency distribution, there are classes along with their respective
................................
7.
8.
Median of distribution is that value of the variate which divides it into ................................
parts.
9.
10.
11.
12.
Mode is that value of the variate which occurs ................................ number of times in a
distribution.
13.
Mean
Indicates the degrees of the scatteredness of the observations. Let curves A and B represent two
frequency distributions. Observe that A and B have the same mean. But curve A has less variability
than B.
If we measure only the mean of these two distributions, we will miss an important difference
between A and B. To increase our understanding of the pattern of the data, we must also measure
its dispersion.
Let us understand various measures of dispersion:
1.
Range: It is the difference between the highest (H) and the lowest (L) observed values, i.e., Range = H - L.
Notes
1.
2.
$\frac{H - L}{H + L}$ is called the coefficient of range.
2.
Semi-inter Quartile Range (Quartile deviation): The semi-inter quartile range Q is given by
$$Q = \frac{Q_3 - Q_1}{2}$$
Notes
1.
$\frac{Q_3 - Q_1}{Q_3 + Q_1}$ is called the coefficient of quartile deviation.
2.
Quartile deviation is not a true measure of dispersion but only a distance of scale.
3.
Mean Deviation (MD): If A is any average, then the mean deviation about A is given by:
$$MD(A) = \frac{\sum f_i |x_i - A|}{N}$$
Notes
1.
Mean deviation about the mean: $MD(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{N}$
2.
Of all the mean deviations taken about different averages, the mean deviation about the median is the least.
3.
$\frac{MD(A)}{A}$ is called the coefficient of mean deviation.
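4.
Variance:
$$\sigma^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2$$
Notes
The standard deviation is the positive square root of the variance:
$$\sigma = \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2}$$
The variance can also be written as
$$\sigma^2 = \frac{1}{N}\sum f_i x_i^2 - (\bar{x})^2$$
Combined variance of two sets of data of N1 and N2 items with means $\bar{x}_1$ and $\bar{x}_2$ and standard deviations $\sigma_1$ and $\sigma_2$ respectively is obtained by:
$$\sigma^2 = \frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}$$
Where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$ and
$$\bar{x} = \frac{N_1\bar{x}_1 + N_2\bar{x}_2}{N_1 + N_2}$$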
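The combined variance formula can be verified numerically: the variance computed from the two group summaries should equal the variance of the pooled observations. The Python sketch below does this for two small hypothetical data sets; the function name `combined_variance` is an illustrative choice, and `statistics.pvariance` is the standard library's population variance (divisor N).

```python
from statistics import fmean, pvariance


def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Combined variance of two groups from their sizes, means and variances."""
    mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)  # combined mean
    d1, d2 = mean1 - mean, mean2 - mean           # deviations of group means
    return (n1 * var1 + n2 * var2 + n1 * d1 ** 2 + n2 * d2 ** 2) / (n1 + n2)


# Hypothetical groups, used only to check the formula.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 3.0, 5.0, 7.0, 9.0]

from_summaries = combined_variance(
    len(group_a), fmean(group_a), pvariance(group_a),
    len(group_b), fmean(group_b), pvariance(group_b),
)
from_pooled_data = pvariance(group_a + group_b)
print(from_summaries, from_pooled_data)
```

Note that the last term of the numerator uses N2 with d2; using N1 for both deviation terms would break the agreement with the pooled variance whenever the groups differ in size.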
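Sample variance ($s^2$): Let $x_1, x_2, x_3, \ldots, x_n$ represent a sample with mean $\bar{x}$. Then the sample variance $s^2$ is given by:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - n(\bar{x})^2}{n - 1}$$
Notes
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - n(\bar{x})^2}{n - 1}}$ is called the sample standard deviation.
5.
Coefficient of variation: $\frac{\sigma}{\bar{x}} \times 100$
Notes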
1.
2.
Example: For the data 103, 50, 68, 110, 105, 108, 174, 103, 150, 200, 225, 350, 103 find the
range, coefficient of range and coefficient of quartile deviation.
Solution:
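Range = H - L = 350 - 50 = 300
Coefficient of range $= \frac{H - L}{H + L} = \frac{300}{350 + 50} = 0.75$
Arranging the observations in ascending order, we get 50, 68, 103, 103, 103, 105, 108, 110, 150, 174, 200, 225, 350. Here n = 13, so that
$$\frac{n + 1}{4} = \frac{14}{4} = 3.5, \qquad \frac{3(n + 1)}{4} = 10.5$$
Thus, Q1 lies midway between the 3rd and 4th observations, i.e., Q1 = 103, and Q3 lies midway between the 10th and 11th observations, i.e., Q3 = (174 + 200)/2 = 187.
Coefficient of QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{84}{290} = 0.2897$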
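The same example can be worked in Python. The helper `value_at_position` is an illustrative name; it returns the observation at a 1-based, possibly fractional, position in the ordered data, which is how the (n + 1)/4 and 3(n + 1)/4 rules are applied to individual observations.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in ordered data."""
    whole = int(position)
    fraction = position - whole
    value = ordered[whole - 1]
    if fraction > 0:  # interpolate between two neighbouring observations
        value += fraction * (ordered[whole] - ordered[whole - 1])
    return value


data = [103, 50, 68, 110, 105, 108, 174, 103, 150, 200, 225, 350, 103]
ordered = sorted(data)
n = len(ordered)

high, low = ordered[-1], ordered[0]
data_range = high - low
coeff_range = data_range / (high + low)

q1 = value_at_position(ordered, (n + 1) / 4)      # position 3.5
q3 = value_at_position(ordered, 3 * (n + 1) / 4)  # position 10.5
coeff_qd = (q3 - q1) / (q3 + q1)

print(data_range, coeff_range)        # 300 0.75
print(q1, q3, round(coeff_qd, 4))     # 103.0 187.0 0.2897
```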
Self Assessment
Fill in the blanks:
14.
15.
................................ is a measure of the average squared distance between the mean and
each term in the population.
8.4 Summary
Descriptive statistics are used to describe the basic features of the data in a study.
They provide simple summaries about the sample and the measures.
Together with simple graphics analysis, they form the basis of virtually every quantitative
analysis of data.
When summarizing a quantity like length or weight or age, it is common to answer the
first question with the arithmetic mean, the median, or, in case of a unimodal distribution,
the mode.
Sometimes, we choose specific values from the cumulative distribution function called
quantiles.
The most common measures of variability for quantitative data are the variance; its square
root, the standard deviation; the range; interquartile range; and the average absolute
deviation (average deviation).
Notes
8.5 Keywords
Average: It is a single value which can be taken as representative of the whole distribution.
Descriptive Statistics: Descriptive statistics are used to describe the basic features of the data in
a study.
Dispersion: It is the spread of the data in a distribution.
Median: It is that value of the variate which divides it into two equal parts.
Mode: It is that value of the variate which occurs maximum number of times in a distribution
and around which other items are densely distributed.
Show that if all observations of a series are added, subtracted, multiplied or divided by a
constant b, the mean is also added, subtracted, multiplied or divided by the same constant.
2.
Prove that the algebraic sum of deviations of a given set of observations from their mean
is zero.
3.
Prove that the sum of squared deviations is least when taken from the mean.
4.
The heights of 15 students of a class were noted as shown below. Compute arithmetic
mean by using (i) Direct Method and (ii) Short-Cut Method.
5.
6.
7.
8.
S. No.    Ht. (cms)
1           160
2           167
3           174
4           168
5           166
6           171
7           162
8           182
9           186
10          175
11          178
12          167
13          177
14          162
15          163
Marks
0 - 10
10 - 20
20 - 30
30 - 40
40 - 50
50 - 60
No. of Students
12
18
27
20
17
Frequency
50
45
30
20
10
15
Size
10 - 20
10 - 30
10 - 40
10 - 50
10 - 60
10 - 70
10 - 80
10 - 90
No. of
Students
16
56
97
124
137
146
150
Distinguish between an absolute measure and relative measure of dispersion. What are
the advantages of using the latter?
9.
50 - 59
60 - 69
70 - 79
80 - 89
90 - 99
100 - 109
110 - 119
No. of Students
15
40
50
60
45
90
15
10.
Calculate the coefficient of mean deviation from mean and median from the following
data:
Marks
No.
Students
11.
of
14.
30 - 40
40 - 50
50 - 60
60 - 70
70 -80
80 -90
12
18
25
20
10
7, 4, 6, 4, 4, 5, 2, 4, 1, 7, 7, 6, 2, 3, 4, 2
(b)
13.
20 - 30
12.
10 - 20
Size
0 - 10
10 - 20
20 - 30
30 - 40
40 - 50
50 - 60
Frequency
20
44
26
Size
0 - 10
10 - 20
20 - 30
30 - 40
40 - 50
Frequency
10
15
10
20
30
40
50
60
70
80
No. of Persons
15
30
53
75
100
110
115
125
30
35
40
45
50
55
60
65
70
75
80
Frequency
13
17
12
1.
Summarisation
2.
averages
3.
Arithmetic Mean
4.
Shortcut Method
5.
Positional
6.
Frequencies
7.
Charliers
8.
Two equal
9.
Frequency
10.
Deciles
11.
12.
Maximum
13.
Mode
14.
Dispersion
15.
Variance
Notes
Books
Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: Johns Hopkins University Press, 1943.
R.S. Bhardwaj, Business Statistics, Excel Books, New Delhi, 2008.
S.N. Murthy and U. Bhojanna, Business Research Methods, Excel Books, 2007.
Correlation
9.1.1
Scatter Diagram
9.1.2
9.1.3
9.1.4
9.1.5
Probable Error of r
9.1.6
9.1.7
9.1.8
9.1.9
9.1.10
9.2
Multiple Correlation
9.3
Partial Correlation
9.4
Regression Analysis
9.4.1
Simple Regression
9.4.2
9.4.3
Non-parametric Regression
9.5
Summary
9.6
Keywords
9.7
Review Questions
9.8
Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
Once best estimates are chosen, both from a statistical and epidemiologic perspective, hypotheses
about the estimated association between a single mean, proportion, or rate and a fixed value,
typically standard or goal, or about the estimated association between two or more means,
proportions, or rates can be tested.
Measures of association refer to a wide variety of coefficients that measure the strength of the relationship between variables, which has been described in several ways. Some of these measures apply when at least one of the variables is dichotomous, generally nominal or ordinal. Other measures express the strength of the relationship in terms of its degree of monotonicity, which is assessed by counting the various types of pairs in the relationship.
9.1 Correlation
Various experts have defined correlation in their own words and their definitions, broadly
speaking, imply that correlation is the degree of association between two or more variables.
Some important definitions of correlation are given below:
1.
2.
3.
"When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation." (Croxton and Cowden)
4.
1.
One of the variables may be affecting the other: A correlation coefficient calculated from the data on quantity demanded and corresponding price of tea would only reveal that the degree of association between them is very high. It will not give us any idea about whether price is affecting demand of tea or vice-versa. In order to know this, we need to have some additional information apart from the study of correlation. For example, if, on the basis of some additional information, we say that the price of tea affects its demand, then price will be the cause and quantity will be the effect. The causal variable is also termed the independent variable, while the other variable is termed the dependent variable.
2.
The two variables may act upon each other: Cause and effect relation exists in this case
also but it may be very difficult to find out which of the two variables is independent.
Example: If we have data on price of wheat and its cost of production, the correlation
between them may be very high because higher price of wheat may attract farmers to produce
more wheat and more production of wheat may mean higher cost of production, assuming that
it is an increasing cost industry. Further, the higher cost of production may in turn raise the price
of wheat.
For the purpose of determining a relationship between the two variables in such situations,
we can take any one of them as independent variable.
3.
The two variables may be acted upon by the outside influences: In this case we might get a
high value of correlation between the two variables, however, apparently no cause and
effect type relation seems to exist between them.
Example: The demands of the two commodities, say X and Y, may be positively correlated
because the incomes of the consumers are rising. Coefficient of correlation obtained in such a
situation is called a spurious or nonsense correlation.
4.
A high value of the correlation coefficient may be obtained due to sheer coincidence (or
pure chance): This is another situation of spurious correlation. Given the data on any two
variables, one may obtain a high value of correlation coefficient when in fact they do not
have any relationship.
Example: A high value of correlation coefficient may be obtained between the size of shoe
and the income of persons of a locality.
The sets of points in a scatter diagram are known as the dots of the diagram.
[Figure 9.1: scatter diagrams, including panels titled Non-Linear Relationship and No Relation.]
If all the points or dots lie exactly on a straight line or a curve, the association between the
variables is said to be perfect. This is shown below:
[Figure 9.2: scatter diagrams showing a Perfect Positive Linear Relationship, a Perfect Negative Linear Relationship and a Perfect Non-Linear Relationship.]
A scatter diagram of the data helps in having a visual idea about the nature of association
between two variables. If the points cluster along a straight line, the association between variables
is linear. Further, if the points cluster along a curve, the corresponding association is non-linear
or curvilinear. Finally, if the points neither cluster along a straight line nor along a curve, there
is absence of any association between the variables.
It is also obvious from the above figure that when low (high) values of X are associated with low
(high) value of Y, the association between them is said to be positive. Contrary to this, when low
(high) values of X are associated with high (low) values of Y, the association between them is
said to be negative.
This unit deals only with linear association between the two variables X and Y. We shall measure the degree of linear association by Karl Pearson's formula for the coefficient of linear correlation.
[Figure: scatter diagram divided into four quadrants I, II, III and IV by the lines $X = \bar{X}$ and $Y = \bar{Y}$, which intersect at the point $(\bar{X}, \bar{Y})$.]
As mentioned earlier, the correlation between X and Y will be positive if low (high) values of X are associated with low (high) values of Y. In terms of the above Figure, we can say that when values of X that are greater (less) than $\bar{X}$ are generally associated with values of Y that are greater (less) than $\bar{Y}$, the correlation between X and Y will be positive. This implies that there will be a general tendency of points to concentrate in quadrants I and III. Similarly, when correlation between X and Y is negative, the points of the scatter diagram will have a general tendency to concentrate in quadrants II and IV.
Further, if we consider deviations of values from their means, i.e., $(X_i - \bar{X})$ and $(Y_i - \bar{Y})$, we note that:
1.
Both $(X_i - \bar{X})$ and $(Y_i - \bar{Y})$ will be positive for all points in quadrant I.
2.
$(X_i - \bar{X})$ will be negative and $(Y_i - \bar{Y})$ will be positive for all points in quadrant II.
3.
Both $(X_i - \bar{X})$ and $(Y_i - \bar{Y})$ will be negative for all points in quadrant III.
4.
$(X_i - \bar{X})$ will be positive and $(Y_i - \bar{Y})$ will be negative for all points in quadrant IV.
It is obvious from the above that the product of deviations, i.e., $(X_i - \bar{X})(Y_i - \bar{Y})$, will be positive for points in quadrants I and III and negative for points in quadrants II and IV.
Notes Since, for positive correlation, the points will tend to concentrate more in quadrants I and III than in II and IV, the sum of positive products of deviations will outweigh the sum of negative products of deviations. Thus, $\sum (X_i - \bar{X})(Y_i - \bar{Y})$, taken over all the n observations, will be positive.
Similarly, when correlation is negative, the points will tend to concentrate more in quadrants II and IV than in I and III. Thus, the sum of negative products of deviations will outweigh the sum of positive products and hence $\sum (X_i - \bar{X})(Y_i - \bar{Y})$, taken over all the n observations, will be negative.
Further, if there is no correlation, the sum of positive products of deviations will be equal to the sum of negative products of deviations such that $\sum (X_i - \bar{X})(Y_i - \bar{Y})$ will be equal to zero.
On the basis of the above, we can consider $\sum (X_i - \bar{X})(Y_i - \bar{Y})$ as an absolute measure of correlation. This measure, like other absolute measures of dispersion, skewness, etc., will depend upon (i) the number of observations and (ii) the units of measurement of the variables.
In order to avoid its dependence on the number of observations, we take its average, i.e., $\frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})$. This term is called covariance in statistics and is denoted as Cov(X, Y).
To eliminate the effect of units of measurement of the variables, the covariance term is divided by the product of the standard deviation of X and the standard deviation of Y. The resulting expression is known as Karl Pearson's coefficient of linear correlation or the product moment correlation coefficient or simply the coefficient of correlation, between X and Y.
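$$r_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} \qquad ...(1)$$
or
$$r_{XY} = \frac{\frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\frac{1}{n}\sum (X_i - \bar{X})^2}\sqrt{\frac{1}{n}\sum (Y_i - \bar{Y})^2}} \qquad ...(2)$$
Cancelling $\frac{1}{n}$ from the numerator and the denominator, we get
$$r_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2}\sqrt{\sum (Y_i - \bar{Y})^2}} \qquad ...(3)$$
Consider
$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X})Y_i - \bar{Y}\sum (X_i - \bar{X}) = \sum X_iY_i - \bar{X}\sum Y_i = \sum X_iY_i - n\bar{X}\bar{Y} \quad \left(\because \sum Y_i = n\bar{Y}\right)$$
Similarly, we can write
$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2 \quad \text{and} \quad \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2$$
Substituting these values in equation (3), we have
$$r_{XY} = \frac{\sum X_iY_i - n\bar{X}\bar{Y}}{\sqrt{\sum X_i^2 - n\bar{X}^2}\sqrt{\sum Y_i^2 - n\bar{Y}^2}} \qquad ...(4)$$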
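Substituting $\bar{X} = \frac{\sum X_i}{n}$ and $\bar{Y} = \frac{\sum Y_i}{n}$ in equation (4), we get
$$r_{XY} = \frac{\sum X_iY_i - n\frac{\sum X_i}{n}\frac{\sum Y_i}{n}}{\sqrt{\sum X_i^2 - n\left(\frac{\sum X_i}{n}\right)^2}\sqrt{\sum Y_i^2 - n\left(\frac{\sum Y_i}{n}\right)^2}} = \frac{\sum X_iY_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sqrt{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}\sqrt{\sum Y_i^2 - \frac{(\sum Y_i)^2}{n}}} \qquad ...(5)$$
On multiplying the numerator and the denominator by n, we get
$$r_{XY} = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{\sqrt{n\sum X_i^2 - (\sum X_i)^2}\sqrt{n\sum Y_i^2 - (\sum Y_i)^2}} \qquad ...(6)$$
Further, writing $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ for the deviations from the means, equation (3) can be written as
$$r_{XY} = \frac{\sum x_iy_i}{\sqrt{\sum x_i^2}\sqrt{\sum y_i^2}} \qquad ...(7)$$
or
$$r_{XY} = \frac{\frac{1}{n}\sum x_iy_i}{\sqrt{\frac{1}{n}\sum x_i^2}\sqrt{\frac{1}{n}\sum y_i^2}} \qquad ...(8)$$
or
$$r_{XY} = \frac{1}{n}\frac{\sum x_iy_i}{\sigma_X \sigma_Y} \qquad ...(9)$$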
Equations (5) or (6) are often used for the calculation of correlation from raw data, while the use
of the remaining equations depends upon the forms in which the data are available. For example,
if standard deviations of X and Y are given, equation (9) may be appropriate.
Example: Calculate Karl Pearson's coefficient of correlation from the following pairs of values:
Values of Xi :   12    9    8   10   11   13    7
Values of Yi :   14    8    6    9   11   12    3
Solution:
The formula for Karl Pearson's coefficient of correlation is
$$r_{XY} = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{\sqrt{n\sum X_i^2 - (\sum X_i)^2}\sqrt{n\sum Y_i^2 - (\sum Y_i)^2}}$$
The values of different terms, given in the formula, are calculated from the following table:
Xi       Yi       XiYi     Xi2      Yi2
12       14       168      144      196
9         8        72       81       64
8         6        48       64       36
10        9        90      100       81
11       11       121      121      121
13       12       156      169      144
7         3        21       49        9
70       63       676      728      651   (Total)
$$r_{XY} = \frac{7 \times 676 - 70 \times 63}{\sqrt{7 \times 728 - (70)^2}\sqrt{7 \times 651 - (63)^2}} = 0.949$$
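The raw data formula lends itself to a direct Python sketch. The function name `pearson_r` is an illustrative choice; the seven pairs used are those consistent with the column totals of the worked table (ΣX = 70, ΣY = 63, ΣXY = 676, ΣX² = 728, ΣY² = 651), with the single-digit entries inferred from the products and squares shown there.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r from raw data, using sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator


xs = [12, 9, 8, 10, 11, 13, 7]
ys = [14, 8, 6, 9, 11, 12, 3]
r_xy = pearson_r(xs, ys)
print(round(r_xy, 3))  # 0.949
```

The denominator becomes zero when either variable is constant; in that case the coefficient is undefined and the sketch raises `ZeroDivisionError`.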
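Example: Calculate Karl Pearson's coefficient of correlation between X and Y from the following data:
$$\sum (X_i - \bar{X})^2 = 184, \quad \sum (Y_i - \bar{Y})^2 = 148, \quad \sum (X_i - \bar{X})(Y_i - \bar{Y}) = 164, \quad \bar{X} = 11 \text{ and } \bar{Y} = 10$$
Solution:
Using the formula
$$r_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2}\sqrt{\sum (Y_i - \bar{Y})^2}},$$
we get
$$r_{XY} = \frac{164}{\sqrt{184}\sqrt{148}} = 0.99$$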
Example: Calculate the correlation between Reading (X) and Spelling (Y) for the 10 students
whose scores are given below:
Student     Reading     Spelling
1              3           11
2              7            1
3              2           19
4              9            5
5              8           17
6              4            3
7              1           15
8             10            9
9              6           15
10             5            8
Solution:
Student     Reading (X)   Spelling (Y)   X - μx    Y - μy    (X - μx)(Y - μy)
1                3             11         -2.5       0.7        -1.75
2                7              1          1.5      -9.3       -13.95
3                2             19         -3.5       8.7       -30.45
4                9              5          3.5      -5.3       -18.55
5                8             17          2.5       6.7        16.75
6                4              3         -1.5      -7.3        10.95
7                1             15         -4.5       4.7       -21.15
8               10              9          4.5      -1.3        -5.85
9                6             15          0.5       4.7         2.35
10               5              8         -0.5      -2.3         1.15
Sum             55            103          0.0       0.0       -60.5
Mean             5.5           10.3
Standard
Deviation        2.872          5.832
$$r = \frac{\sum (X - \mu_x)(Y - \mu_y)}{N\sigma_x\sigma_y} = \frac{-60.5}{(10)(2.872)(5.832)} = \frac{-60.5}{167.495} = -0.36$$
However, in real practice, we use the computational or raw score formula for the correlation coefficient:
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\sqrt{N\sum Y^2 - (\sum Y)^2}}$$
Where:
(i) N is the number of pairs of scores
(ii) ΣXY is the sum of the products of paired scores
(iii) ΣX is the sum of the X scores
(iv) ΣY is the sum of the Y scores
(v) ΣX² is the sum of the squared X scores
(vi) ΣY² is the sum of the squared Y scores
Correlation between Reading and Spelling for the data given in the example, using the Computational Formula:
$$r = \frac{(10)(506) - (55)(103)}{\sqrt{(10)(385) - (55)^2}\sqrt{(10)(1401) - (103)^2}} = \frac{5060 - 5665}{\sqrt{825}\sqrt{3401}} = \frac{-605}{(28.723)(58.318)} = \frac{-605}{1675.0679} = -0.36$$
Thus, the correlation is -0.36, indicating that there is a small negative correlation between reading and spelling. The correlation coefficient is a number that can range from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation).
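The agreement between the deviation formula and the computational formula can be confirmed with a short Python sketch. The ten (Reading, Spelling) pairs below are an assumption: they are the values consistent with the totals quoted in the worked solution (ΣX = 55, ΣY = 103, ΣXY = 506, ΣX² = 385, ΣY² = 1401), and the variable names are illustrative.

```python
from math import sqrt

# Assumed scores, chosen to reproduce the totals of the worked solution.
reading = [3, 7, 2, 9, 8, 4, 1, 10, 6, 5]
spelling = [11, 1, 19, 5, 17, 3, 15, 9, 15, 8]
n = len(reading)

# Deviation formula: r = sum((X - mean_x)(Y - mean_y)) / (N * sd_x * sd_y)
mean_x = sum(reading) / n
mean_y = sum(spelling) / n
sd_x = sqrt(sum((x - mean_x) ** 2 for x in reading) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in spelling) / n)
sum_products = sum((x - mean_x) * (y - mean_y)
                   for x, y in zip(reading, spelling))
r_deviation = sum_products / (n * sd_x * sd_y)

# Computational (raw score) formula
sum_xy = sum(x * y for x, y in zip(reading, spelling))
sum_xx = sum(x * x for x in reading)
sum_yy = sum(y * y for y in spelling)
r_raw = (n * sum_xy - sum(reading) * sum(spelling)) / (
    sqrt(n * sum_xx - sum(reading) ** 2) * sqrt(n * sum_yy - sum(spelling) ** 2)
)

print(round(sum_products, 1), round(sd_x, 3), round(sd_y, 3))  # -60.5 2.872 5.832
print(round(r_deviation, 2), round(r_raw, 2))                  # -0.36 -0.36
```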
Task
1.
The covariance between the length and weight of five items is 6 and their standard
deviations are 2.45 and 2.61 respectively. Find the coefficient of correlation between
length and weight.
2.
The Karl Pearson's coefficient of correlation and covariance between two variables
X and Y is 0.85 and 15 respectively. If variance of Y is 9, find the standard deviation
of X.
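Let $u_i = \frac{X_i - A}{h}$ and $v_i = \frac{Y_i - B}{k}$, where A, B, h and k are constants with h and k of the same sign. Then
$$X_i = A + hu_i, \ \therefore \bar{X} = A + h\bar{u} \quad \text{and} \quad Y_i = B + kv_i, \ \therefore \bar{Y} = B + k\bar{v}$$
Substituting these values in
$$r_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2}\sqrt{\sum (Y_i - \bar{Y})^2}},$$
we get
$$r_{XY} = \frac{hk\sum (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{h^2\sum (u_i - \bar{u})^2}\sqrt{k^2\sum (v_i - \bar{v})^2}} = \frac{\sum (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum (u_i - \bar{u})^2}\sqrt{\sum (v_i - \bar{v})^2}}$$
$$r_{XY} = r_{uv}$$
This shows that correlation between X and Y is equal to correlation between u and v, where u and v are the variables obtained by change of origin and scale of the variables X and Y respectively.
This property is very useful in the simplification of computations of correlation. On the basis of this property, we can write a short-cut formula for the computation of rXY:
$$r_{XY} = \frac{n\sum u_iv_i - (\sum u_i)(\sum v_i)}{\sqrt{n\sum u_i^2 - (\sum u_i)^2}\sqrt{n\sum v_i^2 - (\sum v_i)^2}} \qquad ...(10)$$
2.
The coefficient of correlation lies between -1 and +1. To show this, let
$$x'_i = \frac{X_i - \bar{X}}{\sigma_X} \quad \text{and} \quad y'_i = \frac{Y_i - \bar{Y}}{\sigma_Y}$$
$$\therefore \ x'^2_i = \frac{(X_i - \bar{X})^2}{\sigma_X^2} \quad \text{and} \quad y'^2_i = \frac{(Y_i - \bar{Y})^2}{\sigma_Y^2}$$
or
$$\sum x'^2_i = \frac{\sum (X_i - \bar{X})^2}{\sigma_X^2} = n \quad \text{and} \quad \sum y'^2_i = \frac{\sum (Y_i - \bar{Y})^2}{\sigma_Y^2} = n$$
Also,
$$r = \frac{\frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sigma_X \sigma_Y} = \frac{1}{n}\sum \left(\frac{X_i - \bar{X}}{\sigma_X}\right)\left(\frac{Y_i - \bar{Y}}{\sigma_Y}\right) = \frac{1}{n}\sum x'_iy'_i$$
Consider the sum $x'_i + y'_i$. The square of this sum is always a non-negative number, i.e., $(x'_i + y'_i)^2 \geq 0$.
Taking sum over all the observations and dividing by n, we get
$$\frac{1}{n}\sum (x'_i + y'_i)^2 \geq 0 \quad \text{or} \quad \frac{1}{n}\sum (x'^2_i + y'^2_i + 2x'_iy'_i) \geq 0$$
or
$$\frac{1}{n}\sum x'^2_i + \frac{1}{n}\sum y'^2_i + \frac{2}{n}\sum x'_iy'_i \geq 0$$
or
$$1 + 1 + 2r \geq 0 \quad \text{or} \quad 2 + 2r \geq 0 \quad \text{or} \quad r \geq -1 \qquad ...(11)$$
Further, consider the difference $x'_i - y'_i$. The square of this difference is also non-negative, i.e., $(x'_i - y'_i)^2 \geq 0$.
Taking sum over all the observations and dividing by n, we get
$$\frac{1}{n}\sum (x'_i - y'_i)^2 \geq 0 \quad \text{or} \quad \frac{1}{n}\sum (x'^2_i + y'^2_i - 2x'_iy'_i) \geq 0$$
or
$$\frac{1}{n}\sum x'^2_i + \frac{1}{n}\sum y'^2_i - \frac{2}{n}\sum x'_iy'_i \geq 0$$
or
$$1 + 1 - 2r \geq 0 \quad \text{or} \quad 2 - 2r \geq 0 \quad \text{or} \quad r \leq 1 \qquad ...(12)$$
Combining the inequalities (11) and (12), we get $-1 \leq r \leq 1$. Hence r lies between -1 and +1.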
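Both properties can be illustrated numerically. In the Python sketch below, the data and the constants A, h, B and k are hypothetical; the sketch checks that r is unchanged when X and Y are replaced by u = (X - A)/h and v = (Y - B)/k with positive h and k, that r stays within -1 and +1, and that the sign of r reverses when only one of the scale factors is negative.

```python
from math import sqrt


def correlation(xs, ys):
    """Coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical observations and constants.
xs = [12, 9, 8, 10, 11, 13, 7]
ys = [14, 8, 6, 9, 11, 12, 3]
A, h = 10, 2  # change of origin and scale for X
B, k = 9, 3   # change of origin and scale for Y

us = [(x - A) / h for x in xs]
vs = [(y - B) / k for y in ys]

r_xy = correlation(xs, ys)
r_uv = correlation(us, vs)
print(abs(r_xy - r_uv) < 1e-12)  # True: r is unaffected by origin and scale
print(-1 <= r_xy <= 1)           # True: r lies between -1 and +1

# A negative scale factor for one variable reverses the sign of r.
r_flipped = correlation([(x - A) / -h for x in xs], vs)
print(abs(r_flipped + r_xy) < 1e-12)  # True
```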
3.
If X and Y are independent they are uncorrelated, but the converse is not true.
If X and Y are independent, it implies that they do not reveal any tendency of simultaneous movement either in the same or in opposite directions. The dots of the scatter diagram will be uniformly spread in all the four quadrants. Therefore, $\sum (X_i - \bar{X})(Y_i - \bar{Y})$ or Cov(X, Y) will be equal to zero and hence, rXY = 0. Thus, if X and Y are independent, they are uncorrelated.
The converse of this property implies that if rXY = 0, then X and Y may not necessarily be
independent. To prove this, we consider the following data:
Here
$$Cov(X, Y) = \frac{1}{n}\sum X_iY_i - \bar{X}\bar{Y} = 0$$
so that rXY = 0. A close examination of the given data would reveal that although rXY = 0, X and Y are not independent. In fact, they are related by the mathematical relation $Y = (X - 4)^2$.
Caution: $r_{XY}$ is only a measure of the degree of linear association between X and Y. If the association is non-linear, the computed value of $r_{XY}$ is no longer a measure of the degree of association between the two variables.
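The point that zero correlation does not imply independence can be checked numerically. The sketch below uses hypothetical values X = 1, 2, ..., 7 (an assumption, not necessarily the data of the original table) together with the exact relation Y = (X - 4)^2 quoted in the text; the helper name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r computed from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical data: X values placed symmetrically about 4.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [(x - 4) ** 2 for x in xs]   # Y is completely determined by X

r_nonlinear = pearson_r(xs, ys)
print(r_nonlinear)                # the numerator n*Sxy - Sx*Sy is 784 - 784 = 0
```

Although Y is a function of X, the covariance term vanishes because the X values are symmetric about 4, so r comes out as zero.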
1. The coefficient of correlation r does not give any idea about the existence of a cause and effect relationship between the variables. It is possible that a high value of r is obtained although neither variable seems to be directly affecting the other. Hence, any interpretation of r should be done very carefully.
2. It is only a measure of the degree of linear relationship between two variables. If the relationship is not linear, the calculation of r does not have any meaning.
3.
4. If the data are not uniformly spread in the relevant quadrants, the value of r may give a misleading interpretation of the degree of relationship between the two variables. For example, if some values are concentrated around a point in the first quadrant and there is a similar concentration in the third quadrant, the value of r will be very high although there may be no linear relation between the variables.
5. As compared with other methods, to be discussed later in this unit, the computations of r are cumbersome and time consuming.
$$\text{S.E.}(r) = \frac{1 - r^2}{\sqrt{n}}, \quad \therefore\; \text{P.E.}(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{n}}$$
Uses of P.E.(r)
1. It can be used to specify the limits of the population correlation coefficient $\rho$ (rho), which are defined as $r - \text{P.E.}(r) \leq \rho \leq r + \text{P.E.}(r)$, where $\rho$ denotes the correlation coefficient in the population and r denotes the correlation coefficient in the sample.
2. It can be used to test the significance of an observed value of r without the knowledge of test of hypothesis. By convention, the rules are:
(a) If |r| < 6 P.E.(r), then correlation is not significant and this may be treated as a situation of no correlation between the two variables.
(b) If |r| > 6 P.E.(r), then correlation is significant and this implies presence of a strong correlation between the two variables.
(c) If the correlation coefficient is greater than 0.3 and the probable error is relatively small, the correlation coefficient should be considered as significant.
Example: Find out correlation between age and playing habit from the following information and also its probable error.
Age                15    16    17    18    19    20
No. of Students   250   200   150   120   100    80
Regular Players   200   150    90    48    30    12
Solution:
Let X denote age, p the number of regular players and q the number of students. Playing habit, denoted by Y, is measured as a percentage of regular players in an age group, i.e., Y = (p/q) × 100.
X       q      p      Y     u = X - 17   v = Y - 40     uv     u²      v²
15     250    200    80        -2            40         -80     4     1600
16     200    150    75        -1            35         -35     1     1225
17     150     90    60         0            20           0     0      400
18     120     48    40         1             0           0     1        0
19     100     30    30         2           -10         -20     4      100
20      80     12    15         3           -25         -75     9      625
Total                           3            60        -210    19     3950
$$r_{XY} = \frac{6 \times (-210) - 3 \times 60}{\sqrt{6 \times 19 - 9}\,\sqrt{6 \times 3950 - 3600}} = -0.99$$
Probable error of r, i.e., $\text{P.E.}(r) = 0.6745 \times \frac{1 - (0.99)^2}{\sqrt{6}} = 0.0055$
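The arithmetic of this example can be retraced with a short script. This is a minimal sketch: the variable names are illustrative, the P.E. formula assumed is 0.6745(1 - r²)/√n, and the significance check uses the conventional 6 × P.E.(r) rule stated earlier. Because the script keeps r unrounded, its probable error (about 0.0048) is slightly smaller than the 0.0055 obtained in the text from r rounded to 0.99.

```python
from math import sqrt

ages = [15, 16, 17, 18, 19, 20]
students = [250, 200, 150, 120, 100, 80]
players = [200, 150, 90, 48, 30, 12]

# Playing habit Y as a percentage of regular players in each age group.
ys = [100 * p / q for p, q in zip(players, students)]

# Change of origin: u = X - 17, v = Y - 40 (as in the worked solution).
us = [x - 17 for x in ages]
vs = [y - 40 for y in ys]
n = len(us)

num = n * sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs)
den = sqrt(n * sum(u * u for u in us) - sum(us) ** 2) * \
      sqrt(n * sum(v * v for v in vs) - sum(vs) ** 2)
r = num / den

pe = 0.6745 * (1 - r ** 2) / sqrt(n)   # probable error of r
significant = abs(r) > 6 * pe          # conventional rule of thumb

print(round(r, 4), round(pe, 4), significant)
```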
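          Y1     Y2    ...    Yj    ...    Yn     Total
X1       f11    f12    ...   f1j    ...   f1n     f1
X2       f21    f22    ...   f2j    ...   f2n     f2
...
Xi       fi1    fi2    ...   fij    ...   fin     fi
...
Xm       fm1    fm2    ...   fmj    ...   fmn     fm
Total    f'1    f'2    ...   f'j    ...   f'n     N = ΣΣ fij
$$r_{XY} = \frac{N\sum\sum f_{ij}X_iY_j - (\sum f_iX_i)(\sum f'_jY_j)}{\sqrt{N\sum f_iX_i^2 - (\sum f_iX_i)^2}\,\sqrt{N\sum f'_jY_j^2 - (\sum f'_jY_j)^2}}$$
$$u_i = \frac{X_i - A}{h} \quad \text{and} \quad v_j = \frac{Y_j - B}{k}$$
$$r_{XY} = \frac{N\sum\sum f_{ij}u_iv_j - (\sum f_iu_i)(\sum f'_jv_j)}{\sqrt{N\sum f_iu_i^2 - (\sum f_iu_i)^2}\,\sqrt{N\sum f'_jv_j^2 - (\sum f'_jv_j)^2}}$$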
Example: Calculate Karl Pearson's coefficient of correlation from the following data:
Age (yrs.)
Marks
18
19
20 - 25
15 - 20
10 - 15
20
21
22
4
7
5 - 10
10
3
0-5
2
4
Solution:
Let $X_i$ denote the mid-value of the class interval of marks. The values of $X_i$ can be written as 22.5, 17.5, 12.5, 7.5 and 2.5.
Further, let $u_i = (X_i - 12.5) \div 5$. The values of $u_i$ would be 2, 1, 0, -1 and -2.
Similarly, let $Y_j$ denote age. The values of $Y_j$ are 18, 19, 20, 21 and 22.
Assuming $v_j = Y_j - 20$, the values of $v_j$ would be -2, -1, 0, 1 and 2.
We shall use the values of $u_i$ and $v_j$ in the computation of r.
Table for Calculation of r
$$r = \frac{40 \times (-44) - 6 \times 12}{\sqrt{40 \times 50 - 36}\,\sqrt{40 \times 56 - 144}} = \frac{-1832}{\sqrt{1964}\,\sqrt{2096}} = -0.903$$
Example: Given the following data, compute the coefficient of correlation r, between X
and Y.
Age (yrs.)
Marks
30 -50
50 -70
70 -90
Total
0-5
10
18
5 - 10
12
10 - 15
20
Total
17
18
15
50
Solution:
Note: Instead of doing the computation work in a single table, it can be split into the following steps:
Taking mid-values of the class intervals, we have
Mid-values (X)    2.5    7.5    12.5
Mid-values (Y)     40     60      80
Let $u_i = \frac{X_i - 7.5}{5}$ and $v_j = \frac{Y_j - 60}{20}$
1. Calculation of $\sum\sum f_{ij}u_iv_j$
$\sum\sum f_{ij}u_iv_j = 13$
2.
u_i      f_i    f_i u_i    f_i u_i²
-1        18      -18         18
 0        12        0          0
 1        20       20         20
Total     50        2         38
3.
v_j      f'_j   f'_j v_j   f'_j v_j²
-1        17      -17         17
 0        18        0          0
 1        15       15         15
Total     50       -2         32
$$r = \frac{50 \times 13 - 2 \times (-2)}{\sqrt{50 \times 38 - 4}\,\sqrt{50 \times 32 - 4}} = \frac{654}{\sqrt{1896}\,\sqrt{1596}} = 0.376$$
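The last step of this grouped-data example can be reproduced from the marginal totals alone. In the sketch below the cross-product sum ΣΣ f_ij u_i v_j = 13 is taken from the text as given (the individual cell frequencies are not used), and the variable names are illustrative.

```python
from math import sqrt

N = 50
sum_fuv = 13                      # ΣΣ f_ij u_i v_j, quoted from the worked solution

u_vals, f_rows = [-1, 0, 1], [18, 12, 20]   # coded X mid-values and row totals
v_vals, f_cols = [-1, 0, 1], [17, 18, 15]   # coded Y mid-values and column totals

sum_fu = sum(f * u for f, u in zip(f_rows, u_vals))
sum_fu2 = sum(f * u * u for f, u in zip(f_rows, u_vals))
sum_fv = sum(f * v for f, v in zip(f_cols, v_vals))
sum_fv2 = sum(f * v * v for f, v in zip(f_cols, v_vals))

r_grouped = (N * sum_fuv - sum_fu * sum_fv) / (
    sqrt(N * sum_fu2 - sum_fu ** 2) * sqrt(N * sum_fv2 - sum_fv ** 2)
)
print(sum_fu, sum_fu2, sum_fv, sum_fv2, round(r_grouped, 3))
```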
1. When quantitative measurement of the characteristics is not possible, e.g., the results of a beauty contest where various individuals can only be ranked.
2.
3. When the given data consist of some extreme observations, the value of Karl Pearson's coefficient is likely to be unduly affected. In such a situation the computation of the rank correlation is preferred because it will give less importance to the extreme observations.
4.
The coefficient of correlation obtained on the basis of ranks is called Spearman's Rank Correlation or simply the Rank Correlation. This correlation is denoted by $\rho$ (rho).
Let $X_i$ be the rank of the i-th individual according to the characteristic X and $Y_i$ be its rank according to the characteristic Y. If there are n individuals, there would be n pairs of ranks $(X_i, Y_i)$, i = 1, 2, ...... n. We assume here that there are no ties, i.e., no two or more individuals are tied to a particular rank. Thus, the $X_i$'s and $Y_i$'s are simply the integers from 1 to n, appearing in any order.
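The means of X and Y, i.e., $\bar{X} = \bar{Y} = \frac{1 + 2 + \dots + n}{n} = \frac{n(n+1)}{2n} = \frac{n+1}{2}$.
Also, $\sigma_X^2 = \sigma_Y^2 = \frac{1}{n}\left[1^2 + 2^2 + \dots + n^2\right] - \frac{(n+1)^2}{4} = \frac{1}{n}\cdot\frac{n(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n^2 - 1}{12}$
$$d_i = X_i - Y_i = (X_i - \bar{X}) - (Y_i - \bar{Y}) \qquad (\because \bar{X} = \bar{Y})$$
Squaring both sides and taking sum over all the observations, we get
$$\sum d_i^2 = \sum\left[(X_i - \bar{X}) - (Y_i - \bar{Y})\right]^2 = \sum (X_i - \bar{X})^2 + \sum (Y_i - \bar{Y})^2 - 2\sum (X_i - \bar{X})(Y_i - \bar{Y})$$
Dividing both sides by n, we get
$$\frac{1}{n}\sum d_i^2 = \frac{1}{n}\sum (X_i - \bar{X})^2 + \frac{1}{n}\sum (Y_i - \bar{Y})^2 - \frac{2}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})$$
$$= \sigma_X^2 + \sigma_Y^2 - 2\,\text{Cov}(X, Y) = 2\sigma_X^2 - 2\,\text{Cov}(X, Y) \qquad (\because \sigma_X^2 = \sigma_Y^2)$$
From this, we can write
$$1 - \frac{\frac{1}{n}\sum d_i^2}{2\sigma_X^2} = \frac{\text{Cov}(X, Y)}{\sigma_X^2} = \frac{\text{Cov}(X, Y)}{\sigma_X\sigma_Y} = \rho$$
or
$$\rho = 1 - \frac{\frac{1}{n}\sum d_i^2}{2\sigma_X^2} = 1 - \frac{1}{n}\sum d_i^2 \times \frac{12}{2(n^2 - 1)} = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$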
Student     1    2    3    4    5    6    7    8    9   10
Maths      45   50   60   65   75   40   62   72   66   56
English    48   58   55   60   76   35   52   49   66   65
Solution:
Maths    Maths    English   English   d = Maths Rank
Score    Rank     Score     Rank      - English Rank     D²
 45        9        48         9            0             0
 50        8        58         5            3             9
 60        6        55         6            0             0
 65        4        60         4            0             0
 75        1        76         1            0             0
 40       10        35        10            0             0
 62        5        52         7           -2             4
 72        2        49         8           -6            36
 66        3        66         2            1             1
 56        7        65         3            4            16
The sum of the squared differences in ranks (the sum of the entries in the D² column) is given by:
0 + 9 + 0 + 0 + 0 + 0 + 4 + 36 + 1 + 16 = 66
Using the Spearman rank-correlation coefficient, we obtain:
$$r_s = 1 - \frac{6 \times 66}{10(10 \times 10 - 1)} = 1 - \frac{396}{990} = 0.60$$
The Spearman rank-correlation coefficient ranges from -1 to +1. The estimate of 0.60 suggests a fairly strong positive relationship between rank performance in Maths and English.
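The ranking and the sum of squared rank differences in this example can be checked with a few lines of code. This is a sketch under the assumptions that rank 1 goes to the highest score and that there are no ties (both hold for these data); the function names are illustrative. Evaluating 1 - 6 × 66 / (10 × 99) directly gives 0.6.

```python
def ranks_desc(scores):
    """Rank 1 = highest score; valid only when all scores are distinct."""
    order = sorted(scores, reverse=True)
    return [order.index(s) + 1 for s in scores]

def spearman(xs, ys):
    """Return (sum of squared rank differences, rank correlation)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_desc(xs), ranks_desc(ys)))
    return d2, 1 - 6 * d2 / (n * (n * n - 1))

maths = [45, 50, 60, 65, 75, 40, 62, 72, 66, 56]
english = [48, 58, 55, 60, 76, 35, 52, 49, 66, 65]

sum_d2, rs = spearman(maths, english)
print(sum_d2, rs)
```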
Since Spearman's formula is based upon the assumption of different ranks to different individuals, its correction becomes necessary in case of tied ranks. It should be noted that the means of the ranks will remain unaffected. Further, the changes in the variances are usually small and are neglected. However, it is necessary to correct the term $\sum d_i^2$ and accordingly the correction factor $\frac{m(m^2 - 1)}{12}$, where m denotes the number of observations tied to a particular rank, is added to it for every tie. We note that there will be two correction factors, i.e., $\frac{2(4 - 1)}{12}$ and $\frac{3(9 - 1)}{12}$ in the above example.
4n n 12 n 1
4n n 1
2
n n 1
6
2
2
2
n n 1n 1 n n 1
n n 1 2n 1 n 1 2 n 1
3
3
3
6n n2 1
3
n n 1
2
Example: The following table gives the marks obtained by 10 students in commerce and statistics. Calculate the rank correlation.
Marks in Statistics   35   90   70   40   95   45   60   85   80   50
Marks in Commerce     45   70   65   30   90   40   50   75   85   60
Solution:
Calculation Table
Marks in      Marks in      Rank of Marks in    Rank of Marks in
Statistics    Commerce      Statistics (X)      Commerce (Y)       d_i = X_i - Y_i    d_i²
   35            45               1                   3                  -2             4
   90            70               9                   7                   2             4
   70            65               6                   6                   0             0
   40            30               2                   1                   1             1
   95            90              10                  10                   0             0
   45            40               3                   2                   1             1
   60            50               5                   4                   1             1
   85            75               8                   8                   0             0
   80            85               7                   9                  -2             4
   50            60               4                   5                  -1             1
Total                                                                                   16
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 16}{10 \times 99} = 0.903$$
$$r_C = \pm\sqrt{\pm\frac{2C - D}{D}}$$
where C denotes the number of concurrences and D (= number of observations - 1) is the number of pairs of deviations.
Notes
1. The sign of $r_C$ is taken to be equal to the sign of $\frac{2C - D}{D}$.
2. When $\frac{2C - D}{D}$ is negative, we make it positive for the purpose of taking its square root.
3. The sign of $r_C$ will be positive when $\frac{2C - D}{D}$ is positive.
4. This method gives same weights to smaller as well as to the larger deviations.
Caution: This method is suitable only for the study of short term fluctuations because it does not take into account the changes in magnitudes of the values.
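The counting of concurrences described above can be sketched as follows. The data are hypothetical, the function name is illustrative, and the treatment of a zero deviation (counted as a non-concurrence) is an assumption that the text does not address.

```python
from math import sqrt

def concurrent_deviation(xs, ys):
    """Return (C, D, r_C) using the signs of successive changes."""
    dx = [b - a for a, b in zip(xs, xs[1:])]
    dy = [b - a for a, b in zip(ys, ys[1:])]
    D = len(dx)                                   # number of pairs of deviations
    C = sum(1 for p, q in zip(dx, dy) if p * q > 0)   # same-sign deviations
    ratio = (2 * C - D) / D
    r_c = sqrt(abs(ratio))                        # make the ratio positive before the root
    return C, D, (r_c if ratio >= 0 else -r_c)    # r_C takes the sign of the ratio

# Hypothetical series: the two move together in 4 of the 5 changes.
xs = [10, 12, 11, 15, 14, 18]
ys = [5, 7, 6, 5, 4, 9]

C, D, r_c = concurrent_deviation(xs, ys)
print(C, D, round(r_c, 4))
```

Only the direction of each change enters the calculation, which is why the method ignores the magnitudes of the deviations.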
Self Assessment
Fill in the blanks:
1.
2.
The only merit of Karl Pearson's coefficient of correlation is that it is the most popular
method for expressing the ....................... and ....................... of linear association.
3.
4.
5.
6.
A ....................... rank correlation implies that a high (low) rank of an individual according
to one characteristic is accompanied by its high (low) rank according to the other.
7.
When two or more individuals have the same rank, each individual is assigned a rank
equal to the ....................... of the ranks that would have been assigned to them in the event
of there being slight differences in their values.
8.
9.
Coefficient of correlation r does not give any idea about the existence of .......................
relationship between the variables.
10.
11.
12.
13.
Thus
Rijk =
xi2 - xi xi jk
x (xi - xi jk ) xi
2
i
x x
x x
i ic
2
i
2
ic
x (x - x )
x (x - x )
i
i . jk
2
i
i . jk
xi2 - x xi jk
i
x ( x xi xi jk )
2
i
2
i
nSi2 - nSi2. jk
Si2 - Si2. jk
Si
.... (13)
Ri2 jk 1 -
xi2 jk
Si2 jk
Si2
Si jk
1 2
Si - Si2 jk 1 - 2
2
Si
Si
.... (14)
xi2
Further, we can write $R^2_{i.jk}$ in terms of the simple correlation coefficients.
Ri2 jk 1 -
Si2 1 - rjk2
)r
2
ij
S1223....m
x1223....m
1
S12
x12
Self Assessment
Fill in the blanks:
14.
15.
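$$r_{ij.k} = \frac{\sum x_{i.k}\,x_{j.k}}{\sqrt{\sum x_{i.k}^2\,\sum x_{j.k}^2}}$$
Now
$$\sum x_{i.k}\,x_{j.k} = nS_iS_j r_{ij} - r_{ik}\frac{S_i}{S_k}\,nS_jS_k r_{jk} = nS_iS_j\left(r_{ij} - r_{ik}r_{jk}\right)$$
$$\sum x_{i.k}^2 = nS_i^2 - r_{ik}\frac{S_i}{S_k}\,nS_iS_k r_{ik} = nS_i^2\left(1 - r_{ik}^2\right)$$
Similarly, $\sum x_{j.k}^2 = nS_j^2\left(1 - r_{jk}^2\right)$.
Thus, we have
$$r_{ij.k} = \frac{nS_iS_j\left(r_{ij} - r_{ik}r_{jk}\right)}{\sqrt{nS_i^2\left(1 - r_{ik}^2\right)\,nS_j^2\left(1 - r_{jk}^2\right)}} = \frac{r_{ij} - r_{ik}r_{jk}}{\sqrt{\left(1 - r_{ik}^2\right)\left(1 - r_{jk}^2\right)}}$$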
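The partial correlation between x_i and x_j after eliminating x_k is commonly written as (r_ij - r_ik r_jk) / sqrt((1 - r_ik²)(1 - r_jk²)). The sketch below evaluates that expression; the function name and the input values are hypothetical and chosen only so that the result is easy to verify by hand.

```python
from math import sqrt

def partial_r(r_ij, r_ik, r_jk):
    """Partial correlation of x_i and x_j, eliminating the effect of x_k."""
    return (r_ij - r_ik * r_jk) / sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Hypothetical simple correlations, all equal to 0.5:
# (0.5 - 0.25) / sqrt(0.75 * 0.75) = 0.25 / 0.75 = 1/3
r12_3 = partial_r(0.5, 0.5, 0.5)
print(round(r12_3, 4))
```

When x_k is uncorrelated with both of the other variables (r_ik = r_jk = 0), the expression reduces to the simple correlation r_ij.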
Self Assessment
Fill in the blanks:
16.
In case of three variables $x_i$, $x_j$ and $x_k$, the partial correlation between $x_i$ and $x_j$ is defined as
the simple correlation between them after eliminating the effect of .......................
17.
bivariate data, there will always be two lines of regression. It will be shown later that these two
lines are different, i.e., one cannot be derived from the other by mere transfer of terms, because
the derivation of each line is dependent on a different set of assumptions.
Line of Regression of Y on X
The general form of the line of regression of Y on X is YCi = a + bXi, where YCi denotes the average
or predicted or calculated value of Y for a given value of X = Xi. This line has two constants, a and
b. The constant a is defined as the average value of Y when X = 0. Geometrically, it is the intercept
of the line on Y-axis. Further, the constant b, gives the average rate of change of Y per unit change
in X, is known as the regression coefficient.
The above line is known if the values of a and b are known. These values are estimated from the
observed data (Xi, Yi), i = 1, 2, ...... n.
Note: It is important to distinguish between $Y_{Ci}$ and $Y_i$. Whereas $Y_i$ is the observed value, $Y_{Ci}$ is a value calculated from the regression equation.
Using the regression YCi = a + bXi, we can obtain YC1, YC2, ...... YCn corresponding to the X values
X1, X2, ...... Xn respectively. The difference between the observed and calculated value for a
particular value of X say Xi is called error in estimation of the i th observation on the assumption
of a particular line of regression. There will be similar type of errors for all the n observations.
We denote by $e_i = Y_i - Y_{Ci}$ (i = 1, 2, ..... n), the error in estimation of the i-th observation. As is obvious from Figure 9.4, $e_i$ will be positive if the observed point lies above the line and will be negative if the observed point lies below the line. Therefore, in order to obtain a figure of total error, the $e_i$'s are squared and added. Let S denote the sum of squares of these errors,
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - Y_{Ci})^2$$
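Figure 9.4 (the line $Y_{Ci} = a + bX_i$ with intercept a on the Y-axis, showing the observed value $Y_i$ and the calculated value $Y_{Ci}$ at $X = X_i$)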
The regression line can, alternatively, be written as a deviation of $Y_i$ from $Y_{Ci}$, i.e., $Y_i - Y_{Ci} = e_i$ or $Y_i = Y_{Ci} + e_i$ or $Y_i = a + bX_i + e_i$. The component $a + bX_i$ is known as the deterministic component and $e_i$ is the random component.
The value of S will be different for different lines of regression. A different line of regression
means a different pair of constants a and b. Thus, S is a function of a and b. We want to find such
values of a and b so that S is minimum. This method of finding the values of a and b is known as
the Method of Least Squares.
Rewrite the above equation as $S = \sum (Y_i - a - bX_i)^2$ ($\because Y_{Ci} = a + bX_i$).
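The necessary conditions for minima of S are (i) $\frac{\partial S}{\partial a} = 0$ and (ii) $\frac{\partial S}{\partial b} = 0$, where $\frac{\partial S}{\partial a}$ and $\frac{\partial S}{\partial b}$ are the partial derivatives of S w.r.t. a and b respectively.
Now
$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n} (Y_i - a - bX_i) = 0$$
or
$$\sum_{i=1}^{n} (Y_i - a - bX_i) = 0 \quad \text{or} \quad \sum_{i=1}^{n} Y_i - na - b\sum_{i=1}^{n} X_i = 0$$
or
$$\sum_{i=1}^{n} Y_i = na + b\sum_{i=1}^{n} X_i$$   .... (1)
Also,
$$\frac{\partial S}{\partial b} = 2\sum_{i=1}^{n} (Y_i - a - bX_i)(-X_i) = 0$$
or
$$\sum_{i=1}^{n} (X_iY_i - aX_i - bX_i^2) = 0$$
or
$$\sum_{i=1}^{n} X_iY_i - a\sum_{i=1}^{n} X_i - b\sum_{i=1}^{n} X_i^2 = 0$$
or
$$\sum_{i=1}^{n} X_iY_i = a\sum_{i=1}^{n} X_i + b\sum_{i=1}^{n} X_i^2$$   .... (2)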
Equations (1) and (2) are a system of two simultaneous equations in two unknowns a and b,
which can be solved for the values of these unknowns. These equations are also known as
normal equations for the estimation of a and b. Substituting these values of a and b in the
regression equation YCi = a + bXi, we get the estimated line of regression of Y on X.
Expressions for the Estimation of a and b.
Dividing both sides of the equation (1) by n, we have
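$$\frac{\sum Y_i}{n} = \frac{na}{n} + \frac{b\sum X_i}{n} \quad \text{or} \quad \bar{Y} = a + b\bar{X}$$   .... (3)
This shows that the line of regression $Y_{Ci} = a + bX_i$ passes through the point $(\bar{X}, \bar{Y})$.
From equation (3), we have $a = \bar{Y} - b\bar{X}$   .... (4)
Substituting this value of a in equation (2), we have
$$\sum X_iY_i = (\bar{Y} - b\bar{X})\sum X_i + b\sum X_i^2 = n\bar{X}\bar{Y} - nb\bar{X}^2 + b\sum X_i^2$$
or
$$\sum X_iY_i - n\bar{X}\bar{Y} = b\left(\sum X_i^2 - n\bar{X}^2\right)$$
or
$$b = \frac{\sum X_iY_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$$   .... (5)
Also, $\sum X_iY_i - n\bar{X}\bar{Y} = \sum (X_i - \bar{X})(Y_i - \bar{Y})$ and $\sum X_i^2 - n\bar{X}^2 = \sum (X_i - \bar{X})^2$
$$\therefore\; b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$   .... (6)
or
$$b = \frac{\sum x_iy_i}{\sum x_i^2}$$   .... (7)
where $x_i$ and $y_i$ are deviations of the values from their arithmetic means.
Dividing the numerator and the denominator of equation (6) by n, we have
$$b = \frac{\frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n}\sum (X_i - \bar{X})^2} = \frac{\text{Cov}(X, Y)}{\sigma_X^2}$$   .... (8)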
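The expression for b that is convenient for use in computational work can be written from equation (5), as given below:
$$b = \frac{\sum X_iY_i - n\left(\frac{\sum X_i}{n}\right)\left(\frac{\sum Y_i}{n}\right)}{\sum X_i^2 - n\left(\frac{\sum X_i}{n}\right)^2} = \frac{\sum X_iY_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}$$
or
$$b = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2}$$   .... (9)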
To write the shortcut formula for b, we shall show that it is independent of change of origin but
not of change of scale.
As in case of coefficient of correlation we define
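$$u_i = \frac{X_i - A}{h} \quad \text{and} \quad v_i = \frac{Y_i - B}{k}$$
or $X_i = A + hu_i$ and $Y_i = B + kv_i$
$\therefore\; \bar{X} = A + h\bar{u}$ and $\bar{Y} = B + k\bar{v}$
also $(X_i - \bar{X}) = h(u_i - \bar{u})$ and $(Y_i - \bar{Y}) = k(v_i - \bar{v})$
Substituting these values in equation (6), we have
$$b = \frac{hk\sum (u_i - \bar{u})(v_i - \bar{v})}{h^2\sum (u_i - \bar{u})^2} = \frac{k\sum (u_i - \bar{u})(v_i - \bar{v})}{h\sum (u_i - \bar{u})^2} = \frac{k}{h}\cdot\frac{n\sum u_iv_i - (\sum u_i)(\sum v_i)}{n\sum u_i^2 - (\sum u_i)^2}$$   .... (10)
Also,
$$b = \frac{\text{Cov}(X, Y)}{\sigma_X^2} = \frac{r\,\sigma_X\sigma_Y}{\sigma_X^2} = r\,\frac{\sigma_Y}{\sigma_X}$$
Substituting the value of a from equation (4) in the line of regression of Y on X, we have $Y_{Ci} = \bar{Y} - b\bar{X} + bX_i$
or
$$(Y_{Ci} - \bar{Y}) = b(X_i - \bar{X})$$   .... (11)
or
$$(Y_{Ci} - \bar{Y}) = r\,\frac{\sigma_Y}{\sigma_X}(X_i - \bar{X})$$   .... (12)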
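The statement that b is independent of change of origin but not of change of scale can be checked numerically. The sketch below compares the slope computed from raw values with (k/h) times the slope computed from the coded values u and v; the data and the constants A, h, B, k are hypothetical, and the helper name `slope` is illustrative.

```python
def slope(xs, ys):
    """Regression coefficient of ys on xs: (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)."""
    n = len(xs)
    num = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    den = n * sum(x * x for x in xs) - sum(xs) ** 2
    return num / den

# Hypothetical data and constants, chosen only for illustration.
X = [10, 20, 30, 40, 50]
Y = [12, 15, 21, 26, 31]
A, h = 30, 10      # origin and scale used for X
B, k = 21, 1       # origin and scale used for Y

u = [(x - A) / h for x in X]
v = [(y - B) / k for y in Y]

b_direct = slope(X, Y)               # slope from the raw values
b_shortcut = (k / h) * slope(u, v)   # slope from coded values, rescaled by k/h
print(b_direct, b_shortcut)
```

Shifting the origins A and B leaves the result unchanged, while the scale constants enter only through the factor k/h.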
Line of Regression of X on Y
The general form of the line of regression of X on Y is XCi = c + dYi, where XCi denotes the
predicted or calculated or estimated value of X for a given value of Y = Yi and c and d are
constants. d is known as the regression coefficient of regression of X on Y.
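In this case, we have to calculate the values of c and d so that $S = \sum (X_i - X_{Ci})^2$ is minimised.
As in the previous section, the normal equations for the estimation of c and d are
$$\sum X_i = nc + d\sum Y_i$$   .... (13)
and
$$\sum X_iY_i = c\sum Y_i + d\sum Y_i^2$$   .... (14)
Figure 9.5 (the line of regression of X on Y, $X_{Ci} = c + dY_i$, showing the observed value $X_i$ and the calculated value $X_{Ci}$ at $Y = Y_i$)
Dividing both sides of equation (13) by n, we have $\bar{X} = c + d\bar{Y}$.
This shows that the line of regression also passes through the point $(\bar{X}, \bar{Y})$. Since both the lines of regression pass through the point $(\bar{X}, \bar{Y})$, this is their point of intersection, as shown in Figure 9.6.
We can write $c = \bar{X} - d\bar{Y}$   .... (15)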
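$$d = \frac{\sum X_iY_i - n\bar{X}\bar{Y}}{\sum Y_i^2 - n\bar{Y}^2}$$   .... (16)
Figure 9.6 (the two lines of regression, $Y_{Ci} = a + bX_i$ and $X_{Ci} = c + dY_i$, drawn on the same axes)
or
$$d = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (Y_i - \bar{Y})^2}$$   .... (17)
or
$$d = \frac{\sum x_iy_i}{\sum y_i^2}$$   .... (18)
$$d = \frac{\frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n}\sum (Y_i - \bar{Y})^2} = \frac{\text{Cov}(X, Y)}{\sigma_Y^2}$$   .... (19)
Also
$$d = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum Y_i^2 - (\sum Y_i)^2}$$   .... (20)
This expression is useful for calculating the value of d. Another shortcut formula for the calculation of d is given by
$$d = \frac{h}{k}\cdot\frac{n\sum u_iv_i - (\sum u_i)(\sum v_i)}{n\sum v_i^2 - (\sum v_i)^2}$$   .... (21)
where $u_i = \frac{X_i - A}{h}$ and $v_i = \frac{Y_i - B}{k}$.
Further,
$$d = \frac{\text{Cov}(X, Y)}{\sigma_Y^2} = \frac{r\,\sigma_X\sigma_Y}{\sigma_Y^2} = r\,\frac{\sigma_X}{\sigma_Y}$$   .... (22)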
Substituting the value of c from equation (15) into the line of regression of X on Y, we have
$X_{Ci} = \bar{X} - d\bar{Y} + dY_i$ or
$$(X_{Ci} - \bar{X}) = d(Y_i - \bar{Y})$$   .... (23)
or
$$(X_{Ci} - \bar{X}) = r\,\frac{\sigma_X}{\sigma_Y}(Y_i - \bar{Y})$$   .... (24)
Remarks: It should be noted here that the two lines of regression are different because they have been obtained in two entirely different ways. In case of regression of Y on X, it is assumed that the values of X are given and the values of Y are estimated by minimising $\sum (Y_i - Y_{Ci})^2$, while in case of regression of X on Y, the values of Y are assumed to be given and the values of X are estimated by minimising $\sum (X_i - X_{Ci})^2$. Since these two lines have been estimated on the basis of different assumptions, they are not reversible, i.e., it is not possible to obtain one line from the other by mere transfer of terms. There is, however, one situation when these two lines will coincide. From the study of correlation we may recall that when $r = \pm 1$, there is perfect correlation between the variables and all the points lie on a straight line. Therefore, both the lines of regression coincide and hence they are also reversible in this case. By substituting $r = \pm 1$ in equation (12) or (24) it can be shown that the lines of regression in both the cases become
$$\frac{Y_i - \bar{Y}}{\sigma_Y} = \pm\frac{X_i - \bar{X}}{\sigma_X}$$
Further, when r = 0, equation (12) becomes $Y_{Ci} = \bar{Y}$ and equation (24) becomes $X_{Ci} = \bar{X}$. These are the equations of lines parallel to the X-axis and the Y-axis respectively. These lines also intersect at the point $(\bar{X}, \bar{Y})$ and are mutually perpendicular at this point, as shown in Figure.
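Since $b = r\,\frac{\sigma_Y}{\sigma_X}$ and $d = r\,\frac{\sigma_X}{\sigma_Y}$, we have
$$bd = r\,\frac{\sigma_Y}{\sigma_X}\cdot r\,\frac{\sigma_X}{\sigma_Y} = r^2 \quad \text{or} \quad r = \sqrt{b \cdot d}$$
This shows that the correlation coefficient is the geometric mean of the two regression coefficients.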
Remarks: The following points should be kept in mind about the coefficient of correlation and the regression coefficients:
1. Since $r = \frac{\text{Cov}(X, Y)}{\sigma_X\sigma_Y}$, $b = \frac{\text{Cov}(X, Y)}{\sigma_X^2}$ and $d = \frac{\text{Cov}(X, Y)}{\sigma_Y^2}$, the signs of r, b and d will always be the same and this will depend upon the sign of Cov(X, Y).
2. Since $bd = r^2$ and $0 \leq r^2 \leq 1$, either both b and d are less than unity or, if one of them is greater than unity, the other must be less than unity such that $0 \leq b \cdot d \leq 1$ is always true.
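Example: Obtain the two regression equations and find correlation coefficient between X and Y from the following data:
X    10     9     7     8    11
Y     6     3     2     4     5
Solution:
Calculation Table
X        Y       XY      X²      Y²
10       6       60     100      36
 9       3       27      81       9
 7       2       14      49       4
 8       4       32      64      16
11       5       55     121      25
Total   45  20  188     415      90
(a) Regression of Y on X
$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{5 \times 188 - 45 \times 20}{5 \times 415 - 45^2} = \frac{40}{50} = 0.8$$
Also, $\bar{X} = \frac{45}{5} = 9$ and $\bar{Y} = \frac{20}{5} = 4$
(b) Regression of X on Y
$$d = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2} = \frac{5 \times 188 - 45 \times 20}{5 \times 90 - 20^2} = \frac{40}{50} = 0.8$$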
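The remaining steps of this example can be sketched in code. The X and Y values below are inferred from the XY, X² and Y² columns of the calculation table (an assumption about the original data, consistent with the totals 45, 20, 188, 415 and 90); the intercepts use a = Ȳ - bX̄ and c = X̄ - dȲ, and r is taken as the geometric mean of b and d with their common sign.

```python
from math import sqrt

X = [10, 9, 7, 8, 11]   # inferred from the calculation table
Y = [6, 3, 2, 4, 5]
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sxx = sum(x * x for x in X)
syy = sum(y * y for y in Y)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # regression coefficient of Y on X
d = (n * sxy - sx * sy) / (n * syy - sy ** 2)   # regression coefficient of X on Y
a = sy / n - b * sx / n                         # intercept of Y on X
c = sx / n - d * sy / n                         # intercept of X on Y

r = sqrt(b * d) if b >= 0 else -sqrt(b * d)     # r carries the sign of b and d

print(f"Y on X: Y = {a:.1f} + {b:.1f} X")
print(f"X on Y: X = {c:.1f} + {d:.1f} Y")
print(f"r = {r:.1f}")
```

Both lines pass through the point of means (9, 4), as the derivation requires.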
Kernel Estimation
The kernel regression is a non-parametric technique in statistics to estimate the conditional
expectation of a random variable. The objective is to find a non-linear relation between a pair of
random variables X and Y.
In any non-parametric regression, the conditional expectation of a variable Y relative to a
variable X may be written:
E(Y|X) = m(X)
where m is an unknown function.
Figure above shows a local polynomial regression. A local polynomial regression is similar
to kernel estimation, but the fitted values are produced by locally weighted regression rather
than by locally weighted averaging. Most commonly, the order of the local polynomial is taken
as k = 1, that is, a local linear fit. Local polynomial regression tends to be less biased than kernel
regression, for example at the boundaries of data. More generally, the bias of the local-polynomial
estimator declines and the variance increases with the order of the polynomial, but an odd-ordered local polynomial estimator has the same asymptotic variance as the preceding even-ordered estimator. Thus, the local-linear estimator (of order 1) is preferred to the kernel estimator
(of order 0), and the local-cubic (order 3) estimator to the local-quadratic (order 2).
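Kernel estimation by locally weighted averaging can be sketched in a few lines. The version below is the estimator usually called Nadaraya-Watson; the Gaussian kernel, the bandwidth value and the data are assumptions made for illustration and do not come from the text.

```python
from math import exp

def gaussian_kernel(t):
    """Unnormalised Gaussian weight; the constant cancels in the ratio below."""
    return exp(-0.5 * t * t)

def kernel_regression(x0, xs, ys, bandwidth):
    """Estimate m(x0) = E(Y | X = x0) as a locally weighted average of ys."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical noiseless data from a non-linear relation y = x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]

m_hat = kernel_regression(2.0, xs, ys, bandwidth=1.0)
print(round(m_hat, 3))
```

A smaller bandwidth weights only the nearest observations and follows the data closely; a larger one averages over more points and gives a smoother, more biased estimate.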
Smoothing Splines
The smoothing spline is a method of smoothing (fitting a smooth curve to a set of noisy
observations) using a spline function.
Let $(x_i, Y_i)$; i = 1, ... , n be a sequence of observations, modeled by the relation $E(Y_i) = \mu(x_i)$. The smoothing spline estimate $\hat{\mu}$ of the function $\mu$ is defined to be the minimizer (over the class of twice differentiable functions) of
$$\sum_{i=1}^{n} (Y_i - \hat{\mu}(x_i))^2 + \lambda\int \hat{\mu}''(x)^2\,dx$$
Remarks:
1. $\lambda \geq 0$ is a smoothing parameter, controlling the trade-off between fidelity to the data and roughness of the function estimate.
2.
3.
4. As $\lambda \to \infty$ (infinite smoothing), the roughness penalty becomes paramount and the estimate converges to a linear least-squares estimate.
5. The roughness penalty based on the second derivative is the most common in modern statistics literature, although the method can easily be adapted to penalties based on other derivatives.
6. In early literature, with equally-spaced $x_i$, second or third-order differences were used in the penalty, rather than derivatives.
7.
x        y
10.5     16.4
10.7     18.8
10.8     19.7
...      ...
20.6     77.0
Here there is only one independent variable, so the x matrix is just a single column. Given these
measurements, we would like to build a model which predicts the expected y for a given x.
Figure 9.8
A Linear Model
A linear model for the above data is
$$\hat{y} = -37 + 5.1x$$
The hat on the $\hat{y}$ indicates that $\hat{y}$ is estimated from the data. The figure on the right shows a plot of this function: a line giving the predicted $\hat{y}$ versus x, with the original values of y shown as red dots.
The data at the extremes of x indicates that the relationship between y and x may be non-linear
(look at the red dots relative to the regression line at low and high values of x). We thus turn to
MARS to automatically build a model taking into account non-linearities. MARS software
constructs a model from the given x and y as follows:
y = 25
Self Assessment
Fill in the blanks:
18.
The regression equations are useful for predicting the value of ....................... variable for
given value of the ....................... variable.
19.
20.
9.5 Summary
Researchers sometimes put all the data together, as if they were one sample.
We can use the technique of correlation to test the statistical significance of the association.
In other cases we use regression analysis to describe the relationship precisely by means
of an equation that has predictive value.
The correlation measures the direction and strength of the linear relationship.
The least-squares regression line is the line that makes the sum of the squares of the
vertical distances of the data points from the line as small as possible.
9.6 Keywords
Correlation: It is an analysis of covariation between two or more variables.
Correlation Coefficient: It is a numerical measure of the degree of association between two or
more variables.
Kernel Estimation: The kernel regression is a non-parametric technique in statistics to estimate
the conditional expectation of a random variable.
Regression Equation: If the coefficient of correlation calculated for bivariate data (Xi, Yi), i = 1, 2,
...... n, is reasonably high and a cause and effect type of relation is also believed to be existing
between them, the next logical step is to obtain a functional relation between these variables.
This functional relation is known as regression equation in statistics.
Smoothing Splines: It is a method of fitting a smooth curve to a set of noisy observations using
a spline function.
1. Obtain the two lines of regression from the following data and estimate the blood pressure when age is 50 years. Can we also estimate the blood pressure of a person aged 20 years on the basis of this regression equation? Discuss.
Age (X) (in years)     56    42    72    39    63    47    52    49    40    42    68    60
Blood Pressure (Y)    127   112   140   118   129   116   130   125   115   120   135   133
2.
Show that the coefficient of correlation, r, is independent of change of origin and scale.
3.
4.
If two variables are independent the correlation between them is zero, but the converse
is not always true. Explain the meaning of this statement.
5.
What is Spearman's rank correlation? What are the advantages of the coefficient of rank
correlation over Karl Pearson's coefficient of correlation?
6.
Distinguish between correlation and regression. Discuss least square method of fitting
regression.
7.
What do you understand by linear regression? Why are there two lines of regression?
Under what condition(s) can there be only one line?
8.
What do you think as the reason behind the two lines of regression being different?
9.
10.
What can you conclude on the basis of the fact that the correlation between body weight
and annual income was high and positive?
11.
12.
13.
14.
15.
(i)
(ii)
(iii)
(iv)
(v)
52
60
58
39
41
53
47
34
40
46
43
54
49
55
48
57
Find the multiple and partial correlation coefficients for the following data.
X
19
21
24
26
27
27
29
31
30
31
24
28
29
39
30
31
34
35
36
37
21
2-
26
30
27
32
31
36
33
38
Find the multiple correlation coefficient R1.23, the partial correlation coefficient r 23.1 and
the multiple regression equation of X2 on X3 X1.
X1
55
59
63
68
56
73
82
76
64
74
Y1
58
60
53
52
61
70
76
77
63
80
Z1
63
55
51
56
59
74
74
81
61
84
Find the three multiple correlation coefficients for the following data.
X1
10
15
12
X2
13
10
X3
10
14
12
16
19
27
29
32
33
39
40
43
X2
11
13
16
19
24
31
33
37
X3
11
15
21
25
26
34
40
2.
degree, direction
3.
probable error
4.
extreme
5.
Rank
6.
positive
7.
mean
8.
0.3, small
9.
10.
uncorrelated
11.
-1 and +1
12.
origin, scale
13.
avoid
14.
multiple
15.
simple
16.
xk
17.
Partial
18.
dependent, independent
19.
20.
kernel
Books
Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: Johns Hopkins
University Press, 1943.
R.S. Bhardwaj, Business Statistics, Excel Books, New Delhi, 2008.
S.N. Murthy and U. Bhojanna, Business Research Methods, Excel Books, 2007.
CONTENTS
Objectives
Introduction
10.1 Time Series Analysis
10.2 Components of a Time Series
10.2.1
Secular Trend
10.2.2
Periodic Variations
10.2.3
10.3.2
Exponential Smoothing
10.6.2
10.6.3
10.6.4
10.7 Summary
10.8 Keywords
10.9 Review Questions
10.10 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
The future has always held a great fascination for mankind. Perhaps this is biologically
determined. Man and the higher apes seem to have brains that are equipped to engage in actions
for which a future reward is anticipated. In extreme situations, the reward is anticipated not in this
life but in the next life.
There are two methodologies to anticipate the future. They are called qualitative and quantitative.
But both start with the same premise, that an understanding of the future is predicated on an
understanding of the past and present environment. In this unit, we will mainly deal with
quantitative methods. We will also distinguish between forecast and prediction. We use the
word forecast when some logical method is used.
The quantitative decision maker always considers himself or herself accountable for a forecast
within reason. Let us look at the conceptual model first and then the mathematical model and
algorithms in turn which are used for making forecast.
Figure (Sales plotted against Time through the stages Introduction, Growth, Maturity and Decline)
Another point of interest is the behaviour of the sales variable over the short run. It fluctuates
between a succession of peaks and troughs. How do these come about? In order to answer this
question, the time series must be decomposed. Then four independent motors for this behaviour
become visible. First, there is a long-term or secular trend (T) which is primarily noticeable
within each stage of the cycle and over the entire cycle. Secondly, cyclical variations (C), which
are caused by an economy's business cycles, affect product sales. Such cycles, whose origins are
little understood, exist for all economies. Thirdly, the product's sales may be influenced by the
seasonality (S) of the item, and finally there may be irregular (I) effects such as inclement
weather, strikes and so forth. In equation form the decomposed time series appears as TS = T +
C + S + I.
This creates a complex situation in time series analysis. Each factor must be quantified and its
effect ascertained upon product sales. Let us see how this is done. The long-term trend effect T is
reflected in the slope b of the regression equation. We already know how b is calculated even
though minor modifications of the decision formulas will be encountered soon. The quantification
of the cyclical component C is beyond the scope of this book. However, since business cycles
always proceed from peak to trough to new peak and so on, their positive and negative effects
upon a product's sales cancel out in the long run. Hence in managerial, as opposed to economic,
decision making, the sum effect of the business cycles may be set equal to zero. This eliminates
the C factor from the equation. Seasonality, if present, is something that must be taken into
consideration because it is a product-inherent variable and therefore it is under the immediate
control of the decision maker. We will quantify the S component and keep it in the equation.
Finally, there are the irregular variations. Do we know in July whether the weather will be
sunny and mild during the four weeks before Diwali? We don't, but we know that if this
happens, Diwali sales will be severely impacted. Can we forecast such horrible weather
conditions? Not really. We cannot forecast them because they cannot be quantified, a rather
unpleasant characteristic they share with all other types of irregular variations like strikes,
earthquakes, power failures, etc. Yet, something strange usually happens after such an irregular
variation from normal has occurred. Whatever people did not do because of it, like not
buying a product, they attempt to catch up with quickly. Therefore the effect of the I factor may also be
assumed to cancel out over time and it may be dropped from the equation which then appears
to the manager as TS = T + S.
Linear Analysis
We will construct again the best fitting regression line by the method of least squares. It involves
the dividend payments per share of the Smart, a well-known discount store chain, for the years
1990 through 1999. Suppose that a potential investor would like to know the dividend payment
for 2001. The data are recorded in the work sheet (Table 10.1) that appears below. First, however,
turn your attention to Figure 10.2 which shows the plot for this problem.
Figure 10.2: Plot of Dividend Values
(Vertical axis: dividend values from 1 to 10; horizontal axis: years from 1990 to 2000.)
Think for a moment about the qualitative nature of the time variable. It is expressed in years in
this case but could be quarters, months, days, hours, minutes or any other time measurement
unit. How does it differ from advertising expenditures, the independent variable that we examined
in the preceding section? Is there a difference in the effect that a unit of each has on the dependent
variable, or, 1 million in one case and 1 year in the other? Time, as you can readily see, is
constant. One year has the same effect as any other. This is not true for advertising expenditures,
especially when you leave the linear environment and enter the nonlinear environments. Then
there may be a qualitative difference in the sales impact as advertising expenditures are increased
or decreased by a unit.
The worksheet is in Table 10.1 and calculations are as follows:
Table 10.1: Worksheet
         Code for an               Code for an      Dividend
YEAR     Even Series X    YEAR     Odd Series X     payments in Rs Y      XY        X²
1990         -9
1991         -7           1991         -4                 2.2            -8.8       16
1992         -5           1992         -3                 2.4            -7.2        9
1993         -3           1993         -2                 3.0            -6.0        4
1994         -1           1994         -1                 5.0            -5.0        1
1995          1           1995          0                 6.8             0          0
1996          3           1996          1                 8.1             8.1        1
1997          5           1997          2                 9.0            18.0        4
1998          7           1998          3                 9.5            28.5        9
1999          9           1999          4                 9.9            39.6       16
Total                                                    55.9            67.2       60
Since time is constant in its effect, we may code the variable rather than use the actual years or
other time units as x values. The simplest code assigns a 1 to the first time period in the series and continues
in unit distances to the nth period. Do not start with a zero, as this may cause some computer
programs to reject the input. The code used here is based on the fact that the unit periods are constant, and
therefore their sum may be set equal to zero. See what effect this has on the normal equations for
the straight line.
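$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$
If $\sum x = 0$, the equations reduce to
$$\sum y = na$$
$$\sum xy = b\sum x^2$$
which allow the direct solution for a and b as follows:
$$a = \frac{\sum y}{n}, \qquad b = \frac{\sum xy}{\sum x^2}$$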
This form simplifies the calculations substantially compared to the previous formulas. The
code that allows us to set $\sum x = 0$ must, however, preserve the integrity of a unit distance series.
Thus if the series is odd-numbered, the midpoint is set equal to zero and the code is completed by
negative and positive unit distances of x = 1, where each x unit stands for one year or other time
period. If the series is even-numbered, let us say it ran from 1990 to 1999, the two midpoints
(1994/1995) are set equal to -1 and +1, respectively. Since there is now a distance of x = 2 between
-1 and +1 (-1, 0, +1), the code continues by negative and positive unit distances of x = 2, where each x unit
stands for one-half year or other time period.
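Then, for the odd-numbered series (1991 to 1999) of Table 10.1,
$$a = \frac{\sum y}{n} = \frac{55.9}{9} = 6.211, \qquad b = \frac{\sum xy}{\sum x^2} = \frac{67.2}{60} = 1.12$$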
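The coded-time calculation can be checked with a few lines of Python. This is a minimal sketch, assuming the odd-numbered series (1991 to 1999) from Table 10.1 with 1995 coded as x = 0; the variable names and the `forecast` helper are illustrative and not part of the original worksheet.

```python
# Least-squares trend line with a coded time variable (sum of x = 0).
years = list(range(1991, 2000))                      # odd-numbered series
dividends = [2.2, 2.4, 3.0, 5.0, 6.8, 8.1, 9.0, 9.5, 9.9]

n = len(years)
mid = years[n // 2]                                  # 1995 becomes x = 0
x = [year - mid for year in years]                   # -4, -3, ..., +4

sum_y = sum(dividends)
sum_xy = sum(xi * yi for xi, yi in zip(x, dividends))
sum_x2 = sum(xi * xi for xi in x)

a = sum_y / n          # intercept: trend value at the origin (1995)
b = sum_xy / sum_x2    # slope: change in dividend per year


def forecast(year):
    """Trend value for a calendar year (hypothetical helper)."""
    return a + b * (year - mid)


print(f"a = {a:.3f}, b = {b:.2f}, forecast for 2001 = {forecast(2001):.2f}")
```

Because the code sums to zero, the two normal equations decouple and no simultaneous solution is needed. The year 2001 lies six units from the 1995 origin, so the forecast is simply a + 6b.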
Let us quantify this seasonality and illustrate how it may be used in a decision situation. There
are, as is often the case, a number of decision tools that may be applied. The reader may be
familiar with the term ratio-to-moving-average. It is a widely used method for constructing a
seasonal index and programs are available in most larger computer libraries. Usually the method
assumes a 12-period season like the twelve months of the year. There is a more efficient method
which yields good statistical results. It is especially helpful in manual calculations of the seasonal
index and when the number of seasonal periods is small like the four quarters of a year, the six
hours of a stock exchange trading day or the five days of a work week. This method is known as
simple average and will be used for illustration purposes.
To stay with the investment environment of this unit section, let us calculate a seasonal index for
shares traded on the Stock Exchange from July 2 through July 7, 1999. This period includes the
July 4 week-end. Volume of shares (DATA) for each trading day (SEASON) is given in thousands
of shares per hour. The individual steps of the analysis (OPERATIONS) are discussed in detail for
each column of the worksheet below:
Table 10.2: Worksheet
(1)       (2)          (3)          (4)          (5)            (6)      (7)      (8)      (9)      (10)
Hour      Total        Trend        Seasonal     Seasonal       7/2      7/3      7/6      7/7      Avg. for
          Variation    Variation    Variation    Index                                              four days
          (TS)         (T)          (TS - T)
10-11      0.965       0.000         0.965       110.6          12.00    12.25    15.44    16.72    14.10
11-12      0.245       0.159         0.086       103.7          10.40    11.75    15.04    16.32    13.38
12-13     -0.885       0.318        -1.117        94.2          10.55    10.06    12.95    15.44    12.25
13-14     -1.555       0.477        -2.032        87.1           9.55     9.46    12.05    15.24    11.58
14-15      0.395       0.636        -0.241       101.1          11.02    11.55    14.82    16.73    13.53
15-16      0.835       0.795         0.040       103.3          11.58    12.25    15.38    16.69    13.97
Average                             -0.383       600 (total)    10.85    11.22    14.28    16.19    13.135
As you inspect the data columns, you notice the V-shaped season for each trading day. You also
notice in the daily volume that there is an increase in shares traded. Hence, you can expect a
positive slope of the regression line. The hourly mean number of shares is indicated also. This
is the more important value because we are interested in quantifying a season by the hour for
each trading day. Now turn to the operations. In the last column the hourly trading activity for the
four days has been averaged. In this average all time series factors are assumed to be incorporated.
You will recall that the positive or negative cyclical and irregular component effect is assumed
to cancel out over time. Hence averaging the trading volume over a long-term data set eliminates
both components, yielding TS = T + S. You may ask, are four days a sufficiently long time span?
The answer is no. In a real study you would probably use 15 to 25 yearly averages for each
trading hour. In an on-the-job application of this tool, you will have to know the specific time
horizon in order to effectively eliminate cyclical and irregular variations. By and large,
what counts as a long or short time span depends upon the situation.
In order to isolate the trend component (T) so that it may be subtracted from column (2) in
Table 10.2, yielding seasonal variation, the slope (b) of the regression line must be calculated.
(Remember: b is T.) The necessary calculations are performed below using the mean hourly
trading volume for each day. But since we are interested in an index by the hour, the calculated
daily b value must be apportioned to each hour. This is accomplished by a further division by
six, the number of trading hours. The result is entered in column (3). Note that the trend value at the
origin of a time series is always zero, and the origin is always the first period of the season.
In our case this is the 10-11 trading hour. Therefore the first entry in column (3) is always zero, to
be followed by the equal (since this is a linear analysis) summed increments of the apportioned
b-value.
Day      Code (x)    Mean hourly volume (y)    xy        x²
7/1      -3          10.85                     -32.55     9
7/5      -1          11.22                     -11.22     1
7/6      +1          14.28                      14.28     1
7/7      +3          16.19                      48.57     9
Total                52.54                      19.08    20
$$b = \frac{\sum xy}{\sum x^2} = \frac{19.08}{20} = 0.954$$
It is not necessary to calculate the y-intercept (a) in this analysis unless, of course, you wish to
combine it with a long-term forecast of daily trading volume. Then, just to review the calculations,
you would find:
$$a = \frac{\sum y}{n} = \frac{52.54}{4} = 13.135$$
and
$$y_c = 13.135 + 0.954x$$
origin 7/5 and 7/6
x in half trading day units.
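The slope, the intercept and the hourly apportionment of the trend can be reproduced in a short script. The following is a sketch under the assumption that the four daily means and the even-series code (-3, -1, +1, +3) are taken exactly as printed in the worksheet; the variable names are illustrative.

```python
# Daily mean hourly volumes (thousands of shares) and the even-series code.
daily_mean = [10.85, 11.22, 14.28, 16.19]
code = [-3, -1, 1, 3]            # half-day units, so that the codes sum to zero

sum_xy = sum(x * y for x, y in zip(code, daily_mean))
sum_x2 = sum(x * x for x in code)

b = sum_xy / sum_x2                          # trend per code unit
a = sum(daily_mean) / len(daily_mean)        # level at the origin

# Apportion b to the six trading hours; the first hour is the origin (zero).
hours_per_day = 6
b_hourly = b / hours_per_day
trend_column = [round(i * b_hourly, 3) for i in range(hours_per_day)]

print(f"b = {b:.3f}, a = {a:.3f}")
print("Column (3):", trend_column)
```

The `trend_column` list corresponds to the cumulative increments entered in column (3) of Table 10.2.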
In column (4), TS - T = S is performed. Column (4) is already a measure of seasonal variation. But
in order to standardise the answer so that it may be compared with other stock exchanges, for
example, it is customary to convert the values in column (4) to a seasonal index. Every index has
a base of 100, and the values above or below the base indicate percentages of above or below
normal activity, hence the season. Since the base of column (5) is 100, the mean of the column
should be 100 and the total 600, since there are 6 trading hours. In order to convert the
values of column (4) to index numbers, each entry is added to the total mean (13.135), the sum is
divided by the column (4) mean plus the total mean (-0.383 + 13.135), and the result is multiplied by 100, yielding the corresponding
entry in column (5). It is customary to show index numbers with one decimal place.
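The conversion from seasonal variation to a seasonal index can be sketched as follows. The snippet assumes the column (4) entries and the total mean of 13.135 exactly as printed in Table 10.2; the names are illustrative.

```python
# Column (4) of Table 10.2: seasonal variation (TS - T) for the six trading hours.
seasonal_variation = [0.965, 0.086, -1.117, -2.032, -0.241, 0.040]
total_mean = 13.135                     # mean hourly volume over the four days

column_mean = sum(seasonal_variation) / len(seasonal_variation)

# Index = (entry + total mean) / (column mean + total mean) * 100
seasonal_index = [
    100 * (s + total_mean) / (column_mean + total_mean)
    for s in seasonal_variation
]

for hour, idx in zip(["10-11", "11-12", "12-13", "13-14", "14-15", "15-16"],
                     seasonal_index):
    print(f"{hour}: {idx:.1f}")
print(f"Total: {sum(seasonal_index):.1f}")
```

Dividing by the column mean plus the total mean is what forces the six index numbers to average 100 and therefore to total 600.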
Column (5) shows the seasonal effect of this decision variable, share trading on the Stock
Exchange. Regardless of heavy or light daily volume, the first-hour volume is the heaviest by
far. It is 7.4% above what may be considered average trading volume for any given day. Keep in
mind that a very limited data set was used in this analysis and, while the season, reaching its low
point between 1 and 2 p.m., is generally correctly depicted, individual index numbers may be
exaggerated. What managerial action programs would result from an analysis such as this?
Would traders go out for tea and samosas between 10 and 11? How about lunch between 1 and 2? When
would brokers call clients with hot or lukewarm tips? Assuming that a decrease in volume
means a decrease in prices in general during the trading day, when would a savvy trader buy?
When would he sell? Think of some other intervening variables and you have yourself a nice
little bull session in one of Dalal Street's watering holes. If, in addition, you make money for
yourself or your firm, then you have got it.
Non-linear Analysis
Any number of different curves may be fitted to a data set. The most widely used program in
computer libraries, known as CURFIT, offers a minimum of 5 curves plus the straight line. The
curves may differ from program to program. So, which ones are the best ones? There is no
answer. Every forecaster has to decide individually about his pet forecasting tools. We will
discuss and apply three curves in this section. They appear to be promising decision tools
especially in problem situations that in some way incorporate the life cycle concept and the
range of such problems is vast, indeed.
If you take a look again at Figure 10.2, you see that three curves have been plotted. As we know
from many empirical studies, achievement is usually normally distributed. Growth, on the
other hand, seems to be exponentially distributed. The same holds true for decline. As the life
cycle moves from growth to maturity, a parabolic trend may often be used as the forecasting
tool. These are two of the curves that will be considered. The third one is related to the exponential
curve. As you look at the growth stage and mentally extrapolate the trend, your eyes will run off
the page. Now, we know, again from all sorts of empirical evidence, that trees don't grow
into the high heavens. Even the most spectacular growth must come to an end. Therefore, when
using the exponential forecast, care must be taken that the eventual ceiling or floor (in the case
of a decline) is not overlooked. The modified exponential trend has the ceiling or floor built
in. It is the third curve to be discussed.
One final piece of advice before we start fitting curves. If you can do it by straight line, do it. For
obvious reasons (just look at Figure 10.2), any possible error, and there is always a built-in five
percent chance, is worse when a curve is fitted. By extending the planning and forecasting
horizon over a reasonably short period rather than a spectacular but dangerous longer period,
the straight line can serve as a useful prediction tool.
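The parabolic trend is $y_c = a + bx + cx^2$, where a, b and c are constants. The constants a and b have
been dealt with already; c can be treated as acceleration. The normal equations (method of least squares) are:
$$\sum y = na + b\sum x + c\sum x^2$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$
With the time variable coded so that $\sum x = 0$ (and hence $\sum x^3 = 0$), the second equation gives b directly:
$$b = \frac{\sum xy}{\sum x^2}$$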
There are direct formulas for a and c as well, but because of the possible compounding of
arithmetic error in manual calculations, it is safer to solve a and c algebraically in this case.
To illustrate the parabolic trend let us forecast earnings per share in dollars for Storage Technology
Corporation for the years 2000 and 2001. Storage Technology manufactures computer data
storage equipment, printers, DVD-ROMS and telecommunication products. The company was
founded in 1969 and after going through a period of explosive growth seems to be moving into
the maturity stage. Data, code and calculations are shown below in the usual worksheet format.
Year     Code (x)    Earnings Per    x²     xy       x²y      x⁴
                     Share (y)
1993     -3          0.39             9     -1.17     3.51     81
1994     -2          0.54             4     -1.08     2.16     16
1995     -1          1.13             1     -1.13     1.13      1
1996      0          1.58             0      0.00     0.00      0
1997     +1          1.72             1      1.72     1.72      1
1998     +2          2.50             4      5.00    10.00     16
1999     +3          1.84             9      5.52    16.56     81
Total                9.70            28      8.86    35.08    196
Then
$$b = \frac{\sum xy}{\sum x^2} = \frac{8.86}{28} = 0.3164$$
and solving the remaining two normal equations simultaneously:
9.70 = 7a + 28c (multiply by 4)
38.80 = 28a + 112c
35.08 = 28a + 196c
-3.72 = 84c
c = -0.0443
a = (9.70 + 28 × 0.0443)/7 = 1.5629
Therefore
$$y_c = 1.5629 + 0.3164x - 0.0443x^2$$
origin 1996
x in 1 year units
and specifically,
$$y_{2000} = 1.5629 + 0.3164(4) - 0.0443(4)^2 = 2.12$$
$$y_{2001} = 1.5629 + 0.3164(5) - 0.0443(5)^2 = 2.04$$
Caution Remember that the data set is small. Quarterly earnings per share figures for the
period may have been better because of the larger sample size. The significance test and
construction of the confidence interval is performed as previously shown. Furthermore,
as soon as new earnings per share figures become available, the regression line should be
recalculated, because there is always the chance that there may be a change in the
environment.
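The parabolic fit for the Storage Technology data can be verified numerically. The sketch below assumes the coded series with origin 1996, so that the sums of x and x³ vanish and the normal equations reduce to the closed forms used in the code; the function name `trend` is illustrative.

```python
# Parabolic (second-degree) trend y = a + b*x + c*x**2 with coded time.
years = [1993, 1994, 1995, 1996, 1997, 1998, 1999]
eps = [0.39, 0.54, 1.13, 1.58, 1.72, 2.50, 1.84]     # earnings per share

origin = 1996
x = [year - origin for year in years]                # -3 ... +3, sums to zero
n = len(x)

s_y = sum(eps)
s_x2 = sum(v ** 2 for v in x)
s_x4 = sum(v ** 4 for v in x)
s_xy = sum(v * y for v, y in zip(x, eps))
s_x2y = sum(v ** 2 * y for v, y in zip(x, eps))

# With sum(x) = sum(x**3) = 0 the normal equations decouple:
b = s_xy / s_x2
c = (n * s_x2y - s_x2 * s_y) / (n * s_x4 - s_x2 ** 2)
a = (s_y - c * s_x2) / n


def trend(year):
    """Fitted trend value for a calendar year."""
    t = year - origin
    return a + b * t + c * t * t


print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
print(f"2000: {trend(2000):.2f}, 2001: {trend(2001):.2f}")
```

A negative c means the curve is flattening, which is consistent with a company moving from growth into maturity. The forecasts should still be read with the small sample size in mind.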
Self Assessment
Fill in the blanks:
1.
2.
In time series analysis, each factor must be ............ and its effect ascertained upon
product sales.
3.
2.
3.
(a)
Seasonal Variations
(b)
Cyclical Variations
show that, in general, these magnitudes have been rising over a fairly long period. As opposed
to this, a time series may also reveal a declining trend, e.g., in the case of substitution of one
commodity by another, the demand of the substituted commodity would reveal a declining
trend such as the demand for cotton clothes, demand for coarse grains like bajra, jowar, etc. With
the improved medical facilities, the death rate is likely to show a declining trend, etc. The
change in trend, in either case, is attributable to the fundamental forces such as changes in
population, technology, composition of production, etc.
According to A.E. Waugh, secular trend is "that irreversible movement which continues, in
general, in the same direction for a considerable period of time". There are two parts of this
definition: (i) movement in the same direction, which implies that if the values are increasing (or
decreasing) in successive periods, the tendency continues; and (ii) a considerable period of time.
Notes There is no specific period which can be called a long period. Long periods are
different for different situations. For example, in cases of population or output trends, the
long period could be 10 years, while it could be a month for the daily demand trend of
vegetables. It should, however, be noted that the longer the period, the more significant
the trend would be. Further, it is not necessary that the increase or decrease of values must
continue in the same direction for the entire period. The data may first show a rising (or
falling) trend and subsequently a falling (or rising) trend.
1.
To study past growth or decline of the series. On ignoring the short-term fluctuations,
trend describes the basic growth or decline tendency of the data.
2.
Assuming that the same behaviour would continue in future also, the trend curve can be
projected into future for forecasting.
3.
In order to analyse the influence of other factors, the trend may first be measured and then
eliminated from the observed values.
4.
Trend values of two or more time series can be used for their comparison.
[Figure: a trade cycle showing the Boom, Recession, Depression and Recovery phases, with the period of oscillation marked.]
The oscillatory movements are termed as Seasonal Variations if their period of oscillation is
equal to one year, and as Cyclical Variations if the period is greater than one year.
A time series, where the time interval between successive observations is less than or equal to
one year, may have the effects of both the seasonal and cyclical variations. However, the seasonal
variations are absent if the time interval between successive observations is greater than one
year.
Although the periodic variations are more or less regular, they may not necessarily be uniformly
periodic, i.e., the pattern of their variations in different periods may or may not be identical in
respect of time period and size of periodic variations. For example, if a cycle is completed in five
years then its following cycle may take greater or less than five years for its completion.
1.
Causes of Seasonal Variations: The main causes of seasonal variations are: (a) Climatic
Conditions and (b) Customs and Traditions
(a)
Climatic Conditions: The changes in climatic conditions affect the value of time series
variable and the resulting changes are known as seasonal variations. For example,
the sale of woolen garments is generally at its peak in the month of November
because of the beginning of the winter season. Similarly, the facts that timely rainfall may increase
agricultural output and that prices of agricultural commodities are lowest during their
harvesting season reflect the effect of climatic conditions on the value of a time
series variable.
(b)
Customs and Traditions: The customs and traditions of the people also give rise to the
seasonal variations in time series.
Example: The sale of garments and ornaments may be highest during the marriage
season, sale of sweets during Diwali, etc., are variations that are the results of customs and
traditions of the people.
It should be noted here that both of the causes, mentioned above, occur regularly and are
often repeated after a gap of less than or equal to one year.
Objectives of Measuring Seasonal Variations: The main objectives of measuring seasonal
variations are:
2.
(a)
(b)
(c)
Causes of Cyclical Variations: Cyclical variations are revealed by most of the economic
and business time series and, therefore, are also termed as trade (or business) cycles. Any
trade cycle has four phases which are respectively known as boom, recession, depression
and recovery phases. Various phases repeat themselves regularly one after another in the
given sequence. The time interval between two identical phases is known as the period of
cyclical variations. The period is always greater than one year. Normally, the period of
cyclical variations lies between 3 to 10 years.
Objectives of Measuring Cyclical Variations: The main objectives of measuring cyclical
variations are:
(a)
(b)
Self Assessment
Fill in the blanks:
4.
5.
6.
Let Y1, Y2, ...... Yn be the n values of a time series for successive time periods 1, 2, ...... n respectively.
The calculation of 3-period and 4-period moving averages are shown in the following tables:
It should be noted that, in case of 3-period moving average, it is not possible to get the moving
averages for the first and the last periods. Similarly, the larger is the period of moving average
the more information will be lost at the ends of a time series.
When the period of moving average is even, the computed average will correspond to the
middle of the two middle most periods. These values should be centred by taking arithmetic
mean of the two successive averages. The computation of moving average in such a case is also
illustrated in the above table.
Example: Determine the trend values of the following data by using 3-year moving
average. Also find short-term fluctuations for various years, assuming additive model. Plot the
original and the trend values on the same graph.
Year                        : 1981  1982  1983  1984  1985  1986  1987  1988  1989  1990
Production ('000 tonnes)    :   26    27    28    30    29    27    30    31    32    31
Solution:
Calculation of Trend and Short-term Fluctuations
Years    Production    3-Year Moving    3-Year Moving    Short-term Fluctuations
         (Y)           Total            Average (T)      (Y - T)
1981     26            ...              ...              ...
1982     27            81               27.00             0.00
1983     28            85               28.33            -0.33
1984     30            87               29.00             1.00
1985     29            86               28.67             0.33
1986     27            86               28.67            -1.67
1987     30            88               29.33             0.67
1988     31            93               31.00             0.00
1989     32            94               31.33             0.67
1990     31            ...              ...              ...
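The 3-year moving average and the short-term fluctuations for the production data can be reproduced with a short script. This is a minimal sketch; the helper `moving_average` is an illustrative name, and the additive model (fluctuation = Y - T) is the one assumed in the example.

```python
def moving_average(values, period):
    """Simple moving averages; result i covers values[i : i + period]."""
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]


years = list(range(1981, 1991))
production = [26, 27, 28, 30, 29, 27, 30, 31, 32, 31]   # '000 tonnes

period = 3
trend = moving_average(production, period)

# For an odd period each average is centred on the middle year of its window,
# so the first and last years have no trend value.
half = period // 2
centred_years = years[half:len(years) - half]
fluctuation = [y - t for y, t in zip(production[half:], trend)]

for year, t, f in zip(centred_years, trend, fluctuation):
    print(f"{year}: T = {t:.2f}, Y - T = {f:+.2f}")
```

The list slicing makes the information loss at the ends explicit: a period of 3 drops one year at each end, and a longer period would drop more.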
Example: Assuming a four-yearly cycle, find the trend values for the following data by
the method of moving average.
Year  : 1979  1980  1981  1982  1983  1984  1985
Y     :   74   100    97    87    90   115   126
Year  : 1986  1987  1988  1989  1990  1991  1992
Y     :  108   100   125   118   113   122   126
Solution:
Calculation of Trend Values
Years    Y      4-Year Moving    Centered    4-Year Moving
                Total            Total       Average (T)
1979     74                      ...         ...
1980     100                     ...         ...
                358
1981     97                      732          91.50
                374
1982     87                      763          95.38
                389
1983     90                      807         100.88
                418
1984     115                     857         107.13
                439
1985     126                     888         111.00
                449
1986     108                     908         113.50
                459
1987     100                     910         113.75
                451
1988     125                     907         113.38
                456
1989     118                     934         116.75
                478
1990     113                     957         119.63
                479
1991     122                     ...         ...
1992     126                     ...         ...
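When the period is even, the moving totals fall between two years and must be centred. The sketch below reproduces the worksheet for the four-yearly cycle; the variable names are illustrative.

```python
years = list(range(1979, 1993))
y = [74, 100, 97, 87, 90, 115, 126, 108, 100, 125, 118, 113, 122, 126]

period = 4

# Step 1: 4-year moving totals (each falls between the 2nd and 3rd year of its window).
moving_totals = [sum(y[i:i + period]) for i in range(len(y) - period + 1)]

# Step 2: centre by adding each pair of successive moving totals.
centred_totals = [moving_totals[i] + moving_totals[i + 1]
                  for i in range(len(moving_totals) - 1)]

# Step 3: each centred total contains 2 * period observations.
trend = [total / (2 * period) for total in centred_totals]

# The first centred value belongs to the year at index period // 2.
for year, total, t in zip(years[period // 2:], centred_totals, trend):
    print(f"{year}: centred total = {total}, T = {t:.3f}")
```

Centring costs two years at each end of the series, which is why trend values are available only for 1981 to 1990.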
smoothing parameter between 0 and 1: the larger the smoothing parameter, the greater
the weight given to the most recent demand
Self Assessment
Fill in the blanks:
7.
8.
9.
..method is based on the principle that the total effect of periodic variations
at different points of time in its cycle gets completely neutralised.
2.
Subtract the mean from each data value to get the deviation from the mean
3.
4.
5.
Typically the point from which the deviation is measured is the value of either the median or the
mean of the data set.
$$|D| = |x_i - m(X)|$$
where
|D| is the absolute deviation,
$x_i$ is the data element
and m(X) is the chosen measure of central tendency of the data set, sometimes the mean ($\mu$), but
most often the median.
The average absolute deviation or simply average deviation of a data set is the average of the
absolute deviations and is a summary statistic of statistical dispersion or variability. It is also
called the mean absolute deviation, but this is easily confused with the median absolute deviation.
The average absolute deviation of a set {x_1, x_2, ..., x_n} about its mean $\bar{x}$ is
$$\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$
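The definition translates directly into code. The following sketch uses a small made-up data set (not from the text) and a hypothetical helper `average_absolute_deviation` that accepts either the mean or the median as the central point.

```python
from statistics import mean, median


def average_absolute_deviation(data, centre=mean):
    """Average of |x_i - m(X)|, where m(X) is the chosen central point."""
    m = centre(data)
    return sum(abs(x - m) for x in data) / len(data)


# Illustrative data only; the value 14 acts as an outlier.
data = [2, 2, 3, 4, 14]

print(average_absolute_deviation(data))            # about the mean (5)
print(average_absolute_deviation(data, median))    # about the median (3)
```

Measuring from the median gives the smaller figure here, since the median minimises the average absolute deviation.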
Did u know? What is median absolute deviation?
Self Assessment
Fill in the blanks:
10.
Typically the point from which the deviation is measured is the value of either the ..or
the .......of the data set.
11.
The Mean Absolute Deviation is a robust statistic, being more resilient to outliers in a data
set than the...
12.
The Mean Absolute Deviation can be used to estimate the scale parameter of distributions
for which the and standard deviation do not exist.
which the estimator differs from the quantity to be estimated. The difference occurs because of
randomness or because the estimator doesn't account for information that could produce a more
accurate estimate.
The MSE is the second moment (about the origin) of the error, and thus incorporates both the
variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance. Like
the variance, MSE has the same unit of measurement as the square of the quantity being estimated.
In an analogy to standard deviation, taking the square root of MSE yields the root mean squared
error or RMSE, which has the same units as the quantity being estimated; for an unbiased
estimator, the RMSE is the square root of the variance, known as the standard error.
The mean squared error of an estimator b of a true parameter B is:
$$\mathrm{MSE}(b) = E[(b - B)^2]$$
which is also
$$\mathrm{MSE}(b) = \mathrm{var}(b) + (\mathrm{bias}(b))^2$$
Among unbiased estimators, minimizing the MSE is equivalent to minimizing the variance, and the
minimum is attained by the MVUE (minimum-variance unbiased estimator). However, a biased estimator
may have lower MSE. In statistical modelling, the MSE is the average of the squared differences
between the actual observations and the responses predicted by the model, and is used to determine
whether the model fits the data or
whether the model can be simplified by removing terms. Like variance, mean squared error has
the disadvantage of heavily weighting outliers. This is a result of the squaring of each term,
which effectively weights large errors more heavily than small ones. This property, undesirable
in many applications, has led researchers to use alternatives such as the mean absolute error, or
those based on the median.
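For forecast errors, these measures are straightforward to compute. The sketch below uses made-up actual and forecast values (they are not taken from the text) and also checks the sample analogue of the identity MSE = variance + bias², with the variance taken over the observed errors.

```python
from math import sqrt
from statistics import mean, pvariance

# Illustrative values only.
actual = [26, 27, 28, 30, 29]
forecast = [25, 28, 28, 29, 31]

errors = [f - a for f, a in zip(forecast, actual)]

mse = mean(e ** 2 for e in errors)        # mean squared error
rmse = sqrt(mse)                          # same units as the data
mae = mean(abs(e) for e in errors)        # mean absolute error

bias = mean(errors)                       # average error
variance = pvariance(errors)              # spread of the errors about their mean

print(f"MSE = {mse:.2f}, RMSE = {rmse:.3f}, MAE = {mae:.2f}")
print(f"variance + bias^2 = {variance + bias ** 2:.2f}")
```

The single error of size 2 contributes 4 of the 7 units of squared error but only 2 of the 5 units of absolute error, which illustrates how squaring weights large errors more heavily.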
Self Assessment
Fill in the blanks:
13.
.is the sum of the squared forecast errors for each of the observations
divided by the number of observations.
14.
Mean Squared Error of an estimator is one of many ways to quantify the amount by which
an estimator differs from the ..of the quantity being estimated.
15.
The is the amount by which the estimator differs from the quantity to be
estimated.
1.
2.
3.
To compare the pattern of seasonal variations of two or more time series in a given period
or of the same series in different periods.
4.
To eliminate the seasonal variations from the data. This process is known as
deseasonalisation of data.
The measurement of seasonal variations is done by isolating them from the other components of a
time series. There are four methods commonly used for the measurement of seasonal variations.
These methods are:
1.
Method of Simple Averages
2.
Ratio to Trend Method
3.
Ratio to Moving Average Method
4.
Method of Link Relatives
Notes In the discussion of the above methods, we shall often assume a multiplicative
model. However, with suitable modifications, these methods are also applicable to the
problems based on additive model.
Year    Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1987    46    45    44    46    45    47    46    43    40    40    41    45
1988    45    44    43    46    46    45    47    42    43    42    43    44
1989    42    41    40    44    45    45    46    43    41    40    42    45
Solution:
Calculation Table
Year     Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec
1987     46      45      44      46      45      47      46      43      40      40      41      45
1988     45      44      43      46      46      45      47      42      43      42      43      44
1989     42      41      40      44      45      45      46      43      41      40      42      45
Total    133     130     127     136     136     137     139     128     124     122     126     134
Ai       44.3    43.3    42.3    45.3    45.3    45.7    46.3    42.7    41.3    40.7    42.0    44.7
S.I.     101.4   99.1    96.8    103.7   103.7   104.6   105.9   97.7    94.5    93.1    96.1    102.3
In the above table, Ai denotes the average and S.I. the seasonal index for a particular month of
various years. To calculate the seasonal index, we compute the grand average, given by
$$G = \frac{\sum A_i}{12} = \frac{524}{12} = 43.7$$
Then the seasonal index for a particular month is given by
$$S.I. = \frac{A_i}{G} \times 100$$
Further, $\sum S.I. = 1198.9 \neq 1200$. Thus, we have to adjust these values such that their total is 1200.
This can be done by multiplying each figure by $\frac{1200}{1198.9}$. The resulting figures are the adjusted
seasonal indices.
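The method of simple averages can be sketched in a few lines. The code below uses the monthly data of the example and keeps full precision throughout, so its indices differ from the rounded worksheet values in the first decimal place and already total 1200 without adjustment; the variable names are illustrative.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
data = {
    1987: [46, 45, 44, 46, 45, 47, 46, 43, 40, 40, 41, 45],
    1988: [45, 44, 43, 46, 46, 45, 47, 42, 43, 42, 43, 44],
    1989: [42, 41, 40, 44, 45, 45, 46, 43, 41, 40, 42, 45],
}

n_years = len(data)

# A_i: average of each month over the years.
month_avg = [sum(column) / n_years for column in zip(*data.values())]

# G: grand average of the monthly averages.
grand_avg = sum(month_avg) / len(month_avg)

# S.I. = A_i / G * 100
seasonal_index = [100 * a / grand_avg for a in month_avg]

for name, idx in zip(months, seasonal_index):
    print(f"{name}: {idx:.1f}")
print(f"Total: {sum(seasonal_index):.1f}")
```

The shortfall of 1198.9 in the manual worksheet is purely a rounding effect, which is why the adjustment factor 1200/1198.9 is so close to 1.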
Task Compute the seasonal index from the following data by the method of simple
averages.
Year    Quarter I    Quarter II    Quarter III    Quarter IV
1980    106          124           104            90
1981     84          114           107            88
1982     90          112           101            85
1983     76           94            91            76
1984     80          104            95            83
1985    104          112           102            84
1.
Obtain the trend values for each month or quarter, etc., by the method of least squares.
2.
Divide the original values by the corresponding trend values. This would eliminate trend
values from the data. To get figures in percentages, the quotients are multiplied by 100.
Thus, we have
$$\frac{Y}{T} \times 100 = \frac{T.S.R}{T} \times 100 = S.R \times 100$$
3.
Example: Assuming that the trend is linear, calculate seasonal indices by the ratio to trend
method from the following data:
Quarterly output of coal in 4 years
Quarter    Year 1    Year 2    Year 3    Year 4
I          65        68        70        60
II         58        63        59        55
III        56        63        56        51
IV         61        67        52        58
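Solution:
By adding the values of all the quarters of a year, we can obtain annual output for each of the four
years. Fit a linear trend to the data and obtain trend values for each quarter.
The annual totals are 240, 261, 237 and 224. Coding the years as X = -3, -1, 1, 3 gives
$\sum Y = 962$, $\sum XY = -72$ and $\sum X^2 = 20$, so that
$$a = \frac{962}{4} = 240.5 \quad \text{and} \quad b = \frac{-72}{20} = -3.6$$
Thus, the trend line is Y = 240.5 - 3.6X, Origin : 1st January 1984, unit of X : 6 months.
The quarterly trend equation is given by
$$Y = \frac{240.5}{4} - \frac{3.6}{8}X \quad \text{or} \quad Y = 60.13 - 0.45X$$
(Origin : 1st January 1984, unit of X : 1 quarter, i.e., 3 months).
Shifting origin to 15th Feb. 1984, we get
$$Y = 60.13 - 0.45\left(X + \frac{1}{2}\right) = 59.9 - 0.45X$$
origin I-quarter, unit of X = 1 quarter.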
Table of Quarterly Trend Values
Year      I        II       III      IV
Year 1    63.50    63.05    62.60    62.15
Year 2    61.70    61.25    60.80    60.35
Year 3    59.90    59.45    59.00    58.55
Year 4    58.10    57.65    57.20    56.75
Ratio to Trend Values, (Y/T) × 100
Year      I         II        III       IV
Year 1    102.36     91.99     89.46     98.15
Year 2    110.21    102.86    103.62    111.02
Year 3    116.86     99.24     94.92     88.81
Year 4    103.27     95.40     89.16    102.20
Total     432.70    389.49    377.16    400.18
Ai        108.18     97.37     94.29    100.05
S.I.      108.20     97.40     94.32    100.08
Note that the Grand Average G = 399.89/4 = 99.97.
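The ratio to trend calculation for the coal data can be sketched as follows. The code assumes, as in the worked solution, a linear trend fitted to the annual totals with the even-series code -3, -1, 1, 3, and then converts it to a quarterly trend; the names are illustrative.

```python
# Rows are the four years, columns are quarters I to IV.
output = [
    [65, 58, 56, 61],
    [68, 63, 63, 67],
    [70, 59, 56, 52],
    [60, 55, 51, 58],
]

# Step 1: linear trend on annual totals, coded in half-year units.
annual = [sum(row) for row in output]
code = [-3, -1, 1, 3]
a = sum(annual) / len(annual)
b = sum(x * y for x, y in zip(code, annual)) / sum(x * x for x in code)

# Step 2: quarterly trend. Level is a / 4; slope per quarter is b / (4 * 2),
# because one code unit is half a year (two quarters).
level, slope = a / 4, b / 8
n_quarters = 16
# Quarter k (0-based) is centred k - 7.5 quarters from the mid-point origin.
trend = [level + slope * (k - 7.5) for k in range(n_quarters)]

# Step 3: ratios to trend, averaged by quarter, then scaled to total 400.
flat = [value for row in output for value in row]
ratio = [100 * y / t for y, t in zip(flat, trend)]
quarter_avg = [sum(ratio[q::4]) / 4 for q in range(4)]
grand_avg = sum(quarter_avg) / 4
seasonal_index = [100 * r / grand_avg for r in quarter_avg]

print(f"a = {a}, b = {b}")
print([round(s, 2) for s in seasonal_index])
```

Dividing each quarterly average by the grand average is what brings the four indices to a total of 400.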
Example: Find seasonal variations by the ratio to trend method, from the following data:
Year    I - Qr    II - Qr    III - Qr    IV - Qr
1975    30        40         36          34
1976    34        52         50          44
1977    40        58         54          48
1978    54        76         68          62
1979    80        92         86          82
Solution:
First we fit a linear trend to the annual totals.
Years    Annual Totals (Y)    X     XY      X²
1975     140                  -2    -280    4
1976     180                  -1    -180    1
1977     200                   0       0    0
1978     260                   1     260    1
1979     340                   2     680    4
Total    1120                  0     480    10
$$\text{Now } a = \frac{1120}{5} = 224 \quad \text{and} \quad b = \frac{480}{10} = 48$$
The trend equation is Y = 224 + 48X, origin : 1st July 1977, unit of X = 1 year.
The quarterly trend equation is
$$Y = \frac{224}{4} + \frac{48}{16}X = 56 + 3X$$
origin : 1st July 1977, unit of X = 1 quarter.
Shifting the origin to III quarter of 1977, we get
$$Y = 56 + 3\left(X + \frac{1}{2}\right) = 57.5 + 3X$$
Table of Quarterly Trend Values
Year    I       II      III     IV
1975    27.5    30.5    33.5    36.5
1976    39.5    42.5    45.5    48.5
1977    51.5    54.5    57.5    60.5
1978    63.5    66.5    69.5    72.5
1979    75.5    78.5    81.5    84.5
Ratio to Trend Values, (Y/T) × 100
Year     I         II        III       IV
1975     109.1     131.1     107.5      93.2
1976      86.1     122.4     109.9      90.7
1977      77.7     106.4      93.9      79.3
1978      85.0     114.3      97.8      85.5
1979     106.0     117.2     105.5      97.0
Total    463.9     591.4     514.6     445.7
Ai        92.78    118.28    102.92     89.14
S.I.      92.10    117.35    102.11     88.44
Note that the Grand Average G = 403.12/4 = 100.78. Also check that the sum of indices is 400.
1.
Compute the moving averages with period equal to the period of seasonal variations.
This would eliminate the seasonal component and minimise the effect of random
component. The resulting moving averages would consist of trend, cyclical and random
components.
2.
The original values, for each quarter (or month), are divided by the respective moving
average figures and the ratio is expressed as a percentage, i.e.,
$$\frac{Y}{M.A.} \times 100 = \frac{T.C.S.R}{T.C.R} \times 100 = S.R \times 100$$
3.
Example: Given the following quarterly sale figures, in thousands of rupees, for the years
1986-1989, find the specific seasonal indices by the method of moving averages.
Year    I     II    III   IV
1986    34    33    34    37
1987    37    35    37    39
1988    39    37    38    40
1989    42    41    42    44
Solution:
Calculation of Ratio to Moving Averages
Year/Quarter    Y     4-Period        Centred    4-Period    (Y/M) × 100
                      Moving Total    Total      M.A. (M)
1986 I          34                    ...        ...         ...
     II         33                    ...        ...         ...
                      138
     III        34                    279        34.9         97.4
                      141
     IV         37                    284        35.5        104.2
                      143
1987 I          37                    289        36.1        102.5
                      146
     II         35                    294        36.8         95.1
                      148
     III        37                    298        37.3         99.2
                      150
     IV         39                    302        37.8        103.2
                      152
1988 I          39                    305        38.1        102.4
                      153
     II         37                    307        38.4         96.4
                      154
     III        38                    311        38.9         97.7
                      157
     IV         40                    318        39.8        100.5
                      161
1989 I          42                    326        40.8        102.9
                      165
     II         41                    334        41.8         98.1
                      169
     III        42                    ...        ...         ...
     IV         44                    ...        ...         ...
Calculation of Seasonal Indices
Year     I        II       III      IV
1986     ...      ...       97.4    104.2
1987     102.5    95.1      99.2    103.2
1988     102.4    96.4      97.7    100.5
1989     102.9    98.1      ...     ...
Total    307.8    289.6    294.3    307.9
Ai       102.6    96.5      98.1    102.6
S.I.     102.7    96.5      98.1    102.7
Note that the Grand Average G = 399.8/4 = 99.95. Also check that the sum of indices is 400.
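The ratio to moving average method can be sketched in Python as follows. The code uses the quarterly sales of the example and keeps full precision, so its indices may differ from the worksheet by about 0.1, since the worksheet rounds the moving averages and ratios at each step; the variable names are illustrative.

```python
sales = [34, 33, 34, 37,    # 1986
         37, 35, 37, 39,    # 1987
         39, 37, 38, 40,    # 1988
         42, 41, 42, 44]    # 1989
period = 4

# 4-period moving totals, then centred moving averages.
totals = [sum(sales[i:i + period]) for i in range(len(sales) - period + 1)]
centred_ma = [(totals[i] + totals[i + 1]) / (2 * period)
              for i in range(len(totals) - 1)]

# centred_ma[j] corresponds to observation j + period // 2.
ratios = {q: [] for q in range(period)}
for j, m in enumerate(centred_ma):
    k = j + period // 2
    ratios[k % period].append(100 * sales[k] / m)

# Average the ratios for each quarter and scale so that the indices total 400.
quarter_avg = [sum(ratios[q]) / len(ratios[q]) for q in range(period)]
grand_avg = sum(quarter_avg) / period
seasonal_index = [100 * a / grand_avg for a in quarter_avg]

print([round(s, 1) for s in seasonal_index])
```

Each quarter ends up with three ratios, because the centred moving average is unavailable for the first two and last two quarters of the series.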
1.
Compute the link relative (L.R.) of each period by dividing the figure of that period by
the figure of the previous period and multiplying by 100. For example, link relative of 3rd quarter
$$= \frac{\text{figure of 3rd quarter}}{\text{figure of 2nd quarter}} \times 100$$
2.
Obtain the average of link relatives of a given quarter (or month) of various years. A.M. or
Md can be used for this purpose. Theoretically, the latter is preferable because the former
gives undue importance to extreme items.
3.
These averages are converted into chained relatives by assuming the chained relative of
the first quarter (or month) equal to 100. The chained relative (C.R.) for the current period
(quarter or month)
$$= \frac{\text{C.R. of the previous period} \times \text{average L.R. of the current period}}{100}$$
4.
Compute the C.R. of first quarter (or month) on the basis of the last quarter (or month).
This is given by
$$= \frac{\text{C.R. of last quarter (or month)} \times \text{average L.R. of 1st quarter (or month)}}{100}$$
This value will, in general, be different from 100 due to the long-term trend in the data. The
chained relatives, obtained above, are to be adjusted for the effect of this trend. The
adjustment factor is
$$d = \frac{1}{4}\,[\text{New C.R. for 1st quarter} - 100] \text{ for quarterly data}$$
$$\text{and } d = \frac{1}{12}\,[\text{New C.R. for 1st month} - 100] \text{ for monthly data.}$$
On the assumption that the trend is linear, d, 2d, 3d, etc., is respectively subtracted from
the 2nd, 3rd, 4th, etc., quarter (or month).
5.
Express the adjusted chained relatives as a percentage of their average to obtain seasonal
indices.
6.
Make sure that the sum of these indices is 400 for quarterly data and 1200 for monthly data.
Example: Determine the seasonal indices from the following data by the method of link
relatives:
Year    1st Qr    2nd Qr    3rd Qr    4th Qr
1985    26        19        15        10
1986    36        29        23        22
1987    40        25        20        15
1988    46        26        20        18
1989    42        28        24        21
Solution:
Calculation Table (link relatives)
Year               I         II       III      IV
1985               ...       73.1     78.9     66.7
1986               360.0     80.5     79.3     95.7
1987               181.8     62.5     80.0     75.0
1988               306.7     56.5     76.9     90.0
1989               233.3     66.7     85.7     87.5
Total              1081.8    339.3    400.8    414.9
Mean               270.5     67.9     80.2     83.0
C.R.               100.0     67.9     54.5     45.2
C.R. (adjusted)    100.0     62.3     43.3     28.4
S.I.               170.9     106.5    74.0     48.6
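The chained relative (C.R.) of the 1st quarter on the basis of C.R. of the 4th quarter
$$= \frac{270.5 \times 45.2}{100} = 122.3$$
The trend adjustment factor is
$$d = \frac{1}{4}(122.3 - 100) = 5.6$$
The seasonal index of a quarter
$$= \frac{\text{adjusted C.R.}}{\text{average of adjusted C.R.s}} \times 100$$
where the average of the adjusted chained relatives is (100.0 + 62.3 + 43.3 + 28.4)/4 = 58.5.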
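The six steps of the link relatives method can be sketched as below. The code assumes the arithmetic mean is used to average the link relatives (the text also allows the median) and keeps full precision, so its indices differ from the rounded worksheet by up to about 0.2; the variable names are illustrative.

```python
data = [
    [26, 19, 15, 10],   # 1985
    [36, 29, 23, 22],   # 1986
    [40, 25, 20, 15],   # 1987
    [46, 26, 20, 18],   # 1988
    [42, 28, 24, 21],   # 1989
]
p = 4
flat = [v for row in data for v in row]

# Step 1: link relative = current figure / previous figure * 100.
link = {q: [] for q in range(p)}
for k in range(1, len(flat)):
    link[k % p].append(100 * flat[k] / flat[k - 1])

# Step 2: average link relative for each quarter (arithmetic mean assumed).
avg_lr = [sum(link[q]) / len(link[q]) for q in range(p)]

# Step 3: chained relatives, with the first quarter fixed at 100.
chained = [100.0]
for q in range(1, p):
    chained.append(chained[-1] * avg_lr[q] / 100)

# Step 4: new C.R. of the first quarter and the trend correction d.
new_first = chained[-1] * avg_lr[0] / 100
d = (new_first - 100) / p
adjusted = [chained[q] - q * d for q in range(p)]

# Step 5: express the adjusted C.R.s as percentages of their average.
mean_adjusted = sum(adjusted) / p
seasonal_index = [100 * c / mean_adjusted for c in adjusted]

print(f"new C.R. of 1st quarter = {new_first:.1f}, d = {d:.2f}")
print([round(s, 1) for s in seasonal_index])
```

The first quarter of 1985 has no link relative because no preceding figure is available, so the first-quarter average is based on four values and the others on five.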
Self Assessment
Fill in the blanks:
16.
If the time series data are in terms of annual figures, the ............ are absent.
17.
..method is based on the assumption that the trend is linear and cyclical
variations are of uniform pattern.
18.
.method is used when cyclical variations are absent from the data
10.7 Summary
Stock prices, sales volumes, interest rates, and quality measurements are typical examples.
Because of the sequential nature of the data, special statistical techniques that account for
the dynamic nature of the data are required.
A time series is a sequence of data points, measured typically at successive times, spaced
at time intervals.
Time series analysis comprises methods that attempt to understand such time series, often
either to understand the underlying context of the data points, or to make forecasts.
Time series forecasting is the use of a model to forecast future events based on known past
events: to forecast future data points before they are measured.
There are four methods commonly used for the measurement of seasonal variations. They
are: Method of Simple Averages, Ratio to Trend Method, Ratio to Moving Average Method
and Method of Link Relatives.
10.8 Keywords
Mean Squared Error: It is the sum of the squared forecast errors for each of the observations
divided by the number of observations.
Period of Oscillation: The time interval between the variations is known as the period of
oscillation.
Periodic Variations: The variations that repeat themselves after a regular interval of time.
Random Variations: The variations that do not reveal any regular pattern of movements.
Secular Trend: It is the general tendency of the data to increase or decrease or stagnate over a
long period of time.
Smart Discount Stores: There are 2117 Smart stores in India (the chain is building up).
It is one of India's most interesting discounters, tracing its origins back to the 1980s and the
opening of the first Smart store. At present Smart has reached an upgrading phase like
so many discounters before.
Given the data below, perform the indicated analyses.
2.
Year   Earnings Per Share   Dividends Per Share   Pre-tax Margin
1999         19.0                  9.9                 2.1
1998         17.5                  9.5                 2.0
1997         20.7                  9.0                 3.1
1996         28.4                  8.1                 4.9
1995         27.4                  6.8                 5.4
1994         23.9                  5.0                 5.7
1993         21.1                  3.0                 5.8
1992         16.1                  2.4                 5.8
1991          8.5                  2.2                 3.3
1990         11.1                  1.9                 5.3
(a)
To what extent does the Board of directors regard dividend payments as a function
of earnings? Test whether there is a significant relationship between the variables.
Use a parametric analysis.
(b)
Find the linear forecasting equation that would allow you to predict dividend
payments based on earnings and test the significance of the slope.
(c)
Is there a significant difference in pre-tax margin when comparing the periods 1995-1999
and 1990-1994? Perform a non-parametric analysis. Explain the managerial
implications of your findings.
Big and Small Apples Employment figures in thousands for Neo-Classical City and suburbs
are given below. Perform the required analyses:
(a)
Using linear forecasts, predict the year in which employment will be the same for
the two locations.
(b)
(c)
Correlate the employment figures for the two areas using both parametric and non-parametric
methods and test the significance of the correlation coefficients.
(d)
Fit a modified exponential trend to SUB data and discuss the results in terms of your
findings in (a) above.
(e)
Are NCC employment figures uniformly distributed over the period 1994 through
2000?
Year   1994  1995  1996  1997  1998  1999  2000
NCC    64.1  60.2  59.2  59.0  57.6  54.4  50.9
SUB    20.7  21.4  22.1  23.8  24.5  26.3  26.5
3.
4.
Determine the seasonal indices from the following data by the method of link relatives:
Year   I Qtr   II Qtr   III Qtr   IV Qtr
2000     42      44       40        38
2001     36      38       36        34
2002     48      50       48        40
2003     38      42       40        38
5.
Find seasonal variations by the ratio to trend method, from the following data:
Year   I Qtr   II Qtr   III Qtr   IV Qtr
2000     40      44       42        40
2001     36      40       38        36
2002     48      52       46        42
2003     38      42       40        38
6.
Assuming that trend and cyclical variations are absent, compute the seasonal index for
each month of the following data of sales (in '000) of a company:
Year   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
2001    38   36   36   38   34   34   32   30   32   34   32   32
2002    34   32   32   32   30   34   30   32   30   34   30   34
2003    32   30   30   34   32   32   34   34   32   36   34   36
7.
Why does minimizing the MSE remain a key criterion in selecting estimators?
8.
How would you estimate the scale parameter of distributions for which the variance and
standard deviation do not exist?
9.
What will be the effect on the moving averages if the trend is non-linear?
10.
Secular trend is that irreversible movement which continues, in general, in the same
direction for a considerable period of time. Comment.
11.
Fit a trend line to the following data by method of semi average and forecast the sales for
2007.
Year    2000  2001  2002  2003  2004  2005  2006
Sales    105   106   116   120   114   128   134
12.
Using the 3 yearly moving averages, determine the trend values and also the short-term
error.
Year    2003  2004  2005  2006  2007  2008
Value     30    31    32    34    33    29
13.
Quarter                 Seasonal Index
January to March              90
April to June                 80
July to September             71
October to December          120
If the total sales of woolen garments in the 1st Quarter is worth 45,000, how much worth
of woolen garments should be kept in stock to meet the demand for the remaining quarters,
anticipating the winter season?
14.
Year                                    2000  2001  2002  2003  2004  2005  2006  2007
Actual Production (in '000 quintals)     7.5   7.8   8.2   8.2   8.4   8.5   8.7   9.1
1. long-term
2. quantified
3. seasonal influences
4. secular trend
5. Seasonal Variations
6. short-term
7. historical
8. Exponential smoothing
9. Moving average
10. median, mean
11. standard deviation
12. variance
13.
14. true value
15. error
16. seasonal variations
17. Link relatives
18. Ratio to trend
CONTENTS
Objectives
Introduction
11.1 Definitions and Characteristics of Index Numbers
11.2 Uses of Index Numbers
11.3 Construction of Index Numbers
11.4 Price Index Numbers
11.4.1
11.6.2
Objectives
After studying this unit, you will be able to:
Introduction
An index number is a statistical measure used to compare the average level of magnitude of a
group of distinct but related variables in two or more situations. Suppose that we want to
compare the average price level of different items of food in 1992 with what it was in 1990. Let
the different items of food be wheat, rice, milk, eggs, ghee, sugar, pulses, etc. If the prices of all
these items change in the same ratio and in the same direction; assume that prices of all the items
have increased by 10% in 1992 as compared with their prices in 1990; then there will be no
difficulty in finding out the average change in price level for the group as a whole. Obviously,
the average price level of all the items taken as a group will also be 10% higher in 1992 as
compared with prices of 1990. However, in real situations, neither the prices of all the items
change in the same ratio nor in the same direction, i.e., the prices of some commodities may
change to a greater extent as compared to prices of other commodities. Moreover, the price of
some commodities may rise while that of others may fall. For such situations, index numbers
are a very useful device for measuring the average change in prices or in any other characteristic,
such as quantity or value, for the group as a whole.
An index number is a device for comparing the general level of magnitude of a group of
distinct, but related, variables in two or more situations.
Karmel and Polasek
2.
3.
Index number shows by its variation the changes in a magnitude which is not susceptible
either of accurate measurement in itself or of direct valuation in practice.
Edgeworth
4.
An index number is a single ratio (usually in percentage) which measures the combined
(i.e., averaged) change of several variables between two different times, places or
situations.
Tuttle
On the basis of the above definitions, the following characteristics of index numbers are worth
mentioning:
1.
Index numbers are specialised averages: As we know, an average of data is its
representative summary figure. In a similar way, an index number is also an average,
often a weighted average, computed for a group. It is called a specialised average because
the figures that are averaged are not necessarily expressed in homogeneous units.
2.
Index numbers measure the changes for a group which are not capable of being directly
measured: The examples of such magnitudes are: Price level of a group of items, level of
business activity in a market, level of industrial or agricultural output in an economy, etc.
3.
Index numbers are expressed in terms of percentages: The changes in magnitude of a group
are expressed in terms of percentages which are independent of the units of measurement.
This facilitates the comparison of two or more index numbers in different situations.
Self Assessment
Fill in the blanks:
1.
Index numbers are called a specialised average because the figures, that are averaged, are
not necessarily expressed in ................ units.
2.
To measure and compare changes: The basic purpose of the construction of an index number
is to measure the level of activity of phenomena like price level, cost of living, level of
agricultural production, level of business activity, etc. It is because of this reason that
sometimes index numbers are termed as barometers of economic activity. It may be
mentioned here that a barometer is an instrument which is used to measure atmospheric
pressure in physics.
The level of an activity can be expressed in terms of index numbers at different points of
time or for different places at a particular point of time. These index numbers can be easily
compared to determine the trend of the level of an activity over a period of time or with
reference to different places.
2.
To help in providing guidelines for framing suitable policies: Index numbers are
indispensable tools for the management of any government or non-government
organisation.
Example: The increase in cost of living index is helpful in deciding the amount of
additional dearness allowance that should be paid to the workers to compensate them for the
rise in prices. In addition to this, index numbers can be used in planning and formulation of
various government and business policies.
3.
Price index numbers are used in deflating: This is a very important use of price index
numbers. These index numbers can be used to adjust monetary figures of various periods
for changes in prices.
Example: The figure of national income of a country is computed on the basis of the
prices of the year in question. Such figures, for various years often known as national income at
current prices, do not reveal the real change in the level of production of goods and services. In
order to know the real change in national income, these figures must be adjusted for price
changes in various years. Such adjustments are possible only by the use of price index numbers
and the process of adjustment, in a situation of rising prices, is known as deflating.
4.
To measure purchasing power of money: We know that there is inverse relation between
the purchasing power of money and the general price level measured in terms of a price
index number. Thus, reciprocal of the relevant price index can be taken as a measure of the
purchasing power of money.
Self Assessment
Fill in the Blanks:
3.
4.
Item         Price in 1990     Price in 1992
             (in Rs/unit)      (in Rs/unit)
1. Wheat     300/quintal       360/quintal
2. Rice      12/kg.            15/kg.
3. Milk      7/litre           8/litre
4. Eggs      11/dozen          12/dozen
5. Ghee      80/kg.            88/kg.
6. Sugar     9/kg.             10/kg.
7. Pulses    14/kg.            16/kg.
The comparison of price of an item, say wheat, in 1992 with its price in 1990 can be done in two
ways, explained below:
1.
By taking the difference of prices in the two years, i.e., 360 - 300 = 60, one can say that the
price of wheat has gone up by 60/quintal in 1992 as compared with its price in 1990.
2.
By taking the ratio of the prices in the two years, i.e., $\frac{360}{300} = 1.20$, one can say that if the price of wheat
in 1990 is taken to be 1, then it has become 1.20 in 1992. A more convenient way of
comparing the two prices is to express the price ratio in terms of percentage, i.e.,
$\frac{360}{300} \times 100 = 120$, known as the Price Relative of the item. In our example, the price relative of
wheat is 120, which can be interpreted as the price of wheat in 1992 when its price in 1990
is taken as 100. Further, the figure 120 indicates that the price of wheat has gone up by 120 - 100
= 20% in 1992 as compared with its price in 1990.
The first way of expressing the price change is inconvenient because the change in price depends
upon the units in which it is quoted. This problem is taken care of in the second method, where
price change is expressed in terms of percentage. An additional advantage of this method is that
various price changes, expressed in percentage, are comparable. Further, it is very easy to grasp
the 20% increase in price rather than the increase expressed as 60/quintal.
For the construction of index number, we have to obtain the average price change for the group
in 1992, usually termed as the Current Year, as compared with the price of 1990, usually called
the Base Year. This comparison can be done in two ways:
1.
By taking suitable average of price relatives of different items. The methods of index
number construction based on this procedure are termed as Average of Price Relative
Methods.
2.
By taking ratio of the averages of the prices of different items in each year. These methods
are popularly known as Aggregative Methods.
Since the average in each of the above methods can be simple or weighted, these can further be
divided as simple or weighted. Various methods of index number construction can be classified
as shown below:
Methods of Index Number Construction
    Average of Price Relatives Methods
        Simple Average of Price Relatives Methods
        Weighted Average of Price Relatives Methods
    Aggregative Methods
        Simple Aggregative Methods
        Weighted Aggregative Methods
In addition to this, a particular method would depend upon the type of average used. Although,
geometric mean is more suitable for averaging ratios, arithmetic mean is often preferred because
of its simplicity with regard to computations and interpretation.
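The price relative of the i th item will be denoted by $P_i = \frac{p_{1i}}{p_{0i}} \times 100$ and the
quantity relative of the i th item by $Q_i = \frac{q_{1i}}{q_{0i}} \times 100$.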
Further, P01 will be used to denote the price index number of period 1 as compared with
the prices of period 0. Similarly, Q 01 and V01 would denote the quantity and the value
index numbers respectively of period 1 as compared with period 0.
Self Assessment
Fill in the blanks:
5.
6.
$P_{01} = \frac{\sum \frac{p_{1i}}{p_{0i}} \times 100}{n}$ or, omitting the subscript i, $P_{01} = \frac{\sum \frac{p_1}{p_0} \times 100}{n}$
$P_{01} = \left(P_1 \times P_2 \times \cdots \times P_n\right)^{\frac{1}{n}} = \left[\prod_{i=1}^{n} P_i\right]^{\frac{1}{n}} = \text{Antilog}\left[\frac{\sum \log P_i}{n}\right]$
Example: Given below are the prices of 5 items in 1985 and 1990. Compute the simple price
index number of 1990 taking 1985 as base year. Use (a) arithmetic mean and (b) geometric mean.

Item                         1     2     3     4     5
Price in 1985 (Rs/unit)     15     8   200    60   100
Price in 1990 (Rs/unit)     20     7   300   110   130
Solution:
Calculation Table

Item    Price in 1985 (p0i)   Price in 1990 (p1i)   Price Relative Pi = (p1i / p0i) × 100    log Pi
1               15                    20                       133.33                        2.1249
2                8                     7                        87.50                        1.9420
3              200                   300                       150.00                        2.1761
4               60                   110                       183.33                        2.2632
5              100                   130                       130.00                        2.1139
Total                                                          684.16                       10.6201

Index number, using A.M., is $P_{01} = \frac{684.16}{5} = 136.83$ and Index number, using G.M., is
$P_{01} = \text{Antilog}\left[\frac{10.6201}{5}\right] = 133.06$
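The two simple averages of price relatives can be checked with a short Python sketch. The variable names are illustrative assumptions; the prices are those of the example above, and the geometric mean is obtained through base-10 logarithms exactly as in the antilog formula.

```python
import math

p0 = [15, 8, 200, 60, 100]     # prices in 1985 (base year)
p1 = [20, 7, 300, 110, 130]    # prices in 1990 (current year)

# Price relative of each item: P_i = p1 / p0 * 100
relatives = [cur / base * 100 for base, cur in zip(p0, p1)]

# (a) simple arithmetic mean of price relatives
am_index = sum(relatives) / len(relatives)

# (b) simple geometric mean: antilog of the mean of log P_i
gm_index = 10 ** (sum(math.log10(r) for r in relatives) / len(relatives))

print(round(am_index, 2), round(gm_index, 2))
```

The geometric mean comes out lower than the arithmetic mean, as it always does when the price relatives are not all equal.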
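$P_{01} = \frac{\sum P_i w_i}{\sum w_i}$
Similarly, the index number, given by the weighted geometric mean of price relatives can be
written as follows:
$P_{01} = \left(P_1^{w_1} \cdot P_2^{w_2} \cdots P_n^{w_n}\right)^{\frac{1}{\sum w_i}} = \left[\prod P_i^{w_i}\right]^{\frac{1}{\sum w_i}}$ or $P_{01} = \text{Antilog}\left[\frac{\sum w_i \log P_i}{\sum w_i}\right]$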
Nature of Weights
While taking weighted average of price relatives, the values are often taken as weights. These
weights can be the values of base year quantities valued at base year prices, i.e., p 0iq0i, or the
values of current year quantities valued at current year prices, i.e., p 1iq1i, or the values of current
year quantities valued at base year prices, i.e., p 0iq1i, etc., or any other value.
Example: Construct an index number for 1989 taking 1981 as base for the following data, by
using
1.
2.
Prices in 1981   Prices in 1989   Weights
      60              100            30
      20               20            20
      40               60            24
     100              120            30
     120               80            10
Solution:
Calculation Table

Prices in 1981 (p0i)   Prices in 1989 (p1i)   Weights (wi)   Price Relative Pi = (p1i / p0i) × 100     Pi wi
        60                    100                 30                    166.67                        5000.1
        20                     20                 20                    100.00                        2000.0
        40                     60                 24                    150.00                        3600.0
       100                    120                 30                    120.00                        3600.0
       120                     80                 10                     66.67                         666.7
Total                                            114                                                 14866.8

Index number, using A.M., is $P_{01} = \frac{14866.8}{114} = 130.41$ and index number using G.M. is
$P_{01} = \text{Antilog}\left[\frac{239.498}{114}\right] = 126.15$, where $239.498 = \sum w_i \log P_i$.
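As a cross-check, the weighted arithmetic and geometric means of price relatives can be computed directly. This is a minimal sketch with assumed variable names; it works with unrounded logarithms, so the geometric mean comes out at about 126.14 rather than the 126.15 obtained from four-figure log tables.

```python
import math

p0 = [60, 20, 40, 100, 120]     # prices in 1981 (base year)
p1 = [100, 20, 60, 120, 80]     # prices in 1989 (current year)
w = [30, 20, 24, 30, 10]        # weights

relatives = [cur / base * 100 for base, cur in zip(p0, p1)]
total_w = sum(w)

# Weighted A.M.: sum(P * w) / sum(w)
weighted_am = sum(P * wi for P, wi in zip(relatives, w)) / total_w

# Weighted G.M.: antilog of sum(w * log P) / sum(w)
weighted_gm = 10 ** (
    sum(wi * math.log10(P) for P, wi in zip(relatives, w)) / total_w
)

print(round(weighted_am, 2), round(weighted_gm, 2))
```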
Task Taking 1983 as base year, calculate an index number of prices for 1990, for the
following data given in appropriate units, using:
1.
2.
Weighted geometric mean of price relatives by taking weights as the values of base
year quantities at base year prices.
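$P_{01} = \frac{\sum p_{1i} / n}{\sum p_{0i} / n} \times 100 = \frac{\sum p_{1i}}{\sum p_{0i}} \times 100$
Omitting the subscript i, the above index number can also be written as $P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$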
Example: The following table gives the prices of six items in the years 1980 and 1981. Use
simple aggregative method to find index of 1981 with 1980 as base.

Item               A    B    C    D    E    F
Price in 1980     40   60   20   50   80  100
Price in 1981     50   60   30   70   90  100

Solution:
Let p0 be the price in 1980 and p1 be the price in 1981. Thus, we have
$\sum p_0 = 350$ and $\sum p_1 = 400$, so that $P_{01} = \frac{400}{350} \times 100 = 114.29$
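The simple aggregative calculation is short enough to verify in a couple of lines. The sketch below uses the prices of the example; the variable names are assumptions made for illustration.

```python
prices_1980 = {"A": 40, "B": 60, "C": 20, "D": 50, "E": 80, "F": 100}
prices_1981 = {"A": 50, "B": 60, "C": 30, "D": 70, "E": 90, "F": 100}

# Simple aggregative index: ratio of the price totals, expressed as a percentage
sum_p0 = sum(prices_1980.values())
sum_p1 = sum(prices_1981.values())
simple_aggregative = sum_p1 / sum_p0 * 100

print(sum_p0, sum_p1, round(simple_aggregative, 2))
```

Because the prices are simply added, items quoted in large units dominate the result, which is one reason weighted methods are usually preferred.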
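$P_{01} = \frac{\sum p_{1i} w_i / \sum w_i}{\sum p_{0i} w_i / \sum w_i} \times 100 = \frac{\sum p_{1i} w_i}{\sum p_{0i} w_i} \times 100$
or, omitting the subscript i, $P_{01} = \frac{\sum p_1 w}{\sum p_0 w} \times 100$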
Nature of Weights
In case of weighted aggregative price index numbers, quantities are often taken as weights.
These quantities can be the quantities purchased in base year or in current year or an average of
base year and current year quantities or any other quantities. Depending upon the choice of
weights, some of the popular formulae for weighted index numbers can be written as follows:
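1.
Laspeyres' Index: Laspeyres' price index number uses base year quantities as weights.
Thus, we can write
$P_{01}^{La} = \frac{\sum p_{1i} q_{0i}}{\sum p_{0i} q_{0i}} \times 100$ or $P_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$
2.
Paasche's Index: This index number uses current year quantities as weights. Thus, we can
write
$P_{01}^{Pa} = \frac{\sum p_{1i} q_{1i}}{\sum p_{0i} q_{1i}} \times 100$ or $P_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$
3.
Fisher's Ideal Index: As will be discussed later, Laspeyres' Index has an upward
bias and Paasche's Index has a downward bias. In view of this, Fisher suggested that an
ideal index should be the geometric mean of Laspeyres' and Paasche's indices. Thus,
Fisher's formula can be written as follows:
$P_{01}^{Fi} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$
If we write $L = \frac{\sum p_1 q_0}{\sum p_0 q_0}$ and $P = \frac{\sum p_1 q_1}{\sum p_0 q_1}$, the Fisher's Ideal Index can be written as
$P_{01}^{Fi} = \sqrt{L \times P} \times 100$.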
4.
Dorbish and Bowley's Index: This index number is constructed by taking the arithmetic
mean of Laspeyres' and Paasche's indices.
$P_{01}^{DB} = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 + \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100\right] = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right] \times 100 = \frac{1}{2}[L + P] \times 100$
5.
Marshall and Edgeworth's Index: This index number uses arithmetic mean of base and
current year quantities.
$P_{01}^{ME} = \frac{\sum p_1 \left(\frac{q_0 + q_1}{2}\right)}{\sum p_0 \left(\frac{q_0 + q_1}{2}\right)} \times 100 = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$
6.
Walsh's Index: Geometric mean of base and current year quantities is used as weights in
this index number.
$P_{01}^{Wa} = \frac{\sum p_1 \sqrt{q_0 q_1}}{\sum p_0 \sqrt{q_0 q_1}} \times 100$
7.
Kelly's Fixed Weights Aggregative Index: The weights, in this index number, are quantities
which may not necessarily relate to base or current year. The weights, once decided,
remain fixed for all periods. The main advantage of this index over Laspeyres' index is
that weights do not change with change of base year. Using symbols, the Kelly's Index can
be written as
$P_{01}^{Ke} = \frac{\sum p_1 q}{\sum p_0 q} \times 100$
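The seven weighted aggregative formulae differ only in the quantities used as weights, which a small Python sketch makes explicit. The function names and the three-item data set below are hypothetical, chosen only to illustrate the formulae; they do not come from the text.

```python
from math import sqrt

def aggregate(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres(p0, p1, q0, q1):
    return aggregate(p1, q0) / aggregate(p0, q0) * 100   # base year weights

def paasche(p0, p1, q0, q1):
    return aggregate(p1, q1) / aggregate(p0, q1) * 100   # current year weights

def fisher(p0, p1, q0, q1):
    # Geometric mean of Laspeyres' and Paasche's indices
    return sqrt(laspeyres(p0, p1, q0, q1) * paasche(p0, p1, q0, q1))

def dorbish_bowley(p0, p1, q0, q1):
    # Arithmetic mean of Laspeyres' and Paasche's indices
    return (laspeyres(p0, p1, q0, q1) + paasche(p0, p1, q0, q1)) / 2

def marshall_edgeworth(p0, p1, q0, q1):
    w = [a + b for a, b in zip(q0, q1)]                  # the factor 1/2 cancels
    return aggregate(p1, w) / aggregate(p0, w) * 100

def walsh(p0, p1, q0, q1):
    w = [sqrt(a * b) for a, b in zip(q0, q1)]            # geometric mean weights
    return aggregate(p1, w) / aggregate(p0, w) * 100

def kelly(p0, p1, q):
    return aggregate(p1, q) / aggregate(p0, q) * 100     # fixed weights q

# Hypothetical data for three items (not from the text)
p0, q0 = [2, 5, 4], [8, 10, 14]
p1, q1 = [4, 6, 5], [6, 5, 10]

L_index = laspeyres(p0, p1, q0, q1)
P_index = paasche(p0, p1, q0, q1)
F_index = fisher(p0, p1, q0, q1)
print(round(L_index, 2), round(P_index, 2), round(F_index, 2))
```

Writing the indices as functions of the same four lists shows that Kelly's index reduces to Laspeyres' index when the fixed weights happen to be the base year quantities.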
Example: Calculate the weighted aggregative price index for 1990 from the following data:

Item              A      B      C      D      E      F
Price in 1971     8     12    6.5      4      6      2
Price in 1990   9.5   12.5      9    4.5      7      4
Weights           5      1      3      6      4      3
Solution:
Calculation Table

Item    Price in 1971 (p0)   Price in 1990 (p1)   Weights (w)    p0w     p1w
A              8                   9.5                5          40.0    47.5
B             12                  12.5                1          12.0    12.5
C              6.5                 9                  3          19.5    27.0
D              4                   4.5                6          24.0    27.0
E              6                   7                  4          24.0    28.0
F              2                   4                  3           6.0    12.0
Total                                                           125.5   154.0

Price Index (1971 = 100) $= \frac{\sum p_1 w}{\sum p_0 w} \times 100 = \frac{154.0}{125.5} \times 100 = 122.71$
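The same fixed-weight calculation can be reproduced in Python. This is a minimal sketch using the figures of the example; the variable names are assumptions.

```python
p0 = [8, 12, 6.5, 4, 6, 2]       # prices in 1971 (base year)
p1 = [9.5, 12.5, 9, 4.5, 7, 4]   # prices in 1990 (current year)
w = [5, 1, 3, 6, 4, 3]           # fixed weights

sum_p0w = sum(p * wi for p, wi in zip(p0, w))
sum_p1w = sum(p * wi for p, wi in zip(p1, w))

# Weighted aggregative index with fixed weights
weighted_aggregative = sum_p1w / sum_p0w * 100
print(sum_p0w, sum_p1w, round(weighted_aggregative, 2))
```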
The term within bracket, i.e., 1971 = 100, indicates that base year is 1971.
$\text{Real Wage} = \frac{\text{Money Wage}}{\text{Consumer Price Index}} \times 100$ .... (1)
Another application of the process of deflating is to find the value of output at constant prices so as
to facilitate the comparison of real changes in output. It may be pointed out here that the output
of a given year is often valued at the current year prices. Since prices in various years are often
different, the comparison of output at current year prices has no relevance.
The output at constant prices is obtained using the following formula.
$\text{Output at Constant Prices} = \frac{\text{Output at Current Prices}}{\text{Price Index}} \times 100$ .... (2)
Example: The following table gives the average monthly wages of a worker along with
the respective consumer price index numbers for ten years.
Years                     1980  1981  1982  1983  1984  1985  1986  1987  1988  1989
Average monthly wages      500   525   560   600   630   635   700   740   800   900
Consumer Price Index       100   110   120   125   135   160   185   200   210   240
Solution:
Computation of Real Wages

Years   Average Monthly Wage   Consumer Price Index   Real Average Monthly Wage
1980           500                    100             (500 / 100) × 100 = 500.00
1981           525                    110             (525 / 110) × 100 = 477.27
1982           560                    120             (560 / 120) × 100 = 466.67
1983           600                    125             (600 / 125) × 100 = 480.00
1984           630                    135             (630 / 135) × 100 = 466.67
1985           635                    160             (635 / 160) × 100 = 396.88
1986           700                    185             (700 / 185) × 100 = 378.38
1987           740                    200             (740 / 200) × 100 = 370.00
1988           800                    210             (800 / 210) × 100 = 380.95
1989           900                    240             (900 / 240) × 100 = 375.00
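Deflating a series is a one-line operation once the wages and the index are held side by side. The sketch below applies formula (1) to the data of the example; the variable names are illustrative assumptions.

```python
years = list(range(1980, 1990))
money_wage = [500, 525, 560, 600, 630, 635, 700, 740, 800, 900]
cpi = [100, 110, 120, 125, 135, 160, 185, 200, 210, 240]

# Real wage = money wage / consumer price index * 100
real_wage = [wage / index * 100 for wage, index in zip(money_wage, cpi)]

for year, wage in zip(years, real_wage):
    print(year, f"{wage:.2f}")
```

Although the money wage rises every year, the real wage in 1989 is well below its 1980 level because prices rose faster than wages.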
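$\text{Constant Rupee} = \frac{\text{Current Rupee} \times 100}{\text{Price Index}}$, so that $\text{Price Index} = \frac{\text{Current Rupee}}{\text{Constant Rupee}} \times 100$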
Example: Given the following information on the Gross Domestic Product (in crores)
at the constant (1980 - 81) prices and at current prices for five years. Calculate the series of price
index numbers and of quantity index numbers for each of the five years with 1980 - 81 as base
year.
Year       GDP at constant prices   GDP at current prices
1980-81            200                      200
1981-82            150                      240
1982-83            125                      350
1983-84            120                      360
1984-85            160                      400
Solution:
Calculation of Price and Quantity Index Numbers

Year       GDP at constant   GDP at current   Quantity Index              Price Index*
           Prices            Prices           Number Series               Number Series
1980-81        200               200          (200 / 200) × 100 = 100     (200 / 200) × 100 = 100
1981-82        150               240          (150 / 200) × 100 = 75      (240 / 150) × 100 = 160
1982-83        125               350          (125 / 200) × 100 = 62.5    (350 / 125) × 100 = 280
1983-84        120               360          (120 / 200) × 100 = 60      (360 / 120) × 100 = 300
1984-85        160               400          (160 / 200) × 100 = 80      (400 / 160) × 100 = 250

* $\text{Price Index} = \frac{\text{GDP at current prices}}{\text{GDP at constant prices}} \times 100$
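Both index series follow directly from the two GDP columns, as the following Python sketch shows. The variable names are assumptions; the quantity index compares GDP at constant prices with its base year value, and the price index is the ratio of GDP at current prices to GDP at constant prices.

```python
years = ["1980-81", "1981-82", "1982-83", "1983-84", "1984-85"]
gdp_constant = [200, 150, 125, 120, 160]   # at constant (1980-81) prices
gdp_current = [200, 240, 350, 360, 400]    # at current prices

base = gdp_constant[0]

# Quantity index: real output relative to the base year
quantity_index = [g / base * 100 for g in gdp_constant]

# Price index (implicit deflator): current-price value / constant-price value
price_index = [cur / const * 100 for cur, const in zip(gdp_current, gdp_constant)]

for row in zip(years, quantity_index, price_index):
    print(*row)
```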
The concept of deflating can be used to determine the purchasing power or real value of a
rupee.
Self Assessment
Fill in the blanks:
7.
In case of weighted aggregative price index numbers, quantities are often taken
as ................
8.
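$Q = \frac{q_1}{q_0} \times 100$
1.
Simple aggregative quantity index: $Q_{01} = \frac{\sum q_1}{\sum q_0} \times 100$
2.
Simple average of quantity relatives:
(a)
Taking A.M., $Q_{01} = \frac{\sum \frac{q_1}{q_0} \times 100}{n} = \frac{\sum Q}{n}$
(b)
Taking G.M., $Q_{01} = \text{Antilog}\left[\frac{\sum \log Q}{n}\right]$
3.
Weighted aggregative quantity index:
(a)
$Q_{01}^{La} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100$
(b)
$Q_{01}^{Pa} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100$
(c)
$Q_{01}^{Fi} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100$
Other weighted aggregative quantity index numbers can be written in a similar way.
4.
Weighted average of quantity relatives:
(a)
Taking A.M., $Q_{01} = \frac{\sum Qw}{\sum w}$
(b)
Taking G.M., $Q_{01} = \text{Antilog}\left[\frac{\sum w \log Q}{\sum w}\right]$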
                  1974                        1976
Article   Price (Rs)   Value (Rs)     Price (Rs)   Value (Rs)
A              5           50              4           48
B              8           48              7           49
C              6           18              5           20
Solution:
Calculation Table

                    1974                            1976
Article    p0     V0    q0 = V0/p0        p1     V1    q1 = V1/p1      p0q1    p1q0
A           5     50        10             4     48        12           60      40
B           8     48         6             7     49         7           56      42
C           6     18         3             5     20         4           24      15
Total            116                            117                    140      97

$Q_{01}^{Fi} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100 = \sqrt{\frac{140}{116} \times \frac{117}{97}} \times 100 = 120.65$
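The solution can be retraced in Python by first recovering the quantities from the values and prices. The sketch below uses assumed variable names and the data of the example.

```python
from math import sqrt

p0, v0 = [5, 8, 6], [50, 48, 18]    # 1974 prices and values
p1, v1 = [4, 7, 5], [48, 49, 20]    # 1976 prices and values

# Quantity = value / price
q0 = [v / p for v, p in zip(v0, p0)]
q1 = [v / p for v, p in zip(v1, p1)]

def total(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))

# Fisher's quantity index: geometric mean of Laspeyres' and Paasche's quantity indices
q_laspeyres = total(p0, q1) / total(p0, q0)
q_paasche = total(p1, q1) / total(p1, q0)
q_fisher = sqrt(q_laspeyres * q_paasche) * 100

print(round(q_fisher, 2))
```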
Self Assessment
Fill in the blanks:
9.
10.
The formulae for quantity index numbers can be directly written from
................ simply by interchanging the role of price and quantity.
Scope and Coverage: The scope of consumer price index, proposed to be constructed, must
be very clearly defined. This implies the identification of the class of people for whom the
index will be constructed such as industrial workers, agricultural workers, urban wage
earners, etc. Further, it is also necessary to define the coverage of the class of people, i.e.,
the definition of geographical location of their stay such as a city or two or more villages,
etc. The selected class of people should form a homogeneous group so that weights of
various commodities are same for all the people.
2.
Selection of Base Period: A normal period having comparative economic stability should
be selected as a base period in order that the consumption pattern used in the construction
of the index remain practically stable over a fairly long period.
3.
Conducting Family Budget Enquiry: A family budget gives the details of expenditure
incurred by the family on various items in a given period. In order to estimate the
consumption pattern, a sample survey of family budgets of the group of people, for whom
the index is to be constructed, is conducted and from this an average family budget is
prepared. The goods and services that are to be included in the construction of the index
are selected from this average family budget. Efforts should be made to include as many
commodities as possible. Generally the commodities are divided into five broad groups:
(i) Food, (ii) Clothing, (iii) Fuel and Lighting, (iv) House Rent and (v) Miscellaneous.
If necessary, these groups may further be divided into sub-groups. Percentage expenditure
of a group is taken as its weight.
4.
Obtaining Price Quotations: The next step in the construction of consumer price index is
to obtain the retail price quotations of various items that are selected. The price quotations
should be obtained from those markets from which the group of people, for whom the
index number is being constructed, normally make purchases. The quality of various
goods and services used by the group of people should also be kept in mind while obtaining
price quotations.
5.
Computation of the Index Number: After the collection of necessary data, the consumer
price index can be computed by using either of the following formulae.
(a)
Aggregate Expenditure Method: Base year quantities are taken as weights in the
aggregate expenditure method. The formula for the consumer price index is given
by $P_{01}^{CP} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$, which is Laspeyres' formula.
(b)
Family Budget Method: This method is also known as weighted average of price
relatives method and accordingly values are taken as weights. The formula for the
consumer price index is given by $P_{01}^{CP} = \frac{\sum Pw}{\sum w}$, where $P = \frac{p_1}{p_0} \times 100$
Example: From the information given below, construct the consumer price index number
of 1985 by (i) Aggregate Expenditure Method, and (ii) Family Budget Method.
Commodities        A    B    C    D    E    F    G
Quantities (q0)    2   25   10    5   25   40    1
Solution:
Calculation Table

Com.     p0q0     p1q0    P = (p1/p0) × 100    w = p0q0       Pw
A        150      250         166.67             150        25000.5
B        300      400         133.33             300        39999.0
C        120      160         133.33             120        15999.6
D         50       75         150.00              50         7500.0
E        112.5    125         111.11             112.5      12499.9
F        400      480         120.00             400        48000.0
G         25       40         160.00              25         4000.0
Total   1157.5   1530                           1157.5     152999.0

1.
Using the Aggregate Expenditure Method, $P_{01}^{CP} = \frac{1530}{1157.5} \times 100 = 132.18$
2.
Using the Family Budget Method, $P_{01}^{CP} = \frac{152999}{1157.5} = 132.18$
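That the two methods agree is no coincidence: with weights $w = p_0 q_0$, the product $Pw$ equals $100 \times p_1 q_0$. A short Python sketch, working from the value columns of the calculation table (variable names are assumptions), confirms this.

```python
# Value columns taken from the calculation table above
p0q0 = [150, 300, 120, 50, 112.5, 400, 25]
p1q0 = [250, 400, 160, 75, 125, 480, 40]

# (i) Aggregate expenditure method: sum(p1*q0) / sum(p0*q0) * 100
cpi_aggregate = sum(p1q0) / sum(p0q0) * 100

# (ii) Family budget method: weighted A.M. of price relatives with w = p0*q0.
# The price relative p1/p0 * 100 equals (p1*q0) / (p0*q0) * 100 for each item.
relatives = [cur / base * 100 for base, cur in zip(p0q0, p1q0)]
cpi_family_budget = sum(P * w for P, w in zip(relatives, p0q0)) / sum(p0q0)

print(round(cpi_aggregate, 2), round(cpi_family_budget, 2))
```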
A consumer price index is used to determine the real wages from money wages and the
purchasing power of money.
2.
It is also used to determine the dearness allowance to compensate the workers for the rise
in prices.
3.
4.
Example: A particular series of consumer price index covers five groups of items. Between
1975 and 1980 the index rose from 180 to 225. Over the same period the price index numbers of
various groups changed as follows:
Food from 198 to 252; clothing from 185 to 205; fuel and lighting from 175 to 195; miscellaneous
from 138 to 212; house rent remained unchanged at 150.
Given that the weights of clothing, house rent and fuel and lighting are equal, determine the
weights for individual groups of items.
Solution:
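Let w1% be the weight of food, w2% be the weight of miscellaneous group and w% be the weight
of each of the remaining three groups. Therefore we can write w1 + w2 + 3w = 100 or
w2 = 100 - w1 - 3w.
The given data can be written in the form of table as given below:

Groups             Weights           Index in 1975 (I1)   Index in 1980 (I2)
Food               w1                      198                  252
Clothing           w                       185                  205
Fuel & Lighting    w                       175                  195
House Rent         w                       150                  150
Miscellaneous      100 - w1 - 3w           138                  212
Total              100

The index for 1975 is
$\frac{198 w_1 + 185 w + 175 w + 150 w + 138 (100 - w_1 - 3w)}{100} = 180$ (given) .... (1)
Similarly, the index for 1980 is
$\frac{252 w_1 + 205 w + 195 w + 150 w + 212 (100 - w_1 - 3w)}{100} = 225$ (given) .... (2)
On simplification, equation (1) becomes 60w1 + 96w = 4200 and equation (2) becomes
40w1 - 86w = 1300. Solving these simultaneously, we get w = 10 and w1 = 54, so that
w2 = 100 - 54 - 30 = 16.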
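The two conditions form a pair of linear equations in w1 and w, which can be solved mechanically. The sketch below is an illustration under assumed names: it reduces each condition to the form a·w1 + b·w = c and applies Cramer's rule with exact fractions.

```python
from fractions import Fraction

groups = ["food", "clothing", "fuel", "rent", "misc"]
index_1975 = dict(zip(groups, [198, 185, 175, 150, 138]))
index_1980 = dict(zip(groups, [252, 205, 195, 150, 212]))

def linear_equation(idx, overall):
    """Reduce sum(I * weight) / 100 = overall to a*w1 + b*w = c,
    using w_misc = 100 - w1 - 3w (w1: food, w: each equal-weight group)."""
    a = idx["food"] - idx["misc"]
    b = idx["clothing"] + idx["fuel"] + idx["rent"] - 3 * idx["misc"]
    c = 100 * (overall - idx["misc"])
    return a, b, c

a1, b1, c1 = linear_equation(index_1975, 180)   # overall index in 1975
a2, b2, c2 = linear_equation(index_1980, 225)   # overall index in 1980

# Cramer's rule for the 2 x 2 system
det = a1 * b2 - a2 * b1
w_food = Fraction(c1 * b2 - c2 * b1, det)
w_equal = Fraction(a1 * c2 - a2 * c1, det)
w_misc = 100 - w_food - 3 * w_equal

print(w_food, w_equal, w_misc)
```

Substituting the weights back into either condition reproduces the given overall index, which is a useful check on the algebra.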
Self Assessment
Fill in the blanks:
11.
12.
Definition of the purpose: Since it is possible to construct index numbers for a number of
purposes and one cannot have an all purpose index, therefore, it is very essential to define
the specific purpose of its construction. For example, if we are interested in the construction
of a price index number, we must have knowledge about the purpose to be served by it,
i.e., what is to be measured by it; like the cost of living of workers or the change in
wholesale prices, etc. In the absence of this information, it may be difficult to carry out
various steps in the construction of an index number. The questions like what are items to
be included, from which of the markets the price quotations are to be obtained, what will
be the weights of different items, etc., cannot be answered unless the purpose of the index
number construction is known. Further, an index number can be of sensitive or general
nature. In case of sensitive index, only those items are included whose variables (like
prices in case of price index) fluctuate very often; while efforts are made to include as
many items as possible when the index is of general nature. It may be pointed out that the
index numbers are specialised tools and as such are more useful and efficient when properly
used. The first step in this direction is a specific definition of the purpose of its construction.
2.
Selection of the base period: Every index number is constructed with reference to a base
period. There are two important points that must be kept in mind while selecting the base
period of an index number.
(a)
The base period should correspond to a period of relative economic and political
stability, i.e., it should be a normal or representative period in some way. In certain
situations where identification of such a period is not possible, the average of certain
periods can also be taken as base.
(b)
The comparison of current period with a remote base doesn't have much relevance.
In the words of Morris Hamburg, "It is desirable that the base period be not too far
away in time from the present. The further away we move from the base period the
dimmer are our recollections of economic conditions prevailing at that time.
Consequently, comparisons with these remote base periods tend to lose significance
and become rather tenuous in meaning."
Another problem with a remote base period can be that certain items that were in
use in the base period are no longer in use while certain new items are in use in
current period. In such a situation the two item bundles are no longer homogeneous
and comparable. This problem is less likely to occur when fairly recent period is
chosen as base.
Caution: The base period should not be too distant from the current period.
3.
Selection of number and type of items: An index number of a particular group of items is
in fact based on a sample of items taken from it. It is neither possible nor necessary to
include all the items of the group in the construction of an index number. The number of
items to be included depends largely upon the purpose of the index number.
There are no hard and fast rules that can be laid down with regard to the selection of the
number of items. However, it must be remembered that the larger the number of items, the
more representative the index number will be, but also the more cumbersome the task of
computation. Therefore, it is necessary to strike a balance between having a
representative index and the work of computation involved in its construction.
The following points should be kept in mind in selecting the type of items:
(a)
The items should be representative of the tastes, habits and customs of the people
for whom the index is to be constructed.
(b)
The selected items should be of stable quality. The standardised items should be
given preference.
(c)
As far as possible, the non-tangible items like personal services, goodwill, etc.,
should be excluded because it is difficult to ascertain their value.
4.
Collection of data: The next important step in the construction of an index number is the
collection of data. For example, for the construction of price index, price quotations are to
be obtained. Since the prices of commodities may vary from one market to another and in
certain cases from one shop to another, it is necessary to select those markets which are
representative in the sense that the group under consideration generally make purchases
from these markets. The next logical step is to select an agency through which price
quotations are to be obtained. The selected agency should be highly reliable and if necessary
the accuracy of price quotations reported by it may also be checked by appointing some
other agency or agencies. Furthermore, care should always be taken to obtain price
quotations for the same quality of items.
Similar type of considerations are necessary for the collection of data for the construction
of index numbers such as quantity index, value index, unemployment index, etc.
5.
Selection of a suitable average: Since the index numbers are also averages, any of the five
averages, viz. arithmetic mean, median, mode, geometric mean and harmonic mean can
be used in its construction. However, since in most of the situations we have to average
ratios of the values in current period to that in base period, geometric mean is the most
suitable average in the construction of index numbers. The main difficulty of using the
geometric mean is the complexities of its computations and hence, the use of arithmetic
mean is more popular in spite of its being less suitable.
6.
Self Assessment
Fill in the blanks:
13.
14.
The basic purpose of ............................. is to enable each item to have an influence, on the index
number, in proportion to its importance in the group.
The computation of an index number is based on the data obtained from a sample, which
may not be a true representative of the universe.
2.
The composition of the bundle of commodities may be different for different years. This
cannot be taken into account by the fixed base method. Although this difficulty can be
overcome by the use of chain base index numbers, their calculations are quite cumbersome.
3.
An index number doesn't take into account the quality of the items. Since a superior item
generally has a higher price, an increase in the index may be due to an improvement in
the quality of the items and not due to a rise in prices.
4.
Index numbers are specialised averages and as such these also suffer from all the limitations
of an average.
5.
An index number can be computed by using a number of formulae and different formulae
will give different results. Unless a proper method is used, the results are likely to be
inaccurate and misleading.
6.
By the choice of a wrong base period or weighing system, the results of the index number
can be manipulated and, thus, are likely to be misused.
Self Assessment
Fill in the blanks:
15.
An index number doesn't take into account the ............................. of the items.
16.
11.9 Summary
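An index number is a device for comparing the general level of magnitude of a group of
distinct, but related, variables in two or more situations.
Simple average of price relatives:
$$P_{01} = \frac{\sum \frac{p_1}{p_0} \times 100}{n} \quad \text{(using A.M.)}$$
$$P_{01} = \text{Antilog}\left[\frac{\sum \log\left(\frac{p_1}{p_0} \times 100\right)}{n}\right] \quad \text{(using G.M.)}$$
Simple aggregative index:
$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$
Weighted average of price relatives:
$$P_{01} = \frac{\sum Pw}{\sum w} \quad \text{(using A.M.)}$$
$$P_{01} = \text{Antilog}\left[\frac{\sum w \log P}{\sum w}\right] \quad \text{(using G.M.)}$$
Here $P = \frac{p_1}{p_0} \times 100$ and $w$ denotes values (weights).
Weighted aggregative index numbers:
(a)
Laspeyres's index: $P_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$
(b)
Paasche's index: $P_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$
(c)
Fisher's ideal index: $P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$
(d)
Dorbish and Bowley's index: $P_{01}^{DB} = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right] \times 100$
(e)
Marshall and Edgeworth's index: $P_{01}^{ME} = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$
(f)
Walsh's index: $P_{01}^{Wa} = \frac{\sum p_1 \sqrt{q_0 q_1}}{\sum p_0 \sqrt{q_0 q_1}} \times 100$
(g)
Kelly's fixed weights index: $P_{01}^{Ke} = \frac{\sum p_1 q}{\sum p_0 q} \times 100$
$$\text{Real Wage} = \frac{\text{Money Wage}}{\text{C.P.I.}} \times 100$$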
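The weighted aggregative formulae can be checked numerically. The following is a minimal sketch in Python; the function name `price_indices` and the three-item data set are hypothetical and are used only for illustration.

```python
from math import sqrt

def price_indices(p0, q0, p1, q1):
    """Return Laspeyres's, Paasche's and Fisher's price indices (base = 100)."""
    sum_p1q0 = sum(p * q for p, q in zip(p1, q0))
    sum_p0q0 = sum(p * q for p, q in zip(p0, q0))
    sum_p1q1 = sum(p * q for p, q in zip(p1, q1))
    sum_p0q1 = sum(p * q for p, q in zip(p0, q1))
    laspeyres = sum_p1q0 / sum_p0q0 * 100   # base-year quantities as weights
    paasche = sum_p1q1 / sum_p0q1 * 100     # current-year quantities as weights
    fisher = sqrt(laspeyres * paasche)      # geometric mean of the two
    return laspeyres, paasche, fisher

# Hypothetical prices and quantities for three items (not from the text).
p0, q0 = [10, 20, 5], [4, 3, 10]    # base year
p1, q1 = [12, 25, 6], [5, 2, 12]    # current year

laspeyres, paasche, fisher = price_indices(p0, q0, p1, q1)
print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```

Since Fisher's index is the geometric mean of the other two, it always lies between them.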
11.10 Keywords
Base Year: The year from which comparisons are made is called the base year. It is commonly
denoted by writing 0 as a subscript of the variable.
Consumer Price: It is the price at which the ultimate consumer purchases his goods and services
from the retailer.
Current Year: The year under consideration for which the comparisons are to be computed is
called the current year. It is commonly denoted by writing 1 as a subscript of the variable.
Index Number: An index number is a statistical measure used to compare the average level of
magnitude of a group of distinct but related variables in two or more situations.
Quantity Index Number: Index number that measures the change in quantities in current year as
compared with a base year.
1.
Construct Laspeyres's, Paasche's and Fisher's indices from the following data:
             1986                           1987
Item   Price (Rs)   Expenditure (Rs)   Price (Rs)   Expenditure (Rs)
1      10           60                 15           75
2      12           120                15           150
3      18           90                 27           81
4      8            40                 12           48
2.
From the following data, prove that Fisher's Ideal Index satisfies both the time reversal
and the factor reversal tests.
Base Year
Current Year
Examine various steps and problems involved in the construction of an index number.
4.
Distinguish between average type and aggregative type of index numbers. Discuss the
nature of weights used in each case.
5.
6.
116.40
125.08
135.40
138.10
127.4
138.2
143.5
149.8
(a)
(b)
In which year did the employees have the greatest buying power?
(c)
What percentage increase in the average weekly wages for the year 1973 is required
to provide the same buying power that the employees enjoyed in the year in which
they had the highest real wages?
Construct Consumer Price Index for the year 1981 with 1971 as the base year.
Items                 :  Food   Rent   Clothes   Fuel   Others
Percentage Expenses   :  35%    15%    20%       10%    20%
Value Index (1971)    :  150    50     100       20     60
Value Index (1981)    :  174    60     125       25     90
7.
Compute consumer price index number from the following data by aggregate expenditure.
Commodity
Quantities
consumed in
base year
Units in
which prices
are quoted
kgs
Prices in
Prices in
base year current year
Wheat
400
/quintal
350
400
Rice
2 quin tals
/quintal
580
700
Gram
100
/quintal
740
950
Pulses
2 quin tals
980
1200
Ghee
50 kgs
/quintal
/ kg .
70
85
Sugar
50 kgs
/ kg .
Fire wood
5 quintals
/quin tal
House Rent
1 house
/house
kgs
Notes
11
50
60
1600
1800
8.
A textile worker in the city of Ahmedabad earns 750 per month. The cost of living index
for January 1986 is given as 160. Using the following data find out the amounts he spends
on (i) Food and (ii) Rent.
9.
"In the construction of index numbers the advantages of geometric mean are greater than
those of arithmetic mean". Discuss.
10.
Show that the Laspeyres's index has an upward bias and the Paasche's index has a downward
bias. Under what conditions the two index numbers will be equal?
homogeneous
2.
weighted
3.
barometers
4.
level of an activity
5.
base year
6.
Current year
7.
weights
8.
price
9.
10.
11.
cost of living
12.
real wages
13.
base
14.
weighing
15.
quality
16.
different
Books
12.1.2
Significance Level
12.3.2
One-way ANOVA
12.5.2
Two-way ANOVA
12.6.2
12.6.3
K Sample Test
12.7 Summary
12.8 Keywords
12.9 Review Questions
12.10 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
A statistical hypothesis test is a method of making statistical decisions using experimental data.
In statistics, a result is called statistically significant if it is unlikely to have occurred by chance.
The phrase "test of significance" was coined by Ronald Fisher: "Critical tests of this kind may be
called tests of significance, and when such tests are available we may discover whether a second
sample is or is not significantly different from the first."
Formulate the null hypothesis, $H_0$, and the alternate hypothesis, $H_A$. According to the
given problem, $H_0$ represents the value of some parameter of population.
2.
3.
4.
5.
6.
If the calculated value lies within the critical region, then reject $H_0$.
7.
Null hypothesis
2.
Alternate hypothesis
Let us assume that the mean of the population is $\mu_0$ and the mean of the sample is $\bar{x}$. Since we
have assumed that the population has a mean of $\mu_0$, this is our null hypothesis. We write this as
$H_0: \mu = \mu_0$, where $H_0$ is the null hypothesis. The alternate hypothesis is $H_A: \mu \neq \mu_0$. The rejection of
the null hypothesis will show that the mean of the population is not $\mu_0$. This implies that the
alternate hypothesis is accepted.
Example:
1.
In a right side test, the critical region lies entirely in the right tail of the sample distribution.
Whether the test is one-sided or two-sided depends on alternate hypothesis.
2.
A tyre company claims that mean life of its new tyre is 15,000 km. Now the researcher
formulates the hypothesis that tyre life is = 15,000 km.
A two-tailed test is one in which the values of the test statistic leading to rejection of the null
hypothesis fall in both tails of the sampling distribution curve. A one-tailed test is used when
the researcher's interest is primarily on one side of the issue.
Example: "Is the current advertisement less effective than the proposed new advertisement"?
A two-tailed test is appropriate, when the researcher has no reason to focus on one side of the
issue.
Example:
1.
"Are the two markets - Mumbai and Delhi different to test market a product?"
2.
$H_0: \mu_1 = \mu_2$
Sign of alternate hypothesis    Type of test
$\neq$                          Two-sided
<                               One-sided to left
>                               One-sided to right
Degree of Freedom
It tells the researcher the number of elements that can be chosen freely.
Example: (a + b)/2 = 5. If we fix a = 3, then b has to be 7.
Therefore, the degree of freedom is 1.
Compute
Carry out computation.
Make Decisions
Accepting or rejecting of the null hypothesis depends on whether the computed value falls in the
region of rejection at a given level of significance.
Task Discuss when you would prefer a two-tailed test to a one-tailed test.
Self Assessment
Fill in the blanks:
1.
2.
The confidence with which a null hypothesis is accepted or rejected depends upon the
.............................
3.
The rejection of null hypothesis means that the ............................. hypothesis is accepted.
2.
(1) is called Type 1 error ($\alpha$), (2) is called Type 2 error ($\beta$). When $\alpha$ = 0.10 it means that a true
hypothesis will be accepted in 90 out of 100 occasions. Thus, there is a risk of rejecting a true
hypothesis in 10 out of every 100 occasions. To reduce the risk, use $\alpha$ = 0.01 which implies that we
are prepared to take a 1% risk i.e., the probability of rejecting a true hypothesis is 1%. It is also
possible that in hypothesis testing, we may commit Type 2 error ($\beta$) i.e., accepting a null hypothesis
which is false.
Notes The only way to reduce both Type 1 and Type 2 errors at the same time is to increase the sample size.
Correct decision
Correct decision
When the firm has failed to reward a competent retailer, it has committed type-2 error. On the
other hand, when it has rewarded an incompetent retailer, it has committed type-1 error.
Self Assessment
Fill in the blanks:
4.
5.
Parametric tests are more powerful. The data in this test is derived from interval and ratio
measurement.
2.
In parametric tests, it is assumed that the data follows normal distributions. Examples of
parametric tests are
(a)
Z-Test,
(b)
T-Test and
(c)
F-Test.
3.
Observations must be independent, i.e., selection of any one item should not affect the
chances of any other item being included in the sample.
Did u know?
Univariate
If we wish to analyse one variable at a time, this is called univariate analysis. Example:
Effect of price on sales. Here, price is an independent variable and sales is a dependent
variable. Change the price and measure the sales.
Bivariate
The relationship of two variables at a time is examined by means of bivariate data analysis.
If one is interested in a problem of detecting whether a parameter has either increased or
decreased, a two-sided test is appropriate.
z Test
1.
Example: You are working as a purchase manager for a company. The following
information has been supplied by two scooter tyre manufacturers.
Company A
Company B
13000
12000
340
388
Sample size
100
100
In the above, the sample size is 100, hence a Z-test may be used.
2.
Testing the hypothesis about difference between two means: This can be used when two
population means are given and null hypothesis is $H_0: P_1 = P_2$.
Example: In a city during the year 2000, 20% of households indicated that they read Femina
magazine. Three years later, the publisher had reasons to believe that circulation has gone up. A
survey was conducted to confirm this. A sample of 1,000 respondents were contacted and it was
found 210 respondents confirmed that they subscribe to the periodical 'Femina'. From the above,
can we conclude that there is a significant increase in the circulation of 'Femina'?
Solution:
We will set up null hypothesis and alternate hypothesis as follows:
Null Hypothesis is $H_0: p = 20\%$
Alternate Hypothesis is $H_A: p > 20\%$
This is a one-tailed (right) test.
$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{\dfrac{210}{1000} - 0.20}{\sqrt{\dfrac{0.20\,(1 - 0.20)}{1000}}} = \frac{0.21 - 0.20}{\sqrt{\dfrac{0.2 \times 0.8}{1000}}}$$
$$= \frac{0.01}{\sqrt{\dfrac{0.16}{1000}}} = \frac{0.01}{0.4 / 31.62} = \frac{0.01}{0.01265} = 0.79$$
The value of Z at the 0.05 level is 1.64. As the calculated value of Z (0.79) is smaller than this, it
does not fall in the rejection region. We therefore do not reject the null hypothesis, and conclude
that the sample does not show a significant increase in the circulation of 'Femina'.
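The arithmetic of this test is easy to get wrong by a factor of ten, so it is worth checking by machine. The sketch below assumes the figures stated in the example (210 readers out of 1,000, hypothesised proportion 0.20); the function name `one_proportion_z` is illustrative only.

```python
from math import sqrt

def one_proportion_z(successes, n, p0):
    """Z statistic for testing a sample proportion against a hypothesised value p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

z = one_proportion_z(210, 1000, 0.20)

# One-tailed (right) critical value of Z at the 5% level.
Z_CRITICAL = 1.645
reject_h0 = z > Z_CRITICAL

print(round(z, 2), reject_h0)
```

With these figures the statistic is about 0.79, which is below the critical value, so the null hypothesis is not rejected.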
A certain pesticide is packed into bags by a machine. A random sample of 10 bags are
drawn and their contents are found as follows: 50, 49, 52, 44, 45, 48, 46, 45, 49, 45. Confirm
whether the average packaging can be taken to be 50 kgs.
In this test, the sample size is less than 30 and the population standard deviations are not
known. Using this test, we can find out if there is any significant difference between the
two means, i.e. whether the two population means are equal.
2.
There are two nourishment programmes 'A' and 'B'. Two groups of children are subjected
to this. Their weight is measured after six months. The first group of children subjected to
the programme 'A' weighed 44, 37, 48, 60, 41 kgs. at the end of programme. The second
group of children were subjected to nourishment programme 'B' and their weight was 42,
42, 58, 64, 64, 67, 62 kgs. at the end of the programme. From the above, can we conclude that
nourishment programme 'B' increased the weight of the children significantly at the 5%
level of significance?
Null Hypothesis: There is no significant difference between Nourishment programme 'A' and
'B'.
Alternative Hypothesis: Nourishment programme 'B' is better than 'A', i.e. Nourishment
programme 'B' increases the children's weight significantly.
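Solution:
Nourishment programme A                        Nourishment programme B
x      (x - 46)    (x - 46)^2                  y      (y - 57)    (y - 57)^2
44     -2          4                           42     -15         225
37     -9          81                          42     -15         225
48     2           4                           58     1           1
60     14          196                         64     7           49
41     -5          25                          64     7           49
                                               67     10          100
                                               62     5           25
230    0           310                         399    0           674
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Here $n_1 = 5$, $n_2 = 7$, $\sum x = 230$, $\sum y = 399$, $\sum (x - \bar{x})^2 = 310$, $\sum (y - \bar{y})^2 = 674$
$$\bar{x} = \frac{\sum x}{n_1} = \frac{230}{5} = 46, \qquad \bar{y} = \frac{\sum y}{n_2} = \frac{399}{7} = 57$$
$$s^2 = \frac{1}{n_1 + n_2 - 2}\left\{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right\}$$
D.F. = $(n_1 + n_2 - 2) = (5 + 7 - 2) = 10$
$$s^2 = \frac{1}{10}\{310 + 674\} = 98.4$$
$$t = \frac{46 - 57}{\sqrt{98.4\left(\dfrac{1}{5} + \dfrac{1}{7}\right)}} = \frac{-11}{\sqrt{98.4 \times \dfrac{12}{35}}} = \frac{-11}{\sqrt{33.73}} = \frac{-11}{5.8} = -1.89, \qquad |t| = 1.89$$
t at 10 d.f. at 5% level is 1.81.
Since the calculated |t| is greater than 1.81, it is significant. Hence $H_A$ is accepted. Therefore the two
nutrition programmes differ significantly with respect to weight increase.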
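The pooled-variance calculation can be reproduced with a few lines of Python. This is a minimal sketch using the weights given in the example; the function name `pooled_t` is an assumption made for illustration.

```python
from math import sqrt

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    mean_x, mean_y = sum(x) / n1, sum(y) / n2
    ss_x = sum((v - mean_x) ** 2 for v in x)   # sum of squared deviations, sample 1
    ss_y = sum((v - mean_y) ** 2 for v in y)   # sum of squared deviations, sample 2
    df = n1 + n2 - 2
    s2 = (ss_x + ss_y) / df                    # pooled variance
    t = (mean_x - mean_y) / sqrt(s2 * (1 / n1 + 1 / n2))
    return t, df, s2

programme_a = [44, 37, 48, 60, 41]
programme_b = [42, 42, 58, 64, 64, 67, 62]

t, df, s2 = pooled_t(programme_a, programme_b)
print(round(t, 2), df, round(s2, 1))
```

The sign of t is negative only because programme 'A' is entered first; its absolute value is what is compared with the table value.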
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
$r = 0.52$, $n = 18$
$$t = 0.52\sqrt{\frac{18 - 2}{1 - (0.52)^2}} = \frac{0.52 \times 4}{0.854} = 2.44$$
$\nu = (n - 2) = (18 - 2) = 16$
For $\nu = 16$, $t_{0.05} = 2.12$
The calculated value of t is greater than the table value. The given value of r is significant.
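The same test of a correlation coefficient can be sketched in Python as follows. The function name `correlation_t` is hypothetical; the inputs are the values of r and n used above.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for testing whether a correlation coefficient differs from zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))

t_r = correlation_t(0.52, 18)
degrees_of_freedom = 18 - 2

# Two-tailed table value of t at the 5% level for 16 d.f., as quoted in the text.
T_TABLE = 2.12
is_significant = t_r > T_TABLE

print(round(t_r, 2), degrees_of_freedom, is_significant)
```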
F-Test
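Let there be two independent random samples of sizes $n_1$ and $n_2$ from two normal populations
with variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Further, let
$$s_1^2 = \frac{1}{n_1 - 1}\sum (X_{1i} - \bar{X}_1)^2 \quad \text{and} \quad s_2^2 = \frac{1}{n_2 - 1}\sum (X_{2i} - \bar{X}_2)^2$$
be the variances of the first sample and the second sample respectively.
Then F-statistic is defined as the ratio of two $\chi^2$-variates, each divided by its degrees of freedom. Thus, we can write
$$F = \frac{\chi^2_{n_1 - 1}/(n_1 - 1)}{\chi^2_{n_2 - 1}/(n_2 - 1)} = \frac{\dfrac{(n_1 - 1)s_1^2}{\sigma_1^2}\Big/(n_1 - 1)}{\dfrac{(n_2 - 1)s_2^2}{\sigma_2^2}\Big/(n_2 - 1)} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$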
Features of F- distribution
1.
2.
The mean of the F-variate with $v_1$ and $v_2$ degrees of freedom is $\dfrac{v_2}{v_2 - 2}$ and standard error is
$$\frac{v_2}{v_2 - 2}\sqrt{\frac{2(v_1 + v_2 - 2)}{v_1(v_2 - 4)}}$$
We note that the mean will exist if $v_2 > 2$ and standard error will exist if $v_2 > 4$. Further, the
mean > 1.
3.
The random variate F can take only positive values from 0 to $\infty$. The curve is positively
skewed.
4.
5.
If a random variate follows t-distribution with $\nu$ degrees of freedom, then its square
follows F-distribution with 1 and $\nu$ d.f. i.e. $t_\nu^2 = F_{1,\nu}$
6.
$F_{v_1, v_2} = \dfrac{\chi^2_{v_1}}{v_1}$ as $v_2 \to \infty$
Figure 12.2: Curves of p(F) for the F-distribution with degrees of freedom (40, 40), (30, 30) and (10, 10)
Self Assessment
Fill in the blanks:
6.
7.
8.
Caution One case where the distribution of the test statistic is an exact chi-square
distribution is the test that the variance of a normally-distributed population has a given
value based on a sample variance. Such a test is uncommon in practice because values of
variances to test against are seldom known exactly.
Sample observations should be independent, i.e. no individual item should be included
twice in a sample.
2.
3.
There should be a minimum of five observations in any cell. This is called cell frequency
constraint.
Persons    Under 20    20-40    41-50    51 & Over    Total
…          146         78       48       28           300
…          54          52       32       62           200
Total      200         130      80       90           500
Is there any significant difference between the age group and preference for the car?
Example: A company marketing tea claims that 70% of population in a metro drinks a
particular brand (Wood Smoke) of tea. A competing brand challenged this claim. They took a
random sample of 200 families to gather data. During the study period, it was found that 130
families were using this brand of tea. Will it be correct on the part of competitor to conclude that
the claim made by the company does not hold good at 5% level of significance?
Solution:
Hypothesis $H_0$: The proportion of people who drink Wood Smoke brand is 70%.
$H_1$: The proportion of people who drink Wood Smoke brand is not 70%.
If the hypothesis is true then number of consumers who drink this particular brand is 200 × 0.7
= 140.
Those who do not drink that brand are 200 × 0.3 = 60
Degree of freedom = 2 - 1 = 1, since there are two groups.
Group                     Observed (O)   Expected (E)   O-E    (O-E)^2   (O-E)^2/E
Drink the brand           130            140            -10    100       0.714
Do not drink the brand    70             60             +10    100       1.667
Total                     200            200                             2.381
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 2.381$$
At the 0.05 level of significance, the table value of $\chi^2$ for 1 d.f. is 3.841. The calculated value,
2.381, is lower. Therefore, we accept the hypothesis that 70% of the people in that metro drink Wood
Smoke branded tea.
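The chi-square statistic for this goodness-of-fit problem can be verified with a short Python sketch. The helper name `chi_square_statistic` is illustrative; the observed and expected frequencies are those of the example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n_families = 200
claimed_share = 0.70

observed = [130, n_families - 130]                  # drink / do not drink the brand
expected = [n_families * claimed_share,
            n_families * (1 - claimed_share)]       # about 140 and 60

chi2 = chi_square_statistic(observed, expected)

# Table value of chi-square for 1 d.f. at the 5% level, as quoted in the text.
CHI2_TABLE = 3.841
reject_h0 = chi2 > CHI2_TABLE

print(round(chi2, 3), reject_h0)
```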
Self Assessment
Fill in the blanks:
9.
10.
12.5 ANOVA
ANOVA is a statistical technique used to test the equality of three or more sample means.
Based on the means, an inference is drawn as to whether the samples belong to the same population or not.
2.
3.
One-way classification
2.
2.
3.
4.
Compare the value of F obtained above in (3) with the critical value of F such as 5% level
of significance for the applicable degree of freedom.
5.
When the calculated value of F is less than the table value of F, the difference in sample
means is not significant and a null hypothesis is accepted. On the other hand, when the
calculated value of F is more than the critical value of F, the difference in sample means is
considered as significant and the null hypothesis is rejected.
2.
Compare the first year earnings of graduates of half a dozen top business schools.
Price ( )
Total
Sample mean x
39
12
10
11
50
10
44
10
40
49
35
What the manufacturer wants to know is: (1) Whether the difference among the means is
significant? If the difference is not significant, then the sale must be due to chance. (2) Do the
means differ? (3) Can we conclude that the three samples are drawn from the same population
or not?
Example: In a company there are four shop floors. Productivity rate for three methods of
incentives and gain sharing in each shop floor is presented in the following table. Analyze
whether various methods of incentives and gain sharing differ significantly at 5% and 1%
F-limits.
Shop Floor    X1    X2    X3
1             5     4     4
2             6     4     3
3             2     2     2
4             7     6     3
Solution:
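Step 1: Calculate mean of each of the three samples (i.e., $\bar{X}_1$, $\bar{X}_2$ and $\bar{X}_3$, i.e. different methods of
incentive gain sharing).
$$\bar{X}_1 = \frac{5 + 6 + 2 + 7}{4} = 5$$
$$\bar{X}_2 = \frac{4 + 4 + 2 + 6}{4} = 4$$
$$\bar{X}_3 = \frac{4 + 3 + 2 + 3}{4} = 3$$
Step 2: Calculate the mean of the sample means.
$$\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3}{K} = \frac{5 + 4 + 3}{3} = 4$$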
Step 3: Calculate sum of squares (s.s.) for variance between and within the samples.
$$\text{ss between} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + n_3(\bar{x}_3 - \bar{\bar{x}})^2$$
$$\text{ss within} = \sum (x_{1i} - \bar{x}_1)^2 + \sum (x_{2i} - \bar{x}_2)^2 + \sum (x_{3i} - \bar{x}_3)^2$$
Sum of squares (ss) for variance between samples is obtained by taking the deviations of the
sample means from the mean of sample means ($\bar{\bar{x}}$) and by calculating the squares of such deviations,
which are multiplied by the respective number of items or categories in the samples and then by
obtaining their total. Sum of squares (ss) for variance within samples is obtained by taking
deviations of the values of all sample items from corresponding sample means and by squaring
such deviations and then totalling them. For our illustration then
ss between = 4(5 - 4)^2 + 4(4 - 4)^2 + 4(3 - 4)^2 = 4 + 0 + 4 = 8
ss within = (0 + 1 + 9 + 4) + (0 + 0 + 4 + 4) + (1 + 0 + 1 + 0)
= 14 + 8 + 2
= 24
Step 4: ss of total variance, which is equal to the total of ss between and ss within, is denoted by
the formula:
$$\sum (x_{ij} - \bar{\bar{x}})^2$$
where
i = 1, 2, 3, ...
j = 1, 2, 3, ...
For our example, total ss will thus be:
[{(5 - 4)^2 + (6 - 4)^2 + (2 - 4)^2 + (7 - 4)^2} + {(4 - 4)^2 + (4 - 4)^2 + (2 - 4)^2 + (6 - 4)^2}
+ {(4 - 4)^2 + (3 - 4)^2 + (2 - 4)^2 + (3 - 4)^2}]
= {(1 + 4 + 4 + 9) + (0 + 0 + 4 + 4) + (0 + 1 + 4 + 1)}
= 18 + 8 + 6 = 32
We will, however, get the same value if we simply total respective values of ss between and ss
within. For our example, ss between is 8 and ss within is 24, thus ss of total variance is 32 (8 + 24).
Step 5: Ascertain degrees of freedom and mean square (MS) between and within the samples.
Degrees of freedom (df) for between samples and within samples are computed differently as
follows.
For between samples, df is (k-1), where k' represents number of samples (for us it is 3). For
within samples df is (n-k), where 'n' represents total number of items in all the samples (for us
it is 12).
Mean squares (MS) between and within samples are computed by dividing the ss between and
ss within by respective degrees of freedom. Thus for our example:
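(i)
$$\text{MS between} = \frac{\text{ss between}}{(k - 1)} = \frac{8}{2} = 4$$
(ii)
$$\text{MS within} = \frac{\text{ss within}}{(n - k)} = \frac{24}{9} = 2.67$$
Step 6: Calculate the F ratio.
$$F = \frac{\text{MS between}}{\text{MS within}} = \frac{4.00}{2.67} = 1.5$$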
Step 7: Now we will have to analyze whether various methods of incentives and gain sharing
differ significantly at 5% and 1% 'F' limits. For this, we need to compare observed 'F' ratio with
'F' table values. When observed 'F' value at given degrees of freedom is either equal to or less
than the table value, difference is considered insignificant. In reverse cases, i.e., when calculated
'F' value is higher than table-F value, the difference is considered significant and accordingly we
draw our conclusion.
For example, our observed 'F' ratio at degrees of freedom $v_1 = 2$ and $v_2 = 9$ is 1.5. The table
value of F at 5% level with df 2 and 9 ($v_1 = 2$, $v_2 = 9$) is 4.26. Since the table value is higher than
the observed value, difference in rate of productivity due to various methods of incentives and
gain sharing is considered insignificant. At 1% level with df 2 and 9, we get the table value of F
as 8.02 and we draw the same conclusion.
We can now draw an ANOVA table as follows to show our entire observation.
Variation         SS    df                      MS                               F-ratio                    Table value of F
                                                                                                            5%               1%
Between samples   8     (k - 1) = (3 - 1) = 2   ss between/(k - 1) = 8/2 = 4     MS between/MS within       F(v1, v2)        F(v1, v2)
                                                                                 = 4/2.67 = 1.5             = F(2, 9)        = F(2, 9)
                                                                                                            = 4.26           = 8.02
Within samples    24    (n - k) = (12 - 3) = 9  ss within/(n - k) = 24/9 = 2.67
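The seven steps above can be sketched in Python as follows. The function name `one_way_anova` and its return layout are assumptions made for illustration; the data are the productivity rates of the example.

```python
def one_way_anova(samples):
    """Return (ss_between, ss_within, ms_between, ms_within, f_ratio)."""
    k = len(samples)                                 # number of samples
    n = sum(len(s) for s in samples)                 # total number of items
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))
    ss_within = sum((v - m) ** 2
                    for s, m in zip(samples, means) for v in s)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ss_between, ss_within, ms_between, ms_within, ms_between / ms_within

# Productivity rates under the three incentive methods (X1, X2, X3).
methods = [[5, 6, 2, 7], [4, 4, 2, 6], [4, 3, 2, 3]]

ss_b, ss_w, ms_b, ms_w, f_ratio = one_way_anova(methods)
print(ss_b, ss_w, ms_b, round(ms_w, 2), round(f_ratio, 2))
```

The grand mean is computed here from all items; with equal sample sizes it coincides with the mean of the sample means used in Step 2.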
Worker 1
25
26
23
28
Worker 2
23
22
24
27
Worker 3
27
30
26
32
Worker 4
29
34
27
33
Notes
2.
Example: Company X wants its employees to undergo three different types of training
programme with a view to obtain improved productivity from them. After the completion of
the training programme, 16 new employees are assigned at random to three training methods
and the production performance were recorded.
The training manager's problem is to find out whether there are any differences in the
effectiveness of the training methods. The data recorded is as under:
Daily Output of New Employees
Method 1    15    18    19    22    11
Method 2    22    27    18    21    17
Method 3    18    24    19    16    22    15
2.
3.
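Variance between samples $= \dfrac{\sum n_i(\bar{x}_i - \bar{\bar{x}})^2}{k - 1}$
Sample variance $s_i^2 = \dfrac{\sum (x - \bar{x}_i)^2}{n_i - 1}$
5.
6.
Variance within samples $= \sum \left(\dfrac{n_i - 1}{n_T - k}\right) s_i^2$
7.
Calculate the number of degrees of freedom in the numerator of the F ratio using the equation
d.f. = (No. of samples - 1).
8.
Calculate the number of degrees of freedom in the denominator of the F ratio using the equation
d.f. = $\sum n_i - k$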
9.
10.
Draw conclusions.
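Solution:
            Method 1    Method 2    Method 3
            15          22          24
            18          27          19
            19          18          16
            22          21          22
            11          17          15
                                    18
Total       85          105         114
1.
2.
$$\bar{x}_1 = \frac{85}{5} = 17, \quad \bar{x}_2 = \frac{105}{5} = 21, \quad \bar{x}_3 = \frac{114}{6} = 19$$
Grand mean
$$\bar{\bar{x}} = \frac{15 + 18 + 19 + 22 + 11 + 22 + 27 + 18 + 21 + 17 + 24 + 19 + 16 + 22 + 15 + 18}{16} = \frac{304}{16} = 19$$
3.
n    Sample mean    Grand mean    Deviation    Squared deviation    n × squared deviation
5    17             19            -2           4                    5 × 4 = 20
5    21             19            2            4                    5 × 4 = 20
6    19             19            0            0                    6 × 0 = 0
Total                                                               40
$$\text{Variance between samples} = \frac{\sum n_i(\bar{x}_i - \bar{\bar{x}})^2}{k - 1} = \frac{40}{3 - 1} = 20$$
4.
Training method 1             Training method 2             Training method 3
x - mean    squared           x - mean    squared           x - mean    squared
15 - 17     (-2)^2 = 4        22 - 21     (1)^2 = 1         24 - 19     (5)^2 = 25
18 - 17     (1)^2 = 1         27 - 21     (6)^2 = 36        19 - 19     (0)^2 = 0
19 - 17     (2)^2 = 4         18 - 21     (-3)^2 = 9        16 - 19     (-3)^2 = 9
22 - 17     (5)^2 = 25        21 - 21     (0)^2 = 0         22 - 19     (3)^2 = 9
11 - 17     (-6)^2 = 36       17 - 21     (-4)^2 = 16       15 - 19     (-4)^2 = 16
                                                            18 - 19     (-1)^2 = 1
Total       70                Total       62                Total       60
$$\text{Sample variance} = \frac{\sum (x - \bar{x})^2}{n - 1}$$
$$s_1^2 = \frac{70}{5 - 1} = 17.5, \quad s_2^2 = \frac{62}{5 - 1} = 15.5, \quad s_3^2 = \frac{60}{6 - 1} = 12$$
5.
$$\text{Variance within samples} = \sum \left(\frac{n_i - 1}{n_T - k}\right) s_i^2 = \frac{5 - 1}{16 - 3} \times 17.5 + \frac{5 - 1}{16 - 3} \times 15.5 + \frac{6 - 1}{16 - 3} \times 12$$
$$= \frac{4}{13} \times 17.5 + \frac{4}{13} \times 15.5 + \frac{5}{13} \times 12 = \frac{192}{13} = 14.77$$
6.
$$F = \frac{\text{Variance between samples}}{\text{Variance within samples}} = \frac{20}{14.77} = 1.354$$
7.
d.f. of Numerator = (3 - 1) = 2.
8.
d.f. of Denominator = (16 - 3) = 13.
9.
Refer to the F table for 2 and 13 degrees of freedom at the 5% level of significance.
10.
The value is 3.81. This is the upper limit of acceptance region. Since calculated value 1.354
lies within it we can accept $H_0$, the null hypothesis.
Conclusion: There is no significant difference in the effect of the three training methods.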
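Because the three samples here are of unequal size, the within-sample variance is a weighted average of the sample variances. A minimal Python sketch of that formulation is given below; the names `f_ratio_from_variances` and `n_total` are hypothetical.

```python
def f_ratio_from_variances(samples):
    """F ratio computed from between-sample and weighted within-sample variances."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n_total

    # Between-sample variance: weighted squared deviations of sample means.
    var_between = sum(len(s) * (m - grand_mean) ** 2
                      for s, m in zip(samples, means)) / (k - 1)

    # Sample variances, then their weighted average with weights (n_i - 1)/(n_T - k).
    sample_vars = [sum((v - m) ** 2 for v in s) / (len(s) - 1)
                   for s, m in zip(samples, means)]
    var_within = sum((len(s) - 1) / (n_total - k) * sv
                     for s, sv in zip(samples, sample_vars))

    return var_between, var_within, var_between / var_within, sample_vars

training = [
    [15, 18, 19, 22, 11],        # Method 1
    [22, 27, 18, 21, 17],        # Method 2
    [18, 24, 19, 16, 22, 15],    # Method 3
]

var_between, var_within, f_value, sample_vars = f_ratio_from_variances(training)
print(var_between, round(var_within, 2), round(f_value, 3))
```

The result agrees with the sum-of-squares approach, since the weighted average of the sample variances equals ss within divided by its degrees of freedom.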
Example: Let us now frame a problem to study the effects of incentive and gain sharing and
level of technology (independent variables) on productivity rate (dependent variable).
Productivity Rate Data of Workers of M/s. XYZ & Co.
Level of Technology    Method of incentive and gain sharing
                       1     2     3
W                      4     3     3
X                      5     3     2
Y                      1     1     1
Z                      6     5     2
Solution:
1.
2.
$$\text{Correction factor} = \frac{(T)^2}{n} = \frac{36 \times 36}{12} = 108$$
3.
Total ss = (16 + 9 + 9 + 25 + 9 + 4 + 1 + 1 + 1 + 36 + 25 + 4) - 108
= 140 - 108 = 32
4.
ss between columns:
$$= \frac{16 \times 16}{4} + \frac{12 \times 12}{4} + \frac{8 \times 8}{4} - 108 = 64 + 36 + 16 - 108 = 8$$
5.
ss between rows:
$$= \frac{10 \times 10}{3} + \frac{10 \times 10}{3} + \frac{3 \times 3}{3} + \frac{13 \times 13}{3} - 108$$
$$= \frac{100}{3} + \frac{100}{3} + \frac{9}{3} + \frac{169}{3} - 108$$
= [33.33 + 33.33 + 3 + 56.33] - 108
= 126 - 108
= 18 (after adjusting fraction)
6.
ss residual:
= Total ss - (ss between columns + ss between rows)
= 32 - (8 + 18) = 6.
Now we need to set up ANOVA table.
Variation source    SS    d.f.                   M.S.        F ratio      5%               1%
Between columns     8     (c - 1) = 2            8/2 = 4     4/1 = 4      F(2, 6) = 5.14   F(2, 6) = 10.92
Between rows        18    (r - 1) = 3            18/3 = 6    6/1 = 6      F(3, 6) = 4.76   F(3, 6) = 9.78
Residual            6     (c - 1) x (r - 1) = 6  6/6 = 1
From the ANOVA table, we find that differences related to varieties of incentives and gain
sharing are insignificant at 5% level as the calculated F-ratio, i.e., 4 is less than table value of F,
which is 5.14. However differences are significant for different levels of technology at 5% level
as the observed F ratio is higher than table value of F. At 1% level, however, differences are
insignificant.
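The correction-factor method used in this solution can be sketched in Python as below. It assumes, as the sums of squares in the solution imply, that the rows are the four levels of technology (W, X, Y, Z) and the columns are the three methods of incentive and gain sharing; the function name `two_way_anova` is illustrative.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication, using the correction-factor method."""
    r, c = len(table), len(table[0])
    n = r * c
    total = sum(sum(row) for row in table)
    correction = total ** 2 / n                                  # T^2 / n

    ss_total = sum(v ** 2 for row in table for v in row) - correction
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    ss_cols = sum(t ** 2 for t in col_totals) / r - correction   # each column has r items
    ss_rows = sum(sum(row) ** 2 for row in table) / c - correction
    ss_resid = ss_total - ss_cols - ss_rows

    ms_cols = ss_cols / (c - 1)
    ms_rows = ss_rows / (r - 1)
    ms_resid = ss_resid / ((c - 1) * (r - 1))
    return {
        "ss": (ss_total, ss_cols, ss_rows, ss_resid),
        "f_cols": ms_cols / ms_resid,
        "f_rows": ms_rows / ms_resid,
    }

# Rows: levels of technology W, X, Y, Z; columns: three incentive methods.
productivity = [[4, 3, 3], [5, 3, 2], [1, 1, 1], [6, 5, 2]]

result = two_way_anova(productivity)
print([round(v, 2) for v in result["ss"]],
      round(result["f_cols"], 2), round(result["f_rows"], 2))
```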
Self Assessment
Fill in the blanks:
11.
12.
Notes
2.
The hypothesis of non-parametric test is concerned with something other than the value
of a population parameter.
3.
Easy to compute. There are certain situations, particularly in marketing research, where
the assumptions of parametric tests are not valid. Example: In a parametric test, we assume
that data collected follows a normal distribution. When this assumption does not hold,
non-parametric tests are used. Examples of non-parametric tests are Binomial test,
Mann-Whitney U test, Sign test, etc. A binomial test is used when the population has only
two classes such as male, female; buyers, non-buyers; success, failure etc. All observations
made about the population must fall into one of the two classes. The binomial test is used
when the sample size is small.
Advantages
1.
2.
When data are not very accurate, these tests produce fairly good results.
Disadvantage
Non-parametric test involves the greater risk of accepting a false hypothesis and thus committing
a Type 2 error.
10
11
12
Sales
200
250
280
300
320
278
349
268
240
318
220
380
Sign-test
Sign-test is used with matched pairs. The test is used to identify the pairs and decide whether the
pair has more or less similar characteristics.
Branch B
76
77
64
62
53
To find out whether there is any difference in the performance indices of employees of the two
branches.
Kolmogorov-Smirnov Test
This is used for examining the efficacy of fit between observed samples and expected frequency
distribution of data when the variable is in the ordinal scale.
Example: A manufacturer of cosmetics wants to test four different shades of the liquid
foundation compound: very light, light, medium and dark. The company has hired a market
research agency to determine whether any distinct preference exists towards either extreme. If
so, the company will manufacture only the preferred shade, otherwise, the company is planning
to market all shades. Suppose, out of a sample of hundred, 50 preferred the very light shade, 30
liked the light shade, 15 the medium shade, and 5 the dark shade. Do you think the results show any
kind of preference?
Since the shade represents ordering (rank), this test can be used to find the preference.
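A minimal sketch of the test statistic for this example is given below. It rests on two assumptions that are not stated in the text: the dark-shade count is taken as 5, so that the four counts total the stated sample of one hundred, and the critical value is taken from the usual large-sample approximation 1.36/√n at the 5% level. The function name `ks_statistic` is hypothetical.

```python
from math import sqrt

def ks_statistic(observed_counts):
    """Largest absolute gap between observed and expected cumulative proportions,
    where the expected distribution assumes no preference (equal shares)."""
    n = sum(observed_counts)
    k = len(observed_counts)
    largest_gap = 0.0
    running_total = 0
    for i, count in enumerate(observed_counts, start=1):
        running_total += count
        observed_cumulative = running_total / n
        expected_cumulative = i / k
        largest_gap = max(largest_gap,
                          abs(observed_cumulative - expected_cumulative))
    return largest_gap

# Very light, light, medium, dark (dark count of 5 is an assumption, see above).
shade_counts = [50, 30, 15, 5]

d_value = ks_statistic(shade_counts)
d_critical = 1.36 / sqrt(sum(shade_counts))   # assumed large-sample 5% value
preference_exists = d_value > d_critical

print(round(d_value, 2), round(d_critical, 3), preference_exists)
```

Under these assumptions the largest gap occurs at the second shade (0.80 observed against 0.50 expected), which points to a preference for the lighter shades.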
Example: In an assembling unit, three different workers do assembly work in shifts. The
data is tabulated as follows:
Shift No.
Worker-1
Worker-2
Worker-3
25
28
29
31
28
30
35
29
27
33
28
36
35
32
31
31
32
34
Check whether there is any difference in the production quantum of the three workers:
Sample 1                  Sample 2                    Sample 3
Daily wages (Painter)     Daily wages (Carpenter)     Daily wages (Plumber)
64                        72                          51
66                        74                          52
72                        75                          54
74                        78                          56
                          80
Use H-test and state whether the three populations are same or different.
Solution:
$H_0$ - The wages of the three occupations are the same.
$H_1$ - The wages of the three occupations are not the same.
Item     Wage-Painter /day    Rank        Wage-Carpenter /day    Rank        Wage-Plumber /day    Rank
1        64                   5           72                     7.5         51                   1
2        66                   6           74                     9.5         52                   2
3        72                   7.5         75                     11          54                   3
4        74                   9.5         78                     12          56                   4
5                                         80                     13
Total    276                  R1 = 28     379                    R2 = 53     213                  R3 = 10
$n_1 = 4$, $n_2 = 5$, $n_3 = 4$
$n = n_1 + n_2 + n_3 = 4 + 5 + 4 = 13$
$R_1 = 28$, $R_2 = 53$, $R_3 = 10$
$$H = \frac{12}{n(n + 1)} \sum \frac{R_i^2}{n_i} - 3(n + 1)$$
$$H = \frac{12}{13(13 + 1)}\left[\frac{28^2}{4} + \frac{53^2}{5} + \frac{10^2}{4}\right] - 3(13 + 1) = 9.61$$
At 5% level of significance, for d.f. = (3 - 1) = 2, the table value is 5.991. Computed value 9.61 is
greater.
Conclusion: Reject the null hypothesis. The wages of the three occupations are not the same.
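The ranking and the H statistic can be reproduced with the following Python sketch. Tied values receive the average of the ranks they occupy, as in the table above; no further correction for ties is applied, which matches the hand calculation. The function names are illustrative.

```python
def average_ranks(values):
    """Map each distinct value to its rank, averaging the ranks of tied values."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i+1 .. j are tied; they share the mean of those positions.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

def kruskal_wallis_h(samples):
    """Return the H statistic and the rank sum of each sample."""
    pooled = [v for s in samples for v in s]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sums = [sum(ranks[v] for v in s) for s in samples]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(s)
                                 for r, s in zip(rank_sums, samples)) - 3 * (n + 1)
    return h, rank_sums

painter = [64, 66, 72, 74]
carpenter = [72, 74, 75, 78, 80]
plumber = [51, 52, 54, 56]

h_value, rank_sums = kruskal_wallis_h([painter, carpenter, plumber])
print(round(h_value, 2), rank_sums)
```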
Self Assessment
Fill in the blanks:
13.
............................. Test is used to determine whether two independent samples have been
drawn from the same population.
14.
............................. Test is used for examining the efficacy of fit between observed samples
and expected frequency distribution.
15.
16.
Non-parametric tests are used to test the hypothesis with ............................. and
............................. data.
17.
18.
12.7 Summary
Hypothesis testing is the use of statistics to determine the probability that a given
hypothesis is true.
Identify a test statistic that can be used to assess the truth of the null hypothesis.
Compute the P-value, which is the probability that a test statistic at least as significant as
the one observed would be obtained assuming that the null hypothesis were true.
The smaller the P-value, the stronger the evidence against the null hypothesis.
If $P \leq \alpha$, the observed effect is statistically significant, the null hypothesis is ruled out,
and the alternative hypothesis is valid.
12.8 Keywords
Alternate Hypothesis: An alternative hypothesis is one that specifies that the null hypothesis is
not true. The alternative hypothesis is false when the null hypothesis is true, and true when the
null hypothesis is false.
ANOVA: It is a statistical technique used to test the equality of three or more sample means.
Degree of Freedom: It is the consideration that tells the researcher the number of elements that
can be chosen freely.
Null Hypothesis: The null hypothesis is a hypothesis which the researcher tries to disprove,
reject or nullify.
Significance Level: Significance level is the criterion used for rejecting the null hypothesis.
What hypothesis, test and procedure would you use when an automobile company has
manufacturing facility at two different geographical locations? Each location manufactures
two-wheelers of a different model. The customer wants to know if the mileage given by
both the models is the same or not. Samples of 45 numbers may be taken for this purpose.
2.
What hypothesis, test and procedure would you use when a company has 22 sales
executives? They underwent a training programme. The test must evaluate whether the
sales performance is unchanged or improved after the training programme.
3.
What hypothesis, test and procedure would you use in A company has three categories of
managers:
4.
(a)
(b)
(c)
Each person in a random sample of 50 was asked to state his/her sex and preferred colour.
The resulting frequencies are shown below.
Red
Blue
Green
Colour
Sex
Male
14
Female
15
A chi-square test is used to test the null hypothesis that sex and preferred colour are
independent. Will you reject the null hypothesis at the 0.005 level? Why/Why not?
5.
Are all employees equally prone to having accidents? To investigate this hypothesis,
Parry (1985) looked at a light manufacturing plant and classified the accidents by type and
by age of the employee.
Accident Type
Age
Sprain
Burn
Cut
Under 25
17
25 or over
61
13
12
A chi-square test gave a test statistic of 20.78. If we test at α = 0.05, do the proportions of
sprains, cuts and burns seem to be similar for both age classes? Why/why not?
6.
In hypothesis testing, β is the probability of committing an error of Type II. Is the power
of the test, 1 − β, then the probability of rejecting H0 when HA is true, or not? Why?
7.
In a statistical test of hypothesis, what would happen to the rejection region if α, the level
of significance, is reduced?
8.
During the pre-flight check, Pilot Mohan discovers a minor problem - a warning light
indicates that the fuel gauge may be broken. If Mohan decides to check the fuel level by
hand, it will delay the flight by 45 minutes. If he decides to ignore the warning, the aircraft
may run out of fuel before it gets to Mumbai. In this situation, what would be:
(a)
(b)
a type I error?
9.
Can the probability of a Type II error be controlled by the sample size? Why/ why not?
10.
11.
Two samples were drawn from a recent survey, each containing 500 hamlets. In the first
sample, the mean population per hamlet was found to be 100 with a S.D. of 20, while in the
second sample the mean population was 120 with a S.D. 15. Do you find the averages of the
samples to be statistically significant?
12.
A simple random sample of size 100 has a mean of 15, the population variance being 25.
Find an interval estimate of the population mean with a confidence level of (i) 99% and
(ii) 95%.
13.
A population consists of five numbers 2, 3, 6, 8, 11. Consider all possible samples of size
two which can be drawn with replacement from this population. Calculate the S.E. of
sample means.
14.
A certain drug is claimed to be effective in curing colds; half of them were given sugar
pills. The patients' reactions to the treatment are recorded in the following table.
               Helped   Harmed   No effect
Drug             52       10        18
Sugar pills      44       10        26
Test the hypothesis that the drug is no better than the sugar pills for curing colds. (The 5%
value of χ² for ν = 2 is 5.991.)
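The arithmetic behind a chi-square test of this kind can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: it uses the frequencies of the drug and sugar-pill table, the function name chi_square_independence is our own, and the table value 5.991 is the one quoted in the question.

```python
# Observed frequencies from the question: rows = treatment, columns = reaction
# (Helped, Harmed, No effect).
observed = {
    "Drug":        [52, 10, 18],
    "Sugar pills": [44, 10, 26],
}

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table.

    Hypothetical helper written for this illustration only.
    """
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            # Expected frequency under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof

chi2, dof = chi_square_independence(observed)
TABLE_VALUE = 5.991  # 5% value for 2 d.f., as quoted in the question
print(f"chi-square = {chi2:.3f}, d.f. = {dof}, reject H0: {chi2 > TABLE_VALUE}")
```

The expected frequency in each cell is (row total × column total)/N, and the statistic is compared with the table value for (r − 1)(c − 1) degrees of freedom.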
15.
A random sample of 640 persons from a village provided the following information:
Effect of Influenza                      Total
Attacked            100       60        160
Not attacked        200      280        480
Total               300      340        640
Test whether the new drug was effective in preventing the attack of influenza.
1. confirmatory data
2. significance level
3. alternate
4. Type 1
5. Type 2
6. bivariate
7. ratio
8.
9. independent
10. 50
11. ANOVA
12. quantitative
13. Mann Whitney U
14. Kolmogorov-Smirnov
15. matched
16. nominal, ordinal
17.
18. Non-parametric
Books
Abrams, M.A, Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: John Hopkins
University Press, 1943.
R.S. Bhardwaj, Business Statistics, Excel Books, New Delhi, 2008.
S.N. Murthy and U. Bhojanna, Business Research Methods, Excel Books, 2007.
Multiple Regression
13.4.2
Objectives
After studying this unit, you will be able to:
Introduction
As the name indicates, multivariate analysis comprises a set of techniques dedicated to the
analysis of data sets with more than one variable. Several of these techniques were developed
recently in part because they require the computational capabilities of modern computers.
Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which
involves observation and analysis of more than one statistical variable at a time. In design and
analysis, the technique is used to perform trade studies across multiple dimensions while taking
into account the effects of all variables on the responses of interest. Sometimes, the marketers
will come across situations, which are complex involving two or more variables. Hence, bivariate
analysis deals with this type of situation. Chi-Square is an example of bivariate analysis.
Classification
Multiple-variate analysis: This can be classified under the following heads:
1.
Multiple regression
2.
Discriminant analysis
3.
Conjoint analysis
4.
Factor analysis
5.
Cluster analysis
6.
Multidimensional scaling.
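$$X_{1c} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3 \tag{1}$$
$$X_{2c} = a_{2.13} + b_{21.3}X_1 + b_{23.1}X_3 \tag{2}$$
$$X_{3c} = a_{3.12} + b_{31.2}X_1 + b_{32.1}X_2 \tag{3}$$
Given n observations on X1, X2 and X3, we want to find such values of the constants of the
regression equation that $\sum_{i=1}^{n}(X_{ij} - X_{ijc})^2$, j = 1, 2, 3, is minimised.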
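For convenience, we shall use regression equations expressed in terms of deviations of variables
from their respective means. Equation (1), on taking sum and dividing by n, can be written as
$$\frac{\sum X_{1c}}{n} = a_{1.23} + b_{12.3}\frac{\sum X_2}{n} + b_{13.2}\frac{\sum X_3}{n}$$
or
$$\bar{X}_1 = a_{1.23} + b_{12.3}\bar{X}_2 + b_{13.2}\bar{X}_3 \tag{4}$$
Note: $\bar{X}_1 = \bar{X}_{1c}$.
Subtracting (4) from (1), we have
$$X_{1c} - \bar{X}_1 = b_{12.3}(X_2 - \bar{X}_2) + b_{13.2}(X_3 - \bar{X}_3)$$
or
$$x_{1c} = b_{12.3}x_2 + b_{13.2}x_3 \tag{5}$$
where $x_{1c} = X_{1c} - \bar{X}_1$, $x_2 = X_2 - \bar{X}_2$ and $x_3 = X_3 - \bar{X}_3$.
Similarly, equations (2) and (3) can be written as
$$x_{2c} = b_{21.3}x_1 + b_{23.1}x_3 \tag{6}$$
and
$$x_{3c} = b_{31.2}x_1 + b_{32.1}x_2 \tag{7}$$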
Notes The subscripts of the coefficients preceding the dot are termed primary subscripts,
while those appearing after it are termed secondary subscripts. The number of secondary
subscripts gives the order of the regression coefficient, e.g., b12.3 is a regression coefficient of
order one, etc.
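Applying the method of least squares to the deviation form of equation (1) gives the normal equations
$$\sum x_1x_2 = b_{12.3}\sum x_2^2 + b_{13.2}\sum x_2x_3 \tag{8}$$
$$\sum x_1x_3 = b_{12.3}\sum x_2x_3 + b_{13.2}\sum x_3^2 \tag{9}$$
Solving these equations simultaneously, we get
$$b_{12.3} = \frac{(\sum x_1x_2)(\sum x_3^2) - (\sum x_1x_3)(\sum x_2x_3)}{(\sum x_2^2)(\sum x_3^2) - (\sum x_2x_3)^2} \tag{10}$$
$$b_{13.2} = \frac{(\sum x_1x_3)(\sum x_2^2) - (\sum x_1x_2)(\sum x_2x_3)}{(\sum x_2^2)(\sum x_3^2) - (\sum x_2x_3)^2} \tag{11}$$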
Notes
1.
Various sums of squares and sums of products of deviations, used above, can be
computed using the formula
$$\sum x_px_q = \sum X_pX_q - \frac{(\sum X_p)(\sum X_q)}{n}, \qquad \sum x_p^2 = \sum X_p^2 - \frac{(\sum X_p)^2}{n}, \text{ etc.}$$
2.
The fact that a regression coefficient is independent of change of origin can also be
utilised to further simplify the computational work.
3.
The regression coefficients of equations (2) and (3) can be written by symmetry as
given below:
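$$b_{21.3} = \frac{(\sum x_2x_1)(\sum x_3^2) - (\sum x_2x_3)(\sum x_1x_3)}{(\sum x_1^2)(\sum x_3^2) - (\sum x_1x_3)^2}$$
$$b_{23.1} = \frac{(\sum x_2x_3)(\sum x_1^2) - (\sum x_2x_1)(\sum x_1x_3)}{(\sum x_1^2)(\sum x_3^2) - (\sum x_1x_3)^2}$$
Further,
$$b_{31.2} = \frac{(\sum x_3x_1)(\sum x_2^2) - (\sum x_3x_2)(\sum x_1x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1x_2)^2}$$
$$b_{32.1} = \frac{(\sum x_3x_2)(\sum x_1^2) - (\sum x_3x_1)(\sum x_1x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1x_2)^2}$$
and the expressions for the constant terms are
$a_{2.13} = \bar{X}_2 - b_{21.3}\bar{X}_1 - b_{23.1}\bar{X}_3$ and $a_{3.12} = \bar{X}_3 - b_{31.2}\bar{X}_1 - b_{32.1}\bar{X}_2$ respectively.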
Example: Fit a linear regression of rice yield (X1 quintals) on the use of fertiliser
(X2 kgs per acre) and the amount of rainfall (X3 inches), from the following data:
X1    45   50   55   70   75   75   85
X2    25   35   45   55   65   75   85
X3    31   28   32   32   29   27   31
        X1    X2    X3    X1X2    X1X3    X2X3    X1²     X2²     X3²
        45    25    31    1125    1395     775    2025     625     961
        50    35    28    1750    1400     980    2500    1225     784
        55    45    32    2475    1760    1440    3025    2025    1024
        70    55    32    3850    2240    1760    4900    3025    1024
        75    65    29    4875    2175    1885    5625    4225     841
        75    75    27    5625    2025    2025    5625    5625     729
        85    85    31    7225    2635    2635    7225    7225     961
Total  455   385   210   26925   13630   11500   30925   23975    6324
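From the above table we compute the following sums of products and sums of squares:
$$\sum x_1x_2 = \sum X_1X_2 - \frac{(\sum X_1)(\sum X_2)}{n} = 26925 - \frac{455 \times 385}{7} = 1900$$
$$\sum x_1x_3 = \sum X_1X_3 - \frac{(\sum X_1)(\sum X_3)}{n} = 13630 - \frac{455 \times 210}{7} = -20$$
$$\sum x_2x_3 = \sum X_2X_3 - \frac{(\sum X_2)(\sum X_3)}{n} = 11500 - \frac{385 \times 210}{7} = -50$$
$$\sum x_2^2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 23975 - \frac{385^2}{7} = 2800$$
$$\sum x_3^2 = \sum X_3^2 - \frac{(\sum X_3)^2}{n} = 6324 - \frac{210^2}{7} = 24$$
Also
$$\bar{X}_1 = \frac{455}{7} = 65, \quad \bar{X}_2 = \frac{385}{7} = 55, \quad \bar{X}_3 = \frac{210}{7} = 30$$
Thus, taking the deviations U1 = X1 - 65, U2 = X2 - 55 and U3 = X3 - 30, the same sums can be
obtained directly:
        U1    U2    U3    U1U2    U1U3    U2U3    U1²     U2²     U3²
       -20   -30     1     600     -20     -30     400     900       1
       -15   -20    -2     300      30      40     225     400       4
       -10   -10     2     100     -20     -20     100     100       4
         5     0     2       0      10       0      25       0       4
        10    10    -1     100     -10     -10     100     100       1
        10    20    -3     200     -30     -60     100     400       9
        20    30     1     600      20      30     400     900       1
Total    0     0     0    1900     -20     -50    1350    2800      24
Hence
$$b_{12.3} = \frac{1900 \times 24 - (-20)(-50)}{2800 \times 24 - (-50)^2} = 0.689$$
$$b_{13.2} = \frac{(-20) \times 2800 - 1900 \times (-50)}{2800 \times 24 - (-50)^2} = 0.603$$
Further, we have
$$\bar{X}_1 = \frac{\sum U_1}{n} + 65 = 65, \quad \bar{X}_2 = \frac{\sum U_2}{n} + 55 = 55 \quad \text{and} \quad \bar{X}_3 = \frac{\sum U_3}{n} + 30 = 30$$
so that, from equation (4), a1.23 = 65 - 0.689 × 55 - 0.603 × 30 ≈ 9.0, and the fitted regression
is X1c ≈ 9.0 + 0.689X2 + 0.603X3.
Caution: The above method should be used when the means of all the variables are integers.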
Alternative Method
The coefficients of the regression equation X1c = a1.23 + b12.3X2 + b13.2X3 can also be obtained by
simultaneously solving the following normal equations:
$$\sum X_1 = na_{1.23} + b_{12.3}\sum X_2 + b_{13.2}\sum X_3$$
$$\sum X_1X_2 = a_{1.23}\sum X_2 + b_{12.3}\sum X_2^2 + b_{13.2}\sum X_2X_3$$
$$\sum X_1X_3 = a_{1.23}\sum X_3 + b_{12.3}\sum X_2X_3 + b_{13.2}\sum X_3^2$$
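The normal equations above form a system of three linear equations in a1.23, b12.3 and b13.2. As a check on the hand computation, the following Python sketch builds them from the rice-yield data of the example and solves them exactly. The helper names solve_3x3 and dot are our own, and exact fractions are used only to avoid rounding; this is an illustration, not a prescribed procedure.

```python
from fractions import Fraction

# Rice-yield data from the worked example
# (X1 = yield, X2 = fertiliser, X3 = rainfall).
X1 = [45, 50, 55, 70, 75, 75, 85]
X2 = [25, 35, 45, 55, 65, 75, 85]
X3 = [31, 28, 32, 32, 29, 27, 31]

def dot(u, v):
    """Sum of products of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

def solve_3x3(A, y):
    """Solve A z = y by Gauss-Jordan elimination with exact fractions."""
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, y)]
    size = 3
    for col in range(size):
        # Bring a row with a non-zero entry in this column to the pivot position.
        pivot = next(r for r in range(col, size) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(size):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][size] for r in range(size)]

n = len(X1)
# Coefficient matrix and right-hand side of the three normal equations.
A = [
    [n,       sum(X2),     sum(X3)],
    [sum(X2), dot(X2, X2), dot(X2, X3)],
    [sum(X3), dot(X2, X3), dot(X3, X3)],
]
y = [sum(X1), dot(X1, X2), dot(X1, X3)]

a1_23, b12_3, b13_2 = solve_3x3(A, y)
print(round(float(a1_23), 3), round(float(b12_3), 3), round(float(b13_2), 3))
```

Both routes, the deviation formulas and the normal equations, should lead to the same coefficients, since the deviation form is obtained from the normal equations by eliminating the constant term.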
Self Assessment
Fill in the blanks:
1.
2.
3.
Those who buy our brand and those who buy competitors brand.
2.
3.
Those who go to Food World to buy and those who buy in a Kirana shop.
4.
Suppose there is a comparison between the groups mentioned as above along with demographic
and socio-economic factors, then discriminant analysis can be used. One way of doing this is to
proceed and calculate the income, age, educational level, so that the profile of each group could
be determined. Comparing the two groups based on one variable alone would be informative
but it would not indicate the relative importance of each variable in distinguishing the groups.
This is because several variables within the group will have some correlation which means that
one variable is not independent of the other.
If we are interested in segmenting the market using income and education, we would be
interested in the total effect of two variables in combinations, and not their effects separately.
Further, we would be interested in determining which of the variables are more important or
had a greater impact. To summarize, we can say, that Discriminant Analysis can be used when
we want to consider the variables simultaneously to take into account their interrelationship.
Like regression, the value of dependent variable is calculated by using the data of independent
variable.
Z = b1x1 + b2x2 + b3x3 + ..............
Z = Discriminant score
b1 = Discriminant weight for variable
x = Independent variable
As can be seen in the above, each independent variable is multiplied by its corresponding
weightage.
This results in a single composite discriminant score for each individual. By taking the average
of discriminant score of the individuals within a certain group, we create a group mean. This is
known as centroid. If the analysis involves two groups, there are two centroids. This is very
similar to multiple regression, except that different types of variables are involved.
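The discriminant score and the group centroid described above can be sketched as follows. The weights and the observations are purely hypothetical numbers chosen for illustration; in practice the weights are estimated by the discriminant analysis itself (for example, by the SPSS procedure), which is not reproduced here.

```python
def discriminant_score(weights, values):
    """Z = b1*x1 + b2*x2 + ... for one individual."""
    return sum(b * x for b, x in zip(weights, values))

def centroid(scores):
    """Group mean of the discriminant scores."""
    return sum(scores) / len(scores)

# Hypothetical discriminant weights b1, b2, b3 (assumed, not estimated here).
weights = [0.5, 0.25, 1.0]

# Hypothetical individuals, each described by three independent variables.
group_a = [[4, 8, 1], [6, 4, 2]]
group_b = [[2, 4, 0], [2, 0, 1]]

scores_a = [discriminant_score(weights, person) for person in group_a]
scores_b = [discriminant_score(weights, person) for person in group_b]

centroid_a = centroid(scores_a)
centroid_b = centroid(scores_b)
print("Group A scores:", scores_a, "centroid:", centroid_a)
print("Group B scores:", scores_b, "centroid:", centroid_b)
```

With two groups there are two centroids; the further apart they lie relative to the spread of scores within each group, the better the function separates the groups.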
Application
A company manufacturing FMCG products introduces a sales contest among its marketing
executives to find out how many distributors can be roped in to handle the company's products.
Assume that this contest runs for three months. Each marketing executive is given a target regarding
the number of new distributors and the sales they can generate during the period. This target is fixed
and based on the past sales achieved by them, for which data is available in the company.
It is also announced that marketing executives who add 15 or more distributors will be given a
Maruti Omni van as a prize. Those who generate between 5 and 10 distributors will be given a
two-wheeler as the prize. Those who generate fewer than 5 distributors will get nothing. Now
assume that 5 marketing executives won a Maruti van and 4 won a two-wheeler.
The company now wants to find out which activities of the marketing executives made the
difference between winning a prize and not winning one. One can proceed in a number
of ways. The company could compare those who won the Maruti van against the others.
Alternatively, the company might compare those who won one of the two prizes against those
who won nothing. It might compare each group against each of the other two.
Discriminant analysis will highlight the difference in activities performed by each group
members to get the prize. The activity might include:
1.
2.
3.
1.
What variable discriminates various groups as above; the number of groups could be two
or more? Dealing with more than two groups is called Multiple Discriminant Analysis
(M.D.A.).
2.
Can discriminating variables be chosen to forecast the group to which the brand/person/
place belongs?
3.
Notes
2.
3.
Dialogue box will appear. Select the GROUPING VARIABLE. This can be done by clicking
on the right arrow to transfer them from the variable list on the left to the grouping
variable box on the right.
4.
Define the range of values by clicking on DEFINE RANGE. Enter Minimum and Maximum
value then click CONTINUE.
5.
Select all the independent variable for discriminant analysis from the variable list by
clicking on the arrow that transfers them to box on the right.
6.
Click on STATISTICS on the lower part of main dialogue box. This will open up a smaller
dialogue box.
7.
Click on CLASSIFY on the lower part of the main dialogue box select SUMMARY TABLE
under the heading DISPLAY in a small dialogue box that appears.
8.
Self Assessment
Fill in the blanks:
4.
5.
If the discriminant analysis involves two groups, there are ....................... centroids.
Example: An airline would like to know, which is the most desirable combination of
attributes to a frequent traveller: (a) Punctuality (b) Air fare (c) Quality of food served on the
flight and (d) Hospitality and empathy shown.
Conjoint Analysis is a multivariate technique that captures the exact levels of utility that an
individual customer places on various attributes of the product offering. Conjoint Analysis
enables a direct comparison,
Example: A comparison between the utility of a price level of 400 versus 500, a delivery
period of 1 week versus 2 weeks, or an after-sales response of 24 hours versus 48 hours.
Once we know the utility levels for each attribute (and at individual levels as well), we can
combine these to find the best combination of attributes that gives the customer the highest
utility, the second best combination that gives the second highest utility, and so on. This
information is then used to design a product or service offering.
Application
Conjoint Analysis is extremely versatile and its range of applications includes virtually any
industry. New product or service design, including concepts in the pre-prototyping stage,
can specifically benefit from conjoint applications.
Some examples of other areas where this technique can be used are:
1.
2.
Process
Design attributes for a product are first identified. For a shirt manufacturer, these could be
design, such as designer shirts vs plain shirts, and price, such as 400 versus 800. The outlets can have
exclusive distribution or mass distribution. All possible combinations of these attribute levels
are then listed out. Each design combination will be ranked by customers and used as input data
for Conjoint Analysis. Then the utility of the products relative to price can be measured.
The output is a part-worth or utility for each level of each attribute. For example, the designer shirt may
get a utility level of 5 and the plain shirt, 7.5. Similarly, exclusive distribution may have a part utility
of 2, and mass distribution, 5.8. We then put together the part utilities and come up with a total
utility for any product combination we want to offer, and compare that with the maximum
utility combination for this customer segment.
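Putting the part utilities together is a simple addition over attribute levels. The following Python sketch uses the part utilities quoted in the paragraph above; the price part utilities are not given in the text, so price is left out, and the dictionary layout and the function name total_utility are our own assumptions.

```python
from itertools import product

# Part utilities quoted in the text (price part utilities are not given there).
part_worths = {
    "design": {"designer": 5.0, "plain": 7.5},
    "distribution": {"exclusive": 2.0, "mass": 5.8},
}

def total_utility(profile):
    """Add up the part utilities of the levels that make up one product profile."""
    return sum(part_worths[attribute][level] for attribute, level in profile.items())

# List every possible combination of attribute levels.
attributes = list(part_worths)
profiles = [
    dict(zip(attributes, levels))
    for levels in product(*(part_worths[a] for a in attributes))
]

# Rank the combinations from highest to lowest total utility.
ranked = sorted(profiles, key=total_utility, reverse=True)
for profile in ranked:
    print(profile, round(total_utility(profile), 2))

best = ranked[0]
```

Any combination that the company actually plans to offer can then be compared with the top-ranked one.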
This process makes clear to the marketer which attributes of the product or service
they should focus on in the design.
If a retail store finds that the height of a shelf is an important attribute for selling at a particular
level, a well-designed shelf may result from this knowledge. Similarly, a designer of clocks will
benefit from knowing the utility attached by customers to the dial size, background colours, and
price range of the clocks.
Approach
From a discussion with the client, identify the design attributes to be studied and the levels at
which they can be offered. Then build a list of product concepts on offer. These product concepts
are then ranked by customers. Once this data is available, use Conjoint Analysis to derive the
part utilities of each attribute level. This is then used to predict the best product design for the
given customer segment. Use the SPSS Conjoint procedure to analyse the data.
There are three steps in conjoint analysis:
1.
2.
Collection of data.
3.
For attributes selection, the market researcher can conduct interview with the customers directly.
1.
Weight (3 Kg or 5 Kg)
2.
Battery life (2 hours or 4 hours)
3.
Brand name (Dell or Lenovo)
Ask each of the respondents to rank all the combinations of attributes contained in the file.
This file is named DATA FILE 1. All the rankings should be entered in another file
called DATA FILE 2.
2.
Now two files, namely DATA FILE 1 and DATA FILE 2, are created.
3.
A third file, called the SYNTAX file, is to be opened by using the FILE, OPEN command
followed by SYNTAX.
4.
Type the following - conjoint plan = DATA FILE 1 SAV/DATA' DATA FILE 2 SAV/
SCORES=SCORE 1 to Score number of ranking/FACTOR VARI (DISCRETE)/PLOT ALL
(Here 25 is the possible combination of attributes). Score is the term used for rankings. The
no of scores will be equal to number of rankings. We should use the word RANK in the
syntax instead of scores if Rankings are contained in the data file.
5.
Click RUN from the menu of the syntax file that was created, then click ALL in the menu which
appears on the screen. If the syntax is correct, the output for conjoint will appear.
Combination
Rank
One combination, 3 kg, 4 hours, Dell, clearly dominates, and 5 kg, 2 hours, Lenovo is the least
preferred.
Let us now take the average rank for the 3 kg option = (4 + 3 + 2 + 1)/4 = 2.5
For the 5 kg option, the average rank is (5 + 8 + 7 + 6)/4 = 6.5
For the 4 hour option, (5 + 3 + 7 + 1)/4 = 4
For the 2 hour option, (4 + 8 + 2 + 6)/4 = 5
For Dell, (5 + 6 + 1 + 2)/4 = 3.5
For Lenovo, 5.5
Looking at the differences in average ranks, the most important characteristic to this
respondent is weight (difference = 4), followed by brand name (difference = 2) and battery life (difference = 1).
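The average-rank comparison above can be reproduced with a short Python sketch. The combination-wise rank table is not shown in the text, so the assignment of ranks 1 to 8 below is an assumption inferred from the rank sums quoted for each option; the function name average_ranks is also our own.

```python
# Rank given to each (weight, battery life, brand) combination; 1 = most preferred.
# Assumed assignment, inferred from the rank sums quoted in the text.
ranks = {
    ("3 kg", "4 hours", "Dell"): 1,
    ("3 kg", "2 hours", "Dell"): 2,
    ("3 kg", "4 hours", "Lenovo"): 3,
    ("3 kg", "2 hours", "Lenovo"): 4,
    ("5 kg", "4 hours", "Dell"): 5,
    ("5 kg", "2 hours", "Dell"): 6,
    ("5 kg", "4 hours", "Lenovo"): 7,
    ("5 kg", "2 hours", "Lenovo"): 8,
}
ATTRIBUTES = ("weight", "battery life", "brand name")

def average_ranks(rank_table, position):
    """Average rank of every level of the attribute stored at `position`."""
    by_level = {}
    for combination, rank in rank_table.items():
        by_level.setdefault(combination[position], []).append(rank)
    return {level: sum(r) / len(r) for level, r in by_level.items()}

# Importance of an attribute = spread between its best and worst average rank.
importance = {}
for position, name in enumerate(ATTRIBUTES):
    averages = average_ranks(ranks, position)
    importance[name] = max(averages.values()) - min(averages.values())
    print(name, averages)

print(importance)
```

A larger spread in average rank means that changing the level of that attribute moves the respondent's preference more.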
Self Assessment
Fill in the blanks:
6.
....................... analysis is concerned with the measurement of the joint effect of two or more
attributes.
7.
For ....................... selection, the market researcher can conduct interview with the customers
directly.
8.
2.
When the objective is to summarise information from a large set of variables into fewer factors,
principal component factor analysis is used. On the other hand, if the researcher wants to analyse
the components of the main factor, common factor analysis is used.
Example: Common factor Inconvenience inside a car. The components may be:
1.
Leg room
2.
Seat arrangement
3.
4.
5.
2.
3.
Comfort (C)
4.
5.
6.
Price (F)
The questionnaire may be administered to 5,000 respondents. The opinion of the customer is
gathered. Let us allot points 1 to 10 for the variables factors A to F. 1 is the lowest and 10 is the
highest. Let us assume that application of factor analysis has led to grouping the variables as
follows:
A, B, D, E into factor-1
F into Factor -2
C into Factor - 3
Factor - 1 can be termed as Technical factor;
Factor - 2 can be termed as Price factor;
Factor - 3 can be termed as Personal factor.
For future analysis, while conducting a study to obtain customers opinion, three factors
mentioned above would be sufficient. One basic purpose of using factor analysis is to reduce the
number of independent variables in the study. By having too many independent variables, the
M.R study will suffer from following disadvantages:
1.
Time for data collection is very high due to several independent variables.
2.
3.
4.
Example: Following are the data on the drinking habits of different employees in an
organization:
Drinking Habits
Employee Group
(1) None
(2)
Light
(3)
Medium
(4) Heavy
Row Totals
14
20
15
12
10
42
(4) Executives
25
20
30
15
90
30
10
50
Column Totals
79
41
59
37
216
One may think of the 4 column values in each row of the table as coordinates in a 4-dimensional
space, and one could compute the (Euclidean) distances between the 5 row points in the
4-dimensional space. The distances between the points in the 4-dimensional space summarize all
information about the similarities between the rows in the table above. Now suppose one could
find a lower-dimensional space, in which to position the row points in a manner that retains all,
or almost all, of the information about the differences between the rows. One could then present
all information about the similarities between the rows (types of employees in this case) in a
simple 1, 2, or 3-dimensional graph. While this may not appear to be particularly useful for
small tables like the one shown above, one can easily imagine how the presentation and
interpretation of very large tables (e.g., differential preference for 10 consumer items among
100 groups of respondents in a consumer survey) could greatly benefit from the simplification
that can be achieved via correspondence analysis (e.g., representing the 10 consumer items in a
two-dimensional space).
possible pairs of predictor variables that would give the same predictions, which is simplest to
use, in the logic of minimizing the number of predictor variables needed in the typical regression?
The pair of predictor variables maximising some measure of minimalism could be said to have
simple structure. In this example involving grades, you might be able to predict grades in some
courses correctly from just a verbal test score, and predict grades in other courses accurately
from just a math score. If so, then you would have achieved a simpler structure in your
predictions than if you had used both tests for each and every prediction.
Self Assessment
Fill in the blanks:
9.
When the objective is to summarise information from a large set of variables into fewer
factors, ....................... analysis is used.
10.
11.
2.
Cluster Analysis is a technique used for classifying objects into groups. This can be used to sort
data (a number of people, companies, cities, brands or any other objects) into homogeneous
groups based on their characteristics.
The result of Cluster Analysis is a grouping of the data into groups called clusters. The researcher
can analyse the clusters for their characteristics and give the clusters names based on these.
Where can Cluster Analysis be applied?
The marketing application of cluster analysis is in customer segmentation and estimation of
segment sizes. Industries, where this technique is useful include automobiles, retail stores,
insurance, B-to-B, durables and packaged goods. Some of the well-known frameworks in consumer
behaviour (like VALS) are based on value cluster analysis.
Cluster Analysis is applicable when:
1.
An FMCG company wants to map the profile of its target audience in terms of life-style,
attitude and perceptions.
2.
A consumer durable company wants to know the features and services a consumer takes
into account, when purchasing through catalogues.
3.
A housing finance corporation wants to identify and cluster the basic characteristics, lifestyles and mindset of persons who would be availing housing loans. Clustering can be
done based on parameters such as interest rates, documentation, processing fee, number
of installments etc.
Process
There are two ways in which Cluster Analysis can be carried out:
1.
2.
The above two are basic approaches used in cluster analysis. This can be used to segment
customer groups for a brand or product category, or to segment retail stores into similar groups
based on selected variables.
Interpretation of Results
Ideally, the variables should be measured on an interval or ratio scale. This is because the
clustering techniques use the distance measure to find the closest objects to group into a cluster.
An example of its use can be clustering of towns similar to each other which will help decide
where to locate new retail stores.
If clusters of customers are found based on their attitudes towards new products and interest in
different kinds of activities, an estimate of the segment size for each segment of the population
can be obtained, by looking at the number of objects in each cluster.
Marketing strategies for each segment are fine-tuned based on the segment characteristics. For
instance, a segment of customers who like sports cars may get a special promotional offer during a
specific period.
2.
3.
4.
5.
Did u know? Names can also be given to clusters to describe each one. For example, there
can be a cluster called neo-rich. Segments are prioritised based on their estimated size.
Age
Family size
Example: Suppose there are seven attributes, 1 to 7, on which we are judging two objects A and
B. The existence of an attribute may be indicated by 1 and its absence by 0. In this way, two
objects are viewed as similar if they share common attributes.
Attribute    1   2   3   4   5   6   7
Brand - A    1   0   0   1   0   0   1
Brand - B    0   0   1   1   1   0   0
The similarity coefficient is
$$S = \frac{a + d}{a + b + c + d}$$
Where
a = No. of attributes possessed by both brands A and B
b = No. of attributes possessed by brand A but not by brand B
c = No. of attributes possessed by brand B but not by brand A
d = No. of attributes possessed by neither brand.
Substituting, we get
$$S = \frac{1 + 2}{1 + 2 + 2 + 2} = \frac{3}{7} = 0.43$$
Equivalently,
SAB = M/N
SAB = Similarity between A and B
M = Number of attributes on which A and B match (both 0 or both 1)
N = Total number of attributes
SAB = 3/7 = 0.43
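The similarity coefficient can be computed mechanically by counting the four kinds of agreement and disagreement. In the Python sketch below, the two 0/1 profiles are an assumed reading of the attribute table, chosen so that a = 1 and b = c = d = 2 as in the substitution above; the function name simple_matching is our own.

```python
# Presence (1) or absence (0) of seven attributes for each brand.
# Assumed profiles, consistent with a = 1, b = 2, c = 2, d = 2 in the text.
brand_a = [1, 0, 0, 1, 0, 0, 1]
brand_b = [0, 0, 1, 1, 1, 0, 0]

def simple_matching(x, y):
    """Return (S, (a, b, c, d)) where S = (a + d) / (a + b + c + d)."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # possessed by both
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # only the first
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # only the second
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)  # possessed by neither
    return (a + d) / (a + b + c + d), (a, b, c, d)

similarity, counts = simple_matching(brand_a, brand_b)
print("a, b, c, d =", counts)
print("S =", round(similarity, 2))
```

A value of S close to 1 means that the two brands agree on almost every attribute, and a value close to 0 means that they agree on almost none.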
1.
2.
3.
A dialogue box will appear. Select all the variables which are required to be used in cluster
analysis. This can be done by clicking on the right arrow to transfer them from the variable
list on the left.
4.
Click on METHOD. The dialogue box will open. Choose "Between Groups Linkage" as the
CLUSTER METHOD.
5.
6.
Click STATISTICS on the main dialogue box. Choose "Agglomeration schedule" so that it
will appear in the final output. Click CONTINUE.
7.
Choose DENDROGRAM then on the box called ICICLE, Choose "All Clusters" and "Vertical".
8.
Click OK on the main dialogue box to get the output of the hierarchical cluster analysis.
Stage 2
This stage forms the final clusters once the required number of clusters is known from stage 1.
This stage is called K-MEANS CLUSTERING.
1.
2.
Fill in the desired number of clusters that has been identified from stage 1.
3.
Click OPTIONS on the main dialogue box. Select "Initial Cluster Centers". Then click
CONTINUE to return to the main dialogue box.
4.
Click OK on the main dialogue box to get the output which has final clusters.
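What the second stage does can be illustrated with a minimal k-means sketch in Python. This is the usual textbook iteration (assign each object to the nearest centre, then recompute the centres) and is not claimed to be the exact routine used by SPSS; the six data points and the two initial cluster centres are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, max_iterations=10):
    """Return (final centres, clusters) for the given initial centres."""
    clusters = []
    for _ in range(max_iterations):
        # Step 1: assign every point to its nearest centre.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [squared_distance(p, c) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Step 2: move each centre to the mean of its cluster.
        new_centers = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centers)
        ]
        if new_centers == centers:  # no change, so the solution is stable
            break
        centers = new_centers
    return centers, clusters

# Hypothetical objects measured on two interval-scaled variables.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
initial_centers = [(1, 1), (8, 8)]  # number of clusters (2) taken from stage 1

final_centers, clusters = kmeans(points, initial_centers)
print(final_centers)
print(clusters)
```

The number of initial centres plays the role of the desired number of clusters entered in step 2, and the cluster sizes give an estimate of segment sizes.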
Self Assessment
Fill in the blanks:
12.
13.
Types of MDS
In general, there are two types of MDS:
1.
Metric
2.
Non-metric
Metric MDS makes the assumption that the input data is either ratio or interval data, while the
non-metric model requires simply that the data be in the form of ranks. Therefore, the non-metric
model has fewer restrictions than the metric model, but also less rigor. One technique
to use if you are unsure whether your data is ordinal or can be considered interval is to try both
metric and non-metric models. If the results are very close, the metric model may be used.
An advantage of the non-metric models is that they permit the researcher to categorize and
examine preference data, such as the kind obtained in marketing studies or other areas where
comparisons are useful.
Another technique, correspondence analysis, can work with categorical data, i.e., data at the
nominal level of measurement; however, that technique will not be described here.
2.
3.
Decision about the number of stimulus coordinates that represent the data
4.
Example: Let us say that you have a matrix of distances between a number of major cities,
such as you might find on the back of a road map. These distances can be used as the input data
to derive an MDS solution. When the results are mapped in two dimensions, the solution will
reproduce a conventional map, except that the MDS plot might need to be rotated so that the
north-south and east-west dimensions conform to expectations. However, once the rotation
is completed, the configuration of the cities will be spatially correct.
Self Assessment
Fill in the blanks:
14.
An advantage of the non-metric models is that they permit the researcher to .......................
and ....................... preference data.
15.
13.7 Summary
Some of the multivariate analysis techniques are discriminant analysis, factor analysis, cluster
analysis, conjoint analysis and multidimensional scaling.
In discriminant analysis, it is verified whether the two groups differ from one another.
Factor analysis is used to reduce a large number of variables to fewer factors.
Cluster analysis is used to segment the market or to identify the target group.
Regression is a term used for predicting the value of one variable from the other.
MDS is a set of multivariate statistical methods for estimating the parameters in, and
assessing the fit of, various spatial distance models for proximity data.
The output of MDS looks very similar to that of factor analysis, and the determination of
the optimal number of dimensions is handled in much the same way.
13.8 Keywords
Cluster Analysis: Cluster Analysis is a technique used for classifying objects into groups.
Conjoint Analysis: Conjoint analysis is concerned with the measurement of the joint effect of
two or more attributes that are important from the customers point of view.
Discriminant Analysis: In this analysis, two or more groups are compared. In the final analysis,
we need to find out whether the groups differ one from another.
Factor Analysis: Factor Analysis is the analysis whose main purpose is to group large set of
variable factors into fewer factors.
Multivariate Analysis: In multivariate analysis, many variables are analysed at the same
time.
Which technique would you use to measure the joint effect of various attributes while
designing an automobile loan and why?
2.
Do you think that the conjoint analysis will be useful in any manner for an airline? If yes
how, if no, give an example where you think the technique is of immense help.
3.
4.
Which analysis would you use in a situation when the objective is to summarise information
from a large set of variables into fewer factors? What will be the steps you would follow?
5.
Which analysis would answer if it is possible to estimate the size of different groups?
6.
Which analysis would you use to compare a good, bad and a mediocre doctor and why?
7.
8.
Which multivariate analysis would you apply to identify a specific customer segment for a
company's brand and why?
9.
10.
In your opinion what will be the disadvantages of having too many independent variables
in an MR study?
11.
Variable                  Load
CGPA                      0.60 x F
Problem Solving Skills    0.75 x F
Communication Skills      0.85 x F
This table tells us that the communication skills score loads most highly on the intelligence factor of
management students, followed by problem solving skills and CGPA. These loads or
weights are correlations, e.g., the correlation between communication skills and the factor.
But here we have only three variables and only one factor. In real life we may have many
variables and more factors. Whatever may be the case, the basic ideas remain the same.
Suppose we want to recruit management trainees from the campus and as a selection
process, we need to consider the following variables.
12.
X1
CGPA
X2
X3
Communication Skills
X4
X5
GD Score
X6
People have been rated on their suitability for an advanced training course in computer
programming on the basis of six ratings given by their manager (rated 1=low to 20=high):
(a)
Intellect
(b)
(c)
(d)
(e)
(f)
(g)
Number of GCSEs
(h)
The training department believe that these are really measuring only three things; intellect,
computer programming experience and loyalty, and want you to carry out a factor analysis
to explore that hypothesis. Describe the decisions you would have to make in carrying out
a factor analysis and what the results would be likely to tell you.
13.
Six observations on two variables are available, as shown in the following table:
Obs.
X1
X2
(a)
Plot the observations in a scatter diagram. How many groups would you say there
are, and what are their members?
(b)
Apply the nearest neighbor method and the squared Euclidean distance as a measure
of dissimilarity. Use a dendrogram to arrive at the number of groups and their
membership.
14.
Six observations on two variables are available, as shown in the following table:
Obs.
X1
X2
-1
-2
-2
-2
-1
(a)
Plot the observations in a scatter diagram. How many groups would you say there
are, and what are their members?
(b)
Apply the nearest neighbor method and the Euclidean distance as a measure of
dissimilarity.
origin
2.
simple linear
3.
Multivariate
4.
two or more
5.
two
6.
Conjoint
7.
attributes
8.
output
9.
10.
descriptive/exploratory
11.
standardized
12.
Cluster
13.
marketing
14.
categorize, examine
15.
perceptual mapping.
Books
Substantive Characteristics
14.1.2
Semantic Characteristics
14.3.2
14.3.3
Interpreting Information
14.3.4
Precautions
Oral Report
14.4.2
Written Report
14.4.3
14.6.2
14.6.3
14.7 Summary
14.8 Keywords
14.9 Review Questions
14.10 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
A report is a very formal document that is written for a variety of purposes, generally in the
sciences, social sciences, engineering and business disciplines. Generally, findings pertaining to
a given or specific task are written up into a report. It should be noted that reports are considered
to be legal documents in the workplace and, thus, they need to be precise, accurate and difficult
to misinterpret.
There are three features that, together, characterize report writing at a very basic level: a predefined
structure, independent sections, and reaching unbiased conclusions.
Predefined structure: Broadly, these headings may indicate sections within a report, such
as an introduction, discussion, and conclusion.
Example: A report prepared for a government agency will be different from the one prepared
for a private organization.
Although a marketing report is influenced by the researcher who writes it, there are certain
characteristics which the report should possess if it is to communicate effectively. These
characteristics can be classified as:
i. Substantive characteristics
ii. Semantic characteristics
Accuracy
Currency
Sufficiency
Availability
Relevancy
The more of the above characteristics a report possesses, the greater its practical value in
decision making.
Accuracy: Accuracy refers to the degree to which information reflects reality. Specifically, a
research report must accurately present both the research procedure and the research results.
Even if the results are not what management expected, the researcher has a professional
obligation to present the findings accurately and objectively. A less accurate report does an
injustice to the management.
Currency: Currency refers to the time span between completion of the research project and
presentation of the research report to management. If management receives the report too
late, the results may no longer be valid because of environmental changes, and the report
will have little or no value for decision making. Currency is one reason for communicating
preliminary research results to management orally or informally, to ensure timely decision
making.
Sufficiency: The research report must have sufficient detail so that important and valid
decisions can be made. Sometimes the sample size or the representativeness of the sample
limits the detail that is available.
Example: Management may require segment-wise market data, whereas only overall
market data is available.
Notes A research report must document the methodology and techniques used so that an
assessment can be made regarding validity, reliability and generalizability. Therefore,
sufficiency refers to whether enough information is present in the research report to
enable the manager to take a valid decision.
It should be remembered that the sufficiency characteristic does not mean that all possible
research project information must be incorporated in the research report. A researcher should
include in a report only that information which is necessary to convey a complete perspective
of the research project.
Availability: The fourth important characteristic of a research report is that it is available
to the appropriate decision maker when they need it. Availability refers to the communication
process between the researcher and the decision maker. The phrase 'appropriate decision maker'
emphasizes the question of who should and who should not have access to the report. This
decision is made by the management, and it is the duty of the researcher to carry it out. Most
reports carry confidential information. Therefore, it is necessary to restrict the availability
of the report, both to individuals within the organization and to those outside it, to prevent
competitors from gaining access to it.
Relevancy: The research report should be confined to the decision issue researched. Sometimes
the researcher might include information that he finds interesting but that has no relevance
to the issue. This type of information should be excluded from the report.
Example: A researcher may be preparing a report on the audience perception of RJs
(Radio Jockeys), with a view to recruiting them based on that perception. Suppose a lengthy
commentary on the relative audience appeal of each radio station is included in the report.
This type of data may be readily available from a research agency that sells commercial
data. Therefore, including it may not be necessary.
i.
ii.
iii.
iv.
v.
Language of the report must be simple. For example, sentences like "illumination must be
extinguished when premises are not in use" can be expressed in simple words say "switch
off the lights when you leave".
vi.
vii.
Sometimes, the current research uses the data of research conducted in the past. In this case
it is better to use past tense than present tense.
The following hinder the clarity of any research report:
Ambiguity
Jargon
Misspelled words
Excessive prediction
Improper punctuation
Unfamiliar words
Clerical error
Some illustrations of errors that cause inaccuracy in report writing are given below:
Addition/subtraction error: Assume that a survey was conducted to ascertain the income
of various strata of population in a city. Suppose it is found that 15% belong to the super
rich, 18% to the rich class and 61% to the middle class.
By oversight, the categories reported add up to only 94% (15 + 18 + 61), not 100%. The
researcher can easily correct this error, but if it is left uncorrected it leads to confusion,
because the reader or decision maker does not know which categories are left out (maybe
lower middle class and lower class).
Confusion between percentage and percentage points: Suppose the report indicates that
raw material cost of a product as a percentage of total cost increased from 8 percent
in 2003 to 10 percent in 2009, and concludes that the raw material cost has increased
by only 2 percent in 6 years. The real increase is 2 percentage points, or
25 percent (2 divided by 8).
Wrong conclusion: Mr. X's annual income has increased from 20,000 to 40,000 in 8 years.
The conclusion drawn is that, since income has doubled, purchasing power has also
doubled. This may not be true: because of inflation over the 8 years, the value of money is
eroded, so purchasing power may have risen by much less or even come down.
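The three checks above are simple arithmetic and can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function names are hypothetical, and the 7% annual inflation rate is an assumed figure chosen only to show the effect.

```python
def missing_share(shares):
    """Percentage left unaccounted for when category shares should total 100."""
    return 100 - sum(shares.values())


def point_and_percent_change(old_pct, new_pct):
    """Return (change in percentage points, relative change in percent)."""
    points = new_pct - old_pct
    relative = points / old_pct * 100
    return points, relative


def real_income_ratio(old_income, new_income, annual_inflation, years):
    """Ratio of new to old purchasing power after adjusting for inflation."""
    price_level = (1 + annual_inflation) ** years
    return (new_income / old_income) / price_level


# Addition error: the reported income classes do not add up to 100%.
income_classes = {"super rich": 15, "rich": 18, "middle class": 61}
print(missing_share(income_classes))        # 6

# Percentage vs percentage points: 8% of total cost rising to 10%.
print(point_and_percent_change(8, 10))      # (2, 25.0)

# Wrong conclusion: income doubles, but prices also rise (7% is assumed).
print(round(real_income_ratio(20000, 40000, 0.07, 8), 2))
```

Under the assumed 7% inflation, a doubled income buys only about 16% more, not 100% more.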
Self Assessment
Fill in the blanks:
1.
The research report will differ based on the ............ of the particular managers using the
report.
2.
3.
4.
............ refers to the time span between completion of the research project and
presentation of the research report to management.
Self Assessment
Fill in the blanks:
5.
6.
Writing of report is the ............ step in a research study and requires a set of skills somewhat
different from those called for in respect of the former stages of research.
Example:
Example of Induction: All products manufactured by Sony are excellent. DVD player model 2602
MX is made by Sony. Therefore, it must be excellent.
Example of Deduction: All products have to reach decline stage one day and become obsolete. This
radio is in decline mode. Therefore, it will become obsolete.
During the inductive phase, we reason from observation. During the deductive phase, we reason
towards the observation. Successful interpretation depends on how well the data is analysed. If
data is not properly analysed, the interpretation may go wrong. If the analysis is to be correct,
data collection must be proper. Similarly, if the data collected is proper but wrongly
analysed, the interpretation or conclusion will also be wrong. Sometimes, even with
proper data and proper analysis, the data can still lead to a wrong interpretation. Interpretation
depends upon the experience of the researcher and the methods he uses for interpretation.
Did u know? Both logic and observation are essential for interpretation.
Example: A detergent manufacturer is trying to decide which of the three sales promotion
methods (discount, contest, buy one get one free) would be most effective in increasing the sales.
Each sales promotion method is run at different times in different cities. The sales obtained by
the different sales promotion methods are as follows.
2,000
3,500
2,510
The results may lead us to the conclusion that the second sales promotion method was the most
effective in developing sales. This may be adopted nationally to promote the product. But one
cannot say that the same method of sales promotion will be effective in each and every city
under study.
Make copies of your data and store the master copy away. Use the copy for making edits,
cutting and pasting, etc.
Tabulate the information, i.e., add up the number of ratings, rankings, yes's, no's for each
question.
For ratings and rankings, consider computing a mean, or average, for each question. For
example, "For question #1, the average ranking was 2.4". This is more meaningful than
indicating, e.g., how many respondents ranked 1, 2, or 3.
Consider conveying the range of answers, e.g., 20 people ranked "1", 30 ranked "2", and 20
people ranked "3".
Attempt to identify patterns, or associations and causal relationships in the themes, e.g.,
all people who attended programs in the evening had similar concerns, most people came
from the same geographic area, most people were in the same salary range, what processes
or events respondents experience during the program, etc.
Keep all commentary for several years after completion in case needed for future reference.
Attempt to put the information in perspective, e.g., compare results to what you expected,
promised results; management or program staff; any common standards for your products
or services; original goals (especially if you're conducting a program evaluation);
indications or measures of accomplishing outcomes or results (especially if you're
conducting an outcomes or performance evaluation); description of the program's
experiences, strengths, weaknesses, etc. (especially if you're conducting a process
evaluation).
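The tabulation and averaging steps described above can be sketched as follows. This is a minimal illustration using the figures quoted in the text (20 respondents ranked "1", 30 ranked "2" and 20 ranked "3"); the function names `tabulate` and `mean_ranking` are hypothetical.

```python
from collections import Counter


def tabulate(responses):
    """Count how many respondents gave each ranking, ordered by ranking."""
    counts = Counter(responses)
    return dict(sorted(counts.items()))


def mean_ranking(responses):
    """Average ranking across all respondents for one question."""
    return sum(responses) / len(responses)


# One question's answers: 20 people ranked "1", 30 ranked "2", 20 ranked "3".
responses = [1] * 20 + [2] * 30 + [3] * 20

print(tabulate(responses))      # {1: 20, 2: 30, 3: 20}
print(mean_ranking(responses))  # 2.0
```

Reporting the mean together with the counts conveys both the typical answer and the range of answers.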
14.3.4 Precautions
1.
2.
Analysis of data should start from simpler and more fundamental aspects.
3.
4.
5.
6.
Caution: In report writing, do not miss the significance of some answers, because they are found
from very few respondents, such as "don't know" or "can't say".
Self Assessment
Fill in the blanks:
7.
8.
9.
In the ............ method, one starts from observed data and then generalisation is done.
2.
3.
4.
Vital data such as figures may be printed and circulated to the audience so that their
ability to comprehend increases, since they can refer to it when the presentation is
going on.
5.
The presenter should know his target audience well in advance to prepare a tailor-made presentation.
6.
The presenter should know the purpose of report such as "Is it for making a decision",
"Is it for the sake of information", etc.
(B)
(1) Daily
(2) Weekly
(3) Monthly
(4) Quarterly
(5) Yearly
Type of reports:
(1) Short report
(2) Long report
(3) Formal report
(4) Informal report
(5) Government report
1. Short report: Short reports are produced when the problem is very well defined and the
scope is limited, for example, a monthly sales report. Such a report runs to about five pages
and describes the progress made with respect to a particular product in a clearly
specified geographical location.
2. Long report: This could be either a technical report or a non-technical report. It
presents the outcome of the research in detail.
(a) Technical report: This will include the sources of data, research procedure, sample
design, tools used for gathering data, data analysis methods used, appendix,
conclusion and detailed recommendations with respect to specific findings. If any
journal, paper or periodical is referred to, such references must be given for the benefit
of the reader.
(b) Non-technical report: This report is meant for those who are not technically qualified,
e.g., the chief of the finance department. He may be interested in financial implications
only, such as margins, volumes, etc. He may not be interested in the methodology.
3. Formal report:
Example: The report prepared by the marketing manager for submission to the
Vice-President (marketing) on quarterly performance; reports on test marketing.
4. Informal report: The report prepared by the supervisor by way of filling in the shift log
book, to be used by his colleagues.
5. Government report: These may be prepared by state governments or the central government
on a given issue.
Example: Programme announced for rural employment strategy as a part of a five-year plan.
Did u know? Report on children's education is a kind of government and social welfare
report.
Written report
Not applicable.
Self Assessment
Fill in the blanks:
10.
11.
12.
The ............ statement should explain the nature of the project, how it came about and
what was attempted.
1. Title Page
2. Page Contents
3. Executive Summary
4. Body
5.
6. Bibliography
7. Appendix
1. Title Page: The title page should indicate the topic on which the report is prepared. It
should include the name of the person or agency who has prepared the report.
2. Table of Contents: The table of contents will help the reader to know "what the report
contains". The table of contents should indicate the various parts or sections of the report.
It should also indicate the chapter headings along with the page number.
Chapter no.   Contents                                               Page no.
              Declaration
              Certificates
              Acknowledgement
              Executive summary
1             Introduction to the project
2             Research design and methodology
3             Theoretical perspective of the study
4             Company and industry profile
5             Data analysis and interpretation
6             Summary of findings, suggestions and conclusions
              Bibliography
              Appendix
3. Executive Summary: If your report is long and drawn out, the person for whom you have
prepared the report may not have the time to read it in detail. Apart from this, an executive
summary will help in highlighting major points. It is a condensed version of the whole
report. It should be written in one or two pages. Since top executives often read only the
executive summary, it should be accurate and well-written. An executive summary should help
in decision-making.
An executive summary should have:
(a) Objectives
(b) Brief methodology
(c) Important findings
(d) Key results
(e) Conclusion
4. Body
(a) Introduction
(b) Methodology
(c) Limitations
(d) Analysis and interpretations
Introduction: The introduction must explain clearly the decision problem and research
objective. The background information should be provided on the product and services
provided by the organisation which is under study.
Methodology: How you have collected the data is the key in this section. For example,
Was primary data collected or secondary data used? Was a questionnaire used? What was
the sample size and sampling plan and method of analysis? Was the design exploratory or
conclusive?
Limitations: Every report will have some shortcoming. The limitations may be of time,
geographical area, the methodology adopted, correctness of the responses, etc.
Analysis and interpretations: The collected data will be tabulated. Statistical tools, if any,
will be applied to analyse the data and to support decisions.
5.
(b)
6.
Bibliography: If portions of your report are based on secondary data, use a bibliography
section to list the publications or sources that you have consulted. The bibliography
should include, title of the book, name of the journal in case of article, volume number,
page number, edition, etc.
7.
Appendix: The purpose of an appendix is to provide a place for material which is not
absolutely essential to the body of the report. The appendix will contain copies of data
collection forms called questionnaires, details of the annual report of the company, details
of graphs/charts, photographs, CDs, interviewers' instructions. Following are the items
to be placed in this section.
(a)
(b)
(c)
(d)
Caution The date of the submission of the report is to be included in the title page of the
report.
Books
Name of the author, title of the book (underlined), publisher's detail, year of publishing, page
number.
Single Volume Works. Dube, S. C. "India's Changing Villages", Routledge and Kegan Paul Ltd.,
1958, p. 76.
Periodicals Journal
Dawan Radile (2005), "They Survived Business World" (India), May 98, pp. 29-36.
Newspaper, Articles
Kumar Naresh, "Exploring Divestment", The Economic Times (Bangalore), August 7, 1999, p. 14.
Website
www.infocom.in.com
Task List the various abbreviations frequently used in footnotes with their meanings.
Self Assessment
Fill in the blanks:
13.
14.
15.
A selected bibliography lists the items which the author thinks are of ............ interest to
the reader.
Has many other urgent matters demanding his or her interest and attention,
Quantify when you have the data to do so. Avoid vague words such as 'large' or 'small';
instead, say '50%' or 'one in three'.
Avoid the passive voice, if possible, as it creates vagueness (e.g., 'patients were interviewed'
leaves uncertainty as to who interviewed them) and repeated use makes dull reading.
Caution In report writing, be consistent in the use of tenses (past or present tense).
ii.
iii.
Give them an idea of how the material has been organised so the reader can make a quick
determination of what he will read first.
An attractive layout for the title page and a clear table of contents.
ii.
iii.
Consistency in headings and subheadings, for example, font size 16 or 18 bold, for headings
of chapters; size 14 bold for headings of major sections; size 12 bold, for headings of subsections, etc.
iv.
Good quality printing and photocopying. Correct drafts carefully with spell check as well
as critical reading for clarity by other team-members, your facilitator and, if possible,
outsiders.
v.
Numbering of figures and tables, provision of clear titles for tables, and clear headings for
columns and rows, etc.
vi.
of reasons for the behavior of informants or of their attitudes. This is serious maltreatment of
data that needs correction.
The following must be avoided while preparing a report:
Self Assessment
Fill in the blanks:
16.
17.
14.7 Summary
A report is a very formal document that is written for a variety of purposes, generally in
the sciences, social sciences, engineering and business disciplines.
The most important aspect to be kept in mind while developing research report, is the
communication with the audience.
Report should be able to draw the interest of the readers. Therefore, report should be
reader centric.
Other aspects to be considered while writing a report are accuracy and clarity.
The points to be remembered for an oral presentation are the language used, time
management, use of graphs, purpose of the report, etc. Visuals used must be understandable
to the audience.
The presenter must make sure that the presentation is completed within the time allotted.
Some time should be set apart for questions and answers.
A written report may be classified as a short report or a long report. It can also be
classified as a technical report or a non-technical report.
A written report should contain a title page, table of contents, executive summary, body,
conclusions and bibliography. The last part is the appendix.
There should not be endless description in report writing and qualitative data is not to be
excluded.
14.8 Keywords
Appendix: The part of the report whose purpose is to provide a place for material which is not
absolutely essential to the body of the report.
Bibliography: The section that lists the publications or sources consulted in the
preparation of the report.
2.
3.
4.
5.
6.
7.
What are the various criteria used for classification of written report?
8.
What are the essential content of the following parts of research report?
(a)
Table of contents
(b)
Title page
(c)
Executive summary
(d)
Introduction
(e)
Conclusion
(f)
Appendix
9.
10.
need
2.
reality
3.
decision maker
4.
Currency
5.
Research report
6.
final
7.
Interpretation
8.
analysed
9.
induction
10.
communication
11.
Long
12.
opening
13.
table of contents
14.
Title
15.
primary
16.
Consistency
17.
systematic
Books
Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: Johns Hopkins
University Press, 1943.
Bernal, J.D., The Social Function of Science, London: George Routledge and Sons,
1939.
Chase, Stuart, The Proper Study of Mankind: An inquiry into the Science of Human
Relations, New York, Harper and Row Publishers, 1958.
S. N. Murthy and U. Bhojanna, Business Research Methods, Excel Books.
Statistical Tables
I. Logarithms
II. Antilogarithms
III. Binomial Coefficients
IV. Values of $e^{-m}$
Note: To obtain the value of $e^{-1.75}$, we write $e^{-1.75} = e^{-1} \times e^{-0.75} = 0.36788 \times 0.4724 = 0.17379$
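The note above splits the exponent so that two table entries can be multiplied. The short sketch below repeats that arithmetic and compares it with a direct computation; the variable names are illustrative, and the small difference in the last digit comes from rounding of the four-figure table entry.

```python
import math

# Table entries quoted in the note: e^-1 and e^-0.75.
e_neg_1 = 0.36788
e_neg_075 = 0.4724

# Product of the two table entries, as in the note.
table_product = e_neg_1 * e_neg_075

# Direct computation of e^-1.75 for comparison.
direct = math.exp(-1.75)

print(round(table_product, 5))  # 0.17379
print(round(direct, 5))         # 0.17377
```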
V. Ordinates of Normal Curve
Illustration: the ordinate of the standard normal curve is 0.3989 at z = 0 and 0.1182 at z = 1.56.
VI. Areas under the Normal Curve
Illustration: $P(0 \le z \le 2.54) = 0.4945$
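The illustrative values quoted for the normal curve tables (ordinates 0.3989 and 0.1182, area 0.4945) can be reproduced with Python's standard library. This is a minimal sketch for checking table look-ups, assuming the standard normal distribution (mean 0, standard deviation 1); the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Ordinates (heights of the curve) at z = 0 and z = 1.56.
ordinate_at_0 = std_normal.pdf(0.0)
ordinate_at_156 = std_normal.pdf(1.56)

# Area under the curve between z = 0 and z = 2.54.
area_0_to_254 = std_normal.cdf(2.54) - std_normal.cdf(0.0)

print(round(ordinate_at_0, 4))    # 0.3989
print(round(ordinate_at_156, 4))  # 0.1182
print(round(area_0_to_254, 4))    # 0.4945
```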
VII. Critical Values of t
Illustration (20 degrees of freedom): two-tailed test, $t_{.05,20} = \pm 2.086$; one-tailed test, $t_{.05,20} = 1.725$
Illustration: $\chi^2_{.05,12} = 21.026$
Illustration: $F_{.01,\,9,\,7} = 6.72$