
Chapter 3. Systematization and Processing of Statistical Data


Ass. Prof. Ph. D. L. Marcu, 2019-2020


1. Definitions

Statistical processing of data is the stage in which the individual data obtained by
statistical observation for each unit of the population are transformed into indicators
characterizing the population as a whole.
Statistical data encoding involves assigning numerical codes to variables expressed
in words, so that the data can be processed more easily by a computer.
Statistical classification is the process of systematizing mass data into classes (groups)
based on common attributes, so that each class is as homogeneous as possible.
A Frequency Distribution [1] consists of two parallel rows of data: one lists the grouping
intervals and the other the frequencies, i.e., the number of values that fall within each
interval.
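To make encoding and frequencies concrete, here is a minimal Python sketch. The occupation labels are hypothetical, and numbering the codes 1, 2, 3, ... in order of first appearance is an arbitrary choice; for a word-valued variable the codes carry no order or size, so counting them is the only meaningful operation.

```python
from collections import Counter

# Hypothetical raw answers for a variable expressed in words (occupation).
occupations = ["engineer", "teacher", "engineer", "nurse", "teacher", "engineer"]

# Encoding: give each distinct label an integer code, in order of first appearance.
code_book = {}
for label in occupations:
    code_book.setdefault(label, len(code_book) + 1)

encoded = [code_book[label] for label in occupations]

# Frequencies: how many units carry each code.
frequencies = Counter(encoded)

print(code_book)    # {'engineer': 1, 'teacher': 2, 'nurse': 3}
print(encoded)      # [1, 2, 1, 3, 2, 1]
print(frequencies)  # Counter({1: 3, 2: 2, 3: 1})
```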

2. Levels of measurement [2]

Data characterizing a statistical unit may be measured and expressed at one of the
following levels of measurement:
- Nominal;
- Ordinal;
- Interval;
- Ratio.

Nominal-Level Data is used for qualitative variables expressed in words. This level of
measurement does not allow algebraic calculations: the observations can only be classified and
counted, and there is no particular order among the labels. E.g., occupation, area of
specialization of individuals, consumer goods, etc.
Ordinal-Level Data is used when the values of the studied variable cannot be measured but can
be ordered ascending or descending. E.g., the opinion about the price of a product can be: “very
expensive”, “expensive”, “acceptable”, “cheap”, “very cheap”. This level allows units to be
ranked. Although algebraic calculations on the variable's values are not possible, an average
score can still be determined.
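A minimal Python sketch of how an average score can be obtained for ordinal data, assuming the price opinions are coded with ranks 1 to 5 (the coding and the answers are hypothetical). Because the distances between ranks are not real measurements, the average score should be read with care; the median category, which uses only the order, is shown alongside it.

```python
from statistics import mean, median_low

# Ordered answer categories; the ranks 1..5 are a conventional coding (only the order matters).
SCALE = ["very cheap", "cheap", "acceptable", "expensive", "very expensive"]
RANK = {label: r for r, label in enumerate(SCALE, start=1)}

# Hypothetical answers of five respondents about the price of a product.
answers = ["expensive", "acceptable", "very expensive", "expensive", "cheap"]
scores = [RANK[a] for a in answers]

average_score = mean(scores)      # average score on the 1..5 scale
median_rank = median_low(scores)  # middle rank; median_low always returns an existing rank

print(scores)                     # [4, 3, 5, 4, 2]
print(average_score)              # 3.6
print(SCALE[median_rank - 1])     # expensive
```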
Interval-Level Data is used for variables that are assigned numerical values according to
the distance (interval) between values. This level has all the characteristics of the ordinal
level and, in addition, the distance between successive values of the scale is constant. E.g.,
temperature, time, distance between two towns. Thus, temperatures or moments in time can easily
be ranked, and the difference between two temperatures or two moments can also be determined.
Because the zero point is conventional, however, ratios are not meaningful: 20 °C is not twice
as warm as 10 °C.
Ratio-Level Data is used for numerical variables and allows all numerical operations. The
ratio between two values is meaningful: wages, units of production, changes in stock prices,
etc. The zero point is also meaningful: zero lei means having no money, for example.
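The difference between the interval and ratio levels can be checked numerically. In this Python sketch the temperatures and wages are illustrative values chosen for the example: a change of unit preserves differences on an interval scale but not ratios, while on a ratio scale the ratio itself survives the change of unit.

```python
def celsius_to_fahrenheit(c):
    """Both temperature scales are interval scales: the zero point is conventional."""
    return c * 9 / 5 + 32

t1, t2 = 10.0, 20.0                     # two temperatures in degrees Celsius

# Differences are meaningful: they change only by the unit factor (1 C = 1.8 F).
diff_c = t2 - t1                                                   # 10.0
diff_f = celsius_to_fahrenheit(t2) - celsius_to_fahrenheit(t1)     # 18.0

# Ratios are not: "twice as warm" in Celsius is not twice in Fahrenheit.
ratio_c = t2 / t1                                                  # 2.0
ratio_f = celsius_to_fahrenheit(t2) / celsius_to_fahrenheit(t1)    # 1.36

# On a ratio scale the ratio survives a change of unit (lei -> thousand lei).
wage1, wage2 = 2000.0, 4000.0           # hypothetical wages in lei
ratio_lei = wage2 / wage1                                          # 2.0
ratio_thousand_lei = (wage2 / 1000) / (wage1 / 1000)               # 2.0

print(diff_c, diff_f, ratio_c, ratio_f, ratio_lei, ratio_thousand_lei)
```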

[1] In Romanian: “seria de distributie de frecvente” or “seria de distributie”.
[2] In Romanian: “scale de masurare”.


3. The grouping of data

The grouping of data is the process of dividing the units of a statistical population into
homogeneous classes according to the variation of one or more variables, called
“grouping variables”.
Grouping variable = variable taken into account when the units of the population
are divided into homogeneous classes.
Homogeneous class = group of units within which the variation of the grouping variable is minimal.

Types of statistical classes:

a) According to the number of grouping variables:
- Simple classes: only one variable is used for grouping;
- Combined classes: two or more variables are used;
b) According to the content of the variables:
- Chronological classes: the grouping variable is time;
- Territorial classes: the grouping variable is space;
- Attributive classes: the grouping variable is an attribute expressed in words or numbers;
c) According to the size of the intervals:
- Classes of equal intervals (numerical groups);
- Classes of unequal intervals;
d) According to the nature of the variation:
- Classes of continuous variation: the upper limit of an interval is the lower limit of the
next one;
- Classes of discontinuous variation: the upper limit of an interval is one unit lower than
the lower limit of the next one.
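The two kinds of class limits can be sketched in Python, assuming classes of equal width and whole-number data, so that one unit equals 1. The function names are illustrative only.

```python
def continuous_classes(lower, width, count):
    """Continuous variation: the upper limit of a class is the lower limit of the next."""
    return [(lower + k * width, lower + (k + 1) * width) for k in range(count)]

def discontinuous_classes(lower, width, count, unit=1):
    """Discontinuous variation: the upper limit is one unit below the next lower limit."""
    return [(lower + k * width, lower + (k + 1) * width - unit) for k in range(count)]

print(continuous_classes(0, 6, 3))     # [(0, 6), (6, 12), (12, 18)]
print(discontinuous_classes(0, 6, 3))  # [(0, 5), (6, 11), (12, 17)]
```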

3A. The grouping in equal intervals

It is used when the data are spread fairly uniformly, varying continuously between the
minimum and maximum values of the series. The defining feature of this grouping is that all
intervals have the same width.
The steps for this type of grouping are the following:
a. Order the series of individual data;
b. Determine the absolute amplitude (range) of variation: $A = x_{max} - x_{min}$;
c. Determine the number of classes and the size of the interval [3]:
Case 1. When the number of classes (r) is known, the size of the interval (h) is
obtained with the formula $h = A / r$.
Case 2. When the number of classes is unknown, the size of the interval is determined
with Sturges' formula: $h = \dfrac{x_{max} - x_{min}}{1 + 3.322 \lg N}$, where N = total
number of units in the population.
Then the number of classes is calculated: $r = A / h$.
d. Build the frequency distribution:
- The lower limit of the first interval is equal to $x_{min}$ or to a convenient
lower value;
- The upper limit of the first interval is obtained by adding the interval
size to the lower limit of the interval;
- The upper limit of the first interval becomes the lower limit of the
next interval, and the process is repeated.
e. Count the number of items in each class. The number of observations in each
class is called the class frequency.

[3] Class interval or width.
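Steps b to d can be sketched in Python as follows. Sturges' rule is usually stated as giving the number of classes, r ≈ 1 + 3.322 lg N; dividing the range by it gives the width formula above. Rounding r up to a whole number in case 2 is an assumption of this sketch (in practice h is also often rounded to a convenient value), and the function names are hypothetical. The usage lines reuse the length-of-service data from the example that follows.

```python
import math

def class_width(values, r=None):
    """Steps b-c: range A and interval size h (and r, if it was not given)."""
    a = max(values) - min(values)                      # step b: A = xmax - xmin
    if r is not None:
        h = a / r                                      # case 1: h = A / r
    else:
        h = a / (1 + 3.322 * math.log10(len(values)))  # case 2: Sturges' formula
        r = math.ceil(a / h)                           # r = A / h, rounded up (assumption)
    return a, h, r

def class_limits(lower, h, r):
    """Step d: each upper limit (lower limit + h) becomes the next lower limit."""
    return [(lower + k * h, lower + (k + 1) * h) for k in range(r)]

# Usage with the length-of-service data of the example below.
service = [3, 5, 7, 29, 11, 13, 16, 20, 17, 18, 19, 19, 3, 21, 23, 25, 2, 25, 29, 16]
print(class_width(service, r=5))   # (27, 5.4, 5)
print(class_width(service))        # Sturges: h is about 5.07, r = 6
print(class_limits(0, 6, 5))       # [(0, 6), (6, 12), (12, 18), (18, 24), (24, 30)]
```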

Example:
For 20 employees, the following data on length of service (years) are known:
3, 5, 7, 29, 11, 13, 16, 20, 17, 18, 19, 19, 3, 21, 23, 25, 2, 25, 29, 16.
Group the employees into five equal intervals by length of service.

Solution:
a. The ordered series of individual data:
2, 3, 3, 5, 7, 11, 13, 16, 16, 17, 18, 19, 19, 20, 21, 23, 25, 25, 29, 29
b. $A = x_{max} - x_{min} = 29 - 2 = 27$ years
c. $r = 5$, $h = A/r = 27/5 = 5.4$ years, rounded up to 6 years
d. The frequency distribution is (each interval includes its lower limit and excludes its upper
limit, so 18 belongs to the 18-24 class):

Classes (years) | Frequency
0-6 | 4
6-12 | 2
12-18 | 4
18-24 | 6
24-30 | 4
Total | 20

We chose 0 as the lower limit instead of 2 (the shortest length of service), since this does
not affect the classes. In addition, a person with less than 2 years of service may be hired in
the future; this person can then be assigned to a class that already exists (0-6 years).
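The counting in step e can be checked with a short Python sketch for this example. It assumes that each class includes its lower limit and excludes its upper limit (a value of 18 is counted in 18-24); with this convention the frequencies are 4, 2, 4, 6 and 4. If upper limits were included instead, the value 18 would move to the 12-18 class.

```python
from collections import Counter

# Length of service (years) of the 20 employees in the example.
service = [3, 5, 7, 29, 11, 13, 16, 20, 17, 18, 19, 19, 3, 21, 23, 25, 2, 25, 29, 16]
lower, h, r = 0, 6, 5   # lower limit 0, interval size 6 years, 5 classes

# Assumption: each class [a, b) includes its lower limit and excludes its upper limit,
# so the class index of a value x is (x - lower) // h.
counts = Counter((x - lower) // h for x in service)

distribution = [(f"{lower + k * h}-{lower + (k + 1) * h}", counts[k]) for k in range(r)]
for label, frequency in distribution:
    print(f"{label:<6}{frequency:>3}")
print(f"{'Total':<6}{sum(counts.values()):>3}")
```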

3B. The grouping in unequal intervals

Grouping in unequal intervals is used when:

- the analysis requires the qualitative study of types whose size differs from one class to
another (e.g., classes of employees by seniority as defined by law, classes of companies by
number of employees);
- the population is structured into qualitative types. E.g., SMEs are grouped into three
categories: micro-enterprises (0-10 employees), small enterprises (10-50 employees) and
medium-size enterprises (50-250 employees).
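Assigning units to unequal classes is a lookup against a sorted list of class limits. This Python sketch uses the SME limits from the text and assumes each interval includes its lower limit and excludes its upper one, so a company with exactly 10 employees counts as small. The employee numbers of the five companies are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Class limits of the SME example; intervals are assumed to be [lower, upper).
BOUNDS = [0, 10, 50, 250]
LABELS = ["micro-enterprise", "small enterprise", "medium-size enterprise"]

def sme_class(employees):
    """Return the SME class for a given number of employees."""
    if not BOUNDS[0] <= employees < BOUNDS[-1]:
        raise ValueError("number of employees outside the 0-250 range")
    # bisect_right counts the limits <= employees; minus one gives the class index.
    return LABELS[bisect_right(BOUNDS, employees) - 1]

# Hypothetical number of employees of five companies.
companies = [3, 12, 75, 48, 9]
print(Counter(sme_class(n) for n in companies))
```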


4. Presentation of statistical data

4.1. Statistical tables

A statistical table is an orderly presentation of data and is recommended when derived
indicators are to be calculated.

A statistical table contains the following components:

- Table title: states briefly what the data refer to;
- Table layout: horizontal and vertical grid lines;
- The unit of measurement (mentioned in the general title or in internal sub-titles);
- Row names (the subject of the table) and column names (the predicate of the table): show
the variables taken into account;
- The data source (below the table);
- Statistical data (in the table);
- Notes (below the table);
- Cells containing numbers or symbols.

Official statistics use the following symbols:

- “0”: a value different from zero but smaller than half of the unit of measurement;
- “…”: data not available;
- “-”: the value is zero;
- “x”: the value is not meaningful;
- “xp”: provisional data;
- “xr”: revised data.

Types of tables:
- Descriptive tables: used for observation and recording;
- Processing tables: used for computation algorithms and the calculation of indicators;
- Simple tables: used for simple groupings of data;
- Group tables: used to present simple classes of data together with the totals and
frequencies of each class;
- Contingency tables (crosstabs or cross tabulations): used to present data grouped by
two variables;
- Association tables (associative tables): show the link between two alternative
(two-valued) variables.
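A contingency table simply counts the units for each combination of the two grouping variables. The Python sketch below builds one with row and column totals; the survey records (travel reason by age group) are hypothetical, and the rows play the role of the table's subject while the columns form its predicate.

```python
from collections import Counter

# Hypothetical survey records: (travel reason, age group) for each respondent.
records = [
    ("leisure", "under 30"), ("business", "30-50"), ("leisure", "30-50"),
    ("leisure", "under 30"), ("business", "over 50"), ("visiting", "over 50"),
]
ages = ["under 30", "30-50", "over 50"]              # column variable (predicate)
reasons = sorted({reason for reason, _ in records})  # row variable (subject)

cells = Counter(records)                             # frequency of each (reason, age) pair

# Print the cross tabulation with row and column totals.
print("reason".ljust(10) + "".join(a.rjust(10) for a in ages) + "total".rjust(7))
for reason in reasons:
    row = [cells[(reason, a)] for a in ages]
    print(reason.ljust(10) + "".join(str(n).rjust(10) for n in row) + str(sum(row)).rjust(7))
col_totals = [sum(cells[(reason, a)] for reason in reasons) for a in ages]
print("total".ljust(10) + "".join(str(n).rjust(10) for n in col_totals) + str(len(records)).rjust(7))
```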

Ex. simple table: The main reason of purchasing

Reason | Nb. persons
Price | 129
Promotional offers | 61
Traditional recipes | 185
Company reputation | 98
Other | 27
Total | 500

Ex. association table: The travel reason by age


4.2. Statistical series

A statistical series is defined as a correspondence between two rows of statistical data.
The first row shows the variation of the grouping variable, and the second gives either the
aggregated (centralized) frequencies of its values or the values of another variable with
which it is correlated.

4.3. Statistical graphs

A statistical graph presents data by means of conventional images, in order to make the
essential features of the studied phenomenon easy to identify.
A graph describes a phenomenon in a simplified manner, using figures and sizes. It helps
form a visual image of:
- the trend of the phenomenon;
- the interdependence of variables;
- structures and their changes in time and space.
Graphs help in choosing the methods of statistical calculation and in approximating
statistical values. Graphs can accompany tables; a graph is preferred when the user does not
intend to make his or her own calculations. A graph omits details, shows only suggestive data
and reports briefly on trends and on the interdependence of phenomena. It reflects reality
correctly only if the principle of proportionality is respected (correct choice of scale and
of chart type).
The elements of a graph are:
- the title of the chart: indicates the nature of the data and the time and place to which
they refer;
- the axes of the graph: usually rectangular axes;
- the scale of representation: ensures the proportionality of the representation, indicates
what one unit stands for and is used to graduate the axes (the scale may be linear or
logarithmic);
- the grid of the chart: lines parallel to the rectangular axes, or a network of concentric
circles;
- the chart legend: explains the conventional symbols and colors used;
- the data source: mentioned below the chart.

Types of graphs:
a) Column chart: the horizontal axis (Ox) holds as many columns as there are indicators; the
vertical axis (Oy, height) shows the size of each indicator.
b) Bar chart: replaces the vertical columns of the column chart with horizontal bars. This
graph is used, for example, to represent the demographic (age) pyramid.
c) Geometric chart (square, rectangle, parallelogram): used to represent volume
indicators or structures.
d) Charts for presenting structure: (d1) a circle divided into sectors (pie chart); (d2) a
column (bar) whose total area equals 100%, divided into as many parts as the phenomenon has
structural components.
e) Charts for representing a frequency distribution: (e1) histogram: Ox shows the classes and
Oy the frequencies; (e2) frequency polygon: obtained by joining the midpoints of the tops of
the histogram columns; Ox shows the class midpoints and Oy the frequency of each class.


f) Other types of charts: time chart [4] (used to represent time series [5]), polar diagrams,
cartograms, natural or symbolic figures.
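The geometry behind the histogram and the frequency polygon can be sketched in Python: each histogram column starts at a class's lower limit, is as wide as the class and as tall as its frequency, while the polygon joins the points (class midpoint, frequency). The frequency distribution below is hypothetical.

```python
# Hypothetical frequency distribution with equal classes of width 6.
classes = [(0, 6), (6, 12), (12, 18), (18, 24), (24, 30)]
frequencies = [3, 5, 8, 5, 2]

# Histogram: one column per class, from its lower limit, as wide as the class, as tall as f_i.
columns = [(lo, hi - lo, f) for (lo, hi), f in zip(classes, frequencies)]

# Frequency polygon: join the midpoints of the column tops, i.e. (class centre, frequency).
polygon = [((lo + hi) / 2, f) for (lo, hi), f in zip(classes, frequencies)]

print(columns)   # [(0, 6, 3), (6, 6, 5), (12, 6, 8), (18, 6, 5), (24, 6, 2)]
print(polygon)   # [(3.0, 3), (9.0, 5), (15.0, 8), (21.0, 5), (27.0, 2)]
```

With a plotting library such as matplotlib, the columns could be drawn with `bar(x, height, width, align="edge")` and the polygon with `plot`.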

Examples of graphs: pie chart; bar chart; column chart; bar chart used for comparing values;
bar chart used for presenting a structure; icon chart [6]; figurative chart.

[4] In Romanian: “cronograma”.
[5] In Romanian: “serii cronologice”.
[6] In Romanian: “Pictograma”.


5. Relative statistical measurements [7]

Relative statistical measurements are statistical indicators resulting from the ratio of two
absolute, average or relative values. They express the proportion between the two indicators
compared.
Depending on how they are determined and what they mean, there are five categories: structure,
coordination, plan, dynamics and intensity.
Any relative measurement, except the relative intensity measurement, can be expressed:
- as a coefficient (the result of the ratio);
- as a percentage (the result of the ratio × 100);
- as per mille [8] (the result of the ratio × 1000).
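The three forms of expression differ only by a constant factor, as this small Python sketch shows. The function name is hypothetical; the input, 61 of 500 respondents naming promotional offers, comes from the simple table in section 4.1.

```python
def express(numerator, denominator):
    """Express a relative measurement as a coefficient, a percentage and per mille."""
    coefficient = numerator / denominator
    return coefficient, coefficient * 100, coefficient * 1000

# 61 of the 500 respondents in the simple table of section 4.1 named promotional offers.
coef, pct, per_mille = express(61, 500)
print(f"{coef:.3f}  {pct:.1f}%  {per_mille:.0f}‰")   # 0.122  12.2%  122‰
```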

5.1. Relative structural measurement [9]

Relative structural measurement indicates the structure of a population or phenomenon
and is determined as the ratio between each part of the population and the population as a whole.

For an attributive variable:
$$x_i^{*} = \frac{x_i}{\sum x_i} \cdot 100$$
where $x_i^{*}$ = weight of $x_i$ in the total;
For repeated frequencies [10]:
$$f_i^{*} = \frac{f_i}{\sum f_i} \cdot 100$$
where $f_i^{*}$ = weight of the frequency of class i in the total population;
For a frequency distribution based on grouping by classes:
$$y_i^{*} = \frac{x_i f_i}{\sum x_i f_i} \cdot 100$$
where $y_i^{*}$ = weight of class i (values $x_i$ occurring with frequency $f_i$) in the total.
The relative structural measurements of all parts add up to 100%. This relative measurement is
represented graphically mainly by a pie chart.
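Both forms of the structure measurement can be computed in a few lines of Python. The first part applies $f_i^{*}$ to the simple table of section 4.1; the second applies $y_i^{*}$ to a hypothetical grouped distribution (class centres $x_i$ and frequencies $f_i$ invented for illustration). In both cases the shares add up to 100%.

```python
# Answers from the simple table "The main reason of purchasing" (section 4.1).
reasons = {
    "Price": 129,
    "Promotional offers": 61,
    "Traditional recipes": 185,
    "Company reputation": 98,
    "Other": 27,
}
total = sum(reasons.values())                                 # 500 persons
structure = {k: f / total * 100 for k, f in reasons.items()}  # f_i* = f_i / sum f_i * 100
for reason, share in structure.items():
    print(f"{reason:<20}{share:6.1f}%")                       # Price 25.8%, ..., Other 5.4%

# y_i* for classes: share of each class in the total value sum(x_i * f_i).
centres = [3, 9, 15, 21, 27]     # hypothetical class centres x_i
freqs = [3, 5, 8, 5, 2]          # hypothetical class frequencies f_i
volumes = [x * f for x, f in zip(centres, freqs)]
y_star = [v / sum(volumes) * 100 for v in volumes]
print([round(y, 2) for y in y_star])                          # [2.7, 13.51, 36.04, 31.53, 16.22]
print(round(sum(structure.values()), 6), round(sum(y_star), 6))  # 100.0 100.0
```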

5.2. Relative coordination measurement [11]

Relative coordination measurement is determined as the ratio between classes (components) of
the same population, or between populations of the same type observed in the same period of
time but in different territorial units.

[7] In Romanian: “Marimi relative”.
[8] Parts per thousand (‰).
[9] In Romanian: “marimile relative de structura”.
[10] In Romanian: “frecventa de repetitie”.
[11] In Romanian: “marimile relative de coordonare”.


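$$K_{A/C} = \frac{x_A}{x_C} \cdot 100 \qquad \text{and} \qquad K_{C/A} = \frac{x_C}{x_A} \cdot 100$$
where $x_A$ and $x_C$ are the levels of the two compared components (or populations) A and C.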

Examples: the ratio between the population of county X and the population of county Y; the
ratio between the number of doctors and the number of nurses, or between the number of
teachers and the number of pupils/students, etc.
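As an illustration, this Python sketch compares two classes of the same population from the simple table in section 4.1: 185 respondents named traditional recipes and 129 named price. The helper name is hypothetical. The result reads as "about 143 persons citing traditional recipes for every 100 citing price", and the two coefficients are reciprocal (their product is 10,000).

```python
def coordination(x_a, x_c):
    """K_A/C = x_A / x_C * 100: level of component A per 100 units of component C."""
    return x_a / x_c * 100

traditional_recipes, price = 185, 129              # answers from the simple table in section 4.1
k_ac = coordination(traditional_recipes, price)    # about 143.4
k_ca = coordination(price, traditional_recipes)    # about 69.7
print(round(k_ac, 2), round(k_ca, 2))              # 143.41 69.73
```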

5.3. Relative plan measurement [12]

Relative plan measurement expresses the extent to which a company's targets have been met.
Examples:

- Achievement of the plan:
$$MRP = \frac{x_{real}}{x_{pl}} \cdot 100$$
where $x_{real}$ = level of the variable actually achieved; $x_{pl}$ = planned level of the variable.

- Coverage of the plan by contracts:
$$MRAC = \frac{x_{contracted}}{x_{planned}} \cdot 100$$
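A minimal Python sketch of the two plan measurements, using hypothetical company figures. A result above 100% means the plan was exceeded; below 100% means only part of the planned level was reached (here, part of the plan is covered by contracts). Multiplying by 100 before dividing keeps these particular results exact in floating point.

```python
def percent_of(x, reference):
    """Relative plan measurement: x as a percentage of the reference (planned) level."""
    return 100 * x / reference

# Hypothetical figures for one company (thousand lei).
planned, achieved, contracted = 2000, 2150, 1900

mrp = percent_of(achieved, planned)      # achievement of the plan: 107.5 %
mrac = percent_of(contracted, planned)   # coverage of the plan by contracts: 95.0 %
print(mrp, mrac)                         # 107.5 95.0
```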

5.4. Relative dynamic measurement [13]

Relative dynamic measurement compares the level reached by a variable in two periods of time
(0 = reference period, t = current period). It is used to describe the evolution of a
phenomenon over time (time series).

$$i = \frac{x_t}{x_0} \cdot 100$$
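Applied to a whole time series with a fixed reference period, the formula gives one index per period. The four values in this Python sketch are hypothetical.

```python
# Hypothetical values of a variable over four periods; period 0 is the reference period.
values = [200, 210, 231, 220]

x0 = values[0]
indices = [100 * xt / x0 for xt in values]   # i = x_t / x_0 * 100 for each period t
print(indices)                               # [100.0, 105.0, 115.5, 110.0]
```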

5.5. Relative intensity measurement [14]

Relative intensity measurement is determined as the ratio of two interdependent variables.
The ratio creates a new variable that has its own economic content and its own unit of
measurement.

E.g.:
$$\text{average wage} = \frac{\text{wage fund}}{\text{number of employees}}, \qquad \text{labor productivity} = \frac{\text{production}}{\text{number of employees}}$$
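The two intensity measurements above can be sketched in Python with hypothetical company data. Unlike the other relative measurements, the results are not percentages: each is a new variable with its own unit (here, thousand lei per employee).

```python
# Hypothetical data for one company.
wage_fund = 1200.0     # thousand lei
production = 6000.0    # thousand lei
employees = 40         # persons

# Intensity measurements: new variables with their own units of measurement.
average_wage = wage_fund / employees          # thousand lei per employee
labor_productivity = production / employees   # thousand lei per employee
print(average_wage, labor_productivity)       # 30.0 150.0
```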

[12] In Romanian: “marimile relative ale planului”.
[13] In Romanian: “marimile relative de dinamica”.
[14] In Romanian: “marimile relative de intensitate”.


Worksheet no. 3.1. Study guide (individual/group work)

1. Which level of measurement is appropriate for each of the following variables:

a) Occupation of employee: ___________________________________


b) Seniority: _____________________________________________
c) Productivity: ____________________________________
d) Customer perception of the quality of the products/services received:
____________________________________________

2. What are the most representative ways to present the structure of a population?

3. When is grouping in unequal intervals used?

4. Determining a relative intensity measurement results in:


a) A weight;
b) A dynamic;
c) A new variable.

5. In order to characterize the population components, we can use:


a) Relative structural measurement;
b) Relative intensity measurement;
c) Relative coordination measurement.

6. Relative coordination measurement is determined as:


a) Ratio between part and whole;
b) Sum of the component parts of a population;
c) Ratio between the component parts of a population.


Worksheet no. 3.2. Exercises

1. The following data are known for a company:

Activity Number of Dynamics of employees Turnover in 2018


employees 2018 number 2018/2017 (%) (thousand lei)
(persons)
A 50 105 500
B 200 95 800
C 100 110 700
D 50 130 500
Total 400 x 2500
Requirements:
a) Determine the overall dynamics of the number of employees in 2018 compared with 2017.
b) Determine the turnover per employee in 2018 for each activity and overall.


2. The analysis of three departments of a company provides the following information:

Department | Nb. workers, reference period (persons) | Nb. workers, current period (persons) | Production, reference period (thousand lei) | Production, current period (thousand lei)
A | 100 | 120 | 600 | 900
B | 70 | 90 | 300 | 450
C | 120 | 60 | 460 | 400
Total | 290 | 270 | 1360 | 1750

Determine all possible relative measurements and represent the structure of production
graphically.

