1 Introduction

Speech analysis is a very active field today. It started in the second half of the 20th century, when the basic characteristics of the speech signal were discovered. The fundamentals are well described in [1,2,3]. The field evolved quickly, and its applications now range from keyword detection [4] through transcription of fluent speech [5] to speaker recognition [6]. This article is intended for those who want to study phonetic analysis and its fundamentals. Historians who study the 20th century in the Czech Republic may also appreciate the lists of the most frequent words and of the words pronounced with the highest energy. Linguists may be interested in how the individual speeches change, in both their written and their spoken form.

Many authors focus on the linguistic analysis of political speeches. For example, end-of-year speeches of Italian presidents and inaugural speeches of US presidents were studied in [7, 8]. The relationship between ideology and language and the thematic concentration of Czechoslovak New Year’s Day speeches are analyzed in [9]. It is certainly interesting to study the influence of ideology, the originality of the author and his ability to depart from uniformity. The most frequent words can provide information about recent years, because they reflect the most important events. Some of those words will be listed.

The main goal of this publication is to present the words that were spoken with the greatest energy. These words make it possible to track what the president emphasized while reading the speech. The speaker’s emphasis will probably fall on positive words; the only exception could be wartime. Ideological words may be emphasized in some speeches. One more characteristic is calculated for each speech: speech velocity.

2 Voice Characteristics

This section introduces the analysis of recordings of New Year’s Day speeches. The intensity of voice (energy) and the speech velocity represent the voice characteristics of the speaker. Energy indicates how much emphasis the speaker uses, and speech velocity shows how fast the speaker speaks. These variables can be influenced, for example, by age or illness. The words with the greatest energy can then be found, and it can be interesting to compare them with the most frequent thematic words. The president did not have to be the author of the written text, but he could highlight any words he wanted (Fig. 1), depending on what he considered important. This is an expression of individuality. Finally, the ZCR (zero crossing rate) characteristics are shown.

Fig. 1. “Dear fellow citizens” – Václav Havel (1998). Source: own.

2.1 Obtaining Data and Its Processing

The source data were obtained from the website www.rozhlas.cz. The speeches were recorded with the Audacity software. The sampling rate of each recording is 8 kHz. Each recording was trimmed, because the originals contain music before the speech starts. The voice parameters are calculated in MATLAB (Fig. 2). The processing scheme is shown in simplified form in Fig. 3.

Fig. 2. Segmentation of recording into frames. Source: own.

Fig. 3. Audio processing. Source: own.

Segmentation means that the recording is divided into frames of the same length (typically 20 ms). In this case, adjacent frames overlap by half of their length. Overlapping is recommended because the parameters can change abruptly; with overlapping frames, such changes are captured even near a frame edge, without loss of useful data. Segmentation is followed by parameterization, during which the feature vector of each frame is computed. Features can be divided into basic, spectral, cepstral and dynamic ones. ZCR and energy rank among the basic features. Feature extraction and segmentation are also discussed in [2].
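The segmentation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MATLAB code used for the article: the function name split_into_frames and its parameters are hypothetical, and trailing samples that do not fill a whole frame are simply dropped.

```python
def split_into_frames(samples, sample_rate=8000, frame_ms=20, overlap=0.5):
    """Split a sequence of samples into equal-length, overlapping frames.

    Assumption: samples that do not fill a complete last frame are discarded.
    """
    frame_len = int(sample_rate * frame_ms / 1000)   # 160 samples at 8 kHz
    hop = max(1, int(frame_len * (1 - overlap)))     # 80 samples for 50 % overlap
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# Toy signal of 400 samples: frames start at samples 0, 80, 160 and 240.
signal = list(range(400))
frames = split_into_frames(signal)
print(len(frames), len(frames[0]), frames[1][0])
```

With a sampling rate of 8 kHz, a 20 ms frame holds 160 samples, so a half-frame overlap means that a new frame starts every 80 samples.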

2.2 Intensity of Voice

The intensity of voice is characterized by energy, which is defined as the sum of the squared sample values within one frame. The logarithm is applied to obtain a more convenient range of values. In this case, the log energy of ordinary noise is around 5. Frames that contain speech have greater energy; typically, the log energy of a speaking person can reach a value of 15, depending on how loudly the speaker speaks. Log energy is defined as

$$ E = \log \left( {\sum\nolimits_{n = 0}^{{{\text{L}} - 1}} {x^{2} \left( n \right)} } \right) , $$
(1)

where L is the frame length, i.e. the number of samples in the frame, and x(n) is the value of the n-th sample.
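Equation (1) can be illustrated by the following minimal Python sketch. It rests on two assumptions that the text does not state: the natural logarithm is used as the base, and a small floor value (a hypothetical parameter) protects against taking the logarithm of zero for an all-silent frame.

```python
import math


def log_energy(frame, floor=1e-12):
    """Log energy of one frame: log of the sum of squared samples, Eq. (1).

    Assumptions: natural logarithm; 'floor' avoids log(0) for silent frames.
    """
    energy = sum(x * x for x in frame)
    return math.log(max(energy, floor))


# Sum of squares is 1 + 4 + 9 = 14, so the result is ln(14).
print(log_energy([1, 2, 3]))
```

The absolute values depend on the sample scaling and on the base of the logarithm, so they are not expected to match the values of 5 and 15 quoted above.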

A comparison of all presidents shows that president Hácha spoke less loudly than the others and used no emphasis. This could be caused by the political situation: Hácha was president during the hardest period of Czechoslovak history, the helpless president of a protectorate state. The only thing he could do was to try to make people feel calm and safe, even if that was not possible. For president Husák, a very significant decrease in energy was observed between 1978 and 1979 (Fig. 4).

Fig. 4. Log energy: mean value. Source: own.

2.3 ZCR

Zero crossing rate is a parameter that counts the sign changes of the signal, from negative to positive or back. It is related to the frequency of the signal. As with energy, there is one ZCR value for each frame. The principle can be explained with Fig. 5: the ZCR value equals the number of dots, which are placed at the points where the signal crosses the x axis and changes sign.

Fig. 5. Explanation of ZCR: a 20 ms frame containing the phoneme “a”. Source: own.

ZCR is often used for voice activity detection [11], i.e. to find out whether human speech is present in a recording. For voiced signals, ZCR values are typically low; noise and unvoiced signals have higher values. The method is sensitive to noise and to DC offset. It even makes it possible to determine whether a particular phoneme is voiced (b, d, g, z, v, h, …) or unvoiced (p, t, k, s, f, ch, c, …). Sibilants in particular (s, c, š, č, …) have higher ZCR values (Fig. 6). ZCR is computed as

$$ ZCR = 1/2\sum\nolimits_{n = 1}^{{{\text{L}} - 1}} {\left| {\text{sgn} \,x\left( n \right) - \text{sgn} \,x\left( {n - 1} \right)} \right|} . $$
(2)

Fig. 6. Zero crossing rate: mean value. Source: own.

The variability of the data is relatively high, so the mean ZCR value does not represent an individual speaker well. Better results can be obtained by using ZCR dynamically, that is, by taking the ZCR of each frame and searching for dynamic changes instead of treating ZCR as one static value.
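A minimal Python sketch of Eq. (2) and of the dynamic use of ZCR is given here for illustration. The function names are hypothetical, and the convention sgn(0) = 0 is an assumption; the article does not specify how samples that are exactly zero are treated.

```python
def sign(x):
    """Signum function: -1, 0 or 1 (assumption: sgn(0) = 0)."""
    return (x > 0) - (x < 0)


def zero_crossing_rate(frame):
    """ZCR of one frame as in Eq. (2): half the sum of |sgn x(n) - sgn x(n-1)|."""
    return 0.5 * sum(
        abs(sign(frame[n]) - sign(frame[n - 1])) for n in range(1, len(frame))
    )


def zcr_deltas(frames):
    """Frame-to-frame ZCR changes, one simple way to look at ZCR dynamically."""
    track = [zero_crossing_rate(f) for f in frames]
    return [b - a for a, b in zip(track, track[1:])]


print(zero_crossing_rate([1, -1, 1, -1]))           # three sign changes
print(zcr_deltas([[1, 2], [1, -1], [1, -1, 1]]))    # change between consecutive frames
```

Keeping the whole per-frame track, rather than its mean, preserves the jumps between voiced and unvoiced segments that the mean value averages out.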

2.4 Speech Velocity

For the purposes of this article, a parameter was created that links the results of the text and voice analysis into one value characterizing the speaker. It is called speech velocity, and its mean value expresses how many words the speaker pronounces per second. The speech of president Husák from 1989 is by far the slowest. President Hácha also speaks relatively slowly. On the other hand, the fastest tempo can be found in the speeches from 1938, 1943 (Beneš), 1959 (Novotný) and 1996 (Havel) (Fig. 7).
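As a rough illustration, the mean speech velocity can be computed as the number of words in the transcript divided by the duration of the recording. The sketch below assumes that words are separated by whitespace and that the duration is known in seconds; the function name is hypothetical and the exact procedure used for the article may differ.

```python
def speech_velocity(transcript, duration_s):
    """Mean speech velocity in words per second.

    Assumptions: whitespace tokenization; duration_s is the length of the
    trimmed recording in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return len(transcript.split()) / duration_s


# Three words spoken in 1.5 s give 2 words per second.
print(speech_velocity("Dear fellow citizens", 1.5))
```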

Fig. 7. Speech velocity: mean value. Source: own.

3 Characteristics of Written Text

All studies were carried out on 87 speeches of Czechoslovak presidents, Czech presidents or Czechoslovak prime ministers. A unique situation arose during World War II, when the Czechoslovak Republic had two presidents. President Beneš left the country and went into exile in Great Britain, but he remained very politically active. Hácha was then chosen as president of the protectorate. Both groups of speeches between 1940 and 1945 were therefore analyzed, although the president in exile is considered the more important subject of our analysis.

We can expect the use of words of different lengths to change over the long history of New Year’s Day speeches. Therefore, the first aim of our calculation is to determine the average word length. The text parameters are calculated with Java-based software [10] called “Statistika v lexikální analýze” (“Statistics in Lexical Analysis”). This application with a GUI (graphical user interface) was created as part of a diploma thesis and simplifies the whole text processing. It makes it possible to analyze frequent letters and words, word length, aggregation, alliteration and some other features. The software was originally intended for the analysis of poems and their translations, as in [12].

3.1 Mean of Word Length

Fig. 8 shows the mean word length of the analyzed texts. The words used by the communist presidents (Zápotocký, Novotný, Husák) are on average much longer than those of recent presidents (Havel, Klaus, Zeman). President Beneš used the shortest words. The greatest variability can be seen for Svoboda and Hácha.

Fig. 8. Length of words: mean value. Source: own.

The expected value is estimated by

$$ \bar{x} = 1/n\sum\nolimits_{i = 1}^{k} {if_{i} } , $$
(3)

where i = 1, …, k is the word length, k is the length of the longest word, f_i is the number of words of length i and n is the total number of words.
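Equation (3) can be sketched in Python as follows. The sketch assumes that the text has already been split into words and that f_i is the number of words of length i; the function name is hypothetical and the code is not taken from the software mentioned above.

```python
from collections import Counter


def mean_word_length(words):
    """Mean word length as in Eq. (3): (1/n) * sum over i of i * f_i."""
    freq = Counter(len(w) for w in words)   # f_i: number of words of length i
    n = sum(freq.values())                  # total number of words
    if n == 0:
        raise ValueError("no words given")
    return sum(i * f for i, f in freq.items()) / n


# Lengths 1, 2 and 3 occur once each, so the mean is (1 + 2 + 3) / 3 = 2.
print(mean_word_length(["a", "se", "rok"]))
```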

3.2 The Most Frequent Words

Conjunctions and prepositions naturally rank among the most frequent words. The conjunction “a” is the most frequent word in all speeches except Novotný (1964) with “v”, Svoboda (1973) with “v” and Hácha (1944) with “se”. Figure 10 lists the conjunctions and prepositions ranked first to fourth by frequency, so the common words can be seen. The most frequent words with meaning are presented in Fig. 10 as well. These words differ much more than the prepositions, because presidents react to current political events such as crisis, protectorate, war or the return of democracy. Words with meaning can therefore provide a quick preview of the content. A comparison with the inaugural speeches of US presidents can be interesting: among the words with meaning, Roosevelt (1933) used, for example, HAVE and NATIONAL, and Truman (1949) used WORD, HAVE, NATIONS, PEACE, FREEDOM, PEOPLE, FREE, UNITED, MORE, SECURITY and DEMOCRACY [7]. See Fig. 9 or 10.
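The ranking of frequent words described above can be sketched as follows. This is an illustration only: the short list FUNCTION_WORDS is a hypothetical stand-in for a real list of Czech conjunctions and prepositions, the tokenization by a regular expression is an assumption, and no lemmatization is performed.

```python
import re
from collections import Counter

# Hypothetical, incomplete list of function words (assumption).
FUNCTION_WORDS = {"a", "v", "se", "na", "i", "že"}


def most_frequent(text, top=3):
    """Return the most frequent function words and words with meaning."""
    tokens = re.findall(r"\w+", text.lower())
    ranked = Counter(tokens).most_common()
    function = [(w, c) for w, c in ranked if w in FUNCTION_WORDS][:top]
    content = [(w, c) for w, c in ranked if w not in FUNCTION_WORDS][:top]
    return function, content


function, content = most_frequent("mír a svoboda a mír v zemi a mír")
print(function)     # function words with their counts
print(content[0])   # most frequent word with meaning
```

Separating the two groups matters because function words dominate any raw frequency list and say little about the content of a speech.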

Fig. 9. The most frequent words (Czech version). Source: own.

Fig. 10. The most frequent words and words with the highest energy. Source: own.

The figures are divided into several sections. As mentioned before, the Czechoslovak Republic had two presidents between 1940 and 1945. A second anomaly appeared in 1993, when the Czechoslovak Republic ceased to exist and was divided into two smaller independent countries: the Czech Republic and the Slovak Republic. President Václav Havel therefore gave no speech in 1993; the prime ministers addressed their nations instead.

Rows are sorted by year. Each row is colored according to the president or prime minister, and the same colors are used in all figures. The first column contains the four most frequent conjunctions and prepositions. The next three columns contain the most frequent words with meaning, in order of frequency. The last column shows the word with the highest energy; these words are written in uppercase.

3.3 Number of Words

A scatter chart is used to demonstrate vocabulary richness. The x coordinate is the total number of words in a speech, and the y coordinate is the number of different words. The functional dependency can be modeled by a Gompertz curve. Presidents with values above the curve have a greater ratio of different words to total words than other presidents; the language richness of speeches below the curve can be considered lower. In [9] the author mentions that the thematic concentration of president Havel is surprisingly low. This claim does not seem to correspond with language richness: according to the scatter chart (Fig. 11), the ratio of the number of different words to the total number of words is greater for Havel. The discrepancy could be caused by the choice of different methods for evaluating language richness.
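The comparison of a speech with the Gompertz curve can be sketched as follows. The article does not give the parametrization or the fitted values, so the common form y = a·exp(−b·exp(−c·x)) is assumed here, and the parameters a, b and c in the example are purely hypothetical, not fitted to the speeches.

```python
import math


def gompertz(x, a, b, c):
    """Gompertz curve y = a * exp(-b * exp(-c * x)) (assumed parametrization)."""
    return a * math.exp(-b * math.exp(-c * x))


def vocabulary_counts(words):
    """Return (total number of words, number of different words)."""
    return len(words), len({w.lower() for w in words})


def is_above_curve(words, a, b, c):
    """True if the speech has more different words than the curve predicts."""
    total, different = vocabulary_counts(words)
    return different > gompertz(total, a, b, c)


print(vocabulary_counts(["Rok", "rok", "mír"]))   # 3 words, 2 different ones
print(gompertz(0, 1000, 1, 0.001))                # 1000 * exp(-1)
```

In practice the parameters would be estimated from the points of all speeches, for example by nonlinear least squares, before any speech is classified as lying above or below the curve.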

Fig. 11. Number of different words. Source: own.

4 Conclusions

The goal of this article is to present the results of our research and to show that data we already had can be processed in a different way. Information extraction is much discussed nowadays. The main purpose of the research is to find the words with the greatest energy: they have historical importance, they can be used as keywords, and they even characterize the speaker.

The scope of this publication does not allow detailed commentary or a description of the algorithms used. Many hours of machine time were needed to calculate the phonetic parameters of the speeches. The archive [13] contains 74 speeches, which is more than 19 h of recordings to be analyzed. Before the mean values of ZCR and log energy were calculated, an extensive table existed for each speech. The presented parameters were obtained by reducing each table, which contains millions of per-frame parameter values, to one mean value. The unreduced data may be used for further analysis.

Comparing the table of the most frequent thematic words with the table of the words pronounced with the greatest energy yields almost no match. The speakers did not emphasize the most frequent words but chose to highlight others. For example, Masaryk talked about the economy. Beneš emphasized the war and humankind. Novotný insisted on hard work and on improving the communist country. Havel emphasized the very first words: “Dear fellow citizens.” The emphasized words can provide some information without listening to the whole speech. They even characterize the president himself and his era (the most important events, the standard of living, the relationship between president and citizens).