The Feasibility of Speech Recognition As A Form of Security Measure
A Thesis Presented To
the High School Department of
Sacred Heart Academy
of Pasig
And Science 10
Researchers:
TALOSIG, Nathan E.
10- Prudence
Research Advisers:
Acknowledgements
Abstract
Introduction
Conceptual Framework
Definition of Terms
Synthesis
Research Design
Research Setting
Research Instruments
Materials
Equipment
Procedures
Statistical Treatment
Discussion
Summary
Conclusion
Recommendation
Bibliography
Curriculum Vitae
Acknowledgements
The researchers would like to thank their respective families for giving them the support
they needed to get through such horrendous times. They would also like to thank SHAP
for giving them the opportunity to write such an amazing piece and learn about speech
recognition and security. The researchers also want to thank their fellow classmates for
helping them every step of the way. Lastly, the researchers want to thank two of the
best research advisers out there, Sir Raldin Gem Frias and Ms. Kendra Caramat. They
have guided the researchers in creating a product that can revolutionize the world.
From the making of the title to the title defense, they stuck with the researchers like glue, even
when the researchers had no clue what to do. The researchers thank them for being patient and
understanding. The researchers would also like to thank the panelists for their
constructive criticism, which helped the researchers improve the research even more.
Abstract
The first speech recognition systems were focused on numbers, not words. In 1952,
Bell Laboratories designed the “Audrey” system which could recognize a single voice
speaking digits aloud. Ten years later, IBM introduced “Shoebox” which understood and
responded to 16 words in English. Speech recognition was invented with the idea of
making things more hands-free and easier. Though the world is more secure than ever,
that is no reason to take the issue of security lightly. This research, titled “The
Feasibility of Speech Recognition As A Form of Security Measure,” focused
on creating a security measure using speech recognition that is both effective and
cheap. The researchers used an open-source speech recognizer as the base to test
how effective a regular speech recognizer would be. The research was mainly aimed at
finding out how feasible a speech recognition activated security measure is and if the
call/text that will be produced is fast enough to help a person in danger. This thesis
answered the questions mentioned through a series of tests that determined the overall
effectivity of the product. The researchers conclude, with the results taken from the
CHAPTER 1
Introduction
There are numerous crimes both in homes and public places such as theft,
robbery, homicide and others. Nowadays, the use of an effective security measure is
physical actions and events that could cause serious loss or damage to an enterprise,
agency or institution. This includes protection from fire, flood, natural disasters, burglary,
theft, vandalism and terrorism.” (Rouse, 2016, para. 1). This also ensures that it is the
ability to perform its appointed task by protecting it from attacks inside and outside the
organization. There are methods and measures that are meant to detect attackers and
Hope (2019) stated that it is a computer software program or hardware device with the
ability to decode the human voice. It is commonly used to operate a device, perform
commands, or write without having to use a keyboard, mouse, or press any buttons.
Speech recognition occurs when the recognizer has recognized an assigned word or
words that were programmed in the software. Rouse (2016) stated that it has a limited
vocabulary of words and phrases, and it may only identify these if they are spoken very
clearly. Speech recognizer software is easy to access because it is frequently installed on
computers and mobile devices. The disadvantages of a speech recognizer include its
different languages, and its inability to sort through background noise. But overall,
it is the simplest and easiest form of security measure because it enables
translation, and creates print-ready dictation. Speech must be converted from physical
sound to an electrical signal with a microphone, and then to digital data with an analog-
to-digital converter.
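The conversion described above can be illustrated with a minimal sketch. It assumes a pure sine tone standing in for the microphone's electrical signal and a 16-bit converter; the function name sample_and_quantize and all parameter values are hypothetical and are not taken from the researchers' software.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits=16):
    """Sample a sine tone and quantize each sample to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1  # 32767 for a 16-bit converter
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # time of the n-th sample, in seconds
        # Assumed stand-in for the microphone voltage, scaled to [-1, 1].
        analog = math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(analog * max_code))
    return samples

# Eight samples of a 440 Hz tone taken at 16 kHz.
digital = sample_and_quantize(freq_hz=440, sample_rate_hz=16000, n_samples=8)
print(digital)
```

A real analog-to-digital converter performs this sampling in hardware; the sketch only shows that the digital data is a list of integers taken at fixed time steps.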
helps a lot of things in our society because nowadays, there are many crimes reported
ahead. But it is also usually overlooked by most organizations. There are many reasons
for an attack: the attacker may act for financial gain, for personal gain, for revenge, or
because a vulnerable target is available. Security is more challenging than in previous
decades, as there are more sensitive devices available, such as USB drives,
smartphones, laptops, tablets and many more, that enable the stealing of data easily.
Nowadays, though, there are high-tech security measures that cannot be bypassed easily.
The great advantage is that criminals or attackers need to pass through many methods
and layers of security. As a result, they will have a hard time achieving their
objectives. There are many types of security measures that are effective and
efficient for the sake of our safety. It has three important components, which
are access control, surveillance, and testing. All obstacles should be placed
where attackers frequently carry out their objectives. The location on where the security
Amos (2018) stated that early systems of a speech recognizer were limited to a
single speaker and had limited vocabularies of about a dozen words. The earliest form
software. It is frequently used for dictation and for giving commands to computer-based
systems. Velde (2019) stated that the first-ever recorded attempt at speech recognition
technology dates back to 1,000 A.D. through the development of an instrument that
could supposedly answer “yes” or “no” to direct questions. Modern speech recognizers
have the ability to recognize speech from multiple speakers and have infinite
programs. Most ASR programs require the user to train the program to recognize the
voice so that it can more accurately convert the assigned speech into text. The first ASR
device was used in 1952; it recognized single parts of speech spoken by a
certain user.
Conceptual Framework
The researchers used the CIPP Evaluation model in order to show the Context,
Input, Process, and Product of the study. The CIPP Evaluation Model is a decision-
focused framework made by Daniel Stufflebeam and his colleagues in the 1960s for the
main purpose of guiding evaluations of programs and projects. The CIPP Model is also
defined as the “systematic collection of information about the activities, characteristics,
and outcomes of programs to make judgements about the program, improve program
CIPP is an acronym for the four main concepts this model uses. It stands for context,
input, process, and product. The researchers chose the CIPP Model for it “aims to
provide an analytic and rational basis for program decision-making, based on a cycle of
The researchers decided to employ symbols and colors to provide a clean and
or actions of people or things involved in a complex system or activity. Each color in this
diagram represents a certain part of the process. The researchers decided to use colors
to distinguish between the different concepts to make the diagram simple and not
cluttered. The blue rectangle signifies the Context or the objectives of the study, the
black rectangle signifies the Input, the green signifies the Process, and the yellow
A blue rectangle was chosen by the researchers to show the Context. Blue was
researchers collected and assessed information to scale the main objectives the
researchers want to accomplish. The researchers stated 4 objectives that they want to
system for local store owners who can’t afford the more expensive security measures.
They wish to provide an enticing alternative that won’t cost as much as the other
security systems. They also want to show people the advantages of a hands-free
security system and maybe give them a glimpse of the future of security systems. They
also aim to prove that speech recognition is an effective security measure. As speech
recognition is currently looked at as a “pet project” in the security system industry, the
researchers want to show people that using speech recognition as a form of security
measure is not only possible but also very effective. They lastly want to encourage
After the Context is the Input Evaluation Stage. The purpose of this stage is for
make and execute the procedure to create the intended product. After the researchers
outlined their main goals, they then made a list of the components required. The main
components needed in this study are the Raspberry Pi, the brain of the whole product,
and the microphone, the component used to detect the sounds that are needed to
activate the security measure. Since the Raspberry Pi comes preinstalled with its own
OS and is powerful enough to be used independently, a separate computer is not required
unless a more powerful system is needed to process or load a certain code. Other equipment,
such as the cables (micro-USB cable and HDMI cable), is essential in the process.
The Process Evaluation Stage is one of the most important stages in the CIPP
Model since the quality of the product here is investigated, documented and assessed
(Wilson & Mertens, 2012). The steps need to be executed properly in order to produce
excellent results. Programming is very crucial as one mistake can stop the program
from working. The researchers need to be meticulous and careful in their coding.
Testing and Troubleshooting are very important steps to spot and correct mistakes.
Once the code is inserted, testing must follow to make sure no mistakes were
made.
The final stage in the CIPP Model is the product evaluation. A teardrop
was chosen to show emphasis on the outcome. This stage assesses the final outcome
of the study, whether expected or unexpected. Once the steps have been followed
properly, the Product will be made. Once the product is made, the researchers can then
Conceptual Framework
Context
1. make an affordable security system for local store owners;
2. show the advantages of using a hands-free security system;
3. prove that speech recognition can be an effective security measure; and
4. encourage security companies to use speech recognition in their security
measures.
Input
- Raspberry Pi 3 Model B
- Micro-USB Cable and Power Brick
- RODD Brand USB Computer Microphone
- USB Speakers
- HDMI Cable
Process
1. Connect all necessary components to Raspberry Pi.
2. Program all the required codes.
3. Test and troubleshoot.
Product
Security Measure using Speech
Recognition Software
Figure 1. The procedures needed to make a security measure using speech recognition
software.
recognizer software wherein the speech recognizer will recognize a certain phrase or
word provided by the researchers and the software will secretly call the police and a
close relative/friend alerting them that there is an emergency. The researchers will also
explore the concept of speech recognition software as an alternative for other such
forms of security. The researchers aim to give an affordable alternative to local store
owners who cannot afford the other security measures. During the study, the
1. What will be the optimal distance between the user and the speech recognizer
2. How accurate will the sensor and speech recognizing software be in recognizing
and identifying when the user is trying to activate the security measure?
3. How long is the elapsed time between the activation of the speech recognition
Hypotheses
1. If the microphone isn’t being obscured and the background noise levels
range from 35 to 110 dB, then the approximate needed decibel the user needs
away from the mic. The decibel level the user needs to produce is directly
proportional to the distance of the user from the microphone. If there are no
obstructions covering the microphone, then the optimal distance of the user from
2. If the user is standing 5 meters away from the microphone and says the
word at 50-60 dB, then the software has about a 90% chance of success, keeping in
mind other factors such as the background noise, possible objects covering up the
3. If the user spoke at the optimal distance between the user and speech recognizer
recognizer, then the pre-recorded call/message inputted will take between 5-20
Caliwan (2019) stated that though the total number of index crimes has
dropped, the number of robberies and thefts has remained nearly constant, only dropping by
0.4%. The researchers aim to help Filipino citizens by giving them an alternative
security system that is effective and seamless. This study aims to show Filipino citizens
the effectivity of speech recognition when used in security systems. The researchers
want to provide people an affordable yet reliable security measure usually seen in
The researchers also aim to show security system companies that speech recognition
can be a viable form of security. This study aims to specifically help these groups of
people:
Small Local Store Owners. Stores are prone to being robbed, especially small
local stores. Small local store owners usually can’t afford security systems that can aid
cheap yet reliable security measure. This product takes the feature of speech
system companies to look into speech recognition and invest in this portion of the
security system industry. With this, speech recognition-based security measures can
Corner Stores and Gas Stations. These places stay open until midnight, which
makes them a perfect target for burglars. Mesa
Alarm Systems (2017) stated that over 7,000 corner stores are robbed each year and
most burglars rarely walk away with more than $900. Since most robberies happen at
night, people should limit their time at these locations after dark. The researchers aim to
lessen the number of robberies in these places. With the speech recognizer, the
This study aims to create a device that can detect an emergency based on sound. In
order to do so, the aspect looked into was the sensitivity of the sensors even to the
faintest sounds. This study will cover the effectivity of the product in capturing sounds
accurately. The device will be tested at different sound levels to determine its capability
in capturing and detecting sounds based on the user’s activities and speech and
distinguishing the speaker from background noise. To further enhance the device’s
capability, the researchers will also conduct a test on how precisely the device can pick
up sounds based on the environment. The researchers will also focus on programming
the sensor’s capability in picking up sounds efficiently and also the capability of the
Definition of Terms
For a better understanding of this study, the following terms are defined operationally:
2. Artificial Intelligence (AI). It is the branch of computer science that deals with
identify the words a person has spoken or to authenticate the identity of a person
5. Cyber Espionage. It is unauthorized spying by computer; the term generally
mechanical and digital machines, objects, animals or people that are provided
with unique identifiers (UIDs) and the ability to transfer data over a network
10. Phoneme. It is any of the abstract units of the phonetic system of a language
11. Physical Security. It is that part of security concerned with physical measures
and phrases, and it may only identify these if they are spoken very clearly.
14. Speech Recognizer/Speech Recognition. It is a computer software program or
associated with a single entity with a given system, this made it possible to
CHAPTER 2
The literature and studies cited in this chapter tackle the different concepts,
understandings, and ideas related to the topic of the effectivity of speech recognizer as
and different developments related to the topic. The literature and studies included in
this chapter can help in familiarizing the reader to information and abstracts that are
Caliwan (2019) stated that total crime volume is down and still declining,
thanks in large part to the Philippine National Police’s intensified drive against crime and
lawlessness. This past year has seen a drop in index crimes (such as robberies,
murders, homicide, physical injury, rape, theft, carnapping and cattle rustling) of
22.6%, from 7421 in May 2018 to 5744 in May 2019. Though the total number of index
crimes has dropped, the number of robberies and thefts has remained nearly constant, only
dropping by 0.4%. He stressed the importance of safety. He stated that a drop in
total crimes does not mean people should be complacent. Filipinos
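The 22.6% figure cited from Caliwan (2019) can be checked from the two monthly counts. The short sketch below does only that arithmetic; the function name percent_drop is hypothetical.

```python
def percent_drop(before, after):
    """Return the percentage decrease from `before` to `after`."""
    return (before - after) / before * 100

# Index crime counts cited above: May 2018 and May 2019.
index_crime_drop = percent_drop(7421, 5744)
print(f"{index_crime_drop:.1f}%")
```

The difference is 1,677 cases, and 1,677 divided by 7,421 is roughly 0.226, which agrees with the reported drop.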
Bueza (2018) wrote that around 1.4 million families fell victim to common crimes
in the third quarter of 2018, according to a Social Weather Stations (SWS) survey
released on November 29, 2018. The SWS survey held from September 15 to 23
showed that 6.1% of Filipino families (around 1.4 million families) reported victimization
by any of the common crimes within the past 6 months alone (common crimes refer to
pickpocketing or robbery of personal property, break-ins, carnapping, and physical
violence). It also said that 5.6% of Filipino families have suffered from property crimes. It
is therefore recommended that people have at least one security measure in
stores) and its effectivity on reducing burglaries and robberies. The National Council for
Home Safety and Security stated that homes without alarms are three times more likely
to get burglarized. Burglaries, since the boom of the new generation, have dramatically
reduced crime rates down to 28%. It also states the positive and negative effects of
Rode (2019) addressed the importance of security systems for retail stores. He
stated that, as stores are big investments, it can be very upsetting and stressful when
systems not only act as a form of safety measure; they can also be used to deter criminals
from even attempting to enter a person’s store. If a break-in does occur, having a
security system can provide police with invaluable information that can lead them to a
suspect.
2007, para. 2). In this article, she states the meaning of speech recognition and how it
works. Speech recognition works using algorithms through acoustic and language
modeling. Acoustic modeling is the relationship between linguistic units of speech and
audio signals. Language modeling matches sounds with word sequences to help
Velde (2019) explained how speech recognition works and its uses. Speech
recognition technology is not just about making things easier. It is also about safety.
Instead of texting while driving, people can now tell their car who to call or what
dangerous when implemented before it has high enough accuracy. Speech recognition
analyzes sounds by filtering what you say, digitizing it to a format it can “read,” and then
analyzing it for meaning. Then, based on algorithms and previous input, it can make a
highly accurate educated guess as to what the person is saying. It gets to know the
speaker’s use of language. Background noise can easily throw a speech recognition
device off track. This is because it does not inherently have the ability to distinguish the
person’s voice.
Velde (2019) also said that, as of May 2017, Google’s machine learning algorithms
have now achieved a 95% word accuracy rate for the English language. That current
rate also happens to be the threshold for human accuracy. She compared the growth of
speech recognition to a child learning his or her first words. Whereas humans have
refined the process, they are still figuring out the best practices for computers. They
have to be trained in the same way our parents and teachers trained students. That
most current automatic speech recognition systems. This degradation comes mainly
from differences in the learning and use environments of a system. In recent years,
many studies have focused on reducing these differences, but the technology, even to
this day, still has a hard time distinguishing the voice from background noise.
recognition. Speech recognition has already proven useful for certain applications, such
recognition for cellular phones, and data entry while walking around a railway yard or
clambering over a jet engine during an inspection. Speaker recognition is related to
work on speech recognition. Instead of determining what was said, the focus is on
determining the speaker. Deciding whether or not a particular speaker produced the
utterance is called verification, and choosing a person's identity from a set of known
prototype that includes a security feature wherein they have a feature for calling people
you know. Reynolds also stated that using voice biometrics for security and home
Foster (1996) wrote about his speech-activated security system. He described
speech-actuated security devices and methods whereby a lock, or other security or access
device, may be actuated by a speech input, but without disclosing the actual
code to those hearing the code words spoken during use of the security
device. The security device includes a microphone, a display for displaying a plurality of
code elements, and a processor for controlling the display and analyzing the
detected by the microphone and to operate the security device in response thereto. He
stated that using speech recognition can be effective in cutting the physicality in half.
Effectivity of Speech Recognizers as Security Measures
De Leon, Hernaez, Pucher, Saratxaga and Yamagishi (2012) wrote about the
synthesizer, which can synthesize speech for a target speaker using small amounts of
training data through model adaptation of an average voice or background model, they
tested and concluded that over 81% of the matched claims are accepted. This result
speech.
Chow, He, Su, Yang and Zhang (2000) focused their paper on the architecture
They concluded that speech recognition is best when there is no background noise and
Synthesis
country. Caliwan (2019) stated that robberies and theft have remained constant from
May 2018 to May 2019. Thus, there is a need for an effective and affordable
security system that is accessible to the people. Kaysen (2017) stated that homes
without alarms are three times more likely to get burglarized. Security systems are
essential in every home or store to protect the property from robberies, theft and other
property crimes. Having a security system can act not only as a safety measure to
protect against burglars and home intruders but also as a way to drive away these criminals.
Rouse (2007, para. 2) stated that “Speech recognition is the ability of a machine
or program to identify words and phrases in spoken language and convert them to a
things easier, it is also about safety. It has a lot of uses and advantages, yet only a
few people have tried to use speech recognizers for security measures. One example of
smart home prototype that includes a security feature that
can call the person’s emergency contact. Foster also made a speech-actuated security
system back in 1996. A security device and methods whereby a speech input may be
display for displaying a plurality of code elements, and a processor for controlling the
display and analyzing the microphone signal to detect a proper sequence of code
elements spoken by a user as detected by the microphone and to operate the security
device.
synthetic speech that over 81% of the matched claims are accepted. This result
suggests a vulnerability in the system. The researchers will be using Google’s API Client
Library for Speech Recognition. Velde (2019) stated that Google’s machine learning
something the researchers aim to prevent. Chow, He, Su, Yang and Zhang
(2000) concluded that speech recognition is best when there is no background noise
In conclusion, the speech recognizer has proven itself a useful tool and the future
affordable and seamless security system with speech recognition capabilities is a step
towards the future. There have been countless examples of people trying to use speech
recognition as a form of security measure and with the current advanced technologies
security system.
CHAPTER 3
RESEARCH METHODOLOGY
Research Design
manner so that the research problem is efficiently handled. It provides insights about
design is a model or layout used to answer the research questions. The researchers
design that is thought to be the most accurate type of experimental research, to collect
and gather data and other information that was needed for the product. Bhat (2019)
stated that experimental research is any research conducted with a scientific approach,
where one set of variables is kept constant while the other set of variables is being
order to know if the speech recognizer is effective as a form of security measure. The
researchers chose to use the experimental research design to test the security system’s
ability to sort through background noise, properly identify the key word, and quickly
Research Setting
Residences Ortigas, a condominium complex that lies in Pasig, Metro Manila. Each
room had ample space to provide a spacious working environment. One room was used
to test the effectivity of the speech recognizer in a natural setting, wherein the
monotonous routine of hustle and bustle is ongoing. The researchers also provided a
room wherein no background noise is being emitted. This was done to properly record
the sensor’s capability without any physical obstructions. experiment to simulate the
noisy. This place was chosen for the environments usually seen in local stores and to
test the sensor and software’s capability to distinguish between the background noise
and the speaker. This place was also chosen to examine the sensor’s capability to
Research Instruments
researchers used a decibel meter to accurately detect the decibel level of the room. The
researchers also used a tape measure to see the distance between the microphone and
the speaker. The experiment was done to find the effectivity of the speech recognizer as a
form of security measure by measuring and identifying the relationship between the
background noise level, the distance of the speaker from the
microphone, and the ability of the sensor to properly pick up and identify the key word.
To further test the ability of the speech recognizer, the researchers will be using a
to procure the most accurate results. Data collected will then be processed through the
Materials
Table 1. The Raspberry Pi 3 Model B and the Microphone with its Corresponding
Material                                Quantity    Price
Home Studio USB Condenser Microphone    1           ₱928
Speakers
The materials provided above are the components that were used during the
but by supplementing them. The Raspberry Pi is a very versatile product mainly used
for robotics. Due to these reasons, the researchers chose to use the Raspberry Pi
The Home Studio USB Condenser Microphone has a rating of 4.5 stars from 81
reviews on Lazada. The researchers decided to use this microphone as it also came with
a noise filter which can eliminate most of the researchers’ problems with background
noise.
The F-165 Multimedia Speakers are cheap wired speakers that provide sound loud
and clear enough to alert nearby neighbors and to scare intruders away.
Equipment
Table 2. All the Equipment Used with its Corresponding Quantity, Price and
Appearance.
The equipment used was essential and mainly used to communicate with the
Raspberry Pi more efficiently. An OTG cable (a wire that enables a connection between
micro-USB and regular USB) was used to connect normal USB devices to the
Raspberry Pi, since the Raspberry Pi only had micro-USB ports. A monitor was used to
display the output of the device. The HDMI cable was used to connect the monitor to the
Raspberry Pi. A keyboard was used to type all the code necessary, and a mouse was
also used to interact with the Raspberry Pi through its user interface.
Procedures
2. Choose an operating system for the Raspberry Pi. The researchers chose to install
3. Install Python, including all the necessary libraries. Download the installer from the
official Python website. Run the installer then choose the path you want Python to be
4. Install PIP, the package installer for Python, which can be used to install packages
from the Python Package Index and other indexes. To install PIP, download
get-pip.py from the PyPA bootstrap site (bootstrap.pypa.io). Open a terminal and execute curl
get-pip.py.
5. PyAudio, an extension of Python and a requirement for Python to be able to use the
microphone, needs to be installed. To install PyAudio, use the APT and execute
6. Install Google API Client Library for Python, the main library/database the software
will use to properly identify the word by constantly listening to the user and
comparing the audio to thousands of files until it finds a match. To install Google
API Client Library for Python, open a terminal and execute pip install
--upgrade google-api-python-client.
7. Connect the USB Microphone to the port using an OTG cable. Make sure the
8. Open IDLE, the default source code editor that comes pre-installed with Python, and
https://github.com/malnourishita/speech/blob/master/GUI%20TEST.py
Make sure there are no errors in the code. Test and run the code many times. Run the
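The keyword check at the heart of these procedures can be sketched as follows. This is not the script at the link above; it is a minimal illustration that assumes the recognizer has already turned each utterance into text. The names contains_keyword, monitor, and on_alert are hypothetical, and on_alert stands in for whatever call or text action the product triggers.

```python
import re

KEYWORD = "help"  # default keyword named in this study

def contains_keyword(transcript, keyword=KEYWORD):
    """Return True if the keyword appears as a whole word in the transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return keyword.lower() in words

def monitor(transcripts, on_alert, keyword=KEYWORD):
    """Scan recognizer output and call on_alert for each matching utterance."""
    alerts = 0
    for text in transcripts:
        if contains_keyword(text, keyword):
            on_alert(text)  # assumed hook for the call/text action
            alerts += 1
    return alerts

# Example: only the second utterance contains "help" as a whole word.
sent = []
count = monitor(["good morning", "Help! Someone is here", "helpful staff"], sent.append)
print(count, sent)
```

Matching on whole words rather than substrings is a design choice in this sketch, so that a word such as "helpful" does not trigger the alarm.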
Statistical Treatment
To get the results of the data effectively, the researchers used ANOVA to
Analysis of Variance
This can help the researchers in creating a comparison between two or more
variables that allows the researchers to get various results and predictions on two or
Steps In ANOVA
The first procedure is to determine the optimal distance between the sensor and the
user.
The second procedure is to test how many times the speech recognizer will accept
The third procedure is to assess the elapsed time between the activation of the
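A one-way ANOVA of the kind named in this section can be sketched in plain Python as shown below. The function name one_way_anova_f is hypothetical, and the three groups in the example are made-up placeholder numbers used only to show the computation; they are not data from this study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Placeholder groups (NOT the study's measurements).
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f_stat, df_b, df_w)
```

For the placeholder groups, the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = (26/2)/(6/6) = 13 with 2 and 6 degrees of freedom. A larger F means the group means differ more than the spread inside each group would suggest.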
CHAPTER 4
This chapter presents and discusses the results, analysis and
interpretation of the data gathered by the researchers. This study aims to determine the
an experimental study to properly procure the data presented. Experiments were done
to answer the questions communicated in the statement of the problem. The analytical
The experiments are mainly focused on the feasibility and accuracy of the speech
recognition and the user’s satisfaction towards the product. No experiment will be done
on measuring the effectivity of the security measure after the text has been received by
the user as its effectivity is controlled by variables that are too hard to control and are
too broad. The keyword used in the program is “help” unless said otherwise.
Through the experiments, the researchers wish to answer the following questions
1. What will be the optimal distance between the user and the speech recognizer
2. How accurate will the sensor and speech recognizing software be in recognizing
and identifying when the user is trying to activate the security measure?
3. How long is the elapsed time between the activation of the speech recognition
[Figure 1.1. Chart of Number of Times Recognized (0 to 20) against Distance (in meters), from 0.5 m to 8 m in 0.5 m steps]
Figure 1.1 shows how many times the speech recognizer successfully recognized the
Figure 1.1 shows the effectivity of the speech recognizer at specific distances. At
each distance, the researchers made sure the speaker kept a constant level of 70 dB to
keep the results comparable. The speaker was asked to say the default keyword, “help”,
20 times at each distance. A measuring tape was used to measure the
distance between the user and the microphone. A decibel meter app was used to measure
the decibel level of the user (the decibel meter was placed near the speaker, not the
speech recognizer). The researchers included only the attempts that were around 67-73 dB;
those that did not meet this criterion were excluded. 70 dB was chosen
as the standard in all of the researchers’ tests as it is the approximate decibel level
[Figure 1.2: chart. Horizontal axis: 1 Meter to 8 Meters. Vertical axis scale: 0 to 60.]
enough, the graph is not a smooth, gradual decline; there are dips at certain distances.
These dips may be due to the program’s difficulty in picking up sounds at
long distances or to a software glitch. 0.5 m and 1 m are the optimal distances,
with 19 successful activations out of 20. 1.5 m to 3.5 m follow, since they score
almost the same, at 17 to 18. 2.5 m had a dip in activations, as it only scored 16 out
they feel further distances will no longer be effective and will not help in the study. 0.5 m
to 3.5 m is the optimal distance range within which the user should stand in order to properly
activate the security measure. Based on this, the researchers conclude that 2 m is the
optimal distance for the user, as this distance was effective in activating
Figure 1.2 shows the relationship between decibels produced, the distance between the
user and the speech recognizer and its effectivity in successfully identifying whenever
The experiment was done with a background noise of 55-65 dB. Two sets of
experiments were done. In Set A, the speaker’s decibel level was kept constant at 60 dB
at every distance. In Set B, the speaker started at 60 dB and increased her decibel
level by 5 dB for every meter. For both sets, the researchers attempted to activate the
speech recognizer at distances of 1 to 8 m.
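As a worked illustration of the two speaking-level schedules, the sketch below computes the target level at each distance. It assumes that Set B uses 60 dB at 1 m and adds 5 dB for each further meter, a reading that is consistent with the 90 dB level reported at 7 m. The function name and its parameters are hypothetical and are not part of the researchers' program.

```python
def target_level_db(distance_m, base_db=60, step_db=5, increasing=True):
    """Return the speaker's target level in dB at a whole-meter distance.

    Assumption: the first meter uses the base level and every further
    meter adds step_db (Set B). Set A keeps the base level throughout.
    """
    if not increasing:
        return base_db
    return base_db + step_db * (distance_m - 1)


distances = range(1, 9)  # 1 m to 8 m, as in the experiment
set_a = {d: target_level_db(d, increasing=False) for d in distances}
set_b = {d: target_level_db(d) for d in distances}

for d in distances:
    print(f"{d} m: Set A {set_a[d]} dB, Set B {set_b[d]} dB")
```

Under this assumption the Set B speaker would need to reach 95 dB at 8 m, which suggests why raising the voice with distance has a practical limit.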
Figure 1.2 shows the relationship between the decibels produced, the distance between the
user and the speech recognizer, and its effectivity. Each set had 400 attempts (divided
across the distances). Figure 1.2 shows that Set A sees a gradual decline from 1 meter to 6
meters, and a massive drop at 7 and 8 meters. Set B, as seen in Figure 1.2, stays flat in
its effectivity, with minor dips in between and an improvement in performance at 7
meters (at a 90 dB level). The researchers consider these minor dips to fall within the margin of error. With the
results displayed in Figure 1.2, the researchers concluded that if the decibels produced
stay the same while the distance increases, the effectivity level starts declining,
whereas if the decibels produced increase alongside the distance, then the effectivity
Testing the Product’s Effectivity and Accuracy
[Figure 2.1: pie chart. Recognized: 94%. Not recognized: 6%.]
Figure 2.1 shows the accuracy of the speech recognizer in recognizing when the user is
To find out the accuracy of the speech recognizer, the researchers had the
speaker stand at the optimal distance of 2 meters and speak at 70
dB. There was no physical obstruction between the user and the microphone. The
background noise level was around 55-65 dB. All variables were set at their optimal
settings. Figure 2.1 shows that out of 100 attempts to activate the security measure, the
speech recognizer recognized 94, making the speech recognizer’s accuracy
94%. Any attempt that was detected after 30 seconds was considered as not
recognized, which was the case for 2 attempts. This shows that the speech recognizer,
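[Figure 2.2: bar chart of the number of times recognized at each background noise level.]
BACKGROUND NOISE LEVEL    NUMBER OF TIMES RECOGNIZED
60 dB                     48
70 dB                     46
80 dB                     43
90 dB                     37
100 dB                    25
110 dB                    16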
[Figure 2.3: bar chart. Decibel Levels and Their Real-Life Equivalents.]
SOUND SOURCE              DECIBEL LEVEL (dB)
Rock Concert              110
Subway Train              90
Blender                   80
Washing Machine, TV       70
Figure 2.2 shows the accuracy of the speech recognizer in recognizing when the user is
Figure 2.3 shows the decibel levels and their real-life equivalents.
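The counts in Figure 2.2 can be converted to recognition rates with a few lines of Python. This is only a minimal sketch: the counts are the ones shown in the figure, the 50 attempts per level come from the description of the experiment, and the variable names are illustrative.

```python
# Number of times recognized at each background noise level (Figure 2.2).
recognized = {60: 48, 70: 46, 80: 43, 90: 37, 100: 25, 110: 16}
ATTEMPTS_PER_LEVEL = 50  # attempts given to each background noise level

# Convert each count to a percentage of the attempts.
rates = {db: 100 * count / ATTEMPTS_PER_LEVEL for db, count in recognized.items()}

for db, rate in rates.items():
    print(f"{db} dB background noise: {rate:.0f}% recognized")
```

Expressed this way, the rate falls from 74% at 90 dB to 50% at 100 dB, which is the significant drop that the researchers describe.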
The speaker stood at the optimal distance of 2 meters away from the
microphone and spoke at 70 dB. The researchers played TV static (white
noise) at different volumes to keep the background noise consistent throughout the test. Each
background noise level was given 50 attempts. Figure 2.3 shows that the speech
[Figure 2.4: bar chart. Number of Times Recognized, Different Keywords.]
KEYWORD                               NUMBER OF TIMES RECOGNIZED
Help                                  20
Spanghew                              17
Mytacism                              16
Whiffle                               19
Gyascutus                             17
Levament                              18
Supercalifragilisticexpialidocious    19
Onomatopoeia                          17
Poltophagy                            18
Axinomacy                             17
background noise levels. According to Figure 2.3, 60 dB is the approximate decibel level
machine and 80 dB is the approximate decibel level for a blender. This shows that the
noticed the speech recognizer having some difficulties at around 90 dB (the
approximate decibel level for a subway train). At 100 dB, there was a significant drop in
the number of times the speech recognizer recognized that the user was trying to
activate it. The researchers conclude that the optimal background noise level for the
speech recognizer should be around 60-90 dB. For reference, Figure 2.3 shows the
decibel levels and their real-life equivalents.
Figure 2.4 shows the effectivity of the speech recognizer on different types of keywords.
The speaker stood at the optimal distance of 2 meters and
spoke at 70 dB. The background noise level was around 55-65 dB. Figure 2.4 shows
researchers conducted this experiment to show the range of the speech recognizer. To
test the ability of the speech recognizer, the researchers used unconventional words
that are not used in everyday conversation. The researchers chose to use the top 8 of
infamous for being hard to pronounce. The researchers chose these words because
obscure words that are rarely said in everyday conversation make good candidates
for a keyword. The researchers also included the word “help” as a baseline
to compare with the other keywords. The researchers made sure the speaker pronounced
each word properly to keep the results accurate. Figure 2.4 shows that the
speech recognizer was successful in recognizing the inputted keywords. The word the
speech recognizer had the hardest time detecting was Mytacism, which may be
attributed to the word’s difficult pronunciation and its obscurity. A problem the researchers
also noticed was the mispronunciation and the lack of clarity of the user’s voice, as
the speaker frequently mispronounced the words (though these attempts were not
counted in the final result). The researchers conclude that the speech recognizer can
detect obscure words with ease, as long as the word is said properly.
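To make the comparison with the baseline word concrete, the sketch below summarizes the keyword counts from Figure 2.4. It assumes that every keyword was attempted 20 times, as the 20 out of 20 result for “help” suggests; the text does not state this explicitly. The variable names are illustrative.

```python
from statistics import mean

ATTEMPTS_PER_KEYWORD = 20  # assumption: inferred from the 20/20 result for "Help"

# Number of times recognized per keyword (Figure 2.4).
recognized = {
    "Help": 20, "Spanghew": 17, "Mytacism": 16, "Whiffle": 19,
    "Gyascutus": 17, "Levament": 18,
    "Supercalifragilisticexpialidocious": 19, "Onomatopoeia": 17,
    "Poltophagy": 18, "Axinomacy": 17,
}

baseline = recognized["Help"]
obscure = {word: n for word, n in recognized.items() if word != "Help"}

# The hardest keyword is the obscure word with the fewest recognitions.
hardest = min(obscure, key=obscure.get)
average_obscure = mean(obscure.values())

print(f"Baseline 'Help': {baseline}/{ATTEMPTS_PER_KEYWORD}")
print(f"Hardest keyword: {hardest} ({obscure[hardest]}/{ATTEMPTS_PER_KEYWORD})")
print(f"Average for obscure keywords: {average_obscure:.2f}/{ATTEMPTS_PER_KEYWORD}")
```

The obscure keywords average between 17 and 18 recognitions, a little below the baseline, which fits the conclusion that the word itself is a minor factor compared with how clearly it is said.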
Figure 2.5 shows the effectivity of the speech recognizer on different types of phrases and
sentences.
[Figure 2.6: bar chart. Effectivity If Covered By Various Objects.]
OBJECT          NUMBER OF TIMES RECOGNIZED
Windbreaker     17
Pillow          16
Paper Bag       19
Plastic Bag     20
The speaker stood at the optimal distance of 2 meters and
spoke at 70 dB. The background noise level was around 55-65 dB. Figure 2.5 shows
how effective the speech recognizer is at detecting different types of phrases and
sentences. The researchers used phrases that are related to “Robberies” and
“Trespassing”. For phrases and sentences, the results were better than
expected. Figure 2.5 shows that the phrases actually fared better than the words used
in Figure 2.4. The researchers noticed a relationship between the number
of syllables in a phrase and its effectivity: the phrases with fewer syllables got
higher results, whilst the phrases with the most syllables got the worst results (both “I
think there is a trespasser” and “Robber robber get him robber” have 8 syllables, the
largest number of syllables among the phrases). The researchers concluded that any
concluded through this experiment that a phrase is more effective and there is less risk
Figure 2.6 shows the effectivity of the speech recognizer if obscured by various thin objects.
As the user might want to hide the mic, the researchers tested whether
relatively thin items that can cover the mic hinder the effectivity of the speech recognizer. The
speaker stood at the optimal distance of 2 meters and spoke at 70
dB. The background noise level was around 55-65 dB. The objects tested were
a windbreaker, a pillow, a paper bag and a plastic bag. Figure 2.6 shows that thin
objects like plastic bags and paper bags do not hinder the speech recognizer’s effectivity,
whilst the speech recognizer struggles slightly with objects made of thicker material. The researchers suggest
covering the mic with plastic bags or paper bags, as these were the most effective. They also
suggest that the user avoid covering the mic with pillows, jackets or anything made of a
thicker material.
3   NO (39.08 sec)    8   NO (34.81 sec)    13  NO    18  NO
4   NO                9   NO                14  NO    19  NO (43.01 sec)
5   NO (45.91 sec)    10  NO                15  NO    20  NO
Physical obstructions are among the worst enemies of a speech recognizer, as they
greatly decrease its effectivity. Any
building has one physical obstruction that cannot be removed because it is part of the
building’s structure: its walls. Figure 2.7 shows that the speech recognizer is not
effective if obscured by a wall. The speaker stood at the optimal distance of 2
meters and spoke at 70 dB. The background noise level was around 55-65
dB. Out of the 20 times the speech recognizer was tested, only trial no. 11 succeeded.
Trials with a time beside them are those in which the speech recognizer did detect that the user
was trying to activate the security measure, but only after 30 seconds. The time
beside them is how long it took to be recognized by the speech recognizer. After 1
minute, the researchers moved on to another trial. The researchers conclude that walls
TRIAL NUMBER ELAPSED TIME BETWEEN CALL AND ACTIVATION
1 16.98 seconds
2 15.46 seconds
3 10.32 seconds
4 15.34 seconds
5 13.78 seconds
6 10.53 seconds
7 14.46 seconds
8 10.16 seconds
9 8.60 seconds
10 9.09 seconds
11 10.23 seconds
12 12.04 seconds
13 9.93 seconds
14 26.20 seconds
15 11.07 seconds
Figure 3.1 shows how long the elapsed Time is between the activation of the speech
Figure 3.1 shows the inconsistency in how long the security measure takes to
call the inputted number. The speaker stood at the optimal distance of 2 meters
and spoke at 70 dB. The background noise level was around 55-65 dB. Out
of the 15 trials, the fastest time was in the 9th trial, at 8.60
seconds. However, there was an irregularity, as the time roughly tripled during the
14th trial (26.20 seconds). The researchers attribute such irregularities to software glitches, which
make the speech recognizer inconsistent in activating the security measure.
The average elapsed time between the call and activation of the speech
recognition is 12.946 seconds.
TRIAL NUMBER ELAPSED TIME BETWEEN TEXT AND ACTIVATION
1 10.23 seconds
2 7.31 seconds
3 7.25 seconds
4 8.64 seconds
5 10.01 seconds
6 6.23 seconds
7 7.12 seconds
8 6.45 seconds
9 8.13 seconds
10 8.69 seconds
11 9.23 seconds
12 7.46 seconds
13 8.12 seconds
14 7.04 seconds
15 9.62 seconds
Figure 3.2 shows how long the elapsed Time is between the activation of the speech
Figure 3.2 shows that the security measure, compared to Figure 3.1, is far more
consistent in sending a text than in initiating a call. The speaker stood at the optimal
distance of 2 meters and spoke at 70 dB. The background noise
level was around 55-65 dB. Out of 15 trials, the fastest text was sent
during the 6th trial, which had a time of 6.23 seconds. The longest was during the 1st
trial, which had a time of 10.23 seconds, far shorter than the tripled time in Figure 3.1. The
average elapsed time between the text and activation of the speech recognition is
7.063 seconds.
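The timing comparison between calls and texts can be reproduced from the trial values in Figures 3.1 and 3.2. The sketch below is one possible way to do so; the function name is hypothetical, and the sample standard deviation is used here only as a simple measure of consistency, not as a statistic the researchers reported. With the values as listed, the call times average 12.946 seconds, matching the text, while the text times average about 8.10 seconds rather than the reported 7.063 seconds. The sketch does not attempt to resolve that difference.

```python
from statistics import mean, stdev

# Elapsed times in seconds, copied from Figure 3.1 (call) and Figure 3.2 (text).
call_times = [16.98, 15.46, 10.32, 15.34, 13.78, 10.53, 14.46, 10.16,
              8.60, 9.09, 10.23, 12.04, 9.93, 26.20, 11.07]
text_times = [10.23, 7.31, 7.25, 8.64, 10.01, 6.23, 7.12, 6.45,
              8.13, 8.69, 9.23, 7.46, 8.12, 7.04, 9.62]


def summarize(times):
    """Return the mean, sample standard deviation, fastest and slowest time."""
    return {
        "mean": mean(times),
        "stdev": stdev(times),
        "fastest": min(times),
        "slowest": max(times),
    }


call_summary = summarize(call_times)
text_summary = summarize(text_times)

for label, summary in (("call", call_summary), ("text", text_summary)):
    print(f"{label}: mean {summary['mean']:.3f} s, stdev {summary['stdev']:.2f} s, "
          f"fastest {summary['fastest']:.2f} s, slowest {summary['slowest']:.2f} s")
```

A smaller standard deviation for the text times than for the call times would support the observation that sending a text is the more consistent of the two actions.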
Discussion
As established by Figure 1.1, the optimal distance between the user and the speech
recognizer is 0.5 to 3.5 meters, with the decibel level of the user being 70 dB. The
researchers chose 2 meters as the optimal distance between the speech
recognizer and the user, as it is effective while still providing space and range of
motion for the user. According to Figure 1.2, as the user’s distance increases, the
effectivity of the product decreases. To maintain the effectivity as the user’s distance
increases, the user’s decibel level should increase along with the distance, as supported by
Figure 1.2.
The researchers measured the effectivity of the speech recognizer and found
that the speech recognizer worked 94% of the time at the optimal settings.
Background noise, when too loud, can hinder the performance of the speech
The researchers suggest that the user avoid background noise levels of 90 to 110
Figure 2.4 and Figure 2.5, simple phrases were detected more often than the
complicated words. However, the researchers concluded that any word can be used
and the word itself will not hinder the performance of the speech recognizer. However, the
pronunciation and clarity of the user’s speech are important factors that can reduce the
effectivity of the product. The researchers suggest using a word that the user can
properly pronounce and is easy to remember but obscure enough that it will not be said
an object. Thin objects like paper and plastic will not affect the product’s effectivity, but
objects that are thicker can slightly hinder the performance of the speech recognizer. In
Figure 2.7, the researchers tried to activate the speech recognizer through a wall, but to
no avail. The researchers conclude that the speech recognizer cannot be used across
different rooms.
The security measure’s elapsed time between the activation and the initiation of
a call is very inconsistent, with an average elapsed time of 12.946 seconds
according to Figure 3.1. The security measure’s elapsed time between the activation
and the sending of a text, however, was consistent and a lot faster than
initiating a call. The average elapsed time is 7.063 seconds, according to Figure
3.2.
CHAPTER 5
This chapter presents the summary of the findings, conclusions and the
corresponding recommendations.
Summary
Though crime rates have dropped for non-index crimes, robberies and thefts have
remained constant or are even increasing. Having a security measure in every home
and in public areas like malls, restaurants, and local stores can make people feel safer.
Security measures play a big role in people’s lives because they can save people
from danger. Local stores in the Philippines lack security measures because (a) they are too
expensive or (b) owners do not think of securing their stores until it is too late.
the masses, while also being forward-thinking and advanced. That’s why the
security measure that is hands-free, reliable, easy to use, and affordable.
The researchers aim to have an affordable and viable product that can show
Filipinos a glimpse of the future of safety. Speech recognition is the easiest form of
security measure. By just saying the keyword, the speech recognizer can immediately
The researchers created the security system using the Raspberry Pi, a popular
researchers programmed it and fine-tuned it until it became effective enough for the
researchers’ standards. Once the speech recognizer is activated, the security measure
will then send a text and initiate a call to the inputted numbers. The call will play a
pre-recorded message that is customizable. The GUI allows the user to change the text, the
numbers it will call and the keyword used to activate the security system.
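The flow described above (hear the keyword, then send a text and start a call to each inputted number) can be sketched in Python. This is a hedged illustration and not the researchers' program: it assumes the speech recognizer has already produced a text transcript, and `send_text` and `start_call` are hypothetical callbacks standing in for whatever messaging and calling service the device uses.

```python
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_keyword(transcript, keyword="help"):
    """Return True if the keyword or phrase appears as whole words."""
    words = normalize(transcript)
    target = normalize(keyword)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target
                         for i in range(len(words) - n + 1))


def handle_transcript(transcript, keyword, numbers, send_text, start_call):
    """Text and call every number if the keyword was heard.

    send_text and start_call are hypothetical callbacks, not a real API.
    """
    if not contains_keyword(transcript, keyword):
        return False
    for number in numbers:
        send_text(number)
        start_call(number)
    return True


# Usage with stand-in callbacks that only record what would happen.
log = []
triggered = handle_transcript(
    "Somebody help!",
    keyword="help",
    numbers=["<number 1>", "<number 2>"],
    send_text=lambda number: log.append(("text", number)),
    start_call=lambda number: log.append(("call", number)),
)
print(triggered, log)
```

Matching on whole words means that a word such as “helpful” does not trigger the alarm, and the same check works for multi-word phrases, which the study found less prone to accidental activation.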
Once the product was done, the speech recognizer went through a series of
tests in order for the researchers to examine how effective the speech recognition is.
The security measure was then tested by getting the average gap between the calls/texts
and activation. The product turned out to be effective in these tests,
which indicates that speech recognition is effective and feasible as a
security measure.
Conclusion
1. What will be the optimal distance between the user and the speech recognizer for
- The researchers conducted a series of tests and concluded that the optimal
distance range between the user and the speech recognizer is around 0.5 meters
to 3.5 meters and the optimal distance is 2 meters (considering all other variables
are at their optimal settings). The researchers also concluded that the decibel level
of the user and the distance between the user and the speech recognizer should
speech recognizer.
2. How accurate will the sensor and speech recognizing software be in recognizing and
- The researchers tested the speech recognizer 100 times, trying to activate the
security system, and it recognized 94 out of 100 attempts. The researchers
noise of around 60-80 dB is the optimal setting to achieve maximum effectivity for
the product. The researchers also conducted a series of tests testing the
phrases and they concluded that using phrases is not only more effective than
using words, it can also prevent accidental activations of the security system.
The researchers also tested the speech recognizer to see if it will activate if
covered by thin materials/objects and a wall. They concluded that thin materials
will not affect the effectivity of the speech recognizer while thicker objects, including
3. How long is the elapsed time between the activation of the speech recognition
- The elapsed time between the activation of the speech recognition software and
the call made has an average of 12.946 seconds, while the elapsed time
between the activation of the speech recognition software and the text made has
an average of 7.063 seconds.
After countless studies and research were conducted by the
Recommendation
There are aspects of the study that the researchers recommend future
researchers improve on; for example, a better microphone and better equipment are
recommended if future researchers want better results. A more advanced Raspberry
Pi model is recommended to process the program more efficiently, though it is not
required.
To further enhance the product, a program that can execute the commands more
quickly and more efficiently with fewer bugs is recommended. Using a custom speech
library instead of Google’s has both advantages and disadvantages: Google’s libraries are
well optimized, but a custom library could remove the product’s need for a Wi-Fi
connection.
Better testing conditions and more trials can further flesh out the results of the
product. Aside from contributing to the success of the research, focusing on these can help
future researchers who wish to contribute to this investigatory project.