The Feasibility of Speech Recognition As A Form of Security Measure

THE FEASIBILITY OF A SPEECH RECOGNIZER AS A FORM OF SECURITY
MEASURE
A Thesis Presented To
the High School Department of
Sacred Heart Academy
of Pasig
In Partial Fulfillment of the

Requirements in English
And Science 10
Researchers:
BAROJA, Drezen Scott A.
DAGSIL, Dyanne Francine D.
DONATO, Asianti Crishna E.
MIRAFLOR, Robbin Cross F.
REYES, Louise Erlle P.
TALOSIG, Nathan E.
10- Prudence
Research Advisers:
Mr. Raldin Gem Frias
Ms. Kendra Caramat

Table of Contents
Acknowledgements
Abstract
Chapter 1: The Problem and Its Background
Introduction
Background of the Study
Conceptual Framework
Statement of the Problem
Significance of the Study
Scope and Delimitations
Definition of Terms
Chapter 2: Review of Related Literature and Studies
Synthesis
Chapter 3: Research Methodology
Research Design
Research Setting
Research Instruments
Materials
Equipment
Procedures
Statistical Treatment
Chapter 4: Results and Discussion
Discussion
Chapter 5: Summary, Conclusions and Recommendations
Summary
Conclusion
Recommendation
Bibliography
Curriculum Vitae
Acknowledgements
The researchers would like to thank their respective families for giving them the support
they needed to get through difficult times. They would also like to thank SHAP
for giving them the opportunity to write this paper and learn about speech
recognition and security. The researchers thank their classmates for
helping them every step of the way. Lastly, the researchers want to thank two of the
best research advisers out there, Sir Raldin Gem Frias and Ms. Kendra Caramat. They
have guided the researchers in creating a product that can revolutionize the world.
From the making of the title to the title defense, they stuck to the researchers like glue,
even when the researchers had no clue what to do. The researchers thank them for being patient and
understanding. The researchers would also like to thank the panelists for their
constructive criticism, which helped the researchers improve the research even more.
Abstract
The first speech recognition systems were focused on numbers, not words. In 1952,
Bell Laboratories designed the “Audrey” system which could recognize a single voice
speaking digits aloud. Ten years later, IBM introduced “Shoebox” which understood and
responded to 16 words in English. Speech recognition was invented with the idea of
making things more hands-free and easier. Though the world is more secure than ever,
that is no reason to take the issue of security lightly. This research, titled “The
Feasibility of a Speech Recognizer as a Form of Security Measure,” focuses on
creating a security measure using speech recognition that is both effective and
cheap. The researchers used an open-source speech recognizer as the base to test
how effective a regular speech recognizer would be. The research was mainly aimed at
finding out how feasible a speech recognition-activated security measure is and whether the
call/text that will be produced is fast enough to help a person in danger. This thesis
answers three research questions posed by the researchers. The researchers
answered these questions through a series of tests that determined the overall
effectivity of the product. The researchers conclude, based on the results of the
tests, that speech recognition is a feasible security measure.
CHAPTER 1
THE PROBLEM AND ITS BACKGROUND
Introduction
There are numerous crimes both in homes and public places such as theft,
robbery, homicide and others. Nowadays, the use of an effective security measure is
becoming an essential part of people’s everyday lives.
A security measure is a precaution taken against terrorism, espionage or other
danger. “It is the protection of personnel, hardware, software, networks and data from
physical actions and events that could cause serious loss or damage to an enterprise,
agency or institution. This includes protection from fire, flood, natural disasters, burglary,
theft, vandalism and terrorism.” (Rouse, 2016, para. 1). It also ensures that an
organization can perform its appointed tasks by protecting it from attacks from inside and
outside the organization. There are methods and measures that are meant to detect attackers and
intruders and keep them from affecting protected assets.
A speech recognizer is a simple and effective form of security measure.
Hope (2019) stated that it is a computer software program or hardware device with the
ability to decode the human voice. It is commonly used to operate a device, perform
commands, or write without having to use a keyboard, mouse, or press any buttons.
Speech recognition occurs when the recognizer detects an assigned word or
words programmed into the software. Rouse (2016) stated that rudimentary speech
recognition software has a limited vocabulary of words and phrases, and it may only
identify these if they are spoken very
clearly. Speech recognition software is easy to use because it is frequently installed on
computers and mobile devices. The disadvantages of a speech recognizer include its
inability to recognize or capture words due to mispronunciation, lack of support for
different languages, and its inability to sort through background noise. Overall,
however, it is a simple and easy form of security measure because it enables
hands-free control of various devices and equipment, provides input to automatic
translation, and creates print-ready dictation. To be processed, speech must be converted from physical
sound to an electrical signal with a microphone, and then to digital data with an
analog-to-digital converter.
Background of the Study
A security measure is important because it protects a person’s belongings. It
helps society in many ways because, nowadays, many crimes are being
reported. However, it is also often overlooked by organizations. Attackers have many
motives: financial gain, personal gain, revenge, or simply the availability of
a vulnerable target. Security is more challenging than in previous
decades because there are more sensitive devices available, such as USB drives,
smartphones, laptops, and tablets, that make stealing data easy.
Nowadays, however, there are high-tech security measures that cannot be bypassed easily.
Their great advantage is that criminals or attackers need to pass through many methods
and layers of security and, as a result, will have a hard time achieving their
objectives. There are many types of security measures that are effective and
efficient in keeping people safe. A security measure has three important components, which
are access control, surveillance, and testing. Obstacles should be placed
where attackers most frequently operate. The location where the security
measure will be placed should be specific and effective.
Amos (2018) stated that early speech recognition systems were limited to a
single speaker and had limited vocabularies of about a dozen words. The earliest forms
of speech recognizers were automated telephone systems and medical dictation
software. Speech recognition is frequently used for dictation and for giving commands to computer-based
systems. Velde (2019) stated that the first-ever recorded attempt at speech recognition
technology dates back to 1,000 A.D. through the development of an instrument that
could supposedly answer “yes” or “no” to direct questions. Modern speech recognizers
have the ability to recognize speech from multiple speakers and have very large
vocabularies in numerous languages. According to Hope (2019), today, speech
recognition is done on a computer with ASR (automatic speech recognition) software
programs. Most ASR programs require the user to train the program to recognize the
user’s voice so that it can more accurately convert speech into text. The first ASR
device was used in 1952; it recognized single spoken digits from a
particular user.
Conceptual Framework
The researchers used the CIPP Evaluation model in order to show the Context,
Input, Process, and Product of the study. The CIPP Evaluation Model is a decision-
focused framework made by Daniel Stufflebeam and his colleagues in the 1960s for the
main purpose of guiding evaluations of programs and projects. The CIPP Model is also
defined as the “systematic collection of information about the activities, characteristics,
and outcomes of programs to make judgements about the program, improve program
effectiveness, and/or inform decisions about future programming,” (Stufflebeam, 2003).
CIPP is an acronym for the four main concepts this model uses. It stands for Context,
Input, Process, and Product. The researchers chose the CIPP Model for it “aims to
provide an analytic and rational basis for program decision-making, based on a cycle of
planning, structuring, implementing and reviewing and revising decisions, each
examined through a different aspect of evaluation - Context, Input, Process, and
Product evaluation,” (Robinson, 2002).
The researchers decided to employ symbols and colors to provide a clean and
understandable process in the form of a flowchart diagram (a notation that Goldstine and
von Neumann adapted for computer programs in the late 1940s). A flowchart is a diagram of the sequence of movements
or actions of people or things involved in a complex system or activity. Each color in this
diagram represents a certain part of the process. The researchers decided to use colors
to distinguish between the different concepts to make the diagram simple and not
cluttered. The blue rectangle signifies the Context or the objectives of the study, the
black rectangle signifies the Input, the green signifies the Process, and the yellow
teardrop signifies the Product.
A blue rectangle was chosen by the researchers to show the Context. Blue was
chosen because it is psychologically associated with determination and goal-making,
important qualities in setting up the objectives or the context. In Context, the
researchers collected and assessed information to scope the main objectives
they want to accomplish. The researchers stated four objectives,
as shown in Figure 1. First, the researchers want to make an affordable security
system for local store owners who can’t afford the more expensive security measures.
They wish to provide an enticing alternative that won’t cost as much as other
security systems. Second, they want to show people the advantages of a hands-free
security system and maybe give them a glimpse of the future of security systems. Third, they
aim to prove that speech recognition is an effective security measure. As speech
recognition is currently looked at as a “pet project” in the security system industry, the
researchers want to show people that using speech recognition as a form of security
measure is not only possible but also very effective. Lastly, they want to encourage
security companies to use speech recognition in their security measures.
After the Context is the Input Evaluation Stage. The purpose of this stage is for
the researchers to assemble a concrete list of materials and components needed to
make and execute the procedure to create the intended product. After the researchers
outlined their main goals, they then made a list of the components required. The main
components needed in this study are the Raspberry Pi, the brain of the whole product,
and the microphone, the component used to detect the sounds that are needed to
activate the security measure. Since the Raspberry Pi comes preinstalled with its own
OS and is powerful enough to be used independently, a separate computer is not required
unless a more powerful system is needed to process or load a certain code. Other equipment,
such as the cables (micro-USB cable and HDMI cable), is essential in the process.
The Process Evaluation Stage is one of the most important stages in the CIPP
Model since the quality of the product here is investigated, documented and assessed
(Wilson & Mertens, 2012). The steps need to be executed properly in order to produce
excellent results. Programming is crucial, as one mistake can stop the program
from working. The researchers need to be meticulous and careful in their coding.
Testing and troubleshooting are very important steps to spot and correct mistakes.
Once the code is written, testing must follow to make sure no mistakes were
made.
The final stage in the CIPP Model is the Product Evaluation. A teardrop
was chosen to show emphasis on the outcome. This stage assesses the final outcome
of the study, whether expected or unexpected. Once the steps have been followed
properly, the Product will be made. Once the product is made, the researchers can then
determine the effectivity of a speech recognizer as a form of security measure.
Conceptual Framework
Context
1. make an affordable security system for local store owners;
2. show the advantages of using a hands-free security system;
3. prove that speech recognition can be an effective security measure; and
4. encourage security companies to use speech recognition in their security
measures.
Input
- Raspberry Pi 3 Model B
- Micro-USB Cable and Power Brick
- RODD Brand USB Computer Microphone
- USB Speakers
- HDMI Cable
Process
1. Connect all necessary components to Raspberry Pi.
2. Program all the required codes.
3. Test and troubleshoot.
Product
Security Measure using Speech
Recognition Software
Figure 1. The procedures needed to make a security measure using speech recognition
software.
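The "Program all the required codes" step in Figure 1 can be pictured with a minimal sketch of the trigger check, written in Python. It assumes that the speech recognizer has already turned the audio into a text transcript; the function names, the example phrase "code red", and the send_alert callback are hypothetical and are not taken from the researchers' actual program.

```python
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_trigger(transcript, trigger_phrase):
    """Return True if the trigger phrase appears as consecutive whole words."""
    words = normalize(transcript)
    trigger = normalize(trigger_phrase)
    if not trigger:
        return False
    n = len(trigger)
    # Slide a window of n words across the transcript and compare.
    return any(words[i:i + n] == trigger for i in range(len(words) - n + 1))


def handle_transcript(transcript, trigger_phrase, send_alert):
    """Call send_alert() when the trigger phrase is heard; report if it fired.

    send_alert is a hypothetical callback standing in for the call/text step.
    """
    if contains_trigger(transcript, trigger_phrase):
        send_alert()
        return True
    return False
```

Matching on whole words rather than on raw substrings is a design choice in this sketch: it keeps a word such as "barcode" from accidentally matching a trigger that starts with "code".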
Statement of the Problem
The purpose of this study is to develop a security measure using
speech recognition software wherein the speech recognizer will recognize a certain phrase or
word provided by the researchers and the software will secretly call the police and a
close relative/friend, alerting them that there is an emergency. The researchers will also
explore the concept of speech recognition software as an alternative to other
forms of security. The researchers aim to give an affordable alternative to local store
owners who cannot afford other security measures. During the study, the
researchers aim to answer the following questions:
1. What will be the optimal distance between the user and the speech recognizer
for the software and microphone to properly identify the voice?
2. How accurate will the sensor and speech recognition software be in recognizing
and identifying when the user is trying to activate the security measure?
3. How long is the elapsed time between the activation of the speech recognition
software and the moment the call/text is made?
Hypotheses
1. If the microphone is not obscured and the background noise level
ranges from 35 to 110 dB, then the user needs
to produce approximately 60-80 dB at a distance of around 2-10 meters
from the microphone. The decibels that the user needs to produce are directly
proportional to the distance of the user from the microphone. If there are no
obstructions covering the microphone, then the optimal distance of the user from
the microphone is 5 meters.
2. If the user is standing 5 meters away from the microphone and says the
word at 50-60 dB, then the software has about a 90% chance of success, keeping in
mind other factors such as the background noise, possible objects covering
the microphone and the clarity of the user’s voice.
3. If the user speaks at the optimal distance from the speech recognizer
and produces the number of decibels needed by the speech
recognizer, then the pre-recorded call/message will take between 5 and 20
seconds to process and to call/text the number inputted.
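Hypothesis 1 relates the loudness the user must produce to the distance from the microphone. As a rough aid for reasoning about that relationship, the sketch below uses the textbook free-field approximation, in which the sound level from a small source falls by about 6 dB each time the distance doubles. This is an assumption added for illustration and is not the researchers' test method; real rooms with walls, reflections, and background noise will behave differently. The function names and the 1-meter reference distance are hypothetical.

```python
import math


def level_at_distance(level_ref_db, distance_m, ref_distance_m=1.0):
    """Estimate the sound level (dB) at distance_m from the speaker.

    level_ref_db is the level measured at ref_distance_m. Assumes a
    free-field point source: L(r) = L_ref - 20 * log10(r / r_ref).
    """
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    return level_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)


def required_source_level(target_db_at_mic, distance_m, ref_distance_m=1.0):
    """Estimate how loud the speaker must be (measured at ref_distance_m)
    for the microphone at distance_m to receive target_db_at_mic."""
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    return target_db_at_mic + 20.0 * math.log10(distance_m / ref_distance_m)
```

Under this approximation, a voice measured at 70 dB from 1 meter away would arrive at roughly 56 dB at a microphone 5 meters away.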
Significance of the Study
Caliwan (2019) stated that though the total number of index crimes has
dropped, the number of robberies and thefts has remained nearly constant, only dropping by
0.4%. The researchers aim to help Filipino citizens by giving them an alternative
security system that is effective and seamless. This study aims to show Filipino citizens
the effectivity of speech recognition when used in security systems. The researchers
want to provide people with an affordable yet reliable security measure of the kind usually seen in
higher-end products that cost up to P100,000, something an ordinary person cannot afford.
The researchers also aim to show security system companies that speech recognition
can be a viable form of security. This study aims to specifically help these groups of
people:
Small Local Store Owners. Stores are prone to being robbed, especially small
local stores. Small local store owners usually can’t afford security systems that can aid
them during a robbery or emergency. The researchers focused on trying to make a
cheap yet reliable security measure. This product takes the feature of speech
recognition from higher-end security systems.
Security System Companies. The researchers aim to establish speech recognition
software as a viable security measure. The researchers want to challenge security
system companies to look into speech recognition and invest in this portion of the
security system industry. With this, speech recognition-based security measures can
become cheaper and more affordable for the consumer.
Corner Stores and Gas Stations. These places stay open until midnight, which
makes them easy targets for burglars. Mesa
Alarm Systems (2017) stated that over 7,000 corner stores are robbed each year and
burglars rarely walk away with more than $900. Since most robberies happen at
night, people should limit their time at these locations after dark. The researchers aim to
lessen the number of robberies in these places. With the speech recognizer,
employees can easily alert authorities about burglars.
Scope and Delimitations
This study aims to create a device that can detect an emergency based on sound. In
order to do so, the aspect looked into was the sensitivity of the sensors even to the
faintest sounds. This study will cover the effectivity of the product in capturing sounds
accurately. The device will be tested at different sound levels to determine its capability
in capturing and detecting sounds from the user’s activities and speech and in
distinguishing the speaker from background noise. To further enhance the device’s
capability, the researchers will also conduct a test on how precisely the device can pick
up sounds depending on the environment. The researchers will also focus on programming
the sensor so that it picks up sounds efficiently and properly.
Definition of Terms
For better understanding of this study, the following terms are defined operationally:
1. Acoustic Model. It is a file that contains statistical representations of each of the
distinct sounds that make up a word.
2. Artificial Intelligence (AI). It is the branch of computer science that deals with
writing computer programs that can solve problems creatively.
3. Automated Telephone Systems. It is a telephone system that interacts with
callers without input from a human other than the caller.
4. Automatic Speech Recognition (ASR). It is the use of computer hardware and
software-based techniques to identify and process human voices. It is used to
identify the words a person has spoken or to authenticate the identity of a person
speaking into the system.
5. Cyber Espionage. It is unauthorized spying by computer; the term generally
refers to the deployment of viruses that clandestinely observe or destroy data in
the computer systems of government agencies and large enterprises.
6. Hacking. It generally refers to unauthorized intrusion into a computer or a
network. The person engaging in hacking activities is known as a hacker.
7. Internet of Things (IoT). It is a system of interrelated computing devices,
mechanical and digital machines, objects, animals or people that are provided
with unique identifiers (UIDs) and the ability to transfer data over a network
without requiring human-to-human or human-to-computer interaction.
8. Malware. It is any program or file that is harmful to a computer user. It is also
called malicious software.
9. Markov Model. It is a stochastic method for modeling randomly changing systems where it
is assumed that future states depend only on the current state and not on the states that came before it.
10. Phoneme. It is any of the abstract units of the phonetic system of a language
that correspond to a set of similar speech sounds which are perceived to be a
single distinctive sound in the language.
11. Physical Security. It is that part of security concerned with physical measures
designed to safeguard personnel and property against espionage,
sabotage, damage, and theft.
12. Rudimentary Speech Recognition Software. It has a limited vocabulary of words
and phrases, and it may only identify these if they are spoken very clearly.
13. Security Measure. It refers to measures taken as a precaution against theft,
espionage, sabotage, and similar threats.
14. Speech Recognizer/Speech Recognition. It is a computer software program or
hardware device with the ability to decode human voice.
15. Statistical Language Model. It is a file used by a speech recognition engine to
recognize speech. It contains a large list of words and their probability of
occurrence and is used in dictation applications.
16. Trigram. It is a unit made up of three parts; in a statistical language model, it
refers to a sequence of three consecutive words.
17. Unique Identifiers (UIDs). It is a numeric or alphanumeric string that is
associated with a single entity within a given system, which makes it possible to
address that entity so that it can be accessed and interacted with.
CHAPTER 2
REVIEW OF RELATED LITERATURE AND STUDIES
The literature and studies cited in this chapter tackle the different concepts,
understandings, and ideas related to the effectivity of a speech recognizer as
a form of security measure. This chapter also contains generalizations or conclusions
and different developments related to the topic. The literature and studies included in
this chapter can help familiarize the reader with information and concepts that are
relevant and similar to the present study.
Current State of Security in the Philippines
Caliwan (2019) stated that total crime volume is down and still declining,
thanks in large part to the Philippine National Police’s intensified drive against crime and
lawlessness. The past year saw a 22.6% drop in index crimes (such as robbery,
murder, homicide, physical injury, rape, theft, carnapping and cattle rustling),
from 7,421 in May 2018 to 5,744 in May 2019. Though the total number of index
crimes has dropped, the number of robberies and thefts has remained nearly constant, only
dropping by 0.4%. He stressed the importance of safety. He stated that a drop in
total crimes does not mean people should be complacent. Filipinos
should always stay vigilant.
Bueza (2018) wrote that around 1.4 million families fell victim to common crimes
in the third quarter of 2018, according to a Social Weather Stations (SWS) survey
released on November 29, 2018. The SWS survey, held from September 15 to 23,
showed that 6.1% of Filipino families (around 1.4 million families) reported victimization
by any of the common crimes within the past 6 months alone (common crimes refer to
pickpocketing or robbery of personal property, break-ins, carnapping, and physical
violence). It also said that 5.6% of Filipino families have suffered from property crimes. It
is therefore highly recommended that people have at least one security measure in
their homes and stores.
Importance of Security Systems
Kaysen (2017) stressed the importance of security systems in homes (and
stores) and their effectivity in reducing burglaries and robberies. The National Council for
Home Safety and Security stated that homes without alarms are three times more likely
to get burglarized. The article reports that burglaries have dropped dramatically since the
boom of the new generation, citing a figure of 28%. It also discusses the positive and negative effects of
having security systems in the vicinity.
Rode (2019) addressed the importance of security systems for retail stores. He
stated that, as stores are big investments, it can be very upsetting and stressful when
the safety of a person’s investment is compromised by an outside intruder. Security
systems not only act as a form of safety measure; they can also be used to deter criminals
from even attempting to enter a person’s store. If a break-in does occur, having a
security system can provide police with invaluable information that can lead them to a
suspect.
Overviews on Speech Recognition
“Speech recognition is the ability of a machine or program to identify words and
phrases in spoken language and convert them to a machine-readable format.” (Rouse,
2007, para. 2). In this article, she states the meaning of speech recognition and how it
works. Speech recognition works using algorithms through acoustic and language
modeling. Acoustic modeling represents the relationship between linguistic units of speech and
audio signals. Language modeling matches sounds with word sequences to help
distinguish between words that sound similar.
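To illustrate how a language model can help choose between similar-sounding words, the following is a minimal sketch of a bigram (two-word) model with add-one smoothing, written in Python. It is a toy made for this explanation and is not the model used by any particular recognizer; the training sentences, function names, and candidate transcripts are all hypothetical.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count single words and adjacent word pairs in the training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def score(sentence, unigrams, bigrams):
    """Multiply add-one smoothed estimates of P(word | previous word)."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.lower().split()
    probability = 1.0
    for prev, word in zip(words, words[1:]):
        probability *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return probability


def pick_best(candidates, unigrams, bigrams):
    """Return the candidate transcript the language model finds most likely."""
    return max(candidates, key=lambda c: score(c, unigrams, bigrams))


# Hypothetical training text and two candidates that sound alike.
training = ["call the police", "call the police now", "please call for help"]
unigrams, bigrams = train_bigrams(training)
best = pick_best(["call the please", "call the police"], unigrams, bigrams)
```

Here the acoustic side is imagined to have produced two candidates that sound alike; because "the police" appears in the training text and "the please" does not, the model prefers "call the police".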
Velde (2019) explained how speech recognition works and its uses. Speech
recognition technology is not just about making things easier. It is also about safety.
Instead of texting while driving, people can now tell their car who to call or what
restaurant to navigate to. As beneficial as it may seem in an ideal scenario, it is
dangerous when implemented before it has high enough accuracy. Speech recognition
analyzes sounds by filtering what a person says, digitizing it to a format it can “read,” and then
analyzing it for meaning. Then, based on algorithms and previous input, it can make a
highly accurate educated guess as to what the person is saying. It gets to know the
speaker’s use of language. Background noise can easily throw a speech recognition
device off track. This is because it does not inherently have the ability to distinguish the
ambient sounds it “hears,” such as a dog barking or a helicopter flying overhead, from a
person’s voice.
Velde (2019) also said that, as of May 2017, Google’s machine learning algorithms
had achieved a 95% word accuracy rate for the English language. That
rate also happens to be the threshold for human accuracy. She compared the growth of
speech recognition to a child learning his or her first words. Whereas humans have
refined the process for themselves, researchers are still figuring out the best practices for computers. Computers
have to be trained in the same way parents and teachers train children. That
training involves a lot of innovative thinking, manpower, and research.
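Word accuracy figures are commonly derived from the word error rate: the number of word substitutions, insertions, and deletions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The sketch below computes this with a standard edit-distance table. How the 95% figure cited above was measured is not stated in the source, so this is only an illustration of the general idea, and the function names and sample sentences are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words).

    Uses the Levenshtein distance over words, keeping only the previous
    row of the table to save memory.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))  # distance from empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance to empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Return 1 - word error rate; can be negative if there are many insertions."""
    errors, n_ref = word_errors(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - errors / n_ref
```

For example, if the user says "call the police now" and the recognizer hears "call the please now", one of the four words is wrong, so the word accuracy is 0.75.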
Gong (1995) surveyed how speech recognition in noisy environments works.
Gong concluded that environmental noise significantly degrades the performance of
most current automatic speech recognition systems. This degradation comes mainly
from differences between the environment in which a system is trained and the one in which it is used. In recent years,
many studies have focused on reducing these differences, but the technology, even to
this day, still has a hard time distinguishing the voice from background noise.
Graff and Peacocke (1995) gave an introduction to speech and speaker
recognition. Speech recognition has already proven useful for certain applications, such
as telephone voice-response systems for selecting services or information, digit
recognition for cellular phones, and data entry while walking around a railway yard or
clambering over a jet engine during an inspection. Speaker recognition is related to
work on speech recognition. Instead of determining what was said, the focus is on
determining the speaker. Deciding whether or not a particular speaker produced the
utterance is called verification, and choosing a person's identity from a set of known
speakers is called identification.
Application of Speech Recognizers used as a Security Measure
Reynolds (2002) described the deployment of speech technologies in
STARHome, a fully functional smart home prototype. STARHome
includes a security feature for calling people
the user knows. Reynolds also stated that using voice biometrics for security and home
automation involves several ergonomic constraints including a drastic limitation of
speech duration for recognition.
Foster (1996) wrote about his speech-activated security system. It covers
speech-actuated security devices and methods whereby a lock, or other security or access
device, may be actuated by a speech input, but without disclosing the actual
code to those hearing the code words spoken during use of the security
device. The security device includes a microphone, a display for displaying a plurality of
code elements, and a processor for controlling the display and analyzing the
microphone signal to detect a proper sequence of code elements spoken by a user
and to operate the security device in response. He
stated that using speech recognition can be effective in cutting the physical interaction required in half.
Effectivity of Speech Recognizers as Security Measures
De Leon, Hernaez, Pucher, Saratxaga and Yamagishi (2012) wrote about the
evaluation of speaker verification (SV) security and the detection of HMM-based synthetic
speech. Using a hidden Markov model (HMM)-based text-to-speech (TTS)
synthesizer, which can synthesize speech for a target speaker using small amounts of
training data through model adaptation of an average voice or background model, they
tested and concluded that over 81% of the matched claims are accepted. This result
suggests a vulnerability in SV systems and thus a need to accurately detect synthetic
speech.
Chow, He, Su, Yang and Zhang (2000) focused their paper on the architecture
design, implementation and optimization of distributed speech recognition systems.
They concluded that speech recognition is best when there is no background noise and
there are no physical obstructions.
Synthesis
Crime remains a serious issue throughout the Philippines.
Caliwan (2019) stated that robberies and thefts remained nearly constant from
May 2018 to May 2019. Thus, there is a need for effective and affordable
security systems that are accessible to the people. Kaysen (2017) stated that homes
without alarms are three times more likely to get burglarized. Security systems are
essential in every home or store to protect property from robbery, theft and other
property crimes. A security system can act not only as a safety measure to
protect against burglars and home intruders but also as a deterrent that drives these criminals away.
Rouse (2007, para. 2) stated that “Speech recognition is the ability of a machine
or program to identify words and phrases in spoken language and convert them to a
machine-readable format.” Speech recognition technology is not just about making
things easier; it is also about safety. It has a lot of uses and advantages, yet only a
few people have tried to use speech recognizers for security measures. One example of
a speech recognizer used this way is STARHome. Reynolds (2002) described the deployment of
speech technologies in STARHome, a fully functional smart home prototype
that includes a security feature that
can call the person’s emergency contact. Foster also made a speech-actuated security
system back in 1996: a security device and method whereby a lock, or other security or
access device, may be actuated by a speech input. It includes a microphone, a
display for displaying a plurality of code elements, and a processor for controlling the
display and analyzing the microphone signal to detect a proper sequence of code
elements spoken by a user and to operate the security
device.
De Leon, Hernaez, Pucher, Saratxaga and Yamagishi (2012) concluded in their
study on the evaluation of speaker verification security and the detection of HMM-based
synthetic speech that over 81% of the matched claims are accepted. This result
suggests a vulnerability in the system. The researchers will be using Google’s API
Client Library for speech recognition. Velde (2019) stated that Google’s machine
learning algorithms have achieved 95% word accuracy. Gong (1995) concluded that
environmental noise greatly degrades the performance of the speech recognizer,
something the researchers aim to prevent. Chow, He, Su, Yang and Zhang
(2000) concluded that speech recognition is best when there is no background noise
and there are no physical obstructions.
In conclusion, the speech recognizer has proven itself a useful tool and a part of
the future of technology. With the Philippines’ current climate towards security,
having an affordable and seamless security system with speech recognition capabilities
is a step towards the future. There have been many examples of people using speech
recognition as a form of security measure, and with the advanced technologies currently
available, the researchers are confident that they can create an effective security
system.
CHAPTER 3
RESEARCH METHODOLOGY
Research Design
“Research design is defined as a framework of methods and techniques chosen
by a researcher to combine various components of research in a reasonably logical
manner so that the research problem is efficiently handled. It provides insights about
“how” to conduct research using a particular methodology,” (Bhat, 2019). Research
design is a model or layout used to answer the research questions. The researchers
used a true experimental research design, a type of experimental design considered
the most accurate form of experimental research, to collect the data and other
information needed for the product. Bhat (2019) stated that experimental research is
any research conducted with a scientific approach, where one set of variables is kept
constant while the other set of variables is measured as the subject of the
experiment. The researchers conducted several trials to determine whether the speech
recognizer is effective as a form of security measure. The researchers chose the
experimental research design to test the security system’s ability to sort through
background noise, properly identify the key word, and quickly notify the authorities.
Research Setting
The study was conducted in two rooms within a condominium at East
Residences Ortigas, a condominium complex in Pasig, Metro Manila. Each room had
ample space to provide a spacious working environment. One room was used to test
the effectivity of the speech recognizer in a natural setting, wherein the everyday
hustle and bustle is ongoing. The researchers also provided a room wherein no
background noise was being emitted. This was done to properly record the sensor’s
capability without any physical obstructions. This place was chosen for the
experiment to simulate the noisy environments usually seen in local stores and to
test the sensor and software’s capability to distinguish between the background noise
and the speaker. This place was also chosen to examine the sensor’s capability to
properly and quickly identify the key word.
Research Instruments
The results of the product were acquired through experimentation. The
researchers used a decibel meter to measure the decibel level of the room. The
researchers also used a tape measure to measure the distance between the microphone
and the speaker. The experiment was done to find the effectivity of the speech
recognizer as a form of security measure by measuring and identifying the relationship
between the background noise level, the distance of the speaker from the microphone,
and the ability of the sensor to properly pick up and identify the key word. To
further test the ability of the speech recognizer, the researchers used a loudspeaker
playing noises based on regular routines.
Experimentation was chosen as a research instrument to properly identify the
effectivity of the speech recognizer as a form of security measure using controlled
variables to procure the most accurate results. The data collected were then processed
as a success rate percentage.
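The success rate percentage mentioned above can be sketched in a few lines of Python. This is only an illustration, not the researchers' own code; the function name success_rate is an assumption, and the 94 out of 100 figure is the accuracy trial reported in Chapter 4.

```python
def success_rate(recognized: int, attempts: int) -> float:
    """Return the percentage of attempts in which the key word was recognized."""
    if attempts <= 0:
        raise ValueError("attempts must be a positive integer")
    if not 0 <= recognized <= attempts:
        raise ValueError("recognized must be between 0 and attempts")
    return 100.0 * recognized / attempts


# Accuracy trial reported in Chapter 4: 94 recognitions out of 100 attempts.
print(f"{success_rate(94, 100):.1f}%")
```

The same function applies to any of the trials in Chapter 4, as long as the number of attempts for that trial is known.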
Materials
Table 1. The Raspberry Pi 3 Model B and the Microphone with its Corresponding
Quantity, Price and Appearance.
Materials                              Quantity   Price    Appearance
Raspberry Pi 3 Model B                 1          ₱2500    [image]
Home Studio USB Condenser Microphone   1          ₱928     [image]
F-165 Multimedia Speakers              1 pair     ₱300     [image]
The materials provided above are the components used in making the
speech-recognizer security system. Moody (2011) described the Raspberry Pi as a
"potential BBC Micro 2.0", not replacing PC-compatible machines but supplementing
them. The Raspberry Pi is a very versatile product mainly used for robotics. For
these reasons, the researchers chose to use the Raspberry Pi instead of other
alternatives.
The Home Studio USB Condenser Microphone has a rating of 4.5 stars from 81
reviews on Lazada. The researchers decided to use this microphone as it also comes
with a noise filter, which can eliminate most of the researchers’ problems with
background noise.
The F-165 Multimedia Speakers are cheap wired speakers that provide sound loud
and clear enough to alert nearby neighbors and to scare intruders away.
Equipment
Table 2. All the Equipment Used with its Corresponding Quantity, Price and
Appearance.
Equipment    Quantity   Price      Appearance
OTG Cable    1 piece    provided   [image]
HDMI Cable   1 piece    provided   [image]
Monitor      1 piece    provided   [image]
Keyboard     1 piece    provided   [image]
Mouse        1 piece    provided   [image]
The equipment used was essential and mainly served to communicate with the
Raspberry Pi more efficiently. An OTG cable (a wire that enables a connection between
micro-USB and regular USB) was used to connect normal USB devices to the
Raspberry Pi, since the Raspberry Pi only had micro-USB ports. A monitor was used to
see the output of the device. The HDMI cable was used to connect the monitor to the
Raspberry Pi. A keyboard was used to type all the necessary code, and a mouse was
used to interact with the Raspberry Pi through its user interface.
Procedures
1. Connect the Raspberry Pi to a monitor, a mouse, and a keyboard.
2. Choose an operating system for the Raspberry Pi. The researchers chose to install
Linux, an open-source operating system modelled on UNIX, as the operating system
for the security system.
3. Install Python, including all the necessary libraries. Download the installer from the
official Python website. Run the installer, then choose the path where Python will be
installed. Choose custom and tick all of the boxes.
4. Install PIP, the package installer for Python, which can be used to install packages
from the Python Package Index and other indexes. To install PIP, download get-pip.py
by opening a terminal and executing curl https://bootstrap.pypa.io/get-pip.py -o
get-pip.py. Afterwards, execute python get-pip.py.
5. PyAudio, an extension of Python and a requirement for Python to be able to use the
microphone, needs to be installed. To install PyAudio, use APT by executing
sudo apt-get install python-pyaudio python3-pyaudio in a terminal.
6. Install Google API Client Library for Python, the main library/database the software
will use to properly identify the word by constantly listening to the user and
comparing the audio to thousands of files until it finds a match. To install Google
API Client Library for Python, open a terminal and execute pip install
--upgrade google-api-python-client.
7. Connect the USB microphone to the port using an OTG cable. Make sure the
Raspberry Pi is connected to the Internet for the Google API Client Library to work.
8. Open IDLE, the default source code editor that comes pre-installed with Python, and
write the code found at this link:
https://github.com/malnourishita/speech/blob/master/GUI%20TEST.py
Make sure there are no errors in the code. Test and run the code many times. Then run
the program and leave it running.
9. Activate the security system by saying the set keyword.
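The researchers' actual program is the one at the link in step 8 and is not reproduced here. As a rough illustration of the idea behind steps 5 to 9 only, the sketch below checks a transcript for the keyword. It assumes the third-party SpeechRecognition package (imported as speech_recognition) with PyAudio for microphone access, which may differ from what the linked program uses; the names contains_keyword and listen_for_keyword are hypothetical.

```python
import re


def _normalize(text: str) -> str:
    # Lowercase and replace punctuation with spaces so "Help!" equals "help".
    return " ".join(re.sub(r"[^a-z0-9']+", " ", text.lower()).split())


def contains_keyword(transcript: str, keyword: str = "help") -> bool:
    """Return True if the keyword (a word or phrase) appears in the transcript."""
    wanted = _normalize(keyword)
    if not wanted:
        return False
    # Pad with spaces so only whole words match ("help" does not match "helpful").
    return f" {wanted} " in f" {_normalize(transcript)} "


def listen_for_keyword(keyword: str = "help", timeout_s: float = 30.0) -> bool:
    """Listen once on the default microphone and check for the keyword.

    Assumption: the SpeechRecognition and PyAudio packages are installed and
    the device is online. This function is not called in this sketch.
    """
    import speech_recognition as sr  # imported lazily; needs a microphone

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate to room noise
        try:
            audio = recognizer.listen(source, timeout=timeout_s)
        except sr.WaitTimeoutError:
            return False  # nobody spoke before the timeout
    try:
        transcript = recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        return False  # unintelligible speech or unreachable service
    return contains_keyword(transcript, keyword)


print(contains_keyword("Somebody help me, please!"))
```

Matching on whole words keeps everyday words such as "helpful" from triggering the system, which is the same concern behind choosing a distinctive keyword or phrase.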
Statistical Treatment
To analyze the data effectively, the researchers used ANOVA to determine the
results of the thesis.
Analysis of Variance
This helps the researchers compare two or more sets of data, allowing them to
determine whether the differences between the results of those sets are
significant.
Steps in ANOVA
The first procedure is to determine the optimal distance between the sensor and
the user.
The second procedure is to test how many times out of a hundred the speech
recognizer will accept the key word.
The third procedure is to assess the elapsed time between the activation of the
speech recognition and the text/call.
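The thesis does not show how its ANOVA was computed. As a hedged illustration of the technique named above, a one-way ANOVA F statistic can be sketched with the standard library alone. The function name one_way_anova_f is an assumption, and the three small groups in the usage lines are made-up numbers for demonstration, not data from this study.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    if len(groups) < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two or more values each")
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up demonstration groups (not study data).
f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # prints "F(2, 6) = 3.00"
```

A larger F means the group means differ more than the spread inside each group would suggest; the F value is then compared against a critical value for the two degrees of freedom. The sketch raises a ZeroDivisionError if every group has zero internal variation.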
CHAPTER 4
RESULTS AND DISCUSSION
This chapter presents and discusses the results, analysis and interpretation of
the data gathered by the researchers. This study aims to determine the effectivity of
the speech recognizer as a form of security measure. The researchers applied an
experimental study to properly procure the data presented. Experiments were done
to answer the questions stated in the statement of the problem. The analytical
procedures are arranged according to the sequence of specific questions.
The experiments mainly focused on the feasibility and accuracy of the speech
recognition and the user’s satisfaction with the product. No experiment was done to
measure the effectivity of the security measure after the text has been received by
the user, as that depends on variables that are too broad and too hard to control.
The keyword used in the program is “help” unless stated otherwise.
Through the experiments, the researchers wish to answer the following questions
from the statement of the problem:
1. What will be the optimal distance between the user and the speech recognizer
for the software and microphone to properly identify the voice?
2. How accurate will the sensor and speech recognizing software be in recognizing
and identifying when the user is trying to activate the security measure?
3. How long is the elapsed time between the activation of the speech recognition
software and the call and text made?
Figuring Out the Optimal Distance
[Figure 1.1: chart of Number of Times Recognized (out of 20) against Distance (in
meters), from 0.5 m to 8 m in 0.5 m steps.]
Figure 1.1 shows how many times the speech recognizer successfully recognized the
word at specific distances.
Figure 1.1 shows the effectivity of the speech recognizer at specific distances. At
each distance, the researchers made sure the speaker kept a constant 70 dB to
guarantee an accurate finding. The speaker was asked to say the default keyword, help,
20 times at each distance. A measuring tape was used to measure the distance between
the user and the microphone. A decibel meter app was used to detect the decibel level
of the user (the decibel meter was placed near the speaker, not the speech
recognizer). The researchers included only the attempts that were around 67-73 dB;
those that did not meet this criterion were not included. 70 dB was chosen as the
standard in all of the researchers’ tests as it is the approximate decibel level
produced by a human when shouting or talking loudly (Sramkova, 2015). Interestingly
enough, the graph is not a smooth, gradual decline. There are dips at certain
distances. These dips can be attributed to the program’s difficulty in picking up
sounds at long distances or to a software glitch. 0.5 m and 1 m are the most optimal
distances, with 19 successful activations out of 20. 1.5 to 3.5 meters follow next,
since they have almost the same results, at 17-18. 2.5 m had a dip in activations as
it only scored 16 out of 20, but 3 m has 17 activations out of 20. The researchers
decided to stop at 8 m as they felt that further distances would no longer be
effective and would not help the study. 0.5 m to 3.5 m is the optimal distance range
wherein the user should stand in order to properly activate the security measure.
Based on this, the researchers conclude that 2 m is the optimal distance for the
user, as the distance was effective in activating the speech recognizer without
sacrificing the user’s space and range of motion.
[Figure 1.2: chart of the number of times recognized at 1 to 8 meters for Set A
(Constant: 60 dB) and Set B (60 dB + 5 dB for every meter).]
Figure 1.2 shows the relationship between decibels produced, the distance between the
user and the speech recognizer and its effectivity in successfully identifying whenever
the user is trying to activate the speech recognizer.
The experiment was done with a background noise of 55-65 dB. Two sets of
experiments were done. Set A was done with the speaker’s decibel level constant at
60 dB across the different distances, while Set B had the speaker increase her
decibel level by 5 dB for every meter, starting from 60 dB. For both sets, the
researchers attempted to activate the speech recognizer at 1-8 m. Each set had 400
attempts (divided among the distances). Figure 1.2 shows that Set A sees a gradual
decline from 1 meter to 6 meters and a massive drop at 7 and 8 meters. Set B, as seen
in Figure 1.2, stays flat in its effectivity, with minor dips in between and an
improvement in performance at 7 meters (at a 90 dB level). These minor dips fall
within the margin of error. With the results displayed in Figure 1.2, the researchers
concluded that if the decibels produced stay the same while the distance increases,
the effectivity starts declining, whereas if the decibels produced increase alongside
the distance, the effectivity stays relatively the same.
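The speaking levels used in Set B follow a simple rule that can be written out as a short sketch. It assumes the 60 dB baseline applies at 1 meter, which is consistent with the 90 dB level reported at 7 meters; the function name set_b_level_db is hypothetical.

```python
def set_b_level_db(distance_m: int, base_db: int = 60, step_db: int = 5) -> int:
    """Speaking level for Set B at a whole-meter distance (base_db assumed at 1 m)."""
    if distance_m < 1:
        raise ValueError("distance must be at least 1 meter")
    return base_db + step_db * (distance_m - 1)


# Set A stays at 60 dB; Set B rises by 5 dB for every additional meter.
for d in range(1, 9):
    print(f"{d} m: Set A 60 dB, Set B {set_b_level_db(d)} dB")
```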
Testing the Product’s Effectivity and Accuracy
[Figure 2.1: pie chart titled Accuracy of the Speech Recognizer. Recognized: 94%;
Not recognized: 6%.]
Figure 2.1 shows the accuracy of the speech recognizer in recognizing when the user is
trying to activate it.
To find out the accuracy of the speech recognizer, the researchers made the
speaker stand at the optimal distance, which is 2 meters, and speak at 70 dB. There
was no physical obstruction between the user and the microphone. The background
noise level was around 55-65 dB. All variables were set at their optimal settings.
Figure 2.1 shows that out of 100 attempts to activate the security measure, the
speech recognizer recognized 94, making the speech recognizer’s accuracy 94%. Any
attempt that was detected after 30 seconds was considered as not recognized, which
was the case for 2 attempts. This shows that the speech recognizer, in its most
optimal setting, is very effective.
Accuracy on Different Background Noise Levels
Background Noise Level   Times Recognized
60 dB                    48
70 dB                    46
80 dB                    43
90 dB                    37
100 dB                   25
110 dB                   16
Figure 2.2 shows the accuracy of the speech recognizer in recognizing when the user is
trying to activate it at certain background noise levels.
Decibel Levels and its Real Life Equivalent
Real-Life Equivalent        Decibel Level (dB)
Normal Conversation Noise   60
Washing Machine, TV         70
Blender                     80
Subway Train                90
Factory Machinery           100
Rock Concert                110
Figure 2.3 shows the decibel levels and their real-life equivalents.
The speaker stood at the optimal distance, which is 2 meters away from the
microphone, and spoke at 70 dB. The researchers played TV static white noise at
different volumes to maintain consistency throughout the test. 50 attempts were given
to each background noise level. Figure 2.2 shows that the speech recognizer has no
problem recognizing the keyword at around 60, 70 and 80 dB of background noise.
According to Figure 2.3, 60 dB is the approximate decibel level of normal
conversation noise, 70 dB is the approximate decibel level of a washing machine and
80 dB is the approximate decibel level of a blender. This shows that the speech
recognizer can withstand noise ranging from normal conversation to a blender. The
researchers noticed the speech recognizer having some difficulties at around 90 dB
(the approximate decibel level of a subway train). At 100 dB, there was a significant
drop in the number of times the speech recognizer recognized when the user was trying
to activate it. The researchers conclude that the optimal background noise level for
the speech recognizer should be around 60-90 dB. For reference, Figure 2.3 shows the
various decibel levels and their real-life examples.
Number of Times Recognized Different Keywords
Keyword                             Times Recognized
Help                                20
Spanghew                            17
Mytacism                            16
Whiffle                             19
Gyascutus                           17
Levament                            18
Supercalifragilisticexpialidocious  19
Onomatopoeia                        17
Poltophagy                          18
Axinomacy                           17
Figure 2.4 shows the effectivity of the speech recognizer on different types of keywords.
The speaker stood at the optimal distance, which is 2 meters, and spoke at 70 dB.
The background noise level was around 55-65 dB. Figure 2.4 shows the effectiveness
of the speech recognizer on different types of keywords. The researchers conducted
this experiment to show the range of the speech recognizer. To test the ability of the
speech recognizer, the researchers used unconventional words that are not used in
everyday conversation. The researchers chose the top 8 most obscure words according
to Merriam-Webster, along with the words supercalifragilisticexpialidocious and
onomatopoeia, as these two words are notorious for being hard to pronounce. These
words are good candidates for a keyword because they are obscure and rarely said in
everyday conversation. The researchers also included the word help as a baseline to
compare with the other keywords. The researchers made sure the speaker pronounced
each word properly to maintain accuracy. Figure 2.4 shows that the speech recognizer
was successful in recognizing the inputted keywords. The word the speech recognizer
had the hardest time detecting was Mytacism, which can be attributed to the word’s
difficult pronunciation and its obscurity. A problem the researchers also noticed was
the mispronunciation and lack of clarity in the voice of the user, as the speaker
frequently mispronounced the words (though these attempts were not counted in the
final result). The researchers conclude that the speech recognizer can detect obscure
words with ease, as long as the word is said properly.
Number of Times Recognized Different Keywords
Phrase                          Times Recognized
Robber Robber Get Him Robber    44
Help Me Somebody Please         48
I Think There is a Trespasser   43
Call the Cops Now               50
There is Someone in My House    44
My Favorite Day is Friday       49
Figure 2.5 shows the effectivity of the speech recognizer on different types of phrases
and sentences.
The speaker stood at the optimal distance, which is 2 meters, and spoke at 70 dB.
The background noise level was around 55-65 dB. Figure 2.5 shows how effective the
speech recognizer is at detecting different types of phrases and sentences. The
researchers used phrases that are related to “Robberies” and “Trespassing”. For
phrases and sentences, the results were better than the expected outcome. Figure 2.5
shows that the phrases actually fared better than the words used in Figure 2.4. The
researchers noticed a correlation between the number of syllables in a sentence and
its effectivity, as the phrases with fewer syllables got a higher result whilst the
phrases with the most syllables got the worst results (both “I think there is a
trespasser” and “Robber robber get him robber” have 8 syllables, the largest number of
syllables among the phrases). The researchers concluded that any word can be used, as
long as it is pronounced properly. The researchers also concluded through this
experiment that a phrase is more effective and carries less risk of activating the
security system by mistake.
Effectivity If Covered By Various Objects
Object        Times Recognized
Windbreaker   17
Pillow        16
Paper Bag     19
Plastic Bag   20
Figure 2.6 shows the effectivity of the speech recognizer if obscured by various thin
objects.
As the user might want to hide the mic, the researchers tested items that could
be used to cover it. The speaker stood at the optimal distance, which is 2 meters,
and spoke at 70 dB. The background noise level was around 55-65 dB. The researchers
tested a windbreaker, a pillow, a paper bag and a plastic bag to see whether they
hinder the effectivity of the speech recognizer. These items were chosen as they are
relatively thin objects that are well suited to cover the microphone. Figure 2.6
shows that thin objects like plastic and paper bags do not hinder the speech
recognizer’s effectivity, whilst the recognizer struggles a bit with objects made of
thicker material. The researchers suggest covering the mic with plastic bags or paper
bags as these are the most effective. They also suggest that the user avoid covering
the mic with pillows, jackets or anything made of thicker material.
TRIAL NO.   ACTIVATED
1           NO
2           NO (37.56 sec)
3           NO (39.08 sec)
4           NO
5           NO (45.91 sec)
6           NO
7           NO
8           NO (34.81 sec)
9           NO
10          NO
11          YES
12          NO
13          NO
14          NO
15          NO
16          NO
17          NO
18          NO
19          NO (43.01 sec)
20          NO
Figure 2.7 shows the effectivity of the speech recognizer if obscured by a wall.
Physical obstructions are among the worst enemies of a speech recognizer, as
they greatly decrease its effectivity. Every building has one physical obstruction
that cannot be removed because it is part of the building’s structure: its walls.
Figure 2.7 shows that the speech recognizer is not effective if obscured by a wall.
The speaker stood at the optimal distance, which is 2 meters, and spoke at 70 dB. The
background noise level was around 55-65 dB. Out of the 20 times the speech recognizer
was tested, only trial no. 11 succeeded. Trials with a time beside them are those in
which the speech recognizer did detect that the user was trying to activate the
security measure, though only after 30 seconds. The time beside them is how long it
took the speech recognizer to recognize the attempt. After 1 minute, the researchers
moved on to another trial. The researchers conclude that walls hinder the speech
recognizer and make it ineffective.
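The scoring rule described above (a detection after 30 seconds counts as not recognized, and a trial is abandoned after 1 minute) can be sketched as follows. The late detection times are the ones listed in Figure 2.7; the names classify, CUTOFF_S and GIVE_UP_S are illustrative assumptions rather than part of the researchers' program.

```python
from collections import Counter

CUTOFF_S = 30.0   # detections after this many seconds count as not recognized
GIVE_UP_S = 60.0  # the researchers moved on to the next trial after one minute


def classify(detect_time_s):
    """Label one attempt from its detection time in seconds (None = never detected)."""
    if detect_time_s is None or detect_time_s > GIVE_UP_S:
        return "not detected"
    if detect_time_s > CUTOFF_S:
        return "detected late"
    return "recognized"


# Late detection times listed in Figure 2.7, keyed by trial number.
late_times = {2: 37.56, 3: 39.08, 5: 45.91, 8: 34.81, 19: 43.01}
labels = {trial: classify(late_times.get(trial)) for trial in range(1, 21)}
# Trial 11 is the only one marked YES; its detection time is not given in the
# source, so it is labeled directly rather than through classify().
labels[11] = "recognized"

counts = Counter(labels.values())
print(counts["recognized"], counts["detected late"], counts["not detected"])  # prints "1 5 14"
```

Separating late detections from attempts that were never detected shows that the wall delayed recognition in 5 of the 20 trials rather than blocking it outright.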
Finding the Elapsed Time between Calls and Texts
TRIAL NUMBER   ELAPSED TIME BETWEEN CALL AND ACTIVATION
1              16.98 seconds
2              15.46 seconds
3              10.32 seconds
4              15.34 seconds
5              13.78 seconds
6              10.53 seconds
7              14.46 seconds
8              10.16 seconds
9              8.60 seconds
10             9.09 seconds
11             10.23 seconds
12             12.04 seconds
13             9.93 seconds
14             26.20 seconds
15             11.07 seconds
Figure 3.1 shows how long the elapsed time is between the activation of the speech
recognition software and the call made.
Figure 3.1 shows the inconsistency in how long the security measure takes to
call the inputted number. The speaker stood at the optimal distance, which is 2
meters, and spoke at 70 dB. The background noise level was around 55-65 dB. Out of
the 15 trials, the results show that the fastest time was during the 9th trial, at
8.60 seconds. However, there was an irregularity, as the time almost tripled during
the 14th trial. This irregularity can be attributed to software glitches, which make
the speech recognizer inconsistent in activating the security measure. The average
elapsed time between the activation of the speech recognition and the call is 12.946
seconds.
TRIAL NUMBER   ELAPSED TIME BETWEEN TEXT AND ACTIVATION
1              10.23 seconds
2              7.31 seconds
3              7.25 seconds
4              8.64 seconds
5              10.01 seconds
6              6.23 seconds
7              7.12 seconds
8              6.45 seconds
9              8.13 seconds
10             8.69 seconds
11             9.23 seconds
12             7.46 seconds
13             8.12 seconds
14             7.04 seconds
15             9.62 seconds
Figure 3.2 shows how long the elapsed time is between the activation of the speech
recognition software and the text made.
Figure 3.2 shows that the security measure is far more consistent in sending a
text than in initiating a call, as shown in Figure 3.1. The speaker stood at the
optimal distance, which is 2 meters, and spoke at 70 dB. The background noise level
was around 55-65 dB. Out of 15 trials, the results show that the fastest text was
sent during the 6th trial, which had a time of 6.23 seconds. The longest was during
the 1st trial, which had a time of 10.23 seconds, a small spread compared to the
tripled time in Figure 3.1. The average elapsed time between the activation of the
speech recognition and the text is 7.063 seconds.
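The averages and extremes discussed for Figures 3.1 and 3.2 can be recomputed from the listed trial values with a short sketch. The values are copied from the two tables as they appear in this document, and the function name summarize is an assumption. For the call times as listed, this reproduces the reported 12.946 seconds; the text-time average is worth rechecking against the listed values, since the result depends on the table as transcribed.

```python
from statistics import mean

call_times_s = [16.98, 15.46, 10.32, 15.34, 13.78, 10.53, 14.46, 10.16,
                8.60, 9.09, 10.23, 12.04, 9.93, 26.20, 11.07]   # Figure 3.1
text_times_s = [10.23, 7.31, 7.25, 8.64, 10.01, 6.23, 7.12, 6.45,
                8.13, 8.69, 9.23, 7.46, 8.12, 7.04, 9.62]       # Figure 3.2


def summarize(times_s):
    """Return (mean, fastest, slowest, trial of fastest, trial of slowest)."""
    fastest, slowest = min(times_s), max(times_s)
    # Trials are numbered from 1, so add 1 to the list index.
    return (mean(times_s), fastest, slowest,
            times_s.index(fastest) + 1, times_s.index(slowest) + 1)


for label, times in (("call", call_times_s), ("text", text_times_s)):
    avg, fastest, slowest, fast_trial, slow_trial = summarize(times)
    print(f"{label}: mean {avg:.3f} s, fastest {fastest:.2f} s (trial {fast_trial}), "
          f"slowest {slowest:.2f} s (trial {slow_trial})")
```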
Discussion
As established by Figure 1.1, the optimal distance range between the user and the
speech recognizer is 0.5 to 3.5 meters, with the decibel level of the user at 70 dB.
The researchers chose 2 meters as the optimal distance between the speech recognizer
and the user, as it provides effectivity while still providing space and range of
motion for the user. According to Figure 1.2, as the user’s distance increases, the
effectivity of the product decreases. To maintain the effectivity as the user’s
distance increases, the user’s decibel level should increase along with the distance,
as shown in Figure 1.2.
The researchers measured the effectivity of the speech recognizer and found
that it had a 94% chance of working in the most optimal setting. Background noise,
when too loud, can hinder the performance of the speech recognizer, as shown in
Figure 2.2. 60 to 70 dB is the ideal background noise level. The researchers suggest
that the user avoid background noise levels of 90 to 110 dB as they significantly
reduce the effectivity of the product.
The researchers recommend that the user use a phrase or sentence as their
keyword, as it proved to be more effective than using an obscure word. According to
Figure 2.4 and Figure 2.5, simple phrases were detected more often than the
complicated words. However, the researchers concluded that any word can be used
and the word itself will not hinder the performance of the speech recognizer. The
pronunciation and clarity of the user, however, is an important factor that can reduce
the effectivity of the product. The researchers suggest using a word that the user can
properly pronounce and easily remember but that is obscure enough that it will not be
said in regular conversation. The effectivity of the product can be reduced if the
microphone is obscured by an object. Thin objects like paper and plastic will not
affect the product’s effectivity, but thicker objects can slightly hinder the
performance of the speech recognizer. In Figure 2.7, the researchers tried to activate
the speech recognizer through a wall but to no avail. The researchers conclude that
the speech recognizer cannot be activated from a different room.
The security measure’s elapsed time between the activation and the initiation of
a call is very inconsistent, with an average elapsed time of 12.946 seconds according
to Figure 3.1. The elapsed time between the activation and the sending of a text,
however, was consistent and a lot faster than initiating a call. The average elapsed
time is 7.063 seconds, according to Figure 3.2.
CHAPTER 5
SUMMARY, CONCLUSIONS AND RECOMMENDATIONS
This chapter presents the summary of the findings, conclusions and the
corresponding recommendations.
Summary
Though crime rates have dropped for non-index crimes, robberies and theft have
remained constant and are even increasing. Having a security measure in every home
and in public areas like malls, restaurants, and local stores can make people feel
safer. Security measures play a big role in the lives of people because they can save
people from danger. Local stores in the Philippines lack security measures because
(a) they are too expensive or (b) store owners do not think of securing their stores
until it is too late.
The researchers wanted to provide an affordable and reliable security system to
the masses, while also being forward-thinking and advanced. That is why the
researchers decided to combine speech recognition and a security measure to produce a
security measure that is hands-free, reliable, easy to use, and affordable. The
researchers aim to have an affordable and viable product that can show Filipinos a
glimpse of the future of safety. Speech recognition is one of the easiest forms of
security measure to use. By just saying the keyword, the user can make the speech
recognizer immediately activate its safety precautions and call for help.
The researchers created the security system using the Raspberry Pi, a popular
microcomputer used by programmers everywhere, and a simple USB microphone. The
researchers programmed and fine-tuned it until it became effective enough for the
researchers’ standards. Once the speech recognizer is activated, the security measure
sends a text and initiates a call to the inputted numbers. The call plays a
pre-recorded message that is customizable. The GUI allows the user to change the
text, the numbers it will call and the keyword used to activate the security system.
Once the product was done, the speech recognizer went through a myriad of
tests for the researchers to examine how effective the speech recognition is. The
security measure was then tested by getting the average gap between the activation
and the call/texts. The product turned out to be very effective and successful, which
shows that speech recognition is effective and feasible as a security measure.
Conclusion
1. What will be the optimal distance between the user and the speech recognizer for
the software and microphone to properly identify the voice?
- The researchers conducted a series of tests and concluded that the optimal
distance range between the user and the speech recognizer is around 0.5 meters
to 3.5 meters and the optimal distance is 2 meters (considering all other
variables are at their optimal settings). The researchers also concluded that the
decibel level of the user should increase together with the distance between the
user and the speech recognizer in order to keep the effectivity of the speech
recognizer.
2. How accurate will the sensor and speech recognizing software be in recognizing and
identifying when the user is trying to activate the security measure?
- The researchers tested the speech recognizer 100 times, trying to activate the
security system, and got a result of 94 out of 100. The researchers concluded
that the speech recognizer has an accuracy of 94%. Background noise of around
60-80 dB is the optimal setting to achieve maximum effectivity for the product.
The researchers also conducted a series of tests on the effectivity of the speech
recognizer in recognizing both obscure words and phrases, and they concluded
that using phrases is not only more effective than using words but can also
prevent accidental activations of the security system. The researchers also
tested the speech recognizer to see if it will activate if covered by thin
materials/objects and a wall. They concluded that thin materials will not affect
the effectivity of the speech recognizer while thicker objects, including walls,
can hinder the effectivity of the product.
3. How long is the elapsed time between the activation of the speech recognition
software and the call/text made?
- The elapsed time between the activation of the speech recognition software and
the call made has an average of 12.946 seconds, while the elapsed time between
the activation of the speech recognition software and the text made has an
average of 7.063 seconds.
After conducting numerous tests and research, the researchers concluded that
speech recognition is an effective, feasible and affordable security system.
Recommendation
There are aspects of the study that the researchers recommend future
researchers improve on. For example, a better microphone and better equipment
are recommended if future researchers want better results. A more advanced
Raspberry Pi model is also recommended to run the program more efficiently,
though it is not required.
To further enhance the product, a program that can execute the commands more
quickly and efficiently, with fewer bugs, is recommended. Using a custom speech
library instead of Google's has both advantages and disadvantages: Google's
libraries are well optimized, but a custom library could remove the product's
dependence on Wi-Fi.
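The trade-off between an online recognizer and an offline one can be sketched as a simple fallback. This Python example is an assumption-laden illustration, not the researchers' code: `transcribe` and the stub recognizers are hypothetical. With the SpeechRecognition package listed in the bibliography, `Recognizer.recognize_google` (online) and `Recognizer.recognize_sphinx` (offline, requires PocketSphinx) might fill these two roles, but that pairing is not tested here.

```python
def transcribe(audio, online, offline):
    """Try the online recognizer first; fall back to the offline one.

    `online` and `offline` are callables that take audio and return text.
    Returns a (text, source) pair so the caller knows which one answered.
    """
    try:
        return online(audio), "online"
    except Exception:
        # For example: no Wi-Fi, or the online service cannot be reached.
        return offline(audio), "offline"

# Hypothetical stubs standing in for real recognizers.
def stub_online_down(audio):
    raise ConnectionError("no network")

def stub_online_up(audio):
    return "help me"

def stub_offline(audio):
    return "help me"

print(transcribe(b"", stub_online_up, stub_offline))    # ('help me', 'online')
print(transcribe(b"", stub_online_down, stub_offline))  # ('help me', 'offline')
```

A design like this keeps the better-optimized online service when Wi-Fi is available while still letting the security system respond when it is not.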
Better testing conditions and more trials would make the results of the product
more reliable. Aside from contributing to the success of the research, focusing
on these areas can help future researchers who wish to contribute to this
investigatory project.
Bibliography
1. A brief history of speech recognition. (n.d.). Retrieved from
https://sonix.ai/history-of-speech-recognition
2. Rouse, M. (2016, September 21). What is physical security? - Definition from
WhatIs.com. Retrieved from
https://searchsecurity.techtarget.com/definition/physical-security
3. Rouse, M. (2016, December 6). What is speech recognition? - Definition from
WhatIs.com. Retrieved from
https://searchcustomerexperience.techtarget.com/definition/speech-recognition
4. Real Python. (2020, January 23). The Ultimate Guide To Speech Recognition With
Python. Retrieved from https://realpython.com/python-speech-recognition/
5. Kaysen, R. (2017, December 22). Do Security Systems Make Your Home Safer?
Retrieved from
https://www.nytimes.com/2017/12/22/realestate/do-security-systems-make-your-home-safer.html?rref=collection%2Ftimestopic%2FHome%20Security&action=click&contentCollection=timestopics&region=stream&module=stream_unit&version=latest&contentPlacement=5&pgtype=collection
6. Importance of Home Security System. (n.d.). Retrieved from
http://www.netfreedom.org/the-importance-of-home-security-system.asp
7. The Importance Of Security Alarm Systems For Your Retail Store – Security Alarms
Miami - Articles - Advanced Fire & Security - Advanced Fire Sprinklers. (n.d.). Retrieved
from
http://www.advfireonline.com/advanced-fire-and-security-articles-the-importance-of-security-alarm-systems-for-your-retail-store.html
8. The National Security situation in 2018, and outlook for 2019. (n.d.). Retrieved from
https://www.google.com/amp/s/pia.gov.ph/news/articles/1016616.amp
9. Caliwan, C. L. (2019, June 16). Total crime volume down in May 2019: PNP.
Retrieved from https://www.pna.gov.ph/articles/1072470
10. Rouse, M. (2016, December 6). What is speech recognition? - Definition from
WhatIs.com. Retrieved from
https://searchcustomerexperience.techtarget.com/definition/speech-recognition
11. Krishnan, S. (2018, October 12). Create your own Voice based application using
Python. Retrieved from
https://medium.com/@sundarstyles89/create-your-own-google-assistant-voice-based-assistant-using-python-94b577d724f9
12. Googleapis. (2020, March 24). googleapis/google-api-python-client. Retrieved from
https://github.com/googleapis/google-api-python-client
13. Googleapis. (2020, March 26). googleapis/google-cloud-python. Retrieved from
https://github.com/googleapis/google-cloud-python
14. Speech Recognition. (n.d.). Retrieved from
https://pypi.org/project/SpeechRecognition/
15. Making Calls. (n.d.). Retrieved from
https://www.twilio.com/docs/voice/make-calls?fbclid=IwAR0mxVZzHMat3JBaBc8PJGe_0DwDLdc6IZGDOpvubp_15CH5lEJKufrHFfs
16. Le Prell, C. G., & Clavier, O. H. (2016, October 12). Effects of noise on speech
recognition: Challenges for communication by service members. Retrieved from
https://www.sciencedirect.com/science/article/pii/S0378595516303513
17. EarQ. (n.d.). Retrieved from https://www.earq.com/hearing-health/decibels
TALOSIG, Nathan E.
Lot 31, Blk. 2, Dahlia Street, Phase 7-B, Greenwoods Exec.
Village, Cainta, Rizal
09437057481
nathantalosig7@gmail.com
EDUCATIONAL BACKGROUND
High School
Sacred Heart Academy of Pasig (2016 – 2020)
Grade School
Saint Gabriel International School
Angelicum College
Pre School
Saint Vincent Preschool
Saint Gabriel International School
ACHIEVEMENTS
Excellence in Conduct (Grade 2-3)
Excellence in Academics (Grade 2)
Top 3 in Reading (Grade 6)
MTAP Participant (Grade 3)
MTAP Participant (Grade 7-9)
Top 3 in Bookmark Making Contest (Grade 2)
Best Boy Scout (Grade 4)
Green Merit Card Receiver (Grade 7-1st Quarter, 3rd Quarter, 4th Quarter)
White Merit Card Receiver (Grade 7-2nd Quarter)
White Merit Card Receiver (Grade 8, Grade 9)
Green Merit Card Receiver (Grade 10- 1st Quarter, 2nd Quarter, 3rd Quarter)
Poetry Slam Contest: 3rd Place (Grade 7)
Mr. & Ms. UN Participant (Grade 7)
Mr. & Ms. UN First Runner Up (Grade 9)
SHAP Pautakan Participant (Grade 8)
SHAP Pautakan Participant (Grade 9)
Literary Cosplay Junior Winner (Grade 9)
Literary Cosplay 3rd Placer All in All (Grade 9)
Perfect Attendance (Grade 9-1st Quarter, 3rd Quarter)
Ultimate Sci-Math Quiz Bee Participant (Grade 8)
Ultimate Sci-Math Quiz Bee Participant (Grade 9)
English Quiz Bee 3rd Placer (Grade 10)
Social Studies Quiz Bee Participant (Grade 10)
2nd Honorable Mention (Grade 6)
Second Honors (Grade 5)
Pep Squad Varsity (Grade 9)
Cheerdance Competition (Grade 8-9)
Volleyball Varsity (Grade 9)
Dance Troupe (Grade 6)
Choir (Grade 5-6)
INTERESTS
Rapping
Making YouTube Videos
Being Beautiful and Smart
______________________________________________________________________
CHARACTER REFERENCE
Name: Kendra Caramat
Occupation: English Teacher
Name: Raldin Gem Frias
Occupation: Science Teacher
Name: Yvonne Cagalingan
Occupation: Filipino Teacher
Name: Katreng Solas Aporo
Occupation: Social Studies Teacher / Best Friend
“I hereby certify that the information above is true and correct.”
REYES, Louise Erlle P.
Zuri Residences, Tokyo Avenue, Block 6 Lot 8, Cabrera Road,
Barangay Dolores, Taytay Rizal
09063037954
izzyreyes1222@icloud.com / louiseekim03@gmail.com
High School
Grade School
Calvary Christian School
Pre School
Mona Lisa Academy
ACHIEVEMENTS
With Honors (Nursery – Grade 4)
Most Improved (Grade 3 – 4)
Dance Class (New Generation Workshop) (2010)
Violin and Piano Lesson (2012 - 2014)
Contestant on The Voice Academy (2015)
With Honors (2017 – 2020)
Dance Class (ACTS Academy) (2018)
Cheer Dance Competition (2017 – 2019)
Perfect Attendance Awardee (2015 – 2019)
INTERESTS
Dancing
Singing
Modeling
______________________________________________________________________
CHARACTER REFERENCE
MIRAFLOR, Robbin Cross F.
Ciudad del Carmen b1 l1 Rosario Pasig City
09202672074
robbinmiraflor@yahoo.com/crossmiraflor16@gmail.com
High School
Sacred Heart Academy of Pasig
Grade School
Sacred Heart Academy of Pasig
Pre School
Woodstock Learning Center
ACHIEVEMENTS
With Honors (Grade 10 1st quarter)
Perfect Attendance Awardee
INTERESTS
Basketball
Gaming
______________________________________________________________________
CHARACTER REFERENCE
DONATO, Asianti Crishna E.
Unit 141 Amethyst bldg. East Residences Ortigas Pasig City
09993080408
donatoasianti@gmail.com
High School
Grade School
Paintbox School for Kids
Pre School
Paintbox School for Kids
ACHIEVEMENTS
With Honors (Nursery –Grade 5)
Second Honorable Mention (Grade 6)
Piano and swimming lessons (2010-2013)
Violin lessons (2016-2018)
Cheer Dance Competition (2017;2019)
INTERESTS
Reading
Painting
Drawing
______________________________________________________________________
CHARACTER REFERENCE
DAGSIL, Dyanne Francine D.
B6L15 Star Apple St. Ph. 8D Greenwoods Exe. Vil. Taytay Rizal
2752891
dyannefrancinedagsil@gmail.com
High School
Sacred Heart Academy of Pasig (Grade 7-10)
Grade School
Sacred Heart Academy of Pasig (Grade 1-6)
Pre School
John Michael Learning Center
ACHIEVEMENTS
Perfect Attendance Awardee (Grade 6-10)
With Honors (Grade 1-10)
3rd Place in Filipino Poster Making Contest (Grade 8)
INTERESTS
Animating
Drawing
______________________________________________________________________
CHARACTER REFERENCE