Geleta Abdissa
February 2020
Abstract
Driven by innovative mobile services and applications, traffic is constantly increasing
in volume and complexity, both globally and locally in Ethiopia. Meeting this demand
in both quality and quantity requires wide radio frequency signal coverage. One means
of satisfying this requirement is proper planning together with proper network
management during the operational phase, so that coverage holes are detected and the
uncovered areas can be optimized. Measurement collection is a primary
step towards analyzing and optimizing the performance of a telecommunication service.
In this sense, this work presents a solution that helps reduce the cost and
time of network monitoring by exploiting user equipment Measurement Report (MR)
data via the Minimization of Drive Tests (MDT) functionality.
Automatic coverage hole detection based on a Decision Tree (DT) classifier is used
for rule induction to identify different coverage hole scenarios and their respective
areas, for better service delivery. The main idea is to jointly observe signal strength
and signal quality for effective coverage-hole detection. The approach classifies four
coverage scenarios in a Universal Mobile Telecommunications System (UMTS) network:
“good coverage and good quality”, “good coverage but poor quality”,
“poor coverage but good quality”, and “poor coverage and poor quality”. The last three
classes are treated as coverage holes with different severity levels.
The results show a model accuracy of 99.98%. The proposed approach classifies the
target classes and allows network performance, in terms of signal strength and quality,
to be visualized by location. All four coverage scenarios were clearly observed, and the
results are broadly consistent with the validation results from the drive test (a
difference of about 7 dB in RSCP and 1 dB in Ec/No at a cumulative distribution
function value of 18%). 77% of the coverage area was classified as being in good
coverage condition.
Keywords: UMTS, coverage hole, MDT, MR, DT.
This is to certify that the thesis prepared by Geleta Abdissa Wayessa, entitled UMTS
Network Coverage Hole Detection using Decision Tree Classifier Machine Learning
Approach and submitted in partial fulfillment of the requirements for the degree of
Master of Science in Telecommunication Engineering complies with the regulations of
the University and meets the accepted standards concerning originality and quality.
——————————– ——————————–
Chairperson Signature
——————————– ——————————–
Examiner Signature
——————————– ——————————–
Examiner Signature
——————————————————————–
Dean, School of Electrical and Computer Engineering
Declaration
I, the undersigned, declare that this thesis is my original work, has not been presented
for a degree in this or any other University, and all sources of materials used for the
thesis have been fully acknowledged.
————————–
Signature
This thesis has been submitted for examination with my approval as a University
advisor.
————————–
Signature
First and foremost, I am grateful to the almighty God for giving me this opportunity
and seeing me through all the challenges in my academic work.
My special gratitude goes to my advisor Dr.-Ing Dereje Hailemariam for his excellent
guidance and valuable comments up to the submission of my thesis. I would also like
to thank my evaluators Beneyam Haile (PhD) and Ephrem Teshale (PhD) for their
valuable feedback during the thesis progress presentations, and Mr. Yonas
Yehualaeshet for his dedicated support and guidance.
Next, special thanks go to my wife Gadise Regassa, my children Nani, Moti and Amen,
my parents, brothers, and sisters. Without their help, encouragement and effort in
all moments of my life, none of this would have been possible. I appreciate the support,
patience, and encouragement they provided throughout my studies.
I also wish to thank my colleagues and the ethio telecom staff for their support and
encouragement throughout the course work, and the staff of the Engineering
department for their support in providing data.
Last but not least, special acknowledgment goes to ethio telecom for giving
me the opportunity to pursue my master's study and for fully sponsoring it.
List of Acronyms
2G 2nd Generation
3G 3rd Generation
4G 4th Generation
AI Artificial Intelligence
CM Configuration Management
CN Core Network
CS Circuit Switched
DT Decision Tree
FN False Negatives
FP False Positives
IG Information Gain
ML Machine Learning
MR Measurement Report
NE Network Element
PM Performance Management
PS Packet Switched
RF Radio Frequency
TN True Negatives
TP True Positives
UE User Equipment
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Background
1.2 Statement of the Problem
1.3 Objectives
1.3.1 General Objective
1.3.2 Specific Objectives
1.4 Scope of the Research
1.5 Contributions
1.6 Literature Review
1.7 Methodology
1.8 Thesis Layout
2 UMTS Overview
2.1 UMTS Structure
2.2 UMTS Functional Units
2.2.1 User Equipment
2.2.2 Node-B
2.2.3 Radio Network Controller
3 Data Mining
3.1 Machine Learning Algorithms
3.1.1 Supervised Learning
3.2 Decision Tree
3.2.1 Decision Tree Structure
3.2.2 Decision Tree Types
3.2.3 Attribute Selection Criteria
3.2.4 Evaluation of Decision Trees
4 Experimentation
4.1 Area Selection
4.2 Threshold Definition
4.3 Coverage Scenario (Target Class) Definition
4.4 System Process
References
List of Tables
4.1 Geographical coordinates of available sites and the selected UMTS Node-Bs
4.2 Classification of signal coverage and quality based on RSCP and Ec/No level [5]
4.3 Summary of coverage scenarios
4.4 Input dataset sample
4.5 Data sample after prediction
4.6 Generated test result in confusion matrix form
Chapter 1
Introduction
This chapter provides the background of the thesis and describes the statement of the
problem, the objectives, and the scope and limitations. It also briefly reviews
literature related to the study and describes the methodology used and the
contributions of the thesis. Finally, the thesis structure is outlined.
1.1 Background
Since the appearance of cellular networks, traffic has been constantly increasing due to
various innovative mobile services and applications [6]. Public interest has grown
as operators offer new services. For these and other reasons, the number of unique
mobile users globally reached about 5.1 billion in 2018 and is projected to reach
5.8 billion by 2025 [7]. To fulfill subscribers’ requirements in capacity, coverage and
service quality, wide Radio Frequency (RF) signal coverage is required in terms of both
strength and quality. One means of satisfying this requirement is proper planning,
deploying sufficient Network Elements (NEs), and devising proper network management
[8].
In Ethiopia, about two decades have passed since the first official Global System for
Mobile communications (GSM) service was launched in Addis Ababa in
1999 [6]. Later, to provide better mobile data services, ethio telecom, the sole
service provider in Ethiopia, implemented enhancements such as
General Packet Radio Services (GPRS) and Enhanced Data Rate for GSM Evolution
(EDGE), and also deployed 3rd Generation (3G) and 4th Generation (4G) networks. Currently,
ethio telecom is running multi Radio Access Technology (RAT) such as GSM, Universal
1.2 Statement of the Problem
to optimize the network with coverage hole problem, efficiently detecting the coverage
hole area is the task to be performed by capturing required radio measurements.
Operators, including ethio telecom, use the traditional method of radio measurement
data collection for network coverage and quality evaluation, known as the drive
test. However, traditional drive testing has challenges and limitations
that could be improved. Firstly, it is a resource-consuming task requiring a lot of time,
specialized equipment, and the involvement of highly qualified engineers. Secondly,
it is difficult to capture coverage data from every geographical location by
manual drive testing, since most UE-generated traffic comes from indoor
locations, while drive testing is often limited to roads. Thus, the drive test method
cannot offer a complete and reliable picture of the network coverage situation at a
reasonable cost and time.
So far, some research has been conducted on network coverage hole detection using
data-mining approaches and other techniques. One study is based on the
analysis of extended Radio Link Failure (RLF) reports (event-triggered
reports) without needing field measurements [17]. However, this method can detect only
the most severe cases, which occur after a link failure. Other
situations, such as signal degradation before failure, and other information in the
periodic reports cannot be identified. Another study addressed coverage hole detection
using Inter-RAT (I-RAT) handover data. I-RAT handover is the handover
process between different technologies (e.g., from 2nd Generation (2G) to 3G). That
study considered only the heterogeneous deployment scenario, so it is not a
solution for homogeneous scenarios.
1.3 Objectives
1.3.1 General Objective
The main objective of this thesis is to detect coverage holes in a UMTS network
using a Decision Tree (DT) classifier, a supervised Machine Learning (ML) algorithm.
Received Signal Code Power (RSCP) and Energy per chip to the total received power
density (Ec/No) data collected from UEs via the Nastar tool serve as input to the
classifier.
1.3.2 Specific Objectives
• Identify and collect the key measurement parameters required for coverage hole
evaluation.
• Classify the collected parameters into their pre-defined classes using the selected
algorithm.
• Locate the detected coverage-hole areas on a Google map (by considering the
relative density of poor RF).
1.4 Scope of the Research
This thesis addresses one of the use cases of MDT: collecting UE MR data
for coverage hole detection. The scope of the thesis is to investigate a
coverage hole detection method based on MR data of the UMTS mobile network using
a DT algorithm. The thesis is limited to coverage hole detection and does not
diagnose the root cause of the problem. The captured data covers only busy-hour
traffic conditions, i.e., only data generated within certain periods of the day is
considered. Moreover, only UMTS Node-B sites in a selected area of Addis Ababa
are considered for analysis. However, the author is confident that the selected
area is representative and that the approach can be replicated across the whole ethio
telecom network with reasonable accuracy, so this factor does not significantly limit
the applicability of the research.
1.5 Contributions
This thesis can help an operator such as ethio telecom reduce extra cost and time
by providing instant data collection and processing in a more flexible and
cheaper way for network performance assessment. It can also help network
planners by providing sufficient information about the RF signal status of a specific
geographic area of the active network to trigger optimization actions. This is due to
the model's ability to consider the joint effect of RSCP and Ec/No at the
same instant, its ability to capture all locations covered by the network, including
indoor environments that drive testing cannot reach, and the accuracy of the
information extracted by the algorithm.
1.6 Literature Review
severity level of the holes into three classes. The study used individual user-based trace
data on both serving and targeted cells, which improves detection accuracy. However,
it might be somewhat complex for processing huge amounts of data.
Galindo et al. and V. Dalakas in [19, 20] studied coverage hole detection using data
collected mainly from the UEs remotely, and
used Bayesian interpolation to estimate the unknown location information from the
collected sample data. The authors in [21] followed a cognitive-tool based approach
that provides location awareness, namely Radio Environment Maps (REMs), whereas
the study by Lin et al. in [20] used a graph-theory based mobile network insight analysis
framework to detect coverage holes from both network data and user behavior data.
For the interpolation approach, the detection accuracy depends on the sample data size.
For the graph-theory based approach, the reference locations are limited to site-level
information such as site ID and location (latitude, longitude), which affects
accuracy.
Other studies addressed coverage hole detection and quality of service (QoS)
estimation under the name of MDT. These studies captured data by tracking mobile
devices and their real-time information (where, when, and what) about
mobile users to analyze the coverage status of the network. Most of the approaches use
supervised and unsupervised learning techniques to provide different solutions for this
use case. These approaches are observed in [22–24], where the authors address QoS
estimation by selecting different Key Performance Indicators (KPIs) and correlating
them with common node measurements, to establish whether a UE is satisfied with
the received QoS or not. These studies focus on the QoS verification use case of MDT
and use a simulation environment. In another study [17], extended RLF reports are
used to classify network problems into three categories, namely downlink coverage,
interference and handover problems, using DT classifiers. Based on the results, the
coverage, interference and handover problems can be differentiated by using RLF
reports containing RF measurements from both the serving and neighboring cells. The
study tried to diagnose the causes of the failures, which reduces the operator's effort
in solving the problem. However, only event-triggered data is considered, which
may not provide full information on the network.
Additionally, in [25], the authors focus on multi-layer heterogeneous networks. They
present an approach, based on regression models, that predicts QoS in
heterogeneous networks for UEs, independently of the physical location of the UE.
This work is extended in [26] by applying Principal Component
Analysis (PCA) to the input features and promoting solutions in which only a small
number of input features capture most of the variance, thereby reducing the number of
random variables. Based on these results, in [27] the same authors defined a methodology
to build a tool for smart and efficient network planning, based on QoS prediction
derived from proper data analysis of UE measurements in the network.
Generally, these studies followed efficient data collection methods, implemented ML
algorithms for adaptive learning, and used parameters that are easily handled and
applicable to coverage hole detection. However, the methods have shortcomings: some lack
generality because they detect coverage holes only in heterogeneous networks; some use
event-triggered data, which cannot provide sufficient information about coverage holes;
some are complex for huge data processing; and for some the detection accuracy depends
on the sample data size used.
In this thesis, the measurements are captured from the ethio telecom network using the
MDT framework, with the support of the Nastar tool, to investigate RF signal strength
and quality. This approach is of special importance to network service providers in
locating a particular area experiencing network problems due to a coverage hole before
a link failure occurs. In addition, the method explores four types of coverage
scenarios with the support of ML, which gives experts extra information about the
specific area and thereby simplifies the diagnosis of the problem.
1.7 Methodology
The goal of this thesis is to detect areas with poor RF conditions (coverage
holes) in different scenarios: “Good Coverage (good RSCP) and Poor
Quality (poor Ec/No)”, “Poor Coverage (poor RSCP) and Good Quality (good
Ec/No)”, and “Poor Coverage (poor RSCP) and Poor Quality (poor Ec/No)”. The
main task is therefore to implement classification techniques and identify the locations
of the UEs reporting poor RF signals. Supervised ML techniques are used, with
previously collected measurement report data from the Nastar system serving as the
training and test datasets. The DT algorithm is used for classification. The tools used
are Python, MS Excel, and MapInfo.
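As an illustration of how the coverage scenarios can be derived from one RSCP and one Ec/No value, the following minimal Python sketch labels a measurement sample against two thresholds. The threshold values and the names `RSCP_MIN_DBM`, `ECNO_MIN_DB`, `label_scenario` and `is_coverage_hole` are assumptions made for this example only; the thresholds actually used in the thesis are defined in Chapter 4 and are not reproduced here.

```python
# Hypothetical thresholds for illustration only; they are NOT the
# operator thresholds used in the thesis.
RSCP_MIN_DBM = -95.0   # assumed minimum RSCP for "good coverage"
ECNO_MIN_DB = -12.0    # assumed minimum Ec/No for "good quality"


def label_scenario(rscp_dbm, ecno_db):
    """Return one of the four coverage scenarios for a single MR sample."""
    coverage = "good coverage" if rscp_dbm >= RSCP_MIN_DBM else "poor coverage"
    quality = "good quality" if ecno_db >= ECNO_MIN_DB else "poor quality"
    return f"{coverage}, {quality}"


def is_coverage_hole(label):
    """Every scenario except 'good coverage, good quality' counts as a hole."""
    return label != "good coverage, good quality"


# Made-up (RSCP in dBm, Ec/No in dB) samples, one per scenario.
samples = [(-80.0, -8.0), (-85.0, -15.0), (-105.0, -9.0), (-110.0, -18.0)]
labels = [label_scenario(rscp, ecno) for rscp, ecno in samples]
```

Labels produced this way could serve as target classes for a supervised classifier; the rule itself is deliberately simple so that the joint use of strength and quality is visible.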
In general, the methodology used in this research includes the following steps:
• An extensive literature review was conducted to understand
coverage holes and their detection, ML algorithms, and the network parameters
used for coverage hole detection.
• The required data was captured using the Nastar tool and
preprocessed using MS Excel and Python.
• Target classes were defined based on the signal strength and quality thresholds
currently in use at ethio telecom, and the collected data was classified
using the DT algorithm.
• Predicted classes were displayed on a Google map, and the locations where they
appear most densely were considered as detected areas.
• Finally, the results were discussed using Google map visualization, plots,
graphs, and tables and published in the form of a final thesis paper.
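The step of treating the most densely affected locations as detected areas can be sketched as a simple grid aggregation. This is only one possible reading of that step, not the procedure used in the thesis: the grid size, the minimum sample count, the share threshold, and the function names below are all assumptions introduced for illustration.

```python
import math
from collections import defaultdict

# Assumed values for illustration; none of them come from the thesis.
CELL_DEG = 0.001   # grid cell size in degrees (on the order of 100 m)
MIN_SAMPLES = 3    # ignore cells with too few measurement reports
MIN_SHARE = 0.5    # flag a cell when at least half of its samples are poor


def poor_share_by_cell(samples):
    """Return {grid cell: fraction of samples predicted as coverage hole}.

    samples is an iterable of (latitude, longitude, is_hole) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [hole samples, all samples]
    for lat, lon, is_hole in samples:
        cell = (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))
        counts[cell][0] += int(is_hole)
        counts[cell][1] += 1
    return {
        cell: holes / total
        for cell, (holes, total) in counts.items()
        if total >= MIN_SAMPLES
    }


def detected_cells(samples):
    """Grid cells whose share of poor samples reaches MIN_SHARE."""
    shares = poor_share_by_cell(samples)
    return sorted(cell for cell, share in shares.items() if share >= MIN_SHARE)


# Hypothetical predictions: (latitude, longitude, predicted as coverage hole)
predictions = [
    (9.0102, 38.7604, True), (9.0103, 38.7605, True), (9.0104, 38.7606, False),
    (9.0202, 38.7704, False), (9.0203, 38.7705, False), (9.0204, 38.7706, False),
]
hole_cells = detected_cells(predictions)
```

Requiring a minimum number of samples per cell keeps a single poor report from flagging an area on its own, which is in the spirit of considering relative density rather than individual points.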
1.8 Thesis Layout
The thesis is organized into six chapters, including this one. Chapter 2 reviews the
state of the art related to this work. It gives an overview of the structure of
UMTS radio communications, UMTS functional units, and interfaces. UMTS coverage
holes and coverage hole detection techniques are also covered in that chapter. It also
discusses UMTS network performance data collection methods such as performance
management, configuration management, drive tests, crowdsourcing, and network
traces. Chapter 3 explores data mining techniques, specifically supervised learning. It
also contains an overview of the DT algorithm, its structure, attribute selection
criteria, learning, and evaluation. Chapter 4 presents the experimentation, which
comprises area selection, threshold definition, coverage scenario definition, and system
processes. Chapter 5 contains the discussion and validation of results. Finally, Chapter 6
draws the main conclusions of the work and outlines future work.
Chapter 2
UMTS Overview
[Figure: UMTS network architecture. Recoverable labels: USIM, ME, Node-B, RNC, MSC/VLR, GMSC, HLR, PLMN, PSTN, ISDN; interfaces Cu, Uu, Iub, Iur, Iu.]
2.2 UMTS Functional Units
2.2.2 Node-B
Node-B is the name given to 3G base stations. It is the logical node responsible
for radio transmission and reception to and from the UE in one or more cells. The main
function of a Node-B is to provide the physical implementation of the Uu interface and
the Iub interface. Other functions of the Node-B include spreading, scrambling,
modulation, channel coding, power control, interleaving, synchronization, and
measurement reporting [31]. It controls the data flow between the Iub and the Uu
interfaces, terminates the physical layer, extracts the Medium Access Control (MAC)
protocol data units, and transports them across the Iub interface to the RNC. It also
participates in radio resource management.
2.2.3 Radio Network Controller
The Radio Network Controller (RNC) is the central unit of the 3G Radio Access Network
(RAN). It is the governing element in the UMTS radio access network and controls the
Node-Bs that are connected to it [1]. It is also responsible for controlling the use of
all 3G radio resources by performing Radio Resource Management (RRM) procedures,
handover decisions and transmission scheduling [30, 32]. It also plays an important role
in configuration management because the radio-related parameters for the whole Radio
Network Subsystem (RNS) are stored in the RNC. For performance management, the
RNC updates performance counters, which are later used to calculate the KPIs for
the RAN. The RNC is also responsible for fault management, keeping track of the alarms
in any Node-B it controls, and it serves as the intermediate
node that connects the CN to the RAN.
The Home Location Register (HLR) is a database located in the user's home system and
used to store the master copy of the subscriber service features [30]. Such features
include information on the services allowed, roaming areas and value-added
services. The database entry is created when a new subscriber registers with the system
for network access and is maintained throughout the service period. To find a route to
the UE for an incoming service, the HLR also stores the location information of the
UE.
The function of the Serving GPRS Support Node (SGSN) is similar to that of the MSC/VLR,
except that it is used for Packet Switched (PS) services. The network part connected
through the SGSN is referred to as the PS domain.
External networks fall into two groups: CS networks and PS networks. A CS
network provides circuit-switched connections, such as the existing telephone services
of the Integrated Service Digital Network (ISDN) and the Public Switched Telephone
Network (PSTN), whereas a PS network provides packet-switched connections. The Internet
is an example of a PS network.
2.3 UMTS NE Interfaces
Interfaces are the logical connections between the UMTS NEs. All the interfaces are
open, which allows an operator to build its UTRAN and CN using equipment from
different manufacturers, thus reducing the cost of network construction. The main
open interfaces defined in the UMTS network are [30]:
• Iur interface: This interface connects two RNCs. As an open interface, it allows
soft handover between RNC equipment from different manufacturers.
2.4 UMTS Network Coverage Hole and Coverage Hole Detection Techniques
2.4.1 Introduction
In wireless mobile networks, QoS changes dynamically due to a large variety of factors.
For this reason, Mobile Network Operators (MNOs) monitor and optimize their networks
regularly in order to provide good network coverage and quality of service. Coverage
holes can have different causes, such as [4, 33]:
• New building construction and other obstructions that shadow a certain area;
• Network faults.
A coverage hole is an area where the received signal strength and/or quality level of the
serving and neighbor cells is below the level required to maintain basic services. The
presence of coverage holes is a common problem for mobile operators.
It is something a user can easily observe and it strongly influences the user experience.
Completely avoiding coverage holes in cellular networks during the
planning phase is almost impossible; therefore, coverage optimization processes are
usually required during the operational phase [19]. Without coverage provisioning, it
is difficult to talk about service or quality of service provisioning. Therefore, cellular
coverage-hole detection and enhancement is one of the basic tasks to which MNOs have to
pay attention.
In terrains with uneven morphology, signal coverage and quality are affected
by many factors such as man-made structures, non-uniform human and vehicular traffic,
hills, vegetation and the like. Other key limiting factors are distance from the cell,
inter-cell and intra-cell interference, and random background noise in the network
environment [34]. In this context, deployed and operational mobile communication
networks need regular, systematic assessment. This provides up-to-date
information that helps mobile operators tune network engineering parameters
and guarantee end-user satisfaction.
Radio measurements are needed to detect and resolve these and other problems.
Some of the key indicators or metrics for performance evaluation at the system level,
as perceived at the UE, are RSCP and Ec/No [34]. These measurements can be taken
with dedicated equipment directly at the NE (base station) or by drive tests.
As described in Section 2.4.1, coverage holes can occur due to different factors.
However, all effects are mostly reflected in two aspects: signal strength
(RSCP) and quality (Ec/No). Even when a high power level is received, communication can
be poor because of interference, which degrades communication
and can reduce transmission rates [35, 36]. Interference is typically
measured by the Ec/No of the Common Pilot Channel (CPICH), which indicates how clean
the received signal is. This means that the CPICH power level alone does not guarantee
network coverage unless the quality requirement is also fulfilled.
In 3G networks using WCDMA, mobile terminals receive signals from multiple Node-Bs.
In such a cellular system, where all the air interface connections operate on
the same carrier, the number of simultaneous users directly influences the receivers’
noise floors. During times of peak use, distant users may experience a lower
signal than normal as the interference increases. The increased interference creates a
need for additional power to maintain the link quality, which in turn causes
additional capacity and coverage degradation [37]. It is therefore possible that a
mobile terminal cannot log on to the network because several pilot signals are received
at a high level, but none of them is sufficiently dominant for the mobile to
choose. Another possible cause is overshooting cells, that means the presence
Network coverage-hole detection is the process of identifying, with different techniques
and tools, the areas where the signal strength and/or quality level is below the value
required by the specified service standards. A coverage hole is a special case of
signal degradation in a specific area due to causes such as cell overload, a
malfunctioning base station, blockage or a planning problem. One approach to detecting
a coverage hole is to monitor the signal strength and quality of the network.
MNOs follow different techniques to manage their network
coverage status. However, the techniques and tools operators use can affect the business
when they are inefficient in time and cost.
As mentioned in Section 1.6, historical data analysis using modern data
mining techniques has recently come into use to improve coverage hole detection
efficiency. Most of the approaches use supervised and unsupervised learning techniques
to provide different solutions for this use case. The detection is based on knowledge
mining to find hidden patterns in the UE MR databases. Classification is
one of the most widely used data mining techniques. Many algorithms are used for
classification, such as DTs, neural networks, Bayesian networks, and many
others. In this thesis, a supervised learning technique is used to mine knowledge
from the data using the DT algorithm. Classification is carried out on the
gathered MRs, which are key performance indicators for revealing coverage
holes. The analyzed MRs contain RSCP and Ec/No radio measurements from
the serving cell.
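To make the idea of rule induction with a DT concrete, the sketch below finds the single RSCP threshold that best separates two classes using information gain, one common attribute selection criterion for decision trees. It is a minimal, self-contained illustration under stated assumptions: the data values and class labels are made up, the function names are hypothetical, and no claim is made that this is the exact criterion or implementation used in the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the samples at value < threshold."""
    left = [lab for v, lab in zip(values, labels) if v < threshold]
    right = [lab for v, lab in zip(values, labels) if v >= threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, gain).

    Assumes at least two distinct values are present.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(((t, information_gain(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])


# Made-up RSCP samples (dBm) with made-up coverage labels.
rscp = [-110.0, -105.0, -100.0, -90.0, -85.0, -80.0]
coverage = ["poor", "poor", "poor", "good", "good", "good"]
threshold, gain = best_threshold(rscp, coverage)
```

A full decision tree repeats this search recursively over all input attributes (here RSCP and Ec/No would be the candidates), so each path from root to leaf reads as a rule on the two measurements.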
The detection is based on the joint processing of the RSCP and Ec/No measurements.
The DT-based approach is used to classify samples into the target classes (good coverage
and good quality, good coverage but poor quality, poor coverage but good quality, poor
coverage and poor quality). Joint processing is needed so that the
coverage scenario of a geographic area can be visualized easily, which speeds up the
diagnosis of the problem. For example, a poor RSCP but good Ec/No scenario can
arise when there is little interference from other sites and from the site itself.
Generally, the relation between Ec/No, RSCP and Received Signal Strength Indicator
(RSSI) [39, 40] says more about the coverage scenarios, as shown in Equation (2.1) below.
$$ E_c/N_o = \frac{\mathrm{RSCP}}{\mathrm{RSSI}} \quad \text{(dB)} \qquad (2.1) $$
From this equation, we can observe that RSSI, and everything that affects it, has a very
large impact on Ec/No. In other words, we may have good RSCP, but if RSSI is high
(because of pilot pollution, overshooting, very high speed, external interference, or
heavy traffic load), then Ec/No will be negatively affected. Conversely, when RSCP is
relatively low but RSSI is also low enough, Ec/No will still be good.
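A short worked example may help here. The sketch below evaluates the ratio of Equation (2.1) in linear units and expresses it in dB, which is equivalent to subtracting RSSI in dBm from RSCP in dBm. The input values are illustrative only and are not measurements from the thesis; the function names are hypothetical.

```python
import math


def dbm_to_mw(power_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def ecno_db(rscp_dbm, rssi_dbm):
    """Ec/No in dB, computed as the linear ratio RSCP/RSSI of Equation (2.1)."""
    return 10 * math.log10(dbm_to_mw(rscp_dbm) / dbm_to_mw(rssi_dbm))


# Illustrative values only, not measurements from the thesis.
# Strong pilot, but a lot of total received power (interference, load).
strong_but_interfered = ecno_db(rscp_dbm=-75.0, rssi_dbm=-60.0)
# Weak pilot, but very little other power on the carrier.
weak_but_clean = ecno_db(rscp_dbm=-100.0, rssi_dbm=-94.0)
```

With these inputs the first case yields about -15 dB and the second about -6 dB, so the sample with the stronger RSCP ends up with the worse Ec/No, which is exactly the situation the joint classification is meant to expose.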
2.5 UMTS Network Performance Data Collection
Because of the rapid increase in the use of wireless devices and the continued expansion
of cellular networks, effective control of cell coverage has become more
important to ensure QoS provisioning. The level of network coverage provided to
the various parts of a region under consideration has to be measured on a regular basis.
This regular check is meant to find any coverage holes produced by construction,
planning, network failure and other factors. Measurement collection is a primary step
towards analyzing and optimizing the performance of a telecommunication service. It is
therefore important to utilize network-based or user-based measurement information.
CM provides the operator with the ability to assure correct and effective operation
of the 3G network as it evolves [8]. CM actions have the objective of controlling and
monitoring the actual configuration of NEs and network resources. The system modification
service component and the system monitoring service component are two of the
CM service components. The first is an action performed to introduce new or
modified data into the system due to optimization or configuration, for example a software
upgrade. The second provides the operator with the ability to receive
reports on the configuration of the entire network, or parts of it, from managed NEs.
In terms of CM functions, these encompass assisting the operator in making the most
2.5 UMTS Network Performance Data Collection 19
timely and accurate changes, ensuring that CM actions do not result in secondary
effects, protecting traffic from the effects of CM actions, and devising mechanisms
to overcome data inconsistencies [43].
Drive tests are a method of measuring and assessing the coverage, capacity, and QoS of
a mobile radio network by using moving vehicles. They are one of the methods with which
user-side measurements can be collected [32]. The real performance of the cellular
network is usually viewed from the perspective of mobile subscribers, and this is the
reason why operators use drive tests to assess the coverage and quality of their
networks, as the tests give results from the field. Drive tests provide an accurate real-world
capture of the RF environment under a particular set of network and environmental
conditions [32].
The most important reasons for performing drive tests in the network are the
optimization of capacity, coverage and mobility, and QoS verification [45]. By measuring
what a subscriber is expected to experience in a particular location, MNOs can then
plan corrective actions for network performance improvement. Drive testing is, however,
costly because it requires skilled engineers/surveyors and vehicles. Moreover, it is
constrained in both time and space and rarely covers in-building areas, which greatly
limits the validity of the data because real-life scenarios are only partially captured.
Mobile Crowdsourcing (MCS) refers to a group of people who voluntarily collect and
share data using widely available mobile devices. Mobile devices are equipped with
abundant sensors (e.g., Global Positioning System (GPS), accelerometer, camera, etc.)
and powerful computing capabilities, which allow them to collect various types of data
such as image/voice/video, location, and ambient information [46]. This trend enables
individuals to sense, collect, process and distribute data around people at any time
and place. Moreover, advances in communication technologies such as Wi-Fi and
Bluetooth offer mobile devices direct connectivity to the Internet to exchange data at
high speed anytime and anywhere.
With measurement collection available on the user side, operators are
using this powerful tool to analyze their networks’ performance. As an example,
radio coverage and quality can be monitored in this way by using subscribers’
smartphones, which collect information on the signal level and quality that they are
receiving and send it to the operator. Crowdsourcing can thus
be used as a method of monitoring network performance that can minimize drive
tests and broaden the areas being monitored in real time. In spite of the great
benefits that MCS offers, it still faces several serious problems in terms of security,
privacy, and trust. It may also increase costs, as subscribers are sometimes
paid to provide this information. There are also other open issues
regarding this technology, among them how the network should be built and structured
to support MCS traffic and how to avoid overloading the processing capacity of subscribers’
devices [47].
2.5.5 Traces
Traces are a means of network performance data acquisition resulting from communication
between the RNC and the network elements attached to it, including the user equipment.
When connected to a Node-B of a certain RNC, a UE constantly exchanges
data with the network to report its communication conditions. All data from
each UE are collected and logged at the RNC and represent a powerful resource for
analyzing and monitoring network performance. These data are called traces, and they
are protocol events resulting from communication between the RNC and the network elements
attached to it: UEs (via Node-B), Node-Bs, other RNCs and the CN. All these events can
be collected at the RNC through tools developed by vendors, by activating the trace
functionality in the RNC [43].
Traces generate a huge amount of information, providing useful data on the quality of
the communications. As an example, they contain measurements made by UEs relative
to the quality of the signal that they are receiving with the storage of RSCP or Ec/No
values [43].
Vendors have developed tools to process trace information for network monitoring.
Examples are the General Performance Event Handler (GPEH)
system from Ericsson and Call History Recording (CHR) from Huawei [48, 49]. Huawei
CHR is a trace logging feature in GSM, UMTS and LTE that collects call and cell
information, radio measurements and messages for all calls in the network and sends them
to the Nastar server. In this thesis, the Nastar tool is used for data collection, and a few details
about the tool are provided below.
Nastar: The Nastar performance analysis system is an intelligent and integrated tool
developed by Huawei Technologies. It allows locating and analyzing wireless network
quality problems and is applicable to GSM, GPRS, EDGE, UMTS, and LTE networks
[2]. It supports multi-user operation and various wireless performance analyses,
and it is a basic support platform for further analyzing and locating wireless network
problems. Nastar is based on a server-client architecture and includes a set of
functions such as service geographic observation, cell and terminal performance analysis,
as well as coverage, neighboring and pilot pollution analysis. Figure 2.2 shows the
location of the Nastar tool in the network.
With correct and precise analysis of trace information, many valuable possibilities
are created in network performance management, process automation (e.g.,
SON, MDT), QoS improvement and cost reduction [50]. All these factors are
of interest to users and, even more importantly, to operators, for whom such mechanisms
make it easier to manage the network and reduce unnecessary costs.
Trace data serve different use cases in telecommunication network performance
assessment. As per [51], trace data can be used, for example, to check
radio coverage in a certain area, to check interoperability between UEs from different vendors,
to check the QoS profile of a subscriber after a complaint, to check for a malfunctioning
mobile station, or to test new features. This thesis focuses on one use case of trace data,
namely checking the radio coverage status of a UMTS network. To collect such measurements,
operators in the NGMN alliance proposed a standardized solution in 2011 called MDT
(i.e., in the 3GPP Release 10 specification for LTE and UMTS networks) [52]. A key feature of
MDT is that a mobile device reports its location information along with performance
measurements to a Trace Collection Entity (TCE) via the RNC, thereby giving
a much more fine-grained view of a cell’s performance [3]. MDT addresses the issue
that drive tests often have to be executed in order to monitor and assess mobile network
performance, and does so in an efficient way. Some details on MDT (architecture,
reporting, location estimation, and measurement) are presented in the following subsections.
MDT Architecture
benefit of using control plane architecture is that it allows RAN nodes (i.e., an eNB or
RNC) to include additional data in UE measurements [3]. The measurements for MDT
can be configured either by using management-based or signaling-based configuration
procedures [4]. In signaling-based MDT, UE selection is performed in the Operation
and Maintenance (OAM) system based on a permanent UE identity that uniquely identifies
the UE in the network, such as the International Mobile Subscriber Identity (IMSI) or the
International Mobile Equipment Identity and Software Version (IMEISV) [3]. In
management-based MDT, trace functionality is used to configure a specific RAN node
for collecting measurements for a certain area.
Since this thesis is mainly about processing and analyzing the data collected from a set
of UEs, the following sections focus on applications that employ a management-based
MDT procedure. The MDT data collection is initiated and controlled by the OAM
system; the UE and RAN then collect the data and send it to the TCE, which
stores the data so that it can be used for post-processing analysis [4]. An illustration of the
management-based MDT architecture, based on [3], is depicted in Figure 2.3.
MDT Reporting
The overall MDT operation aims at delivering MDT reports to a data repository:
the collected measurements are transferred to a specified file server called the
TCE. There are two MDT operational modes for measurement collection,
namely immediate MDT and logged MDT [3, 4]. Immediate MDT uses the
normal Radio Resource Control (RRC) measurement configuration and reporting
principles, with the exception that the reported measurement data may include the
UE location information at the time the measurement results are obtained. Logged
MDT is a new mechanism for idle-state UEs to store the radio measurement results
and report them later, the next time a connection is set up [3]. Whenever a RAN
node receives RRC MRs from UEs, it stores the measurements in a trace record
together with additional MDT information such as timestamp, trace parameters,
and vendor-specific data, and then forwards the record to the TCE [3]. The procedure used for
immediate MDT is shown in Figure 2.4.
The reporting mechanisms of logged MDT and immediate MDT differ
slightly. In logged MDT report delivery, the RAN node
or RNC is not aware of the trace relevant configuration (the MDT context is released after
RRC connection release) before the MDT data is transferred to the TCE. Hence, trace
relevant parameters (trace reference, trace recording session, TCE ID), memorized by
the UE, are attached by the UE to the MDT log and reported back to the network.
The trace reference and the trace recording session reference are used to correlate the data at
the TCE that belong to the same trace (MDT) session [3, 4]. The procedure used for
logged MDT is shown in Figure 2.5.
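The correlation step at the TCE can be pictured as grouping incoming reports by the pair (trace reference, trace recording session reference). The Python sketch below is only an illustration under assumed names: the dictionary keys and the sample reports are hypothetical and do not reflect an actual TCE file format.

```python
from collections import defaultdict


def group_by_trace_session(reports):
    """Group MDT reports that belong to the same trace (MDT) session."""
    sessions = defaultdict(list)
    for report in reports:
        # Hypothetical field names for the two correlation parameters.
        key = (report["trace_reference"],
               report["trace_recording_session_reference"])
        sessions[key].append(report)
    return dict(sessions)


# Hypothetical reports received at the TCE.
reports = [
    {"trace_reference": "T1", "trace_recording_session_reference": 1, "rscp_dbm": -97},
    {"trace_reference": "T1", "trace_recording_session_reference": 1, "rscp_dbm": -88},
    {"trace_reference": "T2", "trace_recording_session_reference": 1, "rscp_dbm": -101},
]
sessions = group_by_trace_session(reports)
print(len(sessions))  # 2
```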
to the serving cell base station), the positioning with RF fingerprint is not guaranteed
in all locations [52].
In the best case, detailed location information is obtained from the GNSS if the satellite
positioning has been activated by another function or application. If detailed location
information is obtained from GNSS, then the MR shall consist of latitude and longitude,
and a GNSS timestamp. With immediate MDT reporting, the UE does not send time
stamp information as it does in the case of logged MDT. Instead, the RNC/eNB is
responsible for adding the time stamp to the received MDT MRs when saving them to
the trace records. However, if GNSS was used, the GNSS time information is included
as a way to validate the detailed location information [53]. Even when active,
GNSS may not be able to provide position information continuously when the signals
received from the satellites are poor, for example indoors or at some locations in urban areas.
The MDT measurement and GNSS functions are normally independent,
and therefore the times at which the measurement results and the GNSS coordinates
become available can be random. The MDT function at the UE shall tag the measured
result with the latest location information. A certain location sample shall be used
only once in the GNSS report/log. The next accurate location information shall be
included first when new coordinates are provided by the GNSS. The cell identification
information consists of the serving cell CGI or Physical Cell Identification (PCI) of the
detected neighboring cells. The measurements for both the serving and neighboring
cells include the common pilot channel RSCP and Ec/No for a UTRAN system [53].
This thesis focused on the analysis of these measurement logs (RSCP and Ec/No) in
order to detect where the coverage hole is for the optimization input purposes. Hence,
each parameter is discussed in brief as follows.
MDT Measurements
The MDT procedure allows operators to collect radio measurements, such as received
signal strength and quality, together with UE location information and a time stamp. In
immediate MDT, the measurements can be either periodic or triggered by a network
event, whereas in logged MDT the measurements are collected
periodically [53]. The MDT measurements consist of the location information with
longitude and latitude (if available); a time stamp from either the UE or the RNC/eNB,
depending on the MDT mode; cell identification data; and the radio measurements for the
serving cell and the detected intra-frequency, inter-frequency and inter-RAT neighboring
cells.
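The content of one MDT measurement, as listed above, can be sketched as a small record type. This is a minimal Python illustration; the class, field and function names are assumptions made for the example, not the structure of an actual trace record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MdtRecord:
    """One MDT measurement sample (hypothetical field names)."""
    cell_id: str
    rscp_dbm: float
    ecno_db: float
    timestamp: float
    timestamp_source: str               # "UE" or "RNC/eNB"
    latitude: Optional[float] = None    # present only if location is available
    longitude: Optional[float] = None

    def has_location(self) -> bool:
        return self.latitude is not None and self.longitude is not None


def timestamp_source(mdt_mode: str) -> str:
    """Who provides the time stamp, depending on the MDT mode."""
    if mdt_mode == "logged":
        return "UE"
    if mdt_mode == "immediate":
        return "RNC/eNB"
    raise ValueError(f"unknown MDT mode: {mdt_mode}")


rec = MdtRecord(cell_id="C1", rscp_dbm=-97.0, ecno_db=-14.0,
                timestamp=0.0, timestamp_source=timestamp_source("immediate"))
print(rec.timestamp_source, rec.has_location())
```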
There are different mechanisms for estimating user location. The most basic location
information is the serving cell CGI; in the best case, detailed location information (latitude,
longitude and a GNSS time stamp) is obtained from the GNSS, and the time stamp is handled
differently in immediate and logged MDT, as described above [53]. The radio measurements
for both the serving and neighboring cells include the common pilot channel RSCP and Ec/No for
a UTRAN system [53]. This thesis focuses on the analysis of these measurement logs
(RSCP and Ec/No) in order to detect where coverage holes are, as input for
optimization. Each parameter is discussed briefly as follows.
RSCP: the received signal code power measured by the receiver of a particular UE. It is
used as an indication of received signal strength. It is measured on the CPICH and
can be obtained in both active and idle mode. RSCP is measured in dBm and has
a range of -115 to -40 with a resolution of 1 dB [40]. Handover, cell selection
and reselection in the network rely heavily on the RSCP readings reported by the UE,
which keeps measuring RSCP from the serving cell as well as the neighboring cells.
RSCP provides information about the signal power but not the signal quality.
Ec/No: the ratio of the received energy per chip of the CPICH to the total received power
spectral density in the band. It is a radio quality measure for assessing
the level of interference generated by other cells [34]. Ec corresponds to the RSCP
value and No is the total received power, including thermal noise and interference. It
is measured in dB, as it is a relative value, and has a range of -24 dB to 0 dB with a
resolution of 1 dB. The higher this value, the better the signal of a cell can be distinguished
from the overall noise. The value is negative because the RSCP is smaller than the total
received power. This value can be used to compare different cells on the same carrier
and to take handover or cell reselection decisions.
RSSI: the overall power (dBm) received by the UE over the whole channel, comprising the
power of the serving cell, interference, and noise [34, 40]. RSSI helps in
determining noise and interference information. UTRA carrier RSSI is given with a
resolution of 1 dB over the range [-94, ..., -32] dBm. Ec/No therefore
depends on both RSCP and RSSI [39] and can be calculated using Equation (2.1).
Chapter 3
Data Mining
Data mining is the science and technology of exploring data in order to discover
previously unknown patterns and is a part of the overall process of knowledge discovery
from databases (KDD) [54]. Two approaches, ML and rules-based systems,
are widely used to make inferences from data, and each has its strengths
and weaknesses. Rules-based systems still have their place in exploring data. They
are a simple kind of Artificial Intelligence (AI) that uses
IF-THEN statements to guide a computer to a conclusion or recommendation,
with threshold values tailored to the evaluation scenario. Rules-based systems are
typically built from the combined knowledge of human experts in the problem
domain. These domain experts define all the steps to be taken to make a decision and
how to handle any special cases. This expert knowledge is then incorporated
into the system [55].
Writing and implementing rules in a rules-based system is relatively easy. If we know about
the situation of interest, we can create rules based on simple IF-THEN statements
[55, 56]. However, rules-based systems are deterministic. Not having the right rule can
result in false positives and false negatives, so the rule set can become bulky and
difficult to grasp over time as more and more exceptions and rule changes are added.
Another challenge for rules-based systems arises when the data and scenarios
change faster than the rules can be updated. Such systems are always limited by the size of
their underlying rule base (knowledge base) and are said to have rigid intelligence. For
these reasons, rule-based systems can only implement narrow AI at best.
The maintenance of these systems is also time-consuming and expensive. As such,
rules-based systems are not very useful for solving problems in complex domains, but only in
3.1 Machine Learning Algorithms 29
simple domains. They are designed to perform conservative detection and
therefore lack the ability to learn from experience. They cannot automatically update
their knowledge base based on new information, and they always stick to the rules [57].
ML is an alternative approach that can help to address some of the issues with rules-based
methods. ML methods typically take only the outcomes from the experts rather
than attempting to fully emulate the decision process of an expert or best practice. For
ML, exactly how the experts arrived at their decision is not important; knowing what their
decision was is sufficient. Focusing on the outcomes rather than the entire decision-making
process can make machine learning more flexible and less susceptible to some
of the problems encountered with rules-based systems [55, 56].
Unlike rules-based methods, ML is probabilistic and uses statistical models rather than
deterministic rules. In ML approaches, the outputs (identified from historic outcome
data) are described as a function of a combination of input variables and other
parameters. The input variables can be numerous, and some models use hundreds of
inputs or features. A learning system is in principle unlimited in its ability to simulate
intelligence and create its models. It is said to have adaptive intelligence, in which
existing knowledge can be changed or discarded and new knowledge can be acquired.
These qualities make learning systems very different from rule-based systems.
A machine learning model is trained on historic data whose outcomes have already been
identified or labeled by human experts. As it is more amenable to continuous adaptation and
improvement through data preparation, algorithm selection, and algorithm parameter
tuning, it tends to perform better in the long run. Machine learning algorithms tend to be one
step away from human involvement, in favor of optimization by computers [55].
Most data mining techniques are based on inductive learning, where a model is
constructed explicitly or implicitly by generalizing from a sufficient number of training
examples. The primary assumption of the inductive approach is that the trained model
is applicable to future, unseen examples. ML is an important field for data mining
because the algorithms used in data mining methods are drawn from
the ML field [58].
3.2 Decision Tree 31
DTs are a group of supervised learning methods used in data mining and ML
[58]. They are flow-chart-like models in which each internal node represents a
test on an attribute, each leaf node represents a response, and each branch represents an
outcome of the test [65]. As with an expert system, the outcome of the DT algorithm
can be regarded as a sequence of IF-THEN statements, with the difference that
these rules are determined automatically. DTs are important in data mining for
various reasons, the most important being that they provide accurate results
and that the tree concept is easy to understand compared to other classification methods
[66].
Generally, DTs have many appealing features compared with other classifiers: they can be
visualized graphically, so they are simple to understand and interpret; they require very
little data preparation, whereas other techniques often require data normalization or
standardization of features, the creation of dummy variables and the removal of blank
values; they can handle both categorical and numerical data, whereas other techniques are
often specialized for only one type of variable; they can handle multi-output problems; they
use a white box model, i.e., the explanation for a condition can be given; they can be directly
converted to a set of simple if-then rules; they are robust and work well on noisy data; and
they perform well on large data sets [61, 64]. However, as with many classifiers, DTs depend
on the coverage of the training data. They are sensitive to the specific data on
which they are trained: if the training data is changed, the resulting decision tree
can be different and, as a result, the predictions can be different. Moreover, they are
also susceptible to over-fitting.
DTs consist of a root node, internal nodes and leaf (terminating) nodes that are
connected by branches, just like any other tree structure [58, 67]. The DT structure is shown
in Figure 3.1 and brief descriptions are presented below.
• Root node: It is the starting point of the tree, with no incoming edges
and zero or more outgoing edges. Each outgoing edge of the root node leads to an internal
node or a leaf node. The root node is usually an attribute of the DT model.
• Internal node: Appears after a root node or an internal node and is followed by
either internal nodes or leaf nodes. It has only one incoming edge and at least
two outgoing edges. Internal nodes are always an attribute of the DT model.
• Leaf node: These are the bottom-most elements of the tree and normally represent
classes of the DT model. A leaf node has one incoming edge and no outgoing
edges, and holds a class label.
• Depth: It is the maximal length of a path from the root node to a leaf node.
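The node types and the depth defined above can be sketched with a small recursive data structure. This is a minimal Python illustration; the class name, the example attributes and labels are assumptions, and depth is counted here as the number of edges on the longest root-to-leaf path.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    attribute: Optional[str] = None   # tested attribute (root and internal nodes)
    label: Optional[str] = None       # class label (leaf nodes)
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def depth(node: Node) -> int:
    """Maximal number of edges on a path from this node down to a leaf."""
    if node.is_leaf():
        return 0
    return 1 + max(depth(child) for child in node.children)


# Hypothetical tree: the root tests RSCP, one internal node tests Ec/No.
tree = Node(attribute="RSCP", children=[
    Node(label="poor coverage"),
    Node(attribute="Ec/No", children=[
        Node(label="good quality"),
        Node(label="poor quality"),
    ]),
])
print(depth(tree))  # 2
```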
Decision trees used in data mining are mainly of two types: classification
trees and regression trees. Classification trees are used when the predicted outcome is
the (discrete) class to which the data belongs, and are designed for data that have a
finite number of class values. The attributes can take numerical or categorical values.
Regression trees are used when the values at the resulting leaf nodes of the tree are continuous
[58].
Attribute selection is the idea of how to determine the best attribute that splits the
data efficiently at each stage starting from the root. It is one of the fundamental
properties of building a DT by selecting the attribute that is most useful for classifying
the training data, which gives the maximum degree of discrimination. The selection of
the attribute can affect the entire DT as it will have an impact on the efficiency and
accuracy of the built tree [68]. The main idea is based on the purity of the dataset in
most of the cases. This means the node that will be tested should be split into leaf or
internal nodes that would be as pure as possible. The aim of purity is to partition the
data instances in training data so that the partitioned group would either have all or
most of the data instances in the same class category so that the entropy measure will
be low [67].
To build a DT, the attribute to test must be identified at each level, starting from the root
node. To do that, attribute selection measures are used to select the attributes
that partition the tuples into distinct classes. The popular measures/metrics used for
attribute selection are Information Gain, Gain Ratio, Gini Index and the Chi-Squared
criterion [65, 67]. In this work, information gain, which is a function of entropy, is used
to measure how well an attribute splits the data.
Entropy
Entropy is the measure that tries to calculate the average amount of information
contained in each message received [67]. In ML terms, entropy tries to find the
most valuable attribute that would be beneficial for a model to be learned. It is a
weighted sum of the logs of the probabilities of each possible outcome when we make a
random selection from a set. It controls how a DT decides to split the data and draws
its boundaries. The weights used in the sum are the probabilities of the outcomes
themselves so that outcomes with high probabilities contribute more to the overall
entropy of a set than outcomes with low probabilities. Mathematically, entropy can be
defined as Equation (3.1).
\[ \mathrm{Entropy} = \mathrm{Info}(D) = -\sum_{i=1}^{m} p_{(t=i)} \log_2 p_{(t=i)} \tag{3.1} \]
the negative numbers resulted by the log function to positive ones. Equation (3.1) is a
measure of the impurity or heterogeneity of a set. If the samples are homogeneous, i.e., all
elements belong to the same class, the entropy is 0; if the samples are equally divided
between two classes, the entropy reaches its maximum, which is 1 [68].
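The two boundary cases above can be checked numerically. The following Python sketch computes the entropy of a list of class labels with base-2 logarithms; the function name and the toy labels are illustrative assumptions. With m equally likely classes the same function returns log2(m), so the maximum of 1 applies to the two-class case.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (in bits) of the class distribution in `labels`."""
    total = len(labels)
    probabilities = [count / total for count in Counter(labels).values()]
    return sum(-p * log2(p) for p in probabilities)


print(entropy(["hole", "hole", "hole", "hole"]))  # homogeneous set
print(entropy(["hole", "hole", "ok", "ok"]))      # equally divided, two classes
```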
Information Gain
\[ \mathrm{Info}_A(D) = \sum_{j=1}^{V} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j) \tag{3.3} \]
Where j ranges over the V possible values that attribute A can take, D is the whole
sample collection, D_j is the subset of D for which attribute
A has value j, and |D_j|/|D| is the weight of the j-th partition.
Information gain is defined as the difference between the original information requirement
(i.e., based on the classes) and the new requirement (i.e., obtained after
partitioning on A). Hence, using Equations (3.2) and (3.3), we can formally
define the information gain obtained by splitting the dataset on feature A as in Equation
(3.4).
\[ \mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D) \tag{3.4} \]
Where, the first term in the equation is the entire entropy before partitioning the
dataset and the second term of the equation is the entropy after splitting the instances
using attribute A. This means that the information gain Gain(A) is the expected
reduction of entropy after knowing the value of attribute A [61].
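As a worked illustration of Equations (3.3) and (3.4), the Python sketch below computes the information gain of a categorical attribute on a toy dataset. The function names, the attribute names and the four sample rows are hypothetical and chosen only so that one attribute separates the classes perfectly and the other not at all.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Info(D): entropy of the class labels, in bits."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Gain(A) = Info(D) - Info_A(D) for a categorical attribute."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attribute]].append(row[target])
    # Info_A(D): entropy of each partition, weighted by |D_j| / |D|.
    info_a = sum(len(part) / len(rows) * entropy(part)
                 for part in partitions.values())
    return entropy([row[target] for row in rows]) - info_a


# Hypothetical toy data.
rows = [
    {"rscp_level": "low",  "band": "A", "label": "hole"},
    {"rscp_level": "low",  "band": "B", "label": "hole"},
    {"rscp_level": "high", "band": "A", "label": "ok"},
    {"rscp_level": "high", "band": "B", "label": "ok"},
]
print(information_gain(rows, "rscp_level", "label"))  # perfectly separating
print(information_gain(rows, "band", "label"))        # uninformative
```

In this toy case the attribute with the larger gain, `rscp_level`, would be chosen for the split.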
Gini Index
Gini Index is a measure of inequality in the sample. It has a value between 0 and
1. Gini index of value 0 means samples are perfectly homogeneous (same class) and
all elements are similar, whereas the Gini index of value 1 means maximal inequality
among elements. Gini Index is an attribute selection measure used by the Classification
and Regression Tree (CART) DT algorithm [64]. As in Equation (3.5), it measures
the impurity of D, a data partition or set of training tuples [65]:
\[ \mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^2 \tag{3.5} \]
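Equation (3.5) can be evaluated directly from class counts, as in the minimal Python sketch below; the function name and the toy labels are illustrative assumptions. For m equally likely classes the expression evaluates to 1 − 1/m, so it is 0.5 for two classes and approaches 1 only as the number of classes grows.

```python
from collections import Counter


def gini(labels):
    """Gini index of the class distribution in `labels` (Equation 3.5)."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


print(gini(["hole", "hole", "hole", "hole"]))  # homogeneous set
print(gini(["hole", "hole", "ok", "ok"]))      # two equally likely classes
```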
Algorithm learning is the systematic approach for learning a classification model given
a training set. To be able to conclude new predictions from the existing datasets, DT
                                   True Class
                                   Positive               Negative
Predicted Class     Positive      True Positive (TP)     False Positive (FP)
                    Negative      False Negative (FN)    True Negative (TN)
After data preprocessing, implementing a model and obtaining output in
the form of a class, the next step is to find out how effective the model is, based on some
metric, using test datasets. Different performance metrics are used to evaluate different
ML algorithms. This is achieved by using measures and metrics that estimate the
overall performance of the inducer’s model for future use [70]. Well-known evaluation
metrics for classifier performance are the confusion matrix, accuracy, precision,
recall, F-measure and the Receiver Operating Characteristics (ROC) curve [71].
Confusion Matrix: The confusion matrix contains the numbers about actual and
predicted class of the model used. During testing the raw data produced by classification
scheme are the number of the correct and incorrect classifications from each class. The
basic performance of a classifier can be indicated or evaluated by comparing these
predicted labels against the true labels of instances as shown in Table 3.1.
The diagonal cells of the confusion matrices show the outcome of a true-positive test
which indicates the likelihood that a sample is correctly labeled according to the class
it belongs to. A false-positive test indicates the likelihood that samples are labeled
incorrectly and have been assigned to the wrong classes. To evaluate the performance of
the classification model, some terms which are derived from two-class labeled (positive
and negative) data are important [70, 71].
True Positives (TP): These refer to the positive instances that were correctly labeled
as positives by the classifier.
True Negatives (TN): These refer to the negative instances that were correctly
labeled as negatives by the classifier.
False Positives (FP): These are the negative instances that were incorrectly labeled
as positive by the classifier.
False Negatives (FN): These are the positive instances that were mislabeled as
negative by the classifier.
A confusion matrix provides the information required to determine how well a classification
model performs; however, summarizing this information into a single
number makes it easier to compare the relative performance of different
models. This can be done using evaluation metrics such as accuracy, precision, recall and
F-measure, which are computed in the following way.
Accuracy: It is the ratio of all correctly classified instances to the total number
of instances and is given in Equation (3.6).
Recall (positive sensitivity value): It represents the ratio of the number of correctly
classified positives to the number of all positive instances. It is also called the positive
sensitivity value and can be calculated by Equation (3.8).
F-measure: It is a model metric that can be used when we want to seek a balance
between precision and recall (see Equation (3.9)).
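The metrics named above can be computed from the four confusion-matrix counts. The Python sketch below assumes the standard textbook definitions of accuracy, precision, recall and F-measure (the harmonic mean of precision and recall) and non-zero denominators; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard two-class metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # correct positives among predicted positives
    recall = tp / (tp + fn)      # correct positives among actual positives
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical counts: 40 TP, 10 FP, 10 FN, 40 TN.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```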
Chapter 4
Experimentation
As indicated in the objectives in Chapter 1, a data mining technique is used
to build a model that detects coverage holes in a UMTS mobile network using UE MR
data as input. The detection is performed by implementing
the selected algorithm, DT, for knowledge mining. Such knowledge allows for
better operation of the whole network by improving monitoring efficiency, specifically
through early detection of hidden risks related to the joint effect of signal strength and
quality, by classifying measurements into four coverage classes. To implement this task,
the study area, thresholds, coverage scenarios and system process must be defined,
as shown in the following sections.
The test was carried out on 32 different mobile sites comprising 267 cells that are
distributed over an area of 3000 m × 3000 m with a geographic coordinate range of
38.7076451E - 38.7348105E in longitude and 9.0404363N - 9.067913N in latitude.
The selected UMTS base stations are illustrated in Figure 4.1, which
displays the distribution of all Addis Ababa UMTS Node-Bs and the selected region,
with the number of Node-Bs as well as their distribution on Google Maps. Based on
the selected area, information such as site ID, cell ID, RNC ID and location information
(latitude and longitude) is captured. The UE MR parameters used in this work, RSCP and
Ec/No, are also captured with the support of this geographic information.
Table 4.1 Geographical coordinates of available sites and the selected UMTS Node-Bs

                        UMTS Node-B sites in Addis Ababa   Selected Node-Bs
Longitude               38.645469 – 38.940409              38.7076451 – 38.7348105
Latitude                8.818630 – 9.0943901               9.0404363 – 9.067913
Total number of RNCs    5 (RNC101 – RNC105)                1 (RNC 103)
Total number of sites   744                                32
Total number of cells   7159                               267
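As an illustration of how the selected region can be used to restrict MR samples, the following minimal sketch filters points by the bounding box of Table 4.1 and estimates the extent of the box. The class and function names are hypothetical, and the extent uses an equirectangular approximation with a mean Earth radius, which is an assumption rather than the method used in this work. Under that approximation each side comes out at roughly 3 km, in line with the stated 3000 m × 3000 m area.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Rectangular region in decimal degrees (hypothetical helper)."""
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float

    def contains(self, lon: float, lat: float) -> bool:
        """Return True if the point lies inside the box (edges included)."""
        return (self.lon_min <= lon <= self.lon_max
                and self.lat_min <= lat <= self.lat_max)


# Coordinates of the selected Node-B region, taken from Table 4.1.
STUDY_AREA = BoundingBox(38.7076451, 38.7348105, 9.0404363, 9.067913)


def approximate_extent_m(box: BoundingBox) -> tuple[float, float]:
    """Approximate (east-west, north-south) extent in metres.

    Assumption: equirectangular approximation with a mean Earth radius.
    """
    earth_radius_m = 6_371_000.0
    mid_lat = math.radians((box.lat_min + box.lat_max) / 2)
    width = math.radians(box.lon_max - box.lon_min) * math.cos(mid_lat) * earth_radius_m
    height = math.radians(box.lat_max - box.lat_min) * earth_radius_m
    return width, height


# Usage: keep only (lon, lat) samples inside the study area.
# The two sample points are made up for illustration.
samples = [(38.72, 9.05), (38.65, 8.90)]
inside = [p for p in samples if STUDY_AREA.contains(*p)]
```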
Mobile services are considered to be available when the radio signal level is above
the minimum thresholds that allow their use. However, the thresholds may vary according
to the requirements of mobile operators, vendors, services, or technology [72–77]. For
example, in [74, 78], WCDMA coverage is classified as poor when the signal strength is
below -95 dBm and the signal quality is below -15 dB. Similarly, in [75, 77], different
thresholds are defined for 19 European countries to decide whether outdoor coverage
exists (covered / not covered). As shown in Table 4.2, there are five coverage/quality
levels (Poor, Fair, Good, Very good, and Excellent) [5]. In this thesis, the thresholds
follow the current ethio telecom practice for coverage assessment, namely -95 dBm for
RSCP and -13 dB for Ec/No [5, 33].
Table 4.2 Classification of signal coverage and quality based on RSCP and Ec/No level [5]

Levels of UMTS   Coverage: RSCP (dBm)   Quality: Ec/No (dB)
Poor             −115 < RSCP < −95      −24 < Ec/No < −13
Fair             −95 ≤ RSCP < −85       −13 ≤ Ec/No < −10
Good             −85 ≤ RSCP < −75       −10 ≤ Ec/No < −8
Very good        −75 ≤ RSCP < −65       −8 ≤ Ec/No ≤ −5
Excellent        −65 ≤ RSCP < Max       −5 < Ec/No ≤ Max

4.3 Coverage Scenario (Target Class) Definition
This section defines the scenarios used for network coverage status classification.
As per [79], downlink CPICH coverage has to be verified by considering not only whether
the RSCP of the pilot channel is sufficient, but also by estimating the level of
interference generated by the other cells. Such interference is typically quantified by
the Ec/No of the CPICH, where Ec is the received energy per chip (a measure of power)
of the CPICH and No is the cumulative sum of own-cell interference, surrounding-cell
interference, and noise density [39]. The Ec/No value effectively indicates how much of
the received signal can be used at a given location, that is, how clean the received
signal is. Different works in the literature grade network coverage status into classes
by considering RSCP and Ec/No separately, meaning that they do not use the joint effect
of the two parameters at the same instant; to identify the coverage problem of an area,
they analyze the parameters separately [72]. Hence, the scenarios defined in this thesis
consider the joint effect of the two parameters, benchmarked against the thresholds
that ethio telecom uses for coverage/quality grading purposes.
In addition, although there are more than two network coverage grades (such as Very
good, Fair, and Poor), this thesis considers only the threshold that separates network
coverage into two classes, “good” and “poor”. Considering these two classes and the
joint effect of the two parameters (RSCP and Ec/No), the overall MR data are classified
into four coverage scenarios, as shown in Table 4.3. A brief description of each
coverage scenario is given below.
Coverage scenario 1 (Class-1): This scenario describes areas where both the RF signal
strength (RSCP) and the signal quality (Ec/No) are below the thresholds, as depicted in
Table 4.3; that is, RSCP is below -95 dBm and Ec/No is below -13 dB. Such areas have a
critical coverage hole problem in both signal strength and quality.
Coverage scenario 2 (Class-2): This scenario describes areas where the RF signal
strength (RSCP) is below the threshold but the signal quality (Ec/No) is at or above
the stated threshold. Such areas have coverage hole problems due to poor signal
strength.
Coverage scenario 3 (Class-3): This scenario describes areas where the RF signal
strength (RSCP) is greater than or equal to -95 dBm and the signal quality (Ec/No) is
below -13 dB, which indicates a coverage hole due to poor signal quality.
Coverage scenario 4 (Class-4): This scenario describes areas where both the RF signal
strength and the signal quality are above or equal to the thresholds. This means the
areas are in good condition in both signal strength and quality. A summary of all
scenarios is given in Table 4.3.
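The four scenario definitions above can be sketched as a simple labeling rule. This is a minimal illustration of the class definitions, assuming the -95 dBm and -13 dB thresholds stated in the text; it is not the trained DT model, and the function and constant names are hypothetical.

```python
# Thresholds taken from the text (ethio telecom coverage assessment).
RSCP_THRESHOLD_DBM = -95.0
ECNO_THRESHOLD_DB = -13.0


def coverage_class(rscp_dbm: float, ecno_db: float) -> int:
    """Map one (RSCP, Ec/No) measurement to coverage scenario 1 to 4.

    Hypothetical helper: values exactly at a threshold count as "good",
    following the "greater than or equal to" wording of the scenarios.
    """
    poor_strength = rscp_dbm < RSCP_THRESHOLD_DBM
    poor_quality = ecno_db < ECNO_THRESHOLD_DB
    if poor_strength and poor_quality:
        return 1  # poor coverage and poor quality
    if poor_strength:
        return 2  # poor coverage but good quality
    if poor_quality:
        return 3  # good coverage but poor quality
    return 4      # good coverage and good quality
```

For example, a sample with RSCP of -100 dBm and Ec/No of -10 dB falls into Class-2, while a sample exactly at both thresholds falls into Class-4.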
4.4 System Process
With the thresholds and coverage scenarios defined, the main processing steps of the
framework are shown in Figure 4.2. The analysis starts with the collection and
preprocessing of MR data, which are then split into training and testing sets. Model
learning, testing, and evaluation follow as parts of the model framework. Next, the
model classifies the data into the required target classes, which are the coverage
scenarios. Finally, those coverage scenarios are visualized on a Google map, which shows
the coverage classes by location. A brief description of each step is given below.
[Figure 4.2 (flowchart): UE MR data collection, preprocessing, DT algorithm learning,
classification and model evaluation, visualization of coverage scenarios on Google map]
Figure 4.3 shows that 70% (245,587) of the 350,839 instances are used for training
the model. RSCP is selected as the best splitter and is used as the root node
attribute to split the data set. The ‘value’ row in each node shows how many of
the observations sorted into that node fall into each of the four categories.
Finally, four leaf nodes (classes) are formed, each with its observed sample
counts.
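The split sizes quoted above can be checked with a short calculation. This is a sketch only: the function name is hypothetical, and rounding the test share up (with the remainder used for training) is an assumed convention, chosen because it reproduces the 245,587 training instances stated here and the 105,252 test instances reported with the confusion matrix.

```python
import math


def split_sizes(n_total: int, test_fraction: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a hold-out split.

    Assumption: the test share is rounded up and the rest is used for training.
    """
    n_test = math.ceil(n_total * test_fraction)
    n_train = n_total - n_test
    return n_train, n_test


n_train, n_test = split_sizes(350_839, 0.30)
print(n_train, n_test)  # prints "245587 105252"
```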
                          Predicted Class
True Class   Class-1   Class-2   Class-3   Class-4
Class-1         7880         2         0         4
Class-2            1     11443         0         3
Class-3            1         0      5158         0
Class-4            0         0         0     80760
Figure 4.4 reports the predicted class against the true class for the classification of
the four coverage scenarios from MRs, presenting the confusion matrix in plot form. The
points off the diagonal represent incorrectly classified instances, while the points on
the diagonal show the instances correctly classified into their respective classes. The
performance measure, accuracy, is then calculated using Equation (3.6). From the
confusion matrix in Table 4.6, we can see that out of 105,252 test instances, the
algorithm misclassified only 11, which gives the reported accuracy of 99.98%. As
observed from the table, there are four classes of coverage scenarios: class-1, class-2,
class-3, and class-4. The values of TP1, TP2, TP3, and TP4 are 7880, 11443, 5158, and
80760, respectively, which form the diagonal of the table.
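The figures above can be reproduced directly from the confusion matrix. The sketch below uses the standard definitions of accuracy, precision, recall, and F-measure; the function names are hypothetical and the per-class values it produces are not reported in the text. The accuracy ratio 105,241/105,252 evaluates to about 0.999895, that is, 99.98% when truncated to two decimal places.

```python
# Confusion matrix from the text: rows are true classes, columns are predictions.
CONFUSION = [
    [7880, 2, 0, 4],
    [1, 11443, 0, 3],
    [1, 0, 5158, 0],
    [0, 0, 0, 80760],
]


def accuracy(matrix: list[list[int]]) -> float:
    """Share of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_metrics(matrix: list[list[int]]) -> list[tuple[float, float, float]]:
    """Return (precision, recall, F-measure) for each class."""
    n = len(matrix)
    results = []
    for i in range(n):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                        # true class i, predicted otherwise
        fp = sum(matrix[r][i] for r in range(n)) - tp   # predicted i, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f_measure))
    return results


total = sum(sum(row) for row in CONFUSION)
misclassified = total - sum(CONFUSION[i][i] for i in range(4))
print(total, misclassified)  # prints "105252 11"
```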