International Journal on Recent and Innovation Trends in Computing and Communication
ISSN: 2321-8169 Volume: 10 Issue: 5
DOI: https://doi.org/10.17762/ijritcc.v10i5.5550
Article Received: 14 March 2022 Revised: 28 March 2022 Accepted: 25 April 2022 Publication: 31 May 2022
___________________________________________________________________________________________________________________
Business Decision Making by Big Data Analytics
Vishal Kumar Goar, Nagendra Singh Yadav
Engineering College Bikaner, Rajasthan, India
dr.vishalgoar@gmail.com, nksyadav100@gmail.com
Abstract: Information is a key component of success, as it shapes both decision-makers' performance and the quality of their decisions.
In the modern era, an enormous amount of data is available to organizations for analysis. Data is the most important asset of a business
in the 21st century, and a significant number of devices are already connected to the internet. Solutions should therefore be studied in
order to control and capture the knowledge value of these datasets. Decision-makers should have access to insightful and
valuable information extracted from dynamic, high-volume, high-velocity data using big data analytics. Our research focuses on how to integrate big data analytics
into the decision-making process. The B-DAD (Big Data Analytics and Decision) framework was created to map big data tools, architecture,
and analytics to the several decision-making steps, following a design science methodology. The goal of the
framework is to intensify and aid organizational decision making by integrating big data analytics
into the corresponding decision-making process. An experiment was carried out in the retail domain to test the framework.
The results showcase the value added when big data analytics is integrated with the corresponding decision-making activity.
Keywords: Big Data, Big Data Analytics, Business Analysis, Decision-making process, Software Testing, Business Analytics, Cost of doing
business.
1. Introduction:
The modern era of digital technology has changed the
way organizations operate on a daily basis under a
business-driven approach. An extensive amount of data
is readily available, since storage capacity has grown
alongside data collection. With each passing second,
more and more data is generated from several sources.
[11] Mechanisms are required so that such data can be
stored and analyzed to draw value from it. Organizations
should push to capture the maximum value out of these
huge data repositories. In addition, organizations and
their stakeholders hold technology and devices that
allow them to create and store data in different
groupings or buckets. [12] Each user these days has
access to personal devices, i.e. laptops and smartphones,
and such devices contain large data volumes which can
be important to organizations. [3] Such data is referred
to as big data: data that varies in volume, variety, and
velocity, and that becomes difficult to maintain and
manage with the existing set of tools.
Big data can consist of various types, i.e., sentiments,
clickstream, video, audio, and website session tracking
of user activity, along with location-based data. [1]
It therefore requires several methods of big data
analytics, depending on size, variety, and constant
change, along with storage and analysis methodologies.
In addition, such big data should be analyzed carefully
to draw out valuable and relevant information.
Organizations are looking for solutions and guidelines
for big data management, given the significant demand
for employing big data in order to make the best of the
available opportunities. [4] The question in our research
article is how to employ big data analytics by integrating
it with the decision-making process. The goal of the
research is to construct and test a framework that
integrates big data techniques and tools with the
decision-making process. [15] The adoption of the
framework will allow decision-makers to intensify the
standard of the decision-making process and potentially
increase decision quality as a byproduct. [13] The
framework comprises several key characteristics of big
data analytics, i.e. the big data analytics life cycle and
the required architecture and infrastructure, along with
the necessary tools, and maps them to the different
decision-making processes. [6]
2. The Background.
Big Data refers to a set of data that grows exponentially,
such that it becomes hard to work with using existing
traditional database management systems. [14] The
dimensions of big data have stretched far beyond the
capability of commonly used software tools and storage
mechanisms to store, manage, and process the data
within a reasonable timeframe. [5] The V's are the main
characteristics of big data, i.e. volume, velocity, and
variety. Volume refers to the size of the data; velocity
refers to the dynamic nature of data, the rate at which it
changes or is created over time. Lastly, variety refers to
the different types and formats of data. [6]
Big data analytics applies advanced analytics techniques
to big data sets. [19] The analytics found in a larger data
set can support, disclose, and strengthen business
alternatives. A larger set of data poses more challenges
when it comes to managing it. [8] To improve decision
making, reduce risk, and discover hidden insights in the
data, knowledgeable analytics can come in handy;
otherwise, the valuable intelligence inside the data will
remain hidden forever. [7] Often, decisions are not
required to be automated; instead, they should be
cross-questioned through analysis based on data and big
data techniques and technologies. This will help
individuals better understand how valuable the
extracted information is. [10]
Besides, topics related to managerial decision-making
processes are very important and have been covered in
much research so far. Decision making consists of four
phases: intellect, plan, alternative, and execution. There
are several routes through the big data analysis pipeline,
but they come with their own hurdles and require
decision making. [16] Such decisions can include what
data should be acquired, how to represent the data after
extraction and clean-up, and how to integrate it with
additional sources in order to reach a decision based on
the analysis results. [17] All of these hurdles and
decisions should be planned successfully for big data
analysis to generate true value. [18]
Decision-makers should be able to recognize and make
the best use of big data to further intensify the
conventional decision-making process, as they are
always eager to execute informed decisions whenever
the opportunity is available to them. [2] Therefore, this
research aims to show how big data tools and methods
can be integrated with the decision-making process to
intensify decision making and generate important
insights for decision-makers. [9]
3. B-DAD Framework
Our research applies the design science methodology,
and hence a six-phase design science activity is used
for creating and assessing the framework. In the first
two stages, the problem was identified and the objective
of a solution was defined using investigative research.
The corresponding knowledge was acquired from the
knowledge base, along with the business requirements
of the environment, to shape the B-DAD framework.
This was accomplished by researching and testing the
key techniques related to big data analytics so that they
could be incorporated into the framework. Through this
research, both rigor and relevance were achieved.
After the development of the B-DAD framework, it was
examined and showcased by adopting big data analytics
to support the decision-making process. The idea was
demonstrated through experiments on real data and
real-time business use cases supporting the applicable
context for assessment. The framework evaluation was
based on observing to what extent the framework
succeeded when big data analytics was applied
throughout the decision-making process to support
decisions with backing insights. Furthermore, the ease
of the process and the relevance of the framework to
several scenarios were assessed. Based on the results,
we navigated back to the framework design and creation
phase to incorporate modifications, which later resulted
in our final B-DAD framework. The detailed process is
described below.
3.1. Creation of the Framework
The Big Data Analytics and Decision (B-DAD)
framework was created to map big data tools, analytics,
and architecture to the different decision-making
process steps. The "Big" refers to the three aspects,
which are not limited to the data alone but bring several
aspects together. An in-depth analysis of the related
literature and technologies in big data analytics led us
to the design of the framework.
It is impossible to summarize all big data technologies,
tools, and analytics, as this is a conceptual framework
through which support can be added to the
decision-making process with big data analytics. The
precondition for the framework is that the decision
domain is known and does not require any exploration
with regard to extracting the problem for which we are
looking for a resolution. The framework is shown in
Figure 1.
The first step in the decision-making process is the
intellect phase, in which data is used to recognize
opportunities and issues; this data is gathered from
external and internal data sources. The identification of
big data sources is a must, and data should be captured
from several sources, cleaned, stored, and moved to the
end-user. In other words, the first step in the framework
is about identifying which big data should be used for
the analysis. The key difference lies in the diversity of
data types identified from the various sources. Other
than traditional or transactional data, there is data
related to social media, images, and video. In addition,
data is generated by machines and devices, such as log
files, data captured from sensors, or location-based
data. Location-based data is extremely valuable when
combined with internet data, XML, or clickstream files.
Figure 1: B-DAD Framework
In this step, the analysis sources and types of data are
defined, and the selected data is then acquired and
stored. The selected data can be stored in any big data
storage or management tool. Such tools include
conventional database management systems, e.g.
MySQL (open source), or MPP DBMSs like Cassandra,
SAND, and PADB. In addition, the distributed file
system HDFS is used for big data storage, along with
MongoDB (built on top of HDFS).
In the data preparation phase, once the big data is stored
and acquired, it is categorized, prepared, and processed
for the data analytics lifecycle. Once the data is
processed, it enters the organized phase inside the
integrated information architecture. This is
accomplished using a high-speed network along with
ETL or a big data processing tool. For data processing,
Hadoop, MapReduce, and in-memory management are
used. The data can be obtained using queries, and
computation and processing can be executed using
several languages, from Pig, Hive, and R (preferred for
statistical analysis) to SQL-H and SQL used for
accessing the data stored in Hadoop. This group of tools
can contribute to big data findings and produce the
required analysis.
A few vendors offer a pool of tools and platforms for
supporting big data storage and management for
organizations. This facilitates extensive big-data-based
solutions with more features in a single package, e.g.
Greenplum and Vertica.
The next step in the decision-making process is the plan
phase, where the possible directions of action are
defined, created, and analyzed using the core
fundamentals or a model of the problem. In the
framework, this phase consists of the following steps:
model plan, analysis of data, and study. In the model
plan step, a corresponding data analytics model is
chosen and planned. The applicable model and
algorithms are selected depending on the available data
type and analysis. A set of the models and analyses
chosen here is displayed in the framework.
The set of conventional data mining and analytics
techniques, i.e. classification and regression, can draw
on machine learning and AI techniques, i.e. decision
trees and pattern analytics. To analyze a sequence of
data points, time series analysis can be used, which
reflects the values at consecutive time intervals. If the
big data is in the form of text or social media data, text
and social media analysis can be performed. To
represent complex path analysis or networks, graph
analysis can be used, which can elaborate the direct
dependencies between variables. For clusters of dense
data or location-based data, spatial or density analysis
can be performed. For web data and the analysis of
clicks on a web page, clickstream analysis can be
chosen.
In the analysis of data step, the chosen model is applied.
In order to make better predictions about the future,
predictive analytics can draw on historical and current
data, and it can also comprise Online Analytical
Processing (OLAP). To intensify the speed of and
access to scoring in the analytics model, in-memory
processing and analytics can be combined with big
data. Diverse technologies and analytics tools are used
in this phase, i.e. Kognitio and HANA built on top of
the R language, TWM, Mahout, and MADlib. Radoop,
a RapidMiner extension, is used to integrate data
analytics offerings into Hive. Mahout provides data
analytics solutions for Hadoop which can execute
OLAP and predictive analytics, and it is capable of
integrating NoSQL with Hadoop and analytics DBs.
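The data-organization step described above (ETL into Hadoop, then MapReduce-style processing) is, at its core, distributed grouping and aggregation. A minimal pure-Python sketch of the map/reduce pattern on hypothetical (branch, sale) records, purely as an illustration of what such a job computes:

```python
from collections import defaultdict

# Hypothetical (branch, sale_amount) records standing in for big data rows.
records = [("B1", 120.0), ("B2", 75.5), ("B1", 30.0), ("B2", 44.5), ("B1", 10.0)]

def map_phase(records):
    # Emit (key, value) pairs, as a MapReduce mapper would.
    for branch, amount in records:
        yield branch, amount

def reduce_phase(pairs):
    # Group by key and aggregate, as a MapReduce reducer would.
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

totals = reduce_phase(map_phase(records))
print(totals)  # {'B1': 160.0, 'B2': 120.0}
```

In a real deployment the map and reduce phases run in parallel across the cluster, with the shuffle step routing each key to one reducer; Pig and Hive scripts compile down to jobs of this shape.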
In the study step, the results of the previous steps, along
with the analytics results, are analyzed. The probable
directions of action are then defined, and an action is
selected as an alternative in the next phase.
The alternative phase is the next step in the
decision-making process, where each method is used to
identify the significant influence of the suggested
solutions and directions of action from the plan phase.
In the framework, this phase consists of two steps:
evaluate and determine. In the evaluation step, the
suggested directions of action and their consequences
are checked and prioritized. This is accomplished using
reporting, simulations, dashboards, KPIs, and what-if
analysis. Data visualization tools come in handy here,
e.g. Gephi (a popular graph-based visualization tool),
Tableau, and SAS Visual Analytics.
The next step in the alternative phase is to determine
the best direction of action. The actual decisions are
made based on the evaluation results for the probable
best direction of action.
At last, the final phase in the decision-making process
is the execution or operational phase, in which the
suggested solution is implemented based on the
outcomes of the previous phases. To monitor the
decision results, big data tools and technology can be
used to provide real-time data and feedback on the
results of execution.
Figure 2 – Data Model
3.2. Retail Domain: Framework Evaluation
After the completion of the IT infrastructure, the
framework had to be evaluated. An experimental
method was selected to test the B-DAD framework.
Therefore, an illustration of the framework was
generated with real-world data. The accessible
solutions, i.e. TWM, Aster, and several other DBMSs,
were tested in the lab. The flow and integration between
the several tools were captured and studied.
The experiment was executed in the retail domain to
evaluate the B-DAD framework. Decision testing was
performed to identify the effectiveness of promotions
and the impact of sentiment and social media on sales.
The decisions are: on which products promotions
should be offered, when a promotion should be offered,
and whether social media marketing is effective and
should be focused on. To support these decisions, the
following knowledge should be available: analysis of
purchase data, feedback, and customer responses to
posts on social media. The different phases of the
framework are described in the following section. The
implementation phase was not tested, since decisions
have to be executed and tracked over time, which was
not feasible within the given scope of the study.
1. Intellect Phase:
In this phase of the framework, the big data to be used
must be identified and captured. In our experiment, a
combination of social media, text, and relational data
was used. Our data model is shown in Figure 2. We
selected supermarket data from the POS (point of sale)
and ERP (enterprise resource planning) systems to
capture the relational data related to retail purchases.
The supermarket lists more than 80,000 consumer items,
with the two largest branches segmented into 30
sections. The number of daily visitors to an outlet can
reach 50,000. For our purposes, we captured samples of
POS and ERP data. The data was captured using
Teradata tools and the Teradata DBMS.
The social media data was captured from customer
posts and comments on the supermarket's Facebook fan
page. We were unable to find tweets related to the
supermarket on Twitter. Thus, we used open-source
APIs to capture the posts and related comments in the
timeframe covered by the sales data available to us. We
also captured the posts and comments on related fan
pages for the same timeframe.
2. Plan Phase
The second phase tested in the framework is the plan
phase. Here the actual model planning takes place, and
the relationships need to be recognized and explained.
To discover the relationships between several retail
aspects, we focused on purchases and discounts. This
generated several models in the planning phase. The
models and analyses are explained in detail below:
• Visualization analysis – First, it was necessary to
understand how the distribution of purchased
quantity and the sum of item purchases behaves
across sections and branches, and hence we began
by using Tableau to visualize the relationships.
Through the visualizations, we were able to identify
the supermarket branches and departments
generating the maximum discount, profit, or sales.
To visualize the relationship between total sales,
discounts offered, and quantity of sales, a time
series analysis was executed over a certain time
period. Some hidden facts were discovered about
the products and peak times impacting the sales and
their relation to promotions and discounts.
• Regression & Association Analysis – An
association analysis was carried out in TWM to
discover the relationships among variables. When
generating the association matrix, we discovered
that sales and quantity values had the highest
association with discount. Logistic regression was
executed in TWM to identify the connection
between whether there is a discount or not and the
remaining independent variables. Thus, the
projection of a scenario with or without a discount
was derived on the basis of 9 independent
variables. These variables were used in our model.
• Cluster Analysis – A cluster analysis was carried
out to group the items on the basis of how regularly
they were discounted. We began with cluster
analysis in TWM, where k-means was the chosen
algorithm and two clusters were used, as we focus
on whether a discount should be offered, with
Boolean values 0 and 1: 0 represents no discount
and 1 represents a discount. The resulting metric
showed 78% of the items in "Cluster 1" and 22% of
the items in "Cluster 2".
At a later stage, the clustering was executed in
Weka with the help of the k-means algorithm,
where k equals 2 clusters based on the discount
categories. Discount is the class, and the clusters
were assessed using classes-to-clusters evaluation.
Once the cluster model was developed on the
training data, it was tested by mapping each class
value to a cluster. To measure the validity of the
clustering, the rule was verified, i.e., whether a
point's discount value matches the class of the
cluster in which it was placed. A difference was
found when comparing clusters 0 and 1 with TWM:
"Cluster 0" was recorded with 43% of the instances
and "Cluster 1" with 57%. The discrepancy can
occur because Mahalanobis distance is used in
TWM to calculate the distance between the points
and the cluster mean, whereas Weka uses Euclidean
distance.
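The k = 2 clustering step can be sketched in a few lines. The following is a minimal pure-Python version of Lloyd's k-means with Euclidean distance (the Weka variant), run on hypothetical (total value, discount) points; it is an illustration, not the TWM or Weka implementation:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k=2, iters=50):
    # Lloyd's algorithm with Euclidean distance (as in Weka); TWM instead
    # uses Mahalanobis distance, which scales by the cluster covariance --
    # one reason the two tools produced different cluster proportions.
    centers = [points[0], points[-1]]  # deterministic init for this sketch
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: euclidean(p, centers[i]))
            clusters[idx].append(p)
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical items as (total value, discount) points.
points = [(100.0, 0.0), (110.0, 0.0), (95.0, 0.0), (800.0, 20.0), (820.0, 18.0)]
centers, clusters = kmeans(points)
print(sorted(len(cl) for cl in clusters))  # [2, 3]
```

The two resulting groups correspond to the no-discount and discount clusters of the experiment.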
Figure 3 – K-means clustering visualization
Figure 3 visualizes the clustered instances. "Cluster 0"
is identified by orange points, where no discount is
available, and "Cluster 1" by blue points, where a
discount is available. Based on the classes-to-clusters
results, the overlaps represent incorrectly clustered
instances: the orange circle marks items that have a
discount but were mapped to the no-discount cluster,
and the blue circle marks items that have no discount
but were mapped to the discount cluster. Promotions
can now be targeted at the blue circle. The reason for
the errors in the orange circle could be that 80% of the
promotion-based items are supermarket discounts, and
the remaining discounts were identified from a
supplier's promotion plan instead of the supermarket's.
Different results can be generated by changing the
clustering algorithm. We can use EM
(Expectation-Maximization) rather than k-means. EM
clustering runs through each instance and maps a
probability distribution to it, measuring the probability
of the instance belonging to each cluster. Using
cross-validation, the number of clusters was decided.
The algorithm divided the data into 10 clusters, and the
percentage of cluster placement stabilized.
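As a sketch of how EM assigns membership probabilities, the following runs EM for a two-component one-dimensional Gaussian mixture on hypothetical item totals; the study itself used the tool's built-in EM with a cross-validated cluster count:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_step(data, params):
    # E-step: responsibility (membership probability) of each component
    # for each point, given the current (weight, mean, sd) parameters.
    resp = []
    for x in data:
        probs = [w * normal_pdf(x, mu, sd) for w, mu, sd in params]
        total = sum(probs)
        resp.append([p / total for p in probs])
    # M-step: re-estimate weights, means, and standard deviations.
    new_params = []
    for k in range(len(params)):
        rk = [r[k] for r in resp]
        nk = sum(rk)
        w = nk / len(data)
        mu = sum(r * x for r, x in zip(rk, data)) / nk
        var = sum(r * (x - mu) ** 2 for r, x in zip(rk, data)) / nk
        new_params.append((w, mu, max(math.sqrt(var), 1e-6)))
    return new_params, resp

# Hypothetical item totals: a low-value group and a high-value group.
data = [10.0, 12.0, 11.0, 95.0, 100.0, 98.0]
params = [(0.5, 20.0, 15.0), (0.5, 80.0, 15.0)]  # initial (weight, mean, sd)
for _ in range(20):
    params, resp = em_step(data, params)
# Each instance now carries a probability of belonging to each cluster.
print([round(r[0], 2) for r in resp])  # low-value items near 1.0, high near 0.0
```

Unlike k-means' hard assignment, each point here keeps a soft membership in every cluster, which is what makes the classes-to-clusters percentages comparable across runs.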
Figure 4 - Expectation-Maximization cluster visualization
Figure 4 visualizes the clustered instances based on the
Expectation-Maximization algorithm. The X-axis refers
to the item code and the Y-axis to the total value, while
the colors represent the 10 clusters. The highest item
codes were found in the blue "Cluster 0", the magenta
"Cluster 5" instances had the lowest total value, and the
highest total value was found in the orange "Cluster 7".
• Association Analysis – The association analysis
consists of association rule mining to uncover
undiscovered relationships and to identify the most
frequent combinations among the data variables. In
our research, we transformed the numeric code
attribute into a nominal attribute and then
discretized the remaining numerical variables using
equal-frequency binning, so that they could be
converted into categorical variables for our
analysis. At a later stage, we executed association
rule mining using the Apriori algorithm in Weka,
with minimum support set to 0.1 and minimum
confidence to 0.25. If the support was raised, no
rules of value resulted.
Thus, various interesting rules were discovered. For
example, there is no discount if the purchased
quantity is low, with a confidence of 84% in one
branch and 45% in the second branch. The
"Chocolate & candy" section has no discount on
80% of its products. In the first branch, no discount
was observed on 74% of the products in the given
period, while in the second branch no discount was
observed on 72% of the products. Close to 60% of
the purchases in the "Cosmetics" section in the
given period occurred in the first branch. Many
more association rules can be extracted from the
results. We were unable to capture rules that rely on
the purchase of certain items together (basket
items), since the data does not contain individual
carts but rather daily aggregated invoices.
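A minimal sketch of the support/confidence arithmetic behind such rules, on hypothetical attribute-value transactions; the study used Weka's Apriori implementation, and only the thresholds (0.1 support, 0.25 confidence) are kept here:

```python
from itertools import combinations

# Hypothetical transactions: attribute=value pairs per daily invoice line.
transactions = [
    {"qty=low", "discount=no"},
    {"qty=low", "discount=no"},
    {"qty=low", "discount=no"},
    {"qty=high", "discount=yes"},
    {"qty=high", "discount=no"},
    {"qty=low", "discount=yes"},
]

MIN_SUPPORT = 0.1     # fraction of transactions containing the itemset
MIN_CONFIDENCE = 0.25

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Frequent 1- and 2-itemsets (the Apriori levels needed for simple rules).
items = sorted({i for t in transactions for i in t})
frequent = [frozenset(c) for n in (1, 2) for c in combinations(items, n)
            if support(frozenset(c)) >= MIN_SUPPORT]

# Rules A -> B from frequent 2-itemsets: confidence = support(A,B)/support(A).
rules = []
for fs in frequent:
    if len(fs) == 2:
        for a in fs:
            b = next(iter(fs - {a}))
            conf = support(fs) / support(frozenset({a}))
            if conf >= MIN_CONFIDENCE:
                rules.append((a, b, round(conf, 2)))

print(("qty=low", "discount=no", 0.75) in rules)  # True
```

The rule read off here, "low quantity implies no discount with 75% confidence", has the same shape as the 84%/45% rules reported above.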
• Decision tree – A decision tree was developed as
part of the classification analysis. We used TWM
to construct the tree model with a split strategy
based on the gain ratio. The selected dependent
variable was the Boolean discount variable,
whereas the independent variables were purchased
item quantity, branch, item code, section, supplier
code, and sales. The decision tree metrics indicated
a fairly accurate model, with a correct classification
percentage of 77.21% and an incorrect
classification percentage of 22.79%. The end result
of the decision tree, i.e. the resulting rules, indicated
the factors impacting the decision of whether an
item should have a discount or not.
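The gain-ratio split criterion used for the tree can be sketched as follows, on hypothetical rows; the attribute names are illustrative, not the study's schema:

```python
import math

def entropy(labels):
    n = len(labels)
    counts = {l: labels.count(l) for l in set(labels)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain_ratio(rows, attr_idx, labels):
    # Information gain of splitting on the attribute, normalized by the
    # split's own entropy (C4.5-style gain ratio).
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_idx], []).append(label)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - cond
    split_info = entropy([row[attr_idx] for row in rows])
    return gain / split_info if split_info else 0.0

# Hypothetical rows: (branch, quantity_band); label: discount yes/no.
rows = [("B1", "low"), ("B1", "low"), ("B1", "high"),
        ("B2", "high"), ("B2", "high"), ("B2", "low")]
labels = ["no", "no", "yes", "yes", "yes", "no"]

# Quantity band separates the classes perfectly; branch only partially,
# so the tree would split on quantity band first.
print(gain_ratio(rows, 1, labels) > gain_ratio(rows, 0, labels))  # True
```

A tree builder simply applies this comparison recursively, splitting on the attribute with the highest gain ratio at each node.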
• Text mining and social media analysis – We
executed text mining on the data captured from
social media in order to identify people's opinions
about the supermarket and the listed products, along
with their responses to the supermarket's posts and
marketing. Next, we needed to discover the
commonly used words in such posts and the
associations between those words. This helped us
learn about sentiments, products, and what people
think of the supermarket outlets. For example, we
discovered that people were unhappy with a certain
product or happy with certain promotions; such
information can be added to the model to enhance
the decision.
We used RapidMiner to execute the text mining.
The first steps of the process were reading the data
from the database, choosing the expected attributes
for analysis, and appending the final set of data into
an example set. Next, the data was refined into
document form, with each text tokenized; an
operator was used to filter English stop words and
remove them from the documents. The tokens were
filtered by length to eliminate very long or short
tokens, and n-grams were produced. Once the data
processing was finished, nominal and numeric
attributes were transformed into binomial ones for
the association rule mining. To capture the frequent
itemsets, the FP-Growth algorithm was employed,
with an operator which builds the association rules.
The process was never able to get past the
FP-Growth operator, even after several days of
execution; the longest run lasted close to 2 days
with no result generated. This could be due to the
fact that the text was in Unicode format, which
made it very hard to process. Grammatical errors
and malformed sentences may also prevent the
FP-Growth operator from performing in such a
scenario.
The results of the document processing went
through deeper analysis using TF-IDF (term
frequency-inverse document frequency) weighting,
which gives importance to the words in a document
relative to the words in the rest of the documents.
We were able to locate a number of recurring words
frequently used in the social media posts and
comments, e.g. "quantity", "offer", "promotion",
"orange".
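The TF-IDF weighting described above can be sketched in a few lines on hypothetical tokenized posts; RapidMiner's operators compute the same quantity over the processed word vector:

```python
import math
from collections import Counter

# Hypothetical tokenized posts after stop-word filtering.
docs = [
    ["offer", "promotion", "orange"],
    ["promotion", "quantity", "offer"],
    ["orange", "juice", "price"],
]

def tf_idf(term, doc, docs):
    # Term frequency in this document times inverse document frequency.
    tf = Counter(doc)[term] / len(doc)
    df = sum(term in d for d in docs)
    idf = math.log(len(docs) / df)
    return tf * idf

# "juice" appears in only one document, so it is weighted higher there
# than "promotion", which appears in two documents.
print(tf_idf("juice", docs[2], docs) > tf_idf("promotion", docs[1], docs))  # True
```

Ranking each document's terms by this weight is what surfaces the distinctive recurring words in the posts.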
Sentiment Analysis – To explore the negative and positive opinions shared by users on the supermarket's social media page, we executed sentiment analysis in RapidMiner. We began by building a labeled data set from a sample of 100 posts, stored in separate folders according to whether they represented negative or positive posts. The documents then went through processing, and tokens were assigned to each of them. The word vector was pruned to eliminate the most common and the rarest terms; due to time limitations, the cut-offs were set near 3% and 95% document frequency. A word-vector list was constructed from the data, and a Naive Bayes classification model was trained with cross-validation: the data was segmented into training and test sets, the model was built on the training set, and it was then applied to the test set, on which the performance accuracy was calculated.
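The classifier itself can be sketched as a compact multinomial Naive Bayes with add-one (Laplace) smoothing (a sketch of the general technique, not the exact RapidMiner operator; the tiny training set is illustrative):

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    # labeled_docs: list of (token_list, label) pairs.
    class_counts = Counter(label for _, label in labeled_docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label, n_docs in class_counts.items():
        lp = math.log(n_docs / total_docs)  # log prior of the class
        n_words = sum(word_counts[label].values())
        for t in tokens:
            # Laplace-smoothed log likelihood of each token given the class.
            lp += math.log((word_counts[label][t] + 1) / (n_words + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

train = [(["great", "offer"], "pos"), (["love", "juice"], "pos"),
         (["bad", "service"], "neg"), (["late", "delivery", "bad"], "neg")]
model = train_nb(train)
label = predict_nb(model, ["bad", "delivery"])
```

Cross-validation then repeats this train/predict cycle over several train/test splits and averages the resulting accuracies.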
Of the 64 documents forecasted to be negative, 40 were correctly classified as negative, a precision of close to 62.5%; since every actually negative document was marked as negative after classification, the recall for the negative class was 100%. Of the 60 actually positive documents, 36 were classified correctly, a recall of 60%, and all 36 documents forecasted to be positive were indeed positive, a precision of 100%. Thus, the overall model accuracy was 76%.
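These figures follow directly from the confusion-matrix definitions (the counts below are our reconstruction of the numbers reported above, assuming 100 labeled posts of which 40 are negative and 60 positive):

```python
def precision_recall(tp, fp, fn):
    # Precision: fraction of predicted positives that are correct.
    # Recall: fraction of actual positives that were found.
    return tp / (tp + fp), tp / (tp + fn)

# Negative class: 64 posts predicted negative, 40 of them truly negative,
# and no actual negative post was missed.
neg_precision, neg_recall = precision_recall(tp=40, fp=24, fn=0)

# Positive class: 36 posts predicted positive, all correct, out of 60 actual positives.
pos_precision, pos_recall = precision_recall(tp=36, fp=0, fn=24)

# Overall accuracy over the 100 labeled posts.
accuracy = (40 + 36) / 100
```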
The created model was then applied to the unlabeled posts to place them into the negative or positive category. Although this is a traditional and widely used form of sentiment analysis, it was not a practical approach in our case: each post had to be stored in a separate document and manually labeled for the positive or negative training set, and after applying the model and marking the documents as negative or positive, we had to go back and open each document to verify how the corresponding post had been classified.
Repustate is a social media and sentiment analysis website. It provides APIs for sentiment analysis in the corresponding languages. Using the trial version of Repustate, we executed sentiment analysis on the posts from the supermarket's social media page. We located the negative posts over time to identify when during the day the negative comments were published, and we identified the set of customers who were frustrated with the product or service experience and ended up posting negative feedback or opinions. For example, this helped us identify that the quality of the delivery service was a concern for the second branch.
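The time-of-day breakdown of negative posts amounts to a simple grouping over timestamped sentiment labels (a sketch with hypothetical data; Repustate's actual API output differs):

```python
from collections import Counter
from datetime import datetime

# Hypothetical (sentiment, timestamp) pairs for posts on the page.
posts = [
    ("negative", "2022-03-01 18:40"),
    ("positive", "2022-03-01 10:05"),
    ("negative", "2022-03-02 19:15"),
    ("negative", "2022-03-03 18:02"),
]

# Count negative posts per hour of day to see when complaints cluster.
negative_by_hour = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M").hour
    for sentiment, ts in posts
    if sentiment == "negative"
)
peak_hour = negative_by_hour.most_common(1)[0][0]
```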
3. Alternative Phase – in this phase, we adopt the proposed solution based on the results defined in the previous phase; in our case, “what item, when” refers to the promotions created for customers. We did not have complete background knowledge of how the supermarket operates or which KPIs it uses for evaluation, so we opted for visualization as the measurement step of the experiment, as it added the most value in our case. The analysis in the previous phase generated visibility that can be utilized in the alternative phase. New visualizations were created in this phase to expose the relationships between variables and how they vary over time as time series. We chose a specific instance as an example to assess the impact of social media advertisements on customer purchasing. For instance, Tableau was used to visualize the best method for promoting specific juices, since customer feedback in posts and opinions about the purchase is equally important.
To capture the social media activity of the users, user engagement with the social media posts was visualized. This enabled us to pivot to the specific days that had higher user activity in terms of published posts. We captured the likes and shares for each post to identify which posts had the highest engagement, along with comments and feedback. A higher number of likes on a promotion post indicated that users were more engaged and expressed a positive opinion, and vice versa.
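The engagement view described above reduces to aggregating likes, shares, and comments per post and per day (the records below are illustrative numbers, not the experiment's data):

```python
# Hypothetical engagement records: (post_id, day, likes, shares, comments).
records = [
    ("p1", "Mon", 120, 30, 12),
    ("p2", "Mon", 45, 5, 3),
    ("p3", "Wed", 200, 80, 25),
]

# Engagement score per post: likes + shares + comments.
engagement = {pid: likes + shares + comments
              for pid, _, likes, shares, comments in records}
top_post = max(engagement, key=engagement.get)

# Total engagement per day to find the most active days.
per_day = {}
for _, day, likes, shares, comments in records:
    per_day[day] = per_day.get(day, 0) + likes + shares + comments
busiest_day = max(per_day, key=per_day.get)
```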
Thus, combining the analysis of item sales and discounts over time with the user feedback enabled us to assess the effectiveness of the online promotions and the sentiment behind purchase patterns. This leads to a further analysis that forecasts which date and time are best for offering promotions.
3.3. Findings in the experiment –
In our experiment, we showcased the decision on which items should have promotions offered and how social media can be used for marketing, and we assessed to what extent this can influence customer purchases. This also helps capture customer feedback for each purchase. Our framework was followed by applying the mapped big data analytics tools and methods in the corresponding phases of the decision-making process. Through the process, data was stored in big data storage and then processed; the analytics were executed and the results were analyzed to enable decision-making with enhanced visibility. Using big data analytics, the decisions were supported with an additional set of knowledge. The steps in the framework were easy to follow and added value and visibility. The required changes are explained below.
We discovered a few limitations and disadvantages related to the experiment rather than to the framework itself. They concern the Turkish language, which is difficult to work with. Unlike in other languages, whether a given word or phrase reflects positive or negative sentiment is often arguable, and it ultimately comes down to context, which is the important aspect here. Turkish is more difficult than most languages when identifying the root word, tense, and grammar; consequently, the same word can have different meanings or interpretations. For instance, the word “helwa” can refer to a sweet or candy, and as an adjective it can describe taste, something that tastes sugary or nice. In our case, we could never know whether the word was intended to refer to an item or a taste, whether someone liked the item, or whether it was simply an expression.
Despite this complexity, we successfully extracted valuable insights through the social media analysis. We merged the results obtained from the several analyses to gain out-of-the-ordinary visibility into insights on which decisions can be made. The social media data, including posts and comments, helped us understand the relationships between attributes when executing analytics on the relational data. Without it, we might not have realized the impact of social media on customer engagement and purchase patterns; it helped identify the reason for the spike in sales over time after an online promotion. Finally, it allowed us to use the users' perspective and sentiment as a basis for decisions and to understand how users think and feel about a service, promotion, or item.
4. Result –
Based on the outcome of our experiment, we identified observations and improvements related to the framework. First, as defined in the framework development previously, the framework takes a conceptual approach to executing big data and analytics in support of the decision-making process; it is not only about big data analytics tools. Many such tools already exist, and their number keeps increasing, which makes it impossible to maintain a complete list of feasible solutions; moreover, various solutions not released as big data tools can still be utilized for this purpose. In our experiment we used Weka, which is not a big data analytics tool but an open-source machine learning tool that is easy to learn, useful, and a good fit for the analysis. Thus the experiment was not composed solely of big data tools; Weka was used to generate further knowledge and visibility.
We discovered that visualization, which was intended to be used in the intellect phase during data discovery and in the alternative phase when assessing the possible directions of action, can also serve as an analysis of its own in the plan phase, from which important understanding can be captured. Valuable relationships can thus be foreseen well in advance and included in further analyses, or the results of one analysis can feed a more comprehensive study. This therefore needed to be included in the plan phase of the framework. The same goes for statistics, which was initially planned for the intellect phase during data discovery: we identified that statistical analysis could be utilized in the plan phase either as an analysis of its own or integrated with other analyses. Consequently, in the data analysis step, predictive and descriptive analytics can both be utilized to add value and insight.
We later discovered that there is no need for two separate steps in two different phases for analyzing and evaluating the results of big data analytics; they can be merged into one step for refinement and better understanding. The evaluation and analysis should be carried out on the analytics related to the decision domain rather than on the probable directions of action. This is because once the analytics are completed, the generated outcomes are analyzed to obtain insight that adds knowledge and helps at the time of decision making. It is not always possible to have the ideal scenario in which the probable directions of action are defined first and each of them is then evaluated to select the best decision.
Figure 5 – Updated B-DAD Framework
Big data analytics diverges from traditional analytics. The traditional approach is more structured: a method first detects the business problem, then captures the data and analyzes it. Big data analytics also allows an unstructured method in which data is gathered first and then analyzed, i.e., the maximum information is extracted from the data before figuring out how it can support business decisions through exploratory analysis. We do not always know in advance which decisions are about to be made, and such a case must also be supported by the framework. Consequently, analyzing and evaluating must be merged into a single step in which one or both can be applied to the end result of the analysis.
The framework is not a tool for generating adequately informed and structured decisions; it is, rather, a conceptual mapping of tools onto the decision-making process. Moreover, the framework is a flexible process rather than a strictly sequential one: we must be able to navigate back and forth within the steps of the phases. As stated previously, big data analytics is not fully predictable, so one can opt for an unstructured method in which no one is aware of the decisions in advance, and in such a case it is meaningful to be able to move back to a prior stage. For example, while performing the analysis for the plan phase, we might find ourselves in a situation where more diverse data would increase the knowledge of our model. Therefore, the framework must enable moving back and forth between the steps.
The findings of the experiment were incorporated into the updated framework. The revised B-DAD framework is displayed in Figure 5. No changes were made to the intellect phase. The plan phase, however, now consists only of the model plan and data analysis steps, with descriptive analysis as a new addition to the data analysis step. Analysis and evaluation are combined into one step located in the alternative phase; this step no longer reflects analyzing or evaluating the probable directions of action but rather analyzing the previous analyses and assessing their impact on the decisions. Bilateral arrows were added between the phases to reflect the ability to move back and forth throughout the framework.
5. Conclusion –
In this research we have inspected the innovative aspects of big data, which has captured a great deal of attention due to its offerings and opportunities. The current era of information technology produces high-velocity data on a daily basis, and the hidden information it holds needs to be extracted. Thus, by utilizing big data analytics, a business can be leveraged toward enhanced decision making through the adoption of advanced analytics methods on big data and the identification of hidden and valuable knowledge and insights.
Enforcing analytics on big data helps extract valuable information that can be utilized to intensify decision making and support informed decisions. We applied a design science methodology to discover a solution to the problem of how to integrate big data analytics into the decision-making process. This research contributed the development and testing of the B-DAD framework, which maps the decision-making process to the big data analytics tools and methods that support it.
Design science research must contribute to the knowledge base as well as to the application in the corresponding environment. Our research contributes to the knowledge base by drawing on various theories, frameworks, methods, and data analysis techniques from past research to construct the reliable B-DAD framework; it thus integrates many aspects into a single framework. Testing the framework in a real-life scenario adds further rigor and reinforces the framework's evaluation.
The research also contributes to the environment. The B-DAD framework can be utilized both in research and in industry. The prime goal of decision-makers and stakeholders in an organization is to intensify decision-making and capture hidden knowledge through facts and insights. This research offers organizations the B-DAD framework, showcasing how big data analytics can be integrated into each phase of the decision-making process to generate enhanced and more impactful decisions.
Intensifying decision-making and capturing hidden insights or knowledge using big data analytics is not easy; one of the key challenges we faced in our case was access to big data. We trust that in this time of information overflow, big data analytics will be of great help in uncovering hidden insights and offering advantages to decision-makers in many respects. Once explored and utilized efficiently, big data analytics has a lot to offer at both the scientific and technological levels.