Big Data Analytics: Challenges and Applications For Text, Audio, Video, and Social Media Data
1, February 2016
Jai Prakash Verma, Smita Agrawal, Bankim Patel and Atul Patel
ABSTRACT
Automated systems of all kinds generate large amounts of data in different forms, such as
statistical, text, audio, video, sensor, and biometric data, which has given rise to the term Big Data.
In this paper we discuss the issues, challenges, and applications of these types of Big Data with
respect to the big data dimensions. We cover social media data analytics, content-based analytics,
text data analytics, and audio and video data analytics, along with their issues and expected
application areas. The discussion is intended to motivate researchers to address the issues of storage,
management, and retrieval of Big Data. The use of Big Data analytics in India is also highlighted.
KEYWORDS
Big Data, Big Data Analytics, Social Media Analytics, Content Based Analytics, Text Analytics, Audio
Analytics, Video Analytics.
1. INTRODUCTION
The term big data describes the growth and availability of huge amounts of structured and
unstructured data. Big data is data that is beyond the ability of commonly used software tools to
create, manage, and process within a suitable time. Big data is important because the more data
we collect, the more accurate the results we can obtain and the better we can optimize business
processes. Big data therefore matters for both business and society. The data comes from
everywhere: sensors used to gather climate information, posts and shared content on social media
sites, video, movies, audio, and so on. This collection of data is called big data.
Nowadays big data is used in multiple ways to grow businesses and to understand the world
[1, 2, 15].
In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing
capacity. Big data has the potential to help companies improve operations and make faster, more
intelligent decisions. Big data usually includes data sets with sizes beyond the ability of
commonly used software tools to capture, curate, manage, and process within a tolerable elapsed
time. Big data also refers to a set of techniques and technologies that require new forms of
integration to uncover large hidden values from datasets that are diverse, complex, and of
massive scale. For example, Wal-Mart handles more than 1 million customer transactions every
hour, and Facebook handles 40 billion photos from its user base. Processing such large quantities
of data efficiently requires technologies such as data fusion and integration, genetic algorithms,
machine learning, signal processing, simulation, natural language processing, time series
analytics, and visualization [12, 13, 16].
DOI :10.5121/ijscai.2016.5105
International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.5, No.1, February 2016
1.1. Characteristics of Big Data:
Volume: Many factors contribute to the increase in data volume: transaction-based data stored
through the years, unstructured data streaming in from social media, and increasing amounts of
sensor and machine-to-machine data being collected. In the past, excessive data volume was a
storage issue. But with decreasing storage costs, other issues emerge, including how to determine
relevance within large data volumes and how to use analytics to create value from relevant data
[10, 12, 13, 15, 16].
Velocity: Data is streaming in at unprecedented speed and must be dealt with in a timely manner.
RFID tags, sensors and smart metering are driving the need to deal with torrents of data in
near-real time. Reacting quickly enough to deal with data generation speed is a challenge for most
organizations.
Variety: Data today comes in all types of formats. Structured, numeric data in traditional
databases. Information created from line-of-business applications. Unstructured text documents,
email, video, audio, stock ticker data and financial transactions. Managing, merging and
governing different varieties of data is something many organizations still grapple with.
Variability: In addition to the increasing velocities and varieties of data, data flows can be highly
inconsistent with periodic peaks. Daily, seasonal and event-triggered peak data loads can be
challenging to manage. Even more so with unstructured data involved.
Complexity: Today's data comes from multiple sources. And it is still an undertaking to link,
match, cleanse and transform data across systems. However, it is necessary to connect and
correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out
of control.
Value: This dimension concerns how big data can be used to enhance business and everyday life.
Different types of business and social applications generate different types of data, and
identifying value from Big Data within each application area is still a major issue.
predictive analytics, text analytics, statistical analytics and data mining [1, 2, 4]. Types of big data
analytics are:
Prescriptive: This type of analytics helps to decide what actions should be taken. It is very
valuable but not widely used. It focuses on answering specific questions, for example in hospital
management, where the diagnosis of cancer or diabetes patients determines where to focus
treatment.
Predictive: This type of analytics helps to predict the future, or what might happen. For example,
some companies use predictive analytics to make decisions about sales, marketing, production,
etc.
Diagnostic: This type of analytics looks at the past and analyzes what happened, why it
happened, and how the situation can be overcome. Examples include weather prediction and
customer behavioral analysis.
Descriptive: This type of analytics describes what is happening currently and predicts the near
future. Examples include market analysis and competitor behavioral analysis.
By using appropriate analytics, an organization can increase sales, improve customer service, and
improve operations. Predictive analytics allows organizations to make better and faster decisions
[1, 2, 4, 10].
likely victims are credit card issuers, insurance companies, retail merchants, manufacturers,
business-to-business suppliers and even service providers. Predictive analysis can help to
identify high-risk fraud candidates in business or the public sector.
Portfolio, product or economy-level prediction: These types of problems can be addressed by
predictive analytics using time series techniques. They can also be addressed via machine
learning approaches which transform the original time series into a feature vector space, where
the learning algorithm finds patterns that have predictive power.
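
The transformation of a time series into a feature vector space can be illustrated with a sliding window of lagged values. The following is a minimal sketch, not the method of any particular system: the function name `make_lag_features`, the window size, and the sales figures are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def make_lag_features(
    series: List[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a univariate series into (feature vectors, targets).

    Each feature vector holds the n_lags values that precede the target,
    so a standard learning algorithm can be trained on the pairs.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])  # the previous n_lags values
        targets.append(series[t])              # the value to predict
    return features, targets


# Hypothetical monthly sales figures, used only for illustration.
sales = [10.0, 12.0, 13.0, 15.0, 18.0]
X, y = make_lag_features(sales, n_lags=2)
for row, target in zip(X, y):
    print(row, "->", target)
```

Each row of `X` can then be passed to any regression model, which is one way a learning algorithm can look for patterns with predictive power in the original series.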
Risk management: Risk management techniques aim to predict, and benefit from, a future
scenario. Predictive analysis helps organizations and business enterprises to identify future risks,
such as natural disasters, and their effects. Risk management helps them to take the correct
decision at the correct time.
Underwriting: Many businesses have to account for risk exposure due to their different services
and determine the cost needed to cover the risk. For example, auto insurance providers need to
accurately determine the amount of premium to charge to cover each automobile and driver. For
a health insurance provider, predictive analytics can analyze a few years of past medical claims
data, as well as lab, pharmacy and other records where available, to predict how expensive an
enrollee is likely to be in the future. Predictive analytics can help underwrite these quantities by
predicting the chances of illness, default, bankruptcy, etc. Predictive analytics can streamline the
process of customer acquisition by predicting the future risk behaviour of a customer using
application level data.
Behavior Analytics
Location-based interaction Analytics
Recommender systems development
Link prediction
Customer interaction and Analytics & marketing
Media use
Security
Social studies
Massive amounts of data require lots of storage space and processing power.
Shifting social media platforms.
Worldwide online accessibility provides more data in many languages.
Evolution of online language.
a) Credibility: Not all customers tell the truth (especially online), and users who have only a
small rating history can skew the data. In addition, some vendors may give (or encourage
others to give) positive ratings to their own products while giving negative ratings to their
competitors' products.
b) Scarcity: Not all items will be rated or will have enough ratings to produce useful data.
c) Inconsistency: Not all users use the same keywords to tag an item, even though the meaning
may be the same. Additionally, some attributes can be subjective. For example, one viewer
of a movie may consider it short while another says it's too long.
purchase. Other systems provide social-media-style links so customers can like or dislike a
product.
The ideal system would have both high precision and high recall. Realistically, the best
outcome is to strike a delicate balance between the two. Whether to emphasize precision or recall
depends on the problem you're trying to solve [4, 6, 13].
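
To make the trade-off concrete, precision can be computed as the fraction of recommended items that are relevant, and recall as the fraction of relevant items that were recommended. The sketch below uses these standard definitions. The function name `precision_recall` and the item identifiers are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(recommended: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one set of recommendations."""
    hits = len(recommended & relevant)  # items both recommended and relevant
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four items recommended, three items actually relevant.
recommended = {"A", "B", "C", "D"}
relevant = {"A", "C", "E"}
p, r = precision_recall(recommended, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Recommending more items tends to raise recall while lowering precision, which is why a balance between the two usually has to be chosen for each problem.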
5. TEXT ANALYTICS
Most information is available in textual form in databases. Manual analysis or effective
extraction of important information from such volumes of text is not possible, so automatic tools
for analyzing large textual data are needed. Text analytics, or text mining, refers to the process of
deriving important information from text data. It extracts meaningful data from text in several
ways, for example by finding associations among entities, predictive rules, patterns, concepts,
and events. Text analytics is widely used in government, research, and business. Data simply
tells you what people did, but text analytics tells you why. Important information is extracted
from unstructured or semi-structured text data and then categorized, and the categorized
information can be used to take business decisions [5, 6].
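
The extract-then-categorize flow described above can be sketched with simple keyword rules. This is an illustrative assumption, not a technique prescribed by the paper: the categories, keyword lists, and function names below are hypothetical, and a production system would typically rely on learned models or curated vocabularies.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical keyword rules; a real system would learn or curate these.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "billing": ["refund", "invoice", "charge"],
    "delivery": ["late", "shipping", "courier"],
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text: str) -> str:
    """Assign the category whose keywords occur most often in the text."""
    counts = Counter(tokenize(text))
    scores = {
        category: sum(counts[word] for word in words)
        for category, words in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


print(categorize("The courier was late and shipping took two weeks."))
print(categorize("Please refund the extra charge on my invoice."))
print(categorize("Great product!"))
```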
people to answer the questions posted by the people. A large number of questions and
answers are posted on the social networking websites.
c) Social Tagging: Tagging of data has also increased to a great extent. For example, when a
user searches for a recent event such as the Bihar Election, the system returns the results
that are tagged as Bihar or Election.
Textual data in social media provides a lot of information, and user-generated content
provides diverse and unique information in the form of comments, posts and tags [5, 6].
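
The tag-based retrieval in the social tagging example can be sketched as an inverted index from tags to posts, where a query returns every post carrying at least one of the query terms as a tag. The post identifiers, tags, and function names here are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_tag_index(posts: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each lowercased tag to the set of post IDs that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for post_id, tags in posts.items():
        for tag in tags:
            index[tag.lower()].add(post_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return posts tagged with any term of the query (OR semantics)."""
    matches: Set[str] = set()
    for term in query.lower().split():
        matches |= index.get(term, set())
    return sorted(matches)


# Hypothetical posts and their user-assigned tags.
posts = {
    "p1": ["Bihar", "Election"],
    "p2": ["Election", "Results"],
    "p3": ["Cricket"],
    "p4": ["Bihar", "Tourism"],
}
index = build_tag_index(posts)
print(search(index, "Bihar Election"))
```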
6. AUDIO ANALYTICS
Audio data is compressed and packaged into a single format. Audio analytics refers to the
extraction of meaning and information from audio signals for analysis. There are two ways to
represent audio for analytics: 1) sound representation and 2) raw sound files. An audio file format
is a format for storing digital audio data on a system. There are three main types of audio format:
uncompressed, lossless compressed, and lossy compressed [11].
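
To give a sense of scale for the uncompressed case, the size of raw PCM audio is the product of duration, sample rate, bytes per sample, and channel count. The following sketch assumes CD-style parameters (44.1 kHz, 16-bit, stereo), which are illustrative defaults and are not taken from the paper. The function name is hypothetical.

```python
def uncompressed_audio_bytes(
    seconds: float,
    sample_rate_hz: int = 44_100,  # assumed CD-style sample rate
    bit_depth: int = 16,           # assumed bits per sample
    channels: int = 2,             # assumed stereo
) -> int:
    """Size in bytes of raw PCM audio, ignoring any file header."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


one_minute = uncompressed_audio_bytes(60)
print(f"{one_minute / 1_000_000:.2f} MB per minute")
```

Lossless and lossy compressed formats reduce this figure, the former without discarding information and the latter by discarding some of it.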
7. VIDEO ANALYTICS
Video is a major issue when considering big data. Videos and images contribute to 80% of
unstructured data. Nowadays, CCTV cameras are one form of digital information and
surveillance. All this information is stored and processed for further use, but video contains a lot
of information and is generally large in size. For example, YouTube has innumerable videos
uploaded every minute, containing a massive amount of information. Not all videos are important
or widely viewed. As a result, many videos become junk that nevertheless contributes heavily to
big data problems. Apart from videos, surveillance cameras generate a lot of information in
seconds. Even a small digital camera capturing an image stores millions of pixels of information
in milliseconds.
Video data analytics dimensions:
Volume: Because videos are large, they take up network and server time for processing.
Low-bandwidth connections create traffic on the network because these videos are delivered
slowly. Video kept on secondary mass storage requires a huge amount of space and takes more
time to retrieve and process.
Variety: Videos come in various formats and varieties, such as HD videos, Blu-ray copies, etc.
Velocity: This is the speed of data. Nowadays, digital cameras capture and process videos at very
high quality and high speed. Video editing makes a video grow in size because it adds extra
information about the video. Videos grow in size quickly because they are simply collections of
images [7].
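
Because a video is a sequence of images, its raw size can be estimated as frame size multiplied by frame rate and duration. The sketch below assumes a 1920x1080 frame, 3 bytes per pixel (24-bit RGB), and 30 frames per second. These figures and the function names are illustrative assumptions, not values from the paper, and real video is normally stored compressed.

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed frame (default assumes 24-bit RGB)."""
    return width * height * bytes_per_pixel


def raw_video_bytes(
    width: int, height: int, fps: int, seconds: int, bytes_per_pixel: int = 3
) -> int:
    """Size of uncompressed video: one raw frame per displayed frame."""
    return raw_frame_bytes(width, height, bytes_per_pixel) * fps * seconds


frame = raw_frame_bytes(1920, 1080)
minute = raw_video_bytes(1920, 1080, fps=30, seconds=60)
print(f"one frame:  {frame / 1_000_000:.2f} MB")
print(f"one minute: {minute / 1_000_000_000:.2f} GB")
```

Under these assumptions a single frame holds about two million pixels, which illustrates why storage, network transfer, and processing time all grow quickly with video length.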
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
AUTHORS
Jai Prakash Verma has been associated with Nirma University since 2006. He joined as a
Lecturer in the MCA section of the Computer Science and Engineering Department. He received
a Bachelor of Science (B.Sc. in PCM) and an MCA from the University of Rajasthan, Jaipur. He
is currently pursuing his PhD at Charusat University, Changa, in the area of Big Data
Analysis. Data Warehousing and Mining is his main area of expertise. He has been
actively involved in many STTPs organized within Nirma University. He is currently
working as Assistant Professor in the MCA Section, Computer Science and Engineering Department,
Institute of Technology, Nirma University.
Smita Agrawal received a Bachelor of Science (B.Sc. in Chemistry) degree from Gujarat
University, Gujarat, India in 2001 and a Master's Degree in Computer Applications
(M.C.A.) from Gujarat Vidhyapith, Gujarat, India in 2004. She is pursuing a PhD in
Computer Science and Applications at Charotar University of Science and
Technology (CHARUSAT). She has been associated with the Computer Science and Engineering
Department of the Institute of Technology, Nirma University, since 2009. Her research
interests include Parallel Processing, Object Oriented Analysis & Design and Programming Language(s).