DS Lecture 15
Instructor:
Rabia Tariq
Lecturer, Department of Computer Science
Email: rabiatariq964@gmail.com
Lecture Content
• Exploratory Data Analysis
• Data Science Process
Exploratory Data Analysis
• Exploratory data analysis (EDA) is used by data
scientists to analyze and investigate data sets and
summarize their main characteristics, often employing
data visualization methods.
• EDA helps determine how best to manipulate data
sources to get the answers you need, making it easier
for data scientists to discover patterns, spot anomalies,
test a hypothesis, or check assumptions.
• EDA is primarily used to see what data can reveal
beyond the formal modeling or hypothesis testing task,
and it provides a better understanding of data set
variables and the relationships between them.
• It can also help determine if the statistical techniques
you are considering for data analysis are appropriate.
• EDA was originally developed by the American
mathematician John Tukey in the 1970s.
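To make "summarize their main characteristics" and "relationships between them" concrete, here is a minimal sketch using only Python's standard library. The data set (study hours and exam scores), the helper names `summarize` and `pearson`, and the choice of statistics are illustrative assumptions, not part of the lecture.

```python
import statistics

# Hypothetical sample: study hours and exam scores for eight students.
hours = [2.0, 3.5, 1.0, 4.0, 5.5, 3.0, 6.0, 4.5]
scores = [52.0, 61.0, 45.0, 70.0, 82.0, 58.0, 88.0, 74.0]


def summarize(values):
    """Return basic summary statistics for one numeric variable."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


hours_summary = summarize(hours)
scores_summary = summarize(scores)
r = pearson(hours, scores)

print(hours_summary)
print(scores_summary)
print(f"correlation(hours, scores) = {r:.3f}")
```

A correlation close to 1 here would suggest a strong linear relationship between the two variables, which is the kind of pattern EDA is meant to surface before any formal modeling.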
Importance of Exploratory Data
Analysis in Data Science
• The main purpose of EDA is to help look at data before
making any assumptions. It can help identify obvious
errors, better understand patterns within the data,
detect outliers or anomalous events, and find interesting
relations among the variables.
• Data scientists can use exploratory analysis to ensure the
results they produce are valid and applicable to any
desired business outcomes and goals.
• EDA also helps stakeholders by confirming they are
asking the right questions.
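One common way to "detect outliers or anomalous events" is the interquartile-range rule associated with Tukey (often called Tukey's fences). The sketch below is only one possible approach. The order counts, the function name `tukey_outliers`, and the conventional multiplier `k=1.5` are assumptions chosen for illustration.

```python
import statistics

# Hypothetical daily order counts; 480 is a likely data-entry error or anomaly.
orders = [52, 48, 55, 50, 47, 53, 49, 480, 51, 54]


def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


outliers = tukey_outliers(orders)
print(outliers)
```

Whether a flagged value is an error or a genuine event still has to be decided by looking at the data source; the rule only points at candidates.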
Importance of Mining Social Network Graphs in
Data Science
Mining social network graphs can be important in data science for several
reasons:
1. Understanding social connections:
• By analyzing the structure of a social network, it is possible to understand
how people are connected and identify patterns in those connections.
For example, one might study the connections between co-authors on
scientific papers to understand patterns of collaboration in a field.
2. Modeling the spread of information or disease:
• Social networks can be used to model the spread of information or
diseases through a population. By understanding the structure of the
network, researchers can predict how fast a piece of information or
disease might spread and identify key individuals who might play a role in
its spread.
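As a rough illustration of modeling spread, the sketch below uses breadth-first search to estimate in which round each person would be reached, under the strong simplifying assumption that everyone passes the information to all of their contacts in every round. The network, the names, and the function `spread_rounds` are hypothetical.

```python
from collections import deque

# Hypothetical undirected friendship network as an adjacency list.
network = {
    "Ali": ["Sara", "Bilal"],
    "Sara": ["Ali", "Hina"],
    "Bilal": ["Ali", "Hina"],
    "Hina": ["Sara", "Bilal", "Omar"],
    "Omar": ["Hina"],
}


def spread_rounds(graph, seed):
    """Breadth-first search: round in which each reachable person is reached."""
    reached = {seed: 0}
    queue = deque([seed])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in reached:
                reached[friend] = reached[person] + 1
                queue.append(friend)
    return reached


rounds = spread_rounds(network, "Ali")
print(rounds)
```

Realistic models of information or disease spread are usually probabilistic; this deterministic version only shows how the network structure bounds the speed of spread.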
Importance of Mining Social Network Graphs in
Data Science
3. Improving recommendation systems:
• Social network data can be used to improve recommendation
systems by incorporating the preferences and interests of an
individual's connections. For example, a movie recommendation
system might use data from a social network to recommend movies
that are popular among an individual's friends.
4. Identifying communities and influence:
• Social network analysis can be used to identify communities within a
network, and to identify individuals who have a high level of
influence within those communities. This can be useful for marketing
and public relations efforts, as well as for identifying key stakeholders
in a particular domain.
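The movie example in point 3 can be sketched as a simple count of how many of a user's friends liked each title. This is a toy illustration, not a production recommender. The friendship data, the liked-movie sets, and the function `recommend` are all hypothetical.

```python
from collections import Counter

# Hypothetical data: who is friends with whom, and which movies each person liked.
friends = {
    "Ali": ["Sara", "Bilal", "Hina"],
    "Sara": ["Ali"],
    "Bilal": ["Ali"],
    "Hina": ["Ali"],
}
liked = {
    "Ali": {"Inception"},
    "Sara": {"Inception", "Interstellar", "Coco"},
    "Bilal": {"Interstellar", "Up"},
    "Hina": {"Interstellar", "Coco"},
}


def recommend(user, top_n=2):
    """Rank movies by how many friends liked them, skipping ones the user already liked."""
    counts = Counter()
    for friend in friends.get(user, []):
        for movie in liked.get(friend, set()):
            if movie not in liked.get(user, set()):
                counts[movie] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [movie for movie, _ in ranked[:top_n]]


recommendations = recommend("Ali")
print(recommendations)
```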
Representing social network data
through a graph
• A social network can be represented in many ways, but the most
common is a graph in which the nodes represent individuals and
the edges represent the relationships between them.
• Relationships can be of various types, such as friendship, family,
or co-worker ties.
• Some common tasks in mining a social network graph include
finding communities or clusters within the network, predicting
missing or future relationships, and identifying key influencers or
central nodes within the network.
• There are many different tools and techniques that can be used for
mining a social network graph, including machine learning algorithms,
graph theory, and network analysis.
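A minimal sketch of this representation: nodes are people, and each edge carries a relationship type. It also shows one simple heuristic for predicting missing relationships, namely counting common neighbours. The relationship list and the function `common_neighbour_scores` are hypothetical, and common-neighbour counting is only one of many link-prediction techniques.

```python
from itertools import combinations

# Hypothetical relationships: (person, person, relationship type).
relationships = [
    ("Ali", "Sara", "friend"),
    ("Ali", "Bilal", "co-worker"),
    ("Sara", "Bilal", "friend"),
    ("Bilal", "Hina", "family"),
    ("Sara", "Hina", "friend"),
]

# Adjacency map: node -> {neighbour: relationship type}. Edges are undirected.
graph = {}
for a, b, kind in relationships:
    graph.setdefault(a, {})[b] = kind
    graph.setdefault(b, {})[a] = kind


def common_neighbour_scores(g):
    """Score each unconnected pair by the number of neighbours they share."""
    scores = {}
    for a, b in combinations(sorted(g), 2):
        if b not in g[a]:
            shared = set(g[a]) & set(g[b])
            if shared:
                scores[(a, b)] = len(shared)
    return scores


candidate_links = common_neighbour_scores(graph)
print(graph["Bilal"])
print(candidate_links)
```

Here Ali and Hina are not connected but share two neighbours, so the heuristic flags them as a plausible missing or future relationship.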
Steps for mining a social network graph
Here are the general steps that can be followed when mining a social
network graph:
1. Gather data on the social network:
This can be done using APIs or web scraping techniques to access data
from social media platforms or other sources.
2. Construct the social network graph:
Use the data to build a graph representation of the network, with
nodes representing individuals or organizations and edges
representing the connections or relationships between them.
3. Visualize the social network graph:
Use tools such as Gephi or NodeXL to visualize the network and identify
key nodes and connections.
4. Analyze the social network:
Use statistical and machine learning techniques to identify patterns and
trends in the data, and to forecast future events or behaviors within the
network.
5. Interpret and communicate the results:
Use the insights gained from the analysis to understand the structure
and dynamics of the social network, and to inform decision-making or
strategic planning.
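The steps above can be sketched end to end on a toy data set. Everything here is an assumption for illustration: the interaction records stand in for data that would come from an API or scraper, connected components are used as a very coarse stand-in for community detection, and visualization (step 3) is left to tools such as Gephi or NodeXL.

```python
# Step 1 (gather): hypothetical interaction records standing in for API or scraped data.
records = [
    {"from": "Ali", "to": "Sara"},
    {"from": "Sara", "to": "Bilal"},
    {"from": "Ali", "to": "Bilal"},
    {"from": "Sara", "to": "Zara"},
    {"from": "Hina", "to": "Omar"},
    {"from": "Sara", "to": "Ali"},  # repeated interaction between the same pair
]

# Step 2 (construct): undirected graph as node -> set of neighbours.
graph = {}
for record in records:
    a, b = record["from"], record["to"]
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


# Step 4 (analyze): connected components as a very coarse grouping of the network.
def connected_components(g):
    seen, components = set(), []
    for start in sorted(g):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(g[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


components = connected_components(graph)
degrees = {node: len(neighbours) for node, neighbours in graph.items()}

# Step 5 (interpret): report the groups and the best-connected person.
most_connected = max(sorted(degrees), key=degrees.get)
print(components)
print(most_connected, degrees[most_connected])
```

Storing neighbours in a set means repeated interactions between the same pair collapse into a single edge; a weighted graph would be needed to keep interaction counts.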
Important terms for mining social
network graphs
• There are several key terms and concepts that are important to
understand when it comes to mining social network graphs. These
include:
1. Graph theory: The study of graphs and the relationships between
their nodes and edges.
2. Node: A node is a point or vertex in a graph, representing an
individual or organization.
3. Edge: An edge is a line connecting two nodes in a graph, representing
a relationship or connection between them.
4. Network analysis: The process of using graph theory and data
analysis techniques to understand the structure and dynamics of a
network.
5. Centrality measures: Metrics that quantify the importance or
influence of a node within a network. Examples include degree
centrality, betweenness centrality, and eigenvector centrality.
In a social network graph, centrality measures can be used to
identify individuals or organizations that are highly connected or
influential within the network.
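Two of the measures named above can be computed by hand on a small graph. The sketch below assumes an undirected, unweighted network. It computes degree centrality as the fraction of other nodes a node is connected to, and unnormalised betweenness centrality using Brandes' algorithm, which is one standard way to compute it. The network and function names are hypothetical, and eigenvector centrality is not shown.

```python
from collections import deque

# Hypothetical undirected network: a tight group joined to "Omar" through "Hina".
graph = {
    "Ali": ["Sara", "Bilal", "Hina"],
    "Sara": ["Ali", "Bilal"],
    "Bilal": ["Ali", "Sara"],
    "Hina": ["Ali", "Omar"],
    "Omar": ["Hina"],
}


def degree_centrality(g):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(g)
    return {node: len(neigh) / (n - 1) for node, neigh in g.items()}


def betweenness_centrality(g):
    """Unnormalised betweenness via Brandes' algorithm for unweighted graphs."""
    score = {node: 0.0 for node in g}
    for source in g:
        # BFS from source, counting shortest paths (sigma) and predecessors.
        order = []
        preds = {node: [] for node in g}
        sigma = {node: 0 for node in g}
        dist = {node: -1 for node in g}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest nodes inward.
        delta = {node: 0.0 for node in g}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


degree = degree_centrality(graph)
betweenness = betweenness_centrality(graph)
print(degree)
print(betweenness)
```

In this toy network Ali has both the highest degree and the highest betweenness, while Hina has a modest degree but a high betweenness because every shortest path to Omar passes through her. The two measures can therefore rank nodes differently.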