All Exp Lab
EXP 1
To collect and visualize complex social networks from Twitter and Wikipedia using NodeXL and Python, you can gather the network data (for example, with NodeXL's social media importers or the platforms' APIs), export it as an edge list, and then load and visualize it in Python, as sketched below.
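Below is a minimal sketch assuming the collected network has already been exported to a CSV edge list; the file name `twitter_edges.csv` and the column names `source` and `target` are hypothetical placeholders:
```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# Load an edge list exported from NodeXL or a collection script
# ('twitter_edges.csv' with 'source'/'target' columns is a placeholder)
edges = pd.read_csv('twitter_edges.csv')
G = nx.from_pandas_edgelist(edges, source='source', target='target')

# Visualize the collected network
nx.draw(G, with_labels=True, node_size=50, font_size=6)
plt.show()
```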
EXP 2
To compute various vertex and network metrics for social graphs using NodeXL and Python, you
can utilize the NetworkX library, which provides implementations for these metrics. Below is an
example code demonstrating how to compute each of the specified metrics:
```python
import networkx as nx
import matplotlib.pyplot as plt
# Load your graph; a built-in sample graph is used here for illustration
# (e.g., G = nx.read_edgelist('your_graph_file.txt') to load from a file)
G = nx.karate_club_graph()

# (i) Degree centrality
degree_centrality = nx.degree_centrality(G)
# (ii) Betweenness centrality
betweenness_centrality = nx.betweenness_centrality(G)
# (iii) Closeness centrality
closeness_centrality = nx.closeness_centrality(G)
# (iv) PageRank
pagerank = nx.pagerank(G)

# Print the computed metrics
print("Degree centrality:", degree_centrality)
print("Betweenness centrality:", betweenness_centrality)
print("Closeness centrality:", closeness_centrality)
print("PageRank:", pagerank)
# Example: Visualize the network with node sizes representing degree centrality
# You can customize the visualization as per your preference
node_sizes = [degree_centrality[node] * 1000 for node in G.nodes()]
nx.draw(G, with_labels=True, node_size=node_sizes)
plt.show()
```
Make sure to replace `'your_graph_file.txt'` with the path to your social graph file if you're loading
it from a file. Also, ensure you have installed the required libraries (`networkx`, `matplotlib`) using
pip.
This code snippet computes the specified metrics for the given social graph and prints them out.
You can also visualize the network with customized node sizes based on degree centrality as shown
in the example. Adjustments and customizations can be made according to your specific
requirements.
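Since the experiment covers network-level as well as vertex-level metrics, a short addition run on the same graph `G` could compute graph-wide quantities:
```python
# Network-level metrics on the same graph G
print("Density:", nx.density(G))
print("Average clustering coefficient:", nx.average_clustering(G))
# The diameter is only defined for connected graphs
if nx.is_connected(G):
    print("Diameter:", nx.diameter(G))
```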
EXP 3
To visualize social graphs reflecting various metrics using NodeXL and Python, you can use
NetworkX for graph manipulation and Matplotlib for visualization. Below is a sample code
demonstrating how to visualize a social graph while reflecting the computed metrics:
```python
import networkx as nx
import matplotlib.pyplot as plt
# Load your graph; a built-in sample graph is used here for illustration
G = nx.karate_club_graph()

# Compute Degree Centrality and PageRank
degree_centrality = nx.degree_centrality(G)
pagerank = nx.pagerank(G)

# Position nodes using a spring layout
pos = nx.spring_layout(G)

# Draw nodes colored by degree centrality, keeping node size constant
nodes = nx.draw_networkx_nodes(G, pos, node_size=300,
                               node_color=list(degree_centrality.values()),
                               cmap=plt.cm.viridis)

# Draw edges with a semi-transparent gray color
nx.draw_networkx_edges(G, pos, alpha=0.5, edge_color='gray')

# Label nodes with their ID and PageRank value
labels = {node: f"{node}\n{pagerank[node]:.3f}" for node in G.nodes()}
nx.draw_networkx_labels(G, pos, labels=labels, font_size=8)

# Add a color bar indicating degree centrality
plt.colorbar(nodes, label='Degree centrality')
# Add title
plt.title("Social Network Graph with Degree Centrality and PageRank")
# Show plot
plt.axis('off') # Turn off axis
plt.show()
```
In this code:
- We compute Degree Centrality and PageRank for the given social graph.
- The graph is visualized using a spring layout.
- Nodes are colored based on their degree centrality, and node size remains constant.
- Edges are drawn with a semi-transparent gray color.
- Node labels are added, displaying both the node ID and its corresponding PageRank value.
- A color bar is added to indicate the degree centrality of nodes.
You can customize the visualization further according to your specific requirements or include
additional metrics to reflect in the visualization.
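As one possible variation, reusing `G` and `pos` from the snippet above, you could reflect an additional metric such as betweenness centrality in the node sizes:
```python
# Scale node sizes by betweenness centrality instead of keeping them constant
betweenness = nx.betweenness_centrality(G)
sizes = [100 + 3000 * betweenness[node] for node in G.nodes()]
nx.draw(G, pos, node_size=sizes, with_labels=True)
plt.show()
```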
EXP 4
Detecting bridges in a social graph helps identify edges whose removal would disconnect the graph.
You can accomplish this using NetworkX in Python. Below is how you can implement it:
```python
import networkx as nx

# Load your graph; a built-in sample graph is used here for illustration
G = nx.karate_club_graph()

# Detect bridges: edges whose removal would disconnect the graph
bridges = list(nx.bridges(G))
print("Bridges:", bridges)
```
In this code:
- We utilize NetworkX's `nx.bridges(G)` function to detect bridges in the graph `G`.
- Bridges are edges whose removal would disconnect the graph.
- `nx.bridges(G)` yields bridge edges as tuples `(u, v)`; wrapping it in `list()` collects them for printing or further analysis.
You can adapt this code to your specific graph data and further analyze or visualize the detected
bridges as required.
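As a possible follow-up, reusing `G` and `bridges` from above, the detected bridges can be highlighted in a drawing:
```python
import matplotlib.pyplot as plt

# Draw the graph, then overlay the bridges in red
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_color='lightblue')
nx.draw_networkx_edges(G, pos, edgelist=bridges, edge_color='red', width=2)
plt.show()
```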
EXP 5
Detecting communities and influencers in a social graph can provide insights into the structure and
key players within the network. One approach to identifying communities is through clique
identification, where cliques represent densely connected subgraphs. Below is a basic
implementation of brute-force clique identification on Enron email data using NetworkX in Python:
```python
import networkx as nx
# Load the Enron email graph from an edge list file
# ('enron_edges.txt' is a placeholder path)
G = nx.read_edgelist('enron_edges.txt')

# Enumerate all maximal cliques (exponential in the worst case)
cliques = list(nx.find_cliques(G))

# Treat cliques of size 3 or more as candidate communities
communities = [c for c in cliques if len(c) >= 3]
print("Number of candidate communities:", len(communities))
```
This approach is a brute-force method and may not be efficient for large graphs. You can explore
more advanced community detection algorithms such as Louvain or Girvan-Newman for more
scalable solutions. Additionally, you can analyze the identified cliques further to identify influencers
within each community based on their centrality metrics or other criteria.
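One simple, illustrative way to surface an influencer per community is to pick the clique member with the highest degree centrality, reusing `G` and `communities` from above:
```python
# Rank each community's members by degree centrality
degree_centrality = nx.degree_centrality(G)
for i, community in enumerate(communities):
    influencer = max(community, key=degree_centrality.get)
    print(f"Community {i}: influencer = {influencer}")
```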
EXP 6
The Girvan-Newman algorithm detects communities by iteratively removing the edges with the highest betweenness centrality. Below is a basic implementation using NetworkX in Python:
```python
import networkx as nx
import matplotlib.pyplot as plt
# Load your graph; a built-in sample graph is used here for illustration
G = nx.karate_club_graph()

# Apply the Girvan-Newman algorithm and take the first split into communities
communities_generator = nx.algorithms.community.girvan_newman(G)
first_split = next(communities_generator)
communities = [sorted(c) for c in first_split]
print("Detected communities:", communities)

# Visualize the communities with distinct colors
pos = nx.spring_layout(G)
colors = ['red', 'blue', 'green', 'orange']
for i, community in enumerate(communities):
    nx.draw_networkx_nodes(G, pos, nodelist=community,
                           node_color=colors[i % len(colors)])
nx.draw_networkx_edges(G, pos, alpha=0.5)
nx.draw_networkx_labels(G, pos)
plt.show()
```
You can customize this code according to your specific graph data and further analyze or visualize
the detected communities as required.
EXP 7
Performing classification with network information involves leveraging features derived from the network structure to classify nodes. One well-known approach is the Weighted Vote Relational Neighbor (wvRN) classifier, which predicts a node's label from the labels of its neighbors. Below is a simplified implementation in that spirit for Twitter data using NodeXL and Python: it extracts neighborhood-based features and trains a k-nearest neighbors classifier on them.
```python
import networkx as nx
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Sample labeled graph; replace with your Twitter graph. The built-in
# 'club' node attribute stands in for class labels here.
G = nx.karate_club_graph()

def extract_features(G, node):
    # Node degree and average neighbor degree as relational features
    neighbor_degrees = [G.degree(n) for n in G.neighbors(node)]
    return [G.degree(node), sum(neighbor_degrees) / len(neighbor_degrees)]

X = [extract_features(G, node) for node in G.nodes()]
y = [G.nodes[node]['club'] for node in G.nodes()]

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Train a k-nearest neighbors classifier and predict on the test set
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
In this code:
- We define a function `extract_features()` to extract features from the graph for classification. In
this example, we extract the node's degree and average neighbor degree.
- We iterate over all nodes in the graph and extract features along with their labels for classification.
- We split the data into training and test sets.
- We train a K-nearest neighbors classifier using the extracted features.
- We evaluate the classifier's performance by predicting labels for the test data and calculating
accuracy.
You can customize this code according to your specific Twitter data and classification requirements.
Additionally, you can explore more sophisticated classifiers and feature extraction techniques for
better classification performance.
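For reference, the actual wvRN voting rule predicts a node's label by a weighted vote over its neighbors' labels. A minimal sketch of that step, assuming an unweighted graph and a `labels` dict mapping already-labeled nodes to classes, might look like:
```python
def wvrn_predict(G, labels, node):
    # Tally votes from labeled neighbors (each edge counts as weight 1)
    votes = {}
    for neighbor in G.neighbors(node):
        if neighbor in labels:
            votes[labels[neighbor]] = votes.get(labels[neighbor], 0) + 1
    # Return the majority class, or None if no neighbor is labeled yet
    return max(votes, key=votes.get) if votes else None
```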
EXP 8
Performing sentiment analysis on an IMDb dataset involves analyzing the sentiment (positive or
negative) associated with movie reviews. Below is a basic implementation of sentiment analysis on
an IMDb dataset using Python with the NLTK library:
```python
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Load the IMDb dataset into a pandas DataFrame
# (replace 'imdb_dataset.csv' with the path to your file)
df = pd.read_csv('imdb_dataset.csv')

# Initialize the VADER analyzer
# (requires nltk.download('vader_lexicon') to have been run once)
sia = SentimentIntensityAnalyzer()

def get_sentiment(text):
    # Classify sentiment from VADER's compound polarity score
    score = sia.polarity_scores(text)['compound']
    if score >= 0.05:
        return 'positive'
    elif score <= -0.05:
        return 'negative'
    return 'neutral'

# Apply sentiment analysis to the 'Review' column
df['Sentiment'] = df['Review'].apply(get_sentiment)

# Print the counts of positive, negative, and neutral reviews
print(df['Sentiment'].value_counts())
```
In this code:
- We use the NLTK library's VADER (Valence Aware Dictionary and sEntiment Reasoner)
sentiment analysis tool for sentiment analysis.
- We load the IMDb dataset into a pandas DataFrame.
- We define a function `get_sentiment()` to calculate sentiment polarity scores using VADER and
classify the sentiment as positive, negative, or neutral based on the compound score.
- We apply sentiment analysis to the 'Review' column of the IMDb dataset and add a new column
'Sentiment' to store the sentiment labels.
- Finally, we print the sentiment analysis results, showing the counts of positive, negative, and
neutral sentiments in the dataset.
You need to ensure you have the NLTK library installed (`pip install nltk`) and have downloaded
the VADER lexicon (`nltk.download('vader_lexicon')`). Additionally, replace `'imdb_dataset.csv'`
with the path to your IMDb dataset CSV file.
EXP 9
To apply the k-means clustering algorithm on an IMDb dataset using Python, you can use libraries
such as scikit-learn. Below is a basic implementation:
```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
# Load the IMDb dataset into a pandas DataFrame
# (replace 'imdb_dataset.csv' with the path to your file)
df = pd.read_csv('imdb_dataset.csv')

# Extract text data (e.g., movie reviews; 'Review' is a placeholder column name)
text_data = df['Review']

# Convert the text data into TF-IDF features
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X = vectorizer.fit_transform(text_data)

# Apply k-means clustering with a specified number of clusters (k)
k = 5
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
kmeans.fit(X)

# Add cluster labels to the IMDb dataset
df['Cluster'] = kmeans.labels_

# Print the count of movies in each cluster
print(df['Cluster'].value_counts())
```
In this code:
- We use the scikit-learn library to perform k-means clustering.
- We extract text data (e.g., movie reviews) from the IMDb dataset.
- We initialize a TF-IDF vectorizer to convert text data into numerical features.
- We fit and transform the text data into TF-IDF features.
- We apply k-means clustering with a specified number of clusters (k).
- We add cluster labels to the IMDb dataset.
- Finally, we print the count of movies in each cluster.
You need to ensure you have scikit-learn and pandas installed (`pip install scikit-learn pandas`).
Additionally, replace `'imdb_dataset.csv'` with the path to your IMDb dataset CSV file. Adjust the
parameters of the TF-IDF vectorizer and the number of clusters (k) according to your specific
dataset and requirements.
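To pick a reasonable k, one common heuristic is the elbow method: compare the k-means inertia across candidate values on the same TF-IDF matrix `X` and look for the point of diminishing returns:
```python
# Elbow-method sketch: inspect how inertia drops as k grows
for k in range(2, 10):
    km = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X)
    print(k, km.inertia_)
```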
EXP 10
To apply user-based collaborative filtering on Amazon review data using Python, you can use
libraries such as Surprise. Below is a basic implementation:
```python
import pandas as pd
from surprise import Dataset, Reader, KNNBasic
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse

# Load the Amazon review data into a Surprise dataset
# (the column names 'user_id', 'item_id', 'rating' are placeholders)
df = pd.read_csv('amazon_reviews.csv')
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

# Split the data into training and test sets
trainset, testset = train_test_split(data, test_size=0.25)

# Define the user-based collaborative filtering model with cosine similarity
sim_options = {'name': 'cosine', 'user_based': True}
model = KNNBasic(sim_options=sim_options)

# Train the model and make predictions on the test set
model.fit(trainset)
predictions = model.test(testset)
# Compute RMSE (Root Mean Squared Error) to evaluate the model performance
accuracy = rmse(predictions)
print("RMSE:", accuracy)
```
In this code:
- We use the Surprise library, which provides collaborative filtering algorithms and evaluation
metrics.
- We load the Amazon review data into a Surprise dataset object.
- We split the data into training and test sets.
- We define the user-based collaborative filtering model using the KNNBasic algorithm with cosine
similarity.
- We train the model on the training set.
- We make predictions on the test set.
- We compute RMSE to evaluate the model's performance.
You need to ensure you have Surprise installed (`pip install scikit-surprise`). Additionally, replace
`'amazon_reviews.csv'` with the path to your Amazon review data CSV file. Adjust the parameters
of the model and evaluation metrics according to your specific dataset and requirements.
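Once trained, the model can also score an individual user-item pair; the IDs below are hypothetical placeholders:
```python
# Predict the rating a given user would assign a given item
pred = model.predict(uid='A123', iid='B456')  # hypothetical raw IDs
print("Estimated rating:", pred.est)
```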
EXP 11
To apply item-based collaborative filtering on Amazon review data using Python, you can still use the Surprise library; you only need to set the `user_based` parameter to `False`. Below is how you can implement it:
```python
import pandas as pd
from surprise import Dataset, Reader, KNNBasic
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse

# Load the Amazon review data into a Surprise dataset
# (the column names 'user_id', 'item_id', 'rating' are placeholders)
df = pd.read_csv('amazon_reviews.csv')
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

# Split the data into training and test sets
trainset, testset = train_test_split(data, test_size=0.25)

# Define the item-based model: cosine similarity with user_based=False
sim_options = {'name': 'cosine', 'user_based': False}
model = KNNBasic(sim_options=sim_options)

# Train the model and make predictions on the test set
model.fit(trainset)
predictions = model.test(testset)
# Compute RMSE (Root Mean Squared Error) to evaluate the model performance
accuracy = rmse(predictions)
print("RMSE:", accuracy)
```
In this code:
- We still use the Surprise library for collaborative filtering.
- We load the Amazon review data into a Surprise dataset object.
- We split the data into training and test sets.
- We define the item-based collaborative filtering model using the KNNBasic algorithm with cosine
similarity and `user_based` parameter set to `False`.
- We train the model on the training set.
- We make predictions on the test set.
- We compute RMSE to evaluate the model's performance.
You need to ensure you have Surprise installed (`pip install scikit-surprise`). Additionally, replace
`'amazon_reviews.csv'` with the path to your Amazon review data CSV file. Adjust the parameters
of the model and evaluation metrics according to your specific dataset and requirements.
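With an item-based model, you can also retrieve the items most similar to a given one; the item ID below is a hypothetical placeholder:
```python
# Find the 5 items most similar to a given item
inner_id = trainset.to_inner_iid('B456')  # 'B456' is a hypothetical item ID
neighbors = model.get_neighbors(inner_id, k=5)
print([trainset.to_raw_iid(i) for i in neighbors])
```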
EXP 12
Predicting individual behavior of users in social media can involve various techniques depending on the specific behavior you're interested in. One common approach is to use machine learning algorithms to predict user actions or preferences based on historical data and user features. Below is a basic example of how you can predict user behavior in social media using Python with scikit-learn:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load the social media data (replace 'social_media_data.csv' with your file;
# it should contain feature columns and a target column for the behavior)
df = pd.read_csv('social_media_data.csv')

# Define features (X) and the target variable (y)
X = df.drop(columns=['target_column'])
y = df['target_column']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Define and train the model to predict user behavior
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model's performance using accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
```
In this code:
- We load social media data into a pandas DataFrame. This data should contain features related to
user behavior and a target column representing the behavior you want to predict.
- We define features (X) and the target variable (y).
- We split the data into training and test sets.
- We define a machine learning model (e.g., RandomForestClassifier) to predict user behavior.
- We train the model on the training set.
- We make predictions on the test set.
- We evaluate the model's performance using accuracy score.
You need to ensure you have pandas and scikit-learn installed (`pip install pandas scikit-learn`).
Additionally, replace `'social_media_data.csv'` with the path to your social media data CSV file and
`'target_column'` with the name of the target column representing the behavior you want to predict.
Adjust the machine learning model and parameters according to your specific dataset and prediction
task.
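As a follow-up, a random forest also exposes feature importances, which can hint at which user attributes drive the predicted behavior (reusing `model` and `X` from above):
```python
# Inspect which features contribute most to the predictions
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```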