ML 1

Machine Learning(102045609) 12202080503005
PRACTICAL – 1
AIM: STUDY PYTHON ECOSYSTEM FOR MACHINE LEARNING: PYTHON, SCIPY, SCIKIT-LEARN.
1. Python:
- Description: Python is a versatile, high-level programming language widely used in
various fields, including machine learning. It is known for its readability, ease of use, and
extensive support through a vast ecosystem of libraries.
- Relevance to Machine Learning: Python serves as the primary language for developing
machine learning models due to its simplicity, readability, and the availability of numerous
libraries and frameworks.
2. NumPy:
- Description: NumPy is a fundamental package for scientific computing in Python. It
provides support for large, multi-dimensional arrays and matrices, along with a collection of
mathematical functions to operate on these arrays.
- Relevance to Machine Learning: NumPy is crucial for handling numerical operations
efficiently. Many machine learning libraries, including scikit-learn, rely heavily on NumPy
arrays for data manipulation.
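As a small illustration of the point above, the following sketch shows the kind of loop-free array arithmetic that machine learning code relies on. The array values are made up for this example and are not part of the practical.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) with 2 features (columns).
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Column-wise statistics are computed without explicit Python loops.
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)

# Broadcasting applies the per-column values to every row at once.
X_standardized = (X - col_mean) / col_std

print("Shape:", X.shape)
print("Column means:", col_mean)
print("Standardized data:")
print(X_standardized)
```

After this operation each column has mean 0 and standard deviation 1, which is the same standardization that many scikit-learn preprocessing tools perform internally on NumPy arrays.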
3. SciPy:
- Description: SciPy is an open-source library used for scientific and technical computing.
It builds on NumPy and provides additional functionality for optimization, integration,
interpolation, eigenvalue problems, and more.
- Relevance to Machine Learning: SciPy complements NumPy and is often used for
advanced mathematical operations required in machine learning algorithms. It enhances the
capabilities of NumPy, making it an essential part of the machine learning ecosystem.
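A minimal sketch of two of the SciPy features named above, optimization and integration, is given below. The function being minimized and the integrand are simple examples chosen for illustration only.

```python
import numpy as np
from scipy import integrate, optimize


# Optimization: minimize f(x) = (x - 3)^2, whose minimum lies at x = 3.
def f(x):
    return (x[0] - 3.0) ** 2


result = optimize.minimize(f, x0=np.array([0.0]))
print("Minimizer found:", result.x[0])

# Integration: integrate x^2 from 0 to 1; the exact value is 1/3.
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)
print("Integral value:", area)
```

Both routines take and return NumPy-compatible values, which is what is meant by SciPy building on NumPy.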
4. scikit-learn:
- Description: scikit-learn is an open-source machine learning library for Python. It
provides simple and efficient tools for data mining and data analysis, built on NumPy, SciPy,
and Matplotlib.
- Key Features:
- Provides a wide range of machine learning algorithms for classification, regression,
clustering, dimensionality reduction, and more.
- Offers tools for model selection, preprocessing, and evaluation.
- Has a consistent API, making it easy to experiment with different algorithms.
- Relevance to Machine Learning: scikit-learn is one of the most widely used machine
learning libraries due to its simplicity, ease of use, and comprehensive documentation. It's
suitable for both beginners and experts in the field.
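The consistent API mentioned in the key features can be sketched as follows: two different classifiers are trained and evaluated with the same `fit` and `score` calls. The Iris dataset, the two models, and the split settings are choices made for this illustration and are not prescribed by the practical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset and hold out 30% of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Two different algorithms, used through the same interface.
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # train on the training split
    scores[name] = model.score(X_test, y_test)  # accuracy on the test split
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Swapping in another estimator only requires changing the entry in the `models` dictionary; the training and evaluation code stays the same.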
When working on machine learning projects in Python, these libraries are often used in
combination for data manipulation, scientific computing, and the implementation of machine
learning models. Other libraries such as TensorFlow and PyTorch are popular for
deep learning applications, but scikit-learn remains a valuable tool for traditional machine
learning tasks.
PRACTICAL – 2
AIM: STUDY OF PREPROCESSING METHODS.
Data cleaning: Makes data easier to understand and use. It involves processing the data to
reduce noise and to treat missing values.
Feature engineering: Involves selecting, extracting, transforming, and creating new features
from the available data to improve the performance of machine learning algorithms.
Data reduction: Reduces the volume of data (fewer records or fewer attributes) while keeping
the information needed for analysis, which makes analyzing and mining huge amounts of data
easier and simpler.
Data transformation: A preprocessing task that includes any procedure that modifies the
original form of the data, such as normalization or encoding.
Deep learning: A subset of machine learning that uses artificial neural networks. Such
networks can learn useful representations from the data and find the patterns that are used
for decision making.
Feature scaling: Brings features onto a comparable scale so that features with large numeric
ranges do not dominate the others. It is one of the most important preprocessing steps
before creating a machine learning model.
Feature selection: A preprocessing step that reduces dimensionality by removing irrelevant
and redundant features, which also improves the comprehensibility of the results.
Data integration: A form of preprocessing that combines data from multiple sources.
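Three of the methods listed above (data cleaning, feature selection, and feature scaling) can be sketched with scikit-learn as shown below. The small array `X_raw` is hypothetical data invented for this example, and the specific tools used (`SimpleImputer`, `VarianceThreshold`, `StandardScaler`) are only one possible choice for each step.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: one missing value (NaN) and a constant third column.
X_raw = np.array([[1.0, 10.0, 5.0],
                  [2.0, np.nan, 5.0],
                  [3.0, 30.0, 5.0],
                  [4.0, 40.0, 5.0]])

# Data cleaning: replace the missing value with the mean of its column.
X_clean = SimpleImputer(strategy="mean").fit_transform(X_raw)

# Feature selection: drop features with zero variance (the constant column).
X_selected = VarianceThreshold(threshold=0.0).fit_transform(X_clean)

# Feature scaling: standardize each remaining feature to mean 0 and std 1.
X_scaled = StandardScaler().fit_transform(X_selected)

print("After cleaning:")
print(X_clean)
print("Shape after selection:", X_selected.shape)
print("After scaling:")
print(X_scaled)
```

Each step follows the same `fit_transform` pattern, so the steps can also be chained in a scikit-learn `Pipeline` if desired.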
Write a program to find the following statistics from a given dataset: mean, mode, median,
variance, standard deviation, quartiles, and interquartile range.
import numpy as np
from scipy import stats


def calculate_statistics(dataset):
    """Return common descriptive statistics for a 1-D dataset."""
    # Mean
    mean_value = np.mean(dataset)
    # Mode (np.atleast_1d handles both scalar and array results, because the
    # shape returned by stats.mode differs between SciPy versions)
    mode_value = np.atleast_1d(stats.mode(dataset).mode)[0]
    # Median
    median_value = np.median(dataset)
    # Variance (population variance, ddof=0)
    variance_value = np.var(dataset)
    # Standard Deviation (population standard deviation, ddof=0)
    std_deviation_value = np.std(dataset)
    # Quartiles
    first_quartile = np.percentile(dataset, 25)
    third_quartile = np.percentile(dataset, 75)
    # Interquartile Range
    interquartile_range = third_quartile - first_quartile
    return {
        'Mean': mean_value,
        'Mode': mode_value,
        'Median': median_value,
        'Variance': variance_value,
        'Standard Deviation': std_deviation_value,
        'First Quartile': first_quartile,
        'Third Quartile': third_quartile,
        'Interquartile Range': interquartile_range
    }


# Example usage:
dataset = [4, 8, 6, 2, 10, 5, 8, 7]
result = calculate_statistics(dataset)
# Display results
for key, value in result.items():
    print(f'{key}: {value}')
Output:
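One way to cross-check the results of the program is with Python's standard-library `statistics` module. The sketch below reuses the example dataset and is an independent check, not part of the original program. It also shows the sample variance (dividing by n - 1), which differs from the population variance (dividing by n) computed by `np.var` with its default settings.

```python
import statistics

dataset = [4, 8, 6, 2, 10, 5, 8, 7]

mean_value = statistics.mean(dataset)      # 6.25
median_value = statistics.median(dataset)  # 6.5
mode_value = statistics.mode(dataset)      # 8

# Population variance divides by n; sample variance divides by n - 1.
population_variance = statistics.pvariance(dataset)  # 5.6875
sample_variance = statistics.variance(dataset)       # 6.5
population_std = statistics.pstdev(dataset)

# method="inclusive" uses the same linear interpolation as the default
# behaviour of np.percentile, so the quartiles agree with the program.
q1, q2, q3 = statistics.quantiles(dataset, n=4, method="inclusive")
iqr = q3 - q1  # 8.0 - 4.75 = 3.25

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
print("Population variance:", population_variance)
print("Sample variance:", sample_variance)
print("Population standard deviation:", population_std)
print("Quartiles:", q1, q2, q3)
print("Interquartile range:", iqr)
```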
PRACTICAL – 3
AIM: Study and implement PCA in Python.
Principal Component Analysis (PCA) is a technique used for dimensionality reduction and
feature extraction. It's commonly used in machine learning and data analysis to transform
high-dimensional data into a lower-dimensional representation while preserving the most
important information.
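To make the idea of dimensionality reduction concrete, the following minimal sketch reduces a 4-feature dataset to 2 principal components and reports how much of the variance is kept. The Iris dataset bundled with scikit-learn is used here as an assumption for illustration; it is not the dataset used in this practical.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load the data: 150 samples, 4 features.
X, _ = load_iris(return_X_y=True)

# Keep only the 2 directions of largest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the total variance preserved by the 2 components.
retained = pca.explained_variance_ratio_.sum()

print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)
print(f"Variance retained by 2 components: {retained:.3f}")
```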
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Generate a reproducible random dataset
np.random.seed(42)
X = np.random.rand(100, 2)  # 100 samples, 2 features

# Fit PCA and project the data onto the principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("Explained Variance Ratio:", pca.explained_variance_ratio_)

# Plot the original and the transformed data side by side
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1])
plt.title("Original Data")
plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.title("Data after PCA")
plt.show()
output:
Steps:
1. Create a Sample Dataset:
- In this example, we generate a random dataset with 100 samples and 2 features.
2. Instantiate PCA:
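- Create a PCA object from scikit-learn with the desired number of components. In this
case, we set `n_components` to 2. Because the data has only 2 features, this keeps every
component: the data is centered and rotated onto its principal axes, not reduced in dimension.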
3. Fit and Transform:
- Fit the PCA model to the data and simultaneously transform the original data into the
principal components.
4. Explained Variance Ratio:
- The `explained_variance_ratio_` attribute provides the ratio of variance captured by each
principal component. It helps us understand how much information is retained in each
component.
5. Plotting:
- Visualize the original data and the transformed data after PCA using Matplotlib.
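What the fit and transform steps compute can be sketched directly with NumPy: center the data, take the eigenvectors of the covariance matrix, and project onto them. This is a minimal illustration under the assumption of a small, dense dataset, and it is not how scikit-learn is implemented internally. The random data is generated with `np.random.default_rng`, so the values differ from those in the program above.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.random((100, 2))  # 100 samples, 2 features

# 1. Center the data: PCA works on deviations from the feature means.
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the features (rowvar=False: columns are features).
cov = np.cov(X_centered, rowvar=False)

# 3. Eigen-decomposition; eigh is meant for symmetric matrices and
#    returns the eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Reorder so that the component with the largest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 5. Project the centered data onto the principal axes.
X_manual = X_centered @ eigenvectors
manual_ratio = eigenvalues / eigenvalues.sum()

# Compare with scikit-learn (the sign of each component may differ).
pca = PCA(n_components=2).fit(X)
print("Manual explained variance ratio: ", manual_ratio)
print("sklearn explained variance ratio:", pca.explained_variance_ratio_)
```

The two ratios agree, and the projected coordinates match those of `pca.transform(X)` up to the sign of each component, since an eigenvector and its negative describe the same axis.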