ML 1
PRACTICAL – 1
AIM: STUDY PYTHON ECOSYSTEM FOR MACHINE LEARNING: PYTHON, SCIPY, SCIKIT-LEARN.
1. Python:
- Relevance to Machine Learning: Python serves as the primary language for developing
machine learning models due to its simplicity, readability, and the availability of numerous
libraries and frameworks.
2. NumPy:
3. SciPy:
- Description: SciPy is an open-source library used for scientific and technical computing.
It builds on NumPy and provides additional functionality for optimization, integration,
interpolation, eigenvalue problems, and more.
- Relevance to Machine Learning: SciPy complements NumPy and is often used for
advanced mathematical operations required in machine learning algorithms. It enhances the
capabilities of NumPy, making it an essential part of the machine learning ecosystem.
4. scikit-learn:
Machine Learning(102045609) 12202080503005
- Key Features:
- Relevance to Machine Learning: scikit-learn is one of the most widely used machine
learning libraries due to its simplicity, ease of use, and comprehensive documentation. It's
suitable for both beginners and experts in the field.
When working on machine learning projects in Python, these libraries are often used in
combination to handle data manipulation, scientific computing, and implementing machine
learning models. Additionally, other libraries like TensorFlow and PyTorch are popular for
deep learning applications, but scikit-learn remains a valuable tool for traditional machine
learning tasks.
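As a minimal sketch of how the three libraries fit together, the example below builds data with NumPy, measures a correlation with SciPy, and fits a model with scikit-learn. The synthetic data (a noisy line with slope 3 and intercept 2) and all variable names are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y is roughly 3*x + 2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=50)

# SciPy: Pearson correlation between x and y.
r, p_value = stats.pearsonr(x, y)

# scikit-learn: fit a linear model; it expects a 2-D feature array.
model = LinearRegression().fit(x.reshape(-1, 1), y)
slope = model.coef_[0]
intercept = model.intercept_

print(f"correlation: {r:.3f}")
print(f"slope: {slope:.3f}, intercept: {intercept:.3f}")
```

The fitted slope and intercept should land close to the values used to generate the data, which is a quick sanity check that the pieces are wired together correctly.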
PRACTICAL – 2
AIM: STUDY OF PREPROCESSING METHODS.
Data cleaning: Detects and corrects problems in the raw data, such as noise, missing values,
and inconsistent entries, so that the data is easier to understand and use.
Feature engineering: Involves selecting, extracting, transforming, and creating new features
from the available data to improve the performance of machine learning algorithms.
Data reduction: Reduces the volume of data (fewer records or fewer attributes) while keeping
its analytical value, which makes the analysis and mining of large datasets simpler and faster.
Data transformation: A preprocessing task that covers any procedure that changes the
original form of the data, such as normalization, encoding, or aggregation.
Deep learning: A subset of machine learning that uses artificial neural networks. Such
networks can learn useful feature representations directly from raw data, which reduces the
need for manual preprocessing; the learned patterns are then used for decision making.
Feature scaling: Brings features onto a comparable range (for example by standardization or
min-max normalization) so that no feature dominates only because of its units. It is one of the
most important preprocessing steps before fitting a machine learning model.
Feature selection: A preprocessing step that reduces dimensionality by removing irrelevant
and redundant features, which also makes the results easier to interpret.
Data integration: A form of preprocessing that combines data from multiple sources into a
single, consistent dataset.
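Two of the methods listed above, data cleaning and feature scaling, can be sketched with NumPy alone. The small table of ages and incomes is hypothetical, and mean imputation is only one of several possible ways to treat missing values.

```python
import numpy as np

# Hypothetical raw data: rows are samples, columns are (age, income).
# NaN marks a missing value.
raw = np.array([
    [25.0, 50000.0],
    [32.0, np.nan],
    [47.0, 82000.0],
    [51.0, 61000.0],
])

# Data cleaning: replace each missing value with the mean of its column.
col_means = np.nanmean(raw, axis=0)
cleaned = np.where(np.isnan(raw), col_means, raw)

# Feature scaling (standardization): zero mean and unit variance per column.
standardized = (cleaned - cleaned.mean(axis=0)) / cleaned.std(axis=0)

# Feature scaling (min-max): rescale each column to the range [0, 1].
col_min = cleaned.min(axis=0)
col_max = cleaned.max(axis=0)
min_max = (cleaned - col_min) / (col_max - col_min)

print(cleaned)
print(standardized.round(3))
print(min_max.round(3))
```

After scaling, age and income are on comparable ranges, so a distance-based algorithm would no longer be dominated by the income column.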
Write a program to find the following statistics from a given dataset: mean, mode, median,
variance, standard deviation, quartiles, and interquartile range.
import numpy as np
from scipy import stats

def calculate_statistics(dataset):
    # Mean
    mean_value = np.mean(dataset)
    # Mode (np.atleast_1d copes with SciPy versions that return a scalar or an array)
    mode_value = np.atleast_1d(stats.mode(dataset).mode)[0]
    # Median
    median_value = np.median(dataset)
    # Variance (population variance, ddof=0)
    variance_value = np.var(dataset)
    # Standard Deviation (population, ddof=0)
    std_deviation_value = np.std(dataset)
    # Quartiles
    first_quartile = np.percentile(dataset, 25)
    third_quartile = np.percentile(dataset, 75)
    # Interquartile Range
    interquartile_range = third_quartile - first_quartile
    return {
        'Mean': mean_value,
        'Mode': mode_value,
        'Median': median_value,
        'Variance': variance_value,
        'Standard Deviation': std_deviation_value,
        'First Quartile': first_quartile,
        'Third Quartile': third_quartile,
        'Interquartile Range': interquartile_range
    }

# Example usage:
dataset = [4, 8, 6, 2, 10, 5, 8, 7]
result = calculate_statistics(dataset)

# Display results
for key, value in result.items():
    print(f'{key}: {value}')
Output:
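Note on the variance and standard deviation in the program above: `np.var` and `np.std` divide by n by default, which gives population statistics. When the dataset is a sample from a larger population, the unbiased estimate divides by n - 1 instead. The sketch below shows the difference on the same example dataset; the variable names are chosen for illustration.

```python
import numpy as np

dataset = [4, 8, 6, 2, 10, 5, 8, 7]

# ddof=0 (NumPy's default) divides by n: population variance.
population_variance = np.var(dataset)

# ddof=1 divides by n - 1: unbiased sample variance.
sample_variance = np.var(dataset, ddof=1)
sample_std = np.std(dataset, ddof=1)

print(f"Population variance: {population_variance}")
print(f"Sample variance: {sample_variance}")
print(f"Sample standard deviation: {sample_std}")
```

For this dataset the sum of squared deviations from the mean is 45.5, so the population variance is 45.5 / 8 = 5.6875 and the sample variance is 45.5 / 7 = 6.5.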
PRACTICAL – 3
AIM: Study and implement PCA in Python.
Principal Component Analysis (PCA) is a technique used for dimensionality reduction and
feature extraction. It's commonly used in machine learning and data analysis to transform
high-dimensional data into a lower-dimensional representation while preserving as much of
the data's variance as possible.
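import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

np.random.seed(42)
# Random dataset with 100 samples and 2 features (uniform values are an assumption)
X = np.random.rand(100, 2)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1])
plt.title("Original Data")
plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.title("Data after PCA")  # panel title assumed; not in the original listing
plt.show()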
output:
Steps:
1. Generate data:
- In this example, we generate a random dataset with 100 samples and 2 features.
2. Instantiate PCA:
- Create a PCA object from scikit-learn with the desired number of components. In this
case, we set `n_components` to 2. Because the data has only two features, both components
are kept, so the data is re-expressed along its principal axes rather than reduced in dimension.
3. Fit and transform:
- Fit the PCA model to the data and simultaneously transform the original data into the
principal components.
4. Plotting:
- Visualize the original data and the transformed data after PCA using Matplotlib.
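To make the fit-and-transform step less of a black box, the computation can be sketched directly in NumPy using the singular value decomposition. This is an illustrative sketch, not scikit-learn's implementation: the random data and variable names are assumptions, and the signs of the resulting components may differ from what scikit-learn returns.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((100, 2))  # assumed data: 100 samples, 2 features

# Step 1: center each feature at zero mean.
X_centered = X - X.mean(axis=0)

# Step 2: SVD of the centered data; the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: project the centered data onto the principal directions.
X_projected = X_centered @ Vt.T

# Variance captured by each component (sample variance, dividing by n - 1).
explained_variance = S**2 / (X.shape[0] - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

print(X_projected.shape)
print(explained_variance_ratio)
```

The projected columns are uncorrelated with each other, and the first column carries the largest share of the variance. Keeping only the leading columns of `X_projected` is what reduces the dimensionality.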