
Marathon Practice - Ipynb - Colaboratory - Alejandro


ALEJANDRO GIRÓN – MARATHON PRACTICE – 8590 20 931 – ARTIFICIAL INTELLIGENCE – UMG SANARATE

Práctica de maraton.ipynb - Colaboratory

# upload the data table for the exercise


from google.colab import files
uploaded = files.upload()



MarathonData.csv(text/csv) - 5664 bytes, last modified: 16/3/2024 - 100% done
Saving MarathonData.csv to MarathonData (2).csv

# import the required libraries


import io
import pandas as pd
# create a pandas DataFrame to work with the data from the CSV file
datos_maraton = pd.read_csv(io.BytesIO(uploaded['MarathonData (2).csv']))
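The key 'MarathonData (2).csv' is hard-coded to the name Colab reported when saving this particular upload, so it can change from one session to the next. A minimal sketch of reading whatever file was uploaded instead, assuming exactly one file is uploaded. The `uploaded` dict here is a stand-in built by hand so the snippet runs outside Colab, and its two rows are illustrative rather than the full dataset.

```python
import io
import pandas as pd

# Stand-in for the dict returned by files.upload(): {saved file name: raw bytes}.
# The rows are illustrative, not the full MarathonData.csv.
uploaded = {
    "MarathonData (2).csv": b"id,Name,km4week\n1,Blair MORGAN,132.8\n2,Robert Heczko,68.6\n"
}

# Take the first (and only) uploaded file instead of hard-coding its name.
nombre_archivo = next(iter(uploaded))
datos_demo = pd.read_csv(io.BytesIO(uploaded[nombre_archivo]))
print(nombre_archivo, datos_demo.shape)
```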

# view two columns of the data


datos_maraton[['Name', 'Category']]

    Name              Category
0   Blair MORGAN      MAM
1   Robert Heczko     MAM
2   Michon Jerome     MAM
3   Daniel Or lek     M45
4   Luk ? Mr zek      MAM
..  ...               ...
82  Stefano Vegliani  M55
83  Andrej Madliak    M40
84  Yoi Ohsako        M40
85  Simon Dunn        M45
86  Pavel ?imek       M40

87 rows × 2 columns

# check the structure of the data


datos_maraton.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 87 entries, 0 to 86
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   id             87 non-null     int64
 1   Marathon       87 non-null     object
 2   Name           87 non-null     object
 3   Category       81 non-null     object
 4   km4week        87 non-null     float64
 5   sp4week        87 non-null     float64
 6   CrossTraining  13 non-null     object
 7   Wall21         87 non-null     object
 8   MarathonTime   87 non-null     float64
 9   CATEGORY       87 non-null     object
dtypes: float64(3), int64(1), object(6)
memory usage: 6.9+ KB

# prepare the data: convert Wall21 to numeric


datos_maraton['Wall21']= pd.to_numeric(datos_maraton['Wall21'], errors='coerce')
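With errors='coerce', any entry that cannot be parsed as a number becomes NaN instead of raising an error, so later steps such as isna() or dropna() can pick it up. A small sketch of that behavior on made-up values (the "-" placeholder is an assumption for illustration, not taken from the dataset):

```python
import pandas as pd

# Illustrative values; "-" stands in for a non-numeric placeholder.
wall21_demo = pd.Series(["1.16", "1.23", "-", "1.30"])

# Unparseable entries become NaN; the result is a float column.
convertido = pd.to_numeric(wall21_demo, errors="coerce")
print(convertido.isna().sum())  # number of entries that could not be converted
```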

# descriptive statistics of the data
datos_maraton.describe()

https://colab.research.google.com/drive/1JldUTy5iVfGf7htgpg5ZK01IDAeZ0Hr1#scrollTo=JyMHeqAtt410&uniqifier=2&printMode=true 1/8
16/3/24, 22:43 Práctica de maraton.ipynb - Colaboratory

       id         km4week     sp4week       MarathonTime
count  87.000000  87.000000   87.000000     87.000000
mean   44.000000  62.347126   139.840706    3.319080
std    25.258662  26.956019   1191.427864   0.376923
min    1.000000   17.900000   8.031414      2.370000
25%    22.500000  44.200000   11.498168     3.045000
50%    44.000000  58.800000   12.163424     3.320000
75%    65.500000  77.500000   12.854036     3.605000
max    87.000000  137.500000  11125.000000  3.980000

# plot histograms
datos_maraton.hist()

array([[<Axes: title={'center': 'id'}>,
        <Axes: title={'center': 'km4week'}>],
       [<Axes: title={'center': 'sp4week'}>,
        <Axes: title={'center': 'MarathonTime'}>]], dtype=object)

# drop the columns that will not be used
datos_maraton = datos_maraton.drop(columns=['Name', 'id', 'Marathon', 'CATEGORY'])

# count the null values in each column


datos_maraton.isna().sum()

Category 6
km4week 0
sp4week 0
CrossTraining 74
Wall21 0
MarathonTime 0
dtype: int64

# replace the null values of one column with 0


datos_maraton['CrossTraining'] = datos_maraton['CrossTraining'].fillna(0)

# view the data
datos_maraton


    Category  km4week  sp4week    CrossTraining  Wall21  MarathonTime
0   MAM       132.8    14.434783  0              1.16    2.37
1   MAM       68.6     13.674419  0              1.23    2.59
2   MAM       82.7     13.520436  0              1.30    2.66
3   M45       137.5    12.258544  0              1.32    2.68
4   MAM       84.6     13.945055  0              1.36    2.74
..  ...       ...      ...        ...            ...     ...
82  M55       50.0     10.830325  0              2.02    3.93
83  M40       33.6     10.130653  ciclista 3h    1.94    3.93
84  M40       55.4     11.043189  0              1.94    3.94
85  M45       33.2     11.066667  0              2.05    3.95
86  M40       17.9     10.848485  ciclista 5h    2.05    3.98

87 rows × 6 columns


# drop the rows that still contain null values


datos_maraton = datos_maraton.dropna(how='any')
datos_maraton

    Category  km4week  sp4week    CrossTraining  Wall21  MarathonTime
0   MAM       132.8    14.434783  0              1.16    2.37
1   MAM       68.6     13.674419  0              1.23    2.59
2   MAM       82.7     13.520436  0              1.30    2.66
3   M45       137.5    12.258544  0              1.32    2.68
4   MAM       84.6     13.945055  0              1.36    2.74
..  ...       ...      ...        ...            ...     ...
82  M55       50.0     10.830325  0              2.02    3.93
83  M40       33.6     10.130653  ciclista 3h    1.94    3.93
84  M40       55.4     11.043189  0              1.94    3.94
85  M45       33.2     11.066667  0              2.05    3.95
86  M40       17.9     10.848485  ciclista 5h    2.05    3.98

81 rows × 6 columns


datos_maraton['CrossTraining'].unique()

array([0, 'ciclista 1h', 'ciclista 4h', 'ciclista 13h', 'ciclista 3h',
       'ciclista 5h'], dtype=object)

# build a mapping dictionary for the replacement


valores_cross={"CrossTraining": {'ciclista 1h':1,'ciclista 4h':4,'ciclista 13h':13,'ciclista 3h':3,'ciclista 5h':5}}

# reassign the result instead of using inplace=True, which raised a SettingWithCopyWarning here
datos_maraton = datos_maraton.replace(valores_cross)
datos_maraton

    Category  km4week  sp4week    CrossTraining  Wall21  MarathonTime
0   MAM       132.8    14.434783  0              1.16    2.37
1   MAM       68.6     13.674419  0              1.23    2.59
2   MAM       82.7     13.520436  0              1.30    2.66
3   M45       137.5    12.258544  0              1.32    2.68
4   MAM       84.6     13.945055  0              1.36    2.74
..  ...       ...      ...        ...            ...     ...
82  M55       50.0     10.830325  0              2.02    3.93
83  M40       33.6     10.130653  3              1.94    3.93
84  M40       55.4     11.043189  0              1.94    3.94
85  M45       33.2     11.066667  0              2.05    3.95
86  M40       17.9     10.848485  5              2.05    3.98

81 rows × 6 columns
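The mapping dictionary lists every CrossTraining label by hand, so a label that is not in the dictionary would be left as text. A sketch of an alternative that is not in the notebook: extracting the number of hours with a regular expression, assuming every label ends in digits followed by "h". The sample values are a subset of the labels shown by unique().

```python
import pandas as pd

# A few of the labels listed by unique(), plus the 0 used to fill nulls.
cross_demo = pd.Series([0, "ciclista 1h", "ciclista 13h", "ciclista 3h"])

# Capture the digits right before the trailing "h"; entries without a match become NaN.
horas = cross_demo.astype(str).str.extract(r"(\d+)h$", expand=False)

# Convert to numbers and treat "no cross training" as 0 hours.
horas = pd.to_numeric(horas).fillna(0).astype(int)
print(horas.tolist())  # [0, 1, 13, 3]
```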


# replace the values of a column with numeric codes


datos_maraton['Category'].unique()
valores_category={"Category":{'MAM':1,'M45':2,'M40':3, 'M50':4, 'M55':5, 'WAM':6}}

# reassign the result instead of using inplace=True to avoid the SettingWithCopyWarning
datos_maraton = datos_maraton.replace(valores_category)

datos_maraton

    Category  km4week  sp4week    CrossTraining  Wall21  MarathonTime
0   1         132.8    14.434783  0              1.16    2.37
1   1         68.6     13.674419  0              1.23    2.59
2   1         82.7     13.520436  0              1.30    2.66
3   2         137.5    12.258544  0              1.32    2.68
4   1         84.6     13.945055  0              1.36    2.74
..  ...       ...      ...        ...            ...     ...
82  5         50.0     10.830325  0              2.02    3.93
83  3         33.6     10.130653  3              1.94    3.93
84  3         55.4     11.043189  0              1.94    3.94
85  2         33.2     11.066667  0              2.05    3.95
86  3         17.9     10.848485  5              2.05    3.98

81 rows × 6 columns
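The integer codes given to Category (MAM=1, M45=2, M40=3, and so on) put the groups on a numeric scale whose order is arbitrary, and a linear model will treat that order as meaningful. One common alternative, which the notebook does not use, is one-hot encoding with pd.get_dummies. A minimal sketch on a few illustrative rows:

```python
import pandas as pd

# Illustrative rows; only the Category labels matter for this example.
categorias_demo = pd.DataFrame({
    "Category": ["MAM", "M45", "M40", "WAM"],
    "km4week": [132.8, 137.5, 33.6, 129.6],
})

# One 0/1 column per category, so no ordering between groups is implied.
codificado = pd.get_dummies(categorias_demo, columns=["Category"], dtype=int)
print(codificado.columns.tolist())
```

The trade-off is more feature columns, which matters on a dataset of about 80 rows.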


# analysis: marathon time vs. km run in the last 4 weeks


import matplotlib.pyplot as plt
plt.scatter(x = datos_maraton['km4week'], y=datos_maraton['MarathonTime'])
plt.title('km in the last 4 weeks vs. marathon time')
plt.xlabel('km4week')
plt.ylabel('MarathonTime')
plt.show()

import matplotlib.pyplot as plt


plt.scatter(x = datos_maraton['sp4week'], y=datos_maraton['MarathonTime'])
plt.title('sp4week vs. MarathonTime')
plt.xlabel('sp4week')  # the x-axis shows sp4week, not MarathonTime
plt.ylabel('MarathonTime')
plt.show()

datos_maraton = datos_maraton.query('sp4week<1000')
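The cutoff of 1000 is chosen by eye: describe() reported a maximum sp4week of 11125 against a 75th percentile of about 12.85. A sketch of a data-driven alternative, the 1.5 x IQR rule, which is not what the notebook does. The values below are illustrative.

```python
import pandas as pd

# Illustrative speeds with one extreme value like the one seen in describe().
velocidades_demo = pd.DataFrame(
    {"sp4week": [10.8, 11.5, 12.2, 12.9, 13.7, 14.4, 11125.0]}
)

# Interquartile range and the usual 1.5 * IQR fences.
q1 = velocidades_demo["sp4week"].quantile(0.25)
q3 = velocidades_demo["sp4week"].quantile(0.75)
iqr = q3 - q1
limite_inferior = q1 - 1.5 * iqr
limite_superior = q3 + 1.5 * iqr

# Keep only the rows that fall inside the fences.
filtrado = velocidades_demo[
    velocidades_demo["sp4week"].between(limite_inferior, limite_superior)
]
print(len(filtrado))  # 6
```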


import matplotlib.pyplot as plt


plt.scatter(x = datos_maraton['sp4week'], y=datos_maraton['MarathonTime'])
plt.title('sp4week vs. MarathonTime')
plt.xlabel('sp4week')  # the x-axis shows sp4week, not MarathonTime
plt.ylabel('MarathonTime')
plt.show()

# split the data into training and test sets


datos_entrenamiento = datos_maraton.sample(frac=0.8,random_state=0)
datos_test = datos_maraton.drop(datos_entrenamiento.index)
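sample(frac=0.8, random_state=0) draws 80% of the rows at random, reproducibly, and drop(datos_entrenamiento.index) keeps whatever was not drawn, so the two sets never share a row. A small sketch of that property on a toy DataFrame (the column values are placeholders):

```python
import pandas as pd

# Toy frame with 10 rows; the values are placeholders.
demo = pd.DataFrame({"km4week": range(10), "MarathonTime": range(10)})

# 80% of the rows for training, the remaining rows for testing.
entrenamiento = demo.sample(frac=0.8, random_state=0)
prueba = demo.drop(entrenamiento.index)

print(len(entrenamiento), len(prueba))  # 8 2
```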

# view the training data


datos_entrenamiento

    Category  km4week  sp4week    CrossTraining  Wall21  MarathonTime
54  3         70.7     11.783333  0              1.77    3.47
28  2         51.6     13.008403  0              1.50    3.15
31  1         79.4     13.344538  0              1.60    3.19
84  3         55.4     11.043189  0              1.94    3.94
47  2         39.6     12.247423  0              1.67    3.35
..  ...       ...      ...        ...            ...     ...
55  1         26.9     13.121951  0              1.67    3.50
20  1         94.5     11.886792  0              1.45    2.99
79  1         53.9     11.802920  0              1.98    3.90
8   1         70.0     13.770492  1              1.38    2.83
13  3         84.4     13.836066  0              1.41    2.88

64 rows × 6 columns


# view the test data


datos_test


    Category  km4week  sp4week    CrossTraining  Wall21
9   2         84.2     13.365079  0              1.35
12  2         53.5     14.078947  4              1.37
21  3         67.3     13.239344  0              1.50
26  6         129.6    12.188088  0              1.54
38  1         64.7     13.294521  0              1.50
39  6         69.2     10.053269  0              1.60
41  5         58.8     12.829091  0              1.68
46  6         48.6     12.252101  0              1.66
48  2         60.1     12.182432  0              1.55
49  1         78.2     12.000000  0              1.64
62  2         48.8     11.665339  0              1.66
68  1         59.1     10.910769  0              1.75
69  1         41.6     12.235294  0              1.80
71  4         24.2     11.523810  3              1.76
75  1         23.9     12.050420  4              1.85
83  3         33.6     10.130653  3              1.94


# separate the column to predict (MarathonTime) from the features in both sets


etiquetas_entrenamiento = datos_entrenamiento.pop('MarathonTime')
etiquetas_test = datos_test.pop('MarathonTime')

etiquetas_entrenamiento

54 3.47
28 3.15
31 3.19
84 3.94
47 3.35
...
55 3.50
20 2.99
79 3.90
8 2.83
13 2.88
Name: MarathonTime, Length: 64, dtype: float64

etiquetas_test

9 2.86
12 2.88
21 3.04
26 3.12
38 3.24
39 3.25
41 3.28
46 3.33
48 3.36
49 3.39
62 3.56
68 3.65
69 3.67
71 3.69
75 3.78
83 3.93
Name: MarathonTime, dtype: float64

# train the model
from sklearn.linear_model import LinearRegression
modelo = LinearRegression()
modelo.fit(datos_entrenamiento, etiquetas_entrenamiento)

LinearRegression()

# predict with the model on the test data


predicciones = modelo.predict(datos_test)
predicciones

array([2.79480021, 2.83189506, 3.05760835, 3.05438992, 3.05923578,
       3.29367694, 3.36224426, 3.36170598, 3.17305931, 3.29025649,
       3.37473166, 3.51837301, 3.58860937, 3.55560568, 3.67594295,
       3.85721606])

# compare the test labels with the predicted values
import numpy as np
from sklearn.metrics import mean_squared_error
# root-mean-square error, in the same units as MarathonTime (not a percentage)
error = np.sqrt(mean_squared_error(etiquetas_test, predicciones))
print("RMSE: %f" % error)

RMSE: 0.109227
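The metric computed in this cell is the root-mean-square error (RMSE), which is expressed in the units of MarathonTime rather than as a percentage. The sketch below recomputes it with NumPy from the test labels and predictions printed above. It also computes a mean absolute percentage error (MAPE) for comparison, which is an addition that the notebook does not include.

```python
import numpy as np

# Test labels and predictions copied from the outputs printed above.
etiquetas = np.array([2.86, 2.88, 3.04, 3.12, 3.24, 3.25, 3.28, 3.33,
                      3.36, 3.39, 3.56, 3.65, 3.67, 3.69, 3.78, 3.93])
pred = np.array([2.79480021, 2.83189506, 3.05760835, 3.05438992, 3.05923578,
                 3.29367694, 3.36224426, 3.36170598, 3.17305931, 3.29025649,
                 3.37473166, 3.51837301, 3.58860937, 3.55560568, 3.67594295,
                 3.85721606])

# RMSE: square root of the mean squared difference, same units as the labels.
rmse = np.sqrt(np.mean((etiquetas - pred) ** 2))

# MAPE: mean of |error| / |true value|, expressed as a percentage.
mape = np.mean(np.abs((etiquetas - pred) / etiquetas)) * 100

print("RMSE: %.4f" % rmse)
print("MAPE: %.2f%%" % mape)
```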

# new prediction for a hypothetical runner
nuevo_corredor = pd.DataFrame(np.array([[1,400,15,0,1.4]]),columns=['Category','km4week','sp4week','CrossTraining','Wall21'])
nuevo_corredor

   Category  km4week  sp4week  CrossTraining  Wall21
0  1.0       400.0    15.0     0.0            1.4

# view the new prediction


modelo.predict(nuevo_corredor)

array([2.35457459])
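The new runner has km4week=400, while the largest km4week reported by describe() is 137.5, so this prediction extrapolates well beyond the data the linear model was fitted on and should be read with caution. A sketch of a simple range check follows. The helper name `fuera_de_rango` and the small training frame are hypothetical, made up for illustration.

```python
import pandas as pd

def fuera_de_rango(entrenamiento: pd.DataFrame, nuevo: pd.DataFrame) -> list:
    """Return the columns of the single-row frame `nuevo` whose value lies
    outside the min/max range seen in `entrenamiento`."""
    fila = nuevo.iloc[0]
    return [
        col for col in nuevo.columns
        if not (entrenamiento[col].min() <= fila[col] <= entrenamiento[col].max())
    ]

# Hypothetical training ranges and the new runner's values.
entrenamiento_demo = pd.DataFrame(
    {"km4week": [17.9, 58.8, 137.5], "sp4week": [10.1, 12.2, 14.4]}
)
nuevo_demo = pd.DataFrame({"km4week": [400.0], "sp4week": [15.0]})

print(fuera_de_rango(entrenamiento_demo, nuevo_demo))  # ['km4week', 'sp4week']
```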
