- Download the datasets from the Microsoft Azure Predictive Maintenance Kaggle project, either directly or with the Kaggle CLI, and then upload them to a GCS bucket in the correct region and project.
kaggle datasets download -d arnabbiswas1/microsoft-azure-predictive-maintenance
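For example, assuming the Kaggle archive was downloaded to the current directory and you want the CSVs under a raw/ prefix (the bucket name and prefix are placeholders):
unzip microsoft-azure-predictive-maintenance.zip -d ./data
gsutil -m cp ./data/*.csv gs://YOUR_BUCKET_NAME/raw/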
- Create a virtual environment and pip install requirements.txt locally to ensure you have the necessary versions of the Google Cloud and KFP libraries installed.
pip install --trusted-host pypi.python.org -r requirements.txt
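If you don't already have a virtual environment, one way to create and activate it before running the install above (the environment name is arbitrary):
python3 -m venv .venv
source .venv/bin/activate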
- Create a service account key and copy the resulting JSON file into the ./data directory.
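One way to do this with gcloud (the service account name, project ID, and key filename are placeholders; the account needs appropriate permissions for GCS, BigQuery, Artifact Registry, and Vertex AI):
gcloud iam service-accounts keys create ./data/service-account-key.json --iam-account=YOUR_SA_NAME@YOUR_PROJECT_ID.iam.gserviceaccount.com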
- Fill out the environment variables in env.sh and source the file.
source env.sh
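The exact variable names are defined in env.sh; the values it expects look roughly like the following (these names and values are illustrative placeholders, not the actual variables):
export PROJECT_ID=your-project-id
export REGION=us-central1
export BUCKET_NAME=your-bucket-name
export GOOGLE_APPLICATION_CREDENTIALS=./data/service-account-key.json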
- Make sure that you have set up your gcloud CLI, including authorizing your service account and/or switching to an active account with the correct privileges and permissions.
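For example, to activate the service account key created earlier and point gcloud at your project (the key path and project ID are placeholders):
gcloud auth activate-service-account --key-file=./data/service-account-key.json
gcloud config set project YOUR_PROJECT_ID
gcloud auth list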
- Run the build_image.sh script to build the component container image and push it to Artifact Registry (make sure your Docker daemon is running).
. ./build_image.sh
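If Docker is not yet authorized to push to Artifact Registry, you may also need to configure a credential helper for your region's registry host first (the region is a placeholder):
gcloud auth configure-docker YOUR_REGION-docker.pkg.dev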
- Run the run_pipeline.py script to trigger the end-to-end Vertex AI pipeline, which ingests the data from the GCS bucket in Step 1 into BigQuery, runs dbt, creates and pushes features to the Vertex AI Feature Store, trains a simple Random Forest classification model with scikit-learn, evaluates the model, and deploys it to a Vertex AI endpoint.
python run_pipeline.py
NOTE: The pipeline will take approximately 1-2 hours to complete.
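Once the pipeline finishes, you can sanity-check the created and deployed resources from the command line (the project ID and region are placeholders):
bq ls --project_id=YOUR_PROJECT_ID
gcloud ai models list --region=YOUR_REGION
gcloud ai endpoints list --region=YOUR_REGION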
- (Optional) Once you're done, run the cleanup.py script if you no longer need the underlying BigQuery dataset, feature store, model, endpoint, or other pipeline assets / resources.
python src/cleanup.py
(Note: You may need to undeploy the model from the endpoint before you can delete it.)
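If deletion fails because a model is still deployed, one way to undeploy it manually with gcloud (the endpoint ID, deployed model ID, and region are placeholders; the deployed model ID appears in the endpoint's describe output):
gcloud ai endpoints describe YOUR_ENDPOINT_ID --region=YOUR_REGION
gcloud ai endpoints undeploy-model YOUR_ENDPOINT_ID --region=YOUR_REGION --deployed-model-id=YOUR_DEPLOYED_MODEL_ID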