- Download the datasets from the Microsoft Azure Predictive Maintenance Kaggle project, either directly or with the Kaggle CLI, and then upload them to a GCS bucket in the correct region and project.
kaggle datasets download -d arnabbiswas1/microsoft-azure-predictive-maintenance
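For example, assuming the Kaggle archive was downloaded to the current directory and you want the CSVs under a raw/ prefix (the bucket name and prefix are placeholders):
unzip microsoft-azure-predictive-maintenance.zip -d ./data
gsutil -m cp ./data/*.csv gs://YOUR_BUCKET_NAME/raw/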
- Create a virtual environment and pip install requirements.txt locally to ensure you have the necessary versions of the Google Cloud and KFP libraries installed.
pip install --trusted-host pypi.python.org -r requirements.txt
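If you don't already have a virtual environment, one way to create and activate it before running the install above (the environment name is arbitrary):
python3 -m venv .venv
source .venv/bin/activate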
- Create a service account key and copy the resulting JSON file into the ./data directory.
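One way to do this with gcloud (the service account name, project ID, and key filename are placeholders; the account needs appropriate permissions for GCS, BigQuery, Artifact Registry, and Vertex AI):
gcloud iam service-accounts keys create ./data/service-account-key.json --iam-account=YOUR_SA_NAME@YOUR_PROJECT_ID.iam.gserviceaccount.com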
- Fill out the environment variables in env.sh and source the file.
source env.sh
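The exact variable names are defined in env.sh; the values it expects look roughly like the following (these names and values are illustrative placeholders, not the actual variables):
export PROJECT_ID=your-project-id
export REGION=us-central1
export BUCKET_NAME=your-bucket-name
export GOOGLE_APPLICATION_CREDENTIALS=./data/service-account-key.json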
- Make sure that you have set up your gcloud CLI, including authorizing your service account and/or switching to an active account with the correct privileges and permissions.
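For example, to activate the service account key created earlier and point gcloud at your project (the key path and project ID are placeholders):
gcloud auth activate-service-account --key-file=./data/service-account-key.json
gcloud config set project YOUR_PROJECT_ID
gcloud auth list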
- Run the build_image.sh script to build the component container image and push it to Artifact Registry (make sure your Docker daemon is running).
. ./build_image.sh
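If Docker is not yet authorized to push to Artifact Registry, you may also need to configure a credential helper for your region's registry host first (the region is a placeholder):
gcloud auth configure-docker YOUR_REGION-docker.pkg.dev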
- Run the run_pipeline.py script to trigger the end-to-end Vertex AI pipeline, which ingests the data from the GCS bucket in Step 1 into BigQuery, runs dbt, creates and pushes features to the Vertex AI Feature Store, trains a simple Random Forest classification model with scikit-learn, evaluates the model, and deploys it to a Vertex AI endpoint.
python run_pipeline.py
NOTE: The pipeline will take approximately 1-2 hours to complete.
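Once the pipeline finishes, you can sanity-check the created and deployed resources from the command line (the project ID and region are placeholders):
bq ls --project_id=YOUR_PROJECT_ID
gcloud ai models list --region=YOUR_REGION
gcloud ai endpoints list --region=YOUR_REGION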
- (Optional) Once you're done, run the cleanup.py script if you no longer need the underlying BigQuery dataset, feature store, model, endpoint, or other pipeline assets / resources.
python src/cleanup.py
(Note: You may need to undeploy the model from the endpoint before you can delete it.)
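If deletion fails because a model is still deployed, one way to undeploy it manually with gcloud (the endpoint ID, deployed model ID, and region are placeholders; the deployed model ID appears in the endpoint's describe output):
gcloud ai endpoints describe YOUR_ENDPOINT_ID --region=YOUR_REGION
gcloud ai endpoints undeploy-model YOUR_ENDPOINT_ID --region=YOUR_REGION --deployed-model-id=YOUR_DEPLOYED_MODEL_ID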