This is a Bank Marketing Machine Learning Classification Project in fulfillment of the Udacity Azure ML Nanodegree. It demonstrates the End-to-End Deployment of a classification model, and it contains the files, documentation, and experiment instructions needed to replicate the project.


Operationalizing Machine Learning

Table of Contents

  1. Overview
  2. Summary
  3. Project Architecture
  4. Key Steps
  5. Screen Recording
  6. Standout Suggestions
  7. Future Work

Overview

This project is part of the Udacity Azure Machine Learning Nanodegree and focuses on operationalizing machine learning models using Azure ML. The project consists of two main components:

  1. A no-code approach using Azure Automated Machine Learning (AutoML) to configure, deploy, and consume a production-ready machine learning model.
  2. A code-based approach using the Azure ML Python SDK to build, deploy, publish, and consume a machine learning pipeline that achieves the same objectives. The Python SDK implementation for this project resides in the notebook file bank-marketing-automl-deployment(main).ipynb.

Together, these two approaches demonstrate Azure ML’s end-to-end platform for building, deploying, and operationalizing machine learning models that can be consumed by external web services through REST APIs.

Summary

The dataset used in this project is the UCI Bank Marketing dataset, which contains information on client responses collected during a direct marketing campaign conducted by a Portuguese banking institution. The objective is to predict whether a client will subscribe to a bank term deposit, represented as either “yes” or “no”.

This problem is therefore framed as a binary classification task, where the goal is to predict the likelihood of a client subscribing to a term deposit.

After training multiple models using AutoML, the Voting Ensemble model emerged as the best-performing model with an accuracy of approximately 92%. This model was selected as the production model and subsequently deployed and consumed via an Azure ML endpoint.

Project Architecture

The overall project architecture is illustrated below.

Realizing the Azure ML end-to-end solution deployment comprises the following steps:

  1. Authentication
  2. Upload and Register Dataset
  3. Automated ML Experiment
  4. Deploy the best model
  5. Enable logging
  6. Swagger Documentation
  7. Consume model endpoints
  8. Create and publish a pipeline
  9. Documentation

In this project, the Authentication and Documentation steps are not explicitly implemented. Authentication is handled using Udacity-provided access, which already enables role-based access control to Azure resources. Documentation is provided through this README file, which explains the Azure ML pipeline execution and deployment process in detail.

Key Steps

Step 1: Upload and Register the Bank Marketing Dataset to Workspace.

Click Datasets >> From local files >> Enter dataset details >> Click Browse to upload the “bankmarketing_train.csv” dataset >> Select “Use headers from the first file” >> Create the dataset in the Azure datastore. Once registered, the dataset becomes available in the workspace blob store, as shown below.

Step 2: Create a New Automated ML Experiment.

The Automated ML (AutoML) run trains the dataset using multiple machine learning algorithms and performs hyperparameter tuning to identify the best-performing model.

Click “Automated ML” >> New Automated ML run >> Select dataset >> Enter the experiment name >> Select the outcome variable >> Configure or select a compute cluster >> Select the ML task. Since the target variable contains “yes” and “no” values, classification was the appropriate task. AutoML typically takes between 30 minutes and 1 hour to complete. Upon completion, the experiment status changes from Running to Completed, as shown below.

Step 3: Select the best model.

After the AutoML run completes:

Click Run ID >> Model to view all model runs. The model at the top of the list represents the best-performing model. In this project, the Voting Ensemble model achieved the highest accuracy of approximately 92%, as shown below.

Step 4: Deploy the Best Model.

A deployed model exposes an HTTP endpoint that accepts POST requests and returns predictions.

To deploy the best model:

Click the “Voting Ensemble” model >> Click the “Deploy” button >> Enter the model deployment name >> Select the compute type >> Enable Authentication >> Click Deploy. Once deployment completes successfully, the status changes to Succeeded, as shown below.

Step 5: Enable logging.

Azure ML provides diagnostic logging through Application Insights, which helps monitor performance, track errors, and debug issues.

Logging can be enabled in two ways:

  1. Directly from the deployment settings in Azure ML Studio
  2. Programmatically using a script

In this project, logging was enabled using the Logs.py script. The following images show Application Insights enabled for the deployed endpoint and the successfully generated logs.

“Application Insights” enabled on the deployed model endpoint.

Step 6: Swagger Documentation.

Azure ML supports Swagger, which provides interactive API documentation for deployed models. Swagger enables both internal and external services to easily interact with the model’s REST API.

In this project, the Swagger documentation is stored in the Swagger folder. The Swagger UI displays the request and response formats for the production model, as shown below.

Step 7: Consume model endpoints.

Once deployed, the model endpoint can be consumed by sending HTTP requests and receiving predictions.

A JSON request script (endpoint.py) was created containing two sample bank customer records. The model response shows a prediction of “Yes” for the first customer and “No” for the second customer, as illustrated below.
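The core of endpoint.py is assembling a JSON payload of customer records and POSTing it with the authentication key. A sketch with two abridged, illustrative records (the real payload carries every feature column, and the scoring URI and key below are placeholders taken from the endpoint's Consume tab):

```python
import json
import requests

scoring_uri = "<your-scoring-uri>"   # from the endpoint's Consume tab
key = "<your-primary-key>"

# Two abridged sample records; the real payload includes all feature columns
data = {
    "data": [
        {"age": 35, "job": "admin.", "marital": "married", "duration": 320},
        {"age": 58, "job": "retired", "marital": "single", "duration": 50},
    ]
}
payload = json.dumps(data)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

# Only call the endpoint once real values are filled in
if "<" not in scoring_uri:
    response = requests.post(scoring_uri, data=payload, headers=headers)
    print(response.json())
```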

To benchmark performance, an Apache benchmarking script (Benchmark.sh) was used to test repeated calls to the endpoint. The data.json file was generated during execution, and the scoring URI and authentication key were retrieved from the deployed service. The benchmark results are shown below.
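Benchmark.sh wraps Apache Bench (roughly `ab -n 10 -v 4 -p data.json -T application/json -H "Authorization: Bearer KEY" SCORING_URI`). The same measurement can be sketched in plain Python, assuming the data.json payload and placeholder endpoint details:

```python
import time
import requests

scoring_uri = "<your-scoring-uri>"
key = "<your-primary-key>"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

def benchmark(n=10):
    """Send n sequential scoring requests and return the mean latency."""
    with open("data.json") as f:
        payload = f.read()
    times = []
    for _ in range(n):
        start = time.perf_counter()
        requests.post(scoring_uri, data=payload, headers=headers)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Only benchmark once real values are filled in
if "<" not in scoring_uri:
    print(f"mean latency: {benchmark():.3f}s")
```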

Step 8: Create and publish a pipeline.

The Azure ML Python SDK makes it possible to build the same end-to-end machine learning workflow in code. The process is summarized below.


Create a Pipeline

Using the Azure ML Python SDK, an end-to-end AutoML pipeline was created. Once executed, the pipeline generated a Run ID with a Completed status, as shown below.

The pipeline endpoint can be viewed by navigating to Endpoints → Pipeline endpoints.

The AutoML pipeline also identified the Voting Ensemble model as the best performer with approximately 92% accuracy, consistent with the no-code AutoML run.

Publish the Pipeline

Once published, the pipeline becomes active and exposes a REST endpoint.

  1. Navigate to Pipelines
  2. Click the pipeline Run ID

The published pipeline shows an Active status and a REST endpoint, as shown below.

The pipeline can also be re-run using the Python SDK and visualized with the RunDetails Widget.

To view the re-run in Azure ML Studio, navigate to Endpoints → Pipeline endpoints, where the pipeline shows a Running status.

Screen Recording

A screencast demonstrating the end-to-end operationalization of the machine learning model using Azure ML Studio is available at the link below:

Azure ML Studio: Operationalizing Machine Learning

Standout Suggestions

Two notable enhancements were implemented in this project:

Exploratory Data Analysis (EDA): The default Azure ML dataset profiler was replicated using Pandas Profiling to generate detailed insights into the dataset. Below is an example of the correlation matrix produced during EDA.

Model Explainability: To analyze feature importance, the model_explainability parameter was set to True in the AutoML configuration. Although full explainability was not successfully generated, the top four contributing features to the best-performing model are shown below.

Future Work

Several strategies can be explored to further improve model performance:

  • Implement HyperDrive to tune hyperparameters of the top-performing models using grid search or random search.

  • Evaluate performance using alternative metrics such as AUC or F1-score, which are more robust to class imbalance.

  • Apply class balancing techniques, such as up-sampling the minority class or down-sampling the majority class, to reduce overfitting.

  • Experiment with different cross-validation folds in the Azure ML Python SDK AutoML configuration to identify the optimal validation strategy.
