Code-free Data Science with Microsoft Azure Machine Learning Studio

by Gilbert Tanner on Jun 17, 2019 · 11 min read


Over the last weeks, months, and even years, many tools have arisen that promise to make the field of data science more accessible. This isn't an easy task, considering the complexity of most parts of the data science and machine learning pipeline. Nonetheless, many libraries and tools, including Keras, FastAI, and Weka, have made it significantly easier to create a data science project by providing an easy-to-use, high-level interface and many prebuilt components.

In the last few days, I tried a new cloud product called Microsoft Azure Machine Learning Studio (ML Studio), which promises to let non-coders create their own machine learning applications.

Let me start by saying that the concept of ML Studio is really great: even though you still need your statistics knowledge, you can build, test, and even deploy a machine learning model without writing a single line of code. It enables this by offering prebuilt building blocks that can be customized and connected together using a visual interface.

ML Studio Visual Interface Example
Figure 1: ML Studio Visual Interface Example

Starting an Experiment

To get started, you first need to navigate to Azure ML Studio and sign in with a Microsoft Account. Once registered and signed in, you will see the homepage which provides you with multiple tabs.

Starting Screen
Figure 2: Starting Screen

On the left, we get the following options:

  • Projects: Collections of experiments, web services, data-sets, and other items representing a single project
  • Experiments: Your ML Studio experiments
  • Web Services: Experiments deployed as a web service
  • Notebooks: Jupyter Notebooks
  • Datasets: Datasets you have uploaded to ML Studio
  • Trained Models: Models trained and saved in experiments
  • Settings: Account Settings

To create a new experiment, navigate to the Experiments tab and click the New button. Another tab will appear that not only lets you create a new blank experiment but also offers many samples, ranging from basic tasks like downloading a data-set to classifying text or detecting fraud.

Creating a new experiment
Figure 3: Creating a new experiment

Obtaining the data

On the left of the experiment canvas is a palette offering all the available pre-built functionality. Here we can also access some sample data-sets, like the Automobile price data-set used in the getting-started article of the Azure ML Studio documentation. Instead of using one of these data-sets for this article, I decided to work on the Heart Disease data-set available on Kaggle.

To use a custom data-set in ML Studio, click the New button, navigate to the Dataset tab, and click FROM LOCAL FILE. This opens an overlay window where you can upload the data-set.

Upload data
Figure 4: Upload data

With the free workspace you get 10 GB of data storage, which should be more than enough for most users.

Once the file is uploaded, you can access it by selecting your data-set under the My Datasets category, or by searching for its name, and then dragging it onto the experiment canvas.

Adding data-set
Figure 5: Adding data-set

A really nice feature of Azure ML Studio is the ability to take a look at your data at any point in the analysis. To do this, simply right-click on the data-set and navigate to dataset > Visualize.

Visualizing data-set
Figure 6: Visualize data

This opens a new overlay window showing information about all the columns and rows. In this data-set, each row represents a patient who either has or doesn't have heart disease.

Figure 7: Data-set

Summarizing the data

Once you have loaded your data-set and taken a quick look at it, it is a good idea to get more information about each column. This can be achieved by dragging in the Summarize Data module.

After pressing the RUN button, it gives you access to some basic statistics for each column, including the unique value count, mean, and standard deviation. The exact statistics vary depending on the column type.

Summarize Data module
Figure 8: Summarizing data
Data summarization
Figure 9: Summarizing data II

Preparing the data

Now that you have a “good” understanding of your data, it’s time to prepare it for machine learning.

For this data-set, we don’t need to do much preprocessing. The only significant thing we will do to increase the accuracy of our model is to transform categorical features to one-hot encoded features.

Because Azure ML Studio thinks that our categorical columns are numeric, we first need to change their type to categorical using the Edit Metadata module.

Edit Metadata
Figure 10: Edit Metadata

By clicking on the Edit Metadata module we launch the Properties tab. Here we need to click on the Launch column selector button. This opens a new window where we can select all columns we want to transform from numerical to categorical.

Select columns to transform
Figure 11: Select columns to transform

Now that we have selected the columns, we need to change the Data type from Unchanged to Integer and then select Make Categorical from the Categorical tab.

Change data-type
Figure 12: Change data-type
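In code terms, this type change is roughly what pandas' `astype("category")` does. A small sketch (the column `cp`, chest pain type, is one of the categorical columns in this data-set; the values are made up):

```python
import pandas as pd

# "cp" (chest pain type) is stored as integers but is really categorical.
# The Edit Metadata module's "Make Categorical" step roughly corresponds
# to converting the column's dtype to "category" in pandas.
df = pd.DataFrame({"cp": [3, 2, 1, 0], "age": [63, 37, 41, 56]})
df["cp"] = df["cp"].astype("category")
print(df.dtypes)
```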

Similarly, to convert the chosen columns to a one-hot encoding, drag the Convert to Indicator Values module onto the experiment canvas, click on it to open the Properties tab, and select the same columns you selected before. Also, check the Overwrite categorical columns checkbox so that the old columns are deleted when the conversion happens.

One-hot encoding
Figure 13: One-hot encoding

If we now visualize the output of the Convert to Indicator Values module, we can see that instead of the 14 columns we started with, we now have 31 columns.
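The code-side equivalent of Convert to Indicator Values is one-hot encoding, e.g. pandas' `get_dummies`. A quick sketch showing why the column count grows (values made up):

```python
import pandas as pd

# One-hot encoding: each category becomes its own 0/1 column. With
# several categorical columns, this is how 14 original columns can
# grow to 31.
df = pd.DataFrame({"cp": [3, 2, 1, 0], "age": [63, 37, 41, 56]})
encoded = pd.get_dummies(df, columns=["cp"])
print(list(encoded.columns))  # "cp" is replaced by cp_0 .. cp_3
```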

Data-set after One-Hot encoding
Figure 14: Data-set after One-hot encoding

Now the data must be split into training and validation sets. This can be achieved by dragging the Split Data module onto the experiment canvas, connecting its input to the output of Convert to Indicator Values, and then navigating to the Properties tab and choosing an appropriate split percentage.
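For reference, the Split Data module does the same job as scikit-learn's `train_test_split`. A minimal sketch with a toy frame:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the one-hot encoded data-set.
df = pd.DataFrame({"age": range(10), "target": [0, 1] * 5})

# 70/30 split, mirroring the "Fraction of rows" setting in the
# Split Data module's Properties tab.
train, valid = train_test_split(df, test_size=0.3, random_state=42)
print(len(train), len(valid))
```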

Splitting data
Figure 15: Splitting data

Choosing and applying an algorithm and training it

At this point, we are ready to add a machine learning algorithm of our choice. By expanding the Machine Learning>Initialize Model category on the left of the canvas we get all the available models (We can also create our own using Python or R as discussed at the end of the article).

To keep this article simple, drag the Two-Class Logistic Regression module onto the canvas. Also drag the Train Model module from the Machine Learning > Train category onto the canvas, then connect the output of the Logistic Regression module to the left input of the Train Model module and the left output of the Split Data module (the training data) to the right input of the Train Model module, as shown below.

Connecting the model and data to the Train Model module
Figure 16: Connect the model and data to the Train Model module

Next, we need to specify a label column by opening the Launch column selector of the Train Model module. In our case, the label column is called target.

Selecting label column
Figure 17: Select label column

Now we can run the experiment again and it should work without errors.
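What the Train Model module does under the hood is essentially a model fit. A hedged sketch of the equivalent in scikit-learn, using synthetic data in place of the real features (the shapes and values here are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training split: X plays the role of the
# one-hot encoded features, y the "target" label column selected in
# the Train Model module.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Fitting is what happens when the experiment runs the Train Model module.
model = LogisticRegression().fit(X, y)
print(model.coef_.shape)  # one learned weight per feature
```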

Current state of our experiment
Figure 18: Current state of our experiment

By right-clicking on the Train Model module, we can now access the trained model, save it for later use, and visualize information about its settings and learned weights.

Access trained model
Figure 19: Access trained model

Scoring and evaluating the model

Now that we have trained our model, we can use our validation set to see how well it is doing. We do this by first making predictions using the Score Model module and then using the Evaluate Model module to get our accuracy and loss metrics.

Making predictions

To make predictions on the validation set, we connect the trained model to the left input of the Score Model module and the right output of the Split Data module to the right input of the Score Model module.

Score model module
Figure 20: Score model

When visualizing the output, we can see two new columns. The Scored Labels column contains the predicted labels as integers, either 0 or 1. The other column, Scored Probabilities, gives us the raw probabilities of the predictions.

Figure 21: Predictions
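The two output columns map directly to scikit-learn's `predict` and `predict_proba`. A rough sketch with synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic training data and model, standing in for the trained model.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Scoring a "validation" batch, as the Score Model module does.
X_valid = rng.normal(size=(5, 3))
labels = model.predict(X_valid)       # "Scored Labels": 0 or 1
probs = model.predict_proba(X_valid)  # "Scored Probabilities"
print(labels, probs[:, 1])
```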

Evaluating the model

To now evaluate the model we drag in the Evaluate Model module and connect the output of the Score Model module to its left input node.

Evaluating our model
Figure 22: Evaluating our model

A really nice feature of Azure ML Studio is that it automatically determines the right metrics for our problem, so we don't need any further configuration; we can just run the experiment again and check the results.

Evaluation results
Figure 23: Evaluation results

The evaluation gives us many different classification metrics, including the ROC curve, AUC, and precision. It also lets us vary the threshold between the two classes and immediately see how the metrics change.
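These are standard binary-classification metrics; a small sketch in scikit-learn shows how the threshold slider works (the labels and scores below are made up for illustration):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

# Made-up validation labels and predicted probabilities.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_prob = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]

# The threshold slider in the evaluation view varies exactly this value:
# probabilities at or above it become class 1.
threshold = 0.5
y_pred = [int(p >= threshold) for p in y_prob]

print(accuracy_score(y_true, y_pred),
      precision_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      roc_auc_score(y_true, y_prob))
```

Moving the threshold changes accuracy, precision, and recall, while AUC stays fixed because it is computed over all thresholds at once.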

Tuning parameters

With over 84% accuracy, our model does a decent job of predicting whether a patient has heart disease, but we can easily get higher accuracy by tuning our model's hyperparameters.

For this, we have two options. We can either left-click on our model and tune it manually, or we can replace the Train Model module with the Tune Model Hyperparameters module, which not only trains the model but also tunes its hyperparameters while training.

Tune the parameters manually
Figure 24: Tune the parameters manually
Tuning Hyperparameters using the Tune Model Hyperparameters module
Figure 25: Tuning hyperparameters using the Tune Model Hyperparameters module

The Tune Model Hyperparameters module boosts our accuracy up to over 85%.

Deploying as an Azure Web Service

Probably the best thing about Azure ML Studio is the ability to deploy your experiment as a web service with only a few clicks. This not only lets you deploy your model fast but also provides a service that you can scale to handle many requests without adding any complexity.

Azure ML Studio provides multiple ways to transform your experiment to a web service. The easiest one is to just use the SET UP WEB SERVICE button at the bottom of the experiment.

This option converts an experiment into a predictive experiment by eliminating data summary, splits, training and other steps not needed for production.

Predictive Experiment
Figure 26: Predictive Experiment

After running the predictive experiment to ensure it's working, you can deploy it by clicking the DEPLOY WEB SERVICE button. This opens a new window with two tabs: a dashboard and a configuration tab.

Using the dashboard, you can test your web service, access the API key, and view some general information.

Web Service Overview
Figure 27: Web Service Overview
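To consume the deployed service from code, you send a JSON request to the endpoint with the API key from the dashboard. The sketch below only builds the request; the URL, key, and column subset are placeholders, and the exact input schema for your service is shown on its API help page:

```python
import json

# Placeholders: the real endpoint URL and API key come from the
# web service dashboard.
url = "https://<region>.services.azureml.net/.../execute?api-version=2.0"
api_key = "<your-api-key>"

# ML Studio expects the input rows wrapped in a structure like this
# (column subset shortened for brevity).
payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["age", "sex", "cp", "trestbps"],
            "Values": [[63, 1, 3, 145]],
        }
    },
    "GlobalParameters": {},
}
headers = {"Content-Type": "application/json",
           "Authorization": "Bearer " + api_key}

body = json.dumps(payload)
# response = requests.post(url, data=body, headers=headers)  # uncomment to call
print(body[:60])
```

The response contains the same Scored Labels and Scored Probabilities columns we saw when scoring inside the experiment.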

Using the Configuration tab we can set some general information and edit the input and output schema of the web service.

Web Service Configuration
Figure 28: Web Service Configuration

For more information on how to access and manage the web service as well as the pricing for scaling check out the official documentation.

Using Python/R Inside Azure ML Studio

Lastly, even though we don't need it for this simple example, I want to quickly mention that Azure ML Studio supports both Python and R, as well as some selected functionality of the OpenCV library.

Python or R can be used by dragging either the Execute Python Script or the Execute R Script module onto the canvas and connecting it to the data you want to access.

Python Script
Figure 29: Python Script

With these modules, you can perform tasks not currently supported by Azure ML Studio, such as creating extensive visualizations or performing complex data manipulations.
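The Execute Python Script module expects an entry point named `azureml_main` that receives up to two DataFrames (one per input port) and returns a tuple of DataFrames. A minimal skeleton, with the derived column purely as an example:

```python
import pandas as pd

# Skeleton of an Execute Python Script module. ML Studio calls
# azureml_main with up to two DataFrames and expects a tuple of
# DataFrames back, which flow out of the module's output port.
def azureml_main(dataframe1=None, dataframe2=None):
    # Example manipulation: add a derived column (illustrative only).
    dataframe1["age_squared"] = dataframe1["age"] ** 2
    return dataframe1,

# Local sanity check with a toy frame (outside ML Studio).
out, = azureml_main(pd.DataFrame({"age": [40, 60]}))
print(out)
```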

For more information about the capabilities of the Execute Python Script module take a look at the documentation.

Personal opinion — Ideas for improvements

Even though Azure ML Studio could never be my main driver for creating machine learning experiments and projects, I think Microsoft did a really good job of creating a platform that anyone can access and that offers its users an intuitive interface for creating machine learning projects.

Like all data science and machine learning tools out there, Azure ML Studio still requires a good bit of statistical knowledge to understand things like data preprocessing and model selection, but this makes sense considering the complexity of both data science and machine learning.

To take Azure ML Studio to the next level, I think Microsoft needs to offer more pre-built modules with an extended range of functionality, including support for extensive exploratory data analysis, feature engineering, and model comparison.

All in all, I think that ML Studio will work for a lot of people seeking to build data science projects with an intuitive drag-and-drop interface and deploy them on other Azure Services. Furthermore, because of its simplicity, it is a great tool for getting started with data science and machine learning.

What’s next?

Even though I enjoy exploring completely new tools and presenting them to you I decided to focus on the theory behind data science and machine learning for the next couple of months. Building more in-depth knowledge about the theory of machine learning will allow me to go into more detail in the upcoming articles and will hopefully make me a better educator.

That’s all from this article. Thanks for reading. If you have any feedback, recommendations or ideas of what I should cover next feel free to leave a comment or contact me on social media.