In recent years, many tools have emerged that promise to make the field of data science more accessible. This isn't an easy task, considering the complexity of most parts of the data science and machine learning pipeline. Nonetheless, many libraries and tools, including Keras, FastAI, and Weka, have made it significantly easier to create a data science project by providing an easy-to-use high-level interface and many prebuilt components.
Over the last few days, I tried Microsoft Azure Machine Learning Studio (ML Studio), a cloud product that promises to let non-coders create their own machine learning applications.
Let me start by saying that the concept of ML Studio is really great: even though you still need statistics knowledge, you can build, test, and even deploy a machine learning model without writing a single line of code. It allows this by offering prebuilt building blocks that can be customized and connected together using a visual interface.
Starting an Experiment
To get started, you first need to navigate to Azure ML Studio and sign in with a Microsoft Account. Once registered and signed in, you will see the homepage which provides you with multiple tabs.
On the left, we get the following options:
- Projects: Collections of experiments, web services, datasets and other assets that make up a single project
- Experiments: ML Studio Experiment
- Web Services: Experiments deployed as a web service
- Notebooks: Jupyter Notebooks
- Datasets: Datasets you have uploaded to ML Studio
- Trained Models: Models trained and saved in experiments
- Settings: Account Settings
To create a new experiment, navigate to the Experiments tab and click the New button. Another tab will appear that not only lets you create a new blank experiment but also offers a lot of samples, ranging from basics like downloading a dataset to classifying text or detecting fraud.
Obtaining the data
On the left of the experiment canvas is a palette offering all the available pre-built functionality. Here we can also access some sample datasets, like the Automobile price dataset used in the getting-started article of the Azure ML Studio documentation. Instead of using one of these datasets for this article, I decided to work with the Heart Disease dataset available on Kaggle.
To use a custom dataset in ML Studio, click the New button, navigate to the Dataset tab, and click FROM LOCAL FILE. This opens an overlay window where you can upload the dataset.
With the free workspace, you get 10 GB of data storage, which should be more than enough for most users.
Once the file is uploaded, you can access it by selecting your dataset under the My Datasets category (or by searching for its name) and dragging it onto the experiment canvas.
A really nice feature of Azure ML Studio is the ability to take a look at your data at any point in the analysis. To do this, simply right-click on the dataset and navigate to dataset > Visualize.
This opens a new overlay window showing information about all the columns and rows. In this dataset, each row represents a patient who either has or doesn't have heart disease.
Summarizing the data
Once you have loaded your dataset and taken a quick look at it, it is a good idea to get more information about each column. This can be achieved by dragging in the Summarize Data module.
After pressing the RUN button, it gives you access to basic statistics for each column, including the unique value count, mean, and standard deviation. The exact statistics vary depending on the column type.
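Outside ML Studio, the same per-column summary can be sketched in a few lines of pandas. The frame below is a tiny hypothetical stand-in for the heart-disease data, just to show the kind of statistics Summarize Data reports:

```python
import pandas as pd

# Tiny hypothetical stand-in for the heart-disease data.
df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "sex": [1, 1, 0, 1],
    "target": [1, 1, 0, 0],
})

# Per-column statistics similar to what the Summarize Data module reports:
# mean, standard deviation, and unique value count.
summary = df.describe().T
summary["unique"] = df.nunique()
print(summary[["mean", "std", "unique"]])
```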
Preparing the data
Now that you have a “good” understanding of your data it’s time to prepare it for machine learning.
For this data-set, we don’t need to do much preprocessing. The only significant thing we will do to increase the accuracy of our model is to transform categorical features to one-hot encoded features.
Because Azure ML Studio thinks that our categorical columns are numeric we first need to change their type to categorical using the Edit Metadata module.
By clicking on the Edit Metadata module we launch the Properties tab. Here we need to click on the Launch column selector button. This opens a new window where we can select all columns we want to transform from numerical to categorical.
Now that we selected the columns we need to change the Data type from Unchanged to Integer and then select Make Categorical from the Categorical tab.
Similarly, to convert the chosen columns to a one-hot encoding, drag the Convert to Indicator Values module onto the experiment canvas, click it to open the Properties tab, and select the same columns as before. Also check the Overwrite categorical columns checkbox so that the old columns are removed during the conversion.
If we now visualize the output of the Convert to Indicator Values module, we can see that instead of the 14 columns we started with, we now have 31 columns.
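The Edit Metadata plus Convert to Indicator Values pipeline corresponds to marking columns categorical and one-hot encoding them. Here is a minimal pandas sketch with two hypothetical integer-coded columns, `cp` and `thal`:

```python
import pandas as pd

# Hypothetical slice of the data: "cp" (chest pain type) and "thal" are
# stored as integers but are really categorical.
df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "cp": [3, 2, 1, 1],
    "thal": [1, 2, 2, 3],
})

# Edit Metadata step: mark the columns as categorical.
for col in ("cp", "thal"):
    df[col] = df[col].astype("category")

# Convert to Indicator Values step: one-hot encode, overwriting the originals.
df = pd.get_dummies(df, columns=["cp", "thal"])
print(sorted(df.columns))
```

Each categorical column is replaced by one indicator column per distinct value, which is why the column count grows (here from 3 to 7, in the real dataset from 14 to 31).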
Now the data must be split into a training and a validation set. This can be achieved by dragging the Split Data module onto the experiment canvas, connecting its input to the output of Convert to Indicator Values, navigating to the Properties tab, and choosing an appropriate split ratio.
Choosing and applying an algorithm and training it
At this point, we are ready to add a machine learning algorithm of our choice. By expanding the Machine Learning>Initialize Model category on the left of the canvas we get all the available models (We can also create our own using Python or R as discussed at the end of the article).
To keep this article simple, drag the Two-Class Logistic Regression module onto the canvas. Also drag the Train Model module from the Machine Learning > Train category onto the canvas, then connect the output of the Logistic Regression module to the left input of the Train Model module and the left output of the Split Data module (the training data) to the right input of the Train Model module.
Next we need to specify a label column by opening the Launch column selector of the Train Model module. In our case, the label column is called target.
Now we can run the experiment again and it should work without errors.
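For comparison, the Two-Class Logistic Regression and Train Model modules together correspond to a single fit call in scikit-learn. The arrays below are toy stand-ins for the one-hot encoded training data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the one-hot encoded training data (hypothetical shapes).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 5))
y_train = (X_train[:, 0] > 0).astype(int)  # plays the role of the "target" column

# Two-Class Logistic Regression + Train Model rolled into one call.
model = LogisticRegression().fit(X_train, y_train)
print(model.coef_.shape)  # one learned weight per input column
```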
By right-clicking on the Train Model module, we can now access the trained model, save it for later use, and visualize information about its settings and learned weights.
Scoring and evaluating the model
Now that we have trained our model, we can use our validation set to see how well it is doing. We do this by first making predictions using the Score Model module and then using the Evaluate Model module to get our accuracy and loss metrics.
To make predictions on the validation set we connect the trained model to the left input of the Score Model Module and the right output node of the Split data module to the right input of the Score Model Module.
When visualizing the output, we can see two new columns. The Scored Labels column contains the predicted labels, represented as integers of either 0 or 1. The other column, Scored Probabilities, gives us the raw probabilities of the predictions.
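The two scored columns correspond to `predict` and `predict_proba` in scikit-learn. A sketch with hypothetical train/validation arrays standing in for the Split Data outputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical train/validation arrays standing in for the Split Data outputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_valid = X[:75], X[75:]
y_train, y_valid = y[:75], y[75:]

model = LogisticRegression().fit(X_train, y_train)

# "Scored Labels": hard 0/1 predictions; "Scored Probabilities": P(class 1).
scored_labels = model.predict(X_valid)
scored_probs = model.predict_proba(X_valid)[:, 1]
```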
Evaluating the model
To now evaluate the model we drag in the Evaluate Model module and connect the output of the Score Model module to its left input node.
A really nice feature of Azure ML Studio is that it automatically determines the right metrics for our problem, so we don't need any further configuration and can simply run the experiment again and check the results.
The evaluation gives us many different classification metrics, including the ROC curve, AUC, and precision. It also allows us to vary the threshold between the two classes and immediately see how the metrics change.
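What the threshold slider does can be reproduced in a few lines: slide the cutoff applied to the scored probabilities and recompute the metrics. The labels and probabilities below are hypothetical:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical validation labels and scored probabilities.
y_valid = np.array([0, 0, 1, 1, 1, 0, 1, 0])
probs = np.array([0.2, 0.4, 0.8, 0.6, 0.9, 0.3, 0.55, 0.45])

# AUC is threshold-independent; it summarizes the whole ROC curve.
auc = roc_auc_score(y_valid, probs)

# Varying the decision threshold, as the evaluation view lets you do:
for threshold in (0.3, 0.5, 0.7):
    labels = (probs >= threshold).astype(int)
    print(threshold, accuracy_score(y_valid, labels))
```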
With over 84% accuracy, our model does a decent job of predicting whether or not a patient has heart disease, but we can get a higher accuracy by tuning our model's hyperparameters.
For this, we have two options. We can either click on our model and tune it manually, or replace the Train Model module with the Tune Model Hyperparameters module, which not only trains the model but also tunes its hyperparameters during training.
The Tune Model Hyperparameters module boosts our accuracy up to over 85%.
Deploying as an Azure Web Service
Probably the best thing about Azure ML Studio is the ability to deploy your experiment as a web service with only a few clicks. This not only allows you to deploy your model quickly but also provides a service that can scale to many concurrent requests without adding any complexity.
Azure ML Studio provides multiple ways to transform your experiment to a web service. The easiest one is to just use the SET UP WEB SERVICE button at the bottom of the experiment.
This option converts the experiment into a predictive experiment by removing the data summary, split, training, and other steps not needed in production.
After running the predictive experiment to ensure it's working, you can deploy it by clicking the DEPLOY WEB SERVICE button. This opens a new window with two tabs: the dashboard and a configuration tab.
Using the dashboard you can test your web service, access the API Key and view some general information.
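Calling the deployed service from code means sending a JSON request authorized with the API key. A minimal sketch of building such a request: the URL, key, and column names here are placeholders (the real values come from the dashboard), and the body follows the column-names-plus-values shape the classic ML Studio request-response API uses:

```python
import json

# Placeholder values: the real URL and key come from the web service dashboard.
url = "https://REGION.services.azureml.net/workspaces/WORKSPACE/services/SERVICE/execute"
api_key = "YOUR_API_KEY"

# Hypothetical input columns for one patient row.
payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["age", "sex", "cp"],
            "Values": [["63", "1", "3"]],
        }
    },
    "GlobalParameters": {},
}
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + api_key,
}
body = json.dumps(payload)
# The actual call would then be e.g.: requests.post(url, data=body, headers=headers)
```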
Using the Configuration tab we can set some general information and edit the input and output schema of the web service.
For more information on how to access and manage the web service as well as the pricing for scaling check out the official documentation.
Using Python/R Inside Azure ML Studio
Lastly, even though we don't need it for this simple example, I want to quickly mention that Azure ML Studio supports both Python and R, as well as selected functionality from the OpenCV library.
Python or R can be used by dragging the Execute Python Script or Execute R Script module onto the canvas and connecting it to the data you want to access.
With these modules you can perform tasks not currently supported by Azure ML Studio like creating extensive visualizations or performing complex data manipulations.
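The Execute Python Script module expects the script to define an `azureml_main` entry point that receives up to two DataFrames from the input ports and returns a tuple of DataFrames. A minimal sketch (the column manipulation is hypothetical):

```python
import pandas as pd

# Entry point the Execute Python Script module looks for: up to two input
# DataFrames in, a tuple of DataFrames out.
def azureml_main(dataframe1=None, dataframe2=None):
    # Hypothetical manipulation: add a derived column to the first input.
    dataframe1["age_squared"] = dataframe1["age"] ** 2
    return dataframe1,

# Local usage example with a toy frame:
out, = azureml_main(pd.DataFrame({"age": [2, 3]}))
print(out)
```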
For more information about the capabilities of the Execute Python Script module take a look at the documentation.
Personal opinion — Ideas for improvements
Even though Azure ML Studio could never be my main driver for creating machine learning experiments or projects, I think Microsoft did a really good job of creating a platform that anyone can access and that offers an intuitive interface for building machine learning projects.
Like all data science and machine learning tools out there, Azure ML Studio still requires a good bit of statistical knowledge to understand things like data preprocessing and model selection, but this makes sense considering the complexity of both fields.
To take Azure ML Studio to the next level I think Microsoft needs to offer more pre-built modules with an extended range of functionality including the ability for extensive exploratory data analysis, feature engineering and model comparison.
All in all, I think that ML Studio will work for a lot of people seeking to build data science projects with an intuitive drag-and-drop interface and deploy them on other Azure Services. Furthermore, because of its simplicity, it is a great tool for getting started with data science and machine learning.
Even though I enjoy exploring completely new tools and presenting them to you I decided to focus on the theory behind data science and machine learning for the next couple of months. Building more in-depth knowledge about the theory of machine learning will allow me to go into more detail in the upcoming articles and will hopefully make me a better educator.
That’s all from this article. Thanks for reading. If you have any feedback, recommendations or ideas of what I should cover next feel free to leave a comment or contact me on social media.