Introduction to Machine Learning in C# with ML.NET


When thinking of data science and machine learning, two programming languages immediately come to mind: Python and R. Both have libraries for every common machine learning algorithm, preprocessing technique and much more, so they can be used for almost any machine learning problem.

However, sometimes an individual or company can't or doesn't want to use Python or R. There are many possible reasons, such as an existing code base in another language or a lack of experience with Python or R. One of the most popular languages today is C#, which is used for a wide range of applications. To bring machine learning to C#, Microsoft created ML.NET, an open-source framework that provides all the basic machine learning functionality.

In this article, I will show you how to use ML.NET to create a binary classification model, discuss its Auto-ML capabilities and show you how to use a TensorFlow model with ML.NET. The complete code for the binary classification model can be found on my GitHub.

Adding ML.NET to a C# project

Adding ML.NET to your C# or F# project is fairly easy. The only thing needed is to install the Microsoft.ML NuGet package. Depending on your use case, you might also need extra packages such as Microsoft.ML.ImageAnalytics, Microsoft.ML.TensorFlow or Microsoft.ML.OnnxTransformer.
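Once the package is installed, every ML.NET program starts by creating an MLContext. This object exposes the data, transform, trainer and model catalogs used throughout this article, and later snippets refer to it as `mlContext`. Below is a minimal sketch that assumes only the Microsoft.ML package is referenced. The `MlNetSetup` class name is hypothetical:

```csharp
using Microsoft.ML;

// Hypothetical helper class, shown only to illustrate creating the context.
public static class MlNetSetup
{
    // Creates the MLContext that later snippets call `mlContext`.
    // Passing a seed makes operations that involve randomness
    // (shuffling, data splits, some trainers) repeatable between runs.
    public static MLContext CreateContext(int seed = 0)
    {
        return new MLContext(seed: seed);
    }
}
```

Usage: `MLContext mlContext = MlNetSetup.CreateContext();`, after which `mlContext.Data`, `mlContext.Transforms`, `mlContext.BinaryClassification` and `mlContext.Model` are available.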

Figure 1: Installing ML.NET

Loading in a data-set and creating a data pipeline

Loading and preprocessing a data-set in ML.NET is quite different from working with other machine learning packages and frameworks, because ML.NET requires us to explicitly state the structure of our data. To do so, we create a file called ModelInput.cs inside a folder called DataModels. In this file we define a class with one property for every column of our data-set.

For this article, we will use the Credit Card Fraud Detection data-set, which can be downloaded for free from Kaggle. This data-set contains 31 columns: the class of the transaction (1 for fraud, 0 otherwise), the amount of the transaction, the time the transaction occurred and 28 anonymized features (V1 to V28).

Figure 2: Credit Card Fraud Detection data-set

The ModelInput class that describes these columns looks like this:
using Microsoft.ML.Data;

namespace CreditCardFraudDetection.DataModels
{
    public class ModelInput
    {
        [ColumnName("Time"), LoadColumn(0)]
        public float Time { get; set; }

        [ColumnName("V1"), LoadColumn(1)]
        public float V1 { get; set; }

        [ColumnName("V2"), LoadColumn(2)]
        public float V2 { get; set; }

        [ColumnName("V3"), LoadColumn(3)]
        public float V3 { get; set; }

        [ColumnName("V4"), LoadColumn(4)]
        public float V4 { get; set; }

        [ColumnName("V5"), LoadColumn(5)]
        public float V5 { get; set; }

        [ColumnName("V6"), LoadColumn(6)]
        public float V6 { get; set; }

        [ColumnName("V7"), LoadColumn(7)]
        public float V7 { get; set; }

        [ColumnName("V8"), LoadColumn(8)]
        public float V8 { get; set; }

        [ColumnName("V9"), LoadColumn(9)]
        public float V9 { get; set; }

        [ColumnName("V10"), LoadColumn(10)]
        public float V10 { get; set; }

        [ColumnName("V11"), LoadColumn(11)]
        public float V11 { get; set; }

        [ColumnName("V12"), LoadColumn(12)]
        public float V12 { get; set; }

        [ColumnName("V13"), LoadColumn(13)]
        public float V13 { get; set; }

        [ColumnName("V14"), LoadColumn(14)]
        public float V14 { get; set; }

        [ColumnName("V15"), LoadColumn(15)]
        public float V15 { get; set; }

        [ColumnName("V16"), LoadColumn(16)]
        public float V16 { get; set; }

        [ColumnName("V17"), LoadColumn(17)]
        public float V17 { get; set; }

        [ColumnName("V18"), LoadColumn(18)]
        public float V18 { get; set; }

        [ColumnName("V19"), LoadColumn(19)]
        public float V19 { get; set; }

        [ColumnName("V20"), LoadColumn(20)]
        public float V20 { get; set; }

        [ColumnName("V21"), LoadColumn(21)]
        public float V21 { get; set; }

        [ColumnName("V22"), LoadColumn(22)]
        public float V22 { get; set; }

        [ColumnName("V23"), LoadColumn(23)]
        public float V23 { get; set; }

        [ColumnName("V24"), LoadColumn(24)]
        public float V24 { get; set; }

        [ColumnName("V25"), LoadColumn(25)]
        public float V25 { get; set; }

        [ColumnName("V26"), LoadColumn(26)]
        public float V26 { get; set; }

        [ColumnName("V27"), LoadColumn(27)]
        public float V27 { get; set; }

        [ColumnName("V28"), LoadColumn(28)]
        public float V28 { get; set; }

        [ColumnName("Amount"), LoadColumn(29)]
        public float Amount { get; set; }

        [ColumnName("Class"), LoadColumn(30)]
        public bool Class { get; set; }
    }
}

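Here we create a property for each column in our data-set. The important thing is to specify the right data type and column index. LoadColumn gives the zero-based position of the column in the file, and ColumnName sets the name the column has inside ML.NET. The Class column is declared as bool because ML.NET's binary classification trainers expect a Boolean label.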
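Writing 28 nearly identical properties is tedious. ML.NET can also load a contiguous range of columns into a single vector property. The sketch below shows that alternative and assumes the same column layout as above. The class name `ModelInputVector` and the property name `V` are hypothetical:

```csharp
using Microsoft.ML.Data;

namespace CreditCardFraudDetection.DataModels
{
    // Hypothetical alternative to ModelInput: V1..V28 are loaded as one
    // fixed-size vector instead of 28 separate properties.
    public class ModelInputVector
    {
        [LoadColumn(0)]
        public float Time { get; set; }

        // LoadColumn(start, end) reads the inclusive column range 1..28;
        // VectorType(28) declares the fixed vector length.
        [LoadColumn(1, 28), VectorType(28)]
        public float[] V { get; set; }

        [LoadColumn(29)]
        public float Amount { get; set; }

        [LoadColumn(30)]
        public bool Class { get; set; }
    }
}
```

With this class, the concatenation step only needs three inputs, for example `mlContext.Transforms.Concatenate("Features", "Time", "V", "Amount")`.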

Now that we have modeled our input data, we also need to model what the output should look like. This is done in a similar fashion to the class above.

using Microsoft.ML.Data;

namespace CreditCardFraudDetection.DataModels
{
    public class ModelOutput
    {
        [ColumnName("PredictedLabel")]
        public bool Prediction { get; set; }

        public float Score { get; set; }
    }
}

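Here, we have two properties. Prediction holds the predicted label as a Boolean, where true means the transaction is classified as fraud. The ColumnName attribute maps it to the PredictedLabel column the trainer produces. Score holds the raw score the model assigns to the positive class. It is not a percentage: higher values simply mean the model considers fraud more likely.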
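Calibrated binary trainers, such as the LightGBM trainer used later in this article, also output a Probability column between 0 and 1. The sketch below shows an output class that maps this column as well, assuming such a trainer. The class name and the `IsFraudAt` threshold helper are hypothetical:

```csharp
using Microsoft.ML.Data;

namespace CreditCardFraudDetection.DataModels
{
    // Hypothetical extended output class. Property names (or ColumnName
    // attributes) must match the output columns of the trained model.
    public class ModelOutputWithProbability
    {
        [ColumnName("PredictedLabel")]
        public bool Prediction { get; set; }

        // Raw, uncalibrated score for the positive (fraud) class
        public float Score { get; set; }

        // Calibrated probability between 0 and 1, only present when the
        // trainer is calibrated (for example LightGBM binary classification)
        public float Probability { get; set; }

        // Hypothetical helper: applies your own decision threshold to the probability
        public bool IsFraudAt(float threshold) => Probability >= threshold;
    }
}
```

A custom threshold can be useful for fraud detection, where missing a fraudulent transaction is usually costlier than flagging a legitimate one.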

Now that both the input and output data are modeled, we can load the actual data using the LoadFromTextFile method. All ML.NET operations go through an MLContext object, here called mlContext.

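using Microsoft.ML;

// MLContext is the entry point for all ML.NET operations (the seed is optional and makes runs repeatable)
var mlContext = new MLContext(seed: 1);

// dataFilePath is the path of the downloaded CSV file (header row, comma-separated, quoted values)
IDataView trainingDataView = mlContext.Data.LoadFromTextFile<ModelInput>(
                                            path: dataFilePath,
                                            hasHeader: true,
                                            separatorChar: ',',
                                            allowQuoting: true,
                                            allowSparse: false);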
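LoadFromTextFile is lazy, so the file is only read when the data is actually consumed. You can quickly confirm that the LoadColumn indices and types in ModelInput are right by listing the schema ML.NET built from the class. Below is a small sketch; the `SchemaCheck` helper is hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical helper for inspecting a loaded IDataView.
public static class SchemaCheck
{
    // Lists every column of an IDataView as "Name: Type".
    // The schema comes from the loader definition, so this does not
    // iterate over the rows of the file.
    public static List<string> DescribeColumns(IDataView data)
    {
        return data.Schema
            .Select(column => $"{column.Name}: {column.Type}")
            .ToList();
    }
}
```

Usage: `foreach (string line in SchemaCheck.DescribeColumns(trainingDataView)) Console.WriteLine(line);`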

Creating and training a model

To create and train a model with ML.NET, we build a pipeline that contains both the data preprocessing steps and the training algorithm. For this particular data-set, meaningful preprocessing is hard because 28 of the features are anonymized. I therefore kept it simple and only concatenated all features into a single column. ML.NET trainers expect all features in one vector column (called Features by default), so this concatenation step is needed whenever the features are spread across several columns.

var dataProcessPipeline = mlContext.Transforms.Concatenate("Features", new[] { "Time", "V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", "V21", "V22", "V23", "V24", "V25", "V26", "V27", "V28", "Amount" });
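Typing all 30 column names by hand is error-prone. Because the names follow a fixed pattern, the array can also be built in code. This is a plain C# sketch, and the `FeatureColumns` helper is hypothetical. Its result can be passed to `Concatenate` in place of the literal array:

```csharp
using System.Linq;

// Hypothetical helper that generates the feature column names.
public static class FeatureColumns
{
    // Builds { "Time", "V1", ..., "V28", "Amount" }: the same 30 names,
    // in the same order, as the literal array passed to Concatenate.
    public static string[] Build()
    {
        return new[] { "Time" }
            .Concat(Enumerable.Range(1, 28).Select(i => $"V{i}"))
            .Append("Amount")
            .ToArray();
    }
}
```

Usage: `var dataProcessPipeline = mlContext.Transforms.Concatenate("Features", FeatureColumns.Build());`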

For the model I chose the LightGBM algorithm. LightGBM is not part of the core Microsoft.ML package, so you also need to install the Microsoft.ML.LightGbm NuGet package to use it.

// Choosing algorithm
var trainer = mlContext.BinaryClassification.Trainers.LightGbm(labelColumnName: "Class", featureColumnName: "Features");
// Appending algorithm to pipeline
var trainingPipeline = dataProcessPipeline.Append(trainer);
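The LightGbm extension method also accepts optional hyperparameters. The sketch below passes them explicitly. The values are illustrative placeholders, not tuned for this data-set, and the `TrainerFactory` helper is hypothetical:

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers.LightGbm;

// Hypothetical helper that configures the LightGBM binary trainer.
public static class TrainerFactory
{
    public static LightGbmBinaryTrainer CreateLightGbm(MLContext mlContext)
    {
        return mlContext.BinaryClassification.Trainers.LightGbm(
            labelColumnName: "Class",
            featureColumnName: "Features",
            numberOfLeaves: 31,              // maximum number of leaves per tree
            minimumExampleCountPerLeaf: 20,  // minimum number of rows in a leaf
            learningRate: 0.1,               // shrinkage applied to each boosting round
            numberOfIterations: 200);        // number of boosting rounds (trees)
    }
}
```

Usage: `var trainingPipeline = dataProcessPipeline.Append(TrainerFactory.CreateLightGbm(mlContext));`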

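Now we can train the model with the Fit method and save it with mlContext.Model.Save. This method stores the model, together with the schema of its input data, in a single file (conventionally with a .zip extension).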

ITransformer model = trainingPipeline.Fit(trainingDataView);
// modelPath: where to write the model file
mlContext.Model.Save(model, trainingDataView.Schema, modelPath);
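The saved file can be loaded back later, for example in another application, with mlContext.Model.Load. This method also returns the input schema the model expects. In the sketch below, `modelPath` is a placeholder for the file you saved, and `ModelLoader` is a hypothetical helper:

```csharp
using Microsoft.ML;

// Hypothetical helper wrapping model loading.
public static class ModelLoader
{
    // Loads a model previously written with mlContext.Model.Save.
    // inputSchema describes the columns the model expects as input.
    public static ITransformer Load(MLContext mlContext, string modelPath, out DataViewSchema inputSchema)
    {
        return mlContext.Model.Load(modelPath, out inputSchema);
    }
}
```

The loaded ITransformer can be used exactly like the one returned by Fit, for example to create a PredictionEngine.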

Evaluating a model

Now that our model is trained, we need to check its performance. The easiest way to do this is cross-validation. ML.NET offers cross-validation methods for every kind of task it supports. Because ours is a binary classification problem, we use the mlContext.BinaryClassification.CrossValidateNonCalibrated method to score our model. With five folds, it trains the whole pipeline five times, each time on four fifths of the data, and evaluates it on the remaining fifth.

var crossValidationResults = mlContext.BinaryClassification.CrossValidateNonCalibrated(trainingDataView, trainingPipeline, numberOfFolds: 5, labelColumnName: "Class");
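CrossValidateNonCalibrated returns one result per fold, each with its own BinaryClassificationMetrics. The sketch below averages a few of them, and the `CvReport` helper is hypothetical. Fraud cases are a very small fraction of this data-set, so accuracy alone is misleading. For that reason, the sketch also reports the area under the precision-recall curve and the F1 score:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical helper that summarizes cross-validation results.
public static class CvReport
{
    // Averages selected metrics over all cross-validation folds.
    public static string Summarize(
        IReadOnlyList<TrainCatalogBase.CrossValidationResult<BinaryClassificationMetrics>> folds)
    {
        double accuracy = folds.Average(f => f.Metrics.Accuracy);
        double auc = folds.Average(f => f.Metrics.AreaUnderRocCurve);
        double auprc = folds.Average(f => f.Metrics.AreaUnderPrecisionRecallCurve);
        double f1 = folds.Average(f => f.Metrics.F1Score);

        return $"Folds: {folds.Count} | Accuracy: {accuracy:F4} | AUC: {auc:F4} | " +
               $"AUPRC: {auprc:F4} | F1: {f1:F4}";
    }
}
```

Usage: `Console.WriteLine(CvReport.Summarize(crossValidationResults));`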

Making predictions

Making predictions on new data is straightforward with ML.NET. We only need to create a PredictionEngine, a convenience API for making a prediction on a single example, and call its Predict method with a ModelInput object.

// model is the ITransformer returned by Fit (or loaded with mlContext.Model.Load)
var predEngine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(model);

ModelInput sampleData = new ModelInput()
{
    Time = 0,
    V1 = -1.3598071336738f,  // float properties need the f suffix on literals
    // ...
};

ModelOutput predictionResult = predEngine.Predict(sampleData);

Console.WriteLine($"Actual value: {sampleData.Class} | Predicted value: {predictionResult.Prediction}");
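PredictionEngine is convenient for single predictions, but it is not thread-safe. To score many rows at once, you can instead apply the model to an IDataView with Transform and read the results back. The sketch below assumes an output class such as ModelOutput, and the `BatchScorer` helper is hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical helper for batch scoring.
public static class BatchScorer
{
    // Scores every row of `data` with a trained model and maps the output
    // columns onto TOutput (for example ModelOutput) by name.
    // ToList() materializes all rows, so use it only when they fit in memory.
    public static List<TOutput> Score<TOutput>(MLContext mlContext, ITransformer model, IDataView data)
        where TOutput : class, new()
    {
        IDataView scored = model.Transform(data);
        return mlContext.Data
            .CreateEnumerable<TOutput>(scored, reuseRowObject: false)
            .ToList();
    }
}
```

Usage: `List<ModelOutput> predictions = BatchScorer.Score<ModelOutput>(mlContext, model, trainingDataView);`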

Auto-ML

Another great feature of ML.NET is its Auto-ML support. With Auto-ML, we can build basic machine learning solutions by only specifying the kind of problem we are working on and providing our data. ML.NET then tries different algorithms and settings and selects the best-performing model.

To get started with Auto-ML in ML.NET, you need to install the "ML.NET Model Builder (Preview)" Visual Studio extension. You can do this through Visual Studio's extension manager.

After installing the extension, you can use Auto-ML by right-clicking your project in the Solution Explorer and selecting Add > Machine Learning.

Figure 3: Adding Auto-ML to a project

This opens the Model Builder window, which guides you through the process of building a machine learning model.

Figure 4: Model Builder window

For details on the individual steps, check out the official getting started tutorial. After you finish all the steps, Model Builder automatically generates the code for training and using the model.
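Model Builder is built on ML.NET's AutoML, which can also be called directly from code through the Microsoft.ML.AutoML NuGet package. The sketch below assumes that package's experiment API (as found in its 0.x releases) and the trainingDataView from above. The 60-second time budget and the `AutoMlRunner` helper are arbitrary, hypothetical choices:

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical helper that runs an AutoML binary classification experiment.
public static class AutoMlRunner
{
    // Tries several trainers and hyperparameters within the time budget
    // and returns the best run found by the experiment.
    public static RunDetail<BinaryClassificationMetrics> FindBestModel(
        MLContext mlContext, IDataView trainingData, uint maxSeconds = 60)
    {
        ExperimentResult<BinaryClassificationMetrics> result = mlContext.Auto()
            .CreateBinaryClassificationExperiment(maxSeconds)
            .Execute(trainingData, labelColumnName: "Class");

        return result.BestRun;
    }
}
```

Usage: `var best = AutoMlRunner.FindBestModel(mlContext, trainingDataView);`, then inspect `best.TrainerName`, `best.ValidationMetrics` and `best.Model`.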

Using a pre-trained TensorFlow model

Another great thing about ML.NET is that it can use TensorFlow and ONNX models for inference. To use a TensorFlow model, you need to install the Microsoft.ML.TensorFlow NuGet package. You can then load a TensorFlow model with the mlContext.Model.LoadTensorFlowModel method and call ScoreTensorFlowModel on the result, passing it the names of the model's input and output tensors.

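// Builds an inference pipeline around a pre-trained TensorFlow model.
// TensorFlowModelSettings (tensor names) and CreateEmptyDataView() are helpers defined
// elsewhere; <preprocess-data> stands for the preprocessing steps your model needs.
private ITransformer SetupMlnetModel(string tensorFlowModelFilePath)
{
    var pipeline = _mlContext.<preprocess-data>
           .Append(_mlContext.Model.LoadTensorFlowModel(tensorFlowModelFilePath)
                                               .ScoreTensorFlowModel(
                                                      outputColumnNames: new[] { TensorFlowModelSettings.outputTensorName },
                                                      inputColumnNames: new[] { TensorFlowModelSettings.inputTensorName },
                                                      addBatchDimensionInput: false));

    // Nothing is trained here, so Fit only needs the input schema of an empty data view
    ITransformer mlModel = pipeline.Fit(CreateEmptyDataView());

    return mlModel;
}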
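The snippet above relies on a TensorFlowModelSettings class and a CreateEmptyDataView helper that are defined elsewhere. The sketch below shows what they could look like, assuming an image model whose pipeline starts from an image path column. The tensor names "input" and "output" and the `ImageInputData` class are hypothetical placeholders; replace them with the values of your own model. The helper is written here as a static method that takes the MLContext as a parameter:

```csharp
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical settings: replace with the real tensor names of your model.
public static class TensorFlowModelSettings
{
    public const string inputTensorName = "input";
    public const string outputTensorName = "output";
}

// Hypothetical input row: the preprocessing steps would load images from this path.
public class ImageInputData
{
    public string ImagePath { get; set; }
}

public static class EmptyData
{
    // The TensorFlow scoring transform is not trained, so fitting the pipeline
    // only needs the input schema; an empty IDataView with the right columns is enough.
    public static IDataView CreateEmptyDataView(MLContext mlContext)
    {
        return mlContext.Data.LoadFromEnumerable(new List<ImageInputData>());
    }
}
```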

For more information on how to use a TensorFlow model in ML.NET, check out 'Run with ML.NET C# code a TensorFlow model exported from Azure Cognitive Services Custom Vision'.

Conclusion

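ML.NET is an open-source machine learning framework that allows you to create and use machine learning models inside .NET. In this article, you learned how to use ML.NET to create a credit card fraud detection model, how Model Builder applies Auto-ML for you, and how to run a pre-trained TensorFlow model.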

That’s all for this article. If you have any questions or just want to chat with me, feel free to leave a comment below or contact me on social media. If you want continuous updates about my blog, make sure to follow me on Medium and join my newsletter.

What's next?

In my second ML.NET article, I will show you how to use ML.NET and ASP.NET to create a RESTful API for a deep neural network written in TensorFlow.