This article is written for readers with a Java background

Author: DJL – Keerthan & Lanking

This is the fourth installment in HelloGitHub's open source project series on DJL, a deep learning platform built entirely in Java, written by Amazon engineer Keerthan Vasist.

1. Foreword

Java has long been a popular programming language in the enterprise, with a large developer community thanks to its rich ecosystem and well-maintained packages and frameworks. Yet despite the rapid evolution and adoption of deep learning, frameworks and libraries for Java developers remain in short supply. Most popular deep learning models today are written and trained in Python, so for Java developers, entering the world of deep learning means learning a new programming language on top of the complexities of deep learning itself. This makes it difficult for most Java developers to pick up deep learning and transition into deep learning development.

To lower this barrier for Java developers, AWS has built Deep Java Library (DJL), an open source deep learning framework tailored to Java developers. It provides a bridge between Java developers and mainstream deep learning frameworks.

In this article, we will use DJL to build a deep learning model and train it on the MNIST handwritten digit recognition task.

2. What is deep learning?

Before we get started, let’s take a look at the basic concepts of machine learning and deep learning.

Machine learning is the process of training a computer on data, using statistical techniques, to perform a specific task. This inductive learning approach allows computers to learn features and perform complex tasks, such as identifying objects in photographs. Such tasks are hard to accomplish with traditional programming because the required logic and decision criteria are too complex to write by hand.

Deep learning is a branch of machine learning that focuses on the development of artificial neural networks. An artificial neural network is a set of computational logic inspired by studies of how the human brain learns and achieves goals: it simulates the transmission of information between neurons in the brain to accomplish a variety of complex tasks. The "depth" in deep learning comes from stacking many layers in an artificial neural network, so that information is processed at deeper and deeper levels. Deep learning has a wide range of applications and is now used in object detection, action recognition, machine translation, semantic analysis, and other practical systems.

3. Training MNIST handwritten digit recognition

3.1 Project Configuration

You can import the dependencies with the Gradle configuration below. In this example, we use DJL's api package (the core DJL component) and the basicdataset package (the DJL dataset collection) to build the neural network and the dataset. We use MXNet as the deep learning engine, so we also add the mxnet-engine and mxnet-native-auto packages. This example can also run on the PyTorch engine by swapping in the corresponding packages (see the note after the snippet below).

plugins {
    id 'java'
}

repositories {
    jcenter()
}

dependencies {
    implementation platform("ai.djl:bom:0.8.0")
    implementation "ai.djl:api"
    implementation "ai.djl:basicdataset"
    // MXNet
    runtimeOnly "ai.djl.mxnet:mxnet-engine"
    runtimeOnly "ai.djl.mxnet:mxnet-native-auto"
}
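For reference, switching to the PyTorch engine only changes the runtime dependencies. The two artifact names below follow DJL's standard naming convention and would replace the MXNet lines above:

// PyTorch (replaces the two MXNet runtimeOnly lines)
runtimeOnly "ai.djl.pytorch:pytorch-engine"
runtimeOnly "ai.djl.pytorch:pytorch-native-auto"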

3.2 NDArray and NDManager

NDArray is DJL's basic structure for storing data and performing mathematical operations. An NDArray represents a fixed-length multi-dimensional array. NDArray is used much like numpy.ndarray in Python.

NDManager is the owner of NDArrays. It manages the creation and recycling of NDArrays, which helps optimize Java memory usage. Each NDArray is created by an NDManager and is closed when that NDManager is closed. Both NDManager and NDArray implement Java's AutoCloseable, which ensures they are collected promptly when no longer needed. For more on how to use them, see our previous article:

HelloGitHub DJL: DJL Java plays with multi-dimensional arrays, just like NumPy
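As a quick illustration (a minimal sketch, not from the original article), creating and operating on NDArrays under an NDManager looks like this; the values are arbitrary:

import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.Shape;

public class NDArrayDemo {
    public static void main(String[] args) {
        // The manager owns every NDArray it creates and releases them when it is closed
        try (NDManager manager = NDManager.newBaseManager()) {
            NDArray a = manager.create(new float[] {1f, 2f, 3f, 4f}, new Shape(2, 2));
            NDArray b = manager.ones(new Shape(2, 2));
            NDArray sum = a.add(b); // element-wise addition, just like NumPy
            System.out.println(sum);
        } // a, b and sum are all freed here
    }
}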

3.3 Model

In DJL, both training and inference start from the Model class. Here we focus on how the Model is constructed for training. Let's create a new Model instance. Since Model also implements AutoCloseable, we create it in a try-with-resources block:

try (Model model = Model.newInstance("mlp")) {
    ...
    // Training code goes here.
}

3.4 Preparing the data

The MNIST (Modified National Institute of Standards and Technology) database contains a large number of images of handwritten digits and is commonly used to train image processing systems. DJL includes the MNIST dataset in basicdataset; each MNIST image is 28 x 28 pixels. If you have your own dataset, you can also use the DJL dataset import tutorial to bring it into your training task.

djl.ai/docs/develo…

int batchSize = 32; // batch size
Mnist trainingDataset = Mnist.builder()
        .optUsage(Usage.TRAIN) // training set
        .setSampling(batchSize, true)
        .build();
Mnist validationDataset = Mnist.builder()
        .optUsage(Usage.TEST) // validation set
        .setSampling(batchSize, true)
        .build();

This code produces the training and validation sets. We also shuffle the data (the second argument to setSampling) for better training. In addition to these settings, you can apply further preprocessing to the images, such as resizing or normalizing them.

3.5 Building the model (Block)

Once your dataset is ready, we can build the neural network. In DJL, a neural network is made up of Blocks. A Block is a composable unit with neural network properties: it can represent a single operation, a part of a neural network, or even an entire neural network. Blocks can be executed sequentially or in parallel, and a Block can hold parameters and child Blocks. This nesting structure lets us construct complex yet maintainable neural networks. During training, the parameters attached to each Block, and to its child Blocks, are updated in real time; this recursive update ensures the whole neural network is fully trained.

The easiest way to assemble Blocks is to nest them one inside another. Using DJL's ready-made Block classes, we can quickly build all kinds of neural networks.

Based on common neural network patterns, DJL provides several Block variants. SequentialBlock handles sequential execution of its child Blocks, feeding the output of one child Block as the input to the next. ParallelBlock, on the other hand, feeds the same input to each of its child Blocks in parallel and combines their outputs according to a specified merge function. Finally, LambdaBlock is a Block for quick, parameter-free operations; since it has no parameters, nothing in it is updated during training.
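For example, a parameter-free Block that simply applies the ReLU activation can be written as a LambdaBlock (a small illustration, not taken from the original article); DJL's Activation.reluBlock() helper returns essentially the same thing:

import ai.djl.nn.Activation;
import ai.djl.nn.Block;
import ai.djl.nn.LambdaBlock;

// Wraps a parameter-free operation as a Block; nothing here is updated during training
Block relu = new LambdaBlock(Activation::relu);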

Let's try to create a basic multilayer perceptron (MLP). A multilayer perceptron is a simple feed-forward neural network containing only a handful of fully connected (Linear) layers, so we can build it directly with SequentialBlock.

int input = 28 * 28; // Input layer size
int output = 10; // Output layer size
int[] hidden = new int[] {128, 64}; // Hidden layer sizes
SequentialBlock sequentialBlock = new SequentialBlock();
sequentialBlock.add(Blocks.batchFlattenBlock(input));
for (int hiddenSize : hidden) {
    // Fully connected layer
    sequentialBlock.add(Linear.builder().setUnits(hiddenSize).build());
    // Activation function
    sequentialBlock.add(Activation.reluBlock());
}
sequentialBlock.add(Linear.builder().setUnits(output).build());

DJL also provides a ready-to-use MLP Block:

Block block = new Mlp(
        Mnist.IMAGE_HEIGHT * Mnist.IMAGE_WIDTH,
        Mnist.NUM_CLASSES,
        new int[] {128, 64});
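One step that is implied but not shown in the snippets above: the Block still needs to be attached to the Model before training, which in DJL is done with setBlock:

// Attach the network we just built to the Model created earlier
model.setBlock(block);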

3.6 Training

With the dataset and the neural network ready, we can start training the model. In deep learning, a training pass generally involves the following steps:

  • Initialization: we initialize the parameters of each Block; the function used to initialize each parameter is determined by the specified Initializer.
  • Forward propagation: the input data is passed through the neural network layer by layer to produce the output.
  • Loss calculation: we compute the deviation between the output and the labels using the specified loss function (Loss).
  • Backpropagation: the gradient of each parameter is computed by differentiating the loss in reverse.
  • Weight update: we update the value of each parameter on each Block according to the chosen Optimizer.

DJL uses the Trainer class to streamline this whole process. All you need to do is create a Trainer and specify the Initializer, Loss, and Optimizer to use; these are configured through a TrainingConfig. One more setting is worth calling out:

  • TrainingListener: a listener attached to the training process. It reports the results of each training stage in real time; these results can be used to log the training run or to debug problems during training. Users can also implement their own TrainingListener to monitor the process.

DefaultTrainingConfig config = new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
    .addEvaluator(new Accuracy())
    .addTrainingListeners(TrainingListener.Defaults.logging());
try (Trainer trainer = model.newTrainer(config)) {
    // Training code
}

Once the Trainer is created, we define the Shape of the input and then call the fit function to train. fit trains the model on the input data for several epochs and finally stores the result in a local directory.

/*
 * MNIST contains 28x28 grayscale images, which are imported as 28 * 28 NDArrays.
 * The first dimension is the batch size; we set the batch size to 1 for initialization.
 */
Shape inputShape = new Shape(1, Mnist.IMAGE_HEIGHT * Mnist.IMAGE_WIDTH);
int numEpoch = 5;
String outputDir = "/build/model";

// Initialize the trainer with the input shape
trainer.initialize(inputShape);

TrainingUtils.fit(trainer, numEpoch, trainingDataset, validationDataset, outputDir, "mlp");

That's all there is to it! Isn't training with DJL easy? Now let's look at the output of each training step. If you use the default listener, the output will look something like this:

[INFO ] - Downloading libmxnet.dylib ...
[INFO ] - Training on: cpu().
[INFO ] - Load MXNet Engine Version 1.7.0 in 0.131 ms.
Training:    100% |████████████████████████████████████████| Accuracy: 0.93, SoftmaxCrossEntropyLoss: 0.24, speed: 1235.20 items/sec
Validating:  100% |████████████████████████████████████████|
[INFO ] - Epoch 1 finished.
[INFO ] - Train: Accuracy: 0.93, SoftmaxCrossEntropyLoss: 0.24
[INFO ] - Validate: Accuracy: 0.95, SoftmaxCrossEntropyLoss: 0.14
Training:    100% |████████████████████████████████████████| Accuracy: 0.97, SoftmaxCrossEntropyLoss: 0.10, speed: 2851.06 items/sec
Validating:  100% |████████████████████████████████████████|
[INFO ] - Epoch 2 finished. [1m 41s]
[INFO ] - Train: Accuracy: 0.97, SoftmaxCrossEntropyLoss: 0.10
[INFO ] - Validate: Accuracy: 0.97, SoftmaxCrossEntropyLoss: 0.09
[INFO ] - train P50: 12.759 ms, P90: 21.759 ms
[INFO ] - training-metrics P50: 0.021 ms, P90: 0.034 ms
[INFO ] - backward P50: 0.608 ms, P90: 0.973 ms
[INFO ] - step P50: 0.543 ms, P90: 0.869 ms
[INFO ] - epoch P50: 35.989 s, P90: 35.989 s

When training is complete, we can use the model for inference to recognize handwritten digits. If you are not sure how to proceed, the following two links walk you through it, and a rough sketch follows after them.

Handwritten digit training example: docs.djl.ai/examples/do…

Handwritten digit inference example: docs.djl.ai/jupyter/tut…
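As a rough sketch of what inference could look like (loosely based on the tutorial linked above, and assuming the ai.djl:model-zoo dependency for the Mlp class; the image path is a placeholder):

import ai.djl.Model;
import ai.djl.basicmodelzoo.basic.Mlp;
import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.modality.cv.util.NDImageUtils;
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDList;
import ai.djl.translate.Batchifier;
import ai.djl.translate.Translator;
import ai.djl.translate.TranslatorContext;

import java.nio.file.Paths;
import java.util.Arrays;

public class MnistInference {
    public static void main(String[] args) throws Exception {
        try (Model model = Model.newInstance("mlp")) {
            // Rebuild the same architecture and load the parameters saved during training
            model.setBlock(new Mlp(28 * 28, 10, new int[] {128, 64}));
            model.load(Paths.get("/build/model")); // the outputDir used during training

            // Converts an image into the NDList the model expects, and the output into Classifications
            Translator<Image, Classifications> translator = new Translator<Image, Classifications>() {
                @Override
                public NDList processInput(TranslatorContext ctx, Image input) {
                    NDArray array = input.toNDArray(ctx.getNDManager(), Image.Flag.GRAYSCALE);
                    return new NDList(NDImageUtils.toTensor(array));
                }

                @Override
                public Classifications processOutput(TranslatorContext ctx, NDList list) {
                    NDArray probabilities = list.singletonOrThrow().softmax(0);
                    return new Classifications(
                            Arrays.asList("0", "1", "2", "3", "4", "5", "6", "7", "8", "9"),
                            probabilities);
                }

                @Override
                public Batchifier getBatchifier() {
                    return Batchifier.STACK;
                }
            };

            // "0.png" is a placeholder path to a 28x28 grayscale digit image
            Image img = ImageFactory.getInstance().fromFile(Paths.get("0.png"));
            try (Predictor<Image, Classifications> predictor = model.newPredictor(translator)) {
                System.out.println(predictor.predict(img));
            }
        }
    }
}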

4. Final thoughts

In this article, we introduced the basic concepts of deep learning and showed how to use DJL to elegantly build and train a deep learning model. DJL also provides many more datasets and neural networks. If you are interested in learning more about deep learning, check out our Java deep learning book.

Java Deep Learning book: zh.d2l.ai/

The Deep Java Library (DJL) is a Java-based deep learning framework that supports both training and inference. DJL is well received; it is built on top of multiple deep learning frameworks (TensorFlow, PyTorch, MXNet, etc.) and brings together the best features of each. You can easily use DJL to train your models and then deploy them.

It also has strong model zoo support: you can load a variety of pre-trained models with a single line of code. The DJL model zoo now supports up to 70 models from GluonCV, HuggingFace, TorchHub, and Keras.
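For instance, loading a pre-trained image classification model from the model zoo can be as compact as the sketch below (assuming the corresponding model zoo dependency, e.g. ai.djl.mxnet:mxnet-model-zoo, is on the classpath; the artifact id "resnet" is just an example):

import ai.djl.Application;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ModelZoo;
import ai.djl.repository.zoo.ZooModel;

public class ZooDemo {
    public static void main(String[] args) throws Exception {
        // Describe the model we want: an image classification model with artifact id "resnet"
        Criteria<Image, Classifications> criteria = Criteria.builder()
                .optApplication(Application.CV.IMAGE_CLASSIFICATION)
                .setTypes(Image.class, Classifications.class)
                .optArtifactId("resnet")
                .build();

        // Downloads (and caches) the pre-trained model, ready for model.newPredictor()
        try (ZooModel<Image, Classifications> model = ModelZoo.loadModel(criteria)) {
            System.out.println("Loaded: " + model.getName());
        }
    }
}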

Project address: github.com/awslabs/djl…