Machine learning models aim to make the most accurate predictions possible, while statistical models are designed to infer relationships between variables. In the past, we started from a question and looked for data to validate our ideas. Today, we can use data to anticipate possible problems.

Relevant concepts

Machine learning

Machine Learning (ML) is a multidisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures, so that they continually improve their own performance.

Supervised learning: In supervised learning, the input data is called “training data”, and each training example has a clear label or outcome. When building a prediction model, supervised learning sets up a learning process that compares the model's predictions with the actual results in the “training data” and keeps adjusting the model until its predictions reach the desired accuracy. Common application scenarios include classification and regression problems. Common algorithms include Logistic Regression and the Back Propagation Neural Network.
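To make the compare-and-adjust loop concrete, here is a minimal logistic regression sketch in plain JavaScript (no TF.js), trained by batch gradient descent on the cross-entropy loss. The data, learning rate, and epoch count are hypothetical choices for illustration; the toy data is symmetric around x = 0, so the learned decision boundary sits at 0.

```javascript
// Logistic regression for one feature, trained by batch gradient descent (plain JS, no TF.js).

function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Repeatedly compare predicted probabilities with the known labels and adjust w and b
function trainLogistic(xs, ys, learningRate = 0.5, epochs = 1000) {
  let w = 0;
  let b = 0;
  const n = xs.length;
  for (let epoch = 0; epoch < epochs; epoch++) {
    let gradW = 0;
    let gradB = 0;
    for (let i = 0; i < n; i++) {
      // Predicted probability minus actual label (0 or 1)
      const error = sigmoid(w * xs[i] + b) - ys[i];
      gradW += error * xs[i];
      gradB += error;
    }
    // Step against the gradient of the mean cross-entropy loss
    w -= (learningRate * gradW) / n;
    b -= (learningRate * gradB) / n;
  }
  return { w, b };
}

// Predict class 1 when the probability is at least 0.5
function classify({ w, b }, x) {
  return sigmoid(w * x + b) >= 0.5 ? 1 : 0;
}

// Hypothetical labeled training data (one feature per example)
const xs = [-3, -2, -1, 1, 2, 3];
const ys = [0, 0, 1, 0, 1, 1];
const model = trainLogistic(xs, ys);
console.log(model, classify(model, -2.5), classify(model, 2.5));
```

The labels are deliberately not perfectly separable, so the loss has a finite minimum and gradient descent settles on a positive weight instead of growing it forever.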

Unsupervised learning: In unsupervised learning, the data is not labeled, and the model tries to infer the internal structure of the data. Common application scenarios include association rule learning and clustering. Common algorithms include the Apriori algorithm and the K-means algorithm.
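A minimal K-means sketch for one-dimensional data follows, in plain JavaScript. It assumes the initial centroids are passed in explicitly (real implementations usually pick them randomly or with k-means++), which keeps the result deterministic; the function name and data are hypothetical.

```javascript
// 1-D K-means: groups unlabeled numbers into clusters around k centroids.
function kMeans1D(points, initialCentroids, maxIterations = 100) {
  let centroids = [...initialCentroids];
  let assignments = [];
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: attach each point to its nearest centroid
    assignments = points.map(p => {
      let best = 0;
      for (let c = 1; c < centroids.length; c++) {
        if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) best = c;
      }
      return best;
    });
    // Update step: move each centroid to the mean of the points assigned to it
    const next = centroids.map((old, c) => {
      const members = points.filter((_, i) => assignments[i] === c);
      return members.length ? members.reduce((s, v) => s + v, 0) / members.length : old;
    });
    // Stop once the centroids no longer move
    const converged = next.every((v, c) => v === centroids[c]);
    centroids = next;
    if (converged) break;
  }
  return { centroids, assignments };
}

const { centroids, assignments } = kMeans1D([1, 2, 3, 10, 11, 12], [0, 5]);
console.log(centroids, assignments); // [ 2, 11 ] [ 0, 0, 0, 1, 1, 1 ]
```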

Semi-supervised learning: In semi-supervised learning, the input data is partly labeled and partly unlabeled. Such a model can be used for prediction, but it first needs to learn the internal structure of the data so that it can organize the data sensibly before predicting. Application scenarios include classification and regression. Common algorithms are extensions of widely used supervised learning algorithms; methods such as Graph Inference or Laplacian SVM first try to model the unlabeled data and then make predictions based on the labeled data.
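The sketch below illustrates the semi-supervised idea with simple self-training, a different and much simpler technique than Graph Inference or Laplacian SVM: unlabeled points are labeled one at a time, always starting with the point closest to something already labeled, so labels spread along the structure of the data. The function name and data are hypothetical, and it assumes at least one labeled point.

```javascript
// Self-training with a nearest-neighbour rule on 1-D points.
// labeled: [{ x, label }, ...]; unlabeled: [x, ...]
function selfTrain(labeled, unlabeled) {
  const known = labeled.map(p => ({ ...p }));
  const remaining = [...unlabeled];
  while (remaining.length > 0) {
    // Find the unlabeled point closest to any already-labeled point
    let bestIdx = 0;
    let bestNeighbor = null;
    let bestDist = Infinity;
    remaining.forEach((x, i) => {
      for (const p of known) {
        const d = Math.abs(x - p.x);
        if (d < bestDist) {
          bestDist = d;
          bestIdx = i;
          bestNeighbor = p;
        }
      }
    });
    // Copy the neighbour's label and treat the point as labeled from now on
    const [x] = remaining.splice(bestIdx, 1);
    known.push({ x, label: bestNeighbor.label });
  }
  return known;
}

const result = selfTrain(
  [{ x: 0, label: 'A' }, { x: 8, label: 'B' }],
  [1, 2, 3, 4, 5, 9]
);
console.log(result);
```

Point 5 ends up labeled A because it is reached through the chain 1, 2, 3, 4, even though it is closer to the original B point at 8 than to the original A point at 0.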

Reinforcement learning: In reinforcement learning, input data serves as feedback to the model. This differs from supervised learning, where input data serves only to check whether the model is correct. In reinforcement learning, the input data is fed directly back to the model, which must adjust immediately. Common application scenarios include dynamic systems and robot control. Common algorithms include Q-learning and Temporal Difference Learning.
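A minimal tabular Q-learning sketch follows, on a hypothetical four-state corridor where reaching state 3 yields a reward of 1. Each environment response is used immediately to update the Q-table with Q(s, a) ← Q(s, a) + α(r + γ·max Q(s′, ·) - Q(s, a)). To keep the output deterministic, it sweeps every state/action pair instead of using the usual ε-greedy exploration; α, γ, and the sweep count are assumptions.

```javascript
// Tabular Q-learning on a hypothetical four-state corridor (plain JS).
// States 0..3; state 3 is the goal. Action 0 moves left, action 1 moves right.
const N_STATES = 4;
const GOAL = 3;

// Environment: returns the next state, the reward, and whether the episode ended
function step(state, action) {
  const next = action === 1 ? Math.min(state + 1, GOAL) : Math.max(state - 1, 0);
  return { next, reward: next === GOAL ? 1 : 0, done: next === GOAL };
}

function qLearning(alpha = 0.5, gamma = 0.9, sweeps = 500) {
  // Q[s][a] estimates the discounted return of taking action a in state s
  const Q = Array.from({ length: N_STATES }, () => [0, 0]);
  for (let sweep = 0; sweep < sweeps; sweep++) {
    // Deterministic exploration: try every non-goal state/action pair once per sweep
    for (let s = 0; s < GOAL; s++) {
      for (let a = 0; a < 2; a++) {
        const { next, reward, done } = step(s, a);
        // The environment's feedback immediately adjusts the estimate
        const target = done ? reward : reward + gamma * Math.max(...Q[next]);
        Q[s][a] += alpha * (target - Q[s][a]);
      }
    }
  }
  return Q;
}

const Q = qLearning();
// Greedy policy: in each non-goal state, take the action with the larger Q-value
const policy = Q.slice(0, GOAL).map(([left, right]) => (right > left ? 'right' : 'left'));
console.log(policy); // [ 'right', 'right', 'right' ]
```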

Regression algorithm: A regression algorithm explores the relationship between variables by measuring the error between its predictions and the observed values, and choosing the relationship that makes this error small.
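As an illustration of using error to find a relationship, the following plain-JavaScript sketch fits y ≈ w·x + b by ordinary least squares, the closed-form choice of w and b that minimizes the sum of squared errors. The function name and data are hypothetical.

```javascript
// Ordinary least squares for y ≈ w * x + b, computed in closed form.
function leastSquares(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let cov = 0;  // sum of (x - meanX) * (y - meanY)
  let varX = 0; // sum of (x - meanX)^2
  for (let i = 0; i < n; i++) {
    cov += (xs[i] - meanX) * (ys[i] - meanY);
    varX += (xs[i] - meanX) ** 2;
  }
  const w = cov / varX;      // slope that minimizes the squared error
  const b = meanY - w * meanX; // the fitted line passes through the means
  return { w, b };
}

// Points that lie exactly on y = 2x + 1
console.log(leastSquares([0, 1, 2, 3], [1, 3, 5, 7])); // { w: 2, b: 1 }
```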

TFJS

  • Tensors: TFJS is a framework for defining and running computations in JavaScript using tensors, which are collections of values shaped as one-dimensional or multidimensional arrays.
  • WebGL: the browser's standard API for 3D graphics; TFJS uses it to store tensors and perform mathematical operations on them on the GPU.
  • Models: In machine learning, a model is a function with trainable parameters that converts inputs into outputs.
  • Loss function: The loss function quantifies the “degree of error” of the model's predictions as a concrete number. Training aims to minimize the loss.
  • Optimizers: Given the model's current predictions, an optimizer decides how much to change each parameter in the model.
  • Metrics: Like losses, metrics compute a number that summarizes how well the model is doing. Metrics are usually computed on the whole dataset at the end of each epoch.
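The following plain-JavaScript sketch (it does not use TF.js) shows how these pieces fit together: a model with trainable kernel and bias, a mean-absolute-error loss that training minimizes, a mean-squared-error metric computed on the whole dataset after each epoch, and an optimizer step, matching the loss/metric pairing the training code later in this article configures. The function names, data, and learning rate are assumptions; the step here uses the full dataset, whereas TF.js's 'sgd' optimizer updates on each mini-batch.

```javascript
// Model: a function with trainable parameters (kernel, bias).
function model(params, x) {
  return params.kernel * x + params.bias;
}

// Loss: mean absolute error, the number training tries to minimize.
function meanAbsoluteError(preds, labels) {
  return preds.reduce((s, p, i) => s + Math.abs(p - labels[i]), 0) / preds.length;
}

// Metric: mean squared error, reported but not optimized directly.
function meanSquaredError(preds, labels) {
  return preds.reduce((s, p, i) => s + (p - labels[i]) ** 2, 0) / preds.length;
}

// Optimizer: one gradient-descent step using the (sub)gradient of the MAE loss.
function gradientStep(params, xs, labels, learningRate) {
  let gradKernel = 0;
  let gradBias = 0;
  xs.forEach((x, i) => {
    const sign = Math.sign(model(params, x) - labels[i]);
    gradKernel += sign * x;
    gradBias += sign;
  });
  return {
    kernel: params.kernel - (learningRate * gradKernel) / xs.length,
    bias: params.bias - (learningRate * gradBias) / xs.length,
  };
}

function train(xs, labels, epochs = 200, learningRate = 0.1) {
  let params = { kernel: 0, bias: 0 };
  const history = [];
  for (let epoch = 0; epoch < epochs; epoch++) {
    params = gradientStep(params, xs, labels, learningRate);
    // Loss and metric are computed on the whole dataset at the end of each epoch
    const preds = xs.map(x => model(params, x));
    history.push({ loss: meanAbsoluteError(preds, labels), mse: meanSquaredError(preds, labels) });
  }
  return { params, history };
}

const xs = [0, 0.25, 0.5, 0.75, 1];
const ys = xs.map(x => 2 * x + 1); // hypothetical data on the line y = 2x + 1
const { params, history } = train(xs, ys);
console.log(params, history[history.length - 1]);
```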

Statistics

  • Mean squared error (MSE): A measure of the degree of difference between an estimator and the quantity being estimated.

WHY (Purpose, idea)

Starting from input data that has been structured through attribution and correlation analysis, we train a machine learning model, make predictions, and output the linear regression results together with visual charts.

HOW (to)

Machine learning solution

1. Create a model

```javascript
import * as tf from '@tensorflow/tfjs';

function createModel(numOfFeatures) {
  // Create a sequential model with the tf.sequential() factory function
  const model = tf.sequential();
  // Input layer: inputShape is [numOfFeatures];
  // units sets the size of the layer's weight matrix (one output value here)
  model.add(tf.layers.dense({ inputShape: [numOfFeatures], units: 1 }));
  // Output layer
  model.add(tf.layers.dense({ units: 1 }));
  return model;
}
```
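Because neither dense layer in createModel specifies an activation, stacking them still yields a linear function of the input, which is what linear regression needs. The plain-JavaScript sketch below checks this for one input feature with hypothetical parameter values: applying two one-unit layers in sequence gives the same outputs as a single layer with kernel k₂·k₁ and bias k₂·b₁ + b₂.

```javascript
// Two stacked one-unit dense layers without activation (hypothetical parameter values).
const layer1 = { kernel: 3, bias: -1 };
const layer2 = { kernel: 0.5, bias: 2 };

// A one-unit dense layer with one input: y = kernel * x + bias
const dense = (layer, x) => layer.kernel * x + layer.bias;

// Apply both layers in order, as a sequential model does
function stacked(x) {
  return dense(layer2, dense(layer1, x));
}

// The same mapping written as a single linear layer
const collapsed = {
  kernel: layer2.kernel * layer1.kernel,           // 1.5
  bias: layer2.kernel * layer1.bias + layer2.bias, // 1.5
};

for (const x of [0, 1, 2]) {
  console.log(stacked(x), dense(collapsed, x)); // 1.5 1.5, then 3 3, then 4.5 4.5
}
```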

2. Prepare the data and convert it into tensors

Normalization formula (min-max scaling):

$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

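```javascript
import * as tf from '@tensorflow/tfjs';

function convertToTensor(data) {
  // tf.tidy disposes intermediate tensors once the callback returns
  return tf.tidy(() => {
    const inputs = data.x; // 2D array, one row per example, e.g. [[1], [2], [3]]
    const labels = data.y; // 1D array, e.g. [1, 2, 3]

    // inputs are already 2D: [numExamples, numOfFeatures]
    const inputTensor = tf.tensor2d(inputs);
    // labels: [1, 2, 3] -> [[1], [2], [3]]
    const labelTensor = tf.tensor2d(labels, [labels.length, 1]);

    // Min-max normalization: scale values into [0, 1]
    const inputMax = inputTensor.max();
    const inputMin = inputTensor.min();
    const labelMax = labelTensor.max();
    const labelMin = labelTensor.min();

    const normalizedInputs = inputTensor.sub(inputMin).div(inputMax.sub(inputMin));
    const normalizedLabels = labelTensor.sub(labelMin).div(labelMax.sub(labelMin));

    // Also return the min/max values so predictions can be un-normalized later
    return {
      inputs: normalizedInputs,
      labels: normalizedLabels,
      inputMax,
      inputMin,
      labelMax,
      labelMin,
    };
  });
}
```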
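The same min-max scaling, and the inverse used when converting predictions back to the original scale, can be sketched on plain arrays. The helper names are hypothetical, and the sketch assumes the values are not all equal (otherwise max - min is 0).

```javascript
// Min-max normalization on plain arrays: maps the smallest value to 0 and the largest to 1.
function normalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return { normalized: values.map(v => (v - min) / (max - min)), min, max };
}

// Inverse transform: maps values in [0, 1] back to the original range
function denormalize(values, min, max) {
  return values.map(v => v * (max - min) + min);
}

const { normalized, min, max } = normalize([10, 20, 30, 50]);
console.log(normalized);                        // [ 0, 0.25, 0.5, 1 ]
console.log(denormalize(normalized, min, max)); // [ 10, 20, 30, 50 ]
```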

3. Train the model

A dense layer maps the input tensor x to the output tensor y through the equation

$$y = x \cdot \text{kernel} + \text{bias}$$

where kernel and bias are the layer's trainable parameters. Their initial values are set when the model is created (the kernel is initialized randomly), so the untrained model makes poor predictions. To make more accurate predictions, the model must learn better kernel and bias values from the data. This search is the training process.

```javascript
// model.fit() already returns a Promise, so no manual Promise wrapper is needed
async function trainModel(model, inputs, labels) {
  model.compile({
    optimizer: 'sgd',          // stochastic gradient descent
    loss: 'meanAbsoluteError', // the quantity training minimizes
    metrics: ['mse'],          // reported alongside the loss
  });

  const batchSize = 32; // number of examples per gradient update
  const epochs = 200;   // number of passes over the whole dataset

  return model.fit(inputs, labels, {
    batchSize,
    epochs,
    callbacks: {
      // Log the loss and metrics at the end of every epoch
      onEpochEnd: (epoch, logs) => console.log(logs),
    },
  });
}
```
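To make the dense-layer mapping described above concrete, this plain-JavaScript sketch computes y = x · kernel + bias for a small batch, which is the forward pass whose kernel and bias training adjusts. The function name and parameter values are hypothetical; a real layer's kernel is learned, and TF.js stores it as a tensor.

```javascript
// Forward pass of a one-unit dense layer without activation: y = x · kernel + bias.
// Each example x has numOfFeatures values; kernel has the same length.
function denseForward(batch, kernel, bias) {
  // Dot product of each example with the kernel, starting the sum from the bias
  return batch.map(x => x.reduce((sum, xi, j) => sum + xi * kernel[j], bias));
}

// Hypothetical parameters for a layer with two input features
const kernel = [0.5, -1];
const bias = 2;
console.log(denseForward([[2, 1], [4, 0]], kernel, bias)); // [ 2, 4 ]
```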

4. Make predictions

```javascript
import * as tf from '@tensorflow/tfjs';

function predict(model, inputData, normalizationData) {
  const { inputMax, inputMin, labelMin, labelMax } = normalizationData;

  const [xs, preds] = tf.tidy(() => {
    // Create 100 new examples, evenly spaced in the normalized range [0, 1]
    const xsNorm = tf.linspace(0, 1, 100);
    // Reshape to [100, 1] for the model; this assumes a single input feature
    const predsNorm = model.predict(xsNorm.reshape([100, 1]));

    // Undo the min-max normalization to return to the original scale
    const unNormXs = xsNorm.mul(inputMax.sub(inputMin)).add(inputMin);
    const unNormPreds = predsNorm.mul(labelMax.sub(labelMin)).add(labelMin);

    return [unNormXs.dataSync(), unNormPreds.dataSync()];
  });

  const predictedPoints = Array.from(xs).map((val, i) => ({ x: val, y: preds[i] }));
  // inputData is an array of { x, y } points
  const originalPoints = inputData.map(d => ({ x: d.x, y: d.y }));

  return { predictedPoints, originalPoints };
}
```

Visualization (univariate linear regression as an example)

WHAT (phenomenon, result)

Mean squared error (MSE)

The mean squared error is the average of the squared differences between the predicted values and the true values.

Formula:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
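As a worked example with hypothetical values, take true values y = (3, 5, 7) and predictions ŷ = (2.5, 5, 8):

$$\text{MSE} = \frac{(3 - 2.5)^2 + (5 - 5)^2 + (7 - 8)^2}{3} = \frac{0.25 + 0 + 1}{3} = \frac{1.25}{3} \approx 0.417$$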

We evaluate the predictions by comparing the loss value reported by the loss function with a manually calculated mean squared error: the lower the loss is relative to the manually calculated MSE, the more accurate the predictions.

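```javascript
import * as tf from '@tensorflow/tfjs';

// Mean squared error between two tensors of the same shape: mean((labels - inputs)^2)
function evaluate(inputs, labels) {
  const result = labels.sub(inputs).square().mean();
  result.print();
  return result;
}
```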

Evaluate the results

The manually calculated mean squared error is about 0.16. With epochs set to 200, the loss drops to about 0.017, roughly an order of magnitude lower than the manually calculated value, which indicates that the model's predictions are considerably more accurate.
