
Let’s take a look at how machine learning can be used in health care, and how to process patient monitoring data with Apache Spark.

Today, the combination of IoT data, streaming analytics, machine learning, and distributed computing lets us store and analyze more data, faster and at lower cost, than ever before.

Here are some examples of IoT, big data, and machine learning working together to accomplish tasks:

  • Health care: continuous monitoring of chronic diseases
  • Smart cities: managing and guiding traffic flow and congestion
  • Manufacturing: structural optimization and predicting when structures will require maintenance
  • Transportation: route planning to reduce fuel consumption
  • Automotive: driverless cars
  • Telecommunications: anomaly detection in information transmission
  • Retail: location-based advertising recommendations

To understand why combining IoT streaming data with machine learning can improve health care, first consider chronic diseases. Conditions such as heart disease account for the majority of medical spending, and a key factor in managing them is the quality of ongoing care, which can keep chronic patients out of the hospital unnecessarily. With inexpensive sensors, machine learning can monitor vital signs continuously, letting doctors respond more quickly and prescribe based on a patient’s actual condition, a model that also holds potential for low-cost, scalable chronic disease management.

Results from a team at Stanford University show that machine learning models can identify arrhythmias from electrocardiograms (ECGs) better than specialists can.

As Michael Chui of the McKinsey Global Institute says, “Sensors placed on patients allow remote real-time monitoring to provide early warning, avoiding the onset of chronic diseases and costly care. And better care for congestive heart failure alone could save the United States $1 billion a year.”

Monitoring data can be analyzed in real time, and alerts can be sent to caregivers when necessary so that they learn of changes in a patient’s condition immediately. A low false positive rate and a distinctive alert for a genuine emergency are essential. One patient at UCSF died after receiving 39 times the normal dose of an antibiotic; the warning for a 39-fold overdose looked exactly the same as the warning for a 1 percent overdose, and doctors and pharmacists who see such warnings too often simply stop reading them.

In this article, we will discuss applying streaming machine learning to anomaly detection in cardiac monitoring data, using this example to show how digital medical technology can be applied. We will discuss in detail how to control the accuracy of triggered alarms to reduce false positives. The application is based on Practical Machine Learning: A New Look at Anomaly Detection by Ted Dunning and Ellen Friedman, which can be viewed as a PDF.

What is machine learning?

Machine learning uses algorithms to find patterns in data and build models of them, so that those patterns can be recognized and used to make predictions on new data.

What is anomaly detection?

Anomaly detection is an example of unsupervised learning.

Unsupervised learning algorithms do not require labels or target values for their samples in advance. They are often used to find patterns and similarities in input data, for example to group similar customers by their spending data.

Anomaly detection first builds a model of normal behavior, then compares observations against it and raises an alarm when a significant deviation is detected. With this approach, we did not start with a labeled data set of abnormal heart conditions to classify. Instead, we took deviation thresholds from the literature and evaluated observations against them in near real time.

Building the model with clustering

Cardiologists have defined the waveform patterns of a normal electrocardiogram. We use these patterns to train a model that predicts upcoming values from previously observed heartbeat activity, then compare the predictions with the actual values to detect abnormal behavior.

To model normal heartbeat behavior, we take an electrocardiogram and split it into segments of roughly a third of a second (the data can come from a single patient or from many patients, and adjacent segments overlap), then group similar waveforms with a clustering algorithm.
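As a concrete illustration (not the article’s original Spark code, which is not reproduced here), splitting a trace into overlapping fixed-length windows might look like the following; the 64-sample window and 50% overlap are assumptions for the sketch:

```python
def segment(signal, window=64, hop=32):
    """Split a signal into overlapping fixed-length windows.

    window: samples per segment (roughly a third of a second of ECG,
            depending on the sample rate); hop: step between windows,
    so hop < window gives overlapping segments.
    """
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, hop)]

ecg = list(range(256))                 # stand-in for a real ECG trace
windows = segment(ecg)
print(len(windows), len(windows[0]))   # 7 windows of 64 samples
```

Each window then becomes one input vector for the clustering step below.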

Clustering algorithms group the samples in a data set. Once trained, a clustering model assigns each input sample to a category based on its similarity to the groups it has learned. The k-means algorithm partitions observations into k groups, assigning each observation to the group whose cluster center is nearest.
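A minimal, library-free sketch of the assign-then-update loop described above (Spark MLlib’s KMeans does the same thing at scale on distributed vectors):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means over tuples; points must contain at least k items."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            groups[j].append(p)
        # update step: each center moves to the mean of its group
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g
                   else centers[j]
                   for j, g in enumerate(groups)]
    return centers

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers = kmeans(pts, k=2)  # two centers, near each cloud of points
```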

In the following Apache Spark code, we do the following:

  • Convert the ECG data into vectors.
  • Create a k-means object and set the number of clusters and the maximum number of training iterations.
  • Train the model on the input data.
  • Save the model for later use.
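The original Spark code did not survive translation. As a minimal stand-in for the last step, persisting the trained model, here is a pickle-based sketch; Spark MLlib’s KMeansModel has its own save and load methods, so this is only an analogy:

```python
import os
import pickle
import tempfile

# assumed cluster centers from a previously trained k-means model
centers = [(0.05, 0.1), (5.1, 4.95)]

# save the model for later use, then load it back
path = os.path.join(tempfile.gettempdir(), "ecg_kmeans.pkl")
with open(path, "wb") as f:
    pickle.dump(centers, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)
```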

(Figure: reconstruction of the ECG waveforms in our catalog.)

Processing real-time streaming data with the model built from normal data

To apply the model from the previous step to real ECG data: when an observed heartbeat waveform (the green waveform on the left) arrives, we find the matching normal waveforms (shown in red in the middle); superimposing the two red waveforms approximates the normal ECG waveform in green on the left. (To reconstruct the waveform from overlapping slices, we multiply by a sinusoidal window function.)
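The sinusoidal-window trick can be checked numerically. With 50% overlap, a sine window applied at both the slicing and the overlap-add stages sums to one across overlaps, so away from the edges the slices add back up to the original signal; the window length here is an arbitrary choice for the sketch:

```python
import math

N, hop = 64, 32                          # window length, 50% overlap
win = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

signal = [math.sin(0.1 * t) for t in range(256)]
# analysis: overlapping slices, each multiplied by the window
slices = [[signal[i + n] * win[n] for n in range(N)]
          for i in range(0, len(signal) - N + 1, hop)]
# synthesis: multiply by the window again and overlap-add
recon = [0.0] * len(signal)
for k, sl in enumerate(slices):
    for n in range(N):
        recon[k * hop + n] += sl[n] * win[n]
# interior samples are covered by exactly two windows whose squared
# sine weights sum to one, so the reconstruction matches the original
err = max(abs(a - b) for a, b in zip(signal[hop:-hop], recon[hop:-hop]))
```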

In the Apache Spark code below, we complete the following steps:

  • Process each RDD in the DStream using DStream’s foreachRDD method.
  • Parse the ECG data into vectors.
  • Use the clustering model to find the normal waveform cluster matching the waveform in the current time window.
  • Build a message from the matched cluster IDs, the 32 actual ECG observation points, and the 32 reconstructed ECG data points.
  • Publish the enriched message to another MapR-ES topic.
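The per-window scoring step above can be sketched in plain Python in place of Spark Streaming; the message fields mirror the list above (cluster ID, actual points, reconstructed points), but the field names and the 4-point toy windows are my own:

```python
def nearest(centers, vec):
    """Index of the cluster center closest to vec (squared distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2
                                 for a, b in zip(centers[j], vec)))

def enrich(window, centers):
    """Build the enriched message for one observed ECG window."""
    cid = nearest(centers, window)
    return {"cluster_id": cid,
            "actual": list(window),
            "reconstructed": list(centers[cid])}

# toy 4-point "windows" instead of the article's 32 points
centers = [(0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0)]
msg = enrich((0.9, 1.1, 1.0, 0.8), centers)
```

In the real pipeline, `msg` would be serialized and published to the output topic.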

Displaying the observed ECG and the reconstructed normal ECG waveform in a real-time dashboard

Vert.x, a toolkit for building reactive, event-driven microservices, was used to build a real-time web application that displays the observed ECG waveforms and the reconstructed normal ECG data. In this web application:

  • A Vert.x Kafka client consumes the enriched ECG data from the MapR-ES topic and pushes the messages onto the Vert.x event bus.
  • A JavaScript browser client subscribes to the Vert.x event bus via SockJS and displays the observed ECG waveform (green) and the reconstructed expected waveform (red).

Anomaly detection

The difference between the observed ECG waveform and the expected ECG waveform (the green waveform minus the red waveform) is the reconstruction error, also known as the residual (the yellow waveform below). If the residual is large, there may be an anomaly.
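The residual itself is just pointwise subtraction; summing its squares gives a single anomaly score per window (the example values here are made up):

```python
def residual(observed, reconstructed):
    """Pointwise difference: observed minus reconstructed waveform."""
    return [o - r for o, r in zip(observed, reconstructed)]

def residual_energy(observed, reconstructed):
    """Sum of squared residuals: one anomaly score per window."""
    return sum(d * d for d in residual(observed, reconstructed))

normal = residual_energy([1.0, 2.0, 1.0], [1.1, 1.9, 1.0])  # small
spike  = residual_energy([1.0, 9.0, 1.0], [1.1, 1.9, 1.0])  # large
```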

The goal of anomaly detection is to find real problems while keeping the false positive rate low. The challenge is determining the residual threshold that should trigger an alarm.

The t-digest algorithm can evaluate the size of reconstruction errors based on the distribution of the data. By incorporating it into the anomaly detection workflow, you can set the number of alerts as a percentage of total observations. t-digest can estimate a distribution very accurately from a modest number of samples, especially in the long tail, which is where the outliers we care about live. Once you have an estimate of the distribution, you can set alarm thresholds. For example, a threshold at the 99th percentile generates roughly one alert per 100 reconstructions, which is still a relatively large number of alerts (anomalies should be rare by definition); at 99.9%, roughly 1,000 reconstructions are needed to generate one alert.
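The article uses Ted Dunning’s t-digest; for illustration, an exact empirical quantile over a batch of simulated residuals shows the same thresholding idea (t-digest computes this approximately over unbounded streams, without sorting everything):

```python
import random

random.seed(42)
# simulated reconstruction errors: mostly small, occasionally large
residuals = [abs(random.gauss(0.0, 1.0)) for _ in range(10_000)]

def quantile(values, q):
    """Exact empirical quantile; t-digest approximates this on streams."""
    s = sorted(values)
    return s[min(int(q * len(s)), len(s) - 1)]

threshold = quantile(residuals, 0.99)
alerts = sum(1 for r in residuals if r > threshold)
rate = alerts / len(residuals)   # roughly 1% of observations alert
```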

Conclusion

This article described how a streaming system can perform anomaly detection on incoming cardiac monitoring data, comparing observed data with reconstructions in the manner of an autoencoder to detect abnormal heartbeats. It is also an example of how IoT, real-time streaming data, machine learning, data visualization, and alerting can be combined to improve health care workers’ productivity and reduce costs.

In many IoT scenarios, enterprises need to collect and aggregate data to understand the events occurring across an entire fleet of devices. In addition, according to Jack Norris of MapR, enterprises should also push intelligence to the edge so that they can react to events more quickly. A common data platform lets you process all data the same way, control access to it, and apply intelligent algorithms faster and more efficiently.


Published on the Tencent Cloud + community with the author’s authorization. Original link: https://cloud.tencent.com/developer/article/1106060?fromSource=waitui