With the rapid development of medical and health informatization, medical research is entering the era of big data. Many of the promises of big data are becoming reality in the healthcare industry. Real-time processing and analysis of big data allows people in the medical field to make decisions and take action faster and more comprehensively. The field is slowly maturing.

With advances in cloud computing, the Internet of Things, the mobile Internet and other new technologies, the data accumulated across all industries has grown exponentially. The era of “big data” has arrived.

In recent years, big data solutions and analysis tools have been widely adopted in healthcare. Through data, the valuable experience accumulated by medical experts can be transformed into a standardized knowledge base. This enables data-driven medical services, greatly improves service capacity and efficiency, and meets many needs in China’s medical field. But what exactly is healthcare big data? What makes it “big”?

One. Big data

Definitions of big data can be roughly divided into the following two types:

The first type refers to the products, services, and insights of tremendous value that are generated by analyzing vast amounts of data. We call this the “verb definition.”

The second type refers to the sum of the disruptive changes in decision-making processes, business models, scientific paradigms, lifestyles and ideologies that result from analyzing massive data (in terms of data volume, data form, and analysis and processing methods) drawn from multiple heterogeneous sources and correlated across domains. We call this the “noun definition.”

Two. Medical data

Medical data is generated as doctors diagnose and treat patients. It includes patients’ basic data, electronic medical records, diagnosis and treatment data, medical image data, medical management data, economic data, and medical equipment and instrument data. This patient-centered data is the main source of medical data.

Three. Medical data sources

There are four main sources of “medical data”: first, patients seeking medical treatment; second, clinical and scientific research; third, life sciences and pharmaceuticals; and fourth, wearable devices.

The first source, “patients seeking medical treatment,” comes from the patients themselves: their physical sign data, laboratory data, self-reported descriptions and hospitalization data, together with the doctor’s consultation records, clinical diagnosis and treatment, medication, surgery and other data about the patient.

The second source, “clinical and scientific research,” consists mainly of data generated during experiments, but also includes data generated by patients.

The third source, “life sciences and pharmaceuticals,” consists mainly of experimental data, such as drug dosage, duration of medication, drug composition, subjects’ reaction times and symptom improvement, as well as life-science data such as genomics.

The fourth source, “wearable devices,” mainly collects physical data from the human body through wearables (bracelets, pacemakers, glasses, etc.).

Four. Medical data features

Medical big data is first of all big data, so it shares the general characteristics of big data: large scale, diverse structure, rapid growth and great value. However, because it is generated in the medical field, it also has characteristics of its own: polymorphism, incompleteness, redundancy, timeliness and privacy.

Polymorphism: Medical data includes purely numerical data generated by laboratory tests; signal traces such as electrocardiograms; image data generated by physical examinations; text describing patients’ symptoms and the judgments doctors make based on their own experience or on test results; and audio data such as heartbeats, cries and coughs. Modern hospitals also hold various kinds of moving-image data (such as images of fetal movement).

Incompleteness: Much medical data is incomplete for various reasons, such as doctors’ subjective judgments and incomplete written descriptions, interruptions in patients’ treatment, and patients’ unclear descriptions of their conditions.

Redundancy: The sheer volume of medical data means that a large amount of redundant data is produced every day, which makes screening data for analysis very difficult.

Timeliness: Most medical data is temporal and continuous. Electrocardiograms and fetal movement charts, for example, record how a signal changes along the time dimension.

Privacy: Privacy is another important characteristic of medical data, and one reason why most medical data is not readily opened to outside parties. The clinical data systems of many hospitals run on relatively isolated local area networks, and some are not connected to any external network at all.

Five. Data processing

Data processing is generally divided into six steps: data mining, data collection, data analysis, data storage, data conversion, and finally the generation of data in practical use.

Six. Uses of healthcare big data

The main uses of medical big data include: drug analysis, etiology analysis, mobile medicine, genomics, disease prevention, wearable medicine, etc.

With the development of medical big data and continuous innovation in analytical methods, artificial intelligence and other technologies, more and more scenarios will be able to use medical big data for accurate analysis and prediction, and big data will become an important aid to medical decision-making.

Seven. Medical Big Data

Medical big data enterprises are mainly divided into three categories: chronic disease and health management (assisting patients), clinical decision support (assisting doctors), and medical research and development.

The service objects of medical big data mainly include: residents, doctors, scientific research, regulatory institutions and public health.

Eight. The use of statistics in medical care

Statistics is an important tool for medical research. It applies the principles and methods of probability theory and mathematical statistics, in combination with medical practice, to the collection, analysis and inference of numerical data. Correct statistical analysis helps people understand the regularities of objective phenomena, so that they can think clearly, work with a definite aim, and improve the quality of their work.

In statistical analysis, one very widely used characteristic curve is the receiver operating characteristic (ROC) curve, also known as the sensitivity curve.

It gets this name because every point on the curve reflects the same sensitivity. The points are all responses to the same signal stimulus, but each is obtained under a different decision criterion.

The ROC curve is plotted on a coordinate graph with the false alarm probability (false positive rate) on the horizontal axis and the hit probability (true positive rate) on the vertical axis. Each point on the curve is the result obtained under a different decision criterion for the same stimulus conditions.
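The construction of the curve can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `roc_points` and the six scores and labels are hypothetical, and each distinct score is assumed to serve as one decision criterion.

```python
def roc_points(scores, labels):
    """Return (false_alarm_rate, hit_rate) pairs, one per decision criterion.

    scores: higher means "more likely sick"; labels: 1 = sick, 0 = healthy.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # strictest criterion: nobody is called positive
    # Use each distinct score as a criterion, from strictest to most lenient.
    for threshold in sorted(set(scores), reverse=True):
        hits = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        false_alarms = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((false_alarms / negatives, hits / positives))
    return points


# Hypothetical test scores for six people (illustrative values only).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
for false_alarm_rate, hit_rate in roc_points(scores, labels):
    print(f"false alarm rate = {false_alarm_rate:.2f}, hit rate = {hit_rate:.2f}")
```

Each printed pair is one point on the curve. Loosening the criterion moves the point up and to the right, from (0, 0) toward (1, 1).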

The AUC often mentioned in statistics is the “Area Under the ROC Curve.” Its value lies between 0 and 1 (a classifier that guesses at random scores about 0.5), and it serves as a score for the classification algorithm in use. The larger the AUC, the more likely the classifier is to rank a positive sample ahead of a negative one, and the better it separates the samples.
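The ranking interpretation of the AUC can be checked directly. The sketch below assumes a small set of hypothetical scores and labels and computes the AUC as the fraction of (positive, negative) pairs in which the positive sample scores higher, counting ties as half. The function name `auc_by_ranking` is illustrative only.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: higher means "more likely positive".
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(f"AUC = {auc_by_ranking(scores, labels):.3f}")
```

In this example 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the sample size, which is acceptable for an illustration but not for large data sets.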

Existing statistical methods usually divide diagnosed samples into two categories: healthy and sick. In practice there is also a third group, the sub-healthy. If we continue to classify people in the original two-category way, some of the results we obtain may be misleading.
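One way to picture the difference is to compare a single cutoff with a pair of cutoffs. The sketch below is purely illustrative: the function names, the scores, and the cutoff values 0.5, 0.3 and 0.7 are all assumptions, not values taken from any diagnostic method.

```python
def classify_two_way(score, cutoff):
    """Original approach: one cutoff, two categories."""
    return "sick" if score >= cutoff else "healthy"


def classify_three_way(score, low_cutoff, high_cutoff):
    """Two cutoffs, three categories, with a sub-healthy band in between."""
    if low_cutoff > high_cutoff:
        raise ValueError("low_cutoff must not exceed high_cutoff")
    if score < low_cutoff:
        return "healthy"
    if score < high_cutoff:
        return "sub-healthy"
    return "sick"


# Hypothetical diagnostic scores; the cutoffs are assumed for illustration.
for score in [0.10, 0.45, 0.55, 0.90]:
    print(score, classify_two_way(score, 0.5), classify_three_way(score, 0.3, 0.7))
```

The two middle scores fall on opposite sides of the single cutoff, so the two-category rule calls one person healthy and the other sick, while the three-category rule places both in the sub-healthy group.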

In statistics, a confidence interval is an interval estimate, computed from a sample, for some parameter of the population. A 95% confidence interval means that if the sampling were repeated many times, about 95% of the intervals constructed in this way would contain the true value of the parameter.

It indicates the degree to which the true value of the parameter is likely to lie around the measured result, and so gives a degree of confidence in the measured value of the parameter.

To judge the accuracy of a diagnostic method, we need to look at the coverage of its confidence interval. The closer the coverage is to the nominal probability, the more accurate the method.
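Coverage can be estimated by simulation. The following sketch makes several assumptions: the data are drawn from a normal distribution with a known true mean, the interval is the usual normal-approximation interval for a mean (mean ± 1.96 standard errors), and all parameter values are hypothetical. It counts how often the interval contains the true mean.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Normal-approximation interval for the mean: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(n)
    return mean - half_width, mean + half_width


def estimate_coverage(true_mean, true_sd, n, repeats, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    covered = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = mean_confidence_interval(sample)
        if low <= true_mean <= high:
            covered += 1
    return covered / repeats


# Hypothetical population: a measurement with mean 120 and standard deviation 15.
coverage = estimate_coverage(true_mean=120.0, true_sd=15.0, n=50, repeats=2000)
print(f"empirical coverage: {coverage:.3f} (nominal 0.95)")
```

An empirical coverage close to 0.95 suggests the interval method behaves as advertised under these assumptions. A value well below 0.95 would indicate that the intervals are too narrow.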

A broad search pools all the values together and compares their sizes to find the largest differences.

Thus, as health care providers become better at extracting meaningful insights from patient data, they will also learn better ways to deliver care and improve the quality of service. As the field of big data technology matures, many organizations will benefit from improved operations, lower costs, and improved health.

In many ways, big data and artificial intelligence can help address the growing shortage of care providers. Healthcare providers will also take full advantage of big data technologies to continue powering the healthcare technology framework.