At eight o'clock on August 27th, Xuesong Cheng, senior solution architect at Qiniu Cloud, gave a live broadcast on IT Big Club titled "Mining the Infinite Value of Traditional-Industry Log Big Data". He analyzed in depth the common operations and maintenance difficulties in traditional industries and the need for unified log management, and used real Pandora user cases to show how to mine the value of log big data in those industries. This article is an edited write-up of that broadcast.

This second part covers the key points of building a log management platform, shares some real Pandora user cases, and answers questions from the audience.

Monitoring and Alarm

After the data has been analyzed, the platform needs to provide monitoring and alarms on it. For example, I want to fix the monitoring of the important indicators I care about, check them regularly, and be notified promptly when something goes wrong. Ideally, search results can be saved directly as alarm settings. Say I run a search, find an important result, and want it monitored continuously rather than looked up once: if the monitored value goes wrong, an alarm should be raised. After saving a search as an alarm, you can configure the relevant strategy. For example, monitor the maximum, minimum, or average of an indicator, and once the value crosses a given threshold, send an alert by SMS or email. This monitoring and alarm function is a must-have.
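The idea of a threshold rule on an aggregate can be sketched in a few lines of Python. This is only an illustration: `AlertRule`, `notify`, and the channel callables are hypothetical names, not Pandora's API, and a real platform would send SMS or email where this sketch appends to a list.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional

# Aggregations named in the talk: maximum, minimum, average.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "max": max,
    "min": min,
    "avg": mean,
}


@dataclass
class AlertRule:
    """Hypothetical alert rule: an aggregate of a metric compared with a threshold."""
    name: str
    aggregation: str           # one of "max", "min", "avg"
    threshold: float
    direction: str = "above"   # fire when the aggregate is "above" or "below" the threshold

    def evaluate(self, values: List[float]) -> Optional[str]:
        """Return an alert message if the rule fires, otherwise None."""
        if not values:
            return None
        observed = AGGREGATIONS[self.aggregation](values)
        if self.direction == "above":
            breached = observed > self.threshold
        else:
            breached = observed < self.threshold
        if not breached:
            return None
        return (f"[ALERT] {self.name}: {self.aggregation}={observed:.1f} "
                f"is {self.direction} {self.threshold}")


def notify(message: str, channels: List[Callable[[str], None]]) -> None:
    """Deliver the message through every channel (stand-ins for SMS or email)."""
    for send in channels:
        send(message)


if __name__ == "__main__":
    outbox: List[str] = []
    rule = AlertRule("api latency (ms)", "avg", threshold=200.0)
    message = rule.evaluate([120.0, 180.0, 450.0])  # values from one check interval
    if message:
        notify(message, [outbox.append, print])
```

Running the rule on a schedule against the result of the saved search gives the "monitor it all the time" behaviour described above.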

Out-of-the-box reports

Next comes visualization. The platform should support a wide range of charts, turning text logs and numeric log data that are hard to read into reports that are more intuitive and easier for more people to understand: pie charts, bar charts, line charts, and even maps of China or the world. These chart types should be supported out of the box, and defining a report should be easy: feed in the data, select a chart format, and the report is generated automatically. This makes it easy to present log analysis results and the monitoring of related parameters.

Large-screen display

The platform should also support large-screen display, so that the reports can be projected onto a command desk or a large monitoring screen. IT operations staff, IT department leaders, and even the head of the company can then easily see how the internal systems are running and what state they are in.

Machine learning (anomaly detection)

Today, analyzing existing data with existing strategies is no longer enough to meet the changing requirements of operations and maintenance. Many IT problems are unknown at the outset. When I first start monitoring a business, it can be hard to anticipate how the monitoring strategy should be designed, or what the thresholds for the monitored metrics should be.

Historical data analysis

Therefore, if the log analysis platform introduces newer big data techniques such as deep learning or machine learning, it can analyze historical data conveniently and point out the outliers in it. For example, the screenshot below shows the traffic of an enterprise over time. The blue line representing the actual data is not flat but follows a periodic pattern. Traditional traffic monitoring draws two parallel lines that enclose all the traffic, so that an alarm fires as soon as the real-time value crosses a threshold, and an email or SMS is pushed to the operations staff. But this approach can miss anomalies, such as the point marked by the small arrow in the middle: it does not break through the peaks and troughs of the historical periods, yet it clearly does not match the historical pattern.

If we can adjust the threshold dynamically for each period, it becomes easy to find data that stays within the historical thresholds but does not follow the periodic pattern, that is, anomalies that would otherwise be missed. Such unreported anomalies can grow into serious risks later. If we find them and intervene in time, we can head off those risks and even prevent future faults. This is a classic scenario for historical data analysis.
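A minimal sketch of a period-aware threshold follows. It is a simple statistical stand-in (mean and standard deviation per position in the cycle), not the machine learning model Pandora uses, which the talk does not describe. The function names, the 24-slot period, and the `k` and `min_std` parameters are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Baseline = Dict[int, Tuple[float, float]]  # slot -> (mean, standard deviation)


def build_baseline(history: List[Tuple[int, float]], period: int) -> Baseline:
    """Group (timestep, value) points by their position in the cycle."""
    slots: Dict[int, List[float]] = defaultdict(list)
    for t, value in history:
        slots[t % period].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in slots.items()}


def is_anomaly(t: int, value: float, baseline: Baseline, period: int,
               k: float = 3.0, min_std: float = 1.0) -> bool:
    """Flag a value that is more than k deviations from the norm for its slot.

    min_std keeps the band from collapsing to zero width on very flat history.
    """
    mu, sigma = baseline[t % period]
    return abs(value - mu) > k * max(sigma, min_std)


if __name__ == "__main__":
    # Seven days of synthetic hourly traffic: busy 09:00-18:00, quiet otherwise.
    history = [
        (day * 24 + hour, (1000.0 if 9 <= hour < 18 else 100.0) + (day % 3 - 1) * 5.0)
        for day in range(7) for hour in range(24)
    ]
    baseline = build_baseline(history, period=24)

    static_low = min(v for _, v in history)
    static_high = max(v for _, v in history)
    t, value = 7 * 24 + 3, 600.0  # 03:00 on the eighth day

    print("inside static band:", static_low <= value <= static_high)
    print("flagged by periodic baseline:", is_anomaly(t, value, baseline, period=24))
```

In this synthetic data, a reading of 600 at 03:00 sits between the two static lines, so fixed thresholds stay silent, while the per-slot baseline flags it because night-time traffic is normally around 100.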

Predicting the future

Since we can learn from and analyze historical data as a whole, we also want to combine it with incremental learning on real-time data to predict future trends accurately and support more intelligent features. Where is this useful?

Suppose we want to make a business change or upgrade in the next few days but do not know which time slot is most suitable and has the least impact. Usually we pick the middle of the night for a system cutover or a new release, which puts a lot of pressure on the people who have to stay up late. If we can use machine learning to predict how overall traffic will change, we can easily find a low-traffic time that does not place such a burden on the team.
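Choosing a change window from a forecast can be sketched as follows. The forecast here is a plain seasonal average (each slot of the next cycle is the mean of that slot in past cycles), used as a stand-in for whatever model the platform trains. The function names are hypothetical, and the history is assumed to start at slot 0 and to cover at least one full cycle.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def forecast_next_cycle(history: List[float], period: int) -> List[float]:
    """Predict each slot of the next cycle as the mean of that slot in past cycles."""
    slots: Dict[int, List[float]] = defaultdict(list)
    for i, value in enumerate(history):
        slots[i % period].append(value)
    return [mean(slots[s]) for s in range(period)]


def quietest_window(forecast: List[float], window: int) -> Tuple[int, float]:
    """Return (start_slot, predicted_total) of the lowest-traffic contiguous window.

    The window may wrap around the end of the cycle, e.g. 23:00 to 01:00.
    """
    n = len(forecast)
    best_start, best_total = 0, float("inf")
    for start in range(n):
        total = sum(forecast[(start + j) % n] for j in range(window))
        if total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


if __name__ == "__main__":
    # Synthetic hourly profile with a dip at night and a deeper dip mid-afternoon.
    profile = [300.0] * 24
    profile[2], profile[3] = 80.0, 90.0
    profile[14], profile[15] = 50.0, 60.0
    history = profile * 3  # three identical days of history

    forecast = forecast_next_cycle(history, period=24)
    start, total = quietest_window(forecast, window=2)
    print(f"quietest 2-hour window starts at {start:02d}:00, predicted traffic {total}")
```

With this made-up profile the quietest two-hour window falls in the afternoon rather than at night, which is the kind of result that would spare the team a late-night release.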

Minimalist use

Machine learning is great, but people often see it as something lofty and think of algorithm models as especially complex. They wonder whether they can handle it and whether they can really use these machine learning features with ease. It therefore becomes very important for the platform to select algorithms automatically and intelligently and to generate the relevant models automatically, so that operations staff can use machine learning with a low or even zero barrier to entry. Machine learning is another direction to consider when choosing a log management platform.

Openness (API)

Finally, there is the open API. In practice, operations staff often do not simply log in to the platform's visual interface to use its functions. The log analysis platform needs to be connected to business systems, analysis systems, or the user's existing monitoring platform. More often than not, users do not work in the platform's interface directly but consume its capabilities through its API. The openness of the platform is therefore something that must be considered for day-to-day use. The richness of the open APIs is a very important indicator, given that we need to integrate with many different business systems and even with monitoring software written by different application developers. Ideally the platform supports the mainstream languages, including Java, PHP, Python, C, C++, JavaScript, and Go. Openness is therefore another issue that must be considered when choosing a log analysis platform.

Pandora, the Qiniu Cloud log analysis platform

Is there a platform that solves all the problems mentioned earlier? Yes: Pandora, the log analysis platform from Qiniu Cloud. It provides intelligent management of the whole log life cycle, with technologies and products covering data collection, cleaning, storage, search, monitoring and alarms, analysis, reports, and open APIs. Pandora suits a variety of scenarios, such as operations and maintenance analysis, security audit, and business data analysis, and industries such as the Internet, smart hardware, and smart manufacturing.

This chart gives a panoramic view of Pandora's current capabilities. It reflects the eight areas mentioned earlier and shows that Pandora covers all the requirements for a log analysis platform. logkit is our data collection tool, which supports collecting, parsing, transforming, and sending data. Pipeline is a platform built on big data technology that helps users run real-time and offline analysis. Insight is a data analysis platform that supports unified log storage, search, reports, monitoring and alarms, APIs, and analysis and prediction, including machine learning.

To sum up, Pandora has six advantages: large data scale, fast processing, open interfaces, broad ecosystem support, a polished user experience, and extensive public cloud experience. Pandora is available as a public cloud service, and we can also deploy it privately. Users can choose flexibly according to their actual situation.

Data scale. On the public cloud, Pandora's storage and computing are designed to scale out fully. The data stored on the cloud so far exceeds 40 PB, and the data computed so far exceeds 500 PB. The traditional ELK approach cannot meet the requirements of such a large amount of data.

Fast processing. Pandora supports real-time computing and responds within milliseconds to seconds. All logs can be stored within milliseconds. If the system or data source generates logs in real time, we can collect them into our platform in real time and ensure that the data is neither lost nor duplicated.

Open interfaces. All of our operations are backed by corresponding APIs, which makes it easy to integrate with third-party systems. This is our third advantage.

Ecosystem support. We support most of the current mainstream relational databases, non-relational databases, message queues, and a number of big data components. The detailed list can be found on our website.

User experience. This covers details such as the automatic field statistics, segmentation analysis, joint search, and machine learning we just mentioned. We have thought through more than 200 such usability details in advance for the user. All of our development goals and requirements aim to reduce the user's mental burden, so that log analysis is not treated as something especially complex and everyone can use our products with a low or even zero barrier to entry. Users simply import their logs into the platform, easily obtain the analysis results, and gain business value and improvement. That is what Pandora hopes to achieve.

Public cloud experience. The final advantage is our public cloud experience, and we have some figures to back it up. More than 250 TB of data, over 365 billion log entries, now flows into the public cloud every day. We serve more than 200 customers, daily log computation reaches 3.2 PB, and the platform delivers over 10,000 effective alerts per day, so it is very robust. All of our functionality is fully open to users in the public cloud, and users can also choose to deploy our platform privately in their own data center.

Case sharing

I have a little time left, so I will share some cases and then answer some of your questions.

Qiniu Cloud

The first case is Qiniu Cloud itself. Logs generated by all Qiniu Cloud product lines are imported into our log analysis platform for unified aggregation, cleaning, storage, and search, and are then used by different departments inside the company. The business operations department builds user profiles from daily usage and consumption behavior. The product R&D department uses the logs to troubleshoot online errors, the technical support department for customer service, the quality assurance department for quality analysis and review, and the operations and maintenance department for monitoring, alarms, and cost analysis.

Bank

The second case is a large bank with several large data centers, each of which serves traffic from both physical and virtual machines. Their pain point was that, across physical machines, virtual machines, different network devices, different operating systems, and an ever-growing volume of business data, they had no unified way to collect, store, and analyze the data. The bank adopted the Pandora platform, uses logkit to collect metrics from the various devices, and runs monitoring and alarms on the platform. logkit is also used to collect business logs so that business staff can search and analyze them. In this way, problems can be located quickly and the value of the data can be mined continuously.

Manufacturing companies

The third case is a large manufacturing enterprise in East China, which is representative of the Internet of Things scenario we mentioned earlier. The customer imports data from sensors deployed on many production lines into Pandora. The customer's workshops and plant are very large, and together the sensors generate data on the order of millions of records per second. This data is sent to our platform in real time, where it is processed and monitored in real time and used to generate multidimensional reports. These let the user analyze precisely both the real-time performance of the whole production line and its overall performance over a period of time.

Internet company

The fourth case is a large Internet company whose main business is video on demand. They purchased the Qiniu Cloud CDN service, which generates a large number of CDN logs: where users come from, what resources they access, their overall access patterns, the average time it takes them to open a video, and so on. All of this data is in our CDN logs and holds a lot of value. Using our CDN logs together with our log analysis platform, the customer can analyze many application quality and operating indicators to support decisions about its business.

If you are interested, you can visit our official website to read our product introductions. There are two websites below. The first is the official Qiniu Cloud website, where you can register a free account to try out our platform. The second is our documentation site, which introduces all products in detail and also describes and analyzes typical scenarios to help you better understand how to use our platform.

Free registration: www.qiniu.com
Quick overview: developer.qiniu.com/insight/

And finally, one more thing: we expect to launch three features in September. The first is multidimensional analytics, which we call Datacube. It can pre-compute many of a user's key daily operating indicators, so that queries on the indicators you care about return results more quickly. The second is a full-link monitoring and analysis solution for daily operations and maintenance monitoring. The third is root cause analysis for specific failures. Please follow the official Qiniu Cloud website; we will let you know as soon as there is news.

Thank you all for listening to my brief talk.

Q&A

Q: Do you support private deployment?
A: Yes. Many of our customers in traditional industries, including some banks, use Pandora in private deployments. Private deployment has been one of our main focuses since Pandora's earliest days.

Q: Can container load be traced to a specific process?
A: Yes. We now support container log collection for K8S, load monitoring for containers, and tracing load to specific processes.

Q: Can I customize log parsing rules?
A: Yes, we support very flexible rules for log parsing. You can use delimiters to parse logs and save the result as a rule, which can then be easily applied to logs later.
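What a saved delimiter rule amounts to can be sketched like this. `DelimiterRule` and its fields are hypothetical names for illustration only and do not reflect logkit's actual configuration format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DelimiterRule:
    """Hypothetical reusable rule: split each line on a delimiter and name the fields."""
    name: str
    delimiter: str
    fields: List[str]

    def parse(self, line: str) -> Dict[str, str]:
        # Split at most len(fields) - 1 times so the last field keeps
        # any delimiter characters that appear inside the message text.
        parts = line.rstrip("\n").split(self.delimiter, len(self.fields) - 1)
        if len(parts) != len(self.fields):
            raise ValueError(
                f"rule {self.name!r}: expected {len(self.fields)} fields, got {len(parts)}"
            )
        return dict(zip(self.fields, (part.strip() for part in parts)))


if __name__ == "__main__":
    rule = DelimiterRule(
        name="app_log",
        delimiter="|",
        fields=["time", "level", "service", "message"],
    )
    line = "2018-08-27 20:00:01|ERROR|payment|timeout calling bank | retry=3"
    print(rule.parse(line))
```

Once defined, the same rule object can be applied to every line from a given source, which is the "save the result as a rule" step in the answer above.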

Q: If we use cloud storage and the log volume is large, how do we deal with the traffic problem?
A: Some users have this concern when using a log analysis platform. If the day-to-day business and the log management and analysis service are not in the same cloud, the traffic between them can incur substantial charges. We have two solutions:
1. All log collection and transmission in Pandora is compressed and encrypted, with a compression ratio of more than 10x, which greatly reduces the traffic burden.
2. If you have idle computing resources, whether local virtual machines or cloud hosts, we also support private deployment of the log analysis service locally or on a third-party cloud platform. With everything on one platform, log transfer becomes internal traffic, which generally costs very little.

Q: What is the difference between Pandora and ELK?
A:
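How much compression reduces transfer volume can be measured on your own logs with a few lines of Python. The sketch uses gzip from the standard library as a stand-in, since the talk does not say which codec Pandora uses. The sample lines are synthetic and highly repetitive, so the ratio printed here says nothing about the figure quoted above.

```python
import gzip


def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Return original size divided by gzip-compressed size."""
    compressed = gzip.compress(raw, compresslevel=level)
    return len(raw) / len(compressed)


if __name__ == "__main__":
    # Synthetic access-log lines; replace with a sample of real logs to get a useful number.
    lines = [
        f"2018-08-27T20:{i % 60:02d}:{i % 60:02d} INFO gateway "
        f"request_id={i:08d} status=200 path=/api/v1/items\n"
        for i in range(5000)
    ]
    raw = "".join(lines).encode("utf-8")
    print(f"raw bytes: {len(raw)}, ratio: {compression_ratio(raw):.1f}x")
```

Log text tends to compress well because field names, paths, and status codes repeat from line to line, which is why compressing before transfer is a common way to cut cross-cloud traffic charges.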

  • Pandora is fully hosted, works out of the box, and is pay-as-you-go and low cost

  • Pandora’s data collection product, logkit, is far better than Logstash/Filebeat in both experience and performance

  • We support flexible enterprise-class data buses

  • Our data collection is far more stable and feature-rich than that of ES

  • With large data volumes (more than 1 billion logs, terabyte scale and above), our system stability and performance are better than those of ES

  • ES does not support data desensitization

  • ES does not support multi-tenancy

  • ES does not support key functions such as user permissions and security audit

  • ES has no built-in machine learning support

  • ES does not come with a rich set of ready-made solutions

In summary, anything that can be done with ELK can be done with Pandora, which also provides an excellent product experience. In terms of features Pandora is a superset of ELK, adding capabilities such as stream computing and multidimensional analysis.
