In the era of cloud computing, a concept called observability has come to prominence. It actually comes from cybernetics, a field founded in 1948 when Norbert Wiener published Cybernetics: Or Control and Communication in the Animal and the Machine.

In control theory, observability refers to how well a system's internal states can be inferred from its external outputs. Observability and controllability are mathematically dual concepts. The term was first introduced by Rudolf Kálmán, a Hungarian-born engineer, for linear dynamic systems: in terms of signal-flow graphs, a system is observable if every internal state can be reflected in its output signals.
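For readers who want the formal version, the standard statement for a linear time-invariant system is Kalman's rank condition; this is textbook control theory given here only as background, not anything specific to monitoring software:

```latex
% Linear time-invariant system
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t)

% Kalman's rank condition: the system is observable if and only if
% the observability matrix has full rank n (n = number of states)
\mathcal{O} =
\begin{pmatrix}
C \\ CA \\ CA^{2} \\ \vdots \\ CA^{\,n-1}
\end{pmatrix},
\qquad \operatorname{rank}(\mathcal{O}) = n
```

In plain terms, if the rank test fails, some combination of internal states never shows up in the output, so no amount of watching the output can reveal them.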

So the concept itself is not new. Before it was borrowed by computer software, we mostly talked about monitoring the stability of a system as a whole, and very few people in computer science spoke of observability.

This leads to a common misunderstanding. Monitoring a computer system means watching the data the system generates, so the prerequisite of monitoring is that the monitored objects can produce observable metrics and other data in the first place. The fewer metrics or data a system exposes, the less there is to observe and the less value monitoring adds. By the cybernetic notion of duality, that also means we are less able to control the system.

For example, if we can only monitor whether a server is powered on and reachable, we can only judge the status of the server itself, not the state of the operating system running on it.

If we only monitor operating system metrics, we can only judge the state of the operating system, not the state of the applications installed on it. To monitor applications, each application itself needs to be observable; otherwise we can only tell whether a process is running, as seen from the operating system's perspective.

Monitoring, therefore, is an action, and its prerequisite is that the monitored object is observable; the more observable data there is, the better we can control the whole system.

Why, then, has the need for observability become so much greater and more pressing today?

Let's start by looking at the history of computers and monitoring software.

History of computers and monitoring software

01 Computer History

02 History of monitoring software

Development history of monitoring software

01 PC Era

Most of the earliest computers were stand-alone machines with no concept of a network. At that time, the operating system came with plenty of tools and utilities that let us know and observe its running state. Windows users know Task Manager best, while Linux offers a whole set of commands such as top and ps to tell you how the operating system is doing. To make troubleshooting easier, some applications also wrote their running status to text output, such as the Windows Event Log and syslog on Linux. In this period, we understood and controlled the operating system through the operating system itself and the built-in facilities of individual applications.

02 LAN Era

As time went on, computers entered the LAN era and the C/S (Client/Server) architecture emerged. In this setup, one computer on the local area network becomes the server (as the name implies, a machine that serves others), and the matching clients exchange data with the server to fulfill various business requirements, tying clients and server together into a working system.

Then came the early distributed systems.

The appeal of the earliest distributed systems was high availability, because once the single server failed, no client could work properly. People later found they could change this by having different servers handle requests from different clients. As the number of connected clients and the complexity of services grew, the concept of "clustering" emerged: servers were grouped together and linked by network equipment such as switches and routers. Monitoring requirements changed accordingly. With more and more servers to manage, it became impossible to log in to each one and inspect its status by hand, so cluster-oriented monitoring software such as Zabbix appeared.

Zabbix uses a C/S architecture to collect observable data from the operating system for unified viewing, and it can raise alerts based on thresholds (that is, when a collected value goes above or below a set limit). However, since overall computer performance was limited when Zabbix was born, only a small amount of system data could be collected, and Zabbix's collection granularity was therefore relatively coarse. For the same performance reasons, the Zabbix Agent evaluates log data locally (for example, checking whether a log line contains certain keywords) and only sends the matching result to the Zabbix Server as a signal.
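To make the threshold idea concrete, here is a minimal, illustrative Python sketch of an agent-style watcher. It is not Zabbix's actual code; the load-average threshold and the notify() helper are made-up placeholders:

```python
# Minimal sketch of threshold-based alerting (illustrative only, not Zabbix code).
# The threshold value and the notify() helper are hypothetical placeholders.
import os
import time

CPU_LOAD_THRESHOLD = 4.0  # hypothetical limit for the 1-minute load average


def collect_load_average() -> float:
    """Read the 1-minute load average from the operating system (Unix only)."""
    return os.getloadavg()[0]


def notify(message: str) -> None:
    """Stand-in for sending an alert to a monitoring server."""
    print(f"ALERT: {message}")


def watch(interval_seconds: int = 60) -> None:
    """Poll the metric and alert whenever it crosses the threshold."""
    while True:
        load = collect_load_average()
        if load > CPU_LOAD_THRESHOLD:
            notify(f"load average {load:.2f} exceeds {CPU_LOAD_THRESHOLD}")
        time.sleep(interval_seconds)


if __name__ == "__main__":
    watch()
```

A real agent would report to a central server instead of printing, but this collect-compare-alert loop is the essential shape of threshold-based monitoring.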

As a simple, open-source, and free monitoring tool, Zabbix gained great popularity and still has a large user base today. Its biggest limitation, however, is that it only collects the simple observability metrics built into the system and processes signals in a scripted way.

Around the time Zabbix became popular, similar tools appeared with different capabilities: some focused on networks, others on specific software, such as dedicated database monitoring tools. At the same time Splunk emerged, a product that collects log data from across a cluster for unified processing and analysis, which is a staggering amount of data. Thanks to its unique storage structures and algorithms, Splunk is often treated as a database in its own right, and compared with Zabbix it can collect and analyze massive volumes of data in full.

03 Internet Era

As time and technology moved on, the Internet arrived, bringing a unified client product called the browser (popularized early on by Netscape) and a unified form of content presentation, the Web, built on HTML (plain text). The browser is an HTML parser that turns text into a visual web page, and from early on Netscape shipped a small programming language, JavaScript, that made static pages dynamic.

With this technology came the B/S (Browser/Server) concept, in which the browser interacts with the server as a unified client. As the Internet rose and more and more users came online, a large number of websites built on Web technology appeared. To run your own website at the time, you deployed the site on a server, and the browser acted as the unified client through which users consumed the services the server provided.

In the earliest form, a server was just a personal computer connected to the Internet, and usually only universities that held Internet addresses could run one. As the Internet grew, telecom operators began to offer a unified way of connecting servers to the Internet: the IDC (Internet Data Center), a facility housing large numbers of servers, as opposed to traditional offices connected by dedicated lines; such facilities came to be called data centers.

The first users of data centers were governments, banks, telecom companies, and the like, which used them for their internal business. Later, Internet companies emerged and began to provide application services to the world over the Internet. Servers on the Internet multiplied and systems grew more and more complex. Some companies, such as NetEase, also started to offer website hosting services, so that users could quickly get a website without having to host their own server in an IDC (this was also a predecessor of cloud computing).

04 Mobile Internet Era

Around this time a great company, Google, emerged. Its search engine crawled websites across the whole Internet (the crawler being just another kind of client), indexed their content, and gave people a fast way to find what they wanted to visit.

At the same time, a flood of Internet applications was born: instant messaging, fiction websites, online games, and more. The systems behind them grew ever more complex and the server clusters ever larger. During that period, a programmer created Elasticsearch (a search engine technology) to help his wife search recipes faster, and later founded a company around it.

But instead of trying to become the next Google, Elastic ended up competing with Splunk: the ELK stack (Elasticsearch, Logstash, Kibana) took on Splunk in the collection and management of massive log data. As computing power and technology improved and processing huge volumes of logs became feasible, ELK became a very popular open-source choice, Splunk grew rapidly as commercial software, and ELK came to be seen as an efficient solution for log-based monitoring.

Large-scale Internet applications also gave birth to CDN technology, which caches the content users access on servers in different physical locations to speed up access for end users. Meanwhile a new kind of monitoring appeared: synthetic (dial) testing. A website or Internet service provider simulates a client visiting its own site or service to make sure the site is healthy and free of anomalies, and at the same time analyzes access speed from different regions to confirm that the CDN is working properly.
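A minimal sketch of the idea in Python, assuming nothing beyond the standard library; the target URL and timeout are placeholders, and a real dial-testing service would run probes like this from many regions and compare the results:

```python
# Minimal sketch of a synthetic (dial) test: simulate a client request and
# record health and latency. The URL and timeout below are placeholders.
import time
import urllib.request

TARGET_URL = "https://example.com/"  # hypothetical site to probe
TIMEOUT_SECONDS = 5


def probe(url: str) -> dict:
    """Fetch the URL once and return the status code and elapsed time."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
        status = response.status
        response.read()  # download the body so timing covers the full transfer
    elapsed = time.monotonic() - start
    return {"url": url, "status": status, "seconds": round(elapsed, 3)}


if __name__ == "__main__":
    print(probe(TARGET_URL))
```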

In 2007, Apple released the iPhone, a product that revolutionized and accelerated the development of the Internet, and Google's Android soon followed. With the rapid development of wireless communication, we quickly entered the mobile Internet era. Mobile Internet brought two changes. First, thanks to Steve Jobs' pursuit of the ultimate user experience and the computing power of mobile devices at the time, the App quietly appeared as the unified client on mobile operating systems (in this article, "program" refers to applications on the PC and "App" to applications on mobile devices). The other change was even more dramatic: vastly more devices around the world connected to the Internet, vastly more users, and vastly more traffic.

At the same time, a way to manage servers at massive scale was invented. VMware was the first to launch a virtualization solution, which evolved into carving up clusters of physical servers in software to create more virtual machines and raise server utilization. In 2006, Amazon used this technology to quietly launch an Internet service, AWS (Amazon Web Services). At first it put the idle servers in the many data centers Amazon ran as an e-commerce site to work, providing large-scale hosting for websites. Today this kind of service is known as cloud computing.

Games became online games, video moved to Internet video sites, and Internet services such as nationwide or even global e-commerce, ride-hailing, and food delivery appeared. Enterprise software also began to deliver its services as PaaS and SaaS, accessed over the Internet through the browser or a mobile App. To support all these Internet services and the new application forms on mobile devices, a large number of new kinds of databases and message-queue middleware were created, including NoSQL, which is really a family of databases designed for particular scenarios, because a single relational database could no longer satisfy every need (database history is a long story in itself, which we will not expand on here).

With huge numbers of users, each generating large amounts of data while using different Internet applications, the concept of big data emerged to analyze and process it all. Meanwhile, as user demands grew and changed and Internet-style rolling updates became the norm, the development and testing cycles of Internet companies became ever more agile; the traditional model of releasing a version only after extensive rounds of testing became harder and harder to accept, and new concepts for organizing development and delivery took hold. In recent years especially, to make applications even more agile and easier to manage, container technology appeared, along with the cloud-native concept that grew up around it (cloud native is a general term for the ecosystem of software built on the container orchestration framework Kubernetes).

To further improve application performance, APM (Application Performance Monitoring) was introduced. It collects data about how code executes on both the server side and the client side, not only to troubleshoot problems but also to improve performance, and vendors such as New Relic, Dynatrace, and AppDynamics launched APM services accordingly. This, however, creates a problem: as cybernetics originally described, a complex Internet IT service must be fully observed before it can be fully controlled, so a large number of monitoring products are needed, covering infrastructure, cloud, cloud native, databases, middleware, big data, synthetic testing, and security.

The objects that need to be observed have gone from physical servers to virtual machines to containers; we now also need to observe many more databases and middleware, cloud services such as AWS, and even applications deployed across different cloud vendors. The old Zabbix simply cannot cope with so many objects that need observation.

To handle observation of massive amounts of data, the open-source world produced monitoring software built on time-series databases, such as Prometheus and Telegraf + InfluxDB, along with APM software such as Zipkin, Jaeger, Pinpoint, and SkyWalking. Fully monitoring an Internet system therefore means combining many open-source monitoring products of different shapes. In the commercial world, the unified platform has become the direction: DataDog, born as a SaaS service for full-spectrum observability monitoring, is already the world's most valuable IT monitoring and management vendor; ELK is no longer just a logging platform and has added features such as Elastic APM; Splunk did not want to stay confined to logging either, so it acquired SignalFx, one of DataDog's main competitors; and New Relic, Dynatrace, and AppDynamics have begun positioning themselves not as single-purpose APM vendors but as providers of complete observability. At the same time, the emergence of OpenTelemetry shows that the industry has realized system observability needs a unified standard and specification; it promotes the three pillars of observability, namely metrics, logs, and traces, aiming to push more applications and services to follow the specification and provide the corresponding observability capabilities.
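As a small, hedged illustration of the trace pillar, the sketch below uses the OpenTelemetry Python SDK (the opentelemetry-api and opentelemetry-sdk packages) to create one span and print it to the console; the span and attribute names are made up, and a real deployment would export to a collector or backend instead of the console:

```python
# Minimal sketch of emitting a trace with OpenTelemetry's Python SDK.
# Requires the opentelemetry-api and opentelemetry-sdk packages.
# The span name and attribute below are made-up examples.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Configure a tracer provider that prints finished spans to the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("example.instrumentation")

with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("http.route", "/orders")  # example attribute
    # ... application work would happen here ...
```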

That is a rough overview of the history of computer monitoring and observability, but you may still be unclear about what observability really is and how it differs from the monitoring that came before. In essence, observability emphasizes that servers, cloud services, and applications should themselves proactively expose the three pillars of observability in one form or another. The open-source Prometheus, for example, has the concept of exporters: middleware and applications are expected to proactively expose metrics to the monitoring software, which in turn must be able to read the exposed observable data and carry out further performance analysis and monitoring.
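As a minimal sketch of that exporter idea, assuming the official prometheus_client Python package (the metric names and port are made-up examples), an application can expose its own metrics over HTTP for Prometheus to scrape:

```python
# Minimal sketch of an application exposing its own metrics for Prometheus
# to scrape, using the prometheus_client package. The metric names and the
# port number are made-up examples.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

REQUESTS_TOTAL = Counter("app_requests_total", "Total requests handled")
QUEUE_DEPTH = Gauge("app_queue_depth", "Current number of queued jobs")

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        REQUESTS_TOTAL.inc()                     # simulate handling a request
        QUEUE_DEPTH.set(random.randint(0, 10))   # simulate a fluctuating queue
        time.sleep(1)
```

Prometheus would then be configured to scrape the /metrics endpoint on a schedule; the point is that the application itself offers the metrics rather than waiting for an external tool to infer them.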

Computers have been around for a long time, while the observability standard has only been proposed in the last year or two, so it is impossible for every system to support observability. TiDB, a newer open-source database, exposes an observability interface by default, whereas MySQL does not. This is also DataDog's strength: on one hand, the DataDog Agent can conveniently and professionally implement observability interfaces for a huge number of systems, so that systems without native observability quickly become observable; on the other hand, DataDog provides services that process the resulting mass of observable data, visualize it, and offer analysis and alerting on top. The open-source world has it harder: many of the exporters for Prometheus and Telegraf are immature or even risky to run, and most software engineers cannot fully adapt their systems for observability.

Compared with traditional monitoring software, then, we can say an observability product has two essential elements: it can make the monitored objects observable (collecting their metrics, logs, and code-level traces), and it can store, process, and analyze that massive stream of real-time data.

That is, more or less, the history of monitoring and observability, and you can see that this category of products has developed hand in hand with computers themselves and with the Internet. What does the future hold? I can only say that our systems will keep getting more complicated: more devices will connect to the Internet (the IoT and the industrial Internet), and there will be more new cloud technologies and more data. These devices and technologies will also need to be observable, and will need monitoring products that can manage them. To keep such complex and diverse systems stable, monitoring and observability will continue to evolve.

DataFlux, the cloud observability platform launched by Resident Cloud Technology, is itself a response to this historical trend and to user demand.

For more information, see the DataFlux official website.