Content source: On May 6, 2017, Zitian Zhang, Client Technology Director of Qunar's Platform Business Unit, gave a talk titled “Qunar Fast App Development and Problem Solving Platform Practice” at the Ctrip Technology Salon on Mobile Development Engineering Practice and Performance Optimization. IT big said, the exclusive video partner, publishes this article with the review and authorization of the organizers and the speaker.






Abstract

This talk introduces how Qunar's client team maintains its products quickly, easily and reliably in a large-scale scenario with many teams and many apps.

By replaying real cases, it covers user behavior tracking and the monitoring of network data exchange, and presents solutions to problems the industry currently finds hard to handle, such as collecting and extracting statistics without manually added tracking points, a Hook scheme for network monitoring, and wireless remote testing.

By walking through how Qunar solves product and user problems, and by showing how the relevant systems are used and how they work internally, we hope to give everyone ideas on how to develop and maintain apps faster in multi-front-end, cross-team scenarios, and how to locate and solve problems quickly.

Video and slides of the talk

t.cn/RChIMfH


App crashes

When ordinary users experience an app crash, they often cannot describe the problem accurately. At most they can tell us their device model or user account, which gives us very little to go on.

Troubleshooting method

The most important things we need to know are when the crash happened, on which page, and why. Users generally cannot provide this information, so we can only search the logs of each system, pull people into a troubleshooting group, and “guess” the cause of the failure.

Changing roles

The working team changes dramatically as the business grows. At first, when a problem comes up, a few people can discuss it and basically solve it. As the business grew, the single team split into multiple teams. Later, different development approaches appeared, and everyone's responsibilities and understanding of the app became partial, so it was no longer clear at a glance where a problem lay.

Problems that cannot be reproduced

For problems that cannot be reproduced on a page, our latest approach is to monitor the user's whole process.





You can see exactly what the user did and when.

We call this “user scrutiny.”

Each page can also be expanded to show its requests, with request times and a timeline, and each request can be opened to see which links of the back-end system the interface call passed through.

Whoever investigates can now see exactly which link the problem occurred in and only needs to call in the people responsible for that link, instead of using the traditional method, which takes a long time and consumes a lot of manpower.

The technical details involved here are as follows:

How to capture user interactions and rendering changes;

How to capture users' network requests and timelines;

How to restore the user's scene;

How to do all of this without affecting business code development.

The system involved: “Tumbling Cloud”

QAV handles interaction statistics, QACR handles exception monitoring, and QTrace monitors the entire flow of a network request.



Interaction behavior and rendering changes

Let's start with interaction, and first look at the types of events being monitored: app life cycle events, page switching events, and interaction events.

In the early years, controls were located by view ID, but this method was very unreliable, so in that era tracking points were usually added by hand.

Later came coordinates, which are actually not much better than view IDs, especially on Android, where the variety of device models and screen sizes makes them inaccurate.

We then used xpath for a while, but found that it was not stable enough on Android. Different system ROMs add vendor-customized content to the view tree, and some even add and remove views automatically.

So we made some improvements to xpath, using a custom format to define xpath-style paths based on pages and layouts.
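To make the idea of an xpath-style locator concrete, here is a minimal sketch in Python. It is not our actual format: the `View` class, the page name `HotelList` and the `view_path` function are all hypothetical, and the only assumption is that a path is made of a page name plus class names with sibling indexes.

```python
class View:
    """Hypothetical stand-in for a UI view node (not a real Android or iOS class)."""

    def __init__(self, cls):
        self.cls = cls          # class name of the view, e.g. "Button"
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def view_path(view, page):
    """Build a path such as 'HotelList:/FrameLayout[0]/Button[1]'.

    Each segment is the class name plus the index among siblings of the
    same class, so a sibling of another class does not shift the index.
    """
    segments = []
    node = view
    while node is not None:
        if node.parent is None:
            index = 0
        else:
            same_class = [c for c in node.parent.children if c.cls == node.cls]
            index = next(i for i, c in enumerate(same_class) if c is node)
        segments.append(f"{node.cls}[{index}]")
        node = node.parent
    return f"{page}:/" + "/".join(reversed(segments))


root = View("FrameLayout")
root.add(View("TextView"))
first_button = root.add(View("Button"))
second_button = root.add(View("Button"))
path = view_path(second_button, "HotelList")
```

Prefixing the page name keeps identical layouts on different pages apart, which is one plausible reason to base the format on pages as well as layouts.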

Whichever method is used, the identifiers will still change over time, so we also need a data item that merges them.

The style of the xpath also varies from platform to platform.



Business developers should not have to add tracking points by hand during development, so we adopt a Hook approach.



Hooks work differently on each platform. On iOS this can be done with the Runtime, but Android needs a different approach.

You can do something similar to the Runtime on Android, but it is not very good. It is not a true run-time Hook: the code has to be inserted in advance, and it affects running efficiency.

So the Hook uses InstantRun at run time and a JavaAgent at build time.
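The general idea of a Hook can be shown with a small Python analogue. This is only an illustration of wrapping an existing method so that events are recorded without touching business code. It is not the iOS Runtime, InstantRun or the JavaAgent, and `hook_method`, `BookButton` and the `events` list are hypothetical names.

```python
import functools
import time

events = []  # hypothetical in-memory sink for collected events


def hook_method(cls, name, event_type):
    """Replace cls.<name> with a wrapper that records an event, then calls the original."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        events.append({
            "type": event_type,
            "target": type(self).__name__,
            "time": time.time(),
        })
        return original(self, *args, **kwargs)

    setattr(cls, name, wrapper)


class BookButton:
    """Business code: it knows nothing about tracking."""

    def on_click(self):
        return "order submitted"


# The monitoring side installs the hook; the business class is not edited.
hook_method(BookButton, "on_click", "click")
result = BookButton().on_click()
```

The business method still returns what it always returned, and the event list gains one click record as a side effect.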



Injecting code on iOS is simple, but on Android it is more complicated.



During the build, we hook the build script and add all the tracking code to the Dex before packaging, so that when packaging finishes the code is already in the final output Dex.



This is divided into three parts: the Agent, the Gradle plug-in, and the part that actually defines the inserted content.

The inserted content covers network monitoring and user behavior monitoring. Compared with the Hook it is the business layer, so we call it the Dex.

The Agent itself is what performs the Hook.



Let's see what we hook. The most basic network data is the time and status of each request, and whether the current network is Wi-Fi or 4G.

We inject a few data points.

The network Hook is injected differently depending on how requests are made. Because of legacy code and third-party code, different network request frameworks are in use, and each has to be handled.

In React Native, the Hook is injected directly into the React Native framework layer.
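As a rough sketch of the basic network data mentioned above (request time, status, and network type), the following Python wrapper records them around a request function. Everything here is hypothetical: `fake_send` stands in for a real HTTP call, `current_network_type` is a placeholder for asking the operating system, and the URL is an example.

```python
import time

request_log = []  # hypothetical sink for network records


def current_network_type():
    """Placeholder: a real client would ask the OS whether it is on Wi-Fi or 4G."""
    return "wifi"


def monitored(send):
    """Wrap a request function and record start time, duration, status and network type."""
    def wrapper(url):
        started_at = time.time()      # wall-clock time, for the timeline
        t0 = time.monotonic()         # monotonic clock, for the duration
        status = None
        try:
            status = send(url)
            return status
        finally:
            request_log.append({
                "url": url,
                "start": started_at,
                "duration": time.monotonic() - t0,
                "status": status,     # stays None if the request raised
                "network": current_network_type(),
            })
    return wrapper


def fake_send(url):
    """Stand-in for a real HTTP call; returns an HTTP status code."""
    return 200


send = monitored(fake_send)
code = send("https://example.com/api/hotels")
```

Recording in a `finally` block means a failed request still leaves a record, which is usually the record you most want.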

Aggregating the data

How to link the data together

We use an ID to bind user behavior to network requests. Each user interaction generates an ID, and the network requests that follow carry that ID, so each request can be linked to the interaction that triggered it.
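The binding can be sketched as follows. This is a simplified, single-threaded Python illustration with hypothetical names (`record_interaction`, `record_request`), assuming that a request belongs to the most recent interaction.

```python
import itertools

_counter = itertools.count(1)
current_interaction_id = None
log = []


def record_interaction(description):
    """Every user interaction gets a fresh ID, which becomes the current one."""
    global current_interaction_id
    current_interaction_id = next(_counter)
    log.append({"kind": "interaction", "id": current_interaction_id, "what": description})


def record_request(url):
    """A network request carries the ID of the most recent interaction."""
    log.append({"kind": "request", "interaction_id": current_interaction_id, "url": url})


record_interaction("tap Search")
record_request("/api/search")
record_interaction("tap first result")
record_request("/api/detail")
record_request("/api/price")

# Group requests under the interaction that triggered them.
requests_by_interaction = {}
for entry in log:
    if entry["kind"] == "request":
        requests_by_interaction.setdefault(entry["interaction_id"], []).append(entry["url"])
```

Here the second tap ends up linked to two requests, which is exactly the kind of relationship that is hard to recover from separate logs.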

A UUID is used to link the call stack of an interface request. Each layer adds its own identity, so the entire network call stack can be traced.
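A minimal Python sketch of this kind of tracing is shown below. The layer names and the `new_trace` and `pass_through` helpers are hypothetical; the sketch only assumes that one UUID travels with the request and that every layer appends its identity.

```python
import uuid


def new_trace():
    """Start a trace on the client: one UUID plus an empty list of hops."""
    return {"trace_id": str(uuid.uuid4()), "hops": []}


def pass_through(trace, layer_name):
    """Each layer appends its own identity before forwarding the call."""
    trace["hops"].append(layer_name)
    return trace


trace = new_trace()
for layer in ["client", "gateway", "hotel-service", "price-service"]:  # hypothetical layers
    pass_through(trace, layer)

call_chain = " -> ".join(trace["hops"])
```

Searching every system's logs for the same `trace_id` then reconstructs the path one request took through the back end.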

Sorting by corrected time is used to put all of the previous actions together in the correct order.



All user logs are sorted by the client's own time.
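One way to picture time correction is the Python sketch below. It assumes, purely for illustration, that a single clock offset between the client and the server is known, and that client entries are shifted by it before everything is merged. The entries, the offset value and `to_server_time` are hypothetical.

```python
def to_server_time(entry, clock_offset):
    """Shift client timestamps onto the server clock; server timestamps stay as they are."""
    if entry["source"] == "client":
        return entry["time"] + clock_offset
    return entry["time"]


entries = [
    {"source": "client", "what": "tap Search", "time": 103.0},
    {"source": "client", "what": "send /api/search", "time": 103.2},
    {"source": "server", "what": "handle /api/search", "time": 100.3},
    {"source": "client", "what": "render results", "time": 103.9},
]

clock_offset = -3.0  # assumed: this client's clock runs 3 seconds ahead of the server

timeline = sorted(entries, key=lambda e: to_server_time(e, clock_offset))
ordered = [e["what"] for e in timeline]
```

Without the correction the server entry would sort before the tap that caused it. A constant offset never changes the order of one client's own logs, which is why those can safely be sorted by the client's own time.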

Uploading the logs

We compress the interaction logs and network request logs and then upload them.

Exception logs, such as crashes or lag, are uploaded in real time.
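As an illustration of the compression step for batched logs, here is a small Python sketch using JSON lines and gzip from the standard library. The wire format is an assumption made for the example, and `pack_logs` and `unpack_logs` are hypothetical names.

```python
import gzip
import json


def pack_logs(entries):
    """Serialize entries as JSON lines and gzip them for a batched upload."""
    raw = "\n".join(json.dumps(e, sort_keys=True) for e in entries).encode("utf-8")
    return gzip.compress(raw)


def unpack_logs(blob):
    """Inverse of pack_logs, as the receiving side might do."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


batch = [{"kind": "interaction", "what": "tap Search"} for _ in range(200)]
blob = pack_logs(batch)
restored = unpack_logs(blob)
```

Interaction and request logs are highly repetitive, so they compress well, while a crash record is small and urgent enough to be sent on its own right away.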



This system was built to cover the complete process of development, testing, release, and monitoring, so that the most can be done with the least manpower.

Tip of the iceberg: binding data items

Binding a data item gives a control a more human-readable name, and it can be done by non-developers. After binding, user actions show up under that name in the behavior logs.

This greatly reduces the development time spent on statistics requirements. It also avoids the awkward situation where only programmers can understand the logs, and lets non-developers work with them on their own.
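A binding can be thought of as a lookup from a control's locator to a readable name. The Python sketch below is hypothetical throughout (the paths, the names and the `describe` function), and it also shows two locators bound to one name, which is one way a control can keep its identity after a layout change.

```python
bindings = {
    # view path (hypothetical examples) -> readable name
    "HotelList:/FrameLayout[0]/Button[1]": "Book now button",
    "HotelList:/LinearLayout[0]/Button[1]": "Book now button",  # same control after a layout change
}


def describe(event):
    """Replace a raw view path with its bound name when one exists."""
    name = bindings.get(event["path"], event["path"])
    return f'{event["type"]} on {name}'


readable = describe({"type": "click", "path": "HotelList:/FrameLayout[0]/Button[1]"})
unbound = describe({"type": "click", "path": "HotelList:/FrameLayout[0]/TextView[0]"})
```

Because the table is plain data, someone outside the development team could maintain it without touching app code.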

Tip of the iceberg: crash aggregation

We found that the mainstream vendors on the market did not collect all of the error types we needed, so we built a complete collection ourselves.



Conclusion

We package everything together, from data and testing to release and monitoring, to give business developers a friendly development environment.

That’s all for today’s sharing, thank you!
