How scary is the browser fingerprinting mentioned at WWDC?

When Apple announced macOS Mojave at WWDC 2018, it revealed that Safari now has fingerprinting defense capabilities. What does this technology have to do with fingerprints, what does it do, and how much should the average user worry? Let’s start with the ins and outs 🙂

What is fingerprinting

Fingerprinting literally means collecting fingerprints, so what does it mean in the context of a web browser? Let’s first look at the problem it is trying to solve.

Are a name and an ID number enough to uniquely identify a person in human society? In general, there is nothing wrong with relying on these institutional conventions, but often they are not enough:

Names can be changed at will, and there are plenty of duplicate names.
ID cards can be forged or used fraudulently.
In extreme cases (such as an unidentified body) there are no names and no identification cards.

On the Web, if we think of browsers as people, we have a very similar analogy: a User Agent is a name, and a cookie is an ID card. For example, the Chrome User Agent will use fields like Chrome/66.0.3359.181 to indicate its name and version, and for the same name (many users use the same version of Chrome), we can also use cookies to uniquely identify the User. Is that intuitive? But we can’t escape these three problems on the Web:

The User Agent is like a name and can be changed almost at will in modern browsers.
Cookies are like ID cards. As long as you know someone else’s ID number (cookie value), you can pass yourself off as that person.
For anonymous or malicious visitors, this information is often missing or unreliable.

This exposes the inherent fragility of the assumption that everyone is a good person. So we need to develop technology to uniquely identify a person biologically and to uniquely identify a browser technically. For the former, we have fingerprint, iris, DNA and other recognition technologies available. Similarly, for the latter, the technology we use is fingerprinting, as described below.

Quick overview of Web Fingerprinting technology

In a way, fingerprinting is a peculiar practice. It ignores what a thing was originally for and finds a new use for it:

Fingerprints originally exist to improve grip, and we use them to identify a person.
The iris originally exists to regulate pupil size, and we use it to identify a person.
DNA originally exists to pass genetic information on to offspring, and we use it to identify a person.

In the world of programmers, there are many more examples. How many ways can you think of to uniquely identify a browser running on a given OS? Just look at the open-source fingerprintjs2 library to get a sense of the clever tricks programmers have come up with to keep track of users. The dimensions involved mainly include, but are not limited to:

The IP address
JavaScript behavior
Flash and Java plug-ins
The font
Canvas
WebGL

Let’s take a brief look at each of these dimensions.

The IP address

The simplest form of IP address collection requires no cooperation from the client; it is primarily the server’s job. For example, a web server can log the IP address of each request and derive the user’s approximate geographic location from it. If the user goes through a proxy server, we may be able to detect this by checking the X-Forwarded-For field in the HTTP headers. Below the HTTP application layer and above the IP network layer, it is also not hard to gather some transport-layer information by collecting TCP headers on the server side.
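As a rough illustration of the server-side check described above, here is a minimal sketch. The helper name `describeClient` and the shape of its return value are made up for this example; it only assumes that request headers arrive as a plain object with lower-cased names, as they do in Node's built-in http module. Keep in mind that X-Forwarded-For is supplied by the client or by proxies, so it is a hint rather than proof.

```javascript
// Hypothetical helper (not from any library): summarize what a server can
// infer about the client from the socket address and X-Forwarded-For.
function describeClient(headers, socketAddress) {
  const raw = headers['x-forwarded-for'] || '';
  // The header is a comma-separated chain: client, proxy1, proxy2, ...
  const chain = raw
    .split(',')
    .map((part) => part.trim())
    .filter(Boolean);

  return {
    socketAddress, // address of the direct TCP peer
    viaProxy: chain.length > 0, // header present: some proxy claims to be involved
    claimedClient: chain.length > 0 ? chain[0] : socketAddress,
    proxyChain: chain.slice(1),
  };
}

// Example with documentation-only IP addresses.
console.log(
  describeClient({ 'x-forwarded-for': '203.0.113.7, 198.51.100.2' }, '198.51.100.9')
);
```

In a real server the second argument would typically come from the request socket, for example `req.socket.remoteAddress` in Node.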

All you need to get this information is the back-end service. Does that mean the front end has no role to play in this kind of data collection? Not at all. Let’s take a look at two special fingerprinting methods: DNS leaks and WebRTC leaks.

With a little work on the front end, we can identify the DNS server the user is using. Specifically, when a user visits example.com, the page can insert a series of images with randomly generated addresses such as abcdefg.example.com, which makes the browser issue DNS queries for those subdomains. As long as example.com controls the authoritative name server for the domain, such as NS1.example.com, the queries that arrive there after hierarchical resolution can be logged, revealing the user’s DNS server. As a result, if a proxy is configured only for HTTP requests, the DNS server the user relies on may still be exposed. And if the user keeps the default, nearest DNS server assigned by the carrier, their real location may be exposed to the server as well.
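The front-end half of this trick can be sketched in a few lines. Everything here is illustrative: the function names, the label length, and the `/pixel.gif` path are assumptions, and the logging on the authoritative name server is not shown.

```javascript
// Build a random DNS label so that no resolver has the answer cached.
function randomLabel(length) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let label = '';
  for (let i = 0; i < length; i++) {
    label += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return label;
}

// Produce probe URLs under a domain whose name server we are assumed to control.
function buildDnsProbeUrls(baseDomain, count) {
  const urls = [];
  for (let i = 0; i < count; i++) {
    // "/pixel.gif" is a hypothetical path; only the DNS lookup matters.
    urls.push('https://' + randomLabel(16) + '.' + baseDomain + '/pixel.gif');
  }
  return urls;
}

// In a browser, requesting each URL as an image triggers the DNS queries.
if (typeof document !== 'undefined' && typeof Image !== 'undefined') {
  for (const url of buildDnsProbeUrls('example.com', 5)) {
    const img = new Image();
    img.src = url;
  }
}
```

Because each label is random, the resolver cannot answer from its cache and has to forward the query to the authoritative server, which is where the logging would happen.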

Compared with the approach above, which only needs dynamically inserted links, a WebRTC leak requires more front-end involvement. WebRTC is known for real-time applications such as video streaming, but the Firefox and Chrome implementations rely on STUN to allow UDP communication between two hosts behind NAT. Through the connection candidates gathered with the help of a STUN server, a page can learn both local and public IP addresses. In this way, JavaScript can obtain the user’s internal (intranet) IP address behind the NAT.
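A minimal sketch of this idea follows. The function names and the STUN server address `stun.example.com` are placeholders, not real endpoints. The parser relies on the standard layout of an ICE candidate line (foundation, component, transport, priority, address, port, `typ`, type). Newer browsers may hide local addresses behind `.local` hostnames, so the result can differ from what the article describes.

```javascript
// Extract the address and type from one ICE candidate line.
function parseIceCandidate(candidateLine) {
  const fields = candidateLine.trim().split(/\s+/);
  if (fields.length < 8 || fields[6] !== 'typ') return null;
  return { address: fields[4], port: Number(fields[5]), type: fields[7] };
}

// Browser-only part: gather candidates and report each parsed address.
function collectCandidateAddresses(onAddress) {
  if (typeof RTCPeerConnection === 'undefined') return false; // not a browser
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.example.com:3478' }], // hypothetical server
  });
  pc.createDataChannel('probe'); // something to negotiate, so gathering starts
  pc.onicecandidate = (event) => {
    if (!event.candidate) return; // a null candidate marks the end of gathering
    const parsed = parseIceCandidate(event.candidate.candidate);
    if (parsed) onAddress(parsed);
  };
  pc.createOffer().then((offer) => pc.setLocalDescription(offer));
  return true;
}

// "host" candidates carry local addresses, "srflx" ones the public address.
console.log(
  parseIceCandidate('candidate:842163049 1 udp 1677729535 192.168.1.10 46154 typ host generation 0')
);
```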

For a taste of the data you can collect with any of the fingerprinting methods described above, please click here.

JavaScript behavior

The methods described above mostly work at the network level, but there is also a lot of information to gather from within the browser’s JavaScript context.

To programmatically control the UI and behavior of a web page, we must use JavaScript to manipulate the DOM. As anyone with a bit of front-end experience knows, the DOM carries a huge number of properties. This means a lot of sensitive information about the browser is exposed through it: User-Agent, system architecture, system language, local time, time zone, screen resolution… As for newer HTML5 features such as the battery, accelerometer, and timing APIs, never mind the specific values they return: merely detecting whether these APIs exist already yields a great deal of information. How easy is it to read these properties? Simply accessing the navigator.xxx properties in JavaScript gives us the browser’s “height, weight, blood type, star sign…”.
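To make this concrete, here is a small sketch that gathers a handful of such properties. The function name and the selection of fields are arbitrary choices for illustration; the function takes the `navigator` and `screen` objects as parameters so that it can also be exercised outside a browser.

```javascript
// Collect a few easily readable traits from navigator-like and screen-like objects.
function collectBasicTraits(nav, scr) {
  return {
    userAgent: nav.userAgent,
    platform: nav.platform,
    language: nav.language,
    hardwareConcurrency: nav.hardwareConcurrency,
    timezoneOffsetMinutes: new Date().getTimezoneOffset(),
    screen: scr ? [scr.width, scr.height, scr.colorDepth].join('x') : null,
    // The mere presence of an API is a signal, even without reading its values.
    hasBatteryApi: typeof nav.getBattery === 'function',
    hasDeviceMotion: typeof DeviceMotionEvent !== 'undefined',
  };
}

// In a browser, pass in the real objects.
if (typeof window !== 'undefined' && window.navigator && window.screen) {
  console.log(collectBasicTraits(window.navigator, window.screen));
}
```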

Of course, modern browsers use security policies to restrict access to sensitive DOM properties. But for fingerprinting purposes, some of these policies only end up giving more away. Let’s take a look at fingerprintjs2:

// https://bugzilla.mozilla.org/show_bug.cgi?id=781447
hasLocalStorage: function () {
  try {
    return !!window.localStorage
  } catch (e) {
    return true // SecurityError when referencing it means it exists
  }
},

This pattern appears quite a few times throughout the library. Trying to hide it from me? That is like putting up a sign saying “no silver is buried here” 🙂

For a demo of JavaScript-based fingerprinting, go here.

Flash and Java plug-ins

Flash and Java reveal information about the user’s device to varying degrees.

At the browser level, the navigator.plugins field they rely on is a pitfall in itself: it lists every plug-in the user has installed along with detailed version numbers, which greatly increases the uniqueness of the browser. For example, in older versions of Firefox, it is easy to get the user’s plug-in information:

for (const plugin of navigator.plugins) { console.log(plugin.name); }

"Shockwave Flash"
"QuickTime Plug-in 7.7.3"
"Default Browser Helper"
"Unity Player"
"Google Earth Plug-in"
"Silverlight Plug-In"
"Java Applet Plug-in"
"Adobe Acrobat NPAPI Plug-in, Version 11.0.02"
"WacomTabletPlugin"

As a belated fix, browser vendors added “cloaking” protection to this property, hiding the names of all but the most common plug-ins. In today’s Firefox, the code above would output the following:

for (const plugin of navigator.plugins) { console.log(plugin.name); }

"Shockwave Flash"
"QuickTime Plug-in 7.7.3"
"Java Applet Plug-in"

However, this protection does not stop trackers from actively probing for installed plug-ins with lookups such as navigator.plugins["Shockwave Flash"]. This, then, is the first information leak of the browser plug-in API.
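The active probing mentioned above can be sketched as follows. The candidate list is simply taken from the plug-in names shown earlier, and the function name is made up for this example. The function accepts any plugins-like object, so it can be tried with a stand-in outside a browser.

```javascript
// Names to probe for; a real tracker would use a much longer list.
const CANDIDATE_PLUGINS = [
  'Shockwave Flash',
  'Unity Player',
  'Silverlight Plug-In',
  'Google Earth Plug-in',
];

// Look each name up directly instead of enumerating the collection.
function probePlugins(plugins, names) {
  const found = [];
  for (const name of names) {
    if (plugins && plugins[name]) found.push(name);
  }
  return found;
}

// In a browser, probe the real collection.
if (typeof navigator !== 'undefined' && navigator.plugins) {
  console.log(probePlugins(navigator.plugins, CANDIDATE_PLUGINS));
}
```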

Beyond the browser level, what can be fingerprinted at the runtime level of the Flash and Java applet plug-ins themselves?

Flash gives AS3 the ability to read system information: in addition to the Flash version, this includes the OS version, hardware manufacturer, browser architecture, screen resolution, and many other attributes describing hardware and multimedia compatibility. Java applets can provide a description of the JVM, the system version, the user’s locale, and even some information about the file system, memory usage, and network state. Taken together, this information undoubtedly makes tracking much easier.

Under the shadow of these security issues, Flash and Java applets have faded from the modern Web, and the navigator.plugins API has been deprecated.

Most of the fingerprinting methods introduced so far yield data that is either not particularly unique (such as the UA) or prone to change (such as IP addresses). Next, we will look at some features that come much closer to the essence of a true fingerprint.

Here is an example of Flash Fingerprinting. Thankfully, your browser probably doesn’t support Flash anymore 🙂

The font

Seemingly ordinary fonts can actually lead to a very large topic. In fingerprinting, fonts play an integral role.

As @xiaomi, one of our experts, mentioned in a talk, font layout calculations involve a great many parameters: baseline, ligatures, kerning… It is so complex that browsers rely on the operating system’s text libraries (such as Pango on Linux, CoreText on macOS, and DirectWrite on Windows). Not only does each of these libraries behave in its own subtle way, but browsers also keep influencing font rendering through CSS properties. As a result, we can infer how the layout was computed from the typeset result. The process may sound subtle, but it is actually quite simple:

Render <span> elements in a variety of special fonts, somewhere the user cannot see them.
Measure the bounding box of each resulting element.

With these simple steps, we can learn two key things:

Whether the user has a given font installed (fonts that are not installed fall back to the default font).
Pixel-level differences in bounding box layout caused by different font rendering methods.
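The two steps above can be sketched like this. The function names, the test string, and the single `monospace` fallback are assumptions made for this example; a font whose metrics happen to match the fallback would be missed, which is why a more careful implementation would compare against several fallback families. The measuring function is passed in as a parameter so that the detection logic can be tried without a DOM.

```javascript
// A font counts as installed if text set in it measures differently
// from the same text set in the fallback font alone.
function detectFonts(candidates, measure) {
  const fallback = 'monospace';
  const baseline = measure(fallback);
  return candidates.filter((font) => {
    const size = measure('"' + font + '", ' + fallback);
    return size.width !== baseline.width || size.height !== baseline.height;
  });
}

// Browser-only measurer: renders an off-screen <span> and reads its bounding box.
function createDomMeasurer(doc) {
  return (fontFamily) => {
    const span = doc.createElement('span');
    span.textContent = 'mmmmmmmmmmlli';
    span.style.cssText = 'position:absolute;left:-9999px;font-size:72px;';
    span.style.fontFamily = fontFamily;
    doc.body.appendChild(span);
    const rect = span.getBoundingClientRect();
    doc.body.removeChild(span);
    return { width: rect.width, height: rect.height };
  };
}

// In a browser, check a few well-known font names.
if (typeof document !== 'undefined' && document.body) {
  const measure = createDomMeasurer(document);
  console.log(detectFonts(['Arial', 'Helvetica Neue', 'Comic Sans MS'], measure));
}
```

The exact widths and heights returned by the measurer are themselves a signal, since they reflect the rendering differences described in the second point.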

To see the fingerprint differences calculated based on font typography, see here.

Canvas

The Canvas API in HTML gives JavaScript pixel-level control over rendered content. As we know, in addition to supporting basic shapes, text, and drawing modes, Canvas can also export its content as an image (if you have used the “Save to Album” feature on pages shared in Moments, you have used this API). At the image format level, browsers use different image processing engines, export parameters, and compression levels, so the exported files can hash to different values even if the final image is identical pixel for pixel. At the operating system level, font rendering, anti-aliasing, and sub-pixel rendering also introduce subtle differences. Taken together, this lets us use Canvas to obtain a “fingerprint”.
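As a rough sketch of the idea, the snippet below draws some text and a shape, exports the canvas, and reduces the export to a short signature. The drawing commands and function names are arbitrary, and the hash is an FNV-1a style 32-bit hash applied to the characters of the data URL, chosen here only because it is short; it is not claimed to be what any particular library or demo uses.

```javascript
// FNV-1a style 32-bit hash over the characters of a string,
// returned as 8 upper-case hex digits.
function fnv1a32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, '0').toUpperCase();
}

// Browser-only part: render, export, and hash.
function canvasSignature(doc) {
  const canvas = doc.createElement('canvas');
  canvas.width = 240;
  canvas.height = 60;
  const ctx = canvas.getContext('2d');
  if (!ctx || !canvas.toDataURL) return null;
  ctx.textBaseline = 'top';
  ctx.font = '16px Arial';
  ctx.fillStyle = '#f60';
  ctx.fillRect(10, 10, 100, 30);
  ctx.fillStyle = '#069';
  ctx.fillText('Fingerprint test 1,2,3', 4, 20);
  // Rendering and encoding differences end up in the exported data URL.
  return fnv1a32(canvas.toDataURL());
}

if (typeof document !== 'undefined') {
  console.log(canvasSignature(document));
}
```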

In fingerprintjs2, the source code for this feature is very succinct:

getCanvasFp: function () {
  var result = []
  var canvas = document.createElement('canvas')
  // ...
  // (a series of Canvas API drawing calls omitted here)
  if (canvas.toDataURL) { result.push('canvas fp:' + canvas.toDataURL()) }
},

You will rarely find a “core implementation” this simple, with barely any branching at all… Yet it works remarkably well. On this example page, you can view your browser’s Canvas fingerprint:

Your Fingerprint
Signature: 4FAFB231
Uniqueness: 99.56% (1130 of 258561 user agents have the same signature)

In other words, a very high degree of uniqueness is easy to achieve.

WebGL

WebGL is a much lower-level API than Canvas, and it gives you 3D drawing capabilities. WebGL-based fingerprinting works on the same principle as font and Canvas fingerprinting, and comes down to two steps:

Exhaustively probe the browser’s support for the WebGL API (yes, there are 88 APIs to check).
Draw a special shape and calculate the hash value of the rendered image.
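The first of these steps can be sketched as below. The function name and the handful of parameters queried are an illustrative selection, far short of an exhaustive probe; the second step, rendering a shape and hashing the result, works the same way as the Canvas approach and is not repeated here. The function accepts any WebGL-context-like object so that it can be exercised with a stand-in.

```javascript
// Read a few identifying parameters and the extension list from a WebGL context.
function collectWebglTraits(gl) {
  if (!gl) return null; // WebGL unavailable: that fact is itself a signal
  return {
    vendor: gl.getParameter(gl.VENDOR),
    renderer: gl.getParameter(gl.RENDERER),
    version: gl.getParameter(gl.VERSION),
    maxTextureSize: gl.getParameter(gl.MAX_TEXTURE_SIZE),
    extensions: (gl.getSupportedExtensions() || []).slice().sort(),
  };
}

// In a browser, query a real context.
if (typeof document !== 'undefined') {
  const gl = document.createElement('canvas').getContext('webgl');
  console.log(collectWebglTraits(gl));
}
```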

Here is the corresponding demo page. Probably because the demo does not draw any text, the images here are not very unique: my Safari and Chrome produced exactly the same image hash…

Real-world performance

The combination of the above techniques yields a very powerful, industrial-grade fingerprinting library. If you doubt how well it works in practice, go to the fingerprintjs2 project home page and try the following:

Generate a fingerprint in Chrome’s normal mode.
Generate another fingerprint in Safari.
Compare the results with those from a colleague’s Mac.
Change your User Agent and refresh the page to see whether the fingerprint changes.
Switch to Chrome’s Incognito mode and generate the fingerprint again to see whether it stays the same.

Not surprisingly, the fingerprint does not change when you alter various common settings within the same Chrome installation, whereas the same browser version on another computer, or a different browser on the same computer, produces a different fingerprint. That is the power of fingerprinting.

According to data cited by Mozilla [1], across 1 million visits to the site, 83.6% of browsers had a unique fingerprint, rising to 94.2% for browsers with Flash or Java enabled.

Another interesting data point is that cookies, which are so often written into privacy policies, contribute very little to the uniqueness of a fingerprint. According to the data, browser plug-ins contributed 15.4 bits of entropy to the tracking methods described above, while the cookies-enabled flag contributed only 0.353 bits. That is the difference between 2^15 and 2^0.3, several orders of magnitude, and these statistics do not even account for the more effective Canvas tracking technique. Now you can understand how hard the little ads on all kinds of junk sites try to find you 🙂
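To get a feel for what these bit counts mean, the small sketch below converts bits of entropy into "one in N browsers" and, going the other way, turns a share of matching browsers into bits. The function names are made up for this example; the inputs are the figures quoted in this article.

```javascript
// n bits of entropy narrow a browser down to roughly one in 2^n.
function bitsToOneIn(bits) {
  return Math.pow(2, bits);
}

// Information (in bits) revealed by a value shared by `matching` of `total` browsers.
function surprisalBits(matching, total) {
  return -Math.log2(matching / total);
}

console.log(bitsToOneIn(15.4).toFixed(0)); // plug-in list: one in tens of thousands
console.log(bitsToOneIn(0.353).toFixed(2)); // cookies enabled: barely better than one in one
console.log(surprisalBits(1130, 258561).toFixed(2)); // the Canvas signature shown earlier
```

Under this reading, the plug-in list alone singles out about one browser in 43,000, while knowing whether cookies are enabled distinguishes almost nothing.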

Conclusion

Foreign countries use gunpowder to make bullets and fight their enemies, but China uses it to make firecrackers and worship the gods; foreign countries use the compass to navigate the seas, but China uses it for feng shui; foreign countries use opium to cure diseases, but China consumes it like food.

Browser vendors are currently working to provide better privacy protections. Safari, mentioned at the beginning of this article, exposes only simplified configuration information to make tracking harder. But beyond fingerprinting itself, it is worth thinking about the value of privacy and the misuse of technology. On the one hand, would you disable Canvas, WebGL, and font rendering for the sake of privacy? I’m afraid most people would find it hard to go back. On the other hand, just as Google uses AI to play Go while Baidu uses it to optimize ads for fake drugs, technology itself is neither right nor wrong; what matters is the people who use it.

References

Fingerprinting – MozillaWiki
DNS Leak Test
webrtc-ips
Fingerprinting web users through font metrics
HTML5 Canvas Fingerprinting
WebGL Browser Report