Xlab 2015/05/26

0x00 Preface


Author: R3DF09@Tencent Xuanwu Laboratory

In March 2015, Google Project Zero published "Exploiting the DRAM rowhammer bug to gain kernel privileges". The bug described there is hard to fix outright, since mitigating it requires a BIOS update to raise the DRAM refresh rate, which caused widespread concern. However, RowHammer is difficult to exploit in practice, because it requires running specific assembly code on the target host.

The purpose of this article is to investigate whether RowHammer can be triggered through the dynamic execution of scripting languages such as JavaScript, which would greatly increase its attack potential. To verify this idea, this paper analyzes the implementation mechanisms of Java Hotspot, Chrome V8, .NET CoreCLR and Firefox SpiderMonkey and gives a feasibility analysis for each.

Unfortunately, none of these engines gave us a practical path. Either the relevant instructions are never emitted, or the emitted instructions cannot meet RowHammer's requirements, or triggering them requires additional changes to the execution environment, which removes any real attack value.

0x01 RowHammer


This section briefly reviews why RowHammer exists, how it is triggered, and some of the difficulties in exploiting it.

1.1 What is RowHammer?

RowHammer is a problem in DDR3 memory: by accessing one row of memory very frequently, bits in adjacent rows can be flipped. As shown in Figure 1.1(a), memory is a two-dimensional array of memory cells. As shown in Figure 1.1(b), each cell consists of a transistor and a capacitor; the transistor is connected to a wordline, and the capacitor stores the data. Each row of DRAM has its own wordline, and a wordline must be driven to a high voltage before the data in its row can be accessed. When a wordline is raised to a high voltage, the data in that row is loaded into the row-buffer. If a wordline is charged and discharged frequently, it can cause the capacitors in nearby rows to leak charge, and if those cells lose too much charge before they are refreshed, the data stored in memory changes.

Figure 1.2 shows a block of memory, where a row holds 64K bits (8KB), 32K rows form a bank, and 8 banks form a rank: 8KB × 32K = 256MB per bank, and 256MB × 8 = 2GB per rank. Note that each bank has its own row-buffer, so alternating accesses to rows in different banks do not force the same wordlines to be charged and discharged repeatedly.

The charge in a memory cell cannot be held indefinitely, so DRAM must be refreshed constantly. The refresh interval is 64ms, so a RowHammer attack must complete its accesses within a 64ms window.

1.2 How RowHammer is triggered


Table 1.1 shows the snippet of code from Google Project Zero that triggers RowHammer.

Table 1.1

code1a:
  mov (X), %eax // Read from address X
  mov (Y), %ebx // Read from address Y 
  clflush (X) // Flush cache for address X 
  clflush (Y) // Flush cache for address Y 
  jmp code1a

The choice of the X and Y addresses is critical: X and Y must be in the same bank but in different rows.

This is because each bank has its own dedicated row-buffer: if X and Y were in different banks, or in the same row, the wordlines would not be charged and discharged frequently, and RowHammer would not be triggered.

The code above is one working test, but it is not the only one; all that is really required is that some wordline be charged and discharged frequently within 64ms.

1.3 Instructions that can trigger RowHammer


To charge and discharge wordlines frequently, the CPU cache must be taken into account: if the target address is already in the cache, DRAM is not accessed at all, and the wordline is not charged and discharged.

Table 1.2

Instruction | Role
CLFLUSH     | Evicts the data at a given address from the cache
PREFETCH    | Reads data from memory into the cache
MOVNT*      | Moves data directly, bypassing the cache

The instructions in Table 1.2 make it possible to reach a DRAM address repeatedly, forcing the corresponding wordline to be charged and discharged. To trigger RowHammer, they need to be combined with ordinary memory accesses in a tight loop, as in Table 1.1.

(Note: these instructions are not the only way to trigger RowHammer. For example, one can analyze the mapping between physical addresses and L3 cache sets, which differs across CPU architectures, find a group of addresses that map to the same cache set, and access them repeatedly so that they keep evicting each other from the cache.)
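
To make this concrete, below is a minimal user-mode sketch of the hammering loop from Table 1.1, written with compiler intrinsics instead of hand-written assembly. The buffer size, the 64MB offset between the two addresses and the iteration count are illustrative assumptions only; they do not guarantee a same-bank/different-row pair, and a real test (such as Google's rowhammer-test) searches for suitable address pairs empirically and then checks neighbouring rows for flipped bits.

#include <emmintrin.h>   // _mm_clflush (SSE2)
#include <cstdint>
#include <cstdlib>
#include <cstring>

// One round of the access pattern from Table 1.1: read X, read Y,
// then evict both from the cache so the next read hits DRAM again.
void hammer(volatile uint8_t* x, volatile uint8_t* y, long iterations) {
    for (long i = 0; i < iterations; ++i) {
        (void)*x;                       // read from address X
        (void)*y;                       // read from address Y
        _mm_clflush((const void*)x);    // flush cache line for X
        _mm_clflush((const void*)y);    // flush cache line for Y
    }
}

int main() {
    const size_t kSize = 1u << 30;      // 1 GB buffer spanning many rows (assumption)
    uint8_t* buf = static_cast<uint8_t*>(std::malloc(kSize));
    if (buf == nullptr) return 1;
    std::memset(buf, 0xFF, kSize);      // known pattern, so bit flips would be visible

    // Assumed candidate pair (offset chosen arbitrarily for illustration);
    // a real test scans many pairs because the physical mapping is unknown.
    hammer(buf, buf + (64u << 20), 10 * 1000 * 1000);

    std::free(buf);
    return 0;
}

After the loop, a real test would re-read the buffer and compare it against the 0xFF pattern to detect flips in the rows adjacent to X and Y.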

0x02 Triggering RowHammer from script


The PoC provided by Google Project Zero is written directly in assembly and can be used to check whether a machine's memory is vulnerable. Most scripting languages today have JIT engines. If RowHammer could be triggered purely from script through the JIT engine, the attack would be far more significant. To analyze the feasibility, this section studies the execution mechanisms of Java Hotspot, Chrome V8 and other engines.

2.1 Java Hotspot


Hotspot is the default VM of the Oracle JDK and is mainly responsible for interpreting and executing Java bytecode. Its source code lives in the hotspot directory of OpenJDK and can be built independently. Java bytecode is a stack-based instruction set with at most 256 opcodes, covering data transfer, type conversion, control flow, object operations, arithmetic and method calls. Bytecode is stored in class files, which are the input to Hotspot and are fully under the user's control. Can we construct a class file such that Hotspot performs a RowHammer access pattern while executing it?

By default, Java Hotspot interprets bytecode; when a method is called often enough to reach a certain threshold, the built-in JIT compiler compiles it, and subsequent executions call the compiled code directly.

The Java bytecode interpreter has two implementations, the template interpreter and the C++ interpreter; Hotspot uses the template interpreter by default. The JIT compiler has three implementations: the client (C1) compiler, the server (C2) compiler, and the Shark (LLVM-based) compiler.

Figure 2.1 shows the virtual machine used by Java on different platforms.

2.1.1 Can the template interpreter trigger RowHammer?

A) How the template interpreter works

The template interpreter is a relatively low-level implementation: each bytecode corresponds to a template, and all templates together form the template table. Each template is essentially a block of assembly code generated when the virtual machine is created. When a class file is executed, the interpreter walks the bytecode and, for each bytecode encountered, jumps to the corresponding assembly block, thereby interpreting the bytecode.
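
As a rough mental model only (a simplified sketch, not Hotspot's actual code; in Hotspot both the templates and the dispatch logic are themselves generated assembly), the template table can be pictured as an array of code stubs indexed by the bytecode value:

#include <cstdio>
#include <cstddef>
#include <cstdint>

// Hypothetical model of template dispatch, for illustration only.
using TemplateStub = void (*)();

TemplateStub template_table[256];             // the "template table"

void stub_nop()      { }                      // stand-in for the nop template
void stub_iconst_0() { std::puts("push 0"); } // stand-in for the iconst_0 template

void interpret(const uint8_t* bytecode, size_t length) {
    for (size_t bci = 0; bci < length; ++bci) {   // bci: bytecode index
        template_table[bytecode[bci]]();          // jump to this bytecode's stub
        // simplification: real bytecodes carry operands, and the template
        // itself advances the bytecode pointer by the instruction length
    }
}

int main() {
    template_table[0x00] = stub_nop;        // nop
    template_table[0x03] = stub_iconst_0;   // iconst_0
    const uint8_t code[] = {0x00, 0x03, 0x00};
    interpret(code, sizeof(code));
    return 0;
}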

To interpret bytecode, Hotspot also generates a variety of auxiliary assembly blocks during initialization, such as method entry stubs and exception handlers. The code blocks and templates generated by Hotspot can be dumped with the java -XX:+PrintInterpreter option (on product builds this may additionally require -XX:+UnlockDiagnosticVMOptions).

The assembly code for each bytecode template is fairly large; for example, the code block for the bytecode invokevirtual is 352 bytes, and the one for putstatic is 512 bytes.

B) Can the interpreter trigger RowHammer?

Bytecode is interpreted by executing these assembly blocks, so is it possible to craft a class file that makes the interpreter execute the instructions RowHammer needs?

Analysis shows that the PREFETCH, CLFLUSH and MOVNT* instructions do not appear in any of the bytecode templates or the auxiliary code blocks, so triggering RowHammer by crafting bytecode and relying on the template interpreter is not feasible.

2.1.2 Can the JIT compiler trigger RowHammer?

The JIT compiler is still a compiler; it simply compiles code on demand while the program is running. Its compilation process is essentially the same as that of an ordinary compiler.

The C1 compiler is the JIT implementation used in client mode; it mainly pursues compilation speed and is relatively conservative about code optimization.

Hotspot compiles asynchronously by default: when a method's invocation count reaches a threshold, it is handed to a CompilerThread, which invokes the JIT compiler to compile it. The default threshold is 10000 and can be changed with the -XX:CompileThreshold=<n> parameter.

At the code level, the phases of the C1 compiler can be read off from its timer names:

Table 2.1

typedef enum {
  _t_compile,
  _t_setup,
  _t_optimizeIR,
  _t_buildIR,
  _t_emit_lir,
  _t_linearScan,
  _t_lirGeneration,
  _t_lir_schedule,
  _t_codeemit,
  _t_codeinstall,
  max_phase_timers
} TimerName;

The C1 compiler's execution process is roughly as shown in Figure 2.2:

1) Generate HIR (build_hir)

The C1 compiler first parses the JVM bytecode stream and converts it into a control flow graph whose basic blocks are in SSA form. HIR is a high-level intermediate representation that is still far removed from machine-specific code.

2) Generate LIR (emit_lir)

The compiler then walks each basic block of the control flow graph, and each statement within it, generating the corresponding LIR. LIR is much closer to machine language, but it is still not code the machine can execute directly.

3) Register allocation

LIR uses virtual registers, which must be mapped to real, usable registers at this stage. C1 uses a linear-scan register allocator to keep compilation fast.
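
For illustration, here is a minimal sketch of the linear-scan idea only (not C1's real allocator in c1_LinearScan, which additionally splits intervals and chooses what to spill): live intervals are processed in order of their start position, expired intervals return their register to the free pool, and an interval that finds no free register is left on the stack.

#include <algorithm>
#include <set>
#include <vector>

// Simplified linear-scan register allocation (illustration only).
struct Interval {
    int vreg;        // virtual register number
    int start, end;  // live range
    int phys = -1;   // assigned physical register, -1 = spilled to the stack
};

void linear_scan(std::vector<Interval>& intervals, int num_regs) {
    // process intervals in order of increasing start position
    std::sort(intervals.begin(), intervals.end(),
              [](const Interval& a, const Interval& b) { return a.start < b.start; });

    std::vector<int> free_regs;
    for (int r = 0; r < num_regs; ++r) free_regs.push_back(r);

    // intervals currently holding a register, ordered by end position
    auto by_end = [](const Interval* a, const Interval* b) { return a->end < b->end; };
    std::multiset<Interval*, decltype(by_end)> active(by_end);

    for (Interval& cur : intervals) {
        // return registers whose intervals ended before the current one starts
        while (!active.empty() && (*active.begin())->end < cur.start) {
            free_regs.push_back((*active.begin())->phys);
            active.erase(active.begin());
        }
        if (!free_regs.empty()) {
            cur.phys = free_regs.back();
            free_regs.pop_back();
            active.insert(&cur);
        }
        // else: no register is free; this sketch simply leaves cur spilled,
        // whereas the full algorithm spills the active interval that ends last
    }
}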

4) Generate object code

This is where platform-specific machine code is actually generated: the compiler iterates over all LIR instructions and emits the corresponding assembly for each. The LIR_Assembler class does most of the work.

Table 2.2

LIR_Assembler lir_asm(this);
lir_asm.emit_code(hir()->code());

emit_code traverses the LIR_List and calls the emit routine of each instruction in turn (Table 2.3); all LIR instructions inherit from LIR_Op.

Table 2.3

op->emit_code(this);

Take LIR_Op1 as an example; its emit_code method is:

Table 2.4

void LIR_Op1::emit_code(LIR_Assembler* masm) {
  masm->emit_op1(this);
}

Suppose the opcode of this LIR_Op1 is lir_prefetchr; emit_op1 then reaches the following case:

Table 2.5

case lir_prefetchr:
  prefetchr(op->in_opr());
  break;

The result is a call to the prefetchr function, which is platform-specific and implemented separately for each platform; the x86 implementation, for example, is in assembler_x86.cpp:

Table 2.6

void Assembler::prefetchr(Address src) {
  assert(VM_Version::supports_3dnow_prefetch(), "must support");
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_byte(0x0D);
  emit_operand(rax, src); // 0, src
}

The corresponding machine code will eventually be generated.

B) Can the C1 compiler trigger RowHammer?

So can the C1 compiler trigger RowHammer? Analysis shows that the x86 back end does wrap the prefetch-related instructions, so controlling the generation of PREFETCH instructions looks promising at first.

Working bottom-up: to emit a prefetch instruction, an LIR_Op1 with opcode lir_prefetchr or lir_prefetchw must appear at the LIR level. Going one level further, for such an instruction to appear in the LIR, the bytecode-to-HIR phase must be able to call GraphBuilder::append_unsafe_prefetch. That method is called from GraphBuilder::try_inline_intrinsics, and the parse is triggered simply by calling the prefetch methods of sun.misc.Unsafe. So Hotspot does support prefetch; however, sun.misc.Unsafe in the Java runtime's rt.jar does not declare any prefetch methods, so rt.jar would have to be modified to trigger them, which defeats the point of the attack.

Hotspot also uses the CLFLUSH instruction: a code block containing it is generated during Hotspot's initialization phase, as follows:

Table 2.7

__ bind(flush_line);
__ clflush(Address(addr, 0));        // addr: address to flush
__ addptr(addr, ICache::line_size);
__ decrementl(lines);                // lines: range to flush
__ jcc(Assembler::notZero, flush_line);

This code block is invoked after the C1 compiler finishes compiling:

Table 2.8

// done
masm()->flush();   // invoke ICache flush

which flushes the cache over the region where the freshly generated code is stored:

Table 2.9

void AbstractAssembler::flush() {
  sync();
  ICache::invalidate_range(addr_at(0), offset());
}

The main problem with this path is that the area where the compiled code is stored is allocated unpredictably on the heap, so the address range being flushed cannot be chosen directly; and because every flush requires a compilation, there is no way to generate a large number of CLFLUSH operations within a short period of time.

C) Other compiler implementations

The C2 compiler has some similarities with C1 but is quite different. Because it mainly targets the server side, C2 puts more emphasis on the execution efficiency of the compiled code and performs many more optimizations than C1 during compilation; however, the two use the same assembler when emitting machine code. So the C2 compiler is in roughly the same position as C1: it can in principle emit prefetch instructions, but not in a way that is directly usable by default.

The Shark compiler is based on LLVM and is generally not enabled, so it is not analyzed further here.

2.2 Chrome V8


V8 is Google's open-source JavaScript engine, written in C++, and can run standalone. V8 compiles JavaScript directly to native machine code, with no intermediate code and no interpreter: it converts the JavaScript source into an abstract syntax tree and then walks that tree to generate the corresponding machine code.

The PREFETCH, CLFLUSH and MOVNT* instructions cannot be produced while V8 is generating machine code. However, a PREFETCH instruction can appear during V8's own execution.

The function that emits the prefetch is:

Table 2.10

MemMoveFunction CreateMemMoveFunction() {

Table 2.11

__ prefetch(Operand(src, 0), 1);
__ cmp(count, kSmallCopySize);
__ j(below_equal, &small_size);
__ cmp(count, kMediumCopySize);
__ j(below_equal, &medium_size);
__ cmp(dst, src);
__ j(above, &backward);

This function is mainly used when a buffer can no longer hold what needs to be stored and has to be doubled in size; during that process the prefetch instruction is executed once, which is nowhere near enough to satisfy RowHammer's trigger conditions.

2.3 .NET CoreCLR


CoreCLR is .NET's execution engine, and RyuJIT is .NET's JIT implementation; both are open source. As a competitor to Java, .NET borrows a lot from Java's design, from the bytecode to the compiler implementation, so the two are broadly similar. RyuJIT's instruction-set definition only covers common instructions (Figure 2.3) and does not include any of the instructions RowHammer needs, so they cannot be emitted directly. However, CoreCLR's GC does contain a Prefetch operation (Table 2.12), which is disabled by default (Table 2.13).

Figure 2.3

Table 2.12

void gc_heap::relocate_survivor_helper (BYTE* plug, BYTE* plug_end) {
    BYTE* x = plug; 
    while (x < plug_end) {
        size_t s = size (x);
        BYTE* next_obj = x + Align (s); 
        Prefetch (next_obj); 
        relocate_obj_helper (x, s); 
        assert (s > 0);
        x = next_obj;
    }
}

Table 2.13

//#define PREFETCH
#ifdef PREFETCH
__declspec(naked) void __fastcall Prefetch(void* addr)
{
    __asm {
        PREFETCHT0 [ECX]
        ret
    };
}
#else //PREFETCH
inline void Prefetch(void* addr)
{
    UNREFERENCED_PARAMETER(addr);
}
#endif //PREFETCH

2.4 Firefox SpiderMonkey


SpiderMonkey is the JavaScript engine that Firefox uses by default, and it has a JIT. None of the instructions RowHammer needs appear in SpiderMonkey.

0x03 Summary


The main goal of this research was to trigger RowHammer through a JIT engine. To improve the performance of scripting languages, most script engines include a JIT compiler. This paper examined Hotspot, V8, RyuJIT and SpiderMonkey; of these, Hotspot came the closest, and even it offers no practical way to trigger RowHammer. There are certainly JITs that have not been examined, but the analysis above shows that triggering RowHammer through a JIT is difficult, for several reasons:

1) RowHammer's trigger conditions are harsh. Succeeding within 64ms means the number of unrelated instructions must be very small; otherwise, the wordline cannot be charged and discharged enough times within the 64ms window.

2) Cache-related instructions are rarely used. RowHammer needs CLFLUSH, PREFETCH or MOVNT* instructions, which are uncommon in practice; cache-manipulating operations in user mode are rare.

3) From the JIT developers' point of view, achieving portability means abstracting instructions and then implementing the abstraction on each platform. They abstract as few instructions as possible, because every abstraction requires a large amount of extra code; among the JIT engines analyzed, only Hotspot abstracts the PREFETCH instruction. Since an engine only abstracts the instructions its own compiler needs, it is very difficult to make a script emit a chosen assembly instruction directly. (The exception would be a third-party engine such as AsmJit, which abstracts the full instruction set and would be easier to abuse; but the JIT portions of today's mainstream languages are mostly developed in-house, and third-party engines are usually extracted and gradually refined from such code.)

4) Throughout the analysis, the relevant instructions were found to exist mainly to assist JIT compilation, for example PREFETCH to speed up certain data accesses and CLFLUSH to flush the instruction cache after code generation. Both the frequency and the placement of these instructions fall far short of RowHammer's requirements.

References


  1. Google Project Zero blog: http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

  2. Paper: Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors, http://users.ece.cmu.edu/~yoonguk/papers/kim-isca14.pdf

  3. High-level language virtual machine (HLLVM) group: http://hllvm.group.iteye.com/

  4. The open-source code of each language runtime