This article is published by the NetEase Cloud Community with the authorization of its author, Yao Taihang.

Continuing from the previous article

The case was introduced in the previous article, so the relevant parts are not repeated here. Some background makes this article easier to follow, so skimming the previous article first is recommended:

Code Online Compiler (Part 1) – Editing and Compilation

Security checks

Security checks in an online compiler determine whether user code can run safely without compromising the runtime environment. The common and special scenarios described in the previous article illustrate the difference:

  • Common scenario: user code depends only on native libraries. When the runtime environment is a sandbox, sandboxes are isolated from one another, so damage caused by user code affects only a single sandbox and not other sandboxes or the underlying system.

  • Special scenario: user code depends on APIs provided by the platform, so the runtime environment cannot be an isolated sandbox. Improper operations in user code may cause exceptions across the entire runtime environment, making other users' code fail or crashing the server.

In special scenarios, therefore, the job of security checking is to determine whether user code meets the security requirements.

A sound security check process cannot start directly with code analysis. First, the stages of the user code life cycle must be made clear. Only with a full understanding of what is being checked and what the checks aim to achieve can a reasonable process be designed:

  • Who wrote the code: the users who write the code and their profile

  • How the code is written: content analysis of the code's language, structure, content, and so on

  • How the code works: usage analysis of how the code is run, what it does, and what effects it has

  • How to test: design of a proper check process

User profile – “Who wrote the code”

The analysis of the user profile and background is often neglected in a purely technical implementation.

In the beginning, the NetEase Precious Metals quantitative platform put most of its effort into security check schemes based on the user code itself and ignored the user's perspective. Because the schemes were built without a profile of the users or a correct assessment of their abilities, problems such as “unexpected input” and “incomplete coverage” kept appearing, and user code always exceeded expectations. After reflection, we changed the entry point: starting from the user profile and possible user behavior, we redesigned the check process, which greatly increased check coverage and made the checks easier to implement.

The user profile clarifies the state and knowledge level of the people who write the code. The aspects that need to be clarified mainly include:

  • User programming background: whether the user has experience with the relevant programming language, and how familiar the user is with it. In the worst case, the user has never touched the language and simply modifies the demo code, which produces all kinds of errors.

  • User business knowledge: whether the user has the business knowledge related to the code, and how well the user has mastered it. In special scenarios, user code is written against a fairly specific business background; if the user does not understand that background well, the code may not follow reasonable logic.

  • Effect of operation guidance: whether the user follows the guidance given by the platform. If the user does not fully read and understand instructions such as the help documentation, the user code can be structurally and logically broken.

  • User intent: the user's reason for using the platform is unknown, and malicious users may try to use the online compiler to damage the platform.

On all of these points there should be no illusions about the user's situation, and the assumptions made about the user profile should consider the worst case. In short:

User code is never trusted.

Case description

The NetEase Precious Metals quantitative platform (introduced in the previous article) provides an online Java compiler, and user code is a logical description of a quantitative investment strategy. Some worst-case assumptions about the user profile:

  1. Users come from different backgrounds and some may never have used Java, so syntactically valid code cannot be expected

  2. Skill levels are uneven and any kind of code is possible, so code that follows conventions cannot be expected

  3. Even if the help documentation is well written, users may not read it completely, so reasonable operations in the code cannot be expected

  4. Users may not follow the required conventions or the flow given in the demo, so code that follows the predetermined flow cannot be expected

  5. Users may not know where a quantitative strategy starts, where it ends, or what it needs to include, so logically sound code cannot be expected

  6. Some users may be malicious and aim to do damage, so safe code cannot be expected

Once the possible causes of security problems in user code are identified, designing a solution for each cause greatly improves the effectiveness of the security check process.

Code Content Analysis – “How code is written”

Code content analysis examines the content and structure of the code so that, before any user code is received, we can anticipate what it may contain and plan how to handle it. The focus covers:

  • Code structure: taking Java as an example, whether a class is the main body, whether the required methods are included, and whether there is an explicit method structure

  • Business logic: whether the code itself has an explicit business logic structure

  • References: whether the code must refer to other classes, or to methods in those classes

  • Data interaction: what data is exchanged between user code and the platform, as input and as output

Once these concerns are clear, and given that the platform fixes the code structure at the stage when user code is constructed, the general shape of user code can be defined before any user code is received.

Case description

The NetEase Precious Metals quantitative platform has a fixed code structure: user code must implement the strategy class template interface (introduced in the previous article). The business logic is very clear, and the behavior abstraction of the strategy code mainly includes:

Abstracting the behavior of the strategy code identifies all the business operations the code contains, the class references that correspond to each operation, and the points where data interaction occurs. In the figure above, the behavior is abstracted into a tree, the behavior points are grouped by category, and the points that generate data interaction are marked as input (I) or output (O), so that the code check process can be built on this basis.

Code Usage Analysis – “How the Code works”

User code is ultimately loaded into the runtime environment and invoked. The details of how the platform invokes user code also affect the planning of security checks. Usage analysis can consider the following:

  • Compilation process: details of the compilation process

    • Whether the language needs to be compiled

    • Whether compilation produces diagnostics, and what the diagnostics cover

    • How the compilation result is stored (file, database, or memory)

  • Call process: details of how the platform calls the code

    • Which class in user code is called

    • Which methods in user code are called

    • Call mode (one-time call, loop trigger, timed trigger, event trigger, etc.)

    • Data interaction during the call (based on the content analysis, trace the initial source and final destination of the data: files, databases, memory, etc.)

  • End of run: what remains of the user code afterwards

    • Whether the platform uses the code for other functions after the call

    • Whether the compiled class is unloaded by the class loader after the call

    • Whether the user code remains in the system after the call

Usage analysis maps out how user code interacts with the platform over its whole life cycle. Interactions at different levels may each threaten system security, so the security check process should apply a different check strategy to each level of interaction.

Case description

The usage analysis for the NetEase Precious Metals quantitative platform (the case introduced above) can be summarized as follows:

  1. Compilation process: the language is Java, so the code must be compiled; compilation produces diagnostics; the final result is bytecode kept in memory without being written to disk

  2. Call process:

    1. The user code is an implementation of the strategy class, which is loaded into the server's class loader when it is called.

    2. The platform calls the init method of the strategy class to initialize the strategy object, calls the handle method repeatedly to perform the corresponding actions in each market dispatch period, and finally calls the onExit method to finish the strategy call process.

    3. For data interaction, the user strategy class reads market information from memory and generates trading information (opening and closing positions) according to the strategy logic.

  3. End of run: after the user code has been called, the user strategy class is no longer used, the class loader unloads the class, and the strategy instance is collected by the GC.
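The call sequence above can be sketched roughly as follows. Only the method names `init`, `handle`, and `onExit` come from the article; the `Strategy` interface, the `StrategyRunner` driver, the `double price` parameter, and the choice to call `onExit` in a `finally` block are illustrative assumptions, not the platform's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical strategy template; only the names init/handle/onExit come from the article.
interface Strategy {
    void init();
    void handle(double price); // assumed signature: called once per market dispatch period
    void onExit();
}

// Hypothetical driver showing the order in which a platform could call the strategy.
final class StrategyRunner {
    static void run(Strategy strategy, List<Double> prices) {
        strategy.init();
        try {
            for (double price : prices) {
                strategy.handle(price);
            }
        } finally {
            // Assumption: onExit runs even if handle throws, so cleanup always happens.
            strategy.onExit();
        }
    }
}

// Toy strategy that records every call, used only to make the call order visible.
final class RecordingStrategy implements Strategy {
    final List<String> calls = new ArrayList<>();
    @Override public void init() { calls.add("init"); }
    @Override public void handle(double price) { calls.add("handle:" + price); }
    @Override public void onExit() { calls.add("onExit"); }
}
```

Running `StrategyRunner.run(new RecordingStrategy(), List.of(1.0, 2.5))` records one `init`, one `handle` per price, and a final `onExit`.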

Security Testing Process – “How to Test”

With the analysis above, the life cycle and usage scenarios of user code are described more or less completely, and the security check process can be built for the specific scenario.

Layout

Do not expect to complete the security check in a single pass. Even if one pass could achieve the goal, the cost would outweigh the benefit.

Security checks should be distributed throughout the code life cycle.

A check that runs only once must, in theory, sit late enough in the code life cycle to see everything. If it then finds a security problem and the platform refuses to run the code, all the work done before that point is wasted. If the checks are distributed across the phases of the life cycle, different security problems can be handled at different stages and found as early as possible, which reduces unnecessary operations and resource consumption.
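The idea of distributing checks and stopping at the first failure can be illustrated with a small sketch. The `StagedChecks` class and the stage names used below are hypothetical and deliberately simplistic; real stages would be the pre-compile, compile-time, and post-compile checks described later in the article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical fail-fast pipeline: stages run in insertion order and the first failure stops the run,
// so later (more expensive) stages are never executed for code that is already rejected.
final class StagedChecks {
    private final Map<String, Predicate<String>> stages = new LinkedHashMap<>();

    StagedChecks add(String name, Predicate<String> check) {
        stages.put(name, check);
        return this;
    }

    // Returns the name of the first failing stage, or empty if every stage passes.
    Optional<String> firstFailure(String source) {
        for (Map.Entry<String, Predicate<String>> stage : stages.entrySet()) {
            if (!stage.getValue().test(source)) {
                return Optional.of(stage.getKey());
            }
        }
        return Optional.empty();
    }
}
```

For example, `new StagedChecks().add("not-empty", src -> !src.trim().isEmpty()).add("no-imports", src -> !src.contains("import "))` rejects an empty file at the first stage without ever running the second. The substring test for imports is a toy stand-in for a real lexical check.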

Check content and purpose

In a nutshell, the purpose of the checks is to ensure that user code operates only within the permitted limits.

For this fundamental purpose, the checks may cover:

  • Whether user code uses classes outside the permitted scope

  • Whether user code uses methods outside the permitted scope

  • Whether user code exhibits behavior outside the permitted scope

  • Whether user code contains bad behavior that is hard to detect (heavy memory usage, long thread occupancy)

These points are fairly abstract; the relevant techniques and details are introduced below through specific cases.

Case description

As our understanding of the online Java compiler deepened, the security check process of the NetEase Precious Metals quantitative platform was improved continuously.

The simplest version

Users can freely import JDK packages and the packages of the platform's supplementary API. After the source code is generated, the diagnostic information from JavaCompiler alone is used to determine whether a strategy class can run in the current project environment.
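As a rough illustration of using compiler diagnostics, the sketch below compiles a source string with the JDK's standard `javax.tools` API and collects the error diagnostics, keeping the class output in memory. The class names `DiagnosticCompiler`, `StringSource`, and `MemoryClass` are hypothetical, and this is not necessarily how the platform wires up its compiler.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

final class DiagnosticCompiler {
    // Source code held in a String instead of a file on disk.
    private static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    // Class output kept in memory so that no .class file is written.
    private static final class MemoryClass extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        MemoryClass(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    // Compiles one class and returns its error messages; an empty list means no errors.
    static List<String> compile(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null when no compiler is available
        if (compiler == null) throw new IllegalStateException("no system Java compiler (a JDK is required)");
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        try (StandardJavaFileManager standard = compiler.getStandardFileManager(diagnostics, null, null)) {
            JavaFileManager manager = new ForwardingJavaFileManager<StandardJavaFileManager>(standard) {
                @Override
                public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String name,
                                                           JavaFileObject.Kind kind, FileObject sibling) {
                    return new MemoryClass(name);
                }
            };
            compiler.getTask(null, manager, diagnostics, null, null,
                    List.of(new StringSource(className, code))).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add("line " + d.getLineNumber() + ": " + d.getMessage(Locale.ENGLISH));
            }
        }
        return errors;
    }
}
```

A clean diagnostics list only shows that the code compiles. It says nothing about which classes the code uses, which is exactly the gap described next.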

Problems faced:

  • Package imports are not restricted, so users can use every JDK class

  • Even with import restrictions, java.lang and the package of the current class are imported by default, so the classes in them cannot be restricted through import checks

Problems caused:

  • Users can use system-related classes such as System to affect the system state

  • Users can send network requests to the outside world

  • Users can create their own threads

  • ...

Updated detection process

Analyzing the content and operation of user code leads to this conclusion: if users should only use the services and behaviors the platform provides, all other behavior must be restricted. Behavior is implemented by methods, and methods are executed by classes and objects, so “the essence of restricting behavior is restricting the use of classes”.

With this understanding at the core, and following the idea of distributing checks across the stages of the code life cycle, the quantitative platform planned the check process as follows:

The checks are distributed before, during, and after compilation:

  • Before compilation – pre-compile checks:

    • Specify the package path for all compiled classes

    • Check whether the file is empty

    • Check whether the file contains its own imports

    • Check whether the file contains a class that implements the strategy template interface

    • Check whether the class is unique and whether any inner class exists

    • Simple lexical analysis of code:

    • Code location restrictions:

  • During compilation – compile diagnostics:

    • Collect the diagnostic information produced while compiling the class source code and determine whether any diagnostics exist

  • After compilation – result check:

    • Parse the compiled .class bytecode to find all classes involved and compare them against the whitelist of safe classes

    • Class list security check:

The checks performed before and during compilation were described in detail in the previous article, “Code Online Compiler (Part 1) – Editing and Compilation”. This article focuses on the class list security check.

Class list security check

The goal is to find all the classes involved in the user strategy and compare them with the class whitelist. If any class is not in the whitelist, the user code is considered unsafe: the platform refuses to run it and gives feedback to the user.
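The comparison itself is a set difference. A minimal sketch, assuming the used classes have already been collected as fully qualified names; the `ClassWhitelist` class and the whitelist entries in the usage note are hypothetical:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical whitelist holder; entries are fully qualified class names.
final class ClassWhitelist {
    private final Set<String> allowed;

    ClassWhitelist(Set<String> allowed) {
        this.allowed = Set.copyOf(allowed);
    }

    // Returns the used classes that are not whitelisted, sorted so feedback to the user is stable.
    Set<String> violations(Set<String> usedClasses) {
        Set<String> result = new TreeSet<>(usedClasses);
        result.removeAll(allowed);
        return result;
    }

    // User code is considered safe only when every used class is whitelisted.
    boolean isSafe(Set<String> usedClasses) {
        return violations(usedClasses).isEmpty();
    }
}
```

With a whitelist of `java.lang.String` and `java.lang.Math`, code that uses `java.lang.String` and `java.lang.System` yields the single violation `java.lang.System`, which can be reported back to the user.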

Class whitelist

The classes in the class whitelist come from three sources:

  • The classes involved in the API provided by the platform

  • JDK classes that must be used (classes in java.lang must also be listed individually; not all classes in java.lang are allowed)

  • Classes in third-party toolkits

Getting the classes involved

There are two ways to get all the classes involved in user code:

  • Source code analysis: using the abstract syntax tree from compiler theory

  • Bytecode analysis: analyzing the bytecode directly with a Java bytecode framework

The bytecode in the .class file is the final form of the code after compilation and reflects every class actually used. This is the approach used in the actual implementation.

The bytecode framework used is ASM.

Using ASM

Introduction

ASM is a Java bytecode manipulation framework. It can be used to generate classes dynamically or to enhance the functionality of existing classes. ASM can either generate binary class files directly or change the behavior of classes dynamically before they are loaded into the Java virtual machine. Java classes are stored in strictly formatted .class files that carry enough metadata to parse every element of the class: class names, methods, attributes, and Java bytecode (instructions). After reading this information from a class file, ASM can change class behavior, analyze class information, and even generate new classes on demand. The most common use of ASM is in static AOP implementations (such as CGLIB).

This article uses ASM to analyze class information and extract all the classes used in the class bytecode.

Class file structure

Before analyzing a class file, let's take a quick look at the class file structure:

The parts of the file have the following meanings:

  • Magic: the magic number of the Java class file, which is its first four bytes. Every valid Java class file starts with 0xCAFEBABE, making it easy for the Java virtual machine to distinguish Java class files from other files.

  • Version: the version information of the Java class file.

  • Constant Pool: holds literal strings, class names, method and interface names, final variables, and references to external classes. The virtual machine must maintain a constant pool for each loaded class, storing symbolic references to all the types, fields, and methods the class uses.

  • Access_flag: specifies whether the file defines a class or an interface (a class file can contain only one class or interface). It also specifies the access flags of the class or interface, such as public, private, and abstract.

  • This Class: a pointer to a string constant holding the fully qualified name of the class.

  • Super Class: a pointer to a string constant holding the fully qualified name of the parent class.

  • Interfaces: an array of pointers to the string constants that name all the interfaces implemented by the class or its parent.

  • Fields: describes the fields declared in the class or interface. Only the fields of the class or interface itself are listed; fields inherited from the superclass or parent interface are not included.

  • Methods: describes in detail the methods declared in the class or interface, including the method name, parameters, and return type. Only the methods of the class or interface itself are listed, not methods inherited from the superclass or parent interface.

  • Class Attributes: holds basic information about the attributes defined by the class or interface in the file.

Of these parts, Fields and Methods involve class names, and ASM is used to walk through them.
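To make the layout concrete, the sketch below reads the start of a class file with plain JDK classes: it checks the magic number, skips the version, and walks the constant pool to list the names referenced by CONSTANT_Class entries. It is not the ASM-based approach used in this article, and the class name `ClassFileHeader` is hypothetical. Note that types appearing only in field or method descriptors are stored as plain Utf8 strings, so this sketch does not find them; that is one reason the article also walks Fields and Methods.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

final class ClassFileHeader {
    // Returns the names referenced by CONSTANT_Class entries, in internal form (e.g. "java/lang/System").
    static Set<String> classNames(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file: bad magic number");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort(); // constant_pool_count; valid indexes are 1..count-1
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                      // Utf8
                    case 7: classNameIndex[i] = in.readUnsignedShort(); break;  // Class: index of its Utf8 name
                    case 8: case 16: case 19: case 20:
                        in.skipBytes(2); break;  // String, MethodType, Module, Package
                    case 15:
                        in.skipBytes(3); break;  // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); break;  // Integer, Float, member refs, NameAndType, Dynamic, InvokeDynamic
                    case 5: case 6:
                        in.skipBytes(8); i++; break; // Long and Double occupy two pool slots
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            // Resolve names after the loop because a Class entry may point to a later Utf8 entry.
            Set<String> names = new TreeSet<>();
            for (int index : classNameIndex) {
                if (index != 0) names.add(utf8[index]);
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```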

Class file contents

A simple piece of code containing fields and methods illustrates how classes are represented in bytecode.

Source code

    public class IniBean {
        private static Logger logger = LoggerFactory.getLogger("inibean");
        private static AtomicBoolean initFlag = new AtomicBoolean(true);

        @PostConstruct
        public void init() {
            logger.info("[IniBean] init method invoke.");
            if (initFlag.getAndSet(false)) {
                refreshIniInfo();
            }
        }
        ...

Compiled bytecode

  public void init();
    Code:
       0: getstatic     #2 // Field logger:Lorg/slf4j/Logger;
       3: ldc           #3 // String [IniBean] init method invoke.
       5: invokeinterface #4, 2 // InterfaceMethod org/slf4j/Logger.info:(Ljava/lang/String;)V
      10: getstatic     #5 // Field initFlag:Ljava/util/concurrent/atomic/AtomicBoolean;
      13: iconst_0
      14: invokevirtual #6 // Method java/util/concurrent/atomic/AtomicBoolean.getAndSet:(Z)Z
      17: ifeq          24
      20: aload_0
      21: invokevirtual #7 // Method refreshIniInfo:()V
      24: return

In the compiled bytecode, class names are written as file paths: the separator is “/” rather than “.”. The name obtained through class.getClassName(), by contrast, is the package path separated by “.”, so names must be converted to one form before they are compared. Walking through the class file bytecode with ASM yields the list of classes involved.
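The conversion can be sketched with plain string handling. The helper class `ClassNames` is hypothetical and is not the article's `ASMConstant.pickClassName`, whose implementation is not shown; treating arrays by their element type and skipping primitives are assumptions made for this sketch.

```java
final class ClassNames {
    // Converts an internal name such as "java/lang/String" to "java.lang.String".
    static String fromInternalName(String internalName) {
        return internalName.replace('/', '.');
    }

    // Converts a field descriptor such as "[Lorg/slf4j/Logger;" to the element class name
    // "org.slf4j.Logger". Returns null for primitive descriptors such as "Z" or "[I",
    // which name no class and therefore need no whitelist check.
    static String fromDescriptor(String descriptor) {
        int start = 0;
        while (start < descriptor.length() && descriptor.charAt(start) == '[') {
            start++; // skip array dimensions
        }
        if (start < descriptor.length() && descriptor.charAt(start) == 'L' && descriptor.endsWith(";")) {
            return fromInternalName(descriptor.substring(start + 1, descriptor.length() - 1));
        }
        return null;
    }
}
```

Applied to the bytecode listing above, `fromDescriptor("Lorg/slf4j/Logger;")` gives `org.slf4j.Logger`, which can be compared directly with a whitelist of dotted names.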

ASM process
Key Class Description

A brief description of some of the key classes used:

  • org.objectweb.asm.ClassVisitor

    /**
    * A visitor to visit a Java class. The methods of this class must be called in
    * the following order: <tt>visit</tt> [ <tt>visitSource</tt> ] [
    * <tt>visitOuterClass</tt> ] ( <tt>visitAnnotation</tt> |
    * <tt>visitTypeAnnotation</tt> | <tt>visitAttribute</tt> )* (
    * <tt>visitInnerClass</tt> | <tt>visitField</tt> | <tt>visitMethod</tt> )*
    * <tt>visitEnd</tt>.
    *

    The class visitor, used to visit the nodes of a class.

  • org.objectweb.asm.ClassReader

    /**
    * A Java class parser to make a {@link ClassVisitor} visit an existing class.
    * This class parses a byte array conforming to the Java class file format and
    * calls the appropriate visit methods of a given class visitor for each field,
    * method and bytecode instruction encountered.
    * 
    ...

    Reads class bytecode and provides access to its internal elements.

  • org.objectweb.asm.tree.ClassNode

    /**
    * A node that represents a class.
    *

    Represents a class and mirrors the class file structure; it extends org.objectweb.asm.ClassVisitor.

  • org.objectweb.asm.ClassWriter

    /**
    * A {@link ClassVisitor} that generates classes in bytecode form. More
    * precisely this visitor generates a byte array conforming to the Java class
    * file format. It can be used alone, to generate a Java class "from scratch",
    * or with one or more {@link ClassReader ClassReader} and adapter class visitor
    * to generate a modified class from one or more existing Java classes.
    *

    The class writer. Here it backs the adapter visitor that scans the class instruction by instruction.

Process overview

  1. Read the class bytecode

    ClassReader classReader = new ClassReader(classByte);

  2. Create the ClassNode object classNode and the ClassWriter object cw (the latter is needed for working with AbstractInsnNode)

    this.classNode = new ClassNode();
    classReader.accept(classNode, ClassReader.SKIP_DEBUG);
    cw = new ClassWriter(ClassWriter.COMPUTE_MAXS);
    classAdapter = new ASMClassAdapter(cw);
    classReader.accept(classAdapter, ClassReader.SKIP_DEBUG);

  3. Traverse the fields

    private Set<String> getClassInFields() {
        Set<String> result = new HashSet<>();
        List<FieldNode> fieldList = this.classNode.fields;
        for (FieldNode fieldNode : fieldList) {
            // 1.0 Get the class name from the field descriptor
            String classNameStr = Type.getType(fieldNode.desc).getClassName();
            // 2.0 Add the actual class name to the class list
            result.add(ASMConstant.pickClassName(classNameStr));
        }
        return result;
    }

  4. Traverse all method declarations

    private Set<String> getClassInMethods() {
        Set<String> result = new HashSet<>();
        List<MethodNode> methodList = this.classNode.methods;
        for (MethodNode methodNode : methodList) {
            // 1.1 If the method is a constructor, record the current class type
            if (ASMConstant.METHOD_TYPE_INIT.equals(methodNode.name)) {
                result.add(ASMConstant.pickClassName(this.classNode.name.replaceAll("/", "\\.")));
            }
            // 1.2 Extract the parameter types
            for (Type argumentType : Type.getArgumentTypes(methodNode.desc)) {
                result.add(ASMConstant.pickClassName(argumentType.getClassName()));
            }
            // 1.3 Extract the local variable types. The list is only populated when debug
            // information is read (that is, without ClassReader.SKIP_DEBUG) and is null for
            // abstract methods, hence the null guard.
            List<LocalVariableNode> lvNodeList = methodNode.localVariables;
            if (lvNodeList != null) {
                for (LocalVariableNode lvn : lvNodeList) {
                    result.add(ASMConstant.pickClassName(Type.getType(lvn.desc).getClassName()));
                }
            }
        }
        return result;
    }

  5. Traverse the method bodies instruction by instruction (the previous step only covered the method declarations)

    public void visitInsn(int opcode) {
        Iterator<AbstractInsnNode> itr = this.instructions.iterator(0);
        while (itr.hasNext()) {
            AbstractInsnNode insn = itr.next();
            switch (insn.getType()) {
                case AbstractInsnNode.FIELD_INSN:
                    // 1.0 Descriptor of the field being accessed
                    String fieldInsnDesc = ((FieldInsnNode) insn).desc;
                    // 1.1 Get the class name
                    String classNameStr = ASMConstant.pickClassName(Type.getType(fieldInsnDesc).getClassName());
                    this.classSet.add(classNameStr);
                    break;
                case AbstractInsnNode.METHOD_INSN:
                    // 2.0 Owner class of the method being called
                    String methodInsnOwner = ((MethodInsnNode) insn).owner;
                    // 2.1 Convert the name from "/" form to "." form
                    this.classSet.add(ASMConstant.pickClassName(methodInsnOwner.replaceAll("/", "\\.")));
                    break;
                case AbstractInsnNode.TYPE_INSN:
                    // 3.0 Type referenced inside the method
                    String typeInsnDesc = ((TypeInsnNode) insn).desc;
                    this.classSet.add(ASMConstant.pickClassName(typeInsnDesc.replaceAll("/", "\\.")));
                    break;
                ...

Merging the results of the steps above gives the list of all classes involved in the bytecode of a class. Comparing that list with the explicitly constructed whitelist verifies whether the classes used are all within the permitted scope.

Tips

Building experience in engineering practice takes more than becoming proficient with a process. We also need to abstract models, organize our thinking, and distill theory from practice, so that what we gain moves from technique toward principle.

