While learning about dynamic taint analysis, I became interested in how it is actually implemented. Phosphor (*Phosphor: Dynamic Taint Tracking for the JVM*) implements it by analyzing individual bytecode instructions. That implementation is too complex and the analysis is too slow. By contrast, the structure of the Chinese open-source project DongTai IAST is very clear, and it is also built on JVM-Sandbox, which makes it much easier to study.

I have read the DongTai IAST source code several times and understand how the overall taint-analysis framework is put together. A few questions remain, though, and I'm still learning, so I would appreciate advice from anyone more experienced:

  1. How is taint propagated for primitive data types? The `IastTaintPool` class handles `Object`, `Map`, `String[]`, and the wrapper classes of primitive types. For primitive types such as `int`, `long`, and `boolean`, is propagation simply ineffective because no hashcode can be computed, so such values can never be detected as taint?

  2. In the `solveSource` method of `SourceImpl`, the return value is added to the taint pool when three conditions hold: it is not a `Boolean`, the method name is `getAttribute`, and it is not `null`. My question is why the return value is treated as taint. By my understanding of taint analysis, the method's parameters should be the taint and should be added to the taint pool, and the method's return value is not necessarily tainted.

  3. What does `solveator()` in the `PropagatorImpl` class do when it matches `O`, `P`, `R`, `&`, and `I`, and why is it designed this way? Why is the call-stack depth limited to 11, and doesn't that lose call-stack information? I used ASM instrumentation to capture the call stack at a sink point. The key information came first, which should be the calls from user code. The calls made by the framework came later. Is that later information useful?

  4. What do the parameters in `model.xml` mean for `IastHookRuleModel`?

  5. In the `IastHookRuleModel` class, the Source points read from the config file are never used. The variable is not assigned at instantiation because there is no `IastSourceModel`. So why does the config file include Sources at all? At first I didn't understand why there was a source configuration file when the instrumentation (hook) file also contains some server and framework entry points. Is it because the source points in `model.xml` have no effect at all?

  6. Another question: in what sense is the taint graph generated by the `GraphBuilder` class a graph? Are the nodes linked through `invokeID`? What do `sourceHash` and `targetHash` mean? Is `sourceHash` the hashcode of the current method's caller, and `targetHash` the hashcode of the method's output value?
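To make the primitive-type question concrete, here is a minimal sketch of an identity-keyed taint pool. It is not DongTai's `IastTaintPool`, and the class and method names are hypothetical. A primitive has no object identity of its own. Passing an `int` to the pool autoboxes it, so the value checked later is usually a different `Integer` object, while small values share one cached box.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class IdentityTaintPoolDemo {
    // Hypothetical pool that remembers tainted objects by reference identity
    // (an illustration of the idea, not DongTai's implementation).
    static final Set<Object> POOL = Collections.newSetFromMap(new IdentityHashMap<>());

    static void taint(Object o) { if (o != null) POOL.add(o); }

    static boolean isTainted(Object o) { return o != null && POOL.contains(o); }

    public static void main(String[] args) {
        String s = new String("user-input");
        taint(s);
        System.out.println("string, same reference: " + isTainted(s)); // true

        int big = Integer.parseInt("1000");
        taint(big);   // autoboxed via Integer.valueOf(1000): outside the default cache, a new object
        System.out.println("int 1000, re-boxed: " + isTainted(big));   // false with the default cache

        int small = Integer.parseInt("5");
        taint(small); // autoboxed to the shared cached Integer for 5
        System.out.println("unrelated 5: " + isTainted(Integer.valueOf(5))); // true: same cached box
    }
}
```

With the default `Integer` cache (-128 to 127), the last line shows the opposite problem. Tainting a small value marks every boxed occurrence of that value in the JVM.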
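On the source question: in most taint-tracking designs, a source is a method whose *return value* brings external data into the program. For `HttpServletRequest.getParameter(name)`, the argument `name` is a constant chosen by the developer, whereas the returned string comes from the client. The sketch below uses a hypothetical `FakeRequest` instead of the real servlet API. Its `Boolean` check only mirrors the condition described in the question.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

public class SourceHookDemo {
    static final Set<Object> TAINT_POOL = Collections.newSetFromMap(new IdentityHashMap<>());

    // Hypothetical stand-in for a servlet request; not a real API.
    static final class FakeRequest {
        private final Map<String, String> params;
        FakeRequest(Map<String, String> params) { this.params = params; }
        String getParameter(String name) { return params.get(name); }
    }

    // Called after a hooked source method returns: only the return value is tainted,
    // because the argument is developer-chosen while the value comes from the client.
    static void onSourceExit(Object returnValue) {
        if (returnValue == null || returnValue instanceof Boolean) return;
        TAINT_POOL.add(returnValue);
    }

    // Simulates the instrumented call site of a source method.
    static String hookedGetParameter(FakeRequest req, String name) {
        String value = req.getParameter(name);
        onSourceExit(value);
        return value;
    }

    public static void main(String[] args) {
        FakeRequest req = new FakeRequest(Map.of("id", new String("1 OR 1=1")));
        String key = "id";
        String id = hookedGetParameter(req, key);
        System.out.println(TAINT_POOL.contains(id));  // true: the returned value is tainted
        System.out.println(TAINT_POOL.contains(key)); // false: the argument is not
    }
}
```

One possible reason for skipping `Boolean` results is that `Boolean.TRUE` and `Boolean.FALSE` are shared singletons. Adding them to an identity-based pool would taint every boolean result in the JVM. This is a guess and is not confirmed from the DongTai source.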
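For the `O`/`P`/`R` matching, here is a minimal sketch that assumes a particular meaning for the rule syntax:

- `O` is the object the method is invoked on.
- `P1,2` are the 1-based parameters 1 and 2.
- `R` is the return value.
- `|` means any listed position may be tainted, and `&` means all of them must be.

These semantics are assumptions for illustration and should be checked against the rules shipped with DongTai. The pool and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class PropagatorRuleDemo {
    static final Set<Object> POOL = Collections.newSetFromMap(new IdentityHashMap<>());

    // Resolve one token (assumed syntax): "O" = receiver, "R" = return value,
    // "P1,2" = parameters 1 and 2 (1-based).
    static Object[] resolve(String token, Object receiver, Object[] args, Object ret) {
        if (token.equals("O")) return new Object[] { receiver };
        if (token.equals("R")) return new Object[] { ret };
        if (token.startsWith("P")) {
            return Arrays.stream(token.substring(1).split(","))
                    .map(i -> args[Integer.parseInt(i.trim()) - 1])
                    .toArray();
        }
        throw new IllegalArgumentException("unknown token: " + token);
    }

    static boolean anyTainted(Object[] objs) {
        for (Object o : objs) if (o != null && POOL.contains(o)) return true;
        return false;
    }

    // "A|B": match if any position is tainted; "A&B": only if every position is.
    static boolean sourceMatches(String spec, Object receiver, Object[] args, Object ret) {
        if (spec.contains("&")) {
            for (String t : spec.split("&")) {
                if (!anyTainted(resolve(t, receiver, args, ret))) return false;
            }
            return true;
        }
        for (String t : spec.split("\\|")) {
            if (anyTainted(resolve(t, receiver, args, ret))) return true;
        }
        return false;
    }

    // Apply a rule such as source="O|P1", target="R" after a hooked call returns.
    static void propagate(String source, String target, Object receiver, Object[] args, Object ret) {
        if (!sourceMatches(source, receiver, args, ret)) return;
        for (Object o : resolve(target, receiver, args, ret)) if (o != null) POOL.add(o);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("select * from t where id=");
        String input = new String("1 OR 1=1");
        POOL.add(input);                                        // tainted by a source

        sb.append(input);                                       // hooked: StringBuilder.append(String)
        propagate("P1", "O", sb, new Object[] { input }, sb);
        String sql = sb.toString();                             // hooked: StringBuilder.toString()
        propagate("O", "R", sb, new Object[0], sql);

        System.out.println(POOL.contains(sql)); // true
    }
}
```

Mixed `&`/`|` expressions are not handled in this sketch.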
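For the depth-11 question, plain `Throwable.getStackTrace()` shows the effect of a fixed limit. It returns the innermost frame, the one closest to the sink, first. This is a hypothetical sketch, not DongTai's code, and a real agent would also skip its own frames before counting.

```java
import java.util.Arrays;

public class StackDepthDemo {
    static final int MAX_DEPTH = 11; // the depth value mentioned in the question

    // Capture the current stack (innermost frame first), keeping at most maxDepth frames.
    static StackTraceElement[] captureStack(int maxDepth) {
        StackTraceElement[] full = new Throwable().getStackTrace();
        return Arrays.copyOf(full, Math.min(maxDepth, full.length));
    }

    // Simulate a deep call chain ending at a "sink" that records the stack.
    static StackTraceElement[] sinkAfter(int nesting) {
        if (nesting == 0) return captureStack(MAX_DEPTH);
        return sinkAfter(nesting - 1);
    }

    public static void main(String[] args) {
        StackTraceElement[] kept = sinkAfter(20);
        for (StackTraceElement e : kept) System.out.println(e);
        // main() is not among the kept frames: it is more than 11 frames away from the sink.
    }
}
```

Truncation therefore drops the outermost frames, which are typically framework and server entry code, and keeps the part of the chain nearest the sink.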
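One way to read `sourceHash` and `targetHash` is as hashes of the tainted objects flowing into and out of each hooked call. Under that reading, a graph follows by linking an event to any later event whose source hashes include one of its target hashes, with `invokeId` giving the order. The sketch below assumes exactly that reading. The `Event` record and the hash values are hypothetical, and `record` requires Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TaintGraphDemo {
    // One hooked call: hashes of the tainted inputs it consumed and of the outputs it produced.
    // Field names are modeled on the question, not taken from DongTai's classes.
    record Event(int invokeId, String method, Set<Integer> sourceHash, Set<Integer> targetHash) {}

    // Edge a -> b when an object produced by event a is consumed by a later event b.
    static List<int[]> buildEdges(List<Event> events) {
        List<int[]> edges = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            for (int j = i + 1; j < events.size(); j++) {
                Event a = events.get(i), b = events.get(j);
                for (Integer h : a.targetHash()) {
                    if (b.sourceHash().contains(h)) {
                        edges.add(new int[] { a.invokeId(), b.invokeId() });
                        break; // one edge per pair is enough
                    }
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(1, "getParameter", Set.of(), Set.of(101)),            // source
                new Event(2, "StringBuilder.append", Set.of(101), Set.of(202)),
                new Event(3, "StringBuilder.toString", Set.of(202), Set.of(303)),
                new Event(4, "Statement.executeQuery", Set.of(303), Set.of())); // sink
        for (int[] e : buildEdges(events)) System.out.println(e[0] + " -> " + e[1]);
        // prints: 1 -> 2, 2 -> 3, 3 -> 4 (one per line)
    }
}
```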