The DAG State Machine for DAG-based Instructin Selection

We know that for the DAG-based instruction selector in LLVM, we specifies the pattern in a target-specific Tablegen file to be matched during Instruction Selection stage after the DAG of a given basic block is prepared.

For example, given the following instruction description. By the way, this example is from Cpu0 Tutorial.

...
// Generic Cpu0 Format
class Cpu0Inst<dag outs, dag ins, string asmstr, list<dag> pattern,
               InstrItinClass itin, Format f>: Instruction {
  field bits<32> Inst;
  Format Form = f;
  let Namespace = "Cpu0";
  let Size = 4;
  bits<8> Opcode = 0;
  let Inst{31-24} = Opcode;
  let OutOperandList = outs;
  let InOperandList  = ins;
  let AsmString   = asmstr;
  let Pattern     = pattern;
  let Itinerary   = itin;
  bits<4> FormBits = Form.Value;
  let TSFlags{3-0}   = FormBits; // Target-specific flags
  let DecoderNamespace = "Cpu0";
  field bits<32> SoftFail = 0;
}
//===----------------------------------------------------------------------===//
// Format A instruction class in Cpu0 : <|opcode|ra|rb|rc|cx|>
//===----------------------------------------------------------------------===//
class FA<bits<8> op, dag outs, dag ins, string asmstr,
         list<dag> pattern, InstrItinClass itin>:
      Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmA> {
  bits<4>  ra;
  bits<4>  rb;
  bits<4>  rc;
  bits<12> shamt; // shift left/right by amount
  let Opcode = op;
  let Inst{23-20} = ra;
  let Inst{19-16} = rb;
  let Inst{15-12} = rc;
  let Inst{11-0}  = shamt;
}

class ArithLogicR<bits<8> op, string instr_asm, SDNode OpNode,
                  InstrItinClass itin, RegisterClass RC, bit isComm = 0>:
  FA<op, (outs GPROut:$ra), (ins RC:$rb, RC:$rc),
     !strconcat(instr_asm, "\t$ra, $rb, $rc"),
     [(set GPROut:$ra, (OpNode RC:$rb, RC:$rc))], itin> {
  let shamt = 0;
  let isCommutable = isComm;	// e.g. add rb rc =  add rc rb
  let isReMaterializable = 1;
}
def MUL :ArithLogicR<0x17, "mul", mul, IIImul, CPURegs, 1>;
Copy the code

Now, we focus on the pattern part for matching the IR instruction of mul to Cpu0 instruction MUL, with arguments replaced, we get the following pattern.

[(set GPROut:$ra, (mul CPURegs:$rb, CPURegs:$rc))]
Copy the code

The GPROut and CpuRegs are the register class, which are a set of registers. That requires the destination register, $ra, for mul should be one of GPROut, and the source registers, $rb and $rc, should be one of CPURegs.

Moreover in class FA, which MUL is derived from, specifies ra, rb, and rc are bits<4> as well as let them unbound, therefore the operands for LLVM IR mul, $ra, $rb, and $rc will fill into those unbound fields in the instruction MUL.

Thus, that pattern says, when there is a LLVM IR mul instruction with 2 register operands and one result register, chose the MUL Cpu0 instruction for it.

After we setup the instruction td file, the tablegen tool will generate the needed information for Instruction Selection. Here is the quote for the procedure of instruction selection.

To be more specific, the sequence is:

The generic SelectionDAGISel::DoInstructionSelection method calls Select per DAG node.

Select is an abstract method, implemented by the targets. For example X86DAGToDAGISel::Select.

The latter intercepts some nodes for manual matching, but delegates the bulk of the work to X86DAGToDAGISel::SelectCode.

X86DAGToDAGISel::SelectCode is auto-generated by TableGen [4], and contains the matcher table, followed by a call to the generic SelectionDAGISel::SelectCodeCommon, passing it the table.

So what is the matcher table? Essentially, it’s a “program” written in a sort of a “bytecode” specific for instruction selection.

The method void SelectionDAGISel::SelectCodeCommon(SDNode *NodeToMatch, const unsigned char *MatcherTable, unsigned TableSize) is to use the matcher table and do an instruction selectin for a given node. The matcher table contains the actions needed to be taken for matching a particular node. Here is part of matcher table.

static const unsigned char MatcherTable[] = {
/*     0*/ OPC_SwitchOpcode /*19 cases */, 23, TARGET_VAL(ISD::LOAD),// ->27
/*     4*/  OPC_RecordMemRef,
/*     5*/  OPC_RecordNode, // #0 = 'ld' chained node
/*     6*/  OPC_RecordChild1, // #1 = $addr
/*     7*/  OPC_CheckPredicate, 0, // Predicate_unindexedload
/*     9*/  OPC_CheckPredicate, 1, // Predicate_load
/*    11*/  OPC_CheckPredicate, 2, // Predicate_load_a
/*    13*/  OPC_CheckType, MVT::i32,
/*    15*/  OPC_CheckComplexPat, /*CP*/0, /*#*/1, // SelectAddr:$addr #2 #3
/*    18*/  OPC_EmitMergeInputChains1_0,
/*    19*/  OPC_MorphNodeTo1, TARGET_VAL(Cpu0::LD), 0|OPFL_Chain|OPFL_MemRefs,
                MVT::i32, 2/*#Ops*/, 2, 3, 
            // Src: (ld:{ *:[i32] } addr:{ *:[iPTR] }:$addr)<<P:Predicate_unindexedload>><<P:Predicate_load>><<P:Predicate_load_a>> - Complexity = 13
            // Dst: (LD:{ *:[i32] } addr:{ *:[iPTR] }:$addr)
/*    27*/ /*SwitchOpcode*/ 24, TARGET_VAL(ISD::STORE),// ->54
...
Copy the code

At position 0 we have a OPC_SwitchOpcode operation, which is kind of a huge switch table on the node opcode. It’s followed by a list of cases. Each case begins with its size (so that the matcher knows where to go if matching the case fails), and then the opcode.

The selection will start to find the index to start off for a particular SDNode. The way to find the index is quite straightforward, you can see in that table, TARGET_VAL(ISD::LOAD) is for a SDNode with opcode of ISD::LOAD, it will check the opcode of the SDNode to be matched is equal to that TARGET_VAL, if not, go to find the next one by adding the size and the checked index, until opcodes are equal.

Then suppose we have the index for our SDNode, start from the index + 1, those are the actions to take one by one or there is goto-like action, until it finishes or fails. The OPC_XXXX is the action id to tell it what to do.

For our example, the following is the actions for matching LLVM IR mul to Cpu0 MUL.

/*  1176*/ /*SwitchOpcode*/ 12, TARGET_VAL(ISD::MUL),// ->1191
/*  1179*/  OPC_RecordChild0, // #0 = $rb
/*  1180*/  OPC_RecordChild1, // #1 = $rc
/*  1181*/  OPC_CheckPatternPredicate, 4, // (Subtarget->hasChapter4_1())
/*  1183*/  OPC_MorphNodeTo1, TARGET_VAL(Cpu0::MUL), 0,
                MVT::i32, 2/*#Ops*/, 0, 1, 
            // Src: (mul:{ *:[i32] } CPURegs:{ *:[i32] }:$rb, CPURegs:{ *:[i32] }:$rc) - Complexity = 3
            // Dst: (MUL:{ *:[i32] } CPURegs:{ *:[i32] }:$rb, CPURegs:{ *:[i32] }:$rc)
Copy the code

For the comments in the code snappet, we can know it does exactly what we specified in instruction target description file.

The DAG State Machine for DAG-based Instructin Selection

Related Posts

Migrate from Oracle to MySQL

HTML version of the countdown display

(4) Some other operational uses of PyTest