Architecture Review

1. Short answer

1. Using a series (family) of computers as an example, illustrate the relationship among computer architecture, computer composition, and computer implementation.

Computer composition is the logical realization of the computer architecture, and computer implementation is the physical realization of the composition. One architecture can have multiple compositions, and one composition can have multiple implementations. All models in the same series share the same architecture but adopt different composition and implementation technologies, and therefore differ in performance and price.


2. What are the four most frequently used quantitative principles in the structural design and analysis of computer systems?

(1) Make the common case fast: favor the frequent case in design decisions, since optimizing it yields the largest overall gain. (2) Amdahl's Law: the overall speedup obtained by speeding up a certain part of the system is limited by the fraction of time that part occupies. (3) The CPU performance formula: CPU time = IC × CPI × clock cycle time. (4) The principle of locality: the addresses a program accesses during execution are not distributed randomly but tend to cluster.


3. According to Amdahl's law, which two factors determine the system speedup?

The system speedup depends on two factors: (1) the enhanced fraction: the proportion of the original execution time that the improvable part accounts for; (2) the enhanced speedup: the factor by which the improved part itself is sped up.
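The two factors combine into a single formula; a minimal sketch in Python (the function name and parameter names are illustrative, not from the source):

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup per Amdahl's law.

    fraction_enhanced: share of the original execution time the enhancement applies to.
    speedup_enhanced:  speedup of that part considered in isolation.
    """
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# The un-enhanced fraction bounds the overall speedup no matter how fast the
# enhanced part becomes: even an infinite speedup of 20% of the time caps the
# overall speedup near 1.25.
print(amdahl_speedup(0.2, 1e9))   # ~1.25
print(amdahl_speedup(0.9, 10))    # ~5.26
```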


4. Briefly describe the advantages of dynamic instruction scheduling.

(1) It can handle some dependences that are unknown at compile time (such as those involving memory accesses) and simplifies the compiler; (2) it allows code compiled and optimized for one pipeline to run efficiently on other pipelines. Of course, these advantages of dynamic scheduling come at the cost of a significant increase in hardware complexity.


5. The Tomasulo algorithm adopts distributed reservation stations. What are its characteristics?

(1) Hazard detection and instruction execution control are distributed: the information held in each functional unit's reservation stations determines when an instruction can begin executing on that unit. (2) Results are passed over the CDB directly from the reservation station where they are produced to all functional units that need them, without going through the registers.


6. What is the purpose of dynamic branch prediction?

Predict whether the branch will be taken and obtain the branch target address (or target instruction) as early as possible, so as to avoid pipeline stalls caused by control dependences.


7. What are the two key problems to be solved by using dynamic branch prediction technology?

(1) How to record the branch history; (2) how to predict the branch outcome (and even fetch the target instruction) from that information.
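As an illustration of recording history and predicting from it, here is a sketch of the classic 2-bit saturating-counter scheme (a common textbook predictor; the class and variable names are illustrative):

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch PC.

    States 0-1 predict not-taken, states 2-3 predict taken; each outcome
    nudges the counter one step, so a single anomaly does not flip a
    well-established prediction.
    """
    def __init__(self):
        self.counters = {}  # branch PC -> counter state (default: weakly not-taken)

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

# A loop-like branch: mostly taken, with one not-taken exit in the middle.
bp = TwoBitPredictor()
outcomes = [True, True, False, True, True]
hits = 0
for taken in outcomes:
    hits += (bp.predict(0x400) == taken)
    bp.update(0x400, taken)
```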


8. Each entry in the BTB holds at least which two fields?

The address of a previously executed, taken branch instruction, and its predicted branch target address.


9. Briefly describe the basic idea of speculative execution.

Predict the outcome of a branch instruction, assume the prediction is correct, and continue fetching, issuing, and executing subsequent instructions along the predicted path. However, instead of writing an instruction's result back to a register or memory, the result is placed in a buffer called the reorder buffer (ROB); only when the instruction is "confirmed" (that is, it should indeed have been executed) is the result written to the register or memory.
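A minimal sketch of the reorder-buffer idea in Python (the class and its methods are illustrative, not a real processor model): results may complete out of order, but they reach the register file only at in-order commit.

```python
from collections import OrderedDict

class ReorderBuffer:
    """Results are buffered in issue order and written back only at in-order commit."""
    def __init__(self):
        self.entries = OrderedDict()  # tag -> {'done', 'dest', 'value'}

    def issue(self, tag, dest):
        self.entries[tag] = {'done': False, 'dest': dest, 'value': None}

    def complete(self, tag, value):      # may happen out of program order
        self.entries[tag].update(done=True, value=value)

    def commit(self, regfile):           # retire from the head while it is done
        while self.entries:
            tag, e = next(iter(self.entries.items()))
            if not e['done']:
                break                    # head not confirmed yet: nothing retires
            regfile[e['dest']] = e['value']
            self.entries.pop(tag)

regs = {}
rob = ReorderBuffer()
rob.issue(1, 'F0'); rob.issue(2, 'F8')
rob.complete(2, 3.5)       # the younger instruction finishes first...
rob.commit(regs)           # ...but nothing retires until entry 1 is done
rob.complete(1, 7.0)
rob.commit(regs)           # now both retire, in program order
```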


10. What three ideas does hardware-based speculative execution combine?

(1) Dynamic branch prediction, used to choose which instructions to execute next. (2) Speculation: executing subsequent instructions before the outcome of the control dependence is known. (3) Dynamic scheduling, to schedule combinations of basic blocks across basic-block boundaries.


11. What are the types of pipeline hazards?

Pipeline hazards fall into three types: (1) Structural hazards: hardware resources cannot satisfy the demands of overlapped instruction execution. (2) Data hazards: an instruction needs the result of a previous instruction that has not yet been produced while instructions overlap in the pipeline. (3) Control hazards: caused by branch instructions and other instructions that change the value of the PC.


12. What are the four questions that must be answered first in the design of a memory hierarchy, and what does each mean?

(1) Block placement: where can a block be placed in the memory hierarchy? (2) Block replacement: which block should be replaced on a miss? (3) Block identification: how is a block found in the hierarchy? (4) Write strategy: what happens on a write?
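For block identification in a direct-mapped cache, the address splits into tag, index, and byte offset. A sketch with assumed parameters (16-byte blocks, 64 lines; the function name is illustrative):

```python
BLOCK_SIZE = 16      # bytes per block (assumed)
NUM_LINES  = 64      # lines in a direct-mapped cache (assumed)

def split_address(addr):
    """Block identification: the offset selects the byte within the block,
    the index selects the cache line, and the tag disambiguates which of
    the many blocks mapping to that line is actually resident."""
    offset = addr % BLOCK_SIZE
    index  = (addr // BLOCK_SIZE) % NUM_LINES
    tag    = addr // (BLOCK_SIZE * NUM_LINES)
    return tag, index, offset

tag, index, offset = split_address(0x1234)
```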


13. What are the methods for resolving structural hazards in a pipeline?

(1) Pipeline the functional units; (2) duplicate resources; (3) stall the pipeline.


14. What are the three causes of Cache misses?

(1) Compulsory misses: the first access to a block always misses because the block is not yet in the Cache and must be fetched from memory; also called first-reference misses. (2) Capacity misses: the Cache cannot hold all the blocks a program needs, so blocks are replaced and later re-referenced, causing misses; when severe this is known as thrashing. (3) Conflict misses: multiple blocks map to the same set (or line) and evict one another even though other sets or lines have free space; a side effect of set-associative and direct-mapped placement.


15. What are some ways to reduce the Cache miss penalty?

(1) Give read misses priority over writes. (2) Write-buffer merging. (3) Critical word first and early restart. (4) Non-blocking (lockup-free) Caches. (5) Use a second-level Cache.


16. What are the benefits of a small and simple Cache?

(1) It effectively improves Cache access speed: the simpler the hardware, the faster it is, and a small Cache allows fast tag checking, which reduces hit time. (2) A Cache small enough to fit on the same chip as the processor avoids the time overhead of off-chip access. (3) A direct-mapped Cache can be used to keep the Cache structure simple; its main advantage is that tag checking can overlap with data transfer, which effectively reduces hit time.


17. A set-associative Cache has a lower miss rate than a direct-mapped Cache of the same capacity. Can we conclude from this that using set associativity definitely improves performance? Why?

Not necessarily. The higher hit rate of set associativity comes at the cost of increased hit time, since set associativity requires additional multiplexers (selector switches).


18. Explain the basic idea of the victim cache.

A small, fully associative Cache is placed between the Cache and its refill path from the next lower level of the hierarchy, holding replaced (evicted) blocks in case they are needed again. The evicted blocks are called victim blocks, and the buffer holding them is the victim cache. It is very effective at reducing conflict misses, especially for small direct-mapped Caches.


19. Explain how a pseudo-associative Cache works.

The direct-mapped Cache space is logically divided into two halves. An access first treats the pseudo-associative Cache as a direct-mapped Cache; on a hit, the access proceeds exactly as in a direct-mapped Cache. On a miss, the corresponding location in the other half is checked; if the block is found there, a pseudo-hit has occurred, otherwise the next level of memory must be accessed.
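A sketch of this lookup order, assuming the common variant that finds the partner location by inverting the most significant index bit (the function names and scheme details are illustrative):

```python
NUM_LINES = 8  # assumed; must be a power of two

def candidate_lines(index):
    """Primary location first; on a miss, probe the 'partner' line in the
    other half, obtained here by flipping the index's most significant bit."""
    return index, index ^ (NUM_LINES // 2)

def lookup(cache, tag, index):
    primary, alternate = candidate_lines(index)
    if cache.get(primary) == tag:
        return 'hit'            # fast hit, same as a direct-mapped Cache
    if cache.get(alternate) == tag:
        return 'pseudo-hit'     # slower hit in the partner line
    return 'miss'               # go to the next level of memory

cache = {3: 0xA, 7: 0xB}        # line -> resident tag (toy contents)
```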


20. At the Cache/main-memory level, what are the two main memory-update policies, and what are the characteristics of each?

(1) Write-through: easy to implement, and the data in the next lower level is always up to date. (2) Write-back: fast, since "writes" proceed at Cache speed; moreover, multiple writes to the same block require only one write-back to the next level, and some writes reach only the Cache and never main memory, so less memory bandwidth is used.
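The write-back behavior can be sketched with a dirty bit (an illustrative toy model, not a real cache):

```python
class WriteBackLine:
    """One cache line under a write-back policy: writes hit the cache only
    and set a dirty bit; memory is updated once, when the line is evicted."""
    def __init__(self):
        self.data = None
        self.dirty = False

    def write(self, value):
        self.data = value
        self.dirty = True          # the "write" runs at cache speed

    def evict(self, memory, addr):
        if self.dirty:             # many writes, a single write-back
            memory[addr] = self.data
        self.dirty = False

memory = {}
line = WriteBackLine()
for v in (1, 2, 3):
    line.write(v)                  # three writes, no memory traffic yet
line.evict(memory, 0x100)          # one write-back carries the final value
```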


22. What are the two biggest challenges affecting multiprocessor parallelism?

Limited parallelism and long remote-communication latency are the two biggest challenges for multiprocessors. Limited parallelism is overcome with better parallel algorithms. Long remote-access latency can be reduced by the architecture or by the programmer: caching shared data in hardware, or restructuring data in software, increases the proportion of local accesses and thus reduces the frequency of remote access; prefetching or multithreading can also hide part of the latency.


23. Briefly describe the design principles of the RISC instruction set architecture.

(1) Select the most frequently used instructions, supplemented by a few of the most useful ones; (2) each instruction's function should be as simple as possible and complete within one machine cycle; (3) all instructions have the same length; (4) only Load and Store instructions access memory; all other instructions operate between registers; (5) support high-level languages in a simple and effective way.


24. Briefly describe the main objectives of CISC instruction-set functional design. From the viewpoint of current computer technology, what are the disadvantages of the CISC instruction set architecture?

The main goal is to enhance instruction functionality, assigning more and more functions to hardware, with the number of instructions growing ever larger. Disadvantages: (1) the usage frequencies of the various instructions in a CISC instruction set differ greatly; (2) the complexity of CISC instructions makes the whole architecture complex, which increases development time and cost and easily leads to design errors; (3) the complexity of the CISC instruction set places a heavy burden on VLSI design and is unfavorable to single-chip integration; (4) many complex instructions require very complicated operations, so execution is slow; (5) because the functions of individual instructions are unbalanced, CISC hinders advanced architectural techniques (such as pipelining) from improving system performance.


25. Using the CPU performance formula, describe the performance characteristics of RISC and CISC instruction-set computers.

CPU performance formula: CPU time = IC × CPI × T, where IC is the number of instructions executed by the object program, CPI is the average number of clock cycles per instruction, and T is the clock cycle time. For object programs of the same functionality, IC(CISC) < IC(RISC), but CPI(CISC) > CPI(RISC) and T(CISC) > T(RISC), so the execution time of the CISC object program is longer than that of the RISC one.


26. Briefly describe the idea of forwarding (bypassing).

The idea is that an instruction does not really need a result until the moment it would use it, so a stall can be avoided by sending the result directly from where it is produced to where other instructions need it, rather than waiting for it to be written back.


27. Briefly describe the advantages and disadvantages of distributed shared-memory multiprocessors.

The advantages of a distributed memory architecture are: (1) if most accesses go to the local memory of the node, the bandwidth demands on the memory and the interconnection network are reduced; (2) access latency to local memory is low. Main disadvantage: inter-processor communication is complex, and the latency of accesses between processors is high.


28. Describe the characteristics of a pseudo-associative Cache.

A pseudo-associative Cache achieves the low miss rate of a multi-way set-associative Cache while maintaining the hit speed of a direct-mapped Cache. On a hit, the access proceeds exactly as in a direct-mapped Cache; on a miss, another location in the Cache is checked for a match before the next level of memory is accessed.


29. Describe the two types of name dependences.

Antidependence: instruction i executes first, and the name that instruction j writes is a name that instruction i reads; the execution order between antidependent instructions must be preserved. Antidependence corresponds to a write-after-read (WAR) hazard. Output dependence: instructions j and i write the same name; the order of output-dependent instructions must not be reversed. Output dependence corresponds to a write-after-write (WAW) hazard.


30. What is instruction level parallelism (ILP)? How do processors take advantage of instruction-level parallelism to improve their performance?

Instruction-level parallelism means that many instructions in a sequential program are independent of one another, so the correct result can be produced without executing them strictly in program order. This property allows the processor to execute instructions in parallel rather than sequentially, reducing the time the processor takes to run the program.


31. You are not satisfied with the speed of your Cache, and you have a limited budget. To maximize the economic benefit, one person advises buying more Cache chips of the same speed to expand capacity; another suggests simply buying faster Cache chips to replace the existing slow ones. Which suggestion is advisable? How do you decide? Why?

Both the speed and the capacity of the Cache affect its equivalent (average) access speed. If you are not satisfied with the equivalent access speed and need to improve it, a detailed analysis is needed to see whether the equivalent access speed is already close to the speed of the Cache chips themselves. If it is far from that speed, the hit ratio is low, and you should improve the hit ratio, for example by adjusting the associativity, block size, replacement algorithm, or Cache capacity. If the equivalent access speed is already close to the chip speed and still does not meet the requirement, faster Cache chips should be substituted.


Word problems

1. Suppose a first improvement applies to 20% of the execution time and makes that part twice as fast, while a second improvement applies to 70% of the time and makes that part 1.3 times as fast. Let the old execution time be Told. Which improvement reduces the execution time more?

Solution: apply Amdahl's law. For the first improvement: new execution time = Told × ((1 − 0.2) + 0.2/2) = 0.9 Told. For the second improvement: new execution time = Told × ((1 − 0.7) + 0.7/1.3) ≈ 0.84 Told. Therefore, although the second improvement speeds up its part less (1.3× versus 2×), it applies to a larger fraction of the execution time and reduces the total execution time more.


2. Suppose a computer spends 90% of its time processing a particular type of computation while running a program. Modifications by a manufacturer have increased the execution speed of this calculation type by a factor of 10. Q:

(1) If the original execution time of the program is 100 seconds, what is the execution time after the modification? (2) What is the speedup of the new system over the old one?

Answer: (1) Enhanced fraction = 0.9, enhanced speedup = 10. By Amdahl's law: new execution time = old execution time × ((1 − enhanced fraction) + enhanced fraction / enhanced speedup) = 100 × ((1 − 0.9) + 0.9/10) = 19 seconds. (2) Speedup of the new system over the old = original execution time / new execution time = 100/19 ≈ 5.3.


3. In the instruction set of a certain machine, only Load/Store instructions access memory; all other instructions operate only between registers. The instruction frequencies and CPIs on this machine are as follows:

Operation   Frequency   CPI
ALU         43%         1
Loads       21%         2
Stores      12%         2
Branch      24%         2

Optimizing compilation is now used to improve performance. Suppose the compiler can eliminate 50% of the ALU instructions but cannot reduce the Load, Store, or Branch instructions. Ignoring system factors and assuming a clock cycle of 20 ns (50 MHz): what is the MIPS rating with and without optimizing compilation? Is the change in MIPS consistent with the change in execution time?
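The source does not give the answer; here is a sketch of the arithmetic, assuming MIPS = clock rate (MHz) / CPI and that halving the ALU instructions leaves the other instruction counts unchanged:

```python
# Instruction mix and CPIs from the problem's table.
freq = {'ALU': 0.43, 'Load': 0.21, 'Store': 0.12, 'Branch': 0.24}
cpi  = {'ALU': 1,    'Load': 2,    'Store': 2,    'Branch': 2}
clock_mhz = 50

cpi_old  = sum(freq[op] * cpi[op] for op in freq)            # 1.57
mips_old = clock_mhz / cpi_old                               # ~31.8 MIPS

# Halving the ALU instructions shrinks IC to 78.5% of the original;
# the surviving mix is heavier in 2-cycle instructions, so CPI rises.
count = dict(freq)
count['ALU'] *= 0.5
ic_new  = sum(count.values())                                # 0.785 of old IC
cpi_new = sum(count[op] * cpi[op] for op in count) / ic_new  # ~1.73
mips_new = clock_mhz / cpi_new                               # ~29.0 MIPS

# Execution time = IC x CPI x T; the ratio new/old is independent of T.
time_ratio = (ic_new * cpi_new) / cpi_old                    # ~0.86
```

Under these assumptions the optimized program runs in about 86% of the original time, yet its MIPS rating drops from about 31.8 to about 29.0: MIPS and execution time move in opposite directions here, so they are not consistent.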


4. Consider the following code:

           L.D      F6, 34(R2)
           L.D      F2, 45(R3)
           MULT.D   F0, F2, F4
           SUB.D    F8, F6, F2
           DIV.D    F10, F0, F6
           ADD.D    F6, F8, F2

Assume that L.D completes in 1 clock cycle and that the CDB takes 1 clock cycle to deliver a result to its destination. For dynamic scheduling based on the Tomasulo algorithm, fill in the state tables of the reservation stations and registers after 12 clock cycles: