EXAM 2 QUESTION 8 AND EXPLANATION

 

Suppose the major functional units in the single-cycle processor implementation have the following latencies:

Memory:           3 ns
ALU and adders:   2 ns
Register file:    1 ns

The delays of multiplexors, control units, PC access, sign-extension units, and wires are considered negligible.

 

Consider a program with the following mix of executed instructions:

            R-type     50%
            LW         20%
            SW         10%
            Branch     15%
            Jump        5%

Assume no other instructions are used in the program or supported in the implementation.

 

(1)  A straightforward multi-cycle implementation that performs every operation in a single cycle, such as the one given in the textbook, may yield a negative speedup (a speedup below 1, i.e., a slowdown) for this program.

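Explain: The cycle time of the single-cycle implementation is determined by the longest instruction, LW in this case: instruction fetch (3 ns) + register read (1 ns) + ALU (2 ns) + memory read (3 ns) + register write (1 ns) = 10 ns. That is the latency of every instruction. The cycle time of the multi-cycle implementation is determined by the longest operation, a memory access, so it is 3 ns. With the textbook's 5, 4, 4, 3, and 3 cycles for LW, SW, R-type, Branch, and Jump, the latencies are LW 15 ns, SW 12 ns, R-type 12 ns, and Branch and Jump 9 ns each. The average instruction latency is 0.5 × 12 + 0.2 × 15 + 0.1 × 12 + 0.15 × 9 + 0.05 × 9 = 12 ns, which is longer than the 10 ns of the single-cycle implementation, so the speedup is 10 / 12 ≈ 0.83.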
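The arithmetic behind this argument can be checked with a short Python sketch. It assumes the textbook's multi-cycle step counts (LW 5, SW 4, R-type 4, Branch 3, Jump 3 cycles); the variable names are illustrative only.

```python
# Sketch of the part (1) arithmetic. Unit latencies and the instruction mix
# come from the problem statement; the per-instruction step counts are the
# textbook multi-cycle design's (assumption), one clock cycle per step.
MEM, ALU, REG = 3, 2, 1  # ns

MIX = {"R-type": 0.50, "LW": 0.20, "SW": 0.10, "Branch": 0.15, "Jump": 0.05}
STEPS = {"R-type": 4, "LW": 5, "SW": 4, "Branch": 3, "Jump": 3}

# Single-cycle clock = slowest instruction, LW:
# fetch + register read + ALU + data memory + register write.
single_cycle_ns = MEM + REG + ALU + MEM + REG

# Multi-cycle clock = slowest single step (a memory access).
multi_clock_ns = max(MEM, ALU, REG)
multi_latency = {op: n * multi_clock_ns for op, n in STEPS.items()}

# Weighted average latency over the instruction mix.
multi_avg_ns = sum(MIX[op] * multi_latency[op] for op in MIX)
speedup = single_cycle_ns / multi_avg_ns

print(f"single-cycle: {single_cycle_ns} ns per instruction")
print(f"multi-cycle latencies (ns): {multi_latency}")
print(f"multi-cycle average: {multi_avg_ns:.2f} ns, speedup {speedup:.3f}")
```

A speedup below 1 here is exactly the "negative speedup" the question refers to.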

 

(2)  Discuss how you would change the multi-cycle implementation to obtain a positive speedup. Determine the cycle time and the number of cycles required to execute each type of instruction in your design.

 

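Explain: We can use a shorter cycle time of 1 ns for the multi-cycle implementation, so that a memory access takes 3 cycles, an ALU operation 2 cycles, and a register file access 1 cycle. Each step of the textbook design then takes as many cycles as its slowest unit needs: instruction fetch 3 cycles, decode/register fetch 2 cycles (the ALU computes the branch target in parallel with the register read), and so on. This gives R-type 8 cycles, LW 11, SW 10, Branch 7, and Jump 6.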
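A small Python sketch of how per-instruction cycle counts follow from a 1 ns clock. The step names (`fetch`, `decode`, `alu`, `mem`, `write_back`, `jump`) are hypothetical labels for the textbook's multi-cycle steps. It assumes, as in the textbook, that the decode step computes the branch target on the ALU in parallel with the register read, and that the jump step still occupies one cycle even though its delay is negligible.

```python
import math

CLOCK_NS = 1
MEM, ALU, REG = 3, 2, 1  # ns, from the problem statement

# Slowest unit used in each step (units within one step work in parallel).
# Step names are hypothetical labels for the textbook multi-cycle steps.
STEP_LATENCY_NS = {
    "fetch": MEM,             # instruction memory read (PC + 4 on the ALU in parallel)
    "decode": max(REG, ALU),  # register read; branch target on the ALU in parallel
    "alu": ALU,               # address calc, R-type execute, or branch compare
    "mem": MEM,               # data memory read (LW) or write (SW)
    "write_back": REG,        # register file write
    "jump": 0,                # PC update; delay treated as negligible
}

INSTR_STEPS = {
    "R-type": ["fetch", "decode", "alu", "write_back"],
    "LW": ["fetch", "decode", "alu", "mem", "write_back"],
    "SW": ["fetch", "decode", "alu", "mem"],
    "Branch": ["fetch", "decode", "alu"],
    "Jump": ["fetch", "decode", "jump"],
}


def step_cycles(step: str) -> int:
    """Cycles a step needs at CLOCK_NS; every step occupies at least one cycle."""
    return max(1, math.ceil(STEP_LATENCY_NS[step] / CLOCK_NS))


cycles = {op: sum(step_cycles(s) for s in steps) for op, steps in INSTR_STEPS.items()}
print(cycles)
```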

 

(3)  Calculate the speedup of your design over the single-cycle implementation.

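Explain: First calculate the latency of each type of instruction, then average the latencies using the instruction-mix percentages. With a 1 ns cycle the latencies are R-type 8 ns, LW 11 ns, SW 10 ns, Branch 7 ns, and Jump 6 ns, so the average is 0.5 × 8 + 0.2 × 11 + 0.1 × 10 + 0.15 × 7 + 0.05 × 6 = 8.55 ns. The speedup over the 10 ns single-cycle implementation is 10 / 8.55 ≈ 1.17.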
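The speedup calculation as a minimal Python sketch. The cycle counts are taken from the state breakdown in part (4) under a 1 ns clock, and `SINGLE_CYCLE_NS` is the 10 ns LW path from part (1); the constant names are illustrative.

```python
# Instruction mix from the problem statement.
MIX = {"R-type": 0.50, "LW": 0.20, "SW": 0.10, "Branch": 0.15, "Jump": 0.05}

# Cycle counts at a 1 ns clock, read off the state breakdown in part (4).
CYCLES = {"R-type": 8, "LW": 11, "SW": 10, "Branch": 7, "Jump": 6}
CLOCK_NS = 1
SINGLE_CYCLE_NS = 10  # LW path: 3 + 1 + 2 + 3 + 1

# The mix should account for every executed instruction.
assert abs(sum(MIX.values()) - 1.0) < 1e-9

avg_ns = sum(MIX[op] * CYCLES[op] * CLOCK_NS for op in MIX)
speedup = SINGLE_CYCLE_NS / avg_ns
print(f"average latency {avg_ns:.2f} ns, speedup {speedup:.2f}")
```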

 

(4)  Suppose the processor control is implemented by a finite state machine. How many states are needed? Explain. Assume only the five instructions are supported.

 

Explain: Start from the 10-state FSM in the textbook. With a 1 ns cycle, any old state whose operation takes longer than one cycle must be split into one new state per cycle. For example, state 0 performs instruction fetch, which accesses memory, so it is replaced by three states, say IF1, IF2, and IF3. The new states assert the same control signals with the same values as the old one, so the memory-access signals stay valid for the three cycles the access needs. The exception is the enables that update the PC or the IR (PCWrite, PCWriteCond, IRWrite): they should be asserted only in the last of the new states, after the slower operation has produced a valid result; otherwise, for example, the PC would change while the instruction is still being read. In the same way, state 1 of the 10-state FSM becomes 2 states in the new FSM. Continuing through the remaining states gives 20 states in total:

    Old state   New states    Comments
    0           0, 1, 2       Memory read (instruction fetch)
    1           3, 4          Register read + ALU (branch target)
    2           5, 6          ALU op (memory address)
    3           7, 8, 9       Memory read (LW)
    4           10            Register file write (LW)
    5           11, 12, 13    Memory write (SW)
    6           14, 15        ALU op (R-type)
    7           16            Register file write (R-type)
    8           17, 18        ALU op (branch compare)
    9           19            Change PC (jump)
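The state count can be reproduced mechanically: each old state gets one new state per 1 ns cycle its slowest unit needs, with a minimum of one. The Python sketch below does this; the per-state latency assignments are an assumption read off the comments in the table, and `expand` is a hypothetical helper.

```python
import math

CLOCK_NS = 1
MEM, ALU, REG = 3, 2, 1  # ns, from the problem statement

# Slowest unit used by each state of the textbook 10-state FSM (assumption).
OLD_STATE_LATENCY_NS = {
    0: MEM,            # instruction fetch
    1: max(REG, ALU),  # register read + branch target on the ALU
    2: ALU,            # memory address computation
    3: MEM,            # memory read (LW)
    4: REG,            # register write (LW)
    5: MEM,            # memory write (SW)
    6: ALU,            # R-type execution
    7: REG,            # register write (R-type)
    8: ALU,            # branch compare
    9: 0,              # jump: PC update, negligible delay
}


def expand(latencies, clock_ns):
    """Map each old state to consecutive new state numbers (at least one each)."""
    mapping, next_id = {}, 0
    for old, lat in sorted(latencies.items()):
        n = max(1, math.ceil(lat / clock_ns))
        mapping[old] = list(range(next_id, next_id + n))
        next_id += n
    return mapping


new_states = expand(OLD_STATE_LATENCY_NS, CLOCK_NS)
total = sum(len(v) for v in new_states.values())
for old, new in new_states.items():
    print(old, new)
print("total states:", total)
```

The new states are numbered consecutively in old-state order, which is the same numbering the table uses.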