CSE/EEE 230 – Assignment 4

 Important: This is an individual assignment. Please do not collaborate. 

Make sure to follow the academic integrity policies. Using work done by someone else will be considered a violation of academic integrity and will result in a report to the Dean’s office. Your work should not match anything found online.

Copying any part of this assignment and providing it to another person, or posting it on the Internet, without the permission of the instructor is a violation of its copyright. http://www.asu.edu/copyright/

Show all the steps in your solution to receive full credit. 

 

Question 1: The following MIPS Code is executed using the single cycle MIPS architecture. Include all iterations of the loop while answering the questions.

 

     Start: addiu $t6, $0, 64
            addi  $t8, $0, 8
            add   $s1, $s0, $t8
     Loop:  slt   $t0, $s0, $s1
            beq   $t0, $0, Exit
            lbu   $t1, 0($s0)
            sub   $t1, $t1, $t6
            sb    $t1, 0($s0)
            addi  $s0, $s0, 1
            j     Loop
     Exit:
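Because every part of this question asks for all iterations of the loop, it can help to check how many instructions actually execute. The following is a minimal Python sketch, not part of the assignment: it assumes `slt` sees `$s0 < $s1` as a plain integer comparison with no overflow, and the starting value passed for `$s0` is a hypothetical placeholder, since the listing does not give one.

```python
def count_dynamic_instructions(s0, span=8, step=1):
    """Count the instructions executed from Start until control reaches Exit."""
    executed = 3            # addiu, addi and add run once before the loop
    s1 = s0 + span          # add $s1, $s0, $t8 with $t8 = 8
    while True:
        executed += 2       # slt and beq run on every pass, including the last
        if not (s0 < s1):   # slt writes 0, so beq branches to Exit
            break
        executed += 5       # lbu, sub, sb, addi, j
        s0 += step          # addi $s0, $s0, 1
    return executed


# 0x1000 is an assumed starting address, used only for illustration.
print(count_dynamic_instructions(0x1000))  # prints 61
```

Under these assumptions the count does not depend on the starting address, only on the distance of 8 set up by `$t8`.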

(a)  Data Path – For the given code, write the Functional Units used in order.  

•               Specify if any Functional Units (FUs) are used at the same time for the same instruction, for example PC+4 Adder is used while Instruction Memory is in use.

•               For Multiplexors, if the output of the MUX is used, then it implies that the MUX is used (i.e., if the select lines are 0 or 1 and not don’t cares, then the MUX is used). Similarly, if any FU output is unused, we assume that the FU is unused.

•               You may group the instructions with the same datapath.

•               You may use the following notations for the Functional Units or the image on the next page.

Instruction Memory (IM), Data Memory (DM), Register File (RF), Arithmetic and Logical Unit (ALU), Program Counter Register (PC), Sign Extension block (SE),  Control Unit (ctrl1), ALU control unit (ctrl2), PC+4 Adder (add1), Branch PC target Adder (add2), Shift Left 2 (sll), Regdst MUX (mux1), ALUSrc MUX (mux2), MemtoReg MUX (mux3), PCSrc MUX (mux4), Jump MUX (mux5).

 

(b)  Control Path – For the given code, write the values of the control signals for each instruction. 

 

•        You may group the instructions with the same control signals.

•        The control signals to be included are: ALUOp, ALU control output (ALU ctrl), Branch, Jump, PCSrc, Regdst, ALUSrc, MemtoReg, RegWrite, MemRead and MemWrite

 

Instr | ALUOp | ALU ctrl | Regdst | ALUSrc | MemtoReg | Branch | Jump | PCSrc | RegWrite | MemRead | MemWrite
      |       |          |        |        |          |        |      |       |          |         |

(c)  Execution Time – For the given code, compute the execution time for each instruction and for the complete code based on the following information. 

 

(i)     Given the following access times for the critical functional units, compute the time taken to execute each instruction. You may group the instructions with the same execution time.

Memory Access = 0.25 ns; 

Register Access = 0.13 ns; 

ALU execution = 0.18 ns

(ii)   Assuming all instructions are executed using a fixed clock cycle length, what is the execution time for the complete code/program? 
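The per-instruction time in a single cycle design is the sum of the access times along the units an instruction uses in sequence, and a fixed clock must be as long as the slowest instruction. The following Python sketch illustrates that arithmetic. The grouping of instructions into classes and the unit sequence listed for each class are assumptions based on the usual textbook datapath, and only the three units given above are treated as contributing delay.

```python
# Access times from the assignment, in picoseconds to avoid float rounding.
MEM_PS, REG_PS, ALU_PS = 250, 130, 180

# Assumed critical path per instruction class (units used one after another).
CRITICAL_PATH = {
    "R-type / addi":  [MEM_PS, REG_PS, ALU_PS, REG_PS],          # IM, RF read, ALU, RF write
    "load (lbu)":     [MEM_PS, REG_PS, ALU_PS, MEM_PS, REG_PS],  # adds DM before RF write
    "store (sb)":     [MEM_PS, REG_PS, ALU_PS, MEM_PS],          # IM, RF read, ALU, DM
    "branch (beq)":   [MEM_PS, REG_PS, ALU_PS],                  # IM, RF read, ALU compare
    "jump (j)":       [MEM_PS],                                  # IM only
}


def instruction_time_ps(kind):
    """Time for one instruction of the given class, in picoseconds."""
    return sum(CRITICAL_PATH[kind])


def clock_period_ps():
    """A fixed clock has to fit the slowest instruction class."""
    return max(instruction_time_ps(kind) for kind in CRITICAL_PATH)


def program_time_ps(instruction_count):
    """Every instruction takes one full clock period in a single cycle design."""
    return instruction_count * clock_period_ps()


for kind in CRITICAL_PATH:
    print(kind, instruction_time_ps(kind) / 1000, "ns")
print("clock period:", clock_period_ps() / 1000, "ns")
```

`program_time_ps` takes the number of executed instructions as input, so the loop iterations have to be counted before calling it.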

Question 2: The following MIPS code is executed using the multicycle architecture. Include all iterations of the loop. Initial values for the registers are: $s0 = 0x230, $s1 = 0x240 and $t1 = 1.

 

     Loop:  slt   $t0, $s0, $s1
            beq   $t0, $0, Exit
            lbu   $t1, 0($s0)
            sub   $t1, $t1, $t6
            sb    $t1, 0($s0)
            add   $s0, $s0, $t1
            j     Loop
     Exit:

(a)  Data Path – For the given code, trace the steps of execution using the image below.  

 

(b)  Control Path – For the given code, write the values of the control signals for each instruction.

 

•        You may group the instructions with the same control signals.

•        The control signals to be included are: ALUOp, ALU control output (ALU ctrl), IorD, ALUSrcA, ALUSrcB, IRWrite, PCWrite, PCWriteCond, PCSource, Regdst, RegWrite, MemRead and MemWrite

 

(c)  Execution Time – For the given code, compute the execution time for the program with a processor clock rate of 4 GHz. 
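In a multicycle design the execution time is the total number of clock cycles multiplied by the clock period, which is 0.25 ns at 4 GHz. The following Python sketch shows that calculation. The cycles-per-class table is an assumption taken from the common textbook multicycle implementation and should be checked against the datapath image used in this course. The example input covers one pass through the loop body only, as an illustration.

```python
# Assumed cycles per instruction class for a textbook multicycle datapath.
CYCLES = {"R-type": 4, "load": 5, "store": 4, "branch": 3, "jump": 3}


def execution_time_ps(counts, clock_hz=4_000_000_000):
    """Total time in picoseconds for a dict of {instruction class: count}."""
    period_ps = 10**12 // clock_hz          # 250 ps per cycle at 4 GHz
    total_cycles = sum(CYCLES[kind] * n for kind, n in counts.items())
    return total_cycles * period_ps


# One pass through the loop body: slt, sub, add (R-type), lbu, sb, beq, j.
one_pass = {"R-type": 3, "load": 1, "store": 1, "branch": 1, "jump": 1}
print(execution_time_ps(one_pass) / 1000, "ns")  # prints 6.75 ns
```

The counts passed in must reflect how many times each instruction really executes, including the final `slt` and `beq` that leave the loop.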

 

Question 3: The following MIPS code is executed using the pipeline architecture. Include all iterations of the loop. For the given code, write the pipeline implementation, resolving all three hazards using stalls or hardware, based on the provided assumptions for each part.

 

     Start: addiu $s1, $0, 0x1234
            addi  $s0, $0, 0x122C
     Loop:  lb    $t1, 0($s0)
            sb    $t1, 2($s0)
            nor   $t2, $t1, $t1
            sb    $t2, 0($s0)
            addi  $s0, $s0, 4
            bne   $s0, $s1, Loop
     Exit:  addi  $s0, $0, 0x122C

(a) Consider the following assumptions (only structural hazards are resolved in hardware; other hazards have to be resolved with stall/nop):

 

•        There is separate instruction and data memory access.

•        Register read and write can happen within the same clock cycle.

•        There is NO Forwarding unit and NO Hazard detection Unit.

•        Use stall if an instruction is delayed after fetch.

•        Use nop if an instruction is delayed before fetch.

(b) Consider the following assumptions (structural and data hazards are resolved using hardware; other hazards have to be resolved with stall/nop):

•        There is separate instruction and data memory access.

•        Register read and write can happen within the same clock cycle.

•        There is a Forwarding unit. Show data forwarding between the correct stages, wherever necessary.

•        There is NO Hazard detection Unit, i.e., no resolution for control hazards in hardware.

•        Use stall if an instruction is delayed after fetch.

•        Use nop if an instruction is delayed before fetch.

(c)  Consider the following assumptions: (all hazards are resolved using hardware)

 

•        There is separate instruction and data memory access.

•        Register read and write can happen within the same clock cycle.

•        There is a Forwarding unit. Show data forwarding between the correct stages, wherever necessary.

•        There is a Hazard detection Unit to detect mispredictions and flush if necessary.

•        The 2-bit branch prediction scheme is used, with the initial prediction being weakly Not Taken.

•        There is a Branch Target Buffer (BTB) containing the target address for the branch instruction.

•        Use stall if an instruction is delayed after fetch.

•        Use nop if an instruction is delayed before fetch.
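The behaviour of a 2-bit predictor that starts in the weakly Not Taken state can be traced with a short Python sketch. This version assumes the saturating-counter form of the scheme (one step toward Taken or Not Taken per outcome); the state diagram used in this course may differ, for example by moving from a weak state directly to the opposite strong state. The outcome sequence in the example is hypothetical and is not the outcome sequence of the loop above.

```python
STATE_NAMES = ["strongly Not Taken", "weakly Not Taken", "weakly Taken", "strongly Taken"]


def run_predictor(outcomes, state=1):
    """Trace a 2-bit saturating counter; state 1 means weakly Not Taken.

    Returns one (predicted_taken, actually_taken, state_after) tuple per branch.
    """
    trace = []
    for taken in outcomes:
        predicted_taken = state >= 2                 # states 2 and 3 predict Taken
        state = min(state + 1, 3) if taken else max(state - 1, 0)
        trace.append((predicted_taken, taken, STATE_NAMES[state]))
    return trace


def mispredictions(outcomes, state=1):
    """Number of branches whose prediction differs from the actual outcome."""
    return sum(1 for predicted, actual, _ in run_predictor(outcomes, state)
               if predicted != actual)


# Hypothetical outcome sequence: taken, taken, not taken.
for step in run_predictor([True, True, False]):
    print(step)
print("mispredictions:", mispredictions([True, True, False]))  # prints 2
```

Each misprediction corresponds to a flush in the pipeline diagram, so the trace indicates where the flushed slots belong.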

Question 4: The following MIPS code is executed using the 2-issue pipeline architecture. Include all iterations of the loop. For the given code, show the steps in unrolling the loop and write the 2-issue VLIW pipeline representing the order of issuing the instructions from the loop unrolled code.

 

     Start: addi  $s0, $s1, 32
     Loop:  lb    $t1, 0($s0)
            sb    $t1, 2($s0)
            nor   $t2, $t1, $t1
            sb    $t2, 0($s0)
            addi  $s0, $s0, -8
            bne   $s0, $s1, Loop
     Exit:
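One way to picture the unrolling step is to generate the unrolled body mechanically. The following Python sketch is an illustration only, not the expected answer format. It assumes the loop runs 4 times (32 divided by the stride of 8), folds the pointer updates into the load and store offsets, and renames the temporaries to `$t1` through `$t8`, which is an assumed choice of free registers. Scheduling the result into the two issue slots is left out.

```python
def unrolled_loop(factor=4, stride=-8):
    """Return the unrolled loop body as a list of MIPS instruction strings."""
    lines = []
    for k in range(factor):
        base = k * stride                              # copy k works at $s0 + k * stride
        ta, tb = f"$t{2 * k + 1}", f"$t{2 * k + 2}"    # renamed temporaries (assumed free)
        lines += [
            f"lb   {ta}, {base}($s0)",
            f"sb   {ta}, {base + 2}($s0)",
            f"nor  {tb}, {ta}, {ta}",
            f"sb   {tb}, {base}($s0)",
        ]
    lines.append(f"addi $s0, $s0, {factor * stride}")  # single combined pointer update
    lines.append("bne  $s0, $s1, Loop")
    return lines


print("\n".join(unrolled_loop()))
```

Renaming the temporaries removes the name dependences between copies, which is what gives a scheduler room to reorder the loads, `nor` operations and stores.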