Pipeline Control and Performance (Chapter 6) ELEC 5200-001/6200-001 Computer Architecture and Design

ELEC 5200-001/6200-001 Computer Architecture and Design, Spring 2016
Pipeline Control and Performance (Chapter 6)
Vishwani D. Agrawal, James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
http://www.eng.auburn.edu/~vagrawal
vagrawal@eng.auburn.edu
Spr 2016, Mar 9 . . . ELEC 5200-001/6200-001 Lecture 7

Pipelined Datapath (without Jump)
[Figure: five-stage pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. Instruction bits 26-31 are the opcode, bits 21-25 and 16-20 address the register file, bits 0-15 go to the sign extender, and the write register is chosen from bits 16-20 (I-type lw) or bits 11-15 (R-type).]

Mem. and Reg. File Need Controls
[Figure: the same datapath with RegWrite added at the register file and MemWrite and MemRead added at the data memory.]

Multiplexers Need Controls
[Figure: the same datapath with the signals PCSrc, Branch, ALUSrc, MemtoReg and RegDst added.]

ALU Needs a Control
[Figure: the same datapath with an ALU control block that takes ALUOp and the funct field (bits 0-5).]

Compare with Single-Cycle Control
- Control signals are the same as those needed for a single-cycle datapath.
- Control signals are generated from the opcode in the ID (instruction decode) cycle and then distributed to the later cycles.
- Let us reexamine the implementation of the single-cycle control (slides 19-21 of Lecture 5).

Hardwired CU: Single-Cycle
Implemented by combinational logic.
[Figure: the 6-bit opcode feeds the control logic, which produces the datapath control signals and the 2-bit ALUOp. ALUOp and the 6-bit funct code feed the ALU control, which sends a 3-bit code to the ALU.]

[Figure: single-cycle datapath. The CONTROL block takes the opcode (bits 26-31) and generates Jump, Branch, MemtoReg, RegWrite, ALUSrc, MemWrite, MemRead, RegDst and ALUOp; instruction bits 0-25 are shifted left 2 for the jump address.]

Single-Cycle Control Logic
Inputs: opcode (instruction bits 31-26). Outputs: control signals. X = don't care.

Instr.   Opcode    RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
type     (31-26)
R        000000    1       0       0         1         0        0         0       1       0       0
lw       100011    0       1       1         1         1        0         0       0       0       0
sw       101011    X       1       X         0         0        1         0       0       0       0
beq      000100    X       0       X         0         0        0         1       0       1       0
J        000010    X       X       X         0         X        0         X       X       X       1

Single-Cycle Control Circuit
[Figure: inputs Op5-Op0 are decoded into the lines R, lw, sw, beq and J, which are combined to form RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0 and Jump.]

ALU Control Logic
Inputs: ALUOp from the control unit and the funct code from IR (bits 0-5). Output: 3-bit code to the ALU.

Instr.   ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  3-bit code  Operation
type
lw, sw   0       0       X   X   X   X   X   X   010         Add
B        0       1       X   X   X   X   X   X   110         Subtract
R        1       X       X   X   0   0   0   0   010         Add
R        1       X       X   X   0   0   1   0   110         Subtract
R        1       X       X   X   0   1   0   0   000         AND
R        1       X       X   X   0   1   0   1   001         OR
R        1       X       X   X   1   0   1   0   111         slt

ALU Control Operation
[Figure: ALUOp1 and ALUOp0 from the control circuit and funct bits F3-F0 enter the ALU control, whose 3-bit output selects the ALU function: 000 AND, 001 OR, 010 Add, 110 Subtract, 111 Set on less than. The ALU outputs are zero, result and overflow.]
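
The two truth tables above can be read as plain lookup tables. The following is a minimal Python sketch under that reading, not the course's implementation: the names SIGNALS, MAIN_CONTROL, main_control and alu_control are made up for illustration, and None stands for a don't-care (X) entry.

```python
# Output columns in the order used by the Single-Cycle Control Logic table.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0", "Jump")

X = None  # don't-care entry (illustrative convention, not from the slides)

# Main control: 6-bit opcode -> control signal values, copied from the table.
MAIN_CONTROL = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 1, 0, 0),  # R-type
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0, 0),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0, 0, 0),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0, 1, 0),  # beq
    0b000010: (X, X, X, 0, X, 0, X, X, X, 1),  # j
}

# ALU control for R-type: funct bits F3..F0 -> 3-bit ALU code.
R_TYPE_ALU = {
    0b0000: 0b010,  # add
    0b0010: 0b110,  # subtract
    0b0100: 0b000,  # AND
    0b0101: 0b001,  # OR
    0b1010: 0b111,  # slt
}


def main_control(opcode):
    """Return a dict of control signals for a 6-bit opcode."""
    return dict(zip(SIGNALS, MAIN_CONTROL[opcode]))


def alu_control(alu_op1, alu_op0, funct):
    """Return the 3-bit ALU code from ALUOp and the 6-bit funct field."""
    if alu_op1 == 1:               # R-type: ALUOp0 is a don't-care
        return R_TYPE_ALU[funct & 0b1111]
    if alu_op0 == 1:               # branch: subtract to compare
        return 0b110
    return 0b010                   # lw/sw: add to form the address


lw = main_control(0b100011)
print(lw["MemRead"], lw["RegWrite"])        # 1 1
print(format(alu_control(1, 0, 0b100010), "03b"))  # 110 (sub)
```

Only the low four funct bits are used for R-type instructions, matching the X entries under F5 and F4 in the table.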

Returning to Pipelined Control
- Opcode input to control is supplied by the pipeline register IF/ID in the ID (instruction decode) cycle.
- Nine control signals are generated in the ID cycle, but none is used there. They are saved in the pipeline register ID/EX.
- ALUSrc, RegDst and ALUOp (2 bits) are used in the EX (execute) cycle. The remaining 5 control signals are saved in the pipeline register EX/MEM.
- Branch, MemWrite and MemRead are used in the MEM (memory access) cycle. The remaining 2 control signals are saved in the pipeline register MEM/WB.
- MemtoReg and RegWrite are used in the WB (write back) cycle.
- Pipelined control is shown without Jump.

Placing Control in Pipelined Datapath
[Figure: pipelined datapath with a CONTROL block in the ID stage. ALUSrc, ALUOp and RegDst are used in EX; Branch, MemWrite and MemRead in MEM; MemtoReg and RegWrite in WB.]

Highlighted Pipelined Control
[Figure: the same datapath with the control paths highlighted.]

Single-Cycle Performance
Assume:
- 200 ps for memory access
- 100 ps for ALU operation
- 50 ps for register file read or write
Cycle time is set according to the longest instruction:
lw ≡ IF + ID/RegRead + ALU + MEM + RegWrite = 200 + 50 + 100 + 200 + 50 = 600 ps
Therefore, Av. instruction execution time = clock cycle time = 600 ps
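
The 600 ps figure above is the delay of the slowest instruction path. As a rough check, the sketch below recomputes it in Python. Only the lw path and the three latencies come from the slide; the unit lists for sw, R-type and beq are assumptions based on the usual MIPS datapath, and the names LATENCY, PATHS and path_delay are illustrative.

```python
# Unit latencies assumed on the slide, in picoseconds.
LATENCY = {"IF": 200, "RegRead": 50, "ALU": 100, "MEM": 200, "RegWrite": 50}

# Units used by each instruction class. Only the lw row is taken from the
# slide; the other rows are assumptions about a typical MIPS datapath.
PATHS = {
    "lw":     ("IF", "RegRead", "ALU", "MEM", "RegWrite"),
    "sw":     ("IF", "RegRead", "ALU", "MEM"),
    "R-type": ("IF", "RegRead", "ALU", "RegWrite"),
    "beq":    ("IF", "RegRead", "ALU"),
}


def path_delay(instr):
    """Sum the latencies of the units an instruction class passes through."""
    return sum(LATENCY[unit] for unit in PATHS[instr])


# The single-cycle clock must fit the slowest instruction.
cycle_time = max(path_delay(instr) for instr in PATHS)
print(cycle_time)  # 600
```

Under these assumptions every other class finishes early and idles until the lw-sized clock period ends, which is why the single-cycle CPI of 1 comes with a long cycle.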

Multicycle Performance
Consider the SPECINT2000* instruction mix:

Instruction   Frequency   Cycles
lw            25%         5
sw            10%         4
branch        11%         3
jump          2%          3
ALU instr.    52%         4

Av. CPI = 0.25×5 + 0.10×4 + 0.11×3 + 0.02×3 + 0.52×4 = 4.12
Clock cycle time, determined from the longest operation (memory access) = 200 ps
Av. instruction execution time = 4.12×200 = 824 ps
*Set of benchmark programs used for performance evaluation, to be discussed in a later lecture.

Pipeline Performance
- Neglect initial latency (reasonable for long programs).
- One instruction is completed every clock cycle unless delayed by a hazard.
Average CPI:

Instruction   Cycles                                    Av. cycles
lw            2 cycles in 50% of cases due to hazard    1.5
sw            1 cycle                                   1
ALU           1 cycle                                   1
branch        2 cycles in 25% of cases due to hazard    1.25
jump          2 cycles                                  2

For SPECINT2000, Av. CPI = 0.25×1.5 + 0.10×1 + 0.11×1.25 + 0.02×2.0 + 0.52×1 = 1.17
Clock cycle time (longest operation: memory access) = 200 ps
Av. instruction execution time = 1.17×200 = 234 ps

Comparing Alternatives

Type of datapath and control   Clock cycle time   Average CPI   Av. instruction execution time
Single-cycle                   600 ps             1.00          600 ps
Multicycle                     200 ps             4.12          824 ps
Pipelined                      200 ps             1.17          234 ps

Other Controls for Pipeline
- Forwarding
- Stall
- Branch hazard and branch prediction
- Instruction flush
- Exceptions

Forwarding
Consider a data hazard:
sub $2, $1, $3     # computes result in CC3, writes it to $2 in CC5
and $12, $2, $5    # reads $2 in CC3, executes in CC4
[Figure: pipeline diagram over CC1-CC8 showing sub and and passing through IF (IM), ID (register file read), EX (ALU), MEM (DM) and WB (register file write), with the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB between stages.]
CC3: for sub, the ALU saves the new data in EX/MEM, to be written to $2 in CC5.
CC3: and reads $2 into ID/EX, but the correct data is in EX/MEM.
CC4: forwarding allows execution of "and" with correct data.

Understanding Forwarding
Let's ask the following questions:
Q: Why is there a hazard?
A: The source register of the present instruction is the same as the destination register of the previous instruction.
Q: When is the source register data needed?
A: In the execute cycle (CC4).
Q: Is the source register data available in CC4?
A: Yes: use forwarding. No: use stall.
Q: Where is the required data in CC4?
A: In the pipeline register EX/MEM, as the ALU output.

Forwarding Hardware
- A forwarding unit is added to the execute (ALU) cycle hardware.
- Functions of the forwarding unit:
  - Hazard detection
  - Forward correct data to the ALU
- Inputs to the forwarding unit:
  - Source registers of the instruction in EX
  - Destination registers of the instructions in DM and WB
- Outputs of the forwarding unit: multiplexer controls to route correct data to the ALU.

Recall Register Definitions
R-type instruction (add, sub, and, or, . . . ):    opcode | Rs | Rt | Rd | shamt | funct
I-type instruction (beq, lw, sw, addi, . . . ):    opcode | Rs | Rt | constant_or_address
J-type instruction (j, jal):                       opcode | address
where
Rs is the first source register
Rt is the second source register
Rd is the destination register

Forwarding Implemented
[Figure: pipelined datapath with a forwarding unit in the EX stage. The unit receives Rs (bits 21-25) and Rt (bits 16-20) of the instruction in EX and the Rd fields held in EX/MEM and MEM/WB, and controls the multiplexers at the two ALU inputs.]

Stall
- Delay the next instruction by sending a nop through the pipeline.
- Necessary when a hazard is not resolved by forwarding.
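
Before moving on to the stall case, the forwarding unit described above can be summarized as a comparison of register numbers. The sketch below is one possible Python reading of it, not the circuit from the slides: the function name forward_select, the (RegWrite, Rd) pairs and the returned labels are illustrative, and two details are assumptions taken from common MIPS practice rather than from the slides (register $0 is never forwarded, and EX/MEM takes priority over MEM/WB because it holds the more recent result).

```python
def forward_select(src, ex_mem, mem_wb):
    """Pick the source of one ALU operand.

    src     -- register number read by the instruction in EX (Rs or Rt)
    ex_mem  -- (RegWrite, Rd) of the instruction now in the MEM stage
    mem_wb  -- (RegWrite, Rd) of the instruction now in the WB stage
    Returns "EX/MEM", "MEM/WB" or "REGFILE" (no forwarding needed).
    """
    # Check the younger producer first so the most recent value wins
    # (assumed priority, see the note above).
    for label, (reg_write, rd) in (("EX/MEM", ex_mem), ("MEM/WB", mem_wb)):
        if reg_write and rd != 0 and rd == src:
            return label
    return "REGFILE"


# CC4 of the example: "and $12, $2, $5" is in EX, "sub $2, $1, $3" is in MEM.
sub_in_mem = (1, 2)      # sub writes register $2
nothing_in_wb = (0, 0)   # no earlier instruction writes a register
print(forward_select(2, sub_in_mem, nothing_in_wb))  # EX/MEM
print(forward_select(5, sub_in_mem, nothing_in_wb))  # REGFILE
```

The two calls correspond to the two ALU input multiplexers: the Rs operand ($2) is taken from EX/MEM, while the Rt operand ($5) comes from the register file as usual.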
[Figure: pipeline diagram over CC1-CC6 for lw $2, 20($1) followed by and $4, $2, $5.]
CC4: for lw, the new data is in MEM/WB, to be written to $2.
CC4: execution of and is impossible; the correct data is unavailable until the end of CC4.

Detecting Hazard Requiring Stall
Consider the instruction in IF/ID being decoded:
If
- the previous instruction (lw) activated MemRead, and
- the instruction being decoded has a source register (Rs or Rt) that is the same as the destination register (Rt for lw) of the previous instruction,
then stall the pipeline:
- Force all control outputs to 0
- Prevent PC from changing
- Prevent IF/ID from changing

Stall Implementation
[Figure: pipelined datapath with the forwarding unit and an added hazard detection unit. The hazard detection unit is shown with the signals MemRead, Rs, Rt, PCWrite and IF/IDWrite, and a multiplexer selects 0 in place of the control outputs.]

Execution with stall and forwarding
[Figure: pipeline diagram over CC1-CC7 for lw $2, 20($1), and $4, $2, $5 and the next instruction. The stall inserts a bubble (nop) that moves down the pipeline.]
- The state of IF/ID is frozen in CC3.
- The next instruction is fetched twice since PC was frozen.
- CC4: new data in MEM/WB, to be written to $2.

Branch Hazard
- Consider the heuristic "branch not taken".
- Continue fetching instructions in sequence following the branch instruction.
- If branch is taken (indicated by zero output of ALU):
  - Control generates the Branch signal in the ID cycle.
  - Branch activates the PCSrc signal in the MEM cycle to load PC with the new branch address.
  - Three instructions in the pipeline must be flushed if the branch is taken. Can this penalty be reduced?

Branch Hazard
[Figure: pipelined datapath with control, annotated for a beq instruction and showing the Branch, zero and PCSrc signals.]

Branch Not Taken
Program order: Branch on condition to Z, A, B, C, D, . . . , Z

          cycle b    cycle b+1   cycle b+2   cycle b+3                     cycle b+4
Branch    fetched    decoded     decision    PC keeps D (br. not taken)
A                    fetched     decoded     executed                      continues
B                                fetched     decoded                       executed
C                                            fetched                       decoded
D                                                                          fetched

Branch Taken
Program order: Branch on condition to Z, A, B, C, D, . . . , Z

          cycle b    cycle b+1   cycle b+2   cycle b+3                cycle b+4
Branch    fetched    decoded     decision    PC gets Z (br. taken)
A                    fetched     decoded     executed                 Nop
B                                fetched     decoded                  Nop
C                                            fetched                  Nop
Z                                                                     fetched

Three-cycle penalty: three instructions are flushed if the branch is taken.

Branch Penalty Reduction
[Figure: pipelined datapath with control, modified for a reduced branch penalty.]
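
The three-cycle penalty in the Branch Taken timeline above follows directly from the stage in which the PC is redirected. The short Python sketch below makes that counting explicit. It is an illustrative model, not taken from the slides: STAGES and flushed_instructions are made-up names, and it assumes one instruction is fetched per cycle and that the target is fetched in the cycle after the PC is updated.

```python
# Pipeline stages in order; the branch is in IF during cycle b.
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def flushed_instructions(pc_update_stage):
    """Count instructions fetched after the branch and before its target.

    If the branch is in stage number k (IF = 0) when the PC is loaded with
    the target, the sequential instructions fetched in cycles b+1 .. b+k
    are on the wrong path and must be turned into nops.
    """
    return STAGES.index(pc_update_stage)


print(flushed_instructions("MEM"))  # 3, as in the Branch Taken timeline
print(flushed_instructions("ID"))   # 1, if the decision moves to decode
```

Under this model, moving the branch decision and target address generation from MEM to ID shrinks the penalty from three flushed instructions to one.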

Branch Taken
Program order: Branch to Z, A, B, C, D, . . . , Z

          cycle b    cycle b+1              cycle b+2   cycle b+3   cycle b+4
Branch    fetched    decision, PC gets Z
A                    fetched                flushed     Nop         Nop
Z                                           fetched     decoded     executed

One-cycle penalty: one instruction is flushed if the branch is taken.

Pipeline Flush
- If branch is taken (as indicated by zero), then control does the following:
  - Change all control signals to 0, similar to the case of a stall for a data hazard, i.e., insert a bubble in the pipeline.
  - Generate a signal IF.Flush that changes the instruction in the pipeline register IF/ID to 0 (nop).
- Penalty of branch hazard is reduced by:
  - Adding branch detection and address generation hardware in the decode cycle (one bubble needed). Next-address generation logic in the decode stage writes PC+4, the branch address, or the jump address into PC.
  - Using branch prediction.
  - Unrolling loops.

Branch Prediction
- Useful for program loops.
- A one-bit prediction scheme: a one-bit buffer carries a "history bit" that tells what happened on the last branch instruction.
  - History bit = 1, branch was taken
  - History bit = 0, branch was not taken
[Figure: two-state diagram. State 1 predicts branch taken and state 0 predicts branch not taken; a taken branch leads to state 1 and a not-taken branch leads to state 0.]

Branch Prediction
[Figure: branch prediction buffer. Low-order PC bits are used as an index into a table holding addresses of recent branch instructions, target addresses and history bit(s); a comparison (=) with the PC and the prediction logic select either PC+4 or the target address as the next PC.]

Branch Prediction for a Loop
Loop:
a: I = 0
b: I = I + 1
c: X = X + R(I)
d: I - 10 = 0? (N: go to b; Y: go to e)
e: Store X in memory

Execution of instruction d:

Execution   Old hist.   Predicted      I     Actual         New hist.   Prediction
seq.        bit         next instr.          next instr.    bit
1           0           e              1     b              1           Bad
2           1           b              2     b              1           Good
3           1           b              3     b              1           Good
4           1           b              4     b              1           Good
5           1           b              5     b              1           Good
6           1           b              6     b              1           Good
7           1           b              7     b              1           Good
8           1           b              8     b              1           Good
9           1           b              9     b              1           Good
10          1           b              10    e              0           Bad

h.bit = 0: branch not taken; h.bit = 1: branch taken.

Prediction Accuracy
- One-bit predictor: 2 errors out of 10 predictions. Prediction accuracy = 80%.
- To improve prediction accuracy, use a two-bit predictor: a prediction must be wrong twice before it is changed.

Two-Bit Prediction Buffer
- Implemented as a two-bit counter.
- Can improve correct prediction statistics.
[Figure: four-state diagram. States 11 and 10 predict branch taken; states 01 and 00 predict branch not taken. Transitions between states are labeled taken and not taken.]

Branch Prediction for a Loop
Loop:
1: I = 0
2: I = I + 1
3: X = X + R(I)
4: I - 10 = 0? (N: go to 2; Y: go to 5)
5: Store X in memory

Execution of instruction 4:

Execution   Old pred.   Predicted      I     Actual         New pred.   Prediction
seq.        buf.        next instr.          next instr.    buf.
1           10          2              1     2              11          Good
2           11          2              2     2              11          Good
3           11          2              3     2              11          Good
4           11          2              4     2              11          Good
5           11          2              5     2              11          Good
6           11          2              6     2              11          Good
7           11          2              7     2              11          Good
8           11          2              8     2              11          Good
9           11          2              9     2              11          Good
10          11          2              10    5              10          Bad

Exceptions
- A typical exception occurs when the ALU produces an overflow signal.
- Control asserts the following actions on an exception:
  - Change the PC address to 4000 0040hex. This is the location of the exception routine. This is done by adding an additional input to the PC input multiplexer.
  - Overflow is detected in the EX cycle. Similar to data hazard and pipeline flush:
    - Set IF/ID to 0 (nop).
    - Generate ID.Flush and EX.Flush signals to set all control signals to 0 in the ID/EX and EX/MEM registers. This also prevents the ALU result (presumed contaminated) from being written in the WB cycle.
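
Returning to the two loop tables in the branch prediction slides, the misprediction counts can be reproduced with a small simulation. This is a sketch under stated assumptions, not the hardware from the slides: the function names one_bit and two_bit are illustrative, the two-bit buffer is modeled as a saturating up/down counter (the slides only say "two-bit counter"), and the initial states 0 and 10 are taken from the first row of each table.

```python
def one_bit(outcomes, history=0):
    """Count mispredictions of a one-bit predictor.

    history = 1 predicts taken, history = 0 predicts not taken;
    after each branch the bit is set to the actual outcome.
    """
    wrong = 0
    for taken in outcomes:
        predicted_taken = history == 1
        if predicted_taken != taken:
            wrong += 1
        history = 1 if taken else 0
    return wrong


def two_bit(outcomes, counter=0b10):
    """Count mispredictions of a two-bit predictor.

    Modeled as a saturating counter (an assumption): values 10 and 11
    predict taken, 00 and 01 predict not taken; the counter moves up on
    a taken branch and down on a not-taken branch.
    """
    wrong = 0
    for taken in outcomes:
        predicted_taken = counter >= 0b10
        if predicted_taken != taken:
            wrong += 1
        counter = min(counter + 1, 0b11) if taken else max(counter - 1, 0b00)
    return wrong


# The loop branch goes back to the loop body nine times, then falls through.
loop = [True] * 9 + [False]
print(one_bit(loop))  # 2 mispredictions -> 80% accuracy
print(two_bit(loop))  # 1 misprediction  -> 90% accuracy
```

If the loop is entered again, the one-bit history starts at 0 and mispredicts the first iteration again, while the two-bit counter is left at 10 and still predicts taken, which is the benefit of requiring two wrong predictions before changing.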