Avi's Blog

Saturday 2 June 2018

Pipeline Hazards

A hazard is a situation that prevents the next instruction in the stream from executing in its designated clock cycle.
Hazards reduce the performance of the pipeline. There are three types of hazards.

1. Structural Hazards: These arise from resource conflicts. A structural hazard occurs when more than one instruction needs the same hardware resource in the same cycle.

e.g. Consider a processor with a single memory shared by data and instructions. When an instruction makes a data memory reference (such as a load), it conflicts with the instruction in the Fetch stage, because both are trying to access the shared memory in the same cycle.

Reason: Cost. Duplicating functional units to remove every structural hazard may be too expensive, so some designs accept the hazard to keep the hardware cheaper.
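
The shared-memory conflict above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular processor: it assumes a five-stage pipeline (IF, ID, EX, MEM, WB), a single memory port, and that the instruction in MEM wins the port over the instruction in IF. The function name fetch_cycles is hypothetical.

```python
def fetch_cycles(is_mem_op):
    """Return the cycle (1-based) in which each instruction is fetched.

    is_mem_op[i] is True when instruction i is a load or store.
    Assumed model: stages IF, ID, EX, MEM, WB, one shared memory port,
    and the instruction in MEM has priority over the one in IF.
    """
    fetch = []
    mem_busy = set()      # cycles in which a MEM stage uses the shared port
    cycle = 1
    for uses_mem in is_mem_op:
        while cycle in mem_busy:    # structural hazard: delay this fetch
            cycle += 1
        fetch.append(cycle)
        if uses_mem:
            mem_busy.add(cycle + 3)  # MEM is three stages after IF
        cycle += 1
    return fetch


# A load followed by four ALU instructions: the fourth fetch collides
# with the load's MEM stage in cycle 4 and slips by one cycle.
print(fetch_cycles([True, False, False, False, False]))   # [1, 2, 3, 5, 6]
print(fetch_cycles([False, False, False, False, False]))  # [1, 2, 3, 4, 5]
```

With separate instruction and data memories the mem_busy set would stay empty and no fetch would ever be delayed, which is the hardware cost trade-off mentioned above.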

2. Data Hazards: These occur when an instruction depends on the result of a previous instruction that is still in the pipeline. The dependent instruction attempts to use the data before it is ready.

Solutions:
a. Forwarding / Bypassing / Short-Circuiting: The EX/MEM and MEM/WB pipeline registers are fed back to the ALU inputs. If the forwarding hardware detects that the result of a previous instruction is needed as an ALU input, the control unit selects the forwarded result instead of the value read from the register file.

b. Stall / Interlock: Forwarding cannot resolve every data hazard; in those cases the pipeline must stall until the correct data is available. Pipeline interlock hardware detects the hazard and stalls the pipeline until it is cleared, which preserves the correct execution pattern.

e.g. Here all the instructions after ADD use the result of the ADD instruction. ADD writes its value to R1 in the WB stage, whereas SUB reads R1 during its ID stage, which comes before that write has happened.
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
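
To make the effect of forwarding and stalling concrete, here is a rough Python sketch that counts stall cycles for the sequence above. It is a simplified model under stated assumptions, not a cycle-accurate simulator: a five-stage in-order pipeline, a register file that is written in the first half of WB and read in the second half of ID, and forwarding paths from EX/MEM and MEM/WB to the ALU. The names Instr and count_stalls, and the LW example, are hypothetical and not from the original post.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "op dest srcs")


def count_stalls(program, forwarding):
    """Count stall cycles in an assumed five-stage in-order pipeline.

    Without forwarding, a consumer's ID stage must be no earlier than the
    producer's WB stage (3 cycles after the producer's ID).
    With forwarding, an ALU result is usable by the very next instruction,
    while a load result (op "LW") needs one extra cycle.
    """
    decode = []   # cycle in which each instruction occupies ID
    ready = {}    # register -> earliest ID cycle allowed for a consumer
    stalls = 0
    for instr in program:
        earliest = decode[-1] + 1 if decode else 2   # IF in cycle 1, ID in 2
        needed = max([ready.get(r, 0) for r in instr.srcs] + [earliest])
        stalls += needed - earliest
        decode.append(needed)
        if forwarding:
            gap = 2 if instr.op == "LW" else 1
        else:
            gap = 3
        ready[instr.dest] = needed + gap
    return stalls


program = [
    Instr("ADD", "R1", ("R2", "R3")),
    Instr("SUB", "R4", ("R1", "R5")),
    Instr("AND", "R6", ("R1", "R7")),
    Instr("OR",  "R8", ("R1", "R9")),
]
print(count_stalls(program, forwarding=False))  # 2
print(count_stalls(program, forwarding=True))   # 0

# A load followed immediately by a use still needs one stall,
# which is the case where the interlock has to step in.
load_use = [Instr("LW", "R1", ("R2",)), Instr("SUB", "R4", ("R1", "R5"))]
print(count_stalls(load_use, forwarding=True))  # 1
```

Under these assumptions the ADD sequence costs two stall cycles without forwarding and none with it, while the load-use pair shows why an interlock is still needed even when forwarding is present.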

3. Branch / Control Hazard:

Coming Soon

Monday 20 April 2015

4x4 bit Wallace Tree Multiplier Implementation in Verilog

//Half Adder Code:

`timescale 1ns / 1ps
module half_add( input a, input b, output s, output c );

assign s = a ^ b;
assign c = a & b;

endmodule

//Full Adder Code:

`timescale 1ns / 1ps
module full_add( input a, input b, input ci, output s, output co );

assign s = (a ^ b) ^ ci;
assign co = (a & b) ^ (ci & (a ^ b));

endmodule
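
The carry expression in full_add combines its two terms with XOR instead of the more familiar OR. The two forms agree because (a & b) and (ci & (a ^ b)) can never both be 1. A quick way to convince yourself is an exhaustive check; the Python below is just a bit-level mirror of the Verilog expressions, written for this check and not part of the design.

```python
from itertools import product


def full_add(a, b, ci):
    """Bit-level mirror of the full_add module: returns (s, co)."""
    s = (a ^ b) ^ ci
    co = (a & b) ^ (ci & (a ^ b))
    return s, co


def check_full_add():
    """Return True if s + 2*co equals a + b + ci for all eight inputs."""
    return all(
        s + 2 * co == a + b + ci
        for a, b, ci in product((0, 1), repeat=3)
        for s, co in [full_add(a, b, ci)]
    )


print(check_full_add())  # True
```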

//Wallace 4 Code:

`timescale 1ns / 1ps

module wallace4(input [3:0] A, input [3:0] B, output [7:0] P);

integer i;
wire s11,s12,s13,s14,s15,s22,s23,s24,s25,s26,s32,s34,s35,s36,s37;
wire c11,c12,c13,c14,c15,c22,c23,c24,c25,c26,c32,c34,c35,c36,c37;
reg [3:0] pp0,pp1,pp2,pp3 ;

//Calculation of Partial Product
//Combinational block: full sensitivity list and blocking assignments
always @(*)
begin
for(i=0;i<4;i=i+1) begin
pp0[i] = A[i] & B[0];
pp1[i] = A[i] & B[1];
pp2[i] = A[i] & B[2];
pp3[i] = A[i] & B[3];
end
end

assign P[0] = pp0[0];
assign P[1] = s11;
assign P[2] = s22;
assign P[3] = s32;
assign P[4] = s34;
assign P[5] = s35;
assign P[6] = s36;
assign P[7] = s37;

//first stage
half_add ha11 (pp0[1],pp1[0],s11,c11);
full_add fa12 (pp0[2],pp1[1],pp2[0],s12,c12);
full_add fa13 (pp0[3],pp1[2],pp2[1],s13,c13);
full_add fa14 (pp1[3],pp2[2],pp3[1],s14,c14);
half_add ha15 (pp2[3],pp3[2],s15,c15);

//second stage
half_add ha21 (c11,s12,s22,c22);
full_add fa22 (pp3[0],c12,s13,s23,c23);
full_add fa23 (c13,c23,s14,s24,c24);
full_add fa24 (c14,c24,s15,s25,c25);
full_add fa25 (c15,c25,pp3[3],s26,c26);

//third stage
half_add ha31 (c22,s23,s32,c32);
half_add ha32 (c32,s24,s34,c34);
half_add ha33 (c34,s25,s35,c35);
half_add ha34 (c35,s26,s36,c36);
half_add ha35 (c36,c26,s37,c37);

endmodule
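
Before running an HDL simulation, the adder wiring can be sanity-checked with a small software model. The Python below mirrors the wallace4 netlist adder by adder, using the same signal names, and compares the result with ordinary multiplication for all 256 input pairs. It is a sketch for checking the wiring only and says nothing about timing.

```python
def half_add(a, b):
    """Mirror of half_add: returns (s, c)."""
    return a ^ b, a & b


def full_add(a, b, ci):
    """Mirror of full_add: returns (s, co)."""
    return a ^ b ^ ci, (a & b) ^ (ci & (a ^ b))


def wallace4(A, B):
    """Mirror of the wallace4 netlist for 4-bit integers A and B."""
    # pp[j][i] = A[i] & B[j], as in the partial product loop
    pp0, pp1, pp2, pp3 = [
        [(A >> i) & (B >> j) & 1 for i in range(4)] for j in range(4)
    ]
    # first stage
    s11, c11 = half_add(pp0[1], pp1[0])
    s12, c12 = full_add(pp0[2], pp1[1], pp2[0])
    s13, c13 = full_add(pp0[3], pp1[2], pp2[1])
    s14, c14 = full_add(pp1[3], pp2[2], pp3[1])
    s15, c15 = half_add(pp2[3], pp3[2])
    # second stage
    s22, c22 = half_add(c11, s12)
    s23, c23 = full_add(pp3[0], c12, s13)
    s24, c24 = full_add(c13, c23, s14)
    s25, c25 = full_add(c14, c24, s15)
    s26, c26 = full_add(c15, c25, pp3[3])
    # third stage
    s32, c32 = half_add(c22, s23)
    s34, c34 = half_add(c32, s24)
    s35, c35 = half_add(c34, s25)
    s36, c36 = half_add(c35, s26)
    s37, c37 = half_add(c36, c26)
    bits = [pp0[0], s11, s22, s32, s34, s35, s36, s37]  # P[0]..P[7]
    return sum(bit << k for k, bit in enumerate(bits))


mismatches = [(a, b) for a in range(16) for b in range(16)
              if wallace4(a, b) != a * b]
print(mismatches)  # []
```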


Wednesday 15 April 2015

4x4 bit Wallace Tree Multiplier Implementation in VHDL

A fast method for multiplying two numbers was developed by Wallace. It multiplies two numbers in three steps: first the bit products are formed; next the bit product matrix is reduced to a two-row matrix whose rows sum to the same value as the bit products; finally the two resulting rows are summed with a fast adder to produce the final product.
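
The three steps can be sketched generically in Python. This is a simplified illustration rather than the exact structure of the VHDL design in this post: the reduction step here uses only full adders (3:2 compressors) and passes leftover bits through unchanged, and the function names reduce_to_two_rows and wallace_multiply are hypothetical.

```python
def reduce_to_two_rows(a, b, width=4):
    """Form the bit products and reduce them to two rows."""
    # Step 1: bit products, grouped into columns by weight i + j
    cols = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            cols[i + j].append((a >> i) & (b >> j) & 1)

    # Step 2: apply full adders until no column holds more than two bits
    while any(len(col) > 2 for col in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, col in enumerate(cols):
            full = len(col) // 3
            for k in range(full):
                x, y, z = col[3 * k: 3 * k + 3]
                nxt[w].append(x ^ y ^ z)                     # sum, same weight
                nxt[w + 1].append((x & y) | (z & (x ^ y)))   # carry, next weight
            nxt[w].extend(col[3 * full:])                    # leftover bits
        cols = nxt

    row0 = sum(col[0] << w for w, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << w for w, col in enumerate(cols) if len(col) > 1)
    return row0, row1


def wallace_multiply(a, b, width=4):
    """Step 3: add the two rows (a fast adder in hardware, '+' here)."""
    row0, row1 = reduce_to_two_rows(a, b, width)
    return row0 + row1


print(wallace_multiply(13, 11))  # 143
```

Each full adder replaces three bits of one weight with a sum bit of the same weight and a carry bit of the next weight, so the weighted total of all bits never changes. That is why the two final rows always add up to the product.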

--Wallace Tree Multiplier Main Code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity wallace4 is
Port ( A : in STD_LOGIC_VECTOR (3 downto 0);
B : in STD_LOGIC_VECTOR (3 downto 0);
P : out STD_LOGIC_VECTOR (7 downto 0));
end wallace4;

architecture Behavioral of wallace4 is
component full_adder is
Port ( a : in STD_LOGIC;
b : in STD_LOGIC;
c : in STD_LOGIC;
sum : out STD_LOGIC;
carry : out STD_LOGIC);
end component;

component half_adder is
Port ( a : in STD_LOGIC;
b : in STD_LOGIC;
sum : out STD_LOGIC;
carry : out STD_LOGIC);
end component;

signal s11,s12,s13,s14,s15,s22,s23,s24,s25,s26,s32,s34,s35,s36,s37 : std_logic;
signal c11,c12,c13,c14,c15,c22,c23,c24,c25,c26,c32,c34,c35,c36,c37 : std_logic;
signal pp0,pp1,pp2,pp3 : std_logic_vector(3 downto 0);

begin

process(A,B)
begin
for i in 0 to 3 loop
pp0(i) <= A(i) and B(0);
pp1(i) <= A(i) and B(1);
pp2(i) <= A(i) and B(2);
pp3(i) <= A(i) and B(3);
end loop;
end process;

P(0) <= pp0(0);
P(1) <= s11;
P(2) <= s22;
P(3) <= s32;
P(4) <= s34;
P(5) <= s35;
P(6) <= s36;
P(7) <= s37;

--first stage
ha11 : half_adder port map(pp0(1),pp1(0),s11,c11);
fa12 : full_adder port map(pp0(2),pp1(1),pp2(0),s12,c12);
fa13 : full_adder port map(pp0(3),pp1(2),pp2(1),s13,c13);
fa14 : full_adder port map(pp1(3),pp2(2),pp3(1),s14,c14);
ha15 : half_adder port map(pp2(3),pp3(2),s15,c15);

--second stage
ha22 : half_adder port map(c11,s12,s22,c22);
fa23 : full_adder port map(pp3(0),c12,s13,s23,c23);
fa24 : full_adder port map(c13,c23,s14,s24,c24);
fa25 : full_adder port map(c14,c24,s15,s25,c25);
fa26 : full_adder port map(c15,c25,pp3(3),s26,c26);

--third stage
ha32 : half_adder port map(c22,s23,s32,c32);
ha34 : half_adder port map(c32,s24,s34,c34);
ha35 : half_adder port map(c34,s25,s35,c35);
ha36 : half_adder port map(c35,s26,s36,c36);
ha37 : half_adder port map(c36,c26,s37,c37);

end Behavioral;

--Half Adder Code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity half_adder is
Port ( a : in STD_LOGIC;
b : in STD_LOGIC;
sum : out STD_LOGIC;
carry : out STD_LOGIC);
end half_adder;

architecture Behavioral of half_adder is
begin

sum <= a xor b;
carry <= a and b;

end Behavioral;

--Full Adder Code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity full_adder is
Port ( a : in STD_LOGIC;
b : in STD_LOGIC;
c : in STD_LOGIC;
sum : out STD_LOGIC;
carry : out STD_LOGIC);
end full_adder;

architecture Behavioral of full_adder is
begin
sum <= (a xor b xor c);
carry <= (a and b) xor (c and (a xor b));
end Behavioral;