ASM Design Example Bin Mult

Please download to get full document.

View again

of 11
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Information Report



Views: 3 | Pages: 11

Extension: PDF | Download: 0

Related documents
binary multiplication
  An Example of ASM Design: A Binary Multiplier Dr. D. Capson Electrical and Computer Engineering McMaster University   Introduction An algorithmic state machine (ASM) is a Finite State Machine that uses a sequential circuit (the Controller) to coordinates a series of operations among other functional units such as counters, registers, adders etc. (the Datapath). The series of operations implement an algorithm. The Controller passes “control” signals which can be Moore or Mealy outputs from the Controller, to the Datapath. The Datapath returns information to the Controller in the form of “status” information that can then be used to determine the sequence of states in the Controller. Both the Controller and the Datapath may each have external inputs and outputs and are clocked simultaneously as shown in the following figure: ControllerDatapath StatusControl  clock  Inputs Outputs   InputsOutputs  Think about this :  A microprocessor may be considered as a (large !) ASM with many inputs, states and outputs. A program (any software) is really just a method for specification of its initial state … The two basic strategies for the design of a controller are: 1. hardwired control   which includes techniques such as “one-hot-state” (also known as one flipflop  per state ) and decoded sequence registers. 2. microprogrammed control which uses a memory device to produce a sequence of control words to a datapath.. Since hardwired control is, generally speaking, fast compared with microprogramming strategies, most modern microprocessors incorporate hardwired control to help achieve their high performance (or in some cases, a combination of hardwired and microprogrammed control). The early generations of microprocessors used microprogramming almost exclusively. We will discuss some basic concepts in microprogramming later in the course – for now we concentrate on a design example of hardwired control. The ASM we will design is an n-bit unsigned binary multiplier. Binary Multiplication The design of binary multiplication strategies has a long history. Multiplication is such a fundamental and frequently used operation in digital signal processing, that most modern DSP chips have dedicated multiplication hardware to maximize performance. Examples are filtering, coding and compression for telecommunications and control applications as well as many others. Multiplier units must be fast !  The first example that we considered (in class) that used a repeated addition strategy is not always fast. In fact, the time required to multiply two numbers is variable  and dependent on the value of the multiplier itself. For example, the calculation of 5 x  9 as 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 requires more clock pulses than the calculation of 5 x  3 = 5 + 5 + 5. The larger the multiplier, the more iterations that are required. This is not practical. Think about this : How many iterations are required for multiplying say, two 16-bit numbers, in the worst case ? Another approach to achieve fast multiplication is the look-up table (LUT). The multiplier and multiplicand are used to form an address in memory in which the corresponding, pre-computed value of the product is stored . For an n-bit multiplier (that is, multiplying an n-bit number by an n-bit number), a (2 n+n x  2n)-bit memory is required to hold all possible products. For example, a 4-bit x  4-bit multiplier requires (2 8 ) x 8 = 2048 bits. For an 8-bit x  8-bit multiplier, a (2 8+8 ) x  16 = 1 Mbit memory is required. This approach is conceptually simple and has a fixed multiply time equal to the access time of the memory device, regardless of the data being multiplied. But it is also impractical for larger values of n. Think about this : What memory capacity is required for multiplying two 16-bit numbers ? Two 32-bit numbers ? Most multiplication hardware units use iterative algorithms implemented as an ASM for which the worst-case multiplication time can be guaranteed. The algorithm we present here is similar to the “pencil-and-paper” technique that we naturally use for multiplying in base 10. Consider the following example: 123 (the multiplicand) x 432 (the multiplier) --- 246 (1 st  partial product) 369 (2 nd  partial product) 492 (3rd partial product) ----- 53136 (the product)  Each digit of the multiplier is multiplied by the multiplicand to form a partial product. Each partial  product is shifted left (that is, multiplied by the base) by the amount equal to the power of the digit of the corresponding multiplier. In the example above, 246 is actually 246 x 10 0 , 369 is 369 x 10 1 = 3690 and 492 is actually 492 x 10 2  = 49200, etc. There are as many partial products as there are digits in the multiplier. Binary multiplication can be done in exactly the same way: 1100 (the multiplicand) x 1011 (the multiplier) ---- 1100 (1 st  partial product) 1100 (2nd partial product) 0000 (3rd partial product) 1100   (4th partial product) -------- 10000100 (the product)  However, with binary digits we can make some important observations: -   Since we multiply by only 1 or 0, each partial product is either a copy  of the multiplicand shifted  by the appropriate number of places, or, it is 0. -   The number of partial products is the same as the number of bits in the multiplier -   The number of bits in the product is twice the number of bits in the multiplicand. Multiplying two n-bit numbers produces a 2n-bit product. We could then design datapath hardware using a 2n-bit adder plus some other components (as in the example of Figure 10.17 of Brown and Vranesic) that emulates this manual procedure. However, the hardware requirement can be reduced by considering the multiplication in a different light. Our algorithm may be informally described as follows. Consider each bit of the multiplier from right to left  . When a bit is 1, the multiplicand is added to the running total that is then shifted right. When the multiplier bit is 0, no add is necessary since the partial  product is 0 and then only  the shift takes place. After n cycles of this strategy (once for each bit in the multiplier) the final answer is produced. Consider the previous example again: 1100 (the multiplicand)  x 1011   (the multiplier)   ---- 0000   (initial partial product, start with 0000)   1100   (1 st  multiplier bit is 1, so add the multiplicand)   ---- 1100   (sum)   ---- 01100   (shift sum one position to the right)   1100   (2 nd  multiplier bit is 1, so add multiplicand again)   ---- 100100   (sum, with a carry generated on the left)   ---- 100100   (shift sum once to the right, including carry)   0100100   (3rd multiplier bit is 0, so skip add, shift once)   ---- 1100  ( 4th multiplier bit is 1, so add multiplicand again)   ---- 10000100   (sum, with a carry generated on the left)   10000100   (shift sum once to the right, including carry) ↑↑↑↑    Notice that all the adds take place in these 4 bit positions – we need only a 4-bit adder ! We also need shifting capability to capture the bits moving to the right as well as a way to store the carries resulting from the additions. The final answer (the product) consists of the accumulated sum and the bits shifted out to the right. A hardware design that can implement this algorithm is described in the next section.  Design of the Binary Multiplier Datapath The multiplication as described above can be implemented with the components as shown in the figure on the next page (note that for simplicity, the clock inputs are not shown). It is the role of the controller to provide a sequence of the inputs to each component to cause the datapath hardware to perform the desired operations. Registers A and Q are controlled with synchronous inputs Load (parallel load), Shift (shift one position to the right with left serial input) and Clear (force the contents to 0). The D flipflop has an asynchronous Clear input and the counter has an asynchronous input Init (force the contents to 11..1). The log 2 n-bit counter (Counter P) is used to keep track of the number of iterations (n). Counter P is loaded with the value n-1 and counts down to zero - thus n operations are ensured. Each operation is either (a)  add then shift or (b)  just shift as described in the multiply algorithm above. Zero detection on the counter produces an output Z that is HI when the counter hits zero and this is used to tell the controller that the sequence is complete. The Counter P is initialized to n-1 with input Init = 1. The multiplicand is applied to one n-bit input of the adder. The sum output from the adder is stored as a  parallel load into Register A. Register A can also shift to the right, accepting a 1-bit serial input from the left. This is provided from the output of a D flip flop which stores the value of the carry out from the adder in the previous addition. Register Q receives its left serial input when shifting from the right-most  bit (lsb) of Register A. Register A and Q are identical in operation (but controlled differently) and together with the carry flipflop, they form a (1 + n + n)-bit shift register. That is, Registers C, A and Q are connected such that the carry value stored in the flipflop enters Register A from the left and the bit shifted out from the right of Register A enters Register Q from its left. At the end of the process, registers A and Q will hold the 2n-bit product (the n msb’s are in Register A). The multiplier is initially stored in Register Q via its parallel load capability. The reason for this is that it provides a convenient way to access each bit of the multiplier in succession at the lsb position (Q 0 ) of Register Q. In the multiply algorithm, each bit of the multiplier is used to ‘decide” if there should be an (a) add with shift or (b) shift only. So, Q 0  is used to tell the controller which of these operations to  perform on each iteration. After each shift, one bit of the multiplier is lost to the right and the Product shifts into Register Q from the left. After n shifts, Register Q holds the n lsb’s of the product and the Multiplier is totally lost. Putting the datapath circuit for the binary multiplier into a box, we see it has: Data Inputs : Multiplicand (n bits) Multiplier (n bits) Data Outputs : Product (2n bits) Control inputs : Clear carry Load, Shift and Clear (for each shift register) Init (for the counter) Status outputs : Z (zero detect) and Q 0  (each bit of the Multiplier, in succession)
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks