Exercise Zero

The goal of this exercise is to gain some familiarity with developing for FPGAs using Chisel. In this exercise you will implement a circuit capable of performing matrix matrix multiplication in the Chisel hardware description language.

HAND IN YOUR CODE IN AN ARCHIVE NAMED WITH YOUR USERNAME (e.g. peteraa_ex0). PLEASE ENSURE THAT THE TESTS CAN BE RUN AFTER UNZIPPING.

Your first component

There are two types of digital components: Combinatorial and stateful. The first component we will consider is a simple combinatorial incrementor:

  import chisel3._

  class myIncrement(incrementBy: Int) extends Module {
    val io = IO(
      new Bundle {
        val dataIn  = Input(UInt(32.W))
        val dataOut = Output(UInt(32.W))
      }
    )

    // Combinatorial: the output follows the input without waiting for a clock edge
    io.dataOut := io.dataIn + incrementBy.U
  }

Let's break the code down. First, myIncrement is a Module, meaning that this class can be instantiated as a hardware circuit. Figure [rm3] shows the model that you have just declared. A 32 bit signal, dataIn, goes in, and another 32 bit signal, dataOut, comes out.

Apart from the IO, there is only one statement, which assigns dataIn + incrementBy to dataOut.

In RTL the component looks like fig [rm4].

Let's see how we can use our module:

  class myIncrementTwice(incrementBy: Int) extends Module {
    val io = IO(
      new Bundle {
        val dataIn  = Input(UInt(32.W))
        val dataOut = Output(UInt(32.W))
      }
    )

    val first  = Module(new myIncrement(incrementBy))
    val second = Module(new myIncrement(incrementBy))

    first.io.dataIn  := io.dataIn
    second.io.dataIn := first.io.dataOut

    io.dataOut := second.io.dataOut
  }

Fig [rm5] shows the RTL design. As expected, it is just two incrementors chained together.

The following code shows how you can use for loops to instantiate an arbitrary number of modules.

  class myIncrementN(incrementBy: Int, numIncrementors: Int) extends Module {
    val io = IO(
      new Bundle {
        val dataIn  = Input(UInt(32.W))
        val dataOut = Output(UInt(32.W))
      }
    )

    val incrementors = Array.fill(numIncrementors){ Module(new myIncrement(incrementBy)) }

    for(ii <- 1 until numIncrementors){
      incrementors(ii).io.dataIn := incrementors(ii - 1).io.dataOut
    }

    incrementors(0).io.dataIn := io.dataIn
    // The last incrementor sits at index numIncrementors - 1
    io.dataOut := incrementors(numIncrementors - 1).io.dataOut
  }

Keep in mind that the for-loop only exists at design time, just like a for loop generating a table in HTML will not be part of the finished HTML.
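
The same design-time idea can be sketched in plain Scala, without Chisel. In this sketch each incrementor is modelled as an ordinary function, and the chain is built by folding over a list, much like the for loop above wires modules together. The names IncrementModel, increment and incrementN are hypothetical and only for illustration, and plain Int arithmetic stands in for the 32 bit UInt.

```scala
object IncrementModel {
  // Software stand-in for myIncrement: adds a fixed constant to its input.
  def increment(incrementBy: Int): Int => Int = x => x + incrementBy

  // Build a chain of numIncrementors stages at "design time".
  // The fold runs once while the chain is constructed. The resulting
  // function contains no loop, only the composed stages.
  def incrementN(incrementBy: Int, numIncrementors: Int): Int => Int =
    List.fill(numIncrementors)(increment(incrementBy))
      .foldLeft((x: Int) => x)((chain, stage) => chain.andThen(stage))
}
```

For example, IncrementModel.incrementN(3, 4)(10) passes 10 through four stages that each add 3, giving 22.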

So, what does combinatorial mean? To answer that, let's create a stateful circuit first.

  class myDelay() extends Module {
    val io = IO(
      new Bundle {
        val dataIn  = Input(UInt(32.W))
        val dataOut = Output(UInt(32.W))
      }
    )
    val delayReg = RegInit(UInt(32.W), 0.U)

    delayReg   := io.dataIn
    io.dataOut := delayReg
  }

This circuit seems rather pointless: it simply assigns the input to the output. However, the register has another input, the clock, as seen in the RTL in fig [rm6]. The register can only change value on a rising edge of the clock!

To illustrate, assume that at step 0 dataIn is 0x45. delayReg will now have 0x45 as its input, but dataOut will still be 0. Only when the clock ticks will the output of delayReg take on the value 0x45.
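
The behaviour described above can be mimicked with a small software model in plain Scala. This is only a sketch for building intuition, not Chisel code: DelayModel and clockTick are hypothetical names, and calling clockTick stands in for one rising clock edge.

```scala
// Hypothetical software model of myDelay, for illustration only.
final class DelayModel {
  // Mirrors the register's reset value of 0.
  private var delayReg: Int = 0

  // What the circuit drives on dataOut right now.
  def dataOut: Int = delayReg

  // Model one rising clock edge: the register captures its current input.
  def clockTick(dataIn: Int): Unit = {
    delayReg = dataIn
  }
}
```

Before the first call to clockTick, dataOut reads 0 no matter what value is waiting on the input. After clockTick(0x45), dataOut reads 0x45.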

You should now be able to implement myDelayN following the same principles as myIncrementN:

  class myDelayN(delay: Int) extends Module {
    val io = IO(
      new Bundle {
        val dataIn  = Input(UInt(32.W))
        val dataOut = Output(UInt(32.W))
      }
    )
  
    ???
  }

This should answer the initial question of combinatorial vs stateful: The output of a combinatorial circuit follows its inputs immediately, within the same clock cycle, while a stateful circuit only updates its output on rising edges of the clock.

Before you continue it is recommended that you check out the chisel3 tutorials.

In basics.scala there is one more module, a basic selector. At compile time this component generates n random numbers, and we can cycle through them to see which numbers were generated. The component comes with a test, which is run when you do sbt run. You should study this component. What is the difference between if/else and when/otherwise?

Matrix matrix multiplication

When designing digital logic you should always start with decomposition. Your first task is therefore to implement a dot product calculator, since a matrix matrix multiplication is essentially a series of these.

Dot Prod

First, let's consider what a dot product calculator would look like in regular Scala:


  val vecA = List(1,  2, 4)
  val vecB = List(2, -3, 1)

  val dotProductForLoop = {
    var dotProduct = 0
    for(i <- 0 until vecA.length){
      dotProduct = dotProduct + (vecA(i) * vecB(i))
    }
    dotProduct
  }

In the for loop you can see how the dot product is calculated sequentially by multiplying vector values at the same index and summing the results.
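
For comparison, the same calculation can also be written without a mutable accumulator. This is just an alternative sketch in regular Scala; the names DotProductExamples and dotProductFunctional are hypothetical.

```scala
object DotProductExamples {
  val vecA = List(1,  2, 4)
  val vecB = List(2, -3, 1)

  // Pair up the elements with the same index, multiply each pair, then sum.
  val dotProductFunctional: Int =
    vecA.zip(vecB).map { case (a, b) => a * b }.sum
}
```

With the vectors above both versions compute 1*2 + 2*(-3) + 4*1 = 0.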

To implement this logic in hardware, the first thing you need is some way to represent a vector, which is your first task.

Task 1 - Vector

The first component you should implement is a register bank for storing a vector. This module works as follows:

// pseudocode

let dataOut(T) = if (T - vectorLength) < 0 then 0 else
                 if enableIn(T - vectorLength) 
                   then dataIn(T - vectorLength)
                 else
                   dataOut(T - vectorLength)

The figure below makes the principle of operation clearer.

To test your implementation you can run sbt> testOnly Core.daisyVecSpec in your sbt console.

/sindre/Chisel-intro/src/commit/e9a8550f7c03322c2b7c76b5b1b26ac423891592/tdt4255figs/pngs/vector.png
A vector with 4 registers

This shows a vector with 4 registers. Row 1 shows cycles 0 to 3, row 2 shows cycles 4 to 7, etc. After writing, the write enable signal is turned off, so the values held in the registers are not overwritten.
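
The pseudocode for the vector can be turned into a small reference model in plain Scala, which may be handy for checking your expectations cycle by cycle. This is a sketch of the specification only, not a Chisel implementation. VectorModel and dataOutTrace are hypothetical names, and the inputs are given as one value per cycle.

```scala
object VectorModel {
  // Computes dataOut for every cycle T, following the pseudocode:
  // 0 while T - vectorLength < 0, otherwise either the value written
  // vectorLength cycles ago (if enabled then) or the old output.
  def dataOutTrace(vectorLength: Int,
                   dataIn: Vector[Int],
                   enableIn: Vector[Boolean]): Vector[Int] = {
    require(dataIn.length == enableIn.length, "one enable per data value")
    val out = Array.fill(dataIn.length)(0)
    for (t <- dataIn.indices) {
      val prev = t - vectorLength
      out(t) =
        if (prev < 0) 0
        else if (enableIn(prev)) dataIn(prev)
        else out(prev)
    }
    out.toVector
  }
}
```

With vectorLength = 4, data 1 to 12 and write enable asserted only during cycles 0 to 3, the trace is 0, 0, 0, 0, 1, 2, 3, 4, 1, 2, 3, 4: the values written in the first four cycles keep circulating once the write enable is turned off.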

In RTL, the black box surrounding the vector shows only the inputs and outputs. The figure shows the black box corresponding to the last column of the previous figure.

/sindre/Chisel-intro/src/commit/e9a8550f7c03322c2b7c76b5b1b26ac423891592/tdt4255figs/pngs/vectorBB.png

Task 2 - Dot Product

Your next task is to implement a dot product calculator. daisyDot should calculate the dot product of two vectors, inA and inB. Ensure that validOut is only asserted when you have a result. Ensure that your accumulator gets flushed after calculating your dot product.
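
One way to think about the accumulator and validOut is as a cycle by cycle software model, sketched here in plain Scala. The names inA, inB and validOut come from the task description. DotModel, elements, step and the returned data value are hypothetical, and the exact cycle on which the tests expect validOut is an assumption, so check the skeleton code and the test before relying on this timing.

```scala
// Hypothetical software model of a dot product calculator, for illustration only.
final class DotModel(elements: Int) {
  private var accumulator = 0 // running sum of products
  private var count = 0       // how many element pairs have been consumed

  // Models one clock cycle. Returns (data, validOut) for that cycle.
  def step(inA: Int, inB: Int): (Int, Boolean) = {
    val sum = accumulator + inA * inB
    val isLast = count == elements - 1
    if (isLast) {
      // Flush so the next dot product starts from zero.
      accumulator = 0
      count = 0
    } else {
      accumulator = sum
      count += 1
    }
    (sum, isLast)
  }
}
```

Feeding the pairs (1, 2), (2, -3), (4, 1) into a DotModel(3) yields valid = false twice and then the result 0 with valid = true. A second vector started right afterwards is unaffected by the first, because the accumulator was flushed.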

Implement the dot product calculator in daisyDot.scala

To test your implementation you can run sbt> testOnly Core.daisyDotSpec in your sbt console

Task 3 - Vector Matrix multiplication

Once you have a dot product calculator, a vector matrix multiplier is not that different. In imperative code we get something like this:

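type Matrix[A] = List[List[A]]

// Dot product of two equally long vectors
def dotProduct(vecA: List[Int], vecB: List[Int]): Int =
  vecA.zip(vecB).map { case (a, b) => a * b }.sum

def vectorMatrixMultiply(vec: List[Int], matrix: Matrix[Int]): List[Int] = {
  val transposed = matrix.transpose // row ii of transposed is column ii of matrix

  // One output element per column of matrix
  val outputVector = Array.ofDim[Int](transposed.length)
  for(ii <- 0 until transposed.length){
    outputVector(ii) = dotProduct(vec, transposed(ii))
  }
  outputVector.toList
}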

This is just repeated application of dotProduct. Since each element of a vector matrix product is the dot product of the vector and a column of the matrix, the matrix is transposed so that its columns can be accessed as rows. The skeleton code contains more hints if this did not make any sense.
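
A small worked example may help here. The sketch below is self-contained plain Scala with hypothetical names (VecMatExample, dot), and it uses a non-square matrix on purpose to show that the result has one element per column.

```scala
object VecMatExample {
  // Dot product of two equally long vectors.
  def dot(a: List[Int], b: List[Int]): Int =
    a.zip(b).map { case (x, y) => x * y }.sum

  // One dot product per column of the matrix.
  def vectorMatrixMultiply(vec: List[Int], matrix: List[List[Int]]): List[Int] =
    matrix.transpose.map(column => dot(vec, column))

  val vec    = List(1, 2)
  val matrix = List(List(1, 2, 3),
                    List(4, 5, 6)) // 2 rows, 3 columns
  val result = vectorMatrixMultiply(vec, matrix)
}
```

The columns of the matrix are (1, 4), (2, 5) and (3, 6), so the result is List(1*1 + 2*4, 1*2 + 2*5, 1*3 + 2*6) = List(9, 12, 15).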

Subtask 1 - representing a matrix

As with the dot product calculator, the first step is to implement a register bank, this time for storing a matrix. This can be done by creating n vectors from Task 1 and then selecting which row is the 'current' row.

Implement this in daisyGrid.scala

The matrix representation you have created in this task allows you to select which row to read, but not which column. This isn't very efficient when you want to read an entire column, since you would have to wait a full cycle for each row. The way we deal with this is by noticing that when multiplying two matrices we work on a row basis in matrix A and on a column basis in matrix B. If we simply transpose matrix B, then accessing the rows of the transposed matrix is the same as accessing the columns of matrix B.

A consequence of this is that the API exposed by your matrix multiplier requires matrix B to be transposed.
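
In software terms, such an API could look like the following plain Scala sketch, where the caller passes B already transposed. MatMulModel, dot and matMulTransposedB are hypothetical names chosen for illustration; this is not the interface of the skeleton code.

```scala
object MatMulModel {
  type Matrix = List[List[Int]]

  // Dot product of two equally long vectors.
  def dot(a: List[Int], b: List[Int]): Int =
    a.zip(b).map { case (x, y) => x * y }.sum

  // bTransposed holds the columns of B as rows, so both operands
  // are only ever read row by row.
  def matMulTransposedB(a: Matrix, bTransposed: Matrix): Matrix =
    a.map(rowA => bTransposed.map(colB => dot(rowA, colB)))
}
```

For A = ((1, 2), (3, 4)) and B = ((5, 6), (7, 8)), the transposed B is ((5, 7), (6, 8)), and matMulTransposedB returns ((19, 22), (43, 50)), which is the ordinary product of A and B.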

Subtask 2 - vector matrix multiplication

You now have the necessary pieces to create a vector matrix multiplier. Your implementation should have a vector and a matrix (grid). Input for the vector is in order; input for the matrix is transposed.

Implement this in daisyVecMat.scala

Task 4 - Matrix Matrix multiplication

You can now implement a matrix matrix multiplier. You can (and should) reuse the code from the vector matrix multiplier for this module.

Implement this in daisyMatMul.scala

When all tests are green you are good to go.

Bonus exercise - Introspection on code quality and design choices

This "exercise" has no deliverable, but you should spend some time thinking about what parts gave you most trouble and what you can do to change your approach.

In addition, the implementation you were railroaded into has a flaw that leads to unnecessary code duplication when going from a vector matrix multiplier to a matrix matrix multiplier.

Why did this happen, and how could this have been avoided?