* Excercise Zero
  The goal of this excercise is to gain some familiarity with developing for 
  FPGAs using chisel. 
  In this exercise you will implement a circuit capable of performing matrix 
  matrix multiplication in the chisel hardware description language.
  
* Chisel
** Prerequisites
   + *You should have some idea of how digital logic circuits work.*

     Do you know what a NAND gate is? 
     Do you know how many wires you need to address 64kb of memory? 
     If so you should be able to pick it up :)

   + *You must be able to run scala programs.*

     If you can run java then you can run scala.
     If not grab the jvm. Remember to curse Larry Page if you pick it up from the
     oracle webpage.

   + *Some flavor of GNU/Linux, or at least something UNIX-like.*

     If you use anything other than Ubuntu 16.04 or 18.04 I won't be able to offer
     help if something goes wrong.

   + *An editor suited for scala.*

     My personal recommendation is GNU Emacs with emacs-lsp for IDE features along
     with the metals language server (which works for any editor with lsp (language 
     server protocol), such as vim, vscode and atom).
     If you prefer an IDE I hear good things about intelliJ, however I haven't tested
     it personally, so if odd stuff happens I can't help you.

   + *Optional: sbt*

     You can install the scala build tool on your system, but for convenience I've
     included a bootstrap script in sbt.sh.
     sbt will select the correct version for you, so you don't have to worry about
     getting the wrong version.


** Terms
   Before delving into code it's necessary to define some terms.
   
   + *Wire*

     A wire is a bundle of 1 to N condictive wires (yes, that is a recursive 
     definition, but I think you get what I mean). These wires are connected
     either to ground or a voltage source, corresponding to 0 or 1, which
     is useful for representing numbers
     
     We can define a wire consisting of 4 physical wires in chisel like this
     #+begin_src scala
     val myWire = Wire(UInt(4.W))
     #+end_src
 
   + *Driving*

     A wire in on itself is rather pointless since it doesn't do anything.
     In order for something to happen we need to connect them.
     #+begin_src scala
     val wireA = Wire(UInt(4.W))
     val wireB = Wire(UInt(4.W))
     wireA := 2.U
     wireB := wireA
     #+end_src
     Here wireA is driven by the signal 2.U, and wireB is driven by wireA.
     
     For well behaved circuits it does not make sense to let a wire be driven 
     by multiple sources which would make the resulting signal undefined.
     
     Similarily a circular dependency is not allowed a la
     #+begin_src scala
     val wireA = Wire(UInt(4.W))
     val wireB = Wire(UInt(4.W))
     wireA := wireB
     wireB := wireA
     #+end_src
     
     Physically it *is* possible to have multiple drivers, but it's not a good idea
     as attempting to drive a wire with 0 and 1 simultaneously causes a short circuit
     which is definitely not a good thing.
     
   + *Module*

     In order to make development easier we separate functionality into modules, 
     defined by its inputs and outputs.
 
   + *Combinatory circuit*

     A combinatory circuit is a circuit whose output is based only on its
     inputs.
     
   + *Stateful circuit*

     A circuit that will give different results based on its internal state.
     In common parlance, a circuit without registers (or memory) is combinatory
     while a circuit with registers is stateful.
 
   + *Chisel Graph*

     A chisel program is a program whose result is a graph which can be synthesized
     to a transistor level schematic of a logic circuit.
     When connecting wires wireA and wireB we were actually manipulating a graph
     (actually, two subgraphs that were eventually combined into one).
     The chisel graph is directed, but it does allow cycles so long as they are not
     combinatorial.

** Your first component
   The code for the snippets in this subchapter can be found in Example.scala in the test directory.
   You can run them using sbt by running ./sbt in your project root which will open
   your sbt console.

   This will start a large download, so be patient even if it looks like it's stuck.

   The first component we will consider is a simple combinatorial incrementor:
   #+begin_src scala
   // Using `val` in a class argument list makes that value public.
   class MyIncrement(val incrementBy: Int) extends Module {
     val io = IO(
       new Bundle {
         val dataIn  = Input(UInt(32.W))
         val dataOut = Output(UInt(32.W))
       }
     )
   
     io.dataOut := io.dataIn + incrementBy.U
   }
   #+end_src
   
   with incrementBy = 3 we get the following circuit:
   TODO: Fig
   

** Testing your chisel component
   After creating a module you might wonder how it can be run.
   It is not a program, it's a description of a digital circuit, so in order to "run" a chisel model
   we have to simulate it.
   
   This is done by creating a test program where the test runner drives inputs and reads outputs from 
   the module using what is called a peek poke tester.
   

*** Creating a peek poke tester
    #+begin_src scala
    class TheTestRunner(c: MyIncrement) extends PeekPokeTester(c)  {
      for(ii <- 0 until 5){
        poke(c.io.dataIn, ii)
        val o = peek(c.io.dataOut)
        println(s"At cycle $ii the output of myIncrement was $o")
        expect(c.io.dataOut, ii+c.incrementBy)
      }
    }
    #+end_src
    
    There are three features of the peek poke tester on display here:

    1. a peek poke tester has the ability to peek at a value, returning its state. 
       This however is restricted to the modules *io only*

    2. it has the ability to poke (set) the value of an input signal.
       Again, this can be done to *input io only*
    
    3. It can expect a signal to have a certain value and throw an exception if not met.
       Expect is defined in terms of peek.
    
    A peek poke tester can also /step/ the circuit, this will be covered when stateful circuits
    have been introduced.
   
*** Running a peek poke tester
    The test runner class we have defined requires a MyIncrement module that can be simulated.
    However, by writing ~~val inc3 = Module(new MyIncrement(3))~~ the return value is a *chisel graph*,
    i.e a schematic for a circuit.
    In order to interact with a circuit the schematic must be interpreted, resulting in a simulated
    circuit which the peek poke tester can interact with.
    
    If this isn't clear don't worry, in terms of code all we need to do is to invoke a chisel method for
    building the circuit:
    
    #+begin_src scala
    chisel3.iotesters.Driver(() => new MyIncrement(3)) { c =>
      new TheTestRunner(c)
    }
    #+end_src

    Unfortunately this code might be a little hard to parse if you're new to scala. 
    Understanding it is not necessary, it is sufficient to simply swap 
    ~~MyIncrement(3)~~ in ~~(() => MyIncrement(3))~~ with the module you want to test, and
    ~~TheTestRunner(c)~~ with the test runner you want to run.

    #+begin_src scala
    chisel3.iotesters.Driver(() => new MyIncrement(3)) { c =>
      new TheTestRunner(c)
    }
    #+end_src

*** Putting it together into a runnable test
    We want to be able to run our test from sbt. To do this we use the scalatest framework.
    A test looks like this:

    #+begin_src scala
    class MyTest extends FlatSpec with Matchers {
      behavior of "the test that I have written"

      it should "sum two numbers" in {
        2 + 2
      } should be 4
    }
    #+end_src
    
    The tester class introduces a lot of special syntax, but like above it is not necessary 
    to understand how, simply using the template above is sufficient.

    Applying the tester template we end up with:

    #+begin_src scala
    class MyIncrementTest extends FlatSpec with Matchers {
      behavior of "my increment"
    
      it should "increment its input by 3" in {
        chisel3.iotesters.Driver(() => new MyIncrement(3)) { c =>
          new TheTestRunner(c)
        } should be(true)
      }
    }
    #+end_src
    
    By creating this test it is now possible to run it from sbt.
    There are two ways to test. By simply writing "test" in the sbt console as so:
    ~~sbt:chisel-module-template> test~~ every single test will be run.
    Since this creates a lot of noise it's more useful to run "testOnly":
    ~~sbt:chisel-module-template> testOnly Examples.MyIncrementTest~~ where "Examples" 
    is the name of the package and MyIncrementTest is the name of the test.
    The tests for the exercise itself is located in Ex0, so for instance
    ~~sbt:chisel-module-template> testOnly Ex0.MatrixSpec~~ will run the matrix test.
    
    Note: by running "test" once you can use tab completion in the sbt shell to find tests with testOnly.
    using testOnly <TAB>
    
    Running the test should look something like this.
    #+begin_src
    sbt:chisel-module-template> testOnly Examples.MyIncrementTest
    Run starting. Expected test count is: 0
    ...
    Circuit state created
    [info] [0.001] SEED 1556890076413
    [info] [0.002] At cycle 0 the output of myIncrement was 3
    [info] [0.003] At cycle 1 the output of myIncrement was 4
    [info] [0.003] At cycle 2 the output of myIncrement was 5
    [info] [0.003] At cycle 3 the output of myIncrement was 6
    [info] [0.003] At cycle 4 the output of myIncrement was 7
    test MyIncrementTestMyIncrement Success: 5 tests passed in 5 cycles taking 0.011709 seconds
    [info] [0.004] RAN 0 CYCLES PASSED
    - should increment its input by 3
    Run completed in 771 milliseconds.
    ...
    #+end_src

    In the Example.scala file you can find the entire test.
    The only difference is that everything is put in the same class.

** Using modules
   Let's see how we can use our module by instantiating it as a submodule:
   #+begin_src scala
   class MyIncrementTwice(incrementBy: Int) extends Module {
     val io = IO(
       new Bundle {
         val dataIn  = Input(UInt(32.W))
         val dataOut = Output(UInt(32.W))
       }
     )
   
     val first  = Module(new MyIncrement(incrementBy))
     val second = Module(new MyIncrement(incrementBy))
   
     first.io.dataIn  := io.dataIn
     second.io.dataIn := first.io.dataOut
   
     io.dataOut := second.io.dataOut
   }
   #+end_src
   
   The RTL diagram now looks like this:
   
   Note how the modules ~~first~~ and ~~second~~ are now drawn as black boxes. 
   When drawing RTL diagrams we're not interested in the internals of submodules.

   However, what if we want to instantiate an arbitrary amount of incrementors and chain them?
   To see how this can be done it is necessary to take a detour:


** Scala and chisel
   A major stumbling block for learning chisel is understanding the difference between scala and chisel.
   To highlight the difference between the two consider how HTML is generated.
 
   When creating a list we could just write the HTML manually
   #+begin_src html
   <ul>
     <li>Name: Siv Jensen, Affiliation: FrP</li>
     <li>Name: Jonas Gahr Støre, Affiliation: AP</li>
     <li>Name: Bjørnar Moxnes, Affiliation: Rødt</li>
     <li>Name: Malcolm Tucker, Affiliation: DOSAC</li>
   </ul>
   #+end_src
   
   However this is rather cumbersome, so we generate HTML programatically.
   In scala we might do something (sloppy) like this:
   #+begin_src scala
   def generateList(politicians: List[String], affiliations: Map[String, String]): String = {
     val inner = new ArrayBuffer[String]()
     for(ii <- 0 until politicians.size){
       val nameString = politicians(ii)
       val affiliationString = affiliations(nameString)
       inner.add(s"<li>Name: $nameString, Affiliation: $affiliationString</li>")
     }
     "<ul>\n" + inner.mkString("\n") + "</ul>"
   }
 
   // Or if you prefer brevity
   def generateList2(politicians: List[String], affiliations: Map[String, String]): String = {
     val inner = politicians.map(p => s"<li>Name: $p, Affiliation ${affiliations(p)}</li>")
     "<ul>\n" + inner.mkString("\n") + "</ul>"
   }
   #+end_src
   
   Similarily we can use constructs such as for loops to manipulate the chisel graph:
   
   #+begin_src scala
   class MyIncrementN(val incrementBy: Int, val numIncrementors: Int) extends Module {
     val io = IO(
       new Bundle {
         val dataIn  = Input(UInt(32.W))
         val dataOut = Output(UInt(32.W))
       }
     )
   
     val incrementors = Array.fill(numIncrementors){ Module(new MyIncrement(incrementBy)) }
   
     for(ii <- 1 until numIncrementors){
       incrementors(ii).io.dataIn := incrementors(ii - 1).io.dataOut
     }
   
     incrementors(0).io.dataIn := io.dataIn
     io.dataOut := incrementors.last.io.dataOut
   }
   #+end_src
   Keep in mind that the for-loop only exists at design time, just like a for loop
   generating a table in HTML will not be part of the finished HTML.
   
   *Important!*
   In the HTML examples differentiating the HTML and scala was easy because they're
   fundamentally very different. However with hardware and software there is a much
   larger overlap.
   A big pitfall is vector types and indexing, since these make sense both in software
   and in hardware.


*** Troubleshooting a vector design
    Here's a rather silly example highligthing the confusion that can happen when mixing
    scala and chisel.
    
    Some of the code here will cause compiler errors, thus the corresponding code in 
    Examples/myVector.scala is commented out.
    #+begin_src scala
    class MyVector() extends Module {
      val io = IO(
        new Bundle {
          val idx = Input(UInt(32.W))
          val out = Output(UInt(32.W))
        }
      )
    
      val values = List(1, 2, 3, 4)
 
      io.out := values(io.idx)
    }
    #+end_src
    
    If you try to compile this you will get an error.
    
    #+begin_src scala
    sbt:chisel-module-template> compile
    ...
    [error]  found   : chisel3.core.UInt
    [error]  required: Int
    [error]   io.out := values(io.idx)
    [error]                       ^
    #+end_src
 
    This error tells us that io.idx was of the wrong type, namely a chisel UInt.
    The List is a scala construct, it only exists when your design is synthesized, so
    attempting to index using a chisel type would be like HTML attempting to index the
    generating scala code which is nonsensical.
    Let's try again:
 
    #+begin_src scala
    class MyVector() extends Module {
      val io = IO(
        new Bundle {
          val idx = Input(UInt(32.W))
          val out = Output(UInt(32.W))
        }
      )
    
      // val values: List[Int] = List(1, 2, 3, 4)
      val values = Vec(1, 2, 3, 4)
 
      io.out := values(io.idx)
    }
    #+end_src
    
    Egads, now we get this instead
    #+begin_src scala
    [error] /home/peteraa/datateknikk/TDT4255_EX0/src/main/scala/Tile.scala:30:16: inferred type arguments [Int] do not conform to macro method apply's type parameter bounds [T <: chisel3.Data]
    [error]   val values = Vec(1, 2, 3, 4)
    [error]                ^
    [error] /home/peteraa/datateknikk/TDT4255_EX0/src/main/scala/Tile.scala:30:20: type mismatch;
    [error]  found   : Int(1)
    [error]  required: T
    [error]   val values = Vec(1, 2, 3, 4)
    ...
    #+end_src
 
    What is going wrong here? In the error message we see that the type Int cannot be constrained to a 
    type T <: chisel3.Data, but what does that mean?
 
    The <: symbol means subtype, meaning that the compiler expected the Vec to contain a chisel data type
    such as chisel3.Data.UInt or chisel3.Data.Boolean, and Int is not one of them!
    
    A scala int represent 32 bits in memory, whereas a chisel UInt represents a bundle of wires that we
    interpret as an unsigned integer, thus they are not interchangeable although they represent roughly
    the same thing.
    
    Let's fix this
    #+begin_src scala
    class MyVector() extends Module {
      val io = IO(
        new Bundle {
          val idx = Input(UInt(32.W))
          val out = Output(UInt(32.W))
        }
      )
    
      val values = Vec(1.U, 2.U, 3.U, 4.U)
      
      // Alternatively
      // val values = Vec(List(1, 2, 3, 4).map(scalaInt => UInt(scalaInt)))
 
      io.out := values(io.idx)
    }
    #+end_src
    
    This works!
    So, it's impossible to access scala collections with chisel types, but can we do it the other way around?
    
    #+begin_src scala
    class MyVector() extends Module {
      val io = IO(
        new Bundle {
          val idx = Input(UInt(32.W))
          val out = Output(UInt(32.W))
        }
      )
    
      val values = Vec(1.U, 2.U, 3.U, 4.U)
 
      io.out := values(3)
    }
    #+end_src
    
    ...turns out we can?
    This is nonsensical, however thanks to behind the scenes magic the 3 is changed
    to 3.U, much like [] can be a boolean in javascript.
 
 
    To get acquainted with the (rather barebones) testing environment, let's test this.
    #+begin_src scala
    class MyVecSpec extends FlatSpec with Matchers {
      behavior of "MyVec"
    
      it should "Output whatever idx points to" in {
        wrapTester(
          chisel3.iotesters.Driver(() => new MyVector) { c =>
            new MyVecTester(c)
          } should be(true)
        )
      }
    }
    
    
    class MyVecTester(c: MyVector) extends PeekPokeTester(c)  {
      for(ii <- 0 until 4){
        poke(c.io.idx, ii)
        expect(c.io.out, ii)
      }
    }
    #+end_src
    
    #+begin_src
    sbt:chisel-module-template> testOnly Ex0.MyVecSpec
    ...
    ...
    [info] Compiling 1 Scala source to /home/peteraa/datateknikk/TDT4255_EX0/target/scala-2.12/test-classes ...
    ...
    ...
    MyVecSpec:
    MyVec
    [info] [0.001] Elaborating design...
    ...
    Circuit state created
    [info] [0.001] SEED 1556197694422
    test MyVector Success: 4 tests passed in 5 cycles taking 0.009254 seconds
    [info] [0.002] RAN 0 CYCLES PASSED
    - should Output whatever idx points to
    Run completed in 605 milliseconds.
    Total number of tests run: 1
    Suites: completed 1, aborted 0
    Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
    All tests passed.
    #+end_src
 
    Great!

** Compile time and synthesis time
   In the HTML example, assume that we omitted the last </ul> tag. This would not
   create valid HTML, however the code will happily compile. Likewise, we can easily
   create invalid chisel:
 
   #+begin_src scala
   class Invalid() extends Module {
     val io = IO(new Bundle{})
   
     val myVec = Module(new MyVector)
   }
   #+end_src
 
   This code will happily compile!
   Turns out that when compiling, we're not actually generating any chisel at all!
   Let's create a test that builds chisel code for us:
   
   #+begin_src scala
   class InvalidSpec extends FlatSpec with Matchers {
     behavior of "Invalid"
   
     it should "Probably fail in some sort of way" in {
       chisel3.iotesters.Driver(() => new Invalid) { c =>
 
         // chisel tester expects a test here, but we can use ???
         // which is shorthand for throw new NotImplementedException.
         //
         // This is OK, because it will fail during building.
         ???
       } should be(true)
     }
   }
   #+end_src
   
   This gives us the rather scary error:
 
   #+begin_src scala
   sbt:chisel-module-template> compile
   ...
   [success] Total time: 3 s, completed Apr 25, 2019 3:15:15 PM
   ...
   sbt:chisel-module-template> testOnly Ex0.InvalidSpec
   ...
   firrtl.passes.CheckInitialization$RefNotInitializedException: @[Example.scala 25:21:@20.4] : [module Invalid]  Reference myVec is not fully initialized.
    : myVec.io.idx <= VOID
   at firrtl.passes.CheckInitialization$.$anonfun$run$6(CheckInitialization.scala:83)
   at firrtl.passes.CheckInitialization$.$anonfun$run$6$adapted(CheckInitialization.scala:78)
   at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:789)
   at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:138)
   at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:236)
   at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:229)
   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
   at scala.collection.mutable.HashMap.foreach(HashMap.scala:138)
   at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:788)
   at firrtl.passes.CheckInitialization$.checkInitM$1(CheckInitialization.scala:78)
   #+end_src
   
   While scary, the actual error is only this line:
   #+begin_src scala
   firrtl.passes.CheckInitialization$RefNotInitializedException: @[Example.scala 25:21:@20.4] : [module Invalid]  Reference myVec is not fully initialized.
    : myVec.io.idx <= VOID
   #+end_src
   
   Which tells us that myVec has unInitialized wires!
   While our program is correct, it produces an incorrect design, in other words, the scala part
   of the code is correct as it compiled, but the chisel part is incorrect because it does not synthesize.
   
   Let's fix it:
   #+begin_src scala
   class Invalid() extends Module {
     val io = IO(new Bundle{})
   
     val myVec = Module(new MyVector)
     myVec.io.idx := 0.U
   }
   #+end_src
   
   Hooray, now we get ~scala.NotImplementedError: an implementation is missing~
   as expected, along with an enormous stacktrace..
 
   The observant reader may have observed that it is perfectly legal to put chisel types in scala
   collection, how does that work?
   
   A scala collection is just a collection of references, or pointers if you will.
   If it happens to contain values of chisel types then these will exist in the design, however the
   collection will not, so we cannot index based on the collection.
   
   This can be seen in ~MyIncrementN~ where an array of incrementors is used.
   The array is only used help the scala program wire the components together, and once this is
   done the array is not used.
   We could do the same with MyVector, but it's not pretty:
 
   #+begin_src scala
   class MyVector2() extends Module {
     val io = IO(
       new Bundle {
         val idx = Input(UInt(32.W))
         val out = Output(UInt(32.W))
       }
     )
   
     val values = Array(0.U, 1.U, 2.U, 3.U)
   
     io.out := values(0)
     for(ii <- 0 until 3){
       when(io.idx === ii.U){
         io.out := values(ii)
       }
     }
   }
   #+end_src
   
   Note that it is nescessary to specify a default for io.out even though it will never be
   selected.
   While it looks ugly, the generated hardware should, at least in theory, not take up any
   more space or run any slower than the Vec based implementation, save for one difference
   as we will see in the next section.
   
 
** Bit Widths
   What happens if we attempt to index the 6th element in our 4 element vector?
   In MyVector we get 1, and in MyVector2 we get 0, so they're not exactly the same.
   In MyVector the Vec has 4 elements, thus only two wires are necessary (00, 01, 10, 11),
   thus the remaining 28 wires of io.idx are not used.
   
   In MyVector2 on the other hand we have specified a default value for io.out, so for any
   index higher than 3 the output will be 0.
 
   What about the values in the Vec?
   0.U can be represented by a single wire, whereas 3.U must be represented by at
   least two wires.
   In this case it is easy for chisel to see that they must both be of width 32 since they will
   be driving the output signal which is specified as 32 bit wide.
 
   In theory specifying widths should not be necessary other than at the very endpoints of your
   design, however this would quickly end up being intractable, so we specify widths at module
   endpoints.

** Stateful circuits
 
   #+begin_src scala
   class SimpleDelay() extends Module {
     val io = IO(
       new Bundle {
         val dataIn  = Input(UInt(32.W))
         val dataOut = Output(UInt(32.W))
       }
     )
     val delayReg = RegInit(UInt(32.W), 0.U)
   
     delayReg   := io.dataIn
     io.dataOut := delayReg
   }
   #+end_src
   
   This circuit seems rather pointless, it simply assigns the input to the output.
   However, unlike the previous circuits, the simpleDelay circuit stores its value 
   in a register, causing a one cycle delay between input and output.
   
   Lets try it!
   #+begin_src scala
   class DelaySpec extends FlatSpec with Matchers {
     behavior of "SimpleDelay"
   
     it should "Delay input by one timestep" in {
       chisel3.iotesters.Driver(() => new SimpleDelay) { c =>
         new DelayTester(c)
       } should be(true)
     }
   }
   
   
   class DelayTester(c: SimpleDelay) extends PeekPokeTester(c)  {
     for(ii <- 0 until 10){
       val input = scala.util.Random.nextInt(10)
       poke(c.io.dataIn, input)
       expect(c.io.dataOut, input)
     }
   }
   #+end_src
   
   We then run the test:
 
   #+begin_src
   sbt:chisel-module-template> testOnly Ex0.DelaySpec
   ...
   [info] [0.001] Elaborating design...
   [info] [0.071] Done elaborating.
   Total FIRRTL Compile Time: 144.7 ms
   Total FIRRTL Compile Time: 9.4 ms
   End of dependency graph
   Circuit state created
   [info] [0.001] SEED 1556196281084
   [info] [0.002] EXPECT AT 0   io_dataOut got 0 expected 7 FAIL
   [info] [0.002] EXPECT AT 0   io_dataOut got 0 expected 6 FAIL
   [info] [0.003] EXPECT AT 0   io_dataOut got 0 expected 1 FAIL
   [info] [0.003] EXPECT AT 0   io_dataOut got 0 expected 2 FAIL
   [info] [0.003] EXPECT AT 0   io_dataOut got 0 expected 7 FAIL
   [info] [0.003] EXPECT AT 0   io_dataOut got 0 expected 4 FAIL
   [info] [0.003] EXPECT AT 0   io_dataOut got 0 expected 8 FAIL
   [info] [0.003] EXPECT AT 0   io_dataOut got 0 expected 8 FAIL
   [info] [0.003] EXPECT AT 0   io_dataOut got 0 expected 7 FAIL
   #+end_src
 
   Oops, the tester doesn't advance the clock befor testing output, totally didn't
   make an error on purpose to highlight that...
   
   #+begin_src scala
   class DelayTester(c: SimpleDelay) extends PeekPokeTester(c)  {
     for(ii <- 0 until 10){
       val input = scala.util.Random.nextInt(10)
       poke(c.io.dataIn, input)
       step(1)
       expect(c.io.dataOut, input)
     }
   }
   #+end_src
   
   Much better..
   
   You should now be able to implement myDelayN following the same principles as
   MyIncrementN
   
   #+begin_src scala
   class myDelayN(delay: Int) extends Module {
     val io = IO(
       new Bundle {
         val dataIn  = Input(UInt(32.W))
         val dataOut = Output(UInt(32.W))
       }
     )
   
     ???
   }
   #+end_src
   
   Before you continue you should have a good grasp on the difference between scala and
   chisel. For instance, what is the difference between ~=~ and ~:=~?
   If ~a~ is the input for a module, and ~b~ is the output, should it be ~a := b~ or ~b := a~?
   What's the difference between 
   ~if( ... ) ... else ...~
   and
   ~when( ... ){ ... }.elsewhen( ... ){ ... }.otherwise{ ... }~
   ?

** Debugging
   A rather difficult aspect in HDLs, including chisel is debugging.
   When debugging it is necessary to inspect how the state of the circuit evolves, which
   leaves us with two options, peekPokeTester and printf, however both have flaws.

*** Printf
    Printf statements will be executed once per clock cycle if the surrounding block is executed.
    The way printf statements work is that they create a special abstract hardware element that
    sends a message to your console whenever it is enabled (i.e. when it is either at the top
    level of a module, or in a conditional block whose condition has been met).


    This means we can put a printf statement in a module and have it print some state every 
    cycle, and we can put it inside a when block in order to conditionally print.
    
    Other than quickly creating a tremendous amount of noise, printf has a tendency to fool you
    since it often reports values that are one clock cycle off.

    To see this in action, try running EvilPrintfSpec

*** PeekPoke
    The good thing about PeekPokeTester is that it won't lie to you, but it's not a very
    flexible tester either.
    
    The most annoying flaw is that it cannot inspect the value of a submodule. 
    
    Consider the following module
    #+begin_src scala
    class Outer() extends Module {
      val io = IO(
        new Bundle {
          val dataIn  = Input(UInt(32.W))
          val dataOut = Output(UInt(32.W))
        }
      )
      
      val inner = Module(new Inner).io
      
      inner.dataIn := io.dataIn
      io.dataOut   := inner.dataOut
    }
    #+end_src
    
    It would be nice if we could use the peekPokeTester to inspect what goes on inside
    Inner, however this information gets removed before the peekPokeTester is run.
    
    The way I deal with this is using a multiIOModule.
    In this example I have done the same for inner, using a special debug IO bundle to
    separate the modules interface and whatever debug signals I'm interested in.
    
    MultiIOModule can do everything Module can, so if you want to you can use it everywhere.

    #+begin_src scala
    import chisel3.experimental.MultiIOModule

    class Outer() extends MultiIOModule {
      val io = IO(
        new Bundle {
          val dataIn  = Input(UInt(32.W))
          val dataOut = Output(UInt(32.W))
        }
      )
      
      val debug = IO(
        new Bundle {
          val innerState = Output(UInt(32.W))
        }
      )
      
      val inner = Module(new Inner)
      
      inner.io.dataIn := io.dataIn
      io.dataOut   := inner.io.dataOut
      
      debug.innerState := inner.debug.frobnicatorState
    }
    #+end_src

* Matrix matrix multiplication
  For your first foray into chisel you will design a matrix matrix multiplication unit.
  Matrix multiplication is fairly straight forward, however on hardware it's a little
  trickier than the standard for loops normally employed..
  
** Task 1 - Vector
   The first component you should implement is a register bank for storing a vector.
   
   In Vector.scala you will find the skeleton code for this component.
   Unlike the standard Chisel.Vec our custom vector has a read enable which means that
   the memory pointed to by idx will only be overWritten when readEnable is true.
   (You could argue that writeEnable would be a more fitting name, it's a matter of
   perspective)

   Implement the vector and test that it works by running
   ~testOnly Ex0.VectorSpec~ in your sbt console.
   
** Task 2 - Matrix
   The matrix works just like the vector only in two dimensions.
   The skeleton code and associated tests should make the purpose of this module obvious.
   Run the tests with ~testOnly Ex0.VectorSpec~
   
** Task 3 - Dot Product
   This component differs from the two previous in that it has no explicit control input,
   which might at first be rather confusing.
   
   With only two inputs for data, how do we know when the dotproduct has been calculated?
   The answer to this is the ~elements~ argument, which tells the dot product calculator the
   size of the input vectors.
   Consequently, the resulting hardware can only (at least on its own) compute dotproducts
   for one size of vector, which is fine in our circuit.
   
   To get a better understanding we can model this behavior in regular scala:

   #+begin_src scala
   case class DotProdCalculator(vectorLen: Int, timeStep: Int, accumulator: Int){
     def update(inputA: Int, inputB: Int): (Int, Boolean, DotProdCalculator) = {
       val product = inputA * inputB
       if(((timeStep + 1) % vectorLen) == 0){
         (accumulator + product, true, this.copy(timeStep = 0, accumulator = 0))
       else
         (accumulator + product, false, this.copy(timeStep = this.timeStep + 1, accumulator = accumulator + product))
       }
     }
   }
   #+end_src

   To see it in action run ~testOnly Ex0.DPCsimulatorSpec~ in your sbt console.
   
   As with the previous tasks, the dot product calculator must pass the tests with
   ~testOnly Ex0.DotProdSpec~


** Task 4 - Matrix Matrix multiplication
   With our matrix modules and dot product calculators we have every piece needed to 
   implement the matrix multiplier.

   When performing matrix multiplication on a computer transposing the second matrix
   can help us reduce complexity by quite a lot. To examplify, consider 
      
   #+begin_src
       | 2,  5 |
   A = | 7, -1 |
       | 0,  4 |
       

   B = | 1,  1,  2 |
       | 0,  4,  0 |
   #+end_src
   
   It would be much simpler to just have two modules with the same dimensions, and we
   can do this by transposing B so we get
       
   #+begin_src
        | 2,  5 |
   A  = | 7, -1 |
        | 0,  4 |
       
        | 1,  0 |
   BT = | 1,  4 |
        | 2,  0 |
   #+end_src
   
   Now all we need to do is calculate the dot products for the final matrix:

   #+begin_src
   if A*B = C then

        |  A[0] × BT[0],   A[0] × BT[1],   A[0] × BT[2] |
   C  = |  A[1] × BT[0],   ...         ,   ...          |
        |  ...         ,   ...         ,   A[2] × BT[2] |

   where 
   A[0] × BT[0] is the dot product of [2, 5] and [1, 0]
   and
   A[0] × BT[1] is the dot product of [2, 5] and [1, 4]
   and so forth..
   #+end_src
   
   Because of this, the input for matrix B will be supplied transposed, thus you do not
   have to worry about this. For B the input would be [1, 0, 1, 4, 2, 0]
   
   The skeleton code for the matrix multiplier is less detailed, with only one test.
   You're encouraged to write your own tests to make this easier.
   Additionally, if you feel like you're getting stuck you can take a look at 
   MatMulTips.org
       
** Bonus exercise - Introspection on code quality and design choices
   This last exercise has no deliverable, but you should spend some time thinking about
   where you spent most of your efforts.

   A common saying is "A few hours of work can save you from several minutes of planning", 
   and this holds especially true for writing chisel!!