ハードウェアの気になるあれこれ (Hardware things I'm curious about)

A blog where I look into technical topics that interest me and write them up. Mostly about hardware.

Building something like a 3D memory in Chisel


This is another post about Chisel's Mem.
It started from a tweet of mine.

As you can probably guess from it, this post summarizes how to build a three-dimensional memory.

3D memories in Chisel

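SystemVerilog extends arrays to multiple dimensions, so you can build an array with any number of dimensions by adding more width specifiers (packed dimensions) before the variable name, like this: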

reg [1:0][2:0][7:0] mem[0:1023]; // packed dimensions [1:0][2:0][7:0], unpacked dimension [0:1023]
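To make the packed layout concrete, here is a small plain-Scala sketch (not SystemVerilog or Chisel) that computes where element `[i][j]` of one `[1:0][2:0][7:0]` packed word lives. It assumes the usual packed-array rule that the rightmost dimension varies fastest and that `[0][0]` sits at the least significant end; `PackedLayout`, `byteRange` and `extract` are made-up names for illustration.

```scala
// Software model of SystemVerilog packed-array indexing (not synthesizable code).
object PackedLayout {
  // Dimensions of `reg [1:0][2:0][7:0]`: 2 x 3 elements of 8 bits each
  val outer = 2
  val inner = 3
  val width = 8

  /** (msb, lsb) of element [i][j] inside one packed word of outer * inner * width bits.
    * The rightmost dimension varies fastest; [0][0] sits at the LSB end. */
  def byteRange(i: Int, j: Int): (Int, Int) = {
    require(0 <= i && i < outer && 0 <= j && j < inner, s"index [$i][$j] out of range")
    val lsb = (i * inner + j) * width
    (lsb + width - 1, lsb)
  }

  /** Extracts element [i][j] from a packed word held in a BigInt. */
  def extract(word: BigInt, i: Int, j: Int): Int = {
    val (_, lsb) = byteRange(i, j)
    ((word >> lsb) & ((BigInt(1) << width) - 1)).toInt
  }
}
```

For example, `byteRange(1, 2)` is `(47, 40)`, the top byte of the 48-bit packed word.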

How do you do this with Chisel's Mem?

As the tweet probably already gave away, you get the memory you want by passing nested Vecs as the second argument of Mem.

val m = Mem(10, Vec(2, Vec(3, UInt(8.W)))) // equivalent to "reg [1:0][2:0][7:0] m[0:9]"

Below is a complete memory module built this way. The input and output data are Vec(2, UInt(32.W)) and the write-enable signal wren is also a Vec, so writes can be controlled per element of the outer Vec(2).

import chisel3._
import chisel3.util.Cat

/**
  * 3D memory
  * @param useWriteTask selects whether to use MemBase's write method
  */
class Mem3D(useWriteTask: Boolean = false) extends Module {
  val io = IO(new Bundle {
    val wren = Input(Vec(2, Bool()))
    val rden = Input(Bool()) // not used: Mem reads are combinational
    val addr = Input(UInt(14.W))
    val wrdata = Input(Vec(2, UInt(32.W)))
    val rddata = Output(Vec(2, UInt(32.W)))
  })

  // Plain constants instead of m(0).length: calling m(0) on a Mem instantiates
  // an (unused) memory port just to look up the Vec size.
  val numBanks = 2 // outer Vec(2)
  val numBytes = 4 // inner Vec(4, UInt(8.W))

  // Equivalent to "reg [1:0][3:0][7:0] m[0:15]"
  val m = Mem(16, Vec(numBanks, Vec(numBytes, UInt(8.W))))

  if (useWriteTask) {
    /**
      * Using MemBase's write method
      */
    val wrdata = Wire(Vec(numBanks, Vec(numBytes, UInt(8.W))))

    // Split each 32-bit input word into bytes (byte 0 = bits [7:0])
    for {i <- 0 until numBanks
         j <- 0 until numBytes} {
      wrdata(i)(j) := io.wrdata(i)(((j + 1) * 8) - 1, j * 8)
    }

    // The current MemBase.write only accepts one mask bit per element of the outer Vec(2)
    m.write(io.addr, wrdata, io.wren)
  } else {
    /**
      * Here each Vec element is written individually, so with a bit more work
      * a per-UInt(8.W) (per-byte) mask is possible (not implemented here)
      */
    for {i <- 0 until numBanks
         j <- 0 until numBytes} {
      when (io.wren(i)) {
        m(io.addr)(i)(j) := io.wrdata(i)(((j + 1) * 8) - 1, j * 8)
      }
    }
  }

  /**
    * The I/O data type is UInt(32.W), so Cat the Vec contents
    * to convert Vec(4, UInt(8.W)) -> UInt(32.W)
    */
  for (i <- 0 until numBanks) {
    io.rddata(i) := Cat(m(io.addr)(i).reverse)
  }
}
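If you want a golden model to compare against in a tester, a plain-Scala sketch of the behavior Mem3D is meant to have could look like this. `Mem3DModel` and its methods are hypothetical names; the model assumes a non-negative address, keeps only the low 4 bits like the generated RTL (`io_addr[3:0]`), and uses the same byte order as `Cat(m(io.addr)(i).reverse)`, so byte 0 ends up in bits [7:0].

```scala
// Pure-Scala reference model of Mem3D's intended behavior (not generated from the Chisel code).
class Mem3DModel(depth: Int = 16, banks: Int = 2, bytesPerBank: Int = 4) {
  // mem(addr)(bank)(byte), mirroring Mem(16, Vec(2, Vec(4, UInt(8.W))))
  private val mem = Array.fill(depth, banks, bytesPerBank)(0)

  /** One clock edge: every bank whose wren bit is set stores its 32-bit word, byte by byte. */
  def write(addr: Int, wren: Seq[Boolean], wrdata: Seq[Long]): Unit = {
    val a = addr % depth // assumes addr >= 0; same as io_addr[3:0] for depth 16
    for (bank <- 0 until banks if wren(bank); byte <- 0 until bytesPerBank) {
      mem(a)(bank)(byte) = ((wrdata(bank) >> (8 * byte)) & 0xff).toInt
    }
  }

  /** Combinational read; mirrors Cat(m(addr)(bank).reverse), so byte 0 lands in bits [7:0]. */
  def read(addr: Int): Seq[Long] = {
    val a = addr % depth
    (0 until banks).map { bank =>
      (0 until bytesPerBank).foldRight(0L)((byte, acc) => (acc << 8) | mem(a)(bank)(byte))
    }
  }
}
```

Writing only bank 0 leaves bank 1 untouched, which is exactly what the per-bank `wren` is supposed to guarantee.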

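As the code comments say, if you want a mask per byte lane of Vec(4, UInt(8.W)), you currently have to use useWriteTask == false and generate the write logic element by element yourself.
The reason is that MemBase's write method restricts the mask to Seq[Bool], one Bool per element of the outer Vec, as its implementation shows: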

  def write(idx: UInt, data: T, mask: Seq[Bool]) (implicit evidence: T <:< Vec[_], compileOptions: CompileOptions): Unit = {
    implicit val sourceInfo = UnlocatableSourceInfo
    // The accessor created here is presumably one memory entry (what you get with mem(0) etc.)
    val accessor = makePort(sourceInfo, idx, MemPortDirection.WRITE).asInstanceOf[Vec[Data]]
    val dataVec = data.asInstanceOf[Vec[Data]]
    if (accessor.length != dataVec.length) {
      Builder.error(s"Mem write data must contain ${accessor.length} elements (found ${dataVec.length})")
    }
    if (accessor.length != mask.length) {
      Builder.error(s"Mem write mask must contain ${accessor.length} elements (found ${mask.length})")
    }

    // Zip mask (Seq[Bool]), accessor and dataVec together and loop over them
    for (((cond, port), datum) <- mask zip accessor zip dataVec)
      /**
        * In Mem3D, cond is a plain Bool while port and datum are
        * Vec(4, UInt(8.W)), so masking happens at that granularity
        */
      when (cond) { port := datum }
  }
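The effect of that final `for`/`when` loop can be modeled in plain Scala as below. This is a hypothetical software stand-in, not Chisel code: `require` replaces `Builder.error` and a Scala `if` replaces `when`. With each element being a whole 4-byte bank, one mask bit can only select a whole bank.

```scala
// Software model of the mask loop at the end of MemBase.write (not the Chisel source itself).
object MaskedWriteModel {
  /** entry/data have one sub-element per outer Vec element; mask has one Boolean per sub-element.
    * Mirrors: for (((cond, port), datum) <- mask zip accessor zip dataVec) when (cond) { port := datum } */
  def write[T](entry: Seq[T], data: Seq[T], mask: Seq[Boolean]): Seq[T] = {
    require(entry.length == data.length, s"data must contain ${entry.length} elements")
    require(entry.length == mask.length, s"mask must contain ${entry.length} elements")
    (mask zip entry zip data).map { case ((cond, port), datum) => if (cond) datum else port }
  }
}
```

Passing a 4-byte bank as `T` makes the limitation visible: there is simply no place to put a per-byte mask bit.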

It looks like you could support this by writing your own class equivalent to MemBase and changing only write.
Whether I'd actually use it is another story...
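As a sketch of what such a modified `write` would have to do, here is a plain-Scala model whose mask is nested one level deeper, one Boolean per byte. `ByteMaskedWriteModel.writeNested` is a hypothetical name. In hardware the same effect should be reachable with the per-element `when` loop of the `useWriteTask == false` branch (driven by a per-byte strobe instead of `io.wren(i)`), or by declaring the memory as `Mem(16, Vec(8, UInt(8.W)))` so that the existing `Seq[Bool]` mask has one bit per byte.

```scala
// Hypothetical extension: a write whose mask is nested one level deeper than MemBase.write allows.
object ByteMaskedWriteModel {
  /** entry(bank)(byte) is the stored entry; mask(bank)(byte) selects individual bytes. */
  def writeNested(entry: Seq[Seq[Int]], data: Seq[Seq[Int]], mask: Seq[Seq[Boolean]]): Seq[Seq[Int]] = {
    require(entry.length == data.length && entry.length == mask.length, "bank count mismatch")
    entry.indices.map { bank =>
      require(entry(bank).length == data(bank).length && entry(bank).length == mask(bank).length,
        s"byte count mismatch in bank $bank")
      // Keep the old byte unless its own mask bit is set
      entry(bank).indices.map { byte =>
        if (mask(bank)(byte)) data(bank)(byte) else entry(bank)(byte)
      }
    }
  }
}
```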

Testing it

For the test, I took the tester I wrote for the 2D memory last time and modified it slightly to handle the 3D memory.

import chisel3.iotesters._

import scala.math.{floor, random}

/**
  * Unit tester for Mem3D
  * @param c Mem3D
  */
class Mem3DUnitTester(c: Mem3D) extends PeekPokeTester(c) {

  /**
    * Memory write
    * @param bank memory bank to write
    * @param addr memory address
    * @param data data to write
    */
  def write(bank: Int, addr: Int, data: BigInt): Unit = {
    poke(c.io.wren(bank), true)
    poke(c.io.addr, addr)
    poke(c.io.wrdata(bank), data)
    step(1)
    poke(c.io.wren(bank), false)
  }

  /**
    * Memory read
    * @param bank bank to read
    * @param addr memory address
    * @return value read from the memory
    */
  def read(bank: Int, addr: Int): BigInt = {
    poke(c.io.rden, true)
    poke(c.io.addr, addr)
    step(1)
    poke(c.io.rden, false)
    peek(c.io.rddata(bank))
  }

  /**
    * Test scenario
    *  - Write an arbitrary value to addresses 0-15 of each bank and read it back
    */
  for {i <- 0 until 2
       j <- 0 until 16} {
    val data = i + longToUnsignedBigInt(floor(random * 0xffffffffL).toLong)
    write(i, j, data)
    step(1)
    println(s"read data($i, $j) = 0x${read(i, j).toInt.toHexString}")
    expect(c.io.rddata(i), data)
  }
}

/**
  * Tests for Mem3D
  */
class Mem3DTester extends ChiselFlatSpec {

  behavior of "Mem3D"

  // "can access each bank of the memory in 32-bit units"
  it should "32bit単位でメモリの各バンクにアクセス出来る" in {
    Driver.execute(Array[String](), () => new Mem3D()) {
      c => new Mem3DUnitTester(c)
    } should be (true)
  }

  // "can access each bank of the memory in 32-bit units even when using write"
  it should "writeを使っても32bit単位でメモリの各バンクにアクセス出来る" in {
    Driver.execute(Array[String](), () => new Mem3D(true)) {
      c => new Mem3DUnitTester(c)
    } should be (true)
  }
}
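A side note on the test data: `floor(random * 0xffffffffL)` never returns `0xffffffff` (because `random` is strictly less than 1.0), and it is not seeded, so a failing run cannot be replayed. One possible alternative, assuming reproducible runs are wanted, is `BigInt(numbits, rnd)` from the Scala standard library, which draws uniformly from `[0, 2^numbits)`. `TestData.randomWords` below is a hypothetical helper.

```scala
import scala.util.Random

object TestData {
  /** n words drawn uniformly from [0, 2^32); the same seed always yields the same words. */
  def randomWords(n: Int, seed: Long): Seq[BigInt] = {
    val rnd = new Random(seed) // fixed seed so a failing run can be reproduced
    Seq.fill(n)(BigInt(32, rnd))
  }
}
```

Inside `Mem3DUnitTester`, an element of such a sequence could stand in for the `data` expression.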
Test results

I confirmed that both tests pass.

[info] Mem3DTester:
[info] Mem3D
[info] - should 32bit単位でメモリの各バンクにアクセス出来る
[info] - should writeを使っても32bit単位でメモリの各バンクにアクセス出来る
[info] ScalaTest
[info] Run completed in 3 seconds, 91 milliseconds.
[info] Total number of tests run: 2
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 2, Failed 0, Errors 0, Passed 2
[success] Total time: 3 s, completed 2019/05/04 10:41:46

Generating and comparing the RTL

Without the write method (useWriteTask = false)

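As last time, I show mainly the write-related logic, ending with the always block. As in the 2D case, every element write in the loop becomes its own write port, and since the Vec(4, UInt(8.W)) is further wrapped in Vec(2), there are 4 × 2 = 8 of them. Each of the eight byte-wide sub-memories (m_0_0 to m_1_3) gets all 8 write ports with their mask logic, and on each sub-memory only 1 of the 8 has its mask tied to 1 and actually writes.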

module Mem3D( // @[:@3.2]
  input         clock, // @[:@4.4]
  input         reset, // @[:@5.4]
  // Because the inputs are Vecs, they are expanded into _0/_1 ports.
  input         io_wren_0, // @[:@6.4]
  input         io_wren_1, // @[:@6.4]
  input         io_rden, // @[:@6.4]
  input  [13:0] io_addr, // @[:@6.4]
  input  [31:0] io_wrdata_0, // @[:@6.4]
  input  [31:0] io_wrdata_1, // @[:@6.4]
  output [31:0] io_rddata_0, // @[:@6.4]
  output [31:0] io_rddata_1 // @[:@6.4]
);
  reg [7:0] m_0_0 [0:15]; // @[Mem3D.scala 19:14:@8.4]
  reg [31:0] _RAND_0;
  wire [7:0] m_0_0__T_1435_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_1435_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_1546_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_1546_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_347_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_347_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_347_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_347_en; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_456_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_456_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_456_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_456_en; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_565_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_565_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_565_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_565_en; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_674_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_674_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_674_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_674_en; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_891_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_891_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_891_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_891_en; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_1000_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_1000_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_1000_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_1000_en; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_1109_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_1109_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_1109_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_1109_en; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_1218_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_1218_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_1218_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_1218_en; // @[Mem3D.scala 19:14:@8.4]

  /*** snip ***/

  assign m_0_0__T_1435_addr = io_addr[3:0];
  assign m_0_0__T_1435_data = m_0_0[m_0_0__T_1435_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_0_0__T_1546_addr = io_addr[3:0];
  assign m_0_0__T_1546_data = m_0_0[m_0_0__T_1546_addr]; // @[Mem3D.scala 19:14:@8.4]
  // For m_0_0, only T_347 is an effective write (its mask is 1)
  assign m_0_0__T_347_data = io_wrdata_0[7:0];
  assign m_0_0__T_347_addr = io_addr[3:0];
  assign m_0_0__T_347_mask = 1'h1;
  assign m_0_0__T_347_en = io_wren_0;
  assign m_0_0__T_456_data = 8'h0;
  assign m_0_0__T_456_addr = io_addr[3:0];
  assign m_0_0__T_456_mask = 1'h0;
  assign m_0_0__T_456_en = io_wren_0;
  assign m_0_0__T_565_data = 8'h0;
  assign m_0_0__T_565_addr = io_addr[3:0];
  assign m_0_0__T_565_mask = 1'h0;
  assign m_0_0__T_565_en = io_wren_0;
  assign m_0_0__T_674_data = 8'h0;
  assign m_0_0__T_674_addr = io_addr[3:0];
  assign m_0_0__T_674_mask = 1'h0;
  assign m_0_0__T_674_en = io_wren_0;
  assign m_0_0__T_891_data = 8'h0;
  assign m_0_0__T_891_addr = io_addr[3:0];
  assign m_0_0__T_891_mask = 1'h0;
  assign m_0_0__T_891_en = io_wren_1;
  assign m_0_0__T_1000_data = 8'h0;
  assign m_0_0__T_1000_addr = io_addr[3:0];
  assign m_0_0__T_1000_mask = 1'h0;
  assign m_0_0__T_1000_en = io_wren_1;
  assign m_0_0__T_1109_data = 8'h0;
  assign m_0_0__T_1109_addr = io_addr[3:0];
  assign m_0_0__T_1109_mask = 1'h0;
  assign m_0_0__T_1109_en = io_wren_1;
  assign m_0_0__T_1218_data = 8'h0;
  assign m_0_0__T_1218_addr = io_addr[3:0];
  assign m_0_0__T_1218_mask = 1'h0;
  assign m_0_0__T_1218_en = io_wren_1;

  /*** snip ***/

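  /**
    * As in the 2D case, each element write becomes its own write port, but here each
    * sub-memory gets 8 of them (Vec(2) x Vec(4, UInt(8.W))). Following the mask
    * assignments above, only 1 of the 8 is actually enabled.
    */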
  always @(posedge clock) begin
    if(m_0_0__T_347_en & m_0_0__T_347_mask) begin
      m_0_0[m_0_0__T_347_addr] <= m_0_0__T_347_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_0__T_456_en & m_0_0__T_456_mask) begin
      m_0_0[m_0_0__T_456_addr] <= m_0_0__T_456_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_0__T_565_en & m_0_0__T_565_mask) begin
      m_0_0[m_0_0__T_565_addr] <= m_0_0__T_565_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_0__T_674_en & m_0_0__T_674_mask) begin
      m_0_0[m_0_0__T_674_addr] <= m_0_0__T_674_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_0__T_891_en & m_0_0__T_891_mask) begin
      m_0_0[m_0_0__T_891_addr] <= m_0_0__T_891_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_0__T_1000_en & m_0_0__T_1000_mask) begin
      m_0_0[m_0_0__T_1000_addr] <= m_0_0__T_1000_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_0__T_1109_en & m_0_0__T_1109_mask) begin
      m_0_0[m_0_0__T_1109_addr] <= m_0_0__T_1109_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_0__T_1218_en & m_0_0__T_1218_mask) begin
      m_0_0[m_0_0__T_1218_addr] <= m_0_0__T_1218_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_1__T_347_en & m_0_1__T_347_mask) begin
      m_0_1[m_0_1__T_347_addr] <= m_0_1__T_347_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_1__T_456_en & m_0_1__T_456_mask) begin
      m_0_1[m_0_1__T_456_addr] <= m_0_1__T_456_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_1__T_565_en & m_0_1__T_565_mask) begin
      m_0_1[m_0_1__T_565_addr] <= m_0_1__T_565_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_1__T_674_en & m_0_1__T_674_mask) begin
      m_0_1[m_0_1__T_674_addr] <= m_0_1__T_674_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_1__T_891_en & m_0_1__T_891_mask) begin
      m_0_1[m_0_1__T_891_addr] <= m_0_1__T_891_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_1__T_1000_en & m_0_1__T_1000_mask) begin
      m_0_1[m_0_1__T_1000_addr] <= m_0_1__T_1000_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_1__T_1109_en & m_0_1__T_1109_mask) begin
      m_0_1[m_0_1__T_1109_addr] <= m_0_1__T_1109_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_1__T_1218_en & m_0_1__T_1218_mask) begin
      m_0_1[m_0_1__T_1218_addr] <= m_0_1__T_1218_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_2__T_347_en & m_0_2__T_347_mask) begin
      m_0_2[m_0_2__T_347_addr] <= m_0_2__T_347_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_2__T_456_en & m_0_2__T_456_mask) begin
      m_0_2[m_0_2__T_456_addr] <= m_0_2__T_456_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_2__T_565_en & m_0_2__T_565_mask) begin
      m_0_2[m_0_2__T_565_addr] <= m_0_2__T_565_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_2__T_674_en & m_0_2__T_674_mask) begin
      m_0_2[m_0_2__T_674_addr] <= m_0_2__T_674_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_2__T_891_en & m_0_2__T_891_mask) begin
      m_0_2[m_0_2__T_891_addr] <= m_0_2__T_891_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_2__T_1000_en & m_0_2__T_1000_mask) begin
      m_0_2[m_0_2__T_1000_addr] <= m_0_2__T_1000_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_2__T_1109_en & m_0_2__T_1109_mask) begin
      m_0_2[m_0_2__T_1109_addr] <= m_0_2__T_1109_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_2__T_1218_en & m_0_2__T_1218_mask) begin
      m_0_2[m_0_2__T_1218_addr] <= m_0_2__T_1218_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_3__T_347_en & m_0_3__T_347_mask) begin
      m_0_3[m_0_3__T_347_addr] <= m_0_3__T_347_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_3__T_456_en & m_0_3__T_456_mask) begin
      m_0_3[m_0_3__T_456_addr] <= m_0_3__T_456_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_3__T_565_en & m_0_3__T_565_mask) begin
      m_0_3[m_0_3__T_565_addr] <= m_0_3__T_565_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_3__T_674_en & m_0_3__T_674_mask) begin
      m_0_3[m_0_3__T_674_addr] <= m_0_3__T_674_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_3__T_891_en & m_0_3__T_891_mask) begin
      m_0_3[m_0_3__T_891_addr] <= m_0_3__T_891_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_3__T_1000_en & m_0_3__T_1000_mask) begin
      m_0_3[m_0_3__T_1000_addr] <= m_0_3__T_1000_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_3__T_1109_en & m_0_3__T_1109_mask) begin
      m_0_3[m_0_3__T_1109_addr] <= m_0_3__T_1109_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_3__T_1218_en & m_0_3__T_1218_mask) begin
      m_0_3[m_0_3__T_1218_addr] <= m_0_3__T_1218_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_0__T_347_en & m_1_0__T_347_mask) begin
      m_1_0[m_1_0__T_347_addr] <= m_1_0__T_347_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_0__T_456_en & m_1_0__T_456_mask) begin
      m_1_0[m_1_0__T_456_addr] <= m_1_0__T_456_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_0__T_565_en & m_1_0__T_565_mask) begin
      m_1_0[m_1_0__T_565_addr] <= m_1_0__T_565_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_0__T_674_en & m_1_0__T_674_mask) begin
      m_1_0[m_1_0__T_674_addr] <= m_1_0__T_674_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_0__T_891_en & m_1_0__T_891_mask) begin
      m_1_0[m_1_0__T_891_addr] <= m_1_0__T_891_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_0__T_1000_en & m_1_0__T_1000_mask) begin
      m_1_0[m_1_0__T_1000_addr] <= m_1_0__T_1000_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_0__T_1109_en & m_1_0__T_1109_mask) begin
      m_1_0[m_1_0__T_1109_addr] <= m_1_0__T_1109_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_0__T_1218_en & m_1_0__T_1218_mask) begin
      m_1_0[m_1_0__T_1218_addr] <= m_1_0__T_1218_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_1__T_347_en & m_1_1__T_347_mask) begin
      m_1_1[m_1_1__T_347_addr] <= m_1_1__T_347_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_1__T_456_en & m_1_1__T_456_mask) begin
      m_1_1[m_1_1__T_456_addr] <= m_1_1__T_456_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_1__T_565_en & m_1_1__T_565_mask) begin
      m_1_1[m_1_1__T_565_addr] <= m_1_1__T_565_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_1__T_674_en & m_1_1__T_674_mask) begin
      m_1_1[m_1_1__T_674_addr] <= m_1_1__T_674_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_1__T_891_en & m_1_1__T_891_mask) begin
      m_1_1[m_1_1__T_891_addr] <= m_1_1__T_891_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_1__T_1000_en & m_1_1__T_1000_mask) begin
      m_1_1[m_1_1__T_1000_addr] <= m_1_1__T_1000_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_1__T_1109_en & m_1_1__T_1109_mask) begin
      m_1_1[m_1_1__T_1109_addr] <= m_1_1__T_1109_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_1__T_1218_en & m_1_1__T_1218_mask) begin
      m_1_1[m_1_1__T_1218_addr] <= m_1_1__T_1218_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_2__T_347_en & m_1_2__T_347_mask) begin
      m_1_2[m_1_2__T_347_addr] <= m_1_2__T_347_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_2__T_456_en & m_1_2__T_456_mask) begin
      m_1_2[m_1_2__T_456_addr] <= m_1_2__T_456_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_2__T_565_en & m_1_2__T_565_mask) begin
      m_1_2[m_1_2__T_565_addr] <= m_1_2__T_565_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_2__T_674_en & m_1_2__T_674_mask) begin
      m_1_2[m_1_2__T_674_addr] <= m_1_2__T_674_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_2__T_891_en & m_1_2__T_891_mask) begin
      m_1_2[m_1_2__T_891_addr] <= m_1_2__T_891_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_2__T_1000_en & m_1_2__T_1000_mask) begin
      m_1_2[m_1_2__T_1000_addr] <= m_1_2__T_1000_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_2__T_1109_en & m_1_2__T_1109_mask) begin
      m_1_2[m_1_2__T_1109_addr] <= m_1_2__T_1109_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_2__T_1218_en & m_1_2__T_1218_mask) begin
      m_1_2[m_1_2__T_1218_addr] <= m_1_2__T_1218_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_3__T_347_en & m_1_3__T_347_mask) begin
      m_1_3[m_1_3__T_347_addr] <= m_1_3__T_347_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_3__T_456_en & m_1_3__T_456_mask) begin
      m_1_3[m_1_3__T_456_addr] <= m_1_3__T_456_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_3__T_565_en & m_1_3__T_565_mask) begin
      m_1_3[m_1_3__T_565_addr] <= m_1_3__T_565_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_3__T_674_en & m_1_3__T_674_mask) begin
      m_1_3[m_1_3__T_674_addr] <= m_1_3__T_674_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_3__T_891_en & m_1_3__T_891_mask) begin
      m_1_3[m_1_3__T_891_addr] <= m_1_3__T_891_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_3__T_1000_en & m_1_3__T_1000_mask) begin
      m_1_3[m_1_3__T_1000_addr] <= m_1_3__T_1000_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_3__T_1109_en & m_1_3__T_1109_mask) begin
      m_1_3[m_1_3__T_1109_addr] <= m_1_3__T_1109_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_3__T_1218_en & m_1_3__T_1218_mask) begin
      m_1_3[m_1_3__T_1218_addr] <= m_1_3__T_1218_data; // @[Mem3D.scala 19:14:@8.4]
    end
  end
endmodule

With the write method (useWriteTask = true)

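This is the RTL when write is used.
It is much shorter (without write: 875 lines / with write: 259 lines). The difference is that only 8 write ports are generated in total, one per sub-memory of Vec(2) × Vec(4, UInt(8.W)). Note also that the enable is tied to 1'h1 and io_wren_0/io_wren_1 drive the mask instead.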

module Mem3D( // @[:@3.2]
  input         clock, // @[:@4.4]
  input         reset, // @[:@5.4]
  input         io_wren_0, // @[:@6.4]
  input         io_wren_1, // @[:@6.4]
  input         io_rden, // @[:@6.4]
  input  [13:0] io_addr, // @[:@6.4]
  input  [31:0] io_wrdata_0, // @[:@6.4]
  input  [31:0] io_wrdata_1, // @[:@6.4]
  output [31:0] io_rddata_0, // @[:@6.4]
  output [31:0] io_rddata_1 // @[:@6.4]
);
  reg [7:0] m_0_0 [0:15]; // @[Mem3D.scala 19:14:@8.4]
  reg [31:0] _RAND_0;
  wire [7:0] m_0_0__T_758_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_758_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_869_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_869_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_0__T_542_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_0__T_542_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_542_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_0__T_542_en; // @[Mem3D.scala 19:14:@8.4]
  reg [7:0] m_0_1 [0:15]; // @[Mem3D.scala 19:14:@8.4]
  reg [31:0] _RAND_1;
  wire [7:0] m_0_1__T_758_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_1__T_758_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_1__T_869_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_1__T_869_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_1__T_542_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_1__T_542_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_1__T_542_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_1__T_542_en; // @[Mem3D.scala 19:14:@8.4]
  reg [7:0] m_0_2 [0:15]; // @[Mem3D.scala 19:14:@8.4]
  reg [31:0] _RAND_2;
  wire [7:0] m_0_2__T_758_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_2__T_758_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_2__T_869_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_2__T_869_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_2__T_542_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_2__T_542_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_2__T_542_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_2__T_542_en; // @[Mem3D.scala 19:14:@8.4]
  reg [7:0] m_0_3 [0:15]; // @[Mem3D.scala 19:14:@8.4]
  reg [31:0] _RAND_3;
  wire [7:0] m_0_3__T_758_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_3__T_758_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_3__T_869_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_3__T_869_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_0_3__T_542_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_0_3__T_542_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_3__T_542_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_0_3__T_542_en; // @[Mem3D.scala 19:14:@8.4]
  reg [7:0] m_1_0 [0:15]; // @[Mem3D.scala 19:14:@8.4]
  reg [31:0] _RAND_4;
  wire [7:0] m_1_0__T_758_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_0__T_758_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_1_0__T_869_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_0__T_869_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_1_0__T_542_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_0__T_542_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_1_0__T_542_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_1_0__T_542_en; // @[Mem3D.scala 19:14:@8.4]
  reg [7:0] m_1_1 [0:15]; // @[Mem3D.scala 19:14:@8.4]
  reg [31:0] _RAND_5;
  wire [7:0] m_1_1__T_758_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_1__T_758_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_1_1__T_869_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_1__T_869_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_1_1__T_542_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_1__T_542_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_1_1__T_542_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_1_1__T_542_en; // @[Mem3D.scala 19:14:@8.4]
  reg [7:0] m_1_2 [0:15]; // @[Mem3D.scala 19:14:@8.4]
  reg [31:0] _RAND_6;
  wire [7:0] m_1_2__T_758_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_2__T_758_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_1_2__T_869_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_2__T_869_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_1_2__T_542_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_2__T_542_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_1_2__T_542_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_1_2__T_542_en; // @[Mem3D.scala 19:14:@8.4]
  reg [7:0] m_1_3 [0:15]; // @[Mem3D.scala 19:14:@8.4]
  reg [31:0] _RAND_7;
  wire [7:0] m_1_3__T_758_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_3__T_758_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_1_3__T_869_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_3__T_869_addr; // @[Mem3D.scala 19:14:@8.4]
  wire [7:0] m_1_3__T_542_data; // @[Mem3D.scala 19:14:@8.4]
  wire [3:0] m_1_3__T_542_addr; // @[Mem3D.scala 19:14:@8.4]
  wire  m_1_3__T_542_mask; // @[Mem3D.scala 19:14:@8.4]
  wire  m_1_3__T_542_en; // @[Mem3D.scala 19:14:@8.4]
  wire [15:0] _T_865; // @[Cat.scala 30:58:@46.4]
  wire [15:0] _T_866; // @[Cat.scala 30:58:@47.4]
  wire [15:0] _T_976; // @[Cat.scala 30:58:@52.4]
  wire [15:0] _T_977; // @[Cat.scala 30:58:@53.4]
  assign m_0_0__T_758_addr = io_addr[3:0];
  assign m_0_0__T_758_data = m_0_0[m_0_0__T_758_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_0_0__T_869_addr = io_addr[3:0];
  assign m_0_0__T_869_data = m_0_0[m_0_0__T_869_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_0_0__T_542_data = io_wrdata_0[7:0];
  assign m_0_0__T_542_addr = io_addr[3:0];
  assign m_0_0__T_542_mask = io_wren_0;
  assign m_0_0__T_542_en = 1'h1;
  assign m_0_1__T_758_addr = io_addr[3:0];
  assign m_0_1__T_758_data = m_0_1[m_0_1__T_758_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_0_1__T_869_addr = io_addr[3:0];
  assign m_0_1__T_869_data = m_0_1[m_0_1__T_869_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_0_1__T_542_data = io_wrdata_0[15:8];
  assign m_0_1__T_542_addr = io_addr[3:0];
  assign m_0_1__T_542_mask = io_wren_0;
  assign m_0_1__T_542_en = 1'h1;
  assign m_0_2__T_758_addr = io_addr[3:0];
  assign m_0_2__T_758_data = m_0_2[m_0_2__T_758_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_0_2__T_869_addr = io_addr[3:0];
  assign m_0_2__T_869_data = m_0_2[m_0_2__T_869_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_0_2__T_542_data = io_wrdata_0[23:16];
  assign m_0_2__T_542_addr = io_addr[3:0];
  assign m_0_2__T_542_mask = io_wren_0;
  assign m_0_2__T_542_en = 1'h1;
  assign m_0_3__T_758_addr = io_addr[3:0];
  assign m_0_3__T_758_data = m_0_3[m_0_3__T_758_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_0_3__T_869_addr = io_addr[3:0];
  assign m_0_3__T_869_data = m_0_3[m_0_3__T_869_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_0_3__T_542_data = io_wrdata_0[31:24];
  assign m_0_3__T_542_addr = io_addr[3:0];
  assign m_0_3__T_542_mask = io_wren_0;
  assign m_0_3__T_542_en = 1'h1;
  assign m_1_0__T_758_addr = io_addr[3:0];
  assign m_1_0__T_758_data = m_1_0[m_1_0__T_758_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_1_0__T_869_addr = io_addr[3:0];
  assign m_1_0__T_869_data = m_1_0[m_1_0__T_869_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_1_0__T_542_data = io_wrdata_1[7:0];
  assign m_1_0__T_542_addr = io_addr[3:0];
  assign m_1_0__T_542_mask = io_wren_1;
  assign m_1_0__T_542_en = 1'h1;
  assign m_1_1__T_758_addr = io_addr[3:0];
  assign m_1_1__T_758_data = m_1_1[m_1_1__T_758_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_1_1__T_869_addr = io_addr[3:0];
  assign m_1_1__T_869_data = m_1_1[m_1_1__T_869_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_1_1__T_542_data = io_wrdata_1[15:8];
  assign m_1_1__T_542_addr = io_addr[3:0];
  assign m_1_1__T_542_mask = io_wren_1;
  assign m_1_1__T_542_en = 1'h1;
  assign m_1_2__T_758_addr = io_addr[3:0];
  assign m_1_2__T_758_data = m_1_2[m_1_2__T_758_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_1_2__T_869_addr = io_addr[3:0];
  assign m_1_2__T_869_data = m_1_2[m_1_2__T_869_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_1_2__T_542_data = io_wrdata_1[23:16];
  assign m_1_2__T_542_addr = io_addr[3:0];
  assign m_1_2__T_542_mask = io_wren_1;
  assign m_1_2__T_542_en = 1'h1;
  assign m_1_3__T_758_addr = io_addr[3:0];
  assign m_1_3__T_758_data = m_1_3[m_1_3__T_758_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_1_3__T_869_addr = io_addr[3:0];
  assign m_1_3__T_869_data = m_1_3[m_1_3__T_869_addr]; // @[Mem3D.scala 19:14:@8.4]
  assign m_1_3__T_542_data = io_wrdata_1[31:24];
  assign m_1_3__T_542_addr = io_addr[3:0];
  assign m_1_3__T_542_mask = io_wren_1;
  assign m_1_3__T_542_en = 1'h1;
  assign _T_865 = {m_0_1__T_758_data,m_0_0__T_758_data}; // @[Cat.scala 30:58:@46.4]
  assign _T_866 = {m_0_3__T_758_data,m_0_2__T_758_data}; // @[Cat.scala 30:58:@47.4]
  assign _T_976 = {m_1_1__T_869_data,m_1_0__T_869_data}; // @[Cat.scala 30:58:@52.4]
  assign _T_977 = {m_1_3__T_869_data,m_1_2__T_869_data}; // @[Cat.scala 30:58:@53.4]
  assign io_rddata_0 = {_T_866,_T_865}; // @[Mem3D.scala 40:18:@49.4]
  assign io_rddata_1 = {_T_977,_T_976}; // @[Mem3D.scala 40:18:@55.4]

  /*** snip ***/

  always @(posedge clock) begin
    if(m_0_0__T_542_en & m_0_0__T_542_mask) begin
      m_0_0[m_0_0__T_542_addr] <= m_0_0__T_542_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_1__T_542_en & m_0_1__T_542_mask) begin
      m_0_1[m_0_1__T_542_addr] <= m_0_1__T_542_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_2__T_542_en & m_0_2__T_542_mask) begin
      m_0_2[m_0_2__T_542_addr] <= m_0_2__T_542_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_0_3__T_542_en & m_0_3__T_542_mask) begin
      m_0_3[m_0_3__T_542_addr] <= m_0_3__T_542_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_0__T_542_en & m_1_0__T_542_mask) begin
      m_1_0[m_1_0__T_542_addr] <= m_1_0__T_542_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_1__T_542_en & m_1_1__T_542_mask) begin
      m_1_1[m_1_1__T_542_addr] <= m_1_1__T_542_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_2__T_542_en & m_1_2__T_542_mask) begin
      m_1_2[m_1_2__T_542_addr] <= m_1_2__T_542_data; // @[Mem3D.scala 19:14:@8.4]
    end
    if(m_1_3__T_542_en & m_1_3__T_542_mask) begin
      m_1_3[m_1_3__T_542_addr] <= m_1_3__T_542_data; // @[Mem3D.scala 19:14:@8.4]
    end
  end
endmodule
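As a quick cross-check of the two listings, the number of write `if` blocks in each `always` block can be counted as follows. This is plain Scala arithmetic, assuming (consistently with the listings) that FIRRTL splits the memory into one sub-memory per byte (`m_0_0` to `m_1_3`) and replicates every write port on each of them.

```scala
// Back-of-the-envelope count of the write `if` blocks in the two RTL listings.
object WritePortCount {
  val banks = 2
  val bytesPerBank = 4
  // Mem(16, Vec(2, Vec(4, UInt(8.W)))) is split into one memory per leaf: m_0_0 ... m_1_3
  val subMemories: Int = banks * bytesPerBank

  // useWriteTask = false: each of the banks * bytesPerBank `:=` statements is its own write port,
  // and every port appears on every sub-memory (mask = 1 only on the matching one).
  val ifBlocksWithoutWrite: Int = subMemories * (banks * bytesPerBank)

  // useWriteTask = true: a single write port, which appears once per sub-memory.
  val ifBlocksWithWrite: Int = subMemories * 1
}
```

This gives 64 versus 8, matching the `always` blocks shown above.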

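That covers roughly what I wanted to investigate. I was only digging into this out of curiosity, so I haven't really thought about where to use it. Image processing might be one use case, though even there I'd probably just use an ordinary 1D array...
The code from my recent posts on Chisel's Mem is in the repository below, so feel free to take a look if you want to try it.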

github.com