Hardware Odds and Ends

A blog where I look into whatever interests me technically and write it up. Mostly about hardware.

Building something like a 2D memory in Chisel (1)

I suddenly wondered whether a two-dimensional memory can be built in Chisel, so I tried it. This post summarizes what I found.

2D memory in Chisel

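By "2D memory" I mean the kind of declaration available in SystemVerilog, such as the following: a memory whose word is itself a packed array of bytes.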

reg [3:0][7:0] mem[0:1023];

I started looking into this because SCR1, a RISC-V core, declares its memory in this form.

github.com

//-------------------------------------------------------------------------------
// Local signal declaration
//-------------------------------------------------------------------------------
logic [SCR1_NBYTES-1:0][7:0] memory_array [0:(SCR1_SIZE/SCR1_NBYTES)-1];
logic [3:0] wenbb;
//-------------------------------------------------------------------------------
// Port B memory behavioral description
//-------------------------------------------------------------------------------
assign wenbb = {4{wenb}} & webb;
always_ff @(posedge clk) begin
    if (wenb) begin
        if (wenbb[0]) begin
            memory_array[addrb][0] <= datab[0+:8];
        end
        if (wenbb[1]) begin
            memory_array[addrb][1] <= datab[8+:8];
        end
        if (wenbb[2]) begin
            memory_array[addrb][2] <= datab[16+:8];
        end
        if (wenbb[3]) begin
            memory_array[addrb][3] <= datab[24+:8];
        end
    end
    qb <= memory_array[addrb];
end
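
To make the byte-lane behavior of the SCR1 snippet above concrete, here is a small software model in plain Scala. It is only a sketch under stated assumptions: the class name ByteLaneMemory and its methods are made up for illustration, clocking is ignored (including the registered read into qb), and wenb and webb are folded into a single byteEnable argument.

```scala
// Software model only (hypothetical names): `words` entries, each split into byte lanes.
final class ByteLaneMemory(words: Int, lanes: Int = 4) {
  // mem(addr)(lane) corresponds to memory_array[addr][lane] in the SystemVerilog code.
  private val mem = Array.fill(words, lanes)(0)

  /** Writes only the lanes whose bit in `byteEnable` is set. */
  def write(addr: Int, data: Long, byteEnable: Int): Unit =
    for (lane <- 0 until lanes if ((byteEnable >> lane) & 1) == 1)
      mem(addr)(lane) = ((data >> (8 * lane)) & 0xff).toInt

  /** Reassembles the word with lane 0 in the least significant byte. */
  def read(addr: Int): Long =
    (0 until lanes).foldLeft(0L) { (acc, lane) =>
      acc | (mem(addr)(lane).toLong << (8 * lane))
    }
}
```

Writing a full word and then rewriting only lanes 0 and 2 leaves lanes 1 and 3 untouched, which is exactly the property the 2D declaration is used for.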

Attempt 1: Mem in Mem

My first thought, without much reflection, was that simply nesting Chisel Mems might work, so I tried the following code.

  // Compile error: the inner Mem is not a Data
  val m = Mem(16, Mem(4, UInt(8.W)))

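As some readers may already know, this does not even compile. Strictly speaking it is a type error rather than a syntax error, and the reason becomes clear once you trace the implementation of Mem.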

object Mem {
  @chiselRuntimeDeprecated
  @deprecated("Mem argument order should be size, t; this will be removed by the official release", "chisel3")
  def apply[T <: Data](t: T, size: Int)(implicit compileOptions: CompileOptions): Mem[T] = do_apply(size, t)(UnlocatableSourceInfo, compileOptions)

  /** Creates a combinational/asynchronous-read, sequential/synchronous-write [[Mem]].
    *
    * @param size number of elements in the memory
    * @param t data type of memory element
    */
  def apply[T <: Data](size: Int, t: T): Mem[T] = macro MemTransform.apply[T]
  def do_apply[T <: Data](size: Int, t: T)(implicit sourceInfo: SourceInfo, compileOptions: CompileOptions): Mem[T] = {
    if (compileOptions.declaredTypeMustBeUnbound) {
      requireIsChiselType(t, "memory type")
    }
    val mt  = t.cloneTypeFull
    val mem = new Mem(mt, size)
    pushCommand(DefMemory(sourceInfo, mem, mt, size))
    mem
  }
}

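When you declare a Mem in the usual Chisel way, the Mem companion object is invoked and the apply above runs. Macro expansion turns that into a call to do_apply, which creates an instance of the Mem class with new.
So let's look at how the Mem class itself is declared.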

// Declaration of Mem
sealed class Mem[T <: Data] private (t: T, length: Int) extends MemBase(t, length)

// Superclass of Mem
sealed abstract class MemBase[T <: Data](t: T, val length: Int) extends HasId with NamedComponent {

The above follows the declaration of Mem: its superclass, MemBase, extends HasId. HasId sits close to the root of Chisel's type hierarchy, and Chisel's Data class extends HasId as well.

abstract class Data extends HasId with NamedComponent {

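In other words, Mem and Data are siblings under HasId. Mem is not a subtype of Data, so it does not satisfy the bound T <: Data on Mem's type parameter, and that is why the compile error occurred.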
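
The same situation can be reproduced in plain Scala without Chisel. The classes below are simplified stand-ins that I made up to mirror only the inheritance relationships quoted above. They are not the real chisel3 definitions, and UInt8 is a hypothetical placeholder for UInt(8.W).

```scala
// Simplified stand-ins for the Chisel classes; NOT the real chisel3 definitions.
abstract class HasId
abstract class Data extends HasId
final class UInt8 extends Data                                   // plays the role of UInt(8.W)
final class Vec[T <: Data](val n: Int, val gen: T) extends Data  // an aggregate that is a Data
final class Mem[T <: Data](val size: Int, val gen: T) extends HasId

object TypeBoundDemo {
  // Accepted: UInt8 is a Data.
  val flat = new Mem(16, new UInt8)

  // Accepted: a Vec is itself a Data, so it satisfies T <: Data.
  val nested = new Mem(16, new Vec(4, new UInt8))

  // Rejected by the compiler if uncommented: Mem extends HasId, not Data,
  // so it does not conform to the upper bound T <: Data.
  // val bad = new Mem(16, new Mem(4, new UInt8))
}
```

Uncommenting the last line makes scalac report that the inferred type argument does not conform to the bound, which is the same kind of error the nested Mem produces in Chisel.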

Attempt 2: wrap it in a Bundle

Given the above, the next idea was to disguise the Mem as a Data by wrapping it in a Bundle.

class NdimMem extends Module {
  val io = IO(new Bundle {
    val wren = Input(Bool())
    val rden = Input(Bool())
    val addr = Input(UInt(14.W))
    val wrdata = Input(UInt(32.W))
    val rddata = Output(UInt(32.W))
  })

  val m = Mem(16, new Bundle {
    val col = Mem(4, UInt(8.W))
  })

  when (io.wren) {
    m(io.addr).col(0) := io.wrdata(7, 0)
    m(io.addr).col(1) := io.wrdata(15, 8)
    m(io.addr).col(2) := io.wrdata(23, 16)
    m(io.addr).col(3) := io.wrdata(31, 24)
  }

  io.rddata := Cat(m(io.addr).col(3), m(io.addr).col(2), m(io.addr).col(1), m(io.addr).col(0))
}

This compiles, and RTL can be generated. However, it does not provide the desired behavior: no matter what is written, 0 is read back.

[warn] there were 37 feature warnings; re-run with -feature for details
[warn] one warning found
[info] Done compiling.
[warn] Multiple main classes detected.  Run 'show discoveredMainClasses' to see the list
[info] Done packaging.
[info] Running Test
[info] [0.016] Elaborating design...
[info] [1.843] Done elaborating.
Total FIRRTL Compile Time: 650.9 ms
Total FIRRTL Compile Time: 251.8 ms
file loaded in 0.4342985 seconds, 29 symbols, 11 statements
[info] [0.000] SEED 1556600558508
[info] [0.016] read data = 0x0
[info] [0.016] EXPECT AT 2   io_rddata got 0 expected 1 FAIL
[info] [0.016] read data = 0x0
[info] [0.016] EXPECT AT 4   io_rddata got 0 expected 2 FAIL
[info] [0.016] read data = 0x0
[info] [0.016] EXPECT AT 6   io_rddata got 0 expected 3 FAIL
[info] [0.016] read data = 0x0
[info] [0.016] EXPECT AT 8   io_rddata got 0 expected 4 FAIL
[info] [0.016] read data = 0x0
[info] [0.016] EXPECT AT 10   io_rddata got 0 expected 5 FAIL
[info] [0.016] read data = 0x0
[info] [0.016] EXPECT AT 12   io_rddata got 0 expected 6 FAIL
[info] [0.016] read data = 0x0
[info] [0.016] EXPECT AT 14   io_rddata got 0 expected 7 FAIL
[info] [0.031] read data = 0x0
[info] [0.031] EXPECT AT 16   io_rddata got 0 expected 8 FAIL
[info] [0.031] read data = 0x0
[info] [0.031] EXPECT AT 18   io_rddata got 0 expected 9 FAIL
[info] [0.031] read data = 0x0
[info] [0.031] EXPECT AT 20   io_rddata got 0 expected 10 FAIL
[info] [0.031] read data = 0x0
[info] [0.031] EXPECT AT 22   io_rddata got 0 expected 11 FAIL
[info] [0.031] read data = 0x0
[info] [0.031] EXPECT AT 24   io_rddata got 0 expected 12 FAIL
[info] [0.031] read data = 0x0
[info] [0.031] EXPECT AT 26   io_rddata got 0 expected 13 FAIL
[info] [0.031] read data = 0x0
[info] [0.031] EXPECT AT 28   io_rddata got 0 expected 14 FAIL
[info] [0.031] read data = 0x0
[info] [0.047] EXPECT AT 30   io_rddata got 0 expected 15 FAIL
[info] [0.094] read data = 0x0
[info] [0.094] EXPECT AT 32   io_rddata got 0 expected 16 FAIL
test NdimMem Success: 0 tests passed in 37 cycles in 0.125426 seconds 294.99 Hz
[info] [0.094] RAN 32 CYCLES FAILED FIRST AT CYCLE 2
[success] Total time: 12 s, completed 2019/04/30 14:02:42

Wondering why, I looked at the FIRRTL after elaboration and found the following.

circuit NdimMem :
  module NdimMem :
    input clock : Clock
    input reset : UInt<1>
    output io : {flip wren : UInt<1>, flip rden : UInt<1>, flip addr : UInt<14>, flip wrdata : UInt<32>, rddata : UInt<32>}

    cmem _T_18 : UInt<8>[4] @[Mem.scala 44:18]
    cmem _T_22 : UInt<8>[4] @[Mem.scala 44:18]
    cmem m : {}[16] @[Mem.scala 43:14]

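The element type of the memory m is the empty bundle {}, and the inner Mem(4, UInt(8.W)) shows up as separate cmems (_T_18 and _T_22) rather than as a field of m. No wonder nothing can be written.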

Attempt 3: use a Vec instead

In the end I put a Vec inside the Mem.

Note: I noticed an unnecessary Bundle while writing the 2019/05/02 post and have corrected the code below.

class NdimMem extends Module {
  val io = IO(new Bundle {
    val wren = Input(Bool())
    val rden = Input(Bool())
    val addr = Input(UInt(14.W))
    val wrdata = Input(UInt(32.W))
    val rddata = Output(UInt(32.W))
  })

  /*
   * While writing the 2019/05/02 post I realized that wrapping the Vec
   * in a Bundle is unnecessary, so the code has been revised.
   */
  /*
  val m = Mem(16, new Bundle {
    val col = Vec(4, UInt(8.W))
  })
  */
  val m = Mem(16, Vec(4, UInt(8.W))) // 2019/05/02: changed to a plain Vec
  /*
  val m = Mem(16, new Bundle {
    val col = Mem(4, UInt(8.W))
  })
  */

  when (io.wren) {
    /*
    m(io.addr).col(0) := io.wrdata(7, 0)
    m(io.addr).col(1) := io.wrdata(15, 8)
    m(io.addr).col(2) := io.wrdata(23, 16)
    m(io.addr).col(3) := io.wrdata(31, 24)
    */
    // 2019/05/02: removed "col." along with the Bundle
    m(io.addr)(0) := io.wrdata(7, 0)
    m(io.addr)(1) := io.wrdata(15, 8)
    m(io.addr)(2) := io.wrdata(23, 16)
    m(io.addr)(3) := io.wrdata(31, 24)
  }

  //io.rddata := Cat(m(io.addr).col(3), m(io.addr).col(2), m(io.addr).col(1), m(io.addr).col(0))
  io.rddata := Cat(m(io.addr).reverse) // 2019/05/02: changed, since Cat can concatenate the elements directly
}

Test results for the Vec-based 2D memory

As you can see, this time it worked.

[warn] there were 29 feature warnings; re-run with -feature for details
[warn] one warning found
[info] Done compiling.
[warn] Multiple main classes detected.  Run 'show discoveredMainClasses' to see the list
[info] Done packaging.
[info] Running Test
[info] [0.015] Elaborating design...
[info] [1.968] Done elaborating.
Total FIRRTL Compile Time: 1000.4 ms
Total FIRRTL Compile Time: 502.3 ms
file loaded in 0.8713672 seconds, 285 symbols, 183 statements
[info] [0.000] SEED 1556601773665
[info] [0.016] read data = 0x1
[info] [0.016] read data = 0x2
[info] [0.016] read data = 0x3
(snip)
[info] [4.314] read data = 0x3fd
[info] [4.314] read data = 0x3fe
[info] [4.314] read data = 0x3ff
[info] [4.314] read data = 0x400
test NdimMem Success: 1024 tests passed in 2053 cycles in 4.339843 seconds 473.06 Hz
[info] [0.110] RAN 32 CYCLES PASSED
[success] Total time: 17 s, completed 2019/04/30 14:22:59

FIRRTL

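I checked the FIRRTL here as well, and this time cmem m properly contains UInt<8>[4]. (The col field in the output below suggests it was captured before the correction, while the Vec was still wrapped in a Bundle.)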

;buildInfoPackage: chisel3, version: 3.1.7, scalaVersion: 2.11.12, sbtVersion: 1.1.1, builtAtString: 2019-03-20 22:15:13.399, builtAtMillis: 1553120113399
circuit NdimMem :
  module NdimMem :
    input clock : Clock
    input reset : UInt<1>
    output io : {flip wren : UInt<1>, flip rden : UInt<1>, flip addr : UInt<14>, flip wrdata : UInt<32>, rddata : UInt<32>}

    cmem m : {col : UInt<8>[4]}[16] @[Mem.scala 38:14]

RTL

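Finally, let's check how this is converted to RTL.
It is long, but I will include nearly all of it. As you can see, the elements of the Vec are mapped to separate memories, m_col_0 through m_col_3. One thing that bothers me slightly is that four write ports are implemented for each memory array. Only one of the four is actually effective per array, though (the other three have their mask tied to 1'h0), so they should be optimized away during synthesis.
This is not the form I originally had in mind, but I suppose it is fine as it is.
# Added 2019/05/02: plain Verilog-HDL does not support multidimensional packed arrays like the SystemVerilog declaration at the top, so the original form could not have been emitted as-is anyway.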

module NdimMem( // @[:@3.2]
  input         clock, // @[:@4.4]
  input         reset, // @[:@5.4]
  input         io_wren, // @[:@6.4]
  input         io_rden, // @[:@6.4]
  input  [13:0] io_addr, // @[:@6.4]
  input  [31:0] io_wrdata, // @[:@6.4]
  output [31:0] io_rddata // @[:@6.4]
);
  reg [7:0] m_col_0 [0:15]; // @[Mem.scala 38:14:@8.4]
  reg [31:0] _RAND_0;
  wire [7:0] m_col_0__T_42_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_0__T_42_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_0__T_46_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_0__T_46_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_0__T_50_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_0__T_50_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_0__T_54_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_0__T_54_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_0__T_22_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_0__T_22_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_0__T_22_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_0__T_22_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_0__T_27_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_0__T_27_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_0__T_27_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_0__T_27_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_0__T_32_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_0__T_32_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_0__T_32_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_0__T_32_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_0__T_37_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_0__T_37_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_0__T_37_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_0__T_37_en; // @[Mem.scala 38:14:@8.4]
  reg [7:0] m_col_1 [0:15]; // @[Mem.scala 38:14:@8.4]
  reg [31:0] _RAND_1;
  wire [7:0] m_col_1__T_42_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_1__T_42_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_1__T_46_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_1__T_46_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_1__T_50_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_1__T_50_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_1__T_54_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_1__T_54_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_1__T_22_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_1__T_22_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_1__T_22_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_1__T_22_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_1__T_27_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_1__T_27_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_1__T_27_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_1__T_27_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_1__T_32_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_1__T_32_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_1__T_32_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_1__T_32_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_1__T_37_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_1__T_37_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_1__T_37_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_1__T_37_en; // @[Mem.scala 38:14:@8.4]
  reg [7:0] m_col_2 [0:15]; // @[Mem.scala 38:14:@8.4]
  reg [31:0] _RAND_2;
  wire [7:0] m_col_2__T_42_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_2__T_42_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_2__T_46_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_2__T_46_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_2__T_50_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_2__T_50_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_2__T_54_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_2__T_54_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_2__T_22_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_2__T_22_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_2__T_22_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_2__T_22_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_2__T_27_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_2__T_27_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_2__T_27_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_2__T_27_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_2__T_32_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_2__T_32_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_2__T_32_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_2__T_32_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_2__T_37_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_2__T_37_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_2__T_37_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_2__T_37_en; // @[Mem.scala 38:14:@8.4]
  reg [7:0] m_col_3 [0:15]; // @[Mem.scala 38:14:@8.4]
  reg [31:0] _RAND_3;
  wire [7:0] m_col_3__T_42_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_3__T_42_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_3__T_46_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_3__T_46_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_3__T_50_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_3__T_50_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_3__T_54_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_3__T_54_addr; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_3__T_22_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_3__T_22_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_3__T_22_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_3__T_22_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_3__T_27_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_3__T_27_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_3__T_27_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_3__T_27_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_3__T_32_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_3__T_32_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_3__T_32_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_3__T_32_en; // @[Mem.scala 38:14:@8.4]
  wire [7:0] m_col_3__T_37_data; // @[Mem.scala 38:14:@8.4]
  wire [3:0] m_col_3__T_37_addr; // @[Mem.scala 38:14:@8.4]
  wire  m_col_3__T_37_mask; // @[Mem.scala 38:14:@8.4]
  wire  m_col_3__T_37_en; // @[Mem.scala 38:14:@8.4]
  wire [15:0] _T_57; // @[Cat.scala 30:58:@35.4]
  wire [15:0] _T_58; // @[Cat.scala 30:58:@36.4]
  assign m_col_0__T_42_addr = io_addr[3:0];
  assign m_col_0__T_42_data = m_col_0[m_col_0__T_42_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_0__T_46_addr = io_addr[3:0];
  assign m_col_0__T_46_data = m_col_0[m_col_0__T_46_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_0__T_50_addr = io_addr[3:0];
  assign m_col_0__T_50_data = m_col_0[m_col_0__T_50_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_0__T_54_addr = io_addr[3:0];
  assign m_col_0__T_54_data = m_col_0[m_col_0__T_54_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_0__T_22_data = io_wrdata[7:0];
  assign m_col_0__T_22_addr = io_addr[3:0];
  assign m_col_0__T_22_mask = 1'h1;
  assign m_col_0__T_22_en = io_wren;
  assign m_col_0__T_27_data = 8'h0;
  assign m_col_0__T_27_addr = io_addr[3:0];
  assign m_col_0__T_27_mask = 1'h0;
  assign m_col_0__T_27_en = io_wren;
  assign m_col_0__T_32_data = 8'h0;
  assign m_col_0__T_32_addr = io_addr[3:0];
  assign m_col_0__T_32_mask = 1'h0;
  assign m_col_0__T_32_en = io_wren;
  assign m_col_0__T_37_data = 8'h0;
  assign m_col_0__T_37_addr = io_addr[3:0];
  assign m_col_0__T_37_mask = 1'h0;
  assign m_col_0__T_37_en = io_wren;
  assign m_col_1__T_42_addr = io_addr[3:0];
  assign m_col_1__T_42_data = m_col_1[m_col_1__T_42_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_1__T_46_addr = io_addr[3:0];
  assign m_col_1__T_46_data = m_col_1[m_col_1__T_46_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_1__T_50_addr = io_addr[3:0];
  assign m_col_1__T_50_data = m_col_1[m_col_1__T_50_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_1__T_54_addr = io_addr[3:0];
  assign m_col_1__T_54_data = m_col_1[m_col_1__T_54_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_1__T_22_data = 8'h0;
  assign m_col_1__T_22_addr = io_addr[3:0];
  assign m_col_1__T_22_mask = 1'h0;
  assign m_col_1__T_22_en = io_wren;
  assign m_col_1__T_27_data = io_wrdata[15:8];
  assign m_col_1__T_27_addr = io_addr[3:0];
  assign m_col_1__T_27_mask = 1'h1;
  assign m_col_1__T_27_en = io_wren;
  assign m_col_1__T_32_data = 8'h0;
  assign m_col_1__T_32_addr = io_addr[3:0];
  assign m_col_1__T_32_mask = 1'h0;
  assign m_col_1__T_32_en = io_wren;
  assign m_col_1__T_37_data = 8'h0;
  assign m_col_1__T_37_addr = io_addr[3:0];
  assign m_col_1__T_37_mask = 1'h0;
  assign m_col_1__T_37_en = io_wren;
  assign m_col_2__T_42_addr = io_addr[3:0];
  assign m_col_2__T_42_data = m_col_2[m_col_2__T_42_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_2__T_46_addr = io_addr[3:0];
  assign m_col_2__T_46_data = m_col_2[m_col_2__T_46_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_2__T_50_addr = io_addr[3:0];
  assign m_col_2__T_50_data = m_col_2[m_col_2__T_50_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_2__T_54_addr = io_addr[3:0];
  assign m_col_2__T_54_data = m_col_2[m_col_2__T_54_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_2__T_22_data = 8'h0;
  assign m_col_2__T_22_addr = io_addr[3:0];
  assign m_col_2__T_22_mask = 1'h0;
  assign m_col_2__T_22_en = io_wren;
  assign m_col_2__T_27_data = 8'h0;
  assign m_col_2__T_27_addr = io_addr[3:0];
  assign m_col_2__T_27_mask = 1'h0;
  assign m_col_2__T_27_en = io_wren;
  assign m_col_2__T_32_data = io_wrdata[23:16];
  assign m_col_2__T_32_addr = io_addr[3:0];
  assign m_col_2__T_32_mask = 1'h1;
  assign m_col_2__T_32_en = io_wren;
  assign m_col_2__T_37_data = 8'h0;
  assign m_col_2__T_37_addr = io_addr[3:0];
  assign m_col_2__T_37_mask = 1'h0;
  assign m_col_2__T_37_en = io_wren;
  assign m_col_3__T_42_addr = io_addr[3:0];
  assign m_col_3__T_42_data = m_col_3[m_col_3__T_42_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_3__T_46_addr = io_addr[3:0];
  assign m_col_3__T_46_data = m_col_3[m_col_3__T_46_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_3__T_50_addr = io_addr[3:0];
  assign m_col_3__T_50_data = m_col_3[m_col_3__T_50_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_3__T_54_addr = io_addr[3:0];
  assign m_col_3__T_54_data = m_col_3[m_col_3__T_54_addr]; // @[Mem.scala 38:14:@8.4]
  assign m_col_3__T_22_data = 8'h0;
  assign m_col_3__T_22_addr = io_addr[3:0];
  assign m_col_3__T_22_mask = 1'h0;
  assign m_col_3__T_22_en = io_wren;
  assign m_col_3__T_27_data = 8'h0;
  assign m_col_3__T_27_addr = io_addr[3:0];
  assign m_col_3__T_27_mask = 1'h0;
  assign m_col_3__T_27_en = io_wren;
  assign m_col_3__T_32_data = 8'h0;
  assign m_col_3__T_32_addr = io_addr[3:0];
  assign m_col_3__T_32_mask = 1'h0;
  assign m_col_3__T_32_en = io_wren;
  assign m_col_3__T_37_data = io_wrdata[31:24];
  assign m_col_3__T_37_addr = io_addr[3:0];
  assign m_col_3__T_37_mask = 1'h1;
  assign m_col_3__T_37_en = io_wren;
  assign _T_57 = {m_col_1__T_50_data,m_col_0__T_54_data}; // @[Cat.scala 30:58:@35.4]
  assign _T_58 = {m_col_3__T_42_data,m_col_2__T_46_data}; // @[Cat.scala 30:58:@36.4]
  assign io_rddata = {_T_58,_T_57}; // @[Mem.scala 55:13:@38.4]
  always @(posedge clock) begin
    if(m_col_0__T_22_en & m_col_0__T_22_mask) begin
      m_col_0[m_col_0__T_22_addr] <= m_col_0__T_22_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_0__T_27_en & m_col_0__T_27_mask) begin
      m_col_0[m_col_0__T_27_addr] <= m_col_0__T_27_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_0__T_32_en & m_col_0__T_32_mask) begin
      m_col_0[m_col_0__T_32_addr] <= m_col_0__T_32_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_0__T_37_en & m_col_0__T_37_mask) begin
      m_col_0[m_col_0__T_37_addr] <= m_col_0__T_37_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_1__T_22_en & m_col_1__T_22_mask) begin
      m_col_1[m_col_1__T_22_addr] <= m_col_1__T_22_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_1__T_27_en & m_col_1__T_27_mask) begin
      m_col_1[m_col_1__T_27_addr] <= m_col_1__T_27_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_1__T_32_en & m_col_1__T_32_mask) begin
      m_col_1[m_col_1__T_32_addr] <= m_col_1__T_32_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_1__T_37_en & m_col_1__T_37_mask) begin
      m_col_1[m_col_1__T_37_addr] <= m_col_1__T_37_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_2__T_22_en & m_col_2__T_22_mask) begin
      m_col_2[m_col_2__T_22_addr] <= m_col_2__T_22_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_2__T_27_en & m_col_2__T_27_mask) begin
      m_col_2[m_col_2__T_27_addr] <= m_col_2__T_27_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_2__T_32_en & m_col_2__T_32_mask) begin
      m_col_2[m_col_2__T_32_addr] <= m_col_2__T_32_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_2__T_37_en & m_col_2__T_37_mask) begin
      m_col_2[m_col_2__T_37_addr] <= m_col_2__T_37_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_3__T_22_en & m_col_3__T_22_mask) begin
      m_col_3[m_col_3__T_22_addr] <= m_col_3__T_22_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_3__T_27_en & m_col_3__T_27_mask) begin
      m_col_3[m_col_3__T_27_addr] <= m_col_3__T_27_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_3__T_32_en & m_col_3__T_32_mask) begin
      m_col_3[m_col_3__T_32_addr] <= m_col_3__T_32_data; // @[Mem.scala 38:14:@8.4]
    end
    if(m_col_3__T_37_en & m_col_3__T_37_mask) begin
      m_col_3[m_col_3__T_37_addr] <= m_col_3__T_37_data; // @[Mem.scala 38:14:@8.4]
    end
  end
endmodule

MemBase also provides a masked write method, and using it might change the generated result, so I plan to try that separately.
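
For reference, a masked write might look roughly like the following. This is an untested sketch, not something verified in this post. It assumes the three-argument write(idx, data, mask) that chisel3 provides on memories whose element type is a Vec, and a chisel3 version that has VecInit. The module name MaskedNdimMem and the wrstrb port are names I made up, and addr is narrowed to 4 bits to match the 16-entry memory.

```scala
import chisel3._
import chisel3.util.Cat

// Hypothetical variant of NdimMem using a per-byte write strobe (untested sketch).
class MaskedNdimMem extends Module {
  val io = IO(new Bundle {
    val wren   = Input(Bool())
    val wrstrb = Input(UInt(4.W))   // assumed port: one enable bit per byte lane
    val addr   = Input(UInt(4.W))
    val wrdata = Input(UInt(32.W))
    val rddata = Output(UInt(32.W))
  })

  val m = Mem(16, Vec(4, UInt(8.W)))

  // Split the 32-bit write data into four byte lanes, lane 0 being bits 7..0.
  val wrVec = VecInit(Seq.tabulate(4)(i => io.wrdata(8 * i + 7, 8 * i)))
  // One Bool per lane, taken bit by bit from the strobe.
  val mask = Seq.tabulate(4)(i => io.wrstrb(i))

  when (io.wren) {
    m.write(io.addr, wrVec, mask)   // only lanes whose mask bit is set are updated
  }

  // Reverse so that lane 3 ends up in the most significant byte.
  io.rddata := Cat(m.read(io.addr).reverse)
}
```

Whether this collapses the four write ports per array into one is exactly what would need to be checked in the generated FIRRTL and Verilog.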
That was my last post of the Heisei era. I expect to keep posting Chisel material in Reiwa as well, so please stick around if you are interested.