Chisel Bootcamp - Interlude (2) - Chisel's Queue - ハードウェアの気になるあれこれ

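In the previous article, the Chisel Bootcamp moved into the interlude chapter between Module 3.2 and Module 3.3, and we studied DecoupledIO as an introduction to the Chisel standard library.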

www.tech-diningyo.info

Today we continue through the Chisel standard library. The topic this time is Chisel's Queue.

Module 3.2 Interlude: The Chisel Standard Library
- Queue

Module 3.2 Interlude: The Chisel Standard Library

Queue

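One of the components in the Chisel standard library is a queue. Chisel's Queue has the following features:

- DecoupledIO on both the data input and the data output
- Support for backpressure
- Configurable data type and queue depth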

Let's start with the sample code.

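// Imports assumed here; the Bootcamp notebook environment preloads them.
import chisel3._
import chisel3.util._
import chisel3.iotesters.{Driver, PeekPokeTester}

Driver(() => new Module {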
    // Example circuit using a Queue
    val io = IO(new Bundle {
      val in = Flipped(Decoupled(UInt(8.W)))
      val out = Decoupled(UInt(8.W))
    })
    val queue = Queue(io.in, 2)  // 2-element queue
    io.out <> queue
  }) { c => new PeekPokeTester(c) {
    // Example test sequence showing the use and behavior of Queue
    poke(c.io.out.ready, 0)
    poke(c.io.in.valid, 1)  // Enqueue an element
    poke(c.io.in.bits, 42)
    println(s"Starting:")
    println(s"\tio.in: ready=${peek(c.io.in.ready)}")
    println(s"\tio.out: valid=${peek(c.io.out.valid)}, bits=${peek(c.io.out.bits)}")
    step(1)
  
    poke(c.io.in.valid, 1)  // Enqueue another element
    poke(c.io.in.bits, 43)
    // What do you think io.out.valid and io.out.bits will be?
    println(s"After first enqueue:")
    println(s"\tio.in: ready=${peek(c.io.in.ready)}")
    println(s"\tio.out: valid=${peek(c.io.out.valid)}, bits=${peek(c.io.out.bits)}")
    step(1)
  
    poke(c.io.in.valid, 1)  // Read an element, attempt to enqueue
    poke(c.io.in.bits, 44)
    poke(c.io.out.ready, 1)
    // What do you think io.in.ready will be, and will this enqueue succeed, and what will be read?
    println(s"On first read:")
    println(s"\tio.in: ready=${peek(c.io.in.ready)}")
    println(s"\tio.out: valid=${peek(c.io.out.valid)}, bits=${peek(c.io.out.bits)}")
    step(1)
  
    poke(c.io.in.valid, 0)  // Read elements out
    poke(c.io.out.ready, 1)
    // What do you think will be read here?
    println(s"On second read:")
    println(s"\tio.in: ready=${peek(c.io.in.ready)}")
    println(s"\tio.out: valid=${peek(c.io.out.valid)}, bits=${peek(c.io.out.bits)}")
    step(1)
  
    // Will a third read produce anything?
    println(s"On third read:")
    println(s"\tio.in: ready=${peek(c.io.in.ready)}")
    println(s"\tio.out: valid=${peek(c.io.out.valid)}, bits=${peek(c.io.out.bits)}")
    step(1)
} }

As you can see, in this sample the Module is implemented directly inside the tester call, and the instantiated module is handed straight to the tester.

Besides Queue, a couple of new features are quietly used here, so let's pull out just the Module definition and break it down.

A module that uses Queue

Extracting the Module definition from the sample above into a standalone Chisel module gives the following.

class QueueWrapper extends Module {
    // Example circuit using a Queue
    val io = IO(new Bundle {
      val in = Flipped(Decoupled(UInt(8.W)))
      val out = Decoupled(UInt(8.W))
    })
    val queue = Queue(io.in, 2)  // 2-element queue
    io.out <> queue
}

In this sample I defined QueueWrapper as a wrapper class around Chisel's Queue. One of the new features mentioned above is Flipped, which is used in this class's IO.

Flipped

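Honestly, it does what the name suggests. You can probably guess what it does, and your guess is probably right.

Put simply, its function could be described as:

It produces an object in which the In/Out directions of the IO passed as its argument are reversed.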

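So if you generate RTL from the QueueWrapper class above and look at the IO portion, the ports of the DecoupledIO assigned to in are defined with their In/Out directions reversed, as shown below.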

module cmd4HelperQueueWrapper( // @[:@53.2]
  input        clock, // @[:@54.4]
  input        reset, // @[:@55.4]
  output       io_in_ready, // @[:@56.4]
  input        io_in_valid, // @[:@56.4]
  input  [7:0] io_in_bits, // @[:@56.4]
  input        io_out_ready, // @[:@56.4]
  output       io_out_valid, // @[:@56.4]
  output [7:0] io_out_bits // @[:@56.4]
);

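When writing the counterpart of a module I had made, I used to copy and paste the port declarations and flip only the In/Out directions by hand, but in Chisel a single Flipped does it. This is really convenient!!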

I was curious, so let's dig a little deeper.

Flipped itself is defined in a Chisel3 file called Data.scala. Its declaration looks like this.

object Flipped {
  def apply[T<:Data](source: T)(implicit compileOptions: CompileOptions): T = {
    if (compileOptions.checkSynthesizable) {
      requireIsChiselType(source)
    }
    val out = source.cloneType.asInstanceOf[T]
    out.specifiedDirection = SpecifiedDirection.flip(source.specifiedDirection)
    out
  }
}

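I can't yet explain Scala syntax in much detail, but when Flipped is used, the body of apply is executed, and it appears to flip the ports by:

- cloning the Chisel Data type passed as the argument
- applying SpecifiedDirection.flip to the clone, out
- returning out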

SpecifiedDirection.flip uses a match expression, as shown below, to return the opposite of each direction attribute.

def flip(dir: SpecifiedDirection) = dir match {
    case Unspecified => Flip
    case Flip => Unspecified
    case Output => Input
    case Input => Output
}
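To make the two steps above concrete, here is a minimal sketch in plain Scala with no Chisel dependency. The names DirectionModel, PortModel and FlippedModel are hypothetical and exist only for this illustration; the real Flipped works on Chisel Data objects and is more involved. The sketch only shows that writing FlippedModel(p) calls apply, and that flip maps each direction to its opposite.

```scala
// Plain-Scala toy model; nothing here is part of Chisel.
sealed trait Direction
case object Unspecified extends Direction
case object Flip extends Direction
case object Output extends Direction
case object Input extends Direction

object DirectionModel {
  // Same mapping as the SpecifiedDirection.flip excerpt above.
  def flip(dir: Direction): Direction = dir match {
    case Unspecified => Flip
    case Flip        => Unspecified
    case Output      => Input
    case Input       => Output
  }
}

// In this model a port is just a name plus a direction (an assumption).
final case class PortModel(name: String, dir: Direction)

object FlippedModel {
  // FlippedModel(p) is sugar for FlippedModel.apply(p):
  // copy the port, then flip the direction on the copy.
  def apply(source: PortModel): PortModel =
    source.copy(dir = DirectionModel.flip(source.dir))
}

object FlippedModelDemo {
  def main(args: Array[String]): Unit = {
    val valid = PortModel("valid", Output)
    val flipped = FlippedModel(valid)
    println(s"$valid -> $flipped")
    // Flipping twice gives back the original port.
    println(FlippedModel(flipped) == valid)
  }
}
```

Because flip pairs Output with Input and Unspecified with Flip, applying it twice always returns the starting direction.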

Instantiating `Queue`

Now, back to the main thread: let's look at where Queue is instantiated and how it is used.

val queue = Queue(io.in, 2)  // 2-element queue
io.out <> queue

This is very simple: pass io.in as the first argument and the queue depth as the second.

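Here another unfamiliar symbol, <>, shows up. ~~From what I found, it seems to be an operation called projection.~~ (If you know what this <> operator is called, I'd love to hear it. Scala symbols are just too hard to search for.)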

stackoverflow.com

~~That Stack Overflow post explains the mechanism very carefully, so I more or less understood it.~~

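Addendum, 2019/07/29

The projection business above turned out to be unrelated. In Scala, symbols can be used as method names, and Chisel takes advantage of that to define a method for connecting IO, as follows.

The underlying implementation is bulkConnect, so I suppose it can be called a "bulk connect".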

final def <> (that: Data)(implicit sourceInfo: SourceInfo, connectionCompileOptions: CompileOptions): Unit =
  this.bulkConnect(that)(sourceInfo, connectionCompileOptions)
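As a small illustration of a symbol being used as a method name, here is a plain Scala sketch. WireModel and bulkConnectModel are hypothetical names made up for this example and are not Chisel types; the only point is that <> is an ordinary method that forwards to a named one.

```scala
// Toy example in plain Scala; WireModel is not a Chisel type.
final class WireModel(val name: String) {
  // Names of the wires this one has been connected to.
  var connectedTo: List[String] = Nil

  // A method whose name is the symbol <>. It forwards to a named method,
  // in the same way the excerpt above forwards to bulkConnect.
  def <>(that: WireModel): Unit = bulkConnectModel(that)

  // Records the connection on both sides (a simplification for this model).
  private def bulkConnectModel(that: WireModel): Unit = {
    connectedTo = that.name :: connectedTo
    that.connectedTo = name :: that.connectedTo
  }
}

object SymbolicMethodDemo {
  def main(args: Array[String]): Unit = {
    val out = new WireModel("io.out")
    val queue = new WireModel("queue")
    out <> queue    // infix form, as in io.out <> queue
    out.<>(queue)   // the same call written as an ordinary method call
    println(out.connectedTo)
  }
}
```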

End of addendum

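The Queue(io.in, 2) call also looks like a case where apply is invoked, as described for Flipped above.

With that in mind, looking at the implementation of Queue, its apply turned out to be as follows.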

def apply[T <: Data](
    enq: ReadyValidIO[T],
    entries: Int = 2,
    pipe: Boolean = false,
    flow: Boolean = false): DecoupledIO[T] = {
    if (entries == 0) {
        val deq = Wire(new DecoupledIO(enq.bits))
        deq.valid := enq.valid
        deq.bits := enq.bits
        enq.ready := deq.ready
        deq
    } else {
        require(entries > 0)
        val q = Module(new Queue(chiselTypeOf(enq.bits), entries, pipe, flow))
        q.io.enq.valid := enq.valid // not using <> so that override is allowed
        q.io.enq.bits := enq.bits
        enq.ready := q.io.enq.ready
        TransitName(q.io.deq, q)
    }
}

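The processing switches on the value of entries, and appears to work like this:

- When the queue depth is 0, it declares a Wire (named deq, even though nothing is queued) and connects the I/O straight through
- Otherwise, it instantiates the Queue class and connects the individual signals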

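RTL generated from the Chisel standard library `Queue`

Generating RTL from the sample's depth-2 queue gave the following.

Incidentally, the Chisel implementation of the first module below, Queue, lives in Decoupled.scala, so take a look if you are interested.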

// This is the queue implementation generated from the Chisel standard library Queue
module Queue( // @[:@3.2]
  input        clock, // @[:@4.4]
  input        reset, // @[:@5.4]
  output       io_enq_ready, // @[:@6.4]
  input        io_enq_valid, // @[:@6.4]
  input  [7:0] io_enq_bits, // @[:@6.4]
  input        io_deq_ready, // @[:@6.4]
  output       io_deq_valid, // @[:@6.4]
  output [7:0] io_deq_bits // @[:@6.4]
);
  reg [7:0] ram [0:1]; // @[Decoupled.scala 214:24:@8.4]
  reg [31:0] _RAND_0;
  wire [7:0] ram__T_63_data; // @[Decoupled.scala 214:24:@8.4]
  wire  ram__T_63_addr; // @[Decoupled.scala 214:24:@8.4]
  wire [7:0] ram__T_49_data; // @[Decoupled.scala 214:24:@8.4]
  wire  ram__T_49_addr; // @[Decoupled.scala 214:24:@8.4]
  wire  ram__T_49_mask; // @[Decoupled.scala 214:24:@8.4]
  wire  ram__T_49_en; // @[Decoupled.scala 214:24:@8.4]
  reg  value; // @[Counter.scala 26:33:@9.4]
  reg [31:0] _RAND_1;
  reg  value_1; // @[Counter.scala 26:33:@10.4]
  reg [31:0] _RAND_2;
  reg  maybe_full; // @[Decoupled.scala 217:35:@11.4]
  reg [31:0] _RAND_3;
  wire  _T_41; // @[Decoupled.scala 219:41:@12.4]
  wire  _T_43; // @[Decoupled.scala 220:36:@13.4]
  wire  empty; // @[Decoupled.scala 220:33:@14.4]
  wire  _T_44; // @[Decoupled.scala 221:32:@15.4]
  wire  do_enq; // @[Decoupled.scala 37:37:@16.4]
  wire  do_deq; // @[Decoupled.scala 37:37:@19.4]
  wire [1:0] _T_52; // @[Counter.scala 35:22:@26.6]
  wire  _T_53; // @[Counter.scala 35:22:@27.6]
  wire  _GEN_4; // @[Decoupled.scala 225:17:@22.4]
  wire [1:0] _T_56; // @[Counter.scala 35:22:@32.6]
  wire  _T_57; // @[Counter.scala 35:22:@33.6]
  wire  _GEN_5; // @[Decoupled.scala 229:17:@30.4]
  wire  _T_58; // @[Decoupled.scala 232:16:@36.4]
  wire  _GEN_6; // @[Decoupled.scala 232:28:@37.4]
  wire  _T_60; // @[Decoupled.scala 236:19:@40.4]
  wire  _T_62; // @[Decoupled.scala 237:19:@42.4]
  assign ram__T_63_addr = value_1;
  assign ram__T_63_data = ram[ram__T_63_addr]; // @[Decoupled.scala 214:24:@8.4]
  assign ram__T_49_data = io_enq_bits;
  assign ram__T_49_addr = value;
  assign ram__T_49_mask = do_enq;
  assign ram__T_49_en = do_enq;
  assign _T_41 = value == value_1; // @[Decoupled.scala 219:41:@12.4]
  assign _T_43 = maybe_full == 1'h0; // @[Decoupled.scala 220:36:@13.4]
  assign empty = _T_41 & _T_43; // @[Decoupled.scala 220:33:@14.4]
  assign _T_44 = _T_41 & maybe_full; // @[Decoupled.scala 221:32:@15.4]
  assign do_enq = io_enq_ready & io_enq_valid; // @[Decoupled.scala 37:37:@16.4]
  assign do_deq = io_deq_ready & io_deq_valid; // @[Decoupled.scala 37:37:@19.4]
  assign _T_52 = value + 1'h1; // @[Counter.scala 35:22:@26.6]
  assign _T_53 = _T_52[0:0]; // @[Counter.scala 35:22:@27.6]
  assign _GEN_4 = do_enq ? _T_53 : value; // @[Decoupled.scala 225:17:@22.4]
  assign _T_56 = value_1 + 1'h1; // @[Counter.scala 35:22:@32.6]
  assign _T_57 = _T_56[0:0]; // @[Counter.scala 35:22:@33.6]
  assign _GEN_5 = do_deq ? _T_57 : value_1; // @[Decoupled.scala 229:17:@30.4]
  assign _T_58 = do_enq != do_deq; // @[Decoupled.scala 232:16:@36.4]
  assign _GEN_6 = _T_58 ? do_enq : maybe_full; // @[Decoupled.scala 232:28:@37.4]
  assign _T_60 = empty == 1'h0; // @[Decoupled.scala 236:19:@40.4]
  assign _T_62 = _T_44 == 1'h0; // @[Decoupled.scala 237:19:@42.4]
  assign io_enq_ready = _T_62;
  assign io_deq_valid = _T_60;
  assign io_deq_bits = ram__T_63_data;
  always @(posedge clock) begin
    if(ram__T_49_en & ram__T_49_mask) begin
      ram[ram__T_49_addr] <= ram__T_49_data; // @[Decoupled.scala 214:24:@8.4]
    end
    if (reset) begin
      value <= 1'h0;
    end else begin
      if (do_enq) begin
        value <= _T_53;
      end
    end
    if (reset) begin
      value_1 <= 1'h0;
    end else begin
      if (do_deq) begin
        value_1 <= _T_57;
      end
    end
    if (reset) begin
      maybe_full <= 1'h0;
    end else begin
      if (_T_58) begin
        maybe_full <= do_enq;
      end
    end
  end
endmodule

// This is the wrapper class
module cmd4HelperQueueWrapper( // @[:@53.2]
  input        clock, // @[:@54.4]
  input        reset, // @[:@55.4]
  output       io_in_ready, // @[:@56.4]
  input        io_in_valid, // @[:@56.4]
  input  [7:0] io_in_bits, // @[:@56.4]
  input        io_out_ready, // @[:@56.4]
  output       io_out_valid, // @[:@56.4]
  output [7:0] io_out_bits // @[:@56.4]
);
  wire  queue_clock; // @[Decoupled.scala 293:21:@58.4]
  wire  queue_reset; // @[Decoupled.scala 293:21:@58.4]
  wire  queue_io_enq_ready; // @[Decoupled.scala 293:21:@58.4]
  wire  queue_io_enq_valid; // @[Decoupled.scala 293:21:@58.4]
  wire [7:0] queue_io_enq_bits; // @[Decoupled.scala 293:21:@58.4]
  wire  queue_io_deq_ready; // @[Decoupled.scala 293:21:@58.4]
  wire  queue_io_deq_valid; // @[Decoupled.scala 293:21:@58.4]
  wire [7:0] queue_io_deq_bits; // @[Decoupled.scala 293:21:@58.4]
  Queue queue ( // @[Decoupled.scala 293:21:@58.4]
    .clock(queue_clock),
    .reset(queue_reset),
    .io_enq_ready(queue_io_enq_ready),
    .io_enq_valid(queue_io_enq_valid),
    .io_enq_bits(queue_io_enq_bits),
    .io_deq_ready(queue_io_deq_ready),
    .io_deq_valid(queue_io_deq_valid),
    .io_deq_bits(queue_io_deq_bits)
  );
  assign io_in_ready = queue_io_enq_ready;
  assign io_out_valid = queue_io_deq_valid;
  assign io_out_bits = queue_io_deq_bits;
  assign queue_clock = clock;
  assign queue_reset = reset;
  assign queue_io_enq_valid = io_in_valid;
  assign queue_io_enq_bits = io_in_bits;
  assign queue_io_deq_ready = io_out_ready;
endmodule
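The generated Queue RTL above can be mirrored in a small software model, which also answers the questions asked in the tester comments at the start of the article. This is a sketch in plain Scala, assuming pipe = false and flow = false. QueueModel and its member names are hypothetical, and the RAM is initialised to 0 here whereas the RTL leaves it uninitialised.

```scala
// Cycle-level software model of the generated RTL (pipe = false, flow = false).
// QueueModel is a hypothetical name; the state variables follow the Verilog.
final class QueueModel(entries: Int = 2) {
  require(entries > 0)
  private val ram = Array.fill(entries)(0) // assumption: starts at 0
  private var enqPtr = 0                   // "value" in the RTL
  private var deqPtr = 0                   // "value_1" in the RTL
  private var maybeFull = false            // "maybe_full" in the RTL

  private def ptrMatch: Boolean = enqPtr == deqPtr
  def empty: Boolean = ptrMatch && !maybeFull
  def full: Boolean = ptrMatch && maybeFull

  // Combinational outputs.
  def enqReady: Boolean = !full
  def deqValid: Boolean = !empty
  def deqBits: Int = ram(deqPtr)

  // One rising clock edge with the given inputs.
  def step(enqValid: Boolean, enqBits: Int, deqReady: Boolean): Unit = {
    val doEnq = enqReady && enqValid
    val doDeq = deqReady && deqValid
    if (doEnq) { ram(enqPtr) = enqBits; enqPtr = (enqPtr + 1) % entries }
    if (doDeq) { deqPtr = (deqPtr + 1) % entries }
    if (doEnq != doDeq) maybeFull = doEnq
  }
}

object QueueModelDemo {
  private def show(label: String, q: QueueModel): Unit =
    println(s"$label: in.ready=${q.enqReady}, out.valid=${q.deqValid}, out.bits=${q.deqBits}")

  def main(args: Array[String]): Unit = {
    val q = new QueueModel(2)
    // (label, in.valid, in.bits, out.ready) for each cycle of the tester sequence.
    val stimulus = Seq(
      ("Starting", true, 42, false),
      ("After first enqueue", true, 43, false),
      ("On first read", true, 44, true),
      ("On second read", false, 44, true),
      ("On third read", false, 44, true)
    )
    for ((label, enqValid, enqBits, deqReady) <- stimulus) {
      show(label, q)                      // peek before the clock edge
      q.step(enqValid, enqBits, deqReady) // then step(1)
    }
  }
}
```

In this model the third enqueue (44) is refused because in.ready is low while the queue holds two entries, and 42 and then 43 come out in order on the two reads.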

Arguments of `Queue`

Finally, let me go over the arguments of Chisel's Queue once more. Incidentally, this information comes from here:

new Queue(gen: T, entries: Int, pipe: Boolean = false, flow: Boolean = false)(implicit compileOptions: CompileOptions)

gen     : The type of data to queue
entries : The max number of entries in the queue
pipe    : True if a single entry queue can run at full throughput (like a pipeline).
          The ready signals are combinationally coupled.
flow    : True if the inputs can be consumed on the same cycle (the inputs "flow"
          through the queue immediately). The valid signals are coupled.

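As we have already seen, gen and entries are the data type carried on the interface and the queue depth.

The remaining two, pipe and flow, affect how the queue behaves. Extracting the relevant parts from Chisel's Queue implementation gives the following for each.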

pipe

if (pipe) {
  when (io.deq.ready) { io.enq.ready := true.B }
}

With pipe == true, when deq.ready becomes 1, the ready on the enq side also becomes 1 in that same cycle, so data can be written into the queue even if it is currently full.

flow

if (flow) {
  when (io.enq.valid) { io.deq.valid := true.B }
  when (empty) {
    io.deq.bits := io.enq.bits
    do_deq := false.B
    when (io.deq.ready) { do_enq := false.B }
  }
}

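flow affects when written data becomes available on the deq side. When it is true, the data can be taken on the deq side in the same cycle it is written (the cycle where enq.valid == true). If the queue is empty and deq.ready is asserted in that cycle, the data is bypassed and never enters the queue's internal storage.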
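The effect of the pipe and flow excerpts above can be tried out with a small software model. This is a sketch in plain Scala that assumes the rest of the queue behaves like the pointer-plus-maybe_full scheme seen in the generated RTL; FlowPipeQueueModel and its member names are hypothetical.

```scala
// Software model of the handshake with the pipe/flow overrides quoted above.
// FlowPipeQueueModel is a hypothetical name, not part of Chisel.
final class FlowPipeQueueModel(entries: Int, pipe: Boolean = false, flow: Boolean = false) {
  require(entries > 0)
  private val ram = Array.fill(entries)(0) // assumption: starts at 0
  private var enqPtr = 0
  private var deqPtr = 0
  private var maybeFull = false

  def empty: Boolean = enqPtr == deqPtr && !maybeFull
  def full: Boolean = enqPtr == deqPtr && maybeFull

  // pipe: a ready consumer makes enq.ready true in the same cycle, even when full.
  def enqReady(deqReady: Boolean): Boolean = !full || (pipe && deqReady)
  // flow: a valid producer makes deq.valid true in the same cycle, even when empty.
  def deqValid(enqValid: Boolean): Boolean = !empty || (flow && enqValid)
  // flow and empty: the input is bypassed straight to the output.
  def deqBits(enqBits: Int): Int = if (flow && empty) enqBits else ram(deqPtr)

  // One rising clock edge with the given inputs.
  def step(enqValid: Boolean, enqBits: Int, deqReady: Boolean): Unit = {
    var doEnq = enqReady(deqReady) && enqValid
    var doDeq = deqReady && deqValid(enqValid)
    if (flow && empty) {
      doDeq = false
      if (deqReady) doEnq = false // bypassed data never enters the RAM
    }
    if (doEnq) { ram(enqPtr) = enqBits; enqPtr = (enqPtr + 1) % entries }
    if (doDeq) { deqPtr = (deqPtr + 1) % entries }
    if (doEnq != doDeq) maybeFull = doEnq
  }
}

object FlowPipeQueueModelDemo {
  def main(args: Array[String]): Unit = {
    // flow: an empty queue passes the input through in the same cycle.
    val f = new FlowPipeQueueModel(2, flow = true)
    println(s"flow: deq.valid=${f.deqValid(enqValid = true)}, deq.bits=${f.deqBits(7)}")

    // pipe: a full single-entry queue still accepts data while it is being read.
    val p = new FlowPipeQueueModel(1, pipe = true)
    p.step(enqValid = true, enqBits = 1, deqReady = false)
    println(s"pipe: full=${p.full}, enq.ready=${p.enqReady(deqReady = true)}")
  }
}
```

In the model, the single-entry pipe queue can dequeue one element and enqueue the next in the same cycle, which is the "full throughput" case mentioned in the parameter description.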

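Addendum, 2019/07/29

I wrote a separate article that looks into how pipe/flow change the behavior described above, so please see that as well if you need it.

End of addendum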

So today I covered Chisel's Queue.

From tomorrow I'll keep going through the Chisel standard library, though I'd like to move through it a bit more briskly.