CSE x25 Lab Assignment 3
Welcome to CSE 125/225! Each lab is formed from one or more parts. Each part is
relatively independent and parts can normally be completed in any order. Each part will teach a concept by implementing multiple modules.
Each lab will be graded on multiple categories: correctness, style/lint, git hygiene, and demonstration. Correctness will be assessed by our autograder. Lint will be assessed by Verilator (make lint). Style and hygiene will be graded by the TAs.
To run the test scripts in this lab, run make test from one of the module directories. This will run (many) tests against your solution in two simulators: Icarus Verilog and Verilator. Both will generate waveform files (.fst) in a directory: run/<test_name and parameter values>/<simulator name>. You will need to run make extraclean after make test to clean your previous output files.
You may use any waveform viewer you would like. The codespace has Surfer installed. Surfer is also an excellent web-based viewer. You can also download GTKWave, which is a bit more finicky.
Each part will have a demonstration component that must be shown to a TA or instructor for credit. We may manually grade style/lint after the assignment deadline. Style checking and linting may feel pedantic, but they are the gold standard in industry.
At any time you can run make help from inside one of the module directories to find all of the commands you can run.
When you have questions, please ask. Otherwise, happy hardware hacking!
Part 1: Memories as LUTs as Programmable Logic
Déjà vu, anyone?
We like to treat FPGAs as a “sea of gates”, but they are not. They are actually made up of discrete elements, like look-up-tables (LUTs) (Xilinx/AMD, Lattice), multiplexers (Xilinx/AMD), multipliers (Xilinx/AMD, Lattice) and memories (Xilinx/AMD). The job of the Electronic Design Automation (EDA) toolchain is to synthesize SystemVerilog into these discrete elements and then program them.
In this part, the objective is to use actual FPGA primitives to re-create the logic functions you completed in Lab 1 and 2. In effect, do synthesis by hand.
The following is the instantiation template for an AMD/Xilinx LUT6 module:
module LUT6
# (parameter [63:0] INIT = 64'h0000000000000000)
(output O
,input I0
,input I1
,input I2
,input I3
,input I4
,input I5);
The Look-up-Table (LUT) operates by using I0 through I5 as a 6-bit address that indexes the bits in the INIT parameter to produce O. I5 is the most significant address bit and I0 is the least significant. For example, if I5 through I0 have the values {1’b0, 1’b0, 1’b0, 1’b1, 1’b0, 1’b0}, O will be the value at index 4 in INIT (0 in this example).
Update: Remember that index 0 of the bit string 2’b01 is 1 (not 0).
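If the indexing is easier to see in software, here is a minimal Python sketch of the lookup described above. It is only a behavioral model, not the vendor primitive, and the function name lut_lookup is made up for this illustration.

```python
from typing import Sequence


def lut_lookup(init: int, inputs: Sequence[int]) -> int:
    """Model a LUT read. inputs[0] is I0, the least significant address bit."""
    address = 0
    for position, bit in enumerate(inputs):
        address |= (bit & 1) << position
    # O is simply the bit of INIT selected by the address.
    return (init >> address) & 1


# I5..I0 = 0,0,0,1,0,0 is address 4. The list below is ordered I0 first.
example_inputs = [0, 0, 1, 0, 0, 0]
print(lut_lookup(0x0000000000000000, example_inputs))  # default INIT, bit 4 is 0
print(lut_lookup(1 << 4, example_inputs))              # INIT with only bit 4 set
print(lut_lookup(0b01, [0]))                           # index 0 of 2'b01
```

The same model works for a 4-input LUT by passing four inputs and a 16-bit init value.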
We do not use the Xilinx LUT6 in this lab; we use the SB_LUT4 on your ICE40 FPGA. This is the SB_LUT4 definition:
module SB_LUT4 (
output O,
input I0,
input I1,
input I2,
input I3
);
parameter [15:0] LUT_INIT = 0;
Like in the Xilinx example, LUT_INIT is the LUT initialization string. The Look-up-Table (LUT) operates by using I0 through I3 as a 4-bit address that indexes the bits in the LUT_INIT parameter to produce O.
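One way to derive a LUT_INIT value by hand is to walk the truth table: for each of the 16 addresses, evaluate the function and set that bit of the init string. The Python sketch below does this for a 2-input AND on I0 and I1, chosen only as a neutral example. The helper names build_lut_init and and2 are hypothetical and are not part of the provided files.

```python
def build_lut_init(func, num_inputs=4):
    """Return the init string for a LUT that implements func(i0, i1, ...)."""
    init = 0
    for address in range(1 << num_inputs):
        # Bit k of the address is the value on input Ik.
        bits = [(address >> k) & 1 for k in range(num_inputs)]
        if func(*bits) & 1:
            init |= 1 << address
    return init


def and2(i0, i1, i2, i3):
    """Example function: AND of I0 and I1. I2 and I3 are ignored."""
    return i0 & i1


lut_init = build_lut_init(and2)
print(f"16'h{lut_init:04X}")  # 16'h8888
```

Because I2 and I3 are ignored, the 4-bit pattern for the two used inputs repeats four times across the 16-bit string.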
Please complete the following parts using the primitives dictated. All of the FPGA primitives are available in the provided folder.
● xor2: Using only the Lattice SB_LUT4 module, create a 2-input Exclusive-Or module.
● xnor2: Using only the Lattice SB_LUT4 module, create a 2-input Exclusive-Nor module.
● mux2: Using only the Lattice SB_LUT4 module, create a 2-input multiplexer module.
● full_add: Using only the Lattice ICESTORM_LC module, create a full_add module. The documentation for this module is here (Page 2-2). Another good reference is here.
You will need to produce sum_o using the internal Look-up-Table and inputs I1, I2, and CIN. These inputs also connect to “hardened” carry logic (it looks like a mux in the diagram). These are the relevant lines in the provided module:
wire mux_cin = CIN_CONST ? CIN_SET : CIN;
assign COUT = CARRY_ENABLE ? (I1_pd && I2_pd) || ((I1_pd || I2_pd) && mux_cin) : 1'bx;
LUT_INIT, CIN_SET, and CARRY_ENABLE are the three key parameters to set. All the remaining parameters can be ignored.
● We are removing this module to simplify Lab 3.
● triadder: Sometimes, ripple-carry adders (chaining the c_o from one Full-Adder to the c_i of the next Full-Adder) are suboptimal. For example, adding three numbers together requires two ripple-carry adders, and the longest path through the circuit would be through all the carry bits. Fortunately, there’s a “faster” approach.
Implement a 3-way adder without using the Verilog + operator more than once. You should use the full_add module (either the one above, or the one from the previous lab). The key technique here is to use a 3:2 compressor, which reduces three input bits to two output bits (a sum bit and a carry bit), and then add the two resulting outputs using an adder. This is a good reference: link
TL;DR: Use the full_add module as a 3:2 compressor, and then add the resulting sum and carry outputs to get the final result. This technique generalizes to N inputs if you read further in the link above.
● shift: Using only the Lattice ICESTORM_LC module, create a parameterized shift register identical to Lab 1. The shift register should shift left, and shift in d_i to the low-order bit, on the positive edge of clk_i, when enable_i == 1.
The documentation for the ICESTORM_LC module is here (Page 2-2). Another good reference is here.
These are the key lines from the provided module:
always @ (posedge polarized_clk)
  if (CEN_pu)
    o_reg <= SR_pd ? SET_NORESET : lut_o;
assign O = DFF_ENABLE ? ASYNC_SR ? o_reg_async : o_reg : lut_o;
The key ports for this module are I0, O, CLK, CEN and SR. The key parameters for this module are: LUT_INIT (How do you use the LUT_INIT to pass through I0, unmodified? How do you use it to make a mux, to select the d_i input?), SET_NORESET (related to reset_val_p), and DFF_ENABLE (parameter for enabling the D-Flip-Flop). NEG_CLK and ASYNC_SR must be left at their default values.
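For the triadder bullet above, the 3:2 compression idea can be checked in software before writing any hardware. The Python sketch below is a bit-level model under the assumption of unsigned inputs. The width argument and the function names are placeholders for this illustration, not the lab's required interface. Note that only one + appears, in the final step.

```python
def full_add(a, b, c):
    """One-bit full adder used as a 3:2 compressor: three bits in, two out."""
    sum_bit = a ^ b ^ c
    carry_bit = (a & b) | ((a | b) & c)
    return sum_bit, carry_bit


def triadder(a, b, c, width=8):
    """Add three unsigned width-bit numbers with a single carry-propagate add."""
    sum_vec = 0
    carry_vec = 0
    for i in range(width):
        s, cy = full_add((a >> i) & 1, (b >> i) & 1, (c >> i) & 1)
        sum_vec |= s << i
        # A carry out of bit i has the weight of bit i + 1.
        carry_vec |= cy << (i + 1)
    # Each bit position was compressed independently (no ripple between
    # the full adders). The only carry chain is in this final addition.
    return sum_vec + carry_vec


print(triadder(200, 100, 55))  # 355
```

The carry expression is the same majority function as the COUT line in the provided module.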
Demonstration (All Students):
There is no demonstration for this part.
Part 2: Asynchronous Memories in SystemVerilog
Let’s get some practice with memories. Instead of using the LUTs in the FPGA, let’s use Verilog to describe memories. Since these memories are asynchronous-read, they are synthesized to registers in the actual fabric.
● ram_1r1w_async: Using behavioral SystemVerilog, create a read-priority (aka read-first) asynchronous memory with 1 write port and 1 read port. It must implement the parameters width_p and depth_p. Read priority means that reads get the old write data when there is an address collision (i.e. the read happens first).
Your asynchronous memory should initialize using the function $readmemh.
When simulating, you will see a warning like this in Icarus: FST warning: array word ram_1r1w_async.ram[10] will conflict with an escaped identifier. This is OK.
● hex2ssd: Using your ram_1r1w_async, create a module that converts a 4-bit
hexadecimal number into a seven-segment display encoding. Commit your memory initialization file along with your solution.
● kpyd2hex: Using your ram_1r1w_async memory, create a module that converts from a keypad (Row, Column) output to a hexadecimal value.
kpyd_i is the one-hot encoding of the row value in the high-order bits, and the column value in the low-order bits. I think the Icebreaker PMOD pin definitions are swapped from the Digilent PMOD definitions. My solution treats Column 1 as 0001, corresponding to the column with 1/4/7/0, and Row 1 as 0001, corresponding to the row 1/2/3/A.
Commit your memory initialization file along with your solution. You will need to copy both .hex files into this directory for it to compile to your FPGA.
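A memory initialization file for $readmemh is plain text with one hexadecimal word per entry. As a rough sketch, the Python below formats a seven-segment table into that form. The segment bit order (g f e d c b a, active high) is an assumption made for this example only. Check the board's pin definitions and polarity before reusing the values. The helper name to_readmemh is hypothetical.

```python
# ASSUMPTION: bit 6..0 = segments g,f,e,d,c,b,a and a 1 lights the segment.
# Your board may use a different order or be active low.
SEGMENTS_GFEDCBA = [
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,  # 0-7
    0x7F, 0x6F, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71,  # 8-9, A, b, C, d, E, F
]


def to_readmemh(words, width_bits):
    """Format words as $readmemh text: one hex word per line, address 0 first."""
    digits = (width_bits + 3) // 4
    mask = (1 << width_bits) - 1
    return "\n".join(f"{w & mask:0{digits}x}" for w in words) + "\n"


hex_text = to_readmemh(SEGMENTS_GFEDCBA, 7)
print(hex_text, end="")
```

Saving hex_text to a file gives something you can pass to $readmemh. The file name is up to you.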
Demonstration (All Students):
Demonstrate your working Keypad to Seven-Segment Display module on the FPGA by instantiating your modules in top.sv. Use your keypad to show which button is being pressed on the seven segment display.
You will need to iteratively select columns to determine which column has a button being pressed. (Do this with a 1-hot shift register/ring counter!)
● You will need to drive the column pins on the keypad, and it will respond with the row being pressed within that column.
● The kpyd2hex module takes rows and columns as one-hot values (e.g. 00010001). However, the keypad columns are zero-hot with pull-up resistors. Therefore, the rows are also zero-hot (see datasheet for more info). For example, if button “1” is being pressed and we send 1110 to the keypad, it will respond with 1110.
● Finally, the keypad glitches if you send too many requests, so you need to slow the 12MHz clock.
You do not need to debounce or edge-detect the buttons.
● You will need to handle the case where no button is pressed.
● It is safe to assume we will only press one button at a time in a column.
● You can use persistence of vision.
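The zero-hot/one-hot bookkeeping in the notes above is easy to get backwards, so here is a small Python sketch of just that conversion plus a 4-bit ring counter step. All function names are invented for this illustration, and the "no button reads as 1111" case is an inference from the pull-up description, so confirm it against the datasheet.

```python
def invert_nibble(value):
    """Convert a 4-bit one-hot value to zero-hot, or zero-hot back to one-hot."""
    return ~value & 0xF


def ring_step(one_hot, width=4):
    """Rotate a one-hot value left by one position (ring counter)."""
    mask = (1 << width) - 1
    return ((one_hot << 1) | (one_hot >> (width - 1))) & mask


def kpyd_word(col_one_hot, row_zero_hot):
    """Build the {row, column} one-hot word from the keypad's zero-hot rows."""
    row_one_hot = invert_nibble(row_zero_hot)
    return (row_one_hot << 4) | col_one_hot


column = 0b0001                       # select Column 1
print(f"{invert_nibble(column):04b}")  # value driven to the keypad: 1110
print(f"{kpyd_word(column, 0b1110):08b}")  # button "1" pressed: 00010001
print(f"{kpyd_word(column, 0b1111):08b}")  # assumed idle rows: 00000001
```

In the idle case the row field is all zeros, which is one way to detect that no button is pressed in the selected column.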
Part 3: Elastic Pipelines and FIFOs
We are working our way up to pipelines. There are two types of pipelines: inelastic, and elastic (we will cover these in class). You can always wrap inelastic pipelines to create elastic ones. In this lab, you will write an inelastic pipeline stage. Next, you will create an elastic pipeline stage. Finally, you will use your memory (from above) to create a FIFO.
You may use whatever operators and behavioral description you prefer, except you may not use always @(*). You are encouraged to reuse whatever modules you see fit from previous labs or this lab.
● inelastic: Write an inelastic pipeline stage. When en_i is 1, it should save the data. Otherwise, it should not. When datapath_reset_p == 1, data_o should be reset to 0 if reset_i == 1 at the positive edge of the clock.
You can use /* verilator lint_off WIDTHTRUNC */ around
datapath_reset_p to clear the lint warnings. Note: This should look a lot like writing a DFF.
● elastic: Write a Mealy elastic pipeline stage. You can think of this as a 1-element FIFO, with a Mealy state machine to improve throughput. The module must be Ready-Valid-& on the input/consumer interface (ready_o and valid_i) and Ready-Valid-& on the output/producer interface (valid_o and ready_i).
When datapath_reset_p == 1, data_o should be reset to 0 if reset_i == 1 at the positive edge of the clock.
When datapath_gate_p == 1, data_o should only be updated when valid_i == 1. Otherwise, data_o should be updated whenever ready_o == 1. This is a very simple form of “Data Gating”, and is the missing “bit” from the class lecture slides.
10/23 Note: A potentially better way to say the above: When datapath_gate_p == 1, data_o should only be updated when (valid_i & ready_o) == 1. Otherwise, data_o should be updated whenever ready_o == 1.
● fifo_1r1w: Using behavioral SystemVerilog, your ram_1r1w_async module, and any other module you have written, write a First-in-First-Out (FIFO) module. The module must be Ready-Valid-& on the input/consumer interface (ready_o and valid_i) and Ready-Valid-& on the output/producer interface (valid_o and ready_i). This paper and this Google Doc have good breakdowns of the interface types.
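Before writing the FIFO in SystemVerilog, it can help to have a cycle-level reference model to compare waveforms against. The Python sketch below is one such model, assuming a circular buffer with read and write pointers and an occupancy count. This is a common structure, not necessarily the one expected by the autograder, and the class name FifoModel is made up. A transfer happens on an interface only when both its valid and ready signals are 1 in the same cycle.

```python
class FifoModel:
    """Cycle-level model of a ready/valid FIFO backed by a small memory."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [0] * depth
        self.wr_ptr = 0
        self.rd_ptr = 0
        self.count = 0

    @property
    def ready_o(self):
        return self.count < self.depth   # can accept data when not full

    @property
    def valid_o(self):
        return self.count > 0            # has data to offer when not empty

    @property
    def data_o(self):
        return self.mem[self.rd_ptr]     # asynchronous read of the head entry

    def clock(self, valid_i, data_i, ready_i):
        """Advance one positive clock edge. Returns (enqueued, dequeued)."""
        enq = bool(valid_i) and self.ready_o
        deq = self.valid_o and bool(ready_i)
        if enq:
            self.mem[self.wr_ptr] = data_i
            self.wr_ptr = (self.wr_ptr + 1) % self.depth
        if deq:
            self.rd_ptr = (self.rd_ptr + 1) % self.depth
        self.count += int(enq) - int(deq)
        return enq, deq


fifo = FifoModel(depth=2)
fifo.clock(valid_i=1, data_i=10, ready_i=0)
fifo.clock(valid_i=1, data_i=20, ready_i=0)
print(fifo.ready_o, fifo.valid_o, fifo.data_o)  # False True 10
```

Both handshakes are evaluated from the state before the clock edge, which mirrors how registered state behaves in hardware.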
Demonstration (All Students):
Demonstrate your working FIFO by using it to connect between audio input and output on your FPGA board. You will plug the PMOD I2S2 into PMOD Port B on your board, and then use 3.5mm cables to connect to the Audio I/O ports to/from your computer/speaker. You should set your FIFO to a very small depth (e.g. 2) because the Lattice boards do not have memories that support the ram_1r1w_async pattern. The output must sound the same as the original
audio for credit.
What is the maximum value for depth_p that you can use on your FPGA, before the toolchains fail to compile?
Part 4: Sinusoid / Fixed Point Representation
Have you ever heard anyone complain about how complicated IEEE 754 floating point is? The problem is that it’s easy to use (in software), until it isn’t: List of Failures from IEEE 754. For this reason, floating point arithmetic isn’t used in many safety-critical applications. For the same reasons, floating point numbers are often avoided in signal processing. Fixed point operations are vastly less complicated than floating point operations, require vastly less area, and are numerically stable.
Fixed point arithmetic follows the same rules as normal two’s-complement arithmetic. In that sense, you already know the basics. The difference is that when two fixed-point numbers are multiplied, the number of fractional digits/bits increases. For example, .5 * .5, which is representable with one fractional digit, produces .25, which needs two fractional digits to represent. In fixed point, the fractional digits represent ½ (.5), ¼ (.25), ⅛ (.125), etc. In the example above, .5 is represented in binary by .1. When you multiply 0.1 and 0.1, the result will be two bits, 0.01 (binary), or .25 (decimal).
I like to handle fractional bits by declaring the fractional bits in the negative range of the bus. For example, wire [11:-4] foo has 12 integer bits and 4 fractional bits. When foo is multiplied by itself, it produces 24 integer bits and 8 fractional bits, or [23:-8]. However, if a [-1:-4] bus is multiplied by a [11:-4] bus, the result is only a [11:-8] bus.
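The width bookkeeping above can be sanity-checked with a few lines of Python. This sketch assumes unsigned values for simplicity, and the helper names are invented for this illustration. It stores a fixed-point number as a raw integer plus a count of fractional bits, and computes the [msb:lsb] range of a product from the two operand ranges.

```python
def to_fixed(value, frac_bits):
    """Convert a real number to its raw fixed-point integer."""
    return round(value * (1 << frac_bits))


def to_real(raw, frac_bits):
    """Convert a raw fixed-point integer back to a real number."""
    return raw / (1 << frac_bits)


def product_range(msb_a, lsb_a, msb_b, lsb_b):
    """Return (msb, lsb) of the full product of buses [msb_a:lsb_a] and [msb_b:lsb_b]."""
    width = (msb_a - lsb_a + 1) + (msb_b - lsb_b + 1)  # widths add
    lsb = lsb_a + lsb_b                                # fractional bits add
    return lsb + width - 1, lsb


half = to_fixed(0.5, 1)           # binary .1, raw value 1
product = half * half             # raw value 1, now with 2 fractional bits
print(to_real(product, 2))        # 0.25
print(product_range(11, -4, 11, -4))  # (23, -8)
print(product_range(-1, -4, 11, -4))  # (11, -8)
```

The raw multiplication is an ordinary integer multiply. Only the interpretation of where the binary point sits changes.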
Here are a few good tutorials:
From Berkeley: https://inst.eecs.berkeley.edu/~cs61c/sp06/handout/fixedpt.html
From UW: https://courses.cs.washington.edu/courses/cse467/08au/labs/l5/fp.pdf
● sinusoid: Using your ram_1r1w_async memory, create a module that generates a sine wave, turning hexadecimal indices into (signed) 12-bit values. See the demo below for more information.
Since this is an audial challenge, there is no testbench for this part. If you need
accommodations, please see the instructor. Commit your memory initialization file (in hex format) along with your solution.
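One possible way to produce the sine memory initialization file is to compute the samples in software and write them out as two's-complement hex. The Python sketch below assumes a table of 16 entries covering one full period, with peak amplitude 2047. The depth, the amplitude, and the function names are all illustrative choices, not requirements from this handout.

```python
import math


def sine_table(depth, width_bits=12):
    """Return one period of a sine wave as signed integers."""
    amplitude = (1 << (width_bits - 1)) - 1  # 2047 for 12 bits
    return [round(amplitude * math.sin(2 * math.pi * n / depth))
            for n in range(depth)]


def to_twos_complement_hex(value, width_bits=12):
    """Format a signed integer as a fixed-width two's-complement hex word."""
    digits = (width_bits + 3) // 4
    return f"{value & ((1 << width_bits) - 1):0{digits}x}"


table = sine_table(16)  # ASSUMPTION: 16 entries, pick your own depth
hex_lines = [to_twos_complement_hex(v) for v in table]
print("\n".join(hex_lines))
```

A deeper table gives a smoother waveform at the cost of more storage.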
Demonstration (All Students):
Using your counter module from Lab Assignment 1/2 and your sinusoid module above, play a Tuning-A tone on the speakers in the lab with the PMOD I2S2 module.
You need to figure out how to generate a tone at 440 Hz, given that the PLL clock runs at 22.591MHz, and the I2S2 accepts a Left channel and a Right channel output at approximately 44.1 kHz. The interface to the I2S2 module is Ready-Valid-&. (Note: Do not use the output of your counter as your clock. All of your logic should run at 22.591MHz.) Implement your solution in sinusoid/top.sv. We will use this link (or similar) to determine if you have succeeded.
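The rate arithmetic is worth doing on paper (or in a script) first. The Python sketch below uses the numbers from the paragraph above and treats the sample rate as exactly 44.1 kHz, which the handout only states approximately. It also shows a phase-accumulator calculation, which is one common approach and not necessarily the intended one. The 16-bit accumulator width is an arbitrary choice for this example.

```python
CLOCK_HZ = 22_591_000     # PLL clock from the handout
SAMPLE_RATE_HZ = 44_100   # ASSUMPTION: treated as exact here
TONE_HZ = 440             # Tuning A

cycles_per_sample = CLOCK_HZ / SAMPLE_RATE_HZ
samples_per_period = SAMPLE_RATE_HZ / TONE_HZ
print(f"{cycles_per_sample:.2f} clock cycles per audio sample")
print(f"{samples_per_period:.2f} audio samples per tone period")


def phase_increment(tone_hz, sample_rate_hz, acc_bits):
    """Amount to add to a phase accumulator on every accepted sample."""
    return round(tone_hz * (1 << acc_bits) / sample_rate_hz)


def actual_tone(increment, sample_rate_hz, acc_bits):
    """Tone frequency actually produced by a given increment."""
    return increment * sample_rate_hz / (1 << acc_bits)


inc = phase_increment(TONE_HZ, SAMPLE_RATE_HZ, acc_bits=16)
print(inc, f"{actual_tone(inc, SAMPLE_RATE_HZ, 16):.3f} Hz")
```

With a phase accumulator, the top bits of the accumulator would index the sine table, and the accumulator advances only when the I2S2 interface accepts a sample.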
The clock frequency in this lab has changed. The signal from the PLL is faster, 22.591MHz, and called clk_o.
In both demonstration folders, top.sv instantiates an I2S2-to-AXI-Streaming module, which drives the I2S2 PMOD. The input and output of this module use a ready/valid handshake. The left and right audio channels are separate wires, but you can concatenate them if you would like (for your FIFO). Drive both for your sinusoid.
top.sv “works out of the box”. You can test that your setup works by connecting your computer to the audio input and connecting the audio output to amplified speakers, i.e. those with a power cable. You will need to instantiate your logic between the interfaces for the demo.