MemoCode 2009 Team Mr. Jones
General Information
- Problem assigned on March 1, 2009, solution due March 31, 2009
Simulation
Dr. Zambreno's MATLAB script for visualizing how the grids overlay: Memocode.m and Memocode.fig
Team
Team Name
We need to come up with a name. The other team picked a name that is a play on words with their platform and is also a famous song: Barracuda, by Heart. I couldn't think of a team name with a play on words with our platform, so I picked one that is a famous song and also happens to be our professor: Mr. Jones, by Counting Crows.
Team Members
- Adwait Gupte - adwait@iastate.edu
- Alex Baumgarten - abaumgar@iastate.edu
- Madhu Monga - madhum@iastate.edu
- Matt Clausman - mclausma@iastate.edu
- Pavan Gorti - gnpavan@iastate.edu
- Song Sun - sunsong@iastate.edu
Deadlines
03/12 - PPC - Adwait
03/12 - Cache Design - Pavan/Madhu
03/12 - Initializer - Matt
03/12 - Trig Unit - Matt
03/12 - Averager - Madhu
03/12 - Reader - Alex
03/12 - Waiter - Alex
03/12 - Get reference design working - Song
Top Level
We have a new top-level drawing. The original design has some major drawbacks. First, in many cases (when theta is large), we don't need to access over half of the Cartesian map. Also, there can be over 60 different polar coordinates inside a single Cartesian grid cell. This new design doesn't focus on reading the memory quickly, but instead on writing a new value every clock cycle. This makes more sense, since we won't need to read as much as we will write. If someone gets a chance, upload the picture from my email here.
Here is the new toplevel drawing:
Steps
- Start filling cache
- Convert theta and R from float to fixed point
- Compute dr, dTheta, dx, dy
- Start hardware
- Create trig table
- Start finding start and end values for rows
- Run pipeline
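As a rough illustration of the float to fixed point step above, here is a minimal Python sketch. The 16 fractional bits and the helper names (FRAC_BITS, to_fixed, to_float) are assumptions made for this example; the notes do not specify the fixed point format.

```python
# Assumed format: signed fixed point with 16 fractional bits.
# The real design's word width and fraction width are not given in these notes.
FRAC_BITS = 16


def to_fixed(value, frac_bits=FRAC_BITS):
    """Convert a float to a fixed-point integer, rounding to nearest."""
    return int(round(value * (1 << frac_bits)))


def to_float(fixed, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float (for checking only)."""
    return fixed / (1 << frac_bits)


if __name__ == "__main__":
    theta = 0.3
    theta_fx = to_fixed(theta)
    # The round trip differs from the input by at most half an LSB.
    print(theta_fx, to_float(theta_fx))
```

With round-to-nearest, the conversion error is at most half of one least significant bit, so the number of fractional bits directly sets how precisely R and theta are represented.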
Components
Initializer
add description here
PLB Master
The PLB Master is our connection to the system bus. The bus is 128 bits wide and operates at 133 MHz.
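For a sense of scale, the bus width and clock above give a theoretical peak transfer rate. The short calculation below assumes one 128-bit transfer per clock with no arbitration or protocol overhead, so the real sustained rate will be lower.

```python
# Theoretical peak only: assumes one full-width transfer every bus clock.
BUS_WIDTH_BITS = 128
BUS_CLOCK_HZ = 133_000_000

bytes_per_beat = BUS_WIDTH_BITS // 8              # 16 bytes per transfer
peak_bytes_per_second = bytes_per_beat * BUS_CLOCK_HZ

if __name__ == "__main__":
    print(peak_bytes_per_second / 1e9, "GB/s peak")
```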
Input Row Buffer
The Input Row Buffer is in charge of getting data from the PLB. It should contain an array of row buffers which hold all rows that could potentially be accessed. Since the data will be incrementing R, followed by theta, eventually a Cartesian row will no longer be needed. The rowDone signal tells the row buffer that its lowest row is no longer needed and that it can put a different row in its place. At that time, the now-unneeded row buffer should be populated with data from the SDRAM so it can be accessed quickly once needed. The Input Row Buffer has a DataValid signal (not on the drawing) to indicate that the read data is ready. Since we don't need to read the entire row, we need to come up with a good scheme for reading only partial rows so we don't waste bus time. The system will (hopefully) be writing back every clock cycle, so we will have a very busy bus. At present, I think the first access will have to be a miss, since we can calculate and request the first access (in the Solver Pipeline) before any data in the SDRAM can be accessed. This initial value can be very useful in determining how much of the row to read. Any ideas on how to solve this problem would be welcome.
This needs a block diagram
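Until there is a block diagram, the rowDone and DataValid behaviour described above can be modelled in software. The sketch below is only a behavioural model: the class name, the num_slots parameter, and the use of a Python list to stand in for SDRAM are all assumptions, and it fetches whole rows rather than solving the partial-row question.

```python
class InputRowBuffer:
    """Behavioural model of a sliding window of Cartesian rows (hypothetical)."""

    def __init__(self, memory, num_slots):
        self.memory = memory        # list of rows, stands in for SDRAM
        self.num_slots = num_slots  # how many rows are held at once
        self.lowest_row = 0         # index of the lowest row still held
        self.rows = {}
        for row in range(min(num_slots, len(memory))):
            self._fetch(row)

    def _fetch(self, row):
        # Models a (whole-row) read over the PLB.
        self.rows[row] = list(self.memory[row])

    def read(self, row, col):
        """Return (data_valid, value); a miss returns (False, None)."""
        if row in self.rows:
            return True, self.rows[row][col]
        return False, None

    def row_done(self):
        """Drop the lowest row and start filling the slot with the next row."""
        self.rows.pop(self.lowest_row, None)
        next_row = self.lowest_row + self.num_slots
        self.lowest_row += 1
        if next_row < len(self.memory):
            self._fetch(next_row)


if __name__ == "__main__":
    memory = [[r * 10 + c for c in range(4)] for r in range(5)]
    buf = InputRowBuffer(memory, num_slots=2)
    print(buf.read(0, 1))   # hit on a held row
    print(buf.read(2, 0))   # miss: row 2 is not held yet
    buf.row_done()
    print(buf.read(2, 3))   # row 2 has replaced row 0
```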
Solver Pipeline
The Solver Pipeline is a structure designed to accept a new R_cnt and Theta_cnt value every clock cycle. These are the row and column addresses in the memory. All stages must be properly pipelined, with a larger emphasis on clock rate than on latency. This pipeline should only stall if the Writeback Fifo is full (unlikely) or there is a miss in the Input Row Buffer.
R Value Calc and Theta Value Calc convert the row and column address into actual fixed point R and theta values, based on inR and inTheta (the initialization values) and dR and dTheta. They pass out R_cnt, R, Theta_cnt, and theta.
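One plausible reading of this stage is a linear mapping from each count to its value. The sketch below assumes R = inR + R_cnt * dR (and likewise for theta), with every quantity already held as a fixed point integer; the function names and the 16 fractional bits are assumptions for illustration, not the confirmed design.

```python
FRAC_BITS = 16  # assumed fixed-point fraction width


def r_value(in_r, d_r, r_cnt):
    """Assumed mapping: R = inR + R_cnt * dR, all in fixed point."""
    return in_r + r_cnt * d_r


def theta_value(in_theta, d_theta, theta_cnt):
    """Assumed mapping: theta = inTheta + Theta_cnt * dTheta, all in fixed point."""
    return in_theta + theta_cnt * d_theta


if __name__ == "__main__":
    in_r = 2 << FRAC_BITS        # 2.0 in fixed point
    d_r = 1 << (FRAC_BITS - 1)   # 0.5 in fixed point
    print(r_value(in_r, d_r, 3) / (1 << FRAC_BITS))
```

In hardware the multiply could be avoided by accumulating dR each time the count increments, which would fit the emphasis on clock rate.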
The rcosTheta, rsinTheta module does exactly what its name says: it finds the x, y location of the Cartesian point. This location then needs to be debiased (based on the minimum x and y values), then scaled and truncated to make it a memory address. The minimum X is always RcosTheta. The minimum Y is always 0. We also need the range of X, which is (R+1) - RcosTheta. So the bias factor is RcosTheta, which is subtracted, and the result is then scaled by the range so that it can be floored to the desired X location. Y is done in a similar fashion. Note that the Initializer should probably make use of R Value Calc and Theta Value Calc to do this. If this slows down the pipeline too much, these modules could be replicated instead.
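The debias, scale, and floor sequence can be sketched in software as follows. This uses floating point and math.cos/math.sin for readability, whereas the hardware would use fixed point and the trig table; the parameter names (x_min, x_range, n_cols, and so on) and the clamping at the grid edge are assumptions added for the example.

```python
import math


def cartesian_index(r, theta, x_min, x_range, y_min, y_range, n_cols, n_rows):
    """Map a polar point to a (row, col) index in the Cartesian grid.

    Steps: convert to x, y; subtract the minimum (debias); scale by
    grid size over range; floor to an integer index.
    """
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    col = math.floor((x - x_min) / x_range * n_cols)
    row = math.floor((y - y_min) / y_range * n_rows)
    # Assumed edge handling: a point exactly on the upper bound
    # is clamped into the last cell.
    col = min(max(col, 0), n_cols - 1)
    row = min(max(row, 0), n_rows - 1)
    return row, col


if __name__ == "__main__":
    print(cartesian_index(1.0, 0.0, 0.0, 2.0, 0.0, 2.0, 8, 8))
```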
After we know the x, y memory locations, a read is requested. Then the counts are passed to the wait stage, which is long enough that an Input Row Buffer hit returns just as the counts leave the wait stage. This way the Averager can run right away.
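The wait stage behaves like a fixed-depth shift register that delays the counts by the hit latency of the Input Row Buffer. A minimal model is sketched below; the class name and the depth parameter are hypothetical, and the actual latency is not stated in these notes.

```python
from collections import deque


class WaitStage:
    """Fixed-depth delay line: an item leaves `depth` clocks after it enters."""

    def __init__(self, depth):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        # Slots start empty (None), like a pipeline filled with bubbles.
        self.slots = deque([None] * depth, maxlen=depth)

    def clock(self, item):
        """Advance one clock: return the oldest entry and shift in `item`."""
        leaving = self.slots[0]
        self.slots.append(item)  # maxlen drops the oldest entry
        return leaving


if __name__ == "__main__":
    stage = WaitStage(depth=3)
    for counts in [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]:
        print(stage.clock(counts))
```

If the buffer's hit latency changes, only the depth needs to change, so the Averager still sees the counts and the read data in the same cycle.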
The Averager computes the average and sends the data to the Writeback Fifo.
Writeback Fifo
add description here
Extra Features
- Skipping the unneeded values that exist in the corners due to fitting a round polar region into a square Cartesian grid.
- Trig table overlapping.
- Compress the number of values written to the PLB when writing the same value repeatedly.
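The notes do not say how the write compression would work. One possibility is run-length encoding of consecutive identical values, sketched below; the function name and the (value, count) output format are assumptions, not a decided scheme.

```python
def compress_writes(values):
    """Run-length encode consecutive identical values as (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    print(compress_writes([5, 5, 5, 7, 7, 5]))
```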