Team 142857: Difference between revisions
Revision as of 22:38, 22 February 2012
Team members
- Michael Patterson
- Chetan N-Govindaiah
- Jungmin Park
Assignment 1
Developing a Custom Personality
1 Analyze Application
- How does the current application perform on existing hardware?
- What bottlenecks are limiting the performance?
- What data structures are involved?
- How parallelizable is the application?
2 Evaluate Hardware Options
3 Define Custom Instructions
- The functions implemented by the hardware design can then be mapped to custom instructions.
4 Develop Software Model of Custom Personality
- Convey provides an architecture simulation environment to allow rapid prototyping of both the hardware and software components of a custom personality.
- This environment is written in C++ to emulate the rest of the system. It includes hardware models of instruction dispatch, register state and the memory subsystem.
5 Modify Application to Use Coprocessor
6 Compile Application with Convey Compiler
7 Simulate Application with Convey Architecture Simulator
- This step allows the application and the custom instruction set to be debugged before the hardware is designed.
8 Develop FPGA Hardware
9 Simulate Hardware in Convey Simulation Environment
- Convey provides a hardware simulation environment with bus-functional models for all hardware interfaces to the Application Engine (AE) FPGA.
- Using the standard VPI (Verilog Procedural Interface), the architecture simulator can be used to provide stimulus to the HDL simulation.
10 Integrate with Convey Hardware
Running the Sample Application
1. Copy Sample AE and Sample Application
The Vadd sample personality and application are installed with the PDK RPM in /opt/convey/pdk/2010_08_09/.
The sample is made up of two components:
- cae_pers_vadd - contains the sample custom personality, including the software model which emulates the Application Engine FPGA
  - CaeSimPers - contains the AE simulation model of the sample personality
    - CaeIsaVadd.cpp - models the behavior of the custom personality and implements the following functions:
      - void CCaeIsa::InitPers()
      - void CCaeIsa::CaepInst(int masked, int ae, int opcode, int immed, uint64 scalar)
  - phys - Xilinx physical implementation, contains constraints files
  - testbench
  - verilog - RTL to be synthesized into the FPGA
- SampleAppVadd - contains the application that uses the instruction defined in the CasSample tree.
cd $HOME
mkdir pdk_sample
cp -r /opt/convey/pdk/2010_08_09/cae_pers_vadd pdk_sample
cp -r /opt/convey/pdk/2010_08_09/pdk_apps/SampleAppVadd pdk_sample
2. Build the Sample AE and Sample Application
cd $HOME/pdk_sample/cae_pers_vadd/CaeSim
make
cd ../../SampleAppVadd
make
3. Custom AE Software Simulation
- Set up Environment Variables
export CNY_PDK_PROJ=~/pdk_sample/cae_pers_vadd/
With this command, CNY_CAE_EMULATOR is set to ~/pdk_sample/cae_pers_vadd/CaeSimPers/CaeSimPers
- Run the application against the architecture simulator
4. Custom AE Hardware Simulation
5. AE Physical Build
cd ~/pdk_sample/cae_pers_vadd/phys
make
The PDK build flow automatically runs synthesis, place and route, timing analysis, and bitgen.
Sample Application Software Model
~/pdk_sample/cae_pers_vadd/CaeSimPers/CaeIsaVadd.cpp
ConveyPDKReferenceManual.pdf Section 7.2
Assignment 2 (Sobel Edge Detection)
Sobel Algorithm [1]
Simplified description
- In simple terms, the operator calculates the gradient of the image intensity at each point, giving the direction of the largest possible increase from light to dark and the rate of change in that direction. The result therefore shows how "abruptly" or "smoothly" the image changes at that point, and therefore how likely it is that that part of the image represents an edge, as well as how that edge is likely to be oriented. In practice, the magnitude (likelihood of an edge) calculation is more reliable and easier to interpret than the direction calculation.
- Mathematically, the gradient of a two-variable function (here the image intensity function) is at each image point a 2D vector with the components given by the derivatives in the horizontal and vertical directions. At each image point, the gradient vector points in the direction of largest possible intensity increase, and the length of the gradient vector corresponds to the rate of change in that direction. This implies that the result of the Sobel operator at an image point which is in a region of constant image intensity is a zero vector and at a point on an edge is a vector which points across the edge, from darker to brighter values.
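The gradient computation described above can be sketched for a single 3x3 patch. This is only an illustration under stated assumptions: the helper name sobel_gradient and the array names SOBEL_GX and SOBEL_GY are hypothetical, while the coefficients are the GX and GY values used in the source code on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Sobel kernels in row-major order (same coefficients as GX and GY on this page). */
static const int SOBEL_GX[9] = {-1, 0, 1, -2, 0, 2, -1, 0, 1};
static const int SOBEL_GY[9] = {-1, -2, -1, 0, 0, 0, 1, 2, 1};

/*
 * Hypothetical helper: computes the horizontal (gx) and vertical (gy)
 * gradient components for one 3x3 patch given in row-major order.
 */
static void sobel_gradient(const int patch[9], int *gx, int *gy)
{
    assert(patch != NULL && gx != NULL && gy != NULL);

    int sum_x = 0;
    int sum_y = 0;
    for (int k = 0; k < 9; k++) {
        sum_x += SOBEL_GX[k] * patch[k];
        sum_y += SOBEL_GY[k] * patch[k];
    }
    *gx = sum_x;
    *gy = sum_y;
}
```

For a patch of constant intensity both components are zero. For a patch whose right column is brighter than the rest, gx is positive and gy is zero, so the vector points across the edge from darker to brighter values.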
Source Code
- files
- sobel_edge_detection.c - sobel algorithm
- Img - folder including image files
- Makefile
sobel_edge_detection: sobel_edge_detection.c
	gcc -o sobel_edge_detection -pg sobel_edge_detection.c -lm

clean:
	rm -rf sobel_edge_detection
	rm -rf lena_edge.pgm
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define FORMAT_LEN 5 /* size of the buffer that holds the PGM magic number line */

int sobel_convolution_x(int x, int y, int *GX, int *in_frame, int width, int depth);
int sobel_convolution_y(int x, int y, int *GY, int *in_frame, int width, int depth);
void read_pgm_head(FILE *fp_read, char *format, int *width, int *height, int *depth);
void read_pgm_data(FILE *fp_read, int *in_frame, int *width, int *height);
void write_pgm(FILE *fp_write, int *out_frame, char *format, int *width, int *height, int *depth);

int main()
{
    int width, height, depth;
    char format[FORMAT_LEN];
    int sumX, sumY, SUM;
    int *in_frame, *out_frame;
    int x, y;
    FILE *fp_read, *fp_write;
    int GX[9] = {-1, 0, 1, -2, 0, 2, -1, 0, 1};
    int GY[9] = {-1, -2, -1, 0, 0, 0, 1, 2, 1};

    // Read pgm file
    fp_read = fopen("./img/lena.pgm", "rb");
    if (fp_read == NULL) {
        perror("./img/lena.pgm");
        return 1;
    }
    read_pgm_head(fp_read, format, &width, &height, &depth);
    in_frame = (int *)malloc(sizeof(int) * width * height);
    read_pgm_data(fp_read, in_frame, &width, &height);

    // Sobel Algorithm
    out_frame = (int *)malloc(sizeof(int) * width * height);
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            sumX = sobel_convolution_x(x, y, GX, in_frame, width, depth);
            sumY = sobel_convolution_y(x, y, GY, in_frame, width, depth);
            // Gradient magnitude, clamped to the valid pixel range [0, depth]
            SUM = (int)sqrt(pow((double)sumX, 2) + pow((double)sumY, 2));
            if (SUM > depth) SUM = depth;
            if (SUM < 0) SUM = 0;
            *(out_frame + y * width + x) = SUM;
            printf("(row, col) = (%d, %d), sumX = %d, sumY = %d, SUM = %d \n", y, x, sumX, sumY, SUM);
        }
    }

    // Write pgm file
    fp_write = fopen("lena_edge2.pgm", "wb");
    if (fp_write == NULL) {
        perror("lena_edge2.pgm");
        return 1;
    }
    write_pgm(fp_write, out_frame, format, &width, &height, &depth);

    free(in_frame);
    free(out_frame);
    fclose(fp_read);  // FILE* streams are closed with fclose(), not close()
    fclose(fp_write);
    return 0;
}

int sobel_convolution_x(int x, int y, int *GX, int *in_frame, int width, int depth)
{
    int i, j;
    int sumX = 0;

    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {
            // Taps right of the last column or above the first row contribute nothing
            if (x + j < width && y - 2 + i >= 0)
                sumX += GX[i * 3 + j] * (*(in_frame + width * (y - 2 + i) + x + j));
        }
    }
    return sumX;
}

int sobel_convolution_y(int x, int y, int *GY, int *in_frame, int width, int depth)
{
    int i, j;
    int sumY = 0;

    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {
            // Taps right of the last column or above the first row contribute nothing
            if (x + j < width && y - 2 + i >= 0)
                sumY += GY[i * 3 + j] * (*(in_frame + width * (y - 2 + i) + x + j));
        }
    }
    return sumY;
}

void read_pgm_head(FILE *fp_read, char *format, int *width, int *height, int *depth)
{
    char line[100];
    char line1[100];

    // format is a pointer here, so sizeof(format) would be the pointer size, not the buffer size
    fgets(format, FORMAT_LEN, fp_read);
    fputs(format, stdout);
    fgets(line, sizeof(line), fp_read);
    fputs(line, stdout);
    fgets(line1, sizeof(line1), fp_read);
    fputs(line1, stdout);
    fscanf(fp_read, "%d %d %d", width, height, depth);
    fgetc(fp_read);  // skip the single whitespace byte that separates the header from the pixel data
    printf("Width = %d, Height = %d, Depth = %d\n", *width, *height, *depth);
}

void read_pgm_data(FILE *fp_read, int *in_frame, int *width, int *height)
{
    int x, y;

    for (y = 0; y < (*height); y++) {
        for (x = 0; x < (*width); x++) {
            *(in_frame + y * (*width) + x) = (int)getc(fp_read);
        }
    }
}

void write_pgm(FILE *fp_write, int *out_frame, char *format, int *width, int *height, int *depth)
{
    int x, y;

    fprintf(fp_write, "%s\n%d %d\n %d\n", format, *width, *height, *depth);
    for (y = 0; y < (*height); y++) {
        for (x = 0; x < (*width); x++) {
            putc(*(out_frame + y * (*width) + x), fp_write);
        }
    }
}
Profile C code
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
100.35      0.01     0.01                             main
  0.00      0.01     0.00    65536     0.00     0.00  sobel_convolution_x
  0.00      0.01     0.00    65536     0.00     0.00  sobel_convolution_y
  0.00      0.01     0.00        1     0.00     0.00  read_pgm_data
  0.00      0.01     0.00        1     0.00     0.00  read_pgm_head
  0.00      0.01     0.00        1     0.00     0.00  write_pgm
Software Simulation
- Software emulator
- CaeSimPers: We modified CaeIsaVadd.cpp to perform element-wise vector multiplication instead of addition. The environment variable CNY_CAE_EMULATOR must point to the CaeSimPers executable.
export CNY_CAE_EMULATOR=$CNY_PDK_PROJ/CaeSimPers/CaeSimPers
void CCaeIsa::CaepInst(int aeId, int opcode, int immed, uint32 inst, uint64 scalar) // F7,0,20-3F
{
    switch (opcode) {
        // CAEP00 - M[a1] * M[a2] -> M[a3]
        case 0x20: {
            uint64 length, a1, a2, a3;
            uint64 val1, val2, val3, sum = 0;

            length = ReadAeg(aeId, AEG_CNT);
            a1 = ReadAeg(aeId, AEG_MA1);
            a2 = ReadAeg(aeId, AEG_MA2);
            a3 = ReadAeg(aeId, AEG_MA3);

            for (int mc = 0; mc < NUM_MCS; mc += 1) {
                for (uint64 i = 0; i < length; i += 1) {
                    // Check that address is right for this MC
                    // (virtual address bits 8:6 for binary interleave)
                    if ((int)(((a1 + i * 8) >> 6) & 7) == mc) {
                        AeMemLoad(aeId, mc, a1 + i * 8, MEM_REQ_SIZE, false, val1);
                        AeMemLoad(aeId, mc, a2 + i * 8, MEM_REQ_SIZE, false, val2);
                        val3 = val1 * val2;
                        sum += val3;
                        AeMemStore(aeId, mc, a3 + i * 8, MEM_REQ_SIZE, false, val3);
                    }
                }
            }
            WriteAeg(aeId, AEG_SAE_BASE + aeId, sum);
            break;
        }
        default: {
            printf("Default case hit - opcode = %x\n", opcode);
            for (int aeId = 0; aeId < CAE_AE_CNT; aeId += 1)
                SetException(aeId, AEUIE);
        }
    }
}
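The address check in the emulator code above selects a memory controller from virtual address bits 8:6. A minimal standalone sketch of that mapping follows. The helper names mc_for_address and elements_on_mc are hypothetical, and the value 8 for NUM_MCS is an assumption inferred from the "& 7" mask rather than taken from the PDK headers.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_MCS 8 /* assumption: eight memory controllers, matching the "& 7" mask */

/* Hypothetical helper: memory controller selected by virtual address bits 8:6. */
static int mc_for_address(uint64_t addr)
{
    return (int)((addr >> 6) & (NUM_MCS - 1));
}

/*
 * Hypothetical helper: counts how many of `length` consecutive 8-byte
 * elements starting at `base` are served by memory controller `mc`.
 */
static uint64_t elements_on_mc(uint64_t base, uint64_t length, int mc)
{
    assert(mc >= 0 && mc < NUM_MCS);

    uint64_t count = 0;
    for (uint64_t i = 0; i < length; i++) {
        if (mc_for_address(base + i * 8) == mc)
            count++;
    }
    return count;
}
```

Under this mapping each 64-byte block (eight 8-byte elements) belongs to one controller, and consecutive blocks rotate through the controllers, so a 64-element vector starting at address 0 places eight elements on each controller.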
- Sobel_edge_detection.c
// Make vectors [width*height*18]
values = (int *)malloc(sizeof(int) * width * height * 18);
cons = (int *)malloc(sizeof(int) * width * height * 18);
for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
        for (k = 0; k < 2; k++) {          // two kernels, 9 taps each
            for (i = 0; i < 3; i++) {
                for (j = 0; j < 3; j++) {
                    *(cons + 18 * (width * y + x) + k * 9 + i * 3 + j) = G[k * 9 + i * 3 + j];
                    if (x + j >= width || y - 2 + i < 0)
                        *(values + 18 * (width * y + x) + k * 9 + i * 3 + j) = 0;
                    else
                        *(values + 18 * (width * y + x) + k * 9 + i * 3 + j) = *(in_frame + width * (y - 2 + i) + x + j);
                }
            }
        }
    }
}
....
// Send vectors to Coprocessor
for (i = 0; i < size; i++) {
    a1[i] = cons[i];
    a2[i] = values[i];
}
// Call coprocessor
act_sum = l_copcall_fmt(sig, cpTestEx1, "AAAAA", a1, a2, a3, size, ae_sum);
...
// Computation at Intel processor ( SUM <- |Gx| + |Gy| )
for (i = 0; i < size; i++) {
    if (i % 18 == 8) {
        // Last of the first 9 products for this pixel
        sum1 += (int)a3[i];
        SUM = abs(sum1);
        sum1 = 0;
    }
    else if (i % 18 == 17) {
        // Last of the second 9 products: the pixel is complete
        sum1 += (int)a3[i];
        SUM += abs(sum1);
        sum1 = 0;
        if (SUM > depth) SUM = depth;
        if (SUM < 0) SUM = 0;
        // i / 18 is the index of the pixel just completed; (i+1)/18 would
        // shift the output by one pixel and write past the end of out_frame
        *(out_frame + i / 18) = SUM;
        printf("(row, col) = (%d, %d), SUM = %d \n", i / (18 * width), (i / 18) % width, SUM);
        SUM = 0;
    }
    else
        sum1 += (int)a3[i];
}
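The host-side reduction above can also be written per pixel instead of per product, which makes the 18-products-per-pixel layout explicit. This is a sketch under assumptions, not the team's code: the function name reduce_products, the macro names, and the use of long long for the product vector are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TAPS_PER_KERNEL 9
#define PRODUCTS_PER_PIXEL (2 * TAPS_PER_KERNEL) /* 9 products per kernel, two kernels */

/*
 * Hypothetical helper: reduces the element-wise product vector returned by
 * the coprocessor to one value per pixel, |Gx| + |Gy|, clamped to [0, depth].
 */
static void reduce_products(const long long *products, size_t num_pixels,
                            int depth, int *out_frame)
{
    assert(products != NULL && out_frame != NULL);

    for (size_t p = 0; p < num_pixels; p++) {
        const long long *px = products + p * PRODUCTS_PER_PIXEL;
        long long gx = 0;
        long long gy = 0;
        for (int k = 0; k < TAPS_PER_KERNEL; k++) {
            gx += px[k];                   /* first 9 products of the pixel */
            gy += px[TAPS_PER_KERNEL + k]; /* second 9 products of the pixel */
        }
        long long sum = llabs(gx) + llabs(gy);
        if (sum > depth)
            sum = depth;
        out_frame[p] = (int)sum;
    }
}
```

Indexing by pixel avoids the modulo bookkeeping of the flat loop, and the output index is simply the pixel number.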
Hardware Simulation
- Vector Adder Simulation
- vadd.v
...
assign c_result[64:0] = {1'b0, op1_ram_out[63:0]} * {1'b0, op2_ram_out[63:0]};
assign c_result_vld = c_op1_vld && c_op2_vld && ~r_resq_afull;
assign c_res_cnt = c_result_vld ? r_res_cnt + 24'h1 : r_res_cnt;
assign c_sum[64:0] = r_result_vld ? r_sum[64:0] + r_result[64:0] : r_sum[64:0];
assign c_sum_vld = (st_addr >= r_last_addr3);
...