Team 142857

From Cpre584

Team members

  • Michael Patterson
  • Chetan N-Govindaiah
  • Jungmin Park

Assignment 1

Developing a Custom Personality

1 Analyze Application

  • How does the current application perform on existing hardware?
  • What bottlenecks are limiting the performance?
  • What data structures are involved?
  • How parallelizable is the application?

2 Evaluate Hardware Options

3 Define Custom Instructions

  • The functions implemented by the hardware design can then be mapped to custom instructions.

4 Develop Software Model of Custom Personality

  • Convey provides an architecture simulation environment to allow rapid prototyping of both the hardware and software components of a custom personality.
  • This environment is written in C++ to emulate the rest of the system. It includes hardware models of instruction dispatch, register state and the memory subsystem.

5 Modify Application to Use Coprocessor

6 Compile Application with Convey Compiler

7 Simulate Application with Convey Architecture Simulator

  • This step allows the application and the custom instruction set to be debugged before the hardware is designed.

8 Develop FPGA Hardware

9 Simulate Hardware in Convey Simulation Environment

  • Convey provides a hardware simulation environment with bus-functional models for all hardware interfaces to the Application Engine (AE) FPGA.
  • Through the standard VPI (Verilog Procedural Interface), the architecture simulator can provide stimulus to the HDL simulation.

10 Integrate with Convey Hardware

Running the Sample Application

1. Copy Sample AE and Sample Application

The Vadd sample personality and application are installed with the PDK RPM in /opt/convey/pdk/2010_08_09/.

The sample is made up of two components:

  • cae_pers_vadd - contains the sample custom personality, including the software model which emulates the Application Engine FPGA
    • CaeSimPers - contains the AE simulation model of the sample personality
      • CaeIsaVadd.cpp - models the behavior of the custom personality and implements the following functions:
        • void CCaeIsa::InitPers()
        • void CCaeIsa::CaepInst(int masked, int ae, int opcode, int immed, uint64 scalar)
    • phys - Xilinx physical implementation, contains constraints files
    • testbench
    • verilog - RTL to be synthesized into the FPGA
  • SampleAppVadd - contains the application that uses the instruction defined in the CasSample tree.
cd $HOME
mkdir pdk_sample
cp -r /opt/convey/pdk/2010_08_09/cae_pers_vadd pdk_sample
cp -r /opt/convey/pdk/2010_08_09/pdk_apps/SampleAppVadd pdk_sample

2. Build the Sample AE and Sample Application

cd $HOME/pdk_sample/cae_pers_vadd/CaeSimPers
make
cd ../../SampleAppVadd
make

3. Custom AE Software Simulation

  • Set up Environment Variables
export CNY_PDK_PROJ=~/pdk_sample/cae_pers_vadd/

With this command, CNY_CAE_EMULATOR is set to ~/pdk_sample/cae_pers_vadd/CaeSimPers/CaeSimPers.

  • Run the application against the architecture simulator

4. Custom AE Hardware Simulation

5. AE Physical Build

cd ~/pdk_sample/cae_pers_vadd/phys
make

The PDK build flow automatically runs synthesis, place and route, timing analysis, and bitgen.

Sample Application Software Model

~/pdk_sample/cae_pers_vadd/CaeSimPers/CaeIsaVadd.cpp

ConveyPDKReferenceManual.pdf Section 7.2

Assignment 2 (Sobel Edge Detection)

Michael's Approach [1]

Sobel Algorithm [2]

Simplified description

  • In simple terms, the operator calculates the gradient of the image intensity at each point, giving the direction of the largest possible increase from dark to light and the rate of change in that direction. The result therefore shows how "abruptly" or "smoothly" the image changes at that point, and hence how likely it is that this part of the image represents an edge, as well as how that edge is likely to be oriented. In practice, the magnitude (likelihood of an edge) calculation is more reliable and easier to interpret than the direction calculation.
  • Mathematically, the gradient of a two-variable function (here the image intensity function) is at each image point a 2D vector with the components given by the derivatives in the horizontal and vertical directions. At each image point, the gradient vector points in the direction of largest possible intensity increase, and the length of the gradient vector corresponds to the rate of change in that direction. This implies that the result of the Sobel operator at an image point which is in a region of constant image intensity is a zero vector and at a point on an edge is a vector which points across the edge, from darker to brighter values.

Source Code

  • files
    • sobel_edge_detection.c - Sobel algorithm
    • img - folder containing the input image files
    • Makefile
sobel_edge_detection: sobel_edge_detection.c
       gcc -o sobel_edge_detection -pg sobel_edge_detection.c -lm
clean :
       rm -rf sobel_edge_detection 
       rm -rf lena_edge2.pgm
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int sobel_convolution_x (int x, int y, int *GX, int *in_frame, int width, int depth);
int sobel_convolution_y (int x, int y, int *GY, int *in_frame, int width, int depth); 
void read_pgm_head(FILE *fp_read, char *format, int *width, int *height, int *depth);
void read_pgm_data(FILE *fp_read, int *in_frame, int *width, int *height);
void write_pgm (FILE *fp_write, int *out_frame, char *format, int *width, int *height, int *depth);
int main()
{
   int width;
   int height;
   int depth;
   char format[5];
   int sumX;
   int sumY;
   int SUM;
   int *in_frame;
   int *out_frame;
   int i,j;
   int x, y;
   FILE *fp_read, *fp_write;
   int GX[9]={-1, 0, 1, -2, 0, 2, -1, 0, 1};
   int GY[9]={-1, -2, -1, 0, 0, 0, 1, 2, 1};
   // Read pgm file
   fp_read = fopen("./img/lena.pgm","rb");
   if (fp_read == NULL) {
      perror("./img/lena.pgm");   // stop early if the input image is missing
      return 1;
   }
   read_pgm_head(fp_read, format, &width, &height, &depth);
   in_frame =(int *)malloc(sizeof(int)*width*height);
   read_pgm_data(fp_read, in_frame, &width, &height);
   // Sobel Algorithm
   out_frame = (int *)malloc(sizeof(int)*width*height);
   for(y=0; y<height; y++){
    sumX = 0;
    sumY = 0;
    for(x=0; x<width; x++){
       sumX = sobel_convolution_x (x, y, GX, in_frame, width, depth);
       sumY = sobel_convolution_y (x, y, GY, in_frame, width, depth);
       SUM = sqrt(pow((double)sumX,2) + pow((double)sumY,2));
       if (SUM > depth) SUM = depth;
       if (SUM < 0) SUM = 0;   
       *(out_frame + y*width + x) = SUM;
       printf("(row, col) = (%d, %d), sumX = %d, sumY = %d, SUM = %d \n", y, x, sumX, sumY, SUM);
    }
   }
   // Write pgm file
   fp_write=fopen("lena_edge2.pgm","wb");
   write_pgm (fp_write, out_frame, format, &width, &height, &depth);
   free(in_frame);
   free(out_frame); 
   fclose(fp_read);   // FILE* streams are closed with fclose(), not close()
   fclose(fp_write);
   return 0;      
}
int sobel_convolution_x (int x, int y, int *GX, int *in_frame, int width, int depth)
{
   int i, j;
   int sumX=0;
   for (i=0; i<3; i++){
         for (j=0; j<3; j++){
             if (x+j >=width || y-2+i <0)
                sumX = sumX;
             else 
                sumX = sumX + GX[i*3+j]*(*(in_frame + width*(y-2+i)+x+j));
          }
       }
    return sumX;
}
int sobel_convolution_y (int x, int y, int *GY, int *in_frame, int width, int depth)
{
   int i, j;
   int sumY=0;
   for (i=0; i<3; i++){
         for (j=0; j<3; j++){
             if (x+j >=width || y-2+i <0)
                sumY = sumY;
             else 
                sumY = sumY + GY[i*3+j]*(*(in_frame + width*(y-2+i)+x+j));
          }
       }
   return sumY;
}
void read_pgm_head(FILE *fp_read, char *format, int *width, int *height, int *depth)
{
   char line[100];
   char line1[100];
   fgets(format, 5, fp_read);   // format is char[5] in main; sizeof(format) here would be the pointer size
   fputs(format, stdout);
   fgets(line, sizeof(line), fp_read);
   fputs(line, stdout); 
   fgets(line1, sizeof(line1), fp_read);
   fputs(line1, stdout);
   fscanf(fp_read, "%d %d %d", width, height, depth);
   fgetc(fp_read);   // skip the single whitespace byte between the header and the pixel data
   printf("Width = %d, Height = %d, Depth = %d\n", *width, *height, *depth);
}
void read_pgm_data(FILE *fp_read, int *in_frame, int *width, int *height)
{
   int x, y;
   for(y=0;y<(*height); y++){
     for(x=0;x<(*width); x++){
       *(in_frame + y*(*width) + x)=(int)getc(fp_read);
     }
    }   
}
void write_pgm (FILE *fp_write, int *out_frame, char *format, int *width, int *height, int *depth)
{
   int x, y;
   fprintf(fp_write, "%s\n%d %d\n %d\n", format, *width, *height, *depth);
   for(y=0;y<(*height);y++){
     for(x=0;x<(*width);x++){
         putc(*(out_frame + y*(*width) + x),fp_write);
     }
   }
}

Profile C code

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
100.35      0.01     0.01                             main
  0.00      0.01     0.00    65536     0.00     0.00  sobel_convolution_x
  0.00      0.01     0.00    65536     0.00     0.00  sobel_convolution_y
  0.00      0.01     0.00        1     0.00     0.00  read_pgm_data
  0.00      0.01     0.00        1     0.00     0.00  read_pgm_head
  0.00      0.01     0.00        1     0.00     0.00  write_pgm

Software Simulation

  • Software emulator
    • CaeSimPers: We modified CaeIsaVadd.cpp so that the emulated instruction performs element-wise vector multiplication rather than the sample's vector addition. The environment variable CNY_CAE_EMULATOR must be set to the CaeSimPers executable.
export CNY_CAE_EMULATOR=$CNY_PDK_PROJ/CaeSimPers/CaeSimPers

void
CCaeIsa::CaepInst(int aeId, int opcode, int immed, uint32 inst, uint64 scalar) // F7,0,20-3F
{
    switch (opcode) {
    // CAEP00 - M[a1] * M[a2] -> M[a3]
      	case 0x20: {
            uint64 length, a1, a2, a3;
            uint64 val1, val2, val3, sum = 0;
            length = ReadAeg(aeId, AEG_CNT);
            a1 = ReadAeg(aeId, AEG_MA1);
            a2 = ReadAeg(aeId, AEG_MA2);
            a3 = ReadAeg(aeId, AEG_MA3);
            for (int mc = 0; mc < NUM_MCS; mc += 1) {
                 for (uint64 i = 0; i < length; i += 1) { 
                    // Check that address is right for this MC (virtual address bits 8:6 for binary interleave)
                        if ((int)(((a1 + i*8) >> 6) & 7) == mc) {
                          AeMemLoad(aeId, mc, a1+i*8, MEM_REQ_SIZE, false, val1);
                          AeMemLoad(aeId, mc, a2+i*8, MEM_REQ_SIZE, false, val2);
                          val3 = val1 * val2;
                          sum += val3;
                          AeMemStore(aeId, mc, a3+i*8, MEM_REQ_SIZE, false, val3);
                        }
                 }
            }
            WriteAeg(aeId, AEG_SAE_BASE+aeId, sum);
            break;
       }
       default:{
           printf("Default case hit - opcode = %x\n", opcode);
               for (int aeId = 0; aeId < CAE_AE_CNT; aeId += 1)
                 SetException(aeId, AEUIE);
       }
  }
}
  • sobel_edge_detection.c
// Make vectors [width*height*18]   
   values = (int *)malloc(sizeof(int)*width*height*18); 
   cons = (int *)malloc(sizeof(int)*width*height*18);
   for(y=0;y<height; y++){
     for(x=0;x<width; x++){ 
        for (k=0; k<2; k++){ 
           for (i=0; i<3; i++){
               for (j=0; j<3; j++){
                  *(cons + 18*(width*y + x) + k*9+ i*3 + j) = G[k*9+i*3+j];
                  if (x+j >= width || y-2+i <0)
                      *(values + 18*(width*y + x) + k*9 + i*3 + j) = 0;
                  else 
                      *(values + 18*(width*y + x) + k*9 + i*3 + j) = *(in_frame + width*(y-2+i)+x+j); 
               }    
           }      
        } 
     }
   }
   ....
   // Send vectors to Coprocessor
   for (i=0; i< size; i++) {
      a1[i] = cons[i];
      a2[i] = values[i];
   }
   // Call coprocessor
   act_sum = l_copcall_fmt(sig, cpTestEx1, "AAAAA", a1, a2, a3, size, ae_sum);
   ...
   // Computation on the host (Intel) processor: SUM <- |Gx| + |Gy|
   for(i=0; i<size; i++){
       if ( i % 18 == 8 ) {
            sum1 += (int)a3[i];
            SUM = abs(sum1);
            sum1 = 0; 
       }
       else if ( i % 18 == 17) {
            sum1 += (int)a3[i];
            SUM += abs(sum1);
            sum1 = 0;
            if (SUM > depth) SUM = depth;
            if (SUM < 0) SUM = 0;
            *(out_frame + i/18) = SUM;   // i/18 is the pixel index; (i+1)/18 would be off by one
            printf ("(row, col) = (%d, %d), SUM = %d \n", i/(18*width), (i/18)%width,  SUM);
            SUM = 0;
       }
       else 
            sum1 += (int)a3[i]; 
    }

Hardware Simulation

  • Vector Adder Simulation

  • vadd.v
 ... 
 assign c_result[64:0] = {1'b0, op1_ram_out[63:0]} * {1'b0, op2_ram_out[63:0]};
 assign c_result_vld = c_op1_vld && c_op2_vld && ~r_resq_afull;
 assign c_res_cnt = c_result_vld ? r_res_cnt + 24'h1 : r_res_cnt;
 assign c_sum[64:0] = r_result_vld ? r_sum[64:0] + r_result[64:0] : r_sum[64:0]; 
 assign c_sum_vld = (st_addr >= r_last_addr3);
 ...