Stretch

From Cpre584

Documentation

Usage

Papers


Setup and Configuration

Starting Stretch

  • Set up the environment.
    • Make sure you are using a bash shell: just type "bash"
    • source /usr/local/bin/Stretch_src
  • Copy the example directory to wherever you want it.
    • cp -r /usr/local/Stretch/Examples <where you want>
  • Start the IDE.
    • st-ide &

Compiling

When you compile your StretchC file (extension .xc), the compiler automatically creates a header file that must be included in your C code. The header file name is defined in Project -> Project Properties... -> Compiler Options tab -> EI Header.

In summary:
1) Set the header file name.
2) Include it in the C code: #include "givenName.h"
3) Compile your StretchC file.
4) Run the StretchC code by hitting debug.

Connecting to the board

Set up a connection type.

Board Type: S5530, S55DB-ddr

Board IP address: 192.168.1.135
PC IP address:
Board MAC address: 00:1A:3B:3C:D0:B3 - Set in code
PC MAC address: 00:1b:21:23:33:54

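To use IP/UDP with the Stretch board, a static ARP entry must be defined in the ARP table. On Linux, a static entry can be added as root with /sbin/arp -s <IP address> <MAC address>.

Check the ARP table in Linux with the command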

 /sbin/arp 

Devices

Descriptions of each device and how they operate along with source code can be placed here.

Ethernet over PPI

In the example directory for Stretch, there is a MAC loopback example under ../Examples/system/ppi/. Open it and turn off loopback mode by changing the call ppi_init(mode,1) to ppi_init(mode,0).

Further down you will see comments about scheduling a receive and scheduling a transmission. The buffers tx_buf and rx_buf are used.

To send data from the computer, send it to the board's IP address; in StretchC the data should then appear in rx_buf. rx_buf will contain the Ethernet-layer packet, starting with the destination and source MAC addresses, then the IP header, then the UDP header, followed by the payload.
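The layout described above can be sketched in plain C. This is only an illustration, not code from the PPI example: the helper names rd16 and udp_payload are hypothetical, and it assumes an untagged IPv4 frame (no VLAN tag) that starts at the first byte of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN   14u      /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define IP_MIN_HDR    20u      /* IPv4 header without options */
#define UDP_HDR_LEN   8u
#define ETHERTYPE_IP4 0x0800u
#define IP_PROTO_UDP  17u

/* Read a big-endian (network order) 16-bit field. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: locate the UDP payload inside a raw Ethernet frame.
 * Returns a pointer to the payload and stores its length in *len, or
 * returns NULL if the frame is not a well-formed IPv4/UDP frame.
 */
static const uint8_t *udp_payload(const uint8_t *frame, size_t frame_len,
                                  size_t *len)
{
    assert(frame != NULL && len != NULL);
    if (frame_len < ETH_HDR_LEN + IP_MIN_HDR + UDP_HDR_LEN) return NULL;
    if (rd16(frame + 12) != ETHERTYPE_IP4) return NULL;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4) return NULL;            /* IP version field */
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;       /* IP header length, bytes */
    if (ihl < IP_MIN_HDR || ip[9] != IP_PROTO_UDP) return NULL;
    if (frame_len < ETH_HDR_LEN + ihl + UDP_HDR_LEN) return NULL;

    const uint8_t *udp = ip + ihl;
    size_t udp_len = rd16(udp + 4);                /* UDP header + payload */
    if (udp_len < UDP_HDR_LEN) return NULL;
    if (ETH_HDR_LEN + ihl + udp_len > frame_len) return NULL;

    *len = udp_len - UDP_HDR_LEN;
    return udp + UDP_HDR_LEN;
}
```

With a 20-byte IP header, the payload starts 42 bytes into the frame (14 + 20 + 8), so a call such as udp_payload(rx_buf, received_length, &n) would return rx_buf + 42 for a typical packet.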

To send data to the PC, you will have to construct a similar packet in tx_buf.
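A minimal sketch of building such a packet is shown here. The helper names (wr16, ip_checksum, build_udp_frame) are hypothetical and not part of the Stretch examples. It assumes IPv4 without options and leaves the UDP checksum at zero, which IPv4 permits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 16-bit value in big-endian (network) order. */
static void wr16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/* Internet checksum (RFC 1071) over an even-length header. */
static uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: fill `frame` with Ethernet, IPv4 and UDP headers
 * followed by `payload`. Returns the frame length, or 0 if it does not fit.
 */
static size_t build_udp_frame(uint8_t *frame, size_t cap,
                              const uint8_t dst_mac[6], const uint8_t src_mac[6],
                              const uint8_t src_ip[4], const uint8_t dst_ip[4],
                              uint16_t src_port, uint16_t dst_port,
                              const uint8_t *payload, size_t payload_len)
{
    assert(frame != NULL && payload != NULL);
    if (payload_len > 65535u - 28u) return 0;   /* IP total length is 16 bits */
    size_t total = 14 + 20 + 8 + payload_len;
    if (total > cap) return 0;

    memcpy(frame, dst_mac, 6);                  /* Ethernet header */
    memcpy(frame + 6, src_mac, 6);
    wr16(frame + 12, 0x0800);                   /* EtherType: IPv4 */

    uint8_t *ip = frame + 14;                   /* IPv4 header */
    memset(ip, 0, 20);
    ip[0] = 0x45;                               /* version 4, 5 words */
    wr16(ip + 2, (uint16_t)(20 + 8 + payload_len));
    ip[8] = 64;                                 /* TTL */
    ip[9] = 17;                                 /* protocol: UDP */
    memcpy(ip + 12, src_ip, 4);
    memcpy(ip + 16, dst_ip, 4);
    wr16(ip + 10, ip_checksum(ip, 20));         /* checksum field was zero */

    uint8_t *udp = ip + 20;                     /* UDP header */
    wr16(udp, src_port);
    wr16(udp + 2, dst_port);
    wr16(udp + 4, (uint16_t)(8 + payload_len));
    wr16(udp + 6, 0);                           /* 0 = checksum not computed */

    memcpy(udp + 8, payload, payload_len);
    return total;
}
```

The returned length would then be the number of bytes to schedule for transmission from tx_buf. Whether the PPI MAC pads short frames to the Ethernet minimum is not covered here and would need to be checked against the example.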

To send the data from the host computer to the Stretch board, we used Stretch_sendpic.c, which was compiled using gcc. The syntax for using this program is ..... This allowed us to send an image from the desktop computer to the Stretch board.

For the Stretch project, we modified the ppiExample program so that it would receive an image, buffer it, and retransmit that image (Stretch_ppiExample.c).

Example

We tried to send an image to the Stretch board, copy that image from the receive buffer, rx_buf, into a buffer in main memory, and then, once the entire image had been transmitted to the board, copy it from the main memory buffer back to tx_buf and transmit it to the host computer. The results can be seen below.

File:Lena512.png
Image used to send to the Stretch Board
File:Noflush.png
Image received from the Stretch Board without flushing the cache
File:Cacheflushed.png
Image received from the Stretch Board after flushing the cache


Since Wikimedia could not display the original .ppm files, they have been linked here. The images seen on the left are the .ppm image converted to the .png filetype.
Lena512.ppm
Cacheflushed.ppm
Noflush.ppm

Even after adding the cache flush line to the source code, the final image does not quite match the original, even though no algorithm has yet been applied to the image on the Stretch board. Every packet sent is received by the Stretch board and no extra packets are received, and the same holds when the image is transmitted from the Stretch board to the host computer. This leads us to believe that the problem is in the handling of the main memory buffer.
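One way to narrow down a mismatch like this is to compare the sent and received image data byte by byte on the host and map the first difference back to a packet. The sketch below is a generic debugging aid, not part of the project code. The function names are hypothetical, and it assumes every packet carries the same number of payload bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first differing byte, or len if the buffers are identical. */
static size_t first_mismatch(const uint8_t *a, const uint8_t *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return i;
    return len;
}

/* Number of bytes that differ between the two buffers. */
static size_t count_mismatches(const uint8_t *a, const uint8_t *b, size_t len)
{
    assert(a != NULL && b != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            n++;
    return n;
}

/* Which packet a byte offset falls in, for a fixed payload size per packet. */
static size_t packet_of(size_t offset, size_t payload_per_packet)
{
    assert(payload_per_packet > 0);
    return offset / payload_per_packet;
}
```

If the differences cluster at fixed offsets within each packet's share of the image, that would point at the copy between rx_buf and the main memory buffer rather than at the network transfer.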

UART


Stretch Instructions

The Stretch instructions make this platform unique, but they require a special format....

Requirements


Limitations

  • Loops: must be completely unrollable at compile time
  • 4096 Arithmetic Unit (AU) bits, used for arithmetic and logic operators
  • 8192 Multiply Unit (MU) bits, used for multiplication and shifting

Calculating AU and MU usage

C Operators                                        AU                              MU
A * B                                              0                               |A| * |B|
A (+, -) B                                         Max(|A|, |B|)                   0
A (<<, >>) B                                       0                               |A| * 2^|B|
A (<<, >>) constant                                0                               0
A (<, <=, >, >=, ==, !=) B                         Max(|A|, |B|)                   0
A (&, ^, |) B                                      Max(|A|, |B|)                   0
A (&&, ||) B                                       |A| + |B|                       0
A (++, --)                                         |A|                             0
cond ? B : C                                       Max(|B|, |C|)                   0
cond ? B + C : B - C                               Max(|B|, |C|)                   0
cond ? B(±)C : B+const                             Max(|B|, |C|, |const|)          0
cond ? B+const1 : B+const2                         Max(|B|, |const1|, |const2|)    0
TABLE[X]                                           (2n-1 * n * m)/3                0
A (const) constant bit extract                     0                               0
A (const0, const1) constant bit-range extract      0                               0
A (x) variable bit extract                         0                               |A| * |A|
A (x, y) variable bit-range extract                0                               |A| * |A|
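A few rows of the table can be applied mechanically, as in the sketch below. It reads |A| as the bit width of operand A, which is an assumption since the table does not define it. The type OpCost and the function names are hypothetical and are not part of StretchC.

```c
#include <assert.h>

/* AU and MU bits consumed by one operator (hypothetical bookkeeping type). */
typedef struct {
    unsigned au;
    unsigned mu;
} OpCost;

static unsigned max_u(unsigned a, unsigned b)
{
    return a > b ? a : b;
}

/* A * B: 0 AU, |A| * |B| MU. */
static OpCost cost_mul(unsigned wa, unsigned wb)
{
    assert(wa > 0 && wb > 0);
    OpCost c = {0, wa * wb};
    return c;
}

/* A (+, -) B, comparisons and bitwise operators: Max(|A|, |B|) AU, 0 MU. */
static OpCost cost_addsub(unsigned wa, unsigned wb)
{
    assert(wa > 0 && wb > 0);
    OpCost c = {max_u(wa, wb), 0};
    return c;
}

/* A (&&, ||) B: |A| + |B| AU, 0 MU. */
static OpCost cost_logical(unsigned wa, unsigned wb)
{
    assert(wa > 0 && wb > 0);
    OpCost c = {wa + wb, 0};
    return c;
}

/* Total cost of two operators used in the same instruction. */
static OpCost cost_sum(OpCost x, OpCost y)
{
    OpCost c = {x.au + y.au, x.mu + y.mu};
    return c;
}
```

For example, (a * b) + c with 8-bit a and b and a 16-bit c comes to cost_sum(cost_mul(8, 8), cost_addsub(16, 16)), that is 16 AU bits and 64 MU bits, taking the product to be 16 bits wide. These figures can then be compared against the 4096 AU and 8192 MU limits.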

Syntax

Single instruction

Syntax:

SE_FUNC void INSTR_NAME(<arguments>)
{...} 

Example:

SE_FUNC void SimilarFuncs(SE_INST F1, SE_INST F2, <arguments>) 
{ 
   ...                       // lots of shared code - reused resources 
   x = F1 ? a : b;    // minor difference between F1 and F2 
   ...                       // more shared code - reused resources 
} 

Multiple instructions

Syntax:

SE_FUNC void func_name(SE_INST INSTR_NAME1, SE_INST INSTR_NAME2, ... <arguments>)
{...}

Example:

SE_FUNC void DisjointFuncs(SE_INST F1, SE_INST F2, <arguments>) 
{ 
   if (F1) 
   {   ...  }                // some code here - maybe some reused resources 
   else 
   {   ...  }                // very different code here 
} 

Example


Sobel Project

Sobel c source code

Here is the optimized code that I used in the Stretch IDE to implement the Sobel algorithm. The Stretch instruction takes in 32 bytes of data and outputs 4 processed pixels. When processing a 128*128 version of the lena image, the function that implements the Sobel algorithm, "detectEdges", takes 7633 cycles. Compared to the unoptimized code, where "detectEdges" took 14888 cycles, this is a 48.73% reduction. Here are the resources used by the Stretch instruction:

Arithmetic bits.................720
Logic bits........................0
Mux bits.........................80
Register bits.....................0
Pipeline bits....................32
AU total..........................832 out of 4096
Multiply bits.....................0
MU total............................0 out of 8192
Extension registers.................0 out of 4096

I would expect both AUs and MUs to be used, since both addition and multiplication are present in the Stretch instruction. I am a little confused as to why only AUs are used.

One issue I found is that when processing an image larger than 256*256, the following exception occurs: *Warning* Unhandled user exception: LoadStoreTLBMultiHitCause. This has something to do with the "data" array used in the detectEdges function. When the array size is doubled: char data[((rowNum-2)*(colNum-2)*8)]; --> char data[((rowNum-2)*(colNum-2)*16)]; the exception no longer occurs. This issue does not occur when the code is compiled with gcc (without the Stretch instruction) and executed.

The following table shows the % reduction in execution cycles of the "detectEdges" function as more data is passed into the Stretch instruction per call. The image is 128*128.

Bytes Passed into      Pixels Processed        Cycles per       % Reduction
Stretch Instruction    per Instruction Call    Function Call
0                      0                       14889            0
8                      1                       8269             44.46%
16                     2                       7808             47.56%
32                     4                       7633             48.73%
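The % reduction column follows from the cycle counts as (baseline - optimized) / baseline, with the first row as the baseline. A small helper (hypothetical name) makes the arithmetic explicit:

```c
#include <assert.h>

/* Percentage by which `optimized` cycles improve on `baseline` cycles. */
static double percent_reduction(unsigned long baseline, unsigned long optimized)
{
    assert(baseline > 0);
    return 100.0 * ((double)baseline - (double)optimized) / (double)baseline;
}
```

For the 32-byte row, percent_reduction(14889, 7633) is 100 * 7256 / 14889, which rounds to 48.73.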


Also, large images such as the 512*512 lena image shown below take a while to execute (about 20 mins) in the Stretch IDE, so you will want to test this code with a smaller image, 128*128 or smaller.

Using Edge_Detection_Opt.xc and Sobel_Edge_Det.c the results can be seen below.

File:Lena512 edge.bmp
Original Image
File:EdgeSob1.bmp
After running edge detection