Data Analysis Tool

From Robotic Agriculture Data Acquisition

Overview

Our group focused on creating a MATLAB-based Data Analysis Tool to parse and analyze logged experimental data. The tool parses log files that contain data in a certain format. It stores the parsed data into a MATLAB struct, on which analysis is performed. Analysis primarily involves generating plots of the struct's data using user-configurable plotting options. Additionally, the user can use helper functions to generate custom plots.

The tool has two types of interfaces:

  • MATLAB Command-line Interface
  • MATLAB GUI

The MATLAB GUI is meant for users who are not familiar with MATLAB. The GUI offers extensive, easy-to-use plotting configuration options. The GUI was constructed using MATLAB's development environment.

The MATLAB command-line interface requires users to have prior experience with MATLAB. This interface lets users work directly with the data contained in MATLAB structs. Furthermore, the user can use MATLAB helper functions to generate additional or custom plots.

In the following sections, the design features and functionality of the Data Analysis Tool for the command-line interface are described. Documentation for the GUI interface can be found in RADA's git repository under 5-Matlab/DataAnalysisTool/SupportFiles.

How To Use the Tool

The two interfaces of the tool are used differently.

Command-line Interface

If the command line interface is used independently, the following are the sequential steps that one needs to follow:

  1. Open up the DataAnalysis.m script
  2. Enter the name of the log text file
    • If you know the name of your log file AND the file is in the same folder as DataAnalysis.m, set the value of fname to the name of the file.
    • If you don't know the name of your file, set the value of fname to be empty. You will be able to select your file through a file explorer window after you run the script.
  3. Set the analysis configuration options in the script
  4. Run the DataAnalysis.m script
  5. Use the plotting helper functions to generate additional plots, if needed

If the command-line interface is used along with the MATLAB GUI, the following describes the behavior of the command-line interface:

  • If a struct called main already exists in the MATLAB workspace (possibly generated by the MATLAB GUI), the command-line interface will use that struct to perform the data analysis and will disregard the analysis configuration options set in the DataAnalysis.m script. Additionally, it will use the data contained in the main.expData struct and not parse a log file.
  • To generate plots and perform analysis, run the DataAnalysis.m script.

Log File

[Figure: Log file format for the RADA project.]

The log files of an experiment must be in a .txt file. The data is logged in a matrix format such that the data for a certain entity is in a column. The entity names, or headers, corresponding to each column of data are mentioned once in a line before the data is continuously logged. The units of the entities are also mentioned in a separate line before the data is continuously logged; the line with units can be located either before or after the line that mentions the headers. The log file should contain different symbols at the beginning of each line to signify the type of information being specified in that particular line. For the RADA project, the following symbols were used:

  • #  - Information regarding the experiment's configuration. In the command line interface, this kind of information will be displayed once on the command line when the tool is executed. In the GUI interface, this information is disregarded.
  • % - Names of the entities being logged. We refer to entity names as "headers."
  • &  - Units of the headers

If n entities are being logged, n units must be mentioned and n data values need to be logged in each row.
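
As an illustration of the format described above, the following minimal sketch parses a few in-memory lines using the #, % and & prefixes. It is not the RADA parse_log implementation: the sample lines, the whitespace delimiter and all variable names are assumptions made for this example.

```matlab
% Hypothetical log contents; a real log would be read from a .txt file.
logLines = { ...
    '# Experiment: hypothetical test run', ...
    '% Time Marker Pitch', ...
    '& s - deg', ...
    '10.0 0 1.5', ...
    '10.5 1 1.7', ...
    '11.0 1 1.6'};

headers = {};
units   = {};
rows    = [];
for k = 1:numel(logLines)
    txt = strtrim(logLines{k});
    if isempty(txt)
        continue;
    end
    switch txt(1)
        case '#'   % experiment configuration: display it once
            disp(strtrim(txt(2:end)));
        case '%'   % header (entity) names
            headers = strsplit(strtrim(txt(2:end)));
        case '&'   % units, one per header
            units = strsplit(strtrim(txt(2:end)));
        otherwise  % a row of data values (assumed whitespace-separated)
            rows(end+1, :) = str2double(strsplit(txt));
    end
end

% n headers require n units and n values per row
if numel(headers) ~= numel(units) || numel(headers) ~= size(rows, 2)
    error('Header, unit and data column counts do not match.');
end

% Store each column under its header name
loggedData = struct();
for c = 1:numel(headers)
    loggedData.(headers{c}).data = rows(:, c);
    loggedData.(headers{c}).unit = units{c};
end
```

After running the sketch, loggedData.Pitch.data holds the third column of the data rows and loggedData.Pitch.unit holds 'deg'.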

The log files can have alternate formats as well. However, every log file format needs a corresponding MATLAB parsing function that extracts the data in the log file into the format of the MATLAB struct described below. Refer to the Parsing section of this page to learn how to add a new parsing function that corresponds to the alternate log format.

Although any header can be logged in the log file, the following headers must be logged:

  • Time - Timestamps of the logged data. This column contains time values for every row of data that is logged. In this column, each value must be greater than the previous value. For the RADA project, the first value of this column need not be 0 or 1. RADA's log-parsing function computes a new time vector which contains values relative to the first one. In other words, the parsing function replaces the values in the Time column with the results of the following function: Time(i) = Time(i) - Time(1) [where Time(1) refers to the first value in this column and Time(i) refers to the ith value in this column].
  • Marker - Locations at which the marker has been applied. Whenever a marker is applied during the experiment, a non-zero value that corresponds to the marker number should be logged under this column. For example, when the third marker is applied, the value 3 should be stored under this column. When no markers are being applied, a 0 is logged under this column. For the RADA project, the number m is logged under this column whenever the mth marker is applied. For example, when the third marker is applied the value 3 is logged continuously under this column until the fourth marker is applied. The parsing function converts the logged values into the format described above; the Marker data will contain non-zero elements only at the times at which the markers were applied and zero elsewhere.
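
The two conversions described above can be sketched as follows. This is only an illustration under assumed sample values, not the code used by RADA's parsing function; rawTime and rawMarker are hypothetical names for the columns as they appear in the log file.

```matlab
% Hypothetical raw columns as logged for the RADA project
rawTime   = [105.0; 105.5; 106.0; 106.5; 107.0; 107.5];
rawMarker = [0; 0; 1; 1; 2; 2];   % marker number logged continuously

% Time relative to the first sample: Time(i) = Time(i) - Time(1)
Time = rawTime - rawTime(1);

% Keep the marker number only at the sample where it first appears,
% and zero everywhere else
isNewMarker = [rawMarker(1) ~= 0; diff(rawMarker) ~= 0] & (rawMarker ~= 0);
Marker = rawMarker .* isNewMarker;
```

With these inputs, Time starts at 0 and Marker is non-zero only at the third and fifth samples, where markers 1 and 2 were first applied.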

Struct

After execution, the Data Analysis Tool returns a MATLAB struct, main, to the MATLAB workspace. This is true for both interfaces of the tool. The following list hierarchy describes the organization of the contents within the main struct:

  • main: top-level struct that contains all of the data parsed and data analysis configuration options
    1. params: struct that contains the parameters of the analysis
      1. file: struct that contains information about the log file being parsed
        • name: name of the file
        • path: location of the directory in the system where the file is located
        • pathName: location of the file in the system (location of directory + file name)
      2. plotting: struct that contains information about the default plotting parameters to be used for all headers while plotting
        • plot: main plotting switch; chooses whether or not to plot any header; 0 - no plotting OR 1 - yes plotting
        • separatePlot: switch to generate separate-plots; 0 - no separate-plots OR 1 - yes separate-plots
        • multiPlot: switch to generate multi-plots; 0 - no multi-plots OR 1 - yes multi-plots
        • subPlot: switch to generate sub-plots; 0 - no sub-plots OR 1 - yes sub-plots
        • clearFigs: switch that decides whether or not to close all the open figures; 0 - don't close any open figure OR 1 - close all open figures
        • separateData: MATLAB cell-array that contains the names of the headers to plot using separate-plots
        • multiData: MATLAB cell-array that contains the names of the headers to plot using multi-plots
        • subData: MATLAB cell-array that contains the names of the headers to plot using sub-plots
        • color: specifier of color of plotting line; domain: MATLAB plotting line color specifiers
        • marker: specifier of marker of plotting line; domain: MATLAB plotting line marker specifiers
        • style: specifier of style of plotting line; domain: MATLAB plotting line style specifiers
        • backgnd: 1x3 array of RGB values; domain: MATLAB ColorSpec
    2. expData
      • <header-name>: (repeated for all headers) struct that contains information about the header
        1. data: column vector that contains all of the logged data contained in the log file
        2. unit: units of the header as mentioned in the log file
        3. params: struct that contains information about the plotting parameters to be used for this header in specific. Overrides main.params.plotting, if any different
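
To make the hierarchy concrete, here is a small hand-built example of a struct with the same layout. The field names follow the list above; the values (file name, header name Pitch, data) are hypothetical and only a subset of the plotting fields is filled in.

```matlab
main = struct();

% main.params.file: information about the log file
main.params.file.name     = 'logfile.txt';
main.params.file.path     = '/path/to/file/';
main.params.file.pathName = [main.params.file.path, main.params.file.name];

% main.params.plotting: default plotting parameters for all headers
main.params.plotting.plot         = 1;
main.params.plotting.separatePlot = 1;
main.params.plotting.separateData = {'Pitch'};
main.params.plotting.color        = 'b';
main.params.plotting.style        = '-';
main.params.plotting.backgnd      = [1 1 1];

% main.expData.<header-name>: one struct per logged header
main.expData.Pitch.data   = [0.1; 0.3; 0.2];
main.expData.Pitch.unit   = 'deg';
main.expData.Pitch.params = struct('plot', 1, 'style', '--', ...
    'color', 'r', 'marker', 'o', 'backgnd', [1 1 1]);

% Header-specific parameters take precedence over the defaults
pitchColor = main.expData.Pitch.params.color;
```

Here pitchColor is 'r', taken from the header-specific params rather than from main.params.plotting.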

Parsing

The GUI and command line interface use the parse_log function to parse the log text file and return a MATLAB structure. The MATLAB-prototype of this function is:

function [loggedData] = parse_log(filename, params, expData)

  • filename: required input; this is the complete path of the file (along with the file name). Example: "/path/to/file/logfile.txt"
  • params: optional input; this is the main.params struct which holds the analysis configuration options.
  • expData: optional input; this is the main.expData struct which holds already parsed data. This is mainly used by the GUI interface in order to import the plotting parameters of the previously logged data.
  • loggedData: output; a MATLAB struct that contains all data contained in the log file. The format of the struct is the same as that of main.expData described in the "Struct" section.

Since only the first input argument is required, parse_log can also be used on its own, independently of the rest of the tool.

The following describes the values that are set in the loggedData struct:

  • loggedData
    • <header-name>: (repeated for all headers) the name of the header as mentioned in the line of the log file that begins with %
      1. data: the column of data contained in the log file under the header name for this header
      2. unit: unit of this header as mentioned in the line of the log file that begins with &
      3. params: if the input, params, wasn't passed in, then MATLAB's default plotting parameters are set as the values of the fields of this struct. The default plotting format corresponds to a blue, straight line on a white background. If the input, params, was passed in, the values of the following fields are set to those in the corresponding fields of the params struct.
        • plot: default value - 1; some of the analysis configuration options in the params struct dictate the value of this field. Please refer to the Analysis Configuration Options section under Command Line Interface for more details.
        • style: default value - '-'; if the input, params, was passed in, the value of this field equals params.plotting.style
        • color: default value - 'b'; if the input, params, was passed in, the value of this field equals params.plotting.color
        • marker: default value - (empty); if the input, params, was passed in, the value of this field equals params.plotting.marker
        • backgnd: default value - [1 1 1]; if the input, params, was passed in, the value of this field equals params.plotting.backgnd
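
The defaulting behavior described above could look roughly like the sketch below. It is an assumption about how such logic might be written, not an excerpt from parse_log; the variable plotting stands in for params.plotting, and setting it to [] mimics the case where params is not passed in.

```matlab
% Defaults listed above: blue, solid line, no marker, white background
defaults = struct('plot', 1, 'style', '-', 'color', 'b', ...
    'marker', '', 'backgnd', [1 1 1]);

% Hypothetical user-supplied plotting parameters (use [] for "not passed in")
plotting = struct('style', '--', 'color', 'r', 'marker', 'o', ...
    'backgnd', [0 0 0]);

headerParams = defaults;
if ~isempty(plotting)
    % Copy each line-format field from the supplied parameters
    names = {'style', 'color', 'marker', 'backgnd'};
    for k = 1:numel(names)
        headerParams.(names{k}) = plotting.(names{k});
    end
end
```

The plot field is left at its default here because, as noted above, its value depends on several analysis configuration options.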

Analysis Configuration Options

Both the interfaces allow multiple analysis configuration options. Analysis configuration options entail options regarding the type of plotting, headers to plot, plotting parameter, etc.

Command-line Interface

The following analysis configuration options can be set in the DataAnalysis.m script:

  • fname: Name of the log text file. If you know the name of the log file that you want to parse, enter the name here only if it is in the working directory i.e. it is in the same directory as that of DataAnalysis.m. Otherwise, you may leave it blank. You will be able to choose the file to parse through an explorer window.
  • plot: Main switch to choose whether or not to plot at all. If set to 0, no plots will be generated regardless of the other plotting options. If set to 1, plots will be generated based on other plotting options that are set.
  • separatePlot: Switch to choose whether or not to generate separate-plots. Separate-plots are those that contain only one entity. If n entities were to be plotted using separate-plots, n plots will be generated with one plot for one entity. If set to 0, no separate-plots will be generated regardless of the other plotting options. If set to 1, separate-plots will be generated for the required entities.
  • multiPlot: Switch to choose whether or not to generate multi-plots. Multi-plots are those that contain multiple entities. If n entities were to be plotted using multi-plots, one plot containing all n entities will be generated. If set to 0, no multi-plots will be generated regardless of the other plotting options. If set to 1, multi-plots will be generated for the required entities.
  • subPlot: Switch to choose whether or not to generate sub-plots. Sub-plots are those that contain multiple independent graphs in a single window. This tool generates 2x1 sub-plots, i.e. if n entities were to be plotted using sub-plots and n is even, n/2 plot windows will be generated, each containing two independent plots. If n is odd, (n+1)/2 plot windows will be generated; the last entity will be plotted using a separate-plot. If set to 0, no sub-plots will be generated regardless of the other plotting options. If set to 1, sub-plots will be generated for the required entities.
  • clearFigs: Switch to close all the plots. NOTE: This is used only by the GUI and is not relevant in the command-line interface. It has been included here only to maintain consistency between the GUI and the command-line interface.
  • separateData: MATLAB cell array of entity names to plot using separate-plots. If empty AND separatePlot switch is set, all the entities (in the log file) will be plotted using separate-plots.
  • multiData: MATLAB cell array of entity names to plot using multi-plots. If empty AND multiPlot switch is set, all the entities specified by separateData will be plotted using multi-plots.
  • subData: MATLAB cell array of entity names to plot using sub-plots. If empty AND subPlot switch is set, all the entities specified by multiData will be plotted using sub-plots.
  • color: Character for color of the plotting line; MATLAB plotting line color specifiers
  • marker: Character for the marker of the plotting line; MATLAB plotting line marker specifiers
  • style: Character for the style of the plotting line; MATLAB plotting line style specifiers
  • backgnd: RGB array for background color of the plot. The value of this can have a minimum of [0 0 0] i.e. black and a maximum of [1 1 1] i.e. white.
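
The fallback rules for separateData, multiData and subData can be sketched as below. The option names come from the list above, but the logic is an assumption: in particular, this sketch lets an empty multiData fall back to the already resolved separate-plot list, and uses doPlot as a stand-in for the plot switch. The actual script may resolve these differently.

```matlab
% Hypothetical headers found in the log file
allHeaders = {'Pitch', 'Roll', 'Yaw'};

% Hypothetical option values
doPlot       = 1;   % stands in for the "plot" main switch
separatePlot = 0;
multiPlot    = 1;
subPlot      = 1;
separateData = {'Pitch', 'Roll'};
multiData    = {};
subData      = {};

% Resolve empty lists using the fallback chain described above
sepList = separateData;
if isempty(sepList),   sepList = allHeaders; end
multiList = multiData;
if isempty(multiList), multiList = sepList;  end
subList = subData;
if isempty(subList),   subList = multiList;  end

% A list is only used when the main switch and its own switch are set
if ~(doPlot && separatePlot), sepList   = {}; end
if ~(doPlot && multiPlot),    multiList = {}; end
if ~(doPlot && subPlot),      subList   = {}; end
```

With these values no separate-plots are requested, while Pitch and Roll are selected for both the multi-plot and the sub-plots.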

GUI

The MATLAB GUI's analysis configuration options are similar to those of the command-line interface. Please refer to the GUI_Guide.pdf file located in RADA's git repository under 5-Matlab/DataAnalysisTool/SupportFiles to review the analysis configuration options of the GUI.

Plotting

The tool offers three types of plotting:

  • Separate-plots: 1 entity on 1 plotting figure; 1 plotting figure in 1 figure window
  • Multi-plots: Multiple entities on 1 plotting figure; 1 plotting figure in 1 figure window
  • Sub-plots: 1 entity on 1 plotting figure; multiple plotting figures in 1 window
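
For readers who know MATLAB's built-in plotting functions, the three types listed above correspond roughly to the following calls. This sketch uses only built-in functions and made-up data; it does not call the tool's own plotting functions. The figures are created invisible so that the sketch also runs without a display; remove 'Visible', 'off' to see the windows.

```matlab
% Hypothetical data
t     = (0:0.1:5)';
Pitch = sin(t);
Roll  = cos(t);

% Separate-plot: one entity in one window
f1 = figure('Visible', 'off');
plot(t, Pitch, 'r-');
xlabel('Time'); ylabel('Pitch');

% Multi-plot: several entities on the same axes in one window
f2 = figure('Visible', 'off');
plot(t, Pitch, 'r-');
hold on;
plot(t, Roll, 'go');
hold off;
xlabel('Time');

% Sub-plot: a 2x1 grid, one entity per axes, in one window
f3 = figure('Visible', 'off');
subplot(2, 1, 1); plot(t, Pitch, 'r-'); ylabel('Pitch');
subplot(2, 1, 2); plot(t, Roll, 'b-');  ylabel('Roll'); xlabel('Time');
```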

"Plotting figure" refers to the area on which the entity is plotted. "Figure window" refers to the operating system's window that contains the plotting figure.

Command Line Interface

The command-line interface offers the following functions to produce the different types of plots:

  • plot_separate: separate-plotting for entities in the expData struct
  • plot_multi: multi-plotting for entities in the expData struct
  • plot_sub: sub-plotting for entities in the expData struct
  • plot_multi_vectors: multi-plotting for data contained in MATLAB vectors
  • plot_sub_vectors: sub-plotting for data contained in MATLAB vectors

The following sections describe the prototypes of the functions and how to use them.

plot_separate()

[Figure: Separate-plots.]

This function generates separate-plots for the data-headers (names of logged entities) passed to the function. The MATLAB-prototype of this function is:

function plot_separate(expData, useMarker, varargin)

  • expData: required input; this is the struct that contains all of the parsed data
  • useMarker: required input; this is a flag that indicates whether or not to plot vertical lines at the marker locations.
  • varargin: variable-length input; this is where all of the data-headers to be plotted are specified. They should be contained in the expData struct. Every data header can optionally be followed by a character string that specifies the plotting parameters (color, style, marker) for the plot of that header. If no plotting parameter is specified here, the default plotting parameters stored in expData.<header-name>.params will be used.

The plot_separate function is dependent on the expData struct.

Example use

plot_separate(expData, 0, 'Pitch','r-','Roll','Yaw','go')

This means that Pitch will be plotted with a solid red line, Roll with default formatting (stored in expData.Roll.params) and Yaw with green circles. Markers will not be used in plotting.
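
One way such an argument list could be interpreted is sketched below. This is a guess at the general approach, not the code in plot_separate.m: it assumes that an argument is a format string when it directly follows a header and is not itself a field of expData. The expData contents are hypothetical.

```matlab
% Hypothetical parsed data with three headers
expData = struct();
expData.Pitch.data = [1; 2; 3];
expData.Roll.data  = [4; 5; 6];
expData.Yaw.data   = [7; 8; 9];

% The arguments that would arrive through varargin
args = {'Pitch', 'r-', 'Roll', 'Yaw', 'go'};

headers = {};
formats = {};
k = 1;
while k <= numel(args)
    name = args{k};
    if ~isfield(expData, name)
        error('Unknown header: %s', name);
    end
    fmt = '';   % empty means "use the stored default parameters"
    if k < numel(args) && ~isfield(expData, args{k + 1})
        fmt = args{k + 1};   % the next argument is a format string
        k = k + 1;
    end
    headers{end + 1} = name;
    formats{end + 1} = fmt;
    k = k + 1;
end
```

For the example call, this yields the headers Pitch, Roll and Yaw paired with the formats 'r-', '' and 'go'.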

plot_multi()

[Figure: Multi-plots.]
[Figure: Multi-plots (with markers).]

This function generates multi-plots for the data-headers (names of logged entities) passed to the function. The MATLAB-prototype of this function is:

function plot_multi(expData, useMarker, varargin)

  • expData: required input; this is the struct that contains all of the parsed data
  • useMarker: required input; this is a flag that indicates whether or not to plot vertical lines at the marker locations.
  • varargin: variable-length input; this is where all of the data-headers to be plotted are specified. They should be contained in the expData struct. Every data header can optionally be followed by a character string that specifies the plotting parameters (color, style, marker) for the plot of that header. If no plotting parameter is specified here, the default plotting parameters stored in expData.<header-name>.params will NOT be used, because two entities with the same plotting parameters (for example, the same color) would make the graph unreadable.

The plot_multi function is dependent on the expData struct.

Example use

plot_multi(expData, 0, 'Pitch','r-','Roll','Yaw','go')

This means that Pitch will be plotted with a solid red line, Roll with no special formatting and Yaw with green circles. The default plotting parameters stored in expData.Roll.params will NOT be used. Markers will not be used for plotting.

plot_sub()

[Figure: Sub-plots.]

This function generates sub-plots for the data-headers (names of logged entities) passed to the function. The command-line interface generates 2x1 sub-plots. If there is an odd number of entities, a separate-plot will be generated for the last entity specified. The MATLAB-prototype of this function is:

function plot_sub(expData, useMarker, varargin)

  • expData: required input; this is the struct that contains all of the parsed data
  • useMarker: required input; this is a flag that indicates whether or not to plot vertical lines at the marker locations.
  • varargin: variable-length input; this is where all of the data-headers to be plotted are specified. They should be contained in the expData struct. Every data header can optionally be followed by a character string that specifies the plotting parameters (color, style, marker) for the plot of that header. If no plotting parameter is specified here, the default plotting parameters stored in expData.<header-name>.params will be used.

The plot_sub function is dependent on the expData struct.

Example use

plot_sub(expData, 0, 'Pitch','r-','Roll','Yaw','go')

This means that Pitch will be plotted with a solid red line, Roll with default formatting (stored in expData.Roll.params) and Yaw with green circles. Two figure windows will be generated. The first one will contain the plots for Pitch and Roll. The second one will contain a separate-plot for Yaw. Markers will not be used for plotting.
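
The grouping of entities into 2x1 windows can be sketched as follows. The variable names are hypothetical and the snippet only computes the grouping; it does not draw anything and is not taken from plot_sub.m.

```matlab
headers = {'Pitch', 'Roll', 'Yaw'};   % entities requested for sub-plots
n = numel(headers);

% Two entities per window; an odd last entity gets a window of its own
numWindows = ceil(n / 2);
windows = cell(1, numWindows);
for w = 1:numWindows
    first = 2 * w - 1;
    last  = min(2 * w, n);
    windows{w} = headers(first:last);
end
```

For three entities this gives two windows: the first holds Pitch and Roll, and the second holds Yaw alone.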

plot_multi_vectors()

This function generates multi-plots for MATLAB-vectors passed to the function. The MATLAB-prototype of this function is:

function plot_multi_vectors(xval, varargin)

  • xval: required input; this is the x-axis vector that the following vectors will be plotted against.
  • varargin: variable-length input; this is where all of the vectors to be plotted will be specified. Every vector can optionally be followed by a character string that specifies the plotting parameters (color, style, marker) for the plot of that vector. If no plotting parameter is specified here, the default MATLAB plotting parameters will be used.

The plot_multi_vectors function is an independent function since it doesn't depend on the expData struct.

Example use

plot_multi_vectors(time, Pitch, 'r-', Roll, Yaw,'go')

This means that Pitch will be plotted with a solid red line, Roll with no special formatting and Yaw with green circles. NOTE: Pitch, Roll and Yaw are MATLAB-vectors here and NOT character strings that represent header names.

plot_sub_vectors()

This function generates sub-plots for MATLAB-vectors passed to the function. The MATLAB-prototype of this function is:

function plot_sub_vectors(xval, varargin)

  • xval: required input; this is the x-axis vector that the following vectors will be plotted against.
  • varargin: variable-length input; this is where all of the vectors to be plotted will be specified. Every vector could optionally be followed by a character string that specifies the plotting parameters (color, style, marker) for the plot of that vector. If no plotting parameter is specified here, the default MATLAB plotting parameters will be used.

The plot_sub_vectors function is an independent function since it doesn't depend on the expData struct.

Example use

plot_sub_vectors(time, Pitch, 'r-', Roll, Yaw,'go')

This means that Pitch will be plotted with a solid red line, Roll with no special formatting and Yaw with green circles. Two figure windows will be generated. The first one will contain the plots for Pitch and Roll. The second one will contain a separate-plot for Yaw. NOTE: Pitch, Roll and Yaw are MATLAB-vectors here and NOT character strings that represent header names.

GUI

The GUI offers separate-plots, multi-plots and sub-plots as well. It uses the command-line plotting functions to generate results that will be displayed in the GUI. All of GUI's plotting output is dependent on the expData struct and hence, plots of custom vectors cannot be generated through the GUI. The results of the MATLAB GUI can be found in the GUI_Guide.pdf file located in RADA's git repository under 5-Matlab/DataAnalysisTool/SupportFiles.

Additional Functionality

The Data Analysis Tool offers two additional functions:

  • parse_camera_log
  • quaternionToEuler

parse_camera_log

The camera system can also log its own data through the OptiTrack Tools software. This data is logged in a .csv file. Examples of camera log files can be found in RADA's git repository under 5-Matlab/DataAnalysisTool/SupportFiles/. The software can track multiple trackables and store data for all the trackables in one .csv file.

This function parses the .csv file and outputs a struct that contains data logged by the camera. The MATLAB-prototype of this function is:

function trackableData = parse_camera_log(filepath)

  • filepath: required input; this is the location of the .csv file to be parsed.
  • trackableData: output; this is a MATLAB struct that contains all of the data stored in the .csv file.

The following list hierarchy describes the organization of content in the trackableData struct:

  • trackableData
    • numOfTrackables: This is the number of trackables that the .csv file contains data for
    • <trackable-name>: This is a struct that contains the trackable's data
      • Name: Name of the trackable
      • T: Column vector that contains the timestamps of logging
      • P: nx3 matrix that contains the x,y and z coordinate values of the trackable. n equals the number of timestamps (length of the T column vector). The first, second and third column hold the x-, y-, and z- coordinate values respectively.
      • Q: nx4 matrix that contains the quaternion values of the trackable. n equals the number of timestamps (length of the T column vector). The first, second, third, and fourth column hold the qx, qy, qz, and qw values respectively for each timestamp.
      • Euler: nx3 matrix that contains the Euler angle values of the trackable. n equals the number of timestamps (length of the T column vector). The first, second, and third column hold the yaw, pitch, and roll values respectively for each timestamp.

quaternionToEuler

This function converts quaternions to Euler angles in the Yaw, Pitch, Roll order. The MATLAB-prototype of this function is:

function yawPitchRoll = quaternionToEuler(quat)

  • quat: required input; this is an nx4 matrix which contains the quaternions in the following order: qx, qy, qz, qw. n is an arbitrary number of sets of quaternions
  • yawPitchRoll: output; this is an nx3 matrix which contains the corresponding Euler angles for each set of quaternions contained in quat. The order of the angles is: Yaw, Pitch, Roll. These angles are in degrees.

The angles are calculated in the following way:

  • Pitch = arcsin(L3) where
    • L3 = 2*(qx*qz - qw*qy)
  • Yaw = arccos(L1/cos(Pitch))*signum(L2) where
    • L1 = qw^2 + qx^2 - qy^2 - qz^2
    • L2 = 2*(qx*qy + qw*qz)
  • Roll = arccos(N3/cos(Pitch))*signum(M3) where
    • N3 = qw^2 - qx^2 - qy^2 + qz^2
    • M3 = 2*(qy*qz + qw*qx)
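
The formulas above translate into vectorized MATLAB roughly as shown below. This is a sketch written from the formulas, not the contents of quaternionToEuler.m. The sample quaternions are hypothetical, and the clamp helper is an addition made here so that rounding errors cannot push the arccos argument outside [-1, 1]. Note that, as written, the formulas return 0 for Yaw or Roll whenever L2 or M3 is exactly 0, because signum(0) is 0.

```matlab
% Hypothetical input: identity rotation and a 90 degree rotation about z,
% one quaternion per row in the order qx, qy, qz, qw
quat = [0 0 0         1; ...
        0 0 sin(pi/4) cos(pi/4)];

qx = quat(:, 1); qy = quat(:, 2); qz = quat(:, 3); qw = quat(:, 4);

L1 = qw.^2 + qx.^2 - qy.^2 - qz.^2;
L2 = 2 * (qx .* qy + qw .* qz);
L3 = 2 * (qx .* qz - qw .* qy);
M3 = 2 * (qy .* qz + qw .* qx);
N3 = qw.^2 - qx.^2 - qy.^2 + qz.^2;

% Added safeguard (not in the formulas above): keep arguments in [-1, 1]
clamp = @(x) max(min(x, 1), -1);

pitch = asin(clamp(L3));
yaw   = acos(clamp(L1 ./ cos(pitch))) .* sign(L2);
roll  = acos(clamp(N3 ./ cos(pitch))) .* sign(M3);

% Output in degrees, ordered Yaw, Pitch, Roll
yawPitchRoll = [yaw, pitch, roll] * 180 / pi;
```

For the two sample rows, the first gives all-zero angles and the second gives a yaw of 90 degrees with zero pitch and roll.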

Components

The source code and the necessary files of the Data Analysis Tool can be found in RADA's git repository under 5-Matlab/DataAnalysisTool/Tool/. The following lists the primary files that make up the Data Analysis Tool:

  • DataAnalysis.m: Script that is the first means of interaction with the user for the command-line interface.
  • parse_log.m: Function that parses the text log file.
  • plot_data.m: Function that calls the plotting functions based on the user's analysis configuration options set in the DataAnalysis.m script.
  • plot_separate.m: Function that generates separate-plots.
  • plot_multi.m: Function that generates multi-plots.
  • plot_sub.m: Function that generates sub-plots.
  • GUI.m: MATLAB GUI file that describes the behavior of the tool's GUI

The following lists the support files that are required by the primary files to enable the tool's functionality:

  • GUI.fig: MATLAB figure that describes the physical appearance of the GUI.
  • buildPlotCharString.m: Function. Input: plotParams (struct with a structure identical to that of expData.<header-name>.params). Output: a character string. This function builds a character string corresponding to the values set in a plotParams struct.
  • haveDiffPlotStyles.m: Function. Input: data (the expData struct), headers (cell array of entity names contained in the expData struct). Output: boolean value. This function checks whether the plotting parameters of the specified headers are all distinct. True if the plotting parameters are distinct for all the headers. False if any of the specified headers have the same plotting parameters.
  • isDefaultPlotCharString.m: Function. Input: plotParams (struct with a structure identical to that of expData.<header-name>.params). Output: boolean value. This function compares the values of the plotParams struct to MATLAB's default plotting format (color: b; marker: (none); style: -). True if the color, marker and style values of the input plotParams struct are the same as the default plotting values. False otherwise.
  • isPlotCharString.m: Function. Input: str (character string). Output: boolean value. This function checks whether the str character string is composed of characters that are used to describe a plot-formatting string. This only checks for a valid length of the character string and for valid characters in the character string. A character string that passes this test may or may not be a valid plot-formatting character string. For example, 'bbb' and 'bo:' will both pass the check although the first is not a valid plot-formatting string.
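
A check of the kind described for isPlotCharString.m could be sketched as below. The set of accepted characters and the maximum length of 4 (one color, one marker and a two-character line style such as '-.') are assumptions based on MATLAB's line specifiers, not values taken from the tool's source.

```matlab
% Assumed character set: colors, markers and line-style characters
validChars = 'rgbcmykwo+*.xsd^v<>ph-:';

% True if str is a non-empty char array of plausible length that
% contains only characters from validChars
isPlotCharString = @(str) ischar(str) && ~isempty(str) && ...
    numel(str) <= 4 && all(ismember(str, validChars));

okValid    = isPlotCharString('bo:');    % valid format string
okRepeated = isPlotCharString('bbb');    % passes, although not a valid format
okTooLong  = isPlotCharString('hello');  % rejected: too long, invalid letters
```

As in the description above, 'bbb' passes this check even though MATLAB would not accept it as a format string.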

The following lists additional files that are not required for the primary functionality of the tool. However, these are helpful functions that automate parsing and computational tasks:

  • parse_camera_log.m: This function parses data stored in the .csv file generated by the camera system's OptiTrack Tools software.
  • quaternionToEuler.m: This function converts quaternion values to Euler angles.
  • plot_multi_vectors.m: This function is used to generate multi-plots on custom MATLAB vectors (description under the Plotting section).
  • plot_sub_vectors.m: This function is used to generate sub-plots on custom MATLAB vectors (description under the Plotting section).