Software Platform

Auri’s software system is composed of several distinct modules:

  • Sensor Drivers
  • Computer Vision
  • Mission Planning
  • Diagnostics

These components are all connected through the Robot Operating System (ROS), an open source communications framework. ROS was chosen because it supports a highly distributed system, which lets Auri make full use of its two onboard computers. In addition, ROS nodes allow the software team to write modular code with consistent I/O endpoints, which has been invaluable during development. Because each component can be exercised separately, different techniques can be evaluated and iterated on quickly; for example, computer vision algorithms can be swapped at runtime to determine which performs best under current conditions.
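As a minimal illustration of such a consistent endpoint, the sketch below shows a rospy node that subscribes to a camera topic and publishes a detection; the node name, topic names, and message choice are placeholders, not Auri's actual interfaces. Any algorithm that exposes the same subscription and publication could be substituted without changes elsewhere in the system.

```python
# Illustrative sketch only: a rospy vision node with a fixed I/O endpoint.
# Topic and node names are assumptions, not ARVP's actual interfaces.
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import PointStamped

class BuoyDetectorNode:
    def __init__(self):
        rospy.init_node('buoy_detector')
        # Any detector that consumes this image topic and produces this
        # detection topic can be swapped in at runtime.
        self.pub = rospy.Publisher('/vision/buoy/center', PointStamped, queue_size=1)
        rospy.Subscriber('/camera/front/image_raw', Image, self.on_image)

    def on_image(self, msg):
        # Run the detector on the incoming frame (omitted) and publish the result.
        detection = PointStamped()
        detection.header = msg.header
        detection.point.x, detection.point.y = 0.0, 0.0  # placeholder output
        self.pub.publish(detection)

if __name__ == '__main__':
    BuoyDetectorNode()
    rospy.spin()
```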

Auri's systems are distributed over two computers: the Nvidia Jetson TX2 and the Odroid C2. The two computers coordinate over the ROS communication protocol to split up the computational workload of the software tasks. Intensive computer vision computations are run on the Jetson to take advantage of its graphics processing unit (GPU). The Odroid handles the remaining tasks, which include gathering sensor data, updating the PID controllers, and monitoring the mission planner.

Control System

Auri's control system is the low-level framework that handles all of its movement and data collection processes. The first part is a collection of custom-made interfaces for all of Auri's sensors and motors, allowing each device to interoperate easily with higher-level software components. Among other things, the drivers are responsible for initializing a device, communicating with it via the appropriate protocol, and passing information between devices and higher-level software components using ROS messages.
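A minimal sketch of such a driver node is shown below, assuming a hypothetical serial depth sensor; the device path, baud rate, topic name, and parsing logic are placeholders rather than Auri's actual hardware interface.

```python
# Hedged sketch of a sensor driver node in the style described above.
# The serial device and topic are assumptions, not Auri's real interface.
import rospy
import serial
from std_msgs.msg import Float32

def run_depth_driver():
    rospy.init_node('depth_sensor_driver')
    pub = rospy.Publisher('/sensors/depth', Float32, queue_size=10)

    # Initialize the device and communicate over its native protocol
    # (plain serial in this sketch).
    device = serial.Serial('/dev/ttyUSB0', baudrate=115200, timeout=1.0)

    rate = rospy.Rate(20)  # publish at 20 Hz
    while not rospy.is_shutdown():
        line = device.readline()
        if line:
            try:
                depth_m = float(line.strip())
            except ValueError:
                continue  # skip malformed readings
            # Pass the measurement to higher-level components as a ROS message.
            pub.publish(Float32(data=depth_m))
        rate.sleep()

if __name__ == '__main__':
    run_depth_driver()
```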

Navigation Through Computer Vision

Although ARVP has grown greatly, budget constraints have kept the navigation system focused almost entirely on a vision-based solution. This year, ARVP has made enormous strides in developing a computer vision system for Auri. Since the algorithms from previous years never worked during competition, the vision system was completely redeveloped with two key principles in mind: all new algorithms should work equally well during pool tests and on competition footage, and dependence on color should be minimized or eliminated. This is because underwater image processing is affected by light attenuation and scattering, which results in poor contrast and non-uniform colors. Instead, Auri's new vision algorithms use the shapes of the competition objects (e.g., buoy, path, gate), which are more reliable indicators. With this completely revamped architecture, this year's two main goals are to accomplish the buoy task and to follow the path. The three main vision algorithms used for target localization are:

  • Contour-based shape detection
  • Parameterless ellipse fitting
  • Deep learning

The vision algorithms are implemented using the OpenCV library as well as LAPACK, a linear algebra package. To utilize the full capabilities of the Jetson TX2, the algorithms have been optimized for use on a GPU, especially when it comes to deep learning.

Contour Based Shape Detection

The method used to detect rectangles is contour detection. First, the image is converted to grayscale, then OpenCV's Canny edge detector is used to find edges in the image. Because the white borders of the bins contrast with the background behind them, the edge detector is likely to find the edges of the bins. Next, OpenCV's contour detector is run on the resulting binary image, and each contour is approximated by a closed polygonal curve. The set of approximations is then searched for those consisting of four points, each point being a possible corner of a rectangle. For each such approximation, the angle at every corner is computed from the corner point and its two adjacent points. Finally, an approximation is considered a rectangle if each corner angle is 90 degrees.
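The sketch below illustrates this procedure with OpenCV in Python. The Canny thresholds, polygon approximation tolerance, and the angle tolerance are illustrative choices, not the parameters used on Auri.

```python
# Illustrative contour-based rectangle detection (OpenCV 4.x return signature).
# Threshold values are placeholders, not ARVP's tuned parameters.
import cv2
import numpy as np

def find_rectangles(image, canny_low=50, canny_high=150, angle_tol_deg=10.0):
    """Return 4-point contour approximations whose corner angles are ~90 degrees."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, canny_low, canny_high)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    rectangles = []
    for contour in contours:
        # Approximate the contour with a closed polygonal curve.
        perimeter = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.02 * perimeter, True)
        if len(approx) != 4:
            continue  # only four-point polygons can be rectangles
        pts = approx.reshape(4, 2).astype(np.float64)

        # Check the angle at every corner using its two adjacent points.
        is_rect = True
        for i in range(4):
            prev_pt, corner, next_pt = pts[i - 1], pts[i], pts[(i + 1) % 4]
            v1, v2 = prev_pt - corner, next_pt - corner
            cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
            angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
            if abs(angle - 90.0) > angle_tol_deg:
                is_rect = False
                break
        if is_rect:
            rectangles.append(approx)
    return rectangles
```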

Parameterless Ellipse Fitting

Combining a shape detector and a shape filter, parameterless ellipse fitting is a robust detection system for many elementary geometric shapes. For the competition, this technique is used to find the geometric center of buoys within the field of view and provides a viable alternative to deep learning techniques for buoy detection and classification.

Deep Learning

Another method used for object detection is deep learning. Deep learning refers to using artificial neural networks with more than one hidden layer to learn a function from training data. The advantage of deep learning is that it takes raw data as input and, with enough training data, can generalize well to a variety of scenes and conditions. For RoboSub, being able to handle the variable lighting and water conditions of the Transdec pool without hand-tuning algorithm parameters is important. The team's goal for deep learning is to combine the buoy and path task object detectors into a single generic deep learning model, both simplifying and improving Auri's object detection system. The image to the right shows a red buoy correctly detected (green bounding box) and a false-positive yellow buoy (blue bounding box). For more details, read our journal paper.

Mission Planning

High-level control of robot operations is handled by a single hierarchical state machine. The idea arose at the beginning of the season, when the software team decided to aim for more tasks this year. To ensure an extensible structure and robust error handling, SMACH, an open source Python library, is used.

The top-level plan is first drawn out as a flow chart showing the flow of the main process, the data keys, and the concurrent side processes. Based on this plan, every state is coded as an individual Python class, each with a function to run on activation as well as a list of exit paths and the inbound and outbound data keys. In addition to custom states, states for ROS action and service clients are created directly to interface more easily with other ROS nodes. Where this cannot be done, custom interfaces manage the sending and receiving of messages on ROS topics, allowing the mission planner to control other nodes in the system. With all the interfaces and states coded, the transitions between states are defined, along with the flow of data between them.

A separate state machine is built for every task to be achieved, such as the buoy task, in which the robot has to touch three different buoys in a certain order. This state machine contains states such as detect, track, and reposition, and the transitions between them are based on the success or failure of each state. For instance, when the tracker loses the buoy, the machine makes the appropriate transition to re-detect it. Once a state machine for a task is ready, it is itself added as a state to the top-level state machine; a sketch of this structure is given below. It is this hierarchical capability of SMACH that handles both the completion of individual tasks and the flow of all tasks in the competition. Overall, the goal of the ARVP mission planner is to interface properly with all other components and to apply a plan that makes every component work together seamlessly.
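As a rough illustration of this structure, the sketch below builds a two-state buoy sub-machine with SMACH. The state names, outcomes, and data keys are placeholders, not ARVP's actual task code.

```python
# Hedged sketch of a SMACH sub-state-machine in the style described above.
import smach

class DetectBuoy(smach.State):
    def __init__(self):
        # Each state declares its exit paths and the data keys it writes.
        smach.State.__init__(self, outcomes=['found', 'lost'],
                             output_keys=['buoy_position'])

    def execute(self, userdata):
        # Run on activation: query the vision system (omitted) for a buoy.
        userdata.buoy_position = (1.0, 0.5)  # placeholder detection result
        return 'found'

class TrackBuoy(smach.State):
    def __init__(self):
        smach.State.__init__(self, outcomes=['touched', 'lost'],
                             input_keys=['buoy_position'])

    def execute(self, userdata):
        # Drive toward userdata.buoy_position until contact (omitted).
        return 'touched'

def build_buoy_task():
    # Sub-state-machine for one task; it can itself be added as a state
    # in the top-level state machine, giving the hierarchy described above.
    sm = smach.StateMachine(outcomes=['task_done', 'task_failed'])
    with sm:
        smach.StateMachine.add('DETECT', DetectBuoy(),
                               transitions={'found': 'TRACK',
                                            'lost': 'task_failed'})
        smach.StateMachine.add('TRACK', TrackBuoy(),
                               transitions={'touched': 'task_done',
                                            'lost': 'DETECT'})
    return sm
```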


Simulation

This year a simulator was built in order to test vision and control algorithms. The idea came from seeing other teams use simulators at last year's competition. A simulator imitates a real operating environment and is particularly useful for environments that are hard to access, such as underwater. The software team took advantage of UWsim, an open source underwater simulator for marine robotics research, and reconfigured it to simulate ARVP's robot. The simulator is useful for testing the entire software stack together, makes debugging control logic easier, and helps new members learn ROS. Its development and maintenance will continue into future years.

However, the image quality of the simulator was not sufficient to serve as deep learning training data or to capture realistic underwater refractions. Therefore, an additional simulated scene was built using Unity, whose high-resolution, fast rendering helped generate synthetic training images.


Hydrophones

Three hydrophones (microphones that detect sound waves underwater) will be used to capture the pinger signal. The hydrophones will be arranged in an L configuration, with the middle hydrophone acting as a reference for the other two. The signal will be amplified and DC-biased using a simple op-amp circuit. A dual-channel ADC will be used to sample one pair of hydrophone data (the reference plus one other hydrophone). A software-defined radio approach is used for processing: the signal will be filtered and normalized, and then the phase shift between the two signals will be used to determine the angle to the pinger in one axis. The other pair will then be sampled and used to localize the pinger in the other axis. Combining this information gives a heading to the pinger; a sketch of the phase-to-angle step appears below.
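As a rough sketch of the phase-to-angle step, the following assumes a single-frequency pinger tone and uses placeholder values for the sample rate, pinger frequency, and hydrophone spacing; it is not the team's actual processing chain.

```python
# Hedged sketch: bearing from the phase shift between one hydrophone pair.
# All constants are assumed placeholder values, not ARVP's hardware parameters.
import numpy as np

SAMPLE_RATE = 500_000      # Hz, assumed ADC sample rate
PINGER_FREQ = 25_000       # Hz, assumed pinger frequency
SPACING = 0.02             # m, assumed hydrophone separation (< half wavelength)
SPEED_OF_SOUND = 1480.0    # m/s, nominal speed of sound in water

def bearing_from_pair(reference, other):
    """Estimate the angle to the pinger (one axis) from one sampled hydrophone pair."""
    n = len(reference)
    # Phase of the pinger tone in each channel, read from the FFT bin
    # closest to the pinger frequency.
    bin_idx = int(round(PINGER_FREQ * n / SAMPLE_RATE))
    phase_ref = np.angle(np.fft.rfft(reference)[bin_idx])
    phase_other = np.angle(np.fft.rfft(other)[bin_idx])

    # Phase shift between the channels, wrapped to [-pi, pi).
    dphi = np.angle(np.exp(1j * (phase_other - phase_ref)))

    # Convert the phase shift to a path-length difference, then to a bearing.
    wavelength = SPEED_OF_SOUND / PINGER_FREQ
    path_diff = dphi / (2 * np.pi) * wavelength
    return np.degrees(np.arcsin(np.clip(path_diff / SPACING, -1.0, 1.0)))
```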