My Portfolio

Software Lead for ARC. Led a team of 40 engineers and architected three in-house neural networks that together turn camera frames into decisions.

NN 1 — 6D Pose Estimation

The first network predicts the opponent's 6-DoF pose (position + orientation) from a single camera frame. The output is enough to know not just where the opponent is, but which way it's facing — critical for choosing an approach angle.

Sensor fusion (EKF + IMU): The vision pose is fused with the bot's on-board IMU (Inertial Measurement Unit: accelerometer + gyroscope, sampled at hundreds of Hz) through an Extended Kalman Filter. The NN gives accurate but slow fixes; the IMU is fast but drifts; the EKF combines them into one smooth 6-DoF estimate that stays usable between camera frames.
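A minimal sketch of the predict/correct loop described above, reduced to a single axis so it stays readable. It is a plain linear Kalman filter, not the full 6-DoF EKF, and the class name `AxisFilter`, the noise values, and the sample rates are all assumptions chosen for illustration.

```python
class AxisFilter:
    """1-D constant-velocity Kalman filter. State = [position, velocity]."""

    def __init__(self, accel_noise=0.5, vision_noise=0.05):
        self.pos, self.vel = 0.0, 0.0
        # 2x2 covariance stored as four scalars: [[p00, p01], [p10, p11]].
        self.p00, self.p01, self.p10, self.p11 = 1.0, 0.0, 0.0, 1.0
        self.q = accel_noise ** 2   # IMU acceleration noise variance (assumed)
        self.r = vision_noise ** 2  # vision position noise variance (assumed)

    def predict(self, accel, dt):
        """Fast IMU step: integrate acceleration; uncertainty grows."""
        self.pos += self.vel * dt + 0.5 * accel * dt * dt
        self.vel += accel * dt
        # P = F P F^T with F = [[1, dt], [0, 1]]
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11
        # Add process noise Q = q * G G^T with G = [dt^2 / 2, dt]
        g0, g1 = 0.5 * dt * dt, dt
        self.p00 = p00 + g0 * g0 * self.q
        self.p01 = p01 + g0 * g1 * self.q
        self.p10 = p10 + g1 * g0 * self.q
        self.p11 = p11 + g1 * g1 * self.q

    def correct(self, vision_pos):
        """Slow vision step: pull the estimate toward the measured position."""
        s = self.p00 + self.r                 # innovation variance, H = [1, 0]
        k0, k1 = self.p00 / s, self.p10 / s   # Kalman gain
        innovation = vision_pos - self.pos
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        # P = (I - K H) P
        p00 = (1 - k0) * self.p00
        p01 = (1 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


def demo(steps=1000, dt=0.002, camera_every=10, imu_bias=0.2):
    """Stationary bot with a biased IMU: dead reckoning vs. fused estimate."""
    fused = AxisFilter()
    dr_pos = dr_vel = 0.0  # dead reckoning: IMU integration only
    for i in range(1, steps + 1):
        accel = imu_bias  # true acceleration is 0, so the IMU reads only bias
        fused.predict(accel, dt)
        dr_pos += dr_vel * dt + 0.5 * accel * dt * dt
        dr_vel += accel * dt
        if i % camera_every == 0:
            fused.correct(0.0)  # vision reports the true position
    return dr_pos, fused.pos
```

With these assumed numbers (500 Hz IMU, one vision fix every 10 IMU samples, 2 s of run time), the IMU-only position drifts to about 0.4 m (0.5 × 0.2 × 2²), while the fused estimate stays closer to the true position of 0 because each vision fix pulls it back.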

Jargon:

Pose: Position + orientation.
6-DoF: Six degrees of freedom. Three position numbers (x, y, z) + three orientation numbers (roll, pitch, yaw).
Keypoint: A labeled landmark (e.g., weapon tip) the network finds.
IMU: Sensor with an accelerometer (linear motion) and gyroscope (rotation).
EKF: Filter that fuses noisy sensor streams into a single best estimate. "Extended" means it linearizes nonlinear models, such as rotation, around the current estimate.
Drift: Error that accumulates when motion is integrated from IMU readings alone; each vision fix corrects it.

NN 2 — Semantic Segmentation @ 90 FPS

The second network is a semantic segmentation model that labels every pixel of the camera feed as opponent-or-not. It was trained from scratch and quantized to run at 90 FPS on the bot.

Blue pixels = the opponent mask predicted by the network on live arena footage.

Segmentation gives us a much richer signal than a bounding box: we can see partial occlusions, shape changes during a flip, and exact weapon-arm geometry — all of which feed the strategy network downstream.
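To make the comparison with a bounding box concrete, here is a small sketch on a hand-written toy mask. The helper name `mask_stats` and the mask itself are hypothetical; the point is only that the mask keeps shape information (fill ratio, centroid) that the box alone throws away.

```python
def mask_stats(mask):
    """Summarize a binary mask (list of rows, 1 = opponent). None if empty."""
    points = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not points:
        return None
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    top, bottom, left, right = min(rows), max(rows), min(cols), max(cols)
    box_area = (bottom - top + 1) * (right - left + 1)
    count = len(points)
    return {
        "bbox": (top, left, bottom, right),   # all a box detector would give
        "pixels": count,                      # true opponent area
        "fill": count / box_area,             # how much of the box is opponent
        "centroid": (sum(rows) / count, sum(cols) / count),
    }


# Toy mask: a 2-wide body with an arm sticking out along the bottom row.
MASK = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

stats = mask_stats(MASK)
```

For this toy mask the box covers 12 pixels but only 8 are opponent, and the centroid sits left of the box center: a hint that the extra width is a thin arm, not body.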

Jargon:

Semantic segmentation: Classify every pixel in an image. Output is a mask the same size as the image.
Mask: A per-pixel label map. Here, a binary mask: opponent vs. not.
Encoder–decoder: The standard segmentation architecture — compress the image into features, then expand back to a pixel map.
Quantization: Storing the network's weights (and often activations) at lower precision (e.g., 32-bit float → 8-bit integer) so it runs faster on edge hardware, usually with little accuracy loss.
FPS: Frames per second. 90 FPS = ~11 ms per frame.
Edge inference: Running the network on the bot, not over a wireless link.
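The 32-bit → 8-bit idea from the quantization entry can be sketched in a few lines. This uses symmetric per-tensor int8 quantization, picked here only because it is the simplest scheme to show; it is an assumption, not necessarily the scheme used on the bot, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in one byte instead of four, and the reconstruction error is bounded by `scale / 2`. Small weights such as 0.003 round to zero, which is one reason accuracy has to be re-checked after quantizing.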

NN 3 — Strategy (PPO)

A policy network trained with PPO (Proximal Policy Optimization). The agent plays against itself in simulation, learning which actions win.

PPO training dashboard: win rate, throttle, distance, planned paths.

Jargon:

RL: Reinforcement learning. Train by rewarding good outcomes and penalizing bad ones.
PPO: Proximal Policy Optimization. An RL algorithm that stays stable by clipping each update so the policy changes only a little per step.
Policy network: Maps current state → action.
Reward shaping: Designing the reward (we reward hits; penalize idle time and flips).
Self-play: Train against current or past versions of the same agent.
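Two of the entries above, PPO's small updates and reward shaping, can be sketched for a single sample. The clipped objective follows the standard PPO formulation; the reward weights and the function names are hypothetical, and "flips" is assumed to mean our own bot getting flipped.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate objective for one (state, action) sample."""
    ratio = math.exp(logp_new - logp_old)        # new policy / old policy
    clipped = max(1 - eps, min(1 + eps, ratio))  # keep ratio within [1-eps, 1+eps]
    # Taking the minimum removes any gain from moving the policy too far.
    return min(ratio * advantage, clipped * advantage)


def shaped_reward(hits, idle_seconds, flipped,
                  w_hit=1.0, w_idle=0.1, w_flip=2.0):
    """Reward hits; penalize idle time and being flipped (weights assumed)."""
    penalty = w_idle * idle_seconds + (w_flip if flipped else 0.0)
    return w_hit * hits - penalty
```

With `eps=0.2`, an action whose probability rose 1.5× and had positive advantage is credited as if it rose only 1.2×, so one lucky batch cannot drag the policy far. When the advantage is negative the unclipped term is kept, so a bad move is still fully penalized.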

Why three networks?

Splitting pose → segmentation → policy gives each stage a clean contract, lets us train and debug them separately, and lets each run at the frame rate it actually needs.
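One way to picture the "clean contract" between stages is as typed hand-offs. This is a hypothetical sketch, not the team's actual interfaces: every class, field, and function name below is invented for illustration (only "throttle" appears elsewhere on this page, in the dashboard caption).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pose6D:
    position: Tuple[float, float, float]     # x, y, z
    orientation: Tuple[float, float, float]  # roll, pitch, yaw


@dataclass(frozen=True)
class OpponentMask:
    pixels: List[List[int]]  # 1 = opponent, 0 = not


@dataclass(frozen=True)
class Action:
    throttle: float
    steering: float  # hypothetical field


def run_pipeline(frame, pose_fn, seg_fn, policy_fn) -> Action:
    """Each stage only sees the types it is promised, nothing else."""
    pose = pose_fn(frame)          # NN 1: frame -> Pose6D
    mask = seg_fn(frame)           # NN 2: frame -> OpponentMask
    return policy_fn(pose, mask)   # NN 3: (Pose6D, OpponentMask) -> Action
```

Because each stage is just a function over these types, any one of them can be swapped for a stub or a recorded output while the others are being trained or debugged.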