Computer science graduate student and Autonomous Driving Software Engineer with a special focus on perception and prediction tasks.
M.S. at Bogazici University
ML for Autonomous Driving
2 Years of Work Experience
What I Do
ROS
Robot Operating System
Autonomous Driving Software Engineer
Perception/Prediction Engineer
Sensor Fusion
Embedded Computer Vision
ML Engineer
Test Engineer
Object Recognition: Architecture 1
This is a demonstration of 3D object recognition for autonomous driving. The experiment is implemented on a real car and tested on real-life data gathered in medium and heavy traffic conditions.
For the task of 3D object recognition, a multi-modal perception pipeline is implemented by fusing:
2D camera
3D LIDAR
This demonstration has the following constraints:
Detection is limited to the camera's field of view.
Vision detection and classification are done via TensorRT-YOLOv3.
3D clusters are extracted by Euclidean Clustering.
Used Sensors
One LiDAR
One Camera
Tasks done for this demonstration (sub-processes):
Creating a 3D map of the environment the vehicle is supposed to operate in.
Activating and calibrating sensors.
Specifying transformations between different frames.
Pre-processing pointcloud.
Localization.
Detection (vision, range).
Fusion.
Creating a 3D map of the environment:
The LiDAR sensor is used to create a 3D pointcloud map of the road on which the vehicle is supposed to operate. A vector map layer can also be added on top of the pointcloud data to specify semantic information about the road: topology (e.g. location of the drivable area, lanes, lines, traffic signs) and traffic rules (e.g. speed limits).
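A minimal sketch of how such a map could be accumulated, assuming each scan already comes with an estimated pose (e.g. from an NDT mapping run); the use of PCL and the leaf size are assumptions, not the exact tooling used here.

    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/common/transforms.h>
    #include <pcl/filters/voxel_grid.h>
    #include <Eigen/Dense>
    #include <vector>

    using Cloud = pcl::PointCloud<pcl::PointXYZ>;

    // Accumulate scans (paired with their estimated poses) into one map cloud,
    // then downsample it so the map stays tractable for localization.
    Cloud::Ptr buildMap(const std::vector<Cloud::Ptr>& scans,
                        const std::vector<Eigen::Matrix4f>& poses) {
      Cloud::Ptr map(new Cloud);
      for (std::size_t i = 0; i < scans.size(); ++i) {
        Cloud transformed;
        pcl::transformPointCloud(*scans[i], transformed, poses[i]);  // scan -> map frame
        *map += transformed;
      }
      pcl::VoxelGrid<pcl::PointXYZ> voxel;
      voxel.setInputCloud(map);
      voxel.setLeafSize(0.2f, 0.2f, 0.2f);   // placeholder map resolution (m)
      Cloud::Ptr downsampled(new Cloud);
      voxel.filter(*downsampled);
      return downsampled;
    }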
Activating and calibrating sensors:
The sensors (LiDAR, camera) must be activated with proper parameters and calibrated both individually (intrinsics) and together (extrinsics) to ensure measurement accuracy. This is necessary to find a precise correspondence between LiDAR points and camera pixels.
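One way to sanity-check that correspondence (a sketch, not the exact procedure used here): project LiDAR points into the image with the calibrated extrinsics and intrinsics and inspect the overlay. The function and variable names below are hypothetical, and OpenCV's projectPoints is assumed.

    #include <opencv2/core.hpp>
    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgproc.hpp>
    #include <vector>

    // Overlay LiDAR points on the camera image to visually verify the calibration.
    // rvec/tvec: LiDAR -> camera extrinsics; K/dist: camera intrinsics and distortion.
    // Points behind the camera should be filtered out beforehand.
    void overlayLidarOnImage(const std::vector<cv::Point3f>& lidar_points,
                             const cv::Mat& rvec, const cv::Mat& tvec,
                             const cv::Mat& K, const cv::Mat& dist,
                             cv::Mat& image) {
      std::vector<cv::Point2f> pixels;
      cv::projectPoints(lidar_points, rvec, tvec, K, dist, pixels);
      for (const auto& px : pixels) {
        if (px.x >= 0 && px.x < image.cols && px.y >= 0 && px.y < image.rows)
          cv::circle(image, px, 2, cv::Scalar(0, 255, 0), -1);  // points should hug object edges
      }
    }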
The following link shows a precise calibration result:
Specifying transformations between different frames:
world to map: a static transformation from the world origin to the origin of the local map.
map to base_link: a dynamic transformation from the local map to the control center of the ego vehicle, updated by localization as the vehicle moves.
base_link to LiDAR: a static transformation that expresses LiDAR points in the vehicle's control center frame for better control and planning performance.
LiDAR to camera: a static transformation to obtain a correspondence between LiDAR and camera observations.
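As a concrete example, a minimal sketch of publishing one of these static transforms (base_link to the LiDAR frame) with tf2 in ROS; the frame names and the numeric extrinsics are placeholders standing in for the calibrated values.

    #include <ros/ros.h>
    #include <geometry_msgs/TransformStamped.h>
    #include <tf2/LinearMath/Quaternion.h>
    #include <tf2_ros/static_transform_broadcaster.h>

    int main(int argc, char** argv) {
      ros::init(argc, argv, "base_link_to_lidar_tf");
      ros::NodeHandle nh;
      tf2_ros::StaticTransformBroadcaster broadcaster;

      geometry_msgs::TransformStamped t;
      t.header.stamp = ros::Time::now();
      t.header.frame_id = "base_link";   // vehicle control center (placeholder name)
      t.child_frame_id = "velodyne";     // LiDAR frame (placeholder name)
      t.transform.translation.x = 1.2;   // placeholder mounting offsets (m)
      t.transform.translation.y = 0.0;
      t.transform.translation.z = 1.8;

      tf2::Quaternion q;
      q.setRPY(0.0, 0.0, 0.0);           // placeholder mounting orientation (rad)
      t.transform.rotation.x = q.x();
      t.transform.rotation.y = q.y();
      t.transform.rotation.z = q.z();
      t.transform.rotation.w = q.w();

      broadcaster.sendTransform(t);      // static transform, published once per node lifetime
      ros::spin();
      return 0;
    }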
Pre-processing pointcloud data:
This step optimizes the performance of the subsequent processes by reducing the amount of data and eliminating the runtime cost of unnecessary points:
Downsampling the pointcloud while retaining the necessary information.
Cropping the pointcloud to remove regions that will not be used in further processes.
Removing the ground from the pointcloud, since it is not a collidable obstacle for the autonomous vehicle.
Comparing online LiDAR data with the offline pointcloud map to remove static structures present in both, retaining only the dynamic objects of interest.
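A minimal sketch of the first three steps using PCL (an assumption; any equivalent pointcloud library would do). The leaf size, crop bounds, and ground-plane threshold are placeholder values, and the map-based comparison step is omitted.

    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/ModelCoefficients.h>
    #include <pcl/PointIndices.h>
    #include <pcl/filters/voxel_grid.h>
    #include <pcl/filters/crop_box.h>
    #include <pcl/filters/extract_indices.h>
    #include <pcl/segmentation/sac_segmentation.h>

    using Cloud = pcl::PointCloud<pcl::PointXYZ>;

    Cloud::Ptr preprocess(const Cloud::Ptr& input) {
      // 1. Downsample with a voxel grid (placeholder 0.1 m leaf size).
      Cloud::Ptr downsampled(new Cloud);
      pcl::VoxelGrid<pcl::PointXYZ> voxel;
      voxel.setInputCloud(input);
      voxel.setLeafSize(0.1f, 0.1f, 0.1f);
      voxel.filter(*downsampled);

      // 2. Crop to the region of interest around the ego vehicle (placeholder bounds).
      Cloud::Ptr cropped(new Cloud);
      pcl::CropBox<pcl::PointXYZ> crop;
      crop.setInputCloud(downsampled);
      crop.setMin(Eigen::Vector4f(-50.f, -20.f, -3.f, 1.f));
      crop.setMax(Eigen::Vector4f( 50.f,  20.f,  2.f, 1.f));
      crop.filter(*cropped);

      // 3. Remove the ground by fitting a plane with RANSAC and dropping its inliers.
      pcl::ModelCoefficients::Ptr coeffs(new pcl::ModelCoefficients);
      pcl::PointIndices::Ptr ground(new pcl::PointIndices);
      pcl::SACSegmentation<pcl::PointXYZ> seg;
      seg.setModelType(pcl::SACMODEL_PLANE);
      seg.setMethodType(pcl::SAC_RANSAC);
      seg.setDistanceThreshold(0.2);          // placeholder ground tolerance (m)
      seg.setInputCloud(cropped);
      seg.segment(*ground, *coeffs);

      Cloud::Ptr no_ground(new Cloud);
      pcl::ExtractIndices<pcl::PointXYZ> extract;
      extract.setInputCloud(cropped);
      extract.setIndices(ground);
      extract.setNegative(true);              // keep everything that is NOT ground
      extract.filter(*no_ground);
      return no_ground;
    }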
Localization:
The ego vehicle is localized relative to the local pointcloud map. This is done via the NDT matching algorithm, which aligns live LiDAR scans at runtime against the previously gathered pointcloud map of the same environment to estimate the vehicle's pose inside the map.
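A minimal sketch of this step, assuming PCL's NormalDistributionsTransform as the NDT implementation; the scan and map are assumed to be loaded and pre-processed, the initial guess comes from odometry or the previous pose, and all numeric parameters are placeholders.

    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/registration/ndt.h>
    #include <Eigen/Dense>

    using Cloud = pcl::PointCloud<pcl::PointXYZ>;

    // Align the live scan against the pointcloud map and return the estimated
    // pose of the vehicle (scan origin) in the map frame.
    Eigen::Matrix4f localize(const Cloud::Ptr& scan, const Cloud::Ptr& map,
                             const Eigen::Matrix4f& initial_guess) {
      pcl::NormalDistributionsTransform<pcl::PointXYZ, pcl::PointXYZ> ndt;
      ndt.setResolution(1.0);                 // placeholder NDT voxel resolution (m)
      ndt.setStepSize(0.1);                   // placeholder line-search step size
      ndt.setTransformationEpsilon(0.01);     // placeholder convergence threshold
      ndt.setMaximumIterations(30);
      ndt.setInputSource(scan);
      ndt.setInputTarget(map);

      Cloud aligned;
      ndt.align(aligned, initial_guess);      // seeded with odometry / previous pose
      return ndt.getFinalTransformation();    // scan -> map transform, i.e. the vehicle pose
    }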
Detection:
LiDAR detection: LiDAR detection is done via the Euclidean Clustering algorithm, which extracts the pointcloud clusters of dynamic objects. This gives only the shape and location of the detected objects and provides no classification information (a minimal sketch follows below).
Camera detection: Camera detection is done via TensorRT-YOLOv3, which provides each object's location on the image plane along with its classification label.
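A minimal sketch of the clustering step, assuming PCL's EuclideanClusterExtraction; the tolerance and cluster-size limits are placeholder values to be tuned on real data.

    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/search/kdtree.h>
    #include <pcl/segmentation/extract_clusters.h>
    #include <vector>

    using Cloud = pcl::PointCloud<pcl::PointXYZ>;

    // Group the pre-processed (ground-free) cloud into object clusters.
    std::vector<pcl::PointIndices> clusterObjects(const Cloud::Ptr& cloud) {
      pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
      tree->setInputCloud(cloud);

      std::vector<pcl::PointIndices> clusters;
      pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
      ec.setClusterTolerance(0.5);   // max gap between points of the same object (m)
      ec.setMinClusterSize(10);      // reject isolated noise
      ec.setMaxClusterSize(25000);   // reject clusters larger than any expected object
      ec.setSearchMethod(tree);
      ec.setInputCloud(cloud);
      ec.extract(clusters);          // each entry holds the point indices of one object
      return clusters;
    }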
Fusion:
Here I use a late fusion approach that associates vision-detected labels with range-detected clusters after each modality-specific pipeline processes its own data and makes its own decision.
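A minimal sketch of this association step, assuming hypothetical Detection2D/Cluster3D structs, a pinhole model with intrinsic matrix K (CV_64F), and cluster centroids already transformed into the camera frame via the LiDAR-to-camera extrinsics: each centroid is projected onto the image plane and inherits the label of the YOLO box that contains it.

    #include <opencv2/core.hpp>
    #include <string>
    #include <vector>

    struct Detection2D { cv::Rect box; std::string label; };                     // hypothetical YOLO output
    struct Cluster3D  { cv::Point3f centroid; std::string label = "unknown"; };  // hypothetical cluster summary

    // Project a 3D point, already expressed in the camera frame, onto the image
    // plane with a pinhole model. K is the 3x3 intrinsic matrix (CV_64F).
    static cv::Point2f projectToImage(const cv::Point3f& p, const cv::Mat& K) {
      const float u = static_cast<float>(K.at<double>(0, 0) * p.x / p.z + K.at<double>(0, 2));
      const float v = static_cast<float>(K.at<double>(1, 1) * p.y / p.z + K.at<double>(1, 2));
      return {u, v};
    }

    // Late fusion: each cluster inherits the label of the 2D detection whose box
    // contains its projected centroid; everything else stays "unknown".
    void fuseLabels(std::vector<Cluster3D>& clusters,
                    const std::vector<Detection2D>& detections,
                    const cv::Mat& K) {
      for (auto& c : clusters) {
        if (c.centroid.z <= 0.f) continue;               // behind the camera, not visible
        const cv::Point2f px = projectToImage(c.centroid, K);
        const cv::Point pixel(cvRound(px.x), cvRound(px.y));
        for (const auto& d : detections)
          if (d.box.contains(pixel)) { c.label = d.label; break; }
      }
    }

Clusters whose centroid falls outside every box keep the "unknown" label, which matches the constraint above that detection is limited to the camera's field of view.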
Object Recognition: Architecture 2
This demonstration depicts the importance of test-driven debugging for the perception software of autonomous driving. Considering the fine-grained details of this safety-critical task can lead to heuristics that boost the performance and accuracy of detection and classification.
In this regard, I have run tests under heavy and complex traffic situations; an algorithm that holds up there also satisfies general, simple scenarios.
Used Sensors
One LiDAR
One Camera
Object Recognition: Architecture 3
PointPillars is a pipeline for 3D object recognition from pointcloud data only.
Image data is not involved.
There is no fusion (no multi-modality in data).
Sensor used: One LiDAR
Network architecture:
Input: 3D LiDAR pointcloud.
Feature encoding: a network that encodes the 3D pointcloud into a pseudo-image so it can be processed by a 2D CNN in a fast, GPU-friendly manner (see the pillarization sketch after this list).
Backbone network: a 2D CNN that extracts features from the pseudo-image.
Detection head: a network that predicts 3D bounding boxes with heading and classification labels.
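A minimal sketch of the pillarization idea behind the feature encoder: points are binned into vertical pillars on a birds-eye-view grid, each pillar is then encoded by a small PointNet-style network, and the resulting features are scattered back to their grid cells to form the pseudo-image. Only the binning step is shown here; the range, resolution, and per-pillar cap are placeholder values.

    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <cmath>
    #include <cstddef>
    #include <map>
    #include <utility>
    #include <vector>

    // Group points into vertical "pillars" on a birds-eye-view grid.
    std::map<std::pair<int, int>, std::vector<pcl::PointXYZI>>
    pillarize(const pcl::PointCloud<pcl::PointXYZI>& cloud) {
      const float x_min = 0.f,   x_max = 70.f;   // placeholder detection range (m)
      const float y_min = -40.f, y_max = 40.f;
      const float cell = 0.16f;                  // placeholder pillar size (m)
      const std::size_t max_points_per_pillar = 100;

      std::map<std::pair<int, int>, std::vector<pcl::PointXYZI>> pillars;
      for (const auto& p : cloud) {
        if (p.x < x_min || p.x >= x_max || p.y < y_min || p.y >= y_max) continue;
        const int ix = static_cast<int>(std::floor((p.x - x_min) / cell));
        const int iy = static_cast<int>(std::floor((p.y - y_min) / cell));
        auto& pillar = pillars[{ix, iy}];
        if (pillar.size() < max_points_per_pillar)  // cap points kept per pillar
          pillar.push_back(p);
      }
      return pillars;
    }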
PointPillars is trained on the KITTI dataset to detect the following classes:
car
pedestrian
cyclist
Two different network structures are designed to be used:
one for detecting the {car} class and one for the {pedestrian, cyclist} classes.
This demonstration employs the network that recognizes only objects belonging to the 'car' class.
MODIFICATION
In this experiment, the modification is achieved by feeding spatiotemporal clusters of dynamic objects to the network as input, instead of the raw pointcloud data (one possible construction is sketched below).
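This is only a sketch of one way such an input could be assembled, under the assumption (not stated above) that the clusters from the last few frames are motion-compensated into the current ego frame using the corresponding ego poses and stacked into a single cloud before being fed to the network; all names are hypothetical.

    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/common/transforms.h>
    #include <Eigen/Dense>
    #include <deque>

    using Cloud = pcl::PointCloud<pcl::PointXYZI>;

    struct ClusterFrame {
      Cloud::Ptr clusters;        // concatenated dynamic-object clusters of one LiDAR frame
      Eigen::Matrix4f ego_pose;   // map -> base_link pose at that timestamp
    };

    // Stack the clusters of the most recent frames into the current ego frame so
    // the network sees a spatiotemporal cloud instead of a single raw sweep.
    Cloud::Ptr buildSpatiotemporalInput(const std::deque<ClusterFrame>& history,
                                        const Eigen::Matrix4f& current_ego_pose) {
      Cloud::Ptr stacked(new Cloud);
      const Eigen::Matrix4f to_current = current_ego_pose.inverse();
      for (const auto& frame : history) {
        Cloud aligned;
        // past ego frame -> map frame -> current ego frame
        pcl::transformPointCloud(*frame.clusters, aligned, to_current * frame.ego_pose);
        *stacked += aligned;
      }
      return stacked;
    }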