Ruitao Wu
2021 - 2022
- Contributions:
- Traffic Data Collection
- Vehicle Labeling
- Algorithm Development
This project is associated with the Real-time Traffic Monitoring Project at ARCS at CSUN. Vehicle detection and classification are important for intelligent transportation planning, directly impacting traffic flow and management efficiency. There are three main approaches for detecting and classifying objects in 3D point cloud data: projection-based, voxel-based, and point-based methods [1]. A recent study examined autonomous driving models and proposed a convolutional neural network (CNN), trained and tested on the PandaSet dataset, that achieved average precision ranging from 69.9% to 74.2% and average recall between 69.2% and 79.7% [2]. In this project, we investigated vehicle detection and classification methods for 3D roadside traffic flow analysis.
We conducted roadside traffic data collection with a Velodyne VLP-32C 3D LiDAR sensor and a stereo-based 2D depth camera.
Manual labeling was performed in MATLAB for the frames in which a vehicle enters and exits the camera's field of view; the intervening frames were then labeled automatically with the Point Cloud Temporal Interpolator auto-labeling tool. Vehicles that turn at the intersection or become occluded cannot be interpolated this way, so each of their frames must be labeled manually. We labeled the traffic dataset using the LiDAR Labeler app in MATLAB and then organized the labels to match the KITTI dataset format.
KITTI is a dataset for autonomous driving developed by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago. It is a collection of images and LiDAR data used in computer vision research, such as stereo vision, object detection, and 3D tracking.
Author: Jonathan Cordova
We implemented a Complex-YOLO [3] projection-based model by converting 3D point cloud data into a 2D bird's-eye-view projection and applying the YOLO model for vehicle detection and classification. Our experimental results demonstrated that using transfer learning as a training technique, together with normalization of the rotation angle, enhanced model performance for vehicle detection and classification.
Data Labeling and Formatting: We label the traffic dataset using the LiDAR Labeler app in MATLAB, and then organize the labels to match the KITTI dataset format.
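As a reference for the target format, the sketch below writes one KITTI-style label line per object; every numeric value shown is an illustrative placeholder, not a measurement from our dataset.

    # Hedged sketch: emit one KITTI-format label line per object.
    # All numeric values below are illustrative placeholders.
    def kitti_label_line(obj_type, bbox, dims_hwl, loc_xyz, rotation_y,
                         truncated=0.0, occluded=0, alpha=0.0):
        # KITTI field order: type truncated occluded alpha bbox(4)
        # dimensions h w l (m), location x y z (m), rotation_y (rad)
        fields = [obj_type, truncated, occluded, alpha, *bbox,
                  *dims_hwl, *loc_xyz, rotation_y]
        return " ".join(str(f) for f in fields)

    print(kitti_label_line("Sedan", (0, 0, 50, 50), (1.5, 1.8, 4.2),
                           (2.3, 1.6, 15.7), 1.62))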
Converting PCAP Files into BIN Files: Converting a PCAP file into BIN files begins by separating the PCAP file into multiple Point Cloud Data (PCD) files. Afterward, these PCD files are converted to the BIN binary format while preserving the data's reflectivity (intensity) attribute.
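A minimal conversion sketch, assuming ASCII-encoded PCD files whose first four fields are x, y, z, and intensity (the directory and helper names are hypothetical):

    # Hedged sketch: convert ASCII PCD files into KITTI-style .bin files,
    # keeping the intensity attribute alongside x, y, z.
    import numpy as np
    from pathlib import Path

    def pcd_to_bin(pcd_path, bin_path):
        points = []
        in_data = False
        with open(pcd_path, "r") as f:
            for line in f:
                if in_data:
                    x, y, z, intensity = map(float, line.split()[:4])
                    points.append((x, y, z, intensity))
                elif line.startswith("DATA"):
                    in_data = True  # 'DATA ascii' header line assumed
        # KITTI .bin layout: flat float32 array of (x, y, z, intensity) rows
        np.asarray(points, dtype=np.float32).tofile(bin_path)

    for pcd in sorted(Path("pcd_frames").glob("*.pcd")):
        pcd_to_bin(pcd, Path("bin_frames") / (pcd.stem + ".bin"))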
Complex-YOLO Convolutional Neural Network Model: The Complex-YOLO framework is configured to recognize three categories: cars, cyclists, and pedestrians. The neural network layers and filter values are modified to expand from three to nine categories, extending its classification capabilities to additional vehicle types.
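As a rough illustration of the filter-value edit (assuming the common Complex-YOLO head layout of six box terms x, y, w, l, im, re plus one objectness score per anchor; verify against the actual network configuration):

    # Hedged sketch: filter count of the conv layer feeding each YOLO layer.
    # Assumes 6 box terms (x, y, w, l, im, re) + 1 objectness per anchor.
    def yolo_filters(num_classes, num_anchors=3):
        return num_anchors * (7 + num_classes)

    print(yolo_filters(3))  # 30 -> original 3-category head
    print(yolo_filters(9))  # 48 -> expanded 9-category head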
Transfer Learning: Transfer learning is a technique in which a pre-trained model is used as the starting point for a new model, allowing it to benefit from the knowledge gained earlier. We apply transfer learning to our roadside traffic dataset, ensuring that the data format matches that of the pre-trained model.
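The pattern, in a minimal PyTorch sketch (a torchvision classifier stands in here purely to illustrate the idea; the project itself fine-tuned Complex-YOLO):

    # Illustration of the transfer-learning pattern only; torchvision's
    # ResNet-18 stands in for Complex-YOLO, which the project actually used.
    import torch
    import torchvision

    model = torchvision.models.resnet18(weights="IMAGENET1K_V1")  # pre-trained
    model.fc = torch.nn.Linear(model.fc.in_features, 9)  # new 9-class head
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Fine-tuning loop over the roadside dataset would follow (loader omitted).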
Pipeline of the Complex-YOLO Algorithm:
Transform LiDAR frames into bird's-eye-view (BEV) images as input for the CNN (a BEV encoding sketch follows this list)
The Euler-Region Proposal Network (E-RPN) identifies potential object regions for the detection and classification of vehicles.
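A hedged sketch of the BEV encoding step (grid size, detection ranges, and channel scaling are illustrative assumptions, not the project's exact values):

    # Hedged sketch: encode a point cloud as a Complex-YOLO-style BEV map
    # with height, intensity, and density channels.
    import numpy as np

    def to_bev(points, size=608, x_range=(0.0, 50.0), y_range=(-25.0, 25.0)):
        # points: (N, 4) array of x, y, z, intensity -> (size, size, 3) map
        bev = np.zeros((size, size, 3), dtype=np.float32)
        res_x = (x_range[1] - x_range[0]) / size
        res_y = (y_range[1] - y_range[0]) / size
        keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
                (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
        pts = points[keep]
        u = ((pts[:, 0] - x_range[0]) / res_x).astype(np.int64)
        v = ((pts[:, 1] - y_range[0]) / res_y).astype(np.int64)
        np.maximum.at(bev[:, :, 0], (u, v), pts[:, 2])  # max height (>= 0)
        np.maximum.at(bev[:, :, 1], (u, v), pts[:, 3])  # max intensity
        np.add.at(bev[:, :, 2], (u, v), 1.0)            # point count per cell
        bev[:, :, 2] = np.minimum(1.0, np.log1p(bev[:, :, 2]) / np.log(64))
        return bev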
Data was collected at the roadside traffic intersection of Zelzah Avenue and Plummer Street in Northridge, California. Using the LiDAR Labeler in MATLAB, tags for seven vehicle types ('Sedan', 'Truck', 'Motorcycle', 'SUV', 'Semi', 'Bus', and 'Van') were added. Our roadside dataset was divided into distinct subsets, with 80% (5,165 frames) for training and 20% (1,292 frames) for evaluation.
Baseline Model: Table II shows the performance of a pre-trained Complex-YOLOv4 model, originally trained on the autonomous driving KITTI dataset, when evaluated on our roadside traffic flow dataset for the vehicle type 'Sedan'.
Transfer Learning Model: Transfer learning (TL) models were developed and trained using knowledge gained from the baseline model. The performance results are shown in Table III. When implementing the transfer learning process without normalizing the vehicle rotation (orientation) angle, the highest precision achieved was 89.71%, with a recall of 85.15%, at TL epoch 5. Conversely, the highest recall attained without normalization was 86.04%, with a precision of 86.86%, at TL epoch 15.
Table IV shows that normalizing the rotational (vehicle orientation) angle of the data labels significantly improved model performance. The best precision recorded was 92.20%, with a recall of 84.90%, at TL epoch 10. Additionally, the highest recall achieved with normalization was 86.26%, with a precision of 90.58%, at TL epoch 5.
Expanding on Vehicle Type Classification: Table V shows that better performance was achieved for the 'Sedan' class, as there were more instances of sedans in our roadside traffic dataset. With additional transfer learning epochs, vehicles of the 'Truck' and 'SUV' classes could also be classified.
Author: Rachel Gilyard
Improved traffic monitoring, with better roadside vehicle detection and
classification, can help make driving safer. Information on the type and
density of cars can help transportation planners to better design roads
and provide more timely road maintenance. Current 2D RGB vehicle detection
methods don't provide depth information and perform poorly in low visibility
conditions like nighttime. Most open-source 3D LiDAR vehicle classification
models are slow and don't reach the speed needed for real-time systems.
Background filtering is not uncommon in 3D object detection, but there is a
lack of data on its costs and benefits. Using background filtering with the
PointPillars model reduced its computational load and inference time.
The azimuth-height preprocessing algorithm showed a 6% improvement in
average precision with the dynamic background filter. This method also
achieved a 31% reduction in inference time, including the time to filter
the background LiDAR points. With more accurate predictions and faster
inference, background filtering is a worthwhile technique to pursue when
attempting to use 3D vehicle detection models in real-time systems.
Background Filtering: Two strategies were used for background filtering. The first uses a background map holding the maximum distance observed at a given azimuth and elevation angle; the second uses a similar map that tracks the greatest height at each azimuth-elevation cell. The filtering step took 6.2 ms per frame (124 ms to filter one second of traffic frames).
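A minimal sketch of the distance-map variant (bin counts and the margin are assumptions; the project's exact parameters may differ):

    # Hedged sketch of azimuth-elevation background filtering.
    import numpy as np

    AZ_BINS, EL_BINS = 1800, 32  # ~0.2 deg azimuth bins; 32 laser rings

    def update_background(bg, az_idx, el_idx, rng):
        # Keep the maximum range ever seen in each (azimuth, elevation) cell.
        np.maximum.at(bg, (az_idx, el_idx), rng)

    def foreground_mask(bg, az_idx, el_idx, rng, margin=0.5):
        # A point is foreground if it lies closer than the learned background.
        return rng < bg[az_idx, el_idx] - margin

    bg = np.zeros((AZ_BINS, EL_BINS), dtype=np.float32)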
PointPillars Model Evaluation: The MMDetection3D library’s implementation of PointPillars was used for 3D object detection. MMDetection3D’s model comes pre-trained on the KITTI dataset, with one version that detects cars and another that detects cars, pedestrians, and cyclists.
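A hedged inference sketch using MMDetection3D's high-level API (the config and checkpoint paths are placeholders, and the exact return type of inference_detector varies between mmdet3d versions):

    # Hedged sketch: run a KITTI-pre-trained PointPillars model on one frame.
    # Paths are placeholders; verify API details against your mmdet3d version.
    from mmdet3d.apis import init_model, inference_detector

    config = "pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py"  # placeholder
    checkpoint = "pointpillars_kitti_car.pth"                    # placeholder
    model = init_model(config, checkpoint, device="cuda:0")

    # The input is a background-filtered, KITTI-style .bin frame.
    result = inference_detector(model, "frame_000123.bin")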
Inference Pipeline: The Background Map Updater accepted raw packets from the Velodyne VLP-32C sensor and parsed them with the Velodyne Decoder package, which is written in Python and C++. The pipeline was written in Python, with Numba used for acceleration. The pre-trained model was sourced from the mmdet3d API.
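A pipeline sketch under these assumptions (the velodyne_decoder calls follow that package's README and should be checked against the installed version; the kernel and file names are hypothetical):

    # Hedged pipeline sketch: stream frames from a PCAP capture with
    # velodyne_decoder and filter each one with a Numba-compiled kernel.
    import numpy as np
    import velodyne_decoder as vd
    from numba import njit

    @njit
    def keep_foreground(rng, bg_rng, margin=0.5):
        # Compiled loop: flag points closer than the background map by margin.
        out = np.empty(rng.shape[0], dtype=np.bool_)
        for i in range(rng.shape[0]):
            out[i] = rng[i] < bg_rng[i] - margin
        return out

    config = vd.Config(model="VLP-32C")  # sensor used in this project
    for stamp, points in vd.read_pcap("traffic.pcap", config):
        ...  # bin points by azimuth/elevation, filter, then run the detector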
This project is supported by:
(1) Autonomy Research Center for STEAHM (ARCS) at CSUN
(2) SECURE For Student Success (SfS2) Program, funded by the United States Department of Education FY 2023 Title V, Part A, Developing Hispanic-Serving Institutions Program five-year grant, Award Number P31S0230232, CFDA Number 84.031S.
[1] Y. Wu, Y. Wang, S. Zhang, and H. Ogai, “Deep 3d object detection networks using lidar data: A review,” IEEE Sensors Journal, vol. 21, no. 2, pp. 1152–1171, 2021.
[2] S. Zhou, H. Xu, G. Zhang, T. Ma, and Y. Yang, “Leveraging deep convolutional neural networks pre-trained on autonomous driving data for vehicle detection from roadside lidar data,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 11, pp. 22367–22377, 2022.
[3] M. Simon, S. Milz, K. Amende, and H.-M. Gross, “Complex-yolo: An euler-region-proposal for real-time 3d object detection on point clouds,” in ECCV Workshops, 2018. [Online]. Available: https://api.semanticscholar.org/CorpusID:53512298
Student teams participated in this project.