Perception¶
Perception is the extraction of information from raw sensory data.
The goals of perception can include:
- Object recognition (detection), e.g.:
    - Pedestrian, cyclist, and vehicle recognition
    - Traffic sign and traffic light recognition
    - Drivable surface and lane recognition (also used for localization and planning)
- Object classification:
    - Classifying already recognized objects, for example determining the color of a traffic light, or distinguishing between a van and a horse-drawn carriage.
- Object tracking and prediction:
    - Determining the past paths of vehicles and pedestrians and predicting their future paths. This can be related to classification: a horse-drawn carriage, despite being similar in size to a trailer, has different acceleration capabilities. This information can be used to plan routes and trajectories.
- Localization and mapping:
    - SLAM (Simultaneous Localization and Mapping): non-GNSS-based localization combined with local map creation. LOAM (LIDAR Odometry and Mapping): LIDAR-based odometry.
Depending on the sensors used, perception can be based on:

- LIDAR
- Camera
- Radar
- IMU
- GNSS/GPS
- Microphone
- Any combination of the above sensors
Danger
In Hungarian, the terms for sensing and perception are easy to confuse. Sensing only delivers raw measurement data, while perception is a complex function that produces processed, interpreted output from that raw data.
```mermaid
flowchart LR
    L[Planning]:::light
    subgraph Perception [Perception]
        T[Mapping]:::light
        H[Localization]:::light
        P[Object<br>Prediction]:::light
        D[Object<br>Detection]:::light
        K[Object<br>Classification]:::light
        D --> K
    end
    subgraph Sensing [Sensing]
        GPS[GPS/GNSS]:::light -.-> T
        GPS -.-> H
        LIDAR[LIDAR]:::light
        KAM[Camera]:::light
        IMU[IMU]:::light
        LIDAR -.-> D
        LIDAR -.-> P
        LIDAR -.-> T
        KAM -.-> P
        KAM -.-> D
        IMU -.-> T
        D -.-> P
    end
    T -->|map| L
    H -->|pose| L
    P -->|obj.| L
    K -->|obj.| L
    classDef light fill:#34aec5,stroke:#152742,stroke-width:2px,color:#152742
    classDef dark fill:#152742,stroke:#34aec5,stroke-width:2px,color:#34aec5
    classDef white fill:#ffffff,stroke:#152742,stroke-width:2px,color:#152742
    classDef red fill:#ef4638,stroke:#152742,stroke-width:2px,color:#fff
```
This material is based on the Autonomous Driving Software Engineering course at TU Munich, compiled by the staff of the Institute of Automotive Technology. The lecture video is available in German.
Challenges and Difficulties¶
Several challenges can hinder recognition and its accuracy:

- Weather (rain, snow, fog, ...)
- Time of day (night, sunset, sunrise, ...)
- Occlusion (objects are only partially visible)
- Computation time (increasingly critical at higher speeds)
- Different environments (urban, highway, forested areas, ...)
Use Cases¶
Since it would be difficult to demonstrate every aspect of perception, we will instead showcase a few use cases.
Camera-based Traffic Light Classification¶
Camera images are processed with a neural network (YOLOv7) to detect traffic lights and classify their state.
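As a rough illustration of the classification step, the sketch below assumes the detector (e.g., YOLOv7) has already produced a bounding-box crop of a traffic light, and classifies the lit lamp by its dominant HSV color. The threshold values and the helper name `classify_traffic_light` are illustrative assumptions, not the course's actual code:

```python
import cv2
import numpy as np

# Illustrative HSV ranges (assumed values; tune for the actual camera).
HSV_RANGES = {
    "red":    [(np.array([0, 100, 100]),   np.array([10, 255, 255])),
               (np.array([160, 100, 100]), np.array([179, 255, 255]))],
    "yellow": [(np.array([18, 100, 100]),  np.array([35, 255, 255]))],
    "green":  [(np.array([40, 100, 100]),  np.array([90, 255, 255]))],
}

def classify_traffic_light(crop_bgr: np.ndarray) -> str:
    """Classify a cropped traffic-light image by its dominant lit color."""
    hsv = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2HSV)
    counts = {}
    for color, ranges in HSV_RANGES.items():
        mask = np.zeros(hsv.shape[:2], dtype=np.uint8)
        for lo, hi in ranges:
            mask |= cv2.inRange(hsv, lo, hi)  # pixels inside this color band
        counts[color] = int(mask.sum())
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"

# Usage: crop = frame[y1:y2, x1:x2] from the detector's bounding box
# print(classify_traffic_light(crop))
```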
LIDAR-based Simple Height Filtering¶
A task often encountered in practice is simple LIDAR filtering based on X, Y, and Z coordinates. Since LIDAR provides a direct 3D representation of the environment, it can be easier to work with than a camera. A common technique is to filter the road level out of the LIDAR data (ground segmentation), so that the remaining (non-ground) points represent all objects. Here we demonstrate a much simpler technique:
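A minimal sketch of such a coordinate filter, assuming the point cloud is available as an N×3 NumPy array; the box limits are illustrative values that would need tuning to the actual vehicle and sensor mounting:

```python
import numpy as np

def box_filter(points: np.ndarray,
               x_lim=(-40.0, 40.0),
               y_lim=(-15.0, 15.0),
               z_lim=(-0.5, 3.0)) -> np.ndarray:
    """Keep only the points inside an axis-aligned box.

    points: (N, 3) array of XYZ coordinates in the sensor frame.
    A z_lim starting slightly above the road surface acts as a
    crude ground removal: everything below it is dropped.
    """
    keep = (
        (points[:, 0] >= x_lim[0]) & (points[:, 0] <= x_lim[1]) &
        (points[:, 1] >= y_lim[0]) & (points[:, 1] <= y_lim[1]) &
        (points[:, 2] >= z_lim[0]) & (points[:, 2] <= z_lim[1])
    )
    return points[keep]

# Usage with a random stand-in cloud:
cloud = np.random.uniform(-50, 50, size=(10000, 3))
print(box_filter(cloud).shape)
```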
Clustering¶
Ground segmentation splits the LIDAR data into ground points and non-ground points. The non-ground points then need to be clustered into groups of points that describe individual objects. The essence of clustering is that the points belonging to a given object (e.g., a car) lie close to each other.
(Clustering illustration: codeahoy.com)
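One common way to implement this is Euclidean clustering. The sketch below uses scikit-learn's DBSCAN as a stand-in; the `eps` and `min_samples` values are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_objects(non_ground: np.ndarray, eps=0.7, min_samples=10):
    """Group non-ground LIDAR points into object candidates.

    non_ground: (N, 3) XYZ points left after ground segmentation.
    Returns a dict mapping cluster id -> (M, 3) point array;
    DBSCAN labels sparse noise points as -1, which we discard.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(non_ground)
    return {cid: non_ground[labels == cid] for cid in set(labels) if cid != -1}

# Usage with a random stand-in cloud (real clouds form much denser clusters):
clusters = cluster_objects(np.random.uniform(-5, 5, size=(5000, 3)))
for cid, pts in clusters.items():
    print(f"object {cid}: {len(pts)} points, centroid {pts.mean(axis=0)}")
```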
Sensor Fusion¶
The following video demonstrates perception through a real-life example.
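A basic building block of LIDAR-camera fusion is projecting LIDAR points into the camera image so that 3D points can be matched with 2D detections. The sketch below assumes made-up intrinsic and extrinsic calibration matrices; a real setup needs the calibrated rotation and translation between the two sensor frames:

```python
import numpy as np

# Assumed calibration (illustrative values, not from a real sensor setup):
K = np.array([[800.0,   0.0, 640.0],    # camera intrinsics
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
T_cam_lidar = np.eye(4)                  # LIDAR -> camera extrinsics
T_cam_lidar[:3, 3] = [0.0, -0.1, -0.2]   # small mounting offset only;
                                         # real extrinsics include rotation

def project_to_image(points_lidar: np.ndarray) -> np.ndarray:
    """Project (N, 3) LIDAR points into pixel coordinates.

    Returns (M, 2) pixel positions for the points in front of the
    camera, which can then be colored from the image or matched
    against bounding-box detections.
    """
    n = points_lidar.shape[0]
    homog = np.hstack([points_lidar, np.ones((n, 1))])   # (N, 4)
    cam = (T_cam_lidar @ homog.T).T[:, :3]               # camera frame
    in_front = cam[:, 2] > 0.1                           # keep z > 0
    uvw = (K @ cam[in_front].T).T
    return uvw[:, :2] / uvw[:, 2:3]                      # perspective divide

pixels = project_to_image(np.random.uniform(-10, 10, size=(1000, 3)))
print(pixels.shape)
```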
LIDAR-based Road Surface / Curb Detection¶
An algorithm developed by our university.
LIDAR-based Object Tracking and Prediction¶
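A common baseline for tracking and prediction is a constant-velocity Kalman filter fed with object positions (e.g., the cluster centroids from the previous step). The sketch below is a minimal single-object illustration; the noise matrices and time step are assumed values, not the algorithm shown in the demo:

```python
import numpy as np

class ConstantVelocityTracker:
    """Minimal 2D constant-velocity Kalman filter for one object.

    State x = [px, py, vx, vy]; measurements are object centroids.
    """
    def __init__(self, dt=0.1):
        self.x = np.zeros(4)                    # state estimate
        self.P = np.eye(4) * 10.0               # state covariance
        self.F = np.eye(4)                      # constant-velocity motion model
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4))               # we only measure position
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * 0.1                # process noise (assumed)
        self.R = np.eye(2) * 0.5                # measurement noise (assumed)

    def predict(self) -> np.ndarray:
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                        # predicted position

    def update(self, z: np.ndarray) -> None:
        y = z - self.H @ self.x                  # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S) # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

# Usage: feed centroid measurements, then extrapolate a short horizon.
trk = ConstantVelocityTracker(dt=0.1)
for t in range(20):
    trk.predict()
    trk.update(np.array([0.5 * t * 0.1, 1.0]))   # object moving along x
future = [trk.predict() for _ in range(10)]      # 1 s look-ahead
print(future[-1])
```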
SLAM: LIDAR and Camera Fusion¶
Simultaneous Localization and Mapping (SLAM) means estimating the pose of a moving system (robot or vehicle) while simultaneously building a map of its environment.
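To illustrate the localization half, the sketch below chains scan-to-scan ICP registrations (via Open3D) into an accumulated pose, which is the core idea behind LIDAR odometry approaches such as LOAM mentioned earlier. It is a toy sketch under assumed parameters, not the fused LIDAR-camera pipeline itself:

```python
import numpy as np
import open3d as o3d

def to_cloud(xyz: np.ndarray) -> o3d.geometry.PointCloud:
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(xyz)
    return pcd

def lidar_odometry(scans):
    """Chain scan-to-scan ICP alignments into accumulated poses.

    scans: iterable of (N, 3) NumPy arrays (consecutive LIDAR scans).
    Returns the list of 4x4 poses, one per scan, in the first scan's frame.
    """
    poses = [np.eye(4)]
    prev = None
    for xyz in scans:
        cur = to_cloud(xyz)
        if prev is not None:
            result = o3d.pipelines.registration.registration_icp(
                cur, prev, max_correspondence_distance=1.0,
                init=np.eye(4),
                estimation_method=o3d.pipelines.registration
                    .TransformationEstimationPointToPoint())
            # Pose of the new scan = previous pose composed with the
            # relative motion estimated by ICP.
            poses.append(poses[-1] @ result.transformation)
        prev = cur
    return poses
```

A full SLAM system would additionally accumulate the aligned scans into a map and correct drift (e.g., with loop closures), which this sketch omits.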