Learning-based Traversability Costmap for Autonomous Off-road Navigation ††thanks: This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 62273229 and smart city beidou spatial-temporal digital base construction and application industrialization (HCXBCY-2023-020).

Qiumin Zhu, Zhen Sun, Songpengcheng Xia, Guoqing Liu, Kehui Ma, Ling Pei†∗ and Zheng Gong§∗
Shanghai Key Laboratory of Navigation and Location Based Services,
Shanghai Jiao Tong University, Shanghai, China
§China Academy of Information and Communications Technology, Beijing, China
Correspondence: ling.pei@sjtu.edu.cn, gongzheng1@caict.ac.cn

Abstract

Traversability estimation in off-road terrains is an essential procedure for autonomous navigation. However, creating reliable labels for the complex interactions between the robot and the surface remains a challenging problem in learning-based costmap generation. To address this, we propose a method that predicts traversability costmaps by leveraging both visual and geometric information of the environment. To quantify surface properties such as roughness and bumpiness, we introduce a novel risk-aware labelling scheme based on proprioceptive information for network training. We validate our method in costmap prediction and navigation tasks for complex off-road scenarios. Our results demonstrate that our costmap prediction method excels in average accuracy and MSE. The navigation results indicate that using our learned costmaps leads to safer and smoother driving, outperforming previous methods with the highest success rate, lowest normalized trajectory length, lowest time cost, and highest mean stability across two scenarios.

Index Terms:

autonomous navigation, traversability, off-road environments, unmanned ground vehicles, Inertial Measurement Unit (IMU)

I Introduction

Autonomous navigation in off-road environments is a critical problem for unmanned ground vehicles. Robots are deployed in the wilderness, forests, mines and other complex terrains for tasks such as agriculture, mining, planetary exploration and surveillance. To ensure safe driving, it is crucial to analyze the traversability of these terrains and construct a costmap for navigation.

Various works focus on this issue and have contributed to the progress of research. Initially, traversability estimation was treated as a binary classification problem that distinguishes traversable from untraversable terrain. More recent work formulates it either as multi-class categorization according to the level of traversal difficulty [1, 2] or as regression that assigns a continuous traversability value [3]. Despite remarkable efforts on evaluating traversability, two challenges remain for the research community.

One challenge is the representation of terrain characteristics. Computing statistics of geometric properties is a popular approach: a height grid map is built to derive step height, roughness and slope as a traversability score [4, 5]. Appearance-based methods recast the problem as image processing and classification: the terrain is categorized by texture and color, and semantic segmentation has recently become a useful tool for traversability estimation [6, 7]. While geometric and appearance information can each assess traversability separately, there are cases with the same geometric properties or the same appearance but different traversability, such as dry mud versus wet mud, or low grass versus high grass. We therefore consider both geometric and visual characteristics of the environment to evaluate traversability comprehensively. Even so, such information alone cannot reveal the actual impact of the ground on the vehicle, which ultimately determines traversability.

Therefore, the other challenge concerns the definition of traversability cost labels. Although a variety of works propose learning-based costmap approaches, they predict the cost while ignoring the nuances of the robot's interactions with different terrains. Some simply divide the ground into traversable and untraversable regions according to whether the vehicle can reach them [8], while others represent slip as the velocity error [9]. To capture the roughness and bumpiness that the vehicle experiences, proprioceptive sensors such as Inertial Measurement Units (IMUs) are useful: they are easy to mount on a ground vehicle and capture its state [10, 11]. Their signals reflect the vehicle's vibrations and motion changes, which we interpret as traversability costs. We take the properties of IMU data and robotic risk into consideration, processing the z-axis IMU linear acceleration to generate continuous traversability values as learning targets.

In brief, we propose a learning-based method to predict traversability costmaps that capture the risk-aware influence of the terrain on robot navigation. Exteroceptive information, including semantic and geometric features, and proprioceptive information in the form of the robot velocity are the inputs for learning a continuous cost supervised by processed IMU data. Our learning architecture combines a Convolutional Neural Network (CNN) backbone that extracts features from the exteroceptive information, a neural network that processes the velocity, and a Long Short-Term Memory (LSTM) that handles the concatenated features.

The main contributions of this research are as follows:

  • A learning-based framework that predicts continuous traversability costmaps from both visual (semantic) and geometric information of the environment together with the robot velocity.

  • A novel risk-aware traversability cost labelling method that derives continuous, normalized labels from the proprioceptive IMU z-axis linear acceleration.

  • Validation of the approach in costmap prediction and navigation tasks in complex off-road scenarios, showing safer and smoother driving than previous methods.

II RELATED WORK

To estimate traversability, hybrid methods encode both geometric and semantic information to build a traversability map representing the vehicle's surroundings. The works in [12, 13] compute continuous geometric traversability scores, assign discrete scores to various semantic classes, and take the sum of the two as the traversability cost.

Recently, many works have focused on costmap learning for autonomous navigation in challenging off-road environments. Fan et al. [14] learn a traversability risk-aware costmap through a CNN with LiDAR point clouds as inputs and geometric cost as ground-truth labels. Cai et al. [15] learn a speed distribution map from a semantic input and convert the map into a costmap with a conditional value at risk (CVaR). Seo et al. [8] leverage Positive-Unlabeled (PU) learning and a 2D normalizing flow to learn a binary traversability map, treating wheel-contact points as traversable. Although these approaches predict costmaps using visual and geometric characteristics captured by cameras and LiDARs, they ignore the interactions between the robot and the ground surface in the process of cost prediction. To learn a costmap based on such interaction, Frey et al. [9] use the discrepancy between the robot's current linear velocity and the reference linear velocity as labels to estimate dense traversability from RGB images.

Besides the robot's linear velocity, other interactions can be sensed by an IMU. Sathyamoorthy et al. [16] generate cost labels by applying Principal Component Analysis (PCA) to reduce the 6-dimensional IMU data to two principal components. Waibel et al. [17] combine the normalized angular velocity in x and y and the linear acceleration in z to obtain the real IMU cost. Seo et al. [18] project the magnitude of the z-acceleration from the IMU onto contact points as ground-truth traversability. Some works process the IMU data in the frequency domain. Yao et al. [19] analyze the average amplitude spectrum of the x-axis and y-axis angular velocity and the z-axis linear acceleration via the Fast Fourier Transform (FFT) to calculate traversability costs. Castro et al. [20] describe traversability as a terrain property captured by the bandpower of the z-axis IMU linear acceleration. We use colored point cloud data to populate continuous costmaps, train with risk-aware IMU traversability cost labels, and demonstrate our approach in complex and challenging off-road terrains on an autonomous robot platform.

III METHODS

III-A Overview

We propose a learning-based framework to assess terrain traversability and obtain a costmap for navigation in off-road environments. It takes an RGB-D point cloud and the robot velocity as inputs of a neural network and outputs a robot-centric traversability costmap. The overview of the framework is illustrated in Fig. 1. The costmap predicted by the network can be used by a path planning algorithm to realize autonomous navigation.

[Fig. 1: Overview of the proposed learning-based traversability costmap framework.]

The framework is decomposed into three main modules: 1) traversability label generation; 2) 3D environment and robot motion preprocessing; and 3) costmap prediction. The label generation step calculates the traversability cost from proprioception. The preprocessing module extracts semantic and geometric data from RGB-D images and point clouds and represents the robot velocity, forming the inputs of the network. With these inputs and labels, a neural network is trained to predict the traversability cost.

III-B Traversability Labels Generation

To learn a continuous and normalized traversability cost, we use the linear acceleration in the z-axis to describe the interactions between the robot and the ground [18, 19, 20]. The z-axis linear acceleration, generally understood as the force acting along the z-axis of the robot, reflects the roughness and bumpiness of the terrain. As shown in previous work [21, 22], IMU linear acceleration measurements follow a normal (Gaussian) distribution in stationary conditions. Given the physical properties of the z-axis linear acceleration, the mean of this distribution is approximately the gravitational acceleration. We adopt Value at Risk (VaR), which has been used to assess robot risk [23], to quantify the traversability cost. Since values that deviate equally from the mean on either side can be considered to have the same cost, the distribution $X \sim N(\mu, \sigma^{2})$ is transformed into a half-normal distribution through $\left|\frac{X-\mu}{\sigma}\right|$, and $\text{VaR}_{\alpha}(A_{z})$ at level $\alpha$ is simply the $(1-\alpha)$-quantile, shown in Fig. 2:

$\operatorname{VaR}_{\alpha}(A_{z}) := \min\{a_{z} \mid \mathbb{P}[A_{z} > a_{z}] \leq \alpha\}$   (1)

We take each processed IMU z-axis linear acceleration value as $\text{VaR}_{\alpha}(A_{z})$ and calculate its risk level $\alpha \in [0, 1]$ as the traversability cost label, where 1 means the lowest cost and 0 means the highest cost. The data recorded during steady motion is used to derive the distribution.
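As a concrete illustration of this labelling, the following is a minimal sketch (not the authors' released code) of how the risk-level label could be computed: a Gaussian is fitted to a steady-motion recording of the z-axis acceleration, each new measurement is folded into a half-normal deviation, and its tail probability is taken as the label. The helper names `fit_steady_state` and `traversability_label` and the numeric values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def fit_steady_state(az_steady):
    # Gaussian parameters of the z-axis linear acceleration recorded during
    # steady motion; the mean is approximately the gravitational acceleration.
    return float(np.mean(az_steady)), float(np.std(az_steady))

def traversability_label(az, mu, sigma):
    # Fold the Gaussian into a half-normal via |(X - mu) / sigma| and take the
    # tail probability alpha = P(|X - mu| / sigma > t) = 2 * (1 - Phi(t)).
    # alpha near 1 -> small deviation (lowest cost), alpha near 0 -> highest cost.
    t = np.abs((np.asarray(az, dtype=float) - mu) / sigma)
    return np.clip(2.0 * norm.sf(t), 0.0, 1.0)

# Hypothetical example: a flat-ground recording and three new measurements.
az_steady = 9.81 + 0.05 * np.random.default_rng(0).normal(size=2000)
mu, sigma = fit_steady_state(az_steady)
print(traversability_label([9.82, 10.4, 12.0], mu, sigma))
```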

[Fig. 2: Risk level $\alpha$ of the z-axis linear acceleration under the half-normal distribution used for labelling.]

III-C 3D Environment and Robot Motion Preprocessing

To acquire the inputs of the learning network, we represent the terrain characteristics of the environment as visual and geometric information and parameterize the robot velocity into a high-dimensional vector. We use Fast-SCNN [24] to predict the semantic segmentation of an RGB image from the RGB-D camera and generate a colored point cloud from the semantic and depth images. A local grid map with a resolution of 0.1 meter is built from the point cloud, and each grid cell contains RGB and geometric values; points more than 2 meters above the vehicle, which cannot contact it, are discarded.
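A minimal sketch of the 0.1 m binning step described above, assuming the colored point cloud is already expressed in the vehicle frame; the function name `build_local_grid` and the dictionary layout are illustrative, not the authors' implementation.

```python
import numpy as np

def build_local_grid(points, colors, res=0.1, max_height=2.0):
    # points: (N, 3) xyz in the vehicle frame; colors: (N, 3) semantic RGB.
    # Points more than `max_height` above the vehicle cannot touch it and are dropped.
    keep = points[:, 2] < max_height
    pts, cols = points[keep], colors[keep]
    cells = {}
    for (i, j), p, c in zip(map(tuple, np.floor(pts[:, :2] / res).astype(int)), pts, cols):
        cell = cells.setdefault((i, j), {"points": [], "colors": []})
        cell["points"].append(p)
        cell["colors"].append(c)
    # Each occupied cell keeps its points (for geometric features) and colors (semantics).
    return cells
```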

To calculate the geometric features, including slope, flatness and height difference, we first construct an octree of the local point cloud to expedite the retrieval of points at each location. The slope $s$ of each grid cell is represented by the angle between the z-axis of the vehicle coordinate frame and the surface normal of a 0.5 meter wide square area centered on the grid cell:

$s = \frac{180 \arccos\left(\mathbf{n} \cdot \mathbf{e_{z}}\right)}{\pi}$   (2)

where $\mathbf{n}$ is the unit normal vector calculated with PCA [12] and $\mathbf{e_{z}}$ is the vector $[0, 0, 1]^{\top}$. The slope value ranges from 0 to 90 degrees. The flatness $f$ is calculated from the vertical heights of the points in each grid cell:

$f = \sqrt{\frac{\sum_{j=1}^{N}\left[\mathbf{n} \cdot (\mathbf{p_{j}} - \bar{\mathbf{p}})\right]^{2}}{N+1}}$   (3)

where $\bar{\mathbf{p}}$ is the 3D centroid of the grid cell, $\mathbf{p_{j}} = [x, y, z]^{\top}$ is the position of a point in the grid cell and $N$ is the number of such points. The height difference $h$ is computed as the maximum deviation between the vertical heights of the points:

$h = \max\left[\mathbf{n} \cdot (\mathbf{p_{i}} - \bar{\mathbf{p}})\right] - \min\left[\mathbf{n} \cdot (\mathbf{p_{j}} - \bar{\mathbf{p}})\right], \quad i, j \in [1, N]$   (4)
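Equations (2)-(4) translate almost directly into code. The sketch below computes the three geometric features for one grid cell from its points and the 0.5 m neighborhood used for the PCA normal; it is an illustrative reading of the formulas rather than the authors' implementation.

```python
import numpy as np

def surface_normal(neighborhood):
    # PCA normal: direction of least variance of the 0.5 m square neighborhood,
    # oriented so that it points upward (positive z).
    centered = neighborhood - neighborhood.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    n = vt[-1]
    return n if n[2] >= 0 else -n

def geometric_features(cell_points, neighborhood):
    n = surface_normal(neighborhood)
    # Eq. (2): slope in degrees between the normal and e_z = [0, 0, 1].
    slope = np.degrees(np.arccos(np.clip(n[2], -1.0, 1.0)))
    # Signed heights of the cell's points along the normal, about the centroid.
    dev = (cell_points - cell_points.mean(axis=0)) @ n
    flatness = np.sqrt(np.sum(dev ** 2) / (len(cell_points) + 1))   # Eq. (3)
    height_diff = dev.max() - dev.min()                             # Eq. (4)
    return slope, flatness, height_diff
```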

When driving at high speed or making sharp turns, the vehicle senses significant vibrations caused by bumpy terrain. We consider the influence of the velocity and process it as an input of our network. To match the dimension of the local grid map, we use Fourier features [20] to parameterize the velocity into a higher-dimensional vector $\lambda(v)$:

$\lambda(v) = \begin{bmatrix} \cos(2\pi b_{1} v) & \cos(2\pi b_{1} \omega) \\ \sin(2\pi b_{1} v) & \sin(2\pi b_{1} \omega) \\ \vdots & \vdots \\ \cos(2\pi b_{m} v) & \cos(2\pi b_{m} \omega) \\ \sin(2\pi b_{m} v) & \sin(2\pi b_{m} \omega) \end{bmatrix}$   (5)

where $v$ is the norm of the linear velocity in the x- and y-axes, $\omega$ is the z-axis angular velocity, $b_{i} \sim \mathcal{N}(0, \sigma^{2})$ are sampled from a Gaussian distribution, and $m$ corresponds to the scale of the local map patches.
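A small sketch of the Fourier-feature mapping in Eq. (5); the frequencies $b_i$ are drawn once and reused for every velocity sample. The values of `m`, `sigma` and the random seed below are assumptions for illustration.

```python
import numpy as np

def fourier_velocity_features(v, omega, m, sigma=1.0, seed=0):
    # Eq. (5): random frequencies b_i ~ N(0, sigma^2); m matches the scale
    # of the local map patches (assumed values here).
    b = np.random.default_rng(seed).normal(0.0, sigma, size=m)
    cols = np.stack([np.full(m, v), np.full(m, omega)], axis=1)   # (m, 2) rows of [v, omega]
    feats = np.empty((2 * m, 2))
    feats[0::2] = np.cos(2 * np.pi * b[:, None] * cols)           # cos rows
    feats[1::2] = np.sin(2 * np.pi * b[:, None] * cols)           # sin rows
    return feats                                                  # shape (2m, 2), as in Eq. (5)

print(fourier_velocity_features(0.8, 0.2, m=16).shape)            # (32, 2)
```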

III-D Costmap Prediction

The pipeline of our costmap prediction is similar to [20]. We first extract local map patches from robot trajectories, then predict traversability costs for these patches by training a CNN-LSTM network, and finally generate a costmap for navigation with the trained network.

We collect all environment data to construct a global map, then locate and extract $1 \times 1$ meter patches from the global map using robot odometry sampled every 0.1 second. The linear and angular velocity is also recorded at these positions. We compute the average risk level over five consecutive frames of IMU linear acceleration data at each position as the ground-truth traversability cost label.
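The pairing of map patches with IMU-derived labels can be sketched as follows, assuming each odometry sample carries a timestamp, pose and velocity, and that a hypothetical `crop_patch(pose)` returns the corresponding $1 \times 1$ m patch of the global map.

```python
import numpy as np

def make_training_samples(odometry, imu_times, imu_alpha, crop_patch, window=5):
    # odometry: list of (timestamp, pose, velocity) sampled every 0.1 s along the trajectory.
    # imu_alpha: risk-level label of each IMU frame (Section III-B); imu_times: their stamps.
    # crop_patch(pose): hypothetical helper returning the 1x1 m map patch at `pose`.
    samples = []
    for t, pose, vel in odometry:
        k = int(np.searchsorted(imu_times, t))
        lo, hi = max(0, k - window // 2), min(len(imu_alpha), k + window // 2 + 1)
        label = float(np.mean(imu_alpha[lo:hi]))   # mean risk level of 5 consecutive frames
        samples.append((crop_patch(pose), vel, label))
    return samples
```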

We train a network to predict a continuous value from 0 to 1 as the traversability cost, with semantic and geometric map patches as well as the parameterized velocity as inputs. The features extracted by ResNet18 [25] and an MLP are concatenated and passed through an LSTM [26] to handle sequences of data, since the point cloud patches are sampled along the driven trajectory of the global map. We use the same loss and optimizer as [20].
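A minimal PyTorch sketch of the described architecture: a ResNet18 encoder for the map patches, an MLP for the parameterized velocity, and an LSTM over the patch sequence. The channel count, feature dimensions and the sigmoid output head are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CostNet(nn.Module):
    def __init__(self, patch_channels=8, vel_dim=64, hidden=128):
        super().__init__()
        self.backbone = resnet18(weights=None)
        # Accept multi-channel patches (semantic RGB + geometric layers, assumed 8 channels).
        self.backbone.conv1 = nn.Conv2d(patch_channels, 64, 7, 2, 3, bias=False)
        self.backbone.fc = nn.Identity()                       # 512-d patch feature
        self.vel_mlp = nn.Sequential(nn.Linear(vel_dim, 128), nn.ReLU(), nn.Linear(128, 128))
        self.lstm = nn.LSTM(512 + 128, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())  # cost in [0, 1]

    def forward(self, patches, vel_feats):
        # patches: (B, T, C, H, W); vel_feats: (B, T, vel_dim); T patches along the trajectory.
        B, T = patches.shape[:2]
        x = self.backbone(patches.flatten(0, 1)).view(B, T, -1)
        v = self.vel_mlp(vel_feats)
        out, _ = self.lstm(torch.cat([x, v], dim=-1))
        return self.head(out).squeeze(-1)                      # (B, T) traversability costs

model = CostNet()
costs = model(torch.randn(2, 5, 8, 32, 32), torch.randn(2, 5, 64))
```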

To produce real-time costmaps during navigation, we take a $10 \times 10$ meter point cloud local map generated from the current RGB and depth images. $1 \times 1$ meter patches are sampled from the local map every 0.2 meter. Given these patches and the velocity, the network predicts traversability costs, and the final value of each $0.1 \times 0.1$ meter cell is the average of the costs of all patches covering that cell. The costmap keeps the shape of the local map input by detecting cells containing no points and removing them from the output. Since the network only learns from terrain the vehicle has traversed, it has no information about unreachable obstacles. We therefore record the semantic classes that the robot cannot traverse during dataset collection and assign a value of 0 to cells containing these classes.
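The per-cell averaging and obstacle masking described above might look like the following sketch, where patch centers are given as cell indices in the $10 \times 10$ m local map; the function name, NaN handling of empty cells, and index layout are illustrative assumptions.

```python
import numpy as np

def aggregate_costmap(patch_costs, patch_centers, local_size=10.0, cell_res=0.1,
                      patch_size=1.0, obstacle_mask=None):
    # patch_centers: (row, col) cell indices of each 1x1 m patch in the local map.
    n = int(local_size / cell_res)                 # 100 x 100 cells of 0.1 m
    half = int(patch_size / cell_res) // 2         # a patch covers 10 x 10 cells
    acc, cnt = np.zeros((n, n)), np.zeros((n, n))
    for cost, (r, c) in zip(patch_costs, patch_centers):
        r0, r1 = max(r - half, 0), min(r + half, n)
        c0, c1 = max(c - half, 0), min(c + half, n)
        acc[r0:r1, c0:c1] += cost
        cnt[r0:r1, c0:c1] += 1
    costmap = np.where(cnt > 0, acc / np.maximum(cnt, 1), np.nan)  # empty cells stay undefined
    if obstacle_mask is not None:
        costmap[obstacle_mask] = 0.0               # recorded untraversable classes -> cost 0
    return costmap
```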

IV EXPERIMENT AND RESULTS

IV-A Simulation Setup

We use the natural environments in the Gazebo simulation [27] to train and test our framework. A HUSKY robot is equipped with an RGB-D camera and an IMU. A Gazebo plugin is used to obtain the ground-truth odometry of the vehicle. Two scenarios, shown in Fig. 3, are employed for experimentation: i) the rugged hillside, which contains steep slopes with high grass, bushes, rocks and trees; ii) the dense forest, which consists of high grass, stones, bushes, trees and fallen trunks. To enhance the realism of the scene, we adjust the physical properties of high grass so that the vehicle can pass through it but with resistance. We also modify the environments to be more challenging for navigation by adding grass, rocks and bushes in different places. The data is recorded and the vehicle is controlled through the Robot Operating System (ROS). The algorithm and the simulation run on a desktop computer with an Intel i5-12490F CPU, 32GB RAM, and an Nvidia RTX 3070 GPU.

[Fig. 3: The two simulation scenarios: (a) the rugged hillside; (b) the dense forest.]

IV-B Training Data

We collect data in the simulation environments by following different paths for network training. We use the RGB and depth images in the dataset to obtain dense colored point clouds and the IMU to obtain the ground-truth traversability costs. The odometry data, including position, orientation and velocity, is provided by the plugin. The ratio of low-cost to high-cost frames is 2:1 in the hillside scenario due to its complexity and roughness, and 5:1 in the forest scenario because of its flatness. In total, we generate 4K training frames, 0.5K validation frames and 0.5K test frames for our experiment.

IV-C Costmap Evaluation

We compare against two appearance- and geometry-based baselines to evaluate the quality of our method. TNS [12] is a non-learning approach that calculates a traversability score from terrain classes and geometric traversability, including slope and step height. HDIF [20] characterizes roughness as the 1-30 Hz bandpower of the IMU z-axis linear acceleration and trains a network to learn traversability costs; this network is trained on our dataset. Fig. 4 shows the costmaps predicted by the three methods at the same location. Although all methods output a continuous value between 0 and 1, we simplify the comparison to avoid bias: we use the ground-truth semantic environments to populate a traversability grid map by converting the point cloud labels to either 0 or 1. Traversable regions such as ground, high grass and trail are set to 1, and other regions such as trunk, bush and rock are set to 0.

[Fig. 4: Costmaps predicted by TNS, HDIF and our method at the same location.]

The vehicle travels along the paths in both the realistic and the semantic environments. Global costmaps are generated by the baselines and our method from the collected sensor data and evaluated with four metrics, similar to [12]. The metrics are described as follows, with a computational sketch after the list:

Trav. Accuracy: The accuracy of traversable regions.

All Accuracy: The accuracy over all grid cells of the map.

ROC (Receiver Operating Characteristic): We set costs exceeding 0.5 to 1 and the rest to 0 as a binary classification and use the ROC curve to indicate performance through true positive and false positive rates.

MSE (Mean Squared Error): We calculate the average squared distance between the continuous predictions and the ground-truth labels to estimate the quality of the predictions.
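One plausible implementation of these four metrics, assuming the ground truth is the binarized semantic map (1 = traversable, 0 = untraversable) on valid cells and the AUC is computed from the continuous predictions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def costmap_metrics(pred, gt):
    # pred: continuous costs in [0, 1] on valid cells; gt: binarized ground truth.
    pred_bin = (pred > 0.5).astype(int)                  # threshold the continuous cost at 0.5
    trav_acc = float((pred_bin[gt == 1] == 1).mean())    # accuracy on traversable cells only
    all_acc = float((pred_bin == gt).mean())             # accuracy over all cells
    auc = float(roc_auc_score(gt, pred))                 # ROC/AUC from the continuous scores
    mse = float(np.mean((pred - gt) ** 2))
    return trav_acc, all_acc, auc, mse
```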

TABLE I: Comparison of costmap prediction performance.

Scene  | Method    | Trav. Acc ↑ | aAcc ↑ | AUC ↑ | MSE ↓
hill   | TNS [12]  | 86.34       | 86.12  | 0.894 | 0.117
hill   | HDIF [20] | 14.80       | 55.95  | 0.695 | 0.662
hill   | ours      | 92.39       | 91.52  | 0.897 | 0.149
forest | TNS [12]  | 40.73       | 42.07  | 0.920 | 0.405
forest | HDIF [20] | 2.56        | 5.00   | 0.795 | 0.463
forest | ours      | 99.07       | 97.48  | 0.914 | 0.0581
[Fig. 5: ROC curves of the three methods in the two scenarios.]

Table I shows the comparison of costmap prediction performance in the two scenarios. Our method performs best in terms of accuracy and MSE. Fig. 5 indicates that our method performs similarly to TNS on AUC, with both superior to HDIF. TNS divides the semantic classes into traversable and untraversable terrain in advance, hence its slight advantage over ours. The costmap comparison verifies that our definition of traversability cost is reasonable and our costmap prediction is accurate.

IV-D Navigation Evaluation

We validate our costmap for autonomous navigation in off-road environments and compare its performance with the two baselines. A real-time costmap is generated from the current RGB image, depth image and odometry data. We use the output point cloud of the costmap for local path planning by combining it with a collision-free path planning algorithm [28].

We design 8 trials in each scenario and use the following metrics [16] to evaluate the performance of the navigation:

Success Rate ($R_{success}$): The proportion of trials in which the robot achieves the goal.

Normalized Trajectory Length ($\bar{L}$): The trajectory length normalized by the Euclidean distance between the start and the goal, for all successful trials.

Relative Time Cost ($t_{rel}$): The ratio of the time cost of the other methods to ours over the same successful trials.

Mean Stability ($\bar{S}$): The traversability cost calculated from the IMU data. We compute the mean traversability cost over all frames of IMU z-axis linear acceleration as the stability of the robot in each trial.

TABLE II: Comparison of navigation performance.

Scene  | Method    | $R_{success}$ (%) ↑ | $\bar{L}$ (→1) | $t_{rel}$ ↓ | $\bar{S}$ ↑
hill   | TNS [12]  | 47.46               | 1.505          | 1.648       | 0.727
hill   | HDIF [20] | 29.28               | 2.647          | 3.658       | 0.712
hill   | ours      | 100                 | 1.103          | 1.000       | 0.799
forest | TNS [12]  | 40.48               | 1.407          | 1.890       | 0.758
forest | HDIF [20] | 69.67               | 1.090          | 1.047       | 0.799
forest | ours      | 96.82               | 1.098          | 1.000       | 0.882
[Fig. 6: Trajectories of the navigation experiments in the two scenarios.]

Table II shows the comparison of navigation performance in the two scenarios. Our method outperforms the other methods on all four metrics. HDIF takes shorter trajectories in the forest scene because it travels directly towards the goal without considering obstacles, and its successful trajectories available for evaluation are fewer than ours. The higher success rate of our method shows that it remains effective in varied and complex situations where the other methods fail. The shorter normalized trajectory length indicates that our method is more efficient and precise, minimizing unnecessary movements while avoiding collisions. The lower time cost and higher stability under the same preset speed demonstrate that our method ensures faster yet safer and more stable navigation. The trajectories of the navigation experiments are illustrated in Fig. 6.

IV-E Further Analysis

We use the angular velocity as an input and an LSTM in the network. To validate whether they improve performance, we conduct an ablation study. We use the same dataset to train three networks: ours, ours without the angular velocity $\omega$, and ours without the LSTM. We compare the best learned model of each variant on the validation and test sets. Table III shows that the angular velocity and the LSTM both reduce the network loss.

TABLE III: Ablation study of the angular velocity input and the LSTM.

Method            | Val. Loss ($\times 10^{-2}$) | Test Loss ($\times 10^{-2}$)
Ours              | 5.59                         | 6.25
Ours w/o $\omega$ | 6.03                         | 7.42
Ours w/o LSTM     | 6.27                         | 6.56

V CONCLUSIONS

We present a costmap prediction system that uses a learning method to identify the interactions between the robot and different terrains for autonomous navigation in off-road environments. We introduce a novel traversability cost labelling that takes IMU data and robot risk into consideration. Our method incorporates semantic and geometric information of the surface and the robot's velocity as inputs, and outputs a continuous traversability costmap. We demonstrate that our costmaps ensure safe and stable navigation in complex off-road scenarios in comparison with previous work. In the future, hardware experiments will be conducted to validate the system on a robot platform in the real world, and adaptation to other vehicles such as legged robots will be considered.

References

  • [1] Zürn J, Burgard W, Valada A. Self-supervised visual terrain classification from unsupervised acoustic feature learning. IEEE Transactions on Robotics, 2020, 37(2): 466-481.
  • [2] Vulpi F, Milella A, Marani R, et al. Recurrent and convolutional neural networks for deep terrain classification by autonomous robots. Journal of Terramechanics, 2021, 96: 119-131.
  • [3] Maturana D, Chou P W, Uenoyama M, et al. Real-time semantic mapping for autonomous off-road navigation. Field and Service Robotics: Results of the 11th International Conference. Springer International Publishing, 2018: 335-350.
  • [4] Meng X, Cao Z, Liang S, et al. A terrain description method for traversability analysis based on elevation grid map. International Journal of Advanced Robotic Systems, 2018, 15: 1–12.
  • [5] Fankhauser P, Bloesch M, Hutter M. Probabilistic terrain mapping for mobile robots with uncertain localization. IEEE Robotics and Automation Letters, 2018, 3(4): 3019-3026.
  • [6] Hosseinpoor S, Torresen J, Mantelli M, et al. Traversability analysis by semantic terrain segmentation for mobile robots. 2021 IEEE 17th international conference on automation science and engineering (CASE). IEEE, 2021: 1407-1413.
  • [7] Dabbiru L, Sharma S, Goodin C, et al. Traversability mapping in off-road environment using semantic segmentation. Autonomous Systems: Sensors, Processing, and Security for Vehicles and Infrastructure 2021. SPIE, 2021, 11748: 78-83.
  • [8] Seo J, Sim S, Shim I. Learning Off-Road Terrain Traversability with Self-Supervisions Only. IEEE Robotics and Automation Letters, 2023.
  • [9] Frey J, Mattamala M, Chebrolu N, et al. Fast Traversability Estimation for Wild Visual Navigation. arXiv preprint arXiv:2305.08510, 2023.
  • [10] Zhao H, Ji X, Wei D, et al. Online IMU-odometer extrinsic calibration based on visual-inertial-odometer fusion for ground vehicles. 2022 IEEE 12th International Conference on Indoor Positioning and Indoor Navigation (IPIN). IEEE, 2022: 1-8.
  • [11] Morales E S, Botsch M, Huber B, et al. High precision indoor navigation for autonomous vehicles. 2019 International Conference on Indoor Positioning and Indoor Navigation (IPIN). IEEE, 2019: 1-8.
  • [12] Guan T, He Z, Song R, et al. TNS: Terrain traversability mapping and navigation system for autonomous excavators. Proceedings of Robotics: Science and Systems, 2022.
  • [13] Leung T H Y, Ignatyev D, Zolotas A. Hybrid terrain traversability analysis in off-road environments. 2022 8th International Conference on Automation, Robotics and Applications (ICARA). IEEE, 2022: 50-56.
  • [14] Fan D D, Agha-Mohammadi A A, Theodorou E A. Learning risk-aware costmaps for traversability in challenging environments. IEEE robotics and automation letters, 2021, 7(1): 279-286.
  • [15] Cai X, Everett M, Fink J, et al. Risk-aware off-road navigation via a learned speed distribution map. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022: 2931-2937.
  • [16] Sathyamoorthy A J, Weerakoon K, Guan T, et al. Terrapn: Unstructured terrain navigation using online self-supervised learning. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022: 7197-7204.
  • [17] Waibel G G, Löw T, Nass M, et al. How rough is the path? Terrain traversability estimation for local and global path planning. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(9): 16462-16473.
  • [18] Seo J, Kim T, Kwak K, et al. Scate: A scalable framework for self-supervised traversability estimation in unstructured environments. IEEE Robotics and Automation Letters, 2023, 8(2): 888-895.
  • [19] Yao X, Zhang J, Oh J. Rca: Ride comfort-aware visual navigation via self-supervised learning. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022: 7847-7852.
  • [20] Castro M G, Triest S, Wang W, et al. How does it feel? self-supervised costmap learning for off-road vehicle traversability. 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023: 931-938.
  • [21] Aranburu A. IMU Data Processing to Recognize Activities of Daily Living with a Smart Headset. University of California, Santa Cruz, 2018.
  • [22] Nirmal K, Sreejith A G, Mathew J, et al. Noise modeling and analysis of an IMU-based attitude sensor: improvement of performance by filtering and sensor fusion. Advances in optical and mechanical technologies for telescopes and instrumentation II. SPIE, 2016, 9912: 2138-2147.
  • [23] Majumdar A, Pavone M. How should a robot assess risk? towards an axiomatic theory of risk in robotics. Robotics Research: The 18th International Symposium ISRR. Springer International Publishing, 2020: 75-84.
  • [24] Poudel R P K, Liwicki S, Cipolla R. Fast-scnn: Fast semantic segmentation network. arXiv preprint arXiv:1902.04502, 2019.
  • [25] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). 2016: 770-778.
  • [26] Graves A. Long short-term memory. Supervised sequence labelling with recurrent neural networks, 2012: 37-45.
  • [27] Sánchez M, Morales J, Martínez J L, et al. Automatically annotated dataset of a ground mobile robot in natural environments via gazebo simulations. Sensors, 2022, 22(15): 5599.
  • [28] Zhang J, Hu C, Chadha R G, et al. Falco: Fast likelihood-based collision avoidance with extension to human-guided navigation. Journal of Field Robotics, 2020, 37(8): 1300-1313.