SU-Net: pose estimation network for non-cooperative spacecraft on-orbit

The estimation of spacecraft pose is crucial in numerous space missions, including rendezvous and docking, debris removal, and on-orbit maintenance. Estimating the pose of space objects is significantly more challenging than that of objects on Earth, primarily due to the widely varying lighting conditions, low resolution, and limited amount of data available in space images. Our main proposal is a new deep learning neural network architecture that can effectively extract orbiting spacecraft features from images captured by inverse synthetic aperture radar (ISAR) for pose estimation of non-cooperative on-orbit spacecraft. Specifically, our model enhances spacecraft imaging by improving image contrast and reducing noise, and uses transfer learning with a pre-trained model to mitigate data sparsity. To address sparse features in spacecraft imaging, we propose a dense residual U-Net that employs dense residual blocks to reduce feature loss during downsampling. Additionally, we introduce a multi-head self-attention block to capture more global information and improve the model's accuracy. The resulting tightly interlinked architecture, named SU-Net, delivers strong performance gains on pose estimation from spacecraft ISAR imaging. Experimental results show that we achieve state-of-the-art results: the absolute error of our model ranges from 0.128° to 0.4491°, the mean error is about 0.282°, and the standard deviation is about 0.065°. The code is released at https://github.com/Tombs98/SU-Net.


Introduction
With the progress of space technology and the diversified development of space tasks, on-orbit servicing has become an important method to ensure the stable operation of space vehicles in the complex space environment. In such scenarios, the pose estimation of a non-cooperative spacecraft is a critical task in the design of current and planned space missions, due to its relevance for close-proximity operations such as debris removal, target tracking, and monitoring. An accurate pose estimation model can not only save a lot of time but also help relevant personnel obtain pose data in a timely manner. It is of great significance for subsequent docking and maintenance, and for determining the orbital decay velocity of an on-orbit spacecraft, its flight pose angle, and the angle between the solar panels and the hull. Moreover, for the subsequent deorbit and reentry of large spacecraft, it provides a precondition for calculating aerodynamic forces.
Relative pose estimation is one of the most studied problems in terrestrial pose estimation tasks, especially for indoor objects [1]. However, pose estimation of non-cooperative spacecraft on-orbit and pose estimation of objects on Earth are two different tasks. Visual conditions in space are even more challenging than those on Earth: due to the lack of an atmosphere, light cannot fully diffuse in space, resulting in extreme imaging contrast. In addition, owing to the limitations of the imaging equipment, image resolution is relatively low [2]. Notably, a non-cooperative target carries no cooperative markers, cannot provide pose information, and is usually in a rolling state, all of which make pose estimation of space non-cooperative targets very difficult.
For a known target, using a stereoscopic camera to analyze images of the target satellite's engine and visual measurement tools to calculate its relative pose has gained interest in recent years [2][3][4][5][6][7]. However, such approaches often depend on simplified geometries, require extensive equation solving, and need long operation times. One example is establishing an axial analysis method based on morphological linear erosion to extract features from low-resolution spacecraft images and then running a matching scheme, such as RANSAC with PnP (Perspective-n-Point) for model evaluation, or SoftPOSIT [8]. The quality of feature extraction has a great impact on subsequent pose estimation results. In this paper, we mainly study vision-based pose estimation of known but non-cooperative on-orbit targets.
With the rapid development of artificial intelligence, the use of CNNs for pose estimation of non-cooperative spacecraft on-orbit has become an attractive solution in recent years [2,4,7,9–13]. A dataset is constructed from spacecraft images monitored by radar or generated by large-scale simulation, and it is used to train and test different neural networks. The trained network model then has a certain fitting ability and is applied to real scenes to predict the pose of the spacecraft on-orbit. Studies have shown that CNNs outperform standard feature-based methods for pose estimation while reducing computational complexity. However, most deep learning approaches to pose estimation of non-cooperative spacecraft on-orbit train the model on simulation data. Although test results on such data are considerable, many problems remain when these models are applied to real radar imaging of spacecraft on-orbit.
As shown in Fig. 1a, b, real radar imaging is not clear: the resolution is low, the spacecraft contour boundary is not obvious, and there are differences between images, all of which cause problems for pose estimation [14,15]. In this paper, we observe that radar imaging of spacecraft on-orbit has low resolution and fuzzy boundaries, and that the spacecraft sailboard sometimes cannot be imaged completely, but the image semantics are relatively simple and the structure is relatively fixed. We propose a new deep learning network structure named Dense Residual U-shaped Network (DR-U-Net) to extract image features, and we further introduce a novel neural network based on DR-U-Net, namely Spacecraft U-shaped Network (SU-Net), to achieve pose estimation for non-cooperative spacecraft on-orbit. To the best of the authors' knowledge, this paper is the first to propose a feature extraction network designed for the characteristics of spacecraft radar imaging and to form a complete pose estimation neural network. The contributions of this paper are summarized as follows: (1) we apply contrast limited adaptive histogram equalization as a preprocessing step to suppress noise and enhance boundary contours in radar images; (2) we propose DR-U-Net, a dense residual U-shaped feature extraction network tailored to the sparse features of spacecraft radar imaging; (3) we integrate these components into SU-Net, a complete network for pose estimation of non-cooperative spacecraft on-orbit that achieves state-of-the-art accuracy on real radar imaging data.

Related Work

Pose Estimation
In research on pose estimation of non-cooperative spacecraft on-orbit, the experimental servicing satellite of the German Aerospace Center uses stereoscopic cameras for imaging analysis of the target satellite's engine and visual measurement tools to calculate its relative pose [16]. [17] used a texture feature extraction method to fit the parameter information of the docking ring for relative pose measurement. Pose estimation of non-cooperative targets usually adopts model-based algorithms, where the pose is obtained by matching a model wireframe with the target's edges [9,18–20]. However, because their geometric features are designed by hand, these methods struggle with complex space backgrounds, illumination, and occlusion. In addition to edge information, other feature information can also be used to assist pose estimation and improve the results. Deep learning can ignore background information while learning complex features, and it has long been mature in the fields of target recognition and detection; however, it was not until [21] that the task was first transferred from target recognition to pose estimation by sharing pre-trained model weights through transfer learning.
A deep convolutional neural network can extract features from the input image without manual design and map the input image end-to-end to pose data. [11] used AlexNet as the backbone network, discretized the continuous pose angle into pose labels, transformed the regression problem into a classification problem, and estimated the pose class of the target spacecraft. [12] adopted transfer learning, initialized the model weights from a pre-trained model based on weight sharing, and used a residual network as the backbone to regress the attitude angle of the target spacecraft, but there were outliers in the test results. [2,22] used VGG as a pre-trained model and then made improvements in the last layer. [13] combines convolutional neural networks for feature detection with covariant high-efficiency solvers and extended Kalman filters to achieve pose estimation for non-cooperative spacecraft. [23] used AlexNet and ResNet as backbone networks, limited the pose estimate to a certain range by classification, and then used regression to further fine-tune the pose. However, treating pose estimation as a classification problem introduces an inherent discretization error. In this paper, we solve the pose estimation task directly as a regression problem to avoid this error. Unlike previous work that uses transfer learning and fine-tunes only the last layer, we add an image preprocessing step and propose an image feature extraction network named DR-U-Net, and we then integrate a network structure, SU-Net, for pose estimation of non-cooperative spacecraft on-orbit.

U-Net
The U-Net network model was proposed for semantic segmentation. Its structure includes a contracting path that captures context information and a symmetric expanding path that allows precise localization [24]. There are only convolution and pooling layers in the contracting and expanding paths, with no fully connected layer. In the network, the shallow high-resolution layers are used to solve pixel localization, and the deep layers are used to solve pixel classification, so that semantic-level image segmentation can be realized. To improve the model, subsequent researchers proposed improved U-Net variants [25–28]. U-Net is currently used mainly for semantic segmentation of medical images; no prior work has introduced the U-Net architecture for pose estimation of non-cooperative spacecraft on-orbit. We compared the characteristics of radar images of non-cooperative spacecraft on-orbit with those of medical images and found similarities: simple semantics, relatively fixed structure, and unclear boundary contours. In this paper, we propose DR-U-Net, based on U-Net, to extract image features, and we further propose SU-Net to achieve pose estimation for non-cooperative spacecraft on-orbit.

[29] introduced the dense convolutional network, which connects each layer to every other layer in a feed-forward fashion. Dense connections alleviate the vanishing-gradient problem, strengthen feature propagation, and encourage feature reuse. Each layer has direct access to the gradients from the loss function and to the original input signal, leading to an implicit deep supervision [30]. To solve the problem that very deep networks are hard to train, [31] proposed the residual network. Subsequently, [32–36] proposed different residual network structures for different tasks, and [34] combined residual and dense connections. In this paper, we use dense connections to connect the first layer of the contracting path directly to the other layers. In addition to the long-distance residual connections between the contracting and expanding paths at each level, there are local residual connections between the convolutions at each level. Residual connections are also added between the image preprocessing, pre-training, and DR-U-Net modules.

Model
In this section, we introduce the SU-Net network structure proposed in this paper for pose estimation of non-cooperative spacecraft on-orbit. The overall architecture of SU-Net is shown in Fig. 2 and consists of: (1) transforming the radar image of the non-cooperative spacecraft on-orbit into a tensor x; (2) image preprocessing, which yields the tensor x_p; (3) a ResNet pre-trained model, which yields the tensor x_r; (4) DR-U-Net, which yields the feature tensor y; and (5) a feedforward neural network, which yields the final pose prediction ŷ. It is worth noting that spacecraft radar imaging is not clear and the transformed tensors x are sparse. To reduce feature loss during image preprocessing and pre-training, residual connections are introduced at these two positions to avoid feature loss and model degradation.
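To make the data flow concrete, the following is a minimal sketch of the overall forward pass. The callable names are illustrative stand-ins for the modules detailed in the subsections below, and the residual sums follow Eqs. 1–3 and 6.

```python
def su_net_forward(x, clahe, pretrained, dr_u_net, ffn):
    """Sketch of the SU-Net data flow in Fig. 2. The four callables are
    stand-ins for the modules described in the following subsections."""
    x_p = clahe(x)                  # Eq. 1: preprocessed tensor
    x_r = pretrained(x + x_p)       # Eq. 2: pre-trained feature tensor
    y = dr_u_net(x + x_p + x_r)     # Eq. 3: one-dimensional feature vector
    return ffn(y)                   # Eq. 6: pose prediction ŷ
```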

Raw image preprocessing
The original radar images of non-cooperative spacecraft on-orbit have low resolution and fuzzy boundaries, and a convolutional neural network performs better on preprocessed input. In this paper, we use the contrast limited adaptive histogram equalization (CLAHE) algorithm as the image preprocessing step, as shown in Fig. 3. First, the image is divided into blocks and the histogram of each block is calculated. Then, the mapping for each histogram is computed with contrast limiting. Finally, interpolation is used to obtain the enhanced image. Through these steps, we can effectively limit noise amplification and enhance the image boundary contour. For each radar image of the non-cooperative spacecraft on-orbit, we first resize it to 256 × 256 and then apply CLAHE to the original radar imaging feature tensor x to obtain the preprocessed tensor x_p, as shown in Eq. 1:

$$x_p = \mathrm{CLAHE}(x) \quad (1)$$
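A minimal sketch of this preprocessing step using OpenCV's CLAHE implementation follows; the clip limit and tile grid size are assumptions, since the paper does not specify its CLAHE hyperparameters.

```python
import cv2
import numpy as np

def preprocess(path, clip_limit=2.0, tile_grid_size=(8, 8)):
    """Minimal sketch of the CLAHE step (Eq. 1). The clip limit and tile
    grid size are illustrative defaults, not values from the paper."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (256, 256))                 # fixed input size used here
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid_size)
    x_p = clahe.apply(img)                            # contrast-limited equalization
    return img.astype(np.float32), x_p.astype(np.float32)
```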

Pretraining model
In the past, most researchers adopted transfer learning with pre-trained models such as ResNet [31], AlexNet [37], and VGG [38] to initialize model parameters, and then fine-tuned only the last layer of the model. These pre-trained models are all trained on the ImageNet dataset [39]. Although ImageNet differs considerably from radar imaging of non-cooperative on-orbit spacecraft, both consist of images dominated by a single main object. Transfer learning can provide a good initialization that speeds up model training, and [40] indicates that transfer learning can still improve performance even after the original weights are significantly adjusted by training on the new dataset. Similar to [2,9,13,22,23], this paper adopts transfer learning and carries out pre-training first. However, to reduce the feature loss introduced by image preprocessing, the original radar imaging feature tensor x of the spacecraft is added to the preprocessed feature tensor x_p before being input to the ResNet34 pre-trained model, which yields the pre-trained tensor x_r, as shown in Eq. 2:

$$x_r = \mathrm{ResNet34}(x + x_p) \quad (2)$$
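A sketch of this step under stated assumptions follows: replicating the single-channel radar tensor to three channels to match the ImageNet-trained stem, using only the early ResNet34 stages, and projecting back to the input shape are all assumptions made so that x_r can later be summed with x and x_p; the paper does not detail these adaptation steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class PretrainedEncoder(nn.Module):
    """Sketch of Eq. 2 with an ImageNet-pretrained ResNet34 backbone."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet34(weights="IMAGENET1K_V1")
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)
        self.proj = nn.Conv2d(64, 1, kernel_size=1)   # back to one channel

    def forward(self, x, x_p):
        z = (x + x_p).repeat(1, 3, 1, 1)              # 1 -> 3 channels (assumption)
        f = self.stem(z)                              # (B, 64, 128, 128) for 256x256 input
        f = F.interpolate(f, size=x.shape[-2:])       # restore spatial size
        return self.proj(f)                           # pre-trained tensor x_r
```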

DR-U-Net
The overall structure of DR-U-Net is shown in Fig. 4. Like the original U-Net, the proposed model consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network: it consists of the repeated application of two 3 × 3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU), and a 2 × 2 max pooling operation with stride 2 for downsampling. Every step in the expansive path consists of an upsampling of the feature map followed by a 2 × 2 convolution, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3 × 3 convolutions, each followed by a ReLU [24]. Finally, we add the original image tensor, the preprocessed image tensor, and the pre-trained tensor, and input the sum into the DR-U-Net network to obtain a one-dimensional feature vector y, as shown in Eq. 3:

$$y = \mathrm{DR}(x_f) \quad (3)$$
where x_f = x + x_p + x_r and DR(·) denotes the DR-U-Net network proposed in this paper. All parameters of the model are given in Fig. 4. Going beyond the original U-Net architecture, we make several significant improvements by adding a dense connection and residual connection scheme:

1. Dense connection: inspired by [29], we use dense connections to link the first layer of the contracting path directly to the other layers, so that each layer has direct access to the original input features and to the gradients from the loss function. This strengthens feature propagation, encourages feature reuse, and reduces feature loss during downsampling. Denoting concatenation of feature maps by [·], the ℓ-th layer receives

$$x_\ell = H_\ell([x_0, x_1, \ldots, x_{\ell-1}]) \quad (4)$$

2. Residual connection: [31] showed that adding residual connections can increase the depth and improve the accuracy of convolutional neural networks. Inspired by this work, and considering the low resolution and sparse features of spacecraft imaging, we also add residual connections to the model. As shown by the gray solid lines in Fig. 4, in addition to the long residual connections between the contracting and expansive paths at each level, there are local residual connections between the convolutions at each level. These connections help produce a smoother loss curve and prevent the severe feature loss, vanishing gradients, and exploding gradients that can occur when features pass through many pooling layers in a deep network. As shown in Fig. 5, for each convolution block the residual connection is given by Eq. 5:

$$y = F(x) + x \quad (5)$$

where F(·) comprises two convolution operations and a max pooling operation or an upsampling operation, and x is an identity mapping.
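The following is a minimal sketch of one contracting-path block with the local residual connection of Eq. 5. Padded convolutions and the 1 × 1 projection on the shortcut are assumptions made here so that the sum is shape-consistent; Fig. 4 gives the actual parameters.

```python
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    """Sketch of y = F(x) + x (Eq. 5): F is two 3x3 convolutions
    followed by 2x2 max pooling; the shortcut is shape-matched."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(                    # F(x)
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.skip = nn.Sequential(                    # identity path, shape-matched
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.body(x) + self.skip(x)            # local residual connection
```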

Feedforward neural network
Finally, as shown in Eq. 6, the feature vector y, obtained from the radar image of the non-cooperative spacecraft on-orbit after image preprocessing, pre-training, and the DR-U-Net network, is fed into a feedforward neural network (FFN) for pose estimation, which produces the pose prediction ŷ:

$$\hat{y} = \mathrm{FFN}(y) \quad (6)$$
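A minimal sketch of such a regression head follows; the layer widths are assumptions, as the paper does not list the feedforward dimensions.

```python
import torch.nn as nn

# Sketch of the regression head of Eq. 6 (widths are illustrative).
ffn = nn.Sequential(
    nn.Linear(512, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 1),     # single pose-angle output ŷ
)
```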

Loss function
To train the model proposed in this paper, we use three different loss functions: L1 loss (MAE), L2 loss (MSE), and Huber loss. The L1 loss, shown in Eq. 7, is the mean absolute difference between the target value and the predicted value; it measures the average magnitude of the errors in a set of predictions, regardless of their direction:

$$L_{\mathrm{MAE}} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| \quad (7)$$
The L2 loss is the most commonly used regression loss function; it is the mean of the squared distances between the target values and the predicted values, as shown in Eq. 8:

$$L_{\mathrm{MSE}} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \quad (8)$$
For radar imaging of non-cooperative spacecraft on-orbit, limitations of the radar imaging equipment introduce occasional outliers into the training data. A model trained with MSE loss gives more weight to outliers than one trained with MAE loss, minimizing the error on individual outliers at the expense of the common examples, which reduces overall performance. The MAE loss gradient, on the other hand, is constant, which means the gradient remains large even when the loss is small, making convergence near the minimum difficult.
The Huber loss function, however, is less sensitive to outliers in the data than the squared error loss and is differentiable at 0, as shown in Eq. 9. It is essentially an absolute error that becomes a squared error when the error is small. The transition point depends on the hyperparameter δ, which is set to 1 in this paper:

$$L_{\delta}(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & \text{if } |y - \hat{y}| \le \delta \\ \delta |y - \hat{y}| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases} \quad (9)$$
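An illustrative comparison of the three losses (Eqs. 7–9) in PyTorch follows; the values are made up, with the last pair mimicking an outlier in the training data.

```python
import torch
import torch.nn as nn

y_pred = torch.tensor([5.2, 7.9, 34.0])   # illustrative pose-angle predictions
y_true = torch.tensor([5.0, 8.1, 12.0])   # targets; the last pair is an "outlier"

mae = nn.L1Loss()(y_pred, y_true)                # Eq. 7: robust, constant gradient
mse = nn.MSELoss()(y_pred, y_true)               # Eq. 8: dominated by the outlier
huber = nn.HuberLoss(delta=1.0)(y_pred, y_true)  # Eq. 9 with delta = 1
print(mae.item(), mse.item(), huber.item())
```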

Experiment
We evaluate the effectiveness of the SU-Net model proposed in this paper. To show that its accuracy outperforms state-of-the-art pose estimation methods, we conduct comparative experiments with [2,12,13,22,23]. To assess the effectiveness of each component, we perform an ablation study. Finally, we explore the influence of different loss functions on the accuracy of pose estimation.

Setup
Datasets. The dataset in this paper consists of real radar imaging data of a certain type of non-cooperative spacecraft on-orbit, with 17,744 images; 80% of the dataset is used for training and 20% for testing. The data sample distribution is shown in Fig. 6. The pose angles of all images lie within 0° to 35°, concentrated in 5° to 10°.
Training & Metrics. We train all models using Adam with β1 = 0.9, β2 = 0.999, a learning rate of 1e-5, and a batch size of 16, and apply linear learning rate warmup and decay. We report the accuracy of pose estimation on our dataset through the minimum absolute error and the mean absolute error. Notably, the minimum absolute error is the smallest error between target and predicted values when the model predicts a batch of poses, and the mean absolute error is the average error over a batch of poses. We also calculate the standard deviation to measure the dispersion of the prediction errors.

Table 1 compares the SU-Net model proposed in this paper with state-of-the-art pose estimation models for spacecraft on-orbit. The experimental results show that our method obtains considerable gains over the state-of-the-art approaches. For instance, [23] trained AlexNet and ResNet with all layers fixed and fine-tuned only the top layer of each. The minimum absolute error of SU-Net is about 0.7521° lower than that of AlexNet and about 0.3782° lower than that of ResNet, and the mean absolute error decreases by 0.6889° relative to AlexNet and by 0.3209° relative to ResNet. Note that the standard deviation of SU-Net is 0.031° higher than that of AlexNet and 0.04° higher than that of ResNet; however, given the small magnitude of the standard deviation itself, and since, as shown in Fig. 7, there is no obvious outlier in the SU-Net pose estimates, the overall performance of SU-Net is better than that of ResNet and AlexNet. Compared with [2,13,22], which all use existing mature convolutional neural network models, our method accounts for the characteristics of real radar imaging: although those methods achieve considerable results on simulation data, their performance degrades on real data, and their absolute errors are relatively high compared with the method proposed in this paper.
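For reference, a minimal sketch of the metric definitions stated above; only those definitions are assumed.

```python
import numpy as np

def pose_metrics(y_true, y_pred):
    """Minimum absolute error, mean absolute error, and standard
    deviation of the errors over a batch of pose predictions."""
    err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return err.min(), err.mean(), err.std()
```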
As shown in Fig. 8, we visualize the feature maps extracted by our model and by [12,13,23]. From Fig. 8a we can conclude that our model extracts the radar imaging features completely. Fig. 8b ([23] with ResNet) shows that, although more complete features can be extracted, there is sometimes large noise. Fig. 8c ([13]) and Fig. 8d ([12]) show features that are incomplete or too noisy. Fig. 8e shows the image input to the models.

Ablation experiments
Here we present ablation experiments to analyze the contribution of each component of our model; the results are shown in Table 2. For ease of exposition, we use "residual connection 1" to denote the residual connections outside DR-U-Net and "residual connection 2" to denote those inside it.
Pre-training. As shown in Table 2, when the pre-training module is removed, the minimum absolute error increases by 0.0445° and the mean absolute error increases by 0.05°. Although the ImageNet dataset is very different from radar imaging of non-cooperative spacecraft on-orbit, ImageNet contains a large number of different types of data with a variety of shallow and deep features. A ResNet model pre-trained on ImageNet can extract these features well, so the pre-trained ResNet is used as a feature extractor. In this way, random initialization is replaced by an initialization with some theoretical grounding, improving overall performance.
Residual connection 2. When the residual connections inside DR-U-Net are removed, the minimum absolute error and mean absolute error increase significantly. This is because the dataset used for training is small, and in a deep network features can be severely lost after repeated pooling, leading to model degradation and even gradient explosion. With residual connections, the model output is expressed as a linear superposition of the input and a nonlinear transformation of the input, which reduces the complexity of the model, reduces overfitting, and improves performance.

Dense connection. When the dense connections in DR-U-Net are removed, the minimum absolute error increases by 0.046° and the mean absolute error increases by 0.0244°. This shows that with dense connections, each layer of DR-U-Net receives the concatenated feature maps of all preceding layers as input, and every layer has direct access to the gradients from the loss function and to the original input information. This strengthens feature propagation and avoids relearning redundant feature maps, making more effective use of the imaging features and reducing the number of parameters.
Only residual connections in DR-U-Net. The experiments show that the minimum absolute error and mean absolute error increase when only the residual connections inside DR-U-Net are kept, which demonstrates that the residual connections outside DR-U-Net reduce the feature loss of the image preprocessing and pre-training modules and improve the model.
Only the raw U-Net network. The experimental results show that the minimum absolute error and mean absolute error increase significantly with the raw U-Net, which demonstrates the effectiveness of the proposed SU-Net, designed around the characteristics of spacecraft radar imaging, for pose estimation.

The influence of loss functions
Finally, we explore the influence of different loss functions on the final results, as shown in Table 3. With the Huber loss function, the final minimum absolute error and mean absolute error are smaller than with the MAE and MSE loss functions. However, with the MAE loss function the standard deviation is the lowest, because MAE is not sensitive to outliers and does not minimize single outliers at the expense of the other common examples, so the standard deviation of the final model's prediction errors is relatively lower than with the other losses.

Conclusion
We have explored the application of deep learning to pose estimation of non-cooperative spacecraft on-orbit. Unlike prior works that use transfer learning and fine-tune only the last layer, we analyzed the characteristics of on-orbit spacecraft radar imaging: low resolution, fuzzy spacecraft boundaries, and sometimes incomplete imaging of the panels, but relatively simple image semantics and a relatively fixed structure. We add an image preprocessing step and propose an image feature extraction network named DR-U-Net, and we then integrate a network structure, SU-Net, for pose estimation of non-cooperative spacecraft on-orbit. In this model: (1) the contrast limited adaptive histogram equalization algorithm removes image noise and enhances the image boundary contour; (2) as in previous pose estimation work, transfer learning is adopted, and performance can still improve after the original weights are significantly adjusted by training on the new dataset; (3) dense connections and residual connections are introduced to reduce the loss of image features and obtain richer semantic representations. Experimental results show that our method achieves state-of-the-art results compared with currently known deep learning based pose estimation methods for space objects.

Declarations
Funding