Adverse weather conditions such as rain, fog, and snow drastically reduce visibility and degrade image quality, posing a significant challenge for computer vision algorithms in autonomous navigation and surveillance systems. In our project, we explored the integration of spatial information fusion using depth maps within the state-of-the-art de-weathering model TransWeather for image restoration tasks. We also investigated how de-weathering can improve the performance of object detection.
(Figure from the original TransWeather paper)
Removing adverse weather effects from images is crucial for many applications, and traditional methods have targeted only one type of weather condition at a time. All-in-One was developed to tackle multiple conditions simultaneously, but it uses multiple encoders and is bulky. TransWeather is a more recent model that requires only a single encoder and decoder and achieves SOTA performance. We therefore chose TransWeather as our baseline.
For the baseline implementation, we use the code from the original TransWeather repository. We made the following changes to the code:
Different weather conditions like rain, haze, and snow affect how we see things because of particles in the air: water drops for rain, aerosols for haze, and snowflakes for snow. These particles scatter light in different ways, which can distort images captured by cameras.
The atmospheric scattering model explains this by describing how light reaches the camera: some light travels directly from the scene, while the rest is scattered by particles in the air. It is worth noting that this attenuation is exponential in the depth of an object in the scene. We therefore expected that depth information could improve the model's performance on the de-weathering task.
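Concretely, the scattering model expresses the observed intensity $I(x)$ at pixel $x$ as a depth-weighted blend of the clear-scene radiance $J(x)$ and the atmospheric light $A$ (this is the same form we use later to synthesize fog on BDD100K):

$$I(x) = J(x)\,e^{-\beta d(x)} + A\left(1 - e^{-\beta d(x)}\right)$$

where $\beta$ is the scattering coefficient and $d(x)$ is the scene depth; the direct transmission $e^{-\beta d(x)}$ decays exponentially with depth.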
For our initial experiment, we augmented the input image with depth information by incorporating it as a fourth channel. The depth maps were produced by a depth estimation model called Marigold, which we ran on all 16,263 training images. As a diffusion-based model, Marigold is slow, requiring approximately 18 hours to process these images on a V100 AWS machine.
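A rough sketch of this offline precomputation is shown below, assuming the Hugging Face diffusers MarigoldDepthPipeline interface (the original Marigold repository provides an equivalent run script); the checkpoint name and paths are illustrative rather than taken from our actual scripts.

```python
# Precompute Marigold depth maps for every training image and cache them as
# .npy files so the dataloader never has to run the diffusion model online.
from pathlib import Path

import numpy as np
import torch
from diffusers import MarigoldDepthPipeline
from PIL import Image

pipe = MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", torch_dtype=torch.float16
).to("cuda")

in_dir, out_dir = Path("data/train/input"), Path("data/train/depth")
out_dir.mkdir(parents=True, exist_ok=True)

for img_path in sorted(in_dir.glob("*.png")):
    image = Image.open(img_path).convert("RGB")
    depth = pipe(image)  # diffusion-based inference, hence the long runtime
    # depth.prediction is an affine-invariant depth map normalized to [0, 1]
    np.save(out_dir / (img_path.stem + ".npy"),
            np.asarray(depth.prediction).squeeze().astype(np.float32))
```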
As the image above shows, the depth map generated by Marigold is largely accurate; it even removes some artifacts caused by the rain. We therefore hoped that adding this depth information to the input would help the model better understand the scene and improve de-weathering performance.
As a second step, we preprocessed the input images by loading the depth map into a fourth channel during data loading. The model architecture was adapted to accept four input channels instead of the standard three, and we trained this modified model with the same hyperparameters as the baseline.
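A minimal sketch of this RGB-D input path follows; the class and function names are hypothetical, and the conv-widening helper assumes the encoder's first layer is a plain Conv2d (in TransWeather this is the projection inside the first patch-embedding block).

```python
# Sketch: cached .npy depth maps are stacked as a fourth channel in the
# dataset, and the first conv of the encoder is widened from 3 to 4 inputs.
import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from torch.utils.data import Dataset


class RGBDWeatherDataset(Dataset):
    def __init__(self, image_paths, depth_paths):
        self.image_paths, self.depth_paths = image_paths, depth_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, i):
        rgb = np.asarray(Image.open(self.image_paths[i]).convert("RGB")) / 255.0
        depth = np.load(self.depth_paths[i])[..., None]           # (H, W, 1)
        x = np.concatenate([rgb, depth], axis=-1)                 # (H, W, 4)
        return torch.from_numpy(x).permute(2, 0, 1).float()       # (4, H, W)


def widen_first_conv(conv3: nn.Conv2d) -> nn.Conv2d:
    """Replace a 3-channel conv with a 4-channel one, copying the RGB weights
    and zero-initialising the new depth channel so training starts close to
    the baseline."""
    conv4 = nn.Conv2d(4, conv3.out_channels, conv3.kernel_size,
                      conv3.stride, conv3.padding, bias=conv3.bias is not None)
    with torch.no_grad():
        conv4.weight[:, :3] = conv3.weight
        conv4.weight[:, 3:].zero_()
        if conv3.bias is not None:
            conv4.bias.copy_(conv3.bias)
    return conv4
```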
However, the results of this experiment did not meet our expectations. Both training and validation performance deteriorated, and we observed significant overfitting. It appears that the model treats the implicitly added fourth channel, which was intended to aid learning, as noise that does not contribute to performance.
We also tried fine-tuning the model for more epochs, but the results improved only slightly. All of this suggests that implicitly supplied depth information does not help the model improve its performance.
For the second experiment, we incorporated depth information into the input images explicitly by adding a depth loss that guides the model to use this information. The core idea is to run depth estimation on both the predicted image and the corresponding ground-truth image at every iteration where backpropagation occurs, compute a depth loss between the two estimates, and add it to the overall loss.
In this second experiment, we encountered significant engineering challenges, primarily due to compatibility issues between the models and environments. Initially, we used the Marigold model for depth estimation (which requires a different setup than in experiment 1). However, Marigold runs on Python 3.10 and Torch 2.0.1, which is incompatible with TransWeather's Python 3.6 and Torch 1.7.1 setup. Ensuring support for the required CUDA version complicated the setup further.
After unsuccessful attempts to create an environment compatible with both models, we explored serving Marigold as an API endpoint. However, the inherent latency of Marigold as a diffusion model, combined with the communication delay of API calls, made training for 200 epochs impractical in terms of time.
We then tested MiDaS, a CNN-based depth estimation model that is faster than Marigold but still incompatible with the TransWeather environment. Our final attempt used ADDS-DepthNet, a model compatible with our existing setup and well suited for implementing the depth loss. After successfully integrating ADDS-DepthNet, we modified the TransWeather codebase to include its model loading and inference logic.
This integration allowed us to add a depth loss to the training process, albeit with modifications to the batch size and loss weights due to GPU limitations. The batch size was reduced to 16, and the depth loss factor was set to 0.01, compared with the perceptual loss factor of 0.04. With this setup, training for 200 epochs took approximately 150 hours on a T4 AWS machine.
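Schematically, the training objective looked like the sketch below: the depth network (ADDS-DepthNet in our runs) is kept frozen and only used to compare depth estimates of the restored image and the ground truth. The wrapper names are illustrative, and the baseline reconstruction and perceptual terms are shown in simplified form.

```python
# Sketch of the depth-consistency term added to the training loss.
import torch
import torch.nn.functional as F

LAMBDA_PERCEPTUAL = 0.04   # perceptual loss factor from our setup
LAMBDA_DEPTH = 0.01        # depth loss factor from our setup


def depth_loss(depth_net, pred_img, gt_img):
    """L1 distance between depth estimates of the restored and clean images."""
    with torch.no_grad():
        d_gt = depth_net(gt_img)     # no gradient through the ground-truth branch
    d_pred = depth_net(pred_img)     # gradients flow back into TransWeather
    return F.l1_loss(d_pred, d_gt)


def total_loss(net, depth_net, perceptual_fn, inp, gt):
    pred = net(inp)
    loss = F.smooth_l1_loss(pred, gt)                        # reconstruction term
    loss = loss + LAMBDA_PERCEPTUAL * perceptual_fn(pred, gt)  # perceptual term
    loss = loss + LAMBDA_DEPTH * depth_loss(depth_net, pred, gt)
    return loss
```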
The results of this new method are much better than in our first experiment, but they still did not surpass the original baseline. The training curve is almost identical, while the validation curve is slightly lower (by about 0.01). Combined with our ablation study, we believe this is because the model already captures depth information implicitly by learning the de-weathering task itself (especially de-fogging), so the explicit depth signal provides little additional information. This can also be seen in the qualitative results below.
Potential ablation studies for our experiments include varying the depth loss factor, assessing the impact of depth information across different dataset scales, and altering training parameters such as learning rate, batch size, and number of epochs. However, given the constraints of time and budget, we selected the following ablations for execution:
We tested how the depth loss factor affects the model's performance by varying the factor; the results are shown in the image below. Unfortunately, we found that the depth loss has little effect on performance: the training curve is similar to the baseline, with only minimal fluctuations in validation.
The de-weathering process not only brings real-world images closer to the training data but also reduces false positives and false negatives caused by weather distortions. We therefore investigated how de-weathering improves the performance of the object detection task.
Following our previous pipeline, we verified and analyzed TransWeather's performance on both the “All-Weather” dataset (the dataset used in the paper, covering many weather types) and the BDD100K dataset (a diverse driving dataset with detection ground truth).
For the detection task, we used the “All-Weather” dataset to train the model; the quantitative results are compared in the last section.
We also trained the model on images from the BDD100K dataset, and the performance is also quite good: Train_PSNR 31.88, Val_PSNR 29.34, Val_SSIM 0.9424.
BDD100K is a large driving video dataset with 100K videos and 10 tasks for evaluating image recognition algorithms for autonomous driving. For object detection, BDD100K provides 70,000/10,000/20,000 images for train/val/test, with 1.8M labeled objects.
We first used the “Depth Anything” method [2] to generate depth maps for the 70,000 training images, and then generated their corresponding foggy images with a physics-based vision method. The intensity $E$ of a scene point in bad weather, as recorded by a monochrome camera, is $$E = R\,e^{-\beta d} + E_{\infty}\left(1 - e^{-\beta d}\right)$$ where $E_{\infty}$ is the sky brightness, $R$ is the radiance of the scene point on a clear day, $\beta$ is the scattering coefficient of the atmosphere, and $d$ is the depth of the scene point.
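A minimal sketch of this fog synthesis is shown below, assuming the depth map is normalized to [0, 1] with larger values meaning farther away (disparity-style outputs such as Depth Anything's would first need to be inverted), and using illustrative values for $\beta$ and $E_{\infty}$.

```python
# Sketch of physics-based fog synthesis: E = R*exp(-beta*d) + E_inf*(1 - exp(-beta*d)).
import numpy as np
from PIL import Image


def synthesize_fog(clear_rgb: np.ndarray, depth: np.ndarray,
                   beta: float = 1.5, e_inf: float = 0.9) -> np.ndarray:
    """clear_rgb in [0, 1] with shape (H, W, 3); depth in [0, 1], larger = farther."""
    t = np.exp(-beta * depth)[..., None]        # transmission, shape (H, W, 1)
    foggy = clear_rgb * t + e_inf * (1.0 - t)   # scattering model
    return np.clip(foggy, 0.0, 1.0)


rgb = np.asarray(Image.open("clear.jpg").convert("RGB")) / 255.0
depth = np.load("clear_depth.npy")              # precomputed relative depth map
Image.fromarray((synthesize_fog(rgb, depth) * 255).astype(np.uint8)).save("foggy.jpg")
```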
Then we used these paired ground-truth and foggy images to train the TransWeather model.
We then ran object detection on the results, using a Faster R-CNN detector [3] pretrained on the COCO dataset.
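Our pipeline does not depend on a particular Faster R-CNN implementation; the sketch below assumes torchvision's COCO-pretrained fasterrcnn_resnet50_fpn and an illustrative confidence threshold.

```python
# Run a COCO-pretrained Faster R-CNN on a de-weathered image.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

weights = torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.COCO_V1
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=weights).eval()

img = to_tensor(Image.open("deweathered.jpg").convert("RGB"))
with torch.no_grad():
    out = detector([img])[0]                 # dict with boxes, labels, scores

keep = out["scores"] > 0.5                   # confidence threshold
boxes, labels = out["boxes"][keep], out["labels"][keep]
```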
On the “All-Weather” dataset, the detector either makes false detections or misses objects before de-weathering. After de-weathering, it detects objects more accurately and finds more of them.
For the BDD100K dataset, we chose 1,345 images from the val split and used the method above to generate foggy versions. We ran de-weathering inference on these foggy images and then applied our detector to the outputs, which detects noticeably more objects. Quantitatively, mAP increases after de-weathering, and since BDD100K is a driving dataset, we also report the average precision for the car category, which also increases.
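For reference, a small sketch of how such an mAP comparison can be computed with torchmetrics; the boxes, scores, and labels below are illustrative placeholders, not our actual detections.

```python
# Compute mAP (and per-class AP, e.g. for "car") from predictions vs. ground truth.
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision(class_metrics=True)

# One illustrative image: boxes are (x1, y1, x2, y2); label 3 is "car" in COCO indexing.
preds = [{"boxes": torch.tensor([[50., 60., 200., 180.]]),
          "scores": torch.tensor([0.86]),
          "labels": torch.tensor([3])}]
target = [{"boxes": torch.tensor([[55., 62., 198., 176.]]),
           "labels": torch.tensor([3])}]
metric.update(preds, target)                 # repeat for all evaluated images
print(metric.compute()["map"])
```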