Contrastive Unpaired MRI Harmonization


In large-scale neuroimaging research, images often come from many different institutions. Differences in image acquisition protocols can introduce unwanted variation into the images, which can adversely affect the performance of machine learning models and other statistical analyses. Harmonization aims to reduce this unwanted variation while ideally preserving biological variability.

Common deep learning-based harmonization methods use CycleGANs to learn a mapping from a source scanner to a target scanner (and back) [1,2,3]. While CycleGAN is a sensible approach, other image-to-image translation methods have outperformed it and may be more suitable for MRI harmonization. For example, a contrastive learning-based method called Contrastive Unpaired Translation (CUT) [4] is a promising alternative. CUT trains a generator by aligning encoded patch representations between corresponding locations in the source and translated images, alongside a GAN discriminator for adversarial training. This method outperforms CycleGAN without requiring a cycle-consistency loss, and it needs only a single generator, since a reverse mapping is not required. To my knowledge, CUT has not been used for harmonization before.
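The core of CUT is a patchwise contrastive (PatchNCE) objective: a patch embedding from the translated image should match the embedding of the source patch at the same spatial location (the positive) and repel embeddings from other locations (the negatives). Here is a minimal NumPy sketch of that idea; the function name and the flat `(N, D)` feature layout are my own simplifications, not the paper's implementation.

```python
import numpy as np

def patch_nce_loss(feat_src, feat_tgt, tau=0.07):
    """Minimal PatchNCE sketch. feat_src and feat_tgt are (N, D) arrays of
    patch features from the source and translated images, where row i of
    each array comes from the same spatial location (the positive pair).
    All other rows of feat_src serve as negatives for feat_tgt[i]."""
    # L2-normalize so dot products are cosine similarities
    feat_src = feat_src / np.linalg.norm(feat_src, axis=1, keepdims=True)
    feat_tgt = feat_tgt / np.linalg.norm(feat_tgt, axis=1, keepdims=True)
    logits = feat_tgt @ feat_src.T / tau          # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal: patch i should match patch i
    return -np.mean(np.diag(log_probs))
```

In the full method this loss is computed on features from several layers of the generator's encoder and added to the usual adversarial loss; no second generator or cycle term is needed.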


I used two main datasets for harmonization: the Alzheimer's Disease Neuroimaging Initiative Phase 1 (ADNI-1) [5] and the Information Extraction from Images (IXI) dataset [6]. Both contain adult T1-weighted brain images. ADNI subjects are a mix of cognitively normal, mild cognitive impairment, and Alzheimer's disease; IXI subjects are all healthy.


I applied some basic pre-processing to the images. First, I ran motion correction, intensity normalization, and skull stripping using FreeSurfer [7]. Next, I registered all images to the MNI152 template using the ANTs software package. Finally, I extracted 2D slices along the axial plane; due to limited computational resources and time, I constrained this project to 2D images. The slices were padded to 256×256 pixels at a 1mm × 1mm resolution.
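The final slicing-and-padding step can be sketched as follows. This is an illustrative NumPy sketch, not my actual pipeline code; it assumes the volume is already in MNI152 space at 1mm isotropic resolution, and the choice of axis 2 as the axial axis is an assumption for the example.

```python
import numpy as np

def axial_slices_padded(volume, out_size=256):
    """Extract axial 2D slices from a 3D volume and zero-pad each slice
    to out_size x out_size pixels. Assumes each in-plane dimension is
    at most out_size (true for MNI152 space at 1mm resolution)."""
    slices = []
    for k in range(volume.shape[2]):          # axis 2 assumed axial
        sl = volume[:, :, k]
        pad_h = out_size - sl.shape[0]
        pad_w = out_size - sl.shape[1]
        # center the slice by splitting the padding on each side
        sl = np.pad(sl, ((pad_h // 2, pad_h - pad_h // 2),
                         (pad_w // 2, pad_w - pad_w // 2)))
        slices.append(sl)
    return np.stack(slices)                   # (num_slices, 256, 256)
```

Zero-padding (rather than resampling) keeps the 1mm × 1mm in-plane resolution intact, so intensity statistics are not distorted by interpolation.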


I ran two sets of harmonization experiments. In the first experiment, I harmonized all ADNI-1 scans to a single reference scanner. This reference scanner had only 72 scans, including 30 from subjects who were also scanned on a different scanner on the same day (traveling subjects), so some degree of ground-truth validation was available.

In the next experiment, I harmonized all ADNI-1 scans to all IXI scans. This was a more challenging task to evaluate quantitatively, as there was no ground-truth validation available. However, these inter-dataset differences are much more visible to the human eye, as ADNI-1 intra-dataset inter-scanner differences are minimized by the study design.

I repeated both experiments using CycleGAN for comparison, keeping mostly default settings (learning rate, augmentations, architecture, etc.) for both methods in both sets of experiments.

ADNI Experiment

I show a single example from the ADNI experiment below. Differences in these images are very difficult to see with the naked eye, but it is still evident to me that CUT did not perform adequate harmonization. The target scans generally have less intense white matter yet better grey-matter-to-white-matter contrast, and CUT was not able to replicate this trend: the CUT images actually have greater overall intensity, and grey-to-white-matter contrast does not appear improved. The arrow is a useful reference point; there, the CycleGAN image looks similar to the source image, while CUT pushes the image intensity even further away from the target.


I also report a quantitative measure of harmonization for this task: the mean absolute error (MAE) of intensity values between the source and target images. Strangely, CUT performs worse than simply using the original source image.
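For reference, the metric is straightforward to compute on co-registered image pairs. The optional brain-mask argument below is an illustrative assumption (restricting the comparison to brain voxels), not necessarily how I computed the reported numbers.

```python
import numpy as np

def mae(source, target, mask=None):
    """Mean absolute error between intensity values of two co-registered
    images of the same shape. If a boolean mask is given, only voxels
    inside the mask (e.g. a brain mask) contribute to the mean."""
    diff = np.abs(source.astype(float) - target.astype(float))
    if mask is not None:
        diff = diff[mask]
    return diff.mean()
```

For traveling subjects, `source` is the harmonized scan and `target` is the same subject's same-day scan on the reference scanner, so a lower MAE indicates better harmonization.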


ADNI to IXI Experiment

In this experiment, there are no ground-truth traveling subjects. However, we can inspect the quality of the harmonized images and compare them with example images from each dataset. Here are a few examples of ADNI images:


And here are a few examples of IXI images:


There are clearly some differences in image quality and contrast.

Here are a few examples of harmonized images (i.e. from source dataset ADNI to target dataset IXI) using CycleGAN and CUT.

[Images: adni_to_ixi_0, adni_to_ixi_1, adni_to_ixi_2, adni_to_ixi_25]

Both methods seem to retain the structure of the grey and white matter reasonably well, but CycleGAN drastically reduces the presence of the ventricles (the dark structures in the middle of the brain), while CUT retains them better. Image quality appears slightly better with CUT, but again this is difficult to quantify: medical imaging lacks well-established perceptual quality measures like FID, which relies on a network pretrained on natural images.


Overall, the results of CUT harmonization were underwhelming for the first experiment: CUT was not able to effectively harmonize intra-dataset images from different scanners. However, CUT did seem to outperform CycleGAN in the second experiment, where I harmonized images across two different datasets. This is a less challenging task, as the inter-dataset differences are much more visible to the human eye, but it is also difficult to quantify performance without ground truth. Overall, this is a very preliminary project on CUT for harmonization; there is much room for additional experiments (e.g., hyperparameter tuning, different datasets) and further evaluation.