Co-ordinate-based positional embedding that captures resolution to enhance transformer’s performance in medical image analysis

Das BK, Zhao G, Islam S, Re TJ, Comaniciu D, Maier A, Gibson E (2024)

Publication Type: Journal article

Publication year: 2024

Journal

Scientific Reports (Nature Publishing Group: Open Access Journals - Option B)

Book Volume: 14

Article Number: 9380

Issue: 1

DOI: 10.1038/s41598-024-59813-x

Abstract

Vision transformers (ViTs) have revolutionized computer vision by employing self-attention instead of convolutional neural networks, and have demonstrated success due to their ability to capture global dependencies and remove the spatial biases of locality. In medical imaging, where input data may differ in size and resolution, existing architectures require resampling or resizing during pre-processing, leading to potential spatial resolution loss and information degradation. This study proposes a co-ordinate-based embedding that encodes the geometry of medical images, capturing physical co-ordinate and resolution information without the need for resampling or resizing. The effectiveness of the proposed embedding is demonstrated through experiments with UNETR and SwinUNETR models for infarct segmentation on an MRI dataset with AxTrace and AxADC contrasts. The dataset consists of 1142 training, 133 validation and 143 test subjects. With the addition of the co-ordinate-based positional embedding, both models achieved substantial improvements in mean Dice score, of 6.5% and 7.6%. The proposed embedding showed a statistically significant advantage (p-value < 0.0001) over alternative approaches. In conclusion, the proposed co-ordinate-based pixel-wise positional embedding method offers a promising solution for Transformer-based models in medical image analysis. It effectively leverages physical co-ordinate information to enhance performance without compromising spatial resolution and provides a foundation for future advancements in positional embedding techniques for medical applications.
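The idea of embedding physical co-ordinates rather than voxel indices can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract does not specify one, so the helper names (`physical_coords`, `sinusoidal_embedding`), the sin/cos feature choice, and the parameters `dim` and `max_period_mm` are all assumptions made for illustration. The sketch only shows why two scans with different voxel spacing can receive matching embeddings at the same physical location without being resampled.

```python
import math

def physical_coords(n_voxels, spacing_mm, origin_mm=0.0):
    """Physical position (mm) of each voxel centre along one axis."""
    return [origin_mm + i * spacing_mm for i in range(n_voxels)]

def sinusoidal_embedding(position_mm, dim=4, max_period_mm=1000.0):
    """Sin/cos features of a physical position.

    Illustrative choice only (hypothetical, not taken from the paper).
    """
    features = []
    for k in range(dim // 2):
        freq = 1.0 / (max_period_mm ** (2 * k / dim))
        features.append(math.sin(position_mm * freq))
        features.append(math.cos(position_mm * freq))
    return features

# Two hypothetical scans covering the same 8 mm extent at different resolutions.
coarse = physical_coords(n_voxels=5, spacing_mm=2.0)  # 0, 2, 4, 6, 8 mm
fine = physical_coords(n_voxels=9, spacing_mm=1.0)    # 0, 1, ..., 8 mm

# Voxel 2 of the coarse scan and voxel 4 of the fine scan both sit at 4 mm,
# so they get the same embedding even though their indices differ.
emb_coarse = sinusoidal_embedding(coarse[2])
emb_fine = sinusoidal_embedding(fine[4])
```

An index-based embedding would treat voxel 2 and voxel 4 as different positions; keying the embedding on millimetres instead is what lets the spacing information enter the model in this sketch.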

Authors with CRIS profile

Badhan Kumar Das, Lehrstuhl für Informatik 5 (Mustererkennung)

Saahil Islam, Lehrstuhl für Informatik 5 (Mustererkennung)

Andreas Maier, Lehrstuhl für Informatik 5 (Mustererkennung)

Involved external institutions

Siemens Healthineers

United States (USA) (US)

How to cite

APA:

Das, B.K., Zhao, G., Islam, S., Re, T.J., Comaniciu, D., Maier, A., & Gibson, E. (2024). Co-ordinate-based positional embedding that captures resolution to enhance transformer’s performance in medical image analysis. Scientific Reports, 14(1), Article 9380. https://doi.org/10.1038/s41598-024-59813-x

MLA:

Das, Badhan Kumar, et al. "Co-ordinate-based positional embedding that captures resolution to enhance transformer’s performance in medical image analysis." Scientific Reports 14.1 (2024): 9380.
