Dissolving Is Amplifying: Towards Fine-Grained Anomaly Detection

1 King Abdullah University of Science and Technology
2 NEC Laboratories China

Abstract

Medical images often contain fine-grained features, such as tumors or hemorrhages, that are crucial for diagnosis yet may be too subtle for conventional methods to detect. In this paper, we introduce DIA (dissolving is amplifying), a fine-grained anomaly detection framework for medical images. DIA has two novel components. First, we introduce dissolving transformations. Our main observation is that generative diffusion models are feature-aware: applied to medical images in a particular way, they can remove or diminish fine-grained discriminative features such as tumors or hemorrhages. Second, we introduce an amplifying framework based on contrastive learning that learns semantically meaningful representations of medical images in a self-supervised manner. The amplifying framework contrasts additional pairs of images with and without dissolving transformations applied, thereby boosting the learning of fine-grained feature representations. DIA significantly improves medical anomaly detection, with an AUC gain of around 18.40% over the baseline method, and achieves overall state-of-the-art results against other benchmark methods.

The Dissolving Effects

Dissolving transformations remove or deemphasize fine-grained discriminative features by using data-specific diffusion models. Essentially, a dissolving transformation is a single reverse diffusion process of a diffusion model trained on data from the same domain as the input images.
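To make the idea concrete, here is a minimal sketch under stated assumptions. It treats the clean input image as if it were the noisy sample at timestep `t` of a DDPM with a linear beta schedule, and applies one deterministic reverse (denoising) step without adding fresh noise. The names `linear_beta_schedule`, `toy_eps_model`, and `dissolve` are hypothetical. `toy_eps_model` is only a stand-in for a noise predictor trained on the data-specific dataset: it labels the high-frequency residual of the image as "noise". The exact formulation DIA uses may differ.

```python
import numpy as np


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Standard linear DDPM noise schedule (assumed, not taken from DIA)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(x, t):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x, t).

    It ignores t and returns the high-frequency residual of a 2-D image
    (image minus its 3x3 box blur), i.e. it treats fine detail as noise.
    """
    padded = np.pad(x, 1, mode="edge")
    h, w = x.shape
    blurred = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return x - blurred


def dissolve(x, t, eps_model, betas, alphas, alpha_bars):
    """One deterministic DDPM reverse step applied directly to a clean image.

    Uses the DDPM posterior mean
        (x - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
    with x standing in for x_t and no sampling noise added.
    """
    eps = eps_model(x, t)
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    return (x - coef * eps) / np.sqrt(alphas[t])
```

With a real data-specific model, one would replace `toy_eps_model` with the trained network and vary `t` to control how strongly features are dissolved; this is an assumption about how the gradual dissolving could be tuned, not a documented setting.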

Desirable: gradual dissolving of semantic features (ours)


When the diffusion model is not trained on data from the same domain as the input images, it fails to interpret the instance-specific fine details and therefore does not remove the correct features from the image.

Undesirable: incorrect semantics; images drift into the PneumoniaMNIST domain (diffusion model trained on PneumoniaMNIST)

Undesirable: incorrect semantics; images drift into the CIFAR10 domain (diffusion model trained on CIFAR10)

Undesirable: incorrect semantics with many detailed artefacts (Stable Diffusion)

Feature Learning By Contrastive Learning

DIA learns representations that distinguish fine-grained discriminative features in medical images in two steps. First, DIA applies a dissolving strategy based on dissolving transformations, which remove or deemphasize fine-grained discriminative features. Second, DIA uses the amplifying framework to contrast images transformed with and without dissolving transformations. We call it the amplifying framework because it amplifies the representation of fine-grained discriminative features.
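The sketch below shows one way dissolved views could enter a contrastive objective. It assumes, as in contrastive methods that use distribution-shifting transformations, that an image's dissolved view serves as an extra negative for that image, while a second ordinary augmented view serves as the positive. The function `nt_xent_with_dissolved` and this pairing scheme are illustrative assumptions, not DIA's published loss. It is also a simplified InfoNCE (NT-Xent) loss: each anchor is compared only against the second-view and dissolved embeddings, not against the other anchors.

```python
import numpy as np


def l2_normalize(z, eps=1e-12):
    """Project each row onto the unit sphere so dot products are cosine similarities."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def nt_xent_with_dissolved(z_a, z_b, z_dis, temperature=0.5):
    """Simplified InfoNCE loss with dissolved views as additional negatives.

    z_a:   (N, D) embeddings of the first augmented view (anchors).
    z_b:   (N, D) embeddings of the second augmented view (row i is the positive for anchor i).
    z_dis: (N, D) embeddings of dissolved views (hypothetical extra negatives).
    Returns the mean loss over the N anchors.
    """
    z_a, z_b, z_dis = map(l2_normalize, (z_a, z_b, z_dis))
    candidates = np.concatenate([z_b, z_dis], axis=0)  # (2N, D)
    logits = z_a @ candidates.T / temperature           # (N, 2N)
    # Numerically stable log-softmax over each anchor's candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = z_a.shape[0]
    # The positive for anchor i sits in column i (its own second view).
    return -log_prob[np.arange(n), np.arange(n)].mean()
```

Under this assumption, the loss is lowest when an image's embedding stays close to its ordinary second view but far from its dissolved view. Because the dissolved view differs mainly in the removed fine-grained features, the encoder is pushed to encode exactly those features, which is one way to read the term amplifying.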

Results