Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

Contribution Seminar


PaperGPT 2024. 4. 5. 11:02

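Attaching a detection head to the original Mask2Former lowers its performance.

Attaching a segmentation head to the original DINO lowers its performance.

Build an algorithm that handles detection and segmentation well at the same time.

Apply the Mask2Former concept to DINO.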

Unified query selection for mask:

The classification score of each token is considered as the confidence to select top-ranked features and feed them to the decoder as content queries. The selected features also regress boxes and dot-product with the high-resolution feature map to predict masks. The predicted boxes and masks will be supervised by the ground truth and are considered as initial anchors for the decoder.
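The selection step above can be sketched in a few lines. This is only an illustration, not the paper's implementation: plain Python lists stand in for tensors, the function name `select_queries` and its arguments are hypothetical, and the box regression head and the ground-truth supervision are left out.

```python
def select_queries(class_scores, token_features, pixel_features, k):
    """Pick the top-k tokens by classification score and compute mask logits.

    class_scores:   list of N floats, one confidence per encoder token
    token_features: list of N feature vectors of length C
    pixel_features: H x W grid of vectors of length C (high-resolution map)
    All names here are illustrative assumptions, not the authors' API.
    """
    # Rank token indices by confidence, highest first, and keep the top k.
    ranked = sorted(range(len(class_scores)),
                    key=lambda i: class_scores[i], reverse=True)
    top = ranked[:k]
    queries = [token_features[i] for i in top]

    # Mask logits: dot product of each selected feature with every pixel vector.
    masks = [
        [[sum(q * p for q, p in zip(query, pixel)) for pixel in row]
         for row in pixel_features]
        for query in queries
    ]
    return top, queries, masks


scores = [0.1, 0.9, 0.4]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pixels = [[[1.0, 2.0], [3.0, 4.0]]]  # a 1 x 2 feature map with C = 2
top, queries, masks = select_queries(scores, features, pixels, k=2)
print(top)    # [1, 2]
print(masks)  # [[[2.0, 4.0]], [[3.0, 7.0]]]
```

In a real model the dot product would be a batched matrix multiplication, and the resulting logits would be thresholded to obtain binary masks.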

Mask-enhanced anchor box initialization:

We derive boxes from the predicted masks as better anchor box initialization for the decoder.
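Deriving a box from a mask amounts to taking the tight bounding box of the foreground pixels. A minimal sketch, assuming the mask has already been thresholded to 0/1 values and that boxes are inclusive pixel indices in `(x_min, y_min, x_max, y_max)` order (both are assumptions made for illustration, as is the function name `mask_to_box`):

```python
def mask_to_box(mask):
    """Return the tight (x_min, y_min, x_max, y_max) box of a binary mask.

    mask is a list of rows of 0/1 values. Returns None for an empty mask.
    """
    # Rows that contain at least one foreground pixel.
    ys = [y for y, row in enumerate(mask) if any(row)]
    # Column index of every foreground pixel.
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
print(mask_to_box(mask))  # (1, 1, 2, 2)
```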

Unified denoising for mask:

We can treat boxes as a noised version of masks, and train the model to predict masks given boxes as a denoising task. The given boxes for mask prediction are also randomly noised for more efficient mask denoising training.
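The random noising of the given boxes might look like the sketch below. The text above only says the boxes are "randomly noised", so the specific scheme here (a center shift plus a size rescale, both bounded by a `scale` parameter) and the default value of `scale` are assumptions chosen for illustration, and `noise_box` is a hypothetical helper name.

```python
import random


def noise_box(box, scale=0.4, rng=random):
    """Jitter a (cx, cy, w, h) box; the noise scheme is an illustrative assumption."""
    cx, cy, w, h = box
    # Shift the center by at most scale/2 of the box size, so for scale <= 1
    # the noised center stays inside the original box.
    cx += rng.uniform(-1, 1) * scale * w / 2
    cy += rng.uniform(-1, 1) * scale * h / 2
    # Rescale width and height by a factor in [1 - scale, 1 + scale].
    w *= 1 + rng.uniform(-1, 1) * scale
    h *= 1 + rng.uniform(-1, 1) * scale
    return (cx, cy, w, h)


gt_box = (50.0, 50.0, 20.0, 10.0)
noised = noise_box(gt_box, rng=random.Random(0))
print(noised)
```

During training, the noised box would be given to the decoder as an anchor, with the ground-truth mask of the same object as the target to reconstruct.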

Hybrid matching: