Object segmentation across images and videos is a complex yet pivotal task. Traditionally, the field has progressed in silos: referring image segmentation (RIS), few-shot image segmentation (FSS), referring video object segmentation (RVOS), and video object segmentation (VOS) have each evolved independently. This disjointed development has resulted in inefficiencies and an inability to…
