Over the past year, large vision-language models (LVLMs) have become a prominent focus of artificial intelligence research. With suitable prompting, these models show promising performance across a variety of downstream tasks. However, their image perception capabilities still leave considerable room for improvement.
Enhanced perceptual abilities for visual concepts are crucial for advancing model development and…
