Large Language Models (LLMs) have demonstrated remarkable potential in performing complex tasks by building intelligent agents. As individuals increasingly engage with the digital world, these models serve as virtual embodied interfaces for a wide range of daily activities. The emerging field of GUI automation aims to develop intelligent agents that can significantly streamline human workflows…
