10th International Congress on Information and Communication Technology, held concurrently with the ICT Excellence Awards (ICICT 2025), will take place in London, United Kingdom | February 18 - 21, 2025.
Authors - Chi-Hung Wang, Yu-Siang Siang, Yu-Hsuan Lin, Cheng-Hsien Lin
Abstract - Aerial imagery is widely used in intelligent transportation management and urban planning. However, dynamic objects often occlude critical information such as road signs and traffic markings, which reduces the accuracy of image analysis and the reliability of downstream applications. Traditional methods can partially address this issue, but their high cost and low efficiency make them difficult to apply to large-scale data processing. To overcome these limitations, this study proposes a background averaging technique that combines real-time open-vocabulary object detection with difference-based object detection using depth estimation. This approach enables zero-shot dynamic object removal and improves both processing efficiency and scalability. Experimental results show that the technique outperforms conventional methods across multiple performance metrics. Specifically, the multimodal framework combining depth-based differencing with the YOLO-World model achieves a Precision, Recall, and F1-Score of 0.9062, 1.0000, and 0.9508, respectively. Furthermore, the Structural Similarity Index (SSIM) for background reconstruction reaches 0.9603, exceeding that of traditional YOLO models (SSIM = 0.9375). These findings indicate that the method not only removes dynamic objects effectively but also restores background information accurately, providing robust support for applications in intelligent transportation management and urban planning.
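The abstract does not give implementation details, so the following is only a minimal sketch of what a masked background-averaging step might look like, not the authors' method. It assumes per-frame boolean masks of dynamic objects are already available (for example from a detector such as the YOLO-World and depth-differencing stage described above, which is not reproduced here). The function names, the nested-list grayscale image representation, and the fallback for pixels occluded in every frame are all illustrative assumptions. A small helper also checks that the reported F1-Score follows from the reported Precision and Recall.

```python
def masked_background_average(frames, masks):
    """Average each pixel over the frames in which it is not masked.

    frames: list of equally sized grayscale images (lists of rows of floats).
    masks:  list of same-shaped boolean images; True marks a pixel covered
            by a detected dynamic object (hypothetical detector output).
    """
    if not frames or len(frames) != len(masks):
        raise ValueError("need at least one frame and exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    background = []
    for y in range(height):
        row = []
        for x in range(width):
            # Keep only the samples where no dynamic object covers the pixel.
            visible = [f[y][x] for f, m in zip(frames, masks) if not m[y][x]]
            # Fallback (an assumption): if the pixel is occluded in every
            # frame, average all frames rather than leave a hole.
            samples = visible or [f[y][x] for f in frames]
            row.append(sum(samples) / len(samples))
        background.append(row)
    return background


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy 1x3 scene: a bright "vehicle" covers the middle pixel in frame 2 only.
frames = [
    [[10.0, 20.0, 30.0]],
    [[10.0, 200.0, 30.0]],
    [[12.0, 22.0, 30.0]],
]
masks = [
    [[False, False, False]],
    [[False, True, False]],
    [[False, False, False]],
]
background = masked_background_average(frames, masks)
reported_f1 = f1_score(0.9062, 1.0000)
```

In this toy scene the masked sample (200.0) is excluded, so the middle pixel is averaged from the two unoccluded frames only. With the reported Precision and Recall, the helper returns a value that rounds to the reported F1-Score of 0.9508. A practical version would likely operate on color image arrays and vectorize the per-pixel loop.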