Edge AI for Real-Time Vision: Challenges, Limitations, and Future Research Directions
DOI: 10.26389/AJSRP.K091025
Published: 2025-12-15
Abstract
This survey synthesizes methods that enable real-time computer vision on edge hardware under tight latency, energy, and memory constraints. We conducted a systematic review of 150+ studies (2018–2025) spanning lightweight CNNs and ViTs, compression (pruning, quantization, distillation), compiler/runtime optimization, hardware acceleration (NPUs/TPUs/FPGAs), and edge–cloud collaboration. Across comparable settings, INT8 quantization typically yields 2–4× higher throughput and 2–5× lower energy than FP32; representative mobile backbones (e.g., MobileNetV3-L) achieve millisecond-level latency with competitive accuracy on NPUs; and hybrid CNN–ViT models offer a ~15–20% better accuracy–latency balance than pure CNN or ViT baselines when compiler fusion is effective. We also document trade-offs where preprocessing/postprocessing can account for 20–60% of end-to-end time, and cases where compression underperforms due to operator support gaps. Our unique contribution is a unified taxonomy aligned to practical deployment choices (model class × optimization × hardware), plus prescriptive “when-to-use-what” recommendations for mobile, embedded, and micro-edge targets. Recommendations: prefer INT8 with hardware-supported ops; pair hybrids with fusion-aware toolchains; budget for pre/post-processing; and consider split inference for heavy workloads.
Keywords:
Edge AI; real-time vision; lightweight CNNs; vision transformers; quantization; ViT hybrid; pruning; distillation; compiler; NPU; TPU; FPGA; microcontroller
License
Copyright (c) 2025 The Arab Institute for Science and Research Publishing (AISRP)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.