An Accurate Real-Time Object Tracking Method for Resource Constrained Devices
https://doi.org/10.15514/ISPRAS-2024-36(3)-20
Abstract
This paper addresses the challenge of single-object tracking on resource-constrained devices, a critical capability for applications such as autonomous drones and robotics. We propose an efficient real-time tracking system that combines the strengths of transformer-based neural networks with correlation filters. Our research makes two key contributions. First, we conduct a comprehensive analysis of existing object tracking algorithms, identifying their advantages and limitations in resource-constrained environments. Second, we develop a novel hybrid tracking system that integrates neural networks with traditional correlation filters. The hybrid system uses a switching mechanism based on perceptual hashing, which allows it to alternate between fast but less accurate correlation filters and slower but more accurate neural-network-based algorithms. To validate our approach, we implement and test the system on the Jetson Orin platform, which is representative of the edge computing devices commonly used in real-world applications. Our experimental results demonstrate that the proposed system achieves significant improvements in tracking speed while maintaining high accuracy, making it a viable solution for real-time object tracking on devices with limited computational resources. This work paves the way for more advanced and efficient tracking systems in environments where computational power and energy are at a premium.
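To make the switching idea concrete, the sketch below shows one plausible reading of the mechanism described in the abstract; it is not the authors' implementation. It gates a fast correlation-filter tracker (OpenCV's CSRT) with a perceptual-hash comparison against the initial template and hands control to a slower transformer tracker (OpenCV's VitTrack, cf. [21]-[23]) when the tracked patch's appearance drifts. The Hamming-distance threshold, input path, and model file name are illustrative assumptions.

```python
# A minimal sketch of a pHash-gated tracker switch, assuming OpenCV >= 4.9
# with the contrib modules and the VitTrack ONNX model from [23]. This
# illustrates the idea in the abstract, not the authors' implementation;
# the threshold, file names, and re-seeding policy are assumptions.
import cv2

HAMMING_THRESHOLD = 10  # assumed pHash Hamming-distance switch point; tune per task

hasher = cv2.img_hash.PHash_create()  # 64-bit perceptual hash


def crop(frame, box):
    """Clamp the box to the frame and return the patch inside it."""
    x, y, w, h = (int(v) for v in box)
    x, y = max(x, 0), max(y, 0)
    return frame[y:y + h, x:x + w]


cap = cv2.VideoCapture("input.mp4")              # hypothetical input video
ok, frame = cap.read()
box = cv2.selectROI("select target", frame)      # initial template box
template_hash = hasher.compute(crop(frame, box))

vit_params = cv2.TrackerVit_Params()
vit_params.net = "object_tracking_vittrack_2023sep.onnx"  # model from [23]

tracker = cv2.TrackerCSRT_create()               # start with the fast CF tracker
tracker.init(frame, box)
using_fast = True

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)
    if not found:
        break  # target lost; a full system would trigger re-detection here
    patch = crop(frame, box)
    drifted = (patch.size == 0 or
               hasher.compare(template_hash, hasher.compute(patch)) > HAMMING_THRESHOLD)
    if using_fast and drifted:
        # Appearance no longer matches the template: hand over to the slower,
        # more accurate transformer tracker, re-seeded at the last known box.
        tracker = cv2.TrackerVit_create(vit_params)
        tracker.init(frame, tuple(int(v) for v in box))
        using_fast = False
    elif not using_fast and not drifted:
        # The patch resembles the template again: return to the fast tracker.
        tracker = cv2.TrackerCSRT_create()
        tracker.init(frame, tuple(int(v) for v in box))
        using_fast = True
```

In a sketch like this, the switch point trades speed against robustness: a lower threshold hands control to the neural tracker sooner, improving accuracy at the cost of frame rate.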
Keywords
About the Authors
Armen SARDARYAN
Armenia
Received his B.Sc. degree in Applied Mathematics and Computer Science from St. Petersburg State University, Russia, in 2021. He obtained an M.Sc. degree in Intellectual Systems and Robotics from Russian-Armenian University, Armenia, in 2024. He is currently a Ph.D. student in Mathematical Modeling, Numerical Methods and Program Complexes at Russian-Armenian University, Armenia. He is also a researcher at the Center for Advanced Software Technologies (CAST). His research interests include UAVs, deep learning, computer vision, and data analysis.
Vardan SAHAKYAN
Armenia
Researcher at the Center for Advanced Software Technologies (CAST) and a postgraduate student at Russian-Armenian University, specializing in mathematical and software support for computing systems. He holds a B.Sc. in Informatics and Applied Mathematics from the National Polytechnic University of Armenia (2021) and an M.Sc. in Intellectual Systems and Robotics from Russian-Armenian University (2023). His research focuses on UAVs, computer vision, and reinforcement learning.
Vahagn MELKONYAN
Armenia
Received his B.Sc. in Informatics and Applied Mathematics from the National Polytechnic University of Armenia, Armenia, in 2021. In 2023, he earned his M.Sc. degree in Intellectual Systems and Robotics from Russian-Armenian University, Armenia. He is currently pursuing a Ph.D. in Mathematical and Software Support for Computing Machines, Complexes, and Computer Networks at Russian-Armenian University, Armenia. He is also a researcher at the Center for Advanced Software Technologies (CAST). His research interests include UAVs, computer vision, and control algorithms.
Sevak SARGSYAN
Armenia
Received his B.Sc. and M.Sc. degrees in Informatics and Applied Mathematics from Yerevan State University, Armenia, in 2010 and 2012, respectively. In 2016, he obtained his Cand. Sci. (Phys.-Math.) degree in mathematical and software support for computing machines, complexes, and computer networks from the Ivannikov Institute for System Programming of the Russian Academy of Sciences. He currently serves as the head of the system programming department at Russian-Armenian University, Armenia. His research interests include compiler technologies, software security, and software testing.
References
1. Kristan, M. et al., 2022. The tenth visual object tracking VOT2022 challenge results. In European Conference on Computer Vision (pp. 431-460). Cham: Springer Nature Switzerland.
2. Bolme, D.S., Beveridge, J.R., Draper, B.A., and Lui, Y.M., 2010. Visual object tracking using adaptive correlation filters. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2544-2550). IEEE.
3. Henriques, J.F. et al., 2014. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3), pp. 583-596.
4. Danelljan, M., Häger, G., Khan, F.S. and Felsberg, M., 2016. Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(8), pp. 1561-1575.
5. Danelljan, M., Bhat, G., Shahbaz Khan, F. and Felsberg, M., 2016. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V (pp. 472-488). Springer International Publishing.
6. Danelljan, M., Bhat, G., Shahbaz Khan, F. and Felsberg, M., 2017. ECO: Efficient convolution operators for tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6638-6646).
7. Lukezic, A., Vojir, T., Zajc, L.C., Matas, J. and Kristan, M., 2018. Discriminative correlation filter tracker with channel and spatial reliability. International Journal of Computer Vision, 126(7), pp. 671-688.
8. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
9. Wang, N., Zhou, W., Wang, J. and Li, H., 2021. Transformer meets tracker: Exploiting temporal context for robust visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1571-1580).
10. Yan, B. et al., 2021. Learning spatio-temporal transformer for visual tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 10448-10457).
11. Chen, X., Yan, B., Zhu, X., Wang, D., Yang, X. and Lu, H., 2022. Efficient visual tracking via hierarchical cross-attention transformer. In European Conference on Computer Vision (pp. 461-477). Cham: Springer Nature Switzerland.
12. Cui, Y., Jiang, C., Wang, L. and Wu, G., 2022. MixFormer: End-to-end tracking with iterative mixed attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 13608-13618).
13. Cui, Y., Song, T., Wu, G. and Wang, L., 2023. MixFormerV2: Efficient fully transformer tracking. Advances in Neural Information Processing Systems, 36.
14. Mao, H. et al., 2021. PatchNet: Short-range template matching for efficient video processing. arXiv preprint arXiv:2103.07371.
15. Ji, Q. et al., 2021. Real-time embedded object detection and tracking system in Zynq SoC. EURASIP Journal on Image and Video Processing, 2021, pp. 1-16.
16. Liu, W. et al., 2016. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I (pp. 21-37). Springer International Publishing.
17. Du, L., Ho, A.T.S. and Cong, R., 2020. Perceptual hashing for image authentication: A survey. Signal Processing: Image Communication, 81, p. 115713.
18. Mueller, M. et al., 2016. A benchmark and simulator for UAV tracking. In European Conference on Computer Vision.
19. Zhu, P., Wen, L., Bian, X., Ling, H. and Hu, Q., 2021. Detection and tracking meet drones challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11), pp. 7380-7399.
20. Kristan, M. et al., 2023. The VOTS2023 Challenge Performance Measures.
21. Official implementation of the Hierarchical Cross-Attention Transformer (HCAT) tracker: https://github.com/chenxin-dlut/HCAT
22. MixFormerV2 implementation with CUDA, TensorRT and Onnx: https://github.com/maliangzhibi/MixformerV2-onnx
23. ViT tracker from OpenCV_zoo: https://github.com/opencv/opencv_zoo/tree/main/models/object_tracking_vittrack
For citations:
SARDARYAN A., SAHAKYAN V., MELKONYAN V., SARGSYAN S. An Accurate Real-Time Object Tracking Method for Resource Constrained Devices. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2024;36(3):283-294. https://doi.org/10.15514/ISPRAS-2024-36(3)-20