Style Transfer as a Way to Improve the Generalization Ability of a Neural Network in an Object Detection Task
https://doi.org/10.15514/ISPRAS-2023-35(6)-16
Abstract
This paper proposes a neural network training approach for object detection that uses style transfer as an augmentation. The method improves the network's ability to generalize when localizing objects in an image by improving its handling of low-level features such as textures, color variations, and small changes in shape. The effectiveness of the method is demonstrated experimentally, and numerical values of the object detection metrics are reported on several datasets with different classes. The augmentation is applied with a neural network architecture not previously used for this purpose, capable of transferring an arbitrary number of styles. Another distinctive feature of the approach is that the weights of the styling network are frozen and the network is added to the computation graph of the detection network, which speeds up augmentation.
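As a rough illustration of the idea described in the abstract (a frozen, pretrained style-transfer network placed in front of the detector inside one computation graph), a minimal PyTorch sketch might look as follows. This is not the authors' implementation: the names StyleAugment, pretrained_stylizer, detector, and loader are hypothetical, and the stylizer's (content, style) call signature is an assumption.

```python
import random
import torch
import torch.nn as nn


class StyleAugment(nn.Module):
    """Wraps a pretrained style-transfer network whose weights are frozen,
    so stylization acts purely as a data augmentation inside the graph."""

    def __init__(self, stylizer: nn.Module, p: float = 0.5):
        super().__init__()
        self.stylizer = stylizer.eval()      # inference mode for the stylizer
        for param in self.stylizer.parameters():
            param.requires_grad_(False)      # frozen: no gradients, no updates
        self.p = p                           # probability of stylizing a batch

    @torch.no_grad()
    def forward(self, images: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # Bounding boxes need no adjustment: style transfer changes
        # textures and colors but preserves object geometry.
        if random.random() < self.p:
            return self.stylizer(images, style)
        return images


# Hypothetical training step: the stylized batch feeds the detector directly,
# so augmentation stays on the GPU inside the same computation graph.
# augment = StyleAugment(pretrained_stylizer).to(device)
# for images, targets, style in loader:
#     images = augment(images, style)
#     loss = detector(images, targets)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```

Because the stylizer is frozen and runs under no_grad, it adds no parameters to optimize and no backward pass of its own, which is what makes in-graph augmentation fast.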
About the Authors
Denis Konstantinovich KARACHEV
Russian Federation
Senior Data Scientist. Research interests: computer vision.
Sergey Evgenievich SHTEKHIN
Russian Federation
Team Lead Data Scientist. Research interests: computer vision.
Vladimir Sergeevich TARASYAN
Russian Federation
Cand. Sci. (Phys.-Math.), Associate Professor at Ural State University of Railway Transport, Head of the Mechatronics Department. Research interests: computer vision, intelligent data analysis, intelligent control systems.
Ilya Urevich SMOLIN
Russian Federation
Mid-level Data Scientist. Research interests: computer vision.
Maksim Vladimirovich ISAKOV
Russian Federation
Junior Data Scientist. Research interests: computer vision.
For citations:
KARACHEV D.K., SHTEKHIN S.E., TARASYAN V.S., SMOLIN I.U., ISAKOV M.V. Style Transfer as a Way to Improve the Generalization Ability of a Neural Network in an Object Detection Task. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2023;35(6):247-264. (In Russ.) https://doi.org/10.15514/ISPRAS-2023-35(6)-16