Style Transfer as a Way to Improve the Generalization Ability of a Neural Network in an Object Detection Task
https://doi.org/10.15514/ISPRAS-2023-35(6)-16
Abstract
This paper proposes a neural network training approach for object detection that uses style transfer as an augmentation. The method improves the network's ability to generalize when localizing objects in an image by improving its handling of low-level features such as textures, color variations, and small changes in shape. The effectiveness of the method is demonstrated experimentally, and numerical values of the object detection metrics are reported on several datasets with different classes. The augmentation is applied with a neural network architecture not previously used for this purpose, capable of transferring an arbitrary number of styles. Another distinctive feature of the approach is that the weights of the styling network are frozen and the network is added to the computation graph of the detection network, which speeds up augmentation.
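As a rough illustration of the idea described in the abstract (a frozen, pretrained style-transfer network placed in front of the detector inside one computation graph), a minimal PyTorch sketch might look as follows. This is not the authors' implementation: the names StyleAugment, pretrained_stylizer, detector, and loader are hypothetical, and the stylizer's (content, style) call signature is an assumption.

```python
import random
import torch
import torch.nn as nn


class StyleAugment(nn.Module):
    """Wraps a pretrained style-transfer network whose weights are frozen,
    so stylization acts purely as a data augmentation inside the graph."""

    def __init__(self, stylizer: nn.Module, p: float = 0.5):
        super().__init__()
        self.stylizer = stylizer.eval()      # inference mode for the stylizer
        for param in self.stylizer.parameters():
            param.requires_grad_(False)      # frozen: no gradients, no updates
        self.p = p                           # probability of stylizing a batch

    @torch.no_grad()
    def forward(self, images: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # Bounding boxes need no adjustment: style transfer changes
        # textures and colors but preserves object geometry.
        if random.random() < self.p:
            return self.stylizer(images, style)
        return images


# Hypothetical training step: the stylized batch feeds the detector directly,
# so augmentation stays on the GPU inside the same computation graph.
# augment = StyleAugment(pretrained_stylizer).to(device)
# for images, targets, style in loader:
#     images = augment(images, style)
#     loss = detector(images, targets)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```

Because the stylizer is frozen and runs under no_grad, it adds no parameters to optimize and no backward pass of its own, which is what makes in-graph augmentation fast.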
About the Authors
Denis Konstantinovich KARACHEV
Russian Federation
Senior Data Scientist. Research interests: computer vision.
Sergey Evgenievich SHTEKHIN
Russian Federation
Team Lead Data Scientist. Research interests: computer vision.
Vladimir Sergeevich TARASYAN
Russian Federation
Cand. Sci. (Phys.-Math.), Associate Professor at Ural State University of Railway Transport, Head of the Mechatronics Department. Research interests: computer vision, intelligent data analysis, intelligent control systems.
Ilya Urevich SMOLIN
Russian Federation
Mid-level Data Scientist. Research interests: computer vision.
Maksim Vladimirovich ISAKOV
Russian Federation
Junior Data Scientist. Research interests: computer vision.
For citations:
KARACHEV D.K., SHTEKHIN S.E., TARASYAN V.S., SMOLIN I.U., ISAKOV M.V. Style Transfer as a Way to Improve the Generalization Ability of a Neural Network in an Object Detection Task. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2023;35(6):247-264. (In Russ.) https://doi.org/10.15514/ISPRAS-2023-35(6)-16