
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Effective Method with Compression for Distributed and Federated Cocoercive Variational Inequalities

https://doi.org/10.15514/ISPRAS-2024-36(5)-7

Abstract

Variational inequalities, as an effective tool for solving applied problems including machine learning tasks, have been attracting increasing attention from researchers in recent years. Their use covers a wide range of areas, from reinforcement learning and generative models to traditional applications in economics and game theory. At the same time, the modern world of machine learning cannot be imagined without distributed optimization approaches that can significantly speed up training on large amounts of data. However, faced with the high cost of communication between devices in a computing network, the scientific community strives to develop approaches that make computations cheap and stable. In this paper, we investigate the technique of compressing transmitted information and its application to the distributed variational inequality problem. In particular, we present a method based on advanced techniques originally developed for minimization problems. For the new method, we provide an exhaustive theoretical convergence analysis for cocoercive strongly monotone variational inequalities. We conduct experiments that emphasize the high performance of the presented technique and confirm its practical applicability.
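Since the paper itself is not reproduced on this page, the following is only a minimal sketch of the general idea the abstract refers to: in a distributed setting, each worker transmits a compressed version of its operator value, and the server averages the messages and takes a forward step, which is suitable for cocoercive strongly monotone operators. The Rand-k compressor, the function names, the step size, and the toy linear operators below are illustrative assumptions, not the method or the analysis given by the authors.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased Rand-k compressor: keep k random coordinates, rescale by d/k."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

def compressed_step(x, worker_operators, gamma, k, rng):
    """One synchronous round: every worker sends a compressed F_i(x);
    the server averages the messages and performs the forward update
    x - gamma * g (a reasonable scheme for cocoercive operators)."""
    msgs = [rand_k(F_i(x), k, rng) for F_i in worker_operators]
    g = np.mean(msgs, axis=0)
    return x - gamma * g

# Toy problem: strongly monotone linear operators F_i(x) = A_i x - b_i.
rng = np.random.default_rng(0)
d, n_workers = 20, 5
ops = []
for _ in range(n_workers):
    M = rng.standard_normal((d, d))
    A = M @ M.T / d + np.eye(d)  # symmetric positive definite, so F_i is cocoercive
    b = rng.standard_normal(d)
    ops.append(lambda x, A=A, b=b: A @ x - b)

# With a constant step size and unbiased compression noise, the iterates
# converge to a neighborhood of the solution of the averaged operator.
x = np.zeros(d)
for _ in range(2000):
    x = compressed_step(x, ops, gamma=0.05, k=5, rng=rng)

print("norm of the averaged operator at the final point:",
      np.linalg.norm(np.mean([F(x) for F in ops], axis=0)))
```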

About the Authors

Daniil Olegovich MEDYAKOV
Ivannikov Institute for System Programming of the RAS, Moscow Institute of Physics and Technology
Russian Federation

Master's student at the Moscow Institute of Physics and Technology, laboratory assistant at the Laboratory of Federated Learning Problems, and senior laboratory assistant at the Research Center of Trusted Artificial Intelligence of the Institute for System Programming of the Russian Academy of Sciences. Research interests: stochastic optimization, distributed and federated learning.



Gleb Lvovich MOLODTSOV
Ivannikov Institute for System Programming of the RAS, Moscow Institute of Physics and Technology
Russian Federation

Master's student at the Moscow Institute of Physics and Technology and laboratory assistant at the Laboratory of Federated Learning Problems of the Institute for System Programming of the Russian Academy of Sciences. Research interests: stochastic optimization, distributed and federated learning.



Aleksandr Nikolaevich BEZNOSIKOV
Ivannikov Institute for System Programming of the RAS, Moscow Institute of Physics and Technology, Innopolis University
Russian Federation

Cand. Sci. (Phys.-Math.), Head of the Laboratory of Federated Learning Problems of the Institute for System Programming of the Russian Academy of Sciences and Associate Professor at the Department of Mathematical Foundations of Management of the Moscow Institute of Physics and Technology. Research interests: stochastic optimization, distributed and federated learning.





For citations:


MEDYAKOV D.O., MOLODTSOV G.L., BEZNOSIKOV A.N. Effective Method with Compression for Distributed and Federated Cocoercive Variational Inequalities. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2024;36(5):93-108. (In Russ.) https://doi.org/10.15514/ISPRAS-2024-36(5)-7



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)