Proceedings of the Institute for System Programming of the RAS

Survey of Just-in-Time Query Compilation Methods

https://doi.org/10.15514/ISPRAS-2017-29(3)-11

Abstract

Efficient use of the processor is a decisive factor in the performance of analytical systems, especially as the volume of processed data grows. At the same time, the increasing amount of available main memory makes it possible to significantly reduce the number of accesses to slow disk storage, which pushes the I/O subsystem optimizations traditional for most data processing systems into the background. One of the most effective ways to improve processor utilization and to reduce overhead, which shows up primarily as the cost of interpreting query plans, is to compile queries into executable code at run time (dynamic compilation). Interest in dynamic query compilation methods has recently been growing in both academic and applied development. This article is a literature survey of dynamic query compilation, mainly for relational DBMSs. It presents work from recent years, describes the architectural features of the methods, classifies the works, and reports their main results.

About the authors

E. Y. Sharygin
Institute for System Programming of the RAS; Lomonosov Moscow State University
Russia


R. A. Buchatskiy
Institute for System Programming of the RAS
Russia


For citation:

Sharygin E.Y., Buchatskiy R.A. Survey of Just-in-Time Query Compilation Methods. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2017;29(3):179-224. (In Russ.) https://doi.org/10.15514/ISPRAS-2017-29(3)-11



Content is available under the Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)