Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Array Database Internals

https://doi.org/10.15514/ISPRAS-2018-30(1)-10

Abstract

As huge amounts of scientific data that need to be stored and processed have emerged, support for large multidimensional arrays has gained close attention in the database world. Devising special database engines that support the array data model became an issue. Developing a well-organized database management system built on a completely uncommon data model required the following tasks: formally defining the data model, building a formal algebra that operates on objects from that model, and devising optimization rules first at the logical level and then at the physical level. These tasks have already been completed by the creators of different array databases. In this paper, array formalization, core algebra, and optimization techniques are reviewed using the examples of AML, RasDaMan, and SciDB, which are developed array database management systems with different algebras and optimization approaches.

About the Authors

V. A. Pavlov
Saint Petersburg State University
Russian Federation


B. A. Novikov
Saint Petersburg State University
Russian Federation


For citations:


Pavlov V.A., Novikov B.A. Array Database Internals. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2018;30(1):137-160. https://doi.org/10.15514/ISPRAS-2018-30(1)-10



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)