
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


MapReduce: within, outside, or on the side-by-side with parallel DBMSs?

Abstract

The paper discusses approaches to using MapReduce technology together with analytical DBMSs. Three approaches are considered: implementing MapReduce within the kernel of a parallel DBMS; using MapReduce as the communication infrastructure of a new parallel DBMS; and using MapReduce in symbiotic unity with a parallel DBMS. As examples of the first approach, we consider features of the massively parallel DBMSs Greenplum Database and nCluster, from Greenplum and Aster Data Systems, respectively. The second approach is used in the HadoopDB project of Yale and Brown universities. Finally, the third approach is being developed by Vertica.

About the Author

Sergey D. Kuznetsov.
ISP RAS, Moscow
Russian Federation





For citation:

Kuznetsov S.D. MapReduce: within, outside, or on the side-by-side with parallel DBMSs? Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2010;19. (In Russ.)



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)