Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Modern Data Management Systems

Abstract

Many modern applications (such as large-scale websites, social networks, research projects, and business analytics) have to deal with very large data volumes (also referred to as “big data”) and high read/write loads. These applications require the underlying data management systems to scale well in order to accommodate data growth and increasing workloads. High throughput, low latency, and data availability are also very important, as are data consistency guarantees. Traditional SQL-oriented DBMSs, despite their popularity, ACID transactions, and rich feature sets, do not scale well and are therefore unsuitable in certain cases. Over the last decade, a number of new data management systems and approaches have emerged that are intended to resolve scalability issues. This paper reviews several classes of such systems and the key problems they are able to solve. The large variety of systems and approaches stems from the general trend toward specialization in the field of data management systems: each system is tailored to a particular class of problems. The choice of a specific solution therefore depends on the problem to be solved: the expected load, the ratio of reads to writes, the data storage model and query types, the desired level of consistency, reliability requirements, the availability of client libraries for the chosen language, etc.

About the Authors

S. D. Kuznetsov
ISP RAS, Moscow
Russian Federation


A. V. Poskonin
MSU, Moscow
Russian Federation



For citations:


Kuznetsov S.D., Poskonin A.V. Modern Data Management Systems. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2013;24. (In Russ.)



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)