Survey of streaming processing field
https://doi.org/10.15514/ISPRAS-2017-29(1)-13
References
1. Apache Apex. https://apex.apache.org/. [Online; accessed 2017-01-02].
2. Apache Flink: Scalable Batch and Stream Data Processing. https://flink.apache.org/. [Online; accessed 2017-01-02].
3. Apache Kafka. https://kafka.apache.org/. [Online; accessed 2017-01-02].
4. Apache Samza. http://samza.apache.org/. [Online; accessed 2017-01-02].
5. Apache Spark™ - Lightning-Fast Cluster Computing. https://spark.apache.org/. [Online; accessed 2017-01-02].
6. Apache Storm. https://storm.apache.org/. [Online; accessed 2017-01-02].
7. Drools - Business Rules Management System (Java™, Open Source). https://www.drools.org/. [Online; accessed 2017-01-02].
8. Guaranteeing message processing. http://storm.apache.org/releases/current/Guaranteeing-message-processing.html. [Online; accessed 2016-12-23].
9. RocksDB a persistent key-value store. http://rocksdb.org/. [Online; accessed 2017-01-02].
10. Spring. https://spring.io/. [Online; accessed 2017-01-02].
11. An Overview of Apache Streaming Technologies. https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/, 2016. [Online; accessed 2017-01-02].
12. Apache Flume. https://flume.apache.org/, 2016. [Online; accessed 2017-01-02].
13. Heron. A realtime, distributed, fault-tolerant stream processing engine from Twitter. https://twitter.github.io/heron/, 2016. [Online; accessed 2017-01-02].
14. Samza. Comparison Introduction. http://samza.apache.org/learn/documentation/latest/comparisons/introduction.html, 2016. [Online; accessed 2017-01-02].
15. Project Reactor. https://projectreactor.io/, 2017. [Online; accessed 2017-01-02].
16. Daniel J. Abadi, Don Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. Aurora: A new model and architecture for data stream management. The VLDB Journal, 12(2):120-139, August 2003.
17. Alexander Alexandrov, Rico Bergmann, Stephan Ewen, Johann-Christoph Freytag, Fabian Hueske, Arvid Heise, Odej Kao, Marcus Leich, Ulf Leser, Volker Markl, Felix Naumann, Mathias Peters, Astrid Rheinländer, Matthias J. Sax, Sebastian Schelter, Mareike Höger, Kostas Tzoumas, and Daniel Warneke. The Stratosphere platform for big data analytics. The VLDB Journal, 23(6):939-964, December 2014.
18. Alexander Alexandrov, Andreas Salzmann, Georgi Krastev, Asterios Katsifodimos, and Volker Markl. Emma in action: Declarative dataflows for scalable data analysis. In Fatma Özcan, Georgia Koutrika, and Sam Madden, editors, Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 2073-2076. ACM, 2016.
19. Henrique C. M. Andrade, Bugra Gedik, and Deepak S. Turaga. Fundamentals of Stream Processing: Application Design, Systems, and Analytics. Cambridge University Press, New York, NY, USA, 1st edition, 2014.
20. Arvind Arasu, Mitch Cherniack, Eduardo F. Galvez, David Maier, Anurag Maskey, Esther Ryvkina, Michael Stonebraker, and Richard Tibbetts. Linear Road: A stream data management benchmark. In Mario A. Nascimento, M. Tamer Özsu, Donald Kossmann, Renée J. Miller, José A. Blakeley, and K. Bernhard Schiefer, editors, (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, August 31 - September 3 2004, pages 480-491. Morgan Kaufmann, 2004.
21. Paris Carbone, Gyula Fóra, Stephan Ewen, Seif Haridi, and Kostas Tzoumas. Lightweight asynchronous snapshots for distributed dataflows. CoRR, abs/1506.08603, 2015.
22. S. Chintapalli, D. Dagit, B. Evans, R. Farivar, T. Graves, M. Holderbaugh, Z. Liu, K. Nusbaum, K. Patil, B. J. Peng, and P. Poulosky. Benchmarking streaming computation engines: Storm, Flink and Spark Streaming. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1789-1792, May 2016.
23. Saliya Ekanayake. Towards a systematic study of big data performance and benchmarking. PhD thesis, School of Informatics and Computing, Indiana University, United States - Indiana, October 2016. http://pqdtopen.proquest.com/doc/1845860615.html?FMT=ABS.
24. Fabian Hueske. Stream Processing for Everyone with SQL and Apache Flink. https://flink.apache.org/news/2016/05/24/stream-sql.html, 2016. [Online; accessed 2017-01-02].
25. Ahmad Ghazal, Tilmann Rabl, Minqing Hu, Francois Raab, Meikel Poess, Alain Crolotte, and Hans-Arno Jacobsen. BigBench: Towards an industry standard benchmark for big data analytics. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pages 1197-1208, New York, NY, USA, 2013. ACM.
26. Lukasz Golab and M. Tamer Özsu. Issues in data stream management. SIGMOD Rec., 32(2):5-14, June 2003.
27. Martin Hirzel, Robert Soulé, Scott Schneider, Buğra Gedik, and Robert Grimm. A catalog of stream processing optimizations. ACM Comput. Surv., 46(4):46:1-46:34, March 2014.
28. Jay Kreps. Putting Apache Kafka To Use: A Practical Guide to Building a Stream Data Platform. https://www.confluent.io/blog/stream-data-platform-1/, https://www.confluent.io/blog/stream-data-platform-2/, 2015. [Online; accessed 2017-01-02].
29. Jay Kreps. Introducing Kafka Streams: Stream processing made simple - Confluent. https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/, 2016. [Online; accessed 2017-01-02].
30. Kostas Tzoumas, Stephan Ewen, and Robert Metzger. High-throughput, low-latency, and exactly-once stream processing with Apache Flink. http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/, 2015. [Online; accessed 2016-12-23].
31. Ruirui Lu, Gang Wu, Bin Xie, and Jingtong Hu. Stream bench: Towards benchmarking modern distributed stream computing frameworks. In Proceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing, UCC ’14, pages 69-78, Washington, DC, USA, 2014. IEEE Computer Society.
32. Nathan Marz and James Warren. Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning Publications Co., Greenwich, CT, USA, 1st edition, 2015.
33. Diana Matar. Benchmarking Fault-Tolerance in Stream Processing Systems. Master’s thesis. TU-Berlin, 2016, 57 pp.
34. Matei Zaharia, Patrick Wendell, and Tathagata Das. Diving into Apache Spark Streaming’s execution model. https://databricks.com/blog/2015/07/30/diving-into-apache-spark-streamings-execution-model.html, 2015. [Online; accessed 2016-12-23].
35. Guido Mazza. Big data streaming processing engines under the umbrella of the Apache foundation: benchmark and industrial applications. Master’s thesis. Università degli Studi di Modena e Reggio Emilia, 2015. http://www.dbgroup.unimo.it/tesi/Magistrale/201516_Guido_Mazza_tesi.pdf
36. Mike Gualtieri, Rowan Curran, Holger Kisker, Emily Miller, and Matthew Izzi. The Forrester Wave™: Big Data Streaming Analytics, Q1 2016. http://www.cakesolutions.net/teamblogs/comparison-of-apache-stream-processing-frameworks-part-2, https://www.sas.com/content/dam/SAS/en_us/doc/analystreport/forrester-big-data-streaming-analytics-108218.pdf, 2016. [Online; accessed 2017-01-02].
37. Petr Zapletal. Comparison of Apache stream processing frameworks: Part 1. http://www.cakesolutions.net/teamblogs/comparison-of-apache-stream-processing-frameworks-part-1, 2016. [Online; accessed 2016-12-23].
38. Petr Zapletal. Comparison of Apache stream processing frameworks: Part 2. http://www.cakesolutions.net/teamblogs/comparison-of-apache-stream-processing-frameworks-part-2, 2016. [Online; accessed 2016-12-23].
39. Tilmann Rabl, Michael Frank, Manuel Danisch, Hans-Arno Jacobsen, and Bhaskar Gowda. The vision of BigBench 2.0. In Proceedings of the Fourth Workshop on Data Analytics in the Cloud, DanaC’15, pages 3:1-3:4, New York, NY, USA, 2015. ACM.
40. Michael Stonebraker, Uğur Çetintemel, and Stan Zdonik. The 8 requirements of real-time stream processing. SIGMOD Rec., 34(4):42-47, December 2005.
41. Tao Feng. Benchmarking Apache Samza: 1.2 million messages per second on a single node. https://engineering.linkedin.com/performance/benchmarking-apache-samza-12-million-messages-second-single-node, 2015. [Online; accessed 2016-12-01].
42. Till Rohrmann. Introducing Complex Event Processing (CEP) with Apache Flink. https://flink.apache.org/news/2016/04/06/cep-monitoring.html, 2016. [Online; accessed 2017-01-02].
43. Vlad Rozov. Throughput, Latency, and Yahoo! Performance Benchmarks. Is there a winner? https://community.mapr.com/community/exchange/blog/2016/12/05/throughput-latency-and-yahoo-performance-benchmarks-is-there-a-winner-by-vlad-rozov, 2016. [Online; accessed 2017-01-02].
Review
For citations:
Samarev R.S. Survey of streaming processing field. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2017;29(1):231-260. (In Russ.) https://doi.org/10.15514/ISPRAS-2017-29(1)-13