Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Automating cluster creation and management for Apache Spark in Openstack cloud

https://doi.org/10.15514/ISPRAS-2014-26(4)-3

Abstract

This article addresses the automation of cluster creation and management for the Apache Spark MapReduce implementation in Openstack environments. The result of the project is an open-source (Apache 2.0 license) toolchain for on-demand creation of virtual clusters in Openstack environments. The article surveys the solutions for cluster automation in cloud environments that existed at the start of 2014. It also gives a brief overview of the issues and problems in the Openstack Heat project, which provides a compatibility layer for the Amazon EC2 API. The final implementation presented in the article is an almost straightforward port of the existing toolchain for automated Apache Spark cluster creation in the Amazon EC2 environment, with some improvements. A prepared base-system virtual machine image for Openstack is also provided. Plans for further work involve the Ansible project. Using Ansible for this problem would make it possible to implement a versatile, environment-agnostic solution that can work with any cloud computing service provider, a set of Docker containers, or bare-metal clusters, without depending on a prepared operating system image. The current work does not use Ansible because it lacked key features when the project started. The solution presented in this article has already been tested in a production environment for a research article on graph theory.

About the Authors

O. Borisenko
ISP RAS
Russian Federation


D. Turdakov
ISP RAS
Russian Federation


S. Kuznetsov
ISP RAS
Russian Federation



For citations:


Borisenko O., Turdakov D., Kuznetsov S. Automating cluster creation and management for Apache Spark in Openstack cloud. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2014;26(4):33-44. (In Russ.) https://doi.org/10.15514/ISPRAS-2014-26(4)-3



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)