Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Enabling Data Driven Projects for a Modern Enterprise

https://doi.org/10.15514/ISPRAS-2016-28(3)-13

Abstract

With the growing volume of and demand for data, a major concern for an organization trying to implement data-driven projects is not only how to technically collect, cleanse, integrate, and access data, but even more so how and why to use it. There is a lack of unification on a logical and technical level between data scientists, IT departments, and business departments, because it is often unclear where the data comes from, what it looks like, what it contains, and how to process it in the context of existing systems. In this paper we present a platform for data exploration and processing that enables data-driven projects. It does not require a complete organizational revamp, but provides a workflow and a technical basis for such projects.

About the Author

Artyom Topchyan
Yerevan State University
Armenia


For citations:


Topchyan A. Enabling Data Driven Projects for a Modern Enterprise. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2016;28(3):209-230. https://doi.org/10.15514/ISPRAS-2016-28(3)-13



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)