Information Retrieval and Analysis for a Modern Organization

Artyom Topchyan

doi:10.15514/ISPRAS-2016-28(4)-1

Abstract

With the growing volume of and demand for data, a major concern for an organization is to discover what data it actually has, what that data contains, and how and by whom it is being used. The disparate systems used to handle this data grow in number and complexity every year, and unifying them becomes increasingly difficult. In this work we describe an intelligent search engine system designed specifically to tackle the problem of information retrieval and sharing in a large, multifaceted organization that already has many systems in place for each department. The search engine is an integral part of a joint Operational Data Platform (ODP) for data exploration and processing.

Keywords

data-driven projects, information retrieval, streaming processing, mesos, kafka

About the Author

Artyom Topchyan

Yerevan State University
Armenia

For citations:

Topchyan A. Information Retrieval and Analysis for a Modern Organization. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2016;28(4):7-28. https://doi.org/10.15514/ISPRAS-2016-28(4)-1

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)
