
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Some problems on graph databases

https://doi.org/10.15514/ISPRAS-2016-28(4)-12

Abstract

Graph databases appear to be the most popular and relevant class of non-relational databases. Their popularity stems from the relative ease of applying them to problems in which data have large numbers of relations, such as protein-protein interaction networks. With the spread of fast Internet connections, graph databases have found another interesting application: the representation of social networks. Moreover, graph edges are stored explicitly, which lowers the computational cost of graph traversal. Such systems have proved natural and in demand in the era of the Internet and social networks. By size and substance, the most significant class of graph database problems is data mining. It includes problems such as association rule learning, data classification and categorization, clustering, regression analysis, etc. In this review, we consider the graph database data mining problems that are most commonly presented in modern literature; their popularity is reflected in the large number of publications on them at major conferences in recent years. The problems examined are influence maximization, motif mining, pattern matching and SimRank computation. For each type of problem we analyzed various papers and described the basic algorithms proposed 10-15 years ago. We also considered state-of-the-art solutions as well as some important intermediate versions. This review consists of 6 sections: besides the introduction and conclusion, each section is dedicated to its own type of graph database problem.
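To make one of the reviewed problems concrete, here is a minimal sketch of the naive iterative SimRank computation as defined by Jeh and Widom [5]: two distinct nodes are similar if their in-neighbours are similar, with s(a, a) = 1 and s(a, b) = 0 when either node has no in-neighbours. It assumes the graph is given as a Python dict mapping every node to a list of its in-neighbours; the function name `simrank` and the parameters `c` (decay factor) and `iterations` are illustrative choices, not taken from the review.

```python
from itertools import product


def simrank(in_neighbors, c=0.8, iterations=10):
    """Naive iterative SimRank (illustrative sketch).

    in_neighbors: dict mapping every node (including nodes with no
        in-edges) to a list of its in-neighbours.
    c: decay factor in (0, 1).
    iterations: number of fixed-point iterations to run.
    Returns a dict mapping each ordered pair (a, b) to its score.
    """
    nodes = list(in_neighbors)
    # Initial scores: 1 for identical nodes, 0 for all other pairs.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0  # a node is maximally similar to itself
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                new_sim[(a, b)] = 0.0  # no in-neighbours: similarity is 0
                continue
            # Average similarity over all pairs of in-neighbours, scaled by c.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim
```

For example, with `{"hub": [], "a": ["hub"], "b": ["hub"]}` the nodes `a` and `b` share their only in-neighbour, so their score equals `c` (0.8) from the first iteration onward. Each iteration sums over all pairs of in-neighbours for all node pairs, i.e. roughly O(n²d²) work for n nodes with average in-degree d, which is why several of the listed SimRank papers target faster or more scalable computation.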

About the Author

R. I. Guralnik
Saint-Petersburg State University
Russian Federation


References

1. Pettey C., Goasduff L. “Gartner says solving 'big data' challenge involves more than just managing volumes of data,” Gartner, Inc., 2011. Available at: http://www.gartner.com/newsroom/id/1731916, accessed 07.08.2016.

2. DIS Group. “Big data,” 2014. Available at: http://www.dis-group.ru/solutions/data_management/big_data/, accessed 07.08.2016.

3. Bartenev M.V., Vishnyakov I.E. “Graph database usage to optimize analysis of billing information,” Engineering Journal: Science and Innovations [Inzhenernyi zhurnal: nauka i innovatsii], issue 11, 2013. Available at: http://engjournal.ru/catalog/it/hidden/1058.html

4. Manyika J., Chui M., Brown B., Bughin J., Dobbs R., Roxburgh C., Byers A. H. “Big data: The next frontier for innovation, competition, and productivity,” McKinsey Global Institute, 2011.

5. Jeh G., Widom J. “SimRank: a measure of structural-context similarity,” in Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 538-543, ACM, 2002.

6. Fogaras D., Rácz B. “Scaling link-based similarity search,” in Proceedings of the 14th international conference on World Wide Web, pp. 641-650, ACM, 2005.

7. He G., Feng H., Li C., Chen H. “Parallel SimRank computation on large graphs with iterative aggregation,” in Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 543-552, ACM, 2010.

8. Li C., Han J., He G., Jin X., Sun Y., Yu Y., Wu T. “Fast computation of SimRank for static and dynamic information networks,” in Proceedings of the 13th International Conference on Extending Database Technology, pp. 465-476, ACM, 2010.

9. Li P., Liu H., Yu J. X., He J., Du X. “Fast single-pair SimRank computation,” in Proceedings of the 2010 SIAM International Conference on Data Mining (SDM), pp. 571-582, SIAM, 2010.

10. Yu W., Lin X., Le J. “Taming computational complexity: Efficient and parallel SimRank optimizations on undirected graphs,” in International Conference on Web-Age Information Management, pp. 280-296, Springer, 2010.

11. Lizorkin D., Velikhov P., Grinev M., Turdakov D. “Accuracy estimate and optimization techniques for SimRank computation,” The VLDB Journal: The International Journal on Very Large Data Bases, vol. 19, no. 1, pp. 45-66, 2010.

12. Yu W., Lin X., Zhang W., Chang L., Pei J. “More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks,” Proceedings of the VLDB Endowment, vol. 7, no. 1, pp. 13-24, 2013.

13. Yu W., Zhang W., Lin X., Zhang Q., Le J. “A space and time efficient algorithm for SimRank computation,” World Wide Web, vol. 15, no. 3, pp. 327-353, 2012.

14. Maehara T., Kusumoto M., Kawarabayashi K.-i. “Scalable SimRank join algorithm,” in 2015 IEEE 31st International Conference on Data Engineering, pp. 603-614, IEEE, 2015.

15. Fujiwara Y., Nakatsuji M., Shiokawa H., Onizuka M. “Efficient search algorithm for SimRank,” in 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 589-600, IEEE, 2013.

16. Zhu R., Zou Z., Li J. “SimRank computation on uncertain graphs,” in 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp. 565-576, May 2016.

17. Domingos P., Richardson M. “Mining the network value of customers,” in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 57-66, ACM, 2001.

18. Kempe D., Kleinberg J., Tardos E. “Maximizing the spread of influence through a social network,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 137-146, ACM, 2003.

19. Chen W., Wang C., Wang Y. “Scalable influence maximization for prevalent viral marketing in large-scale social networks,” in Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1029-1038, ACM, 2010.

20. Goyal A., Bonchi F., Lakshmanan L. V. S. “A data-based approach to social influence maximization,” Proceedings of the VLDB Endowment, vol. 5, no. 1, pp. 73-84, 2011.

21. Jung K., Heo W., Chen W. “IRIE: Scalable and robust influence maximization in social networks,” in 2012 IEEE 12th International Conference on Data Mining, pp. 918-923, IEEE, 2012.

22. Kim J., Kim S.-K., Yu H. “Scalable and parallelizable processing of influence maximization for large-scale social networks?,” in 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 266-277, IEEE, 2013.

23. Kim D., Lee J.-G., Lee B. S. “Topical influence modeling via topic-level interests and interactions on social curation services.”

24. Li G., Chen S., Feng J., Tan K.-l., Li W.-s. “Efficient location-aware influence maximization,” in Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pp. 87-98, ACM, 2014.

25. Wang X., Zhang Y., Zhang W., Lin X. “Distance-aware influence maximization in geo-social network.”

26. Khan A., Zehnder B., Kossmann D. “Revenue maximization by viral marketing: A social network host’s perspective.”

27. Li H., Bhowmick S. S., Cui J., Gao Y., Ma J. “GetReal: Towards realistic selection of influence maximization strategies in competitive networks,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1525-1537, ACM, 2015.

28. Borgs C., Brautbar M., Chayes J., Lucier B. “Maximizing social influence in nearly optimal time,” in Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 946-957, Society for Industrial and Applied Mathematics, 2014.

29. Tang Y., Xiao X., Shi Y. “Influence maximization: Near-optimal time complexity meets practical efficiency,” in Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pp. 75-86, ACM, 2014.

30. Mahdian M., Ye Y., Zhang J. “Improved approximation algorithms for metric facility location problems,” in International Workshop on Approximation Algorithms for Combinatorial Optimization, pp. 229-242, Springer, 2002.

31. Gallagher B. “Matching structure and semantics: A survey on graph-based pattern matching,” AAAI FS, vol. 6, pp. 45-53, 2006.

32. Giugno R., Shasha D. “GraphGrep: A fast and universal method for querying graphs,” in Proceedings of the 16th International Conference on Pattern Recognition, vol. 2, pp. 112-115, IEEE, 2002.

33. Natarajan M. “Understanding the structure of a drug trafficking organization: a conversational analysis,” Crime Prevention Studies, vol. 11, pp. 273-298, 2000.

34. Fan W., Li J., Ma S., Tang N., Wu Y., Wu Y. “Graph pattern matching: From intractable to polynomial time,” Proceedings of the VLDB Endowment, vol. 3, pp. 264-275, Sept. 2010.

35. Milo R., Shen-Orr S., Itzkovitz S., Kashtan N., Chklovskii D., Alon U. “Network motifs: simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824-827, 2002.

36. Kashtan N., Itzkovitz S., Milo R., Alon U. “mfinder tool guide,” Department of Molecular Cell Biology and Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel, Tech. Rep., 2002.

37. Wernicke S., Rasche F. “FANMOD: a tool for fast network motif detection,” Bioinformatics, vol. 22, no. 9, pp. 1152-1153, 2006.

38. Grochow J. A., Kellis M. “Network motif discovery using subgraph enumeration and symmetry-breaking,” in Annual International Conference on Research in Computational Molecular Biology, pp. 92-106, Springer, 2007.

39. Schreiber F., Schwöbbermeyer H. “Frequency concepts and pattern detection for the analysis of motifs in networks,” in Transactions on Computational Systems Biology III, pp. 89-104, Springer, 2005.

40. Omidi S., Schreiber F., Masoudi-Nejad A. “MODA: an efficient algorithm for network motif discovery in biological networks,” Genes & Genetic Systems, vol. 84, no. 5, pp. 385-395, 2009.

41. Ribeiro P., Silva F. “G-tries: an efficient data structure for discovering network motifs,” in Proceedings of the 2010 ACM Symposium on Applied Computing, pp. 1559-1566, ACM, 2010.

42. Gurukar S., Ranu S., Ravindran B. “COMMIT: A scalable approach to mining communication motifs from dynamic networks,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 475-489, ACM, 2015.



For citations:


Guralnik R.I. Some problems on graph databases. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2016;28(4):193-216. (In Russ.) https://doi.org/10.15514/ISPRAS-2016-28(4)-12



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)