Discovering Near Duplicate Text in Software Documentation
https://doi.org/10.15514/ISPRAS-2017-29(4)-21
About the Authors
L. D. Kanteev
Russian Federation
Yu. O. Kostyukov
Russian Federation
D. V. Luciv
Russian Federation
D. V. Koznov
Russian Federation
M. N. Smirnov
Russian Federation
For citations:
Kanteev L.D., Kostyukov Yu.O., Luciv D.V., Koznov D.V., Smirnov M.N. Discovering Near Duplicate Text in Software Documentation. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2017;29(4):303-314. https://doi.org/10.15514/ISPRAS-2017-29(4)-21