Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Development of Legal Document Classification System Based on Support Vector Machine

https://doi.org/10.15514/ISPRAS-2023-35(2)-4

Abstract

This paper was prepared during the development of a text classification system for legal documents, particularly those issued by the Legislative Assembly of Perm Krai. The problem addressed is the lack of solutions that meet regional requirements, chief among them support for the classification scheme used in the region. A study evaluating the application of Natural Language Processing models to this task was conducted. The primary result is that a Support Vector Machine (SVM) is applicable in practice to the categorization of preprocessed legal documents. A server-side API was built to perform the task, and several models were pre-trained on the server side, of which the SVM is preferred.

About the Authors

Yuri NASU
HSE University
Russian Federation

Third-year student in the Software Engineering program, Faculty of Socio-Economic and Computer Science, HSE University, Perm



Vyacheslav Vladimirovich LANIN
HSE University
Russian Federation

Senior Lecturer, Department of Information Technologies in Business, Faculty of Socio-Economic and Computer Science, HSE University, Perm



For citations:


NASU Yu., LANIN V.V. Development of Legal Document Classification System Based on Support Vector Machine. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2023;35(2):49-56. https://doi.org/10.15514/ISPRAS-2023-35(2)-4



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)