Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Regular Expressions for Web Advertising Detection based on an Automatic Sliding Algorithm

https://doi.org/10.15514/ISPRAS-2021-33(2)-3

Abstract

This paper presents the automation of a Web advertising recognition algorithm based on regular expressions. Regular expressions, optical character recognition, databases, and automated testing are now critical components of many software implementations. The tests were carried out in three Web browsers. As a result, the algorithm detected advertisements in Spanish that distract users and, above all, extract information from them. The main feature of the algorithm is that it runs automatically and flexibly without requiring access to the source code of the page in question, and in the future it could become an application that runs in the background. In addition, the support of optical character recognition gives acceptable efficiency in detecting advertising.
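
The matching step described in the abstract can be illustrated with a minimal Python sketch. It assumes the text of a page screenshot has already been extracted by an OCR tool and is available as a plain string; the automatic scrolling and the OCR step themselves are outside the sketch. The patterns in `AD_PATTERNS` and the function names `find_ad_phrases` and `is_advertising` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical Spanish advertising patterns, chosen for illustration only;
# the paper's actual regular expressions are not reproduced here.
AD_PATTERNS = [
    re.compile(r"\b(?:oferta|descuento|promoci[oó]n)\b", re.IGNORECASE),
    re.compile(r"\bcompra\s+(?:ya|ahora)\b", re.IGNORECASE),
    re.compile(r"\bsuscr[ií]bete\b", re.IGNORECASE),
]


def find_ad_phrases(ocr_text):
    """Return advertising-like phrases found in OCR text, grouped by pattern order."""
    hits = []
    for pattern in AD_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(ocr_text))
    return hits


def is_advertising(ocr_text, min_hits=1):
    """Flag the text as advertising when at least min_hits phrases match."""
    return len(find_ad_phrases(ocr_text)) >= min_hits
```

Because the function works on text only, it does not need access to the page's source code, which mirrors the property the abstract highlights. The `min_hits` threshold is an assumed tuning parameter.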

About the Authors

Donovan RIAÑO ENRIQUEZ
National Autonomous University of Mexico
Mexico

Student



Rodrigo PINON-AYALA
National Autonomous University of Mexico
Mexico


Guillermo MOLERO-CASTILLO
National Autonomous University of Mexico
Mexico

Ph.D. in Information Technologies, Associate Professor



Everardo BARCENAS
National Autonomous University of Mexico
Mexico

Ph.D., Assistant Professor



Alejandro VELAZQUEZ-MENA
National Autonomous University of Mexico
Mexico

Master of Science, Assistant Professor




For citations:


RIAÑO ENRIQUEZ D., PINON-AYALA R., MOLERO-CASTILLO G., BARCENAS E., VELAZQUEZ-MENA A. Regular Expressions for Web Advertising Detection based on an Automatic Sliding Algorithm. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2021;33(2):65-76. (In Russ.) https://doi.org/10.15514/ISPRAS-2021-33(2)-3



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)