Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Methods for construction of socio-demographic profile of Internet users

https://doi.org/10.15514/ISPRAS-2015-27(4)-7

Abstract

This paper surveys methods for constructing socio-demographic profiles of Internet users. The demographic attributes considered include gender, age, political and religious views, region, and relationship status. The surveyed methods detect these attributes from a user's profile and messages. Most of the reviewed works address gender detection; age, political views, and region have also attracted researchers' interest. Most solutions are based on supervised machine learning. The survey examines each step of such a solution: data collection, feature extraction, feature selection, classifiers, and evaluation of methods.

About the Authors

A. Gomzin
ISP RAS; CMC MSU
Russian Federation


S. Kuznetsov
ISP RAS; CMC MSU; Moscow Institute of Physics and Technology
Russian Federation



For citations:


Gomzin A., Kuznetsov S. Methods for construction of socio-demographic profile of Internet users. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2015;27(4):129-144. (In Russ.) https://doi.org/10.15514/ISPRAS-2015-27(4)-7



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)