
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


A method of automatically estimating user age using social connections

https://doi.org/10.15514/ISPRAS-2016-28(6)-12

Abstract

This work addresses methods for detecting the age of social network users. Social networks let users fill in profiles that may include their age. Profiles are often incomplete, so the task of inferring the missing attributes arises. Both explicit and predicted values are used in recommender and marketing systems. Predicted values can also be used to determine the demographic profiles of online communities and to infer the target audience of online marketing campaigns. This paper proposes a method for predicting missing age values. The method uses the following information available from the social network: users' explicitly stated ages and the social graph. The graph contains nodes representing users and communities. A community is a special page that unites users by interest. Friendship relations between users and users' subscriptions to communities are represented as edges of the social graph. The method is based on label propagation in the friendship and subscription graphs. Users' ages are represented by labels that are propagated through the graph. The algorithm proceeds as follows: initialize user labels from explicit profiles; build a vector model containing the distributions of neighbours' ages grouped by user age; compute the weights of users and communities and propagate labels to communities; build a vector model that takes the computed weights into account; propagate labels to users who have not filled in their age in the profile. The paper describes the algorithm and presents experimental results showing that friendship relations are more useful than communities for age prediction in the social network.
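
The general idea of propagating age labels over the friendship graph can be illustrated with a minimal Python sketch. It is a deliberate simplification and not the method of the paper: it omits communities, the vector model of neighbour age distributions, and the user and community weights, and simply assigns each unlabelled user the median age of already labelled friends. The function name `propagate_ages`, the `rounds` parameter, and the toy data are assumptions made for illustration only.

```python
from statistics import median


def propagate_ages(friends, known_ages, rounds=3):
    """Fill in missing ages from friends' ages (simplified label propagation).

    friends: dict mapping each user to a list of friends.
    known_ages: dict mapping users with an explicit profile age to that age.
    Assumption: the median of labelled friends' ages is used as the estimate;
    the paper itself uses a weighted vector model instead.
    """
    ages = dict(known_ages)  # explicit labels are never overwritten
    for _ in range(rounds):
        updates = {}
        for user, neighbours in friends.items():
            if user in ages:
                continue  # already labelled (explicitly or in an earlier round)
            neighbour_ages = [ages[n] for n in neighbours if n in ages]
            if neighbour_ages:
                updates[user] = median(neighbour_ages)
        if not updates:
            break  # nothing new could be labelled
        ages.update(updates)  # apply all labels of this round at once
    return ages


# Hypothetical toy graph: three users have stated their age, two have not.
friends = {
    "ann": ["bob", "cat", "dan"],
    "bob": ["ann", "cat"],
    "cat": ["ann", "bob"],
    "dan": ["ann", "eve"],
    "eve": ["dan"],
}
known_ages = {"bob": 20, "cat": 24, "eve": 40}
predicted = propagate_ages(friends, known_ages)
```

In this toy graph `ann` receives the median of her labelled friends' ages (20 and 24) and `dan` receives the age of his only labelled friend. Labels assigned in one round become available to other unlabelled users in the next round, which is what lets age information spread beyond direct neighbours of explicitly labelled users.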

About the Authors

A. G. Gomzin
Institute for System Programming of the Russian Academy of Sciences; Lomonosov Moscow State University
Russian Federation


S. D. Kuznetsov
Institute for System Programming of the Russian Academy of Sciences; Lomonosov Moscow State University; Moscow Institute of Physics and Technology (State University)
Russian Federation




For citations:


Gomzin A.G., Kuznetsov S.D. A method of automatically estimating user age using social connections. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2016;28(6):171-184. (In Russ.) https://doi.org/10.15514/ISPRAS-2016-28(6)-12



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)