Uncovering the information core in recommender systems

Zeng, Wei; Zeng, An; Liu, Hao; Shang, Ming-Sheng; Zhou, Tao

doi:10.1038/srep06140

Article
Open access
Published: 21 August 2014

Scientific Reports volume 4, Article number: 6140 (2014)

Abstract

With the rapid growth of the Internet and the overwhelming amount of information that people are confronted with, recommender systems have been developed to effectively support users' decision-making in online systems. So far, much attention has been paid to designing new recommendation algorithms and improving existing ones. However, few works have considered how different users contribute to the performance of a recommender system. Such studies can help us improve recommendation efficiency by excluding irrelevant users. In this paper, we argue that in each online system there exists a group of core users who carry most of the information for recommendation. With them alone, recommender systems can already generate satisfactory recommendations. Our core user extraction method enables recommender systems to achieve 90% of the accuracy of top-L recommendation by taking only 20% of the users into account. A detailed investigation reveals that these core users are not necessarily the large-degree users. Moreover, they tend to select high-quality objects and their selections are well diversified.

Introduction

The Internet nowadays provides us with abundant online content, which makes it very time-consuming to go over every detail and find the information we need. This is often referred to as the information overload problem. To address it, search engines and recommender systems have been widely investigated^1,2,3,4,5. A search engine returns relevant content based on the keywords given by users. Compared to a search engine, a recommender system provides more personalized services by predicting users' potential interests from their historical choices. These techniques have already been successfully applied on some well-known web sites, such as google.com, amazon.com, taobao.com and youtube.com.

Among recommendation algorithms, the most famous one from computer science is the so-called collaborative filtering (CF), with user-based and item-based versions^2,6,7. The user-based CF estimates each user's preferences by referring to the tastes of similar users, while the item-based CF recommends items that are similar to the target user's selected items. Recently, some physical concepts have been introduced to recommendation algorithms. Since recommender systems can be naturally represented by user-object bipartite networks^8,9,10, some classic network-based propagation processes, such as mass diffusion^11,12 and heat conduction¹³, have been applied to find the most relevant objects for users. The hybridization of these two propagation-based methods can effectively solve the diversity-accuracy dilemma in recommendation¹⁴. Based on these algorithms, many extensions have been made. For example, preferential diffusion¹⁵, biased heat conduction¹⁶ and network manipulation¹⁷ are able to further improve the recommendation accuracy for small-degree objects (i.e. addressing the cold-start problem). More recently, the long-term influence of different recommendation algorithms on network evolution has been studied¹⁸.

Related works overwhelmingly focus on designing new algorithms, while, to the best of our knowledge, the effects of the underlying user-object bipartite network on the recommendation results have been largely overlooked. More specifically, the relevance of individual users to the recommendation process has not yet been well addressed. In online systems, it is reasonable to imagine that there are some “expert” users who know the quality of objects in certain fields well. By referring to them, recommender systems can generate satisfactory recommendations for the users who share interests with these expert users. Besides, there are some malicious online users who seek to bias the output of recommender systems¹⁹. Eliminating these attackers is very helpful for enhancing the robustness of recommender systems²⁰. Therefore, investigating users' roles in recommendation can improve the efficiency as well as the robustness of recommendation algorithms by excluding irrelevant and unreliable users.

At the individual level, it has already been pointed out that considering only the K users most similar to the target user can improve the recommendation accuracy under the user-based collaborative filtering framework (known as the “KNN” method)². In this paper, we find that this phenomenon also exists at the system level, i.e., one can achieve satisfactory recommendations for all users by referring only to a small group of core users. We first study the relevance of users in a recommender system and find that there exists an “information core” consisting of some key users. The core comprises around 20 percent of the users in the whole system. The recommendation accuracy obtained by relying only on the core users can reach 90 percent of that obtained with all users. This is very meaningful from a practical point of view since it can significantly speed up the recommendation process in real online systems. Moreover, the analysis in this paper is helpful for online retailers to categorize customers and provide better personalized services for them.

Results

A recommender system can be naturally represented by a bipartite network G(U, O, E), where U = {u₁, u₂, …, u_n}, O = {o₁, o₂, …, o_m} and E = {e₁, e₂, …, e_l} are sets of users, objects and links, respectively. The bipartite network is denoted by an adjacency matrix A, where the element a_iα = 1 if user i has collected object α and 0 otherwise (we use Greek and Latin letters, respectively, for object- and user-related indices). The degree of an object α and a user i, k_α and k_i, represent respectively the number of users who have collected object α and the number of objects collected by user i. For a target user to whom we will recommend objects, each of her uncollected objects will be assigned a score by the recommendation algorithm and the top-L objects with the largest scores will be recommended. Different algorithms generate different object scores and thus different recommendation lists for users.
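As a minimal sketch of this representation, the adjacency matrix A can be stored sparsely as a mapping from each user to the set of objects she has collected, from which both kinds of degree follow directly. The toy data and the names `collected`, `user_degrees` and `object_degrees` are illustrative assumptions, not part of the original study.

```python
from collections import Counter

# Hypothetical toy data: user -> set of collected objects
# (i.e. the nonzero entries a_i_alpha = 1 of the adjacency matrix A).
collected = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o1", "o2", "o3", "o4"},
}

def user_degrees(data):
    """k_i: number of objects collected by each user i."""
    return {user: len(objs) for user, objs in data.items()}

def object_degrees(data):
    """k_alpha: number of users who have collected each object alpha."""
    return dict(Counter(obj for objs in data.values() for obj in objs))

k_user = user_degrees(collected)
k_object = object_degrees(collected)
num_links = sum(k_user.values())  # l, the number of links in E
```

Both degree sequences sum to the same number of links, which is a quick sanity check on any loaded dataset.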

The mass diffuse¹¹ (MD for short) algorithm works by assigning objects an initial level of “resource” denoted by the vector $\vec{f}$ (where f_α is the resource possessed by object α) and then redistributing it via the transformation $\vec{f}' = W\vec{f}$, where $W_{\alpha\beta} = \frac{1}{k_\beta}\sum_{i=1}^{n}\frac{a_{i\alpha}a_{i\beta}}{k_i}$ is a column-normalized m × m matrix. For a target user, the resulting recommendation list of uncollected objects is sorted according to $f'_\alpha$ in descending order and the top-L objects with the most resources will be recommended.
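The redistribution can be sketched as two explicit steps rather than as a matrix product: each object splits its resource evenly among the users who collected it, and each user then splits what she received evenly among her objects, which is what a column-normalized redistribution matrix does. The sketch assumes that every object collected by the target user starts with one unit of resource (as stated for the hybrid method in Methods); the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def mass_diffusion(data, target):
    """Return the final resource of every object reached from `target`."""
    k_object = Counter(obj for objs in data.values() for obj in objs)
    # Step 1 (objects -> users): each object collected by the target holds one
    # unit of resource and splits it evenly among all users who collected it.
    user_resource = defaultdict(float)
    for obj in data[target]:
        for user, objs in data.items():
            if obj in objs:
                user_resource[user] += 1.0 / k_object[obj]
    # Step 2 (users -> objects): each user splits what she received evenly
    # among the objects she collected.
    scores = defaultdict(float)
    for user, resource in user_resource.items():
        for obj in data[user]:
            scores[obj] += resource / len(data[user])
    return dict(scores)

def recommend(data, target, length):
    """Top-`length` uncollected objects, sorted by descending resource."""
    scores = mass_diffusion(data, target)
    candidates = [obj for obj in scores if obj not in data[target]]
    # Ties are broken by object name so that the output is deterministic.
    return sorted(candidates, key=lambda obj: (-scores[obj], obj))[:length]

toy = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o1", "o2", "o3", "o4"},
}
md_scores = mass_diffusion(toy, "u1")
top_two = recommend(toy, "u1", 2)
```

Because every split is even, the total resource is conserved: the scores for `u1` sum to the two units initially placed on her two objects.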

The MD method can be described in a more intuitive way: the initial resources placed on objects are first evenly divided among neighboring users and then evenly divided among those users' neighboring objects. In a real network, there can be many neighboring users who have objects in common with the target user. We argue here that only a few of the most relevant neighboring users should be taken into account in the diffusion. Doing so reduces both the computation required for recommendation and the noise introduced by less relevant users. Accordingly, we propose the K-Nearest Neighbor Mass Diffuse (KNNMD) method, in which only the K nearest neighbors of the target user are considered. Four different ways can be used to identify the most relevant neighbors: (1) random; (2) degree-based; (3) resource-based; (4) similarity-based. When the resources are located at the user side, the random method randomly selects K users as the neighbors; the degree-based method selects the K users with the largest degrees; and the resource-based method selects the K users with the largest received resources. The similarity-based method is slightly more involved than the previous three. First, we compute the similarities between the target user and the other users. The cosine index²¹ is used to measure the similarity: $s_{ij} = \frac{|\Gamma_i \cap \Gamma_j|}{\sqrt{|\Gamma_i|\,|\Gamma_j|}}$, where Γ_i is the set of objects collected by user i. The similarity-based method then selects the K users with the highest similarities to the target user. A visual representation of KNNMD is given in Fig. 1.
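A minimal sketch of the similarity-based variant follows. It makes two assumptions that the text does not settle: the share each selected neighbor receives from an object is still 1/k of that object's full degree (the share of unselected users is simply dropped), and the target user is not counted among her own neighbors, which does not affect the ranking of uncollected objects. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cosine(objs_i, objs_j):
    """Cosine index between two users' sets of collected objects."""
    if not objs_i or not objs_j:
        return 0.0
    return len(objs_i & objs_j) / math.sqrt(len(objs_i) * len(objs_j))

def nearest_neighbors(data, target, k):
    """The k users most similar to `target` (ties broken by user name)."""
    others = [user for user in data if user != target]
    others.sort(key=lambda user: (-cosine(data[target], data[user]), user))
    return others[:k]

def knn_mass_diffusion(data, target, k):
    """Mass diffusion in which only the k nearest neighbors receive resource."""
    neighbors = nearest_neighbors(data, target, k)
    k_object = Counter(obj for objs in data.values() for obj in objs)
    user_resource = defaultdict(float)
    for obj in data[target]:
        for user in neighbors:
            if obj in data[user]:
                # Assumption: split by the full object degree, so resource
                # sent to unselected users is discarded.
                user_resource[user] += 1.0 / k_object[obj]
    scores = defaultdict(float)
    for user, resource in user_resource.items():
        share = resource / len(data[user])
        for obj in data[user] - data[target]:  # score uncollected objects only
            scores[obj] += share
    return dict(scores)

toy = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o1", "o2", "o3", "o4"},
    "u4": {"o4", "o5"},
}
nearest = nearest_neighbors(toy, "u1", 1)
knn_scores = knn_mass_diffusion(toy, "u1", 2)
```

Swapping `nearest_neighbors` for a random, degree-based or resource-based selector would give the other three variants described above.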

We compare the above four KNNMD methods on three real datasets: Douban, Last.fm and Flickr (see details about the datasets in Methods). The recall metric (see the definition of recall in Methods and the definitions and results for more metrics in SI) is chosen to measure the accuracy of recommendation algorithms. A higher recall value corresponds to a higher recommendation accuracy. The results of these KNNMD methods are presented in Fig. 2. It can be seen that the best method is the similarity-based KNNMD, which outperforms the standard MD method for K ≥ 20 in Douban, K ≥ 20 in Last.fm and K ≥ 40 in Flickr. The optimal neighbor number K* of this method is around 180 in Douban, 300 in Last.fm and 280 in Flickr (see Table S2 in SI). Moreover, one can see that the accuracy of the MD method is significantly improved by removing the less relevant neighboring users (see SI for details).

Note that the above analysis is at the individual level and the K neighbors selected for different individuals are different. The good performance of KNNMD raises an important question: at the system level, which kinds of users are the most relevant ones for recommending objects to all users? We denote this group of users as the information core of the recommender system.

We thus propose four approaches to assess the relevance of users and find the information core. The most straightforward one is simply based on the degrees of users, with the underlying hypothesis that the relevance of a user is reflected by her degree and that the information core consists of the users with the largest degrees. The second one is to randomly select a set of users as the information core. This method is used as a benchmark for comparison. In the third method, we first compute the top-N (e.g. N = 10, 20, 50) most similar neighbors of each user based on the cosine similarities and then count how many times a user appears in other users' top-N lists. The users who appear most frequently are selected as the information core. The fourth one is similar to the third but takes into account the rank of each user in other users' top-N neighbor lists. Suppose user i belongs to user j's top-N neighbors and his position is pth; then the score of i is 1/p. If i also appears in other users' top-N neighbor lists, we sum his scores to obtain his final weight: $w_i = \sum_{j:\, i \in N(j)} \frac{1}{p_{ij}}$, where N(j) is the top-N neighbor set of user j, j runs over all users whose set N(j) contains i, and p_ij is i's position in j's top-N neighbor list. Finally, the users with the largest sums are selected as the information core. A toy example of the frequency-based and the rank-based methods to find the information core from the network in Fig. 1 is illustrated in Fig. 3.
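The frequency-based and rank-based scores can be sketched together, since both come from a single pass over every user's top-N neighbor list. The sketch assumes that users with zero similarity are left out of a neighbor list and that ties are broken by user name; neither choice is specified in the text, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cosine(objs_i, objs_j):
    """Cosine index between two users' sets of collected objects."""
    if not objs_i or not objs_j:
        return 0.0
    return len(objs_i & objs_j) / math.sqrt(len(objs_i) * len(objs_j))

def top_n_neighbors(data, user, n):
    """Up to n most similar users, most similar first (zero similarity excluded)."""
    sims = {v: cosine(data[user], data[v]) for v in data if v != user}
    ranked = sorted((v for v in sims if sims[v] > 0), key=lambda v: (-sims[v], v))
    return ranked[:n]

def core_scores(data, n):
    """Frequency count and rank-based weight (sum of 1/position) per user."""
    frequency = Counter()
    rank_weight = defaultdict(float)
    for j in data:
        for position, i in enumerate(top_n_neighbors(data, j, n), start=1):
            frequency[i] += 1
            rank_weight[i] += 1.0 / position
    return frequency, rank_weight

def information_core(data, scores, r):
    """The fraction r of users with the largest scores."""
    size = max(1, round(r * len(data)))
    return sorted(data, key=lambda u: (-scores.get(u, 0), u))[:size]

toy = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o1", "o2", "o3", "o4"},
    "u4": {"o4", "o5"},
}
frequency, rank_weight = core_scores(toy, n=2)
rank_core = information_core(toy, rank_weight, r=0.5)
```

In this toy network `u1` and `u2` appear equally often in other users' lists, but `u1` sits at a higher position, so only the rank-based weight separates them.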

To study the importance of the information core in recommendation, we make use of four recommendation algorithms: MD¹¹, similarity-based KNNMD (in the following, KNNMD refers to the similarity-based KNNMD), the hybrid of mass diffusion and heat conduction¹⁴ (Hybrid for short; the details are presented in Methods) and user-based collaborative filtering⁵ (UCF for short; the details are presented in Methods). We first compute the accuracy of each algorithm in the traditional way, i.e. using all users in the system. We also compute its accuracy when only the users in the information core are taken into account. Given the information core C and the target user i, only the users in C receive resources from i's collected objects in the MD and Hybrid methods. Other users do not receive resources even if they have objects in common with i. The users who have received resources then redistribute them back to the object side. For the KNNMD method, we first compute i's top-K neighbors within the information core C, and only these K neighbors receive resources and redistribute them. Similarly, the top-K neighbors are restricted to C in the UCF method. This procedure is equivalent to removing non-core users from the network. However, we still make recommendations to these non-core users. The importance of the information core in recommendation can be seen by comparing the accuracy contributed by the core to that of the traditional methods. The comparison of traditional mass diffuse and information-core-based mass diffuse is presented in Fig. 4.

We again use the recall metric to measure the accuracy of the algorithms (the results for the precision metric are quite similar, see SI). The results are presented in Fig. 5, where r denotes the fraction of users in the information core. When r = 1, all users are used in the recommendation algorithms, which is equivalent to the traditional method. Generally speaking, the recommendation accuracy tends to decrease as r decreases, since less information is available to the recommendation algorithm. The accuracy decreases more slowly when we choose the rank-based method to identify the information core. Taking the KNNMD method on the Douban data as an example, 91.4% (0.1886/0.2063) of the accuracy can be achieved when we use only 20% of the users (r = 0.2). Notably, for the MD method on the Douban data, the accuracy with only 20% of the users (r = 0.2) can be even slightly better than that with all users (r = 1). Similar results are also observed in the other two datasets. This is of great importance since the algorithmic efficiency of recommendation methods can be largely improved if we consider fewer users. In Fig. 5, in some cases the random method performs even better than the degree-based method. This is because users in the degree-based information core tend to choose small-degree items¹⁰. Therefore, recommendations based on these large-degree users will mainly include niche (small-degree) items and the recommendation accuracy is accordingly low. The random method selects core users randomly. Although these core users are not selected by any calculation, they are well separated and their selections are diversified, so the recommendation results are better than those of the degree-based method. However, the random method cannot outperform the frequency-based and rank-based methods. Moreover, it must be noted that some non-core users might be isolated, since they have not selected any objects in common with the core users. We define the non-isolated users as those who have collected at least one object in common with the core users. We find that the ratio of non-isolated users is more than 99.9% even when r = 0.1, indicating that only a very small fraction of users have no common items with the core users. In this paper, we randomly recommend L objects to these isolated users, since they have a negligible effect on our conclusions. From a practical point of view, one can recommend the most popular objects to them.
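The ratio of non-isolated users can be checked with a few lines. This sketch assumes the ratio is taken over all users, so that core users count as non-isolated themselves; the text does not state the denominator, and the function name and toy data are hypothetical.

```python
def non_isolated_ratio(data, core):
    """Fraction of users sharing at least one object with some core user."""
    core_objects = set()
    for user in core:
        core_objects |= data[user]
    # Assumption: core users are included in both numerator and denominator.
    non_isolated = sum(1 for objs in data.values() if objs & core_objects)
    return non_isolated / len(data)

toy = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o1", "o2", "o3", "o4"},
    "u4": {"o4", "o5"},
}
ratio_with_u3 = non_isolated_ratio(toy, {"u3"})
ratio_with_u1 = non_isolated_ratio(toy, {"u1"})
```

With `u1` as the only core user, `u4` shares no object with the core and would need a fallback such as random or popularity-based recommendation.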

From the above results, it can be seen that the rank-based method is better than the frequency-based method at identifying the information core, which indicates that the rank at which a user appears in others' top-N neighbor lists matters when assessing her relevance in the recommender system. If a user appears in most users' top-N neighbor lists with a high rank, she should be considered a key member of the online system, since many users' recommendations will rely on her. Both methods are generally better than the random and degree-based methods. Among these methods, the degree-based method is the worst, which indicates that large-degree users are not necessarily the “experts”. Taking the MD method on the Douban data as an example, the accuracy of the degree-based method is much lower than that of the rank-based method when r = 0.2. In many previous works on real networks with heterogeneous degree distributions, attention has been overwhelmingly paid to the hubs (nodes with the largest degrees). Our finding here suggests that degree may not be the proper criterion for judging the importance of nodes in the information filtering process, which is analogous to the weak-ties effect in information filtering²².

Apart from recall, we also consider a global accuracy metric called the ranking score (see the results in SI)^14,32. For a target user, all her uncollected items are ranked by the recommendation algorithm, and the average position of the objects in the probe set is defined as the ranking score, which can be used to measure the accuracy of algorithms. The smaller the ranking score, the better the recommendation. In contrast to the recall results, the best method under this metric is the degree-based information core method rather than the rank-based one. This is because more objects receive resources if we choose large-degree users as the core users. Moreover, the ranking scores of all information-core-based methods are much worse than those of the corresponding standard methods, which consider all users. In fact, any attempt to reduce the number of users in recommendation tends to increase the ranking score²³. However, measuring the accuracy of the top-L objects in each individual's recommendation list is actually more important from a practical point of view, since in real recommender systems individuals are only presented with the top-L objects. We also compare the diversity of the information-core-based methods with that of their traditional counterparts. Our results indicate that the rank-based information core generally increases the recommendation diversity (see the results in SI).

We then investigate the structural properties of information cores. For simplicity, the relative size of the information core (r) is set to 0.2. After obtaining the core users from the different methods, we find that the core users detected by the rank-based and frequency-based methods overlap heavily, but for any other pair of methods only a small fraction of core users overlap. We then compute the average degree of the information core users, 〈k_u〉, and the average degree of the objects selected by these core users, 〈k_o〉. The result is presented in Table 1. It can be seen that 〈k_u〉 is the largest in the degree-based information core and the smallest in the rank-based information core. This indicates that our core users are not necessarily the large-degree users. Moreover, 〈k_o〉 of the rank-based information core is large, indicating that our core users indeed tend to select high-quality objects. On the contrary, 〈k_o〉 of the degree-based information core is very small, as shown in Table 1. The detailed distribution of core users over different user degrees can be seen in SI.

Table 1 The average degrees of users in the information core and the average item degrees selected by the information core users

Secondly, we investigate the intra-similarity (〈s_in〉) and inter-similarity (〈s_ex〉) of the core. The intra-similarity is defined as the average cosine similarity between pairs of core users. The inter-similarity is defined as the average cosine similarity over all user pairs that consist of one core user and one non-core user. The result is presented in Table 2. For the random method, it is natural that the intra-similarity and inter-similarity are both low, because the core users are randomly selected. However, one can observe that both the intra-similarity and inter-similarity of the rank-based method are smaller than those of the degree-based method. This indicates that our core users are well diversified. The result is different for the Last.fm data because the user degree in this network is more homogeneous.
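The two averages can be sketched directly from their definitions. The helper names and toy data are hypothetical, and an empty set of pairs is assumed to give a similarity of zero.

```python
import math
from itertools import combinations, product

def cosine(objs_i, objs_j):
    """Cosine index between two users' sets of collected objects."""
    if not objs_i or not objs_j:
        return 0.0
    return len(objs_i & objs_j) / math.sqrt(len(objs_i) * len(objs_j))

def mean(values):
    """Arithmetic mean; assumed to be 0.0 when there are no pairs."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def intra_similarity(data, core):
    """Average cosine similarity over all pairs of core users."""
    pairs = combinations(sorted(core), 2)
    return mean(cosine(data[i], data[j]) for i, j in pairs)

def inter_similarity(data, core):
    """Average cosine similarity over all (core user, non-core user) pairs."""
    outside = sorted(user for user in data if user not in core)
    pairs = product(sorted(core), outside)
    return mean(cosine(data[i], data[j]) for i, j in pairs)

toy = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o1", "o2", "o3", "o4"},
    "u4": {"o4", "o5"},
}
s_in = intra_similarity(toy, {"u1", "u3"})
s_ex = inter_similarity(toy, {"u1", "u3"})
```

Both functions enumerate user pairs explicitly, so on datasets with tens of thousands of users they would need sampling or a sparse matrix product instead.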

Table 2 The intra-similarity and inter-similarity of the core

According to the above analysis, it is clear that the core users from these methods have different properties. In order to further understand their roles in the network and in recommendation, we consider three indices: degree heterogeneity, clustering coefficient and diffusion coverage (see the results in SI). For each real network, we first construct a corresponding sub-network which consists only of core users. We study the item degree heterogeneity and the clustering coefficient in the sub-networks. The results indicate that the frequency-based and rank-based core users tend to connect to some common items, while the links of the degree-based core users are more evenly distributed among items. In fact, the clustering coefficient has been shown to be closely related to the efficiency of the recommendation process³³. The higher clustering coefficient of the rank-based core users explains why this method leads to a better recommendation accuracy than the others in Fig. 5. Many recommendation methods are based on a three-step diffusion (even the well-known collaborative filtering can be regarded as a diffusion process). The diffusion normally starts from the target user. In the first step, it finds the objects selected by the target user. In the second step, the users who selected the same objects as the target user are found; they are referred to as relevant users. In the last step, the items selected by the relevant users are found; they are called relevant items. The relevant items with the highest diffusion resource are recommended to the target user. Obviously, the smaller the number of relevant items, the stronger the filtering effect of the diffusion. We find that if degree-based core users are used, the diffusion coverage is the same as in the case where all users are used, indicating a poor filtering effect. If the frequency-based or rank-based core users are used, the diffusion coverage is significantly narrowed, so that only the most relevant items are reached by the diffusion.
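The three-step diffusion described above can be sketched as a reachability count. The exact definition of diffusion coverage is given in the SI and is not reproduced here, so this sketch assumes it is the number of distinct relevant items reached from the target user; the function name, the `allowed_users` parameter and the toy data are hypothetical.

```python
def diffusion_coverage(data, target, allowed_users=None):
    """Number of relevant items reachable from `target` in three steps.

    Step 1: the objects selected by the target user.
    Step 2: the relevant users, i.e. those sharing an object with the target
            (optionally restricted to `allowed_users`, e.g. an information core).
    Step 3: the relevant items, i.e. everything those users selected.
    """
    target_objects = data[target]
    relevant_users = {
        user
        for user, objs in data.items()
        if user != target
        and objs & target_objects
        and (allowed_users is None or user in allowed_users)
    }
    relevant_items = set()
    for user in relevant_users:
        relevant_items |= data[user]
    return len(relevant_items)

toy = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o1", "o2", "o3", "o4"},
    "u4": {"o4", "o5"},
}
coverage_all = diffusion_coverage(toy, "u1")
coverage_core = diffusion_coverage(toy, "u1", allowed_users={"u2"})
```

Restricting the second step to a small set of users narrows the set of reachable items, which is the filtering effect discussed in the text.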

To obtain the information core, one needs to compute similarities over all user pairs. Therefore, the complexity of obtaining the core users is O(n²m). Once we have the information core, the computation time of recommendation algorithms can be reduced by up to a factor of 5 (r = 0.2). Moreover, although it takes time to obtain the information core, the core is relatively stable in real systems. Taking the Douban dataset as an example, more than 90% of the information core users (frequency-based and rank-based) stay the same in two adjacent months. It is therefore enough to update the information core once a month, which significantly reduces the computational cost and makes our method practical.

Discussion

During the past decade, recommender systems have been widely investigated in several research fields, including computer science, physics and sociology⁵. Up to now, many recommendation algorithms have been proposed. However, little attention has been paid to the effect of the underlying user-object bipartite network on the recommendation process. In this paper, we study the relevance of individual users and find that there exists an information core whose size is small compared to the whole network. The users in the information core usually appear in many users' top-N neighbor lists with high ranks. For many recommendation algorithms, one can achieve very good recommendation accuracy by using only the core users. A similar idea can be extended to item-based collaborative filtering: one can use only the links of the core users to calculate the item similarity matrix and obtain accurate recommendations²⁴. This work may find wide applications in practice. For one thing, it can significantly speed up the recommendation process in real online systems, since the recommendation engine only has to deal with a small fraction of the data. For another, the analysis in this paper can also help online retailers categorize customers and provide better personalized services for them.

There are still many open issues, such as extending a similar technique to monopartite networks for link prediction²⁵. Another interesting open issue is the location of these core users in the network. Specifically, one can investigate whether the core users are spread across different communities. Such a study may lead to better topology-based methods for identifying the core users in networks. Finally, the evolution of the information core is also an important topic. An information core that is relatively stable over time lowers the frequency with which core users must be updated and thus further reduces the computational cost in practice.

Methods

Data description

We use three datasets to test the accuracy of algorithms, namely Douban²⁶, Last.fm²⁷ and Flickr²⁸. Douban (www.douban.com), launched on March 6, 2005, is a Chinese Web 2.0 web site providing user rating, review and recommendation services for movies, books and music. It is also the largest online Chinese book, movie and music database and one of the largest online communities in China. The raw data contains user activities before Aug 2010 and we randomly sample 17,000 users who have collected at least ten songs. Last.fm (www.last.fm) is a popular worldwide social music site; the objects in this dataset are artists, which can be collected from the Last.fm API. The raw data consists of 360,000 users and we randomly sample 30,000 users who have collected at least five items (artists). Flickr (www.flickr.com) is a photo-sharing site based on a social network. The data used in this paper is individuals' group membership in Flickr, i.e. their participation in groups. Accordingly, we provide group recommendations for users instead of object recommendations^29,30. We randomly sample 30,000 users who have joined at least ten groups. We treat the user-object (user-group) interaction matrix as binary, that is, an element equals 1 if the user has viewed or rated the object (joined the group) and 0 otherwise (see Table 3). In this paper, we filter out users whose degrees are smaller than 10 (5 for Last.fm), as it is very difficult to recommend items to such small-degree users.

Table 3 The statistics of the Douban, Last.fm and Flickr datasets. The sparsity is defined as $l/(n \times m)$

Evaluating recommender systems

Each dataset is randomly divided into two parts: the training set E^T and the probe set E^P. The training set contains 80% of the original links and the recommendation algorithm runs on it³¹. The remaining links form the probe set, which is used to assess the performance of the recommendation algorithm. The result is obtained by averaging over five runs with independent random divisions into training set and probe set¹⁵.
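A minimal sketch of one such random division is given below. The function name, the `seed` argument and the toy data are illustrative assumptions; averaging over five runs would correspond to calling the function with five different seeds.

```python
import random

def split_links(data, train_fraction=0.8, seed=0):
    """Randomly divide all (user, object) links into training and probe lists."""
    # Sort first so that the result depends only on the seed, not on set order.
    links = sorted((user, obj) for user, objs in data.items() for obj in objs)
    rng = random.Random(seed)
    rng.shuffle(links)
    cut = int(round(train_fraction * len(links)))
    return links[:cut], links[cut:]

toy = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o1", "o2", "o3", "o4"},
    "u4": {"o4", "o5"},
}
train_links, probe_links = split_links(toy, train_fraction=0.8, seed=1)
```

The split is made over links rather than over users, so a user can end up with no links in the probe set; such users need special handling when recall is averaged.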

Each user i may have a certain number of links (corresponding to objects) in the probe set; we denote this number by E_i. After the recommendation list (with length L) is generated for user i, we calculate d_i(L) as the number of her objects in the probe set which appear in the recommendation list. The recall of this user is defined as R_i(L) = d_i(L)/E_i and the recall of the whole system is defined as $R(L) = \frac{1}{n}\sum_{i=1}^{n} R_i(L)$. A higher recall value indicates a higher accuracy of the recommendation algorithm³².
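The recall computation can be sketched as follows. One assumption is added here: users with no links in the probe set are skipped, because their individual recall d_i(L)/E_i is undefined. The function name and the example lists are hypothetical.

```python
def recall_at_length(recommended, probe, length):
    """Average over users of (probe objects in the top-`length` list) / E_i."""
    per_user = []
    for user, probe_objects in probe.items():
        if not probe_objects:
            continue  # assumption: skip users whose probe set is empty
        top = set(recommended.get(user, [])[:length])
        hits = len(top & probe_objects)          # d_i(L)
        per_user.append(hits / len(probe_objects))  # R_i(L)
    return sum(per_user) / len(per_user) if per_user else 0.0

# Hypothetical ranked recommendation lists and probe sets for two users.
recommended = {"u1": ["o3", "o4", "o5"], "u2": ["o1", "o4"]}
probe = {"u1": {"o3", "o9"}, "u2": {"o4"}}
recall_top1 = recall_at_length(recommended, probe, 1)
recall_top2 = recall_at_length(recommended, probe, 2)
```

Here `u1` can never exceed a recall of 0.5 because `o9` is not in her list at all, which illustrates why recall is reported for a fixed list length L.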

Hybrid algorithm

When recommending objects for user i, the hybrid method works by assigning each object collected by user i one unit of resource. The initial resources are denoted by the vector $\vec{f}$, where f_α is the resource possessed by object α. They are then redistributed via the transformation $\vec{f}' = W\vec{f}$, where $W_{\alpha\beta} = \frac{1}{k_\alpha^{1-\lambda}k_\beta^{\lambda}}\sum_{j=1}^{n}\frac{a_{j\alpha}a_{j\beta}}{k_j}$ is the redistribution matrix, with $k_\alpha$ and $k_j$ denoting the degree of object α and user j, respectively. λ is a tunable parameter which adjusts the relative weight between the mass diffusion algorithm (λ = 1) and the heat conduction algorithm (λ = 0)¹⁴.
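A direct, unoptimized sketch of the hybrid redistribution is shown below, assuming the redistribution matrix of ref. 14, in which the sum over users of a_jα a_jβ / k_j is divided by k_α^(1−λ) k_β^λ. The function name, the parameter name `lam` and the toy data are hypothetical.

```python
from collections import Counter

def hybrid_scores(data, target, lam):
    """Final resource of every object for `target` under the hybrid method."""
    k_object = Counter(obj for objs in data.values() for obj in objs)
    scores = {}
    for alpha in k_object:
        total = 0.0
        for beta in data[target]:  # each collected object starts with one unit
            # Sum of 1/k_j over users j who collected both alpha and beta.
            shared = sum(
                1.0 / len(objs)
                for objs in data.values()
                if alpha in objs and beta in objs
            )
            total += shared / (k_object[alpha] ** (1 - lam) * k_object[beta] ** lam)
        scores[alpha] = total
    return scores

toy = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o1", "o2", "o3", "o4"},
}
mass_like = hybrid_scores(toy, "u1", lam=1.0)  # pure mass diffusion
heat_like = hybrid_scores(toy, "u1", lam=0.0)  # pure heat conduction
```

At λ = 1 the more popular object `o3` outranks `o4`, while at λ = 0 the division by the receiving object's own degree lifts the niche object `o4` to the same score, which is the popularity-versus-diversity trade-off that λ tunes.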

User-based collaborative filtering

In the user-based collaborative filtering method, the basic assumption is that similar users usually collect the same objects. Accordingly, the recommendation score of object α for the target user i is $p_{i\alpha} = \sum_{j \in N(i)} s_{ij}a_{j\alpha}$, where N(i) is the set of top-K neighbors of user i and s_ij is the similarity between users i and j. The cosine index is chosen to measure their similarity: $s_{ij} = \frac{\sum_{\alpha} a_{i\alpha}a_{j\alpha}}{\sqrt{k_i k_j}}$, where k_i is the degree of user i.
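A minimal sketch of user-based collaborative filtering follows, assuming the unnormalized score (the sum of the similarities of the top-K neighbors who collected the object). The optional `core` argument, which restricts the neighbor search to an information core as described in Results, is a hypothetical parameter, as are the function names and the toy data.

```python
import math
from collections import defaultdict

def cosine(objs_i, objs_j):
    """Cosine index: number of common objects divided by sqrt(k_i * k_j)."""
    if not objs_i or not objs_j:
        return 0.0
    return len(objs_i & objs_j) / math.sqrt(len(objs_i) * len(objs_j))

def ucf_scores(data, target, k, core=None):
    """Scores of uncollected objects from the top-k most similar users."""
    candidates = [
        user for user in data
        if user != target and (core is None or user in core)
    ]
    candidates.sort(key=lambda user: (-cosine(data[target], data[user]), user))
    scores = defaultdict(float)
    for neighbor in candidates[:k]:
        similarity = cosine(data[target], data[neighbor])
        if similarity <= 0:
            continue  # neighbors with no common object contribute nothing
        for obj in data[neighbor] - data[target]:
            scores[obj] += similarity
    return dict(scores)

toy = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o1", "o2", "o3", "o4"},
    "u4": {"o4", "o5"},
}
full_scores = ucf_scores(toy, "u1", k=2)
core_scores_u1 = ucf_scores(toy, "u1", k=2, core={"u2", "u4"})
```

Dividing each score by the sum of the neighbors' similarities, as some UCF formulations do, would not change the ranking for a given target user.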

References

Sergey, B. & Lawrence, P. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30, 107–117 (1998).
Article Google Scholar
Adomavicius, G. & Tuzhilin, A. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data. Eng. 17, 734–749 (2005).
Article Google Scholar
Koren, Y., Bell, R. & Volinsky, C. Matrix factorization techniques for recommender systems. Computer 42, 30–37 (2009).
Article Google Scholar
Tang, J., Wu, S., Sun, J. M. & Su, H. Cross-domain collaboration recommendation. in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: KDD '12, Beijing, China. New York: ACM Press. (2012 August).
Lü, L. Y. et al. Recommender systems. Phys. Rep. 519, 1–49 (2012).
Article ADS Google Scholar
Chen, K. et al. Collaborative Personalized Tweet Recommendation. in Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval: SIGIR '12, Portland, USA. New York: ACM Press. (2012 September).
Xu, B., Bu, J. J., Chen, C. & Cai, D. An Exploration of Improving Collaborative Recommender Systems via User-item Subgroups. in Proceedings of the 21st International Conference on World Wide Web: WWW '12, Lyon, France. New York: ACM Press. (2012 April).
Lambiotte, R. & Ausloos, M. Uncovering collective listening habits and music genres in bipartite networks. Phys. Rev. E 72, 066107 (2005).
Article CAS ADS Google Scholar
Huang, Z., Zeng, D. D. & Chen, H. Analyzing Consumer-Product Graphs: Empirical Findings and Applications in Recommender Systems. Manage. Sci. 53, 1146–1164 (2007).
Article Google Scholar
Shang, M. S., Lü, L. Y., Zhang, Y. C. & Zhou, T. Empirical analysis of web-based user-object bipartite networks. Europhys. Lett. 90, 48006 (2010).
Article ADS Google Scholar
Zhou, T., Ren, J., Medo, M. & Zhang, Y. C. Bipartite network projection and personal recommendation. Phys. Rev. E 76, 046115 (2007).
Article ADS Google Scholar
Zhang, Y. C. et al. Recommendation model based on opinion diffusion. Europhys. Lett. 80, 68003 (2007).
Article ADS MathSciNet Google Scholar
Zhang, Y. C., Blattner, M. & Yu, Y. K. Heat Conduction Process on Community Networks as a Recommendation Model. Phys. Rev. Lett. 99, 154301 (2007).
Article ADS Google Scholar
Zhou, T. et al. Solving the apparent diversity-accuracy dilemma of recommender systems. Proc. Natl. Acad. Sci. U.S.A. 107, 4511–4515 (2010).
Article CAS ADS Google Scholar
Lü, L. Y. & Liu, W. P. Information filtering via preferential diffusion. Phys. Rev. E 83, 066119 (2011).
Article ADS Google Scholar
Liu, J. G., Zhou, T. & Guo, Q. Information filtering via biased heat conduction. Phys. Rev. E 84, 037101 (2011).
Article ADS Google Scholar
Zhang, F. G. & Zeng, A. Improving information filtering via network manipulation. Europhys. Lett. 100, 58005 (2012).
Article CAS ADS Google Scholar
Zeng, A., Yeung, C. H., Shang, M. S. & Zhang, Y. C. The reinforcing influence of recommendations on global diversification. Europhys. Lett. 97, 18005 (2012).
Ricci, F., Rokach, L., Shapira, B. & Kantor, P. B. Recommender Systems Handbook. (Springer, New York, 2011).
Zhou, Y. B., Lei, T. & Zhou, T. A robust ranking algorithm to spamming. Europhys. Lett. 94, 48002 (2011).
Manning, C. D., Raghavan, P. & Schütze, H. Introduction to Information Retrieval. (Cambridge University Press, Cambridge, 2008).
Lü, L. Y. & Zhou, T. Link prediction in weighted networks: The role of weak ties. Europhys. Lett. 89, 18001 (2010).
Zeng, W., Zeng, A., Shang, M. S. & Zhang, Y. C. Information Filtering in Sparse Online Systems: Recommendation via Semi-Local Diffusion. PLoS ONE 8, e79354 (2013).
Blattner, M., Zhang, Y. C. & Maslov, S. Exploring an opinion network for taste prediction: An empirical study. Physica A 373, 753–758 (2007).
Lü, L. Y. & Zhou, T. Link prediction in complex networks: A survey. Physica A 390, 1150–1170 (2011).
Huang, J. M., Cheng, X. Q., Shen, H. W., Zhou, T. & Jin, X. L. Exploring social influence via posterior effect of word-of-mouth recommendations. in Proceedings of the fifth ACM international conference on Web search and data mining: WSDM '12, Seattle, USA. New York: ACM Press. (2012 February).
Celma, O. Music Recommendation and Discovery in the Long Tail. (Springer, New York, 2010).
Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P. & Bhattacharjee, B. Measurement and analysis of online social networks. in Proceedings of the 7th ACM SIGCOMM conference on Internet measurement: IMC '07, San Diego, USA. New York: ACM Press. (2007 May).
Zeng, W. & Chen, L. Heterogeneous data fusion via matrix factorization for augmenting item, group and friend recommendations. in Proceedings of the 28th Annual ACM Symposium on Applied Computing: SAC '13, Coimbra, Portugal. New York: ACM Press. (2013 May).
Chen, L., Zeng, W. & Yuan, Q. A unified framework for recommending items, groups and friends in social media environment via mutual resource fusion. Expert Syst. Appl. 40, 2889–2903 (2013).
Jamali, M. & Ester, M. A matrix factorization technique with trust propagation for recommendation in social networks. in Proceedings of the fourth ACM conference on Recommender systems: RecSys '10, Barcelona, Spain. New York: ACM Press. (2010 March).
Herlocker, J. L., Konstan, J. A., Terveen, L. G. & Riedl, J. T. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 22, 5–53 (2004).
Huang, Z., Zeng, D. D. & Chen, H. C. Analyzing Consumer-Product Graphs: Empirical Findings and Applications in Recommender Systems. Manage. Sci. 53, 1146–1164 (2007).

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61370150 and 91324002), the Open Foundation of State Key Laboratory of Networking and Switching Technology (SKLNST-2013-1-18), and the Special Project of Sichuan Youth Science and Technology Innovation Research Team (No. 2013TD0006). W.Z. acknowledges the support from the Program of Outstanding PhD Candidate in Academic Research by UESTC (YBXSZC20131029).

Author information

Authors and Affiliations

Web Sciences Center, University of Electronic Science and Technology of China, Chengdu, 611731, P.R. China
Wei Zeng, Ming-Sheng Shang & Tao Zhou
State Key Laboratory of Networking and Switching Technology, Beijing, 100876, P.R. China
Wei Zeng
Department of Physics, University of Fribourg, Fribourg, CH1700, Switzerland
An Zeng, Hao Liu & Ming-Sheng Shang
School of Systems Science, Beijing Normal University, Beijing, 100875, P.R. China
An Zeng

Authors

Wei Zeng
An Zeng
Hao Liu
Ming-Sheng Shang
Tao Zhou

Contributions

W.Z., A.Z., H.L. and M.S.S. designed the research. W.Z. performed the experiments. W.Z., A.Z., H.L. and T.Z. analysed the data. W.Z., A.Z. and T.Z. wrote the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/

About this article

Cite this article

Zeng, W., Zeng, A., Liu, H. et al. Uncovering the information core in recommender systems. Sci Rep 4, 6140 (2014). https://doi.org/10.1038/srep06140

Received: 26 February 2014
Accepted: 17 July 2014
Published: 21 August 2014
DOI: https://doi.org/10.1038/srep06140
