Recommender on Last.fm

Character computation.

Learn your taste from your listening record in the tag feature space.

We characterize each user's tag attributes from their listening records. Each dimension of the feature space represents one tag attribute. The normalized value of an attribute is a sum over all artists the user has listened to that carry the given tag; each addend is the product of the artist's normalized tag count and the user's listening time for that artist.

* This plot projects the high-dimensional feature space onto 2D space using t-SNE. Each node represents a user in our dataset.
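The tag profile described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `tag_profile`, the toy data, the use of play counts as the listening amount, and both normalization choices (tag counts scaled to sum to 1 per artist, the final profile scaled to sum to 1) are assumptions.

```python
from collections import defaultdict

def tag_profile(listens, artist_tags):
    """Build one user's tag profile (hypothetical helper).

    listens: {artist: play count} for the user.
    artist_tags: {artist: {tag: number of times the tag was applied}}.
    """
    profile = defaultdict(float)
    for artist, plays in listens.items():
        tags = artist_tags.get(artist, {})
        total = sum(tags.values())
        if total == 0:
            continue  # untagged artists contribute nothing
        for tag, count in tags.items():
            # Addend: normalized tag count times the listening amount.
            profile[tag] += (count / total) * plays
    norm = sum(profile.values())
    if norm == 0:
        return {}
    # Assumed normalization: scale the profile so its values sum to 1.
    return {tag: value / norm for tag, value in profile.items()}

# Toy data, invented for illustration only.
artist_tags = {
    "artist_a": {"rock": 3, "indie": 1},
    "artist_b": {"jazz": 2},
}
listens = {"artist_a": 80, "artist_b": 20}
profile = tag_profile(listens, artist_tags)
```

Storing each profile as a sparse dictionary keeps memory low when a user touches only a few of the available tags.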

Capture your neighbors.

The K nearest neighbors are selected by the Euclidean distance between you and the other users in the feature space.

* This animation shows a test user (ID: 5) finding their 30 nearest neighbors. The test user at the center grows as it collects more records from its neighbors. In this animation, the distance in 2D space equals the Euclidean distance in the high-dimensional feature space. The radius of a neighbor represents the relative size of its total listen count.
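The neighbor search can be sketched as a brute-force scan over sparse tag profiles. The names `euclidean` and `nearest_neighbors` and the toy profiles are hypothetical; the server may well use a different data layout or an indexed search.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two sparse {tag: value} profiles."""
    tags = set(p) | set(q)
    return math.sqrt(sum((p.get(t, 0.0) - q.get(t, 0.0)) ** 2 for t in tags))

def nearest_neighbors(target, profiles, k):
    """Return [(distance, user_id)] for the k users closest to target."""
    distances = [
        (euclidean(profiles[target], profile), user)
        for user, profile in profiles.items()
        if user != target
    ]
    return sorted(distances)[:k]

# Toy profiles, invented for illustration only.
profiles = {
    "u1": {"rock": 0.6, "indie": 0.4},
    "u2": {"rock": 0.5, "indie": 0.5},
    "u3": {"jazz": 1.0},
    "u4": {"rock": 0.9, "pop": 0.1},
}
neighbors = nearest_neighbors("u1", profiles, k=2)
```

A full scan is linear in the number of users per query, which is affordable for a dataset with a couple of thousand users.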

And lastly, guess your favorite.

The listening records of the neighbors are weighted by a Gaussian function of their distance in the feature space. By combining the K neighbors' weighted records, we recommend to you the top-ranked artist outside the artists you have already listened to.
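A minimal sketch of the weighting step, under stated assumptions: the Gaussian width `sigma`, the use of raw play counts as the records being weighted, and the function name `recommend` are all hypothetical, since the text does not specify them.

```python
import math
from collections import defaultdict

def recommend(target_listens, neighbors, listens_by_user, sigma=1.0):
    """Return the best-scoring artist the target has not listened to.

    neighbors: [(distance, user_id)] as produced by a neighbor search.
    listens_by_user: {user_id: {artist: play count}}.
    sigma: assumed Gaussian width; not given in the text.
    """
    scores = defaultdict(float)
    for distance, user in neighbors:
        # Gaussian weight: close neighbors count more than distant ones.
        weight = math.exp(-(distance ** 2) / (2 * sigma ** 2))
        for artist, plays in listens_by_user[user].items():
            if artist not in target_listens:
                scores[artist] += weight * plays
    return max(scores, key=scores.get) if scores else None

# Toy data, invented for illustration only.
listens_by_user = {
    "u2": {"artist_a": 10, "artist_b": 50},
    "u4": {"artist_c": 100, "artist_a": 5},
}
target_listens = {"artist_a": 30}
neighbors = [(0.1, "u2"), (2.0, "u4")]
best = recommend(target_listens, neighbors, listens_by_user, sigma=1.0)
```

In this toy case the distant neighbor's 100 plays of `artist_c` are discounted heavily enough that the close neighbor's `artist_b` ranks first.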

* The plot compares the overall accuracy of predicting the favorite artist with KNN for different values of the parameter K. The accuracy of recommending the most popular artist outside the listened artists is given as the performance baseline. Each user in the dataset serves as a test user in turn: we remove their most-listened artist and use the rest of their listening record to predict it. The overall rate at which the prediction matches the actual most-listened artist is treated as the accuracy. With the best settings, the overall accuracy reached 30.39%. The Last.fm dataset was published at HetRec 2011.
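The evaluation protocol above might look like the following sketch. The names `leave_top_out_accuracy` and `popularity_baseline` are hypothetical, as are the choices to measure popularity by total play count, to exclude the test user's own plays from the popularity totals, and to skip users with fewer than two artists.

```python
from collections import defaultdict

def leave_top_out_accuracy(listens_by_user, predict):
    """Hold out each user's most-listened artist and try to predict it.

    predict(user, remaining_listens) -> artist or None.
    """
    hits = 0
    tested = 0
    for user, listens in listens_by_user.items():
        if len(listens) < 2:
            continue  # assumption: something must remain after the hold-out
        top = max(listens, key=listens.get)
        rest = {a: n for a, n in listens.items() if a != top}
        tested += 1
        if predict(user, rest) == top:
            hits += 1
    return hits / tested if tested else 0.0

def popularity_baseline(listens_by_user):
    """Baseline: most-played artist among those the user has not heard."""
    def predict(user, rest):
        totals = defaultdict(int)
        for other, listens in listens_by_user.items():
            if other == user:
                continue  # assumed: ignore the test user's own plays
            for artist, plays in listens.items():
                if artist not in rest:
                    totals[artist] += plays
        return max(totals, key=totals.get) if totals else None
    return predict

# Toy data, invented for illustration only.
data = {
    "u1": {"artist_a": 50, "artist_b": 10},
    "u2": {"artist_a": 40, "artist_c": 5},
    "u3": {"artist_d": 30, "artist_a": 20},
}
accuracy = leave_top_out_accuracy(data, popularity_baseline(data))
```

Any predictor with the same `predict(user, remaining_listens)` shape, including a KNN recommender, can be passed in place of the baseline so that both are scored the same way.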

Test out the recommender on the Dashboard.

Build your mock user. Run the recommender hosted on our Python server. Try out its recommendations and have fun.

* The dataset on our Python server includes 186479 tags applied by 1892 users to 17632 artists. 92834 listening records are used to train the k-nearest-neighbors algorithm (k = 40).

Play on the Dashboard

Review Final Report

Team