Keeping up with the Influencers: Improving User Recommendation in Instagram using Visual Content

Marco Bertini, Andrea Ferracani, Riccardo Papucci, Alberto Del Bimbo

Photos shared by users on their Instagram profiles can be exploited to improve user-to-user recommendation through a user similarity computed from the photos' visual content. We consider in particular users with established credibility and audience, the so-called influencers, and demonstrate that a hybrid approach that combines collaborative filtering and NN on image collections performs better than a standard recommender based only on collaborative filtering.

Instagram user classification exploiting Visual Content

Users are classified into four categories of interest (fashion, food, animals, travel) on the basis of the similarity of their photo collection to the collections of influencers, popular users who share content on the same topic:

Examples of photos from fashion influencers

Examples of photos from food influencers

7,433 photos from 138 influencers: 35 specialized in animals, 33 specialized in food, 34 specialized in fashion and 36 specialized in travel
4,737,306 followers

The classifier was trained with triplet loss and tested with k-fold cross-validation. Visual features were extracted with ResNet50 using an intermediate layer:

Training set: 90 influencers, 4,818 photos
Test set: 48 influencers, 2,615 photos

The visual features of each image have size 7 x 7 x 512. Because influencers have different numbers of images, the descriptors of the single images are combined into one descriptor of the photo collection, so that the descriptor has the same dimension for all influencers.

User collection feature: the visual representation of the collection matrix is obtained using max-pooling
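The pooling step can be illustrated with a minimal sketch. It assumes that each image descriptor has been flattened to a vector and that max-pooling is applied element-wise across the images of a collection; the function name `collection_descriptor` and the toy 3-dimensional vectors are hypothetical and only serve the illustration.

```python
from typing import List, Sequence


def collection_descriptor(image_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max over the per-image feature vectors of one collection.

    Assumption: every image descriptor is already flattened to the same length.
    """
    if not image_features:
        raise ValueError("a collection must contain at least one image")
    dim = len(image_features[0])
    if any(len(f) != dim for f in image_features):
        raise ValueError("all image descriptors must have the same length")
    # zip(*...) iterates over feature positions; max picks the strongest response.
    return [max(column) for column in zip(*image_features)]


# Two toy collections with a different number of images (2 and 3).
small_collection = [[0.1, 0.9, 0.3], [0.7, 0.2, 0.4]]
large_collection = [[0.5, 0.1, 0.0], [0.2, 0.3, 0.8], [0.6, 0.0, 0.1]]

desc_small = collection_descriptor(small_collection)
desc_large = collection_descriptor(large_collection)
```

Both descriptors have the same length regardless of how many images each collection contains, which is the property the pooling step is meant to provide.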

Training of the classifier

An anchor photo collection is compared with both a positive sample collection and a negative sample collection. The resulting embeddings are given to the triplet loss function during training and are then used to improve the user recommender, which considers not only collaborative filtering data but also the similarity of photo collections.

CNN architecture for training. The network takes as input the ‘photo collections’ of two fashion bloggers and of a food blogger. Features are collected, shared weights computed and given as input to the triplet loss function.

The pre-trained ResNet50 layers are followed by two fully connected layers and then by a 10-dimensional layer (sigmoid activation) that produces the embeddings used in the recommender.
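The triplet objective used in training can be sketched as follows. This is the standard margin-based formulation, not necessarily the exact variant used here: the squared Euclidean distance, the margin value of 0.2 and the toy 2-dimensional embeddings are all assumptions made for illustration.

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.2) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an assumed hyperparameter).
    """
    return max(0.0,
               squared_distance(anchor, positive)
               - squared_distance(anchor, negative)
               + margin)


# Toy embeddings: two fashion collections close together, one food collection far away.
fashion_a = [0.0, 0.0]
fashion_b = [0.0, 0.1]
food = [1.0, 0.0]

loss_good = triplet_loss(fashion_a, fashion_b, food)  # well-separated triplet
loss_bad = triplet_loss(fashion_a, food, fashion_b)   # roles swapped: large loss
```

A well-separated triplet contributes no loss, while a triplet in which the negative is closer to the anchor than the positive is penalized, which pushes collections of the same category together in the embedding space.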

Classifier evaluation

The accuracy of the classifier in predicting the typology of influencers was tested with different approaches:

based on a threshold on the distances between the embeddings, using the F1 score as evaluation metric: accuracy 0.88;
using KNN: accuracy 0.96;
using Linear SVM: accuracy 0.96.
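The KNN variant of this evaluation can be sketched as below: an unseen collection is assigned the majority category among its nearest training embeddings. The value k = 3, the Euclidean distance and the toy 2-dimensional embeddings are assumptions for illustration (the embeddings described above are 10-dimensional).

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(query: Sequence[float],
                train_embeddings: Sequence[Sequence[float]],
                train_labels: Sequence[str],
                k: int = 3) -> str:
    """Return the majority label among the k training embeddings nearest to `query`."""
    neighbours = sorted(
        zip(train_embeddings, train_labels),
        key=lambda pair: math.dist(query, pair[0]),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy training embeddings for two of the four categories.
train_embeddings: List[List[float]] = [
    [0.10, 0.10], [0.20, 0.10], [0.15, 0.20],   # fashion
    [0.90, 0.80], [0.80, 0.90], [0.85, 0.85],   # food
]
train_labels = ["fashion"] * 3 + ["food"] * 3

pred_fashion = knn_predict([0.12, 0.15], train_embeddings, train_labels)
pred_food = knn_predict([0.90, 0.90], train_embeddings, train_labels)
```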

The User Hybrid Recommender

The main goal was to test whether considering embeddings learned from the visual features of user photo collections improves over a standard user recommendation algorithm based only on collaborative filtering.

For collaborative filtering we have used a matrix factorization model: if a user follows an influencer we consider this as a positive interaction, while a lack of interaction is implicitly considered as a negative.
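A minimal sketch of matrix factorization on implicit follow data is given below. It is not the model used in this work: the logistic loss, plain SGD, the hyperparameters (`dim`, `epochs`, `lr`, `seed`) and the toy follow graph are all assumptions. It only illustrates how followed influencers act as positives and every non-followed one as an implicit negative.

```python
import math
import random
from typing import List, Set, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score(U: List[List[float]], V: List[List[float]], user: int, influencer: int) -> float:
    """Dot product of the user and influencer latent factors."""
    return sum(a * b for a, b in zip(U[user], V[influencer]))


def train_mf(follows: Set[Tuple[int, int]], n_users: int, n_influencers: int,
             dim: int = 4, epochs: int = 200, lr: float = 0.1, seed: int = 0):
    """Logistic matrix factorization trained with SGD (assumed setup)."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_influencers)]
    pairs = [(u, i) for u in range(n_users) for i in range(n_influencers)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for u, i in pairs:
            # A follow is a positive; a missing follow is an implicit negative.
            label = 1.0 if (u, i) in follows else 0.0
            err = sigmoid(score(U, V, u, i)) - label
            for k in range(dim):
                u_k, v_k = U[u][k], V[i][k]
                U[u][k] -= lr * err * v_k
                V[i][k] -= lr * err * u_k
    return U, V


# Toy follow graph: users 0-1 follow influencers 0-1, users 2-3 follow influencers 2-3.
follows = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 2), (3, 3)}
U, V = train_mf(follows, n_users=4, n_influencers=4)
```

After training, `score(U, V, 0, 0)` should exceed `score(U, V, 0, 2)`, so sorting influencers by score for a given user yields the recommendation ranking.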

Recommender evaluation

Influencer recommendation is treated as a ranking problem: we want the system to recommend a list of people ordered so that those most interesting to a user are ranked higher.

The metric used to compare the proposed hybrid approach with the collaborative filtering baseline is ROC AUC, which measures the probability that a randomly chosen positive example has a higher score than a randomly chosen negative example.
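The probabilistic definition above translates directly into a pairwise count, sketched here. The function name `roc_auc`, the convention of counting ties as one half, and the toy scores are assumptions for illustration; the quadratic loop is only suitable for small examples.

```python
from typing import Sequence


def roc_auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win, a common convention (assumed here).
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy scores for one user: influencers they follow vs. influencers they do not.
followed = [0.9, 0.8, 0.4]
not_followed = [0.7, 0.3]
auc = roc_auc(followed, not_followed)  # 5 of the 6 pairs are ordered correctly
```

A value of 1.0 means every followed influencer is ranked above every non-followed one, while 0.5 corresponds to a random ordering.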

ROC AUC values for different numbers of training epochs. The hybrid model always outperforms the CF model by a large margin.