Mohamed Amine Chatti, Fangzheng Ji, Mouadh Guesmi, Arham Muslim, Ravi Kumar Singh, Shoeb Joarder
Social Computing Group, University of Duisburg-Essen, Germany
Problem
Interest modeling is a crucial task for achieving personalized services, such as recommendations. Applying interest modeling to textual data often raises semantic problems. Interest models can contain similar interests represented in the form of acronyms (e.g., mooc and massive open online course), synonyms (e.g., technology enhanced learning and elearning), and lexical variants (e.g., elearning and e-learning). There is also an overgeneration problem: for example, the keyphrases open learning analytics and learning analytics represent the same interest, learning analytics. Additionally, keyword extraction algorithms can generate irrelevant keywords (e.g., dataset) that might not describe the user’s interest. Moreover, two interest models can be semantically similar even when they share few literal terms, because they contain acronyms, synonyms, and lexical variants of the same interests. Lacking semantic knowledge, traditional similarity methods (e.g., Jaccard or cosine similarity computed over the literal keywords) will identify such interest models as different, which might reduce the accuracy of a recommender system.
Interest models of two researchers
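The mismatch described above can be reproduced with a few lines of Python. This is a minimal sketch: the two interest models are made-up examples built from the keyword pairs mentioned in the text, and the `jaccard` helper is an illustrative name, not part of SIMT.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two hypothetical interest models that describe the same three interests
# using an acronym, a lexical variant, and an overgenerated keyphrase.
model_a = {"mooc", "elearning", "learning analytics"}
model_b = {"massive open online course", "e-learning", "open learning analytics"}

score = jaccard(model_a, model_b)
print(score)  # 0.0, because no keyword string matches exactly
```

Although a human reader would consider the two models nearly identical, the string-level overlap is empty, so the score is zero.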
Solution
In this demo, we address these semantic problems in the interest modeling task and present a Semantic Interest Modeling Toolkit (SIMT) for the effective generation and similarity computation of interest models, based on semantic information. SIMT follows a mixed-method approach that combines unsupervised keyword extraction algorithms, knowledge bases, and word embedding techniques to address the semantic issues in the interest modeling process.
SIMT
SIMT consists of two main components, namely Interest Model Generation and Semantic Interest Model Similarity. These components are exposed as RESTful APIs, so they can easily be used by any application that requires semantic user interest modeling.

SIMT abstract architecture
Interest Model Generation
The Interest Model Generation component consists of two sub-components:
- The Keyword Extractor sub-component is responsible for extracting candidate interest keywords from the user-generated textual content (posts/publications) using various unsupervised keyword extraction algorithms (e.g., TextRank, SingleRank, TopicRank, TopicalPageRank, PositionRank, MultipartiteRank, RAKE, and YAKE!).
- The Semantic Enrichment sub-component leverages semantic information from Wikipedia to generate the user’s interest model based on the candidate interest keywords generated from the Keyword Extractor sub-component.
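To give a feel for what an unsupervised keyword extractor does, the following is a simplified RAKE-style scorer. It is a sketch under stated assumptions, not SIMT's implementation: the stopword list is a tiny illustrative one, and the function names are hypothetical. Candidate phrases are the word runs between stopwords and punctuation; each word is scored by its co-occurrence degree divided by its frequency, and a phrase score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption; real extractors use larger lists).
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "on",
             "the", "this", "to", "we", "with"}


def candidate_phrases(text: str) -> list[list[str]]:
    """Split text into runs of non-stopwords, breaking at punctuation."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z][a-z\-]*", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_scores(text: str) -> dict[str, float]:
    """Score each candidate phrase as the sum of degree/frequency of its words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrence degree, counting the word itself
    return {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}


text = ("Learning analytics is the analysis of learner data. "
        "This work is on learning analytics for online courses.")
scores = rake_scores(text)
for phrase, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{value:.1f}  {phrase}")
```

Multi-word phrases such as "learning analytics" receive higher scores than isolated words such as "analysis", which is why the output of this step is treated only as a list of candidates for the semantic enrichment that follows.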
Three types of interest models can be generated, namely Keyword-based Interest Model, Wiki-based Interest Model, and Wiki Category-based Interest Model. By comparing the keyword-based interest model and the Wiki-based interest model, it can be observed that synonym interests are merged, acronym interests are reduced, and less interesting keywords are removed.

Interest model generation
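The effect of semantic enrichment described above can be sketched as follows. The lookup table is a hypothetical stand-in for a Wikipedia query, the article titles are illustrative, and summing the weights of merged keywords is an assumption; the text does not specify how SIMT combines weights.

```python
from collections import defaultdict

# Hypothetical keyword -> article title table standing in for a Wikipedia lookup.
KEYWORD_TO_ARTICLE = {
    "mooc": "Massive open online course",
    "massive open online course": "Massive open online course",
    "elearning": "Educational technology",
    "e-learning": "Educational technology",
    "technology enhanced learning": "Educational technology",
}


def enrich(keyword_model: dict[str, float]) -> dict[str, float]:
    """Map weighted keywords onto article titles and merge duplicates."""
    merged = defaultdict(float)
    for keyword, weight in keyword_model.items():
        article = KEYWORD_TO_ARTICLE.get(keyword.lower())
        if article is None:
            continue  # no matching article: drop the keyword (e.g., "dataset")
        merged[article] += weight  # assumption: weights of merged keywords are summed
    return dict(merged)


keyword_model = {
    "mooc": 3.0,
    "massive open online course": 2.0,
    "elearning": 1.5,
    "e-learning": 1.0,
    "dataset": 0.5,
}
wiki_model = enrich(keyword_model)
print(wiki_model)
```

In this toy run, five keywords collapse into two interests: the acronym and its expansion are merged, the lexical variants are merged, and the keyword without a matching article is removed.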
Semantic Interest Model Similarity
The Semantic Interest Model Similarity component is responsible for calculating semantic similarity scores between two interest models. The first step in calculating the similarity is to generate a vector representation of both models. Afterward, the similarity is calculated by applying cosine similarity to the two interest model vectors. SIMT computes the semantic similarity of interest models using two different approaches, namely Wikipedia-based Measure and Word Embedding-based Measure.

Wikipedia-based measure

Word embedding-based measure
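The two-step procedure (build a vector per interest model, then apply cosine similarity) can be illustrated with toy data. Everything below is a sketch under assumptions: the three-dimensional embeddings are invented for illustration, and representing a model as the weighted average of its keyword embeddings is one plausible choice, not necessarily the one SIMT uses.

```python
import math

# Invented 3-dimensional embeddings; real word embeddings have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "mooc": [0.9, 0.1, 0.0],
    "massive open online course": [0.85, 0.15, 0.0],
    "elearning": [0.7, 0.3, 0.1],
    "e-learning": [0.7, 0.3, 0.1],
    "databases": [0.0, 0.2, 0.9],
}


def model_vector(model: dict[str, float]) -> list[float]:
    """Weighted average of the embeddings of the keywords in an interest model."""
    dim = len(next(iter(TOY_EMBEDDINGS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for keyword, weight in model.items():
        vec = TOY_EMBEDDINGS.get(keyword)
        if vec is None:
            continue  # skip keywords without an embedding
        for i, value in enumerate(vec):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [t / weight_sum for t in total]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


model_a = {"mooc": 2.0, "elearning": 1.0}
model_b = {"massive open online course": 2.0, "e-learning": 1.0}
model_c = {"databases": 1.0}

sim_ab = cosine(model_vector(model_a), model_vector(model_b))
sim_ac = cosine(model_vector(model_a), model_vector(model_c))
print(round(sim_ab, 3), round(sim_ac, 3))
```

With these toy vectors, models A and B share no keyword string yet obtain a similarity close to 1, while the unrelated model C scores close to 0. This is the behavior that keyword-overlap measures cannot provide.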
SIMT in Action
SIMT has been leveraged in the transparent Recommendation and Interest Modeling Application (RIMA) to:
- Infer interest models of researchers based on their publications extracted from Semantic Scholar
- Compute the semantic similarity of two researchers
- Open and explain the inferred user interest models
- Provide personalized recommendations and explain them

Generating user interest models (left) and computing semantic similarity between user interest models (right)

Opening the user interest model

Explaining the user interest model

Providing personalized recommendations

Explaining the recommendation
Future Work
The Semantic Interest Modeling Toolkit (SIMT) harnesses semantic information from Wikipedia and word embedding techniques for the effective generation and similarity computation of interest models. As future work, we will conduct experiments to compare our approach with traditional keyphrase extraction and similarity computation approaches in the interest modeling task. Further, we plan to consider and evaluate other embedding techniques, such as BERT and ELMo, for computing the semantic similarity between user interest models.





