ACM UMAP 2021 - SIMT: A Semantic Interest Modeling Toolkit

Mohamed Amine Chatti, Fangzheng Ji, Mouadh Guesmi, Arham Muslim, Ravi Kumar Singh, Shoeb Joarder

Social Computing Group, University of Duisburg-Essen, Germany

Problem

Interest modeling is a crucial task for achieving personalized services, such as recommendations. Applying interest modeling to textual data often runs into semantic problems. Interest models can contain the same interest in several forms: acronyms (e.g., "mooc" and "massive open online course"), synonyms (e.g., "technology enhanced learning" and "elearning"), and lexical variants (e.g., "elearning" and "e-learning"). There is also an overgeneration problem, e.g., the keyphrases "open learning analytics" and "learning analytics" both represent the single interest "learning analytics". Additionally, keyword extraction algorithms can generate irrelevant keywords (e.g., "dataset") that might not describe the user’s interest. Moreover, two interest models can be semantically similar even when they share few surface terms, because they contain acronyms, synonyms, and lexical variants of the same interests. However, due to the lack of semantic knowledge, traditional similarity methods (e.g., Jaccard or cosine similarity) will identify such interest models as different, which might reduce the accuracy of a recommender system.

Interest models of two researchers
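
To make the problem concrete, the following minimal sketch computes the Jaccard similarity of two interest models that describe the same interests with different surface forms. The two keyword sets are hypothetical, assembled from the examples in the text above, and the function name `jaccard` is our own.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical interest models built from the examples in the text.
researcher_a = {"mooc", "elearning", "learning analytics"}
researcher_b = {"massive open online course", "e-learning", "open learning analytics"}

# The sets share no string, so the purely lexical score is zero
# even though the underlying interests are the same.
print(jaccard(researcher_a, researcher_b))  # 0.0
```

A purely string-based measure has no way to know that "mooc" and "massive open online course" refer to the same interest, which is the gap the semantic approach is meant to close.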

Solution

In this demo, we address these semantic problems in the interest modeling task and present a Semantic Interest Modeling Toolkit (SIMT) for the effective generation and similarity computation of interest models, based on semantic information. SIMT follows a mixed-method approach that combines unsupervised keyword extraction algorithms, knowledge bases, and word embedding techniques to address the semantic issues in the interest modeling process.

SIMT

SIMT consists of two main components: Interest Model Generation and Semantic Interest Model Similarity. Both components are exposed as RESTful APIs, so any application that requires semantic user interest modeling can use them.

SIMT abstract architecture

Interest Model Generation

The Interest Model Generation component consists of two sub-components:

The Keyword Extractor sub-component is responsible for extracting candidate interest keywords from the user-generated textual content (posts/publications) using various unsupervised keyword extraction algorithms (e.g., TextRank, SingleRank, TopicRank, TopicalPageRank, PositionRank, MultipartiteRank, RAKE, and YAKE!).
The Semantic Enrichment sub-component leverages semantic information from Wikipedia to generate the user’s interest model from the candidate interest keywords produced by the Keyword Extractor sub-component.

Three types of interest models can be generated, namely Keyword-based Interest Model, Wiki-based Interest Model, and Wiki Category-based Interest Model. By comparing the keyword-based interest model and the Wiki-based interest model, it can be observed that synonym interests are merged, acronym interests are reduced, and less interesting keywords are removed.

Interest model generation
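
The effect of semantic enrichment on a keyword-based interest model can be illustrated with a small sketch. It is not SIMT's implementation: the lookup table `TOY_WIKI_TITLES` is a hand-written, hypothetical stand-in for a Wikipedia lookup, the function name `enrich` is our own, and both summing the weights of merged keywords and dropping unmatched keywords are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for a Wikipedia lookup: keyword -> canonical title.
# Keywords absent from this toy table (e.g. "dataset") are dropped.
TOY_WIKI_TITLES = {
    "mooc": "Massive open online course",
    "massive open online course": "Massive open online course",
    "elearning": "Educational technology",
    "e-learning": "Educational technology",
    "technology enhanced learning": "Educational technology",
    "learning analytics": "Learning analytics",
    "open learning analytics": "Learning analytics",
}


def enrich(keyword_model: dict) -> dict:
    """Map weighted keywords to canonical titles, summing merged weights."""
    wiki_model = defaultdict(float)
    for keyword, weight in keyword_model.items():
        title = TOY_WIKI_TITLES.get(keyword.lower())
        if title is not None:  # keywords without a match are discarded
            wiki_model[title] += weight
    return dict(wiki_model)


# Hypothetical keyword-based interest model with illustrative weights.
keyword_model = {
    "mooc": 3.0,
    "massive open online course": 2.0,
    "e-learning": 1.5,
    "elearning": 1.0,
    "dataset": 0.5,
}

print(enrich(keyword_model))
# {'Massive open online course': 5.0, 'Educational technology': 2.5}
```

Five keywords collapse into two interests: the acronym and its expansion are merged, the lexical variants are merged, and the unmatched keyword disappears.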

Semantic Interest Model Similarity

The Semantic Interest Model Similarity component is responsible for calculating semantic similarity scores between two interest models. The first step in calculating the similarity is to generate a vector representation of both models. Afterward, the similarity is calculated by applying cosine similarity to the two interest model vectors. SIMT computes the semantic similarity of interest models using two different approaches, namely Wikipedia-based Measure and Word Embedding-based Measure.
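
The two steps described above (build one vector per interest model, then apply cosine similarity) can be sketched as follows. This is an illustration under stated assumptions, not SIMT's code: the three-dimensional vectors in `TOY_EMBEDDINGS` are made up, a real system would load pretrained word embeddings, and representing a model as the weighted average of its interests' vectors is one common choice that we assume here.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "mooc": [0.9, 0.1, 0.0],
    "massive open online course": [0.8, 0.2, 0.0],
    "e-learning": [0.7, 0.3, 0.1],
    "recommender system": [0.0, 0.2, 0.9],
}


def model_vector(model: dict) -> list:
    """Weighted average of the embeddings of the interests in the model."""
    dim = len(next(iter(TOY_EMBEDDINGS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for interest, weight in model.items():
        vec = TOY_EMBEDDINGS.get(interest)
        if vec is None:  # skip interests without an embedding
            continue
        total = [t + weight * x for t, x in zip(total, vec)]
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [t / weight_sum for t in total]


def cosine(u: list, v: list) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical interest models: keyword -> weight.
model_a = {"mooc": 2.0, "e-learning": 1.0}
model_b = {"massive open online course": 1.0}
model_c = {"recommender system": 1.0}

sim_ab = cosine(model_vector(model_a), model_vector(model_b))
sim_ac = cosine(model_vector(model_a), model_vector(model_c))
print(f"A vs B: {sim_ab:.3f}, A vs C: {sim_ac:.3f}")
```

With these toy vectors, models A and B score close to 1 although they share no keyword, while A and C score much lower. A string-overlap measure would rate both pairs as equally dissimilar.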

Wikipedia-based measure

Word embedding-based measure

SIMT in Action

SIMT has been leveraged in the transparent Recommendation and Interest Modeling Application (RIMA) to:

Infer interest models of researchers based on their publications extracted from Semantic Scholar
Compute the semantic similarity of two researchers
Open and explain the inferred user interest models
Provide personalized recommendations and explain them

Generating user interest models (left) and computing semantic similarity between user interest models (right)

Opening the user interest model

Explaining the user interest model

Providing personalized recommendations

Explaining the recommendation

Future Work

The Semantic Interest Modeling Toolkit (SIMT) harnesses semantic information from Wikipedia and word embedding techniques for the effective generation and similarity computation of interest models. As future work, we will conduct experiments to compare our approach with traditional keyphrase extraction and similarity computation approaches in the interest modeling task. Further, we plan to consider and evaluate other embedding techniques, such as BERT and ELMo, for computing the semantic similarity between user interest models.
