May 19, 2024
Hojjat Emami

Academic rank: Associate professor
Address: Iran, East Azerbaijan, Bonab, University of Bonab
Education: Ph.D. in Computer Engineering - Artificial Intelligence
Phone: 041-37741636
Faculty: Faculty of Engineering
Department: Computer Engineering

Research

Title
Personal Name Disambiguation in Farsi Web Pages
Type Article
Keywords
web mining, information retrieval, name ambiguity, name disambiguation, Farsi language, clustering
Researchers Hojjat Emami

Abstract

The problem of name ambiguity causes the results for a personal name query to be a mixture of web pages about different individuals sharing the same name. Name disambiguation, an important task in web mining and information retrieval, is the process of grouping web pages into clusters, where each cluster contains all web pages that refer to the same individual. This paper presents an unsupervised approach to the name disambiguation problem. The proposed method exploits two sources of semantic information: discourse profile information derived from the local corpus and global information extracted from an ontology. Our approach formalizes the name disambiguation problem as four main subtasks: pre-processing, discourse profile extraction, profile enrichment, and profile clustering. First, the approach takes as input the web pages to be disambiguated and casts them into annotated textual documents using pre-processing tools. The profile extraction phase then extracts individuals’ discourse profiles from the pre-processed text. In the profile enrichment phase, discourse profiles are enriched with semantic information obtained from an online ontology. In the profile clustering stage, the enriched profiles are grouped into clusters such that each cluster contains the web pages that refer to the same individual. The performance of the proposed approach is evaluated on Farsi and English datasets. The experimental results are encouraging and indicate that the proposed method outperforms the baseline methods and its counterparts.
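
The four subtasks named in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the pre-processing tools, the profile features, the ontology, or the clustering algorithm. Everything below is an assumption made for illustration, including the hand-written `TOY_ONTOLOGY`, the bag-of-words profile, the Jaccard similarity, the greedy clustering rule, the threshold value, and the invented name and pages.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for an online ontology: term -> broader concepts.
TOY_ONTOLOGY: Dict[str, Set[str]] = {
    "striker": {"football"},
    "goalkeeper": {"football"},
    "professor": {"academia"},
    "lecturer": {"academia"},
}

PUNCT = ".,;:!?()\"'"


def preprocess(page: str) -> List[str]:
    """Stage 1 (assumed): lowercase, split on whitespace, strip punctuation."""
    tokens = (t.strip(PUNCT).lower() for t in page.split())
    return [t for t in tokens if t]


def extract_profile(tokens: List[str], name_tokens: Set[str]) -> Set[str]:
    """Stage 2 (assumed): the profile is every token except the queried name."""
    return {t for t in tokens if t not in name_tokens}


def enrich(profile: Set[str]) -> Set[str]:
    """Stage 3 (assumed): add the ontology concepts linked to each term."""
    enriched = set(profile)
    for term in profile:
        enriched |= TOY_ONTOLOGY.get(term, set())
    return enriched


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(profiles: List[Set[str]], threshold: float = 0.2) -> List[List[int]]:
    """Stage 4 (assumed): greedy clustering. Each profile joins the first
    cluster whose merged term set is similar enough, else starts a new one."""
    clusters = []  # each entry: (member page indices, merged term set)
    for i, profile in enumerate(profiles):
        for members, terms in clusters:
            if jaccard(profile, terms) >= threshold:
                members.append(i)
                terms |= profile  # in-place update of the cluster's term set
                break
        else:
            clusters.append(([i], set(profile)))
    return [members for members, _ in clusters]


def disambiguate(pages: List[str], name: str) -> List[List[int]]:
    """Run the four stages and return clusters of page indices."""
    name_tokens = set(preprocess(name))
    profiles = [
        enrich(extract_profile(preprocess(page), name_tokens)) for page in pages
    ]
    return cluster(profiles)


# Invented pages about two different people who share one (invented) name.
pages = [
    "Sara Ahmadi, striker.",
    "Sara Ahmadi, goalkeeper.",
    "Sara Ahmadi, professor.",
    "Sara Ahmadi, lecturer.",
]
groups = disambiguate(pages, "Sara Ahmadi")
```

In this toy input the raw profiles share no terms, so the pages group correctly only because the enrichment step adds the shared concepts `football` and `academia`. That is the role the abstract assigns to ontology-based enrichment, shown here under the simplifying assumptions above.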