May 19, 2024
Hojjat Emami

Academic rank: Associate professor
Address: Iran, East Azerbaijan, Bonab, University of Bonab
Education: Ph.D. in Computer Engineering (Artificial Intelligence)
Phone: 041-37741636
Faculty: Faculty of Engineering
Department: Computer Engineering

Research

Title: Personal name disambiguation in Farsi web pages
Type: Presentation
Keywords: Web mining, information retrieval, name ambiguity, name disambiguation, clustering
Researchers: Hojjat Emami

Abstract

Name ambiguity causes the results of a personal-name query to mix web pages about different individuals who share the same name. Name disambiguation, an important task in web mining and information retrieval, is the process of grouping web pages into clusters such that each cluster contains all the web pages that refer to the same individual. This paper presents an unsupervised approach to the name disambiguation problem. The proposed method exploits two sources of semantic information: discourse profile information derived from the local corpus and global information extracted from a distant ontology. The approach decomposes name disambiguation into four main subtasks: pre-processing, discourse profile extraction, profile enrichment, and profile clustering. First, it takes the web pages to be disambiguated as input and converts them into annotated textual documents using pre-processing tools. The profile extraction phase then extracts individuals' discourse profiles from the pre-processed text. In the profile enrichment phase, these discourse profiles are enriched with semantic information obtained from a distant ontology. In the profile clustering phase, the enriched profiles are grouped into clusters such that each cluster contains the web pages that refer to the same individual. The performance of the proposed approach is evaluated on a small Farsi corpus. The experimental results are encouraging and indicate that the proposed method outperforms the baseline methods.
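
The four subtasks named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. Every name in it is an assumption made for illustration: the functions `preprocess`, `extract_profile`, `enrich`, `cluster`, and `disambiguate`, the `TOY_ONTOLOGY` dictionary standing in for the distant ontology, the context window, the Jaccard similarity, and the greedy threshold clustering. The abstract does not specify the actual tools, similarity measure, or clustering algorithm, and the toy pages are in English rather than Farsi.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the distant ontology: term -> related concepts.
TOY_ONTOLOGY = {
    "goalkeeper": {"football", "sport"},
    "striker": {"football", "sport"},
    "poet": {"literature", "art"},
    "novel": {"literature", "art"},
}
STOPWORDS = {"a", "the", "is", "for", "of"}
PUNCT = ".,;:!?()\"'"


@dataclass
class Profile:
    page_id: str
    terms: set = field(default_factory=set)


def preprocess(text):
    """Step 1: lowercase, strip punctuation, drop stopwords (toy pre-processing)."""
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [t for t in tokens if t and t not in STOPWORDS]


def extract_profile(page_id, tokens, name, window=5):
    """Step 2: collect tokens within `window` positions of each mention of `name`."""
    terms = set()
    for i, tok in enumerate(tokens):
        if tok == name:
            terms.update(tokens[max(0, i - window): i + window + 1])
    terms.discard(name)
    return Profile(page_id, terms)


def enrich(profile, ontology):
    """Step 3: add the ontology concepts related to each profile term."""
    extra = set()
    for term in profile.terms:
        extra |= ontology.get(term, set())
    return Profile(profile.page_id, profile.terms | extra)


def jaccard(a, b):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(profiles, threshold=0.2):
    """Step 4: greedy clustering; a profile joins the first cluster that
    already holds a member at least `threshold` similar to it."""
    clusters = []
    for p in profiles:
        for c in clusters:
            if any(jaccard(p.terms, q.terms) >= threshold for q in c):
                c.append(p)
                break
        else:
            clusters.append([p])
    return [[p.page_id for p in c] for c in clusters]


def disambiguate(pages, name, ontology=TOY_ONTOLOGY, threshold=0.2):
    """Run the four steps over a {page_id: text} mapping."""
    profiles = [
        enrich(extract_profile(pid, preprocess(text), name), ontology)
        for pid, text in pages.items()
    ]
    return cluster(profiles, threshold)


pages = {
    "p1": "Ali is a goalkeeper for the national team.",
    "p2": "The striker Ali scored twice.",
    "p3": "Ali the poet published a new novel.",
}
groups = disambiguate(pages, "ali")
```

In this toy example, pages `p1` and `p2` share no words other than the name itself, so they end up in the same cluster only because the enrichment step adds the shared concepts `football` and `sport` to both profiles. Passing an empty ontology leaves all three pages in separate clusters.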