Diverging Divergences: Examining Variants of Jensen Shannon Divergence for Corpus Comparison Tasks
Lu, Jinghui ; Henchion, Maeve ; Mac Namee, Brian
Date
2021-11-16
Abstract
Jensen-Shannon divergence (JSD) is a measure of similarity between distributions that is widely used in natural language processing. In corpus
comparison tasks, where keywords are extracted to reveal the divergence between corpora (for example, social media posts
from proponents of different views on a political issue), two variants of JSD have emerged in the literature. One of these weights the
corpora by their relative sizes. In this paper we argue that this weighting is unnecessary and can, in fact, lead
to misleading results. We recommend that the weighted version not be used. We base this recommendation on an analysis of the JSD
variants and on experiments showing how they affect corpus comparison results as the relative sizes of the corpora change.
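
To make the distinction between the two variants concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard generalized JSD, in which the mixture distribution is M = pi_a * P + pi_b * Q: the unweighted variant fixes pi_a = pi_b = 0.5, while the weighted variant sets each pi to that corpus's share of the total token count. The function names, the whitespace-token inputs, and the base-2 logarithm are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def word_contributions(tokens_a, tokens_b, weighted=False):
    """Per-word contributions to the JSD between two token lists.

    Assumption: the 'weighted' variant uses corpus-size proportions as
    mixture weights; the unweighted variant uses 0.5 for each corpus.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    if weighted:
        pi_a, pi_b = n_a / (n_a + n_b), n_b / (n_a + n_b)
    else:
        pi_a = pi_b = 0.5

    contributions = {}
    for word in set(counts_a) | set(counts_b):
        p = counts_a[word] / n_a  # relative frequency in corpus A
        q = counts_b[word] / n_b  # relative frequency in corpus B
        m = pi_a * p + pi_b * q   # mixture probability of the word
        score = 0.0
        if p > 0:
            score += pi_a * p * math.log2(p / m)
        if q > 0:
            score += pi_b * q * math.log2(q / m)
        contributions[word] = score
    return contributions


def jsd(tokens_a, tokens_b, weighted=False):
    """Total divergence: the sum of the per-word contributions."""
    return sum(word_contributions(tokens_a, tokens_b, weighted).values())


def top_keywords(tokens_a, tokens_b, k=5, weighted=False):
    """Words ranked by their contribution to the divergence."""
    scores = word_contributions(tokens_a, tokens_b, weighted)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Under these assumptions the two variants coincide when the corpora have the same number of tokens and differ only when the sizes differ, which is the situation the abstract is concerned with. Calling `top_keywords` with `weighted=False` and `weighted=True` on corpora of unequal size is one way to inspect how the ranking shifts.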
