
DOI: 10.1145/3539618.3591695

Unsupervised Readability Assessment via Learning from Weak Readability Signals

Published: 18 July 2023

Abstract

Unsupervised readability assessment aims to evaluate the reading difficulty of text without any manually labeled data for model training. This is a challenging task because the absence of labeled data makes it difficult for a model to learn what readability is. In this paper, we propose a novel framework that Learns a neural model from Weak Readability Signals (LWRS). Instead of relying on labeled data, LWRS utilizes a set of heuristic signals, each describing text readability from a different aspect, to guide the model in outputting readability scores for ranking. Specifically, to make effective use of multiple heuristic weak signals during training, we build a multi-signal learning model that ranks unlabeled texts along multiple readability-related aspects based on intra- and inter-signal learning. We also adopt the pairwise ranking paradigm to reduce the cascade coupling among partial-order pairs. Furthermore, we propose identifying the most representative signal based on the batch-level consensus distribution of all signals; this strategy helps select the predicted signal that is most correlated with readability in the absence of ground-truth labels. We conduct experiments on three public readability assessment datasets. The experimental results demonstrate that LWRS significantly outperforms each heuristic signal and their combinations, and even performs comparably to some supervised methods. Additionally, LWRS trained on one dataset can be effectively transferred to other datasets, including those in other languages, indicating good generalization and potential for wide application.
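The two ingredients named above, heuristic weak readability signals and a pairwise ranking objective, can be illustrated with a minimal sketch. The Flesch Reading Ease formula is one classic heuristic signal of the kind such a framework could learn from; the crude vowel-group syllable counter and the hinge-style pairwise loss below are simplified illustrations of the general idea, not the paper's actual signal set or training objective.

```python
import re

def flesch_reading_ease(text):
    """One classic heuristic readability signal (higher score = easier text).
    Formula: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    # Crude syllable estimate: count vowel groups per word (a common heuristic,
    # not a linguistically exact syllabifier).
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

def pairwise_hinge_loss(score_harder, score_easier, margin=1.0):
    """Pairwise ranking loss over a partial-order pair: penalize the model
    unless the text a weak signal deems harder is scored at least `margin`
    above the easier one."""
    return max(0.0, margin - (score_harder - score_easier))
```

A signal like this induces partial-order pairs over unlabeled texts (harder vs. easier), and the pairwise loss trains a scorer on those pairs without ever needing a ground-truth readability label.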

Supplemental Material

MP4 File
Presentation video for the full paper "Unsupervised Readability Assessment via Learning from Weak Readability Signals." This is my first time recording this kind of video; I am a bit nervous and not very good at speaking, so please feel free to contact me if you have any questions about the content of the paper.



Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023, 3567 pages
ISBN: 9781450394086
DOI: 10.1145/3539618
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. multi-signal learning
  2. pairwise ranking
  3. readability assessment
  4. unsupervised ranking

Qualifiers

  • Research-article

Conference

SIGIR '23

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

