We report on work in progress on extracting lexical simplifications (e.g., "collaborate" → "work together"), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to be simplification operations. We find that our methods outperform a reasonable baseline and yield many high-quality lexical simplifications not included in an independently created, manually prepared list.
@inproceedings{Yatskar+al:10a,
  author = {Mark Yatskar and Bo Pang and Cristian Danescu-Niculescu-Mizil and Lillian Lee},
  title = {For the sake of simplicity: Unsupervised extraction of lexical simplifications from {Wikipedia}},
  year = {2010},
  pages = {365--368},
  booktitle = {Proceedings of the NAACL}
}
This material is based upon work supported in part by NSF grant IIS-0910664. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other sponsors.