Natural Language Processing for Fake News Detection

William Wang, University of California, Santa Barbara

Stevenson Hall 1300
12:00 PM - 12:50 PM

In the past election cycle for the 45th President of the United States, the world witnessed a growing epidemic of fake news. The plague of fake news not only poses serious threats to the integrity of journalism, but has also created turmoil in politics and in the real world. However, statistical approaches to combating fake news have been dramatically limited by the lack of labeled benchmark datasets. In this talk, we will describe LIAR: a new, publicly available dataset for fake news detection. We collected 12.8K manually labeled short statements, made in various contexts over a decade, from POLITIFACT.COM, which provides a detailed analysis report and links to source documents for each case. This dataset can be used for fact-checking research as well. Notably, this new dataset is an order of magnitude larger than the previously largest public fake news datasets of similar type. Empirically, we investigate automatic fake news detection based on surface-level linguistic patterns. We have designed a novel, hybrid convolutional neural network to integrate metadata with text. We show that this hybrid approach can improve on a text-only deep learning model. We will outline future directions and conclude the talk by discussing related technologies in natural language processing.