1593
Development of an App to Flag Health News Disinformation

Kristen Swain, PhD, School of Journalism and New Media, University of Mississippi, University, MS and Naeemul Hassan, PhD, Department of Computer & Information Science, University of Mississippi, University, MS

Background:

Health misinformation, spread through viral news hoaxes, has emerged as a critical social media problem over the last decade. Although trusted, reliable media outlets do occasionally spread health misinformation on social media, most intentional disinformation originates from unreliable media sources.

Program background:

Using a large collection of health-related news articles published by reliable and unreliable media outlets, this study examines how health news framing is exploited to spread disinformation. Findings will inform the development of algorithms and a new text analysis tool to help journalists, readers and news providers differentiate credible health news from disinformation.

Evaluation Methods and Results:

A data collection program was used to curate a preliminary set of about 66,000 news articles from 27 reliable media outlets and 20 unreliable outlets. Next, a large-scale repository of print and broadcast health news stories that appeared on social networking sites will be built and analyzed. A systematic analysis of media content across reliable and unreliable stories will identify network characteristics and engagement patterns among readers in different age groups. The team has developed a scraper program that automatically collects articles from media sources’ websites, separates health-oriented news articles from non-health articles, and gathers social media engagement metrics, such as comments, shares and likes, for each article. Algorithms also will identify story characteristics, including headlines, bylines, leads, captions, video and other images, topics, sourcing patterns, factual and opinion statements, and quote types. Reliable sources will include CNN Health, Cancer.gov and WebMD, among others. Unreliable sources will include sites such as REALfarmacy.com and HealthNutNews. Unreliable story characteristics include disease mongering, vague sourcing, and failure to identify financial conflicts of interest. After completing the content analysis, the team will conduct reader surveys, interviews and focus groups with adolescents younger than 18 and seniors 65 and older who read, share and engage with health stories. This work will identify health disinformation challenges and explore whether age, gender, education level and news consumption behaviors predict susceptibility to health disinformation.
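The filtering and engagement steps described above could be sketched roughly as follows. This is an illustrative Python sketch only: the abstract does not say how health articles are separated from non-health articles, so the keyword list `HEALTH_TERMS`, the `Article` record, the `min_hits` threshold and both helper functions are hypothetical stand-ins rather than the team's scraper.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the abstract does not specify the actual
# method used to separate health from non-health articles.
HEALTH_TERMS = {
    "cancer", "vaccine", "diabetes", "disease",
    "health", "nutrition", "symptom", "treatment",
}


@dataclass
class Article:
    """Assumed record for one scraped article and its engagement counts."""
    outlet: str
    headline: str
    body: str
    comments: int = 0
    shares: int = 0
    likes: int = 0


def is_health_article(article: Article, min_hits: int = 2) -> bool:
    """Return True if the headline and body contain at least `min_hits` health terms."""
    text = f"{article.headline} {article.body}".lower()
    words = re.findall(r"[a-z]+", text)
    hits = sum(1 for word in words if word in HEALTH_TERMS)
    return hits >= min_hits


def total_engagement(article: Article) -> int:
    """Sum the three engagement metrics named in the abstract."""
    return article.comments + article.shares + article.likes


if __name__ == "__main__":
    story = Article(
        outlet="example-outlet",
        headline="New vaccine trial for cancer",
        body="Researchers announced early results.",
        comments=1, shares=2, likes=3,
    )
    print(is_health_article(story), total_engagement(story))
```

A simple keyword count is used here only because it is easy to inspect; a trained text classifier would be a more plausible choice at the scale of tens of thousands of articles.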

Conclusions:

Natural language analysis of the initial set of about 66,000 stories revealed structural, topical, and semantic patterns that differentiate reliable from unreliable health stories. Unreliable media outlets more often used click-bait headlines to catch users' attention and were less likely to use direct quotes and hyperlinks. Unreliable outlets also tended to have longer headlines, which often have higher click-through rates than short headlines do.
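The structural signals named above (headline length, direct quotes and hyperlinks) can be illustrated with a minimal Python sketch. It assumes each story is available as a headline string plus an HTML body; the function name `structural_features` and the regular expressions are assumptions made for illustration, not the study's actual feature extractor.

```python
import re


def structural_features(headline: str, body_html: str) -> dict:
    """Count three simple structural signals for one story."""
    # Headline length in whitespace-separated words.
    headline_words = len(headline.split())

    # Count hyperlinks from the raw HTML before tags are stripped.
    hyperlinks = len(re.findall(r"<a\s[^>]*href=", body_html, flags=re.IGNORECASE))

    # Strip tags so quoted attribute values are not mistaken for direct quotes.
    text = re.sub(r"<[^>]+>", " ", body_html)

    # Direct quotes: spans enclosed in straight or curly double quotation marks.
    direct_quotes = len(re.findall(r'"[^"]+"|“[^”]+”', text))

    return {
        "headline_words": headline_words,
        "direct_quotes": direct_quotes,
        "hyperlinks": hyperlinks,
    }


if __name__ == "__main__":
    features = structural_features(
        "You Won't Believe What This Fruit Does",
        '<p>"It works," said Dr. Lee. See <a href="https://example.org">the study</a>.</p>',
    )
    print(features)
```

Counts like these would serve only as inputs to a later comparison between reliable and unreliable outlets; on their own they do not label any story as disinformation.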

Implications for research and/or practice:

Findings will be used to develop an app to combat health disinformation, as well as recommendations for policies that discourage health hoax propagators. The findings also will inform recommendations for journalists who cover health topics, as well as a computer application that could provide instant feedback on their draft stories. The application could help commercial third-party news aggregator applications such as Yahoo News, Flipboard and Bundle News automatically flag health disinformation for removal from news feeds. Finally, the findings will inform the development of media literacy educational materials to help news consumers of all ages learn to identify health disinformation.