35448 Geo-Targeted Social Media Analytics for Tracking Disease Outbreaks and Public Opinion at the Municipal Scale

Ming-Hsiang Tsou, Ph.D.1, Brian H. Spitzberg, Ph.D.2, Chris Allen, M.A.1, Anoshé Aslam, B.S.3 and Anna Nagel, MPH3, 1Department of Geography, San Diego State University, San Diego, CA, 2School of Communication, San Diego State University, San Diego, CA, 3Graduate School of Public Health, San Diego State University, San Diego, CA

Theoretical Background and research questions/hypothesis: Surveillance is a key component of disease detection and public health research, but traditional methods of collecting patient data and reporting to health officials are costly and time-consuming. In recent years, infoveillance tools have been developed that let researchers track and analyze data available in real time on the Internet and social media. Although many infoveillance tools have been developed, very few focus on geo-targeted data collection at the municipal scale with Geographic Information Systems (GIS) analysis capability. We developed geo-targeted social media APIs (for Twitter) and analyzed “flu”-related tweets from the 31 largest U.S. cities using 2010 census data and GIS methods. We found that each city had its own pattern of flu tweeting rates during the 2013/2014 flu season. Compared with national or regional approaches, disease outbreak detection and public opinion analysis at the municipal scale are more valuable for public health agencies and practitioners.

Methods: Tweets containing the keyword “flu” were collected continuously from Week 39, 2013 to Week 10, 2014 using customized geo-targeted Twitter Search APIs, covering a 17-mile radius around each of 31 U.S. cities. Our prediction and analysis model adapts part of the method from our earlier paper (Nagel et al., J Med Internet Res 2013;15(10)). We compared the temporal patterns of weekly flu tweeting rates in each city with the CDC regional influenza-like illness (ILI) data. The correlation coefficients between flu tweeting rates and the regional CDC ILI records were calculated, and the trends were plotted for visual comparison.
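
The radius-based geo-targeting described above can be illustrated with a minimal sketch. It is not the customized Twitter Search API used in the study; it is a hypothetical client-side filter that assumes each tweet is a dictionary with "lat" and "lon" keys and that a city is represented by a single center point. The function names and the coordinates in the usage note are illustrative assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(tweets, center, radius_miles=17.0):
    """Keep tweets whose coordinates fall within radius_miles of center.

    tweets: iterable of dicts with "lat" and "lon" keys (assumed structure).
    center: (lat, lon) tuple for the city center (assumed representation).
    """
    lat0, lon0 = center
    return [
        t for t in tweets
        if haversine_miles(lat0, lon0, t["lat"], t["lon"]) <= radius_miles
    ]
```

For example, with an approximate San Diego center of (32.7157, -117.1611), a tweet geotagged in downtown Los Angeles would be excluded, while one a few miles from the center would be kept.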

Results: Because the CDC does not provide ILI data at the municipal scale, we compared each city’s flu tweeting rate with the ILI record of the CDC region (one of 10) that contains the city. Correlation coefficients (R-values) between weekly aggregated flu tweeting rates and disease occurrence (CDC regional ILI) were high in many cities (Atlanta vs. Region-4 ILI: 0.79, Nashville vs. Region-4 ILI: 0.93, Chicago vs. Region-5 ILI: 0.75, Detroit vs. Region-5 ILI: 0.84, Los Angeles vs. Region-9 ILI: 0.88, Seattle vs. Region-10 ILI: 0.78). We also collected the weekly laboratory-confirmed influenza cases from San Diego County and compared them with the weekly flu tweeting rates in San Diego. The R-value was 0.9185, which indicates a very strong correlation of temporal patterns between influenza cases and flu tweeting rates at the municipal scale.
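
As an illustration of how such R-values can be computed, the sketch below calculates a Pearson correlation coefficient between two weekly series. It assumes the reported R-values are Pearson coefficients and that tweeting rates are normalized by census population (here per 100,000 residents); both are assumptions, and the function names are hypothetical rather than taken from the study.

```python
import math


def tweets_per_100k(weekly_counts, population):
    """Convert weekly tweet counts to rates per 100,000 residents.

    The per-100,000 normalization is an assumption made for illustration.
    """
    return [count / population * 100_000 for count in weekly_counts]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Given a city's weekly tweet counts and the matching weekly regional ILI percentages over the same weeks, calling pearson_r(tweets_per_100k(counts, population), ili) yields one coefficient per city. Because the Pearson coefficient is unchanged by rescaling either series, the normalization affects comparisons of rates across cities but not the R-value itself.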

Conclusions: This study demonstrates that geo-targeted social media analytics combined with GIS methods can feasibly track and monitor flu outbreaks at the municipal scale. One key advantage of our tools is their ability to track flu outbreaks automatically every day. This daily monitoring capability is essential for local public health agencies and practitioners investigating large-scale disease outbreaks.

Implications for research and/or practice: This municipal-level flu outbreak monitoring method can be applied to major U.S. cities to give local health agencies and hospitals better surveillance tools for monitoring local disease outbreaks. The methods may be extended to other public health issues, such as air pollution, depression, and other epidemics.