Online surveys and social media offer abundant data at low cost, but how confident can researchers be about the quality of those data?

Researchers working on this project address this issue in three ways:

Examining the representativeness of Twitter users

Researchers are assessing the degree to which Twitter account holders are representative of the population, stratified by how frequently they use Twitter and including those who do not use it.
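As a rough illustration of a stratified representativeness check, the sketch below compares the demographic mix within each usage stratum to a population benchmark using the index of dissimilarity (half the sum of absolute differences in shares). The choice of this index, the stratum labels, the age groups, and the benchmark shares are all assumptions made for the example; the project's actual measures are not described here.

```python
from collections import Counter, defaultdict

def shares(labels):
    """Convert a list of group labels into a dict of proportions."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def dissimilarity(p, q):
    """Index of dissimilarity between two share distributions (0 = identical)."""
    groups = set(p) | set(q)
    return 0.5 * sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in groups)

def dissimilarity_by_stratum(respondents, benchmark):
    """Compare each usage stratum's demographic mix to the benchmark."""
    by_stratum = defaultdict(list)
    for stratum, group in respondents:
        by_stratum[stratum].append(group)
    return {s: dissimilarity(shares(g), benchmark) for s, g in by_stratum.items()}

# Hypothetical respondents: (Twitter usage stratum, age group).
respondents = [
    ("daily", "18-29"), ("daily", "18-29"), ("daily", "30+"),
    ("never", "30+"), ("never", "30+"), ("never", "18-29"), ("never", "30+"),
]
# Hypothetical population shares, not real benchmark figures.
benchmark = {"18-29": 0.25, "30+": 0.75}

result = dissimilarity_by_stratum(respondents, benchmark)
for stratum, index in sorted(result.items()):
    print(f"{stratum}: {index:.3f}")
```

A value near zero suggests the stratum resembles the benchmark on that attribute; larger values indicate strata where weighting or other adjustment may matter more.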

Assessing the adequacy of demographic information about Twitter users

Currently, demographic information about Twitter users is sparse and incomplete. However, researchers are using additional data to model the relationship between characteristics of tweets (for example, emojis, word structures, and sentiments) and respondent demographics. The results will be used to multiply impute demographics on a sample of Twitter handles.
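To make the multiple-imputation step concrete, here is a minimal sketch. It assumes a model has already produced, for each Twitter handle, predicted probabilities over demographic categories; the probabilities, the age groups, and the function names are hypothetical. Each imputation draws one label per handle from its predicted distribution, and an estimated share is pooled across imputations with Rubin's rules (average the estimates; total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance).

```python
import random
from statistics import mean, variance

def impute_once(probs, rng):
    """Draw one demographic label per handle from its predicted distribution."""
    labels = []
    for p in probs:
        categories = list(p.keys())
        weights = list(p.values())
        labels.append(rng.choices(categories, weights=weights, k=1)[0])
    return labels

def multiply_impute(probs, m=5, seed=0):
    """Create m imputed label sets, one list of labels per imputation."""
    rng = random.Random(seed)
    return [impute_once(probs, rng) for _ in range(m)]

def pool_proportion(imputations, category):
    """Pool the share of `category` across imputations using Rubin's rules."""
    m = len(imputations)
    n = len(imputations[0])
    estimates = [sum(lab == category for lab in imp) / n for imp in imputations]
    # Simple binomial variance of a proportion within each imputed data set.
    within = mean([q * (1 - q) / n for q in estimates])
    between = variance(estimates) if m > 1 else 0.0
    total_var = within + (1 + 1 / m) * between
    return mean(estimates), total_var

# Hypothetical predicted probabilities for four handles (not real model output).
probs = [
    {"18-29": 0.7, "30+": 0.3},
    {"18-29": 0.2, "30+": 0.8},
    {"18-29": 0.5, "30+": 0.5},
    {"18-29": 0.1, "30+": 0.9},
]
imputations = multiply_impute(probs, m=5, seed=42)
estimate, total_var = pool_proportion(imputations, "18-29")
print(f"pooled share 18-29: {estimate:.2f}, total variance: {total_var:.4f}")
```

Drawing labels rather than assigning the most probable category carries the prediction uncertainty into downstream estimates, which is the main reason to impute multiple times.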

Findings from a preliminary analysis of methods for validating predictions of socio-demographics of Twitter users. The graph shows the distribution of the four most common methods, by attribute.

Developing alternative weighting procedures that use imputed demographics to improve the population representativeness of Twitter account holders

Using data from the American Community Survey and the Pew Research Center surveys on internet use, researchers cross-validate the representativeness of Twitter users.
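One common way to weight a sample toward population benchmarks is post-stratification, sketched below as an illustration only; the project's alternative weighting procedures may differ. The demographic cells, the benchmark shares, and the outcome variable are made-up placeholders, not figures from the American Community Survey or Pew Research Center.

```python
from collections import Counter

def poststratification_weights(sample_cells, population_shares):
    """Weight each unit by population share / sample share of its cell."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    missing = set(counts) - set(population_shares)
    if missing:
        raise ValueError(f"no benchmark share for cells: {sorted(missing)}")
    return [population_shares[c] / (counts[c] / n) for c in sample_cells]

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical sample of ten account holders with (imputed) age groups.
sample_cells = ["18-29"] * 6 + ["30+"] * 4
# Hypothetical benchmark shares standing in for external survey estimates.
population_shares = {"18-29": 0.2, "30+": 0.8}
# Hypothetical binary outcome: 1 for every younger user, 0 for every older user.
outcome = [1] * 6 + [0] * 4

weights = poststratification_weights(sample_cells, population_shares)
print(f"unweighted mean: {sum(outcome) / len(outcome):.2f}")
print(f"weighted mean:   {weighted_mean(outcome, weights):.2f}")
```

In this toy sample younger users are over-represented, so they receive weights below one and the weighted estimate moves toward the benchmark composition.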

Making methodological advancements in blending survey and social media data