Online surveys and social media offer abundant data at low cost, but how confident can researchers be about the quality of that data?
Researchers working on this project address this issue in three ways:
Examining the representativeness of Twitter users
First, researchers are assessing how well Twitter account holders represent the population. The assessment is stratified by how frequently people use Twitter, including those who do not use it.
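One simple way to summarize representativeness within each usage stratum is a dissimilarity index: half the summed absolute difference between a stratum's demographic shares and a population benchmark. The sketch below is illustrative only. The function name, the stratum labels, and all numbers are hypothetical, and the project may use a different measure.

```python
def dissimilarity(sample_shares, benchmark_shares):
    """Half the summed absolute gap between two distributions.

    Returns 0.0 when the distributions match and 1.0 when they do not overlap.
    """
    groups = set(sample_shares) | set(benchmark_shares)
    return 0.5 * sum(
        abs(sample_shares.get(g, 0.0) - benchmark_shares.get(g, 0.0))
        for g in groups
    )


# Made-up age distributions; not real survey results.
benchmark = {"18-29": 0.2, "30-49": 0.4, "50+": 0.4}
strata = {
    "daily": {"18-29": 0.5, "30-49": 0.3, "50+": 0.2},
    "occasional": {"18-29": 0.3, "30-49": 0.4, "50+": 0.3},
    "non-user": {"18-29": 0.1, "30-49": 0.4, "50+": 0.5},
}

# One gap per usage stratum; larger values mean a less representative stratum.
gaps = {name: dissimilarity(shares, benchmark) for name, shares in strata.items()}
```

In this toy example the daily-use stratum differs most from the benchmark, which is the kind of pattern a stratified comparison is meant to reveal.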
Assessing the adequacy of demographic information about Twitter users
Second, demographic information about Twitter users is currently sparse and incomplete. To fill this gap, researchers are using additional data to model the relationship between characteristics of tweets (for example, emojis, word structures, and sentiment) and respondent demographics. The results will be used to multiply impute demographics for a sample of Twitter handles.
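The multiple-imputation step can be sketched as follows, assuming a model has already produced, for each handle, a predicted probability for each demographic category. The function names, the category labels, and the probabilities are hypothetical. This simplified version draws labels from fixed predicted probabilities and pools the results with Rubin's rules; a full multiple imputation would also propagate uncertainty in the model's parameters.

```python
import random
from statistics import mean, variance


def impute_once(probabilities, rng):
    """Draw one demographic label per handle from its predicted probabilities."""
    imputed = []
    for probs in probabilities:
        labels = list(probs)
        weights = [probs[label] for label in labels]
        imputed.append(rng.choices(labels, weights=weights, k=1)[0])
    return imputed


def pooled_share(probabilities, label, m=20, seed=0):
    """Estimate the share of `label` from m imputed datasets (m must be >= 2).

    Returns (pooled estimate, total variance) using Rubin's rules:
    total = mean within-imputation variance + (1 + 1/m) * between variance.
    """
    rng = random.Random(seed)
    n = len(probabilities)
    estimates, within = [], []
    for _ in range(m):
        draw = impute_once(probabilities, rng)
        p = draw.count(label) / n
        estimates.append(p)
        within.append(p * (1 - p) / n)  # binomial variance of a proportion
    between = variance(estimates)  # sample variance across imputations
    total_variance = mean(within) + (1 + 1 / m) * between
    return mean(estimates), total_variance


# Made-up model output for five handles; not real predictions.
predicted = [
    {"under_30": 0.8, "30_plus": 0.2},
    {"under_30": 0.6, "30_plus": 0.4},
    {"under_30": 0.3, "30_plus": 0.7},
    {"under_30": 0.9, "30_plus": 0.1},
    {"under_30": 0.5, "30_plus": 0.5},
]
share, total_var = pooled_share(predicted, "under_30", m=20, seed=1)
```

Producing several imputed datasets rather than one lets the uncertainty of the imputation show up in the final variance, instead of treating a single predicted label as if it were observed.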
Developing alternative weighting procedures that use imputed demographics to improve the population representativeness of Twitter account holders
Third, researchers are using data from the American Community Survey and Pew Research Center surveys on internet use to cross-validate the representativeness of Twitter users.
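As an illustration of what a weighting procedure can look like, the sketch below computes simple post-stratification weights: each account holder's weight is the population share of their demographic cell divided by that cell's share of the sample. This is one common approach, not necessarily the procedure the project develops. The function name and all numbers are hypothetical; in practice the population shares would come from a benchmark such as the American Community Survey, and the cell labels from the imputed demographics.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return one weight per sample unit: population share / sample share."""
    counts = Counter(sample_cells)
    n = len(sample_cells)
    weights = []
    for cell in sample_cells:
        sample_share = counts[cell] / n
        weights.append(population_shares[cell] / sample_share)
    return weights


# Made-up (imputed) age groups for six account holders.
sample = ["18-29", "18-29", "18-29", "18-29", "30-49", "50+"]
# Made-up population shares standing in for benchmark estimates.
population = {"18-29": 0.2, "30-49": 0.4, "50+": 0.4}

weights = poststratification_weights(sample, population)
```

Overrepresented cells (here, the youngest group) receive weights below 1 and underrepresented cells receive weights above 1. With multiply imputed demographics, the weights would be recomputed for each imputed dataset and the resulting estimates pooled.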