Disciplines

Applied Statistics | Data Science | Public Health

Abstract

The use of online crowdsourcing platforms for data collection has increased over the past two decades in public health research due to their ease of use, cost savings, speed of data collection, and access to a potentially representative population. Although these platforms offer many advantages to researchers, significant drawbacks exist, such as poor data quality, that threaten the reliability and validity of study findings. Previous studies have examined data quality concerns, but their results differ because of variations in study designs, disciplinary contexts, and the platforms investigated. This study therefore concentrated on data quality for Patient-Reported Outcomes in orthopedic and sports medicine research using Qualtrics, SurveyMonkey, and Amazon Mechanical Turk (MTurk). Multiple rounds of data collection were executed across the three platforms to assess data quality. Four primary data quality assessment categories were defined: demographic consistency (e.g., State vs. Zip code, State vs. Region), attention, honesty, and logic. Data were collected in Qualtrics (n = 500), SurveyMonkey (n = 400), and MTurk (n = 400), and descriptive analyses were conducted to assess data quality. Pearson correlation coefficients were computed to compare the Relevant ID score provided by pay-for-data services with the total number of flags assigned by the research team. Chi-square tests were performed to compare the proportions of good-quality data across services. Based on these analyses, Qualtrics provided the best data quality of the three platforms, SurveyMonkey produced reasonable data quality, and MTurk the worst.
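The two statistical comparisons described above can be sketched in plain Python: a Pearson correlation between a platform-provided quality score and the number of researcher-assigned flags per response, and a chi-square test on the counts of good versus flagged responses per service. This is only an illustrative sketch; every number below is a made-up placeholder, not the study's data, and the study itself would have used standard statistical software rather than hand-rolled routines.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def chi_square(table):
    """Chi-square statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Illustrative (invented) per-response data: platform quality score vs.
# number of flags assigned by the research team. If flags capture real
# problems, higher scores should correlate with fewer flags (r < 0).
scores = [85, 92, 40, 75, 60, 95, 30, 88]
flags = [1, 0, 4, 1, 2, 0, 5, 1]
r = pearson_r(scores, flags)

# Illustrative (invented) counts of good vs. flagged responses.
# Rows: Qualtrics, SurveyMonkey, MTurk; columns: good, flagged.
counts = [[450, 50],
          [320, 80],
          [220, 180]]
chi2 = chi_square(counts)  # large value -> proportions differ across services
```

The chi-square statistic would then be compared against the critical value for (rows − 1) × (columns − 1) = 2 degrees of freedom to judge whether the proportion of good-quality responses differs significantly across the three services.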

Academic department under which the project should be listed

CCSE - Data Science and Analytics

Primary Investigator (PI) Name

Kevin Gittner


Data Quality Checks: Implementation With Popular Data Collection Crowdsourcing Platforms
