Published in the Communication Methods and Measures journal, a recent study by the Leverhulme Centre for Demographic Science’s Oriol Bosch Jover investigates bias in digital trace data by looking at blind spots when obtaining data through web tracking.
The study underlines the importance of tracking across all devices and browsers to ensure accurate measurement in academic research, and of considering potential biases when interpreting digital trace data.
Using data from Spain, Portugal and Italy, the study simulates and demonstrates the prevalence and impact of ‘undercoverage bias’ in online panels, and tests the accuracy of web tracking data in measuring online media exposure. The study defines ‘undercoverage bias’ as researchers’ failure to capture data from all the devices and browsers that individuals use to go online.
The study found a high prevalence of undercoverage, with over 70% of participants in commercial panels not having all their devices and/or browsers tracked. As a result, researchers failed to capture comprehensive data on how these individuals go online.
The type and number of devices that individuals use were identified as the primary determinants of undercoverage. These, rather than individuals’ characteristics, are the key factors in the failure to collect comprehensive data.
Through a simulation study, the researchers demonstrated that web tracking estimates are often substantially biased due to undercoverage. This bias can significantly affect the accuracy of online media exposure measurements.
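To illustrate the mechanism, the following is a minimal sketch of how undercoverage can bias a tracking estimate. It is not the study's simulation: the function name `simulate_undercoverage`, the per-device tracking probability `p_tracked`, and the device and visit distributions are all hypothetical values chosen only for illustration.

```python
import random


def simulate_undercoverage(n_people=1000, p_tracked=0.6, seed=0):
    """Simulate a panel in which each device is tracked with probability p_tracked.

    All distributions here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    true_total = 0      # visits summed over every device a person uses
    tracked_total = 0   # visits summed over tracked devices only
    undercovered = 0    # people with at least one untracked device

    for _person in range(n_people):
        n_devices = rng.randint(1, 4)        # assumed: 1 to 4 devices per person
        missing_device = False
        for _device in range(n_devices):
            visits = rng.randint(0, 20)      # assumed: visits made on this device
            true_total += visits
            if rng.random() < p_tracked:
                tracked_total += visits      # the tracker sees this device
            else:
                missing_device = True        # activity here goes unobserved
        if missing_device:
            undercovered += 1

    true_mean = true_total / n_people
    tracked_mean = tracked_total / n_people
    return {
        "share_undercovered": undercovered / n_people,
        "true_mean": true_mean,
        "tracked_mean": tracked_mean,
        "relative_bias": (tracked_mean - true_mean) / true_mean,
    }


result = simulate_undercoverage()
print(result)
```

Because untracked devices can only remove observed activity, the tracked mean in this sketch can never exceed the true mean, so the relative bias is zero or negative. Setting `p_tracked=1.0` removes the bias entirely.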
These findings have significant implications for researchers and policymakers who rely on digital trace data for studying online behaviours.
Oriol Bosch Jover, Postdoctoral Researcher in Data Donation and Computational Methods at the Leverhulme Centre for Demographic Science, said, ‘Digital trace data is often considered the gold standard for measuring online behaviours, but our study shows that it is not without its flaws. Tracking undercoverage can introduce substantial biases, which researchers need to account for in their analyses.’
While the simulations used to estimate bias due to undercoverage might not capture all real-world complexities, the study offers the following strategies that researchers can use to address biases in web tracking data introduced by undercoverage:
- Use auxiliary survey data: Integrate survey data to identify and correct for undercoverage by comparing tracked and self-reported behaviours.
- Encourage full device coverage: Incentivise participants to install tracking technologies on all their devices and browsers by emphasising the academic research purposes and benefits.
- Apply advanced statistical techniques: Use statistical adjustments, such as weighting and imputation, to account for missing data.
By applying these strategies, researchers can help mitigate the impact of undercoverage and improve the accuracy of digital trace data.
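As one possible illustration of the weighting strategy, the sketch below combines self-reported survey data with tracking data. It assumes, hypothetically, that each panellist reports a device count in a survey (`n_devices`), that the researcher knows whether all of those devices are tracked (`fully_tracked`), and that a tracked exposure measure (`exposure`) is available. Fully tracked panellists are then weighted by the inverse of the coverage rate among people with the same number of devices. This is a generic post-stratification sketch, not the specific adjustment used in the study.

```python
from collections import defaultdict


def weighted_mean_exposure(panel):
    """Estimate mean exposure from fully tracked panellists only.

    Each fully tracked panellist is weighted by the inverse of the share of
    fully tracked people among those reporting the same number of devices.
    The field names (n_devices, fully_tracked, exposure) are hypothetical.
    """
    # n_devices -> [number of panellists, number fully tracked]
    counts = defaultdict(lambda: [0, 0])
    for person in panel:
        counts[person["n_devices"]][0] += 1
        if person["fully_tracked"]:
            counts[person["n_devices"]][1] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for person in panel:
        if not person["fully_tracked"]:
            continue  # exposure is incomplete for this person, so skip it
        total, covered = counts[person["n_devices"]]
        weight = total / covered  # inverse of the coverage rate in this group
        weighted_sum += weight * person["exposure"]
        weight_total += weight

    if weight_total == 0:
        raise ValueError("no fully tracked panellists to estimate from")
    return weighted_sum / weight_total


# Made-up example panel: multi-device users are mostly under-tracked.
panel = [
    {"n_devices": 1, "fully_tracked": True, "exposure": 10},
    {"n_devices": 1, "fully_tracked": True, "exposure": 14},
    {"n_devices": 2, "fully_tracked": True, "exposure": 30},
    {"n_devices": 2, "fully_tracked": False, "exposure": 12},
    {"n_devices": 2, "fully_tracked": False, "exposure": 9},
    {"n_devices": 2, "fully_tracked": False, "exposure": 15},
]
estimate = weighted_mean_exposure(panel)
print(estimate)
```

The design choice here follows the finding that the number of devices is a primary determinant of undercoverage: grouping by device count lets the few fully tracked multi-device users stand in for the many who are only partly tracked. The approach only helps if fully tracked people resemble the under-tracked people within each group, which is an assumption that would need checking in practice.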
The full article ‘Uncovering Digital Trace Data Biases: Tracking Undercoverage in Web Tracking Data’ can be found in the Communication Methods and Measures journal.