The overlooked politics of synthetic data performance metrics

Louis Ravn, University of Copenhagen, Denmark

PUBLISHED ON: 2 May 2024

In recent years, the advent of synthetic data — artificially produced data used for data science tasks (Jordon et al., 2022) — has led to a questioning of a premise of “data-driven” societies: the need to collect data from real persons, objects, and events. Thanks to the growing availability of generative artificial intelligence (AI), synthetic data proponents claim, the data needed to train machine learning algorithms can now be produced artificially (Jacobsen, 2023). The promises of synthetic data have taken hold in fields as diverse as finance (e.g. Assefa et al., 2020), transport (e.g. Osiński et al., 2020), and medicine (e.g. Chen et al., 2021). This hype, however, is undergirded by an overlooked infrastructure: performance metrics.

Such metrics should be critically scrutinised with an awareness of the social scientific insight that metrics are performative: they may actively shape the world rather than merely describe it. I first sketch the current landscape of synthetic data performance metrics, before emphasising its two central issues: the lack of standardised metrics and the reliance on quantified notions of such ambiguous concepts as fairness. While the intention behind synthetic data performance metrics is often laudable, a lack of critical engagement risks troubling consequences, including the reduction of AI ethics to dataset evaluation and the bolstering of the data capitalist status quo.

The proliferation of synthetic data performance metrics

In the realm of synthetic data for machine learning, performance metrics are used to compare how a given model fares when trained on real data as opposed to on synthetic data (Jordon et al., 2022). Across both the synthetic data industry and academic research environments, various performance metrics, notably around utility, quality, and fairness, are beginning to proliferate.
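The comparison described above is often operationalised as a “train on synthetic, test on real” protocol. The sketch below is a minimal illustration of that logic only; the dataset and the nearest-class-mean “model” are invented stand-ins, not any firm’s or paper’s actual pipeline:

```python
# Sketch of a "train on synthetic, test on real" comparison: the same
# model class is fit once on real data and once on synthetic data, and
# both versions are scored on held-out real records. All data are toy.

def fit(rows):
    """Fit a nearest-class-mean classifier on (feature, label) rows."""
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}  # class -> mean feature

def accuracy(means, rows):
    """Predict the class whose mean feature is closest; return hit rate."""
    hits = 0
    for x, y in rows:
        pred = min(means, key=lambda c: abs(means[c] - x))
        hits += (pred == y)
    return hits / len(rows)

real_train = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b")]
synthetic  = [(0.15, "a"), (0.25, "a"), (0.85, "b"), (0.95, "b")]
real_test  = [(0.05, "a"), (0.3, "a"), (0.8, "b"), (1.1, "b")]

# A small gap is read as evidence that the synthetic data "works".
utility_gap = accuracy(fit(real_train), real_test) - accuracy(fit(synthetic), real_test)
print(utility_gap)  # 0.0: this toy synthetic data preserves the class structure
```

Note how the protocol reduces “utility” to a single performance gap: everything the downstream task does not measure drops out of view.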

One leading synthetic data firm, Gretel AI, provides its business users with both quality and utility reports. While the former evaluates the similarity between real and synthetic data on a score from 0 to 100 — based on multiple statistical concepts — the latter quantitatively reports on the “utility” of the generated synthetic data for the respective machine learning pipeline. Similar quantitative metrics are used by other prominent synthetic data firms, including Hazy and MOSTLY AI.
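To make such 0-to-100 scores concrete, the sketch below derives a toy quality score from a single statistic: the total variation distance between the marginal distributions of one categorical column. Commercial reports aggregate many such statistics; this illustration is an assumption on my part, not Gretel AI’s or any other vendor’s formula:

```python
# Illustrative only: a toy 0-100 "quality score" built from one
# statistic (total variation distance between real and synthetic
# marginals of a single categorical column). Not any vendor's method.
from collections import Counter

def marginal(values):
    """Empirical distribution of a categorical column."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def quality_score(real_col, synth_col):
    p, q = marginal(real_col), marginal(synth_col)
    categories = set(p) | set(q)
    tvd = 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)
    return round(100 * (1 - tvd))  # 100 = identical marginals

real  = ["red", "red", "blue", "green"]
synth = ["red", "blue", "blue", "green"]
print(quality_score(real, synth))  # 75: the marginals differ by TVD 0.25
```

Even in this tiny example, the single number conceals which categories diverge and why, which is precisely the reduction the argument below takes issue with.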

Academic research on synthetic data metrics reveals more explicit fragmentation. One article on synthetic health data, for example, points to seven different ways to measure the utility of synthetic data (El Emam, 2020), while another paper on synthetic data generation diagnoses “the absence of standardised metrics” (Bauer et al., 2024). Notably, synthetic data generation is also evaluated in terms of “fairness” (e.g. Chaudhari et al., 2022), denoting the absence of bias in datasets.
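One common way such work quantifies dataset “fairness” is demographic parity: the gap in positive-outcome rates between groups. The sketch below, with invented data, shows the kind of single number these evaluations produce; it is a generic illustration, not the specific metric of any cited paper:

```python
# Sketch of one quantified "fairness" notion: demographic parity
# difference, i.e. the gap in positive-outcome rates between groups
# in a (synthetic) dataset. Rows are (group, outcome); data invented.

def positive_rate(rows, group):
    """Share of positive outcomes (1s) within one group."""
    outcomes = [y for g, y in rows if g == group]
    return sum(outcomes) / len(outcomes)

def parity_gap(rows):
    """Largest difference in positive rates across groups; 0.0 = parity."""
    groups = {g for g, _ in rows}
    rates = [positive_rate(rows, g) for g in groups]
    return max(rates) - min(rates)

synthetic = [("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0), ("B", 0)]
print(round(parity_gap(synthetic), 2))  # 0.33: group A is favoured
```

A dataset can score well on such a gap while remaining biased in ways the chosen grouping and outcome variable never register, which is one reason the ambiguity of “fairness” matters below.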

These examples underscore the breadth of performance metrics, the relative absence of standardised frameworks, and a reliance on the quantification of such ambiguous concepts as utility, quality, and fairness. But might these performance metrics do more than simply measure?

Metrics don’t simply measure

Researchers in various social sciences have argued that metrics do not simply measure aspects of the world. Metrics are instead performative — they participate in shaping the world. Science and technology studies (STS) scholars have pointed to various examples of this dynamic. For example, the models with which economists measure aspects of markets may actually shape markets so as to align with said models (MacKenzie, 2006). Espeland and Sauder (2007), moreover, convincingly show how quantitative university rankings — and the metrics upon which they are based — lead the entities thus measured to conform with them. 

Critical data studies have similarly showcased the power of metrics. Domínguez Hernández et al. (2023) argue that “performance metrics have performative power because they create expectations around, and effectively vouch for, the value of an algorithm” (p. 12). In the context of sustainability metrics, Archer (2024) insightfully argues that quantitative metrics reinforce dominant data-driven approaches to sustainability, in turn bolstering the power of large companies rather than questioning their role in causing climate change. Thus, as metrics do more than simply measure, what might this tell us about synthetic data performance metrics?

The politics of synthetic data performance metrics

It follows that synthetic data performance metrics are more than neutral descriptions of the utility, quality, or fairness of synthetic data. Instead, these metrics performatively induce two deeply questionable views: first, that utility, quality, and fairness can and should be measured in highly quantitative terms; and second, that if synthetic datasets score highly on these dimensions, they can be used without further investigation. In so doing, these metrics posit the evaluation of synthetic datasets as crucial, implicitly reducing the complexities of data utility, fairness, and quality to quantitative variables. Consequently, these metrics contribute to what Jacobsen (2023) astutely hinted at: a reduction of AI ethics to quantitative evaluations of a given synthetic dataset’s utility, quality, or fairness score.

What gets circumvented by a fetishisation of performance metrics, however, are deeper reflections about the ends of synthetic data generation and use. Insofar as performance metrics reinforce the misplaced sense that synthetic data imply ethicality, they risk simply bolstering the status quo of data capitalism (Steinhoff, 2022). Specifically, these metrics operate in accordance with a political economic system whose logic requires the permanent recombination and reuse of data, which today exists in a state of productive surplus (Halpern et al., 2022).

Of course, as with any technological innovation, effective policy has the power to shape its implications in socially beneficial ways. This is especially true in the case of synthetic data, which, due to their unique traits, may reconfigure established paradigms of data policy. Significantly, then, both the General Data Protection Regulation (GDPR) and the recently passed Data and AI Acts remain surprisingly silent on synthetic data. While the GDPR operates with a rigid distinction between personal and non-personal data, the Data Act moves beyond this dualism (see also Beduschi, 2024). Since synthetic data often intertwine personal and non-personal data (e.g. Tinsley et al., 2022), this is a sensible approach. The AI Act, however, mentions synthetic data only in passing. A future question for policy, then, is how to work towards more standardised approaches to synthetic data without fetishising performance metrics.

Thus, it remains crucial for citizens and scholars alike to attend critically to the emergent landscape of synthetic data performance metrics — and to their overlooked power in shaping this burgeoning field.

References

Archer, M. (2024). Unsustainable: Measurement, reporting, and the limits of corporate sustainability. NYU Press.

Assefa, S. A., Dervovic, D., Mahfouz, M., Tillman, R. E., Reddy, P., & Veloso, M. (2020). Generating synthetic data in finance: Opportunities, challenges and pitfalls. Proceedings of the First ACM International Conference on AI in Finance, 1–8. https://doi.org/10.1145/3383455.3422554

Bauer, A., Trapp, S., Stenger, M., Leppich, R., Kounev, S., Leznik, M., Chard, K., & Foster, I. (2024). Comprehensive exploration of synthetic data generation: A survey (arXiv:2401.02524). arXiv. https://doi.org/10.48550/arXiv.2401.02524

Beduschi, A. (2024). Synthetic data protection: Towards a paradigm change in data regulation? Big Data & Society, 11(1). https://doi.org/10.1177/20539517241231277

Chaudhari, B., Chaudhary, H., Agarwal, A., Meena, K., & Bhowmik, T. (2022). FairGen: Fair synthetic data generation (arXiv:2210.13023). arXiv. https://doi.org/10.48550/arXiv.2210.13023

Chen, R. J., Lu, M. Y., Chen, T. Y., Williamson, D. F. K., & Mahmood, F. (2021). Synthetic data in machine learning for medicine and healthcare. Nature Biomedical Engineering, 5, 493–497. https://doi.org/10.1038/s41551-021-00751-8

Domínguez Hernández, A., Owen, R., Nielsen, D. S., & McConville, R. (2023). Ethical, political, and epistemic implications of machine learning (mis)information classification: Insights from an interdisciplinary collaboration between social and data scientists. Journal of Responsible Innovation, 10(1), 1–25. https://doi.org/10.1080/23299460.2023.2222514

El Emam, K. (2020). Seven ways to evaluate the utility of synthetic data. IEEE Security and Privacy, 18(4), 56–59. https://doi.org/10.1109/MSEC.2020.2992821

Espeland, W. N., & Sauder, M. (2007). Ranking and reactivity: How public measures recreate social worlds. American Journal of Sociology, 113(1), 1–40. https://doi.org/10.1086/517897

Halpern, O., Jagoda, P., West Kirkwood, J., & Weatherby, L. (2022). Surplus data: An introduction. Critical Inquiry, 48(2), 197–210. https://doi.org/10.1086/717320

Jacobsen, B. N. (2023). Machine learning and the politics of synthetic data. Big Data & Society, 10(1), 1–12. https://doi.org/10.1177/20539517221145372

Jordon, J., Szpruch, L., Houssiau, F., Bottarelli, M., Cherubin, G., Maple, C., Cohen, S. N., & Weller, A. (2022). Synthetic data—What, why and how? (arXiv:2205.03257). arXiv. https://doi.org/10.48550/arXiv.2205.03257

MacKenzie, D. (2006). Is economics performative? Option theory and the construction of derivatives markets. Journal of the History of Economic Thought, 28(1), 29–55. https://doi.org/10.1080/10427710500509722

Osiński, B., Jakubowski, A., Zięcina, P., Miłoś, P., Galias, C., Homoceanu, S., & Michalewski, H. (2020). Simulation-based reinforcement learning for real-world autonomous driving. 2020 IEEE International Conference on Robotics and Automation (ICRA), 6411–6418. https://doi.org/10.1109/ICRA40945.2020.9196730

Steinhoff, J. (2022). Toward a political economy of synthetic data: A data-intensive capitalism that is not surveillance capitalism? New Media & Society, 1–17. https://doi.org/10.1177/14614448221099217

Tinsley, P., Czajka, A., & Flynn, P. (2022). Haven’t I seen you before? Assessing identity leakage in synthetic irises (arXiv:2211.05629). arXiv. https://doi.org/10.48550/arXiv.2211.05629