Misguided: AI regulation needs a shift in focus

Agathe Balayn, Delft University of Technology (TU Delft), Netherlands
Seda Gürses, Delft University of Technology (TU Delft), Netherlands

PUBLISHED ON: 30 Sep 2024

This opinion piece is part of AI systems for the public interest, a special issue of Internet Policy Review guest-edited by Theresa Züger and Hadi Asghari.

Funding Note

This piece is based on broader research conducted within the Programmable Infrastructures Project at TU Delft. This work is partially funded by the NWO AlgoSoc Project (NL) as well as a research grant from the Botnar Foundation (CH).

Our current policy and research focus on artificial intelligence (AI) needs a paradigmatic shift if technology is to be regulated effectively. Recent regulatory efforts, such as the EU AI Act and subsequent public interest technology initiatives focusing on AI (Züger & Asghari, 2023), make evident that AI-based systems and services grab the attention of policymakers and researchers. While these initiatives have their merits, they end up narrowly focusing efforts on the latest trends in digitalisation. We argue that this approach leaves untouched the engineered environments in which digital services are produced, thereby undermining efforts to regulate AI and to ensure that AI-based services serve the public interest. Even when policymakers consider how digital services are produced, they assume that AI is captured by a few cloud companies (Cobbe, 2024; Vipra & West, 2023), when, in fact, AI is a product of these production environments. If policymakers fail to recognise and tackle issues stemming from AI's production environments, their focus on AI may be misguided. Accordingly, we argue that policy and research should shift their regulatory focus from AI to its production environments. To illustrate the potential of this shift, we start with a short history of forms of digitalisation and the regulatory response to each, using the example of a mundane but familiar public service: the postal service.

AI is the game changer in regulating digitalisation

A missed package might lead you to query your national postal service as to its whereabouts. Doing so is likely to present an array of challenges. Until recently, these would have required you to scroll through long lists of FAQs or to type to a somewhat basic chatbot, all in order to eventually dig out a phone number. A call would test your skills in attentive listening to menu items before you finally reached a customer “support agent”, also known as a human being.1 This user experience is the product of what the industry calls “software-supported customer service”. In the last two decades, portions of postal websites replaced the support that used to be provided by employees at post offices. These websites, nominally developed for the public's benefit, brought cost savings at a time when public spending on postal services was under pressure across Europe. Yet, this method of providing customer service, which claimed to broaden access to the postal service, also increased customer self-management, throwing those whose problems didn't fit the FAQ templates into a Kafkaesque limbo. It also led to discrimination against the elderly, those with accessibility needs, and others who can't access the required technology. These real concerns notwithstanding, when these websites launched, policymakers did not imagine they warranted attention.

Initially, software-based customer services tended to be software the institution procured: commissioned from software consultancies or built in-house. This model was adopted widely, including by hospitals, transport operators, and governmental institutions more generally. In-house production meant the software was customised to the institution's internal operations and clientele. These projects could be costly or lengthy, and development frequently ended in failure. Struggling with failed projects and profitability, software companies started pushing software-as-a-service (SaaS) offerings (e.g. Zendesk), and the old procurement models gave way to them. Software providers promised that institutions could avoid costly in-house development, reduce IT capacity, or even skip procurement. The advantages of the subscription model were felt upfront; however, its long-term impact on institutional finances, governance, and quality of service remains harder to evaluate. All of this also stayed under the radar of regulators, seen as mundane elements of digitalisation.

Championing digitalisation without regulation remained the state of play until the latest “AI revolution” and the dynamic policy response that followed. Today, the use of AI chatbots (e.g. using large language models – LLMs – such as OpenAI’s ChatGPT) as customer-service tools is making headlines, sometimes with claims to intelligence, sometimes to mock their comical failures. Unlike before, the dawn of AI-based services2 has triggered a plethora of policy-making efforts. The list is long: the EU AI Act, the US AI Bill of Rights, and the UK AI Regulation White Paper, plus diverse new AI ethics research initiatives such as the Partnership on AI in the US or the European Laboratory for Learning and Intelligent Systems (ELLIS), as well as funding for research and policy-making towards the governance of AI. What’s more, every time companies introduce services integrating new AI techniques, all these efforts shift their focus.
For example, in reaction to the latest hyped trends in AI, policymakers introduced new articles into the AI Act at the eleventh hour to address risks specific to “general purpose AI” (referring, for instance, to the latest developments in large language models). That policymakers felt obliged to add new articles to the Act raises questions about the effectiveness and long-term viability of the regulation once further AI techniques are introduced. This reactive behaviour is especially concerning when companies present incremental progress as if it were a breakthrough: the painstaking digitalisation of postal services demonstrates that iterations in the delivery of digital services and in AI are decades in the making. From a technical point of view, they hardly qualify as breakthroughs.

The gap between digitalisation and policy and research responses to AI raises the following questions: are AI-based services drastically different from prior digital services? Are they really what should be the focus of current regulatory and research attention?

The rise of agile production environments

To answer these two questions, let's reverse the assumption: suppose that AI-based services such as LLM chatbots are just an extension of their software production environments. As in any other form of production, developers set up digital software production environments that shape how and what they produce.

Modern digital services, including those relying on AI models, are typically produced using service architectures and agile methods (Gürses & van Hoboken, 2018; Bertulfo, in press). When a consumer device like a smartphone turns into a brick without an internet connection, it is because its programmes are not running solely on the device but are in fact services accessed via the internet. When update notifications fill a device screen, it is because of agile development. While we may take this mode of production for granted, it is the result of at least three decades of incremental yet non-trivial transformations. In other words, while cloud-based digital services are the default mode of software production nowadays, this has not always been the case.

Even in its short history, high rates of failure among software projects have shown that software production is hard. A software project requires mastery of technical, economic, and management challenges. For an in-house project, for example, the postal service would have to hire qualified engineers and have them work continuously for years before the system was ready for deployment. Until the 2000s, this was the typical workflow of the “waterfall model” (Ruparelia, 2010). To address these challenges, the industry incrementally changed the mode of software production to use service architectures and agile methods. Combined, they enable businesses to reuse parts of existing software (i.e. services) developed by one organisation when building software for another, to quickly develop and test prototypes, and to reduce time-to-market in delivering services (Gürses & van Hoboken, 2018). Current AI-based services are the ultimate product of these software production environments. They rely on a plethora of services and depend on rapid iterations throughout the development and deployment of each service.

What does this look like for the “deployers” – as the AI Act would call them – who adopt AI-based services? Looking back at our example, the postal service would depend on many third parties. The post is the deployer of the AI-based chatbot of a service provider (another term used in the AI Act), such as Salesforce, which might have repeatedly fine-tuned an LLM with some of the post office's data. The postal service's clientele then becomes users of this chatbot. For its service to the post, Salesforce may rely on a foundation model3 (accessible as a service) from another provider (e.g. OpenAI). In turn, to speed up time-to-market, OpenAI's foundation model may rely on another bundle of services internal or external to the organisation. Providers include crowdsourcing companies (e.g. Amazon Mechanical Turk) for annotating or synthesising more training data sets, as well as cloud (e.g. Microsoft Azure) and hardware (e.g. NVIDIA GPUs) companies that provide the resources to train the foundation model.
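To make this layering concrete, the sketch below (in Python) traces how a single customer question to the deployer fans out into nested service calls. All class and method names (PostalWebsite, ChatbotProvider, FoundationModelAPI) are invented for illustration and do not reproduce Salesforce's, OpenAI's, or any other vendor's actual API; the point is only that the deployer interacts with the outermost service and has little visibility into the layers beneath it.

```python
# A hypothetical sketch of nested service dependencies.
# All class and method names are invented for illustration only.

class FoundationModelAPI:
    """Stands in for a foundation-model provider reached over the network."""
    def complete(self, prompt: str) -> str:
        # In reality this call leaves the deployer's premises entirely and
        # runs on the provider's cloud and hardware infrastructure.
        return f"[model output for: {prompt!r}]"

class ChatbotProvider:
    """Stands in for a customer-service chatbot offered as a service."""
    def __init__(self, model: FoundationModelAPI):
        self.model = model  # the chatbot is itself a client of another service

    def answer(self, customer_question: str) -> str:
        # The provider may add fine-tuned prompts, filters, or logging here,
        # and may change any of it in a routine update the deployer never sees.
        return self.model.complete(f"Postal support question: {customer_question}")

class PostalWebsite:
    """The deployer: it interacts only with the outermost service."""
    def __init__(self, chatbot: ChatbotProvider):
        self.chatbot = chatbot

    def handle_query(self, question: str) -> str:
        return self.chatbot.answer(question)

# One user query traverses at least three organisations' code paths.
site = PostalWebsite(ChatbotProvider(FoundationModelAPI()))
print(site.handle_query("Where is my package?"))
```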

The transformations driven by agile production environments

While these production environments promise to bring agility to the business of computing, they also bring broader political and economic transformations. With services provided from remote servers, the “software product” that turned Microsoft into a behemoth ceases to exist.

Service architectures multiply the number of relationships among service providers and deployers. Any service provider (e.g. Alphabet, OpenAI, or DeepL) is now able to engineer services (e.g. Google Analytics, chatbots, translation) that can be used by thousands of deployers in a plug-and-play manner. Conversely, clients like the postal service hardly ever use a single service provider. Users interacting with an organisation's website or app may in fact be interacting with dozens of services that, like matryoshka dolls, may comprise further bundles of services, as was the case when the post adopted Salesforce's chatbot.

Service providers that adopt agile methods do so to make their services responsive and customisable (Bertulfo, in press). Agile methods prioritise interactions, communication, and feedback loops to ensure services are responsive to customer needs, technical changes, and market volatility (Gürses & van Hoboken, 2018). Agile production further promises deployers the option of customising the services they offer their clientele. To fulfil these goals, deployers and their clientele are subjected to continuous monitoring. The data thus generated becomes input to experiments with new features (e.g. through A/B testing) and fuels rapid updates across thousands of deployers. As a result, service providers gain a powerful vantage point from which to tailor responsiveness and customisation for their thousands of deployers. With agile methods, providers can weave their own business interests seamlessly into the everyday functionality integral to the deployers. For instance, Google Analytics, a service used by over 300 million websites to analyse site visits, also happens to serve Google's web-wide tracking needs for their advertising business.
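A minimal sketch of the monitoring-and-experimentation loop described above, assuming an entirely hypothetical provider: users of every deployer are deterministically bucketed into experiment arms, and each interaction is logged as telemetry that feeds the next iteration. The function names and bucketing scheme are invented for illustration; they are not drawn from any real vendor's system.

```python
# A hypothetical sketch of a provider-side experimentation loop.
# Function names and bucketing logic are invented for illustration only.
import hashlib

def experiment_arm(user_id: str, experiment: str) -> str:
    """Deterministically assign a user to a 'control' or 'treatment' arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"

telemetry_log = []  # in practice, these events stream back to the provider's servers

def serve_reply(user_id: str, question: str) -> str:
    """Answer a query while recording the interaction for the provider."""
    arm = experiment_arm(user_id, "new_answer_style_v2")
    reply = f"({arm}) answer to: {question}"
    # Every interaction becomes monitoring data that fuels the next update,
    # across all deployers using this service.
    telemetry_log.append({"user": user_id, "arm": arm, "question": question})
    return reply

print(serve_reply("user-123", "Where is my package?"))
print(f"telemetry events collected: {len(telemetry_log)}")
```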

What is specific to AI in all this? Not a lot! AI-based services are developed and deployed in agile production environments, which include providers and deployers utilising massive amounts of data (typically from existing services), monitoring, and experimentation to provide customisation and responsiveness. Businesses promoting AI promise that these modifications can be “simply” made by retraining the models. This is a non-trivial advantage over delivering customisation by manually programming new features for different contexts, and it promises to increase the agility of the production environment. Most importantly, even the latest techniques in AI don't upend agile software production environments but are key to iterating over them.

Agile production environments are the source of harms and power imbalances

Agile production environments impact the power relationship between providers and deployers (Gürses & van Hoboken, 2018; Kostova et al., 2020). When they adopt digital services, deployers, however much they may want to serve the public interest, become heavily dependent on both the quality of the service and the business interests of the many providers bundled into the services they adopt. This can cause trouble, especially when deployers (e.g. the post office deploying the chatbot) cannot identify the source of the services they are interacting with, where changes come from, or when they happen. Even when these arrangements are transparent, the postal service may not have sufficient power to contest a provider's business decision.

Paradoxically, the same agility that is meant to allow providers to respond to business volatility may create volatility for deployers and their users alike. Agility encourages service providers to experiment with or remove pertinent features, prioritise segments of customers, or easily discontinue services (Bertulfo, in press). For example, a service provider’s attempts to save on costs of data labeling or collection may produce unexpected errors that impact the ability of deployers to serve the public reliably.

Finally, while theoretically anyone can set up an agile production environment, deploying agile services globally or at large scale is the business of a few companies (Balayn & Gürses, 2021; Troncoso et al., 2022; Cobbe, 2024; Luitse, 2024). As the world moved away from software as a product to software as a service, even companies that previously dominated the software market, like Microsoft, pivoted from selling software to providing cloud environments for the agile production of services. But this is not just about the cloud: a digital service requires a server and a client to deliver the service. Microsoft's Azure, Amazon's AWS, and Alphabet's Google Cloud Platform dominate the server end of this production environment, while the same three and Apple dominate the operating systems of client devices. As a result, any AI-based service is likely to pass through the agile production environments, clouds, and end-devices making up these few companies' computational infrastructure (e.g. OpenAI, founded in 2015, started producing models at scale on Microsoft Azure by 2016). By now, producing AI-based services outside the computational infrastructures controlled by these companies is difficult, and this makes countering the companies' decisions or mistakes an uphill battle (Troncoso et al., 2022).

Agile production environments constitute a challenge to tackling AI-specific risks

"But are there no issues specific to AI-based services that regulators should act on?” you may ask. Due to their stochastic character (referring to randomness in outputs) and their ambition to replace manual programming with ”learning from data”, yes, AI models do bring unique harms. These include questions of bias and unfairness (Mehrabi et al., 2021; Buolamwini & Gebru, 2018), hallucinations (Li et al., 2023; Bender et al., 2021), privacy leakage (Kim et al., 2024), and copyright infringements (Birhane & Prabhu, 2021). In addition, these services may randomly serve some users and fail others: AI-based chatbots may not understand accents that differ from the ”norm” (Markl, 2022), and may provide offensive or wrong answers causing social and economic damages. While not all of these issues are new, adding AI makes them stochastically and unpredictably salient, despite the promise of making them measurable, and hence, manageable (Mehrabi et al., 2021; Roberts, 2016).

The claim that problems of AI models are measurable and hence manageable may hold on paper, but the workings of agile production environments override these promises. AI-based services may be deployed to thousands of organisations. In the process, these systems exacerbate the harms of agile production outlined above while lacking the conditions necessary to apply even narrowly defined technical or legal remedies to mitigate the known harms of AI models (Balayn & Gürses, 2021). For example, applying unfairness mitigation (Balayn & Gürses, 2021) or privacy (Kostova et al., 2020) remedies in a coordinated manner across a stack of providers is difficult. Without such coordination, the AI-based services that bundle these components would quickly become unfair again, randomly filtering outputs (unfairly or not). To return to our example, if Salesforce misses OpenAI's update to its LLM, the postal chatbot may start blurting nonsense about absent packages and filtering which user populations get answers. This means that applying protective measures, such as unfairness mitigation against issues in the output of LLMs, poses coordination challenges. Companies like Apple, Microsoft, Amazon, and Alphabet, which already have greater power over current production environments and greater leverage over how services are bundled, are likely better placed to respond to such challenges. If things turn out like they did for privacy-by-design (Kostova et al., 2020), where infrastructure players use privacy-enhancing technologies to entrench their dominant position, demanding trustworthy or public interest AI may inadvertently end up empowering these already-powerful companies and expanding their influence over agile production environments.
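The coordination problem can be illustrated with a toy sketch, assuming an entirely hypothetical setup: a deployer-side mitigation is calibrated against one upstream model version, so a routine provider update can silently leave it inoperative. Every name and number below is invented for illustration; the sketch does not describe any real provider's behaviour.

```python
# A toy, entirely hypothetical illustration of the coordination problem:
# a deployer-side fairness filter calibrated for one upstream model version
# silently stops applying when the upstream provider ships an update.

calibrated_threshold = {"lm-2024-06": 0.7}  # the only version the filter was calibrated on

def upstream_generate(question: str, version: str) -> tuple[str, float]:
    """Stand-in for a remote foundation model returning text and a risk score."""
    # Imagine the provider's update changes how risk scores are scaled.
    score = 0.6 if version == "lm-2024-06" else 0.95
    return f"answer to {question!r}", score

def deployer_chatbot(question: str, deployed_version: str) -> str:
    text, risk = upstream_generate(question, deployed_version)
    threshold = calibrated_threshold.get(deployed_version)
    if threshold is None:
        # The upstream model changed under our feet; the mitigation no longer applies.
        return "[unmitigated output: fairness filter not calibrated for this version]"
    return text if risk < threshold else "[output withheld by fairness filter]"

print(deployer_chatbot("Where is my package?", "lm-2024-06"))  # filter applies as intended
print(deployer_chatbot("Where is my package?", "lm-2024-09"))  # silent drift after an update
```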

Finally, as AI-based services replace human agents, deployers may struggle to respond to AI models' harms stemming from agile production environments. The multiplication of relationships between service providers and millions of deployers creates a responsibility gap. To close this gap, providers will likely rely on the plug-and-play labour force of agile production environments – that is, task workers (e.g. by having crowd workers collect more data) (Roberts, 2016; Toxtli et al., 2021). Both the lack of accountability across services and the continuing reports of abusive labour conditions for task workers are hard to reconcile with policy approaches meant to ensure these technologies serve the public's interest.

Policymakers should regulate agile production environments

AI-based services are the product of agile production environments that, so far, elude regulation. These dynamic environments are a rich source of harms due to the inherent distribution of responsibility and power asymmetries, all of which are undergirded by infrastructural concentration. Agile production is heavily pushed by the companies that control this infrastructure and will bring about economic and technical transformations for providers and deployers whose effects are not yet well understood. The responsiveness and customisation that services can achieve with AI may promise benefits to service providers, but the same agile environments make tackling harms specific to AI challenging, with downstream effects for deployers and, in the end, all of us.

This reality is glaringly disconnected from the current policy focus on AI models. What does this schism mean for the adequacy of approaches like the AI Act? Remarkably, the final version of the AI Act distributes responsibilities across what it calls “value chains”, acknowledging the many providers, deployers, and users who are involved in AI production environments. This is promising, and yet it is difficult to assess whether the AI Act can rein in software production environments that have been in development for decades. For example, the proposals for sandboxes, monitoring, and resulting standards do not account for continuous system updates, the complexity of coordinating multiple actors, or the concentration of computational infrastructures. Nor does the Act address the broader harms of this production environment affecting users, institutions, workers, or our surroundings (Bender et al., 2021; Cobbe et al., 2023; Balayn & Gürses, 2021). The AI Act provides little guidance and few safeguards for the lack of accountability inherent to agile production. It assumes that institutional accountability measures will make up for the failures of the agile production environment. This may work for some organisations, but it is generally concerning, and especially alarming, when AI models are used by law enforcement and migration authorities, who are often criticised for failures in institutional accountability and are exempt from the AI Act.

With their emphasis on AI models, regulators risk running after technology trends while normalising a software production environment. This advantages tech companies over public and private organisations whose operational capacity is to be replaced by agile services. If protecting deployers and the broader public is their ambition, we urge regulators to further account for the more fundamental challenges brought by agile software production environments. AI models may be moving fast, but it takes decades to establish the production environments needed to deliver digital services. Regulating agile production environments, rather than the latest technology trends, would make the regulation of digital services more robust against new technology shifts. We are confident that this would strengthen regulators' ability to rein in the negative impacts of digitalisation and to ensure that digitalisation (not just AI) more broadly serves the public interest.

Acknowledgements

We thank Donald Jay Bertulfo, Corinne Cath, Michael Veale, Frederik Borgesius, Anushka Mittal, Frédéric Dubois, and Wendy Grossman for their comments on this op-ed.

References

Balayn, A., & Gürses, S. (2021). Beyond debiasing: Regulating AI and its inequalities [Report]. European Digital Rights (EDRi). https://edri.org/wp-content/uploads/2021/09/EDRi_Beyond-Debiasing-Report_Online.pdf

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922

Bertulfo, D. (In press). Agile production.

Birhane, A., & Prabhu, V. U. (2021). Large image datasets: A pyrrhic win for computer vision? 2021 IEEE Winter Conference on Applications of Computer Vision, 1536–1546. https://doi.org/10.1109/WACV48630.2021.00158

Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, 81, 77–91. http://proceedings.mlr.press/v81/buolamwini18a.html

Cobbe, J. (2024). The politics of artificial intelligence: Rhetoric vs reality. Political Insight, 15(2), 20–23. https://doi.org/10.1177/20419058241260785

Cobbe, J., Veale, M., & Singh, J. (2023). Understanding accountability in algorithmic supply chains. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 1186–1197. https://doi.org/10.1145/3593013.3594073

Gürses, S., & van Hoboken, J. (2018). Privacy after the agile turn. In E. Selinger, J. Polonetsky, & O. Tene (Eds.), The Cambridge handbook of consumer privacy (1st ed., pp. 579–601). Cambridge University Press. https://doi.org/10.1017/9781316831960.032

Kim, S., Yun, S., Lee, H., Gubri, M., Yoon, S., & Oh, S. J. (2024). ProPILE: Probing privacy leakage in large language models. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems (Vol. 36). https://proceedings.neurips.cc/paper_files/paper/2023/hash/420678bb4c8251ab30e765bc27c3b047-Abstract-Conference.html

Kostova, B., Gürses, S., & Troncoso, C. (2020). Privacy engineering meets software engineering. On the challenges of engineering privacy by design (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2007.08613

Li, J., Cheng, X., Zhao, X., Nie, J.-Y., & Wen, J.-R. (2023). HaluEval: A large-scale hallucination evaluation benchmark for large language models. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 6449–6464). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.397

Luitse, D. (2024). Platform power in AI: The evolution of cloud infrastructures in the political economy of artificial intelligence. Internet Policy Review, 13(2). https://doi.org/10.14763/2024.2.1768

Markl, N. (2022). Language variation and algorithmic bias: Understanding algorithmic bias in British English automatic speech recognition. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 521–534. https://doi.org/10.1145/3531146.3533117

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1–35. https://doi.org/10.1145/3457607

Roberts, S. T. (2016). Commercial content moderation: Digital laborers’ dirty work. In S. U. Noble & B. Tynes (Eds.), The intersectional internet: Race, sex, class and culture online. Peter Lang Publishing. https://ir.lib.uwo.ca/commpub/12/

Ruparelia, N. B. (2010). Software development lifecycle models. ACM SIGSOFT Software Engineering Notes, 35(3), 8–13. https://doi.org/10.1145/1764810.1764814

Toxtli, C., Suri, S., & Savage, S. (2021). Quantifying the invisible labor in crowd work. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1–26. https://doi.org/10.1145/3476060

Troncoso, C., Bogdanov, D., Bugnion, E., Chatel, S., Cremers, C., Gürses, S., Hubaux, J.-P., Jackson, D., Larus, J. R., Lueks, W., Oliveira, R., Payer, M., Preneel, B., Pyrgelis, A., Salathé, M., Stadler, T., & Veale, M. (2022, September). Deploying decentralized, privacy-preserving proximity tracing. Communications of the ACM, 65(9), 48–57.

Vipra, J., & West, S. M. (2023). Computational power and AI [Comment submission]. AI Now Institute. https://ainowinstitute.org/publication/policy/computational-power-and-ai

Züger, T., & Asghari, H. (2023). AI for the public. How public interest theory shifts the discourse on AI. AI & Society, 38(2), 815–828. https://doi.org/10.1007/s00146-022-01480-5

Footnotes

1. As an example, one might want to try out the Dutch postal service system.

2. We focus here on the use of AI in digital services that are intended for a large user base and rely on agile production methods. This leaves out, for example, data science projects developed for analysis, reporting, or internal use in one or a few organisations. Concerns around AI services that are not user-facing, e.g. farming applications, are equally impacted by the production environment we introduce here but are outside the scope of this op-ed.

3. Foundation models (also termed “general-purpose AI” by the AI Act), such as LLMs, are a specific type of AI model. These models are trained on large datasets and are big enough to be applied to many use cases. Note that they often first need to be fine-tuned with additional data from a specific use case before they become accurate enough and applicable to that use case.