Black box algorithms and the rights of individuals: no easy solution to the “explainability” problem

Over the last few years, the interpretability of classification models has been a very active area of research. Recently, the concept of interpretability was given a more specific legal context. In 2016, the EU adopted the General Data Protection Regulation (GDPR), containing the right to explanation for people subjected to automated decision-making (ADM). The regulation itself is very reticent about what such a right might imply. As a result, since the introduction of the GDPR there has been an ongoing discussion about not only the need to introduce such a right, but also about its scope and practical consequences in the digital world. While there is no doubt that the right to explanation may be very difficult to implement due to technical challenges, any difficulty in explaining how algorithms work cannot be considered a sufficient reason to completely abandon this legal safeguard. The aim of this article is twofold. First, to demonstrate that the interpretability of “black box” machine


Introduction
Recent advances in the development of machine learning (ML) algorithms, combined with the massive amount of data used to train them, have dramatically changed their utility and scope of application. Software tools based on these algorithms are now routinely used in criminal justice systems, financial services, medicine, research and even in small business. Many decisions affecting important aspects of our lives are now made by algorithms rather than humans. Clearly, there are many advantages to this transformation. Human decisions are often biased and sometimes simply incorrect. Algorithms are also cheaper and easier to adjust to changing circumstances.
But algorithms have not proven a panacea. Despite promises to the contrary, there have been several instances of bias and discrimination discovered in algorithmic decision-making (Buiten, 2019, p. 42), particularly disturbing in the case of criminal justice (Huq, 2019; Richardson et al., 2019). Of course, once discovered, such bias can be removed and algorithms can be validated as non-discriminatory before they are deployed. But there is still widespread uneasiness, particularly among legal experts, about the use of these algorithms. Most of these algorithms are self-learning and their designers have little control over the models generated from the training data. In fact, computer scientists were formerly not very interested in studying these models because they were (and are) often extraordinarily complex (the reason they are often referred to as "black boxes"). The standard approach was that as long as an algorithm worked correctly, no one bothered to analyse how it worked. 1
This approach changed once the tools based on ML algorithms became ubiquitous and began directly affecting the lives of ordinary people (Pasquale, 2015). If the decision about how many years one will spend in prison is made by an algorithm, the convicted should have the right to know how this decision is made. 2 In other words, there is a clear need for the transparency and accountability of automatic decision-making (ADM) algorithms (Larsson & Heintz, 2020).
In recent years, many published papers have addressed the interpretability (variously defined) of models generated by ML algorithms. It has been argued that interpretability is not a monolithic notion. As a result, the subjectivity of each interpretation, due to different levels of human understanding, implies that there must be a multitude of dimensions that together constitute interpretability (Chakraborty et al., 2017). However, Zachary Lipton (2018) suggests that not only is the concept of interpretability muddled, it is also badly motivated. The approval of EU regulation 2016/679 (General Data Protection Regulation or GDPR) in 2016 prompted discussion of a related legal concept, the right to explanation. If this right is indeed mandated by GDPR (in effect since 2018), then software companies conducting business in Europe 3 are immediately liable if they are not able to satisfy this right.
1. This is how Chris Anderson summarised this approach: "Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves." (Anderson, 2008, n.p.)
2. Advanced algorithms have been used in criminal justice systems, both in the United States and increasingly in Europe (Završnik, 2019).
The aim of this paper is to answer the question of whether and to what extent, given the specificity of ML systems, it is possible to provide information that would demonstrate algorithmic fairness, and as a result, compliance with the right to explanation. The first section analyses the concept of explanation within its legal as well as psychological context. We then demonstrate, using a case study of a music recommendation system, that the interpretability of "black box" algorithms is a challenging technical problem for which no solutions have yet been found. To that end, we show that models created by ML algorithms are inherently so complex that they cannot be "explained" in a meaningful way to an ordinary user of such systems. Rather than looking "inside" an algorithm, we propose focussing on its statistical fairness and correctness. A promising way to achieve this goal may be to introduce event logging mechanisms and certification schemes, which are currently being used very successfully in the IT sector.

What is the right to explanation
One of the goals of the GDPR was to adapt EU regulations to modern methods of data processing, such as cloud computing or big data. 4 Hence, the EU legislature introduced a number of new provisions, including the widely discussed right to data portability (de Hert et al., 2018), and expanded existing regulations (Hoofnagle et al., 2019), such as provisions on the right to information and automated decision-making.
3. It should be remembered that, due to the so-called territorial scope of application, the provisions of the GDPR should also be applied by entities having their headquarters in third countries (that is, outside the EEA) but directing their services to the market of at least one of the member states (de Hert & Czerniawski, 2016). The issue of the cross-border application of the GDPR is another practical problem in the enforcement of EU data protection legislation (Greze, 2019).
4. It is disputable to what extent this goal has been achieved. Tal Zarsky points out that "the GDPR fails to properly address the surge in Big Data practices. The GDPR's provisions are-to borrow a key term used throughout EU data protection regulation-incompatible with the data environment that the availability of Big Data generates" (Zarsky, 2017, p. 996).
According to the EU data protection model, every person has the right to know both the scope of data processed about them and the purpose of such processing.
Furthermore, the data controller is required to provide them with this information "in a concise, transparent, intelligible and easily accessible form, using clear and plain language" (GDPR, 2016, Art. 12(1)).
In the EU legal system, both the right to the protection of personal data and the right to privacy have been included in the catalogue of fundamental rights (CFR, 2012). Although the two rights are closely related, they are, in fact, independent rights. This means, in particular, that data protection laws may be infringed, at least in the scope of EU law, even if privacy has not been affected in any way. Undoubtedly, one of the main goals of establishing dedicated data protection regulations is to guarantee the rights and freedoms of individuals in the digital era, and protect them from new types of threats arising from rapid technological development and the globalisation of modern IT services.
Article 22 of the GDPR is aligned with this goal; it introduces the right to not be subject to a decision made as a result of automated data processing that legally affects an individual or otherwise has a significant impact upon them. This regulation was also enshrined in Directive 95/46, the GDPR's predecessor, which was in place for over 20 years. However, since bulk algorithmic processing of personal data has developed rapidly only within the last two decades, this provision long had little practical significance. The situation has changed with the growth in profiling, including profiling for purposes other than advertising products and services (Data Is Power, 2017). It is worth noting that Article 22 of the GDPR does not explicitly provide for an individual's right to explanation of an automated decision. Instead, it sets out the general principle that an individual may object to automated decision-making (Malgieri & Comandé, 2017, p. 246).
In the case of automated decision-making, the EU legislature has extended the information obligation imposed on data controllers by introducing in Article 15(1)(h) of the GDPR the need to provide "meaningful information" on the logic involved in such decisions, taking into account the "significance and the envisaged consequences of such processing for the data subject". And it is this regulation that is the source of the term "right to explanation", though the phrase itself does not appear directly in the wording of the regulation. This interpretation is confirmed by Recital 71 of the GDPR, which states that processing based on automated decisions should always be subject to suitable safeguards, including the "right to obtain an explanation of the decision reached after such assessment and to challenge the decision".
Hence, the question arises at the outset as to whether the right to explanation is in fact a separate (per se) right of an individual or just an element of a broader right-the right to information. Some scholars have questioned the very existence of such a right (Wachter et al., 2017), while others have pointed out that, regardless of how the right to explanation is defined, it is not "illusory" (Selbst & Powles, 2017). Undoubtedly, the right to explanation serves a specific purpose-to enable an individual to challenge the correctness of a decision that has been made by an showing that commonly used price comparison online services can also have "significant effects" on individuals (Veale & Edwards, 2018, p. 401).
The term "to obtain an explanation" used in the context of an automated decision may suggest that the obligation of a controller using automated decision-making is to explain how the algorithm reaches a specific result, which, according to Article 12(1) of the GDPR, should be presented in a transparent and intelligible form, using "clear and plain language". A significant part of the controversy surrounding the right to explanation relates precisely to the possibility of meeting this condition.
Before trying to identify the source of the difficulty, the term "explanation" in the context of decision-making needs to be clarified. Decision-making tools are based almost exclusively on classification algorithms. Classification algorithms are "trained" with data obtained from past decisions to create a model which is then used to arrive at future decisions. In this case it is the model that requires an explanation, not the algorithm itself (in fact, different algorithms may generate very similar models).
When a user submits their information to a decision-making tool, an answer is generated-such as a number, a No, or a category such as "high risk". From the wording of Recital 71 (which states that the user has the right to challenge the decision) it is clear that the right to explanation is provided for cases where the answer given by the tool is different from what the user expected or hoped for. The most straightforward question an individual may then ask is: "Why X?". When the user asks "Why X?", having expected a different answer ("Y"), they mean in fact to ask: "Why X rather than Y?". This type of question calls for a contrastive explanation (Miller et al., 2017). The answer that needs to be provided to the user must contain not only the explanation as to why the information provided by the user generated answer X, but also what information must change in order to generate answer Y (the one the user was expecting).
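The idea of a contrastive explanation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_loan_model` is a hypothetical stand-in for a trained model, its feature names and coefficients are invented, and the search only tries single-feature changes from a hand-picked list of candidate values. It shows what "why X rather than Y" asks for, not how a production system would answer it.

```python
from typing import Callable, Dict, List, Tuple

Applicant = Dict[str, float]


def toy_loan_model(a: Applicant) -> str:
    """Hypothetical scoring rule standing in for a trained model."""
    score = (0.5 * a["income"] / 1000
             - 2.0 * a["missed_payments"]
             + 0.1 * a["years_employed"])
    return "approve" if score >= 20 else "reject"


def contrastive_explanation(
    model: Callable[[Applicant], str],
    applicant: Applicant,
    candidate_values: Dict[str, List[float]],
    wanted: str,
) -> List[Tuple[str, float, float]]:
    """List single-feature changes that turn the answer into `wanted`.

    Each result is (feature, current value, first candidate value that flips
    the decision). Features with no flipping candidate are omitted.
    """
    changes = []
    for feature, values in candidate_values.items():
        for value in values:
            modified = dict(applicant, **{feature: value})
            if model(modified) == wanted:
                changes.append((feature, applicant[feature], value))
                break
    return changes


applicant = {"income": 30000, "missed_payments": 1, "years_employed": 5}
candidates = {
    "income": [35000, 40000, 45000],
    "missed_payments": [0],
}
flips = contrastive_explanation(toy_loan_model, applicant, candidates, "approve")
```

With a three-feature rule the answer is easy to read off. The case study in the next section suggests why the same question becomes far harder once the model has hundreds of derived features and several combined sub-models.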
When people ask "Why X?", they are looking for the cause of X. Thus, if X is a negative decision for a loan application, an answer would need to specify what information in an application (the so-called "features" used as input in the model) caused X. It should also be remembered that the decision-making tool making a decision for a user is replacing a human that used to make such decisions. In fact, a person reporting a decision to the user may not clearly state that the decision is the verdict of an algorithm (judges in the US routinely use software-based risk assessment tools to help them in sentencing). The user may thus expect that the explanation provided uses the language of social attribution (Miller et al., 2017), that is, explains the behaviour of the algorithm using folk psychology.

A case study: Building a music recommendation system
As argued in the introduction, making algorithms interpretable is a challenging task for their designers. Three barriers to the transparency of algorithms in general are usually distinguished: (1) intentional concealment whose objective is the protection of intellectual property; (2) lack of technical literacy on the part of users; (3) intrinsic opacity which arises from the nature of ML methods. A right to explanation is probably void when trade secrets are at stake (see Recital 63 of the GDPR; see Article 29 Working Party, 2017, p. 17), but the other two barriers still need to be addressed. In fact, these two barriers depend on each other. The complexity of ML methods correlates positively with the level of technical literacy required to comprehend them.
The most obvious solution to the second barrier would be implementing educational programmes aimed at transferring knowledge about the functioning of modern technologies. This could be achieved with stronger education programmes in computational thinking, and by providing independent experts to advise those affected by algorithmic decision-making (Lepri et al., 2018). The effectiveness of this solution, however, is questionable: even if it were possible to improve technical literacy education (which seems very unlikely given previous experience in this area), that still leaves 80% of the population who completed their education many years ago.
As a solution to the last barrier, namely, the lack of transparency relating to the nature of ML methods, some sort of evidence gathering based on registering the key parameters of the algorithm should be sufficient (Wachter et al., 2017). Indeed, collecting this type of data would certainly help to understand how a system arrived at a specific decision. That said, it would still be completely unrealistic to expect a layperson to grasp these concepts.
Over the last few years much work has been done on "black box" model explanation. None of these approaches attempts to explain fully the two contrasting paths ("why X rather than Y") in a model that lead to distinct classification results (which, as stated above, is necessary for a contrastive explanation).
Indeed, explaining the black box model of an ADM algorithm is much harder than is normally assumed. To illustrate this case better, we describe in this section recent work we were involved with (Shahbazi et al., 2018) on designing a song recommendation system for KKBOX, Asia's leading music streaming service provider.
KKBOX provided a training data set that consisted of information from listening sessions for each unique user-song pair within a specific timeframe. The information available to the algorithm included data about the users, such as identification number, age, gender, etc., and about songs, such as length, genre, singer, etc. The training and the test data were selected from users' listening history in a given time period and had around 7 and 2.5 million unique user-song pairs respectively.
The quality of a recommendation system's predictions relies on two principal factors: predictive features available from past data (for example, what songs the user has listened to the most) and an effective learning algorithm. Very often, these features are only implicit in the training data and the algorithm is not able to extract them by itself. Feature engineering is an approach that exploits the domain knowledge of an expert to extract from the data set features that should generalise well to the unseen data in the test set. The quality and quantity of the features have a direct impact on the overall quality of the model. In this case, certain statistical features were created (or extracted, because they were not explicitly present in data), including the number of sessions per user, the number of songs per session and the length of time a user had been registered with KKBOX.
As a result, the number of features available to the algorithm was increased by a factor of about 10, to 185. And this is the key point: some of these derived features turned out to be extremely important in determining a user's taste in songs and, as a result, the recommendation that was provided. But it should be emphasised that none of these features were explicitly present in the original data. The paradox is that if someone asked for an explanation of how the model worked, the answer would have to be based on features not present in the source data.
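The kind of derived feature described above (sessions per user, songs per session, length of registration) can be sketched in a few lines. This is an illustrative assumption, not the code used in the KKBOX project: the record layout `(user_id, session_id, song_id)`, the `registered_on` mapping and the function name are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Play = Tuple[str, str, str]  # (user_id, session_id, song_id), hypothetical layout


def derive_user_features(
    plays: Iterable[Play],
    registered_on: Dict[str, date],
    as_of: date,
) -> Dict[str, Dict[str, float]]:
    """Compute per-user features that are only implicit in the raw play log."""
    sessions = defaultdict(set)      # user -> distinct session ids
    play_counts = defaultdict(int)   # user -> number of plays

    for user_id, session_id, _song_id in plays:
        sessions[user_id].add(session_id)
        play_counts[user_id] += 1

    features = {}
    for user_id, user_sessions in sessions.items():
        n_sessions = len(user_sessions)
        features[user_id] = {
            "n_sessions": n_sessions,
            "songs_per_session": play_counts[user_id] / n_sessions,
            "days_registered": (as_of - registered_on[user_id]).days,
        }
    return features


plays = [
    ("u1", "s1", "song_a"),
    ("u1", "s1", "song_b"),
    ("u1", "s2", "song_c"),
    ("u2", "s3", "song_a"),
]
registered_on = {"u1": date(2017, 1, 1), "u2": date(2017, 6, 1)}
features = derive_user_features(plays, registered_on, as_of=date(2017, 12, 31))
```

None of the three output columns appears in the raw records, which is the difficulty noted above: an explanation phrased in terms of `songs_per_session` refers to something the user never supplied.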
But this is only part of the story. The solution provided did not use a single algorithm to make a prediction. In total, five different algorithms were used, all of them very complex. Thus, here is another key point: the final model was the weighted average of all five models' predictions. Again, it should be stressed that the result was not the outcome of just one algorithm. Figure 1 shows the complexity of one of these algorithms in the form of a simplified neural net structure. 5
5. Each of these steps has not been explained in detail as the key point is simply to present the complexity of the entire prediction process, not its technical aspects.
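The weighted averaging of several models' predictions can be sketched as follows. The five scores and the weights are invented for illustration; the source does not report the actual weights or outputs, and the 0.5 cut-off is likewise an assumption.

```python
from typing import Sequence


def weighted_average(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Blend per-model scores into one score using non-negative weights."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is needed per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * w for p, w in zip(predictions, weights)) / total


# Hypothetical scores from five models for one user-song pair.
model_scores = [0.9, 0.2, 0.6, 0.7, 0.4]
model_weights = [0.3, 0.1, 0.2, 0.3, 0.1]

blended = weighted_average(model_scores, model_weights)
recommended = blended >= 0.5  # assumed cut-off
```

In this made-up case two of the five models score the song below the cut-off, yet the blended score is above it. Any account of "why this song" would therefore have to cover all five models and their weights, not a single decision path.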
FIGURE 1: Structure of one of the algorithms used in the recommendation system (Shahbazi et al., 2018)
The model that was generated by these algorithms was also extremely large and complex. Since gradient boosting decision tree algorithms were used, the resulting model was a forest of such decision trees. 6 The forest contained over 1,000 trees, each at least 16 nodes deep and with 10-20 children at each node. 7 The question arises as to how a user can understand this model. One can begin by assuming that a user wants an explanation for why song X was recommended rather than song Y. There will be multiple trees with the X recommendation as well as the Y recommendation. But which one offers the right choice? These multiple trees cannot be generalised as this has already been done by the algorithm (one of the most difficult aspects of algorithms based on decision trees is their optimisation, that is, generating the simplest, most general trees). Indeed, an ordinary user would not be able to comprehend the model, let alone understand an explanation that uses vocabulary entirely foreign to them. It is up to the experts to verify the explanation and convey this verification to the user.
The ADM models are often even more complex than the system described above. Machine learning is heuristics-driven and no one expects rigorous mathematical proofs of the correctness of its algorithms. What often happens is that, if a model generated by an algorithm does not correctly classify the test data, a designer will place another algorithmic layer on top of it in the hope that it improves the results.
6. Nodes in a decision tree store conditions that have to be satisfied (for example, she must be under 15 years of age) if a user is to be recommended a particular song.

Who needs the explanations anyway?
The juxtaposition of legal requirements arising from the GDPR with the specificity of ML systems has led to serious doubt about the actual usefulness of the right to explanation of an automated decision. Proponents of the view that the right to explanation is useless in the world of machine learning systems highlight two important arguments: one of a technological nature and the other of a social nature.
First of all, as stated above, the way ML systems work makes it difficult (or even impossible) to present the criteria used by an algorithm when resolving a given case. It should be remembered that the decisions made by ML systems largely depend on the data used in the system learning process (this is related to the so-called incremental effect). 8 This conclusion is based not only on the presumption that understanding algorithms is too difficult for people, but also on the fact that, in general, the way algorithms operate and process information is qualitatively different from how humans operate and process information and, as such, the term "interpretability" has a different meaning for people and for ML algorithms (Krishnan, 2019). However, even if the technological limitation is overcome, another problem becomes apparent: the average individual's lack of knowledge and expertise in analysing and evaluating the very complex results of operations carried out by advanced ML algorithms, where highly specialised knowledge is needed.
The latter issue will be analysed first. It can be reduced to the following argument: It is not necessary to explain the decisions made by the algorithms because no one will understand the explanation in the first place. If this were true, the same reasoning could be applied to the problem of analysing flaws and defects related to the operation of other advanced systems and products, such as cars and airplanes. Most users do not understand how a CPU works, but they are not denied the right to determine whether it was a processor failure that caused a plane to crash. Technology is becoming more and more complex every year, and this is true not only of the IT world. Most people do not understand the medical therapies they undergo, the economic processes that affect their financial position, or legislation, even though they are obliged to abide by it. At the same time, if an individual considers that they have suffered harm, or that their rights have been undermined, they can take their case to court. One does not have to be a professor of medicine to claim compensation for medical malpractice. 9 The scope or existence of this right should not be contingent upon whether the wrong diagnosis was made by a medical practitioner or by an algorithm. If the court decides that expert knowledge is needed to resolve a given case, it will appoint expert witnesses to assess the evidence gathered in that case. In this way, expert witnesses can help determine the causes of a plane crash, whether medical malpractice took place, or who has liability for a leaking roof in a house. Experts familiar with modern decision-making systems should be able to analyse the results of an algorithm's operation in the same way. 10 However, for this to be possible, individuals affected by such a decision must have the right to know how this decision was reached. Depriving them of this right would effectively condone the practice of unknown decision-makers making nontransparent decisions according to unknown criteria, with no real possibility of challenging such decisions. This is a Kafkaesque world, incompatible with the principles of a democratic society.
8. The incremental effect consists of changing the operation of the algorithm as a result of providing new information to the database. The algorithm "learns" on the basis of the new information, which may lead to a different interpretation of the information processed previously. Hence, the result of the algorithm is variable over time, which means that by providing the same data for analysis, different outcomes can be obtained. This leads to the conclusion that, in the case of ML algorithms, attempting to confirm their correct operation by processing the same data set at another time is not a good strategy.

Possible (and feasible) solutions
Assuming a general consensus that an individual should be able to challenge decisions taken automatically, the next step that needs to be addressed is to overcome the technical difficulty in determining (reconstructing) the criteria that were taken into account by the algorithm while formulating its decision. This problem should not be underestimated. As illustrated in Section 3, a relatively simple recommendation system used by a music provider demonstrates that in the era of big data systems, even seemingly straightforward decisions ("which song to recommend to a user") are made with the use of very advanced algorithms. Society expects that IT systems will work not only faster than people, but also more efficiently and effectively, which means that algorithms will be able to solve complex problems with a speed unattainable for humans, and that they will also be able to solve problems that people could not otherwise solve at all (Hecht, 2018). Algorithm predictions are made in all applications of ML systems, including those extremely critical for individuals, such as medical diagnostics (Hoeren & Niehoff, 2018). However, due to the almost complete opacity of algorithm functioning, any attempt to trace their mode of operation, even by an expert in the field, if not actually impossible, would be affected by such a large margin of error as to make any results wholly unreliable (Burrell, 2016). In order to understand the correctness of a decision, an expert, or even a group of experts, would have to not only learn the logic of the algorithm but also trace previous decisions and familiarise themselves with the system's learning (training) process. Due to the increasing complexity of this type of algorithm, the scale of this problem will only escalate.
9. It should be remembered that nowadays medicine is one of the main areas of application for ML algorithms (Hoeren & Niehoff, 2018).
10. Cf. the examples discussed by Jenna Burrell, which she uses to "illustrate how the workings of machine learning algorithms can escape full understanding and interpretation by humans, even for those with specialized training, even for computer scientists" (Burrell, 2016, p. 10).
Providing an explanation that is understandable to humans also requires assessing the quality of the data on which an algorithm is based. Classification algorithms need data to learn how to make predictions. The training set must be sufficiently large and representative of the data the model will later encounter. For example, the data set for the KKBOX recommendation system described in Section 3 contained information on 30,000 users, 360,000 songs and 7 million user-song pairs. One of the main sources of AI success has been the emergence of 'big data', that is, freely and automatically collected data widely available for anyone to use. However, it is important to note that the amount of data alone is not sufficient to generate correct predictions; the data must also be representative. In ADMs the problem may be further compounded by uncritical analysis, leading to discriminatory conclusions (Barocas & Selbst, 2016).
The data used by ADMs must therefore be validated to ensure lack of bias. Obviously, this is not an easy task. First, the data sets used by ADM systems are huge and cannot be analysed "manually". To automate this process, the type of bias that might impact further processing should be defined in advance. Second, most of the data used by ADMs stems from past decisions made by humans, which could conceivably be biased along racial or gender lines. Therefore, when considering possible technical implementations of the right to explanation in the context of ADM, the problem of ensuring adequate quality of data should also be addressed. In short, it is necessary not only to analyse the mechanisms used for confirming the correctness of an algorithm itself, but also the existence of safeguards that ensure the processed data is trustworthy.
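What defining a type of bias in advance might look like can be shown with one simple, automatable check: comparing the rate of favourable past decisions across groups. The sketch below is an assumption for illustration only. The record layout `(group, favourable)` is hypothetical, and the ratio of selection rates is just one of several possible fairness measures; the article does not prescribe it.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Record = Tuple[str, bool]  # (group label, favourable decision?), hypothetical layout


def selection_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Share of favourable decisions within each group."""
    totals = Counter()
    positives = Counter()
    for group, favourable in records:
        totals[group] += 1
        if favourable:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparity_ratio(rates: Dict[str, float]) -> float:
    """Lowest group rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favourable decision
    return min(rates.values()) / highest


# Made-up historical decisions: group A approved 6 of 10, group B 3 of 10.
history = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 3 + [("B", False)] * 7
)
rates = selection_rates(history)
ratio = disparity_ratio(rates)
```

A low ratio does not by itself prove discrimination, and an acceptable ratio does not rule it out. The check only flags training data that deserves closer scrutiny before a model is built from it.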
There are at least two possible solutions to this problem. The first would require mandatory registration of the key parameters of those ADM systems whose decisions have legal ramifications for individuals (as in the case of Article 22 of the GDPR). The second way to validate the operation of an algorithm is not so much an attempt to trace the correctness of its decisions as a formal evaluation of the entire system through certification measures. The following sections will discuss both proposals, together with an analysis of their main advantages and limitations.

An event logging subsystem
A proven solution, used by IT system designers in cases where it is necessary to trace (reconstruct) the operation of an algorithm at a later stage, is the recording of significant processing parameters. A typical example of such a mechanism is the flight recorder, the key element used to determine the course of flight events.
This proposal therefore aims to introduce an obligation to record the key parameters of ADM systems. In addition to being straightforward to implement, the logging of system parameters can also be easily secured cryptographically to ensure the consistency and integrity of recorded data. Taking into account the type of ML system or sensitivity of data processed, logs can be maintained by a specific service provider or trusted third party, avoiding the risk of the data being changed without authorisation. Moreover, there is no obstacle to such data being stored in systems supervised by public entities; in this way, the relevant parameters of, for example, a machine-based credit scoring system could be securely stored under the oversight of a financial market supervisor. This, in turn, opens up the possibility of introducing sector-specific requirements that would define a minimum set of parameters to be recorded by automatic decision-making systems and used for the provision of services in regulated markets. Under this approach, a person challenging the correctness of a decision taken or wishing to exercise their right to explanation of an automatic decision (Article 22 of the GDPR) would have access to the set of key parameters that influenced the final decision. In turn, the supervisory authority could have access to a wider (and more detailed) set of parameters with which it could analyse not only individual cases but also the regularity and legality of the operation of the whole system.
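One common way to secure such a log cryptographically is a hash chain, in which every entry stores a digest of its own content together with the digest of the previous entry. The sketch below is a minimal illustration under stated assumptions: the entry fields (`payload`, `prev_hash`, `hash`) and the logged parameters are hypothetical, and a real deployment would also need signatures, trusted timestamps and storage with the trusted third party discussed above.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder digest preceding the first entry

Entry = Dict[str, Any]


def _digest(prev_hash: str, payload: Dict[str, Any]) -> str:
    """SHA-256 over the previous digest plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: List[Entry], payload: Dict[str, Any]) -> None:
    """Append a decision record that is linked to the preceding record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    })


def verify_log(log: List[Entry]) -> bool:
    """Recompute every digest; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log: List[Entry] = []
# Hypothetical key parameters of two credit-scoring decisions.
append_entry(log, {"case": 1, "model_version": "v3", "income": 30000, "decision": "reject"})
append_entry(log, {"case": 2, "model_version": "v3", "income": 52000, "decision": "approve"})
intact_before = verify_log(log)

log[0]["payload"]["decision"] = "approve"  # simulate unauthorised alteration
intact_after = verify_log(log)
```

The chain only shows that recorded values were not altered afterwards. It says nothing about whether the right parameters were recorded in the first place, which is the weakness discussed in the paragraphs that follow.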
The solution outlined above does have its weaknesses. First of all, it cannot be applied to all types of machine learning algorithms-in particular, deep neural networks with weights attached to features and complex interactions that are not directly interpretable, and therefore no user-interpretable arguments that can be recorded.
ML systems are also not 'static' -with new data, the prediction model generated by an algorithm will change. As a result, the inference process will be modified (e.g. new parameters will be included or pre-existing parameters omitted) and event logging mechanisms will change as well. In traditional IT solutions, it is the main user of the system who determines the set of data to be recorded and also indi- cates how often such recording should be done. Both the scope of data and the frequency of ML recording are criteria which cannot be defined in advance. Practically speaking, it is the system itself (or one of its components) that should be designed to determine what parameters are to be recorded and when. However, this goes against the idea behind this safeguard-to ensure transparency. Since it is not the developer who would establish strict and unchangeable criteria for recording key parameters, but the system itself, this mechanism could also be prone to error or external manipulation. As a result, there would need to be a formal evaluation of the recording process itself. In other words, the attempt to solve the problem of the transparency of an ML system would be replaced by the problem of ensuring the transparency of the event logging subsystem.
Another limitation of this solution is the context of analysis, which is difficult to take into account. It should be remembered that the operation of an algorithm depends not only on the input data and internal procedures for processing (the result of which is also easy to save), but also on previous analyses, that is, on the whole tree of decisions made earlier. Understanding the current result of an algorithm may therefore require the review of a huge knowledge base describing previous decisions made by the system. Without this information, simply saving the current parameters used in the inference might not allow one to reconstruct (and thus verify the correctness and fairness of) the inference performed. The more an algorithm is based on machine learning mechanisms, the more this problem will hinder the use of logging as a way of ensuring system transparency.
A third limitation that needs discussing is the non-obvious relationship between the stored parameters and the internal logic of an algorithm. Even assuming that the two previously mentioned obstacles can be overcome, and that the recording of key parameters allows the full and precise reproduction of the initial state and results of subsequent processing steps, the problem of access to the internal logic of an algorithm will then become apparent. ML systems, like other highly specialised technologies, are subject to intellectual property protection (Gervais, 2020). The effectiveness of the protection of various AI technology components is a significant problem affecting the growth of this market. Without access to the source code, and thus to the logic of an AI algorithm, even detailed parameters of its operation will not be sufficient to fully understand the decision-making process whose correctness is to be assessed.
Another issue to be clarified is the adequacy of this measure in achieving its intended purpose. In fact, advocates of the transparency of processing expect the reliability (credibility) of algorithms' operation to be ensured. It seems, however, that ensuring the transparency of the system will not always be a sufficient guarantee of processing reliability-and thus the protection of an individual's rights. Ensuring that processing is fair must include not only confirmation of the correctness of the processing carried out but also its compliance with legal or ethical standards. After taking into account these additional limitations, it may turn out that a properly functioning IT system, which identifies objectively correct relationships between data, cannot be considered trustworthy. It will not be possible to reveal this limitation solely by recording the processing parameters. These parameters alone will not reveal a defect relating to the external data on which an algorithm is based.

Certification frameworks
A second way to validate the operation of an algorithm is not so much an attempt to trace the correctness of its decisions as a formal evaluation of the entire system through certification. This approach proposes the creation of a national (or international) certification framework for machine learning systems. The purpose of such a framework would not only be to ensure that systems used to make automated decisions were designed, built and tested in compliance with applicable norms and standards, but also to make sure that their mode of operation (the reliability of decisions made) was confirmed statistically.
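What statistical confirmation of reliability could mean in practice can be sketched as follows. Assuming, purely for illustration, that a certifying body audits a sample of decisions and counts how many were correct, it could require that the lower bound of a 95% Wilson score interval for the proportion of correct decisions exceeds a set threshold. The function names, the audit figures and the thresholds are hypothetical and do not come from any existing certification scheme.

```python
import math


def wilson_lower_bound(correct: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion.

    The default z = 1.96 corresponds to a two-sided 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width


def meets_threshold(correct: int, total: int, threshold: float) -> bool:
    """Hypothetical certification test: the lower bound must exceed the threshold."""
    return wilson_lower_bound(correct, total) > threshold


# Hypothetical audit: 950 of 1,000 sampled decisions were judged correct.
lower = wilson_lower_bound(950, 1000)
```

Using the lower bound rather than the observed proportion means that a small audit sample cannot certify a system on the strength of a lucky result: the fewer decisions audited, the further the bound falls below the observed rate.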
In the IT industry, certification mechanisms have been used for years to confirm the authenticity and integrity of software systems (Heck et al., 2010). The use of an external certification mechanism (independent of the provider or user) in relation to machine learning systems could also help to eliminate the risk of unauthorised interference in the way a system works. Furthermore, certification would not have to be mandatory-it could be an optional measure. To encourage ML system providers to participate in this framework, the legislature could introduce a number of legal presumptions based on the premise that decisions made by a certified system are correct. As with any legal presumption, a party challenging such a decision could contest it in court, but they would be required to prove the malfunction of the system. Certification would therefore be a mechanism that obviates the necessity to later prove the correctness and fairness of a system in litigation.
The proposal to introduce certification of advanced IT systems is not a new one and has already been put forward, for instance, in relation to artificial intelligence (AI) systems. Matthew Scherer (2016) suggested regulating the AI market with a supervisory body that would issue certifications for AI systems (including tests of new versions of software agents). According to his proposal, certification was not a prerequisite for putting a system into operation but rather a manifestation of soft law regulation. This would provide an incentive for developers by limiting the liability for damage caused by their systems (Scherer, 2016). A similar idea was mooted 20 years earlier by Curtis Karnow. The model he proposed was simpler and primarily involved the creation of the Turing Registry (a hypothetical list of "safe" AI agents), without a reference to any regulatory aspects (Karnow, 1996).
It is worth noting that the implementation of a certification framework for systems making automated decisions is a solution that can be reconciled with the current wording of GDPR provisions. An element of every formal IT system certification framework is an assessment of whether the documentation provided is complete and up to date. It can be expected that in the case of ML systems, such documentation would contain not only a technical description of the environment and the algorithms used, but also a high-level description of the system's operating principles-prepared in a simple and readable manner, compliant in this respect with Article 15 of the GDPR.
It appears, therefore, that the introduction of a certification framework may be helpful in solving both of the problems discussed above. On the one hand, this solution would take into account the specificity of ML systems and would be technically feasible; on the other, it would not require people who want to challenge automated decisions to have specialised knowledge in the field of data analysis or the structure of expert systems.
However, the proposal to use certification frameworks also requires the resolution of several important problems. Firstly, it should be remembered that different certification mechanisms are used in the IT industry. In general, they can be divided into those confirming the correctness of software development and maintenance processes (process certification) and those intended to confirm the authenticity and integrity of software (code certification) (Eloff & von Solms, 2000). In both areas, different norms and standards are used.
Code certification makes it possible to ensure that no third party has interfered with and changed the structure of the computer software. However, such certification only applies to software supplied (or implemented) by the manufacturer (developer), and therefore does not confirm lack of interference with the memory structure of the ML system being run. In particular, it does not in any way refer to the possibility of poisoning the ML logic by deliberate manipulation or feeding the system with badly prepared data. Although system certification mechanisms have been used in the IT sector for several decades, they have so far been used mainly to validate systems that process sensitive data, e.g. in the area of state security (Lipner, 2015). This is due to the simple fact that formal certification of an IT system is a very time-consuming and costly process (Kaluvuri et al., 2014). The wide application of the existing certification framework, such as the Common Criteria (ISO/IEC, 2009), is therefore not enough to fully reflect the needs of the ML market, and it also seems problematic for commercial reasons (see generally, Mellado et al., 2007). It is difficult to imagine that European technology providers would conduct formal certification that might delay their product launch onto the market, whereas the activities of entities operating in other jurisdictions would not be limited in this way.
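The scope of code certification, and its limits, can be illustrated with a simplified sketch. Here the check is reduced to comparing the SHA-256 digest of a released artefact with the digest recorded at certification; real schemes typically rely on digital signatures rather than bare digests, and the names and sample content below are hypothetical.

```python
import hashlib
import hmac


def artifact_digest(data: bytes) -> str:
    """SHA-256 digest of a released artefact (e.g. a code package)."""
    return hashlib.sha256(data).hexdigest()


def matches_certified_digest(data: bytes, certified_digest: str) -> bool:
    """Return True only if the artefact is byte-for-byte what was certified."""
    return hmac.compare_digest(artifact_digest(data), certified_digest)


# Hypothetical artefact as delivered by the developer, and its certified digest.
released_code = b"def score(applicant): ..."
certified = artifact_digest(released_code)
```

The check covers only the bytes that were certified. A model whose behaviour changes through later training on new or poisoned data would still pass it, because the delivered code itself remains unchanged, which is precisely the limitation described above.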
With regard to process certification in the IT industry, for years the reference frameworks have been the ISO/IEC 20000 and ISO/IEC 27001 family of standards (Siponen & Willison, 2009). Management systems built on their basis may be subject to formal certification. However, it should be remembered that in this scenario certification would ensure that the development, implementation and maintenance of IT systems were carried out with best practice in mind, and in a way that minimised identified risks. Moreover, management systems are part of soft law regulation, so they are mainly the source of internal requirements in the compliance area of the service provider and do not lay down legally binding obligations towards the system users. Process certification can also be used to establish a secure supply chain, in which many actors are de facto responsible for the proper operation of an ADM system. In this case, it would be possible to introduce standards dedicated to particular categories of entities, e.g. data brokers, companies responsible for data cleaning and quality assurance processes or those involved in the ADM training process. These standards could be subject to a formal evaluation of conformity by an independent external body in a similar way to current certification of management systems.
While certification is a good way to regulate the introduction and operation of ADM systems, there are currently no certification schemes that can be applied directly to this end. What is more, there are not even any legal regulations-at either EU or member state level-that could form the basis for introducing such certification schemes. Even Regulation 2019/881, which creates a framework for certification in the area of cybersecurity, cannot be regarded as such. The main application of the regulation is to improve the security of products used by critical infrastructure operators and digital service providers (Rojszczak, 2020). The main area of application of ML systems, in turn, is the mass consumer market. Hence, it seems that before it is possible to address in detail a future certification framework for ADM systems, it will be necessary to discuss the establishment of new EU regulations that could form the basis of such programmes.
Reuben Binns (2018) aptly notes that current approaches to fair machine learning are typically focused on interventions at the data preparation, model-learning or post-processing stages. Although certification seems to be a promising solution to the problem of confirming the correct operation of ADM algorithms, it will not overcome the significant limitation strictly related to the very nature of statistical analysis. As noted earlier, the right to an explanation is seen not only as a means of confirming the correctness of the decision but also a means of establishing the reasons for not taking the decision that the applicant had expected (the "why X and not Y?" problem presented earlier). As a result, even if a specific algorithm generates statistically correct results, which are confirmed in the certification procedure, its operation can still be questioned because an individual will be deprived of the possibility of ascertaining what circumstances determined the unexpected, or unwelcome, outcome.

Conclusions
Black box algorithms make decisions that affect human lives. This trend is not expected to change in the coming years. Automatic decisions will be made not only on an ever-increasing scale, but also with ever-increasing intensity. As a result, there will also be increasing public pressure to develop effective control mechanisms, including those which make it possible to question the decision made in individual cases.
Numerous researchers have criticised the very concept of a right to explanation, pointing out the lack of precision of the EU legislature (Wachter et al., 2017) and questioning the usefulness of this right in practice (Edwards & Veale, 2017). Due to different definitions of and approaches to the "explainability" problem of ML systems, Cynthia Rudin (2019, p. 206) has stated that "the field of interpretability/explainability/comprehensibility/transparency in ML has strayed away from the needs of real problems." Today, the right to explanation of an automated decision may be perceived as one of the less important elements of the GDPR, with limited practical significance.
However, this perception will change soon. ML systems are entering new areas of the economy as well as public administration. Hence, the wording and limits of applicability of the law laid down in the GDPR will undoubtedly be subject to recurrent interpretation, including interpretation by the Court of Justice of the European Union.
This would therefore seem an opportune moment to begin discussing the need for a comprehensive regulation on how ML systems are developed, implemented and supervised. Drawing on the experience of the IT sector, it seems most appropriate to introduce a regulatory model in which various types of certification mechanisms will play a leading role. The basis for such a model may be a certification scheme for ML systems-allowing for different certification schemes for systems operating in different markets. It will certainly be necessary to distinguish a specific category of systems whose decisions may affect fundamental rights and freedoms. Future legislation should also promote the use of soft law measures, such as certification based on international standards or codes of conduct, to support the development of industry standards and self-regulation mechanisms. An example of such soft law is the ISO/IEC CD 23053 (2020), a draft international standard that is intended to establish a framework for artificial intelligence systems using machine learning.
Irrespective of certification, in the case of less advanced ML systems it may be sufficient to use standardised procedures for recording key system parameters (e.g. procedures resulting from recommendations issued by competent supervisory authorities).
This proposal may additionally be combined with the establishment of a dedicated supervisory authority, competent to moderate the development of the AI market and, by introducing various regulatory mechanisms (including certification), to ensure the safe use of AI systems (Tutt, 2016).
It should not be expected therefore that a single, universally-accepted certification scheme for ADM systems will be developed. It is also unlikely that such a uniform standard will be developed within the EU in the near future. The reason for this is not only the lack of consensus between member states on the need to establish EU regulation in this area but also the different digital maturity of individual national markets. Hence, it seems more probable that a set of different legal safeguards which can be applied in particular EU countries will be developed in order to ensure that the dynamic development of technology-including the spread of ADM-does not adversely affect the area of fundamental rights. This trend is already being observed today (Malgieri, 2019), and the problem of implementing the right to explanation of decisions taken automatically is one of the main areas of legislative activity.