Regulating “big data education” in Europe: lessons learned from the US

European schools increasingly rely on vendors to collect, process, analyse, and even make decisions based on considerable amounts of student data through big data tools and methods. Consequently, portions of schools' power are gradually shifting from traditional public schools into the hands of for-profit organisations. This article discusses the current and forthcoming European Union (EU) data protection regime with respect to the protection of student rights from the potential risks of outsourcing student data utilisation in kindergarten to 12th grade (K-12) educational systems. The article identifies what lessons can be drawn from recent developments in the United States (US) “student data affair”. These lessons can provide a new perspective for designing a balanced policy for regulating the shift in schools' power.


THE AMERICAN STUDENT DATA PROTECTION "UPROAR"
Using new technologies or services offered by vendors in order to improve learning processes is certainly not new to the education domain (Polonetsky & Jerome, 2014). Nonetheless, the combination of more technology and a reliance on private vendors has raised wide-ranging concerns in the US (Chui & Sarakatsannis, 2015).
In 2011, inBloom, a non-profit data analytics company, designed an advanced, secure service that offered states and school districts the ability to store student data and connect it to personalised learning software.
By mid-2013, inBloom provided its services to nearly every public school in New York State. But for many, the software got a little too personal.
Although there was no evidence of inBloom misusing the information, parents and privacy advocates raised concerns about the scope of inBloom's potential data collection. Following negative campaigns led by privacy activists, parents, and teachers' groups, inBloom announced in April 2014 that it would shut down its operations (Bennett & Weber, 2015).

CONCERNS OVER THE POWER SHIFT FROM SCHOOLS TO VENDORS
Big data encapsulates two significant components: huge quantities of varied data and large-scale analytics. Realising either the touted benefits or the anticipated harms of big data in education requires a vendor to utilise a significant, analysable quantity of student data (Young, 2015).
The demise of inBloom, and frequent media reports of data security breaches, gave rise to increasing concerns in the US over student data protection (Young, 2015 [PDF]).
Critics, mainly parents and educational and privacy advocacy groups, have been concerned that the wide dissemination of student data to private vendors might put students' privacy at risk by disclosing sensitive information about children, such as data about learning disabilities, disciplinary problems or family trauma (Singer, 2014). Of particular concern is the likelihood that vendors will improperly "mine" or sell student data, or otherwise monetise student information by building advertising profiles or marketing (Herold, 2014).6 Critics have also been concerned that constant monitoring of students' online activities may overly limit creativity, free speech and free thought, by creating a "surveillance effect" (Zeide, 2016) and invading their "intellectual privacy", i.e. "the ability, whether protected by law or social circumstances, to develop ideas and beliefs away from the unwanted gaze or interference of others" (Richards, 2015).
Another prominent fear is that big data techniques will prematurely and permanently label students as underperformers, which may "forestall future opportunities by becoming a modern day version of the proverbial permanent record" (Zeide, 2015). The identification of students as "at risk", for example, might not allow them to remove any harmful record of their failures if they improve in the future. Consequently, "students may see labels as self-fulfilling prophecies and predictive analytics may prime educators to make prior judgments about students' capabilities and character" (Alarcon et al., 2014). Furthermore, critics have been concerned that continuous student data mining, coupled with decision-making based on algorithmic models, will exacerbate bias and create new forms of discrimination resulting from the embedding of arbitrary or unfair factors (MacCartney, 2014 [PDF]). Grounding decision-making in objective information retrieved by algorithms from multiple educational sources, and based on students' performance in a wide array of educational contexts, may appear "neutral" and irrefutably scientific. However, the "hidden" algorithms that facilitate educational data-driven decision-making reflect particular norms and values about what educational opportunity and equity mean. As such, they may rely on biased data that reflect social inequality, and may plausibly reinforce present structural inequities and contribute to a problem of cumulative disadvantage (Alarcon et al., 2014).

REGULATORY REFORMS TO PROTECT STUDENT DATA IN THE US
The concerns over the engagement of for-profit third parties in education, through the utilisation of student data, revolve around how and for which educational and non-educational purposes data is collected, processed and analysed.
Legal frameworks that apply to student data held by schools and vendors acting on their behalf exist primarily in three US federal statutes, which focus mainly on protecting student privacy by limiting access to and disclosure of data:

a. The Family Educational Rights and Privacy Act of 1974 (FERPA) prohibits the unauthorised disclosure of education records. FERPA applies to any school receiving federal funds and levies financial penalties for non-compliance;

b. The Protection of Pupil Rights Amendment (PPRA) of 1978 governs the administration of surveys soliciting specific categories of information, and imposes certain requirements regarding the collection and use of student information for marketing purposes; and

c. The Children's Online Privacy Protection Act of 1998 (COPPA) applies particularly to online service providers that have direct or actual knowledge of users under 13 and collect information online.

Even since COPPA came into effect in 2000, education technology has changed radically (Krueger, 2014).
While FERPA was groundbreaking privacy legislation when it was enacted over four decades ago, it is inadequate in today's world, where data flows freely and third parties who are not educational actors have become an integral part of day-to-day information flows. "FERPA is so dated that when confronted with a technology that can collect and use big data… the statute practically breaks down," says Young (2015). For example, the definitions of education record and personally identifiable information (PII) would likely not cover unconventional types of student data collected through EdTech, such as a lunch item choice or the subject of an email message. Moreover, FERPA allows directory PII, such as students' names, addresses, and phone numbers, to be disclosed to third parties without parental consent if the school notifies parents of this practice once a year and parents are given the opportunity to opt out of the disclosure, while the "school officials" exception permits disclosure to third parties who have "legitimate educational interests". The majority of EdTech providers arguably meet the "school officials" exception because they are often under contract with a school to perform an institutional service or function.7 Furthermore, FERPA puts the primary compliance burden on schools themselves, whereas vendors are not required to comply with the law's provisions (Center for Democracy & Technology, 2015 [PDF]).
PPRA requires that schools give notice to, obtain written consent from, and provide an opt-out opportunity to parents before students can participate in commercial activities that involve the collection, disclosure or use of personal information for marketing purposes.8 Nevertheless, this rule does not apply if a vendor is using student data solely for the purpose of developing, evaluating, or providing educational products or services to students or schools. Moreover, like FERPA, PPRA does not provide a private right of action, thus students and parents cannot enforce compliance with the statute (Tudor, 2015).
COPPA, as opposed to FERPA and PPRA, was not designed to be a student privacy law. Even though the law does help to ensure that vendors collect and use student data responsibly, it is limited to sites or services that collect information from children under 13 years old, and does not cover information provided by adults about these children. Therefore, a vendor would not have to comply with the law if the data it collects about a child under 13 is obtained only from a parent, a school, or presumably anyone aged 13 or older (which includes the majority of high school and some junior high school students).
In response to Americans' persistent concerns, state legislatures began passing laws to fill the gaps in FERPA and other federal laws, as well as to extend privacy protections to other areas. In fact, by September 2015, 46 US states had introduced 182 bills addressing student data privacy; 28 of these bills, in 15 states, were enacted into law (Data Quality Campaign, 2015 [PDF]).9 Many of these bills focus on who can access student information and mandate that private entities use student data only for educational purposes. They often stipulate substantive restrictions on the use of student data for creating advertising profiles and for marketing purposes. Many of the rules focus on providing more opportunities for notice and choice, so that parents can consent to particular uses or collection (Center for Democracy & Technology, 2015 [PDF]).
The growing student data-related concerns have also garnered attention from legislators in both houses of Congress, who moved to protect student privacy from the hands of private vendors by introducing numerous bills.
In April 2015, the Student Digital Privacy and Parental Rights Act (SDPPRA) was introduced with the support of President Obama. The bill prohibits the use of students' PII for advertising and marketing purposes and seeks to minimise the amount of such information that is transferred from schools to private companies.
The bi-partisan Protecting Student Privacy Act (PSPA) was introduced in May 2015 by Senators Ed Markey, Orrin Hatch and Mark Kirk. The bill proposes to amend FERPA to, inter alia, require schools to implement policies and procedures that protect students' PII; prohibit schools from knowingly providing access to PII for advertising or marketing purposes; and require states and schools to ensure that outside parties comply with specific requirements.
Perhaps the most rigid bill introduced in the Senate came from Senator David Vitter. The Student Privacy Protection Act (SPPA), which would amend FERPA, takes a dramatically different tack than other student-data-privacy legislation that has previously appeared at the federal and state level. SPPA requires educational agencies and institutions to receive parental consent before sharing student data with third parties. It also, for the first time, allows individual families to receive monetary awards from educational agencies and private actors that violate their children's FERPA rights.10

While it is unclear when, or if at all, they will be enacted, it can already be expected that the pending US federal bills will not quell the uproar or diminish the sizzling debate over the expanding role of for-profit companies in education.11 Parents and privacy advocates have by now vigorously expressed their fears that the bills are inadequate to protect students' rights.
Representatives of the Parent Coalition for Student Privacy, for example, raised alarms that SDPPRA does not require any parental notification or consent before schools share personal data with third parties, allowing vendors to target ads to students and to continue collecting and sharing vast amounts of highly sensitive student information (Strauss, 2015). Pasquale (2015b) also criticised the bill for focusing mainly on privacy issues while not addressing other issues, such as student profiling (e.g. "at risk" students).
PSPA, on the other hand, was criticised by privacy advocates for not holding vendors legally accountable, and for not expanding the definition of "educational records" to include student emails and digital metadata created on school-provided services, platforms, and equipment (Roscorla, 2014).
Even SPPA, ostensibly the most comprehensive proposed policy change, drew criticism for being too lenient towards schools and vendors. In her critical analysis of the bill, Hoge (2015 [PDF]) argues that SPPA will increase the psychological screening and profiling of students with disabilities by allowing special education teams to implement psychological testing, treatment, analysis, and evaluation without parental consent. In addition, Hoge argues, the bill will not decrease third-party access to private data. With respect to PII, the bill creates protections for "student data" but then defers to FERPA's existing definition of PII, which allows directory information to be cross-matched and used to identify the individual student. Moreover, under the bill, third parties and "school officials" still have access to data through written agreements under the original version of FERPA.
The description of the American case shows the variety of concerns over student data use, which go well beyond privacy. The US legislative attempts, however, focus on privacy (e.g. the prohibition of ad targeting and of the disclosure of PII).
The protection of "student privacy" is, of course, a major concern when it comes to the potential risks of student data utilisation by vendors. But much of the debate about "student privacy" is not about privacy, and the term has actually become a rallying cry related to any issue involving data use in education. Some of the concerns are less about information practices than about education policy and pedagogy, including the "privatisation" of the public school system.
Parents care most about whether their child receives a good education, and they want to ensure that their child's safety and future opportunities are not compromised in pursuit of conflicting corporate interests (Zeide, 2016). Therefore, regulation should account for other student rights that might be jeopardised by the shift in power from schools to vendors, such as equality, autonomy and freedom of thought.

PROTECTING STUDENT DATA IN THE EU
In contrast to the US piecemeal approach to regulating data protection, where legislation is sector-driven and may be enacted at state and/or federal level, personal data protection has long been regulated in the EU through a comprehensive, prescriptive legal approach that covers the population as a whole. For the reasons outlined in the following section, it is nevertheless not sufficiently equipped to deal with the pitfalls of big data in education.

CURRENT EU STUDENT DATA PROTECTION REGIME
At present, the most important EU legal instrument on personal data protection is the 1995 Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data (DPD).12 Recognising the important role vendors play in processing personal data, the DPD distinguishes between first parties and vendors through the introduction of "data controllers" and "data processors" (art. 2(d)-(e)).
Within this structure, a school acts as a data controller if it (a) decides to outsource student data processing; (b) delegates all or part of the processing activities to an external organisation; and (c) determines the ultimate purpose of the processing. A vendor acts as a data processor if it merely supplies the means and the platform, acting on behalf of the school (Article 29 Working Party, 2012 [PDF]).
Deemed data controllers, schools must abide by data protection legislation and adhere to the basic principles of the DPD. Leaving aside the question of holding schools accountable for the actions of third parties, the DPD has two key drawbacks in protecting student privacy and personal data in the context of "big data education".
First, the DPD does not protect student data from re-identification. The DPD's definition of personal data is: "any information relating to an identified or identifiable natural person ('data subject')" (art. 2(a)). If the data is anonymised or aggregated and an individual cannot be identified from the remaining data, it ceases to be personal data, and the provisions of the DPD no longer apply.
In the context of big data, it is questionable whether the personal/non-personal data distinction remains viable and whether anonymisation and aggregation remain effective in protecting users against tracking and profiling (Monreale, Rinzivillo, Pratesi, Giannotti, & Pedreschi, 2014, pp. 1-2 [PDF]). Even if identifiers, such as names and ID numbers, have been removed, one can use background knowledge and cross-correlation with other databases in order to re-identify student data records (Narayanan & Shmatikov, 2008 [PDF]). Therefore, even when student data is anonymised or aggregated, and the provisions of the DPD accordingly do not apply, the risk of identifying the student (or, more precisely, re-identifying them) still remains.
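To make this risk concrete, the following minimal sketch (with entirely hypothetical data; it is not drawn from the article or from Narayanan and Shmatikov's study) shows how a simple "linkage attack" works: records stripped of names are joined to an auxiliary dataset on shared quasi-identifiers, re-attaching identities to a sensitive attribute.

```python
# Minimal linkage-attack sketch. All data is hypothetical and illustrative.
import pandas as pd

# "Anonymised" student records released by a vendor: names removed,
# but quasi-identifiers (birth year, postcode, gender) retained.
released = pd.DataFrame({
    "birth_year": [2004, 2004, 2005],
    "postcode": ["1012", "3511", "1012"],
    "gender": ["F", "M", "F"],
    "learning_disability": [True, False, False],
})

# Auxiliary data an attacker may already hold (e.g. a public register
# or social media profiles) linking the same quasi-identifiers to names.
auxiliary = pd.DataFrame({
    "name": ["Anna", "Bram", "Carla"],
    "birth_year": [2004, 2004, 2005],
    "postcode": ["1012", "3511", "1012"],
    "gender": ["F", "M", "F"],
})

# Joining on the quasi-identifiers re-attaches names to the supposedly
# anonymous records, exposing the sensitive attribute for each student.
reidentified = released.merge(auxiliary, on=["birth_year", "postcode", "gender"])
print(reidentified[["name", "learning_disability"]])
```

Because each combination of quasi-identifiers is unique here, every record is re-identified; in practice, a handful of attributes is often enough to single out most individuals in a dataset.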
Second, setting consent as the DPD's main legal ground may be ineffective. A key principle of the DPD is the need to obtain the data subject's unambiguous consent before data can be processed (art. 2(h)). Before big data, parents could roughly gauge the expected uses of their children's personal data and weigh the benefits and the costs at the time they provided their consent. Today, the ability to make extensive, often unexpected, secondary uses of student data makes it simply too complicated for the average parent to make fine-grained choices for every new situation (Kay, Korn, & Oppenheim, 2012 [PDF]). Moreover, in many instances vendors do not offer users the option of choosing which data they agree to share and for which purposes; users are thus forced to accept or reject the service as a whole. Consequently, parents could end up unintentionally excluding their children from services necessary for their education just because they are unable or unwilling to parse complex data policy statements (Polonetsky & Jerome, 2014).
The Directive does not address the fact that opting out is hardly a feasible alternative for users in the educational context, since most parents do not have the privilege of changing their children's schools based on the applicable privacy policy (Zeide, 2016). Therefore, student privacy should not be a binary concept that is either on or off: parents should be given the option of choosing which data they agree to share, and for which specific purposes, without having to disengage their children from "big data education". Furthermore, the DPD presumes that consent is not freely given in situations where the party requesting consent has power over the individual granting it. Since a school ultimately has the power to make decisions that can affect a student's life chances, there is a risk that parents will feel compelled to consent (Kay et al., 2012).

STUDENT DATA PROTECTION UNDER THE GENERAL DATA PROTECTION REGULATION
EU data protection law has undergone a long-awaited, rigorous and comprehensive revision.
After long discussions in the various committees, the EU Parliament formally approved the General Data Protection Regulation (GDPR or Regulation) on 14 April 2016, and it will apply from May 2018 in all EU member states.
The GDPR was adopted by the European Commission "to strengthen online privacy rights and boost Europe's digital economy", recognising that "technological progress and globalisation have profoundly changed the way our data is collected, accessed and used" (European Commission, 2012).
In general, the GDPR does not forsake the basic principles of data protection established by the DPD, including consent as a ground for lawful processing (art. 6) and the definition of "personal data", which is key to determining the scope of the Regulation (art. 4). However, the GDPR adopts several innovative approaches to data protection which could improve the level of protection for data subjects by imposing considerable additional duties on data controllers.
For example, the Regulation places notable emphasis on transparency by requiring data controllers to communicate with data subjects "in a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child" (art. 12). According to Burrell (2016 [PDF]), however, attempts to enforce transparency are challenged by the fact that, for several reasons, classification algorithms that operate on data, and machine learning algorithms in particular, are irremediably opaque. As Burrell argues, a recipient of the output of the algorithm (the classification decision) rarely has any concrete sense of how or why a particular classification was arrived at from the inputs. Additionally, the inputs themselves may be entirely unknown, or known only partially.
In addition to the transparency requirement, the GDPR introduces the new 'data protection by design and by default' principle (art. 25) which motivates architects of big data analytics to embed good data protection practices, like anonymisation, pseudonymisation, encryption, and protocols for anonymous communications (European Commission, 2015).
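As an illustration of what "data protection by design" can mean in practice, here is a minimal pseudonymisation sketch (an assumption offered for illustration, not a technique prescribed by the GDPR): direct identifiers are replaced with keyed hashes so that records remain linkable for analytics, while the key needed to reverse the mapping is kept separately, echoing the Regulation's definition of pseudonymisation (art. 4(5)).

```python
# Minimal pseudonymisation sketch: replace direct identifiers with keyed
# hashes (HMAC-SHA256). Hypothetical key and record, for illustration only.
import hmac
import hashlib

# In a real deployment this key would be stored separately from the data,
# under strict access control, so the mapping cannot be casually reversed.
SECRET_KEY = b"store-me-separately-under-access-control"

def pseudonymise(student_id: str) -> str:
    """Return a stable keyed pseudonym for a student identifier."""
    return hmac.new(SECRET_KEY, student_id.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"student_id": "NL-2017-00042", "reading_score": 87}  # hypothetical
record["student_id"] = pseudonymise(record["student_id"])
print(record)  # the vendor's analytics see only the pseudonym, not the raw ID
```

Note that, as argued above, pseudonymised records can still be re-identified through quasi-identifiers, so such measures complement rather than replace the GDPR's other safeguards.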
Furthermore, the GDPR obliges data controllers to analyse the potential impact of the intended data processing where it is "likely to result in a high risk to the rights and freedoms of natural persons" (art. 35). Where such a high risk is likely, controllers must also carry out a data protection impact assessment and periodic compliance reviews.
It is clear that the GDPR is intended to address some of the key data protection issues that have been identified in relation to big data analytics. Although it includes no specific provisions on the protection of school student data, the GDPR explicitly refers to providing children (i.e. any person below the age of 18 years) with specific protection of their personal data. In this sense, several provisions set out special conditions for the processing of children's personal data.13

CONCLUSION
A shift in power from schools to vendors of the EdTech industry is most likely inevitable to some degree. As EU schools become more data-driven we can expect the vendor role in the everyday pedagogic and administrative operations of schools to expand.
The GDPR indicates a possible paradigm shift towards considering privacy and data protection as a collective interest requiring more public regulation than private enforcement. Once in effect, it may establish a different power-balance between data subjects and data users (controllers and processors), thus marking a significant milestone in raising the actual level of student privacy protection.
Notwithstanding the need for EU data protection law to enhance the protection of student privacy by increasing transparency and providing users with more consent options, the US experience shows that although education resembles other areas, such as social networks or e-commerce, data use in K-12 education also differs from them in significant ways.
The mounting public discussion over the outsourcing of student data utilisation goes well beyond traditional privacy and data protection concerns. The expanding role of vendors inside and outside the classroom is taken as a threat to autonomy, liberty, freedom of thought, equality and opportunity.
An adequate regulatory protection of students' rights would focus not only on uses of data outside school premises, but on uses inside them as well (Pasquale, 2015b). EU policymakers should define the potential risks of outsourcing student data utilisation and establish a new power-balance that will safeguard the full scope of students' rights. For example, and as already pointed out, despite their "aura of neutrality", the algorithms that facilitate educational data-driven decision-making may rely on biased data and thus may adversely affect low-income and underserved populations. Drawing from Pasquale's (2015a) analysis of the reputation, search, and finance sectors, one arguable regulatory solution for addressing the risk of big data analytics facilitating discrimination would be to deploy auditing systems that review the algorithms and the data used, to detect biases and test for disparate impact in education. A minimal sketch of such a test appears below.
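The following sketch illustrates one such audit check (a hypothetical example, not a procedure proposed by Pasquale): the widely used "four-fifths rule", which flags a model when the rate of favourable outcomes for one group falls below 80% of the rate for the most favoured group.

```python
# Minimal disparate-impact screen using the "four-fifths rule".
# All decisions below are hypothetical, for illustration only.
from collections import Counter

def disparate_impact_ratios(outcomes):
    """outcomes: iterable of (group, favourable) pairs.
    Returns each group's favourable-outcome rate divided by the
    highest group rate; values below 0.8 conventionally warrant review."""
    totals, favourable = Counter(), Counter()
    for group, ok in outcomes:
        totals[group] += 1
        favourable[group] += int(ok)
    rates = {g: favourable[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

# Hypothetical output of an "at risk" classifier, where favourable
# means the student was NOT flagged as at risk.
decisions = ([("A", True)] * 80 + [("A", False)] * 20
             + [("B", True)] * 55 + [("B", False)] * 45)
print(disparate_impact_ratios(decisions))
# {'A': 1.0, 'B': 0.6875}  -> group B falls below the 0.8 threshold
```

An education-sector audit would of course need to go further, examining training data and feature choices as well as outcome rates, but even a simple screen of this kind makes hidden skews visible.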
Another vital regulatory effort would be for policymakers to protect students' "intellectual privacy" from the "surveillance effect". Broadly speaking, policymakers should set boundaries between 'private' and 'public' spaces within digital learning environments, that will safeguard students' freedom of thought and belief, right to read and engage in intellectual exploration, and the confidentiality of communications between participants (Richards, 2015).
Parental concerns in the US stem from the unproven and unpredictable outcomes and potential unintended consequences of student data use, thus they seek to avoid uncertainty by limiting who can access student information in the first place (Zeide, 2015). The demise of inBloom is perhaps the best example of the uncompromising backlash from parents and media and it is indicative of the deep anxiety about the use of student information.
Regulatory rules that focus not only on how student data is transferred from schools to vendors, but also on when and where student data is collected, for what purposes, and with which tools, will build trust and allay the wide-ranging concerns related to the shift in power from European schools to private entities in the contemporary data-infused educational landscape.
3. The Learning Analytics Community Exchange (LACE) project, for example, funded by the EU, aims at bringing together key European players in the field of LA and EDM to promote the effective use of analytics in a wide range of educational settings including schools, higher education establishments and workplace learning environments.
4. According to the Horizon Report Europe: 2014 Schools Edition (2014) [PDF], European schools have already started routinely using the services and products of vendors to make effective use of varied and real-time student data. For example, the report states that hundreds of primary and secondary schools in Norway, the UK and the Netherlands are using the "itslearning" learning management system, offered by a market-leading vendor, to get quick assessments of learning inside and outside the classroom.

5. Knewton, for example, one of the most prominent companies in the field, uses big data to develop adaptive learning systems and data analytics for students, teachers, school districts and publishers. The data analytics are intended to map students' weaknesses and strong points over time, to enable the teacher to personalise the learning process and the content.
6. In 2014, Google admitted that it mines student data from its Google Apps for Education for targeted advertisement purposes. See Gould, J. (2014, January 31). Google admits data mining student emails in its free education apps. SafeGov.org. Retrieved from http://safegov.org/2014/1/31/google-admits-data-mining-student-emails-in...

7. A contractor providing outsourced services to a school is treated as a "school official" if it (1) performs services for which the school would otherwise use employees; (2) is under the direct control of the school with respect to the use and maintenance of student data; and (3) agrees to abide by FERPA regulations governing the use and redisclosure of student data.
8. Under the statute, "personal information" is defined as individually identifiable information, including a student's or parent's first and last name, a physical address, a telephone number, or a social security number.

9. A bill is the form used for most legislation, whether permanent or temporary, general or special, public or private. A bill does not become law until it is passed by the legislature and, in most cases, approved by the executive (in the US, the President). See Sullivan, J. V. (2007). How our laws are made. U.S. House of Representatives. Retrieved from https://www.gpo.gov/fdsys/pkg/CDOC-110hdoc49/pdf/CDOC-110hdoc49.pdf

10. The Safe Kids Act, an additional federal bill addressing student data protection, was introduced later, in July 2015.
11. A recent survey found that although there is a feel-good factor about the growing use of technology in education, 79% of parents reported being at least somewhat, very or extremely concerned about the security and privacy of their child's data. See Marketplace.org (2015). Parents' attitudes toward education technology. Retrieved from http://www.marketplace.org/sites/default/files/Education%20Technology%20...

12. The DPD applies to all individuals whose personal data is processed in a member state of the EU. Any use of data constitutes processing under the DPD, and anything that is done to the data is considered processing, ranging from its creation or collection to its eventual destruction.

13. […] the interests of children might be at stake. Article 12 establishes that information must be adapted to the data subjects, especially if they are children. Article 40, on 'Codes of conduct', states that data controllers and data processors should be encouraged to draw up codes of conduct intended to contribute to the proper application of the Regulation, taking into account the specific features of the various data processing sectors, such as with regard to "the information provided to, and the protection of, children, and the manner in which the consent of the holders of parental responsibility over children is to be obtained".