Staking out the unclear ethical terrain of online social experiments

In this article, we discuss the ethical issues raised by large-scale online social experiments using the controversy surrounding the so-called Facebook emotional contagion study as our prime example (Kramer, Guillory, & Hancock, 2014). We describe how different parties approach the issues raised by the study and which aspects they highlight, discerning how data science advocates and data science critics use different sets of analogies to strategically support their claims. Through a qualitative and non-representative discourse analysis we find that proponents weigh the arguments for and against online social experiments with each other, while critics question the legitimacy of the implicit assignment of different roles to scientists and subjects in such studies. We conclude that rather than the effects of the research itself, the asymmetrical nature of the relationship between these actors and the present status of data science as a (to the wider public) black box is at the heart of the controversy that followed the Facebook study, and that this perceived asymmetry is likely to lead to future conflicts.

pointed out that: (1) the content omitted from the News Feed as part of the experiment was still available by going directly to the user's Wall; (2) the percentage of omitted content was very small; (3) the content of the News Feed is generally the product of algorithmic filtering rather than a verbatim reproduction of everything posted by one's contacts; and (4) no content was examined manually, that is, read by a human researcher, but that the classification was determined by LIWC automatically. Some of these aspects were misrepresented in the media reactions to the study, but more basic considerations such as how the study had been institutionally handled by Facebook, Cornell, and PNAS, and whether agreement to the terms of service constituted informed consent to participation in an experiment were also raised in the debate that followed.

THE UNCLEAR ETHICAL TERRAIN OF ONLINE SOCIAL EXPERIMENTS
How can the extremely divergent characterisations of the same event be explained, and what do such conflicting perspectives spell out for the ethics of large-scale online social experiments? In what follows, we will discuss these questions, drawing on multiple examples of similar studies.
Researchers at Facebook have conducted other experiments, for instance studying forms of self-censorship by tracking what users type into a comment box without sending it (Das & Kramer, 2013); displaying products that users have claimed through Facebook offers to their friends in order to see whether a buying impulse is activated by peer behaviour (Taylor et al., 2013); showing users a picture of a friend next to an advertisement without the friend's consent (Bakshy et al., 2013); hiding content from certain users to measure the influence peers exert on information sharing (Bakshy et al., 2012b); and offering users an 'I Voted' button at the top of their News Feeds in order to nudge family members and friends to vote and at the same time assess the influence of peer pressure on voting behaviour (Bond et al., 2012).
While the Facebook emotional contagion study caused the largest controversy, other companies actively conduct very similar experiments. OkCupid, an online dating company, undertook an experiment that consisted of displaying an incorrect matching score to a pair of users in order to assess the effect that an artificially inflated or reduced score would have on user behaviour. A couple that was shown a 90% preferential match was an actual 20% match according to the OkCupid algorithm, and an actual 90% match was shown as a 20% score (Rudder, 2014).
According to the results, the recommendation was sufficient to inspire bad matches to exchange nearly as many messages as good matches typically do (Paumgarten, 2014), calling the effectiveness of the algorithm into question. Co-founder and president of OkCupid Christian Rudder responded to this criticism by claiming that: "when we tell people they are a good match, they act as if they are [..] even when they should be wrong for each other" (Rudder, 2014). OkCupid also removed text from users' profiles and hid photos for certain experiments in order to gauge the effect that this would have on user behaviour (BBC, 2014). Similar experiments are conducted by companies such as Google, Yahoo, Amazon, eBay and Twitter, all of which have access to large volumes of user data and increasingly employ interdisciplinary teams of research scientists that approach problems beyond the scope of traditional computer science. Such teams consist of mathematicians, psychologists, sociologists and ethnographers who analyse data from user transactions, interviews, surveys and ethnographic studies in order to optimise company services (Ungerleider, 2014). Very often (as in the Facebook case) the results of their research are presented at international conferences or published in academic journals in order to stimulate discourse with the academic community. Frequently, multi-authored papers bring together company researchers and scientists at academic institutions, particularly in the United States. Therefore the question of whether something constitutes industry research or academic research is much harder to answer than it may seem at the outset, with the lines deliberately being blurred by the quasi-academic environment cultivated at major internet companies.

ARGUMENTS FOR AND AGAINST ONLINE SOCIAL EXPERIMENTS
In the debate that followed the publication of the study, different stances were assumed by a range of actors including journalists, user rights advocates, government officials, company representatives, and academics from a variety of fields, a small and non-representative selection of which is presented in the following (see table 1 for a summary). Our sample is based on a list compiled by legal scholar James Grimmelmann (2014b), who collected sources and called for references from social media users in the period after the study had been widely publicised.
Grimmelmann does not specify exact criteria for the items on his list, simply referring to them as "major primary sources", but we believe that it provides a valuable overview of the types of arguments made in favour of and in opposition to the study. Many commentators reacted critically to the research, but some also expressed concerns in relation to how the study had been handled, blaming media hype and misrepresentation of the experiment for some of the negative responses. Our aim is to characterise these reactions through their implicit conceptualisations by identifying a set of recurring arguments provided in defense of the experiment. Our intent is furthermore to categorise and contrast different arguments, and to point out how they relate to the actors who benefit most from what they imply. By categorising actors along with arguments, we show that the discussion around online experiments is strongly shaped by different and at times conflicting epistemological frameworks that implicitly privilege certain viewpoints over others to attain legitimacy.

BENEFITS OF ONLINE EXPERIMENTS FOR THE INDIVIDUAL
A number of media reports stated that as part of the experiment, the News Feed had been "manipulated" (Arthur, 2014; BBC, 2014; Hill, 2014; Lennard, 2014; R. Meyer, 2014), a wording that appeared problematic to some commentators, as the News Feed is generally filtered to represent a selection of status updates curated according to algorithmic criteria (Bozdag, 2013; Gillespie, 2014). Since the News Feed is algorithmically personalised to foster user engagement in Facebook, it is difficult to judge what kind of modifications qualify as manipulations and which constitute website optimisation. Gillespie (2014) points out that Facebook's curation of user data in the News Feed is already part of the site's terms of service and its data use policy. Sandvig (2014) in turn offers a list of examples outside the News Feed in which pieces of personal communication are effectively recontextualised, for example to be used as advertisements. Facebook has stated that out of an average of 1,500 updates, the News Feed algorithm selects approximately 300 items for each user with each update (Backstrom, 2012).
According to Facebook, in an unfiltered stream of information, people would be missing "something they wanted to see" (Backstrom, 2012). Since the selection of items is achieved through constant testing of alternative site designs, content selection is the product of constant experimentation. As platforms such as Facebook are generally subject to some sort of algorithmic filtering, some commentators have argued that we are ultimately faced with "a problem with the ethics of there being an algorithm in the first place" (Robbins, 2014). On the other hand, research shows that most Facebook users have no precise idea about how the News Feed algorithm works, or that there is a filtering process at all (Sandvig, Karahalios, & Langbort, 2014). Contrary to intuition, an average Facebook post reaches only 12% of a user's followers (Constine, 2012). This curation is assumed to add value, and given the amount of content that is published on Facebook, it reduces clutter. But the filtering criteria cannot be controlled by users (in contrast to, for example, privacy settings), and the precise set of criteria is not transparent. Sandvig (2014) refers to the dangers of a curation that results in a distorted sense of the social context as "corrupt personalization", which he characterises as "the process by which your attention is drawn to interests that are not your own". He acknowledges that it is difficult to pinpoint inauthentic personal interests, but argues convincingly that a commercialisation of communication through algorithmic curation may conflict with user interests without the subject noticing that this is the case. Sandvig categorically differentiates between tailoring content to a user in her best interest and deriving a profit from it, and prioritising commercial content over non-commercial content in a non-transparent fashion. He interprets the latter not merely as an ethical issue to be resolved, but also as a waste of the potential of algorithmic curation.

INFORMED CONSENT AND ITS MANY INTERPRETATIONS
A second point of contention is whether or not agreeing to the Facebook terms of service constitutes informed consent to an experiment in which the News Feed is manipulated in the described way. This question has narrower legal and broader ethical implications. A clause in the terms of service covers research to improve the site and make it more attractive to users, but experts disagree on whether this covers an experimental design such as the one chosen by Facebook (cf. Grimmelmann, 2014a; M.N. Meyer, 2014). The Facebook study provoked a discussion among legal scholars about the responsibility of institutional review boards (IRBs) that is still ongoing, demonstrating that massive online experiments represent uncharted territory not just from the perspective of internet companies, but also for academic regulatory bodies, who are likely to approach such experiments in markedly different ways. Grimmelmann (2014a) argues that "informed consent, at a minimum, includes providing a description of the research to participants, disclosing any reasonably foreseeable risks or discomforts, providing a point of contact for questions, and giving participants the ability to opt out with no penalty or loss of benefits to which the subject is otherwise entitled", which in his view the Facebook study did not do effectively. Taking a similar perspective, Gray (2014) points out that Facebook could have notified the participants in a follow-up email, sharing the results with them and offering them a link to the happy and sad moments that they missed in their News Feed while the experiment was underway. Facebook could also have given participants the option of deleting their data after the research was concluded, which the company did not. Jeffrey Hancock, a co-author of the study, also argued for such a "notify after" approach as a response to criticism. Hancock claimed opt-in procedures to be unrealistic for online experiments due to their ubiquity.
Instead, he argued in favour of retroactively informing users after an experiment has taken place, including more information about the study, and contact information for the researchers or an ombudsman (LaFrance, 2014). Of course, user data samples based on prior consent may be less attractive to scientists than random samples (cf. Bernstein, 2014). But while the risk of influencing results by informing users in advance is acknowledged, legal scholars argue that this cannot be effectively weighed against informed consent, because "if it were, informed consent would never be viable" (Grimmelmann, 2014c).
Beyond the question of what kind of provisions are covered by the terms of service in this concrete case, informed consent more generally is seen by some experts as being in need of reform. Erika C. Hayden refers to informed consent as "a broken contract" (2014) and Mary DeRosa describes it as being "overdue for a wake-up call" (2014, para 2). In the context of the reactions to Facebook's study, DeRosa discusses the difference between what may constitute legal agreement and ethical behaviour, asking: "Would anyone seriously argue that Facebook users expected this kind of manipulation of their News Feed or examination of their data for this purpose? Some consumers would knowingly consent to research like this, but it is unlikely that a single one actually did" (para 6). As DeRosa points out, a key problem is that the expectations of users are violated, rather than that consent to online experiments is necessarily rare in itself. Van de Poel (2011) argues that applying the principle of informed consent to social experiments in technology raises the question of whether it makes sense to ask people to consent to unknown hazards. As agreeing to be part of an experiment with unknown consequences seems to entail accepting all negative consequences emerging from the experiment, it is difficult to see how people could rationally agree to such an approach. However, Van de Poel argues, any social experiment involving ignorance and a lack of mutual understanding is unacceptable. Instead of directly trying to apply the principle of informed consent, it might be better to focus on the underlying moral concern on which consent is based. Instead of blindly accepting an agreement, the emphasis could rest on informing users about the experiment as such and the risks it entails, providing the option to stop participating if desired, and notifying participants once the experiment is stopped.

THE UBIQUITY OF ONLINE SOCIAL EXPERIMENTS
Some proponents of the study claim that online experiments should be accepted as a fact of life, since every social media company conducts them and they are without any feasible alternative (Andreessen in Sullivan, 2014). Furthermore, some researchers argue that online experiments should not be regulated by the same ethical guidelines that are applied to offline laboratory experiments as they are unique, novel and provide a great opportunity to discover human behaviour at a large scale (Bernstein, 2014; Watts, 2014). However, experiments do not always occur in a traditional laboratory setting. Van de Poel (2009) shows that certain innovations, such as nanotechnology, cannot be developed in a laboratory setting and it is hardly possible to reliably predict risks of such technologies before they are actually employed in society. It may not be feasible to reliably predict the possible hazards to all potential users of a technology, and even when we can, we may not properly express their likelihood in numbers. Van de Poel (2009, 2011) lists conditions for the acceptability of social experiments: (1) the absence of alternatives, (2) the controllability of the experiment, (3) informed consent, (4) the proportionality of hazards and benefits, (5) the approval by democratically legitimised bodies, (6) the possibility for subjects to influence the set-up and conduct of the experiment and to stop it if needed, (7) the protection of potentially vulnerable subjects, and (8) careful and proportional scaling of the sample size.
Clearly many online intermediaries do not adhere to these principles, mixing different types of considerations: (1) users are rarely informed before or after an experiment is conducted, (2) experiments are approved from within the company, rather than by independent bodies, (3) the subjects cannot influence or stop the experiment, nor give feedback, (4) vulnerable subjects are not protected, (5) experiments are conducted on a large scale from the start, (6) the distribution of potential hazards and benefits is not clearly shown, (7) alternatives to the experiments are not considered, and (8) experiments are not subject to the control of participants in the sense that they are able to revoke or modify their participation after the experiment has started. While the ubiquity of such experiments is a result of the pervasiveness of online platforms in which users are able to interact, this hardly makes the experiments ethically less consequential. All actors involved need to jointly discuss and devise criteria for the ethics of online experiments in accordance with existing guidelines (see for example Association of Internet Researchers, 2012).
This by no means excludes users, who also can better weigh risks and benefits when they are adequately informed. In this vein, arguing for a better understanding of how social media platforms operate, Muench (2014) observes that it is "important for users to be aware of how these sites are designed to engage and reinforce our browsing behavior through evolutionary reward systems".

DIFFERENT PERCEPTIONS OF RISK IN ONLINE EXPERIMENTS
The authors of the Facebook study claimed that because Facebook did not insert emotional messages into the News Feed, but only hid certain posts for certain users, the experiment did not represent any danger to users. This argument has been opposed on the grounds that if persuasion does not happen voluntarily and if the persuader does not reveal her intentions before the persuading act takes place, this is to be considered manipulative (Smids, 2012; Spahn, 2012), making manipulation as much an issue of intent as an issue of effect.
Others argue that involuntary persuasion is acceptable only if there is a very significant benefit for society that would outweigh possible harms (e.g. Berdichevsky & Neuenschwander, 1999). In the case of the Facebook study, it is difficult to adequately judge the benefits of the research at this point, while the harm, if only in terms of public perception, has become quite obvious. Data scientist Duncan Watts optimistically argues in The Guardian that online social experiments will usher in "a golden age for research" (2014), but this depends on each actor's perspective. Mary L. Gray (2014) draws a comparison to early nuclear research and experiments on human subjects, and sees data science as undergoing a learning process with regard to research ethics.
In reaction to Kramer's response to the criticism, published on his personal Facebook page, individual Facebook users responded with personal accounts of emotional hardship and depression, expressing concern that Facebook would experiment on the content of the News Feed in ways that could adversely affect them. The question of risk beyond individual users seems impossible to answer without precedent, but the lack of transparency towards participants is likely to weigh more strongly in the eyes of many users than the small size of the effect reported in the study and the details of how the filtering was conducted. Furthermore, as Kramer and colleagues point out, the impact of systematically seeking to influence users may still be strong, even if it is restricted to a small group. In a 61 million user experiment in 2010, Facebook users were shown messages at the top of their News Feeds that encouraged them to vote, pointed to nearby polling places, offered a place to click "I Voted" and displayed images of select friends who had already voted (Bond et al., 2012). The results suggest that the Facebook social message increased turnout by close to 340,000 votes. It has consequently been argued that if Facebook can persuade users to vote, it can also persuade them to vote for a certain candidate, a kind of influence which, while hypothetical, does present obvious risks (Zittrain, 2014).

BENEFITS OF ONLINE EXPERIMENTATION FOR THE SOCIETY
A popular argument among proponents of online social experiments resides in their potential benefits to society, and associated with these, the danger that negative responses could have a chilling effect on collaborations between industry and academics (Bernstein, 2014; M.N. Meyer, 2014; Tarkoni, 2014; Watts, 2014). Michelle N. Meyer (2014) makes this argument in two parts, stating first that "rigorous science helps to generate information that we need to understand our world, how it affects us and how our activities affect others", and secondly that "permitting Facebook and other companies to mine our data and study our behavior for personal profit, but penalizing it for making its data available for others to see and to learn from makes no one better off". Similar arguments are made by Watts (2014), and also by Tarkoni (2014), who contends: "Consider: by far the most likely outcome of the backlash Facebook is currently experiencing is that, in future, its leadership will be less likely to allow its data scientists to publish their findings in the scientific literature [..] The fact that Facebook is willing to allow its data science team to spend at least some of its time publishing basic scientific research that draws on Facebook's unparalleled resources is something to be commended, not criticized." What justifies the risks, even if only potential, that are incurred by large-scale online social experiments?
Watts draws an analogy between the rise of empiricism during the Enlightenment and the current circumstances, arguing that "the arrival of new ways to understand the world can be unsettling". But this analogy is made at least latently problematic by the commercial interests that are at play; the opportunities of learning anything about basic human behaviour are no more pertinent than the opportunities to influence behaviour, for whatever purpose. Muench (2014) compares online social experiments to Skinnerian operant conditioning, in which strategic choices, such as exposing subjects to stimuli in randomised intervals, lead to greater engagement. To make good on the claim of societal benefit, a clearer case needs to be made for the positive impact of online social experiments, a case that is able to transcend the aim of increasing user engagement.

THE UNAVOIDABILITY OF ONLINE EXPERIMENTS
Advocates of online social experiments, such as OkCupid's CEO Christian Rudder, argue that such experiments are unavoidable, because all aspects of the design of digital platforms are shaped by constant experimentation in order to make improvements: "OkCupid doesn't really know what it's doing. Neither does any other website. It's not like people have been building these things for very long, or you can go look up a blueprint or something. Most ideas are bad. Even good ideas could be better. Experiments are how you sort all this out" (Rudder, 2014).
He continues to argue that experiments are needed to make sure that the current algorithm works better than a random one, and that there is no alternative to such an incremental approach to optimally address user preferences. He also believes that while experiments presently cause controversies, they will be fully accepted in the future. Critics contend that the potential to innovate via experimentation must still be weighed against possible drawbacks, rather than being accepted as being without an alternative. For instance, Howell (2014) responds to Rudder arguing that he "is clearly acting wrongly, and for (at least) two reasons: 1) He is being dishonest by providing something other than what he says he will provide. Rudder thus provides a system that performs bad matches to see how people will react, instead of their claim 'Our matching algorithm helps you find the right people'. 2) he subjects his (users) to potential harm that they have actively sought to avoid". Howell (2014) further argues that the defense of the company is disingenuous: "either OkCupid believes its sales pitch or it doesn't. If it doesn't, we already have a moral issue. If it does, then they are doing what they believe will be harmful to their customers". Grimmelmann (2014c) shares this view when proposing that, unless risks are minimal or nonexistent, researchers cannot decide that an experiment is worth a particular risk.
That decision should instead be made by users.
Table 1 summarises our observations on the arguments made by the proponents and critics of the Facebook study, and similar online experiments.

DISCUSSION
We have aimed to show that the ethical issues raised by social experiments can be described on multiple discursive levels, depending on the roles that the discussants assume. We have shown that the problem is complex and involves interests reflected in different arguments, such as the individual and social benefits of online experiments, their ubiquity and relevance, the fact that consent is provided and that users are not exposed to any significant risks. We have shown that some of these values themselves are dependent on specific frames of reference (e.g., the attainment of status in science) and that further debate is needed to balance their relation to one another. Perhaps our central observation is that the asymmetrical relationship between data scientists and users of social media platforms is what underpins these conflicting frames of reference. Furthermore, as long as there is no consensus regarding the ethics of online experiments that transcends a single stakeholder group, such conflicts are likely to arise again in the future, rather than abate. In this paper, we have used the Facebook experiment as a use case to discuss a range of arguments provided by different stakeholders to illustrate this conflict.
While the study has provoked strong reactions, it is worth pointing again to similar research, both at Facebook and elsewhere, to clarify that this is a broader issue, rather than a singular case. In a 2012 study on information diffusion, Facebook researchers randomly blocked some status updates from the News Feeds of a pool of some 250 million users, many more than in the emotion contagion experiment. Google provides a set of tools to conduct A/B tests for website optimisation, as does Amazon. Beyond A/B testing to improve the quality of search results, issues become yet more complicated when experiments around information exposure are conducted with social improvement in mind, and without explicit consent. In research conducted at Microsoft, researchers Yom-Tov, Dumais, & Guo (2013) changed search engine results in order to promote more balanced civil discourse. In the study, the authors modified results that were displayed when users entered specific political search queries, so that subjects entering the query obamacare would be exposed both to liberal and conservative sources, rather than just to content biased in one ideological direction. While the researchers arguably had the best intentions, they did not notify users that their search results were being modified, either during the experiment or afterwards. This raises complex questions regarding the ethics of manipulation with the aim of affording social improvement. Some have claimed that when persuasion is conducted for a higher ethical goal, this can be acceptable (Berdichevsky & Neuenschwander, 1999), while others disagree (Smids, 2012; Spahn, 2012). In the light of the discrepancy between the ethical standards of academic research on human subjects and the entirely different requirements of building and optimising social media platforms and search engines, it is tempting but simplistic to single out any particular company for filtering content algorithmically. New collaborative models of joint corporate and academic research are considerably blurring the boundaries between basic and industry research, and complicating the picture of disinterested academia and result-driven commercial research.
The public outcry in reaction to the Facebook study underlines that there is a growing expectation towards more transparency regarding how content is filtered and presented, beyond assuming a 'take it or leave it'-style attitude. A company may have the interests of its users in mind, whether this goal is usability, more relevant search results, happier status updates, or a better match in dating platforms. However, users have to be able to assess these intentions for themselves, and evaluate the balance between their personal benefits and the interests of the company. There is a pronounced fear among publicly-funded academics that Facebook and other social media companies might limit the already fairly sparse access to their data, as they clearly see benefits in publishing studies based on unprecedented amounts of data, not solely for science, but also for their own careers. Competition for cutting-edge research results is neither unique to social media data nor surprising, but it spells out a potential conflict between users' sense of freedom and privacy and scientists' interest in advancing a nascent field vying for scholarly acceptance through high-profile publications. To users, it remains largely unclear what exactly the benefits of such research may be. The argument made by Meyer, that "rigorous science helps to generate information that we need to better understand our world" (our emphasis), is qualified by the highly media-specific nature of such research: we learn much more about how people react to each other on Facebook than about human interaction in any broader, more universal sense.
After the controversy had erupted, the editor of the publication, Susan Fiske, noted the complexity of the situation, pointing out that the Institutional Review Board of the authors' institutions had approved the research, and arguing that Facebook could not be held to the same standards as academic institutions. Kramer and colleagues clearly saw their experiment in line with Facebook's continued efforts to optimise the News Feed, yet as we have pointed out, the arguments made in defense of this and similar experiments are strongly coloured by the interests of different parties, with users relatively far removed from the benefits in favour of which the proponents argue. Data science must show more convincingly that it balances the interests of scientists, companies and users to deliver on its many promises. Laboratories, regardless of their size, are governed by rules ensuring that the research conducted under their oversight is not just legal, but also ethical. Legalistic attempts to seek cover behind the terms of service have failed to achieve this type of broad societal acceptance for what undoubtedly constitutes a new approach to science. While some researchers argue that online social experiments should not be subjected to the same ethical guidelines that are used for offline social experiments, we find the 'newness' of such experiments to lie in their potential scale, rather than in their ethics. The point is not to wring our hands about hypothetical potentials for abuse, but to carefully examine cases such as the Facebook study and ask why the reference points of users and data scientists are as different as they apparently are, and whether these differences can be reconciled in the future. Benefits for science should be balanced with possible hazards that may be caused by experiments, rather than presuming that such benefits outweigh the hazards. Transparency towards users is paramount, as is seeking articulated consent for participation.

Table 1: Arguments for and against online social experiments surrounding the Facebook

Internet Policy Review | http://policyreview.info 11 November 2014 | Volume 3 | Issue 4