The implications of venturing down the rabbit hole

Jonas Kaiser, Berkman Klein Center for Internet & Society, Harvard University, Cambridge, United States of America
Adrian Rauchfleisch, Graduate Institute of Journalism, National Taiwan University, R.O.C

PUBLISHED ON: 27 Jun 2019

While conducting research on YouTube’s algorithms, three researchers discovered that YouTube’s recommendations had created a community of sexually suggestive channels. Following YouTube’s video recommendations down the “rabbit hole” led them to videos of minors, including young children. When they shared their findings with The New York Times, YouTube implemented changes, and US lawmakers demanded consequences. In this piece, they expand on their findings, show how easily accessible the channels were, reflect upon the risks of conducting online research and of cooperating with the media, and demand more accountability from social media platforms.

Fuelled by fears of radicalisation (O’Callaghan et al., 2015; Munn, 2019), misinformation (Briones et al., 2012), and filter bubbles (Pariser, 2011), YouTube and especially its recommendation algorithms have come under severe scrutiny in the last few years. Once famous for enabling everyone to “broadcast yourself”, YouTube is now known for the “alternative influence network” (Lewis, 2018) that the US far-right has set up on the platform. But, as we will show here, YouTube is plagued not only by political extremism and misinformation, but also by its own algorithms, which created a filter bubble of sorts for paedophiles. And while we had started out as we had in our previous analyses of the United States (Kaiser & Rauchfleisch, 2018a, 2018b) and Germany (Kaiser & Rauchfleisch, 2017), this time we had truly fallen down YouTube’s algorithmic rabbit hole. Our starting research question changed from “How is the Brazilian YouTube-sphere structured?” to “How bad is it?”.

But let’s start from the beginning: as in our previous work, we were interested in the communities and prominent channels in the Brazilian YouTube-sphere. With over 200 million inhabitants, Brazil is the fourth largest democracy in the world, and in the last elections YouTubers even got voted into office (Broderick, 2018). We (Yasodara Córdova, Adrian Rauchfleisch, and Jonas Kaiser) set out to map the Brazilian YouTube-sphere but ended up navigating YouTube’s communities of sexually suggestive channels as well as videos of children. In this piece, we will outline our method and then highlight the issues that we encountered along the way, from both a research and a legal perspective. We posit that the issues this specific project touched upon are endemic in computational social science, that they need to be discussed, and that guidelines need to be introduced.

Stumbling over the rabbit hole: our method

For our analysis, we followed YouTube’s “related channel” function, i.e., its channel recommendation system. In a first step, we created a list with political channels, conspiracy theory channels, as well as the top 250 channels from SocialBlade.com. In total, our list had 451 channels. We then collected all the channels that the channel owners themselves suggested. By adding those, we had a starting seed list of 1,851 channels. Next, we followed YouTube’s channel recommendation system for these channels in five steps; i.e., we collected the recommended channels, then the recommended channels for the newly added channels, then again the recommended channels for the newly added channels, etc. This resulted in a network of 12,341 channels. We then analysed the network using Gephi. Based on the network’s structure we focused on four core communities that we analysed more closely. These consisted of political content, mainstream channels, conspiracy theory channels, esoteric channels, and, to our surprise, channels that were sexually suggestive. When inspecting the latter sub-community more closely, we noticed that while the channels were all sexually suggestive, some of them featured videos of adult women while others featured underage girls. In a next step, we took the top (most viewed) videos since January 2018 for these channels and used YouTube’s API to collect the video recommendations for these top videos. For each video, we collected the top 20 recommended videos. We then aggregated this network of video recommendations on a channel level. Next, we used community detection to understand the different communities that were part of the channel network.
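The stepwise collection and the channel-level aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names `snowball` and `aggregate_to_channels`, the toy lookup table `TOY_RELATED`, and all channel and video identifiers are hypothetical, and the toy table stands in for a real recommendation source. Dropping recommendations that stay within the same channel is likewise a choice made for this sketch.

```python
from collections import Counter


def snowball(seeds, get_related, steps=5):
    """Expand a seed list by following recommendations for a fixed number of steps."""
    seen = set(seeds)
    frontier = set(seeds)
    edges = set()
    for _ in range(steps):
        newly_added = set()
        for channel in sorted(frontier):
            for target in get_related(channel):
                edges.add((channel, target))
                if target not in seen:
                    seen.add(target)
                    newly_added.add(target)
        if not newly_added:
            break  # nothing new was recommended, so further steps add nothing
        frontier = newly_added  # the next step only queries newly added channels
    return seen, edges


def aggregate_to_channels(video_edges, channel_of):
    """Collapse video-to-video recommendations into weighted channel-to-channel edges."""
    weights = Counter()
    for src_video, dst_video in video_edges:
        src, dst = channel_of[src_video], channel_of[dst_video]
        if src != dst:  # assumption: ignore recommendations within one channel
            weights[(src, dst)] += 1
    return weights


# Hypothetical toy data standing in for a real recommendation source.
TOY_RELATED = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": ["E"], "E": []}
nodes, edges = snowball(["A"], lambda c: TOY_RELATED.get(c, []), steps=2)

channel_of = {"v1": "A", "v2": "A", "v3": "B", "v4": "C"}
video_edges = [("v1", "v3"), ("v2", "v3"), ("v1", "v2"), ("v3", "v4")]
channel_weights = aggregate_to_channels(video_edges, channel_of)
```

With two steps from seed "A", the toy crawl reaches channels A to D but not E, which would only be added in a third step. The resulting weighted channel edges are the kind of input that a community detection routine, for example in Gephi, would take.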

Down the rabbit hole: our results

We were able to identify a community that featured what appeared to be adults in sexually suggestive situations that were, however, not explicitly sexual. Some of the videos were even monetised, i.e., had several ads in them. In some of the videos, women seemed to look for financial support in the form of a so-called “sugar daddy”. The second community was less clear-cut but included, for example, a channel by a woman who used YouTube to direct users to other platforms like Instagram, Amazon or Patreon (a crowdfunding platform for artists and creators), presumably for monetary gain. While this American woman was of age, other channels within the community seemed to be run by teenagers. The third community, then, consisted mostly of channels featuring videos of small children. One of the videos showed a child swimming in a pool, another one a child doing splits. While some of the channels seemed to belong to a parent or the child itself, other channels seemed to have uploaded videos of different children. Although we personally inspected only a few videos, the common theme was that the children were only lightly dressed. Noteworthy about the first type of channels was that most of the videos seemed to have gathered few views (in the cases we inspected, only about three or four) while one video had over 250,000 views. The difference between the videos: in the video with over 250,000 views, the child was only barely dressed. It has to be noted that although most if not all of the videos that we personally looked at do not seem to have violated US law, some of them certainly violated YouTube’s Terms of Service.

While these problematic videos seem to be hidden in a rather isolated community, we nonetheless captured them by accident during our political analysis of the Brazilian YouTube-sphere. It would be even more problematic if these videos could be reached directly via YouTube’s recommendations when starting from mainstream entertainment and politics channels. We therefore checked in our network analysis how many recommendation jumps lay between each of the channels we captured and the problematic channels. Unfortunately, in more than 50% of the cases the problematic channels can eventually be reached. However, most of these channels are more than ten jumps away from the problematic cluster of channels (see Fig. 1).

Figure 1. Visualisations of necessary steps, i.e., clicks, to reach the sexually suggestive channel community in the Brazilian YouTube channel network (Network layout: Yifan Hu; Node size: uniform; Node color: closeness to core community; N=12,341).
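One way to count such recommendation jumps is a breadth-first search that starts from the problematic cluster and walks the recommendation edges backwards, so that every channel receives the minimum number of clicks needed to reach the cluster. The sketch below is an illustration only and not the analysis code behind Figure 1: the function name `jumps_to_cluster`, the channel names, and the toy edge list are hypothetical.

```python
from collections import deque


def jumps_to_cluster(edges, targets):
    """Return the minimum number of jumps from each channel to any target channel.

    Channels that cannot reach the target cluster are absent from the result.
    """
    # Reverse the directed recommendation edges: who recommends this channel?
    recommended_by = {}
    for src, dst in edges:
        recommended_by.setdefault(dst, []).append(src)

    dist = {t: 0 for t in targets}
    queue = deque(sorted(targets))
    while queue:
        node = queue.popleft()
        for pred in recommended_by.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return dist


# Hypothetical toy network: a chain leading to the cluster plus an isolated pair.
edges = [
    ("sports", "news"),
    ("news", "music"),
    ("music", "vlog"),
    ("vlog", "target"),
    ("island", "island2"),
]
targets = {"target"}
dist = jumps_to_cluster(edges, targets)

# Share of non-target channels from which the cluster can eventually be reached.
others = {n for edge in edges for n in edge} - targets
reachable_share = sum(1 for n in others if n in dist) / len(others)
```

In this toy network four of the six non-target channels can reach the cluster, at distances of one to four jumps. On a real network, the same distance table would give both the share of channels with a path to the cluster and the distribution of path lengths.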

The implications

These results not only have direct implications for the children and families pictured in the videos and ethical implications for the platform, but also serve as a moment for academics to reflect on our research process. If we analyse recommendation algorithms, crawl websites, collect tweets, scrape 4chan, etc., we usually do not know at the beginning what content will be captured during the data collection. Most scholars who conduct such analyses probably won’t know all the content that is represented in their dataset. On the one hand, we usually have a precise and directed research question; on the other hand, the long tail, which is usually described as “noise”, is often ignored. Indeed, in a worst-case scenario illegal content will be captured. While we usually ask whether an analysis can be done, in this case we were instead wondering whether a certain analysis should be done, given the potential legal issues of the topic. In our case we relied on the invaluable legal advice provided by experienced scholars at the Berkman Klein Center’s cyberlaw clinic as well as the legal team at the New York Times (NYT). Together we mapped a way forward that allowed research in a limited scope but also included reporting the channels we identified to the National Center for Missing and Exploited Children (NCMEC). We then moved forward and shared our data with the New York Times, which eventually published an article called “On YouTube’s Digital Playground, an Open Gate for Pedophiles” (Fisher & Taub, 2019). Since then, YouTube has implemented changes to its platform (YouTube, 2019) and US lawmakers have put forward legislation (Hawley, 2019) as well as questions for YouTube (Blumenthal & Blackburn, 2019).

For us, this experience highlighted several key issues in academic research: the implicit risks of conducting research in an ever-changing online environment, the associated legal questions that need to be considered, but also our own roles as academics, the decision to cooperate with journalists, and the potential impact that work might have. Indeed, this might sound more familiar to academics working in the field of climate change research than to communications scholars. Having briefly outlined our method and findings, we now want to discuss these issues. We draw here on our own experiences, our knowledge of science communication, as well as private conversations that we have had over the last few years on the role of academics, their presence in the media, and the potential trade-offs this presence might imply.

  • Knowing your data. It was pure luck that we stumbled over the sub-community of channels in our dataset. Of the over 12,000 channels, we looked at maybe 3,000 more closely. While our project was exploratory in nature, we had a clear research focus, collected the data accordingly, and then tried to answer our question. In the pursuit of making sense of the channel network, we did not look through all channels but focussed on the four largest network communities. In doing so, we stumbled over the ~50 channels that led us to the child exploitation videos. This raises the question of how much illegal content is out there in unpublished and already published academic work without anybody knowing about it. Is illegal content a part of the web that we have to endure as academics? If not, how can we control for it? How do we find it? And who do we turn to once we have found it? As in every step of this research process, we were very lucky: we could turn to the Berkman Klein Center’s cyberlaw clinic and, in addition, get the perspective of the NYT’s legal team. Moreover, we were located in the United States and in Taiwan, i.e., countries that have clear protocols and organisations in place that we can talk to. Other academics are, most certainly and unfortunately, not that lucky and privileged. As social media researchers, it is time for us to talk about good practice for online research, for understanding risks, for mitigating them, and for dealing with them responsibly. In the same context, it is important to note that social media companies, too, do not want illegal content on their platforms. The companies, however, have to get better at being approachable, having dedicated contact persons for academic matters, and being responsive. Only in this way can social media companies function as allies.

  • Publish or perish. In academia, publications are paramount and will decide one’s career. While working papers seem to be more accepted in other fields, they are less popular in communications. In addition, working papers are usually seen as less legitimate because they are not peer-reviewed. In our case, we decided that we could not openly report on this in a traditional academic paper, as it would not change the problem we had identified. In an optimistic scenario, our results would have been publicly available in a year at the earliest (assuming that everything in the peer-review and publication process went right). Neither the children in the videos nor the families that had uploaded some of the videos could possibly have waited one year for YouTube to change its algorithm. Indeed, it is our conviction that waiting would have been highly irresponsible. The moment we had identified the content, we knew that we had to act immediately. We thus decided to use this analysis directly in collaboration with the NYT. Another reason why we decided against writing a research paper is reproducibility. What is usually an important academic core tenet is, in this case, a risk. Usually, we encourage replication; here, we explicitly wanted to avoid it at any cost. We only changed our stance and described our method here so transparently because YouTube removed the “related channels” feature and our work thus became de facto irreproducible.

  • Media cooperation. Like many academics, we have mixed feelings about cooperating with journalists. Whether you want to argue with Luhmann, Habermas or someone else, it can generally be assumed that the logic of the journalistic field (e.g., timeliness) is different from that of the academic field (e.g., truth). Cooperating with journalists can thus often be complicated: as academics ourselves, we believe it’s fair to say that only a few academic results are newsworthy (Galtung & Ruge, 1965). And that’s perfectly fine, as they are part of an inner-systemic discourse: attempts to expand our knowledge and/or reject aspects that we thought to be true. For journalists, however, newsworthiness is the business. In our case, we were lucky enough to work closely with two extraordinary journalists who were very open to our findings as well as our concerns and who, in turn, also helped us make our analysis better. Yet, it takes little creativity to imagine that cooperating with journalists might take a wrong turn. Results might get exaggerated, words misinterpreted, findings reframed or sensationalised. In our example, this was not the case. Indeed, we profited from the cooperation by having our findings questioned and validated.

  • Science engagement. This connects closely to what Roger Pielke Jr. (2015) called “modes of engagement.” In his work, Pielke Jr. distinguishes between modes of engagement that academics can choose from when interacting with the public and the political field in particular: “pure scientist”, “issue advocate”, “science arbiter”, and “honest broker of policy alternatives”. This touches upon what we have discussed with several colleagues over the last few years as well as a trend in modern academia: the pressure to engage with the public but especially with the media. And while this becomes more and more important in deciding between job candidates, it also leads to internal discussions. While some might be called ‘rockstars’, others are less lucky and are told that they ‘should focus more on publishing papers and less on being in the media’ or on being active on social media (see, for example, Hall, 2014). In a more informal setting, the differentiation between the four modes of engagement is collapsed into a distinction between academics who “represent” science and those who become talking heads. This leads to the somewhat curious contradiction that media engagement might formally lead to an advantage while informally costing standing. In our case, we explicitly decided in favour of sharing our work with the NYT. Similarly to the question of whether to cooperate with journalists, the second question academics should ask themselves in this context is “what kind of academic do I want to be?”. And while many academics think that they have to engage with and/or educate the public, “the public” often refers to policy makers and less so to the general media (Besley & Nisbet, 2013). Social media, of course, has changed this to some extent, and it would be worth examining whether the roles displayed on social media would still fit Pielke Jr.’s ideal types. In our case, we were aiming to fit the roles of both pure scientist and honest broker: pure scientist in the context of what we found and how to interpret the findings, honest broker in the context of the question “what can be done about it?”.

Conclusion: a look in the mirror

Conducting research on social media platforms comes with risks. These risks, however, are often either not known or ignored. By writing this piece we wanted to emphasise how quickly things can get messy or, worse, legally risky. We argue that as social scientists using computational methods we have to be aware of these risks, address them, discuss them, and create a workflow for us and all our colleagues that serves as a guideline and resource on how to behave, what to do, and who to contact. This includes lawyers, colleagues in the department or the research institution, the social media platforms that we analyse, policy makers, and journalists. This also includes talking about privilege and the fact that some research institutions and some countries are better equipped to deal with such problematic findings than others. It is paramount for the institutions and scholars that enjoy more of these privileges to realise this and engage in finding a way forward so that, in the end, all researchers can conduct research more safely.

In a so-called “post-normal science” world (Funtowicz & Ravetz, 1993), the problems are overly complex, our methods limited, and all solutions come with trade-offs. Yet, we are not alone, and sometimes it is better to cooperate with journalists, social media companies, or agencies, and sometimes it is not. It is important to consider at which point of the research process a cooperation makes sense and when it does not. We use our example to highlight that sometimes the public interest is more important than a proper academic peer review process. Our priority was, first and foremost, to get YouTube to remove some of the channels and videos as well as to adjust its algorithms. YouTube has done so, and we commend them for taking a step in the right direction.

The question, however, has to be asked whether YouTube would have acted that way if we had not cooperated with the NYT. Indeed, our project also highlights, once more, the need for more direct contact with the social media companies. Academics should meet social media companies on a level playing field; academics should be understood as equals, not as outsiders. Part of this is allowing research, giving privileges to researchers and journalists, and, most importantly, honestly listening. It is, perhaps, telling that we haven’t heard from YouTube about our research. It is, perhaps, also telling that YouTube shut down the “related channels” feature around the time the NYT asked for comment, according to YouTube because they “weren’t frequently used”. This is the feature that we used for our research, which we now cannot continue.

Bibliography

Besley, J. C., & Nisbet, M. (2013). How scientists view the public, the media and the political process. Public Understanding of Science, 22(6), 644–659. https://doi.org/10.1177/0963662511418743

Blumenthal, R., & Blackburn, M. (2019, June 6). Letter from Sen. Richard Blumenthal & Marsha Blackburn to Susan Wojcicki. Retrieved from https://www.blumenthal.senate.gov/imo/media/doc/2019.06.03%20-%20YouTube%20-%20Child%20Abuse.pdf

Briones, R., Nan, X., Madden, K., & Waks, L. (2012). When Vaccines Go Viral: An Analysis of HPV Vaccine Coverage on YouTube. Health Communication, 27(5), 478–485. https://doi.org/10.1080/10410236.2011.610258

Broderick, R. (2018). YouTubers Will Enter Politics, And The Ones Who Do Are Probably Going To Win. BuzzFeed News. Retrieved from https://www.buzzfeednews.com/article/ryanhatesthis/brazils-congressional-youtubers

Fisher, M., & Taub, A. (2019). On YouTube’s Digital Playground, an Open Gate for Pedophiles. The New York Times. Retrieved from https://www.nytimes.com/2019/06/03/world/americas/youtube-pedophiles.html?smtyp=cur&smid=tw-nytimes

Funtowicz, S. O., & Ravetz, J. R. (1993). Science for the post-normal age. Futures, 25(7), 739–755. https://doi.org/10.1016/0016-3287(93)90022-L

Galtung, J., & Ruge, M. H. (1965). The Structure of Foreign News: The Presentation of the Congo, Cuba and Cyprus Crises in Four Norwegian Newspapers. Journal of Peace Research, 2(1), 64–90. https://doi.org/10.1177/002234336500200104

Hall, N. (2014). The Kardashian index: A measure of discrepant social media profile for scientists. Genome Biology, 15(7), 424. https://doi.org/10.1186/s13059-014-0424-0

Hawley, J. (2019). Protecting Children from Online Predators Act. Retrieved from https://www.hawley.senate.gov/sites/default/files/2019-06/Protecting-Children-Online-Predators-Act-Highlight.pdf

Kaiser, J., & Rauchfleisch, A. (2017). YouTubes Algorithmen sorgen dafür, dass AfD-Fans unter sich bleiben. Motherboard VICE. Retrieved from https://www.vice.com/de/article/59d98n/youtubes-algorithmen-sorgen-dafur-dass-afd-fans-unter-sich-bleiben

Kaiser, J., & Rauchfleisch, A. (2018a). Unite the Right? How YouTube’s Recommendation Algorithm Connects The U.S. Far-Right [D&S Media Manipulation: Dispatches from the Field]. Retrieved from D&S Media Manipulation: Dispatches from the Field website: https://medium.com/@MediaManipulation/unite-the-right-how-youtubes-recommendation-algorithm-connects-the-u-s-far-right-9f1387ccfabd

Kaiser, J., & Rauchfleisch, A. (2018b). Filling the void Alex Jones left behind YouTube’s attempt to burst the (far-)right bubble is a chance for Fox News [Berkman Klein Center Collection]. Retrieved from Berkman Klein Center Collection website: https://medium.com/berkman-klein-center/filling-the-void-alex-jones-left-behind-b9a18fdd95be

Lewis, B. (2018). Alternative Influence: Broadcasting the Reactionary Right on YouTube. Data&Society. Retrieved from Data&Society website: https://datasociety.net/wp-content/uploads/2018/09/DS_Alternative_Influence.pdf

Munn, L. (2019). Alt-right pipeline: Individual journeys to extremism online. First Monday, 24(6). https://doi.org/10.5210/fm.v24i6.10108

O’Callaghan, D., Greene, D., Conway, M., Carthy, J., & Cunningham, P. (2015). Down the (White) Rabbit Hole: The Extreme Right and Online Recommender Systems. Social Science Computer Review, 33(4), 459–478. https://doi.org/10.1177/0894439314555329

Pariser, E. (2011). The filter bubble: What the Internet is hiding from you. New York, NY: Penguin Press.

Pielke Jr., R. (2015). Five Modes of Science Engagement [Roger Pielke Jr.’s Blog]. Retrieved from Roger Pielke Jr.’s Blog website: http://rogerpielkejr.blogspot.com/2015/01/five-modes-of-science-engagement.html

YouTube. (2019). An update on our efforts to protect minors and families [YouTube Official Blog]. Retrieved from YouTube Official Blog website: https://youtube.googleblog.com/2019/06/an-update-on-our-efforts-to-protect.html

Funding

This work was supported by the Ministry of Science and Technology, Taiwan (R.O.C) (Grant No 108-2410-H-002-007-MY2).

Acknowledgements

We further want to thank Valentin Weber and Sarah Hampton Brown for their feedback on prior drafts of this article.
