Disinformation optimised: gaming search engine algorithms to amplify junk news

ABSTRACT
Previous research has described how highly personalised paid advertising on social media platforms can be used to influence voter preferences and undermine the integrity of elections. However, less work has examined how search engine optimisation (SEO) strategies are used to target audiences with disinformation or political propaganda. This paper looks at 29 junk news domains and their SEO keyword strategies between January 2016 and March 2019. I find that SEO — rather than paid advertising — is the most important strategy for generating discoverability via Google Search. Following public concern over the spread of disinformation online, Google’s algorithmic changes had a significant impact on junk news discoverability. The findings of this research have implications for policymaking, as regulators think through legal remedies to combat the spread of disinformation online.


INTRODUCTION
Did the Holocaust really happen? In December 2016, Google's search engine algorithm determined the most authoritative source to answer this question was a neo-Nazi website peddling holocaust denialism (Cadwalladr, 2016b). For any inquisitive user typing this question into Google, the first website recommended by Search linked to an article entitled: "Top 10 reasons why the Holocaust didn't happen". The third article "The Holocaust Hoax; IT NEVER HAPPENED" was published by another neo-Nazi website, while the fifth, seventh, and ninth recommendations linked to similar racist propaganda pages (Cadwalladr, 2016b). Up until Google started demoting websites committed to spreading anti-Semitic messages, anyone asking whether the Holocaust actually happened would have been directed to consult neo-Nazi websites, rather than one of the many credible sources about the Holocaust and tragedy of World War II.
Google's role in shaping the information environment and enabling political advertising has made it a "de facto infrastructure" for democratic processes (Barrett & Kreiss, 2019). How its search engine algorithm determines authoritative sources directly shapes the online information environment for the more than 89 percent of the world's internet users who trust Google Search to quickly and accurately find answers to their questions. Unlike social media platforms that tailor content based on "algorithmically curated newsfeeds" (Golebiewski & boyd, 2019), the logic of search engines is "mutually shaped" by algorithms - which shape access - and users - who shape the information being sought (Schroeder, 2014). By facilitating information access and discovery, search engines hold a unique position in the information ecosystem. But, as with other digital platforms, the affordances of Google Search have proved to be fertile ground for media manipulation.
Previous research has demonstrated how large volumes of mis- and disinformation were spread on social media platforms in the lead-up to elections around the world (Hedman et al., 2018; Machado et al., 2018). Some of this disinformation was micro-targeted towards specific communities or individuals based on their personal data. While data-driven campaigning has become a powerful tool for political parties to mobilise and fundraise (Fowler et al., 2019; Baldwin-Philippi, 2017), the connection between online advertisements and disinformation, foreign election interference, polarisation, and non-transparent campaign practices has caused growing anxieties about its impact on democracy.
Since the 2016 presidential election in the United States, public attention and scrutiny has largely focused on the role of Facebook in profiting from and amplifying the spread of disinformation via digital advertisements. However, less attention has been paid to Google, which, along with Facebook, commands more than 60% of the digital advertising market share. At the same time, a multi-billion-dollar search engine optimisation (SEO) industry has been built around understanding how technical systems rank, sort, and prioritise information (Hoffmann, Taylor, & Bradshaw, 2019). The purveyors of disinformation have learned to exploit social media platforms to engineer content discovery and drive "pseudo-organic engagement". 1 These websites - which do not employ professional journalistic standards, report on conspiracy theory, counterfeit professional news brands, and mask partisan commentary as news - have been referred to as "junk news" domains.
Together, the role of political advertising and the matured SEO industry make Google Search an interesting and largely underexplored case to analyse. Considering the importance of Google Search in connecting individuals to news and information about politics, this paper examines how junk news websites generate discoverability via Google Search. It asks: (1) How do junk news domains optimise content, through both paid and SEO strategies, to grow discoverability and website value? (2) What strategies are effective at growing discoverability and/or website value? And (3) What are the implications of these findings for ongoing discussions about the regulation of social media platforms?
To answer these questions, I analysed 29 junk news domains and their advertising and search engine optimisation strategies between January 2016 and March 2019. First, junk news domains make use of a variety of SEO keyword strategies in order to game Search, grow pseudo-organic clicks, and increase their website value. The keywords that generated the highest placements on Google Search focused on (1) navigational searches for known brand names (such as searches for "breitbart.com") and (2) carefully curated keyword combinations that fill so-called "data voids" (Golebiewski & Boyd, 2018), or gaps in search engine queries (such as searches for "Obama illegal alien"). Second, there was a clear correlation between the number of clicks that a website receives and the estimated value of the junk news domains. The most profitable timeframes correlated with important political events in the United States (such as the 2016 presidential election and the 2018 midterm elections), and the value of the domain increased based on SEO-optimised - rather than paid - clicks. Third, junk news domains were relatively successful at generating top placements on Google Search before and after the 2016 US presidential election. However, their discoverability abruptly declined beginning in August 2017, following major announcements from Google about changes to its search engine algorithms, as well as other initiatives to combat the spread of junk news in search results. This suggests that Google can, and has, measurably impacted the discoverability of junk news on Search.
This paper proceeds as follows: The first section provides background on the vocabulary of disinformation and ongoing debates about so-called fake news, situating the terminology of "junk news" used in this paper in the scholarly literature. The second section discusses the logic and politics of search, describing how search engines work and reviewing the existing literature on Google Search and the spread of disinformation. The third section outlines the methodology of the paper. The fourth section analyses 29 prominent junk news domains to learn about their SEO and advertising strategies, as well as their impact on content discoverability and revenue generation. This paper concludes with a discussion of the findings and implications for future policymaking and private self-regulation.

THE VOCABULARY OF POLITICAL COMMUNICATION IN THE 21 ST CENTURY
"Fake news" gained significant attention from scholarship and mainstream media during the 2016 presidential election in the United States as viral stories pushing outrageous headlines - such as Hillary Clinton's alleged involvement in a paedophile ring in the basement of a DC pizzeria - were prominently displayed across search and social media news feeds (Silverman, 2016). Although "fake news" is not a new phenomenon, the spread of these stories - which are both enhanced and constrained by the unique affordances of internet and social networking technologies - has reinvigorated an entire research agenda around digital news consumption and democratic outcomes. Scholars from diverse disciplinary backgrounds - including psychology, sociology and ethnography, economics, political science, law, computer science, journalism, and communication studies - have launched investigations into the circulation of so-called "fake news" stories (Allcott & Gentzkow, 2017; Lazer et al., 2018), their role in agenda-setting (Vargo, Guo, & Amazeen, 2018), and their impact on democratic outcomes and political polarisation (Persily, 2017; Tucker et al., 2018).
However, scholars at the forefront of this research agenda have continually identified several epistemological and methodological challenges around the study of so-called "fake news". A commonly identified concern is the ambiguity of the term itself, as "fake news" has come to be an umbrella term for all kinds of problematic content online, including political satire, fabrication, manipulation, propaganda, and advertising (Tandoc, Lim, & Ling, 2018; Wardle, 2017). The European Commission, for example, has described a spectrum of problematic content, from low-risk forms such as honest mistakes made by "reporters…to high risk forms such as foreign states or domestic groups that would try to undermine the political process" (European Commission, 2018). And even when the term "fake news" is simply used to describe news and information that is factually inaccurate, the binary distinction between what is true and what is false has been criticised for not adequately capturing the complexity of the kinds of information being shared and consumed in today's digital media environment (Wardle & Derakhshan, 2017).
Beyond the ambiguities surrounding the vocabulary of "fake news", there is growing concern that the term has begun to be appropriated by politicians to restrict freedom of the press. A wide range of political actors have used the term "fake news" to discredit, attack, and delegitimise political opponents and mainstream media (Farkas & Schou, 2018). Certainly, Donald Trump has (in)famously used the term "fake news" to "deflect" criticism and to erode the credibility of established media and journalist organisations (Lakoff, 2018). And many authoritarian regimes have followed suit, adopting the term into a common lexicon to legitimise further censorship and restrictions on media within their own borders. Given that most citizens perceive "fake news" to define "partisan debate and poor journalism", rather than a discursive tool to undermine trust and legitimacy in media institutions, there is general scholarly consensus that the term is highly problematic (Nielsen & Graves, 2017).
Rather than chasing a definition of what has come to be known as "fake news", researchers at the Oxford Internet Institute have produced a grounded typology of what users actually share on social media. Drawing on Twitter and Facebook data from elections in Europe and North America, they developed this typology of online political communication and identified a growing prevalence of "junk news" domains, which publish a variety of hyper-partisan, conspiracy theory, or click-bait content designed to look like real news about politics.
During the 2016 presidential election in the United States, social media users on Twitter shared as much "junk news" as professionally produced news about politics (Howard, Bolsover, Kollanyi, Bradshaw, & Neudert, 2017). And voters in swing states tended to share more junk news than their counterparts in uncontested ones. In countries throughout Europe - in France, Germany, the United Kingdom, and Sweden - junk news inflamed political debates around immigration and amplified populist voices across the continent (Desiguad, Howard, Kollanyi, & Bradshaw, 2017; Kaminska, Galacher, Kollanyi, Yasseri, & Howard, 2017). According to researchers on the Computational Propaganda Project, junk news is defined as having at least three out of five elements: (1) professionalism, where sources do not employ the standards and best practices of professional journalism, including information about real authors, editors, and owners; (2) style, where emotionally driven language, ad hominem attacks, mobilising memes, and misleading headlines are used; (3) credibility, where sources rely on false information or conspiracy theories, and do not post corrections; (4) bias, where sources are highly biased, ideologically skewed, and publish opinion pieces as news; and (5) counterfeit, where sources mimic established news reporting, including fonts, branding, and content strategies.
In a complex ecosystem of political news and information, junk news provides a useful point of analysis because rather than focusing on individual stories that may contain honest mistakes, it examines the domain as a whole and looks for various elements of deception, which underscores the definition of disinformation. The concept of junk news is also not tied to a particular producer of disinformation, such as foreign operatives, hyper-partisan media, or hate groups, who, despite their diverse goals, deploy the same strategies to generate discoverability. Given that the literature on disinformation is often siloed around one particular actor, does not cross platforms, nor integrate a variety of media sources (Tucker et al., 2018), the junk news framework can be useful for taking a broader look at the ecosystem as a whole and the digital techniques producers use to game search engine algorithms. Throughout this paper, I use the term "junk news" to describe the wide range of politically and economically motivated disinformation being shared about politics.

THE LOGIC AND POLITICS OF SEARCH
Search engines play a fundamental role in the modern information environment by sorting, organising, and making visible content on the internet. Before the search engine, anyone who wished to find content online would have to navigate "cluttered portals, garish ads and spam galore" (Pasquale, 2015). This didn't matter in the early days of the web when it remained small and easy to navigate. During this time, web directories were built and maintained by humans who often categorised pages according to their characteristics (Metaxas, 2010). By the mid-1990s it became clear that the human classification system would not be able to scale. The search engine "brought order to chaos by offering a clean and seamless interface to deliver content to users" (Hoffman, Taylor, & Bradshaw, 2019).
Simplistically speaking, search engines work by crawling the web to gather information about online webpages. Data about the words on a webpage, links, images, videos, or the pages they link to are organised into an index by an algorithm, analogous to an index found at the end of a book. When a user types a query into Google Search, machine learning algorithms apply complex statistical models in order to deliver the most "relevant" and "important" information to a user (Gillespie, 2012). These models are based on a combination of "signals" including the words used in a specific query, the relevance and usability of webpages, the expertise of sources, and other information about context, such as a user's geographic location and settings (Google, 2019).
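The crawl-and-index model described above can be illustrated with a toy sketch. The pages, URLs, and queries below are invented for illustration, and real search engines layer many ranking signals (relevance, usability, expertise, context) on top of this basic retrieval step.

```python
from collections import defaultdict

# Toy corpus standing in for crawled webpages (hypothetical URLs).
pages = {
    "example.com/jeans": "buy quality jeans and denim trousers online",
    "example.com/news": "election news and analysis about politics",
    "example.com/pants": "trousers pants and jeans for every season",
}

# Build an inverted index: each word maps to the set of pages containing
# it, analogous to the index found at the end of a book.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

def search(query):
    """Return pages containing every query term (no ranking signals here)."""
    results = set(pages)
    for term in query.lower().split():
        results &= index.get(term, set())
    return sorted(results)

print(search("jeans trousers"))
```

A real engine would then score these candidate pages against ranking signals before presenting them; this sketch stops at the retrieval stage.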
Google's search rankings are also influenced by AdWords, which allow individuals or companies to promote their websites by purchasing "paid placement" for specific keyword searches. Paid placement is conducted through a bidding system, where rankings and the number of times the advertisement is displayed are prioritised by the amount of money spent by the advertiser. For example, a company that sells jeans might purchase AdWords for keywords such as "jeans", "pants", or "trousers", so when an individual queries Google using these terms, a "sponsored post" will be placed at the top of the search results. 2 AdWords also make use of personalisation, which allow advertisers to target more granular audiences based on factors such as age, gender, and location. Thus, a local company selling jeans for women can specify local female audiences -individuals who are more likely to purchase their products.
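The keyword bidding logic can be sketched roughly as follows. This is a deliberate simplification that ranks purely by bid amount, whereas Google's actual auction also weighs ad quality and other signals; the advertiser names and bid values are invented for illustration.

```python
# Hypothetical advertisers bidding on the keyword "jeans" (amounts in USD).
bids = {
    "denim-world.example": 2.50,
    "local-jeans.example": 1.75,
    "mega-retail.example": 3.10,
}

def rank_ads(bids, slots=2):
    """Order advertisers by bid and fill the available sponsored slots.
    Real ad auctions also factor in ad quality, not just the bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    return [advertiser for advertiser, _ in ranked[:slots]]

print(rank_ads(bids))
```

Under this model, the highest bidder wins the top sponsored slot; personalisation would further filter which users are eligible to see each advertisement at all.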
The way in which Google structures, organises, and presents information and advertisements to users is important because these technical and policy decisions embed a wide range of political issues (Granka, 2010; Introna & Nissenbaum, 2000; Vaidhyanathan, 2011). Several public and academic investigations auditing Google's algorithms have documented various examples of bias in Search or problems with the autocomplete function (Cadwalladr, 2016a; Pasquale, 2015).
Biases inherently designed into algorithms have been shown to disproportionately marginalise minority communities, women, and the poor (Noble, 2018). Metaxa-Kakavouli and Torres-Echeverry (2017) suggest that the low levels of "fake news" on Search are the result of Google's "long history" of combatting spammers on its platform. Another research paper by Golebiewski and boyd (2018) looks at how gaps in search engine results lead to strategic "data voids" that optimisers exploit to amplify their content. They argue that there are many search terms for which data is "limited, non-existent or deeply problematic". Although these searches are rare, if a user types such terms into a search engine, "it might not give a user what they are looking for because of limited data and/or limited lessons learned through previous searches" (Golebiewski & boyd, 2018).
The existence of biases, disinformation, or gaps in authoritative information on Google Search matters because Google directly impacts what people consume as news and information. Most of the time, people do not look past the top ten results returned by the search engine (Metaxas, 2010). Indeed, eye-tracking experiments have demonstrated that the order in which Google results are presented to users matters more than the actual relevance of the page abstracts (Pan et al., 2007). However, it is important to note that the logic of higher placements does not necessarily translate to search engine advertising listings, where users are less likely to click on advertisements if they are familiar with the brand or product they are searching for (Narayanan & Kalyanam, 2015).
Nevertheless, the significance of a top ten placement has given rise to the SEO industry, whereby optimisers use digital keyword strategies to move webpages higher in Google's rankings and thereby generate higher traffic flows. There is a long history of SEO dating back to the 1990s, when the first search engine algorithms emerged (Metaxas, 2010). One recent study (2019) found that despite more than 125 announcements over a three-year period, the algorithmic changes made by the platforms did not significantly alter digital marketing strategies.
This paper hopes to contribute to the growing body of work examining the effect of Search on the spread of disinformation and junk news by empirically analysing the strategies -paid and optimised -employed by junk news domains. By performing an audit of the keywords junk news websites use to generate discoverability, this paper evaluates the effectiveness of Google in combatting the spread of disinformation on Search.

METHODOLOGY CONCEPTUAL FRAMEWORK: THE TECHNO-COMMERCIAL INFRASTRUCTURE OF JUNK NEWS
The starting place for this inquiry into the SEO infrastructure of junk news domains is grounded conceptually in the field of science and technology studies (STS), which provides a rich literature on how infrastructure design, implementation, and use embeds politics (Winner, 1980). Digital infrastructure - such as physical hardware, cables, virtual protocols, and code - operates invisibly in the background, which can make it difficult to trace the politics embedded in technical coding and design (Star & Ruhleder, 1994). As a result, calls to study internet infrastructure have engendered digital research methods that shed light on the less visible areas of technology. One growing and relevant body of research has focused on the infrastructure of social media platforms and the algorithms and advertising infrastructure that invisibly operate to amplify or spread junk news to users, or to micro-target political advertisements (Kim et al., 2018; Tambini, Anstead, & Magalhães, 2017). Certainly, the affordances of technology - both real and imagined - mutually shape social media algorithms and their potential for manipulation (Nagy & Neff, 2015; Neff & Nagy, 2016). However, the proprietary nature of platform architecture has made it difficult to operationalise studies in this field. Because junk news domains operate in a digital ecosystem built on search engine optimisation, page ranks, and advertising, there is an opportunity to analyse the infrastructure that supports the discoverability of junk news content, which could provide insights into how producers reach audiences, grow visibility, and generate domain value.

JUNK NEWS DATA SET
The junk news data set for this analysis consists of the 29 junk news domains listed in Appendix 1.

JUNK NEWS KEYWORD OPTIMISATION STRATEGIES
In order to assess the keyword optimisation strategies used by junk news websites, I worked with SpyFu, which provided historical keyword data for the 29 junk news domains whenever those keywords made it into the top 50 results in Google between January 2016 and March 2019. In total, there were 88,662 unique keywords in the data set. Given the importance of placement on Google, I looked specifically at keywords that indexed junk news websites in the first - and most authoritative - position. Junk news domains had different aptitudes for generating placement in the first position (see Table 1). Different keywords also generated different kinds of placement over the 39-month period. Table 2 (see Appendix) provides a sample list of up to ten keywords from each junk news domain in the sample when the keyword reached the first position.
First, many junk news domains appear in the first position on Google Search as a result of "navigational searches", whereby a user enters a query with the intent of finding a specific website. A search for a specific brand of junk news could happen naturally for many users, since the Google Search function is built into the address bar in Chrome, and is sometimes set as the default search engine for other browsers. In particular, terms like "infowars", "breitbart", "cnsnews", and "rawstory" were navigational keywords users typed into Google Search. The performance of brand searches over time consistently places junk news webpages in the number one position (see Figure 3: Brand-related keywords over time). This suggests that brand recognition plays an important role in driving traffic to junk news domains.
Second, many keywords that made it to the top position in Google Search results are what Golebiewski and boyd (2018) would call terms that filled "data voids", or gaps in search engine queries where there is limited authoritative information about a particular issue. These keywords tended to focus on conspiratorial information, especially around President Barack Obama ("Obama homosexual" or "stop Barack Obama"), gun rights ("gun control myths"), pro-life narratives ("anti-abortion quotes" or "fetus after abortion"), and xenophobic or racist content ("against Islam" or "Mexicans suck"). Unlike brand-related keywords, problematic search terms do not achieve a consistently high placement on Google Search over the 39-month period. Keywords that ranked number one for more than 30 weeks include: "vz58 vs. ak47", "feminizing uranium", "successful people with down syndrome", "google ddrive", and "westboro[sic] Baptist church tires slashed". This suggests that, for the most part, data voids are either being filled by more authoritative sources, or Google Search has been able to demote websites attempting to generate pseudo-organic engagement via SEO.
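The contrast drawn above - brand keywords holding the top position consistently, while data-void keywords surface only briefly - can be sketched with a small script. The rank observations below are invented stand-ins for the SpyFu-style weekly rank data described in this section.

```python
from collections import Counter

# Invented (keyword, rank) observations, one per week, mimicking
# SpyFu-style historical rank data for a junk news domain.
observations = [
    ("breitbart.com", 1), ("breitbart.com", 1), ("breitbart.com", 1),
    ("obama illegal alien", 1), ("obama illegal alien", 4),
    ("gun control myths", 2), ("gun control myths", 1),
]

# Count how many observed weeks each keyword spent in position one.
weeks_at_top = Counter(kw for kw, rank in observations if rank == 1)

# A navigational brand keyword holds the top slot across all weeks,
# while data-void keywords appear there only intermittently.
print(weeks_at_top.most_common())
```

Applied to the full 88,662-keyword data set, the same tally would separate the consistently top-ranked navigational terms from the short-lived data-void terms.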

THE PERFORMANCE OF JUNK NEWS DOMAINS ON GOOGLE SEARCH
After analysing which keywords are used to get junk news websites into the number one position, the second half of my analysis looks at larger trends in SEO strategies over time. What is the relationship between organic clicks and the value of a junk news website? How has the effectiveness of SEO keywords changed over the 39-month period? And have changes made by Google to combat the spread of junk news on Search had an impact on its discoverability?

JUNK NEWS, ORGANIC CLICKS, AND THE VALUE OF THE DOMAIN
There is a close relationship between the number of clicks a domain receives and the estimated value of that domain. Comparing Figures 4 and 5 shows that the more clicks a website receives, the higher its estimated value. A domain is generally considered more valuable when it generates large amounts of traffic, because advertisers see high-traffic websites as an opportunity to reach more people. Thus, the higher the value of a domain, the more likely it is to generate revenue for the operator. The median estimated value of the top 29 most popular junk news domains was $5,160 USD. The case of Infowars is illustrative: after major platforms banned Infowars, the domain has not been able to regain its clicks nor its value since. This demonstrates the powerful role platforms play in not only making content visible to users, but also controlling who can grow their website value - and ultimately generate revenue - from the content they produce and share online.
The discoverability of junk news domains on Search declined abruptly after August 2017 (see Figure 7). In fact, after August 2017 there has been a gradual increase in the organic results of mainstream news media. After almost a year, the top-performing junk news websites have regained some of their organic results, but the levels are not nearly as high as they were leading up to and following the 2016 presidential election. This demonstrates the power of Google's algorithmic changes in limiting the discoverability of junk news on Search. But it also shows how junk news producers learn to adapt their strategies in order to extend the visibility of their content. In order to be effective at limiting the visibility of bad information via search, Google must continue to monitor the keywords and optimisation strategies these domains deploy - especially in the lead-up to elections, when more people will naturally be searching for news and information about politics.

CONCLUSION
In conclusion, the spread of junk news on the internet and its impact on democracy has become a growing field of academic inquiry. This paper has looked at a small subset of this phenomenon, in particular the role of Google Search in assisting the discoverability and monetisation of junk news domains. By looking at the techno-commercial infrastructure that junk news producers use to optimise their websites for paid and pseudo-organic clicks, I found:
1. Junk news domains do not rely on Google advertisements to grow their audiences and instead focus their efforts on optimisation and keyword strategies;
2. Navigational searches drive the most traffic to junk news websites, and data voids are used to grow the discoverability of junk news content to mostly small, but varying degrees;
3. Many junk news producers place advertisements on their websites and grow their value, particularly around important political events.
The variety of bad actors exploiting technology to influence political outcomes has also led to the manipulation of Search. Google's response to the optimisation strategies used by junk news domains has had a positive effect on limiting the discoverability of these domains over time.
However, the findings of this paper also show an upward trend, as junk news producers find new ways to optimise their content for higher search rankings. This game of cat and mouse is one that will continue for the foreseeable future.
While it is hard to reduce the visibility of junk news domains when individuals actively search for them, more can be done to limit the ways in which bad actors might try to optimise content to generate pseudo-organic engagement, especially around disinformation. Google can certainly do more to tweak its algorithms in order to demote known disinformation sources, as well as identify and limit the discoverability of content seeking to exploit data voids. However, there is no straightforward technical patch that Google can implement to stop various actors from trying to game their systems. By co-opting the technical infrastructure and policies that enable search, the producers of junk news are able to spread disinformation -albeit to small audiences who might use obscure search terms to learn about a particular topic.
There have also been growing pressures on regulators to take steps that force social media platforms to take greater action to limit the spread of disinformation online. But the findings of this paper hold two important lessons for policymakers. First, the disinformation problem - through both optimisation and advertising - on Google Search is not as dramatic as it is sometimes portrayed. Most of the traffic to junk news websites comes from users performing navigational searches to find specific, well-known brands. Only a limited number of placements - as well as clicks - to junk news domains come from pseudo-organic engagement generated by data voids and other problematic keyword searches. Thus, requiring Google to take a heavy-handed approach to content moderation could do more harm than good, and might not reflect the severity of the problem. Second, the reasons why disinformation spreads on Google are reflective of deeper systemic problems within democracies: growing levels of polarisation and distrust in the mainstream media are pushing citizens to fringe and highly partisan sources of news and information. Any solution to the spread of disinformation on Google Search will require thinking about media and digital literacy, and programmes to strengthen, support, and sustain professional journalism.

APPENDIX 1
Junk news seed list (29 domains)

APPENDIX 2
Table 2: A sample list of up to ten keywords from each junk news domain in the sample when the keyword reached the first position.

100percentfedup.com: gruesome videos (6); snopes exposed (5); gruesome video (4)
dailywire.com: states bankrupt (22); ms 13 portland oregon (15); the gadsen flag (12)
theblacksphere.net: black sphere (28); dwayne johnson gay (10); george soros private security (1)