When internet interconnection trouble occurs, immediate coordination kicks in

Disclaimer: The author is the recipient of a grant from RIPE NCC’s academic cooperation initiative.

For the majority of people in developed countries, the Internet is invisible most of the time. A socket in the wall, a cell site atop a building, a WiFi password written on a restaurant menu – only rarely are we reminded of the fact that Internet connectivity is not just there like a natural resource. It has become ambient. But for some people there is another side to it. As ethnographer Susan Leigh Star wrote in 1999, referring to the relational character of infrastructures: “One person’s infrastructure is another’s topic, or difficulty”. In this text, I discuss a group of professionals for whom Internet connectivity is a topic and, sometimes, a difficulty: the network engineers and peering coordinators (in short: networkers) around the world who make the Internet work.

The size of this group can only be estimated roughly. The Internet today has grown to close to 60,000 networks (called “autonomous systems” or “AS” in technical parlance). Yet the majority of these networks do not manage their Internet connectivity actively; they purchase it from a larger transit network. Like a leaf that hangs off a tree, they connect to the Internet with one stem. These networks do not play an active role in shaping the routing system; hence, for the purposes of this observation, they can be excluded. However, an estimated fifth of all Internet networks are relevant. They interconnect with two or more Internet exchanges, which means that they manage their interconnections actively. At each of these networks there is at least one networker whose job involves determining which interconnection is used for what traffic, configuring the interfaces and maintaining what I suggest calling an infrastructural relationship with the neighbouring networks. So as a lower-bound estimate (one fifth of roughly 60,000 networks, with at least one networker each), there are at least 12,000 networkers who take care of Internet connectivity globally.

How do networkers ensure Internet connectivity?

These networkers jointly manufacture Internet connectivity. But how do they do it practically? Due to the decentralised character of the Internet, there is no hierarchical or multilateral organisation, no global institution to impose rules upon the network operators. Organisations such as the Regional Internet Registries, Internet exchanges or standards-developing organisations support what many refer to as a community on a meso-level. But they have no say in how networkers ultimately operate their networks. Is Internet connectivity thus the aggregated result of atomised, unrelated, individual actions, as the market metaphor would suggest? Is it contingent? Or what forms of coordination can we find between networkers? Existing research already highlights the role of trust, distrust or reputation between networkers. But it does not yet grasp the global dimension.

In the following, I will argue that at the core of the Internet we can find an interaction order among networkers that is ongoing, immediate and global in scope. This interaction order is laid out in the routing system; it is enacted by the networkers and complemented by the use of channels for real-time communication. This form of coordination among networkers plays a critical role in the maintenance of Internet connectivity.

Theoretical backdrop: Scopic media expand face-to-face situations

As a backdrop for this argument I am drawing on the work of economic sociologist Karin Knorr-Cetina. Having studied traders on financial markets, Knorr-Cetina noticed that these professionals make up a “genuinely global form” by which she means “fields of practice that stretch across all time zones” (2014). Traders are experts who work almost autonomously. She found that the central banking system facilitates and shapes direct interactions between them, although the trading desks are geographically distributed. It serves as what she calls “scopic media”. Knorr-Cetina uses the term scope to refer to a reflexive mechanism of both observation and projection. Traders both observe and act upon this central system, and they are kept in check by it, wherever they are. She suggests that scopic media bring together distributed phenomena or actors. They move the boundary between situations and systems, expand local situations geographically, and they transform face-to-face situations into synthetic situations.

In a limited way, networkers can be compared to traders. Both professional groups consist of highly specialised experts in a field that is characterised by complexity. Because networking requires expert knowledge and experience, it is not uncommon for networkers to act more or less autonomously within their organisations (less so in large firms) – simply because their superiors may not be able to comprehend the details. The reverse side of their specialisation is that many lack counterparts at their workplaces with whom they can discuss arising issues. This is one of the reasons why networkers are inclined to turn to fellow professionals outside of their organisation for advice or a chat, even if those colleagues are competitors on a company level.

Networkers are united by the routing system

Thinking along the lines of Knorr-Cetina’s concept of scopic media, let’s look at what promotes the common concern shared by networkers around the globe. Networkers are virtually united by a continuous focus on the routing system. Although each networker has his or her unique perspective, the routing system is of interest to everyone. This system emerges from the path availability information that networks announce to their neighbouring networks by means of the Border Gateway Protocol (BGP).

Every network depends on this resource, which is produced collaboratively. But, as any networker knows, BGP comes with uncertainties. One of the well-known challenges is that, even today, networkers cannot generally know if the route information that they receive from their counterparts is correct. There is no central authority either. This uncertainty, in combination with further interdependencies between the networks, can lead to critical situations at any moment, so a common focus is necessary. “One movement from your partner can ruin your network for hours while you understand what’s wrong”, explains one networker.

“Think of Houston Mission Control”

As a consequence, networkers have monitoring systems in place. Real-time alerts draw their attention to irregularities at their interconnections. Sometimes scripts also trigger automated responses, such as the re-routing of traffic. In small networks, networkers will “watch the edge routers all the time”. Larger networks have so-called network operations centers (NOCs). “Think of Houston Mission Control”, explains one high-level engineer, “our people work in shifts to watch over our network 24 hours a day.” The monitoring systems produce graphs with live views of the network’s interconnections. “You understand when something goes wrong almost immediately”, comments another networker. A key qualification for networkers is to be able to “stay on screen for hours”, adds a third. Because it absorbs the networker’s attention while reflecting his or her inputs, the routing system can be seen as a “centering and mediating device” in accordance with Knorr-Cetina’s concept of scopic media.

Looking at it more closely, networkers receive feedback about quality parameters at their interconnection points through monitoring tools. The tools deliver information about interface utilisation, congestion, the latency of important connections, packet loss or the volume of traffic that they are exchanging with other networks. Some tools also highlight when abnormalities in traffic patterns occur or when traffic flows shift from one interconnection to another. However, while monitoring tools do alert the networkers to problems, they cannot explain them. They indicate that something may be wrong, but they do not identify the issue. An edge router may be set up to alert its operator that it is receiving an unusually large number of prefixes (blocks of IP addresses) from one interconnection partner, but it will not be able to determine why that is. This is an issue because the causes of irregularities can be manifold and networkers have to react to them, often quickly.
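To make the limits of such an alert concrete, here is a minimal sketch in Python. It is not how any particular router or monitoring product works; the function name, the baseline and the tolerance value are all assumptions chosen for illustration, and AS64500 is a number reserved for documentation. The point is simply that the check can say that a count is unusual, but has nothing to say about why.

```python
from typing import Optional


def check_prefix_count(peer_asn: int, received: int, baseline: int,
                       tolerance: float = 0.5) -> Optional[str]:
    """Return an alert text if a peer sends far more prefixes than usual.

    Hypothetical helper: 'baseline' is the number of prefixes normally
    received from this peer, 'tolerance' the accepted relative excess.
    """
    limit = baseline * (1 + tolerance)
    if received > limit:
        # The alert states what was observed. It cannot tell a route leak
        # from a merger that simply gave the partner more customers.
        return (f"AS{peer_asn}: received {received} prefixes, "
                f"expected about {baseline}")
    return None


# Illustrative values only.
print(check_prefix_count(64500, 120, 100))  # within tolerance, no alert
print(check_prefix_count(64500, 400, 100))  # alert text, cause unknown
```

Everything after the alert, deciding whether this is a misconfiguration or a harmless change, is left to the networker.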

Human expertise required

For Internet users it may be difficult to imagine the lack of certainty in Internet networking. So here are some examples of situations that networkers have to decipher: A jump in the number of prefixes received from an interconnection partner could indicate a problematic route leak (a misconfiguration), but it could also mean that the interconnection partner’s network has undergone a corporate merger and has more customers now. Traffic flows between two networks may shift from one interconnection point to another and cause congestion for various reasons: there may be hardware problems, another network may have shut down the interconnection session to do maintenance, or there may be problems with the physical infrastructure, for example when a tractor has unearthed a fibre cable. A networker may see traffic levels increase at one point because of a media event (think Olympics) or because the interconnecting network has started to peer with a customer of hers. These examples indicate that while the routing system does serve as a kind of nervous system, it lacks comprehensive talk-back functions. The information that a single networker can gather from it is limited and potentially full of unknowns.

Need for cooperation

Networkers help themselves by coordinating with each other directly beyond what can be fitted in BGP messages. When problems appear to be so broad in scope that they cannot be resolved with the networker’s direct interconnection partners or friends alone, one place to turn to for information is the many mailing lists that are maintained by the Internet exchanges, by local Network Operator Groups (NOGs) or by the Regional Internet Registries. Yet these lists are, by definition, limited in scope. What’s more, turning to the right mailing list already requires a preliminary understanding of the problem. And communication via the lists can be slow.

So networkers have found themselves another place to go to, especially when problems arise. They use chat rooms “to stay in touch with the community at large”. The existence of these informal rooms is neither a secret, nor is it widely known outside the networker community. They are technically open but socially closed places for networkers to meet online. Depending on the room, a couple of hundred networkers will be logged in at any time of the day. Even large networks try to listen in on the chat. It is common, too, that networkers set up alerts so that they will get notified when their AS number is mentioned in the chat. Some find the chats to be “a waste of time” because of the high “spam to noise ratio”. Others report having turned their backs on these chat rooms due to other users’ bad manners and misogynist behaviour. Yet others describe the chats as invaluable places for information-sharing and swift coordination. Certainly, any explanation of how networkers manufacture Internet connectivity would be incomplete without mentioning the use of this medium. It would mean omitting an important part of the informal self-governance at the core of the Internet.
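The alerts on AS number mentions can be pictured with a short sketch. This is an assumption about how such a notification might be built, not a description of any chat client; the function name and the sample messages are hypothetical, and AS64500 is again a documentation number.

```python
import re


def mentions_asn(message: str, asn: int) -> bool:
    """Return True if a chat message mentions the given AS number.

    Hypothetical helper: matches 'AS64500', 'as64500' or 'AS 64500',
    but not a longer number such as 'AS645001'.
    """
    pattern = re.compile(rf"\bAS\s?{asn}\b", re.IGNORECASE)
    return pattern.search(message) is not None


# Illustrative messages only.
for line in ["anyone from AS64500 around?", "seeing leaks via AS645001"]:
    if mentions_asn(line, 64500):
        print("notify:", line)
```

A chat client hook would call such a check on every incoming line and raise a notification on a match, so that the networker does not have to follow the whole conversation.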

Useless chatter – until something happens

What most characterises the chat rooms in the eyes of my interviewees is that they allow networkers to communicate quickly and directly with each other without a formal process around it. Networkers do not want company borders or bureaucracy to stand in their way when they are busy getting things done, i.e. fixing technical problems. Chatroom users appreciate the immediate virtual presence of their colleagues. They use the chat room as a diagnostic tool, they address the crowd to find people who share niche problems, e.g. problems with specific hardware configurations. And they also distribute time-critical information that will be of interest to many others, e.g. when there are problems at a large Internet exchange facility. A typical scenario is also that one networker searches for someone from a specific other network, and when the match is made, they switch over to a private conversation.

The dimension I would like to highlight here, however, is the wider capacity for ad-hoc coordination that can emerge from these chat rooms. This happens when things go wrong. Because, as one networker comments, “it is mostly useless chatter – until something happens”. Here are three of the many examples given by networkers for what they do in the chat rooms:

“The day it matters is a day like 9/11. A day like that where something goes wrong. Where you lose a Gigabit of capacity in one day and you say: ‘I have lost half of my capacity to the States. I need help.’ And someone answers: ‘You’re lucky. I have half free. Do you want it for a few days?’ (...) People were giving each other products you would have charged for. People were giving away capacity to make sure it worked.”

In another instance in 2012, a hurricane had caused damage in the Caribbean and the north-eastern part of the U.S. Through RIPE NCC’s distributed network monitoring system “Atlas”, networkers learned where power outages had occurred. In the chat room, they gathered to organise generators and diesel to restore power to facilities that were down.

In one last example, a highly trusted network became the source of a major irritation in the routing system. According to one networker, in 2010 the following happened:

“RIPE NCC sent out a specific BGP update message to their peers and upstreams to test whether BGP sessions could carry a certain large payload. Usually those messages are quite small in size. They wanted to test: can we carry cryptographic signatures in the payload? And those, obviously, usually are larger. So they sent an update that was technically valid and should not have created problems. But it did create massive problems. And because of the particular software defect that was triggered it would ripple out over the entire Internet and continue to be destructive. It was a very unique case. And it was on that chat that within minutes people started, within seconds even, they started talking about ‘Where is this coming from?’, ‘Who is sending this update?’, ‘What is causing this?’ And the matter was resolved quite fast, once people realised that it was the RIPE NCC update. And then they called RIPE NCC and they said: ‘You should turn this off immediately.’ I feel that the chat room was very useful in scenarios like that.”

Future-proofing the Internet

These examples expose a specific trait of the global interaction order: it has the ability to deal with new and unforeseen problems. The Internet is often praised for being open and for allowing permissionless innovation on the application layer. But as we can see, such innovations can pose new challenges to networks, demanding continuous adaptability from those who seek to ensure Internet connectivity. Informal structures foster flexibility and timely problem-solving.

That said, the global interaction order gives rise to questions, some of which are practical and some of which are matters of principle. Practical questions arise with regard to growth. Only two years ago, the Internet consisted of close to 50,000 networks; now there are close to 60,000 already. (How) will new generations of networkers find their way into these structures, learn about informal rules or change them if the Internet continues to grow at this pace? Some networkers are already complaining that the size of the chat rooms has become unworkably large. Others fear that with more users comes increasing publicity, which might bring some of the practices under scrutiny. This leads to the question of principle. If this informal mode of coordination is so important that it should be considered an expression of Internet governance, then it is important to create legitimacy around it. This text is an attempt to start this process.

Networkers will undoubtedly be doing themselves no favours if they give in to the urge to seal off their semi-public virtual societies or retreat to private social networks. This is for two reasons: First, the way the Internet is designed, networks are in this together, and they will be affected by the actions of newcomers in any case. Second, what is the alternative? If networkers lose this means of easy, open and immediate coordination, they will be left with the current, fragmented organisational structures. These structures serve important purposes. They cannot, however, substitute for the flexible means of immediate cooperation needed in a pinch.

References

Knorr Cetina, K. (2014). Scopic media and global coordination: the mediatization of face-to-face encounters. In K. Lundby (Ed.), Mediatization of communication. Berlin: de Gruyter, p. 40.

Star, S. L. (1999). The Ethnography of Infrastructure. American Behavioral Scientist, 43(3), 377-391, p. 380.
