An unanticipated consequence of the rise of digital publishing has been the centralization of authority over who may publish and what works may be discovered.

Early proponents of the power of digital publishing celebrated the ways in which the Internet, and in particular the world wide web, democratized both access to information and the ability to disseminate knowledge to wide audiences. News organizations might evade government controls of the press by publishing on servers outside their nations’ borders. Dissidents could organize in the digital public sphere, evading controls that prevented freedom of assembly in the physical world. Scholars could disseminate work in progress directly to the web either outside of the process of peer review or under the aegis of new types of online journals.

It’s possible this utopian vision reigned for at least the early years of the consumer web, when independent online publishing was common. It’s also arguable that this has always been a fantasy, and that chokepoints like the domain name system and large internet service providers have always had the power to control speech. But since 2010, publishing online has centralized on a few commercial platforms, notably Amazon Web Services, which provides hosting and backend services for over a million different websites, including those for publishers like Netflix, Instagram, and GitHub; Facebook, which hosts content produced by over 1.7 billion people; Google, whose YouTube service hosts a significant portion of the web’s video content; as well as smaller players like WordPress and Wikimedia.

These platforms have immense power over what speech is possible, and their decisions are opaque and not subject to external review. When Facebook decided to prohibit Nick Ut’s Pulitzer-winning photo of Kim Phúc running from a napalm attack, public shaming was the only option Norwegian newspaper Aftenposten found for appealing the decision. Publishing platforms face intense pressure from governments to block controversial content, from Israeli government pressure to remove critical content from Facebook to the “blockade” against Wikileaks that caused the leaks organization to lose web hosting, domain name services, and services to accept donations. [1: Sauter, The Coming Swarm: DDOS Actions, Hacktivism, and Civil Disobedience on the Internet, Bloomsbury, 2014, Intro.] Far more subtle forms of content control happen every day, such as organized campaigns to “flag” and demand removal of content that a group of coordinated individuals finds offensive, [2: Crawford and Gillespie, “What is a Flag For?”, New Media and Society, 2014.] which range from Buddhist Burmese flagging pro-Rohingya content to Palestinian and Israeli activists attempting to silence each other.

Advocates like Brewster Kahle have begun considering new architectural approaches to decentralizing online publishing, in hopes of eliminating the points of control this centralization has engendered. These approaches, mostly speculative, seek to avoid chokepoints by distributing files over hundreds of machines, much as networks like BitTorrent have kept copyrighted material online in the face of legal challenges by distributing a flood of copies. The most ambitious of these systems use the public ledger mechanism of Bitcoin, the “blockchain”, to register ownership of these works and retain the ability to modify them in the future. While these proposals promise to “lock the web open” architecturally, they are complex and may turn out to be technically infeasible. But they are simple compared to the far harder challenge of decentralizing systems for content discovery.
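The idea of keeping a work online by scattering copies across many machines can be sketched in a few lines. This is only an illustration under stated assumptions, not a description of any particular proposal: the `Node` class, the `publish` and `fetch` functions, and the choice of three copies are all hypothetical, and the file is named by a SHA-256 hash of its contents so that any copy can be checked for tampering.

```python
import hashlib


class Node:
    """One hypothetical machine holding copies of published files."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.store = {}  # content address -> bytes


def publish(content, nodes, copies=3):
    """Store `content` on several nodes and return its content address."""
    address = hashlib.sha256(content).hexdigest()
    # Choose nodes deterministically from the address so the sketch is repeatable.
    start = int(address, 16) % len(nodes)
    for i in range(min(copies, len(nodes))):
        nodes[(start + i) % len(nodes)].store[address] = content
    return address


def fetch(address, nodes):
    """Return the content from any reachable node that holds an unaltered copy."""
    for node in nodes:
        data = node.store.get(address) if node.online else None
        # Re-hash the copy to confirm it still matches its address.
        if data is not None and hashlib.sha256(data).hexdigest() == address:
            return data
    return None


nodes = [Node(f"node-{i}") for i in range(5)]
address = publish(b"a controversial essay", nodes)
holders = [n for n in nodes if address in n.store]
holders[0].online = False  # one host is taken down
assert fetch(address, nodes) == b"a controversial essay"
```

Taking one host offline does not remove the work; only silencing every holder does, which is the property these architectures hope to achieve at much larger scale.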

The centralization of web publishing is a fairly recent phenomenon; the centralization of discovery systems happened very early in the life of the web. Yahoo! rose to prominence by providing a rich directory to web content in 1994, and Google became the Internet’s dominant company with a search engine that was a vast improvement over existing technologies. Google’s position in this field has been largely unchallenged since 2000, despite ambitious projects from Microsoft (Bing) and an EU-backed consortium (Quaero). The Wikimedia Foundation, led by Lila Tretikov, began an ambitious project to build an alternative search platform called the Knowledge Engine, which led to a community rebellion and to Tretikov’s ouster as Wikimedia executive director.

While the path toward alternatives to Google’s search is lined with the bodies of those who’ve come before, it’s a well-trodden trail precisely because the destination is so important. Google holds immense power over what content is seen and what is hidden, and the company’s decisions are opaque, without appeal, and strike without warning; just ask any business that saw a sharp decline in traffic when Google tweaked its search algorithms. Examples abound of Google’s power to discriminate through indexing of information, and the results merit close study, even if the discrimination is entirely unconscious on the part of the engineers or a reflection of larger societal biases that emerge when collected by Google’s tools. [3: See Sweeney, “Discrimination in Online Ad Delivery,” 2013, which found that searches for distinctively African-American names were more likely to be associated with ads for arrest records than searches for distinctively Caucasian names. It is unlikely that Google engineers consciously chose to target these ads to African-American names. More likely, ads for arrest records were clicked more often on searches for African-American names, which led Google AdSense to serve more of these ads.]

The Microsoft, EU, and Wikimedia projects have all attempted to decrease Google’s power by providing alternative search infrastructure. Wikimedia’s proposed project went a step further toward openness by proposing that search algorithms would be open, allowing community members to propose best matches for searches much as they contribute content to Wikipedia articles. This would have been extremely difficult to accomplish, as a vast “search engine optimization” industry exists to discover how search engines prioritize results and alter webpages to make their discovery more likely. Google’s intense secrecy over its search algorithms is not just a matter of competitive advantage but also an understandable step in ensuring its product remains usable.

More difficult than building a successful competitor to Google (a task no technology company or government research effort has accomplished) would be building a truly decentralized search system. Google, Bing, or any other existing search engine controls both what information is added to its catalog and the algorithms that decide what results are most relevant. Truly decentralized search would eliminate both these points of control, allowing users to select catalogs and algorithms that include or exclude positions, preferences and points of view. At present, no high visibility projects are attempting this complex task. It is a challenge even to imagine what such a system might look like.
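One way to picture the two points of control is as two arguments the user supplies rather than the search provider. The sketch below is a toy under loud assumptions: the catalogs, the `search` function, the `by_links` and `by_recency` rankers, and the example URLs are all invented for illustration, and nothing here addresses the hard problems of scale, spam, or trust that a real system would face.

```python
def search(query, catalogs, rank):
    """Search user-chosen catalogs and order hits with a user-chosen ranker."""
    terms = query.lower().split()
    hits = {}
    for catalog in catalogs:
        for doc in catalog:
            # A document matches when every query term appears in its text.
            if all(term in doc["text"].lower() for term in terms):
                hits[doc["url"]] = doc  # de-duplicate across catalogs
    ordered = sorted(hits.values(), key=rank, reverse=True)
    return [doc["url"] for doc in ordered]


def by_links(doc):
    """Hypothetical ranker: favor documents with more inbound links."""
    return doc["inbound_links"]


def by_recency(doc):
    """Hypothetical ranker: favor newer documents."""
    return doc["year"]


press = {"url": "a.example/press", "text": "Open access publishing",
         "inbound_links": 40, "year": 2012}
blog = {"url": "b.example/blog", "text": "Notes on open access",
        "inbound_links": 5, "year": 2016}

library_catalog = [press]           # a catalog one community maintains
community_catalog = [blog, press]   # an overlapping catalog kept by another

popular = search("open access", [library_catalog, community_catalog], by_links)
recent = search("open access", [library_catalog, community_catalog], by_recency)
narrow = search("open access", [library_catalog], by_links)
```

The same query yields a different first result depending on which ranker the user picks, and a different set of results depending on which catalogs are consulted, which is exactly the choice that centralized engines make on the user's behalf.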

The centralization of discovery systems matters because the ability to publish without the ability to be discovered is an empty promise. Before the rise of digital publishing, the ability to circulate your words was a scarce commodity. Now that circulation is cheap, the scarce commodity is attention. As Herbert Simon predicted in 1971, “…in an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.” [4: Simon, H. A. (1971), “Designing Organizations for an Information-Rich World,” in Martin Greenberger (ed.), Computers, Communication, and the Public Interest, Baltimore, MD: The Johns Hopkins Press, ISBN 0-8018-1135-X.] In a world of scarce attention, those who control curation and discovery systems control what we encounter and what we know.

Google has quietly emerged as a powerful gatekeeper for scholarly communication with Google Scholar, an alternative to Elsevier’s Scopus and Thomson Reuters’s Web of Science. While all three databases use measures of total publications, citations, and h-index to determine what publications are selected as results for a particular search, in each case, the actual algorithm and rules for catalog inclusion are opaque and beyond appeal. Google Scholar seems to follow the loose rule that “if academic literature cites it, it’s academic literature,” a recursive definition that has allowed the site to bootstrap a collection that includes journals, academic and popular press books, newspaper and magazine articles, and blog posts. But it is impossible to audit the rules Google uses for inclusion in its catalog, or to question its treatment of closed access versus open access journals, or scholarship in the public sphere versus in dedicated academic spaces. [5: Christian Sandvig’s forthcoming “Glass Boxing—Diagnosing and Responding to Algorithmic Harms” may provide a template for how systems like this might be best interrogated.]
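For readers unfamiliar with the h-index mentioned above, the measure itself is public and simple even though the databases' use of it is not. The sketch below computes the standard definition (the largest h such that h publications each have at least h citations); the function name and the citation counts are made up for illustration, and it says nothing about how any of the three databases actually weighs the figure.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position  # the top `position` papers all have >= `position` citations
        else:
            break
    return h


# Two hypothetical scholars with five papers each.
steady = h_index([10, 8, 5, 4, 3])    # 4: four papers with at least 4 citations
one_hit = h_index([25, 8, 5, 3, 3])   # 3: one heavily cited paper does not raise h
```

Because the measure depends entirely on which citing documents a catalog chooses to count, opaque inclusion rules translate directly into opaque scores.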

As we consider the transformations in the production, publication, and archiving of social research under digital conditions, it is essential that we understand that scholarly publishing and discovery, a space traditionally controlled by university presses and scholarly peers, is now centralizing around a small number of technically sophisticated commercial firms. The good news in this development is that we have the opportunity to make collective cause with those seeking to ensure online publishing and discovery systems are transparent, fair, auditable, and distributed. The bad news is that we find ourselves joining a profoundly uphill battle, where many of our goals are merely infeasible and others may be technically impossible.
