In the wake of the 2016 US presidential election, librarians, civic hackers, technologists, cultural heritage institutions, journalists, and citizen scientists grew increasingly concerned that government data (particularly data related to topics like climate change, gerrymandering and redlining, and other politically charged subjects) might be threatened through censorship or neglect. Initiatives sprang up throughout the country, spearheaded by groups like DataRefuge, the Environmental Data & Governance Initiative (EDGI), and the End of Term Web Archive, to collect, secure, and document government data. The vulnerability of these data underscored the need for quick action on the one hand, and for sustained attention to the mechanisms that support (or fail to support) preservation of and access to government data, information, and records on the other.
Now in its second year, Endangered Data Week (EDW) takes place February 26 through March 2, 2018, at campuses and institutions around the world. The initiative’s goal is to build upon prior work in digital preservation and coordinate annual international events to raise awareness about threatened data. EDW, supported by the Digital Library Federation (DLF), fosters public conversations about data and encourages the development of reusable curricula for engaging technologists, scholars, librarians, archivists, faculty, students, journalists, nonprofits, and citizens on questions relating to the acquisition, manipulation, visualization, use, and politics of public data.
Decentralized, distributed, and open to all, EDW runs a variety of events designed to shed light on continuing or potential threats to data, train people on acquiring and working with difficult datasets, and build a culture of data consciousness and records transparency. In this way, EDW advocates for open data policies at all levels of government, highlights how such policies can benefit the public and policymakers, and supports the sharing of skills and practices for working with data across a range of disciplines and professions. This year, more than 40 EDW events are scheduled across the US and Canada, covering a range of issues, skills, and topics. Attendees will participate in health hackathons, learn to work with local open data, explore the accessibility of government data, take part in public Twitter chats about endangered data, and more.
Threats to public data over the past year
There are many reasons why federal data may become unavailable or difficult to access, including under- or unfunded infrastructure, privatization, and political suppression. Since the beginning of the Trump administration, we’ve seen valid reasons for concern in all three areas.
Data sharing and publication is often an unfunded mandate, and sharing all of the data and information the federal government produces has always been a challenge. Although it has been nearly nine years since the launch of data.gov and nearly five years since the US CIO’s Federal Open Data Policy, the portal only connects users with a small fraction of the federal data appropriate for public use. The past year has seen an increase in austerity measures across nearly all federal agencies. (Brandon Locke’s April 2017 essay “Protect Government Data for Future Historians: Announcing Endangered Data Week” offered one early snapshot of some of these measures.) These budget reductions, alongside an unprecedented number of unfilled positions, pose a serious threat to the ability of agencies to publish open data and maintain the infrastructure needed for dependable access. The use of private contractors and products means that more and more of the data and algorithms used by federal, state, and local governments are inaccessible to the public.[1]
Since the inaugural Endangered Data Week in April 2017, threats to government data and information have continued to grow. Limitations on federal data due to political motivations have probably garnered the most media attention. The removal, suppression, or reframing of data may be done to prevent research that may run counter to the current administration’s policies, or to prevent effective opposition to an administration’s narratives. Below, we highlight a few of the key ways in which these dangers have developed in that time.
The Trump administration’s appointment of more directors from the private sector also signals a move toward paywalls, proprietary data and algorithms, and less transparency. For example, Barry Myers, Trump’s pick to lead the National Oceanic and Atmospheric Administration (NOAA), is currently the CEO of AccuWeather. Myers and AccuWeather have long advocated for limiting publicly available information from the Weather Service, instead allowing companies to buy in to a higher level of information access.
One important mechanism supporting free public access to government information is Title 44 of the US Code, which establishes the Federal Depository Library Program (FDLP) and the online public access portal for the Government Publishing Office (GPO). Title 44 also contains the only legal guarantee that the federal government will provide its public information for free to the general public. Knowledgeable observers were thus understandably concerned when, in June of 2017, Congress announced a new effort to reform Title 44, releasing a draft of the bill in December 2017.[2]
Although most observers agree that the law does need some revision (especially to address the challenges posed by born-digital government information), many have also noted that the current text of the bill leaves a great deal to be desired. According to the team at freegovinfo.info, the bill allows the “GPO to delete online information without providing any principles or guidelines or goals to achieve when it does so.” It also centralizes control over distribution and effectively slashes the GPO’s budget. Others (see posts from the Law and Technology Resources for Legal Professionals blog and former Depository Library Council member Bernadine Abbott Hoduski) have decried the bill’s efforts to open the door to the privatization of government information and its proposed elimination of the Congressional Joint Committee on Printing, among other worrisome features.
Many of the concerns since 2016 regarding access to public data have been focused on climate and environmental data, and, while there haven’t yet been massive removals of data, there have still been extensive efforts to suppress climate and pollution data. However, thanks to the tireless work of EDGI, we can trace the many changes in culture and secrecy as they spread across the EPA and other agencies. According to EDGI’s The First 100 Days and Counting report, links to climate change initiatives and agency objectives have been removed; cuts have been made to data collection, infrastructure, and data usage training; and the EPA has made a notable shift toward “job creation” instead of environmental protection.
The FBI’s October 2017 release of its Crime in the United States report garnered a great deal of press coverage for its significant decrease in data tables. While the FBI contends that this is part of a modernization effort that streamlines the report and makes more data available through its Crime Data Explorer interface, after several months, much of the data remains unavailable to social scientists and nonprofits that rely on the data for research and the allocation and justification of services. Some also have doubts about the FBI’s claims on access to these data. Not only does this lack of access hamper researchers and service providers, it also makes it difficult to challenge or confirm claims from the FBI and the Department of Justice.
Concern has also mounted over the past year about the fate of the 2020 census. In spring of 2017, census director John H. Thompson resigned his post. Subsequently, President Trump signaled his intention to appoint Thomas Brunell, author of the 2008 book Redistricting and Representation: Why Competitive Elections Are Bad for America. Brunell has since stepped aside, but the agency remains leaderless and, according to many civic and legal groups, woefully underfunded.[3] Experts warn that underfunding, along with an inadequate questionnaire distribution plan (relying almost entirely on mail and online access), threatens to exacerbate the census’s historical tendency to undercount vulnerable populations such as immigrants and the homeless. A new Trump administration proposal to add a question about citizenship to the census form compounds these issues.[4] And we may continue to learn about factors threatening a robust 2020 population count if the NAACP, which recently filed suit to compel the Commerce Department to produce records about preparations for the 2020 census, is successful.[5] (See http://www.naacp.org/latest/naacp-sues-u-s-commerce-department-refusal-disclose-records-preparations-2020-census/) Since census data determine congressional redistricting and the distribution of funding for infrastructure and social services, the stakes of ensuring a fair and accurate count in 2020 couldn’t be higher.
As a recent report noted, “undercounting will ultimately deprive historically marginalized communities of vital public and private resources over the next decade.”[6] (See https://www.npr.org/2018/01/10/575145554/adding-citizenship-question-risks-bad-count-for-2020-census-experts-warn)
We would also be remiss not to mention some crucial datasets for research and advocacy that have continued to go uncollected over the past year. In a number of areas, nonprofits and journalists have created datasets because the federal government has failed to provide dependable resources. FBI hate crime numbers are known to be far from complete, but the Southern Poverty Law Center’s hate incidents database and volunteer reporters attempt to fill in some of those gaps. Similarly, The Guardian created The Counted, a database of people killed by police in the US in 2015 and 2016, and Vice News assembled a database of all people shot by police in the nation’s 50 largest local police departments (it is also worth noting the barriers that Vice News encountered when requesting public information from many police departments). The existence of these time- and money-intensive parallel datasets speaks to the need for such data to be collected and openly available. While we advocate for continued access to existing public data, we can also ask that the federal government provide these in-demand datasets.
Fighting for openness
It hasn’t been a great year for transparency and openness at the federal level. However, we have seen a great grassroots effort to stand for open data and the use of data to improve the world. DataRefuge mobilized thousands of people around the world to download and preserve threatened data, and is working to document the ways data live in the world. The Social Science Research Council (SSRC) convened a group of humanities and social science scholars, data scientists, librarians, and archivists and produced the report Securing Social Science and Humanities Government Data with a number of key takeaways and initiatives. EDGI has been an inspiring example of what a thorough government watchdog can look like in a digital age. The PEGI Project has received funding to address concerns regarding the preservation of electronic government information. November’s Data for Black Lives conference at MIT brought together activists and scholars to discuss how to use data to make concrete and measurable change in black lives.
Currently, members of the Endangered Data Week team are working with the Mozilla Open Leaders program to develop an open set of curricula and training resources for others who want to study, teach, and advocate for publicly available data. As we build toward a culture of data consciousness, we will also work to encourage the use of our resources, guides, and training outside of academe. How, we will ask, can librarians, archivists, and scholars support the work of journalists, for example, who want to examine local crime data for patterns of discrimination or uneven enforcement? Finding, preparing, and using such data also shapes public policy and supports an informed citizenry. In the meantime, join us for this year’s Endangered Data Week and follow #EndangeredData on Twitter. Several of the Endangered Data Week events take place online, including the collection of Public Data Stories on Twitter. We also welcome participation in the Digital Library Federation’s Government Records Transparency & Accountability Interest Group.
References
1. Andrew Guthrie Ferguson discusses the issue of opaque, proprietary predictive policing technology and algorithms in this Data+Society Podcast episode.
2. See Rachel Mattson’s “Title 44 and the Uncertain Future of Free Public Access to Government Info in the US” and her August 2017 interview with Jim Jacobs on Title 44.
3. See, e.g., http://www.commoncause.org/states/new-york/research-and-reports/the-count-starts-now-2020-census.pdf
4. See http://thehill.com/regulation/372445-citizenship-question-drives-uncertainty-over-2020-census; https://www.colorlines.com/articles/census-bureau-ignore-obama-era-recommendations-recording-race-ethnicity