Differential Privacy and the Upcoming Process of Redistricting

Dear Readers: Next Thursday, June 10, University of Virginia Center for Politics Director Larry J. Sabato will hold a Crystal Ball webinar from 1 p.m. to 2 p.m. eastern time. He’ll be discussing the continuing fallout from the 2020 election, the 2022 midterms, and much more. You can tune in for free at https://livestream.com/tavco/sabatoscrystalballjune2021.

The webinar is part of the UVA Alumni Association’s Reunions Remixed, which is being held virtually June 9-12. See here for more information on Reunions Remixed. You do not need to sign up for Reunions Remixed to watch Thursday’s Crystal Ball webinar.

In today’s Crystal Ball, UVA President Emerita Teresa A. Sullivan and Qian Cai of UVA’s Weldon Cooper Center detail an important but perhaps overlooked change in the 2020 census — the use of “differential privacy” — that could become a significant challenge in redistricting.

The Editors

KEY POINTS FROM THIS ARTICLE

— The U.S. Census Bureau is required by law to protect the confidentiality of census respondents.

— The bureau is using a new method called “differential privacy” as part of the 2020 census to fuzz up the data in order to prevent individual respondents from being potentially identified.

— However, the use of differential privacy may cause problems in the upcoming redistricting process by injecting inaccurate information into the granular census data required to draw districts of equal sizes and to ensure fair racial representation.

Differential privacy creates challenge for redistricting

The U.S. Census Bureau is charged under Title 13 of the U.S. Code with protecting the confidentiality of census respondents and ensuring that their data remain private for 72 years. The Census Bureau has implemented privacy measures in every decade since 1850, but the 2020 census poses new challenges in ensuring privacy. In particular, new efforts to secure individual privacy in the released census data create a tradeoff between privacy and accuracy that is problematic for redistricting.

Because of the large number of commercial databases, social media sites, and other sources of digital information, analysts have access to many sources of individual data besides the census. Even some government records that are subject to Freedom of Information Act requests, such as driver's license records and voting records, may contain such information. This greater availability of data, along with high-power computing, poses the possibility of reverse identification: that is, a person armed with publicly available data may be able to identify a unique individual respondent in the census data. The Census Bureau considers this possibility an unacceptable risk given its responsibilities under Title 13.

The solution the bureau intends to use is called differential privacy, a technique developed by data scientists and described in the literature.[1] The bureau is using differential privacy in such a way that only state population totals remain intact, but any populations below the state level (say, for a town, city, or county) and the characteristics of the individuals within the jurisdiction are changed with “noise” injection to fuzz up the actual data. A parameter called ε controls how much noise is injected into the data. A higher value of ε means less noise and therefore less loss of accuracy, at the cost of weaker privacy protection; a lower value means more noise and more loss of accuracy.[2] Even at the highest ε value the bureau has applied, the noise injection in many cases produces data that are not only less accurate but also inconsistent or even illogical.
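The relationship between ε and accuracy can be illustrated with the textbook Laplace mechanism, in which the noise added to a count has a scale of sensitivity/ε. This is only a sketch under stated assumptions: it is not the Census Bureau's production algorithm, and the function names, the block count of 40, and the ε values shown are hypothetical choices for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace distribution centered on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # Textbook Laplace mechanism (an assumption here, not the bureau's method):
    # the noise scale is sensitivity / epsilon, so a smaller epsilon adds more noise.
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def mean_abs_error(true_count, epsilon, trials=20000, seed=0):
    # Average absolute gap between the noisy and true counts over many draws.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials

if __name__ == "__main__":
    # Hypothetical census block with 40 residents, under three epsilon values.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error = {mean_abs_error(40, eps):.2f}")
```

Under these assumptions the average error shrinks roughly in proportion to 1/ε. The raw noisy counts can also be fractional or negative, which is one way illogical values can arise before any post-processing; this sketch applies none.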

The state populations for the apportionment of Congress, which have already been released, are not subjected to the differential privacy process. Differential privacy is not needed for these data, whose use is to document “the whole number of persons in each state” as required by the 14th Amendment to the U.S. Constitution. Neither voting-age population nor race/ethnicity information is required for reapportionment.

The data for redistricting are a different story. The Census Bureau plans to use differential privacy when these data, called Public Law 94-171 redistricting data, are released. The release date is currently Aug. 16; the statutory deadline of March 31 was missed because of the effects the pandemic had on completing the original count and quality checks of the data. Parties with an interest in redistricting are typically concerned with total population, voting-age population (18+), and the race and ethnicity of these populations in small geographic areas, such as census blocks or census block groups. Race and Hispanic origin are known to be associated with party affiliation and voting behavior. All these variables will be injected with statistical noise under differential privacy.

Just how much accuracy will be lost in redistricting? On April 26, 2021, the Census Bureau issued a test dataset for analysts. This test dataset consisted of the 2010 census redistricting data as originally issued and, for comparison, the same data processed with differential privacy. Because the census block, which is roughly the size of a city block, is the basic geographic unit for redistricting, the comparisons typically focus on the census block level.

The Demographics Research Group of UVA’s Weldon Cooper Center analyzed the test data for Virginia and found significant inaccuracy at the census block level. Some findings are listed below:

— Nearly a quarter of the census blocks had a population change of more than 10%.

— Nearly 2,500 census blocks had only children (ages 0-17) but no adults (ages 18+).

— Populations in 1,255 census blocks were completely erased to 0.
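Checks like those listed above can be sketched as follows. The field names, the four sample blocks, and the exact definitions (for example, measuring percent change only for blocks whose original population is above zero) are assumptions for illustration, not the Weldon Cooper Center's actual method or data.

```python
def block_level_checks(blocks):
    """Summarize how noise-injected block counts differ from the originals.

    Each block is a dict with hypothetical keys:
      orig_total  - population as originally published
      dp_total    - population after differential privacy
      dp_under18  - noise-injected count of residents ages 0-17
      dp_adult    - noise-injected count of residents ages 18+
    """
    changed_over_10pct = 0
    children_only = 0
    erased = 0
    for b in blocks:
        orig, dp = b["orig_total"], b["dp_total"]
        # Percent change is defined here only for blocks that had residents.
        if orig > 0 and abs(dp - orig) / orig > 0.10:
            changed_over_10pct += 1
        # Implausible result: children reported in a block with no adults.
        if b["dp_under18"] > 0 and b["dp_adult"] == 0:
            children_only += 1
        # A populated block whose noise-injected population is zero.
        if orig > 0 and dp == 0:
            erased += 1
    return {
        "share_changed_over_10pct": changed_over_10pct / len(blocks),
        "children_only_blocks": children_only,
        "erased_blocks": erased,
    }

# Made-up sample data, not drawn from the Census Bureau's test dataset.
sample_blocks = [
    {"orig_total": 50, "dp_total": 52, "dp_under18": 10, "dp_adult": 42},
    {"orig_total": 20, "dp_total": 26, "dp_under18": 5, "dp_adult": 21},
    {"orig_total": 8, "dp_total": 3, "dp_under18": 3, "dp_adult": 0},
    {"orig_total": 4, "dp_total": 0, "dp_under18": 0, "dp_adult": 0},
]

summary = block_level_checks(sample_blocks)
print(summary)
```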

In addition, beyond census blocks, we identified significant outliers at larger geographies as well. For example:

— Among Virginia’s towns, the number of Black or African Americans was found to be inflated as high as nine times, American Indians four times, Asians four times, Hispanics six times, and More Than Two Races 16 times. On the flip side, these various racial/ethnic groups were found to be completely erased to zero in various towns, representing a 100% reduction.

— Among Virginia’s counties, Black or African Americans were reduced by as much as 100% (meaning entirely removed), American Indians by 62%, Asians by 56%, and Hispanics by 33%.

Is differential privacy a fatal flaw for redistricting? Some analysts are not too concerned because the very commercial databases that prompted the use of differential privacy are also available to redistricting commissions and committees. Voting records, for example, may provide the type of information that redistricters need. But redistricting still relies on the census counts to ensure equal size of the districts and fair racial representation.

Others have sounded the alarm, most notably Alabama, which has filed suit in federal court to prevent the use of differential privacy in the data released for redistricting.[3] Sixteen state attorneys general of both red and blue persuasions filed an amicus brief in support of Alabama. The fundamental argument Alabama makes is that the Census Bureau will “provide the States purposefully flawed population tabulations…. [the Bureau] will force Alabama to redistrict using results that purposefully count people in the wrong place.” The filing alleges that the decision to use differential privacy was arbitrary and capricious, and a violation of the Administrative Procedure Act as well as a violation of the Census Act and the due process and equal protection rights of the plaintiffs.

Whatever decision is made in the Alabama case, however, social scientists who analyze census data, especially for small geographic areas, will find the issue of differential privacy a recurring concern in their analysis.

Notes

[1] Sullivan (2020) reviews the background of differential privacy. Teresa A. Sullivan, Census 2020: Understanding the Issues (Springer), pp. 78-80.

[2] Jaewoo Lee and Chris Clifton (2011). “How Much is Enough? Choosing ε for Differential Privacy.” In: Lai X., Zhou J., Li H. (eds) Information Security. ISC 2011. Lecture Notes in Computer Science, vol. 7001. Springer. https://doi.org/10.1007/978-3-642-24861-0_22.

[3] Complaint in Alabama v. U.S. Dep’t of Commerce, No. 3:21-cv-211 (M.D. Ala.), available at https://www.brennancenter.org/sites/default/files/2021-03/Complaint_%202021-03-11_0.pdf. The Court has approved the motion for a three-judge panel. As a result of the three-judge panel, the eventual decision could be appealed directly to the U.S. Supreme Court.

Teresa A. Sullivan is President Emerita and University Professor of Sociology at the University of Virginia. Qian Cai is the Director of the Demographics Research Group at the Weldon Cooper Center of the University of Virginia.