Data Democracy

Big Data and the pitfalls of targeted campaigning

by Angie Kim

Illustration by Grace Zhang

published November 9, 2018

At a typical phone banking meeting, volunteers cluster around a table or two. They all go to the website that hosts their campaign software and log in with their unique volunteer credentials. The leader of the meeting then provides an ID for volunteers to enter into the system, which gives them access to the list of people to be contacted. The list is automatically constructed within the database to pull together voters whose data places them in specific groups that the campaign wants to target, such as undecided young voters. Each volunteer’s first voter file pops up, revealing someone’s phone number and relevant information about them, such as their age and the other members of their household. Next to this information is a script tailored to the list, which stays the same throughout the session.

And so the calls begin. Scattered throughout the script are dropdown menus from which volunteers select options based on responses from the call. Each conversation—if everything goes well—usually ends with the classic ask: “Can we count on your support this election?”

With the widespread implementation of data-driven political campaign strategies, data collected from these canvassing efforts has become increasingly crucial. Yet as technological tools advance and the ways to collect personal data multiply, we should question the broader implications of using them to influence the public.




Public relations campaigns play a massive role in shaping our patterns of consumption, and with the rise of Big Data—which the Oxford English Dictionary defines as “Extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions”—these techniques have grown exponentially in sophistication. No longer do advertisers have to appeal to the average consumer in order to sell; they’re now able to understand and target niches with an unprecedented level of granularity. Political campaigns have similarly evolved: Our hearts and minds have been treated as a marketplace, with campaigns competing to sell us platforms in exchange for our votes. Though this strategy has met with varying success, it has been widely adopted by the Democratic and Republican establishments alike since it was piloted in Obama’s 2012 campaign.

The use of Big Data has caused a massive shift in the way campaigns are run. In the past, campaigns vied for political support through traditional media outlets, such as television and radio. Though political messaging was widespread, its reach was inherently limited by its medium, since messages could only be shared to an indiscriminate audience. In the same way businesses had to advertise their products to the average buyer, candidates’ best strategy was to appeal to the average, undecided voter in an attempt to swing their support.

This is no longer the case. Since the advent of data-driven campaigns, appeals to the masses have largely been replaced by a different strategy: identifying a candidate’s base and mobilizing it to vote. In the US, both parties make use of tools that allow them to do this. Democratic campaigns use NGP VAN, software that provides “websites…organizing, fundraising, compliance, and digital tools” for progressive campaigns; Republicans have begun to use the services of a rival company, NationBuilder. These software-as-a-service companies are leaders in campaign data management, and their tools are used by campaigns across the United States and around the world.

The databases hosted on these services are, in their most basic form, contact lists of individuals to reach out to about the candidate’s platform. But in addition to knowing your address and party affiliation, campaigns are able to build a detailed profile that includes your interests and your past interactions not just with the candidate, but with the party with which they’re affiliated. The Canadian Broadcasting Corporation reports that political campaigns in the US also have access to “income, number of homes, kinds of vacations, number of children and education history” and can purchase consumer data.

Using this information, voter database software creates predictive scores for behavior, support, and responsiveness. Behavior scores indicate the likelihood of a voter taking certain actions, such as donating, volunteering, and going to the polls. Support scores reflect how strongly the individual is expected to support the candidate. Responsiveness scores represent whether someone will react to outreach favorably or negatively, or whether they’ll respond at all. A combination of the three divides the voter database into segments, which become the audiences for issue-specific email blasts or the lists of people for volunteers to call.
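To make the idea concrete, here is a minimal sketch of how three such scores might be combined into segments and a call list. Everything in it is an assumption for illustration: the field names, the 0-to-1 score scale, the 0.5 threshold, and the segment labels are hypothetical, and it does not describe how NGP VAN, NationBuilder, or any real campaign tool works.

```python
from dataclasses import dataclass


@dataclass
class VoterScores:
    """Hypothetical predictive scores for one voter, each assumed to lie in [0, 1]."""
    voter_id: str
    turnout: float         # behavior score: estimated likelihood of going to the polls
    support: float         # support score: estimated support for the candidate
    responsiveness: float  # responsiveness score: estimated chance outreach lands well


def segment(voter: VoterScores, threshold: float = 0.5) -> str:
    """Assign a voter to an illustrative segment using one assumed threshold."""
    if voter.support < threshold or voter.responsiveness < threshold:
        # Unlikely supporters and unresponsive voters are left off the lists.
        return "skip"
    if voter.turnout < threshold:
        # Likely supporter who may stay home: a turnout call is worth making.
        return "mobilize"
    # Likely supporter who already votes: ask for time or money instead.
    return "recruit"


def build_call_list(voters: list[VoterScores]) -> list[str]:
    """Return IDs of voters in the 'mobilize' segment, most responsive first."""
    targets = [v for v in voters if segment(v) == "mobilize"]
    targets.sort(key=lambda v: v.responsiveness, reverse=True)
    return [v.voter_id for v in targets]


if __name__ == "__main__":
    sample = [
        VoterScores("a", turnout=0.2, support=0.9, responsiveness=0.8),
        VoterScores("b", turnout=0.9, support=0.9, responsiveness=0.9),
        VoterScores("c", turnout=0.3, support=0.1, responsiveness=0.9),
    ]
    for voter in sample:
        print(voter.voter_id, segment(voter))
    print(build_call_list(sample))
```

In this sketch, a voter whose support score falls below the threshold never appears on a volunteer's list at all, however responsive they might be. A real system would presumably use many more signals and tuned cutoffs; the single threshold is only there to keep the example short.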

Furthermore, with rapid technological advancement, machine learning models have grown markedly more sophisticated over the past few years alone. As exemplified by the notorious Cambridge Analytica scandal, this has allowed political campaigns and consultancies to go even further. Rather than using only the information found through public voting records and data collected by canvassers, campaigns can take advantage of advanced psychological models created using machine learning techniques that have only recently become possible through increased computing power. Cambridge Analytica used individuals’ data to score them on their “openness to experience, conscientiousness, extraversion, agreeableness and neuroticism.” Profiles generated from these models broaden the ways in which campaigns can reach the electorate; for example, online advertisements can be targeted to individuals whose browsing history classifies them as likely supporters.




The spread of campaign strategies that rely on such an intimate “understanding” of voters can actually harm democracy in two important ways.

First, the proliferation of data-driven campaign strategies creates additional resource disparities between candidates that can discourage potential contenders from running. Most notably, the Democratic National Committee has formally partnered with NGP VAN, which means that the party decides who can and cannot receive access to the company’s different platforms. The VoteBuilder system—which Wired Magazine called “the central nervous system of every Democratic campaign”—hosts the unified database to which all state parties have access and which holds all of the voter data the party has accumulated over nearly 10 years. However, in states such as Illinois and Missouri, state party policy dictates that candidates running against incumbents do not get to share the wealth of data that lives in that ecosystem. These candidates can get access to other NGP VAN tools to host data they collect over the course of their campaign, but are at a clear disadvantage without the more sophisticated datasets of their rivals.

Such a policy clearly privileges those already in power. Anthony Clark, a Democratic primary candidate for Illinois’ 7th congressional district, expressed this frustration: “What’s one more way you can stack the deck against me? Deny me access to valuable information and data.” Clark ran as a Justice Democrat, part of a coalition inspired by Bernie Sanders’ 2016 presidential bid that hopes to push the Democratic Party to the left. Many of these candidates were denied access to the party’s troves of data. Sanders himself was embroiled in a data-centered scandal in December 2015 when his campaign mistakenly received access to Hillary Clinton’s voter database; in response, the DNC shut his campaign out of NGP VAN entirely.

These candidates often choose to run under the Democratic label in the first place because they foresee difficulties in running without a mainstream party name attached to them. Yet when state party leadership—which largely supports the establishment—wields the power to choose who does and does not get data, it discourages those who don’t fit into the mold from throwing their hats into the ring at all.

There already exist massive financial barriers for candidates who are challenging the party establishment. Lacking data adds a challenge of a different kind: it is nearly impossible to reach anywhere near the same level of sophistication when starting from zero. As data becomes currency for political power, incumbents are the richest, leaving challengers knocking on doors with no sense of who might be on the other side.

More insidiously, however, massive amounts of data on the electorate have enabled a phenomenon that parallels market segmentation. The splitting of messaging to cater to highly specific groups further commodifies public opinion as something to be bought and sold, rather than something that is formed organically. Today, mainstream campaigns with rich voter data opt to communicate with individuals whom they’ve algorithmically determined to be likely supporters, rather than appealing to broader groups. Campaigns now pay more attention to converting existing support into votes than to expanding their base, under the assumption that resources are better spent on the former.

Whether this has contributed to or is a result of political polarization is unclear, but this practice effectively limits public discourse by reinforcing already-entrenched beliefs among voters, rather than creating productive conversation on important issues. Compounded by the rising ideological insularity of social circles and online spaces, itself often criticized by political figures, data-driven outreach practices and advertisements have become an ironic blow to democratic principles. In an interview with Vox, Yale political science professor Eitan Hersh agreed that campaigns are “not looking to persuade people with ideas or arguments; they’re just identifying the people most likely to vote for them and excluding everyone else.”

This is not to say that it’s inherently bad for candidates to know more about their constituents. The wealth of data owned by campaigns can reveal meaningful insights about voters’ priorities that genuinely lead to better representation of the electorate once a candidate is in office. And given the reality that data is driving the future of decision making—though this should certainly be interrogated, too—it would be foolish to suggest that campaigns stop using the data readily accessible to them. Yet we will never enact broader societal change if campaigns continue to target only those who are committed to them. After all, what would happen if nobody could change their mind?


ANGIE KIM B’20 is from Canada.