Weaponized data: How the obsession with data has been hurting marginalized communities


Hi everyone, I just came back from giving a keynote speech in Vancouver, Canada, complete with pictures of baby animals. I am condensing the key concepts here. A couple of notes before we tackle today’s exciting topic. First, I want to thank my awesome colleague Dr. Jondou Chen for introducing me to the term “weaponized data.” If I ever start up an alternative rock band, I am going to invite Jondou, and we’ll call it “Weaponized Data.” Sample lyrics: “From the start/you returned begrudging correlation/to my foolish causation/like an icepick to my heart.”

Second, for the grammar geeks out there—and I am one—I’m going to do something blasphemous and use “data” as both a singular and a plural noun in this post, depending on context. I know, I know, technically “data” is the plural for “datum,” so we should be saying, “The data are inconclusive” and not “The data is inconclusive.” Kind of like “media” is the plural of “medium” and “panda” is the plural of “pandum.” But, fellow grammar geeks, we must choose our battles. Let us save our energy to fight, with patience and compassion, crimes against decency like “that time works for John and I” and “you were literally on fire during your presentation.”

So, data. Data is pretty awesome. As a proud nerd, I love a good set of data and can spend endless hours looking at a sexy chart full of numbers. If data were turned into a syrup, I would put it on my soy ice cream all the time, because it is just so sweet. In the past few years, there has been more and more pressure on nonprofits to produce good data. Getting more and better information on practices and outcomes can only be good for our sector.

However, like fire or Jager Bombs, data can be used for good or for evil. When poorly thought out and executed, data can be used as a weapon to screw over many communities. Usually this is unintentional, but I’ve seen way too many instances of good intentions gone horribly awry where data is concerned. Here are a few challenges we need to pay attention to regarding the game of data, which is a lot like Game of Thrones, but with way less frontal nudity:

General challenges with data

The illusion of objectivity: Data is supposed to be objective; however, humans are subjective, and humans collect and interpret data; therefore, there’s no such thing as objective data. Considering the wealth of data on climate change and the effectiveness of immunization, why do we still have global warming deniers and people who refuse to vaccinate their kids? People, for better or usually worse, will find data that support their established positions and ignore everything else. And those who create methods and instruments for data collection (such as standardized tests of student performance) are also affected by their own backgrounds and biases, often leading to flawed data that are then used to make decisions.

The delusion of validity: Since many nonprofits just don’t have the resources to gather robust, scientifically accurate data, and yet all of us are forced to gather it somehow, a lot of the information we gather is not really usable. We’ll say things like “95% of the students in our program increase their English proficiency by 25% or more, based on standardized pre/post-tests.” That sounds great, and we’ll throw it into a grant proposal, but deep down, many of us realize that’s BS, due to a host of confounding variables and biases (such as selection bias: maybe kids who are generally more motivated joined our program, and they would have done well regardless of whether they were in our program or not).
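To make the selection-bias point concrete, here is a minimal simulation sketch (all numbers made up, not from any real program) showing how a pre/post comparison can look impressive even when the program itself does nothing at all:

```python
import random

random.seed(42)

# Hypothetical population: each student has a "motivation" level.
# More motivated students are more likely to join the program AND
# more likely to improve on their own -- the classic confound.
students = [{"motivation": random.random()} for _ in range(1000)]
for s in students:
    s["joined"] = s["motivation"] > 0.6                # self-selection
    # Improvement depends only on motivation, NOT on the program.
    s["improved"] = random.random() < s["motivation"]

def improvement_rate(group):
    return sum(s["improved"] for s in group) / len(group)

in_program = [s for s in students if s["joined"]]
not_in_program = [s for s in students if not s["joined"]]

print(f"Program kids improved:     {improvement_rate(in_program):.0%}")
print(f"Non-program kids improved: {improvement_rate(not_in_program):.0%}")
# The program group looks far better, yet the program's true effect is zero.
```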

The assumption of generalizability: People are so varied, and yet we have a tendency to assume that findings from studies can be generalized to everyone, and we make decisions based on those assumptions. Just because a study finds that looking at pictures of baby animals increases productivity does not mean this applies to everyone. The researchers tested one particular group, so the finding may be true for that group but not necessarily for other groups. So many studies do not include kids of color, for example, yet assume that the results would generalize to them, and then decisions are made that affect these kids.

The dangers of simplification: The field that we are in is complicated, and there is severe danger when we try to simplify things too much. We lose out on the richness of our work, and we jeopardize programs that are effective in ways we may not be thinking about. For example, a kid drops out of an after-school program midway through the year. The program, of course, cannot count that kid toward meeting an outcome. However, who is to say what effects the program did have on this student? Maybe, without even his short participation in the program, he would have engaged in some terrible stuff, such as dealing drugs or getting addicted to “True Blood” or something. How do we measure things like that?

The focus on the technical versus the adaptive: Data usually reveal just short periods in history, as longitudinal studies are time-consuming and expensive. The risk is that we sometimes fail to see whole systems and ecosystems and how different elements affect one another. Solutions based on these data, then, tend to focus on short-term gains over systems change. For example, data showing that doubling down on math increases math scores among low-income students leads to more math time in programs, sometimes at the expense of art, sports, etc. This does not take into account things like poverty, language barriers, the importance of “soft” skills, and other factors, or the fact that arts and sports may in the long run increase motivation and thus overall academic achievement.

The focus on “accountability” as a way to place blame: As I talked about in “Why we should rethink Accountability as an organizational and societal value,” accountability has been about placing blame. It is an extrinsic motivation, and it has caused a lot of harm. An example is in the field of public education, where kids are forced to take endless standardized tests in the name of accountability and data. They are losing time that should be spent actually learning, and because of inequitable resources, the data are not accurate. Low-income schools and kids, for example, often don’t have access to computers at home; kindergarteners may never have touched a computer mouse before, and yet they’re taking tests on computers. Here’s John Oliver’s hilarious and also depressing take on the subject. Kids are so traumatized by some tests that they throw up during them, leading some test makers to standardize procedures for handling booklets that have been thrown up on.

Weaponized Data

A very serious danger regarding data is when it is deployed without consideration for cultural and other competencies. Poorly thought-out data, unfortunately, is rampant in our sector. Again, I don’t think anyone has bad intentions when using data, but that does not prevent data from being used to cause harm. This has happened numerous times in history, with “scientific data” used to perpetuate things like phrenology, eugenics, and Apartheid. Here are ways that I’ve seen data being weaponized in our sector:

Data is used to hoard resources, perpetuating Trickle-Down Community Engagement: If an organization does not have resources to collect data, then it does not have the data to collect resources. I call this the Data-Resource Paradox, and it mirrors the Capacity Paradox, where smaller organizations cannot get resources because they do not have capacity, so they cannot build capacity to get resources. Unfortunately, marginalized communities—communities of color, disabled communities, rural communities, LGBTQ communities, etc.—are left in the dust because they simply cannot compete with more established organizations to gather and deploy data. I have seen this repeatedly, recently with a City levy grant that requires nonprofits to have not just one but two years of strong data before they’re even eligible to apply for funds that are, ironically, supposed to be going to low-income communities. Larger, more mainstream, and usually well-meaning organizations get the resources, and unfortunately many do not have connections to the diverse communities targeted, so they “trickle down” some of the funding to smaller organizations, perpetuating a vicious cycle. (Please see “Are you or your org guilty of Trickle-Down Community Engagement?”)

Data is used as a gatekeeping strategy: Similarly, I’ve seen data being used to prevent strategies from being deployed. In Seattle, for example, we have been talking about the education and opportunity gap for ages. All these data-backed strategies have not managed to close the achievement gap over the past three decades. Yet, when community leaders advocate for trying something new, the response is often, “Yeah, that sounds good, but where’s the data proving that will work?” When I pushed for closer collaboration between schools and nonprofits, for example, with funding going equitably to nonprofits to do family engagement, another committee member smirked and said, “Show me the studies that prove funding nonprofit partners directly will lead to results in school.” Dude, what I’m proposing MAY not work, but what you have been supporting HAS not worked. Yet, because he was able to pull up studies faster than I could, he was able to sway the rest of the group.

Imperfect data is used as a convenient way to make tough decisions: For example, a local university decided to discontinue a staff position focused on recruiting Asian students, stating that the data showed there were enough Asian students, so there was no need to focus on recruiting them. However, when the data was disaggregated, it showed that Southeast Asian students were underrepresented. The data, flawed as it was, was a convenient way for the school to make a decision and cut costs. (Luckily, the community banded together and pushed back, using the disaggregated data, and the position has been reinstated to focus specifically on Southeast Asian students.)
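To see how aggregation can hide exactly this kind of underrepresentation, here is a minimal sketch with made-up enrollment numbers (all figures hypothetical, not from the university in question):

```python
import pandas as pd

# Hypothetical enrollment figures vs. regional population share.
df = pd.DataFrame({
    "group":     ["East Asian", "South Asian", "Southeast Asian"],
    "enrolled":  [2400, 900, 100],
    "pop_share": [0.08, 0.04, 0.05],  # share of regional population
})
total_students = 20000

# Aggregated view: "Asian" students look perfectly well represented.
aggregate_share = df["enrolled"].sum() / total_students
print(f"Aggregated 'Asian' enrollment share: {aggregate_share:.1%} "
      f"(population share: {df['pop_share'].sum():.0%})")

# Disaggregated view: one subgroup is badly underrepresented.
df["enrolled_share"] = df["enrolled"] / total_students
df["parity"] = df["enrolled_share"] / df["pop_share"]  # 1.0 = proportional
print(df[["group", "enrolled_share", "pop_share", "parity"]])
```

In this toy example, the aggregate enrollment share exactly matches the aggregate population share, so the combined category looks fine, while the disaggregated view shows Southeast Asian students at a tenth of proportional representation.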

Data is used to pathologize whole communities: As Dr. Jondou puts it: “There is a dangerous pattern of behavior that emerges for researchers and consumers of research looking at data comparing groups. The first time we look at the data, we see a difference between groups. The next time we look, we see a problem. Then we look again and we assign responsibility. Finally, we pathologize entire groups of people.” Look at this study covered in the Washington Post: “Researchers visited the children’s homes twice: when they were nine months old and again when they were 2 to 3 years old. On the first visit, the researchers assessed babies’ ability to manipulate simple objects, such as a rattle, and use and comprehend words; on the second visit, they assessed the toddlers’ memory, vocabulary and basic problem-solving skills.” From this, they concluded: “The research suggests that prekindergarten may be too late to start trying to close persistent academic achievement gaps between Latino and white students.”

Rochelle Gutierrez of the University of Illinois at Urbana-Champaign describes this “gap-gazing” as a fetish many researchers have, and a harmful one that offers “little more than a static picture of inequity, supporting deficit thinking and negative narratives about students of color and working-class students […] and promoting a narrow definition of learning and equity,” and she proposes “a new focus for research on advancement (excellence and gains) and interventions for specific groups.” Why exactly are White students the gold standard for all kids to aspire to, when all kids have different strengths and needs?

Really, testing kids at 9 months old and then again when they are 2 or 3, and concluding that one group is definitely deficient and that it’s hopeless for them even before kindergarten? No one disputes that there are differences between groups of kids. But the conclusions place the blame and responsibility on 2-year-olds and their parents instead of looking holistically at all the systems that affect families, including poverty, education funding, curriculum, different learning styles across different cultures, etc.

De-Weaponizing Data

The emphasis on data has been both good and bad. When used right, data, like fire, can warm and illuminate. When used wrong, it can burn whole communities. Here are some ways to de-weaponize data:

Consider contexts and who is driving the data: The problem of people who are not from the communities affected making decisions for those communities is very prevalent in our field, and the work around data is no exception. Who created the data? Was the right mix of people involved? Who interpreted the data? The rallying cry among marginalized communities is “Stop talking about us without us,” and this applies to data collection and interpretation.

Pay for data, especially data created by communities most affected by inequity: Getting good data takes resources. Expecting nonprofits to produce high-quality data on a small budget is futile and distracting. Collecting data costs money, and funders need to support it and take some risks, especially with smaller nonprofits that may not yet have a track record of data collection. If you require data, pay for it. (Same goes if you require financial audits.)

Pay for more than just data: Thanks to the push for data, we’ve been seeing lots and lots of shiny data reports. Unfortunately, few people in the field have time to read these reports, much less put the information to actual use. As I mentioned in “Capacity 9.0: Fund people to do stuff and get out of their way,” reports and toolkits and data-dissemination summits are useless unless the people in the field are there to put them to use. Fund people.

Disaggregate data: As discussed above, it is easy to lump myriad different communities into fewer categories, but the loss in accuracy is not only frustrating, it is extremely damaging and inequitable. Where you can, be thoughtful about the categories you use to organize groups of people. It makes a difference.

Combine data with community engagement: Data alone, considering how flawed it is, is not enough to motivate communities affected by inequity. I’ve seen whole initiatives skip community engagement on the belief that the data is strong enough to convince everyone to get on the bandwagon. Those efforts usually fail.

Redefine what constitutes good data: Oftentimes, it is not that marginalized communities don’t have data; it is just that the data they do have does not conform to the mainstream definition of what data is or how it should be presented. So a study with t-tests and Pearson r’s and stuff is considered “good” data, but testimonials from dozens of people directly affected by issues are considered less desirable qualitative data? The definition of data, as well as of “capacity” and “readiness” and other concepts, has been perpetuating inequity and needs to change.

Re-examine comparison groups: Are they really necessary? Why is one group held up as the standard for all other groups? Instead, focus on each group’s intrinsic strengths, challenges, and growth. Comparisons may be needed, but it’s important to be thoughtful about them to avoid oversimplifying things and falling into deficit mindsets.
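As a small illustration of this reframing, here is a sketch with hypothetical test scores: the gap framing makes one group look deficient, while the growth framing shows that the same group actually gained the most:

```python
# Hypothetical fall and spring test scores for two groups of students.
scores = {
    "Group A": {"fall": 62, "spring": 74},
    "Group B": {"fall": 78, "spring": 82},
}

# Gap-gazing framing: every group is measured against a reference group.
reference = "Group B"
for name, s in scores.items():
    gap = s["spring"] - scores[reference]["spring"]
    print(f"{name}: {gap:+d} points vs. {reference}")   # deficit framing

# Growth framing: each group is measured against its own starting point.
for name, s in scores.items():
    growth = s["spring"] - s["fall"]
    print(f"{name}: {growth:+d} points of growth")      # Group A grew most
```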

Let me know your thoughts. I have to run. I’m inspired to write more lyrics for this band I might form.

***

Make Mondays suck a little less. Get a notice each Monday morning when a new post arrives. Subscribe to NWB by scrolling to the top right of this page and entering your email address.
