“When we launched the first IndigiData workshop to empower the next generation of Indigenous data scientists,” says Krystal Tsosie, a Navajo geneticist and bioethics scholar at Vanderbilt University, “we were tapping into a line of inquiry about Indigenous peoples as stewards of their own data that many Indigenous scholars before us had initiated. 

“The core idea, which can still seem a surprise to many, is that Indigenous peoples have sovereignty over their data. This may be a hard concept for people who didn’t grow up in these cultures to understand, or who haven’t been exposed to the history of Indigenous sovereignty in the US and other countries, but it’s a cornerstone principle.”

Integrating Indigenous data sovereignties into the Big Data era in the right way is therefore of particular interest to Tsosie and Oliver Bear Don’t Walk IV, both co-organizers of IndigiData. As data scientists move to incorporate data from underrepresented populations into their studies, how can they do so in an equity-centered fashion?

Speaking the Same Language

Clarifying the concepts of inclusion and equity, which are often conflated, can help. The distinction is critical because even in Diversity, Equity and Inclusion conversations, inclusion is often used as a synonym for equity. “Increased inclusion in datasets for different groups is not something new,” Tsosie says. “It’s always been called for. But inclusion without equity, as in the case of the Human Genome Diversity Project (HGDP), means that many global Indigenous populations were included in datasets but afforded no equity after the fact.

“As we have seen in the 20 years since HGDP, openly available biomarkers from Indigenous peoples are now being used by commercial entities for the creation of Indigenous-specific products for the benefit of these companies,” she adds. One DNA company in this group was set to go public earlier this year in a so-called SPAC (Special Purpose Acquisition Company) merger that values it at $3.5 billion. Yet none of this company’s profits are translating to benefits for Indigenous peoples. There are many other examples in recent history.

“So yes, we do need to increase inclusion of Indigenous genomic and medical data in order for research to benefit us,” Tsosie says. “But we’re calling for increased equity as well. That means more decision-making authority about what types of Indigenous data are included in datasets, how much of a say we get, and how the positive outcomes of this research can be directly felt in our communities.”

Why Genomic Data Shifts the Focus to Communities

Genomic big data studies that include Indigenous peoples also call for a clear distinction between the individual and the community or group, according to Tsosie and Bear Don’t Walk. In most bioethics work, the text brought to the fore is the Belmont Report of 1979, which laid out ethical principles and guidelines for the protection of human subjects of research.

“This report was an important foundational document in bioethics, but it is generally focused on individuals being studied, not on the community from which they come,” Bear Don’t Walk says. He adds that since the Human Genome Project was completed in 2003, the ability to infer information about all of the biological kin of each test subject has fundamentally changed the risk dynamics.

For consumers, data consent is still generally centered on the individual and tied to that person’s use of the technology. With genomics, however, the data creates risk not only for that individual but also for everyone they are related to.

Although many people might be quick to agree to a system that collectivizes and de-anonymizes their genetic data if it contributes to catching a killer, for example, it’s important to realize that this is just one potential use of the technology amid myriad ways in which civil liberties and privacy may be violated, the scholars say. We must also consider possible misuses that widen injustices related to racial profiling in the legal system. The ends, in other words, do not always justify the means.

This risk is elevated for members of small, identifiable populations such as Indigenous peoples, communities in which both researchers have many biological kin. “Our family sizes are larger, our generation gaps are smaller,” Tsosie points out. “Therefore, the re-identification risk to a large subset of our community is greater and disproportionate compared to the dominant population, which doesn’t have these same biological kinship ties. Racial profiling and genomic racial profiling are escalated and disproportionate for our communities. That’s something we have to be careful about.”

Both scientists note that this shift should encourage all of us to expand our ethical thinking from the individual to the community: how researchers’ use of the information could affect the community, and how that community gets a say in what stories are told and how they are told. This is particularly important in Indigenous communities, which govern using a communitarian set of ethics that differs from Western senses of autonomy and individualism. Several Indigenous academics and geneticists are currently hard at work trying to change these distinctions and definitions.

The Power of Community-Based Research

A good place to start when researching an Indigenous community is to recognize people from the community as experts on par with scientists, Bear Don’t Walk says, from the design of the process all the way through the interpretation of the research. “Community representatives are experts, and they should be co-leaders when it comes to designing experiments, collecting data, analyses, and the entire process of publication,” he says. “These practices are a good way to start incorporating communities and making sure that this whole research process is much more beneficial to them.”

There are different models of what data governance can look like within tribal constructs based on recent biomedical history. These models range from tribal trust partnerships, in which the research and the data are governed by an outside entity, usually on behalf of Indigenous people, to a concept called Tribally Driven Research, or TDR.

The term TDR was coined by Dr. Eddie Brown, a Professor Emeritus in American Indian Studies at Arizona State University. It remains largely an ideal, since it means that the tribe has 100% agency, autonomy, and authority over all aspects of data governance. “So far we really haven’t seen many examples of those studies, because we’re still training the Indigenous peoples to have the skillsets to participate at this level,” Tsosie says. “It also entails empowering tribes to create those agencies and processes for themselves, and we’re still working on that. We should see progress in the next few years.”

Representation Is the Beginning of Ethics, Not the Endpoint

Representation is a crucial part of better ethics and equity, but it doesn’t solve everything, Bear Don’t Walk notes. “If you have five or six Indigenous software engineers working at a Big Tech company, it’s unlikely that the company would suddenly start making much more ethically informed or culturally relevant decisions when it came to Indigenous people working with their products.”

At the end of the day, the importance of Indigenous data must be stated explicitly and publicly. “Unless data-driven companies can say, ‘We really want to be informed by Indigenous leadership and ethics on how to work with these populations and our field of data science or computer science,’” says Bear Don’t Walk, “nothing is likely to change.” It is the hard work of ensuring that these people have a voice at the table that starts to create positive change.