What are the proper ethics in data collection and mining?

This is a loaded question from almost any angle, but it is one that organizations of all types must grapple with to remain relevant, credible and, in more and more cases, on the right legal footing.

As someone with a background in bioethics and training in population genetics, I find that data is deeply embedded in my work. That’s especially the case right now, since I also serve as a co-founder and ethics and policy director at the Native BioData Consortium, which is a data repository for tribes.

All this work has opened my eyes to data mining ethics as a concern, especially when you consider the collection, commercialization and co-optation of genetic data. Traditionally, ethicists have not liked to ascribe a commercial value to biologically intrinsic entities such as DNA. Yet access to DNA and other medical information is bought, sold and traded by pharmaceutical and genomic information companies. This puts the onus on us as data leaders to start thinking in more sophisticated ways about the ethics of collecting and mining data – not only in genomics and medicine but across industry settings.

From data ethics to data justice

For me, data ethics is about ensuring that the people who generate or retain ownership of data also retain decision-making authority over what happens to their data. Unfortunately, this retention of control doesn’t happen enough. Why? I fault a tendency we have as data professionals to disentangle the human elements from data. Once data becomes supposedly “de-identified,” it is treated as nonhuman subject data. It can then be harmonized, mined and operationalized as a nonhuman element – as if this redefinition removed the risk related to the data itself.

I believe it’s time to become more assertive in posing new questions about the commercialization of data. This includes asking about data ethics in terms of data access and data justice. In many cases whole communities, particularly underrepresented communities, must have a voice in how their data is used, especially if inferences are being made that exclude, devalue or harm them. This is very much a reality in communities of color, which historically have been disenfranchised by suppositions and inferences about data.

A more ethical foundation for data collection

Web 2.0 essentially runs on data. As a result, we’ve become inured to hearing companies say, “Well, we need more of your data so we can tailor these products for you and your experience.” In fact, this request is often about collecting and mining customer data to gain as much insight as possible so companies can commercialize and package it for their own profit. One pushback against these practices is a recent trend in which data privacy is highlighted as a benefit of doing business with a company. This emphasis on privacy extends from tech giants like Apple to regulation such as GDPR in the EU, which gives individuals power over the use of their personal data and holds organizations that collect data accountable for their collection and usage practices.

As an Indigenous person, and also as an advocate for my community, I see a lot of interest in Indigenous data, especially for commercialization and co-optation. After the past few decades of experience, we have become increasingly wary of “for-you” arguments where an entity states, “Oh, we can do this for you as a benefit of our service.” In my experience these benefits rarely materialize. In fact, the first entities that benefit from data access are usually not the communities that provided that data to begin with.

Just as the data justice movement that drives the Native BioData Consortium requires building capacity from within to grapple with ethical questions, the ethical use of data in any organization requires expertise from those trained in ethical practices. Finding the right skills may mean asking foundational questions of yourself and your organization:

  • What or who defines a data ethicist? 
  • What level of training is adequate to qualify someone as a data ethicist: Master’s? PhD? Post-doc level?
  • Should a data ethicist be trained in applied ethics as well as normative ethics? 
  • Under what circumstances does a company get to brand itself as an ethical data company or an ethical data entity?

What we really need to be doing as data leaders is creating ethical guidelines that ensure power flows back to the people from whom that data is being extracted. Getting there is a function of helping those we serve to understand the benefits as well as the dark side of data collection and mining.

However, this can only happen with greater data literacy, and data literacy can only drive higher ethical standards if there is transparency and knowledge about industry practices. My point is that data literacy can be used to empower or exploit, depending on the ethical practices of the user or organization. It’s up to all of us as data leaders to ensure that we ask ethical questions as often as we ask data questions.