I came to data from interpreting American Sign Language in medical encounters. Interpreting is regarded as a practice profession, like law and medicine, with both a Code of Ethics and Standards of Practice. I was active in medical interpreting when the Certified Healthcare Interpreter credential was introduced.

Despite some recent gains, the state of data ethics is still early in its evolution. In late 2020 and early 2021, leading AI ethics researchers Timnit Gebru and Margaret Mitchell were forced out of Google after pointing out biases in large language models and the data sets used to train them. Ethics in practice forces uncomfortable conversations.

Gebru is known for research including Datasheets for Datasets (2018) and for founding the Distributed AI Research Institute (DAIR). Datasheets make clear what a data set does and does not include, as well as its biases, much as the Material Safety Data Sheets that accompany hazardous chemicals communicate handling risks to downstream users. The processes, roles, and policies that make up data governance act as similar techniques for the proper handling of data.
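
To make the idea concrete, here is a minimal sketch of what recording datasheet answers alongside a data set might look like. The field names loosely follow the question categories in the Datasheets for Datasets paper (motivation, composition, collection process, uses); the class structure and the example data set are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Datasheet:
    """A pared-down datasheet, loosely modeled on the question
    categories in Gebru et al.'s Datasheets for Datasets (2018).
    The fields and example values below are illustrative only."""
    name: str
    motivation: str              # Why was the data set created?
    composition: str             # What do instances represent? What's missing?
    collection_process: str      # How was the data gathered, and by whom?
    known_biases: list[str] = field(default_factory=list)
    recommended_uses: list[str] = field(default_factory=list)
    discouraged_uses: list[str] = field(default_factory=list)

# A hypothetical data set, documented before it ships downstream.
sheet = Datasheet(
    name="clinic_noshow_2021",
    motivation="Predict appointment no-shows to improve scheduling.",
    composition="One row per appointment; omits patients without phone numbers.",
    collection_process="Exported from a single urban clinic's EHR.",
    known_biases=["Under-represents rural and uninsured patients"],
    recommended_uses=["Scheduling research at comparable urban clinics"],
    discouraged_uses=["Eligibility or triage decisions for individual patients"],
)
```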

Franken-proofing your data
As we are still in the early days of comprehensive data ethics, our conversations tend to be inconsistent. Addressing systems that perpetuate discrimination based on identity and social stratification requires getting uncomfortable. It’s not a matter of whether algorithms and data-derived solutions contain bias, but which biases, against which populations, and to what effect.

I call the pursuit of this kind of inquiry “Franken-proofing” your work. Like Victor Frankenstein, we find it easy to see our creations for all the good they might do; we need others to highlight where they can go awry. We can take in that cautionary information and iterate toward a workable solution in which those affected have a voice. Iterating is a core tenet of design and development, and an essential one for realizing the broader sociological impacts of data.

Data bias is a broader codification of the discriminatory systems we already have, and surfacing these issues requires work. Credit scores have long been an example. Mortgage payments count toward a higher score, but rent payments traditionally do not. To get credit started, one usually must rely on someone else or endure a long, painful process to build it. This reality perpetuates the cycle of poverty and inhibits upward mobility.
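
To make the mechanism concrete, here is a toy sketch. The scoring function below is entirely fabricated for illustration (real credit models are proprietary); the point is structural: two people with identical on-time payment behavior diverge because only one payment type is reported.

```python
# Toy illustration only: a fabricated "score" that rewards reported
# on-time payments. Real credit scoring models are proprietary and far
# more complex; this shows the effect of what gets reported, not how
# any actual score is computed.

def toy_score(reported_ontime_payments: int, base: int = 600) -> int:
    """Hypothetical score: +5 points per reported on-time payment, capped at 850."""
    return min(base + 5 * reported_ontime_payments, 850)

# Both people paid housing costs on time for 24 straight months.
mortgage_payments = 24  # mortgage tradeline: reported to the bureaus
rent_payments = 24      # rent: traditionally not reported at all

print(toy_score(mortgage_payments))  # 720 -- every payment counted
print(toy_score(0))                  # 600 -- the renter's identical behavior is invisible
```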

In healthcare, kidney transplants are one of the most storied examples of data discrimination. eGFR cutoffs, a measure of kidney function used in transplant referrals, have been calculated differently based on race: a race coefficient inflated the estimated kidney function of Black patients. Black patients end up being diagnosed later, when the disease is more severe. As a result, they wait longer and die far more often. The National Kidney Foundation, UNOS, and other organizations are demanding change. These decisions shape the society we create.
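
To see how a single coefficient delays care, here is a sketch of the now-retired CKD-EPI 2009 equation, whose published form multiplied the estimate by 1.159 when a patient was recorded as Black. The constants come from the published formula, but the example patient is invented, and the threshold in the comments (eGFR ≤ 20, a commonly cited point at which transplant waiting time can begin to accrue) is there for illustration. Because a higher eGFR reads as healthier kidneys, identical lab values can put one patient on the waitlist path and leave another waiting.

```python
def egfr_ckd_epi_2009(scr_mg_dl: float, age: int, female: bool, black: bool) -> float:
    """CKD-EPI 2009 creatinine equation (since replaced by a race-free
    2021 refit). Constants are from the published formula."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141
            * min(scr_mg_dl / kappa, 1) ** alpha
            * max(scr_mg_dl / kappa, 1) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # the race coefficient at the heart of the controversy
    return egfr

# Hypothetical patient: same lab result, same age, same sex.
without_coeff = egfr_ckd_epi_2009(scr_mg_dl=3.4, age=55, female=False, black=False)
with_coeff = egfr_ckd_epi_2009(scr_mg_dl=3.4, age=55, female=False, black=True)

# 19.2 vs 22.3: with a waitlisting threshold around eGFR <= 20, the
# first patient starts accruing waiting time; the second does not.
print(f"{without_coeff:.1f} vs {with_coeff:.1f}")
```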

Data literacy is a data ethics issue
When we restrict data education to colleges and specific work environments, we cut people off from what is becoming a fundamental skill for navigating the world. Data literacy isn’t only reading charts; it’s being able to navigate the entire ecosystem that creates and leverages data.

I bought a car recently, and the dealer was woefully unequipped for my questions about data creation, storage, and use. Modern cars function more like devices than traditional automobiles. This reality has downstream ramifications for informed consent, a central tenet of industries like financial services and healthcare. But how can I make an informed choice if I can’t navigate the data around it?

5 things data leaders can do 

Here are a few data ethics principles, tips, and reminders that ethics-minded data leaders can follow in their daily practice:

  1. Recognize data as a collected artifact. By its very nature, data is biased because someone chose what to collect. Objectivity ends up meaning that the viewpoint doesn’t upset the most empowered and advantaged group, often at the expense of other groups. 
  2. Remember that data, like interpreting, is fuzzy around the edges. When societies prize hard sciences, this haziness is hard to appreciate. Simple math adds up. Data, however, doesn’t fully capture or expose the whole picture. We must recognize these limitations and build with them in mind. 
  3. Use new tools to guide your efforts. You can deploy Datasheets for Datasets (Gebru et al.), vet your work against a playbook like the one from Ethical OS, and build checks that identify how (not if) your data is biased; a minimal sketch of such a check follows this list. We also must recognize the undue burden this work puts on those most affected. Ayodele Odubela has openly discussed the impact of writing a data ethics book covering the effects of these biases as a Black woman in tech. Data ethics must become a team sport that recognizes where harm is being done and actively works to stop it.
  4. Get involved in advocating for data oversight. A formal body organizes its members. It decides who is in or out and provides avenues towards training, so that over the long term, standard qualifications can be defined. For lawyers, this is passing the bar exam. Interpreters have chosen various iterations of certifications. This can be as broad as possible, or narrowed to specific industries. Data & Society, Data Visualization Society, and others have started this work. Entities like these are the ones that will eventually push for legislation in ways individuals alone cannot.
  5. Read, take courses, and listen. Look for courses that push you out of your comfort zone. Ethics sit between laws and morals and represent decisions you make on someone’s behalf. That is a trusted role not to be taken lightly.
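
As referenced in item 3, here is a minimal sketch of one such bias check: comparing a model’s favorable-outcome rates across groups (a demographic-parity view). The data frame, column names, and values are invented; a real audit should use multiple fairness metrics and involve the communities affected.

```python
import pandas as pd

def positive_rate_by_group(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Share of favorable outcomes per group -- one simple lens on bias."""
    return df.groupby(group_col)[outcome_col].mean()

def demographic_parity_gap(rates: pd.Series) -> float:
    """Largest gap in favorable-outcome rates between any two groups."""
    return float(rates.max() - rates.min())

# Hypothetical loan-approval predictions; columns and values are invented.
df = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B"],
    "approved": [1,   1,   0,   1,   0,   0,   0],
})

rates = positive_rate_by_group(df, "group", "approved")
print(rates)                                       # A: 0.67, B: 0.25
print(f"gap={demographic_parity_gap(rates):.2f}")  # 0.42 -- quantify it, then ask why
```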

Bias bounties are absolutely a start toward recognizing the need to vet work with a broader base of users. For solutions to be truly inclusive and less biased, however, data creators need to fully reflect their user base. These competitions also intersect with conversations about the nature of work: are they outsourcing what could, and should, be a full-time role or even a department? The answers have broader sociological implications that I suspect we’ll see start to play out in the near future.