Four Principles of Data Science Ethics at Retina

Does your data science team worry about aligning your work with your values? Data science and machine learning provide new capabilities, and with those come new ethical responsibilities.

At Retina, we developed four principles of data science ethics to make sure we serve our clients and their customers with respect and fairness. Our principles are informed by professional guidelines* that have been refined over decades, and by contemporary experts like former US Chief Data Scientist DJ Patil.

1. Protect Individual and Company Privacy

Over 5 billion records of personal information were exposed through data breaches in 2019, costing businesses trillions of dollars in resulting damages. People are taking more aspects of their daily lives online, from sharing what’s for lunch on social media to more sensitive activities like managing bank accounts and credit cards. As a result, hackers have much more to gain by attempting to access a company’s records about its customers and prospects.

The other side of the coin is that each of these data breaches can cause personal harm—like identity theft—to real people. It’s important to have comprehensive plans to protect and secure user data, but it’s also critical to understand that flaws in these plans are always a possibility. So, it’s essential to develop a mechanism for redress if people are ultimately harmed by the results or implementation of a model. Such a mechanism is also required for compliance with regulations like the California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR).

2. Account For and Remove Unfair Data Bias

Data breaches aren’t the only way models can cause harm—models based on biased data can make unfair decisions about people’s opportunities, livelihoods, and more. In one example, an unsupervised model for determining whether a person should qualify for low-rate insurance took the applicant’s race into account.

Some unsupervised machine learning algorithms operate like a black box. So, it’s difficult to know up front whether the resulting model discriminates based on characteristics like race or sex. That’s why we conduct research in advance to identify possible bias in the data that will serve as the basis of the model. We also test completed models for fairness and disparate error rates among different user groups. This serves as another safeguard against unfair bias.
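To illustrate what a disparate-error-rate check can look like, here is a minimal sketch in plain Python. This is an illustration of the general technique, not Retina’s actual tooling, and the labels, predictions, and group names are hypothetical:

```python
from collections import defaultdict

def error_rates_by_group(y_true, y_pred, groups):
    """Compute false-positive and false-negative rates per group.

    A large gap between groups is a signal to investigate the
    model and its training data for unfair bias.
    """
    stats = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        s = stats[group]
        if truth == 1:
            s["pos"] += 1
            if pred == 0:
                s["fn"] += 1
        else:
            s["neg"] += 1
            if pred == 1:
                s["fp"] += 1
    return {
        g: {
            "fpr": s["fp"] / s["neg"] if s["neg"] else 0.0,
            "fnr": s["fn"] / s["pos"] if s["pos"] else 0.0,
        }
        for g, s in stats.items()
    }

# Hypothetical labels, predictions, and group memberships:
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = error_rates_by_group(y_true, y_pred, groups)
```

In this toy data, group "a" has a higher false-negative rate than group "b", which is exactly the kind of gap a post-training fairness audit is meant to surface.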

3. Ask About The Use Cases of End Results

When building a model and shipping its results, we always consider both the people using the insights and those affected by the results. We are always sure to work in the best interests of both groups. We have open conversations with our clients about their intended use cases for the results we ship so we can serve them and their customers ethically. At a glance, it might seem challenging to think of an example where the interests of these groups would diverge. But, some political campaigns use manipulative advertising strategies to sway voters to behave differently than they would if served honest marketing campaigns.

At Retina, we’re in the business of helping our clients better understand the individuals who make up their target markets. This empowers them to optimize their marketing campaigns, whether that’s for attracting new or retaining existing customers. We help our clients hone their customer acquisition targeting to acquire customers who are likely to spend more. Ultimately, this allows them to spend less of their advertising budgets on people who aren’t interested in purchasing.

That means their customers receive more relevant marketing material, and fewer ads for products they would never want to buy. To the best of our knowledge, the outputs of our models do not harm our clients or their customers.

4. Ship High Quality and Accurate Models

Clean data and solid methodology are equally important for building and implementing accurate, high-quality models. Predictions from an inaccurate lifetime value model can lead to marketing budgets that are out of scale with the revenue a customer is likely to generate over their lifetime. Beyond the financial cost, inaccurate models can also pose ethical concerns.
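To make the budget concern concrete, here is a toy calculation. The function name, margin, and ROI target are illustrative assumptions, not Retina’s methodology; the point is only that acquisition spend should be capped by the margin a customer is predicted to generate:

```python
def max_acquisition_spend(predicted_clv, gross_margin, target_roi):
    """Cap what we would pay to acquire a customer at the margin
    they are expected to generate, scaled by the required ROI.

    All parameters are hypothetical inputs for illustration.
    """
    expected_margin = predicted_clv * gross_margin
    return expected_margin / (1.0 + target_roi)

# A customer predicted to spend $500 over their lifetime, at a
# 40% gross margin, with a required 1x return on ad spend:
budget = max_acquisition_spend(predicted_clv=500.0,
                               gross_margin=0.40,
                               target_roi=1.0)
# budget == 100.0: spending more than this would be out of scale
# with the revenue the customer is likely to generate.
```

If the lifetime value prediction is wrong by, say, a factor of two, the budget cap is wrong by the same factor, which is how model inaccuracy translates directly into wasted or misdirected spend.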

We hold ourselves to high internal standards for how trustworthy the results of our models must be. We refuse to deliver models or insights that don’t make the cut. This responsibility doesn’t end after a model is in production—we test for model drift to ensure our software remains fair over time. We always think ahead about how to shut a model down in production in the unlikely event it starts behaving badly. In most cases, issues that arise do not require a full stop in production to resolve. But, it’s still our policy to be fully transparent with our clients if we discover issues with our results and work together to resolve them.
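One common way to test for model drift is the population stability index (PSI), which compares the distribution of current model scores against the distribution seen at deployment. The sketch below is a generic illustration of that metric, not a description of Retina’s production monitoring:

```python
import math

def population_stability_index(expected, actual, n_bins=10):
    """Compare the distribution of current scores ("actual") to the
    distribution seen at deployment ("expected").

    A PSI above ~0.25 is a common rule of thumb for significant drift.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / n_bins or 1.0  # guard against zero-width bins

    def proportions(scores):
        counts = [0] * n_bins
        for s in scores:
            i = min(int((s - lo) / width), n_bins - 1)
            counts[i] += 1
        # Smooth empty bins so the log term stays finite.
        return [(c + 0.5) / (len(scores) + 0.5 * n_bins) for c in counts]

    psi = 0.0
    for e, a in zip(proportions(expected), proportions(actual)):
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical model scores captured at deployment time:
baseline_scores = [0.1, 0.3, 0.5, 0.7, 0.9] * 20
drift = population_stability_index(baseline_scores, baseline_scores)
# Identical distributions give a PSI of 0.0; a rising PSI over time
# is a trigger to investigate before anyone is harmed.
```

A scheduled job that recomputes this index against fresh scores is one simple way to catch a model that starts behaving badly, without waiting for a full stop in production.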

How Do We Implement Our Data Science Code of Ethics?

Our commitment to transparency with our clients helps us stay accountable. But, our employees are the greatest safeguard of our principles. We include ethical case questions during our interview process. We also focus on the importance of these tenets during our onboarding process. That means every member of the Retina team commits to delivering trustworthy results to clients.

Are you interested in an ethical partner to bring the power of data science to your organization? If so, please feel free to explore our solutions further and contact us today.

Professional Guidelines Referenced*

  • Code of Ethics – Association for Computing Machinery (1993)
  • Ethical Guidelines – American Statistical Association (1983)
  • Ethics and Data Science – DJ Patil