Does your data science team worry about aligning your work with your values? Data science and machine learning provide new capabilities, and with those come new ethical responsibilities.
At Retina, we developed four principles of data science ethics to make sure we serve our clients and their customers with respect and fairness. Our principles are inspired and informed both by professional guidelines* refined over decades and by contemporary experts like former US Chief Data Scientist DJ Patil.
1. Protect Individual and Company Privacy
Over 5 billion records of personal information were exposed through data breaches in 2019, reportedly costing businesses trillions of dollars. As people move more of their daily lives online, from sharing what's for lunch on social media to managing sensitive information like bank accounts and credit cards, hackers have ever more to gain by targeting a company's records about its customers and prospects.
The other side of the coin is that each of these breaches can cause personal harm, like identity theft, to real people. It's important to have a comprehensive plan to protect and secure user data, but also to recognize that flaws in that plan are always possible. Developing a mechanism for redress when people are harmed by the results or implementation of a model is therefore critical, and it is required to comply with regulations like the California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR).
2. Account For and Remove Unfair Data Bias
Data breaches aren't the only way models can cause harm: models trained on biased data can make unfair decisions about people's opportunities, livelihoods, and more. In one example, an unsupervised model that determined whether an applicant qualified for a low insurance rate took the applicant's race into account.
Some unsupervised machine learning algorithms operate like a black box, so it's difficult to know up front whether the resulting model discriminates based on characteristics like race or sex. We conduct research in advance to identify possible bias in the data that will serve as the basis of the model. After a model is complete, we also test it for fairness and for disparate error rates among different user groups as another safeguard against unfair bias.
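To illustrate what a disparate-error-rate check can look like, here is a minimal Python sketch. The data, group labels, and tolerance are hypothetical examples, not a description of Retina's actual test suite:

```python
from collections import defaultdict

def error_rates_by_group(y_true, y_pred, groups):
    """Misclassification rate per user group (group labels are hypothetical)."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        errors[group] += int(truth != pred)
    return {g: errors[g] / totals[g] for g in totals}

def max_disparity(rates):
    """Largest gap in error rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Hypothetical predictions: group "B" is misclassified twice as often as "A".
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = error_rates_by_group(y_true, y_pred, groups)
print(rates)                 # {'A': 0.25, 'B': 0.5}
print(max_disparity(rates))  # 0.25 -- flag the model if this exceeds a set tolerance
```

A check like this can run automatically after training, failing the build whenever the gap between the best- and worst-served groups exceeds an agreed threshold.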
3. Ask About The Use Cases of End Results
When building a model and shipping its results, we always consider both the people using the insights and those affected by the results, and we work in the best interests of both groups. We have open conversations with our clients about their intended use cases for the results we ship so we can serve them and their customers ethically. At a glance, it might seem hard to imagine the interests of these groups diverging, but consider political campaigns that use manipulative advertising strategies to sway voters to behave differently than they would if served honest marketing campaigns.
At Retina, we're in the business of helping our clients better understand the individuals who make up their target markets so they can optimize their marketing campaigns, whether for attracting new customers or retaining existing ones. We help clients hone their acquisition targeting toward customers who are likely to spend more, so they spend less of their advertising budgets on people unlikely to be interested in their products and services.
That means their customers receive more relevant marketing material, and fewer ads for products they would never want to buy. To the best of our knowledge, the outputs of our models do not harm our clients or their customers by supporting manipulative use cases.
4. Ship High Quality and Accurate Models
Clean data and solid methodology are equally important for building and implementing accurate, high-quality models. Making predictions from an inaccurate customer lifetime value model can lead to marketing budgets that are out of scale with the revenue a customer is actually likely to generate over their lifetime. Beyond wasted spend, inaccurate models can also pose ethical concerns.
We hold ourselves to high internal standards for how trustworthy the results of our models must be, and we refuse to deliver models or insights that don't make the cut. This responsibility doesn't end after a model is in production: we test for model drift to ensure our software remains fair over time, and we're always thinking ahead about how to shut a model down in production in the unlikely event it starts behaving badly. In most cases, a full stop is not required, but it's still our policy to be fully transparent with our clients if we discover issues with our results and to work together to resolve them.
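One common way to quantify model drift is the Population Stability Index (PSI), which compares the distribution of scores a model produced at training time with the distribution it sees in production. The sketch below uses conventional equal-width bins and the widely cited 0.2 alert threshold; it is an illustration of the general technique, not Retina's actual monitoring pipeline:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two lists of scores in [0, 1]."""
    def proportions(scores):
        counts = [0] * bins
        for s in scores:
            # Clamp so a score of exactly 1.0 lands in the last bin.
            counts[min(int(s * bins), bins - 1)] += 1
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(scores), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical scores: the production distribution has shifted upward.
train_scores = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6]
prod_scores = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.9]

drift = psi(train_scores, prod_scores)
print(drift > 0.2)  # True -- a PSI above ~0.2 is often treated as significant drift
```

Running a check like this on a schedule makes drift visible early, so a conversation with the client can happen well before a model would need to be pulled from production.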
How Do We Implement Our Data Science Code of Ethics?
Our commitment to transparency with our clients helps us stay accountable, but our employees are the greatest safeguard of our principles. We include ethical case questions during our interview process and focus on the importance of these tenets during our onboarding process. That means every member of the Retina team commits to delivering trustworthy results to clients.
Professional Guidelines Referenced*
- Code of Ethics – Association for Computing Machinery (1993)
- Ethical Guidelines – American Statistical Association (1983)
- Ethics and Data Science – DJ Patil