April 20, 2022

Ethics Around Implementing ML in Healthcare

By Christina Bernard

Machine learning (ML) can improve healthcare, but many people are hesitant to have their healthcare data included in an ML model.

One reason is that there are no clear legal guidelines around patient privacy and safety when technical workers view and use patient information. Another concern is that a computer-generated diagnosis may have no transparent medical foundation behind the decision, which makes doctors wary of diagnostic outcomes. After all, no one wants to be misdiagnosed.

HIPAA compliance regulations are the only guidelines on how to implement certain aspects of server setup, data storage, data transmission, and user access control. However, that alone doesn't protect the identifying factors in the data, like name, address, and Social Security number.

Currently, bills are being proposed to strengthen collaboration between government and industry to determine what practices can better protect consumers. 

In this blog, we’ll take a look at three key areas of ethics that you and your customers will be concerned about: Transparency, Privacy, and Security. 

Transparency

Patients should be informed about how their data is being used. With the rise of fitness trackers and similar health technologies, patients are becoming more informed about how data can help improve wellbeing.

To protect patients, clinics must be intentional and cautious when deciding what data is needed for data science experimentation. A simple export of patient data not only creates privacy issues but also denies the patient knowledge of how their data is being used.

In 2015, a set of images from a webcam in a San Francisco cafe was used in research by Stanford University scientists to develop a computer vision model that identified a person in a crowded scene. While this model was not developed in the healthcare field, it had huge implications for consent and the use of personal data.

While Stanford did use images that were being streamed online, concern arose because the people had not consented to their data being used in research or streamed at all. If using basic images of customers in a cafe without consent is highly questionable, then there should be even more concern about the private medical history of individuals. Inappropriate use of private medical history could result in serious discrimination.

The counterpart to patient transparency is doctor transparency. Clearly informing doctors or other stakeholders of model limitations and performance discrepancies can help ensure that patients are not negatively impacted. Like all models, healthcare models are subject to bias. 

If a model is built on only a subset of data, the sample may not be representative of the broader population. For example, a model trained on data from a clinic that sees mostly diabetic patients will likely not work well on cancer patients. The result can be an overfit model, which in healthcare could lead to adverse outcomes. If these limitations are not communicated and well understood, the result could be a large number of misdiagnoses.

Privacy

Privacy has always been a hot-button topic when it comes to medical data. There are questions around how algorithm owners can access patient information. HIPAA compliance has regulations around user access control, but what about the need to perform exploratory data analysis (EDA)? Data scientists like to get an understanding of the underlying data through EDA. 

During this process, the question arises whether data scientists have a strong enough reason to view personally identifiable information (PII). And if there is a strong enough reason for them to access that information, can they do so in a secure manner?

Steps can be taken to remove private data from PII through anonymization. This could mean removing identifiers, like names and addresses, and replacing them with hashes. A hash is a fixed-length string of characters produced deterministically from the input, so the same name always maps to the same hash while remaining unreadable on its own. Another method is pseudonymization, which replaces sensitive data with artificially generated values.
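As a minimal sketch of the hashing approach, here is how identifying fields might be replaced with fixed-length hashes using Python's standard library. The record fields and the salt value are hypothetical, chosen only for illustration:

```python
import hashlib

# Hypothetical per-project salt, which would be stored separately
# from the data to make dictionary attacks harder.
SALT = "clinic-project-2022"

def hash_identifier(value: str) -> str:
    """Replace an identifier with a deterministic, fixed-length SHA-256 hash."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

record = {"name": "Jane Doe", "address": "123 Main St", "glucose": 112}

# Hash only the identifying fields; leave clinical values untouched.
anonymized = {
    key: hash_identifier(val) if key in {"name", "address"} else val
    for key, val in record.items()
}
```

Because the hash is deterministic, records belonging to the same patient still join correctly after anonymization, even though the name itself is no longer visible.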

In pseudonymization, the real data is replaced with artificial data in the ETL process, while the real data is retained in a separate, secure location. This could look like scrambling the data, or blurring it by multiplying it by a random factor or adding a random number. The trends in the data remain the same, so data scientists should still be able to create a high-functioning model with this version of the data. But is that enough?
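The process above might look like the following sketch: real patient IDs are swapped for generated pseudonyms, with the mapping kept in a separate secure store, and a numeric field is blurred with a small random offset. The IDs, field names, and blur range are all hypothetical:

```python
import random

random.seed(42)  # fixed seed so this sketch is reproducible

patients = [
    {"patient_id": "MRN-1001", "age": 54},
    {"patient_id": "MRN-1002", "age": 61},
]

# In practice this mapping would live in a separate, secured location,
# so the real IDs can be recovered only by authorized staff.
pseudonym_map = {}

def pseudonymize(real_id: str) -> str:
    """Assign a stable artificial ID for each real patient ID."""
    if real_id not in pseudonym_map:
        pseudonym_map[real_id] = f"P{len(pseudonym_map) + 1:04d}"
    return pseudonym_map[real_id]

blurred = [
    {
        "patient_id": pseudonymize(p["patient_id"]),
        # Blur the age by adding a small random number.
        "age": p["age"] + random.randint(-2, 2),
    }
    for p in patients
]
```

The blurred dataset keeps the overall age distribution roughly intact while no longer exposing exact values or real identifiers.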

In response, differential privacy techniques have surfaced to combat re-identification attacks. This method intentionally introduces calibrated random noise at different stages of model development, so that the output reveals very little about any single individual.
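A common building block is the Laplace mechanism: noise drawn from a Laplace distribution, scaled to the query's sensitivity and a privacy parameter epsilon, is added to the true answer. The sketch below applies it to a simple count query; the count value and epsilon are illustrative:

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample from Laplace(0, scale) via the inverse-CDF method."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # A count changes by at most 1 when one person is added or removed,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy = private_count(true_count=128, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means a more accurate answer but weaker protection for any individual patient.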

Another layer on top of the above techniques is homomorphic encryption. This allows certain computations to be performed directly on encrypted data: when the result is decrypted, it matches the result of performing the same operation on the plaintext. It is more technically challenging to implement, but it is considered the gold standard in information security.

Simple brute-force methods cannot crack it. Successful implementations of this method have been completed on neural networks. However, research and best practices are still underway to make it more widely applicable.
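The homomorphic property can be illustrated with a toy example. Textbook RSA (without padding, which makes it insecure in practice but keeps the math visible) happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. The tiny primes below are purely illustrative and far too small for real use:

```python
# Toy textbook-RSA parameters — illustration only, never use in production.
p, q = 61, 53
n = p * q                   # modulus: 3233
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent
d = pow(e, -1, phi)         # private exponent (modular inverse, Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

c1, c2 = encrypt(7), encrypt(3)

# Multiply the ciphertexts without ever decrypting them...
product_cipher = (c1 * c2) % n

# ...and the decryption is the product of the original plaintexts.
assert decrypt(product_cipher) == 21
```

Fully homomorphic schemes extend this idea to both addition and multiplication, which is what makes it possible to evaluate a model on data that stays encrypted end to end.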

Security

Data security is about ensuring that the data itself is securely housed. You may be familiar with HIPAA compliance, HITRUST compliance, or SOC 2 compliance. These security protocols ensure that the right people have access to the data, that access is being logged, and that logging into the system is secure.

The data also needs to be backed up properly, which means more than a backup drive on the same machine. It involves replicating the data across multiple, equally secure locations. These protocols are becoming more widely adopted because they can help mitigate the effects of cyberattacks.


Ethical AI is being explored by organizations around the world, and healthcare continues to be at the forefront of many of those discussions. Tech companies are trying to determine how to balance consumer privacy with business goals and strategies. Groups are forming across the world to discuss the implications.

While precautions must be taken when implementing AI in healthcare, there are still tremendous benefits of doing so. If you are interested in learning more about this topic or how to implement AI in healthcare at your organization, be sure to download our free and comprehensive guide on the subject.
