Season 4: Episode #121
Podcast with Dr. Taha Kass-Hout, Director of Machine Learning and Chief Medical Officer, Amazon Web Services
In this episode, Dr. Taha Kass-Hout discusses Amazon's investments in AI and ML for healthcare, and its work with healthcare and life science organizations across the globe to help them make sense of their health data with a purpose-built machine learning platform.
Taha talks at length about Amazon’s work with leading healthcare organizations and how the Amazon HealthLake platform enables the aggregation and analysis of large data sets. He also talks about the current state of AI and ML, the opportunity to analyze unstructured data, and the big gap in the acceptance of AI/ML due to issues such as algorithmic bias that must be addressed in applying AI/ML to healthcare. Take a listen.
[01:58] Tell us about your role as the Director of Machine Learning and Chief Medical Officer at AWS
[05:40] What is the current state of AI and ML in healthcare?
[11:02] Tell us about your machine learning use cases.
[15:16] From the Amazon HealthLake perspective, what is the state of the union of the data landscape?
[20:37] Where is the big gap in the acceptance of AI/ML, and what issues do we need to consider as we start applying these tools in the healthcare context?
[26:41] How do you support all the different healthcare bets Amazon is making (Amazon Care, Alexa Voice Service, HealthLake) through your machine learning capabilities?
About our guest
Taha Kass-Hout, MD, MS is Director of Machine Learning and Chief Medical Officer at Amazon Web Services, and leads our Health AI strategy and efforts, including Amazon Comprehend Medical and Amazon HealthLake. He works with teams at Amazon responsible for developing the science, technology, and scale for COVID-19 lab testing, including Amazon’s first FDA authorization for testing our associates—now offered to the public for at-home testing.
A physician and bioinformatician, Taha served two terms under President Obama, including as the first Chief Health Informatics Officer at the FDA. During this time as a public servant, he pioneered the use of emerging technologies and the cloud (the CDC’s electronic disease surveillance) and established widely accessible global data sharing platforms: openFDA, which enabled researchers and the public to search and analyze adverse event data, and precisionFDA (part of the Presidential Precision Medicine Initiative). Taha holds Doctor of Medicine and Master of Science in biostatistics degrees from the University of Texas and completed clinical training at Harvard Medical School’s Beth Israel Deaconess Medical Center.
Q. Taha, you’ve got an interesting background across the government, private sector, and health systems. Tell us about your role and responsibilities.
Taha: My role at Amazon spans bridging tech, science, and medicine to help develop the right technology services and enable customers to solve their problems. In my current role, I really enjoy working with scientists, engineers, and product managers even as I interface very directly with customers across health care, life sciences, and genomics of all sizes, from startups to academia to large Fortune 500 companies. All of them are trying to help solve concrete problems for patients, consumers, and health systems, or introduce better ways about how they can operate more efficiently or design better systems.
Q. Tell us about your time with the government.
Taha: Before coming to Amazon, I was at the Food and Drug Administration (FDA) during Obama’s second term. As the first Chief Health Informatics Officer, my role revolved around using big data, the cloud, and machine learning to spur innovation in industry.
I also looked at how the FDA could ensure product safety and efficacy on the market in a way as to enable advancements in technologies and the cloud to help medical reviewers even as I worked with industry, medical device companies, pharmaceutical companies, and regional health companies. Not only would this help them innovate, but also ensure safe and effective medical products.
During my last couple of years at the FDA, I was part of the core team collaborating with the NIH on President Obama’s Precision Medicine Initiative. Part of that was introducing precisionFDA to help industry better benchmark next-generation sequencing and the machine learning and AI algorithms coming to market using a standards-based approach. How can you ensure accuracy and reproducibility in a way that also advances regulatory science?
I have a unique background, being both a clinician (an interventional cardiologist by training) and a statistician with a lot of depth in applications: population surveillance, clinical trials, and bringing innovation to big data, whether for disease surveillance, post-market analysis, or monitoring.
I’ve done the whole lifecycle then, from dreaming up something to bringing it to reality, and advancing those therapeutics. It’s really great to be at Amazon because we like to think of big problems and how we can solve them for these customers. I bring that perspective and the level of depth with these customers into working with the engineers and scientists to craft the right strategy and understand how we can go deep into solving those problems.
Q. Tell us a little about how Amazon is really helping your customers specifically in the healthcare space. Also, what are your thoughts, at a very high level on the current state of AI, ML, and healthcare? Where are we seeing the big wins?
Taha: Machine learning is transformative, perhaps one of the most transformative technologies we’ve seen. It uses data to build algorithms that allow computer-based systems to generate models for meaningful interpretation in health, including potential clinical use. The dust has settled in a number of areas of machine learning; with natural language processing, for example, the better algorithms now deliver high accuracy. You can imagine how important this is for prediction tasks and pattern recognition.
If you look at health data, the major part is unstructured: data comes in the form of images, notes, and signals. ML is really well suited to the sequential and unstructured data encountered in the health space. Today we see organizations across the globe, from the largest healthcare providers, payers, and health IT vendors to the smallest system integrators, applying machine learning services at scale to improve patient outcomes and accelerate decision making.
You saw the digitization of medical records over the last decade. We’ve gone from something like 15% digitization five or six years ago to about 98% of all records being captured in digital form today, though some data may still sit in paper charts. With that comes a really compelling business opportunity in value-based care. As the health system moves toward quality of care and measurable outcomes, you have more data with which to drive decisions. This is where ML paired with data interoperability can help uncover ways to enhance patient care, improve outcomes, and ultimately save lives, while simultaneously driving operational efficiencies that lower the overall cost of care by enabling secure access to health data and supporting healthcare providers with predictive machine learning models.
Life science companies, pharma and biotech, are learning how to forecast future events like stroke, cancer, and heart attacks and to conduct early interventions with personalized care and a superior patient experience. They’re designing better therapeutics and fast-tracking the drug discovery cycle, so what used to take ten years can be done in a matter of weeks or months.
It’s similar with vaccines, cancer therapeutics, and medical devices, and it’s what we work on at Amazon Web Services. Amazon pioneered the cloud, and we provide our healthcare and life science customers with the broadest and deepest set of purpose-built AI/ML services on top of the most comprehensive cloud, including data storage, security, analytics, compute services, and beyond. And as you’ve seen with our health AI services, there are now purpose-built services for the health industry, such as Amazon Comprehend Medical, which can detect and extract information (medications, conditions, and more) from medical notes and radiology reports, structure it, and map it to the right ontology, offering insights with full transparency and high accuracy.
Amazon HealthLake lets you store, index, and analyze this massive amount of information at scale and in a matter of minutes. We have a number of other services as well, which offer consistent data transparency and controls to protect patient privacy. We want these customers to be able to make sense of their vast troves of health data and, simultaneously, to support their machine learning workflows on that data. We are committed to developing fair and accurate AI/ML services and providing the tools and guidance these customers need to build responsible AI and ML applications.
Q. A lot of healthcare organizations are moving to the cloud for a variety of reasons, analytics being one. Can you share one or two examples of how your machine learning capabilities and tools have made a difference, along with the use cases involved?
Taha: Let’s talk about maybe two use cases: one on operational efficiency, where we see a lot of ML traction, and one on analytics.
With regard to operational efficiency, Harvard’s Beth Israel Deaconess Medical Center uses deep learning models built on Amazon SageMaker, our end-to-end product for developers and scientists to build, train, and deploy ML models, detect bias in the process, and monitor models in production. With it, they were able to optimize the schedules of their 41 operating rooms to improve patient flow in inpatient settings. They also use Amazon Comprehend Medical because, as you can imagine, a regional hospital receives a lot of patients referred for operations and beyond. These patients come with documentation, and sifting through all of it to extract key medical terms, from comorbidities and prior procedures to blood type and more, is where Amazon Comprehend Medical, a HIPAA-eligible purpose-built service, comes in: it understands the context of medical text, extracts the meaning, and uses it to identify the history and physical information that’s really needed before a procedure. That’s one example where a health system was able to realize operational efficiency, translate it into dollar savings, align schedules between surgeons and patients, and give patients a better experience.
The service also enabled surgeons to have more meaningful schedules. On the analytics side, we’re really excited about the use case with Rush University Medical Center. We worked with them to create a cloud-based analytics hub using the Amazon HealthLake I just mentioned. This hub allows them to securely analyze patient admissions, discharges, and hospital capacity in real time so they can provide care to the most critically ill patients.
They use predictive models around social determinants of health across Chicago to help identify gaps in care before they happen. This is a great example of how they’re able to bring all that information together, organize and index it via HealthLake, and then layer on analytics to identify those at risk. There are also additional data sources outside the health system, such as blood pressure monitors, which offer a more complete picture of care for the Chicago metropolitan population.
Q. That’s a great example. However, healthcare has a fragmented data landscape. What’s your approach to sorting through the plethora of data sources?
Taha: Healthcare organizations are capturing huge volumes of patient information in medical records every day, but this data is really not easy to use or analyze. As a matter of fact, 97% of this information today is not being used at the point of care, since it’s unstructured in nature and trapped in lab reports, insurance claims, clinical studies, recorded conversations, X-rays, doctor notes, and more. The process of extracting this information has been fairly labor-intensive and error-prone, not to mention the cost and operational complexity, which are challenging for most organizations.
We’re finding that every healthcare provider, payer, and life science company is trying to remove this obstruction to data, because doing so can enhance clinical decision support, improve clinical trials, ensure operational efficiency, and even identify population health trends early. The majority of this medical data today is also stored in various forms, formats, and systems that are not exposed through application programming interfaces (APIs) or microservices. Organizations are still trying to deal with that, but the impact is palpable. I mentioned a couple of examples, one on a population level, where Rush University Medical Center is working toward better insights into its population.
There’s also Harvard’s Beth Israel Deaconess Medical Center, which is realizing better operational efficiencies through machine learning. Yet even at the point of care today, the most widely used clinical models, such as predicting someone’s heart risk, are built from commonly available variables with very simple features: about 10 to 30 data points. We must get to the level of truly offering patients what they really need. Take the most common conditions, like diabetes or depression: among diabetic patients, only about 10% are similar to one another. Working out the therapeutic options and what’s best for an individual patient often takes a while just to understand, from a data-driven approach, what really might work for them, rather than taking a broad-stroke approach.

A patient’s medical record contains at least 200,000 to 300,000 data points, including the medical notes, and almost none of that is used to manage patients and predict their outcomes. The reason you want all this data to come together and be organized at the point of care is to build better, more accurate predictions. This is really why we introduced Amazon HealthLake: to help customers address these challenges by storing information in a structured form and organizing it in a way that enables better analytics built on more complete information about each patient. Over the last five to six years, the community has been developing a standard called Fast Healthcare Interoperability Resources, or FHIR. It is amazing for exchanging data in a structured way, and it’s a great lexicon and standard for healthcare data.
However, the majority of the data is still unstructured, so you need to be able to index that information, and this is where Amazon HealthLake really comes in. We have machine learning models trained to help these organizations automatically normalize, index, and structure this data, bringing the information together into a complete view of a patient’s entire medical history. This makes it easier for providers to understand relationships and progression and to make comparisons with the rest of the population to drive better patient outcomes and increase operational efficiencies. It also brings the power of machine learning to this kind of problem, enabling the design of better cohorts, better dashboards to monitor and compare patients, and personalization at the individual level, such as predicting disease onset and beyond.
When we bring in this massive amount of unstructured information, we use machine learning capabilities integrated within HealthLake to understand the medical context, extract the information, and augment the records. Then every data point on the timeline is mapped into the FHIR standard, which is helpful when you’re storing and exchanging this information.
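To make the mapping concrete, here is a minimal sketch of how one extracted data point might be wrapped in a FHIR R4 Condition resource. The helper function, patient identifier, and field values are hypothetical illustrations, not HealthLake’s actual output, which follows the full FHIR specification.

```python
# Illustrative sketch: wrapping one extracted clinical finding in a minimal
# FHIR R4 Condition resource. All identifiers and values are hypothetical.

def to_fhir_condition(patient_id, icd10_code, display, onset_date):
    """Build a minimal FHIR Condition resource for an extracted condition."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {
            "coding": [{
                # Standard FHIR system URI for ICD-10-CM codes
                "system": "http://hl7.org/fhir/sid/icd-10-cm",
                "code": icd10_code,
                "display": display,
            }]
        },
        "onsetDateTime": onset_date,
    }

condition = to_fhir_condition("example-123", "E11.9",
                              "Type 2 diabetes mellitus", "2020-03-01")
print(condition["resourceType"])  # Condition
```

Because every data point lands in the same resource shape, downstream analytics can query conditions, medications, and observations uniformly across the patient timeline.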
Q. From all indications, now there’s great acceptance of AI algorithms in enabling clinical care. You mentioned Rush and Beth Israel but there may be others too. Where do you think is a big gap in the acceptance? What are some of the issues we need to be thinking about as we start applying ML in a health care context?
Taha: You mentioned data quality, and of course there’s bias that comes with it. We’re past the hyperbole of what ML is; applications around natural language processing and pattern recognition are enabling better predictions, and we’re seeing customers across life sciences and healthcare really benefit. The power of machine learning comes from applying it across the entire end-to-end data strategy: from data annotation, to understanding any biases in the information, to data wrangling, putting all this information together and leaning on machine learning. In healthcare, that has to work with the large majority of data being unstructured. This is why we have Amazon Comprehend Medical: its natural language capabilities help us understand the medical context, extract medical entities, and then map those entities to standards. Healthcare data is not only multimodal but also highly contextual.
There are codes, for instance: diseases have standards like the ICDs, and drugs, whether generic or branded, have whole formularies around them. It’s enormous. How is the machine learning purpose-built and pre-trained to understand this information? How does it know that this is family history, that this is a negation, that there’s an anatomical structure, and that the information can be extracted with full transparency, with a relationship derived between this condition and this medication? How does it know medication structures, dosage, and more?
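A small sketch can show why negation awareness matters when consuming extracted entities. The response below mirrors the general shape a medical NLP service such as Amazon Comprehend Medical returns (entities with categories, traits, and attributes), but the specific values are hypothetical, and this is not a live API call.

```python
# Sketch: filtering negated findings out of an entity-extraction response.
# The response shape loosely mirrors medical NLP output (entities carrying
# traits such as NEGATION); all values here are hypothetical examples.

response = {
    "Entities": [
        {"Text": "chest pain", "Category": "MEDICAL_CONDITION",
         "Traits": [{"Name": "SYMPTOM", "Score": 0.97}]},
        # "denies shortness of breath" -> the entity is flagged as negated
        {"Text": "shortness of breath", "Category": "MEDICAL_CONDITION",
         "Traits": [{"Name": "NEGATION", "Score": 0.99}]},
        {"Text": "aspirin", "Category": "MEDICATION",
         "Attributes": [{"Type": "DOSAGE", "Text": "81 mg"}]},
    ]
}

def present_findings(entities):
    """Keep only entities that are not flagged with a NEGATION trait."""
    return [e["Text"] for e in entities
            if not any(t["Name"] == "NEGATION" for t in e.get("Traits", []))]

print(present_findings(response["Entities"]))  # ['chest pain', 'aspirin']
```

Dropping (or separately recording) negated findings is what keeps a note like "patient denies shortness of breath" from being indexed as an active symptom.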
We’re really removing the obstruction so customers can structure this information in the first place and connect it with outcomes, and that’s what you need to look at when you talk about machine learning. I look at it as an end-to-end data strategy: from data prep, to building the models, to deploying them, to monitoring them in the wild. There’s no one model that you can put out there and expect to work forever. Do these models degrade over time?
Take one machine learning model, say, a radiology assist model at one hospital. It’s trained on that hospital’s data; then you take the same model across the street to another hospital acquired by the health system. The acquired hospital is still using the old ICD-9 coding system instead of ICD-10, and suddenly your sepsis model no longer works. These are the technical biases that come into the data.
At AWS, we are committed to developing fair and accurate machine learning services and providing the tools and guidance needed so these applications are done responsibly in the first place; this is where we’re making a lot of material investment. Part of the journey of democratizing machine learning to the masses at scale is also about ensuring privacy and detecting bias. Bias isn’t only a data-quality problem; imbalances in the data create disparities in the performance of these models across different demographics.
This is also an area where machine learning is of tremendous help in mitigating bias: detecting potential bias during data preparation and wrangling, and then in your deployed model. As you examine specific attributes, you’ll be able to open the black box and see which features are influencing the output. Some features could be driving the output even though we haven’t looked at them, because not every feature that goes into the model is a true predictor. There’s contamination as well, and this can be where different kinds of biases start appearing in the output.
Then, of course, the monitoring aspect via human review becomes so important. It helps you understand model behavior once the model is deployed. Today, if you come up with a new drug, you design a clinical trial, but you don’t design it for the entire population of the world; you design it for a population where you control for every variation and variable. Then you put the drug out in the world, and post-market surveillance monitors for adverse events. Imagine now you have all the tools necessary working for you; that is really what we package.
With machine learning you don’t design one or two models; typically, you build hundreds or thousands of them until you get to the best-performing one. But you’ll have to continuously monitor your leaderboard, because the data is going to drift and the model is going to drift, say, as you apply it to heart failure predictions in one population versus another region with a different population makeup. You have to constantly iterate and develop an agile way to do that.
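The drift monitoring described above is often quantified with a statistic such as the population stability index (PSI), which compares a feature’s distribution in training data against live traffic. Here is a minimal sketch of that idea; the data, bin count, and the 0.2 alert threshold are illustrative conventions, not a prescription from the source.

```python
# Sketch: detecting distribution drift with the population stability index
# (PSI), a common drift metric. Data and thresholds are illustrative.
import math

def psi(expected, actual, bins=10):
    """PSI between a training sample (expected) and a live sample (actual)."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch live values above the training range

    def frac(sample, i):
        count = sum(1 for x in sample if edges[i] <= x < edges[i + 1])
        return max(count / len(sample), 1e-6)  # avoid log(0) on empty bins

    return sum((frac(actual, i) - frac(expected, i))
               * math.log(frac(actual, i) / frac(expected, i))
               for i in range(bins))

train = [0.1 * i for i in range(100)]        # training-time distribution
live = [0.1 * i + 3.0 for i in range(100)]   # shifted live distribution

print(psi(train, live) > 0.2)  # True: PSI above ~0.2 often triggers retraining
```

In practice a check like this runs per feature on a schedule, and a sustained breach of the threshold is what bumps a model off the leaderboard for retraining.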
Q. What are the different healthcare bets that Amazon is making? You’ve got Amazon Care, Alexa Voice Service, HealthLake, SageMaker, Comprehend Medical — How do you support all of these? Tell us about that.
Taha: I can only speak about my role within it. We build the technologies and services to help solve a lot of these problems for healthcare providers, payers, life science companies, and biotech, entities of all sizes and levels of complexity. That’s our goal and the material investment we’re making. We want ML to be something anyone can pick up, and it’s important to break open the black box, remove the complexity, and do the heavy lifting for a lot of these customers.
No matter who is building what for whom with machine learning, AI, and other transformational technologies, we want to give the right guidance and build these the right way, the responsible way. That’s our approach on the AWS side. We also partner with a lot of healthcare providers and customers, because we see many repeated use cases across the board, which is what lets us understand the heavy lifting and is why we started building those services in the first place.
Q. Would it be fair to say that even Amazon Care is an internal customer for some of your services, just like a Beth Israel or a Rush or any of those healthcare providers?
Taha: I can’t talk about Amazon Care. We have to think about Amazon Web Services as a cloud provider first; whether a customer is internal or external comes later. Customers are going to have a lot of common problems, and that’s exciting for us, because we can think hard about the heavy lifts they face and start solving for those. The last few years have been exciting on that side, building those purpose-built services.
These services are pre-trained on the medical context, whether that’s Amazon Comprehend Medical, Amazon Transcribe Medical for understanding medical transcriptions, or Amazon HealthLake for indexing patient information at scale, so you can build dashboards and cohorts and create prediction models, whether for operational efficiencies, improving outcomes, reducing biases, or closing gaps in care.
Today, over 4 billion people don’t have access to care, let alone high-quality care. I do believe that AI and technology have to be part of the future that closes such gaps in care, enables access, and provides more equitable solutions. Innovations in precision medicine, APIs for data and system interoperability, intelligent scribes, and others are components that can be part of the solution to offering care to the world more accountably.
Disclaimer: This Q&A has been derived from the podcast transcript and has been edited for readability and clarity.
About the host
Paddy is the co-author of Healthcare Digital Transformation – How Consumerism, Technology and Pandemic are Accelerating the Future (Taylor & Francis, Aug 2020), along with Edward W. Marx. Paddy is also the author of the best-selling book The Big Unlock – Harnessing Data and Growing Digital Health Businesses in a Value-based Care Era (Archway Publishing, 2017). He is the host of the highly subscribed The Big Unlock podcast on digital transformation in healthcare featuring C-level executives from the healthcare and technology sectors. He is widely published and has a by-lined column in CIO Magazine and other respected industry publications.