Natural Language Processing and Oncology: Unlikely Allies in the Fight Against Cancer
Natural Language Processing (NLP) refers to the ability of computers to parse and interpret human language in its spoken, written, and visual forms, with the ultimate goal of human-level performance. Applications of this technology range widely, from text message prediction to automatic translation, but, perhaps surprisingly, NLP systems have increasingly been adapted to medical research. This piece narrows its focus to oncological research, which, to someone new to artificial intelligence, may seem an unlikely fit for computational linguistics. However, a growing number of studies from around the world apply NLP to cancer research, and it has become clear that computational linguists have valuable skills to contribute to the fight against cancer.
NLP has been in use since the 1950s as the ideal marriage between linguistics and artificial intelligence, but its medical applications are far more recent, and the bulk of their benefits is anticipated in the coming years. In the context of medicine and wellness, NLP becomes what some computational linguists refer to as Medical Language Understanding (MLU). Like many modern NLP methods, MLU systems such as MPLUS rely on probabilistic tools such as Bayesian networks (Christensen et al., 2002). A Bayesian network is a probabilistic model drawn as a directed acyclic graph: each node is a random variable, and each edge records a direct dependency between two variables, quantified by a conditional probability table.
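To make the node-and-edge idea concrete, here is a minimal sketch of a two-node Bayesian network in plain Python. The variables (a hypothetical clinical Condition and whether it is Mentioned in a note) and every probability are invented for illustration; this is not the MPLUS model, only the general technique of factorising a joint distribution along the graph and applying Bayes' rule.

```python
# A two-node Bayesian network: Condition -> Mention.
# Nodes are random variables; the single directed edge says that whether a
# condition is mentioned in a note depends on whether the patient has it.
# All probabilities are invented for illustration, not clinical estimates.

P_CONDITION = {True: 0.10, False: 0.90}  # prior for the parent node

# Conditional probability table for the child node: P(Mention | Condition).
P_MENTION_GIVEN_CONDITION = {
    True: {True: 0.80, False: 0.20},
    False: {True: 0.05, False: 0.95},
}


def joint(condition: bool, mention: bool) -> float:
    """P(Condition, Mention), factorised along the graph's single edge."""
    return P_CONDITION[condition] * P_MENTION_GIVEN_CONDITION[condition][mention]


def posterior_condition(mention: bool) -> float:
    """P(Condition=True | Mention=mention), by enumerating the parent's values."""
    evidence = joint(True, mention) + joint(False, mention)
    return joint(True, mention) / evidence


print(round(posterior_condition(True), 2))  # 0.64
```

Observing the mention raises the probability of the condition from the 10% prior to 64%. Real systems use far larger networks, but the inference idea is the same.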
In medical settings, vast quantities of non-numerical data appear in the Electronic Health Record (EHR): progress notes, nursing notes, admission documents, discharge summaries, preoperative CT scan reports, and other unstructured reports. The goal of any medical NLP project is therefore to reorganize this unstructured data into a structured form that an algorithm can analyze, a process that typically begins by breaking the text into tokens (individual words, numbers, and symbols).
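As a rough illustration of the first step of that reorganization, splitting free text into tokens, here is a deliberately simple regular-expression tokenizer. The pattern and the sample note are assumptions made for illustration; clinical NLP pipelines use far more careful tokenizers that handle abbreviations, dosages, and dates.

```python
import re

# A deliberately simple tokenizer: lowercase the text, then keep runs of
# letters/digits, allowing internal hyphens or periods (e.g. "x-ray", "2.5"),
# and drop all other characters such as spaces and punctuation.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[-.][a-z0-9]+)*")


def tokenize(note: str) -> list[str]:
    """Split a free-text note into lowercase tokens."""
    return TOKEN_PATTERN.findall(note.lower())


note = "Patient reports pain; 2.5 cm liver lesion noted on CT."
print(tokenize(note))
# ['patient', 'reports', 'pain', '2.5', 'cm', 'liver', 'lesion', 'noted', 'on', 'ct']
```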
This brings us to our unlikely alliance in the battle to make cancer a scourge of the past: oncology and language. How marvelous that a pride of phonemes and their interactive rodeo could be the greatest invention of humankind. How marvelous that this act of laying down words beside each other, limb to limb to watch our sentences grow, can combat the abnormal cell growth within. What power.
But language does have this power, this magic. NLP has shown promise for predicting complications and identifying uncontrolled symptoms in cancer patients, in studies by Barber et al. and DiMartino et al. Barber and colleagues focused on postoperative outcomes in women with ovarian cancer. Opening with an overview of how this cancer is treated, the authors write, “Ovarian cancer is treated with radical surgical debulking [removing as much of the tumor as possible], which is associated with high rates of postoperative complications” (Barber et al., 2021). This sets the stage for NLP to analyze reports on the outcomes of these surgeries and their potential complications. The researchers then went a step further, using the same approach to predict hospital readmission.
Drawing on six years of EHR data from the Northwestern Medicine Enterprise Data Warehouse, the researchers identified 291 women who had undergone the debulking procedure. Using logistic regression models on the EHR data, they found terms that were predictive of readmission and other complications, including “liver lesion,” “omental,” and “bilateral ovarian lesions.” Women whose reports contained these terms, which describe the state of the cancer, were more likely to be readmitted to the hospital after surgery. Incorporating these terms improved the prediction of postoperative complications by approximately 25% (Barber et al., 2021).
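The modelling step can be illustrated with a minimal logistic-regression sketch in plain Python. Each note is turned into binary indicators for a few keywords, and batch gradient descent learns one weight per keyword plus a bias. The notes, labels, and keyword list below are hypothetical stand-ins, not the study's data or feature set, and a real analysis would use an established statistics library and far more records.

```python
import math

# Hypothetical toy data: (note text, 1 if readmitted else 0).
# Invented for illustration; these are not records or results from Barber et al.
NOTES = [
    ("liver lesion and omental disease noted", 1),
    ("omental caking present", 1),
    ("small liver lesion seen", 1),
    ("uncomplicated procedure", 0),
    ("no residual disease", 0),
    ("routine recovery", 0),
]

# Assumed keyword list; the study's actual features are not reproduced here.
TERMS = ["liver lesion", "omental"]


def features(note: str) -> list[float]:
    """Binary indicator per term: 1.0 if the term appears in the note."""
    text = note.lower()
    return [1.0 if term in text else 0.0 for term in TERMS]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(note: str, weights: list[float], bias: float) -> float:
    """Model's estimated probability that the note belongs to the positive class."""
    x = features(note)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(data, lr: float = 0.5, epochs: int = 1000):
    """Fit logistic-regression weights and bias by batch gradient descent."""
    weights = [0.0] * len(TERMS)
    bias = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(TERMS)
        grad_b = 0.0
        for note, label in data:
            error = predict_proba(note, weights, bias) - label  # gradient of log-loss
            for j, xi in enumerate(features(note)):
                grad_w[j] += error * xi / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


weights, bias = train(NOTES)
print(dict(zip(TERMS, (round(w, 2) for w in weights))), round(bias, 2))
print(round(predict_proba("new omental nodule", weights, bias), 2))
```

The learned weights are positive for both keywords and the bias is negative, so a note mentioning either term is scored as higher risk than one mentioning neither.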
DiMartino and colleagues (2022) examine cancer care in the hospital setting and aim to improve patients' access to palliative care by using EHR data to identify the patients most likely to qualify. Their approach used separate models for three symptoms: pain, shortness of breath, and nausea/vomiting. The pain model achieved a predictive value of up to 70% for identifying patients with uncontrolled pain, though the results for the nausea/vomiting and shortness of breath models were less promising. The study “adds to…research by illustrating feasibility of an NLP algorithm for identifying advanced cancer patients with uncontrolled pain across multiple cancer types and specifically in the inpatient setting” (DiMartino et al., 2022).
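If the “predictive value” reported here is a positive predictive value (an assumption, since the metric is not spelled out above), it is the share of patients flagged by the model whose symptom is actually confirmed, for example by chart review. The arithmetic is short; the counts below are hypothetical and are not taken from the study.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Share of flagged patients who truly have the symptom: TP / (TP + FP)."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no patients are flagged")
    return true_positives / flagged


# Hypothetical counts, not figures from DiMartino et al. (2022):
# the model flags 50 patients, and review confirms the symptom in 35 of them.
print(positive_predictive_value(35, 15))  # 0.7
```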
As a developing technology, NLP in oncology does have its limitations. Perhaps central among them is overfitting, in which a model fits its training data very closely but nonetheless makes poor predictions when new data are introduced. NLP engineers detect and guard against overfitting with cross-validation. In its simplest form, researchers split their data into two sets, training data and testing data, build the model from the training data, and then use it to make predictions on the testing data to check how well it generalizes. In k-fold cross-validation, this split is repeated k times so that every record is used for testing exactly once. The technique is not specific to NLP; it is common across machine learning (Nadkarni et al., 2011).
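A common refinement of the single train/test split is k-fold cross-validation: the records are divided into k folds, and each fold serves once as the test set while the remaining folds are used for training. The sketch below only produces the index splits; the function name is hypothetical, and fitting and scoring a model on each split (then averaging the k scores) is left to whatever model is being evaluated. An overfit model shows up as a high score on the training indices but a much lower score on the held-out fold.

```python
def k_fold_splits(n_records: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Return (train_indices, test_indices) for each of k folds.

    Records are assigned to folds in order; in practice the indices are
    usually shuffled first (e.g. with random.shuffle) so that each fold
    is a representative sample of the data.
    """
    if not 2 <= k <= n_records:
        raise ValueError("k must be between 2 and the number of records")
    indices = list(range(n_records))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


for train, test in k_fold_splits(10, 3):
    print("train:", train, "test:", test)
```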
NLP technologies also continue to struggle with word and phrase order variation (the same concept expressed in different syntactic constructions), synonymy (rampant in biomedical terminologies), and uncertainty identification (Nadkarni et al., 2011).
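Synonymy is often handled by mapping every surface form to a single canonical concept. The sketch below uses a small, hypothetical hand-written synonym table; real systems typically draw on curated terminologies such as the UMLS Metathesaurus instead.

```python
# Hypothetical synonym table: surface form -> canonical concept label.
SYNONYMS = {
    "shortness of breath": "dyspnea",
    "sob": "dyspnea",
    "breathlessness": "dyspnea",
    "dyspnea": "dyspnea",
    "emesis": "vomiting",
    "vomiting": "vomiting",
}


def normalize_concepts(note: str) -> set[str]:
    """Return the canonical concepts whose synonyms appear as whole words in the note."""
    words = note.lower().replace(",", " ").replace(".", " ").split()
    text = " " + " ".join(words) + " "  # pad so every phrase match is whole-word
    return {concept for phrase, concept in SYNONYMS.items() if f" {phrase} " in text}


print(normalize_concepts("Pt reports SOB and breathlessness, no emesis."))
```

The two breathing terms collapse into one concept, but the sketch also reports vomiting even though the note says “no emesis”: recognizing negated or uncertain mentions is a separate problem that simple matching does not solve.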
A final limitation is that most clinical NLP systems have been developed for the English language. Linda et al. (2021) noted this and developed an algorithm capable of structuring data from Italian pathology reports. As in the DiMartino and Barber studies, the role of NLP in the Linda et al. paper was to sift through, tag, and structure unstructured data. Like the other studies, this work also used training data to uncover relevant patterns, in this case negation.
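Negation handling in a rule-based system can be sketched as follows: a concept counts as negated when a trigger phrase appears within a few tokens before it. This is a simplified illustration in the spirit of the NegEx algorithm, with an assumed English trigger list and window size; it is not the Italian rule set that Linda et al. describe. NegEx itself also handles pseudo-negations and scope-ending terms, which this sketch omits.

```python
import re

# Assumed English trigger list and window size, for illustration only.
NEGATION_TRIGGERS = ["no", "not", "without", "negative for", "no evidence of"]
WINDOW = 5  # a trigger must fall within the five tokens before the concept


def is_negated(sentence: str, concept: str) -> bool:
    """Return True if `concept` is preceded, within WINDOW tokens, by a negation trigger."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    concept_tokens = concept.lower().split()
    n = len(concept_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] != concept_tokens:
            continue
        # Join the preceding window so multi-word triggers can be matched whole.
        preceding = " " + " ".join(tokens[max(0, start - WINDOW):start]) + " "
        if any(f" {trigger} " in preceding for trigger in NEGATION_TRIGGERS):
            return True
    return False


print(is_negated("No evidence of malignancy in the specimen.", "malignancy"))  # True
print(is_negated("Malignant melanoma is present.", "melanoma"))  # False
```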
Despite these limitations, there is reason to be optimistic. The potential of NLP in a medical context is far from exhausted: computers can turn reports and records into high-quality structured data, and some researchers even envision a world of AI-powered healthcare assistants.
The future belongs to meaningful collaboration between human and machine, and that collaboration is centered on language. Artificial intelligence research moves quickly, and Natural Language Processing technologies move with it, like the river against the muddy bank: ever pounding, ever flexing, ever remaking the world in its image. Language is capable of changing reality in a tangible way, taking on even tumors. It is not hyperbolic to call language the core of the human experience, the greatest tool we have ever devised, one that can shape our very biology.
For this is the power of words.
References
Barber, E.L., Garg, R., Persenaire, C., & Simon, M. (2021). Natural language processing with machine learning to predict outcomes after ovarian cancer surgery. Gynecologic Oncology, 160, 182–186.
Christensen, L.M., Haug, P.J., & Fiszman, M. (2002). MPLUS: A probabilistic medical language understanding system. Proceedings of the Workshop on Natural Language Processing in the Biomedical Domain, 29–36.
DiMartino, L., Miano, T., Wessell, K., Bohac, B., & Hanson, L.C. (2022). Identification of uncontrolled symptoms in cancer patients using natural language processing. Journal of Pain and Symptom Management, 63(4).
Linda, H., Paglialonga, A., Giancarlo, P., Michele, T., Milena, S., Carlo, B., Gianluca, C.E., & Paolo, B. (2021). Automated classification of cancer morphology from Italian pathology reports using natural language processing techniques: A rule-based approach. Journal of Biomedical Informatics, 116, 1–7.
Nadkarni, P.M., Ohno-Machado, L., & Chapman, W.W. (2011). Natural language processing: An introduction. Journal of the American Medical Informatics Association, 18, 544–551.