Drug Pair Synergy Prediction using LLMs | Turtles AI
DukeRem, 28 April 2023
Large pre-trained language models (#LLM) have shown great promise for few-shot learning across many fields, performing well even with minimal training data. However, their ability to generalize to unseen tasks in more complex fields such as #biology has not yet been fully evaluated. In a new study, a team of researchers proposes a few-shot learning approach that uses LLMs to predict the #synergy of drug pairs in rare #tissues, where structured data and features are scarce.
The researchers developed a model called #CancerGPT, which draws on the prior knowledge that LLMs absorb from large text corpora and offers a promising alternative approach to biological inference. Their experiments covered seven rare tissues from different cancer types and showed that the LLM-based prediction model achieved strong accuracy with very few training samples (few-shot) or none at all (zero-shot). Remarkably, CancerGPT, with approximately 124 million parameters, performed comparably to a much larger fine-tuned GPT-3 model with approximately 175 billion parameters.
According to the authors, this is the first study to tackle drug pair synergy prediction in rare tissues with limited data, and the first to use an LLM-based prediction model for biological reaction prediction tasks. To build CancerGPT, the team started from GPT-2 and adapted it to the drug pair synergy prediction task, so CancerGPT keeps the same architecture as GPT-2.
The team first fine-tuned CancerGPT on data from common tissues so that it could learn the relationships between drug pairs. This step rests on the assumption that certain drug pairs are synergistic regardless of cellular context, so relationships learned in common tissues can help predict synergy in new cell lines from other tissues.
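The two-stage scheme described above can be pictured as a data split: rows from common tissues feed the first round of fine-tuning, and only k labeled rows from the rare target tissue are available for adaptation. The sketch below is a minimal illustration under our own assumptions; the record fields (`tissue`, `label`), the function name `k_shot_split`, and the choice to hold out all remaining target-tissue rows for testing are hypothetical, not details taken from the study.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical row layout, e.g. {"drug_a": ..., "drug_b": ..., "tissue": ..., "label": 0 or 1}
Record = Dict[str, object]


def k_shot_split(
    records: List[Record], target_tissue: str, k: int, seed: int = 0
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split rows into (common-tissue training, k-shot target training, target test).

    Assumption for illustration: every non-target row goes into the first-stage
    training set, k target rows are sampled at random for adaptation, and the
    remaining target rows are held out for evaluation.
    """
    common = [r for r in records if r["tissue"] != target_tissue]
    target = [r for r in records if r["tissue"] == target_tissue]
    if k > len(target):
        raise ValueError(f"requested {k} shots but only {len(target)} target rows exist")
    rng = random.Random(seed)  # fixed seed makes the sampled shots reproducible
    shuffled = target[:]  # copy so the caller's list order is untouched
    rng.shuffle(shuffled)
    return common, shuffled[:k], shuffled[k:]


if __name__ == "__main__":
    rows = [
        {"tissue": "breast", "label": 1},
        {"tissue": "lung", "label": 0},
        {"tissue": "bone", "label": 1},
        {"tissue": "bone", "label": 0},
        {"tissue": "bone", "label": 1},
    ]
    common, shots, held_out = k_shot_split(rows, "bone", k=2)
    print(len(common), len(shots), len(held_out))  # 2 2 1
```

Setting `k=0` gives the zero-shot case, in which the model sees no labeled rows from the target tissue at all.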
In addition, the team incorporated the sensitivity of the given cell line to each individual drug, measured by the relative inhibition score. This gives the model a more detailed picture of how each drug acts on the cell line on its own, alongside the information about the pair.
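Because a language model reads text rather than table rows, a structured record has to be turned into a sentence before the model can use it. The sketch below shows one hypothetical way to serialize a drug pair, its cell line and tissue, and each drug's relative inhibition score into a prompt. The template wording, the `SynergyRecord` class, the placeholder names (`DrugA`, `CellLine1`), and the numeric values are all assumptions for illustration, not the exact input format used by the study.

```python
from dataclasses import dataclass


@dataclass
class SynergyRecord:
    """Hypothetical structured row: one drug pair tested on one cell line."""
    drug_a: str
    drug_b: str
    cell_line: str
    tissue: str
    ri_a: float  # relative inhibition score of drug A alone on this cell line
    ri_b: float  # relative inhibition score of drug B alone on this cell line


def to_prompt(rec: SynergyRecord) -> str:
    """Serialize a row into plain text that a GPT-style model can read.

    The sentence template is an illustrative assumption; any consistent
    wording that names every field would serve the same purpose.
    """
    return (
        f"The first drug is {rec.drug_a}. "
        f"The second drug is {rec.drug_b}. "
        f"The cell line is {rec.cell_line}. "
        f"The tissue is {rec.tissue}. "
        f"The first drug's relative inhibition is {rec.ri_a:.2f}. "
        f"The second drug's relative inhibition is {rec.ri_b:.2f}. "
        "Is the pair synergistic?"
    )


# Placeholder names and made-up scores, for illustration only.
example = SynergyRecord("DrugA", "DrugB", "CellLine1", "bone", 12.5, 40.0)
print(to_prompt(example))
```

Keeping the template fixed means the only thing that varies between prompts is the data itself, which makes it easier for a fine-tuned model to pick up on the fields that matter.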
The team used a large, publicly accessible drug synergy database from the DrugComb Portal, an open-access data portal that accumulates, standardizes, and harmonizes the results of drug combination screening studies across a wide variety of cancer cell lines. The database contains both drug sensitivity rows and drug pair synergy rows. The researchers focused on cell lines from rare tissues, which they defined as tissues with fewer than 4,000 samples: pancreas, endometrium, liver, soft tissue, stomach, urinary tract, and bone. They evaluated the models separately on each of these rare tissues.
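The "fewer than 4,000 samples" rule can be expressed as a simple count per tissue. The sketch below is illustrative only: it assumes a hypothetical row schema with a `tissue` field and a `row_type` field that separates drug pair synergy rows from single-drug sensitivity rows, and it assumes that only synergy rows count as samples. The study's actual counting may differ.

```python
from collections import Counter
from typing import Dict, Iterable, List

RARE_THRESHOLD = 4000  # tissues with fewer samples than this count as "rare" (per the article)


def rare_tissues(rows: Iterable[Dict[str, str]], threshold: int = RARE_THRESHOLD) -> List[str]:
    """Return, sorted, the tissues whose number of synergy rows is below threshold.

    Assumes a hypothetical schema: each row has a "tissue" field and a
    "row_type" field that is "synergy" for drug pair rows.
    """
    counts = Counter(r["tissue"] for r in rows if r["row_type"] == "synergy")
    return sorted(t for t, n in counts.items() if n < threshold)


if __name__ == "__main__":
    rows = (
        [{"tissue": "breast", "row_type": "synergy"}] * 5000
        + [{"tissue": "bone", "row_type": "synergy"}] * 300
        + [{"tissue": "bone", "row_type": "sensitivity"}] * 9000  # ignored by the count
    )
    print(rare_tissues(rows))  # ['bone']
```

Sensitivity rows are excluded from the count in this sketch so that a tissue with many single-drug measurements but few drug pair measurements is still treated as rare.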
To evaluate classification performance, the team used the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). They compared the LLM-based prediction model with two tabular models that take the same set of inputs: XGBoost, a gradient-boosted decision tree method widely used for structured or tabular data, and TabTransformer, a self-attention-based supervised learning model for tabular data. For each rare tissue and each number of shots k, the team fine-tuned every model on k labeled samples and measured its AUROC and AUPRC.
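Both metrics summarize how well a model's scores rank synergistic pairs above non-synergistic ones across all decision thresholds. As an illustration only, the sketch below computes AUROC with the pairwise (Mann-Whitney) formulation and AUPRC as average precision, a common estimator of the area under the precision-recall curve. It is not the evaluation code used in the study; in practice a library such as scikit-learn (`roc_auc_score`, `average_precision_score`) would normally be used.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Sum over thresholds of (recall gain) * precision, from highest score down."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this score so ties share one threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    y = [1, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.1]
    print(round(auroc(y, s), 4), round(average_precision(y, s), 4))  # 0.75 0.8333
```

AUPRC is often reported alongside AUROC when positive examples are scarce, because its random-guess baseline equals the fraction of positives rather than a fixed 0.5.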
The team's findings suggest that LLM-based models such as CancerGPT are a promising alternative approach for predicting drug pair synergy in rare tissues with limited data. The work also opens new avenues for exploring LLMs in other biological inference tasks.