Language models can...create proteins! | Turtles AI

Language models can...create proteins!
DukeRem, 1 February 2023
Scientists appear to have found a new way to create artificial proteins. This matters because there are more possible protein designs than there are atoms in the whole universe, so finding one that works by searching is hard. Usually, scientists use a technique called "directed evolution", which imitates in the lab how nature evolves things. But this can be slow, or fail altogether. To solve this problem, Madani, Krause and Naik use AI models that can learn from the many protein sequences that are now available. Specifically, they use "neural language models", a type of AI that can imitate human language. This works because a protein can be thought of as a sentence in a language whose alphabet is 20 different molecules called amino acids. With these models, scientists can create protein sequences that are both realistic and new.

The models they used are "conditional language models", which can be steered to create proteins with specific properties, for example the ability to destroy bacteria. The scientists trained their model, called ProGen, on 280 million protein sequences and then tested it by creating artificial lysozymes, which are antibacterial proteins. They found that 73% of the artificial lysozymes were functional, compared with 59% of the natural lysozymes tested.
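
To make the "protein as language" idea more concrete, here is a minimal Python sketch. It is not ProGen's code. The control tags, the token numbering and the helper names are hypothetical, chosen only for illustration, and the figure of 10^80 atoms in the universe is a common rough estimate rather than a number from the article. The sketch shows two things: how a protein written with the 20 one-letter amino acid codes can be turned into a list of tokens with a "control tag" in front (the kind of input a conditional language model could be steered with), and how short a protein can be before the number of possible sequences exceeds that atom estimate.

```python
from typing import List

# The 20 standard amino acids, written with their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical control tags. The article does not list ProGen's real tags.
CONTROL_TAGS = ["<lysozyme>", "<other>"]

# Token ids: control tags first, then the amino acids (an arbitrary choice).
VOCAB = {tok: i for i, tok in enumerate(CONTROL_TAGS + list(AMINO_ACIDS))}


def encode(tag: str, sequence: str) -> List[int]:
    """Turn a control tag plus an amino-acid string into a list of token ids."""
    return [VOCAB[tag]] + [VOCAB[aa] for aa in sequence]


def shortest_length_exceeding(limit: int) -> int:
    """Smallest length L such that 20**L is greater than `limit`."""
    length = 0
    while len(AMINO_ACIDS) ** length <= limit:
        length += 1
    return length


# Rough, commonly quoted estimate (an assumption, not from the article).
ATOMS_IN_UNIVERSE = 10 ** 80

if __name__ == "__main__":
    # A made-up four-residue fragment, tagged as a lysozyme.
    print(encode("<lysozyme>", "MKAL"))                   # [0, 12, 10, 2, 11]
    print(shortest_length_exceeding(ATOMS_IN_UNIVERSE))   # 62
```

Under these assumptions, a chain of only 62 amino acids already has more possible sequences than the estimated number of atoms, and real proteins are often several times longer. A model conditioned on a tag like the hypothetical `<lysozyme>` above would be trained to continue the token list with amino acids that fit that tag.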