Proteins are essential to all life on Earth, allowing organisms to live, grow, and carry out the tasks they need to survive. In ‘What is a protein?’, this biological macromolecule was broken down into its components to explain how proteins are made and what some of them do. As science progresses, more is being learned about proteins and their functions. Half of the 2024 Nobel Prize in Chemistry was awarded jointly to Dr. Demis Hassabis and Dr. John Jumper for developing AlphaFold, an artificial intelligence model that learns from the known structures of proteins to predict new ones; the other half went to Dr. David Baker for computational protein design. Artificial intelligence (AI) has many applications in science, including protein science. Just as a protein can be broken down into the components that form it, AI programs have been developed to analyze each component. The key to these programs is the vast body of protein data that scientists have collected to date, which can be used to broaden our knowledge more quickly and to better understand complex diseases.
Protein Sequence Analysis
The code for all proteins, how they are formed, and what they do lies in an organism’s DNA. The sections of DNA that code for proteins are called genes, and the order of amino acids that a gene specifies is the protein’s sequence. Many protein sequences are known. However, many proteins have not yet been discovered or characterized, and decoding their sequence is the first step to understanding their function. Machine learning models can identify functional domains, post-translational modification sites, and evolutionary relationships within protein sequences. This means that they can learn from what is already known about an organism’s proteins to make predictions about other, uncharacterized proteins. This helps in understanding the proteome, or full protein catalog, of cells under different conditions or disease states.
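To make the idea of finding modification sites in a sequence concrete, here is a minimal sketch that scans a protein sequence for one well-known motif, the N-glycosylation pattern catalogued in PROSITE as N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline). This is a fixed rule rather than a machine learning model, so it only illustrates the kind of signal such models learn to recognize. The function name and the example sequence are invented for illustration.

```python
import re

# N-glycosylation motif: N, then any residue except P, then S or T,
# then any residue except P. The lookahead lets overlapping sites be found.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(sequence):
    """Return (1-based position, matched residues) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


# Made-up sequence, not a real protein.
example = "MKNGSAANPTLNVTQ"
print(find_sequons(example))  # [(3, 'NGSA'), (12, 'NVTQ')]
```

The candidate at position 8 (N-P-T) is skipped because proline follows the asparagine. A match only marks a possible site; whether it is actually modified in the cell has to be confirmed experimentally.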
Protein Structure Prediction
There is a good reason that the developers of AlphaFold shared the Nobel Prize in Chemistry in 2024. Understanding how a protein is shaped and fits together is key to understanding its function, but determining a protein’s structure is a long and difficult process. The sequencing of the human genome, completed in the early 2000s, led to advancements in this field, but many scientists (including this author) have struggled to solve new protein structures. AI algorithms like AlphaFold can predict many protein structures with high accuracy, significantly reducing the time and cost compared with experimental determination through techniques like X-ray crystallography or cryo-electron microscopy. Knowing protein structures aids in drug design, enzyme engineering, and understanding disease mechanisms. AI can also help model protein folding pathways and dynamics, providing insights into diseases linked to protein misfolding, like Alzheimer’s or Parkinson’s, and help predict how proteins will behave under various environmental conditions. This prediction capability is made possible by the many protein structures that have already been solved experimentally. By learning from those solved structures and comparing an unknown protein’s sequence to related sequences, AlphaFold can make an informed prediction of its structure. This prediction can then be validated by experimental methods.
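The idea of comparing an unknown protein to similar proteins with solved structures can be illustrated with a toy example. The sketch below is not how AlphaFold works internally; it only shows the simpler, older notion of picking the most similar solved protein by percent sequence identity. It assumes the sequences are already aligned to the same length, and the sequences and template names are made up.

```python
def percent_identity(a, b):
    """Percent of aligned positions that carry the same residue.

    Both sequences must already be aligned ('-' marks a gap).
    Columns where both sequences have a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def closest_template(query, templates):
    """Return the name of the template sequence most identical to the query."""
    return max(templates, key=lambda name: percent_identity(query, templates[name]))


# Hypothetical data: a query and two proteins with solved structures.
query = "MKTAYIAK"
templates = {"template_A": "MKTAYLAK", "template_B": "MQTGYIGR"}

print(percent_identity(query, templates["template_A"]))  # 87.5
print(closest_template(query, templates))                # template_A
```

A high identity suggests that the two proteins may fold similarly, which is why a large library of solved structures is so valuable as a starting point.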
Drug Discovery Based on Protein Structure and Function
Many efforts in the design of therapeutics center on the druggability of proteins. This is the ability of a protein’s activity to be altered by a drug, whether a chemical compound such as ibuprofen or a biologic such as the mRNA vaccine for COVID-19 or an antibody developed against cancer. When a protein is discovered and its structure is known, parts of the structure known as active sites can be identified; these interact with the protein’s environment and enable its function. If this function is out of step with what the cell needs, for example by driving overgrowth and cancer, the protein is targeted to effectively be turned “off”. In other cases, the protein may not be active enough and may need to be targeted to improve its activity. In either case, these proteins are deemed “druggable” and therapeutics are designed for the appropriate task.
Knowing the structure and function of a druggable protein is just the first step in developing a therapeutic. Many scientists and major pharmaceutical companies work for years to develop and perfect therapeutics for a single protein target or disease. This is where AI can speed up the process. AI can assist in virtual screening, in which drug molecules are computationally fitted into the active sites of druggable proteins to identify candidates likely to interact with specific protein targets. Generative models can design novel molecules or peptides with desired properties, such as high binding affinity to active sites or stability in the body. AI can also analyze protein variants in patients, linking them to diseases and enabling personalized treatment strategies. These predictions draw on years of data generated while developing the drugs now on the market, and they can increase the pace at which new drugs are tested. Each drug predicted by AI software must still be synthesized by scientists and tested thoroughly before being used for patient care, but shortening the time it takes to predict the best compound can help in the discovery of new medicines.
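The final step of a virtual screen, choosing which predicted compounds are worth synthesizing, can be sketched in a few lines. The example below assumes that some model has already assigned each candidate a predicted binding energy for the target’s active site (more negative meaning stronger predicted binding). The compound names, scores, and the cutoff of -7.0 kcal/mol are all invented for illustration and do not come from any real screen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_binding_energy: float  # kcal/mol; more negative = stronger predicted binding


def shortlist(candidates, top_n=2, cutoff=-7.0):
    """Keep candidates at or below the cutoff, strongest predicted binders first."""
    hits = [c for c in candidates if c.predicted_binding_energy <= cutoff]
    return sorted(hits, key=lambda c: c.predicted_binding_energy)[:top_n]


# Hypothetical screening results for one protein target.
library = [
    Candidate("compound_1", -6.2),
    Candidate("compound_2", -9.1),
    Candidate("compound_3", -7.8),
    Candidate("compound_4", -8.4),
]

for hit in shortlist(library):
    print(hit.name, hit.predicted_binding_energy)
# compound_2 -9.1
# compound_4 -8.4
```

The shortlist is only a ranking of predictions. Each compound on it would still need to be made and tested in the laboratory.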
Artificial intelligence is an exciting and necessary advancement in protein science and modern medicine. The hard work and data of generations of scientists are being combined to train learning models and predict new discoveries. This has vastly improved the speed and effectiveness of protein structure prediction, functional understanding, and druggability assessment. While it may change the dynamic of how research is conducted, AI will certainly improve the lives of scientists and patients alike if used responsibly, checked thoroughly, and combined with experimental data.