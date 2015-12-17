By applying algorithms that mimic the processes of real neurons, we can make the network learn to solve many types of problems. Google uses a powerful ANN for its now famous Deep Dream project where computers can classify and even create images.

Our group studies the immune system, with the goal of figuring out new therapies for cancer. We’ve used ANN computational models to study short surface protein-codes our immune cells use to determine whether something is foreign to our body and thus should be attacked. If we understand more about how our immune cells (such as T-cells) differentiate between normal/self and abnormal/foreign cells, we can design better vaccines and therapies.

We scoured publicly available catalogs of thousands of protein-codes identified by researchers over the years. We divided this big data set into two groups: normal self protein-codes derived from healthy human cells, and abnormal protein-codes derived from viruses, tumors and bacteria. Then we turned to an artificial neural network developed in our lab.

Once we fed the protein-codes into the ANN, the algorithm was able to identify fundamental differences between normal and abnormal protein-codes. It would be hard for people to keep track of these kinds of biological phenomena: there are thousands of these protein-codes to analyze in the big data set. It takes a machine to wrangle such complex problems and define new biology.

Predictions via machine learning

The most important application of machine learning in biology is making predictions from big data. Computer-based predictions can make sense of big data, test hypotheses and save precious time and resources.

For instance, in our field of T-cell biology, knowing which viral protein-codes to target is critical in developing vaccines and treatments. But there are so many individual protein-codes from any given virus that it’s very expensive and difficult to experimentally test each one.
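
To get a rough sense of the numbers, the sketch below slides a fixed-length window along a protein sequence to list every overlapping short peptide it contains, then counts the candidates for a few protein lengths. It is illustrative only: the 9-residue default, the 8 to 11 residue range, the example sequence and the protein lengths are all made-up assumptions, not figures from our study.

```python
def candidate_peptides(protein, k=9):
    """Every overlapping k-residue window of a protein sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


def count_candidates(protein_lengths, window_lengths=range(8, 12)):
    """Total windows over all proteins and all (assumed) window lengths.
    A protein of length L contains L - k + 1 windows of length k."""
    return sum(max(n - k + 1, 0) for n in protein_lengths for k in window_lengths)


# Made-up 20-residue sequence, used only to illustrate the windowing.
fragment = "MKTIIALSYIFCLVFAQDLP"
windows = candidate_peptides(fragment)
print(len(windows))   # → 12  (20 - 9 + 1)
print(windows[:2])    # → ['MKTIIALSY', 'KTIIALSYI']

# Hypothetical viral proteome of three proteins (500, 1,000 and 3,000 residues).
print(count_candidates([500, 1000, 3000]))  # → 17898
```

Even this small hypothetical proteome yields thousands of candidates, each of which would otherwise need its own experiment.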

Instead, we trained the artificial neural network to learn the important biochemical characteristics that distinguish the two types of protein-codes, normal versus abnormal. Then we asked the model to “predict” which new viral protein-codes resemble the “abnormal” category and could therefore be recognized by T-cells, and thus by the immune system. We tested the ANN model on different viral proteins that had never been studied before.
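
The sketch below is not our lab’s network; it is a minimal stand-in for the general train-then-predict workflow, assuming a single sigmoid neuron (equivalent to logistic regression, the simplest possible neural network) trained by stochastic gradient descent on one-hot encoded 9-residue peptides. The toy data, the “W at position 5 means foreign” rule and every function name here are hypothetical, chosen only so the example has something learnable.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PEPTIDE_LEN = 9  # assumed fixed peptide length
N_FEATURES = PEPTIDE_LEN * len(AMINO_ACIDS)


def active_features(peptide):
    """One-hot encoding stored sparsely: one index per (position, residue) pair."""
    return [pos * len(AMINO_ACIDS) + AA_INDEX[aa] for pos, aa in enumerate(peptide)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_peptide(rng, foreign):
    # HYPOTHETICAL rule for illustration only: "foreign" peptides carry W at
    # position 5 (index 4) and "self" peptides never do.
    residues = [rng.choice(AMINO_ACIDS) for _ in range(PEPTIDE_LEN)]
    residues[4] = "W" if foreign else rng.choice(AMINO_ACIDS.replace("W", ""))
    return "".join(residues)


def make_dataset(rng, n_per_class):
    """Label 1 = abnormal/foreign, 0 = normal/self."""
    return ([(make_toy_peptide(rng, True), 1) for _ in range(n_per_class)]
            + [(make_toy_peptide(rng, False), 0) for _ in range(n_per_class)])


def train(data, epochs=50, lr=0.1, seed=0):
    """Fit one sigmoid neuron (logistic regression) by stochastic gradient descent."""
    rng = random.Random(seed)
    weights, bias = [0.0] * N_FEATURES, 0.0
    examples = [(active_features(p), y) for p, y in data]
    for _ in range(epochs):
        rng.shuffle(examples)
        for idx, y in examples:
            p = sigmoid(bias + sum(weights[i] for i in idx))
            grad = p - y  # gradient of the log-loss with respect to the logit
            for i in idx:  # only the nonzero one-hot inputs change
                weights[i] -= lr * grad
            bias -= lr * grad
    return weights, bias


def score(model, peptide):
    """Predicted probability that a peptide belongs to the "foreign" class."""
    weights, bias = model
    return sigmoid(bias + sum(weights[i] for i in active_features(peptide)))


def accuracy(model, data):
    return sum((score(model, p) >= 0.5) == (y == 1) for p, y in data) / len(data)


rng = random.Random(42)
train_set = make_dataset(rng, 100)
model = train(train_set)
held_out = make_dataset(rng, 100)  # peptides the model never saw
print(f"training accuracy: {accuracy(model, train_set):.2f}")
print(f"held-out accuracy: {accuracy(model, held_out):.2f}")

# Rank unseen peptides so the most "foreign-looking" ones are tested first.
candidates = [make_toy_peptide(rng, f) for f in (False, True, False, True)]
for pep in sorted(candidates, key=lambda c: score(model, c), reverse=True):
    print(pep, round(score(model, pep), 3))
```

A real model would use richer biochemical features and a deeper network; the single neuron simply keeps the train, score and rank steps visible.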

Sure enough, like a diligent student eager to please the teacher, the neural network accurately identified the majority of the T-cell-activating protein-codes within these viral proteins. We also tested the protein-codes it flagged experimentally to validate the accuracy of the ANN’s predictions. Using this neural network model, a scientist can thus rapidly predict all the important short protein-codes from a harmful virus and test them to develop a treatment or a vaccine, instead of guessing and testing them one by one.
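
Checking flagged candidates against experiments comes down to two simple ratios. In this hypothetical sketch, precision is the share of flagged peptides that experiments confirmed, and recall is the share of confirmed T-cell activators that the model flagged, so identifying “the majority” of activators roughly corresponds to a recall above 0.5. The peptide names, scores and the 0.5 threshold are invented for illustration.

```python
def flag(scores, threshold=0.5):
    """Peptides whose predicted probability meets the (assumed) threshold."""
    return {pep for pep, s in scores.items() if s >= threshold}


def precision_recall(flagged, confirmed):
    """Precision: share of flagged peptides confirmed by experiment.
    Recall: share of confirmed T-cell activators that the model flagged."""
    hits = flagged & confirmed
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(confirmed) if confirmed else 0.0
    return precision, recall


# Hypothetical model scores and experimental results, invented for illustration.
scores = {"pep1": 0.97, "pep2": 0.91, "pep3": 0.74,
          "pep4": 0.62, "pep5": 0.41, "pep6": 0.08}
confirmed = {"pep1", "pep2", "pep3", "pep5"}

flagged = flag(scores)
precision, recall = precision_recall(flagged, confirmed)
print(sorted(flagged))                                    # → ['pep1', 'pep2', 'pep3', 'pep4']
print(f"precision={precision:.2f} recall={recall:.2f}")  # → precision=0.75 recall=0.75
```

Raising the threshold usually trades recall for precision: fewer peptides go to the lab, but more true activators may be missed.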

Implementing machine learning wisely

Thanks to constant refinement, big data science and machine learning are increasingly becoming indispensable for any kind of scientific research. The possibilities for using computers to train and predict in biology are almost endless. From figuring out which combinations of biomarkers are best for detecting a disease to understanding why only some patients benefit from a particular cancer treatment, mining big data sets with computers has become a valuable route for research.

Of course, there are limitations. The biggest problem with big data science is the data themselves. If data obtained by -omics studies are faulty to begin with, or based on shoddy science, the machines will be trained on bad data, leading to poor predictions. The student is only as good as the teacher.

Because computers are not sentient (yet), in their quest for patterns they can find them even where none exist, again giving rise to bad data and nonreproducible science.
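
One way a machine “finds” a pattern that is not there is memorization. In the hypothetical sketch below, the peptides are random and their labels are coin flips, so there is nothing real to learn. A flexible model still fits the training data perfectly, and only held-out data expose the problem. For simplicity it uses a 1-nearest-neighbor classifier (Hamming distance) rather than a neural network; the sizes, seed and names are illustrative assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptides(rng, n, length=9):
    """n distinct random peptides, in a reproducible order."""
    seen = set()
    while len(seen) < n:
        seen.add("".join(rng.choice(AMINO_ACIDS) for _ in range(length)))
    peptides = sorted(seen)  # sort first: set iteration order varies between runs
    rng.shuffle(peptides)
    return peptides


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_predict(train, peptide):
    # Copy the label of the most similar training peptide: pure memorization.
    _, label = min(train, key=lambda item: hamming(item[0], peptide))
    return label


def accuracy(train, data):
    return sum(nearest_neighbor_predict(train, p) == y for p, y in data) / len(data)


rng = random.Random(7)
# Labels are coin flips, so there is no real pattern to find.
labeled = [(p, rng.random() < 0.5) for p in random_peptides(rng, 1200)]
train, test = labeled[:200], labeled[200:]

train_acc = accuracy(train, train)
test_acc = accuracy(train, test)
print(f"training accuracy: {train_acc:.2f}")  # → 1.00
print(f"held-out accuracy: {test_acc:.2f}")   # near 0.50, i.e. chance
```

The perfect training score is an artifact: every training peptide is its own nearest neighbor. Evaluating on data the model never saw is the basic safeguard against reporting such spurious patterns.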

Some researchers have also raised concerns that computers are becoming black boxes for scientists who don’t clearly understand the manipulations and machinations the machines carry out on their behalf.

In spite of these problems, the benefits of big data and machines will continue to make them valuable partners in scientific research. With caveats in mind, we are uniquely poised to understand biology through the eyes of a machine.

This article was originally published on The Conversation.