{ End Bracket }
Hacking the Immune System
Nebojsa Jojic and David Heckerman
HIV is a major global health problem, with over 40 million people infected worldwide. Biomedical researchers are working to create a vaccine that would be effective against many different strains while accounting for individual differences in people's immune systems. Viral RNA sequences have been extracted from thousands of blood samples from infected patients. Each sequence is a string of about 9000 nucleotides that serves as a program for viral assembly. Parts of the patients' own genomes are sequenced to categorize their immune types, which are known to affect the immune response and, through natural selection, viral diversity.
The resulting datasets pose immense difficulties for biomedical researchers. The genetic code defies visualization in much the same way that computer binary code would without software tools to parse, convert, and edit it in a higher-level language. But unlike binary code, which was created by humans, the genetic code follows rules that are not fully understood. Researchers are therefore turning to modern data-mining techniques to discover patterns of viral evolution. The problem of parsing the structure of genetic sequences (RNA/DNA or proteins) is similar to the problem of parsing the structure of other natural signals, such as images or sounds. At Microsoft Research, we have been working on machine-learning and data-mining algorithms that aim to parse such data automatically for applications relevant to Microsoft, such as video and image browsing, spam filtering, and speech recognition. These same techniques can be applied to medicine.
The human immune system acts in many different ways to control infection. One arm of the immune system recognizes short amino-acid sub-sequences of viral proteins, 8-11 residues long, called epitopes. Many human cells can send short pieces of their internal proteins to the cell surface, where these markers of viral presence are exposed to recognition by killer T-cells. Killer cells destroy cells that display foreign peptides on their surface and usually ignore those that display only pieces of normal human proteins. Being sensitive only to these short signatures (hashes, if you will) makes the immune system resource-efficient, but it also gives the virus an opportunity to evade immune pressure by mutating so that the pieces presented on the cell surface look like human protein sequences. The goal of vaccination is to elicit an immune response to as many epitopes as possible, so that the virus cannot evade detection. The problem with packing many epitopes into a vaccine is that long vaccines can cause illness. To shorten the vaccine, overlap between epitopes can be exploited: if one epitope ends with the same amino acids that begin another, the two can be joined so that the shared amino acids appear only once.
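The overlap trick can be made concrete with a minimal Python sketch. It assumes that epitopes are simply contiguous substrings of a protein written in one-letter amino-acid codes; the helper names (`candidate_epitopes`, `overlap`, `merge`) and the toy peptides are hypothetical illustrations, not real HIV epitopes or the authors' code.

```python
def candidate_epitopes(protein: str, min_len: int = 8, max_len: int = 11) -> set[str]:
    """Every contiguous window of length min_len..max_len: the possible epitopes."""
    return {
        protein[i:i + k]
        for k in range(min_len, max_len + 1)
        for i in range(len(protein) - k + 1)
    }


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a: str, b: str) -> str:
    """Join a and b so that their shared suffix/prefix appears only once."""
    return a + b[overlap(a, b):]


# Hypothetical toy peptides (one-letter codes), not real HIV epitopes.
e1, e2 = "ACDEFGHIK", "FGHIKLMNP"
combined = merge(e1, e2)
print(combined, len(combined))                  # ACDEFGHIKLMNP 13
print(len(candidate_epitopes("ACDEFGHIKLMN")))  # 5 + 4 + 3 + 2 = 14 windows
```

Joining the two 9-residue toy peptides on their five shared residues yields a 13-residue string instead of 18, and a cell that chops up the merged string can still present both peptides.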
Several years ago, we introduced a new representation of images, which we call the epitome, along with associated learning algorithms; the idea is very similar in flavor. The epitome served as the basis for image parsing and recognition tasks (www.research.microsoft.com/~jojic/epitome.html) and was later applied to the analysis of text, sound, video, and genes. While the image epitome organizes visual memory, the HIV epitome serves as an artificial organization of cellular memory, delivered into cells as a vaccine. Once the epitome is delivered, the cells chop it up and present its pieces on their surface, triggering a response from killer cells, which then turn into memory cells capable of rapidly extinguishing infection by strains containing these patterns. (For an illustration, see the protein epitome link at www.research.microsoft.com/~jojic.)
Each individual will learn different viral targets (epitopes) from exposure to the vaccine. When these targets are unknown, one approach is to treat every possible 8-11 amino-acid pattern in every known viral strain as a potential epitope and fit them all into a short epitome. Because epitopes overlap within a strain and the same short patterns recur across strains, epitomes just 5-10 times longer than a single strain can capture over 70 percent of viral variability. The immense diversity of human immune systems complicates this research, and here again machine learning is needed. We are working with scientists from Harvard University, Massachusetts General Hospital, the University of Washington, Los Alamos National Laboratory, and the Royal Perth Hospital in Australia on the problem of discovering epitopes. The machine-learning techniques involved are essentially the same as those we have used in other predictors, such as our spam filters. We are also currently working with some of these scientists on lab experiments that should verify the efficacy of epitomes in binding assays. These experiments will show us where T-cells taken from infected patients bind on the constructed epitomes.
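Fitting all candidate epitopes into one short sequence resembles the classic shortest-common-superstring problem. The minimal sketch below uses the well-known greedy superstring heuristic on 9-mers as a stand-in; that choice is an assumption made for illustration, not the probabilistic epitome model or learning algorithm the authors describe, and the function names and toy strains are hypothetical. The `coverage` helper measures the fraction of distinct 9-mers across the strains that appear in a candidate sequence, one simple way to quantify "capturing viral variability."

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k windows of seq (candidate epitopes of a single length)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_epitome(strains: list[str], k: int = 9) -> str:
    """Pack every distinct k-mer from all strains into one string using the
    greedy shortest-common-superstring heuristic (an illustrative stand-in,
    not the authors' epitome learning algorithm)."""
    pieces = sorted(set().union(*(kmers(s, k) for s in strains)))
    if not pieces:
        return ""
    while len(pieces) > 1:
        # Find the ordered pair (a, b) with the largest suffix/prefix overlap.
        best_n, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    n = overlap(a, b)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        merged = pieces[best_i] + pieces[best_j][best_n:]
        # Drop the two merged pieces and any piece now contained in the result.
        pieces = [p for idx, p in enumerate(pieces)
                  if idx not in (best_i, best_j) and p not in merged]
        pieces.append(merged)
    return pieces[0]


def coverage(epitome: str, strains: list[str], k: int = 9) -> float:
    """Fraction of the distinct k-mers in the strains that occur in the epitome."""
    targets = set().union(*(kmers(s, k) for s in strains))
    if not targets:
        return 0.0
    return len(targets & kmers(epitome, k)) / len(targets)


# Two hypothetical toy "strains" that differ only in their last residue.
strains = ["ACDEFGHIKLMN", "ACDEFGHIKLMQ"]
epitome = greedy_epitome(strains)
print(len(epitome), coverage(epitome, strains))  # every 9-mer is covered
print(coverage(strains[0], strains))             # one strain alone: 0.8
```

With these toy strains, a single strain already covers 4 of the 5 distinct 9-mers, while the greedy string covers all of them in fewer residues than the two strains laid end to end. The all-pairs overlap search makes this far too slow for real strains of thousands of residues, and a fixed length budget, such as epitomes 5-10 times the length of a strain, would call for a method that chooses which patterns to keep rather than packing them all.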
Nebojsa Jojic, Ph.D., works at Microsoft Research, where he has conducted research in computer vision, computational biology, signal processing, and machine learning.
David Heckerman, Ph.D., M.D., is the founder and manager of the Machine Learning and Applied Statistics Group at Microsoft Research.