Technology Review Top Young Innovators
Since 1999, the editors of Technology Review have honored the young
innovators whose inventions and research we find most exciting;
today that collection is the TR35, a list of technologists and
scientists, all under the age of 35. Their work--spanning medicine,
computing, communications, electronics, nanotechnology, and
more--is changing our world.[...]
Manolis Kellis develops algorithms and techniques for analyzing the
entire genomes of different species, the better to understand those
genomes. Kellis began his PhD work with little knowledge of
biology: his undergraduate degree is in computer science. For his
thesis, he compared the genomes of four yeast species to identify
all the genes and regulatory sequences in one of them--a project
he's glad no one told him was believed to be impossible.
Comparing the genomes of multiple closely related species has
proved to be a powerful new tool for finding genes and the
sequences that regulate them, and for learning about how genomes
evolve (see "Finding Evolution's Signatures").
After validating his methods in yeast, Kellis has moved to the
human genome, which he has so far compared with those of the mouse,
rat, and dog. His work is providing an intimate understanding of
the human genome that may give drug developers new points of entry
in their attempts to combat viruses and other causes of disease.
Museum of Science Award
Though far from finished with their own work, three senior MIT
researchers passed the torch to a new generation of scientists on
Tuesday, Nov. 9 at the Museum of Science in Boston.
For the past two years, the museum has named several young New England
scientists as the "Next Generation" of revolutionary researchers whose
work already has made a significant contribution to their field. This
year, the three honorees all work in biotechnology at MIT. [...]
Professor Eric Lander, founding director of the Eli and Edythe
L. Broad Institute, whose mission is to create tools for genomic
medicine, make them broadly available and use them to propel the
understanding and treatment of disease, called it "a good and new
experience for me to be part of the old generation." In his
introduction of Kellis, Lander said he was impressed with the young
scientist's enthusiasm and insights, referring to Kellis as an
"extraordinary ball of energy."
Kellis proved Lander's point by speaking enthusiastically about his
work. He called it "using evolution to inform genomics, and using
genomics to understand evolution." Kellis, who earned his Ph.D. in
computer science from MIT, has developed new computational paradigms
to help decipher DNA signals, understand gene regulation and clarify
the evolutionary mechanisms of genomes. He has applied these tools to
the yeast and human genomes, to systematically study all genes and
regulatory elements. His work also showed that the yeast genome arose
by whole-genome duplication, and that a similar event shaped the early
vertebrate evolution of several fish.
Kellis' "never-ending smile and unabashed optimism has always
impressed me," Lander said. "This is definitely our future."
Combining computer science with biology
University professors are often portrayed as self-absorbed
individuals who are too busy plotting their next breakthrough to pay
any attention to their undergraduate students. While I have never
encountered such a professor at MIT, Manolis Kellis, Assistant
Professor of Computer Science, definitely destroys this
all-too-prevalent stereotype. Last week, Prof. Kellis was honored as
one of Technology Review's Top 35 Innovators Under 35 for his
pioneering research in comparative genomics. When I sat with him today
and asked him what it was like to receive such a distinction, he
seemed genuinely surprised that I had even heard the news and
immediately attributed his achievements to his unbelievable colleagues
and students. He spoke very openly about his passion for research,
love for MIT, outlook for the future of genomics, and pressures of
living up to the hype that awards generate.
Kellis was born in Greece, but moved with his family to France and
eventually arrived in the U.S. in 1993. Manolis, his sister, and his
brother were all accepted to MIT within nine months of each other. He
says that MIT was the only school he applied to, and for him it was
the obvious choice. At MIT, he felt that the sky was the limit and he
could do whatever he wanted to do. But since his acceptance, he admits
that his path has been partly determined by a series of coincidences.
He chose to study Computer Science because it was an interesting,
broad major that could open doors to any area. Manolis got his first,
and only, UROP by "total chance." As he was walking through a
corridor, he saw a friend who was on his way to a job orientation for
the World Wide Web Consortium, led by Tim Berners-Lee (father of the
Web). It sounded interesting so he tagged along and got chosen for one
of the positions. He wrote a programmable WebCrawler for his project,
but more importantly, he got an early start, which attracted companies
and led to better opportunities. Without that UROP, he says, he
"probably wouldn't be sitting here today".
Manolis's interest in biology, and genetics in particular, also
appears to be serendipitous. One day, he ran into a friend who
happened to be reading a biology book that he himself owned. That
friend opened his eyes to biology and introduced him to Eric Lander,
the driving force behind the Human Genome Project, who eventually
became his thesis advisor. Manolis says that seeing the genomic data -
the string of A, T, G, and C's - was like seeing himself in the
mirror. He became fascinated by the "code that makes us work"
and "could never look back."
Manolis continued his studies at MIT by earning a Master of
Engineering (M.Eng.) in Electrical Engineering and Computer Science,
and then entered the field of comparative genomics for his Ph.D. while
the field was still in its infancy. Comparative genomics basically
compares the genomes of different organisms to figure out what in the
genome is important and how organisms might have evolved. For example,
if the same string of A, T, G, and C's appears in dogs, rats, and
humans at about the same place, then chances are that this string
codes for something important that is worth keeping around for
millions of years. So Manolis compared four eukaryotic genomes, all of
them yeasts, using a novel process to find genes and other pieces of
DNA that determine when a gene is expressed. His research has received
numerous awards and has been published seven times in the prestigious
journal Nature in the past three years.
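The conservation idea in this paragraph can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned so that each column compares the same position across species. The toy human, dog and rat sequences and the `min_run` threshold are invented for illustration. Exact identity is a deliberate simplification, because real comparative analyses score conservation with statistical models of evolution.

```python
from typing import Dict, List, Tuple

def conserved_columns(alignment: Dict[str, str]) -> List[bool]:
    """Return True for each alignment column where every species has the same base."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) yields one tuple of bases per column; gaps ('-') never count as conserved
    return [len(set(column)) == 1 and "-" not in column for column in zip(*seqs)]

def conserved_runs(flags: List[bool], min_run: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of at least min_run consecutive conserved columns."""
    runs = []
    start = None
    for i, flag in enumerate(flags + [False]):  # trailing False closes a run at the end
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    return runs

# Hypothetical toy alignment, not real genome data
alignment = {
    "human": "ACGTTGCATGCAAT",
    "dog":   "ACGTTGCATGCTAC",
    "rat":   "TCGTTGCATGCGAT",
}
print(conserved_runs(conserved_columns(alignment), min_run=5))  # [(1, 11)]
```

The single conserved column at position 12 is dropped because it is shorter than `min_run`, which mirrors the intuition that a long stretch shared across species is stronger evidence of function than an isolated matching letter.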
Finally, Kellis accepted a faculty position at MIT because of the
students, his love for academia, the sense of camaraderie, and the
ability to be in the middle of everything. After all, he points out,
where else can you teach at the #1 program in Computer Science,
surrounded by some of the best biological and medical institutions in
the world? He also gushes about his students, whom he calls "so
brilliant." Kellis emphasizes that he has so much to learn from his
students and loves the feeling that "we are all sitting together at
a round table, trying to understand science." In general, he finds
MIT students to be intellectual, motivated, sincere, diverse, and down
to earth. He also loves that he has always felt accepted by MIT
professors and appreciates that he has been treated with the utmost
respect since his freshman year.
Few people have seen MIT from as many perspectives as Manolis
Kellis: as an undergraduate, a graduate student, and a member of the
faculty. I would also venture to say that few people understand what
makes MIT unique as well as he does. He credits the success of MIT
to a group of people who are brilliant in their own ways but work
together. The power of diversity is evidenced by the fact that a
colony of genetically identical bacteria can be wiped out by a single
antibiotic. In the same way, cloning is boring, since the secret to
survival lies in the mutations, the diversity. Kellis finally
emphasized that in life there are no right answers; we must always be
creative, grab opportunities when they appear, and accept that
mistakes will happen.
Innovator is well-suited to the language of MIT.
Kellis, now an assistant professor at MIT, recalls that the family
was living in Greece when his father decided to move the family to
France. No one in the family could speak French. Indeed, none even
knew where they were headed, which was a small city called Aix en
"I had been at the head of my class in Greece, but suddenly I
couldn't even speak the language of the school," said Kellis,
27. "I did math with a dictionary, first translating the problem
from French to Greek before I could work on the problem." "Looking
back, it was a great experience in having to adapt and acquire new
skills. At the time it was tough."
Kellis was honored recently in an event hosted by the Museum of
Science. He was one of three young MIT professors declared "Young
Innovators to Watch." Others honored were Chris Burge and Angelika
Eric Lander, who heads the Broad Institute of genomic studies in
Cambridge, introduced Kellis as one of the most promising
researchers at the institute. Lander lauded Kellis for his ability
to apply computational paradigms to the complex tasks of
deciphering DNA signals, understanding gene regulation and
clarifying how genes evolve. Lander, founding director of the
institute, said Kellis' work has the potential to help cure disease
and also contribute to the understanding of natural history and
The term "evolution" is a relevant one to Kellis, at least in terms
of his education. After four years in France, he moved on to
the Lycée Français in New York, where he added English to his
linguistic repertoire. Never one to remain in one place, he
enrolled at MIT. This decision was not as unexpected as some
earlier ones: both his sister and brother were already attending the
Cambridge school. "I chose MIT because it's the best," Kellis
said. "And the experience has turned out to be a great one." "What
I love about the people - faculty and students - is the belief that
nothing is impossible. People are always pushing the bounds of
After he received his undergraduate degree, he spent several years
in the San Francisco Bay Area. He worked at Xerox PARC, a renowned research
center for all things computer. Kellis says he worked on
computational geometry, modular robotics and human motion
understanding. Yet after several years there, he returned to
Cambridge to seek his Ph.D. in bioinformatics.
Like a good New Englander, he appears to have absorbed the credo
that adversity, at least in terms of weather, can edify the man.
Indeed, his web site contains the quote "Live in New York once, but
leave it before it makes you hard. Live in California once, but
leave before it makes you soft." Now again entrenched in
Cambridge, Kellis is affiliated with both the Broad Institute and
MIT's Computer Science and Artificial Intelligence Laboratory.
Because bioinformatics is such a new field, he has the opportunity
to become a pioneer as young scientists continue to add to the
knowledge of the field.
Though many MIT professors eventually leave for high-tech startups,
Kellis appears content to pursue his career in
academia. "I just started an assistant professorship, and my goal
is research and academic life," he said. "There might be
consulting opportunities or business opportunities in the future,
but at this point I am excited to be at an institution with so many
possibilities for research."
Finally, Kellis appears prepared to stay put.
Genomic revelations from fly's family tree
In one of the first large-scale comparisons of multiple animal
genomes, scientists at the Broad Institute of MIT and Harvard, the
Computer Science and Artificial Intelligence Laboratory (CSAIL) at
MIT, and many collaborating institutions, have analysed the genomes of
twelve species of the fruit fly Drosophila to reveal insights on the
evolution of genes and genomes and to discern the functional elements
encoded in animal DNA. The work appears in the November 7 issue of
Nature and in more than 40 accompanying papers in Genome Research and
other journals. The method of comparing the genomes of multiple
related species, fly or otherwise, not only reveals new insights into
species evolution and identifies thousands of novel genes and other
functional elements, but also provides a powerful tool for unravelling
genome function that may help researchers unlock the secrets of our
own genome.
In these papers, the international consortium reported the genomes of
ten newly sequenced Drosophila species, some very closely related and
others less so, and their comparison to two previously sequenced flies
including Drosophila melanogaster, one of the most powerful model
organisms for the study of animal biology and evolution. The
availability of the many Drosophila genomes has yielded many new
insights into genome function and aided the study of how
genomes have changed across evolutionary time.
"Having the sequences of many closely related species allows us to
study the evolutionary forces that have shaped the fruit fly's family
tree, and to discover the working parts of the fly genome in a
systematic way," said Manolis Kellis, associate member of the Broad
Institute, assistant professor in MIT's CSAIL, and one of the
consortium's project leaders.
On one hand, the researchers studied the differences across species to
help elucidate how evolution has shaped fly biology over millions of
years. Their analysis revealed that while many attributes of
Drosophila genomes are in fact conserved across multiple species, each
species has novel features not seen in any other. In fact, only 77
percent of the approximately 13,700 protein-coding genes in
D. melanogaster are shared with all of the other 11 species. For
example, the genes involved in interactions with the environment and
in reproduction showed signs of adaptive evolution, meaning that they
likely provided some survival advantage to the organism.
On the other hand, the researchers studied the similarities of the
different species to help define the functional parts of the fly
genome. The parts of a genome that are unchanged (conserved) are those
that have been kept by evolution, and are thus likely to play crucial
roles. Thus, genome comparison can reveal which regions of the genome
are functional, based on the degree to which evolution has conserved
them.
"Focusing on the conserved part of the genome is a great way to
discover what has been maintained by evolution," said
Kellis. "Moreover, by looking more closely at the subtle patterns of
mutation within conserved regions, we can predict the functional roles
they play."
Indeed, at the level of DNA, several combinations of letters, or
nucleotides, may encode the same function, in the way that a
storyteller can use different combinations of words to tell the same
tale. For example, four different nucleotide combinations - GTT, GTC,
GTA, and GTG - all encode the same protein building block, or amino
acid. Thus, a change in the third letter would leave the amino acid
unchanged, one example of how DNA changes can be tolerated while still
preserving the function of the corresponding protein.
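The codon example above can be checked with a short Python sketch. It builds the standard genetic code as a lookup table and confirms that GTT, GTC, GTA and GTG all map to valine, while changing the first letter instead alters the amino acid. The helper names `CODON_TABLE` and `is_synonymous` are illustrative and do not come from any particular library.

```python
from itertools import product

# Standard genetic code; codons are enumerated with bases in T, C, A, G order
# at each position, and '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def is_synonymous(codon: str, position: int, new_base: str) -> bool:
    """True if replacing the base at `position` (0, 1 or 2) leaves the amino acid unchanged."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]

# The four codons named in the text
print([CODON_TABLE[c] for c in ("GTT", "GTC", "GTA", "GTG")])  # ['V', 'V', 'V', 'V']
print(is_synonymous("GTT", 2, "A"))  # True: GTA still encodes valine
print(is_synonymous("GTT", 0, "A"))  # False: ATT encodes isoleucine
```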
Through these kinds of random mutations, evolution explores the space
of possible nucleotide combinations that preserve function. This
exploration produces unique patterns of genomic change, described by
the researchers as "evolutionary signatures" that are specific to the
function of that region of DNA. Protein-coding genes, for example,
show frequent substitutions at every third nucleotide, because one
amino acid can be encoded by several nucleotide triplets. In
contrast, some genes that don't encode proteins - so-called RNA genes
- show changes that preserve the overall structure of RNA while
tolerating changes in the genes' DNA sequence.
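The protein-coding signature described above can be sketched as a count of how many differences between two aligned coding sequences fall at the first, second and third position of each codon. This is a minimal illustration under stated assumptions. The two sequences are hypothetical and are assumed to be already aligned and in frame. A real analysis would weigh such counts across many species rather than a single pair.

```python
from typing import List

def substitutions_by_codon_position(seq_a: str, seq_b: str) -> List[int]:
    """Count mismatches between two aligned, in-frame coding sequences
    at codon positions 1, 2 and 3 (returned as a three-element list)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, of equal length, and a multiple of 3")
    counts = [0, 0, 0]
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            counts[i % 3] += 1  # i % 3 is the position within the codon
    return counts

# Hypothetical sequences: both encode valine-alanine-aspartate-glycine,
# but they differ at every third position
species_1 = "GTTGCAGATGGC"
species_2 = "GTCGCTGACGGT"
print(substitutions_by_codon_position(species_1, species_2))  # [0, 0, 4]
```

A skew toward the third position, as here, is the kind of pattern that distinguishes protein-coding regions from other conserved DNA.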
Like codebreakers turning their knowledge of biology into
computational algorithms, Kellis and his colleagues identified
evolutionary signatures associated with a variety of roles in the
genome: protein-coding genes, non-coding RNAs, microRNAs, and
regulatory motifs. In each case, the researchers identified distinct
evolutionary signatures associated with each function, based on the
tolerated changes that still preserve that function.
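The RNA-gene signature mentioned earlier consists of changes that preserve RNA structure. One way to sketch it is as a check for compensatory substitutions, where both partners of a base pair in a stem change but can still pair. The hairpin, its stem coordinates and the species names below are hypothetical. Counting only double substitutions is a simplification chosen for clarity and is not the consortium's actual scoring method.

```python
from typing import Dict, List, Tuple

# Watson-Crick pairs plus the G-U "wobble" pair, all of which can hold an RNA stem together
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def pairs_preserved(seq: str, stem: List[Tuple[int, int]]) -> bool:
    """True if every (i, j) position pair in the proposed stem can base-pair in this sequence."""
    return all((seq[i], seq[j]) in VALID_PAIRS for i, j in stem)

def compensatory_changes(alignment: Dict[str, str], reference: str,
                         stem: List[Tuple[int, int]]) -> int:
    """Count stem pairs where a species differs from the reference at both
    positions yet still forms a valid pair (a double, compensatory substitution)."""
    ref = alignment[reference]
    count = 0
    for name, seq in alignment.items():
        if name == reference:
            continue
        for i, j in stem:
            both_changed = seq[i] != ref[i] and seq[j] != ref[j]
            if both_changed and (seq[i], seq[j]) in VALID_PAIRS:
                count += 1
    return count

# Hypothetical aligned hairpin: positions 0-3 pair with 11-8, and 4-7 form the loop
alignment = {
    "reference": "GCAUUUCGAUGC",
    "species_b": "CCGUUUCGACGG",  # G-C became C-G and A-U became G-C: structure kept
    "species_c": "GCAUUUCGAUGA",  # single change breaks the outermost pair
}
stem = [(0, 11), (1, 10), (2, 9), (3, 8)]
print(compensatory_changes(alignment, "reference", stem))  # 2
print(pairs_preserved(alignment["species_c"], stem))  # False
```

Compensatory pairs like those in `species_b` are hard to explain unless the structure itself is under selection. That is why this pattern, and not plain sequence identity, points to an RNA gene.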
The researchers then used these evolutionary signatures to
systematically identify the functional elements encoded in the fly
genome, leading to hundreds of novel functional elements and many new
insights on animal biology.
The work allowed the discovery of 1,193 new sequences that encode
proteins, the flagging of 414 regions that were mistakenly labelled as
protein-coding genes, and corrections to hundreds of previously
annotated protein-coding genes. This allowed the researchers to revise
the catalogue of protein-coding genes for Drosophila melanogaster,
with updates affecting 10% of all genes. The revision was confirmed
through manual curation by scientists at the FlyBase consortium and
through large-scale experimental validation led by the Berkeley
Drosophila Genome Project.
In addition, the researchers identified hundreds of new RNA genes and
structures, new microRNA genes, and new DNA sequences involved in the
control of gene expression during embryo development and environmental
changes. The twelve genomes also allowed the prediction of very small
regulatory targets in the genome, which can help piece together the
first regulatory network for an animal genome without having to
perform intense and expensive experiments.
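Predicting short regulatory targets from conservation can be sketched as a motif conservation rate. Of all instances of a short motif in a reference genome, the rate is the fraction found intact at the aligned position in every other species. This minimal Python sketch assumes pre-aligned, gap-free sequences. The motif `GATAAG`, the sequences and the function names are hypothetical. A real analysis would typically also compare the rate against control motifs to judge whether it is unusually high.

```python
from typing import Dict, List

def motif_positions(seq: str, motif: str) -> List[int]:
    """Start positions of every (possibly overlapping) exact occurrence of motif in seq."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]

def motif_conservation_rate(alignment: Dict[str, str], reference: str, motif: str) -> float:
    """Fraction of motif instances in the reference that appear intact, at the
    same alignment columns, in every species of the alignment."""
    ref_hits = motif_positions(alignment[reference], motif)
    if not ref_hits:
        return 0.0
    conserved = sum(
        all(seq[i:i + len(motif)] == motif for seq in alignment.values())
        for i in ref_hits
    )
    return conserved / len(ref_hits)

# Hypothetical gap-free alignment; the reference has the motif at positions 0 and 9
alignment = {
    "reference": "GATAAGTTTGATAAGCC",
    "species_b": "GATAAGTTTGATCAGCC",  # second instance mutated
    "species_c": "GATAAGATTGATAAGCC",
}
print(motif_positions(alignment["reference"], "GATAAG"))  # [0, 9]
print(motif_conservation_rate(alignment, "reference", "GATAAG"))  # 0.5
```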
The work also led to many surprises. For instance, the researchers
found many protein-coding genes that defy the traditional rules of how
the genetic code gets translated into protein. For example, 150 genes
apparently bypass stop codons, the signals that would normally end
translation, and other genes encode multiple proteins in a single RNA
transcript. Other findings include surprising evidence that a single
microRNA gene locus can produce up to four functional microRNAs, each
with distinct functions.
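To illustrate what bypassing a stop codon means for the resulting protein, here is a toy Python translator in which the first N stop codons can optionally be read through. The sequence is hypothetical, and `X` is only a placeholder for whatever amino acid is inserted at the skipped stop. This models the outcome of readthrough, not the comparative method the researchers used to detect it.

```python
from itertools import product

# Standard genetic code (T, C, A, G order at each codon position); '*' marks a stop
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds: str, readthrough_stops: int = 0) -> str:
    """Translate an in-frame DNA coding sequence codon by codon.

    Translation normally ends at the first stop codon. `readthrough_stops`
    lets the first N stop codons be skipped, as a crude model of readthrough.
    """
    protein = []
    skipped = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            if skipped < readthrough_stops:
                skipped += 1
                protein.append("X")  # placeholder for the amino acid inserted at the stop
                continue
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical sequence: ATG GTT TGA GCA GAT TAA
cds = "ATGGTTTGAGCAGATTAA"
print(translate(cds))                       # MV
print(translate(cds, readthrough_stops=1))  # MVXAD
```

The readthrough version carries two extra amino acids after the placeholder. This shows why annotation tools that assume translation always halts at the first stop codon would miss that part of the protein.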
The team's analysis marks the first time that such a diverse range of
evolutionary signatures has been applied to identify the functional
elements of a genome in a comprehensive way. "By comparing many
closely related genomes, we were able to discover things we never
thought were possible using one genome sequence alone," said
Kellis. One intriguing possibility is that evolutionary signatures may
even identify novel, yet unknown classes of functions. For example,
although the fruit fly has been intensely studied for over a century,
microRNAs were only discovered in the last decade, and are now known
to play a central role in development. Many other classes of yet
unknown functional elements may be hidden in the fly genome, and
recognition of their common evolutionary properties may help lead to
their discovery.
The study of the 12 flies has immediate implications for the discovery
of functional elements in the human genome. "We are now using similar
methods to analyse 32 mammalian genomes, in order to help understand
the human genome," Kellis explained. "We should be able to apply the
methodology of evolutionary signatures to any group of closely related
species."
Fly consortium uncovers swarm of novel findings about genomic evolution, function
Scientists compare 12 fruit fly genomes
An international research consortium of scientists, supported by the
National Human Genome Research Institute (NHGRI), part of the National
Institutes of Health (NIH), today announced publications comparing the
genome sequences of 12 closely related fruit fly species, 10 of which
were sequenced for the first time. The analyses identify thousands of
novel genes and other functional elements in the insects' genomes, and
describe how evolution has shaped the genomes of these important
models for genetic research.
"This remarkable scientific achievement underscores the value of
sequencing and comparing many closely related species, especially
those with great potential to enhance our understanding of fundamental
biological processes," said Francis S. Collins, M.D., Ph.D., director
of NHGRI. "Thanks to the consortium's hard work, scientists around the
world now have a rich new source of genomic data that can be mined in
many different ways and applied to other important model systems as
well as humans."
The fruit fly is one of the most important model organisms in genetic
research. In studies dating back nearly a century, researchers used
fruit flies to discover the basic rules of inheritance and to study
how a single cell, the fertilized egg, develops into a whole
animal. Because fruit flies are easy to work with in laboratory
settings, they continue to be used as a model to study fundamental
biological processes that occur in many living things, including
humans.
Although the fruit fly genome is about one twenty-fifth the size of the
human genome, many of the flies' genes correspond to those in humans
and control the same biological functions. In recent years, fruit fly
research has led to discoveries related to the influence of genes on
diseases, animal development, population genetics, cell biology,
neurobiology, behavior, physiology and evolution.
In papers published in the journal Nature, the Drosophila Comparative
Genome Sequencing and Analysis Consortium compare the genome sequences
of Drosophila melanogaster, which was published in 2000, and
D. pseudoobscura, published in 2005, with the recently sequenced
genomes of D. sechellia, D. simulans, D. yakuba, D. erecta,
D. ananassae, D. persimilis, D. willistoni, D. mojavensis, D. virilis
and D. grimshawi. In addition, two companion manuscripts in today's
Nature were contributed by researchers from the Laboratory of Cellular
and Developmental Biology of the National Institute of Diabetes and
Digestive and Kidney Diseases, at NIH.
The work was carried out by hundreds of scientists from more than 100
institutions in 16 countries. The sequencing of the 10 new genomes was
led by Agencourt Bioscience Corp., Beverly, Mass. Other sequencing
centers contributing to the sequencing were Washington University
School of Medicine, St. Louis, Mo., the Broad Institute of MIT and
Harvard, Cambridge, Mass., and the J. Craig Venter Institute,
Rockville, Md. The sequencing centers were funded as part of NHGRI's
Large-Scale Sequencing Research Network.
To the average person, one fruit fly hovering around an overripe
banana looks pretty much like any other. Researchers found that, at
first glance, the genomes of the various types of fruit flies appear
quite similar. However, a more detailed examination reveals that only
77 percent of the approximately 13,700 protein-coding genes in
D. melanogaster are shared with all of the other 11 species.
Scientists observed that different regions of the fruit fly genomes,
including protein-coding genes and gene families, are evolving at
different rates. For example, genes involved in taste and smell,
detoxification and metabolism, sex and reproduction, and immunity and
defense appear to be the most rapidly evolving in the fruit fly
genome.
The findings suggest that these particular protein-coding genes likely
evolve in the fruit fly genome as a result of adaptation to changing
environments and sexual selection. For instance, the fruit fly species
D. sechellia, whose population lives on the Seychelles islands in the
Indian Ocean, is losing gustatory (taste) receptors approximately five
times faster than other fruit fly species that generally encounter a
more diverse set of foods than those available on an island.
In a surprising finding, researchers found that the genes that produce
selenoproteins appear to be absent in the D. willistoni
genome. Selenoproteins are proteins that contain the mineral selenium,
an antioxidant nutrient found in a variety of food sources, built into
the amino acid selenocysteine. Selenoproteins had been found in all
animals examined, including humans. D. willistoni appears to be the
first animal known to lack these proteins. However, researchers suggest
that D. willistoni may encode selenoproteins in a different way, opening a new
avenue for further research.
A project leader and co-author for the studies, William M. Gelbart,
Ph.D., of Harvard University in Cambridge, Mass., said, "The
availability of the 12 fruit fly genomes resulted in a dramatic
increase in resolution allowing us to examine how evolution has
fine-tuned biological processes. Our work shows that discovery power
increases with the number of genomes available for comparison."
More than 40 companion manuscripts with further detailed analyses are
in current and forthcoming issues of Bioinformatics, BioMed Central
(BMC) Bioinformatics, BMC Evolutionary Biology, BMC Genomics, Genetics,
Genome Biology, Genome Research, Journal of Insect Science, Molecular
Biology and Evolution, Nature Genetics, Public Library of Science
(PLoS) Genetics, PLoS One, Proceedings of the National Academy of
Sciences, and Trends in Genetics.
In addition to their analyses aimed at gaining a better understanding
of genomic evolution, consortium scientists used the 12 fruit fly
genomes to identify thousands of new genes and other functional
elements. This work will bolster efforts to find all functional
elements in the reference genome sequence of D. melanogaster.
"Comparing the 12 fruit fly genomes allowed us to recognize
evolutionary signatures characteristic of each function. These
signatures enabled us to distinguish and identify thousands of new
functional elements," said Manolis Kellis, Ph.D., of the Massachusetts
Institute of Technology in Cambridge, Mass., and a co-author of the
Nature papers.
Specifically, researchers used the evolutionary signals to discover
1,193 new protein-coding sequences and called into question 414
sequences previously reported as protein-coding genes in the
D. melanogaster genome sequence. In addition, they found hundreds of
novel functional elements across the 12 fruit fly genomes, including:
non-protein coding genes; regulatory elements involved in the control
of gene transcription; and DNA sequences that mediate the structure
and dynamics of chromosomes.
"Our analyses only represent a small portion of questions that can be
answered in the context of these 12 species," said Andrew G. Clark,
Ph.D., from Cornell University in Ithaca, N.Y., a co-author on the
Nature papers. "Today's findings represent an important starting point
for future research aimed at understanding the function of the genome
features we discovered and their relevance to the human genome."
Fruit fly genome provides evolutionary insight
Scientists have cracked the DNA code of a dozen different species of
fruit fly, a tour de force that will lay bare new details of how
evolution works.
The study will shed light on human medicine too and has already
revealed that earlier methods to find genes are flawed.
The fruit fly is the world's favourite laboratory animal because many
of its genes correspond to those in humans and control the same
biological functions, a fact underlined by the scale of today's study,
which was carried out by hundreds of scientists from over 100
institutions in 16 countries.
"The evolution of the fruit fly is interesting in itself as,
basically, they have been following us around the planet as we discard
rotting fruit - they started off in Africa with us - now are
everywhere," said Dr Ewan Birney of The European Bioinformatics
Institute, Cambridge, who comments on the work in Nature.
"Now these fruit flies, who have been our evolutionary companion as we
left Africa, repay us by giving insights into their genome. Flies can
remember things, they can get drunk," he said, listing some of the
ways they will help human medicine.
"Their evolution is particularly interesting, especially in Hawaii where
there are these weird super-sized flies," he said, adding that he
expected the work would pave the way for similar comparisons of
mammalian genomes.
More than 40 manuscripts related to these studies are forthcoming, in
addition to the papers published today in the journal Nature.
The studies reveal new details of how genes are regulated (turned on
and off), which is important for understanding how relatively few genes
(around 14,000 and 20,000, respectively) can build an organism as
complex as a fruit fly or a human being.
Genes are messages written in DNA that control the manufacture of the
proteins that build and operate our bodies. Today's papers also reveal
major flaws in the way scientists identify genes.
They found 1,193 new DNA sequences that encode proteins, 414 regions
that were mistakenly labelled as protein-coding genes, and made
corrections to hundreds of stretches of DNA previously thought to be
protein-coding genes. The resulting corrections will affect 10% of all
fruit fly genes.
The scientists also learned certain genes appear to be evolving faster
than others, such as the genes associated with smell and taste, sex
and reproduction, and defences against disease.
"Having the sequences of many closely related species allows us to
study the evolutionary forces that have shaped the fruit fly's family
tree," said Manolis Kellis, associate member of the Broad Institute,
near Boston.
"This remarkable scientific achievement underscores the value of
sequencing and comparing many closely-related species, especially
those with great potential to enhance our understanding of fundamental
biological processes," said Dr Francis Collins, director of the US
National Human Genome Research Institute, Maryland.
"Scientists around the world now have a rich new source of genomic
data that can be mined in many different ways and applied to other
important model systems as well as humans."
Surprises in fly genome
Comparative genome sequencing of 12 Drosophila species reveals new
genes, gene structures, and regulators
Even after decades of genetic study, the Drosophila melanogaster
genome still contains undiscovered genes and other genetic elements,
according to a study in this week's Nature.
By comparing evolutionary signatures in the genome sequences of 12
Drosophila species, the authors found new protein-coding, RNA, and
microRNA genes, as well as gene regulators and targets. They also
discovered that several unusual translation mechanisms -- including
skipped stop codons and reading-frame shifts -- are more common than
previously thought.
Accompanying research papers in Nature this week present an overview
of Drosophila genome evolution, as well as new findings in Drosophila
sex chromosomes and sex-biased gene expression.
Finding new protein-coding genes in an organism as well-studied as
D. melanogaster is "an interesting surprise," said Elliott Margulies
of the National Human Genome Research Institute in Rockville, Md., who
was not involved in the work.
Scientists led by four researchers -- Alexander Stark, Michael Lin,
and Pouya Kheradpour of the Broad Institute of MIT and Harvard in
Cambridge, Mass., and Jakob Pedersen of the University of Copenhagen
-- examined the 12 Drosophila sequences for evidence of regions that
have been under natural selection. They scanned the genomes for unique
evolutionary signatures associated with each type of genetic
element. For example, conserved protein-coding regions usually show
base changes that do not affect amino-acid sequence, while RNA genes
allow mutations that preserve base-pairing interactions and microRNA
genes show strong conservation only in certain parts of their
This approach "kicks up a notch the kind of comparative genomics that
you can do," Margulies said. While most previous comparative
studies only allowed researchers to determine whether a given region
went through selection, using these signatures identifies what type of
element it likely is.
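The protein-coding signature described above (base changes that leave the amino-acid sequence intact) can be made concrete with a toy calculation. The Python sketch below is a simplified illustration, not the consortium's actual method. It compares two aligned, in-frame coding sequences codon by codon and counts the differences that keep the same amino acid (synonymous) and the ones that change it (nonsynonymous). The function name and the gap-free, equal-length input are assumptions made for this example.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(seq_a, seq_b):
    """Count differing codons that are synonymous vs. nonsynonymous.

    Hypothetical helper: assumes two aligned, gap-free, in-frame
    coding sequences (uppercase DNA) of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codon: no substitution to classify
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # base change, same amino acid
        else:
            nonsynonymous += 1  # base change alters the amino acid
    return synonymous, nonsynonymous


print(classify_codon_changes("ATGCTGAAAGGT", "ATGCTTAAGGGC"))  # (3, 0)
```

A real conserved coding region would show many synonymous and few nonsynonymous changes across species. A single pairwise count like this one is only a rough proxy for that pattern.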
The analysis predicted about 1,200 novel protein-coding exons in the
Drosophila genome, corresponding to 150 new genes. Their results led
to the revision of hundreds of gene transcription and translation
models, which senior author Manolis Kellis of the Broad Institute said
will be reflected in the next version of the annotated Drosophila
genome at FlyBase.
The authors found evidence of several unusual gene structures in the
fly, such as stop-codon readthrough, in which a stop codon is misread
or skipped, and polycistronic genes, which code for two or more
distinct proteins. They also found that the Drosophila genome contains
several instances of "programmed" changes in the reading frame of
translation, which alters how messenger RNA is read into protein. All
of these discoveries were "really unexpected," Kellis told The
Scientist. "Many protein-coding genes don't actually follow the rules
you would expect them to follow."
According to Ross Hardison of Pennsylvania State University in
University Park, who was not involved in the work, these gene
structures were thought to be very rare. "The importance of them
becomes more obvious when you see multiple examples of them in a
genome-wide study," he said.
The comparative analyses uncovered new microRNA genes, RNA genes, and
RNA structures involved in post-transcriptional processes such as
messenger RNA editing and translational control. They also revealed
many new gene regulators, including several found at higher levels in
specific tissues than regulators already known to be important in
these tissues.
Their ability to add so much information to the annotated D. melanogaster
genome through comparative genomics shows "how powerful these methods
are," Kellis said.
Analyses Of 12 Fly Genomes Reveal New Insights On Genome Evolution And Regulation
Genome Research is publishing a number of papers related to
comparative analyses of twelve Drosophila (fly) genomes. The twelve
fly genome project is unique in that the analyses of closely related
species have allowed for a more complete and correct annotation of
functional genes and regulatory elements in Drosophila melanogaster,
a major model organism in genetics.
With a life span of just weeks, the fruit fly has been an important
model organism in genetic studies for decades and has helped
researchers unravel the rules that govern inheritance. Though there
are many differences between fruit flies and humans, the two also
share many genes that regulate the same biological functions.
Expanding universe of microRNAs
MicroRNAs (miRNAs) are short RNA molecules encoded by plant and
animal genomes that have garnered significant interest for their
ability to regulate gene expression. A number of miRNAs have been
discovered in recent years; however, it is likely that many miRNAs
have gone undetected. Two papers published in Genome Research
utilize the twelve fly genomes to identify novel miRNAs, further
refine the set of known miRNAs, and investigate the biology and
origins of miRNA genes.
In a study led by Dr. David Bartel, a combination of computational
methods and high-throughput sequencing techniques identified new
miRNAs conserved across the Drosophila species. "The new fly genomes
enabled us to predict new miRNAs, 20 of which we experimentally
confirmed, and the genome alignments enabled us to more accurately
predict the evolutionarily conserved targets of these and other
miRNAs," explains Bartel.
While computational methods are important for identifying novel
miRNAs, large-scale sequencing of small RNAs indicates that many
miRNAs continue to evade prediction. "Most of the 59 novel miRNAs
that we found were not predicted by us or by others," says
Bartel. "This illustrates the advantages of high-throughput
sequencing of small RNAs, and the limitations of comparative
sequence analysis for miRNA gene identification."
In a related paper, a study led by Dr. Manolis Kellis utilized the
twelve Drosophila genomes to computationally predict and
experimentally validate novel miRNAs by defining the structural and
evolutionary properties of known miRNAs. Classification of newly
identified miRNAs has revealed greater diversity in the regulation
of gene expression by miRNAs, with increased potential for
combinatorial regulation, and provided new insights on miRNA
biogenesis and function. "We learned that both arms of a miRNA
hairpin can produce functional miRNAs, which sometimes work
cooperatively to target a common pathway," explains Kellis.
The combination of comparative and experimental analyses by both
groups also provided novel evidence for emergent gene function,
deriving from the portion of the miRNA hairpin previously believed
to be discarded, and the strand of the DNA previously not thought to
produce a miRNA.
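A miRNA hairpin holds together because its two arms are complementary. The signature mentioned earlier for RNA genes, mutations that preserve base-pairing, rests on the same rule. The Python sketch below is a toy illustration of that idea and is not the method used in either study. The stem coordinates, sequences, and function names are invented for the example, and the sketch accepts G-U wobble pairs as valid.

```python
# Watson-Crick pairs plus the G-U "wobble" pair tolerated in RNA stems.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_intact(seq, stem_pairs):
    """True if every (i, j) position pair in the stem can still base-pair."""
    return all((seq[i], seq[j]) in VALID_PAIRS for i, j in stem_pairs)


def compensatory_changes(seq_a, seq_b, stem_pairs):
    """Count stem pairs changed on both sides yet still paired in both sequences."""
    count = 0
    for i, j in stem_pairs:
        both_changed = seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]
        pair_a, pair_b = (seq_a[i], seq_a[j]), (seq_b[i], seq_b[j])
        if both_changed and pair_a in VALID_PAIRS and pair_b in VALID_PAIRS:
            count += 1
    return count


# Toy hairpin: a 3-pair stem (positions 0-10, 1-9, 2-8) around a 5-base loop.
STEM = [(0, 10), (1, 9), (2, 8)]
species_1 = "GGCAAAAAGCC"
species_2 = "AGCAAAAAGCU"  # outer G-C pair became A-U: both sides changed
print(stem_intact(species_2, STEM), compensatory_changes(species_1, species_2, STEM))
# True 1
```

In this example both bases of a pair changed and the stem still holds. Across many species, that kind of paired change is stronger evidence for a functional RNA structure than simple sequence conservation.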
Revisiting D. melanogaster
Drosophila melanogaster is one of the most intensely studied model
organisms in biology. Numerous studies over the years have defined
nearly 14,000 protein-coding genes by experimental and computational
methods; however, these methods are likely to have produced erroneous
annotations or may be missing other annotations. In order to assess
the D. melanogaster protein-coding gene catalog, a group of
researchers led by Dr. Manolis Kellis identified evolutionary
signatures of protein-coding genes by comparative analysis of the
twelve fly genomes. This strategy was then applied to evaluation of
the current catalog and identification of genes that have escaped
detection.
The study led to the discovery of hundreds of new genes, refined
existing genes, and concluded that more than 10% of the
protein-coding gene annotations require refinement.
Additionally, the work revealed abundant unusual gene
structures. "We have learned that many brain-expressed proteins may
be undergoing post-transcriptional changes by stop-codon
read-through," explains Kellis. "We found 149 genes for which a
conserved stop codon is followed by strong evidence of
protein-coding selection for up to hundreds of amino acids,
suggesting a new mechanism for post-transcriptional regulation in
animal genomes." The researchers also report additional widespread
evidence suggesting several diverse mechanisms of
post-transcriptional regulation for protein-coding genes.
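In reading-frame terms, a stop-codon readthrough candidate is an in-frame stop codon followed by a stretch that keeps reading like protein-coding sequence. The Python sketch below is a hypothetical simplification that looks at a single sequence. The function name and toy sequence are assumptions, and the sketch does not compute the cross-species evidence of protein-coding selection that the study relied on.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def readthrough_candidate(cds):
    """Locate the first in-frame stop codon and measure the open stretch after it.

    Returns (stop_index, codons_until_next_stop) in codon units, or None if
    there is no in-frame stop. Hypothetical helper: a long downstream stretch
    is only a candidate; it says nothing about selection in other species.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons):
        if codon in STOP_CODONS:
            extension = 0
            for later in codons[index + 1:]:
                if later in STOP_CODONS:
                    break
                extension += 1
            return index, extension
    return None


print(readthrough_candidate("ATGAAATGAGGTCCCTAA"))  # (2, 2)
```

An open stretch of two codons is trivial. The cases described above extend for up to hundreds of amino acids, which is much harder to explain by chance.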
Scientists compare 12 fruit fly genomes
WASHINGTON, Nov. 7 (Xinhua) -- An international research
consortium of scientists announced on Wednesday their publications
comparing the genome sequences of 12 closely related fruit fly
species, 10 of which were sequenced for the first time.
The analyses identify thousands of new genes and other functional
elements in the insects' genomes, and describe how evolution has
shaped the genomes of these important models for genetic research.
The work was carried out by hundreds of scientists from more than
100 institutions in 16 countries. In papers published in the
journal Nature, the consortium compare the genome sequences of
Drosophila melanogaster, which was published in 2000, and
D. pseudoobscura, published in 2005, with the recently sequenced
genomes of 10 fruit fly species.
Researchers found that, at first glance, the genomes of the
various types of fruit flies appear quite similar. However, a more
detailed examination reveals that only 77 percent of the
approximately 13,700 protein-coding genes in D. melanogaster are
shared with all of the other 11 species.
Scientists observed that different regions of the fruit fly
genomes, including protein-coding genes and gene families, are
evolving at different rates. For example, genes involved in taste
and smell, detoxification and metabolism, sex and reproduction,
and immunity and defense appear to be the most rapidly evolving in
the fruit fly genomes.
The findings suggest that these particular protein-coding genes
likely evolve in the fruit fly genome as a result of adaptation to
changing environments and sexual selection.
"Scientists around the world now have a rich new source of genomic
data that can be mined in many different ways and applied to other
important model systems as well as humans," said Francis Collins,
director of the U.S. National Human Genome Research Institute,
which supported the project.
Making a buzz: Gene study on fruitflies sheds light on evolution
Lab sleuths said they had laid bare the complete genetic code of a
family of fruitflies, enabling the world's first comparison of the
genomes of a dozen closely-related species.
The achievement opens a window onto key evolutionary processes and may
one day serve as a model for understanding why we humans and our
nearest primate cousins are so close yet also so different, they
said.
In studies to be published on Thursday in the British journal Nature,
some 150 scientists around the globe added 10 fully-sequenced
genomes of the fruitfly Drosophila to two that had already been
sequenced.
They also began to mine the treasure trove of data, publishing
analyses touching on two cornerstones of evolutionary theory -- the
principles of positive and negative selection.
Positive selection means genetic mutations that spread through a
species because they provide an advantage in the struggle to
survive. Negative selection means genetic characteristics that are
forced out of a species because they are an encumbrance to survival.
By looking across a broad family, researchers can spot a mutation that
has been favoured as it confers an evolutionary advantage on a
specific species -- such as a change in the immune system -- or a
mutation that has been weeded out.
They can also identify genetic code that has remained unchanged, or
conserved, because of its enduring usefulness to the species.
Histone proteins, which determine how DNA is packed inside cell
nuclei, have barely changed over the 60 million years going back to
the single common ancestor from which all the Drosophila species
eventually emerged, the investigators found.
"Once evolution figures how to make something like that work, it does
not change easily," said David Rand of Brown University, whose
laboratory worked on sequencing mitochondrial DNA from all 12 fruitfly
species.
Only 77 percent of the approximately 13,700 protein-coding genes are
shared by all 12 species, the consortium says.
The humble fruitfly, especially the main species, Drosophila
melanogaster, has long been used to probe the biology of multicellular
organisms.
It is familiar to many as the barely visible subject of school-age
experiments on the mechanics of heredity.
Over the last decade, however, geneticists have gradually shunted the
harmless flies aside in favor of worms and especially mice, which
offer better models for studying the relationship between genes and
human disease.
But the newly sequenced "Drosophila Dozen," which range from the tiny
D. simulans to the relatively giant D. grimshawi of Hawaii and the
red-eyed D. yakuba of the African savannah, are sure to rekindle
interest in the winged beasts.
"The 12 Drosophila genomes give us an unprecedented opportunity to
understand evolutionary adaptation right down at the genetic level,"
said Brown researcher Kristi Montooth.
"If we want to understand how the fly that lives on the savannah is
different from the fly that lives in the desert, we can trace
physiological differences back to specific genes."
A novel approach to DNA analysis
In a milestone for the emerging field of comparative genomics, an
international team of scientists has carried out a comparative
analysis of the genome sequences of 12 different species of fruit
flies. Not only did the researchers uncover patterns in the way that
genes evolve as species adapt to different environments, but they
also developed a new way of identifying the functional elements of
the genome--a discovery with potentially far-reaching consequences.
For more than a hundred years, the fruit-fly species Drosophila
melanogaster has been instrumental in the study of genetics,
developmental biology, and animal behavior. Because a significant
number of human genes have fruit-fly analogues, researchers have
also used the insect to study many human diseases, including cancer,
diabetes, and neurodegenerative disorders such as Alzheimer's. In
2000, scientists published the genome sequence for D. melanogaster;
the sequence of a second fruit-fly species followed several years
later.
There are 1,500 species of fruit flies, however, and they vary in
appearance, behavior, and habitat. To fully understand the fruit-fly
genome and how it has evolved, a consortium of more than a hundred
labs around the world sequenced an additional 10 species and
compared all 12 sequences. The group details its findings in two
reports published in the November 8 issue of Nature.
"If you want to get a crystal-clear picture of how genes influence
what an animal will look like, what it will eat, what behavior it
will exhibit, this is a completely unparalleled resource for doing
that," says Leslie Vosshall, a neurogeneticist at Rockefeller
University, in New York.
The researchers selected species from all over the world--from
Africa, Asia, the Americas, and the Pacific Islands. Some species
are widespread and feed on a range of foods, whereas others are more
limited. For instance, one species lives only on the Seychelles
islands off the east coast of Africa and eats only one kind of
fruit.
In one of the papers, a team led by Manolis Kellis, a computational
biologist at MIT, compared the 12 sequences in order to identify all
the functional elements in the fruit-fly genome. These include not
only genes that code for proteins, but also sequences that help
regulate gene expression by, for instance, encoding small RNA
molecules that bind to other parts of the genome. To find these
elements, researchers typically look for sequences that are common,
and therefore highly conserved, among different genomes. "The basic
premise of comparative genomics is that if something is conserved
over millions of years in a dozen species, it's likely to do
something useful," says Kellis.
But Kellis and his colleagues were also seeking an alternative
strategy. They figured that by looking only for sequences that have
remained roughly the same, they would miss a large number of
functional elements. For instance, protein-coding genes can undergo
extensive changes and yet retain their critical functions.
By looking at all 12 genomes, the team found that each type of
functional element changes in characteristic ways over time, and
those patterns of change serve as evolutionary signatures. For
instance, a series of three-letter DNA sequences in which the first
two letters are always conserved but the third one changes is likely
to be a protein-coding gene, says Kellis. So the researchers
designed computer algorithms to mine the sequence data and find the
evolutionary signatures for each type of functional element. "This
allowed us to find things that we would never have expected to find
just by looking at a single genome," says Kellis.
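The codon pattern Kellis describes, with the first two letters conserved and the third letter changing, suggests a simple check. The Python sketch below is a hypothetical illustration: the function and toy alignment are invented for the example and are not taken from the study. For each codon position, it measures what fraction of aligned columns show any variation across species.

```python
def variable_fraction_by_codon_position(aligned_seqs):
    """Fraction of alignment columns showing any variation, per codon position.

    Hypothetical helper: assumes gap-free, in-frame coding sequences of
    equal length, one per species.
    """
    length = len(aligned_seqs[0])
    if length % 3 != 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be equal length and a multiple of 3")
    variable = [0, 0, 0]
    total = [0, 0, 0]
    for col in range(length):
        position = col % 3  # 0, 1, 2 = first, second, third codon letter
        total[position] += 1
        if len({seq[col] for seq in aligned_seqs}) > 1:
            variable[position] += 1
    return [v / t for v, t in zip(variable, total)]


toy_alignment = ["CTGAAAGGT", "CTTAAGGGC", "CTAAAAGGA"]
print(variable_fraction_by_codon_position(toy_alignment))  # [0.0, 0.0, 1.0]
```

A skew toward the third position, as in this toy case, is consistent with protein-coding sequence. Real genomes are noisier, so an approach like the one described needs many species and many codons to tell that pattern apart from chance.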
Kellis's team found thousands of previously unidentified functional
elements, including 150 protein-coding genes and more than a hundred
microRNA genes. (MicroRNAs are short segments of RNA that silence
genes by binding to specific sites in messenger RNAs.) The researchers
also found that some genes, during their translation into proteins,
ignore certain instructions and, as a result, acquire bits of
protein encoded by other genes. "This is an entirely new mechanism,"
says Kellis, adding that his group has since found evidence of this
mechanism in the human genome as well.
The second Nature paper describes research led by Andrew Clark, a
population geneticist at Cornell University, who looked at known
genes to see how they vary from one species to another and how they
evolve, acquiring new functions as species adapt to their changing
environments. Genes involved in the immune system, for instance,
appear to evolve more rapidly than genes in the rest of the
genome. The same was true for genes that regulate insecticide
resistance.
Taste and smell receptor genes also undergo frequent changes. When
the researchers compared species of flies that are generalists with
those that have more specialized food preferences, they found that
the specialists lose genes for different taste receptors at a much
higher rate than the generalists do. "How you smell the world
influences how you eat, and this will tell us an enormous amount
about how genes that encode for smell and taste influence behavior,"
says Vosshall.
The studies of the 12 fruit-fly genomes will no doubt help
scientists better understand the human genome, says Kellis. Not only
do fruit flies and humans have so many genes in common, but now
researchers have a systematic way of interpreting genomes that could
lead to the discovery of entirely new kinds of functional elements,
he says.
A fruity dozen: sequencing effort nets many fly genomes
Normally, our reports focus on one or two papers that mark a major
or intriguing scientific milestone. Today is an exception. [...] play a huge
role in evolution. In any species' lineage, nearly half the gene
families will change in size. Somewhere in the genome, a family
member will be gained or lost every 60,000 years. The specialist,
D. sechellia, lost more than most, as its simplified lifestyle has
allowed it to forgo many metabolic activities and defenses against
toxins. In contrast to this rampant reuse of existing genes, new
genes appeared rarely. The researchers detected only 44 new genes
that were clearly not the result of horizontal gene transfer. Most
of them were short, intron-free, and involved in sex and
reproduction.
The massive data set also gives us greater perspective on the rare
and unusual. Many large, conserved protein-coding regions have stop
codons in the middle of them. The authors who identified them
suggest that 123 are cases where a single transcript codes for more
than one protein, something that was once thought to happen only in
bacteria. Another 150 or so seem to have the stop codon edited out
by enzymatic alteration of the RNA (common in some single-celled
eukaryotes, but not recognized to occur widely in animals).
I'm only scratching the surface of a couple of the publications, but
it seems that anyone interested in biology will find something
compelling somewhere in the blizzard of publications coming out of
these genomes. And get ready for something even bigger: a similar
effort is already well underway to sequence an equivalent group of
Scientists complete DNA sequencing and analysis of multiple fruit fly genomes
Copy of Broad Press Release
Fruit Fly Blitz Shows the Power of Comparative Genomics
A consortium of about 250 researchers has done a 12-way comparison
to track the evolution of genes, regulatory regions, entire
pathways, and cellular processes. Having these patterns in hand
makes it easier to spot similar features in the genomes of other
species, including humans, researchers report in more than 40
research papers in the 8 November issue of Nature and in other
journals.
"This work has really increased the sophistication of what we can
learn from comparative sequence analysis," says genomicist Elliott
H. Margulies of the National Human Genome Research Institute (NHGRI)
in Bethesda, Maryland. As project co-leader Michael Eisen of
Lawrence Berkeley National Laboratory in California points out, the
comparison "allows you to map where [genetic] changes occur along
the tree, and that allows you to study the process of evolution, not
just the product."
Manolis Kellis of the Broad Institute in Cambridge, Massachusetts,
led an assessment of how each type of gene or regulatory region
changed--or didn't change--from one species to the next, revealing
specific evolutionary patterns, or signatures. Kellis and others
have incorporated those telltale patterns into software to look for
the same patterns in other species to pinpoint each type of
DNA. "This allows us to assign function" to some regions "through
computation alone," says Margulies.
Based on a common pattern of insertions, deletions, base usage, and
substitutions, Kellis and his colleagues detected 192 previously undiscovered
protein-coding genes as well as 150 that do not follow standard
rules. Typically, protein-coding genes have a "stop" sequence that
signals the end of the gene. But in these 150 cases, protein-coding
sequences extended beyond the "stop." "It's always a little humbling
that the assumptions we are taught in school do not apply across all
genes," says Ewan Birney of the EMBL European Bioinformatics
Institute in Hinxton, U.K.
With these new tools, which are particularly useful for recognizing
regulatory DNA, Kellis and his colleagues have pieced together a
fruit fly gene regulatory network that incorporates 81 microRNAs and
67 transcription factors. "The methodology and principles are
absolutely general, and they are applicable to any genome," says
Kellis. Others say that the model still needs refining to reconcile
it with experimental results. But geneticist Rama Singh of McMaster
University in Hamilton, Canada, is quite pleased with this
beginning. Because many fruit fly and human genes are equivalent,
the network "is going to tell us a lot about humans," he predicts.
The analysis bodes well for the utility of bird, marsupial, and
reptile sequences in analyzing the human genome. It also argues for
sequencing and comparing all the primates, says Birney: "The
take-home message is that there are a lot of clear wins from doing
this sort of evolutionary genomics."
UNM Graduate Student is Co-author of Paper in Nature
UNM Graduate Student Sushmita Roy spent last summer at the
Massachusetts Institute of Technology in the lab of Manolis Kellis,
an assistant professor of Electrical Engineering and Computer
Science. He specializes in developing computational algorithms for
decoding the information present in the genomes of organisms.
As part of her internship, Roy played a small part in a large
project analyzing the genomes of 12 fly species. The paper
describing the project and its results was released this week in the
journal "Nature."
In her summer project, she analyzed statistical properties of the
fly regulatory network, computationally predicted by Kellis' lab,
with nodes representing genes and edges representing regulatory
control exercised by a "regulator" gene on a "regulated" gene.
This led to the identification of network nodes with different types
of connectivity. Nodes with high connectivity were themselves
regulators controlling important events in the growth and
development of flies.
Roy says the edges in the network also had non-random
properties. Edges had a higher chance of existing between genes
functioning in the same fly tissue, rather than in different
tissues.
The identification of these statistical properties helped the
researchers to clarify the biological significance of the predicted
regulatory network of developing flies, which can provide insight
into important developmental events in higher organisms.
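Both statistics from Roy's summer project, node connectivity and whether edges tend to link genes from the same tissue, are straightforward to compute on a directed edge list. The Python sketch below is a hypothetical illustration. The gene names, tissue labels, and the shuffling baseline are assumptions for the example, not details of the actual analysis.

```python
import random
from collections import Counter


def out_degrees(edges):
    """Number of targets per regulator, given (regulator, target) edges."""
    return Counter(regulator for regulator, _ in edges)


def same_tissue_fraction(edges, tissue_of):
    """Fraction of edges whose two genes are annotated to the same tissue."""
    shared = sum(1 for reg, tgt in edges if tissue_of[reg] == tissue_of[tgt])
    return shared / len(edges)


def shuffled_same_tissue_fraction(edges, tissue_of, trials=1000, seed=0):
    """Average same-tissue fraction after randomly reassigning targets.

    Permuting targets keeps each regulator's out-degree and each target's
    in-degree, giving a simple baseline for the observed fraction.
    """
    rng = random.Random(seed)
    regulators = [reg for reg, _ in edges]
    targets = [tgt for _, tgt in edges]
    total = 0.0
    for _ in range(trials):
        rng.shuffle(targets)
        total += same_tissue_fraction(list(zip(regulators, targets)), tissue_of)
    return total / trials


# Hypothetical toy network and tissue annotations.
EDGES = [("regA", "g1"), ("regA", "g2"), ("regA", "g3"), ("regB", "g4"), ("regB", "g1")]
TISSUE_OF = {"regA": "muscle", "g1": "muscle", "g2": "muscle",
             "g3": "gut", "regB": "gut", "g4": "gut"}
print(out_degrees(EDGES).most_common(1))  # [('regA', 3)]
print(same_tissue_fraction(EDGES, TISSUE_OF))  # 0.6
```

An observed same-tissue fraction well above the shuffled baseline would indicate the kind of non-random edge structure described above. A five-edge toy network is far too small to show anything meaningful.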
The title of the journal article is "Discovery of Functional
Elements in 12 Drosophila Genomes Using Evolutionary Signatures."
Roy is listed as one of the co-authors on the article.
Roy is working on her Ph.D. in Computer Science. Her advisors,
Assistant Professor of Computer Science Terran Lane and Professor of
Biology Margaret Werner-Washburne, are guiding her efforts to apply
statistical algorithms to understand living systems.
Her internship in the MIT summer program was sponsored by the
Program in Interdisciplinary Biological and Biomedical Sciences
(PIBBS) at UNM and the Howard Hughes Medical Institute (HHMI)
Interfaces program, and was arranged by Bruce Birren, director of the
Microbial Sequencing Center and co-director of the Genome Sequencing
and Analysis program at the Broad Institute of MIT and Harvard, and by
UNM Biology Professor Margaret Werner-Washburne.
Genome Analysis of Twelve Drosophila Species
On Thursday, November 8, came a birth announcement that had been
two years in the making: the initial publications on the
comparative genome analysis of the entire DNA sequences of 12
species of fruit fly (genus Drosophila). The announcement
included two main papers in Nature describing the community effort.
One paper (Kellis et al.) focuses on the identification of
evolutionary signatures for several different classes of functional
elements within the Drosophila melanogaster genome. The other
(Drosophila 12 Genomes Consortium) focuses on understanding gene and
genome evolution using whole genome datasets. In addition,
publication of these papers has been coordinated with the
near-simultaneous publication of more than 40 companion papers in
several different journals, notably Nature, GENETICS, Genome
Research, PLoS, BMC, Molecular Biology & Evolution, Genome Biology
and others. This work comprises the research efforts of literally
hundreds of scientists for the past 2-3 years.
The Harvard MCB FlyBase group (Bill Gelbart, PI, Lynn Crosby, Bev
Matthews, Andy Schroeder, Susan St. Pierre, Sian Gramates, Rob
Kulathinal, Margo Roark, Ken Wiley, Jr., Kyl Myrick, Jerry Antone,
AJ Bhutkar, Susan Russo and Peili Zhang) has been an integral part
of the analysis coordination of these efforts. Several members of
this group are co-authors on the main papers, as well as three
companion papers that are in press and two others that are still in
preparation. A number of surprising findings, only revealed through
this kind of comparative genome analysis, were discovered. "First,
more than 100 genes are expressed in ways that violate typical
genetic code dogma, such as stop codon readthrough and shifting of
reading frame in the middle of a translated segment", says Dr. Bill
Gelbart. Secondly, "a species that, during its evolution, has
dropped the use of selenocysteine (the so-called 21st amino acid)
during translation (Drosophila willistoni)" was discovered.
In addition to these findings, the initial analyses lay the
foundation for future studies that will dissect the functional
elements of the genomes in exquisite detail and that will help
researchers understand how ecological specialization is reflected in genome
evolution. Finally, this work presents a model for selecting a
cluster of closely related species for whole genome sequencing,
allowing better understanding of the gene products and other
functional elements encoded by the genome of a species of biological
or practical importance.
Kellis, MIT team announce significant findings in fly genome studies.
Manolis Kellis, the Karl Van Tassel Career Development Assistant
Professor of Electrical Engineering and Computer Science at MIT, and
also affiliated with the Computer Science and Artificial
Intelligence Lab, (CSAIL) and the Broad Institute of MIT and
Harvard, has announced the culmination of several years of efforts
to describe the sequencing and analysis of 12 Drosophila
genomes. The work, a large-scale project comprising a diverse
interdisciplinary team of scientists and co-led by the group at MIT,
and also including scientists at the Whitehead Institute for
Biomedical Research and the Harvard Department of Molecular and
Cellular Biology, uncovers the functional elements encoded in the
fruit fly genome as well as their evolutionary dynamics. The work
resulted in many novel findings about the biology of animal genomes,
and the computational approaches it used may also help unlock the
secrets of many other genomes, including the human genome.
This work appears in the Nov. 8 issue of Nature and in more than 40
accompanying papers in Genome Research and other journals.
The group at MIT, including the Broad Institute and the Whitehead
Institute for Biomedical Research, led the discovery effort, the
first of its kind and scale, ranging across protein-coding genes,
RNA genes, microRNAs, regulatory motifs, and regulatory networks. By
comparing the 12 species, Kellis and his colleagues were able
to discover thousands of new genes and other functional elements in
the fruit fly, learning a tremendous amount about the biology of
animal genomes, revealing new insights into their functioning and
regulation, and showing how comparative discovery power scales with
the number of species compared. In particular, the analysis
showed that some protein-coding genes defy the traditional rules of
protein translation, reading through stop codons for sometimes
hundreds of amino acids, and that some microRNA genes can produce many
functional products from a single region, encoded in overlapping
Kellis explains: "The results have major implications for the
understanding of the human genome, and our team at MIT is now
leading an effort to discover the functional elements in the human
genome by comparing 32 eutherian mammals. We are already using
similar methods to analyze these mammalian genomes, combined with
large-scale experiments to study tissue specificity, cell
differentiation, and epigenomics. The methodology of using
evolutionary signatures to discover functional elements is general,
and should be applicable to any group of closely related species."
Computational comparison of multiple Drosophila genomes proves to be a powerful research tool.
CSAIL's Computational Biology Group led by Manolis Kellis co-led one
of the first large-scale comparisons of multiple animal
genomes. Results of the project will appear in four papers in
Nature, and 40 companion papers in Genome Research, Genetics, Nature
Genetics, and other journals.
One of the unique aspects of this project is that it was led by
computational scientists, working with dozens of experimental labs
to validate and test hypotheses. "Our group at MIT led the discovery
effort, the first of its kind and scale, ranging across
protein-coding genes, RNA genes, microRNAs, regulatory motifs, and
regulatory networks," says CSAIL PI and Broad Institute Associate
Member, Manolis Kellis. "By comparing 12 species of the fruit fly
Drosophila, we were able to discover a tremendous amount about the
biology of animal genomes and reveal new insights into their
functioning and regulation. The technique of comparing genomes of
multiple related species also provides a powerful methodology that
could help researchers in the study of other genomes, including that
of humans."
Massive Project Reveals Shortcomings Of Modern Genome Analysis
The sequencing and comparison of 12 fruit fly genomes -- the result
of a massive collaboration of hundreds of scientists from more than
100 institutions in 16 countries -- has thrust forward researchers'
understanding of fruit flies, a popular animal model in science. But
even human genome biologists may want to take note: The project also
has revealed considerable flaws in the way they identify genes.
"We've made huge progress in recent years with many genomes,
including humans, but a lot of the problems can't be solved by
simply dumping data into a computer and having truth and light come
out the other end," said Indiana University Bloomington biologist
Thomas Kaufman, who co-led the project. "One of the things we've
learned from this project is that when you compare a lot of
different but related genomes, you are more likely to see the genes
that are buried in all that A-C-T-G mush."
Two current papers in Nature separately report the results of the
four-year genome project and use the data to draw some conclusions
about the fruit fly genus Drosophila, particularly its star species,
the human nuisance Drosophila melanogaster. Among the papers'
conclusions is the idea that resolving any individual species'
genome is greatly enhanced when related genomes are compared to it.
"This remarkable scientific achievement underscores the value of
sequencing and comparing many closely-related species, especially
those with great potential to enhance our understanding of
fundamental biological processes," said Francis S. Collins, director
of NHGRI. "Thanks to the consortium's hard work, scientists around
the world now have a rich new source of genomic data that can be
mined in many different ways and applied to other important model
systems as well as humans."
The consortium purposely chose a wide variety of fruit flies for
study, guessing correctly that both gene similarities and
differences among the 12 species would be easier to identify. Some
of the Drosophila species the scientists studied are closely related
to D. melanogaster, some not. Some of the flies occupy very
specialized ecological niches, such as D. sechellia, which has
evolved a unique ability to detoxify the fruit of the Seychelles'
noni tree. The other 10 species the consortium examined were
D. pseudoobscura, D. simulans, D. yakuba, D. erecta, D. ananassae,
D. persimilis, D. willistoni, D. virilis, D. grimshawi, and the
cactus-loving D. mojavensis. D. melanogaster's genome was published
in 2000 and D. pseudoobscura's genome was published in 2005. The
other genomes are newly published.
In comparing the 12 genomes, the scientists found 1,193 new
protein-coding genes and hundreds of new functional elements,
including regulatory sequences that determine how quickly genes are
expressed, and genes that encode functional RNAs such as small
nuclear RNAs. They also learned that certain genes appear to be evolving
faster than others, such as the genes associated with smell and
taste, sex and reproduction, and defenses against pathogens.
The Drosophila 12 Genomes Consortium found that D. melanogaster
shares about 77 percent of its genes with the other 11 species they
studied. The scientists also found errors in about 3 percent of
previously sequenced D. melanogaster protein-coding genes,
correcting 414 gene sequences on record.
A vexing problem for genomicists is finding genes and other
important DNA sequences in heterochromatin, tightly packed areas of
chromosomes presumed to experience little
expression. Heterochromatin is common in animal genomes.
"The heterochromatin is very hard to analyze," Kaufman
said. "Studies show heterochromatin changes the most. It's full of
intermediate- and full-repeat sequences. And there are genes buried
in this stuff."
The conventions for locating the genes that encode proteins are
pretty well established. The lingering problem for genomics
biologists is locating genes whose parts are interrupted repeatedly,
as well as locating genes that do not code for proteins.
By comparing a huge number of genomes, researchers can locate these
sorts of genes relatively easily. Genes that do important things for cells
or tissues are more likely to be "conserved" over time; that is,
they don't change much despite millions of years of mutations.
Although fruit flies have a genome that is 25 times smaller than the
human genome, many of the flies' genes correspond to those in humans
and control the same biological functions. In recent years, fruit
fly research has led to discoveries related to the influence of
genes on diseases, animal development, population genetics, cell
biology, neurobiology, behavior, physiology and evolution.
One of the companion pieces accompanying this week's Nature papers
was written by IUB computational biologist Matthew Hahn. Hahn
reports in PLoS Genetics that although all 12 Drosophila species
have about the same number of genes (14,000), the genomes are more
dynamic than one might expect.
"The highest turnover in gene number occurs in genes involved in sex
and reproduction," Hahn said. "Our results demonstrate that the
apparent stasis in total gene number among species has masked rapid
turnover in individual gene gain and loss. It is likely that this
evolutionary revolving door has played a large role in shaping the
morphological, physiological, and metabolic differences among
species. This is the reason the 12 species only share 77 percent of
their genes."
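Hahn's point that a stable gene count can hide rapid gain and loss can be illustrated with a small set-based sketch. The gene family names below are hypothetical placeholders, not real Drosophila data, and actual turnover studies model gains and losses along a phylogeny rather than comparing sets directly.

```python
# Hypothetical gene families; not real Drosophila data.
ancestor = {"g1", "g2", "g3", "g4", "g5"}
species_a = (ancestor - {"g4"}) | {"a1"}              # one loss, one gain
species_b = (ancestor - {"g2", "g5"}) | {"b1", "b2"}  # two losses, two gains


def turnover(ancestral: set[str], descendant: set[str]) -> tuple[int, int]:
    """Count (gains, losses) of gene families relative to an ancestor."""
    return len(descendant - ancestral), len(ancestral - descendant)


# All three genomes have five gene families...
print(len(ancestor), len(species_a), len(species_b))  # 5 5 5
# ...yet each lineage has gained and lost genes,
print(turnover(ancestor, species_a), turnover(ancestor, species_b))  # (1, 1) (2, 2)
# so the two descendants share only two families.
print(sorted(species_a & species_b))  # ['g1', 'g3']
```

The total count never changes, but the content does, which is the "revolving door" Hahn describes.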
Kaufman co-founded the project with Cornell University's Andrew
Clark, North Carolina State University's Gregory Gibson, Howard
Hughes Medical Institute's Eugene Myers, University of California
Berkeley's Patrick O'Grady, and University of Arizona's Therese
Markow. FlyBase, a joint project of IU Bloomington, UC Berkeley, and
Cambridge University, helped researchers access and study the 12
sequenced Drosophila genomes. Kaufman also directs the National
Institutes of Health-funded Drosophila Genome Resource Center.
Sequencing work was handled by research staff at the Baylor College
of Medicine, the Broad Institute of M.I.T. and Harvard University,
the Washington University School of Medicine, Agencourt Bioscience
Corp., and the J. Craig Venter Science Institute.
Decoding effort reveals fly species' DNA
Collaboration of researchers from 16 nations sequences fruit fly
An enormous effort to decode the DNA of one of science's most
important laboratory animals - the fruit fly - ended in success this
week as a collaboration of researchers from 16 nations announced the
sequencing of 10 fly species' genomes.
The research allows the extraordinary side-by-side comparison of the
DNA of 12 species of fruit flies - two had already been decoded - as
scientists search to understand the workings of individual genes and
how those genes translate into specific physical characteristics.
Already the research has produced results. Scientists involved in
the collaboration also announced the findings of their initial
analysis: the discovery of thousands of new genes and other
functional elements such as DNA segments responsible for turning
genes on and off.
"The availability of the 12 fruit fly genomes resulted in a dramatic
increase in resolution, allowing us to examine how evolution has
fine-tuned biological processes. Our work shows that discovery power
increases with the number of genomes available for comparison," said
Harvard Professor of Molecular and Cellular Biology William Gelbart,
one of the project's leaders.
The results were announced in a series of papers in the Nov. 8 issue
of the journal Nature. The work, conducted by hundreds of
researchers at more than 100 institutions, was supported by the
National Institutes of Health's National Human Genome Research Institute.
"This remarkable scientific achievement underscores the value of
sequencing and comparing many closely related species, especially
those with great potential to enhance our understanding of
fundamental biological processes," said National Human Genome
Research Institute Director Francis Collins. "Scientists around the
world now have a rich new source of genomic data that can be mined
in many different ways and applied to other important model systems
as well as humans."
With a life span of just weeks, the fruit fly has been an important
model organism in genetic studies for decades and has helped
researchers unravel the rules that govern inheritance. Though there
are many differences between fruit flies and humans, the two also
share many genes that regulate the same biological functions.
The dozen fruit flies now sequenced all belong to the genus
Drosophila, which has about 2,500 different species. Though some may
think there's little difference between fruit fly types, Gelbart
said the genetic variation between fruit fly species is as large as
that found among mammals. Fruit flies are adapted to life in a wide
variety of conditions, from the desert to the rain forest, and have
a wide range of physical traits.
With researchers spanning many institutions around the world,
Gelbart said the project, officially called the Drosophila
Comparative Genome Sequencing and Analysis Consortium, was at times
a challenge to keep moving forward. Gelbart accomplished the feat
together with eight other project leaders from the Broad Institute
of MIT and Harvard, Cornell University, the University of
California, Berkeley, the Lawrence Berkeley National Laboratory, the
Agencourt Bioscience Corp., the University of Manchester, the
National Institutes of Health, the University of Arizona, Indiana
University, and the Computer Science and Artificial Intelligence
Laboratory in Cambridge, Mass.
Among the findings from their side-by-side analysis of the dozen
fruit fly genomes, researchers were able to determine that some
genes are evolving faster than others. Genes involved in sex, taste,
smell, detoxification, and metabolism seem to be evolving most
rapidly. One example showed that a fly native to the Seychelles
islands, with a limited universe of foods, is losing taste receptors
much faster than other species.
Researchers also discovered the first known animal to lack genes to
produce selenoproteins, a type of protein needed to get rid of
excess selenium in the body, in the fruit fly species Drosophila
The side-by-side comparison not only allowed the discovery of new
genetic elements, it also allowed the correction of past errors. The
results call into question more than 400 genes previously thought to
encode proteins in the first species to be sequenced, Drosophila melanogaster.
The work, which will be available to other researchers, not only
opens up numerous new avenues for inquiry, it creates an enormous
bank of information available to future scientists probing life's
"Like most science, it raises more questions than it answers, but
that's OK," Gelbart said.
Making Sense of Anti-Sense MicroRNAs
Three independent papers in the January 1st issue of G&D report on
the discovery of a bidirectionally transcribed microRNA (miRNA)
locus in Drosophila.
The studies from Drs. Alexander Stark and Manolis Kellis (MIT) and
colleagues, and from Dr. Eric Lai (MSKCC) and colleagues, both
reveal that antisense transcription of the Hox miRNA locus,
miR-iab-4, generates the novel miRNA precursor mir-iab-8, which is
processed into active regulatory RNAs.
When ectopically expressed, mir-iab-8 generates homeotic phenotypes
via direct repression of Hox gene targets.
The paper from Dr. Welcome Bender (Harvard Medical School)
demonstrates that knocking out miR-iab-4 reveals the existence of a
miRNA transcribed from the opposite strand. Furthermore, the loss of
the antisense miRNA causes subtle derepression of a Hox gene and
results in sterility of the mutant flies.
The identification of additional antisense miRNAs in Drosophila and
mammals suggests that this mechanism may contribute to the
diversification of miRNA function.
MIT reports new twist in microRNA biology
Computational biology group identifies new mechanism of gene regulation
MIT scientists have found a new way that DNA can carry out its work
that is about as surprising as discovering that a mold used to cast
a metal tool can also serve as a tool itself, with two complementary
shapes each showing distinct functional roles.
Professor Manolis Kellis and postdoctoral research fellow Alexander
Stark report in the January 1 issue of the journal Genes &
Development that in certain DNA sequences, both strands of a DNA
segment can perform useful functions, each encoding a distinct
molecule that helps control cell functions.
DNA works by complementarity: paired DNA strands serve as a template
for each other during DNA replication, and ordinarily only a single
DNA strand serves as a template to produce RNA strands, which then
go on to produce proteins. The process is similar to the way each
bump or dent in a mold is paired with a corresponding dent or bump
in the resulting molded object.
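The complementarity rule described in this paragraph is simple enough to state as code. This minimal sketch works on DNA letters written 5' to 3' and ignores real-world details such as ambiguous bases; the function names are illustrative, not taken from any library.

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, also written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]


def transcribe(template: str) -> str:
    """RNA copied from `template`: complementary to the template strand,
    so it matches the partner strand with U in place of T."""
    return reverse_complement(template).replace("T", "U")


top = "ATGCC"
bottom = reverse_complement(top)
print(bottom)                             # GGCAT
print(reverse_complement(bottom) == top)  # True: each strand templates the other
print(transcribe(bottom))                 # AUGCC
```

Reversing as well as complementing matters because the two strands run in opposite directions, just as a mold is a mirror image of the object cast from it.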
While many RNAs are eventually translated into proteins with
specific functions, some RNA molecules instead act directly,
carrying out roles inside the cell. Certain RNA genes, known as
microRNAs, have been shown to play important regulatory roles in the
cell, often coordinating important events during the development of
the embryo. These microRNAs fold into relatively simple hairpin
structures, with two stretches of near-perfect complementary
sequence folding back onto each other. One of the two 'arms' of a
hairpin is then processed into a mature microRNA.
The surprising discovery is that for some microRNA genes, both DNA
strands, instead of just one, encode RNA, and both resulting
microRNAs fold into hairpins that are processed into mature
microRNAs. In other words, both the tool and its mold appear to be
functional. Kellis and Stark found two such microRNA pairs in the
fruit fly, and eight more such pairs in the mouse.
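Why both strands can yield hairpins follows from the same complementarity. If one strand reads arm + loop + reverse_complement(arm), its partner strand reads arm + reverse_complement(loop) + reverse_complement(arm): the same stem with a different loop. The sketch below checks this for a hypothetical sequence, assuming perfect stem pairing, which real microRNA hairpins do not require.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    return strand.translate(COMPLEMENT)[::-1]


def is_perfect_hairpin(seq: str, stem: int, loop: int) -> bool:
    """True if seq = arm + loop + reverse_complement(arm).

    Simplification: real microRNA hairpins tolerate mismatches and bulges.
    """
    if len(seq) != 2 * stem + loop:
        return False
    return seq[-stem:] == reverse_complement(seq[:stem])


arm = "TGAGGTAG"                                  # hypothetical 8-nt arm
hairpin = arm + "AAAT" + reverse_complement(arm)  # one DNA strand
antisense = reverse_complement(hairpin)           # the opposite DNA strand

print(is_perfect_hairpin(hairpin, 8, 4))    # True
print(is_perfect_hairpin(antisense, 8, 4))  # True: same stem, loop AAAT -> ATTT
```

Whether the antisense hairpin is actually transcribed and processed is a biological question the sequence alone cannot answer, which is why the follow-up experiments mattered.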
The idea that there could be such dual-function strands, where both
DNA strands encode functional RNA products, "had never even been
hypothesized," Kellis says. But follow-up work confirmed that they
did indeed function in this way. The work suggests that other such
unexpected pairings, with both DNA strands encoding important
functions, may also exist in a variety of species.
This discovery builds on a similar, earlier surprising finding about
microRNA regulation. In December, Stark and Kellis reported that
both arms of a single microRNA hairpin can also produce distinct,
functional microRNAs, with distinct targets. Together, these two
findings suggest that a single gene can encode as many as four
different functions - one hairpin from each of the two DNA strands,
and then one microRNA from each of the two arms of each hairpin.
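The count of four comes from multiplying two strands by two hairpin arms. The short enumeration below just makes the combinations explicit; the 5p/3p arm labels follow a common microRNA naming convention and are used here only for illustration.

```python
from itertools import product

strands = ("sense strand", "antisense strand")  # one hairpin per DNA strand
arms = ("5p arm", "3p arm")                     # one mature microRNA per arm

products = [f"{s}, {a}" for s, a in product(strands, arms)]
for p in products:
    print(p)
print(len(products))  # 4
```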
These recent papers are the latest example of the power of using
computational tools to investigate the genomes of multiple species,
known as comparative genomics. The Kellis group has used this
approach to discover protein-coding genes, RNAs, microRNAs,
regulatory motifs, and targets of individual regulators in diverse
organisms ranging from yeast and fruit flies to mouse and human.
"This represents a new phase in genomics: making biological
discoveries sitting not at the lab bench, but at the computer
terminal," Kellis says.
Kellis is the Karl Van Tassel Career Development Assistant Professor
in the Department of Electrical Engineering and Computer Science and
an associate member of the Broad Institute. He grew up in Greece and
France and earned his B.S., M.Eng., and Ph.D. from MIT, and he was
appointed to the faculty here in 2004. At 30, he has already earned
numerous awards and accolades, including a place on the list of the
35 top innovators under 35 by Technology Review magazine in 2006.
Kellis' work is supported in part by grants from the National
Institutes of Health and the National Science Foundation. Alex Stark
is supported by a Human Frontier Science Program fellowship.
Human Motifs Revealed - Broad Institute Press Release
Because so many genes have been conserved, or passed along from
species to species throughout evolution, comparing genomes across
species has emerged as a powerful tool for discovering functional
elements. Comparative genomics has been extremely successful in
identifying protein-coding genes and large conserved non-coding
elements in the human genome.
But the majority of the non-coding elements remain largely
unknown. The most elusive are small regulatory motifs, of about
seven to ten base pairs, that modulate gene usage.
"We set out to systematically discover an encyclopedia of regulatory
motifs in the human, through the lens of evolutionary conservation,"
said Manolis Kellis, co-senior author of the study and associate
member of the Broad Institute. "By comparing multiple species, we
determined subtle conserved signals based on their repetition across
the genome."
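The idea of finding subtle conserved signals through their repetition across the genome can be sketched as counting how often each occurrence of a short motif is preserved at the same aligned position in other species. This toy version assumes a gap-free alignment of equal-length sequences and uses a hypothetical motif and made-up sequences; it is not the study's actual scoring method.

```python
def motif_conservation(motif: str, ref: str, others: list[str]) -> tuple[int, int]:
    """Return (occurrences of motif in ref, occurrences also found at the
    same aligned position in every other species).

    Assumes a gap-free alignment: all sequences have the same length.
    """
    if any(len(seq) != len(ref) for seq in others):
        raise ValueError("sequences must be aligned to equal length")
    k = len(motif)
    total = conserved = 0
    for i in range(len(ref) - k + 1):
        if ref[i:i + k] == motif:
            total += 1
            if all(seq[i:i + k] == motif for seq in others):
                conserved += 1
    return total, conserved


# Made-up three-species alignment; "CGTGA" is a hypothetical motif.
ref = "ACGTGACCGATTACGTGAAA"
other1 = "ACGTGACCTATTACGAGAAA"
other2 = "ACGTGAGCGATTACGTGTAA"

total, conserved = motif_conservation("CGTGA", ref, [other1, other2])
print(total, conserved, conserved / total)  # 2 1 0.5
```

A single occurrence says little; the signal comes from pooling many occurrences of the same motif across the whole genome and asking whether they are conserved more often than chance would predict.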
These regulatory motifs define the dynamic nature of the cell,
dictating which signals a gene will respond to, and which specific
tissues (such as liver, heart, or muscle) a gene will be expressed
in, said the researchers.
"Fortunately, evolution is a good note keeper," said Kellis, who is
also an assistant professor of computer science at MIT. "By
referencing evolution's notes, we are now one step closer to a more
thorough understanding of the human genome's controlling
machinery. Once this machinery is known, we can then hopefully
control the signals for medical purposes."
In the March 17 issue of Nature, the Broad researchers report:
* more than a hundred new regulatory motifs involved in the first
stage of gene regulation, known as transcriptional initiation; and
* 105 new regulatory motifs involved in post-transcriptional
control, many of which are targets of microRNAs, a recently
discovered mechanism of gene repression.
Surprisingly, the study also led to the discovery of hundreds of new microRNA genes,
a number much higher than previous estimates suggested. "Nearly
one-half of the motifs involved in post-transcriptional regulation
are associated with microRNAs, demonstrating the extraordinary
importance of this recently discovered regulatory mechanism," said
Xiaohui Xie, first author of the study and a postdoctoral associate
at the Broad.
The researchers now estimate that at least 20 percent of genes are
regulated by microRNAs, an estimate much higher than previously
expected. These tiny, single-stranded pieces of RNA may be one of
the principal players in regulating cellular mechanisms. MicroRNAs
can inhibit transcribed messages, or interrupt a gene's ability to
make protein.
The study team employed an analysis method used recently by Broad
scientists to study the genomes of four related yeast species. This work,
published in the May 15, 2003 issue of Nature by Kellis et al.,
showed that it is possible to systematically identify both genes and
regulatory elements by comparing a small number of genomes of
related species. It was unclear, however, whether such analyses would
be possible in the vastly more complex human genome.
"Evolution is one of the most powerful tools for understanding how
genes are regulated in health and disease," said Eric Lander,
co-senior author of the study and the founding director of the Broad
Institute. "Our ultimate goal is to use evolutionary comparison to
create a comprehensive catalog of common regulatory motifs in the
human genome."
In addition to the authors described above, included in the study
are Broad researchers Jun Lu; Edward Kulbokas; Todd Golub, who is
also affiliated with Dana-Farber Cancer Institute; Vamsi Mootha, who
is also affiliated with Massachusetts General Hospital and Harvard
Medical School; and Kerstin Lindblad-Toh.
Human gene count tumbles again
Estimates of the number of genes in the human genome have ranged
wildly over the past two decades, from 20,000 all the way up to
150,000. By the time the working draft of the human genome was
published in 2001, the best approximation stood at 35,000, yet even
that number has fallen. A new analysis, one that harnesses the
power of comparing genome sequences of various organisms, now
reveals that the true number of human genes is about 20,500,
thousands fewer than what is currently listed in human gene catalogs.
The work, led by researchers at the Broad Institute of MIT and
Harvard and appearing online in the November 27 issue of PNAS, has
implications beyond merely settling the debate over how many genes
are in the human genome. An accurate gene count can help identify
the locations of genes and their functions, an important step in
translating genomic information into biomedical advances.
Ironically, the way genes are recognized has triggered much of the
confusion over the human gene count. Scientists on the hunt for
typical genes - that is, the ones that encode proteins - have
traditionally set their sights on so-called open reading frames,
which are long stretches of 300 or more nucleotides, or "letters"
of DNA, bookended by genetic start and stop signals. This method
produced the most recent gene count of roughly 25,000, but the
number came under scrutiny after the 2002 publication of the mouse
genome revealed that many human genes lacked mouse counterparts and
vice versa. Such a discrepancy seemed suspicious in part because
evolution tends to preserve gene sequences - genes, by virtue of
the proteins they encode, usually serve crucial biological
roles. But like it or not, the 25,000 DNA sequences were already
listed in the catalogs of human protein-coding genes, and skeptics
had no systematic way to remove them. "At that point, no one had
gone through the gene catalogs with a fine-toothed comb to find
evidence that they weren't valid," said Michele Clamp, first author
of the study and senior computational biologist at the Broad Institute.
Far from blatant mistakes, non-gene sequences can masquerade as
true genes if they are long enough and happen by chance to fall
between start and stop signals. Despite having gene-like
characteristics, these open reading frames may not encode
proteins. Instead, they might have other functions or possibly none
at all.
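The open-reading-frame criterion, and the way a non-gene sequence can satisfy it by chance, can be made concrete with a naive scanner. This sketch checks only the three reading frames of the forward strand, uses the standard start codon ATG and stop codons TAA, TAG, and TGA, and takes the 300-nucleotide threshold from the text; real gene-finding pipelines do considerably more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_len: int = 300) -> list[tuple[int, int]]:
    """Return half-open (start, end) coordinates of forward-strand ORFs:
    an ATG followed by an in-frame stop codon, spanning at least
    min_len nucleotides including the stop codon."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# A synthetic sequence: 300 nt from ATG through TAA, with no biological meaning.
seq = "CC" + "ATG" + "GCT" * 98 + "TAA" + "GG"
print(find_orfs(seq))             # [(2, 302)]
print(find_orfs("ATGAAATAA"))     # [] (too short at the default threshold)
print(find_orfs("ATGAAATAA", 9))  # [(0, 9)]
```

The synthetic example is the point of the passage: a scanner like this reports any long enough stretch between start and stop signals, whether or not it encodes a protein, so a second line of evidence is needed.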
To distinguish such misidentified genes from true ones, the
research team, led by Clamp and Broad Institute director Eric
Lander, developed a method that takes advantage of another hallmark
of protein-coding genes: conservation by evolution. The researchers
considered genes to be valid if and only if similar sequences could
be found in other mammals. Applied to nearly 22,000 genes in the Ensembl
gene catalog, the analysis revealed 1,177 "orphan" DNA
sequences. These orphans looked like protein-coding genes because of their open
reading frames, but were not found in either the mouse or dog
genome.
Although this was strong evidence that the sequences were not true
protein-coding genes, it was not quite convincing enough to justify
their removal from the human gene catalogs. Two other scenarios
could, in fact, explain their absence from other mammalian
genomes. For instance, the genes could be unique among primates,
new inventions that appeared after the divergence of mouse and dog
ancestors from primate ancestors. Alternatively, the genes could
have been more ancient creations - present in a common mammalian
ancestor - that were lost in mouse and dog lineages yet retained in primates.
If either of these possibilities were true, then the orphan genes
should appear in other primate genomes, in addition to our own. To
explore this, the researchers compared the orphan sequences to the
DNA of two primate cousins, chimpanzees and macaques. After careful
genomic comparisons, the orphan genes were found to be true to
their name - they were absent from both primate genomes. This
evidence strengthened the case for stripping these orphans of the
title, "gene."
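The decision rule in the last few paragraphs amounts to a short cascade of checks. In this sketch the lookup table recording which candidate genes have a matching open reading frame in each species is a hypothetical input; producing it through careful genome comparison is where the real work lies.

```python
def classify(gene: str, matches: dict[str, set[str]]) -> str:
    """Simplified version of the orphan test described in the text.

    matches maps a species name to the set of candidate human genes that
    have a matching open reading frame in that species (hypothetical input).
    """
    if gene in matches["mouse"] or gene in matches["dog"]:
        return "kept: conserved in other mammals"
    if gene in matches["chimpanzee"] or gene in matches["macaque"]:
        return "orphan, but present in other primates"
    return "orphan absent from all relatives: strong case for removal"


# Hypothetical data for four candidate genes.
matches = {
    "mouse": {"G1"},
    "dog": {"G1", "G2"},
    "chimpanzee": {"G1", "G2", "G3"},
    "macaque": {"G1", "G2"},
}
for g in ["G1", "G2", "G3", "G4"]:
    print(g, "->", classify(g, matches))
```

The order of the checks mirrors the argument: absence from mouse and dog only raises suspicion, and absence from the other primates is what removes the two alternative explanations.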
After extending the analysis to two more gene catalogs and
accounting for other misclassified genes, the team's work
invalidated a total of nearly 5,000 DNA sequences that had been
incorrectly added to the lists of protein-coding genes, reducing
the current estimate to roughly 20,500.
In addition to suggesting a major revision to the human gene count,
this work provides a set of rules for evaluating any future
proposed additions to the human gene catalog. It also underscores
the benefit of genome sequencing projects. "Without several primate
genomes, we wouldn't have been able to put the final nail in the
coffin of these putative genes," said Clamp.
More broadly, the research reveals that little invention of genes
has occurred since mammalian ancestors diverged from the
non-mammalian lineage. "There's no real creativity going on in the
mammalian genome," explained Clamp. That means that the number,
structure, and function of protein-coding genes are not expected to
differ very much from mammal to mammal, so what makes humans
different from mice and dogs likely lies outside this realm of the
genome. Clamp and her Broad Institute colleagues are now peering
into the genomes of many other mammals, in an attempt to explain
what parts of our genome truly make us human.
Open|DOOR Interview (Nov 2003)
Manolis Kellis (Kamvysselis), SB '99, MEng '99, PhD '03, is first
author on a breakthrough comparative genomics article published in
"Nature". The work, completed with colleagues at the Whitehead
Institute/MIT Center for Genome Research, distinguished important
biological signals from surrounding nonfunctional nucleotides in
yeast and has important human genome applications.
What did baker's yeast, an organism that turns sugar to
alcohol, teach your team about the human genome?
Yeast has taught us surprisingly many lessons about the human
genome. As an experimental system, it has been the model organism
of choice for developing genome-wide technologies for monitoring
complete cell states, simultaneously observing all genes and
proteins. For our team, it became the model organism for
comparative genomics.
We work towards the ability to directly interpret genomic
information, namely to read in a string of ACGT characters and
recognize within it meaningful functional elements, such as genes
and regulatory motifs that control the expression of genes. The
difficulty of identifying such elements comes from the fact that they
are hidden amidst thousands of non-functional
nucleotides. Discovering functional features is equivalent to
extracting signal from noise.
This is where comparative genomics comes in. Across evolutionary
time, mutations in non-functional regions accumulate by genetic
drift, but mutations in functional regions are selected against by
natural selection. By comparing closely related species, we can
thus recognize strongly conserved regions as likely to be
functional, and weakly conserved regions as likely to be nonfunctional.
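The reasoning here, that functional regions stand out because they change less, can be sketched by scoring each column of an alignment and averaging over a sliding window. This is a deliberately simplified illustration: it assumes a gap-free alignment and counts only identical columns, the sequences are made up, and practical methods use evolutionary models that account for how the species are related.

```python
def column_identity(alignment: list[str]) -> list[bool]:
    """True for each aligned column in which every species has the same base."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return [len(set(column)) == 1 for column in zip(*alignment)]


def windowed_conservation(alignment: list[str], window: int) -> list[float]:
    """Fraction of fully conserved columns in each sliding window."""
    ident = column_identity(alignment)
    return [sum(ident[i:i + window]) / window
            for i in range(len(ident) - window + 1)]


# Made-up alignment: six identical columns followed by six that all differ.
alignment = [
    "ATGGCCTTACGA",
    "ATGGCCCAGTAC",
    "ATGGCCGCTAAT",
]
scores = windowed_conservation(alignment, window=6)
print(scores[0], scores[-1])  # 1.0 0.0
```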
Yeast proved to be a wonderful organism for comparative
genomics. Its small and relatively simple genome made it feasible
to completely sequence multiple relatives and align them across
their complete genomes. We were for the first time able to study
the conservation patterns of genes and regulatory motifs across
four complete eukaryotic genomes, and develop computational methods
for discovering biological signals. Finally, the extensive
biological knowledge of gene function in yeast allowed us to
confirm our findings and our comparative methods.
We hope to apply similar methods for the understanding of the human
genome, by comparing it to chimp, mouse, rat, dog, chicken, and a
multitude of genomes whose sequencing is under way. This will not
be an easy task due to the increased complexity of these organisms,
but the lessons learned from yeast are proving to be invaluable.
How are you applying your computer science expertise to
comparative genomics?
Computer science is playing an increasingly central role in modern
biology. This has been enabled by the quantitative nature of
biological data sources and efforts to represent biological
knowledge in a controlled vocabulary. Additionally, technological
advances have increased the ability to obtain large-scale
information of cell state, leading to an explosion in both the
quantity and the types of available data.
Together, these transformations of modern biology have made the use
of computational approaches in biology not only possible, but also
imperative. The hypothesis-driven approach of specific experiments
designed to answer well-posed questions is now complemented by
data-driven approaches that generate hypotheses largely in silico,
based on large-scale biological data. Computational tools that can
discover meaningful patterns by mining through large quantities of
data can be central in biological discovery.
This is where a computer science background has proved to be very
useful. Understanding the biology is only the first step. One has
to then construct computational representations for the data at
hand, develop algorithms to manipulate the resulting computational
structures, define statistical tests to select meaningful results,
and interpret these back into the realm of biology. To obtain sound
biological results, the underlying computational methods must be
Each aspect of the yeast work required a strong computational
component. To align the four yeast species, we developed
graph-theoretic algorithms for resolving the ambiguities in the
correspondence of genes and regions across the species. To identify
protein-coding genes, we developed models for nucleotide change
within genes and intergenic regions, and built a classifier for
candidate genes according to these models. To discover regulatory
motifs, we formulated statistical tests to evaluate the genome-wide
conservation of sequence patterns, and built algorithms to refine
these motifs into a small dictionary of regulatory elements.
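One simple way to turn genome-wide conservation of a sequence pattern into a statistical test is to ask how surprising its number of conserved occurrences is, given the rate at which comparable control patterns are conserved. The sketch below uses a binomial model; the counts and background rate are hypothetical, and this is a generic illustration rather than the exact test used in the yeast work.

```python
from math import comb, sqrt


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def binomial_z(k: int, n: int, p: float) -> float:
    """How many standard deviations k lies above the expected count n * p."""
    return (k - n * p) / sqrt(n * p * (1 - p))


# Hypothetical: a motif occurs 100 times, 60 of them conserved, while
# control patterns of similar composition are conserved 25% of the time.
n, k, p = 100, 60, 0.25
print(binomial_z(k, n, p))           # about 8.1
print(binomial_upper_tail(k, n, p))  # a very small probability
```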
The marriage of computer science and biology is a necessity. My
biology friends like to joke that computer science will be
remembered as that little field that helped understand biology. My
computer science friends instead respond: "Aah, biology! Finally a
problem hard enough for computer science to solve." Good-spirited
jokes aside, both parties acknowledge the birth of a field in its
own right, where computation and biology combine to bring the best
from both worlds. The whole makes for a rapidly moving field, where
the rules are changed every few months, powerful paradigms emerge
and secrets are revealed.
BioIT World
The Search for Whole Genome Duplication
Researchers at the Broad Institute have uncovered unequivocal
evidence for whole genome duplication
(WGD). Manolis Kellis, Bruce Birren, and Eric Lander have shown for
the first time that baker's yeast (Saccharomyces cerevisiae)
originated via the duplication of the entire genome of an ancestral
strain; the duplicated organism subsequently lost some 90 percent of
the duplicated genes by various means, giving rise to the current S. cerevisiae.
"This is the first time we actually see that an organism underwent
complete genome duplication and went back to a single-copy state,"
Kellis said. WGD can occur when a cell's DNA replicates, but the
cell fails to divide as normal. From an evolutionary perspective,
WGD allows the newly duplicated genes to acquire new, potentially
advantageous roles, or adapt to new environments. Of course, many
genes are deleted or mutated, such that most traces of the
large-scale duplication are quickly expunged.
The Broad team reports in Nature that it found the missing link by
sequencing the complete genome of a different yeast species,
Kluyveromyces waltii, which diverged before the duplication. They
showed that each region of this pre-duplication relative
corresponds to exactly two regions of baker's yeast, providing
definitive proof of duplication.
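The key observation, that each region of the pre-duplication relative corresponds to exactly two regions of baker's yeast, can be expressed as a check over matched regions. The region names and matches below are hypothetical, and the sketch assumes the hard part, identifying corresponding regions from shared genes, has already been done.

```python
from collections import defaultdict


def group_matches(matches: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Group (K. waltii region, S. cerevisiae region) pairs by K. waltii region."""
    blocks: dict[str, set[str]] = defaultdict(set)
    for kw_region, sc_region in matches:
        blocks[kw_region].add(sc_region)
    return dict(blocks)


def is_one_to_two(matches: list[tuple[str, str]]) -> bool:
    """True if every pre-duplication region maps to exactly two regions."""
    return all(len(targets) == 2 for targets in group_matches(matches).values())


# Hypothetical region labels; repeated pairs come from multiple shared genes.
matches = [
    ("Kw-A", "Sc-chr4-left"), ("Kw-A", "Sc-chr12-right"), ("Kw-A", "Sc-chr4-left"),
    ("Kw-B", "Sc-chr2-left"), ("Kw-B", "Sc-chr4-right"),
]
print(group_matches(matches))
print(is_one_to_two(matches))                               # True
print(is_one_to_two(matches + [("Kw-C", "Sc-chr7-left")]))  # False
```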
Following deletion of 90 percent of the duplicated genes, baker's
yeast returned to having one gene per function for most of the
genome, ending up with only 457 additional genes, many of which are
devoted to sugar metabolism. "It will be interesting to see just
how far such distant echoes of genomic upheaval may be traced," the
authors conclude.
The study follows last year's comparative genomics tour de force by
Kellis and Lander, comparing the genomes of four yeast species,
also published in Nature.
M. Kellis et al., "Proof and evolutionary analysis of ancient
genome duplication in the yeast Saccharomyces cerevisiae," Nature (2004).
Genome News Network
An odd little fungus that grows in cotton seeds and a little-known
species of yeast are giving scientists an idea how organisms evolve
and take on diverse functions. Two new studies report that the
duplication of the genome of a primitive fungus more than 100
million years ago gave rise to common baker's yeast.
The studies resolve an ongoing controversy over how the common
yeast Saccharomyces cerevisiae evolved. The organism is widely
studied because many of the genes that control the yeast's function
are also important in humans.
The new findings also give scientists clues about how gene
duplications can drive evolution. While one copy of an
essential gene continues to perform needed duties, the second copy is
free to mutate and take on a new role in the organism.
"There have been two camps with different views of how yeast
evolved," says Manolis Kellis of the Broad Institute in Cambridge,
Massachusetts, who participated in one of the new studies. "Some
people believe a whole genome was duplicated and others believe
smaller gene clusters were duplicated. We now have the missing
piece of evidence that points to whole-genome duplication."
Kellis and his colleagues sequenced a species of yeast called
Kluyveromyces waltii. At the same time, a team led by Peter
Philippsen of the University of Basel in Switzerland sequenced the
genome of the filamentous fungus Ashbya gossypii. Both studies
suggest that baker's yeast arose through a duplication of an ancestral
genome that occurred after its lineage split from those of these two species.
Ashbya has 4,718 genes on seven chromosomes and Kluyveromyces has
5,230 genes on eight chromosomes. Duplication of the genome of
their common ancestor created an organism with about 10,000
genes. Over time, most of the duplicated genes were lost, but some
mutated and took on new functions. In the end, the baker's yeast
genome emerged with 5,714 genes on 16 chromosomes.
Philippsen, who studies the fungus Ashbya, notes that baker's yeast
and Ashbya have many genes in common but very different
functions. Ashbya is a cotton pathogen, while baker's yeast is used
to make bread.
Baker's yeast lives as a single-celled organism that grows and buds
off into separate cells, whereas the fungus grows as a long
filament containing many nuclei. The fungus grows from one end of a
filament until it runs out of nutrients, creating a branched
multi-cellular organism.
"These organisms have similar sets of genes, but very different
lifestyles," says Philippsen. "The big challenge now is to figure
out why."
One clue comes from looking at some of the duplicated
genes. Although most of the duplicated genes have disappeared in
baker's yeast, many gene pairs remain, but have mutated to acquire
different functions.
For example, one gene in baker's yeast that is important for DNA
replication has a twin that silences, or shuts down, other
genes. The researchers believe that many of these paired genes with
different functions have driven evolution.
"When you have duplicated genes, many of these genes will be lost
over time, because you only need one to do the job," says
Kellis. "But for some gene pairs, one gene has preserved the
ancestral function while the other is free to evolve, sometimes
taking on entirely new roles. This leads to innovation and the
creation of new species."
Baker's yeast provides the first clear example that whole-genome
duplication plays a role in evolution. Some researchers have
proposed similar events in the evolution of plants and vertebrates.
"We would have to find an ancestor with a non-duplicated genome to
see if similar events played a role in human evolution," says
Kellis. "So far, this hasn't happened, but people are looking."
Genome News Network
Yeast Genome Revisited
Ever since the yeast genome was sequenced seven years ago,
researchers have debated the best way to identify the "true"
genes, those DNA sequences that code for proteins. Now, researchers
have sequenced three more yeast genomes and say that the current
list of genes needs to be revised.
By comparing the new genome sequences with the original, the
researchers uncovered nearly 50 new genes and 70 stretches of DNA
that regulate yeast genes. They also propose that about 500 DNA
sequences previously thought to be genes should be crossed off the list.
The research, published in Nature, goes far beyond bread and beer:
It could serve as a model for identifying every gene in the human
genome. Furthermore, many yeast genes have counterparts in humans,
including some that play a role in cancer.
"This study shows how valuable it is to sequence the genomes of
closely related species," says Steven L. Salzberg of The Institute
for Genomic Research (TIGR) in Rockville, Maryland, who wrote an
accompanying News & Views article. "If we line up the genomes and
see the same sequences in each species, it tells us that a gene is
[...]"
"This is just what we need to do, and in fact are doing, with the
human genome," he adds.
When the budding yeast Saccharomyces cerevisiae, used to make beer
and bread, was sequenced in 1996, researchers found nearly 6,000
likely genes (based on the length of the sequence and the presence
of specific signals that indicate where genes begin and
end). Subsequent estimates have ranged from 4,800 to 6,400
genes. According to the Nature paper, the number should be 5,538.
In the new study, Manolis Kellis, a graduate student in Eric
S. Lander's laboratory at the Whitehead Institute in Cambridge,
Massachusetts, and his colleagues analyzed the three other yeast
species and compared them to S. cerevisiae.
"For each possible gene sequence, we looked to see if there was
evolutionary pressure to preserve that stretch of DNA," says
Kellis. "We discarded about 500 sequences that were not
conserved. Evolution had no reason to care about these sequences."
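One way to picture the test Kellis describes: for each candidate gene, ask whether its aligned copy in each related species still reads as a complete open reading frame. The Python sketch below is a simplified, hypothetical illustration, not the published procedure. The function names, the example sequences, and the rule that a copy counts only if it starts with ATG, ends in a stop codon and contains no earlier in-frame stop are all assumptions.

```python
from __future__ import annotations

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_intact_orf(seq: str) -> bool:
    """True if seq starts with ATG, has a length divisible by 3, ends in a
    stop codon, and contains no earlier in-frame stop codon."""
    seq = seq.upper().replace("-", "")  # simplification: ignore alignment gaps
    if len(seq) < 6 or len(seq) % 3 != 0 or not seq.startswith("ATG"):
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


def conservation_profile(orthologs: dict[str, str]) -> dict[str, bool]:
    """Map each related species to whether its aligned copy is still an intact ORF."""
    return {species: is_intact_orf(seq) for species, seq in orthologs.items()}


# Hypothetical aligned copies of one candidate S. cerevisiae ORF.
candidate = {
    "S. paradoxus": "ATGAAACCCGGGTAA",
    "S. mikatae": "ATGAAACCCGGGTAG",
    "S. bayanus": "ATGAAATGACCCTAA",  # premature in-frame TGA
}
print(conservation_profile(candidate))
# {'S. paradoxus': True, 'S. mikatae': True, 'S. bayanus': False}
```

A copy that picks up a premature stop codon, as in the hypothetical S. bayanus sequence, is the kind of evidence that would argue against a sequence being a real gene.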
For Kellis, the study's most exciting discovery was finding more
than 70 new sequences that regulate gene activity.
"We found two types of regulatory sequences," he says. "Some
sequences act like little tiny traffic lights, telling the gene
when to turn on and when to turn off. Others act as zip codes, or
shipping addresses. They tell the cell where to send the message,
once a gene is made into RNA."
The researchers also found that most of the variation in yeast genes
occurs on the ends of chromosomes, in regions known as
telomeres. Telomeres have not been completely sequenced in the
human genome.
"Telomeres get exchanged a lot more rapidly," says Salzberg. "My
twenty-five cent bet is that the same thing is going on in
humans. I would like to see telomeres sequenced in humans. This is
where things are most likely to be happening, where gene
rearrangements are likely to occur."
He adds, "We need to finish the human sequence down to the last [...]"
Genome Biology
Many yeasts win the vote
Comparative genomics with two or more related species reveals limitations of single-genome analysis
The Human Genome Mapping Project set out to unravel the secrets of the
genes by determining the primary sequence of the human genome, but it
has become clear that this information is insufficient. Determination
of functional and coding sequences in a primary genome sequence
depends on an a priori knowledge of gene function and on statistics,
and so the information obtained is incomplete and probabilistic. In
the May 15 Nature, Manolis Kellis and colleagues at the Whitehead
Institute/Massachusetts Institute of Technology Center for Genome
Research develop and apply a general approach to determining regions
of significance in primary sequence by whole genome comparison of
several related species. They reasoned that evolution would conserve
protein coding and regulatory elements and that comparison of more
than two genomes would increase the signal:noise ratio by highlighting
changes that were not due to chance (Nature 423:241-254, 2003).
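The signal-to-noise argument can be made concrete with a back-of-the-envelope calculation. The sketch below is an illustration under stated assumptions, not the authors' statistics: each base of non-functional DNA is assumed to be conserved in a given relative with a hypothetical probability of 0.7, independently across bases and species. Under those assumptions, the chance that a random six-base stretch is perfectly conserved in every relative drops geometrically with each added genome.

```python
def chance_conservation(p_identity: float, motif_len: int, n_relatives: int) -> float:
    """Probability that a non-functional stretch of motif_len bases is left
    perfectly unchanged in every one of n_relatives species, assuming each
    base is independently conserved with probability p_identity."""
    return p_identity ** (motif_len * n_relatives)


# Hypothetical per-base identity of 0.7 for neutral DNA; 6-base motif.
for n in (1, 2, 3):
    print(n, round(chance_conservation(0.7, 6, n), 4))
# 1 0.1176
# 2 0.0138
# 3 0.0016
```

Real species share evolutionary history, so their changes are not independent, and an actual analysis would model the phylogeny. The point is only the trend: each extra genome makes conservation by chance rarer, while elements under evolutionary pressure remain conserved.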
Kellis et al. compared the sequences of four related species of
yeast, Saccharomyces cerevisiae, S. paradoxus, S. mikatae, and
S. bayanus and employed a "voting system" to reach a conclusion on
the validity of theoretical open reading frames (ORFs) and on the
accuracy of the determination of proposed gene structures such as
promoters, translation start and stop sites, and intron/exon
boundaries. They propose to reduce the number of genes in the yeast
gene catalogue by eliminating 503 invalid ORFs and to redefine gene
structure assignments in at least 300 cases. They identified 188
genes that encode small proteins of <100 amino acids and many new
genes and regulatory elements; they were also able to infer
functions for more than half of their 42 newly discovered sequence
motifs by categorizing the genes associated with them. In addition,
they found evidence for rapid genome evolution at all of the [...]
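The "voting system" can be pictured as combining one verdict per related species into a single decision about each candidate ORF. The article does not spell out the rules, so the Python sketch below is hypothetical. The function name, the three labels, and the two-vote threshold are assumptions chosen for illustration.

```python
from __future__ import annotations


def classify_orf(votes: dict[str, bool], min_votes: int = 2) -> str:
    """Combine per-species verdicts ("is the ORF conserved here?") into one call.

    min_votes is a hypothetical threshold, not the published rule.
    """
    yes = sum(1 for conserved in votes.values() if conserved)
    if yes >= min_votes:
        return "keep"
    if yes == 0:
        return "reject"
    return "ambiguous"


print(classify_orf({"S. paradoxus": True, "S. mikatae": True, "S. bayanus": False}))   # keep
print(classify_orf({"S. paradoxus": False, "S. mikatae": False, "S. bayanus": False}))  # reject
print(classify_orf({"S. paradoxus": True, "S. mikatae": False, "S. bayanus": False}))   # ambiguous
```

Keeping an explicit "ambiguous" outcome reflects that a single species' verdict can be wrong (for example, because of a sequencing error in a draft genome), so weak evidence is flagged rather than forced into a yes or no.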
"The analyses will produce a substantial revision in our knowledge
of the yeast genome and provide strategic directions for how we
might select other sequencing targets to advance understanding of
the human genome," writes Steven Salzberg of The Institute for
Genomic Research in an accompanying News and Views article. "This
new study of yeast genomes makes it clear that comparative genome
sequencing has tremendous analytical power," he concludes.
Yeast geneticists get a 2-for-1 deal
Today's humble but extensively studied budding yeast (Saccharomyces
cerevisiae) evolved when the genome of a distant ancestor became
duplicated, according to a study published online by Nature this [...]
Manolis Kellis and colleagues studied the genetic make-up of a
related yeast species, Kluyveromyces waltii, and compared it with
that of S. cerevisiae. K. waltii shares a common ancestor with
S. cerevisiae but diverged before the duplication event took
place. The team found that key regions of the K. waltii genome are
duplicated in S. cerevisiae. Originally, the entire genome was
duplicated, but S. cerevisiae then evicted some 90% of its
duplicated genes to make the genome fully functional.
Genome duplication may boost evolutionary innovation, the authors
say - duplicated genes probably supply the raw genetic material
needed for new functions to emerge, and so could help organisms
adapt to new environments.
Whitehead Press Release
Study answers questions on ancestry of yeast genome
CAMBRIDGE, Mass. (Mar. 8, 2004) - In work that may lead to a better
understanding of genetic diseases, researchers at the Broad
Institute of MIT, Harvard University and Whitehead Institute for
Biomedical Research show that baker's yeast was created hundreds of
millions of years ago when its ancestor temporarily became a kind
of super-organism with twice the usual number of chromosomes and
increased potential to evolve.
The study, by postdoctoral fellow and lead author Manolis Kellis of
the Broad (rhymes with "code") Institute; Eric S. Lander, Broad
director and Member of Whitehead Institute; and Bruce W. Birren,
co-director of the Broad's sequencing and analysis program, will be
published online by Nature on March 7.
Scientists have postulated that in a handful of instances in
evolutionary history, cells may have replicated their entire
genomes in events called whole genome duplication (WGD), but no
definitive proof existed. The work at the Broad Institute shows
conclusively for the first time that the well-studied organism
baker's yeast originated through this little-understood phenomenon,
resolving a long-standing controversy on the ancestry of the yeast genome.
Whole genome duplication may have occurred when a cell replicated
its DNA normally, as it does every time it divides, but did not
split it between two resulting cells, or two cells may have
fused. The result is that a yeast cell with around 5,700 genes
suddenly had more than 11,000. While one copy of each gene performs
its designated function, the other is free to perform a new and
potentially valuable use. In addition, the organism is able to
evolve more rapidly with natural selection acting on thousands of
duplicated genes simultaneously, allowing for large-scale
adaptation to new environments.
This super-organism doesn't come without drawbacks. The excess
genes cause instability in the genome and are deleted through
mutation, gene loss and genomic rearrangement. As a result,
long after the event, very few duplicated genes remain. "This
is the first time we actually see that an organism underwent
complete genome duplication and went back to a single-copy state,"
Kellis said. In the case of baker's yeast, roughly 90 percent of
its duplicated genes were lost. The organism returned to having one
gene per function for the vast majority of its genome, ending up
with only 457 additional genes.
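The "roughly 90 percent" figure can be sanity-checked with simple arithmetic. This sketch assumes that the K. waltii gene count reported by Genome News Network (5,230) approximates the pre-duplication ancestor, that every ancestral gene was doubled, and that no other genes were gained or lost. All three are simplifications.

```python
# Assumption: K. waltii's gene count (5,230, as reported by Genome News
# Network) stands in for the gene count of the pre-duplication ancestor.
ANCESTRAL_GENES = 5230
# Duplicated genes still present in S. cerevisiae (the "457 additional genes").
RETAINED_EXTRA = 457

fraction_retained = RETAINED_EXTRA / ANCESTRAL_GENES
fraction_lost = 1 - fraction_retained
# One copy of every ancestral gene plus the surviving extra copies.
expected_total = ANCESTRAL_GENES + RETAINED_EXTRA

print(f"duplicates retained: {fraction_retained:.1%}")  # duplicates retained: 8.7%
print(f"duplicates lost: {fraction_lost:.1%}")          # duplicates lost: 91.3%
print(f"expected gene count: {expected_total}")         # expected gene count: 5687
```

The predicted total of 5,687 genes lands close to the 5,714 genes that Genome News Network reports for baker's yeast; the remaining gap is unsurprising for a model that ignores every other gain and loss.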
What's the advantage to replicating the entire genome and then
losing half the genes? According to one theory, by replicating the
whole genome, entire systems (networks and pathways) within the
organism can evolve together and take on new functions. Yeast,
which metabolizes sugar and causes fermentation, apparently evolved
to fill an evolutionary niche around the time that fruit-bearing
plants appeared, creating an abundance of sugar in the
environment. "It's the best fermenter out there," Kellis said of
Saccharomyces cerevisiae, the species the group studied. Many of
its surviving 457 genes are devoted to sugar metabolism.
If incremental evolution over millennia is like a landscape
changing through erosion, whole genome duplication is like an
earthquake. "Direct study of such a cataclysmic event may provide
major insights into the dynamics of genome evolution and the
emergence of new functions," the authors write.
Given the massive gene loss and hundreds of rearrangements, little
evidence of WGD remains within the genome of baker's yeast. Tracing
the development of a genome over billions of years is like printing
a 5,000-page book twice without page numbers, throwing away most of
the duplicate pages, shuffling both copies and binding them into a
single book. Uncovering the ancestral gene order, Kellis said,
would be like happening upon the original book in a hidden library.
The authors found the missing link by sequencing a yeast species
whose divergence precedes the duplication. They showed that each
region of this pre-duplication relative corresponds to exactly two
regions of baker's yeast, providing definitive proof of duplication.
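The 2:1 test can be illustrated by counting, for each block of the pre-duplication genome, how many distinct regions of baker's yeast it matches. The sketch below assumes the block-to-region matches have already been found (the hard part, which requires tracking conserved gene order). The labels and matches are hypothetical, and this is not the Broad group's code.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def sister_region_counts(matches: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count distinct post-duplication regions matched by each pre-duplication block."""
    regions: dict[str, set[str]] = defaultdict(set)
    for block, region in matches:
        regions[block].add(region)
    return {block: len(hits) for block, hits in regions.items()}


def fraction_two_to_one(matches: Iterable[tuple[str, str]]) -> float:
    """Share of pre-duplication blocks that map to exactly two regions."""
    counts = sister_region_counts(matches)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 2) / len(counts)


# Hypothetical labels: K. waltii blocks paired with S. cerevisiae regions.
matches = [
    ("Kw-block-1", "Sc-region-A"), ("Kw-block-1", "Sc-region-B"),
    ("Kw-block-2", "Sc-region-C"), ("Kw-block-2", "Sc-region-D"),
    ("Kw-block-3", "Sc-region-E"),
]
print(sister_region_counts(matches))
# {'Kw-block-1': 2, 'Kw-block-2': 2, 'Kw-block-3': 1}
print(round(fraction_two_to_one(matches), 3))  # 0.667
```

Under whole-genome duplication, most blocks should report two sister regions. A block with a single match, like the third one here, could reflect the later loss of one copy.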
Researchers speculate that vertebrates, including human ancestors,
may have undergone two rounds of complete duplication, but the
evidence remains weak without comparison to a pre-duplication
relative. Broad researchers used a new method to compare the
complete genomes of each of the duplicated and pre-duplication
yeast species, and they plan to apply this method to more
species. Typical methods of genome comparison would "miss the
genome duplication event if they focus on solely the best match for
every gene and every region," Kellis said.
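Kellis's point about best matches can be shown with a toy comparison. For a gene from the pre-duplication species, a best-match method keeps only the single top hit in baker's yeast, so the second, duplicated copy disappears from view; keeping every hit above a threshold recovers both. The gene labels, similarity scores and 0.5 threshold below are hypothetical, and the sketch does not represent the new method the Broad researchers used.

```python
from __future__ import annotations


def best_match(scores: dict[str, float]) -> list[str]:
    """Keep only the single highest-scoring S. cerevisiae gene."""
    return [max(scores, key=scores.get)] if scores else []


def all_matches(scores: dict[str, float], threshold: float) -> list[str]:
    """Keep every S. cerevisiae gene whose score clears the threshold."""
    return sorted(gene for gene, score in scores.items() if score >= threshold)


# Hypothetical similarity scores for one K. waltii gene.
scores = {"Sc-gene-A": 0.62, "Sc-gene-B": 0.58, "Sc-gene-C": 0.12}
print(best_match(scores))        # ['Sc-gene-A']
print(all_matches(scores, 0.5))  # ['Sc-gene-A', 'Sc-gene-B']
```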
Genomic research is leading to new understanding of the connections
between different types of genetic functions and which genes were
paired in our ancestors to work together. For example, uncovering
the duplication event provided a new link between gene silencing
and the binding of DNA-replication origins. Similarly,
understanding the dynamics of genome duplication has implications
in understanding disease. In certain types of cancer, for instance,
cells have twice as many chromosomes as they should, and there are
many other diseases linked to gene dosage and
mis-regulation. "These processes are not much different from what
happened in yeast," Kellis said.
Whole genome duplication may have allowed other organisms besides
yeast to achieve evolutionary innovations in one giant leap instead
of baby steps. It may account for up to 80 percent of flowering
plant species and could explain why fish are the most diverse of
all vertebrates. Said the authors, "The results here suggest that
it may also be fruitful to search for similar genomic signatures of
WGD in other organisms. It will be interesting to see just how far
such distant echoes of genomic upheaval may be traced."
Kellis is also part of the MIT Computer Science and Artificial
Intelligence Laboratory, and Lander is a professor of biology at MIT.
The Broad Institute, known officially as the Eli and Edythe
L. Broad Institute, is a research collaboration of the
Massachusetts Institute of Technology, Harvard University and
Whitehead Institute. The Broad's mission is to fulfill the promise
of genomics for medicine.
The Scientist
Yeasts get the vote
Comparative genomics with two or more related species reveals
limitations of single genome analysis | By Cathy Holding
The Human Genome Mapping Project set out to unravel the secrets of
the genes by determining the primary sequence of the human genome,
but it has become clear that this information is
insufficient. Determination of functional and coding sequences in a
primary genome sequence depends on an a priori knowledge of gene
function and on statistics, and so the information obtained is
incomplete and probabilistic. In the May 15 Nature, Manolis Kellis
and colleagues at the Whitehead Institute/Massachusetts Institute
of Technology Center for Genome Research develop and apply a
general approach to determining regions of significance in primary
sequence by whole genome comparison of several related
species. They reasoned that evolution would conserve protein coding
and regulatory elements and that comparison of more than two
genomes would increase the signal:noise ratio by highlighting
changes that were not due to chance (Nature 423:241-254, 2003).
Kellis et al. compared the sequences of four related species of
yeast, Saccharomyces cerevisiae, S. paradoxus, S. mikatae, and
S. bayanus and employed a "voting system" to reach a conclusion on
the validity of theoretical open reading frames (ORFs) and on the
accuracy of the determination of proposed gene structures such as
promoters, translation start and stop sites, and intron/exon
boundaries. They propose to reduce the number of genes in the yeast
gene catalogue by eliminating 503 invalid ORFs and to redefine gene
structure assignments in at least 300 cases. They identified 188
genes that encode small proteins of <100 amino acids and many new
genes and regulatory elements; they were also able to infer
functions for more than half of their 42 newly discovered sequence
motifs by categorizing the genes associated with them. In addition,
they found evidence for rapid genome evolution at all of the [...]
"The analyses will produce a substantial revision in our knowledge
of the yeast genome and provide strategic directions for how we
might select other sequencing targets to advance understanding of
the human genome," writes Steven Salzberg of The Institute for
Genomic Research in an accompanying News and Views article. "This
new study of yeast genomes makes it clear that comparative genome
sequencing has tremendous analytical power," he concludes.
Tufts Academic Technology First Lecture
Bioinformatics Seminar Series
Computational Biology: Challenges and Opportunity
To meet Tufts University's growing interest in the field of
bioinformatics, AT is hosting a bioinformatics seminar series
during the spring semester. The purpose of the series is 1) to
educate Tufts faculty on what bioinformatics is and how to pursue
research in this area, and 2) to create a support group of those
interested in bioinformatics and in pursuing research opportunities,
including collaboration and grant support.
Dr. Manolis Kellis of the MIT/Broad Institute Center for Genome
Research will deliver the first lecture in the series. Entitled
"Computational Biology: Challenges and Opportunity," the lecture
will provide an overview of the range of definitions, tools, and
skills related to the term "bioinformatics." Dr. Kellis's research
interest is in applying computational methods to understanding
biological signals. His MIT PhD thesis focused on the computational
foundation of genomics. He pioneered new methods for discovering
biological signals using multiple species comparisons.
Dr. Kellis currently works at the MIT/Broad Institute Center for
Genome Research. A collaboration between MIT, Harvard and its
affiliated hospitals, and the Whitehead Institute for Biomedical
Research, the Broad Institute is a newly created biomedical
research institute, aimed at realizing the promise of the human genome to
revolutionize clinical medicine and to make knowledge broadly
available to scientists around the world. The Center for Genome
Research is an international leader in the Human Genome Project
(the effort to identify all of the DNA letters that make up the
instructions for a human being). The Center is the largest public
sequencing center in the world, having contributed one-third of
the content to the human genome sequence.
Museum of Science
Three MIT Professors Declared "Young Innovators to Watch" at Museum
of Science Next Generation Event
BOSTON (November 10, 2004) - Last night, at the Museum of Science's
Next Generation event, three leaders in biotechnology,
computational biology and genetics announced their choices for the
next generation of "young innovators to watch" in these fields.
Made possible by Concord Communications, Inc. of Marlborough, MA,
this year's event honored three young innovators from MIT. Chris
Burge, an associate professor at MIT, was nominated by Mary Lou
Pardue, Boris Magasanik Professor of Biology at MIT, a developer of
in situ hybridization, a fundamental tool of genetics, and an
advocate for women in science. Burge physically split his
laboratory to work in both the traditional "wet" and computational
biology paradigms. His research includes RNA splicing and
microRNAs, new developments that reveal extraordinary complexity in
how genes work and suggest new disease mechanisms and cures.
Chosen by Eric Lander, Founding Director of the Broad Institute of
MIT and Harvard, Manolis Kellis, an assistant professor at MIT, is
a principal investigator of the Computer Science and Artificial
Intelligence Laboratory (CSAIL), and a member of the Broad
Institute of MIT and Harvard. He introduced new computational
paradigms to the complex tasks of deciphering DNA signals,
understanding gene regulation, and clarifying how genomes
evolve. His work may help cure disease and also contribute to our
understanding of natural history and evolution.
"The Museum of Science is dedicated to presenting science,
technology and engineering in a variety of interesting and
inspiring formats including interactive exhibits, live
presentations and special events like the Next Generation,"
remarked Ioannis (Yannis) Miaoulis, the Museum's President and Director. "We're
honored to host renowned science and technology leaders and their
chosen innovators to celebrate their exciting discoveries and
glimpse the future of their fields."
Following a brief introduction by their mentor, each young
innovator offered a 15-minute presentation on their work to date
and responded to questions from the invited audience.
"As a major employer in the greater Boston area, Concord
Communications has benefited from the highly skilled local
workforce," stated Jack Blaeser, President and CEO of Concord
Communications. "As a company, we believe it is important to
support the advancement of technology literacy so that
Massachusetts can remain a vibrant contributor to the information
economy. That is why we have chosen to become a founding sponsor of
the Museum of Science's Technology Literacy Center. Our support for
the Center showcases our commitment to the greater Boston
community, in which we work and live."
Blackwell Plant Science
Latest News - Genetic Cross-Toc: The Homologous Genes atToc33 and
atToc34 Contribute to Plant Development
What is the function of homologous genes in organisms? This
question is often simpler to ask than to answer.
In the genetically simple plant, Arabidopsis, work is underway to
investigate the function of homologous genes using the wealth of
data available from the genome sequence (1,2). In a recent paper
in The Plant Journal, the groups of Paul Jarvis and Kenneth
Keegstra published investigations into the function of a homologue
of the PPI1 gene (atToc33), PPI3 (atToc34) in Arabidopsis (1).[...]
Finally, it is worth noting the context of this paper. In the
April 8th 2004 edition of Nature, Manolis Kellis and co-workers
reported analysis of gene function in duplicated portions of the
yeast genome (7). Their investigations uphold the long-standing
hypothesis that when gene duplication occurs, one copy of the gene
retains the original function, while the homologue diverges rapidly
in sequence and function. Evidence for whole-genome duplication in
Arabidopsis neatly parallels these findings and places this paper
on the broader evolutionary canvas.
Small fish yields big insights
An international team of scientists, including several from the
Broad Institute of MIT and Harvard, has decoded the smallest known
vertebrate genome--the puffer fish or Tetraodon nigroviridis. The
fish's 21 chromosomes, which together contain more than 300 million
letters of DNA, tell a twisting evolutionary tale and even shed
light on our own genetic makeup.
Comparison with other genome sequences shows that fish proteins have
diverged much faster than those in mammals, the team reports in the
Oct. 21 issue of Nature. Tetraodon contains several key genes
previously thought to be absent from fish.
Further, comparison with the human genome suggests about 900
previously unannotated human genes. Most genes in the human DNA
sequence have two counterparts in the Tetraodon genome, the
researchers add, showing that the ancestors of this fish must have
undergone a genome duplication at some point. Indeed, the Tetraodon
sequence may even give us a window on the last common ancestor of
Tetraodon and humans--a primitive bony fish that lived hundreds of
millions of years ago.
The Broad authors are Nicole Stange-Thomann, Evan Mauceli, Manolis
Kellis, Michael Zody, Jill Mesirov, Kerstin Lindblad-Toh, Bruce
Birren, Chad Nusbaum and Eric Lander. Lander is also a professor in
MIT's Department of Biology.
This work was supported by the Consortium National de Recherche en Génomique.
The Daily Nonpareil
Yeast may rise from 11,000-gene ancestor
CAMBRIDGE, Mass., Mar 08, 2004 (United Press International via
COMTEX) -- Baker's yeast may have been created when its ancestor's
genome was doubled, giving itself twice the capacity to evolve,
U.S. researchers said Monday.
MIT and Harvard researchers studied the yeast's genetic history and
concluded its genome may have been created when an ancestor cell
attempted to replicate its DNA during cell division. Instead of
splitting the DNA between the two resulting cells, one daughter
cell may have wound up with more than 11,000 genes instead of about
5,700. Although the larger number of genes made the yeast cell more
adaptable to new environments, it also made the genome unstable, so
most of the duplicated genes have been lost over time.
"This is the first time we actually see that an organism underwent
complete genome duplication and went back to single-copy state,"
said lead author Manolis Kellis.
Many of the 457 surviving duplicated genes in baker's yeast are
devoted to a single function: fermentation.
Other organisms, including many flowering plants, may have
developed the same way, the researchers said, and the same kind of
development may also explain the diverse nature of fish.
Nature Highlights
Comparative genomics has the potential to tackle a central problem in
current biological research: the identification of the functional
information in the genome. The results of one of the first major
contributions to comparative genomics suggest that the technique is
extremely powerful and will have a major impact on genome analysis in
all species including humans. Draft sequences of three yeasts
separated from Saccharomyces cerevisiae by up to 20 million years of
evolution were compared with the genome sequence of S. cerevisiae. The
comparison yields major revisions to the gene catalogue, including
elimination of 500 previously annotated genes and the discovery of 50
new ones.
Nature News and Views
Bio-IT World
In a tour de force of genomic technology and computational biology,
researchers at the Whitehead/MIT Center for Genome Research have
produced a comparative study of four species of yeast, with dramatic
implications for our understanding of gene inventory and regulatory
DNA sequence motifs that direct gene expression.
Manolis Kellis, Eric Lander, and colleagues sequenced the
approximately 12-million-basepair genomes of three yeast species. The
first yeast genome, S. cerevisiae, was sequenced in 1996. Back then,
sequencing a yeast genome required hundreds of
researchers and more than a year; today, a single high-throughput
center blasts through the sequence in about a week.
Two major results emerge from this comparative genomics study. First,
Kellis and colleagues show that more than 500 putative genes predicted
by algorithmic approaches are spurious, dropping the total number of
yeast genes below 6,000. Second, they newly identify 42 conserved
regulatory sequence motifs. The lessons from this study in yeast are
directly relevant to the future study and understanding of the human genome.
Diario Médico
Comparison of yeast genomes identifies new genes
The journal Nature today publishes a comparative analysis of the yeast
Saccharomyces cerevisiae with three other species: S. paradoxus,
S. mikatae and S. bayanus. The method has made it possible to identify
the function of previously unidentified sequence elements. The
authors believe the method of analysis could be applied to compare
the human genome with those of other primate species.
A U.S. study has carried out a comparative analysis of the genome of
the yeast Saccharomyces cerevisiae using the draft genome sequences
of three other related species, Saccharomyces paradoxus, S. mikatae
and S. bayanus, according to a report published today in Nature.
Based on this comparison, the team of Manolis Kellis, of the
Whitehead/MIT Center for Genome Research in Boston (Massachusetts),
has concluded that 500 elements of the genome that were thought to be
genes are in fact not, and should therefore be removed from the gene
catalogue of Saccharomyces cerevisiae. In addition, 50 new genes
should be added. In the end, S. cerevisiae would consist of only
5,538 genes encoding proteins of at least one hundred amino acids.
The identification of functional elements encoded in a genome is one
of the principal challenges of modern biology. By comparing these
four similar sequences, the researchers were able to identify
stretches of yeast DNA that do not encode genes but that may
nonetheless have other functions.
The team focused on Saccharomyces cerevisiae because it is one of the
most studied eukaryotes. In the first phase of the work, the genomes
were aligned and their evolution characterized, defining the regions
and their mechanisms of change. Methods were then developed for the
direct identification of genes and regulatory elements.
The analysis led to a broad revision of the yeast gene catalogue,
modifying approximately 15 percent of the genes and reducing the
total number by about 500. The analysis also automatically identified
72 genome-wide elements, including known regulatory elements as well
as new ones. "We have tried to infer a possible function for each of
these elements, as well as clues about the interactions between
them," the Massachusetts researchers said.
The results show that comparative genomic analysis of related species
can identify previously unknown fundamental functional elements.
Therefore, "sequencing primates could provide many clues about the
human genome," the scientists noted.
A useful method
In an opinion article accompanying the study in the same issue of
Nature, Steven L. Salzberg, of the Institute for Genomic Research in
Rockville (Maryland), notes that "this new comparative study of yeast
genomes makes clear that comparing genome sequences can have
tremendous analytical and deductive potential."
According to the geneticist, "this work offers the possibility of
improving our knowledge of thousands of genes at once, as well as
obtaining data on the function of a vast amount of genomic DNA that
does not encode genes. As was already shown when it was sequenced,
yeast shows us a path toward a better understanding of our own [...]"
Berliner Zeitung
Manolis Kellis and his colleagues at the Center for Genome Research in
Cambridge, Massachusetts, chose the yeast species S. paradoxus,
S. bayanus, and S. mikatae for their comparative studies. Like
brewer's yeast, each has 16 chromosomes. Preliminary analyses had
also shown that most of the more than six thousand presumed genes of
brewer's yeast have a direct counterpart in the three species. For
about five hundred of these genes, however, Kellis's team could not
find a counterpart in any of the species examined. The researchers
therefore consider it unlikely that these sequences are actually
genes, that is, functional regions that contain the blueprint for
proteins. At the same time, through the sequence comparison, Kellis's
team found 43 new genes that code for small proteins with fewer than
one hundred building blocks (amino acids) and that had been overlooked
in earlier analyses. The researchers now estimate the number of genes
in the brewer's yeast genome at 5,726.
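The logic in this account, discarding annotated open reading frames (ORFs) that have no counterpart in any related species and then adding newly found genes, can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the input table, the ORF names, and the functions `split_by_orthology` and `revised_gene_count` are hypothetical, and the published analysis applied considerably more careful criteria than the mere presence or absence of a match.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: for each annotated S. cerevisiae ORF, the set of
# related species in which a counterpart was found.
OrthologTable = Dict[str, Set[str]]


def split_by_orthology(table: OrthologTable) -> Tuple[List[str], List[str]]:
    """Return (supported, unsupported) ORF names, sorted by name.

    An ORF is "supported" if at least one related species has a
    counterpart; ORFs with no counterpart anywhere are flagged as
    unlikely to be real protein-coding genes.
    """
    supported: List[str] = []
    unsupported: List[str] = []
    for orf, species in sorted(table.items()):
        if species:
            supported.append(orf)
        else:
            unsupported.append(orf)
    return supported, unsupported


def revised_gene_count(table: OrthologTable, newly_found: int) -> int:
    """Supported ORFs plus genes newly discovered by the comparison."""
    supported, _ = split_by_orthology(table)
    return len(supported) + newly_found


if __name__ == "__main__":
    toy = {
        "ORF_A": {"S. paradoxus", "S. mikatae", "S. bayanus"},
        "ORF_B": {"S. bayanus"},
        "ORF_C": set(),  # no counterpart in any species
    }
    print(split_by_orthology(toy))     # (['ORF_A', 'ORF_B'], ['ORF_C'])
    print(revised_gene_count(toy, 1))  # 3
```

The same arithmetic, applied to the real annotation, is what turns "more than six thousand" presumed genes into a smaller revised total once unsupported ORFs are dropped and new small genes are added.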
Whitehead Press Release
Whitehead Genome Center Taps Comparative Genomics to Analyze Key
Functions in Yeast
CAMBRIDGE, MA, May 14, 2003 -- In another example demonstrating the
power of comparative genomics, scientists at the Whitehead
Institute/MIT Center for Genome Research announce that they have
compared four different species of baker's yeast, the simple,
age-old organism that turns sugar to alcohol, and shown that such
comparisons are a powerful tool for identifying key functions in
genomes. Their findings have implications for the human genome and
represent yet another vital step on the path to further medical and
scientific discovery. The paper appears in the May 15, 2003 issue
of Nature.
In this project, scientists generated high-quality draft sequences
of the genomes of three of the Saccharomyces yeast species,
S. paradoxus, S. mikatae and S. bayanus. They aligned these
genomes with that of the model organism S. cerevisiae, commonly known
as baker's yeast. The resulting multiple comparisons provided a
great resource for understanding the yeast genome.
Most significantly, the comparisons made it possible to distinguish
more easily the "noise" portions of the genomes (areas that appear to
have little use) from the "signal" (those parts that have an obvious
purpose). In humans, a mere five percent of the genome is
"The goal is to extract important biological signals hidden in the
vast noise of non-functional regions," says Manolis Kellis
(Kamvysselis), a graduate student at the Whitehead/MIT Center for
Genome Research and the Department of Computer Science,
Massachusetts Institute of Technology, who is the first author on
the paper. "These signals include genes, the building blocks of our
cells, but also regulatory motifs, tiny traffic lights that turn
genes on and off."
Not unlike attempting to separate wheat from chaff with only one's
fingers, extracting signal from genomes has proved a painstaking,
inaccurate process, and a frustrating one, since the ability to home in
on the functions of a genome has vast implications for medical
science. Yeast, a relatively simple organism, has a small, compact
genome containing less noise than the human genome. It provided an
excellent organism to test comparative genomic techniques.
"We believe it is a good model for genome-wide comparative
analysis," Kellis says. In principle, the approach the researchers
used can be applied to any organism by choosing a set of related
species to sequence and study.
Overall, the scientists found their model to be a powerful tool for
identifying genes and refining gene structure, for detecting rapid and
slow evolutionary changes, and for studying facets of gene regulation.
"Comparative genomics is an extremely important tool. Trying to
understand an ancient language like Egyptian hieroglyphs by simply
looking at the words in one language may be hard. But by reading
the same text in other languages like Latin or Greek and finding
common structures, we can recognize words and grammar, and learn
the meaning of each language. Similarly, by reading the same
chapter in multiple species, we get to the basis of what is
important in the book of life. Comparative genomics is the Rosetta
stone of biology," says Kellis.
Saccharomyces is perhaps best known as the magic that makes bread
rise and fruit ferment, but to scientists it is a favorite organism
of study, one that for years has helped scientists answer important
questions in genetics and cell biology. Now, the availability of
the three high-quality yeast sequences means biomedical
researchers can better understand these organisms, some of which
mutate into invasive, deadly infections in humans.
Among the findings of this analysis is that the yeast genomes hold
about 5,700 genes, many fewer than the estimated 10,000 of the
fruitfly and 30,000 that humans have. Because of an ancient common
ancestor, humans have about 2,000 to 3,000 genes in common with yeast,
generally genes that code for basic cell machinery.
Researchers were also able to identify signals controlling gene
expression that typically required complex experimentation and
extensive biological knowledge to find. "One of the most important
results of this analysis is that regulatory motifs can be read
directly from the DNA sequence," says Kellis. "When comparing
multiple genomes, these signals become apparent. We now have a
complete list of the most strongly conserved regulatory motifs in
The four different species of yeast are as different from each
other as mice are from humans, yet across the four yeast genomes,
all but roughly 12 genes are held in common. In other words, a mere
12 genes or so separate one yeast species from the next.
"It is striking. We saw the same thing between the human and mouse
genomes. It may mean that genetic differentiation across different
species is the result of very subtle events," Kellis says.
Understanding what changes may turn a benign species into an
invasive human pathogen will be crucial to understanding and curing
disease. Studying the differences between Saccharomyces genomes
offers insight into how genes and new functions may evolve in
higher organisms, including humans. "We found a small number of
genes that are evolving very rapidly," says Kellis. "These are
likely to be involved in speciation events."
The genomes were sequenced using the Whole Genome Shotgun (WGS)
approach. For each species, sequence from the entire genome was
generated and reassembled by recognizing identical segments using
the ARACHNE assembler, a program developed at the Whitehead
Institute/MIT Genome Center. The WGS method is standard and has
been successfully applied to the fruitfly and the mouse. The
Saccharomyces sequences are freely available through public
sequence databases and the Saccharomyces Genome Database (SGD)
maintained at Stanford University. The sequence is still considered
a draft because very small portions of it are missing or ambiguous.
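The idea of reassembling a genome "by recognizing identical segments" can be illustrated with a toy greedy overlap merger. This is a swapped-in textbook technique, not the ARACHNE algorithm, which is far more sophisticated (it must cope with sequencing errors and repeated sequence, among other things); the function names are hypothetical and the reads are short, error-free toy strings.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` characters exists.
    """
    start = 0
    while True:
        # Look for the first min_len characters of b inside a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if the rest of a matches the start of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of contigs with the longest overlap.

    Stops when one contig remains or no pair overlaps by >= min_len.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining contigs share no usable overlap
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTTG"]
    print(greedy_assemble(reads))  # ['ATGGCGTACCTTG']
```

When some reads share no overlap, the function returns several separate contigs, which mirrors, in miniature, why a draft assembly can still contain small gaps.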
The genome of each species of Saccharomyces is about 12 million
base pairs in size. The draft sequences show the order of the DNA
chemical bases A, T, C, and G along the yeasts' 16 chromosomes. They
include more than 95 percent of each genome in long, continuous
stretches of overlapping DNA and represent 7-fold coverage of the
genome. This means that every base, or DNA letter, in the
Saccharomyces genomes was read an average of 7 times, a redundancy
that ensures a high degree of accuracy.
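The "7-fold coverage" figure is simple arithmetic: coverage c = (number of reads × read length) / genome size. Under the idealized Lander-Waterman model, in which reads land uniformly at random, the expected fraction of bases never read at all is e^(-c). The sketch below assumes a hypothetical read length of 600 bases, which the release does not state; repeats and uneven sampling are among the reasons real assemblies fall short of the idealized figure.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced: c = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def expected_uncovered_fraction(c: float) -> float:
    """Idealized (Poisson) fraction of bases never read: e^(-c)."""
    return math.exp(-c)


if __name__ == "__main__":
    G = 12_000_000  # ~12 million base pairs per Saccharomyces genome
    L = 600         # hypothetical read length, not from the release
    n = reads_needed(7, L, G)
    print(n)                                       # 140000
    print(coverage(n, L, G))                       # 7.0
    print(f"{expected_uncovered_fraction(7):.4%}")  # 0.0912%
```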
Today's research also represents a major step along the path of
bioinformatics, a young field of science that combines biology with
computing: not one test tube was used beyond the sequencing of the
species. The project relied completely on computational
"We are entering a new era where computers will provide a bigger
and bigger role to the understanding of biology and genomics,"
Kellis commented.
The Whitehead Institute/MIT Center for Genome Research is an
international leader in the field of genomics, the study of all of
the genes in an organism and how they function together in health
and disease. A flagship of the Human Genome Project, the Center
today houses a broad range of thriving research programs combining
structural genomics, medical and population genetics and clinical
medicine. The Center's annual budget is $80 million, and it employs
350 people, including scientists and medical researchers from
Whitehead, MIT and Harvard.
Howard Hughes Medical Institute
When Manolis Kellis was 12 years old, he and his family moved
from Athens to a small town in southern France. "My dad just woke
up one morning and said, 'Let's go.'" His father, who grew up in a
village in Greece, wanted his children to be bilingual, a skill he
regretted never having gained himself. So they hit the road. "We
didn't ask questions. We just went," says Kellis, who is now a
graduate student working with Eric Lander at the Whitehead
Institute/MIT Center for Genome Research. "Four months later, we
were living in France."
The move changed Kellis's life. "In a new language and new
environment I had to work really hard," he says. "And I learned
that working hard is normal."
It also gave him a unique and intimate appreciation of
mathematics. "Learning a new language, I was able to step back and
realize that words are just placeholders for meanings. They're
abstractions." And the same is true in math, a discipline that is
all about abstraction. "Terms like x and y have no meaning on their
own," notes Kellis. "They are placeholders for other
Four years after the move to France, Kellis was one of the top
100 students in the country. "People said, 'Oh, he's good at math
because he's Greek,'" Kellis laughs. "But really it was
because I had to overcome the language barrier."
Although Kellis loved math, he yearned to find a practical
application for his skills. So, after having learned English as a
foreign language in high school, he enrolled as a computer science
major at the Massachusetts Institute of Technology. There he
learned to use computers to tackle a variety of problems: solving
geometric surfaces in multiple dimensions, building models for
human motion, and programming robots to cooperate by following
simple rules, like ants in a colony. "I worked on a different
problem every six months," says Kellis. Taking on diverse
tasks taught him how to absorb knowledge, adapt quickly to new
situations, and use his talents to do something novel.
That was excellent training for the work Kellis now does with
Lander, where he applies his skills as a computer scientist and
mathematician to learn something new about how cells work. "From an
engineer's perspective, looking at life and how life works really
makes sense," he says. After all, a cell is like a robot that
evolution has designed and assembled. The information encoded in
its DNA is like the program that runs the machine. And Kellis
is working on cracking the code that cells use to live, and to
Working with Lander and his colleagues at the Whitehead Institute,
Kellis is comparing the genome sequences of four different
species of Saccharomyces, a budding yeast that bakers and brewers
have used in bread and beer for centuries. Using computer programs
that he wrote himself, Kellis is searching these yeast genomes
for patterns. He is specifically looking for short signals that are
distributed nonrandomly throughout the DNA. Sequences that are
biologically important to the survival of an organism, the
researchers theorize, will be conserved by evolution. They will
look the same and crop up in the same places in the genomes of
different organisms, in this case closely related species of yeast.
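The search described here, for short sequences that look the same and sit in the same places across species, can be sketched as a scan over a multiple alignment. This is a simplified stand-in, not Kellis's own programs: it assumes the genomes are already aligned to equal length with "-" marking gaps, and `conserved_kmers` and `conservation_rate` are hypothetical names. Scoring each k-mer by the fraction of its occurrences that are conserved in every species is one way to let a recurring, nonrandom signal stand out from chance matches.

```python
from collections import Counter
from typing import Dict, List


def conserved_kmers(aligned: List[str], k: int) -> Counter:
    """Count gap-free k-mers that are identical, at the same alignment
    position, in every sequence. aligned[0] is the reference species."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    counts: Counter = Counter()
    for i in range(length - k + 1):
        window = aligned[0][i:i + k]
        if "-" in window:  # skip windows containing alignment gaps
            continue
        if all(s[i:i + k] == window for s in aligned[1:]):
            counts[window] += 1
    return counts


def conservation_rate(aligned: List[str], k: int) -> Dict[str, float]:
    """Fraction of each k-mer's occurrences in the reference that are
    conserved in all species; functional motifs should score high."""
    ref = aligned[0]
    total: Counter = Counter(
        ref[i:i + k] for i in range(len(ref) - k + 1) if "-" not in ref[i:i + k]
    )
    conserved = conserved_kmers(aligned, k)
    return {kmer: conserved[kmer] / n for kmer, n in total.items()}


if __name__ == "__main__":
    alignment = [  # toy "four species" alignment
        "ACGTGACGTT",
        "ACGTGACGTA",
        "ACGTCACGTT",
        "ACGTGACGAT",
    ]
    print(conserved_kmers(alignment, 4))             # Counter({'ACGT': 1})
    print(conservation_rate(alignment, 4)["ACGT"])   # 0.5
```

In the toy alignment, "ACGT" occurs twice in the reference, but only the first occurrence is identical in all four sequences, so its conservation rate is 0.5.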
So far the data look good. Kellis and company are finding
patterns and identifying elements that appear again and again
throughout the yeast genomes. Now the researchers are working on
correlating these signals with their functions to understand how
they regulate gene expression and enable genes to work together to
build and operate a machine as dynamic as a living cell.
To tease patterns from these genomes, Kellis has written
thousands of lines of computer code. His days are a continuous loop
of looking at data, writing programs for probing the data, and
testing new ideas. "Work, observe, work, observe, work, observe,"
as Kellis puts it.
Much of his work is actually done at home, where Kellis
shuffles from bed to computer and back again, with occasional trips
to the refrigerator. "The danger of working independently at home
is that you'll make more trips to the fridge than you'll write
lines of code," he says. Of course it also has its advantages. "You
can sleep late if you worked too hard or partied too hard the night
before," says Kellis. "Or you can go to the beach if it's
sunny on a Thursday but rainy on Saturday." But then, it's back to
work, which is fine with Kellis. "Doing a Ph.D. takes a lot of
self-motivation and you have to love what you do," he
explains. "But I love what I do. It's like I'm a kid still playing
with toys. They're just different toys."
Kellis is finishing his second year of graduate school, and if
all goes well he hopes to graduate in a year or so. As for the
future, Kellis is still undecided. He might want to teach, or
do postdoctoral work, or maybe go out and do something totally
new. Until then, he will certainly continue to work hard, and play
hard, too. When he's not in the lab, Kellis enjoys getting
physical: biking, in-line skating, or salsa dancing. "Being stuck
in a chair all the time, my hobbies involve getting out and working
my body," he says. "It's important to try to find some balance. I'm
not just a brain."
First recipient of Paris Kanellakis Fellowship
Manolis Kamvysselis awarded Kanellakis Fellowship at MIT
May 28, 1999
The Department is pleased to announce that Mr. Manolis Kamvysselis
has been chosen to receive the first Paris Kanellakis Fellowship at
MIT. Mr. Kamvysselis came to MIT as a freshman in 1995, expects to
complete his Master of Engineering Degree in June, 1999, and plans
to continue in the Artificial Intelligence Laboratory working on
his PhD.
The Kanellakis Fellowship was established at MIT by General and
Mrs. Eleftherios Kanellakis in memory of their son Paris who
received his PhD from EECS in 1982 and died unexpectedly and
tragically in 1995. The family has established two similar
fellowships at Brown University where Paris was a member of the
computer science faculty. Brown University has established a web
page in recognition of these Fellowships:
MIT EECS - 2003 Sprowls Doctoral Dissertation Award
Manolis Kamvysselis and Dina Katabi receive the Sprowls
Dissertation Award. The award honors the best thesis in computer
science from any department at MIT. Both are graduating from the
Department of Electrical Engineering and Computer Science.
Tau Beta Pi Intercollegiate Design Competition
For the fourth straight year, MIT won the Tau Beta Pi National
Engineering Honor Society district design competition on Saturday,
beating out approximately 17 regional competitors, including Yale
University, Brown University, Boston University, and Worcester
Polytechnic Institute.
The team of William H. Stadtlander '99, Matthew S. Duplessie '99,
and Manolis E. I. Kamvysselis '99 captured the $300 first-place
prize. The contest was part of the activities at the annual TBP
District Convention held at the Worcester Polytechnic Institute
this year.
Teams designed airplane carts
The team was challenged to design an airline cart that would
alleviate the problem of stewards suffering back injuries while
serving drinks to passengers. They were given four hours to prepare
the design and a 10-minute presentation.
Their idea consisted of a motorized cart that ran on a rubber
mat. The mat had two grooved tracks to fit the cart's grooved
wheels. The team decided to use a rubber mat so grooves would not
be cut directly into the floor of the plane. The grooved track and
wheels also prevented the cart from moving during turbulence.
Many of the competing teams developed similar motorized-cart
designs. The MIT team considered many other options, from using
tubing to deliver drinks to the passengers to using a spring-based
system to hang a cart from the cabin ceiling.
"They really didn't give us a lot of specifications," Stadtlander
said. Duplessie called the design problem "very open-ended" and
wished the contest could have "allowed more creativity," so that the
competing teams' "ideas could have been very different."
Brainstorming was key to success
As in the local competition, the team used brainstorming and math
to win. They "spent a large amount of time brainstorming to iron
out the kinks in the plan," Duplessie said.
They also backed up their design with dimensions and a cost
analysis. "We did a significant bit of math," Duplessie said.
Outfitting a 30-foot airplane with one aisle was estimated to cost
about $5,000, Stadtlander said.
The team also made sure to polish their presentation, since half
their score was based on presentation and half was based on
design. "You can come up with a great idea that works, but if you
can't present it, it's worthless," Stadtlander said.
The team enjoyed the competition. "I would recommend it to any
freshman or sophomore next year," Duplessie said.
MIT International Celebration
"I-Fair has always been the most important event for ISA, just as
much as it is for many international clubs on campus. It's by far
the largest event, and it really fulfills the goal of ISA, which is
to bring closer together the different cultures represented in MIT,"
said Manolis E.I. Kamvysselis '99, president of ISA.
"I-Fair was a huge success this year. It's been the biggest I-Fair
we've had so far," Kamvysselis said. About 2,000 people dropped by
the event, and over 118 countries were represented. A record number
of 43 clubs hosted booths and 27 groups put on performances, he said.
"I think it was flawless this year; the spirit was there. The people
love it because they see many clubs; clubs love it because they get
a chance to perform," Kamvysselis said.
"The I-Fair is a great opportunity for the clubs at MIT to display
their traditional cultures through dance, food, and music. This
year's show went really well, we had a lot of people who came who
aren't from MIT, such as Harvard [University] and Boston College,"
said Manas D. Ratha '99, treasurer of the ISA.
While some expressed concern about the chilly weather, it was not
enough to keep people from turning out for the fair. "The
weather could have been a bit warmer and sunnier, but it cooperated
quite well, considering" that weather services had forecasted rain
that day, Kamvysselis said.
Besides the spring I-Fair, ISA also hosts a "mini I-Fair" in the
fall, but "we don't want to make it as big as I-Fair. I-Fair
originated in the spring, when the good weather begins; it's sunny,
and people have more time to perform," Kamvysselis said.
"I-Fair is an event unique in the life of MIT students. It's the one
and only event that brings together so many students from so many
different backgrounds and interests and lets them participate
actively in the event, [giving] them a chance to show a bit of
themselves to such a wide and diverse public," Kamvysselis said.
Implementation of exon/intron evolution:
In computer science, evolutionary computation is an important
computational paradigm inspired by nature. It uses the mechanisms of
natural evolution to solve computational problems. New discoveries
and hypotheses about natural evolution could therefore give new ideas
to evolutionary computation.
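To make the paradigm concrete, here is a minimal genetic algorithm (a close relative of Genetic Programming that evolves fixed-length bit strings rather than programs) on the toy "OneMax" problem of maximizing the number of 1 bits. All names and parameter values are illustrative assumptions, not taken from this report; the sketch only shows the selection, crossover, and mutation loop that evolutionary computation borrows from nature.

```python
import random
from typing import List

Genome = List[int]


def fitness(bits: Genome) -> int:
    """Toy 'OneMax' objective: the number of 1 bits."""
    return sum(bits)


def tournament(pop: List[Genome], rng: random.Random) -> Genome:
    """Binary tournament selection: the fitter of two random individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(n_bits: int = 20, pop_size: int = 30, generations: int = 100,
           mutation_rate: float = 0.05, seed: int = 0) -> Genome:
    """Run a generational GA and return the best individual (n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = [max(pop, key=fitness)[:]]  # elitism: keep the best
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(fitness(best), best)
```

Elitism guarantees that the best fitness in the population never decreases from one generation to the next, which keeps this toy run easy to reason about.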
According to what is currently known, organisms' DNA contains large
quantities of sequence that is not "translated" into the phenotype,
and such sequence is sometimes called "garbage" or "junk" DNA. But
many people, including me, think it is not garbage at all; its
functions are simply unknown at present. Within genes, the non-coding
segments that are spliced out before translation are called introns,
and the segments that are kept and expressed are called exons. Many
hypotheses have been proposed about introns: they could be metadata
for the exons, or a record of the whole evolutionary history [Kam].
In January 2003, a paper in Nature titled "Loss and Recovery of
Wings in Stick Insects" [WBM03] reported a new discovery in evolution:
some species can lose a complex organ and regain it some time later,
and this may happen several times over the course of evolution. The
finding surprised scientists because it shows that some complex
functions of a species can be temporarily disabled, remain latent in
the genes, and be triggered again by the environment generations
later.
This discovery interests me greatly. Combining what is known about
introns with the phenomenon revealed by the paper, I arrived at the
hypothesis that such genes are turned from exons into introns (thus
disabled temporarily but still present in the genome) and later
turned from introns back into exons (enabled again). Such a mechanism
could plausibly increase the adaptability of organisms, especially in
an ever-changing environment.
A natural next step is to test whether such a mechanism can increase
adaptability in evolutionary computation. This is the idea behind
this project: Genetic Programming with introns and exons.
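One possible way to represent such a genome is to give every gene an on/off flag, so that a gene can be switched from exon to intron and later back without losing its content. This is a minimal sketch under assumed design choices, independent of the implementation discussed in Section 3: the names `Gene`, `Genome`, `toggle`, and `mutate` and the default mutation rates are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gene:
    value: int           # payload, e.g. an encoded instruction or constant
    active: bool = True  # True = exon (expressed); False = intron (latent)


@dataclass
class Genome:
    genes: List[Gene] = field(default_factory=list)

    def phenotype(self) -> List[int]:
        """Only exons are expressed; introns are carried along silently."""
        return [g.value for g in self.genes if g.active]

    def toggle(self, index: int) -> None:
        """Turn an exon into an intron or back, leaving its content intact."""
        self.genes[index].active = not self.genes[index].active


def mutate(genome: Genome, rng: random.Random,
           toggle_rate: float = 0.05, point_rate: float = 0.1) -> Genome:
    """Return a mutated copy. Toggles switch genes between exon and intron;
    point mutations change only exons, so latent introns are preserved."""
    child = Genome([Gene(g.value, g.active) for g in genome.genes])
    for i, g in enumerate(child.genes):
        if rng.random() < toggle_rate:
            child.toggle(i)
        if g.active and rng.random() < point_rate:
            g.value += rng.choice((-1, 1))
    return child


if __name__ == "__main__":
    g = Genome([Gene(3), Gene(5), Gene(7)])
    g.toggle(1)            # disable the middle gene (exon -> intron)
    print(g.phenotype())   # [3, 7]
    g.toggle(1)            # re-enable it (intron -> exon)
    print(g.phenotype())   # [3, 5, 7]
```

Shielding introns from point mutations is a deliberate simplification that keeps a disabled gene recoverable exactly as it was; letting introns drift freely would be closer to biology and is an equally reasonable variant to test.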
In this report I describe experiments on Genetic Programming with
introns and exons and present the results: Section 2 presents the
problem selected for the Genetic Programming to solve; Section 3
discusses the implementation; Section 4 describes the experiments
and their results; finally, Sections 5 and 6 conclude the report and
outline future work.