Machine Learning for Bioinformatic Data Mining
April 27, 2006, 10:00-11:30
Room 315, Section 1, FIT Building (Information Science and Technology Building)
Speaker
Sun-Yuan Kung
Professor, Princeton University
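Professor Sun-Yuan Kung has been an IEEE Fellow since 1988 and served on the Board of Governors of the IEEE Signal Processing Society from 1989 to 1991. He founded several technical committees (TCs) of the IEEE Signal Processing Society, including the VLSI Signal Processing TC (1984), the Neural Networks for Signal Processing TC (1991), and the Multimedia Signal Processing TC (1998). He has served on the editorial board of IEEE Transactions on Signal Processing and is currently Editor-in-Chief of the Journal of VLSI Signal Processing Systems.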
Professor Kung has authored more than 400 papers and several textbooks, including "VLSI and Modern Signal Processing" (Prentice-Hall), "VLSI Array Processors" (Prentice-Hall), "Digital Neural Networks" (Prentice-Hall), "Principal Component Neural Networks" (John Wiley), and "Biometric Authentication: A Machine Learning and Neural Network Approach" (Prentice-Hall). His honors include the IEEE Signal Processing Society's Technical Achievement Award (1992), appointment as a Distinguished Lecturer of the IEEE Signal Processing Society (1994), the IEEE Signal Processing Society's Best Paper Award (1996), and the IEEE Third Millennium Medal (2000).
Contacts:
Kang Yi (康毅), 6279-5788, xxxy@tsinghua.edu.cn
Department of Electronic Engineering: He Yun (何芸), 6278-1413, hey@video.mdc.tsinghua.edu.cn
Abstract:
Genomic bioinformatics represents a natural convergence of life science and information science. DNA sequencing and expression profiling are the two main modalities of genomic information. The genome is not just a collection of genes working in isolation; it encompasses global, highly coordinated control of information to carry out a range of cellular functions. It is therefore imperative to conduct genome-wide exploration. Genome-wide analysis via pure DNA sequencing is computationally prohibitive. In contrast, the expression of several thousand genes can be measured simultaneously with DNA microarrays, permitting the discovery of clusters of correlated genes. Microarray data analysis will therefore play a vital role in future genome-wide bioinformatic studies.
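As a rough illustration of what "clusters of correlated genes" can mean for microarray data, the sketch below groups genes whose expression profiles have a high Pearson correlation. It is not the method presented in the talk: the function name `correlated_gene_groups`, the 0.9 threshold, the connected-components grouping rule, and the toy expression matrix are all assumptions made for this example.

```python
import numpy as np


def correlated_gene_groups(expr, threshold=0.9):
    """Group genes whose expression profiles are strongly correlated.

    expr: array of shape (n_genes, n_conditions), one row per gene.
    Two genes are linked when their Pearson correlation is >= threshold;
    each returned group is a connected component of that link graph.
    (Hypothetical helper for illustration only.)
    """
    corr = np.corrcoef(expr)          # (n_genes, n_genes) Pearson matrix
    linked = corr >= threshold
    unvisited = set(range(corr.shape[0]))
    groups = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        stack, group = [seed], [seed]
        while stack:
            gene = stack.pop()
            # Collect neighbors first, then mutate the set.
            neighbors = [other for other in unvisited if linked[gene, other]]
            for other in neighbors:
                unvisited.remove(other)
                stack.append(other)
                group.append(other)
        groups.append(sorted(group))
    return groups


# Toy data (made up): genes 0-1 share one profile, genes 2-3 another.
profile_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
profile_b = np.array([5.0, 1.0, 4.0, 2.0, 3.0])
expr = np.vstack([profile_a, 2 * profile_a + 1, profile_b, 0.5 * profile_b - 2])

groups = correlated_gene_groups(expr)
print(groups)
```

Correlation is used here because it ignores scale and offset, so two genes that rise and fall together are grouped even when their absolute expression levels differ.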
It is crucial not only to know how to cluster data but also to find an appropriate way of looking at genomic data. In other words, extraction of relevant features is critical for cluster discovery. We shall present a comprehensive set of coherence models to better capture the biologically relevant features of genes. In addition, we adopt several existing neural networks as the classification architecture, e.g. SVM or the decision-based neural network (DBNN). Our fusion model is built upon the classic mixture-of-experts (MOE) architecture: (1) a local expert is assigned to cover each modality; (2) a gating agent then fuses the local scores to reach a Bayesian optimal decision. On the standard yeast database, the proposed machine learning/fusion system yields satisfactory performance in predicting several well-studied yeast gene groups, e.g. ribosomal and molecular activity genes.
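The two-step fusion idea above (one local expert per modality, then a gating agent that combines the local scores) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system described in the talk: the gate here is a simple softmax over hypothetical gate logits, the expert scores are made-up numbers standing in for per-modality classifier outputs, and the names `softmax` and `fuse_expert_scores` are invented for this example.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def fuse_expert_scores(expert_scores, gate_logits):
    """Fuse per-modality scores with a softmax gate (illustrative only).

    expert_scores: shape (n_samples, n_experts), one score per local expert,
                   e.g. a sequence-based and an expression-based expert.
    gate_logits:   shape (n_experts,) or (n_samples, n_experts); larger
                   values give the corresponding expert more weight.
    Returns the gate-weighted average score for each sample.
    """
    expert_scores = np.asarray(expert_scores, dtype=float)
    gate_logits = np.broadcast_to(
        np.asarray(gate_logits, dtype=float), expert_scores.shape
    )
    weights = softmax(gate_logits)    # each row sums to 1
    return np.sum(weights * expert_scores, axis=-1)


# Made-up scores for two genes from two hypothetical experts.
scores = np.array([[0.9, 0.6],
                   [0.2, 0.7]])

fused_equal = fuse_expert_scores(scores, np.zeros(2))         # equal trust
fused_first = fuse_expert_scores(scores, np.array([50.0, -50.0]))  # trust expert 0
print(fused_equal, fused_first)
```

With equal gate logits the fused score is the plain average of the expert scores; as one logit dominates, the fused score approaches that expert's score. In a trained system the gate logits would be learned from data rather than set by hand.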
With massive amounts of data to be analyzed, genomic study will inevitably become dependent on advanced machine learning techniques. On the other hand, any computationally based genomic prediction remains untrustworthy until careful and laborious biological verification is performed. This points to an increasingly symbiotic relationship between machine learning and genomic technologies.