
Ensemble methods in bioinformatics: experience of applying them in genomics and QSAR modelling

Adylova F.T. (Адылова Ф.Т.)

Ikramov A.A. (Икрамов А.А.)

Problems of Computational and Applied Mathematics (Проблемы вычислительной и прикладной математики)

  • No. 3(9), 2017

Pages: 87-94

Language: Russian


Abstract

Research in computational biology today makes wide use of ensemble methods because of their unique advantages in handling small samples, high-dimensional feature spaces, and complex data structures. This article has two aims. The first is to review the most widely used ensemble learning methods and their application to various bioinformatics problems: gene expression, mass spectrometry-based proteomics, identification of gene-gene interactions, prediction of regulatory elements from DNA and protein sequences, and QSAR modelling. The second is to summarize trends in the future development of ensemble methods in bioinformatics. Promising directions are discussed, such as ensembles of support vector machines, meta-ensembles, and ensembles for feature selection.

Ensemble learning is an intensively studied technique in machine learning and pattern recognition. Recent work in computational biology has seen increasing use of ensemble learning methods because of their unique advantages in dealing with small sample sizes, high dimensionality, and complex data structures. The aim of this article is two-fold. The first aim is to review the most widely used ensemble learning methods and their application to various bioinformatics problems, including gene expression, mass spectrometry-based proteomics, gene-gene interaction identification from genome-wide association studies, prediction of regulatory elements from DNA and protein sequences, and QSAR modelling. The second aim is to identify and summarize future trends of ensemble methods in bioinformatics. Promising directions such as ensembles of support vector machines, meta-ensembles, and ensemble-based feature selection are discussed.

Today, ensemble methods are used in computational biology research because of their clear advantage in working with small samples, high-dimensional features, and complex data structures. The article pursues two main aims. The first is to present an analysis of the ensemble learning methods widely used in various bioinformatics problems, such as gene expression, mass spectrometry-based proteomics, identification of gene interactions, prediction of regulatory elements from DNA and protein sequences, and QSAR modelling. The second is to summarize the directions in which ensemble methods in bioinformatics are expected to develop. Promising directions such as ensembles of support vectors, meta-ensembles, and ensembles for feature selection are discussed.

