
Development of an SVM Tool for Predicting Protein Phosphorylation Sites in Rice
A New Specific SVM Predictor on Protein Phosphorylation Sites in Rice (Oryza sativa L.)

Supervisor: He Huaqin (何华勤)

Discipline: 071010 (Biochemistry and Molecular Biology)

Degree: Master's

Author:

Institution: Fujian Agriculture and Forestry University

Abstract: Protein phosphorylation is one of the most important and most common post-translational modifications (PTMs). Phosphorylation and dephosphorylation regulate biological processes such as gene expression and cell growth and differentiation. Earlier studies using biochemical techniques have identified large numbers of protein phosphorylation sites across species and have produced both general and species-specific phosphorylation site predictors. Our laboratory previously developed PhosphoRice, a phosphorylation site predictor for rice. However, PhosphoRice is an integrated tool, so its efficiency and speed are limited by the performance of its component tools. This thesis therefore combines a support vector machine (SVM) with different feature extraction methods to develop an SVM tool dedicated to predicting protein phosphorylation sites in rice.

First, experimentally verified rice phosphorylation sites were collected from recently published articles and from the Swiss-Prot protein database. These were used to build positive and negative datasets of 4966 phosphorylation sites and 5586 non-phosphorylation sites, respectively. Next, six feature extraction methods (AF, KNN, CKSAAP and their pairwise combinations) were compared on these samples. AF_CKSAAP, CKSAAP and CKSAAP_KNN performed markedly better than the other three strategies. Four classification algorithms (decision tree, k-nearest neighbors, random forest and SVM) were then compared on rice phosphorylation site prediction. SVM models built on the CKSAAP, AF_CKSAAP and CKSAAP_KNN features all reached an MCC above 0.50, significantly better than the other three algorithms. On this basis, SVM with AF_CKSAAP feature extraction, supported by CKSAAP and CKSAAP_KNN, was used to build Rice_Phospho1.0, a predictor dedicated to rice protein phosphorylation sites. It achieved an accuracy (ACC) of 80.90% and a Matthews correlation coefficient (MCC) of 0.617. This clearly outperformed five recently published predictors: KinasePhos, DISPHOS, Musite, Scansite and PhosphoRice. Finally, an independent test set was used to evaluate Rice_Phospho1.0 on serine (S), threonine (T) and tyrosine (Y) phosphorylation sites; the results showed…

Protein phosphorylation is one of the most important protein post-translational modifications (PTMs). It can change the catalytic activity of phosphorylated proteins and influence cell-specific physiological processes. As the body of experimentally identified phosphorylation sites has grown, researchers have used these data to train different algorithms and have developed many phosphorylation site predictors. However, most of these predictors focus on phosphorylation sites in human and animal proteins. PhosphoRice1.0, developed in our previous work, predicts protein phosphorylation sites specifically in rice (Oryza sativa L.). Because it relies on an integration strategy, however, its running efficiency depends critically on its component predictors. In this thesis, therefore, a new rice-specific predictor of protein phosphorylation sites was constructed using the Support Vector Machine (SVM) algorithm with different feature extraction methods.

First, experimentally verified phosphorylation sites of rice proteins were collected from recent publications and protein databases, and positive and negative datasets were established. In total, there were 4966 positive sites and 5586 negative sites. Second, the performance of 6 feature extraction methods, including AF, KNN, CKSAAP and their combinations, was compared. The results indicated that AF_CKSAAP, CKSAAP and CKSAAP_KNN captured protein sequence features better than the other 3 methods. The ability of 4 classification algorithms, including Decision Tree (DT) and K-Nearest Neighbors (KNN), to separate the positive and negative data was also analyzed. Third, we used SVM with the AF_CKSAAP, CKSAAP and CKSAAP_KNN methods to construct Rice_Phospho, a phosphorylation site predictor specific to rice proteins. Rice_Phospho1.0 reached an accuracy (ACC) of 80.90% and a Matthews correlation coefficient (MCC) of 0.617. Both values were significantly higher than those of recently published predictors, including KinasePhos, DISPHOS, Musite, Scansite…
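The abstract names CKSAAP (composition of k-spaced amino acid pairs) as one of the sequence encodings passed to the SVM. Below is a minimal Python sketch of that encoding. It assumes a peptide window centred on the candidate S/T/Y residue. The window half-width of 7, the maximum gap `k_max = 3`, the padding character `X`, and the helper names `extract_window` and `cksaap` are illustrative assumptions, not parameters reported in the thesis. The AF and KNN encodings are not sketched because the abstract does not define them.

```python
from itertools import product

# The 20 standard amino acids; pairs containing anything else (e.g. padding 'X')
# get no feature slot.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def extract_window(sequence: str, site: int, half_width: int = 7, pad: str = "X") -> str:
    """Return the peptide centred on the 0-based index `site`, padded with `pad`.

    Hypothetical helper: half_width=7 (a 15-residue window) is an assumed value.
    """
    padded = pad * half_width + sequence + pad * half_width
    centre = site + half_width
    return padded[centre - half_width : centre + half_width + 1]


def cksaap(window: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each k, count every ordered pair (window[i], window[i + k + 1]) and
    divide by the number of pair positions, len(window) - k - 1. Pairs with a
    non-standard residue count toward that denominator but are not recorded.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(window) - k - 1
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / n_positions if n_positions > 0 else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    # Made-up sequence for illustration; index 3 is a serine.
    window = extract_window("MASSPKRTEDSLSPY", site=3)
    print(window)               # XXXXMASSPKRTEDS
    print(len(cksaap(window)))  # 1600
```

With the defaults, each window yields 4 × 400 = 1600 features, one block of 400 normalized pair frequencies per gap k. These vectors, labelled 1 for phosphorylated and 0 for non-phosphorylated sites, could then be fed to any SVM implementation.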

Keywords: rice; protein phosphorylation sites; classification algorithms; feature extraction; prediction tools

Field: [Agricultural Sciences]
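The abstract reports ACC = 80.90% and MCC = 0.617 for Rice_Phospho1.0, and uses MCC > 0.50 as the benchmark for the SVM models. For reference, here is a short Python sketch of the standard definitions of both metrics, computed from a binary confusion matrix. The function names and the toy labels are hypothetical and are not the thesis's data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = phosphorylated site."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # true site predicted as non-site
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print(counts)  # (4, 3, 1, 2)
    print(round(accuracy(*counts), 3), round(mcc(*counts), 3))  # 0.7 0.408
```

MCC uses all four cells of the confusion matrix and ranges from -1 to 1, with 0 corresponding to chance-level prediction. For this reason it is a stricter summary of classifier quality than ACC alone.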
