[1] ZHANG Ying, DOU Yi-feng. Medical Data Classification and Early Diabetes Prediction Based on WEKA[J]. Medical Information, 2021, 34(06): 32-35. [doi:10.3969/j.issn.1006-1959.2021.06.009]

Medical Data Classification and Early Diabetes Prediction Based on WEKA

Medical Information [ISSN: 1006-1959 / CN: 61-1278/R]

Volume: 34
Issue: 2021, No. 06
Pages: 32-35
Publication date: 2021-03-15

Article Information

Title: Medical Data Classification and Early Diabetes Prediction Based on WEKA
Article ID: 1006-1959(2021)06-0032-04
Author(s): ZHANG Ying, DOU Yi-feng
(Department of Urology 1, Network Information Center 2, People's Hospital of Baodi District, Tianjin 301800, China)
Keywords: Medical data; Algorithm; Diabetes
CLC number: R587.1; R195.1
DOI: 10.3969/j.issn.1006-1959.2021.06.009
Document code: A
Abstract:
Objective: To explore how well machine learning algorithms and their derivative algorithms classify medical datasets, in order to better identify the value of computers in assisting medical diagnosis. Methods: Taking the Pima Indians diabetes dataset as an example, machine learning models were built on the WEKA platform, including NaiveBayes (based on Bayes' theorem), Bagging (based on ensemble learning), and J48 (tree-based), for a total of 21 algorithms in six categories. The predictive performance of the models was evaluated with multiple indicators across multiple dimensions. Results: The five algorithms with the lowest RMSE and RRSE were, in order, Logistic, LMT, RotationForest, RandomForest, and Bagging. The classification accuracy of LMT, SMO, Logistic, NaiveBayes, and RotationForest all exceeded 76%, and their true positive rates were all above 76%. The ROC curves showed that, except for SMO, the area under the curve of the remaining algorithms was above 0.82. Conclusion: Six algorithms classified and predicted well on this diabetes dataset, namely LMT, SMO, Logistic, NaiveBayes, RotationForest, and Bagging; all showed high accuracy and predictive value.
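
The indicators named in the abstract (RMSE, RRSE, classification accuracy, true positive rate, and area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' WEKA workflow. The function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration, and RRSE is computed here relative to always predicting the mean of the actual values, which may differ in detail from WEKA's own report.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual labels and predicted probabilities."""
    squared = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def rrse(actual, predicted):
    """Root relative squared error: error relative to always predicting the mean."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    denominator = sum((a - mean_actual) ** 2 for a in actual)
    return math.sqrt(numerator / denominator)


def accuracy(actual, predicted, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == (a == 1) for a, p in zip(actual, predicted))
    return hits / len(actual)


def true_positive_rate(actual, predicted, threshold=0.5):
    """Fraction of positive cases that are predicted positive."""
    positives = [p for a, p in zip(actual, predicted) if a == 1]
    return sum(1 for p in positives if p >= threshold) / len(positives)


def auc(actual, predicted):
    """Area under the ROC curve as the chance that a random positive case
    scores higher than a random negative case (ties count as one half)."""
    pos = [p for a, p in zip(actual, predicted) if a == 1]
    neg = [p for a, p in zip(actual, predicted) if a == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = diabetic, 0 = non-diabetic); not from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1]

if __name__ == "__main__":
    print("RMSE:", rmse(y_true, y_prob))
    print("RRSE:", rrse(y_true, y_prob))
    print("Accuracy:", accuracy(y_true, y_prob))
    print("True positive rate:", true_positive_rate(y_true, y_prob))
    print("AUC:", auc(y_true, y_prob))
```

Lower RMSE and RRSE indicate predicted probabilities closer to the true labels, while higher accuracy, true positive rate, and AUC indicate better discrimination, which is why the abstract ranks algorithms on both groups of indicators.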


