
Research on Entity Recognition of Chinese Obstetrics and Gynecology Electronic Medical Records Based on K-BERT


Journal: 医学信息 (Medical Information) [ISSN: 1006-1959 / CN: 61-1278/R]

Volume: 37
Issue: No. 01, 2024

Pages: 65-71

Column: Medical Data Science

Publication date: 2024-01-01

Article Info

Title: Research on Entity Recognition of Chinese Obstetrics and Gynecology Electronic Medical Records Based on K-BERT

Article ID: 1006-1959(2024)01-0065-07

Author(s): ZHANG You (张由); LI Fang (李舫) (College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China)

Keywords: K-BERT; Bidirectional long short-term memory; Conditional random fields; Obstetrics and gynecology electronic medical records; Named entity recognition

CLC number: TP391.1

DOI: 10.3969/j.issn.1006-1959.2024.01.012

Document code: A

Abstract: When a pre-trained model is used for named entity recognition on Chinese obstetrics and gynecology electronic medical records, BERT lacks medical domain knowledge, which degrades its recognition performance. To address this problem, a named entity recognition model, K-BERT-BiLSTM-CRF, is proposed; it is built on K-BERT, a pre-trained model that incorporates a knowledge graph. The K-BERT pre-trained model produces semantic feature vectors that carry medical background knowledge. A bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF) then extract context-dependent features and address the label offset problem to complete entity recognition. Trained on a dataset of real obstetrics and gynecology electronic medical records, the K-BERT-BiLSTM-CRF model reached an F1 score of 90.04%. The experiments show that, compared with models based on general BERT, the K-BERT-BiLSTM-CRF model recognizes entities in Chinese obstetrics and gynecology electronic medical records more accurately.
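
The role of the CRF layer described in the abstract can be illustrated with a small decoding sketch. This is not the authors' implementation: the label names (`O`, `B-SYM`, `I-SYM`), the emission scores, and the transition penalty are all hypothetical toy values, and the emission scores stand in for what a BiLSTM over K-BERT features might produce. The sketch only shows how Viterbi decoding with transition scores can replace an invalid per-token prediction (an `I-` tag directly after `O`) with a valid sequence.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO label set; the paper's actual tag set is not given on this page.
LABELS = ["O", "B-SYM", "I-SYM"]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the label sequence with the highest total score.

    emissions[t][y]     : score of label y at token t (toy numbers here).
    transitions[(p, y)] : score of moving from label p to label y;
                          pairs that are not listed default to 0.0.
    """
    # best[y] = score of the best path that ends in label y at the current token
    best = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []

    for t in range(1, len(emissions)):
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in labels:
            # Pick the previous label that gives the best score for reaching cur.
            prev = max(labels, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)

    # Follow the backpointers from the best final label to recover the path.
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy emission scores for a three-token sentence (assumed values).
emissions = [
    {"O": 2.0, "B-SYM": 0.5, "I-SYM": 0.1},
    {"O": 0.2, "B-SYM": 1.0, "I-SYM": 1.5},
    {"O": 0.3, "B-SYM": 0.2, "I-SYM": 1.8},
]

# A single assumed transition score that penalizes the invalid move O -> I-SYM.
transitions = {("O", "I-SYM"): -10.0}

# Per-token argmax ignores transitions and can produce an invalid sequence.
greedy = [max(LABELS, key=lambda y: e[y]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, LABELS)

print("per-token argmax:", greedy)
print("Viterbi decoding:", decoded)
```

In a trained model the transition scores are learned together with the rest of the network rather than set by hand; the hand-set penalty here only makes the effect visible on a three-token example.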


