Neural architectures for named entity recognition and relation classification in biomedical and clinical texts

Sahu, Sunil Kumar

dc.contributor.author	Sahu, Sunil Kumar
dc.date.accessioned	2019-07-16T06:47:18Z
dc.date.accessioned	2023-10-20T04:36:53Z
dc.date.available	2019-07-16T06:47:18Z
dc.date.available	2023-10-20T04:36:53Z
dc.date.issued	2018
dc.description	Anand, Ashish	en_US
dc.description.abstract	The growing body of biomedical and clinical text, such as research articles, discharge summaries, electronic health records, and posts by social network users, is a vast source of information. Information extracted from it can be used in several applications, e.g., constructing medical knowledge bases and drug repurposing. Extracting structured information from unstructured text is called information extraction (IE) and is considered a higher-level natural language processing (NLP) task. Shared challenges organized regularly over the last decade for various IE tasks in the biomedical domain have made several standard benchmark datasets publicly available, and their availability has led to continuous development of methods for IE tasks. Most existing methods divide IE into several subtasks, of which named entity recognition (NER) and relation classification (RC) are the two main ones. In each subtask, explicitly designed features are used in machine learning (ML) methods to classify inputs into the correct categories. Although ML methods have been used successfully for many biomedical NER and RC tasks, they still face a few challenges. Their performance depends heavily on the quality of user-designed features. Further, these feature sets need to be adapted whenever the domain or task changes. For instance, a set of morphological features designed for gene entity recognition may not work for drug or disease name recognition, and features based on lexical resources for gene entity recognition may not be suitable for disease name recognition. Other features may require domain-specific resources or NLP tools. Another major challenge is making the whole system reproducible and usable in practice. This is due to the lack of finer details of feature engineering in the public domain.
Recent years have seen renewed interest in representation learning using neural network models. One of the primary motivations for such models is to reduce the effort required for explicit feature engineering. Representation learning is a way to learn a projection of the data that helps a machine learning model make the correct prediction. For instance, in an NER task, a good projection is one that embeds linguistic, orthographic, contextual, and syntactic information about a word in its representation. Similarly, in an RC task, a good projection is one that embeds semantic and syntactic information about the sentence together with the targeted entities. In this thesis, we focus on these two subtasks of IE. Our objective is to use representation learning with reduced explicit feature engineering, to benchmark it against standard approaches, and to analyze the results. Towards this end, we employ several neural network models and analyze their performance on the two subtasks of IE.	en_US
dc.identifier.other	ROLL NO.136101007
dc.identifier.uri	https://gyan.iitg.ac.in/handle/123456789/1275
dc.language.iso	en	en_US
dc.relation.ispartofseries	TH-1780;
dc.subject	COMPUTER SCIENCE AND ENGINEERING	en_US
dc.title	Neural architectures for named entity recognition and relation classification in biomedical and clinical texts	en_US
dc.type	Thesis	en_US

Files

Original bundle

Name: Abstract-TH-1780_136101007.pdf
Size: 65.02 KB
Format: Adobe Portable Document Format
Description: ABSTRACT

Name: TH-1780_136101007.pdf
Size: 1.77 MB
Format: Adobe Portable Document Format
Description: THESIS

License bundle

Name: license.txt
Size: 1.71 KB
Format: Plain Text
Description:

Collections

PhD Theses (Computer Science and Engineering)