Neural architectures for named entity recognition and relation classification in biomedical and clinical texts

dc.contributor.author: Sahu, Sunil Kumar
dc.date.accessioned: 2019-07-16T06:47:18Z
dc.date.accessioned: 2023-10-20T04:36:53Z
dc.date.available: 2019-07-16T06:47:18Z
dc.date.available: 2023-10-20T04:36:53Z
dc.date.issued: 2018
dc.description: Supervisor: Ashish Anand (en_US)
dc.description.abstract: The growing body of biomedical and clinical texts, such as research articles, discharge summaries, electronic health records and texts created by social network users, is a vast source of information. The extracted information can be used for several applications, e.g., construction of medical knowledge bases and drug repurposing. Extracting structured information from unstructured text is called information extraction (IE) and is considered a higher-level natural language processing (NLP) task. Shared challenges organized regularly over the last decade for various information extraction tasks in the biomedical domain have made several standard benchmark datasets publicly available. The availability of these benchmark datasets has led to the continuous development of methods for information extraction tasks. The majority of existing methods divide IE tasks into several subtasks. Named entity recognition (NER) and relation classification (RC) are the two main subtasks. In each subtask, explicitly designed features are used in machine learning (ML) methods for classification into the correct categories. Although ML methods have been used successfully for many biomedical NER and RC tasks, they still face a few challenges. The performance of such methods is highly dependent on the quality of user-designed features. Further, these feature sets need to be adapted whenever the domain or task changes. For instance, a set of morphological features designed for gene entity recognition may not work for drug or disease name recognition, and features based on lexical resources for gene entity recognition may not be suitable for disease name recognition. Other features may require domain-specific resources or NLP tools. Another major challenge is making the whole system reproducible and usable in practice, because the finer details of feature engineering are often not publicly available.
Recent years have seen renewed interest in representation learning using neural network models. One of the primary motivations of such models is to reduce the effort required for explicit feature engineering. Representation learning is a way to learn a projection of the data that helps a machine learning model make the correct prediction. For instance, in an NER task, a good projection is one that embeds the linguistic, orthographic, contextual and syntactic information of a word in its representation. Similarly, in an RC task, a good projection is one that embeds semantic and syntactic information about the sentence together with the targeted entities. In this thesis, we focus on these two subtasks of IE. Our objective is to use representation learning with reduced explicit feature engineering, to benchmark it against standard approaches, and to analyze the results. Towards this end, we employ several neural network models and analyze their performance on the two subtasks of IE. (en_US)
dc.identifier.other: ROLL NO.136101007
dc.identifier.uri: http://172.17.1.107:4000/handle/123456789/1275
dc.language.iso: en (en_US)
dc.relation.ispartofseries: TH-1780;
dc.subject: COMPUTER SCIENCE AND ENGINEERING (en_US)
dc.title: Neural architectures for named entity recognition and relation classification in biomedical and clinical texts (en_US)
dc.type: Thesis (en_US)
Files
Original bundle
Name:
Abstract-TH-1780_136101007.pdf
Size:
65.02 KB
Format:
Adobe Portable Document Format
Description:
ABSTRACT
Name:
TH-1780_136101007.pdf
Size:
1.77 MB
Format:
Adobe Portable Document Format
Description:
THESIS
License bundle
Name:
license.txt
Size:
1.71 KB
Format:
Plain Text
Description: