PhD Theses (Electronics and Electrical Engineering)

Recent Submissions

  • Item
    Modeling, Optimization and Analysis of Wireless Information and Energy Transfer in Beyond 5G IoT Networks
    (2023) Kumar, Chandan
    This thesis focuses on system modeling, optimization and comprehensive performance analysis of wireless information and energy transfer to Internet-of-Things (IoT) devices. These devices are likely to form a core component of beyond fifth generation (5G) wireless systems in order to support the numerous applications foreseen in such systems. We first investigate whether a base station (BS) with a massive number of antennas can support joint machine-centric communication among the IoT devices and human-centric communication among the mobile terminals. To this end, we derive the downlink spectral efficiency (SE) of the IoT devices with maximum ratio precoding when channel estimates are acquired via the proposed distance-dependent grouping based hybrid pilot assignment strategy. As benchmarks, we also evaluate the SE under non-orthogonal pilot assignment and distance-independent grouping. We show that under channel inversion based power control at the BS, the proposed pilot assignment and channel estimation strategy yields the highest sum SE and can serve the largest number of IoT devices when compared to the benchmarks. The corresponding performance under max-min power allocation is also presented.
  • Item
    Cardiac Parameters Estimation Using Seismocardiographic and Remote Photoplethysmographic Signals
    (2025) Das, Mousumi
    Cardiovascular diseases (CVDs) are major risk factors contributing to the increasing death rate. Effective and regular monitoring of cardiac activity is useful for early detection and clinical management of CVDs. Many vital parameters, such as heart rate (HR), heart rate variability (HRV), blood pressure (BP), oxygen saturation (SpO2), and respiratory rate (BR), provide insight into cardiac health and help in diagnosing and treating life-threatening diseases. In this study, two emerging cardiac modalities, seismocardiography (SCG) and remote photoplethysmography (rPPG), are considered for the estimation of cardiac vital parameters. SCG is a non-invasive technique that captures the chest wall vibrations induced by cardiac mechanical activity. The acquired SCG signal needs precise delineation and feature extraction prior to the measurement of human vital parameters. The first part of the thesis involves the detection of the prominent peaks of SCG cycles and investigates their possible clinical applications. A data-adaptive modified variational mode decomposition (MVMD) method is combined with simple decision rules to extract two fiducial points, the AO and post-AC (pAC) peaks. These points are then used to derive systolic blood pressure (SBP), diastolic blood pressure (DBP) and HRV parameters. Another application is explored, which uses these feature points along with the demographic information of the volunteers to identify ventricular depolarization events using a deep feedforward neural network (DFN). The proposed methods are evaluated on the publicly available CEBS database (in the PhysioNet archive) and on in-house recordings created using a small electronic circuit board consisting of a 3D MEMS-based accelerometer, a pre-amplifier, and a filter.
  • Item
    Experimental Investigation of Energy Consumption and Performance of an Electric Vehicle Powertrain on Different Laboratory and Real-World Driving Cycles and Drive Modes
    (2021) Lairenlakpam, Robindro
    Over the last few years, the prospect of a rapid rise in global temperature and air pollution has raised concerns about global warming and health, and has created the need to reduce the use of fossil fuels and the associated emissions. Vehicle emissions and greenhouse gases (GHGs) can harm health (for example, by causing cardiopulmonary diseases), damage the environment, and contribute to global warming. Road transport is one of the major contributors to the growing air quality problems in Indian cities. There is a need for greener or alternative vehicle technologies to address the vehicle emissions issue. Researchers are therefore focused on developing alternative technologies such as hybrid electric, electric, fuel cell, and plug-in vehicles. The necessity of Electric Vehicles (EVs) has thus been recognized because of their substantial advantages over conventional Internal Combustion Engine (ICE) vehicles.
  • Item
    Exploration of Source and Filter Information for the Detection of Replay Attacks in Speaker Verification
    (2025) Jelil, Sarfaraz
    Automatic speaker verification (ASV) is defined as the task of accepting or rejecting an identity claim of a speaker based on their speech. ASV systems are prone to different kinds of spoofing attacks where the system is presented with a spoofed speech signal instead of a speech signal from a genuine speaker. These spoofing attacks can be a serious threat to an ASV system as they can increase false acceptance rates and negatively impact the performance of the system. Hence, it becomes essential to detect these spoofing attacks and protect the security of an ASV system. This thesis deals with a specific kind of spoofing attack called the replay attack. A replay attack is performed by secretly recording the speech of a genuine user of an ASV system and playing it back to the system to obtain unauthorized access.
  • Item
    Hand Gesture Detection and Recognition for Gesture-based Patient Rehabilitation and Assistance Systems
    (2024) Dutta, H Pallab Jyoti
    Hand gestures serve as a natural and widely used means of human interaction, playing a crucial role in establishing seamless human-computer interactions. However, to facilitate effortless interactions, precise decoding of the hand gestures is essential, which is hindered by background clutter, variations in illumination, the presence of skin areas such as hands or faces in the vicinity, occlusion, and variable hand shapes and sizes. Much work has been done in the literature to address these issues and recognize the gestures, but generalization is yet to be achieved. This dissertation aims to address these concerns by developing a robust method for hand gesture recognition and applying it to create interfaces that enable human-computer interaction tailored to specific human needs. A method is proposed to segment the hand region in an image, removing the irrelevant information from the background. For this, two segmentation models are proposed; one model utilizes spatial and channel attention and the other benefits from combining a convolutional neural network and a linearized transformer unit. Moreover, a novel loss function optimizes the models to resolve class imbalance, ensure boundary smoothness, and retain the hand’s shape. These segmented results are further utilized to obtain hand gesture recognition results in a two-stage arrangement. A novel adaptive kernel channel attention layer assists the recognition network in achieving accurate results. The recognition accuracies on two benchmark datasets were 93.8% and 98.0%, which highlights the precision of the proposed approach. This two-stage approach is not very suitable for online applications. Therefore, three hand detection methods that localize the hand region and predict the gesture class concurrently are proposed. The first method is an anchor-box-based RetinaNet CBAM hand gesture detection model. The second and third methods are anchor-free and detect hand gestures using a detection transformer and a CenterNet-based model, respectively. The best-performing model, i.e., the second method, achieved recognition rates of 89.6% and 100% on two benchmark datasets. Once a robust detection model is available, it can be used to model gesture-operated interfaces for specific tasks. Hand keypoint detection also plays an important role in these interfaces, and hence, a robust keypoint detection model with a multiscale attention block is proposed. In this work, two interfaces were designed that cater to patients undergoing hand rehabilitation and patients in hospitals communicating with medical staff. The proposed methods underwent thorough qualitative and quantitative analysis, revealing state-of-the-art performance even under challenging conditions. The seamless integration of the hand detection algorithm into the interfaces was also successfully accomplished. Patients using the rehabilitation interface reported noticeable improvements in hand functioning, while those utilizing the communication interface experienced smooth and efficient communication with medical staff. These outcomes underscore the effectiveness of the proposed methods, demonstrating their practical applicability in real-world scenarios.
  • Item
    Modeling and Analysis of Co-Located and Cell-Free Massive MIMO Enabled Underlay Spectrum Access Networks
    (2025) Pothan, Enukonda Venkata
    The design of wireless communication networks continues to evolve in order to meet the ever-increasing requirements for higher spectral efficiency (SE) and the need to support diverse technologies, services and applications. The data rates and the number of users that can be served depend heavily on the wireless spectrum that is available. However, below 6 GHz, there is not enough free spectrum available for new technologies. This has motivated spectrum regulatory bodies around the globe, for example, the Federal Communications Commission (FCC) in the USA, to actively consider spectrum sharing as a potential solution to enable efficient reuse of the already allocated spectrum. In this thesis, we focus on concurrent spectrum sharing, where the cognitive user (CU), also referred to as the unlicensed user, may transmit simultaneously with the incumbent primary user (PU), who owns the spectrum, provided it satisfies an interference constraint. To prevent an adverse impact of spectrum sharing on the primary performance, the CU must adapt its transmit power, which, in turn, limits its SE measured in bits/s/Hz.
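The power adaptation described in this abstract can be illustrated with a minimal sketch. It assumes a textbook single-antenna underlay model that is not taken from the thesis: the CU caps its transmit power so that the interference received at the PU stays below a threshold, and its SE follows from the Shannon formula with PU-to-CU interference ignored. The function names and parameters (`underlay_tx_power`, `spectral_efficiency`, `i_threshold`, and so on) are hypothetical.

```python
import math


def underlay_tx_power(p_max: float, i_threshold: float, gain_to_primary: float) -> float:
    """Largest CU transmit power that keeps interference at the PU within i_threshold.

    Assumed constraint (illustrative only): power * gain_to_primary <= i_threshold.
    """
    return min(p_max, i_threshold / gain_to_primary)


def spectral_efficiency(tx_power: float, gain_to_own_rx: float, noise_power: float) -> float:
    """Shannon spectral efficiency in bits/s/Hz for the assumed interference-free link."""
    return math.log2(1.0 + tx_power * gain_to_own_rx / noise_power)


# A stronger channel towards the PU forces a lower CU power and hence a lower SE.
power = underlay_tx_power(p_max=1.0, i_threshold=0.1, gain_to_primary=0.5)
se = spectral_efficiency(power, gain_to_own_rx=1.0, noise_power=0.01)
```

In this simplified model the interference constraint, not the power budget, determines the SE whenever `i_threshold / gain_to_primary` is smaller than `p_max`.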
  • Item
    Modelling, Analysis and Optimization of Non-orthogonal Multiple Access in Next Generation Wireless Communication Systems
    (2024) Pawar, Aditya Raosaheb
    In this thesis, we present a detailed performance analysis and optimization of non-orthogonal multiple access (NOMA) systems aided by massive multiple input multiple output (MIMO) and intelligent reflecting surface (IRS). To begin with, we analyze the uplink of a massive MIMO-NOMA system and deduce new lower bounds on the achievable spectral efficiency (SE) based on zero-forcing (ZF) decoding at the base station (BS). User grouping and power allocation are employed to regulate the performance of users in a NOMA system. To cancel the inter-group interference, the ZF decoder is designed as a function of channel estimates acquired based on two low overhead channel estimation schemes, namely, Scheme-I and Scheme-S. Further, to ensure uniform quality-of-service to all users, we obtain the max-min power control coefficients which maximize the minimum achievable SE.
  • Item
    VLSI Implementation of Training Accelerators for Decision Tree Algorithm
    (2023) Choudhury, Rituparna
    This thesis presents the hardware realization of three decision tree (DT) algorithms, namely, Two Means Decision Tree (TMDT), Hybrid Decision Tree (HDT), and Perceptron Decision Tree (PDT). The TMDT algorithm classifies the data based on two-means clustering in each node. This reduces the computation to a great extent and thus results in efficient hardware. First, the offline implementation of TMDT training is performed, where the entire data is loaded into on-chip memory on a Field Programmable Gate Array (FPGA). In this work, the TMDT algorithm is implemented in serial and mixed mode for binary classification only. However, the serial processing of blocks limits the speed of execution. So, next, a mixed implementation of TMDT training is proposed on FPGA. In this work, some blocks are implemented in a pipelined manner and some blocks in parallel to minimize the latency. The memory access latency increased to a great extent due to the large on-chip memory. Also, the huge memory consumption limited the training data size. So, the TMDT is modified and a batch-mode training process is implemented for multi-class classification on FPGA in the next chapter. In this implementation, the training data is divided into batches and one batch at a time is loaded into the chip memory to train the DT. This batch-mode implementation removes any constraint on the training data size. The accuracy of TMDT can be enhanced by implementing hybrid nodes. So, next, HDT and its training implementation on FPGA are proposed. This DT is a hybrid combination of split nodes hosting mean-dependent and axis-aligned split decision functions. The mean-dependent nodes are similar to the TMDT nodes and the axis-aligned split function parameters are learned by reducing the number of impurity computations. The HDT is observed to perform better than TMDT. Although there is a small increase in training latency, the performance measures (accuracy and F1-score) were observed to improve. However, the HDT performance was still inferior for small-sized datasets. To further improve the accuracy on small-sized datasets, PDT training is implemented on FPGA. In this DT, each split node hosts a single output perceptron (no hidden layer). For implementing a perceptron efficiently in hardware, a perceptron architecture consisting of Offset Binary Coding (OBC) and Coordinate Rotation Digital Computer (CORDIC) is proposed and implemented on both FPGA and Application Specific Integrated Circuits (ASICs). The OBC architecture has been used in the literature for inner product calculation in filters, where multiplication is implemented using shifters and adders. Also, training hardware for PDT is proposed on FPGA. Next, classification hardware for PDT is proposed for bio-medical applications on both FPGA and ASIC. The DT algorithms presented in this thesis have lower complexity compared to the CART, C4.5 or ID3 algorithms. The training of these DT algorithms is implemented on FPGA and found to be more resource-efficient than the existing training accelerators. The serial architecture consumes fewer resources and runs at least 10x faster than the software implementation. The serial implementation of TMDT reduces the resource consumption by at least 8x as compared to the existing training accelerator for CART. The mixed implementation is found to be at least 14x faster than the software implementation. The batch-mode implementation is found to speed up the training by at least 27x as compared to the software implementation. It halves the LUT consumption as compared to previous designs. It also exhibits a 10x reduction in BRAM usage as compared to the training accelerator for CART. The PDT training is accelerated by 34x for the worst-case scenario (largest dataset) as compared to the software implementation. This design almost halves the resource utilization as compared to previous designs and has a 6x saving in BRAM consumption as compared to the CART training accelerator. This hardware achieves a speed-up by a factor of 2 as compared to the software.
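The idea of splitting a node with two-means clustering can be sketched in software as follows. This is a minimal illustration, assuming a plain 2-means loop with a deterministic initialisation; the abstract does not specify the thesis's actual split rule, how class labels enter the split, or how the computation is mapped to hardware, and the function name `two_means_split` is hypothetical.

```python
import numpy as np


def two_means_split(X, n_iter: int = 10):
    """Split the rows of X into two groups using a basic 2-means loop.

    Returns the two centroids and a boolean mask (True = sample goes to the
    second child). Initialisation is an assumption: the first sample and the
    sample farthest from it.
    """
    X = np.asarray(X, dtype=float)
    c0 = X[0]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    for _ in range(n_iter):
        # Assign each sample to the nearer centroid.
        go_right = np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)
        if go_right.all() or not go_right.any():
            break  # degenerate split, keep the current centroids
        # Recompute each centroid as the mean of its assigned samples.
        c0 = X[~go_right].mean(axis=0)
        c1 = X[go_right].mean(axis=0)
    go_right = np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)
    return c0, c1, go_right


# Two well-separated groups end up in different child nodes.
samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
left_centroid, right_centroid, mask = two_means_split(samples)
```

A node split of this kind needs only distance computations and means, with no impurity evaluation over candidate thresholds, which is consistent with the reduced computation the abstract attributes to TMDT.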
  • Item
    (An) Electrocardiogram Based Secure Person Adaptive Cardiovascular Disease Diagnosis System
    (2024) Jyotishi, Debasish
    The electrocardiogram (ECG) signal is the primary non-invasive diagnostic tool used by cardiologists for diagnosing cardiovascular diseases (CVDs). Timely identification of CVDs is critical for effective treatment and prevention of fatalities. The automated diagnosis of CVDs is crucial in assisting cardiologists and facilitating remote monitoring, contributing to the advancement of AI-based healthcare. However, a significant challenge lies in the inter-individual variability of morphological characteristics in the ECG signal, necessitating the development of a person-adaptive CVD diagnosis system. Additionally, automated diagnosis brings with it security and privacy considerations concerning wearable healthcare devices and the handling of sensitive medical data. This thesis work aims to learn deep temporal and spatio-temporal representations from multi-lead ECG signals with the overarching goal of developing an automated CVD diagnosis system and a robust biometric system. Additionally, a novel method is proposed to effectively leverage learned person-specific representations for the development of a person-adaptive CVD diagnosis system. In our first work, an ECG based person identification and verification system is developed by learning the underlying temporal representation of the ECG signal. A biometric system based on a long short-term memory (LSTM) network is designed for explicitly learning the temporal representation. Further, a novel attention based hierarchical LSTM (HLSTM) model is designed to learn the temporal variation of the ECG signal at different levels of abstraction. Empirical findings demonstrate substantial performance enhancements from using multi-scale temporal information. In the second study, the multi-scale temporal dynamics learning network (MSTDLNet) is introduced to concurrently capture the local morphological representation and multi-scale temporal dynamics in the ECG signals for biometric applications. The experimental findings affirm that the multi-scale temporal representation learned by MSTDLNet yields robust and persistent performance, significantly improving outcomes in multi-session analyses. In the third work, an automated CVD diagnosis system using multi-lead ECG signals is proposed. Specifically, an attentive spatio-temporal learning network (ASTLNet) is developed to learn a better diagnostic representation by exploiting the concurrent spatio-temporal variation of a multi-lead ECG signal.
  • Item
    Highly Compact and Low Mutual Coupling MIMO Antennas
    (2023) Mishra, Mohit
    Multiple-input multiple-output (MIMO) is an efficient technology that can meet the demands of modern communication systems, such as a higher data rate and a lower probability of error in data transmission. However, there are issues associated with the deployment of multiple antennas on both the transmitter and receiver sides. One major challenge is increased electromagnetic interaction between antenna elements in a compact MIMO system, leading to increased mutual coupling (MC) and antenna correlation coefficient (ACC). The increase in the MC and the ACC between closely packed antenna elements of a compact MIMO antenna degrades the performance of a MIMO communication system. Therefore, this thesis begins with an introduction emphasizing the relevance of MIMO communication systems, followed by a discussion of the cause of the MC and its impact on the performance of MIMO communications. A detailed study of the existing MIMO antenna design methodologies is carried out. It includes electromagnetic band-gap (EBG) structures, defected ground structures (DGS), the neutralization line (NL) technique, array decoupling surfaces (ADSs), and substrate-integrated waveguide-based MIMO antennas. Compared to NLs for MC reduction, EBGs and isolators require much more space between antennas to accommodate them. Two NL-based MIMO antenna designs are presented in this thesis. The first is a dual-band two-port MIMO antenna, while the second is a six-port MIMO antenna. The mutual coupling between antenna ports in the two-port and six-port planar MIMO antennas is reduced by connecting adjacent antennas through NLs. ADSs or superstrates employed in isolation enhancement are generally placed at a certain height above the antenna array due to the associated operating principle. Consequently, an additional layer is added, which increases the MIMO antenna profile. Furthermore, single negative (SNG) meta-grid lines (MGLs) are proposed for mutual coupling reduction. The SNG MGLs can reduce the MC in the two-port printed monopole MIMO antennas and the two-port MIMO dielectric resonator antennas (DRAs) with minor structural modifications. This MC reduction technique requires neither extra space between antenna elements, unlike the EBGs and the isolators, nor the installation of additional layers such as the ADS, enabling a compact design of two-port MIMO antennas. It is noted that the proper placement of a double-sided copper-clad substrate between two half-split cylindrical dielectric resonator antennas (CDRAs) can reduce the MC significantly and provide a space-efficient approach for MC reduction in MIMO DRAs. One such two-port half-split DRA with overall size miniaturization is presented in chapter 5. The substrate-integrated waveguide (SIW) cavity-based antennas possess a noteworthy feature of size miniaturization by employing fraction mode SIWs (half-mode, quarter-mode, etc.). In addition, SIW cavities are known for their low profile, ease of integration, and self-consistent electrical shielding. Hence, SIW-based MIMO antennas do not require extra decoupling units when designed carefully and can achieve MC below -15 dB with size compactness. One such sector-shaped compact π/8 partial SIW cavity antenna from the TM220 diagonal mode of the SIW cavity is designed. This design technique offers 61% size miniaturization compared to the SIW rectangular cavity in its complete mode configuration. The proposed radiator is used to design an 8-port SIW-based compact MIMO antenna in chapter 6. Furthermore, the performance of the above-mentioned multi-port antennas for MIMO communications is described in terms of channel capacity loss, diversity measure, and sum rate loss. This thesis thoroughly investigates all the MIMO antenna designs along with their operation and utility.
  • Item
    Exploration of Novel Approaches for Offline Writer Identification Using Handwritten Words
    (2024) Kumar, Vineet
    This thesis presents innovative approaches for offline handwritten word image author identification, leveraging various deep learning techniques. The first work employs feature maps from pre-trained CNN layers to capture writer-specific characteristics. Key-point regions are first detected using the SIFT algorithm across different abstractions like characters and their combinations. These regions are processed through a CNN, producing feature maps that are then represented using a modified HOG feature descriptor. A unique contribution lies in extracting additional cues from these feature maps through a saliency measure derived using Sparse Principal Component Analysis (SPCA). The saliency scores are integrated with HOG features to create customized descriptors, which are then classified using SVMs to determine the identity of the writer.
  • Item
    Design of Cryptographic Primitives for Wireless Communication and Blockchain Mining
    (2024) Goswami, Sushree Sila P
    The rising reliance on the internet across various sectors has heightened the importance of security measures, given the potential threat posed by cyber attackers who could corrupt or misuse data. This thesis explores the implementation of diverse cryptographic algorithms (DES, RSA, AES, ECC, and ECCDH) on a Field Programmable Gate Array (FPGA). In secure wireless communications, stream ciphers are preferred for the simplicity of their hardware implementation. The design of stream ciphers generally involves using a pseudorandom number generator to produce a keystream, which masks the plaintext through an XOR operation, resulting in ciphertext. This research presents the realization of these designs using the Verilog Hardware Description Language and their implementation on FPGA. Experimental results indicate that a modified SNOW 2.0 architecture is 13% more resource-efficient and 19% more efficient overall compared to the traditional SNOW 2.0, and 104% more efficient than existing architectures. Security is paramount in electronic communication, particularly in wireless networks like LTE, where cryptographic algorithms are vital for protecting sensitive data. While software implementations are straightforward, they often lack the speed required for real-time communication devices, necessitating hardware implementations of cryptographic processors. This thesis introduces a novel SNOW3G crypto processor for 4G LTE security, optimized for area, power, and efficiency. Implemented on the Zynq ZC702 FPGA, this design uses only 0.31% of the available area and achieves significant efficiency and low power consumption, making it suitable for mobile devices.
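The keystream-and-XOR structure of a stream cipher mentioned in this abstract can be sketched as below. This is a toy illustration only: the generator is a textbook 16-bit Fibonacci LFSR chosen for brevity, not SNOW 2.0 or SNOW3G, and it is not secure. The names `lfsr_keystream` and `xor_cipher` and the seed value are hypothetical.

```python
def lfsr_keystream(seed: int):
    """Yield keystream bytes from a toy 16-bit Fibonacci LFSR (illustrative, insecure)."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        byte = 0
        for _ in range(8):
            # Feedback polynomial x^16 + x^14 + x^13 + x^11 + 1.
            bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            state = (state >> 1) | (bit << 15)
            byte = (byte << 1) | (state & 1)
        yield byte


def xor_cipher(data: bytes, seed: int) -> bytes:
    """Mask data with the keystream; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, lfsr_keystream(seed)))


ciphertext = xor_cipher(b"hello", seed=0xACE1)
recovered = xor_cipher(ciphertext, seed=0xACE1)
```

Because XOR is its own inverse, applying the same keystream twice returns the plaintext, which is why the transmitter and receiver can share one generator design.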
  • Item
    Design and Implementation of Continuous Flow FFT Processors for OFDM in Wireless GigaHertz Standards
    (2023) Agarwal, Sumit
    The throughput requirement of the latest OFDM based IEEE 802.11ay WLAN standard is between 20 and 40 Gbps. Also, the FFT processor needed for OFDM must work in continuous mode for real-time communication. The number of FFT points can be variable. In this thesis, we propose a continuous flow architecture for a 512-point FFT to meet this requirement at about 28 Gbps. The number of points is kept fixed here to illustrate the essential features of the design. It can be varied, if desired, with minimal architectural modifications. Architectures to meet the throughput requirement (10 Gbps) of the earlier WLAN standard IEEE 802.11ad have been reported in the literature. The proposed architecture achieves more than double this throughput at 28 Gbps with similar chip area and clock as the best existing 10 Gbps designs. This is made possible through a specialized design for OFDM, unlike the earlier FFT chips, which were designed for general purpose FFT. The proposed architecture uses two radix-16 stages and one radix-2 stage to meet the high throughput requirement. Standard continuous flow (CF) FFT designs use two memories. The proposed design exploits the smaller wordlength of 4 bits (for 64 QAM) of OFDM to introduce an additional smaller input memory and a simpler processing element (PE) for the input stage. Combined with the existing two memories, there are now three memories for the three-stage FFT. Thus, this design allows memories to assume dedicated roles for each stage. Compared to the existing practice of switching memories, dedicated memories need a novel addressing scheme to maintain CF, as data is replaced in the same memory rather than by switching between memories.
  • Item
    Design of Low Power VLSI Architectures for Machine Learning Based Wearable Healthcare Devices
    (2023) Janveja, Meenali
    According to the World Health Organization (WHO), cardiovascular diseases cause approximately 17.9 million fatalities yearly, which is estimated to be 31% of global mortality. An electrocardiogram (ECG) is a biosignal that provides information on the electrical activity of the patient’s heart. ECG enables the diagnosis of various cardiac abnormalities, from acute coronary syndrome to cardiac arrhythmias. Therefore, ECG monitoring in daily life is necessary for early diagnosis of heart disease. Hardware and software developments have led to machine learning enabled wearable healthcare devices, such as smartwatches and chest patches, which can continuously and easily monitor cardiac functioning. The wearable devices provide critical alerts for events that require prompt medical attention or hospitalization, making them highly efficient and practical. A conventional wearable device has three primary modules. The first module is the sensors and analog front end, responsible for acquiring the ECG signals and converting them to digital samples. The second module consists of an ECG co-processor, incorporating a feature extraction block and a machine learning-based classifier responsible for ECG signal analysis and classification of cardiovascular diseases. The final module comprises data compression and transmitter blocks, which transmit ECG data and the classifier output to the cloud servers. In wearable devices, battery life is critical because most devices monitor ECG continuously. Further, these devices should be small and easy to use. Therefore, area and power-optimized algorithms and their VLSI architectures are required for continuous monitoring of ECG on wearable devices. Thus, we present optimized ECG signal processing algorithms and their low-power and resource-efficient VLSI architectures for the detection of cardiovascular diseases, such as cardiac arrhythmia and myocardial infarction, on wearable devices.
  • Item
    Automated Diagnosis of Heart Valve Diseases from Phonocardiogram Signals using Deep Learning
    (2023) Das, Samarjeet
    Heart valve diseases (HVDs) are the primary causes of mortality in developing and underdeveloped countries. Early detection of HVDs is essential to avoid lethal heart diseases due to the disease’s progression. The phonocardiogram (PCG) signal provides a non-invasive and cost-effective tool that helps with the preliminary diagnosis of HVDs. However, raw PCG signals are often susceptible to noise and artifacts, which degrade the signal quality and make it challenging to diagnose HVDs manually. Furthermore, the wide variability in PCG morphologies due to HVDs makes manual examination subjective and prone to human error. To address the above challenges, this dissertation focuses on developing automated deep-learning methods for diagnosing HVDs.
  • Item
    Development of Kalman Filter based Algorithms for Fringe Pattern Analysis
    (2024) Sharma, Shikha
    The purpose of fringe pattern analysis is to retrieve the phase from the fringe pattern. Retrieving the phase from the fringe pattern is essential for deriving the object information. Therefore, the demand for phase information has promoted the development of fringe analysis techniques. Spatial fringe analysis techniques typically involve different operations such as fringe denoising, fringe normalization, and fringe pattern demodulation for phase estimation. In some cases, phase aberration compensation is also required. The thesis presents a number of spatial fringe processing algorithms based on the application of the Kalman filter.
  • Item
    Analysis of Speech and Music Content for Movie Genre Classification
    (2023) Bhattacherjee, Mrinmoy
    Movies are a popular mode of entertainment around the world. The consistent rise in the production and consumption of movies demands more efficient automatic movie content analysis applications. Movie Genre Classification (MGC) is vital for underage censorship, search, retrieval, and targeted publicity. Current trends in MGC literature indicate a focus on short trailers instead of full movies and a multimodal approach. The audio modality is generally used only as an auxiliary channel. However, due to its rich genre-specific information, the audio signal deserves a dedicated study in the current context. Hence, this thesis aims to perform only audio-specific MGC. The thesis has four principal contributions. First, spectral peak tracking-based magnitude spectrum features are proposed for isolated speech and music classification. Second, the underexplored phase component of the audio signals is utilized for discriminating speech and music. The third contribution involves using harmonic-percussive source-separated features and classifiers in the multi-task learning framework for identifying speech overlapped with music. Finally, the above proposals are employed for the MGC task. The spectral peak tracking-based method performs better than the other proposals and the baselines. Specific combinations of all the proposed and baseline features provide the overall best performance, even in the cross-dataset scenario. The thesis work can be extended in the future by analyzing the individual constituents of speech and music for a more nuanced representation of movie genres.
  • Item
    Automatic Dialect Identification in Ao, a Low Resource Language
    (2023) Tzudir, Moakala
    Dialect Identification (DID) is a significant research problem widely explored in major languages like Arabic, Chinese, and Spanish. DID can serve as a frontend for many applications like Automatic Speech Recognition (ASR) that may require special dialect-specific enhancements for improved performance. This thesis proposes an automatic DID system for Ao, an under-resourced language of India. Ao is a Tibeto-Burman language spoken in Nagaland. It is a tonal language with three lexical tones: high, mid, and low. Chungli, Mongsen, and Changki are the three dialects of Ao that differ in their respective tone assignment on lexical words. Four principal contributions are made in this thesis. The first contribution of this thesis is creating a manually collected and annotated novel speech dataset to foster research on the Ao language. The second contribution of the thesis is a detailed acoustic study of the unexplored tone dynamics of the dialects of Ao. Based on the analysis, a tonal feature ($F_0$) to capture the dialect-specific tone information is proposed. The DID performance improves when the proposed tonal feature is combined with other spectral features. As the third contribution, this thesis explores three excitation source features in the DID task. The source features studied are Residual Mel Frequency Cepstral Coefficient (RMFCC), Integrated Linear Prediction Residual Log Mel Spectrogram (ILPR-LMS), and Linear Prediction (LP)-gammatonegram. A notable performance improvement is observed when the source information is combined with the vocal tract information. The fourth contribution of this thesis is the exploration of prosody-related characteristics of speech signals. The prosodic features are observed to provide significant performance improvements in classifying the dialects of Ao. The thesis work is concluded by combining all the proposed approaches to build an efficient DID system for Ao. 
    Among the many hurdles in studying under-resourced languages like Ao, the scarcity of data is the most prominent. Nevertheless, the contributions of this thesis may bridge some of those gaps and spur future research in this direction.
  • Item
    Story Segmentation and Retrieval of News Videos in a Multi-modal Framework
    (2024) Haloi, Pranabjyoti
    Shot segmentation, categorization, indexing, and news story formation are the primary steps in building an efficient and well-sorted video storage and retrieval system. News channels have evolved into one of the primary sources of information. However, with the recent increase in the number of news channels, a plethora of news content is available on air, and it has become difficult to store and retrieve news videos effectively. News videos also include commercials, which carry considerably less information. These commercials must be filtered out so that the remaining news video can be segmented meaningfully. Segmentation of news videos is a crucial step for efficient storage and categorization of the videos. The segmented stories also make it easier to find and retrieve the desired news. In this work, we developed different algorithms for shot segmentation, categorization, indexing, and retrieval of news videos. Our methods are independent of the differing temporal and spatial structures of various news channels and require minimal manual input.
  • Item
    Graph based Classification Techniques for Pig Breed Identification from Hand-crafted Visual Muzzle Descriptors
    (2023) Chakraborty, Shoubhik
    Breed classification of pigs based on muzzle images has been attempted in this thesis. Limited, noisy, heterogeneous visual data stemming from MUZZLE images taken from pigs belonging to different breeds pose many challenges, not just from the point of view of identifying and isolating those features and statistics which are discriminatory in nature, but also from the point of view of constructing a suitable breed-centric model (aided by an inferencing mechanism) which is robust and stable. The work in this light has three primary contributions: (1) designing and selecting a set of handcrafted colour and texture based visual descriptors which are breed-discriminatory; (2) devising a feature-specific siphoning policy and model for segregating breeds serially; (3) using spanning trees in DUAL MODE (MIN-tree and MAX-tree forms) for binding breed-specific features and devising a NOVEL test-point INDUCTION procedure for producing an OUTLIER score that indicates whether the point is in the INTERIOR or EXTERIOR of the breed-cluster. Given the diversity of data on hand and the limited training set available to build the model, CROSS-testing results were very promising: DUROC (93.85%), GHUNGROO (97.48%), HAMPSHIRE (94.27%) and YORKSHIRE (100%).