PhD Theses (Computer Science and Engineering)

Browse

Recent Submissions

Now showing 1 - 20 of 109
  • Item
    Content and Coherence Based Strategies for Optimizing Refreshes in Volatile Last Level Caches
    (2022) Manohar, Sheel Sindhu
    With each process generation, Moore’s law offers us an exponential growth in the transistor budget on the chip. Technically, these extra transistors were used to improve processor architecture speed by adding more complicated and simple pipelines and better arithmetic and floating-point units. The futher advances included multi-core systems which demanded larger on-chip caches to support the data demands. Larger last-level caches are deployed across the chip to meet the increasing need for higher cache capacity due to included CMPs in processing cores. LLCs play an important function in the cache hierarchy by giving necessary data to hungry CMPs. SRAMs are not scalable and require advancements in power, performance, and scalability. In order to deploy massive LLCs, researchers are focusing on the construction of caches using alternative technologies that have advantages over traditional SRAM. High scalability, lower leakage power, and higher capacity in the same area footprint as SRAM are among the benefits of these technologies. However, we must investigate the best of these alternatives because they are not without flaws.
  • Item
    Multimodal Attention Variants for Visual Question Answering
    (2023) Mishra, Aakansha
    Visual Question Answering (VQA) is an exciting field of research that involves answering natural language questions asked about an image. This multimodal task requires models to understand the syntax and semantics of the question, interact with the relevant objects in the image, and infer the answer using both image and text semantics. Due to its complex behavior, VQA has gained considerable attention from both vision and natural language research community.
  • Item
    Enhancing Endurance of NVMs by Coarse-To-Fine Grained Write Reduction and Intra-line Wear Leveling
    (2023) Nath, Arijit
    The unprecedented development in the processing speed of the Chip Multi-Processor (CMP) and the rise of modern data-intensive applications impose high pressure on the memory subsystem. It significantly increases the main memory footprint and necessitates designing of energy-efficient and high capacity main memory. Unfortunately, the traditional memory systems, built predominantly using DRAM are not scalable to the low nanometer regime. At this need of the hour, the Emerging Non-Volatile Memories (NVMs) like PCM, STT-RAM, ReRAM offer fascinating features like high density and low leakage power that are useful for building high capacity and energy-efficient memory systems. However, NVMs have asymmetric read/write operations, where writes are costly in terms of latency and energy. Also, frequent write operations to the NVM cells tend to wear out the memory cells, leading to a shortened memory lifetime. Furthermore, NVMs retain data even after the system is powered down. Hence, an attacker having physical access to the NVM DIMM can easily stream out the sensitive data stored in the NVM. Researchers have proposed encryption techniques to protect the sensitive NVM content. However, encryption algorithms lead to enormous bit-flips when the encrypted data is written in the NVM arrays. Hence, the lifetime issue of the NVM devices is further complicated by encryption-induced bit-flip spikes.
  • Item
    Design and Testing of Digital VLSI Circuits using Approximate Computing
    (2023) Jena, Sisir Kumar
    Several studies on the applications of Recognition, Mining, and Synthesis (RMS) have been undertaken in recent years. The tasks executed by these applications don’t require a golden answer or an outstanding numerical result. Instead, they must deliver products that are acceptable or sufficient in quality. These workloads have inherent application resilience or the capacity to deliver acceptable results even if a significant portion of their computations are executed in an imprecise or approximate manner. Intrinsic application resilience adds a whole new level to the optimization of computing platforms. However, the belief that every computation must be conducted with the same stringent idea of accuracy continues to govern the design of computing systems. With unrelenting demand for computing performance on one side and the power requirement from technology scaling on the other, it’s essential to delve into a new source of efficiency. Approximate Computing (AxC) is a new design method that takes advantage of the flexibility given by intrinsic application resilience to optimise hardware or software implementations that are more energy or performance efficient. Several AxC techniques have been effectively developed for system architecture, software, storage elements, arithmetic circuits, and simulation in the last decade. In this thesis, we focus on Approximate Arithmetic Circuits, particularly Approximate Adder, which are the result of applying AxC techniques at the hardware level, and Approximate Testing, which is the process of approximating the conventional test procedure.
  • Item
    Cost-effective Video streaming for Internet of Connected Vehicles using Heterogeneous Networks
    (2023) Chowdhury, Debanjan Roy
    "Internet of Connected Vehicles (IoCV) comprises smart vehicles which communicate among themselves and are connected to the Internet through static infrastructure nodes. Infrastructure nodes may use heterogeneous network technologies like cellular networks, Wifi networks, or Dedicated Short Range Communication (DSRC) networks. Among these networks, cellular networks have limited resources and impose access costs. Therefore, reducing the number of simultaneous cellular connections in an IoCV is a requirement. Smart vehicles of IoCV need persistent Internet connections for various safety messages and infotainment services. Among the infotainment services, video type infotainment services are prevalent. As the major portion of the traffic carried by the Internet core is of video type, reducing video traffic is the need of the hour. To meet the high-quality and low-latency demands for video services, content originators use the services of Content Distribution Networks (CDN). While providing video infotainment services over IoCVs, the objectives of CDN providers are to reduce the traffic volume of the Internet core, reduce service costs, and increase service profitability. To reduce the traffic volume, CDN providers deploy replica servers to serve the demands locally. However, if several vehicles demand the same video content simultaneously, like in the case of a live video streaming, a CDN replica server may get overwhelmed by the number of concurrent and redundant flow requests. As the content demand is homogeneous, the number of one-to-one flows to the CDN replica server can be reduced by bringing the content further closer to an IoCV using edge servers. Using infrastructure nodes as edges incurs deployment costs or carrier partnership costs, whereas using vehicles as edges needs Vehicle-to-Vehicle (V2V) collaborations. To reduce the service cost, the CDN provider needs to minimize the usage of simultaneous cellular connections and maximize V2V collaborations while ensuring service quality and client satisfaction. To generate additional revenues, CDN providers offer multi-tier video services where higher-tier clients pay more for enhanced video quality. However, the dynamic connectivity among vehicles and the intermittent availability of different networks (Wifi, cellular, DSRC) make the above-mentioned tasks extremely challenging. Accordingly, the objective of this dissertation is to find cost-effective solutions for CDN providers to run video infotainment services over IoCVs. This dissertation has four contributions toward the objective. The first contribution is focused on devising a centralized solution for reducing Internet bandwidth usage and the number of simultaneous cellular connections by minimizing the number of edge vehicles. The second contribution has proposed a distributed version of the first contribution, which helps CDN providers to reduce capital expenditure by avoiding setting up expensive servers of high-computing facilities. In the third contribution, a solution is provided for efficient Vehicle-to-Infrastructure (V2I) mode selection to increase CDN providers' profit in heterogeneous network scenarios. The fourth contribution of this dissertation devises an edge selection solution for CDN providers to provide multi-tier streaming services. The experiment results show that in comparison to existing solutions, the proposed solutions are the most cost-effective for CDN providers."
  • Item
    Quality of Experience for Dynamic Adaptive Streaming over HTTP in Mobile Devices: Assessment, Modeling and Improvement
    (2022) Yarnagula, Hema Kumar
    The works in this dissertation assess, model and improve the quality of experience (QoE) for dynamic adaptive streaming over HTTP (DASH) in mobile devices. We first investigate how the bitrate adaptation algorithms in DASH influence the end-user QoE, particularly operating under highly fluctuating bandwidth conditions such as mobile networks. To this end, we perform video QoE assessment (both subjective and objective assessments) of several popular DASH bitrate adaptation algorithms in mobile clients. We also present and formally define a set of application-layer objective QoE metrics for the objective quality assessment with an end goal to capture the severity of the metrics on the end-user viewing experience. From the insights gained from the detailed evaluation of the experimental results, we provide guidelines for designing bitrate adaptation algorithms, particularly for DASH, with an aim to improve end-user QoE in the presence of realistic mobile network scenarios. Second, we propose a parametric multinomial logistic regression (MLR) model for QoE estimation/prediction by merely taking a set of five objective metrics as input. The proposed MLR model predicts both short-term and long-term QoE scores that closely resemble the end-user viewing experience for a DASH session. Third, We propose an adaptive segment-aware Kpush algorithm for DASH/2 with the objective of improving end-user QoE while addressing the request message overhead (a.k.a. request explosion problem) in mobile clients. The proposed adaptive segment-aware K-push algorithm aims to determine an appropriate adaptation push pair (i.e., number of segments and corresponding bitrate level) for the push cycle in DASH/2. Finally, we propose an adaptive segment prefetching strategy with an aim to improve the end-user QoE for DASH in Multi-Access Edge Computing (MEC) networks. The proposed strategy dynamically determines the number of segments and their corresponding video bitrate and prefetch them to the MEC server (located at the 5G/LTE base station) with an objective to improve the end-user QoE, reduce the video segment access latency and improve the backhaul link utilization.
  • Item
    Immuno-Inspired Embodied Lifelong Learning in Robots
    (2023) Kulkarni, Divya D
    The work describes the formulation of immuno-inspired mechanisms to continuously evolve, cache, manage and evict several Artificial Neural Network (ANN) based robot controllers, within disparate Halls-of-Fame, thereby facilitating Embodied Life-long learning in robots. The work also introduces a novel concept termed Mutational Puissance to enhance learning in ANN based controllers that use neuro-evolution. Further, unlike the conventional layer-wise transfers conducted in ANN-based Transfer Learning, a new immunology inspired Neuronal-level Transfer Learning technique has also been described. The technique aids in identifying neurons that play a more significant role during the learning phase. These, so-called, hot neurons, when transferred to target ANNs hasten learning convergence, especially when the source and target dataset domains are dissimilar. Transfer of such neurons has also proved to be effective while learning in scenarios involving robots.
  • Item
    Formal Modeling of Network-on-Chip and its Applications in Starvation and Deadlock Detection and in Developing Deadlock Free Routing Algorithms
    (2022) Das, Surajit
    Due to the uncontrolable heat genearated by a single high speed processor the transition takes place from a single high frequencey processor to Chip Multi Processor (CMP) consists of many processors with moderate clock frequency. The communication between these processors take place with help of a programmable network called Network-on-Chip. Starvation and deadlock are two major issues that degrade the performance of NoC or halt the system. There is no inbuild support to detect such problem in state-of-the-art NoC simulators like Booksim and Gem5. Therefore, our objective is to detect deadlock and starvation using formal model. The first contribution of the thesis ia formal modeling of complete NoC using Finite State Machne (FSM) and detection of starvation freedom. We have used NuSMV model checker to verify the FSM based NoC model. We have also verified progress and transfer of packets in the FSM based NoC model. In our second contribution, we have modelled detailed NoC using Communicating Finite State Machine (CFSM) for detection of application specific deadlock. We have developed a simulation framework with the CFSM based model. The formal simulator checks a given traffic pattern with respect to a given algorithm for detection of confirmed deadlock. We have tested three algorithms using our CFSM based simulator. We have detected deadlock in dynamic XY-routing and detected false positive deadlock warning in the booksim simulator. In our third contribution, we presented deadlock analysis in Torus NoC. Torus NoC is deadlock prone due to the inbuild cyclic paths via wraparound channels, that are useful in reducing hop counts. We have proposed a deadlock avoidance approace for Torus NoC called Arc Model. We have also proposed Directional Dependecny Graph (DDG) for theoretical deadlock detection and avoidance. In our fourth contribution, we have proposed three deadlock free routing algorithms for Torus NoC using Arc Model and DDG. These algorithms do not use any additional resources for deadlock avoidance. We have compared our algorithms with FirstHop and Up*/Down* routing using uniform traffic and PARSEC benchmark suites. Our future direction of work is to develop more effecient algorithms analysing hardware overhead.
  • Item
    Sentiment analysis of tweets on societal topics
    (2021) Singh, Loitongbam Gyanendro
    With the increasing popularity of Twitter, a surge in understanding public opinions on various social issues has been evident. One of the parameters often considered in such studies is understanding public sentiment toward target policies or issues. Earlier studies on sentiment analysis primarily focus on commercial domains such as product review, movie review, restaurant review, etc. However, sentiment analysis of opinions on the societal domain faces unique challenges because of a wide range of topics with diverse vocabularies, target aspects, and different language constructs. This thesis aims to investigate the characteristics of tweets in societal and non-societal domains and build an effective sentiment analysis classifier for the Societal domain. Unlike regular texts, sentiment analysis of tweets needs to deal with various challenges. Tweets are generally short and often under-specified due to character limits. They often consist of noises due to informal writing (shorten/elongated text), misspelling, multilingual code-switch, and code-mixed contents. Contributions are made to address the challenges of sentiment analysis of tweets such as under-specificity, noise, and multilingual content, by proposing deep learning-based methods to improve the performance of the sentiment classifiers in the societal domain. We proposed a method to represent tweets in a heterogeneous multilayer network by incorporating the relation of keywords, hashtags, and users to enrich the representation of tweets by adding information relevant to the tweets, such as keywords, hashtags, and users information. The proposed method for representing tweets in a network structure can capture better information than text-based representation by adding relevant related information. Further, we proposed a multi-view learning framework to incorporate both text and graph views of a tweet to enrich the tweet representation for the sentiment classification task. It is observed that the proposed multi-view learning framework can better perform classify tweet's sentiment than its single-view counterparts.
  • Item
    Manipuri-English Machine Translation using Comparable Corpus (An Unsupervised Statistical Machine Translation Approach)
    (2022) Laitonjam, Lenin
    Machine Translation (MT) is an essential tool for communicating with foreign-language speakers. The current mainstream MT frameworks, namely, Statistical MT (SMT) and Neural MT (NMT), are characterized by learning to translate automatically via machine learning techniques. It has been observed that these systems require a large number of parallel sentences between the source and the target language pair to produce a high-quality translation. Unfortunately, readily available parallel sentences are limited for most language pairs. Manually generating a quality parallel corpus is also very costly and time-consuming. As a result, many practical applications of MT are restricted to widely spoken and rich-resource languages. On the other hand, MT quality has not reached a reasonable level in many low-resource language pairs. This thesis reports the problem of developing an MT system that translates between low-resource Manipuri and English. Manipuri is one of the scheduled Indian languages. The study focus on improving the MT quality between the language pair by exploiting unsupervised MT approaches to cope with bilingual corpora's scarceness. Unsupervised MT enables translation between languages without using parallel data by exploiting source and target language monolingual corpora. This thesis first presents a Manipuri-English comparable corpus to facilitate MT research between the language pair. The corpus belongs to the same domain and is also aligned at date and document levels. Although the results are promising, unsupervised MT techniques have the drawback that their performance suffers when the source and target languages have different linguistic properties. To alleviate issues incurred due to different linguistic aspects between English and Manipuri, this thesis proposes two methods. The first method is proposed to normalize the morphological inflection issue of Manipuri. The second method is proposed to induce inter-language connecting points between Manipuri and English. The last part of the thesis is dedicated to making the best use of the proposed comparable corpus for the language pair MT task. Specifically, the study exploited the document-aligned and temporally-aligned characteristics of the corpus. Firstly, this thesis proposes a multi-step approach to exploit document-aligned comparable corpus. From various experimental results on English-to-Manipuri and Manipuri-to-English MT, it is observed that both the proposed methods developed for leveraging the comparable corpus's different alignment characteristics succeeded in their respective task and further enhanced the translation results.
  • Item
    Performance Enhancement of Tiled Multicore Processors using Prefetching and NoC Packet Compression
    (2022) Deb, Dipika
    The well-known memory wall problem is created because of the disparity between the processor speed and main memory speed, restricting a system from achieving the maximum performance benefit. In a multicore system, the performance is closely linked to how fast a cache miss is served, i.e., Average Memory Access Time (AMAT). However, in Tiled Chip MultiProcessors (TCMP), the inbuilt Network on Chip (NoC) plays an important role in determining AMAT. This is because the last level cache is shared and distributed among the tiles present in the system. In such systems, very often, the role of the underlying communication network gets unnoticed. Thus, cache misses experience additional delay apart from the conventional memory access latencies, which makes the block access time non-uniform. The additional delay is the network latency incurred to transfer the cache miss request and reply packet to the requesting tile. The non-uniform memory access latency across the tiles makes it unpredictable to estimate AMAT. Prefetching and NoC packet compression are the two techniques that can be used to reduce AMAT in TCMP. However, none of the existing techniques considers the on-chip communication overhead of TCMP. Considering the limitations of existing prefetching techniques, in this thesis, we propose efficient prefetching strategies that are aware of the underlying TCMP architecture. It identifies the false positive cases of prefetching that results in generating useless prefetch requests. These conditions prevail only on TCMP architectures due to its shared and distributed last level cache. The useless prefetch requests, thus generated, causes cache pollution which further results in generating unwanted NoC traffic. It further congests the network, thereby increasing the packet transfer rate in NoC. We notice that useless prefetch requests increases AMAT, hampering the system performance. We also cannot ignore the fact that cache pollution can be caused by useful prefetches by evicting important demand blocks from the cache. Hence, to reduce cache pollution we propose mechanisms for throttling useless prefetches and efficient strategies for placement of prefetch blocks. In order to reduce the on-chip communication latency, a novel packet compression technique is also proposed that operates at a smaller granularity of data to achieve better compression ratio. Experimental analysis shows that the proposed prefetching and compression techniques perform better than the existing techniques. Thus, both the techniques combined reduces the on-chip communication cost that directly improves AMAT for TCMP architectures.
  • Item
    Adaptive Resource Allocation for Faster Formation of 6TiSCH IoT Network
    (2022) Kalita, Alakesh
    The IPv6 over the TSCH mode of IEEE 802.15.4e (6TiSCH) network is standardized to meet high reliability, end-to-end latency, and network lifetime requirements of various IoT applications. The formation of 6TiSCH network should happen before establishing end-to-end data communication. So, the 6TiSCH Working Group published 6TiSCH Minimal Configuration (6TiSCH-MC) standard for 6TiSCH network formation. Basically, 6TiSCH network formation is started by the Join Registrar/Coordinator (JRC), and all the new nodes (aka pledges) join one-by-one in the multi-hop 6TiSCH networks. Faster formation of 6TiSCH network is challenging because of the inherent channel hopping feature of TSCH as the pledges do not know in which channel and at what time control packets are transmitted by the already joined nodes and/or the JRC. Apart from this, the resource allocated by 6TiSCH-MC standard is static in nature, and it did not provide any mechanism to handle congestion in shared cell and to regulate the transmission rates of different control packets used to form the network. Therefore, we set the objective of this thesis is to augment the 6TiSCH-MC standard by updated mechanisms for achieving faster formation of 6TiSCH IoT network. It is observed that congestion in shared cell becomes an inevitable problem when the number of joined nodes increases, and it degrades the performance of 6TiSCH network formation. Therefore, to reduce the congestion in shared cell, we proposed two schemes, namely, channel condition based dynamic beacon interval (C2DBI) and game theory based congestion control (GTCC). We further observe that due to fixed priority assignment to the control packets and insufficient transmission of routing control packet, formation of 6TiSCH network gets delayed. Therefore, for sufficient and efficient transmission of routing control packet, we propose another two schemes, namely, opportunistic transmission of control packets (OTCP) and adaptive control packet broadcasting (ACB). Further, we leverage all the available channels at a time in order to increase the number of shared cells per slotframe for quicker transmission of control packets. For this, we propose another two schemes i.e., autonomous allocation and scheduling of minimal cell (TACTILE) and time-variant RGB (TRGB) model. Finally, it is worthwhile to mention that all the proposed schemes are evaluated by Markov Chain based theoretical analysis, simulation, and testbed experiments. As a whole, this dissertation improves the performance of 6TiSCH network during its formation period in terms of joining time and energy consumption of the IoT nodes while maintaining stable network
  • Item
    Algorithms for Facility Location Problems in Geometric Settings
    (2023) Mishra, Pawan Kumar
    Facility Location Problems (FLPs) have been the subject of extensive research because of its diverse range of applications in VLSI, networks, clustering, and other areas. The covering problem and the dispersion problem are two popular FLPs. Covering problem refers to selecting a subset of covering objects from a given set of objects such that the union of the selected objects contains all the elements. On the other hand, the dispersion problem refers to selecting a subset of a given set of objects such that the closeness between the objects in the selected set is undesirable. In this thesis, we investigated capacitated version of the covering problem and the dispersion problem and many of its variants. We established a necessary and sufficient condition through which we can ensure whether the given instance of the capacitated covering problem is feasible or not. Further, we prove that the problem is NP-complete. Finally, we proposed a local search algorithm for the capacitated covering problem, and proved that the proposed algorithm is a PTAS. For the dispersion problem, we introduce the concept of dispersion partial sum, which generalizes the notion of dispersion. Based on the dispersion partial sum, we defined variants of the dispersion problem, namely the 1-dispersion problem, the 2-dispersion problem, and the c-dispersion problem. We studied the following dispersion problems in Euclidean space: the 2-dispersion problem in both R2 and R1, and the 1-dispersion problem in R1. We presented a (2√3 + ε)-factor approximation result for the 2-dispersion problem in R2. We also developed a common framework for the dispersion problem in Euclidean space, which produces a 2√3-factor approximation result and an optimal result for the 2-dispersion problem in R2 and R1, respectively. We studied the c-dispersion problem in a metric space. We proposed a greedy algorithm for the c-dispersion problem, which produces a 2c-factor approximation result. We also proved that the c-dispersion problem in a metric space parameterized by solution size is W[1]-hard. Finally, we considered a variant of the 1-dispersion problem, where a set of locations is the vertices of a convex polygon. This variant of the 1-dispersion problem is referred to as the convex 1-dispersion problem. We proposed an O(n3)-time algorithm that returns an optimal result where the objective is to select k(= 4) vertices for the convex 1-dispersion problem. We also proposed a √3 (≈ 1.733)-factor approximation algorithm for the convex 1-dispersion problem for any value of k.
  • Item
    Brownout-based Power Allocation Strategies in Microgrids
    (2023) Raj, Basina Deepak
    "Future Smart Grids are expected to transform into hierarchically connected networks of miniaturized power systems called Microgrids, which are usually aimed to cater to a small geographical region. Microgrids have been envisaged to integrate heterogeneous distributed energy resources (DERs) including renewables (like photovoltaics, wind turbines and hydro–electricity), fossil fuel based micro–generators and batteries, along with occasional transfers of power from/to the main grid. Microgrid operations management is an important and yet a challenging problem. This is especially because, microgrids must be effectively tuned to maximize renewable energy penetration which are often volatile/intermittent, while still being able to schedule power generation, distribution and consumption in a fashion that ensures an economic and reliable operation. In order to achieve an effective balance between demand and supply, in the face of possible intermittent power shortage scenarios, Demand Response (DR) management strategies for microgrids must include mechanisms that allow adaptive urgency–prioritized control over electricity consumption. Today, with the advancements in smart metering as well as information and networking technologies, implementation of such adaptive power management schemes which require precise knowledge and finer controllability over consumer loads on the demand side, have become practical. The thesis of this dissertation is as follows: In the presence of intermittent power deficits, efficient policies for microgrid sizing, equitable power distribution, appliance scheduling, real–time power balancing, etc., can be designed by using brownout–oriented approaches which allow selective provisioning of electricity to essential appliances while curtailing supply to less important ones, in times of power shortages."
  • Item
    Learning Player-specific Strategies Using Cricket Text Commentary
    (2021) Behera, Swarup Ranjan
    In recent times, as sports get more competitive than ever, players and teams are looking for ways to get an edge over their rivals. The progressive trend of analyzing vast amounts of data has also emerged in cricket, as it brings a significant advantage against other teams. A massive amount of information in the form of scorecards, audio commentary, video broadcasts, and tracking data is generated in every match. Various graphical representations and statistical summaries are obtained from these data to build player-specific strategies. The graphs and statistics summarize player’s - batting overview (wagon wheel, ground map, batting average, and strike rate), bowling overview (pitch map, bowling economy, and bowling average), and fielding overview (field position map). While interesting, the focus of these analyses has primarily been at an aggregate level. They capture the game’s play at a macroscopic level and do not attend to the minute details. For example, the batting average and the strike rate tell us about the batsman’s overall statistics, but not the finer details like how a batsman plays under di
  • Item
    Towards Cost-aware Resource Management in Federated Cloud Data Centers
    (2022) Najm, Moustafa
    The work in this thesis addresses the problems of cost-aware resource management in federated data centers. We design a resource management framework for federated cloud data centers along with some efficient techniques for dynamic resource allocation to minimize the operating cost of the cloud provider, along with considering the requirements of various applications. We propose a hierarchical framework for resource management in a federated cloud, catering to the requirements of each provider in the federation. This framework is used in handling all the other problems in our work. We address the problem of costaware resource allocation for hosting data-intensive applications in a federated cloud. We also address the problem of cost and delay aware dynamic resource management for hosting modern vehicular applications in a federated cloud. Finally, we address the problem of cost-aware VM migration in federated clouds to maximize the profit of the cloud provider. All the proposed algorithms are evaluated based on the data from practical cloud deployments and under different scenarios. Results show that exploiting federation brings great benefits for the cloud provider and the end-users alike.
  • Item
    Near-Memory acceleration of Convolutional Neural Networks by exploiting Parallelism, Sparsity, and Redundancy
    (2022) Das, Palash
    The gap between the processing speed of the CPU and the access speed of the memory is becoming a bottleneck for many emerging applications. This gap can be reduced if the computation can be taken closer to the memory through near-memory processing (NMP). Among the logic options, application-specific integrated circuits (ASICs) are highly efficient in terms of power and area overhead for NMP logic integration. In this thesis, we aim to accelerate Convolutional Neural Networks (CNNs) by integrating custom hardware near the memory. As CNNs are widely used in several emerging applications, the designed hardware can be extensively used in all such cases. To design an NMP-based system with high performance and energy efficiency, we explore various techniques such as leveraging parallelism, exploiting data sparsity, and utilizing computation redundancy to reduce the number of operations. All such techniques result in hardware designs that implement the appropriate data flow and data-parallel algorithm. The designs have positively impacted the system's performance and energy efficiency. To examine the deployability of the NMP approach, we perform experiments on various memory technologies like 3D memory, hybrid memory, and the commodity DRAM. Additionally, we also measure the efficacy of NMP for other applications like database operations. The proposed systems have performed substantially well while comparing them with various baselines and state-of-the-art works.
  • Item
    Understanding and Mitigation of Noise in Crowd-Sourced Relation Classification Dataset
    (2023) Parekh, Akshay
    Relation classification (RC), a task of classifying the relation between a given pair of entities in a sentence to a relation label is fundamental to IE systems. The identified structured triple (subject_entity, relation, object_entity) from the unstructured text can vastly help in knowledge base completion. This organized relational knowledge can further be used for other downstream tasks like question-answering, and common-sense reasoning. A large RC dataset TACRED has been widely used for benchmarking modern deep neural models. However, RC at a large scale is restricted mainly due to the presence of noise in the training dataset. Hence, the performance of such advanced deep neural models, which have shown excellent improvement on other NLP tasks, has been held back for RC.
  • Item
    LongLiveNoC: Wear Levelling, Write Reduction and Selective VC allocation for Long lasting Dark Silicon aware NoC Interconnects
    (2021) Rani, Khushboo
    Increasing processing demand has led to the development of chip multiprocessors which can have multiple to many cores connected with each other and with the on-chip caches. These connections are established by an on-chip packet switched Network-on-Chip (NoC). Scaling of technology nodes increases the power dissipated by the chips leading to thermal restrictions. To control the chip thermal design power, certain components (like cores and caches) may be turned off. However, in this scenario of dark silicon, the interconnect is expected to be available. The thesis aims to save power consumed by this always ON interconnect by replacing the power hungry SRAM buffers in the routers with low leakage Non- Volatile Memory (NVM) based buffers.However, the major challenges with the employment of the NVMs are slower writes and weak write endurance. The thesis proposes: 1.Methods to evenly distribute the writes across these NVM buffers in order to increase their lifetime. This is done by static and dynamic allocation of buffers to the virtual channels and their selection during packet transmission. 2. Power can also be saved by using frequency scaling of the routers and/or turning off certain buffers when the usage is less. Investigation is done for all such approaches and power savings is demonstrated. 3.Endurance can be improved and energy can be saved if we can reduce the number of writes performed on the buffers. This is achieved by proposing two compression techniques leading to reduced network traffic and improved lifetime. All the above methods help in improving the lifetime of the NVM based NoC interconnects in the context of dark silicon
  • Item
    Significance of Hashtags for Improved Topic Modeling on Tweets
    (2022) Kumar, Durgesh
    With the increase in Twitter's popularity, topic modeling on Twitter has become an important problem with applications in diverse fields such as text summarization, document clustering, information retrieval, and sentiment analysis. The short and noisy tweets with informal writing style make topic modeling on tweets more challenging due to increased data sparsity and under-specificity. Latent Dirichlet Allocation (LDA), one of the widely used topic models, suffers from data sparsity and under-specificity. Researchers have tried to counter the data sparsity and under-specificity in tweets by adding related content from external sources such as News pages and Wikipedia or pooling related tweets to pseudo documents. Adding the content from external resources is non-trivial due to differences in writing styles and vocabulary. Moreover, Topic modeling on pooled documents may lose the distribution of topics over the individual tweet and increase the corpus size due to duplicate tweets in different pools. From earlier studies and our preliminary investigation, hashtags are found to provide necessary meta-information in linking tweets to the underlying topics. Motivated by the above observation, this thesis proposes two approaches to counter the data sparsity and under-specificity in tweets for topic modeling tasks: i) expanding tweets with semantically related hashtags, and ii) prioritization of selected hashtags. From various experimental results, it is evident that our proposed methods enhance the topic modeling performance either by i) tweet expansion with semantically related hashtags or ii) incorporating prioritized hashtags in LDA. Furthermore, this thesis investigates the effect of LDA in relation prediction as a case study by exploiting topic and entity relation. It is observed that event-centric relations are effectively predicted using topic modeling over news articles.