Perceptual Hashing for Wavelet-Based Scalably-Coded Video

A perceptual hash function for video extracts a fixed-length binary string, called the perceptual hash, on the basis of the perceptual content of the video. Besides being sensitive to content differences between videos, a perceptual hash function should be robust against content-preserving operations on the videos. Recent developments in the field of scalable video coding (SVC) demand robustness of the perceptual hash against the scalability features of SVC. The 3D discrete wavelet transform (3D-DWT) is a way of achieving scalable coding, wherein the inherent multi-resolution structure of the 3D-DWT is exploited. This thesis deals with content-based representation and hashing of video using the 3D-DWT for use in wavelet-based SVC (WSVC).
This thesis first considers extracting representative frames for video using the 3D-DWT. It examines the representation of the content of a video at the group-of-frames (GOF) level by the bands of the 3D-DWT decomposition. The spatio-temporal low-pass band at the full level of temporal decomposition and an intermediate level of spatial decomposition of a GOF is used to represent the content of the GOF. Experimental results show the effectiveness of this band in representing the content of the GOF.
Two perceptual hash functions are derived from the perceptually representative spatio-temporal low-pass band. For this purpose, the band is divided into perceptual blocks that are sensitive to the local contents of the GOF. The first hash function derives a hash of the GOF by binarising the wavelet coefficients in each perceptual block. The similarity between two GOFs is measured in terms of the maximum Hamming distance between the hashes of the corresponding perceptual blocks. Experimental results show that the hash function is robust against the scalability features of WSVC and other content-preserving operations, and sensitive to content differences at the frame and GOF levels.
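The block-wise binarisation and the maximum-Hamming-distance comparison described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not state the binarisation threshold or the block size, so the per-block median threshold, the 2x2 blocks, and all function names here are assumptions made for illustration. The spatio-temporal low-pass band is taken as already computed and is represented as a plain 2D list.

```python
from statistics import median


def split_into_blocks(band, block_h, block_w):
    """Split a 2D band (list of rows) into non-overlapping blocks, row-major.

    Each block is returned as a flat list of coefficients.
    """
    rows, cols = len(band), len(band[0])
    blocks = []
    for r in range(0, rows - block_h + 1, block_h):
        for c in range(0, cols - block_w + 1, block_w):
            blocks.append([band[r + i][c + j]
                           for i in range(block_h)
                           for j in range(block_w)])
    return blocks


def block_hash(block):
    """Binarise one block. ASSUMPTION: the threshold is the block median."""
    threshold = median(block)
    return [1 if x > threshold else 0 for x in block]


def gof_hash(band, block_h=2, block_w=2):
    """Hash of a GOF: one bit string per perceptual block (block size assumed)."""
    return [block_hash(b) for b in split_into_blocks(band, block_h, block_w)]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def max_block_distance(hash_a, hash_b):
    """Maximum Hamming distance over corresponding perceptual blocks."""
    return max(hamming(a, b) for a, b in zip(hash_a, hash_b))


# Toy 4x4 "low-pass band", a uniformly scaled copy, and a copy whose
# bottom-right block has different content.
band_a = [[1, 5, 2, 8], [7, 3, 6, 4], [9, 2, 1, 7], [4, 6, 8, 3]]
band_scaled = [[2 * x for x in row] for row in band_a]
band_c = [[1, 5, 2, 8], [7, 3, 6, 4], [9, 2, 7, 1], [4, 6, 3, 8]]

hash_a = gof_hash(band_a)
dist_scaled = max_block_distance(hash_a, gof_hash(band_scaled))
dist_changed = max_block_distance(hash_a, gof_hash(band_c))
```

In this toy example the scaled band gives a distance of 0, because a median threshold is invariant to uniform scaling, while the locally altered band gives a distance of 4. Taking the maximum over blocks means a change confined to a single block is not averaged away by the unchanged blocks.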
The first hash function has the limitations of a large hash size and weak confusion and diffusion properties. The second hash function computes a compact hash by binarising the forward and backward cumulative averages of the local means of the perceptual blocks in the spatio-temporal low-pass band. Experimental results show the robustness of this hash function against the scalability features of WSVC and other content-preserving operations, and its sensitivity to content differences at the frame and GOF levels. This hash function is shown to have good diffusion and confusion properties....
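The idea of hashing forward and backward cumulative averages of block means can be sketched in Python as follows. This is a hedged illustration only: the abstract does not say how the cumulative averages are binarised or in what order the blocks are scanned, so the threshold (the overall mean of the local means), the scan order (list order), and the function names are all assumptions.

```python
from itertools import accumulate


def local_means(blocks):
    """Mean coefficient value of each perceptual block."""
    return [sum(b) / len(b) for b in blocks]


def cumulative_averages(values):
    """Running averages: element k is the mean of the first k values."""
    return [s / k for k, s in enumerate(accumulate(values), start=1)]


def compact_hash(blocks):
    """Two bits per block from forward and backward cumulative averages.

    ASSUMPTION: each cumulative average is binarised against the overall
    mean of the local means; the thesis may use a different rule.
    """
    means = local_means(blocks)
    overall = sum(means) / len(means)
    forward = cumulative_averages(means)
    # Backward averages: element k is the mean of blocks k..N.
    backward = cumulative_averages(means[::-1])[::-1]
    return [1 if v > overall else 0 for v in forward + backward]


# Four toy blocks with local means 4, 8, 2 and 6, and a scaled copy.
blocks = [[4, 4], [8, 8], [2, 2], [6, 6]]
blocks_scaled = [[2 * x for x in b] for b in blocks]

hash_bits = compact_hash(blocks)
hash_bits_scaled = compact_hash(blocks_scaled)
```

Each output bit depends on the means of several blocks rather than on a single coefficient, and the hash length grows with the number of blocks instead of the number of coefficients, which is consistent with the compactness described above. Under the assumed threshold, the last forward bit and the first backward bit are always 0, since both averages equal the overall mean; a practical variant might drop them.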
Supervisor: P. K. Bora