Robust Watermarking for Scalable Video Sequence

Thesis submitted to the Indian Institute of Technology Guwahati for the award of the degree of Doctor of Philosophy in Computer Science and Engineering

Submitted by
Nilkanta Sahu

Under the guidance of
Dr. Arijit Sur

Department of Computer Science and Engineering
Indian Institute of Technology Guwahati
2015

© 2015 Nilkanta Sahu. All rights reserved.

Dedicated to my family and friends, whose blessings, love and support made my path to success.

DECLARATION

I certify that

a. the work contained in this thesis is original and has been done by me under the guidance of my supervisor;

b. the work has not been submitted to any other institute for any degree or diploma;

c. I have followed the guidelines provided by the Institute in preparing the thesis;

d. I have conformed to the norms and guidelines given in the Ethical Code of Conduct of the Institute;

e. whenever I have used materials (data, theoretical analysis, figures, and text) from other sources, I have given due credit to them by citing them in the text of the thesis and giving their details in the references. Further, I have taken permission from the copyright owners of the sources, whenever necessary.

Nilkanta Sahu

Copyright

Attention is drawn to the fact that copyright of this thesis rests with its author. This copy of the thesis has been supplied on the condition that anyone who consults it is understood to recognise that its copyright rests with its author, and that no quotation from the thesis and no information derived from it may be published without the prior written consent of the author. This thesis may be made available for consultation within the Indian Institute of Technology Library and may be photocopied or lent to other libraries for the purposes of consultation.

Signature of Author ........................
Nilkanta Sahu

Certificate

This is to certify that the thesis entitled "Robust Watermarking for Scalable Video Sequence", being submitted to the Department of Computer Science and Engineering, Indian Institute of Technology Guwahati by Nilkanta Sahu in partial fulfillment of the requirements for the award of the degree of Doctor of Philosophy in Computer Science and Engineering, is a bonafide work carried out by him under my supervision. To the best of my knowledge, it has not been submitted elsewhere for the award of a degree.

Date: ........................
Place:

Dr. Arijit Sur
Assistant Professor
Department of Computer Science and Engineering
IIT Guwahati

Acknowledgments

A great many people have contributed to the production of this dissertation, and I owe my gratitude to all those who have made it possible. I wish to express my deepest gratitude to my adviser, Dr. Arijit Sur. I have been fortunate to have an advisor who gave me the freedom to explore on my own and, at the same time, the guidance to recover when my steps faltered. His patience, support and constant motivation helped me overcome many crises and finish this dissertation. Besides my advisor, I would like to thank the rest of my thesis committee, Prof. S V Rao, Prof. P K Bora and Dr. Pinaki Mitra, for their insightful comments and encouragement. Their constructive criticism and suggestions helped me widen my research from various perspectives.
I would also like to acknowledge the services and support of the staff of the Department of Computer Science and Engineering, IITG, for providing access to valuable resources and extending all necessary support for the successful completion of my research work. I am grateful to all my seniors, friends and juniors, especially Shrinivasa sir, Ashok sir, Mamata di, Pravati, Shishendu, Mayank, Shashi, Shilpa, Basant, Sibaji, Satish, Rana, Anirban and many others, for their unconditional help and support. You made my life at IIT Guwahati a memorable one. Most importantly, none of this would have been possible without the love and patience of my family, which has been a constant source of love, concern, support and strength all these years.

Abstract

With the emergence of scalable video coding (SVC), efficient and secure transmission of scalable video streams has become an important research topic. In the recent literature, watermarking is regarded as an efficient tool for scalable video authentication. The primary motivation of this dissertation is to develop robust watermarking solutions for the different scalable adaptations: resolution, temporal and quality.

In the first part of this work, watermarking issues for resolution and quality scalability are considered. It has been observed in the literature that there are two basic requirements for scalable watermarking: first, the watermark should be extractable from each of the scalable layers, and second, the reliability of the extracted watermark should increase with the addition of video quality layers, i.e. the scheme should achieve graceful improvement. In this context, an uncompressed-domain watermarking scheme is proposed to meet both requirements while maintaining a decent visual quality of the watermarked video.

It is observed that temporal adaptation is also a serious problem for designing robust scalable watermarking. In the next phase of this work, a robust algorithm is devised in which DCT based motion compensated temporal filtering is used to handle temporal adaptation. Wavelet based spatial filtering is also used for embedding zone selection to achieve an acceptable visual quality.

Although the proposed scheme against resolution scalability outperforms recent existing schemes, its performance can be improved, especially when the resolution scaling is relatively large. In the third phase of the work, a scale invariant feature transform (SIFT) based image watermarking scheme is proposed which can easily be extended to frame based video watermarking. The proposed scheme exploits the scale invariance of SIFT features to devise an algorithm that remains robust when the resolution scaling is relatively high.

It can be observed that the proposed algorithm against temporal adaptation, in the second part of the thesis, mostly outperforms the existing schemes, but it requires a location map for watermark extraction, which is an extra overhead. To remove this overhead, two schemes are proposed in the final phase of this thesis which require no location map for watermark extraction. In the first scheme, a SIFT based watermarking algorithm is proposed which is invariant to temporal scaling and performs well against temporal adaptation as well as frame dropping and frame averaging attacks.
In the second scheme, the frames of each temporal layer are embedded with a different watermark, generated by block DCT decomposition of a single watermark image, to achieve graceful improvement across the successive enhancement layers. Finally, the thesis concludes by briefly summarizing the works presented in the dissertation and outlining possible future research directions.

Keywords: Watermarking, scale invariant watermarking, RST, content adaptation, SVC, MCDCT-TF, SIFT, visual saliency, wavelet, block DCT, base layer, enhancement layer.

Abbreviations

BIR  Bit Increase Rate
DCT  Discrete Cosine Transform
DFT  Discrete Fourier Transform
GOF  Group of Frames
GOP  Group of Pictures
GPE  Global Perceptual Error
HD  High Definition
HVS  Human Visual System
IMCDCT-TF  Inverse Motion Compensated DCT based Temporal Filtering
JSVM  Joint Scalable Video Model
MC  Motion Compensation
MCDCT-TF  Motion Compensated DCT based Temporal Filtering
MCTF  Motion Compensated Temporal Filtering
PSNR  Peak Signal to Noise Ratio
RST  Rotation, Scaling, Translation
SIFT  Scale Invariant Feature Transform
SSIM  Structural Similarity
SVC  Scalable Video Coding
VQM  Video Quality Metric

List of Symbols

σ  Scale of Gaussian filter
α  Watermark strength
δ  Change in intensity
V  Input (original) video
V^w  Watermarked video
I  Input (original) image
I^w  Watermarked image
R  Residual layer
R^w  Watermarked residual layer
D  SIFT descriptor of original image/frame
D′  SIFT descriptor of watermarked image/frame
D^w  Watermark descriptor
V_th  Visual quality threshold
|x|  Modulus of x
A\B  Set difference of A and B
W  Watermark signal
H^{x→y}, V^{x→y}  Motion vector from frame x to frame y
L_map  Location map

List of Figures

1.1 Use of scalable video
1.2 Different Scalability
1.3 Video Watermarking
1.4 A basic Scalable Video Watermarking scenario
1.5 Block Diagram for Watermark Embedding [39]
1.6 Block Diagram for Watermark Extraction [39]
1.7 Block diagram of Meerwald's [44] watermark embedding method for two spatial layers
2.1 Motion Compensated Temporal Filtering (figure courtesy [54])
2.2 DCT based MCTF (L, M, H are low, middle and high frequency frames respectively)
2.3 Connected and unconnected pixels: (a) fully connected pixels, (b) unconnected pixels, (c) partially connected pixels, (d) one-to-many connection
2.4 RARE2012 flow chart [58]
3.1 Block DCT of video frame
3.2 Location Map based Technique for Spatial Coherency
3.3 Watermark Embedding Model
3.4 Watermark Extraction Model
3.5 PSNR comparison
3.6 Flicker Metric comparison
3.7 VQM comparison
3.8 SSIM comparison
3.9 Robustness comparison
4.1 Spatial Decomposition
4.2 DCT based Motion Compensated Temporal Filtering (MCDCT-TF)
4.3 Inverse motion compensation of the Location Map
4.4 Pixel categories and Location Map
4.5 Watermark Embedding Model
4.6 Frames after every step
4.7 Watermark Extraction Model
4.8 PSNR comparison
4.9 Flicker comparison
4.10 SSIM comparison
4.11 VQM comparison
4.12 Robustness at different temporal layers
4.13 Robustness comparison
5.1 Lena binary image
5.2 Plot for finding the best possible β
5.3 Robustness comparison with schemes [24] (red) and [38] (black)
5.4 Matching ratio of watermark descriptor with original descriptor in the previous scheme
5.5 Watermark Embedding Scheme
5.6 Visual degradation comparison between the proposed scheme and existing schemes for Lena
5.7 Robustness comparison between the proposed scheme and existing schemes for Boy
5.8 Robustness comparison between the proposed scheme and existing schemes for Serano
5.9 Embedding of watermark for Baboon, Barbara and Lena. Top row: original images. Middle row: watermarked images with patch. Bottom row: newly generated SIFT points due to insertion of the patch
5.10 Plot depicting variation of intensity with stability
5.11 Plot depicting variation of intensity with perceptual error
5.12 Plot depicting variation of stability with perceptual error
5.13 Variation of robustness with change in intensity of the patch by 20
5.14 Variation of GPE with change in intensity of the patch by 20
6.1 Side plane and embedding zone
6.2 Embedding zones
6.3 Motion map
6.4 New SIFT features for smooth area and busy area
6.5 Selection of blocks belonging to a relatively smoother region
6.6 Zone selection procedure
6.7 Temporally adapted video and corresponding resizing for extraction
6.8 Robustness comparison with Chong's scheme [67] for City video
6.9 PSNR comparison of the watermarked frames
6.10 SSIM comparison of the watermarked frames
6.11 Flicker comparison of the watermarked frames
6.12 Watermark Generation
6.13 DCT coefficients in zigzag scan
6.14 Watermark Embedding
6.15 Watermark Extraction
6.16 Graceful Improvement
6.17 Robustness comparison
6.18 PSNR comparison

List of Algorithms

3.1 Embedding Algorithm (V, α, W)
3.2 Residual Embedding (R, α, W, L_map, MV)
3.3 Extraction Algorithm (V^w, α)
3.4 Residual Extraction (R^w, α, L_map, MV)
4.1 Embedbit(A, B, w_b, δ)
4.2 Embedding Algorithm (V, α, W)
4.3 Extraction Algorithm (V^w)
5.1 Watermark Zone Selection
5.2 Watermark Embedding
5.3 Watermark Extraction & Authentication
5.4 Watermark Zone Selection
5.5 Watermark Embedding
6.1 Watermark Embedding
6.2 Watermark Extraction & Authentication

List of Tables

2.1 Experimental data set
3.1 Experimental setup
3.2 Hamming distance of the extracted watermark from scalable CIF Bus video
3.3 Hamming distance of the extracted watermark from scalable CIF Akiyo video
3.4 Hamming distance of the extracted watermark from scalable HD Pedestrian Area video
3.5 Hamming distance of the extracted watermark from scalable HD Sunflower video
4.1 Experimental setup
5.1 GPE for standard images
5.2 Average robustness for all the images in the dataset
5.3 Robustness for standard images when scaled
5.4 GPE and saliency for standard images
5.5 Median robustness for all the images
5.6 Robustness for standard images when scaled
6.1 Robustness against random frame dropping
6.2 Robustness against temporal scaling
6.3 Robustness against frame averaging

Contents

1 Introduction
1.1 Digital Video Watermarking
1.1.1 Evaluation Parameters
1.1.2 Applications
1.2 Literature Survey
1.2.1 Scalable Image Watermarking
1.2.2 Scalable Video Watermarking
1.3 Motivation and Objectives
1.4 Contribution of the Thesis
1.4.1 Watermarking against resolution and quality scalability
1.4.2 Watermarking against temporal and quality scalability
1.4.3 Watermarking based on SIFT
1.4.4 Watermarking against temporal scalability
1.5 Thesis Organization
1.6 Summary
2 Research Background
2.1 Motion Compensated Temporal Filtering (MCTF)
2.1.1 Motion Compensated DCT based Temporal Filtering (MCDCT-TF)
2.1.2 Inverse Motion Compensated DCT based Temporal Filtering (IMCDCT-TF)
2.1.3 Connected and unconnected pixels
2.2 Scale Invariant Feature Transform
2.3 Visual saliency model RARE2012
2.4 Evaluation Parameters
2.4.1 Visual quality parameters
2.4.2 Robustness parameter
2.5 Experimental Dataset
2.6 Summary
3 Robust Video Watermarking against Resolution and Quality Scalability
3.1 Introduction
3.2 Background
3.2.1 DC Frame
3.2.2 Graceful Improvement of the Watermark
3.3 Proposed Scheme
3.3.1 Watermark Embedding Scheme
3.3.2 Extraction Scheme
3.3.3 Embedding Capacity
3.4 Experimental Results
3.4.1 Visual Quality
3.4.2 Robustness
3.5 Conclusion
4 Robust Video Watermarking against Temporal and Quality Scalability
4.1 Introduction
4.2 Proposed Scheme
4.2.1 Location Map
4.2.2 Embedding Scheme
4.2.3 Coefficient Selection
4.2.4 Visual Quality Threshold
4.2.5 Extraction Scheme
4.3 Experimental Results
4.3.1 Visual Quality
4.3.2 Robustness Comparison
4.3.3 Explanation
4.4 Conclusion
5 SIFT based Robust Image Watermarking against Resolution Scalability
5.1 Introduction
5.2 Proposed Scheme
5.2.1 Watermark Zone Selection
5.2.2 Derivation of Quality Parameter β
5.2.3 Watermark Embedding
5.2.4 Watermark Extraction & Authentication
5.2.5 Experimental Results
5.3 Improvement over the Proposed Scheme
5.3.1 Strength of Individual SIFT Features
5.3.2 Modified Watermark Zone Selection
5.3.3 Watermark Embedding
5.3.4 Watermark Extraction
5.3.5 Experimental Results
5.4 Conclusion
6 Robust Video Watermarking against Temporal Scalability
6.1 SIFT based Video Watermarking Resilient to Temporal Scalability
6.1.1 Watermarking Zone Selection
6.1.2 Watermark Embedding
6.1.3 Watermark Detection & Authentication
6.1.4 Experimental Results
6.2 Robust Video Watermarking against Temporal Scalability
6.2.1 Proposed Scheme
6.2.2 Watermark Generation
6.2.3 Watermark Embedding
6.2.4 Watermark Extraction
6.2.5 Graceful Improvement
6.2.6 Experimental Results
6.3 Conclusion
7 Conclusion and Future Works
7.1 Watermarking against Resolution and Quality Scalability
7.2 Watermarking against Temporal and Quality Scalability
7.3 Image Watermarking based on SIFT against Resolution Scaling
7.4 Watermarking against Temporal Scalability
7.5 Future Research Scope
References

Chapter 1
Introduction

The rapid growth in Internet technology and media communication has started a new era of video broadcasting and transmission. In this new era, the heterogeneity among end-user display devices has increased considerably with respect to display resolution, processing power, network bandwidth, etc. Depending on their computation power, display size or storage capacity, these devices have varying requirements in terms of video quality, frame rate, resolution, etc. It has been observed that achieving these scalable adaptations at the receiving side for a variety of end-user devices is a complicated process. Scalable video transmission provides a viable solution to this problem by performing the scalable adaptations at the multimedia servers rather than at the receiving end. A hypothetical scalable video transmission scenario is depicted in Fig. 1.1. A scalable video stream can be adapted to provide different resolutions, qualities and spatio-temporal characteristics.
A single video stream (high quality, high resolution and high frame rate) is stored, and each content consumer can extract the best video representation for their application or device. The different types of video scalability are presented in Fig. 1.2. There are several strategies to achieve scalability: layered coding, which is followed by MPEG-4 and its predecessors; embedded coding, used by 3D subband coders such as MC-EZBC; and hybrid coding, utilized by MPEG4-FGS and H.264/SVC [1].

Figure 1.1: Use of scalable video

The widespread and easy access to multimedia content, and the possibility of making unlimited copies without considerable loss of fidelity or quality, have made digital rights management an essential requirement for efficient media transmission. Thus, assuring protection such as ownership as well as video content authentication has become a challenging research problem, especially where scalable media is concerned. Encryption and cryptographic hashes have been proposed as solutions, but it is observed that the scalability property of the bit stream is lost [2, 3] if the video bit stream is encrypted with conventional cryptographic ciphers like AES [4]. Some schemes use multiple keys for multiple layers, but this generally requires complicated key management to meet the application scenario. Secret sharing based cryptographic solutions [5] also fail against resolution scaling. Watermarking has been used popularly [6, 7, 8, 9, 10, 11] over the last two decades for copyright protection and content authentication of multimedia content. In this research work, video watermarking is considered as the tool for ensuring secure video transmission.

1.1 Digital Video Watermarking

Digital video watermarking is a technique which inserts a digital signature (number sequence, binary sequence, logo, etc.) into the video stream; the signature can later be extracted or detected to authenticate the ownership of the video or the video itself. A fundamental video watermarking system is described in Fig. 1.3, and the basic watermarking scenario for a scalable video sequence is depicted in Fig. 1.4.

Figure 1.2: Different Scalability: (a) Temporal Scalability, (b) Spatial Scalability, (c) SNR Scalability

Figure 1.3: Video Watermarking

1.1.1 Evaluation Parameters

There are many parameters to evaluate the efficiency of a watermarking system. These parameters are often mutually conflicting; for example, the visual quality (imperceptibility) may be reduced to increase the robustness of the scheme. A few important parameters are described below:

Robustness: The robustness of a watermarking scheme is defined as how efficiently the watermarking scheme withstands intentional and unintentional attacks.

Imperceptibility: Imperceptibility implies that the watermark should not be perceptually noticeable in the watermarked video.

Payload: Payload measures the number of bits, or the size of the watermark, which is embedded into the cover media.

Blindness: A watermarking scheme is called blind if the original content is not required at the time of watermark extraction.

Figure 1.4: A basic Scalable Video Watermarking scenario

Bit Increase Rate: The video bit rate may increase due to embedding.
An efficient watermarking scheme embeds the watermark in such a way that the bit increase rate (BIR) does not increase considerably.

1.1.2 Applications

Beyond its widespread application as a content or copyright authentication tool, digital watermarking may be used for several purposes. Some of them are described below:

Ownership Authentication / Copyright Protection [9]: As video content is a valuable commodity, ownership or copyright of the content must be protected. Watermarking resolves copyright issues of digital media by using copyright data as the watermark information.

Video Authentication [12]: A video can be altered easily, and such alteration is often very difficult to detect. Watermarking can be used to verify the authenticity of the content by identifying possible video tampering or forgery.

Traitor Tracing [13]: A watermark can also be used to trace the source of pirated video content to stop unauthorized content distribution.

Broadcast Monitoring [14]: A watermark can also be used for managing video broadcasting by putting a unique watermark in each video clip and assessing broadcasts with an automated monitoring station.

Medical Application [15]: A mix-up in the X-rays or MRI scans of two patients may be disastrous and must be avoided. A visible watermark can be used to identify the patient accurately by embedding a digital signature [16].

1.2 Literature Survey

It is observed in the literature that a few video watermarking schemes are reported as direct extensions of existing image watermarking schemes [17, 18]. More specifically, frame by frame video watermarking may be achieved by applying an existing image watermarking scheme. Although there are a few limitations to such extensions, scalable image watermarking may be a good starting point to analyze the merits and demerits of state-of-the-art scalable video watermarking.

1.2.1 Scalable Image Watermarking

Piper et al. mentioned scalable watermarking (for images) explicitly in [19]. They used different coefficient selection methods for the spread-spectrum embedding proposed by Cox [6] and evaluated the robustness of the scheme against quality and resolution scalability. Later, Piper [20] discussed that spatial resolution and quality scalable watermarking can be achieved by exploiting the characteristics of the Human Visual System (HVS). In another work, Seo et al. [21] evaluated a scalable image watermarking scheme for protecting distance learning content and proposed a watermark embedding technique using wavelet based image coding.

Content-based image watermarking schemes are generally used to resist geometric attacks [22, 23, 24]. There are a variety of watermarking techniques that aim to be robust against a specific subset of geometric attacks. In some schemes [25, 26], the watermark is embedded in domains (e.g. Fourier-Mellin, Radon) which are invariant to geometric attacks. In [27], a reference pattern or template is embedded which can be used to synchronize the watermark during extraction, whereas in [28], an exhaustive search is performed.

The feature point based approach is one of the most promising classes of image watermarking. Kutter et al. [29] argued that a watermark is more robust if the embedding is performed using feature points, as they can be viewed as containing second-order information of the image. In this direction, Bas et al.
[22] proposed a scheme where a Delaunay triangulation is computed on the set of feature points which are robust to geometric distortion. The watermark is then embedded into the resulting triangles, and detection is done using the correlation properties of the different triangles. The drawback of this method is that the extraction process may not accurately extract the watermark, especially when the feature points extracted from the original and distorted images do not match, because the sets of triangles generated during watermark insertion and detection are then different.

The Scale Invariant Feature Transform (SIFT) [30] is an image descriptor for image based matching developed by David Lowe. SIFT features have been used in many applications, such as multi-view matching [31, 32], object recognition [33], object classification [34, 35] and robotics [36]. They are also used for robust image watermarking against geometric attacks [24, 37, 38]. Miyaki et al. [37] proposed an RST invariant object based watermarking scheme where SIFT features are used for object matching. In the detection scheme, the object region is first detected by feature matching; the transformation parameters are then calculated, and the message is detected. Though the method produces quite promising results, it is a type of informed watermarking: a register file has to be shared between the sender and receiver, which may not always be desirable. In another SIFT based work, Kim et al. [24] inserted the watermark into circular patches generated by SIFT. The detection ratio of the method varies from 60% to 90% depending upon the intensity of the attack. It is observed that under strong distortions due to attenuation and cropping, this additive watermarking method sometimes fails to accurately detect the watermark. More recently, Jing et al. [38] used SIFT points to form a convex hull. The SIFT points are then optimally triangulated, and the watermark is embedded into circles centered around the centroid of each triangle. This method also fails to sustain the watermark when the image is scaled down considerably.

It is observed in the literature that the above mentioned image watermarking schemes are often directly used for frame by frame video watermarking. But there are certain limitations to frame by frame embedding: first, it can create flickering artifacts [17, 39]; second, these schemes are generally vulnerable to collusion attacks [17, 40, 41].

1.2.2 Scalable Video Watermarking

As discussed in the previous subsection, scalable video transmission is an emerging field of research in recent times. However, the literature reveals that relatively less attention has been paid to scalable video watermarking in comparison with general video watermarking. Lu [42] was possibly the first to characterize scalable watermarking and argued that the watermark should be detectable at every resolution and quality layer. Piper [20] mentioned graceful improvement as another important property of scalable watermarking, whereby watermark detection should become more and more accurate as the video quality improves (with the addition of enhancement layers).

Challenges of Scalable Video Watermarking

The main problem of scalable watermarking is that the bit budget for a scalable sub-stream is not known a priori, as the main bit stream can be truncated at any spatio-temporal bit truncation point.
In scalable watermarking, it is required to protect the base layer as well as the enhancement layers, which generally causes a substantial bit increase for the watermarked video. Keeping a low bit increase rate (BIR) for scalable video watermarking is a real challenge [43, 44]. Since the different scalable parameters, such as resolution, frame rate and quality, are different in nature, assuring combined watermarking security for all of them sometimes imposes conflicting demands; achieving combined scalable watermarking is thus a difficult task [20]. It is also observed that the statistical distribution of the transform domain coefficients of the base layer is substantially different from that of the enhancement layers, which makes multi-channel detection more complicated for incremental detection performance. Finally, watermarking zone selection becomes challenging in the presence of the inter-layer prediction structure of scalable coding.

Literature on Scalable Video Watermarking

Alattar et al. [45] proposed a compressed domain watermarking scheme for MPEG-4 that resists RST attacks by means of a synchronization template. In [41], Jung et al. proposed an RST invariant watermarking scheme where a content adaptive watermark signal is embedded in the Discrete Fourier Transform (DFT) domain of the video stream; the authors used a log-polar projection to detect the watermark. The problem with extending this work to scalable video is that it may not withstand quality and temporal adaptation, although it achieves the desired robustness if only resolution scaling is considered. Moreover, visual artifacts may be generated by the logarithmic mapping during watermark embedding. Chang et al. [46] combined encryption and watermarking to realize layered access control for a temporally scalable M-JPEG stream. They encrypted the enhancement layer and embedded the key needed to decrypt it in the base layer, so that the key receives stronger error protection than the content. Wang et al. [43] proposed a blind watermarking scheme for MPEG-4 where the watermark is embedded into FGS bit planes for authentication of the enhancement layer. One bit is embedded by forcing the number of non-zero bits T_j per bit plane j and block to be even or odd depending on the watermark. In another recent scheme, Y. Wang and A. Pearmain [47] proposed a blind scale-invariant watermarking scheme for MPEG-2. In this scheme, the authors embedded the watermark in a single frame (the middle frame) of a GOP (Group of Pictures) of 3 frames. It is observed that the scheme is vulnerable to a type I collusion attack [40, 41], as the watermark can easily be estimated by comparing the watermarked frame with the two adjacent non-watermarked frames. The scheme may also be vulnerable to a frame dropping attack, as the watermark for an entire GOP is lost if the single watermarked frame is dropped or replaced. Temporal artifacts may be caused by watermark embedding, as the scheme does not use motion compensated embedding. Moreover, since the watermark can only be extracted from the base layer, base layer computation is always required for watermark extraction from any of the enhancement layers. A RADON transformation based RST invariant watermarking scheme [48] was proposed by Liu and Zhao. In this scheme, the authors used a temporal DFT and embedded the watermark using RADON coefficients.
Objectionable visual artifacts may be caused in the watermarked video due to embedding by altering the RADON coefficients. Since the watermark embedded in the base layer is the same as in the different enhancement layers, a cross-layer collusion attack [40, 41] may be mounted on the watermarked video. Moreover, temporal motion may degrade the embedded watermark. In the schemes of [49, 50], 3D wavelet coefficients are used for watermark embedding. Since temporal motion is not considered in these schemes, they may suffer from flickering artifacts due to embedding and may produce visually degraded watermarked video [39]. Furthermore, these algorithms are not designed for recent scalable video coding techniques, and they do not introduce the concept of adaptation detection into the scalable watermark model. In the last few years, after the standardization of SVC [51] in 2007, very few works on watermarking of scalable video content have been published in the literature. Some significant works are described as follows.

Watermarking based on Temporal Filtering

It was observed in the previous section that flickering artifacts are caused by frame-by-frame watermarking. It is found in the literature [39, 40] that motion compensation in the temporal direction during embedding generally reduces these artifacts. On the other hand, temporal filtering can be used to resist frame dropping or frame averaging attacks. Considering these facts, there exists a group of scalable watermarking schemes which use motion compensated temporal filtering (MCTF) [52, 53, 54] to find a suitable embedding zone for watermarking [39, 40]. In this subsection, MCTF based schemes and their pros and cons are described. P. Vinod and P. K. Bora [40] used the MCTF structure for video watermarking to resist collusion attacks. First they segmented the video into different scenes; every scene is then temporally decomposed using wavelet based MCTF, and the watermark is embedded along the motion trajectory of the low pass frames. Bhowmik et al. [39] proposed a motion compensated spatio-temporal sub-band decomposition scheme, based on a modified MCTF, for video watermarking. They used a 2D+t+2D decomposition framework in which the video sequence is decomposed into different temporal and spatial levels and the best subband is chosen for embedding. Temporal decomposition is done using the modified MCTF, and embedding distortion is evaluated using MSE and a flicker difference metric. The proposed sub-band decomposition also has low computational cost, as MCTF is performed only on the sub-bands where the watermark is embedded. The authors proposed two approaches, one blind and one non-blind. In the non-blind approach, additive watermarking is used, where coefficients are increased or decreased according to Eqn. 1.1. Block diagrams of the embedding and extraction procedures for the blind technique are shown in Fig. 1.5 and Fig. 1.6 respectively.

$$C'_{s,t}[m,n] = C_{s,t}[m,n] + \alpha\, C_{s,t}[m,n]\, W \qquad (1.1)$$

where $C_{s,t}[m,n]$ and $C'_{s,t}[m,n]$ are the $[m,n]$-th original and corresponding watermarked coefficients respectively, $\alpha$ is the watermark strength and $W$ is the watermark signal.

Figure 1.5: Block Diagram for Watermark Embedding [39]

Figure 1.6: Block Diagram for Watermark Extraction [39]

The authors evaluated the scheme against only the quality scalability attack, which may not be useful in a real world situation where a video can be scaled in any dimension according to the requirement. Visual quality and bit increase rate (BIR) are not evaluated for the watermarked video. They also did not consider any zone selection or coefficient selection method, where more research can be done.
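To make the additive rule of Eqn. 1.1 concrete, the following minimal NumPy sketch applies it to an array of subband coefficients. It is an illustration only, not the implementation of [39]; the function name and the choice of a ±1 spread-spectrum watermark signal are assumptions made here for demonstration.

```python
import numpy as np

def embed_additive(coeffs, watermark, alpha=0.1):
    """Eqn. 1.1: C'[m,n] = C[m,n] + alpha * C[m,n] * W[m,n].

    `coeffs` holds one subband of (MCTF) coefficients; `watermark`
    holds the watermark signal W, here a +/-1 sequence."""
    return coeffs * (1.0 + alpha * watermark)

# Usage: embed a 4x4 +/-1 watermark into a 4x4 coefficient block.
rng = np.random.default_rng(seed=1)
C = rng.normal(size=(4, 4))
W = rng.choice([-1.0, 1.0], size=(4, 4))
C_w = embed_additive(C, W, alpha=0.1)
```

Because the distortion is proportional to the coefficient magnitude, larger coefficients carry a proportionally stronger watermark, which is the usual perceptual argument for this multiplicative form of spread-spectrum embedding.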
Watermarking in Compressed Domain

Meerwald et al. [44] proposed a compressed domain watermarking scheme for H.264/SVC. They extended a framework for robust watermarking of H.264-encoded video, proposed by Noorkami et al. [55], to scalable video coding (SVC). The main objective of this work is to handle spatial (resolution) scaling. The authors showed that watermark embedding in the base layer of the video is insufficient to protect the decoded video at higher enhancement layers, as the embedded watermark may fade in higher resolution layers; moreover, the bit rate of the enhancement layer may increase. To solve this problem, they up-sampled the base layer watermark signal and embedded it in the enhancement layer. First, the watermark is embedded in the base layer stream using Eqn. 1.2, proposed in [55]:

$$R^w_k = R_k + S_k \cdot W_k \qquad (1.2)$$

where $W_k$ is the watermark signal, $S_k$ is the location matrix, and $R_k$ and $R^w_k$ are the base layer residual block and the corresponding watermarked block. Then the up-sampled watermark signal is embedded using Eqn. 1.3:

$$R^{E,w}_k = R^E_k + W^E_k \qquad (1.3)$$

where $W^E_k$ is the up-sampled watermark signal and $R^E_k$ is the enhancement layer residual block. The proposed watermarking structure is shown in Fig. 1.7.

Figure 1.7: Block diagram of Meerwald's [44] watermark embedding method for two spatial layers

The problem with this approach is that the higher resolution video must be down-sampled to the base layer for watermark extraction; thus, it does not follow the convention given in [19]. It is also observed that the BIR for the enhancement layer is somewhat high, and the scheme becomes too complex for more than two enhancement layers.
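The two-layer structure of Eqns. 1.2 and 1.3 can be sketched on raw residual arrays as below. This is a simplified stand-in for the actual H.264/SVC bitstream operations in [44]; the nearest-neighbour 2x up-sampling and all names are assumptions chosen for illustration.

```python
import numpy as np

def embed_two_layers(R_base, R_enh, W, S):
    """Sketch of Eqns. 1.2-1.3: watermark the base-layer residual, then
    re-embed the same watermark, up-sampled 2x, into the enhancement-layer
    residual so the mark survives decoding at either resolution."""
    Rw_base = R_base + S * W               # Eqn. 1.2 (S masks the locations)
    W_up = np.kron(W, np.ones((2, 2)))     # W^E: nearest-neighbour up-sampling
    Rw_enh = R_enh + W_up                  # Eqn. 1.3
    return Rw_base, Rw_enh
```

Re-embedding an up-sampled copy keeps the two layers coherent and prevents the base-layer mark from fading when the enhancement-layer residual is added at the higher resolution.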
1.3 Motivation and Objectives

From the above literature, it is observed that most of the schemes do not perform well for relatively high degrees of resolution scaling. Temporal adaptation is also not properly taken care of, and issues such as temporal synchronization for heterogeneous motion and collusion resistant watermarking should be properly handled. Suitable zone selection for maintaining a decent visual quality of the watermarked video during scalable watermarking is also an important issue and should be explored further. Finally, SIFT based schemes look very promising, and how SIFT features can be used more efficiently for watermarking may be another very interesting study. Motivated by these issues, the main objective of this work is to enhance the robustness of watermarking schemes against content adaptation attacks while maintaining a decent visual quality of the watermarked video. This has been carried out by:

1. Proposing robust watermarking against the resolution, temporal and quality adaptation attacks.

2. Maintaining a decent visual quality of the watermarked video by proposing suitable zone selection methods for watermarking.

3. Proposing robust watermarking schemes by exploiting SIFT features against temporal and resolution adaptation.

1.4 Contribution of the Thesis

1.4.1 Watermarking against resolution and quality scalability

In the first work, a watermarking scheme against resolution and quality scalability is proposed, where base layer embedding is done on the DC frame, which is generated by accumulating the DC values of non-overlapping blocks of every frame in the input video sequence. The DC frame sequence is up-sampled and subtracted from the original video sequence to generate the residual frame sequence. Discrete Cosine Transform (DCT) based temporal filtering is then applied to both the DC and the residual frame sequences. The watermark is embedded in the low pass DC frames, and the up-sampled watermark is embedded in the low pass residual frames, to achieve graceful improvement of the watermark signal in successive enhancement layers. It is experimentally shown that the proposed scheme performs well against resolution and quality adaptation and outperforms existing related schemes.

1.4.2 Watermarking against temporal and quality scalability

In the next phase, a blind scalable video watermarking scheme is proposed which is robust against quality and temporal scalability. In the proposed scheme, DCT based temporal filtering and wavelet based spatial filtering are used for selecting the watermark embedding zone. Temporal filtering is applied to each GOP to exploit the correlation among frames, and the watermark is embedded in the low pass frames. In this work, a location map is used to accurately describe the embedding locations for efficient and blind watermark extraction.

1.4.3 Watermarking based on SIFT

In this work, a novel image watermarking scheme is proposed which is robust to resolution scaling. The proposed method uses SIFT features, which are invariant to scaling. Most of the existing SIFT based watermarking algorithms fail to retain the watermark when the image is heavily scaled down. In the proposed scheme, a context coherent object or patch is inserted into the image such that it generates strong SIFT features, and these newly generated SIFT features are themselves used as the watermark. Since SIFT features are invariant to scaling, they can be extracted from any image resolution with high probability.

1.4.4 Watermarking against temporal scalability

In the final phase of the work, two watermarking schemes are proposed against temporal scalability. In the first work, SIFT features are used to handle temporal scalability. A cubic region of the video is modified to generate new SIFT features, which are stored in a database as the watermark. The modification is done in a low motion area of a randomly selected frame set, as embedding in a high motion area often creates flickering artifacts. To resist temporal scaling and frame dropping attacks, SIFT features are extracted from the side plane. In the proposed scheme, low motion and high texture zones of n selected consecutive frames are chosen as the embedding location. The effectiveness of the scheme is experimentally justified against temporal adaptation and frame dropping attacks. In the second work, each temporal layer is embedded with a different watermark, generated by DCT domain decomposition of a single watermark image, to ensure graceful improvement of the extracted watermark along successive higher layers. A zigzag sequence of the block DCT coefficients
of the watermark image is partitioned into non-overlapping sets, and each set is embedded separately into a different temporal layer. The base layer is embedded with the first set of DCT coefficients (which includes the DC coefficient of each block), and successive layers are embedded with successive non-overlapping coefficient sets. The coefficients of each set are chosen in such a fashion that a uniform energy distribution across all temporal layers is maintained.

1.5 Thesis Organization

This PhD dissertation consists of seven chapters. The first chapter consists of a brief introduction to scalable image and video watermarking, a brief literature survey, the research motivation and objectives, the contributions of the thesis, and the organization of the thesis.

• Chapter 2 describes the background of the research, which includes preliminary concepts such as MCTF and SIFT, evaluation metrics and the experimental data set used in later chapters.

• In the third chapter, a robust watermarking algorithm against resolution and quality scaling is presented, where the watermark is first embedded in the base layer and the up-sampled watermark is then embedded in the enhancement layer to achieve graceful improvement.

• Chapter 4 introduces an MCDCT-TF based robust watermarking scheme against temporal and quality scalability. The watermark is embedded in the temporal low pass frames, and a location map is used for watermark extraction.

• In Chapter 5, SIFT based scale invariant image watermarking is presented. In this scheme, the intensity of an image patch is changed to generate new SIFT features, which are stored as the watermark.

• Chapter 6 describes two different schemes against temporal scalability and temporal attacks. In the first work, SIFT features of the side planes of the video sequence are used for watermarking to handle temporal scalability. In the second work, the frames of different temporal layers are embedded with different watermarks.

• Chapter 7 briefly summarizes the PhD dissertation and suggests future research directions.

1.6 Summary

In this introductory chapter, the domain of the research is first defined. A brief literature survey is then presented and the corresponding limitations are identified. Based on these limitations, the motivation and the objectives of the research work are formulated. Finally, a brief description of the contributions and the organization of the thesis are presented.

Chapter 2
Research Background

In this chapter, a brief overview of the mathematical preliminaries and theoretical foundations relevant to the topics of interest is presented. This includes a discussion of MCTF, SIFT and RARE2012. In addition, the different evaluation parameters for the proposed algorithms and the corresponding data set used for experimentation are also described.

2.1 Motion Compensated Temporal Filtering (MCTF)

MCTF [54, 52, 53] is, as its name suggests, the low pass filtering of the input frames along the motion direction. It is used to remove temporal correlation within the sequence [54]. The input frames need to be aligned along the motion trajectories for better de-correlation. A basic MCTF based on the Haar wavelet transformation is described in Fig. 2.1. In Fig. 2.1, each odd frame is predicted from the previous even frame using the operator P. The predicted frame is then subtracted from the original frame to obtain the residual error frame (high-pass temporal frame), or H-frame.
The low-pass frame is generated by adding the residual information to the reference frame via the U (Update) operator, which performs an additional Motion Compensation (MC) stage using the reversed motion field. The Haar wavelet is a short filter and provides limited de-correlation; a longer filter can make better use of the correlation in the temporal domain. A DCT based temporal filtering which uses a longer filter is described in the next subsection.

Figure 2.1: Motion Compensated Temporal Filtering (figure courtesy [54])

2.1.1 Motion Compensated DCT based Temporal Filtering (MCDCT-TF)

MCTF has been used for video watermarking in [39, 40]. While those watermarking schemes used a 2-tap Haar filter for MCTF, a longer filter can make better use of the correlation in the temporal domain. Atta et al. used a DCT based temporal filtering (MCDCT-TF) in [56] for scalable video encoding. They used a Group of Frames (GOF) of size 9; temporal filtering is first done on sub-GOFs of 3 frames, and for the next level of temporal decomposition they used the 2 low pass frames from the first level. In our work, a variant of MCDCT-TF is used for scalable watermarking in which one low pass frame is used for the next level of filtering, to avoid overlapping of watermark information. In this work, the video sequence is divided into groups of N frames, which are again subdivided into groups of K frames. After applying a K × 1 temporal DCT, a new sequence of N/K low pass frames is formed; by recursive decomposition of the video sequence we finally get one low pass frame. The procedure of MCDCT-TF for GOFs of 3 frames is explained in Fig. 2.2, where L, M and H are the low, middle and high frequency frames respectively. For simplicity, all equations are written under the above assumption (GOF = 9, sub-GOF = 3).

Figure 2.2: DCT based MCTF (L, M, H are low, middle and high frequency frames respectively)

The predicted frames are given by Eqn. 2.1 and Eqn. 2.2.

From the first frame to the second frame:

$$I^1_{3t+2}[m,n] = I_{3t+1}[m + H^{1\to 2},\, n + V^{1\to 2}] \qquad (2.1)$$

From the third frame to the second frame:

$$I^3_{3t+2}[m,n] = I_{3t+3}[m + H^{3\to 2},\, n + V^{3\to 2}] \qquad (2.2)$$

where $I_{3t+1}$, $I_{3t+2}$ and $I_{3t+3}$ are the 3 frames in a sub-GOF, and $(H^{1\to 2}, V^{1\to 2})$, $(H^{3\to 2}, V^{3\to 2})$ are the motion vectors of $I_{3t+2}$ with respect to $I_{3t+1}$ and $I_{3t+3}$ respectively. After motion compensation, the frames are aligned along the motion trajectories, and a (3 × 1) temporal DCT (Eqn. 2.3) is applied pixel by pixel. The temporal DCT yields frames at three frequency levels:

• the low frequency, high energy frame L;
• the middle frequency frame M;
• the high frequency, low energy frame H.

$$\begin{bmatrix} L[m,n] \\ M[m,n] \\ H[m,n] \end{bmatrix} = A \times \begin{bmatrix} I^1_{3t+2}[m,n] \\ I_{3t+2}[m,n] \\ I^3_{3t+2}[m,n] \end{bmatrix} \qquad (2.3)$$

where

$$A = \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{6}} & -\sqrt{\frac{2}{3}} & \frac{1}{\sqrt{6}} \end{bmatrix}$$

is the 3 × 1 DCT kernel.

2.1.2 Inverse Motion Compensated DCT based Temporal Filtering (IMCDCT-TF)

For the inverse transform, a 3 × 1 inverse DCT is applied to L, M and H as in Eqn. 2.4; inverse motion compensation is then applied to the result using Eqn. 2.5 and Eqn. 2.6.

$$\begin{bmatrix} I^1_{3t+2}[m,n] \\ I_{3t+2}[m,n] \\ I^3_{3t+2}[m,n] \end{bmatrix} = A^T \times \begin{bmatrix} L[m,n] \\ M[m,n] \\ H[m,n] \end{bmatrix} \qquad (2.4)$$

where $A^T$ is the transpose of matrix $A$.

$$I_{3t+1}[m,n] = \begin{cases} I^1_{3t+2}[m - H^{1\to 2},\, n - V^{1\to 2}], & \text{for fully connected pixels} \\ I_{3t+1}[m,n], & \text{for all others} \end{cases} \qquad (2.5)$$

$$I_{3t+3}[m,n] = \begin{cases} I^3_{3t+2}[m - H^{3\to 2},\, n - V^{3\to 2}], & \text{for fully connected pixels} \\ I_{3t+3}[m,n], & \text{for all others} \end{cases} \qquad (2.6)$$
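To make Eqns. 2.3 and 2.4 concrete, the following NumPy sketch applies the 3 × 1 DCT kernel A pixel-wise to a motion-aligned sub-GOF and inverts it. It assumes the three frames have already been motion compensated (Eqns. 2.1 and 2.2); the function names are hypothetical, and this is an illustration rather than the thesis implementation.

```python
import numpy as np

# Orthonormal 3-point DCT kernel A of Eqn. 2.3 (rows: low, middle, high).
A = np.array([[1/np.sqrt(3),  1/np.sqrt(3),   1/np.sqrt(3)],
              [1/np.sqrt(2),  0.0,           -1/np.sqrt(2)],
              [1/np.sqrt(6), -np.sqrt(2/3),   1/np.sqrt(6)]])

def temporal_dct(f1, f2, f3):
    """Pixel-wise 3x1 DCT (Eqn. 2.3) of a motion-aligned sub-GOF.
    Returns the L, M and H frames stacked along axis 0."""
    stack = np.stack([f1, f2, f3]).astype(float)   # shape (3, H, W)
    return np.tensordot(A, stack, axes=1)          # shape (3, H, W)

def inverse_temporal_dct(L, M, H):
    """Inverse transform (Eqn. 2.4): A is orthonormal, so A^{-1} = A^T."""
    return np.tensordot(A.T, np.stack([L, M, H]), axes=1)
```

Because A is orthonormal, the decomposition is perfectly invertible for fully connected pixels; unconnected and partially connected pixels are treated separately, as described next.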
2.1.3 Connected and unconnected pixels

If the video frames are filtered along all motion trajectories, then some pixels are filtered more than once and some pixels are not filtered at all. To avoid this problem, Ohm [57] categorized pixels into two classes, "connected" and "unconnected", according to their estimated motion vectors. In [39], the problem of unconnected pixels in the MC temporal filtering phase was considered between every pair of input frames. A different strategy is required when MCDCT-TF is applied to three frames; in [56], pixels are categorized into three different categories. The basic mechanism of treating unconnected pixels in our DCT temporal analysis of three frames is illustrated in Fig. 2.3.

Figure 2.3: Connected and unconnected pixels: (a) fully connected pixels, (b) unconnected pixels, (c) partially connected pixels, (d) one-to-many connection

During MCDCT-TF we get three kinds of pixels:

1. Fully connected pixels: the pixels of the second frame that have a connection from both the first and the third frame. The white pixels in the motion compensated frames in Fig. 2.3 are connected pixels.

2. Unconnected pixels: these pixels occur only in the first and third reference frames. The black pixels in the motion compensated frames in Fig. 2.3 are unconnected pixels.

3. Partially connected pixels: when a pixel in the second frame is connected to a pixel in the first frame but unconnected to any pixel in the third frame, or connected to a pixel in the third frame but unconnected to any pixel in the first frame, it is considered partially connected. The grey pixels in Fig. 2.3 are partially connected pixels.

There are pixels in the reference frames (first or third) which are connected to more than one pixel in the second frame; any one of these connections is selected during motion compensation.
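The pixel classification above can be made concrete by counting, for each pixel of a reference frame, how many motion vectors from the middle frame land on it: zero hits correspond to unconnected pixels, one hit to connected pixels, and several hits to the one-to-many case, of which only one connection is kept. The sketch below is an illustration under an assumed data layout (a dict mapping pixel coordinates to motion vectors), not the thesis implementation.

```python
import numpy as np

def connection_counts(motion_vectors, shape):
    """Count motion-vector hits per reference-frame pixel.
    0 -> unconnected, 1 -> connected, >1 -> one-to-many connection."""
    hits = np.zeros(shape, dtype=int)
    height, width = shape
    for (y, x), (dy, dx) in motion_vectors.items():
        ry, rx = y + dy, x + dx                 # landing point in reference
        if 0 <= ry < height and 0 <= rx < width:
            hits[ry, rx] += 1
    return hits
```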
2.2 Scale Invariant Feature Transform

It is observed in the literature that SIFT based watermarking performs well against RST attacks. The SIFT [30] algorithm extracts distinctive features of local image patches and is proved to be invariant to image scaling and rotation. SIFT descriptors are robust to noise and to changes in illumination, distortion and viewpoint. These local invariant features are highly distinctive and are matched with a high probability against large image distortions. The SIFT descriptor extracts features and their properties, such as the location (x, y), the scale (σ) and the orientation (θ). The four major steps for finding SIFT descriptors are: (1) scale-space extrema detection; (2) keypoint localization; (3) orientation assignment; (4) keypoint descriptor.

1. Scale-space extrema detection:
Given an input image I(x, y), the scale space of the image can be defined through the Difference of Gaussians (DOG) as in Eqn. 2.7 of [30]:

$$DOG(x,y,\sigma) = \left[ G(x,y,k\sigma) - G(x,y,\sigma) \right] * I(x,y) = L(x,y,k\sigma) - L(x,y,\sigma) \tag{2.7}$$

where $*$ is the convolution operation in the x and y directions, and

$$G(x,y,\sigma) = \frac{1}{2\pi\sigma^{2}} \exp\!\left( -\frac{x^{2}+y^{2}}{2\sigma^{2}} \right) \tag{2.8}$$

is the Gaussian kernel, σ denoting its standard deviation. The scale-space extrema are detected from the points (x, y, σ) in scale space at which the scale-normalized Laplacian assumes local extrema with respect to space and scale. In a discrete setting, such comparisons are usually made with respect to all neighbors of a point in a 3 × 3 × 3 neighborhood over space and scale.

2. Keypoint localization:
Many unstable keypoint candidates are detected by the scale-space extrema detection. In this step, only stable keypoints are retained and unstable points are filtered out: points with low contrast or located along edges are rejected. Details are given in [30].

3. Orientation assignment:
Each keypoint retained in the previous step is assigned one or more orientations based on the local image gradient direction, to make it rotation invariant. The keypoint orientation is calculated from an orientation histogram of local gradients in the closest smoothed image L(x, y, σ). For each image sample L(x, y) at the keypoint's scale σ, the gradient magnitude mag(x, y) and orientation θ(x, y) are computed using pixel differences, as in Eqn. 2.9 and Eqn. 2.10 respectively:

$$mag(x,y) = \sqrt{L_{1}^{2} + L_{2}^{2}} \tag{2.9}$$

$$\theta(x,y) = \arctan(L_{2}/L_{1}) \tag{2.10}$$

where $L_{1} = L(x+1,y,\sigma) - L(x-1,y,\sigma)$ and $L_{2} = L(x,y+1,\sigma) - L(x,y-1,\sigma)$. An orientation histogram is then formed, and the peak of this histogram is selected as the direction of that feature.

4. Keypoint descriptors:
In this step, a distinctive keypoint descriptor is computed for each keypoint to make it invariant to illumination change, 3D viewpoint, etc. First, the gradient magnitude and orientation at each image sample point in a region around the keypoint location are computed in a 16 × 16 neighborhood. Then a weighted histogram of these samples is formed for each of the 16 sub-regions of size 4 × 4. A 128-dimensional descriptor is formed by concatenating these 16 histograms.

Descriptor matching: to match the computed keypoint descriptors with another set of descriptors, the nearest keypoint, i.e. the one with the minimum Euclidean distance, is found. To reduce ambiguous matches, only those matches are retained for which the ratio between the distances to the nearest and the second nearest point is less than 0.8.
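In practice, the keypoint extraction and the 0.8 ratio test above can be reproduced with OpenCV's SIFT implementation; the following sketch (file names are placeholders, and OpenCV >= 4.4 is assumed) is illustrative rather than the exact pipeline used in this thesis:

```python
import cv2

img1 = cv2.imread('reference.png', cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread('distorted.png', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints carry (x, y), scale, orientation
kp2, des2 = sift.detectAndCompute(img2, None)

# Euclidean-distance matching followed by the ratio test (threshold 0.8).
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.8 * n.distance]
```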
2.3 Visual saliency model RARE2012

In Chapter 5, a saliency map is used for embedding zone selection. A brief description of the saliency model RARE2012 [58] is given in this section. In [58], the authors use multi-scale rarity to select the information which attracts human attention. The saliency of a region is calculated in three major steps. First, low-level color features and medium-level orientation features are extracted by principal component analysis and Gabor filtering. By convolving with Gabor filters at 8 different orientations, 8 maps ($map_i$) are generated. Each map is then allotted an efficiency coefficient using Eqn. 2.11. The maps are sorted according to their efficiency and each map is weighted according to its rank using Eqn. 2.12, as mentioned in [58]:

$$EC_{i}^{2} = (max_{i} - mean_{i})^{2} \tag{2.11}$$

$$\forall i \in [1,N], \quad \begin{cases} M_{i} = \frac{i}{N} \times map_{i} & \text{if } \frac{EC_{i}}{EC_{n}} \ge T \\ M_{i} = 0 & \text{otherwise} \end{cases} \tag{2.12}$$

Then a multi-scale rarity mechanism is applied on the combination of the maps. Finally, the rarity maps are fused into a single final saliency map. A flow chart of the algorithm is given in Fig. 2.4. The authors claim that the model outperforms other recent attention models.

Figure 2.4: RARE2012 flow chart [58].

2.4 Evaluation Parameters

2.4.1 Visual quality parameters

In this work, Peak Signal to Noise Ratio (PSNR), Structural Similarity (SSIM) [59], the flicker metric [60] and the Video Quality Metric (VQM) [61] are measured using the MSU video quality measurement tool [62] to quantify the distortion due to watermark embedding. These evaluation parameters are described below.

PSNR: PSNR is the ratio between the maximum possible value (power) of a signal and the power of the distorting noise that affects the quality of the signal. PSNR is calculated using Eqn. 2.13:

$$PSNR = 10 \log_{10} \frac{Peak^{2} \times M \times N}{\sum_{i=1}^{M} \sum_{j=1}^{N} \left( I(i,j) - I_{w}(i,j) \right)^{2}} \tag{2.13}$$

where Peak is the maximum possible intensity, M and N are the width and height of the frame/image respectively, I is the original image and $I_{w}$ is the modified image.
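A direct transcription of Eqn. 2.13 (a small numpy sketch, with Peak = 255 assumed for 8-bit frames) reads:

```python
import numpy as np

def psnr(I, Iw, peak=255.0):
    """PSNR of Eqn. 2.13 between an original frame I and a modified frame Iw."""
    diff = I.astype(np.float64) - Iw.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```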
SSIM: The SSIM index is an image quality metric. It is a function of three components, namely luminance similarity l(x, y), contrast similarity c(x, y) and structural similarity s(x, y), as given in Eqn. 2.14:

$$SSIM(x,y) = [l(x,y)]^{\alpha} \cdot [c(x,y)]^{\gamma} \cdot [s(x,y)]^{\eta} \tag{2.14}$$

where α, γ and η are parameters used to adjust the relative importance of the three components. In [62], SSIM is calculated for each frame; a detailed description of the algorithm is given in [59].

Temporal flicker: to determine the temporal flickering, the average brightness value is first calculated for each frame. The flickering metric is then the modulus of the difference between the average brightness values of the previous and current frames [61]. In this work, the flicker difference between the original video and the watermarked video is calculated.

VQM [61]: VQM is a DCT based video quality metric. To calculate VQM, each DCT coefficient is converted to a local contrast (LC) value using Eqn. 2.15:

$$LC(i,j) = DCT(i,j) \times \left( DC/1024 \right)^{0.65} / DC \tag{2.15}$$

where DC is the DC component of the block, 1024 is the mean DCT value for an 8-bit image, and 0.65 is the parameter that best fits psychophysics data, as claimed by the authors of the referred paper. The LC coefficients are then converted to just-noticeable differences (JND) by multiplying each coefficient by its corresponding entry in the spatial contrast sensitivity function (SCSF) matrix. A weighted pooling of the mean and maximum distortion is then done using Eqn. 2.16:

$$\begin{aligned} Mean\_dist &= 1000 \times \mathrm{mean}(\mathrm{mean}(|diff|)) \\ Max\_dist &= 1000 \times \max(\max(|diff|)) \\ VQM &= Mean\_dist + 0.005 \times Max\_dist \end{aligned} \tag{2.16}$$

The maximum distortion weight 0.005 was chosen on the basis of several primitive psychophysics experiments, and the factor 1000 is a standardization ratio. The MSU video quality measurement tool [62] implements a modified version of [61].

Watson metric: the Watson metric [63] is a DCT based perceptual error measure. The quantization errors for each coefficient in each block are scaled by the corresponding visual sensitivities of each DCT basis function in the block. The visual sensitivities are determined by three factors: contrast sensitivity, luminance masking and contrast masking. Initially, a luminance threshold $t_{ij}$ is computed for each DCT basis function as a function of the mean luminance of the display, taking contrast sensitivity into account. Then $t_{ij}$ is adjusted using an approximation of luminance masking, i.e. the local mean luminance within the image. The masked threshold $m_{ijk}$ is then computed by considering the contrast masking. From the masked threshold $m_{ijk}$ and the quantization error $e_{ijk}$, the perceptual error in each frequency of each block is given by:

$$d_{ijk} = \frac{e_{ijk}}{m_{ijk}} \tag{2.17}$$

The total perceptual error is then given by:

$$d(i,i') = \frac{1}{N^{2}} \left[ \sum_{i,j} \left( \sum_{k} d_{ijk}^{\beta_{s}} \right)^{\frac{\beta_{f}}{\beta_{s}}} \right]^{\frac{1}{\beta_{f}}} \tag{2.18}$$

Watson recommends $\beta_{s} = \beta_{f} = 4$. In our scheme, the total perceptual error d(i, i′) is calculated by taking i as the original image and i′ as the modified image. The details are given in [63].

2.4.2 Robustness parameter

To measure the robustness of the watermarking scheme, the Hamming distance between the extracted watermark and the original watermark is calculated. The Hamming distance H between the original watermark W and the extracted watermark W′ is calculated using Eqn. 2.19:

$$H = \frac{1}{L} \sum_{i=1}^{L} W_{i} \oplus W'_{i} \tag{2.19}$$

where L is the length (size) of the watermark signal.
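Eqn. 2.19 amounts to the fraction of disagreeing bits; a two-line sketch, assuming the two watermarks are equal-length arrays of 0/1 values:

```python
import numpy as np

def hamming_distance(W, Wp):
    """Normalized Hamming distance of Eqn. 2.19 (0 means perfect extraction)."""
    return float(np.mean(np.asarray(W, dtype=int) ^ np.asarray(Wp, dtype=int)))
```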
2.5 Experimental Dataset

The data set (video sequences and images) used for the experimentation covers a wide range of different videos and images. The data sets are listed in Table 2.1.

Table 2.1: Experimental data set

Video sequences used | Bus, Foreman, Crew, Coast Guard, Mobile, Akiyo, City, Hall, Mother Daughter, Sunflower, Pedestrian area, News
Image data sets | Complex Scene Saliency Dataset (CSSD), Extended Complex Scene Saliency Dataset (ECSSD) [64], Caltech 256 dataset [65], LabelMe-12-50k dataset [66], and a few other standard images, e.g. Lena, Cameraman, Baboon
Video resolution | 1080p (Full HD), CIF, 4CIF
Watermark signal | 32 × 32 and 64 × 64 binary image

2.6 Summary

In this chapter, a few background concepts, e.g. MCTF and SIFT, as well as a few evaluation parameters have been explained. These concepts are used in different phases of the watermarking schemes proposed in the later chapters, and the evaluation parameters described here are used to measure the efficiency of those schemes. The data sets used for the experimentation have also been summarized in this chapter.

Chapter 3
Robust Video Watermarking against Resolution and Quality Scalability

3.1 Introduction

As mentioned in the introduction chapter, there are two main characteristics of scalable video watermarking: firstly, the watermark should be extractable from each layer, and secondly, the robustness of the watermark should increase with the addition of enhancement layers. It has been observed in the literature (refer to Sec. 1.2) that the existing schemes against resolution scaling [43, 44] fail to achieve both of these requirements.

In this chapter, a video watermarking algorithm is proposed which is robust against spatial and quality adaptation attacks. In the proposed scheme, the watermark can be extracted from all the layers, and the scheme achieves the graceful improvement in the successive higher layers for both resolution and quality scaling. To handle the spatial and quality adaptation attacks, the watermark is embedded by altering the coefficients of the motion compensated temporally filtered DC frames, which are generated by accumulating the DC values of non-overlapping blocks of the original video frames. The proposed scheme is described in the subsequent sections.

3.2 Background

In the proposed scheme, the watermark is embedded in the DCT based motion compensated coefficients of the DC frames. DC frames are generated by accumulating the DC values of non-overlapping blocks of a given size from the original frame. The main motivation of the proposed scheme is to prevent the degradation of the embedded watermark signal when a controlled resolution scaling is performed on the watermarked video frame to achieve resolution scalability; DC frame based embedding helps to handle this controlled resolution scaling. A detailed discussion of the DC frame is presented in Sec. 3.2.1.

Another important motivation of the proposed scheme is to achieve a graceful improvement of the watermark signal with higher enhancement layers. The most essential requirement for this improvement is to maintain the spatial synchronization of the watermark embedding locations between successive layers. In other words, in every enhancement layer two watermark signals are added (one up-sampled from the previous layer and one embedded in the residual of the corresponding enhancement layer), and they should be co-located. Intuitively, this spatial coherency yields the graceful improvement of the embedded watermark signal in successive enhancement layers. The concept is elaborated in Sec. 3.2.2.

In this work, motion compensated DCT based temporal filtering (MCDCT-TF) is applied to the DC frames and the low pass frames are selected for embedding. Embedding in the low pass frames spreads the embedded watermark into all frames, so that frame dropping and collusion attacks can be resisted: during the inverse MCDCT-TF, the additive noise due to embedding affects all frames in a GOP, and thus the watermark is spread over all the frames in that GOP. Motion coherent embedding also helps to reduce temporal flickering. MCDCT-TF is discussed in Sec. 2.1.1. During MCDCT-TF, only temporally connected pixels are considered for temporal filtering and for watermark embedding; the concept of connected and unconnected pixels is described in Sec. 2.1.3.

Figure 3.1: Block DCT of a video frame.

3.2.1 DC Frame

The concept of DC frame based watermarking was first introduced by Y. Wang et al. [47], where the DC frame is generated by accumulating the DC values (after a 2D block DCT transform) of the non-overlapping 8 × 8 blocks within a video frame. Embedding in the DC frame results in a spreading of the watermark signal during the up-sampling process used to generate the enhancement layer frames. In the proposed scheme, the size of the DC frame is fixed and the non-overlapping block size is determined by the size of the full resolution video frame. The DC frame formation in this work is depicted in Fig. 3.1. Since the DC frame size (P × Q) is fixed, the video frame (M × N) is divided into non-overlapping blocks of size $(\frac{M}{P} \times \frac{N}{Q})$. The DC frame is generated by accumulating the DC coefficients of all such non-overlapping blocks after the 2D block DCT.
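The DC frame construction of Fig. 3.1 can be sketched as follows (scipy's dctn is used here for the 2D block DCT; the block sizes are assumed to divide the frame exactly, and the function name is illustrative):

```python
import numpy as np
from scipy.fft import dctn

def dc_frame(frame, P, Q):
    """Build a fixed-size P x Q DC frame from an M x N frame (Sec. 3.2.1)."""
    M, N = frame.shape
    bh, bw = M // P, N // Q                 # non-overlapping block size M/P x N/Q
    dc = np.empty((P, Q))
    for i in range(P):
        for j in range(Q):
            block = frame[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            dc[i, j] = dctn(block, norm='ortho')[0, 0]   # keep only the DC value
    return dc
```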
3.2.2 Graceful Improvement of the Watermark

Graceful improvement [20] means the improvement in the quality of the extracted watermark signal along with the increase in video quality (the addition of successive enhancement layers with respect to the different scalable parameters). Intuitively, there are two main problems associated with DC frame based embedding in achieving graceful improvement. Firstly, since the enhancement layers are predicted from the base layer, the watermark signal embedded in the base layer is also up-sampled into the higher layers, and there is a chance of watermark signal degradation when the residual components are added at the different higher layers. Secondly, as a consequence, the watermark signal degrades in every successive higher layer due to this error propagation. So a graceful up-gradation of the watermark signal is not possible; rather, there is a chance of continuous degradation of the watermark signal towards the higher layers. Moreover, the enhancement layer residual component (to be added in each enhancement layer to the up-sampled version from the lower layer) is not secured.

As a counter-measure, an up-sampled watermark can be separately added to the residual component of each enhancement layer, which is then added to the up-sampled version from the previous lower layer [44]. The main challenge in this technique is that the two watermark signals (the one up-sampled from the last lower layer of the original signal, and the one embedded in the residual component of the current enhancement layer) must be spatially co-located, so that the addition of the two watermark signals upgrades their signal strength rather than degrading it. A location map (Lmap) is used in the proposed scheme to maintain this spatial coherency when adding the two watermark signals in any enhancement layer. The proposed location map based technique for achieving the said spatial coherency is depicted in Fig. 3.2. As observed in Fig. 3.2, the location map (Lmap) is obtained during the embedding in the DC frame. The location map, which is a binary matrix, is up-sampled (according to the resolution ratio between the base layer (DC frame) and the corresponding enhancement layer) to get the required location map (LmapU) for a particular enhancement layer. The up-sampled location map (LmapU) is used during the embedding in the residual frame of that enhancement layer to achieve spatially co-located watermark embedding.

Figure 3.2: Location map based technique for spatial coherency.
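Since Lmap is a binary matrix, the up-sampling to an enhancement layer reduces to replicating each entry over the co-located block; a one-line sketch assuming an integer scale ratio (sy, sx):

```python
import numpy as np

def upsample_lmap(lmap, sy, sx):
    """LmapU of Sec. 3.2.2: each base-layer location marks the whole
    co-located sy x sx block of the enhancement layer."""
    return np.kron(lmap, np.ones((sy, sx), dtype=lmap.dtype))
```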
3.3 Proposed Scheme

3.3.1 Watermark Embedding Scheme

In this subsection, the proposed watermark embedding scheme is described. The embedding scheme has three modules: first, the embedding zone selection, which uses spatio-temporal filtering together with a robustness threshold and a visual quality threshold; then the base layer embedding and the enhancement layer embedding schemes. The block diagram of the overall watermark embedding scheme is depicted in Fig. 3.3.

Figure 3.3: Watermark embedding model.

Watermark Zone Selection

In this part, a block based zone selection is described for embedding the watermark signal. Each frame of the video sequence I is subjected to a 2D block DCT transform as described in Fig. 3.1, and the DC values of the non-overlapping blocks of size $(\frac{M}{P} \times \frac{N}{Q})$ (Sec. 3.2.1) are accumulated to obtain the DC frame sequence C, which is considered as the base layer.

The base layer frames, i.e. the DC frames C, are up-sampled to obtain the predicted enhancement layer frame sequence E using Eqn. 3.1:

$$E_{i} = \;\uparrow C_{i} \tag{3.1}$$

where i is the frame index. The up-sampled DC frames are subtracted from the original enhancement layer video frames to get the residual frame sequence R using Eqn. 3.2:

$$R_{i} = V_{i} - E_{i} \tag{3.2}$$

MCDCT-TF is employed (as described in Fig. 2.2) on the extracted base layer as well as on the residuals of the enhancement layers. Motion compensation is done using Eqn. 2.1 and Eqn. 2.2, and the connected and unconnected pixel regions are identified as shown in Fig. 2.3. Using Eqn. 2.3, the DCT based temporal filtering produces the low pass coefficients Ct from the motion compensated base layer and Rt from the motion compensated residual layer.

In the proposed watermarking scheme, the watermark is embedded in the base layer (DC frame) by modifying the DC values. Three consecutive coefficients of a low pass frame (namely $Ct_1$, $Ct_2$, $Ct_3$) are used for embedding one watermark bit. It may sometimes happen that the absolute difference between $Ct_2$ and the average of $Ct_1$ and $Ct_3$, i.e. $|Ct_2 - (Ct_1 + Ct_3)/2|$, is relatively high. In such a case, the embedding adds relatively strong noise, which may cause flickering artifacts [60]. As a counter-measure, an adaptive visual quality threshold $V_{th}$ is incorporated, as described in Eqn. 3.3, to select coefficients for which the embedding noise stays within an acceptable limit. The value of the threshold is $V_{th} = \frac{(Ct_1 + Ct_3)}{2}\,\alpha$, where α is the robustness threshold used to control the embedding strength of the watermark against content adaptation attacks. Generally the value of α is taken very close to 0 to avoid objectionable artifacts; in this scheme, α is taken as 0.01 for the experimentation.

$$\left. \begin{array}{l} \text{if } \left| \frac{Ct_1 + Ct_3}{2} - Ct_2 \right| \le V_{th} \text{ then the coefficient trio is selected for embedding} \\ \text{else the coefficient trio is rejected} \end{array} \right\} \tag{3.3}$$

where $|\cdot|$ denotes the absolute value and $Ct_1$, $Ct_2$ and $Ct_3$ are three consecutive coefficients of a low pass frame. If the coefficients do not satisfy the visual threshold, the middle value is increased (or decreased) beyond a certain limit to mark the corresponding set of coefficients as unsuitable for embedding, so that at extraction time the decoder can easily identify the set as a non-embedded set. A location map (Lmap) is derived during the base layer (DC frame) embedding, which is used to select the embedding coefficients during any enhancement layer embedding (refer to Sec. 3.2.2).

Base Layer Embedding

In the proposed scheme, the watermark is embedded into the motion compensated low pass DC frames as shown in Fig. 2.2. The proposed blind watermark embedding scheme is depicted in Fig. 3.3. A set of 3 consecutive coefficients $[Ct_1, Ct_2, Ct_3]$ which satisfies the visual quality threshold (refer to Eqn. 3.3) is selected for embedding using Eqn. 3.4:

$$Ct'_2 = \frac{(Ct_1 + Ct_3)}{2} + \left| \frac{(Ct_1 + Ct_3)}{2} \right| \times \alpha \times W_i \tag{3.4}$$

where $W_i \in \{0, 1\}$ is the watermark bit and α is the robustness threshold (watermarking strength). $Ct'_2$ is the watermarked coefficient corresponding to $Ct_2$. The embedding location is saved in a location map (Lmap); an up-sampled version of the location map (LmapU) is later used to locate the spatially coherent locations in the residual frame during a particular enhancement layer embedding. After embedding the watermark bits, the inverse motion compensated temporal filtering (IMCDCT-TF) is performed to get the base layer watermarked video sequence C′.
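The selection and embedding rules of Eqns. 3.3 and 3.4 on a single coefficient trio can be sketched as follows (the function names are illustrative, and the absolute value in the threshold is an assumption made so that the threshold stays positive for negative averages):

```python
def select_trio(ct1, ct2, ct3, alpha=0.01):
    """Visual quality test of Eqn. 3.3 with V_th = alpha * (ct1 + ct3)/2."""
    avg = (ct1 + ct3) / 2.0
    return abs(avg - ct2) <= abs(avg) * alpha

def embed_trio(ct1, ct2, ct3, wbit, alpha=0.01):
    """Embedding rule of Eqn. 3.4; wbit is 0 or 1, replaces the middle value."""
    avg = (ct1 + ct3) / 2.0
    return ct1, avg + abs(avg) * alpha * wbit, ct3
```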
Enhancement Layer Embedding

Similar to the base layer, the residual frame sequence R is partitioned into non-overlapping sets of 3 residual frames in the temporal direction. For such a set of 3 residual frames (say $R_1$, $R_2$, $R_3$), the predictions $R^1_2$ and $R^3_2$ are obtained from $R_1$ and $R_3$ respectively using the motion vectors. The 1D temporal DCT of $R^1_2$, $R_2$ and $R^3_2$ is calculated using Eqn. 2.3 to generate the low pass temporally filtered residual frame Rt. The base layer location map (Lmap) is then up-sampled to the size of the residual layer to detect the watermarking regions in the low pass temporal residual layer.

The watermarking region of the low pass residual frame Rt is again partitioned into non-overlapping sets of 3 consecutive coefficients (say Rt(k), Rt(k+1) and Rt(k+2)), and the watermark is embedded using Eqn. 3.5:

$$Rt'(k+1) = \frac{(Rt(k) + Rt(k+2))}{2} + \left| \frac{(Rt(k) + Rt(k+2))}{2} \right| \times \alpha \times W_i \tag{3.5}$$

where $W_i$ is the same watermark bit embedded in the spatially coherent base layer (low pass DC frame) coefficients and α is the watermarking strength. $Rt'(k+1)$ is the watermarked coefficient corresponding to the coefficient Rt(k+1) of the residual layer. After embedding, the low pass residual frames are subjected to IMCDCT-TF to get the watermarked residual of the enhancement layer sequence, Rw. The base layer watermarked video C′ is up-sampled to the enhancement layer using Eqn. 3.1 to obtain E′, which is added to Rw to get the watermarked video sequence. The overall watermark embedding procedure is given in Algorithm 3.1 and Algorithm 3.2.

3.3.2 Extraction Scheme

Watermark extraction for spatial scalability is done for the base layer and for the residual layer separately; the enhancement layer watermark is then derived from the base and residual layer watermarks.

Extraction of Base Layer Watermark

To extract the base layer watermark, the pre-processing steps are the same as in the embedding scheme, as shown in Fig. 3.4. For content authentication of the base layer, the DC frame sequence C´ is first formed and MCDCT-TF is applied to it. As in the embedding scheme, the visual threshold is used to detect the embedding zone. The watermark is extracted from each set of three consecutive non-overlapping coefficients $Ct'_1$, $Ct'_2$, $Ct'_3$ of the detected watermarked block using Eqn. 3.6:

$$W'_{bi} = \begin{cases} 0 & \text{if } Ct'_2 \le \frac{Ct'_1 + Ct'_3}{2} \\ 1 & \text{if } Ct'_2 > \frac{Ct'_1 + Ct'_3}{2} \end{cases} \tag{3.6}$$

As in the embedding scheme, the watermark locations are saved in a location map (Lmap). An up-sampled version of the location map is used to locate the spatially coherent locations in the residual frame during a particular enhancement layer extraction. The extraction scheme is described stepwise in Algorithm 3.3.
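The corresponding blind detector of Eqn. 3.6 simply reads the bit from the position of the watermarked middle coefficient relative to the average of its neighbours, the inverse of embed_trio above:

```python
def extract_trio(ct1_w, ct2_w, ct3_w):
    """Detector of Eqn. 3.6 on a watermarked coefficient trio."""
    return 1 if ct2_w > (ct1_w + ct3_w) / 2.0 else 0
```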
Algorithm 3.1: Embedding Algorithm (V, α, W)
Input: V: raw video, α: watermark strength, W: watermark bit stream
Output: Vw: watermarked video

1. /* Generate the DC frame (base layer) video $C_i$ from the raw video V */ For each video frame $V_i$ of the raw video sequence V:
(a) Partition the video frame $V_i$ into non-overlapping blocks of size $\frac{M}{P} \times \frac{N}{Q}$ as in Fig. 3.1, where M × N is the frame size and P × Q is the fixed DC frame size.
(b) Accumulate the DC values of the blocks to obtain the DC frame $C_i$ for the corresponding raw video frame $V_i$ (refer to Fig. 3.1).
(c) Up-sample the DC frame $C_i$ and subtract it from the raw video frame $V_i$ to get the residual video frame $R_i$, using Eqn. 3.1 and Eqn. 3.2.
2. Partition the DC frame sequence C corresponding to the whole raw video sequence V into non-overlapping sets of k DC frames in the temporal direction. In this algorithm, k is taken as 3 on an experimental basis.
3. For such a set of 3 DC frames {C1, C2, C3}, calculate the motion vectors (MV) and predict C2 from C1 and C3 using the MVs; let $C^1_2$ and $C^3_2$ be the predicted frames.
4. Calculate the coefficient-wise temporal DCT of $C^1_2$, $C_2$ and $C^3_2$ using Eqn. 2.3 to generate the low pass temporally filtered DC frame Ct.
5. Partition the low pass DC frame Ct into non-overlapping sets of 3 consecutive coefficients. For each such set: if it satisfies the visual threshold (refer to Eqn. 3.3), then (a) embed the watermark in the selected coefficient set using Eqn. 3.4, and (b) record the embedding location in the location map Lmap.
6. Perform the inverse MCDCT-TF to get the watermarked DC frame sequence C′.
7. Up-sample the watermarked DC frame to the required enhancement layer.
8. /* Call the residual layer embedding function of Algorithm 3.2 to get the watermarked residual frame Rw */ Rw = Residual_Embedding(R, α, W, Lmap, MV).
9. Add the up-sampled watermarked DC frame and the watermarked residual layer Rw to get the watermarked video Vw.

Algorithm 3.2: Residual_Embedding (R, α, W, Lmap, MV)
Input: R: residual layer, α: watermark strength, W: watermark bit stream, Lmap: base layer location map, MV: motion vectors
Output: Rw: watermarked residual layer

1. Partition the residual frame sequence R into non-overlapping sets of k residual frames in the temporal direction; k is taken as 3, as in Algorithm 3.1.
2. For such a set of 3 residual frames {R1, R2, R3}, predict $R^1_2$ and $R^3_2$ from R1 and R3 using the MVs.
3. Calculate the coefficient-wise temporal DCT of $R^1_2$, $R_2$ and $R^3_2$ using Eqn. 2.3 to generate the low pass temporally filtered residual frame Rt.
4. Up-sample the base layer location map (Lmap) to the size of the residual layer (LmapU) to detect the watermarking regions in the low pass temporal residual layer.
5. Partition the watermarking region of the low pass residual frame Rt into non-overlapping sets of 3 consecutive coefficients. For each such set, embed the watermark in the residual coefficient group using Eqn. 3.5.
6. Perform the IMCDCT-TF on the watermarked low pass residual layer to get the watermarked residual layer Rw.
7. Return Rw.

Figure 3.4: Watermark extraction model.

Extraction of Enhancement Layer Watermark

Extraction of the watermark from the residual layer is important for content authentication of the corresponding enhancement layer. The extraction steps mirror the embedding steps of the residual layer. The base layer location map (Lmap) is up-sampled to the size of the residual layer (LmapU) to detect the watermarking regions in the low pass temporal residual layer.
The watermarked regions of the low pass watermarked residual frame Rt′ are again partitioned into non-overlapping sets of 3 consecutive coefficients (say Rt′(k), Rt′(k+1) and Rt′(k+2)) and the watermark is extracted using Eqn. 3.7:

$$Wt'_i = \sum_{k=0}^{\lfloor N/3 \rfloor} \frac{\left( Rt'(3k) + Rt'(3k+2) \right)/2 - Rt'(3k+1)}{\left| \left( Rt'(3k) + Rt'(3k+2) \right)/2 - Rt'(3k+1) \right|} \tag{3.7}$$

where N is the number of residual coefficients in a block coherent to a single watermarked coefficient in the base layer (low pass DC frame), as shown in Fig. 3.2. After extraction over each block, the binary watermark is generated using Eqn. 3.8:

$$W'_{ri} = \begin{cases} 0 & \text{if } Wt'_i \le 0 \\ 1 & \text{if } Wt'_i > 0 \end{cases} \tag{3.8}$$

The stepwise residual layer extraction is presented in Algorithm 3.4. The base layer and residual layer watermarks are combined to generate the enhancement layer watermark using Eqn. 3.9:

$$W' = W'_b \cup W'_r \tag{3.9}$$

3.3.3 Embedding Capacity

The embedding capacity of the proposed scheme depends on two factors: the visual quality threshold given in Eqn. 3.3, and the number of connected coefficients obtained from the temporal filtering. Let the number of connected coefficients per frame be η. Then at most η/3 watermark bits can be embedded, and this capacity is further reduced by the visual quality threshold. Intuitively, the embedding capacity for a GOP is directly proportional to η and is further constrained by $V_{th}$.

Algorithm 3.3: Extraction Algorithm (Vw, α)
Input: Vw: watermarked video, α: watermark strength
Output: W′b: extracted base layer watermark, W′e: extracted enhancement layer watermark

1. /* Generate the DC frame (base layer) watermarked video C′ from the watermarked video Vw as described in Sec. 3.2.1 */ For each video frame $Vw_i$ of the watermarked video sequence Vw:
(a) Partition the watermarked video frame $Vw_i$ into non-overlapping blocks of size $\frac{M}{P} \times \frac{N}{Q}$ as in Fig. 3.1.
(b) Accumulate the DC values of the blocks to obtain the watermarked DC frame $C'_i$ for the corresponding watermarked video frame $Vw_i$ (refer to Fig. 3.1).
(c) Up-sample the watermarked DC frame $C'_i$ and subtract it from the watermarked video frame $Vw_i$ to get the watermarked residual video frame $Rw_i$, using Eqn. 3.1 and Eqn. 3.2.
2. Partition the watermarked DC frame sequence C′ corresponding to the whole watermarked video sequence Vw into non-overlapping sets of k DC frames in the temporal direction; k is taken as 3 for the experiments.
3. For such a set of 3 watermarked DC frames {C′1, C′2, C′3}, calculate the motion vectors (MV) and predict $C'^1_2$ and $C'^3_2$ from C′1 and C′3 using the MVs.
4. Calculate the coefficient-wise temporal DCT of $C'^1_2$, $C'_2$ and $C'^3_2$ using Eqn. 2.3 to generate the low pass temporally filtered watermarked DC frame Ct′.
5. Partition the low pass watermarked DC frame Ct′ into non-overlapping sets of 3 consecutive coefficients. For each such set: if it satisfies the visual threshold (refer to Eqn. 3.3), then (a) extract the watermark bit of W′b from the selected coefficient group using Eqn. 3.6, and (b) record the extraction location in the location map Lmap.
6. /* Call the residual layer extraction function of Algorithm 3.4 to get the residual watermark W′r */ W′r = Residual_Extraction(Rw, α, Lmap, MV).
7. Generate the enhancement layer watermark W′e using Eqn. 3.9 by combining the base and residual layer watermarks.
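A sketch of the residual layer vote of Eqns. 3.7 and 3.8 is given below; note that the sign of the deviation is taken as middle minus average here, a labeled assumption about the intended sign convention so that the decision agrees with the embedding direction of Eqn. 3.5:

```python
import numpy as np

def extract_residual_bit(rt):
    """Vote over the residual trios tied to one base-layer bit (Eqns. 3.7-3.8).

    rt: 1-D array of low pass residual coefficients of one coherent block.
    """
    votes = 0.0
    for k in range(0, len(rt) - 2, 3):
        avg = (rt[k] + rt[k + 2]) / 2.0
        votes += np.sign(rt[k + 1] - avg)   # +1 if the middle value is pushed above the average
    return 1 if votes > 0 else 0
```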
Algorithm 3.4: Residual_Extraction (Rw, α, Lmap, MV)
Input: Rw: watermarked residual layer, α: watermark strength, Lmap: base layer location map, MV: motion vectors
Output: W′r: watermark extracted from the residual layer

1. Partition the watermarked residual frame sequence Rw into non-overlapping sets of k residual frames in the temporal direction; k is taken as 3, as in the embedding scheme.
2. For such a set of 3 watermarked residual frames {Rw1, Rw2, Rw3}, predict $Rw^1_2$ and $Rw^3_2$ from Rw1 and Rw3 using the MVs.
3. Calculate the coefficient-wise temporal DCT of $Rw^1_2$, $Rw_2$ and $Rw^3_2$ using Eqn. 2.3 to generate the low pass temporally filtered watermarked residual frame Rwt.
4. Up-sample the base layer location map (Lmap) to the size of the watermarked residual layer to detect the watermarked regions in the low pass temporal watermarked residual layer.
5. Partition the watermarked region of the low pass watermarked residual frame Rwt into non-overlapping sets of 3 consecutive coefficients. For each such set, extract the watermark contribution using Eqn. 3.7.
6. Generate the residual binary watermark W′r using Eqn. 3.8.
7. Return W′r.

Table 3.1: Experimental setup

Parameter | Values taken
Watermarking encoder | H.264/SVC, JSVM reference software
Video sequences used | Bus, City, Crew, Coastguard, Mobile, Akiyo, Hall, Mother Daughter, Sunflower, Foreman, News
Video resolution | 4CIF, CIF, 720p
Watermark signal | 32 × 32 and 64 × 64 binary image
Visual quality metrics | PSNR, flicker metric, VQM, SSIM
Robustness metric | Hamming distance

3.4 Experimental Results

The proposed method is tested on different High Definition (Elephants dream, Sunflower, Pedestrian area) and CIF (Foreman, Akiyo, Bus) video sequences. Videos with different motion characteristics were selected for the experimentation: Pedestrian area and Bus are high motion videos, whereas Sunflower and Akiyo have relatively low motion. A 64 × 64 binary image is used as the watermark signal, and the fixed DC frame size is taken as 88 × 72. The watermark signal is embedded in the luma component using the proposed watermarking scheme. PSNR, SSIM [59], VQM [61] and the flicker metric [60] are measured using the MSU VQM tool [62] to evaluate the visual quality of the proposed scheme, and the Hamming distance is used to measure its robustness. The robustness analysis is done at different resolutions extracted from the H.264/SVC encoded bit stream. The whole experimental setup is tabulated in Table 3.1.

3.4.1 Visual Quality

The visual quality of the proposed scheme is compared with the related spatially scalable scheme proposed by Y. Wang and A. Pearmain [47] and with the quality scalable watermarking scheme proposed by Bhowmik et al. [39]. In Wang and Pearmain's scheme [47], the watermark is embedded in the 2nd frame of each GOP of 3 frames; so for the comparison, PSNR, VQM and SSIM are calculated on the 2nd frame of every group of 3 consecutive frames. The flicker metric [60], which captures the blinking effect of the video, is measured on the full video sequence.
The comparative results of the proposed scheme against Wang and Pearmain's scheme [47] and Bhowmik's scheme [39] with respect to PSNR, the flicker metric, VQM and SSIM are depicted in Fig. 3.5, Fig. 3.6, Fig. 3.7 and Fig. 3.8 respectively. The PSNR comparison for the Pedestrian area, Sunflower, Akiyo and Bus videos is shown in Fig. 3.5. The average PSNR of the proposed scheme is close to 40 dB for all the videos, which may be regarded as acceptable quality. It can also be observed that the proposed scheme performs much better than Wang's scheme [47], and its average PSNR over all frames is better than that of Bhowmik's scheme [39].

Figure 3.5: PSNR comparison: (a) Pedestrian area, (b) Sunflower, (c) Akiyo, (d) Bus.

In Fig. 3.6, the absolute difference of the flicker metric between the original and watermarked videos is shown. Because of the motion coherent embedding, the flicker difference is close to zero for the proposed scheme. In Bhowmik's scheme, despite the use of MCTF, the flickering is high because of the shorter filter length.

Figure 3.6: Flicker metric comparison: (a) Pedestrian area, (b) Sunflower, (c) Akiyo, (d) Bus.

Fig. 3.7 depicts the VQM comparison of the above mentioned three schemes; a lower VQM value means better visual quality [61]. Fig. 3.7 shows that the visual quality of the proposed scheme is better than that of the existing schemes. The comparison results for the Pedestrian area and Sunflower videos show that the proposed scheme produces fewer visual artifacts than Wang's scheme [47] as well as Bhowmik's scheme [39], for both very low and very high motion HD videos.

3.4.2 Robustness

The robustness of the proposed watermarking scheme is measured by means of the Hamming distance between the original watermark and the watermark extracted after an H.264/SVC content adaptation attack. The watermarked video is compressed with the scalable encoder H.264/SVC, and the watermark is extracted from the different resolution layers for 5 different possible scaled versions:

• full resolution video;
• vertical 1/2 resolution video;
• horizontal 1/2 resolution video;
• vertical 1/2 and horizontal 1/2 resolution video;
• randomly down-sampled resolution video.

Video sequences with different resolutions at different bit rates (quality with respect to QP) are evaluated to analyze the robustness of the proposed scheme against resolution and quality adaptation attacks. Table 3.2 gives the Hamming distance of the watermark signal extracted from the base layer and the enhancement layers of the watermarked Bus video at different bit rates; Tables 3.3, 3.4 and 3.5 present similar results for the Akiyo, Pedestrian area and Sunflower videos.

From Tables 3.2 to 3.5, it is observed that the quality of the watermark extracted from an enhancement layer (in terms of the Hamming distance between the extracted and original watermarks) is relatively better than that of the base layer. Moreover, the robustness increases with higher levels of enhancement layers. This observation supports the claimed graceful improvement.

The robustness of the proposed scheme has been compared with Wang and Pearmain's scheme [47] and Bhowmik's scheme [39] for the Bus, Akiyo, Pedestrian area and Sunflower videos. The watermarked raw videos are encoded with the H.264/SVC video encoder, and the extracted watermark is compared with the original watermark to measure the robustness of the scheme against scalable adaptation.
al.’s scheme [47], the full resolution video is necessary for watermark extraction, the comparison with proposed scheme is done only for high resolution versions of the video sequences. In Fig. 3.9, the 51TH-1469_10610110 3. ROBUST VIDEO WATERMARKING AGAINST RESOLUTION AND QUALITY SCALABILITY (a) Pedestrian area video (b) Sunflower video (c) Akiyo video (d) bus video Figure 3.7: VQM comparison comparison of Hamming distance (as robustness metric) between proposed and existing schemes [39, 47] are presented. It is observed from the Fig. 3.9 that the proposed scheme for enhanced layer video as well as base layer video performs better than both schemes with respect to hamming distance. From the results depicted from Fig. 3.9, it is observed that proposed scheme provides better performance for both base and enhancement layer video sequence than that of existing schemes [39, 47] . From Table 3.2 to 3.5, it is evident that both base layer and the enhanced layers are secured by the proposed scheme. Moreover, the graceful improvement has been achieved for successive enhance- ment layers of the on video sequences. 52TH-1469_10610110 3.5 Conclusion (a) Pedestrian area video (b) Sunflower video (c) Akiyo video (d) Bus video Figure 3.8: SSIM comparison 3.5 Conclusion In this chapter, a DC frame based blind watermarking scheme has been pro- posed which can resist resolution scalability. A DCT based spatial filtering and a MCDCT based temporal filtering are used to find the suitable embedding zone for the embedding. Moreover, a robustness threshold and a visual quality thresh- old are used to enhance visual quality and robustness of the proposed scheme. A comprehensive set of experiments have been carried out to justify that the proposed scheme is performing better than the existing related works [39, 47] with respect to visual quality as well as robustness against resolution scalability. Proposed scheme ensures a graceful improvement of the extracted watermarking signal with successive enhancement layers. In this chapter, the watermarking issues are discussed only for the resolution and quality adaptation. In the sim- 53TH-1469_10610110 3. ROBUST VIDEO WATERMARKING AGAINST RESOLUTION AND QUALITY SCALABILITY Table 3.2: Hamming distance of the extracted watermark from scalable CIF Bus video Hamming distance of Bus video Watermark Resolution QP Bit rate kbit/s Base layer Enhancement layer 352× 288 26 2600 0.02356 0.01866 28 2200 0.04121 0.03015 30 1910 0.05453 0.03848 32 1490 0.075921 0.05137 176× 288 26 1238 0.03485 0.02626 28 957 0.05914 0.04161 30 827 0.07023 0.04824 32 514 0.09756 0.06447 352× 144 26 1352 0.03158 0.02453 28 1138 0.06042 0.04168 30 1023 0.07438 0.04978 32 775 0.1048 0.06633 176× 144 26 385 0.06584 0.04428 28 294 0.1014 0.06394 30 246 0.1252 0.07423 32 156 0.1523 0.09213 272× 192 26 691 0.0486 0.03536 28 538 0.05996 0.04511 30 445 0.06953 0.05487 32 263 0.08659 0.07509 ilar line of thought, temporal adaptation should also be analyzed. In the next chapter, the temporal scalability issue has been considered. 
Table 3.3: Hamming distance of the extracted watermark from the scalable CIF Akiyo video

Resolution | QP | Bit rate (kbit/s) | Base layer | Enhancement layer
352 × 288 | 26 | 752 | 0.01056 | 0.0091
352 × 288 | 28 | 690 | 0.01489 | 0.01249
352 × 288 | 30 | 566 | 0.02298 | 0.01874
352 × 288 | 32 | 443 | 0.03156 | 0.02523
176 × 288 | 26 | 345 | 0.01659 | 0.01386
176 × 288 | 28 | 238 | 0.02392 | 0.01958
176 × 288 | 30 | 197 | 0.02792 | 0.02264
176 × 288 | 32 | 84 | 0.03784 | 0.03009
352 × 144 | 26 | 433 | 0.01523 | 0.01286
352 × 144 | 28 | 336 | 0.02352 | 0.01935
352 × 144 | 30 | 295 | 0.02824 | 0.02301
352 × 144 | 32 | 201 | 0.03679 | 0.02970
176 × 144 | 26 | 135 | 0.02153 | 0.01778
176 × 144 | 28 | 90 | 0.03295 | 0.02637
176 × 144 | 30 | 72 | 0.03825 | 0.03027
176 × 144 | 32 | 28 | 0.05295 | 0.04118
272 × 192 | 26 | 222 | 0.01954 | 0.01622
272 × 192 | 28 | 538 | 0.02652 | 0.02199
272 × 192 | 30 | 445 | 0.02955 | 0.02437
272 × 192 | 32 | 263 | 0.04895 | 0.03776

Table 3.4: Hamming distance of the extracted watermark from the scalable HD Pedestrian area video

Resolution | QP | Bit rate (kbit/s) | Base layer | Enhancement layer
1280 × 720 | 26 | 6621 | 0.02123 | 0.01583
1280 × 720 | 28 | 5624 | 0.0389 | 0.02627
1280 × 720 | 30 | 4800 | 0.05192 | 0.03404
1280 × 720 | 32 | 4214 | 0.05823 | 0.03837
640 × 720 | 26 | 3971 | 0.03156 | 0.02273
640 × 720 | 28 | 957 | 0.05055 | 0.03463
640 × 720 | 30 | 827 | 0.06486 | 0.04299
640 × 720 | 32 | 514 | 0.07684 | 0.04993
1280 × 360 | 26 | 4166 | 0.03042 | 0.02201
1280 × 360 | 28 | 3313 | 0.04795 | 0.03134
1280 × 360 | 30 | 2782 | 0.05806 | 0.03667
1280 × 360 | 32 | 2120 | 0.07013 | 0.04291
640 × 360 | 26 | 2725 | 0.04296 | 0.02992
640 × 360 | 28 | 1962 | 0.0564 | 0.03763
640 × 360 | 30 | 246 | 0.06605 | 0.04273
640 × 360 | 32 | 156 | 0.08265 | 0.05136
320 × 144 | 26 | 827 | 0.07958 | 0.04744
320 × 144 | 28 | 600 | 0.127 | 0.07049
320 × 144 | 30 | 480 | 0.15055 | 0.08214
320 × 144 | 32 | 285 | 0.18754 | 0.10209

Table 3.5: Hamming distance of the extracted watermark from the scalable HD Sunflower video

Resolution | QP | Bit rate (kbit/s) | Base layer | Enhancement layer
1280 × 720 | 26 | 5622 | 0.013 | 0.01017
1280 × 720 | 28 | 4600 | 0.02534 | 0.01825
1280 × 720 | 30 | 3986 | 0.03306 | 0.02333
1280 × 720 | 32 | 3343 | 0.0397 | 0.02813
640 × 720 | 26 | 3008 | 0.0162 | 0.01256
640 × 720 | 28 | 2200 | 0.02729 | 0.01977
640 × 720 | 30 | 1500 | 0.03773 | 0.02639
640 × 720 | 32 | 964 | 0.04675 | 0.03209
1280 × 360 | 26 | 3174 | 0.01454 | 0.01172
1280 × 360 | 28 | 2994 | 0.01751 | 0.01376
1280 × 360 | 30 | 2717 | 0.02206 | 0.01676
1280 × 360 | 32 | 1274 | 0.04256 | 0.02984
640 × 360 | 26 | 2063 | 0.02136 | 0.01647
640 × 360 | 28 | 1390 | 0.0379 | 0.02651
640 × 360 | 30 | 1109 | 0.04528 | 0.03104
640 × 360 | 32 | 545 | 0.06185 | 0.03987
320 × 144 | 26 | 552 | 0.03846 | 0.02544
320 × 144 | 28 | 362 | 0.0539 | 0.03584
320 × 144 | 30 | 283 | 0.06085 | 0.04013
320 × 144 | 32 | 125 | 0.07445 | 0.04859

Figure 3.9: Robustness comparison: (a) Pedestrian area, (b) Sunflower, (c) Akiyo, (d) Bus.

Chapter 4
Robust Video Watermarking against Temporal and Quality Scalability

4.1 Introduction

In this chapter, a scalable watermarking scheme is proposed in which temporal and quality adaptations are considered. The main issue in temporal adaptation is handling the reduction of the frame rate (frame dropping). The proposed work can also resist temporal desynchronization attacks, where random frame dropping or frame averaging is done intentionally. There are also non-hostile situations in which video frames may be dropped, for example network congestion or buffer overload at the end-user device. All of these cases essentially cause some sort of temporal desynchronization and can be handled by the proposed work. The number of frames dropped in scalable adaptation is very high in comparison with general frame dropping attacks; however, the pattern of frame dropping in temporal adaptation is generally known a priori, whereas it is in general random in nature for frame dropping attacks.
Frame by frame watermarking (inserting the watermark in each of the video frames) may be a naive solution to this problem, but it has some serious disadvantages. Firstly, frame by frame watermarking is in general vulnerable to collusion attacks of type I and II [17, 40], where simple frame averaging can be used to estimate the watermark. Moreover, simple frame by frame watermarking may cause flickering artifacts [39, 17] in the presence of inter-frame motion if proper motion compensation is not taken care of at the time of embedding.

A few watermarking schemes in the literature have addressed the problem of temporal desynchronization. Chong et al. [67] proposed an RST invariant watermarking scheme which is also resilient to the frame dropping attack. In this scheme [67], the authors claim that the average DC energy of a frame is RST invariant, so the DC energy histogram can be used to embed the watermark. Although this approach performs well against random frame dropping as well as temporal scalability, the embedding capacity of the scheme is very low. In another scheme [68], the authors proposed a blind video watermarking method against frame dropping and frame averaging attacks based on the 3D-DWT transform. They showed that the temporal high frequency coefficients are orthogonal to a normally distributed watermark with zero mean, and used the high frequency band for embedding. It has been experimentally observed that this scheme [68] fails if a relatively large number of frames (more than 30%) is dropped. Overall, it is observed that relatively little attention had been paid to scalable watermarking resisting temporal adaptation until recently, and most of the existing schemes do not perform well against temporal scaling if the frame rate adaptation (the number of frames to be dropped) is relatively high.

In this chapter, a semi-blind watermarking scheme is proposed against temporal and quality scaling attacks, where the watermark is embedded in the motion compensated low pass frames. It is semi-blind because, although the original video is not required for the extraction, a location map describing the watermark embedding locations is used. Due to the embedding in the low pass frames, the watermark gets distributed among all the frames. The proposed scheme is tested over a large set of standard videos, and the results show that it performs well even at very high frame dropping rates. The scheme is described in the subsequent sections.

4.2 Proposed Scheme

One of the main challenges in scalable video watermarking is to choose appropriate locations for embedding. In the proposed scheme, the LL1 subband (Fig. 4.1) of each frame after a 2-level wavelet decomposition is chosen for embedding, to make the scheme robust against quality scalability. After the spatial decomposition, a motion compensated temporal decomposition is done on the LL1 subbands of all frames, and the low pass version of the LL1 subbands is used for embedding. Due to the embedding in the low pass frames of the LL1 subbands, the watermark information gets spread over all the frames and can be extracted even after a frame dropping attack. Moreover, the embedded watermark in the low pass frames spreads over motion coherent locations due to the motion compensated temporal filtering, which reduces the flicker artifacts [39, 60]. Furthermore, embedding in the motion coherent regions helps to resist collusion attacks [40].
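For reference, a minimal sketch of this embedding zone preparation using the PyWavelets package (an assumed dependency, not named in the thesis; function names are illustrative):

```python
import pywt

def split_ll1(frame):
    """2-level Haar decomposition of a frame; returns the LL1 subband
    (the embedding zone) and the detail subbands kept for reconstruction."""
    coeffs = pywt.wavedec2(frame, 'haar', level=2)
    return coeffs[0], coeffs[1:]

def merge_ll1(LL1_w, details):
    """Rebuild the (watermarked) frame from a modified LL1 subband."""
    return pywt.waverec2([LL1_w] + details, 'haar')
```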
In this work, DCT based motion compensated temporal filtering [56] is used. The temporal filtering is done on GOFs of 9 frames, which can be generalized to any number of frames. At first, the temporal filtering is done on sub-GOFs of 3 frames; then a 2nd level of decomposition is done on the 3 low pass frames generated from 3 consecutive sub-GOFs, as shown in Fig. 4.2. During motion compensation among three frames, we get three different sets of pixels having different motion characteristics: connected, unconnected and partially connected pixels (refer to Chapter 2). These different pixels need to be handled differently: only connected pixels are used for embedding, while unconnected and partially connected pixels are stored and used to reconstruct the watermarked video. The concept of MCDCT-TF is explained in detail in Sec. 2.1.1, and the categorization of pixels into connected and unconnected sets is described in Sec. 2.1.3. In the proposed scheme, the watermark is extracted from each of the frames to make it robust against frame dropping. However, during inverse motion compensation, the watermark coefficients may change their locations along the motion direction. To resist this desynchronization, the embedding locations are saved during embedding in a map (the LocationMap). The generation of the location map and the whole embedding process are described in the subsequent sub-sections.

Figure 4.1: Spatial decomposition.

Figure 4.2: DCT based Motion Compensated Temporal Filtering (MCDCT-TF).

4.2.1 Location Map

Because of temporal scalability, all embedded frames may not be available at the receiver side, so it may not be possible to recover the same GOF structure during watermark extraction. However, the additive watermark embedded in the low pass frames is distributed into all the frames during the reconstruction of the watermarked video. The non-zero coefficients obtained from the 2nd level temporal decomposition which satisfy a given selection criterion are used for embedding. For every low pass frame, a location map is generated to store the embedding locations, as shown in Fig. 4.4. In Fig. 4.4, $I_{3t+1}$, $I_{3t+2}$ and $I_{3t+3}$ are 3 consecutive original frames, and the arrows on frames $I_{3t+1}$ and $I_{3t+3}$ are the motion directions of 4 × 4 non-overlapping blocks. To predict the frame $I_{3t+2}$ from $I_{3t+1}$ and $I_{3t+3}$, motion compensation is done, where $I^1_{3t+2}$ and $I^3_{3t+2}$ are the predicted frames. From the predicted frames, the fully connected pixel locations are derived, and all connected coefficients are passed through the coefficient selection procedure. In Fig. 4.4, the coefficients marked with non-zero numbers in the location map are the embeddable coefficients; the zeros in the gray area (in the location map of Fig. 4.4) represent the coefficients rejected by the coefficient selection procedure. To capture the embedding locations in the upper layer frames, the location maps are also subjected to the (inverse) motion compensation using Eqn. 4.1 and Eqn. 4.2; the procedure is shown in Fig. 4.3.

$$Lmap^{l+1}_{3t+1}[m + H^{1\to 2},\; n + V^{1\to 2}] = Lmap^{l}_{3t}[m,n] \tag{4.1}$$

$$Lmap^{l+1}_{3t+3}[m + H^{3\to 2},\; n + V^{3\to 2}] = Lmap^{l}_{3t}[m,n] \tag{4.2}$$

where $Lmap^{l}_{3t}$ is the location map of the 3t-th frame at the l-th temporal layer, and $(H^{1\to 2}, V^{1\to 2})$, $(H^{3\to 2}, V^{3\to 2})$ are the motion vectors of $I_{3t+2}$ with respect to $I_{3t+1}$ and $I_{3t+3}$ respectively. The size of the location map is directly proportional to the size of the video frame: if the frame size of the video is M × N, then the size of the location map for that frame is M/2 × N/2.
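The location map propagation of Eqns. 4.1 and 4.2 can be sketched as follows (per-entry motion fields and the function name are assumptions of this illustration):

```python
import numpy as np

def propagate_lmap(lmap, mv):
    """Carry the embedding locations along the motion field (Eqns. 4.1-4.2).

    lmap: binary location map of the lower temporal layer.
    mv:   array of (dH, dV) displacements, one per map entry.
    """
    out = np.zeros_like(lmap)
    rows, cols = lmap.shape
    for m in range(rows):
        for n in range(cols):
            tm, tn = m + mv[m, n][0], n + mv[m, n][1]
            if 0 <= tm < rows and 0 <= tn < cols:
                out[tm, tn] = lmap[m, n]   # embedding position in the upper layer
    return out
```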
4.2.2 Embedding Scheme

In this sub-section, the proposed embedding scheme is described. For the watermark embedding, a blind scheme has been employed, with the following embedding rule:

$$\begin{cases} A < B & \text{if } w_b = 0 \\ A > B & \text{if } w_b = 1 \end{cases} \tag{4.3}$$

where $w_b$ is the watermark bit and A, B are two embeddable coefficients. The embedding rule is implemented in Algorithm 4.1. The overall embedding scheme is depicted in Fig. 4.5, and a step by step embedding procedure is given in Algorithm 4.2.

Figure 4.3: Inverse motion compensation of the location map.

Algorithm 4.1: Embedbit(A, B, wb, δ)
Input: A and B: two consecutive coefficients, δ: watermark strength, wb: watermark bit
Output: A′ and B′: watermarked coefficients

if wb == 1 then
  if A − B < δ then
    A′ = A + (δ − (A − B))/2
    B′ = B − (δ − (A − B))/2
else
  if B − A < δ then
    A′ = A − (δ − (B − A))/2
    B′ = B + (δ − (B − A))/2

Figure 4.4: Pixel categories and the location map.

4.2.3 Coefficient Selection

After the watermark embedding in the low pass frames, IMCDCT-TF (Sec. 2.1.2) is applied to get the watermarked video. Let L(m, n) and L(m, n+1) be two consecutive coefficients of a low pass frame which are used for embedding, and let WL(m, n) and WL(m, n+1) be the corresponding watermarked coefficients. The frames $I^1_{3t+2}$, $I_{3t+2}$ and $I^3_{3t+2}$ in Fig. 4.4 are reconstructed using Eqn. 2.4; let the corresponding watermarked frames be $WI^1_{3t+2}$, $WI_{3t+2}$ and $WI^3_{3t+2}$ respectively. From Eqn. 2.4, the value of $WI^1_{3t+2}(m,n)$ can be written as:

$$WI^1_{3t+2}(m,n) = \frac{1}{\sqrt{3}} WL(m,n) + \frac{1}{\sqrt{2}} M(m,n) + \frac{1}{\sqrt{6}} H(m,n) \tag{4.4}$$

If the embedded watermark bit is 1 at locations $WI^1_{3t+2}(m,n)$ and $WI^1_{3t+2}(m,n+1)$, then for correct extraction we need $WI^1_{3t+2}(m,n) > WI^1_{3t+2}(m,n+1)$, i.e.

$$\frac{1}{\sqrt{3}} WL(m,n) + \frac{1}{\sqrt{2}} M(m,n) + \frac{1}{\sqrt{6}} H(m,n) > \frac{1}{\sqrt{3}} WL(m,n+1) + \frac{1}{\sqrt{2}} M(m,n+1) + \frac{1}{\sqrt{6}} H(m,n+1)$$

or

$$\frac{1}{\sqrt{3}} \left( WL(m,n) - WL(m,n+1) \right) > \frac{1}{\sqrt{2}} \left( M(m,n+1) - M(m,n) \right) + \frac{1}{\sqrt{6}} \left( H(m,n+1) - H(m,n) \right)$$

which, since the embedding rule enforces $WL(m,n) - WL(m,n+1) \ge \delta$, leads to the condition

$$\sqrt{\frac{3}{2}} \left( M(m,n+1) - M(m,n) \right) + \frac{1}{\sqrt{2}} \left( H(m,n+1) - H(m,n) \right) < \delta \tag{4.5}$$

Similarly, the condition for $WI^3_{3t+2}$ is

$$\sqrt{\frac{3}{2}} \left( M(m,n) - M(m,n+1) \right) + \frac{1}{\sqrt{2}} \left( H(m,n+1) - H(m,n) \right) < \delta \tag{4.6}$$

and for $WI_{3t+2}$

$$\sqrt{2}\, \left( H(m,n) - H(m,n+1) \right) < \delta \tag{4.7}$$

For blind extraction from every temporal layer, the watermark must be readable from every frame. Thus, if a watermark bit $w_b = 1$ is embedded at L(m, n) and L(m, n+1), then to extract it from $WI^1_{3t+2}(m,n)$ and $WI^1_{3t+2}(m,n+1)$ (see Fig. 4.6), the inequality given in Eqn. 4.5 must be satisfied. Accordingly, two coefficients of a low pass frame are selected for embedding only if the corresponding coefficients of the middle and high frequency frames satisfy the inequalities given in Eqns. 4.5, 4.6 and 4.7.
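A runnable Python version of Algorithm 4.1, as a direct transcription of the pseudocode:

```python
def embed_bit(a, b, wb, delta):
    """Push a coefficient pair at least delta apart in the direction
    dictated by the watermark bit (Algorithm 4.1)."""
    if wb == 1 and a - b < delta:
        shift = (delta - (a - b)) / 2.0
        a, b = a + shift, b - shift          # now a - b == delta
    elif wb == 0 and b - a < delta:
        shift = (delta - (b - a)) / 2.0
        a, b = a - shift, b + shift          # now b - a == delta
    return a, b
```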
Algorithm 4.2: Embedding Algorithm (V, δ, W)
Input: V: raw video, δ: watermark strength, W: watermark image
Output: Vw: watermarked video, Lmap: location map

1. Divide the raw video sequence V into non-overlapping groups of frames (GOFs) of 9 frames, and take one such GOF.
2. Subject each frame of the GOF to a 2-level Haar wavelet decomposition (Fig. 4.1). Let LLv = {LL1, LL2, ..., LL9} be the resulting low frequency subband sequence.
3. Divide the LLv sequence into non-overlapping sub-groups of 3 low frequency subbands.
4. For such a sub-group of 3 low frequency subbands, named LL1, LL2 and LL3, find the motion vectors from LL1 to LL2 ($MV_{1\to2}$) and from LL3 to LL2 ($MV_{3\to2}$).
5. Determine the motion compensated versions of LL1 and LL3, namely LLmc1 and LLmc3, and calculate the pixel by pixel temporal DCT of LLmc1, LL2 and LLmc3.
6. The 1D-DCT over LLmc1, LL2 and LLmc3 yields the DCT coefficient frames LDCT(1), MDCT(1) and HDCT(1) (low, medium and high frequency respectively): [LDCT(1)(i,j), MDCT(1)(i,j), HDCT(1)(i,j)] = 1D-DCT[LLmc1(i,j), LL2(i,j), LLmc3(i,j)] for all (i, j) of the subband of size (M/2) × (N/2), where M × N is the frame size.
7. Take LDCT(1), LDCT(2) and LDCT(3) from the 3 sub-groups and apply steps 4 to 6 on them to get LLDCT, LMDCT and LHDCT.
8. Choose every two consecutive coefficients of LLDCT for embedding if they pass the selection criteria explained in Sec. 4.2.3 and the visual quality threshold discussed in Sec. 4.2.4.
9. Embed the watermark in the selected coefficients of the LLDCT sequence using the function Embedbit(), and store the embedding locations in Lmap.
10. Perform the 2-level inverse temporal DCT and the 2-level inverse spatial wavelet transform to get the watermarked video. Inverse motion compensation is done on Lmap, which is stored for extraction.

Figure 4.5: Watermark embedding model.

Figure 4.6: Frames after every step.

Figure 4.7: Watermark extraction model.

4.2.4 Visual Quality Threshold

If the difference between the two consecutive coefficients (A, B) in Algorithm 4.1 is much less than δ (the watermark strength), then the distortion due to embedding will be high. To decrease this embedding distortion, a further restriction is imposed on the selection of coefficients: a coefficient pair is selected for embedding only if it satisfies the visual quality threshold ($V_{th}$) of Condition 1. The value of $V_{th}$ depends on the payload.

$$\left. \begin{array}{ll} A - B > V_{th} & \text{when } w_b = 1 \\ B - A > V_{th} & \text{when } w_b = 0 \end{array} \right\} \quad \text{(Condition 1)}$$

4.2.5 Extraction Scheme

The watermark bit is extracted using Eqn. 4.8:

$$W'_i = \begin{cases} 0 & \text{if } A' < B' \\ 1 & \text{if } A' > B' \end{cases} \tag{4.8}$$

where A′ and B′ are the embedded coefficients. The extraction procedure is shown in Fig. 4.7, and a step by step extraction procedure is given in Algorithm 4.3.

Algorithm 4.3: Extraction Algorithm (Vw)
Input: Vw: watermarked video
Output: W: watermark bit stream

1. Subject each frame of the watermarked sequence to a 2-level Haar wavelet decomposition (refer to Fig. 4.1). Let WLLv be the resulting low frequency subband sequence.
2. From the frame rate of the video, determine the number of frames dropped due to temporal scaling (if any).
3. For each LL subband, determine the corresponding location map.
4. Using the location map, select the embedded coefficients and extract the watermark bits using Eqn. 4.8.
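The blind detector of Eqn. 4.8 is then a single comparison, the counterpart of embed_bit above:

```python
def extract_bit(a_w, b_w):
    """Detector of Eqn. 4.8 on a watermarked coefficient pair."""
    return 1 if a_w > b_w else 0
```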
4.3 Experimental Results

The proposed scheme is evaluated on a set of standard video sequences of different sizes and motion characteristics. Results for four video sequences with different motion characteristics are shown here, e.g. Crew (CIF, high object motion), Coastguard (CIF, object motion as well as camera motion) and City (4CIF, camera motion only). A 32 × 32 binary logo is embedded in the luma component of the CIF sequences and a 64 × 64 binary logo in the 4CIF sequences. All watermarked videos are encoded with the JSVM (Joint Scalable Video Model) reference software (H.264/SVC) and decoded at different quality (bit-rate) and temporal (frame-rate) levels to evaluate the performance of the scheme. The experimental setup is tabulated in Table 4.1.

Table 4.1: Experimental Setup
  Watermarking encoder:    H.264/SVC, JSVM reference software
  Video sequences used:    Bus, City, Crew, Coastguard, Mobile, Akiyo, Hall, Mother & Daughter, Sunflower, Foreman, News
  Video resolution:        4CIF, CIF
  Watermark signal:        32 × 32 and 64 × 64 binary image
  Visual quality metrics:  PSNR, flicker metric, VQM, SSIM
  Robustness metric:       Hamming distance

4.3.1 Visual Quality

In this subsection, the visual quality of the proposed scheme is compared with the scheme proposed by Bhowmik et al. [39]. Fig. 4.8 compares the PSNR between the cover and watermarked videos for the four sequences. The figures show that the visual quality (with respect to PSNR) of the proposed scheme (green line) is better than that of Bhowmik's scheme [39] for all sequences. Intuitively, the visual quality threshold V_th helps improve the PSNR of the watermarked video: although V_th trades off against payload, it selects the best coefficients for embedding at a given payload.

The flicker metric measures inter-frame distortion over three consecutive frames. In the proposed scheme, the watermark is embedded in motion-coherent locations across up to 9 frames (the filter length), so the absolute flicker difference between the original and watermarked videos is close to zero; the comparison is shown in Fig. 4.9. Fig. 4.10 depicts the SSIM comparison: the mean SSIM of the proposed scheme is better than or close to that of Bhowmik's scheme. The VQM comparison is given in Fig. 4.11; since lower VQM means better video quality, the video quality with respect to VQM is also better than the existing work [39] for all sequences.

Figure 4.8: PSNR comparison (Coastguard, Crew, City, Ice)
Figure 4.9: Flicker comparison (Coastguard, Crew, City, Ice)
Figure 4.10: SSIM comparison (Coastguard, Crew, City, Ice)
Figure 4.11: VQM comparison (Coastguard, Crew, City, Ice)

4.3.2 Robustness Comparison

The robustness of the proposed scheme is measured by the Hamming distance given in Eqn. 2.19. Robustness at different temporal layers of different videos is shown in Fig. 4.12. A weighted average of the watermarks extracted from the different layers is taken, and the Hamming distance from the original watermark is then calculated. Base-layer frames are given more weight because their bit-rate is higher (less compression) than that of the other layers. The figure shows progressive improvement of robustness with frame rate.

Figure 4.12: Robustness at different temporal layers
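A minimal sketch of this layer-weighted evaluation is shown below, assuming the per-layer extracted watermarks are binary numpy arrays; the weighted majority vote and the weights themselves are illustrative assumptions, not the thesis's tuned values.

```python
import numpy as np

def fused_hamming(layer_marks, weights, original):
    """Weighted fusion of per-layer extracted binary watermarks, then the
    normalised Hamming distance to the original mark (cf. Eqn. 2.19)."""
    w = np.asarray(weights, dtype=float)
    stack = np.stack([m.astype(float) for m in layer_marks])
    fused = np.tensordot(w / w.sum(), stack, axes=1) > 0.5  # weighted vote
    return float(np.mean(fused != original.astype(bool)))
```

Giving the base layer the largest weight reflects its lower compression: its extracted bits are the most reliable, so they dominate the fused estimate.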
The robustness of the proposed scheme is also compared with the scheme proposed by Bhowmik et al. [39]. Fig. 4.13 shows the robustness comparison for the same set of video sequences. It can be observed from Fig. 4.13 that the robustness (Hamming distance) of the proposed scheme is better than that of Bhowmik's scheme [39], and that it shows the required graceful improvement in robustness as the bit-rate increases.

Figure 4.13: Robustness comparison (Coastguard, Crew, City, Ice)

4.3.3 Explanation

In this watermarking scheme, the watermark is embedded in the low-pass spatio-temporal frames of the video, and only in the connected pixels of those frames, so the watermark survives quality scaling (bit-rate scaling), temporal scaling and frame dropping. Because the watermark resides in temporal low-pass frames, its information is distributed across all frames; even after frames are dropped due to temporal scaling, the watermark can still be extracted from the remaining frames.

4.4 Conclusion

In this chapter, an MCDCT-TF based watermarking scheme has been proposed. MCDCT-TF uses a longer filter than the existing schemes and therefore exploits more correlation between consecutive frames. The proposed MCDCT-TF based watermarking scheme shows better robustness and lower embedding distortion than the existing MCTF based watermarking scheme; the robustness is analyzed against compression in H.264/SVC coding. This chapter considered temporal and quality scalability; together with the previous chapter, robust watermarking schemes have been proposed for three kinds of scalability: resolution, temporal and quality. The proposed schemes perform well when the resolution or temporal scaling is relatively small, but there is still scope for improvement when the amount of scaling is high. In the next chapter, a SIFT based approach is devised to handle large resolution scaling.

Chapter 5
SIFT based Robust Image Watermarking against Resolution Scalability

5.1 Introduction

In chapter 3, a watermarking scheme against resolution scaling was proposed in which the up-sampled base-layer watermark is embedded in the enhancement layers. Experimental results, however, show that its performance is not up to the mark when the scaling factor is relatively high. In this chapter, a scale-invariant image watermarking scheme based on Scale Invariant Feature Transform (SIFT) [30] features is proposed. The proposed image watermarking scheme can easily be extended to video by taking motion information into consideration to avoid flickering artifacts. The SIFT algorithm [30] (refer to chapter 2) extracts distinctive features from local image patches and has been shown to be invariant to image scaling and rotation. Regarding scale invariance, Morel et al. [69] showed that SIFT outperforms all other image feature extraction methods. SIFT descriptors are robust against noise and against changes in illumination, viewpoint, etc. These local invariant features are highly distinctive and match with high probability even under large image distortions.
SIFT features have been used in many applications, such as multi-view matching [31, 32], object recognition [33], object classification [34, 35] and robotics [36]. They have also been used for robust image watermarking against geometric attacks [24, 38, 37]. Miyaki et al. [37] proposed an RST-invariant object-based watermarking scheme in which SIFT features are used for object matching; in the detection stage of [37], the object regions are first detected by feature matching, and the transformation parameters are then calculated to detect the hidden message. Although the method produces quite promising results, it is a form of informed watermarking, since a registration file has to be shared between the sender and the receiver. Kim et al. [24] inserted the watermark into circular patches generated by SIFT; the detection ratio of the method varies from 60% to 90% depending on the intensity of the attack, and under strong distortions due to attenuation and cropping the additive watermark may fail to survive for several images. Jing et al. [38] used SIFT points to form a convex hull, which is then optimally triangulated; the watermark is embedded into circles centred at the centroid of each triangle. This scheme also fails under large resolution scaling. All these schemes use SIFT features to select an embedding zone and to synchronize the watermark location during extraction. In this work, a novel watermarking scheme is proposed in which the SIFT feature descriptor itself is used as the watermark signal. The intensity of an image patch is altered in such a way that it generates new feature points, and the descriptors of these new feature points are stored as the watermark signal. The patch is chosen so that it causes little perceptual distortion.

5.2 Proposed Scheme

This scheme exploits the invariance of SIFT features to rotation, scaling and translation. The original image is modified in a context-coherent way such that the perceptual meaning of the image is not changed. This perturbation generates a new set of SIFT features, and the new SIFT descriptors (the 128-dimensional vectors associated with these new features) act as the watermark message. These new features are extracted and registered in a database. During watermark detection, the SIFT features are extracted from the image under consideration, and SIFT matching is performed between the registered features in the database and the features extracted from the attacked image. A high degree of matching (greater than a prescribed threshold) denotes the authenticity of the image. Since SIFT features are invariant to geometric distortions, they are likely to match with high probability under image scaling; the proposed watermarking scheme is thus robust to resolution and quality adaptation.
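As a concrete illustration of this detect-by-matching step, the following Python sketch uses OpenCV's SIFT implementation together with Lowe's ratio test; the 0.75 ratio and the authentication fraction are illustrative assumptions, not the thesis's tuned parameters.

```python
import cv2

def sift_descriptors(img_gray):
    """Extract SIFT keypoints and their 128-dimensional descriptors."""
    sift = cv2.SIFT_create()
    return sift.detectAndCompute(img_gray, None)

def match_ratio(registered_desc, attacked_desc, ratio=0.75):
    """Fraction of registered descriptors surviving Lowe's ratio test
    against the attacked image; authenticate when this fraction is high."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(registered_desc, attacked_desc, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    return len(good) / max(len(registered_desc), 1)
```

The ratio test, rather than exact descriptor equality, is what lets registered features re-match after the image has been rescaled.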
5.2.1 Watermark Zone Selection

The first step of the proposed scheme is to find an image region such that altering it generates the least perceptual error. To this end, the contrast of the image is first increased (refer to Step 2 of Algorithm 5.1), so that background objects interleaved with the background are detected in the next step. In Step 4, an area opening is performed on the binary image to remove all connected components with few pixels. Connected components are determined by the 4-connected neighbourhood operator [70], and the areas and pixel positions of all objects are obtained. It is observed that among the obtained areas A, the top 5% of objects by area capture almost the whole image, leaving a collection of small objects that are insignificant to the human visual system (HVS). The distribution of objects is such that there are many objects with small area and very few with appreciable area. Choosing the patch area generically from all objects would not give good results, since the distribution of object areas varies from image to image; the unique object areas U (Step 6) are therefore used, because their distribution is nearly the same for all images. The quality parameter β selects an object that is neither too large (noticeable to the human eye) nor too small (yielding fragile SIFT points); its value is empirically found to be 1.5, with details given in Sec. 5.2.2. The object with area U(⌊n/β⌋) is then found. In the very rare case that ⌊n/β⌋ is zero, U(1) is used instead. This gives the pixels where the image has to be altered.

Algorithm 5.1: Watermark Zone Selection
Input: Image I, quality parameter β
Output: Pixel locations in I
1. Convert I to a gray-scale image.
2. Increase the contrast of I.
3. Convert the gray-scale image to a binary image using a threshold computed by Otsu's method [71].
4. Remove small objects from the binary image.
5. Obtain the list of connected components of the modified binary image, with their pixel positions and areas. Denote each object in the list by {A(i), P(i)}, where A(i) is the area and P(i) contains the pixel locations of the i-th object.
6. Obtain the list U of unique areas in A, sorted in ascending order; let n be its length.
7. Let p = ⌊n/β⌋. Find a connected object with area U(p), i.e. find i such that A(i) = U(p).
8. Return the pixel locations P(i) of object i.

Figure 5.1: Lena Binary Image
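A minimal Python sketch of Algorithm 5.1 with OpenCV's thresholding and connected-component routines follows; the use of histogram equalization for the contrast step and the small-object cutoff value are assumptions made for illustration.

```python
import cv2
import numpy as np

def select_zone(img_bgr, beta=1.5, min_area=20):
    """Sketch of Algorithm 5.1: Otsu binarisation, small-object removal,
    then pick the object whose area is the floor(n/beta)-th unique area."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # step 2: raise contrast (method assumed)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # step 3
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(
        binary, connectivity=4)  # 4-connected components [70]
    objects = [(stats[i, cv2.CC_STAT_AREA], i) for i in range(1, n_labels)
               if stats[i, cv2.CC_STAT_AREA] >= min_area]  # step 4 cutoff assumed
    areas = sorted({a for a, _ in objects})  # step 6: unique areas, ascending
    p = max(int(len(areas) / beta), 1)       # step 7: p = floor(n / beta)
    label = next(i for a, i in objects if a == areas[p - 1])
    return np.argwhere(labels == label)      # step 8: pixel locations of object i
```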
5.2.2 Derivation of Quality Parameter β

A lower β selects an object with larger area in Algorithm 5.1, which yields a watermark patch that is more perceptible to the HVS but more robust; a higher β gives less robustness and better visual quality. There is hence a trade-off between these two factors. The proposed algorithm uses an empirical value of 1.5 for β. Since β directly affects both the visual quality and the robustness of the watermark, in the experimental setup its value is varied from 1.1 to 2.0 in increments of 0.1, and the Watson metric and the robustness at each scale are calculated for every β. The optimal β is the one with maximum robustness and minimum perceptual error. The total robustness of an image at a particular β is the arithmetic mean of the robustness over scales, (1/|s|) Σ_s R_s, where R_s is the robustness at scale s; in this experimental study the scaling factor ranges from 0.3 to 1.2 and R_s is computed as in Eqn. 5.2. The arithmetic mean over all images then gives the robustness at each β. Since the perceptual error is very small and the robustness comparatively large, the two are normalized separately. The difference between robustness (R) and perceptual error (PE) is used to determine the value of β; the plot is shown in Fig. 5.2. It can be observed that beyond β = 1.5 the slope of the curve increases very little, and as β grows above 1.5 the robustness becomes too low for the watermarking itself to be effective. This is similar to the parameter estimation done by Lowe [30], where it is noted that one must settle for a solution that trades off efficiency against completeness. So even though the curve rises slightly at the end, β is chosen as 1.5.

Figure 5.2: Plot for finding the best possible β

5.2.3 Watermark Embedding

The entire watermark embedding process is summarized in Algorithm 5.2. Algorithm 5.1 is applied to the given image to obtain the pixel locations to be modified. The selected pixels are changed using Eqn. 5.1 so that the newly generated SIFT features are strong and do not match the original image. The SIFT features Dw that are present in the modified image but not in the original image are then extracted; this set of features acts as the watermark message and is registered in the database. Each member of Dw is a 128-dimensional vector representing a SIFT descriptor that is exclusively present in the watermarked image.

Algorithm 5.2: Watermark Embedding
Input: Image I
Output: Watermark descriptors Dw
1. Extract the SIFT features of the original image I; let D be the set of SIFT descriptors obtained.
2. Apply Algorithm 5.1 to get the pixel locations P in I where the patch has to be applied.
3. Modify the original image I to I′ by changing the intensities at the pixel locations non-linearly:
I′(P) = mod(I(P)², 256)   (5.1)
4. Extract the SIFT features of the modified image I′; let D′ be the set of descriptors obtained.
5. Take the set difference of the two sets: Dw = D′ \ D.

5.2.4 Watermark Extraction & Authentication

In the proposed scheme, a blind watermark extraction method is employed. SIFT features are first extracted from the attacked image, and feature matching is then performed between the extracted features and the registered features. For authentication, the matching percentage must exceed a predefined threshold. A step-wise description is given in Algorithm 5.3.

Algorithm 5.3: Watermark Extraction & Authentication
Input: Attacked image I′
1. Extract the SIFT features of the image I′ whose authenticity is to be checked; let D′ be the set of descriptors obtained.
2. Retrieve the registered feature descriptors Dw from the database.
3. Apply SIFT matching to find the features in Dw that match D′. If the degree of matching is high (greater than a prescribed threshold), the image is matched and authenticated.
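The non-linear perturbation of Eqn. 5.1 and the descriptor set difference of Algorithm 5.2 can be sketched as follows; since raw descriptor sets have no exact equality, treating "not in D" as "fails the ratio test against D" is an implementation assumption made here for illustration.

```python
import cv2
import numpy as np

def embed_watermark(img_gray, patch_pixels):
    """Apply Eqn. 5.1 at the selected pixels and return the descriptors
    present only in the modified image (Dw = D' \\ D)."""
    sift = cv2.SIFT_create()
    _, d_orig = sift.detectAndCompute(img_gray, None)

    marked = img_gray.copy()
    rows, cols = patch_pixels[:, 0], patch_pixels[:, 1]
    marked[rows, cols] = np.mod(marked[rows, cols].astype(np.int64) ** 2, 256)

    _, d_new = sift.detectAndCompute(marked, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(d_new, d_orig, k=2)
    # keep only descriptors of the marked image that do NOT match the original
    fresh = [i for i, (m, n) in enumerate(matches) if m.distance >= 0.75 * n.distance]
    return marked, d_new[fresh]
```

The squaring modulo 256 is deliberately non-linear: a linear shift would mostly translate existing descriptors, whereas this perturbation creates genuinely new local structure.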
5.2.5 Experimental Results

In this section, a comprehensive set of experiments is carried out to justify the efficiency of the proposed scheme over the existing literature.

Experimental Setup

Data Set: The proposed scheme is tested on a large dataset of approximately 82,000 images collected from standard computer vision databases: the Complex Scene Saliency Dataset (CSSD) and Extended Complex Scene Saliency Dataset (ECSSD) [64], the Caltech 256 dataset [65] and the LabelMe-12-50k dataset [66]. The images used for experimentation therefore have varied characteristics and cover many categories. The collection also includes 24 standard images such as lena, baboon and airplane.

Evaluation Parameters:

Robustness: The percentage of matching is taken as the robustness of the descriptor, i.e.

R = m / |Dw|   (5.2)

where Dw is the set of feature point descriptors in the database and m is the number of points in Dw matched against D′.

Perceptual Quality: For assessing the perceptual quality of the proposed scheme, the Watson metric [63] is used, which computes the global perceptual error (GPE) of the watermarked image with respect to the original image. In addition, RARE2012 [58] is used to evaluate the mean visual saliency of the patch in the original image, to assess the efficiency of the embedding-location selection process.

Visual Quality

The embedding patch is selected in such a way that the modified image is very close to the original image in terms of visual perception. Table 5.1 gives the GPE for the standard images.

Table 5.1: GPE for Standard Images
  Image          baboon  barbara  boat  girl  lenna  mountain  serrano  tulips  zelda
  GPE (×10⁻⁴)    1.55    7.89     2.82  2.54  2.89   16.7      2.72     3.14    14.2

Robustness against Resolution Scaling

Table 5.2 gives the median robustness of the watermark over all images in the database; the results for the standard images under the blind watermarking scheme are shown in Table 5.3. The watermark is observed to be highly robust against resolution scaling, owing to the selection of highly stable SIFT points.

Table 5.2: Average Robustness for all the images in the dataset
  Scale       1.2    0.9    0.7    0.5    0.3
  Robustness  89.91  86.72  71.87  69.36  50.59

Table 5.3: Robustness for Standard images when scaled
  Image     1.2    0.9    0.5    0.3
  baboon    91.67  83.33  76.67  53.33
  barbara   85.71  82.22  80.00  60.00
  boat      83.33  81.13  69.81  47.17
  girl      90.00  86.67  86.00  62.22
  lena      88.89  85.29  73.52  52.94
  mountain  89.66  85.21  70.69  55.17
  serrano   96.88  90.63  90.63  68.75
  tulips    85.71  85.71  81.63  59.18
  zelda     83.33  86.36  86.36  81.81

A comparison of the proposed scheme with [24] and [38] with respect to robustness against resolution scaling is depicted in Fig. 5.3. It is observed that the proposed scheme outperforms the existing schemes for the standard images.

Figure 5.3: Robustness comparison with schemes [24] (red) and [38] (black)
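The robustness measure of Eqn. 5.2 can be scripted directly: rescale the image, re-extract descriptors, and match against the stored set. The sketch below reuses the ratio-test helper match_ratio from the earlier sketch; the function name and interpolation choice are illustrative.

```python
import cv2

def robustness_at_scale(img_gray, d_w, scale):
    """R = m / |Dw| after resolution scaling (Eqn. 5.2)."""
    h, w = img_gray.shape
    scaled = cv2.resize(img_gray, (int(w * scale), int(h * scale)),
                        interpolation=cv2.INTER_AREA)
    sift = cv2.SIFT_create()
    _, d_attacked = sift.detectAndCompute(scaled, None)
    return match_ratio(d_w, d_attacked)  # ratio-test matcher sketched earlier
```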
5.3 Improvement over the proposed scheme

Although the scheme proposed in the previous section performs quite well against resolution scalability, it can be improved further. In this section, the improvements over the previous scheme are discussed. In the improved version, a visual-saliency-based zone selection method is employed to obtain better visual quality of the watermarked image, and the stability (or robustness) of the SIFT features is taken into account: the new SIFT features generated by patch insertion are sorted with respect to their stability, and the 10 most robust (stable) features are stored as the watermark.

5.3.1 Strength of Individual SIFT Features

SIFT is used extensively in computer vision, object matching and image retrieval. The SIFT algorithm generates a large number of feature points, but the matching strength (robustness) of the feature points is not uniform. In [72], a method is described for measuring the stability of a feature point: a large set of SIFT descriptors is first clustered into 4096 clusters, and each cluster is then assigned a quality metric given by its false-match rate, where a lower false-match rate signifies higher stability. To obtain the false-match rate of a cluster, features are computed from a large image set and the mapping between clusters and features is identified; each image is then passed through a set of transformations (scaling, rotation, etc.) and the false matches are counted for each image and each transformation. The ratio of the total number of false matches from a cluster to the total number of features in that cluster is taken as the false-match rate of that cluster. In the proposed scheme, the features newly generated by changing the intensity of a patch are stored as the watermark, and from this new feature set, the features belonging to clusters with low false-match rates are chosen.

5.3.2 Modified Watermark Zone Selection

The step-by-step procedure of the modified zone selection is described in Algorithm 5.4. The image zone is determined using the visual attention model RARE2012 [58], which assigns saliency based on multi-scale spatial rarity and is described in Sec. 2.3. First the saliency map of the input image is computed. The saliency map is then converted into binary values, with the lowest 15% of saliency values set to white (1). Finally, all connected components of the binary saliency map are determined.

Algorithm 5.4: Watermark Zone Selection
Input: Image I
Output: Pixel locations in I
1. R = RARE2012(I)
2. Convert the saliency map to binary by setting the lowest 15% of values to one and the rest to zero.
3. Obtain the list of connected components of this binary map.
4. Return all connected components and their locations.
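RARE2012 has no standard Python API (the published model ships as MATLAB code), so the sketch below assumes a placeholder callable rare2012_saliency that returns a float saliency map; the rest of Algorithm 5.4 maps onto numpy and OpenCV directly.

```python
import cv2
import numpy as np

def low_saliency_components(img_bgr, rare2012_saliency, keep_fraction=0.15):
    """Algorithm 5.4: binarise the lowest-saliency 15% of pixels and list
    their connected components, smallest first (as step 3 of Alg. 5.5 expects)."""
    sal = rare2012_saliency(img_bgr).astype(np.float32)  # placeholder model call
    cutoff = np.quantile(sal, keep_fraction)             # lowest 15% of values
    binary = (sal <= cutoff).astype(np.uint8)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=4)
    comps = [(stats[i, cv2.CC_STAT_AREA], np.argwhere(labels == i))
             for i in range(1, n)]
    return sorted(comps, key=lambda c: c[0])
```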
5.3.3 Watermark Embedding

The modified watermark embedding process is summarized in Algorithm 5.5. Algorithm 5.4 is applied to the input image to find the candidate pixel locations for embedding, and the object with the lowest saliency values in the map is chosen for embedding. In this scheme, the pixel intensities are perturbed by δ. The value of δ is varied from −255 to 255 and is chosen such that it generates a large number of new features with relatively little visual degradation. If the number of features thus generated is less than a threshold (Fth), the next object is chosen. Once the required number of new features has been generated, they are passed to the selection procedure described in Sec. 5.3.1, and the 10 features with the lowest false-match rate are stored as the watermark.

Let D and D′ be the sets of descriptors generated from the original image and the watermarked image respectively. In the previous scheme, the new feature set was generated by taking the set difference of D′ and D. However, changing the intensity of the patch turns many original descriptors into new descriptors at very small distance from the originals, and these were stored as the watermark. When matched against the original descriptor set, such features match with high probability, because the SIFT matching algorithm (as described in Sec. 2.2) does not require two matched features to be at zero distance (an exact match). As a result, the previous scheme shows a high percentage of matching between the stored watermark and the original feature set (false positives); the matching ratio of the watermark descriptors Dw with the original descriptors D in the previous scheme is plotted in Fig. 5.4. In the modified embedding scheme, the new feature set is instead computed (in step 7 of Algorithm 5.5) as the set difference of D′ and Dm, where Dm is the subset of D′ that matches the original descriptor set D.

Figure 5.4: Matching ratio of watermark descriptors with original descriptors in the previous scheme
Figure 5.5: Watermark Embedding Scheme

5.3.4 Watermark Extraction

Watermark extraction is the same as in the previous scheme, as described in Sec. 5.2.4.

Algorithm 5.5: Watermark Embedding
Input: Image I
Output: Watermark descriptors D
1. Extract the SIFT features of the original image I; let D be the set of SIFT descriptors obtained.
2. Apply Algorithm 5.4 to get the connected components P with lowest saliency.
3. Select the smallest component, say Pi.
4. Modify the original image I to I′ by changing the intensities at the pixel locations Pi, i.e. I′(Pi) = I(Pi) + δ, where δ is a constant.
5. If the number of new features is less than Fth, select the next smallest connected component and go to step 4.
6. Extract the SIFT features of the modified image I′; let D′ be the set of descriptors obtained.
7. Find the SIFT matches between D and D′. Let Dm be the set of features in D′ matched with D; the new feature set is D = D′ \ Dm.
8. Compute the quality of each new SIFT feature as described in Sec. 5.3.1.
9. Sort the descriptor set D in ascending order of false-match rate (best quality first).
10. Store the first 10 descriptors of the sorted D as the watermark.
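The corrected set difference of step 7, removing from D′ everything that still matches the original D, can be sketched as below; it reuses OpenCV's ratio-test matching, with the 0.75 ratio an assumed value.

```python
import cv2

def truly_new_descriptors(d_orig, d_marked, ratio=0.75):
    """Step 7 of Algorithm 5.5: D = D' \\ Dm, where Dm contains the
    descriptors of the marked image that still match the original set."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(d_marked, d_orig, k=2)
    matched = {i for i, (m, n) in enumerate(matches)
               if m.distance < ratio * n.distance}      # indices forming Dm
    keep = [i for i in range(len(d_marked)) if i not in matched]
    return d_marked[keep]
```

Filtering by the same matcher used at detection time is the point of the fix: anything this matcher would confuse with an original feature is excluded from the watermark, eliminating the false positives of the earlier set difference.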
Figure 5.6: Visual degradation comparison between the proposed scheme and existing schemes for Lena

5.3.5 Experimental Results

The modified scheme is evaluated with the same experimental setup as the previous scheme (Sec. 5.2.5).

Comparison with the previous scheme

In previous work [73], a SIFT based image watermarking scheme was proposed in which the watermark is embedded in an object of a chosen size and all newly generated features are stored in the register. The proposed algorithm causes less visual degradation than that previous scheme [73], since the changes here are made in the least salient image regions. A comparison of the two schemes in terms of GPE for 7 standard images is shown in Fig. 5.6. There is also a significant improvement in robustness over the previous scheme, attributable to the selection of stable SIFT features for watermarking. Robustness comparisons for the Boy and Serano images are shown in Figs. 5.7 and 5.8 respectively.

Figure 5.7: Robustness comparison between the proposed scheme and existing schemes for Boy
Figure 5.8: Robustness comparison between the proposed scheme and existing schemes for Serano

Visual Quality

The changes made to the images in the proposed scheme are barely perceptible to the HVS because they are made in the least salient region of the image. The watermarked images and the SIFT features generated for baboon, barbara and lena are shown in Fig. 5.9; it can be observed from Fig. 5.9 that embedding introduces no perceptible visual artifacts in the watermarked images. The GPE [63] for the standard images, together with the RARE2012 [58] saliency values at the corresponding image locations, is tabulated in Table 5.4. As the least salient regions are chosen for embedding, the saliency values of the selected locations are zero for the proposed scheme.

Figure 5.9: Embedding of the watermark for baboon, barbara and lena. Top row: original images. Middle row: watermarked images with patch. Bottom row: newly generated SIFT points due to insertion of the patch.

Table 5.4: GPE and Saliency for Standard Images
  Image    GPE (×10⁻⁴)  Saliency values
  baboon   0.21         0.0
  girl     0.22         0.0
  lena     0.96         0.0
  serrano  0.28         0.0
  tulips   2.84         0.0
  monarch  0.29         0.0
  boy      0.80         0.0

Robustness against Resolution Scaling

Table 5.5 tabulates the median robustness of the watermark for sample images from the databases; the results for the standard images are shown in Table 5.6. From the tables, the proposed watermarking scheme is observed to be highly robust against resolution scaling, intuitively due to the selection of highly stable SIFT points.

Table 5.5: Median Robustness for all the images
  Scale       1.2    0.9    0.7    0.5    0.3
  Robustness  76.34  76.39  71.02  67.08  52.73

Table 5.6: Robustness for Standard images when scaled
  Image    1.2     0.9     0.5    0.3
  baboon   50.0    83.33   83.33  83.33
  girl     85.71   71.42   71.42  42.85
  lena     100.0   83.33   83.33  83.33
  monarch  100.0   100.0   50.0   25.00
  boy      70.0    70.0    70.0   60.00
  serrano  60.00   90.00   60.00  30.00
  tulips   76.47   64.71   58.82  23.53

Relation of the change in candidate pixel (P) intensity to feature stability and image quality

In this section, the way the stability of the new feature points and the quality of the watermarked image vary with the change in intensity of the candidate pixels (P) is tested. The intensity of the patch (P) is varied gradually and the changes in the stability of the new feature points are observed. The plots in Fig. 5.10 show that the stability of the SIFT features increases with increasing change in patch intensity. Since the images used for experimentation are 8-bit colour images, the plot saturates once the changed intensity reaches 0 or 255 (the intensity cannot be decreased or increased further); the point at which this occurs depends on the initial image intensity. It has also been observed that the maximum stability attained depends on the image, because the number of newly generated SIFT points and their stability vary with the context of the patch. Similar experiments are carried out to observe the change in image quality (perceptual error) due to the change in intensity.
The intensity of the selected patch is varied gradually and the change in perceptual error with respect to the original image is observed. The plots of these experiments are shown in Fig. 5.11. The magnitude of the perceptual error is very small: over all the images, it never exceeds 3.235 × 10⁻³, owing to the significantly small size of the patch relative to the image. As with the stability plot, the perceptual error also saturates once the changed intensity reaches the maximum (255) or minimum (0) possible value.

These observations are confirmed by observing the change in stability against the perceptual error; the resulting graphs are shown in Fig. 5.12. As expected, higher stability comes with greater perceptual error, so there is a trade-off between the two factors of stability and visual perceptual error. Another trade-off, between robustness and visual quality, is explained in the next section.

Figure 5.10: Plot depicting variation of stability with patch intensity
Figure 5.11: Plot depicting variation of perceptual error with patch intensity

Robustness and Visual Quality Trade-off

In this section, the trade-off is observed by modifying Step 4 of Algorithm 5.1 used for zone selection. Throughout the scheme, the watermark is obtained by complementing the intensity values of the original patch. Since the user has only the watermarked image and not the original, it is difficult for the user to predict the location of the patch. Alternatively, a patch can be inserted by simply increasing the pixel intensity by some value; in this experimentation the value is taken as 20, based on experimental evidence. It is observed that the amount of alteration of the pixel values within the selected patch due to watermark injection has a direct consequence for the perceptual quality and robustness of the watermarking scheme. Increasing the intensity by a small amount instead of complementing the pixel values clearly reduces the perceptual error; but since SIFT extracts distinctive features of local image patches, a small intensity increase generates fewer points than complementing the same pixels, so robustness decreases. This has been verified experimentally, and the comparative graphs of the visual quality metric and robustness for the two cases are shown in Fig. 5.14 and Fig. 5.13.

5.4 Conclusion

In this chapter, a content-based image watermarking scheme is proposed in which an image is modified by inserting a context-coherent patch, and the newly generated SIFT descriptors are stored as the watermark. The proposed scheme can be applied to each video frame to develop a video watermarking scheme resilient to resolution scaling. The scheme belongs to the class of second-generation watermarking schemes, which use SIFT descriptor points. Experimental results show that the proposed watermarking scheme is highly robust to scaling attacks. In the second part of this chapter, the proposed scheme is improved by incorporating saliency-based embedding zone selection and selecting stable SIFT features for watermarking.
Also, various results regarding the stability of the feature points have been described. In the next chapter, the SIFT features are used to develop a video watermarking scheme against temporal scalability.

Figure 5.12: Plot depicting variation of stability with perceptual error
Figure 5.13: Variation of robustness with change in patch intensity by 20
Figure 5.14: Variation of GPE with change in patch intensity by 20

Chapter 6
Robust Video Watermarking against Temporal Scalability

As observed in chapter 3, temporal de-synchronization is a potential threat to a watermarking system and can be mounted by simply dropping some random frames; temporal scaling (frame-rate scaling) is a common de-synchronization attack. In chapter 3, a semi-blind watermarking algorithm was proposed to resist this attack, using a location map to find the watermark locations. The size of the location map is directly proportional to the size of the video, and storing or communicating a location map (of size close to that of an original video frame) for every frame is an extra overhead that may not always be feasible. In this chapter, two blind video watermarking schemes are proposed that resist temporal de-synchronization attacks (e.g. frame dropping) without the help of any location map. In the first method, a SIFT (Scale Invariant Feature Transform) based watermarking scheme is proposed to resist temporal scaling, in which SIFT features of side views of the video are used for watermarking. In the second scheme, different watermark signals are embedded in different hierarchical layers of the H.264/SVC encoding to ensure graceful improvement across successive enhancement layers. These two schemes are described in detail in the following subsections.

6.1 SIFT based Video Watermarking Resilient to Temporal Scalability

In this work, a watermarking scheme is proposed that is robust against temporal adaptation in scalable video coding. SIFT is used to make the scheme robust against frame-rate adaptation due to temporal scaling, and frame dropping attacks can also be resisted. In this method, the video sequence (of a given number of frames) is modelled as a 3D signal: it forms a three-dimensional cuboid whose depth equals the number of frames in the sequence. If the dimension of a video frame is m × n and k such frames are considered, a cuboid of dimension m × n × k is formed, with height m, width n and depth k pixels. Now, if a side face of this cuboid is viewed as an image (of dimension m × k), there exist, without loss of generality, n such images; the cuboid can then be regarded as having height m, width k and depth n. The scenario is depicted in Fig. 6.1. Intuitively, the side face images of the cuboid defined above depict the motion characteristics of the given video sequence.

Figure 6.1: Side plane and embedding zone; (a) side plane, (b) watermarked region and corresponding side planes
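Obtaining these side views is just an axis permutation of the frame stack. A minimal numpy sketch, assuming the video is already loaded as a (k, m, n) array of grayscale frames:

```python
import numpy as np

def side_views(video):
    """video: (k, m, n) array of k frames of size m x n.
    Returns the n side-face images, each of size m x k (cf. Fig. 6.1)."""
    sides = np.transpose(video, (2, 1, 0))  # shape (n, m, k)
    return sides

# sides[y] is the m x k side image built from column y of every frame,
# so dropping frames shrinks its width, i.e. acts as width scaling.
```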
In the proposed scheme, a watermark patch (say of size u × v) is inserted in one of the side face images. The patch is embedded in such a way that it generates strong SIFT features, and these newly generated SIFT features themselves act as the watermark. Frame-rate adaptation or frame-dropping attacks may be considered as width scaling of such images. If this width scaling is known a priori, the corresponding height scaling can also be applied to obtain a resized side view image. Since SIFT features are invariant to scaling, they can be extracted at any image resolution, i.e. from the video after frame-rate adaptation or frame-dropping attacks. The different steps of the proposed scheme, namely watermarking zone selection, watermark embedding and extraction, are described in the following subsections.

6.1.1 Watermarking Zone Selection

As stated above, a patch is inserted in a side face image such that it generates strong SIFT features. Although the SIFT features are computed over the side face images of the video cuboid, the watermarking zones are selected with respect to the spatial and temporal characteristics of the normal video frames (in the rest of this chapter, this actual video orientation is called the main view). Intuitively, video zones with low motion and high texture are suitable for embedding, as they help reduce flickering artifacts and usually mask the spatial additive noise due to watermarking. The embedding zone selection of the proposed scheme is depicted in Fig. 6.2, where the suitable embedding zones in the main view are located in Fig. 6.2a and the corresponding side view image regions in Fig. 6.2b.

Figure 6.2: Embedding zones; (a) cover video frame, (b) side view

Block Selection in the Main View

In the proposed scheme, zone selection is first done with respect to the main view. Low-motion zones in the main view are selected for embedding, since embedding in high-motion zones may cause flickering artifacts [60]. To find the low-motion zones of a video sequence, a motion map is generated. The difference between consecutive frames is directly proportional to the motion in the video: high motion produces a large difference, unless there is a scene change. Using this characteristic, the motion map is defined as the summation of absolute differences of consecutive frames over a video sequence of n frames:

MMap(i, j) = \sum_{k=1}^{n-1} |frame_{k+1}(i, j) − frame_k(i, j)|   (6.1)

where (i, j) is the pixel (spatial) index of the frame and k is the frame (temporal) index of a video sequence with n frames. The main view of the City video and its corresponding motion map are depicted in Fig. 6.3a and Fig. 6.3b respectively. A motion threshold is employed to select the 10% of frame pixels with the lowest motion according to the motion map; the binary motion map after thresholding is depicted in Fig. 6.3c, where black pixels represent low-motion pixels.

Figure 6.3: Motion map; (a) original frame, (b) motion map, (c) binary motion map after thresholding
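Eqn. 6.1 and the 10% low-motion mask translate directly into numpy; a minimal sketch, assuming an (n, H, W) grayscale frame stack:

```python
import numpy as np

def motion_map(frames, low_fraction=0.10):
    """Eqn. 6.1: sum of absolute consecutive-frame differences, plus the
    binary mask marking the lowest-motion 10% of pixels."""
    diffs = np.abs(np.diff(frames.astype(np.int32), axis=0))  # (n-1, H, W)
    mmap = diffs.sum(axis=0)
    threshold = np.quantile(mmap, low_fraction)
    low_motion = mmap <= threshold  # True where motion is lowest
    return mmap, low_motion
```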
The energy of the block 106TH-1469_10610110 6.1 SIFT based Video Watermarking Resilient to Temporal Scalability is measured by taking the sum of the squared AC coefficients of the transformed block as given in Eqn. 6.2. EBL = ( x=h−1∑ x=0 y=w−1∑ y=0 [C(x, y)]2 ) − [C(0, 0)]2 (6.2) where C(0, 0) is the DC coefficient, C(x, y) is DCT coefficient at location (x, y), h and w is height and width of the block. An energy threshold has been used to select the higher energy blocks for embedding. The blocks are sorted in non increasing order separately with respect to the energy of the blocks and the higher number of black pixels in the motion map. Now blocks having highest energy (priority one) with maximum black pixels in the motion map (priority two) has been used for embedding. The number of blocks thus selected for embedding may be decided according to the desired payload which depends on the applications. Block Selection in the Side View Once the blocks with respect to the main view image are selected, the embedding suitability of these blocks with respect to side view image has been analyzed. The blocks which are selected with respect to the main view image can easily be identified in the side view image. The selected block locations with respect main view image and their corresponding locations in the side view image are depicted in Fig. 6.2a and Fig. 6.2b respectively. Corresponding blocks in side view image will be at different depth. In Fig. 6.2b, it is shown in single side view image. In side view image, a subset of the selected blocks have been chosen such a way that it generates strong SIFT features. In this work, the strength of the SIFT features are quantified by the number of new SIFT features which are generated when a patch is inserted in a selected block. In other words, blocks / regions which generates more new SIFT features due to patch insertion are more suitable for watermarking. An experiment has been done where number of newly generated SIFT features due to patch insertion are compared between low frequency (smooth background) area and high frequency area (busy areas). The results are plotted in the Fig. 6.4. It can be observed from the Fig. 6.4 that the number of newly generated SIFT features are usually more for the low frequency (smoother) areas than that of high frequency (busy) areas. 107TH-1469_10610110 6. ROBUST VIDEO WATERMARKING AGAINST TEMPORAL SCALABILITY Figure 6.4: New SIFT features for Smooth Area and Busy Area According to the above observation, blocks resided in the relatively smother areas are chosen for embedding from the previously selected blocks. In this work, the blocks residing in relatively smoother area in the side view image are deter- mined by measuring the average energy for the surrounding area of that block. To calculate the average energy of the surrounding area of a block, a bigger block (of size U × V ) is imagined covering the block of size u × v. Thus, there are U/u × V/v no.s of blocks of size u × v within a single bigger block. To decide the relative smoothness among the bigger blocks, the mean intensity values of each smaller block pixels have been calculated and U/u × V/v block DCT has been employed on these mean intensity values. The smoothness of a bigger block is quantified by the squared sum of the AC coefficients obtained from the block DCT. 
The bigger blocks (of size U × V) are sorted in non-decreasing order of the squared sum of their AC coefficients (the higher the squared sum of AC coefficients, the lower the smoothness), and the initial blocks are taken for embedding, where the number of blocks per frame depends on the prescribed payload. The selection procedure for blocks located in relatively smoother regions is depicted in Fig. 6.5: a block (in green) is selected for embedding because it resides in a relatively smoother region within a side view. The overall zone selection algorithm is shown in Fig. 6.6.

Figure 6.5: Selection of blocks belonging to a relatively smoother region
Figure 6.6: Zone selection procedure

6.1.2 Watermark Embedding

In the proposed scheme, watermark embedding is done by patch insertion (altering the pixel intensities of the selected region) in the regions selected as described in Sec. 6.1.1. Essentially, the new SIFT features generated by the patch insertion are treated as the watermark signal. In this embedding process, the pixel intensities of a single side plane of the selected cuboid region are altered to generate new SIFT features. The image plane is chosen randomly from the multiple image planes (the depth of the cuboid), and this randomly chosen plane is used for patch insertion as described in [73]. The whole watermarking scheme is described in Algorithm 6.1.

Algorithm 6.1: Watermark Embedding
Input: Video V
Output: Watermark descriptors Dw, watermarked side plane index y
1. Select the watermark embedding zones as discussed in Sec. 6.1.1.
2. From the side view, there exist multiple image planes (the depth of the cuboid).
3. Randomly choose one side image plane, say SVI_k, for watermarking.
4. Extract the SIFT features of the side view image SVI_k; let D be the set of SIFT descriptors obtained.
5. Change the pixel intensities of the selected region of the side image plane SVI_k using Eqn. 6.3; let the modified side image plane be SVI′_k:
C′ = min(C + δ, 255)   (6.3)
where C is the original pixel value of the selected region and δ is the change in intensity.
6. Extract the SIFT features of the modified side view image SVI′_k; let D′ be the set of descriptors obtained.
7. Take the set difference Dw = D′ \ D; Dw is the set of newly generated SIFT features.
8. Store Dw in the database as the watermark.

6.1.3 Watermark Detection & Authentication

One of the main goals of the proposed scheme is to detect the watermark from a temporally adapted video sequence. In general, the amount of temporal scaling (the number of frames actually dropped) is known a priori to the watermark extraction process. In this work, the height of the side view is scaled in accordance with the temporal scaling (width scaling), so that 2D SIFT can be used for watermark extraction. In the case of a random frame-dropping attack (intentional or unintentional), the number of dropped frames is very small, and in that situation manual height scaling is not required; height scaling of the side frames is needed only if the frame-drop ratio exceeds a threshold (Dth). The resizing process is depicted in Fig. 6.7, and the overall watermark detection and authentication method is summarized in Algorithm 6.2.

Figure 6.7: Temporally adapted video and the corresponding resizing for watermark extraction

Algorithm 6.2: Watermark Extraction & Authentication
Input: Watermarked video V^W; number of frames n1 in the original video; side plane index y where the watermark is embedded; watermark descriptors Dw
1. Let n2 be the number of frames in the watermarked video V^W.
2. If the frame-drop ratio exceeds Dth, resize the side planes to keep the height-width ratio the same (Fig. 6.7b).
3. Extract the SIFT descriptors from the y-th side plane of V^W; let the set be D′.
4. Apply SIFT matching between D′ and Dw.
5. If the percentage of matching is high, the video is authenticated.
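A sketch of this detection path in Python/OpenCV follows, assuming the surviving frames are stacked as a (k2, m, n) array; the Dth and acceptance thresholds are illustrative values, and match_ratio is the ratio-test helper sketched earlier.

```python
import cv2
import numpy as np

def authenticate(frames, y, d_w, n1, d_th=0.9, accept=0.5):
    """Rebuild side plane y, undo the temporal 'width scaling' by restoring
    the original aspect ratio, then SIFT-match against the stored Dw."""
    k2 = frames.shape[0]                              # frames remaining
    side = np.ascontiguousarray(frames[:, :, y].T)    # (m, k2) side image
    if k2 / n1 < d_th:                                # heavy frame dropping
        m = side.shape[0]
        # width shrank by k2/n1, so scale height by the same factor
        side = cv2.resize(side, (k2, int(m * k2 / n1)))
    sift = cv2.SIFT_create()
    _, d_attacked = sift.detectAndCompute(side.astype(np.uint8), None)
    return match_ratio(d_w, d_attacked) > accept
```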
6.1.4 Experimental Results

In this section, a comprehensive set of experiments is carried out to justify the efficiency of the proposed scheme. The scheme is applied to many standard videos of CIF resolution; a few of them are presented here. After watermarking, all videos are encoded with H.264/SVC (Scalable Video Coding) using the JSVM [74] reference software.

Evaluation Parameters:

Robustness: The percentage of matching is taken as the robustness of the descriptor, i.e.

R = m / |D|   (6.4)

where D is the set of feature point descriptors in the database and m is the number of points in D matched against D′ (the descriptors generated from the attacked video).

Perceptual Quality: For assessing the perceptual quality of the proposed scheme, PSNR, SSIM and the flicker metric are used; the MSU-VQMT tool [62] is used to calculate these metrics.

Robustness against Frame Drop Attack

The robustness of the proposed scheme (as defined in Eqn. 6.4) is evaluated against frame dropping and temporal adaptation attacks in Table 6.1 and Table 6.2 respectively. Table 6.1 shows the robustness of the scheme when 15, 30 and 50 frames (out of 300) are dropped randomly. Table 6.2 tabulates the robustness against temporal scaling, where dyadic/non-dyadic scaling is performed with the H.264/SVC encoder. From the tables, robustness remains above 70% for most sequences even after 50% of the frames are dropped.

Table 6.1: Robustness against random frame dropping
  Video       15/300  30/300  50/300
  city        87      84      80
  crew        91      85      80
  coastguard  91      86      84
  mobile      92      85      83

Table 6.2: Robustness against temporal scaling
  Video       25%  50%  75%
  city        75   71   45
  crew        70   56   50
  coastguard  81   80   52
  mobile      82   80   56

Table 6.3 shows the robustness of the proposed scheme against frame averaging, where every frame is replaced by the average of the previous and next frames. The robustness against frame dropping and temporal scaling is compared with Chong's scheme [67] in Fig. 6.8, which shows that the proposed scheme performs better than Chong's scheme [67].
Table 6.3: Robustness against frame averaging
  Video       Robustness
  city        88
  crew        75
  coastguard  80
  mobile      82

Figure 6.8: Robustness comparison with Chong's scheme [67] for the City video

Perceptual Quality of the Watermarked Video

In this section, PSNR, structural similarity (SSIM) and the flicker metric [60] are measured using the MSU video quality measurement tool [62] to quantify the visual distortion due to watermark embedding. Fig. 6.9 compares the PSNR of the watermarked City video for the proposed scheme and Chong's scheme [67]. In this figure, only watermarked frames are compared; although the number of frames altered by Chong's scheme is higher than in the proposed scheme, the same number of frames is compared for both. The PSNR of the proposed scheme is constant across frames because the number of altered pixels and the value of δ are the same for every frame. In Fig. 6.10 and Fig. 6.11, the SSIM and flicker metrics of the watermarked frames are compared with Chong's scheme [67] for the City video; as can be seen from the figures, the proposed scheme gives better results in terms of flickering artifacts.

Figure 6.9: PSNR comparison of the watermarked frames
Figure 6.10: SSIM comparison of the watermarked frames
Figure 6.11: Flicker comparison of the watermarked frames

6.2 Robust Video Watermarking against Temporal Scalability

The previous chapters have shown how temporal adaptation attacks can be resisted using a location-map-based temporal de-synchronization scheme and a SIFT based approach. In this section, another very simple approach is employed against temporal adaptation, aimed at a further important requirement of scalable video watermarking called graceful improvement: the robustness of the watermark should increase with the addition of successive enhancement layers of the scalable encoding. In the proposed scheme, each temporal layer of the scalable video is embedded with a different watermark signal, generated by a DCT-domain decomposition of a single watermark image. A zigzag sequence of block-wise DCT coefficients of the watermark image is partitioned into non-overlapping sets, and each set is embedded separately into a different temporal layer: the base layer is embedded with the first set of DCT coefficients (which includes the DC coefficient of each block), and successive layers are embedded with successive non-overlapping coefficient sets. The coefficients of each set are chosen in such a fashion that a uniform energy distribution across all temporal layers is maintained. Experimental results show that the proposed scheme is robust against temporal scalability and that the robustness of the watermark increases with the addition of successive enhancement layers, achieving graceful improvement.

6.2.1 Proposed Scheme

The proposed watermarking scheme is illustrated in four parts: first the watermark signal generation process is described, then the embedding and extraction algorithms are presented, and finally it is justified how the proposed scheme achieves graceful improvement. H.264/SVC supports dyadic and non-dyadic temporal scalability, and the nature of the scalability as well as the number of temporal layers are defined by input parameters (e.g. GOP size, input frame rate, output frame rate). In the proposed scheme, all frames are first divided into groups, each containing the frames of a specific layer, as shown in Fig. 6.12.
H.264/SVC sup- ports dyadic and non-dyadic temporal scalability and the nature of the scalability as well as number of temporal layers are defined by some input parameters (e.g size of GOP, input frame rate, output frame rate etc.). In the proposed scheme, first, all frames are divided into groups, each containing frames of a specific layer 117TH-1469_10610110 6. ROBUST VIDEO WATERMARKING AGAINST TEMPORAL SCALABILITY Block DCT Watermark Distribution Figure 6.12: Watermark Generation as shown in Fig. 6.12. In the Fig. 6.12, L0, L1 and L2 are group of frames. Frames are categorized using the input parameters of H.264/SVC. Then each set of frame is embedded with different watermark as discussed in Sec. 6.2.2. 6.2.2 Watermark Generation In the proposed scheme, gray scale image is used as watermark. The watermark image is divided into non overlapping blocks of the size k × k. Each blocks then are subjected to k × k block DCT transform. Coefficients of each block are divided into N non overlapping sets where N is the number of temporal layers. Each set consists of DCT coefficients in zigzag order as shown in Fig.6.12. In Fig.6.12, watermark generation and distribution of the watermark is shown for 3 dyadic temporal layers. Black colored coefficient from each block is embedded in all frames of layer L0. Watermark for next layer (L1) frames consist of gray coefficients of each block and so on. Number of coefficients in each set is selected in such a way that the energy of the watermark gets distributed uniformly across the every set of coefficients. 118TH-1469_10610110 6.2 Robust video watermarking against Temporal Scalability 6.2.3 Watermark Embedding In the proposed scheme, watermark is embedded in the approximation sub band obtained from 2-level of wavelet decomposition of each frame to make it robust against compressions. After the watermark generation, first set of coefficients which includes DC coefficient is embedded in the each frame of the base layer frame set (L0 in Fig. 6.12). Successive set of coefficients are embedded in the successive enhancement layers as shown in Fig. 6.12. The overall embedding scheme is depicted in Fig. 6.14. Watermark is embedded using Eqn.6.5 C ′1 = C1 + αw (6.5) where C1 is wavelet coefficient after 2-level wavelet decomposition and α is the watermark strength. In Fig. 6.13, plot of absolute values of DCT coefficients of 16×16 block of a gray scale natural image in zigzag order is shown. It is observed from the Fig. 6.13 that the scale of values decreases toward the end of the right bottom corner. So the use of same α value in Eqn.6.5 for every layer may not be useful in this case. So in the proposed scheme, value of α is incremented in every layer depending on the energy distribution of the successive coefficient sets. 6.2.4 Watermark Extraction Same as embedding, watermarked video is also divided into sets of frames during extraction. Each set contains frames from different temporal layers. From dif- ferent set of frames, watermark information are extracted separately. If only the base layer is available at the user end then only the first set of watermark coef- ficients can be extracted. If one or more enhancement layers are available then successive set of coefficients can be extracted. After extraction, coefficients are arranged in a block structure. If all coefficients are not available then blocks are filled with zeros and inverse-DCT is done on the blocks to extract the watermark. The whole extraction scheme is depicted in Fig.6.15. 
6.2.5 Graceful Improvement

As mentioned in [20], graceful improvement is one of the important characteristics of scalable watermarking: for temporal scalability, the robustness of the extracted watermark should increase with the addition of higher temporal layers. Fig. 6.16 depicts the mechanism employed to achieve graceful improvement within the proposed scheme. If only the base temporal layer of the video is available at the user end, then only a crude approximation of the watermark image is obtained from the first set of extracted watermark coefficients. With the addition of higher temporal layers, higher-frequency coefficients of the watermark image can also be extracted, thus improving the correlation of the extracted image with the original watermark.

Figure 6.16: Graceful Improvement

6.2.6 Experimental Results

The proposed scheme has been tested on video sequences with different motion characteristics (e.g. Akiyo with low motion and Bus with high motion) and different sizes (e.g. City and Crew at 4CIF resolution, Bus and Akiyo at CIF resolution). The Joint Scalable Video Model (JSVM) [74] reference software, version 9.19.12, is used for temporally scalable video encoding. The robustness of the proposed scheme is measured as the correlation between the extracted watermark image and the original watermark image. In Fig. 6.17, the robustness of the proposed scheme is compared with Bhowmik's scheme [39] at different frame rates for the different video sequences. The graphs show that the robustness of the proposed scheme increases with increasing frame rate and is consistently better than that of Bhowmik's scheme [39] for all the video sequences. During extraction in Bhowmik's approach, all dropped frames are replaced with the original frames to maintain the GOP structure. The visual quality of the watermarked video is also better for the proposed scheme than for Bhowmik's scheme; a PSNR comparison of the corresponding watermarked video sequences is shown in Fig. 6.18. The payload differs between the frames of different layers, which produces the pattern in the PSNR plot.

Figure 6.17: Robustness comparison (Akiyo, Bus, City and Crew videos)

Figure 6.18: PSNR comparison (Akiyo, Bus, City and Crew videos)

6.3 Conclusion

In this chapter, two watermarking schemes resilient to temporal adaptation attacks are described. In the first scheme, instead of embedding any watermark information, the watermark is generated from the video itself: the intensity of a few patches on the side faces of the video is changed, and the new features generated by this change are used as the watermark. Experimental results show that the watermark survives even after 75% frame dropping. In the second scheme, the frames of each temporal layer are watermarked with a different watermark, generated by the block DCT of a single watermark image. The proposed scheme achieves graceful improvement in robustness with the addition of higher-layer frames.
Chapter 7

Conclusion and Future Works

With the recent popularity of scalable video coding, secure scalable video transmission has become an important requirement. The work in this dissertation is primarily motivated by the need for robust watermarking solutions for the different scalable adaptations, namely resolution, temporal and quality scalability. A brief summary of the contributions made toward these is provided below.

7.1 Watermarking against Resolution and Quality Scalability

It has been observed in the literature that most of the existing schemes against resolution and quality adaptation fail to meet two basic requirements of scalable watermarking: firstly, the watermark should be extractable from each of the scalable layers, and secondly, the reliability of the extracted watermark should increase with the number of video quality layers, i.e. graceful improvement should be achieved. In the first work of this dissertation, an uncompressed-domain blind video watermarking scheme is proposed against resolution and quality scalability, in which the enhancement layers are embedded with an up-sampled version of the base layer watermark. The spatial synchronization between successive layers is maintained using a location map to achieve graceful improvement. For the base layer, the watermark is embedded in a DC frame, which is generated by accumulating the DC values of non-overlapping blocks of every frame in the input video sequence. The DC frame sequence is up-sampled and subtracted from the original video sequence to generate a residual frame sequence, and DCT based temporal filtering is then applied to both the DC frame sequence and the residual frame sequence. The watermark is embedded in the low pass DC frames, and the up-sampled watermark is embedded in the low pass residual frames. It is experimentally shown that the proposed scheme performs well against resolution and quality adaptation and outperforms existing related schemes.

7.2 Watermarking against Temporal and Quality Scalability

A few schemes in the literature [67, 68] are resilient to random frame dropping when the number of dropped frames is small, but they fail against temporal adaptation, where many more frames are dropped. In the second work of this thesis, a scalable video watermarking scheme has been proposed which is robust against quality and temporal scalability. In the proposed scheme, wavelet based spatial filtering and DCT based temporal filtering are used for selecting the watermark embedding zone. The temporal filtering is applied to each group of pictures (GOP) to exploit the correlation among frames, and the watermark is embedded in the low pass frames (a minimal sketch of this temporal filtering is given below). To extract the watermark, a location map is required, which stores the locations of the embedded watermark in each frame so that the watermark can be extracted after temporal adaptation.
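To recall how the temporal part of this scheme operates, here is a minimal sketch, assuming an additive mark in the temporal low-pass frame of a GOP; the wavelet based spatial zone selection and the location map of the actual scheme are omitted, and the function name and α value are illustrative.

```python
# Minimal sketch (not the thesis code): DCT based temporal filtering of
# a GOP and additive embedding in the temporal low-pass frame.
import numpy as np
from scipy.fftpack import dct, idct

def embed_in_temporal_lowpass(gop, wm, alpha=2.0):
    """gop: (T, H, W) array of frames; wm: (H, W) watermark pattern."""
    tf = dct(gop.astype(float), axis=0, norm='ortho')  # temporal subbands
    tf[0] += alpha * wm            # mark only the low-pass (DC) frame
    return idct(tf, axis=0, norm='ortho')              # back to frames

# Usage: marked_gop = embed_in_temporal_lowpass(np.stack(frames), wm)
```

Because every frame of the GOP contributes to the temporal low-pass frame, the mark is spread over the whole GOP rather than tied to individual frames, which is the correlation-exploiting effect referred to above.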
7.3 Image Watermarking based on SIFT against Resolution Scaling

Although the proposed scheme against resolution scalability outperforms recent existing schemes, its performance may still be improved, especially when the resolution scaling is relatively large. In the third work of the dissertation, a novel SIFT based image watermarking scheme is proposed which is robust to resolution scaling and can easily be extended to video by taking the temporal dimension (motion) into consideration. In this work, a context-coherent image patch is inserted into the image in such a way that it generates new and stable SIFT features, and these newly generated SIFT feature descriptors are themselves used as the watermark. Since SIFT features are invariant to scaling, they can be extracted at any image resolution with high probability. Experiments on a large image data set have been carried out to demonstrate the efficiency of the scheme over the existing literature against high degrees of resolution scaling.

7.4 Watermarking against Temporal Scalability

In the fourth chapter of this thesis, a watermarking scheme is proposed for temporal scalability which outperforms the existing methods but requires a location map for the extraction of the watermark. In the final phase of the work, two blind watermarking schemes are therefore proposed against temporal scalability which require no extra information for the watermark extraction. In the first work, SIFT features are used to handle the temporal scalability: a patch on a side plane of the video is modified to generate a set of new SIFT features, which are then stored in a database as the watermark (a detection-side sketch is given below). The modification is made in a low motion area of a randomly selected frame set to avoid flickering artifacts. The effectiveness of the scheme is experimentally justified against temporal adaptation and frame dropping attacks. In the second work, the frames of each temporal layer are embedded with a different watermark, generated by a block DCT decomposition of a single watermark image. A zigzag sequence of the block DCT coefficients of the watermark image is partitioned into non-overlapping sets such that the energy is distributed uniformly among the sets, and each set of coefficients is then embedded separately into a different temporal layer to achieve graceful improvement. The base layer is embedded with the first set of DCT coefficients (which includes the DC coefficient of each block), and successive layers are embedded with the successive non-overlapping coefficient sets. Experimental results show good robustness and graceful improvement under temporal scalability.
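The detection side of the first scheme can be sketched as follows. OpenCV's SIFT implementation is an assumption here (the thesis does not prescribe a library), as are the ratio test, the threshold values and the function names; the patch modification step itself is omitted.

```python
# Illustrative sketch: SIFT descriptors of a modified patch stored as
# the watermark, re-detected and matched at verification time.
import cv2
import numpy as np

sift = cv2.SIFT_create()

def register_patch_features(frame_gray, patch_rect):
    """Keep descriptors of the keypoints lying inside the modified patch."""
    x, y, w, h = patch_rect
    kps, desc = sift.detectAndCompute(frame_gray, None)
    keep = [i for i, kp in enumerate(kps)
            if x <= kp.pt[0] < x + w and y <= kp.pt[1] < y + h]
    return desc[keep] if keep else np.empty((0, 128), np.float32)

def watermark_present(frame_gray, stored_desc, ratio=0.75, min_matches=4):
    """Declare the mark present if enough stored descriptors survive
    Lowe's ratio test; SIFT's scale invariance is what lets the match
    survive resolution and temporal adaptation."""
    _, desc = sift.detectAndCompute(frame_gray, None)
    if desc is None or len(desc) < 2 or len(stored_desc) == 0:
        return False
    matches = cv2.BFMatcher().knnMatch(stored_desc, desc, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    return len(good) >= min_matches
```

Since detection only needs the stored descriptors and any surviving frame, the scheme tolerates arbitrary frame dropping, which matches the blindness requirement stated above.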
7.5 Future Research Scope

The present study in this dissertation is mainly restricted to uncompressed-domain watermarking. Since uncompressed-domain schemes are comparatively slow because of the decoding and subsequent re-encoding involved, an equivalent study in the compressed domain may be an interesting direction for future work. In addition, combining different scalabilities is always a difficult task and may also be an important future scope of this work. Finally, the extension of these schemes to HD, beyond-HD and 3D video sequences may be another good topic for further research.

References

[1] P. Meerwald and A. Uhl, "Toward robust watermarking of scalable video," in SPIE, Security, Forensics, Steganography, and Watermarking of Multimedia Contents X, vol. 6819, Jan. 2008.

[2] T. Stutz and A. Uhl, "A survey of H.264 AVC/SVC encryption," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 22, no. 3, pp. 325–339, Mar. 2012.

[3] K. Mokhtarian and M. Hefeeda, "Authentication of scalable video streams with low communication overhead," Multimedia, IEEE Transactions on, vol. 12, no. 7, pp. 730–742, Nov. 2010.

[4] National Institute of Standards and Technology, "Advanced encryption standard (AES)," FIPS-197, Nov. 2001.

[5] P. K. Atrey, W.-Q. Yan, E.-C. Chang, and M. S. Kankanhalli, "A hierarchical signature scheme for robust video authentication using secret sharing," in Proceedings of the 10th International Multimedia Modelling Conference, ser. MMM '04. Washington, DC, USA: IEEE Computer Society, 2004, p. 330. [Online]. Available: http://dl.acm.org/citation.cfm?id=968883.969463

[6] I. Cox, J. Kilian, F. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Transactions on Image Processing, vol. 6, no. 12, pp. 1673–1687, Dec. 1997.

[7] F. Hartung and M. Kutter, "Multimedia watermarking techniques," Proceedings of the IEEE, vol. 87, no. 7, pp. 1079–1107, Jul. 1999.

[8] M. Swanson, M. Kobayashi, and A. Tewfik, "Multimedia data-embedding and watermarking technologies," Proceedings of the IEEE, vol. 86, no. 6, pp. 1064–1087, Jun. 1998.

[9] G. Langelaar, I. Setyawan, and R. Lagendijk, "Watermarking digital image and video data: a state-of-the-art overview," Signal Processing Magazine, IEEE, vol. 17, no. 5, pp. 20–46, Sep. 2000.

[10] Y. Tew and K. Wong, "An overview of information hiding in H.264/AVC compressed video," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 24, no. 2, pp. 305–319, Feb. 2014.

[11] S. P. Maity and M. K. Kundu, "Performance improvement in spread spectrum image watermarking using wavelets," International Journal of Wavelets, Multiresolution and Information Processing, vol. 9, no. 1, pp. 1–33, 2011. [Online]. Available: http://www.worldscientific.com/doi/abs/10.1142/S0219691311003931

[12] R. B. Wolfgang and E. J. Delp, "Fragile watermarking using the VW2D watermark," in Proc. SPIE/IS&T Int. Conf. Security and Watermarking of Multimedia Contents, 1999, pp. 204–213.

[13] J. Haitsma and T. Kalker, "A watermarking scheme for digital cinema," in Image Processing, 2001. Proceedings. 2001 International Conference on, vol. 2, Oct. 2001, pp. 487–489.

[14] S. Emmanuel and M. S. Kankanhalli, "Mask-based interactive watermarking protocol for video," pp. 247–258, 2001. [Online]. Available: http://dx.doi.org/10.1117/12.448209

[15] R. Anderson and F. A. Petitcolas, "On the limits of steganography," Selected Areas in Communications, IEEE Journal on, vol. 16, no. 4, pp. 474–481, May 1998.

[16] P. Singh and R. Chadha, "A survey of digital watermarking techniques, applications and attacks," International Journal of Engineering and Innovative Technology (IJEIT), vol. 2, no. 9, 2013.

[17] G. Doerr and J. Dugelay, "Security pitfalls of frame-by-frame approaches to video watermarking," Signal Processing, IEEE Transactions on, vol. 52, no. 10, pp. 2955–2964, Oct. 2004.

[18] G. Dorr and J.-L. Dugelay, "A guide tour of video watermarking," Signal Processing: Image Communication, vol. 18, no. 4, pp. 263–282, 2003, special issue on Technologies for Image Security. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0923596502001443
[19] A. Piper, R. Safavi-Naini, and A. Mertins, "Coefficient selection methods for scalable spread spectrum watermarking," in Digital Watermarking, ser. Lecture Notes in Computer Science, T. Kalker, I. Cox, and Y. Ro, Eds. Springer Berlin Heidelberg, 2004, vol. 2939, pp. 235–246. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-24624-4_18

[20] ——, "Resolution and quality scalable spread spectrum image watermarking," in Proceedings of the 7th Workshop on Multimedia and Security, ser. MM&Sec '05. New York, NY, USA: ACM, 2005, pp. 79–90. [Online]. Available: http://doi.acm.org/10.1145/1073170.1073186

[21] J. Seo and H. Park, "Data protection of multimedia contents using scalable digital watermarking," in Proceedings of the Fourth Annual ACIS International Conference on Computer and Information Science, ser. ICIS '05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 376–380. [Online]. Available: http://dx.doi.org/10.1109/ICIS.2005.42

[22] P. Bas, J.-M. Chassery, and B. Macq, "Geometrically invariant watermarking using feature points," Image Processing, IEEE Transactions on, vol. 11, no. 9, pp. 1014–1028, 2002.

[23] H.-Y. Lee, C.-H. Lee, H.-K. Lee, and J. Nam, "Feature-based image watermarking method using scale-invariant keypoints," in Advances in Multimedia Information Processing - PCM 2005. Springer, 2005, pp. 312–324.

[24] H.-Y. Lee, H. Kim, and H.-K. Lee, "Robust image watermarking using local invariant features," Optical Engineering, vol. 45, no. 3, p. 037002, 2006. [Online]. Available: http://dx.doi.org/10.1117/1.2181887

[25] J. J. O'Ruanaidh and T. Pun, "Rotation, scale and translation invariant digital image watermarking," in Image Processing, 1997. Proceedings., International Conference on, vol. 1. IEEE, 1997, pp. 536–539.

[26] J. Radon, "Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten," Classic Papers in Modern Diagnostic Radiology, p. 5, 2005.

[27] S. Pereira and T. Pun, "Robust template matching for affine resistant image watermarks," Image Processing, IEEE Transactions on, vol. 9, no. 6, pp. 1123–1129, 2000.

[28] G. Sharma and D. J. Coumou, "Watermark synchronization: perspectives and a new paradigm," in Information Sciences and Systems, 2006 40th Annual Conference on. IEEE, 2006, pp. 1182–1187.

[29] M. Kutter, S. K. Bhattacharjee, and T. Ebrahimi, "Towards second generation watermarking schemes," in Image Processing (ICIP), Proceedings, 1999 International Conference on, vol. 1. IEEE, 1999, pp. 320–323.

[30] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110, Nov. 2004. [Online]. Available: http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94

[31] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. New York, NY, USA: Cambridge University Press, 2003.

[32] S. E. Chen and L. Williams, "View interpolation for image synthesis," in Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '93. New York, NY, USA: ACM, 1993, pp. 279–288. [Online]. Available: http://doi.acm.org/10.1145/166117.166153

[33] D. Lowe, "Object recognition from local scale-invariant features," in Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, vol. 2, 1999, pp. 1150–1157.
[34] A. Bosch, A. Zisserman, and X. Muñoz, "Scene classification via pLSA," in Computer Vision - ECCV 2006, ser. Lecture Notes in Computer Science, A. Leonardis, H. Bischof, and A. Pinz, Eds. Springer Berlin Heidelberg, 2006, vol. 3954, pp. 517–530. [Online]. Available: http://dx.doi.org/10.1007/11744085_40

[35] J. Mutch and D. Lowe, "Object class recognition and localization using sparse features with limited receptive fields," International Journal of Computer Vision, vol. 80, no. 1, pp. 45–57, 2008. [Online]. Available: http://dx.doi.org/10.1007/s11263-007-0118-0

[36] P. Saeedi, P. Lawrence, and D. Lowe, "Vision-based 3-D trajectory tracking for unknown environments," Robotics, IEEE Transactions on, vol. 22, no. 1, pp. 119–136, 2006.

[37] V.-Q. Pham, T. Miyaki, T. Yamasaki, and K. Aizawa, "Geometrically invariant object-based watermarking using SIFT feature," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 5, 2007, pp. V-473–V-476.

[38] L. Jing, L. Gang, and Z. Jiulong, "Robust image watermarking based on SIFT feature and optimal triangulation," in Information Technology and Applications, 2009. IFITA '09. International Forum on, vol. 3. IEEE, 2009, pp. 337–340.

[39] D. Bhowmik and C. Abhayaratne, "Video watermarking using motion compensated 2D+t+2D filtering," in Proceedings of the 12th ACM Workshop on Multimedia and Security. New York, NY, USA: ACM, 2010, pp. 127–136. [Online]. Available: http://doi.acm.org/10.1145/1854229.1854254

[40] P. Vinod and P. Bora, "Motion-compensated inter-frame collusion attack on video watermarking and a countermeasure," Information Security, IEE Proceedings, vol. 153, no. 2, pp. 61–73, Jun. 2006.

[41] H.-S. Jung, Y.-Y. Lee, and S. U. Lee, "RST-resilient video watermarking using scene-based feature extraction," EURASIP J. Appl. Signal Process., vol. 2004, pp. 2113–2131, Jan. 2004. [Online]. Available: http://dx.doi.org/10.1155/S1110865704405046

[42] W. Lu, R. Safavi-Naini, T. Uehara, and W. Li, "A scalable and oblivious digital watermarking for images," in Signal Processing, 2004. Proceedings. ICSP '04. 2004 7th International Conference on, vol. 3, Aug.–Sep. 2004, pp. 2338–2341.

[43] C.-C. Wang, Y.-C. Lin, S.-C. Yi, and P.-Y. Chen, "Digital authentication and verification in MPEG-4 fine-granular scalability video using bit-plane watermarking," in IPCV, H. R. Arabnia, Ed. CSREA Press, 2006, pp. 16–21. [Online]. Available: http://dblp.uni-trier.de/db/conf/ipcv/ipcv2006-1.html#WangLYC06

[44] P. Meerwald and A. Uhl, "Robust watermarking of H.264-encoded video: extension to SVC," in Proceedings of the 2010 Sixth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, ser. IIH-MSP '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 82–85. [Online]. Available: http://dx.doi.org/10.1109/IIHMSP.2010.28
[45] A. Alattar, E. Lin, and M. Celik, "Digital watermarking of low bit-rate advanced simple profile MPEG-4 compressed video," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 13, no. 8, pp. 787–800, Aug. 2003.

[46] F.-C. Chang, H.-C. Huang, and H.-M. Hang, "Layered access control schemes on watermarked scalable media," in Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on, May 2005, vol. 5, pp. 4983–4986.

[47] Y. Wang and A. Pearmain, "Blind MPEG-2 video watermarking in DCT domain robust against scaling," Vision, Image and Signal Processing, IEE Proceedings, vol. 153, no. 5, pp. 581–588, Oct. 2006.

[48] L. Yan and Z. Jiying, "RST invariant video watermarking based on 1D DFT and Radon transform," in Visual Information Engineering, 2008. VIE 2008. 5th International Conference on, Jul.–Aug. 2008, pp. 443–448.

[49] Z. Huai-yu, L. Ying, and W. Cheng-ke, "A blind spatial-temporal algorithm based on 3D wavelet for video watermarking," in Multimedia and Expo, 2004. ICME '04. 2004 IEEE International Conference on, vol. 3, 2004, pp. 1727–1730.

[50] A. Essaouabi and E. Ibnelhaj, "A 3D wavelet-based method for digital video watermarking," in Networked Digital Technologies, 2009. NDT '09. First International Conference on, 2009, pp. 429–434.

[51] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 17, no. 9, pp. 1103–1120, Sep. 2007.

[52] M. Flierl and B. Girod, "Video coding with motion-compensated lifted wavelet transforms," Signal Processing: Image Communication, vol. 19, no. 7, pp. 561–575, 2004. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0923596504000372

[53] S.-J. Choi and J. Woods, "Motion-compensated 3-D subband coding of video," Image Processing, IEEE Transactions on, vol. 8, no. 2, pp. 155–167, Feb. 1999.

[54] F. Verdicchio, Y. Andreopoulos, T. Clerckx, J. Barbarien, A. Munteanu, J. Cornelis, and P. Schelkens, "Scalable video coding based on motion-compensated temporal filtering: complexity and functionality analysis," in ICIP, 2004, pp. 2845–2848. [Online]. Available: http://dblp.uni-trier.de/db/conf/icip/icip2004-5.html#VerdicchioACBMCS04

[55] M. Noorkami and R. M. Mersereau, "A framework for robust watermarking of H.264-encoded video with controllable detection performance," Information Forensics and Security, IEEE Transactions on, vol. 2, no. 1, pp. 14–23, Mar. 2007.

[56] R. Atta and M. Ghanbari, "Spatio-temporal scalability-based motion-compensated 3-D subband/DCT video coding," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 16, no. 1, pp. 43–55, Jan. 2006.

[57] J.-R. Ohm, "Advanced packet-video coding based on layered VQ and SBC techniques," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 3, no. 3, pp. 208–221, Jun. 1993.

[58] N. Riche, M. Mancas, M. Duvinage, M. Mibulumukini, B. Gosselin, and T. Dutoit, "RARE2012: a multi-scale rarity-based saliency detection with its comparative statistical analysis," Signal Processing: Image Communication, vol. 28, no. 6, pp. 642–658, 2013.
[59] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: from error visibility to structural similarity," Image Processing, IEEE Transactions on, vol. 13, no. 4, pp. 600–612, Apr. 2004.

[60] X. Fan, W. Gao, Y. Lu, and D. Zhao, "Flicking reduction in all intra frame coding," JVT-E070, Tech. Rep., Oct. 2002.

[61] F. Xiao, "DCT-based video quality evaluation," Final Project for EE392J, Dec. 2000.

[62] D. Vatolin, M. Smirnov, A. Ratushnyak, and V. Yoockin, "MSU video quality measurement tool," MSU Graphics and Media Lab, 2001–2008. [Online]. Available: http://www.compression.ru/video/

[63] A. B. Watson, "DCT quantization matrices visually optimized for individual images," in IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology. International Society for Optics and Photonics, 1993, pp. 202–216.

[64] Q. Yan, L. Xu, J. Shi, and J. Jia, "Hierarchical saliency detection," in Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013, pp. 1155–1162.

[65] G. Griffin, A. Holub, and P. Perona, "Caltech-256 object category dataset," 2007.

[66] R. Uetz and S. Behnke, "Large-scale object recognition with CUDA-accelerated hierarchical neural networks," in Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on, vol. 1. IEEE, 2009, pp. 536–541.

[67] C. Chen, J. Ni, and J. Huang, "Temporal statistic based video watermarking scheme robust against geometric attacks and frame dropping," in Digital Watermarking, ser. Lecture Notes in Computer Science, A. Ho, Y. Shi, H. Kim, and M. Barni, Eds. Springer Berlin Heidelberg, 2009, vol. 5703, pp. 81–95. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-03688-0_10

[68] C. Wang, C. Zhang, and P. Hao, "A blind video watermark detection method based on 3D-DWT transform," in Image Processing (ICIP), 2010 17th IEEE International Conference on, Sep. 2010, pp. 3693–3696.

[69] J. Morel and G. Yu, "Is SIFT scale invariant?" Inverse Problems and Imaging, vol. 5, no. 1, pp. 115–136, 2011.

[70] R. Haralick, "Some neighborhood operators," in Real-Time Parallel Computing. Springer, 1981, pp. 11–35.

[71] N. Otsu, "A threshold selection method from gray-level histograms," Automatica, vol. 11, no. 285-296, pp. 23–27, 1975.

[72] H. Su, W.-H. Chuang, W. Lu, and M. Wu, "Evaluating the quality of individual SIFT features," in Image Processing (ICIP), 2012 19th IEEE International Conference on, Sep. 2012, pp. 2377–2380.

[73] P. Bollimpalli, N. Sahu, and A. Sur, "SIFT based robust image watermarking resistant to resolution scaling," in Image Processing (ICIP), 2014 21st IEEE International Conference on, Sep. 2014.

[74] J. Reichel, H. Schwarz, and M. Wien, "Joint scalable video model 11 (JSVM 11)," Jul. 2007.

List of Publications

Journal Publications:

1. Nilkanta Sahu, Shuvendu Rana, Arijit Sur, "MCDCT-TF based video watermarking resilient to temporal and quality scaling," Multimedia Tools and Applications, doi:10.1007/s11042-015-2949-y.
2. Shuvendu Rana, Nilkanta Sahu, and Arijit Sur, "Robust watermarking for resolution and quality scalable video sequence," Multimedia Tools and Applications, Springer US, vol. 74, pp. 7773–7802, 2015.

3. Arijit Sur, Sista Venkat Madhav Krishna, Nilkanta Sahu and Shuvendu Rana, "Detection of Motion Vector Based Video Steganography," Multimedia Tools and Applications, Springer, pp. 1–16, 2014.

Conference Publications:

1. Priyatham Bollimpalli, Nilkanta Sahu and Arijit Sur, "SIFT Based Robust Image Watermarking Resistant To Resolution Scaling," IEEE International Conference on Image Processing (ICIP), pp. 5507–5511, Paris, France, Oct. 2014.

2. Nilkanta Sahu, Vivek Tiwari and Arijit Sur, "Robust Video Watermarking Resilient to Temporal Scalability," National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2015), Patna, India.

Book Chapter:

1. Nilkanta Sahu, Arijit Sur, "Scalable Video Watermarking: A Survey," in R. Pal (Ed.), Innovative Research in Attention Modeling and Computer Vision Applications, pp. 365–387. Hershey, PA: Information Science Reference. doi:10.4018/978-1-4666-8723-3.ch015