Conference (International) Learnable Cube-based Video Encryption for Privacy-Preserving Action Recognition
Yuchi Ishikawa, Masayoshi Kondo, Hirokatsu Kataoka
IEEE/CVF Winter Conference on Applications of Computer Vision 2024 (WACV 2024)
With the development of cloud services and machine learning, there has been an inevitable need to enhance privacy and security when serving video recognition models. Although existing image encryption methods can be used to address this issue, applying them frame by frame to videos falls short in two respects: model performance degrades, and security strength is limited. In this paper, we propose a novel encryption approach for privacy-preserving action recognition. It consists of two encryption operations: Learnable Cube-based Video Encryption (LCVE) and ViT Scrambling. LCVE is a video encryption method based on spatio-temporal cubes; it has a large key space and can provide robust privacy protection. ViT Scrambling encrypts the Vision Transformer (ViT) model, which enables it to recognize encrypted videos in the same manner as unencrypted videos without modifying the model architecture or fine-tuning on the encrypted data. We evaluate our method on an action recognition task with seven datasets containing a variety of action classes as well as motion and visual patterns. Empirical results demonstrate that LCVE combined with ViT Scrambling can preserve video privacy while recognizing actions in encrypted videos as well as it does in unencrypted videos. As a result, our approach outperforms existing privacy-preserving action recognition methods.
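The pairing of cube-based encryption with a scrambled model can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a simplified encryption that shuffles the values inside each spatio-temporal cube with one key-derived permutation, and it models only a linear cube embedding rather than a full ViT. The function names (`to_cubes`, `encrypt_cubes`, `scramble_embedding`), the cube size, and the key value are all hypothetical.

```python
import numpy as np


def to_cubes(video, t, p):
    """Split a (T, H, W, C) video into flattened cubes of shape (N, t*p*p*C)."""
    T, H, W, C = video.shape
    v = video.reshape(T // t, t, H // p, p, W // p, p, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)  # (nT, nH, nW, t, p, p, C)
    return v.reshape(-1, t * p * p * C)


def encrypt_cubes(cubes, key):
    """Shuffle the values inside every cube with one key-derived permutation.

    Assumption: a single shared permutation; the paper's scheme may differ.
    """
    perm = np.random.default_rng(key).permutation(cubes.shape[1])
    return cubes[:, perm], perm


def scramble_embedding(weight, perm):
    """Permute the input rows of a linear cube embedding to match the encryption."""
    return weight[perm, :]


rng = np.random.default_rng(0)
video = rng.random((4, 8, 8, 3))            # toy video: 4 frames of 8x8 RGB
cubes = to_cubes(video, t=2, p=4)           # 8 cubes, 96 values each
weight = rng.standard_normal((cubes.shape[1], 16))  # stand-in for the embedding layer

enc, perm = encrypt_cubes(cubes, key=1234)  # hypothetical secret key

plain_tokens = cubes @ weight                        # original model, plain video
enc_tokens = enc @ scramble_embedding(weight, perm)  # scrambled model, encrypted video
match = np.allclose(plain_tokens, enc_tokens)
```

Under these assumptions the scrambled embedding produces the same tokens from the encrypted cubes as the original embedding does from the plain cubes, so the layers after it need neither architectural changes nor fine-tuning. For this single permutation alone, the number of possible keys is the factorial of the cube length (96! in the toy setting).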