Spatio-Temporal Deep Learning for Improved Face Presentation Attack Detection
- Authority: Knowledge-Based Systems
- Category: Journal Publication
Face presentation attack detection (FacePAD) is critical for securing face recognition systems against attacks such as printed photos, replayed videos, and 3D masks. Existing methods often struggle with generalizability, computational efficiency, and sophisticated attacks, particularly in resource-constrained environments. To address these challenges, this study proposes a lightweight architecture that builds on the MobileNetV3 CNN and integrates spatio-temporal feature extraction. The proposed method captures both static and dynamic characteristics and achieves state-of-the-art performance, including an Equal Error Rate (EER) of 0.0% on the Replay-Attack and Replay-Mobile datasets and 0.83% on the challenging ROSE-Youtu dataset. The model runs in real time, processing 256 samples in 11 ms, which makes it suitable for deployment on mobile and embedded platforms. This work demonstrates that lightweight architectures with spatio-temporal features can balance computational efficiency and accuracy. It sets a benchmark for practical FacePAD systems in applications such as mobile authentication, surveillance, and access control.
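
For readers unfamiliar with the metric, the EER is the operating point at which the rate of accepted attacks equals the rate of rejected bona fide samples. The following is a minimal sketch of one common way to estimate it from classifier scores, not the evaluation code used in the study. It assumes that higher scores indicate a bona fide face and that labels are 1 for bona fide and 0 for attack; the function name `equal_error_rate` and the toy scores are hypothetical.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the EER from scores (assumed: higher = more likely bona fide)
    and labels (assumed: 1 = bona fide, 0 = attack)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bona_fide = scores[labels == 1]
    attacks = scores[labels == 0]

    # Candidate thresholds: every observed score plus both extremes.
    thresholds = np.concatenate(([-np.inf], np.unique(scores), [np.inf]))

    # A sample is accepted when its score is >= threshold.
    far = np.array([(attacks >= t).mean() for t in thresholds])   # attacks accepted
    frr = np.array([(bona_fide < t).mean() for t in thresholds])  # bona fide rejected

    # Pick the threshold where the two error rates are closest and average them.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]

# Toy data (hypothetical): perfectly separable scores give an EER of 0.
eer_separable, _ = equal_error_rate([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])

# Toy data (hypothetical): overlapping scores give a non-zero EER.
eer_overlap, _ = equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

An EER of 0.0% therefore means that some threshold separates all attack and bona fide samples in the test set. On the timing figure, if the 256 samples are processed as a single batch, 11 ms corresponds to roughly 0.043 ms per sample, or about 23,000 samples per second.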