Continuous Saudi Sign Language Dataset Development
Dates: 2024
Principal Investigator: Dr. Hamzah Luqman
Description: The lack of sentence-level annotated datasets hinders progress in developing real-time Continuous Sign Language Recognition (CSLR) systems. This project addresses this primary challenge by creating the first continuous Saudi Sign Language (SSL) dataset. The dataset will comprise 1,000 sentences, collected from Saudi TV programs for deaf individuals and from recordings made with mobile phone cameras in selfie mode. An additional dataset of 300 sentences will be collected for evaluation purposes and may be expanded to 1,000 sentences, recorded with various video cameras, if necessary. To ensure dataset diversity, multiple signers will contribute samples using different cameras. These datasets will enable the study of SSL's linguistic properties and facilitate the development of machine translation systems between spoken Arabic and SSL. Sentence-level annotation will make the datasets more useful for recognition and translation tasks. Making the datasets publicly available will encourage researchers to develop and evaluate CSLR systems, fostering advancements in the field.