Advanced Deep Fake Detection System
DOI:
https://doi.org/10.63671/ijsesr.v2i1.70
Keywords:
Deepfake Detection, Multimodal Learning, Convolutional Neural Network (CNN), Random Forest Classifier, TF-IDF Vectorizer, Cosine Similarity, Vision Transformer, SIGLIP, RoBERTa, AI-Generated Content Detection, Digital Forensics, Content Authentication, Machine Learning
Abstract
The rapid development of generative artificial intelligence has greatly increased the production and distribution of deepfake material across text, audio, video, and image modalities. Such synthetic media seriously threatens public safety, journalism, cybersecurity, and digital trust. Current detection techniques focus primarily on single-modality analysis, which limits their effectiveness against complex, multimodal manipulation. To overcome this limitation, this work presents an integrated multimodal deepfake detection framework that evaluates authenticity across diverse media sources within a single architecture. For video deepfakes, a Convolutional Neural Network (CNN) extracts spatial features from video frames to detect manipulation traces, visual artifacts, and facial abnormalities. For audio deepfakes, discriminative acoustic features are extracted and classified with a Random Forest classifier, which provides resilience against speech synthesis and voice cloning attacks. For textual plagiarism, a TF-IDF Vectorizer combined with Cosine Similarity quantifies the similarity between input text and reference materials. For image authenticity verification, the system incorporates a fine-tuned vision transformer model based on SIGLIP to differentiate AI-generated images from human-created content. For AI-generated text identification, a RoBERTa-base transformer model uses contextual embeddings to classify text as machine-generated or human-written. The outputs from all modalities are combined to produce an overall authenticity score. Experimental evaluation indicates improved robustness and scalability compared with isolated, single-modality detection approaches, highlighting the framework's usefulness for automated content moderation systems and real-world digital forensics.
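To make the plagiarism component described in the abstract concrete, the following is a minimal sketch of TF-IDF weighting followed by cosine similarity, written with the Python standard library only. It is not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `similarity_to_references`), the regular-expression tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in term_freq.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_to_references(text, references):
    """Score the input text against each reference document."""
    vectors = tfidf_vectors([text] + list(references))
    return [cosine(vectors[0], ref_vec) for ref_vec in vectors[1:]]


if __name__ == "__main__":
    refs = [
        "Deepfake media threatens digital trust and public safety.",
        "Random forests are ensembles of decision trees.",
    ]
    query = "Deepfake media is a threat to public safety and digital trust."
    for ref, score in zip(refs, similarity_to_references(query, refs)):
        print(f"{score:.3f}  {ref}")
```

A score near 1 means the input shares most of its weighted vocabulary with a reference, and a score of 0 means no shared terms. Any threshold for flagging plagiarism would have to be tuned on labeled data, since the abstract does not state one.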
License
Copyright (c) 2026 International Journal of Science and Engineering Science Research

This work is licensed under a Creative Commons Attribution 4.0 International License.
