The term DeepFake combines "Deep Learning" and "Fake". It refers to taking a person in an image or video and replacing them with someone else's likeness using technology such as deep artificial neural networks [1].
The dataset comes from the Kaggle DeepFake Detection Challenge: https://www.kaggle.com/c/deepfake-detection-challenge/data
GitHub reference for the face_recognition library: https://github.com/ageitgey/face_recognition
!pip install face_recognition
Data
- The data consists of .mp4 files, split into sets of ~10 GB apiece. A metadata.json accompanies each set of .mp4 files and contains the filename, label (REAL/FAKE), original and split columns, listed below under Columns.
- The full training set is just over 470 GB.
References: https://deepfakedetectionchallenge.ai/faqs
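To make the metadata description concrete, here is a minimal sketch of turning a metadata.json into per-video records and counting the labels. It assumes the file is a JSON object keyed by filename, with each value holding the label, split and original fields; the helper names and the sample filenames are hypothetical and only serve as illustration.

```python
import json
from collections import Counter


def records_from_metadata(raw):
    """Flatten {filename: {label, split, original}} into a list of dicts.

    Assumption: metadata.json is keyed by video filename.
    """
    return [{"filename": name, **fields} for name, fields in raw.items()]


def load_metadata(path):
    """Read a metadata.json file from disk and return its records."""
    with open(path) as f:
        return records_from_metadata(json.load(f))


def label_counts(records):
    """Count how many videos carry each label (REAL / FAKE)."""
    return Counter(record["label"] for record in records)


# Hypothetical two-video sample, not taken from the real dataset.
sample = {
    "aaa.mp4": {"label": "FAKE", "split": "train", "original": "bbb.mp4"},
    "bbb.mp4": {"label": "REAL", "split": "train", "original": None},
}
sample_records = records_from_metadata(sample)
print(label_counts(sample_records))
```

On Kaggle, something like load_metadata('../input/deepfake-detection-challenge/train_sample_videos/metadata.json') would be the likely call, assuming the metadata file sits next to the sample videos.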
Data exploration
import os  # needed for os.listdir and os.path.join below

DATA_FOLDER = '../input/deepfake-detection-challenge'
TRAIN_SAMPLE_FOLDER = 'train_sample_videos'
TEST_FOLDER = 'test_videos'
print(f"Train samples: {len(os.listdir(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER)))}")
print(f"Test samples: {len(os.listdir(os.path.join(DATA_FOLDER, TEST_FOLDER)))}")
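The counts above include every entry in each folder, so a metadata.json stored alongside the videos would be counted as a sample too. A small sketch for breaking a listing down by file extension follows; count_by_extension is a hypothetical helper, not part of the original notebook.

```python
import os
from collections import Counter


def count_by_extension(filenames):
    """Count files per lowercase extension, e.g. '.mp4' or '.json'."""
    return Counter(os.path.splitext(name)[1].lower() for name in filenames)


# Hypothetical listing used only for illustration.
example_listing = ["aaa.mp4", "bbb.mp4", "metadata.json"]
print(count_by_extension(example_listing))
```

Passing os.listdir(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER)) instead of the example listing would show how many of the entries are actual videos.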
Files
- train_sample_videos.zip
- sample_submission.csv
- test_videos.zip