Deepfake FAQs

The Basics of Deepfakes

Amplifies Disinformation:

Generative AI reduces the cost, time, and skill needed to engage in large-scale disinformation campaigns.

ABC NEWS Verify finds social media posts spreading misinformation about the Bondi attack are receiving millions of views, as a deepfake image depicting one of the victims circulates online. #ai

— Aussie News (@aussienews.bsky.social) December 16, 2025 at 2:00 PM

Undermines Democracy:

Deepfakes can erode the public’s trust in democratic institutions and undermine their ability to effectively fulfill their mandates.

The Irish presidential election campaign has been disrupted by an artificial intelligence-generated deepfake video of candidate Catherine Connolly announcing her “withdrawal” from the race.

— POLITICO Europe (@politico.eu) October 22, 2025 at 11:32 AM

Persecutes Citizens:

Generative AI facilitates the creation of personalized, multi-lingual content designed to harass and intimidate political activists.

Thanks to advances in AI, transnational repression is entering a new phase. Governments can now monitor, intimidate, and target people across borders with unprecedented precision. www.theguardian.com/world/2025/d...

— Social Media Lab (@socialmedialab.ca) December 11, 2025 at 12:01 AM

Manipulates Emotions:

Generative AI’s ability to convincingly simulate human conversation can affect users’ emotions, leading to a variety of potential harms.

California and Delaware AGs blast OpenAI over its youth safety. The AGs are demanding more information from the AI platform after accusations of recent chatbot-linked deaths.

— Politico (@politico.com) September 5, 2025 at 1:53 PM

Facilitates Hate:

Deepfakes have contributed to a significant increase in digital hate, harassment, and violence against vulnerable groups.

Even celebrities have struggled with the current wave of deepfake apps, and the tools are increasingly being used to torment young girls in middle and high school.

— Ars Technica (@arstechnica.com) July 3, 2025 at 8:39 AM

Enhances Fraud:

Generative AI enables criminal actors to significantly enhance the effectiveness of their methods and techniques for defrauding users.

Facebook has been overrun with AI spam and scams. Content moderation experts say Facebook has stopped asking them for help: www.404media.co/has-facebook...

— 404 Media (@404media.co) June 24, 2024 at 11:15 AM

Implementing and Evaluating Solutions

Further Reading

A non-exhaustive list of key technologies, listed in no particular order, that make deepfakes possible.
  1. Generative Models: 
    • Diffusion Models are a type of deep learning model used to generate images, audio, and other data types. Introduced in 2015, these models are inspired by diffusion processes in physics, where a substance spreads out over time. During training, noise is added to the original data step by step, and the model learns to reverse each step. To generate new content, the model starts from random noise and gradually removes it until structured data emerges. In recent years, diffusion models have gained popularity for their ability to generate high-quality samples, particularly in image synthesis. They are often considered alternatives to other generative models, such as Generative Adversarial Networks and transformer models.
    • Generative Adversarial Networks (GANs) are a class of generative deep learning models introduced in 2014 to generate high-quality, realistic outputs, including images, videos, and audio. GANs consist of two neural networks: a generator and a discriminator. These networks “compete” against each other in a game-like setting. The generator creates fake data (e.g., images, video), while the discriminator evaluates it to determine whether it is real or fake. This competition pushes the generator to produce outputs that are increasingly difficult to distinguish from real data.
    • Transformer Models are a type of deep learning model (originally proposed as an encoder-decoder architecture) that uses self-attention mechanisms to process and analyze sequential data, allowing for the parallel handling of input sequences. This type of model transforms input data (e.g., pixels from an image) into tokens and then uses self-attention to learn connections between tokens, which in turn allows the model to generate new content (e.g., images) by predicting new tokens (e.g., pixels). Self-attention allows the model to weigh the importance of different tokens in a sequence relative to each other. In other words, it captures contextual relationships and dependencies within the data, enabling the model to focus on the relevant parts of the input when making predictions or generating outputs. Since their introduction in 2017, transformers have become foundational in developing state-of-the-art models like BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer).
  2. Traditional Video and Photo Editing Tools are sophisticated software that allows users to edit, manipulate, and transform raw visual media files into professional photos and videos. While AI can assist with tasks such as object tracking, motion smoothing, and automated effects, traditional photo and video editing tools can be used to fine-tune AI outputs through manual adjustments that enhance the realism and overall quality of the final product. This combination keeps a human in the loop and allows for more precise visual effects, bridging the gap between automated AI-generated content and the human touch needed for professional, realistic results.
  3. Facial Recognition and Tracking (FRT) technologies detect, identify, and follow faces in images or videos. They are widely applied in areas such as security, user authentication, augmented reality, and social media.
  4. Voice Synthesis and Audio Processing are two interconnected fields that focus on the generation, manipulation, and analysis of sound, particularly human speech and music. Both areas leverage advanced algorithms and technologies to create, modify, and understand audio signals.
  5. Machine Learning (ML) is a subset of Artificial Intelligence (AI) that focuses on developing algorithms and statistical models that enable computers to perform tasks without explicit instructions. Instead of being programmed to follow specific rules, machine learning systems learn from data, identifying patterns and making decisions based on that data.
  6. Natural Language Processing (NLP) is a subfield of AI that relies heavily on ML and focuses on enabling computers to understand, interpret, and generate human language. The key technologies involved in NLP are: i) tokenization and parsing, which break down text into smaller units that the computer can process; ii) word embeddings, which represent words as numerical vectors so that computers can capture their contextual meanings; and iii) transformer models (e.g., the GPT models behind ChatGPT), which interpret and generate human-like text.
  7. Deep Learning (DL) is a subset of ML that uses artificial neural networks (defined below) to model complex patterns in data. It involves training networks with multiple layers (hence “deep”) to automatically learn features from raw input data, enabling them to solve tasks like image recognition, natural language processing, and game-based challenges.
  8. Neural Networks (NNs) are a fundamental component of AI and ML. They are inspired by the structure and function of the human brain. In the context of deepfakes, they are used to teach a computer how to create synthetic media that resembles the real thing as closely as possible. There are different types of NNs designed for different data types and applications.
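
To make the self-attention mechanism described under Transformer Models more concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not how any production model is implemented: the queries, keys, and values are simply the token vectors themselves (real transformers learn separate projections for each), and the function names and the three 2-D token vectors are made up for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Simplification (assumption): queries, keys, and values are the token
    vectors themselves; a real transformer learns separate projections.
    """
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        # Score this token against every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # Blend the token vectors according to the attention weights.
        mixed = [sum(w * tok[i] for w, tok in zip(weights, tokens))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-D embeddings: the first two tokens are similar, the third is not.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence. The first two tokens, which point in similar directions, give each other more weight than they give the third.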

A non-exhaustive list of common neural networks

  • Autoencoders (AEs) are a type of NN used to learn efficient data representations, typically for dimensionality reduction or feature learning. In the context of deepfakes, AEs are commonly used to generate fake human faces. They are composed of two parts: i) an encoder that compresses the input data into a more manageable form; and ii) a decoder that reconstructs the original data from the compressed version.
  • Convolutional Neural Networks (CNNs) are a type of artificial NN designed to work with grid-like data, most commonly images. They are especially effective at capturing spatial hierarchies in data, making them well-suited for tasks like image recognition, object detection, and segmentation. CNNs are used to power deepfake techniques like face-swapping and facial animation to recreate realistic faces and expressions.
  • Encoder-Decoder Architecture (EDA) is a type of DL model used to transform one type of data into another. It is widely used in translation, text-to-speech, text summarization, and image captioning. Like an AE, an EDA involves two main components: an encoder that processes the input and a decoder that generates the desired output.
  • First Order Motion Model (FOMM) is a DL-based model primarily used for animating still images. FOMM generates animations by understanding and modeling the movement of an object (like a face). Unlike conventional methods that need an extensive training dataset of a particular target object, FOMM learns from unlabeled videos of a general object category (e.g., faces) and can then animate a new image from that category without being retrained on the specific target.
  • Recurrent Neural Networks (RNNs) are a class of NN designed for processing sequential data. Unlike traditional feedforward NNs, RNNs have connections that loop back on themselves, allowing them to maintain a memory, or context, of preceding data in a sequence while processing or generating new data. This structure makes RNNs particularly effective for tasks involving sequences, such as time series analysis, NLP, and speech recognition.
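
The looping connection that gives an RNN its memory can be sketched in a few lines of Python. This is a toy illustration only: it uses a single recurrent unit, and the function name `rnn_forward` and the weight values are arbitrary assumptions chosen for the example, whereas a trained network would learn its weights from data.

```python
import math

def rnn_forward(inputs, w_in=0.5, w_rec=0.8, bias=0.0):
    """Run a one-unit recurrent network over a sequence of numbers.

    The weights are arbitrary illustrative values (assumption); a trained
    network would learn them from data.
    """
    hidden = 0.0  # the "memory" starts empty
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state:
        # this feedback is the loop that gives an RNN its memory.
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
        states.append(hidden)
    return states

# Same final input (0.0), different histories, different final states.
print(rnn_forward([1.0, 0.0]))
print(rnn_forward([0.0, 0.0]))
```

Both sequences end with the same input, yet the final states differ because the first sequence carries a trace of its earlier input. Setting `w_rec=0.0` removes the loop, and with it the memory.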

Erosion of Public Trust in Democratic Institutions

Amplification of Mis- and Disinformation

Scams, Fraud, and Digital Deception

Gendered Violence and Online Hate

Mental Health and Promoting Harmful Behaviours 

Transnational Repression and Foreign Interference