Everyone has heard of Photoshop, and there is a reason it is so popular. With their remarkable flexibility, photo editors become magic wands in the hands of not only professionals but also amateurs and newbies. Such software is capable of working visual wonders, from creating abstract art to building a Mr. Olympia physique in a matter of hours. But, as Uncle Ben said, “with great power comes great responsibility,” the largest part of which lies on the shoulders of those who use photo editors to commit fraud and spread misinformation. Photoshop has been around long enough that experts have accumulated the knowledge and experience to spot a doctored image easily. However, one method of manipulating digital material has emerged only recently, and we have yet to discover its full potential. We are talking about deepfakes.

The term “deepfake”, first coined by a Reddit user in 2017, is a combination of the terms “deep learning” and “fake”. Deep learning (DL) refers to a family of artificial intelligence (AI) techniques loosely modeled on the way the human brain works. Given enough quality data, such AI models can perform complex tasks and exhibit human-like behavior, which is why they power technologies such as chatbots, facial recognition, translators, automated vehicles, and virtual assistants. So far, it seems that the power of DL is being harnessed and directed the right way. But the same cannot be said for its application to deepfakes.

Deepfakes exemplify a seemingly childish and, at the same time, dangerous approach to wielding the power of AI. Have you seen videos of Tom Cruise talking nonsense, Vladimir Putin making questionably bold political statements, or Mark Zuckerberg admitting that he is spying on us? If so, then you have been exposed to deepfakes. Put simply, deepfakes are videos altered to show highly realistic fake content. Most deepfakes shown to the public depict politicians or celebrities saying something ridiculous or acting unnaturally. Catering to yet another of people’s fantasies, deepfake videos of a pornographic nature have also become increasingly popular. You might think that, compared to a face swap done on a single photo, doing the same to a video would require manually modifying thousands of individual frames. Seems infeasible, right? Then how does AI manage to build such realistic fake material?

A popular DL model behind deepfakes, the autoencoder, essentially consists of two parts: an encoder and a decoder. First, to “train” the algorithm, a single shared encoder analyzes thousands of photos of two people and learns to compress each face into a compact representation that captures common facial structure and features such as pose and expression. From these “simplified” versions of the pictures, a decoder, one trained separately for each person, learns to recreate that person’s face. The trick is to mix and match: feed a frame of person A through the encoder, then hand the compressed result to person B’s decoder. If trained well, the decoders will produce realistic video of either person with their faces swapped.
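
To make the idea concrete, here is a minimal sketch of such a shared-encoder, two-decoder autoencoder in PyTorch. The layer sizes, training loop, and random stand-in images are illustrative assumptions for exposition, not a production face-swap pipeline.

```python
# Minimal sketch of the shared-encoder / two-decoder autoencoder described
# above. All sizes and the toy training loop are illustrative assumptions.
import torch
import torch.nn as nn

LATENT = 128  # size of the compressed "shared face" representation (assumed)

class Encoder(nn.Module):
    """Compresses a 64x64 RGB face crop into a shared latent vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, LATENT),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Reconstructs one specific person's face from the shared latent vector."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),    # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),     # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),   # 32 -> 64
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 8))

# One encoder is shared; each person gets their own decoder.
encoder, decoder_a, decoder_b = Encoder(), Decoder(), Decoder()
params = (list(encoder.parameters()) + list(decoder_a.parameters())
          + list(decoder_b.parameters()))
opt = torch.optim.Adam(params, lr=1e-4)
loss_fn = nn.MSELoss()

# Stand-in batches of face crops; real training would use thousands of
# aligned frames of each person.
faces_a = torch.rand(8, 3, 64, 64)
faces_b = torch.rand(8, 3, 64, 64)

for step in range(100):  # toy training loop
    opt.zero_grad()
    # Each decoder learns to rebuild only its own person's faces.
    loss = (loss_fn(decoder_a(encoder(faces_a)), faces_a)
            + loss_fn(decoder_b(encoder(faces_b)), faces_b))
    loss.backward()
    opt.step()

# The swap: encode person A's frame, decode it with person B's decoder,
# yielding person B's face in A's pose and expression.
with torch.no_grad():
    swapped = decoder_b(encoder(faces_a))
```

The swap works because the shared encoder is forced to describe both faces in the same “language” of pose and expression, so either decoder can render that description as its own person.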

With enough material and time, DL models can be trained to produce results that are sometimes indistinguishable from the real thing. Raw data is easily accessible in the world of big data, which is one reason deepfakes are so realistic, popular, and dangerous. They can be used for impersonation, digital fraud, fake news, harassment, and more, all of which pose a considerable threat to one’s security, privacy, and mental wellbeing. For example, in 2019, the CEO of a UK energy company transferred 243,000 USD to a scammer who used deepfake audio to impersonate the chief executive of the firm’s German parent company. On a greater scale, even though deepfakes are unlikely to pose any real threat to the global economy or international relations, there are numerous ways they could undermine public morale and reduce transparency in major events, such as presidential elections or conflict resolution.

As of now, deepfakes are not developed enough to be a recurrent problem; nevertheless, they could become omnipresent in no time. It is important to stress that deepfakes represent technological progress, and there is no reason we shouldn’t be proud of it. As advances in technology open up new possibilities, we need to continuously adapt to a world filled with our own creations. In the near future, there will be a need for education centered on deepfake crime prevention. Ironically, DL itself can be used to spot deepfakes, but given its complexity, faster methods should be promoted: to spot a deepfake, look for unnatural facial features, odd movements, and inconsistent lighting. To protect your own photos and videos, use digital fingerprints and watermarks, as in the sketch below. Ultimately, deepfakes pose no threat to someone who consumes media with a healthy dose of skepticism, an attitude well worth practicing even now.
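
As a toy illustration of the watermarking idea, the following Python sketch hides a short identifier in the least significant bits of an image’s pixels and reads it back. The message, stand-in image, and scheme are assumptions for demonstration only; real digital fingerprinting systems use far more robust, tamper-resistant techniques.

```python
# Toy "invisible watermark": store a short identifier in the least
# significant bit of each pixel byte. Illustrative only; not robust
# against re-encoding, cropping, or deliberate removal.
import numpy as np

def embed_watermark(image: np.ndarray, message: str) -> np.ndarray:
    """Overwrite the lowest bit of the first bytes with the message bits."""
    bits = np.unpackbits(np.frombuffer(message.encode(), dtype=np.uint8))
    flat = image.flatten()  # flatten() returns a copy; original untouched
    if bits.size > flat.size:
        raise ValueError("image too small for message")
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits  # set LSBs
    return flat.reshape(image.shape)

def extract_watermark(image: np.ndarray, length: int) -> str:
    """Read back `length` characters from the pixels' lowest bits."""
    bits = image.flatten()[: length * 8] & 1
    return np.packbits(bits).tobytes().decode()

# Stand-in photo; a real use would load your own image as a uint8 array.
photo = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
marked = embed_watermark(photo, "owner:herald-2022")
print(extract_watermark(marked, len("owner:herald-2022")))  # owner:herald-2022
```

The changed bits are visually imperceptible, which is the point: the mark identifies the owner without spoiling the picture, so a stolen or manipulated copy can still be traced back to its source.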
