Wavelet-Packet Powered Deepfake Image Detection

Modern neural networks generate realistic artificial images and audio. This development will allow us to create movies, music and audio effects never seen before. Yet at the same time, the new technology may enable new digital ways to lie.

In response, we need a diverse and reliable toolbox for identifying artificial images and other content. This short blog post summarizes the main points regarding the use of the wavelet packet transform to identify artificially generated deepfake images. The key observation is that wavelet packet coefficients are distributed differently for real and fake images.

The image above illustrates this. The leftmost column shows a single real image from the Flickr-Faces-HQ data set as well as an artificially generated image for reference. To study the feasibility of wavelet packets for deepfake detection, level-three Haar wavelet packet coefficients are computed for 5k real and fake images using the PyTorch-Wavelet-Toolbox. Comparing the mean coefficients in the center as well as their standard deviation, we notice differences, especially as the frequency increases along the diagonal. The standard deviation is significantly different in the background parts of the images across the board. These differences suggest that real and fake images can be separated based on their wavelet packet coefficients.
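The statistics described above can be sketched in a few lines of NumPy. This is only an illustration, not the PyTorch-Wavelet-Toolbox code used for the figure: the function names, the band labels and the choice to summarize each packet by the mean and standard deviation of its absolute coefficients are assumptions made for this example.

```python
import numpy as np


def haar_step_2d(x):
    """One 2D Haar step on the last two axes (both must have even size)."""
    s = np.sqrt(2.0)
    # Low- and high-pass along the row axis, followed by downsampling.
    lo = (x[..., 0::2, :] + x[..., 1::2, :]) / s
    hi = (x[..., 0::2, :] - x[..., 1::2, :]) / s
    bands = {}
    # Repeat along the column axis for both intermediate results.
    for name, y in (("L", lo), ("H", hi)):
        bands[name + "L"] = (y[..., 0::2] + y[..., 1::2]) / s
        bands[name + "H"] = (y[..., 0::2] - y[..., 1::2]) / s
    return bands


def haar_packets_2d(images, level=3):
    """Full packet tree: every sub-band is split again at each level."""
    nodes = {(): np.asarray(images, dtype=float)}
    for _ in range(level):
        nodes = {path + (key,): band
                 for path, node in nodes.items()
                 for key, band in haar_step_2d(node).items()}
    return nodes


def packet_statistics(images, level=3):
    """Mean and standard deviation of |coefficients| for every packet."""
    return {path: (np.abs(c).mean(), np.abs(c).std())
            for path, c in haar_packets_2d(images, level).items()}
```

Calling `packet_statistics` once on a stack of real images and once on a stack of generated images (shape `(N, H, W)`) gives 64 packets each at level three, which can then be compared packet by packet.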

A first experiment explores the separability of images from the Flickr-Faces-HQ data set and StyleGAN-generated images. Working with 63k images of 128 by 128 pixels from each source, the task is to identify the origin of an image.

The plot above shows the convergence of a classifier trained to identify the source of an image. The wavelet packets allow the classifier to converge faster with performance improvements during all stages of the training.

If you would like to find out more, the source code and a preprint are freely available online.

Wavelet optimization for Network compression

Wavelets are uncommon in machine learning, and systems with learnable wavelets are particularly rare. Nevertheless, promising applications of wavelets in neural networks exist. Adaptive wavelets for network compression are explored in the new paper ‘Neural network compression via learnable wavelet transforms’. By defining new wavelet loss terms based on the product filter approach to wavelet design, the wavelets become part of the network architecture and can be learned just like any other weights. Source code implementing wavelet optimization in PyTorch is available on GitHub.
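To make the product filter idea concrete, here is a minimal NumPy sketch of two penalty terms a learnable filter bank could be trained with. It assumes the textbook two-channel conditions (perfect reconstruction and alias cancellation) with all four filters of equal length; the function names and the exact form of the penalties are assumptions for illustration and not necessarily the loss used in the paper.

```python
import numpy as np


def alternate_signs(h):
    """Coefficients of H(-z), given the coefficients of H(z) in powers of z^-1."""
    h = np.asarray(h, dtype=float)
    return h * (-1.0) ** np.arange(len(h))


def wavelet_loss(h0, h1, f0, f1):
    """Squared violations of the two product filter conditions.

    h0, h1: analysis low- and high-pass filters.
    f0, f1: synthesis low- and high-pass filters.
    """
    # Perfect reconstruction: H0(z)F0(z) + H1(z)F1(z) = 2 z^-l.
    # Polynomial products are convolutions of the coefficient vectors.
    product = np.convolve(h0, f0) + np.convolve(h1, f1)
    target = np.zeros_like(product)
    target[len(product) // 2] = 2.0  # single centre tap of height two
    reconstruction = np.sum((product - target) ** 2)
    # Alias cancellation: H0(-z)F0(z) + H1(-z)F1(z) = 0.
    alias = (np.convolve(alternate_signs(h0), f0)
             + np.convolve(alternate_signs(h1), f1))
    return reconstruction, np.sum(alias ** 2)


# The Haar filter bank satisfies both conditions, so both terms vanish.
s = 1.0 / np.sqrt(2.0)
haar_losses = wavelet_loss([s, s], [s, -s], [s, s], [-s, s])
```

Both terms are differentiable in the filter coefficients, which is what allows the filters to be optimized together with the other network weights.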

Jaxlets – Fast Wavelet Transformations in JAX

The fast wavelet transform is an important signal processing algorithm. Yet a differentiable implementation in JAX has been missing so far. I have therefore open-sourced my implementation. It supports the one- and two-dimensional analysis and synthesis transforms, as well as the forward wavelet packet transform. The plot below shows an analysis of a linear chirp signal using a Daubechies wavelet.

Wavelet analysis of a linear chirp signal.

As the chirp’s frequency increases, we see that the wavelet coefficients rise as well.

Source code is available at https://github.com/v0lta/jaxlets.
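For readers who want to see the mechanics, the following NumPy sketch implements a one-dimensional analysis and synthesis transform and applies it to a linear chirp. It is not the jaxlets code: it uses the Haar wavelet instead of a longer Daubechies wavelet to keep the filters trivial, and the function names and chirp parameters are made up for this example.

```python
import numpy as np


def haar_fwt(signal, levels):
    """Analysis transform. Returns [approx_L, detail_L, ..., detail_1].

    The signal length must be divisible by 2**levels.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        approx = (even + odd) / np.sqrt(2.0)         # low-pass branch
    return [approx] + details[::-1]


def haar_ifwt(coeffs):
    """Synthesis transform, the exact inverse of haar_fwt."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        merged = np.empty(2 * approx.size)
        merged[0::2] = (approx + detail) / np.sqrt(2.0)
        merged[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = merged
    return approx


# Linear chirp: the instantaneous frequency grows linearly with time.
t = np.arange(1024) / 1024.0
chirp = np.sin(2.0 * np.pi * (5.0 * t + 100.0 * t ** 2))
coefficients = haar_fwt(chirp, levels=4)
restored = haar_ifwt(coefficients)
```

Because the Haar filters are orthonormal, `restored` matches `chirp` up to floating point error.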

Video Prediction à la Fourier

Video frame prediction is a very challenging problem. Many recent neural-network-based approaches trained with a mean squared error produce blurry predictions. My most recent paper, currently under review, proposes to use phase correlation and the Fourier shift theorem to estimate changes and transform current images into predictions. A demo is shown below. The video shows the ground truth (left), the shift prediction (middle) and an off-the-shelf GRU prediction (right).

Source code is available on GitHub.

A more detailed description is available in the paper.
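The two building blocks mentioned above can be illustrated with plain NumPy. This is a toy sketch that assumes a single global, circular, integer-pixel translation between two frames; the function names are hypothetical and the method in the paper is more involved.

```python
import numpy as np


def phase_correlation_shift(prev, curr, eps=1e-8):
    """Estimate the (dy, dx) circular shift that maps prev onto curr."""
    cross = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    cross /= np.abs(cross) + eps          # keep only the phase difference
    corr = np.fft.ifft2(cross).real       # ideally a single peak at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))


def fourier_shift(image, shift):
    """Translate an image by multiplying its spectrum with a phase ramp."""
    dy, dx = shift
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    ramp = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.fft.ifft2(np.fft.fft2(image) * ramp).real


def predict_next(prev, curr):
    """Assume the motion continues: apply the estimated shift once more."""
    return fourier_shift(curr, phase_correlation_shift(prev, curr))
```

Since the prediction is a translated copy of the current frame rather than an average over possible futures, it stays sharp.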

Complex Recurrent Neural Nets

The paper Complex gated recurrent neural networks explores machine learning in the complex domain. For gradient descent to work, the functions involved must be differentiable. In the complex domain, holomorphic functions, which satisfy the Cauchy-Riemann partial differential equations, are differentiable. Finding functions which fulfill this requirement and are useful for machine learning tasks is very difficult. In practice, split-differentiable complex functions are used, which are differentiable with respect to the real and imaginary parts separately. This is true for the two most popular complex activation functions, the modReLU and the Hirose non-linearities shown below:
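Both activations rescale the magnitude of a complex number and leave its phase untouched. A minimal NumPy sketch follows; the parameter names `b` and `m`, the division by `m**2` and the small constant guarding against division by zero are assumptions of this example.

```python
import numpy as np


def mod_relu(z, b):
    """modReLU: apply a ReLU to |z| + b and keep the phase of z."""
    magnitude = np.abs(z)
    scale = np.maximum(magnitude + b, 0.0) / np.maximum(magnitude, 1e-12)
    return scale * z


def hirose(z, m=1.0):
    """Hirose non-linearity: squash |z| with tanh and keep the phase of z."""
    magnitude = np.abs(z)
    return np.tanh(magnitude / m ** 2) / np.maximum(magnitude, 1e-12) * z


z = np.array([3.0 + 4.0j, 0.1 + 0.0j])
relu_out = mod_relu(z, b=-1.0)   # inputs with |z| <= 1 are set to zero
hirose_out = hirose(z, m=1.0)    # magnitudes are mapped into [0, 1)
```

Neither function is holomorphic, but both are differentiable with respect to the real and imaginary parts, which is all that gradient descent needs.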

Modern RNNs rely on gating equations for memory management. Typically the gates produce values between zero and one, where one means that a value will be stored in the memory cell and zero that it will be removed. In the complex domain this behavior can be reproduced by using mappings from C to R, in particular a weighted average of the real and imaginary parts can be fed into a sigmoid non-linearity.

Using the split-differentiable approach with a Hirose activation and C to R gates, it is possible to define complex memory cells. The plot below tests their performance on the synthetic memory and adding benchmark problems.

In short, the complex gated cell can solve both the memory and the adding problem when it combines the complex orthogonal structures from uRNNs with a gating mechanism similar to classic RNNs. For a more detailed discussion, please take a look at the full paper.

Below a complex memory unit solving the human motion prediction problem can be seen in action:

A complex GRU cell solving the human motion prediction problem.

The code for this project is available on Github. I tested the complex memory cell on human motion data using a setting following this repository.


Fourier methods have a long and proven track record as an excellent tool in data processing. Integrating Fourier methods into complex recurrent neural network architectures is therefore an important goal. I integrated the short-time Fourier transform into a recurrent (complex-valued) network structure. This helps when dealing with hard prediction tasks such as human motion prediction. A demo, a paper and code are available.

Paper: https://arxiv.org/pdf/1812.05645.pdf

Code: https://github.com/v0lta/Spectral-RNN
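As a reminder of what the short-time Fourier transform computes, here is a small NumPy sketch. It is not taken from the Spectral-RNN repository; the Hann window, the window size and the hop size are assumptions chosen for illustration.

```python
import numpy as np


def stft(signal, window_size=64, hop=16):
    """Short-time Fourier transform of a real 1D signal.

    Returns a complex array of shape (num_frames, window_size // 2 + 1).
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(window_size)
    starts = range(0, len(signal) - window_size + 1, hop)
    # Cut overlapping frames, taper them, and transform each frame.
    frames = np.stack([signal[s:s + window_size] * window for s in starts])
    return np.fft.rfft(frames, axis=-1)


n = np.arange(256)
tone = np.sin(2.0 * np.pi * 8.0 * n / 64.0)  # 8 cycles per 64-sample window
spectrum = stft(tone)
```

Each row of `spectrum` is a complex-valued frame, which is the kind of input a complex-valued recurrent cell can process step by step.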

Control Engineering

My favorite control project thus far has been a quad-copter control project. The project consisted of three steps. In a first step, a state space model for a quad-copter had to be found. After testing the model with some simple simulations, an LQR controller based on this model was designed. Secondly, a small weight disturbance was added, which was counteracted by using integrators in the controller. Finally, to reach more realistic scenarios, Kalman filtering was included in the design. The plots below show the results of a simulated test flight:

In the top left plot, the red circles show checkpoints which the quad-copter had to reach as quickly as possible. The graphs below show the position in x, y and z, as well as the control actions and the evolution of the quadcopter’s angles over time. Overall, it can be concluded that the Simulink LQR controller does its job.
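The LQR design step can be illustrated on a much simpler system. The sketch below uses a discrete-time double integrator (position and velocity along one axis) as a stand-in for the quad-copter model, and obtains the gain by iterating the Riccati recursion. The model, the weights and the function names are assumptions for this example and are not taken from the project report.

```python
import numpy as np


def dlqr(A, B, Q, R, iterations=2000):
    """Discrete-time LQR gain K (for u = -K x) via Riccati iteration."""
    P = Q.copy()
    for _ in range(iterations):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    # Recompute the gain from the final cost-to-go matrix.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


dt = 0.05                               # sample time in seconds
A = np.array([[1.0, dt], [0.0, 1.0]])   # state: [position, velocity]
B = np.array([[0.5 * dt ** 2], [dt]])   # input: acceleration
Q = np.diag([10.0, 1.0])                # penalize position error most
R = np.array([[0.1]])                   # penalize control effort

K, P = dlqr(A, B, Q, R)

# Simulate the closed loop from an initial position error of one metre.
x = np.array([1.0, 0.0])
for _ in range(200):
    x = (A - B @ K) @ x
```

After ten simulated seconds the state `x` has essentially returned to the origin. Rejecting a constant disturbance would additionally require augmenting the state with an integrator, as described above.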

Read more of my work on control:

Project report of my quadcopter control project

Bachelor thesis on pseudospectra in control [in German]

Listen, Attend and Spell

During my master’s thesis project I re-implemented Listen, Attend and Spell, an attention-based speech recognition system. A key problem in speech recognition is that it is often unknown what is said when. In other words, the speech signal and its transcription are unaligned. Attention-based systems such as the one I wrote solve this problem by computing attention weights for each input vector. A visualization of the system is given below:

The LAS architecture. BLSTM blocks are shown in red, LSTM blocks in blue and attention nets in green.

The overall system consists of the two blocks shown in the image above. A listener network computes a compressed fixed-length encoding of an input signal, which is then transcribed to an output sequence by the speller. To come up with a transcription, the speller has to compute attention weights such as those shown below:

Plot of the alignment vectors computed by the network for all 45 labels assigned to TIMIT utterance fmld0_sx295 (left), and alignments assigned by a human listener (right).

The plot above reveals that the attention weights found by the network are quite similar to those assigned by a human listener. Some artifacts remain, but it must be kept in mind that TIMIT is quite a small data set. Better results have been observed when larger data sets and more than a single 8GB graphics card are used for training.
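One row of such an alignment plot corresponds to a single attention step. A minimal NumPy sketch, assuming plain dot-product scoring between the speller state and the listener outputs (the original system may transform both with small networks first), could look like this:

```python
import numpy as np


def attention_step(speller_state, listener_outputs):
    """Compute attention weights and the resulting context vector.

    speller_state:    vector of shape (d,)
    listener_outputs: matrix of shape (time_steps, d)
    """
    scores = listener_outputs @ speller_state
    scores = scores - scores.max()               # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ listener_outputs         # weighted average over time
    return weights, context


listener_outputs = np.eye(4)                     # four toy encoder vectors
speller_state = 5.0 * listener_outputs[2]        # most similar to step 2
weights, context = attention_step(speller_state, listener_outputs)
```

Stacking the `weights` vectors for all output labels gives a matrix like the one in the left panel.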

The source code is available on GitHub.

For an in-depth discussion, please take a look at my thesis text.

Medical Image Analysis

As coursework for my computer vision class in Leuven I looked into support vector machines on medical data. The task was to locate incisor teeth.

For every window, the SVM generates the probabilities for a miss and a hit. In this example the frame with the largest hit probability was chosen, as all images were known to contain incisors somewhere.
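The window selection can be sketched as a plain sliding-window search. The `hit_probability` callable stands in for the trained SVM and is purely hypothetical here; the example below replaces it with the mean brightness of a window.

```python
import numpy as np


def best_window(image, window, stride, hit_probability):
    """Return (probability, top, left) of the highest-scoring window."""
    height, width = window
    best = None
    for top in range(0, image.shape[0] - height + 1, stride):
        for left in range(0, image.shape[1] - width + 1, stride):
            p = hit_probability(image[top:top + height, left:left + width])
            if best is None or p > best[0]:
                best = (p, top, left)
    return best


# Toy example: a bright 4x4 patch on a dark background.
image = np.zeros((16, 16))
image[8:12, 4:8] = 1.0
probability, top, left = best_window(image, (4, 4), 4, lambda w: w.mean())
```

Taking the maximum instead of thresholding is only reasonable because every image is known to contain the target.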

The links below lead to more samples of my work on support vector machines and data-mining:
More on svms
More on data-mining

Linear Algebra

Among the most interesting things I have encountered in linear algebra are pseudospectra and their relation to Toeplitz symbol functions, as well as their associated circulant matrix eigenvalues.
Below I have included plots which illustrate this beautiful relation:


Shown on the left are the symbol functions (yellow), Toeplitz eigenvalues (blue) and circulant matrix eigenvalues (green). On the right, epsilon-pseudospectra of the same matrices are shown.
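The relation in the plots can be reproduced numerically. The sketch below builds a banded Toeplitz matrix and its circulant counterpart from the same diagonal coefficients, evaluates the symbol on the unit circle, and tests membership in the epsilon-pseudospectrum through the smallest singular value of zI - T. The coefficient values and function names are made up for this example.

```python
import numpy as np


def toeplitz_and_circulant(coeffs, n):
    """Build T and C with entry a_k on diagonal k = i - j (C wraps around)."""
    T = np.zeros((n, n), dtype=complex)
    C = np.zeros((n, n), dtype=complex)
    for k, a in coeffs.items():
        for i in range(n):
            j = i - k
            if 0 <= j < n:
                T[i, j] = a
            C[i, j % n] = a
    return T, C


def symbol(coeffs, z):
    """Symbol f(z) = sum_k a_k z^k."""
    return sum(a * z ** k for k, a in coeffs.items())


def in_pseudospectrum(matrix, z, eps):
    """z is in the eps-pseudospectrum if sigma_min(zI - matrix) < eps."""
    shifted = z * np.eye(matrix.shape[0]) - matrix
    return np.linalg.svd(shifted, compute_uv=False)[-1] < eps


n = 16
coeffs = {-1: 1.0, 1: 0.25}
T, C = toeplitz_and_circulant(coeffs, n)
circle = np.exp(-2j * np.pi * np.arange(n) / n)
curve = symbol(coeffs, circle)            # symbol sampled on the unit circle
circulant_eigenvalues = np.linalg.eigvals(C)
```

Every entry of `circulant_eigenvalues` coincides with a sample in `curve`, whereas the eigenvalues of `T` generally do not lie on the curve. That gap is what the pseudospectra make visible.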

Interested? More on: