LJSpeech is a dataset of actual human voices. Let's start our own subjective quality evaluation by considering the example below. Modern artificial neural networks can generate credible-sounding human speech. A MelGAN reproduction of the first LJSpeech sentence is hard to identify as such. The same is true for the HiFi-GAN version below. The audio is …

# Category Archives: Research Projects

## Jax on Juwels Booster

This post illustrates a possible way to set up multinode Jax computations on the Juwels Booster partition at the Jülich Supercomputing Centre. The following text adapts instructions from official documentation to run in Jülich. Let’s start with the Python code. The code snippet below determines how many GPUs we have and tells Jax to run …

## On the similarities of diffused- and gan-generated image detection

Guided diffusion has become the new go-to method for image generation. To avoid misuse of this inspiring new technology, we must ensure fake detection networks remain up to speed with recent developments. Using the approach described in “Diffusion Models Beat GANs on Image Synthesis”. Wavelet packets decompose an input into blocks according to frequency. The …
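To make the idea of frequency blocks concrete, here is a minimal sketch of a full wavelet packet tree. It rests on several assumptions that are not taken from the post: it uses the Haar wavelet, it works on a 1D signal instead of an image (a 2D version would filter rows and columns in turn), and the helper names `haar_step` and `haar_packets` are hypothetical.

```python
import numpy as np


def haar_step(x):
    """Split a signal into Haar low-pass (sum) and high-pass (difference) halves."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def haar_packets(x, levels):
    """Return the 2**levels leaf nodes of a full Haar wavelet packet tree.

    Unlike the plain wavelet transform, the high-pass branch is split again
    at every level. Leaves come out in filter order (LL, LH, HL, HH, ...),
    which is not strictly sorted by frequency.
    """
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar_step(node)]
    return nodes


slow = np.ones(8)                 # constant signal, lowest frequency
fast = np.array([1.0, -1.0] * 4)  # alternating signal, highest frequency
for name, signal in (("slow", slow), ("fast", fast)):
    energies = [float(np.sum(p ** 2)) for p in haar_packets(signal, levels=2)]
    print(name, np.round(energies, 6))
```

The constant signal puts all of its energy into the first packet, while the alternating signal puts it into the third one (the low-pass child of the first high-pass branch). Because the Haar filters are orthogonal, the packet energies always sum to the energy of the input.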


## Wavelet-Packet Powered Deepfake Image Detection

Modern neural networks generate realistic artificial images and audio. This development will allow us to create movies, music and audio effects never seen before. Yet at the same time, the new technology may enable new digital ways to lie. In response, the need for a diverse and reliable toolbox arises to identify artificial images and …


## Wavelet optimization for Network compression

Wavelets are uncommon in machine learning; systems with learnable wavelets, in particular, are rare. Promising applications of wavelets in neural networks exist. Adaptive wavelets for network compression are explored in the new paper ‘Neural network compression via learnable wavelet transforms’. By defining new wavelet loss terms based on the product filter approach to wavelet design, …


## Video Prediction à la Fourier

Video frame prediction is a very challenging problem. Many recent neural-network-based approaches trained with a mean squared error loss produce blurry predictions. My most recent paper, currently under review, proposes to use phase correlation and the Fourier shift theorem to estimate changes and transform current images into predictions. A demo is shown below. The video …
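The two building blocks named above can be sketched in a few lines of NumPy. This is not the paper's implementation: it assumes a single global, circular, integer translation between two frames, and the function names `phase_correlation_shift` and `fourier_shift` are hypothetical.

```python
import numpy as np


def phase_correlation_shift(prev, curr):
    """Estimate the integer (row, col) translation that maps prev onto curr."""
    cross = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    # Normalize so that only the phase difference between the frames remains.
    cross /= np.abs(cross) + 1e-12
    # The inverse transform of a pure phase ramp peaks at the displacement.
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peak indices past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))


def fourier_shift(image, shift):
    """Circularly translate image by (dy, dx) via the Fourier shift theorem."""
    dy, dx = shift
    ky = np.fft.fftfreq(image.shape[0])[:, None]  # cycles per pixel, rows
    kx = np.fft.fftfreq(image.shape[1])[None, :]  # cycles per pixel, columns
    # A translation in space is a linear phase ramp in frequency.
    phase = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.fft.ifft2(np.fft.fft2(image) * phase).real


rng = np.random.default_rng(0)
frame0 = rng.random((32, 32))
frame1 = np.roll(frame0, (3, -5), axis=(0, 1))  # ground-truth motion
shift = phase_correlation_shift(frame0, frame1)
prediction = fourier_shift(frame1, shift)       # extrapolate one more step
print(shift)  # (3, -5)
```

Because the predicted frame is a translated copy of the current one, it keeps all of its high-frequency detail, which is the intuition behind avoiding the blur of a pixel-wise mean squared error objective.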

## Complex Recurrent Neural Nets

The paper “Complex gated recurrent neural networks” explores machine learning in the complex domain. For gradient descent to work, the functions involved must be differentiable. In the complex domain, holomorphic functions, which satisfy the Cauchy-Riemann partial differential equations, are differentiable. Finding functions which fulfill this requirement and are useful for machine learning tasks is very …
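For reference, the Cauchy-Riemann condition can be written out as follows. The notation is an assumption made here for illustration: $z = x + iy$ and $f(z) = u(x, y) + i\,v(x, y)$ with real-valued $u$ and $v$.

$$
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},
\qquad
\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.
$$

As a worked check, $f(z) = z^2$ has $u = x^2 - y^2$ and $v = 2xy$, so

$$
\frac{\partial u}{\partial x} = 2x = \frac{\partial v}{\partial y},
\qquad
\frac{\partial u}{\partial y} = -2y = -\frac{\partial v}{\partial x},
$$

and both equations hold. The complex conjugate $f(z) = \bar{z}$ has $u = x$ and $v = -y$, so

$$
\frac{\partial u}{\partial x} = 1 \neq -1 = \frac{\partial v}{\partial y},
$$

and it is not holomorphic. Even very simple functions can fail the condition, which hints at why suitable candidates are hard to find.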

## Spectral-RNN

Fourier methods have a long and proven track record as an excellent tool in data processing. Integrating Fourier methods into complex recurrent neural network architectures is therefore an important goal. I integrated the short-time Fourier transform into a recurrent (complex-valued) network structure. This helps when dealing with hard prediction tasks such as human motion prediction, …