On Deepfake audio fingerprints

LJSpeech is a dataset of actual human voices. Let's start our own subjective quality evaluation by considering the example below. Modern artificial neural networks can generate credible-sounding human speech. A MelGAN reproduction of the first LJSpeech sentence is hard to identify as such. The same is true for the HiFi-GAN version below. The audio is …

On the similarities of diffusion- and GAN-generated image detection

Guided diffusion has become the new go-to method for image generation. To prevent misuse of this inspiring new technology, we must ensure that fake-detection networks keep up with recent developments. We use the approach described in “Diffusion models beat GANs on image synthesis”. Wavelet packets decompose an input into blocks according to frequency. The …
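
The sketch below makes the packet idea concrete. It assumes the Haar wavelet and a one-dimensional signal whose length is divisible by 2**level. Every node is split into a low-pass half and a high-pass half, so the leaves are equally sized blocks that each cover one frequency band. The helper names `haar_split` and `haar_packets` are hypothetical and do not come from the post. Images would need the two-dimensional analogue, which is not shown here.

```python
import math
from typing import Dict, List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One Haar analysis step (hypothetical helper): approximation and detail halves."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return low, high


def haar_packets(x: List[float], level: int) -> Dict[str, List[float]]:
    """Full Haar wavelet packet tree down to `level` (hypothetical helper).

    Unlike the plain wavelet transform, both the approximation ("a") and the
    detail ("d") branches are split again. Keys are paths such as "ad" in
    natural order; at level 2 the frequency order would be aa, ad, dd, da
    (a Gray-code permutation), which this sketch does not apply.
    """
    nodes = {"": list(x)}
    for _ in range(level):
        next_nodes = {}
        for path, signal in nodes.items():
            low, high = haar_split(signal)
            next_nodes[path + "a"] = low
            next_nodes[path + "d"] = high
        nodes = next_nodes
    return nodes


packets = haar_packets([1.0] * 8, level=2)
```

With a constant input such as `[1.0] * 8` and `level=2`, all of the energy lands in the `aa` block, the lowest band. The `ad`, `da` and `dd` blocks are zero.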

Wavelet-Packet Powered Deepfake Image Detection

Modern neural networks generate realistic artificial images and audio. This development will allow us to create movies, music, and audio effects never seen before. At the same time, however, the technology may enable new ways to lie digitally. In response, we need a diverse and reliable toolbox to identify artificial images and …

Wavelet optimization for network compression

Wavelets are uncommon in machine learning, and systems with learnable wavelets are particularly rare. Yet promising applications of wavelets in neural networks exist. The new paper ‘Neural network compression via learnable wavelet transforms’ explores adaptive wavelets for network compression. By defining new wavelet loss terms based on the product filter approach to wavelet design, …
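
The excerpt does not spell out the loss terms, so what follows is only a sketch under an assumption. It turns the two classic conditions on a two-channel filter bank into squared-error penalties: alias cancellation, F0(z)H0(-z) + F1(z)H1(-z) = 0, and perfect reconstruction, F0(z)H0(z) + F1(z)H1(z) = 2z^(-l). Here H0 and H1 are the analysis filters, F0 and F1 are the synthesis filters, and the delay l is assumed to equal the filter length minus one. All function names are hypothetical. In a learnable-wavelet setting, these penalties would be evaluated in an autodiff framework rather than plain Python.

```python
import math
from typing import List


def poly_mul(a: List[float], b: List[float]) -> List[float]:
    """Multiply two polynomials in z^-1 given as coefficient lists (a convolution)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def modulate(h: List[float]) -> List[float]:
    """Coefficients of H(-z): flip the sign of every odd-indexed tap."""
    return [c if n % 2 == 0 else -c for n, c in enumerate(h)]


def perfect_reconstruction_loss(h0, h1, f0, f1) -> float:
    """Squared distance of F0(z)H0(z) + F1(z)H1(z) from 2 z^-l (hypothetical helper)."""
    p = [x + y for x, y in zip(poly_mul(f0, h0), poly_mul(f1, h1))]
    delay = len(h0) - 1  # assumption: equal-length filters, l = length - 1
    target = [2.0 if n == delay else 0.0 for n in range(len(p))]
    return sum((pn - tn) ** 2 for pn, tn in zip(p, target))


def alias_cancellation_loss(h0, h1, f0, f1) -> float:
    """Squared norm of F0(z)H0(-z) + F1(z)H1(-z), which should vanish (hypothetical helper)."""
    a = [x + y for x, y in zip(poly_mul(f0, modulate(h0)), poly_mul(f1, modulate(h1)))]
    return sum(an ** 2 for an in a)


# Orthogonal Haar filter bank: synthesis filters are the time-reversed analysis filters.
s = 1.0 / math.sqrt(2.0)
h0, h1 = [s, s], [s, -s]
f0, f1 = h0[::-1], h1[::-1]
print(perfect_reconstruction_loss(h0, h1, f0, f1))
print(alias_cancellation_loss(h0, h1, f0, f1))
```

The orthogonal Haar filter bank satisfies both conditions, so both penalties are zero up to floating-point error. Perturbing a single coefficient makes the reconstruction penalty positive.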

Jaxlets – Fast Wavelet Transformations in JAX

The fast wavelet transform is an important signal processing algorithm. Yet a differentiable implementation in JAX has been missing so far, so I have open-sourced my implementation. It supports the one- and two-dimensional analysis and synthesis transforms, as well as the forward wavelet packet transform. The plot below shows an …
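
The JAX implementation itself is not reproduced here. This plain Python sketch is a rough, non-differentiable illustration of what the one-dimensional analysis and synthesis transforms compute. It assumes the Haar wavelet and a signal length divisible by 2**levels, and the names `fwt_haar` and `ifwt_haar` are hypothetical.

```python
import math
from typing import List

SQRT2 = math.sqrt(2.0)


def fwt_haar(x: List[float], levels: int) -> List[List[float]]:
    """Multi-level Haar analysis transform (hypothetical helper).

    Returns [a_L, d_L, d_{L-1}, ..., d_1]: the coarsest approximation first,
    then the detail coefficients from coarse to fine.
    """
    approx = list(x)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        half = len(approx) // 2
        # High-pass (difference) and low-pass (sum) filtering, then downsampling by 2.
        details.append([(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in range(half)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in range(half)]
    return [approx] + details[::-1]


def ifwt_haar(coeffs: List[List[float]]) -> List[float]:
    """Synthesis transform: undo fwt_haar one level at a time, coarsest level first."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        finer = []
        for a, d in zip(approx, detail):
            finer.append((a + d) / SQRT2)  # even sample
            finer.append((a - d) / SQRT2)  # odd sample
        approx = finer
    return approx


x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
coeffs = fwt_haar(x, levels=3)
reconstruction = ifwt_haar(coeffs)
```

Because the Haar filters are orthonormal, `ifwt_haar(fwt_haar(x, levels))` recovers `x` up to floating-point error. This sketch orders its output with the coarsest approximation first, followed by the details from coarse to fine, which matches the convention of PyWavelets' `wavedec`.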

Video Prediction à la Fourier

Video frame prediction is a very challenging problem. Many recent neural-network-based approaches trained with a mean squared error produce blurry predictions. My most recent paper, currently under review, proposes using phase correlation and the Fourier shift theorem to estimate changes and transform current images into predictions. A demo is shown below. The video …
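
The two ingredients named above can be illustrated in one dimension. This toy sketch assumes that consecutive frames differ only by an integer circular shift, which real videos rarely do, and it is not the model from the paper. Phase correlation recovers the shift from the normalized cross-power spectrum. The Fourier shift theorem then applies that shift to the current frame to extrapolate the next one. A naive O(N²) DFT keeps the example dependency-free, and all function names are hypothetical.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive discrete Fourier transform, O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    """Naive inverse DFT, O(N^2)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_correlation_shift(prev: List[float], curr: List[float], eps: float = 1e-12) -> int:
    """Estimate the integer circular shift s with curr[t] == prev[t - s] (hypothetical helper)."""
    cross = []
    for a, b in zip(dft(curr), dft(prev)):
        c = a * b.conjugate()
        # Keep only the phase difference; drop bins with (near) zero energy.
        cross.append(c / abs(c) if abs(c) > eps else 0j)
    corr = idft(cross)  # ideally a unit impulse at t = s
    return max(range(len(corr)), key=lambda t: corr[t].real)


def predict_next(prev: List[float], curr: List[float]) -> List[float]:
    """Extrapolate one step by shifting `curr` by the shift measured between frames."""
    n = len(curr)
    s = phase_correlation_shift(prev, curr)
    # Fourier shift theorem: a delay by s multiplies bin k by exp(-2j*pi*k*s/n).
    shifted = [c * cmath.exp(-2j * math.pi * k * s / n) for k, c in enumerate(dft(curr))]
    return [v.real for v in idft(shifted)]


def roll(x: List[float], s: int) -> List[float]:
    """Circular shift so that result[t] == x[t - s]."""
    return [x[(t - s) % len(x)] for t in range(len(x))]


frame0 = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
frame1 = roll(frame0, 3)
prediction = predict_next(frame0, frame1)  # should approximate roll(frame0, 6)
```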

Spectral-RNN

Fourier methods have a long and proven track record in data processing. Integrating them into recurrent neural network architectures is therefore a promising goal. I integrated the short-time Fourier transform into a recurrent, complex-valued network structure. This helps with hard prediction tasks such as human motion prediction, …
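
The sketch below is a rough illustration of the short-time Fourier transform. It cuts the signal into overlapping frames, applies a Hann window, and takes a DFT of each frame. The window, frame length and hop size are assumptions for this example. The recurrent cell that would consume the complex per-frame coefficients is not shown.

```python
import cmath
import math
from typing import List


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def stft(x: List[float], frame_len: int = 16, hop: int = 8) -> List[List[complex]]:
    """Short-time Fourier transform with a Hann window (hypothetical helper).

    Each returned frame holds the frame_len // 2 + 1 non-negative frequency bins,
    which suffice for real-valued input. Trailing samples that do not fill a
    complete frame are dropped.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = [x[start + t] * window[t] for t in range(frame_len)]
        frames.append([
            sum(segment[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                for t in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


# A cosine that completes 4 cycles per 16-sample frame.
signal = [math.cos(2 * math.pi * 4 * t / 16) for t in range(64)]
spectrogram = stft(signal)
peak_bins = [max(range(len(f)), key=lambda k: abs(f[k])) for f in spectrogram]
```

For a cosine at bin 4 of a 16-sample frame, the magnitude spectrum of every frame peaks at bin 4.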