On Deepfake audio fingerprints

LJSpeech is a dataset of actual human voices. Lets start our own subjective quality evaluation by considering the example below. Modern artificial neural networks can generate credible sounding human speech. A MelGAN-reproduction of the first LJSpeech sentence is hard to identify as such. The same is true for the HIFI-GAN version below. The audio is …

On the similarities of diffused- and gan-generated image detection

Guided diffusion has become the new go-to method for image generation. To avoid misuse of this inspiring new technology, we must ensure fake detection networks remain up to speed with recent developments. Using the approach described in “Diffusion models beat gans on image synthesis”. Wavelet packets decompose an input into blocks according to frequency. The …

Wavelet-Packet Powered Deepfake Image Detection

Modern neural networks generate realistic artificial images and audio. This development will allow us to create movies, music and audio effects never seen before. Yet at the same time, the new technology may enable new digital ways to lie. In response, the need for a diverse and reliable toolbox arises to identify artificial images and …