Trying out AI music generation with Dance Diffusion

Posted on 2022-10-27 in AI music · 2 min read

In late September the Dance Diffusion music generation model (beta) was released by Harmonai. Harmonai, backed by Stability AI (which also backed Stable Diffusion), is a deep learning research lab focused on creating open-source generative audio models, supporting researchers and developers with compute grants and community, and providing musicians and audio engineers with new creative tools.

As an oversimplification, Dance Diffusion is kind of like the audio version of Stable Diffusion or DALL-E 2, except simpler. Essentially, random noise is added to an audio track, and a model is trained to remove that noise. There's a gentle intro to Dance Diffusion on the awesome Weights & Biases website.
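To make that idea concrete, here's a minimal sketch of one training step, assuming a simple cosine noise schedule and a `denoiser` stand-in for Dance Diffusion's actual network (the real model predicts a "v" target rather than the raw noise, but noise prediction is the easiest variant to show):

```python
import torch
import torch.nn as nn

def diffusion_training_step(denoiser: nn.Module, audio: torch.Tensor) -> torch.Tensor:
    """One training step; audio is a (batch, channels, samples) waveform in [-1, 1]."""
    # Sample a random noise level t in [0, 1] for each clip in the batch.
    t = torch.rand(audio.shape[0], device=audio.device)
    alpha = torch.cos(t * torch.pi / 2).view(-1, 1, 1)  # how much signal survives
    sigma = torch.sin(t * torch.pi / 2).view(-1, 1, 1)  # how much noise is mixed in

    # Add random noise to the audio...
    noise = torch.randn_like(audio)
    noisy_audio = alpha * audio + sigma * noise

    # ...and train the model to predict (i.e. remove) that noise.
    predicted_noise = denoiser(noisy_audio, t)
    return nn.functional.mse_loss(predicted_noise, noise)
```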

Unlike most generative music methods, which operate on MIDI files (i.e. the digital version of sheet music), Dance Diffusion operates directly on raw audio files, which allows almost any sound to be generated.
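Raw audio here just means a tensor of sample values. A quick illustration (the file path is a placeholder; most of the released Dance Diffusion checkpoints expect a fixed sample rate, e.g. 48 kHz for the glitch model):

```python
import torchaudio

# Load a waveform: a tensor of shape (channels, samples), plus its sample rate.
waveform, sample_rate = torchaudio.load("clip.wav")  # placeholder path
print(waveform.shape, sample_rate)

# Resample to the rate the checkpoint was trained on, e.g. 48 kHz.
waveform = torchaudio.functional.resample(waveform, sample_rate, 48000)
```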

I think it’s really cool that a method used for image generation can be applied in much the same way to audio. It’s another testament to how fundamental deep learning techniques transfer across so many fields.

Unfortunately, it’s still early days, and it’s tough to generate good-sounding music with this model. Here’s one clip I made by interpolating between the noised representations of two 8-second clips from Harder, Better, Faster, Stronger by Daft Punk (see the awesome original music video on YouTube), using the Dance Diffusion model trained on glitch music from glitch.cool.
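For the curious, the interpolation trick looks roughly like this. It's just a sketch under a few assumptions: `denoise` is a hypothetical stand-in for the model's reverse-diffusion sampling loop, the two clips are equal-length waveform tensors, and the cosine schedule matches the training sketch above:

```python
import torch

def slerp(x0: torch.Tensor, x1: torch.Tensor, t: float) -> torch.Tensor:
    """Spherical interpolation, which keeps the mix looking like Gaussian noise."""
    dot = torch.sum(x0 * x1) / (x0.norm() * x1.norm())
    omega = torch.arccos(dot.clamp(-1.0, 1.0))
    return (torch.sin((1 - t) * omega) * x0 + torch.sin(t * omega) * x1) / torch.sin(omega)

def interpolate_clips(denoise, clip_a, clip_b, noise_level=0.8, steps=5):
    # Partially noise both clips to the same level (cosine schedule).
    alpha = torch.cos(torch.tensor(noise_level) * torch.pi / 2)
    sigma = torch.sin(torch.tensor(noise_level) * torch.pi / 2)
    noised_a = alpha * clip_a + sigma * torch.randn_like(clip_a)
    noised_b = alpha * clip_b + sigma * torch.randn_like(clip_b)

    # Walk between the two noised representations, denoising each point
    # back into audio with the model.
    outputs = []
    for t in torch.linspace(0.0, 1.0, steps):
        mixed = slerp(noised_a, noised_b, float(t))
        outputs.append(denoise(mixed, noise_level))
    return outputs
```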

Jerry Chi · Dance Diffusion interpolation of Daft Punk

Someone else was able to make a much better clip, but they used several tools in addition to Dance Diffusion (“gpt-j (lyrics) -> jukebox (w/modded sampler) -> demucs -> pymixconsole -> dance diffusion -> mastering”):

Baltigor · Misery, I Know You (Like Company)

This Dance Diffusion model is much smaller and simpler than Stable Diffusion, and it’s still an early beta, so I think that explains why the results aren’t great yet. Another reason is that there isn’t much non-copyrighted music publicly available to use as training data.

Anyway, stay tuned, because I’ll be sharing more about AI music generation!

Image generated with Stable Diffusion. Prompt: “Pop art of daft punk at a vaporwave neon futuristic cyberpunk Tokyo bustling street at night cyberart by liam wong, rendered in octane, 3d render, trending on cgsociety, blender 3d”