Automatic Generation of Audio Jingles

This page accompanies the report "Génération automatique de jingles audio". Please refer to the report for context.

Comparison of the different frameworks

Warning: some sounds can be loud.

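| Diffusion paradigm | Data format | # learnable parameters | # training steps | Ground 1 | Prediction 1 | Ground 2 | Prediction 2 |
|---|---|---|---|---|---|---|---|
| EDM | Float. codes | 231.6 M | 2 M (our best model) | | | | |
| DDPM | CQT | 56.9 M | 500 k | | | | |
| EDM | CQT | 56.9 M | 500 k | | | | |
| DDPM | Float. codes | 75.4 M | 500 k | | | | |
| EDM | Float. codes | 75.4 M | 500 k | | | | |
| DiscDPM | Disc. codes | 87.9 M | 500 k | | | | |
| DiscDPM | Disc. codes | 133.1 M | 670 k (stopped by scheduler) | | | | |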

Testing control over our best model

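Our best model (231.6 M learnable parameters) was trained for 2 M steps. In the tests below, part of the control signal (or all of it, in the "No control" row) is set to -1, which makes the model unconditional with respect to that part.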
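The masking described above can be illustrated with a minimal sketch. It is not the code used in the report: the control names (`instruments`, `chromas`, `velocity`) are taken from the table rows, but the per-frame list layout, the `mask_control` helper and the random choice of which frames to mask in the "partial" case are all assumptions made for illustration. Only the sentinel value -1 comes from this page.

```python
import random

UNCONDITIONAL = -1  # value this page reports for "no control"


def mask_control(control, drop=(), partial=(), keep_prob=0.5, seed=0):
    """Return a copy of `control` with some entries replaced by -1.

    control : dict mapping a control name to a list of per-frame values
              (hypothetical layout, chosen for this example)
    drop    : names whose values are all set to -1 ("No ... control")
    partial : names where each value is kept with probability `keep_prob`
              and set to -1 otherwise ("Partial ... control"); random
              masking is an assumption, not the report's procedure
    """
    rng = random.Random(seed)
    masked = {}
    for name, values in control.items():
        if name in drop:
            masked[name] = [UNCONDITIONAL] * len(values)
        elif name in partial:
            masked[name] = [
                v if rng.random() < keep_prob else UNCONDITIONAL
                for v in values
            ]
        else:
            masked[name] = list(values)  # control left untouched
    return masked


# Hypothetical toy control signal: 4 frames, one value per frame.
control = {
    "instruments": [0, 0, 1, 1],
    "chromas": [3, 5, 7, 3],
    "velocity": [64, 64, 80, 80],
}

no_instruments = mask_control(control, drop=("instruments",))
no_control = mask_control(control, drop=tuple(control))
partial_velocity = mask_control(control, partial=("velocity",))
```

In this sketch, "Full control" corresponds to passing `control` unchanged, and each "No ..." or "Partial ..." row of the table below corresponds to one call with the matching name in `drop` or `partial`.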

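| Control | Prediction 1 | Prediction 2 | Prediction 3 |
|---|---|---|---|
| Full control | | | |
| No instruments control | | | |
| No chromas control | | | |
| No velocity control | | | |
| Partial instruments control | | | |
| Partial chromas control | | | |
| Partial velocity control | | | |
| No control | | | |
| One instrument at a time | | | |
| Piano + violin | | | |
| Violin + trumpet | | | |
| Piano + trumpet | | | |
| Piano + violin + trumpet | | | |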