
How AI Changed the Music Industry in Under a Year

By Samuel Adebayo
Published Jun 5, 2024
Updated Jun 5, 2024

TL;DR

  • Generative music leverages AI to create music, blending human creativity with computational techniques.
  • The process starts by feeding music examples into a model. The model learns patterns and structures and then generates new melodies or sounds, which can be assembled into complete compositions manually or automatically.
  • Music needs to be encoded correctly to be created with AI. Symbolic and audio representations capture different details in music.
  • Music AI systems are great at short segments but struggle with longer structures. Sound quality often falls short of human-made music, and artistic control is limited, though companies like Udio and Suno are making rapid progress here.

Music generation is an application of generative artificial intelligence (AI), a subset of AI that focuses on creating new content, such as music, images, or text. This field is gaining popularity due to the exciting possibilities it offers for creative expression and its potential to inspire people in unique ways. 

AI music generation tools like Google's Magenta and IBM's Watson Beat have already demonstrated the ability to compose original pieces and assist musicians in their creative process.

However, music is a vast and complex domain, making it difficult to objectively formalize and evaluate the quality of AI-generated music. The perception of music quality often depends on individual preferences and emotional responses. Despite these challenges, AI has made significant strides in generating music and supporting human composers.

Some argue that AI-generated music lacks the emotional depth and creative thought process inherent in human-composed music. However, it is worth exploring how AI generates music and its potential impact on the music industry.

In this article, you will learn about:

  • The fundamental concepts of generative music

  • The mechanics of the music generation process

  • The history and evolution of AI in music

  • Generative techniques and use cases, with examples of tools for generating music

  • Ethical considerations, challenges, and the future of AI in music

What is Generative Music?

Generative music is both an art and a science. It uses AI techniques to create music with different levels of autonomy. It is a branch of computational creativity that explores the nature and definition of creativity and develops systems that exhibit creative behavior. 

Computational creativity involves algorithms and machine learning (ML) to generate novel and valuable outputs in various domains, including music, art, and literature.

The generative music process generally follows these steps:

  • Input: A collection of music examples (e.g., melodies, drums) is fed into the model.

  • Training: The model learns the patterns and relationships within the music data, effectively developing its own "musical imagination." Common AI techniques include recurrent neural networks (RNNs), LSTMs, and transformers designed to handle complex sequential data like music.

  • Output: The model generates new melodies, rhythms, or even entire musical pieces based on what it has learned.

  • Assembly: The level of human intervention in assembling the final song varies depending on the specific AI tool and the desired outcome. Some tools allow for highly automated composition, while others provide more granular control to the musician.
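The input–training–output loop above can be sketched with a deliberately tiny stand-in for a real model: a bigram note model that "trains" by counting which note follows which, then "generates" by sampling from those counts. The melodies, note numbers, and `generate` helper are illustrative assumptions; production systems use neural sequence models and far larger datasets.

```python
import random

# Toy "training data": melodies as lists of MIDI note numbers.
# (Assumption: a real system trains on thousands of examples.)
melodies = [
    [60, 62, 64, 65, 67, 65, 64, 62, 60],
    [60, 64, 67, 64, 60, 62, 64, 62, 60],
]

# Training: count which note tends to follow which (a bigram model).
transitions = {}
for melody in melodies:
    for a, b in zip(melody, melody[1:]):
        transitions.setdefault(a, []).append(b)

# Output: generate a new melody by sampling from the learned transitions.
def generate(start=60, length=8, seed=0):
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        nxt = rng.choice(transitions.get(melody[-1], [start]))
        melody.append(nxt)
    return melody

print(generate())
```

The assembly step would then stitch several such generated fragments (melody, drums, bass) into a full arrangement, manually or automatically.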

How generative music works: Diagram illustrating the process of AI music generation. It shows examples of music (melody, drums) as input fed into an AI model, which produces new melodies and drums as output. The final step involves manually or automatically assembling the song.

For humans, the process is similar:

  • Inspiration: An artist starts with inspiration from a particular genre, subconsciously drawing upon their musical influences and experiences. They create a base pattern, such as piano or guitar chords, as a foundation for the song.

  • Creation: The artist adds other elements to the composition through imagination and musical experience. They may collaborate with music producers or other industry professionals to refine and enhance the piece.

  • Production: Together, they produce a unique piece for their audience.

The key difference is that human musicians rely primarily on intuition and creativity, honed through years of practice and exposure to music. 

In contrast, AI models base their creative output on patterns and structures learned from massive datasets of existing music.

The Interdisciplinary Nature of Generative Music

Generative music draws upon a wide range of disciplines, including music theory, music cognition, computer science, and machine learning.

For example, knowledge of music theory is essential for generating music within specific genres or styles, while music cognition can help guide the creation of emotionally evocative pieces.

While you don't need to be an expert in all these areas, understanding the tool and its capabilities and having a clear vision of the desired outcome are crucial for creating generative music.

History of AI Music Generation

To give some context on how AI music generation works, let’s understand how we got here. Research in AI music generation has evolved significantly, from early attempts at transcribing live performances to the sophisticated generative models we see today.

Early Developments:

  • 1700s: While not involving AI, musical dice games, such as the one attributed to Mozart, explored the possibilities of automated or semi-automated music creation; the player-piano roll followed in the late 1800s.

  • 1950s-1960s: The advent of computers opened up new possibilities. The Illiac Suite for String Quartet, composed using the ILLIAC I computer in 1956, is considered a landmark in algorithmic composition. Similar experiments followed, such as Rudolf Zaripov's melodies composed on the Ural computer.

  • 1965: Ray Kurzweil developed software capable of recognizing musical patterns and synthesizing new compositions, hinting at the potential of machine learning in music.

  • 1981: David Cope's Experiments in Musical Intelligence (EMI) program sparked debate by producing music in the style of Bach, raising questions about the nature of creativity and authorship.

The Rise of Machine Learning

2010s: The 2010s saw the emergence of startups like Aiva, Jukedeck, and Melodrive, which began exploring the commercial potential of AI music generation. While the technology was still in its early stages, these companies paved the way for more sophisticated tools.

The Generative AI Era

The recent explosion of generative AI has revolutionized the field of music generation. Projects like Google's Magenta, Amazon's AWS DeepComposer, Meta's MusicGen, and OpenAI's Jukebox can now generate complete musical pieces, sometimes even with lyrics, from simple text prompts or other inputs. It has become an exciting field that will expand in the coming years.

How AI Music Generation Works

AI music generation begins by encoding music data into a format that AI models can understand. This involves converting the music into a representation that captures its essential patterns and structures. 

There are two main approaches to music representation:

  • Symbolic representation

  • Audio representation

Symbolic Representation

This approach represents music using discrete symbols like notes, chords, and rhythms, similar to sheet music. Symbolic representations are compact and computationally efficient, making them well-suited for modeling long-term musical structures. However, they may not capture all the nuances of sound, and the output requires audio conversion.

This method works well for formal structures like classical music, where each note, rhythm, and dynamic are precisely documented. It is less effective for genres focusing on production elements like synthesized sounds, samples, and effects, such as EDM (Electronic Dance Music). OpenAI’s MuseNet is an example of a model that uses this representation.

Formats: MIDI (Musical Instrument Digital Interface), Piano-roll, MusicXML, and Tablature.
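To make symbolic representation concrete, here is a minimal sketch that encodes a melody as (pitch, duration) events and flattens them into tokens a sequence model could consume. The `NOTE_`/`DUR_` token scheme and helper names are illustrative assumptions; real systems use richer vocabularies such as MIDI-like or REMI tokenizations.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(midi_pitch):
    """Convert a MIDI pitch number to a name like 'C4' (60 -> C4)."""
    return NOTE_NAMES[midi_pitch % 12] + str(midi_pitch // 12 - 1)

def encode(notes):
    """Turn (pitch, duration_in_beats) pairs into a flat token sequence."""
    tokens = []
    for pitch, dur in notes:
        tokens.append(f"NOTE_{note_name(pitch)}")
        tokens.append(f"DUR_{dur}")
    return tokens

melody = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 2.0)]
print(encode(melody))
# ['NOTE_C4', 'DUR_1.0', 'NOTE_D4', 'DUR_0.5', 'NOTE_E4', 'DUR_0.5', 'NOTE_F4', 'DUR_2.0']
```

Because the representation is discrete and compact, sequences like this are cheap to model, which is why symbolic approaches handle long-term structure comparatively well.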

Audio Representation

The sound content of music can be represented in a way that models can learn from, capturing all its aspects and nuances. This high-dimensional representation makes the model more complex and requires a lot of computational power to train. 

In addition, capturing long-term patterns and controlling the output can be challenging due to the lack of explicit compositional information. Examples of models using this method include MusicGen and OpenAI's Jukebox.

Encoding music into a latent feature space, a compressed representation of the original data, is crucial for generating music. Done correctly, this significantly simplifies the process; otherwise, essential parts of the music may be lost.

Symbolic and audio representations offer different benefits, but they can complement each other in generative music systems, leveraging their strengths to create more expressive and diverse outputs.

Formats: Spectrogram, Constant-Q Transform (CQT), Waveform, Audio embeddings.
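To see why audio representations are high-dimensional, here is a toy spectrogram computed with a naive discrete Fourier transform over a synthetic 440 Hz tone. The frame size, hop, and sample rate are illustrative assumptions; a real pipeline would use an optimized FFT (e.g., via scipy or librosa) and perceptual scaling such as mel bins.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive DFT; returns magnitudes of the first len(frame)//2 frequency bins."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]

def spectrogram(signal, frame_size=64, hop=32):
    """Slice the waveform into overlapping frames and transform each one."""
    return [
        dft_magnitudes(signal[i:i + frame_size])
        for i in range(0, len(signal) - frame_size + 1, hop)
    ]

# A 440 Hz test tone sampled at 8 kHz (toy parameters).
sr = 8000
signal = [math.sin(2 * math.pi * 440 * t / sr) for t in range(512)]
spec = spectrogram(signal)
print(len(spec), len(spec[0]))  # frames x frequency bins
```

Even this half-second toy clip yields hundreds of spectrogram values; minutes of CD-quality audio explode into millions, which is why audio-domain models are so compute-hungry.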

Modeling Music

Once the music data is encoded, it's time to choose an AI model to learn from this representation and generate new music. Music is inherently sequential, with each note or chord influencing what comes next. Therefore, models designed for sequential data perform well in music generation. 

Here are some common types of models:

  • Hidden Markov Models (HMMs): HMMs are good at capturing local dependencies in music, such as chord progressions or melodic patterns. However, they may struggle to model long-range relationships.

Diagram showing the various states and transitions in chord progressions for a Hidden Markov Model, including states like C Major, F Major, and G Major, with emission probabilities indicated between them.

  • RNNs (Recurrent Neural Networks): RNNs are designed to handle sequential data and can model temporal dependencies across a piece. They are commonly used for tasks like melody generation or drum pattern generation.

  • Diffusion Models: These models generate high-quality samples by gradually refining random noise into structured musical output. They are well-suited for capturing the complex, layered nature of music.

Animation illustrating the diffusion process from noising a snare sample (Source: CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis (Rouard, Hadjeres))

  • Transformers: Originally developed for natural language processing, transformers have shown impressive results in music generation. Their self-attention mechanism allows them to capture short- and long-term relationships in music for more coherent and musically interesting generations.

Other models, such as Long Short-Term Memory (LSTM) networks and Generative Adversarial Networks (GANs), can also be used for music generation, depending on the specific task and desired output. 

LSTMs are a type of RNN that can capture long-term dependencies in music sequences, while GANs consist of a generator and discriminator network that compete against each other to generate realistic music samples.
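The chord-transition idea behind the HMM diagram above can be sketched as a plain Markov chain over chord states. Note this simplification drops the "hidden" part of an HMM (hidden states plus emission probabilities); the transition values below are illustrative, not taken from any trained model.

```python
import random

# Transition probabilities between chord states (C, F, G Major),
# loosely mirroring the diagram above. Values are illustrative.
transitions = {
    "C": {"C": 0.2, "F": 0.4, "G": 0.4},
    "F": {"C": 0.3, "F": 0.2, "G": 0.5},
    "G": {"C": 0.6, "F": 0.2, "G": 0.2},
}

def sample_progression(start="C", length=8, seed=42):
    """Random-walk through the chord graph, one transition per step."""
    rng = random.Random(seed)
    chords = [start]
    for _ in range(length - 1):
        probs = transitions[chords[-1]]
        chords.append(rng.choices(list(probs), weights=list(probs.values()))[0])
    return chords

print(sample_progression())
```

Because each step only looks at the previous chord, this model captures local dependencies well but has no memory of the overall form, which is exactly the long-range weakness the article attributes to HMMs.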

Use Cases of Generative Music Systems

The use cases for generative music systems depend highly on the system's goal, how the music is generated, who uses it, and the autonomy it gives them in the creative process. Here are a few use cases:

  • Text-to-music generation

  • Sound synthesis

  • Singing voice cloning

  • Audio inpainting

  • Automatic accompaniment

Text-to-Music Generation

One of the most exciting and accessible use cases is text-to-music generation. With this technology, users can create music by simply providing a text description of their desired mood, genre, instruments, or even a specific story or scene they want the music to evoke.

MusicLM is an experimental text-to-music model that can generate unique songs based on your ideas or descriptions. | Source: Turn ideas into music with MusicLM.

  • Applications: Film and game scoring, music for social media, advertising, marketing, personalized playlists.

  • Benefits: Easy to use, accessible to non-musicians, can generate custom music quickly and efficiently.

  • Challenges: Requires clear and descriptive text input, quality can vary depending on the model and the input.

  • Examples: MusicLM, Suno.

Sound Synthesis

Generative music systems can also generate new and unique sounds for use in music production. This involves generating novel instrumental sounds, sound effects, or even new sonic textures.

The process of creating new sounds from the original sample to generating temporal embeddings that can be reconstructed to new sounds. | Source: NSynth: Neural Audio Synthesis.

  • Applications: Music production, sound design, sound libraries.

  • Benefits: Expands the palette of available sounds, enables experimentation and creativity.

  • Challenges: Requires some understanding of sound design principles; can be computationally intensive.

  • Examples: Google Magenta (NSynth)
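NSynth-style sound morphing boils down to interpolating between learned latent embeddings. Here is a minimal sketch with plain lists standing in for real embeddings; the `flute` and `bass` vectors are hypothetical, and in practice the embeddings come from a trained encoder and are decoded back into audio.

```python
def interpolate(emb_a, emb_b, alpha):
    """Linearly blend two embeddings; alpha=0 gives A, alpha=1 gives B."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(emb_a, emb_b)]

flute = [0.2, 0.9, 0.1, 0.4]  # hypothetical embedding of a flute note
bass = [0.8, 0.1, 0.6, 0.3]   # hypothetical embedding of a bass note

halfway = interpolate(flute, bass, 0.5)
print(halfway)  # approximately [0.5, 0.5, 0.35, 0.35]
```

Decoding the blended embedding yields a timbre that is neither flute nor bass, which is how these systems expand the palette of available sounds rather than just recombining samples.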

Singing Voice Cloning

This is one of the most popular use cases right now. The system replicates a human singing voice using AI, allowing users to create new vocal performances based on existing voices.

Generative AI applications for voice cloning. | Source: KwiCut.

  • Applications: Virtual duets, posthumous releases, voice restoration, interactive experiences, text-to-speech.

  • Benefits: Can bring back the voices of lost loved ones, enable collaboration with artists across time and space.

  • Challenges: Raises ethical concerns around consent and copyright, potential for misuse, and deepfakes.

  • Examples: Deepgram, MyVocal.AI (voice cloning examples), KwiCut, ElevenLabs.

Audio Inpainting

This technique fills in missing or damaged parts of an audio recording. It works similarly to image inpainting, where AI algorithms analyze the surrounding audio and generate content to fill in the gaps.

Udio web interface with the audio inpainting feature in action. | Source: TestingCatalog.

  • Applications: Audio restoration, noise reduction, error correction.

  • Benefits: Can repair damaged recordings, improve audio quality, and streamline the editing process.

  • Challenges: Requires a good understanding of the surrounding audio context; quality can vary depending on the model and the extent of the missing data.

  • Examples: Udio
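The simplest non-AI analogue of inpainting is filling a gap by interpolating between the last good sample before it and the first good sample after it. The sketch below does exactly that; real systems like Udio's instead generate the missing content with a model conditioned on the surrounding audio, so treat this only as an illustration of the problem shape.

```python
def inpaint(samples, gap_start, gap_end):
    """Fill samples[gap_start:gap_end] by linear interpolation between the
    last good sample before the gap and the first good sample after it."""
    left, right = samples[gap_start - 1], samples[gap_end]
    filled = list(samples)
    span = gap_end - gap_start + 1
    for i in range(gap_start, gap_end):
        t = (i - gap_start + 1) / span
        filled[i] = (1 - t) * left + t * right
    return filled

# A toy waveform with three damaged samples (None = missing).
audio = [0.0, 0.5, 1.0, None, None, None, -1.0, -0.5]
repaired = inpaint(audio, 3, 6)
print(repaired)  # [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
```

Interpolation only works for tiny gaps; the larger the missing region, the more the system must invent plausible content from context, which is where generative models take over.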


Automatic Accompaniment

Automatic accompaniment systems generate musical accompaniment in real-time based on a soloist's input. This can provide backing tracks for singers, instrumentalists, or even dancers.

  • Applications: Music practice, performance enhancement, music education.

  • Benefits: Improves musical confidence, provides real-time feedback, and improves learning experiences for amateurs.

  • Challenges: The system must adapt to the soloist's tempo and style in real time; performance is limited by the quality of the input and the model's training data.

  • Example: Nootone
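A rule-based sketch makes the harmonization step of accompaniment concrete: for each melody note, pick a triad in C major that contains it. The chord table, tie-breaking by dictionary order, and the `accompany` helper are illustrative assumptions; real accompaniment systems also track tempo, dynamics, and style.

```python
# Pitch classes of three triads in C major (0 = C, 4 = E, 7 = G, ...).
TRIADS = {
    "C": {0, 4, 7},   # C E G
    "F": {5, 9, 0},   # F A C
    "G": {7, 11, 2},  # G B D
}

def accompany(melody):
    """Return one chord name per melody note (MIDI pitches, C major assumed)."""
    chords = []
    for pitch in melody:
        pc = pitch % 12
        match = next((name for name, notes in TRIADS.items() if pc in notes), "C")
        chords.append(match)
    return chords

# Melody: C4 E4 F4 G4 B4 D4
print(accompany([60, 64, 65, 67, 71, 62]))  # ['C', 'C', 'F', 'C', 'G', 'G']
```

Note that ambiguous notes (G belongs to both the C and G triads) are resolved by table order here; an AI accompanist instead learns these voicing choices from data.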

These use cases show how generative music systems can enhance creativity and offer new ways to experience and interact with music. While AI alone might not be considered creative, combining human imagination with AI can improve music creation.

Ethical Considerations of AI-Generated Music

Generative music is growing in popularity not only because of its aesthetic value but also because of its economic benefits. This has raised important copyright concerns for the music industry. Using copyrighted material to train AI models without permission can lead to legal disputes, especially in applications like singing voice cloning.

The ethical implications extend beyond legal issues. Some key questions to consider include:

  • Fair Compensation: How should artists be compensated when their work is used to train AI models that generate music?

  • Bias and Representation: How can we ensure that AI-generated music doesn't perpetuate existing biases in the music industry or marginalize underrepresented groups?

  • Creative Agency: What is the role of human creativity in a world where AI can generate music? Does AI-generated music have the same artistic value as human-created music?

While companies like DataMxr are working on solutions for ethical AI training, the broader ethical landscape of AI-generated music is still evolving. 

The industry must engage in open and transparent discussions to address these concerns and ensure the technology is used responsibly and equitably.

Challenges and Possible Future of Generative Music

Music AI is an interesting field, although it still faces several challenges that limit its full potential.

Technical Challenges

  • Long-term Structure: AI models often struggle to maintain coherence and structure over extended musical pieces. They can generate short, interesting segments but may lose track of the overall narrative or theme as the piece progresses. This is partly due to the difficulty of modeling long-range dependencies in music data.

  • Sound Quality: The audio quality of AI-generated music can sometimes be inferior to human-produced music. This can be due to artifacts introduced by audio compression or synthesis techniques or limitations in the model's ability to capture the full complexity of sound.

Creative Challenges

  • Limited Artistic Control: While AI tools can spark new ideas and generate variations, they can also be difficult to control precisely. Artists may find it challenging to guide the AI's creative output to match their specific vision or style.

  • Deep Learning Black Box: The deep learning models used in music generation are often opaque, making it difficult to understand how they arrive at their creative decisions. This lack of transparency can hinder artistic experimentation and control.

Researchers and developers are actively addressing these challenges, so we can anticipate significant improvements in the future. For example, Udio recently (May 29, 2024) announced a model capable of two-minute generations. 

However, it's important to acknowledge these limitations as we explore the possibilities and potential of AI-generated music.

Conclusion

AI music is here to stay and will profoundly impact how we create and experience music. As large technology companies continue to integrate AI into their tools, the combination of human creativity and advanced models will be at the forefront of generating new musical possibilities. All use cases for music generation will improve as the underlying technology advances.

Addressing ethical concerns about copyright and consent is crucial to ensuring that human collaboration with AI protects musicians and enhances the quality, coherence, and other aspects of AI-generated music. 

With ongoing efforts to resolve these issues, we can look forward to a future where AI and human creativity work harmoniously to push music's boundaries.

FAQs

1. What is AI Music Generation?

AI music generation involves using artificial intelligence to create music. This application of generative AI combines technology and creativity to produce new compositions, autonomously or with human input, by learning patterns and structures from existing music data.

2. How does generative music work?

Generative music systems use machine learning models to analyze music examples and generate new compositions. The process involves feeding the model with various inputs, such as melodies, rhythms, and instrumentals. The AI then creates new music based on these patterns, with the level of autonomy depending on the specific system used.

3. What are the main challenges of AI-generated music?

AI-generated music faces several challenges, including maintaining long-term composition structure, achieving high sound quality, providing adequate creative control for users, and the "black box" nature of deep learning models, which makes the creative process opaque and difficult to understand.

4. What are some use cases for generative music systems?

Generative music systems can be used for text-to-music generation, sound synthesis, singing voice cloning, audio inpainting, and automatic accompaniment. These applications range from creating new music based on text descriptions to filling in missing parts of an audio track or providing instrumental support for solo musicians.

5. What are the ethical considerations in AI-generated music?

Ethical considerations include the potential for copyright infringement, using someone's voice without consent, and the environmental impact of the computational power required for AI models. Ensuring ethical use involves obtaining proper permissions and setting clear terms for using AI-generated content.
