Capstone

Topic Exploration (2)

avocado8 2024. 3. 14. 21:14

 

Intelligent task vs Creative task

intelligent task

- ability to create an objective success metric that we can use to evaluate the quality of an algorithm

- object detection, speech recognition . . .

- objective success metric

- well-defined problem description (formalized)

creative tasks

- don't have an objective success metric

- can music be formalized? it can't . .

-> difficult to reproduce with machine

 

GM Era

Big tech experiments era

AWS DeepComposer(Amazon, 2019)

Jukebox(OpenAI, 2020)

- raw-audio generation / advanced deep learning / full piece + lead vocals

- ! turning point !

 

Music AI hype (2023~)

generative AI (ChatGPT, DALL-E...)

Text-to-music

- MusicLM(Google, 2023)

- MusicGen(Meta, 2023)

Generative audio models

- Moûsai, AudioLDM, SingSong, RAVE 2, Riffusion . . . .

Second startup wave

- SOUNDRAW, Riffusion, boomy, beatoven.ai, WAVEAI . . 

Aiva.ai

 

How to classify GM systems

Classifying GM Systems

- goal of the system? : melody, chord progressions, full tracks, jazz improvisation, loops, drums, .... / video games, movies, ads, concerts, SNS, ....

- who are the users? : composers, songwriters, developers, consumers, researchers, marketing agencies, ...

- how autonomous is the system? :  

   human-machine co-creation ----------- human supervision -------------- fully autonomous

- how is music generated?: machine learning, deep learning...

- how is music represented? : symbolic representation, media, forms, ....

 

what about ours?

- goal: full-tracks

- users: consumers? writers

- human-machine collab

- audio representation

 

Use cases

Text-to-music generation

- textual inputs (descriptions of music) -> generate music

- can be used by people on social media with very little understanding of music (minimal human input)

- deep learning based

- Audio representation

- MusicLM, MusicGen, Mubert

 

Singing voice cloning

- Generate / clone voice

- producers, wanna-be musicians

- human-machine collab

- deep learning, audio representation

 

Automatic accompaniment

- Instrumental accompaniment of lead vocals

- Amateur musicians

- human-machine collab

- deep learning, rule-based techniques / symbolic representation

- Nootone

 

Sound synthesis

- generation of novel, "alien" sounds

- mid / pro producers

- human-machine collab

- deep learning / audio representation

- NSynth(Google) 

 

Open Source Research (The Sound of AI)

- voice-to-sound synthesizer

- community-driven research project

 

 

Representation

Symbolic representation

- Symbols (notes, instruments...)

- Similar to a score

- MIDI, MusicXML, Piano-roll, ABC notation...

- Discipline connections : music theory, composition, (computational) musicology

Symbolic generation

- MuseNet (OpenAI, 2019) : GPT-2 architecture / trained on MIDI files, predicts the next token

- Pros : Compact. easy to manipulate. clear, precise. lots of compositional info. capture long-term dependencies. small models

- Cons: Oversimplified. Musical limitations. Limited performance info. no production info. Output isn't audio.

- When is it ideal? : when structure + composition are the focus; notated Western music (classical, jazz....)

- When isn't it ideal? : when performance + production are the focus; EDM, drone, .... beauty not captured in the notes...

 

Audio representation

- waveform, spectrogram, audio embeddings, music cognition

- Audio generation

- example audio-based models: Jukebox, MusicLM, MusicGen, RAVE

- Pros : Lots of performance/production info, complex and rich, Audio output

- Cons: Large dimension/size, difficult to manipulate. no compositional info. large model size (high compute requirements), difficult to capture long-term dependencies

 

* A good music representation solves 50% of GM

* Symbolic representation ≈ score; audio representation ≈ waveform

* Symbolic suits music rich in compositional detail; audio suits music rich in performance detail

 

 

Generative music taxonomy

- traditional(symbolic) : Symbolic AI, Optimization, Complex systems, Statistical methods

- cutting edge(symbolic + audio) : Deep learning

 

Deep learning

- Artificial neural nets

- Learn from massive datasets

- Imitate target style

- Audio / symbolic generation 

- Computationally demanding

- Learn long-term dependencies

- No manual input

- Architectures

ㄴ Recurrent neural nets (DeepBach, symbolic), Variational autoencoders (Jukebox), Diffusion models (Riffusion), Transformers (MusicGen) - the latter three are audio-based

 

Limitations

Text-to-music(musicLM, musicGEN...)

- Long-term structure

- Audio fidelity

- Semantic mapping

- Minimal creative control (the expectation that a model will just work if you feed it data, ignoring musical knowledge)

 

DeepLearning models

- Music is high-dimensional (harmony, melody, rhythm...)

- Network can't learn all dimensions

- DL model has no musical knowledge

- Requires massive datasets

- Lack of musical coherence

- Black box -> difficult to steer

Solving the curse of DL?

- hybrid systems

- merge DL and symbolic AI (Neuro-Symbolic Integration)

 

Music representation

- audio is too complex, symbolic is too simple

- no representation captures all music details efficiently

ㄴ hybrid symbolic + audio representations

ㄴ embeddings(symbolic + audio + context)

ㄴ custom representations

 

Using a grammar-based approach...

T = {C, D, E, F, G, A, B, Whole, Half, Quarter}

N = {Melody, Phrase, Pitch, Duration}

S = Melody

P = {

Melody -> Phrase Phrase

Phrase -> Pitch Duration | Pitch Pitch Duration

Pitch -> C | D | E | F | G | A | B

Duration -> Whole | Half | Quarter

}
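As a quick illustration, here is a minimal sketch of generating a melody from this exact grammar by randomly expanding nonterminals (the implementation choices are my own, not the course's):

```python
import random

# The toy grammar above: nonterminals map to their production alternatives.
PRODUCTIONS = {
    "Melody":   [["Phrase", "Phrase"]],
    "Phrase":   [["Pitch", "Duration"], ["Pitch", "Pitch", "Duration"]],
    "Pitch":    [["C"], ["D"], ["E"], ["F"], ["G"], ["A"], ["B"]],
    "Duration": [["Whole"], ["Half"], ["Quarter"]],
}

def expand(symbol: str) -> list[str]:
    """Recursively rewrite a symbol until only terminals remain."""
    if symbol not in PRODUCTIONS:              # terminal symbol: keep as-is
        return [symbol]
    rule = random.choice(PRODUCTIONS[symbol])  # pick one production at random
    return [t for s in rule for t in expand(s)]

print(expand("Melody"))  # e.g. ['E', 'Half', 'C', 'G', 'Quarter']
```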

- How do we determine the production rules?

ㄴ Extract manually(music theory)

ㄴ Learn from dataset

 

Generative tasks

- Melody generation, Chord progressions, Music structure, Full track generation ....

 

Designing a generative grammar

- Finding the correct music representation is key

- What musical dimensions ?

- what do symbols represent ?

 

Lindenmayer system (L-system) (a kind of formal grammar)

- used to produce musical output

- apply all production rules at once for each iteration

*L-System for chord generation

  • A(alphabet) = {A,B,C,D,E,F,G}
  • S(axiom) = A
  • P = {
    A->ABC
    B->BA
    C->EF
    F->GFD
    }
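A minimal sketch of running this L-system (implementation my own). Unlike the grammar derivation above, every symbol is rewritten in parallel at each iteration:

```python
# Rules of the chord L-system above; symbols without a rule stay unchanged.
RULES = {"A": "ABC", "B": "BA", "C": "EF", "F": "GFD"}

def iterate(axiom: str, steps: int) -> str:
    s = axiom
    for _ in range(steps):
        # apply ALL production rules at once, one pass per iteration
        s = "".join(RULES.get(ch, ch) for ch in s)
    return s

print(iterate("A", 3))  # A -> ABC -> ABCBAEF -> ABCBAEFBAABCEGFD
```

Each symbol of the resulting string can then be read as a chord root.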

https://web.mit.edu/music21/
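music21 (linked above) is a Python toolkit for symbolic music. A quick sketch of how note sequences like the ones these grammars produce could be assembled into a score (the toy melody is my own; the API calls are standard music21):

```python
from music21 import note, stream

# build a small symbolic melody: (pitch name, duration in quarter notes)
melody = stream.Stream()
for pitch, dur in [("C4", 1.0), ("D4", 1.0), ("E4", 0.5), ("G4", 2.0)]:
    melody.append(note.Note(pitch, quarterLength=dur))

melody.show("text")                   # print the score structure as text
# melody.show()                       # render notation (needs a notation app)
# melody.write("midi", "melody.mid")  # export to MIDI
```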

 


 

 

Markov chain(MC)

- Mathematical system that undergoes transitions from one state to another

- models sequences of events probabilistically

* The next state depends only on the current state

independent of the earlier history of the sequence

- states: the possible conditions

- initial probabilities : likelihood of starting the sequence in a state

- transition probabilities: likelihood of moving from one state to another

 

Modelling music with MCs

- melody : sequence of notes (parameter : duration + pitch)

- generate a melody based on the probability of one note following another

- chord progression : sequence of chords

- generate a chord based on the probability of one chord following another

ex) C major pentatonic scale

/ simplifications : pitches in one octave / focus on pitch (ignore duration for now)

  • S = {C, D, E, G, A}
  • Ip = (pC, pD, pE, pG, pA) (vector of initial probabilities for each pitch)
  • Tp = (pCC pCD pCE ... pDC pDD ... pEC ... pGC ... pAC ...) (matrix of transition probabilities over all pitch pairs)

1) First pitch

- use Ip vector -> roll dice -> get pitch from Ip

2) Subsequent pitches

- use Tp matrix -> go to the row of the current pitch (if E was chosen, the E row: pEC pED pEE pEG pEA) -> roll dice -> get new pitch (i.e., the next pitch depends only on the current one)

3) repeat until end
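Putting the three steps together, a minimal sketch of the pentatonic example (the probability values are made up for illustration, not learned from data):

```python
import random

STATES = ["C", "D", "E", "G", "A"]
INITIAL = [0.4, 0.1, 0.2, 0.2, 0.1]   # Ip: P(first pitch), sums to 1
TRANSITIONS = {                        # Tp: one row per current pitch
    "C": [0.1, 0.3, 0.3, 0.2, 0.1],
    "D": [0.3, 0.1, 0.3, 0.2, 0.1],
    "E": [0.2, 0.3, 0.1, 0.3, 0.1],
    "G": [0.2, 0.1, 0.3, 0.1, 0.3],
    "A": [0.3, 0.1, 0.2, 0.3, 0.1],
}

def generate_melody(length: int) -> list[str]:
    # 1) first pitch: sample from the initial distribution Ip
    pitch = random.choices(STATES, weights=INITIAL)[0]
    melody = [pitch]
    # 2) subsequent pitches: sample from the Tp row of the current pitch
    for _ in range(length - 1):
        pitch = random.choices(STATES, weights=TRANSITIONS[pitch])[0]
        melody.append(pitch)
    return melody  # 3) repeated until the desired length

print(generate_melody(8))  # e.g. ['C', 'E', 'G', 'E', 'D', 'C', 'D', 'E']
```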

 

+) other things to model with MCs

- Rhythms, Octaves, Dynamics (piano, fortissimo...), Simple melodic patterns, Instrumentation, Articulations (staccato, legato...), Form, ...

 

* Two modelling approaches

- with multiple parameters, use one MC per parameter (rather than one joint MC over all parameter combinations)

-> makes the problem more tractable

 

Pros

- Simple, Flexible, Fun and creative(..?), OK for ambient

Cons

- Random walk, Lack of musical context, Bad for genres with strong musical direction

 

 

Melody generation with MC

A melody generator using an MC model (training song: Twinkle Twinkle Little Star...)

https://github.com/musikalkemist/generativemusicaicourse/blob/main/12.%20Melody%20generation%20with%20Markov%20chains/Code/markovchain.py

 

 

 

Cellular automata(CA)

- models used to simulate complex systems using rules on a grid of cells

- cells change state based on their own and their neighbors' states

- complex patterns emerge from simple rules

 

CA formalisation

- Grid : line of cells(1D), plane of cells(2D)

- Cell: each cell is identified by its row + column position

- States : each cell can be in one of a finite number of states

- Neighborhood : the set of cells around a cell whose states influence that cell's next state

- Transition rules

a) dictate how the state of a cell changes

b) functions of the states of the cell and its neighbors at time t to determine the state at time t+1

- Initial conditions: initial states of the grid

 

CA for music generation

1) Map axes to different musical params (pitch, inst, time...)

2) Assign states to musical events (on/off, pitches...)

3) Design rules for musical evolution - may or may not be music-based rules

4) Map time (e.g. 1 beat = 1 step)

e.g.) time on the x-axis, drum parts on the y-axis (floor tom, hi-hat, snare, kick...)

e.g.) melody generation by CA

States={C,D,E,F,G,A,None(rest)}

time on the x-axis, with each cell holding one of the states

e.g.) expressive chord generation

States={pp, p, mf, f, ff, None}

pitch on the x-axis (C, D, E, G, A), instrument on the y-axis (synth, piano, organ...)

 

Music strategies for CA

- Generate entire score

- Guideline for improvisation

- Integrate CA-generated inst into a composition

- Pros: Flexible, Experimentation, OK for raw material

- Cons: Bad musical output, No music knowledge(just mechanism)

 

Drum Generation with Cellular Automata

States={ON, OFF}

x-axis: time, y-axis: hi-hat / snare / kick

transition rules: Syncopation resolution / filling gaps / accenting / mutation
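The linked implementation below uses those music-specific rules; as a generic illustration of the mechanism, here is a minimal sketch that drives a binary drum grid with a classic elementary CA rule (the mapping of CA generations to drum voices is my own simplification, not the course's):

```python
RULE = 90  # classic elementary CA rule: new state = left neighbor XOR right

def step(cells: list[int]) -> list[int]:
    """One parallel CA update: each cell looks at (left, self, right)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, me, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (me << 1) | right   # neighborhood code 0..7
        nxt.append((RULE >> idx) & 1)           # look up the rule's output bit
    return nxt

# initial condition: a single hit in the middle of a 16-step bar
pattern = [0] * 16
pattern[8] = 1

# read successive CA generations as patterns for different drum voices
for name in ["hihat", "snare", "kick"]:
    print(f"{name:6s}", "".join("x" if c else "." for c in pattern))
    pattern = step(pattern)
```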

 

CellularAutomatonDrumGenerator

https://github.com/musikalkemist/generativemusicaicourse/blob/main/14.%20Drum%20generation%20with%20cellular%20automata/Code/cellularautomaton.py

 

 

 

Genetic algorithms (GA)

- Optimization techniques inspired by the process of natural selection (with limited resources, the fittest individuals survive; over many generations, these processes can result in adaptation and the evolution of species)

- solutions evolve over generations to optimize a specific objective

- Aerospace design, routing problems, DNA sequence alignment, Art/music generation, ....

 

Formalising GA

- Population : a set of candidate solutions (individuals)

- Chromosomes : encoded version of a candidate solution (e.g. designing a table: five legs, 80 cm tall, red, etc...)

- Fitness function : Measures how effective a solution is

- Genetic operators

1) Selection : choose fittest individuals for producing offspring

- the likelihood of an individual being selected grows with its fitness; the fitter survive

- roulette wheel selection, ...

2) Crossover : combine the genetic info of two parents to produce new offspring

- offspring exchange parts of the parents' genetic material; one-point crossover, two-point crossover, ...

- creates genetic diversity, can lead to new solutions

3) Mutation : introduce variation into the offspring's genetic makeup

- random changes are made to parts of the genetic code of the offspring

- mutation rate is low to prevent random search

 

Pros: Flexible, Explore unconventional ideas, Good results

Cons: Crafting the fitness function is complex, Subjectivity

 

GA step by step

Create initial population (randomly, criteria, ...) -> Evaluate fitness -> Select parents -> Generate offspring (crossover) -> Mutate offspring -> Replace population -> Check termination condition
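A minimal sketch of that loop on a toy problem (evolving bitstrings toward all ones; the details are my own illustration, not the course code):

```python
import random

POP_SIZE, GENES, MUTATION_RATE, GENERATIONS = 20, 16, 0.05, 100

def fitness(ind):                          # evaluate fitness (count of ones)
    return sum(ind)

def select(pop):                           # roulette wheel selection: 2 parents
    return random.choices(pop, weights=[fitness(i) + 1 for i in pop], k=2)

def crossover(a, b):                       # one-point crossover
    point = random.randrange(1, GENES)
    return a[:point] + b[point:]

def mutate(ind):                           # low-rate random bit flips
    return [g ^ 1 if random.random() < MUTATION_RATE else g for g in ind]

# create initial population randomly
pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    best = max(pop, key=fitness)
    if fitness(best) == GENES:             # termination condition
        break
    # replace the population with mutated offspring of selected parents
    pop = [mutate(crossover(*select(pop))) for _ in range(POP_SIZE)]

print(f"generation {gen}: best = {best}, fitness = {fitness(best)}")
```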

 

What are GAs good for?

Problems where traditional optimization techniques fail

Large, complex, multimodal spaces

Diversity and adaptation

 

GA for music generation

1) Encode music elements as chromosomes

- melody : 1 note per gene, pitch+duration (C4 - 0.5, D4 - 1.0, ...)

- chords : 1 chord per gene (Cm, Dm, D, ...)

- sound synthesis : 1 synth parameter per gene (cut-off frequency 0.34, reverb 0.46, delay 0.22, ...)

 

2) Craft the fitness function

- evaluates the aesthetic value of a composition

- infer from music theory

- learn from data

- subjective : what is a good melody?

e.g.) for melody

Linear combination of multiple criteria: scale conformity, melodic contour, rhythmic variation, dissonance resolution

F = w1 * SC + w2 * MC + w3 * RV + w4 * DR
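A minimal sketch of such a weighted fitness function (the sub-scores are deliberately simplistic stand-ins for real musical criteria; the weights and the omission of the dissonance-resolution term are my own simplifications):

```python
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}          # pitch classes of the C major scale

def scale_conformity(pitches):            # SC: fraction of in-scale notes
    return sum(p % 12 in C_MAJOR for p in pitches) / len(pitches)

def melodic_contour(pitches):             # MC: reward small steps over leaps
    steps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(s <= 2 for s in steps) / len(steps)

def rhythmic_variation(durations):        # RV: proportion of distinct values
    return len(set(durations)) / len(durations)

def fitness(pitches, durations, w=(0.4, 0.3, 0.3)):
    # F = w1*SC + w2*MC + w3*RV  (DR term omitted in this sketch)
    return (w[0] * scale_conformity(pitches)
            + w[1] * melodic_contour(pitches)
            + w[2] * rhythmic_variation(durations))

# MIDI pitches for C4 D4 E4 G4, with quarter/half/whole-note durations
print(fitness([60, 62, 64, 67], [1.0, 1.0, 0.5, 2.0]))
```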

 

3) Run the algorithm

 

Melody Harmonization with GA

MelodyData (dataclass; holds info about the input song)

GeneticMelodyHarmonizer (runs the generations)

FitnessEvaluator (fitness function)

https://github.com/musikalkemist/generativemusicaicourse/blob/main/16.%20Melody%20harmonization%20with%20genetic%20algorithms/Code/geneticmelodyharmonizer.py

 

 

 

Transformer

 

 

 

Text-to-music Generation with Mustango

- The MusicBench dataset : audio tracks with corresponding natural language descriptions of music features

- Architecture

Two components : Latent Diffusion Model / MuNet

1) Latent Diffusion Model

(Audio) -> Audio Encoder -> Diffusion Model (Forward / Reverse Process) -> Audio Decoder -> (Audio output)

2) MuNet (Music Domain Knowledge Informed UNet)

- conditioning of the audio synthesis

- beat timestamps : t1, t1+t2, t1+t2+t3, ... up to the max beat
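For intuition, a minimal sketch of the forward (noising) half of a latent diffusion model, using a standard DDPM-style schedule (my own illustration, not Mustango's actual configuration):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal retention

def q_sample(z0: np.ndarray, t: int) -> np.ndarray:
    """Sample z_t ~ q(z_t | z_0) = sqrt(a_bar_t)*z0 + sqrt(1-a_bar_t)*eps."""
    eps = np.random.randn(*z0.shape)
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps

z0 = np.random.randn(8, 16)              # a stand-in audio latent
print(np.std(q_sample(z0, 10)), np.std(q_sample(z0, 999)))
# early t: mostly signal; late t: nearly pure noise. The reverse process
# (the UNet) learns to undo this, conditioned on MuNet's music features.
```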

 

 

https://replicate.com/declare-lab/mustango

 


 

 

 

 

 

What if users want to read credible, authoritative writing?

Public-domain works once copyright expires (a contest...?) / excerpts from contemporary novels / once data accumulates, recommend works that similar users recommended or liked

Credibility as the baseline / amateur writing as an extra feature?

Feature credible works by default, plus user participation

 

 

 

 

 

 

 
