TradePoint.io

Meet JEN-1: A Universal AI Framework that Combines Bi-Directional and Uni-Directional Modes to Generate High-Quality Music Conditioned on Either Text or Music Representations

August 16, 2023
in AI & Technology

Music, which Henry Wadsworth Longfellow called the universal language of mankind, carries harmony, melody, and rhythm, and holds cultural significance for people across the globe. Recent advances in deep generative models have driven progress in music generation. However, generating high-quality, realistic music that captures this complexity and nuance, particularly when conditioned on textual descriptions, remains a formidable challenge.

Existing methods for generating music have made significant strides, but producing intricate, lifelike music that aligns with free-form textual prompts remains an open problem. Music’s multifaceted nature, spanning various instruments and harmonies, raises specific challenges:

  1. Music spans a wide frequency spectrum, so capturing intricate details requires high sampling rates such as 44.1 kHz in stereo. Speech, by contrast, is typically modeled at lower sampling rates.
  2. The interplay of instruments and the arrangement of melodies and harmonies produce complex musical structures. Precision is crucial, because listeners are highly sensitive to dissonance.
  3. Maintaining control over attributes like key, genre, and melody is pivotal to realizing the intended artistic vision.
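
To make the first challenge concrete, the short sketch below counts how many raw values a model has to handle for a 10-second clip. The 44.1 kHz stereo figure comes from the text above; the 16 kHz mono speech rate and the 10-second duration are assumptions chosen only for comparison.

```python
def raw_value_count(seconds, sample_rate_hz, channels):
    """Number of raw waveform values in a clip of the given length."""
    return int(seconds * sample_rate_hz * channels)


# 44.1 kHz stereo music (figure from the article).
music_values = raw_value_count(10, 44_100, 2)

# 16 kHz mono speech (an assumed, commonly used speech rate).
speech_values = raw_value_count(10, 16_000, 1)

ratio = music_values / speech_values
print(music_values, speech_values, ratio)
```

Under these assumptions the music clip holds several times more values than the speech clip, which is one reason compressing audio into a shorter latent sequence is attractive.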

To address these challenges in text-to-music generation, the Futureverse research team designed JEN-1. JEN-1 uses an omnidirectional diffusion model that combines autoregressive (AR) and non-autoregressive (NAR) paradigms, allowing it to capture sequential dependencies while accelerating generation. Unlike prior methods that often convert audio to mel-spectrograms, JEN-1 directly models raw audio waveforms, which preserves fidelity. This is made possible by a noise-robust masked autoencoder that compresses the original audio into latent representations while preserving high-frequency details. To further improve performance, the researchers introduce a normalization step that reduces anisotropy in the latent embeddings.
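
The article does not spell out how the normalization step works, so the following is only a minimal sketch of the general idea: rescaling latent channels so that none dominates the others. It uses plain per-channel standardization (zero mean, unit variance) as a simplified stand-in; the function name `standardize_channels` and the toy latent values are hypothetical, and this stand-in does not remove correlations between channels.

```python
from statistics import fmean, pstdev


def standardize_channels(latents, eps=1e-8):
    """Rescale each latent channel to zero mean and unit variance.

    latents: list of frames, each frame a list of per-channel floats.
    This is an illustrative stand-in, not JEN-1's actual procedure.
    """
    channels = list(zip(*latents))          # group values by channel
    means = [fmean(c) for c in channels]
    stds = [pstdev(c) for c in channels]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in latents
    ]


# Toy latents: the second channel is on a much larger scale than the first.
toy_latents = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
normalized = standardize_channels(toy_latents)
```

After standardization both channels sit on the same scale, so a downstream diffusion model does not see one direction of the latent space carrying most of the variance.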

JEN-1’s core architecture is an omnidirectional 1D diffusion model that combines bidirectional and unidirectional modes. The model uses a temporal 1D U-Net inspired by the Efficient U-Net architecture. It is designed to model waveforms effectively and includes both convolutional and self-attention layers to capture sequential dependencies and contextual information. The unidirectional mode, which matters for music generation because music is a time series, is implemented through causal padding and masked self-attention, ensuring that each generated latent embedding depends only on the embeddings to its left.
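
The two mechanisms named above, causal padding and masked self-attention, can be sketched in a few lines. This is a toy illustration in plain Python, not JEN-1's implementation; the helper names `causal_conv1d` and `causal_mask` are hypothetical.

```python
def causal_conv1d(signal, kernel):
    """1D convolution with left-only zero padding.

    Output position t is computed from signal[t - k + 1 .. t] only,
    so it never looks at future samples.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(signal)   # pad on the left only
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(signal))
    ]


def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


x = [1.0, 2.0, 3.0, 4.0]
y = causal_conv1d(x, [0.5, 0.5])

# Changing the last (future) sample must not affect earlier outputs.
x_changed = [1.0, 2.0, 3.0, 100.0]
y_changed = causal_conv1d(x_changed, [0.5, 0.5])
```

In the bidirectional mode the padding would be symmetric and the mask would be all `True`, which is what lets a single network switch between the two behaviors.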

One of JEN-1’s strengths is its unified multi-task training approach. It supports three main music generation tasks:

  • Bidirectional text-guided music generation
  • Bidirectional music inpainting (restoring missing segments)
  • Unidirectional music continuation (extrapolation)

Through multi-task training, JEN-1 shares parameters across tasks, allowing it to generalize better and handle sequential dependencies more effectively. This flexibility makes JEN-1 a versatile tool that can be applied to diverse music generation scenarios.
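
One way to picture how a single model can serve all three tasks is to describe each task by which positions of the latent sequence are given as context and which must be generated. The sketch below is an assumption about how such masks could be built; the article does not describe JEN-1's exact conditioning format, and the function `context_mask` and its task names are hypothetical.

```python
def context_mask(task, length, start=0, end=0):
    """Return a list of booleans: True means the position is given as context.

    task: "text_to_music", "inpainting", or "continuation" (hypothetical names).
    start, end: bounds of the missing segment (inpainting) or the length of
    the given prefix (continuation, start only).
    """
    if task == "text_to_music":
        # Nothing is given; the whole sequence is generated from text.
        return [False] * length
    if task == "inpainting":
        # Everything outside [start, end) is known; the gap is generated.
        return [not (start <= i < end) for i in range(length)]
    if task == "continuation":
        # The first `start` positions are known; the rest is extrapolated.
        return [i < start for i in range(length)]
    raise ValueError(f"unknown task: {task}")


generation = context_mask("text_to_music", 6)
inpainting = context_mask("inpainting", 6, start=2, end=4)
continuation = context_mask("continuation", 6, start=3)
```

Because the three tasks differ only in the mask (and in whether attention is causal), the same parameters can be trained on all of them.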

In the experiments, JEN-1 is trained on 5,000 hours of high-quality music data. The model uses a masked music autoencoder for audio latents and FLAN-T5 for text embeddings. During training, the multi-task objectives are balanced, and classifier-free guidance is employed. JEN-1 is trained for 200k steps using the AdamW optimizer on 8 A100 GPUs.
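
Classifier-free guidance, in its standard form, combines a conditional and an unconditional noise prediction at sampling time as eps = eps_uncond + w * (eps_cond - eps_uncond). The sketch below shows that combination on toy lists; the guidance scale values and the function name `cfg_combine` are assumptions, since the article does not report the settings JEN-1 uses.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance combination of two noise predictions.

    guidance_scale = 0 gives the unconditional prediction,
    guidance_scale = 1 gives the conditional one,
    larger values push further toward the text condition.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy predictions standing in for the model's outputs.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=2.0)
```

To make the unconditional prediction available, models trained this way typically drop the text condition for a fraction of training examples, so one network learns both behaviors.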

JEN-1 is compared against several state-of-the-art methods using objective and subjective metrics. It outperforms them on plausibility (FAD), audio-text alignment (CLAP), and human-rated text-to-music quality (T2M-QLT) and alignment (T2M-ALI). It does so while remaining computationally efficient.
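
As a rough illustration of the audio-text alignment metric, a CLAP-style score is commonly computed as the cosine similarity between an audio embedding and a text embedding produced by a pretrained model. The sketch below shows only the similarity step on made-up vectors; no real CLAP model is called, and the embedding values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings standing in for model outputs.
audio_embedding = [1.0, 1.0]
matching_text = [1.0, 0.0]
unrelated_text = [1.0, -1.0]

score_match = cosine_similarity(audio_embedding, matching_text)
score_unrelated = cosine_similarity(audio_embedding, unrelated_text)
```

A higher score means the generated audio sits closer to the prompt in the shared embedding space, which is what the alignment comparison rewards.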

Ablation studies demonstrate the contribution of each component in JEN-1. Incorporating the autoregressive mode and training with multi-task objectives both improve music quality and generalization. The proposed method consistently achieves high-fidelity music generation without increasing training complexity.

Overall, JEN-1 is a powerful solution for text-to-music generation that significantly advances the field. It generates high-quality music by directly modeling waveforms and combining autoregressive and non-autoregressive training. The integrated diffusion model and masked autoencoder enhance sequence modeling. JEN-1 outperforms strong baselines in subjective quality, diversity, and controllability, highlighting its effectiveness for music synthesis.


Check out the Paper and Project. All credit for this research goes to the researchers on this project.


Madhur Garg is a consulting intern at MarktechPost. He is currently pursuing his B.Tech in Civil and Environmental Engineering from the Indian Institute of Technology (IIT), Patna. He shares a strong passion for Machine Learning and enjoys exploring the latest advancements in technologies and their practical applications. With a keen interest in artificial intelligence and its diverse applications, Madhur is determined to contribute to the field of Data Science and leverage its potential impact in various industries.