TradePoint.io

Inworld AI Launches Realtime TTS-2: A Closed-Loop Voice Model That Adapts to How You Actually Talk

May 6, 2026
in AI & Technology

Voice AI has a dirty secret: most of it was never designed for conversation. The dominant paradigm — feed text in, get audio out — traces its lineage to audiobook narration and voiceover production, where the model never hears the person on the other end. That’s fine when you’re generating a podcast intro. It’s not fine when a frustrated user is trying to get support from an AI agent at 11pm.

Inworld AI is calling that out directly with the launch of Realtime TTS-2, a new voice model released as a research preview via its Inworld API and Inworld Realtime API. The model hears the full audio of the exchange, picks up the user’s tone, pacing and emotional state, then takes voice direction in plain English the way developers prompt an LLM.

What’s Actually Different Here

The meaningful architectural distinction with TTS-2 is that it operates as a closed-loop system. The model takes the actual audio of the prior turns of the exchange as input, not just a transcript — it hears how the user actually sounded. That’s a non-trivial difference. A transcript of “okay, fine” gives you the words. The audio of “okay, fine” tells you whether the person is relieved, resigned, or sarcastic. TTS-2 is designed to use that signal.

The same line lands differently after a joke than after bad news, and the model knows the difference because it heard the prior turn. Tone, pacing, and emotional state carry forward automatically. Practically speaking, audio context flows across turns inside a Realtime session without developers needing to pass explicit prior_audio fields or build additional plumbing.
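The difference between a stateless call and a closed-loop session can be sketched with a toy model. This is only an illustration of the idea, not Inworld's client or wire format: the class ClosedLoopSession, the function stateless_request, and the context_audio key are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str  # "user" or "agent"
    audio: bytes  # raw audio for the turn (placeholder bytes here)


@dataclass
class ClosedLoopSession:
    """Toy stand-in for a stateful session (hypothetical, not Inworld's client)."""

    turns: list[Turn] = field(default_factory=list)

    def hear_user(self, audio: bytes) -> None:
        # The session, not the caller, stores how the user sounded.
        self.turns.append(Turn("user", audio))

    def plan_reply(self, text: str) -> dict:
        # The caller passes only text; prior audio comes from session state.
        request = {"text": text, "context_audio": [t.audio for t in self.turns]}
        self.turns.append(Turn("agent", b""))  # placeholder for generated audio
        return request


def stateless_request(text: str) -> dict:
    # A stateless call sees only the words of the current sentence.
    return {"text": text, "context_audio": []}
```

In the toy version, the caller of plan_reply never handles prior audio, which mirrors the article's point that no explicit prior_audio plumbing is needed inside a session.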

Four Capabilities, One Model

The Inworld team is shipping TTS-2 with four key features and positions the combination, not any individual piece, as the differentiator.

  1. Voice Direction: Developers steer delivery with plain-language prompts inline at inference time. Instead of selecting from a fixed emotion enum like [sad] or [excited], developers pass a bracket tag like [speak sadly, as if something bad just happened] directly in the text. Long, descriptive prompts work better than single-word labels because the model responds far better to full context. Inline non-verbal markers like [laugh], [sigh], [breathe], [clear_throat], and [cough] can be dropped anywhere in the text where the moment should occur, and the model places them as audio events, not pronounced words.
  2. Conversational Awareness: This is the closed-loop architecture described above, the shift that separates TTS-2 from prior-generation models that treat each sentence as a stateless generation call.
  3. Crosslingual Support: One voice identity is preserved across more than 100 languages, including mid-utterance language switches inside a single generation. No language flag is needed: the model handles transitions automatically, keeping timbre, pitch, and character constant across the switch. The top-tier languages ship at native-speaker quality, while the long tail is described as launch-window experimental, consistent with the model releasing as a research preview.
  4. Advanced Voice Design: The model generates a saved voice from a written prompt, with no reference audio required. Developers can describe a person in prose, save the result as a reusable voice, and call it like any other voice in the app. Voice Design ships with three stability modes: Expressive (for live consumer conversation and companions), Balanced (the default for most agent workloads), and Stable (for IVR and professional deployments where pitch drift is unacceptable).
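To make the bracket-tag convention concrete, here is a small sketch of how a developer might check a prompt before sending it, separating spoken text, the non-verbal markers named above, and free-form voice directions. It does not describe how TTS-2 itself parses input; the function split_tags and the segment labels are assumptions made for this example.

```python
import re

# Non-verbal markers named in the article.
NON_VERBAL = {"laugh", "sigh", "breathe", "clear_throat", "cough"}
TAG = re.compile(r"\[([^\[\]]+)\]")


def split_tags(text: str) -> list[tuple[str, str]]:
    """Split text into ("speech" | "event" | "direction", value) segments."""
    segments = []
    pos = 0
    for match in TAG.finditer(text):
        spoken = text[pos:match.start()].strip()
        if spoken:
            segments.append(("speech", spoken))
        tag = match.group(1).strip()
        # Known markers are audio events; anything else is a voice direction.
        kind = "event" if tag in NON_VERBAL else "direction"
        segments.append((kind, tag))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("speech", tail))
    return segments


line = ("[speak sadly, as if something bad just happened] "
        "I checked the order. [sigh] It was cancelled.")
for kind, value in split_tags(line):
    print(f"{kind:9} {value}")
```

A check like this can catch a misspelled marker such as [sighs], which would be classified as a direction rather than an audio event.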

The Conversational Layer Underneath

Beyond the four key features, Inworld calls out a set of behaviors that push speech further into what it describes as “person paying attention” territory. The most technically interesting is disfluencies: the model generates natural uh and um, self-corrections, mid-noun-phrase pauses, and trailing thoughts that signal warmth and recall rather than malfunction. Critically, different speaker profiles cluster fillers differently, and the model follows the rhythm: filler-as-energy sounds different from filler-as-hesitation. Voice cloning is also supported via a two-step API: upload a reference sample (5–15 seconds, clean, single speaker) to /voices/v1/voices:clone to get a voice ID, then use that ID like any other voice.
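Before uploading a cloning reference, it can help to check the sample length locally. The sketch below assumes the sample is a WAV file and checks only the 5–15 second range quoted above; it cannot tell whether the audio is clean or single-speaker. The helper names are hypothetical, and the upload request itself is omitted because the article does not give its format.

```python
import io
import wave

MIN_SECONDS, MAX_SECONDS = 5.0, 15.0  # reference length range from the article


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an in-memory WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(data: bytes) -> bool:
    """Check only the length; cleanliness and speaker count are not verified."""
    return MIN_SECONDS <= wav_duration_seconds(data) <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV, used here only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(is_valid_reference(make_silent_wav(10)))  # within range
print(is_valid_reference(make_silent_wav(2)))   # too short
```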

Where It Fits in the Stack

TTS-2 is one layer in Inworld’s broader Realtime API pipeline. The full stack has three parts: Realtime STT, which transcribes and profiles the speaker in one pass, capturing age, accent, pitch, vocal style, emotional tone, and pacing as structured signals on the same connection; a Realtime Router, which routes across 200+ models and selects the appropriate model and tools based on the user’s state and conversation context; and TTS-2 at the output layer. The pipeline runs over a single persistent WebSocket connection, with sub-200ms median time-to-first-audio for the TTS layer.
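A median time-to-first-audio figure like the one above can be checked against any streaming source with a few lines of timing code. The sketch below is a generic measurement harness, not Inworld's methodology, which the article does not describe. The fake_tts_stream generator is a hypothetical stand-in for a real audio stream, with a simulated 10 ms delay.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_chunk(stream: Iterable[bytes]) -> float:
    """Milliseconds from the start of iteration to the first non-empty chunk."""
    start = time.perf_counter()
    for chunk in stream:
        if chunk:
            return (time.perf_counter() - start) * 1000.0
    raise ValueError("stream ended without audio")


def median_ttfa_ms(make_stream: Callable[[], Iterable[bytes]], runs: int = 5) -> float:
    """Median time-to-first-audio over several fresh streams."""
    return statistics.median(time_to_first_chunk(make_stream()) for _ in range(runs))


def fake_tts_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a real audio stream."""
    time.sleep(0.01)  # simulated delay before the first audio chunk
    yield b"audio-chunk-1"
    yield b"audio-chunk-2"


print(f"median TTFA: {median_ttfa_ms(fake_tts_stream):.1f} ms")
```

The median is used instead of the mean so that a single slow request does not dominate the figure, which matches how the latency number is quoted in the article.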

Source: https://artificialanalysis.ai/text-to-speech/leaderboard (data as of May 5, 2026)

The Broader Context

Realtime TTS 1.5 already ranks #1 on the Artificial Analysis Speech Arena (as of May 5, 2026), ahead of Google (#2) and ElevenLabs (#3). The launch of TTS-2 signals that Inworld considers raw audio quality a solved problem — and is now competing on the behavioral layer: context-awareness, steerability, and identity consistency across languages.



