TradePoint.io

xAI Launches Standalone Grok Speech-to-Text and Text-to-Speech APIs, Targeting Enterprise Voice Developers

April 19, 2026

Elon Musk’s AI company xAI has launched two standalone audio APIs — a Speech-to-Text (STT) API and a Text-to-Speech (TTS) API — both built on the same infrastructure that powers Grok Voice on mobile apps, Tesla vehicles, and Starlink customer support. The release moves xAI squarely into the competitive speech API market currently occupied by ElevenLabs, Deepgram, and AssemblyAI.

What Is the Grok Speech-to-Text API?

Speech-to-Text is the technology that converts spoken audio into written text. For developers building meeting transcription tools, voice agents, call center analytics, or accessibility features, an STT API is a core building block. Rather than developing this from scratch, developers call an endpoint, send audio, and receive a structured transcript in return.

The Grok STT API is now generally available, offering transcription across 25 languages in both batch and streaming modes. Batch mode is designed for pre-recorded audio files, while streaming mode transcribes audio in real time as it is captured. Pricing is simple: $0.10 per hour for batch and $0.20 per hour for streaming.
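To make the two rates concrete, here is a minimal Python sketch of a cost estimator. It assumes billing is strictly pro-rata by the second of audio; the article does not say how xAI rounds partial hours or whether a minimum charge applies. The function name `stt_cost` is illustrative and not part of any xAI SDK.

```python
# Grok STT rates in USD per hour of audio, as quoted in the article.
STT_RATE_PER_HOUR = {"batch": 0.10, "streaming": 0.20}


def stt_cost(audio_seconds: float, mode: str = "batch") -> float:
    """Estimate transcription cost in USD.

    Assumes straight pro-rata billing per second of audio; actual rounding
    or minimum-billing rules are not stated in the article.
    """
    if mode not in STT_RATE_PER_HOUR:
        raise ValueError(f"unknown mode: {mode!r}")
    hours = audio_seconds / 3600
    return hours * STT_RATE_PER_HOUR[mode]


# Example workload: 1,000 calls of 6 minutes each (100 hours of audio).
total_seconds = 1_000 * 6 * 60
print(f"batch:     ${stt_cost(total_seconds, 'batch'):.2f}")      # batch:     $10.00
print(f"streaming: ${stt_cost(total_seconds, 'streaming'):.2f}")  # streaming: $20.00
```

At this volume the published rates work out to $10.00 for batch and $20.00 for streaming, so streaming doubles the bill in exchange for real-time results.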

The API includes word-level timestamps, speaker diarization, and multichannel support, along with Inverse Text Normalization (ITN), which renders numbers, dates, currencies, and similar entities in written form. It accepts 12 audio formats: nine container formats (WAV, MP3, OGG, Opus, FLAC, AAC, MP4, M4A, MKV) and three raw formats (PCM, µ-law, A-law), with a maximum file size of 500 MB per request.
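Checking files against these limits before uploading avoids a wasted request. The sketch below is a client-side pre-check under stated assumptions: container formats are detected by file extension, the three raw encodings are named explicitly by the caller (the labels `pcm`, `mulaw`, and `alaw` are hypothetical, not documented parameter values), and "500 MB" is read as 500,000,000 bytes, the stricter of the two common interpretations.

```python
from pathlib import Path
from typing import List, Optional

# Container formats named in the article, keyed by an assumed file extension.
CONTAINER_EXTENSIONS = {
    ".wav", ".mp3", ".ogg", ".opus", ".flac", ".aac", ".mp4", ".m4a", ".mkv",
}
# Raw encodings named in the article. Raw audio has no standard extension,
# so the caller passes one of these hypothetical labels explicitly.
RAW_ENCODINGS = {"pcm", "mulaw", "alaw"}

# "500 MB" read as 500 * 10**6 bytes, the stricter of the two common readings.
MAX_UPLOAD_BYTES = 500 * 10**6


def check_upload(filename: str, size_bytes: int,
                 raw_encoding: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if raw_encoding is not None:
        if raw_encoding.lower() not in RAW_ENCODINGS:
            problems.append(f"unsupported raw encoding: {raw_encoding}")
    else:
        suffix = Path(filename).suffix.lower()
        if suffix not in CONTAINER_EXTENSIONS:
            problems.append(f"unsupported container: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes:,} bytes; limit is {MAX_UPLOAD_BYTES:,}"
        )
    return problems


print(check_upload("meeting.m4a", 42_000_000))                 # []
print(check_upload("notes.txt", 1_000))                        # ['unsupported container: .txt']
print(check_upload("call.raw", 8_000, raw_encoding="mulaw"))   # []
```

Returning a list of problems rather than raising on the first one lets a batch job report every rejected file in a single pass.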

Speaker diarization separates audio by individual speaker, answering the question “who said what.” This is critical for multi-speaker recordings such as meetings, interviews, and customer calls. Word-level timestamps assign precise start and end times to each word in the transcript, enabling use cases like subtitle generation, searchable recordings, and legal documentation. Inverse Text Normalization converts spoken forms such as “one hundred sixty-seven thousand nine hundred eighty-three dollars and fifteen cents” into the written form “$167,983.15”.
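Word-level timestamps and speaker labels are exactly what subtitle generation needs. The following sketch groups consecutive words from the same speaker into numbered SRT cues, starting a new cue on each speaker change or once a cue reaches `max_words`. The `Word` structure is a hypothetical stand-in for whatever per-word fields the Grok STT response actually returns; map the real field names onto it before use.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape for one diarized, timestamped word; the real
    # Grok STT response schema may use different field names.
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: str


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(words: list[Word], max_words: int = 8) -> str:
    """Group consecutive words by speaker into numbered SRT cues."""
    cues: list[list[Word]] = []
    for w in words:
        # Start a new cue on a speaker change or when the current cue is full.
        if not cues or cues[-1][-1].speaker != w.speaker or len(cues[-1]) >= max_words:
            cues.append([w])
        else:
            cues[-1].append(w)
    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        blocks.append(
            f"{i}\n{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}\n"
            f"[{cue[0].speaker}] {text}\n"
        )
    # A blank line separates consecutive cues in SRT.
    return "\n".join(blocks)


words = [Word("Hi", 0.0, 0.4, "A"), Word("there.", 0.4, 0.9, "A"),
         Word("Hello!", 1.2, 1.8, "B")]
print(to_srt(words))
# Prints:
# 1
# 00:00:00,000 --> 00:00:00,900
# [A] Hi there.
#
# 2
# 00:00:01,200 --> 00:00:01,800
# [B] Hello!
```

Prefixing each cue with the speaker label is a design choice for readability; drop it if the player or workflow already distinguishes speakers some other way.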

Benchmark Performance

xAI is making strong claims on accuracy, based on its own published benchmarks. On phone-call entity recognition (names, account numbers, dates), xAI reports a 5.0% error rate for Grok STT, versus 12.0% for ElevenLabs, 13.5% for Deepgram, and 21.3% for AssemblyAI. That is a substantial margin if it holds in production. For video and podcast transcription, Grok and ElevenLabs tied at a 2.4% error rate, with Deepgram and AssemblyAI trailing at 3.0% and 3.2%, respectively. xAI also reports a 6.9% word error rate on general audio benchmarks.

https://x.ai/news/grok-stt-and-tts-apis
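The word error rate (WER) xAI cites is a standard metric: the minimum number of word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. A minimal Python implementation using word-level edit distance follows. It is a generic sketch, not xAI's evaluation code; published benchmarks usually normalize casing and punctuation first, and the entity error rate quoted for phone calls is a different measurement.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    using a word-level Levenshtein (edit-distance) alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


ref = "my account number is four two seven"
hyp = "my count number is for two seven"
print(f"WER = {word_error_rate(ref, hyp):.3f}")  # WER = 0.286 (2 substitutions / 7 words)
```

The example also shows why entity-level metrics matter: a transcript can score a respectable WER while still getting the one word that carries the account detail wrong.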

What Is the Grok Text-to-Speech API?

Text-to-Speech converts written text into spoken audio. Developers use TTS APIs to power voice assistants, read-aloud features, podcast generation, IVR (interactive voice response) systems, and accessibility tools.

The Grok TTS API delivers fast, natural speech synthesis with detailed control via speech tags, and is priced at $4.20 per 1 million characters. The API accepts up to 15,000 characters per REST request; for longer content, a WebSocket streaming endpoint is available that has no text length limit and begins returning audio before the full input is processed. The API supports 20 languages and five distinct voices: Ara, Eve, Leo, Rex, and Sal — with Eve set as the default.
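For inputs over the REST limit, the article points to the WebSocket endpoint. If a client must stay on REST, one option is to split the text at sentence or word boundaries so each request stays at or under 15,000 characters. The sketch below does that and also estimates cost, assuming every character, spaces included, counts toward billing; the article does not say how characters are counted (for instance, whether speech tags are billed). The helper names are illustrative.

```python
from typing import List

TTS_PRICE_PER_MILLION_CHARS = 4.20  # USD, as quoted in the article
REST_CHAR_LIMIT = 15_000            # per-request limit for the REST endpoint


def tts_cost(text: str) -> float:
    """Estimate cost in USD, assuming every character (spaces included) is billed."""
    return len(text) / 1_000_000 * TTS_PRICE_PER_MILLION_CHARS


def chunk_for_rest(text: str, limit: int = REST_CHAR_LIMIT) -> List[str]:
    """Split text into pieces of at most `limit` characters, preferring to break
    after a sentence, then at a space, and only as a last resort mid-word."""
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        # Last sentence boundary (". ", "! " or "? ") inside the window, if any.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation mark with the current chunk
        else:
            cut = window.rfind(" ")
            if cut <= 0:
                cut = limit  # no usable space: hard split
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks


script = "Welcome back to the show. " * 2_000  # 52,000 characters
parts = chunk_for_rest(script)
print(len(parts), max(len(p) for p in parts), f"${tts_cost(script):.2f}")  # 4 14975 $0.22
```

Splitting on sentence boundaries keeps prosody natural at the seams; the WebSocket endpoint avoids the seams entirely and starts returning audio sooner.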

Beyond voice selection, developers can control delivery with two kinds of speech tags: inline tags such as [laugh], [sigh], and [breath], and wrapping tags such as <whisper>text</whisper> and <emphasis>text</emphasis>. This lightweight markup targets one of the core limitations of traditional TTS systems, which often produce technically correct but emotionally flat output.
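Because tagged text is plain text, a malformed or misspelled tag is easy to miss. The sketch below checks markup against the tags and voices named in this article before a request is built. The tag list may be incomplete relative to what the API actually accepts, and the `{"text": ..., "voice": ...}` request shape is a hypothetical placeholder, not xAI's documented schema. Note that the simple pattern also flags bracketed text such as "[1]" as an unknown inline tag.

```python
import re
from typing import List

VOICES = {"Ara", "Eve", "Leo", "Rex", "Sal"}  # voices named in the article
DEFAULT_VOICE = "Eve"
# Only the tags mentioned in the article; the API may support others.
INLINE_TAGS = {"laugh", "sigh", "breath"}
WRAPPING_TAGS = {"whisper", "emphasis"}

# Matches either an inline tag like [laugh] or an open/close tag like <whisper>.
TAG_RE = re.compile(r"\[(\w+)\]|<(/?)(\w+)>")


def check_tagged_text(text: str) -> List[str]:
    """Return problems found in speech-tag markup (empty list = looks fine)."""
    problems = []
    open_stack = []
    for m in TAG_RE.finditer(text):
        inline, closing, name = m.group(1), m.group(2), m.group(3)
        if inline is not None:
            if inline not in INLINE_TAGS:
                problems.append(f"unknown inline tag [{inline}]")
        elif name not in WRAPPING_TAGS:
            problems.append(f"unknown wrapping tag <{name}>")
        elif not closing:
            open_stack.append(name)
        elif not open_stack or open_stack.pop() != name:
            problems.append(f"unmatched </{name}>")
    # Anything still open at the end was never closed.
    problems.extend(f"unclosed <{name}>" for name in open_stack)
    return problems


def build_request(text: str, voice: str = DEFAULT_VOICE) -> dict:
    """Assemble a request body; the field names here are hypothetical."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {sorted(VOICES)}")
    problems = check_tagged_text(text)
    if problems:
        raise ValueError("; ".join(problems))
    return {"text": text, "voice": voice}


print(check_tagged_text("Well [laugh] that was <whisper>a secret</whisper>."))  # []
print(check_tagged_text("<emphasis>Never [cough] do that"))
# ['unknown inline tag [cough]', 'unclosed <emphasis>']
```

A stack is used for wrapping tags so that crossed nesting such as <whisper><emphasis>…</whisper></emphasis> is reported rather than silently accepted.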

Key Takeaways

  • xAI has launched two standalone audio APIs — Grok Speech-to-Text (STT) and Text-to-Speech (TTS) — built on the same production stack already serving millions of users across Grok mobile apps, Tesla vehicles, and Starlink customer support.
  • The Grok STT API offers real-time and batch transcription across 25 languages with speaker diarization, word-level timestamps, Inverse Text Normalization, and support for 12 audio formats — priced at $0.10/hour for batch and $0.20/hour for streaming.
  • On phone call entity recognition benchmarks, xAI reports a 5.0% error rate for Grok STT, well ahead of ElevenLabs (12.0%), Deepgram (13.5%), and AssemblyAI (21.3%), and claims particularly strong performance in medical, legal, and financial use cases.
  • The Grok TTS API supports five expressive voices (Ara, Eve, Leo, Rex, Sal) across 20 languages, with inline and wrapping speech tags like [laugh], [sigh], and <whisper> giving developers fine-grained control over vocal delivery — priced at $4.20 per 1 million characters.

The post xAI Launches Standalone Grok Speech-to-Text and Text-to-Speech APIs, Targeting Enterprise Voice Developers appeared first on MarkTechPost.


©2020- TradePoint.io - All rights reserved!
