TradePoint.io

StepFun Releases StepAudio 2.5 Realtime: An End-to-End Voice Model with Roleplay-Specific RLHF and Paralinguistic Comprehension

May 24, 2026
in AI & Technology

StepFun, the Shanghai-based AI lab, has released StepAudio 2.5 Realtime, an end-to-end real-time speech large language model with fully customizable persona capabilities.

Unlike pipeline-based systems that separate speech recognition, reasoning, and synthesis into sequential steps, StepAudio 2.5 Realtime is end-to-end: audio goes in and audio comes out through a single unified system. The model supports Chinese and English.

The model is accessed through a WebSocket API. The endpoint is wss://api.stepfun.com/v1/realtime, and the model string is step-2.5-realtime.
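As a rough illustration of how a client might prepare that connection, the sketch below builds the URL and headers a WebSocket library would need. Only the endpoint and model string come from the article. Passing the model as a `model` query parameter and authenticating with a Bearer token are assumptions made for this example, and `build_connection` is a hypothetical helper, so check StepFun's API documentation before relying on either.

```python
from urllib.parse import urlencode

# Both values below are quoted in the article.
REALTIME_ENDPOINT = "wss://api.stepfun.com/v1/realtime"
MODEL_NAME = "step-2.5-realtime"


def build_connection(api_key: str) -> tuple[str, dict[str, str]]:
    """Return the (url, headers) pair a WebSocket client would connect with.

    ASSUMPTIONS (not confirmed by the article): the model is selected with a
    `model` query parameter and the key is sent as a Bearer token.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    url = f"{REALTIME_ENDPOINT}?{urlencode({'model': MODEL_NAME})}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers


url, headers = build_connection("YOUR_API_KEY")
print(url)
```

The returned pair can be handed to whichever WebSocket client the application already uses. The message format exchanged after connecting is not described in the article and is left out here.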

The Three Technical Pillars

The StepFun research team describes three core technical innovations behind the model:

1. Million-Scale Persona Data Augmentation

Starting from 10,000+ high-quality natively authored personas, StepFun applied algorithmic augmentation to build a million-scale persona feature matrix. This was combined with millions of real-world conversational samples for training. The intent is generalization, specifically stable performance on difficult, long-tail conversational topics.

Instead of manually labeling millions of persona samples, the StepFun team used algorithmic expansion from a curated seed set.
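The article does not say which expansion algorithm StepFun used. One simple way to picture seed-based expansion is combinatorial: cross each hand-written persona with a few attribute axes. The personas, axes, and `augment` function below are all hypothetical and serve only to show how a small seed set multiplies.

```python
from itertools import product

# Hypothetical seed personas; the article reports 10,000+ authored ones.
seed_personas = [
    {"name": "ship captain", "background": "thirty years at sea"},
    {"name": "night-shift nurse", "background": "works in an emergency ward"},
]

# Hypothetical attribute axes used to vary each seed.
speaking_styles = ["terse", "warm", "sarcastic", "formal"]
moods = ["calm", "tired", "excited"]
relationships = ["stranger", "old friend", "mentor"]


def augment(seeds, styles, moods, relationships):
    """Yield one variant per (seed, style, mood, relationship) combination."""
    for seed, style, mood, rel in product(seeds, styles, moods, relationships):
        yield {**seed, "style": style, "mood": mood, "relationship": rel}


variants = list(augment(seed_personas, speaking_styles, moods, relationships))
print(len(variants))  # 2 seeds * 4 styles * 3 moods * 3 relationships = 72
```

At this rate, 10,000 seeds with 100 attribute combinations each would yield 1,000,000 variants, which matches the scale the article describes. A production pipeline would presumably also filter out incoherent combinations.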

2. Roleplay-Specific RLHF Alignment

A known failure mode in conversational AI is “out-of-character” (OOC) behavior, in which a model drifts away from its defined persona mid-conversation. The StepFun team ran dedicated RLHF (Reinforcement Learning from Human Feedback) optimization for persona consistency in roleplay scenarios. In RLHF, human preference signals are used to train a reward model, which then guides the language model's behavior. Applying it specifically to roleplay stability is a targeted design choice.
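To make the reward-model step concrete, here is a minimal sketch of the pairwise (Bradley-Terry) preference loss commonly used to train RLHF reward models. The article does not confirm that StepFun uses this exact objective. The function name and the example reward values are illustrative assumptions.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward_chosen - reward_rejected
    # log1p(exp(-margin)) equals -log(sigmoid(margin)); the branch on the
    # sign of the margin avoids overflow in exp() for large negative margins.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for two replies to the same roleplay prompt.
in_character = 2.0       # reply a human rater preferred
out_of_character = -1.0  # reply that broke persona

print(preference_loss(in_character, out_of_character))  # small: ranking is right
print(preference_loss(out_of_character, in_character))  # large: ranking is wrong
```

Minimizing this loss over many rated pairs pushes the reward model to score in-character replies above out-of-character ones. That reward signal can then steer the policy model during reinforcement learning.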

3. Unified Speech Understanding and Generation

StepAudio 2.5 Realtime inherits the StepAudio 2.5 TTS capabilities and deeply fuses speech understanding and generation through reinforcement learning. This enables what StepFun calls “global scene-level tonal setting” and “intra-sentence detail sculpting.” The model can set an overall emotional register for a response while adjusting finer acoustic details within individual sentences.

Paralinguistic Understanding

A technically distinct capability of this model is paralinguistic perception. Paralinguistics refers to non-verbal acoustic information in speech, such as tone, speaking rate, pauses, sighs, and laughter. By analyzing these elements, the model can perceive the user’s mood and underlying intentions. For example, it can identify fatigue from a low tone or frustration from a rapid speech rate. Capturing these signals requires the model to operate on audio features rather than transcribed text alone.
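The toy example below shows why a transcript alone is not enough: the same words produce different readings once speech rate and pitch are taken into account. It is a hand-written rule set with hypothetical thresholds, not a description of how StepAudio 2.5 Realtime works. An end-to-end model learns such cues from audio rather than from fixed rules.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    transcript: str
    duration_s: float      # length of the audio clip in seconds
    mean_pitch_hz: float   # average fundamental frequency of the speaker


def speaking_rate(u: Utterance) -> float:
    """Words per second, a crude proxy for speech rate."""
    return len(u.transcript.split()) / u.duration_s


def guess_state(u: Utterance, baseline_pitch_hz: float) -> str:
    """Toy rule-based guess; both thresholds are hypothetical."""
    if speaking_rate(u) > 4.0:
        return "possibly frustrated"
    if u.mean_pitch_hz < 0.85 * baseline_pitch_hz:
        return "possibly fatigued"
    return "neutral"


text = "I said I am fine with the plan"  # identical words in both clips
fast = Utterance(text, duration_s=1.5, mean_pitch_hz=200.0)
low = Utterance(text, duration_s=4.0, mean_pitch_hz=150.0)

print(guess_state(fast, baseline_pitch_hz=200.0))  # possibly frustrated
print(guess_state(low, baseline_pitch_hz=200.0))   # possibly fatigued
```

A pipeline that passes only `text` to the language model would treat both clips identically, which is the information loss an audio-native model is meant to avoid.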

StepAudio 2.5 Realtime scored 82.18 on the paralinguistic comprehension benchmark, demonstrating perception of vocal speed, emotion, age, and other acoustic features.

Source: https://stepaudiollm.github.io/step-audio-2.5-realtime/

Benchmark Results

The StepFun research team ran a suite of subjective and objective evaluations, benchmarking StepAudio 2.5 Realtime against leading real-time voice models across five dimensions.

Human evaluation was conducted through real mobile app conversations scored by human raters. The reported scores:

  • Human evaluation (subjective): 80.41
  • General dialogue (objective): 86.36
  • Automotive scenario (objective): 84.80
  • Spoken QA, covering 11 audio understanding tasks (objective): 79.80
  • Paralinguistic comprehension (objective): 82.18

Key Takeaways

  • StepAudio 2.5 Realtime is an end-to-end real-time speech LLM, released by Shanghai-based StepFun.
  • It uses persona-specific RLHF and million-scale data augmentation to maintain character consistency.
  • According to StepFun, the model ranked first across all five benchmark dimensions in tests run in April 2026.
  • Paralinguistic comprehension, meaning the perception of tone, rate, and emotion from audio, is a core technical differentiator.
  • API access is via WebSocket at wss://api.stepfun.com/v1/realtime with model string step-2.5-realtime.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

