Building a Speech Enhancement and Automatic Speech Recognition (ASR) Pipeline in Python Using SpeechBrain

September 10, 2025
in AI & Technology

In this tutorial, we walk through a practical speech workflow using SpeechBrain. We start by generating our own clean speech samples with gTTS, deliberately adding Gaussian noise to simulate noisy recording conditions, and then applying SpeechBrain’s MetricGAN+ model to enhance the audio. Once the audio is denoised, we run automatic speech recognition with a CRDNN encoder-decoder model that uses an RNN language model during decoding, and we compare word error rates (WER) before and after enhancement. Working step by step shows how SpeechBrain's pretrained models let us assemble a complete enhancement-plus-recognition pipeline with relatively little code. Check out the FULL CODES here.

!pip -q install -U speechbrain gTTS jiwer pydub librosa soundfile torchaudio
!apt -qq install -y ffmpeg >/dev/null


import os, time, warnings
warnings.filterwarnings("ignore")
import torch, torchaudio, numpy as np, librosa, soundfile as sf
from gtts import gTTS
from pydub import AudioSegment
from jiwer import wer
from pathlib import Path
from dataclasses import dataclass
from typing import List, Tuple
from IPython.display import Audio, display
# SpeechBrain >= 1.0 exposes its pretrained interfaces under speechbrain.inference;
# the older speechbrain.pretrained path is deprecated.
from speechbrain.inference.ASR import EncoderDecoderASR
from speechbrain.inference.enhancement import SpectralMaskEnhancement


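root = Path("sb_demo")
root.mkdir(exist_ok=True)
sr = 16000  # both pretrained models used below operate on 16 kHz audio
device = "cuda" if torch.cuda.is_available() else "cpu"
np.random.seed(0)  # make the synthetic noise reproducible across runs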

We begin by setting up our Colab environment. We install SpeechBrain together with gTTS, jiwer, pydub, librosa, soundfile, torchaudio, and ffmpeg, then define a working directory, a 16 kHz sample rate, and the compute device (CUDA when a GPU is available, otherwise CPU).
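SpeechBrain 1.0 reorganized its pretrained interfaces (speechbrain.pretrained became speechbrain.inference), so it can help to print the installed package versions before loading any models. The sketch below uses only the standard library's importlib.metadata; the helper name installed_versions is hypothetical, and the package list is an assumption that mirrors the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version string, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names mirror the pip command above (an assumption; edit as needed).
for pkg, ver in installed_versions(
    ["speechbrain", "torch", "torchaudio", "gTTS", "jiwer", "librosa", "soundfile"]
).items():
    print(f"{pkg:12s} {ver or 'not installed'}")
```

If speechbrain reports a version below 1.0, the speechbrain.inference imports will not exist, which is exactly the kind of mismatch this check surfaces early.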


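def tts_to_wav(text: str, out_wav: str, lang="en"):
   # gTTS only produces MP3, so synthesize to a temporary MP3 and convert it
   # to a mono WAV at the pipeline sample rate (pydub decodes via ffmpeg).
   mp3 = str(Path(out_wav).with_suffix(".mp3"))
   gTTS(text=text, lang=lang).save(mp3)
   a = AudioSegment.from_file(mp3, format="mp3").set_channels(1).set_frame_rate(sr)
   a.export(out_wav, format="wav")
   os.remove(mp3)


def add_noise(in_wav: str, snr_db: float, out_wav: str):
   # Load the clean clip and measure its RMS level.
   y, _ = librosa.load(in_wav, sr=sr, mono=True)
   rms = np.sqrt(np.mean(y**2) + 1e-12)
   # White Gaussian noise, normalized to unit RMS ...
   n = np.random.normal(0, 1, len(y))
   n = n / (np.sqrt(np.mean(n**2) + 1e-12))
   # ... then scaled so that 20*log10(rms / noise_rms) equals snr_db.
   target_n_rms = rms / (10**(snr_db/20))
   y_noisy = np.clip(y + n * target_n_rms, -1.0, 1.0)  # keep samples in [-1, 1]
   sf.write(out_wav, y_noisy, sr)


def play(title, path):
   print(f"▶ {title}: {path}")
   display(Audio(path, rate=sr))


def clean_txt(s: str) -> str:
   # Lowercase, turn punctuation into spaces, and collapse whitespace so that
   # WER compares words only.
   return " ".join("".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in s).split())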


@dataclass
class Sample:
   text: str
   clean_wav: str
   noisy_wav: str
   enhanced_wav: str

We define small utilities that power the pipeline end to end. We synthesize speech with gTTS and convert it to a 16 kHz mono WAV, add white Gaussian noise scaled to a target signal-to-noise ratio (SNR) in dB, and add helpers to preview audio and to normalize text (lowercase, no punctuation) before scoring. A Sample dataclass tracks each utterance's reference text and its clean, noisy, and enhanced file paths.
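To confirm that the noise scaling in add_noise really hits the requested SNR, we can measure it back as SNR = 20·log10(rms_signal / rms_noise). The sketch below reimplements the same scaling on an in-memory 440 Hz sine tone (a stand-in for speech, chosen so it runs without audio files) with a seeded np.random.default_rng generator; the helper names mix_at_snr and measured_snr_db are hypothetical, and the clipping step from add_noise is deliberately left out so the measurement is exact.

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    return float(np.sqrt(np.mean(x ** 2) + 1e-12))


def mix_at_snr(clean: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add Gaussian noise scaled like add_noise (without the final clipping)."""
    noise = rng.standard_normal(len(clean))
    noise = noise / rms(noise)                         # unit RMS
    noise = noise * rms(clean) / 10 ** (snr_db / 20)   # scale to the target SNR
    return clean + noise


def measured_snr_db(clean: np.ndarray, noisy: np.ndarray) -> float:
    # The noise component is whatever the mixture adds on top of the clean signal.
    return 20 * np.log10(rms(clean) / rms(noisy - clean))


demo_sr = 16000
t = np.arange(demo_sr) / demo_sr                 # one second of sample times
tone = 0.5 * np.sin(2 * np.pi * 440 * t)         # stand-in for a speech clip
rng = np.random.default_rng(0)
for target in (3.0, 0.0):
    noisy = mix_at_snr(tone, target, rng)
    print(f"target {target:.1f} dB -> measured {measured_snr_db(tone, noisy):.2f} dB")
```

With clipping enabled, as in add_noise, the achieved SNR can drift slightly from the target whenever the mixture exceeds [-1, 1], which is more likely at 0 dB.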

sentences = [
   "Artificial intelligence is transforming everyday life.",
   "Open source tools enable rapid research and innovation.",
   "SpeechBrain brings flexible speech pipelines to Python."
]
samples: List[Sample] = []
print("🗣️ Synthesizing short utterances with gTTS...")
for i, s in enumerate(sentences, 1):
   cw = str(root/f"clean_{i}.wav")
   nw = str(root/f"noisy_{i}.wav")
   ew = str(root/f"enhanced_{i}.wav")
   tts_to_wav(s, cw)
   # Utterances 1 and 3 get 3 dB SNR; utterance 2 gets 0 dB (noise as loud as speech).
   add_noise(cw, snr_db=3.0 if i % 2 else 0.0, out_wav=nw)
   samples.append(Sample(text=s, clean_wav=cw, noisy_wav=nw, enhanced_wav=ew))


play("Clean #1", samples[0].clean_wav)
play("Noisy #1", samples[0].noisy_wav)


print("⬇️ Loading pretrained models (this downloads once) ...")
asr = EncoderDecoderASR.from_hparams(
   source="speechbrain/asr-crdnn-rnnlm-librispeech",
   run_opts={"device": device},
   savedir=str(root/"pretrained_asr"),
)
enhancer = SpectralMaskEnhancement.from_hparams(
   source="speechbrain/metricgan-plus-voicebank",
   run_opts={"device": device},
   savedir=str(root/"pretrained_enh"),
)

In this step, we synthesize three sentences with gTTS, save a clean and a noisy version of each (3 dB SNR for utterances 1 and 3, 0 dB for utterance 2), and collect them into Sample objects. We then load SpeechBrain's pretrained CRDNN ASR model (trained on LibriSpeech) and the MetricGAN+ enhancement model (trained on VoiceBank-DEMAND), which together let us turn noisy audio into enhanced audio and then into text.

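def enhance_file(in_wav: str, out_wav: str):
   # MetricGAN+ predicts a spectral mask; enhance_file returns the enhanced waveform.
   sig = enhancer.enhance_file(in_wav)
   if sig.dim() == 1:
      sig = sig.unsqueeze(0)  # torchaudio.save expects (channels, time)
   torchaudio.save(out_wav, sig.detach().cpu(), sr)


def transcribe(path: str) -> str:
   # The LibriSpeech model outputs uppercase text; normalize it before scoring.
   hyp = asr.transcribe_file(path)
   return clean_txt(hyp)


def eval_pair(ref_text: str, wav_path: str) -> Tuple[str, float]:
   # Return the normalized hypothesis and its WER against the normalized reference.
   hyp = transcribe(wav_path)
   return hyp, wer(clean_txt(ref_text), hyp)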


print("\n🔬 Transcribing noisy vs enhanced (MetricGAN+)...")
rows = []
t0 = time.time()
for smp in samples:
   enhance_file(smp.noisy_wav, smp.enhanced_wav)
   hyp_noisy,  wer_noisy  = eval_pair(smp.text, smp.noisy_wav)
   hyp_enh,    wer_enh    = eval_pair(smp.text, smp.enhanced_wav)
   rows.append((smp.text, hyp_noisy, wer_noisy, hyp_enh, wer_enh))
t1 = time.time()

We create helper functions to enhance noisy audio, transcribe speech, and compute WER against the reference text. We then run them over all samples, comparing each noisy file with its enhanced version, and record both transcriptions and error rates. The timer covers the whole loop: one enhancement pass and two ASR passes per utterance.
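WER is the minimum number of word substitutions (S), deletions (D), and insertions (I) needed to align the hypothesis with the reference, divided by the number of reference words N: WER = (S + D + I) / N. Because insertions count, WER can exceed 1. jiwer computes this for us; the pure-Python function below is a hypothetical word_error_rate helper (not jiwer's implementation) that shows the dynamic-programming recurrence on whitespace-split words. The example also shows why text normalization matters: if a recognizer were to write SpeechBrain as two words, that alone would cost two edits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(S + D + I) / N computed as a Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous DP row: prev[j] = edit distance from ref[:i-1] to hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(
                prev[j] + 1,             # deletion: reference word not in hypothesis
                cur[j - 1] + 1,          # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = cur
    return prev[-1] / len(ref)


demo_ref = "speechbrain brings flexible speech pipelines to python"
# 2 edits / 7 reference words, about 0.286
print(word_error_rate(demo_ref, "speech brain brings flexible speech pipelines to python"))
```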

def fmt(x): return f"{x:.3f}" if isinstance(x, float) else x
print(f"\n⏱️ Inference time: {t1 - t0:.2f}s on {device.upper()}")
print("\n# ---- Results (Noisy → Enhanced) ----")
for i, (ref, hN, wN, hE, wE) in enumerate(rows, 1):
   print(f"\nUtterance {i}")
   print("Ref:      ", ref)
   print("Noisy ASR:", hN)
   print("WER noisy:", fmt(wN))
   print("Enh ASR:  ", hE)
   print("WER enh:  ", fmt(wE))


print("\n🧵 Batch decoding (looping API):")
batch_files = [s.clean_wav for s in samples] + [s.noisy_wav for s in samples]
bt0 = time.time()
batch_hyps = [transcribe(p) for p in batch_files]
bt1 = time.time()
for p, h in zip(batch_files, batch_hyps):
   print(os.path.basename(p), "->", h[:80] + ("..." if len(h) > 80 else ""))
print(f"⏱️ Batch elapsed: {bt1 - bt0:.2f}s")


play("Enhanced #1 (MetricGAN+)", samples[0].enhanced_wav)


avg_wn = sum(wN for _,_,wN,_,_ in rows) / len(rows)
avg_we = sum(wE for _,_,_,_,wE in rows) / len(rows)
print("\n📈 Summary:")
print(f"Avg WER (Noisy):     {avg_wn:.3f}")
print(f"Avg WER (Enhanced):  {avg_we:.3f}")
print("Tip: Try different SNRs or longer texts; the device is selected automatically, so enable a GPU runtime for faster inference.")

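We summarize the experiment by reporting the inference time, printing per-utterance transcriptions, and contrasting WER before and after enhancement. We also decode all clean and noisy files in a simple loop (one file at a time; SpeechBrain's transcribe_batch method is available when true padded batches are needed), listen to an enhanced sample, and report average WERs so we can see whether MetricGAN+ actually lowers the error rate on these samples.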
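The averages above weight every utterance equally. A common alternative is corpus-level WER, which divides the total number of word errors by the total number of reference words, so longer utterances count more. Since each row stores a per-utterance WER, its error count can be recovered as WER × N. The helper below is a hypothetical sketch that takes (reference word count, WER) pairs and returns both figures; the input numbers in the demo are made up for illustration.

```python
from typing import Iterable, Tuple


def mean_and_corpus_wer(pairs: Iterable[Tuple[int, float]]) -> Tuple[float, float]:
    """pairs: (number of reference words, per-utterance WER).

    Returns (unweighted mean WER, corpus WER = total errors / total reference words).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one utterance")
    mean_wer = sum(w for _, w in pairs) / len(pairs)
    total_errors = sum(n * w for n, w in pairs)  # WER * N recovers the error count
    total_words = sum(n for n, _ in pairs)
    return mean_wer, total_errors / total_words


# Hypothetical numbers: a 2-word utterance with 1 error and a 10-word one with none.
# mean 0.25 vs corpus 1/12, about 0.083
print(mean_and_corpus_wer([(2, 0.5), (10, 0.0)]))
```

In the notebook, the noisy-side pairs could be built as ((len(clean_txt(ref).split()), wN) for ref, _, wN, _, _ in rows), which counts reference words after the same normalization that eval_pair applies.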

In conclusion, SpeechBrain makes it straightforward to chain speech enhancement and ASR into a single pipeline. By generating audio, corrupting it with noise, enhancing it, and transcribing it, we gain hands-on insight into how an enhancement front end affects recognition accuracy in noisy conditions. With only three synthetic utterances and white Gaussian noise, the WER figures are an illustration rather than a benchmark, so any improvement should be confirmed on real recordings. We conclude with a working framework that can be extended to larger datasets, different enhancement models, or custom ASR tasks.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.

Credit: Source link
