
NVIDIA cuTile Python Tutorial: Building Tiled GPU Kernels for Vector Addition, Matrix Addition, and Matrix Multiplication in Colab

June 9, 2026
in AI & Technology


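import math

import torch

# Earlier steps of this tutorial are assumed to import cuda.tile as ct and to
# define the cutile_import_ok and likely_runtime_ok flags used below.
print("\n" + "=" * 90)
print("[5] cuTile kernels are defined only if cuda.tile imports successfully")
print("=" * 90)
if cutile_import_ok:
    # Tile sizes are passed to the kernels as compile-time constants.
    ConstInt = ct.Constant[int]

    @ct.kernel
    def cutile_vec_add_direct_kernel(a, b, c, TILE: ConstInt):
        # Block bid loads, adds, and stores the bid-th TILE-sized chunk.
        bid = ct.bid(0)
        a_tile = ct.load(a, index=(bid,), shape=(TILE,))
        b_tile = ct.load(b, index=(bid,), shape=(TILE,))
        c_tile = a_tile + b_tile
        ct.store(c, index=(bid,), tile=c_tile)

    @ct.kernel
    def cutile_vec_add_gather_kernel(a, b, c, TILE: ConstInt):
        # Same sum, but each block computes explicit element offsets and
        # uses gather/scatter instead of tile-indexed load/store.
        bid = ct.bid(0)
        offsets = bid * TILE + ct.arange(TILE, dtype=torch.int32)
        a_tile = ct.gather(a, offsets)
        b_tile = ct.gather(b, offsets)
        c_tile = a_tile + b_tile
        ct.scatter(c, offsets, c_tile)

    @ct.kernel
    def cutile_matrix_add_gather_kernel(a, b, c, TILE_M: ConstInt, TILE_N: ConstInt):
        # 2D grid: block (bid_m, bid_n) handles one TILE_M x TILE_N window.
        bid_m = ct.bid(0)
        bid_n = ct.bid(1)
        rows = bid_m * TILE_M + ct.arange(TILE_M, dtype=torch.int32)
        cols = bid_n * TILE_N + ct.arange(TILE_N, dtype=torch.int32)
        # A column of row indices and a row of column indices broadcast
        # to a (TILE_M, TILE_N) grid of coordinates.
        rows = rows[:, None]
        cols = cols[None, :]
        a_tile = ct.gather(a, (rows, cols))
        b_tile = ct.gather(b, (rows, cols))
        c_tile = a_tile + b_tile
        ct.scatter(c, (rows, cols), c_tile)

    @ct.kernel
    def cutile_matmul_kernel(A, B, C, TM: ConstInt, TN: ConstInt, TK: ConstInt):
        # Block (bid_m, bid_n) computes one TM x TN tile of C = A @ B.
        bid_m = ct.bid(0)
        bid_n = ct.bid(1)
        # Number of TK-wide tiles along the shared K dimension.
        num_tiles_k = ct.num_tiles(A, axis=1, shape=(TM, TK))
        # Accumulate in float32 whatever the input dtype.
        acc = ct.full((TM, TN), 0, dtype=ct.float32)
        # Elements of partial edge tiles that fall outside A or B load as zero.
        zero_pad = ct.PaddingMode.ZERO
        # float32 inputs are cast to TF32 for ct.mma; other dtypes are used as-is.
        compute_dtype = ct.tfloat32 if A.dtype == ct.float32 else A.dtype
        for k in range(num_tiles_k):
            a_tile = ct.load(
                A,
                index=(bid_m, k),
                shape=(TM, TK),
                padding_mode=zero_pad,
            ).astype(compute_dtype)
            b_tile = ct.load(
                B,
                index=(k, bid_n),
                shape=(TK, TN),
                padding_mode=zero_pad,
            ).astype(compute_dtype)
            acc = ct.mma(a_tile, b_tile, acc)
        # Convert the float32 accumulator to C's dtype before storing.
        out = ct.astype(acc, C.dtype)
        ct.store(C, index=(bid_m, bid_n), tile=out)
else:
    print("Skipping cuTile kernel definitions because cuda.tile is unavailable.")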
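To see what the gather-based vector kernel computes without a GPU, its per-block logic can be emulated on the host. The sketch below is a NumPy stand-in, not cuTile code: the function name `emulate_vec_add_gather` is hypothetical, each loop iteration plays the role of one block, and it assumes that lanes whose offset runs past the end of the array read as zero and are not written back, which is how we model the bounds handling of `ct.gather` and `ct.scatter` here.

```python
import math

import numpy as np


def emulate_vec_add_gather(a: np.ndarray, b: np.ndarray, tile: int = 256) -> np.ndarray:
    """Host-side emulation (hypothetical helper) of a 1D gather/scatter tiled add.

    Each loop iteration plays the role of one block, with `bid` standing in
    for ct.bid(0). Offsets past the end are masked: those lanes are treated
    as zero on load and their writes are skipped (assumed bounds behavior).
    """
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("a and b must be 1D arrays of the same shape")
    n = a.size
    c = np.empty_like(a)
    num_blocks = math.ceil(n / tile)  # same rounding-up the wrapper uses for its grid
    for bid in range(num_blocks):
        offsets = bid * tile + np.arange(tile, dtype=np.int32)
        in_bounds = offsets < n
        safe = np.where(in_bounds, offsets, 0)  # keep NumPy indexing in range
        a_tile = np.where(in_bounds, a[safe], 0)  # masked "gather"
        b_tile = np.where(in_bounds, b[safe], 0)
        c_tile = a_tile + b_tile
        c[offsets[in_bounds]] = c_tile[in_bounds]  # masked "scatter"
    return c


x = np.arange(1000, dtype=np.float32)
y = np.full(1000, 2.0, dtype=np.float32)
print(np.array_equal(emulate_vec_add_gather(x, y), x + y))
```

With 1000 elements and 256-wide tiles, four blocks run and the last one has 24 live lanes; the mask is what keeps that partial tile from touching memory past the end.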
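print("\n" + "=" * 90)
print("[6] High-level wrappers")
print("=" * 90)


def vec_add_tutorial(a, b, use_gather=True):
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    # Take the cuTile path only for non-empty CUDA tensors; math.log2(0) would fail below.
    if likely_runtime_ok and a.is_cuda and a.numel() > 0:
        c = torch.empty_like(a)
        # Gather path: fixed 256-element tiles. Direct path: the input size
        # rounded up to a power of two, capped at 1024.
        TILE = 256 if use_gather else min(1024, 2 ** math.ceil(math.log2(a.numel())))
        # One block per tile, rounding up so the last partial tile is covered.
        grid = (math.ceil(a.numel() / TILE), 1, 1)
        kernel = cutile_vec_add_gather_kernel if use_gather else cutile_vec_add_direct_kernel
        ct.launch(torch.cuda.current_stream(), grid, kernel, (a, b, c, TILE))
        return c
    # PyTorch fallback: CPU tensors, empty inputs, or no usable cuTile runtime.
    return a + b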
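The launch geometry in `vec_add_tutorial` is plain arithmetic and can be checked on the CPU. The hypothetical helper below reproduces the wrapper's tile and grid formulas; rounding the direct-path tile up to a power of two reflects our assumption that cuTile tile dimensions must be powers of two. It also makes the empty-input edge case visible: `math.log2(0)` raises `ValueError`.

```python
import math


def vec_add_launch_geometry(n: int, use_gather: bool = True):
    """Return (TILE, grid) using the same formulas as vec_add_tutorial.

    Hypothetical helper for illustration; n is the number of elements.
    """
    if n <= 0:
        # math.log2(0) raises ValueError, so empty inputs need a separate path.
        raise ValueError("n must be positive")
    tile = 256 if use_gather else min(1024, 2 ** math.ceil(math.log2(n)))
    grid = (math.ceil(n / tile), 1, 1)
    return tile, grid


for n in (100, 1000, 5000):
    print(n, vec_add_launch_geometry(n, use_gather=False), vec_add_launch_geometry(n))
```

For n = 5000, the direct path picks TILE = 1024 and launches 5 blocks, while the gather path keeps TILE = 256 and launches 20.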

def matrix_add_tutorial(a, b):
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    if likely_runtime_ok and a.is_cuda:
        c = torch.empty_like(a)
        TILE_M = 16
        TILE_N = 64
        # 2D grid with enough blocks to cover every row and column of a.
        grid = (math.ceil(a.shape[0] / TILE_M), math.ceil(a.shape[1] / TILE_N), 1)
        ct.launch(
            torch.cuda.current_stream(),
            grid,
            cutile_matrix_add_gather_kernel,
            (a, b, c, TILE_M, TILE_N),
        )
        return c
    return a + b
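The 2D kernel behind `matrix_add_tutorial` builds each tile's coordinates by broadcasting a column of row indices (`rows[:, None]`) against a row of column indices (`cols[None, :]`). The NumPy sketch below (hypothetical name `emulate_matrix_add_gather`, default tiles matching the wrapper's 16 x 64) reproduces that indexing on the host and, as an assumption about bounds handling, simply skips coordinates past the matrix edge.

```python
import math

import numpy as np


def emulate_matrix_add_gather(a: np.ndarray, b: np.ndarray,
                              tile_m: int = 16, tile_n: int = 64) -> np.ndarray:
    """Host-side emulation (hypothetical helper) of the 2D gather/scatter add.

    Block (bid_m, bid_n) covers a tile_m x tile_n window. A column of row
    indices broadcast against a row of column indices gives every
    (row, col) coordinate in that window.
    """
    if a.shape != b.shape or a.ndim != 2:
        raise ValueError("a and b must be 2D arrays of the same shape")
    m, n = a.shape
    c = np.empty_like(a)
    for bid_m in range(math.ceil(m / tile_m)):
        for bid_n in range(math.ceil(n / tile_n)):
            rows = bid_m * tile_m + np.arange(tile_m)[:, None]  # shape (tile_m, 1)
            cols = bid_n * tile_n + np.arange(tile_n)[None, :]  # shape (1, tile_n)
            rr, cc = np.broadcast_arrays(rows, cols)  # both (tile_m, tile_n)
            mask = (rr < m) & (cc < n)  # drop coordinates past the matrix edge
            r, q = rr[mask], cc[mask]
            c[r, q] = a[r, q] + b[r, q]
    return c


a = np.arange(50 * 130, dtype=np.float32).reshape(50, 130)
print(np.array_equal(emulate_matrix_add_gather(a, np.ones_like(a)), a + 1))
```

A 50 x 130 matrix exercises partial tiles in both directions: the last row block holds 2 rows and the last column block 2 columns.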

def matmul_tutorial(A, B):
    if A.shape[1] != B.shape[0]:
        raise ValueError("A.shape[1] must equal B.shape[0]")
    if likely_runtime_ok and A.is_cuda:
        # Larger tiles for half-precision inputs, smaller ones otherwise.
        if A.dtype in (torch.float16, torch.bfloat16):
            TM, TN, TK = 128, 128, 64
        else:
            TM, TN, TK = 32, 32, 32
        C = torch.empty((A.shape[0], B.shape[1]), device=A.device, dtype=A.dtype)
        # One block per TM x TN output tile.
        grid = (math.ceil(A.shape[0] / TM), math.ceil(B.shape[1] / TN), 1)
        ct.launch(
            torch.cuda.current_stream(),
            grid,
            cutile_matmul_kernel,
            (A, B, C, TM, TN, TK),
        )
        return C
    return A @ B
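The structure of `cutile_matmul_kernel` (one output tile per block, a loop over `ct.num_tiles` tiles along K, zero padding for partial tiles, and a float32 accumulator) can be mirrored with NumPy slices. In this sketch `emulate_tiled_matmul` is a hypothetical name, `@` on each tile pair stands in for `ct.mma`, padding the operands up front replaces `PaddingMode.ZERO`, cropping the result assumes `ct.store` ignores out-of-bounds elements, and the TF32 cast applied to float32 inputs is deliberately left out.

```python
import math

import numpy as np


def emulate_tiled_matmul(A: np.ndarray, B: np.ndarray,
                         tm: int = 32, tn: int = 32, tk: int = 32) -> np.ndarray:
    """Host-side emulation (hypothetical helper) of the tiled matmul loop.

    Each (bid_m, bid_n) block owns one tm x tn tile of C and walks the shared
    K dimension in tk-wide steps. Zero-padded edge tiles add nothing to the
    float32 accumulator.
    """
    M, K = A.shape
    if B.shape[0] != K:
        raise ValueError("A.shape[1] must equal B.shape[0]")
    N = B.shape[1]
    # Pad every dimension up to a whole number of tiles (stands in for PaddingMode.ZERO).
    Mp, Np, Kp = math.ceil(M / tm) * tm, math.ceil(N / tn) * tn, math.ceil(K / tk) * tk
    Ap = np.zeros((Mp, Kp), dtype=np.float32)
    Ap[:M, :K] = A
    Bp = np.zeros((Kp, Np), dtype=np.float32)
    Bp[:K, :N] = B
    Cp = np.empty((Mp, Np), dtype=np.float32)
    num_tiles_k = Kp // tk  # plays the role of ct.num_tiles(A, axis=1, ...)
    for bid_m in range(Mp // tm):
        for bid_n in range(Np // tn):
            acc = np.zeros((tm, tn), dtype=np.float32)
            for k in range(num_tiles_k):
                a_tile = Ap[bid_m * tm:(bid_m + 1) * tm, k * tk:(k + 1) * tk]
                b_tile = Bp[k * tk:(k + 1) * tk, bid_n * tn:(bid_n + 1) * tn]
                acc += a_tile @ b_tile  # stands in for ct.mma(a_tile, b_tile, acc)
            Cp[bid_m * tm:(bid_m + 1) * tm, bid_n * tn:(bid_n + 1) * tn] = acc
    # Cropping the padded result plays the role of a bounds-aware ct.store (assumed).
    return Cp[:M, :N].astype(A.dtype)


rng = np.random.default_rng(0)
A = rng.standard_normal((70, 45)).astype(np.float32)
B = rng.standard_normal((45, 33)).astype(np.float32)
print(np.abs(emulate_tiled_matmul(A, B) - A @ B).max())
```

Shapes that are not multiples of the tile sizes, such as 70 x 45 times 45 x 33, are the interesting case: they only match `A @ B` if the zero padding keeps the edge tiles correct.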
print("Wrappers ready.")
print(f"Execution backend: {'cuTile' if likely_runtime_ok else 'PyTorch fallback'}")
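Because `cutile_matmul_kernel` casts float32 tiles to `ct.tfloat32` before `ct.mma`, float32 results from the cuTile path should be compared with the PyTorch fallback using a looser tolerance than pure FP32 arithmetic would need. The sketch below approximates that effect on the host: the helper `round_to_tf32` is hypothetical, keeps 10 explicit mantissa bits with round-to-nearest (ties away from zero), and the exact rounding the GPU applies is an assumption.

```python
import numpy as np


def round_to_tf32(x) -> np.ndarray:
    """Approximate float32 -> TF32 rounding on the host (hypothetical helper).

    TF32 keeps float32's 8-bit exponent but only the top 10 of its 23 mantissa
    bits. Adding half of the dropped range (0x1000) before clearing the low 13
    bits rounds to nearest, ties away from zero. Assumes finite array inputs.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    rounded = (bits + np.uint32(0x1000)) & np.uint32(0xFFFFE000)
    return rounded.view(np.float32)


rng = np.random.default_rng(0)
A = rng.standard_normal((64, 256)).astype(np.float32)
B = rng.standard_normal((256, 64)).astype(np.float32)
exact = A.astype(np.float64) @ B.astype(np.float64)
fp32_err = np.abs(A @ B - exact).max()
A_tf32 = round_to_tf32(A).astype(np.float64)
B_tf32 = round_to_tf32(B).astype(np.float64)
tf32_err = np.abs(A_tf32 @ B_tf32 - exact).max()
print(f"max |error|  fp32: {fp32_err:.2e}  tf32-rounded inputs: {tf32_err:.2e}")
```

Each rounded input is within 2^-11 of its true value in relative terms, so each product is off by at most about 1e-3 of its magnitude. A check of the form `abs(C - ref) <= 1e-3 * (abs(A) @ abs(B))` therefore holds for TF32-rounded inputs, while an FP32-level tolerance would not.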

Credit: Source link


©2020- TradePoint.io - All rights reserved!

