TradePoint.io
How to Fine-Tune LFM2 Using QLoRA and DPO: A Complete Step-by-Step Coding Tutorial on Google Colab

June 3, 2026

In this tutorial, we fine-tune Liquid AI’s LFM2 model through a complete open-source workflow. We load the base LFM2 checkpoint in 4-bit precision for QLoRA, prepare a chat-style supervised fine-tuning (SFT) dataset, train a lightweight LoRA adapter using TRL and PEFT, and then merge the adapter back into the model. We also extend the workflow with DPO to show how chosen and rejected answers can steer the model toward preferred responses. At the end, we have a practical pipeline that moves from a base LFM2 model to an SFT-tuned, preference-aligned checkpoint, ready for further testing or deployment.

!pip install -q -U "transformers>=4.55" "trl>=0.12" "peft>=0.13" "datasets>=2.20" "accelerate>=0.34" bitsandbytes


import torch, gc
from datasets import load_dataset, Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer


MODEL_ID    = "LiquidAI/LFM2-1.2B"
USE_4BIT    = True
RUN_DPO     = True
SFT_SAMPLES = 500
SFT_STEPS   = 60
DPO_STEPS   = 40
MAX_LEN     = 1024


BF16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
DTYPE = torch.bfloat16 if BF16 else torch.float16
assert torch.cuda.is_available(), "No GPU detected — set Runtime > Change runtime type > GPU"
print(f"GPU: {torch.cuda.get_device_name(0)} | dtype={DTYPE} | 4bit={USE_4BIT}")

We install all the required libraries for fine-tuning LFM2 inside Google Colab. We import the core tools from Transformers, TRL, PEFT, datasets, and PyTorch, with bitsandbytes configured through Transformers. We also define the main training settings, check that a GPU is available, and select bf16 when the GPU supports it, falling back to fp16 otherwise.
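
As a minimal sketch of that precision choice, the logic can be isolated in a pure function so it can be checked without a GPU. The function name `pick_precision` and its dictionary layout are our own illustration, not part of the tutorial's code; the two boolean inputs stand in for `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`.

```python
def pick_precision(cuda_available: bool, bf16_supported: bool) -> dict:
    """Mirror the tutorial's dtype selection with plain booleans (hypothetical helper)."""
    if not cuda_available:
        # The tutorial asserts on this condition before doing any work.
        raise RuntimeError("No GPU detected")
    use_bf16 = bf16_supported
    return {
        "dtype": "bfloat16" if use_bf16 else "float16",
        "bf16": use_bf16,        # value passed as bf16=... in the trainer configs
        "fp16": not use_bf16,    # value passed as fp16=... in the trainer configs
    }


print(pick_precision(cuda_available=True, bf16_supported=True))
print(pick_precision(cuda_available=True, bf16_supported=False))
```

Exactly one of the `bf16` and `fp16` flags is true in either case, which is the same pairing the trainer configurations use later via `bf16=BF16, fp16=not BF16`.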

def load_base(four_bit: bool):
   quant_cfg = None
   if four_bit:
       quant_cfg = BitsAndBytesConfig(
           load_in_4bit=True,
           bnb_4bit_quant_type="nf4",
           bnb_4bit_use_double_quant=True,
           bnb_4bit_compute_dtype=DTYPE,
       )
   model = AutoModelForCausalLM.from_pretrained(
       MODEL_ID,
       device_map="auto",
       dtype=DTYPE,
       quantization_config=quant_cfg,
   )
   model.config.use_cache = False
   return model


tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
   tokenizer.pad_token = tokenizer.eos_token


model = load_base(USE_4BIT)


@torch.no_grad()
def chat(m, user_msg, system=None, max_new_tokens=200):
   msgs = ([{"role": "system", "content": system}] if system else []) + \
          [{"role": "user", "content": user_msg}]
   inputs = tokenizer.apply_chat_template(
       msgs,
       add_generation_prompt=True,
       return_tensors="pt",
       tokenize=True,
       return_dict=True,
   ).to(m.device)
   m.config.use_cache = True
   out = m.generate(
       **inputs,
       max_new_tokens=max_new_tokens, do_sample=True,
       temperature=0.3, min_p=0.15, repetition_penalty=1.05,
       pad_token_id=tokenizer.pad_token_id,
   )
   m.config.use_cache = False
   prompt_len = inputs["input_ids"].shape[-1]
   return tokenizer.decode(out[0, prompt_len:], skip_special_tokens=True)


PROBE = "Explain what makes the LFM2 architecture good for on-device AI, in 2 sentences."
print("\n=== BASELINE (before fine-tuning) ===\n", chat(model, PROBE))

We load the LFM2 base model with optional 4-bit quantization to reduce GPU memory usage. We prepare the tokenizer, set the padding token, and define a chat function for testing model responses. We then run a baseline prompt to compare the model’s behavior before and after fine-tuning.
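
To get a rough sense of the memory saving, here is a back-of-the-envelope estimate of weight storage only. It rests on several assumptions that are ours, not the tutorial's: the parameter count is taken as exactly 1.2 billion from the model name, every parameter is treated as quantized (in practice embeddings and norm layers typically stay in higher precision), and the block sizes of 64 and 256 follow the scheme described for NF4 with double quantization. Activations, optimizer state, and the LoRA adapter are not counted.

```python
def nf4_bits_per_param(block_size=64, double_quant=True, dq_block_size=256):
    """Approximate storage cost of one NF4-quantized weight, in bits (assumed block sizes)."""
    if double_quant:
        # One 8-bit scale per block, plus one 32-bit scale per group of dq_block_size blocks.
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # One 32-bit scale per block.
        overhead = 32 / block_size
    return 4 + overhead


def weight_bytes(num_params, bits_per_param):
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_param / 8


NUM_PARAMS = 1_200_000_000  # assumption: "1.2B" read literally from the model name

bf16_gib = weight_bytes(NUM_PARAMS, 16) / 2**30
nf4_gib = weight_bytes(NUM_PARAMS, nf4_bits_per_param()) / 2**30
print(f"16-bit weights: {bf16_gib:.2f} GiB")
print(f"NF4 weights (double quant): {nf4_gib:.2f} GiB")
```

Under these assumptions the quantized weights take roughly a quarter of the 16-bit footprint, which is why the 4-bit path leaves more headroom on a Colab GPU.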

sft_ds = load_dataset("HuggingFaceTB/smoltalk", "all", split=f"train[:{SFT_SAMPLES}]")
sft_ds = sft_ds.select_columns(["messages"])
print("\nSFT example messages:", sft_ds[0]["messages"][:2])


lora_sft = LoraConfig(
   r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
   task_type="CAUSAL_LM", target_modules="all-linear",
)


sft_cfg = SFTConfig(
   output_dir="outputs/sft/lfm2_demo",
   max_length=MAX_LEN,
   per_device_train_batch_size=2,
   gradient_accumulation_steps=4,
   learning_rate=2e-5,
   warmup_ratio=0.03,
   lr_scheduler_type="cosine",
   max_steps=SFT_STEPS,
   logging_steps=10,
   save_strategy="no",
   gradient_checkpointing=True,
   gradient_checkpointing_kwargs={"use_reentrant": False},
   bf16=BF16, fp16=not BF16,
   optim="paged_adamw_8bit" if USE_4BIT else "adamw_torch",
   packing=False,
   report_to="none",
)


sft_trainer = SFTTrainer(
   model=model,
   args=sft_cfg,
   train_dataset=sft_ds,
   peft_config=lora_sft,
   processing_class=tokenizer,
)
sft_trainer.train()
sft_trainer.save_model("outputs/sft/lfm2_adapter")
print("\n=== AFTER SFT ===\n", chat(sft_trainer.model, PROBE))

We load a chat-formatted supervised fine-tuning dataset and keep only the messages column. We configure a rank-16 LoRA adapter on all linear layers and define the SFT training settings. We then train the model with SFT, save the LoRA adapter, and run the same probe prompt against the fine-tuned model.
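
Two quick calculations help put these settings in perspective. The sketch below counts the trainable parameters LoRA adds to a single linear layer and the number of training examples the run consumes. The helper names and the 2048 x 2048 layer shape are hypothetical, chosen only for illustration; the rank, batch size, accumulation steps, and step count are the values from the configuration above.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def samples_seen(max_steps, per_device_batch, grad_accum, num_devices=1):
    """Training examples consumed over a run (one optimizer step = batch * accumulation)."""
    return max_steps * per_device_batch * grad_accum * num_devices


# Hypothetical layer shape, for illustration only.
d_in, d_out, r = 2048, 2048, 16
added = lora_params(d_in, d_out, r)
full = d_in * d_out
print(f"LoRA adds {added} params vs {full} frozen ({100 * added / full:.2f}%)")

# Values from the SFT configuration: 60 steps, batch size 2, accumulation 4.
print("SFT examples seen:", samples_seen(60, 2, 4))
```

With `SFT_SAMPLES = 500`, the 480 examples consumed amount to slightly less than one pass over the subset, so this run is a short demonstration rather than a full training schedule.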

del sft_trainer, model
gc.collect(); torch.cuda.empty_cache()


base_fp16 = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", dtype=DTYPE)
sft_merged = PeftModel.from_pretrained(base_fp16, "outputs/sft/lfm2_adapter").merge_and_unload()
sft_merged.save_pretrained("outputs/sft/lfm2_merged")
tokenizer.save_pretrained("outputs/sft/lfm2_merged")
print("Merged SFT model saved -> outputs/sft/lfm2_merged")

We clear the earlier training objects from memory to free GPU resources. We reload the base LFM2 model in fp16 or bf16 and attach the trained SFT LoRA adapter. We then merge the adapter into the base model and save the merged SFT checkpoint for the next stage.
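
Conceptually, merging folds the low-rank update into each adapted weight matrix: W' = W + (alpha / r) * B A. The toy example below, written in plain Python with made-up 2 x 3 matrices, is only a sketch of that arithmetic and not PEFT's implementation. It checks that the merged matrix gives the same output as running the frozen weight and the adapter side by side.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A for W (d_out x d_in), A (r x d_in), B (d_out x r)."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy values (illustrative only): d_in=3, d_out=2, rank 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -0.5, 1.0]]
B = [[2.0], [-1.0]]
ALPHA, R = 2, 1
x = [1.0, 2.0, 3.0]

# Adapter kept separate: W x + (alpha / r) * B (A x)
unmerged_out = [wx + (ALPHA / R) * d
                for wx, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]
# Adapter folded into the weight: W' x
merged_out = matvec(merge_lora(W, A, B, ALPHA, R), x)

print(unmerged_out, merged_out)
```

Because the update is added directly to W, the merged checkpoint needs no PEFT wrapper at inference time. Reloading the base model in 16-bit precision before merging, as we do here, keeps that addition from being applied on top of 4-bit quantized weights.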

if RUN_DPO:
   pref_rows = [
       {"prompt":  [{"role": "user", "content": "Reply to a customer whose order is late."}],
        "chosen":  [{"role": "assistant", "content": "I'm sorry your order is delayed. I've checked your tracking and it will arrive within 2 days — here's a 10% credit for the inconvenience."}],
        "rejected":[{"role": "assistant", "content": "Orders are sometimes late. Please wait."}]},
       {"prompt":  [{"role": "user", "content": "Summarize the benefit of edge AI in one line."}],
        "chosen":  [{"role": "assistant", "content": "Edge AI runs models locally, giving low latency, offline reliability, and stronger privacy."}],
        "rejected":[{"role": "assistant", "content": "Edge AI is AI on the edge of things and it is good."}]},
       {"prompt":  [{"role": "user", "content": "Decline a meeting politely."}],
        "chosen":  [{"role": "assistant", "content": "Thanks for the invite — I have a conflict then. Could we find another slot this week?"}],
        "rejected":[{"role": "assistant", "content": "No."}]},
   ] * 20
   pref_ds = Dataset.from_list(pref_rows)


   lora_dpo = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
                         task_type="CAUSAL_LM", target_modules="all-linear")
   dpo_cfg = DPOConfig(
       output_dir="outputs/dpo/lfm2_demo",
       per_device_train_batch_size=1,
       gradient_accumulation_steps=4,
       learning_rate=5e-6,
       beta=0.1,
       max_length=MAX_LEN,
       max_prompt_length=512,
       max_steps=DPO_STEPS,
       logging_steps=10,
       save_strategy="no",
       gradient_checkpointing=True,
       gradient_checkpointing_kwargs={"use_reentrant": False},
       bf16=BF16, fp16=not BF16,
       report_to="none",
   )
   dpo_trainer = DPOTrainer(
       model=sft_merged,
       ref_model=None,
       args=dpo_cfg,
       train_dataset=pref_ds,
       processing_class=tokenizer,
       peft_config=lora_dpo,
   )
   dpo_trainer.train()
   final = dpo_trainer.model.merge_and_unload()
   final.save_pretrained("outputs/final/lfm2_sft_dpo")
   tokenizer.save_pretrained("outputs/final/lfm2_sft_dpo")
   # Generate from the merged model returned by merge_and_unload(), i.e. the weights just saved.
   print("\n=== AFTER SFT + DPO ===\n", chat(final, PROBE))
   print("Final model saved -> outputs/final/lfm2_sft_dpo")


print("\nDone. Compare the BASELINE vs AFTER-SFT(+DPO) outputs above.")

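We optionally run DPO on a small preference dataset in which each row holds a prompt, a chosen response, and a rejected response. The three hand-written examples are repeated 20 times, so this stage demonstrates the mechanics rather than a realistic preference corpus. We configure another LoRA adapter for preference tuning and train the SFT-merged model with DPO. Finally, we merge the DPO adapter, save the final model checkpoint, and compare the result against the earlier outputs.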
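
For intuition about what the trainer optimizes, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair, written with the standard library only. It is not TRL's implementation, and the log-probability values are made up for illustration. The inputs are the summed log-probabilities of the chosen and rejected responses under the policy being trained and under the frozen reference, and `beta=0.1` matches the configuration above.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log(sigmoid(beta * margin)) for one pair, where the margin measures how much
    more the policy favors the chosen response than the reference does."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return softplus(-beta * margin)


# Made-up log-probabilities, for illustration only.
print("policy equals reference:", dpo_loss(-10.0, -10.0, -10.0, -10.0))
print("policy prefers chosen:  ", dpo_loss(-12.0, -20.0, -14.0, -18.0))
print("policy prefers rejected:", dpo_loss(-20.0, -12.0, -18.0, -14.0))
```

The loss starts at log 2 when the policy matches the reference and falls as the policy shifts probability toward the chosen response. When `ref_model=None` is combined with a `peft_config`, as in this tutorial, TRL can obtain the reference log-probabilities from the same model with the adapter disabled, so no second copy of the model has to be held in memory.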

In conclusion, we built a full fine-tuning pipeline for LFM2 using only open-source tools, including Transformers, TRL, PEFT, datasets, and bitsandbytes. We used QLoRA to make training efficient on Colab GPUs, applied supervised fine-tuning to chat-formatted data, merged the trained adapter into the base model, and optionally refined the result with DPO. This gives us a clear view of how modern LLM fine-tuning works in practice, from loading the model to producing a final checkpoint that can be compared against the original baseline and prepared for deployment.


The post How to Fine-Tune LFM2 Using QLoRA and DPO: A Complete Step-by-Step Coding Tutorial on Google Colab appeared first on MarkTechPost.
