
Fondant AI Releases Fondant-25M Dataset of Image-Text Pairs with a Creative Commons License

October 15, 2023
in AI & Technology

Large-scale data processing is the handling and analysis of vast amounts of data to extract insights, support decisions, and solve complex problems. It is crucial in many fields, including business, science, and healthcare. The choice of tools and methods depends on the requirements of the processing task and the available resources. Programming languages such as Python, Java, and Scala are commonly used for this work, along with frameworks such as Apache Flink, Apache Kafka, and Apache Storm.

Researchers have built a new open-source framework called Fondant to simplify and speed up large-scale data processing. It ships with embedded tools to download, explore, and process data, including components for downloading data from URLs and for downloading images.

The current challenge with generative AI models such as Stable Diffusion and DALL-E is that they are trained on hundreds of millions of images from the public Internet, including copyrighted work. This creates legal risks and uncertainties for users of the generated images and is unfair toward copyright holders who may not want their proprietary work reproduced without consent.

To tackle this, the researchers have developed a data-processing pipeline to create a dataset of 500 million Creative Commons images for training latent diffusion image generation models. A data-processing pipeline is a series of steps and tasks that collect, process, and move data from a source to a destination, where it can be stored and analyzed.

Building a custom data-processing pipeline involves several steps, and the approach varies with the data sources, processing requirements, and tools. The researchers take a building-block approach: Fondant pipelines are designed to mix reusable components with custom ones. They also deployed the pipeline in a production environment and set up automation for regular data processing.
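The building-block idea can be illustrated with a minimal sketch in plain Python. This is not Fondant's API: the component names, the record fields (`url`, `license`), and the sample records are all hypothetical and chosen only to show how reusable and custom steps can be chained.

```python
from urllib.parse import urlparse


def drop_missing_urls(records):
    """Reusable component (hypothetical): skip records without a URL."""
    for record in records:
        if record.get("url"):
            yield record


def keep_licenses(allowed):
    """Reusable component factory (hypothetical): keep allowed licenses only."""
    def component(records):
        for record in records:
            if record.get("license") in allowed:
                yield record
    return component


def add_domain(records):
    """Custom component (hypothetical): add the host name taken from the URL."""
    for record in records:
        yield {**record, "domain": urlparse(record["url"]).netloc}


def run_pipeline(records, components):
    """Feed the output of each component into the next one, in order."""
    for component in components:
        records = component(records)
    return list(records)


# Made-up sample records, for illustration only.
sample = [
    {"url": "https://example.com/a.jpg", "license": "cc-by"},
    {"url": "", "license": "cc-by"},
    {"url": "https://example.org/b.png", "license": "all-rights-reserved"},
]

result = run_pipeline(
    sample,
    [drop_missing_urls, keep_licenses({"cc-by", "cc0"}), add_domain],
)
print(result)
```

Because every component takes an iterable of records and returns one, steps can be reordered, reused across pipelines, or swapped for custom ones without touching the rest.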

Fondant-cc-25m contains 25 million image URLs together with their Creative Commons license information, all accessible in one place. The researchers have released detailed step-by-step installation instructions for local users. To run the pipelines locally, users need Docker installed, with at least 8GB of RAM allocated to the Docker environment.

The researchers designed the dataset to include only public, non-personal information in support of conducting and publishing their open-access research, but they caution that the released data may still contain sensitive personal information. They say the filtering pipeline for the dataset is still in progress, and they welcome contributions from other researchers toward building anonymization pipelines for the project. In the future, they want to add further components such as image-based deduplication, automatic captioning, visual quality estimation, watermark detection, face detection, and text detection.


Check out the Blog Article and Project. All credit for this research goes to the researchers on this project.


Arshad is an intern at MarktechPost. He is currently pursuing his Int. MSc in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn lead to advances in technology. He is passionate about understanding nature at a fundamental level with the help of tools such as mathematical models, ML models, and AI.

