TradePoint.io

How vector databases can revolutionize our relationship with generative AI

April 30, 2023
in AI & Technology


Generative AI has already received a lot of attention this year, in the tech world and beyond. Whether through ChatGPT’s prose or Stable Diffusion’s art, 2022 offered a glimpse of AI’s potential to disrupt creative industries.

But behind the headlines, 2022 brought an even more important development in AI: the rise of the vector database.

While their impacts are less immediately obvious, the adoption of vector databases could completely upend the way we interact with our devices, along with dramatically improving our productivity in a vast range of administrative and clerical tasks.

Ultimately, vector databases will be essential infrastructure in bringing about the societal and economic changes promised by AI.

But what is a vector database? To understand that, we have to make sense of the underlying problem it addresses: unstructured data.

The database dilemma

Databases are one of the software industry’s longest-lasting and most resilient verticals. The total spend on databases and database management solutions doubled from $38.6B in 2017 to $80B in 2021. And since 2020, databases have only further entrenched their position as one of the most rapidly growing software categories, owing to further digitization following mass shifts to remote working.

However, the modern database is still constrained by a problem that has persisted for decades: the problem of unstructured data. This is data that has not been formatted, tagged or structured in a way that allows it to be rapidly searched or recalled, and it accounts for up to 80% of data stored globally.

For a simple analogy of structured vs. unstructured data, think of a spreadsheet with multiple columns per row. In this case, a row of “structured data” has all the relevant columns filled in, whereas a row of “unstructured data” does not. In the case of the unstructured entry, it may be that the data has been automatically imported into the first column of the row; someone now needs to break up that cell and populate data into relevant columns.

Why is unstructured data a problem? In short, it makes it harder to sort, search, review and use information in a database. However, what counts as unstructured is relative: data is unstructured only with respect to the way we usually structure it.

Missing tags or misaligned formatting mean that unstructured entries can be missed in searches, or wrongly included or excluded by filters. This introduces a risk of error into many database operations, which we have to address by manually reviewing and structuring those entries. This doesn’t mean that the data itself necessarily lacks structure; it just requires more manual intervention than our usual means of storing data.

We often hear about the burden of manual review with claims such as data scientists spending 80% of their time on data preparation. But in practice, this is something we all do to some extent, or at least live with the effects of. If you’ve had to wrestle with a file explorer to find something on your hard drive or spend lots of time screening out irrelevant search engine results, you’ve likely been hit by the unstructured data problem.

This wasted time on manual formatting, reviewing and filtering is not a new or exclusively digital problem. For example, librarians manually arrange books according to the Dewey Decimal System. The unstructured data problem is just a digital version of a fundamental challenge with every record-keeping task humans have had since we invented writing: We need to classify information to store and use it. 

This is where vector databases prove particularly exciting. Rather than relying on distinct categories and lists to organize our records, vector databases instead place them on a map.

Vectors and mapping

Vector databases use a concept in machine learning and deep learning called vector embeddings. Vector embedding is a technique where words or phrases in a text are mapped to high-dimensional vectors, also known as word embeddings. These vectors are learned in such a way that semantically similar words are close together in the vector space.
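To make "close together in the vector space" concrete, here is a minimal sketch in Python. The three-dimensional vectors are hand-written for illustration only (an assumption, not the output of any real model); learned embeddings typically have hundreds of dimensions. Cosine similarity is one common way to measure closeness.

```python
import math

# Hand-written toy vectors, purely illustrative.
# Real embeddings are learned by a model and are much longer.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "car": [0.1, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_cat_kitten = cosine_similarity(toy_vectors["cat"], toy_vectors["kitten"])
sim_cat_car = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])

# The related pair scores higher than the unrelated pair.
print(sim_cat_kitten > sim_cat_car)
```

With these toy values, "cat" and "kitten" point in nearly the same direction, while "cat" and "car" do not, which is the property a trained embedding model aims to produce for real vocabulary.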

This representation allows deep neural networks to process textual data more effectively, and has proven very useful in a variety of natural language processing tasks such as text classification, translation and sentiment analysis.

In the database context, vector embedding is effectively a numerical representation of a group of properties we want to measure.

To create an embedding, we take a trained machine learning model and instruct it to monitor for those properties in entries in a dataset.

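In the case of a text string, for example, the model could be told to log the average word length, sentiment analysis scores, or the occurrence of specific words. This is a simplified picture: in most learned embeddings, the dimensions emerge during training and do not correspond to hand-picked, human-readable properties.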

The final embedding takes the form of a series of numbers corresponding to the “scores” logged in the audit of properties. A vector database takes the scores of the vector embeddings and plots them on a graph. Every property we measure in a vector embedding constitutes a dimension of the graph, resulting in it usually having many more than the three dimensions we can conventionally visualize. 
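As a rough illustration of an embedding as a list of "scores", the sketch below turns a text string into a vector using two of the properties mentioned above: average word length and the occurrence of specific words. The function name `embed` and the keyword list are hypothetical, and this hand-built approach is a stand-in for a trained model, not how production embeddings are computed.

```python
def embed(text, keywords=("database", "vector")):
    """Map a text string to [average word length, count of each keyword].

    Hypothetical, hand-built features for illustration only.
    """
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return [0.0] * (1 + len(keywords))
    avg_len = sum(len(w) for w in words) / len(words)
    counts = [float(words.count(k)) for k in keywords]
    return [avg_len] + counts


example = embed("A vector database stores vector embeddings.")
print(example)
```

Each position in the returned list is one dimension of the graph described above: here there are three, but adding more measured properties simply lengthens the list.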

With all this information plotted, we can still calculate how “far” away any one embedding is from another embedding in the same way we can in any other graph. Perhaps more importantly, we can engage in a novel way of searching data. By generating a vector embedding of an inputted search query, we plot a point on the graph we want to target. Then, we can discover the embeddings that are the nearest to our search point.
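The search described here can be sketched as a brute-force nearest-neighbor lookup. The stored vectors, file names and query vector below are invented for illustration, and Euclidean distance is only one possible measure. Real vector databases generally rely on approximate indexes rather than comparing the query against every entry, so treat this as a conceptual sketch.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec, stored, k=2):
    """Return the names of the k stored embeddings closest to query_vec."""
    ranked = sorted(
        stored.items(),
        key=lambda item: euclidean_distance(query_vec, item[1]),
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings for three stored records.
stored = {
    "invoice_2021.pdf": [0.9, 0.1, 0.2],
    "invoice_2022.pdf": [0.8, 0.2, 0.1],
    "holiday_photo.jpg": [0.1, 0.9, 0.7],
}

# Hypothetical embedding of the search query.
query_vec = [0.88, 0.1, 0.2]

results = nearest(query_vec, stored, k=2)
print(results)
```

The query lands closest to the two invoice records, so those are returned first, without any tag or keyword having been matched.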

Vector embeddings are not a perfect solution for everything. They are typically learned in an unsupervised manner, making it difficult to interpret their meaning and how they contribute to the overall model performance. Pre-trained embeddings can also contain biases present in the training data, such as gender, racial or political biases, which can negatively impact model performance.

The potential of vector search

A vector database doesn’t rely on tags, labels, metadata or other tools typically used to structure data. Instead, because a vector embedding can track any property we deem relevant, vector databases allow us to obtain search results based on overall similarity.

Whereas current searches of unstructured data involve manual reviewing and interpreting, vector databases will allow searches to actually reflect the meaning behind our queries rather than superficial properties like keywords.

This change stands to revolutionize data handling, record-keeping and most administrative work and clerical tasks. Because of the reduction in “false positive” search results and a reduced need to pre-screen and format queries to a system, vector databases can dramatically boost the productivity and efficiency of just about any job in the knowledge economy.

Aside from gains in administrative productivity, these advanced search capabilities will allow us to rely on databases to engage more effectively with creative and open-ended queries. 

This is an ideal complement to the rise of generative AI. Because vector databases reduce the need to structure data, we can substantially speed up training times for generative AI models by automating much of the work around processing unstructured data for training and production.

As a result, many organizations can simply import their unstructured data into a vector database and tell it what properties they want to be measured in their embeddings. With those embeddings generated, an organization can rapidly train and deploy a generative model by simply letting it search the vector database to gather information for tasks.
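One way to picture a generative model "searching the vector database to gather information" is the retrieval step sketched below. Everything here is an assumption made for illustration: the `TinyVectorStore` class, the fixed `VOCAB` list and the word-count `embed` function are hypothetical stand-ins for a real vector database and a learned embedding model, and the final prompt is only assembled, not sent to any model.

```python
import math
from collections import Counter

# Hypothetical fixed vocabulary; a real system would use a learned model.
VOCAB = ["refund", "shipping", "password", "reset", "days", "account"]


def embed(text):
    """Count how often each vocabulary term appears in the text."""
    words = Counter(w.strip(".,?!").lower() for w in text.split())
    return [float(words[term]) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.rows = []  # list of (text, embedding) pairs

    def add(self, text):
        self.rows.append((text, embed(text)))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore()
store.add("Refund requests are accepted within 30 days.")
store.add("Shipping takes five working days.")
store.add("Use the reset link to change your account password.")

question = "How do I reset my password?"
context = store.search(question, k=1)

# The retrieved text is placed ahead of the question for the model to use.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

The design choice worth noting is that the documents are never tagged or categorized by hand; the store only keeps each text alongside its vector, and relevance is decided at query time by similarity.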

The vector database is set to dramatically improve our productivity and revolutionize how we field queries to computers. Altogether, this makes vector databases one of the most important emergent technologies of the coming decade.

Rick Hao is a partner at Speedinvest.

DataDecisionMakers

Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation.



©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
