TradePoint.io

Good bot, bad bot: Using AI and ML to solve data quality problems

May 6, 2023

More than 40% of all website traffic in 2021 wasn’t even human. 

This might sound alarming, but it’s not necessarily a bad thing; bots are core to the functioning of the internet. They make our lives easier in ways that aren’t always obvious, like delivering push notifications about promotions and discounts.

But, of course, there are bad bots, and they account for nearly 28% of all website traffic. Whether the result is spam, account takeovers, scraping of personal information or malware, it’s typically how people deploy bots that separates good from bad.

With the release of accessible generative AI such as ChatGPT, it’s going to get harder to discern where bots end and humans begin. These systems are getting better at reasoning: GPT-4 passed the bar exam in the top 10% of test takers, and bots have even defeated CAPTCHA tests.

In many ways, we could be on the verge of a critical mass of bots on the internet, and that could be a dire problem for consumer data.

The existential threat

Companies spend about $90 billion on market research each year to decipher trends, customer behavior and demographics. 

But even with this direct line to consumers, failure rates on innovation are dire. Catalina projects that the failure rate of consumer packaged goods (CPG) is a frightful 80%, while the University of Toronto found that 75% of new grocery products flop.

What if the data these creators rely on was riddled with AI-generated responses and didn’t actually represent the thoughts and feelings of a consumer? We’d live in a world where businesses lack the fundamental resources to inform, validate and inspire their best ideas, causing failure rates to skyrocket, a crisis they can ill afford now.

Bots have existed for a long time, and for the most part, market research has relied on manual processes and gut instinct to analyze, interpret and weed out such low-quality respondents. 

But while humans are exceptional at bringing reason to data, we are incapable of distinguishing bots from humans at scale. The reality for consumer data is that the nascent threat of large language models (LLMs) will soon overtake the manual processes through which we identify bad bots.

Bad bot, meet good bot

Where bots may be a problem, they could also be the answer. By creating a layered approach using AI, including deep learning or machine learning (ML) models, researchers can build systems that separate out low-quality data and rely on good bots to run them.

This technology is ideal for detecting subtle patterns that humans can easily miss or not understand. And if managed correctly, these processes can feed ML algorithms to constantly assess and clean data to ensure quality is AI-proof. 

Here’s how: 

Create a measure of quality

Rather than relying solely on manual intervention, teams can ensure quality by creating a scoring system through which they identify common bot tactics. Building a measure of quality requires subjective judgment. Researchers can set guardrails for responses across factors. For example:

  • Spam probability: Are responses made up of inserted or cut-and-paste content? 
  • Gibberish: A human response will contain brand names, proper nouns or misspellings, but generally track toward a cogent response. 
  • Skipping recall questions: While AI can sufficiently predict the next word in a sequence, it is unable to replicate personal memories.

These data checks can be subjective — that’s the point. Now more than ever, we need to be skeptical of data and build systems to standardize quality. By applying a point system to these traits, researchers can compile a composite score and eliminate low-quality data before it moves on to the next layer of checks. 
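
The point system described above could be sketched roughly as follows. This is a minimal illustration, not the author’s scoring method: the weights, the rejection cutoff, the `Response` fields and the two heuristic checks are all assumptions made up for the example, and a real system would use far more robust detectors.

```python
import re
from dataclasses import dataclass

# Hypothetical point weights and cutoff; the article does not specify any values.
WEIGHTS = {"spam": 3, "gibberish": 2, "skipped_recall": 2}
REJECT_AT = 3


@dataclass
class Response:
    text: str           # open-ended survey answer
    recall_answer: str  # answer to a personal-memory question; empty if skipped


def looks_pasted(text: str, known_snippets: list[str]) -> bool:
    """Crude spam check: the answer contains a known boilerplate snippet."""
    lowered = text.lower()
    return any(snippet.lower() in lowered for snippet in known_snippets)


def looks_like_gibberish(text: str) -> bool:
    """Crude gibberish check: fewer than 60% of words contain a vowel."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return True
    with_vowel = sum(1 for w in words if re.search(r"[aeiouy]", w, re.I))
    return with_vowel / len(words) < 0.6


def penalty_score(resp: Response, known_snippets: list[str]) -> int:
    """Add up penalty points for each trait the response exhibits."""
    score = 0
    if looks_pasted(resp.text, known_snippets):
        score += WEIGHTS["spam"]
    if looks_like_gibberish(resp.text):
        score += WEIGHTS["gibberish"]
    if not resp.recall_answer.strip():
        score += WEIGHTS["skipped_recall"]
    return score


def keep(resp: Response, known_snippets: list[str]) -> bool:
    """Keep a response only if its composite penalty is below the cutoff."""
    return penalty_score(resp, known_snippets) < REJECT_AT
```

Responses that pass `keep` would move on to the next layer of checks; the rest are set aside for rejection or review. Note that the gibberish check tolerates misspellings, since a misspelled word still contains vowels.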

Look at the quality behind the data

With the rise of human-like AI, bots can slip through the cracks if quality scores are used alone. This is why it’s imperative to layer these signals with data around the output itself. Real people take time to read, re-read and analyze before responding; bad actors often don’t, which is why it’s important to look at the response level to understand trends of bad actors.

Factors like time to response, repetition and insightfulness can go beyond the surface level to deeply analyze the nature of the responses. If responses are too fast, or nearly identical responses are documented across one survey (or multiple), that can be a tell-tale sign of low-quality data. Finally, going beyond nonsensical responses to identify the factors that make an insightful response — by looking critically at the length of the response and the string or count of adjectives — can weed out the lowest-quality responses. 

By looking beyond the obvious data, we can establish trends and build a consistent model of high-quality data. 
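
As a rough sketch of the response-level signals described above, the following flags answers that arrive too quickly, are too short, or nearly duplicate another answer in the same survey. The thresholds and function names are hypothetical, and word count stands in for insightfulness here; counting adjectives would require a part-of-speech tagger, which is left out to keep the example self-contained.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; sensible values depend on the survey and question.
MIN_SECONDS = 5.0       # answers faster than this are suspicious
MIN_WORDS = 4           # answers shorter than this carry little insight
DUPLICATE_RATIO = 0.9   # similarity at or above this counts as a near-duplicate


def too_fast(seconds: float) -> bool:
    return seconds < MIN_SECONDS


def too_short(text: str) -> bool:
    return len(text.split()) < MIN_WORDS


def near_duplicate(text: str, others: list[str]) -> bool:
    """True if the text is almost identical to any other answer."""
    return any(
        SequenceMatcher(None, text.lower(), other.lower()).ratio() >= DUPLICATE_RATIO
        for other in others
    )


def flag_responses(responses: list[tuple[str, float]]) -> list[set[str]]:
    """Take (text, seconds_to_respond) pairs and return one flag set per answer."""
    all_flags = []
    for i, (text, seconds) in enumerate(responses):
        others = [t for j, (t, _) in enumerate(responses) if j != i]
        flags = set()
        if too_fast(seconds):
            flags.add("too_fast")
        if too_short(text):
            flags.add("too_short")
        if near_duplicate(text, others):
            flags.add("near_duplicate")
        all_flags.append(flags)
    return all_flags
```

The pairwise comparison is quadratic in the number of answers, which is fine for a single survey but would need a cheaper similarity method (hashing or embeddings, for instance) to compare across many surveys.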

Get AI to do your cleaning for you

Ensuring high-quality data isn’t a “set it and forget it” process; it requires consistently moderating and ingesting good and bad data alike to hit the moving target that is data quality. Humans play an integral role in this flywheel: they set up the system, then sit above the data to spot patterns that influence the standard, and feed those features back into the model, including the rejected items.

Your existing data isn’t immune, either. Existing data shouldn’t be set in stone, but rather subject to the same rigorous standards as new data. By regularly cleaning normative databases and historic benchmarks, you can ensure that every new piece of data is measured against a high-quality comparison point, unlocking more agile and confident decision-making at scale.

Once these scores are in hand, this methodology can be scaled across regions to identify high-risk markets where manual intervention could be needed.
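
One very simple way to picture this feedback loop is to re-tune the rejection cutoff each time human reviewers label a new batch. The sketch below assumes each reviewed response carries a numeric penalty score plus a reviewer verdict; `best_cutoff` and the data shape are hypothetical, and a production system would more likely retrain an ML model than tune a single threshold.

```python
def best_cutoff(labelled: list[tuple[int, bool]]) -> int:
    """Pick the cutoff that disagrees least with human reviewers.

    `labelled` holds (penalty_score, rejected_by_reviewer) pairs. A response
    is predicted "rejected" when its score is at or above the cutoff. Ties
    are resolved in favor of the lowest (strictest) cutoff.
    """
    if not labelled:
        raise ValueError("need at least one reviewed response")
    scores = {score for score, _ in labelled}
    # Include one value above the maximum so "reject nothing" is a candidate.
    candidates = sorted(scores | {max(scores) + 1})

    def disagreements(cutoff: int) -> int:
        return sum((score >= cutoff) != rejected for score, rejected in labelled)

    return min(candidates, key=disagreements)


# First batch of reviewed responses, accepted and rejected alike.
reviewed = [(0, False), (1, False), (2, False), (3, True), (4, True), (5, True)]
cutoff = best_cutoff(reviewed)

# Reviewers later reject two responses that scored 2; feeding them back
# shifts the standard.
reviewed += [(2, True), (2, True)]
updated_cutoff = best_cutoff(reviewed)
```

Keeping the rejected items in `reviewed` matters: without them, the cutoff would only ever be tuned against data that had already passed.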
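
Scaling across regions could look something like the sketch below, which aggregates rejection outcomes by market and surfaces those above a review threshold. The 25% threshold, the region codes and the record shape are illustrative assumptions, not figures from the article.

```python
from collections import defaultdict

# Hypothetical share of rejected responses above which a market gets manual review.
REVIEW_RATE = 0.25


def rejection_rate_by_region(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Take (region, was_rejected) pairs and return the rejection rate per region."""
    totals = defaultdict(int)
    rejected = defaultdict(int)
    for region, was_rejected in records:
        totals[region] += 1
        if was_rejected:
            rejected[region] += 1
    return {region: rejected[region] / totals[region] for region in totals}


def high_risk_regions(records: list[tuple[str, bool]],
                      threshold: float = REVIEW_RATE) -> list[str]:
    """Return regions whose rejection rate meets or exceeds the threshold."""
    rates = rejection_rate_by_region(records)
    return sorted(region for region, rate in rates.items() if rate >= threshold)
```

A market with only a handful of responses can cross the threshold by chance, so a minimum sample size per region would be a sensible addition before acting on the result.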

Fight nefarious AI with good AI

The market research industry is at a crossroads; data quality is worsening, and bots will soon constitute an even larger share of internet traffic. That shift is not far off, so researchers should act fast.

But the solution is to fight nefarious AI with good AI. This will allow for a virtuous flywheel to spin; the system gets smarter as more data is ingested by the models. The result is an ongoing improvement in data quality. More importantly, it means that companies can have confidence in their market research to make much better strategic decisions. 

Jack Millership is the data expertise lead at Zappi.
