Thursday, April 3, 2025


How Strong Is AI When Hallucinations Haunt?



AI knows it all — but what happens when it makes it up?

I remember research analysts being the most frustrated group back in November 2022 when ChatGPT exploded onto the tech scene. They were being asked to experiment with and use AI in their workflows, but it didn’t take long for them to encounter a major stumbling block. After all, would you risk your career and credibility over a new technology fad?

While content creators like me, data scientists, and engineers were thriving with AI adoption, we could only empathize with our research analyst peers as we partnered with them to find new ways to make OpenAI, Gemini, LangChain, and Perplexity cater to their requirements. Everyone tried building trust in AI as we put on our researcher hats.

But soon, the consensus was that AI hallucinations were a problem for all knowledge workers, whether researcher, content creator, developer, or business leader.

Fast forward to 2025, and despite all the advancements in AI, hallucinations haven’t disappeared. While companies like Anthropic, OpenAI, and NVIDIA are pushing the boundaries of AI reasoning models, the ghost of hallucinations still lingers. Our latest G2 LinkedIn poll reveals that nearly 75% of professionals have experienced AI hallucinations, with over half (52%) encountering them multiple times.

These new developments might promise smarter, faster, and more reliable AI, but the question remains — are they strong enough to keep hallucinations at bay? 

Let’s take a closer look at the latest AI LLM updates shaping the industry:

Hallucinations, the ‘Answer Economy’, and real-world challenges

As AI models evolve with new capabilities, the way we interact with information is also transforming. We are witnessing the rise of a mega-trend that our very own Tim Sanders calls the “Answer Economy.” People are transitioning from search-based research to an answer-driven style of learning, buying, and working.

But there’s a catch in all of this. AI chatbots seem to be delivering instant, confident responses — even when they’re wrong. And despite accuracy concerns, these AI-generated answers are influencing decisions across industries. This shift poses a critical question: are we too quick to accept AI’s responses as truth, especially when the stakes are high? How strong is our trust in AI?

While AI chatbots are shaking up search and AI companies are leaping towards agentic AI, how strong are their roots when hallucinations haunt? AI hallucinations can be as trivial as Gemini telling people to eat rocks and put glue on pizza. Or as big as fabricating claims like the ones below.

There were several other notable AI hallucination mishaps in 2024 involving brands like Air Canada, Zillow, Microsoft, Groq, and McDonald’s.

So, are AI chatbots making life easier or just adding another layer of complexity for businesses? We combed through G2 reviews to uncover what’s working, what’s not, and where the hallucinations hit hardest.


The G2 take

A quick comparison of ChatGPT, Gemini, Claude, and Perplexity shows ChatGPT as the leader at a glance, with an 8.7/10 score. However, a closer look reveals that Gemini leads in terms of reliability — by a slim margin.

G2 comparison of ChatGPT, Gemini, Claude, and Perplexity

Source: G2.com

While ChatGPT is better at learning from user interactions to reduce errors and understand context, Perplexity and Gemini beat it on content accuracy, each scoring 8.5.

Accuracy - ChatGPT, Gemini, Claude, and Perplexity

Context understanding - ChatGPT, Gemini, Claude, and Perplexity

Source: G2.com

Nearly 35% of reviews highlight the accuracy gap

These AI chatbots are being used in small businesses, SMEs, and enterprises by all kinds of professionals — research analysts, marketing leaders, software engineers, tutors, etc. And a deep dive into G2 review data reveals a glaring trend: inaccuracy remains a shared concern across the board.

We can’t help but notice that, right off the bat, an average of ~34.98% of reviews raise concerns about inaccuracy, context understanding, and outdated information.

AI chatbot inaccuracy rates as per G2 review data

Source: Exclusive G2 Data

Users aren’t shy about flagging their frustrations. Out of the hundreds of reviews, accuracy concerns topped the list of cons:

  • ChatGPT: 101 mentions of inaccuracy, with outdated information adding to the frustration
  • Gemini: 33 instances of inaccurate responses, compounded by 26 complaints about context understanding
  • Claude: Fewer reports, but with seven accuracy issues and five concerns about recognition
  • Perplexity: While boasting quick insights, it wasn’t immune — users pointed out seven limitations related to AI accuracy

While China’s DeepSeek has turned heads and wreaked stock-market havoc with its speed and cost-saving go-to-market (GTM) strategy, it lacks a firm (and, dare we say, legally settled) presence in the USA, owing to valid concerns over safety and potential data siphoning. Speculation about its reliability outweighs the allure of affordability.

Our VP of Insights, Tim Sanders, called it out for its hallucination rate in a recent interview.

“DeepSeek’s R1 has an 83% hallucination rate for research and writing, which is much higher than the 10% hallucination rate of other AI platforms.”

Tim Sanders
VP of Research Insights at G2

Gemini: The ironic productivity booster for research analysts

We noted that several research analysts use Gemini. Some particularly prefer its research mode and use it for academic and market research.

“Daily use, particularly in love with research mode. Gemini’s speed enhances the surfing experience overall, especially for those who use the internet for extensive research and work duties or who multitask.”

Elmoury T.
Research Analyst

But here’s the twist: research analysts aren’t raving about Gemini for its research reliability. Instead, it’s the seamless connectivity to Google’s suite of tools and customizable user experience that steals the spotlight. Productivity boosts, streamlined workflows, and smoother task management? Absolutely. Trusting it for rigorous research? Not so much.

While Gemini’s research mode aggregates information from the internet, accuracy and fact-checking aren’t making the headlines. Memory management issues and sluggish performance also keep it from being a true research powerhouse.

Cyril Clare G2 user review of Gemini

Source: G2.com Reviews

ChatGPT: power player with precision pitfalls

From code generation to market research, ChatGPT has become a daily go-to for professionals to brainstorm, generate content quickly, and answer complex questions. Yet, accuracy concerns persist.

Geopolitical topics and nuanced research often lead to misleading results. Context understanding is solid, but misinformation and hallucinations still plague users.

User reviews praise ChatGPT’s polished tone and contextual understanding, but this confidence often masks the occasional hallucination. Users highlighted its tendency to provide plausible-sounding but inaccurate information, especially in complex or nuanced scenarios like geopolitics. It’s a textbook case of “sounding smart but not always being right.”

Paid account users are impressed with its new multimodal inputs, voice interactions, and memory retention but also highlight its limitations in data analysis, image creation, and overall accuracy. 

Overall, paid users find the product pricey compared to free alternatives on the market, owing to ChatGPT’s server downtime and accuracy issues.

Shilpi M G2 user review of ChatGPT

Source: G2.com Reviews

Juan M G2 user review of ChatGPT

Source: G2.com Reviews

G2 reviews also surfaced how users go back and forth with ChatGPT to get their desired outcomes. At times, users ran out of allotted tokens quickly, leaving their queries unresolved.

Sakshi G2 user review of ChatGPT

Source: G2.com Reviews

But for some users, the benefits far outweigh the pitfalls. For instance, in industries where speed and efficiency are crucial, ChatGPT is proving to be a game-changer.

“Traditionally, my weekly research could take me over an hour of manual work, scouring data and reports. ChatGPT has slashed this process to just 10-15 minutes. That’s time I can now invest in other critical areas of my business.”

Peter Gill
G2 Icon and Freight Broker

Peter advocates that AI’s benefits extend far beyond the logistics sector, proving to be a powerful ally in today’s data-driven world.

Perplexity: speed meets smarts — with a side of stumbles

Perplexity’s external web search capability and speedy updates have earned it a solid fanbase among researchers. Users praise its ability to provide comprehensive, context-aware insights. The frequent integration of the latest AI models ensures it remains a step ahead.

But it’s not all sunshine and summaries. Users flagged issues with data export, making it harder to translate insights into actionable reports. Minor UX improvements could also significantly elevate its user experience.

Michael N., a G2 reviewer and head of customer intelligence, stated that Perplexity Pro has transformed how he builds knowledge.

Michael N G2 user review of Perplexity

Source: G2.com Reviews

“Easiest way of conducting tiny and complex research with proper prompting.”

Vitaliy V.
G2 Icon and Product Marketing Manager

Business leaders and CMOs like Andrea L. are using different AI chatbots to either supplement, complement, or complete their research.

Andrea L G2 user review of Perplexity

Source: G2.com Reviews

“Perplexity is our trusted companion for research purposes, while we use ChatGPT for managing the obtained data. We also use additional tools and wrappers, API, local models etc. But the unbeatable ones are Perplexity and ChatGPT at this moment.”

Luca Piccinotti
G2 Icon and CTO at Studio Piccinotti

Claude: a fairly honest, human-like, data-deficient counterpart

Claude’s conversational tone and contextual understanding shine through in reviews. Users appreciate its willingness to admit when it doesn’t know something rather than hallucinating a response. That level of transparency builds trust.

However, limited training data and capability gaps compared to competitors like ChatGPT remain areas for improvement. And while its strengths lie in conversational accuracy, its structured data analysis is still a work in progress.

This “honesty over hallucination” approach is a distinct selling point, making Claude a preferred choice for users who value reliable answers over confident speculation.

John E G2 user review of Claude

Source: G2.com Reviews

However, users also expressed frustrations around Claude’s professional mode, citing its usage bandwidth and lack of customer service.

Jennifer S G2 user review of Claude

Source: G2.com Reviews

Verdict: AI for research — yay or nay?

It’s a cautious yay — which is still better than the classic “it depends”.

AI chatbots are undeniably valuable research tools, especially for speeding up information gathering and summarizing. But they’re not flawless.

4 key takeaways

Hallucinations, accuracy issues, and inconsistent reliability remain challenges.

  1. Gemini can be your productivity sidekick if you’re a research analyst who values integration and productivity over pinpoint accuracy, but don’t rely on it as your research fact-checker.
  2. ChatGPT is a productivity booster for quick research tasks, but fact-checking remains a must, even if you’re paying a premium for the subscription.
  3. Perplexity is a reliable knowledge companion for researchers who value speed and cutting-edge AI.
  4. Claude is the choice for those seeking honest, human-like responses, but don’t expect it to crunch complex datasets.

Hallucinate less, verify more: avoid the AI tunnel vision trap

Expect AI models to double down on accuracy and transparency. Advances in multimodal AI and retrieval-augmented generation (RAG) could reduce hallucinations. Perplexity, OpenAI, Google, and Anthropic now offer their own AI search capabilities, which plug into real-time data to sharpen the accuracy and relevance of outputs.
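The RAG idea above can be sketched in a few lines of Python. Everything here is illustrative: the toy corpus, the word-overlap scorer, and the prompt template are stand-ins, not any vendor’s actual retrieval pipeline, and the final LLM call is deliberately left out.

```python
# Minimal RAG sketch: instead of letting a model answer from memory alone,
# retrieve relevant text first and ground the prompt in it.

def tokenize(text: str) -> set[str]:
    """Lowercase words with trailing punctuation stripped (a toy tokenizer)."""
    return {w.strip(".,!?").lower() for w in text.split()}

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (a toy relevance scorer)."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_grounded_prompt(query: str, corpus: list[str]) -> str:
    """Pack the retrieved passage into the prompt so the model must cite it."""
    context = "\n".join(retrieve(query, corpus))
    return (
        "Answer using ONLY the context below. "
        "If the context is insufficient, say you don't know.\n"
        f"Context: {context}\nQuestion: {query}"
    )

corpus = [
    "G2 collects verified user reviews of business software.",
    "Retrieval-augmented generation grounds model answers in fetched documents.",
]
print(build_grounded_prompt("What does retrieval-augmented generation do?", corpus))
```

The grounded prompt, not the model weights, carries the facts; the explicit “say you don’t know” instruction is the same honesty-over-hallucination behavior reviewers praised in Claude.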

Even though newer models like DeepSeek R1 are being built at one-tenth the cost of major competitors, their trustworthiness will determine their fate in the global market.

In the end, AI chatbots and LLMs are your research sidekick, not your fact-checker. Use them wisely, question relentlessly, and let the data — not the chatbot — lead the way.

Enjoyed this deep-dive analysis? Subscribe to the G2 Tea newsletter today for the hottest takes in your inbox.


Edited by Supanna Das


