AI knows it all — but what happens when it makes it up?
I remember research analysts being the most frustrated group back in November 2022 when ChatGPT exploded onto the tech scene. They were being asked to experiment with and use AI in their workflows, but it didn’t take long for them to encounter a major stumbling block. After all, would you risk your career and credibility over a new technology fad?
While content creators like me, data scientists, and engineers were thriving with AI adoption, we could only empathize with our research analyst peers as we partnered with them to find new ways to make OpenAI, Gemini, LangChain, and Perplexity cater to their requirements. Everyone tried building trust in AI as we put on our researcher hats.
But soon, the consensus was that AI hallucinations were a problem for knowledge workers, whether you were a researcher, content creator, developer, or a business leader.
Fast forward to 2025, and despite all the advancements in AI, hallucinations haven’t disappeared. While companies like Anthropic, OpenAI, and NVIDIA are pushing the boundaries of AI reasoning models, the ghost of hallucinations still lingers. Our latest G2 LinkedIn poll reveals that nearly 75% of professionals have experienced AI hallucinations, with over half (52%) saying they have encountered them multiple times.
These new developments might promise smarter, faster, and more reliable AI, but the question remains — are they strong enough to keep hallucinations at bay?
Let’s take a closer look at the latest AI LLM updates shaping the industry:
A timeline of key AI LLM model updates in 2025
- February 24, 2025: Anthropic released Claude 3.7 Sonnet, billed as the world’s first hybrid reasoning AI model, with enhanced capabilities and expanded output limits
- February 27, 2025: OpenAI unveiled GPT-4.5 Orion, integrating various technologies into a unified model for streamlined AI applications
- March 18, 2025: NVIDIA announced the open Llama Nemotron family of models with reasoning capabilities to empower enterprises
- March 20, 2025: At GTC 2025, NVIDIA introduced NVIDIA Dynamo, an open-source software designed to accelerate and scale AI reasoning models in AI factories
Hallucinations, the ‘Answer Economy’, and real-world challenges
As AI models evolve with new capabilities, the way we interact with information is also transforming. We are witnessing the rise of a mega-trend that our very own Tim Sanders calls the “Answer Economy.” People are transitioning from search-based research to an answer-driven style of learning, buying, and working.
But there’s a catch in all of this. AI chatbots seem to be delivering instant, confident responses — even when they’re wrong. And despite accuracy concerns, these AI-generated answers are influencing decisions across industries. This shift poses a critical question: are we too quick to accept AI’s responses as truth, especially when the stakes are high? How strong is our trust in AI?
While AI chatbots are shaking up search and AI companies are leaping towards agentic AI, how strong are their roots when hallucinations still haunt them? AI hallucinations can be as trivial as Gemini telling people to eat rocks and put glue on pizza. Or as big as fabricating claims like the ones below.
AI hallucinations: a timeline of legal challenges
- January 6, 2025: An AI expert’s testimony was challenged in court for relying on AI hallucinated citations in a deepfake-related lawsuit, raising concerns about the credibility of AI-generated evidence
- February 11, 2025: Lawyers in Wyoming faced potential sanctions for using AI-generated fictitious citations in a lawsuit against Walmart, highlighting the risks of relying on hallucinated data in legal filings
- March 20, 2025: OpenAI faced a privacy complaint in Europe after ChatGPT falsely accused a Norwegian individual of murder, raising concerns about reputational damage and GDPR violations
There were several other notable AI hallucination mishaps in 2024 involving brands like Air Canada, Zillow, Microsoft, Groq, and McDonald’s.
So, are AI chatbots making life easier or just adding another layer of complexity for businesses? We combed through G2 reviews to uncover what’s working, what’s not, and where the hallucinations hit hardest.
More than your average newsletter.
Every Thursday, we spill hot takes, insider knowledge, and news recaps straight to your inbox. Subscribe here
The G2 take
A quick comparison of ChatGPT, Gemini, Claude, and Perplexity shows ChatGPT as the leader at a glance, with an 8.7/10 score. However, a closer look reveals that Gemini leads in terms of reliability — by a slim margin.
Source: G2.com
While ChatGPT is better at learning from user interactions to reduce errors and understand context, Perplexity and Gemini beat it on content accuracy with an 8.5 score.
Source: G2.com
Nearly 35% of reviews highlight the accuracy gap
These AI chatbots are being used in small businesses, SMEs, and enterprises by all kinds of professionals — research analysts, marketing leaders, software engineers, tutors, etc. And a deep dive into G2 review data reveals a glaring trend: inaccuracy remains a shared concern across the board.
We can’t help but notice that, right off the bat, an average of ~34.98% of reviews have concerns about inaccuracy, context understanding, and outdated information.
Source: Exclusive G2 Data
Users aren’t shy about flagging their frustrations. Out of the hundreds of reviews, accuracy concerns topped the list of cons:
- ChatGPT: 101 mentions of inaccuracy, with outdated information adding to the frustration
- Gemini: 33 instances of inaccurate responses, compounded by 26 complaints about context understanding
- Claude: Fewer reports, but with seven accuracy issues and five concerns about recognition
- Perplexity: While boasting quick insights, it wasn’t immune — users pointed out seven limitations related to AI accuracy
While China’s DeepSeek has turned heads and wreaked stock market havoc with its speed and cost-saving go-to-market (GTM) product, it lacks a definite (and, dare we say, legally settled) presence in the USA, owing to valid concerns over safety and potential data siphoning. Speculation around its reliability outweighs the allure of affordability.
Our VP of Insights, Tim Sanders, called it out for its hallucination rate in a recent interview.
“DeepSeek’s R1 has an 83% hallucination rate for research and writing, which is much higher than the 10% hallucination rate of other AI platforms.”
Tim Sanders
VP of Research Insights at G2
Gemini: The ironic productivity booster for research analysts
We noted several research analysts use Gemini. Some particularly prefer the research mode and use it for academic and market research.
“Daily use, particularly in love with research mode. Gemini’s speed enhances the surfing experience overall, especially for those who use the internet for extensive research and work duties or who multitask.”
Elmoury T.
Research Analyst
But here’s the twist: research analysts aren’t raving about Gemini for its research reliability. Instead, it’s the seamless connectivity to Google’s suite of tools and customizable user experience that steals the spotlight. Productivity boosts, streamlined workflows, and smoother task management? Absolutely. Trusting it for rigorous research? Not so much.
While Gemini’s research mode aggregates information from the internet, accuracy and fact-checking aren’t making the headlines. Memory management issues and sluggish performance also keep it from being a true research powerhouse.
Source: G2.com Reviews
ChatGPT: power player with precision pitfalls
From code generation to market research, ChatGPT has become a daily go-to for professionals to brainstorm, generate content quickly, and answer complex questions. Yet, accuracy concerns persist.
Geopolitical topics and nuanced research often lead to misleading results. Context understanding is solid, but misinformation and hallucinations still plague users.
User reviews praise ChatGPT’s polished tone and contextual understanding, but this confidence often masks the occasional hallucination. Users highlighted its tendency to provide plausible-sounding but inaccurate information, especially in complex or nuanced scenarios like geopolitics. It’s a textbook case of “sounding smart but not always being right.”
Paid account users are impressed with its new multimodal inputs, voice interactions, and memory retention but also highlight its limitations in data analysis, image creation, and overall accuracy.
Overall, paid users find ChatGPT pricey compared to free alternatives on the market, citing server downtime and accuracy issues.
Source: G2.com Reviews
G2 reviews also surfaced how users go back and forth with ChatGPT to get their desired outcomes. At times, users ran out of allotted tokens quickly, leaving their queries unresolved.
Source: G2.com Reviews
But for some users, the benefits far outweigh the pitfalls. For instance, in industries where speed and efficiency are crucial, ChatGPT is proving to be a game-changer.
G2 Icon use case
Peter Gill, a G2 Icon and freight broker, has embraced AI for industry-specific research. He uses ChatGPT to analyze regional produce trends across the U.S., identifying where seasonal peaks create opportunities for his trucking services. By reducing his weekly research time by up to 80%, AI has become a critical tool in optimizing his business strategy.
“Traditionally, my weekly research could take me over an hour of manual work, scouring data and reports. ChatGPT has slashed this process to just 10-15 minutes. That’s time I can now invest in other critical areas of my business.”
Peter Gill
G2 Icon and Freight Broker
Peter advocates that AI’s benefits extend far beyond the logistics sector, proving to be a powerful ally in today’s data-driven world.
Perplexity: speed meets smarts — with a side of stumbles
Perplexity’s external web search capability and speedy updates have earned it a solid fanbase among researchers. Users praise its ability to provide comprehensive, context-aware insights. The frequent integration of the latest AI models ensures it remains a step ahead.
But it’s not all sunshine and summaries. Users flagged issues with data export, making it harder to translate insights into actionable reports. Minor UX improvements could also significantly elevate its user experience.
Michael N., a G2 reviewer and head of customer intelligence, stated that Perplexity Pro has transformed how he builds knowledge.
Source: G2.com Reviews
“Easiest way of conducting tiny and complex research with proper prompting.”
Vitaliy V.
G2 Icon and Product Marketing Manager
Business leaders and CMOs like Andrea L. are using different AI chatbots to either supplement, complement, or complete their research.
Source: G2.com Reviews
G2 Icon use case
Luca Piccinotti, a G2 Icon and CTO at Studio Piccinotti, uses AI to navigate complex market dynamics. His team uses AI to process vast amounts of data from surveys, social media, and customer feedback for sentiment analysis, helping them gauge public opinion and spot emerging trends. AI also streamlines their survey workflows by automating question generation, data collection, and analysis, making their research more efficient.
To translate insights into actionable strategies, Luca relies on predictive analytics to forecast consumer behavior, monitor competitors, and personalize marketing campaigns. His preferred AI tools? Perplexity for research and ChatGPT for managing and refining the data.
“Perplexity is our trusted companion for research purposes, while we use ChatGPT for managing the obtained data. We also use additional tools and wrappers, API, local models etc. But the unbeatable ones are Perplexity and ChatGPT at this moment.”
Luca Piccinotti
G2 Icon and CTO at Studio Piccinotti
Claude: a fairly honest, human-like, data-deficient counterpart
Claude’s conversational tone and contextual understanding shine through in reviews. Users appreciate its willingness to admit when it doesn’t know something rather than hallucinating a response. That level of transparency builds trust.
However, limited training data and capability gaps compared to competitors like ChatGPT remain areas for improvement. And while its strengths lie in conversational accuracy, its structured data analysis is still a work in progress.
Unlike most AI chatbots that confidently provide incorrect answers, Claude opts for transparency. This “honesty over hallucination” approach is a unique selling point, making it a preferred choice for users who value reliable feedback over speculative responses.
Source: G2.com Reviews
However, users also expressed frustrations around Claude’s professional mode, citing its usage bandwidth and lack of customer service.
Source: G2.com Reviews
Verdict: AI for research — yay or nay?
It’s a cautious yay — which is still better than the classic “it depends”.
AI chatbots are undeniably valuable research tools, especially for speeding up information gathering and summarizing. But they’re not flawless.
4 key takeaways
Hallucinations, accuracy issues, and inconsistent reliability remain challenges.
- Gemini can be your productivity sidekick, just not your research fact-checker. It suits research analysts who value integration and productivity over pinpoint accuracy.
- ChatGPT is a productivity booster for quick research tasks, but fact-checking remains a must, even if you’re paying top dollar for a subscription.
- Perplexity is a reliable knowledge companion for researchers who value speed and cutting-edge AI.
- Claude is the choice for those seeking honest, human-like responses, but don’t expect it to crunch complex datasets.
My tried-and-tested prompting hacks to avoid AI hallucinations
- Prompt structure = Be precise + give context + specify the desired outcome + warn it about what its output should not contain + share an example if possible
- Use a prompt that calls on AI’s chain-of-thought reasoning to check accuracy and identify hallucinations. Ask the AI chatbot: “Break down the steps you followed to produce this output. Also, can you explain your rationale for doing so?”
- Use templatization and follow organization-wide guidelines on using AI chatbots and LLMs for work
- Humans in the loop remain important, especially in high-stakes environments like legal research, market research, medical research, financial research, etc.
- Always verify and cross-check sources. We know life gets busy, but a quick check is always cheaper than a lawsuit!
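The prompt structure above can be sketched as a small helper. This is a minimal illustration of the formula (precise task, context, desired outcome, explicit exclusions, optional example, plus the chain-of-thought audit question); the function name and fields are our own, not part of any AI platform’s API:

```python
def build_prompt(task, context, outcome, avoid, example=None):
    """Assemble a hallucination-resistant prompt: precise task, context,
    desired outcome, explicit exclusions, and an optional example."""
    parts = [
        f"Task: {task}",
        f"Context: {context}",
        f"Desired outcome: {outcome}",
        f"Do NOT include: {avoid}",
    ]
    if example:
        parts.append(f"Example of a good answer: {example}")
    # Ask the model to expose its reasoning so the output can be audited.
    parts.append("Break down the steps you followed to produce this output, "
                 "and explain your rationale for doing so.")
    return "\n".join(parts)

prompt = build_prompt(
    task="Summarize Q1 produce-shipping trends in the U.S. Midwest",
    context="Audience: freight brokers planning trucking capacity",
    outcome="Five bullet points, each citing a verifiable source",
    avoid="Unsourced statistics or speculation",
)
```

Paste the resulting string into any chatbot; the final line nudges the model toward chain-of-thought reasoning you can check.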
Hallucinate less, verify more: avoid the AI tunnel vision trap
Expect AI models to double down on accuracy and transparency. Advances in multimodal AI and retrieval-augmented generation (RAG) could reduce hallucinations. Perplexity, OpenAI, Google, and Anthropic now have their own AI search capabilities, which will plug into real-time user data to sharpen the accuracy and relevance of outputs.
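The RAG idea is worth spelling out: retrieve relevant sources first, then force the model to answer only from them. A toy sketch, with naive keyword overlap standing in for a real embedding-based vector store (all function names here are illustrative):

```python
def retrieve(query, documents, k=2):
    """Rank documents by naive keyword overlap with the query.
    A production RAG system would use embeddings and a vector store."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def grounded_prompt(query, documents):
    """Build a prompt that confines the model to retrieved snippets,
    leaving less room for hallucination."""
    snippets = retrieve(query, documents)
    context = "\n".join(f"- {s}" for s in snippets)
    return (f"Answer using ONLY the sources below. If they are "
            f"insufficient, say so.\nSources:\n{context}\n\nQuestion: {query}")
```

The key design choice is the escape hatch: telling the model to admit when the sources are insufficient, rather than improvising an answer.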
Even though newer models like DeepSeek R1 are being built at one-tenth the cost of major competitors, their trustworthiness will determine their fate in the global market.
In the end, AI chatbots and LLMs are your research sidekick, not your fact-checker. Use them wisely, question relentlessly, and let the data — not the chatbot — lead the way.
Enjoyed this deep-dive analysis? Subscribe to the G2 Tea newsletter today for the hottest takes in your inbox.
Edited by Supanna Das