Artificial Intelligence development moves quickly. Despite its first-mover advantage, OpenAI has quickly fallen behind Google, both in image generation and in LLM performance. This shouldn't surprise those who have been in the AI space over the past few decades: many already knew these companies were quietly building their own LLMs and testing their capabilities. Like an amateur poker player betting out of turn, OpenAI shoved its entire stack when it released GPT-3 to the public, forcing the other players to call, raise, or bluff in return. The speed of this development comes with costs: inaccuracies, model biases, and sluggish regulation.

Search Engines versus Answer Engines

Some months ago, Cloudflare published its 2025 Annual Founders’ Letter outlining various concerns about AI, namely how search engines are being replaced by ‘answer engines’ and the resulting shifts in how people have used the Internet over the past fifteen years. Many of us are familiar with the phrase “just google it”, by which we mean to search for the answer to a question online. Almost any question could be googled, and one could then go about solving the problem based on the answers obtained. At present, the equivalent phrase has become “ask GPT”; though ‘ask’ is a bit of a misapplication of the verb here. What we get the LLM to do is more than just ‘answer’ our query: it ends up writing the email for us; using formulas and tricks we never could to solve a math problem; generating a bibliography with authors and citations neatly listed in our desired citation format. LLMs answer our queries, but they also execute our follow-up instructions—all too confidently.

Depending on your prompt, LLMs can generate responses within seconds. Unless you are familiar with the topic being queried, or are actively thinking about the accuracy of the output, the sheer confidence with which the model gives its reply can convince many people that it hasn’t made a mistake. This overconfidence is well documented and inherent to popular models due to how they were trained. In other words, it’s a feature, not a bug. LLMs also retain a stubborn tendency to hallucinate answers (see table below), and current models remain vulnerable to sycophancy. In fields and topics that are under-represented in AI training corpora, accuracy falls further still (garbage in, garbage out). Put simply, the combination of confidence and sycophancy reinforces confirmation biases and discourages users from verifying the model’s (inaccurate) answers.


  • SimpleQA: A diverse dataset of four thousand fact-seeking questions with short answers that measures model accuracy on attempted answers.

  • PersonQA: A dataset of questions and publicly available facts about people that measures the model’s accuracy on attempted answers.


Table 1: Hallucination evaluations, taken from OpenAI (2025)

| Dataset  | Metric                                | o3   | o4-mini | o1   |
|----------|---------------------------------------|------|---------|------|
| SimpleQA | accuracy (higher is better)           | 0.49 | 0.20    | 0.47 |
| SimpleQA | hallucination rate (lower is better)  | 0.51 | 0.79    | 0.44 |
| PersonQA | accuracy (higher is better)           | 0.59 | 0.36    | 0.47 |
| PersonQA | hallucination rate (lower is better)  | 0.33 | 0.48    | 0.16 |
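To make the two metrics concrete, below is a minimal sketch of how such figures could be computed from per-question grades. The grading labels and denominators are my own assumptions for illustration; OpenAI’s system card may define the denominators differently (for example, over attempted answers only rather than over all questions).

```python
from collections import Counter

def hallucination_metrics(grades):
    """Compute accuracy and hallucination rate from per-question grades.

    `grades` holds one label per question: "correct", "incorrect", or
    "abstained" (the model declined to answer). The definitions below are
    illustrative assumptions, not OpenAI's exact ones.
    """
    counts = Counter(grades)
    total = len(grades)
    attempted = counts["correct"] + counts["incorrect"]
    return {
        "accuracy": counts["correct"] / total,              # correct answers over all questions
        "hallucination_rate": counts["incorrect"] / total,  # wrong answers over all questions
        "abstention_rate": counts["abstained"] / total,     # questions the model declined
        "accuracy_on_attempted": counts["correct"] / attempted if attempted else 0.0,
    }

# Toy example: 10 questions, 5 correct, 4 wrong, 1 declined.
print(hallucination_metrics(["correct"] * 5 + ["incorrect"] * 4 + ["abstained"]))
```

Under this reading, a model’s accuracy and hallucination rate need not sum to one; the gap corresponds to questions it declined to answer.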

A leaderboard from Vectara ranks the hallucination rates of various popular LLMs after they gained “reasoning” updates, with some showing a double-digit increase in hallucination rate.

Figure: Vectara hallucination leaderboard

The Second Digital Divide

I have not seen much concern for the people who do not know how to locate information effectively—be it online or in physical books and manuscripts. This set of people likely overlaps substantially with those who do not verify the sources of the information they read, those who lack a sufficient command of the English language to read such sources, and those who lack the attention span required to locate the necessary information within an article and cross-check it against other sources. Those who do not check their sources are also vulnerable to perpetuating false information that gets picked up by other AI models—a two-way failure that creates a dangerous, self-reinforcing feedback loop of misinformation and inaccuracy. Thus, it is natural that as AI becomes more integrated into our everyday lives, those who are technologically illiterate will fall increasingly far behind the rest of society.

Apart from those who want to use AI agents, like LLMs, in their everyday lives, there are also groups who refuse to learn them, or who insist on sustainability grounds that AI is harmful overall. They believe in maintaining the status quo because it works, and may prescribe their philosophy to others. Others simply have more grounded takes on what AI can offer, flaws and all. These opinions are reminiscent of how many felt about the Internet over 20 years ago, as documented by the Pew Research Center:


Concerns of the digital have-nots

This generational story is often overlooked in discussions of the digital divide. Significant numbers of non-users cite issues besides the cost of computers and Internet access as problems when they think about the online world.

  • 54% of those not online believe the Internet is a dangerous thing.
  • 51% of those not online say they do not think they are missing anything by staying away from the Internet.
  • 39% of those not online say the Internet is too expensive.
  • 36% of those not online express concern that the online world is a confusing and hard place to negotiate.

Come 2025, if you live in an urban environment, the Internet has been fully integrated into your life—sometimes needlessly so. 1 AI development follows this pattern, but at a much quicker pace. Driven by promises of increased productivity and profit margins, corporations have already replaced some entry-level roles with AI agents, though whether these agents truly deliver cost and time savings remains murky. On the consumer side, many are already complaining that AI agents are being needlessly shoehorned into the applications and programs we use today, much to the detriment of user experience and product design. Some might say the similarities are slightly uncanny: the visions of progress and skyrocketing productivity promised by AI parallel the hopeful aspirations many had of the Internet, just as today’s cynical criticisms of the technology echo public sentiment from 30 years ago.

Individual Agency and Safeguards


The computer is the ultimate polluter: Its feces are indistinguishable from the food it produces.

—Alan Perlis, Epigrams on Programming

Another problem I haven’t seen discussed in depth is the biases models acquire from their training data or model parameters. Perhaps this is still a work in progress, since most LLMs remain black boxes to us at present. Some progress has been made, though: some popular AI models have been found to exhibit partisan bias when discussing politics. 2 I won’t be surprised if certain AI models are also found to skew their responses when recommending particular websites, businesses or services. I’m willing to predict that some companies will eventually be exposed for using dark patterns in their LLMs to quietly promote or suppress certain brands or websites in their responses.

In the arts and cultural sectors, creatives who have a more direct stake in the AI debate have put forth a different narrative in defense of their work against its use for AI training and replication: the ‘democratization’ of creativity belittles the expertise, decades of time, and countless setbacks many artists have faced while improving their craft. In contrast, AI art is built on the virtues of efficiency, scalability and profit maximization, diluting the creative space with subpar, ‘Kinkade-esque’ replicas without a ‘soul’. These replicas are often generated from training data that includes the works of artists who go uncredited or whose copyright is infringed. If we use the digital streaming and media industry as a reference point, we can expect creatives to continue to be shafted by these tech giants (take Spotify as an example).

Given the scale and speed of these challenges, is government policy catching up? Some governments have begun to respond, albeit reactively. Japan’s AI Promotion Act, enacted in May 2025, sought to foster AI research, investment and implementation. Yet within months the government formally requested that OpenAI refrain from “[engaging] in… copyright infringement” after Sora 2 was found to have used copyrighted material from Japanese manga and anime studios in its training data. This case is part of a pattern: policy frameworks start out optimistically broad, with gaps; specific dangers or loopholes are then found and patched along the way. While Japan’s willingness to both pave the way in AI regulation and stand up to an AI giant is admirable, the incident highlights the challenge of writing regulations for a field evolving faster than legislators can follow. For everyday individuals concerned about protecting and preserving their likeness, waiting for comprehensive government policy may be a risky bet.

As individuals, surely there is something we can do if we want to stand out from the noise, or to protect ourselves from being appropriated by tech corporations without our consent. Personally, my answer is to build a dedicated platform for sharing thoughts and ideas. Whatever form it takes, a dedicated platform distinguishes your voice from others and makes it easy for people to find you, your likeness, or information about you. Continually updating that platform adds to and preserves your likeness, which is part of the solution. At least at the individual level, putting your voice out there is substantially more effective than inaction.

Comments on AGI Development

Without going into depth, current popular AI models are good at finding and predicting patterns. Common features across models include training on massive datasets (large corpora of text, large codebases, text-image pairs), followed by fine-tuning the model’s behavior via human feedback (often using reinforcement learning). The models are retrained until they meet certain goals optimally within constraints (maximizing some objective function given budgets on time, compute, input data, etc.).
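For the pretraining half of that pipeline, a heavily simplified sketch in PyTorch is given below: the core objective is simply predicting the next token and minimizing cross-entropy loss. Everything here (the vocabulary size, the tiny stand-in network, the random "corpus") is a toy placeholder of my own, and the human-feedback stage is omitted entirely.

```python
import torch
import torch.nn as nn

# Toy next-token prediction, the core pretraining objective.
# Vocabulary, network and data are placeholders; real models use subword
# tokenizers, billions of parameters and web-scale corpora.
vocab_size, dim = 1000, 64

model = nn.Sequential(
    nn.Embedding(vocab_size, dim),     # token IDs -> embedding vectors
    nn.Linear(dim, dim), nn.ReLU(),    # stand-in for the transformer stack
    nn.Linear(dim, vocab_size),        # logits over the next token
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 33))   # batch of 8 toy "sentences"
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from token t

for step in range(100):  # "retrain until the objective stops improving"
    logits = model(inputs)                        # shape: (8, 32, vocab_size)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The human-feedback stage then nudges a pretrained model like this towards preferred behaviour using a learned reward signal, which is arguably where much of the confident, agreeable tone discussed earlier is shaped.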

Would you consider a species of mold to be intelligent? Depending on your definition of intelligence, it very well could be. Like you, LLMs can figure out the next word in a phrase or sentence: try to figure out the word that fills the blanks in the phrases “skin as ____ as snow” and “life isn’t all black and ____”. As long as you are decently familiar with the English language, you would have guessed the word was “white”. Unlike you, however, LLMs do not “understand” what the word means. They treat words as ‘tokens’, which are mapped to sets of numbers known as embeddings. These token–embedding pairs allow the model to identify patterns: related tokens have similar embeddings, the way the word “father” sits close to words like “family” and “grandfather”. Models can also capture relationships between words: take the word-pairs “king–prince” and “father–son”, where the relationship within one pair mirrors the relationship within the other. If you ask an LLM what “father” means, it can give you a definition, like any human being could. However, it can only ever describe this concept in terms of textual relationships, since that is all it is ever exposed to. Herein, I am not claiming that there is any unique kind of knowledge which stands to be gained from experience, or that a posteriori knowledge is a necessary condition for intelligence. I am only saying that the kinds of output we can expect from our models are correlated with the kinds of input we expose them to.
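A minimal sketch of the embedding idea above, using cosine similarity: the three-dimensional vectors and the tiny vocabulary are invented purely for illustration (real models learn embeddings with hundreds or thousands of dimensions from data), but the mechanics of "related tokens have similar embeddings" are the same.

```python
import numpy as np

# Invented 3-dimensional "embeddings", purely for illustration.
embeddings = {
    "father":      np.array([0.9, 0.8, 0.1]),
    "grandfather": np.array([0.8, 0.9, 0.2]),
    "family":      np.array([0.7, 0.7, 0.3]),
    "snow":        np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Higher value = the two tokens occupy closer regions of the space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "father" sits much closer to "grandfather" and "family" than to "snow",
# which is all the model ever records about what the word means.
for word in ("grandfather", "family", "snow"):
    print(word, round(cosine_similarity(embeddings["father"], embeddings[word]), 3))
```

Nothing in those numbers tells the model what a father is; they only record how the token tends to relate to other tokens in the training text.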

If we seek to develop some inkling of AGI, I believe a necessary step is unifying robotics with AI so that the latter gains some ability to take in empirical information from its environment. This would allow for learning beyond textual relationships, and a more comprehensive understanding of real-world physics. I’m not confident, though, that current architectures allow for the kind of real-time information processing and long-term memory needed to store and learn from previous information at levels mimicking humans. To be precise, even if human understanding is pattern-based, our brains enable us to process, learn, and store large amounts of information quickly. I do not foresee current architectures allowing for such capabilities, especially when they fail to learn iteratively and are too reliant on clean and tagged data. 3 Upon further consideration, perhaps we don’t want our AI agents to mimic humans: our brains sacrifice accuracy for processing speed, exploiting heuristics at the cost of succumbing to cognitive biases and prejudice. Given the earlier-mentioned problems with agency and the lack of safeguards, it may be wise to err on the side of caution when developing models that may attain AGI status.

My Prediction

It is also possible that it is still too early to tell: AI research has been ongoing for decades, but it has only ramped up in the past few years because compute and the availability of clean data lagged behind until recently. In other words, our nascent AI models may not be representative of the full capacity of current resources and model architecture. That being said, in the spirit of making falsifiable bets, I am willing to make a prediction:


By December 2030, whether by design or due to resource limitations, we will not see AGI develop to a sufficiently reliable and cost-effective state to be run on present software architecture for highly accessible public use.

To be precise, I consider “highly accessible public use” to be akin to current LLMs that can be run locally on commercially available smart devices (Ollama, LM Studio; see the sketch after the list below) or interfaced with via an API-backed system (GPT-4, etc.). What I consider to be “AGI” is more difficult to pinpoint exactly and continues to be a major topic of controversy. In general, researchers list the following characteristics as necessary conditions for an AI model to be considered AGI: 4


  • reason, use strategy, solve puzzles, and make judgments under uncertainty,
  • represent knowledge, including common sense knowledge,
  • communicate in natural language,
  • learn,
  • plan,
  • if necessary, integrate these skills in completion of any given goal.
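As for the “run locally” benchmark mentioned above, the sketch below queries a locally hosted model through Ollama’s REST API. It assumes an Ollama server is running on its default port with a model already pulled; the model name “llama3” and the prompt are placeholders of my own.

```python
import json
import urllib.request

# Minimal sketch: ask a locally running Ollama server (default port 11434)
# a single question. Assumes `ollama serve` is running and the named model
# has been pulled beforehand; model name and prompt are placeholders.
payload = json.dumps({
    "model": "llama3",
    "prompt": "In one sentence, what is a large language model?",
    "stream": False,   # return one JSON object instead of a token stream
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    reply = json.loads(response.read())

print(reply["response"])  # the generated text
```

Swapping the endpoint and payload format for a hosted service’s API is the other half of what I mean by “highly accessible”.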

For the purposes of this prediction, I consider an AI that is able to independently gain and generalise knowledge beyond its training corpora, transfer skills learned between domains, and solve novel problems without task-specific reprogramming to be sufficient to prove me wrong. It need not be as fast as an adult human, but it must be able to solve novel problems accurately within a reasonable duration and show improvements in efficiency over time—something like taking no more than two orders of magnitude longer than a human, at 80% accuracy, is sufficient. Admittedly, these values are completely arbitrary and simply outline the boundaries necessary for the fulfillment of my prediction. This prediction risks overstepping the cautionary tone and position I usually hold. Nevertheless, transparent and declarative predictions offer some kind of stake for properly reviewing one’s understanding of a topic—in this case, the development of artificial intelligence in relation to human intelligence—while holding oneself accountable for one’s mistakes.

This prediction’s fulfillment could imply hard limitations in transformer architecture that prevent AI models from achieving the general and versatile intelligence humans possess, and that alternative approaches might be more worth investigating. Its violation could reveal oversights in my understanding of AI systems and their development, an underestimation of how quickly resource shortages (compute, clean and tagged data) can be solved, my (naïve) overestimation of human intelligence, or some other limitation I had not foreseen. Either outcome would be highly informative. Lastly, there remains the prudent question of whether we are sufficiently prepared to secure, contain and protect ourselves from a rogue or malicious AGI (or artificial superintelligence) should one emerge, and whether we are sufficiently prepared to deal with the (economic, environmental, etc.) consequences that can arise from excessive investment in the technology. 5

Footnotes

1. Some IoT devices depend on cloud providers so heavily that an outage can cause them to malfunction rather than simply switch to an offline mode. Others have found it easier to sideload games onto their smart fridges before those games can even be launched on an iPhone.

2. Harrison, S. (2024, October 16). Popular AI Models Show Partisan Bias When Asked to Talk Politics. Stanford Graduate School of Business; Stanford University. https://www.gsb.stanford.edu/insights/popular-ai-models-show-partisan-bias-when-asked-talk-politics

3. Jones, N. (2024). The AI revolution is running out of data. What can researchers do? Nature, 636(8042), 290–292. https://doi.org/10.1038/d41586-024-03990-2

4. Taken from Wikipedia. This list of intelligent traits is based on the topics covered by major AI textbooks, including: Russell & Norvig 2003, Luger & Stubblefield 2004, Poole, Mackworth & Goebel 1998 and Nilsson 1998.

5. See AI bubble; Xiao, T., Nerini, F. F., Matthews, H. D., Tavoni, M., & You, F. (2025). Environmental impact and net-zero pathways for sustainable artificial intelligence servers in the USA. Nature Sustainability, 8(12), 1541–1553. https://doi.org/10.1038/s41893-025-01681-y