Artificial intelligence (AI) is ubiquitous and mesmerizing. 

You’ve undoubtedly heard of some of its headline achievements: In 1997, IBM’s Deep Blue defeated chess grandmaster Garry Kasparov; in 2011, IBM’s Watson beat Jeopardy!’s human champions. By 2016, DeepMind’s AlphaGo had smoked Go legend Lee Sedol by fusing tree search with deep learning. Smartphone cameras paired with AI can now create an augmented reality that distinguishes anomalous moles from benign ones, assisting dermatologists with melanoma diagnoses in real time. OpenAI’s ChatGPT can explain Bob Dylan’s approach to songwriting as if it were Mike Tyson (and vice versa) amusingly well. AI is even encroaching on creative domains many of us assumed would remain uniquely human, with generative models like OpenAI’s DALL·E 2 slinging out surreal, human-quality two-dimensional art.

Beyond these headline achievements, many less-touted AI applications are chugging along too. AI-assisted smart tractors use computer vision to track individual plants’ health, monitor pest and fungal activity, and even target precise bursts of herbicide at individual weeds. Understaffed and underfunded park rangers in Africa and Asia use PAWS (an AI system that predicts poaching activity) to fine-tune their patrol routes. Europe is widely adopting autonomous robotic lawnmowers, and they’re catching on in the United States too. AI’s leaps and bounds are as impressive as they are dizzying to keep up with.

But think back to when you first learned of (or used) your favorite AI application, one that genuinely impressed you. Let’s call it application A. Maybe you’ve since grown disenchanted with A, but when you first encountered it, did you find A intelligent? Or even close to intelligent? However useful AI applications can be, when we tinker with them we usually don’t feel that we’re interacting with intelligence.

And ample evidence supports our hunch. Game-playing AIs like AlphaGo or Deep Blue falter when you slightly tweak the board dimensions they were built or trained for; humans adapt their gameplay to such alterations with relative ease. As accurate as large language models (LLMs) can be, within about ten minutes of experimenting with one you’re bound to discover the limitations of using a colossal training corpus to predict the most probable next words without understanding those words’ underlying semantics. Computer vision has come a long way, too, but autonomous lawnmowers still sometimes maim hedgehogs frozen in fear, critters that humans easily spot and avoid. If you peer behind AI’s remarkable feats, you’ll find many glaring shortcomings.
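
To make “predicting the most probable next words” concrete, here is a deliberately tiny sketch: a bigram model that counts which word follows which in a toy corpus and then picks the most frequent follower. Real LLMs use neural networks trained on vast corpora and long contexts rather than raw bigram counts, so treat this only as an illustration of prediction from co-occurrence statistics. The corpus and the helper names `train_bigrams` and `predict_next` are hypothetical, made up for this example.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for each word, how often every word immediately follows it."""
    words = text.lower().split()
    followers: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers: dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if `word` was never seen."""
    counts = followers.get(word.lower())  # .get avoids creating empty entries
    if not counts:
        return None
    # most_common(1) -> [(word, count)]; equal counts keep first-seen order
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus: the model sees only word sequences, never hedgehogs.
corpus = "the hedgehog sleeps in the grass and the mower cuts the grass"
model = train_bigrams(corpus)
print(predict_next(model, "the"))       # grass (follows "the" twice)
print(predict_next(model, "hedgehog"))  # sleeps
print(predict_next(model, "cat"))       # None (never seen)
```

The model “knows” that grass tends to follow the only because that pair appears twice; it has no notion of what grass or a hedgehog is. Modern LLMs condition on much longer contexts with learned representations, which is why they are vastly more fluent, but their pretraining objective is still largely next-word (next-token) prediction.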

Imagine a continuum: traveling toward one end brings us closer to some superintelligence; traveling in the opposite direction brings us closer to literal stones. Someday we may reminisce about the “good ol’ days” when AI was less intelligent, but for now nearly all our AI systems could benefit from crawling toward superintelligence. At the very least, we’d like gaming AI that can handle different board sizes, chat apps that grasp the concepts we map to words and sentences, and lawnmowers that steer around balled-up hedgehogs resting in the grass.

Why—after the many advances over the past few decades—do AI applications still seem so far off from our intuitive sense of intelligence? And does it even matter?

We’re Making So Much Progress. We’ll Get There Eventually—Right?

Most recent AI advances lean hard on deep learning. If deep learning has carried us this far, it seems perfectly reasonable to assume it will carry us much further still. While most deep learning applications are adept at one specific task (often called “narrow artificial intelligence”), the kind of intelligence we’re after is something akin to, or exceeding, human-level intelligence (often dubbed “artificial general intelligence,” or AGI). Drawing on evidence of LLMs learning things they weren’t specifically designed to learn, Gwern Branwen’s “Scaling Hypothesis” makes a compelling case that as we run more data through larger models, “ever more sophisticated behavior will emerge.” Given that our minds somehow emerge from the neural activity in our brains, and given that artificial neural networks (very, very loosely) mimic this neural activity, it’s not outlandish to suggest that AGI could emerge from some amalgamation of deep learning-based narrow AI models, refined far beyond their current capabilities.

If this is the case, we face an iterative engineering problem. Like the engineers who refine Rolls-Royce turbines year after year, we can happily keep tweaking neural network architectures, scrounging up more data, tacking on more parameters, and banking on future hardware innovations until, eventually, we reach AGI (or beyond). With so many hobbyists, research institutions, startups, and multinational companies fine-tuning deep learning applications, we’ll eventually figure it out, right? Maybe. But what if we start running into significant energy, hardware, or data-quality limits? Or what if AGI simply fails to emerge from artificial neural networks?

We Might Need More Than Deep Learning

In the past few years, some wary voices have sprouted up in an AI landscape rife with deep learning breakthroughs. And no, these oft-dismissed voices ain’t Luddites, nor are they AI doomsday alarmists. They’re AI practitioners and researchers with some grease on their sleeves, asking us to clamber above the treeline, re-shoot our azimuths, and get our bearings, because deep learning may not be able to carry us all the way on its own. While few researchers believe deep learning is the only answer, we’re tossing most of our chips (funding, GPUs/TPUs, training data, and PhDs) on it, and if it turns out that we only ace narrow intelligence, we’ll have merely developed souped-up automation.

Souped-up automation is incredibly useful, of course. I frequently use YouTube’s automated captioning and translation to watch a Turkish series. YouTube’s Turkish-to-English translation is garbled, laughable even. But combined with the video footage, that garbled translation gives me enough context to enjoy the show. You’ve likely encountered Siri’s or Alexa’s many shortcomings; they’re flawed enough to routinely make you chuckle (or curse). But because they reliably pull off tasks like retrieving factoids, songs, or weather forecasts, we find them helpful enough to fork over a few hundred bucks for. What would be better, though, are automated captioning and translation systems (and LLMs) that understand linguistic nuance beyond the likelihood that certain words co-occur with other words, the way a United Nations interpreter does during nuclear nonproliferation treaty negotiations. Or virtual assistants that can bring their superior data processing capabilities to bear and reason a bit like humans, so that they can augment our decision-making. For these sorts of applications, we need to move closer to AGI. And it’s AGI that some researchers suggest we could remain far from if we don’t sufficiently explore beyond deep learning approaches.

What Would AI Need to Be More Intelligent?

So far, we’ve assumed that most AI applications have miles to traverse toward the “more intelligent” side of our continuum, and we’ve considered that deep learning, alone, might not carry us to our destination. Now it’s time for the fun stuff. Let’s brainstorm some broad principles that would make AI more intelligent.

First, AI shouldn’t be a one-trick pony; it ought to be multifaceted. What this looks like depends on the application, but digital assistants serve as a decent example because they can both process language and retrieve knowledge, whereas a model that helps detect cancerous skin moles excels at only one task. Relatedly, AI should be multimodal, so that combining several sensory modalities performs better than the best single modality on its own. An autonomous vehicle equipped with both computer vision and ultrasonic sensors should perform better than it could with whichever of the two performs better alone. Multimodality and multifaceted AI are both active research areas.
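
One well-understood case where combining sensors beats the best single sensor is inverse-variance weighting. If two sensors give independent, unbiased estimates of the same quantity (say, the distance to an obstacle) with known noise variances, weighting each estimate by the inverse of its variance yields a fused estimate whose variance, 1/(1/var_a + 1/var_b), is lower than either sensor’s alone. The sketch below is a simulation under stated assumptions: the helper `fuse`, the noise variances, and the true distance are all hypothetical, and real autonomous-vehicle sensor fusion (Kalman filters, learned fusion networks, and so on) is far more involved.

```python
import random


def fuse(est_a: float, var_a: float, est_b: float, var_b: float) -> tuple[float, float]:
    """Inverse-variance weighted average of two independent, unbiased estimates."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    fused_est = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)  # smaller than both var_a and var_b
    return fused_est, fused_var


def mse(errors: list[float]) -> float:
    """Mean of the squared errors."""
    return sum(e * e for e in errors) / len(errors)


# Hypothetical noise variances (m^2) for a camera and an ultrasonic sensor
VAR_CAMERA, VAR_ULTRASONIC = 0.04, 0.09
TRUE_DISTANCE = 2.0  # metres to an obstacle (made-up value)

random.seed(0)  # fixed seed so the simulation is reproducible
cam_err, us_err, fused_err = [], [], []
for _ in range(10_000):
    # random.gauss takes a standard deviation, hence the square roots
    cam = random.gauss(TRUE_DISTANCE, VAR_CAMERA ** 0.5)
    us = random.gauss(TRUE_DISTANCE, VAR_ULTRASONIC ** 0.5)
    fused, _ = fuse(cam, VAR_CAMERA, us, VAR_ULTRASONIC)
    cam_err.append(cam - TRUE_DISTANCE)
    us_err.append(us - TRUE_DISTANCE)
    fused_err.append(fused - TRUE_DISTANCE)

mse_cam, mse_us, mse_fused = mse(cam_err), mse(us_err), mse(fused_err)
theory = fuse(0.0, VAR_CAMERA, 0.0, VAR_ULTRASONIC)[1]
print(f"camera only:     {mse_cam:.4f}")
print(f"ultrasonic only: {mse_us:.4f}")
print(f"fused:           {mse_fused:.4f}  (theory: {theory:.4f})")
```

With these made-up numbers, the fused error should land near the theoretical 1/(1/0.04 + 1/0.09) ≈ 0.028 m², below the better sensor’s 0.04 m². The gain hinges on the sensors’ errors being independent; correlated errors shrink it.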

Next, AI models should generalize beyond their training data and transfer knowledge from familiar domains to new ones. Suppose, for example, that a zoologist discovers an unknown species. She’ll draw on what she knows about similar species’ appearances and behaviors, generalizing to decide what to make of this novel organism before placing it in an appropriate category (mammal, reptile, fish, etc.). Deep learning models are apt to falter at the same task if the new species differs too much from their training data. But Stanford adjunct professor and Matroid CEO Reza Zadeh believes that recent generative AI advances hold potential here. For example, an image classification model lacking a photo for the label “hippopotamus snowboarding a halfpipe” might generate its own image for that label and then request human feedback on how well the generated image matches the odd phrase. This could reduce the amount of training data and time models need to learn.

Perhaps the most important, and most difficult, intelligence trait we’d like to engineer is AI’s holy grail: machine “common sense.” Because we take common sense for granted, it’s a fuzzy concept to pin down. Dr. Howard Shrobe, program manager of the Defense Advanced Research Projects Agency’s (DARPA) $70 million “Machine Common Sense” project, sees three components in common sense:

  1. Intuitive physics: a sense of how objects move in one’s environment

  2. Intuitive psychology: a sense of how other agents interact and behave

  3. General knowledge: some set of general knowledge that most adults possess

You can effortlessly judge billiard balls’ paths and interpret your friend’s furrowed brow as worry thanks to intuitive physics and intuitive psychology, respectively. Given that we develop sophisticated intuitive physics and psychology as infants and toddlers, before we’ve enjoyed many training epochs of our own, it seems a great deal is baked into our brains’ wetware. Perhaps owing to his research in developmental psychology, Gary Marcus, professor emeritus at New York University, has been a tireless advocate for AI approaches that mimic the role that (we think) innateness plays in human cognitive development. And he’s not alone in this view: DARPA’s “Machine Common Sense” project similarly aims for machines to mimic a six-month-old human’s learning processes. Even computing pioneer Alan Turing argued that simulating a child’s mind was preferable to simulating an adult’s.

Could a Hybrid Approach Get Us There?

Early AI, which primarily used systems of symbols to hardcode logic (an approach called symbolic AI), proved brittle enough that most researchers set it aside years ago. Marcus believes, however, that hybrid approaches fusing symbolic AI with deep neural networks could combine the best of both worlds. Henry Kautz, professor emeritus at the University of Rochester, believes such hybrid (also called neurosymbolic) approaches could harness Daniel Kahneman’s notion of System 1 and System 2 thinking. Deep neural networks roughly correspond to humans’ quick, intuitive, often sensory thinking (System 1), while symbolic AI roughly corresponds to our slower, methodical thinking (System 2). You use System 1 thinking, for example, when you drive to work; you’re nearly on autopilot. But suppose you take a road trip with your best friend, hashing out the meaning of life together. That’s not exactly an autopilot scenario (unless you’ve got it all figured out), so you’d be employing System 2 thinking.

Though symbols were mostly tossed out as symbolic AI faded away, they are undeniably efficient shortcuts for understanding and transmitting concepts. We use them every time we speak, read, and write, so AI ought to exploit them. Symbol-like features sometimes emerge in deep learning approaches; convolutional neural networks (CNNs), for example, pick up on image features like outlines. Unfortunately, most current deep learning approaches don’t harness symbols’ full power. Symbolic approaches have the complementary gap: humans constantly map raw sensory input (sight, sound, smell, taste, touch, and emotion) to our symbols, so we ought to infuse our symbolic approaches’ symbols with perceptual meaning à la deep learning.

Marcus’s pleas to dedicate more firepower to neurosymbolic AI seem worth heeding, but are there any proofs of concept? While not nearly as publicized as pure deep learning’s achievements, neurosymbolic approaches aren’t sitting on the sidelines. First, Marcus argues that AlphaGo’s melding of deep learning with symbolic tree search qualifies as a neurosymbolic approach. Additionally, in 2018, Ellis et al. developed a neurosymbolic model that uses CNNs to convert hand-drawn images into flawed but human-readable graphics programs. While humans must verify these programs’ correctness, it’s exciting to see a CNN yield a human-interpretable symbolic system more complex than an image outline. Finally, in 2020, Cranmer et al. developed a technique that enlisted graph neural networks to automatically extract symbolic expressions from data, discovering a novel formula for predicting dark matter concentrations. Neurosymbolic approaches, it seems, have significant potential.

Paths Forward

OK, we’ve now grappled with why AI still falls short of our intuitive sense of intelligence despite its many victories. Deep learning will certainly keep yielding novel and useful applications, but it’s unlikely to be the lone vessel in our voyage toward AGI. Given how multifaceted intelligence is, an armada of approaches seems more appropriate than going all in on deep learning. What exactly these other ships might be remains to be seen. A neurosymbolic approach, combining artificial neural networks and symbols to mimic our “fast” and “slow” thinking, seems promising, though it could also be a dead end. The main thing we need to do, particularly since AI is so young, is guard against flippantly tossing out approaches and instead treat this as an exploratory, path-finding phase in which we experiment with a kaleidoscope of different approaches.

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.