Towards long-term memory recall with Kinetica, an LLM, and contexts

You’ve tried and tried to remember the name of someone you’ve lost, until one day you see something familiar and suddenly, everything clicks. Suppose we could reproduce “everything clicks” using GPUs.

Before the rise of modern machine learning, and particularly of “deep learning,” I was an ML skeptic.  Judging by the state of the art at the time, I believed there was no way to program a CPU or a GPU — each of which, after all, is just a sophisticated instance of a Turing machine — to make it exhibit behaviors that could pass for human intelligence.

It seemed like a sensible enough stance to take, given that I spent the bulk of a typical work week translating ambiguous customer requirements into unambiguous instructions a computer could execute.  An algorithm is a finite set of steps a program follows to reach a discrete result.  Neural networks had been around since the 1950s, yet most AI algorithms had been designed to follow a fixed set of steps, with no concept of training.

While machine learning does involve algorithms at a deep level, what a computer appears to learn through ML typically does not follow any comprehensible set of steps.  ML is a very different world: once a system is trained, unlike with an ordinary algorithm, it often isn’t clear what steps it is actually following.


The whole approach to algorithms in AI changed in 2012, when a University of Toronto computer science team led by Prof. Geoffrey Hinton used GPUs to train deep neural networks for the ImageNet computer vision competition.  Their convolutional model, which used rectified linear activations, beat the competition by a long shot.  By the next year, a majority of competitors had abandoned their procedural algorithms in favor of deep neural networks.

I think the most interesting aspect of this result is that it was completely unexpected.  For years prior, well-funded corporations had the resources to train a model like that.  Sure, there were neural networks, but they had shown success only at a very small scale.

You could make a similar case about the development of the Transformer architecture in 2017, and of GPT-3 in 2020.  Up to that point, the most powerful language models used LSTM networks, though they could not scale much beyond a few million parameters.  GPT-2, an early Transformer model, was impressive enough, yet it did not seem revolutionary.  Some people at OpenAI had a “crazy” idea: that spending $5 million to scale GPT-2 (1.5 billion parameters) up to GPT-3 (175 billion parameters) would be worth the investment.  They couldn’t have known in advance what the payoff would be.  Yet their results were astonishing:  The GPT-3 model exhibited emergent traits, performing tasks that had never even been considered for GPT-2.

The way all these events unfolded caused me to re-evaluate what I thought was possible.  I realized there was much we don’t understand about large ML models: for example, why, given a large enough number of parameters, they suddenly exhibit emergent traits and behaviors.  When a large language model exceeds about 7 billion parameters, for some reason, it behaves differently.  We don’t have any workable theories about why this happens, and we don’t yet have a good way to predict how or when these capabilities will suddenly change.

So many products and platforms have aimed to reduce or even eliminate the need for coding — Business Objects, Power BI, Tableau, and all those “fourth-generation languages.”  Yet once you’ve successfully “integrated” these tools into your workflow, they become bottlenecks.  They require decision makers to build the skills necessary to use them effectively, or else wait for analysts or data scientists to come in, analyze how the tools are being used, and write reports for them.  Still, I saw the potential for generative AI to empower anyone in an organization to query a database, giving users insights that would otherwise be inaccessible.

Associations and contexts

To better understand how AI handles semantics — the study of the meaning of words — we should turn our attention for a moment to the human brain.  In the study of neurology, semantics has been shown to have a definite, if unexplained, connection to people’s short-term memory.  The effectiveness of semantics in aiding recall from short-term memory, however, appears to be limited.

Consider a situation where you get an email from someone you haven’t seen in ten years or so.  You can look at their name and glean a few details about them from the message, yet remain stuck trying to recall their identity from your long-term memory, until you gather enough information from the message to trigger an association.  Short-term memory works sequentially, but long-term memory works through connections.  Sometimes it takes time for your short-term memory sequences to build enough connections to serve as a kind of context that triggers your long-term memory associations.

An LLM has its own context that’s analogous to a person’s short-term memory.  The size of this context is limited, since the computational requirements of Transformer models scale quadratically with the context’s length.  It’s important for us to use contexts effectively, because they enable us to condition the model for any given, specific inferencing query in a few seconds’ time, without needing to optimize and update the 16 billion parameters that comprise the model.
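That quadratic scaling is easy to see in a toy calculation.  The sketch below (an illustration, not any particular model’s actual cost formula) counts the token-to-token comparisons self-attention performs, since every token in the context attends to every other token:

```python
# Sketch: why context length is expensive in a Transformer.
# Self-attention compares every token with every other token,
# so the comparison count grows with the square of context length.

def attention_pair_count(context_length: int) -> int:
    """Number of token-to-token comparisons in one self-attention pass."""
    return context_length * context_length

for n in (1_000, 2_000, 4_000):
    print(f"{n:>5} tokens -> {attention_pair_count(n):>10,} comparisons")
```

Doubling the context quadruples the work (1,000,000 → 4,000,000 → 16,000,000 comparisons), which is why contexts can’t simply be made arbitrarily large.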

Today, Kinetica is working on a vector capability that will create an index of embeddings that collectively represent semantic bits of information, which can be easily modified.  These embeddings may represent many different types of data, including natural language, stock market data, or database syntax.  To make an LLM recall information associatively, we handle an inferencing request by querying the index with a semantic similarity search, then use the results to populate the short-term memory represented by the context.  The results we expect to see are the kind of associative recall of information, events, and even complex concepts that could never be attained from ordinary semantic search processes, like the typical Google search.
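The retrieval loop described above can be sketched in a few lines.  This is a minimal, self-contained illustration: a bag-of-words vector stands in for a real embedding model, the sample “memories” are made up, and none of this reflects Kinetica’s actual vector index API.

```python
import re
from collections import Counter
from math import sqrt

# Toy stand-in for a real embedding model: a bag-of-words vector.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Long-term memory": an index of embedded facts (illustrative data).
memories = [
    "Ada Lovelace wrote the first published algorithm",
    "GPUs accelerate matrix multiplication",
    "Kinetica is a GPU-accelerated analytics database",
]
index = [(m, embed(m)) for m in memories]

def recall(query: str, k: int = 2) -> list[str]:
    """Semantic-similarity search: retrieve the k closest memories."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Populate the LLM's "short-term memory" (the context) with what we recalled.
question = "which database uses GPUs?"
context = "Relevant memories:\n" + "\n".join(recall(question))
print(context + "\nQuestion: " + question)
```

In a production system the bag-of-words vectors would be replaced by dense embeddings from an encoder model, and the linear scan by an approximate-nearest-neighbor index, but the shape of the loop — embed, search, stuff the hits into the context, then run inference — stays the same.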

The evolutionary issue

Speaking for myself, I still maintain a healthy level of skepticism about the capabilities of modern AI systems.  You won’t see me jumping onto Elon Musk’s anti-AI bandwagon, or signing a petition to pause or halt AI research.  I believe humanity to be a far greater threat to itself than AI, and elevating AI to the level of an existential threat is a distraction from far more important issues that we face. 

That being said, the swift emergence of generative AI makes me wonder if we could create an LLM with the capacity for a large, persistent, and evolving context.  Such a data construct would have the ability to retain long-term memories, pose new questions of its own, and explore the world around it.  Of course, such an AI would have an ability that no system has today: the ability to evolve its own capabilities independently of outside programming or influence.  Normally, such a development would sound like it’s a long way off.  But considering the history of unexpected advancements, I’ll say instead:  Stay tuned.
