LLMs and Marketplaces
LLMs might be unreliable, but adding a more procedural regulatory layer can help you get the best out of them while limiting the downsides.
Long ago I wrote about a “paradox” (I now don’t know why I called it that) in marketplaces. This was back in the day when everyone and her uncle was building an “Uber for X”.
This is the fundamental paradox of service marketplaces. When services are reliable, you don’t need a marketplace. And since you need a marketplace only when services are unreliable, the supply side of the marketplace is full of unreliable providers. The hope, and the value that the marketplace adds, is that by aggregating a bunch of unreliable providers, some level of reliability can be guaranteed. The question is how sustainable this is.
Now I’m thinking about this in the context of LLMs. One of the first questions I get when I tell people that I’m building something based on LLMs is how I’ll deal with hallucination. “Business insights from data is a very important task, where you can’t go wrong”, they say.
This is a very valid question - LLMs are highly prone to making errors, prompting can be a nightmare, and their fundamental stochasticity means they are not consistent. And this means that exposing LLMs, either directly or through a fine-tuned model, can be fraught with inaccuracies.
Where the user has patience, they can work around this by prompting repeatedly and checking answers elsewhere until they get things right. Where they don’t, the LLM is simply unreliable.
And thinking about it, this is where the idea I wrote about way back in 2015 applies - except that I’m not dealing with human service providers here.
Where you are dealing with unreliable agents such as LLMs, there is massive value that can be added by a wrapper “executive function” layer.
To quote an old professor from my undergrad, “redundancy adds to reliability”. So if you are building a system USING LLMs, by adding a wrapper that brings in “executive functioning”, you can make the overall system do gymnastics that no single LLM reliably can.
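The professor’s maxim can be made concrete with a toy calculation. Assuming n independent agents that are each right with probability p, a simple majority vote is right more often than any single agent. (The function and the independence assumption are mine for illustration - real LLM errors are often correlated, which weakens the effect.)

```python
from math import comb

def majority_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent agents,
    each correct with probability p, is correct (n odd)."""
    k = n // 2 + 1  # votes needed for a majority
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A single agent that is right 70% of the time...
print(majority_accuracy(0.7, 1))   # 0.7
# ...becomes noticeably more reliable when five of them vote.
print(majority_accuracy(0.7, 5))   # ~0.837
```

Redundancy turns a 30% error rate into roughly 16% with just five voters - which is the whole bet behind aggregating unreliable agents.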
The idea is that the LLMs do the “creative” work, but there is a “checker” sitting on top of it which directs the creativity in the right direction, “gives repeat work” if necessary and regulates outputs from various LLMs (either in series or parallel) in order to produce a final output.
This “checker”, by definition, cannot be another LLM (that defeats the purpose of the model itself). It is more likely to be a much more traditional computer program. Think of it like an Uber regulating a whole bunch of unreliable drivers to provide a great experience to customers.
What we are building at Babbage Insight is not an LLM. It is a data science model to do data science. There are LLM-based tools that we will be drawing upon (including fine-tuned LLM models), but the ultimate system is a data science (friendly reminder that data science != machine learning) model that puts everything together to provide exceptional insights to our customers!
So maybe we can call ourselves an “Uber for X” after all? :P
PS: The wrapper layer can be thought of as a long put option. Combined with the stock (the LLM), it limits the downside while retaining upside.
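The protective-put analogy can be spelled out with a little arithmetic. In this sketch the “premium” maps loosely to the engineering cost of the wrapper; the function and numbers are illustrative, not a valuation claim.

```python
def protected_payoff(stock_price: float, strike: float, premium: float) -> float:
    """Payoff of stock + long put (a 'protective put'):
    downside is floored at the strike, upside is retained,
    minus the premium paid for the protection."""
    put_payoff = max(strike - stock_price, 0.0)
    return stock_price + put_payoff - premium

# Downside is floored at strike - premium, however badly the stock does...
print(protected_payoff(stock_price=40, strike=90, premium=5))   # 85
# ...while the upside passes through (less the cost of the wrapper).
print(protected_payoff(stock_price=150, strike=90, premium=5))  # 145
```

In the analogy: the LLM is the stock (volatile, with real upside), the checker is the put (caps how wrong things can go), and the premium is the effort of building and running the wrapper.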