Agentic User Parsers
June 5 2024
Summary
  • AI tasks can be defined as nodes in order to stack or connect abstract processes.
  • Nodes can be anything: LLMs, compute, services, or other models.
  • Nodes can be detached from a model or class. This promotes economic flexibility and reduces context cost.
  • Standards can be created for nodes and node-connectors, and can be used to then define automatic fine-tuning, or inputs to other services.
  • Parsing user data is a nuanced problem with lots of trade-offs, and there is little research on it. A deeper dive into interesting parsing strategies is laid out below, for turning user input like "do X" into actionable instructions for a computer.
  • Fully-autonomous agents need strong, flexible building blocks to become self-expressive. Nodes are a great simple-to-understand and simple-to-implement way of encouraging self-editing.
  • By using expansive reductionism (introduced below), models can achieve intelligence as well as strict output formats (coercion) without losing creativity or increasing mistake rates.
Business Summary / Impact Statement
  • Businesses can define their creative thinking processes as nodes, and their more precise, coercive processes as other nodes which require more oversight and checking.
  • Intelligence Programming can emerge as a practice, allowing saleable applications which have thought + input/output based on that thought baked into their product offering, rather than just input/output.
  • AI models and cohesive applications can become interoperable, creating a new market for AI services and applications. As well, not a lot of emphasis has been placed on project interop, and a lot of AI projects are siloed. Interoperability could create a new class of meta-application.
  • As mentioned above, in order to achieve highly skilled AI agents that are self-affirming (huge business leverage) - nodes and standards must be put in place which are flexible enough to build out the intelligent feedback loops. This is an important step in the agent roadmap.
Introduction
AI workflows are not immediately easy to understand, and the effort to learn them can be hard to justify. LLM parsing of data into consistent, readable formats is an even broader category. This article aims to explore why LLM workflows emerge, and approaches for building better parsers.
Briefly - the main parsing methods I've encountered to date for parsing or coordinating agentic workflows are:
  • One-shot parsing with pre-prompting: Heavy pre-prompting to attempt to coerce model to respond in a required format. E.g. JSON, other, or domain-specified.
  • Keyword parsing: Look for certain keywords to indicate events.
  • Expansive reductionism: Prompt, expansion (to domain), optional middleware workflow, then reduction (node-based), and optional domain mapping (from abstract to specific domain).
Something to consider is whether parsing is even required. Parsing is to be used when an agentic workflow needs a component for parsing input - whether from a user, another agent, or a remote source - to create a parseable action.
What is an "agentic" workflow, and why do they need parsers?
A technical answer would leave the reader with more questions. A narrative is required. Currently, models are trained on large amounts of raw data. This data is in many formats, but the underlying format is text.
OpenAI is the pioneer and industry spearhead of large-language-models. Sam Altman, the CEO of OpenAI, noted that ChatGPT is the result of large collections of data, as well as breakthroughs in their unique research. I mention this solely because I do not want you to take away that the actual LLM is built only from text - I want to use that as an illustrative model, the same way neuroscience is used to inform your understanding of the world. As with neuroscience, so much opacity and uncertainty is still baked into our understanding of it - likewise with OpenAI and large-language-models: we can only look at the outcome, not the causal relationships and connections.
So, with that in mind, let's revisit what I mentioned above - the atomics of text.
Text is something a lot of us take for granted. It is communication, and it is of itself. Text is text. If you've used LLMs like ChatGPT, you'll notice a couple of their many emergent properties:
  • Intelligence - or the mimicking of "intelligent" relationships in text.
  • Text atomics.
When you prompt GPT with a standard query - it continues on that query, predicting the next words. This seems uninteresting at best, but it is the fundamental unit of language models (for simplicity's sake).
So, if I ask: "Your name is Jack. What is your name?" - it will continue with something along the lines of "My name is Jack.". We read this, understand it, and comprehend it. Great!
An agentic workflow is a broad term which still has no proper definition. But, to understand the interest and research I am attempting, these are some of the traits you can expect in agentic workflows:
  • Collaboration: models working together as a team. Think of group chats.
  • Actions: the ability for models to act and perform actions. "Agentic" refers to agent - independence.
  • Accountability: the ability for agents to hold one another's outputs accountable.
More generally, I see this as models being connected in a graph. Let me explain.
Collaboration is the definition of how nodes connect. Actions are the ability for an agent node to connect to and run other functions. Accountability is a similar rhyme to collaboration, in the way I will define it below.

If you want to know more, Andrej Karpathy is one of the coolest people in AI, and has an unbelievable 1-hour video on LLMs and their developments. It's required watching. Andrew Ng is another one of the greats in LLM agentic workflows, and in my opinion has catalysed the start of agentic workflow development. I was working on this one week before his announcement, as well as the announcement of Devin... Timing!

One-shot parsing
Now that we understand a bit more about agentic workflows and their intention, let's cover one-shot parsing. This is the natural progression of implementing an agent.
One-shot parsing in an agentic workflow involves heavy pre-prompting of the model of choice in order to try to coerce its response into a desired format. Here is an example I used in the past to pre-prompt:
- Your name is <<NAME>>. My name is Jack. You work at Dataology.
- You are a very talented <<ROLE>>, your traits are: <<TRAITS>>.
- You are astute and serious about understanding tasks, and completing them.
- You can only reply in parseable JSON. Examples:
- JSON: {"type": "help"}
- JSON: {"type": "ask-question", "query": <query>}
- JSON: {"type": "add-context", "message": <message>}
- JSON: {"type": "continue", "continue": true}
- JSON: {"type": "print", "message": <message>}
- Results will be provided to you after you call functions.
- Ask questions if you are unsure about something.
- Print information using the print command.
- You can only reply in dot-points with JSON. 
- "- JSON: <json>"
This prompt is the result of 3 major revisions, and countless smaller ones.
I hope this helps paint a picture of a practical look at pre-prompting, and how it can be practically used. Go try pasting the above into ChatGPT, following it up with a prompt, and check the results. 3.5, 4, and 4o provide dramatically different results!
From there, two outcomes come from the response:
  • Correct parsing (the explicit case).
  • Incorrect parsing (the failure, catch case).
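As a minimal sketch of handling both outcomes (assuming the dot-point "- JSON: <json>" reply format from the pre-prompt above):

import json

def parse_one_shot(response: str):
    """Extract '- JSON: <json>' dot-points; None signals the catch case."""
    actions = []
    for line in response.splitlines():
        line = line.strip()
        if line.startswith("- JSON:"):
            try:
                actions.append(json.loads(line.removeprefix("- JSON:").strip()))
            except json.JSONDecodeError:
                return None  # incorrect parsing: fall through to the catch case
    return actions or None  # correct parsing, or nothing parseable at all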
Another emergent property is a [loss of model creativity](/article/model-creativity) due to the large amount of emphasis asserted on response format. This is due to the heavy prompting, as well as the one-shot nature of this approach.
This mimics a few libraries I've investigated which implement agentic workflows.
Summary: one-shot parsing involves heavy pre-prompting, with the intention of the model responding in a structured format that can inform a parser or command system.
Pros: Expressive command system. A smart enough model could create its own actions. Cons: Loss of creativity. Sometimes has no idea what's going on. Hallucinations.
Keyword parsing
Another way to implement agentic workflows, more specifically on the collaboration side (group chat), is to use keyword parsing. This includes a mix of pre-prompting, as well as looking for keywords.
Let's imagine you're out with your friends at your favourite RSL, enjoying a healthy dose of your favourite beverage. After quite a few of these beverages, everyone's vision begins to get impaired - and no bodily faculties are left besides the ability to hear speech and talk back. When anyone yells STOP, it's time for everyone to shut up and get into the taxi. Thanks for imagining that.
Moving on - keyword parsing is an agentic wrapper which sits above a prompting system, looking for keywords in order to trigger a next step, or a primitive action.
To repeat - keyword parsing wraps a language process with a detection mechanism that looks for keywords. Often, these keywords signal the start of a new action, or a process to end and another to start.
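A minimal sketch of that wrapper might look like this (the keyword-to-action mapping is purely illustrative):

KEYWORD_ACTIONS = {"STOP": "end_discussion", "NEXT": "hand_to_next_agent"}  # illustrative

def detect_keywords(response: str):
    """Scan a model response for trigger keywords and return matched actions."""
    return [action for keyword, action in KEYWORD_ACTIONS.items() if keyword in response]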
Pros: Creative thinking process is preserved. Cons: Little independence. On rails.
Research - Expansive Reductionist Parsing
The expansive reductionist approach is something I am actively looking into expressing. So, please take this with a grain of salt - it could be equally useful or useless.
Before I provide a summary, I want to explain nodes briefly. Node-based thinking will be detailed in another article.
Think of nodes as connections. If you were to implement a group chat mechanism, you would create a way for one model to prompt another model. This is the fundamental notion of a node - the "connection" is defined as the protocol which facilitates the connection.
If I was to create a prompt whose response would be fed to another prompt, one could define that as a node connecting to another node. This starts to build our model of connectivity.
This is the basic premise of the expansive reductionist approach:
  • Do not assert hard rules on a model in the first prompt. Allow it to think however it wishes when responding.
  • After a satisfactory amount of information or thinking has been done, prompt the model to convert into a new format.
  • After it converts into the new format, it can revise and make any changes as it sees fit.
If you have ever tried asking GPT to output a lot of text on a topic, and then asked it to summarise that information - this could be considered an "expansive reductionist" approach.
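Sketched as code, assuming a hypothetical chat(messages) helper that sends a message list to your model of choice and returns the reply text:

def chat(messages: list[dict]) -> str:
    # hypothetical helper: call whichever model/vendor API you use here
    raise NotImplementedError

def expand_then_reduce(prompt: str, target_format: str) -> str:
    # 1. Expansion: no hard rules, let the model think however it wishes
    thinking = chat([{"role": "user", "content": prompt}])
    # 2. Reduction: convert the free-form thinking into the new format
    reduced = chat([
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": thinking},
        {"role": "user", "content": f"Convert the above into {target_format}."},
    ])
    # 3. Revision: the model may make any changes it sees fit
    return chat([
        {"role": "user", "content": f"Revise if needed, keeping the same format:\n{reduced}"},
    ])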
What is really fascinating about this approach is the following emergent properties:
  • Due to the "roll-up" effect, we can ignore the context once the model has "rolled up" the context as it sees fit.
  • This "rolled up" information can be passed to a smarter model.
  • Each section in the node-to-node communication can be used by another model.
  • Creativity is higher than one-shot parsing.
  • Atomic agentic nodes!
Atomic Agentic Workflow / Node
An atomic agentic workflow is an encapsulated agentic workflow which does a specific action.
In frontend development, specifically component-driven development like React, there is a unique term floating around called "atomic components". I've only encountered the term in passing, but I've always taken it to mean a component which handles itself, and can stand alone without a lot of context. This feature, or way of thinking, can be applied to LLM modelling today, to encourage the development of "atomic agentic workflows / nodes" in order to create dynamism and flexibility, as well as more chances of swapping models between node pieces.
This is a really powerful concept which fills me with excitement.
At the moment, I see a lot of AI projects which are potential nodes! They just need the slightest tap to join a compatible master network and create interoperability, where we can build unbelievable AI networks! This fills me with joy that we're so close!
Back to nodes - so - let me define a really simple end-to-end node so you can start to piece together how they apply.
Node Illustration
Say we want to clean up some user input to keep it standard and cleansed. Our goal is to create a programmatic node where an input is entered, and an output comes out cleanly. But, since we're working in the scope of an LLM, many things can go wrong:
  • Hallucinations (making shit up)
  • Loss of context (loss of original prompt, creating a "mimicking" behaviour)
  • Context costs (exponential cost of prompting if hard context is required)
  • Varying model performance. A wide spectrum of different models for different tasks.
I'm just going to state it plainly - and maybe this will look stupid in 10 years - but I believe the value of LLMs is not of their own properties, but how they interconnect with each other, and the outer world. This is why thinking as a graph makes all the difference. This is key, since models have economics, costs, different context sizes, and unknown training sets.
Moving back to our node illustration - let's define a "node" as follows:
  • Spawn a new GPT 3.5 turbo thread.
  • Pre-prompt: "Here are 10 cleansed e-commerce product names: - 1, -2 ... I will provide product names, and your job is to standardise them into friendly e-commerce product names. Do you understand?".
  • The model will then output the directions in its own form. This is key for context.
  • Then, input in the product name.
  • Output name is provided.
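Reusing the hypothetical chat() helper from the earlier sketch, this node might look like:

def clean_product_name(product_name: str) -> str:
    pre_prompt = (
        "Here are 10 cleansed e-commerce product names: ... "  # examples elided
        "I will provide product names, and your job is to standardise them "
        "into friendly e-commerce product names. Do you understand?"
    )
    # The model outputs the directions in its own form - key for context
    restated = chat([{"role": "user", "content": pre_prompt}])
    # Then, input the product name; the cleansed name comes back
    return chat([
        {"role": "user", "content": pre_prompt},
        {"role": "assistant", "content": restated},
        {"role": "user", "content": product_name},
    ])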
This is where it's often intuitive for an agentic workflow node to stop. This is totally fine, and will work great! But, let's look at a few other principles that can be standardised.
Once we get our output, let's inject it as the input to a new node:
  • Spawn a new GPT 3.5 turbo thread.
  • Pre-prompt: "Here is the output from a system which converts product names into clean product names. Pick it apart and address 5 pros and 5 cons of its output."
  • Follow-up prompt: "Great. Now, with the above, re-write it with the above adjustments."
  • Output: Paragraph which includes the re-written product name
  • NOTE: The follow-up can be refined to introduce a more explicit return format. I.e. "YOU CAN ONLY RESPOND WITH X". But this, as discussed above, could reduce creativity.
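Continuing with the same assumed chat() helper, this second node and its follow-up prompt could be sketched as:

def critique_and_rewrite(cleaned_name: str) -> str:
    pre_prompt = (
        "Here is the output from a system which converts product names into "
        "clean product names. Pick it apart and address 5 pros and 5 cons of "
        f"its output.\n\n{cleaned_name}"
    )
    critique = chat([{"role": "user", "content": pre_prompt}])
    # Follow-up in the same thread: rewrite using its own critique
    return chat([
        {"role": "user", "content": pre_prompt},
        {"role": "assistant", "content": critique},
        {"role": "user", "content": "Great. Now, with the above, re-write it with the above adjustments."},
    ])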
Once we get our output paragraph including the refined product name, we can apply a final node:
  • Spawn a new GPT 3.5 turbo thread.
  • Pre-prompt: "Here is the output from a system which picks apart responses and adjusts them. Pick the output from the body of text. ONLY reply the output, do not provide any other commentary."
  • From this: We can either pass to another node which invokes a feedback loop which returns a CORRECT or INCORRECT response to denote whether this node is functioning correctly. The other option is to spawn this thread 5 times, and pick the most co-occurring response, if any (sketched below).
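That second option, again using the assumed chat() helper, could look like:

from collections import Counter

def extract_output(paragraph: str, runs: int = 5):
    pre_prompt = (
        "Here is the output from a system which picks apart responses and "
        "adjusts them. Pick the output from the body of text. ONLY reply "
        f"the output, do not provide any other commentary.\n\n{paragraph}"
    )
    # Spawn the thread `runs` times and tally the answers
    answers = [chat([{"role": "user", "content": pre_prompt}]) for _ in range(runs)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > 1 else None  # no consensus: node may be misbehaving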
NOTE:
  • We can generalise this into a node-connector, which can join two nodes based on their descriptions.
  • The above system could alternatively be laid out as a single GPT thread. Splitting it up demonstrates the effect of nodes.
  • By not mixing requirements (DO THIS + RESPOND THIS WAY), and encouraging thinking with nodes, we increase creativity - and offload low creativity to later in the process.
  • Nodes can be model-agnostic - meaning one model might be better at something than another model.
  • Nodes don't need to be strictly attached to LLMs - once we become more equipped, they could attach to services, lambdas, raw compute, vision, other forms of models, and so forth.
Thinking In Nodes - Similar To Programming Language Development
You may already know that programming language development was evolutionary. Very roughly, it began where every university had its own proprietary language. Each university or company had its own requirements, so it was a-la-carte for every developer. Then, C was built as a general purpose language - a "spec" which could be implemented on multiple processors, allowing for the cross-use of languages, and growth into what we use today - with abstraction on abstraction on abstraction. But, the great property we love in modern languages is libraries and the reuse of code.
Nodes propose a way to abstract LLMs, and reuse components.
Similarly, in game development, I've heard of developers trying to abstract their game and engine code from the operating system. This is so they can work in a pure world where they can represent their game, without dependencies requiring large refactors as systems modernise.
The same might be said about a lot of modern AI projects. They are built on the dependency of OpenAI and GPT, and often rely on heavy pre-prompting. Nodes may provide a better way to abstract and reuse model components, so we can work to build a more comprehensive system.
Long-term Node Goals
The long-term goal for nodes, for me, is a session-coordinator. Or, nodes creating nodes, and invoking other nodes autonomously. This could be an interpretation of Andrej Karpathy's LLM-as-an-OS idea.
Node-Connector
A codified example of a node-connector, for demonstrative purposes, could be as follows.
from typing import Literal, List

from pydantic import BaseModel

class PromptStep(BaseModel):
    # define the input, output, steps, template function, protocols, all here, etc
    # ...
    prompt: str = ""

class Node(BaseModel):
    task: str
    model: Literal["gpt-3.5-turbo"]
    prompt_steps: List[PromptStep]
    model_outputs: List[str] = []

ProductCleaner = Node(
    task="converts product names into clean product names",
    model="gpt-3.5-turbo",
    prompt_steps=[PromptStep()],  # ...
)

ProConPickerRewriter = Node(
    task="picks apart responses and adjusts them",
    model="gpt-3.5-turbo",
    prompt_steps=[PromptStep()],  # ...
)

OutputFinder = Node(
    task="picks out the output from a paragraph",
    model="gpt-3.5-turbo",
    prompt_steps=[PromptStep()],  # ...
)

def run_node(node: Node):
    for step in node.prompt_steps:  # run prompt steps, add to model_outputs
        output = "some model output"  # placeholder: send step.prompt to node.model
        node.model_outputs.append(output)

def connect_nodes(a: Node, b: Node):
    # combine a's task description into b's first prompt
    pre_prompt = f"Here is an output from a system which {a.task}. {b.prompt_steps[0].prompt}"
    # ... manage connection

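For illustration only, wiring the three nodes above end-to-end might then read:

# Hypothetical end-to-end run, once run_node and connect_nodes are fleshed out
run_node(ProductCleaner)
connect_nodes(ProductCleaner, ProConPickerRewriter)
run_node(ProConPickerRewriter)
connect_nodes(ProConPickerRewriter, OutputFinder)
run_node(OutputFinder)
cleaned_name = OutputFinder.model_outputs[-1]  # the final extracted product name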
I will revise the above code in future updates once the node concept is refined into a purer format. This is just a quick illustration in the article for programmers.
Jack, this sounds perfect for fine-tuned GPTs
Yes, you are very correct. Node-based thinking encourages the ability to deploy before training a GPT, and also allows a standard format for generating GPT threads for training, and a standard training wrapper. That is something I am actively working on - the ability to solve problems as nodes, and then combine them into single threads, and therefore into fine-tuned GPT training data + training.
The reason this approach does not aim to incorporate fine-tuning into the core building blocks is that fine-tuning is a vendor-provided service. I am aiming to create separation between the "pure" area of LLM node building, and the ability to write wrappers which can use nodes as data to create new services.
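As a rough sketch of such a wrapper (node_to_training_line is a hypothetical helper; the JSONL shape follows OpenAI's chat fine-tuning format):

import json

def node_to_training_line(node: Node) -> str:
    """Collapse a node's prompt steps and outputs into one fine-tuning example."""
    messages = []
    for step, output in zip(node.prompt_steps, node.model_outputs):
        messages.append({"role": "user", "content": step.prompt})
        messages.append({"role": "assistant", "content": output})
    return json.dumps({"messages": messages})  # one JSONL line per node run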
Conclusion
This has been a much longer article than I expected to create, and I hope it has assisted your personal journey into LLMs and provided some valuable ideas for your workflows. Open to criticism - tell me I'm wrong! Please reach out if you'd like to discuss anything further, or dig any deeper.
Sincerely, thanks for reading!
Any questions or comments, reach out!
jack@dataology.com.au