Hypercontextual interfaces are an emerging UI paradigm for organizing AI-enabled applications and for augmenting traditional apps with greater discoverability. This essay introduces these interfaces via a series of examples and live demos.

To motivate and introduce the ideas, we’re going to first look at a few non-AI tasks that aren’t well-served by either existing app UIs or chatbots. These fall within the “long tail” of user goals—they aren’t what you spend 80% of your time on in an app (that core functionality is easily discoverable and has a pleasing UI), so the long-tail functionality is often buried somewhere unguessable. For instance:

  • In your banking app, where in the UI should you click to set up an alert if your balance drops below $1000?
  • On Stripe, where in the UI should you go if you need to revoke an API key?
  • On Amazon.com, where do you reduce the frequency of a subscribe-and-save shipment? Or figure out when the next delivery will be?
  • In the Paypal app, where can you audit the logins over the last month? Where can you get a report of all outgoing payments, and the payment method used for each?
  • In the Netflix app, is there some way to turn off the autoplay of show previews?

Every app struggles with this. As much as designers try to arrange the app’s functionality into some reasonable information hierarchy, and as visually appealing as the app might be, it’s not enough. Knowing exactly what you want to do, yet being forced to hunt for that functionality in someone else’s invented information hierarchy, is an exercise in frustration.

Why can’t we just tell the computer what we want, and have it… do that thing?

Ten years ago, the technology didn’t really exist to give users a better experience. Now it does. Interfaces of the future will not be limited to static UI hierarchies, but will grow to include hypercontextual interfaces to an underlying programming model or API. These interfaces will be dynamically assembled based on context, serving the long tail of user goals that either aren’t well-handled by existing UIs or which are flat-out impossible.

A hypercontextual interface can feel conversational, but it isn’t a chatbot: the modality isn’t always (or even often) text. And while hypercontextual interfaces often use LLMs for some subtasks, LLMs are only a means to an end, always in service of providing a great user experience. They can be used for simple non-AI tasks, for rich interactions with agentic workflows, or anything in between.

Example 1: Setting an alert in your banking app

Here’s a simple example of a long-tail request we might make in a banking app: “Add an alert if my balance drops below $1000”. We’d love to be able to make such requests using natural language, without having to figure out where the heck in the static UI hierarchy the designers have decided to put this functionality.

Notice this natural language request is ambiguous. We might have multiple accounts (say a personal and a joint account) and haven’t specified which ones we want alerts on. And how should this alerting happen? Email? Text message? Carrier pigeon?

A good interface will need to clarify these things, and this clarification may require multiple steps. If we simply want the alerts to go to the email and/or phone number our bank has on file, then perhaps we can multi-select from those two options and be done. But if we want to provide a new phone number for this alert, perhaps there is an additional 2FA process. And while this is just a simple example, the logic for these flows could in general be quite complicated and bespoke to the domain.
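
To make the shape of this clarification logic concrete, here is a minimal sketch of Example 1 as ordinary TypeScript, written against a hypothetical “pause, show a control, resume with a typed answer” API. Every name below (askMultiSelect, askChoice, askText, and the banking calls) is invented for illustration, not a real framework:

  // Hypothetical primitives: each call pauses the conversation, renders a UI
  // control, and resumes with the user's (typed) selection.
  declare function askMultiSelect<T>(prompt: string, options: { label: string; value: T }[]): Promise<T[]>;
  declare function askChoice<T>(prompt: string, options: { label: string; value: T }[]): Promise<T>;
  declare function askText(prompt: string): Promise<string>;

  // Illustrative domain types and operations, not a real banking API.
  interface Account { id: string; name: string; balanceCents: number }
  declare function listAccounts(): Promise<Account[]>;
  declare function verifyPhoneWith2FA(phone: string): Promise<boolean>;
  declare function createBalanceAlert(accountIds: string[], thresholdCents: number, channel: string): Promise<void>;

  // "Add an alert if my balance drops below $1000" arrives here (after the
  // natural language has been parsed) as a structured request.
  async function setUpBalanceAlert(thresholdCents: number): Promise<void> {
    // Which account(s)? A tappable multi-select, not free-form text.
    const accounts = await listAccounts();
    const chosen = await askMultiSelect(
      "Which account(s) would you like to monitor?",
      accounts.map(a => ({ label: `${a.name} ($${(a.balanceCents / 100).toFixed(2)})`, value: a.id }))
    );

    // How should the alert be delivered? A constrained choice.
    const channel = await askChoice("Where should alerts go?", [
      { label: "Email on file", value: "email-on-file" },
      { label: "Phone on file", value: "sms-on-file" },
      { label: "A new phone number", value: "new-phone" },
    ]);

    if (channel === "new-phone") {
      // A brand-new number triggers an additional 2FA step before we proceed.
      const phone = await askText("Enter the new phone number");
      if (!(await verifyPhoneWith2FA(phone))) return;
      return createBalanceAlert(chosen, thresholdCents, `sms:${phone}`);
    }
    return createBalanceAlert(chosen, thresholdCents, channel);
  }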

Let’s see a demo:

This interface is somewhere between a chatbot and a traditional application. Notice how sparingly text is used here as the modality. Yes, text is used where it makes sense, but elsewhere we use proper UI controls that are more constrained and offer better affordances.

Now, there is nothing wrong with text as a modality, but if we need the user to select which account they want, what’s the better experience:

  User: Add an alert if my balance drops below $1000
  Chatbot: Please enter the account number(s) you wish to add an alert for.
  User: Really?!

or this:

  User: Add an alert if my balance drops below $1000
  Interface: Which account(s) would you like to monitor?
    • Personal Checking ($2,847.32)
    • Joint Checking ($5,124.67)

Likewise, what is the better UX, a chatbot that replies with obscure instructions like “Navigate to Accounts > Inscrutable Menu Item 92 > Unguessable Random Category 17, then select Alerts, then… ”, or an interface that directly presents UI elements so you can accomplish the thing you want, right there?

It’s not even close! Of course we want the best UI control for the job, and of course we want to be able to directly do the thing, not be given confusing instructions of how to do the thing elsewhere. Seriously, chatbot, if you have successfully interpreted the request, why can you not just help the user to carry it out right here and now?

Amelia Wattenberger has a great essay, Why Chatbots Are Not The Future, which explains many of the problems with text as a modality.

LLMs will often be used within these interfaces, but in this example, one is used only at the start, where natural language input is parsed into a structured request. Everything that follows is just deterministic logic. And this is fine. Our goal shouldn’t be to maximally use LLMs, it should be to help users accomplish things. When LLMs are the best tool, use them; when they aren’t, don’t!

Judicious use of LLMs has another benefit, too: we can get away with smaller, faster, and cheaper models, all while achieving greater reliability. Why? Disambiguating a natural language query into one of a dozen or two discrete commands, then handing that off to deterministic code is “easy”. It’s harder to reliably support the same overall user interactions with a huge (expensive) model that’s been supplied with dozens of tools and a prompt to hopefully steer it in the right direction.
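
As a rough illustration of that division of labor, here is what the “parse, then hand off” step might look like in TypeScript. The command set and every function below are invented for the sketch; the only place a model appears is behind parseRequest, which is asked to do nothing more than classify the request and extract a couple of fields:

  // A small, closed set of structured commands this interface understands.
  type Command =
    | { kind: "create-balance-alert"; thresholdCents: number }
    | { kind: "change-shipping-address"; orderId?: string }
    | { kind: "set-up-parental-controls" }
    | { kind: "unknown"; originalText: string };

  // Hypothetical call to a small model asked only to classify the request and
  // extract a couple of fields (e.g. by constraining its output to a schema).
  declare function parseRequest(userText: string): Promise<Command>;

  // Deterministic flows, like the one sketched in Example 1.
  declare function setUpBalanceAlert(thresholdCents: number): Promise<void>;
  declare function changeShippingAddress(orderId?: string): Promise<void>;
  declare function setUpParentalControls(): Promise<void>;
  declare function showFallbackHelp(originalText: string): Promise<void>;

  async function handleRequest(userText: string): Promise<void> {
    const command = await parseRequest(userText);
    // The LLM's job ends here; plain code owns everything that follows.
    switch (command.kind) {
      case "create-balance-alert":     return setUpBalanceAlert(command.thresholdCents);
      case "change-shipping-address":  return changeShippingAddress(command.orderId);
      case "set-up-parental-controls": return setUpParentalControls();
      case "unknown":                  return showFallbackHelp(command.originalText);
    }
  }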

Of course, not all tasks are amenable to deterministic code, and LLMs have their place. We’ll see examples later in this essay. But if you can use regular code for part of a task, that’s usually a win, since regular code is deterministic, cheap, runs instantaneously, and so on.

Example 2: Changing shipping address of a recent order

Suppose we have a marketplace or shopping app and want a place where the user can make various long-tail requests and queries like “change the shipping address on my order”. This is an interesting, multi-step interaction (sketched in code after the list below):

  • The UI needs to clarify or confirm: which order are you referring to? Ideally via a UI with recent orders they can tap on to select. Not typing the order number in a chat window.
  • The UI needs to collect the corrected shipping address, allowing the user either to select an address on file or to enter a new one. Not typing free-form text in a chat window.
  • If entering a new address, there may be an additional authentication step.
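
Here is the sketch referenced above: the same hypothetical ask-style primitives as in Example 1, except that the conversation now exchanges typed domain objects (orders, addresses) rather than strings. All names are illustrative:

  // Illustrative domain types; the ask-style calls pause the conversation and
  // resume with a typed value, as before.
  interface Order { id: string; status: string; summary: string }
  interface Address { name: string; line1: string; city: string; postalCode: string }

  declare function askChoice<T>(prompt: string, options: { label: string; value: T }[]): Promise<T>;
  declare function askAddressForm(prompt: string): Promise<Address>;
  declare function askConfirm(prompt: string): Promise<boolean>;
  declare function recentOrders(): Promise<Order[]>;
  declare function addressesOnFile(): Promise<Address[]>;
  declare function reauthenticate(): Promise<boolean>;
  declare function updateShippingAddress(orderId: string, address: Address): Promise<void>;

  async function changeShippingAddress(): Promise<void> {
    // Step 1: which order? Tappable recent orders, not a typed-in order number.
    const orders = await recentOrders();
    const order = await askChoice("Select a recent order",
      orders.map(o => ({ label: `#${o.id} (${o.status}) ${o.summary}`, value: o })));

    // Step 2: which address? One on file, or a new one entered via a form.
    const onFile = await addressesOnFile();
    const picked = await askChoice<Address | "new">("Select an address", [
      ...onFile.map(a => ({ label: `${a.name}, ${a.line1}, ${a.city}`, value: a })),
      { label: "Enter a new address", value: "new" },
    ]);

    let address: Address;
    if (picked === "new") {
      // Step 3: a brand-new address may require re-authentication first.
      if (!(await reauthenticate())) return;
      address = await askAddressForm("New shipping address");
    } else {
      address = picked;
    }

    // No irrevocable action without an explicit confirmation.
    if (await askConfirm(`Ship order #${order.id} to ${address.line1}?`)) {
      await updateShippingAddress(order.id, address);
    }
  }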

Again, hypercontextual interactions are often kicked off with natural language, but since natural language is frequently imprecise, a good interface will disambiguate and refine the user’s intent via sequences of crisp UI controls. The user should never feel anxiety that the system will take some irrevocable action based on some assumed interpretation of natural language input. Here’s an interactive demo:

After an initial natural language input, the user is walked through a sequence of crisp and context-sensitive UI controls allowing them to directly carry out their goals, with the affordances one expects from any other polished UI.

Now compare this to the alternatives:

A chatbot that just gives you instructions on how to navigate the website:

  User: I need to change the address for my order
  Chatbot: To change your address, go to Account > Orders > Latest, then select your order. In the lower left there's a link button labeled "edit". This brings you to a screen where you can edit your shipping address.
  User: Can't you just do that for me?

A human support agent you have to wait 15 minutes to talk to:

  User: I need to change the address for my order
  Support: Please hold while we assign you to a customer support agent. This can take up to 15 minutes.
  User: 15 minutes!?

An LLM that confidently tells you it's changed the shipping address on your order, except that it actually hasn't (it just told you what it thought sounded like a helpful reply):

  User: I need to change the address on Order #12345 to 1 New Address St, Cambridge, MA 02139
  LLM: You got it. That's all done now.
  User: Thanks, what's the shipping address on that order, #12345?
  LLM: Just checked, it's 6 Old Address St, Cambridge, MA 02139
  User: Really!?

And the hypercontextual interface, for comparison:

  User: I need to change the address for my order
  Interface: Select a recent order
    • #1847 (Not yet shipped): Organic Cotton T-Shirt (Navy, M)
  Interface: Select an address
    • Margaret Hamilton (Default), +1 (617) 123-1234, 1 Software Engineering Rd, Cambridge, MA 02139, USA

A good interface uses rich UI controls appropriate to the domain, and with good affordances. The interaction is more efficient as a result, requiring less time and typing, and it has a feeling of crispness. It is always clear when actions are being taken.

Example 3: Setting up parental controls

Hypercontextual interfaces are often a good fit for app configuration tasks, and this example involves setting up parental controls for a video streaming service like Netflix. Again we have long-tail functionality that isn’t easily discoverable, but this example is interesting for another reason: it’s a long-running stateful interaction that involves multiple parties. More specifically:

  • The parent initiates setup of parental controls via natural language. They are then presented with a sequence of crisp UI controls to define the settings.
  • The kids can (later) request exceptions to the existing parental control policies (“hey, Mom, can I watch Jurassic Park?”), which parents are notified of and can approve or disapprove.

Let’s look at the demo:

The “parental control process” is effectively infinite: kicked off by the parent, it sleeps when there is nothing to do, then wakes and asks for the parent’s feedback when new events arrive.
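
A minimal sketch of what such a long-running process might look like as code, assuming two hypothetical primitives: nextEvent, which durably suspends the process until something happens, and askUser, which pauses to request input from one specific participant. Everything here is illustrative:

  // Events this process cares about; the shapes are invented for the sketch.
  type ControlEvent =
    | { kind: "exception-request"; childId: string; title: string }
    | { kind: "settings-change-requested" };

  declare function nextEvent(): Promise<ControlEvent>;
  declare function askUser<T>(
    userId: string, prompt: string, options: { label: string; value: T }[]
  ): Promise<T>;
  declare function allowTitleFor(childId: string, title: string): Promise<void>;
  declare function runSettingsFlow(parentId: string): Promise<void>;

  // Kicked off once by the parent; conceptually it never terminates.
  async function parentalControlProcess(parentId: string): Promise<never> {
    while (true) {
      const event = await nextEvent(); // sleeps (for days, if need be) until there is work

      if (event.kind === "exception-request") {
        // "Hey Mom, can I watch Jurassic Park?" Notify the parent and pause
        // until they tap Approve or Deny.
        const decision = await askUser(parentId, `Allow "${event.title}" this time?`, [
          { label: "Approve", value: "approve" },
          { label: "Deny", value: "deny" },
        ]);
        if (decision === "approve") await allowTitleFor(event.childId, event.title);
      } else {
        await runSettingsFlow(parentId);
      }
    }
  }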

There’s nothing very complicated about this user experience, but underlying it is a process that runs (potentially) forever and pauses for human input under certain conditions. This raises a surprisingly deep question we’ll return to later in this essay: how should one represent a paused process whose execution plays out over days, months, or years?1 Can such a thing be achieved generally, or must every such hypercontextual interface invent its own method of pausing and resuming, for the particular kind of state it builds up over the course of interacting with its users?2

Wait, do these examples even “count” as AI?

Does it matter? Almost every app has long-tail functionality that can be nicely surfaced via hypercontextual interfaces as shown above. Yes, these examples are nearly invisible low-lift applications of AI and LLMs, but that’s a good thing. It’s valuable if users can accomplish in 1 minute what would have taken a frustrating 10 minutes of clicking around at random in the UI, googling, or contacting support to speak with a human. Take the win, people!

Agentic computing

The examples so far have made light usage of LLMs, but a hypercontextual interface is also the right sort of UI paradigm for interacting with more autonomous AI agents (where, again, text is not always the best modality for steering an agent’s behavior). There are two general styles of structuring a hypercontextual interface:

  • Code-chosen control flow: In the examples above, control flow of the interaction is defined with regular code, and LLMs are used only when parsing natural language to a structured command or request. After that parsing, regular code takes over, perhaps until the interaction reaches another point where natural language input is warranted.
  • LLM-chosen control flow: On the other end of the spectrum, we can give an LLM a prompt and a set of tools that let it dynamically converse with the user via rich UI controls, as the model sees fit.3 This can be a good fit for certain tasks, and sometimes it’s the only choice.

Both modes are valid, each with its own tradeoffs, and real use cases may combine them in interesting ways.
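
To make the LLM-chosen end of the spectrum concrete, here is a rough sketch of an agent loop in which the model can either say something or ask for a real UI control to be rendered. callModel and the render functions are invented stand-ins for a model API and a UI layer, not any particular product’s interface:

  // What the model can decide to do at each step of the conversation.
  type ModelStep =
    | { kind: "say"; text: string }
    | { kind: "call-tool"; tool: "present_choice"; prompt: string; options: string[] }
    | { kind: "done" };

  declare function callModel(history: unknown[]): Promise<ModelStep>;
  declare function renderChoice(prompt: string, options: string[]): Promise<string>;
  declare function renderText(text: string): Promise<void>;

  async function llmDrivenConversation(userRequest: string): Promise<void> {
    const history: unknown[] = [{ role: "user", text: userRequest }];
    while (true) {
      const step = await callModel(history);
      if (step.kind === "done") return;
      if (step.kind === "say") {
        await renderText(step.text);
        history.push({ role: "assistant", text: step.text });
      } else {
        // The model asked for a UI control rather than a paragraph of prose;
        // regular code decides how that control looks and behaves.
        const picked = await renderChoice(step.prompt, step.options);
        history.push({ role: "tool", tool: step.tool, result: picked });
      }
    }
  }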

For instance, in coding assistant applications, there generally is no deterministic way of translating arbitrary natural language requests from a developer into code. LLM-chosen control flow is necessary, and humans are in the loop to steer the process. But once again, it is still preferable if the agent can present rich UI elements as part of the conversation (an embedded diff where people can comment on individual lines, say) rather than solely presenting users with text and forcing them to respond via text (“on line 63 of foo.u, can you factor that logic out into a helper function and write some unit tests?”). Here’s an example of what this looks like:

  User: Can you implement a function to find the top k of an array in java
  Assistant: Would you like it to be generic or specialized to a particular type?
    • Generic
    • Specialized to: int
  Assistant: Here's an implementation

    public static int[] topK(int[] arr, int k) {
      int[] sorted = Arrays.copyOf(arr, arr.length);
      Arrays.sort(sorted);
      int[] result = new int[Math.min(k, arr.length)];
      for (int i = 0; i < result.length; i++) {
        result[i] = sorted[sorted.length - 1 - i];
      }
      return result;
    }

  User (commenting inline on the sorting lines): Try using quickselect, instead of sorting the whole arr

In this example, the user asks for a function to return the top K elements of an array. This uses LLM-driven control flow, but the LLM can pause for input from the developer, presenting crisp controls to disambiguate aspects of the request. The output is an editable block of code, directly embedded in the conversation, and the user can comment on sections of it.

Here, the coding model chooses to sort the entire array and take its last K elements, but the user suggests a more efficient algorithm. We could imagine the model electing to clarify intent up front, asking which approach is preferred (a naive full sort, a heap-based partial sort, or a quickselect-based solution), each with a one-tap "learn more" button. The ideal experience is collaborative, building the user’s trust in and understanding of the resulting code.

While a nice hypercontextual interface framework will come with standard UI controls for things like sliders, radio buttons, and so on, it should also be extensible. Many domains warrant custom UIs for the information and interactivity presented to users, and coding assistants are just one example. For instance (see the sketch after this list):

  • An accounting application should be capable of showing editable spreadsheet fragments for users to understand and play with, building up a well-understood model piecemeal rather than generating a huge spreadsheet and asking the human to review it and hunt for errors.
  • A diagram-generating program should allow the user to visually select and to directly edit or comment on elements of the diagram.
  • And so on…
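
One way a framework might support this kind of extensibility, sketched with invented names: a custom control is just a description of what to render plus the type of value the conversation resumes with once the user is done with it.

  // A custom element pairs "what to render" with "what typed value comes back".
  interface UiElement<R> {
    kind: string;                  // tells the client which renderer to use
    payload: unknown;              // whatever that renderer needs
    parse(raw: unknown): R;        // turn the client's reply into a typed value
  }

  // The familiar hypothetical "pause and ask" primitive, now taking any element.
  declare function ask<R>(element: UiElement<R>): Promise<R>;

  // Example: an editable spreadsheet fragment for an accounting app.
  interface CellEdit { row: number; col: number; value: string }

  function spreadsheetFragment(rows: string[][]): UiElement<CellEdit[]> {
    return {
      kind: "spreadsheet-fragment",
      payload: { rows },
      parse: raw => raw as CellEdit[], // a real implementation would validate
    };
  }

  async function reviewOpeningBalances(rows: string[][]): Promise<CellEdit[]> {
    // The user edits one small, understandable fragment at a time instead of
    // auditing a huge generated spreadsheet in one go.
    return ask(spreadsheetFragment(rows));
  }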

Feasible AI vs the sci-fi fever dream

When it comes to applications of AI, there is often a gap between “works well enough for a cool demo” and “works well enough for production usage by a general audience”. Companies sometimes roll out pilots of AI features (inspired by the cool demo) only to find the technology just isn’t good enough for real usage or it’s simply too expensive to operate at production scale.

We advocate for feasible and honest AI: being humble about what the technology can do well today, and building realistic, reliable systems that are useful now, not in some imagined sci-fi future. This generally means some sort of interesting collaboration between humans and automated systems, and in such systems we should care deeply about what interface best serves the humans involved. We shouldn’t force users to carry out all interactions with text and natural language just because that’s what LLMs natively speak!

Andrej Karpathy, who coined the term “vibe coding”, had this to say about AI coding assistants, arguably one of the most successful applications of modern AI outside of general LLM chat interfaces:

On LLM agents. My critique of the industry is more in overshooting the tooling w.r.t. present capability. I live in what I view as an intermediate world where I want to collaborate with LLMs and where our pros/cons are matched up. The industry lives in a future where fully autonomous entities collaborate in parallel to write all the code and humans are useless. For example, I don’t want an Agent that goes off for 20 minutes and comes back with 1,000 lines of code. I certainly don’t feel ready to supervise a team of 10 of them. I’d like to go in chunks that I can keep in my head, where an LLM explains the code that it is writing. I’d like it to prove to me that what it did is correct, I want it to pull the API docs and show me that it used things correctly. I want it to make fewer assumptions and ask/collaborate with me when not sure about something. I want to learn along the way and become better as a programmer, not just get served mountains of code that I’m told works. I just think the tools should be more realistic w.r.t. their capability and how they fit into the industry today, and I fear that if this isn’t done well we might end up with mountains of slop accumulating across software, and an increase in vulnerabilities, security breaches and etc.

Like Karpathy, we would like to live in an “intermediate world” where humans and LLMs collaborate in sensible ways to produce results we understand and have confidence in.

The examples shown in this essay barely scratch the surface. In the coming years, the industry will continue pouring creativity and energy into building all sorts of software that involves humans, AI agents, and deterministic code, arranged in intricate and interesting collaborative systems. And such systems will increasingly start to displace the usual mode of organizing applications into fixed information hierarchies. More dynamic, contextualized interfaces will become the norm, and indeed we might look back on the software of today and wonder why we ever accepted its rigidity.

The programming model of hypercontextual interfaces

While it’s possible to build hypercontextual interfaces manually for specific cases, the underlying domain is deceptively complicated. A hypercontextual interface is a UI paradigm for multi-party, computational conversations. Unlike conversations in natural language, a computational conversation may exchange rich types and domain objects, with interesting state accumulating as the conversation proceeds (over the course of days, weeks, or months) and with multiple participants (humans, agents, and deterministic code) all contributing to the conversation. And this is all playing out with potentially multiple concurrent threads of control.

A good programming model for this domain is close to “just” regular concurrent programming, but augmented with the ability to pause and surface different UI controls at pause points. If we squint, it’s not terribly different from being able to call the readLine() function to pause a thread and request a line of input from the user (a sketch follows the list below), except that:

  • Instead of only being able to display text, the programming model allows for displaying richer UI elements whenever it requests user input or sends them output.
  • A pause can be targeted at specific users or roles (as opposed to a console app, which can only request input from whoever or whatever is feeding bytes to standard input).4
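
Here is the sketch referenced above, again with invented names. The point of comparison is readLine(): where a console program can only prompt with text and accept an answer from whoever owns standard input, this model can prompt with any rich element and direct the pause at a particular user or role:

  // Who may answer a given pause (versus "whoever owns stdin").
  type Participant = { userId: string } | { role: string };
  // What gets rendered, and the type of value the conversation resumes with.
  interface UiElement<R> { kind: string; payload: unknown; parse(raw: unknown): R }

  declare function requestInput<R>(from: Participant, element: UiElement<R>): Promise<R>;
  declare function display(to: Participant, element: UiElement<void>): Promise<void>;

  // For example, a wire transfer that needs a second signoff before release:
  interface Wire { id: string; amountCents: number; destination: string }
  declare function approvalCard(wire: Wire): UiElement<"approve" | "reject">;
  declare function releaseWire(wireId: string): Promise<void>;

  async function wireTransferSignoff(wire: Wire): Promise<void> {
    // This await might not resolve for hours or days; the paused state must
    // survive restarts and redeploys, which is where continuations come in.
    const decision = await requestInput({ role: "second-approver" }, approvalCard(wire));
    if (decision === "approve") await releaseWire(wire.id);
  }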

Yet there’s one crucial aspect that makes this hard: because of how long-running these conversations can be, it isn’t sufficient to keep the state of a paused program in memory. That’s not resilient enough; we’ll need to persist it so it survives node restarts or failures or redeploys.5

Notice this challenge doesn’t really arise in chatbot interfaces, where state is captured by the message history and thus the chatbot can always be “resumed” given that same message history. But we don’t just want chatbots. When a hypercontextual interface requests input from a human, the state needed for resumption of the computation (technically, the continuation) can have arbitrary structure—the result of parsing out complicated and interesting types from user interactions, computing new values, and so forth. A flat list of messages is insufficient.

Long-running conversations are not a special case

Long-running computational conversations are common even for single-user systems, but they become the norm whenever multiple parties are involved, since not all parties are necessarily in the same room synchronously participating. Think of (for example) a hypercontextual interface for initiating a wire transfer, requiring signoff from a second or third participant before releasing the wire. Or a coding assistance system where multiple people add their reviews, possibly in an iterative loop before merging. And so on.

To get a sense of what information needs to be saved at these pause points in the general case, think of using a debugger to set a breakpoint somewhere deep in a program’s call graph. The program stops running, letting the programmer inspect values and resume the computation. The debugger can be said to keep a representation of the program’s continuation from the breakpoint, enough information to resume its execution whenever the programmer wants. The continuation might be represented as a stack of call frames, a function pointer and instruction pointer for each frame, the values of all local variables, etc. In more interesting computational conversations, these continuations capture a lot of complicated state, and this state will differ for each of the places where the conversation can pause.
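
As a concrete (and deliberately toy) illustration, the captured state might be represented as plain data along these lines; real systems need far more care around value serialization and code versioning:

  // A toy, illustrative data representation of a captured continuation,
  // following the debugger analogy above.
  interface LocalBinding { name: string; value: unknown }

  interface Frame {
    functionId: string;       // which function this call frame is executing
    resumePoint: number;      // the "instruction pointer": where to pick back up
    locals: LocalBinding[];   // the values of local variables at the pause
  }

  interface Continuation {
    frames: Frame[];          // the stack of call frames at the pause point
    codeVersion: string;      // which build captured it; resuming under changed
                              // code is exactly the hard part discussed below
  }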

As there may be an unbounded number of such pause points in a hypercontextual interface, manually handling persistence and resumption quickly gets untenable. A principled approach is needed if we want a solution for the general case.

Ideally, we could persist continuations directly to storage and read them back and resume them later, but this, too, is a challenge. How do we persist an object that contains function pointers and instruction pointers? What does it even mean to serialize such a thing and read it back days or weeks later, possibly with a completely different codebase? What if the code for one of those functions in the call stack has been changed in the meantime? Or what if one of the types stashed in a local variable of the call stack has changed?

If you squint, you will start to see people inventing various approximations of a program’s continuation that are easier to persist, migrate, and maintain, either via one-off solutions tied to particular use cases, or with various flavors of event-sourcing and other tricks. Discussing these alternatives is out of scope for this essay, but having a robust way of directly capturing and restoring continuations is by far the simplest and most efficient approach if you can manage it.

In our next post, we’ll introduce our framework for easily constructing hypercontextual interfaces, showing side-by-side examples of code and the corresponding UX. Because it builds on a deep method of capturing and saving program continuations, the resulting API is shockingly straightforward to program against: requesting input from the user is a single function call, examples like those shown in this post look like any other boring straight-line code, and there is no interleaved boilerplate for persisting state or resuming computations from storage. As it should be.

Footnotes

  1. Perhaps well beyond the lifetime of the node it first started running on.

  2. Spoiler, yes!

  3. MCP Apps and ChatGPT Apps are both along these lines.

  4. There are a few other needed features we won’t discuss much here, like durable sleeps and sleeping until certain external events arrive.

  5. As an added bonus, making state persistent means we can route user input to any available node to resume the conversation, rather than needing such requests to be routed to the node where the interaction began.