28th January 2026
4 min read
A different result from experimenting with AI in case administration

Ian Clarke
Senior Developer
Delivering the benefits of AI to users
Repeating the same thing but expecting different results might be an oft-quoted definition of insanity, but we’ve all had to get used to it over recent years with non-deterministic AI models.
At Iken, we’ve been watching closely as model capabilities have improved, hallucination rates have dropped and behaviour — if not actual output — has become more deterministic.
So, we’ve spent a lot of time thinking about how AI can enhance the user experience of interacting with our system. As our Chief Product Officer Zak said in his post about practical AI implementation, our focus is making “case administration easier and faster, so professionals spend less time on repetitive tasks”. We are using AI extensively in-house, and not just for coding, under the governance framework Phil, our Chief Information Officer, wrote about in his piece from last week — and we want those same benefits for our users.
As technical lead for this work, I wanted to share our view of the current landscape, and give an idea of what to expect from Iken products in the near future.
Context engineering
State-of-the-art LLMs are now highly proficient at iterative tool calling. This means a model should be able to reliably perform a string of actions for the user, such as creating a new case or file in the system and then creating documents inside it, all from a single prompt. For such use cases there is really no need for Iken to fine-tune a model, and in our initial assessments the models from Anthropic performed best at these tasks.
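To make that concrete, here is a minimal sketch of such an agentic loop using the Anthropic Python SDK. The tool definitions and the run_tool dispatch are illustrative stand-ins, not Iken’s actual API:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical tool definitions, standing in for Iken's real app-layer actions.
tools = [
    {
        "name": "create_case",
        "description": "Create a new case in the system and return its id.",
        "input_schema": {
            "type": "object",
            "properties": {"title": {"type": "string"}},
            "required": ["title"],
        },
    },
    {
        "name": "create_document",
        "description": "Create a document inside an existing case.",
        "input_schema": {
            "type": "object",
            "properties": {
                "case_id": {"type": "string"},
                "name": {"type": "string"},
            },
            "required": ["case_id", "name"],
        },
    },
]

def run_tool(name: str, args: dict) -> str:
    # Illustrative stand-in for the app layer's dispatch into Iken's services.
    return f"ok: {name} completed"

messages = [{"role": "user", "content": "Open a case for Smith v Jones and add a file note."}]

# Keep calling the model until it stops asking for tools.
while True:
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # the model has finished its chain of actions
    messages.append({"role": "assistant", "content": response.content})
    # Execute each requested tool and feed the results back to the model.
    results = [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": run_tool(block.name, block.input)}
        for block in response.content
        if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})
```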
If we delegate the model layer of a product to one of the big AI labs and merely call an API when we require inference, what is left for Iken to do in the app layer? The answer is “context engineering”.
This means ensuring the model is given the best possible information, alongside the prompt from the user, to enable it to carry out the actions.
This will obviously need to include the list of tools available to call, but should also include details about the user and which page they are on in the application. A “system prompt” included with every conversation also allows us to guide the model with higher-level advice for using our suite of tools.
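Continuing the sketch above, the app layer might assemble that context like this (the field names and guidance text are illustrative):

```python
def build_system_prompt(user: dict, current_page: str) -> str:
    # Higher-level advice for using the tool suite, plus details about
    # who the user is and where they are in the application.
    return (
        "You are an assistant inside the Iken case administration system.\n"
        "Prefer the provided tools over guessing; ask before destructive actions.\n\n"
        f"User: {user['name']} (role: {user['role']})\n"
        f"Current page: {current_page}\n"
    )

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=build_system_prompt(user, current_page),  # included with every conversation
    tools=tools,  # the same tool definitions as in the loop above
    messages=messages,
)
```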
This might seem a relatively straightforward addition compared to the complexities of the model layer, but there is significant potential for optimising the arrangement of data in the context window to favour caching, decrease latency and reduce token usage.
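One example of that optimisation, assuming Anthropic’s prompt-caching API: mark the stable prefix of the context (the tool definitions and standing guidance) as cacheable so repeat requests can reuse it, and keep the volatile per-user, per-page details in the tail.

```python
STANDING_GUIDANCE = "..."  # long, rarely-changing instructions for the tool suite

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=tools,  # tools sit at the front of the prompt, inside the cached prefix
    system=[
        {
            "type": "text",
            "text": STANDING_GUIDANCE,
            # Cache breakpoint: everything up to here can be reused across
            # requests, cutting both latency and input-token costs.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    # Volatile tail: the user's location in the app and their live prompt.
    messages=[{"role": "user", "content": f"[page: {current_page}]\n{user_prompt}"}],
)
```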
Tokenomics
You don’t have to avidly follow the stock price of NVIDIA to know that GPUs are in high demand and purchasing compute capacity in data centres is expensive. While this is magnified in LLM training, running inference workloads is also a cost-sensitive operation. The number of tokens (words or part-words) in each conversation directly affects costs, with output tokens typically around five times more expensive than input tokens.
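As a back-of-the-envelope illustration (the per-million-token prices below are placeholders chosen to show the roughly five-to-one ratio, not a quote of any provider’s rates):

```python
# Placeholder prices, illustrating the roughly 5x input/output ratio above.
INPUT_PRICE_PER_MTOK = 3.00    # $ per million input tokens (assumed)
OUTPUT_PRICE_PER_MTOK = 15.00  # $ per million output tokens (assumed)

def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# Tool-calling conversations are input-heavy: tool definitions and results
# all count as input, while the model's own replies stay comparatively small.
print(f"${conversation_cost(20_000, 800):.4f} per document-heavy request")
```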
Choosing the right size of model is important for speed of response (smaller models being faster) but also for cost. In our evaluations, we found that Claude Sonnet was able to perform most of the tasks in our suite without us needing to opt for the larger, more expensive Claude Opus. We also tested the smaller Claude Haiku (all three at v4.5) but found it fell short on prompt adherence in more complex situations.
While tool calling for, say, creating time items, isn’t going to clock up much of a bill, this becomes more of a concern when we consider reading and writing documents.
To RAG or not to RAG
To be able to semantically search across all your documents, or even across those in a single case or file, it is necessary to index the text into a vector database. For dynamic document repositories like Iken’s, where documents are being edited all the time, this could mean a lot of compute to keep the vector store up to date, and significant latency before recent changes are reflected.
Instead, we’ve considered building a retrieval augmented generation (RAG) system on top of Microsoft’s Retrieval API, which would be a good fit for our SharePoint-based document storage. But at the moment, we’re focusing more on how we can interact with one or two specific documents. For example, if the user wants to create a disbursement from an invoice, relying on a RAG system’s indexing of that document introduces unreliability; at the cost of a few more tokens, we can read the whole document into context.
There are always going to be some inaccuracies in converting file formats such as PDF and Microsoft Word’s XML into the Markdown the models understand, and we’re not looking to add further uncertainty by relying on vector-space similarity matching to return the correct portion of a document.
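As an illustration of reading a whole document into context, here is a sketch using Microsoft’s open-source MarkItDown converter as a stand-in for whatever conversion we end up shipping:

```python
from markitdown import MarkItDown  # example converter; any conversion is lossy

def document_as_context(path: str, user_prompt: str) -> list:
    """Put one whole document into the prompt rather than relying on RAG retrieval."""
    markdown = MarkItDown().convert(path).text_content
    return [{
        "role": "user",
        "content": f"<document source={path!r}>\n{markdown}\n</document>\n\n{user_prompt}",
    }]

messages = document_as_context(
    "invoice.pdf",
    "Create a disbursement from this invoice.",
)
```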
When it comes to writing documents, meeting formatting requirements is a real challenge, as Microsoft Word is a very difficult format for these models to work with, though there might be circumstances where outputting text with only headers and lists is better than nothing. However, as our focus is case administration rather than case work, the ability to write documents is not an immediate priority for us.
Prompt injection and safety
Before you ask a model to do something for you, you need to have confidence it is not going to destroy your data or leak it outside your organisation. Prompt injection is a real concern, and while we can always add human-in-the-loop checks before certain tasks and clearly display the domains of any generated links, our top priority in planning this work is minimising these risks.
The danger of prompt injection is nicely summarised by Simon Willison as the Lethal Trifecta, the three parts — which cannot all be safely present — being:
Access to private data
Exposure to untrusted content
Ability to externally communicate
Any system we build would obviously need to access private data to have any utility. We would want it to be able to read emails and other documents, which should be considered untrusted, but we likely wouldn’t allow it to communicate externally: we would prevent arbitrary HTTP calls, and we wouldn’t want it to send emails without human interaction.
However, Meta AI adapt this trifecta into an Agents Rule of Two (you can have at most two of the three), changing the final item to the ability to “change state or communicate externally”. Such a system would change state, but human-in-the-loop checks could be put in place to control high-risk state changes (such as deletions).
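A sketch of how that gate might sit in the tool dispatch from earlier; the risk classification and confirmation mechanism here are illustrative, not a description of a shipped feature:

```python
# Hypothetical classification of tools by the kind of state change they make.
HIGH_RISK_TOOLS = {"delete_case", "delete_document", "send_email"}

def run_tool(name: str, args: dict) -> str:
    """App-layer dispatch with a human-in-the-loop gate on high-risk actions."""
    if name in HIGH_RISK_TOOLS:
        # Pause the agent loop: destructive or externally-visible state
        # changes need explicit confirmation from the user first.
        if not request_user_confirmation(name, args):
            return "Action declined by the user."
    return dispatch(name, args)

def request_user_confirmation(name: str, args: dict) -> bool:
    # Stand-in for a confirmation step in the UI (e.g. a modal dialog).
    return input(f"Allow {name} with {args}? [y/N] ").strip().lower() == "y"

def dispatch(name: str, args: dict) -> str:
    # Stand-in for executing the action against the application's services.
    return f"ok: {name} completed"
```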
Don’t repeat, automate
We can’t say when we will be able to ship this vision of an AI-enhanced Iken to users, but I hope you’re as excited by these ideas as we are. We really believe that there could be significant time savings from having a natural language interface within the Iken app that can serve as a quicker option than repetitively clicking through the same web forms again and again. Even just the ability to save a long text prompt to run again in the future (while you can’t save your mouse clicks) could be a clear benefit.
So, maybe the real insanity is just the repetition.
We’ll keep you informed as the work progresses but that’s all I have to share at the moment — I need to get back to building you a better Iken.
Learn More
If you’d like to learn more about how Iken supports local authority legal and governance teams, get in touch or explore our latest case studies from councils across the UK.