The very nature of language models—large language models (LLMs), foundation models, and frontier models—confounds the notions of explainability and transparency. Even smaller open source language models have too many parameters, hyperparameters, weights, and features for data scientists to consistently identify which ones are responsible for a particular output.
Imagine, then, the difficulty of a modestly sized enterprise attempting to do the same thing, but without data scientists, a large IT department, or the wherewithal to fine-tune (let alone train) their own models.
Consequently, organizations are in a precarious position. They have not trained language models, know little about their inner workings, and yet are still expected to be able to explain their results—especially when they pertain to mission-critical business processes, customers, and regulations.
“No one thing is going to be able to solve this problem,” admitted Scott Stephenson, Deepgram CEO. “And, I don’t think if you did everything that’s available today, you could solve this problem. This is still an active area of research and development. No one solution or school of thought has it figured out.”
However, by availing themselves of a number of different approaches, organizations can drastically increase transparency for understanding why language models produce specific outputs. The most prominent of these techniques include context engineering, thinking or reasoning models, chain of thought, retrieval-augmented generation (RAG), and multi-agent architectures.
Nevertheless, the task at hand is far from trivial, particularly for those employing models from managed service providers, which is what most organizations are doing.
“Not only do you have the black box of how does the model function, but you also have the black box of whatever this company has wrapped the LLM in,” noted Blair Sammons, director of solutions, cloud and AI at Promevo.
EXPLAINABILITY OR ACCOUNTABILITY?
The need to explain the outputs of probabilistic machine learning models has long been integral to statistical AI adoption rates. When deep learning and deep neural networks were the vanguard of this technology, explainability and interpretability were facilitated in several ways. Techniques like Local Interpretable Model-agnostic Explanations (LIME), Shapley values, Individual Conditional Expectation (ICE), and others enabled users to understand the numeric significance of model outputs, which bolstered explanations for them. According to Ebrahim Alareqi, principal machine learning engineer at Incorta, “With LIME and Shapley values, they were good at highlighting which features ended up being used in a limited number of parameters. As we got to LLMs, it’s a whole different thing because now we’ve moved beyond a billion parameters.”
The sheer size and intricacy of LLMs have resulted in an evolution of what explainable AI has come to mean. The term is transitioning from one in which each output of a model can be attributed to a specific weight, parameter, or feature to one in which there is transparency about why a model responded as it did. The importance of explainability, then, is rapidly being replaced by more pragmatic notions of rectitude and traceability. “If the model always does the right thing, do you really care how it did it?” Stephenson posited. “You really care when it does the wrong thing. That’s why you open it up and see why it did something. You reduce the need for explainability the higher the quality of the actions of the model are.”
REASONING MODELS
The shift in priorities from explainability to transparent accountability of language models is typified by reasoning models, also known as thinking models, which Alareqi described as a modern means of understanding model results. According to Jorge Silva, director of AI and machine learning development at SAS, “A reasoning model is one that not only answers your prompt, but provides the steps in a human-like fashion that it took to get there.” Many believe that when such models, which include Qwen, OpenAI’s gpt-oss, and DeepSeek-R1, detail what specific steps were “thought about” or performed to reach a conclusion, this provides concrete traceability. This way, humans can effectively “replicate the steps,” Silva pointed out.
Examples include everything from employing these models to solve quadratic equations to explaining which product is predicted to sell better in the coming quarter—as well as why. These capabilities are predicated on the chain of thought form of prompt augmentation. According to Alareqi, when chain of thought is ingrained within reasoning models, it “is the way to look [at] what is inside the model.” However, these models are far from flawless. “There’s some research that shows reasoning models are more prone to hallucinations,” Silva commented. “Because we’re asking for more information to be returned without giving more inputs, it needs to extrapolate more.”
CHAIN OF THOUGHT
The chain of thought technique that is embedded in reasoning models initially emerged around the same time as the more popular RAG method. Adoption rates may have been compromised by the upfront effort involved. “Before, you would have to write this chain of thought for the model,” Alareqi remarked. “With the newer models, we don’t have to do this because it’s inherent in them.” When organizations needed to spell out the steps for, say, predicting customer lifetime value or deciding which loans to approve, this methodology required sending more tokens to models, resulting in higher costs.
However, because thinking models have chain of thought embedded within them, there are lower costs for inputs and outputs. According to Stephenson, a caveat for employing these reasoning capabilities to facilitate transparency for model responses is, “You don’t allow the chain of thought to be too general. It needs to be very specific. So, the model doesn’t say, ‘Compared to all the competitors’; it says, ‘Compared to competitor one, and here’s the name, and competitor two, and here’s the name, and competitor one has these qualities, etc.’”
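As a rough illustration of that guidance, the sketch below shows a prompt that pins the chain of thought to named competitors rather than generalities. The call_llm() helper and the competitor name are hypothetical placeholders, not any particular vendor’s API.

```python
# Minimal chain-of-thought prompt sketch. call_llm() is a stand-in for whatever
# model endpoint an organization actually uses; the competitor name is invented.
def call_llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    return "..."  # the model's step-by-step answer would appear here

COT_PROMPT = """Forecast which of our products will sell better next quarter.
Think step by step, and keep every step specific:
1. Name each competitor explicitly, e.g., "Competitor one: Acme Corp."
2. For each named competitor, list the concrete qualities being compared.
3. State the intermediate conclusion each comparison supports.
Do not refer to "all the competitors" in general terms.
End with: FINAL ANSWER: <forecast plus the named evidence behind it>."""

print(call_llm(COT_PROMPT))
```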
AGENT-BASED ARCHITECTURES
Employing agents fortified by language models is another means of reinforcing transparency for the results of those models. As Sammons observed, such agents can “do our jobs for us. They can automatically gather customer sentiment, create a campaign against that sentiment, while also updating a product to fix bugs. They can do things we would normally pay lots of people to do.” Organizations can employ agents to monitor the outputs of reasoning models and to verify whether models actually performed the steps they claim they did. With this paradigm, users “have a critique model where its job is to look at the inner monologue of a model and say this is good reasoning or not,” Stephenson explained. “Or, ‘Hey, you need to explore this area.’ This is kind of like a manager.”
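A minimal sketch of that critique pattern follows, assuming a generic call_llm() stand-in rather than any specific agent framework.

```python
# Critique-agent sketch: a second model reviews the first model's reasoning,
# acting as the "manager" described above. call_llm() is a placeholder.
def call_llm(prompt: str) -> str:
    return "..."  # model response would appear here

def answer_with_reasoning(question: str) -> str:
    """Ask the worker model to answer and show its steps."""
    return call_llm(f"Answer the question and show your reasoning step by step:\n{question}")

def critique(question: str, reasoning: str) -> str:
    """Ask a reviewer model to judge the worker's inner monologue."""
    return call_llm(
        "You are a reviewer. Read the reasoning below and reply either\n"
        "'GOOD REASONING' or a short note on which step needs more exploration.\n"
        f"Question: {question}\n\nReasoning:\n{reasoning}"
    )

question = "Which of our two support queues should receive the new hire?"
steps = answer_with_reasoning(question)
print(critique(question, steps))
```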
Organizations can furnish transparency for the actions agents take by performing what Saurabh Mishra, SAS director of product management, termed “path analysis.” This mechanism typically provides historic information, complete with visual cues, about which steps individual and multiple agents took to complete some of the tasks that Sammons mentioned. Such analysis would reveal “the path the agent followed; it made a language model call here and had to call five tools to get this response, then it finished by posting this on a dashboard,” Mishra said. “It’s like looking under the hood instead of a black box when I ask an agent to do something.” Such auditability buttresses transparency. Additionally, path analysis can be employed prior to deployments as a means of testing the worthiness of agent-based systems.
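A minimal sketch of the kind of trace that path analysis depends on appears below; the AgentTrace recorder and the tool names are hypothetical, not any specific product’s tooling.

```python
# Path-analysis sketch: record each step an agent takes so the path can be
# audited later. AgentTrace and the tool names here are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentTrace:
    steps: list = field(default_factory=list)

    def record(self, kind: str, detail: str) -> None:
        """Append one step (model call, tool call, final action) with a timestamp."""
        self.steps.append((datetime.now(timezone.utc).isoformat(), kind, detail))

    def replay(self) -> None:
        """Print the path the agent followed, in order."""
        for ts, kind, detail in self.steps:
            print(f"{ts}  {kind:<12} {detail}")

trace = AgentTrace()
trace.record("model_call", "Summarize customer sentiment for Q3")
trace.record("tool_call", "crm.search(tickets, quarter='Q3')")  # hypothetical tool
trace.record("tool_call", "dashboard.post(summary)")            # hypothetical tool
trace.record("finish", "Posted sentiment summary to dashboard")
trace.replay()  # "looking under the hood" at what the agent actually did
```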
CONTEXT ENGINEERING
The concept of context engineering emerged as a means of improving the prompt engineering conventionally associated with RAG. By providing more context to language models, the aim is to transition from “a black box to a glass box, where I can see what’s going on inside the model,” Sammons said. Although context engineering is a pivotal construct for contemporary RAG implementations, it’s also important for the employment of reasoning models. According to Alareqi, context for models stems ultimately from the data itself: the more context you give a model, the more it draws out of that context. Consequently, it’s imperative that organizations have well-governed, quality data to provide as context for models. According to Rob Lowe, associate director of digital and AI services at alliantDigital, mechanisms such as “creating a data dictionary and a slang dictionary that will have internal definitions of what this means in the real world, which will help infer what the prompt might mean, can prompt language models in the right direction.”
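A minimal sketch of how such a data dictionary and slang dictionary might be folded into a prompt as context follows; the dictionary entries and the call_llm() helper are illustrative assumptions, not alliantDigital’s implementation.

```python
# Context-engineering sketch: prepend internal definitions so the model can
# interpret company terms and slang. The entries and call_llm() are illustrative.
def call_llm(prompt: str) -> str:
    return "..."  # model response

DATA_DICTIONARY = {
    "CLV": "customer lifetime value, in USD, computed over a 36-month horizon",
    "churn flag": "1 if the account cancelled within the last 90 days, else 0",
}
SLANG_DICTIONARY = {
    "whale": "an account in the top 1% of annual spend",
}

def build_context() -> str:
    lines = ["Internal definitions (use these when interpreting the question):"]
    for term, meaning in {**DATA_DICTIONARY, **SLANG_DICTIONARY}.items():
        lines.append(f"- {term}: {meaning}")
    return "\n".join(lines)

question = "Which whales have a churn flag, and what is their CLV?"
print(call_llm(build_context() + "\n\nQuestion: " + question))
```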
Context engineering is important for crafting accurate model responses—which considerably reduces the need for explainability—because it provides information on which models have not necessarily been trained or fine-tuned. Even a model that has been trained on a specific domain would still need contextualization when chatting with an organization’s customer, for example.
“How is the voice agent supposed to know what the customer has already purchased?” Stephenson asked. “Or, what’s your name, how to say it, or that you had problems in the past with your returned item and that’s what you’re calling for?” Such details should be included in prompts as context. Furthermore, the input windows for model prompts have become drastically bigger, so that, for example, organizations can include the contents of an entire CRM system in a prompt to provide context. Such detailed, domain-specific information would boost the accuracy of responses to questions about that system—making models more accountable and dependable in their outputs.
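A minimal sketch of feeding such customer details into a voice agent’s prompt appears below; the customer record and the call_llm() helper are invented for illustration, not drawn from any CRM product.

```python
# Context sketch for a voice agent: pull the caller's record into the prompt so
# the model already knows the purchase and return history. All data and the
# call_llm() helper are invented for illustration.
def call_llm(prompt: str) -> str:
    return "..."  # model response

customer_record = {
    "name": "Siobhán Byrne",
    "pronunciation": "shi-VAWN",
    "recent_purchase": "wireless headset",
    "open_issue": "returned item not yet refunded",
}

context = "\n".join(f"{key}: {value}" for key, value in customer_record.items())
prompt = (
    "You are a customer support voice agent. Use the caller's record below as context.\n"
    f"{context}\n\n"
    "Caller: Hi, I'm calling about my return."
)
print(call_llm(prompt))
```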
RAG
Although RAG doesn’t necessarily supply explainability so much as it increases the accuracy of models by expanding the context from which they produce answers, it still makes their outputs easier to understand. Or, more specifically, RAG “can measure the groundedness or the faithfulness of a response,” Sammons commented.
Here are some of the ways organizations employ RAG to increase the transparency of model outputs:
- Attribution: One way RAG allows models to generate more dependable outputs is by presenting the specific sources, passages, and sentences from which a model’s response was culled, as illustrated in the sketch following this list. Such citations provide improved “visibility to users in the source of the responses and traceability of the source content,” Lowe indicated.
- AI Evaluation: With this technique, organizations compile the documentation that will serve to augment prompts into a knowledgebase, then have humans create sample questions and responses about it. Those answers, which have already been approved by organizations, can be used to assess the responses of RAG systems before they’re put in production. “Now, you can ask a question of your RAG and first call your knowledgebase, do a search, and then call the LLM and [see] what response you get for the exact same question,” Mishra said. Organizations can also employ language models to devise the questions asked of the knowledgebase, which conserves time and effort on the part of humans, before following the aforementioned process to evaluate their RAG systems.
- Data Quality: Although data quality is foundational to any data-centric practice in general, it’s integral for getting reliable results from RAG or from language models altogether. “If your data is conflicting, unstructured, and wrong, AI is only going to interpret what it’s suggesting,” Lowe pointed out. “To eliminate hallucinations and false returns, you need to be in good shape at the fundamental data level.”
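A minimal sketch of the attribution pattern described in the first bullet above follows, assuming a toy keyword retriever and a call_llm() stand-in rather than a production vector search pipeline.

```python
# RAG attribution sketch: retrieve passages, answer from them, and cite the
# sources alongside the answer. The documents, retriever, and call_llm() helper
# are simplified placeholders, not a production pipeline.
def call_llm(prompt: str) -> str:
    return "..."  # model response

DOCUMENTS = {
    "returns-policy.md": "Items may be returned within 30 days with a receipt.",
    "warranty-faq.md": "Hardware carries a one-year limited warranty.",
}

def retrieve(question: str, top_k: int = 2) -> list[tuple[str, str]]:
    """Naive keyword-overlap scoring; a real system would use vector search."""
    terms = set(question.lower().split())
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda kv: len(terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def answer_with_citations(question: str) -> str:
    passages = retrieve(question)
    sources = "\n".join(f"[{name}] {text}" for name, text in passages)
    prompt = (
        "Answer using only the passages below, and cite the bracketed source\n"
        f"name for every claim.\n\n{sources}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(answer_with_citations("How long do I have to return an item?"))
```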
HUMAN INTERPRETABLE
There is no shortage of measures for increasing transparency into how language model outputs are derived from specific inputs. Users can avail themselves of chain of thought techniques, reasoning or thinking models (with chain of thought embedded in them), multi-agent frameworks, RAG, or context engineering. Each of these constructs makes the inner workings of models more readily apparent. It may not do so at the interpretability and explainability level of the deep neural network heyday (before language models), but it helps prevent the proverbial black box effect.
According to Stephenson, there’s one final consideration. “All of it needs to be human interpretable,” he said. “This is a very important piece. These inner monologues have to happen in human language: the way we talk, read, speak, and understand it. If it was just bytes and numbers, we wouldn’t be able to read and understand it. So, a very important piece of explainability is having a way to express what the model is doing in a way that a human can understand.”
Even smaller organizations can tackle explainability and interpretability in language models by looking at reasoning models, context engineering, and RAG. These offer practical pathways to transparency and accountability. Critical to determining success is whether outputs from models remain interpretable, auditable, trustworthy, and aligned with human understanding.
Featured Leadership
Rob Lowe is an Associate Director of Digital and AI Services at alliant, where he manages daily operations. He has two decades of digital product development experience. Prior to this role, Rob was a consultant for the alliant group of companies’ United Kingdom operation, Forrest Brown. He has also held several leadership roles, such as Head of Digital at Harte Hanks, where he managed developers across three continents and oversaw projects for clients such as Samsung, Microsoft, Toshiba, and AB InBev.
At alliantDigital, Rob leverages his two decades of experience by leading the development of digital product roadmaps, maintaining quality standards, and ensuring client satisfaction. He was instrumental in creating alliantDigital’s suite of digital products such as its next-gen AI chatbot and automations, and is passionate about bringing emerging technologies and AI tools to businesses in a variety of industries.