I asked a legal tech CTO to explain fine-tuning and RAG. Here's what attorneys evaluating AI tools need to know.

May 29, 2026
Contributors

Mauricio Duarte: COO at A2J Tech; practicing attorney of 10 years; 8 years at the intersection of legal practice and technology.

Pierre Martin: CTO at Gavel; formerly Microsoft Research, Amazon, Xbox; 17 years building AI systems.

I've been a lawyer for the last ten years and working in legal technology for the last eight, and obviously, things are getting crazy.

I'm a realist and a pragmatist. AI did not appear in 2019. It has been around since at least the 90s, so it is not just starting out. What catapulted everything was the arrival of large language models, which ChatGPT brought to a mass audience. Now we're hearing about the competition and the arms race between different AI models, and everyone's talking about AI.

The gap I keep running into is this: people think you can add any AI layer on top and it will figure things out. That's not necessarily the case. So I wanted to go deeper on that, which is why I sat down with Pierre Martin, the CTO at Gavel.

Pierre spent 17 years building AI systems in production at Microsoft Research, Amazon, and Xbox before joining the legal technology world. What's important to understand about Pierre is that he isn't approaching this from a vendor perspective. He's the person who has had to build these things, figure out what works and what doesn't, and keep refining it. That perspective is what makes this conversation worth having.

So what I wanted to understand, and what a lot of attorneys are trying to understand, is: what is fine-tuning, and does it matter for legal work? What is RAG, and what does that change about how these tools perform? When does it make sense to build your own tools, and when does that create more problems than it solves? And what should attorneys be looking at when evaluating AI products today?

What follows is that conversation, edited for clarity.

The interview

Mauricio: I keep hearing fine-tuning and RAG thrown around in every legal tech conversation I'm in. As a lawyer advising on technology adoption, I needed to actually understand what these mean in practice, what the difference is, and why it matters. Can you break that down?

Pierre: Yeah, great question. Let me start with fine-tuning. So what does fine-tuning mean? An AI model, or language model, is trained on a dataset, and what it generates is statistically probable content. But it's statistics, right? So it can also deviate, and this is where you get the topic of hallucinations: the model just picked something that's a little less probable, and the result doesn't make sense to a human, right?

When you fine-tune, in a way you retrain, or you add another level of training to the same model, so that it learns from a specific dataset, right? So this is where I would say: I'm going to give you all my contracts and fine-tune that model using my own contracts. So it learns how to statistically reproduce the language that's in this set of contracts.
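As a rough illustration of that idea, here is a toy sketch in Python. It is not how a real language model is built: real fine-tuning adjusts the weights of a neural network, while this sketch only counts which word follows which. The sample sentences and the names `train`, `next_word_probs`, `general_text`, and `firm_contracts` are all invented for the example. It shows a second round of training on contract language shifting what the model considers most probable, while less probable continuations remain possible.

```python
import copy
from collections import Counter, defaultdict


def train(counts, sentences):
    """Add next-word counts from each sentence to the model, in place."""
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def next_word_probs(counts, word):
    """Convert raw counts into a probability for each possible next word."""
    total = sum(counts[word].values())
    return {nxt: n / total for nxt, n in counts[word].items()}


# Hypothetical general-purpose text (stands in for the original training data).
general_text = [
    "the party was fun",
    "the party was loud",
    "the party ended late",
]

# Hypothetical firm contracts used for the extra round of training.
firm_contracts = [
    "the party shall indemnify the client",
    "the party shall notify the client",
    "the party shall pay the fee",
    "the party shall keep records",
]

base_model = train(defaultdict(Counter), general_text)
# "Fine-tuning" here means: start from a copy of the base model, keep training.
fine_tuned = train(copy.deepcopy(base_model), firm_contracts)

before = next_word_probs(base_model, "party")
after = next_word_probs(fine_tuned, "party")

print(max(before, key=before.get))  # most likely word after "party" before fine-tuning
print(max(after, key=after.get))    # most likely word after "party" after fine-tuning
print(round(after["was"], 2))       # the older continuation is less likely, not gone
```

In this toy setup the fine-tuned model now prefers "shall" after "party", but "was" keeps a nonzero probability, so an output that does not match the contracts can still be produced.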

Mauricio: So the idea is you're training it on your own documents so it sounds more like your work specifically?

Pierre: That's the idea. But fine-tuning is kind of falling out of favor. I think the general industry is moving away from fine-tuning because it has two fundamental problems.

One, it really doesn't solve the hallucination problem, right? Statistically, the model will still sometimes do things that don't quite match what you provided it. It's just generating word by word, so it can invent things that are not real. Hallucination is still a real problem with fine-tuning.

The other issue is that fine-tuning is model specific, right? So we tried to do that actually at Gavel, early stage. I remember when GPT, I think it was 3.5, was like a big step function for us, right? And we said, oh cool, let's fine-tune it and see if we can become better at legal stuff, right? And then the next model came out and we realized, oh, actually all the effort we spent on fine-tuning that one model was lost.

And I think that becomes a very costly process over time, especially now, when there are so many really great companies releasing new models very fast, right? You can't even fine-tune fast enough to catch up with the next model coming out, right? So I think generally the industry is kind of going past fine-tuning. I mean, there are definitely cases where it would make sense, but it's moving a lot more towards retrieval augmented generation.

Mauricio: And that's RAG. So walk me through what RAG actually means, because I hear the acronym constantly but I've never heard anyone explain it in plain language.

Pierre: Yeah, so retrieval augmented generation, what does it actually mean? You give your AI agent a set of sources and the ability to search them, the way you would do a Google search. The agent then enters a workflow where it realizes: I actually need to check if there is any reference to a confidentiality agreement in my context, right? So let me search for confidentiality, pull up all the clauses and all the contexts that reference it, and try to make something out of it.

And this is why in our tools, and in tools similar to Gavel, you actually see specific quoted sources that you can open up to see the exact paragraph in the document being referenced, right? That comes from retrieval augmented generation.
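The workflow Pierre describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Gavel's implementation: the document names, clause wording, and the helpers `search`, `cite`, and `build_prompt` are all hypothetical, the search is a plain keyword match, and the call to an actual language model is left out. It shows the two parts that matter for traceability: retrieving passages, and carrying a document-and-paragraph label with each one so the answer can quote its sources.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    document: str   # file the text came from
    paragraph: int  # paragraph number inside that file
    text: str


# Hypothetical library of source passages; file names and wording are invented.
LIBRARY = [
    Passage("nda_acme.txt", 3, "Each party shall keep Confidential Information secret for five years."),
    Passage("nda_acme.txt", 7, "This Agreement is governed by the laws of Delaware."),
    Passage("msa_globex.txt", 12, "Confidentiality obligations survive termination of this Agreement."),
]


def search(term, library):
    """Retrieval step: return every passage whose text mentions the term."""
    needle = term.lower()
    return [p for p in library if needle in p.text.lower()]


def cite(passage):
    """Label that lets a reader open the exact document and paragraph."""
    return f"[{passage.document}, para. {passage.paragraph}]"


def build_prompt(question, passages):
    """Augmentation step: place the labelled sources in front of the question."""
    sources = "\n".join(f"{cite(p)} {p.text}" for p in passages)
    return (
        "Answer using only the sources below and quote their labels.\n"
        f"{sources}\n"
        f"Question: {question}"
    )


hits = search("confidential", LIBRARY)
prompt = build_prompt("What confidentiality obligations apply?", hits)
print(prompt)  # this text is what would be sent to the language model
```

Because each retrieved passage keeps its label, any statement in the generated answer can be checked against a specific paragraph of a specific document.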

Mauricio: So instead of baking the knowledge into the model upfront, you're giving it a searchable library it can go look things up in at the moment it needs them?

Pierre: Exactly. And I think it's a much better method for legal, because in legal, sources are important, right? You need to be able to quote your sources specifically.

And this is where context engineering becomes critical, right? It's the ability to create those search indexes at a certain scale. What we're shooting for is that you can attach 10 gigabytes of data, or just point to a folder, and have Gavel as a system construct high-signal context that is searchable. It's a hard engineering problem, right?

Because when you have a large dataset, finding the information most relevant to the user's query is really the problem we're solving in depth at Gavel. Then there's how you integrate the search tool into your AI agents in a way that's effective, where you're able to pull sources when you need them, recheck them, and iterate, but within a certain time constraint, right? I think that's the art of actual engineering. Even though with these coding tools we're not engineering line by line anymore, we're still designing systems and products that are meant to achieve high accuracy when you give them large amounts of data to sift through.
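To make "search index" and "most relevant" concrete, here is a small Python sketch. It is an illustration under assumptions, not a description of Gavel's system: production tools at the scale Pierre mentions typically rely on more sophisticated retrieval, and the sample clauses and the names `tokenize`, `build_index`, and `top_k` are invented. The sketch builds an inverted index (a map from each word to the chunks containing it) and ranks chunks so that rarer query words count for more.

```python
import math
from collections import defaultdict

PUNCTUATION = ".,;:()"


def tokenize(text):
    """Lowercase the text and split it into words without punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def build_index(chunks):
    """Inverted index: map each word to the ids of the chunks that contain it."""
    index = defaultdict(set)
    for chunk_id, text in enumerate(chunks):
        for word in tokenize(text):
            index[word].add(chunk_id)
    return index


def top_k(query, chunks, index, k=2):
    """Rank chunks by the summed rarity of the query words they contain."""
    scores = defaultdict(float)
    for word in set(tokenize(query)):
        matching = index.get(word, set())
        if not matching:
            continue
        # Words found in fewer chunks are more informative, so they weigh more.
        rarity = math.log(len(chunks) / len(matching)) + 1.0
        for chunk_id in matching:
            scores[chunk_id] += rarity
    ranked = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return [(cid, chunks[cid]) for cid in ranked[:k]]


# Hypothetical lease clauses standing in for a much larger document set.
chunks = [
    "The tenant shall pay rent on the first day of each month.",
    "The landlord may terminate the lease if rent is unpaid for thirty days.",
    "The tenant shall keep the premises clean.",
    "Notices under this agreement shall be sent by email.",
]

index = build_index(chunks)
results = top_k("terminate lease unpaid rent", chunks, index, k=2)
for chunk_id, text in results:
    print(chunk_id, text)
```

The index is built once and reused for every query, which is the part that has to keep working as the document set grows. Limiting the result to the top `k` chunks is one simple way to keep the context handed to the model small and high-signal.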

Mauricio: For legal teams that are evaluating tools right now, what should they actually be asking vendors about how they handle this?

Pierre: Ask whether they use RAG, and ask what that means in their system. And specifically, ask whether you can see the cited sources. If a tool gives you an answer and you can't trace it back to a specific clause in a specific document, that should raise a question. The whole point of RAG in a legal context is that sources are verifiable. If they can't show you that, you should understand why.

The other thing I'd push on is scale. It's one thing to demo RAG on a short contract. It's another to have it work reliably when you attach 10 gigabytes of data, or when you're pulling from multiple document types, different formats, different practice areas. That's where the engineering work is, and that's where the real difference between tools becomes visible.

Mauricio: And I think that connects to something you said about the layered nature of these tools, that when you're buying from a legal AI vendor, you're actually buying from a whole stack of providers underneath. Does that affect the RAG question too?

Pierre: Absolutely, right? What we typically mean by that is you're not only buying a solution from one vendor. When you buy a service from Harvey or Gavel, we're one part of the solution, but we obviously have dependencies on other providers. We're using OpenAI and Anthropic, we're using Amazon Web Services. They're also part of the data processing pipeline, right? So asking the tough questions about where the data actually goes, and who can do what with it, is critical. And that applies to the RAG pipeline too. Your documents are moving through that stack every time a query runs. You want to know the path.

In legal AI, the question is not what approach a vendor uses. It is whether they can show you what happens to your data at every step of the process. Fine-tuning or RAG, that part does not change.

Closing Thoughts

I believe this was a great discussion. We were able to balance the technical versus the non-technical, which is something I always try to do because that's what the legal technology space needs more of.

Pierre's closing thought was that educating ourselves and communicating are key right now, and I agree with him. The industry is changing so fast, and things are going to be changing every single day, every single hour. That's not going to slow down.

What I'd take from this conversation is something I've believed for a while, but that Pierre put into words clearly. The gap between a generic AI tool and one that performs well in legal work is not necessarily about the underlying model. It's about context: your workflows, your standards, and your clients' specific situations. That's an important distinction attorneys should keep in mind when evaluating these tools.

One last thing from my own experience: I do vibe code and don't necessarily understand code. What I've learned is that if you just vibe code everything without understanding what's being built, eventually it creates more problems. The same applies when attorneys are evaluating AI tools. Ask the hard questions until your judgment is satisfied, and if you start noticing your internal hand-waving, that's probably a sign that something is off.

At A2J Tech, we think about this a lot. Pierre mentioned that ninety percent of legal demand in America goes unmet, and that's the reason why getting this right matters, not just for the legal profession but for the people who need legal services and can't access them.

I always remind everyone to visit gavel.io to see what Gavel is building.

If you're interested in what we do at A2J Tech, visit goa2jtech.com.

Mauricio Duarte is an attorney with 10 years of experience, COO of A2J Tech, and Of Counsel Partner at the firm he co-founded, focused on venture capital and private equity for tech startups.
