The legal profession enjoys a number of protections. It’s illegal, for example, for non-lawyers to provide legal services. That’s called the unauthorized practice of law, and it will get you in trouble in most places. But there’s another important shield protecting lawyers from the competitive heat of automation and innovation: it’s what I call the “legal data firewall.”

What is the legal data firewall? In simple terms, it’s a lack of structured data. Laws are written in words. Contracts are full of words. Lawyers are free to express their words in almost unlimited ways, and most take full advantage of that freedom. The result is an explosion of what geeks would call unstructured data. Loads of fuzzy text. Very little standardization. Oodles of nuance.
This is great fun if you’re a lawyer. You can read a fifty-page contract and find all the juicy words that might expose your client to risk. You can translate that mess into advice that’s slightly easier to digest, and you get paid for your efforts.
But it’s an absolute nightmare if you’re trying to automate something. If you want a machine to take over some of what lawyers do, you need structured data. You need to translate all the legalese into a common language of yes and no. You need to speak binary. You can’t say “it depends.”
So, that’s what I mean by the legal data firewall. Without the ability to translate legal words into standardized or normalized structured data, it’s almost impossible to automate legal tasks and analysis.
Suppose you want to automate a contract approval process. Rather than have every contract reviewed by Legal, you want a machine to assess an incoming document for materially risky terms. If there are none, you give the business a green light and let them sign the deal without further delay. This immediately raises the question: how does the machine decide what materially risky terms look like?
You might tell the machine that a lack of price controls in a buy-side contract is materially risky. Now the machine needs to know what buy-side looks like, and what a pricing clause looks like. But that’s not enough. Risk often lurks in the specifics of a clause, not just its high-level classification. A price review clause that indexes changes to CPI is relatively low risk for a buyer (once you’ve taught your machine about inflation and indexing, you’re all set). A clause that locks in fixed annual percentage increases may be risky if that number is set too high. But “too high” is both subjective and fluid: it depends on context and changes in market conditions.
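For illustration, here’s a minimal sketch of what one such rule might look like once a clause has already been reduced to structured data. Everything here is hypothetical: the field names, the 5% threshold, and the assumption that classification and extraction have happened upstream (which, as noted, is the hard part).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical structured form of a price review clause, produced
# upstream by whatever classifies and extracts clause data.
@dataclass
class PriceReviewClause:
    side: str                       # "buy" or "sell"
    mechanism: str                  # "cpi_indexed", "fixed_escalator", or "none"
    annual_increase_pct: Optional[float] = None  # set for fixed escalators

def is_materially_risky(clause: PriceReviewClause,
                        max_fixed_increase_pct: float = 5.0) -> bool:
    """Flag a buy-side pricing clause as materially risky."""
    if clause.side != "buy":
        return False
    if clause.mechanism == "none":
        return True   # no price controls at all in a buy-side contract
    if clause.mechanism == "cpi_indexed":
        return False  # increases tied to inflation: relatively low risk
    if clause.mechanism == "fixed_escalator":
        # "Too high" is subjective and fluid, hence a configurable threshold
        return (clause.annual_increase_pct or 0.0) > max_fixed_increase_pct
    return True       # unknown mechanism: fail safe and send to Legal
```

Note how the subjective part, “too high,” ends up as a configurable input that someone must keep current, rather than something the code decides for itself.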
As you can see, the devil is in the details, which means a structured data model must address a great many of them before you can build a machine that replaces a human expert.
But saying it’s complex and difficult is not the same as saying it’s impossible. It is possible to create a very fine-grained data model into which most contracts can be translated. It is possible to normalize legal words into data. It’s really just a matter of time and effort.
The road to structured legal data has two possible paths: (1) standardization, where a standard model of words and data is agreed through a formal collective process; and (2) normalization, where an objective data model is developed as a translation layer for whatever legal words are used in the wild.
I am skeptical that a standardized model (path 1) will emerge any time soon. Lawyers have agreed standardized language in certain narrow industry contexts (derivatives, insurance and construction, to give three examples). But there are powerful forces working against broader standardization of legal language. First, lawyers like to argue. Big time. Second, standardization is seen by many as a path towards commoditization. Not a selling point.
Normalization (path 2), on the other hand, is in the hands of data-driven disruptors. Rather than dictating standard words, a normalization approach creates an objective model of issues, objects, risks and data capable of accepting any legalese through a mapping exercise. This is already happening (Coupa has a universal contract model, for example), and like many disruptive forces, it will reach a tipping point where it takes off simply because it works.
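To show what a translation layer like this might look like (a sketch only, not Coupa’s or anyone else’s actual schema), imagine two differently worded clauses mapping onto the same canonical record:

```python
from dataclasses import dataclass, field
from enum import Enum

class ClauseType(Enum):
    PRICE_REVIEW = "price_review"
    LIABILITY_CAP = "liability_cap"
    TERMINATION = "termination"
    # ...a real model would enumerate hundreds of clause types

@dataclass
class NormalizedClause:
    clause_type: ClauseType
    source_text: str                # the original legalese, kept for audit
    attributes: dict = field(default_factory=dict)  # canonical fields per type

# Two differently worded clauses normalize to the same record:
a = NormalizedClause(
    ClauseType.PRICE_REVIEW,
    "Fees shall be adjusted annually in line with the Consumer Price Index.",
    {"mechanism": "cpi_indexed"},
)
b = NormalizedClause(
    ClauseType.PRICE_REVIEW,
    "Charges may be varied once per year by reference to movements in CPI.",
    {"mechanism": "cpi_indexed"},
)
# Once both map to the same structure, downstream analysis is identical.
assert a.attributes == b.attributes
```

The key design choice is keeping the source text alongside the normalized fields, so a human can always audit the mapping.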
Think about it. In the current paradigm, you ask an expert whether a contract is OK to sign and, some days later, you get a meandering “maybe” in reply. In the normalized, data-driven paradigm, you get a machine analysis with a green light (yes, you can sign), a graphic visualization of risk factors, benchmarking of this deal against industry norms, and recommended price adjustments to offset expected risks. Which world would you choose?
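As a final sketch (fields and numbers invented purely for illustration), that machine answer could be as simple as a single structured record:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ContractAnalysis:
    approved: bool                           # the green light (or not)
    risk_factors: List[str]                  # feeds the risk visualization
    industry_percentile: float               # benchmarking against norms
    recommended_price_adjustment_pct: float  # offset for expected risks

# Invented output for a deal that clears automated review:
report = ContractAnalysis(
    approved=True,
    risk_factors=["fixed 3% annual escalator (within tolerance)"],
    industry_percentile=62.0,
    recommended_price_adjustment_pct=0.0,
)
print(report)
```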