Lexis AI versus ChatGPT-4

I am sure by now many of you have seen the commercial for Lexis AI. The action-themed pop music and dramatic screenshot close-ups really get a researcher in the mood. Watching the technical developments of law startups reminds me of the console wars of my generation, namely between the NES and Sega.

Rather than wait for the competition’s response, I went searching myself for the Westlaw alternative. While I was in law school, Casetext was popular with my classmates (I used Quimbee). At the time, Casetext provided access to case law that was readily available on the web and could also be accessed through Google Scholar. The annotations and summaries were created by a team of volunteer lawyers, law professors, and even law students. As time progressed, the startup added fee-based premium services. Casetext entered the AI arena with its product, CoCounsel, and was quickly purchased by Thomson Reuters, Westlaw’s parent company. But could the two major legal publishing giants pushing out generative chat transformers ultimately shoot themselves in the foot in the long run? One might be inclined to think so. More and more professional researchers are peeking under the hood and gaining a better understanding of how Generative Pre-trained Transformers and LLMs work. As our knowledge increases, we may (as a group) become less impressed by companies’ claims to the uniqueness of the next rollout.

Language Models (NLP) and How They Work (Generally)

Language models have been around since the 1950s, when they used a rule-based approach to process and understand human language. By the 1990s the models had added statistics to indicate the predictive likelihood of a word or phrase being used. Today there are more than 28,000 machine learning models for text. Spell check and autocomplete in word processors are just two examples of how the average person was already using similar models at the time.
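To make the “statistics” idea concrete, here is a minimal sketch, of my own invention and not anything from Lexis or OpenAI, of the kind of counting a simple autocomplete relies on: record which word most often follows the word you just typed, then suggest it.

```python
from collections import Counter, defaultdict

# Toy training text -- a real autocomplete would be trained on a huge corpus.
corpus = "i love the law i love legal research the law is vast".split()

# Count how often each word follows another (a simple bigram model).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def suggest(word):
    """Return the statistically most likely next word, if we have seen one."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(suggest("the"))  # -> 'law', because 'law' follows 'the' most often above
```

That is the whole trick, just scaled up: the model never “knows” the law, it knows which words tend to follow which.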

Large Language Models use neural networks, which are designed to artificially imitate the human brain’s organically interconnected neurons (hence the term Artificial Intelligence). In real life these neural networks, coated with our friend myelin, process input data from our 5 (possibly 6) senses and make predicted responses (or lack thereof) based on this learned information.

Neural Networks – Real (above), Artificial (below)

Sighted people make these connections with the thousands of images they see during their waking hours. When it comes to written language, literate people process the words and symbols they see, hear, or feel (in the case of braille). Our neurons process this input data and learn patterns, and from those patterns we learn to make predictions.
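To make the “learn patterns, then predict” idea concrete, here is a minimal artificial-neuron sketch, a toy of my own using NumPy rather than anything from Lexis or OpenAI. It nudges its connection weights until its responses match a trivial pattern, which is the same kind of adjustment LLM training performs on a vastly larger scale.

```python
import numpy as np

# Toy "experience": inputs and the responses we want the neuron to learn.
# The pattern here is a simple logical OR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

rng = np.random.default_rng(0)
weights = rng.normal(size=2)  # connection strengths -- the "parameters"
bias = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training: compare predictions with the desired output, then nudge the
# weights and bias in the direction that reduces the error.
for _ in range(5000):
    prediction = sigmoid(X @ weights + bias)
    error = prediction - y
    weights -= 0.1 * (X.T @ error) / len(y)
    bias -= 0.1 * error.mean()

print(np.round(sigmoid(X @ weights + bias), 2))  # approaches [0, 1, 1, 1]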

Dime con quién andas, y te diré quién eres (“Tell me with whom you walk, and I will tell you who you are”)

The above popular Spanish refrain is said to have come from the older Arabic saying, “A man will follow the religion of his friends…” The English linguist J.R. Firth applied similar reasoning to words with his phrase, “You shall know a word by the company it keeps.” Through such associations, our ability to predict in one language can be applied to languages that are less familiar to us. Thus, regardless of your ability to read Arabic, if I write, “I love you with all my قلب (qalb),” most readers can accurately predict that قلب (qalb) means heart. Although the phrase could have ended in ‘mind’ or ‘soul’, our training predicts that ‘heart’ is the more likely ending.
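You can watch a model make exactly this “company it keeps” prediction yourself. The sketch below assumes the open-source Hugging Face transformers library and the small English model bert-base-uncased are installed; it has nothing to do with Lexis or ChatGPT specifically, but it shows masked-word prediction in a few lines.

```python
# pip install transformers torch
from transformers import pipeline

# A masked language model predicts a hidden word from the company it keeps.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for guess in unmasker("I love you with all my [MASK]."):
    print(f"{guess['token_str']:>8}  (score: {guess['score']:.2f})")

# The top guesses are words like "heart" or "soul" -- the same prediction a
# human reader makes from the surrounding context.
```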

Large Language Models

Although they are fundamentally similar, a major difference between a traditional NLP model and a Large Language Model is the scope of the data on which the latter is trained. Parameters are the internal settings, or configurations, of the program that get adjusted during training. Earlier NLP models have parameters numbering from a few thousand to upwards of a million; LLMs have parameters numbering in the billions. GPT-3, for instance, has upwards of 175 billion parameters, and GPT-4 is reported to have over 1.7 trillion.
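For a back-of-the-envelope sense of what “billions of parameters” means, the sketch below applies a common rule of thumb for GPT-style transformers (roughly 12 × layers × width² weights in the transformer blocks, plus the word-embedding table) to the configuration OpenAI published for GPT-3. The 1.7 trillion figure for GPT-4 remains an unconfirmed estimate, so I leave it out of the arithmetic.

```python
def transformer_params(n_layers, d_model, vocab_size):
    """Rough parameter count for a GPT-style transformer.

    Rule of thumb: each block holds about 12 * d_model^2 weights
    (attention projections plus feed-forward layers), and the
    embedding table adds vocab_size * d_model more.
    """
    block_params = 12 * n_layers * d_model ** 2
    embedding_params = vocab_size * d_model
    return block_params + embedding_params

# Published GPT-3 configuration: 96 layers, width 12,288, ~50k-token vocabulary.
total = transformer_params(n_layers=96, d_model=12_288, vocab_size=50_257)
print(f"{total / 1e9:.0f} billion parameters")  # ~175 billion
```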

This storage and network adjustment ability makes an LLM an impressive source of generative data. But how impressive is its function when it comes to the very narrow (comparatively speaking) topic of law? We tend to think of the law as vast, but in the context of western law, post-Westphalian legal systems deal primarily with laws as the product of states. A limited number of states translates to a limited number of laws. The field is further made finite by a practitioner’s primary concern with finding “good law” that is preferably mandatory authority. Does hallucination cease to be a concern with legal AI, or does this drawback simply confine itself to the universe of the legal dataset?

So, what should our expectations be once we develop an informed understanding of how these models are applied to the legal field generally? Or, specifically, how does their incorporation into the LexisNexis realm fare in comparison with its more accessible alternatives? Well, rather than hypothesize, watch this video comparison and you be the judge.