For over a decade, artificial intelligence (AI) has played an important role in eDiscovery, particularly in the review phase. One of the primary applications for AI has been in technology-assisted review (TAR), which has been accepted as black-letter law since 2015.
But AI has come a long way in the last seven years. Recent advances in AI—specifically in the generation of text through a new chatbot—may have eDiscovery professionals wondering about other ways that they could use AI to streamline their workflows, saving both time and money.
In this blog post, we’ll take a closer look at some new developments in artificial intelligence generally. We’ll then shift our attention to legal applications for AI’s new capabilities, specifically how text-generating AI systems could be useful in eDiscovery.
Finally, we’ll touch on some of the ethical concerns and other risks that AI poses and recommend a few common-sense precautions for eDiscovery professionals who are considering expanding their AI tech stack.
Recent advancements in artificial intelligence
First, let’s take a closer look at a few recent developments in AI. Note that these aren’t legal applications for the technology; we’re simply pointing these out to show how varied and sophisticated AI applications have grown in the last several years.
Low- and no-code AI
You don’t have to know how to write code—or how to train an AI system—to develop an app that leverages AI technology. With low- and no-code AI tools, users can combine existing elements to create their own smart apps. A retail store could design an app to recommend new products for its customers based on what they’ve already purchased. Or a marketing department might want an app that can analyze the company’s Salesforce data to provide insights about what products need an extra push and which customers and prospects they should target.
Low- and no-code app builders let users assemble app components using either a drag-and-drop visual format or a question-and-answer wizard—with no coding and minimal technical know-how required.
OpenAI Generative Pre-trained Transformer 3 (GPT-3)
OpenAI is an AI research company that’s been making headlines for years with its powerful yet quirky AI applications, including the various iterations of its Generative Pre-trained Transformer or GPT. GPT is a language generation tool that can create text based on a natural language prompt like a sentence or even a short phrase.
What is the GPT-3 text generator?
GPT-3, the latest full-release version, is available through an API, allowing other developers to use its AI in their own apps. We’ll talk more about GPT-3 in just a moment, because—unlike the majority of AI advances on this list—it’s potentially useful for eDiscovery professionals.
OpenAI’s work isn’t limited to language generation; the company also created DALL·E 2, an AI system that translates natural language descriptions into images. Want to see how Andy Warhol might have depicted an astronaut riding a horse? DALL·E 2 can do that. It can also edit existing images based on natural language captions. Need to add a flamingo to that picture? No problem. DALL·E 2 can even create variations on a theme and expand images. If you’ve ever wondered what might have been in the room with Vermeer’s Girl with a Pearl Earring, DALL·E 2 can fill in her surroundings for you.
But AI systems aren’t just for fun—they’re solving serious problems too.
AlphaFold and other scientific AI systems
AlphaFold is an AI system developed by DeepMind that predicts the three-dimensional structure of a protein, informed by a large database of experimentally determined protein structures. Whereas GPT-3 studies language and predicts the next word that should appear in a sentence, AlphaFold examines a sequence of amino acids to determine how it will fold into a stable arrangement. Together with EMBL’s European Bioinformatics Institute, DeepMind has used AlphaFold to build a freely available database of predicted protein structures, and the system has been tested and proven highly reliable at “predicting protein structures to near experimental accuracy” in most cases.
Artificial Intelligence is also being used to advance science and public health in other ways. For example, AI algorithms have been used to analyze cellphone data and predict the spread of COVID-19 so that localities can better allocate resources and plan for outbreaks.
ChatGPT
Just this December, OpenAI released its latest iteration of GPT: ChatGPT, a chatbot built on what OpenAI refers to as its GPT-3.5 model. ChatGPT can engage in conversation, answer questions, compose poetry, create or troubleshoot code, and more.
ChatGPT is impressively adept at generating clear explanations with excellent grammar and an authoritative, human-sounding tone. Unfortunately, it’s also capable of authoritatively stating complete nonsense and non-factual information.
As with GPT-3, ChatGPT is designed to refuse to answer unethical requests, such as how to break into someone’s home. But that doesn’t mean ChatGPT won’t be used unethically. There isn’t currently a way to detect whether specific text—such as an essay or final exam for a college student’s history class—was generated by ChatGPT instead of a human. This has left teachers understandably concerned that they won’t know when their students are using ChatGPT to cheat on their assignments.
Of course, ChatGPT isn’t a legal tool; it’s more of an amusing toy to explore and experiment with. The other GPT models, however, may have real applications for eDiscovery professionals. Let’s turn to those now.
Using GPT-3 (and GPT-4) in eDiscovery
As we’ve already touched on, Generative Pre-trained Transformer 3, commonly known as GPT-3, is a large language model for text generation. Without getting too far into the technical details, GPT-3 is a transformer network with 175 billion parameters and 96 decoder layers, trained on a dataset of hundreds of billions of words sourced from the internet; it draws on that training to predict, word by word, the text most likely to follow a prompt. Because training such a complex system is incredibly time- and resource-intensive, GPT-3 is a pre-trained model that has already learned the general rules of syntax, grammar, and more. That general model can then be fine-tuned for specific use cases.
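To make the fine-tuning idea concrete, here is a minimal sketch of preparing training data in the one-JSON-object-per-line (JSONL) prompt/completion format that OpenAI’s GPT-3 fine-tuning endpoint accepted at the time. The task, example pairs, and file name are hypothetical, chosen only to illustrate the format.

```python
import json

# Hypothetical prompt/completion pairs for a narrow eDiscovery task:
# classifying short document descriptions as privileged or not.
examples = [
    {"prompt": "Email from outside counsel discussing litigation strategy ->",
     "completion": " privileged"},
    {"prompt": "Company-wide memo announcing the summer picnic ->",
     "completion": " not privileged"},
]

# GPT-3 fine-tuning expected one JSON object per line (JSONL).
with open("finetune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every line parses back into a prompt/completion pair.
with open("finetune_data.jsonl") as f:
    parsed = [json.loads(line) for line in f]
print(len(parsed))  # number of training examples written
```

A file like this would then be uploaded to the fine-tuning service; the point is simply that fine-tuning reuses the pre-trained model and only supplies task-specific examples.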
GPT-3 is extremely capable at many tasks, but it’s subject to the same factual shortcomings as ChatGPT. That’s in line with its design and its intended purpose as a creative text generator. Creativity is a boon for an AI model that can compose a limerick or explain an argument for or against a position. But when that same tool is used to draft legal documents or provide legal advice, non-factual responses are a huge problem. OpenAI’s CEO has acknowledged this shortcoming, stating that while the existing GPT models “know a lot, the danger is that [they are] confident and wrong a significant fraction of the time.”
What about Generative Pre-trained Transformer 4 or GPT-4? This next iteration is still in development; while it may be launched in 2023, OpenAI has not yet officially set a date for its release.
So, where could GPT-3 or the eventual GPT-4 be useful in eDiscovery? We can see the potential of these tools for generating search queries or expanding existing queries to identify potentially relevant data or keywords in the early stages of eDiscovery. They could also be used to generate regular expressions for tools like the ZyLAB Insights extraction platform.
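To illustrate the regular-expression use case, here is the kind of pattern a text-generation model might produce from a plain-language request such as “match US-style dates.” The pattern and snippet below are our own hypothetical sketch, not ZyLAB syntax or actual GPT-3 output, and any machine-generated pattern should be validated against sample data before it’s trusted in a production workflow.

```python
import re

# A hypothetical pattern of the sort one might ask GPT-3 to generate
# from the plain-language request "match US-style dates like 03/15/2022".
date_pattern = re.compile(
    r"\b(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])/(19|20)\d{2}\b"
)

text = "The contract was signed on 03/15/2022 and amended on 7/4/2023."
matches = [m.group(0) for m in date_pattern.finditer(text)]
print(matches)  # the date strings found in the sample text
```

Testing the pattern against known documents, as the last two lines do here, is exactly the kind of human validation step that makes an AI-suggested query defensible.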
As our Chief Data Scientist, Johannes (Jan) C. Scholtes, summed it up, “GPT-3 is very powerful, but it must be used carefully, as it does not have any notion of common sense or world knowledge. Those gaps make it fairly dangerous when it comes to notions of legal defensibility.” Lawyers who want to use GPT models to compose legal documents, where factuality is most definitely required, should be sure that a human lawyer has validated the results.
The future of artificial intelligence and its impact on the legal profession
We haven’t seen the end of these exciting new AI tools; in fact, we fully expect the pace of development to continue picking up. And that’s a good thing, because we’re going to need smarter tools to manage the increasing volume and complexity of data that businesses are generating.
We’ve been encouraging eDiscovery professionals to use AI tools in the earlier stages of the eDiscovery pipeline for a while now. The proliferation of corporate data demands new approaches to winnow down the universe of potentially relevant data long before the review stage. That’s why we’re so excited about Live Early Data Assessment (Live EDA), our proven in-place search solution.
Live EDA navigates and reviews live data across multiple repositories to provide insights and uncover potential risks before review or even data collection. Live EDA then compiles that information into a content index that tells the user where information is located, how long it’s been there, and who has access and modification privileges.
Of course, AI has potential applications for the legal profession beyond eDiscovery. We’ve already seen AI-powered legal research and legal chatbots like DoNotPay. Expect to see more AI-enabled solutions for contract preparation (as well as other legal document drafting), contract management, due diligence, and more.
All of these tools, however, demand that lawyers stay alert to the potential risks of AI and the ethical quandaries, including inadvertent bias, that can result from its use.
Artificial Intelligence as an essential part of efficient eDiscovery
Recent advances in AI have highlighted both the potential utility of this powerful technology and the significant risks that it poses. AI can churn through vast data stores and document repositories in minutes, delivering outstanding results and saving lawyers countless hours of drudgery. Savvy eDiscovery professionals will find ways to leverage AI across the entire eDiscovery pipeline while maintaining sufficient review and oversight to ensure that the technology is being used appropriately.