Amine Anoun is chief technology officer at Evisort, a contract management software company. Views are the author’s own.
We’ve all seen the headlines: ChatGPT has passed the bar with flying colors, is already being used extensively at law firms like Allen & Overy, and was even consulted by a judge in India during a court hearing to set bail. (For all we know, ChatGPT wrote those headlines and is writing many others daily – except, perhaps, in Italy, where it has been banned by a data protection authority.)
Recently, several contract lifecycle management (CLM) software providers have announced plans to leverage large language models (LLMs) like ChatGPT and GPT-4 to transform how lawyers interact with contracts. Contracts are language, after all, and predictive language AI has the potential to dramatically expedite contract clause creation, editing during a negotiation process, and other time-consuming contract management tasks.
At the same time, in-house counsel are being asked to weigh in on their corporations’ use of ChatGPT and other generative AI tools – such as Google’s Bard, code-drafting tools like GitHub Copilot and Amazon CodeWhisperer, image-creating apps like DALL-E, and companion chatbots like Replika. They’re finding that generative AI potentially raises issues around intellectual property (ownership of generated creations, copyright status of training sources), ethics (biases embedded in algorithms – for instance, in hiring), and the reliability of results (language-prediction algorithms are not fact-checkers and have even been known to “hallucinate” on occasion).
However, when it comes to contract management use cases of generative AI, data privacy looms large as a consideration – whether the legal team is interacting with an LLM in a way that might risk exposure of personal, sensitive, or other proprietary data. Certainly, no contract attorney wants to make headlines like two engineers at Samsung recently did by disclosing proprietary code via a publicly available generative AI site.
So, if a legal team uses generative AI for contract management, will that compromise their data privacy posture?
Well, it depends.
In this case, it depends on the answers to the following questions:
What exactly is the use case?
Even within the realm of contract management, the specific use case matters. For example, having an LLM draft new contractual clauses or suggest contractual text revisions during negotiations ostensibly poses little threat while offering potentially significant savings in time and manual effort. As long as the suggested clauses and edits are reviewed and approved by counsel before being implemented – and the “prompts” used to guide the LLM do not reveal confidential, personal, or otherwise sensitive information – these use cases certainly seem innocuous enough.
However, the benefits of AI-generated contract editing become far more profound when the suggestions draw on the organization’s historical contract data and pre-approved contracting guidelines. So what are the data privacy implications when generative AI capabilities are integrated directly with an organization’s existing contract data?
That brings us to the next question.
What data is being shared?
It’s important to assess what information the solution is sending to an outside LLM to generate its suggestions. Is the prompt merely general organizational guidelines (for example, “Indemnify only up to the value of the contract, no exception for legal fees”), or does the use case require sharing contractual language under negotiation with a supplier or customer to generate the suggested edits? Could that language expose personal or sensitive data protected under GDPR, CCPA/CPRA, HIPAA or other regulatory frameworks?
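To make that distinction concrete, here is a minimal, hypothetical illustration (the guideline text, counterparty name, and contract excerpt are invented for this example) of the two kinds of prompt content a CLM integration might assemble:

```python
# Hypothetical examples of prompt content a CLM tool might send to an outside LLM.

# Lower-risk prompt: only general, pre-approved organizational guidance.
guideline_only_prompt = (
    "Draft an indemnification clause for a services agreement. "
    "Cap indemnity at the value of the contract, with no exception for legal fees."
)

# Higher-risk prompt: includes live negotiation text that may contain personal or
# confidential data protected under GDPR, CCPA/CPRA, HIPAA, or similar frameworks.
negotiation_prompt = (
    "Suggest edits to this clause from our draft with Acme Health GmbH: "
    "'Breach notices go to Dr. Jane Doe, jane.doe@acmehealth.example' ..."
)
```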
If there is a risk of sharing such data, then the next question becomes all the more important.
Is the LLM integrated via an API or is it public?
Much of the use of LLMs so far has been via public environments such as OpenAI’s ChatGPT portal (currently dubbed a “Free Research Preview”), which obviously is not designed to ensure reliable data privacy protection for enterprises. However, that is not the only option.
Contract technology providers can integrate LLMs into their solutions via APIs (GPT-4 can be licensed from OpenAI or Microsoft, for example), which not only provides greater control over the exposure, use, and retention of data submitted in prompts, but also enables the provider to incorporate the LLM more deeply into their solution’s capabilities. (Indeed, if the tech provider is effectively just putting a wrapper around GPT, rather than augmenting it with additional training and guardrails, the corporation or firm may wish to consider licensing the LLM’s API directly to gain enhanced data privacy with essentially the same functionality.)
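As a rough sketch of what direct API access looks like in practice – assuming the OpenAI Python client, with the model name and prompt purely illustrative – note that only pre-approved guidance, not counterparty contract text, is placed in the prompt:

```python
# Minimal sketch of calling a licensed LLM API directly rather than a public portal.
# Assumes the OpenAI Python client (openai >= 1.0); model and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are a contract-drafting assistant for in-house counsel."},
        {"role": "user",
         "content": ("Draft an indemnification clause capped at the value of the "
                     "contract, with no exception for legal fees.")},
    ],
)

print(response.choices[0].message.content)
```

Data submitted this way falls under the provider’s API terms rather than the consumer portal’s, which is where the retention and training questions below come into play.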
Even with a licensed API approach, it’s still wise to consider the following two questions.
What is the data retention policy?
Will the shared data be deleted in the near term or retained for a longer period? Retained data not only increases the potential for a breach, but may also trigger rights to have personal data deleted or corrected under GDPR and CCPA. OpenAI has implemented a 30-day deletion policy for all data shared via its API; this policy does not currently apply to the ChatGPT portal.
Will the data be used for training?
Similarly, will the data you provide be used to help train a publicly available AI model, or do you have the ability to opt out of having your data used for training altogether? (OpenAI has gone further and implemented an “opt-in” policy for model training for its API users.)
If there is no opt-out for AI training, can you be confident that your contract data will be successfully anonymized to ensure there is no exposure? While there are effective techniques to achieve data anonymization in the context of training algorithms, it’s important to ensure that the solution provider has adequate data science expertise and employs reliable procedures to protect your data.
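As a simplified sketch only – pattern-based masking is a pre-processing step, not true anonymization, and the identifiers below are invented – a provider might strip obvious personal data from contract text before it leaves the organization:

```python
import re

# Simplified illustration of masking obvious identifiers in contract text before it
# is shared. Real anonymization pipelines typically combine named-entity recognition
# with human review rather than relying on regexes alone.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with placeholder tokens such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

clause = "Notices to: Jane Doe, jane.doe@example.com, +1 (555) 010-2030."
print(redact(clause))
# -> Notices to: Jane Doe, [EMAIL], [PHONE].
# The personal name is untouched: catching names reliably requires NER-based tooling,
# which is why the provider's data science expertise and procedures matter.
```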
Putting AI and data privacy to the test.
As with any application of new legal technology, it’s wise to conduct a live proof of concept when deploying a contract management platform. Creating and using an active environment with ingested contract data is an effective way not only to test the quality of the AI and the surrounding platform, but also to observe the actual flow of information for data privacy due diligence.
This process allows for yet more questions to be answered: Does data need to be sent overseas for processing? Are human reviewers in the loop and, if so, what data privacy controls surround that process? Are other third-party services needed to implement or configure the solution, triggering further data privacy implications?
With these questions answered successfully, generative AI – and the broader range of contract AI – can bring powerful advantages to legal teams. Those benefits go far beyond accelerating contracting processes: by surfacing the data hidden in volumes of contracts, contract AI helps legal professionals quickly address their companies’ critical business needs.
“Legal team safely leverages contract AI to drive its business forward” might not make headlines, but it’s a worthy goal.