Homomorphic Encryption Protects AI Conversations

Summary
– User conversations with AI chatbots like ChatGPT and Grok are being exposed in search results, revealing privacy vulnerabilities in current systems.
– Duality’s private LLM framework uses fully homomorphic encryption (FHE) to process user queries without ever decrypting the data, protecting both the prompt and the AI’s response.
– The current prototype supports smaller models like Google’s BERT, requiring minor algorithmic tweaks for FHE compatibility but no model retraining.
– A major challenge is FHE’s computational slowness due to large data sizes and a complex operation called bootstrapping, which the team is addressing through algorithmic improvements and hardware acceleration.
– This encrypted approach enables secure data analysis in sensitive fields like healthcare and finance, offering a future where data utility doesn’t require exposing private information.
Recent incidents have revealed a troubling vulnerability: conversations with popular AI chatbots can inadvertently surface in public search results, raising serious questions about digital privacy. This exposure isn’t limited to one platform; prompts shared with various AI assistants have been found on public feeds. A potential solution, however, is emerging from the field of advanced cryptography, offering a way to interact with artificial intelligence without sacrificing confidentiality.
Duality, a firm focused on privacy-enhancing technologies, is developing a framework for private large language model (LLM) inference. The core of its innovation is fully homomorphic encryption (FHE), a powerful cryptographic method that allows computations to be performed directly on encrypted data, so nothing needs to be decrypted at any point in the process. Here’s how it works: a user’s prompt is first encrypted under FHE. The scrambled query is then sent to the LLM, which processes it while it is still encrypted, and the model’s encrypted response is returned to the user, who alone can decrypt it.
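To make that data flow concrete, here is a minimal sketch using OpenFHE, the open-source FHE library discussed later in this piece. It is not Duality’s inference code: a trivial element-wise squaring stands in for the LLM, and the parameters are illustrative rather than a vetted configuration. What it does show is the shape of the protocol: the client encrypts, the server computes only on ciphertext, and only the holder of the secret key can read the result.

```cpp
#include "openfhe.h"

#include <iostream>
#include <vector>

using namespace lbcrypto;

int main() {
    // Client: set up a CKKS crypto context and generate keys.
    CCParams<CryptoContextCKKSRNS> params;
    params.SetMultiplicativeDepth(1);
    params.SetScalingModSize(50);
    params.SetBatchSize(8);

    CryptoContext<DCRTPoly> cc = GenCryptoContext(params);
    cc->Enable(PKE);
    cc->Enable(KEYSWITCH);
    cc->Enable(LEVELEDSHE);

    auto keys = cc->KeyGen();
    cc->EvalMultKeyGen(keys.secretKey);

    // Client: encrypt the "prompt" (here, just a vector of numbers).
    std::vector<double> prompt = {0.1, 0.2, 0.3, 0.4};
    Plaintext pt = cc->MakeCKKSPackedPlaintext(prompt);
    auto ct = cc->Encrypt(keys.publicKey, pt);

    // Server: computes directly on the ciphertext. A real deployment would
    // run model inference here; the server never sees the plaintext.
    auto ctResponse = cc->EvalMult(ct, ct);  // element-wise squaring

    // Client: only the secret-key holder can decrypt the response.
    Plaintext result;
    cc->Decrypt(keys.secretKey, ctResponse, &result);
    result->SetLength(prompt.size());
    std::cout << result << std::endl;
    return 0;
}
```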
According to Kurt Rohloff, Duality’s cofounder and CTO, this approach means users “can decrypt the results and get the benefit of running the LLM without actually revealing what was asked or what was responded.” Currently, the framework operates as a prototype supporting smaller models such as Google’s BERT. The team made targeted adjustments so these models run efficiently under FHE, such as replacing certain complex functions with close approximations. These minor modifications let the AI behave normally without requiring any retraining of the underlying model.
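As an illustration of that kind of substitution, consider GELU, the activation function BERT uses. Its exact form involves the error function, which FHE cannot evaluate directly, but it can be swapped for a polynomial approximation built from the additions and multiplications FHE does support. The sketch below uses OpenFHE’s EvalChebyshevFunction to fit a Chebyshev polynomial to GELU over a fixed interval; the choice of GELU, the interval, and the polynomial degree are illustrative assumptions on my part, not details confirmed by Duality.

```cpp
#include "openfhe.h"

#include <cmath>
#include <iostream>
#include <vector>

using namespace lbcrypto;

int main() {
    CCParams<CryptoContextCKKSRNS> params;
    params.SetMultiplicativeDepth(10);  // depth budget for the polynomial evaluation
    params.SetScalingModSize(50);
    params.SetBatchSize(8);

    CryptoContext<DCRTPoly> cc = GenCryptoContext(params);
    cc->Enable(PKE);
    cc->Enable(KEYSWITCH);
    cc->Enable(LEVELEDSHE);
    cc->Enable(ADVANCEDSHE);  // required for EvalChebyshevFunction

    auto keys = cc->KeyGen();
    cc->EvalMultKeyGen(keys.secretKey);

    std::vector<double> x = {-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0};
    auto ct = cc->Encrypt(keys.publicKey, cc->MakeCKKSPackedPlaintext(x));

    // GELU(v) = 0.5 * v * (1 + erf(v / sqrt(2))) involves erf, which FHE
    // cannot compute exactly; fit a Chebyshev polynomial to it on [-4, 4].
    auto gelu = [](double v) { return 0.5 * v * (1.0 + std::erf(v / std::sqrt(2.0))); };
    auto ctApprox = cc->EvalChebyshevFunction(gelu, ct, -4.0, 4.0, 59);

    Plaintext result;
    cc->Decrypt(keys.secretKey, ctApprox, &result);
    result->SetLength(x.size());
    std::cout << result << std::endl;  // each slot is close to GELU of the input
    return 0;
}
```

Because the polynomial is built purely from additions and multiplications, it runs entirely under encryption; the trade-off is extra multiplicative depth, which feeds directly into the bootstrapping costs discussed next.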
Despite its strong security credentials, FHE faces significant performance hurdles. It is considered secure even against future quantum computers, but computing on encrypted data is slow and resource-intensive. Rashmi Agrawal of CipherSonic Labs explains that FHE relies on lattice-based cryptography, which dramatically inflates data sizes: encrypted values and keys are very large and demand substantial memory. A major computational bottleneck is an operation called bootstrapping, which periodically removes the noise that accumulates in encrypted data as computation proceeds. Agrawal notes that “this particular operation is really expensive, and that is why FHE has been slow so far.”
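In OpenFHE’s CKKS implementation, bootstrapping is an explicit operation the programmer invokes once a ciphertext has nearly exhausted its “levels.” The sketch below follows the shape of the library’s simple-ckks-bootstrapping example: it encodes data at the deepest level, as if a long computation had already consumed the noise budget, then refreshes it with EvalBootstrap. The parameters, including the deliberately small and insecure ring dimension used to keep a toy run fast, are illustrative, and the exact parameter-setup API can differ between OpenFHE versions.

```cpp
#include "openfhe.h"

#include <iostream>
#include <vector>

using namespace lbcrypto;

int main() {
    CCParams<CryptoContextCKKSRNS> params;
    SecretKeyDist secretKeyDist = UNIFORM_TERNARY;
    params.SetSecretKeyDist(secretKeyDist);
    params.SetSecurityLevel(HEStd_NotSet);  // toy, INSECURE setting for speed
    params.SetRingDim(1 << 12);
    params.SetScalingModSize(59);
    params.SetFirstModSize(60);
    params.SetScalingTechnique(FLEXIBLEAUTO);

    // Bootstrapping itself consumes levels, so the total depth must cover
    // both the bootstrap circuit and the work we want to do afterward.
    std::vector<uint32_t> levelBudget = {4, 4};
    uint32_t levelsAfterBootstrap = 10;
    uint32_t depth = levelsAfterBootstrap +
                     FHECKKSRNS::GetBootstrapDepth(levelBudget, secretKeyDist);
    params.SetMultiplicativeDepth(depth);

    CryptoContext<DCRTPoly> cc = GenCryptoContext(params);
    cc->Enable(PKE);
    cc->Enable(KEYSWITCH);
    cc->Enable(LEVELEDSHE);
    cc->Enable(ADVANCEDSHE);
    cc->Enable(FHE);

    uint32_t numSlots = cc->GetRingDimension() / 2;
    cc->EvalBootstrapSetup(levelBudget);

    auto keys = cc->KeyGen();
    cc->EvalMultKeyGen(keys.secretKey);
    cc->EvalBootstrapKeyGen(keys.secretKey, numSlots);  // large rotation keys

    // Encode at the deepest level, mimicking a nearly exhausted ciphertext.
    std::vector<double> x = {0.25, 0.5, 0.75, 1.0};
    Plaintext pt = cc->MakeCKKSPackedPlaintext(x, 1, depth - 1);
    auto ct = cc->Encrypt(keys.publicKey, pt);

    // The expensive step: refresh the ciphertext so computation can continue.
    auto ctRefreshed = cc->EvalBootstrap(ct);

    Plaintext result;
    cc->Decrypt(keys.secretKey, ctRefreshed, &result);
    result->SetLength(x.size());
    std::cout << result << std::endl;
    return 0;
}
```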
To tackle these challenges, Duality is refining an FHE scheme known as CKKS, which is particularly well suited to machine-learning tasks. The scheme packs large vectors of real numbers into individual ciphertexts and operates on every entry at once, which is what gives it high throughput. Key improvements include integrating a recent advance called functional bootstrapping, which enables efficient homomorphic comparisons of large vectors. All of these developments are contributed to OpenFHE, an open-source library that Duality helps maintain, fostering community-driven progress.
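That packing is what makes CKKS a natural fit for the linear algebra at the heart of machine learning: one ciphertext carries many numbers in parallel “slots,” and one homomorphic operation acts on all of them simultaneously. Below is a minimal sketch of this SIMD-style batching, computing an encrypted dot product in OpenFHE with a single multiplication plus a slot-folding EvalSum. It does not demonstrate the functional-bootstrapping comparisons mentioned above, only the vector handling that underpins CKKS’s throughput.

```cpp
#include "openfhe.h"

#include <iostream>
#include <vector>

using namespace lbcrypto;

int main() {
    CCParams<CryptoContextCKKSRNS> params;
    params.SetMultiplicativeDepth(1);
    params.SetScalingModSize(50);
    params.SetBatchSize(8);

    CryptoContext<DCRTPoly> cc = GenCryptoContext(params);
    cc->Enable(PKE);
    cc->Enable(KEYSWITCH);
    cc->Enable(LEVELEDSHE);
    cc->Enable(ADVANCEDSHE);  // required for EvalSum

    auto keys = cc->KeyGen();
    cc->EvalMultKeyGen(keys.secretKey);
    cc->EvalSumKeyGen(keys.secretKey);

    // Two 8-element vectors, each packed into a single ciphertext's slots.
    std::vector<double> a = {1, 2, 3, 4, 5, 6, 7, 8};
    std::vector<double> b = {0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0};
    auto ctA = cc->Encrypt(keys.publicKey, cc->MakeCKKSPackedPlaintext(a));
    auto ctB = cc->Encrypt(keys.publicKey, cc->MakeCKKSPackedPlaintext(b));

    // One homomorphic multiply acts on all 8 slot pairs at once;
    // EvalSum then folds the slots into the encrypted inner product.
    auto ctProd = cc->EvalMult(ctA, ctB);
    auto ctDot = cc->EvalSum(ctProd, 8);

    Plaintext result;
    cc->Decrypt(keys.secretKey, ctDot, &result);
    result->SetLength(1);  // slot 0 holds the dot product (here, about 31)
    std::cout << result << std::endl;
    return 0;
}
```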
Hardware acceleration is another critical component for making FHE practical for larger LLMs. Specialized hardware like GPUs, FPGAs, and ASICs can speed up computations by several orders of magnitude. Duality has incorporated a hardware abstraction layer into OpenFHE to seamlessly switch from standard CPUs to these faster alternatives. Agrawal concurs that GPUs and FPGAs are ideal for this task due to their speed and high-bandwidth memory connections, with FPGAs offering the added benefit of being customizable for specific FHE workloads.
Looking ahead, Duality is advancing its private inference framework from a prototype to a production-ready system. The company is also exploring ways to safeguard other AI operations, such as fine-tuning models on specialized data and performing semantic searches, all under the protection of encryption.
FHE is part of a broader toolkit for preserving privacy in AI, which also includes techniques like differential privacy and confidential computing. While confidential computing has been available longer, Agrawal points out a significant limitation: it cannot support GPUs, making it a poor fit for the computational demands of LLMs. She asserts that “FHE is strongest when you need noninteractive end-to-end confidentiality because nobody is able to see your data anywhere in the whole process of computing.”
The implications of fully encrypted LLMs are profound. In healthcare, clinical data could be analyzed without exposing patient records. Financial institutions could detect fraud without sharing bank account details. Companies could safely leverage cloud computing without risking their proprietary information. Ultimately, user dialogues with AI assistants could be truly private. We are witnessing a renaissance in privacy technologies that enables secure data collaboration, allowing us to gain powerful insights from our sensitive information without the need to expose it.
(Source: IEEE Spectrum)