VIVI: Secure AI for the Enterprise
The CEO of a leading financial institution was excited to leverage the power of OpenAI’s language models to develop a cutting-edge investment analysis tool. They envisioned a system that could analyze market trends, predict stock prices, and provide personalized investment recommendations to their clients. However, their enthusiasm quickly turned to concern when they realized the implications of feeding their proprietary financial data, including sensitive client information and trading strategies, into a public AI system. The risk of data leakage, potential exposure to competitors, and the uncertainty surrounding data ownership and control were simply too great. While OpenAI could provide faster, more accurate analysis, it lacked the safeguards required to maintain the confidentiality of their proprietary trading strategies and client portfolios. This forced them to either forgo the efficiencies of AI or find a more secure solution that would protect their sensitive data.
This is precisely why our AI Agent Platform, VIVI, is built on Microsoft Azure OpenAI. VIVI leverages Azure’s private data lakes and enhanced security features to provide a secure and controlled environment for businesses to develop and deploy AI agents. With VIVI, organizations can harness the power of advanced language models while maintaining complete control over their proprietary data and ensuring compliance with industry regulations. This white paper delves deeper into the security advantages of Azure OpenAI and how VIVI utilizes these features to empower businesses to confidently embrace AI innovation, while mitigating the risks associated with public AI systems.
Public AI and Data Security
Public AI systems, such as those offered by OpenAI, deliver advanced natural language processing and AI capabilities. However, for enterprise use cases involving sensitive data, security and privacy become critical concerns. These models are trained on vast datasets of public information, and submitting proprietary or confidential data to them raises inherent concerns about exposure.
OpenAI’s current policy states that they do not use data submitted through their API to train or improve their models by default. This applies to all API customers, including those using ChatGPT Enterprise and other business offerings. Users can explicitly opt in to share data for model improvement.
OpenAI generally retains API data for up to 30 days to provide services and identify abuse, but offers zero data retention (ZDR) on eligible endpoints for qualifying use cases. While OpenAI is committed to GDPR compliance and offers a Data Processing Addendum (DPA) to support this, concerns remain about data sharing with third parties and transparency regarding training data sources.
For enterprises dealing with sensitive information, the limitations of public AI systems, including potential risks that will be explored further in this paper, highlight the need for a more robust and controlled environment. This is where Azure OpenAI, with its private data lakes and integration with Microsoft’s enterprise-grade security infrastructure, offers a robust alternative.
Azure OpenAI and the Advantage of Private Data Lakes
Azure OpenAI, used by the VIVI platform, provides several key advantages for businesses dealing with sensitive information:
- Enhanced data security: Your data is stored in a private environment within Azure and is not shared with OpenAI or other third parties. This includes not sharing data with other Azure OpenAI customers.
- Data isolation: Customer interactions with the model are logically isolated and secured using various technical measures, including transport encryption, compute security perimeter, and exclusive access to allocated GPU memory.
- No model training with user data: Azure OpenAI explicitly states that customer data is not used to train, retrain, or improve the underlying models. Furthermore, your data is not used to improve any Microsoft or third-party products without your explicit permission.
- Compliance and control: Azure OpenAI integrates with Microsoft’s robust enterprise security and compliance controls, including Azure role-based access control (Azure RBAC) and Virtual Networks. This layered security model provides granular control over data access and ensures compliance with industry standards.
- Data sovereignty: You retain ownership and control over your data, ensuring that sensitive information does not leave the secure boundaries of your infrastructure.
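To make the layered-control idea concrete, the sketch below shows role-based gating of AI agent actions in plain Python. This is an illustration of the RBAC concept only, not Azure RBAC itself: in Azure, built-in roles (such as Cognitive Services OpenAI User) are assigned and enforced by the platform, and the role and user names here are hypothetical.

```python
# Illustrative sketch of role-based access control (RBAC) for an AI agent.
# Roles and permissions are hypothetical; Azure enforces its own built-in
# roles at the platform level rather than in application code.
from dataclasses import dataclass

ROLE_PERMISSIONS = {
    "analyst": {"query_model"},
    "ml_engineer": {"query_model", "fine_tune", "read_training_data"},
    "auditor": {"read_logs"},
}

@dataclass
class User:
    name: str
    role: str

def is_allowed(user: User, action: str) -> bool:
    """Return True if the user's role grants the requested action."""
    return action in ROLE_PERMISSIONS.get(user.role, set())

alice = User("alice", "analyst")
print(is_allowed(alice, "query_model"))  # an analyst may query the model
print(is_allowed(alice, "fine_tune"))    # but may not fine-tune it
```

The point of the pattern is separation of duties: the set of people who can query a model, the set who can retrain it, and the set who can read its logs need not overlap, and each grant can be audited independently.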
Benefits of Private Data Lakes for Internal AI Training
Private data lakes offer several key advantages for AI training, including:
- Data Confidentiality and Exclusivity: Your data is stored securely and used exclusively to train your own AI models. It is not shared with other users, competitors, or even Microsoft, ensuring complete confidentiality and control.
- Customization and Tailored Solutions: Private data lakes allow you to customize the training process to fit your specific needs. You can choose which data to use, how to prepare it, and which algorithms to apply, resulting in AI models that are tailored to your unique use cases.
- Complete Control over Model Training: You have full control over the entire model training process, from data selection and preparation to algorithm selection and model evaluation. This ensures that your AI models are developed according to your specific requirements and standards.
- Reduced Hallucinations: Training on your private data lake provides the model with domain-specific knowledge, factual grounding, and contextual understanding, reducing the likelihood of generating inaccurate or nonsensical outputs (hallucinations).
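The hallucination-reduction point above can be sketched as retrieval-augmented grounding: before a question reaches the model, relevant passages from the private store are prepended to the prompt so the model answers from facts it was given. The documents and function names below are illustrative, and the keyword-overlap ranking is a stand-in for a real embedding index such as Azure AI Search.

```python
# Minimal sketch of grounding a prompt in a private document store.
# The corpus and ranking method are illustrative only; production systems
# use vector search over embeddings rather than word overlap.

PRIVATE_DOCS = [
    "Q3 trading desk report: emerging-market bond exposure reduced by 12%.",
    "Client onboarding policy: KYC review required within 5 business days.",
    "Internal memo: the risk committee meets every second Tuesday.",
]

def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query and return the best matches."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query: str) -> str:
    """Prepend retrieved private context so the model answers from supplied facts."""
    context = "\n".join(retrieve(query, PRIVATE_DOCS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_grounded_prompt("When does the risk committee meet?")
print(prompt)
```

Because the model is instructed to answer only from the supplied context, its output is anchored to the organization’s own data rather than whatever it absorbed during pre-training.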
Risks of Using Public AI for Training with Sensitive Data
Using public AI for training with sensitive data poses several risks:
- Data leakage: Accidental or intentional data leaks can result in intellectual property theft or loss of competitive advantage. For example, employees might inadvertently share confidential information with a public AI model, leading to unintended exposure.
- Model inversion attacks: Cybercriminals can potentially reconstruct sensitive information by analyzing an AI model’s input and output data. This type of attack exploits the model’s tendency to memorize portions of its training data, allowing an attacker to recover that data from the model’s responses.
- Compliance challenges: Ensuring compliance with data security and privacy regulations can be complex and challenging. Public AI systems may not always meet the specific requirements of regulations like GDPR or HIPAA, putting organizations at risk of non-compliance.
- Unforeseen vulnerabilities: Public AI models may have unknown vulnerabilities that could be exploited by malicious actors. As these models are constantly evolving, new vulnerabilities may emerge that could be exploited to gain unauthorized access to data or manipulate the model’s behavior.
- Lack of transparency: AI companies are not always transparent about the data used for training their models, making it difficult to assess potential privacy risks. For example, OpenAI does not fully disclose the data sources used to train its models, raising concerns about the inclusion of sensitive information.
- Lack of end-to-end encryption: Some public AI tools, like Google Gemini, do not offer end-to-end encryption for user interactions, increasing the risk of data interception. Conversations with these models may therefore be accessible to the provider and, in the event of a breach, to unauthorized parties.
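One practical mitigation for the data-leakage risk above is to scrub obviously sensitive patterns from a prompt before it leaves the organization. The sketch below uses a few illustrative regexes; these are far from exhaustive, and a production deployment would rely on a dedicated data-loss-prevention service rather than ad-hoc pattern matching.

```python
# Minimal sketch of redacting sensitive-looking substrings before a prompt
# is sent to a public AI service. Patterns are illustrative, not exhaustive.
import re

REDACTION_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),      # US SSN-like numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),    # card-number-like runs
]

def redact(prompt: str) -> str:
    """Replace sensitive-looking substrings with placeholders before sending."""
    for pattern, placeholder in REDACTION_PATTERNS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

clean = redact("Client jane.doe@example.com, SSN 123-45-6789, asked about fees.")
print(clean)
```

Redaction at the boundary does not make a public model safe for confidential workloads, but it narrows the blast radius of the accidental sharing described above.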
Conclusion
Harnessing AI while safeguarding data is critical for enterprises. Azure OpenAI, with its private data lakes and security infrastructure, offers a solution for businesses to leverage AI while maintaining control over their information. By choosing platforms like VIVI built on Microsoft Azure OpenAI, organizations can embrace AI innovation, knowing their data is protected.