# Privacy & Data
This page describes where OMD Cleo stores data and what information may be transmitted to external Large Language Model (LLM) providers.
## Data storage

### PostgreSQL (primary database)
All conversation state is persisted in PostgreSQL:
| Table | Data stored |
|---|---|
| `conversations` | Channel type, room ID, workflow ID, OMD instance, config ID, language, model, participant user IDs, conversation summary |
| `messages` | Full message content (LangChain message JSON), sender ID, event ID, timestamp, thread references |
| `room_user_mappings` | Mapping of user IDs to conversation IDs per workflow |
| `manifests` | System prompt instructions (prompts) per tenant and role scope |
| `system_prompt_versions` | Versioned history of tenant system prompt edits |
| `schedules` | Scheduled workflow and code execution jobs |
| `mem0_history` | History of memory add/update operations |
Every chat message — including full text — is stored in PostgreSQL together with sender identifiers.
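As an illustration of what "full text plus sender identifiers" means in practice, a stored message row might look like the sketch below. All field names are assumptions based on the table description above, not the actual database schema:

```python
import json
from datetime import datetime, timezone

# Hypothetical shape of a row in the `messages` table; real column names
# may differ. The point: the complete message text is persisted together
# with sender identifiers and thread references.
message_row = {
    "conversation_id": "c0ffee-1234",           # assumed FK to `conversations`
    "sender_id": "@alice:example.org",          # sender identifier
    "event_id": "$evt_42",                      # channel event reference
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "content": json.dumps({                     # full LangChain message JSON
        "type": "human",
        "data": {"content": "Where is task 4711?"},
    }),
}
```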
### ChromaDB (vector store)
| Collection | Data stored |
|---|---|
| `customer_documents` | Embeddings (and optionally text excerpts) of documents from the OMD Document Hub |
| `documentation_documents` | Embeddings of OMD documentation pages |
| `conversation_summaries` | Embeddings of conversation summaries from past sessions |
| `memories` | User and organisation memory facts (text + embeddings) |
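A record in the `memories` collection pairs the plain-text fact with its embedding vector and scoping metadata. The sketch below is illustrative only; the field names are assumptions, not Cleo's actual ChromaDB schema:

```python
# Hypothetical record in the `memories` collection: the fact is stored
# both as readable text and as an embedding, with metadata used to scope
# it to a tenant and a user or organisation.
memory_record = {
    "id": "mem-001",
    "document": "Alice prefers morning appointments.",  # plain-text fact
    "embedding": [0.12, -0.03, 0.88],                   # toy 3-dim vector
    "metadata": {
        "tenant_id": "tenant-a",   # scoping key (assumed name)
        "scope": "user",           # user vs. organisation memory
        "user_id": "alice",
    },
}
```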
### mem0 / Memgraph (graph memory)
When Memgraph is enabled, memory facts are stored as a property graph (entities and relations). This may include names, roles, preferences, and other attributes mentioned in conversations.
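Conceptually, such a property graph stores facts as entity-relation-entity triples with attached attributes. A toy sketch (names and relation types are invented for illustration):

```python
# Hypothetical graph-memory content: (entity, relation, entity, attributes)
# triples, as they might be materialised in Memgraph. Names, roles, and
# preferences mentioned in conversations can all end up here.
graph_facts = [
    ("alice", "HAS_ROLE", "dispatcher", {"source": "conversation"}),
    ("alice", "PREFERS", "morning_appointments", {}),
]
```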
### GitHub (optional)
When the GitHub memory backend is configured, memory facts are committed as files to a GitHub repository. The repository may therefore contain personal information extracted from conversations.
## Data sent to LLM providers
Each time the agent generates a response, a request is sent to the configured LLM provider. The following data is included.
### System prompt
The system prompt is assembled per conversation and may contain:
- Agent role instructions and output-format rules.
- The tenant's base prompt and role-specific manifest.
- User memory: facts previously extracted from the user's past conversations.
- Organisation memory: facts relevant to the organisation.
- Names and user IDs of conversation participants.
- Enabled tool descriptions.
- Current UTC date and time.
- When the `sql_bot` workflow is active: the full StarRocks database schema for the OMD instance.
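The assembly described above amounts to concatenating these parts into one prompt string. A minimal sketch; section names and ordering are assumptions, not Cleo's actual prompt builder:

```python
from datetime import datetime, timezone

def build_system_prompt(parts: dict) -> str:
    """Compose the system prompt from its sections (illustrative sketch)."""
    sections = [
        parts["role_instructions"],           # agent role + output-format rules
        parts["tenant_base_prompt"],          # tenant base prompt / role manifest
        "User memory:\n" + "\n".join(parts["user_memory"]),
        "Organisation memory:\n" + "\n".join(parts["org_memory"]),
        "Participants: " + ", ".join(parts["participants"]),
        "Tools:\n" + "\n".join(parts["tool_descriptions"]),
        "Current UTC time: " + datetime.now(timezone.utc).isoformat(),
    ]
    return "\n\n".join(sections)

prompt = build_system_prompt({
    "role_instructions": "You are Cleo, a workforce-management assistant.",
    "tenant_base_prompt": "Answer in the tenant's configured language.",
    "user_memory": ["Alice prefers morning appointments."],
    "org_memory": ["Office hours are 08:00-17:00."],
    "participants": ["@alice:example.org"],
    "tool_descriptions": ["get_task: fetch task details by ID"],
})
```

Note that user and organisation memory are injected on every request, which is why facts extracted in earlier sessions reach the LLM provider even in unrelated conversations.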
### Conversation history
All messages in the current conversation window are sent, subject to the configured token limit.
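A common way to enforce such a limit is to drop the oldest messages until the remaining window fits the budget. A rough sketch; the crude character-based token estimate and the limit value are illustrative, not Cleo's actual accounting:

```python
def trim_history(messages: list[str], token_limit: int) -> list[str]:
    """Keep the most recent messages whose combined (approximate) token
    count fits within token_limit; oldest messages are dropped first."""
    def approx_tokens(text: str) -> int:
        return max(1, len(text) // 4)    # crude ~4 chars/token heuristic

    kept: list[str] = []
    budget = token_limit
    for msg in reversed(messages):       # walk newest-first
        cost = approx_tokens(msg)
        if cost > budget:
            break                        # everything older is cut off too
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))          # restore chronological order

history = ["old " * 50, "recent question?", "latest answer."]
window = trim_history(history, token_limit=20)
# Only the two recent messages fit the 20-token budget.
```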
### OMD business data (via tool calls)
Tool call results are appended to the conversation history and sent to the LLM in subsequent turns. This data can include:
- Task details: IDs, descriptions, addresses, customer data, dates, status.
- Trip and route data: resource IDs, territories, time windows, locations.
- Customer and entity data from OData queries.
- StarRocks SQL query results (workforce analytics data).
- Superset chart and dashboard metadata.
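Mechanically, this means tool output becomes part of the message list sent on the next request. A minimal sketch; the message shapes are assumptions, not Cleo's internal types:

```python
# Conversation history as role/content records (illustrative shape).
history = [
    {"role": "user", "content": "Where is task 4711?"},
    {"role": "assistant", "content": None,
     "tool_call": {"name": "get_task", "args": {"task_id": "4711"}}},
]

# The tool result -- which may contain addresses and customer data --
# is appended verbatim, so it reaches the LLM provider on the next turn.
tool_result = {"task_id": "4711", "address": "Main St 1", "status": "open"}
history.append({"role": "tool", "content": str(tool_result)})
```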
## Data residency
OMD Cleo is hosted on OMD's EU Kubernetes cluster by default. For customers with strict data-residency requirements, Cleo can be deployed into a customer-owned Kubernetes cluster. Contact OMD to discuss deployment options.
## Tenant isolation
All data is strictly scoped to the tenant it belongs to. Cross-tenant data access is architecturally impossible — queries, memory lookups, and document retrieval are always filtered by tenant_id.
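In practice this means every lookup carries a mandatory tenant filter. A simplified sketch of the pattern (plain Python standing in for the actual database and vector-store queries):

```python
def fetch_memories(store: list[dict], tenant_id: str) -> list[dict]:
    """Return only records belonging to tenant_id. Records from other
    tenants never appear in the result, by construction of the filter."""
    return [record for record in store if record["tenant_id"] == tenant_id]

store = [
    {"tenant_id": "tenant-a", "fact": "Alice prefers mornings."},
    {"tenant_id": "tenant-b", "fact": "Bob is on-call Fridays."},
]
visible = fetch_memories(store, "tenant-a")
# tenant-b's record is not reachable from a tenant-a query.
```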
## LLM provider selection
By default, OMD Cleo uses Azure OpenAI capacity under OMD's contract (EU region). Tenant administrators can register their own LLM endpoints, including on-premises deployments, to keep data within their own infrastructure.
## Microsoft AI Foundry resources

- Data, privacy, and security for Azure Direct Models in Microsoft Foundry