# Privacy & Data
This page describes where OMD Cleo stores data and what information may be transmitted to external Large Language Model (LLM) providers.
## Data storage

### PostgreSQL (primary database)
All conversation state is persisted in PostgreSQL:
| Table | Data stored |
|---|---|
| `conversations` | Channel type, room ID, workflow ID, OMD instance, config ID, language, model, participant user IDs, conversation summary |
| `messages` | Full message content (LangChain message JSON), sender ID, event ID, timestamp, thread references |
| `room_user_mappings` | Mapping of user IDs to conversation IDs per workflow |
| `manifests` | System prompt instructions (prompts) per tenant and role scope |
| `system_prompt_versions` | Versioned history of tenant system prompt edits |
| `schedules` | Scheduled workflow and code execution jobs |
| `mem0_history` | History of memory add/update operations |
Every chat message — including full text — is stored in PostgreSQL together with sender identifiers.
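As an illustration of what "full text plus sender identifiers" means in practice, a stored message row might look like the sketch below. All field names are assumptions based on the table description above, not the actual database schema:

```python
import json
from datetime import datetime, timezone

# Hypothetical shape of a row in the `messages` table; real column names
# may differ. The point: the complete message text is persisted together
# with sender identifiers and thread references.
message_row = {
    "conversation_id": "c0ffee-1234",           # assumed FK to `conversations`
    "sender_id": "@alice:example.org",          # sender identifier
    "event_id": "$evt_42",                      # channel event reference
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "content": json.dumps({                     # full LangChain message JSON
        "type": "human",
        "data": {"content": "Where is task 4711?"},
    }),
}
```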
### ChromaDB (vector store)
| Collection | Data stored |
|---|---|
| `customer_documents` | Embeddings (and optionally text excerpts) of documents from the OMD Document Hub |
| `documentation_documents` | Embeddings of OMD documentation pages |
| `conversation_summaries` | Embeddings of conversation summaries from past sessions |
| `memories` | User and organisation memory facts (text + embeddings) |
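A record in the `memories` collection pairs the plain-text fact with its embedding vector and scoping metadata. The sketch below is illustrative only; the field names are assumptions, not Cleo's actual ChromaDB schema:

```python
# Hypothetical record in the `memories` collection: the fact is stored
# both as readable text and as an embedding, with metadata used to scope
# it to a tenant and a user or organisation.
memory_record = {
    "id": "mem-001",
    "document": "Alice prefers morning appointments.",  # plain-text fact
    "embedding": [0.12, -0.03, 0.88],                   # toy 3-dim vector
    "metadata": {
        "tenant_id": "tenant-a",   # scoping key (assumed name)
        "scope": "user",           # user vs. organisation memory
        "user_id": "alice",
    },
}
```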
### mem0 / Memgraph (graph memory)
When Memgraph is enabled, memory facts are stored as a property graph (entities and relations). This may include names, roles, preferences, and other attributes mentioned in conversations.
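Conceptually, such a property graph stores facts as entity-relation-entity triples with attached attributes. A toy sketch (names and relation types are invented for illustration):

```python
# Hypothetical graph-memory content: (entity, relation, entity, attributes)
# triples, as they might be materialised in Memgraph. Names, roles, and
# preferences mentioned in conversations can all end up here.
graph_facts = [
    ("alice", "HAS_ROLE", "dispatcher", {"source": "conversation"}),
    ("alice", "PREFERS", "morning_appointments", {}),
]
```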
### GitHub (optional)
When the GitHub memory backend is configured, memory facts are committed as files to a GitHub repository. The repository may therefore contain personal information extracted from conversations.
## Data sent to LLM providers
Each time the agent generates a response, a request is sent to the configured LLM provider. The following data is included.
### System prompt
The system prompt is assembled per conversation and may contain:
- Agent role instructions and output-format rules.
- The tenant's base prompt and role-specific manifest.
- User memory: facts previously extracted from the user's past conversations.
- Organisation memory: facts relevant to the organisation.
- Names and user IDs of conversation participants.
- Enabled tool descriptions.
- Current UTC date and time.
- When the `sql_bot` workflow is active: the full StarRocks database schema for the OMD instance.
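The assembly described above amounts to concatenating these parts into one prompt string. A minimal sketch; section names and ordering are assumptions, not Cleo's actual prompt builder:

```python
from datetime import datetime, timezone

def build_system_prompt(parts: dict) -> str:
    """Compose the system prompt from its sections (illustrative sketch)."""
    sections = [
        parts["role_instructions"],           # agent role + output-format rules
        parts["tenant_base_prompt"],          # tenant base prompt / role manifest
        "User memory:\n" + "\n".join(parts["user_memory"]),
        "Organisation memory:\n" + "\n".join(parts["org_memory"]),
        "Participants: " + ", ".join(parts["participants"]),
        "Tools:\n" + "\n".join(parts["tool_descriptions"]),
        "Current UTC time: " + datetime.now(timezone.utc).isoformat(),
    ]
    return "\n\n".join(sections)

prompt = build_system_prompt({
    "role_instructions": "You are Cleo, a workforce-management assistant.",
    "tenant_base_prompt": "Answer in the tenant's configured language.",
    "user_memory": ["Alice prefers morning appointments."],
    "org_memory": ["Office hours are 08:00-17:00."],
    "participants": ["@alice:example.org"],
    "tool_descriptions": ["get_task: fetch task details by ID"],
})
```

Note that user and organisation memory are injected on every request, which is why facts extracted in earlier sessions reach the LLM provider even in unrelated conversations.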
### Conversation history
All messages in the current conversation window are sent, subject to the configured token limit.
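A common way to enforce such a limit is to drop the oldest messages until the remaining window fits the budget. A rough sketch; the crude character-based token estimate and the limit value are illustrative, not Cleo's actual accounting:

```python
def trim_history(messages: list[str], token_limit: int) -> list[str]:
    """Keep the most recent messages whose combined (approximate) token
    count fits within token_limit; oldest messages are dropped first."""
    def approx_tokens(text: str) -> int:
        return max(1, len(text) // 4)    # crude ~4 chars/token heuristic

    kept: list[str] = []
    budget = token_limit
    for msg in reversed(messages):       # walk newest-first
        cost = approx_tokens(msg)
        if cost > budget:
            break                        # everything older is cut off too
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))          # restore chronological order

history = ["old " * 50, "recent question?", "latest answer."]
window = trim_history(history, token_limit=20)
# Only the two recent messages fit the 20-token budget.
```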
### OMD business data (via tool calls)
Tool call results are appended to the conversation history and sent to the LLM in subsequent turns. This data can include:
- Task details: IDs, descriptions, addresses, customer data, dates, status.
- Trip and route data: resource IDs, territories, time windows, locations.
- Customer and entity data from OData queries.
- StarRocks SQL query results (workforce analytics data).
- Superset chart and dashboard metadata.
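Mechanically, this means tool output becomes part of the message list sent on the next request. A minimal sketch; the message shapes are assumptions, not Cleo's internal types:

```python
# Conversation history as role/content records (illustrative shape).
history = [
    {"role": "user", "content": "Where is task 4711?"},
    {"role": "assistant", "content": None,
     "tool_call": {"name": "get_task", "args": {"task_id": "4711"}}},
]

# The tool result -- which may contain addresses and customer data --
# is appended verbatim, so it reaches the LLM provider on the next turn.
tool_result = {"task_id": "4711", "address": "Main St 1", "status": "open"}
history.append({"role": "tool", "content": str(tool_result)})
```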
## Data residency
OMD Cleo is hosted on OMD's EU Kubernetes cluster by default. For customers with strict data-residency requirements, Cleo can be deployed into a customer-owned Kubernetes cluster. Contact OMD to discuss deployment options.
## Tenant isolation
All data is strictly scoped to the tenant it belongs to. Cross-tenant data access is architecturally impossible — queries, memory lookups, and document retrieval are always filtered by tenant_id.
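In practice this means every lookup carries a mandatory tenant filter. A simplified sketch of the pattern (plain Python standing in for the actual database and vector-store queries):

```python
def fetch_memories(store: list[dict], tenant_id: str) -> list[dict]:
    """Return only records belonging to tenant_id. Records from other
    tenants never appear in the result, by construction of the filter."""
    return [record for record in store if record["tenant_id"] == tenant_id]

store = [
    {"tenant_id": "tenant-a", "fact": "Alice prefers mornings."},
    {"tenant_id": "tenant-b", "fact": "Bob is on-call Fridays."},
]
visible = fetch_memories(store, "tenant-a")
# tenant-b's record is not reachable from a tenant-a query.
```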
## LLM provider selection
By default, OMD Cleo uses Azure OpenAI capacity under OMD's contract (EU region). Tenant administrators can register their own LLM endpoints, including on-premises deployments, to keep data within their own infrastructure.
## Microsoft AI Foundry resources

- Data, privacy, and security for Azure Direct Models in Microsoft Foundry