SharePoint and OneDrive Connectors: File Upload Security

How and when Perplexity connects to SharePoint and OneDrive via API or indexed modes, without training models on your data

Written by Emilio Morales
Updated over a month ago

Perplexity's SharePoint and OneDrive connectors let Enterprise Pro and Enterprise Max users search their organization's SharePoint sites and OneDrive files using natural language queries.

We have recently introduced a hybrid search architecture that addresses both data security concerns and scalability limitations through two distinct search modes: High-Precision Search (indexing-based) and Standard Search (API-based).

Search Architecture: Two-Tier Approach

Standard Search (API-Based)

How It Works:

  • Queries SharePoint/OneDrive directly via Microsoft's Search API at query time

  • No file copies are stored in Perplexity infrastructure. The only content retained is what queries return, and that retention can be limited through data retention policy settings

  • Searches across the user's entire SharePoint/OneDrive drive without file count limitations

  • Available to all Enterprise Pro/Max users immediately upon connector activation

Security Model:

  • Zero Data Retention: Files are not copied or stored in Perplexity systems. The only content retained is what queries return, and that retention can be limited through data retention policy settings

  • Real-time Permissions: Respects the native access controls of SharePoint and OneDrive dynamically

  • Minimal Data Copying: Only citation snippets included in answers are retained

  • No Model Training: Files are never used to train AI models

  • Immediate Access Revocation: When SharePoint/OneDrive permissions change, the change is reflected in Perplexity immediately

Use Case: Organizations requiring maximum data privacy and minimal data footprint, especially for searching across millions of files at enterprise scale.
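
To make the query-time flow more concrete, here is a minimal sketch of what a file search request could look like. It assumes that "Microsoft's Search API" refers to the Microsoft Graph search endpoint (POST /search/query); the article does not say how Perplexity actually builds its requests, and the function name and example query are hypothetical. The code only constructs the request body and does not call the network.

```python
import json

# Assumption: the Microsoft Graph search endpoint is the API in question.
GRAPH_SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"


def build_search_request(query_text: str, size: int = 10) -> dict:
    """Build a JSON body for a Microsoft Graph search over files (driveItem)."""
    return {
        "requests": [
            {
                "entityTypes": ["driveItem"],          # files in SharePoint/OneDrive
                "query": {"queryString": query_text},  # the user's search terms
                "from": 0,                             # offset of the first result
                "size": size,                          # number of results per page
            }
        ]
    }


# Hypothetical query, shown only to illustrate the payload shape.
body = build_search_request("Q3 budget forecast")
print(json.dumps(body, indent=2))
```

Because the request runs with the signed-in user's token, results are limited to what that user can already open, which is consistent with the real-time permissions behavior described above.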

High-Precision Search (Indexing-Based)

How It Works:

  • Users select specific files/folders to sync for local indexing in Perplexity

  • Files are downloaded, parsed, and stored in dedicated AWS S3 buckets with vector embeddings in Vespa

  • Enables deeper semantic analysis and more comprehensive answers

  • File Limits: 500 files per Space (Enterprise Pro), 5,000 files per Space (Enterprise Max)

  • Total User Limits: 15,000 files (Enterprise Pro), 50,000 files (Enterprise Max)

Security Model:

  • Dedicated Storage: Each organization's files are stored in isolated AWS S3 "folders" with unique namespaces in Vespa vector storage

  • Encryption: AES-256 encryption at rest, TLS encryption in transit

  • Role-Based Access Control (RBAC): Least-privilege access enforced across all systems

  • No Model Training: Synced files are never used to train AI models

  • Automatic Sync: File changes/deletions in SharePoint/OneDrive are automatically reflected in Perplexity

Use Case: Teams requiring maximum answer accuracy for frequently accessed documents, project-specific file collections, or collaborative Spaces with curated content.
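
The file limits listed above can be expressed as a small check. This is an illustrative sketch only: the plan keys, class name, and function are hypothetical, and it assumes that both the per-Space limit and the total per-user limit must hold for a sync to proceed, which the article implies but does not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanLimits:
    files_per_space: int
    files_per_user: int


# Limits taken from the list above; the dictionary keys are hypothetical.
LIMITS = {
    "enterprise_pro": PlanLimits(files_per_space=500, files_per_user=15_000),
    "enterprise_max": PlanLimits(files_per_space=5_000, files_per_user=50_000),
}


def can_sync(plan: str, files_in_space: int, files_for_user: int, new_files: int) -> bool:
    """Return True if adding new_files stays within both limits for the plan."""
    limits = LIMITS[plan]
    within_space = files_in_space + new_files <= limits.files_per_space
    within_user = files_for_user + new_files <= limits.files_per_user
    return within_space and within_user


# An Enterprise Pro Space holding 450 files can take 50 more, but not 51.
print(can_sync("enterprise_pro", 450, 1_000, 50))
print(can_sync("enterprise_pro", 450, 1_000, 51))
```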

Backend Architecture & Data Flow

Connection & Authentication

  • Admin Enablement: Organization admins enable the SharePoint/OneDrive connector in Permissions settings

  • User Authentication: Users authenticate via OAuth 2.0 through Microsoft Entra (Azure AD)

  • Site Selection: Users select specific SharePoint sites to connect

  • Admin Consent: Microsoft admins may need to grant organization-wide consent for the Perplexity app in Microsoft Entra
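
As an illustration of the user authentication step, the sketch below builds a sign-in URL for the Microsoft identity platform. It assumes the OAuth 2.0 authorization code flow; the article only says "OAuth 2.0 through Microsoft Entra". The client ID, redirect URI, and scope list are placeholders, not the values Perplexity uses.

```python
import secrets
from urllib.parse import urlencode


def build_authorize_url(tenant, client_id, redirect_uri, scopes, state=None):
    """Build a Microsoft identity platform v2.0 authorization URL."""
    params = {
        "client_id": client_id,
        "response_type": "code",      # authorization code flow (assumed)
        "redirect_uri": redirect_uri,
        "response_mode": "query",
        "scope": " ".join(scopes),    # scopes are space-separated
        # state guards against CSRF; generate one if the caller did not
        "state": state or secrets.token_urlsafe(16),
    }
    base = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize"
    return f"{base}?{urlencode(params)}"


# All values below are placeholders for illustration.
url = build_authorize_url(
    tenant="organizations",
    client_id="00000000-0000-0000-0000-000000000000",
    redirect_uri="https://example.com/oauth/callback",
    scopes=["Files.Read.All", "Sites.Read.All", "offline_access"],
    state="demo-state",
)
print(url)
```

When a requested scope needs admin approval, the sign-in fails for regular users until a Microsoft admin grants consent, which is why the Admin Consent step above may be required.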

When Both Methods Are Active:

  • Perplexity queries both the local index and SharePoint API concurrently

  • Results are re-ranked to prioritize the most relevant sources

  • Citations link directly back to SharePoint for full file access
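
The concurrent lookup and re-ranking described above can be sketched as follows. This is a simplified illustration, not Perplexity's implementation: the two search functions are stubs that return hard-coded hits, the URLs are placeholders, and the ranking simply keeps the highest score per file. A real re-ranker would need to reconcile scores produced by two different systems.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Hit:
    url: str      # link back to the file in SharePoint
    score: float  # relevance score (assumed comparable across sources)
    source: str   # "index" or "api"


async def search_index(query: str) -> list[Hit]:
    # Stub standing in for the local index (High-Precision Search).
    return [
        Hit("https://contoso.sharepoint.com/a.docx", 0.92, "index"),
        Hit("https://contoso.sharepoint.com/b.pdf", 0.40, "index"),
    ]


async def search_api(query: str) -> list[Hit]:
    # Stub standing in for the SharePoint API (Standard Search).
    return [
        Hit("https://contoso.sharepoint.com/b.pdf", 0.75, "api"),
        Hit("https://contoso.sharepoint.com/c.xlsx", 0.60, "api"),
    ]


async def hybrid_search(query: str) -> list[Hit]:
    # Run both lookups concurrently rather than one after the other.
    index_hits, api_hits = await asyncio.gather(
        search_index(query), search_api(query)
    )
    # Deduplicate by URL, keeping the higher-scoring hit for each file.
    best: dict[str, Hit] = {}
    for hit in index_hits + api_hits:
        current = best.get(hit.url)
        if current is None or hit.score > current.score:
            best[hit.url] = hit
    # Most relevant first.
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


results = asyncio.run(hybrid_search("roadmap"))
for hit in results:
    print(hit.source, hit.score, hit.url)
```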

Indexing Process (High-Precision Search)

File Sync and Storage:

  1. File Selection: User selects files/folders through Perplexity UI

  2. Download: Files downloaded from SharePoint via Microsoft Graph API

  3. Storage: Raw files stored in AWS S3 with dedicated organizational namespaces

  4. Parsing: Text extraction from supported formats (PDF, DOCX, XLSX, PPTX, CSV, TXT, MD, JSON)

  5. Vectorization: Content converted to embeddings and stored in Vespa vector database

  6. Metadata Indexing: File metadata (name, path, permissions) indexed for search retrieval

  7. Removal: When a user disconnects SharePoint/OneDrive from Perplexity, they can choose to remove any indexed files
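
Two of the steps above lend themselves to a short sketch: filtering a selection down to the supported formats, and deriving a storage location that keeps each organization's files in its own namespace. The key layout, function names, and IDs are hypothetical; the article does not describe the actual S3 key scheme.

```python
from pathlib import PurePosixPath

# Formats listed in the Parsing step above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".pptx", ".csv", ".txt", ".md", ".json",
}


def is_supported(filename: str) -> bool:
    """Check the file extension case-insensitively against the supported set."""
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def storage_key(org_id: str, file_id: str, filename: str) -> str:
    """Hypothetical key layout: every object sits under its organization's prefix."""
    return f"orgs/{org_id}/files/{file_id}/{PurePosixPath(filename).name}"


def plan_sync(org_id: str, selected: list[tuple[str, str]]):
    """Split a selection of (file_id, filename) pairs into keys to store and names to skip."""
    keys, skipped = [], []
    for file_id, filename in selected:
        if is_supported(filename):
            keys.append(storage_key(org_id, file_id, filename))
        else:
            skipped.append(filename)
    return keys, skipped


keys, skipped = plan_sync("org-123", [("f1", "Report.PDF"), ("f2", "demo.mp4")])
print(keys)
print(skipped)
```

Prefixing every key with the organization ID is one common way to get the per-organization isolation described under Dedicated Storage, since access policies can then be scoped to a single prefix.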

Security & Compliance Infrastructure

Permission Enforcement

SharePoint/OneDrive-Side Permissions:

  • If a user loses access to a file in SharePoint/OneDrive, that file is immediately removed from Perplexity

  • File deletions in SharePoint/OneDrive trigger immediate removal from the Perplexity index

  • Users can only search files they have explicit SharePoint/OneDrive permissions to access

Perplexity-Side Permissions:

  • Admins control which users can access connectors via Organization settings

  • Files synced to Spaces are searchable by Space members, but viewing file content still requires SharePoint/OneDrive permissions

  • Thread sharing respects organizational sharing policies set by admins
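
The two permission layers above can be pictured as a pair of checks applied before any file is searched. This is a conceptual sketch with hypothetical names and in-memory data; in practice the SharePoint/OneDrive side is enforced by Microsoft's own access controls rather than by a lookup table.

```python
from dataclasses import dataclass


@dataclass
class AccessContext:
    connector_users: set[str]      # users allowed by Perplexity admins
    file_acl: dict[str, set[str]]  # file ID -> users with SharePoint/OneDrive access


def searchable_files(ctx: AccessContext, user: str, candidates: list[str]) -> list[str]:
    """Return only the files this user may search."""
    # Perplexity-side check: the admin must have enabled the connector for the user.
    if user not in ctx.connector_users:
        return []
    # SharePoint/OneDrive-side check: the user needs access to each file.
    return [f for f in candidates if user in ctx.file_acl.get(f, set())]


ctx = AccessContext(
    connector_users={"alice", "bob"},
    file_acl={"plan.docx": {"alice"}, "budget.xlsx": {"alice", "bob"}},
)
print(searchable_files(ctx, "bob", ["plan.docx", "budget.xlsx"]))
```

Revoking access on the SharePoint/OneDrive side corresponds to removing the user from a file's entry, after which the file no longer appears in that user's results.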

Best Practices

When to Use High-Precision Search (Indexing)

  • Project-specific file collections in Spaces

  • Frequently accessed knowledge base documents

  • Files requiring deep semantic analysis

  • Collaborative team environments with curated content

When to Use Standard Search (API-Only)

  • Searching across vast file repositories (millions of files)

  • Strict data residency and minimal data copying requirements

  • Exploratory searches across infrequently accessed files

  • Organizations with heightened security/compliance constraints