SharePoint and OneDrive Connectors: File Upload Security

Perplexity's SharePoint connector and OneDrive connector enable Enterprise Pro and Enterprise Max users to search their organization's SharePoint sites and OneDrive files directly through AI-powered natural language queries.

We have recently introduced a hybrid search architecture that addresses both data security concerns and scalability limitations through two distinct search modes: High-Precision Search (indexing-based) and Standard Search (API-based).

Search Architecture: Two-Tier Approach

Standard Search (API-Based)

How It Works:

Queries SharePoint/OneDrive directly via Microsoft's Search API at query time
No file copies are stored in Perplexity infrastructure; only content returned in query results is retained, and data retention policy settings can limit how long it is kept
Searches across the user's entire SharePoint/OneDrive drive without file count limitations
Available to all Enterprise Pro/Max users immediately upon connector activation
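As an illustration of what an API-based query can look like, the sketch below builds a request body for the Microsoft Graph search endpoint (POST /search/query) restricted to files. It is a minimal example, not Perplexity's implementation: the choice of this endpoint, the helper name build_search_request, and the sample query are assumptions made for illustration.

```python
import json

# Microsoft Graph search endpoint. That the connector calls this exact
# endpoint is an assumption made for this sketch.
GRAPH_SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"


def build_search_request(query_text: str, page_size: int = 10) -> dict:
    """Build a JSON body that searches SharePoint/OneDrive files (driveItem)."""
    if not query_text.strip():
        raise ValueError("query_text must not be empty")
    return {
        "requests": [
            {
                "entityTypes": ["driveItem"],        # files and folders
                "query": {"queryString": query_text},
                "from": 0,                            # offset of the first hit
                "size": page_size,                    # hits per page
            }
        ]
    }


body = build_search_request("Q3 budget forecast")
print(json.dumps(body, indent=2))
```

The body would be sent with the user's own OAuth access token, which is why results are limited to files that user can already open.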

Security Model:

Zero Data Retention: Files are not copied or stored in Perplexity systems; only content returned in query results is retained, and data retention policy settings can limit how long it is kept
Real-time Permissions: Respects the native access controls of SharePoint and OneDrive dynamically
Minimal Data Copying: Only citation snippets included in answers are retained
No Model Training: File content is never used to train AI models
Immediate Access Revocation: When SharePoint/OneDrive permissions change, access is immediately reflected in Perplexity

Use Case: Organizations requiring maximum data privacy and minimal data footprint, especially for searching across millions of files at enterprise scale.

High-Precision Search (Indexing-Based)

How It Works:

Users select specific files/folders to sync for local indexing in Perplexity
Files are downloaded, parsed, and stored in dedicated AWS S3 buckets with vector embeddings in Vespa
Enables deeper semantic analysis and more comprehensive answers
File Limits: 500 files per project (Enterprise Pro), 5,000 files per project (Enterprise Max)
Total User Limits: 15,000 files (Enterprise Pro), 50,000 files (Enterprise Max)
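The per-project and per-user limits above can be expressed as a simple check before a sync starts. The following is a minimal sketch using the numbers stated in this article; the plan keys, the SyncLimits class, and the can_sync helper are hypothetical names, not part of any Perplexity API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncLimits:
    per_project: int  # maximum indexed files in one project
    per_user: int     # maximum indexed files across all of a user's projects


# Limits as stated in this article; the dictionary keys are illustrative.
LIMITS = {
    "enterprise_pro": SyncLimits(per_project=500, per_user=15_000),
    "enterprise_max": SyncLimits(per_project=5_000, per_user=50_000),
}


def can_sync(plan: str, files_in_project: int, files_for_user: int, new_files: int) -> bool:
    """Return True if adding new_files stays within both limits for the plan."""
    limits = LIMITS[plan]
    within_project = files_in_project + new_files <= limits.per_project
    within_user = files_for_user + new_files <= limits.per_user
    return within_project and within_user


print(can_sync("enterprise_pro", files_in_project=490, files_for_user=1_000, new_files=10))
print(can_sync("enterprise_pro", files_in_project=490, files_for_user=1_000, new_files=11))
```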

Security Model:

Dedicated Storage: Each organization's files are stored in isolated AWS S3 "folders", with unique namespaces in Vespa vector storage
Encryption: AES-256 encryption at rest, TLS encryption in transit
Role-Based Access Control (RBAC): Least-privilege access enforced across all systems
No Model Training: Synced files are never used to train AI models
Automatic Sync: File changes/deletions in SharePoint/OneDrive are automatically reflected in Perplexity

Use Case: Teams requiring maximum answer accuracy for frequently accessed documents, project-specific file collections, or collaborative projects with curated content.

Backend Architecture & Data Flow

Connection & Authentication

Admin Enablement: Organization admins enable the SharePoint/OneDrive connector in Permissions settings
User Authentication: Users authenticate via OAuth 2.0 through Microsoft Entra ID (formerly Azure AD)
Site Selection: Users select specific SharePoint sites to connect
Admin Consent: Microsoft admins may need to grant organization-wide consent for the Perplexity app in Microsoft Entra
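For readers unfamiliar with the OAuth 2.0 step, the sketch below builds the authorization URL for the Microsoft identity platform's authorization code flow. The endpoint format is Microsoft's documented one; the client ID, redirect URI, and the specific scopes shown are placeholders and assumptions, since this article does not list the permissions the Perplexity app requests.

```python
import secrets
from typing import List, Optional
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorize_url(
    tenant: str,
    client_id: str,
    redirect_uri: str,
    scopes: List[str],
    state: Optional[str] = None,
) -> str:
    """Build a Microsoft identity platform authorization URL (code flow)."""
    params = {
        "client_id": client_id,
        "response_type": "code",          # authorization code flow
        "redirect_uri": redirect_uri,
        "response_mode": "query",
        "scope": " ".join(scopes),        # space-separated, per OAuth 2.0
        "state": state or secrets.token_urlsafe(16),  # CSRF protection
    }
    base = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize"
    return f"{base}?{urlencode(params)}"


# All values below are placeholders chosen for illustration.
url = build_authorize_url(
    tenant="organizations",
    client_id="00000000-0000-0000-0000-000000000000",
    redirect_uri="https://example.com/oauth/callback",
    scopes=["Files.Read.All", "Sites.Read.All", "offline_access"],
    state="demo-state",
)
print(url)
```

If an organization restricts user consent, an Entra administrator would need to approve the requested scopes once for the whole tenant, which corresponds to the Admin Consent step above.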

When Both Methods Are Active:

Perplexity queries both the local index and the SharePoint/OneDrive API concurrently
Results are re-ranked to prioritize the most relevant sources
Citations link directly back to SharePoint for full file access
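The concurrent query and re-ranking described above can be sketched as follows. This is a simplified illustration, not the production ranking logic: both search functions return canned results, the Hit type is hypothetical, and comparing raw scores across the two sources is a simplification (a real re-ranker would normalize or re-score them).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    file_id: str
    url: str      # citation target in SharePoint
    score: float  # relevance score (assumed comparable across sources)
    source: str   # "index" or "api"


def search_local_index(query: str) -> List[Hit]:
    # Stand-in for the indexing-based search; returns canned results.
    return [
        Hit("a", "https://contoso.sharepoint.com/a", 0.92, "index"),
        Hit("b", "https://contoso.sharepoint.com/b", 0.40, "index"),
    ]


def search_sharepoint_api(query: str) -> List[Hit]:
    # Stand-in for the API-based search; returns canned results.
    return [
        Hit("b", "https://contoso.sharepoint.com/b", 0.75, "api"),
        Hit("c", "https://contoso.sharepoint.com/c", 0.60, "api"),
    ]


def hybrid_search(query: str, top_k: int = 3) -> List[Hit]:
    """Query both backends concurrently, deduplicate by file, and re-rank."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        index_future = pool.submit(search_local_index, query)
        api_future = pool.submit(search_sharepoint_api, query)
        hits = index_future.result() + api_future.result()

    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.file_id)
        if current is None or hit.score > current.score:
            best[hit.file_id] = hit  # keep the higher-scoring copy of each file
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:top_k]


results = hybrid_search("budget")
for hit in results:
    print(hit.file_id, hit.source, hit.score, hit.url)
```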

Indexing Process (High-Precision Search)

File Sync and Storage:

File Selection: User selects files/folders through Perplexity UI
Download: Files downloaded from SharePoint via Microsoft Graph API
Storage: Raw files stored in AWS S3 with dedicated organizational namespaces
Parsing: Text extraction from supported formats (PDF, DOCX, XLSX, PPTX, CSV, TXT, MD, JSON)
Vectorization: Content converted to embeddings and stored in Vespa vector database
Metadata Indexing: File metadata (name, path, permissions) indexed for search retrieval
Removal: When a user disconnects SharePoint/OneDrive from Perplexity, they can choose to remove any indexed files
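Two of the steps above, filtering by supported format and storing files under an organization-specific namespace, can be sketched in a few lines. The list of extensions comes from this article; the key layout (orgs/<org_id>/files/...) and the function names are hypothetical and chosen only to illustrate the idea of per-organization isolation.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

# Formats listed in this article as supported for parsing.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx", ".csv", ".txt", ".md", ".json"}


def is_supported(path: str) -> bool:
    """Check the file extension case-insensitively."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS


def storage_key(org_id: str, file_id: str, path: str) -> str:
    """Build an object key under the organization's namespace (hypothetical layout)."""
    return f"orgs/{org_id}/files/{file_id}/{PurePosixPath(path).name}"


def plan_sync(org_id: str, selected: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Split selected (file_id, path) pairs into storage keys and skipped paths."""
    accepted: List[str] = []
    skipped: List[str] = []
    for file_id, path in selected:
        if is_supported(path):
            accepted.append(storage_key(org_id, file_id, path))
        else:
            skipped.append(path)
    return accepted, skipped


accepted, skipped = plan_sync(
    "org-42",
    [("f1", "Shared/Plan.DOCX"), ("f2", "Shared/logo.png")],
)
print(accepted)
print(skipped)
```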

Security & Compliance Infrastructure

Permission Enforcement

SharePoint/OneDrive-Side Permissions:

If a user loses access to a file in SharePoint/OneDrive, that file is immediately removed from Perplexity
File deletions in SharePoint/OneDrive trigger immediate removal from the Perplexity index
Users can only search files they have explicit SharePoint/OneDrive permissions to access
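The effect of these rules can be modeled with a small in-memory example: a file is searchable for a user only while that user appears in the file's access list, and a deleted file disappears for everyone. This is a toy model for illustration; the AccessIndex structure and function names are hypothetical and say nothing about how Perplexity stores permissions internally.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the index: file ID -> users allowed to read it.
AccessIndex = Dict[str, Set[str]]


def searchable_files(index: AccessIndex, user_id: str) -> List[str]:
    """Return the files this user may search, in sorted order."""
    return sorted(f for f, users in index.items() if user_id in users)


def revoke_access(index: AccessIndex, file_id: str, user_id: str) -> None:
    """Mirror a permission removal made in SharePoint/OneDrive."""
    index.get(file_id, set()).discard(user_id)


def delete_file(index: AccessIndex, file_id: str) -> None:
    """Mirror a file deletion made in SharePoint/OneDrive."""
    index.pop(file_id, None)


index: AccessIndex = {
    "spec.docx": {"ana", "ben"},
    "budget.xlsx": {"ana"},
    "roadmap.pptx": {"ben"},
}

before = searchable_files(index, "ana")
revoke_access(index, "spec.docx", "ana")  # ana loses access to one file
delete_file(index, "budget.xlsx")         # another file is deleted at the source
after = searchable_files(index, "ana")
print(before, after)
```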

Perplexity-Side Permissions:

Admins control which users can access connectors via Organization settings
Files synced to projects are searchable by project members, but viewing file content still requires SharePoint/OneDrive permissions
Session sharing respects organizational sharing policies set by admins

Best Practices

When to Use High-Precision Search (Indexing)

Project-specific file collections
Frequently accessed knowledge base documents
Files requiring deep semantic analysis
Collaborative team environments with curated content

When to Use Standard Search (API-Only)

Searching across vast file repositories (millions of files)
Strict data residency and minimal data copying requirements
Exploratory searches across infrequently accessed files
Organizations with heightened security/compliance constraints
