pgvector Scaling

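Replicas can be used to provide high availability and better read performance. PostgreSQL offers two replication approaches, streaming (physical) replication and logical replication, each with different trade-offs for pgvector workloads.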

Streaming (physical) replication:
Overview:
PostgreSQL’s native streaming replication creates an exact copy of your primary database cluster, including all tables, indexes, and the pgvector extension’s index structures.
How It Helps:
- Load Balancing: You can direct read-only queries (including vector similarity searches) to replicas, reducing the load on your primary server.
- High Availability: Replicas also act as standby servers that can be promoted if the primary fails.
Considerations:
- Replication Lag: Streaming replication is asynchronous by default, so writes on the primary appear on replicas after a short delay. This matters if your application must read its own writes immediately.
- Uniformity: Because physical replication duplicates the primary exactly, the replica’s indexing structure will match the primary’s. If you require a different setup (for example, fewer indexes for write optimization on the primary and more indexes for fast queries on the replica), you might need to explore other replication strategies.
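
The load-balancing idea above can be sketched as a small router that sends writes to the primary and spreads read-only statements across replicas. This is only an illustration under stated assumptions: the `ReplicaRouter` class and the host names are hypothetical, and the read/write check is a naive keyword test. A real deployment would more likely rely on a connection pooler or proxy in front of the database.

```python
import itertools


class ReplicaRouter:
    """Pick a target host for a SQL statement (hypothetical helper)."""

    # Statements starting with these keywords are treated as read-only.
    # This is a naive heuristic: SELECT ... FOR UPDATE, CTEs, or a SELECT
    # that calls a writing function would be misclassified.
    READ_PREFIXES = ("select", "show")

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if none exist.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        is_read = sql.lstrip().lower().startswith(self.READ_PREFIXES)
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


# Host names below are placeholders, not real endpoints.
router = ReplicaRouter(
    "primary.db.internal",
    ["replica1.db.internal", "replica2.db.internal"],
)

# A vector similarity search is read-only, so it can go to a replica.
knn = "SELECT id FROM items ORDER BY embedding <-> '[0.1, 0.2, 0.3]' LIMIT 5"
print(router.route(knn))

# Writes always go to the primary.
print(router.route("INSERT INTO items (embedding) VALUES ('[0.1, 0.2, 0.3]')"))
```

If a read must see a write that was just committed, it is safer to route it to the primary. On a replica, `pg_last_xact_replay_timestamp()` can be queried to gauge how far replay is behind.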

Logical replication:
Overview:
Logical replication allows you to replicate data at the table level, giving you more flexibility to tailor the subscriber’s schema and indexing.
How It Helps:
- Customized Schema: Logical replication transfers row data only, not index definitions, so you can build whichever indexes you need (including pgvector indexes) on the subscriber.
- Selective Replication: Logical replication lets you replicate only specific tables or subsets of data (row-level filters require PostgreSQL 15 or later).
Considerations:
- Setup Complexity: Logical replication requires additional configuration and maintenance (creating publications and subscriptions).
- Eventual Consistency: Similar to streaming replication, you might experience a lag, though you have more control over the replication process.
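
The publication and subscription setup mentioned above can be sketched as a helper that assembles the SQL for each side. This is a rough outline rather than a complete procedure: the function name, the `vec_pub` / `vec_sub` names, and the `items` / `embedding` identifiers are all hypothetical, identifiers and the connection string are interpolated without quoting or escaping, and it assumes the publisher already runs with `wal_level = logical`.

```python
def logical_replication_sql(tables, primary_conninfo, index_table, index_column,
                            publication="vec_pub", subscription="vec_sub"):
    """Return (publisher_sql, subscriber_sql) as lists of statements.

    Hypothetical helper: names are placeholders and no escaping is done.
    """
    publisher = [
        # Publish only the listed tables.
        f"CREATE PUBLICATION {publication} FOR TABLE {', '.join(tables)};",
    ]
    subscriber = [
        # Logical replication does not copy schema, so the extension (and the
        # table definitions, not shown here) must exist on the subscriber.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE SUBSCRIPTION {subscription} "
        f"CONNECTION '{primary_conninfo}' PUBLICATION {publication};",
        # The subscriber-only index; cosine distance is an arbitrary choice.
        f"CREATE INDEX ON {index_table} USING hnsw "
        f"({index_column} vector_cosine_ops);",
    ]
    return publisher, subscriber


pub_sql, sub_sql = logical_replication_sql(
    tables=["items"],
    primary_conninfo="host=primary.db.internal dbname=app user=replicator",
    index_table="items",
    index_column="embedding",
)
for statement in pub_sql + sub_sql:
    print(statement)
```

Because the index exists only on the subscriber, the primary avoids the cost of maintaining it on every write, which is the flexibility that physical replication cannot offer.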

Memory sizes:

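With maintenance_work_mem set to 40GB, an HNSW index build reported the following once the in-memory graph outgrew the limit:

NOTICE: hnsw graph no longer fits into maintenance_work_mem after 6291332 tuples

The build still completes after this notice, but it takes significantly longer because the graph no longer fits in memory.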
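
The notice gives enough information for a back-of-the-envelope sizing estimate: dividing the memory limit by the number of tuples that fit yields an approximate per-tuple graph footprint. The sketch below does that arithmetic and extrapolates linearly to a larger table. The linear scaling is an assumption, the 10 million row count is a made-up example, and the per-tuple figure only applies to the same vector dimensions and HNSW parameters as the build that produced the notice.

```python
GIB = 1024 ** 3  # PostgreSQL memory units use a multiplier of 1024


def bytes_per_tuple(mem_gib, tuples_fit):
    """Approximate in-memory graph footprint per indexed tuple."""
    return mem_gib * GIB / tuples_fit


def estimate_mem_gib(total_tuples, per_tuple_bytes):
    """Linear extrapolation of memory needed for total_tuples (assumption)."""
    return total_tuples * per_tuple_bytes / GIB


# Figures taken from the notice: 40GB limit, 6291332 tuples fit.
per_tuple = bytes_per_tuple(40, 6_291_332)
print(f"approx. {per_tuple:.0f} bytes per tuple")

# Hypothetical target table size, for illustration only.
needed = estimate_mem_gib(10_000_000, per_tuple)
print(f"approx. {needed:.1f} GiB for 10,000,000 tuples")
```

Treat the result as a rough lower bound when choosing maintenance_work_mem, not as a guarantee that the build will stay in memory.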