Embedding model: gte-Qwen
GPU: NVIDIA Tesla T4 (16 GB VRAM)
The table below lists sample documents and the time taken to process each one. Processing covers extracting the text from the document, splitting it into appropriately sized chunks, embedding each chunk as a vector, and storing both the extracted text and its vector representation in the vector database.
Document | Chunks | Time Taken (seconds) |
---|---|---|
10 pages, 2,155 words | 20 | 7 |
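The steps timed in the row above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: `chunk_words`, `fake_embed`, and `process_document` are hypothetical names, the word-based splitting and the 108-word chunk size are assumptions chosen only so that a 2,155-word document yields 20 chunks, and the embedding model and vector database are replaced by a placeholder.

```python
import time
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def fake_embed(chunks: Sequence[str]) -> List[Vector]:
    """Placeholder embedder (assumption): one tiny vector per chunk.

    A real run would call the embedding model here.
    """
    return [[float(len(chunk))] for chunk in chunks]


def process_document(
    text: str,
    max_words: int,
    embed: Callable[[Sequence[str]], List[Vector]] = fake_embed,
) -> Tuple[List[str], List[Vector], float]:
    """Chunk and embed one document, returning (chunks, vectors, seconds).

    Storing the results in a vector database is omitted from this sketch.
    """
    start = time.perf_counter()
    chunks = chunk_words(text, max_words)
    vectors = embed(chunks)
    elapsed = time.perf_counter() - start
    return chunks, vectors, elapsed
```

Timing the whole function, rather than the embedding call alone, mirrors how the table reports one end-to-end figure per document.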
Example
To be conservative, we assume an average of 10 pages per document (an estimate suggested by ChatGPT), although in our experience at AGAT the average is closer to four pages per document.
At about 7 seconds per 10-page document, 100,000 documents would take approximately 8 days with 1 embedding GPU,
or 3 days using the embedding GPU + LLM GPU temporarily (on-prem),
or 2 days using 4 GPUs (cloud).
12 million documents with 10 RTX 4090 GPUs would take approximately 8 weeks.
These estimates cover processing only. Depending on the source of the documents (e.g. SharePoint), extra time is needed to download each file.
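The single-GPU and four-GPU figures above follow from simple arithmetic, sketched below. The function name `estimate_days` is hypothetical, and the sketch assumes throughput scales linearly with the number of identical GPUs and that download time is excluded. It does not model mixed hardware such as the embedding GPU + LLM GPU case.

```python
SECONDS_PER_DAY = 86_400


def estimate_days(num_docs: int, seconds_per_doc: float, num_gpus: int = 1) -> float:
    """Estimate processing time in days.

    Assumes every GPU is identical and that work divides evenly
    across GPUs (linear scaling), which is an idealisation.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_seconds = num_docs * seconds_per_doc / num_gpus
    return total_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 7 seconds per document is the measured figure from the sample table.
    print(f"1 GPU:  {estimate_days(100_000, 7.0, 1):.1f} days")
    print(f"4 GPUs: {estimate_days(100_000, 7.0, 4):.1f} days")
```

With 7 seconds per document this gives roughly 8.1 days on one GPU and roughly 2.0 days on four.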
GPU | System Name | Chunks | Size | Tokens | Documents | Embedding Time | Tokens / Minute | Chunks / Minute |
---|---|---|---|---|---|---|---|---|
2 x L4 | Small | 17,980 | 113 MB | 4,537,270 | 138 | 36 min | 126,035 | 499 |
1 x H100 NVL | Medium | 17,980 | 113 MB | 4,537,270 | 138 | 53 min | 85,609 | 339 |
2 x H100 NVL | Large | 17,980 | 113 MB | 4,537,270 | 138 | 24 min | 188,340 | 746 |
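The two throughput columns can be derived from the totals and the embedding time. The sketch below shows the calculation, with the hypothetical helper `per_minute` and the input values copied from the table:

```python
def per_minute(total: int, minutes: float) -> float:
    """Average rate per minute for a total processed in the given minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total / minutes


# Benchmark corpus from the table: 4,537,270 tokens in 17,980 chunks.
TOKENS = 4_537_270
CHUNKS = 17_980

if __name__ == "__main__":
    for system, minutes in [("Small", 36), ("Medium", 53), ("Large", 24)]:
        tokens_rate = per_minute(TOKENS, minutes)
        chunks_rate = per_minute(CHUNKS, minutes)
        print(f"{system}: {tokens_rate:,.0f} tokens/min, {chunks_rate:,.0f} chunks/min")
```

The Small and Medium rows reproduce the listed figures. The Large row comes out slightly higher (about 189,000 tokens per minute), which would be consistent with the listed 24 minutes being a rounded duration.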
Examples
System Name | Size | Tokens | Files | Time |
---|---|---|---|---|
Small (2 x L4) | | 10 million | | 80 min |
Small (2 x L4) | | | 12 million | 60 days |
Large (2 x H100) | | | 12 million | |
Large (2 x H100) | 30 TB (about 31 million MB) | | | |
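As a rough check on the first example, the time for a token count can be projected from a measured throughput. The sketch below uses the hypothetical helpers `minutes_to_embed` and `tb_to_mb`, takes the 126,035 tokens per minute measured on the Small system, and assumes throughput stays constant for larger workloads. The size conversion assumes binary units (1 TB = 1,024 x 1,024 MB).

```python
def minutes_to_embed(total_tokens: int, tokens_per_minute: float) -> float:
    """Projected embedding time in minutes at a constant throughput."""
    if tokens_per_minute <= 0:
        raise ValueError("tokens_per_minute must be positive")
    return total_tokens / tokens_per_minute


def tb_to_mb(terabytes: float) -> float:
    """Convert TB to MB using binary units (assumption)."""
    return terabytes * 1024 * 1024


# Measured throughput of the Small (2 x L4) system.
SMALL_TOKENS_PER_MINUTE = 126_035

if __name__ == "__main__":
    minutes = minutes_to_embed(10_000_000, SMALL_TOKENS_PER_MINUTE)
    print(f"10 million tokens on Small: {minutes:.0f} min")
    print(f"30 TB = {tb_to_mb(30):,.0f} MB")
```

The projection gives about 79 minutes for 10 million tokens, in line with the 80 minutes listed, and 30 TB converts to roughly 31 million MB.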