Embedding model: gte-Qwen
...
Document description | Chunks | Time taken (seconds) |
---|---|---|
10 pages, 2155 words | 20 | 7 |
Example
To be safe, we assume an average of 10 pages per document, a figure suggested by ChatGPT. In our experience at AGAT, the average is four pages per document.
Info |
---|
100,000 documents would take approximately 8 days with 1 embedding GPU, 3 days using the embedding GPU plus the LLM GPU temporarily (on-prem), or 2 days using 4 GPUs (cloud). |
12 million documents with 10 RTX 4090 GPUs would take approximately 8 weeks. |
Actual time depends on the source of the documents (e.g. SharePoint): extra time is needed to download each file. |
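The 8-day and 2-day figures for 100,000 documents follow from the sample measurement of about 7 seconds per 10-page document. Below is a minimal sketch of that arithmetic, assuming throughput scales linearly with the number of GPUs and ignoring download time. The function name `estimate_days` is ours, for illustration only.

```python
SECONDS_PER_DAY = 86_400


def estimate_days(num_documents: int, seconds_per_document: float, num_gpus: int = 1) -> float:
    """Estimate embedding time in days.

    Assumption (not measured): work divides evenly across GPUs,
    so N GPUs are N times faster than one.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_seconds = num_documents * seconds_per_document / num_gpus
    return total_seconds / SECONDS_PER_DAY


# 7 seconds per document is the sample measurement for a 10-page document.
one_gpu = estimate_days(100_000, 7.0)
four_gpus = estimate_days(100_000, 7.0, num_gpus=4)

print(f"1 GPU:  {one_gpu:.1f} days")
print(f"4 GPUs: {four_gpus:.1f} days")
```

With these inputs the sketch gives roughly 8.1 days on one GPU and 2.0 days on four. Real deployments will rarely scale perfectly, so treat the multi-GPU figure as a lower bound.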
GPU | System Name | Chunks | Size | Tokens | Amount Of Docs | Embedding Time | Tokens / Minute | Chunks / Minute |
---|---|---|---|---|---|---|---|---|
2 x L4 | Small | 17,980 | 113 MB | 4,537,270 | 138 | 36 Mins | 126,035 | 499 |
1 x H100 NVL | Medium | 17,980 | 113 MB | 4,537,270 | 138 | 53 Mins | 85,609 | 339 |
2 x H100 NVL | Large | 17,980 | 113 MB | 4,537,270 | 138 | 24 Mins | 188,340 | 746 |
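The two rate columns are the token and chunk totals divided by the embedding time. The sketch below recomputes them from the table's figures. The `Run` dataclass is a hypothetical helper for illustration, not part of any tool used for the benchmark.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark row: totals plus wall-clock embedding time in minutes."""
    system: str
    chunks: int
    tokens: int
    minutes: float

    @property
    def tokens_per_minute(self) -> float:
        return self.tokens / self.minutes

    @property
    def chunks_per_minute(self) -> float:
        return self.chunks / self.minutes


# Figures copied from the table; minutes are the rounded values shown there.
runs = [
    Run("Small (2 x L4)", 17_980, 4_537_270, 36),
    Run("Medium (1 x H100 NVL)", 17_980, 4_537_270, 53),
    Run("Large (2 x H100 NVL)", 17_980, 4_537_270, 24),
]

for run in runs:
    print(
        f"{run.system}: {run.tokens_per_minute:,.0f} tokens/min, "
        f"{run.chunks_per_minute:,.0f} chunks/min"
    )
```

The Small and Medium rows reproduce the table exactly. For the Large row the recomputed rates come out slightly higher than the published ones, which suggests the actual run was a little longer than the rounded 24 minutes.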
Examples
System Name | Size | Tokens | Files | Time |
---|---|---|---|---|
Small 2 x L4 | | 10 Million | | 80 Mins |
Small 2 x L4 | | | 12 Million | 60 Days |
Large 2 x H100 | | | 12 Million | |
Large 2 x H100 | 30 TB / 31m MB | | | |
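If the 10 Million figure for the Small system is read as a token count, the 80-minute entry is consistent with that system's measured rate of 126,035 tokens per minute. The sketch below shows the projection, assuming the measured rate holds for a larger corpus. The function name `minutes_for_tokens` is illustrative.

```python
def minutes_for_tokens(total_tokens: int, tokens_per_minute: float) -> float:
    """Project embedding time in minutes from a measured token throughput.

    Assumption: the throughput measured on the 138-document sample
    stays constant for the larger corpus.
    """
    if tokens_per_minute <= 0:
        raise ValueError("tokens_per_minute must be positive")
    return total_tokens / tokens_per_minute


# Measured rates from the benchmark table (tokens per minute).
small = minutes_for_tokens(10_000_000, 126_035)  # Small, 2 x L4
large = minutes_for_tokens(10_000_000, 188_340)  # Large, 2 x H100 NVL

print(f"Small: {small:.0f} minutes")
print(f"Large: {large:.0f} minutes")
```

This gives about 79 minutes for the Small system and about 53 minutes for the Large one. The same function cannot be applied to the file-count or storage-size rows without an assumed tokens-per-file or tokens-per-MB ratio.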