Embedding Estimates Nov 2024


Embedding model: gte-Qwen

GPU: NVIDIA Tesla T4 (12 GB VRAM)

 

The table below lists sample documents and the time taken to process each one. Processing covers extracting the text from the document, splitting it into appropriately sized chunks, embedding each chunk as a vector, and storing the extracted text and its vector representation in the vector database.

| Document | Description | Chunks | Time Taken (seconds) |
|---|---|---|---|
| PDF | 10 pages, 2,155 words | 20 | 7 |
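The splitting step can be illustrated with a minimal sketch. It assumes a plain fixed-size word chunker with no overlap; the splitter actually used, its chunk size, and any overlap are not stated on this page. The value of 108 words per chunk is a hypothetical choice, picked only because it reproduces the 20 chunks reported for the 2,155-word sample PDF.

```python
from typing import List


def chunk_words(text: str, words_per_chunk: int) -> List[str]:
    """Split text into consecutive chunks of at most `words_per_chunk` words."""
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


# Synthetic stand-in for the 2,155-word sample PDF (not the real document).
sample_text = " ".join(f"word{i}" for i in range(2155))

# 108 words per chunk is an assumption, not a documented setting.
chunks = chunk_words(sample_text, words_per_chunk=108)

# Average end-to-end cost per chunk, using the 7 seconds measured for the sample.
seconds_per_chunk = 7 / len(chunks)
print(len(chunks), seconds_per_chunk)
```

Under these assumptions the sample works out to roughly 0.35 seconds per chunk, covering extraction, embedding and storage together.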

Example

To be safe, we assume an average of 10 pages per document, a figure suggested by ChatGPT. In our experience at AGAT, the average is closer to four pages per document.

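100,000 documents would take approximately 8 days with 1 embedding GPU, which is consistent with the 7 seconds measured for the 10-page sample PDF (100,000 × 7 s ≈ 8.1 days). The same workload would take approximately 3 days using the embedding GPU plus the LLM GPU temporarily (on-prem), or 2 days using 4 GPUs (cloud).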
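The arithmetic behind these figures can be sketched as follows. This is a rough model, not the method used to produce the estimates: it assumes every document costs the 7 seconds measured for the 10-page sample and that throughput scales linearly across identical GPUs. The function name `estimate_days` is illustrative.

```python
SECONDS_PER_DAY = 86_400


def estimate_days(num_documents: int,
                  seconds_per_document: float,
                  num_gpus: int = 1) -> float:
    """Estimated wall-clock days, assuming linear scaling across identical GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_seconds = num_documents * seconds_per_document
    return total_seconds / num_gpus / SECONDS_PER_DAY


# 7 seconds per document is the measured time for the 10-page sample PDF.
one_gpu_days = estimate_days(100_000, 7.0)
four_gpu_days = estimate_days(100_000, 7.0, num_gpus=4)

print(f"1 GPU:  {one_gpu_days:.1f} days")
print(f"4 GPUs: {four_gpu_days:.1f} days")
```

This gives about 8.1 days for one GPU and about 2.0 days for four. The 3-day on-prem figure is not reproduced by this model, because it treats all GPUs as identical and the LLM GPU may differ from the embedding GPU.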

 

12 million documents with 10 RTX 4090 GPUs would take approximately 8 weeks.

These estimates depend on the source of the documents (e.g. SharePoint), because extra time would be needed to download each file.

 

| GPU | System Name | Chunks | Size | Tokens | Documents | Embedding Time (minutes) | Tokens / Minute | Chunks / Minute |
|---|---|---|---|---|---|---|---|---|
| 2 x L4 | Small | 17,980 | 113 MB | 4,537,270 | 138 | 36 | 126,035 | 499 |
| 1 x H100 NVL | Medium | 17,980 | 113 MB | 4,537,270 | 138 | 53 | 85,609 | 339 |
| 2 x H100 NVL | Large | 17,980 | 113 MB | 4,537,270 | 138 | 24 | 188,340 | 746 |
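The two throughput columns follow from dividing the token and chunk counts by the embedding time. A minimal sketch of that calculation, using the figures from the table (the `Run` class and its field names are illustrative, not part of any tooling described here):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    system: str
    chunks: int
    tokens: int
    minutes: float  # embedding time as reported, rounded to whole minutes

    @property
    def tokens_per_minute(self) -> float:
        return self.tokens / self.minutes

    @property
    def chunks_per_minute(self) -> float:
        return self.chunks / self.minutes


runs = [
    Run("Small (2 x L4)", 17_980, 4_537_270, 36),
    Run("Medium (1 x H100 NVL)", 17_980, 4_537_270, 53),
    Run("Large (2 x H100 NVL)", 17_980, 4_537_270, 24),
]

for run in runs:
    print(f"{run.system}: {run.tokens_per_minute:,.0f} tokens/min, "
          f"{run.chunks_per_minute:,.0f} chunks/min")
```

The Small and Medium rows reproduce the table exactly. The Large row comes out slightly higher (about 189,053 tokens and 749 chunks per minute), which suggests the reported 24 minutes was rounded down from a slightly longer run.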

Examples

| System Name | Size | Tokens | Est Num Pages | Files | Time |
|---|---|---|---|---|---|
| Small 2 x L4 | | 10 Million | 30,000 pages | | 80 Mins |
| Small 2 x L4 | | | | 12 Million | 60 Days |
| Large 2 x H100 | | | | 12 Million | |
| Large 2 x H100 | 30 TB (about 31 million MB) | | | | |
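A token-based projection such as the 10 million token example can be sketched by dividing the token count by a measured throughput. The sketch below uses the tokens-per-minute figures reported for the Small and Large systems and assumes throughput stays constant at larger volumes, which has not been verified here. The function name `project_minutes` is illustrative.

```python
def project_minutes(total_tokens: int, tokens_per_minute: float) -> float:
    """Projected embedding time in minutes at a constant throughput."""
    if tokens_per_minute <= 0:
        raise ValueError("tokens_per_minute must be positive")
    return total_tokens / tokens_per_minute


# Measured throughputs reported for the Small and Large systems.
SMALL_TOKENS_PER_MINUTE = 126_035
LARGE_TOKENS_PER_MINUTE = 188_340

small_minutes = project_minutes(10_000_000, SMALL_TOKENS_PER_MINUTE)
large_minutes = project_minutes(10_000_000, LARGE_TOKENS_PER_MINUTE)

# Implied tokens per page for the 10 million token / 30,000 page example.
tokens_per_page = 10_000_000 / 30_000

print(f"Small: {small_minutes:.0f} min, Large: {large_minutes:.0f} min, "
      f"{tokens_per_page:.0f} tokens per page")
```

For the Small system this gives roughly 79 minutes, in line with the 80 minutes listed. The file-count rows cannot be projected this way without an assumed number of tokens per file.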