...

                                      Based on ChatGPT    Based on the AGAT experiment
Number of chunks per document         10                  3.5
Average size of one document          70K
Size on disk for 100,000 documents    7.2 GB              2.8 GB

Depending on how many pages of content you have, you can calculate the estimated size of the DB needed for the product.
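As a rough illustration of that calculation, here is a minimal sketch that scales the "size on disk for 100,000 documents" figures above to another document count. It assumes the size grows linearly with the number of documents, which the page does not confirm; the function and dictionary names are hypothetical.

```python
# Size on disk (GB) per 100,000 documents, taken from the figures above.
GB_PER_100K_DOCS = {
    "chatgpt_estimate": 7.2,
    "agat_experiment": 2.8,
}


def estimate_db_size_gb(num_documents: int, basis: str = "agat_experiment") -> float:
    """Estimate DB size in GB, assuming size scales linearly with document count."""
    return GB_PER_100K_DOCS[basis] * num_documents / 100_000


if __name__ == "__main__":
    for basis in GB_PER_100K_DOCS:
        print(basis, round(estimate_db_size_gb(250_000, basis), 2), "GB")
```

Treat the result as an order-of-magnitude figure; the two bases differ by more than a factor of two.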

...

 

File type     Metric               Site A      Site B
Word          No. of files         1214        480
              Size                 240 MB      400 MB
              Average file size    200 KB      800 KB
Excel         No. of files         283         281
              Size                 37 MB       24 MB
PowerPoint    No. of files         472         27
              Size                 2368 MB     74 MB
PDF           No. of files         1372        975
              Size                 2262 MB     407 MB
              Average file size    1600 KB     400 KB
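The average file sizes in the table appear to be the total size divided by the number of files, rounded. A minimal sketch of that arithmetic, assuming 1 MB = 1000 KB (the page does not say which convention was used) and using a hypothetical helper name:

```python
def average_file_size_kb(total_size_mb: float, num_files: int) -> float:
    """Average file size in KB, assuming 1 MB = 1000 KB."""
    return total_size_mb * 1000 / num_files


if __name__ == "__main__":
    # (total size in MB, number of files) as listed in the table
    samples = {
        "Word, Site A": (240, 1214),
        "Word, Site B": (400, 480),
        "PDF, Site A": (2262, 1372),
        "PDF, Site B": (407, 975),
    }
    for label, (size_mb, files) in samples.items():
        print(label, round(average_file_size_kb(size_mb, files)), "KB")
```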

Number of Documents

...

chunks

Each chunk is 500 tokens.

...

  1. Estimate the number of tokens per page:

    • Assume that 100 tokens are around 75 words.

    • A typical page is around 500 to 600 words, which at that ratio is roughly 670 to 800 tokens; this estimate uses approximately 700 tokens.

  2. Determine the cost per token:

    • The cost for embeddings (text-embedding-3-small) is $0.02 per 1 million tokens.

  3. Calculate the cost for one page:

    • Since a typical page has about 700 tokens, the cost is:

    \[ \text{Cost per page} = \frac{700 \text{ tokens}}{1{,}000{,}000 \text{ tokens}} \times 0.02 \text{ USD} = 0.000014 \text{ USD} \]

So, the cost for embedding a typical page (approximately 700 tokens) would be $0.000014 per page.

In other words, $1 covers embedding roughly 70K pages.
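The three steps above can be sketched in a few lines. This is only an illustration of the arithmetic using the figures quoted on this page (100 tokens per 75 words, $0.02 per 1 million tokens, about 700 tokens per page); the function names are hypothetical and the price may change.

```python
TOKENS_PER_WORD = 100 / 75            # assumption from step 1: 100 tokens ~ 75 words
PRICE_PER_MILLION_TOKENS_USD = 0.02   # embedding price quoted in step 2


def tokens_for_words(words: float) -> float:
    """Convert a word count to an approximate token count."""
    return words * TOKENS_PER_WORD


def embedding_cost_usd(tokens: float) -> float:
    """Cost in USD of embedding the given number of tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


def pages_per_dollar(tokens_per_page: float = 700) -> float:
    """How many pages of the given token length $1 can embed."""
    return 1 / embedding_cost_usd(tokens_per_page)


if __name__ == "__main__":
    print(f"Cost per page: {embedding_cost_usd(700):.6f} USD")
    print(f"Pages per dollar: {pages_per_dollar():,.0f}")
```

The exact result is a little over 71,000 pages per dollar, which the page rounds to 70K.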

Questions

The average price per question is $0.0059, meaning $1 covers about 170 questions.

1 million chunks take 8 GB of RAM.
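To turn that figure into a sizing estimate, a minimal sketch follows. It assumes RAM use scales linearly with the number of chunks, which the page does not state, and combines it with the chunks-per-document figures given earlier; the names are hypothetical.

```python
GB_RAM_PER_MILLION_CHUNKS = 8.0  # figure quoted above


def estimate_ram_gb(num_documents: int, chunks_per_document: float) -> float:
    """Estimate RAM in GB, assuming it scales linearly with the chunk count."""
    total_chunks = num_documents * chunks_per_document
    return total_chunks / 1_000_000 * GB_RAM_PER_MILLION_CHUNKS


if __name__ == "__main__":
    # 100,000 documents at 10 chunks each (ChatGPT-based estimate)
    print(round(estimate_ram_gb(100_000, 10), 2), "GB")
    # 100,000 documents at 3.5 chunks each (AGAT experiment)
    print(round(estimate_ram_gb(100_000, 3.5), 2), "GB")
```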