...

This is a service that classifies and analyses data at rest against policies.

Firewall API

This is a website that exposes an API for real-time inspection and classification.

Gateway Linux Server / Containers

...

The Loader extracts the text from the content items and removes unnecessary parts, such as email signatures and disclaimers.
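As a rough illustration of the cleaning step, the sketch below cuts an email body at the first signature or disclaimer marker. The marker patterns and the function name clean_email_body are assumptions made for this example; the actual rules used by the Loader are not described here.

```python
import re

# Hypothetical markers for this sketch only; the real Loader rules may differ.
SIGNATURE_DELIMITER = re.compile(r"^--\s*$")
DISCLAIMER_START = re.compile(
    r"^(this e-?mail|this message) .*confidential", re.IGNORECASE
)


def clean_email_body(text: str) -> str:
    """Return the body truncated at the first signature or disclaimer marker."""
    kept = []
    for line in text.splitlines():
        if SIGNATURE_DELIMITER.match(line) or DISCLAIMER_START.match(line):
            break  # everything from this line onward is treated as boilerplate
        kept.append(line)
    return "\n".join(kept).strip()
```

A line-based cut like this is simple but coarse: it also drops any real content that follows the marker, so a production loader would likely need more careful rules.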

Embedding service

Gets data from the Loader, splits it into chunks, and transforms the chunked content into vectors.
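The chunk-and-embed flow can be sketched as follows. This is a minimal sketch under stated assumptions: the chunk size, the overlap, and the word-based splitting are illustrative choices, and toy_embed is a hashed bag-of-words placeholder standing in for the real embedding model, which it does not resemble in quality.

```python
import hashlib
import math
from typing import List


def split_into_chunks(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into chunks of up to chunk_size words; neighbours share overlap words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(chunk: str, dim: int = 16) -> List[float]:
    """Placeholder embedding (assumption): hashed word counts, L2-normalised."""
    vec = [0.0] * dim
    for word in chunk.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def embed_document(text: str) -> List[List[float]]:
    """Chunk the text, then turn each chunk into one vector."""
    return [toy_embed(chunk) for chunk in split_into_chunks(text)]
```

Overlapping chunks are a common way to avoid losing context at chunk boundaries; whether the Embedding service uses overlap is not stated in this document.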

...

The embedding vectors are stored in a vector DB. We've elected to use Postgres Vector DB.

Embedding AI model

For the private AI, BusinessGPT uses GTE Qwen2 (by Alibaba).

This model supports 29 languages, including Hebrew:

https://ollama.com/library/qwen2:1.5b-instruct

image-20241212-083043.png

https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct

According to the MTEB benchmarks (link), this model performs better than the one used by OpenAI (ChatGPT) and is the most cost-effective in terms of the resources needed.

Vector DB

The Vector DB stores each embedding as vector content, together with the chunked text and its metadata.
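To make the stored record concrete, here is a minimal in-memory sketch. ChunkRecord and InMemoryVectorStore are hypothetical names invented for this example, the metadata keys are illustrative, and the cosine-similarity search only stands in for whatever query mechanism the actual vector DB provides.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChunkRecord:
    """One stored item: the vector, the chunk text it came from, and metadata."""
    embedding: List[float]
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Stand-in for a real vector DB (assumption: illustration only)."""

    def __init__(self) -> None:
        self._records: List[ChunkRecord] = []

    def add(self, record: ChunkRecord) -> None:
        self._records.append(record)

    def search(self, query: List[float], top_k: int = 3) -> List[ChunkRecord]:
        """Return the top_k records most similar to the query vector."""
        ranked = sorted(
            self._records,
            key=lambda r: cosine_similarity(query, r.embedding),
            reverse=True,
        )
        return ranked[:top_k]
```

Keeping the chunk text and metadata next to the vector means a search result can be shown or filtered without a second lookup.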

...