Functional FAQ
How are permissions managed?
When a user manually uploads a document/transcript or website, they only have permission to view and ask questions about that content.
When integrating systems like SharePoint, Teams chat, and Google Drive, Pragatix syncs the permission from the source system so only users who have access to the source system can ask questions or get answers from that content.
If permissions are changed in the source system, Pragatix syncs them back to the dashboard.
Company Chat and Assistant knowledge chats provide answers based on all the documents a user can access, not just the ones they’ve uploaded. Uploaded documents are private by default and only accessible to the uploader, unless they add permissions to others.
Permissions set on the Assistant Knowledge Chat page determine which users can see that chat on the Lite Mode page. However, they do not control access to the content itself and cannot grant permissions beyond those of the source system. To receive an answer, a user must have both permission to view the assistant and permission to access the relevant content.
The screenshot below shows the error message a user sees when they have access to an Assistant Knowledge Chat but not to its content.
At which point are permissions applied?
Users can only see items they have access to and, therefore, can only ask questions about content they have access to.
When asking a question about all company data, the system will find the most relevant documents that the user can access, and only these documents will be sent to the AI to generate the answer.
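As a hypothetical sketch of that filtering step (function and field names are illustrative, not Pragatix's actual API), the system drops documents the user cannot access before ranking by relevance:

```python
# Permission-aware retrieval sketch: only documents the asking user can
# access are considered, and only the top-ranked of those are passed on
# to the model. All names here are illustrative.

def retrieve_for_user(question_scores, acl, user, top_k=2):
    """question_scores: list of (doc_id, relevance) pairs;
    acl: dict mapping doc_id -> set of users allowed to read it."""
    allowed = [(doc, score) for doc, score in question_scores
               if user in acl.get(doc, set())]
    allowed.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in allowed[:top_k]]

acl = {"hr-policy.docx": {"alice", "bob"}, "board-minutes.pdf": {"alice"}}
scores = [("hr-policy.docx", 0.91), ("board-minutes.pdf", 0.88)]
print(retrieve_for_user(scores, acl, "bob"))  # bob never sees the board minutes
```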
How does the AI Firewall detect bias, and is there a specific mechanism to support ethical usage?
We don't detect bias directly. Instead, we give the company tools to manage the risks of hallucination and bias by enforcing policies that measure the risk of each use case. This lets the company decide for which use cases it accepts potential bias or hallucination issues.
In addition, when asking questions about your own company data, bias is reduced or removed because answers are based on your data rather than on pretrained data.
Technical FAQ
Where can Pragatix be hosted?
You can deploy it in your own AWS tenant or on-prem.
Where is our data stored?
In a database on your servers. It never gets sent to other servers.
How do you provide answers to questions without sending our data to the cloud?
We run a local LLM that is fully standalone and doesn't require internet connectivity. We currently use Meta's Llama 3 model, which is open source and free for products with fewer than 700 million monthly active users.
What hardware does the product require?
It requires one Windows server and two Linux servers as well as at least one GPU with a minimum of 24 GB of VRAM. More details here.
Do you train LLM models with customer data?
No, we store customer data in segregated vector databases and use the RAG methodology to provide answer generation.
Which components use the SQL server?
The Dashboard, Ingestor, and Gateway.
Is the Ingestor service developed by AGAT?
Yes.
Can I use my own API keys to access LLMs?
Yes, we currently support OpenAI, AWS Bedrock, Fireworks and OpenRouter API access. Others available upon request. Content embedding for search uses OpenAI or local embedding.
What types of GPUs can we use?
If you want to run your own LLMs rather than use a third-party LLM API such as OpenAI or Anthropic, you need GPUs.
Datacenter GPUs prioritize reliability and memory capacity, while consumer GPUs offer a better price-to-performance ratio.
Key Differences:
Performance:
Datacenter GPUs like the NVIDIA A100 and H100 are designed for large-scale training, inference, and high-performance computing (HPC) tasks, offering superior memory bandwidth and computational power. Consumer GPUs, like the RTX 4090, while powerful, are not optimized for the same level of sustained high-load operation.
Reliability and Scalability:
Datacenter GPUs are built for 24/7 operation and are designed with enhanced cooling, extended warranties, and enterprise-level support. Consumer GPUs may throttle under prolonged stress.
Cost:
Datacenter GPUs are significantly more expensive than consumer GPUs, reflecting their specialized design and features. The higher cost can be justified by the performance gains and reliability for specific workloads.
Cost-Effectiveness:
Consumer GPUs, particularly high-end models like the RTX 4090, can offer a compelling price-to-performance ratio.
For smaller to mid-sized models and workloads, consumer GPUs can be a cost-effective alternative to datacenter GPUs, especially when considering the cost savings and scalability options.
Scalability of Data Processing
What is the maximum database size the tool can efficiently process?
The AI only requires the database schema within its context window. For models with large context windows (e.g., GPT-4o with 128K tokens), it can handle up to around 100 tables. The number of rows isn't a limiting factor, as the AI interacts with the database using standard SQL queries. With proper indexing, large datasets can be efficiently queried.
Have experiments been conducted with datasets of 1GB, 10GB, or 100GB? If so, what were the performance results?
Yes. Note that performance does not depend on database size, because the AI engine translates the natural-language prompt into SQL and the database executes the query.
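A minimal sketch of why database size doesn't matter here: only the schema, never the rows, is placed in the model's context. Table and column names below are invented for illustration.

```python
# Build a text-to-SQL prompt from the schema alone. Row count never
# affects prompt size, because no data rows are included.
schema = {
    "Orders": ["OrderID int", "CustomerID int", "Total decimal(10,2)"],
    "Customers": ["CustomerID int", "Name nvarchar(100)"],
}

def build_prompt(schema, question):
    ddl = "\n".join(
        f"CREATE TABLE {t} ({', '.join(cols)});" for t, cols in schema.items()
    )
    return f"{ddl}\n-- Question: {question}\n-- Answer with a single T-SQL query."

prompt = build_prompt(schema, "What is the total revenue per customer?")
print(prompt)
```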
Handling and Processing New Data
When we upload docs, does the system store them once, or re-process them every time we ask a question?
When you upload a file to the system, it is converted to plain text, split into chunks, and embedded, so each chunk is stored as a vector representation of the content. This happens once, at upload time.
Every time you ask a question about the document, the system reuses this stored content rather than re-processing the file.
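A rough sketch of this ingest path, with a toy function standing in for the real embedding model (chunk sizes and the embedding itself are illustrative, not the product's actual parameters):

```python
# Ingest sketch: plain text -> overlapping fixed-size chunks -> one vector
# per chunk, stored once at upload time.

def chunk(text, size=200, overlap=20):
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def toy_embed(chunk_text):
    # placeholder: a real system would call an embedding model here
    return [len(chunk_text), sum(map(ord, chunk_text)) % 997]

document = "some plain text extracted from the uploaded file " * 20
vectors = [(c, toy_embed(c)) for c in chunk(document)]
print(len(vectors))  # stored once; later questions search these vectors
```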
How does the tool handle newly added data?
BGPT periodically scans data sources like SharePoint and Confluence and re-ingests items that have been modified, as well as new items.
The Data Analysis tool always performs queries on the current data in the database.
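The periodic re-ingest decision can be sketched as a timestamp comparison against the previous scan (field names and timestamps below are illustrative):

```python
# Incremental sync sketch: compare each source item's last-modified
# timestamp with what was recorded at the previous scan.

def items_to_ingest(source_items, ingested):
    """source_items: {item_id: last_modified}; ingested: same shape,
    from the previous scan. Returns ids of new and modified items."""
    return sorted(
        item for item, modified in source_items.items()
        if item not in ingested or modified > ingested[item]
    )

previous = {"spec.docx": 100, "notes.md": 250}
current = {"spec.docx": 100, "notes.md": 300, "plan.pptx": 310}
print(items_to_ingest(current, previous))  # modified + brand-new items
```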
What is the typical processing time for newly ingested data to be available for analysis, search, and other AI-driven tasks?
Individual new documents are available in seconds.
A small environment with 2 small GPUs can process 100 average documents in 25 mins.
Database Compatibility and Requirements
What types of databases can the tool connect to (e.g., relational, NoSQL, data lakes)?
Currently, we support only Transact-SQL databases such as SQL Server.
Are there specific requirements or constraints for database integration?
The only requirement is a connection string with read-only access to the relevant tables.
AI Agent Framework and Customization
Does the tool use an AI agent framework? If so, which one?
Can users create and customize their own AI agents? If so, how can they be connected to process data?
Yes, it's possible to integrate tools with the AI agent, such as custom APIs. This is done by passing predefined Python functions to the agent. The agent can then call these functions when appropriate. For example, you can define a function that sends an email and instruct the agent to use it when needed.
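A minimal sketch of that pattern, where predefined functions are registered as tools and dispatched by name when the agent decides one is needed (the `send_email` function and the tool-call shape are hypothetical examples, not the product's actual interface):

```python
# Agent tool-dispatch sketch: the agent emits a tool call by name, and the
# runtime invokes the matching predefined Python function.

def send_email(to: str, subject: str) -> str:
    # in a real deployment this would call your mail gateway
    return f"queued email to {to}: {subject}"

TOOLS = {"send_email": send_email}

def dispatch(tool_call):
    """tool_call: {'name': ..., 'arguments': {...}} as an agent might emit."""
    return TOOLS[tool_call["name"]](**tool_call["arguments"])

result = dispatch({"name": "send_email",
                   "arguments": {"to": "ops@example.com", "subject": "alert"}})
print(result)
```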
Integration with Data Pipelines
What integration options are available for incorporating the tool into existing data processing pipelines?
Are there APIs, SDKs, or other integration methods available?
This question is unclear—could you clarify what kind of integration you're referring to?
RBAC
Which LDAP directories are supported?
We work with Microsoft Active Directory using the LDAP protocol, either directly or via the Global Catalog (GC).
Hardware FAQ
How much disk space does BGPT need to store the contents of my files?
The usual ratio we suggest is 30% when we're ingesting content hosted elsewhere (e.g. SharePoint). So if you have 30 GB of files to ingest into our system, about 9 GB needs to be allocated for that purpose. We store a link to the content but not the actual files.
If you upload files directly to our system, we’ll need to store them, so you’ll require 130% of the disk space.
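Taking the 30% and 130% ratios literally, the sizing arithmetic works out as follows (the helper function is purely illustrative):

```python
# Disk-sizing sketch: ~30% of the source size when content is only linked
# (index/vectors), ~130% when the original files are also stored.

def required_disk_gb(source_gb, store_originals):
    ratio = 1.30 if store_originals else 0.30
    return round(source_gb * ratio, 1)

print(required_disk_gb(30, store_originals=False))  # linked: 30% of 30 GB = 9.0 GB
print(required_disk_gb(30, store_originals=True))   # uploaded: 130% of 30 GB = 39.0 GB
```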
See here for system sizing recommendations
How PDFs Are Handled
When you upload a PDF, our system automatically determines how to extract its text content:
The system first uses PyMuPDF to inspect the PDF and check whether it contains selectable text.
If no text is found, the PDF is treated as scanned and goes through OCR (Optical Character Recognition):
The PDF is split into individual pages.
Each page is processed with Tesseract OCR to extract text.
The recognized text from all pages is then combined for further use.
If text is found, the PDF is regular and does not need OCR:
The text is extracted directly using the pdfplumber library for accurate, structured content parsing.
The above works with both the API and direct upload, and supports sending contentID or full text via the API.
Note: In the SaaS environment, OCR is enabled by a site configuration managed by AGAT.
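The routing logic above can be sketched with the library calls stubbed out so the control flow is visible; in the real pipeline the stubs would be PyMuPDF (text detection), pdfplumber (direct extraction), and per-page Tesseract OCR:

```python
# PDF routing sketch: choose direct extraction for PDFs with a selectable
# text layer, otherwise OCR each page and combine the results. The callables
# passed in are stand-ins for the real library calls.

def extract_pdf_text(pdf, has_text, extract_text, ocr_pages):
    if has_text(pdf):
        # regular PDF with a selectable text layer
        return extract_text(pdf)
    # scanned PDF: OCR page by page, then combine
    return "\n".join(ocr_pages(pdf))

regular = extract_pdf_text("a.pdf", lambda p: True,
                           lambda p: "text layer", lambda p: [])
scanned = extract_pdf_text("b.pdf", lambda p: False,
                           lambda p: "", lambda p: ["page 1", "page 2"])
print(regular, "|", scanned)
```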
Are you using vLLM?
Yes, most of our on-prem models run on vLLM for its efficiency and batching support. Some models (like OSS20B and OSS120B) use SGLang or other backends. We switch as needed based on performance and stability.