Configuring Classification Rules - Best Practices
Best Practices for Creating NLP Usage and Data Classification Rules in BusinessGPT
Introduction:
BusinessGPT provides customers with the powerful ability to classify and monitor AI-generated prompts and responses, ensuring data and usage are tracked and categorized for security, compliance, and productivity purposes. Customers can create both data classification rules and usage classification rules to label content accurately. These rules can be configured using either regular expressions (Regex) or natural language processing (NLP), providing flexibility and accuracy in identifying content that aligns with organizational standards and requirements.
This document outlines the best practices for configuring NLP-based classification rules, with a focus on achieving optimal rule specificity to avoid both over- and under-classification.
1. Understanding Classification Types
Data Classification Rules
Purpose: To label prompts or responses that contain sensitive information like Personally Identifiable Information (PII), confidential data, or proprietary business content.
Example: Assign a "PII" label when a prompt or response contains sensitive data such as names, addresses, or Social Security numbers.
Usage Classification Rules
Purpose: To categorize prompts and responses based on their intended purpose, such as marketing, support, or product development.
Example: Assign a "Marketing" label to content that is created to promote products or engage with customers.
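To make the distinction concrete, the sketch below models both rule types as plain Python records. The `ClassificationRule` class, its fields, and the `labels_of_type` helper are hypothetical illustrations, not BusinessGPT's configuration schema; the descriptions reuse the examples above.

```python
from dataclasses import dataclass
from enum import Enum


class RuleType(Enum):
    DATA = "data"    # labels sensitive content (e.g. PII)
    USAGE = "usage"  # labels the purpose of a prompt (e.g. marketing)


@dataclass(frozen=True)
class ClassificationRule:
    """Hypothetical rule record; the real BusinessGPT schema may differ."""
    label: str          # label applied when the rule matches
    rule_type: RuleType
    description: str    # natural-language statement the NLP rule evaluates (assumed)


RULES = [
    ClassificationRule(
        label="PII",
        rule_type=RuleType.DATA,
        description="Contains information that can personally identify an individual.",
    ),
    ClassificationRule(
        label="Marketing",
        rule_type=RuleType.USAGE,
        description="Content created to promote products or engage with customers.",
    ),
]


def labels_of_type(rules: list[ClassificationRule], rule_type: RuleType) -> list[str]:
    """Return the labels of all rules of the given type."""
    return [rule.label for rule in rules if rule.rule_type is rule_type]


print(labels_of_type(RULES, RuleType.DATA))   # ['PII']
print(labels_of_type(RULES, RuleType.USAGE))  # ['Marketing']
```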
2. Key Best Practices for NLP-Based Classification
Using NLP for rule configuration allows for more contextual understanding than Regex, but it requires careful attention to specificity to avoid over- or under-classification.
A. Be Specific, But Not Too Narrow
Start with clear, focused keywords or phrases for each classification rule. Avoid overly generic terms that could match too many cases, but don't make the rule so specific that it misses relevant data.
Example: Instead of specifying “user’s name,” which may miss slight variations, use a broader phrase like “personal identifier,” which can capture names, usernames, and email addresses while keeping the focus on PII.
B. Phrase Rules in Complete Sentences for Contextual Clarity
When configuring with NLP, phrase the classification rule as a full sentence or question to leverage the language model's ability to understand context. This approach also reduces ambiguity in classification.
Example: For data classification, use phrasing like “Contains information that can personally identify an individual” instead of simply stating “PII.”
C. Balance Keyword Use with Contextual Statements
While keywords are essential, combining them with a contextual statement helps the model understand the intended purpose.
Example: Instead of just listing “social security number,” configure the rule as “Mentions a social security number or any other personal identification data.”
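The phrasing advice in B and C can be turned into a quick pre-check before a rule is saved. The sketch below is a hypothetical heuristic, not a BusinessGPT feature: `review_rule_description` and the `MIN_WORDS` threshold of four words are assumptions chosen for illustration. It flags descriptions that are too short to be a full sentence or that consist only of a bare acronym such as “PII.”

```python
import re

MIN_WORDS = 4  # hypothetical threshold: shorter descriptions rarely form a full sentence


def review_rule_description(description: str) -> list[str]:
    """Return heuristic warnings for an NLP rule description (hypothetical pre-check)."""
    warnings = []
    text = description.strip()
    word_count = len(text.split())
    if word_count < MIN_WORDS:
        warnings.append(f"only {word_count} word(s); phrase the rule as a full sentence")
    # A lone acronym such as "PII" gives the model no context about intent.
    if re.fullmatch(r"[A-Z]{2,}s?", text):
        warnings.append("bare acronym; describe what the content contains")
    return warnings


for candidate in [
    "PII",
    "social security number",
    "Mentions a social security number or any other personal identification data.",
]:
    print(f"{candidate!r}: {review_rule_description(candidate) or 'OK'}")
```

A bare keyword like “social security number” is flagged as too short, while the contextual version passes both checks.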
D. Use Synonyms and Related Terms Thoughtfully
Include common synonyms or related terminology for key concepts within the same rule. This broadens the rule's applicability without overgeneralizing.
Example: For a marketing usage classification, include “content for promotion,” “advertising text,” and “campaign message” to ensure relevant prompts are captured.
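Related terms can be folded into one sentence-style description so the rule stays focused on a single concept. The helper below is a hypothetical sketch (`compose_rule_description` is not part of BusinessGPT); it only produces text that you could paste into an NLP rule, using terms close to the marketing example above.

```python
def compose_rule_description(statement: str, related_terms: list[str]) -> str:
    """Fold related terms into one sentence-style NLP rule description (hypothetical helper)."""
    statement = statement.strip().rstrip(".")
    if not related_terms:
        return statement + "."
    if len(related_terms) <= 2:
        terms = " or ".join(related_terms)
    else:
        # Serial-comma list: "a, b, or c"
        terms = ", ".join(related_terms[:-1]) + ", or " + related_terms[-1]
    return f"{statement}, such as {terms}."


print(compose_rule_description(
    "Content intended to promote products or engage with customers",
    ["promotional content", "advertising text", "campaign messages"],
))
```

Keeping the synonyms inside one statement, rather than as separate rules, preserves the single purpose of the rule while widening what it recognizes.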
3. Methodology for Rule Configuration
A. Set Clear Objectives for Each Classification Rule
Identify the primary goal of each rule before writing it. Is it intended to secure data, support compliance, or facilitate content tracking? Defining this will help in phrasing the rule and selecting keywords or terms.
Example: If your objective is to classify content containing financial data, use phrases like “financial figures” or “revenue data” instead of just “numbers.”
B. Test Rules in a Controlled Environment
Once configured, test rules with a variety of prompts and responses to see how accurately the classification rule performs. Adjust rule phrasing if necessary to capture missed cases or avoid misclassifications.
Example: Use a testing set that includes variations of PII-containing prompts to see if the data classification rule catches each instance. Adjust if the rule is too narrow or too broad.
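A small harness makes this testing repeatable. The sketch assumes you can call the rule under test from code as a function that takes a prompt and returns whether it matched; `evaluate_rule` and the keyword-based stand-in classifier are hypothetical, used only so the example runs.

```python
from collections.abc import Callable, Iterable


def evaluate_rule(
    classify: Callable[[str], bool],
    test_set: Iterable[tuple[str, bool]],
) -> dict[str, list[str]]:
    """Run a rule over labeled prompts and collect the cases it gets wrong.

    Each test case is (prompt, should_match). `classify` is a hypothetical
    stand-in for however you invoke the rule under test.
    """
    missed, false_alarms = [], []
    for prompt, should_match in test_set:
        matched = classify(prompt)
        if should_match and not matched:
            missed.append(prompt)        # under-classification: rule too narrow
        elif matched and not should_match:
            false_alarms.append(prompt)  # over-classification: rule too broad
    return {"missed": missed, "false_alarms": false_alarms}


# Variations of PII-containing prompts plus near-misses that should NOT match.
TEST_SET = [
    ("My SSN is 123-45-6789, please fill in the form", True),
    ("Email me at jane.doe@example.com", True),
    ("Draft a tweet about our spring sale", False),
    ("What is a social security number used for?", False),
]


def keyword_stand_in(prompt: str) -> bool:
    """Placeholder for the real rule: a naive keyword check, used only so this runs."""
    text = prompt.lower()
    return "ssn" in text or "social security" in text


print(evaluate_rule(keyword_stand_in, TEST_SET))
```

Here the stand-in misses the email address (too narrow) and flags a general question about Social Security numbers (too broad), which are exactly the two failure modes a good test set should surface.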
C. Adjust for Ambiguity with Gradual Tweaking
Start with slightly broader rules and adjust gradually based on observed accuracy. Narrow down the rule scope only if it consistently over-classifies unrelated prompts.
Example: If a rule categorizes too many prompts as “Marketing,” refine by adding a condition like “only for external promotion” to better focus on intended use.
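To judge whether a tweak helps, score each version of the rule against the same labeled test set. Over-classification shows up as low precision (many flagged prompts are not really in the category), and under-classification as low recall (many prompts in the category are not flagged). The function below computes both; the three prediction lists are made-up illustrations, not measured results.

```python
def precision_recall(predicted: list[bool], expected: list[bool]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(p and e for p, e in zip(predicted, expected))
    fp = sum(p and not e for p, e in zip(predicted, expected))
    fn = sum(e and not p for p, e in zip(predicted, expected))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative labels: True = the prompt really is external promotion ("Marketing").
expected = [True, True, False, False, False]

versions = {
    "broad": [True, True, True, True, False],        # also tags internal memos
    "refined": [True, True, False, False, False],    # "only for external promotion"
    "too narrow": [True, False, False, False, False],  # now misses a real case
}

for name, predicted in versions.items():
    precision, recall = precision_recall(predicted, expected)
    print(f"{name:>10}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, the broad version has full recall but half of its matches are wrong, while narrowing too far restores precision at the cost of recall; the refined version is the balance that gradual tweaking aims for.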
4. Using NLP and Regex Together for Enhanced Precision
In scenarios where highly accurate classification is essential, consider combining NLP with Regex rules. NLP rules can capture broader language patterns, while Regex can pinpoint specific terms or structures.
Example: Use NLP to identify general references to marketing (e.g., “promotional content”), and Regex to identify exact phrases like “marketing campaign” or “ad copy.”
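A minimal sketch of this combination, assuming the exact phrases are checked first and the broader NLP rule is consulted only when no phrase matches. How BusinessGPT itself combines Regex and NLP rules is not described here, so the `nlp_match` parameter is a hypothetical placeholder, and the keyword function used below merely stands in for a real NLP rule so the example runs.

```python
import re
from collections.abc import Callable
from typing import Optional

# Exact phrases from the example above; \b word boundaries keep them from
# matching inside longer words (e.g. "ad copywriting" does not match "ad copy").
MARKETING_PHRASES = re.compile(r"\b(?:marketing campaign|ad copy)\b", re.IGNORECASE)


def classify_marketing(prompt: str, nlp_match: Callable[[str], bool]) -> Optional[str]:
    """Return which method labeled the prompt as Marketing ("regex" or "nlp"), or None."""
    if MARKETING_PHRASES.search(prompt):  # precise, cheap check first
        return "regex"
    if nlp_match(prompt):                 # broader, context-aware rule second
        return "nlp"
    return None


def placeholder_nlp(prompt: str) -> bool:
    """Hypothetical stand-in for an NLP rule such as "promotional content"."""
    return "promot" in prompt.lower()


for prompt in [
    "Write ad copy for our new app",
    "Draft promotional content for the launch",
    "Summarize this support ticket",
]:
    print(prompt, "->", classify_marketing(prompt, placeholder_nlp))
```

Returning which path matched makes it easy to see during testing whether the Regex rule or the NLP rule is doing the work for a given prompt.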
5. Best Practice Summary Checklist
Step | Description |
---|---|
Define Objective | Clarify the goal of each classification rule. |
Start with Clear Phrasing | Write rules as full sentences for clarity and context. |
Balance Keywords & Context | Include keywords but provide context to refine classification. |
Test & Refine | Continuously test and adjust rules to avoid over- or under-classification. |
Combine NLP & Regex | Use both methods where possible to optimize specificity and capture all relevant cases. |
Conclusion:
By following these best practices, customers can configure NLP-based classification rules in BusinessGPT effectively, ensuring they capture relevant data without overreach. The objective is to achieve a balance between precision and inclusivity so that each rule serves its purpose accurately, helping organizations to leverage generative AI insights while protecting and categorizing data responsibly.
For any further assistance, please consult the BusinessGPT support team.