Configuring Classification Rules - Best Practices

Best Practices for Creating NLP Usage and Data Classification Rules in BusinessGPT

Introduction:
BusinessGPT lets customers classify and monitor prompts and AI-generated responses so that data and usage are tracked and categorized for security, compliance, and productivity purposes. Customers can create both data classification rules and usage classification rules to label content accurately. These rules can be configured using either regular expressions (Regex) or natural language processing (NLP), giving flexibility in how content is matched against organizational standards and requirements.

This document outlines the best practices for configuring NLP-based classification rules, with a focus on achieving optimal rule specificity to avoid both over- and under-classification.

1. Understanding Classification Types

Data Classification Rules

Purpose: To label prompts or responses that contain sensitive information like Personally Identifiable Information (PII), confidential data, or proprietary business content.
Example: Assign a "PII" label when a prompt or response contains sensitive data like names, addresses, social security numbers, etc.

Usage Classification Rules

Purpose: To categorize prompts and responses based on their intended purpose, such as marketing, support, or product development.
Example: Assign a "Marketing" label to content that is created to promote products or engage with customers.

2. Key Best Practices for NLP-Based Classification

Using NLP for rule configuration allows for more contextual understanding than Regex, but it requires careful attention to specificity to avoid over- or under-classification.

A. Be Specific, But Not Too Narrow

Start with clear, focused keywords or phrases for each classification rule. Avoid overly generic terms that could match too many cases, yet also ensure that the rule doesn’t become so specific that it misses relevant data.
Example: Instead of specifying “user’s name,” which may miss close variations, use a somewhat broader phrase like “personal identifier,” which captures names, usernames, and email addresses while staying focused on PII.

B. Phrase Rules in Complete Sentences for Contextual Clarity

When configuring with NLP, phrase the classification rule as a full sentence or question to leverage the language model's ability to understand context. This approach also reduces ambiguity in classification.
Example: For data classification, use phrasing like “Contains information that can personally identify an individual” instead of simply stating “PII.”

C. Balance Keyword Use with Contextual Statements

While keywords are essential, combining them with a contextual statement helps the model understand the intended purpose.
Example: Instead of just listing “social security number,” configure the rule as “Mentions a social security number or any other personal identification data.”

D. Use Synonyms and Related Terms Thoughtfully

Include common synonyms or related terminology for key concepts within the same rule. This broadens the rule's applicability without overgeneralizing.
Example: For a marketing usage classification, include “content for promotion,” “advertising text,” and “campaign message” to ensure relevant prompts are captured.
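The phrasing guidelines in A through D can be checked mechanically before a rule is saved. The following is a minimal Python sketch, not part of BusinessGPT: the NlpRule structure, the GENERIC_TERMS list, and the four-word threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical list of terms considered too generic to stand alone as a rule.
GENERIC_TERMS = {"numbers", "data", "pii", "text", "content"}


@dataclass
class NlpRule:
    """Hypothetical in-memory representation of an NLP classification rule."""
    label: str                      # label assigned on a match, e.g. "PII"
    description: str                # the natural-language rule text
    related_terms: List[str] = field(default_factory=list)  # synonyms


def lint_rule(rule: NlpRule) -> List[str]:
    """Return warnings about how the rule is phrased (empty list if none)."""
    warnings = []
    text = rule.description.strip()
    # Guideline B: prefer a full sentence over a bare keyword.
    if len(text.split()) < 4:
        warnings.append("description is very short; phrase it as a full sentence")
    # Guideline A: avoid a single generic term as the whole rule.
    if text.lower().rstrip(".") in GENERIC_TERMS:
        warnings.append("description is a single generic term")
    # Guideline D: list synonyms or related terms.
    if not rule.related_terms:
        warnings.append("no synonyms or related terms listed")
    return warnings


bare = NlpRule(label="PII", description="PII")
marketing = NlpRule(
    label="Marketing",
    description="Content written to promote products or engage customers",
    related_terms=["content for promotion", "advertising text", "campaign message"],
)
```

Under these assumed thresholds, lint_rule(bare) returns three warnings and lint_rule(marketing) returns none. A check like this only flags phrasing problems; it does not predict how the rule will classify real prompts.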

3. Methodology for Rule Configuration

A. Set Clear Objectives for Each Classification Rule

Identify the primary goal of each rule before writing it. Is it intended to secure data, support compliance, or facilitate content tracking? Defining this will help in phrasing the rule and selecting keywords or terms.
Example: If your objective is to classify content containing financial data, use phrases like “financial figures” or “revenue data,” instead of just “numbers.”

B. Test Rules in a Controlled Environment

Once configured, test rules with a variety of prompts and responses to see how accurately the classification rule performs. Adjust rule phrasing if necessary to capture missed cases or avoid misclassifications.
Example: Use a test set that includes variations of PII-containing prompts, along with prompts that contain no PII, to check that the data classification rule catches each instance without flagging unrelated content. Adjust if the rule is too narrow or too broad.
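One way to make this testing step repeatable is to score a rule against a small labeled set. The Python sketch below assumes the rule can be invoked as a function that returns True or False for a piece of text; toy_classifier is a stand-in written for this example and does not represent how BusinessGPT evaluates NLP rules.

```python
from typing import Callable, Dict, Iterable, Tuple


def evaluate_rule(
    classify: Callable[[str], bool],
    labeled_samples: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Compare rule output with expected labels and summarize the errors."""
    tp = fp = fn = 0
    for text, expected in labeled_samples:
        predicted = classify(text)
        if predicted and expected:
            tp += 1          # correctly labeled
        elif predicted and not expected:
            fp += 1          # over-classification (rule too broad)
        elif expected and not predicted:
            fn += 1          # under-classification (rule too narrow)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positives": fp,
        "missed": fn,
    }


def toy_classifier(text: str) -> bool:
    """Stand-in for a real rule: matches on two keywords only."""
    lowered = text.lower()
    return "ssn" in lowered or "social security" in lowered


# Each sample pairs a prompt with whether it should receive the "PII" label.
SAMPLES = [
    ("My social security number is 123-45-6789", True),
    ("Please store Jane Doe's home address: 12 Elm St", True),
    ("Summarize the Q3 roadmap", False),
    ("What does SSN stand for?", False),
]

report = evaluate_rule(toy_classifier, SAMPLES)
```

With this toy rule, the report shows one missed prompt (the home address) and one false positive (the question about the acronym), which signals that the rule is both too narrow and too broad in different ways.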

C. Adjust for Ambiguity with Gradual Tweaking

Start with slightly broader rules and adjust gradually based on observed accuracy. Narrow down the rule scope only if it consistently over-classifies unrelated prompts.
Example: If a rule categorizes too many prompts as “Marketing,” refine by adding a condition like “only for external promotion” to better focus on intended use.

4. Using NLP and Regex Together for Enhanced Precision

In scenarios where highly accurate classification is essential, consider combining NLP with Regex rules. NLP rules can capture broader language patterns, while Regex can pinpoint specific terms or structures.
Example: Use NLP to identify general references to marketing (e.g., “promotional content”), and Regex to identify exact phrases like “marketing campaign” or “ad copy.”
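The marketing example above can be sketched in Python as follows. The Regex uses the two exact phrases from the example; the nlp_match parameter is a placeholder for whatever NLP rule is configured, and promo_stub is a hypothetical stand-in used only so the sketch runs. Treating a match from either method as sufficient is an assumption of this sketch, not a documented BusinessGPT behavior.

```python
import re
from typing import Callable, Optional

# Exact phrases from the example, matched case-insensitively on word boundaries.
MARKETING_PHRASES = re.compile(r"\b(marketing campaign|ad copy)\b", re.IGNORECASE)


def classify_marketing(text: str, nlp_match: Callable[[str], bool]) -> Optional[str]:
    """Return which method matched ("regex" or "nlp"), or None if neither did."""
    # Regex first: precise and cheap for known phrases.
    if MARKETING_PHRASES.search(text):
        return "regex"
    # NLP second: broader coverage for phrasing the Regex does not list.
    if nlp_match(text):
        return "nlp"
    return None


def promo_stub(text: str) -> bool:
    """Hypothetical stand-in for an NLP rule about promotional content."""
    return "promot" in text.lower()


first = classify_marketing("Draft ad copy for the spring launch", promo_stub)
second = classify_marketing("Write promotional content for our newsletter", promo_stub)
third = classify_marketing("Fix this SQL query", promo_stub)
```

Here first is "regex", second is "nlp", and third is None. Recording which method produced the label makes it easier to see, during testing, whether misclassifications come from the Regex pattern or from the NLP rule phrasing.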

5. Best Practice Summary Checklist

Step	Description
Define Objective	Clarify the goal of each classification rule.
Start with Clear Phrasing	Write rules as full sentences for clarity and context.
Balance Keywords & Context	Include keywords but provide context to refine classification.
Test & Refine	Continuously test and adjust rules to avoid over- or under-classification.
Combine NLP & Regex	Use both methods where high accuracy is essential: NLP for broader language patterns, Regex for exact terms or structures.

Conclusion:
By following these best practices, customers can configure NLP-based classification rules in BusinessGPT effectively, ensuring they capture relevant data without overreach. The objective is to achieve a balance between precision and inclusivity so that each rule serves its purpose accurately, helping organizations to leverage generative AI insights while protecting and categorizing data responsibly.

For any further assistance, please consult the BusinessGPT support team.

BusinessGPT AI Governance & Security