What is a Chat?
A Chat (also called a Chat Session) is a conversational thread that:

- Maintains Context: Keeps track of the entire conversation history
- Supports Multi-turn Dialogue: Allows back-and-forth exchanges between users and AI
- Integrates Knowledge: Can leverage knowledge bases for enhanced responses
- Handles Media: Supports file uploads including documents, images, audio, and video
- Streams Responses: Provides real-time token-by-token streaming for immediate feedback
- Tracks Usage: Monitors token consumption and billing information
Chat Architecture
Core Components
1. Chat Sessions
A chat session is the container for an entire conversation.

Key Features:
- Unique Identifier: Each session has a UUID for tracking
- Title Management: Auto-generates meaningful titles from conversation content
- Status Tracking: Active, Archived, or Deleted states
- Metadata Storage: Flexible JSON storage for custom data
- Settings Persistence: Saves user preferences for temperature, max tokens, etc.
2. Messages
Messages are the individual exchanges within a chat session.

Message Types:
- USER: Messages sent by the human user
- MODEL: Responses generated by LLM models
- AGENT: Responses from AI agents
Each message stores:
- Content (text/media)
- Role (USER/MODEL/AGENT)
- Parent message ID (for threading)
- Model or Agent ID
- Prompt/Instruction ID
- Metadata (knowledge base IDs, file references)
- Timestamps
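The message fields above can be sketched as a simple record. This is a hypothetical illustration of the shape, not the actual schema; field names here are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional
import time
import uuid

# Hypothetical message record mirroring the fields listed above;
# the real schema and field names may differ.
@dataclass
class Message:
    content: str
    role: str                                     # "USER", "MODEL", or "AGENT"
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    parent_message_id: Optional[str] = None       # links replies for threading
    model_or_agent_id: Optional[str] = None
    prompt_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)  # KB IDs, file references
    created_at: float = field(default_factory=time.time)

question = Message(content="Summarize this PDF", role="USER")
reply = Message(content="The document covers...", role="MODEL",
                parent_message_id=question.id)
```

Threading works by pointing each reply's `parent_message_id` at the message it answers, so a client can reconstruct the conversation tree.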
3. File Uploads
The chat system supports rich media attachments.

Supported File Types:
- Documents: PDF, TXT, DOCX, XLSX, CSV
- Images: JPG, PNG, GIF, WebP
- Audio: MP3, WAV, M4A, OGG
- Video: MP4, WebM, MOV
File Handling:
- Files are uploaded to GCP Storage
- Presigned URLs generated for secure access
- Content extraction for text-based files
- Image/video processing for multimodal models
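Before uploading, a client would typically validate the file against the supported types. A minimal sketch, assuming the extension lists above and a 50 MB limit (the actual size limit may differ):

```python
# Supported extensions taken from the list above; MAX_BYTES is an
# assumed limit for illustration, not the documented one.
SUPPORTED = {
    "document": {".pdf", ".txt", ".docx", ".xlsx", ".csv"},
    "image":    {".jpg", ".png", ".gif", ".webp"},
    "audio":    {".mp3", ".wav", ".m4a", ".ogg"},
    "video":    {".mp4", ".webm", ".mov"},
}
MAX_BYTES = 50 * 1024 * 1024

def classify_upload(filename: str, size_bytes: int) -> str:
    ext = "." + filename.rsplit(".", 1)[-1].lower()
    if size_bytes > MAX_BYTES:
        raise ValueError("file too large")
    for kind, extensions in SUPPORTED.items():
        if ext in extensions:
            return kind                 # e.g. "document", "image", ...
    raise ValueError(f"unsupported file type: {ext}")
```

Validating on the client avoids a round trip to storage for files that would be rejected anyway.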
4. Chat Settings
Per-session LLM parameters that override defaults.

Available Settings:
- temperature: Controls randomness (0.0 - 1.0)
- max_tokens: Maximum response length
- top_p: Nucleus sampling threshold

Settings Priority:
- Request-level parameters (highest priority)
- Saved session settings
- Model defaults (lowest priority)
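The precedence above can be sketched as a simple merge, where later updates win. This is a minimal illustration of the priority order, not the actual resolution code:

```python
# Merge settings by priority: model defaults < session settings < request.
def resolve_settings(request: dict, session: dict, model_defaults: dict) -> dict:
    merged = dict(model_defaults)                                  # lowest
    merged.update({k: v for k, v in session.items() if v is not None})
    merged.update({k: v for k, v in request.items() if v is not None})
    return merged                                                  # highest wins

settings = resolve_settings(
    request={"temperature": 0.2},
    session={"temperature": 0.7, "max_tokens": 1024},
    model_defaults={"temperature": 0.5, "max_tokens": 2048, "top_p": 1.0},
)
```

Here the request's `temperature` overrides both the saved session value and the model default, while `max_tokens` falls through to the session setting and `top_p` to the model default.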
Chat Features
Real-time Streaming
Responses stream token-by-token for immediate user feedback.

Stream Format:
- Text Tokens: Regular chat responses
- Reasoning Steps: For models with reasoning capabilities
- Image Generation: Progress updates and final URLs
- Video Generation: Progress updates and final URLs
- Error Messages: Graceful error handling
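A client consuming this stream dispatches on the event type. The sketch below uses illustrative event names ("token", "reasoning", "error"), not the actual wire format:

```python
# Consume a typed event stream and assemble the text response.
def consume(stream) -> str:
    parts = []
    for event in stream:
        kind = event["type"]
        if kind == "token":
            parts.append(event["value"])         # append text as it arrives
        elif kind == "error":
            raise RuntimeError(event["value"])   # surface errors gracefully
        # reasoning steps and image/video progress would be handled here too
    return "".join(parts)

fake_stream = [
    {"type": "token", "value": "Hel"},
    {"type": "reasoning", "value": "..."},
    {"type": "token", "value": "lo"},
]
```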
Knowledge Base Integration
Chats can leverage knowledge bases for enhanced responses.

How It Works:
- User specifies knowledge base IDs in the message
- System performs semantic search on user query
- Top relevant chunks retrieved (configurable limit)
- Context injected into the prompt
- LLM generates response using both its knowledge and KB context
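The context-injection step can be sketched as below. The prompt template is an assumption for illustration; the real template is internal to the system:

```python
# Inject the top-k retrieved chunks as context ahead of the user query.
def build_prompt(query: str, chunks: list, limit: int = 3) -> str:
    context = "\n---\n".join(chunks[:limit])     # configurable retrieval limit
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt(
    "What is the refund policy?",
    ["Refunds are accepted within 30 days...", "Shipping takes 3-5 days..."],
)
```

The LLM then answers using both the injected chunks and its own knowledge, which is why limiting retrieval to the most relevant chunks reduces noise.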
Audio Transcription
Convert speech to text for voice-based interactions.

Transcription Features:
- Multiple Formats: MP3, WAV, M4A, OGG
- Language Support: Multi-language transcription
- Streaming Support: Process audio files of any size
- High Accuracy: Powered by advanced speech recognition
Prompt Generation
AI-powered prompt enhancement and generation.

Prompt Types:
- Creative: Generate creative writing prompts
- Task: Create task-oriented prompts
- Question: Generate insightful questions
- Continuation: Suggest conversation continuations
Multi-Modal Support
Advanced models can handle various content types.

Capabilities:
- Text Generation: Standard chat responses
- Image Generation: Create images from text descriptions
- Video Generation: Generate short video clips
- Vision: Analyze uploaded images
- Document Understanding: Extract and analyze document content
Billing and Usage Tracking
Credit System
Every chat interaction is tracked and billed.

Token Calculation:
- Input tokens
- Output tokens
- Total tokens (weighted)
- Cached tokens (if applicable)
- Pricing ratios
- Message IDs
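A hypothetical credit formula built from the factors above; the actual pricing ratios are set per model, and the numbers below are placeholders:

```python
# Weighted token cost: output tokens typically cost more than input,
# and cached input tokens cost less. Ratios here are assumptions.
def chat_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0,
              input_ratio: float = 1.0, output_ratio: float = 3.0,
              cached_ratio: float = 0.25) -> float:
    fresh_input = (input_tokens - cached_tokens) * input_ratio
    cached = cached_tokens * cached_ratio        # cached tokens billed cheaper
    return fresh_input + cached + output_tokens * output_ratio
```

For example, 1000 input tokens of which 400 were cached, plus 200 output tokens, would bill 600 + 100 + 600 = 1300 credits under these placeholder ratios.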
Usage Metadata
Chat sessions track cumulative usage.
Chat Operations
Creating a Chat Session
Auto-Created Sessions:
- If no chat_id provided in send_message
- System creates session automatically
- Title set to "New Chat"
- Status set to ACTIVE
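The auto-creation rule above can be sketched as follows, using a plain dict as stand-in storage (the real persistence layer is assumed away here):

```python
import uuid

# If no chat_id is supplied (or it is unknown), create a fresh session
# titled "New Chat" in ACTIVE status; otherwise return the existing one.
def ensure_session(chat_id, sessions: dict) -> dict:
    if chat_id is not None and chat_id in sessions:
        return sessions[chat_id]
    new_id = str(uuid.uuid4())
    sessions[new_id] = {"id": new_id, "title": "New Chat", "status": "ACTIVE"}
    return sessions[new_id]

store = {}
session = ensure_session(None, store)   # no chat_id -> session auto-created
```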
Updating Chat Sessions
Updatable Fields:
- Title (manual or auto-generated)
- Status (ACTIVE, ARCHIVED, DELETED)
- Settings (temperature, max_tokens, top_p)
Deleting Chat Sessions
Single Delete:
- Soft delete (status changes to DELETED)
- Cascade deletes messages
- Removes file upload links
Batch Delete:
- Delete multiple sessions in one request
- Validates ownership
- Returns count of deleted sessions
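A minimal sketch of batch soft deletion with ownership validation, assuming each session record carries an `owner_id` field (an assumption for illustration):

```python
# Soft-delete the given session IDs owned by owner_id; skip sessions
# that are missing, already deleted, or owned by someone else.
def soft_delete(sessions: dict, owner_id: str, ids: list) -> int:
    deleted = 0
    for sid in ids:
        s = sessions.get(sid)
        if s and s["owner_id"] == owner_id and s["status"] != "DELETED":
            s["status"] = "DELETED"     # status change, not physical removal
            deleted += 1
    return deleted

store = {
    "a": {"owner_id": "u1", "status": "ACTIVE"},
    "b": {"owner_id": "u2", "status": "ACTIVE"},
}
count = soft_delete(store, "u1", ["a", "b", "missing"])
```

Only the caller's own sessions are affected, and the returned count reflects sessions actually transitioned to DELETED.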
Advanced Features
Agent Integration
Chats can interact with deployed AI agents.

Agent Requirements:
- Agent must be active
- Agent must be deployed
- User must have access to agent
Instruction/Prompt System
Use pre-defined prompts to guide AI behavior.

Prompt Components:
- ID: Unique identifier
- Title: Display name
- Description: Purpose description
- Content: Actual prompt text
WebSocket Updates
Real-time updates for chat state changes.

Events Broadcasted:
- Chat title updates (after auto-generation)
- New message notifications
- Status changes
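A client might route these broadcasts through a small handler registry. The event names below are assumptions for illustration, not the actual protocol identifiers:

```python
# Tiny dispatcher: register a handler per event type, then route
# incoming events to it.
def make_dispatcher():
    handlers = {}
    def on(event_type, fn):
        handlers[event_type] = fn
    def dispatch(event):
        handler = handlers.get(event["type"])
        return handler(event["payload"]) if handler else None
    return on, dispatch

on, dispatch = make_dispatcher()
titles = []
on("chat.title_updated", titles.append)     # react to auto-generated titles
dispatch({"type": "chat.title_updated", "payload": "Trip Planning"})
```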
Firebase Integration
Chat updates sync to Firebase Realtime Database.
Best Practices
Chat Management
- Session Organization: Use meaningful titles and archive old chats
- Settings Optimization: Adjust temperature and max_tokens per use case
- Knowledge Base Selection: Only include relevant KBs to reduce noise
- File Management: Clean up unused uploads regularly
Performance Optimization
- Streaming: Always use streaming for better UX
- Context Window: Monitor message count and summarize long conversations
- Token Limits: Set appropriate max_tokens to control costs
- Caching: Leverage cached tokens when available
Error Handling
- Billing Failures: Handle insufficient credits gracefully
- LLM Errors: Display user-friendly error messages
- File Upload Limits: Validate file size and type before upload
- Timeout Handling: Set appropriate timeouts for long operations
Security Considerations
- Access Control: Use RBAC to control chat access
- File Validation: Validate file types and scan for malware
- Content Filtering: Implement content moderation as needed
- Data Privacy: Handle sensitive conversations appropriately
Common Use Cases
Customer Support Bot
Configuration:
- Temperature: 0.3 (more deterministic)
- Knowledge Bases: FAQ, Product Docs, Support Articles
- Prompts: Customer service instructions
Code Assistant
Features:
- File upload for code review
- Multi-turn debugging conversations
- Code generation with context
- Documentation lookup via KB
Configuration:
- Temperature: 0.7 (balanced creativity)
- Models: GPT-4, Claude-3.5-Sonnet
- File types: .py, .js, .java, etc.
Research Assistant
Features:
- Document upload and analysis
- Knowledge base integration
- Citation tracking
- Long-form responses
Configuration:
- Temperature: 0.5 (factual)
- Max tokens: 4096 (long responses)
- Knowledge Bases: Research papers, articles
Creative Writing
Features:
- Prompt generation
- Story continuation
- Character development
- Style adaptation
Configuration:
- Temperature: 0.9 (highly creative)
- Prompt type: "creative"
- Models: GPT-4, Claude-3-Opus
Performance Metrics
Quality Metrics
- Response Relevance: Accuracy of AI responses
- Context Retention: How well context is maintained
- Knowledge Integration: Effectiveness of KB usage
- Error Rate: Frequency of failed interactions
Efficiency Metrics
- Response Time: Time to first token
- Streaming Speed: Tokens per second
- Token Usage: Input/output token counts
- Cost per Chat: Average credits consumed
Usage Metrics
- Active Sessions: Number of ongoing chats
- Message Volume: Total messages sent
- File Uploads: Number and size of uploads
- Knowledge Base Hits: KB search frequency
Troubleshooting
Common Issues
Slow Responses:
- Check LLM provider status
- Reduce max_tokens
- Optimize knowledge base searches
- Monitor network latency
Billing Issues:
- Verify sufficient credits
- Check transaction logs
- Review usage metadata
- Contact support for discrepancies
Context Loss:
- Keep conversations under the context limit
- Implement conversation summarization
- Use parent_message_id correctly
- Check message ordering
File Upload Failures:
- Validate file size limits
- Check GCP storage configuration
- Verify content type
- Review network connectivity
Next Steps
Explore related concepts and start building:

- Knowledge Base - Enhance chats with domain knowledge
- AI Agents - Build autonomous conversational agents
- LLM Models - Choose the right model for your use case
- Tools - Extend chat capabilities with custom functions