Agentic AI for Camfrog Video Chat — powered by Claude Opus
Real-time speech-to-text via Whisper Large-V3 on GPU, with emotion detection (angry, happy, sad, neutral) using wav2vec2. Every mic session is transcribed, tagged, and searchable.
Three-tier memory system with confidence scoring. Facts earn their way from short-term to long-term through corroboration. Learns user profiles, room lore, aliases, and relationships over time.
Spin the wheel, dice duels, trivia with PAT rewards, raffles, and tipping. Full token economy redeemable for prizes on PublicAccess.TV.
Image generation via Gemini, video creation with Veo 3.1, professional data charts with Chart.js, and live polls with real-time web dashboards.
Brave Search integration for real-time web queries. DOJ/Epstein document search. Ticker lookups. Weather, news, and fact-checking on demand.
Proactive conversation participant. Responds to interesting topics, mic ups, and direct address. Roasts users with personalized context from memory and recent chat.
Remote monitoring via Telegram. Live alerts on keyword mentions, mod events, and room activity. Full command access from your phone.
Automated moderation rules: auto-unblock, auto-unpunish, auto-unban, auto-friend, and auto-moderator. Persistent rules that survive restarts.
Mic stats, chat logs, mod action history, spin/wheel analytics, XP tracking, and leaderboards. All queryable via natural language.
3-tier model architecture (primary/secondary/background) with per-function overrides. Intelligent routing classifies each request by complexity and dynamically selects the optimal model based on budget constraints and adaptive user feedback. Smart mode uses local LLM classification; Fast mode uses instant rules. Auto-downgrades when budget runs low, auto-upgrades when quality drops.
Per-function latency constraints ensure time-sensitive operations like chatty mode never stall on slow models. Each model carries a priority score (Opus 100, GPT-4o 90, Sonnet 80, Gemma4:31b 65, Haiku 60, etc.) and an estimated response time. When a routed model exceeds the function's max latency, the system falls back to the highest-priority model that fits both the latency window and the current cost tier — so budget constraints are never violated by a speed override.
Unified model registry supporting Anthropic (Opus/Sonnet/Haiku), Google (Gemini Pro/Flash), OpenAI (GPT-4o), Kimi, and multiple local Ollama models simultaneously. API keys, routing tables, and per-model context limits all managed from a local admin dashboard.
Priority-based context budgeting that adapts to each model's context window (4K to 200K tokens). Parses prompts into ranked sections — identity, live data, memory, chat history, web results — and trims lowest-priority content first when space is tight. Memory keeps high-confidence facts, chat history preserves recent messages, and the user's actual question is never truncated. Zero-LLM overhead: all trimming is rule-based.