PepeFrog

🎤

Live Transcription & Sentiment

Real-time speech-to-text via Whisper Large-V3 on GPU, with emotion detection (angry, happy, sad, neutral) using wav2vec2. Every mic session is transcribed, tagged, and searchable.

🧠

Adaptive Memory

Three-tier memory system with confidence scoring. Facts earn their way from short-term to long-term through corroboration. Learns user profiles, room lore, aliases, and relationships over time.
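
As a rough illustration of how corroboration could move a fact between tiers, here is a minimal sketch. The tier names, the starting confidence, the thresholds, and the `corroborate` helper are all assumptions made for this example; the description above only states that there are three tiers and that confidence drives promotion.

```python
from dataclasses import dataclass

# Hypothetical tier names and promotion thresholds (not taken from the bot itself).
TIERS = ("short_term", "mid_term", "long_term")
PROMOTE_AT = {"short_term": 0.5, "mid_term": 0.8}


@dataclass
class Fact:
    text: str
    confidence: float = 0.3  # assumed starting confidence for a new fact
    tier: str = "short_term"


def corroborate(fact: Fact, weight: float = 0.5) -> Fact:
    """Move confidence part of the way toward 1.0, then promote if a threshold is crossed."""
    fact.confidence += (1.0 - fact.confidence) * weight
    while fact.tier in PROMOTE_AT and fact.confidence >= PROMOTE_AT[fact.tier]:
        fact.tier = TIERS[TIERS.index(fact.tier) + 1]
    return fact
```

With these assumed numbers, a new fact needs two independent corroborations to reach the last tier: the first lifts it to 0.65 (middle tier), the second to 0.825 (long-term).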

🎲

Games & Economy

Spin the wheel, dice duels, trivia with PAT rewards, raffles, and tipping. A full token economy, with tokens redeemable for prizes on PublicAccess.TV.

🎨

AI Generation

Image generation via Gemini, video creation with Veo 3.1, professional data charts with Chart.js, and live polls with real-time web dashboards.

🔎

Web Research & Data

Brave Search integration for real-time web queries. DOJ/Epstein document search. Ticker lookups. Weather, news, and fact-checking on demand.

💬

Chatty Mode

Proactive conversation participant. Responds to interesting topics, mic ups, and direct address. Roasts users with personalized context from memory and recent chat.

📡

Telegram Bridge

Remote monitoring via Telegram. Live alerts on keyword mentions, mod events, and room activity. Full command access from your phone.

🛡

AutoMod

Automated moderation rules: auto-unblock, auto-unpunish, auto-unban, auto-friend, and auto-moderator. Persistent rules that survive restarts.

📊

Analytics & Logging

Mic stats, chat logs, mod action history, spin/wheel analytics, XP tracking, and leaderboards. All queryable via natural language.

🧠

Intelligent Model Router

Three-tier model architecture (primary/secondary/background) with per-function overrides. Intelligent routing classifies each request by complexity and selects a model based on the remaining budget and adaptive user feedback. Smart mode classifies with a local LLM; Fast mode uses instant rules. Auto-downgrades when budget runs low, auto-upgrades when quality drops.
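
The routing flow can be sketched roughly as follows. This covers only Fast mode and the budget downgrade; the classification rules, the `low_budget` threshold, and the function names are hypothetical, and the feedback-driven auto-upgrade is left out.

```python
# Tiers ordered from most to least capable, as named in the description above.
TIER_ORDER = ["primary", "secondary", "background"]


def classify_fast(prompt: str) -> str:
    """Fast-mode stand-in: instant rules instead of a local LLM classifier (rules are made up)."""
    if len(prompt) > 400 or "explain" in prompt.lower():
        return "primary"
    if prompt.strip().endswith("?"):
        return "secondary"
    return "background"


def route(prompt, function, budget_left, overrides=None, low_budget=0.2):
    """Pick a tier: a per-function override wins, otherwise classify the prompt.

    budget_left is the assumed fraction of budget remaining (0.0 to 1.0).
    """
    overrides = overrides or {}
    tier = overrides.get(function) or classify_fast(prompt)
    # Auto-downgrade one tier when the remaining budget is low.
    if budget_left < low_budget and tier != "background":
        tier = TIER_ORDER[TIER_ORDER.index(tier) + 1]
    return tier
```

In this sketch an override is still subject to the budget downgrade; whether the real router treats overrides that way is not stated above.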

⚡

Latency-Aware Routing

Per-function latency constraints ensure time-sensitive operations like chatty mode never stall on slow models. Each model carries a priority score (Opus 100, GPT-4o 90, Sonnet 80, Gemma4:31b 65, Haiku 60, etc.) and an estimated response time. When a routed model's estimated response time exceeds the function's max latency, the system falls back to the highest-priority model that fits both the latency window and the current cost tier, so a speed override never violates budget constraints.
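
A minimal sketch of that fallback rule, for illustration only. The priority scores are the ones listed above; the latency estimates, the cost tiers, and the `pick` function are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    priority: int         # priority scores as listed above
    est_latency_s: float  # assumed estimate, not a measured value
    cost_tier: int        # assumed scale: 1 = cheapest, 3 = most expensive


MODELS = [
    Model("Opus", 100, 12.0, 3),
    Model("GPT-4o", 90, 5.0, 3),
    Model("Sonnet", 80, 6.0, 2),
    Model("Gemma4:31b", 65, 8.0, 1),
    Model("Haiku", 60, 2.0, 1),
]


def pick(routed: Model, max_latency_s: float, max_cost_tier: int) -> Optional[Model]:
    """Keep the routed model if it is fast enough, else fall back within the cost tier."""
    if routed.est_latency_s <= max_latency_s:
        return routed  # assumes the router already respected the cost tier
    candidates = [
        m for m in MODELS
        if m.est_latency_s <= max_latency_s and m.cost_tier <= max_cost_tier
    ]
    # Highest priority among models that satisfy both constraints, or None if nothing fits.
    return max(candidates, key=lambda m: m.priority, default=None)
```

With these made-up numbers, a request routed to Opus under a 6-second limit and a cost ceiling of tier 2 falls back to Sonnet: GPT-4o is fast enough but sits in a higher cost tier, so it is skipped.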

🌐

Multi-Provider LLM Engine

Unified model registry supporting Anthropic (Opus/Sonnet/Haiku), Google (Gemini Pro/Flash), OpenAI (GPT-4o), Kimi, and multiple local Ollama models simultaneously. API keys, routing tables, and per-model context limits all managed from a local admin dashboard.

📦

Dynamic Context Manager

Priority-based context budgeting that adapts to each model's context window (4K to 200K tokens). Parses prompts into ranked sections (identity, live data, memory, chat history, web results) and trims lowest-priority content first when space is tight. Memory keeps high-confidence facts, chat history preserves recent messages, and the user's actual question is never truncated. Zero LLM overhead: all trimming is rule-based.
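
The trimming idea can be sketched in a few lines. This is a simplified illustration: tokens are approximated by whitespace-separated words, and the section tuple layout, the `keep_tail` flag, and `fit_context` are hypothetical names chosen for the example.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_context(question, sections, budget):
    """Fit ranked sections into a token budget without ever trimming the question.

    sections: list of (name, priority, text, keep_tail) tuples in prompt order.
    Higher priority is filled first; keep_tail=True keeps the end of the text
    (for example, the most recent chat messages) instead of the beginning.
    """
    remaining = budget - count_tokens(question)  # the question is reserved in full
    kept = {}
    for name, _priority, text, keep_tail in sorted(sections, key=lambda s: s[1], reverse=True):
        words = text.split()
        take = max(0, min(len(words), remaining))
        chosen = words[len(words) - take:] if keep_tail else words[:take]
        kept[name] = " ".join(chosen)
        remaining -= take
    # Reassemble in the original prompt order, dropping sections trimmed to nothing.
    parts = [kept[name] for name, *_ in sections if kept[name]]
    return "\n".join(parts + [question])
```

Everything here is plain string handling, which matches the rule-based claim above: no model call is needed to decide what gets cut.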

Capabilities