Agent Monitoring and Diagnostics ✅

Kastrax provides comprehensive monitoring and diagnostic capabilities for AI agents, allowing you to track performance, identify issues, and optimize behavior. This guide explains how to use these features effectively.

Monitoring Overview ✅

Agent monitoring in Kastrax enables you to:

Track agent performance metrics
Monitor resource usage
Analyze agent behavior patterns
Identify and diagnose issues
Generate reports and visualizations
Set up alerts for anomalies

Basic Monitoring Setup ✅

Here’s how to set up basic monitoring for a Kastrax agent:


import ai.kastrax.core.agent.agent
import ai.kastrax.core.agent.monitoring.AgentMonitor
import ai.kastrax.core.agent.monitoring.MonitoringConfig
import ai.kastrax.integrations.deepseek.deepSeek
import ai.kastrax.integrations.deepseek.DeepSeekModel
import kotlinx.coroutines.runBlocking
 
fun main() = runBlocking {
    // Create a monitor with default configuration
    val monitor = AgentMonitor.create(
        MonitoringConfig(
            enabled = true,
            collectPerformanceMetrics = true,
            collectUsageMetrics = true,
            collectBehaviorMetrics = true,
            samplingRate = 1.0 // Monitor all interactions
        )
    )
    
    // Create an agent with monitoring
    val myAgent = agent {
        name("MonitoredAgent")
        description("An agent with monitoring capabilities")
        
        model = deepSeek {
            apiKey("your-deepseek-api-key")
            model(DeepSeekModel.DEEPSEEK_CHAT)
            temperature(0.7)
        }
        
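        // Enable monitoring; "this." targets the builder's property, since the bare
        // name would resolve to the local val on both sides
        this.monitor = monitor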
    }
    
    // Use the agent
    repeat(5) { i ->
        val response = myAgent.generate("Tell me something interesting about ${i + 1}")
        println("Response ${i + 1}: ${response.text}")
    }
    
    // Get monitoring data
    val metrics = monitor.getMetrics(myAgent.name)
    println("Collected metrics: $metrics")
}
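
The `samplingRate` setting implies that only a fraction of interactions is recorded when the value is below 1.0. How Kastrax chooses which interactions to record is not described here; the following is a minimal sketch of one common approach (independent random sampling), using a hypothetical `InteractionSampler` class that is not part of the Kastrax API.

```kotlin
import kotlin.random.Random

// Hypothetical helper, not part of Kastrax: decides per interaction whether to record it.
class InteractionSampler(
    private val samplingRate: Double,
    private val random: Random = Random.Default
) {
    init {
        require(samplingRate in 0.0..1.0) { "samplingRate must be between 0.0 and 1.0" }
    }

    // nextDouble() returns a value in [0, 1), so a rate of 1.0 always samples
    // and a rate of 0.0 never does.
    fun shouldSample(): Boolean = random.nextDouble() < samplingRate
}

fun main() {
    val sampler = InteractionSampler(samplingRate = 0.25, random = Random(42))
    val sampled = (1..10_000).count { sampler.shouldSample() }
    println("Sampled $sampled of 10000 interactions")
}
```

With a rate of 0.25, roughly a quarter of interactions would be recorded, which reduces monitoring overhead at the cost of less precise metrics.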

Performance Metrics ✅

Kastrax collects various performance metrics for agents:

Response Time Metrics


// Get response time metrics
val responseTimeMetrics = monitor.getResponseTimeMetrics(myAgent.name)
println("Average response time: ${responseTimeMetrics.average} ms")
println("Median response time: ${responseTimeMetrics.median} ms")
println("95th percentile response time: ${responseTimeMetrics.percentile95} ms")
println("Maximum response time: ${responseTimeMetrics.max} ms")
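
To make the four statistics above concrete, here is a small self-contained sketch that derives them from raw durations. `ResponseTimeSummary`, `percentile`, and `summarize` are hypothetical names for illustration, and the nearest-rank percentile method is an assumption; Kastrax may compute percentiles differently.

```kotlin
import kotlin.math.ceil

// Hypothetical container mirroring the fields printed above; not the Kastrax type.
data class ResponseTimeSummary(
    val average: Double,
    val median: Double,
    val percentile95: Long,
    val max: Long
)

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
fun percentile(sorted: List<Long>, p: Double): Long {
    require(sorted.isNotEmpty()) { "at least one sample is required" }
    require(p > 0.0 && p <= 100.0) { "p must be in (0, 100]" }
    val rank = ceil(p / 100.0 * sorted.size).toInt().coerceAtLeast(1)
    return sorted[rank - 1]
}

fun summarize(durationsMs: List<Long>): ResponseTimeSummary {
    require(durationsMs.isNotEmpty()) { "at least one sample is required" }
    val sorted = durationsMs.sorted()
    val mid = sorted.size / 2
    val median = if (sorted.size % 2 == 1) {
        sorted[mid].toDouble()
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    }
    return ResponseTimeSummary(sorted.average(), median, percentile(sorted, 95.0), sorted.last())
}

fun main() {
    val durations = listOf(120L, 95L, 300L, 110L, 2500L, 130L, 105L, 98L, 140L, 115L)
    println(summarize(durations))
}
```

In this sample the single 2500 ms outlier pulls the average to about 371 ms while the median stays at 117.5 ms, which is why the median and the 95th percentile are usually read together with the average.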

Token Usage Metrics


// Get token usage metrics
val tokenUsageMetrics = monitor.getTokenUsageMetrics(myAgent.name)
println("Total input tokens: ${tokenUsageMetrics.totalInputTokens}")
println("Total output tokens: ${tokenUsageMetrics.totalOutputTokens}")
println("Average tokens per request: ${tokenUsageMetrics.averageTokensPerRequest}")
println("Token usage trend: ${tokenUsageMetrics.usageTrend}")

Error Rate Metrics


// Get error rate metrics
val errorMetrics = monitor.getErrorMetrics(myAgent.name)
println("Error rate: ${errorMetrics.errorRate * 100}%")
println("Common error types: ${errorMetrics.commonErrorTypes}")
println("Error trend: ${errorMetrics.errorTrend}")
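
The `errorRate` value is multiplied by 100 above, which suggests it is a fraction between 0 and 1. As a rough sketch of how such a fraction and a ranking of error types could be derived from per-request outcomes (the `RequestOutcome` type and both functions are hypothetical, not Kastrax classes):

```kotlin
// Hypothetical record of one request; errorType is null when the request succeeded.
data class RequestOutcome(val requestId: String, val errorType: String? = null)

// Fraction of failed requests in the range 0.0..1.0 (0.0 when there are no requests).
fun errorRate(outcomes: List<RequestOutcome>): Double =
    if (outcomes.isEmpty()) 0.0
    else outcomes.count { it.errorType != null }.toDouble() / outcomes.size

// Error types paired with their counts, most frequent first.
fun commonErrorTypes(outcomes: List<RequestOutcome>): List<Pair<String, Int>> =
    outcomes.mapNotNull { it.errorType }
        .groupingBy { it }
        .eachCount()
        .toList()
        .sortedByDescending { it.second }

fun main() {
    val outcomes = List(17) { RequestOutcome("ok-$it") } + listOf(
        RequestOutcome("e1", "Timeout"),
        RequestOutcome("e2", "RateLimit"),
        RequestOutcome("e3", "Timeout")
    )
    println("Error rate: ${errorRate(outcomes)}")
    println("Common error types: ${commonErrorTypes(outcomes)}")
}
```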

Resource Usage Monitoring ✅

You can monitor the resource usage of your agents:


// Get resource usage metrics
val resourceMetrics = monitor.getResourceMetrics(myAgent.name)
println("Memory usage: ${resourceMetrics.memoryUsage} MB")
println("CPU usage: ${resourceMetrics.cpuUsage}%")
println("Network usage: ${resourceMetrics.networkUsage} KB")
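
For comparison, the JVM itself exposes similar figures through standard APIs. The sketch below reads used heap memory and the system load average directly; it is an independent illustration and not necessarily how Kastrax obtains the values above. The function names are hypothetical.

```kotlin
import java.lang.management.ManagementFactory

// Heap currently in use by this JVM, in megabytes.
fun usedHeapMb(): Double {
    val runtime = Runtime.getRuntime()
    return (runtime.totalMemory() - runtime.freeMemory()) / (1024.0 * 1024.0)
}

// System load average over the last minute, or null on platforms that do not
// report it (the JVM returns a negative value in that case).
fun systemLoadAverage(): Double? =
    ManagementFactory.getOperatingSystemMXBean().systemLoadAverage.takeIf { it >= 0.0 }

fun main() {
    println("Used heap: %.1f MB".format(usedHeapMb()))
    println("Load average: ${systemLoadAverage() ?: "unavailable"}")
}
```

These are process-wide and system-wide numbers, so they cannot attribute usage to an individual agent when several agents share one JVM.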

Behavior Analysis ✅

Kastrax can analyze agent behavior patterns:


// Get behavior metrics
val behaviorMetrics = monitor.getBehaviorMetrics(myAgent.name)
println("Tool usage distribution: ${behaviorMetrics.toolUsageDistribution}")
println("Response length distribution: ${behaviorMetrics.responseLengthDistribution}")
println("Common topics: ${behaviorMetrics.commonTopics}")
println("Sentiment distribution: ${behaviorMetrics.sentimentDistribution}")

Custom Metrics ✅

You can define and collect custom metrics for your agents:


import ai.kastrax.core.agent.agent
import ai.kastrax.core.agent.monitoring.AgentMonitor
import ai.kastrax.core.agent.monitoring.CustomMetric
import ai.kastrax.core.agent.monitoring.MetricType
import ai.kastrax.integrations.deepseek.deepSeek
import ai.kastrax.integrations.deepseek.DeepSeekModel
import kotlinx.coroutines.runBlocking
 
fun main() = runBlocking {
    // Create a monitor with custom metrics
    val monitor = AgentMonitor.create()
    
    // Define custom metrics
    monitor.defineCustomMetric(
        CustomMetric(
            name = "user_satisfaction",
            description = "User satisfaction score",
            type = MetricType.GAUGE,
            unit = "score"
        )
    )
    
    monitor.defineCustomMetric(
        CustomMetric(
            name = "task_completion_rate",
            description = "Rate of successful task completions",
            type = MetricType.RATE,
            unit = "percent"
        )
    )
    
    // Create an agent with monitoring
    val myAgent = agent {
        name("CustomMetricsAgent")
        description("An agent with custom metrics")
        
        model = deepSeek {
            apiKey("your-deepseek-api-key")
            model(DeepSeekModel.DEEPSEEK_CHAT)
            temperature(0.7)
        }
        
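        // Enable monitoring; "this." targets the builder's property, since the bare
        // name would resolve to the local val on both sides
        this.monitor = monitor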
    }
    
    // Use the agent and record custom metrics
    val response = myAgent.generate("Help me solve this math problem: 2x + 5 = 15")
    println("Response: ${response.text}")
    
    // Record custom metrics
    monitor.recordCustomMetric(
        agentName = myAgent.name,
        metricName = "user_satisfaction",
        value = 4.5 // On a scale of 1-5
    )
    
    monitor.recordCustomMetric(
        agentName = myAgent.name,
        metricName = "task_completion_rate",
        value = 1.0 // 100% completion
    )
    
    // Get custom metrics
    val customMetrics = monitor.getCustomMetrics(myAgent.name)
    println("Custom metrics: $customMetrics")
}
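
The example records a single value per metric, so it does not show how repeated recordings are combined. The sketch below illustrates one plausible reading of the two metric types: a gauge reports the most recent value, while a completion rate is the mean of 1.0 (success) and 0.0 (failure) recordings. Both classes are hypothetical, and the actual semantics of `MetricType.GAUGE` and `MetricType.RATE` in Kastrax may differ.

```kotlin
// Hypothetical aggregators illustrating two common metric semantics; not Kastrax classes.

// A gauge keeps only the latest observation.
class Gauge {
    var current: Double? = null
        private set

    fun record(value: Double) {
        current = value
    }
}

// A rate built from 1.0/0.0 recordings is the running mean of all recordings.
class RunningAverage {
    private var sum = 0.0
    private var count = 0

    fun record(value: Double) {
        sum += value
        count++
    }

    val value: Double?
        get() = if (count == 0) null else sum / count
}

fun main() {
    val satisfaction = Gauge()
    satisfaction.record(4.5)
    satisfaction.record(3.0)
    println("user_satisfaction = ${satisfaction.current}")

    val completion = RunningAverage()
    listOf(1.0, 1.0, 0.0, 1.0).forEach(completion::record)
    println("task_completion_rate = ${completion.value}")
}
```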

Real-time Monitoring ✅

You can set up real-time monitoring with callbacks:


import ai.kastrax.core.agent.agent
import ai.kastrax.core.agent.monitoring.AgentMonitor
import ai.kastrax.core.agent.monitoring.MonitoringCallback
import ai.kastrax.integrations.deepseek.deepSeek
import ai.kastrax.integrations.deepseek.DeepSeekModel
import kotlinx.coroutines.runBlocking
 
fun main() = runBlocking {
    // Create a monitor with real-time callbacks
    val monitor = AgentMonitor.create()
    
    // Define monitoring callback
    monitor.addCallback(object : MonitoringCallback {
        override fun onRequestStart(agentName: String, requestId: String) {
            println("[$agentName] Request started: $requestId")
        }
        
        override fun onRequestComplete(agentName: String, requestId: String, durationMs: Long) {
            println("[$agentName] Request completed: $requestId (${durationMs}ms)")
        }
        
        override fun onError(agentName: String, requestId: String, error: Throwable) {
            println("[$agentName] Error in request $requestId: ${error.message}")
        }
        
        override fun onMetricUpdate(agentName: String, metricName: String, value: Double) {
            println("[$agentName] Metric update: $metricName = $value")
        }
    })
    
    // Create an agent with monitoring
    val myAgent = agent {
        name("RealTimeMonitoredAgent")
        description("An agent with real-time monitoring")
        
        model = deepSeek {
            apiKey("your-deepseek-api-key")
            model(DeepSeekModel.DEEPSEEK_CHAT)
            temperature(0.7)
        }
        
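        // Enable monitoring; "this." targets the builder's property, since the bare
        // name would resolve to the local val on both sides
        this.monitor = monitor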
    }
    
    // Use the agent
    val response = myAgent.generate("Tell me a joke")
    println("Response: ${response.text}")
}

Monitoring Dashboard ✅

Kastrax provides a monitoring dashboard for visualizing agent metrics:


import ai.kastrax.core.agent.monitoring.AgentMonitor
import ai.kastrax.core.agent.monitoring.dashboard.MonitoringDashboard
import kotlinx.coroutines.runBlocking
 
fun main() = runBlocking {
    // Get the monitor
    val monitor = AgentMonitor.getInstance()
    
    // Create a monitoring dashboard
    val dashboard = MonitoringDashboard.create(monitor)
    
    // Start the dashboard on a specific port
    dashboard.start(port = 8080)
    
    println("Monitoring dashboard started at http://localhost:8080")
    
    // Keep the application running
    readLine()
    
    // Stop the dashboard when done
    dashboard.stop()
}

Alerting ✅

You can set up alerts for specific conditions:


import ai.kastrax.core.agent.monitoring.AgentMonitor
import ai.kastrax.core.agent.monitoring.alert.AlertCondition
import ai.kastrax.core.agent.monitoring.alert.AlertSeverity
import ai.kastrax.core.agent.monitoring.alert.AlertChannel
import kotlinx.coroutines.runBlocking
 
fun main() = runBlocking {
    // Get the monitor
    val monitor = AgentMonitor.getInstance()
    
    // Configure email alert channel
    val emailChannel = AlertChannel.email(
        recipients = listOf("admin@example.com"),
        smtpConfig = mapOf(
            "host" to "smtp.example.com",
            "port" to "587",
            "username" to "alerts@example.com",
            "password" to "password" // placeholder; load real credentials from a secret store
        )
    )
    
    // Configure Slack alert channel
    val slackChannel = AlertChannel.slack(
        webhookUrl = "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX",
        channel = "#agent-alerts"
    )
    
    // Add alert conditions
    monitor.addAlertCondition(
        AlertCondition(
            name = "High Error Rate",
            metricName = "error_rate",
            threshold = 0.05, // 5% error rate
            comparison = AlertCondition.Comparison.GREATER_THAN,
            duration = 300, // 5 minutes
            severity = AlertSeverity.HIGH,
            channels = listOf(emailChannel, slackChannel)
        )
    )
    
    monitor.addAlertCondition(
        AlertCondition(
            name = "Slow Response Time",
            metricName = "response_time_p95",
            threshold = 5000.0, // 5 seconds
            comparison = AlertCondition.Comparison.GREATER_THAN,
            duration = 600, // 10 minutes
            severity = AlertSeverity.MEDIUM,
            channels = listOf(slackChannel)
        )
    )
    
    println("Alert conditions configured")
}
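
Each condition above combines a `threshold` with a `duration`, which suggests an alert fires only after the metric has stayed past the threshold for that many seconds. The following is a minimal sketch of that evaluation logic under this assumption; `SustainedThresholdAlert` is a hypothetical class, not Kastrax code.

```kotlin
// Hypothetical evaluator: reports true once a metric has stayed above `threshold`
// for at least `durationSeconds` without dropping back.
class SustainedThresholdAlert(
    private val threshold: Double,
    private val durationSeconds: Long
) {
    private var breachStartedAt: Long? = null

    // Feed one observation; returns true while the alert condition is satisfied.
    fun observe(timestampSeconds: Long, value: Double): Boolean {
        if (value <= threshold) {
            breachStartedAt = null // back to normal, so the clock resets
            return false
        }
        val start = breachStartedAt ?: timestampSeconds.also { breachStartedAt = it }
        return timestampSeconds - start >= durationSeconds
    }
}

fun main() {
    val alert = SustainedThresholdAlert(threshold = 0.05, durationSeconds = 300)
    val samples = listOf(
        0L to 0.08, 120L to 0.09, 240L to 0.02,
        300L to 0.07, 450L to 0.07, 600L to 0.10
    )
    for ((time, value) in samples) {
        println("t=${time}s value=$value firing=${alert.observe(time, value)}")
    }
}
```

In this run the dip at 240 s resets the clock, so the alert only fires at 600 s, after the error rate has been above 5% continuously since 300 s. Requiring a sustained breach is one way to avoid alerting on short spikes.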

Diagnostic Tools ✅

Kastrax provides diagnostic tools for troubleshooting agent issues:

Request Tracing


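import ai.kastrax.core.agent.agent
import ai.kastrax.core.agent.AgentGenerateOptions // assumed package; adjust to your Kastrax version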
import ai.kastrax.core.agent.monitoring.AgentMonitor
import ai.kastrax.core.agent.monitoring.diagnostics.RequestTracer
import ai.kastrax.integrations.deepseek.deepSeek
import ai.kastrax.integrations.deepseek.DeepSeekModel
import kotlinx.coroutines.runBlocking
 
fun main() = runBlocking {
    // Create a monitor with request tracing
    val monitor = AgentMonitor.create()
    val tracer = RequestTracer.create(monitor)
    
    // Create an agent with monitoring
    val myAgent = agent {
        name("DiagnosticAgent")
        description("An agent with diagnostic capabilities")
        
        model = deepSeek {
            apiKey("your-deepseek-api-key")
            model(DeepSeekModel.DEEPSEEK_CHAT)
            temperature(0.7)
        }
        
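        // Enable monitoring; "this." targets the builder's property, since the bare
        // name would resolve to the local val on both sides
        this.monitor = monitor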
    }
    
    // Start tracing a request
    val traceId = tracer.startTrace()
    
    // Use the agent with the trace ID
    val response = myAgent.generate(
        "Explain quantum computing",
        options = AgentGenerateOptions(
            metadata = mapOf("traceId" to traceId)
        )
    )
    
    // Get the trace
    val trace = tracer.getTrace(traceId)
    
    // Print trace details
    println("Trace ID: $traceId")
    println("Request duration: ${trace.duration} ms")
    println("Steps:")
    trace.steps.forEachIndexed { index, step ->
        println("  Step ${index + 1}: ${step.name} (${step.duration} ms)")
        println("    Input: ${step.input}")
        println("    Output: ${step.output}")
    }
}
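
A trace, as printed above, is essentially an ordered list of named steps with durations. As a rough illustration of that data model, the sketch below times arbitrary blocks of code and records them as steps. `SimpleTrace` and `TraceStep` are hypothetical stand-ins, not the Kastrax `RequestTracer` types.

```kotlin
// Hypothetical stand-ins for a trace and its steps; not Kastrax classes.
data class TraceStep(val name: String, val durationMs: Long)

class SimpleTrace {
    private val recorded = mutableListOf<TraceStep>()

    val steps: List<TraceStep> get() = recorded
    val durationMs: Long get() = recorded.sumOf { it.durationMs }

    // Runs the block, records how long it took, and returns its result.
    // The step is recorded even if the block throws.
    fun <T> step(name: String, block: () -> T): T {
        val start = System.nanoTime()
        try {
            return block()
        } finally {
            recorded += TraceStep(name, (System.nanoTime() - start) / 1_000_000)
        }
    }
}

fun main() {
    val trace = SimpleTrace()
    val prompt = trace.step("build-prompt") { "Explain quantum computing" }
    val answer = trace.step("call-model") {
        Thread.sleep(20) // stands in for a model call
        "stub answer for: $prompt"
    }
    println(answer)
    trace.steps.forEachIndexed { index, step ->
        println("Step ${index + 1}: ${step.name} (${step.durationMs} ms)")
    }
    println("Total: ${trace.durationMs} ms")
}
```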

Performance Profiling


import ai.kastrax.core.agent.agent
import ai.kastrax.core.agent.monitoring.AgentMonitor
import ai.kastrax.core.agent.monitoring.diagnostics.PerformanceProfiler
import ai.kastrax.integrations.deepseek.deepSeek
import ai.kastrax.integrations.deepseek.DeepSeekModel
import kotlinx.coroutines.runBlocking
 
fun main() = runBlocking {
    // Create a monitor
    val monitor = AgentMonitor.create()
    val profiler = PerformanceProfiler.create(monitor)
    
    // Create an agent with monitoring
    val myAgent = agent {
        name("ProfiledAgent")
        description("An agent with performance profiling")
        
        model = deepSeek {
            apiKey("your-deepseek-api-key")
            model(DeepSeekModel.DEEPSEEK_CHAT)
            temperature(0.7)
        }
        
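        // Enable monitoring; "this." targets the builder's property, since the bare
        // name would resolve to the local val on both sides
        this.monitor = monitor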
    }
    
    // Start profiling
    profiler.start(myAgent.name)
    
    // Use the agent
    repeat(10) { i ->
        val response = myAgent.generate("Tell me about topic ${i + 1}")
        println("Response ${i + 1}: ${response.text.take(50)}...")
    }
    
    // Stop profiling and get results
    val profile = profiler.stop(myAgent.name)
    
    // Print profile results
    println("Performance Profile:")
    println("Total duration: ${profile.totalDuration} ms")
    println("Average response time: ${profile.averageResponseTime} ms")
    println("Token processing rate: ${profile.tokenProcessingRate} tokens/second")
    println("Hotspots:")
    profile.hotspots.forEach { (component, time) ->
        println("  $component: ${time} ms (${(time / profile.totalDuration.toDouble() * 100).toInt()}%)")
    }
}

Log Analysis


import ai.kastrax.core.agent.monitoring.AgentMonitor
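import ai.kastrax.core.agent.monitoring.diagnostics.LogAnalyzer
import ai.kastrax.core.agent.monitoring.TimeRange // assumed package; adjust to your Kastrax version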
import kotlinx.coroutines.runBlocking
 
fun main() = runBlocking {
    // Create a monitor
    val monitor = AgentMonitor.getInstance()
    val logAnalyzer = LogAnalyzer.create(monitor)
    
    // Analyze logs for a specific agent
    val analysis = logAnalyzer.analyzeAgentLogs("MyAgent", timeRange = TimeRange.last(hours = 24))
    
    // Print analysis results
    println("Log Analysis:")
    println("Total requests: ${analysis.totalRequests}")
    println("Error rate: ${analysis.errorRate * 100}%")
    println("Common error patterns:")
    analysis.errorPatterns.forEach { (pattern, count) ->
        println("  $pattern: $count occurrences")
    }
    println("Performance anomalies:")
    analysis.performanceAnomalies.forEach { anomaly ->
        println("  ${anomaly.timestamp}: ${anomaly.description} (${anomaly.severity})")
    }
}

Exporting Monitoring Data ✅

You can export monitoring data for further analysis:


import ai.kastrax.core.agent.monitoring.AgentMonitor
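import ai.kastrax.core.agent.monitoring.export.MetricsExporter
import ai.kastrax.core.agent.monitoring.TimeRange // assumed package; adjust to your Kastrax version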
import kotlinx.coroutines.runBlocking
import java.io.File
 
fun main() = runBlocking {
    // Get the monitor
    val monitor = AgentMonitor.getInstance()
    
    // Create exporters
    val csvExporter = MetricsExporter.csv()
    val jsonExporter = MetricsExporter.json()
    val prometheusExporter = MetricsExporter.prometheus()
    
    // Export metrics for a specific agent
    val agentName = "MyAgent"
    val timeRange = TimeRange.last(days = 7)
    
    // Export to CSV
    val csvData = csvExporter.exportMetrics(monitor, agentName, timeRange)
    File("agent_metrics.csv").writeText(csvData)
    
    // Export to JSON
    val jsonData = jsonExporter.exportMetrics(monitor, agentName, timeRange)
    File("agent_metrics.json").writeText(jsonData)
    
    // Start Prometheus exporter
    prometheusExporter.start(port = 9090)
    
    println("Metrics exported to CSV and JSON files")
    println("Prometheus metrics available at http://localhost:9090/metrics")
    
    // Keep the application running
    readLine()
    
    // Stop Prometheus exporter
    prometheusExporter.stop()
}
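
A Prometheus server scrapes a `/metrics` endpoint that serves plain text in the Prometheus exposition format. The sketch below renders a few samples in that format so you can see what such an endpoint returns. The `MetricSample` type and the metric names are hypothetical, and the exact metrics Kastrax exposes are not specified in this guide.

```kotlin
// Hypothetical sample type; metric and label names are illustrative only.
data class MetricSample(
    val name: String,
    val help: String,
    val type: String, // for example "gauge" or "counter"
    val labels: Map<String, String>,
    val value: Double
)

// Label values must escape backslashes, double quotes, and newlines.
fun escapeLabelValue(value: String): String =
    value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n")

// Renders samples grouped by metric name, each group preceded by HELP and TYPE lines.
// Help strings are assumed to contain no backslashes or newlines.
fun renderPrometheus(samples: List<MetricSample>): String = buildString {
    for ((name, group) in samples.groupBy { it.name }) {
        val first = group.first()
        append("# HELP ").append(name).append(' ').append(first.help).append('\n')
        append("# TYPE ").append(name).append(' ').append(first.type).append('\n')
        for (sample in group) {
            append(name)
            if (sample.labels.isNotEmpty()) {
                append(sample.labels.entries.joinToString(",", "{", "}") { (key, value) ->
                    "$key=\"${escapeLabelValue(value)}\""
                })
            }
            append(' ').append(sample.value).append('\n')
        }
    }
}

fun main() {
    val samples = listOf(
        MetricSample("agent_error_rate", "Fraction of failed requests", "gauge",
            mapOf("agent" to "MyAgent"), 0.05),
        MetricSample("agent_error_rate", "Fraction of failed requests", "gauge",
            mapOf("agent" to "OtherAgent"), 0.01)
    )
    print(renderPrometheus(samples))
}
```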

Integration with External Monitoring Systems ✅

Kastrax can integrate with external monitoring systems:


import ai.kastrax.core.agent.monitoring.AgentMonitor
import ai.kastrax.core.agent.monitoring.integration.DatadogIntegration
import ai.kastrax.core.agent.monitoring.integration.PrometheusIntegration
import ai.kastrax.core.agent.monitoring.integration.GrafanaIntegration
import kotlinx.coroutines.runBlocking
 
fun main() = runBlocking {
    // Get the monitor
    val monitor = AgentMonitor.getInstance()
    
    // Configure Datadog integration
    val datadogIntegration = DatadogIntegration.create(
        apiKey = "your-datadog-api-key",
        applicationKey = "your-datadog-application-key",
        tags = mapOf("environment" to "production")
    )
    monitor.addIntegration(datadogIntegration)
    
    // Configure Prometheus integration
    val prometheusIntegration = PrometheusIntegration.create(
        port = 9090,
        endpoint = "/metrics"
    )
    monitor.addIntegration(prometheusIntegration)
    
    // Configure Grafana integration
    val grafanaIntegration = GrafanaIntegration.create(
        url = "http://localhost:3000",
        apiKey = "your-grafana-api-key",
        dashboardName = "Kastrax Agents"
    )
    monitor.addIntegration(grafanaIntegration)
    
    println("External monitoring integrations configured")
}

Best Practices ✅

When using agent monitoring and diagnostics, follow these best practices:

Selective Monitoring: Monitor only what’s necessary to avoid performance overhead
Sampling: For high-traffic applications, set samplingRate below 1.0 so that only a fraction of interactions is recorded
Retention Policies: Set appropriate data retention policies
Privacy: Ensure monitoring respects privacy by not collecting sensitive information
Alerting Thresholds: Set appropriate alerting thresholds to avoid alert fatigue
Regular Review: Regularly review monitoring data to identify trends and issues
Documentation: Document monitoring setup and alert responses
Testing: Test monitoring and alerting in a staging environment
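
As an illustration of the privacy recommendation, the sketch below masks email addresses in text before it is logged or attached to a metric. The `redact` function and its pattern are hypothetical and deliberately simple; a real deployment would likely need to cover more kinds of sensitive data, such as API keys and phone numbers.

```kotlin
// Simple email pattern for illustration; it does not cover every valid address.
val EMAIL_PATTERN = Regex("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}")

// Replaces every email address in the text with a fixed marker.
fun redact(text: String): String = EMAIL_PATTERN.replace(text, "[REDACTED_EMAIL]")

fun main() {
    val prompt = "Please email the report to jane.doe@example.com and cc bob@example.org."
    println(redact(prompt))
}
```

Applying redaction before data reaches the monitor keeps sensitive values out of stored metrics, exports, and alert messages alike.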

Conclusion ✅

Agent monitoring and diagnostics provide powerful capabilities for understanding, optimizing, and troubleshooting AI agent systems in Kastrax. By collecting and analyzing metrics, you can ensure your agents perform reliably and efficiently.