Monitoring and statistics¶
This guide covers how to use ddb-lib's built-in monitoring capabilities to track performance, detect anti-patterns, and optimize your DynamoDB usage. The stats system helps you understand how your application uses DynamoDB and provides actionable recommendations.
Overview¶
The monitoring system consists of three components:
- StatsCollector - Collects operation metrics (latency, capacity, item counts)
- RecommendationEngine - Analyzes patterns and suggests optimizations
- AntiPatternDetector - Identifies common DynamoDB anti-patterns
graph LR
A[Operations] --> B[StatsCollector]
B --> C[Metrics Storage]
C --> D[RecommendationEngine]
C --> E[AntiPatternDetector]
D --> F[Recommendations]
E --> F
style B fill:#4CAF50
style D fill:#2196F3
style E fill:#FF9800
Enabling statistics¶
Enable stats collection when creating the TableClient:
import { TableClient } from '@ddb-lib/client'
import { DynamoDBClient } from '@aws-sdk/client-dynamodb'
const client = new TableClient({
tableName: 'my-table',
client: new DynamoDBClient({ region: 'us-east-1' }),
statsConfig: {
enabled: true,
sampleRate: 1.0, // Collect 100% of operations
thresholds: {
slowQueryMs: 1000, // Queries slower than 1s
highRCU: 100, // Operations using >100 RCU
highWCU: 100 // Operations using >100 WCU
}
}
})
Configuration options¶
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | required | Enable/disable stats collection |
| sampleRate | number | 1.0 | Sample rate (0.0-1.0) for collection |
| thresholds.slowQueryMs | number | 1000 | Threshold for slow queries (ms) |
| thresholds.highRCU | number | 100 | Threshold for high RCU consumption |
| thresholds.highWCU | number | 100 | Threshold for high WCU consumption |
Sample rate¶
Use sampling to reduce overhead in high-traffic applications:
When to use sampling:
- ✅ High-traffic applications (>1000 ops/sec)
- ✅ Production environments with tight latency requirements
- ❌ Development and testing (use 1.0 for complete data)
- ❌ Low-traffic applications (<100 ops/sec)
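Conceptually, sampling gates whether each operation's metrics get recorded at all. A minimal sketch of the idea (hypothetical, not the library's internal implementation):

```typescript
// Hypothetical sketch: record metrics for roughly sampleRate of operations.
function shouldSample(sampleRate: number): boolean {
  // Math.random() returns a value in [0, 1), so a rate of 1.0 always
  // samples and a rate of 0 never does.
  return Math.random() < sampleRate
}

// With sampleRate 0.1, roughly 10% of 10,000 operations are recorded.
const recorded = Array.from({ length: 10000 }, () => shouldSample(0.1))
  .filter(Boolean).length
console.log(recorded)
```

Because sampled stats are extrapolations, averages stay meaningful but absolute counts undercount by roughly the sample rate.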
Collecting statistics¶
Statistics are automatically collected for all operations:
// Perform operations
await client.get({ pk: 'USER#123', sk: 'PROFILE' })
await client.query({
keyCondition: {
pk: 'USER#123',
sk: { beginsWith: 'ORDER#' }
}
})
await client.put({
pk: 'USER#456',
sk: 'PROFILE',
name: 'Alice',
email: 'alice@example.com'
})
// Get aggregated statistics
const stats = client.getStats()
console.log('Operation stats:', stats)
Understanding statistics¶
Operation statistics¶
View metrics by operation type:
const stats = client.getStats()
// Get operation
console.log('Get operations:', stats.operations.get)
// {
// count: 150,
// totalLatencyMs: 3750,
// avgLatencyMs: 25,
// totalRCU: 150,
// totalWCU: 0
// }
// Query operations
console.log('Query operations:', stats.operations.query)
// {
// count: 50,
// totalLatencyMs: 2250,
// avgLatencyMs: 45,
// totalRCU: 500,
// totalWCU: 0
// }
// Put operations
console.log('Put operations:', stats.operations.put)
// {
// count: 100,
// totalLatencyMs: 2000,
// avgLatencyMs: 20,
// totalRCU: 0,
// totalWCU: 100
// }
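Note that `avgLatencyMs` in each entry is just `totalLatencyMs` divided by `count`, so the totals and averages above are mutually consistent:

```typescript
// Average latency is total latency divided by operation count.
const avgLatency = (totalLatencyMs: number, count: number): number =>
  count === 0 ? 0 : totalLatencyMs / count

// The figures from the example stats above:
console.log(avgLatency(3750, 150)) // 25 (get)
console.log(avgLatency(2250, 50))  // 45 (query)
console.log(avgLatency(2000, 100)) // 20 (put)
```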
Access pattern statistics¶
Track performance by access pattern:
const stats = client.getStats()
console.log('Access patterns:', stats.accessPatterns)
// {
// getUserOrders: {
// count: 50,
// avgLatencyMs: 45,
// avgItemsReturned: 12.5
// },
// getUserProfile: {
// count: 150,
// avgLatencyMs: 25,
// avgItemsReturned: 1
// }
// }
Getting recommendations¶
The recommendation engine analyzes your usage patterns and suggests optimizations:
// Get recommendations
const recommendations = client.getRecommendations()
for (const rec of recommendations) {
console.log(`[${rec.severity}] ${rec.message}`)
console.log(` Details: ${rec.details}`)
if (rec.suggestedAction) {
console.log(` Action: ${rec.suggestedAction}`)
}
if (rec.estimatedImpact) {
console.log(` Impact:`, rec.estimatedImpact)
}
}
Recommendation types¶
1. Batch opportunities¶
Detects multiple individual operations that could be batched:
[info] Batch get opportunity detected
Details: Detected 15 individual get operations within a 1000ms window.
Action: Use batchGet() to retrieve multiple items in a single request.
Impact: Reduce 15 requests to 1 batch request
2. Projection opportunities¶
Identifies operations fetching full items when only some attributes are needed:
[info] Consider using projection expressions for query operations
Details: Only 30% of query operations use projection expressions.
Action: Add projectionExpression to query operations to fetch only needed attributes.
Impact: Reduced data transfer and RCU consumption
3. Client-side filtering¶
Detects inefficient queries with low return rates:
[warning] Potential client-side filtering detected in getUserOrders
Details: Query operations have 15% efficiency (50 operations, 42 with low efficiency).
Action: Add a FilterExpression to your query to filter items on the server side.
Impact: Up to 85% reduction in RCU consumption
4. Sequential writes¶
Identifies sequential write operations that could be batched:
[info] Sequential write operations detected
Details: Detected 20 sequential write operations within a 1000ms window.
Action: Use batchWrite() to combine multiple put and delete operations.
Impact: Reduce 20 requests to 1 batch request
5. Read-before-write pattern¶
Detects get followed by put on the same key:
[info] Read-before-write pattern detected
Details: Detected 5 instances of get followed by put on key 'USER#123'.
Action: Use update() instead of get() + put().
Impact: 50% reduction in operations (eliminate get)
6. Slow operations¶
Identifies operations exceeding latency thresholds:
[warning] 25 slow operations detected
Details: Found 25 operations exceeding 1000ms threshold (avg: 1500ms).
Action: Review slow operations for optimization opportunities.
Impact: Improved response times
7. High capacity usage¶
Detects operations consuming excessive capacity:
[warning] 10 operations with high RCU consumption
Details: Found 10 operations exceeding 100 RCU threshold.
Action: Use projection expressions to reduce data transfer.
Impact: Lower read capacity costs
Anti-pattern detection¶
The anti-pattern detector identifies common DynamoDB mistakes:
import { AntiPatternDetector } from '@ddb-lib/stats'
// Get the stats collector from the client
const statsCollector = client['statsCollector'] // Internal access
const detector = new AntiPatternDetector(statsCollector)
// Detect hot partitions
const hotPartitions = detector.detectHotPartitions()
for (const partition of hotPartitions) {
console.log(`Hot partition: ${partition.partitionKey}`)
console.log(` Access count: ${partition.accessCount}`)
console.log(` Percentage: ${(partition.percentageOfTotal * 100).toFixed(1)}%`)
console.log(` Recommendation: ${partition.recommendation}`)
}
// Detect inefficient scans
const inefficientScans = detector.detectInefficientScans()
for (const scan of inefficientScans) {
console.log(`Inefficient scan: ${scan.operation}`)
console.log(` Scanned: ${scan.scannedCount}, Returned: ${scan.returnedCount}`)
console.log(` Efficiency: ${(scan.efficiency * 100).toFixed(1)}%`)
console.log(` Recommendation: ${scan.recommendation}`)
}
// Get all anti-pattern recommendations
const antiPatterns = detector.generateRecommendations()
for (const rec of antiPatterns) {
console.log(`[${rec.severity}] ${rec.message}`)
}
Hot partition detection¶
Identifies partitions receiving >10% of traffic:
const hotPartitions = detector.detectHotPartitions()
// Example output:
// {
// partitionKey: 'STATUS#ACTIVE',
// accessCount: 850,
// percentageOfTotal: 0.85, // 85% of traffic!
// recommendation: 'Consider write sharding or better key distribution'
// }
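The percentage is the partition's share of all recorded accesses; a hedged sketch of that arithmetic (the detector's actual internals may differ):

```typescript
// A partition is flagged as hot when its share of total recorded
// accesses exceeds the 10% threshold.
function isHotPartition(
  accessCount: number,
  totalAccesses: number,
  threshold = 0.1
): boolean {
  return totalAccesses > 0 && accessCount / totalAccesses > threshold
}

console.log(isHotPartition(850, 1000)) // true: 85% of traffic
console.log(isHotPartition(50, 1000))  // false: only 5%
```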
Inefficient scan detection¶
Finds scans with <20% efficiency:
const inefficientScans = detector.detectInefficientScans()
// Example output:
// {
// operation: 'scan on my-table',
// scannedCount: 10000,
// returnedCount: 50,
// efficiency: 0.005, // 0.5% efficiency!
// recommendation: 'Consider using a query with an appropriate index'
// }
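Efficiency here is simply items returned divided by items scanned; the example above returns only 50 of 10,000 scanned items:

```typescript
// Scan efficiency: fraction of scanned items actually returned.
const scanEfficiency = (returnedCount: number, scannedCount: number): number =>
  scannedCount === 0 ? 0 : returnedCount / scannedCount

console.log(scanEfficiency(50, 10000)) // 0.005, i.e. 0.5% efficiency
```

Since DynamoDB bills RCUs for every item scanned, not just those returned, low efficiency translates directly into wasted read capacity.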
Unused index detection¶
Identifies indexes with no recorded usage in the last 7 days:
const unusedIndexes = detector.detectUnusedIndexes()
// Example output:
// {
// indexName: 'my-table:OldIndex',
// usageCount: 0,
// lastUsed: 1701388800000,
// recommendation: 'Consider removing this index to reduce storage costs'
// }
Monitoring in production¶
Logging recommendations¶
Automatically log high-severity recommendations:
const recommendations = client.getRecommendations()
for (const rec of recommendations) {
if (rec.severity === 'error' || rec.severity === 'warning') {
console.warn(`[DynamoDB] ${rec.severity.toUpperCase()}: ${rec.message}`)
console.warn(` Details: ${rec.details}`)
if (rec.suggestedAction) {
console.warn(` Action: ${rec.suggestedAction}`)
}
}
}
Periodic reporting¶
Generate periodic reports:
// Report every 5 minutes
setInterval(() => {
const stats = client.getStats()
const recommendations = client.getRecommendations()
console.log('=== DynamoDB Stats Report ===')
console.log('Operations:', stats.operations)
console.log('Access Patterns:', stats.accessPatterns)
console.log('Recommendations:', recommendations.length)
// Log high-priority recommendations
const highPriority = recommendations.filter(
r => r.severity === 'error' || r.severity === 'warning'
)
if (highPriority.length > 0) {
console.log('High Priority Issues:')
for (const rec of highPriority) {
console.log(` - ${rec.message}`)
}
}
}, 5 * 60 * 1000)
Exporting metrics¶
Export raw metrics for external monitoring:
import { StatsCollector } from '@ddb-lib/stats'
// Access the internal stats collector
const statsCollector = client['statsCollector'] as StatsCollector
// Export raw operations
const operations = statsCollector.export()
// Send to monitoring service
await sendToDatadog(operations)
await sendToCloudWatch(operations)
await sendToPrometheus(operations)
Integration with CloudWatch¶
Send metrics to CloudWatch:
import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch'
const cloudwatch = new CloudWatchClient({ region: 'us-east-1' })
async function publishMetrics() {
const stats = client.getStats()
const metricData = []
// Publish operation counts
for (const [operation, opStats] of Object.entries(stats.operations)) {
metricData.push({
MetricName: 'OperationCount',
Value: opStats.count,
Unit: 'Count',
Dimensions: [
{ Name: 'Operation', Value: operation },
{ Name: 'TableName', Value: 'my-table' }
]
})
metricData.push({
MetricName: 'AverageLatency',
Value: opStats.avgLatencyMs,
Unit: 'Milliseconds',
Dimensions: [
{ Name: 'Operation', Value: operation },
{ Name: 'TableName', Value: 'my-table' }
]
})
}
await cloudwatch.send(new PutMetricDataCommand({
Namespace: 'DynamoDB/Application',
MetricData: metricData
}))
}
// Publish every minute
setInterval(publishMetrics, 60 * 1000)
Performance impact¶
Stats collection has minimal overhead:
| Sample Rate | Overhead | Use Case |
|---|---|---|
| 1.0 (100%) | <1ms per operation | Development, testing |
| 0.5 (50%) | <0.5ms per operation | Staging |
| 0.1 (10%) | <0.1ms per operation | Production (high traffic) |
| 0.01 (1%) | <0.01ms per operation | Production (very high traffic) |
Disabling in production¶
Disable stats for maximum performance:
const client = new TableClient({
tableName: 'my-table',
client: new DynamoDBClient({ region: 'us-east-1' }),
statsConfig: {
enabled: false // No overhead
}
})
Best practices¶
1. Start with full sampling¶
Use sampleRate: 1.0 in development and testing so you capture complete data while traffic is low.
2. Reduce sampling in production¶
Once traffic grows, lower the sample rate (for example 0.1) to keep collection overhead negligible.
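For example, drive the sample rate from the environment so development keeps complete data while production pays minimal overhead (the complete setup at the end of this guide uses the same approach):

```typescript
// Full sampling in development and testing; 10% sampling in production.
const sampleRate = process.env.NODE_ENV === 'production' ? 0.1 : 1.0

// Pass this value as statsConfig.sampleRate when constructing the TableClient.
console.log(sampleRate)
```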
3. Set appropriate thresholds¶
statsConfig: {
enabled: true,
sampleRate: 1.0,
thresholds: {
slowQueryMs: 500, // Stricter for low-latency apps
highRCU: 50, // Lower for cost-sensitive apps
highWCU: 50
}
}
4. Act on recommendations¶
const recommendations = client.getRecommendations()
// Prioritize by severity
const errors = recommendations.filter(r => r.severity === 'error')
const warnings = recommendations.filter(r => r.severity === 'warning')
const info = recommendations.filter(r => r.severity === 'info')
// Address errors first
for (const rec of errors) {
console.error('CRITICAL:', rec.message)
// Take immediate action
}
5. Monitor trends¶
// Track metrics over time
let history: any[] = [] // let, not const: reassigned when pruning below
setInterval(() => {
const stats = client.getStats()
history.push({
timestamp: Date.now(),
stats
})
// Keep last 24 hours
const oneDayAgo = Date.now() - 24 * 60 * 60 * 1000
history = history.filter(h => h.timestamp > oneDayAgo)
// Analyze trends
analyzeTrends(history)
}, 60 * 1000)
6. Reset stats periodically¶
// Reset stats every hour to prevent memory growth
setInterval(() => {
const statsCollector = client['statsCollector'] as StatsCollector
statsCollector.reset()
}, 60 * 60 * 1000)
Example: complete monitoring setup¶
import { TableClient } from '@ddb-lib/client'
import { DynamoDBClient } from '@aws-sdk/client-dynamodb'
import { AntiPatternDetector } from '@ddb-lib/stats'
// Create client with monitoring
const client = new TableClient({
tableName: 'my-table',
client: new DynamoDBClient({ region: 'us-east-1' }),
statsConfig: {
enabled: true,
sampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
thresholds: {
slowQueryMs: 1000,
highRCU: 100,
highWCU: 100
}
}
})
// Periodic monitoring
setInterval(() => {
// Get statistics
const stats = client.getStats()
console.log('=== DynamoDB Monitoring Report ===')
console.log('Total operations:', Object.values(stats.operations)
.reduce((sum, op) => sum + op.count, 0))
// Get recommendations
const recommendations = client.getRecommendations()
const highPriority = recommendations.filter(
r => r.severity === 'error' || r.severity === 'warning'
)
if (highPriority.length > 0) {
console.log('\nHigh Priority Issues:')
for (const rec of highPriority) {
console.log(`[${rec.severity}] ${rec.message}`)
console.log(` ${rec.suggestedAction}`)
}
}
// Detect anti-patterns
const statsCollector = client['statsCollector']
if (statsCollector) {
const detector = new AntiPatternDetector(statsCollector)
const hotPartitions = detector.detectHotPartitions()
if (hotPartitions.length > 0) {
console.log('\nHot Partitions Detected:')
for (const partition of hotPartitions) {
console.log(` ${partition.partitionKey}: ${(partition.percentageOfTotal * 100).toFixed(1)}%`)
}
}
const inefficientScans = detector.detectInefficientScans()
if (inefficientScans.length > 0) {
console.log('\nInefficient Scans Detected:')
for (const scan of inefficientScans) {
console.log(` ${scan.operation}: ${(scan.efficiency * 100).toFixed(1)}% efficiency`)
}
}
}
}, 5 * 60 * 1000) // Every 5 minutes
// Export function
export { client }
Next steps¶
- Learn about Multi-Attribute Keys for advanced patterns
- Review Best Practices for optimization tips
- Avoid Anti-Patterns that hurt performance
- See Examples for complete monitoring setups