Monitoring and statistics

This guide covers how to use ddb-lib's built-in monitoring capabilities to track performance, detect anti-patterns, and optimize your DynamoDB usage. The stats system helps you understand how your application uses DynamoDB and provides actionable recommendations.

Overview

The monitoring system consists of three components:

  • StatsCollector - Collects operation metrics (latency, capacity, item counts)
  • RecommendationEngine - Analyzes patterns and suggests optimizations
  • AntiPatternDetector - Identifies common DynamoDB anti-patterns

graph LR
    A[Operations] --> B[StatsCollector]
    B --> C[Metrics Storage]
    C --> D[RecommendationEngine]
    C --> E[AntiPatternDetector]
    D --> F[Recommendations]
    E --> F

    style B fill:#4CAF50
    style D fill:#2196F3
    style E fill:#FF9800

Enabling statistics

Enable stats collection when creating the TableClient:

import { TableClient } from '@ddb-lib/client'
import { DynamoDBClient } from '@aws-sdk/client-dynamodb'

const client = new TableClient({
  tableName: 'my-table',
  client: new DynamoDBClient({ region: 'us-east-1' }),
  statsConfig: {
    enabled: true,
    sampleRate: 1.0,  // Collect 100% of operations
    thresholds: {
      slowQueryMs: 1000,  // Queries slower than 1s
      highRCU: 100,       // Operations using >100 RCU
      highWCU: 100        // Operations using >100 WCU
    }
  }
})

Configuration options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enabled | boolean | required | Enable/disable stats collection |
| sampleRate | number | 1.0 | Sample rate (0.0-1.0) for collection |
| thresholds.slowQueryMs | number | 1000 | Threshold for slow queries (ms) |
| thresholds.highRCU | number | 100 | Threshold for high RCU consumption |
| thresholds.highWCU | number | 100 | Threshold for high WCU consumption |

Sample rate

Use sampling to reduce overhead in high-traffic applications:

statsConfig: {
  enabled: true,
  sampleRate: 0.1  // Collect 10% of operations
}

When to use sampling:

  • ✅ High-traffic applications (>1000 ops/sec)
  • ✅ Production environments with tight latency requirements
  • ❌ Development and testing (use 1.0 for complete data)
  • ❌ Low-traffic applications (<100 ops/sec)
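
With a fractional sample rate, only a random subset of operations is recorded, so aggregated numbers approximate the true totals. The sketch below is purely illustrative of that idea and is not ddb-lib's implementation; all names in it are hypothetical.

// Illustrative only: how a per-operation sample rate is typically applied.
// shouldSample() and recordFn are hypothetical names, not ddb-lib APIs.
function shouldSample(sampleRate: number): boolean {
  // 1.0 records every operation; 0.1 records roughly 10% of them
  return Math.random() < sampleRate
}

function maybeRecord(sampleRate: number, recordFn: () => void): void {
  if (shouldSample(sampleRate)) {
    recordFn()  // collection cost is only paid for sampled operations
  }
}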

Collecting statistics

Statistics are automatically collected for all operations:

// Perform operations
await client.get({ pk: 'USER#123', sk: 'PROFILE' })
await client.query({
  keyCondition: {
    pk: 'USER#123',
    sk: { beginsWith: 'ORDER#' }
  }
})
await client.put({
  pk: 'USER#456',
  sk: 'PROFILE',
  name: 'Alice',
  email: 'alice@example.com'
})

// Get aggregated statistics
const stats = client.getStats()
console.log('Operation stats:', stats)

Understanding statistics

Operation statistics

View metrics by operation type:

const stats = client.getStats()

// Get operation
console.log('Get operations:', stats.operations.get)
// {
//   count: 150,
//   totalLatencyMs: 3750,
//   avgLatencyMs: 25,
//   totalRCU: 150,
//   totalWCU: 0
// }

// Query operations
console.log('Query operations:', stats.operations.query)
// {
//   count: 50,
//   totalLatencyMs: 2250,
//   avgLatencyMs: 45,
//   totalRCU: 500,
//   totalWCU: 0
// }

// Put operations
console.log('Put operations:', stats.operations.put)
// {
//   count: 100,
//   totalLatencyMs: 2000,
//   avgLatencyMs: 20,
//   totalRCU: 0,
//   totalWCU: 100
// }

Access pattern statistics

Track performance by access pattern:

const stats = client.getStats()

console.log('Access patterns:', stats.accessPatterns)
// {
//   getUserOrders: {
//     count: 50,
//     avgLatencyMs: 45,
//     avgItemsReturned: 12.5
//   },
//   getUserProfile: {
//     count: 150,
//     avgLatencyMs: 25,
//     avgItemsReturned: 1
//   }
// }
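
For example, you can rank the entries in stats.accessPatterns to surface the busiest and slowest patterns, assuming getStats() returns the shape shown above:

const stats = client.getStats()

// Rank access patterns by call volume and by average latency
const patterns = Object.entries(stats.accessPatterns)

const byCount = [...patterns].sort((a, b) => b[1].count - a[1].count)
const byLatency = [...patterns].sort((a, b) => b[1].avgLatencyMs - a[1].avgLatencyMs)

if (byCount.length > 0) {
  console.log(`Busiest pattern: ${byCount[0][0]} (${byCount[0][1].count} calls)`)
  console.log(`Slowest pattern: ${byLatency[0][0]} (${byLatency[0][1].avgLatencyMs}ms avg)`)
}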

Getting recommendations

The recommendation engine analyzes your usage patterns and suggests optimizations:

// Get recommendations
const recommendations = client.getRecommendations()

for (const rec of recommendations) {
  console.log(`[${rec.severity}] ${rec.message}`)
  console.log(`  Details: ${rec.details}`)
  if (rec.suggestedAction) {
    console.log(`  Action: ${rec.suggestedAction}`)
  }
  if (rec.estimatedImpact) {
    console.log(`  Impact:`, rec.estimatedImpact)
  }
}
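
The fields used above suggest a recommendation object shaped roughly like the following. This is an illustrative sketch inferred from the example, not the library's published type definition:

// Approximate shape of a recommendation, inferred from the fields used above.
// Consult the library's own types for the authoritative definition.
interface Recommendation {
  severity: 'info' | 'warning' | 'error'
  message: string
  details: string
  suggestedAction?: string
  estimatedImpact?: unknown  // logged as-is above; may be a string or structured object
}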

Recommendation types

1. Batch opportunities

Detects multiple individual operations that could be batched:

[info] Batch get opportunity detected
  Details: Detected 15 individual get operations within a 1000ms window.
  Action: Use batchGet() to retrieve multiple items in a single request.
  Impact: Reduce 15 requests to 1 batch request
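
For example, a loop of individual get() calls for related items can usually be collapsed into one batch request. The batchGet() call below assumes it accepts an array of keys; check the library's actual signature:

const userIds = ['123', '456', '789']

// Before: one request per item
// for (const id of userIds) {
//   await client.get({ pk: `USER#${id}`, sk: 'PROFILE' })
// }

// After: a single batch request (assumed batchGet() signature)
const profiles = await client.batchGet(
  userIds.map(id => ({ pk: `USER#${id}`, sk: 'PROFILE' }))
)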

2. Projection opportunities

Identifies operations fetching full items when only some attributes are needed:

[info] Consider using projection expressions for query operations
  Details: Only 30% of query operations use projection expressions.
  Action: Add projectionExpression to query operations to fetch only needed attributes.
  Impact: Reduced data transfer and RCU consumption
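
For example, if an access pattern only needs a few attributes, request just those. The projectionExpression option name is taken from the recommendation text above; the exact shape of the option may differ in your version:

// Fetch only the attributes this access pattern actually uses
const orders = await client.query({
  keyCondition: {
    pk: 'USER#123',
    sk: { beginsWith: 'ORDER#' }
  },
  projectionExpression: 'orderId, total, createdAt'
})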

3. Client-side filtering

Detects inefficient queries with low return rates:

[warning] Potential client-side filtering detected in getUserOrders
  Details: Query operations have 15% efficiency (50 operations, 42 with low efficiency).
  Action: Add a FilterExpression to your query to filter items on the server side.
  Impact: Up to 85% reduction in RCU consumption

4. Sequential writes

Identifies sequential write operations that could be batched:

[info] Sequential write operations detected
  Details: Detected 20 sequential write operations within a 1000ms window.
  Action: Use batchWrite() to combine multiple put and delete operations.
  Impact: Reduce 20 requests to 1 batch request

5. Read-before-write pattern

Detects get followed by put on the same key:

[info] Read-before-write pattern detected
  Details: Detected 5 instances of get followed by put on key 'USER#123'.
  Action: Use update() instead of get() + put().
  Impact: 50% reduction in operations (eliminate get)
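
For example, rather than reading the item and writing the whole thing back, apply the change directly. The update() options below are a sketch; adapt them to the library's actual update() signature:

// Before: read-modify-write (two requests, and concurrent writers can clobber each other)
// const user = await client.get({ pk: 'USER#123', sk: 'PROFILE' })
// await client.put({ ...user, email: 'new@example.com' })

// After: a single update (sketched options)
await client.update({
  key: { pk: 'USER#123', sk: 'PROFILE' },
  set: { email: 'new@example.com' }
})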

6. Slow operations

Identifies operations exceeding latency thresholds:

[warning] 25 slow operations detected
  Details: Found 25 operations exceeding 1000ms threshold (avg: 1500ms).
  Action: Review slow operations for optimization opportunities.
  Impact: Improved response times

7. High capacity usage

Detects operations consuming excessive capacity:

[warning] 10 operations with high RCU consumption
  Details: Found 10 operations exceeding 100 RCU threshold.
  Action: Use projection expressions to reduce data transfer.
  Impact: Lower read capacity costs

Anti-pattern detection

The anti-pattern detector identifies common DynamoDB mistakes:

import { AntiPatternDetector } from '@ddb-lib/stats'

// Get the stats collector from the client
const statsCollector = client['statsCollector']  // Internal access
const detector = new AntiPatternDetector(statsCollector)

// Detect hot partitions
const hotPartitions = detector.detectHotPartitions()
for (const partition of hotPartitions) {
  console.log(`Hot partition: ${partition.partitionKey}`)
  console.log(`  Access count: ${partition.accessCount}`)
  console.log(`  Percentage: ${(partition.percentageOfTotal * 100).toFixed(1)}%`)
  console.log(`  Recommendation: ${partition.recommendation}`)
}

// Detect inefficient scans
const inefficientScans = detector.detectInefficientScans()
for (const scan of inefficientScans) {
  console.log(`Inefficient scan: ${scan.operation}`)
  console.log(`  Scanned: ${scan.scannedCount}, Returned: ${scan.returnedCount}`)
  console.log(`  Efficiency: ${(scan.efficiency * 100).toFixed(1)}%`)
  console.log(`  Recommendation: ${scan.recommendation}`)
}

// Get all anti-pattern recommendations
const antiPatterns = detector.generateRecommendations()
for (const rec of antiPatterns) {
  console.log(`[${rec.severity}] ${rec.message}`)
}

Hot partition detection

Identifies partitions receiving >10% of traffic:

const hotPartitions = detector.detectHotPartitions()

// Example output:
// {
//   partitionKey: 'STATUS#ACTIVE',
//   accessCount: 850,
//   percentageOfTotal: 0.85,  // 85% of traffic!
//   recommendation: 'Consider write sharding or better key distribution'
// }
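
A common remedy is write sharding: spread a hot logical key across several physical partition keys by appending a shard suffix, then fan reads out across the shards. A minimal sketch, assuming the client.put() and client.query() shapes used earlier in this guide; the shard count and key format are your choice:

const SHARD_COUNT = 10

// Writes: append a random shard suffix so items land on different partitions
async function putActiveItem(item: { sk: string; [key: string]: unknown }) {
  const shard = Math.floor(Math.random() * SHARD_COUNT)
  await client.put({ ...item, pk: `STATUS#ACTIVE#${shard}` })
}

// Reads: query every shard and merge the results
async function getActiveItems() {
  const perShard = await Promise.all(
    Array.from({ length: SHARD_COUNT }, (_, shard) =>
      client.query({ keyCondition: { pk: `STATUS#ACTIVE#${shard}` } })
    )
  )
  // Flatten per-shard result arrays (adjust if query() returns a wrapper object)
  return perShard.flat()
}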

Inefficient scan detection

Finds scans with <20% efficiency:

const inefficientScans = detector.detectInefficientScans()

// Example output:
// {
//   operation: 'scan on my-table',
//   scannedCount: 10000,
//   returnedCount: 50,
//   efficiency: 0.005,  // 0.5% efficiency!
//   recommendation: 'Consider using a query with an appropriate index'
// }

Unused index detection

Identifies indexes that have not been used in the last 7 days:

const unusedIndexes = detector.detectUnusedIndexes()

// Example output:
// {
//   indexName: 'my-table:OldIndex',
//   usageCount: 0,
//   lastUsed: 1701388800000,
//   recommendation: 'Consider removing this index to reduce storage costs'
// }

Monitoring in production

Logging recommendations

Automatically log high-severity recommendations:

const recommendations = client.getRecommendations()

for (const rec of recommendations) {
  if (rec.severity === 'error' || rec.severity === 'warning') {
    console.warn(`[DynamoDB] ${rec.severity.toUpperCase()}: ${rec.message}`)
    console.warn(`  Details: ${rec.details}`)
    if (rec.suggestedAction) {
      console.warn(`  Action: ${rec.suggestedAction}`)
    }
  }
}

Periodic reporting

Generate periodic reports:

// Report every 5 minutes
setInterval(() => {
  const stats = client.getStats()
  const recommendations = client.getRecommendations()

  console.log('=== DynamoDB Stats Report ===')
  console.log('Operations:', stats.operations)
  console.log('Access Patterns:', stats.accessPatterns)
  console.log('Recommendations:', recommendations.length)

  // Log high-priority recommendations
  const highPriority = recommendations.filter(
    r => r.severity === 'error' || r.severity === 'warning'
  )

  if (highPriority.length > 0) {
    console.log('High Priority Issues:')
    for (const rec of highPriority) {
      console.log(`  - ${rec.message}`)
    }
  }
}, 5 * 60 * 1000)

Exporting metrics

Export raw metrics for external monitoring:

import { StatsCollector } from '@ddb-lib/stats'

// Access the internal stats collector
const statsCollector = client['statsCollector'] as StatsCollector

// Export raw operations
const operations = statsCollector.export()

// Send to your monitoring service (sendToDatadog, sendToCloudWatch and
// sendToPrometheus are placeholder functions you implement yourself)
await sendToDatadog(operations)
await sendToCloudWatch(operations)
await sendToPrometheus(operations)

Integration with CloudWatch

Send metrics to CloudWatch:

import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch'

const cloudwatch = new CloudWatchClient({ region: 'us-east-1' })

async function publishMetrics() {
  const stats = client.getStats()

  const metricData = []

  // Publish operation counts
  for (const [operation, opStats] of Object.entries(stats.operations)) {
    metricData.push({
      MetricName: 'OperationCount',
      Value: opStats.count,
      Unit: 'Count',
      Dimensions: [
        { Name: 'Operation', Value: operation },
        { Name: 'TableName', Value: 'my-table' }
      ]
    })

    metricData.push({
      MetricName: 'AverageLatency',
      Value: opStats.avgLatencyMs,
      Unit: 'Milliseconds',
      Dimensions: [
        { Name: 'Operation', Value: operation },
        { Name: 'TableName', Value: 'my-table' }
      ]
    })
  }

  await cloudwatch.send(new PutMetricDataCommand({
    Namespace: 'DynamoDB/Application',
    MetricData: metricData
  }))
}

// Publish every minute
setInterval(publishMetrics, 60 * 1000)

Performance impact

Stats collection has minimal overhead:

| Sample Rate | Overhead | Use Case |
|-------------|----------|----------|
| 1.0 (100%) | <1ms per operation | Development, testing |
| 0.5 (50%) | <0.5ms per operation | Staging |
| 0.1 (10%) | <0.1ms per operation | Production (high traffic) |
| 0.01 (1%) | <0.01ms per operation | Production (very high traffic) |

Disabling in production

Disable stats for maximum performance:

const client = new TableClient({
  tableName: 'my-table',
  client: new DynamoDBClient({ region: 'us-east-1' }),
  statsConfig: {
    enabled: false  // No overhead
  }
})

Best practices

1. Start with full sampling

// Development: Full sampling
statsConfig: {
  enabled: true,
  sampleRate: 1.0
}

2. Reduce sampling in production

// Production: 10% sampling
statsConfig: {
  enabled: true,
  sampleRate: 0.1
}

3. Set appropriate thresholds

statsConfig: {
  enabled: true,
  sampleRate: 1.0,
  thresholds: {
    slowQueryMs: 500,   // Stricter for low-latency apps
    highRCU: 50,        // Lower for cost-sensitive apps
    highWCU: 50
  }
}

4. Act on recommendations

const recommendations = client.getRecommendations()

// Prioritize by severity
const errors = recommendations.filter(r => r.severity === 'error')
const warnings = recommendations.filter(r => r.severity === 'warning')
const info = recommendations.filter(r => r.severity === 'info')

// Address errors first
for (const rec of errors) {
  console.error('CRITICAL:', rec.message)
  // Take immediate action
}

5. Track metrics over time

// Track metrics over time
let history: any[] = []

setInterval(() => {
  const stats = client.getStats()
  history.push({
    timestamp: Date.now(),
    stats
  })

  // Keep last 24 hours
  const oneDayAgo = Date.now() - 24 * 60 * 60 * 1000
  history = history.filter(h => h.timestamp > oneDayAgo)

  // Analyze trends
  analyzeTrends(history)
}, 60 * 1000)
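
analyzeTrends() above is a placeholder for whatever analysis you want to run over the retained snapshots. A minimal, hypothetical version that flags a rising query latency trend might look like this:

// Hypothetical helper: compare the oldest and newest snapshots in the window
function analyzeTrends(history: { timestamp: number; stats: any }[]) {
  if (history.length < 2) return

  const first = history[0].stats
  const last = history[history.length - 1].stats

  const before = first.operations.query?.avgLatencyMs ?? 0
  const after = last.operations.query?.avgLatencyMs ?? 0

  // Flag a sustained query latency increase of more than 50%
  if (before > 0 && after > before * 1.5) {
    console.warn(`Query latency trending up: ${before}ms -> ${after}ms`)
  }
}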

6. Reset stats periodically

// Reset stats every hour to prevent memory growth
setInterval(() => {
  const statsCollector = client['statsCollector'] as StatsCollector
  statsCollector.reset()
}, 60 * 60 * 1000)

Example: complete monitoring setup

import { TableClient } from '@ddb-lib/client'
import { DynamoDBClient } from '@aws-sdk/client-dynamodb'
import { AntiPatternDetector } from '@ddb-lib/stats'

// Create client with monitoring
const client = new TableClient({
  tableName: 'my-table',
  client: new DynamoDBClient({ region: 'us-east-1' }),
  statsConfig: {
    enabled: true,
    sampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
    thresholds: {
      slowQueryMs: 1000,
      highRCU: 100,
      highWCU: 100
    }
  }
})

// Periodic monitoring
setInterval(() => {
  // Get statistics
  const stats = client.getStats()
  console.log('=== DynamoDB Monitoring Report ===')
  console.log('Total operations:', Object.values(stats.operations)
    .reduce((sum, op) => sum + op.count, 0))

  // Get recommendations
  const recommendations = client.getRecommendations()
  const highPriority = recommendations.filter(
    r => r.severity === 'error' || r.severity === 'warning'
  )

  if (highPriority.length > 0) {
    console.log('\nHigh Priority Issues:')
    for (const rec of highPriority) {
      console.log(`[${rec.severity}] ${rec.message}`)
      console.log(`  ${rec.suggestedAction}`)
    }
  }

  // Detect anti-patterns
  const statsCollector = client['statsCollector']
  if (statsCollector) {
    const detector = new AntiPatternDetector(statsCollector)

    const hotPartitions = detector.detectHotPartitions()
    if (hotPartitions.length > 0) {
      console.log('\nHot Partitions Detected:')
      for (const partition of hotPartitions) {
        console.log(`  ${partition.partitionKey}: ${(partition.percentageOfTotal * 100).toFixed(1)}%`)
      }
    }

    const inefficientScans = detector.detectInefficientScans()
    if (inefficientScans.length > 0) {
      console.log('\nInefficient Scans Detected:')
      for (const scan of inefficientScans) {
        console.log(`  ${scan.operation}: ${(scan.efficiency * 100).toFixed(1)}% efficiency`)
      }
    }
  }
}, 5 * 60 * 1000)  // Every 5 minutes

// Export function
export { client }

Next steps