
API Cost Optimisation: How Engineering Teams Cut Cloud Spend by 40%

Cloud API costs are the silent killer of SaaS unit economics. AI APIs in particular can generate unexpected bills when called without budgets, caching, or rate limiting. This guide covers a systematic approach to auditing, controlling, and testing your API cost assumptions before they become business-critical surprises.

7 min read

The bill arrived and it was $23,000. Not from a DDoS. Not from runaway infrastructure. From an AI API that was called once per page view, wasn't cached, and didn't have any per-user rate limiting. The feature was three weeks old.

This is the new class of cost bug. Not just AI APIs — it includes any third-party API with consumption-based pricing: mapping APIs, data enrichment, SMS/email providers, database read credits, and CDN egress. Engineering teams that treat cost as a CFO problem instead of an engineering quality problem eventually face preventable budget shocks.

This guide covers auditing current API spend, implementing cost controls in code, and testing cost assumptions in CI.


The API Cost Audit

Before optimizing, you need a clear picture of what you're spending:

flowchart TD
    A[Identify all external API dependencies] --> B
    B[Classify by cost model\nper-call / per-token / per-unit] --> C
    C[Instrument call frequency\nby feature and endpoint] --> D
    D[Calculate cost per user journey] --> E
    E{Cost per journey\nvs revenue?}
    E -->|Sustainable| F[Monitor + alert on budget]
    E -->|Unsustainable| G[Optimize call patterns]
    G --> H[Implement caching]
    G --> I[Batch & deduplicate]
    G --> J[Add rate limiting]
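To make the "cost per user journey" step concrete, here is a minimal sketch of the calculation. The services, call counts, and per-call prices below are illustrative placeholders, not real measurements:

```typescript
// Hypothetical example: estimate the cost of one user journey from per-call prices.

interface ApiStep {
  service: string;
  callsPerJourney: number;
  costPerCallUsd: number;
}

function estimateJourneyCostUsd(steps: ApiStep[]): number {
  return steps.reduce((sum, s) => sum + s.callsPerJourney * s.costPerCallUsd, 0);
}

// Illustrative onboarding journey — replace with numbers from your own instrumentation
const onboardingJourney: ApiStep[] = [
  { service: 'geocoding', callsPerJourney: 2, costPerCallUsd: 0.005 }, // ~$5/1000 requests
  { service: 'ai-summary', callsPerJourney: 1, costPerCallUsd: 0.002 }, // small LLM call
  { service: 'email', callsPerJourney: 1, costPerCallUsd: 0.001 },
];

const cost = estimateJourneyCostUsd(onboardingJourney);
console.log(`Estimated cost per onboarding journey: $${cost.toFixed(4)}`);
```

Comparing this number against revenue per journey decides the branch in the flowchart: monitor, or optimize.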

API Cost by Category

| API Category | Typical Pricing | Cost Risk | Mitigation |
|---|---|---|---|
| OpenAI GPT-4o | ~$2.50/1M input tokens | High | Caching, model selection, prompt optimization |
| Google Maps | $5–$7/1000 requests | Medium | Cache geocoding results where the ToS permits |
| Twilio SMS | $0.0079/SMS | Low-Medium | Dedup, verify opt-ins, don't send test SMS to real numbers |
| SendGrid/Resend email | $0.001/email | Low | Ensure no duplicate sends |
| Stripe API reads | Free | Low | Cache customer objects |
| AWS S3 GET | $0.0004/1000 requests | Low | CDN in front of S3 |
| Postgres read credits | Varies (Supabase: $0.09/million) | Medium at scale | Connection pooling, query optimization |
| IP geolocation | $1–$5/1000 | Low-Medium | Cache by IP with short TTL |
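Per-token prices like the OpenAI rows above are worth keeping in one place in code rather than scattered across call sites. A minimal sketch — these prices mirror the table but change frequently, so treat them as placeholders and load real values from config in production:

```typescript
// Illustrative price table (USD per million tokens, as of writing — verify before use)
const PRICE_PER_MILLION_TOKENS: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
};

function estimateTokenCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICE_PER_MILLION_TOKENS[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

console.log(estimateTokenCostUsd('gpt-4o-mini', 2_000, 500)); // a small request costs fractions of a cent
```

Centralizing the table means a price change is a one-line diff instead of a grep across the codebase.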

Instrumenting API Calls

You cannot manage what you cannot measure. Wrap external API calls to record cost metrics:

// lib/api-cost-tracking.ts
import OpenAI from 'openai';
import { createClient } from '@/lib/supabase/server';
import { getFromCache, setInCache } from '@/lib/cache'; // assumed cache helpers

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

interface ApiCallMetrics {
  service: string;
  endpoint: string;
  userId?: string;
  inputTokens?: number;
  outputTokens?: number;
  estimatedCostUsd?: number;
  durationMs: number;
  cached: boolean;
}

export async function trackApiCall(metrics: ApiCallMetrics): Promise<void> {
  // Log to your analytics/observability platform
  // (PostHog, Segment, custom DB table, etc.)
  const supabase = await createClient();

  await supabase.from('api_cost_metrics').insert({
    service: metrics.service,
    endpoint: metrics.endpoint,
    user_id: metrics.userId,
    input_tokens: metrics.inputTokens,
    output_tokens: metrics.outputTokens,
    estimated_cost_usd: metrics.estimatedCostUsd,
    duration_ms: metrics.durationMs,
    cached: metrics.cached,
    created_at: new Date().toISOString(),
  });
}

// Wrapped OpenAI client
export async function chatCompletion(
  messages: Array<{ role: string; content: string }>,
  options: { userId?: string; cacheKey?: string; model?: string },
) {
  const model = options.model ?? 'gpt-4o-mini'; // Default to cheaper model
  const startTime = Date.now();

  // Check cache first
  if (options.cacheKey) {
    const cached = await getFromCache(options.cacheKey);
    if (cached) {
      await trackApiCall({
        service: 'openai',
        endpoint: 'chat.completions',
        userId: options.userId,
        durationMs: Date.now() - startTime,
        cached: true,
        estimatedCostUsd: 0,
      });
      return cached;
    }
  }

  const response = await openai.chat.completions.create({ model, messages });

  const usage = response.usage!;
  const costPerMillionInput = model === 'gpt-4o' ? 2.5 : 0.15; // GPT-4o vs GPT-4o-mini
  const costPerMillionOutput = model === 'gpt-4o' ? 10.0 : 0.6;
  const estimatedCost =
    (usage.prompt_tokens * costPerMillionInput + usage.completion_tokens * costPerMillionOutput) / 1_000_000;

  await trackApiCall({
    service: 'openai',
    endpoint: 'chat.completions',
    userId: options.userId,
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
    estimatedCostUsd: estimatedCost,
    durationMs: Date.now() - startTime,
    cached: false,
  });

  // Cache the result
  if (options.cacheKey) {
    await setInCache(options.cacheKey, response, { ttl: 3600 });
  }

  return response;
}
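The wrapper above assumes `getFromCache` and `setInCache` helpers. A minimal in-memory sketch of what they might look like — a real deployment would back these with Redis or similar, since a per-process `Map` is lost on restart and not shared across instances:

```typescript
// lib/cache.ts — in-memory sketch only; swap the Map for Redis in production

type CacheEntry = { value: unknown; expiresAt: number };
const cache = new Map<string, CacheEntry>();

export async function getFromCache(key: string): Promise<unknown | null> {
  const entry = cache.get(key);
  if (!entry) return null;
  if (Date.now() > entry.expiresAt) {
    cache.delete(key); // lazy expiry on read
    return null;
  }
  return entry.value;
}

export async function setInCache(
  key: string,
  value: unknown,
  opts: { ttl: number }, // ttl in seconds, matching the wrapper's { ttl: 3600 }
): Promise<void> {
  cache.set(key, { value, expiresAt: Date.now() + opts.ttl * 1000 });
}
```

The async signatures match what a Redis-backed implementation needs, so swapping the backend later doesn't change call sites.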

Cost-Aware Testing

Write tests that assert on cost behavior, similar to how you write performance tests with latency budgets:

// tests/cost/api-calls.test.ts
import { test, expect } from '@playwright/test';

test('dashboard page load does not trigger AI API calls', async ({ page }) => {
  const aiApiCalls: string[] = [];

  // Monitor outbound requests to AI providers
  page.on('request', (req) => {
    const url = req.url();
    if (
      url.includes('api.openai.com') ||
      url.includes('anthropic.com') ||
      url.includes('generativelanguage.googleapis.com')
    ) {
      aiApiCalls.push(url);
    }
  });

  await page.goto('/dashboard');
  await page.waitForLoadState('networkidle');

  // Dashboard load should NEVER trigger AI API calls (too expensive per page view)
  expect(aiApiCalls, `Unexpected AI API calls on dashboard load: ${aiApiCalls.join(', ')}`).toHaveLength(0);
});

test('scan analysis uses cached result on repeat calls', async ({ request }) => {
  const scanId = 'test-scan-123';

  // First call — generates AI analysis
  const first = await request.post('/api/scan/analyze', {
    data: { scanId },
    headers: { Authorization: `Bearer ${process.env.TEST_TOKEN}` },
  });
  expect(first.status()).toBe(200);

  // First call should have hit the AI API and reported a nonzero cost
  const firstCostHeader = first.headers()['x-api-cost-usd'];
  expect(parseFloat(firstCostHeader ?? '0')).toBeGreaterThan(0);

  // Second identical call — should use cache
  const second = await request.post('/api/scan/analyze', {
    data: { scanId },
    headers: { Authorization: `Bearer ${process.env.TEST_TOKEN}` },
  });

  const secondCostHeader = second.headers()['x-api-cost-usd'];

  // Cached response should cost $0 — require the header, so a missing
  // header fails loudly instead of parsing to a false pass
  expect(secondCostHeader).toBeDefined();
  expect(parseFloat(secondCostHeader!)).toBe(0);

  // Content should be identical
  expect(await second.json()).toEqual(await first.json());
});

Per-User Cost Budgets

Implement hard limits to prevent any single user from generating runaway costs:

// lib/rate-limiting/api-budget.ts
import { redis } from '@/lib/redis';

const DAILY_AI_BUDGET_USD = 0.5; // Max $0.50 AI spend per user per day

export async function checkAndDeductApiBudget(
  userId: string,
  estimatedCostUsd: number,
): Promise<{ allowed: boolean; remainingBudget: number }> {
  const key = `api_budget:${userId}:${new Date().toISOString().slice(0, 10)}`;

  const pipeline = redis.pipeline();
  pipeline.incrbyfloat(key, estimatedCostUsd);
  pipeline.expire(key, 86400); // Auto-expire after 24h

  // ioredis pipeline.exec() returns [error, result] pairs, and INCRBYFLOAT
  // replies with the new value as a string
  const results = await pipeline.exec();
  const newTotal = parseFloat(results?.[0]?.[1] as string);

  if (newTotal > DAILY_AI_BUDGET_USD) {
    // Roll back the increment and deny
    await redis.incrbyfloat(key, -estimatedCostUsd);
    return {
      allowed: false,
      remainingBudget: Math.max(0, DAILY_AI_BUDGET_USD - (newTotal - estimatedCostUsd)),
    };
  }

  return {
    allowed: true,
    remainingBudget: DAILY_AI_BUDGET_USD - newTotal,
  };
}
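The check-and-deduct pattern can be unit-tested without Redis by sketching the same logic against an in-memory store. This is illustrative only — the Redis version above is what you would run in production, since budget state must be shared across instances:

```typescript
// In-memory sketch of the same daily-budget logic, for illustration and unit tests
const DAILY_BUDGET_USD = 0.5;
const spend = new Map<string, number>(); // key: `${userId}:${day}`

function checkAndDeduct(
  userId: string,
  day: string, // e.g. '2025-01-01', mirroring the date-sliced Redis key
  costUsd: number,
): { allowed: boolean; remaining: number } {
  const key = `${userId}:${day}`;
  const current = spend.get(key) ?? 0;
  const next = current + costUsd;
  if (next > DAILY_BUDGET_USD) {
    // Deny without recording the spend — mirrors the Redis rollback path
    return { allowed: false, remaining: Math.max(0, DAILY_BUDGET_USD - current) };
  }
  spend.set(key, next);
  return { allowed: true, remaining: DAILY_BUDGET_USD - next };
}
```

Calling this before every AI request turns a potential runaway bill into, at worst, a per-user daily cap times your user count.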

CI Budget Test in Practice

// tests/cost/budget-smoke.test.ts
import { test, expect } from '@playwright/test';
import { login } from '../helpers/auth'; // assumed shared login helper

test('typical user session stays within $0.01 API cost budget', async ({ page }) => {
  // Simulate a typical user session: login, browse, run one scan.
  // Tag every request with the session id so the server can attribute cost rows to it.
  const sessionId = `test-session-${Date.now()}`;
  await page.setExtraHTTPHeaders({ 'x-test-session-id': sessionId });

  await page.goto('/login');
  await login(page, 'test@example.com', 'TestPassword123!');

  await page.goto('/dashboard');
  await page.click('[data-testid="new-scan-btn"]');
  await page.fill('[data-testid="scan-url"]', 'https://example.com');
  await page.click('[data-testid="start-scan"]');

  await page.waitForSelector('[data-testid="scan-complete"]', { timeout: 60_000 });

  // Query cost tracking for this session (page.request resolves relative to baseURL)
  const costResponse = await page.request.get(`/api/test/session-cost/${sessionId}`);
  const { totalCostUsd } = await costResponse.json();

  console.log(`Session cost: $${totalCostUsd.toFixed(4)}`);

  // Typical user session should not exceed $0.01
  expect(totalCostUsd).toBeLessThan(0.01);
});

Related articles: Also see optimising CI/CD pipelines to reduce the API calls that drive costs, observability tooling to track API usage and surface cost anomalies, and aligning cost optimisation with SLOs and error budget policies.


Cost Optimization Quick Wins

| Optimization | Effort | Potential Savings |
|---|---|---|
| Cache AI responses for identical inputs | Low | 40–80% |
| Downgrade model (GPT-4o → GPT-4o-mini) | Low | 90%+ |
| Batch API calls instead of per-item | Medium | 30–50% |
| Cache geocoding results in Redis | Low | 70–90% |
| Add per-user daily spend limits | Low | Prevents runaway spend |
| Alert on 2× baseline daily spend | Low | Early warning |
| Remove AI from non-essential features | Medium | Case-by-case |
| Prompt compression (remove redundant context) | Medium | 20–40% token reduction |
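The "alert on 2× baseline daily spend" row is a few lines of code once daily spend is tracked. A sketch, with the comparison factor and baseline window as illustrative defaults:

```typescript
// Compare today's spend against a trailing daily average; the 2x factor
// and the length of the trailing window are tuning choices, not fixed rules.
function isSpendAnomalous(
  todayUsd: number,
  trailingDailyUsd: number[], // e.g. the last 7 days of daily totals
  factor = 2,
): boolean {
  if (trailingDailyUsd.length === 0) return false; // no baseline yet — don't alert
  const baseline =
    trailingDailyUsd.reduce((sum, d) => sum + d, 0) / trailingDailyUsd.length;
  return todayUsd > baseline * factor;
}
```

Run this on a schedule against the `api_cost_metrics` table and page someone when it returns true — a cost bug caught on day one costs one day of overspend, not a billing cycle.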

Engineering teams that treat API costs as a quality metric — with automated budgets, cost tracking, and CI tests — avoid the billing surprises that are otherwise inevitable in a consumption-based pricing world.

Monitor your production application's behavior and health continuously: Try ScanlyApp free and set up automated checks that validate your application is performing correctly and efficiently.

Related Posts

Chaos Engineering: Break Your System on Purpose Before Your Users Do It for You
DevOps & Infrastructure
6 min read

Chaos engineering deliberately breaks things before they break on their own — in a controlled environment, with observability, and with a hypothesis. This guide covers practical chaos experiments for SaaS applications: network latency injection, dependency failure simulation, and building confidence that your system degrades gracefully under real-world failure conditions.