Time-Based Usage
Resources billed by the second.
Duration metering for GPU compute, video transcoding, and any resource billed by the second. Covers two distinct patterns: batch jobs (meter at completion) and live sessions (meter at stop, with a running clock). Includes the pre-flight quota check that prevents starting a job you can't finish, and partial-duration handling when jobs crash mid-run.
Policy
policy:
  credits:
    gpu_second:
      description: GPU compute time in seconds
      overhead_cost: 0.000078 # $0.28/GPU-hour ≈ $0.000078/s; adjust per instance type
      pricing_model: tiered
      tiers:
        - up_to: 3600 # first GPU-hour
          price: { amount: 0.0004 }
        - up_to: 36000 # 1–10 GPU-hours
          price: { amount: 0.00035 }
        - # 10+ GPU-hours
          price: { amount: 0.0003 }
      stof_units: s
      resets: true
    video_second:
      description: Video transcoding time in seconds
      overhead_cost: 0.000015
      pricing_model: flat
      price: { amount: 0.000025 }
      stof_units: s
      resets: true
  plans:
    free:
      label: Free
      period: monthly
      default: true
      entitlements:
        gpu_access:
          description: GPU compute access
        gpu_compute:
          description: GPU compute — hard limit, 1 GPU-hour/month
          limit: { credit: gpu_second, mode: hard, value: '1hr', resets: true, reset_inc: 30days }
        # No video_access on free
    pro:
      label: Pro
      period: monthly
      entitlements:
        gpu_access:
          description: GPU compute access
        gpu_compute:
          description: GPU compute — soft limit, 10 GPU-hours/month, overage billed
          limit: { credit: gpu_second, mode: soft, value: '10hr', resets: true, reset_inc: 30days }
        video_access:
          description: Video transcoding access
        video_transcode:
          description: Video transcoding — hard limit, 2 hours/month
          limit: { credit: video_second, mode: hard, value: '2hr', resets: true, reset_inc: 30days }
    enterprise:
      label: Enterprise
      period: monthly
      entitlements:
        gpu_access:
          description: GPU compute access
        gpu_compute:
          description: GPU compute — soft limit, 200 GPU-hours/month, overage billed
          limit: { credit: gpu_second, mode: soft, value: '200hr', resets: true, reset_inc: 30days }
        video_access:
          description: Video transcoding access
        video_transcode:
          description: Video transcoding — soft limit, 50 hours/month, overage billed
          limit: { credit: video_second, mode: soft, value: '50hr', resets: true, reset_inc: 30days }
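The tiers above are marginal: each price applies only to the seconds that fall inside its band. A sketch of how a duration resolves against them (illustrative only; Limitr computes this internally, and `tieredGpuCost` is a hypothetical helper mirroring the per-second prices in the policy):

```typescript
// Illustrative only — mirrors the gpu_second tiers from the policy above,
// assuming marginal tiers (each price covers only its own band of seconds).
const GPU_TIERS = [
  { upTo: 3600, price: 0.0004 },     // first GPU-hour
  { upTo: 36000, price: 0.00035 },   // hours 1–10
  { upTo: Infinity, price: 0.0003 }, // beyond 10 GPU-hours
];

function tieredGpuCost(seconds: number): number {
  let cost = 0;
  let prevCap = 0;
  for (const tier of GPU_TIERS) {
    if (seconds <= prevCap) break;
    const secondsInTier = Math.min(seconds, tier.upTo) - prevCap;
    cost += secondsInTier * tier.price;
    prevCap = tier.upTo;
  }
  return cost;
}

// A 2-hour job: 3600s at $0.0004 plus 3600s at $0.00035, about $2.70
```

The final tier has no up_to, so it prices everything past 10 GPU-hours.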
Batch jobs (meter at completion)
For jobs with a known duration at the end — transcoding, inference runs, training steps. The job completes, you know the wall-clock time, you meter it.
import { Limitr } from '@formata/limitr';
import { readFileSync } from 'fs';

const policy = await Limitr.new(readFileSync('./policy.yaml', 'utf-8'), 'yaml');

// `billing` is your own billing client — queue soft-limit overages for invoicing.
policy.addHandler('billing', (key: string, value: unknown) => {
  if (key === 'meter-overage') {
    const event = JSON.parse(value as string);
    billing.queueCharge({
      customerId: event.customer.id,
      entitlement: event.entitlement,
      seconds: event.overage,
      credit: event.credit.description,
    });
  }
});

async function handleGpuJob(
  customerId: string,
  jobConfig: { estimatedSeconds: number; script: string }
) {
  await policy.ensureCustomer(customerId, 'pro');

  if (!await policy.check(customerId, 'gpu_access')) {
    return { error: 'GPU access not available on this plan', code: 'NO_ACCESS' };
  }

  // Pre-flight check against the estimate.
  // For long jobs, failing here is far cheaper than failing mid-run.
  // check() is read-only — quota is not consumed yet.
  if (!await policy.check(customerId, 'gpu_compute', jobConfig.estimatedSeconds)) {
    const remaining = await policy.remaining(customerId, 'gpu_compute');
    return {
      error: 'Insufficient GPU quota for this job',
      code: 'QUOTA_INSUFFICIENT',
      remainingSeconds: remaining ?? 0,
      estimatedSeconds: jobConfig.estimatedSeconds,
    };
  }

  const startedAt = Date.now();
  let actualSeconds = 0;
  try {
    const result = await gpuCluster.run(jobConfig);
    actualSeconds = Math.ceil((Date.now() - startedAt) / 1000);
    // Meter actual duration at completion
    await policy.allow(customerId, 'gpu_compute', actualSeconds);
    return { success: true, duration: actualSeconds, result };
  } catch (err) {
    // Job crashed — meter whatever ran. Don't swallow the duration.
    actualSeconds = Math.ceil((Date.now() - startedAt) / 1000);
    if (actualSeconds > 0) {
      await policy.allow(customerId, 'gpu_compute', actualSeconds);
    }
    return {
      error: 'Job failed',
      code: 'JOB_ERROR',
      cause: (err as Error).message,
      secondsCharged: actualSeconds, // tell the customer what they're being charged for
    };
  }
}

async function handleVideoTranscode(
  customerId: string,
  video: { durationSeconds: number; inputKey: string }
) {
  if (!await policy.check(customerId, 'video_access')) {
    return { error: 'Video transcoding requires Pro or Enterprise', code: 'NO_ACCESS' };
  }

  // For video, billing by input duration (known upfront) is more predictable
  // than billing by wall-clock transcode time. See Notes.
  const billableSeconds = video.durationSeconds;
  if (!await policy.check(customerId, 'video_transcode', billableSeconds)) {
    const remaining = await policy.remaining(customerId, 'video_transcode');
    return {
      error: 'Monthly video transcoding limit would be exceeded',
      code: 'QUOTA_INSUFFICIENT',
      remainingSeconds: remaining ?? 0,
      requiredSeconds: billableSeconds,
    };
  }

  const result = await transcoder.run(video.inputKey);
  await policy.allow(customerId, 'video_transcode', billableSeconds);
  return { success: true, outputKey: result.outputKey, secondsCharged: billableSeconds };
}
Live sessions (meter at stop)
For interactive GPU sessions, Jupyter notebooks, or any resource where a user starts a session and stops it later. You don't know the duration upfront — you meter when the session ends.
// In-memory session tracking. Use Redis or your DB in production.
const activeSessions = new Map<string, { customerId: string; startedAt: number }>();

async function startSession(customerId: string, sessionId: string) {
  await policy.ensureCustomer(customerId, 'pro');

  if (!await policy.check(customerId, 'gpu_access')) {
    return { error: 'GPU access not available on this plan', code: 'NO_ACCESS' };
  }

  // For live sessions we can't check an exact duration — just ensure the
  // customer isn't already at zero before starting.
  const remaining = await policy.remaining(customerId, 'gpu_compute');
  if (remaining !== null && remaining <= 0) {
    return { error: 'No GPU quota remaining', code: 'QUOTA_EXHAUSTED', remaining: 0 };
  }

  await gpuCluster.startSession(sessionId);
  activeSessions.set(sessionId, { customerId, startedAt: Date.now() });
  return { success: true, sessionId, remainingSeconds: remaining };
}

async function stopSession(sessionId: string) {
  const session = activeSessions.get(sessionId);
  if (!session) return { error: 'Session not found' };
  activeSessions.delete(sessionId);

  // Meter before tearing down the GPU — if the cluster call below throws,
  // the elapsed time has still been charged.
  const durationSeconds = Math.ceil((Date.now() - session.startedAt) / 1000);
  await policy.allow(session.customerId, 'gpu_compute', durationSeconds);

  await gpuCluster.stopSession(sessionId);
  const remaining = await policy.remaining(session.customerId, 'gpu_compute');
  return { success: true, durationSeconds, remainingSeconds: remaining ?? 0 };
}

// Reaper: detect sessions clients abandoned without calling stop.
// Call from a cron or background worker every N minutes.
// Note: this approximates "idle" with total elapsed time since start, so it also
// caps session lifetime. For true idle detection, refresh a lastSeenAt timestamp
// on each client heartbeat and compare against that instead.
async function reapAbandonedSessions(maxIdleSeconds = 1800) {
  const now = Date.now();
  for (const [sessionId, session] of activeSessions) {
    const elapsed = (now - session.startedAt) / 1000;
    if (elapsed > maxIdleSeconds) {
      console.warn(`Reaping abandoned session ${sessionId} after ${elapsed}s`);
      await stopSession(sessionId); // meters and cleans up
    }
  }
}
async function getComputeDisplay(customerId: string) {
  const used = await policy.value(customerId, 'gpu_compute') ?? 0;
  const limit = await policy.limit(customerId, 'gpu_compute') ?? 0;
  const remaining = await policy.remaining(customerId, 'gpu_compute') ?? 0;
  const usedPct = await policy.value(customerId, 'gpu_compute', true) ?? 0;

  const activeSession = [...activeSessions.entries()]
    .find(([_, s]) => s.customerId === customerId);

  return {
    usedSeconds: used,
    limitSeconds: limit,
    remainingSeconds: remaining,
    usedPercent: usedPct,
    activeSession: activeSession ? {
      sessionId: activeSession[0],
      runningSince: activeSession[1].startedAt,
      elapsedSeconds: Math.floor((Date.now() - activeSession[1].startedAt) / 1000),
    } : null,
  };
}
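Raw second counts are awkward in a usage UI. A small display helper (not part of Limitr; `formatDuration` is a hypothetical name) can render them:

```typescript
// Hypothetical display helper — turns raw seconds from the meter into a
// human-readable string for the usage UI.
function formatDuration(totalSeconds: number): string {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = Math.floor(totalSeconds % 60);
  if (h > 0) return `${h}h ${m}m`;
  if (m > 0) return `${m}m ${s}s`;
  return `${s}s`;
}

// formatDuration(8040) → '2h 14m'
```

Apply it to usedSeconds, remainingSeconds, and elapsedSeconds before sending them to the frontend, or keep the raw values and format client-side.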
Notes
Always pre-flight check long jobs — for a 10ms API call, discovering a quota problem after the fact is annoying but cheap. For a 4-hour GPU job, it's expensive for you and infuriating for the customer. Use check() before starting any job longer than a few seconds. For live sessions where duration isn't known upfront, check that quota is non-zero before allowing the session to start.
Meter at completion, not at start — consuming quota upfront creates a refund problem: if the job fails, you have to release quota. Metering at completion with allow() is simpler and more correct. The tradeoff is that a customer could start more concurrent jobs than their quota supports — if that's a concern, track in-flight estimates in your application layer and deduct them from remaining() before deciding whether to start a new job.
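That in-flight tracking might be sketched like this (application-level bookkeeping, not a Limitr feature; `reserveEstimate`, `releaseEstimate`, and `effectiveRemaining` are hypothetical helpers): reserve each job's estimate at start, release it once the job is metered, and subtract reservations from remaining() before admitting new work.

```typescript
// Sketch: per-customer reservations of estimated seconds for in-flight jobs,
// so concurrent jobs can't collectively overrun a quota each would pass alone.
const inFlight = new Map<string, number>();

function reserveEstimate(customerId: string, seconds: number) {
  inFlight.set(customerId, (inFlight.get(customerId) ?? 0) + seconds);
}

function releaseEstimate(customerId: string, seconds: number) {
  const next = (inFlight.get(customerId) ?? 0) - seconds;
  if (next <= 0) inFlight.delete(customerId);
  else inFlight.set(customerId, next);
}

// Admission check: what's left after subtracting reserved estimates.
// null means unlimited (no limit configured), so admit unconditionally.
function effectiveRemaining(remaining: number | null, customerId: string): number | null {
  if (remaining === null) return null;
  return remaining - (inFlight.get(customerId) ?? 0);
}
```

Call releaseEstimate in a finally block around the job so reservations can't leak when a job crashes.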
Meter partial durations on failure — when a job crashes mid-run, meter what ran. Skipping the allow() call on failure means customers who crash jobs repeatedly get unlimited free compute. The error response should always tell the customer exactly how many seconds they're being charged for.
Billing by video duration vs. wall-clock transcode time — wall-clock transcode time depends on infrastructure utilization and codec complexity, so it's noisy and hard for customers to predict. Video duration is stable, predictable, and understood by the customer. For video products, billing by input/output video duration (a known quantity before the job starts) is almost always the better choice; fall back to wall-clock time only if that's what actually drives your costs.
Live session abandonment — clients disconnect, browsers crash, mobile apps get backgrounded. Any live session pattern needs a reaper: a background process that detects sessions with no recent activity (ideally via client heartbeats) and calls stopSession(). The reaper in the example above approximates this with elapsed time since start, and meters the full elapsed duration regardless of why the session ended. Set maxIdleSeconds to match your infrastructure's auto-shutdown behavior.
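A heartbeat-based variant of the reaper could look like the sketch below. The names here are illustrative: recordHeartbeat would be wired to a periodic client-side ping, and the session map stands in for the tracking store used above.

```typescript
// Sketch: idle detection via client heartbeats rather than total elapsed time.
type LiveSession = { customerId: string; startedAt: number; lastSeenAt: number };
const sessions = new Map<string, LiveSession>();

// Called by a heartbeat endpoint the client pings every minute or so.
// `now` is injectable for testing; defaults to the real clock.
function recordHeartbeat(sessionId: string, now = Date.now()) {
  const s = sessions.get(sessionId);
  if (s) s.lastSeenAt = now;
}

// Sessions whose last heartbeat is older than the threshold; the caller
// meters and tears each one down via stopSession().
function findIdleSessions(maxIdleSeconds: number, now = Date.now()): string[] {
  return [...sessions.entries()]
    .filter(([, s]) => (now - s.lastSeenAt) / 1000 > maxIdleSeconds)
    .map(([id]) => id);
}
```

This decouples "session is long-running" from "session is abandoned": a four-hour notebook session with a live tab keeps heartbeating and survives, while a crashed client goes quiet and is reaped.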
stof_units: s and unit strings — with stof_units: s, you can pass durations as unit strings anywhere Limitr accepts a value: '2min', '1hr', '90s'. The meter always stores seconds. This is useful when your upstream reports duration in a different unit — build the string with a template literal, e.g. `${response.duration_ms}ms` (backticks, not single quotes, so the value interpolates), and let Limitr convert.
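The conversion itself is simple. This hypothetical parser shows roughly what such a unit-string conversion does when the meter stores seconds (the exact unit grammar Limitr accepts may differ):

```typescript
// Sketch of duration-string parsing into seconds. Not Limitr's implementation;
// just the shape of the conversion for the units used in this doc.
function toSeconds(value: string): number {
  const m = value.trim().match(/^([\d.]+)\s*(ms|s|min|hr)$/);
  if (!m) throw new Error(`unrecognized duration: ${value}`);
  const n = parseFloat(m[1]);
  switch (m[2]) {
    case 'ms': return n / 1000;
    case 's': return n;
    case 'min': return n * 60;
    case 'hr': return n * 3600;
  }
  throw new Error('unreachable');
}

// toSeconds('2min') → 120
```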