# Production Operations Report

**Date:** 2026-05-31  
**Scope:** Queue, scheduler, mail, realtime, backups, health

---

## Queue workers

| Item | Status | Notes |
|------|--------|-------|
| Dedicated worker service | ✅ Defined | `docker-compose.production.yml` — `queue` service |
| Redis connection | ✅ Configured | `QUEUE_CONNECTION=redis` on app service |
| Retry policy | ✅ | `--tries=3 --backoff=10,30,60` |
| Failed jobs table | ✅ | Laravel `failed_jobs` migration present |
| Monitoring | ⚠️ Partial | Runbook documents backlog checks; no automated alert |

**Gap:** Queue/reverb/scheduler containers lack Docker healthchecks (only MySQL has one).
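
The retry policy in the table above (`--tries=3 --backoff=10,30,60`) can be modelled with a short sketch. It assumes Laravel's documented list-backoff behaviour, where the Nth retry waits for the Nth value and the last value repeats once the list runs out. The function names are illustrative and not part of the project.

```python
from typing import List


def parse_backoff(option: str) -> List[int]:
    """Turn a comma-separated option value such as "10,30,60" into seconds."""
    return [int(part) for part in option.split(",")]


def retry_delay(failed_attempts: int, backoff: List[int]) -> int:
    """Seconds to wait before the next attempt, after `failed_attempts` failures.

    Assumption: the Nth retry uses the Nth backoff value, and the last
    value is reused when there are more retries than values.
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    index = min(failed_attempts - 1, len(backoff) - 1)
    return backoff[index]


def retry_schedule(tries: int, backoff: List[int]) -> List[int]:
    """Delays between attempts for a job that fails on every attempt."""
    # `tries` counts total attempts, so there are tries - 1 retries.
    return [retry_delay(n, backoff) for n in range(1, tries)]


schedule = retry_schedule(3, parse_backoff("10,30,60"))  # [10, 30]
```

Under this model a job limited to three tries waits 10 s and then 30 s before it lands in `failed_jobs`; the 60 s value would only come into play if `--tries` were raised to 4 or more. This is worth confirming against the worker's actual behaviour before tuning the policy.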

---

## Scheduler

| Item | Status | Notes |
|------|--------|-------|
| `schedule:work` container | ✅ | Single long-running scheduler in compose |
| SLA jobs | ✅ | Registered in Laravel scheduler (see `routes/console.php`) |
| Cleanup / notification jobs | ✅ | Documented in operations runbook |
| HA / redundancy | ⚠️ | Single scheduler container — SPOF acceptable for current scale |

---

## Mail (Mailgun)

| Item | Status | Notes |
|------|--------|-------|
| Mailgun mailer config | ✅ | `config/mail.php`, `config/services.php` |
| Env template | ✅ | `docs/PRODUCTION_ENV_TEMPLATE.md` |
| Bounce handling | ⚠️ Unverified | No dedicated webhook handler audited in this sprint |
| Failure logging | ✅ | Laravel mail + notification failure paths |
| Health probe | ✅ **New** | `GET /health/mail` — validates mailer + from address + Mailgun credentials |

---

## Realtime (Reverb)

| Item | Status | Notes |
|------|--------|-------|
| Reverb service | ✅ | Separate compose service on port 8080 |
| Broadcast listeners | ✅ | Collaborator, assign, reply, note events wired |
| Restart strategy | ⚠️ Manual | Runbook: restart reverb container on disconnect |
| Health probe | ✅ | `GET /health/reverb` |

---

## Backups

### Database

| Policy | Recommendation |
|--------|----------------|
| Frequency | Daily full backup + hourly binary log (binlog) archiving if available |
| Retention | 30 daily, 12 monthly |
| Storage | Off-site object storage (encrypted) |
| Restore test | Quarterly restore drill to staging |
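
The retention row above (30 daily, 12 monthly) leaves open which backup counts as the monthly one. The sketch below is one possible reading, assuming the daily tier means "taken within the last 30 days" and the monthly tier means "earliest backup in each of the 12 most recent months that have one". `backups_to_keep` is a hypothetical helper, not something in the repo.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple


def backups_to_keep(
    backup_dates: Iterable[date],
    today: date,
    daily: int = 30,
    monthly: int = 12,
) -> Set[date]:
    """Return the backup dates that the retention policy would retain."""
    dates = sorted(d for d in set(backup_dates) if d <= today)
    keep: Set[date] = set()

    # Daily tier: everything inside a `daily`-day window ending today.
    cutoff = today - timedelta(days=daily - 1)
    keep.update(d for d in dates if d >= cutoff)

    # Monthly tier: the earliest backup of each calendar month,
    # limited to the `monthly` most recent months that have a backup.
    first_of_month: Dict[Tuple[int, int], date] = {}
    for d in dates:
        first_of_month.setdefault((d.year, d.month), d)
    for month_key in sorted(first_of_month, reverse=True)[:monthly]:
        keep.add(first_of_month[month_key])

    return keep
```

Anything not in the returned set is a candidate for pruning. Choosing the earliest backup of the month is arbitrary; the last backup of the month would work equally well as long as the rule is applied consistently.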

### Restore process (summary)

1. Stop app and queue workers  
2. Restore MySQL dump to target instance  
3. Run `php artisan migrate --force` if the restored schema is behind the deployed code  
4. Clear the Redis cache and queue, or flush only the non-critical keys  
5. Restart queue, scheduler, reverb  
6. Verify that `GET /health` reports healthy, then smoke-test login  

The full runbook steps should be documented in the ops vault; the restore is not automated in this repo.
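
Since the restore is manual, the order of the steps above is easy to get wrong under pressure. The sketch below shows one way to encode the sequence so that it halts at the first failing step and can be rehearsed as a dry run. Every command is a placeholder: the `scheduler` and `reverb` service names, the `DB_DATABASE` and `DUMP_FILE` variables, and the health URL are assumptions, not values taken from the repo.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, List[str]]

COMPOSE = ["docker", "compose", "-f", "docker-compose.production.yml"]
ARTISAN = COMPOSE + ["run", "--rm", "app", "php", "artisan"]

# Placeholder commands mirroring the six summary steps (assumptions throughout).
RESTORE_STEPS: List[Step] = [
    ("Stop app and queue workers", COMPOSE + ["stop", "app", "queue"]),
    ("Restore MySQL dump", ["sh", "-c", 'mysql "$DB_DATABASE" < "$DUMP_FILE"']),
    # Running migrate is harmless when nothing is pending.
    ("Run migrations", ARTISAN + ["migrate", "--force"]),
    ("Clear cache", ARTISAN + ["cache:clear"]),
    # `app` is included here because step 1 stopped it.
    ("Restart services", COMPOSE + ["up", "-d", "app", "queue", "scheduler", "reverb"]),
    ("Verify /health", ["curl", "-fsS", "http://localhost/health"]),
]


def dry_run(argv: Sequence[str]) -> int:
    """Print the command instead of executing it."""
    print("would run:", shlex.join(argv))
    return 0


def real_run(argv: Sequence[str]) -> int:
    """Execute the command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def run_restore(
    steps: Sequence[Step],
    execute: Callable[[Sequence[str]], int] = dry_run,
) -> List[str]:
    """Run steps in order; stop at the first non-zero exit code."""
    completed: List[str] = []
    for name, argv in steps:
        code = execute(argv)
        if code != 0:
            raise RuntimeError(f"{name} failed with exit code {code}; completed: {completed}")
        completed.append(name)
    return completed
```

Calling `run_restore(RESTORE_STEPS)` only prints the commands; passing `execute=real_run` would run them. The smoke login in step 6 stays a manual check in this sketch.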

---

## Health endpoints

| Endpoint | Checks |
|----------|--------|
| `GET /health` | **Aggregate** — database, cache, queue, mail, storage, reverb |
| `GET /health/app` | Liveness (debug/env hidden in production) |
| `GET /health/database` | PDO + `select 1` |
| `GET /health/cache` | Read/write round-trip |
| `GET /health/queue` | Driver ping + queue size |
| `GET /health/mail` | Mailer config + Mailgun credentials |
| `GET /health/storage` | Write/read/delete on default disk |
| `GET /health/reverb` | Broadcasting config |
| `GET /up` | Laravel built-in probe |

**Tests:** `tests/Feature/Health/HealthEndpointsTest.php` covers the aggregate, cache, mail, and storage endpoints.
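
For a quick manual check of every probe in the table, a sweep along the following lines may help. It is a sketch that treats HTTP 200 as healthy, which is an assumption: this report does not state which status codes the endpoints return. `sweep` and `failing` are hypothetical names.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Paths copied from the health endpoints table.
ENDPOINTS: List[str] = [
    "/health",
    "/health/app",
    "/health/database",
    "/health/cache",
    "/health/queue",
    "/health/mail",
    "/health/storage",
    "/health/reverb",
    "/up",
]


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if the request could not complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a status code.
        return exc.code
    except (urllib.error.URLError, OSError):
        return 0


def sweep(base_url: str, fetch: Callable[[str], int] = http_status) -> Dict[str, int]:
    """Probe every endpoint once and map path to status code."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) for path in ENDPOINTS}


def failing(results: Dict[str, int]) -> List[str]:
    """Paths whose status is not 200 (assumed to mean healthy)."""
    return sorted(path for path, status in results.items() if status != 200)
```

`failing(sweep("https://example.test"))` would return the paths that need attention; the base URL here is a placeholder.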

---

## Deployment CI

| Item | Status |
|------|--------|
| `.github/workflows/deploy.yml` | Scaffold only — not wired to production |
| Post-deploy health gate | ❌ Not automated |
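
A post-deploy health gate could be as small as the sketch below: poll the aggregate `GET /health` endpoint until it answers or the attempts run out, then turn the result into an exit code the workflow can act on. It assumes HTTP 200 means healthy; the attempt count, delay, and function names are illustrative, not taken from `deploy.yml`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def health_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 10,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            sleep(delay)
    return False


def gate_exit_code(url: str, attempts: int = 10, delay: float = 3.0) -> int:
    """0 if the service became healthy in time, 1 otherwise."""
    healthy = wait_until_healthy(lambda: health_ok(url), attempts=attempts, delay=delay)
    return 0 if healthy else 1
```

A deploy job could call `sys.exit(gate_exit_code(...))` as its final step so that an unhealthy release fails the pipeline. Injecting `check` and `sleep` keeps the retry logic testable without a network.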

---

## Operations readiness score: **7.5 / 10**

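Infrastructure definitions and health probes are in place. Remaining gaps: mail bounce handling is unverified on staging and live; the queue, Reverb, and scheduler containers have no Docker healthchecks in compose; deployment and the post-deploy health gate are not automated; backup automation is not in the repo.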
