Observability

0) Scope

Observability map based solely on what is versioned in the repository:

  • log sources;
  • existing healthchecks;
  • operational metrics that can be extracted;
  • where to query logs in each environment.

0.1 Evidence sources

  • docker-compose.yml
  • docker/nginx.conf
  • helm/values.yaml
  • helm/templates/deployment.yaml
  • helm/templates/wp-cron-deployment.yaml
  • .github/workflows/ci-cd-pipeline.yml
  • we-dhedalos/functions/rest/system_logs.php
  • we-dhedalos/functions/post_types/dynamic_logs.php
  • we-dhedalos/functions/post_types/user_action_log.php
  • we-dhedalos/functions/post_types/presence_log.php
  • we-dhedalos/functions/post_types/cancellation_logs.php
  • we-dhedalos/functions/3rd/log.php
  • we-dhedalos/functions/3rd/simplybook.php
  • we-dhedalos/functions/utils/user_patch_log.php
  • we-dhedalos/functions/utils/log_user.php
  • we-dhedalos/functions/utils/cache.php
  • we-dhedalos/functions.php
  • we-dhedalos/functions/utils/*_cron.php

1) Log conventions

1.1 Infrastructure

Local runtime containers:

  • nginx
  • wordpress
  • mariadb
  • redis

Local collection:

docker compose logs -f nginx wordpress mariadb redis

Evidence:

  • docker-compose.yml:2

1.2 Structured domain logs (WordPress)

Source A: custom post type (CPT) dynamic_logs

Useful fields:

  • post_type
  • post_id
  • post_title
  • user_id
  • user_name
  • user_email
  • user_ip
  • action
  • details
  • timestamp

Evidence:

  • we-dhedalos/functions/post_types/dynamic_logs.php:320

Source B: table ${prefix}simplybook_api_requests_log

Useful fields:

  • endpoint
  • http_method
  • request_params
  • response
  • timestamp
  • user_info (ID of the logged-in user, when there is one)
  • origin (cache or api)
  • ip_address
  • user_agent

Evidence:

  • we-dhedalos/functions/3rd/log.php:38
  • we-dhedalos/functions/3rd/simplybook.php:53

Source C: table ${prefix}user_patch_log

Useful fields:

  • user_id
  • received
  • created_at

Evidence:

  • we-dhedalos/functions/utils/user_patch_log.php:21

Source D: table ${prefix}user_logs

Useful fields:

  • message
  • created_at

Evidence:

  • we-dhedalos/functions/utils/log_user.php:13

1.3 Request-id and user-id

Current state observed:

  • user-id: present in multiple sources (dynamic_logs.user_id, simplybook_api_requests_log.user_info, etc.).
  • request-id/trace-id: no correlation pattern is implemented in the repo.

Evidence:

  • we-dhedalos/functions/post_types/dynamic_logs.php:324
  • docker/nginx.conf:1

2) Healthchecks

2.1 Kubernetes probes

Healthchecks configured in the chart:

  • livenessProbe: GET /
  • readinessProbe: GET /

Evidence:

  • helm/values.yaml:66
  • helm/values.yaml:70
  • helm/templates/deployment.yaml:94

2.2 Application smoke-test endpoints

No dedicated /health or /healthz endpoint was found in the code.

Utility endpoints that can serve as smoke tests:

  • GET /api/dhedalos/v1/theme_settings (public)
  • GET /api/dhedalos/v1/maintenance_mode?slug=hub|cadastro (public)

Evidence:

  • we-dhedalos/functions/rest/theme_settings.php:35
  • we-dhedalos/functions/rest/maintenance_mode.php:30
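
Since there is no dedicated health endpoint, the two public endpoints from section 2.2 can be combined into a minimal smoke check. The sketch below is only an illustration: the function name smoke_check and the base URL argument are hypothetical, and it assumes that an HTTP 200 response means the endpoint is healthy.

```shell
# Hypothetical helper: smoke_check <base-url>
# Requests the two public endpoints and prints "<status> <path>" for each.
# Returns non-zero if any endpoint answers with something other than 200
# (assumption: 200 is the only status considered healthy).
smoke_check() {
  local base_url="$1" status failed=0 path
  for path in \
    "/api/dhedalos/v1/theme_settings" \
    "/api/dhedalos/v1/maintenance_mode?slug=hub"; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "${base_url}${path}")"
    echo "${status} ${path}"
    [ "${status}" = "200" ] || failed=1
  done
  return "${failed}"
}
```

Example usage, with a placeholder host: smoke_check "https://<host>" && echo "smoke OK".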

3) Key metrics (extractable from what already exists)

No metrics stack (Prometheus/Grafana/OTel) is versioned in this repository. The metrics below are operational and derived from existing logs, endpoints, and tables.

3.1 Latency

Critical points:

  • SimplyBook calls (timeout configured at 30 s)
  • cleanup of external submissions (20 s timeout)

Evidence:

  • we-dhedalos/functions/3rd/simplybook.php:127
  • we-dhedalos/functions.php:104

Practical signal:

  • an increase in timeout errors in the wordpress container logs (error_log).
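
One way to turn that signal into a number is to count log lines that mention a timeout. This is a minimal sketch: count_timeouts is a hypothetical helper, and the match pattern is an assumption that should be adjusted to the messages the application actually writes.

```shell
# Hypothetical helper: count_timeouts
# Reads log lines on stdin and prints how many of them mention a timeout.
# The pattern below is an assumption, not taken from the codebase.
count_timeouts() {
  # -c: print the match count, -i: case-insensitive, -E: extended regex.
  # grep exits 1 when nothing matches; "|| true" keeps the helper's status at 0.
  grep -ciE 'timed out|timeout' || true
}
```

Example usage against the local stack: docker compose logs --since 1h wordpress | count_timeouts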

3.2 Errors

Critical points:

  • 4xx/5xx API errors;
  • failures in the Novu/SimplyBook integrations;
  • failures in daily jobs.

Evidence:

  • audit endpoint: we-dhedalos/functions/rest/system_logs.php:33
  • Novu error: we-dhedalos/functions/utils/notify_course_start_date_cron.php:98
  • cancellation cron error: we-dhedalos/functions/utils/auto_cancel_enrollments_cron.php:152

3.3 Asynchronous processing (cron instead of a queue)

Current state:

  • asynchronous processing runs through WP-Cron;
  • there is no broker-based queue (RabbitMQ/SQS/Kafka);
  • Kubernetes has a dedicated wp-cron deployment that periodically runs due events.

Evidence:

  • we-dhedalos/functions.php:45
  • we-dhedalos/functions/utils/auto_cancel_enrollments_cron.php:31
  • helm/templates/wp-cron-deployment.yaml:4
  • helm/templates/wp-cron-deployment.yaml:89

Useful metrics:

  • number of overdue events;
  • delay of the daily hooks;
  • failure rate per cron hook.

Commands:

docker compose exec wordpress wp cron event list
docker compose exec wordpress wp cron event run --due-now
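
The number of overdue events can be derived from the event list itself. The sketch below assumes the installed WP-CLI version exposes the optional time field (the scheduled Unix timestamp) and CSV output; count_due_events is a hypothetical helper name.

```shell
# Hypothetical helper: count_due_events <now-epoch>
# Reads CSV rows "hook,time" on stdin (header row included), where "time" is
# the event's scheduled Unix timestamp, and prints how many events are due.
count_due_events() {
  # Skip the header (NR > 1) and malformed rows (NF >= 2);
  # "+ 0" forces a numeric comparison and prints 0 when nothing is due.
  awk -F',' -v now="$1" \
    'NR > 1 && NF >= 2 && $2 + 0 <= now + 0 { due++ } END { print due + 0 }'
}
```

Example usage: docker compose exec -T wordpress wp cron event list --fields=hook,time --format=csv | count_due_events "$(date +%s)"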

3.4 Database and cache

Useful signals:

  • growth of dynamic_logs, user_log_action, sub_log_action, cancellation_logs;
  • growth of simplybook_api_requests_log;
  • cache flush/invalidation behavior.

Evidence:

  • dynamic_logs cleanup: we-dhedalos/functions/post_types/dynamic_logs.php:1016
  • user_log_action cleanup: we-dhedalos/functions/post_types/user_action_log.php:105
  • sub_log_action cleanup: we-dhedalos/functions/post_types/presence_log.php:133
  • cancellation_logs cleanup: we-dhedalos/functions/post_types/cancellation_logs.php:117
  • global flush: we-dhedalos/functions/utils/cache.php:28
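
Growth of the four log types can be tracked with a single grouped count. This sketch rests on an assumption not confirmed by the evidence files: that all four names are post type slugs stored in the standard WordPress posts table. The helper name cpt_count_sql is hypothetical.

```shell
# Hypothetical helper: cpt_count_sql <table-prefix>
# Prints a SQL statement that counts rows per log post type.
# Assumption: the four log types live in <prefix>posts under these slugs.
cpt_count_sql() {
  printf "SELECT post_type, COUNT(*) AS total FROM %sposts WHERE post_type IN ('dynamic_logs','user_log_action','sub_log_action','cancellation_logs') GROUP BY post_type;" "$1"
}
```

Example usage with the local default prefix: docker compose exec wordpress wp db query "$(cpt_count_sql wp_)"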

4) Where to look at logs

4.1 Local

docker compose logs -f nginx wordpress mariadb redis

4.2 Kubernetes

Environments mapped in the pipeline:

  • piloto-dhedalos-ecosystem
  • dhedalos-ecosystem
  • essencia-ecosystem
  • dev-dhedalos-wp

Evidence:

  • .github/workflows/ci-cd-pipeline.yml:56
  • .github/workflows/ci-cd-pipeline.yml:88
  • .github/workflows/ci-cd-pipeline.yml:120
  • .github/workflows/ci-cd-pipeline.yml:152

Base commands:

kubectl -n <namespace> get deploy
kubectl -n <namespace> logs deploy/<deployment-name> -c dhedalos-app-backend-wp-phpfpm --tail=200 -f
kubectl -n <namespace> logs deploy/<deployment-name> -c dhedalos-app-backend-wp-nginx --tail=200 -f
kubectl -n <namespace> logs deploy/<deployment-name>-wp-cron -c wp-cron --tail=200 -f

Evidence for the container names:

  • helm/templates/deployment.yaml:49
  • helm/templates/deployment.yaml:85
  • helm/templates/wp-cron-deployment.yaml:54

4.3 Internal audit API

Endpoints:

  • GET /api/dhedalos/v1/system-logs
  • GET /api/dhedalos/v1/system-logs/stats

Evidence:

  • we-dhedalos/functions/rest/system_logs.php:33
  • we-dhedalos/functions/rest/system_logs.php:45

4.4 Pipeline logs

For build/deploy failures, check the GitHub Actions logs for the CI/CD Pipeline workflow.

Evidence:

  • .github/workflows/ci-cd-pipeline.yml:1

5) Support queries (operations)

Note:

  • replace <prefix> with the actual table prefix (WORDPRESS_TABLE_PREFIX; the local default is wp_).

5.1 Check retention and volume of the log tables

docker compose exec wordpress wp db query "SELECT COUNT(*) AS total FROM <prefix>user_patch_log;"
docker compose exec wordpress wp db query "SELECT COUNT(*) AS total FROM <prefix>simplybook_api_requests_log;"
docker compose exec wordpress wp db query "SELECT COUNT(*) AS total FROM <prefix>user_logs;"

5.2 Check the latest SimplyBook events

docker compose exec wordpress wp db query "SELECT endpoint,http_method,origin,timestamp FROM <prefix>simplybook_api_requests_log ORDER BY id DESC LIMIT 20;"

5.3 Check the cron backlog

docker compose exec wordpress wp cron event list

6) Observed log retention

Source                        Observed policy
----------------------------  ---------------------------------------------------------------
dynamic_logs                  removes entries from before the current year
user_log_action               removes entries older than 90 days
sub_log_action                removes entries older than 90 days
cancellation_logs             removes entries older than 90 days
simplybook_api_requests_log   delete_old_logs() exists with a 1-week window; no scheduling found in the repo

Evidence:

  • we-dhedalos/functions/post_types/dynamic_logs.php:1016
  • we-dhedalos/functions/post_types/user_action_log.php:105
  • we-dhedalos/functions/post_types/presence_log.php:133
  • we-dhedalos/functions/post_types/cancellation_logs.php:117
  • we-dhedalos/functions/3rd/log.php:105
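
Because no scheduling was found for delete_old_logs(), it can be useful to check whether rows older than its one-week window are accumulating in simplybook_api_requests_log. The sketch below assumes the timestamp column is a DATETIME/TIMESTAMP type on MySQL/MariaDB; stale_simplybook_sql is a hypothetical helper name.

```shell
# Hypothetical helper: stale_simplybook_sql <table-prefix>
# Prints a SQL statement that counts rows older than 7 days and shows the
# oldest timestamp among them.
# Assumption: "timestamp" is a DATETIME/TIMESTAMP column.
stale_simplybook_sql() {
  printf "SELECT COUNT(*) AS stale, MIN(timestamp) AS oldest FROM %ssimplybook_api_requests_log WHERE timestamp < NOW() - INTERVAL 7 DAY;" "$1"
}
```

Example usage: docker compose exec wordpress wp db query "$(stale_simplybook_sql wp_)". A stale count that keeps growing would suggest the cleanup is not running.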

7) Open items

  • No explicitly versioned external observability backend was identified (e.g., CloudWatch, ELK, Datadog, Dokku log drain), nor any documentation on querying such providers.
  • No request-id/trace-id correlation pattern is implemented across Nginx, WordPress, and the external integrations.
  • There is no dedicated application healthcheck endpoint (/health); the chart only defines HTTP probes on /.
  • The cleanup routine for simplybook_api_requests_log exists in code, but no versioned hook/event was found that runs it automatically.