Setting up observability in a Goa service involves configuring Clue and OpenTelemetry. This guide walks through the essential setup steps.
First, add the required dependencies to your go.mod (run go get to resolve versions):
require (
    goa.design/clue
    go.opentelemetry.io/otel
    go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc
    go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
    go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
    go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
)
These packages provide:

- clue: Goa's observability toolkit
- otel: OpenTelemetry core functionality
- otlpmetricgrpc and otlptracegrpc: OTLP exporters for sending telemetry data
- otelhttp and otelgrpc: auto-instrumentation for HTTP and gRPC

The logger context is the foundation of your observability setup. It carries configuration and correlation IDs throughout your application:
// Configure logger format based on environment
format := log.FormatJSON
if log.IsTerminal() {
    format = log.FormatTerminal // Human-readable format for development
}

// Create base context with formatting and span tracking
ctx := log.Context(context.Background(),
    log.WithFormat(format), // Set output format
    log.WithFunc(log.Span)) // Include trace/span IDs in logs

// Enable debug logging if needed
if *debugf {
    ctx = log.Context(ctx, log.WithDebug())
    log.Debugf(ctx, "debug logs enabled")
}

// Add service information
ctx = log.With(ctx,
    log.KV{K: "service", V: serviceName},
    log.KV{K: "version", V: version},
    log.KV{K: "env", V: environment})
The logger context provides:

- an output format suited to the environment (JSON in production, human-readable in a terminal)
- trace and span IDs on every entry, via log.WithFunc(log.Span)
- an optional debug level for verbose logging
- service-wide key-value metadata (service, version, env) attached to all entries
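For example, any code that receives this context logs with the configured format and fields (this sketch uses only the log calls that appear elsewhere in this guide):

// Both entries inherit service/version/env and, inside a traced
// request, the trace and span IDs added by log.WithFunc(log.Span)
log.Printf(ctx, "starting %s", serviceName)
log.Debugf(ctx, "collector address: %s", *coladdr) // emitted only when WithDebug is set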
OpenTelemetry setup involves creating exporters and configuring global providers:
// Create OTLP exporters for sending telemetry to a collector
spanExporter, err := otlptracegrpc.New(ctx,
    otlptracegrpc.WithEndpoint(*coladdr),
    otlptracegrpc.WithTLSCredentials(insecure.NewCredentials()))
if err != nil {
    log.Fatalf(ctx, err, "failed to initialize tracing")
}
defer func() {
    ctx := log.Context(context.Background())
    if err := spanExporter.Shutdown(ctx); err != nil {
        log.Errorf(ctx, err, "failed to shutdown tracing")
    }
}()

metricExporter, err := otlpmetricgrpc.New(ctx,
    otlpmetricgrpc.WithEndpoint(*coladdr),
    otlpmetricgrpc.WithTLSCredentials(insecure.NewCredentials()))
if err != nil {
    log.Fatalf(ctx, err, "failed to initialize metrics")
}
defer func() {
    ctx := log.Context(context.Background())
    if err := metricExporter.Shutdown(ctx); err != nil {
        log.Errorf(ctx, err, "failed to shutdown metrics")
    }
}()
// Initialize Clue with the exporters
cfg, err := clue.NewConfig(ctx,
serviceName,
version,
metricExporter,
spanExporter,
clue.WithResourceAttributes(map[string]string{
"environment": environment,
"region": region,
}))
if err != nil {
log.Fatalf(ctx, err, "failed to initialize observability")
}
clue.ConfigureOpenTelemetry(ctx, cfg)
This configuration sets up the core OpenTelemetry infrastructure for your service. It creates exporters that send your telemetry data to a collector for processing and storage. The configuration also ensures proper shutdown handling to avoid data loss when your service terminates. Resource attributes like environment and region are added to help organize and filter your telemetry data effectively. Finally, it initializes the global OpenTelemetry providers that enable tracing and metrics collection throughout your application.
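Once clue.ConfigureOpenTelemetry has run, code anywhere in the service can obtain tracers and meters from the global providers. A minimal sketch; the instrumentation, span, and counter names below are illustrative, not part of the Clue API:

import "go.opentelemetry.io/otel"

// Start a span via the globally registered tracer provider
ctx, span := otel.Tracer("myservice").Start(ctx, "process-request")
defer span.End()

// Record a metric via the globally registered meter provider
counter, err := otel.Meter("myservice").Int64Counter("requests.processed")
if err != nil {
    log.Errorf(ctx, err, "failed to create counter")
} else {
    counter.Add(ctx, 1)
}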
For HTTP services, wrap your handlers with observability middleware:
// Create Goa HTTP muxer
mux := goahttp.NewMuxer()

// Mount debug endpoints
debug.MountDebugLogEnabler(debug.Adapt(mux)) // Dynamic log level control
debug.MountPprofHandlers(debug.Adapt(mux))   // Go profiling endpoints

// Wrap the muxer, building from the innermost layer out; the numbers
// show the order in which each layer sees an incoming request:
handler := otelhttp.NewHandler(mux, serviceName) // 3. OpenTelemetry
handler = debug.HTTP()(handler)                  // 2. Debug endpoints
handler = log.HTTP(ctx)(handler)                 // 1. Request logging

// Create server with the instrumented handler
server := &http.Server{
    Addr:         *httpAddr,
    Handler:      handler,
    ReadTimeout:  15 * time.Second,
    WriteTimeout: 15 * time.Second,
}
For gRPC services, use interceptors:
// Create gRPC client connection with observability
conn, err := grpc.DialContext(ctx, *serverAddr,
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithUnaryInterceptor(log.UnaryClientInterceptor()),
    grpc.WithStatsHandler(otelgrpc.NewClientHandler()))
if err != nil {
    log.Fatalf(ctx, err, "failed to connect to %s", *serverAddr)
}

// Create gRPC server with observability; the server interceptor takes
// the logger context so request logs inherit its fields
srv := grpc.NewServer(
    grpc.UnaryInterceptor(log.UnaryServerInterceptor(ctx)),
    grpc.StatsHandler(otelgrpc.NewServerHandler()))
The middleware and interceptors provide:

- request and response logging tied to the logger context
- automatic span creation and trace context propagation across service boundaries
- request metrics for HTTP and gRPC without per-endpoint code
Health checks help monitor your service and its dependencies. Clue provides two main interfaces for implementing health checks:
The Pinger interface defines how to check the health of a single dependency:
type Pinger interface {
    // Name returns the name of the remote service
    Name() string

    // Ping checks if the service is healthy
    Ping(context.Context) error
}
Clue provides a default HTTP-based implementation that pings a health check endpoint:
// Create a pinger for a database service
dbPinger := health.NewPinger("database", "db:8080",
    health.WithScheme("https"), // Use HTTPS (default: http)
    health.WithPath("/health")) // Custom path (default: /livez)

// Create a pinger for Redis
redisPinger := health.NewPinger("redis", "redis:6379",
    health.WithPath("/ping")) // Redis health endpoint
You can also implement custom pingers for special cases:
type CustomPinger struct {
    name string
    db   *sql.DB
}

func (p *CustomPinger) Name() string { return p.name }

func (p *CustomPinger) Ping(ctx context.Context) error {
    return p.db.PingContext(ctx)
}
The Checker interface aggregates multiple pingers and provides overall health status:
type Checker interface {
    // Check returns the health status of all dependencies
    Check(context.Context) (*Health, bool)
}

// Health contains detailed status information
type Health struct {
    Uptime  int64             // Service uptime in seconds
    Version string            // Service version
    Status  map[string]string // Status of each dependency
}
Create a checker with multiple dependencies:
// Create health checker with multiple pingers
checker := health.NewChecker(
    health.NewPinger("database", *dbAddr),
    health.NewPinger("cache", *cacheAddr),
    health.NewPinger("search", *searchAddr),
    &CustomPinger{name: "custom", db: db},
)

// Create HTTP handler from checker
check := health.Handler(checker)

// Add logging to health checks
check = log.HTTP(ctx)(check).(http.HandlerFunc)

// Mount health endpoints (often on separate port)
http.Handle("/healthz", check) // Kubernetes liveness probe
http.Handle("/livez", check)   // Kubernetes readiness probe
The health check endpoint returns a JSON response with detailed status:
{
    "uptime": 3600,      // Seconds since service start
    "version": "1.0.0",  // Service version
    "status": {          // Status of each dependency
        "database": "OK",
        "cache": "OK",
        "search": "NOT OK" // Failed dependency
    }
}
The response code is:

- 200 if all dependencies are healthy
- 503 if any dependency is unhealthy

Health checks are commonly served on a separate port from the main application:

// Create main application server
appServer := &http.Server{
    Addr:    *httpAddr,
    Handler: appHandler,
}

// Create health check server on different port
healthServer := &http.Server{
    Addr:    *healthAddr,
    Handler: check,
}

You can also customize the HTTP client used by a pinger, for example to enforce a timeout:

// Create pinger with custom client
client := &http.Client{Timeout: 5 * time.Second}
pinger := health.NewPinger("service", *addr,
    health.WithClient(client))

And tag health check log entries so they are easy to filter, by handing log.HTTP a context that carries the extra field:

// Build a context with a component field and use it for health logs
healthCtx := log.With(ctx, log.KV{K: "component", V: "health"})
check = log.HTTP(healthCtx)(check).(http.HandlerFunc)
Configure the Kubernetes probes to match the endpoints mounted above:

livenessProbe:
  httpGet:
    path: /healthz
    port: health-port
  initialDelaySeconds: 10
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /livez
    port: health-port
  initialDelaySeconds: 5
  periodSeconds: 2
Health checks are a critical component of service observability. They continuously monitor the status of your service’s dependencies, ensuring you’re alerted quickly if a dependency becomes unhealthy. The checks integrate seamlessly with Kubernetes probes, allowing the container orchestrator to make informed decisions about pod lifecycle management. When issues occur, the health checks log errors appropriately, providing valuable debugging information.
Implement proper shutdown handling to ensure clean service termination:
// Create shutdown channel
errc := make(chan error)
go func() {
    c := make(chan os.Signal, 1)
    signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)
    errc <- fmt.Errorf("signal: %s", <-c)
}()

// Start servers
var wg sync.WaitGroup
wg.Add(1)
go func() {
    defer wg.Done()
    log.Printf(ctx, "HTTP server listening on %s", *httpAddr)
    if err := server.ListenAndServe(); err != http.ErrServerClosed {
        errc <- err
    }
}()

// Wait for shutdown signal
if err := <-errc; err != nil {
    log.Errorf(ctx, err, "shutdown initiated")
}

// Graceful shutdown
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := server.Shutdown(ctx); err != nil {
    log.Errorf(ctx, err, "shutdown error")
}
wg.Wait()
Proper shutdown ensures:

- in-flight requests get a chance to complete before the listener closes
- the deferred exporter Shutdown calls run, flushing any buffered telemetry
- resources such as connections and file handles are released cleanly
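As a sketch, the same pattern extends to the health check server from earlier (this assumes the server and healthServer variables defined in the previous snippets):

// Shut down both servers against a shared deadline; the deferred
// exporter Shutdown calls then flush any remaining telemetry
shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
for _, s := range []*http.Server{server, healthServer} {
    if err := s.Shutdown(shutdownCtx); err != nil {
        log.Errorf(ctx, err, "shutdown error")
    }
}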
Control how much trace data you collect. Clue provides two sampling strategies:
Use a fixed percentage of requests:
cfg, err := clue.NewConfig(ctx,
    serviceName,
    version,
    metricExporter,
    spanExporter,
    clue.WithSamplingRate(0.1)) // Sample 10% of requests
if err != nil {
    log.Fatalf(ctx, err, "failed to initialize observability")
}
For more dynamic control, use the adaptive sampler, which automatically adjusts the sampling rate to maintain a target number of traces per second:
cfg, err := clue.NewConfig(ctx,
    serviceName,
    version,
    metricExporter,
    spanExporter,
    clue.WithSampler(
        clue.AdaptiveSampler(
            100,    // Target 100 traces per second
            1000))) // Adjust rate every 1000 requests
if err != nil {
    log.Fatalf(ctx, err, "failed to initialize observability")
}
The adaptive sampler monitors the observed request rate and recomputes the sampling probability after each batch of requests (every 1,000 requests above) so the number of sampled traces stays near the target. For example, at 1,000 requests per second with a target of 100 traces per second, roughly 10% of requests are sampled; at 50 requests per second, every request can be sampled.
This adaptive sampling approach ensures you never exceed your target sampling rate while maintaining optimal visibility. During quiet periods, you get full visibility into your system’s behavior since all requests can be sampled. During peak loads, the sampling automatically adjusts to remain cost-effective while still providing statistically significant data for analysis.
Add metadata to all telemetry:
cfg, err := clue.NewConfig(ctx,
    serviceName,
    version,
    metricExporter,
    spanExporter,
    clue.WithResourceAttributes(map[string]string{
        "environment": "production",
        "region":      "us-west",
        "deployment":  "blue",
    }))
if err != nil {
    log.Fatalf(ctx, err, "failed to initialize observability")
}
OpenTelemetry can send telemetry data to different backends (systems that store and visualize your observability data). While the previous examples used OTLP (OpenTelemetry Protocol) exporters, you can use other popular systems:
Exporters are components that send your telemetry data (traces, metrics, logs) to a backend system for storage and analysis. Think of them as adapters that translate OpenTelemetry data into a format that specific backends understand.
Jaeger - Popular open-source distributed tracing system. Note that this dedicated exporter has since been deprecated upstream; recent Jaeger releases accept OTLP directly, so the OTLP exporter shown earlier also works:
import "go.opentelemetry.io/otel/exporters/jaeger"
// Send traces directly to Jaeger
spanExporter, err := jaeger.New(
jaeger.WithCollectorEndpoint(
jaeger.WithEndpoint("http://jaeger:14268/api/traces")))
Prometheus - Industry-standard metrics collection system:
import "go.opentelemetry.io/otel/exporters/prometheus"
// Export metrics in Prometheus format
metricExporter, err := prometheus.New(
prometheus.WithNamespace("myapp")) // Prefix all metrics
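Because Prometheus scrapes metrics rather than receiving pushed data, you also need to expose an endpoint it can poll. A minimal sketch, assuming the exporter registers with the default Prometheus registry (the /metrics path and port are illustrative):

import "github.com/prometheus/client_golang/prometheus/promhttp"

// Serve the default registry for Prometheus to scrape
http.Handle("/metrics", promhttp.Handler())
go func() {
    if err := http.ListenAndServe(":9090", nil); err != nil {
        log.Errorf(ctx, err, "metrics endpoint failed")
    }
}()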
Zipkin - Another distributed tracing system:
import "go.opentelemetry.io/otel/exporters/zipkin"
// Send traces to Zipkin
spanExporter, err := zipkin.New(
"http://zipkin:9411/api/v2/spans")
You can send data to multiple backends simultaneously:
// Create exporters (error checks between steps elided for brevity)
jaegerExp, err := jaeger.New(
    jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(jaegerEndpoint)))
prometheusExp, err := prometheus.New()
otlpExp, err := otlptracegrpc.New(ctx,
    otlptracegrpc.WithEndpoint(otlpEndpoint))

// Configure with multiple exporters
cfg, err := clue.NewConfig(ctx,
    serviceName,
    version,
    prometheusExp,                     // Metrics to Prometheus
    otlpExp,                           // Traces to OTLP
    clue.WithTraceExporter(jaegerExp)) // Also send traces to Jaeger
if err != nil {
    log.Fatalf(ctx, err, "failed to initialize observability")
}
Using multiple exporters provides several benefits. You can leverage specialized tools that excel at specific observability needs - for example, using Prometheus for metrics while sending traces to Jaeger. This also enables you to compare different backend capabilities side-by-side to evaluate which best suits your requirements. Additionally, when you need to migrate between observability systems, you can do so gradually by running both systems in parallel during the transition period.
Now that you have basic observability set up, you can build on this foundation: tune the sampling strategies, resource attributes, and exporter backends described above to match your environment.