Every programmer has used log messages to debug problems. We do too. For logging to be useful, we must:
- Log the right data
- Aggregate and persist the data that we need long-term access to
Logging the right data
We looked at the log messages generated by our services, and some services were very noisy. At this stage of development we still needed log messages for debugging, but the noise was impeding our progress, so we combed through each service and quieted the noisiest ones. Our logging is still inconsistent: some services log practically every HTTP request, while others log none.
Access logging
As we mentioned in our blog post on our API Gateway, the gateway logs every HTTP request to our backend, recording several fields from the request along with information about the response. Here are the fields that we log, in Envoy access log notation:
- %REQ(:authority)%
- %REQ(:scheme)%
- %REQ(:method)%
- %REQ(:path)%
- %REQ(authorization)%
- %REQ(content-type)%
- %REQ(x-tidepool-trace-request)%
- %REQ(x-tidepool-trace-session)%
- %REQ(x-tidepool-session-token)%
- %REQ(x-forwarded-for)%
- %REQ(user-agent)%
- %REQ(referer)%
- %DOWNSTREAM_REMOTE_ADDRESS_WITHOUT_PORT%
- %RESPONSE_CODE%
- %DURATION%
- %BYTES_RECEIVED%
- %RESPONSE_CODE_DETAILS%
- %REQUEST_DURATION%
- %RESPONSE_FLAGS%
- %START_TIME%
- %UPSTREAM_CLUSTER%
These fields allow us to trace every request to our backend back to the client that made it.
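To illustrate how these fields support that kind of tracing, here is a minimal Python sketch that summarizes a single access log entry. It assumes the gateway emits JSON (Envoy's json_format) using hypothetical key names such as x_forwarded_for, downstream_remote_address, and trace_session mapped to the operators above; our actual key names and log format may differ.

```python
import json
from typing import Optional

# Hypothetical key names: assumes an Envoy json_format mapping such as
# {"x_forwarded_for": "%REQ(x-forwarded-for)%", "trace_session": "%REQ(x-tidepool-trace-session)%", ...}


def _value(entry: dict, key: str) -> Optional[str]:
    """Return a field as a string, treating missing, null, empty, or "-" values as absent."""
    v = entry.get(key)
    if v is None or v == "-" or v == "":
        return None
    return str(v)


def client_summary(line: str) -> dict:
    """Summarize who made a request, based on one JSON access log line."""
    entry = json.loads(line)
    forwarded = _value(entry, "x_forwarded_for")
    # X-Forwarded-For is a comma-separated chain; the left-most entry is the
    # originating client as reported to us (clients can spoof this header).
    if forwarded:
        client_ip = forwarded.split(",")[0].strip()
    else:
        client_ip = _value(entry, "downstream_remote_address")
    return {
        "client_ip": client_ip,
        "user_agent": _value(entry, "user_agent"),
        "trace_session": _value(entry, "trace_session"),
        "method": _value(entry, "method"),
        "path": _value(entry, "path"),
        "status": _value(entry, "response_code"),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "method": "GET",
        "path": "/v1/users/abc/data",
        "x_forwarded_for": "203.0.113.7, 10.0.0.12",
        "downstream_remote_address": "10.0.0.12",
        "user_agent": "example-uploader/1.0",
        "trace_session": "-",
        "response_code": 200,
    })
    print(client_summary(sample))
```

Falling back to the downstream address when X-Forwarded-For is absent means requests that did not pass through a proxy can still be attributed to a client address.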
Log aggregation
Each Kubernetes container emits its log messages to standard output.
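Writing one JSON object per line to standard output keeps these logs readable with kubectl and straightforward for a log shipper to parse. The following Python sketch shows one way to do that with the standard logging module; the field names (time, level, logger, msg) are hypothetical choices, and our services are not necessarily written in Python.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single line of JSON."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        })


def make_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Create a logger that writes JSON lines to stdout, where the container runtime collects them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]  # replace any previously attached handlers
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("example-service")
    log.info("listening on port %d", 8080)
    log.debug("suppressed unless the level is lowered to DEBUG")
```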
For casual debugging, we initially used kubectl from the command line and the Kubernetes dashboard as a GUI. However, we have since found k9s more to our liking.
When we need to look at logs from multiple pods, correlated over time, we use kail and stern. (Recently, stern has been integrated into k9s.)
These tools give us quick access to recent logs. But what about older logs? And how do we persist access logs?
We use Fluentd and Fluent Bit to aggregate logs and forward them to Sumo Logic. This addresses our long-term log storage needs.
One challenge is that our logs can grow quite large! We have learned that when we enable debug logging, we must take care to exclude those logs from aggregation entirely.
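A minimal sketch of that exclusion, assuming each log line is a JSON object with a hypothetical level field: debug- and trace-level records are dropped before forwarding, and everything else, including non-JSON output, passes through unchanged. In a Fluentd or Fluent Bit pipeline the same effect would normally be configured declaratively (for example, with Fluent Bit's grep filter and an Exclude rule) rather than written as code.

```python
import json
from typing import Iterable, Iterator

# Hypothetical level names that should never reach long-term storage.
DROP_LEVELS = {"debug", "trace"}


def forwardable(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that should be forwarded for aggregation."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Non-JSON output (e.g. a stack trace) is forwarded unchanged.
            yield line
            continue
        level = str(record.get("level", "")).lower() if isinstance(record, dict) else ""
        if level in DROP_LEVELS:
            continue  # drop debug/trace records before they are shipped
        yield line


if __name__ == "__main__":
    stream = ['{"level":"debug","msg":"a"}', '{"level":"INFO","msg":"b"}', "plain text"]
    for kept in forwardable(stream):
        print(kept)
```

Because the filter is a generator, it can sit in front of a forwarder without buffering the whole stream, which matters when debug logging makes the volume spike.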
Upcoming
In our next Engineering blog post, we will discuss how we provision our Kubernetes clusters.