AWS SDK Metrics for Enterprise Support (SDK Metrics) enables Enterprise Support customers to collect metrics from AWS SDKs on their hosts and clients, and to share those metrics with AWS Enterprise Support. SDK Metrics provides information that helps speed up the detection and diagnosis of issues in connections to AWS services.
As telemetry is collected on each host, it is relayed via UDP to 127.0.0.1 (localhost), where the CloudWatch agent aggregates the data and sends it to the SDK Metrics service. Therefore, to receive metrics, you must add the CloudWatch agent to your instance.
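As a rough illustration of this relay mechanism, the following sketch sends a metric event as a UDP datagram to the loopback address. The JSON fields and the `emit_metric` helper are simplified placeholders, not the exact SDK Metrics wire format:

```python
import json
import socket

CSM_HOST = "127.0.0.1"  # telemetry stays on the local host
CSM_PORT = 31000        # default SDK Metrics port

def emit_metric(event: dict, port: int = CSM_PORT) -> None:
    """Send one metric event as a UDP datagram to the local agent."""
    payload = json.dumps(event).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (CSM_HOST, port))
```

In a real deployment the SDK emits these datagrams itself; the CloudWatch agent listens on the configured port, aggregates what it receives, and forwards the results to the SDK Metrics service.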
The following steps to set up SDK Metrics pertain to an Amazon EC2 instance running Amazon Linux for a client application that is using the AWS SDK for Python. SDK Metrics is also available for your production environments if you enable it while configuring the AWS SDK for Python.
To use SDK Metrics, run the latest version of the CloudWatch agent. To learn how, see Configure the CloudWatch Agent for SDK Metrics in the Amazon CloudWatch User Guide.
To set up SDK Metrics with the AWS SDK for Python, follow these instructions:
For more information, see the following:
By default, SDK Metrics is turned off and the port is set to 31000.
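Based on those defaults, an unmodified ~/.aws/config effectively behaves as if it contained the following (the values shown here simply restate the stated defaults):

```ini
[default]
csm_enabled = false
csm_port = 31000
```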
Enabling SDK Metrics is independent of configuring your credentials to use an AWS service.
You can enable SDK Metrics by setting environment variables or by using the AWS Shared config file.
If AWS_CSM_ENABLED is not set, the SDK checks the profile specified by AWS_DEFAULT_PROFILE to determine whether SDK Metrics is enabled. By default this is set to false.
To turn on SDK Metrics, add the following to your environment variables.
export AWS_CSM_ENABLED=true
Other configuration settings are available.
Note: Enabling SDK Metrics does not configure your credentials to use an AWS service.
To change the port, set the new value and then restart any AWS jobs that are currently active.
Most services use the default port. But if your service requires a unique port, add AWS_CSM_PORT=[port_number] to the host's environment variables.
export AWS_CSM_ENABLED=true
export AWS_CSM_PORT=1234
Most services use the default port. But if your service requires a unique port, add csm_port = [port_number] to ~/.aws/config.
[default]
csm_enabled = true
csm_port = 1234

[profile aws_csm]
csm_enabled = true
csm_port = 1234
To restart a job, run the following commands.
amazon-cloudwatch-agent-ctl -a stop;
amazon-cloudwatch-agent-ctl -a start;
To turn off SDK Metrics, remove AWS_CSM_ENABLED from your environment variables, or remove csm_enabled from your AWS Shared config file located at ~/.aws/config. Then restart your CloudWatch agent so that the changes take effect.
Environment Variables
Remove AWS_CSM_ENABLED from your environment variables or set it to false.
unset AWS_CSM_ENABLED
AWS Shared Config File
Remove csm_enabled from the profiles in your AWS Shared config file located at ~/.aws/config.
Note
Environment variables override the AWS Shared config file. If SDK Metrics is enabled in the environment variables, SDK Metrics remains enabled.
To explicitly opt out of SDK Metrics, set csm_enabled to false.
[default]
csm_enabled = false
[profile aws_csm]
csm_enabled = false
To disable SDK Metrics, use the following command to stop the CloudWatch agent.
sudo amazon-cloudwatch-agent-ctl -a stop &&
echo "Done"
If you are using other CloudWatch features, restart the CloudWatch agent with the following command.
amazon-cloudwatch-agent-ctl -a start;
To restart an SDK Metrics job, run the following commands.
amazon-cloudwatch-agent-ctl -a stop;
amazon-cloudwatch-agent-ctl -a start;
You can use the following descriptions of SDK Metrics to interpret your results. In general, these metrics are available for review with your Technical Account Manager during regular business reviews. AWS Support resources and your Technical Account Manager should have access to SDK Metrics data to help you resolve cases. If you discover data that is confusing or unexpected, but doesn't seem to be negatively impacting your applications' performance, it is best to review that data during scheduled business reviews.
| Metric | CallCount |
| --- | --- |
| Definition | Total number of successful or failed API calls from your code to AWS services |
| How to use it | Use it as a baseline to correlate with other metrics such as errors or throttling. |
| Metric | ClientErrorCount |
| --- | --- |
| Definition | Number of API calls that fail with client errors (4xx HTTP response codes). Examples: Throttling, Access denied, S3 bucket does not exist, and Invalid parameter value. |
| How to use it | Except in certain cases related to throttling (e.g., when throttling occurs due to a limit that needs to be increased), this metric can indicate something in your application that needs to be fixed. |
| Metric | ConnectionErrorCount |
| --- | --- |
| Definition | Number of API calls that fail because of errors connecting to the service. These can be caused by network issues between the customer application and AWS services, including load balancers, DNS failures, and transit providers. In some cases, AWS issues may result in this error. |
| How to use it | Use this metric to determine whether issues are specific to your application or are caused by your infrastructure and/or network. High ConnectionErrorCount could also indicate short timeout values for API calls. |
| Metric | ThrottleCount |
| --- | --- |
| Definition | Number of API calls that fail due to throttling by AWS services. |
| How to use it | Use this metric to assess if your application has reached throttle limits, as well as to determine the cause of retries and application latency. Consider distributing calls over a window instead of batching your calls. |
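One way to distribute calls over a window, as suggested for ThrottleCount, is simple client-side pacing. The sketch below is illustrative (the `paced_calls` helper, the rate, and the call function are assumptions, not part of any AWS SDK): it spaces invocations evenly instead of firing them in a burst.

```python
import time
from typing import Callable, Iterable, List

def paced_calls(items: Iterable, call: Callable, per_second: float) -> List:
    """Invoke call(item) for each item, at most per_second calls per second."""
    interval = 1.0 / per_second
    results = []
    next_time = time.monotonic()  # earliest moment the next call may start
    for item in items:
        delay = next_time - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # wait out the remainder of the window
        results.append(call(item))
        next_time += interval
    return results
```

In practice you would replace `call` with the AWS API operation you are throttled on; smoothing the request rate this way reduces retries and the latency they add.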
| Metric | ServerErrorCount |
| --- | --- |
| Definition | Number of API calls that fail due to server errors (5xx HTTP response codes) from AWS services. These are typically caused by AWS services. |
| How to use it | Determine the cause of SDK retries or latency. This metric will not always indicate that AWS services are at fault, as some AWS teams classify latency as an HTTP 503 response. |
| Metric | EndToEndLatency |
| --- | --- |
| Definition | Total time for your application to make a call using the AWS SDK, inclusive of retries; that is, the elapsed time whether the call succeeds after several attempts or fails immediately due to a non-retryable error. |
| How to use it | Determine how AWS API calls contribute to your application's overall latency. Higher than expected latency may be caused by issues with network, firewall, or other configuration settings, or by latency that occurs as a result of SDK retries. |
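To see how individual calls contribute to your application's overall latency, you can also time them on the client side. This is a rough, generic sketch (the `timed` decorator is an assumption, not the SDK's internal measurement, and it captures wall-clock time including any retries the wrapped function performs):

```python
import time
from functools import wraps

def timed(fn):
    """Decorator that records the wall-clock duration (ms) of each call on fn.last_ms."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Recorded even when the call raises, mirroring how
            # EndToEndLatency covers both successful and failed calls.
            wrapper.last_ms = (time.perf_counter() - start) * 1000.0
    wrapper.last_ms = None
    return wrapper
```

Wrapping your AWS API call sites this way lets you compare your own end-to-end numbers against the EndToEndLatency metric when diagnosing slow paths.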