batch_delete_cluster_nodes
- SageMaker.Client.batch_delete_cluster_nodes(**kwargs)
Deletes specific nodes within a SageMaker HyperPod cluster. BatchDeleteClusterNodes accepts a cluster name and a list of node IDs.
Warning
To safeguard your work, back up your data to Amazon S3 or an FSx for Lustre file system before invoking the API on a worker node group. This will help prevent any potential data loss from the instance root volume. For more information about backup, see Use the backup script provided by SageMaker HyperPod.
If you want to invoke this API on an existing cluster, you’ll first need to patch the cluster by running the UpdateClusterSoftware API. For more information about patching a cluster, see Update the SageMaker HyperPod platform software of a cluster.
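The patching step mentioned above could be sketched roughly as follows. The helper names start_cluster_patch and cluster_status are hypothetical, and client is assumed to be a boto3 SageMaker client (for example, one created with boto3.client("sagemaker")). This page does not say how to tell when patching has finished, so the sketch only returns the reported status and leaves that decision to the caller.

```python
def start_cluster_patch(client, cluster_name):
    """Start a platform software update and return the cluster ARN.

    Hypothetical helper: `client` is assumed to be a boto3 SageMaker client.
    The update runs asynchronously after this call returns.
    """
    response = client.update_cluster_software(ClusterName=cluster_name)
    return response["ClusterArn"]


def cluster_status(client, cluster_name):
    """Return the cluster status string reported by describe_cluster.

    Hypothetical helper: how to interpret the status before deleting
    nodes is left to the caller.
    """
    return client.describe_cluster(ClusterName=cluster_name)["ClusterStatus"]
```

Checking the status before calling batch_delete_cluster_nodes is one way to avoid deleting nodes while an update is still in progress.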
See also: AWS API Documentation
Request Syntax
response = client.batch_delete_cluster_nodes(
    ClusterName='string',
    NodeIds=[
        'string',
    ]
)
- Parameters:
ClusterName (string) –
[REQUIRED]
The name of the SageMaker HyperPod cluster from which to delete the specified nodes.
NodeIds (list) –
[REQUIRED]
A list of node IDs to be deleted from the specified cluster.
Note
For SageMaker HyperPod clusters using the Slurm workload manager, you cannot remove instances that are configured as Slurm controller nodes.
(string) –
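A minimal sketch of building the request from the two required parameters is shown below. The wrapper name delete_cluster_nodes and its input checks are illustrative assumptions, not part of the API. client is assumed to be a boto3 SageMaker client, and the cluster name and node IDs in the comment are placeholders.

```python
def delete_cluster_nodes(client, cluster_name, node_ids):
    """Call batch_delete_cluster_nodes after basic input checks.

    Hypothetical wrapper: `client` is assumed to be a boto3 SageMaker
    client, e.g. boto3.client("sagemaker").
    """
    if not cluster_name:
        raise ValueError("cluster_name must be a non-empty string")
    if isinstance(node_ids, str):
        # A bare string would otherwise be iterated character by character.
        raise TypeError("node_ids must be a list of strings, not a single string")

    # Drop duplicate IDs while preserving the caller's order.
    unique_ids = list(dict.fromkeys(node_ids))
    if not unique_ids:
        raise ValueError("node_ids must contain at least one node ID")

    return client.batch_delete_cluster_nodes(
        ClusterName=cluster_name,
        NodeIds=unique_ids,
    )


# Example call (placeholder values):
# response = delete_cluster_nodes(client, "my-cluster", ["i-0123456789abcdef0"])
```

Passing the client in as an argument keeps the wrapper easy to exercise with a stub object in tests.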
- Return type:
dict
- Returns:
Response Syntax
{
    'Failed': [
        {
            'Code': 'NodeIdNotFound'|'InvalidNodeStatus'|'NodeIdInUse',
            'Message': 'string',
            'NodeId': 'string'
        },
    ],
    'Successful': [
        'string',
    ]
}
Response Structure
(dict) –
Failed (list) –
A list of errors encountered when deleting the specified nodes.
(dict) –
Represents an error encountered when deleting a node from a SageMaker HyperPod cluster.
Code (string) –
The error code associated with the error encountered when deleting a node.
The code provides information about the specific issue encountered, such as the node not being found, the node’s status being invalid for deletion, or the node ID being in use by another process.
Message (string) –
A message describing the error encountered when deleting a node.
NodeId (string) –
The ID of the node that encountered an error during the deletion process.
Successful (list) –
A list of node IDs that were successfully deleted from the specified cluster.
(string) –
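Because a single call can report both successes and failures, callers usually need to inspect both lists. The sketch below shows one way to do that, grouping failed nodes by error code. The function name summarize_deletion and the sample response are hypothetical. Only the keys documented in the response structure above are used.

```python
def summarize_deletion(response):
    """Split a batch_delete_cluster_nodes response into two parts.

    Returns (successful_ids, failed_by_code), where failed_by_code maps
    each error code to a list of (node_id, message) tuples.
    """
    successful = list(response.get("Successful", []))
    failed_by_code = {}
    for error in response.get("Failed", []):
        failed_by_code.setdefault(error["Code"], []).append(
            (error["NodeId"], error["Message"])
        )
    return successful, failed_by_code


# Hypothetical response used only to illustrate the shape of the data.
sample_response = {
    "Failed": [
        {"Code": "NodeIdNotFound", "Message": "Node not found", "NodeId": "i-aaa"},
        {"Code": "NodeIdInUse", "Message": "Node is in use", "NodeId": "i-bbb"},
    ],
    "Successful": ["i-ccc"],
}

deleted, failures = summarize_deletion(sample_response)
for code, entries in failures.items():
    for node_id, message in entries:
        print(f"{node_id}: {code} ({message})")
```

Grouping by code makes it straightforward to handle each failure type differently, for example by retrying NodeIdInUse entries later while reporting NodeIdNotFound entries as input errors.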
Exceptions