describe_job_flows
- EMR.Client.describe_job_flows(**kwargs)
This API is no longer supported and will eventually be removed. We recommend you use ListClusters, DescribeCluster, ListSteps, ListInstanceGroups and ListBootstrapActions instead.
DescribeJobFlows returns a list of job flows that match all of the supplied parameters. The parameters can include a list of job flow IDs, job flow states, and restrictions on job flow creation date and time.
Regardless of supplied parameters, only job flows created within the last two months are returned.
If no parameters are supplied, then job flows matching either of the following criteria are returned:
Job flows created and completed in the last two weeks
Job flows created within the last two months that are in one of the following states: RUNNING, WAITING, SHUTTING_DOWN, STARTING
Amazon EMR can return a maximum of 512 job flow descriptions.
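The default selection rule described above can be sketched as a local filter. This is only an illustration of the documented behavior, not the service's implementation: the function name `default_selection`, the 14-day and 60-day windows standing in for "two weeks" and "two months", and the reading of "completed" as "has an EndDateTime" are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# States that keep a job flow in the default result for up to two months.
ACTIVE_STATES = {"RUNNING", "WAITING", "SHUTTING_DOWN", "STARTING"}

# Assumed approximations; the service's exact cutoffs are not documented here.
TWO_WEEKS = timedelta(days=14)
TWO_MONTHS = timedelta(days=60)
MAX_DESCRIPTIONS = 512  # documented maximum number of descriptions returned


def default_selection(job_flows, now):
    """Return the job flows that the no-parameter rule would keep (sketch)."""
    selected = []
    for job_flow in job_flows:
        detail = job_flow["ExecutionStatusDetail"]
        age = now - detail["CreationDateTime"]
        if age > TWO_MONTHS:
            # Older job flows are never returned, whatever their state.
            continue
        # Assumption: "completed" means the job flow has an end time.
        recent_and_done = detail.get("EndDateTime") is not None and age <= TWO_WEEKS
        still_active = detail["State"] in ACTIVE_STATES
        if recent_and_done or still_active:
            selected.append(job_flow)
    return selected[:MAX_DESCRIPTIONS]
```

Passing `now` explicitly keeps the sketch deterministic; in practice it would be something like `datetime.now(timezone.utc)`.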
Danger
This operation is deprecated and may not function as expected. Do not use it in new code; it is kept only for backward compatibility.
See also: AWS API Documentation
Request Syntax
response = client.describe_job_flows(
    CreatedAfter=datetime(2015, 1, 1),
    CreatedBefore=datetime(2015, 1, 1),
    JobFlowIds=[
        'string',
    ],
    JobFlowStates=[
        'STARTING'|'BOOTSTRAPPING'|'RUNNING'|'WAITING'|'SHUTTING_DOWN'|'TERMINATED'|'COMPLETED'|'FAILED',
    ]
)
- Parameters:
CreatedAfter (datetime) – Return only job flows created after this date and time.
CreatedBefore (datetime) – Return only job flows created before this date and time.
JobFlowIds (list) –
Return only job flows whose job flow ID is contained in this list.
(string) –
JobFlowStates (list) –
Return only job flows whose state is contained in this list.
(string) –
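A job flow execution state. Valid values are STARTING, BOOTSTRAPPING, RUNNING, WAITING, SHUTTING_DOWN, TERMINATED, COMPLETED, and FAILED.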
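As a rough sketch of how the four request parameters fit together, the helper below assembles keyword arguments and leaves out any filter that was not supplied. The helper name `build_request` and its argument names are hypothetical; only the parameter names and the state values come from the request syntax above.

```python
from datetime import datetime, timezone

# State values taken from the JobFlowStates request syntax.
VALID_STATES = {
    "STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING",
    "SHUTTING_DOWN", "TERMINATED", "COMPLETED", "FAILED",
}


def build_request(created_after=None, created_before=None,
                  job_flow_ids=None, job_flow_states=None):
    """Build keyword arguments for describe_job_flows, skipping unset filters."""
    kwargs = {}
    if created_after is not None:
        kwargs["CreatedAfter"] = created_after
    if created_before is not None:
        kwargs["CreatedBefore"] = created_before
    if job_flow_ids:
        kwargs["JobFlowIds"] = list(job_flow_ids)
    if job_flow_states:
        unknown = set(job_flow_states) - VALID_STATES
        if unknown:
            # Fail locally rather than sending a request that cannot validate.
            raise ValueError(f"unknown job flow states: {sorted(unknown)}")
        kwargs["JobFlowStates"] = list(job_flow_states)
    return kwargs
```

With an EMR client created by `boto3.client("emr")`, the result could then be passed as `client.describe_job_flows(**build_request(job_flow_states=["RUNNING"]))`.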
- Return type:
dict
- Returns:
Response Syntax
{
    'JobFlows': [
        {
            'JobFlowId': 'string',
            'Name': 'string',
            'LogUri': 'string',
            'LogEncryptionKmsKeyId': 'string',
            'AmiVersion': 'string',
            'ExecutionStatusDetail': {
                'State': 'STARTING'|'BOOTSTRAPPING'|'RUNNING'|'WAITING'|'SHUTTING_DOWN'|'TERMINATED'|'COMPLETED'|'FAILED',
                'CreationDateTime': datetime(2015, 1, 1),
                'StartDateTime': datetime(2015, 1, 1),
                'ReadyDateTime': datetime(2015, 1, 1),
                'EndDateTime': datetime(2015, 1, 1),
                'LastStateChangeReason': 'string'
            },
            'Instances': {
                'MasterInstanceType': 'string',
                'MasterPublicDnsName': 'string',
                'MasterInstanceId': 'string',
                'SlaveInstanceType': 'string',
                'InstanceCount': 123,
                'InstanceGroups': [
                    {
                        'InstanceGroupId': 'string',
                        'Name': 'string',
                        'Market': 'ON_DEMAND'|'SPOT',
                        'InstanceRole': 'MASTER'|'CORE'|'TASK',
                        'BidPrice': 'string',
                        'InstanceType': 'string',
                        'InstanceRequestCount': 123,
                        'InstanceRunningCount': 123,
                        'State': 'PROVISIONING'|'BOOTSTRAPPING'|'RUNNING'|'RECONFIGURING'|'RESIZING'|'SUSPENDED'|'TERMINATING'|'TERMINATED'|'ARRESTED'|'SHUTTING_DOWN'|'ENDED',
                        'LastStateChangeReason': 'string',
                        'CreationDateTime': datetime(2015, 1, 1),
                        'StartDateTime': datetime(2015, 1, 1),
                        'ReadyDateTime': datetime(2015, 1, 1),
                        'EndDateTime': datetime(2015, 1, 1),
                        'CustomAmiId': 'string'
                    },
                ],
                'NormalizedInstanceHours': 123,
                'Ec2KeyName': 'string',
                'Ec2SubnetId': 'string',
                'Placement': {
                    'AvailabilityZone': 'string',
                    'AvailabilityZones': [
                        'string',
                    ]
                },
                'KeepJobFlowAliveWhenNoSteps': True|False,
                'TerminationProtected': True|False,
                'HadoopVersion': 'string'
            },
            'Steps': [
                {
                    'StepConfig': {
                        'Name': 'string',
                        'ActionOnFailure': 'TERMINATE_JOB_FLOW'|'TERMINATE_CLUSTER'|'CANCEL_AND_WAIT'|'CONTINUE',
                        'HadoopJarStep': {
                            'Properties': [
                                {
                                    'Key': 'string',
                                    'Value': 'string'
                                },
                            ],
                            'Jar': 'string',
                            'MainClass': 'string',
                            'Args': [
                                'string',
                            ]
                        }
                    },
                    'ExecutionStatusDetail': {
                        'State': 'PENDING'|'RUNNING'|'CONTINUE'|'COMPLETED'|'CANCELLED'|'FAILED'|'INTERRUPTED',
                        'CreationDateTime': datetime(2015, 1, 1),
                        'StartDateTime': datetime(2015, 1, 1),
                        'EndDateTime': datetime(2015, 1, 1),
                        'LastStateChangeReason': 'string'
                    }
                },
            ],
            'BootstrapActions': [
                {
                    'BootstrapActionConfig': {
                        'Name': 'string',
                        'ScriptBootstrapAction': {
                            'Path': 'string',
                            'Args': [
                                'string',
                            ]
                        }
                    }
                },
            ],
            'SupportedProducts': [
                'string',
            ],
            'VisibleToAllUsers': True|False,
            'JobFlowRole': 'string',
            'ServiceRole': 'string',
            'AutoScalingRole': 'string',
            'ScaleDownBehavior': 'TERMINATE_AT_INSTANCE_HOUR'|'TERMINATE_AT_TASK_COMPLETION'
        },
    ]
}
Response Structure
(dict) –
The output for the DescribeJobFlows operation.
JobFlows (list) –
A list of job flows matching the parameters supplied.
(dict) –
A description of a cluster (job flow).
JobFlowId (string) –
The job flow identifier.
Name (string) –
The name of the job flow.
LogUri (string) –
The location in Amazon S3 where log files for the job are stored.
LogEncryptionKmsKeyId (string) –
The KMS key used for encrypting log files. This attribute is only available with EMR version 5.30.0 and later, excluding EMR 6.0.0.
AmiVersion (string) –
Applies only to Amazon EMR AMI versions 3.x and 2.x. For Amazon EMR releases 4.0 and later, ReleaseLabel is used. To specify a custom AMI, use CustomAmiID.
ExecutionStatusDetail (dict) –
Describes the execution status of the job flow.
State (string) –
The state of the job flow.
CreationDateTime (datetime) –
The creation date and time of the job flow.
StartDateTime (datetime) –
The start date and time of the job flow.
ReadyDateTime (datetime) –
The date and time when the job flow was ready to start running bootstrap actions.
EndDateTime (datetime) –
The completion date and time of the job flow.
LastStateChangeReason (string) –
A description of the job flow's most recent state change.
Instances (dict) –
Describes the Amazon EC2 instances of the job flow.
MasterInstanceType (string) –
The Amazon EC2 master node instance type.
MasterPublicDnsName (string) –
The DNS name of the master node. If the cluster is on a private subnet, this is the private DNS name. On a public subnet, this is the public DNS name.
MasterInstanceId (string) –
The Amazon EC2 instance identifier of the master node.
SlaveInstanceType (string) –
The Amazon EC2 core and task node instance type.
InstanceCount (integer) –
The number of Amazon EC2 instances in the cluster. If the value is 1, the same instance serves as both the master and core and task node. If the value is greater than 1, one instance is the master node and all others are core and task nodes.
InstanceGroups (list) –
Details about the instance groups in a cluster.
(dict) –
Detailed information about an instance group.
InstanceGroupId (string) –
Unique identifier for the instance group.
Name (string) –
Friendly name for the instance group.
Market (string) –
Market type of the EC2 instances used to create a cluster node.
InstanceRole (string) –
Instance group role in the cluster.
BidPrice (string) –
If specified, indicates that the instance group uses Spot Instances. This is the maximum price you are willing to pay for Spot Instances. Specify OnDemandPrice to set the amount equal to the On-Demand price, or specify an amount in USD.
InstanceType (string) –
EC2 instance type.
InstanceRequestCount (integer) –
Target number of instances to run in the instance group.
InstanceRunningCount (integer) –
Actual count of running instances.
State (string) –
State of instance group. The following values are no longer supported: STARTING, TERMINATED, and FAILED.
LastStateChangeReason (string) –
Details regarding the state of the instance group.
CreationDateTime (datetime) –
The date/time the instance group was created.
StartDateTime (datetime) –
The date/time the instance group was started.
ReadyDateTime (datetime) –
The date/time the instance group was available to the cluster.
EndDateTime (datetime) –
The date/time the instance group was terminated.
CustomAmiId (string) –
The custom AMI ID to use for the provisioned instance group.
NormalizedInstanceHours (integer) –
An approximation of the cost of the cluster, represented in m1.small/hours. This value is increased one time for every hour that an m1.small instance runs. Larger instances are weighted more heavily, so an Amazon EC2 instance that is roughly four times more expensive would result in the normalized instance hours being increased incrementally four times. This result is only an approximation and does not reflect the actual billing rate.
Ec2KeyName (string) –
The name of an Amazon EC2 key pair that can be used to connect to the master node using SSH.
Ec2SubnetId (string) –
For clusters launched within Amazon Virtual Private Cloud, this is the identifier of the subnet where the cluster was launched.
Placement (dict) –
The Amazon EC2 Availability Zone for the cluster.
AvailabilityZone (string) –
The Amazon EC2 Availability Zone for the cluster. AvailabilityZone is used for uniform instance groups, while AvailabilityZones (plural) is used for instance fleets.
AvailabilityZones (list) –
When multiple Availability Zones are specified, Amazon EMR evaluates them and launches instances in the optimal Availability Zone. AvailabilityZones is used for instance fleets, while AvailabilityZone (singular) is used for uniform instance groups.
Note
The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions.
(string) –
KeepJobFlowAliveWhenNoSteps (boolean) –
Specifies whether the cluster should remain available after completing all steps.
TerminationProtected (boolean) –
Specifies whether the Amazon EC2 instances in the cluster are protected from termination by API calls, user intervention, or in the event of a job-flow error.
HadoopVersion (string) –
The Hadoop version for the cluster.
Steps (list) –
A list of steps run by the job flow.
(dict) –
Combines the execution state and configuration of a step.
StepConfig (dict) –
The step configuration.
Name (string) –
The name of the step.
ActionOnFailure (string) –
The action to take when the step fails. Use one of the following values:
TERMINATE_CLUSTER - Shuts down the cluster.
CANCEL_AND_WAIT - Cancels any pending steps and returns the cluster to the WAITING state.
CONTINUE - Continues to the next step in the queue.
TERMINATE_JOB_FLOW - Shuts down the cluster. TERMINATE_JOB_FLOW is provided for backward compatibility. We recommend using TERMINATE_CLUSTER instead.
If a cluster’s StepConcurrencyLevel is greater than 1, do not use AddJobFlowSteps to submit a step with this parameter set to CANCEL_AND_WAIT or TERMINATE_CLUSTER. The step is not submitted and the action fails with a message that the ActionOnFailure setting is not valid.
If you change a cluster’s StepConcurrencyLevel to be greater than 1 while a step is running, the ActionOnFailure parameter may not behave as you expect. In this case, for a step that fails with this parameter set to CANCEL_AND_WAIT, pending steps and the running step are not canceled; for a step that fails with this parameter set to TERMINATE_CLUSTER, the cluster does not terminate.
HadoopJarStep (dict) –
The JAR file used for the step.
Properties (list) –
A list of Java properties that are set when the step runs. You can use these properties to pass key-value pairs to your main function.
(dict) –
A key-value pair.
Key (string) –
The unique identifier of a key-value pair.
Value (string) –
The value part of the identified key.
Jar (string) –
A path to a JAR file run during the step.
MainClass (string) –
The name of the main class in the specified Java file. If not specified, the JAR file should specify a Main-Class in its manifest file.
Args (list) –
A list of command line arguments passed to the JAR file’s main function when executed.
(string) –
ExecutionStatusDetail (dict) –
The description of the step status.
State (string) –
The state of the step.
CreationDateTime (datetime) –
The creation date and time of the step.
StartDateTime (datetime) –
The start date and time of the step.
EndDateTime (datetime) –
The completion date and time of the step.
LastStateChangeReason (string) –
A description of the step’s current state.
BootstrapActions (list) –
A list of the bootstrap actions run by the job flow.
(dict) –
Reports the configuration of a bootstrap action in a cluster (job flow).
BootstrapActionConfig (dict) –
A description of the bootstrap action.
Name (string) –
The name of the bootstrap action.
ScriptBootstrapAction (dict) –
The script run by the bootstrap action.
Path (string) –
Location in Amazon S3 of the script to run during a bootstrap action.
Args (list) –
A list of command line arguments to pass to the bootstrap action script.
(string) –
SupportedProducts (list) –
A list of strings set by third-party software when the job flow is launched. If you are not using third-party software to manage the job flow, this value is empty.
(string) –
VisibleToAllUsers (boolean) –
Indicates whether the cluster is visible to IAM principals in the Amazon Web Services account associated with the cluster. When true, IAM principals in the Amazon Web Services account can perform EMR cluster actions that their IAM policies allow. When false, only the IAM principal that created the cluster and the Amazon Web Services account root user can perform EMR actions, regardless of IAM permissions policies attached to other IAM principals.
The default value is true if a value is not provided when creating a cluster using the EMR API RunJobFlow command, the CLI create-cluster command, or the Amazon Web Services Management Console.
JobFlowRole (string) –
The IAM role that was specified when the job flow was launched. The EC2 instances of the job flow assume this role.
ServiceRole (string) –
The IAM role that is assumed by the Amazon EMR service to access Amazon Web Services resources on your behalf.
AutoScalingRole (string) –
An IAM role for automatic scaling policies. The default role is EMR_AutoScaling_DefaultRole. The IAM role provides a way for the automatic scaling feature to get the required permissions it needs to launch and terminate EC2 instances in an instance group.
ScaleDownBehavior (string) –
The way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized. TERMINATE_AT_INSTANCE_HOUR indicates that Amazon EMR terminates nodes at the instance-hour boundary, regardless of when the request to terminate the instance was submitted. This option is only available with Amazon EMR 5.1.0 and later and is the default for clusters created using that version. TERMINATE_AT_TASK_COMPLETION indicates that Amazon EMR adds nodes to a deny list and drains tasks from nodes before terminating the Amazon EC2 instances, regardless of the instance-hour boundary. With either behavior, Amazon EMR removes the least active nodes first and blocks instance termination if it could lead to HDFS corruption. TERMINATE_AT_TASK_COMPLETION is available only in Amazon EMR version 4.1.0 and later, and is the default for versions of Amazon EMR earlier than 5.1.0.
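To show how the nested response structure might be read, the sketch below flattens each job flow into a small summary. The function name `summarize_job_flows` and the summary keys are hypothetical; the response keys it reads (JobFlows, JobFlowId, Name, ExecutionStatusDetail, Instances, Steps, StepConfig) are the ones documented above. It uses `.get` throughout on the assumption that optional fields may be absent.

```python
def summarize_job_flows(response):
    """Reduce a describe_job_flows response to one small dict per job flow."""
    summaries = []
    for job_flow in response.get("JobFlows", []):
        status = job_flow.get("ExecutionStatusDetail", {})
        instances = job_flow.get("Instances", {})
        steps = [
            # Pair each step name with its execution state.
            (
                step.get("StepConfig", {}).get("Name"),
                step.get("ExecutionStatusDetail", {}).get("State"),
            )
            for step in job_flow.get("Steps", [])
        ]
        summaries.append({
            "id": job_flow.get("JobFlowId"),
            "name": job_flow.get("Name"),
            "state": status.get("State"),
            "instance_count": instances.get("InstanceCount"),
            "steps": steps,
        })
    return summaries
```

The function takes a plain dict, so it can be exercised with a hand-written sample before being pointed at a real response.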
Exceptions
EMR.Client.exceptions.InternalServerError
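One possible way to handle this exception is a small bounded retry, sketched below. The helper name `describe_with_retry`, the attempt count, and the fixed delay are illustrative assumptions rather than recommended values; the only API surface used is `client.describe_job_flows` and `client.exceptions.InternalServerError`, both named on this page.

```python
import time


def describe_with_retry(client, attempts=3, delay_seconds=1.0, **kwargs):
    """Call describe_job_flows, retrying a few times on InternalServerError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return client.describe_job_flows(**kwargs)
        except client.exceptions.InternalServerError:
            if attempt == attempts:
                # Out of retries: let the caller see the original error.
                raise
            time.sleep(delay_seconds)
```

Because the client is passed in, the helper can be tried against a stub object before it is used with a real EMR client.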