Glue.Client.get_crawlers(**kwargs)
Retrieves metadata for all crawlers defined in the customer account.
See also: AWS API Documentation
Request Syntax
response = client.get_crawlers(
    MaxResults=123,
    NextToken='string'
)
Return type: dict
Response Syntax
{
    'Crawlers': [
        {
            'Name': 'string',
            'Role': 'string',
            'Targets': {
                'S3Targets': [
                    {
                        'Path': 'string',
                        'Exclusions': [
                            'string',
                        ],
                        'ConnectionName': 'string',
                        'SampleSize': 123,
                        'EventQueueArn': 'string',
                        'DlqEventQueueArn': 'string'
                    },
                ],
                'JdbcTargets': [
                    {
                        'ConnectionName': 'string',
                        'Path': 'string',
                        'Exclusions': [
                            'string',
                        ],
                        'EnableAdditionalMetadata': [
                            'COMMENTS'|'RAWTYPES',
                        ]
                    },
                ],
                'MongoDBTargets': [
                    {
                        'ConnectionName': 'string',
                        'Path': 'string',
                        'ScanAll': True|False
                    },
                ],
                'DynamoDBTargets': [
                    {
                        'Path': 'string',
                        'scanAll': True|False,
                        'scanRate': 123.0
                    },
                ],
                'CatalogTargets': [
                    {
                        'DatabaseName': 'string',
                        'Tables': [
                            'string',
                        ],
                        'ConnectionName': 'string',
                        'EventQueueArn': 'string',
                        'DlqEventQueueArn': 'string'
                    },
                ],
                'DeltaTargets': [
                    {
                        'DeltaTables': [
                            'string',
                        ],
                        'ConnectionName': 'string',
                        'WriteManifest': True|False,
                        'CreateNativeDeltaTable': True|False
                    },
                ]
            },
            'DatabaseName': 'string',
            'Description': 'string',
            'Classifiers': [
                'string',
            ],
            'RecrawlPolicy': {
                'RecrawlBehavior': 'CRAWL_EVERYTHING'|'CRAWL_NEW_FOLDERS_ONLY'|'CRAWL_EVENT_MODE'
            },
            'SchemaChangePolicy': {
                'UpdateBehavior': 'LOG'|'UPDATE_IN_DATABASE',
                'DeleteBehavior': 'LOG'|'DELETE_FROM_DATABASE'|'DEPRECATE_IN_DATABASE'
            },
            'LineageConfiguration': {
                'CrawlerLineageSettings': 'ENABLE'|'DISABLE'
            },
            'State': 'READY'|'RUNNING'|'STOPPING',
            'TablePrefix': 'string',
            'Schedule': {
                'ScheduleExpression': 'string',
                'State': 'SCHEDULED'|'NOT_SCHEDULED'|'TRANSITIONING'
            },
            'CrawlElapsedTime': 123,
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'LastCrawl': {
                'Status': 'SUCCEEDED'|'CANCELLED'|'FAILED',
                'ErrorMessage': 'string',
                'LogGroup': 'string',
                'LogStream': 'string',
                'MessagePrefix': 'string',
                'StartTime': datetime(2015, 1, 1)
            },
            'Version': 123,
            'Configuration': 'string',
            'CrawlerSecurityConfiguration': 'string',
            'LakeFormationConfiguration': {
                'UseLakeFormationCredentials': True|False,
                'AccountId': 'string'
            }
        },
    ],
    'NextToken': 'string'
}
Response Structure
(dict) --
Crawlers (list) --
A list of crawler metadata.
(dict) --
Specifies a crawler program that examines a data source and uses classifiers to try to determine its schema. If successful, the crawler records metadata concerning the data source in the Glue Data Catalog.
Name (string) --
The name of the crawler.
Role (string) --
The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data.
Targets (dict) --
A collection of targets to crawl.
S3Targets (list) --
Specifies Amazon Simple Storage Service (Amazon S3) targets.
(dict) --
Specifies a data store in Amazon Simple Storage Service (Amazon S3).
Path (string) --
The path to the Amazon S3 target.
Exclusions (list) --
A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.
ConnectionName (string) --
The name of a connection which allows a job or crawler to access data in Amazon S3 within an Amazon Virtual Private Cloud environment (Amazon VPC).
SampleSize (integer) --
Sets the number of files in each leaf folder to be crawled when crawling sample files in a dataset. If not set, all the files are crawled. A valid value is an integer between 1 and 249.
EventQueueArn (string) --
A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.
DlqEventQueueArn (string) --
A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.
JdbcTargets (list) --
Specifies JDBC targets.
(dict) --
Specifies a JDBC data store to crawl.
ConnectionName (string) --
The name of the connection to use to connect to the JDBC target.
Path (string) --
The path of the JDBC target.
Exclusions (list) --
A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.
EnableAdditionalMetadata (list) --
Specify a value of RAWTYPES or COMMENTS to enable additional metadata in table responses. RAWTYPES provides the native-level datatype. COMMENTS provides comments associated with a column or table in the database.
If you do not need additional metadata, keep the field empty.
MongoDBTargets (list) --
Specifies Amazon DocumentDB or MongoDB targets.
(dict) --
Specifies an Amazon DocumentDB or MongoDB data store to crawl.
ConnectionName (string) --
The name of the connection to use to connect to the Amazon DocumentDB or MongoDB target.
Path (string) --
The path of the Amazon DocumentDB or MongoDB target (database/collection).
ScanAll (boolean) --
Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.
A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.
DynamoDBTargets (list) --
Specifies Amazon DynamoDB targets.
(dict) --
Specifies an Amazon DynamoDB table to crawl.
Path (string) --
The name of the DynamoDB table to crawl.
scanAll (boolean) --
Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.
A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.
scanRate (float) --
The percentage of the configured read capacity units to use by the Glue crawler. Read capacity units is a term defined by DynamoDB, and is a numeric value that acts as rate limiter for the number of reads that can be performed on that table per second.
The valid values are null or a value between 0.1 and 1.5. A null value is used when the user does not provide a value, and defaults to 0.5 of the configured Read Capacity Unit (for provisioned tables), or 0.25 of the max configured Read Capacity Unit (for tables using on-demand mode).
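As a rough illustration of the scanRate rules above, the following sketch resolves the fraction of read capacity a crawler would use. The helper name effective_scan_rate and its on_demand flag are hypothetical and not part of the Glue API; only the defaults and bounds come from the description above.

```python
def effective_scan_rate(scan_rate=None, on_demand=False):
    """Return the fraction of read capacity units the crawler would use.

    Hypothetical helper: `on_demand` is an assumption used here to pick
    between the two documented defaults.
    """
    if scan_rate is None:
        # Documented defaults: 0.5 for provisioned tables,
        # 0.25 for tables using on-demand mode.
        return 0.25 if on_demand else 0.5
    if not 0.1 <= scan_rate <= 1.5:
        raise ValueError("scanRate must be null or between 0.1 and 1.5")
    return float(scan_rate)
```

A client-side check like this is only a convenience; the service remains the authority on which values it accepts.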
CatalogTargets (list) --
Specifies Glue Data Catalog targets.
(dict) --
Specifies a Glue Data Catalog target.
DatabaseName (string) --
The name of the database to be synchronized.
Tables (list) --
A list of the tables to be synchronized.
ConnectionName (string) --
The name of the connection for an Amazon S3-backed Data Catalog table to be a target of the crawl when using a Catalog connection type paired with a NETWORK Connection type.
EventQueueArn (string) --
A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.
DlqEventQueueArn (string) --
A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.
DeltaTargets (list) --
Specifies Delta data store targets.
(dict) --
Specifies a Delta data store to crawl one or more Delta tables.
DeltaTables (list) --
A list of the Amazon S3 paths to the Delta tables.
ConnectionName (string) --
The name of the connection to use to connect to the Delta table target.
WriteManifest (boolean) --
Specifies whether to write the manifest files to the Delta table path.
CreateNativeDeltaTable (boolean) --
Specifies whether the crawler will create native tables, to allow integration with query engines that support querying of the Delta transaction log directly.
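The Targets structure described above groups entries by data store type. A minimal sketch, assuming a crawler dict shaped like one element of Crawlers, that counts how many targets of each type a crawler has. The function name count_targets is hypothetical.

```python
# Target list keys as they appear in the Targets structure.
TARGET_KINDS = (
    "S3Targets",
    "JdbcTargets",
    "MongoDBTargets",
    "DynamoDBTargets",
    "CatalogTargets",
    "DeltaTargets",
)


def count_targets(crawler):
    """Return {target kind: number of entries} for one crawler dict."""
    targets = crawler.get("Targets", {})
    # A missing key is treated as an empty list.
    return {kind: len(targets.get(kind, [])) for kind in TARGET_KINDS}
```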
DatabaseName (string) --
The name of the database in which the crawler's output is stored.
Description (string) --
A description of the crawler.
Classifiers (list) --
A list of UTF-8 strings that specify the custom classifiers that are associated with the crawler.
RecrawlPolicy (dict) --
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
RecrawlBehavior (string) --
Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run.
A value of CRAWL_EVERYTHING specifies crawling the entire dataset again.
A value of CRAWL_NEW_FOLDERS_ONLY specifies crawling only folders that were added since the last crawler run.
A value of CRAWL_EVENT_MODE specifies crawling only the changes identified by Amazon S3 events.
SchemaChangePolicy (dict) --
The policy that specifies update and delete behaviors for the crawler.
UpdateBehavior (string) --
The update behavior when the crawler finds a changed schema.
DeleteBehavior (string) --
The deletion behavior when the crawler finds a deleted object.
LineageConfiguration (dict) --
A configuration that specifies whether data lineage is enabled for the crawler.
CrawlerLineageSettings (string) --
Specifies whether data lineage is enabled for the crawler. Valid values are ENABLE and DISABLE.
State (string) --
Indicates whether the crawler is running, or whether a run is pending.
TablePrefix (string) --
The prefix added to the names of tables that are created.
Schedule (dict) --
For scheduled crawlers, the schedule when the crawler runs.
ScheduleExpression (string) --
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
State (string) --
The state of the schedule.
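To make the ScheduleExpression format more concrete, the sketch below splits a cron(...) expression into named fields. The function name and the field names are assumptions: they follow the six-field layout commonly used for AWS cron expressions (minutes, hours, day of month, month, day of week, year), which this section does not itself spell out.

```python
import re

# Assumed field order for a six-field AWS-style cron expression.
CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_schedule_expression(expression):
    """Split 'cron(15 12 * * ? *)' into a dict of named fields."""
    match = re.fullmatch(r"cron\((.*)\)", expression.strip())
    if match is None:
        raise ValueError("expected an expression of the form cron(...)")
    parts = match.group(1).split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError("expected six whitespace-separated cron fields")
    return dict(zip(CRON_FIELDS, parts))
```

Applied to the example from the description, cron(15 12 * * ? *), the minutes field is 15 and the hours field is 12.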
CrawlElapsedTime (integer) --
If the crawler is running, contains the total time elapsed since the last crawl began.
CreationTime (datetime) --
The time that the crawler was created.
LastUpdated (datetime) --
The time that the crawler was last updated.
LastCrawl (dict) --
The status of the last crawl, and potentially error information if an error occurred.
Status (string) --
Status of the last crawl.
ErrorMessage (string) --
If an error occurred, the error information about the last crawl.
LogGroup (string) --
The log group for the last crawl.
LogStream (string) --
The log stream for the last crawl.
MessagePrefix (string) --
The prefix for a message about this crawl.
StartTime (datetime) --
The time at which the crawl started.
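One possible use of the LastCrawl fields is to report crawlers whose most recent run failed. A minimal sketch, assuming crawlers is the list under the Crawlers key of a response. The function name failed_last_crawls is hypothetical.

```python
def failed_last_crawls(crawlers):
    """Return (crawler name, error message) pairs for failed last crawls."""
    failures = []
    for crawler in crawlers:
        # LastCrawl may be absent, for example if the crawler has never run.
        last_crawl = crawler.get("LastCrawl") or {}
        if last_crawl.get("Status") == "FAILED":
            failures.append((crawler.get("Name"), last_crawl.get("ErrorMessage", "")))
    return failures
```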
Version (integer) --
The version of the crawler.
Configuration (string) --
Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.
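Because Configuration is returned as a JSON string rather than a dict, it has to be decoded before its settings can be inspected. A minimal sketch using the standard json module. The function name is hypothetical, and any sample value passed to it is illustrative rather than taken from this section.

```python
import json


def parse_crawler_configuration(crawler):
    """Decode the Configuration JSON string of one crawler dict.

    Returns an empty dict when the field is missing or empty.
    """
    raw = crawler.get("Configuration")
    if not raw:
        return {}
    return json.loads(raw)
```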
CrawlerSecurityConfiguration (string) --
The name of the SecurityConfiguration structure to be used by this crawler.
LakeFormationConfiguration (dict) --
Specifies whether the crawler should use Lake Formation credentials for the crawler instead of the IAM role credentials.
UseLakeFormationCredentials (boolean) --
Specifies whether to use Lake Formation credentials for the crawler instead of the IAM role credentials.
AccountId (string) --
Required for cross-account crawls. For crawls in the same account as the target data, this can be left as null.
NextToken (string) --
A continuation token, if the returned list has not reached the end of those defined in this customer account.
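The continuation token can be fed back into the next call to walk through all crawlers. The following is a minimal sketch of that loop, assuming client is a Glue client (for example one created with boto3.client('glue')). The generator name iter_crawlers and its page_size argument are hypothetical.

```python
def iter_crawlers(client, page_size=100):
    """Yield crawler dicts, following NextToken until the list is exhausted."""
    kwargs = {"MaxResults": page_size}
    while True:
        response = client.get_crawlers(**kwargs)
        yield from response.get("Crawlers", [])
        token = response.get("NextToken")
        if not token:
            # No continuation token means this was the last page.
            break
        kwargs["NextToken"] = token
```

Writing this as a generator lets callers stop early without requesting pages they do not need.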
Exceptions
Glue.Client.exceptions.OperationTimeoutException
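The exception listed above can be caught through the client's exceptions attribute. A minimal sketch of a retry wrapper, assuming client is a Glue client. The function name, the attempt count, and the fixed delay are illustrative choices, not recommendations from this reference.

```python
import time


def get_crawlers_with_retry(client, attempts=3, delay_seconds=1.0, **kwargs):
    """Call get_crawlers, retrying when the operation times out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return client.get_crawlers(**kwargs)
        except client.exceptions.OperationTimeoutException:
            if attempt == attempts:
                # Out of retries: let the caller see the timeout.
                raise
            time.sleep(delay_seconds)
```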