batch_get_crawlers
- Glue.Client.batch_get_crawlers(**kwargs)
- Returns a list of resource metadata for a given list of crawler names. After calling the ListCrawlers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
- See also: AWS API Documentation
- Request Syntax
    response = client.batch_get_crawlers(
        CrawlerNames=[
            'string',
        ]
    )
- Parameters:
- CrawlerNames (list) – - [REQUIRED] - A list of crawler names, which might be the names returned from the ListCrawlers operation. - (string) – 
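The call described above takes a plain list of names, so a long list can be split into several requests and the results merged. The following is a minimal sketch, not part of the library: `chunked` and `get_crawlers` are hypothetical helper names, and the default `batch_size` of 100 is an assumed per-call cap that this page does not state, so check the service API reference before relying on it. The `client` argument is expected to be a Glue client such as the one returned by `boto3.client("glue")`.

```python
from typing import Any, Dict, Iterable, Iterator, List


def chunked(names: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` names."""
    batch: List[str] = []
    for name in names:
        batch.append(name)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def get_crawlers(client: Any, names: Iterable[str], batch_size: int = 100) -> Dict[str, List[Any]]:
    """Call batch_get_crawlers once per chunk and merge the responses.

    `batch_size` is an assumption (not documented on this page).
    """
    found: List[Dict[str, Any]] = []
    missing: List[str] = []
    for batch in chunked(names, batch_size):
        response = client.batch_get_crawlers(CrawlerNames=batch)
        # Both keys are read defensively in case one is absent.
        found.extend(response.get("Crawlers", []))
        missing.extend(response.get("CrawlersNotFound", []))
    return {"Crawlers": found, "CrawlersNotFound": missing}


# Usage (requires boto3 and AWS credentials; the crawler name is a placeholder):
#   import boto3
#   result = get_crawlers(boto3.client("glue"), ["my-crawler"])
```

Passing the client in, rather than creating it inside the helper, keeps the function easy to exercise with a stub object.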
 
- Return type:
- dict 
- Returns:
- Response Syntax
    {
        'Crawlers': [
            {
                'Name': 'string',
                'Role': 'string',
                'Targets': {
                    'S3Targets': [
                        {
                            'Path': 'string',
                            'Exclusions': [
                                'string',
                            ],
                            'ConnectionName': 'string',
                            'SampleSize': 123,
                            'EventQueueArn': 'string',
                            'DlqEventQueueArn': 'string'
                        },
                    ],
                    'JdbcTargets': [
                        {
                            'ConnectionName': 'string',
                            'Path': 'string',
                            'Exclusions': [
                                'string',
                            ],
                            'EnableAdditionalMetadata': [
                                'COMMENTS'|'RAWTYPES',
                            ]
                        },
                    ],
                    'MongoDBTargets': [
                        {
                            'ConnectionName': 'string',
                            'Path': 'string',
                            'ScanAll': True|False
                        },
                    ],
                    'DynamoDBTargets': [
                        {
                            'Path': 'string',
                            'scanAll': True|False,
                            'scanRate': 123.0
                        },
                    ],
                    'CatalogTargets': [
                        {
                            'DatabaseName': 'string',
                            'Tables': [
                                'string',
                            ],
                            'ConnectionName': 'string',
                            'EventQueueArn': 'string',
                            'DlqEventQueueArn': 'string'
                        },
                    ],
                    'DeltaTargets': [
                        {
                            'DeltaTables': [
                                'string',
                            ],
                            'ConnectionName': 'string',
                            'WriteManifest': True|False,
                            'CreateNativeDeltaTable': True|False
                        },
                    ]
                },
                'DatabaseName': 'string',
                'Description': 'string',
                'Classifiers': [
                    'string',
                ],
                'RecrawlPolicy': {
                    'RecrawlBehavior': 'CRAWL_EVERYTHING'|'CRAWL_NEW_FOLDERS_ONLY'|'CRAWL_EVENT_MODE'
                },
                'SchemaChangePolicy': {
                    'UpdateBehavior': 'LOG'|'UPDATE_IN_DATABASE',
                    'DeleteBehavior': 'LOG'|'DELETE_FROM_DATABASE'|'DEPRECATE_IN_DATABASE'
                },
                'LineageConfiguration': {
                    'CrawlerLineageSettings': 'ENABLE'|'DISABLE'
                },
                'State': 'READY'|'RUNNING'|'STOPPING',
                'TablePrefix': 'string',
                'Schedule': {
                    'ScheduleExpression': 'string',
                    'State': 'SCHEDULED'|'NOT_SCHEDULED'|'TRANSITIONING'
                },
                'CrawlElapsedTime': 123,
                'CreationTime': datetime(2015, 1, 1),
                'LastUpdated': datetime(2015, 1, 1),
                'LastCrawl': {
                    'Status': 'SUCCEEDED'|'CANCELLED'|'FAILED',
                    'ErrorMessage': 'string',
                    'LogGroup': 'string',
                    'LogStream': 'string',
                    'MessagePrefix': 'string',
                    'StartTime': datetime(2015, 1, 1)
                },
                'Version': 123,
                'Configuration': 'string',
                'CrawlerSecurityConfiguration': 'string',
                'LakeFormationConfiguration': {
                    'UseLakeFormationCredentials': True|False,
                    'AccountId': 'string'
                }
            },
        ],
        'CrawlersNotFound': [
            'string',
        ]
    }
- Response Structure - (dict) – - Crawlers (list) – - A list of crawler definitions. - (dict) – - Specifies a crawler program that examines a data source and uses classifiers to try to determine its schema. If successful, the crawler records metadata concerning the data source in the Glue Data Catalog. - Name (string) – - The name of the crawler. 
- Role (string) – - The Amazon Resource Name (ARN) of an IAM role that’s used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data. 
- Targets (dict) – - A collection of targets to crawl. - S3Targets (list) – - Specifies Amazon Simple Storage Service (Amazon S3) targets. - (dict) – - Specifies a data store in Amazon Simple Storage Service (Amazon S3). - Path (string) – - The path to the Amazon S3 target. 
- Exclusions (list) – - A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler. - (string) – 
 
- ConnectionName (string) – - The name of a connection which allows a job or crawler to access data in Amazon S3 within an Amazon Virtual Private Cloud environment (Amazon VPC). 
- SampleSize (integer) – - Sets the number of files in each leaf folder to be crawled when crawling sample files in a dataset. If not set, all the files are crawled. A valid value is an integer between 1 and 249. 
- EventQueueArn (string) – - A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.
- DlqEventQueueArn (string) – - A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.
 
 
- JdbcTargets (list) – - Specifies JDBC targets. - (dict) – - Specifies a JDBC data store to crawl. - ConnectionName (string) – - The name of the connection to use to connect to the JDBC target. 
- Path (string) – - The path of the JDBC target. 
- Exclusions (list) – - A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler. - (string) – 
 
- EnableAdditionalMetadata (list) – - Specify a value of RAWTYPES or COMMENTS to enable additional metadata in table responses. RAWTYPES provides the native-level datatype. COMMENTS provides comments associated with a column or table in the database. - If you do not need additional metadata, keep the field empty. - (string) – 
 
 
 
- MongoDBTargets (list) – - Specifies Amazon DocumentDB or MongoDB targets. - (dict) – - Specifies an Amazon DocumentDB or MongoDB data store to crawl. - ConnectionName (string) – - The name of the connection to use to connect to the Amazon DocumentDB or MongoDB target. 
- Path (string) – - The path of the Amazon DocumentDB or MongoDB target (database/collection). 
- ScanAll (boolean) – - Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table. - A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.
 
 
- DynamoDBTargets (list) – - Specifies Amazon DynamoDB targets. - (dict) – - Specifies an Amazon DynamoDB table to crawl. - Path (string) – - The name of the DynamoDB table to crawl. 
- scanAll (boolean) – - Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table. - A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.
- scanRate (float) – - The percentage of the configured read capacity units for the Glue crawler to use. Read capacity units is a term defined by DynamoDB, and is a numeric value that acts as a rate limiter for the number of reads that can be performed on that table per second. - The valid values are null or a value between 0.1 and 1.5. A null value is used when the user does not provide a value, and defaults to 0.5 of the configured Read Capacity Unit (for provisioned tables), or 0.25 of the max configured Read Capacity Unit (for tables using on-demand mode). 
 
 
- CatalogTargets (list) – - Specifies Glue Data Catalog targets. - (dict) – - Specifies a Glue Data Catalog target. - DatabaseName (string) – - The name of the database to be synchronized. 
- Tables (list) – - A list of the tables to be synchronized. - (string) – 
 
- ConnectionName (string) – - The name of the connection for an Amazon S3-backed Data Catalog table to be a target of the crawl when using a Catalog connection type paired with a NETWORK connection type.
- EventQueueArn (string) – - A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.
- DlqEventQueueArn (string) – - A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.
 
 
- DeltaTargets (list) – - Specifies Delta data store targets. - (dict) – - Specifies a Delta data store to crawl one or more Delta tables. - DeltaTables (list) – - A list of the Amazon S3 paths to the Delta tables. - (string) – 
 
- ConnectionName (string) – - The name of the connection to use to connect to the Delta table target. 
- WriteManifest (boolean) – - Specifies whether to write the manifest files to the Delta table path. 
- CreateNativeDeltaTable (boolean) – - Specifies whether the crawler will create native tables, to allow integration with query engines that support querying of the Delta transaction log directly. 
 
 
 
- DatabaseName (string) – - The name of the database in which the crawler’s output is stored. 
- Description (string) – - A description of the crawler. 
- Classifiers (list) – - A list of UTF-8 strings that specify the custom classifiers that are associated with the crawler. - (string) – 
 
- RecrawlPolicy (dict) – - A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run. - RecrawlBehavior (string) – - Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. - A value of CRAWL_EVERYTHING specifies crawling the entire dataset again. - A value of CRAWL_NEW_FOLDERS_ONLY specifies crawling only folders that were added since the last crawler run. - A value of CRAWL_EVENT_MODE specifies crawling only the changes identified by Amazon S3 events.
 
- SchemaChangePolicy (dict) – - The policy that specifies update and delete behaviors for the crawler. - UpdateBehavior (string) – - The update behavior when the crawler finds a changed schema. 
- DeleteBehavior (string) – - The deletion behavior when the crawler finds a deleted object. 
 
- LineageConfiguration (dict) – - A configuration that specifies whether data lineage is enabled for the crawler. - CrawlerLineageSettings (string) – - Specifies whether data lineage is enabled for the crawler. Valid values are: - ENABLE: enables data lineage for the crawler 
- DISABLE: disables data lineage for the crawler 
 
 
- State (string) – - Indicates whether the crawler is running, or whether a run is pending. 
- TablePrefix (string) – - The prefix added to the names of tables that are created. 
- Schedule (dict) – - For scheduled crawlers, the schedule when the crawler runs. - ScheduleExpression (string) – - A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
- State (string) – - The state of the schedule. 
 
- CrawlElapsedTime (integer) – - If the crawler is running, contains the total time elapsed since the last crawl began. 
- CreationTime (datetime) – - The time that the crawler was created. 
- LastUpdated (datetime) – - The time that the crawler was last updated. 
- LastCrawl (dict) – - The status of the last crawl, and potentially error information if an error occurred. - Status (string) – - Status of the last crawl. 
- ErrorMessage (string) – - If an error occurred, the error information about the last crawl. 
- LogGroup (string) – - The log group for the last crawl. 
- LogStream (string) – - The log stream for the last crawl. 
- MessagePrefix (string) – - The prefix for a message about this crawl. 
- StartTime (datetime) – - The time at which the crawl started. 
 
- Version (integer) – - The version of the crawler. 
- Configuration (string) – - Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler’s behavior. For more information, see Setting crawler configuration options. 
- CrawlerSecurityConfiguration (string) – - The name of the SecurityConfiguration structure to be used by this crawler.
- LakeFormationConfiguration (dict) – - Specifies whether the crawler should use Lake Formation credentials for the crawler instead of the IAM role credentials. - UseLakeFormationCredentials (boolean) – - Specifies whether to use Lake Formation credentials for the crawler instead of the IAM role credentials. 
- AccountId (string) – - Required for cross-account crawls. For crawls in the same account as the target data, this can be left as null. 
 
 
 
- CrawlersNotFound (list) – - A list of names of crawlers that were not found. - (string) – 
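The response structure above is deeply nested, so in practice it is often reduced to a few fields per crawler. The sketch below is one hedged way to do that: `summarize_crawler` and `split_response` are hypothetical helper names, the choice of fields is arbitrary, and every key is read with `.get` on the assumption that any of them may be absent from a given crawler entry. It also decodes `Configuration`, which the structure above describes as a versioned JSON string.

```python
import json
from typing import Any, Dict, List


def summarize_crawler(crawler: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one entry of response['Crawlers'] to a flat summary."""
    last_crawl = crawler.get("LastCrawl", {})
    targets = crawler.get("Targets", {})
    raw_config = crawler.get("Configuration")
    return {
        "name": crawler.get("Name"),
        "state": crawler.get("State"),
        "last_status": last_crawl.get("Status"),
        "last_error": last_crawl.get("ErrorMessage"),
        "s3_paths": [t.get("Path") for t in targets.get("S3Targets", [])],
        # Configuration is a JSON string; decode it only when it is non-empty.
        "configuration": json.loads(raw_config) if raw_config else None,
    }


def split_response(response: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Separate summaries, failed crawler names, and names that were not found."""
    summaries = [summarize_crawler(c) for c in response.get("Crawlers", [])]
    failed = [s["name"] for s in summaries if s["last_status"] == "FAILED"]
    not_found = list(response.get("CrawlersNotFound", []))
    return {"summaries": summaries, "failed": failed, "not_found": not_found}
```

Checking `not_found` before using the summaries is worthwhile, because a name that does not resolve is reported in `CrawlersNotFound` rather than raised as an error.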
 
 
 
- Exceptions
- Glue.Client.exceptions.InvalidInputException
- Glue.Client.exceptions.OperationTimeoutException
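The two exceptions listed above call for different handling: a timeout may succeed on a later attempt, while invalid input will fail again unchanged. The sketch below retries only `OperationTimeoutException` and lets everything else propagate. `batch_get_with_retry` is a hypothetical helper, and the attempt count and exponential backoff delays are illustrative assumptions rather than values recommended by the service.

```python
import time
from typing import Any, Callable, Dict, List


def batch_get_with_retry(
    client: Any,
    names: List[str],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Retry timeouts with exponential backoff; other errors are not caught."""
    for attempt in range(attempts):
        try:
            return client.batch_get_crawlers(CrawlerNames=names)
        except client.exceptions.OperationTimeoutException:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Usage (requires boto3 and AWS credentials; the crawler name is a placeholder):
#   import boto3
#   response = batch_get_with_retry(boto3.client("glue"), ["my-crawler"])
```

The `sleep` parameter exists only so the delay can be replaced in tests; callers can leave it at its default.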