update_crawler
- Glue.Client.update_crawler(**kwargs)
- Updates a crawler. If a crawler is running, you must stop it using StopCrawler before updating it.
- See also: AWS API Documentation
- Request Syntax

    response = client.update_crawler(
        Name='string',
        Role='string',
        DatabaseName='string',
        Description='string',
        Targets={
            'S3Targets': [
                {
                    'Path': 'string',
                    'Exclusions': [
                        'string',
                    ],
                    'ConnectionName': 'string',
                    'SampleSize': 123,
                    'EventQueueArn': 'string',
                    'DlqEventQueueArn': 'string'
                },
            ],
            'JdbcTargets': [
                {
                    'ConnectionName': 'string',
                    'Path': 'string',
                    'Exclusions': [
                        'string',
                    ],
                    'EnableAdditionalMetadata': [
                        'COMMENTS'|'RAWTYPES',
                    ]
                },
            ],
            'MongoDBTargets': [
                {
                    'ConnectionName': 'string',
                    'Path': 'string',
                    'ScanAll': True|False
                },
            ],
            'DynamoDBTargets': [
                {
                    'Path': 'string',
                    'scanAll': True|False,
                    'scanRate': 123.0
                },
            ],
            'CatalogTargets': [
                {
                    'DatabaseName': 'string',
                    'Tables': [
                        'string',
                    ],
                    'ConnectionName': 'string',
                    'EventQueueArn': 'string',
                    'DlqEventQueueArn': 'string'
                },
            ],
            'DeltaTargets': [
                {
                    'DeltaTables': [
                        'string',
                    ],
                    'ConnectionName': 'string',
                    'WriteManifest': True|False,
                    'CreateNativeDeltaTable': True|False
                },
            ]
        },
        Schedule='string',
        Classifiers=[
            'string',
        ],
        TablePrefix='string',
        SchemaChangePolicy={
            'UpdateBehavior': 'LOG'|'UPDATE_IN_DATABASE',
            'DeleteBehavior': 'LOG'|'DELETE_FROM_DATABASE'|'DEPRECATE_IN_DATABASE'
        },
        RecrawlPolicy={
            'RecrawlBehavior': 'CRAWL_EVERYTHING'|'CRAWL_NEW_FOLDERS_ONLY'|'CRAWL_EVENT_MODE'
        },
        LineageConfiguration={
            'CrawlerLineageSettings': 'ENABLE'|'DISABLE'
        },
        LakeFormationConfiguration={
            'UseLakeFormationCredentials': True|False,
            'AccountId': 'string'
        },
        Configuration='string',
        CrawlerSecurityConfiguration='string'
    )

- Parameters:
- Name (string) – [REQUIRED] Name of the new crawler.
- Role (string) – The IAM role or Amazon Resource Name (ARN) of an IAM role that is used by the new crawler to access customer resources. 
- DatabaseName (string) – The Glue database where results are stored, such as: arn:aws:daylight:us-east-1::database/sometable/*.
- Description (string) – A description of the new crawler. 
- Targets (dict) – A list of targets to crawl.
  - S3Targets (list) – Specifies Amazon Simple Storage Service (Amazon S3) targets.
    - (dict) – Specifies a data store in Amazon Simple Storage Service (Amazon S3).
      - Path (string) – The path to the Amazon S3 target.
      - Exclusions (list) – A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.
        - (string) –
      - ConnectionName (string) – The name of a connection which allows a job or crawler to access data in Amazon S3 within an Amazon Virtual Private Cloud environment (Amazon VPC).
      - SampleSize (integer) – Sets the number of files in each leaf folder to be crawled when crawling sample files in a dataset. If not set, all the files are crawled. A valid value is an integer between 1 and 249.
      - EventQueueArn (string) – A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.
      - DlqEventQueueArn (string) – A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.
 
 
  - JdbcTargets (list) – Specifies JDBC targets.
    - (dict) – Specifies a JDBC data store to crawl.
      - ConnectionName (string) – The name of the connection to use to connect to the JDBC target.
      - Path (string) – The path of the JDBC target.
      - Exclusions (list) – A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.
        - (string) –
      - EnableAdditionalMetadata (list) – Specify a value of RAWTYPES or COMMENTS to enable additional metadata in table responses. RAWTYPES provides the native-level datatype. COMMENTS provides comments associated with a column or table in the database. If you do not need additional metadata, keep the field empty.
        - (string) –
 
 
 
  - MongoDBTargets (list) – Specifies Amazon DocumentDB or MongoDB targets.
    - (dict) – Specifies an Amazon DocumentDB or MongoDB data store to crawl.
      - ConnectionName (string) – The name of the connection to use to connect to the Amazon DocumentDB or MongoDB target.
      - Path (string) – The path of the Amazon DocumentDB or MongoDB target (database/collection).
      - ScanAll (boolean) – Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table. A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.
 
 
  - DynamoDBTargets (list) – Specifies Amazon DynamoDB targets.
    - (dict) – Specifies an Amazon DynamoDB table to crawl.
      - Path (string) – The name of the DynamoDB table to crawl.
      - scanAll (boolean) – Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table. A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.
      - scanRate (float) – The percentage of the configured read capacity units to be used by the Glue crawler. Read capacity units is a term defined by DynamoDB, and is a numeric value that acts as a rate limiter for the number of reads that can be performed on that table per second. The valid values are null or a value between 0.1 and 1.5. A null value is used when the user does not provide a value, and defaults to 0.5 of the configured Read Capacity Unit (for provisioned tables), or 0.25 of the max configured Read Capacity Unit (for tables using on-demand mode).
 
 
  - CatalogTargets (list) – Specifies Glue Data Catalog targets.
    - (dict) – Specifies a Glue Data Catalog target.
      - DatabaseName (string) – [REQUIRED] The name of the database to be synchronized.
      - Tables (list) – [REQUIRED] A list of the tables to be synchronized.
        - (string) –
      - ConnectionName (string) – The name of the connection for an Amazon S3-backed Data Catalog table to be a target of the crawl when using a Catalog connection type paired with a NETWORK connection type.
      - EventQueueArn (string) – A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.
      - DlqEventQueueArn (string) – A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.
 
 
  - DeltaTargets (list) – Specifies Delta data store targets.
    - (dict) – Specifies a Delta data store to crawl one or more Delta tables.
      - DeltaTables (list) – A list of the Amazon S3 paths to the Delta tables.
        - (string) –
      - ConnectionName (string) – The name of the connection to use to connect to the Delta table target.
      - WriteManifest (boolean) – Specifies whether to write the manifest files to the Delta table path.
      - CreateNativeDeltaTable (boolean) – Specifies whether the crawler will create native tables, to allow integration with query engines that support querying of the Delta transaction log directly.
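
The Targets structure described above can be assembled with small helper functions. The following is a minimal sketch, not part of the boto3 API: the helper names `s3_target` and `dynamodb_target`, the bucket path, and the table name are hypothetical. It enforces the documented ranges for SampleSize (1 to 249) and scanRate (0.1 to 1.5), and it keeps the lowercase `scanAll` and `scanRate` keys that DynamoDB targets use.

```python
def s3_target(path, exclusions=None, sample_size=None):
    """Build one S3Targets entry; optional keys are omitted when unset."""
    target = {"Path": path}
    if exclusions:
        target["Exclusions"] = list(exclusions)
    if sample_size is not None:
        # Documented range: an integer between 1 and 249.
        if not 1 <= sample_size <= 249:
            raise ValueError("SampleSize must be between 1 and 249")
        target["SampleSize"] = sample_size
    return target


def dynamodb_target(table_name, scan_all=True, scan_rate=None):
    """Build one DynamoDBTargets entry (note the lowercase key names)."""
    target = {"Path": table_name, "scanAll": scan_all}
    if scan_rate is not None:
        # Documented range: a value between 0.1 and 1.5.
        if not 0.1 <= scan_rate <= 1.5:
            raise ValueError("scanRate must be between 0.1 and 1.5")
        target["scanRate"] = scan_rate
    return target


# Hypothetical bucket and table names, for illustration only.
targets = {
    "S3Targets": [
        s3_target("s3://example-bucket/raw/", exclusions=["**.tmp"], sample_size=10)
    ],
    "DynamoDBTargets": [
        dynamodb_target("example-table", scan_all=False, scan_rate=0.5)
    ],
}
```

The resulting dictionary can be passed as `Targets=targets` in a call to `update_crawler`.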
 
 
 
- Schedule (string) – A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
- Classifiers (list) – A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.
  - (string) –
 
- TablePrefix (string) – The table prefix used for catalog tables that are created. 
- SchemaChangePolicy (dict) – The policy for the crawler’s update and deletion behavior.
  - UpdateBehavior (string) – The update behavior when the crawler finds a changed schema.
  - DeleteBehavior (string) – The deletion behavior when the crawler finds a deleted object.
 
- RecrawlPolicy (dict) – A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
  - RecrawlBehavior (string) – Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. A value of CRAWL_EVERYTHING specifies crawling the entire dataset again. A value of CRAWL_NEW_FOLDERS_ONLY specifies crawling only folders that were added since the last crawler run. A value of CRAWL_EVENT_MODE specifies crawling only the changes identified by Amazon S3 events.
 
- LineageConfiguration (dict) – Specifies data lineage configuration settings for the crawler.
  - CrawlerLineageSettings (string) – Specifies whether data lineage is enabled for the crawler. Valid values are:
    - ENABLE: enables data lineage for the crawler
    - DISABLE: disables data lineage for the crawler
 
 
- LakeFormationConfiguration (dict) – Specifies Lake Formation configuration settings for the crawler.
  - UseLakeFormationCredentials (boolean) – Specifies whether to use Lake Formation credentials for the crawler instead of the IAM role credentials.
  - AccountId (string) – Required for cross-account crawls. For crawls in the same account as the target data, this can be left as null.
 
- Configuration (string) – Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler’s behavior. For more information, see Setting crawler configuration options. 
- CrawlerSecurityConfiguration (string) – The name of the SecurityConfiguration structure to be used by this crawler.
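
Two of the parameters above are strings with an internal format: Schedule is a cron expression and Configuration is a versioned JSON string. The sketch below shows one way to build both before calling `update_crawler`. The helpers `cron_schedule` and `build_update_kwargs` and the crawler name are hypothetical, and the keys inside the Configuration dictionary are an assumption based on the Glue crawler configuration guide rather than anything specified on this page.

```python
import json


def cron_schedule(minute, hour):
    """Return a daily Glue cron expression such as cron(15 12 * * ? *)."""
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError("minute must be 0-59 and hour must be 0-23")
    return f"cron({minute} {hour} * * ? *)"


def build_update_kwargs(name, **optional):
    """Collect keyword arguments for update_crawler, skipping unset values.

    Configuration is a JSON string in the API, so a dict is serialized here.
    """
    kwargs = {"Name": name}
    for key, value in optional.items():
        if value is None:
            continue
        if key == "Configuration" and not isinstance(value, str):
            value = json.dumps(value)
        kwargs[key] = value
    return kwargs


kwargs = build_update_kwargs(
    "example-crawler",  # hypothetical crawler name
    Schedule=cron_schedule(15, 12),  # every day at 12:15 UTC
    SchemaChangePolicy={
        "UpdateBehavior": "LOG",
        "DeleteBehavior": "DEPRECATE_IN_DATABASE",
    },
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVERYTHING"},
    # Assumed configuration keys; verify them against the Glue guide.
    Configuration={
        "Version": 1.0,
        "CrawlerOutput": {"Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}},
    },
    Description=None,  # unset values are dropped
)

# With a real client this would be sent as:
#   boto3.client("glue").update_crawler(**kwargs)
```

Only Name is required, so omitting unset keys keeps the request limited to the fields that are being supplied.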
 
- Return type:
- dict 
- Returns:
- Response Syntax

    {}

- Response Structure
  - (dict) –

- Exceptions
- Glue.Client.exceptions.InvalidInputException
- Glue.Client.exceptions.VersionMismatchException
- Glue.Client.exceptions.EntityNotFoundException
- Glue.Client.exceptions.CrawlerRunningException
- Glue.Client.exceptions.OperationTimeoutException
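
Because a running crawler must be stopped before it can be updated, a caller may want to react to `CrawlerRunningException` by stopping the crawler and retrying. The following is a minimal sketch under stated assumptions: the function name `update_crawler_when_idle` and its polling parameters are hypothetical, it relies on the Glue client's `stop_crawler` and `get_crawler` operations, and it assumes the crawler reports the state `READY` once it has stopped. It does not handle the case where the crawler finishes on its own between the failed update and the stop request.

```python
import time


def update_crawler_when_idle(client, name, poll_seconds=5.0,
                             timeout_seconds=300.0, **fields):
    """Update a crawler, stopping it first if it is currently running.

    `client` is expected to behave like boto3.client("glue").
    """
    try:
        return client.update_crawler(Name=name, **fields)
    except client.exceptions.CrawlerRunningException:
        pass  # fall through: stop the crawler, wait, then retry once

    client.stop_crawler(Name=name)

    # Poll until the crawler reports READY or the timeout expires.
    deadline = time.monotonic() + timeout_seconds
    while client.get_crawler(Name=name)["Crawler"]["State"] != "READY":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawler {name!r} did not stop in time")
        time.sleep(poll_seconds)

    return client.update_crawler(Name=name, **fields)


# Example call with a real client (requires AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   update_crawler_when_idle(glue, "example-crawler", Description="nightly crawl")
```

The other listed exceptions, such as `EntityNotFoundException`, are left to propagate so that the caller can decide how to handle them.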