GetCrawlers
- class Glue.Paginator.GetCrawlers
paginator = client.get_paginator('get_crawlers')
- paginate(**kwargs)
Creates an iterator that will paginate through responses from Glue.Client.get_crawlers().
See also: AWS API Documentation
Request Syntax
response_iterator = paginator.paginate( PaginationConfig={ 'MaxItems': 123, 'PageSize': 123, 'StartingToken': 'string' } )
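A minimal sketch of walking every page, assuming `client` is a Glue client created elsewhere (for example with `boto3.client("glue")`). The helper `iter_crawler_names` and the default page size of 50 are illustrative choices, not part of boto3; the code relies only on `get_paginator("get_crawlers")`, `paginate()`, and the `Crawlers` and `Name` keys from the response syntax.

```python
from typing import Any, Iterator


def iter_crawler_names(client: Any, page_size: int = 50) -> Iterator[str]:
    """Yield the name of every crawler, one page at a time.

    `client` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    `iter_crawler_names` is a hypothetical helper, not part of boto3.
    """
    paginator = client.get_paginator("get_crawlers")
    # PageSize controls how many crawlers each underlying GetCrawlers call returns.
    pages = paginator.paginate(PaginationConfig={"PageSize": page_size})
    for page in pages:
        # Each page is a dict shaped like the Response Syntax below.
        for crawler in page.get("Crawlers", []):
            yield crawler["Name"]
```

Because the helper is a generator, a caller can stop early (for example after finding one crawler) without requesting further pages.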
- Parameters:
PaginationConfig (dict) –
A dictionary that provides parameters to control pagination.
MaxItems (integer) –
The total number of items to return. If the total number of items available is more than the value specified in MaxItems, then a NextToken will be provided in the output that you can use to resume pagination.
PageSize (integer) –
The size of each page.
StartingToken (string) –
A token to specify where to start paginating. This is the NextToken from a previous response.
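The `MaxItems` and `StartingToken` settings can be combined to process crawlers in fixed-size batches across separate calls. The sketch below assumes a boto3 Glue client and uses `build_full_result()`, which botocore's page iterator (the object `paginate()` returns) provides to merge all pages into one dict and to report a `NextToken` when `MaxItems` truncated the listing. The helper name `fetch_crawler_batch` is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def fetch_crawler_batch(
    client: Any, max_items: int, starting_token: Optional[str] = None
) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Return up to `max_items` crawlers plus a token for resuming later.

    Hypothetical helper; `client` is assumed to be a boto3 Glue client.
    """
    config: Dict[str, Any] = {"MaxItems": max_items}
    if starting_token is not None:
        # Resume where a previous call stopped.
        config["StartingToken"] = starting_token
    iterator = client.get_paginator("get_crawlers").paginate(PaginationConfig=config)
    # build_full_result() merges every page into one dict; when MaxItems cut the
    # listing short it also carries a "NextToken" entry.
    result = iterator.build_full_result()
    return result.get("Crawlers", []), result.get("NextToken")
```

Passing the returned token back as `starting_token` resumes the listing; a token of `None` means no crawlers remain.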
- Return type:
dict
- Returns:
Response Syntax
{ 'Crawlers': [ { 'Name': 'string', 'Role': 'string', 'Targets': { 'S3Targets': [ { 'Path': 'string', 'Exclusions': [ 'string', ], 'ConnectionName': 'string', 'SampleSize': 123, 'EventQueueArn': 'string', 'DlqEventQueueArn': 'string' }, ], 'JdbcTargets': [ { 'ConnectionName': 'string', 'Path': 'string', 'Exclusions': [ 'string', ], 'EnableAdditionalMetadata': [ 'COMMENTS'|'RAWTYPES', ] }, ], 'MongoDBTargets': [ { 'ConnectionName': 'string', 'Path': 'string', 'ScanAll': True|False }, ], 'DynamoDBTargets': [ { 'Path': 'string', 'scanAll': True|False, 'scanRate': 123.0 }, ], 'CatalogTargets': [ { 'DatabaseName': 'string', 'Tables': [ 'string', ], 'ConnectionName': 'string', 'EventQueueArn': 'string', 'DlqEventQueueArn': 'string' }, ], 'DeltaTargets': [ { 'DeltaTables': [ 'string', ], 'ConnectionName': 'string', 'WriteManifest': True|False, 'CreateNativeDeltaTable': True|False }, ] }, 'DatabaseName': 'string', 'Description': 'string', 'Classifiers': [ 'string', ], 'RecrawlPolicy': { 'RecrawlBehavior': 'CRAWL_EVERYTHING'|'CRAWL_NEW_FOLDERS_ONLY'|'CRAWL_EVENT_MODE' }, 'SchemaChangePolicy': { 'UpdateBehavior': 'LOG'|'UPDATE_IN_DATABASE', 'DeleteBehavior': 'LOG'|'DELETE_FROM_DATABASE'|'DEPRECATE_IN_DATABASE' }, 'LineageConfiguration': { 'CrawlerLineageSettings': 'ENABLE'|'DISABLE' }, 'State': 'READY'|'RUNNING'|'STOPPING', 'TablePrefix': 'string', 'Schedule': { 'ScheduleExpression': 'string', 'State': 'SCHEDULED'|'NOT_SCHEDULED'|'TRANSITIONING' }, 'CrawlElapsedTime': 123, 'CreationTime': datetime(2015, 1, 1), 'LastUpdated': datetime(2015, 1, 1), 'LastCrawl': { 'Status': 'SUCCEEDED'|'CANCELLED'|'FAILED', 'ErrorMessage': 'string', 'LogGroup': 'string', 'LogStream': 'string', 'MessagePrefix': 'string', 'StartTime': datetime(2015, 1, 1) }, 'Version': 123, 'Configuration': 'string', 'CrawlerSecurityConfiguration': 'string', 'LakeFormationConfiguration': { 'UseLakeFormationCredentials': True|False, 'AccountId': 'string' } }, ], }
Response Structure
(dict) –
Crawlers (list) –
A list of crawler metadata.
(dict) –
Specifies a crawler program that examines a data source and uses classifiers to try to determine its schema. If successful, the crawler records metadata concerning the data source in the Glue Data Catalog.
Name (string) –
The name of the crawler.
Role (string) –
The Amazon Resource Name (ARN) of an IAM role that’s used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data.
Targets (dict) –
A collection of targets to crawl.
S3Targets (list) –
Specifies Amazon Simple Storage Service (Amazon S3) targets.
(dict) –
Specifies a data store in Amazon Simple Storage Service (Amazon S3).
Path (string) –
The path to the Amazon S3 target.
Exclusions (list) –
A list of glob patterns for paths to exclude from the crawl. For more information, see Catalog Tables with a Crawler.
(string) –
ConnectionName (string) –
The name of a connection which allows a job or crawler to access data in Amazon S3 within an Amazon Virtual Private Cloud environment (Amazon VPC).
SampleSize (integer) –
Sets the number of files in each leaf folder to be crawled when crawling sample files in a dataset. If not set, all the files are crawled. A valid value is an integer between 1 and 249.
EventQueueArn (string) –
A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.
DlqEventQueueArn (string) –
A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.
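To make the S3 target fields concrete, the following hypothetical helper turns the `S3Targets` entries of one crawler dict (as found in a page's `Crawlers` list) into readable lines. It treats a missing `SampleSize` as "all files", matching the description above; the output wording is an arbitrary choice.

```python
from typing import Any, Dict, List


def describe_s3_targets(crawler: Dict[str, Any]) -> List[str]:
    """Return one readable line per S3 target of a crawler dict from GetCrawlers."""
    lines = []
    for target in crawler.get("Targets", {}).get("S3Targets", []):
        # SampleSize is optional; when absent, every file in each leaf folder is crawled.
        sample = target.get("SampleSize")
        scope = f"sample {sample} files per leaf folder" if sample is not None else "all files"
        excluded = ", ".join(target.get("Exclusions", [])) or "none"
        lines.append(f"{target['Path']} ({scope}; exclusions: {excluded})")
    return lines
```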
JdbcTargets (list) –
Specifies JDBC targets.
(dict) –
Specifies a JDBC data store to crawl.
ConnectionName (string) –
The name of the connection to use to connect to the JDBC target.
Path (string) –
The path of the JDBC target.
Exclusions (list) –
A list of glob patterns for paths to exclude from the crawl. For more information, see Catalog Tables with a Crawler.
(string) –
EnableAdditionalMetadata (list) –
Specify a value of RAWTYPES or COMMENTS to enable additional metadata in table responses. RAWTYPES provides the native-level datatype. COMMENTS provides comments associated with a column or table in the database.
If you do not need additional metadata, keep the field empty.
(string) –
MongoDBTargets (list) –
Specifies Amazon DocumentDB or MongoDB targets.
(dict) –
Specifies an Amazon DocumentDB or MongoDB data store to crawl.
ConnectionName (string) –
The name of the connection to use to connect to the Amazon DocumentDB or MongoDB target.
Path (string) –
The path of the Amazon DocumentDB or MongoDB target (database/collection).
ScanAll (boolean) –
Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.
A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.
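Because `ScanAll` may be absent and then defaults to true, code that reads it should supply that default explicitly. A minimal sketch with a hypothetical helper name, operating on one crawler dict from the response:

```python
from typing import Any, Dict


def mongodb_sampling_modes(crawler: Dict[str, Any]) -> Dict[str, str]:
    """Map each MongoDB/DocumentDB target path to "scan all" or "sample"."""
    modes = {}
    for target in crawler.get("Targets", {}).get("MongoDBTargets", []):
        # An absent ScanAll means true, i.e. every record is scanned.
        scan_all = target.get("ScanAll", True)
        modes[target["Path"]] = "scan all" if scan_all else "sample"
    return modes
```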
DynamoDBTargets (list) –
Specifies Amazon DynamoDB targets.
(dict) –
Specifies an Amazon DynamoDB table to crawl.
Path (string) –
The name of the DynamoDB table to crawl.
scanAll (boolean) –
Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.
A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.
scanRate (float) –
The percentage of the configured read capacity units to be used by the Glue crawler. Read capacity units is a term defined by DynamoDB; it is a numeric value that acts as a rate limiter for the number of reads that can be performed on that table per second.
The valid values are null or a value between 0.1 and 1.5. A null value is used when the user does not provide a value, and defaults to 0.5 of the configured Read Capacity Unit (for provisioned tables), or 0.25 of the max configured Read Capacity Unit (for tables using on-demand mode).
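The `scanRate` defaults amount to simple arithmetic on the table's read capacity: for a provisioned table with 100 RCU and no `scanRate`, the crawler uses 0.5 of 100 RCU, that is 50 RCU. The sketch below makes that estimate explicit. The function name is hypothetical, and applying an explicit `scanRate` to an on-demand table's maximum configured RCU is an assumption extrapolated from the stated default.

```python
from typing import Optional


def dynamodb_read_budget(
    configured_rcu: float, scan_rate: Optional[float] = None, on_demand: bool = False
) -> float:
    """Estimate the read capacity units a DynamoDB crawl may consume.

    Assumption: an explicit scanRate is applied as a fraction of configured_rcu
    (the provisioned RCU, or the max configured RCU for on-demand tables).
    """
    if scan_rate is None:
        # Documented defaults: 0.5 for provisioned tables, 0.25 for on-demand tables.
        scan_rate = 0.25 if on_demand else 0.5
    elif not 0.1 <= scan_rate <= 1.5:
        raise ValueError("scanRate must be between 0.1 and 1.5")
    return configured_rcu * scan_rate
```

When reading a target from the response, note the lower-case keys in `DynamoDBTargets`: `target.get("scanRate")` and `target.get("scanAll", True)`, unlike `ScanAll` in `MongoDBTargets`.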
CatalogTargets (list) –
Specifies Glue Data Catalog targets.
(dict) –
Specifies a Glue Data Catalog target.
DatabaseName (string) –
The name of the database to be synchronized.
Tables (list) –
A list of the tables to be synchronized.
(string) –
ConnectionName (string) –
The name of the connection for an Amazon S3-backed Data Catalog table to be a target of the crawl when using a Catalog connection type paired with a NETWORK Connection type.
EventQueueArn (string) –
A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.
DlqEventQueueArn (string) –
A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.
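A crawler's `CatalogTargets` name tables relative to a database, so a flat list of `database.table` names is often handier for logging or comparisons. A minimal sketch with a hypothetical helper name:

```python
from typing import Any, Dict, List


def catalog_target_tables(crawler: Dict[str, Any]) -> List[str]:
    """Return "database.table" for every table listed in the crawler's CatalogTargets."""
    names = []
    for target in crawler.get("Targets", {}).get("CatalogTargets", []):
        # Each CatalogTargets entry pairs one DatabaseName with a list of Tables.
        for table in target.get("Tables", []):
            names.append(f"{target['DatabaseName']}.{table}")
    return names
```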
DeltaTargets (list) –
Specifies Delta data store targets.
(dict) –
Specifies a Delta data store to crawl one or more Delta tables.
DeltaTables (list) –
A list of the Amazon S3 paths to the Delta tables.
(string) –
ConnectionName (string) –
The name of the connection to use to connect to the Delta table target.
WriteManifest (boolean) –
Specifies whether to write the manifest files to the Delta table path.
CreateNativeDeltaTable (boolean) –
Specifies whether the crawler will create native tables, to allow integration with query engines that support querying of the Delta transaction log directly.
DatabaseName (string) –
The name of the database in which the crawler’s output is stored.
Description (string) –
A description of the crawler.
Classifiers (list) –
A list of UTF-8 strings that specify the custom classifiers that are associated with the crawler.
(string) –
RecrawlPolicy (dict) –
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
RecrawlBehavior (string) –
Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run.
A value of CRAWL_EVERYTHING specifies crawling the entire dataset again.
A value of CRAWL_NEW_FOLDERS_ONLY specifies crawling only folders that were added since the last crawler run.
A value of CRAWL_EVENT_MODE specifies crawling only the changes identified by Amazon S3 events.
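As an illustration of reading `RecrawlPolicy`, this hypothetical helper groups crawler names by their `RecrawlBehavior`. The `"UNSET"` label for crawlers without a policy in the response is a placeholder chosen here, not a Glue value.

```python
from collections import defaultdict
from typing import Any, Dict, List


def crawlers_by_recrawl_behavior(crawlers: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group crawler names by RecrawlPolicy.RecrawlBehavior."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for crawler in crawlers:
        policy = crawler.get("RecrawlPolicy") or {}
        # "UNSET" is a placeholder label for crawlers with no policy in the response.
        behavior = policy.get("RecrawlBehavior", "UNSET")
        groups[behavior].append(crawler["Name"])
    return dict(groups)
```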
SchemaChangePolicy (dict) –
The policy that specifies update and delete behaviors for the crawler.
UpdateBehavior (string) –
The update behavior when the crawler finds a changed schema.
DeleteBehavior (string) –
The deletion behavior when the crawler finds a deleted object.
LineageConfiguration (dict) –
A configuration that specifies whether data lineage is enabled for the crawler.
CrawlerLineageSettings (string) –
Specifies whether data lineage is enabled for the crawler. Valid values are:
ENABLE: enables data lineage for the crawler
DISABLE: disables data lineage for the crawler
State (string) –
Indicates whether the crawler is running, or whether a run is pending.
TablePrefix (string) –
The prefix added to the names of tables that are created.
Schedule (dict) –
For scheduled crawlers, the schedule when the crawler runs.
ScheduleExpression (string) –
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
State (string) –
The state of the schedule.
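A sketch that collects the cron expressions of crawlers whose schedule is active, using the `Schedule.State` value `SCHEDULED` from the response syntax. The helper name is hypothetical, and it takes a list of crawler dicts such as `page["Crawlers"]`.

```python
from typing import Any, Dict, List, Optional


def scheduled_crawlers(crawlers: List[Dict[str, Any]]) -> Dict[str, Optional[str]]:
    """Map crawler name to cron expression for schedules in the SCHEDULED state."""
    active = {}
    for crawler in crawlers:
        # A crawler may lack a "Schedule" entry, so fall back to an empty dict.
        schedule = crawler.get("Schedule") or {}
        if schedule.get("State") == "SCHEDULED":
            active[crawler["Name"]] = schedule.get("ScheduleExpression")
    return active
```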
CrawlElapsedTime (integer) –
If the crawler is running, contains the total time elapsed since the last crawl began.
CreationTime (datetime) –
The time that the crawler was created.
LastUpdated (datetime) –
The time that the crawler was last updated.
LastCrawl (dict) –
The status of the last crawl, and potentially error information if an error occurred.
Status (string) –
Status of the last crawl.
ErrorMessage (string) –
If an error occurred, the error information about the last crawl.
LogGroup (string) –
The log group for the last crawl.
LogStream (string) –
The log stream for the last crawl.
MessagePrefix (string) –
The prefix for a message about this crawl.
StartTime (datetime) –
The time at which the crawl started.
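The `LastCrawl` fields are enough to build a quick failure report. This hypothetical helper returns the name, error message, log group, and log stream for every crawler whose last crawl has status `FAILED`; it can be applied to each page's `Crawlers` list.

```python
from typing import Any, Dict, List, Optional, Tuple


def failed_crawls(
    crawlers: List[Dict[str, Any]]
) -> List[Tuple[str, str, Optional[str], Optional[str]]]:
    """Return (name, error, log group, log stream) for each crawler whose last crawl FAILED."""
    failures = []
    for crawler in crawlers:
        # Crawlers that have never run may have no LastCrawl entry.
        last = crawler.get("LastCrawl") or {}
        if last.get("Status") == "FAILED":
            failures.append(
                (
                    crawler["Name"],
                    last.get("ErrorMessage", ""),
                    last.get("LogGroup"),
                    last.get("LogStream"),
                )
            )
    return failures
```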
Version (integer) –
The version of the crawler.
Configuration (string) –
Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler’s behavior. For more information, see Setting crawler configuration options.
CrawlerSecurityConfiguration (string) –
The name of the SecurityConfiguration structure to be used by this crawler.
LakeFormationConfiguration (dict) –
Specifies whether the crawler should use Lake Formation credentials instead of the IAM role credentials.
UseLakeFormationCredentials (boolean) –
Specifies whether to use Lake Formation credentials for the crawler instead of the IAM role credentials.
AccountId (string) –
Required for cross-account crawls. For crawls in the same account as the target data, this can be left as null.
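A sketch of interpreting `LakeFormationConfiguration`: the hypothetical helper compares `AccountId` with the caller's own account ID (supplied by the caller, for example from the `Account` field of STS `get_caller_identity`) to decide whether a Lake Formation crawl is cross-account. Treating a missing or null `AccountId` as same-account follows the note above.

```python
from typing import Any, Dict


def lake_formation_mode(crawler: Dict[str, Any], own_account_id: str) -> str:
    """Return "iam-role", "same-account" or "cross-account" for one crawler dict."""
    lf = crawler.get("LakeFormationConfiguration") or {}
    if not lf.get("UseLakeFormationCredentials"):
        # The crawler uses its IAM role credentials.
        return "iam-role"
    account = lf.get("AccountId")
    # A missing/null AccountId is treated as a same-account crawl.
    if account and account != own_account_id:
        return "cross-account"
    return "same-account"
```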