discover_input_schema

KinesisAnalytics.Client.discover_input_schema(**kwargs)

Note

This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only supports SQL applications. Version 2 of the API supports SQL and Java applications. For more information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.

Infers a schema by evaluating sample records on the specified streaming source (Amazon Kinesis stream or Amazon Kinesis Firehose delivery stream) or S3 object. In the response, the operation returns the inferred schema and also the sample records that the operation used to infer the schema.

You can use the inferred schema when configuring a streaming source for your application. For conceptual information, see Configuring Application Input. Note that when you create an application using the Amazon Kinesis Analytics console, the console uses this operation to infer a schema and show it in the console user interface.

This operation requires permissions to perform the kinesisanalytics:DiscoverInputSchema action.
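
As a rough illustration of reusing the inferred schema when configuring a streaming source, the sketch below wraps the returned InputSchema in an input description. It assumes the result is passed as the Input argument of add_application_input (or inside Inputs for create_application), which this section does not describe; the helper name and the default name prefix are hypothetical.

```python
def build_kinesis_stream_input(discovery_response, stream_arn, role_arn,
                               name_prefix="SOURCE_SQL_STREAM"):
    """Wrap an inferred schema in an input description for a Kinesis stream.

    Hypothetical helper: the key layout follows the Input structure of the
    version 1 API as an assumption, not something stated in this section.
    """
    return {
        # Prefix for the in-application stream names (illustrative default).
        "NamePrefix": name_prefix,
        "KinesisStreamsInput": {
            "ResourceARN": stream_arn,
            "RoleARN": role_arn,
        },
        # Reuse the schema exactly as discover_input_schema returned it.
        "InputSchema": discovery_response["InputSchema"],
    }
```

The schema is passed through unchanged; review the inferred column names and SQL types before relying on them in an application.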

Request Syntax

response = client.discover_input_schema(
    ResourceARN='string',
    RoleARN='string',
    InputStartingPositionConfiguration={
        'InputStartingPosition': 'NOW'|'TRIM_HORIZON'|'LAST_STOPPED_POINT'
    },
    S3Configuration={
        'RoleARN': 'string',
        'BucketARN': 'string',
        'FileKey': 'string'
    },
    InputProcessingConfiguration={
        'InputLambdaProcessor': {
            'ResourceARN': 'string',
            'RoleARN': 'string'
        }
    }
)

Parameters:

ResourceARN (string) – Amazon Resource Name (ARN) of the streaming source.
RoleARN (string) – ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf.
InputStartingPositionConfiguration (dict) –
Point at which you want Amazon Kinesis Analytics to start reading records from the specified streaming source for discovery purposes.
- InputStartingPosition (string) –
  
  The starting position on the stream.
  - NOW - Start reading just after the most recent record in the stream, beginning at the time stamp of the request that the customer issued.
  - TRIM_HORIZON - Start reading at the last untrimmed record in the stream, which is the oldest record available in the stream. This option is not available for an Amazon Kinesis Firehose delivery stream.
  - LAST_STOPPED_POINT - Resume reading from where the application last stopped reading.
S3Configuration (dict) –
Specify this parameter to discover a schema from data in an Amazon S3 object.
- RoleARN (string) – [REQUIRED]
  
  IAM ARN of the role used to access the data.
- BucketARN (string) – [REQUIRED]
  
  ARN of the S3 bucket that contains the data.
- FileKey (string) – [REQUIRED]
  
  The name of the object that contains the data.
InputProcessingConfiguration (dict) –
The InputProcessingConfiguration to use to preprocess the records before discovering the schema of the records.
- InputLambdaProcessor (dict) – [REQUIRED]
  
  The InputLambdaProcessor that is used to preprocess the records in the stream before being processed by your application code.
  - ResourceARN (string) – [REQUIRED]
    
    The ARN of the AWS Lambda function that operates on records in the stream.
    
    Note
    To specify an earlier version of the Lambda function than the latest, include the Lambda function version in the Lambda function ARN. For more information about Lambda ARNs, see Example ARNs: AWS Lambda.
  - RoleARN (string) – [REQUIRED]
    
    The ARN of the IAM role that is used to access the AWS Lambda function.
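
The parameters above can be assembled as in the following minimal sketch, which builds keyword arguments for a streaming source and for an S3 object. The helper names are hypothetical, and treating the two shapes as separate requests is an assumption rather than something this section states.

```python
ALLOWED_POSITIONS = {"NOW", "TRIM_HORIZON", "LAST_STOPPED_POINT"}


def stream_discovery_request(stream_arn, role_arn, starting_position="NOW"):
    """Build kwargs for discovering a schema from a streaming source."""
    if starting_position not in ALLOWED_POSITIONS:
        raise ValueError(f"unsupported starting position: {starting_position!r}")
    return {
        "ResourceARN": stream_arn,
        "RoleARN": role_arn,
        "InputStartingPositionConfiguration": {
            "InputStartingPosition": starting_position,
        },
    }


def s3_discovery_request(bucket_arn, file_key, role_arn):
    """Build kwargs for discovering a schema from an S3 object."""
    return {
        "S3Configuration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "FileKey": file_key,
        },
    }

# Usage, assuming a client created with boto3.client("kinesisanalytics"):
#   response = client.discover_input_schema(**stream_discovery_request(arn, role))
```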

Return type:

dict

Returns:

Response Syntax

{
    'InputSchema': {
        'RecordFormat': {
            'RecordFormatType': 'JSON'|'CSV',
            'MappingParameters': {
                'JSONMappingParameters': {
                    'RecordRowPath': 'string'
                },
                'CSVMappingParameters': {
                    'RecordRowDelimiter': 'string',
                    'RecordColumnDelimiter': 'string'
                }
            }
        },
        'RecordEncoding': 'string',
        'RecordColumns': [
            {
                'Name': 'string',
                'Mapping': 'string',
                'SqlType': 'string'
            },
        ]
    },
    'ParsedInputRecords': [
        [
            'string',
        ],
    ],
    'ProcessedInputRecords': [
        'string',
    ],
    'RawInputRecords': [
        'string',
    ]
}

Response Structure

(dict) –
- InputSchema (dict) –
  
  Schema inferred from the streaming source. It identifies the format of the data in the streaming source and how each data element maps to corresponding columns in the in-application stream that you can create.
  - RecordFormat (dict) –
    
    Specifies the format of the records on the streaming source.
    - RecordFormatType (string) –
      
      The type of record format.
    - MappingParameters (dict) –
      
      When configuring application input at the time of creating or updating an application, provides additional mapping information specific to the record format (such as JSON, CSV, or record fields delimited by some delimiter) on the streaming source.
      - JSONMappingParameters (dict) –
        
        Provides additional mapping information when JSON is the record format on the streaming source.
        
        - RecordRowPath (string) –
          
          Path to the top-level parent that contains the records.
      - CSVMappingParameters (dict) –
        
        Provides additional mapping information when the record format uses delimiters (for example, CSV).
        
        - RecordRowDelimiter (string) –
          
          Row delimiter. For example, in a CSV format, '\n' is the typical row delimiter.
        - RecordColumnDelimiter (string) –
          
          Column delimiter. For example, in a CSV format, a comma (“,”) is the typical column delimiter.
  - RecordEncoding (string) –
    
    Specifies the encoding of the records in the streaming source. For example, UTF-8.
  - RecordColumns (list) –
    
    A list of RecordColumn objects.
    - (dict) –
      
      Describes the mapping of each data element in the streaming source to the corresponding column in the in-application stream.
      
      Also used to describe the format of the reference data source.
      - Name (string) –
        
        Name of the column created in the in-application input stream or reference table.
      - Mapping (string) –
        
        Reference to the data element in the streaming input or the reference data source. This element is required if the RecordFormatType is JSON.
      - SqlType (string) –
        
        Type of column created in the in-application input stream or reference table.
- ParsedInputRecords (list) –
  
  An array of elements, where each element corresponds to a row in a stream record (a stream record can have more than one row).
  - (list) –
    - (string) –
- ProcessedInputRecords (list) –
  
  Stream data that was modified by the processor specified in the InputProcessingConfiguration parameter.
  - (string) –
- RawInputRecords (list) –
  
  Raw stream data that was sampled to infer the schema.
  - (string) –
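
To make the response structure concrete, here is a small sketch that condenses a response into the record format, the inferred columns, and the sample rows keyed by column name. The function name is hypothetical, and pairing each parsed row positionally with RecordColumns is an assumption that this section does not confirm.

```python
def summarize_discovery(response):
    """Condense a discover_input_schema response into a small dict."""
    schema = response["InputSchema"]
    columns = schema.get("RecordColumns", [])
    names = [column["Name"] for column in columns]
    # Assumption: values in each parsed row follow the order of RecordColumns.
    rows = [dict(zip(names, row)) for row in response.get("ParsedInputRecords", [])]
    return {
        "format": schema["RecordFormat"]["RecordFormatType"],
        "encoding": schema.get("RecordEncoding"),
        # Mapping may be absent for some columns, so read it defensively.
        "columns": [(c["Name"], c["SqlType"], c.get("Mapping")) for c in columns],
        "rows": rows,
        "raw_sample_count": len(response.get("RawInputRecords", [])),
    }
```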

Exceptions

KinesisAnalytics.Client.exceptions.InvalidArgumentException
KinesisAnalytics.Client.exceptions.UnableToDetectSchemaException
KinesisAnalytics.Client.exceptions.ResourceProvisionedThroughputExceededException
KinesisAnalytics.Client.exceptions.ServiceUnavailableException
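
One possible way to handle these exceptions is sketched below. It assumes that the throughput and availability errors are worth retrying with exponential backoff and that an undetectable schema should yield None; both are illustrative policy choices, not documented behavior. The function name and its parameters are hypothetical.

```python
import time


def discover_with_retry(client, request, max_attempts=3, base_delay=1.0,
                        sleep=time.sleep):
    """Call discover_input_schema, retrying on errors assumed to be transient."""
    retryable = (
        client.exceptions.ResourceProvisionedThroughputExceededException,
        client.exceptions.ServiceUnavailableException,
    )
    for attempt in range(1, max_attempts + 1):
        try:
            return client.discover_input_schema(**request)
        except client.exceptions.UnableToDetectSchemaException:
            # No schema could be inferred from the sampled records.
            return None
        except retryable:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    # InvalidArgumentException is not caught and propagates to the caller.
```

The sleep parameter exists only so the delay can be replaced in tests; in normal use the default time.sleep applies.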