kendra.Client.create_data_source(**kwargs)
Creates a data source connector that you want to use with an Amazon Kendra index.
You specify a name, data source connector type and description for your data source. You also specify configuration information for the data source connector.
CreateDataSource is a synchronous operation. The operation returns 200 if the data source was successfully created. Otherwise, an exception is raised.
Amazon S3 and custom data sources are the only supported data sources in the Amazon Web Services GovCloud (US-West) region.
For an example of creating an index and data source using the Python SDK, see Getting started with Python SDK. For an example of creating an index and data source using the Java SDK, see Getting started with Java SDK.
See also: AWS API Documentation
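As a minimal sketch of the call described above, the helpers below assemble the keyword arguments for an Amazon S3 data source and pass them to a Kendra client. The function names, the bucket name, and the role ARN are illustrative assumptions, and the sketch assumes the response carries the new data source identifier under the Id key. The client is passed in as an argument so the request-building logic can be exercised without AWS credentials.

```python
from typing import Any, Dict


def build_s3_request(name: str, index_id: str, bucket: str, role_arn: str) -> Dict[str, Any]:
    """Assemble keyword arguments for create_data_source with an S3 connector."""
    return {
        "Name": name,
        "IndexId": index_id,
        "Type": "S3",
        "Configuration": {"S3Configuration": {"BucketName": bucket}},
        "RoleArn": role_arn,
    }


def create_s3_data_source(client: Any, **request: Any) -> str:
    """Call create_data_source and return the identifier of the new data source.

    `client` is expected to be a Kendra client, for example the object returned
    by boto3.client("kendra"). The operation is synchronous: it either returns
    a response or raises an exception.
    """
    response = client.create_data_source(**request)
    return response["Id"]  # assumption: the identifier is returned under "Id"
```

With boto3 installed and credentials configured, the call would look like create_s3_data_source(boto3.client("kendra"), **build_s3_request(...)), where every argument value is a placeholder you supply.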
Request Syntax
response = client.create_data_source(
Name='string',
IndexId='string',
Type='S3'|'SHAREPOINT'|'DATABASE'|'SALESFORCE'|'ONEDRIVE'|'SERVICENOW'|'CUSTOM'|'CONFLUENCE'|'GOOGLEDRIVE'|'WEBCRAWLER'|'WORKDOCS'|'FSX'|'SLACK'|'BOX'|'QUIP'|'JIRA'|'GITHUB'|'ALFRESCO'|'TEMPLATE',
Configuration={
'S3Configuration': {
'BucketName': 'string',
'InclusionPrefixes': [
'string',
],
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'DocumentsMetadataConfiguration': {
'S3Prefix': 'string'
},
'AccessControlListConfiguration': {
'KeyPath': 'string'
}
},
'SharePointConfiguration': {
'SharePointVersion': 'SHAREPOINT_2013'|'SHAREPOINT_2016'|'SHAREPOINT_ONLINE'|'SHAREPOINT_2019',
'Urls': [
'string',
],
'SecretArn': 'string',
'CrawlAttachments': True|False,
'UseChangeLog': True|False,
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'VpcConfiguration': {
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
},
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'DocumentTitleFieldName': 'string',
'DisableLocalGroups': True|False,
'SslCertificateS3Path': {
'Bucket': 'string',
'Key': 'string'
},
'AuthenticationType': 'HTTP_BASIC'|'OAUTH2',
'ProxyConfiguration': {
'Host': 'string',
'Port': 123,
'Credentials': 'string'
}
},
'DatabaseConfiguration': {
'DatabaseEngineType': 'RDS_AURORA_MYSQL'|'RDS_AURORA_POSTGRESQL'|'RDS_MYSQL'|'RDS_POSTGRESQL',
'ConnectionConfiguration': {
'DatabaseHost': 'string',
'DatabasePort': 123,
'DatabaseName': 'string',
'TableName': 'string',
'SecretArn': 'string'
},
'VpcConfiguration': {
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
},
'ColumnConfiguration': {
'DocumentIdColumnName': 'string',
'DocumentDataColumnName': 'string',
'DocumentTitleColumnName': 'string',
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'ChangeDetectingColumns': [
'string',
]
},
'AclConfiguration': {
'AllowedGroupsColumnName': 'string'
},
'SqlConfiguration': {
'QueryIdentifiersEnclosingOption': 'DOUBLE_QUOTES'|'NONE'
}
},
'SalesforceConfiguration': {
'ServerUrl': 'string',
'SecretArn': 'string',
'StandardObjectConfigurations': [
{
'Name': 'ACCOUNT'|'CAMPAIGN'|'CASE'|'CONTACT'|'CONTRACT'|'DOCUMENT'|'GROUP'|'IDEA'|'LEAD'|'OPPORTUNITY'|'PARTNER'|'PRICEBOOK'|'PRODUCT'|'PROFILE'|'SOLUTION'|'TASK'|'USER',
'DocumentDataFieldName': 'string',
'DocumentTitleFieldName': 'string',
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
],
'KnowledgeArticleConfiguration': {
'IncludedStates': [
'DRAFT'|'PUBLISHED'|'ARCHIVED',
],
'StandardKnowledgeArticleTypeConfiguration': {
'DocumentDataFieldName': 'string',
'DocumentTitleFieldName': 'string',
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'CustomKnowledgeArticleTypeConfigurations': [
{
'Name': 'string',
'DocumentDataFieldName': 'string',
'DocumentTitleFieldName': 'string',
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
]
},
'ChatterFeedConfiguration': {
'DocumentDataFieldName': 'string',
'DocumentTitleFieldName': 'string',
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'IncludeFilterTypes': [
'ACTIVE_USER'|'STANDARD_USER',
]
},
'CrawlAttachments': True|False,
'StandardObjectAttachmentConfiguration': {
'DocumentTitleFieldName': 'string',
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'IncludeAttachmentFilePatterns': [
'string',
],
'ExcludeAttachmentFilePatterns': [
'string',
]
},
'OneDriveConfiguration': {
'TenantDomain': 'string',
'SecretArn': 'string',
'OneDriveUsers': {
'OneDriveUserList': [
'string',
],
'OneDriveUserS3Path': {
'Bucket': 'string',
'Key': 'string'
}
},
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'DisableLocalGroups': True|False
},
'ServiceNowConfiguration': {
'HostUrl': 'string',
'SecretArn': 'string',
'ServiceNowBuildVersion': 'LONDON'|'OTHERS',
'KnowledgeArticleConfiguration': {
'CrawlAttachments': True|False,
'IncludeAttachmentFilePatterns': [
'string',
],
'ExcludeAttachmentFilePatterns': [
'string',
],
'DocumentDataFieldName': 'string',
'DocumentTitleFieldName': 'string',
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'FilterQuery': 'string'
},
'ServiceCatalogConfiguration': {
'CrawlAttachments': True|False,
'IncludeAttachmentFilePatterns': [
'string',
],
'ExcludeAttachmentFilePatterns': [
'string',
],
'DocumentDataFieldName': 'string',
'DocumentTitleFieldName': 'string',
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'AuthenticationType': 'HTTP_BASIC'|'OAUTH2'
},
'ConfluenceConfiguration': {
'ServerUrl': 'string',
'SecretArn': 'string',
'Version': 'CLOUD'|'SERVER',
'SpaceConfiguration': {
'CrawlPersonalSpaces': True|False,
'CrawlArchivedSpaces': True|False,
'IncludeSpaces': [
'string',
],
'ExcludeSpaces': [
'string',
],
'SpaceFieldMappings': [
{
'DataSourceFieldName': 'DISPLAY_URL'|'ITEM_TYPE'|'SPACE_KEY'|'URL',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'PageConfiguration': {
'PageFieldMappings': [
{
'DataSourceFieldName': 'AUTHOR'|'CONTENT_STATUS'|'CREATED_DATE'|'DISPLAY_URL'|'ITEM_TYPE'|'LABELS'|'MODIFIED_DATE'|'PARENT_ID'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'BlogConfiguration': {
'BlogFieldMappings': [
{
'DataSourceFieldName': 'AUTHOR'|'DISPLAY_URL'|'ITEM_TYPE'|'LABELS'|'PUBLISH_DATE'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'AttachmentConfiguration': {
'CrawlAttachments': True|False,
'AttachmentFieldMappings': [
{
'DataSourceFieldName': 'AUTHOR'|'CONTENT_TYPE'|'CREATED_DATE'|'DISPLAY_URL'|'FILE_SIZE'|'ITEM_TYPE'|'PARENT_ID'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'VpcConfiguration': {
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
},
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'ProxyConfiguration': {
'Host': 'string',
'Port': 123,
'Credentials': 'string'
},
'AuthenticationType': 'HTTP_BASIC'|'PAT'
},
'GoogleDriveConfiguration': {
'SecretArn': 'string',
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'ExcludeMimeTypes': [
'string',
],
'ExcludeUserAccounts': [
'string',
],
'ExcludeSharedDrives': [
'string',
]
},
'WebCrawlerConfiguration': {
'Urls': {
'SeedUrlConfiguration': {
'SeedUrls': [
'string',
],
'WebCrawlerMode': 'HOST_ONLY'|'SUBDOMAINS'|'EVERYTHING'
},
'SiteMapsConfiguration': {
'SiteMaps': [
'string',
]
}
},
'CrawlDepth': 123,
'MaxLinksPerPage': 123,
'MaxContentSizePerPageInMegaBytes': ...,
'MaxUrlsPerMinuteCrawlRate': 123,
'UrlInclusionPatterns': [
'string',
],
'UrlExclusionPatterns': [
'string',
],
'ProxyConfiguration': {
'Host': 'string',
'Port': 123,
'Credentials': 'string'
},
'AuthenticationConfiguration': {
'BasicAuthentication': [
{
'Host': 'string',
'Port': 123,
'Credentials': 'string'
},
]
}
},
'WorkDocsConfiguration': {
'OrganizationId': 'string',
'CrawlComments': True|False,
'UseChangeLog': True|False,
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'FsxConfiguration': {
'FileSystemId': 'string',
'FileSystemType': 'WINDOWS',
'VpcConfiguration': {
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
},
'SecretArn': 'string',
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'SlackConfiguration': {
'TeamId': 'string',
'SecretArn': 'string',
'VpcConfiguration': {
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
},
'SlackEntityList': [
'PUBLIC_CHANNEL'|'PRIVATE_CHANNEL'|'GROUP_MESSAGE'|'DIRECT_MESSAGE',
],
'UseChangeLog': True|False,
'CrawlBotMessage': True|False,
'ExcludeArchived': True|False,
'SinceCrawlDate': 'string',
'LookBackPeriod': 123,
'PrivateChannelFilter': [
'string',
],
'PublicChannelFilter': [
'string',
],
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'FieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'BoxConfiguration': {
'EnterpriseId': 'string',
'SecretArn': 'string',
'UseChangeLog': True|False,
'CrawlComments': True|False,
'CrawlTasks': True|False,
'CrawlWebLinks': True|False,
'FileFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'TaskFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'CommentFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'WebLinkFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'VpcConfiguration': {
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
}
},
'QuipConfiguration': {
'Domain': 'string',
'SecretArn': 'string',
'CrawlFileComments': True|False,
'CrawlChatRooms': True|False,
'CrawlAttachments': True|False,
'FolderIds': [
'string',
],
'ThreadFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'MessageFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'AttachmentFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'VpcConfiguration': {
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
}
},
'JiraConfiguration': {
'JiraAccountUrl': 'string',
'SecretArn': 'string',
'UseChangeLog': True|False,
'Project': [
'string',
],
'IssueType': [
'string',
],
'Status': [
'string',
],
'IssueSubEntityFilter': [
'COMMENTS'|'ATTACHMENTS'|'WORKLOGS',
],
'AttachmentFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'CommentFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'IssueFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'ProjectFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'WorkLogFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'VpcConfiguration': {
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
}
},
'GitHubConfiguration': {
'SaaSConfiguration': {
'OrganizationName': 'string',
'HostUrl': 'string'
},
'OnPremiseConfiguration': {
'HostUrl': 'string',
'OrganizationName': 'string',
'SslCertificateS3Path': {
'Bucket': 'string',
'Key': 'string'
}
},
'Type': 'SAAS'|'ON_PREMISE',
'SecretArn': 'string',
'UseChangeLog': True|False,
'GitHubDocumentCrawlProperties': {
'CrawlRepositoryDocuments': True|False,
'CrawlIssue': True|False,
'CrawlIssueComment': True|False,
'CrawlIssueCommentAttachment': True|False,
'CrawlPullRequest': True|False,
'CrawlPullRequestComment': True|False,
'CrawlPullRequestCommentAttachment': True|False
},
'RepositoryFilter': [
'string',
],
'InclusionFolderNamePatterns': [
'string',
],
'InclusionFileTypePatterns': [
'string',
],
'InclusionFileNamePatterns': [
'string',
],
'ExclusionFolderNamePatterns': [
'string',
],
'ExclusionFileTypePatterns': [
'string',
],
'ExclusionFileNamePatterns': [
'string',
],
'VpcConfiguration': {
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
},
'GitHubRepositoryConfigurationFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'GitHubCommitConfigurationFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'GitHubIssueDocumentConfigurationFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'GitHubIssueCommentConfigurationFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'GitHubIssueAttachmentConfigurationFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'GitHubPullRequestCommentConfigurationFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'GitHubPullRequestDocumentConfigurationFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'GitHubPullRequestDocumentAttachmentConfigurationFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
]
},
'AlfrescoConfiguration': {
'SiteUrl': 'string',
'SiteId': 'string',
'SecretArn': 'string',
'SslCertificateS3Path': {
'Bucket': 'string',
'Key': 'string'
},
'CrawlSystemFolders': True|False,
'CrawlComments': True|False,
'EntityFilter': [
'wiki'|'blog'|'documentLibrary',
],
'DocumentLibraryFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'BlogFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'WikiFieldMappings': [
{
'DataSourceFieldName': 'string',
'DateFieldFormat': 'string',
'IndexFieldName': 'string'
},
],
'InclusionPatterns': [
'string',
],
'ExclusionPatterns': [
'string',
],
'VpcConfiguration': {
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
}
},
'TemplateConfiguration': {
'Template': {...}|[...]|123|123.4|'string'|True|None
}
},
VpcConfiguration={
'SubnetIds': [
'string',
],
'SecurityGroupIds': [
'string',
]
},
Description='string',
Schedule='string',
RoleArn='string',
Tags=[
{
'Key': 'string',
'Value': 'string'
},
],
ClientToken='string',
LanguageCode='string',
CustomDocumentEnrichmentConfiguration={
'InlineConfigurations': [
{
'Condition': {
'ConditionDocumentAttributeKey': 'string',
'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
'ConditionOnValue': {
'StringValue': 'string',
'StringListValue': [
'string',
],
'LongValue': 123,
'DateValue': datetime(2015, 1, 1)
}
},
'Target': {
'TargetDocumentAttributeKey': 'string',
'TargetDocumentAttributeValueDeletion': True|False,
'TargetDocumentAttributeValue': {
'StringValue': 'string',
'StringListValue': [
'string',
],
'LongValue': 123,
'DateValue': datetime(2015, 1, 1)
}
},
'DocumentContentDeletion': True|False
},
],
'PreExtractionHookConfiguration': {
'InvocationCondition': {
'ConditionDocumentAttributeKey': 'string',
'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
'ConditionOnValue': {
'StringValue': 'string',
'StringListValue': [
'string',
],
'LongValue': 123,
'DateValue': datetime(2015, 1, 1)
}
},
'LambdaArn': 'string',
'S3Bucket': 'string'
},
'PostExtractionHookConfiguration': {
'InvocationCondition': {
'ConditionDocumentAttributeKey': 'string',
'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
'ConditionOnValue': {
'StringValue': 'string',
'StringListValue': [
'string',
],
'LongValue': 123,
'DateValue': datetime(2015, 1, 1)
}
},
'LambdaArn': 'string',
'S3Bucket': 'string'
},
'RoleArn': 'string'
}
)
[REQUIRED]
A name for the data source connector.
[REQUIRED]
The identifier of the index you want to use with the data source connector.
[REQUIRED]
The type of data source repository. For example, SHAREPOINT.
Configuration information to connect to your data source repository.
You can't specify the Configuration parameter when the Type parameter is set to CUSTOM. If you do, you receive a ValidationException exception.
The Configuration parameter is required for all other data sources.
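The rule above can be checked locally before a request is sent. The following sketch uses a hypothetical helper, data_source_kwargs, that omits Configuration for CUSTOM data sources and requires it for every other type. It raises a plain ValueError on the client side; it does not reproduce the service's ValidationException.

```python
from typing import Any, Dict, Optional


def data_source_kwargs(
    name: str,
    index_id: str,
    source_type: str,
    configuration: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build create_data_source arguments, enforcing the Configuration rule."""
    kwargs: Dict[str, Any] = {"Name": name, "IndexId": index_id, "Type": source_type}
    if source_type == "CUSTOM":
        # Configuration is not allowed for custom data sources.
        if configuration is not None:
            raise ValueError("Configuration must be omitted when Type is CUSTOM")
        return kwargs
    # Every other data source type requires Configuration.
    if configuration is None:
        raise ValueError(f"Configuration is required when Type is {source_type}")
    kwargs["Configuration"] = configuration
    return kwargs
```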
Provides the configuration information to connect to an Amazon S3 bucket as your data source.
The name of the bucket that contains the documents.
A list of S3 prefixes for the documents that should be included in the index.
A list of glob patterns for documents that should be indexed. If a document that matches an inclusion pattern also matches an exclusion pattern, the document is not indexed.
Some examples are:
A list of glob patterns for documents that should not be indexed. If a document that matches an inclusion prefix or inclusion pattern also matches an exclusion pattern, the document is not indexed.
Some examples are:
Document metadata files that contain information such as the document access control information, source URI, document author, and custom attributes. Each metadata file contains metadata about a single document.
A prefix used to filter metadata configuration files in the Amazon Web Services S3 bucket. The S3 bucket might contain multiple metadata files. Use S3Prefix to include only the desired metadata files.
Provides the path to the S3 bucket that contains the user context filtering files for the data source. For the format of the file, see Access control for S3 data sources.
Path to the Amazon S3 bucket that contains the ACL files.
Provides the configuration information to connect to Microsoft SharePoint as your data source.
The version of Microsoft SharePoint that you use.
The Microsoft SharePoint site URLs for the documents you want to index.
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to the SharePoint instance. If you use SharePoint Server, you also need to provide the server domain name as part of the credentials. For more information, see Using a Microsoft SharePoint Data Source.
You can also provide OAuth authentication credentials of user name, password, client ID, and client secret. For more information, see Using a SharePoint data source.
TRUE to index document attachments.
TRUE to use the SharePoint change log to determine which documents require updating in the index. Depending on the change log's size, it may take longer for Amazon Kendra to use the change log than to scan all of your documents in SharePoint.
A list of regular expression patterns to include certain documents in your SharePoint. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the document isn't included in the index.
The regex applies to the display URL of the SharePoint document.
A list of regular expression patterns to exclude certain documents in your SharePoint. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the document isn't included in the index.
The regex applies to the display URL of the SharePoint document.
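The precedence between inclusion and exclusion patterns can be illustrated locally. This is a sketch only: it assumes patterns are applied with Python's re.search against the display URL and that an empty inclusion list admits every document. Neither assumption is confirmed by this reference, and the service's own regular expression dialect may differ. The function name and URLs are hypothetical.

```python
import re
from typing import Iterable


def is_indexed(display_url: str, inclusion: Iterable[str], exclusion: Iterable[str]) -> bool:
    """Return True if a document would be indexed under the stated precedence."""
    # Exclusion takes precedence: any exclusion match rejects the document.
    if any(re.search(pattern, display_url) for pattern in exclusion):
        return False
    inclusion = list(inclusion)
    # Assumption: with no inclusion patterns, every remaining document is included.
    if not inclusion:
        return True
    return any(re.search(pattern, display_url) for pattern in inclusion)
```

For example, with inclusion pattern /sites/hr/ and exclusion pattern \.tmp$, a URL under /sites/hr/ ending in .tmp matches both lists and is rejected.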
Configuration information for an Amazon Virtual Private Cloud to connect to your Microsoft SharePoint. For more information, see Configuring a VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
A list of DataSourceToIndexFieldMapping objects that map SharePoint data source attributes or field names to Amazon Kendra index field names. To create custom fields, use the UpdateIndex API before you map to SharePoint fields. For more information, see Mapping data source fields. The SharePoint data source field names must exist in your SharePoint custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
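A field mapping entry is a small dictionary with the three keys shown in the request syntax. The sketch below builds such entries with a hypothetical helper, field_mapping. The field names used in the usage note are placeholders, and the index fields are assumed to have been created beforehand with UpdateIndex.

```python
from typing import Dict, Optional


def field_mapping(
    data_source_field: str,
    index_field: str,
    date_field_format: Optional[str] = None,
) -> Dict[str, str]:
    """Build one DataSourceToIndexFieldMapping entry."""
    mapping = {
        "DataSourceFieldName": data_source_field,
        "IndexFieldName": index_field,
    }
    # DateFieldFormat is only added when the caller supplies one.
    if date_field_format is not None:
        mapping["DateFieldFormat"] = date_field_format
    return mapping
```

A list such as [field_mapping("Department", "sp_department")] could then be passed as the FieldMappings value of a connector configuration.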
The Microsoft SharePoint attribute field that contains the title of the document.
TRUE to disable local groups information.
The path to the SSL certificate stored in an Amazon S3 bucket. You use this to connect to SharePoint Server if you require a secure SSL connection.
You can simply generate a self-signed X509 certificate on any computer using OpenSSL. For an example of using OpenSSL to create an X509 certificate, see Create and sign an X509 certificate.
The name of the S3 bucket that contains the file.
The name of the file.
Whether you want to connect to SharePoint using basic authentication of user name and password, or OAuth authentication of user name, password, client ID, and client secret. You can use OAuth authentication for SharePoint Online.
Configuration information to connect to your Microsoft SharePoint site URLs via a web proxy. You can use this option for SharePoint Server.
You must provide the website host name and port number. For example, the host name of https://a.example.com/page1.html is "a.example.com" and the port is 443, the standard port for HTTPS.
Web proxy credentials are optional and you can use them to connect to a web proxy server that requires basic authentication of user name and password. To store web proxy credentials, you use a secret in Secrets Manager.
It is recommended that you follow best security practices when configuring your web proxy. This includes setting up throttling, setting up logging and monitoring, and applying security patches on a regular basis. If you use your web proxy with multiple data sources, sync jobs that occur at the same time could strain the load on your proxy. It is recommended you prepare your proxy beforehand for any security and load requirements.
The name of the website host you want to connect to via a web proxy server.
For example, the host name of https://a.example.com/page1.html is "a.example.com".
The port number of the website host you want to connect to via a web proxy server.
For example, the port for https://a.example.com/page1.html is 443, the standard port for HTTPS.
Your secret ARN, which you can create in Secrets Manager.
The credentials are optional. You use a secret if web proxy credentials are required to connect to a website host. Amazon Kendra currently supports basic authentication to connect to a web proxy server. The secret stores your credentials.
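The host name and port in the examples above can be derived from a URL with the standard library. The sketch below uses a hypothetical helper, proxy_configuration, that fills in the standard port for http and https when the URL does not name one and adds Credentials only when a secret ARN is supplied.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlsplit

# Standard ports used when the URL does not specify one.
_DEFAULT_PORTS = {"http": 80, "https": 443}


def proxy_configuration(url: str, credentials_secret_arn: Optional[str] = None) -> Dict[str, Any]:
    """Build a ProxyConfiguration dictionary from a website URL."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host name found in {url!r}")
    port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"no port given and no default known for scheme {parts.scheme!r}")
    config: Dict[str, Any] = {"Host": parts.hostname, "Port": port}
    # Credentials are optional: a Secrets Manager secret ARN for basic authentication.
    if credentials_secret_arn is not None:
        config["Credentials"] = credentials_secret_arn
    return config
```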
Provides the configuration information to connect to a database as your data source.
The type of database engine that runs the database.
Configuration information that's required to connect to a database.
The name of the host for the database. Can be either a string (host.subdomain.domain.tld) or an IPv4 or IPv6 address.
The port that the database uses for connections.
The name of the database containing the document data.
The name of the table that contains the document data.
The Amazon Resource Name (ARN) of credentials stored in Secrets Manager. The credentials should be a user/password pair. For more information, see Using a Database Data Source. For more information about Secrets Manager, see What Is Secrets Manager in the Secrets Manager user guide.
Provides the configuration information to connect to an Amazon VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
Information about where the index should get the document information from the database.
The column that provides the document's identifier.
The column that contains the contents of the document.
The column that contains the title of the document.
An array of objects that map database column names to the corresponding fields in an index. You must first create the fields in the index using the UpdateIndex API.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
One to five columns that indicate when a document in the database has changed.
Information about the database column that provides information for user context filtering.
A list of groups, separated by semi-colons, that filters a query response based on user context. The document is only returned to users that are in one of the groups specified in the UserContext field of the Query API.
Provides information about how Amazon Kendra uses quote marks around SQL identifiers when querying a database data source.
Determines whether Amazon Kendra encloses SQL identifiers for tables and column names in double quotes (") when making a database query.
By default, Amazon Kendra passes SQL identifiers the way that they are entered into the data source configuration. It does not change the case of identifiers or enclose them in quotes.
PostgreSQL internally converts uppercase characters to lower case characters in identifiers unless they are quoted. Choosing this option encloses identifiers in quotes so that PostgreSQL does not convert the character's case.
For MySQL databases, you must enable the ansi_quotes option when you set this field to DOUBLE_QUOTES.
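The database settings described above can be assembled as follows. This is a sketch under stated assumptions: the helper name, host, table, column names, and secret ARN are placeholders, and the check on ChangeDetectingColumns simply mirrors the "one to five columns" wording. Setting quote_identifiers to True selects DOUBLE_QUOTES, which is the option the text describes for preserving identifier case in PostgreSQL.

```python
from typing import Any, Dict, List


def database_configuration(
    engine: str,
    host: str,
    port: int,
    database: str,
    table: str,
    secret_arn: str,
    id_column: str,
    data_column: str,
    change_columns: List[str],
    quote_identifiers: bool = False,
) -> Dict[str, Any]:
    """Build a DatabaseConfiguration dictionary for create_data_source."""
    if not 1 <= len(change_columns) <= 5:
        raise ValueError("ChangeDetectingColumns must list one to five columns")
    return {
        "DatabaseEngineType": engine,
        "ConnectionConfiguration": {
            "DatabaseHost": host,
            "DatabasePort": port,
            "DatabaseName": database,
            "TableName": table,
            "SecretArn": secret_arn,
        },
        "ColumnConfiguration": {
            "DocumentIdColumnName": id_column,
            "DocumentDataColumnName": data_column,
            "ChangeDetectingColumns": list(change_columns),
        },
        "SqlConfiguration": {
            # DOUBLE_QUOTES encloses table and column identifiers in double quotes.
            "QueryIdentifiersEnclosingOption": "DOUBLE_QUOTES" if quote_identifiers else "NONE",
        },
    }
```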
Provides the configuration information to connect to Salesforce as your data source.
The instance URL for the Salesforce site that you want to index.
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key/value pairs required to connect to your Salesforce instance. The secret must contain a JSON structure with the following keys:
Configuration of the Salesforce standard objects that Amazon Kendra indexes.
Provides the configuration information for indexing a single standard object.
The name of the standard object.
The name of the field in the standard object table that contains the document contents.
The name of the field in the standard object table that contains the document title.
Maps attributes or field names of the standard object to Amazon Kendra index field names. To create custom fields, use the UpdateIndex API before you map to Salesforce fields. For more information, see Mapping data source fields. The Salesforce data source field names must exist in your Salesforce custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
Configuration information for the knowledge article types that Amazon Kendra indexes. Amazon Kendra indexes standard knowledge articles and the standard fields of knowledge articles, or the custom fields of custom knowledge articles, but not both.
Specifies the document states that should be included when Amazon Kendra indexes knowledge articles. You must specify at least one state.
Configuration information for standard Salesforce knowledge articles.
The name of the field that contains the document data to index.
The name of the field that contains the document title.
Maps attributes or field names of the knowledge article to Amazon Kendra index field names. To create custom fields, use the UpdateIndex API before you map to Salesforce fields. For more information, see Mapping data source fields. The Salesforce data source field names must exist in your Salesforce custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
Configuration information for custom Salesforce knowledge articles.
Provides the configuration information for indexing Salesforce custom articles.
The name of the configuration.
The name of the field in the custom knowledge article that contains the document data to index.
The name of the field in the custom knowledge article that contains the document title.
Maps attributes or field names of the custom knowledge article to Amazon Kendra index field names. To create custom fields, use the UpdateIndex API before you map to Salesforce fields. For more information, see Mapping data source fields. The Salesforce data source field names must exist in your Salesforce custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
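The two constraints on the knowledge article configuration (at least one included state, and standard or custom article types but not both) can be validated before the request is built. The helper below is a hypothetical sketch; it raises ValueError locally and makes no claim about how the service itself reports these errors. Whether supplying neither article type is accepted is not stated here, so the sketch only rejects supplying both.

```python
from typing import Any, Dict, Iterable, List, Optional

_VALID_STATES = {"DRAFT", "PUBLISHED", "ARCHIVED"}


def knowledge_article_configuration(
    included_states: Iterable[str],
    standard: Optional[Dict[str, Any]] = None,
    custom: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build a Salesforce KnowledgeArticleConfiguration dictionary."""
    states = list(included_states)
    if not states:
        raise ValueError("at least one state must be specified")
    unknown = set(states) - _VALID_STATES
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")
    # Standard and custom article types are mutually exclusive.
    if standard is not None and custom is not None:
        raise ValueError("specify standard or custom article types, not both")
    config: Dict[str, Any] = {"IncludedStates": states}
    if standard is not None:
        config["StandardKnowledgeArticleTypeConfiguration"] = standard
    if custom is not None:
        config["CustomKnowledgeArticleTypeConfigurations"] = list(custom)
    return config
```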
Configuration information for Salesforce chatter feeds.
The name of the column in the Salesforce FeedItem table that contains the content to index. Typically this is the Body column.
The name of the column in the Salesforce FeedItem table that contains the title of the document. This is typically the Title column.
Maps fields from a Salesforce chatter feed into Amazon Kendra index fields.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
Filters the documents in the feed based on status of the user. When you specify ACTIVE_USER, only documents from users who have an active account are indexed. When you specify STANDARD_USER, only documents for Salesforce standard users are indexed. You can specify both.
Indicates whether Amazon Kendra should index attachments to Salesforce objects.
Configuration information for processing attachments to Salesforce standard objects.
The name of the field used for the document title.
One or more objects that map fields in attachments to Amazon Kendra index fields.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of regular expression patterns to include certain documents in your Salesforce. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the document isn't included in the index.
The pattern is applied to the name of the attached file.
A list of regular expression patterns to exclude certain documents in your Salesforce. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the document isn't included in the index.
The pattern is applied to the name of the attached file.
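The precedence rule for inclusion and exclusion patterns can be illustrated locally. The following is a minimal sketch, not Amazon Kendra's matching implementation: the helper name `is_indexed` is hypothetical, it assumes unanchored `re.search` matching, and it assumes that an empty inclusion list admits every file name.

```python
import re


def is_indexed(file_name, inclusion_patterns, exclusion_patterns):
    """Return True if a file name would be kept under the stated precedence rule."""
    # Exclusion takes precedence: any exclusion match removes the document.
    if any(re.search(p, file_name) for p in exclusion_patterns):
        return False
    # Assumption: with no inclusion patterns, nothing is filtered out.
    if not inclusion_patterns:
        return True
    # Otherwise the name must match at least one inclusion pattern.
    return any(re.search(p, file_name) for p in inclusion_patterns)


include = [r"\.pdf$", r"\.docx$"]
exclude = [r"^draft_"]

print(is_indexed("report.pdf", include, exclude))        # True
print(is_indexed("draft_report.pdf", include, exclude))  # False (matches both, exclusion wins)
print(is_indexed("notes.txt", include, exclude))         # False (matches no inclusion pattern)
```

Testing patterns this way before a sync can catch an exclusion pattern that unintentionally shadows an inclusion pattern.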
Provides the configuration information to connect to Microsoft OneDrive as your data source.
The Azure Active Directory domain of the organization.
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password to connect to OneDrive. The user name should be the application ID for the OneDrive application, and the password is the application key for the OneDrive application.
A list of user accounts whose documents should be indexed.
A list of users whose documents should be indexed. Specify the user names in email format, for example, username@tenantdomain
. If you need to index the documents of more than 100 users, use the OneDriveUserS3Path
field to specify the location of a file containing a list of users.
The S3 bucket location of a file containing a list of users whose documents should be indexed.
The name of the S3 bucket that contains the file.
The name of the file.
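As a rough sketch of the guidance above, the choice between an inline user list and a file in S3 could be made by a small helper. The key names (`OneDriveUserList`, `OneDriveUserS3Path`, `Bucket`, `Key`) follow the Request Syntax; the helper name `build_onedrive_users` and the example values are hypothetical.

```python
def build_onedrive_users(users, bucket=None, key=None):
    """Pick an inline user list or an S3 file, following the 100-user guidance."""
    if len(users) <= 100:
        # Small enough to pass the user names inline, in email format.
        return {"OneDriveUserList": list(users)}
    if not bucket or not key:
        raise ValueError(
            "More than 100 users: provide the S3 bucket and key of a user list file"
        )
    # For larger sets, point at a file in S3 that contains the list of users.
    return {"OneDriveUserS3Path": {"Bucket": bucket, "Key": key}}


print(build_onedrive_users(["username@tenantdomain"]))
# {'OneDriveUserList': ['username@tenantdomain']}
```

The returned dictionary is intended as the value of `OneDriveUsers` inside `OneDriveConfiguration`.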
A list of regular expression patterns to include certain documents in your OneDrive. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the document isn't included in the index.
The pattern is applied to the file name.
A list of regular expression patterns to exclude certain documents in your OneDrive. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the document isn't included in the index.
The pattern is applied to the file name.
A list of DataSourceToIndexFieldMapping
objects that map OneDrive data source attributes or field names to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to OneDrive fields. For more information, see Mapping data source fields. The OneDrive data source field names must exist in your OneDrive custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
TRUE
to disable local groups information.
Provides the configuration information to connect to ServiceNow as your data source.
The ServiceNow instance that the data source connects to. The host endpoint should look like the following: {instance}.service-now.com.
The Amazon Resource Name (ARN) of the Secrets Manager secret that contains the user name and password required to connect to the ServiceNow instance. You can also provide OAuth authentication credentials of user name, password, client ID, and client secret. For more information, see Using a ServiceNow data source.
The identifier of the release that the ServiceNow host is running. If the host is not running the LONDON
release, use OTHERS
.
Configuration information for crawling knowledge articles in the ServiceNow site.
TRUE
to index attachments to knowledge articles.
A list of regular expression patterns to include certain attachments of knowledge articles in your ServiceNow. Items that match the patterns are included in the index. Items that don't match the patterns are excluded from the index. If an item matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the item isn't included in the index.
The regex is applied to the field specified in the PatternTargetField
.
A list of regular expression patterns to exclude certain attachments of knowledge articles in your ServiceNow. Items that match the patterns are excluded from the index. Items that don't match the patterns are included in the index. If an item matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the item isn't included in the index.
The regex is applied to the field specified in the PatternTargetField
.
The name of the ServiceNow field that is mapped to the index document contents field in the Amazon Kendra index.
The name of the ServiceNow field that is mapped to the index document title field.
Maps attributes or field names of knowledge articles to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to ServiceNow fields. For more information, see Mapping data source fields. The ServiceNow data source field names must exist in your ServiceNow custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A query that selects the knowledge articles to index. The query can return articles from multiple knowledge bases, and the knowledge bases can be public or private.
The query string must be one generated by the ServiceNow console. For more information, see Specifying documents to index with a query.
Configuration information for crawling service catalogs in the ServiceNow site.
TRUE
to index attachments to service catalog items.
A list of regular expression patterns to include certain attachments of catalogs in your ServiceNow. Items that match the patterns are included in the index. Items that don't match the patterns are excluded from the index. If an item matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the item isn't included in the index.
The regex is applied to the file name of the attachment.
A list of regular expression patterns to exclude certain attachments of catalogs in your ServiceNow. Items that match the patterns are excluded from the index. Items that don't match the patterns are included in the index. If an item matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the item isn't included in the index.
The regex is applied to the file name of the attachment.
The name of the ServiceNow field that is mapped to the index document contents field in the Amazon Kendra index.
The name of the ServiceNow field that is mapped to the index document title field.
Maps attributes or field names of catalogs to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to ServiceNow fields. For more information, see Mapping data source fields. The ServiceNow data source field names must exist in your ServiceNow custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
The type of authentication used to connect to the ServiceNow instance. If you choose HTTP_BASIC
, Amazon Kendra is authenticated using the user name and password provided in the Secrets Manager secret in the SecretArn
field. If you choose OAUTH2
, Amazon Kendra is authenticated using the credentials of client ID, client secret, user name and password.
When you use OAUTH2
authentication, you must generate a token and a client secret using the ServiceNow console. For more information, see Using a ServiceNow data source.
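The release and authentication choices described above might be combined as follows. This is a sketch under stated assumptions: the key names (`HostUrl`, `SecretArn`, `ServiceNowBuildVersion`, `AuthenticationType`) follow the Request Syntax, while the helper name `build_servicenow_config`, the host, and the ARN are hypothetical placeholders.

```python
def build_servicenow_config(host_url, secret_arn, release, use_oauth2=False):
    """Assemble the top-level ServiceNow settings as a plain dictionary."""
    # Any release other than LONDON is reported as OTHERS.
    build_version = "LONDON" if release.upper() == "LONDON" else "OTHERS"
    return {
        "HostUrl": host_url,
        "SecretArn": secret_arn,
        "ServiceNowBuildVersion": build_version,
        # With OAUTH2 the secret must also hold the client ID and client secret.
        "AuthenticationType": "OAUTH2" if use_oauth2 else "HTTP_BASIC",
    }


config = build_servicenow_config(
    host_url="example.service-now.com",  # hypothetical instance
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",  # placeholder
    release="Tokyo",
    use_oauth2=True,
)
print(config["ServiceNowBuildVersion"], config["AuthenticationType"])  # OTHERS OAUTH2
```

The resulting dictionary would be passed as `Configuration={"ServiceNowConfiguration": config}`, usually together with a knowledge article or service catalog configuration.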
Provides the configuration information to connect to Confluence as your data source.
The URL of your Confluence instance. Use the full URL of the server. For example, https://server.example.com:port/. You can also use an IP address, for example, https://192.168.1.113/.
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to the Confluence instance. If you use Confluence Cloud, you use a generated API token as the password.
You can also provide authentication credentials in the form of a personal access token. For more information, see Using a Confluence data source.
The version or the type of Confluence installation to connect to.
Configuration information for indexing Confluence spaces.
TRUE
to index personal spaces. You can add restrictions to items in personal spaces. If personal spaces are indexed, queries without user context information may return restricted items from a personal space in their results. For more information, see Filtering on user context.
TRUE
to index archived spaces.
A list of space keys for Confluence spaces. If you include a key, the blogs, documents, and attachments in the space are indexed. Spaces that aren't in the list aren't indexed. A space in the list must exist. Otherwise, Amazon Kendra logs an error when the data source is synchronized. If a space is in both the IncludeSpaces
and the ExcludeSpaces
list, the space is excluded.
A list of space keys of Confluence spaces. If you include a key, the blogs, documents, and attachments in the space are not indexed. If a space is in both the ExcludeSpaces
and the IncludeSpaces
list, the space is excluded.
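The interaction between the two space lists can be shown with a small local helper. This is only an illustration of the stated rule for listed spaces, not Amazon Kendra's implementation; the function name `effective_spaces` and the space keys are hypothetical.

```python
def effective_spaces(include_spaces, exclude_spaces):
    """Return the listed spaces that remain after exclusions are applied."""
    excluded = set(exclude_spaces)
    # A key that appears in both lists is excluded.
    return [key for key in include_spaces if key not in excluded]


print(effective_spaces(["ENG", "OPS", "HR"], ["HR", "LEGAL"]))  # ['ENG', 'OPS']
```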
Maps attributes or field names of Confluence spaces to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Confluence fields. For more information, see Mapping data source fields. The Confluence data source field names must exist in your Confluence custom metadata.
If you specify the SpaceFieldMappings
parameter, you must specify at least one field mapping.
Maps attributes or field names of Confluence spaces to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Confluence fields. For more information, see Mapping data source fields. The Confluence data source field names must exist in your Confluence custom metadata.
The name of the field in the data source.
The format for date fields in the data source. If the field specified in DataSourceFieldName
is a date field you must specify the date format. If the field is not a date field, an exception is thrown.
The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.
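A space configuration with field mappings might be built as in the sketch below. The key names (`IncludeSpaces`, `SpaceFieldMappings`, `DataSourceFieldName`, `DateFieldFormat`, `IndexFieldName`) follow the Request Syntax. The helper names, the space key `ENG`, the source field value `SPACE_KEY`, and the index field `confluence_space_key` are assumptions for illustration, and the index field would need to be created with UpdateIndex first.

```python
def field_mapping(source_field, index_field, date_format=None):
    """Build one field-mapping entry; pass date_format only for date fields."""
    mapping = {"DataSourceFieldName": source_field, "IndexFieldName": index_field}
    if date_format is not None:
        mapping["DateFieldFormat"] = date_format
    return mapping


def space_configuration(include_spaces=(), field_mappings=None):
    """Assemble a space configuration, enforcing the at-least-one-mapping rule."""
    config = {}
    if include_spaces:
        config["IncludeSpaces"] = list(include_spaces)
    if field_mappings is not None:
        # If the parameter is specified at all, it must not be empty.
        if len(field_mappings) == 0:
            raise ValueError("SpaceFieldMappings needs at least one field mapping")
        config["SpaceFieldMappings"] = list(field_mappings)
    return config


config = space_configuration(
    include_spaces=["ENG"],  # hypothetical space key
    field_mappings=[field_mapping("SPACE_KEY", "confluence_space_key")],
)
print(config)
```

Omitting `field_mappings` entirely leaves the key out of the request, which avoids sending an empty list.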
Configuration information for indexing Confluence pages.
Maps attributes or field names of Confluence pages to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Confluence fields. For more information, see Mapping data source fields. The Confluence data source field names must exist in your Confluence custom metadata.
If you specify the PageFieldMappings
parameter, you must specify at least one field mapping.
Maps attributes or field names of Confluence pages to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Confluence fields. For more information, see Mapping data source fields. The Confluence data source field names must exist in your Confluence custom metadata.
The name of the field in the data source.
The format for date fields in the data source. If the field specified in DataSourceFieldName
is a date field you must specify the date format. If the field is not a date field, an exception is thrown.
The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.
Configuration information for indexing Confluence blogs.
Maps attributes or field names of Confluence blogs to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Confluence fields. For more information, see Mapping data source fields. The Confluence data source field names must exist in your Confluence custom metadata.
If you specify the BlogFieldMappings
parameter, you must specify at least one field mapping.
Maps attributes or field names of Confluence blogs to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Confluence fields. For more information, see Mapping data source fields. The Confluence data source field names must exist in your Confluence custom metadata.
The name of the field in the data source.
The format for date fields in the data source. If the field specified in DataSourceFieldName
is a date field you must specify the date format. If the field is not a date field, an exception is thrown.
The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.
Configuration information for indexing attachments to Confluence blogs and pages.
TRUE
to index attachments of pages and blogs in Confluence.
Maps attributes or field names of Confluence attachments to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Confluence fields. For more information, see Mapping data source fields. The Confluence data source field names must exist in your Confluence custom metadata.
If you specify the AttachmentFieldMappings
parameter, you must specify at least one field mapping.
Maps attributes or field names of Confluence attachments to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Confluence fields. For more information, see Mapping data source fields. The Confluence data source field names must exist in your Confluence custom metadata.
The name of the field in the data source.
You must first create the index field using the UpdateIndex
API.
The format for date fields in the data source. If the field specified in DataSourceFieldName
is a date field you must specify the date format. If the field is not a date field, an exception is thrown.
The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.
Configuration information for an Amazon Virtual Private Cloud to connect to your Confluence. For more information, see Configuring a VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
A list of regular expression patterns to include certain blog posts, pages, spaces, or attachments in your Confluence. Content that matches the patterns is included in the index. Content that doesn't match the patterns is excluded from the index. If content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the content isn't included in the index.
A list of regular expression patterns to exclude certain blog posts, pages, spaces, or attachments in your Confluence. Content that matches the patterns is excluded from the index. Content that doesn't match the patterns is included in the index. If content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the content isn't included in the index.
Configuration information to connect to your Confluence URL instance via a web proxy. You can use this option for Confluence Server.
You must provide the website host name and port number. For example, the host name of https://a.example.com/page1.html is "a.example.com" and the port is 443, the standard port for HTTPS.
Web proxy credentials are optional and you can use them to connect to a web proxy server that requires basic authentication of user name and password. To store web proxy credentials, you use a secret in Secrets Manager.
It is recommended that you follow best security practices when configuring your web proxy. This includes setting up throttling, setting up logging and monitoring, and applying security patches on a regular basis. If you use your web proxy with multiple data sources, sync jobs that occur at the same time could strain the load on your proxy. It is recommended you prepare your proxy beforehand for any security and load requirements.
The name of the website host you want to connect to via a web proxy server.
For example, the host name of https://a.example.com/page1.html is "a.example.com".
The port number of the website host you want to connect to via a web proxy server.
For example, the port for https://a.example.com/page1.html is 443, the standard port for HTTPS.
Your secret ARN, which you can create in Secrets Manager.
The credentials are optional. You use a secret if web proxy credentials are required to connect to a website host. Amazon Kendra currently supports basic authentication to connect to a web proxy server. The secret stores your credentials.
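The host name and port in the example above can be derived from a URL with the standard library. The sketch below assumes the `Host`, `Port`, and `Credentials` key names from the Request Syntax; the helper name `proxy_configuration` and the fallback port table are illustrative assumptions.

```python
from urllib.parse import urlsplit

# Standard ports used when the URL does not name one explicitly.
DEFAULT_PORTS = {"https": 443, "http": 80}


def proxy_configuration(url, credentials_secret_arn=None):
    """Derive the host name and port from a URL for a proxy configuration."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    config = {"Host": parts.hostname, "Port": port}
    # Credentials are optional; include them only when the proxy requires them.
    if credentials_secret_arn:
        config["Credentials"] = credentials_secret_arn
    return config


print(proxy_configuration("https://a.example.com/page1.html"))
# {'Host': 'a.example.com', 'Port': 443}
```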
Whether you want to connect to Confluence using basic authentication of user name and password, or a personal access token. You can use a personal access token for Confluence Server.
Provides the configuration information to connect to Google Drive as your data source.
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the credentials required to connect to Google Drive. For more information, see Using a Google Workspace Drive data source.
A list of regular expression patterns to include certain items in your Google Drive, including shared drives and users' My Drives. Items that match the patterns are included in the index. Items that don't match the patterns are excluded from the index. If an item matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the item isn't included in the index.
A list of regular expression patterns to exclude certain items in your Google Drive, including shared drives and users' My Drives. Items that match the patterns are excluded from the index. Items that don't match the patterns are included in the index. If an item matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the item isn't included in the index.
Maps Google Drive data source attributes or field names to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Google Drive fields. For more information, see Mapping data source fields. The Google Drive data source field names must exist in your Google Drive custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of MIME types to exclude from the index. All documents matching the specified MIME type are excluded.
For a list of MIME types, see Using a Google Workspace Drive data source.
A list of email addresses of the users. Documents owned by these users are excluded from the index. Documents shared with excluded users are indexed unless they are excluded in another way.
A list of identifiers of shared drives to exclude from the index. All files and folders stored on the shared drive are excluded.
Provides the configuration information required for Amazon Kendra Web Crawler.
Specifies the seed or starting point URLs of the websites or the sitemap URLs of the websites you want to crawl.
You can include website subdomains. You can list up to 100 seed URLs and up to three sitemap URLs.
You can only crawl websites that use the secure communication protocol, Hypertext Transfer Protocol Secure (HTTPS). If you receive an error when crawling a website, it could be that the website is blocked from crawling.
When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own webpages, or webpages that you have authorization to index.
Configuration of the seed or starting point URLs of the websites you want to crawl.
You can choose to crawl only the website host names, or the website host names with subdomains, or the website host names with subdomains and other domains that the webpages link to.
You can list up to 100 seed URLs.
The list of seed or starting point URLs of the websites you want to crawl.
The list can include a maximum of 100 seed URLs.
You can choose one of the following modes:
HOST_ONLY – crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
SUBDOMAINS – crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
EVERYTHING – crawl the website host names with subdomains and other domains that the webpages link to.
The default mode is set to HOST_ONLY.
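The three modes can be pictured with a local scope check. This is an illustrative approximation of the descriptions above, not the crawler's actual logic; the function name `host_in_scope` is hypothetical, and for EVERYTHING it assumes the candidate host was reached through a link on a crawled page.

```python
def host_in_scope(seed_host, candidate_host, mode="HOST_ONLY"):
    """Approximate whether a host would be crawled for a given crawler mode."""
    if mode == "HOST_ONLY":
        # Only the seed host itself.
        return candidate_host == seed_host
    if mode == "SUBDOMAINS":
        # The seed host plus any host that ends with ".<seed host>".
        return candidate_host == seed_host or candidate_host.endswith("." + seed_host)
    if mode == "EVERYTHING":
        # Subdomains and any other linked domain.
        return True
    raise ValueError("Unknown mode: " + mode)


print(host_in_scope("abc.example.com", "a.abc.example.com"))                # False
print(host_in_scope("abc.example.com", "a.abc.example.com", "SUBDOMAINS"))  # True
print(host_in_scope("abc.example.com", "other.org", "EVERYTHING"))          # True
```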
Configuration of the sitemap URLs of the websites you want to crawl.
Only URLs belonging to the same website host names are crawled. You can list up to three sitemap URLs.
The list of sitemap URLs of the websites you want to crawl.
The list can include a maximum of three sitemap URLs.
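The limits on seed and sitemap URLs can be checked before the request is sent. The following sketch uses the `SeedUrlConfiguration`, `SeedUrls`, `WebCrawlerMode`, `SiteMapsConfiguration`, and `SiteMaps` key names from the Request Syntax; the helper name `build_urls` and the example URL are hypothetical, and the checks mirror only the limits stated above.

```python
from urllib.parse import urlsplit

MAX_SEED_URLS = 100
MAX_SITEMAP_URLS = 3


def build_urls(seed_urls=(), sitemap_urls=(), mode="HOST_ONLY"):
    """Validate seed and sitemap URLs locally and build the Urls structure."""
    if len(seed_urls) > MAX_SEED_URLS:
        raise ValueError("At most 100 seed URLs are allowed")
    if len(sitemap_urls) > MAX_SITEMAP_URLS:
        raise ValueError("At most three sitemap URLs are allowed")
    for url in [*seed_urls, *sitemap_urls]:
        # Only websites served over HTTPS can be crawled.
        if urlsplit(url).scheme != "https":
            raise ValueError("Not an HTTPS URL: " + url)
    urls = {}
    if seed_urls:
        urls["SeedUrlConfiguration"] = {
            "SeedUrls": list(seed_urls),
            "WebCrawlerMode": mode,
        }
    if sitemap_urls:
        urls["SiteMapsConfiguration"] = {"SiteMaps": list(sitemap_urls)}
    return urls


print(build_urls(seed_urls=["https://abc.example.com"], mode="SUBDOMAINS"))
```

The result is intended as the value of `Urls` inside `WebCrawlerConfiguration`.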
Specifies the number of levels in a website that you want to crawl.
The first level begins from the website seed or starting point URL. For example, if a website has 3 levels – index level (i.e. seed in this example), sections level, and subsections level – and you are only interested in crawling information up to the sections level (i.e. levels 0-1), you can set your depth to 1.
The default crawl depth is set to 2.
The maximum number of URLs on a webpage to include when crawling a website. This number is per webpage.
As a website’s webpages are crawled, any URLs the webpages link to are also crawled. URLs on a webpage are crawled in order of appearance.
The default maximum links per page is 100.
The maximum size (in MB) of a webpage or attachment to crawl.
Files larger than this size (in MB) are skipped and not crawled.
The default maximum size of a webpage or attachment is set to 50 MB.
The maximum number of URLs crawled per website host per minute.
A minimum of one URL is required.
The default maximum number of URLs crawled per website host per minute is 300.
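For reference, the documented defaults for the crawl limits can be collected in one place and overridden selectively. The key names (`CrawlDepth`, `MaxLinksPerPage`, `MaxContentSizePerPageInMegaBytes`, `MaxUrlsPerMinuteCrawlRate`) follow the Request Syntax; the helper name `crawl_limits` is hypothetical, and the only range check is the minimum of one URL per minute stated above.

```python
# Defaults as documented for each crawl limit.
DEFAULT_LIMITS = {
    "CrawlDepth": 2,
    "MaxLinksPerPage": 100,
    "MaxContentSizePerPageInMegaBytes": 50.0,
    "MaxUrlsPerMinuteCrawlRate": 300,
}


def crawl_limits(**overrides):
    """Return the crawl limits with any overrides applied on top of the defaults."""
    unknown = set(overrides) - set(DEFAULT_LIMITS)
    if unknown:
        raise ValueError("Unknown crawl limit(s): " + ", ".join(sorted(unknown)))
    limits = {**DEFAULT_LIMITS, **overrides}
    if limits["MaxUrlsPerMinuteCrawlRate"] < 1:
        raise ValueError("MaxUrlsPerMinuteCrawlRate must be at least 1")
    return limits


# Crawl only the seed page and the sections level (levels 0-1).
print(crawl_limits(CrawlDepth=1))
```

Stating the defaults explicitly is optional, since omitted parameters fall back to the same values.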
A list of regular expression patterns to include certain URLs to crawl. URLs that match the patterns are included in the index. URLs that don't match the patterns are excluded from the index. If a URL matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the URL isn't included in the index.
A list of regular expression patterns to exclude certain URLs to crawl. URLs that match the patterns are excluded from the index. URLs that don't match the patterns are included in the index. If a URL matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the URL isn't included in the index.
Configuration information required to connect to your internal websites via a web proxy.
You must provide the website host name and port number. For example, the host name of https://a.example.com/page1.html is "a.example.com" and the port is 443, the standard port for HTTPS.
Web proxy credentials are optional and you can use them to connect to a web proxy server that requires basic authentication. To store web proxy credentials, you use a secret in Secrets Manager.
The name of the website host you want to connect to via a web proxy server.
For example, the host name of https://a.example.com/page1.html is "a.example.com".
The port number of the website host you want to connect to via a web proxy server.
For example, the port for https://a.example.com/page1.html is 443, the standard port for HTTPS.
Your secret ARN, which you can create in Secrets Manager.
The credentials are optional. You use a secret if web proxy credentials are required to connect to a website host. Amazon Kendra currently supports basic authentication to connect to a web proxy server. The secret stores your credentials.
Configuration information required to connect to websites using authentication.
You can connect to websites using basic authentication of user name and password. You use a secret in Secrets Manager to store your authentication credentials.
You must provide the website host name and port number. For example, the host name of https://a.example.com/page1.html is "a.example.com" and the port is 443, the standard port for HTTPS.
The list of configuration information that's required to connect to and crawl a website host using basic authentication credentials.
The list includes the name and port number of the website host.
Provides the configuration information to connect to websites that require basic user authentication.
The name of the website host you want to connect to using authentication credentials.
For example, the host name of https://a.example.com/page1.html is "a.example.com".
The port number of the website host you want to connect to using authentication credentials.
For example, the port for https://a.example.com/page1.html is 443, the standard port for HTTPS.
Your secret ARN, which you can create in Secrets Manager.
You use a secret if basic authentication credentials are required to connect to a website. The secret stores your credentials of user name and password.
Provides the configuration information to connect to Amazon WorkDocs as your data source.
The identifier of the directory corresponding to your Amazon WorkDocs site repository.
You can find the organization ID in the Directory Service by going to Active Directory, then Directories. Your Amazon WorkDocs site directory has an ID, which is the organization ID. You can also set up a new Amazon WorkDocs directory in the Directory Service console and enable an Amazon WorkDocs site for the directory in the Amazon WorkDocs console.
TRUE
to include comments on documents in your index. Including comments in your index means each comment is a document that can be searched on.
The default is set to FALSE
.
TRUE
to use the Amazon WorkDocs change log to determine which documents require updating in the index. Depending on the change log's size, it may take longer for Amazon Kendra to use the change log than to scan all of your documents in Amazon WorkDocs.
A list of regular expression patterns to include certain files in your Amazon WorkDocs site repository. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
A list of regular expression patterns to exclude certain files in your Amazon WorkDocs site repository. Files that match the patterns are excluded from the index. Files that don’t match the patterns are included in the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
A list of DataSourceToIndexFieldMapping
objects that map Amazon WorkDocs data source attributes or field names to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Amazon WorkDocs fields. For more information, see Mapping data source fields. The Amazon WorkDocs data source field names must exist in your Amazon WorkDocs custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
Provides the configuration information to connect to Amazon FSx as your data source.
The identifier of the Amazon FSx file system.
You can find your file system ID on the file system dashboard in the Amazon FSx console. For information on how to create a file system in Amazon FSx console, using Windows File Server as an example, see Amazon FSx Getting started guide.
The Amazon FSx file system type. Windows is currently the only supported type.
Configuration information for an Amazon Virtual Private Cloud to connect to your Amazon FSx. Your Amazon FSx instance must reside inside your VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your Amazon FSx file system. Windows is currently the only supported type. The secret must contain a JSON structure with the following keys:
A list of regular expression patterns to include certain files in your Amazon FSx file system. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
A list of regular expression patterns to exclude certain files in your Amazon FSx file system. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
A list of DataSourceToIndexFieldMapping
objects that map Amazon FSx data source attributes or field names to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Amazon FSx fields. For more information, see Mapping data source fields. The Amazon FSx data source field names must exist in your Amazon FSx custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
Provides the configuration information to connect to Slack as your data source.
The identifier of the team in the Slack workspace. For example, T0123456789.
You can find your team ID in the URL of the main page of your Slack workspace. When you log in to Slack via a browser, you are directed to the URL of the main page. For example, https://app.slack.com/client/T0123456789/....
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your Slack workspace team. The secret must contain a JSON structure with the following keys:
Configuration information for an Amazon Virtual Private Cloud to connect to your Slack. For more information, see Configuring a VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
Specify whether to index public channels, private channels, group messages, and direct messages. You can specify one or more of these options.
TRUE
to use the Slack change log to determine which documents require updating in the index. Depending on the Slack change log's size, it may take longer for Amazon Kendra to use the change log than to scan all of your documents in Slack.
TRUE
to index bot messages from your Slack workspace team.
TRUE
to exclude archived messages in your Slack workspace team from the index.
The date to start crawling your data from your Slack workspace team. The date must follow this format: yyyy-mm-dd
.
The number of hours for the change log to look back from when you last synchronized your data. You can look back up to 7 days (168 hours).
Change log updates your index only if new content was added since you last synced your data. Updated or deleted content from before you last synced does not get updated in your index. To capture updated or deleted content before you last synced, set the LookBackPeriod
to the number of hours you want change log to look back.
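As a rough illustration of the two time-related Slack settings, the sketch below converts a number of days into hours for `LookBackPeriod`, rejecting anything beyond the 168-hour limit, and formats a date as `yyyy-mm-dd` for `SinceCrawlDate`. The helper names and the sample date are hypothetical; only the limit and the date format come from the descriptions above.

```python
from datetime import date

MAX_LOOK_BACK_HOURS = 168  # 7 days, the stated upper limit


def look_back_hours(days: int) -> int:
    """Convert days to hours, enforcing the 7-day (168-hour) limit."""
    hours = days * 24
    if hours < 0 or hours > MAX_LOOK_BACK_HOURS:
        raise ValueError(
            f"look-back of {hours} hours is outside 0..{MAX_LOOK_BACK_HOURS}"
        )
    return hours


def since_crawl_date(start: date) -> str:
    """Format a date as yyyy-mm-dd."""
    return start.strftime("%Y-%m-%d")


# Fragment of a SlackConfiguration dict; the date is an arbitrary example.
slack_time_settings = {
    "SinceCrawlDate": since_crawl_date(date(2023, 1, 15)),
    "LookBackPeriod": look_back_hours(3),
}
```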
The list of private channel names from your Slack workspace team. You use this if you want to index specific private channels, not all private channels. You can also use regular expression patterns to filter private channels.
The list of public channel names to index from your Slack workspace team. You use this if you want to index specific public channels, not all public channels. You can also use regular expression patterns to filter public channels.
A list of regular expression patterns to include certain attached files in your Slack workspace team. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
A list of regular expression patterns to exclude certain attached files in your Slack workspace team. Files that match the patterns are excluded from the index. Files that don’t match the patterns are included in the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
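The precedence rule for inclusion and exclusion patterns can be illustrated locally. This is only a sketch of the rule as described, not Amazon Kendra's implementation. It assumes patterns are applied with `re.search` semantics and that an empty inclusion list means "include everything not excluded"; both are assumptions, and the function name and sample file names are hypothetical.

```python
import re
from typing import Sequence


def is_indexed(
    name: str, inclusion: Sequence[str], exclusion: Sequence[str]
) -> bool:
    """Apply the documented rule: exclusion patterns take precedence."""
    # A match on any exclusion pattern removes the file, even if it
    # also matches an inclusion pattern.
    if any(re.search(pattern, name) for pattern in exclusion):
        return False
    # Assumption: with no inclusion patterns, nothing else is filtered out.
    if not inclusion:
        return True
    return any(re.search(pattern, name) for pattern in inclusion)


included = is_indexed("report.pdf", [r"\.pdf$"], [r"^draft-"])
excluded = is_indexed("draft-report.pdf", [r"\.pdf$"], [r"^draft-"])
```

Here `report.pdf` is kept, while `draft-report.pdf` matches both lists and is dropped because the exclusion pattern wins.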
A list of DataSourceToIndexFieldMapping
objects that map Slack data source attributes or field names to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Slack fields. For more information, see Mapping data source fields. The Slack data source field names must exist in your Slack custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
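A field mapping entry is a small dictionary with the three keys described above. The following sketch builds one; the helper name and the field names `author` and `slack_author` are hypothetical, and the index field is assumed to have been created beforehand with `UpdateIndex`.

```python
from typing import Dict, Optional


def field_mapping(
    source_field: str, index_field: str, date_format: Optional[str] = None
) -> Dict[str, str]:
    """Build one DataSourceToIndexFieldMapping entry."""
    mapping = {
        "DataSourceFieldName": source_field,
        "IndexFieldName": index_field,
    }
    # DateFieldFormat is only added when the caller supplies one.
    if date_format is not None:
        mapping["DateFieldFormat"] = date_format
    return mapping


# Hypothetical field names; the index field must already exist.
field_mappings = [field_mapping("author", "slack_author")]
```

The resulting list is what a `FieldMappings` parameter expects.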
Provides the configuration information to connect to Box as your data source.
The identifier of the Box Enterprise platform. You can find the enterprise ID in the Box Developer Console settings or when you create an app in Box and download your authentication credentials. For example, 801234567.
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your Box platform. The secret must contain a JSON structure with the following keys:
You create an application in Box to generate the keys or credentials required for the secret. For more information, see Using a Box data source.
TRUE
to use the Box change log to determine which documents require updating in the index. Depending on the data source change log's size, it may take longer for Amazon Kendra to use the change log than to scan all of your documents.
TRUE
to index comments.
TRUE
to index the contents of tasks.
TRUE
to index web links.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Box files to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Box fields. For more information, see Mapping data source fields. The Box field names must exist in your Box custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Box tasks to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Box fields. For more information, see Mapping data source fields. The Box field names must exist in your Box custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Box comments to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Box fields. For more information, see Mapping data source fields. The Box field names must exist in your Box custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Box web links to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Box fields. For more information, see Mapping data source fields. The Box field names must exist in your Box custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of regular expression patterns to include certain files and folders in your Box platform. Files and folders that match the patterns are included in the index. Files and folders that don't match the patterns are excluded from the index. If a file or folder matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file or folder isn't included in the index.
A list of regular expression patterns to exclude certain files and folders from your Box platform. Files and folders that match the patterns are excluded from the index. Files and folders that don't match the patterns are included in the index. If a file or folder matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file or folder isn't included in the index.
Configuration information for an Amazon VPC to connect to your Box. For more information, see Configuring a VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
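Taken together, the Box parameters above might be assembled as in the following sketch. The secret ARN, the regular expressions, and the choice of which content types to crawl are placeholders for illustration. The enterprise ID reuses the example value from the description.

```python
# Placeholder ARN; substitute the ARN of your own Secrets Manager secret.
EXAMPLE_SECRET_ARN = (
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-box-secret"
)

box_configuration = {
    "EnterpriseId": "801234567",
    "SecretArn": EXAMPLE_SECRET_ARN,
    "UseChangeLog": False,    # scan all documents instead of the change log
    "CrawlComments": True,    # index comments
    "CrawlTasks": True,       # index the contents of tasks
    "CrawlWebLinks": False,   # skip web links
    "InclusionPatterns": [r".*\.pdf$"],        # hypothetical pattern
    "ExclusionPatterns": [r".*/archive/.*"],   # hypothetical pattern
}

# This is the value passed as the Configuration parameter.
configuration = {"BoxConfiguration": box_configuration}
```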
Provides the configuration information to connect to Quip as your data source.
The Quip site domain. For example, https://quip-company.quipdomain.com/browse . The domain in this example is "quipdomain".
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs that are required to connect to your Quip. The secret must contain a JSON structure with the following keys:
TRUE
to index file comments.
TRUE
to index the contents of chat rooms.
TRUE
to index attachments.
The identifiers of the Quip folders you want to index. You can find the folder ID in your browser URL when you access your folder in Quip. For example, https://quip-company.quipdomain.com/zlLuOVNSarTL/folder-name . The folder ID in this example is "zlLuOVNSarTL".
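Extracting the folder ID from a browser URL can be sketched as follows. The helper name is hypothetical, and the sketch assumes the folder ID is always the first path segment, as in the example URL above.

```python
from urllib.parse import urlparse


def quip_folder_id_from_url(url: str) -> str:
    """Return the first path segment of a Quip folder URL.

    Hypothetical helper: assumes URLs shaped like
    https://<site-domain>/<folder-id>/<folder-name>
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no folder ID in {url!r}")
    return segments[0]


folder_id = quip_folder_id_from_url(
    "https://quip-company.quipdomain.com/zlLuOVNSarTL/folder-name"
)

# Fragment of a QuipConfiguration dict.
quip_folder_settings = {"FolderIds": [folder_id]}
```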
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Quip threads to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Quip fields. For more information, see Mapping data source fields. The Quip field names must exist in your Quip custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Quip messages to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Quip fields. For more information, see Mapping data source fields. The Quip field names must exist in your Quip custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Quip attachments to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Quip fields. For more information, see Mapping data source fields. The Quip field names must exist in your Quip custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of regular expression patterns to include certain files in your Quip file system. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
A list of regular expression patterns to exclude certain files in your Quip file system. Files that match the patterns are excluded from the index. Files that don’t match the patterns are included in the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
Configuration information for an Amazon Virtual Private Cloud (VPC) to connect to your Quip. For more information, see Configuring a VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
Provides the configuration information to connect to Jira as your data source.
The URL of the Jira account. For example, company.atlassian.net .
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your Jira data source. The secret must contain a JSON structure with the following keys:
TRUE
to use the Jira change log to determine which documents require updating in the index. Depending on the change log's size, it may take longer for Amazon Kendra to use the change log than to scan all of your documents in Jira.
Specify which projects to crawl in your Jira data source. You can specify one or more Jira project IDs.
Specify which issue types to crawl in your Jira data source. You can specify one or more of these options to crawl.
Specify which statuses to crawl in your Jira data source. You can specify one or more of these options to crawl.
Specify whether to crawl comments, attachments, and work logs. You can specify one or more of these options.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Jira attachments to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Jira fields. For more information, see Mapping data source fields. The Jira data source field names must exist in your Jira custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Jira comments to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Jira fields. For more information, see Mapping data source fields. The Jira data source field names must exist in your Jira custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Jira issues to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Jira fields. For more information, see Mapping data source fields. The Jira data source field names must exist in your Jira custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Jira projects to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Jira fields. For more information, see Mapping data source fields. The Jira data source field names must exist in your Jira custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Jira work logs to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Jira fields. For more information, see Mapping data source fields. The Jira data source field names must exist in your Jira custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of regular expression patterns to include certain file paths, file names, and file types in your Jira data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
A list of regular expression patterns to exclude certain file paths, file names, and file types in your Jira data source. Files that match the patterns are excluded from the index. Files that don’t match the patterns are included in the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
Configuration information for an Amazon Virtual Private Cloud to connect to your Jira. For more information, see Configuring a VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
Provides the configuration information to connect to GitHub as your data source.
Configuration information to connect to GitHub Enterprise Cloud (SaaS).
The name of the organization of the GitHub Enterprise Cloud (SaaS) account you want to connect to. You can find your organization name by logging into GitHub desktop and selecting Your organizations under your profile picture dropdown.
The GitHub host URL or API endpoint URL. For example, https://api.github.com .
Configuration information to connect to GitHub Enterprise Server (on premises).
The GitHub host URL or API endpoint URL. For example, https://on-prem-host-url/api/v3/
The name of the organization of the GitHub Enterprise Server (on premises) account you want to connect to. You can find your organization name by logging into GitHub desktop and selecting Your organizations under your profile picture dropdown.
The path to the SSL certificate stored in an Amazon S3 bucket. You use this to connect to GitHub if you require a secure SSL connection.
You can simply generate a self-signed X509 certificate on any computer using OpenSSL. For an example of using OpenSSL to create an X509 certificate, see Create and sign an X509 certificate.
The name of the S3 bucket that contains the file.
The name of the file.
The type of GitHub service you want to connect to—GitHub Enterprise Cloud (SaaS) or GitHub Enterprise Server (on premises).
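The choice between the two GitHub service types can be sketched with a pair of helpers. The helper names, organization name, bucket, and certificate key are hypothetical. This sketch always supplies the certificate path for the on-premises case; check the API reference for whether it can be omitted.

```python
from typing import Any, Dict


def github_saas(organization: str, host_url: str) -> Dict[str, Any]:
    """Connection settings for GitHub Enterprise Cloud (SaaS)."""
    return {
        "Type": "SAAS",
        "SaaSConfiguration": {
            "OrganizationName": organization,
            "HostUrl": host_url,
        },
    }


def github_on_premises(
    organization: str, host_url: str, cert_bucket: str, cert_key: str
) -> Dict[str, Any]:
    """Connection settings for GitHub Enterprise Server (on premises)."""
    return {
        "Type": "ON_PREMISE",
        "OnPremiseConfiguration": {
            "OrganizationName": organization,
            "HostUrl": host_url,
            "SslCertificateS3Path": {"Bucket": cert_bucket, "Key": cert_key},
        },
    }


# Placeholder values throughout.
saas = github_saas("example-org", "https://api.github.com")
on_prem = github_on_premises(
    "example-org",
    "https://on-prem-host-url/api/v3/",
    "example-cert-bucket",
    "certs/github.pem",
)
```

Either dictionary would be merged with `SecretArn` and the other settings to form the complete `GitHubConfiguration`.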
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your GitHub. The secret must contain a JSON structure with the following keys:
TRUE
to use the GitHub change log to determine which documents require updating in the index. Depending on the GitHub change log's size, it may take longer for Amazon Kendra to use the change log than to scan all of your documents in GitHub.
Configuration information to include certain types of GitHub content. You can configure to index repository files only, or also include issues and pull requests, comments, and comment attachments.
TRUE
to index all files within a repository.
TRUE
to index all issues within a repository.
TRUE
to index all comments on issues.
TRUE
to include all comment attachments for issues.
TRUE
to index all pull requests within a repository.
TRUE
to index all comments on pull requests.
TRUE
to include all comment attachments for pull requests.
A list of names of the specific repositories you want to index.
A list of regular expression patterns to include certain folder names in your GitHub repository or repositories. Folder names that match the patterns are included in the index. Folder names that don't match the patterns are excluded from the index. If a folder matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the folder isn't included in the index.
A list of regular expression patterns to include certain file types in your GitHub repository or repositories. File types that match the patterns are included in the index. File types that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
A list of regular expression patterns to include certain file names in your GitHub repository or repositories. File names that match the patterns are included in the index. File names that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
A list of regular expression patterns to exclude certain folder names in your GitHub repository or repositories. Folder names that match the patterns are excluded from the index. Folder names that don't match the patterns are included in the index. If a folder matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the folder isn't included in the index.
A list of regular expression patterns to exclude certain file types in your GitHub repository or repositories. File types that match the patterns are excluded from the index. File types that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
A list of regular expression patterns to exclude certain file names in your GitHub repository or repositories. File names that match the patterns are excluded from the index. File names that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
Configuration information for an Amazon Virtual Private Cloud to connect to your GitHub. For more information, see Configuring a VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
A list of DataSourceToIndexFieldMapping
objects that map GitHub repository attributes or field names to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to GitHub fields. For more information, see Mapping data source fields. The GitHub data source field names must exist in your GitHub custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of GitHub commits to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to GitHub fields. For more information, see Mapping data source fields. The GitHub data source field names must exist in your GitHub custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of GitHub issues to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to GitHub fields. For more information, see Mapping data source fields. The GitHub data source field names must exist in your GitHub custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of GitHub issue comments to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to GitHub fields. For more information, see Mapping data source fields. The GitHub data source field names must exist in your GitHub custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of GitHub issue attachments to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to GitHub fields. For more information, see Mapping data source fields. The GitHub data source field names must exist in your GitHub custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of GitHub pull request comments to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to GitHub fields. For more information, see Mapping data source fields. The GitHub data source field names must exist in your GitHub custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of GitHub pull requests to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to GitHub fields. For more information, see Mapping data source fields. The GitHub data source field names must exist in your GitHub custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of GitHub pull request attachments to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to GitHub fields. For more information, see Mapping data source fields. The GitHub data source field names must exist in your GitHub custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
Provides the configuration information to connect to Alfresco as your data source.
The URL of the Alfresco site. For example, https://hostname:8080.
The identifier of the Alfresco site. For example, my-site.
The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your Alfresco data source. The secret must contain a JSON structure with the following keys:
The path to the SSL certificate stored in an Amazon S3 bucket. You use this to connect to Alfresco if you require a secure SSL connection.
You can simply generate a self-signed X509 certificate on any computer using OpenSSL. For an example of using OpenSSL to create an X509 certificate, see Create and sign an X509 certificate.
The name of the S3 bucket that contains the file.
The name of the file.
TRUE
to index shared files.
TRUE
to index comments of blogs and other content.
Specify whether to index document libraries, wikis, or blogs. You can specify one or more of these options.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Alfresco document libraries to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Alfresco fields. For more information, see Mapping data source fields. The Alfresco data source field names must exist in your Alfresco custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Alfresco blogs to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Alfresco fields. For more information, see Mapping data source fields. The Alfresco data source field names must exist in your Alfresco custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of DataSourceToIndexFieldMapping
objects that map attributes or field names of Alfresco wikis to Amazon Kendra index field names. To create custom fields, use the UpdateIndex
API before you map to Alfresco fields. For more information, see Mapping data source fields. The Alfresco data source field names must exist in your Alfresco custom metadata.
Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex
API.
The name of the column or attribute in the data source.
The type of data stored in the column or attribute.
The name of the field in the index.
A list of regular expression patterns to include certain files in your Alfresco data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
A list of regular expression patterns to exclude certain files in your Alfresco data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
Configuration information for an Amazon Virtual Private Cloud to connect to your Alfresco. For more information, see Configuring a VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
Provides a template for the configuration information to connect to your data source.
The template schema used for the data source, where template schemas are supported.
Configuration information for an Amazon Virtual Private Cloud to connect to your data source. For more information, see Configuring a VPC.
A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.
A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.
Sets the frequency for Amazon Kendra to check the documents in your data source repository and update the index. If you don't set a schedule, Amazon Kendra will not periodically update the index. You can call the StartDataSourceSyncJob API to update the index.
You can't specify the Schedule parameter when the Type parameter is set to CUSTOM. If you do, you receive a ValidationException exception.
The Amazon Resource Name (ARN) of a role with permission to access the data source and required resources. For more information, see IAM roles for Amazon Kendra.
You can't specify the RoleArn parameter when the Type parameter is set to CUSTOM. If you do, you receive a ValidationException exception.
The RoleArn parameter is required for all other data sources.
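The rules for Schedule and RoleArn can be checked on the client side before the request is sent. The following is a minimal sketch, assuming a hypothetical helper named `build_request`; the ARN, index ID, and schedule string are placeholders, and the local `ValueError` is only a stand-in for the ValidationException the service would raise.

```python
def build_request(name, index_id, source_type, role_arn=None, schedule=None):
    """Assemble keyword arguments for create_data_source (hypothetical helper)."""
    kwargs = {"Name": name, "IndexId": index_id, "Type": source_type}
    if source_type == "CUSTOM":
        # Schedule and RoleArn are not allowed when Type is CUSTOM.
        if role_arn is not None or schedule is not None:
            raise ValueError("RoleArn and Schedule cannot be set when Type is CUSTOM")
        return kwargs
    # RoleArn is required for every other data source type.
    if role_arn is None:
        raise ValueError("RoleArn is required when Type is not CUSTOM")
    kwargs["RoleArn"] = role_arn
    # Schedule stays optional; without it the index is not updated periodically.
    if schedule is not None:
        kwargs["Schedule"] = schedule
    return kwargs


request = build_request(
    name="example-s3-source",                                  # placeholder
    index_id="example-index-id",                               # placeholder
    source_type="S3",
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",     # placeholder ARN
)
print(sorted(request))
```

The resulting dictionary would be passed as `client.create_data_source(**request)`, together with whatever Configuration the chosen Type needs.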
A list of key-value pairs that identify the data source connector. You can use the tags to identify and organize your resources and to control access to resources.
A list of key/value pairs that identify an index, FAQ, or data source. Tag keys and values can consist of Unicode letters, digits, white space, and any of the following symbols: _ . : / = + - @.
The key for the tag. Keys are not case sensitive and must be unique for the index, FAQ, or data source.
The value associated with the tag. The value may be an empty string but it can't be null.
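The tag constraints above can be sketched as a small client-side check. The helper name `to_tags` is hypothetical, and the regular expression is one interpretation of the allowed character set (Python's `\w` covers Unicode letters, digits, and the underscore); length limits are not covered here.

```python
import re

# Unicode letters, digits, white space, and the symbols _ . : / = + - @
_ALLOWED = re.compile(r"[\w\s.:/=+\-@]*")


def to_tags(mapping):
    """Convert a dict into the Tags list shape, applying the stated rules."""
    seen = set()
    tags = []
    for key, value in mapping.items():
        # The value may be an empty string but it can't be null.
        if value is None:
            raise ValueError(f"tag {key!r} has a null value")
        if not _ALLOWED.fullmatch(key) or not _ALLOWED.fullmatch(value):
            raise ValueError(f"tag {key!r} contains unsupported characters")
        # Keys are not case sensitive, so compare them case-folded.
        folded = key.casefold()
        if folded in seen:
            raise ValueError(f"duplicate tag key (ignoring case): {key!r}")
        seen.add(folded)
        tags.append({"Key": key, "Value": value})
    return tags


print(to_tags({"team": "search", "env": ""}))
```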
A token that you provide to identify the request to create a data source connector. Multiple calls to the CreateDataSource API with the same client token will create only one data source connector.
This field is autopopulated if not provided.
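When a request may be retried by your own code, generating the token once and reusing it keeps the retries idempotent. A minimal sketch, assuming a hypothetical helper `with_client_token` and a UUID as the token format (any unique string chosen by the caller would serve the same purpose):

```python
import uuid


def with_client_token(kwargs, token=None):
    """Return a copy of the request arguments with a ClientToken attached."""
    request = dict(kwargs)
    # Reuse the caller's token if given; otherwise mint a fresh one.
    request["ClientToken"] = token or str(uuid.uuid4())
    return request


base = {"Name": "example-source", "IndexId": "example-index-id", "Type": "CUSTOM"}
first_attempt = with_client_token(base)
# A retry of the same logical request carries the same token.
retry = with_client_token(base, token=first_attempt["ClientToken"])
print(first_attempt["ClientToken"] == retry["ClientToken"])  # True
```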
Configuration information for altering document metadata and content during the document ingestion process.
For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.
Configuration information to alter document attributes or metadata fields and content when ingesting documents into Amazon Kendra.
Provides the configuration information for applying basic logic to alter document metadata and content when ingesting documents into Amazon Kendra. To apply advanced logic that goes beyond what basic logic allows, see HookConfiguration.
For more information, see Customizing document metadata during the ingestion process.
Configuration of the condition used for the target document attribute or metadata field when ingesting documents into Amazon Kendra.
The identifier of the document attribute used for the condition.
For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.
Amazon Kendra currently does not support _document_body as an attribute key used for the condition.
The condition operator.
For example, you can use 'Contains' to partially match a string.
The value used by the operator.
For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.
A string, such as "department".
A list of strings. The default maximum length or number of strings is 10.
A long integer value.
A date expressed as an ISO 8601 string.
It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
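The example timestamp above can be reproduced with Python's standard library, which also makes it easy to check that a value carries a time zone. The helper name `has_time_zone` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Central European Time is UTC+01:00.
cet = timezone(timedelta(hours=1))
stamp = datetime(2012, 3, 25, 12, 30, 10, tzinfo=cet)
print(stamp.isoformat())  # 2012-03-25T12:30:10+01:00


def has_time_zone(text):
    """Return True if an ISO 8601 string includes a UTC offset."""
    return datetime.fromisoformat(text).utcoffset() is not None


print(has_time_zone("2012-03-25T12:30:10+01:00"))  # True
print(has_time_zone("2012-03-25T12:30:10"))        # False
```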
Configuration of the target document attribute or metadata field when ingesting documents into Amazon Kendra. You can also include a value.
The identifier of the target document attribute or metadata field.
For example, 'Department' could be an identifier for the target attribute or metadata field that includes the department names associated with the documents.
TRUE to delete the existing target value for your specified target attribute key. You cannot create a target value and set this to TRUE. To create a target value (TargetDocumentAttributeValue), set this to FALSE.
The target value you want to create for the target attribute.
For example, 'Finance' could be the target value for the target attribute key 'Department'.
A string, such as "department".
A list of strings. The default maximum length or number of strings is 10.
A long integer value.
A date expressed as an ISO 8601 string.
It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
TRUE to delete content if the condition used for the target attribute is met.
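Putting the condition and target descriptions together, one inline configuration entry might be assembled as follows. This is a sketch using the example values from the descriptions above ('Source_URI', 'Contains', 'financial', 'Department', 'Finance'); the helper name `inline_rule` is hypothetical, and the dictionary keys are assumed to follow the request syntax of this operation.

```python
def inline_rule(condition_key, operator, condition_value, target_key, target_value):
    """Build one inline configuration entry using string values only."""
    # _document_body is not supported as a condition attribute key.
    if condition_key == "_document_body":
        raise ValueError("_document_body cannot be used as a condition key")
    return {
        "Condition": {
            "ConditionDocumentAttributeKey": condition_key,
            "Operator": operator,
            "ConditionOnValue": {"StringValue": condition_value},
        },
        "Target": {
            "TargetDocumentAttributeKey": target_key,
            # FALSE here because a new target value is being created.
            "TargetDocumentAttributeValueDeletion": False,
            "TargetDocumentAttributeValue": {"StringValue": target_value},
        },
        # Keep the document content when the condition is met.
        "DocumentContentDeletion": False,
    }


rule = inline_rule("Source_URI", "Contains", "financial", "Department", "Finance")
print(rule["Target"]["TargetDocumentAttributeValue"])
```

Read aloud, this entry says: when 'Source_URI' contains 'financial', set 'Department' to 'Finance'.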
Configuration information for invoking an AWS Lambda function on the original or raw documents before extracting their metadata and text. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.
The condition used for when a Lambda function should be invoked.
For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.
The identifier of the document attribute used for the condition.
For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.
Amazon Kendra currently does not support _document_body as an attribute key used for the condition.
The condition operator.
For example, you can use 'Contains' to partially match a string.
The value used by the operator.
For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.
A string, such as "department".
A list of strings. The default maximum length or number of strings is 10.
A long integer value.
A date expressed as an ISO 8601 string.
It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
The Amazon Resource Name (ARN) of a role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.
Stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.
Configuration information for invoking an AWS Lambda function on the structured documents with their metadata and text extracted. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.
The condition used for when a Lambda function should be invoked.
For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.
The identifier of the document attribute used for the condition.
For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.
Amazon Kendra currently does not support _document_body as an attribute key used for the condition.
The condition operator.
For example, you can use 'Contains' to partially match a string.
The value used by the operator.
For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.
A string, such as "department".
A list of strings. The default maximum length or number of strings is 10.
A long integer value.
A date expressed as an ISO 8601 string.
It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
The Amazon Resource Name (ARN) of a role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.
Stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.
The Amazon Resource Name (ARN) of a role with permission to run PreExtractionHookConfiguration and PostExtractionHookConfiguration for altering document metadata and content during the document ingestion process. For more information, see IAM roles for Amazon Kendra.
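The two hook configurations and the shared role can be combined into one enrichment structure. The sketch below assumes the key names from the request syntax of this operation (`LambdaArn`, `S3Bucket`, `InvocationCondition`); the helper name `hook` is hypothetical, and every ARN and bucket name is a placeholder.

```python
def hook(lambda_arn, s3_bucket, invocation_condition=None):
    """Build a pre- or post-extraction hook configuration."""
    config = {"LambdaArn": lambda_arn, "S3Bucket": s3_bucket}
    # The invocation condition is only added when one is supplied.
    if invocation_condition is not None:
        config["InvocationCondition"] = invocation_condition
    return config


enrichment = {
    # Runs on the original or raw documents.
    "PreExtractionHookConfiguration": hook(
        "arn:aws:lambda:us-east-1:123456789012:function:example-pre",   # placeholder
        "example-enrichment-bucket",                                    # placeholder
    ),
    # Runs on the structured documents after extraction.
    "PostExtractionHookConfiguration": hook(
        "arn:aws:lambda:us-east-1:123456789012:function:example-post",  # placeholder
        "example-enrichment-bucket",                                    # placeholder
    ),
    # Role with permission to run both hooks.
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleEnrichmentRole",  # placeholder
}
print(sorted(enrichment))
```

The resulting dictionary would be supplied as the `CustomDocumentEnrichmentConfiguration` argument.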
dict
Response Syntax
{
'Id': 'string'
}
Response Structure
(dict) --
Id (string) --
The identifier of the data source connector.
Exceptions
kendra.Client.exceptions.ValidationException
kendra.Client.exceptions.ConflictException
kendra.Client.exceptions.ResourceNotFoundException
kendra.Client.exceptions.ResourceAlreadyExistException
kendra.Client.exceptions.ServiceQuotaExceededException
kendra.Client.exceptions.ThrottlingException
kendra.Client.exceptions.AccessDeniedException
kendra.Client.exceptions.InternalServerException
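The exceptions listed above are exposed on the client object, so they can be caught by name. The following sketch retries only on ThrottlingException with exponential backoff and lets every other exception propagate; the helper name `create_with_backoff`, the attempt count, and the delays are illustrative assumptions, not recommended values.

```python
import time


def create_with_backoff(client, max_attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call create_data_source, retrying when the request is throttled."""
    for attempt in range(1, max_attempts + 1):
        try:
            # On success the response carries the new data source identifier.
            return client.create_data_source(**kwargs)["Id"]
        except client.exceptions.ThrottlingException:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("kendra")
#   data_source_id = create_with_backoff(
#       client, Name="example-source", IndexId="example-index-id", Type="CUSTOM"
#   )
```

Passing a ClientToken in `kwargs` keeps repeated attempts from creating more than one data source connector.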