Collections
Overview
A collection provides an iterable interface to a group of resources. Collections behave similarly to Django QuerySets and expose a similar API. A collection seamlessly handles pagination for you, making it possible to easily iterate over all items from all pages of data. Example of a collection:
# SQS list all queues
import boto3

sqs = boto3.resource('sqs')
for queue in sqs.queues.all():
    print(queue.url)
When collections make requests
Collections can be created and manipulated without any request being made to the underlying service. A collection makes a remote service request under the following conditions:
Iteration:
for bucket in s3.buckets.all(): print(bucket.name)
Conversion to list():
buckets = list(s3.buckets.all())
Batch actions (see below):
s3.Bucket('my-bucket').objects.delete()
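The lazy behavior described above can be illustrated with a small toy model. This is only a sketch and not how boto3 is implemented: `FakeService` and `LazyCollection` are hypothetical names invented for the illustration, and the "request" is just a counter being incremented.

```python
class FakeService:
    """Hypothetical stand-in for a remote service; counts the requests it receives."""

    def __init__(self, names):
        self.names = names
        self.request_count = 0

    def list_names(self):
        self.request_count += 1
        return list(self.names)


class LazyCollection:
    """Toy collection that contacts the service only when it is iterated."""

    def __init__(self, service):
        self._service = service

    def all(self):
        # Returns a new collection object; no request is made here.
        return LazyCollection(self._service)

    def __iter__(self):
        # The request happens only once iteration starts.
        return iter(self._service.list_names())


service = FakeService(['alpha', 'beta'])
buckets = LazyCollection(service).all()
requests_before = service.request_count  # nothing has been iterated yet

names = list(buckets)  # list() iterates the collection, triggering one request
requests_after = service.request_count
```

In this model, building the collection leaves the counter untouched, and only the `list()` call causes the service to be contacted.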
Filtering
Some collections support extra arguments to filter the returned data set, which are passed into the underlying service operation. Use the filter() method to filter the results:
# S3 list all keys with the prefix 'photos/'
import boto3

s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    for obj in bucket.objects.filter(Prefix='photos/'):
        print('{0}:{1}'.format(bucket.name, obj.key))
Warning
Behind the scenes, the above example will call ListBuckets, ListObjects, and HeadObject many times. If you have a large number of S3 objects then this could incur a significant cost.
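To make the idea of passing filter arguments through to the underlying operation more concrete, here is a minimal sketch in plain Python. It is a toy model under stated assumptions, not boto3's code: `list_objects` and `ObjectCollection` are hypothetical stand-ins, and the "service" is just an in-memory list of keys.

```python
def list_objects(stored_keys, Prefix=''):
    """Hypothetical stand-in for a list operation that filters on the service side."""
    return [key for key in stored_keys if key.startswith(Prefix)]


class ObjectCollection:
    """Toy collection that forwards its filter arguments to list_objects()."""

    def __init__(self, stored_keys, **params):
        self._stored_keys = stored_keys
        self._params = params

    def filter(self, **kwargs):
        # Return a new collection carrying the extra arguments; nothing is fetched yet.
        return ObjectCollection(self._stored_keys, **{**self._params, **kwargs})

    def __iter__(self):
        # The stored arguments are passed unchanged to the underlying operation.
        return iter(list_objects(self._stored_keys, **self._params))


keys = ['photos/cat.jpg', 'photos/dog.jpg', 'docs/readme.txt']
photo_keys = list(ObjectCollection(keys).filter(Prefix='photos/'))
```

The point of the sketch is that the collection itself does no filtering; it only records the arguments and hands them to the operation that produces the data.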
Chainability
Collection methods are chainable. They return copies of the collection rather than modifying the collection, including a deep copy of any associated operation parameters. For example, this allows you to build up multiple collections from a base which they all have in common:
# EC2 find instances
import boto3

ec2 = boto3.resource('ec2')
base = ec2.instances.filter(InstanceIds=['id1', 'id2', 'id3'])

filters = [{
    'Name': 'tenancy',
    'Values': ['dedicated']
}]
filtered1 = base.filter(Filters=filters)

# Note, this does NOT modify the filters in ``filtered1``!
filters.append({'Name': 'instance-type', 'Values': ['t1.micro']})
filtered2 = base.filter(Filters=filters)

print('All instances:')
for instance in base:
    print(instance.id)

print('Dedicated instances:')
for instance in filtered1:
    print(instance.id)

print('Dedicated micro instances:')
for instance in filtered2:
    print(instance.id)
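The copy semantics behind chaining can be sketched without any AWS calls. The class below is a hypothetical toy, not boto3's implementation, and the way boto3 actually merges parameters may differ; it only shows why mutating a list after passing it to a chained method does not affect a collection built earlier.

```python
import copy


class ChainableCollection:
    """Toy collection whose filter() returns a copy instead of mutating itself."""

    def __init__(self, params=None):
        self._params = params or {}

    def filter(self, **kwargs):
        # Deep-copy both the existing and the new parameters so that later
        # changes to the caller's objects cannot leak into this collection.
        merged = copy.deepcopy(self._params)
        merged.update(copy.deepcopy(kwargs))
        return ChainableCollection(merged)

    @property
    def params(self):
        return self._params


base = ChainableCollection().filter(InstanceIds=['id1', 'id2'])

filters = [{'Name': 'tenancy', 'Values': ['dedicated']}]
filtered1 = base.filter(Filters=filters)

# Mutating the caller's list afterwards does not change filtered1.
filters.append({'Name': 'instance-type', 'Values': ['t1.micro']})
filtered2 = base.filter(Filters=filters)
```

Here `base` keeps only its instance IDs, `filtered1` holds one filter, and `filtered2` holds two, even though all three were derived from the same starting point.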
Limiting results
It is possible to limit the number of items returned from a collection by using the limit() method:
# S3 iterate over first ten buckets
for bucket in s3.buckets.limit(10):
    print(bucket.name)
Up to 10 items total will be returned. If you have fewer than 10 buckets, then all of your buckets will be returned.
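The interaction between a total limit and paginated results can be sketched with the standard library. This is an illustrative model only: `fetch_pages` and `limited` are hypothetical helpers, and the pages come from an in-memory list rather than a service.

```python
import itertools


def fetch_pages(items, page_size):
    """Yield successive pages (lists) of items, like a paginated service would."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


def limited(items, page_size, limit):
    """Walk the pages in order and stop after `limit` items in total."""
    all_items = itertools.chain.from_iterable(fetch_pages(items, page_size))
    return list(itertools.islice(all_items, limit))


bucket_names = ['bucket-{}'.format(i) for i in range(25)]

# The limit applies across pages, not per page.
first_ten = limited(bucket_names, page_size=4, limit=10)

# With fewer items than the limit, everything is returned.
few = limited(bucket_names[:3], page_size=4, limit=10)
```

Because the pages are produced by a generator, pages beyond the one containing the tenth item are never requested in this model.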
Controlling page size
Collections automatically handle paging through results, but you may want to control the number of items returned from a single service operation call. You can do so using the page_size() method:
# S3 iterate over all objects 100 at a time
for obj in bucket.objects.page_size(100):
    print(obj.key)
By default, S3 will return 1000 objects at a time, so the above code would let you process the items in smaller batches, which could be beneficial for slow or unreliable internet connections.
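As a rough back-of-the-envelope check, the trade-off of a smaller page size is more service calls. The helper below is a hypothetical sketch that assumes every page except the last is full and that at least one call is always made; real services may return short pages.

```python
import math


def requests_needed(total_items, page_size):
    """Estimate how many list calls are needed to read total_items.

    Assumes every page except the last is full, and that even an empty
    listing costs one call.
    """
    return max(1, math.ceil(total_items / page_size))


calls_default = requests_needed(2500, 1000)  # pages of 1000, 1000, 500
calls_small = requests_needed(2500, 100)     # 25 pages of 100
calls_empty = requests_needed(0, 100)        # one call that returns nothing
```

Under these assumptions, reading 2500 objects takes 3 calls at a page size of 1000 and 25 calls at a page size of 100, so smaller pages mean smaller responses but more round trips.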
Batch actions
Some collections support batch actions, which are actions that operate on an entire page of results at a time. They will automatically handle pagination:
# S3 delete everything in `my-bucket`
s3 = boto3.resource('s3')
s3.Bucket('my-bucket').objects.delete()
Danger
The above example will completely erase all data in the my-bucket bucket! Please be careful with batch actions.
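The page-at-a-time behavior of a batch action can be modeled in plain Python. This is a toy sketch, not boto3's implementation: `batch_delete` and `delete_many` are hypothetical names, and the "bucket" is an in-memory set.

```python
def batch_delete(keys, page_size, delete_many):
    """Apply delete_many to one page of keys at a time; return the call count."""
    calls = 0
    for start in range(0, len(keys), page_size):
        delete_many(keys[start:start + page_size])
        calls += 1
    return calls


# Hypothetical in-memory "bucket" holding five keys.
store = {'a.txt', 'b.txt', 'c.txt', 'd.txt', 'e.txt'}


def delete_many(batch):
    # One call removes a whole page of keys.
    store.difference_update(batch)


calls = batch_delete(sorted(store), page_size=2, delete_many=delete_many)
```

With five keys and a page size of two, the action runs three times, once per page, and the store ends up empty. The same pattern explains why a single batch call can remove everything a collection refers to.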