Python has a surprisingly large number of built-in and auxiliary collection types. Collections are objects that contain zero or more member objects, often called elements. There are 3 main categories of collection: sequences, mappings, and sets.
Let's recap the types table from the Data Types chapter. We've reproduced a portion of the table below:
Type | Class | Category | Kind | Mutable |
---|---|---|---|---|
ranges | range |
sequences | Non-primitive | No |
tuples | tuple |
sequences | Non-primitive | No |
lists | list |
sequences | Non-primitive | Yes |
dictionaries | dict |
mappings | Non-primitive | Yes |
sets | set |
sets | Non-primitive | Yes |
frozen sets | frozenset |
sets | Non-primitive | No |
As you can see, the collections fall into several categories: sequences, mappings, and sets.
We often also include text sequences when talking about collections:
Type | Class | Category | Kind | Mutable |
---|---|---|---|---|
strings | str |
text sequences | Primitive | No |
The primary text sequence of interest is strings. Strings and other text sequences are not collections, however. They are sufficiently similar that we can treat them as sequence collections, but its important to remember that they are not. For the purposes of this chapter, we will discuss text sequences as though they were collections.
Some collections are mutable; others are immutable. Keeping them straight takes some practice; be patient. In particular, lists and tuples differ only in that lists are mutable; tuples are not. The set collections differ similarly: sets are mutable; frozen sets are not.
We've met all these types earlier in this book. In this lesson, we'll explore them in depth.
Sequences are types that maintain an ordered collection of objects (also: elements or values) that can be indexed by whole numbers. Ordered collections are collections organized in some sequence: a first element, a second element, a third element, and so on. Indexed sequences associate every object member with a whole number (0, 1, 2, etc.) that can be used to access or modify that object. The built-in sequences (ranges, lists, and tuples) always have the first object at index 0, the second at index 1, and so on.
letters = ['a', 'b', 'c']
print(letters[0]) # a (first element)
print(letters[1]) # b (second element)
print(letters[2]) # c (third element)
Lists and tuples are heterogeneous; they may contain different kinds of objects, including other sequences:
my_list = [
'May',
2.99,
{None, True, False},
[1, 2, 3],
]
Ranges are homogenous; they always contain integers.
Strings are a form of sequence called a text sequence. They differ from ordinary sequences in several ways:
Bullet 3 merits an example, but it requires using Unicode. The Greek letter theta (θ) is a fine choice. Let's look at two snippets, one using a list and the other a string:
letters = ['a', 'b', 'θ', 'd', 'e']
char = letters[2]
print(char is letters[2]) # True
letters = 'abθde'
char = letters[2]
print(char is letters[2]) # False
Example 1 shows that letters[2]
returns the object at index 2 of the letters
list. Both char
and letters[2]
reference the same object, as shown on line 3.
However, in Example 2, the two characters are not the same object. They have the same value but are distinct objects.
We used θ
to avoid interning, a topic we discussed earlier. Python would have interned any character from the extended ASCII character set, and both snippets would have returned True
. By switching to a non-ASCII character, we could show that characters in a string aren't objects.
While there are differences, we can usually treat strings as ordinary sequences.
Sets are types that maintain an unordered collection of unique objects (also called elements or members). Unlike sequences, sets cannot be indexed. Unordered means no well-defined order exists for the objects in a set. In fact, the sets {1, 2, 3}
and {3, 1, 2}
are equal since the order doesn't matter. By unique, we mean a set can not have duplicate members.
Python has two main built-in set types: sets and frozen sets. Regular sets are mutable; frozen sets are immutable. This is the only significant difference between the two:
letters = {'a', 'b', 'c'}
letters.add('d')
print(letters)
# {'a', 'b', 'c', 'd'} (order may differ)
frozen_letters = frozenset(letters)
frozen_letters.add('e')
# AttributeError: 'frozenset' object has no
# attribute 'add'
Sets and frozen sets, like Lists and tuples, are heterogeneous; they may contain different kinds of objects, including other hashable collections:
my_set = {
'May',
2.99,
(None, True, False),
range(5),
}
Python will ignore any attempt to add duplicate members to a set:
letters = {'a', 'b', 'c', 'b', 'a'}
print(letters)
# {'a', 'c', 'b'} (order may differ)
letters.add('c')
print(letters)
# {'a', 'c', 'b'} (order may differ)
Since Python 3.7, sets of integers sometimes seem to be ordered. For instance:
numbers = { 1, 2, 3, 4, 5 }
print(numbers) # { 1, 2, 3, 4, 5 }
numbers = { 5, 3, 1, 2, 4 }
print(numbers) # { 1, 2, 3, 4, 5 }
No matter how often you run that code, it prints both sets with the same ordered values. It's an illusion, however, produced by an implementation detail. The behavior isn't guaranteed: it's easy to construct sets of integers that are clearly unordered:
numbers = { 1, 2, 37, 4, 5 }
print(numbers) # {1, 2, 4, 37, 5}
The order of the numbers you see may not match ours, but you will likely see that the numbers are no longer predictably ordered.
Mappings are types that maintain an unordered collection of key/value pairs (also called elements or members). Unlike sequences, mappings are accessed by their keys, which usually are not numbers. While Python has several mapping types, the only one we'll meet in this book is the dictionary or dict type.
Dicts have the following characteristics:
None
) are hashable and can be dict keys.
d = {'a': 1, (1, 3): 3, 1: 'x'}
print(d) # {'a': 1, (1, 3): 3, 1: 'x'}
print(d['a']) # 1
print(d[(1, 3)]) # 3
print(d[1]) # 'x'
d['a'] = 'A'
print(d) # {'a': 'A', (1, 3): 3, 1: 'x'}
Since version 3.7, Python dicts maintain the insertion order of the pairs. Python guarantees it. Thus, you can iterate over the pairs in the same order they were inserted into the dictionary. However, this behavior doesn't fundamentally alter whether dicts are unordered collections. In practice, you should mentally think of dicts as unordered collections, but it's OK to rely on the ordering in contexts where the insertion order matters.
d = {'a': 1, (1, 3): 3, 1: 'x'}
del d['a'] # Deletes key/value pair 'a': 1
print(d) # {(1, 3): 3, 1: 'x'}
d['a'] = 42
print(d) # {(1, 3): 3, 1: 'x', 'a': 42}
Most built-in Python data types let the programmer create new objects using literal values. Literals are great. However, you can also use special functions called constructors to create new objects. In fact, sometimes you can't use literals; you must use constructors to create ranges, frozen sets, and empty sets.
Let's examine the constructors used with collections.
The string constructor, str
, has two basic formats, str()
and str(something)
, where something
is any Python object. Regardless of what you pass to str
, it returns a string. For instance:
str() # returns '' (empty string)
str('abc') # returns 'abc'
str(42) # returns '42'
str(3.141592) # returns '3.141592'
str(False) # returns 'False'
str(None) # returns 'None'
str(range(3, 7)) # returns 'range(3, 7)'
str([1, 2, 3]) # returns '[1, 2, 3]'
str(int) # returns "<class 'int'>"
str(list) # returns "<class 'list'>"
class Person: pass
str(Person())
# returns "<__main__.Person object at 0x1006d21e0>"
In most cases, the values returned by str
give a good idea of the original value. However, the last 6 lines show that the return value isn't always helpful. This is especially true with custom types like the Person
class above (we'll learn about custom types in the OO Python book). You can fix this issue with custom types, but we won't discuss that until later in the Core Curriculum.
The range
constructor comes in 3 forms:
range(start, stop, step)
This constructor generates a sequence of integers between start
and stop - 1
with an increment of step
between each consecutive integer. You can use a negative step to generate a sequence in reverse order. In this case, the range ends at stop + 1
. For instance:
r = range(5, 12, 2)
print(list(r)) # [5, 7, 9, 11]
r = range(12, 8, -1)
print(list(r)) # [12, 11, 10, 9]
r = range(12, 5, -2)
print(list(r)) # [12, 10, 8, 6]
You can create empty ranges by giving values where start >= stop
when step
is positive or start <= stop
when step
is negative. Empty ranges are often bugs.
r = range(5, 5, 1)
print(list(r)) # []
r = range(5, 7, -1)
print(list(r)) # []
range(start, stop)
When you omit the step
argument, Python uses a default value of 1
. Hence, range(start, stop)
is identical to range(start, stop, 1)
.
range(stop)
When you omit the start
argument, Python uses a default value of 0
for start
. Hence, range(stop)
is identical to range(0, stop, 1)
.
Ranges are lazy sequences: they don't create any element values until your program needs them. This is why we use list(r)
above; without it, print(r)
prints something like range(5, 12, 2)
, which may not be helpful. You can also use the tuple
constructor to do the same thing. The set
and frozenset
constructors also work, but the sets may not be in the same order.
Lists, tuples, sets, and frozen sets share two common constructor forms:
list()
or list(iterable)
tuple()
or tuple(iterable)
set()
or set(iterable)
frozenset()
or frozenset(iterable)
The constructors with no arguments create an empty list, tuple, set, or frozen set, as appropriate: a sequence or set with no members.
The constructors that take an iterable
argument expect an object that Python can iterate: an iterable. From the built-in types, the iterables include all sequences, strings, sets, and mappings. (Note that iterating a mapping iterates the mapping's keys.) Thus, you can use these constructors to convert lists to tuples, tuples to sets, strings to lists, and so on:
my_str = 'Python'
my_list = list(my_str)
print(my_list) # ['P', 'y', 't', 'h', 'o', 'n']
my_tuple = tuple(my_list)
print(my_tuple) # ('P', 'y', 't', 'h', 'o', 'n')
my_set = set(my_tuple)
print(my_set) # {'t', 'o', 'n', 'h', 'P', 'y'}
You can also use these constructors to duplicate a collection:
my_list = [5, 12, 2]
another_list = list(my_list)
print(my_list) # [5, 12, 2]
print(another_list) # [5, 12, 2]
print(my_list == another_list) # True
print(my_list is another_list) # False
As you can see in lines 4, 5, and 7, the lists are identical. However, line 8 shows that the list objects are not the same object.
Let's look at a more complex example involving nested lists:
my_list = [[1, 2, 3], [4, 5, 6]]
another_list = list(my_list)
print(my_list) # [[1, 2, 3], [4, 5, 6]]
print(another_list) # [[1, 2, 3], [4, 5, 6]]
print(my_list == another_list) # True
print(my_list is another_list) # False
print(my_list[0] is another_list[0]) # True
print(my_list[1] is another_list[1]) # True
In this case, we can see from lines 4, 5, and 7 that the lists have the same values, and from line 8, we can see that they are different objects. Strangely, in lines 9 and 10, we can see that their elements are the same objects. That's because the list
constructor performs what is known as a shallow copy. In a later chapter, we'll discuss shallow and deep copies and what is happening here. For now, remember that nested collections are subject to shallow copies. That's often fine. However, it can lead to problems that are difficult to diagnose.
Passing a string to any of these constructors causes it to iterate over the characters in the string. It places each character in a separate element of the resulting collection:
print(tuple('Python'))
# ('P', 'y', 't', 'h', 'o', 'n')
Python has a rich variety of collections for storing and manipulating data. You'll see collections in almost all real-world programs; nearly every useful program uses them. We'll continue to explore them in the next chapter.
If you have a list named people
, how can you determine the number of entries in that list?
You can use len(people)
to determine the number of entries.
Video Walkthrough
Can you write some code to change the value 'bye'
in the following tuple to 'goodbye'
?
stuff = ('hello', 'world', 'bye', 'now')
Note: this problem is a bit tricky.
Since tuples are immutable, you can't replace 'bye'
with 'goodbye'
. At best, you can create a new tuple and reassign it:
stuff = ('hello', 'world', 'goodbye', 'now')
stuff = stuff[0:2] + ('goodbye', stuff[3])
stuff = list(stuff)
stuff[2] = 'goodbye'
stuff = tuple(stuff)
Solution 1 creates an entirely new tuple with the changed value. That's not quite in the spirit of what we're asking. It would also be tedious if the tuple contained more than a few elements.
Solution 2 is a little closer to being in the proper spirit. This one concatenates a slice of the original tuple combined with a new tuple that includes 'goodbye'
and 'now'
(from stuff[3]
). However, that code is difficult to read. Furthermore, the slicing and indexing is a sure-fire way to get an off-by-one error. For example, if you wrote stuff[0:1]
instead of stuff[0:2]
, the result would be missing 'world'
.
Solution 3 is as close as we can get to the spirit of the problem. Here, we convert the original tuple to a list and reassign the element to the new value. Finally, we transform the list into a new tuple. This approach still creates an entirely new tuple. However, it avoids the problem of re-entering a long list of members, is cleaner than slicing and indexing, and is much easier to read.
Video Walkthrough
Identify at least 2 differences between lists and tuples. Identify at least 2 ways that lists and tuples are similar.
[]
; tuple literals use ()
.
Video Walkthrough
Why can we treat strings as sequences?
Strings are ordered; you can access the individual characters with indexing.
Video Walkthrough
How do the set types differ from the sequence types?
Sets are unordered; they don't support indexing.
Video Walkthrough
Assuming you have the following assignment:
pi = 3.141592
Write some code that converts the value of pi
to a string and assigns it to a variable named str_pi
.
pi = 3.141592
str_pi = str(pi)
pi = 3.141592
str_pi = f'{pi}'
Solution 1 is the preferred solution since it is slightly more readable. Solution 2 works, too. However, f-strings are better suited for creating strings mixed with other text.
Video Walkthrough
Without running the following code, identify the numbers that are included in each of the following ranges:
range(7)
range(1, 6)
range(3, 15, 4)
range(3, 8, -1)
range(8, 3, -1)
range(7) # [0, 1, 2, 3, 4, 5, 6]
range(1, 6) # [1, 2, 3, 4, 5]
range(3, 15, 4) # [3, 7, 11]
range(3, 8, -1) # []
range(8, 3, -1) # [8, 7, 6, 5, 4]
The most important thing to observe here is that ranges never include the "stop" value. Furthermore, a negative step value counts downwards from the start to the stop value. Thus, the start value should typically be larger than the stop value when the step value is negative.
Video Walkthrough
How would you print all the numbers in the following range?
range(3, 17, 4)
print(list(range(3, 17, 4)))
print(tuple(range(3, 17, 4)))
Since ranges are lazy sequences, you must either iterate over the range or convert the range to a non-lazy sequence: a list or tuple. Here, we transform the range into a list and a tuple.
Video Walkthrough
my_list = [1, 2, 3, [4, 5, 6]]
another_list = list(my_list)
Given the above code, answer the following questions and explain your answers:
my_list
and another_list
equal?
my_list
and another_list
the same object?
my_list
and another_list
equal?
my_list
and another_list
the same object?
Don't write any code for this exercise.
list
constructor creates a new object.
list
constructor creates a shallow copy of the iterable passed as an argument. Shallow copies don't create new nested lists. Instead, only a reference to the nested list gets copied.
Video Walkthrough
Bob expects the following code to print the names in alphabetical order. Without running the code, can you say whether it will? Explain your answer.
names = { 'Chris', 'Clare', 'Karis', 'Karl',
'Max', 'Nick', 'Victor' }
print(names)
This code probably won't print the names alphabetically. Sets, by definition, are unordered collections. Thus, the order in which members are shown when printing or iterating is arbitrary. Bob's code may print the names alphabetically on rare occasions, but it would do so only once every 5040 times.
Video Walkthrough
Consider the data in the following table:
Name | Country |
---|---|
Alice | USA |
Francois | Canada |
Inti | Peru |
Monika | Germany |
Sanyu | Uganda |
Yoshitaka | Japan |
You need to write some Python code to determine the country name associated with one of the listed names. Your code should include the data structure(s) you need and at least one test case to ensure the code works.
countries = {
'Alice': 'USA',
'Francois': 'Canada',
'Inti': 'Peru',
'Monika': 'Germany',
'Sanyu': 'Uganda',
'Yoshitaka': 'Japan',
}
print(countries['Monika'])
Video Walkthrough