In this chapter, we'll learn how to work with Python's collections. In particular, we'll explore indexing and key-based access, then look at some of the many operations most collections have in common.
Indexing is the process of using a whole number to access and perhaps alter an element of a sequence. All sequences, including strings, support indexing. Python uses index 0
for the first element of all built-in sequences. To access the first element of a sequence assigned to variable seq
, we use seq[0]
. We use seq[1]
to access the second element. This process repeats until we exhaust the sequence:
seq = ('a', 'b', 'c')
print(seq[0]) # a (1st element)
print(seq[1]) # b (2nd element)
print(seq[2]) # c (3rd element)
print(seq[3]) # IndexError: tuple index out of range
In the above example, we used a tuple as our sequence. However, we could have used any other sequence, such as a list or string. It's worth noting that the index of the final element is one less than the sequence's length.
An error occurred when accessing index 3. The last element in a 3-element sequence has index 2, not 3. The len
function can determine a sequence's length. You can use its return value to determine whether an index is out of range:
seq = ('a', 'b', 'c')
if len(seq) > 3:
    print(seq[3])
Note that we tested whether the length was greater than the intended index. If you want to access element 3, the sequence must have at least four elements.
Suppose we want to access the last element in a sequence. To do that, we can compute the index of the last element and then use that value:
seq = ('a', 'b', 'c')
last_index = len(seq) - 1
print(seq[last_index]) # c
You can also use negative indexes, which are often easier to work with:
seq = ('a', 'b', 'c')
print(seq[-1]) # c (last element)
print(seq[-2]) # b (next to last element)
print(seq[-3]) # a (3rd from last element)
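As a small sketch of how negative indexes relate to ordinary ones, the code below checks that index -n refers to the same element as index len(seq) - n. The variable names first_via_negative and message are only illustrative.

```python
seq = ('a', 'b', 'c')

# A negative index -n refers to the same element as len(seq) - n.
for n in range(1, len(seq) + 1):
    assert seq[-n] == seq[len(seq) - n]

first_via_negative = seq[-len(seq)]   # same element as seq[0]
print(first_via_negative)             # a

# Going one step further back is out of range, just like seq[3].
try:
    seq[-4]
except IndexError as error:
    message = str(error)
    print(message)                    # tuple index out of range
```

Negative indexes have a lower limit, too: the smallest valid index is the negative of the sequence's length.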
If you want to change the value of an element in a mutable sequence (e.g., a list), you can use indexing on the left side of an assignment:
seq = ['a', 'b', 'c']
seq[1] = 'B'
print(seq) # ['a', 'B', 'c']
This operation mutates the list but merely reassigns seq[1]
.
The indexing syntax also supports a slicing augmentation. Slicing can extract (or modify) any number of consecutive elements simultaneously. For instance, the syntax seq[start:stop]
retrieves the elements from seq
whose index is between start
and stop - 1
, inclusive. You can also use negative indexes for the slice. Finally, you can use the seq[start:stop:step]
syntax to slice every "step-th" element.
seq = 'abcdefghi'
print(seq[3:7]) # defg
print(seq[-6:-2]) # defg
print(seq[2:8:2]) # ceg
print(repr(seq[3:3])) # ''
print(seq[:]) # abcdefghi
print(seq[::-1]) # ihgfedcba
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(seq[3:7]) # [4, 5, 6, 7]
print(seq[-6:-2]) # [5, 6, 7, 8]
print(seq[2:8:2]) # [3, 5, 7]
print(seq[3:3]) # []
print(seq[:]) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(seq[::-1]) # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
seq = [[1, 2], [3, 4]]
seq_dup = seq[:]
print(seq[0] is seq_dup[0]) # True
Line 5 shows that you get an empty slice when the start and stop values are the same.
Line 6 returns a duplicate of the sequence. It is equivalent to seq[0:len(seq)]
. This syntax creates a shallow copy, which we'll discuss later.
Line 7 returns a reversed copy of a sequence. seq[::-1]
is similar to seq.reverse()
(described later). The former returns a new sequence; the latter mutates the original. seq.reverse()
is much easier to read, but it is only available on mutable sequences such as lists, and the mutation should be intentional.
Lines 9-15 demonstrate that slicing also works with other sequences (a list in this case).
Lines 17-19 demonstrate that slicing performs a shallow copy if the sequence contains any collections, such as lists or tuples. Custom objects, which we'll meet in our Object Oriented Programming with Python book, are also subject to shallow copying.
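To make the consequences of a shallow copy more concrete, here is a minimal sketch that reuses the nested list from above. It shows that the outer lists are independent, while the nested lists are shared between the original and the copy.

```python
seq = [[1, 2], [3, 4]]
seq_dup = seq[:]

# The outer lists are distinct objects...
print(seq is seq_dup)        # False

# ...so adding an element to the copy leaves the original alone.
seq_dup.append([5, 6])
print(seq)                   # [[1, 2], [3, 4]]

# The nested lists are shared, so mutating one shows up in both.
seq_dup[0].append(99)
print(seq)                   # [[1, 2, 99], [3, 4]]
print(seq_dup)               # [[1, 2, 99], [3, 4], [5, 6]]
```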
Slicing also works as an assignment's target (the left side of an =
). However, we'll refrain from demonstrating that as it often leads to code that is hard to read.
Indexing uses whole numbers and only works with sequences, including strings. Mappings, however, use a syntax called key-based access that looks like indexing. Keys are usually strings, though not always.
Instead of limiting ourselves to whole numbers, we can use any hashable object as a key, which includes the built-in immutable types. We sometimes use integer keys, which, confusingly, look like indexing. Other types are rarely used but can be helpful (especially tuples).
With dicts, we use key-based access like this:
my_dict = {
'a': 'abc',
37: 'def',
(5, 6, 7): 'ghi',
frozenset([1, 2]): 'jkl',
}
print(my_dict['a']) # abc
print(my_dict[37]) # def
print(my_dict[(5, 6, 7)]) # ghi
print(my_dict[frozenset([1, 2])]) # jkl
print(my_dict['nothing']) # KeyError: 'nothing'
We've used a string, an integer, a tuple, and a frozen set as keys in this dict. We've also seen that we get a KeyError
if we try to use a non-existent key. If there's a chance you might get a KeyError
, consider using the dict.get
method. It returns the value associated with a given key if the key exists. Otherwise, it produces a default return value (usually None
, but other values can be specified):
my_dict = {
'a': 'abc',
37: 'def',
(5, 6, 7): 'ghi',
frozenset([1, 2]): 'jkl',
}
print(my_dict.get('a')) # abc
print(my_dict.get('nothing')) # None
print(my_dict.get('nothing', 'N/A')) # N/A
print(my_dict.get('nothing', 100)) # 100
We can also use key-based access to the left of the =
operator:
my_dict = {
'a': 'abc',
37: 'def',
(5, 6, 7): 'ghi',
frozenset([1, 2]): 'jkl',
}
my_dict['a'] = 'ABC'
my_dict[37] = 'DEF'
my_dict[(5, 6, 7)] = 'GHI'
my_dict[frozenset([1, 2])] = 'JKL'
print(my_dict)
# Pretty printed for clarity
# {
# 'a': 'ABC',
# 37: 'DEF',
# (5, 6, 7): 'GHI',
# frozenset({1, 2}): 'JKL'
# }
Can we assign new keys to a dict?
my_dict['xyz'] = 'Hey there!'
print(my_dict['xyz']) # Hey there!
By all means, we can!
Can we use a mutable key?
my_dict[[1, 2, 3]] = 'Hey there!'
# TypeError: unhashable type: 'list'
Nope. It doesn't work. Dictionary keys must be hashable, and mutable built-in collections such as lists, dicts, and sets are not.
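Strictly speaking, what Python checks is whether the key is hashable. The sketch below illustrates the difference with a tuple: a tuple of immutable values works as a key, but a tuple that contains a list does not, even though the tuple itself is immutable. The key_rejected flag is just an illustrative name.

```python
my_dict = {}

# A tuple of immutable values is hashable, so it works as a key.
my_dict[(1, 2, 3)] = 'Hey there!'
print(my_dict[(1, 2, 3)])          # Hey there!

# A tuple that contains a list is not hashable, so it is rejected.
key_rejected = False
try:
    my_dict[(1, [2, 3])] = 'Hey there!'
except TypeError:
    key_rejected = True

print(key_rejected)                # True
```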
Most Python collections support various operations, functions, and methods, some of which only apply to mutable collections. We'll examine the non-mutating operations first. We won't cover everything you can do. However, we'll see most of what you will encounter at Launch School.
All of the operations in this section work equally well with mutable and immutable collections. They never modify the original collection. Instead, they return new objects.
The in
operator determines whether the object to the operator's left is in the iterable collection on the right. It returns True
if the item on the left is in the collection on the right, False
otherwise.
The not in
operator is the inverse of in
. It returns False
if the object is in the collection; True
if not.
With sequences and sets, these operators compare the object for equality against each collection element. For mappings (dicts), they check whether the object is a key in the dictionary. For strings, they determine whether the right string contains the left string.
seq = [4, 'abcdef', (True, False, None)]
print(4 in seq) # True
print(4 not in seq) # False
print('abcdef' in seq) # True
print('abcdef' not in seq) # False
print('cde' in seq[1]) # True
print('cde' not in seq[1]) # False
print('acde' in seq[1]) # False
print('acde' not in seq[1]) # True
print((True, False, None) in seq) # True
print((True, False, None) not in seq) # False
print(3.14 in seq) # False
print(3.14 not in seq) # True
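The example above only uses a list and a string. As a brief sketch of the other cases described earlier, here is how the operators behave with a dict and a set. The phones and vowels collections are made up for illustration.

```python
phones = {'Chris': '111-2222', 'Pete': '333-4444'}

print('Chris' in phones)               # True: 'Chris' is a key
print('111-2222' in phones)            # False: values aren't searched
print('111-2222' in phones.values())   # True: search the values explicitly

vowels = {'a', 'e', 'i', 'o', 'u'}
print('e' in vowels)                   # True
print('y' not in vowels)               # True
```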
min
and max
return the minimum and maximum members in an iterable collection. The only requirement is that any pair of the collection's elements are comparable with the <
and >
operators.
my_set1 = {1, 4, -9, 16, 25, -36, -63, -1}
my_set2 = {'1', '4', '-9', '16', '25', '-36', '-1'}
print(min(my_set1), max(my_set1)) # -63 25
print(min(my_set2), max(my_set2)) # -1 4
As you can see, min
and max
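know how to compare the members of our sets. They determine the type of comparison by looking at the element types. In my_set2, the elements are strings, so they are compared character by character rather than numerically.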
In most cases, you can't use min
and max
with heterogeneous collections:
>>> my_set = {1, 4, '-9', 16, '25', -36, -63, -1}
>>> min(my_set)
TypeError: '<' not supported between instances of
'str' and 'int'
>>> max(my_set)
TypeError: '>' not supported between instances of
'str' and 'int'
However, it is possible in some situations:
my_set = {1, 3.14, -2.71}
print(min(my_set), max(my_set)) # -2.71 3.14
You can also use min
and max
with multiple arguments instead of an iterable.
print(min(3, 5, -1), max(3, 5, -1)) # -1 5
The sum
function is used in conjunction with iterable collections that consist entirely of numeric values. It computes and returns the sum of all the collection's numbers.
numbers = (1, 1, 2, 3, 5, 8, 13, 21, 34)
print(sum(numbers)) # 88
Note that sum
cannot be used to concatenate strings; Python raises a TypeError if you try. It is intended for numeric values. Use str.join
if you want to concatenate strings.
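Here is a short sketch of what happens when you try to add up strings with sum, followed by the str.join alternative. The sum_failed flag is only there to record that the exception occurred.

```python
words = ['abc', 'def', 'ghi']

# sum refuses to concatenate strings, even with a string start value.
sum_failed = False
try:
    sum(words, '')
except TypeError:
    sum_failed = True

print(sum_failed)        # True

# str.join is the right tool for the job.
joined = ''.join(words)
print(joined)            # abcdefghi
```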
Two helpful sequence methods are the index
and count
methods. seq.index
returns the index of the first element in the sequence that matches a given object. It raises a ValueError
exception if the object is not found. seq.count
returns the number of times a value occurs in the sequence.
names = ['Karl', 'Grace', 'Clare', 'Victor',
'Antonina', 'Allison', 'Trevor']
print(names.index('Clare')) # 2
print(names.index('Trevor')) # 6
print(names.index('Chris'))
# ValueError: 'Chris' is not in list
numbers = [1, 3, 6, 5, 4, 10, 1, 5, 4, 4, 5, 4]
print(numbers.count(1)) # 2
print(numbers.count(3)) # 1
print(numbers.count(4)) # 4
print(numbers.count(7)) # 0
index
also works with strings. It searches for the first matching substring of a string:
names = 'Karl Grace Clare Victor Antonina Trevor'
print(names.index('Clare')) # 11
print(names.index('Trevor')) # 33
print(names.index('Chris'))
# ValueError: substring not found
One of the most impressively helpful functions is zip
, which works with all iterables. It lets you merge the members of multiple iterables into a single list of tuples. zip
makes it easy to iterate through many collections simultaneously.
zip
iterates through 0 or more iterables in parallel and returns a list-like object of tuples. Each tuple contains a single object from each of the iterables. That's a mouthful, so let's take a look at an example:
iterable1 = [1, 2, 3]
iterable2 = ('Kim', 'Leslie', 'Bertie')
iterable3 = [None, True, False]
zipped_iterables = zip(iterable1, iterable2, iterable3)
print(list(zipped_iterables))
# Pretty printed for clarity
# [
# (1, 'Kim', None),
# (2, 'Leslie', True),
# (3, 'Bertie', False)
# ]
Here, we've combined 3 iterables (two lists and a tuple) into a list-like object of tuples. Each tuple in the return value has 3 members. The first member is from iterable1
, the second from iterable2
, and the third from iterable3
. The first tuple contains the first element of each of the iterables, the second tuple contains the second element of the iterables, and the third tuple contains the third element. Thus, our first tuple contains the first elements of iterable1
(1), iterable2
('Kim'), and iterable3
(None).
Note that we referred to zip
's return value as a list-like object. It's not a true list, but a lazy sequence much the same as a range
. You must request values explicitly, which you can do with a loop or iterable constructor. That's why we call list(zipped_iterables)
on line 6 above.
zip
's collection arguments are usually the same length but don't have to be. If you want to enforce identical lengths, add a strict=True
keyword argument to the invocation:
zipped_iterables = zip(iterable1, iterable2, strict=True)
With strict=True
, zip
raises an exception if the iterables don't all have the same length.
So, what happens if the lengths differ and strict=True
isn't given? In this case, zip
stops after exhausting the shortest iterable:
result = zip(range(5, 10), # length is 5
range(1, 3), # length is 2 (shortest)
range(3, 7)) # length is 4
print(list(result)) # [(5, 1, 3), (6, 2, 4)]
Since range(1, 3)
only has a length of 2 and strict=True
was omitted, the result
list only has 2 elements.
The zip
function's canonical application is to simultaneously iterate over multiple collections. We'll demonstrate that in the next chapter.
It's worth noting that zip
returns what is known as an iterator. We'll discuss iterators in more detail in the Core curriculum. However, one characteristic of iterators that is important to be aware of is that they can only be consumed once. If you iterate over the iterator object, subsequent attempts to iterate will fail. For instance:
result = zip(range(5, 10), # length is 5
range(1, 3), # length is 2 (shortest)
range(3, 7)) # length is 4
print(list(result)) # [(5, 1, 3), (6, 2, 4)]
print(list(result)) # []
As you can see, once we've consumed the return value of zip
, we can't do it again. Most Python functions and methods that return lazy sequences may only be consumed once; range
objects are one of the few exceptions to this rule.
Python provides 3 methods to get lists of the keys, values, and key/value pairs from a dictionary. Those methods are dict.keys
, dict.values
, and dict.items
. Let's see them in action:
people_phones = {
'Chris': '111-2222',
'Pete': '333-4444',
'Clare': '555-6666',
}
print(people_phones.keys())
# dict_keys(['Chris', 'Pete', 'Clare'])
print(people_phones.values())
# Pretty printed for clarity
# dict_values([
# '111-2222',
# '333-4444',
# '555-6666'
# ])
print(people_phones.items())
# Pretty printed for clarity
# dict_items([
# ('Chris', '111-2222'),
# ('Pete', '333-4444'),
# ('Clare', '555-6666')
# ])
The lists produced by these methods aren't ordinary lists. Python wraps the output for each list with dict_keys()
, dict_values()
, or dict_items()
to show that these aren't regular lists. They are actually dictionary view objects that are tied to the dictionary. If you add a new key/value pair to the dictionary, remove an element, or update a value, the corresponding lists are updated immediately:
people_phones = {
'Chris': '111-2222',
'Pete': '333-4444',
'Clare': '555-6666',
}
keys = people_phones.keys()
values = people_phones.values()
print(keys) # dict_keys(['Chris', 'Pete', 'Clare'])
print(values)
# dict_values(['111-2222', '333-4444', '555-6666'])
people_phones['Max'] = '123-4567'
people_phones['Pete'] = '345-6789'
del people_phones['Chris']
print(keys) # dict_keys(['Pete', 'Clare', 'Max'])
print(values)
# dict_values(['345-6789', '555-6666', '123-4567'])
Let's see some of the mutating operations we can use with mutable sequences, the most common of which are lists. All methods in this section mutate the collection used to call them.
Keep in mind that most of these methods also work with other mutable sequences, such as deques and arrays, both of which are available in Python's standard library.
We can use the append
, insert
, and extend
methods to add new elements to a mutable sequence, such as a list.
seq.append
appends a single object to the end of a mutable sequence, such as a list:
numbers = [1, 2]
numbers.append(10) # Append the number 10
print(numbers) # [1, 2, 10]
seq.insert
inserts an object into a mutable sequence before the element at a given index. If the given index is greater than or equal to the sequence's length, the object is appended to the sequence. If the index is negative, it counts from the end of the sequence.
numbers = [1, 2]
numbers.insert(0, 8) # Insert 8 before numbers[0]
print(numbers) # [8, 1, 2]
numbers.insert(2, 6) # Insert 6 before numbers[2]
print(numbers) # [8, 1, 6, 2]
numbers.insert(100, 55) # Insert 55 before numbers[100]
print(numbers) # [8, 1, 6, 2, 55]
numbers.insert(-3, 33) # Insert 33 before the 3rd element
# from the end.
print(numbers) # [8, 1, 33, 6, 2, 55]
seq.extend
appends each element of an iterable to the calling mutable sequence.
numbers = [1, 2]
numbers.extend([7, 8]) # Append 7 and 8 to numbers
print(numbers) # [1, 2, 7, 8]
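It's easy to mix up append and extend. This brief sketch contrasts the two when the argument is a list, and also shows that extend accepts any iterable, including a string.

```python
numbers = [1, 2]
numbers.append([7, 8])     # Adds the list itself as one element
print(numbers)             # [1, 2, [7, 8]]

numbers = [1, 2]
numbers.extend([7, 8])     # Adds each element of the list
print(numbers)             # [1, 2, 7, 8]

numbers.extend('ab')       # A string contributes its characters
print(numbers)             # [1, 2, 7, 8, 'a', 'b']
```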
We can use the remove
, pop
, and clear
methods to remove elements from a mutable sequence:
seq.remove
searches a sequence for a specific object and removes the first occurrence of that object. It raises a ValueError
if there is no such object.
my_list = [2, 4, 6, 8, 10]
my_list.remove(8)
print(my_list) # [2, 4, 6, 10]
my_list.remove(8)
# ValueError: list.remove(x): x not in list
seq.pop
removes and returns an indexed element from a mutable sequence. If no index is given, it removes the last element in the sequence. It raises an error if the index is out of range. pop
only works with mutable indexed sequences.
my_list = [2, 4, 6, 8, 10]
print(my_list.pop(1)) # 4
print(my_list) # [2, 6, 8, 10]
print(my_list.pop()) # 10
print(my_list) # [2, 6, 8]
print(my_list.pop(4))
# IndexError: pop index out of range
seq.clear
removes all elements from a sequence, leaving it empty.
my_list = [2, 4, 6, 8, 10]
my_list.clear()
print(my_list) # []
You can use the sorted
function to create a sorted list from any iterable collection, mutable or immutable. It creates and returns a sorted list from the elements in the collection. The original collection is unchanged.
You can also sort lists using the list.sort
method. However, this method mutates the list. It's also worth noting that my_list.sort()
is a bit faster and less memory intensive than sorted(my_list)
since the method does an in-place sort, so it doesn't have to build a completely new list.
names = ('Grace', 'Clare', 'Allison', 'Trevor')
print(sorted(names))
# ['Allison', 'Clare', 'Grace', 'Trevor']
print(names)
# ('Grace', 'Clare', 'Allison', 'Trevor')
names = list(names)
print(names)
# ['Grace', 'Clare', 'Allison', 'Trevor']
print(names.sort()) # None
print(names)
# ['Allison', 'Clare', 'Grace', 'Trevor']
In the sorted
example, we've sorted and printed the tuple of names. Note that the result is a list, not a tuple. We've then shown that the original names
tuple is unchanged.
Next, we converted the names
tuple to a list and sorted it with names.sort
. Note that sort
returned None
, which strongly hints that the collection was mutated. When we print the names, we see that the list was mutated.
By default, both sort
and sorted
do an ascending sort using the <
operator to compare elements from the collection. You can reverse the sort by adding a reverse=True
keyword argument to the argument list:
names = ['Grace', 'Clare', 'Allison', 'Trevor']
print(sorted(names, reverse=True))
# ['Trevor', 'Grace', 'Clare', 'Allison']
names.sort(reverse=True)
print(names) # ['Trevor', 'Grace', 'Clare', 'Allison']
You can also pass a key=func
keyword argument to tell sort
or sorted
how to determine what values it should sort. For instance, if you want to perform a case-insensitive sort on a list of strings, you can specify key=str.casefold
:
words = ['abc', 'DEF', 'xyz', '123']
print(sorted(words))
# ['123', 'DEF', 'abc', 'xyz']
print(sorted(words, key=str.casefold))
# ['123', 'abc', 'DEF', 'xyz']
In most cases, you can also use str.lower
instead of str.casefold
. However, using str.casefold
is considered best-practice since sort
will be comparing the strings.
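One case where the two methods differ is the German ß character. The sketch below compares two spellings of the same word; the variable names are illustrative only.

```python
word1 = 'Straße'
word2 = 'STRASSE'

print(word1.lower())        # straße
print(word1.casefold())     # strasse

# lower() leaves ß alone, so the comparison fails.
lower_match = word1.lower() == word2.lower()
print(lower_match)          # False

# casefold() converts ß to ss, so the comparison succeeds.
casefold_match = word1.casefold() == word2.casefold()
print(casefold_match)       # True
```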
You can also sort a list of numeric-valued strings by passing key=int
to the function or method:
numbers = ['1', '5', '100', '15', '534', '53']
numbers.sort()
print(numbers) # ['1', '100', '15', '5', '53', '534']
numbers.sort(key=int)
print(numbers) # ['1', '5', '15', '53', '100', '534']
This is probably your first time seeing that Python's functions and methods are objects that can be passed around as arguments to a function. (They can also be passed around as return values.) Remember this moment. The technique is potent.
If you use sorted
on a dictionary, it returns a sorted list of the dictionary's keys.
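Here is a small sketch of sorting a dictionary. The first call sorts the keys alphabetically. The second passes the dictionary's own get method as the key function, which is one possible way to order the keys by their associated values.

```python
my_dict = {'abc': 1, 'xyz': 23, 'pqr': 0, 'jkl': 5}

# Sorting a dict sorts its keys.
by_key = sorted(my_dict)
print(by_key)                 # ['abc', 'jkl', 'pqr', 'xyz']

# my_dict.get maps each key to its value, so this sorts keys by value.
by_value = sorted(my_dict, key=my_dict.get)
print(by_value)               # ['pqr', 'abc', 'jkl', 'xyz']

# The dictionary itself is unchanged.
print(list(my_dict))          # ['abc', 'xyz', 'pqr', 'jkl']
```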
You can use the reversed
function to reverse the order of elements in a sequence or dictionary. The returned value is a lazy sequence that contains the elements in the sequence or the keys from a dictionary. Since the result is lazy, you need to iterate over the result or expand it with a function like list
or tuple
.
You can also reverse lists using the list.reverse
method. However, this method mutates the list. In general, you should use list.reverse
when you really need to reverse the list's contents, and don't need to preserve the original order. You should use reversed
when all you need to do is iterate over the list in reverse. Don't use reversed
if you eventually want to convert the result to a non-lazy sequence such as a list or tuple:
names = ('Grace', 'Clare', 'Allison', 'Trevor')
reversed_names = reversed(names)
print(reversed_names)
# <reversed object at 0x102848e50>
print(tuple(reversed(names))) # Requires extra memory
# ('Trevor', 'Allison', 'Clare', 'Grace')
print(names)
# ('Grace', 'Clare', 'Allison', 'Trevor')
names = list(names)
print(names.reverse()) # None
print(names)
# ['Trevor', 'Allison', 'Clare', 'Grace']
my_dict = {'abc': 1, 'xyz': 23, 'pqr': 0, 'jkl': 5}
reversed_dict = reversed(my_dict)
print(reversed_dict)
# <dict_reversekeyiterator object at 0x100d19f80>
print(list(reversed_dict)) # Requires extra memory
# ['jkl', 'pqr', 'xyz', 'abc']
In the reversed
example using a tuple, we've reversed and printed the tuple of names. We've then shown that the original names
tuple is unchanged.
Next, we converted the names
tuple to a list and reversed it with names.reverse
. Note that reverse
returned None
, which strongly hints that the collection was mutated. When we print the names, we see that the list was mutated.
Finally, we demonstrated using reversed
with a dictionary.
Think of the reversed
function as a looping aid. You sometimes want to iterate over a collection in reverse. reversed
makes that easy:
names = ('Grace', 'Clare', 'Allison', 'Trevor')
for name in reversed(names):
    print(name)
# Trevor
# Allison
# Clare
# Grace
As we've seen, Python has many built-in functions for manipulating sequences and other collections. You can use many of these functions with strings, too, since strings are sequences of characters.
The str
class also offers a veritable goldmine of methods for using and manipulating strings in myriad ways. Let's take a look at some of these. We cannot cover them all, but we'll look at some of the most useful. We'll skip over anything we've already discussed in any detail.
We've already seen how the str.lower
and str.upper
methods work. Let's check out two related methods:
str.capitalize
returns a copy of str
with the first character capitalized and the remaining characters converted to lowercase.
print("what's up?".capitalize()) # What's up?
print('456ABC'.capitalize()) # 456abc
str.title
returns a copy of str
with every word in the string capitalized. The remaining characters are converted to lowercase.
print("four SCORE and sEvEn".title())
# Four Score And Seven
str.title
's idea of what constitutes a word can lead to unexpected results. In particular, it uses whitespace and certain punctuation characters as word boundaries:
print("i can't believe it's already mid-july.".title())
# I Can'T Believe It'S Already Mid-July.
If you only want to break at whitespace, you can use the capwords
function from the string
module:
import string
print(string.capwords("i can't believe it's already mid-july."))
# I Can't Believe It's Already Mid-july.
str.swapcase
returns a copy of str
with every uppercase letter converted to lowercase, and vice versa.
print("What's up?".swapcase()) # wHAT'S UP?
print('456ABC'.swapcase()) # 456abc
print('456ABC'.swapcase().swapcase()) # 456ABC
The first call to swapcase
on line 3 returns the original string with the case swapped; the second swaps the case on that return value.
Note that there are situations where str.swapcase().swapcase()
does not return the original value of str
. For instance:
print('Straße'.swapcase().swapcase()) # Strasse
In this case, the German eszett character (ß
), which represents a doubled lowercase s
(ss
), is converted to SS when uppercased rather than to a single uppercase character. Thus, the above example prints Strasse
instead of Straße
.
Python has a generous suite of methods that test what sort of characters are present in a string. We'll look at just a few of these -- there are many more.
str.isalpha()
returns True
if all characters of str
are alphabetic, False
otherwise. It returns False
if the string is empty.
'Hello'.isalpha() # True
'Good-bye'.isalpha() # False: `-` is not a letter
'Four score'.isalpha() # False: space is not a letter
''.isalpha() # False
str.isdigit()
returns True
if all characters of str
are digits, False
otherwise. It returns False
if the string is empty.
'12340'.isdigit() # True
'123.4'.isdigit() # False: `.` is not a digit
'-1234'.isdigit() # False: `-` is not a digit
''.isdigit() # False
str.isalnum()
returns True
if str
is composed entirely of letters and/or digits, False
otherwise. It returns False
if the string is empty.
str.islower()
returns True
if all cased characters in str
are lowercase letters, False
otherwise. It returns False
if the string contains no cased characters.
str.isupper()
returns True
if all cased characters in str
are uppercase, False
otherwise. It returns False
if the string contains no cased characters.
str.isspace()
returns True
if all characters in str
are whitespace characters, False
otherwise. It returns False
if the string is empty. The whitespace characters include ordinary spaces, tabs (
\t
), newlines (\n
), and carriage returns (\r
). It also includes two rarely used characters: vertical tabs (\v
) and form feeds (\f
), as well as some other Unicode characters that count as whitespace.
Be careful with these methods: they're all Unicode-aware. Thus, 'Καλωσήρθες'.isalpha()
returns True
since the characters are all part of the Greek alphabet. If you need to exclude non-ASCII characters, use this pattern:
text.isalpha() and text.isascii()
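As a sketch of how you might package that pattern, the hypothetical helper below (is_ascii_letters is our own name, not a built-in) combines the two tests. It relies on str.isascii, which is available in Python 3.7 and later.

```python
def is_ascii_letters(text):
    """Return True if text is non-empty and has only ASCII letters."""
    return text.isalpha() and text.isascii()

print(is_ascii_letters('Hello'))        # True
print(is_ascii_letters('Καλωσήρθες'))   # False: letters, but not ASCII
print(is_ascii_letters('Hello!'))       # False: `!` is not a letter
print(is_ascii_letters(''))             # False: isalpha rejects ''
```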
The str.strip
method returns a copy of str
with all leading and trailing whitespace characters (see str.isspace
above) removed. Python programmers often use this method to strip unwanted whitespace from input data, such as keyboard input. Unwanted whitespace frequently makes input data harder to work with, so immediately removing excess whitespace is a good idea.
text = input(prompt).strip()
In some of the examples in this section, we use the built-in repr
function when printing strings. repr
formats strings with quotes and clearly shows the presence of whitespace characters:
text = ' \t abc def \n\r'
print(repr(text)) # ' \t abc def \n\r'
print(repr(text.strip())) # 'abc def'
You can also tell strip
to remove other characters by providing a string argument. The characters inside this string are the ones you want removed.
text = ' \t abc def \n\r'
print(repr(text.strip('abc'))) # ' \t abc def \n\r'
text = 'aaabaacccabxyzabccba'
print(text.strip('a')) # baacccabxyzabccb
print(text.strip('ab')) # cccabxyzabcc
print(text.strip('ba')) # cccabxyzabcc
print(text.strip('abc')) # xyz
print(text.strip('bc')) # aaabaacccabxyzabccba
print(repr(text.strip('abcxyz'))) # ''
There is one important thing to note about the above examples:
strip
removes individual characters, not substrings. That is, the order of the characters in the argument doesn't matter. Thus, on line 8, we remove all of the leading and trailing 'a'
, 'b'
, and 'c'
characters, leaving nothing but 'xyz'
.
The str.lstrip
method is identical to str.strip
except it only removes leading characters (the leftmost). Similarly, str.rstrip
removes trailing characters (the rightmost).
text = 'aaabaacccabxyzabccba'
print(text.lstrip('a')) # baacccabxyzabccba
print(text.rstrip('a')) # aaabaacccabxyzabccb
print(text.lstrip('ab')) # cccabxyzabccba
print(text.rstrip('ab')) # aaabaacccabxyzabcc
print(text.lstrip('ba')) # cccabxyzabccba
print(text.rstrip('ba')) # aaabaacccabxyzabcc
print(text.lstrip('abc')) # xyzabccba
print(text.rstrip('abc')) # aaabaacccabxyz
Two related methods are available for determining whether a string begins or ends with a given substring. Let's examine these methods:
str.startswith
returns True
if the string given by str
begins with a certain substring, False
if it does not:
'Four score and seven'.startswith('Four score') # True
'Four score and seven'.startswith('For score') # False
'Four score and seven'.startswith('score') # False
The argument can also be a tuple of strings:
'abc def'.startswith(('abc', 'xyz', 'stu')) # True
'def ghu'.startswith(('abc', 'xyz', 'stu')) # False
'xyz uvw'.startswith(('abc', 'xyz', 'stu')) # True
'stu vwx'.startswith(('abc', 'xyz', 'stu')) # True
The method also accepts "start" and "end" indexes to control where the search begins and ends:
'abc def'.startswith('def', 4) # True
'abc def ghi'.startswith('def', 4, 7) # True
str.endswith
returns True
if the string given by str
ends with a certain substring, False
if it does not:
'Four score and seven'.endswith('and seven') # True
'Four score and seven'.endswith('ad seven') # False
'Four score and seven'.endswith('score') # False
As with startswith
, the argument can be a tuple of strings. You can also supply "start" and "end" indexes:
'abc def'.endswith(('abc', 'xyz', 'stu')) # False
'abc def'.endswith(('xyz', 'def')) # True
'abc def'.endswith('def', 4) # True
'abc def ghi'.endswith('def', 4, 7) # True
The str.split
method returns a list of the words in the string str
. By default, split
splits the string at sequences of one or more whitespace characters:
text = '  Four     score and   seven years ago.   '
print(text.split())
# ['Four', 'score', 'and', 'seven', 'years', 'ago.']
print('no-spaces'.split()) # ['no-spaces']
If you want to split on something other than whitespace, you can tell Python what character or character string should act as a delimiter:
text = ',Four,score,and,,,seven,years,ago,'
print(text.split(','))
# Pretty printed for clarity
# [
# '',
# 'Four',
# 'score',
# 'and',
# '',
# '',
# 'seven',
# 'years',
# 'ago',
# ''
# ]
Note that specifying a delimiter changes the splitting behavior. Instead of looking for runs of whitespace, it splits the string at every occurrence of the delimiter. This also applies when using a literal space character as the delimiter. Compare the output of the code below to our first example:
text = '  Four     score and   seven years ago.   '
print(text.split(' '))
# Partially pretty printed for clarity
# ['', '', 'Four', '', '', '', '', 'score', 'and',
# '', '', 'seven', 'years', 'ago.', '', '', '']
split
also recognizes multi-character delimiters:
text = 'Four<>score<:>and<>seven<>years<>ago'
print(text.split('<>'))
# ['Four', 'score<:>and', 'seven', 'years', 'ago']
Multi-character delimiters must match exactly. Thus, the <
and >
in <:>
aren't split.
Most major languages have a split
function or method. It's worth noting that Python's differs from most others in that you can't split a string into individual characters using split
. Python's str.split
method doesn't allow a separator of ''. If you need to split a string into characters, use the list
or tuple
function:
text = 'abcde'
print(list(text)) # ['a', 'b', 'c', 'd', 'e']
print(tuple(text)) # ('a', 'b', 'c', 'd', 'e')
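For reference, this short sketch shows what happens if you try an empty separator anyway. The split_failed flag is only an illustrative name.

```python
text = 'abcde'

# An empty separator is rejected with a ValueError.
split_failed = False
try:
    text.split('')
except ValueError:
    split_failed = True

print(split_failed)       # True

# list(text) does the job instead.
chars = list(text)
print(chars)              # ['a', 'b', 'c', 'd', 'e']
```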
You can also iterate strings a character at a time:
text = 'abcde'
for char in text:
    print(char)
str.splitlines
returns a list of lines from the string str
. splitlines
looks for line-ending characters like \n
(line feed), \r
(carriage return), \r\n
(carriage return followed by line feed), and a variety of other line boundaries. See the documentation for a complete list.
text = '''
You were lucky to have a room. We used to have to
live in a corridor.
Oh we used to dream of livin' in a corridor!
Woulda' been a palace to us. We used to live in an
old water tank on a rubbish tip. We got woken up
every morning by having a load of rotting fish
dumped all over us.
'''
print(text.strip().splitlines())
# Pretty printed for clarity
[
"You were lucky to have a room. We used to have to",
"live in a corridor.",
"Oh we used to dream of livin' in a corridor!",
"Woulda' been a palace to us. We used to live in an",
"old water tank on a rubbish tip. We got woken up",
"every morning by having a load of rotting fish",
"dumped all over us."
]
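To see how splitlines treats different line endings, consider this small sketch. The sample string is made up for illustration; it mixes \n, \r, and \r\n in one value:

```python
mixed = 'first\nsecond\rthird\r\nfourth'

# Each kind of line ending counts as a single boundary
lines = mixed.splitlines()
print(lines)    # ['first', 'second', 'third', 'fourth']

# keepends=True keeps the line endings in the results
kept = mixed.splitlines(keepends=True)
print(kept)     # ['first\n', 'second\r', 'third\r\n', 'fourth']
```

Note that \r\n counts as one boundary, not two, so no empty string appears between 'third' and 'fourth'.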
The final method we'll discuss in this cluster is str.join
, which concatenates all strings in an iterable collection into a single string. Each string from the collection gets concatenated to the previous string with the value of str
between them:
words = ['You', 'were', 'lucky']
print(''.join(words)) # Youwerelucky
print(' '.join(words)) # You were lucky
print(','.join(words)) # You,were,lucky
print('\n '.join(words))
# You
# were
# lucky
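Since join concatenates strings, every element of the collection must already be a string. The sketch below, which uses a made-up list of numbers, shows the error you get otherwise and one possible way around it:

```python
numbers = [1, 2, 3]

try:
    ', '.join(numbers)              # Elements must be strings
except TypeError:
    print('join needs strings')

# Convert each element to a string first
as_strings = [str(number) for number in numbers]
joined = ', '.join(as_strings)
print(joined)                       # 1, 2, 3
```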
We often need to search through strings looking for a particular substring. Python provides several ways to do this, including the in
and not in
operators we looked at earlier.
Here, we'll look at the str.find
and str.rfind
methods. str.find
searches through str
looking for the first occurrence of the argument. str.rfind
does the same, but it searches from right to left (that is, in reverse). Both methods return the index where the first match they encounter begins. If there is no match, they return -1
.
school = 'launch school'
print(school.find(' ')) # 6
print(school.find('l')) # 0
print(school.find('h')) # 5
print(school.find('hoo')) # 9
print(school.find('x')) # -1
print(school.find('N')) # -1
print(school.rfind(' ')) # 6
print(school.rfind('l')) # 12
print(school.rfind('h')) # 9
print(school.rfind('hoo')) # 9
print(school.rfind('oh')) # -1
print(school.rfind('N')) # -1
Note that both find
and rfind
are case sensitive, so 'N'
doesn't match 'n'
.
You can also search slices by adding start and end arguments to the invocation:
text = 'abc abc def abc'
print(text.find(' ', 4)) # 7
print(text.find(' ', 8)) # 11
print(text.find('c', 0, 2)) # -1
print(text.rfind('c', 3, 10)) # 6
Using find
or rfind
to search slices is not the same as using the [start:stop]
syntax first and then searching the result. Compare the outputs for the two find
invocations below:
text = 'abc abc def abc'
print(text[3:10].find('c')) # 3
print(text.find('c', 3, 10)) # 6
If you take a slice of a string before using it to call find
or rfind
, the method returns the index of the search string relative to the start of that slice. However, when you pass the start and end arguments instead, the method returns the index relative to the start of the original string. Another difference is that taking a slice first creates a new string, while using slicing arguments does not.
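The relationship between the two results can be sketched as follows. The names start, stop, relative, and absolute are only for illustration, and the conversion assumes a non-negative start value:

```python
text = 'abc abc def abc'
start = 3
stop = 10

relative = text[start:stop].find('c')     # Index within the slice
absolute = text.find('c', start, stop)    # Index within text

print(relative)    # 3
print(absolute)    # 6

# Adding the start offset converts a slice-relative index into an
# index in the original string, but only when a match was found.
if relative != -1:
    print(relative + start == absolute)   # True
```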
Collections can be nested inside other collections. For instance, you can have a list that contains a dict, a set, a tuple, and another list. Each of those can, in turn, also contain nested collections:
nested_list = [
{'foo': 42, 'bar': [1, 2, 3], 'qux': None},
{
'Kim',
('Leslie', 'Les'),
('Pete', 'Peter'),
('Jonathan', 'Jon', 'Jack'),
},
(4, 5, (1, 2, 3), 6, 7),
['a', 'b', 'cde', -3.141592],
]
There are, however, some limitations on what you can nest in certain collections. For instance, the members of a set must be hashable, so you can't nest a mutable collection such as a list, dictionary, or another set inside a set:
>>> my_set = {1, 2, 3, [4, 5]}
TypeError: unhashable type: 'list'
>>> my_set = {1, 2, 3, {4, 5}}
TypeError: unhashable type: 'set'
However, you can nest a frozen set inside a set or frozen set:
>>> my_set = { 1, 2, 3, frozenset([4, 5]) }
>>> my_set # {frozenset({4, 5}), 1, 2, 3}
Curiously, you can nest mutable collections inside tuples even though tuples are immutable.
>>> my_tuple = (1, 2, 3, [4, 5], {6, 7}, {'x': 'a dict'})
>>> my_tuple
(1, 2, 3, [4, 5], {6, 7}, {'x': 'a dict'})
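The following sketch illustrates what tuple immutability does and doesn't cover when a tuple holds a mutable object. The tuple here is a shortened version of the one above:

```python
my_tuple = (1, 2, 3, [4, 5])

my_tuple[3].append(6)       # Mutating the nested list is allowed
print(my_tuple)             # (1, 2, 3, [4, 5, 6])

try:
    my_tuple[3] = [7, 8]    # Replacing a tuple element is not allowed
except TypeError:
    print('tuples do not support item assignment')

try:
    hash(my_tuple)          # The nested list makes the tuple unhashable
except TypeError:
    print('this tuple cannot be a set member or dict key')
```

The tuple itself never changes which objects it refers to. Only the list it refers to changes.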
In most cases, nested collections have some sense of structure and reason for the nesting. For instance, we might define a deck of cards as a list of dictionaries:
deck = [
{ 'suit': 'Clubs', 'value': '2' },
{ 'suit': 'Clubs', 'value': '3' },
{ 'suit': 'Clubs', 'value': '4' },
...
{ 'suit': 'Spades', 'value': 'Queen' },
{ 'suit': 'Spades', 'value': 'King' },
{ 'suit': 'Spades', 'value': 'Ace' },
]
If we want to print the fiftieth card in the deck, we can write:
print(f"{deck[49]['value']} of {deck[49]['suit']}")
# Queen of Spades
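If you would rather not type all 52 dictionaries, you could build the deck with nested loops. This is only a sketch: the source shows just the first and last few cards, so the full suit and value ordering below is an assumption.

```python
# Assumed ordering; the text only shows the first and last few cards.
suits = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
values = ['2', '3', '4', '5', '6', '7', '8', '9', '10',
          'Jack', 'Queen', 'King', 'Ace']

deck = []
for suit in suits:
    for value in values:
        deck.append({'suit': suit, 'value': value})

print(len(deck))                                      # 52
print(f"{deck[49]['value']} of {deck[49]['suit']}")   # Queen of Spades
```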
You can also have several layers of nesting in a sequence. In that case, you can access any item from any nested part of the sequence by just running several []
indexing operations together:
nested_seq = [
[1, 2, [3, 4, (5, 6, 7, 8)]],
[9, [10, (11,)]],
[12, 13, [14, 15, (16, 17)]],
[18, [19, 20, (21, 22)]],
]
print(nested_seq[1]) # [9, [10, (11,)]]
print(nested_seq[3][0]) # 18
print(nested_seq[0][2][2]) # (5, 6, 7, 8)
print(nested_seq[2][2][2][1]) # 17
Nested collections become really "fun" when you start having to iterate through them.
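As a taste of what that iteration can involve, here is one possible sketch that visits every number in nested_seq. The flatten function is a hypothetical helper written for this example, not a built-in, and it only descends into lists and tuples:

```python
def flatten(collection):
    """Return a flat list of all items that aren't lists or tuples."""
    flat = []
    for item in collection:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))   # Recurse into nested sequences
        else:
            flat.append(item)
    return flat

nested_seq = [
    [1, 2, [3, 4, (5, 6, 7, 8)]],
    [9, [10, (11,)]],
    [12, 13, [14, 15, (16, 17)]],
    [18, [19, 20, (21, 22)]],
]

flat = flatten(nested_seq)
print(flat == list(range(1, 23)))   # True
```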
As you might expect, Python supports comparison operations for collections. It provides comparison mechanisms for all built-in iterable collections, and you can define comparisons for any custom iterables you create.
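Equality is the most straightforward comparison. In general, two collections are equal only if they have the same type, the same number of elements, and equal elements. For sequences, the elements must also appear in the same order. Otherwise, they are unequal.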
print([2, 3] == [2, 3]) # True
print([2, 3] == [3, 2]) # False (diff sequence)
print([2, 3] == [2]) # False (diff lengths)
print([2, 3] == (2, 3)) # False (diff types)
print({2, 3} == {3, 2}) # True (same members)
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 2, 'a': 1}
dict3 = {'a': 1, 'b': 2, 'c': 3}
print(dict1 == dict2) # True (same pairs)
print(dict1 == dict3) # False
Of course, you can also use !=
to compare for inequality.
You can also compare sequences for <
, <=
, >
, and >=
. However, we won't go into that here. It's rarely needed.
Python's collections have a wealth of functions, methods, and other operations that programmers need daily. Sometimes, the most challenging problem is deciding whether you need a function, a method, or an operator.
Write Python code to print the seventh number of range(0, 25, 3)
.
my_range = range(0, 25, 3)
print(my_range[6]) # 18
print(range(0, 25, 3)[6]) # 18
Use slicing to write Python code to print a 6-character substring of 'Launch School'
that begins with the first c
.
my_str = 'Launch School'
print(my_str[4:10]) # ch Sch
The first c
occurs at index 4, so that's our start value for the slice. Since we want 6 characters, the stop value is at index 4 + 6 or 10. Note that the character at index 10 is not included in the result.
If you want to determine the location of the substring programmatically, you can do this:
my_str = 'Launch School'
start = my_str.find('c')
print(my_str[start:start + 6]) # ch Sch
Some error checking might be advisable there.
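Here is one way that error checking might look. The function name substring_from is hypothetical. Without the check, a failed search returns -1, and a slice such as my_str[-1:5] silently produces an empty string.

```python
def substring_from(text, marker, length):
    """Hypothetical helper: return `length` characters of `text`
    starting at the first `marker`, or None if `marker` is absent."""
    start = text.find(marker)
    if start == -1:
        return None
    return text[start:start + length]

print(substring_from('Launch School', 'c', 6))   # ch Sch
print(substring_from('Launch School', 'z', 6))   # None
```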
Write Python code to create a new tuple from (1, 2, 3, 4, 5)
. The new tuple should be in reverse order from the original. It should also exclude the first and last members of the original. The result should be the tuple (4, 3, 2)
.
Solution 1
my_tuple = (1, 2, 3, 4, 5)
my_list = list(my_tuple)
my_list.reverse()
result = tuple(my_list[1:4])
print(result) # (4, 3, 2)
Solution 2
my_tuple = (1, 2, 3, 4, 5)
result = my_tuple[3:0:-1]
print(result) # (4, 3, 2)
Solution 3
my_tuple = (1, 2, 3, 4, 5)
result = my_tuple[-2:-5:-1]
print(result) # (4, 3, 2)
There are several ways to solve this problem. Your first inclination may have been to use the reverse
method, as in Solution 1. However, reverse
only works with lists, so we must first convert the tuple to a list. Even so, we have to slice the list, though the slice is a little cleaner.
Solutions 2 and 3 use the same approach by extracting a reversed slice. The only difference is how we specify the start and stop values for the slice. What makes these tricky is that the element indexed by the stop value is not included in the result. If you used one of these solutions, you likely started with an off-by-one bug.
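If reversed slices feel error-prone, another possible approach (not one of the solutions above) is to drop the end members first and then reverse what remains. The sketch also shows how easily a stop value that is off by one changes the result:

```python
my_tuple = (1, 2, 3, 4, 5)

middle = my_tuple[1:-1]      # (2, 3, 4): drop first and last members
result = middle[::-1]        # Reverse what remains
print(result)                # (4, 3, 2)

# A stop value that is off by one silently drops a member
off_by_one = my_tuple[3:1:-1]
print(off_by_one)            # (4, 3)
```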
This is a 3-part question. Consider the following dictionary:
pets = {
'Cat': 'Meow',
'Dog': 'Bark',
'Bird': 'Tweet',
}
Part 1: Print Bark by accessing the element associated with the key Dog.
Part 2: Print None when you try to print the value associated with the non-existent key, Lizard.
Part 3: Print <silence> when you try to print the value associated with the non-existent key, Lizard.
print(pets['Dog'])
print(pets.get('Lizard'))
print(pets.get('Lizard', '<silence>'))
Since the pets
dictionary doesn't have a Lizard
key, we need to use the dict.get
method so we don't get an error. In Part 2, we don't specify a default value, so get
returns None
. In Part 3, we set <silence>
as the default value.
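For comparison, this sketch shows the error that get helps you avoid, along with a membership test as an alternative guard:

```python
pets = {
    'Cat': 'Meow',
    'Dog': 'Bark',
    'Bird': 'Tweet',
}

try:
    print(pets['Lizard'])             # Raises KeyError
except KeyError as error:
    print(f'Missing key: {error}')    # Missing key: 'Lizard'

# Checking membership first also avoids the error
if 'Lizard' in pets:
    sound = pets['Lizard']
else:
    sound = '<silence>'
print(sound)                          # <silence>
```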
Which of the following values can't be used as a key in a dict
object, and why?
'cat'
(3, 1, 4, 1, 5, 9, 2)
['a', 'b']
{'a': 1, 'b': 2}
range(5)
{1, 4, 9, 16, 25}
3
0.0
frozenset({1, 4, 9, 16, 25})
The following items can't be used as keys:
['a', 'b']
{'a': 1, 'b': 2}
{1, 4, 9, 16, 25}
The first value is a list, the second a dict, and the last a set. Since all 3 types are mutable and therefore unhashable, they can't be used as dict keys. All remaining items are immutable, hashable built-in objects; they are acceptable dict keys.
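One way to check an answer like this is to ask Python whether each value is hashable, since dict keys must be hashable. The is_hashable function below is a hypothetical helper written for this sketch:

```python
candidates = [
    'cat',
    (3, 1, 4, 1, 5, 9, 2),
    ['a', 'b'],
    {'a': 1, 'b': 2},
    range(5),
    {1, 4, 9, 16, 25},
    3,
    0.0,
    frozenset({1, 4, 9, 16, 25}),
]

def is_hashable(value):
    """Hypothetical helper: True if value can be a dict key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

unusable = [type(value).__name__
            for value in candidates
            if not is_hashable(value)]
print(unusable)                      # ['list', 'dict', 'set']

# An immutable tuple is still unhashable if it contains a list
print(is_hashable((1, [2, 3])))      # False
```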
What will the following code print?
print('abc-def'.isalpha())
print('abc_def'.isalpha())
print('abc def'.isalpha())
print('abc def'.isalpha() and
'abc def'.isspace())
print('abc def'.isalpha() or
'abc def'.isspace())
print('abcdef'.isalpha())
print('31415'.isdigit())
print('-31415'.isdigit())
print('31_415'.isdigit())
print('3.1415'.isdigit())
print(''.isspace())
print('abc-def'.isalpha()) # False
print('abc_def'.isalpha()) # False
print('abc def'.isalpha()) # False
print('abc def'.isalpha() and
'abc def'.isspace()) # False
print('abc def'.isalpha() or
'abc def'.isspace()) # False
print('abcdef'.isalpha()) # True
print('31415'.isdigit()) # True
print('-31415'.isdigit()) # False
print('31_415'.isdigit()) # False
print('3.1415'.isdigit()) # False
print(''.isspace()) # False
There are two things to note above.
1. You can't use and or or to determine whether a string contains a mixture of different value types. The str.isXXXXX methods determine whether every character in str matches a certain class of characters. Thus, a string can't be both alphabetic and whitespace. It can be alphabetic or whitespace, but that doesn't work for something like 'abc def'.
2. The str.isXXXXX methods return False when invoked by an empty string.
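If you do need to test for a mixture, you have to examine the characters one at a time. A minimal sketch, using a hypothetical helper named letters_and_spaces:

```python
def letters_and_spaces(text):
    """Hypothetical helper: True if text is non-empty and every
    character is either alphabetic or whitespace."""
    if not text:
        return False
    return all(char.isalpha() or char.isspace() for char in text)

print(letters_and_spaces('abc def'))   # True
print(letters_and_spaces('abc-def'))   # False
print(letters_and_spaces(''))          # False
```

The explicit empty-string check mirrors the behavior of the str.isXXXXX methods. Without it, all would return True for an empty string.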
Write Python code to replace all the :
characters in the string below with +
.
info = 'xyz:*:42:42:Lee Kim:/home/xyz:/bin/zsh'
Try this problem using the methods you've learned about in this chapter. Once you have that working, use the Python documentation for the str
type to find an alternative solution.
info = 'xyz:*:42:42:Lee Kim:/home/xyz:/bin/zsh'
parts = info.split(':')
result = '+'.join(parts)
print(result)
# xyz+*+42+42+Lee Kim+/home/xyz+/bin/zsh
info = 'xyz:*:42:42:Lee Kim:/home/xyz:/bin/zsh'
result = info.replace(':', '+')
print(result)
# xyz+*+42+42+Lee Kim+/home/xyz+/bin/zsh
Explain why the two print calls in the code below print different values.
text = "It's probably pining for the fjords!"
print(text[21:35].rfind('f')) # 8
print(text.rfind('f', 21, 35)) # 29
The first print call extracts a slice from text
ranging from index 21 up to, but not including, index 35. That returns the string 'for the fjords'
. rfind
then returns 8
, the index of the rightmost 'f'
within that slice.
On the other hand, the second print call searches the original string for the rightmost f
between indexes 21 and 35. Since the f
occurs at index 29, that's what the method returns.
Write some code to replace the value 6
in the following nested list with 606
:
stuff = [
['hello', 'world'],
['example', 'mem', None, 6, 88],
[4, 8, 12],
]
You don't have to search the list. Just write an assignment that replaces the 6
.
stuff[1][3] = 606
Consider the following nested collection:
cats = {
'Pete': {
'Cheddar': {
'color': 'orange',
'enjoys': {
'sleeping',
'snuggling',
'meowing',
'eating',
'birdwatching',
},
},
'Cocoa': {
'color': 'black',
'enjoys': {
'eating',
'sleeping',
'playing',
'chewing',
},
},
},
}
Write one line of code to print the activities that Cocoa enjoys.
print(cats['Pete']['Cocoa']['enjoys'])
Without running the following code, determine what each line should print.
print('johnson' in 'Joe Johnson')
print('sen' not in 'Joe Johnson')
print('Joe J' in 'Joe Johnson')
print(5 in range(5))
print(5 in range(6))
print(5 not in range(5, 10))
print(0 in range(10, 0, -1))
print(4 in {6, 5, 4, 3, 2, 1})
print({1, 2, 3} in {1, 2, 3})
print({3, 2} in {1, frozenset({2, 3})})
print('johnson' in 'Joe Johnson') # False
print('sen' not in 'Joe Johnson') # True
print('Joe J' in 'Joe Johnson') # True
print(5 in range(5)) # False
print(5 in range(6)) # True
print(5 not in range(5, 10)) # False
print(0 in range(10, 0, -1)) # False
print(4 in {6, 5, 4, 3, 2, 1}) # True
print({1, 2, 3} in {1, 2, 3}) # False
print({3, 2} in {1, frozenset({2, 3})}) # True
in
with strings is case sensitive.
range(5)
does not include 5
; it ranges from 0 to 4.
range(10, 0, -1)
does not include 0
; it ranges from 10 to 1.
in
with sets only checks whether a specific value is in the set.
{3, 2}
and frozenset({2, 3})
are considered equal sets.
Write some code that determines and prints whether the number 3
appears inside each of these lists:
numbers1 = [1, 3, 5, 7, 9, 11]
numbers2 = []
numbers3 = [2, 4, 6, 8]
numbers4 = ['1', '3', '5']
numbers5 = ['1', 3.0, '5']
You should print True
or False
depending on each result.
print(3 in numbers1) # True
print(3 in numbers2) # False
print(3 in numbers3) # False
print(3 in numbers4) # False (3 != '3')
print(3 in numbers5) # True (3 == 3.0)
Without running the following code, determine what each print statement should print.
cats = ('Cocoa', 'Cheddar',
'Pudding', 'Butterscotch')
print('Butterscotch' in cats)
print('Butter' in cats)
print('Butter' in cats[3])
print('cheddar' in cats)
cats = ('Cocoa', 'Cheddar',
'Pudding', 'Butterscotch')
print('Butterscotch' in cats) # True
print('Butter' in cats) # False (note 1)
print('Butter' in cats[3]) # True (note 2)
print('cheddar' in cats) # False
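Note 1: With a tuple or list, in checks whether the value exactly matches one of the elements. 'Butter' is not an element of cats.
Note 2: cats[3] is 'Butterscotch', and 'Butter' is a substring of 'Butterscotch'.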
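If you want to know whether a substring appears in any element of the tuple, you need to check the elements individually. Here is one possible sketch: first with an explicit loop, then with the built-in any function.

```python
cats = ('Cocoa', 'Cheddar',
        'Pudding', 'Butterscotch')

# Check each element for the substring
found = False
for cat in cats:
    if 'Butter' in cat:
        found = True
        break
print(found)                                   # True

# The same test, written with any
found_with_any = any('Butter' in cat for cat in cats)
print(found_with_any)                          # True
```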
Assume you have the following sequences:
my_str = 'abc'
my_list = ['Alpha', 'Bravo', 'Charlie']
my_tuple = (None, True, False)
my_range = range(10, 60, 10)
Write some code that combines the sequences into a list of tuples. Each tuple should contain one member of each sequence. Print the final result so you can see all the values, which should look like this:
[('a', 'Alpha', None, 10),
('b', 'Bravo', True, 20),
('c', 'Charlie', False, 30)]
result = zip(my_str, my_list, my_tuple, my_range)
print(list(result))
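Note that my_range has five members while the other sequences have three. zip stops at the shortest input, which is why 40 and 50 never appear. The sketch below contrasts that with itertools.zip_longest from the standard library, which pads the shorter inputs with None by default:

```python
from itertools import zip_longest

my_str = 'abc'
my_range = range(10, 60, 10)

# zip stops when the shortest input is exhausted
shortest = list(zip(my_str, my_range))
print(shortest)
# [('a', 10), ('b', 20), ('c', 30)]

# zip_longest continues until the longest input is exhausted
longest = list(zip_longest(my_str, my_range))
print(longest)
# [('a', 10), ('b', 20), ('c', 30), (None, 40), (None, 50)]
```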
Without running the following code, what values will be printed by the final line?
pets = {
'Cat': 'Meow',
'Dog': 'Bark',
'Bird': 'Tweet',
}
keys = pets.keys()
del pets['Dog']
pets['Snake'] = 'Sssss'
print(keys)
dict_keys(['Cat', 'Bird', 'Snake'])
Since dict.keys
returns a dictionary view object, any changes made to the dictionary after you call the keys
method will be reflected in the dictionary view referenced by keys
immediately.
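If you want a copy of the keys that doesn't change, you can convert the view to a list when you create it. The sketch below contrasts the two. The names view and snapshot are only for illustration:

```python
pets = {
    'Cat': 'Meow',
    'Dog': 'Bark',
    'Bird': 'Tweet',
}

view = pets.keys()              # Live view of the dictionary's keys
snapshot = list(pets.keys())    # Independent copy made right now

del pets['Dog']
pets['Snake'] = 'Sssss'

print(view)        # dict_keys(['Cat', 'Bird', 'Snake'])
print(snapshot)    # ['Cat', 'Dog', 'Bird']
```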