Using Collections

In this chapter, we'll learn how to work with Python's collections. In particular, we'll look at indexing and key-based access, then explore some of the many operations most collections have in common.

Indexing

Indexing is the process of using a whole number to access and perhaps alter an element of a sequence. All sequences, including strings, support indexing. Python uses index 0 as the first element of all built-in sequences. To access the first element of a sequence assigned to variable seq, we use seq[0]. We use seq[1] to access the second element, seq[2] for the third, and so on, up to the last element:

seq = ('a', 'b', 'c')
print(seq[0])  # a (1st element)
print(seq[1])  # b (2nd element)
print(seq[2])  # c (3rd element)
print(seq[3])  # IndexError: tuple index out of range

In the above example, we used a tuple as our sequence. However, we could have used any other sequence, such as a list or string. It's worth noting that the index of the final element is one less than the sequence's length.

An error occurred when accessing index 3. The last element in a 3-element sequence has index 2, not 3. The len function can determine a sequence's length. You can use its return value to determine whether an index is out of range:

seq = ('a', 'b', 'c')
if len(seq) > 3:
    print(seq[3])

Note that we tested whether the length was greater than the intended index. If you want to access element 3, the sequence must have at least four elements.
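The length check can be wrapped in a small function. This is a minimal sketch: element_at is a hypothetical helper name (not a built-in), and it treats negative indexes as out of range to keep the check simple.

```python
def element_at(seq, index, default=None):
    """Return seq[index] if index is a valid non-negative index, else default.

    Hypothetical helper for illustration only.
    """
    # index must be at least 0 and less than the length to be in range
    if 0 <= index < len(seq):
        return seq[index]
    return default


seq = ('a', 'b', 'c')
print(element_at(seq, 2))         # c
print(element_at(seq, 3))         # None
print(element_at(seq, 3, 'N/A'))  # N/A
```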

What if we want to access the last element in a sequence? One way is to compute the index of the last element and then use that value:

seq = ('a', 'b', 'c')
last_index = len(seq) - 1
print(seq[last_index])        # c

You can also use negative indexes, which are often easier to work with:

seq = ('a', 'b', 'c')
print(seq[-1])  # c (last element)
print(seq[-2])  # b (next to last element)
print(seq[-3])  # a (3rd to last element)
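One way to think about negative indexes: seq[-k] refers to the same element as seq[len(seq) - k]. The short loop below prints both forms side by side for every valid k; an index of -4 or lower would be out of range for this 3-element tuple.

```python
seq = ('a', 'b', 'c')

# seq[-k] and seq[len(seq) - k] name the same element for k = 1..len(seq)
for k in range(1, len(seq) + 1):
    print(-k, seq[-k], len(seq) - k, seq[len(seq) - k])
# -1 c 2 c
# -2 b 1 b
# -3 a 0 a
```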

If you want to change the value of an element in a mutable sequence (e.g., a list), you can use indexing on the left side of an assignment:

seq = ['a', 'b', 'c']
seq[1] = 'B'
print(seq)      # ['a', 'B', 'c']

This operation mutates the list, but it merely reassigns seq[1]: the object that seq[1] previously referenced isn't mutated.
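To see both halves of that statement, the sketch below keeps a second reference to the list and a reference to the element that gets replaced. The names alias and inner are just illustrative.

```python
inner = ['x', 'y']
seq = ['a', inner, 'c']
alias = seq           # a second variable referencing the same list

seq[1] = 'B'          # reassign the element at index 1

print(alias)          # ['a', 'B', 'c'] (the list object itself was mutated)
print(inner)          # ['x', 'y'] (the object formerly at seq[1] is unchanged)
```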

Slicing

The indexing syntax also supports slicing. Slicing can extract (or modify) any number of consecutive elements at once. For instance, the syntax seq[start:stop] retrieves the elements of seq whose indexes run from start through stop - 1, inclusive. You can also use negative indexes in a slice. Finally, you can use the seq[start:stop:step] syntax to extract every "step-th" element; a negative step walks backward through the sequence.

seq = 'abcdefghi'
print(seq[3:7])       # defg
print(seq[-6:-2])     # defg
print(seq[2:8:2])     # ceg
print(repr(seq[3:3])) # ''
print(seq[:])         # abcdefghi
print(seq[::-1])      # ihgfedcba

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(seq[3:7])       # [4, 5, 6, 7]
print(seq[-6:-2])     # [5, 6, 7, 8]
print(seq[2:8:2])     # [3, 5, 7]
print(seq[3:3])       # []
print(seq[:])         # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(seq[::-1])      # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

seq = [[1, 2], [3, 4]]
seq_dup = seq[:]
print(seq[0] is seq_dup[0])   # True

The seq[3:3] examples show that you get an empty slice when the start and stop values are the same.

seq[:] returns a duplicate of the sequence. It is equivalent to seq[0:len(seq)]. This syntax creates a shallow copy, which we'll discuss later.

seq[::-1] returns a reversed copy of a sequence. For lists, seq[::-1] is similar to seq.reverse() (described later): the former returns a new list, while the latter mutates the original. (Strings and tuples have no reverse method.) seq.reverse() is much easier to read, but make sure the mutation is intentional.

The second group of examples demonstrates that slicing also works with other sequences (a list in this case).

The final example demonstrates that slicing creates a shallow copy: the new list contains references to the same nested objects as the original, so seq[0] and seq_dup[0] are the same list. This matters whenever the sequence contains mutable collections, such as lists. Custom objects, which we'll meet in our Object Oriented Programming with Python book, are also subject to shallow copying.
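Here's what a shallow copy means in practice. Mutating a nested list through the copy is visible in the original, but reassigning an element of the copy is not:

```python
seq = [[1, 2], [3, 4]]
seq_dup = seq[:]          # shallow copy: new outer list, same inner lists

seq_dup[0].append(99)     # mutate an inner list through the copy
print(seq)                # [[1, 2, 99], [3, 4]] (original sees the change)

seq_dup[1] = 'new'        # reassign an element of the copy
print(seq)                # [[1, 2, 99], [3, 4]] (original is unaffected)
print(seq_dup)            # [[1, 2, 99], 'new']
```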

Slicing also works as an assignment's target (the left side of an =). However, we'll refrain from demonstrating that as it often leads to code that is hard to read.

Key-Based Access

Indexing uses whole numbers and only works with sequences (including strings). Mappings, such as dicts, use a similar-looking syntax called key-based access. Keys are usually strings, though not always.

Instead of limiting ourselves to whole numbers, we can use any hashable object as a key. That includes most built-in immutable types, such as strings, numbers, frozen sets, and tuples whose elements are all hashable. We sometimes use integer keys, which, confusingly, look like indexing. Other key types are rarely used but can be helpful (especially tuples).

With dicts, we use key-based access like this:

my_dict = {
    'a': 'abc',
    37: 'def',
    (5, 6, 7): 'ghi',
    frozenset([1, 2]): 'jkl',
}

print(my_dict['a'])                # abc
print(my_dict[37])                 # def
print(my_dict[(5, 6, 7)])          # ghi
print(my_dict[frozenset([1, 2])])  # jkl
print(my_dict['nothing'])     # KeyError: 'nothing'

We've used a string, an integer, a tuple, and a frozen set as keys in this dict. We've also seen that we get a KeyError if we try to use a non-existent key. If there's a chance you might get a KeyError, consider using the dict.get method. It returns the value associated with a given key if the key exists. Otherwise, it returns a default value (None unless you specify a different default):

my_dict = {
    'a': 'abc',
    37: 'def',
    (5, 6, 7): 'ghi',
    frozenset([1, 2]): 'jkl',
}

print(my_dict.get('a'))                 # abc
print(my_dict.get('nothing'))           # None
print(my_dict.get('nothing', 'N/A'))    # N/A
print(my_dict.get('nothing', 100))      # 100

We can also use key-based access to the left of the = operator:

my_dict = {
    'a': 'abc',
    37: 'def',
    (5, 6, 7): 'ghi',
    frozenset([1, 2]): 'jkl',
}

my_dict['a'] = 'ABC'
my_dict[37] = 'DEF'
my_dict[(5, 6, 7)] = 'GHI'
my_dict[frozenset([1, 2])] = 'JKL'
print(my_dict)
# Pretty printed for clarity
# {
#     'a': 'ABC',
#     37: 'DEF',
#     (5, 6, 7): 'GHI',
#     frozenset({1, 2}): 'JKL'
# }

Can we assign new keys to a dict?

my_dict['xyz'] = 'Hey there!'
print(my_dict['xyz'])         # Hey there!

By all means, we can!
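dict.get and assigning to new keys combine into a common counting pattern. This is a minimal sketch; the word list is made-up sample data.

```python
words = ['apple', 'pear', 'apple', 'fig', 'pear', 'apple']

counts = {}
for word in words:
    # get returns 0 the first time we see a word; the assignment then
    # creates the key or updates its count
    counts[word] = counts.get(word, 0) + 1

print(counts)    # {'apple': 3, 'pear': 2, 'fig': 1}
```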

Can we use a mutable key?

my_dict[[1, 2, 3]] = 'Hey there!'
# TypeError: unhashable type: 'list'

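Nope. It doesn't work. Dictionary keys must be hashable, and mutable built-in collections such as lists, dicts, and sets aren't hashable. In practice, that means using immutable keys.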
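The built-in hash function makes the rule visible: it raises a TypeError for unhashable objects. That also explains why a tuple only works as a key when everything inside it is hashable. A short sketch:

```python
good_key = (1, 2, 3)
bad_key = (1, [2, 3])     # a tuple that contains a list

print(hash(good_key) == hash((1, 2, 3)))   # True: equal tuples hash equally

try:
    hash(bad_key)
except TypeError as error:
    print('TypeError:', error)   # TypeError: unhashable type: 'list'

my_dict = {good_key: 'ok'}       # a tuple of ints is a valid key
try:
    my_dict[bad_key] = 'nope'
except TypeError as error:
    print('TypeError:', error)   # TypeError: unhashable type: 'list'
```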

Common Collection Operations

Most Python collections support various operations, functions, and methods, some of which only apply to mutable collections. We'll examine the non-mutating operations first. We won't cover everything you can do. However, we'll see most of what you will encounter at Launch School.

Non-Mutating Operations for Collections

All of the operations in this section work equally well with mutable and immutable collections. They never modify the original collection. Instead, they return new objects.

Collection Membership

The in operator determines whether the object to the operator's left is in the iterable collection on the right. It returns True if the item on the left is in the collection on the right, False otherwise.

The not in operator is the inverse of in. It returns False if the object is in the collection; True if not.

With sequences and sets, these operators compare the object for equality against each collection element. For mappings (dicts), it checks whether the item is a key in the dictionary. For strings, it determines whether the right string contains the left string.

seq = [4, 'abcdef', (True, False, None)]
print(4 in seq)                         # True
print(4 not in seq)                     # False
print('abcdef' in seq)                  # True
print('abcdef' not in seq)              # False
print('cde' in seq[1])                  # True
print('cde' not in seq[1])              # False
print('acde' in seq[1])                 # False
print('acde' not in seq[1])             # True
print((True, False, None) in seq)       # True
print((True, False, None) not in seq)   # False
print(3.14 in seq)                      # False
print(3.14 not in seq)                  # True
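With dicts, remember that in only looks at the keys. To search the values or the key/value pairs, use the values and items methods, which we describe later in this chapter:

```python
people_phones = {
    'Chris': '111-2222',
    'Pete':  '333-4444',
}

print('Chris' in people_phones)               # True: 'Chris' is a key
print('111-2222' in people_phones)            # False: values aren't checked
print('111-2222' in people_phones.values())   # True: search the values
print(('Pete', '333-4444') in people_phones.items())   # True
```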

Minimum and Maximum Members

min and max return the minimum and maximum members of an iterable collection. The collection must not be empty, and every pair of its elements must be comparable with the < and > operators.

my_set1 = {1, 4, -9, 16, 25, -36, -63, -1}
my_set2 = {'1', '4', '-9', '16', '25', '-36', '-1'}

print(min(my_set1), max(my_set1))     # -63 25
print(min(my_set2), max(my_set2))     # -1 4

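As you can see, min and max know how to compare the members of our sets. The comparison depends on the element types: numbers compare numerically, while strings compare character by character. That's why '-1' is the minimum of my_set2 (the - character sorts before every digit) and '4' is the maximum ('4' comes after '1' and '2').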
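A few direct comparisons confirm the character-by-character rule for strings:

```python
print('4' > '25')      # True: first characters differ, and '4' > '2'
print(4 > 25)          # False: numeric comparison
print('-1' < '-36')    # True: first characters tie, then '1' < '3'
print('-' < '0')       # True: '-' sorts before the digits
```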

In most cases, you can't use min and max with heterogeneous collections:

>>> my_set = {1, 4, '-9', 16, '25', -36, -63, -1}
>>> min(my_set)
TypeError: '<' not supported between instances of 'str' and 'int'

>>> max(my_set)
TypeError: '>' not supported between instances of 'str' and 'int'

However, it is possible in some situations, such as when a collection mixes integers and floats, which compare numerically:

my_set = {1, 3.14, -2.71}
print(min(my_set), max(my_set))      # -2.71 3.14

You can also use min and max with multiple arguments instead of an iterable.

print(min(3, 5, -1), max(3, 5, -1))  # -1 5

Summation

The sum function works with iterable collections that consist entirely of numeric values. It computes and returns the sum of all the collection's numbers.

numbers = (1, 1, 2, 3, 5, 8, 13, 21, 34)
print(sum(numbers))                       # 88

Note that sum can't be used to concatenate strings: it raises a TypeError, even if you give it a string as its starting value. It's meant for numeric values. Use str.join if you want to concatenate strings.
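A short sketch contrasting the two. The separator string calls join and receives the list of strings as its argument:

```python
words = ['Four', 'score', 'and', 'seven']
print(' '.join(words))        # Four score and seven

try:
    sum(words, '')            # even a string start value is rejected
except TypeError as error:
    print('TypeError:', error)
```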

Locating Indices and Counting

Two helpful sequence methods are the index and count methods. seq.index returns the index of the first element in the sequence that matches a given object. It raises a ValueError exception if the object is not found. seq.count returns the number of times a value occurs in the sequence.

names = ['Karl', 'Grace', 'Clare', 'Victor',
         'Antonina', 'Allison', 'Trevor']
print(names.index('Clare'))   # 2
print(names.index('Trevor'))  # 6
print(names.index('Chris'))
# ValueError: 'Chris' is not in list
numbers = [1, 3, 6, 5, 4, 10, 1, 5, 4, 4, 5, 4]
print(numbers.count(1))       # 2
print(numbers.count(3))       # 1
print(numbers.count(4))       # 4
print(numbers.count(7))       # 0

index also works with strings. It searches for the first matching substring of a string:

names = 'Karl Grace Clare Victor Antonina Trevor'
print(names.index('Clare'))   # 11
print(names.index('Trevor'))  # 33
print(names.index('Chris'))
# ValueError: substring not found
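count works with strings, too. It counts non-overlapping occurrences of a substring:

```python
text = 'Mississippi'
print(text.count('s'))     # 4
print(text.count('ss'))    # 2
print(text.index('ss'))    # 2 (first occurrence)
print('aaaa'.count('aa'))  # 2: matches don't overlap
```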

Merging Collections

One of the most impressively helpful functions is zip, which works with all iterables. It lets you merge the members of multiple iterables into a single list-like sequence of tuples. zip makes it easy to iterate through many collections simultaneously.

zip iterates through 0 or more iterables in parallel and returns a list-like object of tuples. Each tuple contains a single object from each of the iterables. That's a mouthful, so let's take a look at an example:

iterable1 = [1, 2, 3]
iterable2 = ('Kim', 'Leslie', 'Bertie')
iterable3 = [None, True, False]

zipped_iterables = zip(iterable1, iterable2, iterable3)
print(list(zipped_iterables))
# Pretty printed for clarity
# [
#   (1, 'Kim', None),
#   (2, 'Leslie', True),
#   (3, 'Bertie', False)
# ]

Here, we've combined 3 iterables (two lists and a tuple) into a list-like object of tuples. Each tuple in the return value has 3 members. The first member is from iterable1, the second from iterable2, and the third from iterable3. The first tuple contains the first element of each of the iterables, the second tuple contains the second element of the iterables, and the third tuple contains the third element. Thus, our first tuple contains the first elements of iterable1 (1), iterable2 ('Kim'), and iterable3 (None).

Note that we referred to zip's return value as a list-like object. It's not a true list but a lazy object: like a range, it produces its values only when you ask for them. You must request the values explicitly, which you can do with a loop or a constructor such as list or tuple. That's why the example above calls list(zipped_iterables) inside print.

zip's iterable arguments are usually the same length but don't have to be. If you want to enforce identical lengths, add a strict=True keyword argument to the invocation (available in Python 3.10 and later):

zipped_iterables = zip(iterable1, iterable2, strict=True)

With strict=True, zip raises a ValueError if the iterables don't all have the same length.

So, what happens if the lengths differ and strict=True isn't given? In this case, zip stops after exhausting the shortest iterable:

result = zip(range(5, 10),    # length is 5
             range(1, 3),     # length is 2 (shortest)
             range(3, 7))     # length is 4
print(list(result)) # [(5, 1, 3), (6, 2, 4)]

Since range(1, 3) only has a length of 2 and strict=True was omitted, the result only has 2 tuples.

The zip function's canonical application is to simultaneously iterate over multiple collections. We'll demonstrate that in the next chapter.

It's worth noting that zip returns what is known as an iterator. We'll discuss iterators in more detail in the Core curriculum. However, one characteristic of iterators that is important to be aware of is that they can only be consumed once. After you iterate over an iterator, subsequent attempts to iterate over it produce nothing. For instance:

result = zip(range(5, 10),    # length is 5
             range(1, 3),     # length is 2 (shortest)
             range(3, 7))     # length is 4
print(list(result)) # [(5, 1, 3), (6, 2, 4)]
print(list(result)) # []

As you can see, once we've consumed the return value of zip, we can't do it again. Most Python functions and methods that return lazy sequences may only be consumed once; range objects are one of the few exceptions to this rule.
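If you need the zipped values more than once, one option is to save them in a list first. A range, by contrast, can be iterated again without any extra work:

```python
numbers = range(3)
print(list(numbers))   # [0, 1, 2]
print(list(numbers))   # [0, 1, 2]: a range can be iterated again

pairs = list(zip(range(5, 10), range(1, 3)))   # save the values in a list
print(pairs)           # [(5, 1), (6, 2)]
print(pairs)           # [(5, 1), (6, 2)]: a list can be reused freely
```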

Operations on Dictionaries

Python provides 3 methods to get lists of the keys, values, and key/value pairs from a dictionary. Those methods are dict.keys, dict.values, and dict.items. Let's see them in action:

people_phones = {
    'Chris': '111-2222',
    'Pete':  '333-4444',
    'Clare': '555-6666',
}

print(people_phones.keys())
# dict_keys(['Chris', 'Pete', 'Clare'])

print(people_phones.values())
# Pretty printed for clarity
# dict_values([
#     '111-2222',
#     '333-4444',
#     '555-6666'
# ])

print(people_phones.items())
# Pretty printed for clarity
# dict_items([
#     ('Chris', '111-2222'),
#     ('Pete',  '333-4444'),
#     ('Clare', '555-6666')
# ])

The values returned by these methods aren't ordinary lists. Python wraps the output of each one with dict_keys(), dict_values(), or dict_items() to show that they aren't regular lists. They are actually dictionary view objects that are tied to the dictionary. If you add a new key/value pair to the dictionary, remove an element, or update a value, the corresponding views reflect the change immediately:

people_phones = {
    'Chris': '111-2222',
    'Pete':  '333-4444',
    'Clare': '555-6666',
}

keys = people_phones.keys()
values = people_phones.values()

print(keys)    # dict_keys(['Chris', 'Pete', 'Clare'])
print(values)
# dict_values(['111-2222', '333-4444', '555-6666'])

people_phones['Max'] = '123-4567'
people_phones['Pete'] = '345-6789'
del people_phones['Chris']

print(keys)    # dict_keys(['Pete', 'Clare', 'Max'])
print(values)
# dict_values(['345-6789', '555-6666', '123-4567'])
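If you want a snapshot that doesn't change along with the dictionary, convert the view to a list. A minimal sketch:

```python
people_phones = {'Chris': '111-2222', 'Pete': '333-4444'}

keys_view = people_phones.keys()            # live view of the keys
keys_snapshot = list(people_phones.keys())  # independent list copy

people_phones['Max'] = '123-4567'

print(keys_view)       # dict_keys(['Chris', 'Pete', 'Max'])
print(keys_snapshot)   # ['Chris', 'Pete']
```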

Operations for Mutable Sequences

Let's see some of the mutating operations we can use with mutable sequences, the most common of which are lists. All methods in this section mutate the collection used to call them.

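Keep in mind that most of these methods also work with other mutable sequences, such as deques (collections.deque) and arrays (array.array), both of which are available in Python's standard library. Some details differ, though; for example, deque.pop doesn't accept an index.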

Adding Elements to Mutable Sequences

We can use the append, insert, and extend methods to add new elements to a mutable sequence, such as a list.

  • seq.append appends a single object to the end of a mutable sequence, such as a list:

    numbers = [1, 2]
    
    numbers.append(10)      # Append the number 10
    print(numbers)          # [1, 2, 10]
    
  • seq.insert inserts an object into a mutable sequence before the element at a given index. If the given index is greater than or equal to the sequence's length, the object is appended to the sequence. If the index is negative, it counts from the end of the sequence.

    numbers = [1, 2]
    
    numbers.insert(0, 8)    # Insert 8 before numbers[0]
    print(numbers)          # [8, 1, 2]
    numbers.insert(2, 6)    # Insert 6 before numbers[2]
    print(numbers)          # [8, 1, 6, 2]
    numbers.insert(100, 55) # Insert 55 before numbers[100]
    print(numbers)          # [8, 1, 6, 2, 55]
    numbers.insert(-3, 33)  # Insert 33 before the 3rd element
                            # from the end.
    print(numbers)          # [8, 1, 33, 6, 2, 55]
    
  • seq.extend appends each element of an iterable to the end of the calling mutable sequence.

    numbers = [1, 2]
    
    numbers.extend([7, 8])  # Append 7 and 8 to numbers
    print(numbers)          # [1, 2, 7, 8]
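The difference between append and extend is easy to miss when the argument is itself a list. append adds the whole object as one element, while extend adds each of its elements:

```python
nested = [1, 2]
nested.append([7, 8])      # append adds the list as ONE element
print(nested)              # [1, 2, [7, 8]]

flat = [1, 2]
flat.extend([7, 8])        # extend adds each element of the iterable
print(flat)                # [1, 2, 7, 8]

letters = ['a']
letters.extend('bc')       # any iterable works, including a string
print(letters)             # ['a', 'b', 'c']
```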
    

Removing Elements from Mutable Sequences

We can use the remove, pop, and clear methods to remove elements from a mutable sequence:

  • seq.remove searches a sequence for a specific object and removes the first occurrence of that object. It raises a ValueError if there is no such object.

    my_list = [2, 4, 6, 8, 10]
    
    my_list.remove(8)
    print(my_list)            # [2, 4, 6, 10]
    
    my_list.remove(8)
    # ValueError: list.remove(x): x not in list
    
  • seq.pop removes and returns an indexed element from a mutable sequence. If no index is given, it removes and returns the last element. It raises an IndexError if the index is out of range or the sequence is empty. This index-based pop only works with mutable sequences; dicts and sets have their own pop methods that work differently.

    my_list = [2, 4, 6, 8, 10]
    
    print(my_list.pop(1))         # 4
    print(my_list)                # [2, 6, 8, 10]
    
    print(my_list.pop())          # 10
    print(my_list)                # [2, 6, 8]
    
    print(my_list.pop(4))
    # IndexError: pop index out of range
    
  • seq.clear removes all elements from a sequence, leaving it empty.

    my_list = [2, 4, 6, 8, 10]
    
    my_list.clear()
    print(my_list)                # []
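These methods aren't limited to lists. Here's a sketch using collections.deque from the standard library. Note one difference from lists: deque.pop takes no index and always removes from the right end.

```python
from collections import deque

queue = deque([1, 2])
queue.append(10)        # deque([1, 2, 10])
queue.insert(0, 8)      # deque([8, 1, 2, 10])
queue.extend([7])       # deque([8, 1, 2, 10, 7])
queue.remove(2)         # deque([8, 1, 10, 7])

popped = queue.pop()    # removes from the right end
print(popped)           # 7
remaining = list(queue)
print(queue)            # deque([8, 1, 10])

queue.clear()
print(queue)            # deque([])
```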
    

Sorting Collections

You can use the sorted function to create a sorted list from any iterable collection, mutable or immutable. It creates and returns a sorted list from the elements in the collection. The original collection is unchanged.

You can also sort lists using the list.sort method. However, this method mutates the list. It's also worth noting that my_list.sort() is a bit faster and less memory-intensive than sorted(my_list): since the method sorts in place, it doesn't have to build a completely new list.

names = ('Grace', 'Clare', 'Allison', 'Trevor')
print(sorted(names))
# ['Allison', 'Clare', 'Grace', 'Trevor']

print(names)
# ('Grace', 'Clare', 'Allison', 'Trevor')

names = list(names)
print(names)
# ['Grace', 'Clare', 'Allison', 'Trevor']

print(names.sort())   # None
print(names)
# ['Allison', 'Clare', 'Grace', 'Trevor']

In the sorted example, we've sorted and printed the tuple of names. Note that the result is a list, not a tuple. We've then shown that the original names tuple is unchanged.

Next, we converted the names tuple to a list and sorted it with names.sort. Note that sort returned None, which strongly hints that the collection was mutated. When we print the names, we see that the list was mutated.

By default, both sort and sorted do an ascending sort using the < operator to compare elements from the collection. You can reverse the sort by adding a reverse=True keyword argument to the argument list:

names = ['Grace', 'Clare', 'Allison', 'Trevor']
print(sorted(names, reverse=True))
# ['Trevor', 'Grace', 'Clare', 'Allison']

names.sort(reverse=True)
print(names) # ['Trevor', 'Grace', 'Clare', 'Allison']

You can also pass a key=func keyword argument to tell sort or sorted how to determine what values it should sort. For instance, if you want to perform a case-insensitive sort on a list of strings, you can specify key=str.casefold:

words = ['abc', 'DEF', 'xyz', '123']
print(sorted(words))
# ['123', 'DEF', 'abc', 'xyz']

print(sorted(words, key=str.casefold))
# ['123', 'abc', 'DEF', 'xyz']

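In most cases, you can also use str.lower instead of str.casefold. However, str.casefold is considered best practice for case-insensitive comparisons, which is exactly what sorting does. It's more aggressive than str.lower and handles characters such as the German ß.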
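Here's a case where the two differ. str.lower leaves ß alone, while str.casefold maps it to ss, so only casefold treats these two spellings as equal:

```python
print('Straße'.lower())                             # straße
print('Straße'.casefold())                          # strasse
print('Straße'.lower() == 'STRASSE'.lower())        # False
print('Straße'.casefold() == 'STRASSE'.casefold())  # True
```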

You can also sort a list of numeric-valued strings by passing key=int to the function or method:

numbers = ['1', '5', '100', '15', '534', '53']
numbers.sort()
print(numbers)   # ['1', '100', '15', '5', '53', '534']

numbers.sort(key=int)
print(numbers)   # ['1', '5', '15', '53', '100', '534']

This is probably your first time seeing that Python's functions and methods are objects that can be passed around as arguments to a function. (They can also be passed around as return values.) Remember this moment. The technique is potent.
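A key can be any function that takes one element and returns the value to sort by, including one you write yourself. In this sketch, length_then_name is a hypothetical helper; it returns a tuple so that names of equal length are ordered alphabetically.

```python
def length_then_name(name):
    # Hypothetical key function: sort by length, break ties alphabetically
    return (len(name), name)

names = ['Grace', 'Clare', 'Allison', 'Trevor']

sort_key = len                   # a function is an object we can assign
print(sorted(names, key=sort_key))
# ['Grace', 'Clare', 'Trevor', 'Allison']

print(sorted(names, key=length_then_name))
# ['Clare', 'Grace', 'Trevor', 'Allison']
```

With key=len alone, Grace and Clare tie, and sorted keeps them in their original order.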

If you use sorted on a dictionary, it returns a sorted list of the dictionary's keys.
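To sort the values or the key/value pairs instead, pass the corresponding view to sorted. Pairs are tuples, so they sort by key first. The scores dict is made-up sample data.

```python
scores = {'Trevor': 3, 'Allison': 7, 'Clare': 5}

print(sorted(scores))            # ['Allison', 'Clare', 'Trevor']
print(sorted(scores.values()))   # [3, 5, 7]
print(sorted(scores.items()))
# [('Allison', 7), ('Clare', 5), ('Trevor', 3)]
```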

Reversing Sequences and Dictionaries

You can use the reversed function to reverse the order of elements in a sequence or dictionary (dictionaries support reversed in Python 3.8 and later). The returned value is a lazy sequence that contains the elements of the sequence or the keys of the dictionary. Since the result is lazy, you need to iterate over it or expand it with a function like list or tuple.

You can also reverse lists using the list.reverse method. However, this method mutates the list. In general, you should use list.reverse when you really need to reverse the list's contents, and don't need to preserve the original order. You should use reversed when all you need to do is iterate over the list in reverse. Don't use reversed if you eventually want to convert the result to a non-lazy sequence such as a list or tuple:

names = ('Grace', 'Clare', 'Allison', 'Trevor')
reversed_names = reversed(names)
print(reversed_names)
# <reversed object at 0x102848e50>
print(tuple(reversed(names))) # Requires extra memory
# ('Trevor', 'Allison', 'Clare', 'Grace')
print(names)
# ('Grace', 'Clare', 'Allison', 'Trevor')

names = list(names)
print(names.reverse())   # None
print(names)
# ['Trevor', 'Allison', 'Clare', 'Grace']

my_dict = {'abc': 1, 'xyz': 23, 'pqr': 0, 'jkl': 5}
reversed_dict = reversed(my_dict)
print(reversed_dict)
# <dict_reversekeyiterator object at 0x100d19f80>

print(list(reversed_dict))    # Requires extra memory
# ['jkl', 'pqr', 'xyz', 'abc']

In the reversed example using a tuple, we've reversed and printed the tuple of names. We've then shown that the original names tuple is unchanged.

Next, we converted the names tuple to a list and reversed it with names.reverse. Note that reverse returned None, which strongly hints that the collection was mutated. When we print the names, we see that the list was mutated.

Finally, we demonstrated using reversed with a dictionary.

Think of the reversed function as a looping aid. You sometimes want to iterate over a collection in reverse. reversed makes that easy:

names = ('Grace', 'Clare', 'Allison', 'Trevor')
for name in reversed(names):
    print(name)
# Trevor
# Allison
# Clare
# Grace

String Operations

As we've seen, Python has many built-in functions for manipulating sequences and other collections. You can use many of these functions with strings, even though strings are more specialized than other collections: every element of a string is itself a one-character string.

The str class also offers a veritable goldmine of methods for using and manipulating strings in myriad ways. Let's take a look at some of these. We cannot cover them all, but we'll look at some of the most useful. We'll skip over anything we've already discussed in any detail.

Letter Case

We've already seen how the str.lower and str.upper methods work. Let's check out two related methods:

  • str.capitalize returns a copy of str with the first character capitalized and the remaining characters converted to lowercase.

    print("what's up?".capitalize())        # What's up?
    print('456ABC'.capitalize())            # 456abc
    
  • str.title returns a copy of str with every word in the string capitalized. The remaining characters are converted to lowercase.

    print("four SCORE and sEvEn".title())
    # Four Score And Seven
    

    str.title's idea of what constitutes a word can lead to unexpected results. It treats any run of consecutive letters as a word, so apostrophes, hyphens, and other non-letter characters act as word boundaries:

    print("i can't believe it's already mid-july.".title())
    # I Can'T Believe It'S Already Mid-July.
    

    If you only want to break at whitespace, you can use the capwords function from the string module:

    import string
    print(string.capwords("i can't believe it's already mid-july."))
    # I Can't Believe It's Already Mid-july.
    
  • str.swapcase returns a copy of str with every uppercase letter converted to lowercase, and vice versa.

    print("What's up?".swapcase())          # wHAT'S UP?
    print('456ABC'.swapcase())              # 456abc
    print('456ABC'.swapcase().swapcase())   # 456ABC
    

    In the last example, the inner call to swapcase returns the original string with the case swapped; the outer call swaps the case on that return value.

    Note that there are situations where str.swapcase().swapcase() does not return the original value of str. For instance:

    print('Straße'.swapcase().swapcase())   # Strasse
    

    In this case, Python's uppercase form of the German eszett character (ß) is the two-character string SS, so the first swapcase produces sTRASSE. Swapping the case again yields Strasse instead of Straße.
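Because str.title treats any run of consecutive letters as a word, digits act as word boundaries too. A quick comparison with string.capwords, which only breaks at whitespace:

```python
import string

print('1st place'.title())             # 1St Place
print(string.capwords('1st place'))    # 1st Place
```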

Character Classification

Python has a generous suite of methods that test what sort of characters are present in a string. We'll look at just a few of these; there are many more.

  • str.isalpha() returns True if all characters of str are alphabetic, False otherwise. It returns False if the string is empty.

    'Hello'.isalpha()      # True
    'Good-bye'.isalpha()   # False: `-` is not a letter
    'Four score'.isalpha() # False: space is not a letter
    ''.isalpha()           # False
    
  • str.isdigit() returns True if all characters of str are digits, False otherwise. It returns False if the string is empty.

    '12340'.isdigit()      # True
    '123.4'.isdigit()      # False: `.` is not a digit
    '-1234'.isdigit()      # False: `-` is not a digit
    ''.isdigit()           # False
    
  • str.isalnum() returns True if str is composed entirely of letters and/or digits, False otherwise. It returns False if the string is empty.

  • str.islower() returns True if all cased characters in str are lowercase letters, False otherwise. It returns False if the string contains no cased characters.

  • str.isupper() returns True if all cased characters in str are uppercase, False otherwise. It returns False if the string contains no cased characters.

  • str.isspace() returns True if all characters in str are whitespace characters, False otherwise. It returns False if the string is empty. The whitespace characters include ordinary spaces (' '), tabs (\t), newlines (\n), and carriage returns (\r). They also include two rarely used characters, vertical tabs (\v) and form feeds (\f), as well as some other Unicode characters, such as the non-breaking space.

Be careful with these methods: they're all Unicode-aware. Thus, 'Καλωσήρθες'.isalpha() returns True since the characters are all part of the Greek alphabet. If you need to exclude non-ASCII characters, use this pattern:

text.isalpha() and text.isascii()
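Here are the methods described above applied to a few sample strings, along with the ASCII-only pattern (str.isascii requires Python 3.7 or later):

```python
print('abc123'.isalnum())          # True
print('abc 123'.isalnum())         # False: space is neither letter nor digit
print('hello, world!'.islower())   # True: non-cased characters are ignored
print('123'.islower())             # False: no cased characters at all
print('ABC 123'.isupper())         # True
print(' \t\n'.isspace())           # True
print(''.isspace())                # False: empty string

greek = 'Καλωσήρθες'
print(greek.isalpha())                          # True: Greek letters count
print(greek.isalpha() and greek.isascii())      # False: not ASCII
print('Hello'.isalpha() and 'Hello'.isascii())  # True
```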

Stripping Characters

The str.strip method returns a copy of str with all leading and trailing whitespace characters (see str.isspace above) removed. Python programmers often use this method to strip unwanted whitespace from input data, such as keyboard input. Unwanted whitespace frequently makes input data harder to work with, so immediately removing excess whitespace is a good idea.

text = input('Enter some text: ').strip()

In some of the examples in this section, we use the built-in repr function when printing strings. repr formats strings with quotes and clearly shows the presence of whitespace characters:

text = ' \t  abc def    \n\r'
print(repr(text))             # ' \t  abc def    \n\r'
print(repr(text.strip()))     # 'abc def'

You can also tell strip to remove other characters by providing a string argument. The characters inside this string are the ones you want removed.

text = ' \t  abc def    \n\r'
print(repr(text.strip('abc'))) # ' \t  abc def    \n\r'

text = 'aaabaacccabxyzabccba'
print(text.strip('a'))         # baacccabxyzabccb
print(text.strip('ab'))        # cccabxyzabcc
print(text.strip('ba'))        # cccabxyzabcc
print(text.strip('abc'))       # xyz
print(text.strip('bc'))        # aaabaacccabxyzabccba

print(repr(text.strip('abcxyz'))) # ''

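There are two things to note about the above examples:

  • The strip('abc') call on the whitespace-padded string and the strip('bc') call show that only leading and trailing characters that match the argument are removed. In both cases, the first and last characters aren't in the argument, so nothing is removed.
  • The strip('ab'), strip('ba'), and strip('abc') calls show that strip removes individual characters, not substrings. That is, the order of the characters in the argument doesn't matter. Thus, text.strip('abc') removes all of the leading and trailing 'a', 'b', and 'c' characters, leaving nothing but 'xyz'.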

The str.lstrip method is identical to str.strip except it only removes leading characters (the leftmost). Similarly, str.rstrip removes trailing characters (the rightmost).

text = 'aaabaacccabxyzabccba'

print(text.lstrip('a'))       # baacccabxyzabccba
print(text.rstrip('a'))       # aaabaacccabxyzabccb

print(text.lstrip('ab'))      # cccabxyzabccba
print(text.rstrip('ab'))      # aaabaacccabxyzabcc

print(text.lstrip('ba'))      # cccabxyzabccba
print(text.rstrip('ba'))      # aaabaacccabxyzabcc

print(text.lstrip('abc'))     # xyzabccba
print(text.rstrip('abc'))     # aaabaacccabxyz
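Because these methods work with sets of characters, they can remove more than you expect when you really mean a specific substring. For that case, Python 3.9 added str.removeprefix and str.removesuffix, which remove an exact substring at most once. The filename below is a made-up example.

```python
filename = 'report.txt'
print(filename.rstrip('.txt'))        # repor: strips any '.', 't', or 'x'
print(filename.removesuffix('.txt'))  # report

text = 'abcabcxyz'
print(text.lstrip('abc'))             # xyz
print(text.removeprefix('abc'))       # abcxyz
```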

startswith and endswith

Two related methods are available for determining whether a string begins or ends with a given substring. Let's examine these methods:

  • str.startswith returns True if the string given by str begins with a certain substring, False if it does not:

    'Four score and seven'.startswith('Four score')  # True
    'Four score and seven'.startswith('For score')   # False
    'Four score and seven'.startswith('score')       # False
    

    The argument can also be a tuple of strings:

    'abc def'.startswith(('abc', 'xyz', 'stu'))  # True
    'def ghu'.startswith(('abc', 'xyz', 'stu'))  # False
    'xyz uvw'.startswith(('abc', 'xyz', 'stu'))  # True
    'stu vwx'.startswith(('abc', 'xyz', 'stu'))  # True
    

    The method also accepts "start" and "end" indexes to control where the search begins and ends:

    'abc def'.startswith('def', 4)           # True
    'abc def ghi'.startswith('def', 4, 7)    # True
    
  • str.endswith returns True if the string given by str ends with a certain substring, False if it does not:

    'Four score and seven'.endswith('and seven')  # True
    'Four score and seven'.endswith('ad seven')   # False
    'Four score and seven'.endswith('score')      # False
    

    As with startswith, the argument can be a tuple of strings. You can also supply "start" and "end" indexes:

    'abc def'.endswith(('abc', 'xyz', 'stu'))  # False
    'abc def'.endswith(('xyz', 'def'))         # True
    'abc def'.endswith('def', 4)               # True
    'abc def ghi'.endswith('def', 4, 7)        # True
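The tuple form is handy for checks like file extensions. In this sketch the filenames are made-up sample data; lowercasing first lets '.JPG' match '.jpg'.

```python
filenames = ['report.pdf', 'photo.JPG', 'notes.txt', 'image.png']

images = []
for name in filenames:
    # endswith accepts a tuple: True if the name ends with any of them
    if name.lower().endswith(('.jpg', '.png')):
        images.append(name)

print(images)    # ['photo.JPG', 'image.png']
```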
    

Splitting and Joining Strings

The str.split method returns a list of the words in the string str. By default, split splits the string at sequences of one or more whitespace characters:

text = '  Four     score and   seven years ago.   '
print(text.split())
# ['Four', 'score', 'and', 'seven', 'years', 'ago.']

print('no-spaces'.split()) # ['no-spaces']

If you want to split on something other than whitespace, you can tell Python what character or string should act as a delimiter:

text = ',Four,score,and,,,seven,years,ago,'
print(text.split(','))
# Pretty printed for clarity
# [
#     '',
#     'Four',
#     'score',
#     'and',
#     '',
#     '',
#     'seven',
#     'years',
#     'ago',
#     ''
# ]

Note that specifying a delimiter changes the splitting behavior. Instead of looking for runs of whitespace, it splits the string at every occurrence of the delimiter. This also applies when using a literal space character as the delimiter. Compare the output of the code below to our first example:

text = '  Four     score and   seven years ago.   '
print(text.split(' '))
# Partially pretty printed for clarity
# ['', '', 'Four', '', '', '', '', 'score', 'and',
#  '', '', 'seven', 'years', 'ago.', '', '', '']

split also recognizes multi-character delimiters:

text = 'Four<>score<:>and<>seven<>years<>ago'
print(text.split('<>'))
# ['Four', 'score<:>and', 'seven', 'years', 'ago']

Multi-character delimiters must match exactly. Thus, <:> isn't treated as a delimiter even though it contains both < and >.

Most major languages have a split function or method. It's worth noting that Python's differs from many others in that you can't use split to break a string into individual characters: Python's str.split method raises a ValueError if you pass '' as the separator. If you need to split a string into characters, use the list or tuple function:

text = 'abcde'
print(list(text))             # ['a', 'b', 'c', 'd', 'e']
print(tuple(text))            # ('a', 'b', 'c', 'd', 'e')
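Here's a small sketch of what happens if you try an empty separator anyway. We catch the exception and record that it happened, then fall back to list.

```python
text = 'abcde'

# split('') is rejected: an empty separator raises ValueError
try:
    text.split('')
    split_failed = False
except ValueError as error:
    split_failed = True
    print(f'ValueError: {error}')

# list() is the usual way to break a string into characters
chars = list(text)
print(chars)                  # ['a', 'b', 'c', 'd', 'e']
```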

You can also iterate over a string one character at a time:

text = 'abcde'
for char in text:
    print(char)

str.splitlines returns a list of the lines in the string str. splitlines recognizes line endings such as \n (line feed), \r (carriage return), \r\n (carriage return followed by line feed, the Windows convention), and a variety of other line boundaries. See the documentation for a complete list.

text = '''
You were lucky to have a room. We used to have to
live in a corridor.
Oh we used to dream of livin' in a corridor!
Woulda' been a palace to us. We used to live in an
old water tank on a rubbish tip. We got woken up
every morning by having a load of rotting fish
dumped all over us.
'''

print(text.strip().splitlines())
# Pretty printed for clarity
# [
#     "You were lucky to have a room. We used to have to",
#     "live in a corridor.",
#     "Oh we used to dream of livin' in a corridor!",
#     "Woulda' been a palace to us. We used to live in an",
#     "old water tank on a rubbish tip. We got woken up",
#     "every morning by having a load of rotting fish",
#     "dumped all over us."
# ]
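The difference between \r\n (one Windows-style line ending) and \n\r (two separate line boundaries) shows up when you call splitlines directly. This sketch also compares splitlines with split('\n'), which leaves the \r behind.

```python
windows_text = 'first\r\nsecond'
print(windows_text.splitlines())    # ['first', 'second']
print(windows_text.split('\n'))     # ['first\r', 'second']

# \n followed by \r counts as two line boundaries, so an empty line appears
odd_text = 'first\n\rsecond'
print(odd_text.splitlines())        # ['first', '', 'second']
```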

The final method we'll discuss in this cluster is str.join, which concatenates all the strings in an iterable collection into a single string. Each string from the collection gets concatenated to the previous string with the value of str between them:

words = ['You', 'were', 'lucky']
print(''.join(words))         # Youwerelucky
print(' '.join(words))        # You were lucky
print(','.join(words))        # You,were,lucky
print('\n  '.join(words))
# You
#   were
#   lucky
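join works only with strings. If the collection holds numbers or other objects, join raises a TypeError. A common workaround, sketched here, is to convert each element with str first.

```python
numbers = [1, 2, 3]

# join raises TypeError here because the elements are ints, not strings
try:
    ', '.join(numbers)
except TypeError as error:
    print(f'TypeError: {error}')

# Converting each element with str first makes join work
joined = ', '.join(str(number) for number in numbers)
print(joined)                 # 1, 2, 3
```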

Finding Substrings

We often need to search through strings looking for a particular substring. Python provides several ways to do this, including the in and not in operators we looked at earlier.

Here, we'll look at the str.find and str.rfind methods. str.find searches through str from left to right looking for the first occurrence of the argument. str.rfind does the same, but it searches from right to left (that is, in reverse). Both methods return the index where the matching substring begins; for rfind, that's the rightmost match. If there's no match, they return -1.

school = 'launch school'

print(school.find(' '))       # 6
print(school.find('l'))       # 0
print(school.find('h'))       # 5
print(school.find('hoo'))     # 9
print(school.find('x'))       # -1
print(school.find('N'))       # -1

print(school.rfind(' '))      # 6
print(school.rfind('l'))      # 12
print(school.rfind('h'))      # 9
print(school.rfind('hoo'))    # 9
print(school.rfind('oh'))     # -1
print(school.rfind('N'))      # -1

Note that both find and rfind are case sensitive, so 'N' doesn't match 'n'.

You can also search slices by adding start and end arguments to the invocation:

text = 'abc abc def abc'

print(text.find(' ', 4))         # 7
print(text.find(' ', 8))         # 11

print(text.find('c', 0, 2))      # -1
print(text.rfind('c', 3, 10))    # 6
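Because find accepts a starting index, you can call it repeatedly to locate every occurrence of a substring, starting each new search one character past the previous match. This is only a sketch; find_all is a hypothetical helper, not a built-in method.

```python
def find_all(text, target):
    """Return a list of the indexes where target starts in text."""
    positions = []
    index = text.find(target)
    while index != -1:
        positions.append(index)
        # Resume the search one character past the last match
        index = text.find(target, index + 1)
    return positions

text = 'abc abc def abc'
print(find_all(text, 'abc'))   # [0, 4, 12]
print(find_all(text, 'xyz'))   # []
```

Resuming at index + 1 (rather than after the whole match) means overlapping matches are found too; for example, find_all('aaa', 'aa') returns [0, 1].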

Using find or rfind to search slices is not the same as using the [start:stop] syntax first and then searching the result. Compare the outputs for the two find invocations below:

text = 'abc abc def abc'

print(text[3:10].find('c'))     # 3
print(text.find('c', 3, 10))    # 6

If you slice a string first and then call find or rfind on the result, the method returns an index relative to that slice. When you pass start and end arguments instead, the method searches the same characters but returns an index relative to the original string. Another difference is that taking a slice first creates a new string, while passing start and end arguments does not.
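If you've searched a slice and need the position in the original string, you can add the slice's start value to the result. Check for -1 first, since adding to a failed result would produce a misleading index. A minimal sketch:

```python
text = 'abc abc def abc'
start, stop = 3, 10

index_in_slice = text[start:stop].find('c')   # 3 (relative to the slice)
if index_in_slice != -1:
    index_in_text = index_in_slice + start     # 6 (relative to text)
    print(index_in_text == text.find('c', start, stop))   # True
```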

Nested Collections

Collections can be nested inside other collections. For instance, you can have a list that contains a dict, a set, a tuple, and another list. Each of those can, in turn, also contain nested collections:

nested_list = [
    {'foo': 42, 'bar': [1, 2, 3], 'qux': None},
    {
        'Kim',
        ('Leslie', 'Les'),
        ('Pete', 'Peter'),
        ('Jonathan', 'Jon', 'Jack'),
    },
    (4, 5, (1, 2, 3), 6, 7),
    ['a', 'b', 'cde', -3.141592],
]

There are, however, some limitations on what you can nest in certain collections. In particular, set members must be hashable, and Python's built-in mutable collections aren't. Thus, you can't nest a mutable collection such as a list, dictionary, or another set inside a set:

>>> my_set = {1, 2, 3, [4, 5]}
TypeError: unhashable type: 'list'

>>> my_set = {1, 2, 3, {4, 5}}
TypeError: unhashable type: 'set'

However, you can nest a frozen set inside a set or frozen set:

>>> my_set = {1, 2, 3, frozenset([4, 5])}
>>> my_set
{frozenset({4, 5}), 1, 2, 3}

Curiously, you can nest mutable collections inside tuples even though tuples are immutable:

>>> my_tuple = (1, 2, 3, [4, 5], {6, 7}, {'x': 'a dict'})
>>> my_tuple
(1, 2, 3, [4, 5], {6, 7}, {'x': 'a dict'})
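A tuple's immutability only means you can't change which objects it holds. A list inside the tuple can still be mutated in place. And because that list is unhashable, so is the tuple that contains it. A short sketch:

```python
my_tuple = (1, 2, 3, [4, 5])

# Mutating the nested list is allowed...
my_tuple[3].append(6)
print(my_tuple)                  # (1, 2, 3, [4, 5, 6])

# ...but replacing one of the tuple's elements is not
try:
    my_tuple[3] = [7, 8]
except TypeError as error:
    print(f'TypeError: {error}')

# A tuple containing a list can't be hashed, so it can't be a set member
try:
    {my_tuple}
except TypeError as error:
    print(f'TypeError: {error}')
```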

In most cases, nested collections have some sense of structure and reason for the nesting. For instance, we might define a deck of cards as a list of dictionaries:

deck = [
    { 'suit': 'Clubs', 'value': '2' },
    { 'suit': 'Clubs', 'value': '3' },
    { 'suit': 'Clubs', 'value': '4' },
    # ...
    { 'suit': 'Spades', 'value': 'Queen' },
    { 'suit': 'Spades', 'value': 'King' },
    { 'suit': 'Spades', 'value': 'Ace' },
]

If we want to print the fiftieth card in the deck, we can write:

print(f"{deck[49]['value']} of {deck[49]['suit']}")
# Queen of Spades
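Typing all 52 dictionaries by hand is tedious, so in practice you'd likely build the deck with a comprehension. The sketch below assumes the suits come in the order Clubs, Diamonds, Hearts, Spades (the listing above shows only Clubs first and Spades last), and that values run from 2 up to Ace. Under those assumptions, index 49 is the Queen of Spades.

```python
# Assumed suit order: the listing above shows only the first and last suits
suits = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
values = [str(number) for number in range(2, 11)] + ['Jack', 'Queen', 'King', 'Ace']

# One dict per card, grouped by suit
deck = [{'suit': suit, 'value': value} for suit in suits for value in values]

print(len(deck))                                     # 52
print(f"{deck[49]['value']} of {deck[49]['suit']}")  # Queen of Spades
```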

You can also have several layers of nesting in a sequence. In that case, you can access any item in any nested part of the sequence by chaining several [] indexing operations together:

nested_seq = [
    [1, 2, [3, 4, (5, 6, 7, 8)]],
    [9, [10, (11,)]],
    [12, 13, [14, 15, (16, 17)]],
    [18, [19, 20, (21, 22)]],
]

print(nested_seq[1])          # [9, [10, (11,)]]
print(nested_seq[3][0])       # 18
print(nested_seq[0][2][2])    # (5, 6, 7, 8)
print(nested_seq[2][2][2][1]) # 17

Nested collections become really "fun" when you start having to iterate through them.

Comparing Collections

As you might expect, Python supports comparison operations for collections. It provides comparison mechanisms for all built-in iterable collections, and you can define comparisons for any custom iterables you create.

Equality is the most straightforward comparison. If two iterables meet all of the following requirements, they are equal. Otherwise, they are unequal.

  • They have the same type (list, tuple, set, and so on). Sets and frozen sets are considered the same type for comparison purposes.
  • They have the same number of elements.
  • For sequences, each pair of corresponding elements compares as equal.
  • For sets, each set has the same members (order doesn't matter).
  • For mappings, each key/value pair must be present in both mappings, with values that compare as equal (order doesn't matter).

print([2, 3] == [2, 3])    # True
print([2, 3] == [3, 2])    # False (different order)
print([2, 3] == [2])       # False (different lengths)
print([2, 3] == (2, 3))    # False (different types)
print({2, 3} == {3, 2})    # True (same members)

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 2, 'a': 1}
dict3 = {'a': 1, 'b': 2, 'c': 3}

print(dict1 == dict2)      # True (same pairs)
print(dict1 == dict3)      # False

Of course, you can also use != to compare for inequality.
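A few of the rules above in action: a set and a frozen set with the same members compare equal, dictionary values are compared with == (so 1 and 1.0 match), and != gives the opposite result of ==.

```python
print({1, 2} == frozenset({2, 1}))   # True (set vs. frozen set)
print({'a': 1} == {'a': 1.0})        # True (1 == 1.0)
print([2, 3] != [3, 2])              # True (different order)
print((2, 3) != (2, 3))              # False
```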

You can also compare sequences for <, <=, >, and >=. However, we won't go into that here. It's rarely needed.

Summary

Python's collections have a wealth of functions, methods, and other operations that programmers need daily. Sometimes, the most challenging problem is deciding whether you need a function, a method, or an operator.

Exercises

  1. Write Python code to print the seventh number of range(0, 25, 3).

    Solution

    my_range = range(0, 25, 3)
    print(my_range[6])                      # 18
    
    print(range(0, 25, 3)[6])               # 18
    

  2. Use slicing to write Python code to print a 6-character substring of 'Launch School' that begins with the first c.

    Solution

    my_str = 'Launch School'
    print(my_str[4:10])                   # ch Sch
    

    The first c occurs at index 4, so that's our start value for the slice. Since we want 6 characters, the stop value is at index 4 + 6 or 10. Note that the character at index 10 is not included in the result.

    If you want to determine the location of the substring programmatically, you can do this:

    my_str = 'Launch School'
    start = my_str.find('c')
    print(my_str[start:start + 6])        # ch Sch
    

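    Some error checking might be advisable there. If my_str contains no c, find returns -1, and my_str[-1:5] silently produces an empty string rather than an error.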

  3. Write Python code to create a new tuple from (1, 2, 3, 4, 5). The new tuple should be in reverse order from the original. It should also exclude the first and last members of the original. The result should be the tuple (4, 3, 2).

    Solution

    # Solution 1
    my_tuple = (1, 2, 3, 4, 5)
    my_list = list(my_tuple)
    my_list.reverse()
    result = tuple(my_list[1:4])
    print(result)       # (4, 3, 2)
    
    # Solution 2
    my_tuple = (1, 2, 3, 4, 5)
    result = my_tuple[3:0:-1]
    print(result)       # (4, 3, 2)
    
    # Solution 3
    my_tuple = (1, 2, 3, 4, 5)
    result = my_tuple[-2:-5:-1]
    print(result)       # (4, 3, 2)
    

    There are several ways to solve this problem. Your first inclination may have been to use the reverse method, as in Solution 1. However, reverse is a list method (tuples are immutable and have no reverse method), so we must first convert the tuple to a list. Even then, we still have to slice the list, though that slice is a little simpler.

    Solutions 2 and 3 use the same approach by extracting a reversed slice. The only difference is how we specify the start and stop values for the slice. What makes these tricky is that the element indexed by the stop value is not included in the result. If you used one of these solutions, you likely started with an off-by-one bug.

  4. This is a 3-part question. Consider the following dictionary:

    pets = {
        'Cat':  'Meow',
        'Dog':  'Bark',
        'Bird': 'Tweet',
    }
    
    • Part 1: Write some code to print Bark by accessing the element associated with the key Dog.
    • Part 2: Write some code to print None when you try to print the value associated with the non-existent key, Lizard.
    • Part 3: Write some code to print <silence> when you try to print the value associated with the non-existent key, Lizard.

    Solution

    print(pets['Dog'])                      # Part 1: Bark
    
    print(pets.get('Lizard'))               # Part 2: None
    
    print(pets.get('Lizard', '<silence>'))  # Part 3: <silence>
    

    Since the pets dictionary doesn't have a Lizard key, we need to use the dict.get method to avoid a KeyError. In Part 2, we don't specify a default value, so get returns None. In Part 3, we pass <silence> as the default value.

  5. Which of the following values can't be used as a key in a dict object, and why?

    'cat'
    (3, 1, 4, 1, 5, 9, 2)
    ['a', 'b']
    {'a': 1, 'b': 2}
    range(5)
    {1, 4, 9, 16, 25}
    3
    0.0
    frozenset({1, 4, 9, 16, 25})
    

    Solution

    The following items can't be used as keys:

    ['a', 'b']
    {'a': 1, 'b': 2}
    {1, 4, 9, 16, 25}
    

    The first value is a list, the second is another dict, and the last is a set. All three types are mutable and therefore unhashable, so they can't be used as dict keys. All the remaining items are immutable built-in objects, so they are acceptable dict keys.

  6. What will the following code print?

    print('abc-def'.isalpha())
    print('abc_def'.isalpha())
    print('abc def'.isalpha())
    print('abc def'.isalpha() and
          'abc def'.isspace())
    print('abc def'.isalpha() or
          'abc def'.isspace())
    print('abcdef'.isalpha())
    print('31415'.isdigit())
    print('-31415'.isdigit())
    print('31_415'.isdigit())
    print('3.1415'.isdigit())
    print(''.isspace())
    

    Solution

    print('abc-def'.isalpha())       # False
    print('abc_def'.isalpha())       # False
    print('abc def'.isalpha())       # False
    print('abc def'.isalpha() and
          'abc def'.isspace())       # False
    print('abc def'.isalpha() or
          'abc def'.isspace())       # False
    print('abcdef'.isalpha())        # True
    print('31415'.isdigit())         # True
    print('-31415'.isdigit())        # False
    print('31_415'.isdigit())        # False
    print('3.1415'.isdigit())        # False
    print(''.isspace())              # False
    

    There are two things to note above.

    • Lines 4-7: You can't use and or or to determine whether a string contains a mixture of different kinds of characters. The str.isXXXXX methods determine whether every character in str belongs to a certain class of characters. A string like 'abc def' is neither entirely alphabetic nor entirely whitespace, so isalpha and isspace both return False, and combining those results with and or or still gives False.
    • Line 13: Most of the str.isXXXXX methods return False when invoked on an empty string.

  7. Write Python code to replace all the : characters in the string below with +.

    info = 'xyz:*:42:42:Lee Kim:/home/xyz:/bin/zsh'
    

    Try this problem using the methods you've learned about in this chapter. Once you have that working, use the Python documentation for the str type to find an alternative solution.

    Solution

    info = 'xyz:*:42:42:Lee Kim:/home/xyz:/bin/zsh'
    parts = info.split(':')
    result = '+'.join(parts)
    print(result)
    # xyz+*+42+42+Lee Kim+/home/xyz+/bin/zsh
    
    info = 'xyz:*:42:42:Lee Kim:/home/xyz:/bin/zsh'
    result = info.replace(':', '+')
    print(result)
    # xyz+*+42+42+Lee Kim+/home/xyz+/bin/zsh
    

  8. Explain why the code below prints different values on lines 3 and 4.

    text = "It's probably pining for the fjords!"
    
    print(text[21:35].rfind('f'))     # 8
    print(text.rfind('f', 21, 35))    # 29
    

    Solution

    Line 3 first extracts a slice of text from index 21 up to, but not including, index 35. That produces the string 'for the fjords'. rfind then returns 8, the index of the rightmost 'f' within that slice.

    On the other hand, line 4 searches for the rightmost f between indexes 21 and 35 of the original string. That f is at index 29 of text, so that's what the method returns.

  9. Write some code to replace the value 6 in the following nested list with 606:

    stuff = [
        ['hello', 'world'],
        ['example', 'mem', None, 6, 88],
        [4, 8, 12],
    ]
    

    You don't have to search the list. Just write an assignment that replaces the 6.

    Solution

    stuff[1][3] = 606
    

  10. Consider the following nested collection:

    cats = {
        'Pete': {
            'Cheddar': {
                'color': 'orange',
                'enjoys': {
                    'sleeping',
                    'snuggling',
                    'meowing',
                    'eating',
                    'birdwatching',
                },
            },
            'Cocoa': {
                'color': 'black',
                'enjoys': {
                    'eating',
                    'sleeping',
                    'playing',
                    'chewing',
                },
            },
        },
    }
    

    Write one line of code to print the activities that Cocoa enjoys.

    Solution

    print(cats['Pete']['Cocoa']['enjoys'])
    

  11. Without running the following code, determine what each line should print.

    print('johnson' in 'Joe Johnson')
    print('sen' not in 'Joe Johnson')
    print('Joe J' in 'Joe Johnson')
    print(5 in range(5))
    print(5 in range(6))
    print(5 not in range(5, 10))
    print(0 in range(10, 0, -1))
    print(4 in {6, 5, 4, 3, 2, 1})
    print({1, 2, 3} in {1, 2, 3})
    print({3, 2} in {1, frozenset({2, 3})})
    

    Solution

    print('johnson' in 'Joe Johnson')      # False
    print('sen' not in 'Joe Johnson')      # True
    print('Joe J' in 'Joe Johnson')        # True
    print(5 in range(5))                   # False
    print(5 in range(6))                   # True
    print(5 not in range(5, 10))           # False
    print(0 in range(10, 0, -1))           # False
    print(4 in {6, 5, 4, 3, 2, 1})         # True
    print({1, 2, 3} in {1, 2, 3})          # False
    print({3, 2} in {1, frozenset({2, 3})}) # True
    
    • Line 1: in with strings is case sensitive.
    • Line 4: range(5) does not include 5; it ranges from 0 to 4.
    • Line 7: range(10, 0, -1) does not include 0; it ranges from 10 to 1.
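    • Line 9: in with a set checks whether the left operand is a member of the set, not whether it's a subset. The set {1, 2, 3} isn't one of the members of {1, 2, 3}.
    • Line 10: {3, 2} and frozenset({2, 3}) are considered equal sets, so {3, 2} is found as a member.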

  12. Write some code that determines and prints whether the number 3 appears inside each of these lists:

    numbers1 = [1, 3, 5, 7, 9, 11]
    numbers2 = []
    numbers3 = [2, 4, 6, 8]
    numbers4 = ['1', '3', '5']
    numbers5 = ['1', 3.0, '5']
    

    You should print True or False depending on each result.

    Solution

    print(3 in numbers1)          # True
    print(3 in numbers2)          # False
    print(3 in numbers3)          # False
    print(3 in numbers4)          # False (3 != '3')
    print(3 in numbers5)          # True  (3 == 3.0)
    

  13. Without running the following code, determine what each print statement should print.

    cats = ('Cocoa', 'Cheddar',
            'Pudding', 'Butterscotch')
    print('Butterscotch' in cats)
    print('Butter' in cats)
    print('Butter' in cats[3])
    print('cheddar' in cats)
    

    Solution

    cats = ('Cocoa', 'Cheddar',
            'Pudding', 'Butterscotch')
    print('Butterscotch' in cats) # True
    print('Butter' in cats)       # False (note 1)
    print('Butter' in cats[3])    # True (note 2)
    print('cheddar' in cats)      # False
    
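    • Note 1: When you use in with a tuple of strings, the string must match one of the tuple's elements exactly; in doesn't look for substrings inside the elements.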
    • Note 2: cats[3] is 'Butterscotch' and 'Butter' is in 'Butterscotch'.

  14. Assume you have the following sequences:

    my_str = 'abc'
    my_list = ['Alpha', 'Bravo', 'Charlie']
    my_tuple = (None, True, False)
    my_range = range(10, 60, 10)
    

    Write some code that combines the sequences into a list of tuples. Each tuple should contain one member of each sequence. Print the final result so you can see all the values, which should look like this:

    [('a', 'Alpha', None, 10),
     ('b', 'Bravo', True, 20),
     ('c', 'Charlie', False, 30)]
    

    Solution

    result = zip(my_str, my_list, my_tuple, my_range)
    print(list(result))
    

  15. Without running the following code, what values will be printed by line 10?

    pets = {
        'Cat':  'Meow',
        'Dog':  'Bark',
        'Bird': 'Tweet',
    }
    
    keys = pets.keys()
    del pets['Dog']
    pets['Snake'] = 'Sssss'
    print(keys)
    

    Solution

    dict_keys(['Cat', 'Bird', 'Snake'])
    

    Since dict.keys returns a dictionary view object, any changes you make to the dictionary after calling keys are reflected immediately in the view referenced by keys.
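If you want a snapshot of the keys that won't change along with the dictionary, one option is to copy the view into a list. A minimal sketch, reusing the pets dictionary from the exercise:

```python
pets = {'Cat': 'Meow', 'Dog': 'Bark', 'Bird': 'Tweet'}

live_keys = pets.keys()          # a view that tracks the dictionary
snapshot = list(pets.keys())     # a separate list copied right now

del pets['Dog']
pets['Snake'] = 'Sssss'

print(live_keys)    # dict_keys(['Cat', 'Bird', 'Snake'])
print(snapshot)     # ['Cat', 'Dog', 'Bird']
```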
