2. Python Fundamentals I#

In this module, we will explore python’s Data Types, Data Structures, and Control Flow. These are the fundamental building blocks on which python programs are created.

Data Types#

Data types define how a computer stores, processes, and interprets data. Understanding data types is critical for programming in python and other languages.

Think of data types as labels that tell a computer how to handle different kinds of information - just like you wouldn’t store soup in a paper bag, computers need to know what they’re working with. The main types are:

  • Numbers (1, 2, 3.14)

  • Text (“hello world”)

  • True/False values (like a light switch)

  • Lists [1,2,3] to group things together

They matter because they help computers store data efficiently, know how to handle calculations (adding numbers is different from combining text!), and catch mistakes before they happen.

Fundamentals#

There are several fundamental types of objects in python:

  • bool: with values True and False

  • str: for alphanumeric text like “Hello world”

  • int: for integers like 1, 42, and -5

  • float: for floating point numbers like 96.8

Containers#

There are several types of ‘continers’ which can store multiple values of the previous data types:

  • list: a mutable ordered list of data values

  • tuple: an immutable, ordered list of data values

  • set: an unordered mutable collection of unique data values

  • dict: a hash map which permits lookup on the basis of keys and values

We will explore each of these data types and containers in this module.

Checking Types with the type() function#

We can check the type of any python object by issuing the type() command and inspecting the output

Casting#

We can convert between objects by calling one of data types with a python object in parenthesis. For example:

my_var = 42
print(my_var)
print(type(my_var))
42
<class 'int'>
my_var_string = str(my_var)
print(my_var_string)
print(type(my_var_string))

# Notice that the `int` and `str` data types print identical values on the command line!
42
<class 'str'>

Strings#

Strings are basically text data in programming - any sequence of characters like “Hello World” or “user@email.com”.

They’re crucial because most of what computers process involves text: usernames, passwords, messages, file names, URLs, and search queries are all strings.

What makes strings special is you can manipulate them easily - split them up, combine them, search through them, or replace parts of them. Think of scanning an email for spam keywords or breaking up a full name into first and last name - that’s all string manipulation.

Strings are a very important data type in all languages. In Python, strings may be quoted several ways:

Construction#

output_file = "output.txt"
triplequotes = """woah! strings can
split lines """
print(triplequotes)  # Split onto multiple lines; newline char embedded.
woah! strings can
split lines 

Equivalently, with single quotes:

trip_single_quotes = """I am a string too.
I can span multiple lines!"""
print(trip_single_quotes)
I am a string too.
I can span multiple lines!

Quotes in strings#

We construct strings using either single quotes ('this is a string'), double quotes ("this is also a string") or triple quotes ('''yet another string!'''). Sometimes, we will want to include quotes in a string (say, a paragraph of text with some apostrophes). If the same type of quote that is used to define the string is used within the string, the interpreter will think that is the end of the string:

print('defining a string with a contraction such as you're')
  Cell In[5], line 1
    print('defining a string with a contraction such as you're')
                                                              ^
SyntaxError: unterminated string literal (detected at line 1)

We can fix this by mixing quotes:

print("defining a string with a contraction such as you're")
defining a string with a contraction such as you're

This also works for multiple quotes:

print("I went to a restaurant that serves 'breakfast at any time'. So I ordered French Toast during the Renaissance.")
I went to a restaurant that serves 'breakfast at any time'. So I ordered French Toast during the Renaissance.

Alternatively, we can escape the quote with a backslash, i.e. a \:

print("defining a string with a contraction such as you're")
print("I went to a restaurant that serves 'breakfast at any time'. So I ordered French Toast during the Renaissance.")
defining a string with a contraction such as you're
I went to a restaurant that serves 'breakfast at any time'. So I ordered French Toast during the Renaissance.

We recommend using double quotes " " to be consistent with PEP8 style

String methods#

There are several built in python methods which operate on strings, and can make manipulating strings much simpler and easier.

Slicing / Indexing#

We can access any singular character, within a string through ‘slicing’, also sometimes called ‘indexing’, by placing square brackets after a variable name:

string_to_index = "this is a string that will be indexed"
string_to_index[0:4]
'this'

In this command, we select all values between the 0th index and the 4th index (non - inclusive). This means we are accessing the characters ‘this’ in the example above, the 0th, 1st, 2nd, and 3rd characters.

The : tells the computer to select values starting at one index and ending at another. We can also access any individual character:

another_string = "here's another string to index~"
print(another_string[0])
print(another_string[1])
print(another_string[2])
print(another_string[3])
h
e
r
e

You may have observed a very important characteristic here: python is zero indexed meaning that the first value in a string is given the zero index:

Let’s play around with some other indexing commands to get the hang of it:

string_to_index = "this is a string that will be indexed"
print(string_to_index[5:])
print(string_to_index[5:20])
print(string_to_index[-10:])
is a string that will be indexed
is a string tha
be indexed

Note that when slicing, we do not have to supply the 0th or last index, python is able to figure out the start / end indices automatically. The following returns every character in the string, beginning with the 5th character:

string_to_index[5:]
'is a string that will be indexed'

Note that we can also use negative indices, which count backwards from the end of the string. The following returns every character in the string, beginning with the 10th character from the end:

print(string_to_index[-10:])
be indexed

The following diagram summarizes python string indexing and slicing for an example string 'Python':

str_idx

Strings are immutable!#

While we can access any individual character of a string, we cannot ever directly change characters:

string_to_index[10] = "a"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[14], line 1
----> 1 string_to_index[10] = 'a'

TypeError: 'str' object does not support item assignment

String functions#

There are several handy built-in string functions that make manipulating data in strings easier:

  • Concatenate strings: +

  • Split strings based on substring: split('substr')

  • Find substring: find('substr')

  • Replace substring with another substring replace('substr1','substr2')

  1. Combining strings (AKA concatennation): Strings can be concatennated using the + sign:

"some string " + "another string"
'some string another string'

note that we had to include a trailing space on some string in order to produce a concatennated string with spaces - the spaces are not added automatically.

  1. Splitting strings: strings can easily be separated based on a character using the split() function:

"we will split this string into pieces".split("p")
['we will s', 'lit this string into ', 'ieces']

This returns 3 separate strings, which are broken out by wherever p occurs in the substring.

Strings can also be separated by a sequence of characters

"we will split um this other string um on the basis um of where the um words occur".split("um")
['we will split ',
 ' this other string ',
 ' on the basis ',
 ' of where the ',
 ' words occur']
  1. Finding the first index where a character appears in a string with the find() function:

"here is a test string for which we will find the first ooccurrence of the letter i".find("i")
5
  1. Replacing characters appearing in a strings with the replace() function:

"here is a test string for which we will replace ooccurrence of one letter with another".replace("i", "j")
'here js a test strjng for whjch we wjll replace ooccurrence of one letter wjth another'
"this also works for full words and phrases, pretty neat, huh?".replace("neat", "swell")
'this also works for full words and phrases, pretty swell, huh?'

The len() function returns the number of characters in the string

test_str = "use len() to count the number of chars in this string"
len(test_str)
53

It’s easy to convert strings between upper and lower case with the string.upper() and string.lower() methods”

test_str = "AlTeRnAtInG cAsEs"
print(test_str.lower())  # Converts all chars to lowercase
print(test_str.upper())  # Converts all chars to uppercase
alternating cases
ALTERNATING CASES

String formatting#

It’s often convenient to create strings formatted from a combination of strings, numbers, and other data. In Python 3 this can be handled in two ways: the format string method. E.g.:

name = "Aakash"
course = "py4wrds"
# Prints: My name is Aakash. I am an instructor for py4wrds.
print("My name is {0}. I am an instructor for {1}.".format(name, course))
My name is Aakash. I am an instructor for py4wrds.

Format strings contain “replacement fields” surrounded by curly braces {}. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output.

If you need to include a brace character in the literal text, it can be escaped by doubling the braces: i.e. use . The number in the braces refers to the order of arguments passed to format.

Numbers don’t need to be specified if the sequence of braces has the same order as arguments:

course = "py4wrds"
number_of_students = 25
print("this course is {}, and {} students are in attendance".format(course, number_of_students))
this course is py4wrds, and 25 students are in attendance

Another way to handle string formatting is with F-strings, where variable names can be inserted directly into the curly braces

import math

r = 4
print(f"The area of a circle of radius {r} is {math.pi * math.pow(r, 2)}")
The area of a circle of radius 4 is 50.26548245743669

To summarize, string formatting is a good way to combine text and numeric data. It’s also how we control the output of floating point numbers:

# Fixed point precision (always uses six significant decimal digits).
print(" {{:f}}: {:f}".format(42.42))  # Prints 42.40000
# General format (knows how to drop trailing zeros in decimals).
print(" {{:g}}: {:g}".format(42.42))  # Prints 42.42
# Exponent (scientific) notation.
print(" {{:e}}: {:e}".format(42.42))  # Prints 4.242000e+01

# We can also specify how many digits of precision we want.
print(" {{:.2e}}: {:.2e}".format(42.42))  # Prints 4.24e+01.

# Or we can specify the width of our output (excluding +/- signs).
print("{{: 8.2e}}: {: 8.2e}".format(42.42))  # Prints total of 8 chars: 4.24e+01
print("{{: 8.2e}}: {: 8.2e}".format(-1.0))  # Prints -1.00e+00
 {:f}: 42.420000
 {:g}: 42.42
 {:e}: 4.242000e+01
 {:.2e}: 4.24e+01
{: 8.2e}:  4.24e+01
{: 8.2e}: -1.00e+00

We recommend using F-Strings to be consistent with PEP-8 formatting guidelines

Numbers#

Recall that numbers in python are represented by the int and float types. We can perform all standard numerical operations:

  • x + y : sum of x and y

  • x - y : differenceof x and y

  • x * y : productof x and y

  • x / y : quotientof x and y

  • x//y : floored quotient of x and y

  • x % y : remainder of x / y

  • -x : x negated

  • +x : x unchanged

  • abs(x) : absolute value (i.e. magnitude)

  • int(x) : x converted to integer

  • float(x) : x converted to floating point

  • x ** y : x to the power y

A full list of numerical operations can be found in the official python documentation

Complex Numbers#

Python supports complex numbers, with the character j denoting the imaginary part:

complex_num = 3 + 4j

The real and imaginary parts can be accessed by using .real and .imag synyax

print(complex_num.real)
print(complex_num.imag)
3.0
4.0

Numerical Conversions#

Python will automatically convert between numerical types of when performing operations:

float_num = 3.4
int_num = 2
print(float_num + int_num)
5.4

Notice that the integer in the operation was converted to a float type automatically before performing the calculation. This is known as widening

The same type of widening occurs when combining floats/ints with complex numbers:

print(float_num * complex_num)
print(int_num * complex_num)
(10.2+13.6j)
(6+8j)

In python, converting between float and int types is also easy:

float_num = 2.7
x = int(float_num)
print(type(x))
<class 'int'>

Notice that floats converted to ints are automatically rounded towards zero :

print(x)
2

We can observe this to be true by testing a negative number:

x = -3.8
int(x)
-3

Converting numbers to/from strings#

Python makes it simple to convert numbers to/from strings. This is especially useful when reading data from a text file:

string_num = "234"
print(int(string_num))
print(type(int(string_num)))
234
<class 'int'>
string_num = "234"
print(float(string_num))
print(type(float(string_num)))
234.0
<class 'float'>

Simiarly, it’s simple to convert from a numerical type to a string, using the str() constructor`

float_num = 25.0
print(str(float_num))
print(type(str(float_num)))
25.0
<class 'str'>

Note that attempting to concatenate a string with a numerical type will yield an error:

print("stringy string string " + 58)
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 import somepackage

ModuleNotFoundError: No module named 'somepackage'

This is best handled using F-strings or string formatting:

my_int = 456
print(f"stringy string string {58} with fstrings {my_int} ")
stringy string string 58 with fstrings 456 

Booleans#

A Boolean or Bool for short, is simply a value of True or False. You’ve probably heard of these if you have doen any programming.

Booleans are useful for checking data types, and lots of other things we’ll cover later.

For now, consider the following examples:

my_int_var = 42
type(my_int_var) is int
True
my_int_var = 42
type(my_int_var) is float
False

Notice we are usint the word is here to check whether the variable is an int type or a float type

We can also string together multiple statements using the following Boolean Operators : and, or, not

my_int_var = 42
my_float_var = 42.0

# Returns True
type(my_int_var) is int and type(my_float_var) is float
True
# Returns False
type(my_int_var) is int and type(my_int_var) is float
False
# Returns True
print(type(my_int_var) is int or type(my_int_var) is float)
True
# Returns True
type(my_int_var) is not float
True
# Returns False
type(my_int_var) is not int
False

These operators can be combined to form complex expressions and improve the readability and robustness of python programs. We will use these operators and revisit boolean logic througout this course.

If / Else Statements#

Perhaps the most well-known concept in programming is that of an if statement. The if statement is used for conditional execution:

# Try changing the value of `my_value` to get a sense for the `if`, `elif`, and `else` logic.
my_value = 10

if my_value > 0:  # If this expression evaluates to True, the following line will be printed.
    # If it evaluates to False, we move to the 'elif' statement
    print("value is positive")
elif my_value == 0:  # If this expression evaluates to True, the following line will be printed.
    # If it evaluates to False, we move to the 'else' statement
    print("value is zero")
else:
    print("value is negative")
value is positive

There can be zero or more elif: parts, and the else: part is optional.

The keyword elif is short for ‘else if’, and is useful to avoid excessive indentation.

An ifelifelif … sequence is a substitute for the switch or case statements found in other languages. See Switch Statement for more info.

Lists#

Lists are the most important and common data type in Python. A list is simply an ordered collection of items. Think of a vector in mathematics, or a grocery list of items to be purchased. In python, we can create lists using the square brackets [ ]:

list_a = [1, 2, 3, 4, 5]
print(list_a)
list_b = ["a", "b", "c", "d", "e"]
print(list_b)
[1, 2, 3, 4, 5]
['a', 'b', 'c', 'd', 'e']

A list can be comprised of any data type, or multiple data types:

list_of_many_types = ["a", 2, False, 4, "am I a list, or an element?", 17.5, True]
print(list_of_many_types)
['a', 2, False, 4, 'am I a list, or an element?', 17.5, True]

Each item in a list is called an element

We can call the len() operator to determine the number of elements in a list:

len(list_of_many_types)
7

Accessing elements in lists#

We can access any element of a list using the same style of indexing used to access characters in strings (square brackets after the variable name).

Remember - the first element in a list is index 0:

print(list_of_many_types[0])
print(list_of_many_types[1])
print(list_of_many_types[2])
print(list_of_many_types[4])
a
2
False
am I a list, or an element?

We can also index lists negatively, similar to how we were able to access characters in strings:

print(list_of_many_types[-1])
print(list_of_many_types[-2])
True
17.5

Slicing#

We can access a subset of a list using Slicing in the same way that we are able to access substrings:

list_of_many_types = ["a", 2, False, 4, "am I a list, or an element?", 17.5, True]
list_of_many_types[2:5]
[False, 4, 'am I a list, or an element?']

We call the [2:5] a slice, which also returns a list of with elements at positions 2,3, and 4 in the original list.

We can also slice to presesrve the start or end of the list, by keeping one side of the colon blank:

# Start with third element, go to end.
print(list_of_many_types[2:])

# Extract elements indexed at positions 0, 1, and 2.
list_of_many_types[:3]
[False, 4, 'am I a list, or an element?', 17.5, True]
['a', 2, False]

Let’s say we want to slice a list and select every kth entry - we can add an extra set of parantheses to the slicing syntax to set the ‘step’:

list_of_squares = [1, 4, 9, 16, 25, 36, 49, 64, 81]
# slice the list starting at 2nd element, ending at 9th element, selecting every 3rd element
list_of_squares[1:9:3]
[4, 25, 64]

List operations#

The + operator combines lists. This is also called concatenation.

list_one = ["a", 4, "one", 3.4]
list_two = ["b", 6, "two", 7.8]
print(list_one + list_two)
['a', 4, 'one', 3.4, 'b', 6, 'two', 7.8]

The * operator can be used to repeat lists:

list_one * 2
['a', 4, 'one', 3.4, 'a', 4, 'one', 3.4]
2 * list_two
['b', 6, 'two', 7.8, 'b', 6, 'two', 7.8]

Lists are mutable#

Lists can be modified ‘in place’, meaning that individual elements can be changed

list_one = ["a", 4, "one", 3.4]
print(list_one)
['a', 4, 'one', 3.4]
# assign a new value to index -1, i.e. replace float 3.4 with string "last element"
list_one[-1] = "last element"
print(list_one)
['a', 4, 'one', 'last element']

We can also assign slices:

list_one[1:3] = [2, 3]
print(list_one)
['a', 2, 3, 'last element']

Copying lists#

Let’s attempt to copy a list referenced by variable a to another variable b:

a = ["a", "b", "c"]  # define list a
b = a  # attempt to copy a to b
b[1] = 1  # changing an element in list b
print(b)  # confirming the change was made to the second element in list b
print(a)  # but HOLD UP! we also modified the second element in list a !
['a', 1, 'c']
['a', 1, 'c']

This is very interesting - we modified list b, but that caused list a to change.

Let’s try another way to copy the list:

a = ["a", "b", "c"]
b = a  # first attempt to copy a to b
c = list(a)  # second attempt to copy a to c using the list constructor
d = a[:]

b[1] = 1  # now we want to change an element in b

print("a: ", a)  # list 'a' got modified (unintentionally).
print("b: ", b)  # list 'b' of course got modified (intentionally).
print("c: ", c)  # list 'c' got preserved, success!
print("d: ", d)  # list 'd' got preserved, another success!
a:  ['a', 1, 'c']
b:  ['a', 1, 'c']
c:  ['a', 'b', 'c']
d:  ['a', 'b', 'c']

What do we observe? A list can be copied with b = list(a) or b = a[:]. The second option is a slice including all elements! Why does python work this way?

Python’s data model#

Why do we observe the behavior in the previous example?

  • Variables in Python are references to objects in memory.

  • Assignment with the = operator sets the variable to refer to an object.

Here is a simple example:

a = [1, 2, 3, 4]
b = a
b[1] = "modified"
print(a)
print(b)
[1, 'modified', 3, 4]
[1, 'modified', 3, 4]

Notice that list a and list b are identical - In this example, we assigned a to b via b = a.

This did not copy the data, it only copied the reference to the list object in memory.

Thus, modifying the list through object b also changes the data that you will see when accessing object a.

We can inspect the memory addresses in Python with the id command:

print("id(a): ", id(a))  # These two variables...
print("id(b): ", id(b))  # ...both refer to the same object.
id(a):  4707089920
id(b):  4707089920

Those numbers refer to memory addresses on the computer.

Copying objects and data#

So how do we copy objects and data generally, without having to call the appropriate constructor every time?

The copy function in the copy module is a generic way to copy a list:

import copy
a = [1, 2, 3, "abc"]
b = copy.copy(a)
b[3] = "xyz"
print(b)  # Variable b has its last element replaced.
print(a)  # Variabla a is unchanged, like we hoped.
[1, 2, 3, 'xyz']
[1, 2, 3, 'abc']

Elements in a list (the individual items) are also references to memory location. For example if your list contains a list, this will happen when using copy.copy():

sublist = [1, 2, 3]
a = [2, "string", sublist]
b = copy.copy(a)
b[2][0] = 5555
print(b)  # We modified the nested list...
print(a)  # ...which also affected the contents of variable 'a'.
[2, 'string', [5555, 2, 3]]
[2, 'string', [5555, 2, 3]]

What just happened? The element for the sub-list [5555, 2, 3] is actually a memory reference - when we copy the outer list, only references for the contained objects are copied. Therefore, modifying the copy b modifies the original a. Thus, we may need the function copy.deepcopy():

sublist = [1, 2, 3]
a = [2, "string", sublist]
b = copy.deepcopy(a)
b[2][0] = 5555
print(b)  # Variable b had its third element modified.
print(a)  # But we can observe here that variable a is unchanged
[2, 'string', [5555, 2, 3]]
[2, 'string', [1, 2, 3]]

Sorting lists#

Sorting lists in python is easy and can be very useful for lots of reasons.

my_list = list(range(10))
print(my_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
import random

random.shuffle(my_list)
print(my_list)
[7, 5, 2, 4, 6, 1, 0, 8, 9, 3]

The command random.shuffle sorts the list in place, meaning the original list is modified. If we want to create a new list that is sorted, we can use the sorted function:

my_sorted_list = sorted(my_list)
print(my_sorted_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

we can also sort in place using the .sort() function:

my_list.sort()
print(my_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

List operations#

The following is a summary of operations that can be performed on lists:

x refers to an element, s refers to a list, and n refers to any integer

  • x in s: returns True if an item of s is equal to x, else False

  • x not in s: returns False if an item of s is equal to x, else True

  • s + t, where t is another list: the concatenation of s and t.

  • s * n or n * s: equivalent to adding s to itself n times

  • s[i]: ith item of s, with start index of 0

  • s[i:j]: slice of s from index i to j

  • s[i:j:k]: slice of s from index i to index j with step k

  • len(s): length of s

  • min(s): smallest item of s

  • max(s): largest item of s

  • s.index(x): index of the first occurrence of x in s

  • s.count(x): total number of occurrences of x in s

  • s[i] = x: assign element with index i in list s a new value of x

In the examples below, x refers to an element, s refers to a list, and t refers to another list

  • s[i:j] = t: slice of s from i to j is replaced by the contents of another list t

  • del s[i:j]: remove elements between index i and j from s

  • s[i:j] = []: remove elements between index i and j from s

  • s[i:j:k] = t: the elements of s[i:j:k] are replaced by those of another list t

  • del s[i:j:k]: removes the elements of s[i:j:k] from the list s

  • s.append(x): appends x to the end of the existing list s

  • s.clear(): removes all items from s (equivalent to del s[:])

  • s.copy(): creates a shallow copy of s (equivalent to s[:])

  • s.extend(t) or s += t: extends s with the contents of t (for the most part the same as s[len(s):len(s)] = t)

  • s *= n: updates s with its contents repeated n times

  • s.insert(i, x): inserts x into s at the index given by i (equivalent to s[i:i] = [x])

  • s.pop([i]): retrieves the item at i and also removes it from s

  • s.remove(x): remove the first item from s where s[i] == x

  • s.reverse(): reverses the items of s in place

  • s.sort(): sorts the items of s in place

Lists are the most important and common data type in Python, so understanding their construction and manipulation is important.

The following are helpful resources about lists:

help(list)
Help on class list in module builtins:

class list(object)
 |  list(iterable=(), /)
 |
 |  Built-in mutable sequence.
 |
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |
 |  Methods defined here:
 |
 |  __add__(self, value, /)
 |      Return self+value.
 |
 |  __contains__(self, key, /)
 |      Return bool(key in self).
 |
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __getitem__(self, index, /)
 |      Return self[index].
 |
 |  __gt__(self, value, /)
 |      Return self>value.
 |
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __iter__(self, /)
 |      Implement iter(self).
 |
 |  __le__(self, value, /)
 |      Return self<=value.
 |
 |  __len__(self, /)
 |      Return len(self).
 |
 |  __lt__(self, value, /)
 |      Return self<value.
 |
 |  __mul__(self, value, /)
 |      Return self*value.
 |
 |  __ne__(self, value, /)
 |      Return self!=value.
 |
 |  __repr__(self, /)
 |      Return repr(self).
 |
 |  __reversed__(self, /)
 |      Return a reverse iterator over the list.
 |
 |  __rmul__(self, value, /)
 |      Return value*self.
 |
 |  __setitem__(self, key, value, /)
 |      Set self[key] to value.
 |
 |  __sizeof__(self, /)
 |      Return the size of the list in memory, in bytes.
 |
 |  append(self, object, /)
 |      Append object to the end of the list.
 |
 |  clear(self, /)
 |      Remove all items from list.
 |
 |  copy(self, /)
 |      Return a shallow copy of the list.
 |
 |  count(self, value, /)
 |      Return number of occurrences of value.
 |
 |  extend(self, iterable, /)
 |      Extend list by appending elements from the iterable.
 |
 |  index(self, value, start=0, stop=9223372036854775807, /)
 |      Return first index of value.
 |
 |      Raises ValueError if the value is not present.
 |
 |  insert(self, index, object, /)
 |      Insert object before index.
 |
 |  pop(self, index=-1, /)
 |      Remove and return item at index (default last).
 |
 |      Raises IndexError if list is empty or index is out of range.
 |
 |  remove(self, value, /)
 |      Remove first occurrence of value.
 |
 |      Raises ValueError if the value is not present.
 |
 |  reverse(self, /)
 |      Reverse *IN PLACE*.
 |
 |  sort(self, /, *, key=None, reverse=False)
 |      Sort the list in ascending order and return None.
 |
 |      The sort is in-place (i.e. the list itself is modified) and stable (i.e. the
 |      order of two equal elements is maintained).
 |
 |      If a key function is given, apply it once to each list item and sort them,
 |      ascending or descending, according to their function values.
 |
 |      The reverse flag can be set to sort in descending order.
 |
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |
 |  __class_getitem__(object, /)
 |      See PEP 585
 |
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |
 |  __new__(*args, **kwargs)
 |      Create and return a new object.  See help(type) for accurate signature.
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __hash__ = None

For loops and while loops#

Loops are another foundational concept in python programming. They are useful for repeating actions.

There are two types of loops: for and while.

Let’s say we want to compute the square of each integer between 1 and 8. We can use a for loop to make this much more efficient than computing each square individually:

for i in range(9):
    print(i**2)
0
1
4
9
16
25
36
49
64

the syntax for i in range(n): will execute the indented code print (i**2) a total of n times, with i = 0, 1, 2, ... n-1

An important note on python syntax#

Indents in python code matter, unlike in C, C++, or Java. Indents are actually a form of syntax in python.

Python requires indenting loops and conditionals (we will get to these a bit later). Indenting can be achieved using either one TAB or four SPACEBAR keystrokes. There is a lively debate over which of these is preferred. Text editors now make it easy to switch between tabs an spaces - the important thing is to remain consistent.

The range() function#

Let’s say you want to make a list of numbers - the range() function is a handy and convenient way to do that. We can convert a range() object to a python list by using the list() constructor:

# Get a range 0,1,2,...,7
print(list(range(8)))
[0, 1, 2, 3, 4, 5, 6, 7]
# Get a range from 5 to 9
print(list(range(5, 10)))
[5, 6, 7, 8, 9]
# Get a range from 4 to 16 with a step of 4
print(list(range(4, 16, 4)))
[4, 8, 12]

For loops and Lists#

We can use a for loop to iterate over items in a list:

my_list = [1, 41.99, True, "this is a string", ["sub", "list", "of", "strings"]]
for item in my_list:
    print(item)
1
41.99
True
this is a string
['sub', 'list', 'of', 'strings']

The enumerate() function#

When iterating through a for loop, if we want to access the index of each item in the list, along with the item, we can use the enumerate() function:

my_list = [1, 41.99, True, "this is a string", ["sub", "list", "of", "strings"]]
for index, item in enumerate(my_list):
    print("index {}: {}".format(index, item))
index 0: 1
index 1: 41.99
index 2: True
index 3: this is a string
index 4: ['sub', 'list', 'of', 'strings']

While loops#

If we don’t know how many iterations are needed, we can use a while loop:

i = 2
while i < 100:  # loop body will only execute if conditional statement is True
    print(i**2, end=", ")
    i = i**2
4, 16, 256, 

This loop prints the values of \(i^2\) and updates the value of \(i\), while i is <100. Notice the loop stops since 256 > 100

The infinite loop#

Beware the infamous infinite loop! Consider the following example, where the condition will never become false:

NOTE to stop this, you will have to press CTRL + c in the interpreter, or the “STOP” button in jupyter lab / notebook

# Uncomment the below lines to run, but beware that this code will execute forever unless stopped.
# while True:
# print("infinite loop")

Nested loops#

We can put loops within loops - these are called ‘nested loops’

Consider the following example:

for i in range(8):  # outer loop
    for j in range(i):  # inner loop
        print(j, end=" ")  # inner loop body
    print(" ")  # outer loop body
 
0  
0 1  
0 1 2  
0 1 2 3  
0 1 2 3 4  
0 1 2 3 4 5  
0 1 2 3 4 5 6  

Continue and Break#

We can exit a loop, or move to the next iteration if desired.

  • break is used to exit the loop immediately, skipping any remaining iterations.

  • continue is used to skip the current iteration and move to the next one.

# Using break to find prime numbers
max_n = 10

for n in range(2, max_n):
    for x in range(2, n):
        if n % x == 0:  # modulo of n divisible by x is zero
            print(n, "equals", x, "*", n / x)
            break
    else:  # executed if no break in for loop
        # loop fell through without finding a factor
        print(n, "is a prime number")
2 is a prime number
3 is a prime number
4 equals 2 * 2.0
5 is a prime number
6 equals 2 * 3.0
7 is a prime number
8 equals 2 * 4.0
9 equals 3 * 3.0
# Using continue to skip even numbners
for num in range(10):
    if num % 2 == 0:  # Skip even numbers
        continue
    print(num)
1
3
5
7
9

An(other) note on loop syntax best practices#

It is bad practice to define a variable inside of a conditional or loop body and then reference it outside:

name = "Nick"
if name == "Nick":
    age = 45  # newly created variable

print("Nick's age is {}".format(age))
Nick's age is 45
name = "Bob"
if name == "Nick":
    id_number = 45  # also newly created variable
print("{}'s id number is {}".format(name, id_number))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[86], line 5
      2 if name == "Nick":
      3     id_number = 45 # also newly created variable
----> 5 print("{}'s id number is {}".format(name, id_number))
      6 # NameError: name 'id_number' is not defined

NameError: name 'id_number' is not defined

Good practice is to define/initialize variables at the same level they will be used:

name = "Bob"
age = 55
if name == "Nick":
    age = 45

print(f"{name}'s age is {age}")
Bob's age is 55

Tuples#

Tuples are essentially immutable lists, meaning they cannot be modified. Elements in a list can be heterogenous.

Use Case:

  • You have a set of (possibly non-unique or heterogeneous) data that you wish to remain ordered and immutable for the lifetime of the object.

  • E.g. we may expect that a person’s name should be invariant over the course of a peron’s lifetime, and therefore may choose to represent first and last names via a length-two tuple.

Tuples are denoted by parantheses: tup = (1,2,3).#

  • Create an empty tuple: empty_tup = (), or equivalently empty_tup = tuple().

  • Create a tuple with some data: tup = (42, 3.14, "abc").

Values within tuples can be of any type.

Properties of Tuples#

  • Immutability - The number of items in a tuple, or what those fixed number of items reference, are not allowed to change after creation. Put differently, the size and type signature of a tuple are fixed at the time of creation.

  • Order matters, and is defined at construction time - Since tuples are immutable, their elements are ordered. We can access items via indexing and slicing, similar to lists.

Similrities with Lists: Indexing, Slicing, and Looping#

my_tuple = (1, 2, 3, 4)
print(my_tuple[1])  # Prints the second value
print(my_tuple[1:3])  # Prints the second and third values
2
(2, 3)

Similar to lists, we can also loop through tuples:

for element in my_tuple:
    print(element)
1
2
3
4

Immutability:#

# Lists are mutable
my_list = ["I", "am", "a", "list"]
my_list[0] = "I still"  # Modifying elements of a list is OK.
print("my_list:", my_list)  # --> my_list: ["I still", "am", "a", "list"].

# Tuples are not mutable (immutable)
my_tuple = ("I", "am", "a", "tuple")
my_list: ['I still', 'am', 'a', 'list']
my_tuple[2] = "still a"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[90], line 8
      6 # Tuples are not mutable (immutable) 
      7 my_tuple = ("I", "am", "a", "tuple")
----> 8 my_tuple[2] = "still a"

TypeError: 'tuple' object does not support item assignment

However, keep in mind that if a tuple contains a list element, that can be modified, since lists ARE mutable:

my_tuple = ("I", "am", "a", "tuple", ["containing", "a", "sub", "list"])
my_tuple[4][1] = "a mutable"
print(my_tuple)
('I', 'am', 'a', 'tuple', ['containing', 'a mutable', 'sub', 'list'])

Unpacking Tuples#

Python offers a convenient way to “unpack” tuples.

In tuple unpacking, we can store the elements of a tuple into multiple variables in one line of code:

# Define tuple, and unpack into 3 variables
my_tuple = ("a string", 43, 99.9)
my_str, my_int, my_flt = my_tuple

print(my_str, my_int, my_flt)

# Equivalently:
my_str = my_tuple[0]
my_int = my_tuple[1]
my_flt = my_tuple[2]

print(my_str, my_int, my_flt)
a string 43 99.9
a string 43 99.9

Sets#

In Python, a set is an unordered, mutable collection of unique items.

Unlike a list, a set cannot contain any duplicates. A great use case for a Set would be Checking for Existence of Usernames.

  • We care about existence, and not duplicity;

  • testing for existence is “fast”. E.g. When a new-user signs up for a service, we want to check if their proposed username is available instantly.

Sets are constructed with curly braces: my_set = {5, 8, "str", 49.2}, or with the set() constructor: my_set = set([1, 2, 3])

We can add items to a set with the .add() method

# create and update using add method
myclasses = set()
myclasses.add("math")
myclasses.add("chemistry")
myclasses.add("literature")

# create using {} notation
yourclasses = {"physics", "gym", "math"}

Set Operations#

We can easily compute the Union or the Intersection between multiple sets, using the & and | characters respectively:

print(myclasses & yourclasses)  # Returns only elements in  either myclasses AND yourclasses
print(myclasses | yourclasses)  # Returns elements found in either myclasses OR yourclasses
{'math'}
{'chemistry', 'literature', 'physics', 'gym', 'math'}

Example: finding unique elements in a list:#

Say we have a ‘shopping basket’ with lots of (non unique) items, we can use set() to identify the unique elements:

grocery_list = ["apples", "bananas", "yogurt", "apples", "milk", "bananas", "yogurt", "granola", "bread"]
print(set(grocery_list))
{'apples', 'yogurt', 'bananas', 'granola', 'milk', 'bread'}

Set Methods#

we saw that we could append an element to a set with the .add() method. There are many other useful operations, such as .remove(),.intersection(),.difference(), etc… Have a look at the official documentation for a comprehensive list.

Or, simply type help(set) into the python interpreter.

Dictionaries#

Dictionaries relate a key to a value.

Think of a real-world dictionary of word definitions. They unique key is a word, the value associated with the key is the definition of the word.

In Python, this is represented with the built-in dict or Dictionary type.

They are commonly called ‘maps’, ‘hashes’, ‘key-value pairs’, and ‘associative arrays’, in other programming languages and in mathematics.

Example use case: Phone book / cell phone contacts:#

You want to repeatedly “look up” values that are associated with identifiers. E.g. your contacts list, which might pair a name (the key) with a phone-number (the value). Want to look up a friends phone number? Just type their name (the key) and you get their phone number (the value) instantly.

Dictionary Syntax:#

  • Dictionaries can be created using curly braces {}, or with the dict() constructor:

  • Create an empty dictionary: empty_dict = {}, or equivalently, empty_dict = dict{}

  • Create a dictionary with data: emails={'aakash':'aaahamed@calpoly.edu', 'andrew':'andrew@ajnisbet.com'}

Keys must be immutable - e.g. numbers, strings, or tuples

Values can be any python data type - e.g. lists, strings, or even other dictionaries

Access we can access values associated with a key with square brackets: value = dictionary[key]

# One example to initialize and populate a dictionary
emails = {}
emails["aakash"] = "aaahamed@calpoly.edu"
emails["andrew"] = "andrew@ajnisbet.com"
print(emails)
{'aakash': 'aaahamed@calpoly.edu', 'andrew': 'andrew@ajnisbet.com'}
# Another equivalent example to initialize a dictionary
emails = {"aakash": "aaahamed@calpoly.edu", "andrew": "andrew@ajnisbet.com"}
print(emails)
{'aakash': 'aaahamed@calpoly.edu', 'andrew': 'andrew@ajnisbet.com'}
# Another equivalent example to initialize and populate a dictionary
emails = dict()
emails["aakash"] = "aaahamed@calpoly.edu"
emails["andrew"] = "andrew@ajnisbet.com"
print(emails)
{'aakash': 'aaahamed@calpoly.edu', 'andrew': 'andrew@ajnisbet.com'}

We can easily access a single element of a dictionary by using bracket notation, and placing the key inside the brackets:

emails["andrew"]
'andrew@ajnisbet.com'

Equivalently, we can access a value using the .get(key) syntax:

emails.get("andrew")
'andrew@ajnisbet.com'

If the key is not contained within our dict, we receive a KeyError

emails["payman"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[101], line 1
----> 1 emails['payman']

KeyError: 'payman'

Let’s walk through a more comprehensive example to illustrate some of the usefulness of dictionaries:

# Initialize an empty dictionary
movie_years = {}  # Equivalently: movie_years = dict()
# Populate dictionary with movie names (keys) and release years (values)
movie_years["Fargo"] = 1996
movie_years["Fellowship of the Ring"] = 2001
movie_years["The Departed"] = 2006
movie_years["Godfather Part 1"] = 1972
movie_years["Dune"] = 2021
movie_years["Monty Python and the Holy Grail"] = 1975
movie_years["2001 a Space Odyssey"] = 1968
movie_years["A Clockwork Orange"] = 1972
print(movie_years)
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968, 'A Clockwork Orange': 1972}

We can get determine the number of keys in the dictionary with the len() function:

len(movie_years)
8

Dicts are mutable - we can change the value associated with a key:

movie_years["Fargo"] = 2008
print(movie_years)
print("-----" * 20)
movie_years["Fargo"] = 1996
print(movie_years)
{'Fargo': 2008, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968, 'A Clockwork Orange': 1972}
----------------------------------------------------------------------------------------------------
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968, 'A Clockwork Orange': 1972}

Dicts are mutable - we can also delete a key:value pair if desired by using the del command and supplying the dict key to be deleted:

del movie_years["A Clockwork Orange"]
print(movie_years)
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968}

Accessing Keys and Values#

We can access the dictionary keys and values as a list if desired using the following syntax:

names = movie_years.keys()
print(names)
years = movie_years.values()
print(years)
dict_keys(['Fargo', 'Fellowship of the Ring', 'The Departed', 'Godfather Part 1', 'Dune', 'Monty Python and the Holy Grail', '2001 a Space Odyssey'])
dict_values([1996, 2001, 2006, 1972, 2021, 1975, 1968])

It’s often convenient to convert these to python lists by using the list constructor:

names = list(movie_years.keys())
print(names)
print(type(names))
years = list(movie_years.values())
print(years)
print(type(years))
['Fargo', 'Fellowship of the Ring', 'The Departed', 'Godfather Part 1', 'Dune', 'Monty Python and the Holy Grail', '2001 a Space Odyssey']
<class 'list'>
[1996, 2001, 2006, 1972, 2021, 1975, 1968]
<class 'list'>

If we have two lists of the same length, and want to create a dictionary, using one list as keys, and the other list as values, we can use the zip() function within the dict() constructor to achieve this.

another_movie_year_dict = dict(zip(names, years))
print(another_movie_year_dict)
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968}

Looping Through Dictionaries#

We can loop through the keys of a dictionary, similar to how we can loop through the elements of a list:

for movie in movie_years:
    print(movie)
Fargo
Fellowship of the Ring
The Departed
Godfather Part 1
Dune
Monty Python and the Holy Grail
2001 a Space Odyssey

We can also loop through the values:

for year in movie_years.values():
    print(year)
1996
2001
2006
1972
2021
1975
1968

If we want to loop through both the keys and values, we can use the .items() syntax:

for key, value in movie_years.items():
    print(f"{key} was released in {value}")
Fargo was released in 1996
Fellowship of the Ring was released in 2001
The Departed was released in 2006
Godfather Part 1 was released in 1972
Dune was released in 2021
Monty Python and the Holy Grail was released in 1975
2001 a Space Odyssey was released in 1968

Dictionaries are ordered#

Since python 3.7, dictionaries are ordered: when looping through a dictionary, the order of the keys/values will be the same order that they were added.

Saving and Loading Dicts#

It is often convenient to write and read dictionaries. This can be achieved easily using the json module (more on modules later). .json is also a file format that mirrors dictionary format.

import json

with open("movie_years.json", "w") as fp:
    json.dump(movie_years, fp)

Notice that a file with the name movie_years.json was written to the directory where this command executed.

It is equally easy to load the file we wrote as a dictionary:

with open("movie_years.json", "r") as f:
    my_other_movie_dict = json.load(f)
print(my_other_movie_dict)
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968}

Comprehensions#

Python has a special feature known as ‘comprehensions’, in which operations can be applied to each element in a container (list, dictionary, tuple, or set), without using a for loop. These expressions are very common in python code.

All comprehensions can be equivalently expressed as for loops. The key thing to remember is that comprehensions can implement for loops in a more concise and compact syntax. While this may sometimes be convenient, it can also be less readable than a for loop.

Consider our movie_years dictionary. Perhaps we want to increment each of the release years by one year and store the result in a dict called movie_years_updated. We can achieve this with a for loop:

movie_years_updated = {}

for k, v in movie_years.items():
    movie_years_updated[k] = v + 1

print(movie_years)
print(movie_years_updated)
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968}
{'Fargo': 1997, 'Fellowship of the Ring': 2002, 'The Departed': 2007, 'Godfather Part 1': 1973, 'Dune': 2022, 'Monty Python and the Holy Grail': 1976, '2001 a Space Odyssey': 1969}

The same result can be achieved with a ‘dictionary comprehension’ with the following:

{k: v + 1 for k, v in movie_years.items()}
{'Fargo': 1997,
 'Fellowship of the Ring': 2002,
 'The Departed': 2007,
 'Godfather Part 1': 1973,
 'Dune': 2022,
 'Monty Python and the Holy Grail': 1976,
 '2001 a Space Odyssey': 1969}

Notice that we use the curly braces {} to denote a dictionary result. We can ue the regular brackets [] to return a list:

[v + 5 for v in movie_years.values()]
[2001, 2006, 2011, 1977, 2026, 1980, 1973]

Reading and manipulating text files#

Let’s synthesize some of the materials we covered in this chapter, and use python tools to:

  1. open a text file containing movie names and years, using with open(filename):

  2. loop through the rows, extract the information we want using string operations, and store that information in a python object

  3. write the resulting objects to a file

To open text files, we can use the with open(filename): syntax:

lines = []

with open("./data/movies.txt", "r", encoding = 'utf-8') as file:  # the 'r' means read only
    lines = file.readlines()  # read each line of the textfile into a list element
    file.close()  # close the file

We now have each line of the textfile stored as an element in a list called lines. Let’s loop through the list, and create a dictionary like movie_years that we can write to a .json file.

# Print the first 10 lines in the file we read
lines[:10]
['1,Toy Story (1995)\n',
 '2,Jumanji (1995)\n',
 '3,Grumpier Old Men (1995)\n',
 '4,Waiting to Exhale (1995)\n',
 '5,Father of the Bride Part II (1995)\n',
 '6,Heat (1995)\n',
 '7,Sabrina (1995)\n',
 '8,Tom and Huck (1995)\n',
 '9,Sudden Death (1995)\n',
 '10,GoldenEye (1995)\n']

Notice that every line has an index, and a \n, which is an escape sequence that adds a new line (or carriage return) to text, often called a ‘newline’ character. We want to filter these out.

movie_years_big = {}  # create the empty dictionary we will fill up

for idx, line in enumerate(lines):  # loop through list of lines read from textfile
    # the following line splits each string into a list of strings based on the position of the comma
    # and replaces the newline character ("\n"), with nothing ("")
    info_we_want = line.split(",")[1].replace("\n", "")
    # the following line splits each string into a list of strings based on the position of the open paranthesis
    # We select the first element of the list as the title [0], and remove the trailing space with strip()
    title = info_we_want.split("(")[0].strip()
    # We select the second element of the list as the year [1], and replace the trailing close paranthesis with nothing
    year = info_we_want.split("(")[1].replace(")", "")
    # Lastly, we populate our dictionary
    movie_years_big[title] = year
len(movie_years_big)
8617