2. Python Fundamentals I#
In this module, we will explore python’s Data Types, Data Structures, and Control Flow. These are the fundamental building blocks on which python programs are created.
Data Types#
Data types define how a computer stores, processes, and interprets data. Understanding data types is critical for programming in python and other languages.
Think of data types as labels that tell a computer how to handle different kinds of information - just like you wouldn’t store soup in a paper bag, computers need to know what they’re working with. The main types are:
Numbers (1, 2, 3.14)
Text (“hello world”)
True/False values (like a light switch)
Lists [1,2,3] to group things together
They matter because they help computers store data efficiently, know how to handle calculations (adding numbers is different from combining text!), and catch mistakes before they happen.
Fundamentals#
There are several fundamental types of objects in python:
bool
: with values True and Falsestr
: for alphanumeric text like “Hello world”int
: for integers like 1, 42, and -5float
: for floating point numbers like 96.8
Containers#
There are several types of ‘continers’ which can store multiple values of the previous data types:
list
: a mutable ordered list of data valuestuple
: an immutable, ordered list of data valuesset
: an unordered mutable collection of unique data valuesdict
: a hash map which permits lookup on the basis of keys and values
We will explore each of these data types and containers in this module.
Checking Types with the type()
function#
We can check the type of any python object by issuing the type()
command and inspecting the output
Casting#
We can convert between objects by calling one of data types with a python object in parenthesis. For example:
my_var = 42
print(my_var)
print(type(my_var))
42
<class 'int'>
my_var_string = str(my_var)
print(my_var_string)
print(type(my_var_string))
# Notice that the `int` and `str` data types print identical values on the command line!
42
<class 'str'>
Strings#
Strings are basically text data in programming - any sequence of characters like “Hello World” or “user@email.com”.
They’re crucial because most of what computers process involves text: usernames, passwords, messages, file names, URLs, and search queries are all strings.
What makes strings special is you can manipulate them easily - split them up, combine them, search through them, or replace parts of them. Think of scanning an email for spam keywords or breaking up a full name into first and last name - that’s all string manipulation.
Strings are a very important data type in all languages. In Python, strings may be quoted several ways:
Construction#
output_file = "output.txt"
triplequotes = """woah! strings can
split lines """
print(triplequotes) # Split onto multiple lines; newline char embedded.
woah! strings can
split lines
Equivalently, with single quotes:
trip_single_quotes = """I am a string too.
I can span multiple lines!"""
print(trip_single_quotes)
I am a string too.
I can span multiple lines!
Quotes in strings#
We construct strings using either single quotes ('this is a string'
), double quotes ("this is also a string"
) or triple quotes ('''yet another string!'''
). Sometimes, we will want to include quotes in a string (say, a paragraph of text with some apostrophes). If the same type of quote that is used to define the string is used within the string, the interpreter will think that is the end of the string:
print('defining a string with a contraction such as you're')
Cell In[5], line 1
print('defining a string with a contraction such as you're')
^
SyntaxError: unterminated string literal (detected at line 1)
We can fix this by mixing quotes:
print("defining a string with a contraction such as you're")
defining a string with a contraction such as you're
This also works for multiple quotes:
print("I went to a restaurant that serves 'breakfast at any time'. So I ordered French Toast during the Renaissance.")
I went to a restaurant that serves 'breakfast at any time'. So I ordered French Toast during the Renaissance.
Alternatively, we can escape the quote with a backslash, i.e. a \
:
print("defining a string with a contraction such as you're")
print("I went to a restaurant that serves 'breakfast at any time'. So I ordered French Toast during the Renaissance.")
defining a string with a contraction such as you're
I went to a restaurant that serves 'breakfast at any time'. So I ordered French Toast during the Renaissance.
We recommend using double quotes " "
to be consistent with PEP8 style
String methods#
There are several built in python methods which operate on strings, and can make manipulating strings much simpler and easier.
Slicing / Indexing#
We can access any singular character, within a string through ‘slicing’, also sometimes called ‘indexing’, by placing square brackets after a variable name:
string_to_index = "this is a string that will be indexed"
string_to_index[0:4]
'this'
In this command, we select all values between the 0th index and the 4th index (non - inclusive). This means we are accessing the characters ‘this’ in the example above, the 0th, 1st, 2nd, and 3rd characters.
The :
tells the computer to select values starting at one index and ending at another. We can also access any individual character:
another_string = "here's another string to index~"
print(another_string[0])
print(another_string[1])
print(another_string[2])
print(another_string[3])
h
e
r
e
You may have observed a very important characteristic here: python is zero indexed meaning that the first value in a string is given the zero index:
Let’s play around with some other indexing commands to get the hang of it:
string_to_index = "this is a string that will be indexed"
print(string_to_index[5:])
print(string_to_index[5:20])
print(string_to_index[-10:])
is a string that will be indexed
is a string tha
be indexed
Note that when slicing, we do not have to supply the 0th or last index, python is able to figure out the start / end indices automatically. The following returns every character in the string, beginning with the 5th character:
string_to_index[5:]
'is a string that will be indexed'
Note that we can also use negative indices, which count backwards from the end of the string. The following returns every character in the string, beginning with the 10th character from the end:
print(string_to_index[-10:])
be indexed
The following diagram summarizes python string indexing and slicing for an example string 'Python'
:
Strings are immutable!#
While we can access any individual character of a string, we cannot ever directly change characters:
string_to_index[10] = "a"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[14], line 1
----> 1 string_to_index[10] = 'a'
TypeError: 'str' object does not support item assignment
String functions#
There are several handy built-in string functions that make manipulating data in strings easier:
Concatenate strings:
+
Split strings based on substring:
split('substr')
Find substring:
find('substr')
Replace substring with another substring
replace('substr1','substr2')
Combining strings (AKA concatennation): Strings can be concatennated using the
+
sign:
"some string " + "another string"
'some string another string'
note that we had to include a trailing space on some string
in order to produce a concatennated string with spaces - the spaces are not added automatically.
Splitting strings: strings can easily be separated based on a character using the
split()
function:
"we will split this string into pieces".split("p")
['we will s', 'lit this string into ', 'ieces']
This returns 3 separate strings, which are broken out by wherever p
occurs in the substring.
Strings can also be separated by a sequence of characters
"we will split um this other string um on the basis um of where the um words occur".split("um")
['we will split ',
' this other string ',
' on the basis ',
' of where the ',
' words occur']
Finding the first index where a character appears in a string with the
find()
function:
"here is a test string for which we will find the first ooccurrence of the letter i".find("i")
5
Replacing characters appearing in a strings with the
replace()
function:
"here is a test string for which we will replace ooccurrence of one letter with another".replace("i", "j")
'here js a test strjng for whjch we wjll replace ooccurrence of one letter wjth another'
"this also works for full words and phrases, pretty neat, huh?".replace("neat", "swell")
'this also works for full words and phrases, pretty swell, huh?'
The len()
function returns the number of characters in the string
test_str = "use len() to count the number of chars in this string"
len(test_str)
53
It’s easy to convert strings between upper and lower case with the string.upper()
and string.lower()
methods”
test_str = "AlTeRnAtInG cAsEs"
print(test_str.lower()) # Converts all chars to lowercase
print(test_str.upper()) # Converts all chars to uppercase
alternating cases
ALTERNATING CASES
String formatting#
It’s often convenient to create strings formatted from a combination of strings, numbers, and other data. In Python 3 this can be handled in two ways: the format string method. E.g.:
name = "Aakash"
course = "py4wrds"
# Prints: My name is Aakash. I am an instructor for py4wrds.
print("My name is {0}. I am an instructor for {1}.".format(name, course))
My name is Aakash. I am an instructor for py4wrds.
Format strings contain “replacement fields” surrounded by curly braces {}
. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output.
If you need to include a brace character in the literal text, it can be escaped by doubling the braces: i.e. use . The number in the braces refers to the order of arguments passed to format.
Numbers don’t need to be specified if the sequence of braces has the same order as arguments:
course = "py4wrds"
number_of_students = 25
print("this course is {}, and {} students are in attendance".format(course, number_of_students))
this course is py4wrds, and 25 students are in attendance
Another way to handle string formatting is with F-strings, where variable names can be inserted directly into the curly braces
import math
r = 4
print(f"The area of a circle of radius {r} is {math.pi * math.pow(r, 2)}")
The area of a circle of radius 4 is 50.26548245743669
To summarize, string formatting is a good way to combine text and numeric data. It’s also how we control the output of floating point numbers:
# Fixed point precision (always uses six significant decimal digits).
print(" {{:f}}: {:f}".format(42.42)) # Prints 42.40000
# General format (knows how to drop trailing zeros in decimals).
print(" {{:g}}: {:g}".format(42.42)) # Prints 42.42
# Exponent (scientific) notation.
print(" {{:e}}: {:e}".format(42.42)) # Prints 4.242000e+01
# We can also specify how many digits of precision we want.
print(" {{:.2e}}: {:.2e}".format(42.42)) # Prints 4.24e+01.
# Or we can specify the width of our output (excluding +/- signs).
print("{{: 8.2e}}: {: 8.2e}".format(42.42)) # Prints total of 8 chars: 4.24e+01
print("{{: 8.2e}}: {: 8.2e}".format(-1.0)) # Prints -1.00e+00
{:f}: 42.420000
{:g}: 42.42
{:e}: 4.242000e+01
{:.2e}: 4.24e+01
{: 8.2e}: 4.24e+01
{: 8.2e}: -1.00e+00
We recommend using F-Strings to be consistent with PEP-8 formatting guidelines
Numbers#
Recall that numbers in python are represented by the int
and float
types. We can perform all standard numerical operations:
x + y
: sum ofx
andy
x - y
: differenceofx
andy
x * y
: productofx
andy
x / y
: quotientofx
andy
x//y
: floored quotient of x and yx % y
: remainder ofx
/y
-x
:x
negated+x
:x
unchangedabs(x)
: absolute value (i.e. magnitude)int(x)
:x
converted to integerfloat(x)
:x
converted to floating pointx ** y
:x
to the powery
A full list of numerical operations can be found in the official python documentation
Complex Numbers#
Python supports complex numbers, with the character j
denoting the imaginary part:
complex_num = 3 + 4j
The real and imaginary parts can be accessed by using .real
and .imag
synyax
print(complex_num.real)
print(complex_num.imag)
3.0
4.0
Numerical Conversions#
Python will automatically convert between numerical types of when performing operations:
float_num = 3.4
int_num = 2
print(float_num + int_num)
5.4
Notice that the integer in the operation was converted to a float type automatically before performing the calculation. This is known as widening
The same type of widening occurs when combining floats/ints with complex numbers:
print(float_num * complex_num)
print(int_num * complex_num)
(10.2+13.6j)
(6+8j)
In python, converting between float
and int
types is also easy:
float_num = 2.7
x = int(float_num)
print(type(x))
<class 'int'>
Notice that floats converted to ints are automatically rounded towards zero :
print(x)
2
We can observe this to be true by testing a negative number:
x = -3.8
int(x)
-3
Converting numbers to/from strings#
Python makes it simple to convert numbers to/from strings. This is especially useful when reading data from a text file:
string_num = "234"
print(int(string_num))
print(type(int(string_num)))
234
<class 'int'>
string_num = "234"
print(float(string_num))
print(type(float(string_num)))
234.0
<class 'float'>
Simiarly, it’s simple to convert from a numerical type to a string, using the str()
constructor`
float_num = 25.0
print(str(float_num))
print(type(str(float_num)))
25.0
<class 'str'>
Note that attempting to concatenate a string with a numerical type will yield an error:
print("stringy string string " + 58)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 1
----> 1 import somepackage
ModuleNotFoundError: No module named 'somepackage'
This is best handled using F-strings or string formatting:
my_int = 456
print(f"stringy string string {58} with fstrings {my_int} ")
stringy string string 58 with fstrings 456
Booleans#
A Boolean or Bool for short, is simply a value of True
or False
. You’ve probably heard of these if you have doen any programming.
Booleans are useful for checking data types, and lots of other things we’ll cover later.
For now, consider the following examples:
my_int_var = 42
type(my_int_var) is int
True
my_int_var = 42
type(my_int_var) is float
False
Notice we are usint the word is
here to check whether the variable is an int
type or a float
type
We can also string together multiple statements using the following Boolean Operators : and
, or
, not
my_int_var = 42
my_float_var = 42.0
# Returns True
type(my_int_var) is int and type(my_float_var) is float
True
# Returns False
type(my_int_var) is int and type(my_int_var) is float
False
# Returns True
print(type(my_int_var) is int or type(my_int_var) is float)
True
# Returns True
type(my_int_var) is not float
True
# Returns False
type(my_int_var) is not int
False
These operators can be combined to form complex expressions and improve the readability and robustness of python programs. We will use these operators and revisit boolean logic througout this course.
If / Else Statements#
Perhaps the most well-known concept in programming is that of an if
statement.
The if
statement is used for conditional execution:
# Try changing the value of `my_value` to get a sense for the `if`, `elif`, and `else` logic.
my_value = 10
if my_value > 0: # If this expression evaluates to True, the following line will be printed.
# If it evaluates to False, we move to the 'elif' statement
print("value is positive")
elif my_value == 0: # If this expression evaluates to True, the following line will be printed.
# If it evaluates to False, we move to the 'else' statement
print("value is zero")
else:
print("value is negative")
value is positive
There can be zero or more elif:
parts, and the else:
part is optional.
The keyword elif
is short for ‘else if’, and is useful to avoid excessive indentation.
An if
… elif
… elif
… sequence is a substitute for the switch
or case
statements found in other languages. See Switch Statement for more info.
Lists#
Lists are the most important and common data type in Python. A list is simply an ordered collection of items. Think of a vector in mathematics, or a grocery list of items to be purchased. In python, we can create lists using the square brackets [ ]
:
list_a = [1, 2, 3, 4, 5]
print(list_a)
list_b = ["a", "b", "c", "d", "e"]
print(list_b)
[1, 2, 3, 4, 5]
['a', 'b', 'c', 'd', 'e']
A list can be comprised of any data type, or multiple data types:
list_of_many_types = ["a", 2, False, 4, "am I a list, or an element?", 17.5, True]
print(list_of_many_types)
['a', 2, False, 4, 'am I a list, or an element?', 17.5, True]
Each item in a list is called an element
We can call the len()
operator to determine the number of elements in a list:
len(list_of_many_types)
7
Accessing elements in lists#
We can access any element of a list using the same style of indexing used to access characters in strings (square brackets after the variable name).
Remember - the first element in a list is index 0:
print(list_of_many_types[0])
print(list_of_many_types[1])
print(list_of_many_types[2])
print(list_of_many_types[4])
a
2
False
am I a list, or an element?
We can also index lists negatively, similar to how we were able to access characters in strings:
print(list_of_many_types[-1])
print(list_of_many_types[-2])
True
17.5
Slicing#
We can access a subset of a list using Slicing in the same way that we are able to access substrings:
list_of_many_types = ["a", 2, False, 4, "am I a list, or an element?", 17.5, True]
list_of_many_types[2:5]
[False, 4, 'am I a list, or an element?']
We call the [2:5]
a slice, which also returns a list of with elements at positions 2,3, and 4 in the original list.
We can also slice to presesrve the start or end of the list, by keeping one side of the colon blank:
# Start with third element, go to end.
print(list_of_many_types[2:])
# Extract elements indexed at positions 0, 1, and 2.
list_of_many_types[:3]
[False, 4, 'am I a list, or an element?', 17.5, True]
['a', 2, False]
Let’s say we want to slice a list and select every k
th entry - we can add an extra set of parantheses to the slicing syntax to set the ‘step’:
list_of_squares = [1, 4, 9, 16, 25, 36, 49, 64, 81]
# slice the list starting at 2nd element, ending at 9th element, selecting every 3rd element
list_of_squares[1:9:3]
[4, 25, 64]
List operations#
The +
operator combines lists. This is also called concatenation.
list_one = ["a", 4, "one", 3.4]
list_two = ["b", 6, "two", 7.8]
print(list_one + list_two)
['a', 4, 'one', 3.4, 'b', 6, 'two', 7.8]
The *
operator can be used to repeat lists:
list_one * 2
['a', 4, 'one', 3.4, 'a', 4, 'one', 3.4]
2 * list_two
['b', 6, 'two', 7.8, 'b', 6, 'two', 7.8]
Lists are mutable#
Lists can be modified ‘in place’, meaning that individual elements can be changed
list_one = ["a", 4, "one", 3.4]
print(list_one)
['a', 4, 'one', 3.4]
# assign a new value to index -1, i.e. replace float 3.4 with string "last element"
list_one[-1] = "last element"
print(list_one)
['a', 4, 'one', 'last element']
We can also assign slices:
list_one[1:3] = [2, 3]
print(list_one)
['a', 2, 3, 'last element']
Copying lists#
Let’s attempt to copy a list referenced by variable a
to another variable b
:
a = ["a", "b", "c"] # define list a
b = a # attempt to copy a to b
b[1] = 1 # changing an element in list b
print(b) # confirming the change was made to the second element in list b
print(a) # but HOLD UP! we also modified the second element in list a !
['a', 1, 'c']
['a', 1, 'c']
This is very interesting - we modified list b, but that caused list a to change.
Let’s try another way to copy the list:
a = ["a", "b", "c"]
b = a # first attempt to copy a to b
c = list(a) # second attempt to copy a to c using the list constructor
d = a[:]
b[1] = 1 # now we want to change an element in b
print("a: ", a) # list 'a' got modified (unintentionally).
print("b: ", b) # list 'b' of course got modified (intentionally).
print("c: ", c) # list 'c' got preserved, success!
print("d: ", d) # list 'd' got preserved, another success!
a: ['a', 1, 'c']
b: ['a', 1, 'c']
c: ['a', 'b', 'c']
d: ['a', 'b', 'c']
What do we observe? A list can be copied with b = list(a) or b = a[:]. The second option is a slice including all elements! Why does python work this way?
Python’s data model#
Why do we observe the behavior in the previous example?
Variables in Python are references to objects in memory.
Assignment with the = operator sets the variable to refer to an object.
Here is a simple example:
a = [1, 2, 3, 4]
b = a
b[1] = "modified"
print(a)
print(b)
[1, 'modified', 3, 4]
[1, 'modified', 3, 4]
Notice that list a
and list b
are identical - In this example, we assigned a
to b
via b = a
.
This did not copy the data, it only copied the reference to the list object in memory.
Thus, modifying the list through object b
also changes the data that you will see when accessing object a
.
We can inspect the memory addresses in Python with the id
command:
print("id(a): ", id(a)) # These two variables...
print("id(b): ", id(b)) # ...both refer to the same object.
id(a): 4707089920
id(b): 4707089920
Those numbers refer to memory addresses on the computer.
Copying objects and data#
So how do we copy objects and data generally, without having to call the appropriate constructor every time?
The copy
function in the copy
module is a generic way to copy a list:
import copy
a = [1, 2, 3, "abc"]
b = copy.copy(a)
b[3] = "xyz"
print(b) # Variable b has its last element replaced.
print(a) # Variabla a is unchanged, like we hoped.
[1, 2, 3, 'xyz']
[1, 2, 3, 'abc']
Elements in a list (the individual items) are also references to memory location. For example if your list contains a list, this will happen when using copy.copy():
sublist = [1, 2, 3]
a = [2, "string", sublist]
b = copy.copy(a)
b[2][0] = 5555
print(b) # We modified the nested list...
print(a) # ...which also affected the contents of variable 'a'.
[2, 'string', [5555, 2, 3]]
[2, 'string', [5555, 2, 3]]
What just happened? The element for the sub-list [5555, 2, 3] is actually a memory reference - when we copy the outer list, only references for the contained objects are copied. Therefore, modifying the copy b
modifies the original a
. Thus, we may need the function copy.deepcopy()
:
sublist = [1, 2, 3]
a = [2, "string", sublist]
b = copy.deepcopy(a)
b[2][0] = 5555
print(b) # Variable b had its third element modified.
print(a) # But we can observe here that variable a is unchanged
[2, 'string', [5555, 2, 3]]
[2, 'string', [1, 2, 3]]
Sorting lists#
Sorting lists in python is easy and can be very useful for lots of reasons.
my_list = list(range(10))
print(my_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
import random
random.shuffle(my_list)
print(my_list)
[7, 5, 2, 4, 6, 1, 0, 8, 9, 3]
The command random.shuffle
sorts the list in place, meaning the original list is modified. If we want to create a new list that is sorted, we can use the sorted
function:
my_sorted_list = sorted(my_list)
print(my_sorted_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
we can also sort in place using the .sort()
function:
my_list.sort()
print(my_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
List operations#
The following is a summary of operations that can be performed on lists:
x
refers to an element, s
refers to a list, and n
refers to any integer
x in s
: returnsTrue
if an item ofs
is equal tox
, elseFalse
x not in s
: returns False if an item ofs
is equal tox
, elseTrue
s + t
, wheret
is another list: the concatenation ofs
andt
.s * n
orn * s
: equivalent to addings
to itselfn
timess[i]
:i
th item ofs
, with start index of 0s[i:j]
: slice ofs
from indexi
toj
s[i:j:k]
: slice ofs
from indexi
to indexj
with stepk
len(s)
: length ofs
min(s)
: smallest item ofs
max(s)
: largest item ofs
s.index(x)
: index of the first occurrence ofx
ins
s.count(x)
: total number of occurrences ofx
ins
s[i] = x
: assign element with indexi
in lists
a new value ofx
In the examples below, x
refers to an element, s
refers to a list, and t
refers to another list
s[i:j] = t
: slice ofs
fromi
toj
is replaced by the contents of another listt
del s[i:j]
: remove elements between indexi
andj
froms
s[i:j] = []
: remove elements between indexi
andj
froms
s[i:j:k] = t
: the elements ofs[i:j:k]
are replaced by those of another listt
del s[i:j:k]
: removes the elements ofs[i:j:k]
from the lists
s.append(x)
: appendsx
to the end of the existing lists
s.clear()
: removes all items froms
(equivalent todel s[:]
)s.copy()
: creates a shallow copy ofs
(equivalent tos[:]
)s.extend(t)
ors += t
: extendss
with the contents of t (for the most part the same ass[len(s):len(s)] = t
)s *= n
: updatess
with its contents repeatedn
timess.insert(i, x)
: insertsx
intos
at the index given byi
(equivalent tos[i:i] = [x]
)s.pop([i])
: retrieves the item ati
and also removes it froms
s.remove(x)
: remove the first item froms
wheres[i] == x
s.reverse()
: reverses the items ofs
in places.sort()
: sorts the items ofs
in place
Lists are the most important and common data type in Python, so understanding their construction and manipulation is important.
The following are helpful resources about lists:
Python documentation, accessible by executing the command
help(list)
help(list)
Help on class list in module builtins:
class list(object)
| list(iterable=(), /)
|
| Built-in mutable sequence.
|
| If no argument is given, the constructor creates a new empty list.
| The argument must be an iterable if specified.
|
| Methods defined here:
|
| __add__(self, value, /)
| Return self+value.
|
| __contains__(self, key, /)
| Return bool(key in self).
|
| __delitem__(self, key, /)
| Delete self[key].
|
| __eq__(self, value, /)
| Return self==value.
|
| __ge__(self, value, /)
| Return self>=value.
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __getitem__(self, index, /)
| Return self[index].
|
| __gt__(self, value, /)
| Return self>value.
|
| __iadd__(self, value, /)
| Implement self+=value.
|
| __imul__(self, value, /)
| Implement self*=value.
|
| __init__(self, /, *args, **kwargs)
| Initialize self. See help(type(self)) for accurate signature.
|
| __iter__(self, /)
| Implement iter(self).
|
| __le__(self, value, /)
| Return self<=value.
|
| __len__(self, /)
| Return len(self).
|
| __lt__(self, value, /)
| Return self<value.
|
| __mul__(self, value, /)
| Return self*value.
|
| __ne__(self, value, /)
| Return self!=value.
|
| __repr__(self, /)
| Return repr(self).
|
| __reversed__(self, /)
| Return a reverse iterator over the list.
|
| __rmul__(self, value, /)
| Return value*self.
|
| __setitem__(self, key, value, /)
| Set self[key] to value.
|
| __sizeof__(self, /)
| Return the size of the list in memory, in bytes.
|
| append(self, object, /)
| Append object to the end of the list.
|
| clear(self, /)
| Remove all items from list.
|
| copy(self, /)
| Return a shallow copy of the list.
|
| count(self, value, /)
| Return number of occurrences of value.
|
| extend(self, iterable, /)
| Extend list by appending elements from the iterable.
|
| index(self, value, start=0, stop=9223372036854775807, /)
| Return first index of value.
|
| Raises ValueError if the value is not present.
|
| insert(self, index, object, /)
| Insert object before index.
|
| pop(self, index=-1, /)
| Remove and return item at index (default last).
|
| Raises IndexError if list is empty or index is out of range.
|
| remove(self, value, /)
| Remove first occurrence of value.
|
| Raises ValueError if the value is not present.
|
| reverse(self, /)
| Reverse *IN PLACE*.
|
| sort(self, /, *, key=None, reverse=False)
| Sort the list in ascending order and return None.
|
| The sort is in-place (i.e. the list itself is modified) and stable (i.e. the
| order of two equal elements is maintained).
|
| If a key function is given, apply it once to each list item and sort them,
| ascending or descending, according to their function values.
|
| The reverse flag can be set to sort in descending order.
|
| ----------------------------------------------------------------------
| Class methods defined here:
|
| __class_getitem__(object, /)
| See PEP 585
|
| ----------------------------------------------------------------------
| Static methods defined here:
|
| __new__(*args, **kwargs)
| Create and return a new object. See help(type) for accurate signature.
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __hash__ = None
For loops and while loops#
Loops are another foundational concept in python programming. They are useful for repeating actions.
There are two types of loops: for
and while
.
Let’s say we want to compute the square of each integer between 1 and 8. We can use a for
loop to make this much more efficient than computing each square individually:
for i in range(9):
print(i**2)
0
1
4
9
16
25
36
49
64
the syntax for i in range(n):
will execute the indented code print (i**2)
a total of n
times, with i = 0, 1, 2, ... n-1
An important note on python syntax#
Indents in python code matter, unlike in C, C++, or Java. Indents are actually a form of syntax in python.
Python requires indenting loops and conditionals (we will get to these a bit later). Indenting can be achieved using either one TAB or four SPACEBAR keystrokes. There is a lively debate over which of these is preferred. Text editors now make it easy to switch between tabs an spaces - the important thing is to remain consistent.
The range()
function#
Let’s say you want to make a list of numbers - the range()
function is a handy and convenient way to do that. We can convert a range()
object to a python list by using the list()
constructor:
# Get a range 0,1,2,...,7
print(list(range(8)))
[0, 1, 2, 3, 4, 5, 6, 7]
# Get a range from 5 to 9
print(list(range(5, 10)))
[5, 6, 7, 8, 9]
# Get a range from 4 to 16 with a step of 4
print(list(range(4, 16, 4)))
[4, 8, 12]
For loops and Lists#
We can use a for loop to iterate over items in a list:
my_list = [1, 41.99, True, "this is a string", ["sub", "list", "of", "strings"]]
for item in my_list:
print(item)
1
41.99
True
this is a string
['sub', 'list', 'of', 'strings']
The enumerate()
function#
When iterating through a for loop, if we want to access the index of each item in the list, along with the item, we can use the enumerate()
function:
my_list = [1, 41.99, True, "this is a string", ["sub", "list", "of", "strings"]]
for index, item in enumerate(my_list):
print("index {}: {}".format(index, item))
index 0: 1
index 1: 41.99
index 2: True
index 3: this is a string
index 4: ['sub', 'list', 'of', 'strings']
While loops#
If we don’t know how many iterations are needed, we can use a while
loop:
i = 2
while i < 100: # loop body will only execute if conditional statement is True
print(i**2, end=", ")
i = i**2
4, 16, 256,
This loop prints the values of \(i^2\) and updates the value of \(i\), while i is <100. Notice the loop stops since 256 > 100
The infinite loop#
Beware the infamous infinite loop! Consider the following example, where the condition will never become false:
NOTE to stop this, you will have to press CTRL + c in the interpreter, or the “STOP” button in jupyter lab / notebook
# Uncomment the below lines to run, but beware that this code will execute forever unless stopped.
# while True:
# print("infinite loop")
Nested loops#
We can put loops within loops - these are called ‘nested loops’
Consider the following example:
for i in range(8): # outer loop
for j in range(i): # inner loop
print(j, end=" ") # inner loop body
print(" ") # outer loop body
0
0 1
0 1 2
0 1 2 3
0 1 2 3 4
0 1 2 3 4 5
0 1 2 3 4 5 6
Continue and Break#
We can exit a loop, or move to the next iteration if desired.
break
is used to exit the loop immediately, skipping any remaining iterations.continue
is used to skip the current iteration and move to the next one.
# Using break to find prime numbers
max_n = 10
for n in range(2, max_n):
for x in range(2, n):
if n % x == 0: # modulo of n divisible by x is zero
print(n, "equals", x, "*", n / x)
break
else: # executed if no break in for loop
# loop fell through without finding a factor
print(n, "is a prime number")
2 is a prime number
3 is a prime number
4 equals 2 * 2.0
5 is a prime number
6 equals 2 * 3.0
7 is a prime number
8 equals 2 * 4.0
9 equals 3 * 3.0
# Using continue to skip even numbners
for num in range(10):
if num % 2 == 0: # Skip even numbers
continue
print(num)
1
3
5
7
9
An(other) note on loop syntax best practices#
It is bad practice to define a variable inside of a conditional or loop body and then reference it outside:
name = "Nick"
if name == "Nick":
age = 45 # newly created variable
print("Nick's age is {}".format(age))
Nick's age is 45
name = "Bob"
if name == "Nick":
id_number = 45 # also newly created variable
print("{}'s id number is {}".format(name, id_number))
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[86], line 5
2 if name == "Nick":
3 id_number = 45 # also newly created variable
----> 5 print("{}'s id number is {}".format(name, id_number))
6 # NameError: name 'id_number' is not defined
NameError: name 'id_number' is not defined
Good practice is to define/initialize variables at the same level they will be used:
name = "Bob"
age = 55
if name == "Nick":
age = 45
print(f"{name}'s age is {age}")
Bob's age is 55
Tuples#
Tuples are essentially immutable lists, meaning they cannot be modified. Elements in a list can be heterogenous.
Use Case:
You have a set of (possibly non-unique or heterogeneous) data that you wish to remain ordered and immutable for the lifetime of the object.
E.g. we may expect that a person’s name should be invariant over the course of a peron’s lifetime, and therefore may choose to represent first and last names via a length-two tuple.
Tuples are denoted by parantheses: tup = (1,2,3)
.#
Create an empty tuple:
empty_tup = ()
, or equivalentlyempty_tup = tuple()
.Create a tuple with some data:
tup = (42, 3.14, "abc")
.
Values within tuples can be of any type.
Properties of Tuples#
Immutability - The number of items in a tuple, or what those fixed number of items reference, are not allowed to change after creation. Put differently, the size and type signature of a tuple are fixed at the time of creation.
Order matters, and is defined at construction time - Since tuples are immutable, their elements are ordered. We can access items via indexing and slicing, similar to lists.
Similrities with Lists: Indexing, Slicing, and Looping#
my_tuple = (1, 2, 3, 4)
print(my_tuple[1]) # Prints the second value
print(my_tuple[1:3]) # Prints the second and third values
2
(2, 3)
Similar to lists, we can also loop through tuples:
for element in my_tuple:
print(element)
1
2
3
4
Immutability:#
# Lists are mutable
my_list = ["I", "am", "a", "list"]
my_list[0] = "I still" # Modifying elements of a list is OK.
print("my_list:", my_list) # --> my_list: ["I still", "am", "a", "list"].
# Tuples are not mutable (immutable)
my_tuple = ("I", "am", "a", "tuple")
my_list: ['I still', 'am', 'a', 'list']
my_tuple[2] = "still a"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[90], line 8
6 # Tuples are not mutable (immutable)
7 my_tuple = ("I", "am", "a", "tuple")
----> 8 my_tuple[2] = "still a"
TypeError: 'tuple' object does not support item assignment
However, keep in mind that if a tuple contains a list element, that can be modified, since lists ARE mutable:
my_tuple = ("I", "am", "a", "tuple", ["containing", "a", "sub", "list"])
my_tuple[4][1] = "a mutable"
print(my_tuple)
('I', 'am', 'a', 'tuple', ['containing', 'a mutable', 'sub', 'list'])
Unpacking Tuples#
Python offers a convenient way to “unpack” tuples.
In tuple unpacking, we can store the elements of a tuple into multiple variables in one line of code:
# Define tuple, and unpack into 3 variables
my_tuple = ("a string", 43, 99.9)
my_str, my_int, my_flt = my_tuple
print(my_str, my_int, my_flt)
# Equivalently:
my_str = my_tuple[0]
my_int = my_tuple[1]
my_flt = my_tuple[2]
print(my_str, my_int, my_flt)
a string 43 99.9
a string 43 99.9
Sets#
In Python, a set is an unordered, mutable collection of unique items.
Unlike a list, a set cannot contain any duplicates. A great use case for a Set would be Checking for Existence of Usernames.
We care about existence, and not duplicity;
testing for existence is “fast”. E.g. When a new-user signs up for a service, we want to check if their proposed username is available instantly.
Sets are constructed with curly braces: my_set = {5, 8, "str", 49.2}
, or with the set()
constructor: my_set = set([1, 2, 3])
We can add items to a set with the .add()
method
# create and update using add method
myclasses = set()
myclasses.add("math")
myclasses.add("chemistry")
myclasses.add("literature")
# create using {} notation
yourclasses = {"physics", "gym", "math"}
Set Operations#
We can easily compute the Union or the Intersection between multiple sets, using the &
and |
characters respectively:
print(myclasses & yourclasses) # Returns only elements in either myclasses AND yourclasses
print(myclasses | yourclasses) # Returns elements found in either myclasses OR yourclasses
{'math'}
{'chemistry', 'literature', 'physics', 'gym', 'math'}
Example: finding unique elements in a list:#
Say we have a ‘shopping basket’ with lots of (non unique) items, we can use set()
to identify the unique elements:
grocery_list = ["apples", "bananas", "yogurt", "apples", "milk", "bananas", "yogurt", "granola", "bread"]
print(set(grocery_list))
{'apples', 'yogurt', 'bananas', 'granola', 'milk', 'bread'}
Set Methods#
we saw that we could append an element to a set with the .add()
method. There are many other useful operations, such as .remove()
,.intersection()
,.difference()
, etc… Have a look at the official documentation for a comprehensive list.
Or, simply type help(set)
into the python interpreter.
Dictionaries#
Dictionaries relate a key
to a value
.
Think of a real-world dictionary of word definitions. They unique key
is a word, the value
associated with the key is the definition of the word.
In Python, this is represented with the built-in dict
or Dictionary type.
They are commonly called ‘maps’, ‘hashes’, ‘key-value pairs’, and ‘associative arrays’, in other programming languages and in mathematics.
Example use case: Phone book / cell phone contacts:#
You want to repeatedly “look up” values that are associated with identifiers. E.g. your contacts list, which might pair a name (the key
) with a phone-number (the value
). Want to look up a friends phone number? Just type their name (the key
) and you get their phone number (the value
) instantly.
Dictionary Syntax:#
Dictionaries can be created using curly braces
{}
, or with thedict()
constructor:Create an empty dictionary:
empty_dict = {}
, or equivalently,empty_dict = dict{}
Create a dictionary with data:
emails={'aakash':'aaahamed@calpoly.edu', 'andrew':'andrew@ajnisbet.com'}
Keys must be immutable - e.g. numbers, strings, or tuples
Values can be any python data type - e.g. lists, strings, or even other dictionaries
Access we can access values associated with a key with square brackets: value = dictionary[key]
# One example to initialize and populate a dictionary
emails = {}
emails["aakash"] = "aaahamed@calpoly.edu"
emails["andrew"] = "andrew@ajnisbet.com"
print(emails)
{'aakash': 'aaahamed@calpoly.edu', 'andrew': 'andrew@ajnisbet.com'}
# Another equivalent example to initialize a dictionary
emails = {"aakash": "aaahamed@calpoly.edu", "andrew": "andrew@ajnisbet.com"}
print(emails)
{'aakash': 'aaahamed@calpoly.edu', 'andrew': 'andrew@ajnisbet.com'}
# Another equivalent example to initialize and populate a dictionary
emails = dict()
emails["aakash"] = "aaahamed@calpoly.edu"
emails["andrew"] = "andrew@ajnisbet.com"
print(emails)
{'aakash': 'aaahamed@calpoly.edu', 'andrew': 'andrew@ajnisbet.com'}
We can easily access a single element of a dictionary by using bracket notation, and placing the key inside the brackets:
emails["andrew"]
'andrew@ajnisbet.com'
Equivalently, we can access a value using the .get(key)
syntax:
emails.get("andrew")
'andrew@ajnisbet.com'
If the key is not contained within our dict
, we receive a KeyError
emails["payman"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[101], line 1
----> 1 emails['payman']
KeyError: 'payman'
Let’s walk through a more comprehensive example to illustrate some of the usefulness of dictionaries:
# Initialize an empty dictionary
movie_years = {} # Equivalently: movie_years = dict()
# Populate dictionary with movie names (keys) and release years (values)
movie_years["Fargo"] = 1996
movie_years["Fellowship of the Ring"] = 2001
movie_years["The Departed"] = 2006
movie_years["Godfather Part 1"] = 1972
movie_years["Dune"] = 2021
movie_years["Monty Python and the Holy Grail"] = 1975
movie_years["2001 a Space Odyssey"] = 1968
movie_years["A Clockwork Orange"] = 1972
print(movie_years)
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968, 'A Clockwork Orange': 1972}
We can get determine the number of keys in the dictionary with the len()
function:
len(movie_years)
8
Dicts are mutable - we can change the value associated with a key:
movie_years["Fargo"] = 2008
print(movie_years)
print("-----" * 20)
movie_years["Fargo"] = 1996
print(movie_years)
{'Fargo': 2008, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968, 'A Clockwork Orange': 1972}
----------------------------------------------------------------------------------------------------
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968, 'A Clockwork Orange': 1972}
Dicts are mutable - we can also delete a key:value pair if desired by using the del
command and supplying the dict key
to be deleted:
del movie_years["A Clockwork Orange"]
print(movie_years)
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968}
Accessing Keys and Values#
We can access the dictionary keys and values as a list if desired using the following syntax:
names = movie_years.keys()
print(names)
years = movie_years.values()
print(years)
dict_keys(['Fargo', 'Fellowship of the Ring', 'The Departed', 'Godfather Part 1', 'Dune', 'Monty Python and the Holy Grail', '2001 a Space Odyssey'])
dict_values([1996, 2001, 2006, 1972, 2021, 1975, 1968])
It’s often convenient to convert these to python lists by using the list constructor:
names = list(movie_years.keys())
print(names)
print(type(names))
years = list(movie_years.values())
print(years)
print(type(years))
['Fargo', 'Fellowship of the Ring', 'The Departed', 'Godfather Part 1', 'Dune', 'Monty Python and the Holy Grail', '2001 a Space Odyssey']
<class 'list'>
[1996, 2001, 2006, 1972, 2021, 1975, 1968]
<class 'list'>
If we have two lists of the same length, and want to create a dictionary, using one list as keys, and the other list as values, we can use the zip()
function within the dict()
constructor to achieve this.
another_movie_year_dict = dict(zip(names, years))
print(another_movie_year_dict)
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968}
Looping Through Dictionaries#
We can loop through the keys of a dictionary, similar to how we can loop through the elements of a list:
for movie in movie_years:
print(movie)
Fargo
Fellowship of the Ring
The Departed
Godfather Part 1
Dune
Monty Python and the Holy Grail
2001 a Space Odyssey
We can also loop through the values:
for year in movie_years.values():
print(year)
1996
2001
2006
1972
2021
1975
1968
If we want to loop through both the keys and values, we can use the .items()
syntax:
for key, value in movie_years.items():
print(f"{key} was released in {value}")
Fargo was released in 1996
Fellowship of the Ring was released in 2001
The Departed was released in 2006
Godfather Part 1 was released in 1972
Dune was released in 2021
Monty Python and the Holy Grail was released in 1975
2001 a Space Odyssey was released in 1968
Dictionaries are ordered#
Since python 3.7, dictionaries are ordered: when looping through a dictionary, the order of the keys/values will be the same order that they were added.
Saving and Loading Dicts#
It is often convenient to write and read dictionaries. This can be achieved easily using the json
module (more on modules later). .json
is also a file format that mirrors dictionary format.
import json
with open("movie_years.json", "w") as fp:
json.dump(movie_years, fp)
Notice that a file with the name movie_years.json
was written to the directory where this command executed.
It is equally easy to load the file we wrote as a dictionary:
with open("movie_years.json", "r") as f:
my_other_movie_dict = json.load(f)
print(my_other_movie_dict)
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968}
Comprehensions#
Python has a special feature known as ‘comprehensions’, in which operations can be applied to each element in a container (list, dictionary, tuple, or set), without using a for
loop. These expressions are very common in python code.
All comprehensions can be equivalently expressed as for
loops. The key thing to remember is that comprehensions can implement for
loops in a more concise and compact syntax. While this may sometimes be convenient, it can also be less readable than a for
loop.
Consider our movie_years
dictionary. Perhaps we want to increment each of the release years by one year and store the result in a dict called movie_years_updated
. We can achieve this with a for loop:
movie_years_updated = {}
for k, v in movie_years.items():
movie_years_updated[k] = v + 1
print(movie_years)
print(movie_years_updated)
{'Fargo': 1996, 'Fellowship of the Ring': 2001, 'The Departed': 2006, 'Godfather Part 1': 1972, 'Dune': 2021, 'Monty Python and the Holy Grail': 1975, '2001 a Space Odyssey': 1968}
{'Fargo': 1997, 'Fellowship of the Ring': 2002, 'The Departed': 2007, 'Godfather Part 1': 1973, 'Dune': 2022, 'Monty Python and the Holy Grail': 1976, '2001 a Space Odyssey': 1969}
The same result can be achieved with a ‘dictionary comprehension’ with the following:
{k: v + 1 for k, v in movie_years.items()}
{'Fargo': 1997,
'Fellowship of the Ring': 2002,
'The Departed': 2007,
'Godfather Part 1': 1973,
'Dune': 2022,
'Monty Python and the Holy Grail': 1976,
'2001 a Space Odyssey': 1969}
Notice that we use the curly braces {}
to denote a dictionary result. We can ue the regular brackets []
to return a list:
[v + 5 for v in movie_years.values()]
[2001, 2006, 2011, 1977, 2026, 1980, 1973]
Reading and manipulating text files#
Let’s synthesize some of the materials we covered in this chapter, and use python tools to:
open a text file containing movie names and years, using
with open(filename):
loop through the rows, extract the information we want using string operations, and store that information in a python object
write the resulting objects to a file
To open text files, we can use the with open(filename):
syntax:
lines = []
with open("./data/movies.txt", "r", encoding = 'utf-8') as file: # the 'r' means read only
lines = file.readlines() # read each line of the textfile into a list element
file.close() # close the file
We now have each line of the textfile stored as an element in a list called lines
. Let’s loop through the list, and create a dictionary like movie_years
that we can write to a .json
file.
# Print the first 10 lines in the file we read
lines[:10]
['1,Toy Story (1995)\n',
'2,Jumanji (1995)\n',
'3,Grumpier Old Men (1995)\n',
'4,Waiting to Exhale (1995)\n',
'5,Father of the Bride Part II (1995)\n',
'6,Heat (1995)\n',
'7,Sabrina (1995)\n',
'8,Tom and Huck (1995)\n',
'9,Sudden Death (1995)\n',
'10,GoldenEye (1995)\n']
Notice that every line has an index, and a \n
, which is an escape sequence that adds a new line (or carriage return) to text, often called a ‘newline’ character. We want to filter these out.
movie_years_big = {} # create the empty dictionary we will fill up
for idx, line in enumerate(lines): # loop through list of lines read from textfile
# the following line splits each string into a list of strings based on the position of the comma
# and replaces the newline character ("\n"), with nothing ("")
info_we_want = line.split(",")[1].replace("\n", "")
# the following line splits each string into a list of strings based on the position of the open paranthesis
# We select the first element of the list as the title [0], and remove the trailing space with strip()
title = info_we_want.split("(")[0].strip()
# We select the second element of the list as the year [1], and replace the trailing close paranthesis with nothing
year = info_we_want.split("(")[1].replace(")", "")
# Lastly, we populate our dictionary
movie_years_big[title] = year
len(movie_years_big)
8617