#
CMU 15-112: Fundamentals of Programming and Computer Science

Class Notes: Sets

**Quick Example**

s = set([2,3,5]) print(3 in s) # prints True print(4 in s) # prints False for x in range(7): if (x not in s): print(x) # prints 0 1 4 6**Creating Sets**

**Create an empty set**s = set() print(s) # prints set()**Create a set from a list**s = set(["cat", "cow", "dog"]) print(s) # prints {'cow', 'dog', 'cat'}**Create a set from any iterable object**s = set("wahoo") print(s) # surprised?**Create a statically-allocated set**s = { 2, 3, 5 } print(s) # prints { 2, 3, 5 }**Caution: { } is not an empty set!**s = { } print(type(s) == set) # False! print(type(s)) # This is a dict (we'll learn about those soon)

**Properties of Sets****Sets are Unordered**

s = set([2,4,8]) print(s) # prints {8, 2, 4} in standard Python (though not in brython) for element in s: # prints 8, 2, 4 print(element)**Elements are Unique**

s = set([2,2,2]) print(s) # prints {2} print(len(s)) # prints 1**Elements Must Be Immutable**

a = ["lists", "are", "mutable"] s = set([a]) # TypeError: unhashable type: 'list' print(s)

**Another example:**s1 = set(["sets", "are", "mutable", "too"]) s2 = set([s1]) # TypeError: unhashable type: 'set' print(s)**Sets are Very Efficient**

Sets are much more efficient than lists. It's possible to check whether an element appears in a set in constant time, while it takes more time to find an element in a list based on how large the list is. This is accomplished with an approach called**hashing**.

A hash function takes a value as input and returns an integer. The function should return the same integer each time its called on a given value, and should generally return different integers for different values, though that does not always need to be the case. We actually don't need to build the hash function ourselves; python has one already, a built-in function called**hash**.

The computer stores items in a set by creating a list of some length n, then choosing indexes in the list where each element in the set can be stored. These indexes are calculated as**hash(element) % n**. Then, when we need to check whether an item exists in a set, we don't need to check every possible index; we only need to check the one computed by our formula!

A practical example of how sets are faster than lists is shown below:

# 0. Preliminaries import time n = 1000 # 1. Create a list [2,4,6,...,n] then check for membership # among [1,2,3,...,n] in that list. # don't count the list creation in the timing a = list(range(2,n+1,2)) print("Using a list... ", end="") start = time.time() count = 0 for x in range(n+1): if x in a: count += 1 end = time.time() elapsed1 = end - start print("count=", count," and time = %0.4f seconds" % elapsed1) # 2. Repeat, using a set print("Using a set.... ", end="") start = time.time() s = set(a) count = 0 for x in range(n+1): if x in s: count += 1 end = time.time() elapsed2 = end - start print("count=", count," and time = %0.4f seconds" % elapsed2) print("With n=%d, sets ran about %0.1f times faster than lists!" % (n, elapsed1/elapsed2)) print("Try a larger n to see an even greater savings!")

**Set Operations**

Set operations are provided via operators, functions, and methods in Python as follows:**Operations on a set**

**Operation****Result****Example**len(s) cardinality (size) of set s s = { 2, 3, 2, 4, 3 } print(len(s))s.copy() new set with a shallow copy of s s = { 1, 2, 3 } t = s.copy() s.add(4) print(s) print(t)s.clear() remove all elements from set s s = { 1, 2, 3 } s.clear() print(s, len(s))**Operations on a set and an element**

**Operation****Result****Example**x in s test x for membership in s s = { 1, 2, 3 } print(0 in s) print(1 in s)x not in s test x for non-membership in s s = { 1, 2, 3 } print(0 not in s) print(1 not in s)s.add(x) add element x to set s s = { 1, 2, 3 } print(s, 4 in s) s.add(4) print(s, 4 in s)s.remove(x) remove x from set s; raises KeyError if not present s = { 1, 2, 3 } print(s, 3 in s) s.remove(3) print(s, 3 in s) s.remove(3) # crashess.discard(x) removes x from set s if present s = { 1, 2, 3 } print(s, 3 in s) s.discard(3) print(s, 3 in s) s.discard(3) # does not crash! print(s, 3 in s)**Operations on two sets (or a set and an iterable)**

**Operation****Equivalent****Result****Example**s.issubset(t) s <= t test whether every element in s is in t print({1,2} <= {1}, {1,2}.issubset({1})) print({1,2} <= {1,2}, {1,2}.issubset({1,2})) print({1,2} <= {1,2,3}, {1,2}.issubset({1,2,3}))s.issuperset(t) s >= t test whether every element in t is in s print({1,2} >= {1}, {1,2}.issuperset({1})) print({1,2} >= {1,2}, {1,2}.issuperset({1,2})) print({1,2} >= {1,2,3}, {1,2}.issuperset({1,2,3}))s.union(t) s | t new set with elements from both s and t print({1,2} | {1}, {1,2}.union({1})) print({1,2} | {1,3}, {1,2}.union({1,3})) s = {1,2} t = s | {1,3} print(s, t)s.intersection(t) s & t new set with elements common to s and t print({1,2} & {1}, {1,2}.intersection({1})) print({1,2} & {1,3}, {1,2}.intersection({1,3})) s = {1,2} t = s & {1,3} print(s, t)s.difference(t) s - t new set with elements in s but not in t print({1,2} - {1}, {1,2}.difference({1})) print({1,2} - {1,3}, {1,2}.difference({1,3})) s = {1,2} t = s - {1,3} print(s, t)s.update(t) s |= t modify s adding all elements found in t s = {1,2} t = {1,3} u = {2,3} s.update(u) t |= u print(s, t, u)

**Some Worked Examples Using Sets**