Big O Notation, Part 2
Prework
In preparation for the Big O Notation (Part 2) class, please review the Big O Notation class from Mod 3. It’s not necessary to watch the entire video, but it can be helpful to skim the highlights, including the differences between time complexity and space complexity and some of the common notation equations.
Learning Goals
- Review Big O and the differences between time and space complexity
- Review some common Big O Notation equations while adding a couple more
- Practice a few technical challenges at a high level and calculate the Big O Notation of each solution
Vocabulary
Algorithm
A set of rules and processes for solving a problem
Big-O Notation
A description of the worst-case performance of an algorithm
Introduction
We’re going to review some of the Big O Notation lesson from Mod 3, examine some additional terminology, and work through more interview challenges to practice high-level design and Big O analysis.
Big O Notation is one of those topics where you’ll stand out from other candidates applying for a job, especially if you can VOLUNTEER Big O Notation terminology during an interview without being prompted.
Warm Up (Breakout Groups)
- What is Big O? What is it describing?
- Explain the differences between “time complexity” and “space complexity”.
- Which optimization should be prioritized?
Reviewing The Above
What is Big O and what does it describe?
- Big O notation is used in Computer Science to describe the performance or complexity of an algorithm.
- Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.
What is Time Complexity?
- The time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the input (i.e., n).
What is Space Complexity?
- The space complexity of an algorithm quantifies the amount of space or memory taken by an algorithm to run as a function of the length of the input.
- Space complexity is generally NOT impacted by simple variables or “transient” data structures like array slices; it grows when you intentionally build a new data structure like an object/hash or an array.
- Generally we use a “modifier” notation like 1x or 2x, where 1x represents that we did not use any more memory than the data structure provided to us, and 2x indicates that we’ve added a second data structure of some kind (see the sketch below). Which data structure, and what it holds, is less relevant.
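To make that concrete, here is a small hypothetical sketch (the function names are ours, not part of the lesson) comparing a 1x, in-place approach with a 2x approach that builds a second array:

```typescript
// 1x space: we reverse the array in place, reusing the structure we were given
function reverseInPlace(nums: number[]): number[] {
  for (let left = 0, right = nums.length - 1; left < right; left++, right--) {
    [nums[left], nums[right]] = [nums[right], nums[left]]; // swap the ends
  }
  return nums;
}

// 2x space: we build a second data structure the same size as the input
function reversedCopy(nums: number[]): number[] {
  const result: number[] = []; // the new structure is what bumps us to 2x
  for (let i = nums.length - 1; i >= 0; i--) {
    result.push(nums[i]);
  }
  return result;
}
```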
Which optimization should be prioritized?
- There is rarely a perfect solution, as there are often trade-offs between time and space complexity. Most of the time you will want to optimize for time complexity, but it’s worth confirming which one matters, especially during an interview.
In Your Notebook
Review the name of each of the following Big O Notation equations, and write down one example of its use within an algorithm:
- O(1)
- O(log n)
- O(n)
- O(n log n)
- O(n^2) (aka, n-squared)
Write down any questions you’d like to ask when we come back together as a larger class.
Reviewing Big O Notation Equations
Key Differences:
| Big O Notation Equation | Name | Explanation | Use Cases |
|---|---|---|---|
| O(1) | Constant Time | No matter how much data there is, it only takes one operation. | Simple math operations; invoking a function; accessing a value in an array by index position; accessing a property in an object/hash by its attribute/key |
| O(log n) | Logarithmic Time | With every step/action you take, you cut out half of the remaining work. | Binary search (sketched below); searching a balanced binary tree |
| O(n) | Linear Time | Complexity grows in direct proportion to the size of the input data. | Iteration over an array or the keys of an object/hash with a for loop, while loop, or methods like map or forEach |
| O(n log n) | Linearithmic Time | A linear number of operations where each one does logarithmic work over the same data (or vice versa). | Efficient sorting algorithms such as merge sort; some searching algorithms |
| O(n^2) | Quadratic Time | Nested loops over the same or similarly sized dataset. | Some inefficient sorting algorithms such as bubble sort; anytime you have nested loops or iteration methods |
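As a concrete illustration of the logarithmic row above, here is a sketch of binary search over a sorted array of numbers. Every comparison discards half of the remaining elements, so doubling the input only adds one more step:

```typescript
// O(log n): each pass through the loop halves the remaining search window
function binarySearch(sorted: number[], target: number): number {
  let low = 0;
  let high = sorted.length - 1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (sorted[mid] === target) return mid; // found the target's index
    if (sorted[mid] < target) low = mid + 1; // discard the left half
    else high = mid - 1;                     // discard the right half
  }
  return -1; // target is not in the array
}
```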
What are the factors in determining complexity?
- Number of comparisons or operations
- Amount of space used in memory
The number of elements in a data structure is counted, and we generally use “n” as an algebraic replacement for the actual number. So whether we have 100 elements in an array or 1,000,000, we substitute “n” for the count of those elements when determining how much work our algorithm is doing.
Why does this matter? Take a look at the differences in the number of operations taken when dealing with the same amount of data in the chart below:
| n | O(log n) | O(n) | O(n^2) | O(2^n) |
|---|---|---|---|---|
| 100 | 2 | 100 | 10,000 | 1.27E+30 |
| 1,000 | 3 | 1,000 | 1,000,000 | 1.07E+301 |
| 10,000 | 4 | 10,000 | 1.00E+08 | 2.00E+3010 |
Your Turn
Let’s practice solving a problem at a high level and try calculating the Big O Notation of your algorithm.
Problem:
- Suppose you were to write a function that takes two parameters, a payload and a target.
- The function should search through the payload of positive/negative integer values to find the first two numbers that add together to equal the target.
- If found, return an array of those two values from the function. If no values are found, return an empty array.
Test Case:
findMatchingPair([1, 2, 3, -3, 7], 10) => [3, 7]
- Focus on pseudocode and the logic for this solution. No need to write code!
- From there, using the chart above, try to calculate the Big O Notation of your solution. (Are there any iterators being used? Are those iterations nested? Have you created a new data structure?)
Note
There are a couple of solutions to the problem above that have various pros and cons. Don’t worry about trying to get the perfect solution. Instead, practice breaking down the problem and calculating what the pros and cons of that solution might be.
Solution A
// iterate through the array of numbers O(n)
// for each number, iterate through the remainder of the array O(n)
// calculate the sum to see if it is equal to the target O(1)
// if true, return array with num[i] and num[j] O(1)
// return [] O(1)
// Time Complexity: O(n^2) Quadratic
// Space Complexity: 1x
One potential solution is to use nested iterations on the array and calculate the sum of each pair of numbers to see if they equal the target. Take note that we are not trying to write out full code; we just want enough to pin down the logic. Based on that, we can calculate that the time complexity is O(n^2) because of the nested iterations. However, our space complexity is 1x because we did not have to create a new data structure.
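If you did want to see that pseudocode as real code, a minimal TypeScript sketch of Solution A might look like this (the function name and argument order come from the test case above; the rest is one possible implementation):

```typescript
// Time: O(n^2) - nested loops over the same array
// Space: 1x - no new data structure is created
function findMatchingPair(payload: number[], target: number): number[] {
  for (let i = 0; i < payload.length; i++) {
    for (let j = i + 1; j < payload.length; j++) { // only scan the remainder
      if (payload[i] + payload[j] === target) {
        return [payload[i], payload[j]]; // first pair that sums to the target
      }
    }
  }
  return []; // no matching pair found
}

findMatchingPair([1, 2, 3, -3, 7], 10); // => [3, 7]
```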
Solution B
// iterate through numbers and create object/hash with each num as a key O(n)
// iterate through keys and calculate the difference (target - num) O(n)
// check if the difference is a key on the payload and that the current number is different from the difference O(1)
// if true, return [currentNum, diff] O(1)
// return [] O(1)
// Time Complexity: O(n) Linear
// Space Complexity: 2x
Although there are multiple iterations happening here, none of them are nested, which keeps the time complexity linear, O(n). However, because we create an object/hash, our space complexity increases to 2x.
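A TypeScript sketch of Solution B could look like the following, where a Set stands in for the pseudocode’s object/hash. Note that the diff !== num check keeps a number from pairing with itself, which also means a pair of equal values (like [5, 5] for a target of 10) would need extra index bookkeeping:

```typescript
// Time: O(n) - two sequential passes, never nested
// Space: 2x - we build a second data structure (the Set)
function findMatchingPair(payload: number[], target: number): number[] {
  const seen = new Set<number>(payload); // first pass: O(n) to build
  for (const num of payload) {           // second pass: O(n)
    const diff = target - num;
    if (diff !== num && seen.has(diff)) {
      return [num, diff];
    }
  }
  return []; // no matching pair found
}
```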
New Terminology
O(n * m) - This doesn’t have a good name (yet)
When we talk about quantities of numbers in a data set, we use “n” to describe the count. But what if we need to do an operation on one data set which is extremely large, against another data set which is extremely small? Should we use “n” in both cases?
Let’s use “n” for our largest set, and “m” for our smallest set, and compare using the following prompt:
A list of teachers is responsible for grading a number of students. Generate a list of students for each teacher.
set teacher_list = ['Alex', 'Eric', 'Megan', 'Casey']
set student_list = ['Able', 'Becca', 'Carlos', 'Demarcus', 'Edwin', 'Francesca', 'Gary', ...]
for each "teacher" at "teacher_index" in teacher_list:
    print "---\n list of students for " + teacher
    for each "student" at "student_index" in student_list:
        if student_index modulo (length of teacher_list) is equal to teacher_index:
            print student
If the number of teachers is 10, and the number of students is 100, those sizes aren’t too different. In this case, we could say the algorithm is O(n^2).
However, let’s say there are only 10 teachers for 10,000 students. Now the numbers are significantly different, by several “orders of magnitude”. In this case, we would use “m” to represent our smaller data set, the teachers, and classify this algorithm as O(n * m) – which is more than Linear Time (O(n)), but less than Quadratic Time (O(n^2)).
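In TypeScript, that nested iteration might look like the sketch below (the round-robin assignment rule is our own illustrative choice): for each of the m teachers we scan all n students, giving n * m operations.

```typescript
// O(n * m): the outer loop runs m times, the inner loop runs n times per teacher
function printStudentsPerTeacher(teachers: string[], students: string[]): void {
  teachers.forEach((teacher, teacherIndex) => {
    console.log(`---\nlist of students for ${teacher}`);
    students.forEach((student, studentIndex) => {
      // round-robin: student i belongs to teacher (i % number of teachers)
      if (studentIndex % teachers.length === teacherIndex) {
        console.log(student);
      }
    });
  });
}
```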
O(2^n) - Exponential Time
For each new item added to “n”, the amount of work doubles.
The most common algorithm for this is the Fibonacci Sequence, where the next number in the sequence is the sum of the preceding two numbers. This list always starts with [0, 1] and grows to look like [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …]
This algorithm is typically built using recursion where we want to find the “N-th” number in the sequence. For example, what is the 30th number in the Fibonacci sequence? (the answer is 832,040)
function fibonacci(nth_number):
    if nth_number is less than 0:
        print "ruh-roh, invalid input"
    else if nth_number === 0:
        return 0
    else if nth_number === 1:
        return 1
    else:
        return fibonacci(nth_number - 1) + fibonacci(nth_number - 2)
In this sequence, each new “N-th” number we want to find requires TWO recursive calls to fibonacci() to calculate the answer.
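Here is the same recursion in TypeScript. Each call above the base cases fans out into two more calls, so each increment of the input roughly doubles the total number of calls:

```typescript
// O(2^n): every non-base-case call branches into two more recursive calls
function fibonacci(nthNumber: number): number {
  if (nthNumber < 0) throw new Error("ruh-roh, invalid input");
  if (nthNumber === 0) return 0;
  if (nthNumber === 1) return 1;
  return fibonacci(nthNumber - 1) + fibonacci(nthNumber - 2);
}

console.log(fibonacci(30)); // => 832040
```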
More Exercises
Your Turn
Let’s practice another problem at high level and try calculating the Big O Notation of your algorithm.
Problem:
- Given an n x n array, write a function that returns the array elements arranged from outermost elements to the middle element, traveling clockwise.
Test Case:
const arrayMatrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
];
snail(arrayMatrix) #=> [1, 2, 3, 6, 9, 8, 7, 4, 5]
Note
Although this could be a 10x10 array or even a 1000x1000 array, always focus on the easiest/smallest test case first and then expand to larger and more complex datasets.
Potential Solution
// while there are values in the matrix O(n) linear
// steal the first row
// ^^^^^^^^
// spread the result, and shift out the top array (spread) O(n) linear
// steal the right items
// ^^^^^^^^
// iterate through length of matrix O(n)
// Pop the last item off each array & push into result
// steal the bottom row
// ^^^^^^^^
// Pop the bottom array
// spread the result, and spread bottom array reversed O(n)
// steal the left items
// ^^^^^^^^
// Iterate through length of matrix - 1 O(n)
// Shift the first item off each array and push into result
// return result
// Time Complexity: O(n^2) Quadratic
// Space Complexity: 2x
One solution might involve spiraling around the array clockwise using nested iterations. As a result, the time complexity would be quadratic, because we need to keep spiraling until there are no more numbers left in the matrix. Because we create a new array to keep track of the list of numbers, the space complexity increases to 2x.
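As an optional TypeScript sketch of those steps (one possible implementation; note that it mutates the input matrix, a trade-off worth mentioning in an interview):

```typescript
// Time: O(n^2) for an n x n matrix - every cell is visited once
// Space: 2x - we build a new result array
function snail(matrix: number[][]): number[] {
  const result: number[] = [];
  while (matrix.length > 0) {
    result.push(...matrix.shift()!); // steal the first row
    matrix.forEach(row => {          // steal the right items, top to bottom
      if (row.length > 0) result.push(row.pop()!);
    });
    const bottom = matrix.pop();     // steal the bottom row, right to left
    if (bottom) result.push(...bottom.reverse());
    for (let i = matrix.length - 1; i >= 0; i--) { // steal the left items, bottom up
      if (matrix[i].length > 0) result.push(matrix[i].shift()!);
    }
  }
  return result;
}

snail([[1, 2, 3], [4, 5, 6], [7, 8, 9]]); // => [1, 2, 3, 6, 9, 8, 7, 4, 5]
```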