Big O Notation

Big O is a notation used to describe the computational complexity of an algorithm.

The computational complexity of an algorithm is split into two parts: time complexity and space complexity.

  1. The time complexity of an algorithm is the amount of time the algorithm needs to run relative to the input size.
  2. The space complexity of an algorithm is the amount of memory allocated by the algorithm when run relative to the input size.

Complexity is described by a function. Its arguments are defined by the programmer, but they should cover every property of the input whose change would affect the algorithm’s work. The most common variable you’ll see is $n$, which usually denotes the length of an input array or string.

  • In the context of interviews, there are some common assumptions that we make. For example: when dealing with integers, the larger the value, the more time operations like addition, multiplication, or printing will take. While this is relevant in theory, we typically ignore this fact because the difference is practically very small. If you are given an array of integers as an input, the only variable you would use is $n$ to denote the length of the array.
  • When written, the function is wrapped by a capital O:
    • $O(n \cdot \log n)$
      • It’s reasonably fast for most problems and also the time complexity of efficient sorting algorithms.
    • $O(\log n)$
      • It’s called logarithmic time and is extremely fast. It means that somewhere in the algorithm, the input is being reduced by a constant percentage at every step (see the binary search sketch after this list).
      • Typically, the base of the logarithm will be $2$. However, the base of the logarithm doesn’t actually matter for big O, since all logarithms are related by a constant factor:
        1. $a^x = n \Rightarrow \log_a n = x$, so $a^{\log_a n} = n$
        2. $b^y = n \Rightarrow \log_b n = y$
        3. $\log_a n = \log_a(b^y) = y \log_a b = \log_b n \cdot \log_a b$
        4. $\log_b n = \frac{1}{\log_a b} \cdot \log_a n$, where $\frac{1}{\log_a b}$ is a constant
    • $O(n)$
      • It’s called linear time: the number of operations grows in direct proportion to the input size.
    • $O(n^2)$
      • It’s called quadratic time, typical of algorithms that process every pair of elements, like the nested-loop example below.
    • $O(2^n)$
      • It’s called exponential time and is usually only feasible for small inputs; it often comes from examining every subset of the input.
    • $O(n \cdot m)$
      • It shows up when the work depends on two independent input sizes, such as iterating over every pair drawn from two arrays.
    • $O(1)$
      • It’s the best complexity possible, called “constant time” or “constant space”. It means that the algorithm always uses the same amount of resources, regardless of the input. It doesn’t necessarily mean that an algorithm is fast, just that its runtime is independent of the input size.
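
To make the “reduced by a constant percentage at every step” idea concrete, here is a minimal binary search sketch (the method name binarySearch is an illustrative choice, not something from the notes above): each iteration discards half of the remaining range, so a sorted array of length $n$ is exhausted after roughly $\log_2 n$ steps.

// Given a sorted integer array "arr" with length n,
int binarySearch(int[] arr, int target) {
    int lo = 0, hi = arr.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // midpoint, written to avoid overflow of (lo + hi)
        if (arr[mid] == target) {
            return mid;                 // found the target
        } else if (arr[mid] < target) {
            lo = mid + 1;               // discard the left half
        } else {
            hi = mid - 1;               // discard the right half
        }
    }
    return -1;                          // target is not present
}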

These functions describe how the number of operations or the amount of memory needed by the algorithm grows as the arguments tend to infinity. Because the variables are tending to infinity, constants are always ignored.

  • $O(99n) = O(n) = O(\frac{n}{100})$
  • $O(2^n+n^2-500n) = O(2^n)$
    • As $n$ tends to infinity, $2^n$ completely dominates the other two terms.
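    • For example, at $n = 100$: $2^{100} \approx 1.3 \times 10^{30}$, while $n^2 = 10^4$ and $500n = 5 \times 10^4$, so the $2^n$ term alone determines the growth.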

The focus is on how an algorithm scales with the input size.

When talking about complexity, there are normally three cases: best case, average case, and worst case. In most algorithms, all three of these will be equal, but some algorithms will have them differ, as the linear search sketch below shows.

  • Never choose the best case scenario. It is most correct to use the worst case scenario, since it guarantees an upper bound on the resources the algorithm will use.
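
As an illustration of differing cases, here is a minimal linear search sketch (the method name linearSearch is an illustrative choice, not from the notes above): the best case is $O(1)$ when the target is the first element, the worst case is $O(n)$ when the target is last or absent, and we report $O(n)$.

// Given an integer array "arr" with length n,
int linearSearch(int[] arr, int target) {
    for (int i = 0; i < arr.length; i++) {
        if (arr[i] == target) {
            return i;   // best case: target is arr[0], one comparison, O(1)
        }
    }
    return -1;          // worst case: all n elements were compared, O(n)
}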

Analyzing time complexity

// Given integer arrays "arr" with length n and "arr2" with length m,
for (int num : arr) {       // n iterations
    System.out.println(num);
}
for (int num : arr) {       // another n iterations
    System.out.println(num);
}
for (int num : arr2) {      // m iterations
    System.out.println(num);
}

This algorithm has a time complexity of $O(2n + m) = O(n + m)$: the first two loops each perform $n$ iterations, the third performs $m$, and the constant factor of $2$ is ignored.

// Given an integer array "arr" with length n,
for (int i = 0; i < arr.length; i++) {
    for (int j = i; j < arr.length; j++) {   // inner loop runs n - i times
        System.out.println(arr[i] + arr[j]);
    }
}

This algorithm has a time complexity of $O(n^2)$: as $i$ goes from $0$ to $n - 1$, the inner loop runs $n, n - 1, n - 2, \dots, 1$ times, and $n + (n-1) + (n-2) + \dots + 1 = \frac{n(n+1)}{2} = O(n^2)$.

Analyzing space complexity

We never count the space used by the input, and usually don’t count the space used by the output (the answer).

  • It is bad practice to modify the input.
// Given integer arrays "arr" with length n and "arr2" with length m,
int[][] grid = new int[n][m];   // allocates n * m integers
for (int i = 0; i < arr.length; i++) {
    for (int j = 0; j < arr2.length; j++) {
        grid[i][j] = arr[i] * arr2[j];
    }
}

This algorithm has a space complexity of $O(n \cdot m)$: the grid array holds $n \cdot m$ integers.

Tilde Approximations

Tilde notation ($\sim$) allows us to work with tilde approximations, where we throw away low-order terms that complicate formulas and represent a negligible contribution to values of interest: we write $\sim f(N)$ to represent any function that, when divided by $f(N)$, approaches $1$ as $N$ grows, and we write $g(N) \sim f(N)$ to indicate that $g(N)/f(N)$ approaches $1$ as $N$ grows.

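As a quick worked check of this definition against the first entry of the table below: $\frac{N^3/6 - N^2/2 + N/3}{N^3/6} = 1 - \frac{3}{N} + \frac{2}{N^2}$, which approaches $1$ as $N$ grows, so $N^3/6 - N^2/2 + N/3 \sim N^3/6$.
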
| function              | tilde approximation | order of growth |
|-----------------------|---------------------|-----------------|
| $N^3/6 - N^2/2 + N/3$ | $\sim N^3/6$        | $N^3$           |
| $N^2/2 - N/2$         | $\sim N^2/2$        | $N^2$           |
| $\lg N + 1$           | $\sim \lg N$        | $\lg N$         |
| $3$                   | $\sim 3$            | $1$             |

Most often, we work with tilde approximations of the form $g(N) \sim a f(N)$, where $f(N) = N^b (\log N)^c$ with $a$, $b$, and $c$ constants, and refer to $f(N)$ as the order of growth of $g(N)$.

  • When using the logarithm in the order of growth, we generally do not specify the base, since the constant $a$ can absorb that detail.

| description  | function   | constants      |
|--------------|------------|----------------|
| constant     | $1$        | $b = 0, c = 0$ |
| logarithmic  | $\log N$   | $b = 0, c = 1$ |
| linear       | $N$        | $b = 1, c = 0$ |
| linearithmic | $N \log N$ | $b = 1, c = 1$ |
| quadratic    | $N^2$      | $b = 2, c = 0$ |
| cubic        | $N^3$      | $b = 3, c = 0$ |
| exponential  | $2^N$      | not applicable |
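
For example, a worked instance of the form above: for $g(N) = N^2/2 - N/2$ we have $g(N) \sim \frac{1}{2}N^2$, so $a = \frac{1}{2}$, $b = 2$, $c = 0$, and the order of growth of $g(N)$ is $N^2$ (quadratic).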

https://algs4.cs.princeton.edu/home/