Fast Fourier Transform — A Complete Guide

From naive polynomial multiplication to the Cooley-Tukey algorithm, Number Theoretic Transforms, and practical competitive programming techniques.

1. Introduction

Polynomial multiplication appears constantly in competitive programming — computing convolutions, string matching via correlation, counting lattice paths, and combinatorial enumeration. Given two polynomials $A(x) = \sum_{i=0}^{n-1} a_i x^i$ and $B(x) = \sum_{j=0}^{m-1} b_j x^j$, their product $C(x)$ has degree $n + m - 2$ and the $k$-th coefficient is:

$$c_k = \sum_{j=0}^{k} a_j \, b_{k-j}$$

Where

  • $a_j$ — j-th coefficient of polynomial A
  • $b_{k-j}$ — (k−j)-th coefficient of polynomial B
  • $c_k$ — k-th coefficient of the product C

This is a convolution sum. The naive algorithm computes every pair $(a_j,\, b_{k-j})$, costing $O(n^2)$ operations total. The animation below makes that cost concrete — watch every individual product get computed one at a time before accumulating into the result.

Worked Example — naive multiplication — (1 + 2x + 3x²) × (4 + 5x)

  • Coefficients of A: [1, 2, 3] → $a_0=1,\ a_1=2,\ a_2=3$
  • Coefficients of B: [4, 5] → $b_0=4,\ b_1=5$
  • $c_0 = a_0 b_0 = 1 \cdot 4 = 4$
  • $c_1 = a_0 b_1 + a_1 b_0 = 1\cdot5 + 2\cdot4 = 13$
  • $c_2 = a_1 b_1 + a_2 b_0 = 2\cdot5 + 3\cdot4 = 22$
  • $c_3 = a_2 b_1 = 3\cdot5 = 15$
  • Result: [4, 13, 22, 15] → $4 + 13x + 22x^2 + 15x^3$. Six products, computed explicitly. This is the O(n²) cost.

Interactive — Step through the naive algorithm

[Interactive module 1 — The Problem: O(n²) Multiplication. Steps through (1 + 2x + 3x²) × (4 + 5x), accumulating each pairwise product into the result 4 + 13x + 22x² + 15x³.]

FFT reduces this to $O(n \log n)$ by exploiting the convolution theorem: pointwise multiplication in the frequency domain equals convolution in the time domain. To understand why, we first need the Discrete Fourier Transform.

2. The Discrete Fourier Transform

The Discrete Fourier Transform (DFT) of a length-$n$ sequence evaluates the associated polynomial at the $n$-th roots of unity — the $n$ complex numbers $\omega_n^0, \omega_n^1, \ldots, \omega_n^{n-1}$ that satisfy $z^n = 1$. The $k$-th DFT output is:

$$X_k = \sum_{j=0}^{n-1} x_j \cdot \omega_n^{jk}, \qquad k = 0, 1, \ldots, n-1$$

Where

  • $n$ — length of the input (must be a power of 2 for FFT)
  • $x_j$ — j-th input sample — the j-th polynomial coefficient
  • $X_k$ — k-th output — the polynomial evaluated at $\omega_n^k$
  • $\omega_n$ — $e^{2\pi i/n}$ — the primitive n-th root of unity
  • $\omega_n^{jk}$ — $e^{2\pi i \cdot jk/n}$ — the twiddle factor at position (j, k)

The $n$ evaluation points are equally spaced around the unit circle in the complex plane, each at angle $2\pi k / n$ radians. The interactive diagram below makes this concrete — select different values of $n$ and click any point to see its exact complex value.

Interactive — Explore the n-th roots of unity

[Interactive module 2 — Roots of Unity. Plots the n-th roots of unity on the unit circle for a selectable n; clicking a point shows its exact complex value.]

The inverse DFT recovers the original sequence from its frequency representation:

$$x_j = \frac{1}{n} \sum_{k=0}^{n-1} X_k \cdot \omega_n^{-jk}$$

Where

  • $1/n$ — required scaling — always divide by n after the inverse transform
  • $\omega_n^{-jk}$ — $e^{-2\pi i \cdot jk/n}$ — same roots traversed in the opposite direction

Worked Example — DFT of [1, 1, 0, 0] by hand (n = 4)

  • Setup: input $x = [1, 1, 0, 0]$. The 4th roots of unity are $\omega_4^0=1,\ \omega_4^1=i,\ \omega_4^2=-1,\ \omega_4^3=-i$.
  • $X_0 = 1\cdot1 + 1\cdot1 + 0 + 0 = 2$
  • $X_1 = 1\cdot1 + 1\cdot i + 0 + 0 = 1+i$
  • $X_2 = 1\cdot1 + 1\cdot(-1) + 0 + 0 = 0$
  • $X_3 = 1\cdot1 + 1\cdot(-i) + 0 + 0 = 1-i$
  • Result: $\hat{A} = [2,\ 1+i,\ 0,\ 1-i]$. Notice $X_2 = 0$ because $\omega_4^2 = -1$ is a root of $1+x$. Try selecting n = 4 in the diagram above and clicking $\omega^2$ to see why.

3. The Cooley-Tukey FFT Algorithm

Computing the DFT naively from the formula costs $O(n^2)$ — one inner sum per output. The Cooley-Tukey algorithm (1965) achieves $O(n \log n)$ by divide-and-conquer. The key observation is that any polynomial can be split by its even- and odd-indexed coefficients:

$$A(x) = A_{\text{even}}(x^2) + x \cdot A_{\text{odd}}(x^2)$$

Where

  • $A_{\text{even}}(y)$ — polynomial from even-indexed coefficients: a₀ + a₂y + a₄y² + …
  • $A_{\text{odd}}(y)$ — polynomial from odd-indexed coefficients: a₁ + a₃y + a₅y² + …
  • $x^2$ — substituting x² evaluates both halves at the same n/2 points

Worked Example — even/odd split on A = [1, 2, 3, 4, 5, 6]

  • Even: indices 0, 2, 4 → $A_{\text{even}}(y) = 1 + 3y + 5y^2$
  • Odd: indices 1, 3, 5 → $A_{\text{odd}}(y) = 2 + 4y + 6y^2$
  • Verify: $A_{\text{even}}(x^2) + x\cdot A_{\text{odd}}(x^2) = (1+3x^2+5x^4) + x(2+4x^2+6x^4) = 1+2x+3x^2+4x^3+5x^4+6x^5$ ✓
  • Key: each half has length 3 instead of 6 — the problem is halved. Recurse, then combine with a butterfly step. The recursion tree below shows every level of this splitting for an 8-element input.

Interactive — Cooley-Tukey recursion tree (n = 8)

[Interactive module 3 — Cooley-Tukey Recursion Tree. Shows indices 0–7 recursively split into even (0,2,4,6) and odd (1,3,5,7) halves until each leaf holds a single element.]

The key identity $(\omega_n^k)^2 = \omega_{n/2}^k$ means both halves of the split share the same $n/2$ evaluation points. So the results of the two recursive calls can be reused when combining. This combine step — the butterfly — is the heart of the FFT:

$$X[k] = E[k] + \omega_n^k \cdot O[k]$$
$$X\!\left[k + \tfrac{n}{2}\right] = E[k] - \omega_n^k \cdot O[k]$$

Where

  • $E[k]$ — k-th output of the FFT of the even half
  • $O[k]$ — k-th output of the FFT of the odd half
  • $\omega_n^k$ — twiddle factor — rotates O[k] before combining
  • $X[k]$ — upper butterfly output (k-th frequency bin)
  • $X[k+n/2]$ — lower butterfly output ((k+n/2)-th frequency bin)

One butterfly computes two outputs from two inputs in $O(1)$. There are $n/2$ butterflies per level and $\log_2 n$ levels in the tree, giving $T(n) = 2T(n/2) + O(n) = O(n \log n)$.

Worked Example — one butterfly pass — n = 4, input [1, 1, 0, 0]

  • Setup: after recursing, $E = \text{FFT}([1,0]) = [1,1]$ and $O = \text{FFT}([1,0]) = [1,1]$. Twiddle factors: $\omega_4^0=1,\ \omega_4^1=i$.
  • k=0, upper: $X[0] = E[0] + \omega_4^0\cdot O[0] = 1 + 1 = 2$
  • k=0, lower: $X[2] = E[0] - \omega_4^0\cdot O[0] = 1 - 1 = 0$
  • k=1, upper: $X[1] = E[1] + \omega_4^1\cdot O[1] = 1 + i = 1+i$
  • k=1, lower: $X[3] = E[1] - \omega_4^1\cdot O[1] = 1 - i = 1-i$
  • Result: $[2,\ 1+i,\ 0,\ 1-i]$ — matches the by-hand DFT from Section 2. Both outputs at each k reuse the same E[k] and O[k], computed only once.

4. Polynomial Multiplication via FFT

The convolution theorem connects the DFT to polynomial multiplication:

$$(a \ast b) = \mathcal{F}^{-1}\!\left(\mathcal{F}(a) \cdot \mathcal{F}(b)\right)$$

Where

  • $\mathcal{F}(a)$ — the DFT (forward FFT) of coefficient array a
  • $\cdot$ — pointwise (element-by-element) multiplication
  • $\mathcal{F}^{-1}$ — the inverse DFT (IFFT)
  • $a \ast b$ — the convolution of a and b — the product polynomial's coefficients

Five steps to multiply two polynomials with FFT:

  1. Pad to power-of-2 size. Result degree is $\deg A + \deg B$, so pad both to the next power of 2 ≥ $\deg A + \deg B + 1$.
  2. FFT both inputs. Compute $\hat{A} = \text{FFT}(A)$ and $\hat{B} = \text{FFT}(B)$ in $O(n \log n)$ each.
  3. Pointwise multiply. $\hat{C}[k] = \hat{A}[k] \cdot \hat{B}[k]$ for each $k$ — this is $O(n)$.
  4. Inverse FFT. $C = \text{IFFT}(\hat{C})$ in $O(n \log n)$.
  5. Divide by n. The IFFT includes a $1/n$ factor — apply it to every coefficient.

Worked Example — full FFT multiplication — [1, 2, 3] × [4, 5]

  • Pad: the product has 4 coefficients, so pad to n = 4: $A=[1,2,3,0],\ B=[4,5,0,0]$.
  • FFT A: $\hat{A} = [6,\ -2+2i,\ 2,\ -2-2i]$
  • FFT B: $\hat{B} = [9,\ 4+5i,\ -1,\ 4-5i]$
  • Pointwise ×: $\hat{C} = [54,\ -18-2i,\ -2,\ -18+2i]$
  • IFFT ÷ 4: $C = [4,\ 13,\ 22,\ 15]$
  • Result: [4, 13, 22, 15] — same answer as the naive calculation from Section 1, but the FFT approach used $O(n \log n)$ work instead of $O(n^2)$.

5. Competitive Programming Notes

  • Floating-point precision. Standard FFT uses complex<double>. Rounding errors corrupt results when coefficients are large. Use long double or switch to NTT for exact integer arithmetic.
  • Iterative bit-reversal FFT. Recursive Cooley-Tukey has large constant factors. Production implementations use an iterative bottom-up approach with a bit-reversal permutation to reorder inputs in-place, improving cache performance significantly.
  • Array padding. Always pad to the next power of 2 ≥ $n + m - 1$. Under-padding causes circular aliasing — high-degree coefficients wrap around and corrupt lower ones.
  • NTT for integer problems. When answers must be computed modulo a prime, use the Number Theoretic Transform. It replaces complex roots of unity with modular primitive roots, giving exact integer results with no floating-point error. The most common modulus is 998244353.
  • Crossover point. Naive multiplication is faster for small polynomials (roughly $n \lesssim 64$) due to FFT's higher constant factor.

Deep dive — Number Theoretic Transform (NTT)

NTT is a variant of FFT that works over modular arithmetic instead of complex numbers. Instead of complex roots of unity, we use primitive roots of a prime modulus — giving exact integer results with no floating-point precision issues.

Most common NTT prime

998244353 = 119 × 2²³ + 1

An NTT-friendly prime with primitive root g = 3: the factor 2²³ in p − 1 supports transforms up to size 2²³ ≈ 8 million.

Primitive root of unity in NTT:

$$\omega = g^{(p-1)/n} \bmod p$$

where p = 998244353, g = 3, n = transform length (power of 2)

Use cases in competitive programming

  • Polynomial multiplication with integer coefficients
  • Counting problems requiring modular arithmetic
  • String convolution and pattern matching
  • Subset sum convolution (AND-convolution, OR-convolution)

NTT vs FFT — when to choose:

Use NTT when

  • Coefficients are integers
  • Answer needs to be mod p
  • Precision matters

Use FFT when

  • Coefficients are real/complex
  • Need arbitrary modulus
  • Signal processing tasks