Written Article
Fast Fourier Transform — A Complete Guide
From naive polynomial multiplication to the Cooley-Tukey algorithm, Number Theoretic Transforms, and practical competitive programming techniques.
1. Introduction
Polynomial multiplication appears constantly in competitive programming — computing convolutions, string matching via correlation, counting lattice paths, and combinatorial enumeration. Given two polynomials and , their product has degree and the -th coefficient is:
Where
- j-th coefficient of polynomial A
- (k−j)-th coefficient of polynomial B
- k-th coefficient of the product C
This is a convolution sum. The naive algorithm computes every pair , costing operations total. The animation below makes that cost concrete — watch every individual product get computed one at a time before accumulating into the result.
Worked Example — naive multiplication — (1 + 2x + 3x²) × (4 + 5x)
[1, 2, 3] → [4, 5] → [4, 13, 22, 15] → . Six products, computed explicitly. This is the O(n²) cost.Interactive — Step through the naive algorithm
Module 1 — The Problem: O(n²) Multiplication
(1 + 2x + 3x²) × (4 + 5x) — watch each pairwise product accumulate
FFT reduces this to by exploiting the convolution theorem: pointwise multiplication in the frequency domain equals convolution in the time domain. To understand why, we first need the Discrete Fourier Transform.
2. The Discrete Fourier Transform
The Discrete Fourier Transform (DFT) of a length- sequence evaluates the associated polynomial at the -th roots of unity — the complex numbers that satisfy . The -th DFT output is:
Where
- length of the input (must be a power of 2 for FFT)
- j-th input sample — the j-th polynomial coefficient
- k-th output — the polynomial evaluated at ω_n^k
- e^{2πi/n} — the primitive n-th root of unity
- e^{2πi·jk/n} — the twiddle factor at position (j, k)
The evaluation points are equally spaced around the unit circle in the complex plane, each at angle radians. The interactive diagram below makes this concrete — select different values of and click any point to see its exact complex value.
Interactive — Explore the n-th roots of unity
Module 2 — Roots of Unity
Click a point to see its value. The n-th roots of unity are the evaluation points for DFT.
Click any point to inspect its value
The inverse DFT recovers the original sequence from its frequency representation:
Where
- required scaling — always divide by n after the inverse transform
- e^{-2πi·jk/n} — same roots traversed in the opposite direction
Worked Example — DFT of [1, 1, 0, 0] by hand (n = 4)
3. The Cooley-Tukey FFT Algorithm
Computing the DFT naively from the formula costs — one inner sum per output. The Cooley-Tukey algorithm (1965) achieves by divide-and-conquer. The key observation is that any polynomial can be split by its even and odd indexed coefficients:
Where
- polynomial from even-indexed coefficients: a₀ + a₂y + a₄y² + …
- polynomial from odd-indexed coefficients: a₁ + a₃y + a₅y² + …
- substituting x² evaluates both halves at the same n/2 points
Worked Example — even/odd split on A = [1, 2, 3, 4, 5, 6]
Interactive — Cooley-Tukey recursion tree (n = 8)
Module 3 — Cooley-Tukey Recursion Tree
FFT for n=8: recursively splitting even and odd indices until base case.
The key identity means both halves of the split share the same evaluation points. So the results of the two recursive calls can be reused when combining. This combine step — the butterfly — is the heart of the FFT:
Where
- k-th output of the FFT of the even half
- k-th output of the FFT of the odd half
- twiddle factor — rotates O[k] before combining
- upper butterfly output (k-th frequency bin)
- lower butterfly output ((k+n/2)-th frequency bin)
One butterfly computes two outputs from two inputs in . There are butterflies per level and levels in the tree, giving .
Worked Example — one butterfly pass — n = 4, input [1, 1, 0, 0]
4. Polynomial Multiplication via FFT
The convolution theorem connects the DFT to polynomial multiplication:
Where
- the DFT (forward FFT) of coefficient array a
- pointwise (element-by-element) multiplication
- the inverse DFT (IFFT)
- the convolution of a and b — the product polynomial's coefficients
Five steps to multiply two polynomials with FFT:
- Pad to power-of-2 size. Result degree is , so pad both to the next power of 2 ≥ .
- FFT both inputs. Compute and in each.
- Pointwise multiply. for each — this is .
- Inverse FFT. in .
- Divide by n. The IFFT includes a factor — apply it to every coefficient.
Worked Example — full FFT multiplication — [1, 2, 3] × [4, 5]
[4, 13, 22, 15] — same answer as the naive calculation from Section 1, but the FFT approach used work instead of .5. Competitive Programming Notes
- ▸Floating-point precision. Standard FFT uses
complex<double>. Rounding errors corrupt results when coefficients are large. Uselong doubleor switch to NTT for exact integer arithmetic. - ▸Iterative bit-reversal FFT. Recursive Cooley-Tukey has large constant factors. Production implementations use an iterative bottom-up approach with a bit-reversal permutation to reorder inputs in-place, improving cache performance significantly. This is what the Trace page demonstrates.
- ▸Array padding. Always pad to the next power of 2 ≥ . Under-padding causes circular aliasing — high-degree coefficients wrap around and corrupt lower ones.
- ▸NTT for integer problems. When answers must be computed modulo a prime, use the Number Theoretic Transform. It replaces complex roots of unity with modular primitive roots, giving exact integer results with no floating-point error. The most common modulus is 998244353.
- ▸Crossover point. Naive multiplication is faster for small polynomials (roughly ) due to FFT's higher constant factor.
Deep dive — Number Theoretic Transform (NTT)
Module 4 — Number Theoretic Transform (NTT)
NTT is a variant of FFT that works over modular arithmetic instead of complex numbers. Instead of complex roots of unity, we use primitive roots of a prime modulus — giving exact integer results with no floating-point precision issues.
Most common NTT prime
998244353 = 119 × 2²³ + 1
A Fermat prime with primitive root g = 3. Supports transforms up to size 2²³ ≈ 8 million.
Primitive root of unity in NTT:
where p = 998244353, g = 3, n = transform length (power of 2)
Use cases in competitive programming
- ▸Polynomial multiplication with integer coefficients
- ▸Counting problems requiring modular arithmetic
- ▸String convolution and pattern matching
- ▸Subset sum convolution (AND-convolution, OR-convolution)
NTT vs FFT — when to choose:
Use NTT when
- • Coefficients are integers
- • Answer needs to be mod p
- • Precision matters
Use FFT when
- • Coefficients are real/complex
- • Need arbitrary modulus
- • Signal processing tasks