Fast Fourier Transform — A Complete Guide

From naive polynomial multiplication to the Cooley-Tukey algorithm, Number Theoretic Transforms, and practical competitive programming techniques.

1. Introduction

Polynomial multiplication appears constantly in competitive programming — computing convolutions, string matching via correlation, counting lattice paths, and combinatorial enumeration. Given two polynomials $A(x) = \sum_{i=0}^{n-1} a_i x^i$ and $B(x) = \sum_{j=0}^{m-1} b_j x^j$, their product $C(x)$ has degree $n + m - 2$ and the $k$-th coefficient is:

$$c_k = \sum_{j=0}^{k} a_j \, b_{k-j}$$

Where

  • $a_j$ — j-th coefficient of polynomial A
  • $b_{k-j}$ — (k−j)-th coefficient of polynomial B
  • $c_k$ — k-th coefficient of the product C

This is a convolution sum. The naive algorithm computes every pair $(a_j,\, b_{k-j})$, costing $O(n^2)$ operations total. The animation below makes that cost concrete — watch every individual product get computed one at a time before accumulating into the result.

Worked Example — naive multiplication — (1 + 2x + 3x²) × (4 + 5x)

  • Coefficients of A: [1, 2, 3] → $a_0=1,\ a_1=2,\ a_2=3$
  • Coefficients of B: [4, 5] → $b_0=4,\ b_1=5$
  • $c_0 = a_0 b_0 = 1 \cdot 4 = 4$
  • $c_1 = a_0 b_1 + a_1 b_0 = 1\cdot5 + 2\cdot4 = 13$
  • $c_2 = a_1 b_1 + a_2 b_0 = 2\cdot5 + 3\cdot4 = 22$
  • $c_3 = a_2 b_1 = 3\cdot5 = 15$
  • Result: [4, 13, 22, 15] → $4 + 13x + 22x^2 + 15x^3$. Six products, computed explicitly. This is the O(n²) cost.

Interactive — Step through the naive algorithm

[Interactive module 1 — The Problem: O(n²) Multiplication. Steps through (1 + 2x + 3x²) × (4 + 5x), accumulating each pairwise product into the result 4 + 13x + 22x² + 15x³.]

FFT reduces this to $O(n \log n)$ by exploiting the convolution theorem: pointwise multiplication in the frequency domain equals convolution in the time domain. To understand why, we first need the Discrete Fourier Transform.

2. The Discrete Fourier Transform

The Discrete Fourier Transform (DFT) of a length-$n$ sequence evaluates the associated polynomial at the $n$-th roots of unity — the $n$ complex numbers $\omega_n^0, \omega_n^1, \ldots, \omega_n^{n-1}$ that satisfy $z^n = 1$. The $k$-th DFT output is:

$$X_k = \sum_{j=0}^{n-1} x_j \cdot \omega_n^{jk}, \qquad k = 0, 1, \ldots, n-1$$

Where

  • $n$ — length of the input (must be a power of 2 for FFT)
  • $x_j$ — j-th input sample — the j-th polynomial coefficient
  • $X_k$ — k-th output — the polynomial evaluated at $\omega_n^k$
  • $\omega_n$ — $e^{2\pi i/n}$ — the primitive n-th root of unity
  • $\omega_n^{jk}$ — $e^{2\pi i \cdot jk/n}$ — the twiddle factor at position (j, k)

The $n$ evaluation points are equally spaced around the unit circle in the complex plane, each at angle $2\pi k / n$ radians. The interactive diagram below makes this concrete — select different values of $n$ and click any point to see its exact complex value.

Interactive — Explore the n-th roots of unity

[Interactive module 2 — Roots of Unity. Plots the n-th roots of unity on the unit circle for a selectable n; clicking a point shows its exact complex value.]

The inverse DFT recovers the original sequence from its frequency representation:

$$x_j = \frac{1}{n} \sum_{k=0}^{n-1} X_k \cdot \omega_n^{-jk}$$

Where

  • $1/n$ — required scaling — always divide by n after the inverse transform
  • $\omega_n^{-jk}$ — $e^{-2\pi i \cdot jk/n}$ — same roots traversed in the opposite direction

Worked Example — DFT of [1, 1, 0, 0] by hand (n = 4)

  • Setup: input $x = [1, 1, 0, 0]$. The 4th roots of unity are $\omega_4^0=1,\ \omega_4^1=i,\ \omega_4^2=-1,\ \omega_4^3=-i$.
  • $X_0 = 1\cdot1 + 1\cdot1 + 0 + 0 = 2$
  • $X_1 = 1\cdot1 + 1\cdot i + 0 + 0 = 1+i$
  • $X_2 = 1\cdot1 + 1\cdot(-1) + 0 + 0 = 0$
  • $X_3 = 1\cdot1 + 1\cdot(-i) + 0 + 0 = 1-i$
  • Result: $\hat{A} = [2,\ 1+i,\ 0,\ 1-i]$. Notice $X_2 = 0$ because $\omega_4^2 = -1$ is a root of $1+x$. Try selecting n = 4 in the diagram above and clicking $\omega^2$ to see why.

3. The Cooley-Tukey FFT Algorithm

Computing the DFT naively from the formula costs $O(n^2)$ — one inner sum per output. The Cooley-Tukey algorithm (1965) achieves $O(n \log n)$ by divide-and-conquer. The key observation is that any polynomial can be split by its even- and odd-indexed coefficients:

$$A(x) = A_{\text{even}}(x^2) + x \cdot A_{\text{odd}}(x^2)$$

Where

  • $A_{\text{even}}(y)$ — polynomial from even-indexed coefficients: a₀ + a₂y + a₄y² + …
  • $A_{\text{odd}}(y)$ — polynomial from odd-indexed coefficients: a₁ + a₃y + a₅y² + …
  • $x^2$ — substituting x² evaluates both halves at the same n/2 points

Worked Example — even/odd split on A = [1, 2, 3, 4, 5, 6]

  • Even: indices 0, 2, 4 → $A_{\text{even}}(y) = 1 + 3y + 5y^2$
  • Odd: indices 1, 3, 5 → $A_{\text{odd}}(y) = 2 + 4y + 6y^2$
  • Verify: $A_{\text{even}}(x^2) + x\cdot A_{\text{odd}}(x^2) = (1+3x^2+5x^4) + x(2+4x^2+6x^4) = 1+2x+3x^2+4x^3+5x^4+6x^5$ ✓
  • Key: each half has length 3 instead of 6 — the problem is halved. Recurse, then combine with a butterfly step. The recursion tree below shows every level of this splitting for an 8-element input.

Interactive — Cooley-Tukey recursion tree (n = 8)

[Interactive module 3 — Cooley-Tukey Recursion Tree. Shows indices 0–7 recursively split into even (0,2,4,6) and odd (1,3,5,7) halves until each leaf holds a single element.]

The key identity $(\omega_n^k)^2 = \omega_{n/2}^k$ means both halves of the split share the same $n/2$ evaluation points. So the results of the two recursive calls can be reused when combining. This combine step — the butterfly — is the heart of the FFT:

$$X[k] = E[k] + \omega_n^k \cdot O[k]$$
$$X\!\left[k + \tfrac{n}{2}\right] = E[k] - \omega_n^k \cdot O[k]$$

Where

  • $E[k]$ — k-th output of the FFT of the even half
  • $O[k]$ — k-th output of the FFT of the odd half
  • $\omega_n^k$ — twiddle factor — rotates O[k] before combining
  • $X[k]$ — upper butterfly output (k-th frequency bin)
  • $X[k+n/2]$ — lower butterfly output ((k+n/2)-th frequency bin)

One butterfly computes two outputs from two inputs in $O(1)$. There are $n/2$ butterflies per level and $\log_2 n$ levels in the tree, giving $T(n) = 2T(n/2) + O(n) = O(n \log n)$.

Worked Example — one butterfly pass — n = 4, input [1, 1, 0, 0]

  • Setup: after recursing, $E = \text{FFT}([1,0]) = [1,1]$ and $O = \text{FFT}([1,0]) = [1,1]$. Twiddle factors: $\omega_4^0=1,\ \omega_4^1=i$.
  • k=0, upper: $X[0] = E[0] + \omega_4^0\cdot O[0] = 1 + 1 = 2$
  • k=0, lower: $X[2] = E[0] - \omega_4^0\cdot O[0] = 1 - 1 = 0$
  • k=1, upper: $X[1] = E[1] + \omega_4^1\cdot O[1] = 1 + i = 1+i$
  • k=1, lower: $X[3] = E[1] - \omega_4^1\cdot O[1] = 1 - i = 1-i$
  • Result: $[2,\ 1+i,\ 0,\ 1-i]$ — matches the by-hand DFT from Section 2. Both outputs at each k reuse the same E[k] and O[k], computed only once.

4. Polynomial Multiplication via FFT

The convolution theorem connects the DFT to polynomial multiplication:

$$(a \ast b) = \mathcal{F}^{-1}\!\left(\mathcal{F}(a) \cdot \mathcal{F}(b)\right)$$

Where

  • $\mathcal{F}(a)$ — the DFT (forward FFT) of coefficient array a
  • $\cdot$ — pointwise (element-by-element) multiplication
  • $\mathcal{F}^{-1}$ — the inverse DFT (IFFT)
  • $a \ast b$ — the convolution of a and b — the product polynomial's coefficients

Five steps to multiply two polynomials with FFT:

  1. Pad to power-of-2 size. Result degree is $\deg A + \deg B$, so pad both to the next power of 2 ≥ $\deg A + \deg B + 1$.
  2. FFT both inputs. Compute $\hat{A} = \text{FFT}(A)$ and $\hat{B} = \text{FFT}(B)$ in $O(n \log n)$ each.
  3. Pointwise multiply. $\hat{C}[k] = \hat{A}[k] \cdot \hat{B}[k]$ for each $k$ — this is $O(n)$.
  4. Inverse FFT. $C = \text{IFFT}(\hat{C})$ in $O(n \log n)$.
  5. Divide by n. The IFFT includes a $1/n$ factor — apply it to every coefficient.

Worked Example — full FFT multiplication — [1, 2, 3] × [4, 5]

  • Pad: the product has 4 coefficients, so pad to n = 4: $A=[1,2,3,0],\ B=[4,5,0,0]$.
  • FFT A: $\hat{A} = [6,\ -2+2i,\ 2,\ -2-2i]$
  • FFT B: $\hat{B} = [9,\ 4+5i,\ -1,\ 4-5i]$
  • Pointwise ×: $\hat{C} = [54,\ -18-2i,\ -2,\ -18+2i]$
  • IFFT ÷ 4: $C = [4,\ 13,\ 22,\ 15]$
  • Result: [4, 13, 22, 15] — same answer as the naive calculation from Section 1, but the FFT approach used $O(n \log n)$ work instead of $O(n^2)$.

5. Competitive Programming Notes

  • Floating-point precision. Standard FFT uses complex<double>. Rounding errors corrupt results when coefficients are large. Use long double or switch to NTT for exact integer arithmetic.
  • Iterative bit-reversal FFT. Recursive Cooley-Tukey has large constant factors. Production implementations use an iterative bottom-up approach with a bit-reversal permutation to reorder inputs in-place, improving cache performance significantly.
  • Array padding. Always pad to the next power of 2 ≥ $n + m - 1$. Under-padding causes circular aliasing — high-degree coefficients wrap around and corrupt lower ones.
  • NTT for integer problems. When answers must be computed modulo a prime, use the Number Theoretic Transform. It replaces complex roots of unity with modular primitive roots, giving exact integer results with no floating-point error. The most common modulus is 998244353.
  • Crossover point. Naive multiplication is faster for small polynomials (roughly $n \lesssim 64$) due to FFT's higher constant factor.

Deep dive — Number Theoretic Transform (NTT)

NTT is a variant of FFT that works over modular arithmetic instead of complex numbers. Instead of complex roots of unity, we use primitive roots of a prime modulus — giving exact integer results with no floating-point precision issues.

Most common NTT prime

998244353 = 119 × 2²³ + 1

An NTT-friendly prime with primitive root g = 3: the factor 2²³ in p − 1 supports transforms up to size 2²³ ≈ 8 million.

Primitive root of unity in NTT:

$$\omega = g^{(p-1)/n} \bmod p$$

where p = 998244353, g = 3, n = transform length (power of 2)

Use cases in competitive programming

  • Polynomial multiplication with integer coefficients
  • Counting problems requiring modular arithmetic
  • String convolution and pattern matching
  • Subset sum convolution (AND-convolution, OR-convolution)

NTT vs FFT — when to choose:

Use NTT when

  • Coefficients are integers
  • Answer needs to be mod p
  • Precision matters

Use FFT when

  • Coefficients are real/complex
  • Need arbitrary modulus
  • Signal processing tasks