COS 226 Lecture 21: Multiplication %ps /lecture 21 def How fast can we multiply? Small numbers: basic machine operation Approximate solution OK: "floating point" MIPS vs. FLOPS Exact solution required for numbers with millions of digits? small ex: 98734019238470981723409182730491827 times 66753875434605462654864811227186687 equals 6590878421402831216001745213641931592836606299109 07873145346994326436 Basic question about difficulty of computations ARITHMETIC COMPLEXITY also consider other operations add, multiply, exponentiate, divide, logarithm ... Applications more numerous than we might imagine Arithmetic coding ($%#?!) Cryptography Signal processing ----- Multiplying complex numbers Given two complex numbers x = a + b i y = c + d i Product x*y real part: a*c - b*d imaginary part: a*d + b*c Four multiplications, two additions Typical old-fashioned machine: 1 time unit for add 4 time units for multiply therefore, 18 time units for complex multiply Faster way to compute product (!) t = (a+b) * (c+d) = a*c + a*d + b*c + b*d Product x*y real part: a*c - b*d imaginary part: t - a*c - b*d THREE multiplications, five additions 17 time units for complex multiply Only works if multiply expensive relative to add software floating point old machines ----- Exponentiation How many multiplications to compute x^N ? If N is a power of 2, use successive squaring 1, x, x^2, x^4, x^8, x^16, x^32, ... If N is not a power of 2, do successive squaring then pick out numbers to multiply together by * divide-and-conquer * binary representation of N Ex: x = 2, N = 21 -- 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 --- binary rep. of N: 2^21 = 2^16*2^4*2^1 = 65536*16*2 = divide-and-conquer: 2^21 = 2^10*2^11 = 2^5*2*5*2*5*2^6 = ... Unrealistic to consider if x and/or N large Realistic version (which is of practical use) THM: Less than 2 lg N multiplications are required to compute x^N mod M Linear in the number of bits to store N DISCRETE LOGARITHM problem (difficult) given x^N mod M, find x ----- Multiplying polynomials To multiply integers may as well consider polynomial multiplication finish up with Horner's method grade school grade school, no carries -- . 1 2 3 4 1 2 3 4 . 5 6 7 8 5 6 7 8 . ------- --------------- . 9 8 7 2 8*1 8*2 8*3 8*4 . 8 6 3 8 7*1 7*2 7*3 7*4 . 7 4 0 4 6*1 6*2 6*3 6*4 . 6 1 7 0 5*1 5*2 5*3 5*4 . ------------- --------------------------- . 7 0 0 6 6 5 2 --- polynomial -- . 3 2 1 0 %ps hback . 1x + 2x + 3x + 4x %ps qback . 3 2 1 0 %ps hback . 5x + 6x + 7x + 8x %ps qback . ---------------------- %ps qback . 3 2 1 0 %ps hback . 8x +16x +24x +32x %ps qback . 4 3 2 1 %ps hback . 7x +14x +21x +28x %ps qback . 5 4 3 2 %ps hback . 6x +12x +18x +24x %ps qback . 6 5 4 3 %ps hback . 5x +10x +15x +20x %ps qback . ---------------------------------------- %ps qback . 6 5 4 3 2 1 0 %ps hback . 5x +16x +34x +60x +61x +52x +32x --- ----- Cost of multiplication Polynomial multiplication -- for (i = 0; i < 2*N-1; i++) r[i] = 0; for (i = 0; i < N; i++) for (j = 0; j < N; j++) r[i+j] += p[i]*q[j]; --- N^2 coefficient multiplications Integer multiplication R-L: -- %ps qback . 32 %ps qback . 52 %ps qback . 61 %ps qback . 60 %ps qback . 34 %ps qback . 16 %ps qback . 5 %ps qback . 7006652 --- L-R: use Horner's method to evaluate at x=10 -- . 5*10 = 50+16 = 66 %ps qback . 66*10 = 660+34 = 6944 %ps qback . 694*10 = 6940+60 = 7000 %ps qback . 7000*10 = 70000+61 = 70061 %ps qback . 70061*10 = 700610+52 = 700662 %ps qback . 700662*10 = 7006620+32 = 7006652 --- Extra factor of log N might be required Bottom line: quadratic algorithms can multiply thousand-digit numbers but not million-digit numbers ----- Divide-conquer polynomial multiplication Example -- . 2 2 %ps hback . ( x ( x+2) + (3x+4) ) * ( x (5x+6) + (7x+8) ) %ps qback . 4 %ps hback . x ( x+2)*(5x+6) %ps qback . 2 %ps hback . + x ( x+2)*(7x+8) %ps qback . 2 %ps hback . + x (5x+6)*(3x+4) . + (3x+4)*(7x+8) . 6 5 4 %ps hback . 5x +16x +12x %ps qback . 4 3 2 %ps hback . 7x +22x +16x %ps qback . 4 3 2 %ps hback . 15x +38x +24x %ps qback . 2 %ps hback . 21x +52x +32 %ps qback . ---------------------------------------- %ps qback . 6 5 4 3 2 %ps hback . 5x +16x +34x +60x +61x +52x +32 --- In general, if p and q are degree 2N -- . N N %ps hback . p(x)*q(x) = (x ph(x)+pl(x))*(x qh(x)+ql(x)) %ps qback . 2N %ps hback . p(x)*q(x) = x ph(x)*qh(x) %ps qback . N %ps hback . + x (ph(x)*ql(x) + pl(x)*qh(x)) . + pl(x)*ql(x) --- Scalar multiplication count m(0) = 1 m(2N) = 4*m(N) Solution: m(N) = N^2, since (2N)^2 = 4N^2 No savings from divide and conquer? ----- Polynomial multiplication (continued) Use same technique as for complex numbers In general, if p and q are degree 2N -- . N N %ps hback . p(x)*q(x) = (x ph(x)+pl(x))*(x qh(x)+ql(x)) . t(x) = (ph(x) + pl(x)) * (qh(x) + ql(x)) %ps qback . 2N %ps hback . p(x)*q(x) = x ph(x)*qh(x) %ps qback . N %ps hback . + x (t(x)-ph(x)*qh(x) + pl(x)*ql(x)) . + pl(x)*ql(x) --- Ex: -- . 2 2 %ps hback . ( x ( x+2) + (3x+4) ) * ( x (5x+6) + (7x+8) ) . 2 %ps hback . ( x+2)*( 5x+ 6) = 5x + 16x + 12 %ps qback . 2 %ps hback . (3x+4)*( 7x+ 8) = 21x + 52x + 32 %ps qback . 2 %ps hback . t(x) = (6x+8)*(10x+12) = 60x + 152x + 96 . 4 2 %ps hback . x ( 5x + 16x + 12) %ps qback . 2 2 %ps hback . + x (34x +84x+52) %ps qback . 2 %ps hback . + 21x + 52x + 32 . ---------------------------------------- %ps qback . 6 5 4 3 2 %ps hback . 5x +16x +34x +60x +61x + 52x + 32 --- Scalar multiplication count m(0) = 1 m(2N) = 3*m(N) Solution: m(N) = N^lg3, since (2N)^lg3 = 3N^lg3 Bottom line: can do tens of thousands of digits but not millions of digits ----- Basic computations on polynomials A polynomial of degree N-1 %ps qback 2 3 N-1 %ps hback p0 + p1 x + p2 x + p3 x + ... + p(N-1) x is defined by N numbers (the coefficients) p0 p1 p2 ... p(N-1) The polynomial associates any N numbers x0 x1 x2 ... x(N-1) with another set of N numbers y0 y1 y2 ... y(N-1) simply by p(x0) = y0 p(x1) = y1 p(x2) = y2 p(x3) = y3 ... EVALUATION find the y's, given the p's and x's ROOT EXTRACTION find the x's, given the p's and y's INTERPOLATION find the p's, given the x's and y's ----- Fast polynomial multiplication EVALUATE-MULTIPLY-INTERPOLATE %ps qback 2 %ps hback Ex: (1+x)*(2+3x) = 2 + 5x + 3x Pick three distinct numbers: 0, 1, 2 (say) EVALUATE -- . x 0 1 2 . ----------------------- . 1 + x 1 2 3 . 2 + 3x 2 5 8 --- MULTIPLY -- . x 0 1 2 . ------------------------------- . (1 + x)*(2 + 3x) 2 10 24 --- INTERPOLATE -- . (x-1)(x-2) (x-0)(x-2) (x-0)(x-1) . 2 ---------- + 10 ---------- + 24 --------- . (0-1)(0-2) (1-0)(1-2) (2-0)(2-1) %ps qback . 2 2 2 %ps hback x - 3x + 2 -10 (x - 2x) + 12 (x - x) %ps qback . 2 %ps hback . 3x + 5x + 2 --- Bad news * evaluate, interpolate, seem to take N^2 steps Good news * method works for ANY set of distinct points ----- Complex roots of unity Ex: (1-i)(1-i) = 1 -2i -1 = -2i (1-i)^4 = 4 (.707.. - .707..i)^4 = 1 For each N, there are N different complex numbers that evaluate to N when raised to the Nth power -- . 0 1 2 3 4 5 6 7 %ps hback . W W W W W W W W %ps hback . 8 8 8 8 8 8 8 8 --- . 5 %ps hback 1 w -1 w w: PRINCIPAL Nth root of unity . k (2 pi i k)/N %ps hback . W = e %ps hback . N x -- . 0 1 2 3 4 5 6 7 %ps hback . W W W W W W W W %ps hback . 8 8 8 8 8 8 8 8 --- -- . 0 1 2 3 0 1 2 3 %ps hback . W W W W -W -W -W -W %ps hback . 8 8 8 8 8 8 8 8 --- x^2 -- . 0 1 2 3 0 1 2 3 %ps hback . W W W W W W W W %ps hback . 4 4 4 4 4 4 4 4 --- ----- EVALUATE in NlogN steps p(x) -- . 1 2 3 4 5 6 7 %ps hback . p + p x + p x + p x + p x + p x + p x + p x %ps hback . 0 1 2 3 4 5 6 7 --- split into even, odd coefficients -- . 2 4 6 2 4 6 %ps hback . p + p x + p x + p x + x(p + p x + p x + p x ) %ps hback . 0 2 4 6 1 3 5 7 --- . 2 2 %ps hback . p(x) = p (x ) + xp (x ) %ps hback . e o To evaluate at eighth roots of unity use values of even, odd polynomials at fourth roots of unity -- . 0 0 0 0 %ps hback . p(w ) = p (w ) + w p (w ) %ps hback . 8 e 4 8 o 4 %ps qback . 1 1 1 1 %ps hback . p(w ) = p (w ) + w p (w ) %ps hback . 8 e 4 8 o 4 %ps qback . 2 2 2 2 %ps hback . p(w ) = p (w ) + w p (w ) %ps hback . 8 e 4 8 o 4 %ps qback . 3 3 3 3 %ps hback . p(w ) = p (w ) + w p (w ) %ps hback . 8 e 4 8 o 4 %ps qback . 4 0 4 0 %ps hback . p(w ) = p (w ) + w p (w ) %ps hback . 8 e 4 8 o 4 %ps qback . 5 1 5 1 %ps hback . p(w ) = p (w ) + w p (w ) %ps hback . 8 e 4 8 o 4 %ps qback . 6 2 6 2 %ps hback . p(w ) = p (w ) + w p (w ) %ps hback . 8 e 4 8 o 4 %ps qback . 7 3 7 3 %ps hback . p(w ) = p (w ) + w p (w ) %ps hback . 8 e 4 8 o 4 --- multiplication count m(0) = 1 m(N) = 2 m(N/2) + N m(N) = N lg N ----- EVALUATE implementation Use the perfect shuffle -- . p p p p p p p p %ps hback . 0 1 2 3 4 5 6 7 --- -- . p p p p p p p p %ps hback . 0 2 4 6 1 3 5 7 --- To evaluate inplace unshuffle evaluate halves inplace recursively use values to compute result ----- Interpolating at roots of unity For the Nth roots of unity, it turns out that INTERPOLATE is truly the inverse of EVALUATE r(x) -- . 1 2 3 4 5 6 7 %ps hback . r + r x + r x + r x + r x + r x + r x + r x %ps hback . 0 1 2 3 4 5 6 7 --- s(x) -- . 1 2 3 4 5 6 7 %ps hback . s + s x + s x + s x + s x + s x + s x + s x %ps hback . 0 1 2 3 4 5 6 7 --- -- . 0 0 %ps hback . r(w ) = s s(w ) = r %ps hback . 8 0 8 0 %ps qback . 1 -1 %ps hback . r(w ) = s s(w ) = r %ps hback . 8 1 8 1 %ps qback . 2 -2 %ps hback . r(w ) = s s(w ) = r %ps hback . 8 2 8 2 %ps qback . 3 -3 %ps hback . r(w ) = s s(w ) = r %ps hback . 8 3 8 3 %ps qback . 4 -4 %ps hback . r(w ) = s s(w ) = r %ps hback . 8 4 8 4 %ps qback . 5 -5 %ps hback . r(w ) = s s(w ) = r %ps hback . 8 5 8 5 %ps qback . 6 -6 %ps hback . r(w ) = s s(w ) = r %ps hback . 8 6 8 6 %ps qback . 7 -7 %ps hback . r(w ) = s s(w ) = r %ps hback . 8 7 8 7 %ps qback --- Fundamental property (Fourier inversion theorem) To INTERPOLATE at roots of unity, use the EVALUATE routine (!) ----- Fast Fourier transform To multiply to polynomials of degree N-1 EVALUATE both at (2N-1) roots of unity MULTIPLY (2N-1) results term-by-term INTERPOLATE by evaluating at (2N-1) inverse roots of unity Total time proportional to N lg N Shuffling keeps space proportional to N Bottom line: can multiply numbers with millions of digits "multiply" = "convolution" basic mathematical operation on series basic operation in digital signal processing allows "real time" processing