What is the fastest DFT(discrete Fourier transform) in Haskell?


To begin with, I have performed some preliminary benchmark experiments; here are the results Each line in the benchmark results has four labels name wisdom size repeat, where name is the name of the contestant, wisdom is the flag whether wisdom is used, size is the DFT array size and repeat is proportional to how many times the DFT is performed. I have compared:

The Method

For the rest of this article, I’ll explain how I have benchmarked them. The benchmarks are under development and comments are welcome!

I have used criterion package to compare them.

import Criterion.Main
import Control.Applicative
import Control.DeepSeq (deepseq)
import Control.Monad
import System.IO
import System.Process
import Text.Printf

The Solver defines the interface for the contestants. An initial function is given to the contestants, specified by four parameters. Then contestants are to repeatedly apply the Fourier transformation and inverse Fourier transformation on the given function for specified times. Finally, the contestants should measure the power of the function, both in the real space and in the wave space, and report them. In this way, the communication cost is minimized and we expect the better measurement of the DFT part.

import Solver

Here are the contestants that provide the solvers;

import qualified ContestantFFT
import qualified ContestantPureFFT
import qualified ContestantRepa
import qualified ContestantVectorFFTW

And here is another type of contestants that communicates via standard I/O;

fromExe :: FilePath -> Solver
fromExe exeFn prob = do
  (Just hin, Just hout, _, pid) <- createProcess (proc exeFn [])
    {std_in = CreatePipe, std_out = CreatePipe}
  hPutStrLn hin $ printf "%d %f %f %f %f %d %d"
    (if wisdomFlag prob then 1 else 0::Int)
    (x1 prob) (x2 prob) (y1 prob) (y2 prob)
    (probSize prob) (probRepeat prob)
  hClose hin
  (ans1: ans2: _) <- map read . words <$> hGetContents hout
  (ans1, ans2) `deepseq` hClose hout
  _ <- waitForProcess pid
  return (ans1, ans2)

Every contestant’s result must pass the test:

quarantine :: Solver -> Solver
quarantine solver prob = do
  (ans1, ans2) <- solver prob
    expectedAns, accuracy :: Double
    expectedAns = (x2 prob-x1 prob) + f (y1 prob) (y2 prob)
    f a b = 1/5 - (a+b)/2 + 1/3*(a**2+4*a*b+b**2) - a*b*(a+b) + a**2*b**2
    rn = fromIntegral $ probSize prob
    accuracy = max  (rn **(-2)) (1e-16 * rn)
    passTest x = abs(x-expectedAns) < accuracy
    pass = all passTest $ [ans1,ans2]
  when (not pass) $ do
    hPutStr stderr $ printf
      "wrong answer! expected %f +- %f, got: %s"
      expectedAns accuracy (show (ans1, ans2))
  return (ans1, ans2)

Now here are the list of contestants:

candidates :: [(String, Bool, Int, Solver)]
candidates = [
  ("cpp"        , True , 22, fromExe "./client-fftw-cpp"),
  ("fft"        , True , 22, ContestantFFT.solver),
  ("vector-fftw", True , 22, ContestantVectorFFTW.solver),
  ("repa"       , False, 12, ContestantRepa.solver),
  ("pure-fft"   , False, 12, ContestantPureFFT.solver)

Now the Olympics begin! The array size is varied from 210 to 220. Contestants perfom the DFT-iDFT in many cycles, from once to 102 times. By taking the difference of these cycles, I hope to cancel out the constants to obtain the better estimate of the bulk FFT cost.

main :: IO ()
main = defaultMain $ (:[]) $ bgroup "fft" $ do
  n <- map (2^) [10,11..(22::Int)]
  (tag1, isWise, ulim, solver) <- candidates
  wf <- if isWise then [False,True] else [False]
  --n <- map (2^) [ulim] -- perform only the heaviest benchmarks
  guard (n <= 2^ulim)
  rep <- [1,2,15,52,102]
  let prob = Problem{
        wisdomFlag = wf,
        x1 = 1/4, x2 = 3/4,
        y1 = 1/3, y2 = 2/3,
        probSize = n, probRepeat = rep
      tag2 = printf "%s %d %d %d"
             (if wf then 1 else 0 :: Int)
             (probSize prob) (probRepeat prob)
  return $ bench tag2 $ nfIO $ quarantine solver prob
comments powered by Disqus