that jefu guy: haskell

Showing posts with label haskell. Show all posts

Tuesday, June 30, 2009

Anagram Code

Finally, here is the anagram code. It worked on my tests, but failed to build the anagram trie with the enable word list (which I no longer find a direct link to online). I'm getting a stack overflow error which implies there is a space leak and a bit of investigation turns up that it is somewhere in building the trie. I've tried different ways to force strictness, but to no (so far) avail. But if I run it with a smaller dictionary, or with +RTS -K16M, it works.

Notice that to keep from getting duplicates, the function "anagramsFor" only will produce a new word if it follows (or is equal to) "starter", which is set to empty to start, then to the last word generated. This should eliminate duplicates where anagramsFor generates a list like ["aaa", "bbb", "ccc"] and also one like ["aaa", "ccc", "bbb"].

It runs my (simple) test cases (using small dictionaries).

Here's the code :


{-# LANGUAGE BangPatterns #-} 

{- run with +RTS -K16M for enable wordlist -} 

import Char
import Data.List 
import System 
import qualified Data.Map as M
import Control.Monad.State
import Control.Monad.List

data Trie a b =   Empty 
                | Node { entries :: (M.Map a (Trie a b)),  ender :: Maybe b } 
                  deriving (Eq, Show) 

type AnagramTrie = Trie Char String 
data AnagramHelper = Ah { trie :: AnagramTrie, strings ::  M.Map String [String]} 
                     deriving Show 

type AnagramTree = Trie String [String] 

emptyHelper = Ah { trie = Empty, strings = M.empty } 

trieInsert :: (Ord a, Show a) => [a] -> b -> Trie a b -> Trie a b 

trieInsert [] w Empty = Node {entries=M.empty, ender=Just w} 
trieInsert [] w n = n  -- should never happen

trieInsert !l1@(x:xs) !l Empty = Node {entries=M.singleton x (trieInsert xs l Empty ), ender=Nothing} 
      
trieInsert !l1@(x:xs) !al !Node{entries=es, ender=en}      
  | M.member x es              = let !st = trieInsert xs al       in Node{entries=M.adjust st x es, ender=en} 
  | otherwise                  = let !st = trieInsert xs al Empty in Node{entries=M.insert x st es, ender=en} 

findInTrie Empty _ = Nothing
findInTrie Node{entries = es, ender=en} [] = en 
findInTrie Node{entries=es, ender=en} (c:cs)
  | M.member c es = findInTrie (fromJust $ M.lookup c es) cs
  | otherwise     = Nothing 

toString :: (Show a, Show b) => Trie a b   -> String -> [String]
toString Empty indent = [] 
toString Node{entries=es, ender=en} indent = enderLine:subtrees
   where 
       enderLine = case en of 
                      Nothing -> []  
                      Just l -> (indent ++ "-" ++ show l)
       doEntry k t =  (indent ++ " " ++ show [k]):(toString t ("  "++indent)) 
       subtrees =  concat $ M.elems $ M.mapWithKey doEntry es 

-- extract the paths in a trie (ignoring the entries, if any) 
pathsThrough :: Trie a b -> [[a]] 
pathsThrough Empty = [[]] 
pathsThrough Node{entries=es,ender=en} = l 
   where 
     l = concatMap (\(x,y) -> (map (x:)) (pathsThrough y)) $ M.assocs es 

allWhitespace s = and $ map isSpace s

makeAnagramHelperFromList l = foldl' ahInsert emptyHelper l

ahInsert !a@Ah{trie=t,strings= s} !word 
   | M.member sw s = Ah{trie=t, strings = M.insertWith' insertUnique sw [word] s}  
   | otherwise     = Ah{trie=trieInsert sw sw t, strings=M.insert sw [word] s} 
  where
    sw = sort word 
    insertUnique [w] l = if w `elem` l then l else (w:l)

clean x = filter (not.null) (nub x)   -- profiling shows this is a bit faster 

-- These are some common lists of words - one per line used here
-- these (obviously) live in my filesystem - you might use /usr/dict/share/words if
-- you're on a unix system 

-- wordfile = "words/american-english-large-no-quotes"
-- wordfile = "words/american-english-smaller"
-- wordfile = "words/shorter-wordlist"
-- wordfile = "words/random-words"
-- wordfile = "words/seethe"

strip s = reverse $ dropWhile isSpace $ reverse $ dropWhile isSpace s 

allchars w         = (not.or) $ map (\x -> x < 'a' || x > 'z') w

-- most dictionaries include all the letters on their own, so 
-- filter out all one letter words, but toss "a" and "i" back in
-- also strip the strings as there are some cases where they have trailing spaces 
-- sometimes there seem to be duplicates (probably because there are caps in some versions)
-- in this version duplicates are handled by the builder 

getWordsFromList l = ("a" :) $ ("i" :) $ filter (\x->(length x) > 1) $ filter allchars $ map strip $ (lines $ map toLower l)

makeAnagramHelperFromFile fn = do 
    rawWords <- readFile fn 
    let wordList = getWordsFromList rawWords  
    return $! makeAnagramHelperFromList wordList 

getStringsFromTopWithString :: AnagramTrie -> String -> [String] 

getStringsFromTopWithString Empty _ = [] 
getStringsFromTopWithString _  [] = [] 
getStringsFromTopWithString topTrie@Node{entries=es, ender=en} s@(c:cs) = 
  let maybeSubTree =  M.lookup c es 
  in case maybeSubTree of 
       Nothing -> [] 
       Just thisSubTree -> maybePrepend en $ findAllSubsetStringsInSubTree thisSubTree cs 
        
findAllSubsetStringsInSubTree Empty _ = [] 
findAllSubsetStringsInSubTree Node{ender=en} [] = maybePrepend en [] 
findAllSubsetStringsInSubTree t@Node{entries=es, ender=en} (c:cs) = 
   let next = M.lookup c es  -- does this first character in this node have a subtree ? 
       subs = findAllSubsetStringsInSubTree t $ dropWhile (==c) cs -- no, skip all of those and try the rest 
   in case next of 
        Nothing -> subs 
        Just t' -> maybePrepend en $ findAllSubsetStringsInSubTree t' cs ++ subs  

maybePrepend Nothing l = l 
maybePrepend (Just x) l = x:l

fromJust (Just l) = l 

testWords = ["at", "tear", "rat", "ate", "sea", "teas", "tea", "erase", "tar", "art", "eat", "tears", "stare", "reset", "stares", "star" , "tartar", "seatea", "erasetar", "tee", "see"] 

getTestTrie = trie getTestAH   

getTestAH = makeAnagramHelperFromList testWords 

anagramsFor starter ah@Ah{trie=t, strings=m} [] = Just Empty
anagramsFor starter ah@Ah{trie=t, strings=m} s  = anatree  
    where 
       s' = sort s 
       slist = getStringsFromTopWithString t s' 
       subtreeskeys = filter (\(x,y) -> y /=Nothing) $ map (\x -> (x,anagramsFor x ah (s' \\ x))) slist 
       subtreeskeys1 = filter (\(x,y) -> x>=starter) subtreeskeys 
       mkAnaTrieEntries m (x, Nothing) = m 
       mkAnaTrieEntries m (x, Just y)  = M.insert x y m 
                                 
       anatree =  if subtreeskeys1 == [] 
                     then Nothing 
                     else Just Node{entries=foldl mkAnaTrieEntries M.empty subtreeskeys1, ender=Just s} 

simpleTest s = do 
  let t = anagramsFor [] getTestAH s
      t' = fromJust t 
  mapM_ putStrLn $ toString t' "" 
  mapM_ putStrLn $ map show $ pathsThrough t' 

main = do
          (wl, strings) <- getCommandLineArgs   
          putStrLn $ "Using dictionary : " ++ wl
          ah <- makeAnagramHelperFromFile wl
          mapM_ (getAnagrams ah) strings 

getCommandLineArgs = do 
   al <- getArgs
   if length al < 1 
      then error "Args : [-dictionary] strings " 
      else let (s:xs) = al !! 0  
           in if s == '-' 
              then return (xs,drop 1 al) 
              else return ("words/enable.list", al) 

listVariants :: AnagramHelper -> [String] -> [[String]] 
listVariants Ah{strings=m} sl = map getWords sl 
  where
    getWords s = fromJust $ M.lookup s m 

getAnagrams ah s = do 
   putStrLn $ "getting anagrams for " ++ s 
   let at = anagramsFor [] ah s 
   case at of 
      Nothing -> putStrLn $ "No anagrams for: " ++ s 
      Just t  -> do 
                    mapM_ putStrLn $ toString t "" 
                    let p = pathsThrough t 
                    mapM_ putStrLn $ map show p 
                    let v = map (listVariants ah) p 
                    mapM_ putStrLn $ map show v

Tuesday, May 05, 2009

Whole Kenken program

Here is the complete program including all the bits previously posted as well as some helper functions and the main driver. It is set up in such a way that you can load it into ghci and then run "doPuzzle filename" to run a puzzle.


import System 
import Char 
import Maybe 
import Control.Monad.State

-- a puzzle has a size (so we know the limit of values to use)
-- its original input as a list of strings (just in case we want to print it)
-- a list of constraints 
-- and a list of cells with position/label
-- the cells could be a list of lists, but doing the lookup in a data set
-- this size isn't likely to be the limiting factor and we'll abstract
-- over getting a cell by x,y coordinates anyway 

data Puzzle = Puzzle { 
      psize :: Int, 
      origInput ::  [String],  
      constraints :: [Constraint], 
      pcells :: [Cell] 
    } 
   
-- constraints have labels (from the input description)
-- operations (the arithmetic operators as strings)
-- target values 
-- and a list of the cells that make up the constraint 
data Constraint =  Constraint 
                   { conlabel :: String,
                     conop :: String,
                     contarget :: Int,
                     concells :: [Cell] 
                   }
                  deriving Show 

-- each cell in the puzzle has a position (cx, cy) and a label
-- corresponding to the constraint it is in 
data Cell = Cell {
              clabel ::  String,
              cx :: Int,
              cy :: Int
          }
          deriving (Eq, Show)

-- an assignment is, well, an assignment of a value to a cell

data Assignment = Assignment { acell :: Cell, avalue :: Int } 
                deriving (Show, Eq)

-- a possibility represents a "possible" solution to the puzzle

type Possibility = [ Assignment ] 

-- The PuzzleM type contains the base puzzle

type PuzzleM = StateT Puzzle IO

getPuzzle :: PuzzleM Puzzle 
getPuzzle =  get 

getConstraints = do 
  p <- getPuzzle 
  return $ constraints p 

-- not a fancy show, but shows the pieces - quick and easy 
instance Show Puzzle where
   show (Puzzle{psize=s, origInput=inp,constraints=cos,pcells=ces}) = 
     unlines $ ["Puzzle::", "size="++(show s)] 
                 ++ inp
                 ++  (map show cos) 
                 ++ (map show ces)


strip l = sl 
  where 
      sl = reverse $ dropWhile isSpace $ reverse $ dropWhile isSpace l 

parse :: String -> Puzzle
parse s = Puzzle {psize=size, origInput=plines, constraints=constraintList, pcells=cellList}
          where 
            plines = map strip $ lines s 
            (cellLines, constraintLines) = break ([]==) plines
            size = length cellLines 
            cellList = doCellLines 0 cellLines 
            constraintList = parseConstraintLines cellList $ tail constraintLines 

doCellLines :: Int -> [String] -> [Cell] 
doCellLines i []     = [] 
doCellLines i (l:ls) = let l1 = zip [0..] l 
                           mkcell (xpos, y) = Cell { clabel=[y], cx=xpos, cy=i} 
                           l2 = map mkcell l1 
                           in l2 ++ (doCellLines (i+1) ls) 

parseConstraintLines cells lines = map (parseConstraint cells) (filter ("" /=) lines)

parseConstraint cells l =  Constraint {conlabel= label, 
                                   conop = op, 
                               contarget = target,
                                   concells = clist } 
                    where 
                      (label,rest) = break ('='==) l 
                      (starget, op) = break (not.isDigit) $ tail rest 
                      target = read starget         
                      clist = filter (\c -> clabel c == label) cells 

showPuzzle = do 
                p <- getPuzzle
                liftIO $ putStrLn $ show p 

solve :: [Cell] -> Possibility -> PuzzleM Possibility
solve [] assList  = return assList 
solve cl@(c:cs) assList  = do 
                           s <- psize `liftM` getPuzzle 
                           let pass = map (\v -> Assignment{ acell=c, avalue=v}) [1..s]
                               passes = map (:assList) pass 
                           solve1 cs passes 

solve1 cells [] = return [] 
solve1 cells pl@(p:ps) = do 
                        good <- okSoFar p 
                        if good
                           then do solved <- solve cells p
                                   if solved /= [] 
                                      then return solved
                                      else solve1 cells ps 
                           else solve1 cells ps 

allRowsOK p = do 
                s <- psize `liftM` getPuzzle
                return $ and $ map (rowOK s p) [0..s-1] 

allColsOK p = do 
                s <- psize `liftM` getPuzzle
                return $ and $ map (colOK s p) [0..s-1] 

rowOK s plist row = allDifferent (map avalue inrow) 
    where 
      inrow = filter (\x -> (row == (cy $ acell x))) plist 

colOK s plist col = allDifferent (map avalue incol) 
    where 
      incol = filter (\x -> (col == (cx $ acell x))) plist 
                               
allDifferent [] = True 
allDifferent (x:xs) = (not $ elem x xs) && allDifferent xs 

allConsOK p = do 
                 conlist <- constraints `liftM` getPuzzle  
                 return $ and $ map (conOK p) conlist 

conOK p constraint = checkCon convals contype target cl
    where 
      concl = concells constraint 
      convals = map avalue $ filter (\x -> ( acell x) `elem` concl) p
      contype = conop constraint 
      target = contarget constraint 
      cl = length concl 

checkCon [] _   tgt _ = True       
checkCon cl "=" tgt _ = tgt == cl !! 0
checkCon cl "*" tgt l = if length cl == l 
                           then tgt == product cl 
                           else 0== tgt `mod` (product cl) 
checkCon cl "+" tgt l = if length cl == l
                           then tgt == sum cl 
                           else tgt >= sum cl 
checkCon cl "-" tgt _
  | length cl > 2   = False 
  | length cl == 2  = abs(cl !! 0 - cl !! 1) == tgt 
  | length cl == 1  = True

checkCon cl "/" tgt  _
  | length cl > 2   = False 
  | length cl == 1  = True 
  | length cl == 2  = let a = cl !! 0 
                          b = cl !! 1
                      in  (a `div` b) == tgt  || (b `div` a) == tgt

okSoFar p = do 
               rowsOK <- allRowsOK p 
               colsOK <- allColsOK p 
               consOK <- allConsOK p 
               return $ rowsOK && colsOK && consOK 

showKnownCells al s =   unlines $ map getRow [0..s-1] 
                    where 
                      getCellByRowCol al r c = filter (\x -> (r == (cy $ acell x)) && (c == (cx $ acell x))) al 
                      getRow r = unwords $ map doCell $ map (getCellByRowCol al r) [0..s-1] 
                      doCell [] = " "
                      doCell (x:xs) = show $ avalue x 
                      

runPuzzle  = do
                {- showPuzzle  -} 
                cl <- pcells `liftM` getPuzzle 
                solve cl [] 

showPossibles p =   unlines $ map show p

main = do 
          args <- getArgs 
          doPuzzle (args !! 0)


doPuzzle fn = do 
  inp <- readFile fn 
  let puzzle = parse inp 
  
  putStrLn inp 
  putStrLn "about to evalState puzzle..."
  (soln,p) <- runStateT runPuzzle puzzle
  putStrLn $ showKnownCells soln 6

Monday, May 04, 2009

Kenken Solver

The solver is the remaining major piece of the kenken program. It is simple enough here - the function solve takes a list of cells that are not yet assigned values, a list of cells with values (a "Possibility") and returns a "Possibility" that should, if not null, result in a solution. To do this it takes the next unassigned cell from the list of cells, makes a list of all the possible values it might take (that is the values from 1 up to the size of the puzzle - no culling is attempted) and tries to solve the puzzle with each of those values being assigned to the cell.

This looks like :


solve :: [Cell] -> Possibility -> PuzzleM Possibility
solve [] assList  = return assList 
solve cl@(c:cs) assList  = do 
                           s <- psize `liftM` getPuzzle 
                           let pass = map (\v -> Assignment{ acell=c, avalue=v}) [1..s]
                               passes = map (:assList) pass 
                           solve1 cs passes 

solve1 cells [] = return [] 
solve1 cells pl@(p:ps) = do 
                        good <- okSoFar p 
                        if good
                           then do solved <- solve cells p
                                   if solved /= [] 
                                      then return solved
                                      else solve1 cells ps 
                           else solve1 cells ps

I think that if I used List as the base monad in the stack (instead of IO) I could have used the nondeterminism aspect to simplify this, but I did not, so here's what I have.

Friday, April 24, 2009

Simple kenken solver in haskell

Recently the NY Times started doing Kenken puzzles. These are numeric puzzles in the sudoku vein. For a good overview, see the wikipedia page where they have a nice sample puzzle.

I solved a couple of these and then decided that it was time to build a solver. My first solver (in Python) dissolved in a flurry of overcomplicated algorithms and data structures and I decided to start from scratch in Haskell (in part to try to improve my Haskell skills). First though, I needed an input format. I constructed one that was simple, easy to derive from a puzzle and easy to edit. In this format the puzzle grid is laid out with letters indicating the blocks and a list of constraints on the blocks on subsequent lines. Each constraint is a label (from the grid), an equals sign, a target value (numeric) and an operator ("+", "-", "/", "*", "=" - used when the value in the cell is set). This format has the advantage that it is easy for me to read and easy to parse.

The puzzle from the wikipedia entry is given below.


abbcdd
aeecfd
gghhfd
ggijkk
llijjm
nnnoom

a=11+
b=2/
c=20*
d=6*
e=3-
f=3/
g=240*
h=6*
i=6*
j=7+
k=30*
l=6*
m=9+
n=8+
o=2/

For example, the first block (labeled "a" in the square) needs to have a sum of 11.

My first solver was a simple backtracking recursive solver. It didn't use any constraint information except to verify that the current solution was ok.

To do this, I built several data structures. First, a Cell is an x,y location and a Constraint label (such as "a" above). I use the (x,y) information in the Cell to locate it rather than keeping a two dimensional array (or list of lists). This does mean that in several places I scan the list of cells to find a cell, but since the list of cells is typically small for these puzzles, that is not that much of a problem :


data Cell = Cell {
            clabel ::  String,
            cx :: Int,
            cy :: Int
        }
        deriving (Eq, Show)

Next, a Constraint is a label (from the puzzle input), a target value, an operation (as a string) and a list of cells. The list of cells could also be constructed as needed, but since checking the constraint always required looking at the list, I put this in.


data Constraint =  Constraint
                  { conlabel :: String,
                    conop :: String,
                    contarget :: Int,
                    concells :: [Cell]
                  }
                 deriving Show

An Assignment is a Cell - value pair, and we build up a list of possible assignments in a Possibility (that is, a possible solution). Assignments are not part of the puzzle, but are
carried around in the recursive calls.


data Assignment = Assignment { acell :: Cell, avalue :: Int }
               deriving (Show, Eq)

type Possibility = [ Assignment ]

A puzzle has a size, the original input (which is useful for debugging), a list of constraints and a list of cells. Since I'd like to pass the puzzle around in lots of places, I'm building a State Monad of this as well.


data Puzzle = Puzzle {
     psize :: Int,
     origInput ::  [String], 
     constraints :: [Constraint],
     pcells :: [Cell]
   }

type PuzzleM = StateT Puzzle IO

Next post: parsing the input.

that jefu guy