Showing posts with label haskell. Show all posts
Showing posts with label haskell. Show all posts

Tuesday, June 30, 2009

Anagram Code

Finally, here is the anagram code. It worked on my tests, but failed to build the anagram trie with the enable word list (which I no longer find a direct link to online). I'm getting a stack overflow error which implies there is a space leak and a bit of investigation turns up that it is somewhere in building the trie. I've tried different ways to force strictness, but to no (so far) avail. But if I run it with a smaller dictionary, or with +RTS -K16M, it works.

Notice that to keep from getting duplicates, the function "anagramsFor" only will produce a new word if it follows (or is equal to) "starter", which is set to empty to start, then to the last word generated. This should eliminate duplicates where anagramsFor generates a list like ["aaa", "bbb", "ccc"] and also one like ["aaa", "ccc", "bbb"].

It runs my (simple) test cases (using small dictionaries).

Here's the code :


{-# LANGUAGE BangPatterns #-}

{- run with +RTS -K16M for enable wordlist -}

import Char
import Data.List
import System
import qualified Data.Map as M
import Control.Monad.State
import Control.Monad.List

data Trie a b = Empty
| Node { entries :: (M.Map a (Trie a b)), ender :: Maybe b }
deriving (Eq, Show)

type AnagramTrie = Trie Char String
data AnagramHelper = Ah { trie :: AnagramTrie, strings :: M.Map String [String]}
deriving Show

type AnagramTree = Trie String [String]

emptyHelper = Ah { trie = Empty, strings = M.empty }

trieInsert :: (Ord a, Show a) => [a] -> b -> Trie a b -> Trie a b

trieInsert [] w Empty = Node {entries=M.empty, ender=Just w}
trieInsert [] w n = n -- should never happen

trieInsert !l1@(x:xs) !l Empty = Node {entries=M.singleton x (trieInsert xs l Empty ), ender=Nothing}

trieInsert !l1@(x:xs) !al !Node{entries=es, ender=en}
| M.member x es = let !st = trieInsert xs al in Node{entries=M.adjust st x es, ender=en}
| otherwise = let !st = trieInsert xs al Empty in Node{entries=M.insert x st es, ender=en}

findInTrie Empty _ = Nothing
findInTrie Node{entries = es, ender=en} [] = en
findInTrie Node{entries=es, ender=en} (c:cs)
| M.member c es = findInTrie (fromJust $ M.lookup c es) cs
| otherwise = Nothing

toString :: (Show a, Show b) => Trie a b -> String -> [String]
toString Empty indent = []
toString Node{entries=es, ender=en} indent = enderLine:subtrees
where
enderLine = case en of
Nothing -> []
Just l -> (indent ++ "-" ++ show l)
doEntry k t = (indent ++ " " ++ show [k]):(toString t (" "++indent))
subtrees = concat $ M.elems $ M.mapWithKey doEntry es

-- extract the paths in a trie (ignoring the entries, if any)
pathsThrough :: Trie a b -> [[a]]
pathsThrough Empty = [[]]
pathsThrough Node{entries=es,ender=en} = l
where
l = concatMap (\(x,y) -> (map (x:)) (pathsThrough y)) $ M.assocs es

allWhitespace s = and $ map isSpace s

makeAnagramHelperFromList l = foldl' ahInsert emptyHelper l

ahInsert !a@Ah{trie=t,strings= s} !word
| M.member sw s = Ah{trie=t, strings = M.insertWith' insertUnique sw [word] s}
| otherwise = Ah{trie=trieInsert sw sw t, strings=M.insert sw [word] s}
where
sw = sort word
insertUnique [w] l = if w `elem` l then l else (w:l)

clean x = filter (not.null) (nub x) -- profiling shows this is a bit faster

-- These are some common lists of words - one per line used here
-- these (obviously) live in my filesystem - you might use /usr/dict/share/words if
-- you're on a unix system

-- wordfile = "words/american-english-large-no-quotes"
-- wordfile = "words/american-english-smaller"
-- wordfile = "words/shorter-wordlist"
-- wordfile = "words/random-words"
-- wordfile = "words/seethe"

strip s = reverse $ dropWhile isSpace $ reverse $ dropWhile isSpace s

allchars w = (not.or) $ map (\x -> x < 'a' || x > 'z') w

-- most dictionaries include all the letters on their own, so
-- filter out all one letter words, but toss "a" and "i" back in
-- also strip the strings as there are some cases where they have trailing spaces
-- sometimes there seem to be duplicates (probably because there are caps in some versions)
-- in this version duplicates are handled by the builder

getWordsFromList l = ("a" :) $ ("i" :) $ filter (\x->(length x) > 1) $ filter allchars $ map strip $ (lines $ map toLower l)

makeAnagramHelperFromFile fn = do
rawWords <- readFile fn
let wordList = getWordsFromList rawWords
return $! makeAnagramHelperFromList wordList

getStringsFromTopWithString :: AnagramTrie -> String -> [String]

getStringsFromTopWithString Empty _ = []
getStringsFromTopWithString _ [] = []
getStringsFromTopWithString topTrie@Node{entries=es, ender=en} s@(c:cs) =
let maybeSubTree = M.lookup c es
in case maybeSubTree of
Nothing -> []
Just thisSubTree -> maybePrepend en $ findAllSubsetStringsInSubTree thisSubTree cs

findAllSubsetStringsInSubTree Empty _ = []
findAllSubsetStringsInSubTree Node{ender=en} [] = maybePrepend en []
findAllSubsetStringsInSubTree t@Node{entries=es, ender=en} (c:cs) =
let next = M.lookup c es -- does this first character in this node have a subtree ?
subs = findAllSubsetStringsInSubTree t $ dropWhile (==c) cs -- no, skip all of those and try the rest
in case next of
Nothing -> subs
Just t' -> maybePrepend en $ findAllSubsetStringsInSubTree t' cs ++ subs

maybePrepend Nothing l = l
maybePrepend (Just x) l = x:l

fromJust (Just l) = l

testWords = ["at", "tear", "rat", "ate", "sea", "teas", "tea", "erase", "tar", "art", "eat", "tears", "stare", "reset", "stares", "star" , "tartar", "seatea", "erasetar", "tee", "see"]

getTestTrie = trie getTestAH

getTestAH = makeAnagramHelperFromList testWords

anagramsFor starter ah@Ah{trie=t, strings=m} [] = Just Empty
anagramsFor starter ah@Ah{trie=t, strings=m} s = anatree
where
s' = sort s
slist = getStringsFromTopWithString t s'
subtreeskeys = filter (\(x,y) -> y /=Nothing) $ map (\x -> (x,anagramsFor x ah (s' \\ x))) slist
subtreeskeys1 = filter (\(x,y) -> x>=starter) subtreeskeys
mkAnaTrieEntries m (x, Nothing) = m
mkAnaTrieEntries m (x, Just y) = M.insert x y m

anatree = if subtreeskeys1 == []
then Nothing
else Just Node{entries=foldl mkAnaTrieEntries M.empty subtreeskeys1, ender=Just s}

simpleTest s = do
let t = anagramsFor [] getTestAH s
t' = fromJust t
mapM_ putStrLn $ toString t' ""
mapM_ putStrLn $ map show $ pathsThrough t'

main = do
(wl, strings) <- getCommandLineArgs
putStrLn $ "Using dictionary : " ++ wl
ah <- makeAnagramHelperFromFile wl
mapM_ (getAnagrams ah) strings

getCommandLineArgs = do
al <- getArgs
if length al < 1
then error "Args : [-dictionary] strings "
else let (s:xs) = al !! 0
in if s == '-'
then return (xs,drop 1 al)
else return ("words/enable.list", al)

listVariants :: AnagramHelper -> [String] -> [[String]]
listVariants Ah{strings=m} sl = map getWords sl
where
getWords s = fromJust $ M.lookup s m

getAnagrams ah s = do
putStrLn $ "getting anagrams for " ++ s
let at = anagramsFor [] ah s
case at of
Nothing -> putStrLn $ "No anagrams for: " ++ s
Just t -> do
mapM_ putStrLn $ toString t ""
let p = pathsThrough t
mapM_ putStrLn $ map show p
let v = map (listVariants ah) p
mapM_ putStrLn $ map show v

Tuesday, May 05, 2009

Whole Kenken program

Here is the complete program including all the bits previously posted as well as some helper functions and the main driver. It is set up in such a way that you can load it into ghci and then run "doPuzzle filename" to run a puzzle.


import System
import Char
import Maybe
import Control.Monad.State

-- a puzzle has a size (so we know the limit of values to use)
-- its original input as a list of strings (just in case we want to print it)
-- a list of constraints
-- and a list of cells with position/label
-- the cells could be a list of lists, but doing the lookup in a data set
-- this size isn't likely to be the limiting factor and we'll abstract
-- over getting a cell by x,y coordinates anyway

data Puzzle = Puzzle {
psize :: Int,
origInput :: [String],
constraints :: [Constraint],
pcells :: [Cell]
}

-- constraints have labels (from the input description)
-- operations (the arithmetic operators as strings)
-- target values
-- and a list of the cells that make up the constraint
data Constraint = Constraint
{ conlabel :: String,
conop :: String,
contarget :: Int,
concells :: [Cell]
}
deriving Show

-- each cell in the puzzle has a position (cx, cy) and a label
-- corresponding to the constraint it is in
data Cell = Cell {
clabel :: String,
cx :: Int,
cy :: Int
}
deriving (Eq, Show)

-- an assignment is, well, an assignment of a value to a cell

data Assignment = Assignment { acell :: Cell, avalue :: Int }
deriving (Show, Eq)

-- a possibility represents a "possible" solution to the puzzle

type Possibility = [ Assignment ]

-- The PuzzleM type contains the base puzzle

type PuzzleM = StateT Puzzle IO

getPuzzle :: PuzzleM Puzzle
getPuzzle = get

getConstraints = do
p <- getPuzzle
return $ constraints p

-- not a fancy show, but shows the pieces - quick and easy
instance Show Puzzle where
show (Puzzle{psize=s, origInput=inp,constraints=cos,pcells=ces}) =
unlines $ ["Puzzle::", "size="++(show s)]
++ inp
++ (map show cos)
++ (map show ces)


strip l = sl
where
sl = reverse $ dropWhile isSpace $ reverse $ dropWhile isSpace l

parse :: String -> Puzzle
parse s = Puzzle {psize=size, origInput=plines, constraints=constraintList, pcells=cellList}
where
plines = map strip $ lines s
(cellLines, constraintLines) = break ([]==) plines
size = length cellLines
cellList = doCellLines 0 cellLines
constraintList = parseConstraintLines cellList $ tail constraintLines

doCellLines :: Int -> [String] -> [Cell]
doCellLines i [] = []
doCellLines i (l:ls) = let l1 = zip [0..] l
mkcell (xpos, y) = Cell { clabel=[y], cx=xpos, cy=i}
l2 = map mkcell l1
in l2 ++ (doCellLines (i+1) ls)

parseConstraintLines cells lines = map (parseConstraint cells) (filter ("" /=) lines)

parseConstraint cells l = Constraint {conlabel= label,
conop = op,
contarget = target,
concells = clist }
where
(label,rest) = break ('='==) l
(starget, op) = break (not.isDigit) $ tail rest
target = read starget
clist = filter (\c -> clabel c == label) cells

showPuzzle = do
p <- getPuzzle
liftIO $ putStrLn $ show p

solve :: [Cell] -> Possibility -> PuzzleM Possibility
solve [] assList = return assList
solve cl@(c:cs) assList = do
s <- psize `liftM` getPuzzle
let pass = map (\v -> Assignment{ acell=c, avalue=v}) [1..s]
passes = map (:assList) pass
solve1 cs passes

solve1 cells [] = return []
solve1 cells pl@(p:ps) = do
good <- okSoFar p
if good
then do solved <- solve cells p
if solved /= []
then return solved
else solve1 cells ps
else solve1 cells ps

allRowsOK p = do
s <- psize `liftM` getPuzzle
return $ and $ map (rowOK s p) [0..s-1]

allColsOK p = do
s <- psize `liftM` getPuzzle
return $ and $ map (colOK s p) [0..s-1]

rowOK s plist row = allDifferent (map avalue inrow)
where
inrow = filter (\x -> (row == (cy $ acell x))) plist

colOK s plist col = allDifferent (map avalue incol)
where
incol = filter (\x -> (col == (cx $ acell x))) plist

allDifferent [] = True
allDifferent (x:xs) = (not $ elem x xs) && allDifferent xs

allConsOK p = do
conlist <- constraints `liftM` getPuzzle
return $ and $ map (conOK p) conlist

conOK p constraint = checkCon convals contype target cl
where
concl = concells constraint
convals = map avalue $ filter (\x -> ( acell x) `elem` concl) p
contype = conop constraint
target = contarget constraint
cl = length concl

checkCon [] _ tgt _ = True
checkCon cl "=" tgt _ = tgt == cl !! 0
checkCon cl "*" tgt l = if length cl == l
then tgt == product cl
else 0== tgt `mod` (product cl)
checkCon cl "+" tgt l = if length cl == l
then tgt == sum cl
else tgt >= sum cl
checkCon cl "-" tgt _
| length cl > 2 = False
| length cl == 2 = abs(cl !! 0 - cl !! 1) == tgt
| length cl == 1 = True

checkCon cl "/" tgt _
| length cl > 2 = False
| length cl == 1 = True
| length cl == 2 = let a = cl !! 0
b = cl !! 1
in (a `div` b) == tgt || (b `div` a) == tgt

okSoFar p = do
rowsOK <- allRowsOK p
colsOK <- allColsOK p
consOK <- allConsOK p
return $ rowsOK && colsOK && consOK

showKnownCells al s = unlines $ map getRow [0..s-1]
where
getCellByRowCol al r c = filter (\x -> (r == (cy $ acell x)) && (c == (cx $ acell x))) al
getRow r = unwords $ map doCell $ map (getCellByRowCol al r) [0..s-1]
doCell [] = " "
doCell (x:xs) = show $ avalue x


runPuzzle = do
{- showPuzzle -}
cl <- pcells `liftM` getPuzzle
solve cl []

showPossibles p = unlines $ map show p

main = do
args <- getArgs
doPuzzle (args !! 0)


doPuzzle fn = do
inp <- readFile fn
let puzzle = parse inp

putStrLn inp
putStrLn "about to evalState puzzle..."
(soln,p) <- runStateT runPuzzle puzzle
putStrLn $ showKnownCells soln 6

Monday, May 04, 2009

Kenken Solver

The solver is the remaining major piece of the kenken program. It is simple enough here - the function solve takes a list of cells that are not yet assigned values, a list of cells with values (a "Possibility") and returns a "Possibility" that should, if not null, result in a solution. To do this it takes the next unassigned cell from the list of cells, makes a list of all the possible values it might take (that is the values from 1 up to the size of the puzzle - no culling is attempted) and tries to solve the puzzle with each of those values being assigned to the cell.

This looks like :

solve :: [Cell] -> Possibility -> PuzzleM Possibility
solve [] assList = return assList
solve cl@(c:cs) assList = do
s <- psize `liftM` getPuzzle
let pass = map (\v -> Assignment{ acell=c, avalue=v}) [1..s]
passes = map (:assList) pass
solve1 cs passes

solve1 cells [] = return []
solve1 cells pl@(p:ps) = do
good <- okSoFar p
if good
then do solved <- solve cells p
if solved /= []
then return solved
else solve1 cells ps
else solve1 cells ps


I think that if I used List as the base monad in the stack (instead of IO) I could have used the nondeterminism aspect to simplify this, but I did not, so here's what I have.

Friday, April 24, 2009

Simple kenken solver in haskell

Recently the NY Times started doing Kenken puzzles. These are numeric puzzles in the sudoku vein. For a good overview, see the wikipedia page where they have a nice sample puzzle.

I solved a couple of these and then decided that it was time to build a solver. My first solver (in Python) dissolved in a flurry of overcomplicated algorithms and data structures and I decided to start from scratch in Haskell (in part to try to improve my Haskell skills). First though, I needed an input format. I constructed one that was simple, easy to derive from a puzzle and easy to edit. In this format the puzzle grid is laid out with letters indicating the blocks and a list of constraints on the blocks on subsequent lines. Each constraint is a label (from the grid), an equals sign, a target value (numeric) and an operator ("+", "-", "/", "*", "=" - used when the value in the cell is set). This format has the advantage that it is easy for me to read and easy to parse.

The puzzle from the wikipedia entry is given below.

abbcdd
aeecfd
gghhfd
ggijkk
llijjm
nnnoom

a=11+
b=2/
c=20*
d=6*
e=3-
f=3/
g=240*
h=6*
i=6*
j=7+
k=30*
l=6*
m=9+
n=8+
o=2/



For example, the first block (labeled "a" in the square) needs to have a sum of 11.

My first solver was a simple backtracking recursive solver. It didn't use any constraint information except to verify that the current solution was ok.

To do this, I built several data structures. First, a Cell is an x,y location and a Constraint label (such as "a" above). I use the (x,y) information in the Cell to locate it rather than keeping a two dimensional array (or list of lists). This does mean that in several places I scan the list of cells to find a cell, but since the list of cells is typically small for these puzzles, that is not that much of a problem :


data Cell = Cell {
clabel :: String,
cx :: Int,
cy :: Int
}
deriving (Eq, Show)

Next, a Constraint is a label (from the puzzle input), a target value, an operation (as a string) and a list of cells. The list of cells could also be constructed as needed, but since checking the constraint always required looking at the list, I put this in.

data Constraint = Constraint
{ conlabel :: String,
conop :: String,
contarget :: Int,
concells :: [Cell]
}
deriving Show


An Assignment is a Cell - value pair, and we build up a list of possible assignments in a Possibility (that is, a possible solution). Assignments are not part of the puzzle, but are
carried around in the recursive calls.

data Assignment = Assignment { acell :: Cell, avalue :: Int }
deriving (Show, Eq)

type Possibility = [ Assignment ]

A puzzle has a size, the original input (which is useful for debugging), a list of constraints and a list of cells. Since I'd like to pass the puzzle around in lots of places, I'm building a State Monad of this as well.

data Puzzle = Puzzle {
psize :: Int,
origInput :: [String],
constraints :: [Constraint],
pcells :: [Cell]
}

type PuzzleM = StateT Puzzle IO


Next post: parsing the input.