that jefu guy

Tuesday, June 30, 2009

Anagram Code

Finally, here is the anagram code. It worked on my tests, but failed to build the anagram trie with the enable word list (which I no longer find a direct link to online). I'm getting a stack overflow error which implies there is a space leak and a bit of investigation turns up that it is somewhere in building the trie. I've tried different ways to force strictness, but to no (so far) avail. But if I run it with a smaller dictionary, or with +RTS -K16M, it works.

Notice that to keep from getting duplicates, the function "anagramsFor" only will produce a new word if it follows (or is equal to) "starter", which is set to empty to start, then to the last word generated. This should eliminate duplicates where anagramsFor generates a list like ["aaa", "bbb", "ccc"] and also one like ["aaa", "ccc", "bbb"].

It runs my (simple) test cases (using small dictionaries).

Here's the code :


{-# LANGUAGE BangPatterns #-} 

{- run with +RTS -K16M for enable wordlist -} 

import Char
import Data.List 
import System 
import qualified Data.Map as M
import Control.Monad.State
import Control.Monad.List

data Trie a b =   Empty 
                | Node { entries :: (M.Map a (Trie a b)),  ender :: Maybe b } 
                  deriving (Eq, Show) 

type AnagramTrie = Trie Char String 
data AnagramHelper = Ah { trie :: AnagramTrie, strings ::  M.Map String [String]} 
                     deriving Show 

type AnagramTree = Trie String [String] 

emptyHelper = Ah { trie = Empty, strings = M.empty } 

trieInsert :: (Ord a, Show a) => [a] -> b -> Trie a b -> Trie a b 

trieInsert [] w Empty = Node {entries=M.empty, ender=Just w} 
trieInsert [] w n = n  -- should never happen

trieInsert !l1@(x:xs) !l Empty = Node {entries=M.singleton x (trieInsert xs l Empty ), ender=Nothing} 
      
trieInsert !l1@(x:xs) !al !Node{entries=es, ender=en}      
  | M.member x es              = let !st = trieInsert xs al       in Node{entries=M.adjust st x es, ender=en} 
  | otherwise                  = let !st = trieInsert xs al Empty in Node{entries=M.insert x st es, ender=en} 

findInTrie Empty _ = Nothing
findInTrie Node{entries = es, ender=en} [] = en 
findInTrie Node{entries=es, ender=en} (c:cs)
  | M.member c es = findInTrie (fromJust $ M.lookup c es) cs
  | otherwise     = Nothing 

toString :: (Show a, Show b) => Trie a b   -> String -> [String]
toString Empty indent = [] 
toString Node{entries=es, ender=en} indent = enderLine:subtrees
   where 
       enderLine = case en of 
                      Nothing -> []  
                      Just l -> (indent ++ "-" ++ show l)
       doEntry k t =  (indent ++ " " ++ show [k]):(toString t ("  "++indent)) 
       subtrees =  concat $ M.elems $ M.mapWithKey doEntry es 

-- extract the paths in a trie (ignoring the entries, if any) 
pathsThrough :: Trie a b -> [[a]] 
pathsThrough Empty = [[]] 
pathsThrough Node{entries=es,ender=en} = l 
   where 
     l = concatMap (\(x,y) -> (map (x:)) (pathsThrough y)) $ M.assocs es 

allWhitespace s = and $ map isSpace s

makeAnagramHelperFromList l = foldl' ahInsert emptyHelper l

ahInsert !a@Ah{trie=t,strings= s} !word 
   | M.member sw s = Ah{trie=t, strings = M.insertWith' insertUnique sw [word] s}  
   | otherwise     = Ah{trie=trieInsert sw sw t, strings=M.insert sw [word] s} 
  where
    sw = sort word 
    insertUnique [w] l = if w `elem` l then l else (w:l)

clean x = filter (not.null) (nub x)   -- profiling shows this is a bit faster 

-- These are some common lists of words - one per line used here
-- these (obviously) live in my filesystem - you might use /usr/dict/share/words if
-- you're on a unix system 

-- wordfile = "words/american-english-large-no-quotes"
-- wordfile = "words/american-english-smaller"
-- wordfile = "words/shorter-wordlist"
-- wordfile = "words/random-words"
-- wordfile = "words/seethe"

strip s = reverse $ dropWhile isSpace $ reverse $ dropWhile isSpace s 

allchars w         = (not.or) $ map (\x -> x < 'a' || x > 'z') w

-- most dictionaries include all the letters on their own, so 
-- filter out all one letter words, but toss "a" and "i" back in
-- also strip the strings as there are some cases where they have trailing spaces 
-- sometimes there seem to be duplicates (probably because there are caps in some versions)
-- in this version duplicates are handled by the builder 

getWordsFromList l = ("a" :) $ ("i" :) $ filter (\x->(length x) > 1) $ filter allchars $ map strip $ (lines $ map toLower l)

makeAnagramHelperFromFile fn = do 
    rawWords <- readFile fn 
    let wordList = getWordsFromList rawWords  
    return $! makeAnagramHelperFromList wordList 

getStringsFromTopWithString :: AnagramTrie -> String -> [String] 

getStringsFromTopWithString Empty _ = [] 
getStringsFromTopWithString _  [] = [] 
getStringsFromTopWithString topTrie@Node{entries=es, ender=en} s@(c:cs) = 
  let maybeSubTree =  M.lookup c es 
  in case maybeSubTree of 
       Nothing -> [] 
       Just thisSubTree -> maybePrepend en $ findAllSubsetStringsInSubTree thisSubTree cs 
        
findAllSubsetStringsInSubTree Empty _ = [] 
findAllSubsetStringsInSubTree Node{ender=en} [] = maybePrepend en [] 
findAllSubsetStringsInSubTree t@Node{entries=es, ender=en} (c:cs) = 
   let next = M.lookup c es  -- does this first character in this node have a subtree ? 
       subs = findAllSubsetStringsInSubTree t $ dropWhile (==c) cs -- no, skip all of those and try the rest 
   in case next of 
        Nothing -> subs 
        Just t' -> maybePrepend en $ findAllSubsetStringsInSubTree t' cs ++ subs  

maybePrepend Nothing l = l 
maybePrepend (Just x) l = x:l

fromJust (Just l) = l 

testWords = ["at", "tear", "rat", "ate", "sea", "teas", "tea", "erase", "tar", "art", "eat", "tears", "stare", "reset", "stares", "star" , "tartar", "seatea", "erasetar", "tee", "see"] 

getTestTrie = trie getTestAH   

getTestAH = makeAnagramHelperFromList testWords 

anagramsFor starter ah@Ah{trie=t, strings=m} [] = Just Empty
anagramsFor starter ah@Ah{trie=t, strings=m} s  = anatree  
    where 
       s' = sort s 
       slist = getStringsFromTopWithString t s' 
       subtreeskeys = filter (\(x,y) -> y /=Nothing) $ map (\x -> (x,anagramsFor x ah (s' \\ x))) slist 
       subtreeskeys1 = filter (\(x,y) -> x>=starter) subtreeskeys 
       mkAnaTrieEntries m (x, Nothing) = m 
       mkAnaTrieEntries m (x, Just y)  = M.insert x y m 
                                 
       anatree =  if subtreeskeys1 == [] 
                     then Nothing 
                     else Just Node{entries=foldl mkAnaTrieEntries M.empty subtreeskeys1, ender=Just s} 

simpleTest s = do 
  let t = anagramsFor [] getTestAH s
      t' = fromJust t 
  mapM_ putStrLn $ toString t' "" 
  mapM_ putStrLn $ map show $ pathsThrough t' 

main = do
          (wl, strings) <- getCommandLineArgs   
          putStrLn $ "Using dictionary : " ++ wl
          ah <- makeAnagramHelperFromFile wl
          mapM_ (getAnagrams ah) strings 

getCommandLineArgs = do 
   al <- getArgs
   if length al < 1 
      then error "Args : [-dictionary] strings " 
      else let (s:xs) = al !! 0  
           in if s == '-' 
              then return (xs,drop 1 al) 
              else return ("words/enable.list", al) 

listVariants :: AnagramHelper -> [String] -> [[String]] 
listVariants Ah{strings=m} sl = map getWords sl 
  where
    getWords s = fromJust $ M.lookup s m 

getAnagrams ah s = do 
   putStrLn $ "getting anagrams for " ++ s 
   let at = anagramsFor [] ah s 
   case at of 
      Nothing -> putStrLn $ "No anagrams for: " ++ s 
      Just t  -> do 
                    mapM_ putStrLn $ toString t "" 
                    let p = pathsThrough t 
                    mapM_ putStrLn $ map show p 
                    let v = map (listVariants ah) p 
                    mapM_ putStrLn $ map show v

Monday, June 29, 2009

ICFP programming contest

Last Friday at 11:16 (local time) the 2009 ICFP programming contest started. One of our students and I gave it a try. Currently we're about 190/317 (with about 30 minutes to go) which isn't great. But it was fun and I spent much of my time between the start of the contest and yesterday about 5 pm working on it.

Lots of things I did wrong - our virtual machine ran nicely (once a problem with the specification was fixed and announced) but our interface to it started all wrong. Once that was fixed there were a bunch of bugs (as usual) many of them dealing with signs of quantities. Anyway we solved all the first level problems and most of the second level problems and I had figured out the basics of the third level problems (I think).

Biggest thing that would have helped us was better communications. We were working remotely (from each other) and being in one place to share a whiteboard and be able to more quickly talk about what was wrong and draw pictures would have helped immensely.

Sunday, June 21, 2009

Reading with the Flu

Of course, I was wrong in my previous post about not having to worry about multiplicity (and indeed my code doesn't do that). But the code needs to be a bit cleaned up before posting.

Over the last four days I've had some kind of mini-flu. Temperature in the 100.5 range and a complete inability to feel either warm enough or cool enough for more than about an hour. Not important as such, and it seems to be gone now, but it did lead to my spending quite a bit of time reading. This has included samples from the most recent "New Yorker", CACM, and IEEE Computer, but relatively little online stuff (sitting up was just too tiring).

Most of the reading was science fiction/fantasy and not overly demanding (the kind of density that works well for airline travel and mild flu). I would note that I don't usually read anything like this much paper in most four day periods.

The City and the City by China Miéville. Miéville is a fascinating writer. I've read his three books set in Bas-Lag, a very strange world but one that hangs together nicely. These are : Perdido Street Station, The Scar, and The Iron Council and are often characterized as the "new weird" but while they are certainly weird enough in setting, the stories and themes are very human. His newest The City and the City is a murder mystery and in many ways fits the conventions of that genre pretty nicely. Except that instead of being set in somewhere that we might know, it is set in two twin but separate cities Bezel (there is an accent over the "z" that I don't know how to generate) and Ul Qoma. What makes this interesting is that the cities overlap in some undefined way, and residents learn to "unsee" things in the other city that may be occurring across the street (or closer). This prompted me (I hesitate to say "the reader") to contemplate just this kind of overlap that occurs for other reasons - perhaps Jerusalem or some city in Europe where the Jews were forced to live apart (not necessarily in ghettos - just "apart") or even a city in the US where classes may intermix without intermixing. Think of the oft mentioned example of a black man failing to get a cab in New York where a white man just down the block succeeds.

I have not yet read Un Lun Dun by Miéville but look forward to it. I also added more than a few books to my (far too large and unlikely ever to even begin to empty significantly) Amazon wishlist after reading a list of book suggestions by him.

Candle by John Barnes. Just as I was ready to put this down thinking I'd figured it all out, he twists things just enough, and then it happens again. I'm not sure it is worth a re-read, but it was a good read-once.

Against a Dark Backround by Iain M. Banks. I've read a couple of his "Culture" novels and liked them well enough. This was a good read, but in a kind of unrelenting sameness sort of way. Lots of imaginative touches, but tossed in almost at random. I must also admit that books and movies where everyone gets killed off (one by one by one) to be less than rewarding in general. There are exceptions, but there needs to be pretty substantial other rewards.

Killing Time by Caleb Carr. I've enjoyed Carr's 19th century mysteries, but not enough to seek out any more. This one fell into my hands (as did several of the others I read in the last couple days) at a yard sale where someone who had a serious interest in science fiction was selling things off. I wanted to like it, but it just didn't work.

I finished Before and After by Rosellen Brown. This was made into a movie a while back. The book is pretty good and well worth reading. The narration from the viewpoint of several characters reads a bit false though as only one of the characters writes in the first person - is this narrator also the writer of the other viewpoints? Certainly a possibility, but not one that seemed to me to add much to the story.

I also read Slant by Greg Bear and Beowulf's Children by Niven, Pournelle and Barnes - on which I will not comment, and started A Distant Mirror by Barbara Tuchman and Snowqueen by Joan Vinge.

Wednesday, June 17, 2009

One of the advantages of doing things this way is that now we can build the tree containing the pieces of the anagram in such a way as to guarantee that there are no duplicates. Consider "staringstaring" again. This is sorted before searching for the anagrams giving "aaggiinnrrsstt". Now look up in the trie the first character of that string. Any anagram of the string must contain that first character, so, of course, if we find no lookup for "a" then there are no anagrams. If we find all the words in the subtrie of the trie containing "a" and any other characters of the string, then we can look up all the anagrams for them in the top level of the trie using the same procedure and building subtrees along the way. These then get stitched together into the complete tree. Even better, we don't need to even bother with the second "a" when we start at the top again because we already have all the anagrams that use an "a" and the multiplicity doesn't matter.

Even better, if you look at the structure of the tree of anagrams we're building, it is (drum roll please) just a trie again but with the indices being strings, so if we paramaterize the Trie we get the second tree we're using essentially for free. In this trie though, we don't need to remember values at the end points (though having something there doesn't hurt), we use the path down through the trie as the information we need. Indeed, if we construct it correctly, every node in the trie corresponds to a partial anagram, so the path to every leaf node describes the anagram (of sorted words, remember).

Of course, we're not using this trie the same way - the first trie is used as a search device, the second as a storage device, but the shape of the trie is the same in both cases. Also, the first trie is built from the top down - starting with an empty top node and adding things to it as we go, and the second from the bottom up. Also in the latter case, we need to prune empty subtries (which correspond to "failure to anagram").

Friday, June 12, 2009

Anagrams

I do (sporadically) cryptic crosswords. I used to do regular crosswords, but after a while they became less challenging - when I could do the NY Times Sunday Crossword in an hour or so (with perhaps five or six squares open), I started paying more attention to cryptics.

I'd been exposed to cryptics when I was in the Peace Corps and we could, from time to time, get the Zambian newspaper which had a cryptic although those were mostly beyond my skills. When I returned to the US, I found Stephen Sondheim's book of cryptics and was hooked - though they were difficult indeed, I could at least make headway with them (and now I wish I'd xeroxed the pages so I could do them again).

In any case, cryptics often use anagrams, so one day a while ago I wrote an anagram generator in Haskell. You gave it a dictionary and a string and it would generate all the anagrams from that string consisting of words in the dictionary. It was messy and didn't cope with repeated anagrams well.

Just a short time ago I decided to redo this and try to make it cleaner. I decided to reuse one of the main parts of the previous version, that of using a trie to store the words from the dictionary, but to change it around a bit. A trie is a kind of tree that stores subtrees indexed by prefix elements (in this case characters) of the objects stored in it. (Wikipedia has a much better picture than I can manage, I suspect). In the previous version, the nodes at the end of each string contained the words that ended up there. Words? Yes, I sorted the characters in the word before entering it in the trie, so multiple words ("tar", "rat", "art") could map to a single node. In this version, I'm still going to use a trie in essentially the same way, but instead of storing all the words that sort to the same string in the nodes, I'm going to store only the sorted string and provide for a map (Data.Map) to associate the sorted string with the list of words that are anagrams.

The main problem with the previous version was that a word could have multiple anagrams and these might (given the way I searched the trie) pop up in different orders. But the duplicates were always generated (which was both space and time costly) and then duplicates were eliminated (time costly). For example, "staring" would have ["rat", "sing"], ["tar", "gins"], ["art", "sing"], ["sing", "art"] appear in the list of anagrams, but these are really all more or less the same and could have been better represented as ["art", "igns"] where each anagram is in sorted order, and the anagrams themselves are sorted. Then, a final step could generate all the possibilities from this string by looking at the mapped values of "art" and "igns" and making all the combinations.

But there's another step. Consider "staringstaring". This sorts to "aaggiinnrrsstt". Suppose that we determine that "agg" is (I think it is) the first word alphabetically in the string (generated by the english word "gag"). Then we can drop those characters from the string and find all the anagrams for "aiinnrrsstt" (assuming there are any). If we build a tree that has a node with a string in it and subnodes a list of similar trees so that any path from the root to a leaf consists of a list of strings whose sorted concatenation is the string we're looking for anagrams for, and if we do it correctly, then it becomes easy to produce all the anagrams for the string.

More in a bit.

Thursday, May 07, 2009

What I've been reading

I read a bit. Typically have a book or two I'm working on scattered around my house in different places where I might end up reading for a while. From time to time I'll post on what I'm reading.

My current (or recent) selections includes:

Dreadnought by Robert K. Massie - a massive (> 1000 pages) history of England and Germany in the years leading up to the first world war. For some reason I've been reading a bit about the first world war a lot and find it very interesting and this book is no exception. The book focuses on the various personalities and especially on the people influencing the navies of the two countries. It is a bit scattershot for someone like me who doesn't know a bit more about the history of the time, bouncing from one personality to another, but is generally a good read and gives a nice overview of the people involved.

Bioinformatics by Volker Sperschneider - I do voluntary book reviews for Computing Reviews and this was a recent choice. I try to pick books on topics I'd like to know more about as well as on topics that I do know something about and this was mostly new material for me. I found it tough going most of the time and not as illuminating as I might like. For instance he starts out without really framing the problem (analysis and construction of DNA) sequences and alternates between very formal discussions and sketchy views of things. It is published by Springer and I'm finding the books published by Springer to be generally poor in quality, but with some exceptions that are excellent.

Mage Guard of Hamor - L.E. Modesitt Jr. A couple of years back I picked up six of the "Magic of Recluse" series by Modesitt and found them eminently readable. Since then as I've found new books in the series I've picked them up and read them. They often get tagged as "young adult" but are good reads for most anyone who likes fantasy. They do tend to be a bit repetive (boy discovers magic powers, boy has trouble with magic powers and the current power structure, boy rises above it all), but the magic involved makes a certain amount of sense in contest and is not unlimited - that is, there are few places where suddenly the magicians suddenly discover powers that come from nowhere. And Modesitt is a good storyteller and that makes up for a lot of the deficiencies. Even better, while the story takes place in a single world, each book is more or less self contained (with a few of the stories spanning two books) and each gives a different view of the world.

One nice thing is that the stories jump back and forth in history, so you get another view of what happens. In the first few books the heros tend to focus on "black magicians" and the "white magicians" are portrayed as being more or less evil, but as the series progresses we also get views of white magicians that manage to portray them as being good as well. I've not started any of his other fantasy novels, but if more get published in this series I'll probably read them as well.

Wednesday, May 06, 2009

Kenken Comments

My (previously posted) simple kenken solver solved every kenken puzzle I tried it on but two (where I think I transcribed the problem wrong). For the most part it seemed fast enough - taking about a half a second per puzzle. Profiling shows that most of the work goes into checking the various constraints, so they might benefit from tuning.

I thought it might help to sort the cells before solving the puzzle to see if there was a benefit to (for instance) doing the cells in division and subtraction constraints first. It turns out that that doesn't help much. Worse yet, if the cells are sorted so that addition and multiplication are first the run time goes from less than a second to hours. I had expected the run time to increase, but the size of the increase was startling. After a bit of consideration though, the reason became apparent.

Currently the cells are processed along the top row, then the second row and so on, which means that once the top left cell has been assigned a tentative value, the cells in the first row (and first column) are already constrained by the row/column constraints as well as by the (local) constraints imposed by the blocks. Thus fewer possibilities need to be considered. If we have a puzzle in which there are two division constraints at two diagonally opposite corners, and these are considered first, then the row and column constraints will have little (or no) effect and the solver will be forced to consider many more possible values for the cells.

Thus, solving a row at a time from left to right is probably about as good an ordering as you can get for this (not very smart) solver.

Tuesday, May 05, 2009

Whole Kenken program

Here is the complete program including all the bits previously posted as well as some helper functions and the main driver. It is set up in such a way that you can load it into ghci and then run "doPuzzle filename" to run a puzzle.


import System 
import Char 
import Maybe 
import Control.Monad.State

-- a puzzle has a size (so we know the limit of values to use)
-- its original input as a list of strings (just in case we want to print it)
-- a list of constraints 
-- and a list of cells with position/label
-- the cells could be a list of lists, but doing the lookup in a data set
-- this size isn't likely to be the limiting factor and we'll abstract
-- over getting a cell by x,y coordinates anyway 

data Puzzle = Puzzle { 
      psize :: Int, 
      origInput ::  [String],  
      constraints :: [Constraint], 
      pcells :: [Cell] 
    } 
   
-- constraints have labels (from the input description)
-- operations (the arithmetic operators as strings)
-- target values 
-- and a list of the cells that make up the constraint 
data Constraint =  Constraint 
                   { conlabel :: String,
                     conop :: String,
                     contarget :: Int,
                     concells :: [Cell] 
                   }
                  deriving Show 

-- each cell in the puzzle has a position (cx, cy) and a label
-- corresponding to the constraint it is in 
data Cell = Cell {
              clabel ::  String,
              cx :: Int,
              cy :: Int
          }
          deriving (Eq, Show)

-- an assignment is, well, an assignment of a value to a cell

data Assignment = Assignment { acell :: Cell, avalue :: Int } 
                deriving (Show, Eq)

-- a possibility represents a "possible" solution to the puzzle

type Possibility = [ Assignment ] 

-- The PuzzleM type contains the base puzzle

type PuzzleM = StateT Puzzle IO

getPuzzle :: PuzzleM Puzzle 
getPuzzle =  get 

getConstraints = do 
  p <- getPuzzle 
  return $ constraints p 

-- not a fancy show, but shows the pieces - quick and easy 
instance Show Puzzle where
   show (Puzzle{psize=s, origInput=inp,constraints=cos,pcells=ces}) = 
     unlines $ ["Puzzle::", "size="++(show s)] 
                 ++ inp
                 ++  (map show cos) 
                 ++ (map show ces)


strip l = sl 
  where 
      sl = reverse $ dropWhile isSpace $ reverse $ dropWhile isSpace l 

parse :: String -> Puzzle
parse s = Puzzle {psize=size, origInput=plines, constraints=constraintList, pcells=cellList}
          where 
            plines = map strip $ lines s 
            (cellLines, constraintLines) = break ([]==) plines
            size = length cellLines 
            cellList = doCellLines 0 cellLines 
            constraintList = parseConstraintLines cellList $ tail constraintLines 

doCellLines :: Int -> [String] -> [Cell] 
doCellLines i []     = [] 
doCellLines i (l:ls) = let l1 = zip [0..] l 
                           mkcell (xpos, y) = Cell { clabel=[y], cx=xpos, cy=i} 
                           l2 = map mkcell l1 
                           in l2 ++ (doCellLines (i+1) ls) 

parseConstraintLines cells lines = map (parseConstraint cells) (filter ("" /=) lines)

parseConstraint cells l =  Constraint {conlabel= label, 
                                   conop = op, 
                               contarget = target,
                                   concells = clist } 
                    where 
                      (label,rest) = break ('='==) l 
                      (starget, op) = break (not.isDigit) $ tail rest 
                      target = read starget         
                      clist = filter (\c -> clabel c == label) cells 

showPuzzle = do 
                p <- getPuzzle
                liftIO $ putStrLn $ show p 

solve :: [Cell] -> Possibility -> PuzzleM Possibility
solve [] assList  = return assList 
solve cl@(c:cs) assList  = do 
                           s <- psize `liftM` getPuzzle 
                           let pass = map (\v -> Assignment{ acell=c, avalue=v}) [1..s]
                               passes = map (:assList) pass 
                           solve1 cs passes 

solve1 cells [] = return [] 
solve1 cells pl@(p:ps) = do 
                        good <- okSoFar p 
                        if good
                           then do solved <- solve cells p
                                   if solved /= [] 
                                      then return solved
                                      else solve1 cells ps 
                           else solve1 cells ps 

allRowsOK p = do 
                s <- psize `liftM` getPuzzle
                return $ and $ map (rowOK s p) [0..s-1] 

allColsOK p = do 
                s <- psize `liftM` getPuzzle
                return $ and $ map (colOK s p) [0..s-1] 

rowOK s plist row = allDifferent (map avalue inrow) 
    where 
      inrow = filter (\x -> (row == (cy $ acell x))) plist 

colOK s plist col = allDifferent (map avalue incol) 
    where 
      incol = filter (\x -> (col == (cx $ acell x))) plist 
                               
allDifferent [] = True 
allDifferent (x:xs) = (not $ elem x xs) && allDifferent xs 

allConsOK p = do 
                 conlist <- constraints `liftM` getPuzzle  
                 return $ and $ map (conOK p) conlist 

conOK p constraint = checkCon convals contype target cl
    where 
      concl = concells constraint 
      convals = map avalue $ filter (\x -> ( acell x) `elem` concl) p
      contype = conop constraint 
      target = contarget constraint 
      cl = length concl 

checkCon [] _   tgt _ = True       
checkCon cl "=" tgt _ = tgt == cl !! 0
checkCon cl "*" tgt l = if length cl == l 
                           then tgt == product cl 
                           else 0== tgt `mod` (product cl) 
checkCon cl "+" tgt l = if length cl == l
                           then tgt == sum cl 
                           else tgt >= sum cl 
checkCon cl "-" tgt _
  | length cl > 2   = False 
  | length cl == 2  = abs(cl !! 0 - cl !! 1) == tgt 
  | length cl == 1  = True

checkCon cl "/" tgt  _
  | length cl > 2   = False 
  | length cl == 1  = True 
  | length cl == 2  = let a = cl !! 0 
                          b = cl !! 1
                      in  (a `div` b) == tgt  || (b `div` a) == tgt

okSoFar p = do 
               rowsOK <- allRowsOK p 
               colsOK <- allColsOK p 
               consOK <- allConsOK p 
               return $ rowsOK && colsOK && consOK 

showKnownCells al s =   unlines $ map getRow [0..s-1] 
                    where 
                      getCellByRowCol al r c = filter (\x -> (r == (cy $ acell x)) && (c == (cx $ acell x))) al 
                      getRow r = unwords $ map doCell $ map (getCellByRowCol al r) [0..s-1] 
                      doCell [] = " "
                      doCell (x:xs) = show $ avalue x 
                      

runPuzzle  = do
                {- showPuzzle  -} 
                cl <- pcells `liftM` getPuzzle 
                solve cl [] 

showPossibles p =   unlines $ map show p

main = do 
          args <- getArgs 
          doPuzzle (args !! 0)


doPuzzle fn = do 
  inp <- readFile fn 
  let puzzle = parse inp 
  
  putStrLn inp 
  putStrLn "about to evalState puzzle..."
  (soln,p) <- runStateT runPuzzle puzzle
  putStrLn $ showKnownCells soln 6

Monday, May 04, 2009

Kenken Solver

The solver is the remaining major piece of the kenken program. It is simple enough here - the function solve takes a list of cells that are not yet assigned values, a list of cells with values (a "Possibility") and returns a "Possibility" that should, if not null, result in a solution. To do this it takes the next unassigned cell from the list of cells, makes a list of all the possible values it might take (that is the values from 1 up to the size of the puzzle - no culling is attempted) and tries to solve the puzzle with each of those values being assigned to the cell.

This looks like :


solve :: [Cell] -> Possibility -> PuzzleM Possibility
solve [] assList  = return assList 
solve cl@(c:cs) assList  = do 
                           s <- psize `liftM` getPuzzle 
                           let pass = map (\v -> Assignment{ acell=c, avalue=v}) [1..s]
                               passes = map (:assList) pass 
                           solve1 cs passes 

solve1 cells [] = return [] 
solve1 cells pl@(p:ps) = do 
                        good <- okSoFar p 
                        if good
                           then do solved <- solve cells p
                                   if solved /= [] 
                                      then return solved
                                      else solve1 cells ps 
                           else solve1 cells ps

I think that if I used List as the base monad in the stack (instead of IO) I could have used the nondeterminism aspect to simplify this, but I did not, so here's what I have.

Wednesday, April 29, 2009

Kenken constraints

There are several constraints in a kenken puzzle. First, each row and each column must have the numbers from 1 to the size of the puzzle with no repeats. Then each arithmetic constraint needs to be satisfied.

The row and column constraints are easy enough - just require that all the numbers in a row or column are different - and this works for partial rows/columns as well as full ones. Only the row code is included here (a following post will contain all of the code). This code checks all of the rows using a map that checks each row by index and uses a helper function "allDifferent" that checks to be sure that all the numbers in a row are different. I suspect there may be a better way using "nub" but this is simple enough.


allRowsOK p = do 
                s <- psize `liftM` getPuzzle
                return $ and $ map (rowOK s p) [0..s-1] 

rowOK s plist row = allDifferent (map avalue inrow) 
    where 
      inrow = filter (\x -> (row == (cy $ acell x))) plist 

allDifferent [] = True 
allDifferent (x:xs) = (not $ elem x xs) && allDifferent xs

The constraints are more difficult. These are checked with "checkCon" which takes a list of cell values, a target and an operator. A "+" constraint requires that the numbers so far add up to less than the target. A "*" constraint requires that the numbers so far add up to a divisor of the target. For divide and difference, a zero length list of cells is ok as is a list of cells with one entry and a list of cells with two entries is checked both ways and if either works the constraint is ok. This is tightened up a bit in a later version of the code, but this works for this simple version.


checkCon [] _   tgt _ = True       
checkCon cl "=" tgt _ = tgt == cl !! 0
checkCon cl "*" tgt l = if length cl == l 
                           then tgt == product cl 
                           else 0== tgt `mod` (product cl) 
checkCon cl "+" tgt l = if length cl == l
                           then tgt == sum cl 
                           else tgt >= sum cl 
checkCon cl "-" tgt _
  | length cl > 2   = False 
  | length cl == 2  = abs(cl !! 0 - cl !! 1) == tgt 
  | length cl == 1  = True

checkCon cl "/" tgt  _
  | length cl > 2   = False 
  | length cl == 1  = True 
  | length cl == 2  = let a = cl !! 0 
                          b = cl !! 1
                      in  (a `div` b) == tgt  || (b `div` a) == tgt

Tuesday, April 28, 2009

kenken parser

The parser for my kenken input format is simple and doesn't cope well with errors in the format. That's ok for me, as this was more or less an exercise in programming and not an attempt to build anything that anyone but me might use (and just became a blog post because, well, it was there).

In any case, I read in the file in one gulp using readFile, then pass that to the parse routine, so the parser is pure code. That gets passed to "parse" which breaks the input into lines (using "lines", natch), strips each input line of spaces and breaks the input into two parts at the first blank line. The size of the first list (the block of letters) is used to determine the size of the puzzle and then two helper functions are called, the first builds the list of cells and the second builds the list of constraints with the list of cells and the list of constraint definitions as its input.

Each of the lines in the block of cells is used to generate a Cell with the position derived using a counter passed to a recursive routine (I'd do it differently now, but rewriting would probably end up with my building a more robust parser and I'm not sure I want to do that).

Each constraint line is broken on the equals sign - the label for the constraint is the part before the equals, the target is the integer value of the list of digits and the operator (which must be present) is the last bit.

Without further ado, here is the parsing section of the solver (not all functions have types, but in later versions of this, the types are included pretty much everywhere).


parse :: String -> Puzzle
parse s = Puzzle {psize=size, origInput=plines, constraints=constraintList, pcells=cellList}
          where 
            plines = map strip $ lines s 
            (cellLines, constraintLines) = break ([]==) plines
            size = length cellLines 
            cellList = doCellLines 0 cellLines 
            constraintList = parseConstraintLines cellList $ tail constraintLines 

doCellLines :: Int -> [String] -> [Cell] 
doCellLines i []     = [] 
doCellLines i (l:ls) = let l1 = zip [0..] l 
                           mkcell (xpos, y) = Cell { clabel=[y], cx=xpos, cy=i} 
                           l2 = map mkcell l1 
                           in l2 ++ (doCellLines (i+1) ls) 

parseConstraintLines cells lines = map (parseConstraint cells) (filter ("" /=) lines)

parseConstraint cells l =  Constraint {conlabel= label, 
                                   conop = op, 
                               contarget = target,
                                   concells = clist } 
                    where 
                      (label,rest) = break ('='==) l 
                      (starget, op) = break (not.isDigit) $ tail rest 
                      target = read starget         
                      clist = filter (\c -> clabel c == label) cells 
strip l = sl 
  where 
      sl = reverse $ dropWhile isSpace $ reverse $ dropWhile isSpace l

Friday, April 24, 2009

Simple kenken solver in haskell

Recently the NY Times started doing Kenken puzzles. These are numeric puzzles in the sudoku vein. For a good overview, see the wikipedia page where they have a nice sample puzzle.

I solved a couple of these and then decided that it was time to build a solver. My first solver (in Python) dissolved in a flurry of overcomplicated algorithms and data structures and I decided to start from scratch in Haskell (in part to try to improve my Haskell skills). First though, I needed an input format. I constructed one that was simple, easy to derive from a puzzle and easy to edit. In this format the puzzle grid is laid out with letters indicating the blocks and a list of constraints on the blocks on subsequent lines. Each constraint is a label (from the grid), an equals sign, a target value (numeric) and an operator ("+", "-", "/", "*", "=" - used when the value in the cell is set). This format has the advantage that it is easy for me to read and easy to parse.

The puzzle from the wikipedia entry is given below.


abbcdd
aeecfd
gghhfd
ggijkk
llijjm
nnnoom

a=11+
b=2/
c=20*
d=6*
e=3-
f=3/
g=240*
h=6*
i=6*
j=7+
k=30*
l=6*
m=9+
n=8+
o=2/

For example, the first block (labeled "a" in the square) needs to have a sum of 11.

My first solver was a simple backtracking recursive solver. It didn't use any constraint information except to verify that the current solution was ok.

To do this, I built several data structures. First, a Cell is an x,y location and a Constraint label (such as "a" above). I use the (x,y) information in the Cell to locate it rather than keeping a two dimensional array (or list of lists). This does mean that in several places I scan the list of cells to find a cell, but since the list of cells is typically small for these puzzles, that is not that much of a problem :


data Cell = Cell {
            clabel ::  String,
            cx :: Int,
            cy :: Int
        }
        deriving (Eq, Show)

Next, a Constraint is a label (from the puzzle input), a target value, an operation (as a string) and a list of cells. The list of cells could also be constructed as needed, but since checking the constraint always required looking at the list, I put this in.


data Constraint =  Constraint
                  { conlabel :: String,
                    conop :: String,
                    contarget :: Int,
                    concells :: [Cell]
                  }
                 deriving Show

An Assignment is a Cell - value pair, and we build up a list of possible assignments in a Possibility (that is, a possible solution). Assignments are not part of the puzzle, but are
carried around in the recursive calls.


data Assignment = Assignment { acell :: Cell, avalue :: Int }
               deriving (Show, Eq)

type Possibility = [ Assignment ]

A puzzle has a size, the original input (which is useful for debugging), a list of constraints and a list of cells. Since I'd like to pass the puzzle around in lots of places, I'm building a State Monad of this as well.


data Puzzle = Puzzle {
     psize :: Int,
     origInput ::  [String], 
     constraints :: [Constraint],
     pcells :: [Cell]
   }

type PuzzleM = StateT Puzzle IO

Next post: parsing the input.

Wednesday, May 19, 2004

Music downloading

I'm getting increasingly irritated by the use of the words "theft" and "piracy" with respect to music (and such) downloading from the web.

While I'm no IP radical by any means, downloading music is not stealing in any real way. Copyright (in the US anyway) is a limited term legal gift of a monopoly to content creators from the people of the country. I'll grant you that downloaders are taking back their gift early and without the full blessing of the law - but its not theft (or piracy).

If anything is stealing (or piracy) it is corporations like Disney getting copyright indefinitely extended. And in terms of the value that the corporations stole from us all with the copyright extension act, I'd say they are by far the bigger thieves.

And I don't even download music.

Sunday, May 16, 2004

Just a first post

Trying a first post.

Or, as the tradition goes....

Hello world!