Learning to Automatically Solve Logic Grid Puzzles

Logic grid puzzles are a genre of logic puzzles in which we are given (in natural language) a scenario, the object to be deduced and certain clues. The reader has to figure out the solution using the clues provided and some generic domain constraints. In this paper, we present a system, LOGICIA, that takes a logic grid puzzle and the set of elements in the puzzle and tries to solve it by translating it to the knowledge representation and reasoning language of Answer Set Programming (ASP) and then using an ASP solver. The translation to ASP involves extraction of entities and their relations from the clues. For that we use a novel learning-based approach which uses varied supervision, including the entities present in a clue and the expected representation of a clue in ASP. Our system, LOGICIA, learns to automatically translate a clue with 81.11% accuracy and is able to solve 71% of the problems of a corpus. This is the first learning system that can solve logic grid puzzles described in natural language in a fully automated manner. The code and the data will be made publicly available at http://bioai.lab.asu.edu/logicgridpuzzles.


Introduction
Understanding natural language to solve problems, be it algebraic word problems (Hosseini et al., 2014) or questions from biology texts (Berant et al., 2014; Kim et al., 2011), has attracted a lot of research interest over the past few decades. For NLP, these problems are of particular interest as they are concise, yet rich in information. In this paper, we attempt to solve another problem of this kind, known as the Logic Grid Puzzle. Problem.1 shows an example. Puzzle problems, in the same spirit as the previously mentioned science problems, do not restrict the vocabulary; they use everyday language and have diverse background stories. The puzzle problems, however, are unique in their requirement of high-precision understanding of the text. For a puzzle problem, the solution is never in the text and requires involved reasoning. Moreover, one needs to correctly understand each of the given clues to successfully solve a problem. Another interesting property is that only a small core of world knowledge, notably spatial, temporal and number-related knowledge, is crucial to solve these problems.

PROBLEM.1 A LOGIC GRID PUZZLE
Waterford Spa had a full appointment calendar booked today. Help Janice figure out the schedule by matching each masseuse to her client, and determine the total price for each.
The goal is to find out which elements are linked together based on a series of given clues. Each element is used only once. Each puzzle has a unique solution and can be solved using logical reasoning. A logic grid puzzle is called an (m, n)-puzzle if it contains m categories and each category has n elements. For the example in Problem.1, there are three categories, namely clients, prices and masseuses, and each category has four elements, which are shown in the respective columns. A total of five clues are given in free text, and the goal is to find the members of the four tuples, where each tuple contains exactly one element from each category such that all the members in a tuple are linked together.
To solve such a puzzle problem, it is crucial to understand the clues (for example, "Hannah paid more than Teri's client."). Each clue talks about a set of entities (for example, "Hannah", "client", "Teri") and their relations ("a greater-than relation between Hannah and the client of Teri on the basis of payment"). Our system, LOGICIA, learns to discover these entities and the underlying semantics of the relations that exist between them. Once the relations are discovered, a pair of Answer Set Programming (ASP) (Baral, 2003) rules are created. The reasoning module takes these ASP rules as input and finds a group configuration that satisfies all the clues. LOGICIA has "knowledge" about a fixed set of predicates which model different relations that hold between entities in a puzzle world. Clues in the puzzle text that are converted into ASP rules use these predicates as building blocks. In this research, our goal is to build a system which can automatically do this conversion and then reason over it to find the solution. The set of predicates that the reasoning module is aware of is not sufficient to represent all logic grid puzzles. The family of logic grid puzzles is broad and contains a variety of clues. Our future work involves dealing with such a diverse set of relations. In this work we assume that the relations in Table 1 are sufficient to represent the clues. The following are some examples of clues that cannot be modeled using the predicates in Table 1.
• Esther's brother's seat is at one end of the block of seven.
• The writer of Lifetime Ambition has a first name with more letters than that of the tennis star.
• Edward was two places behind Salim in one of the lines, both being in odd-numbered positions.
• Performers who finished in the top three places, in no particular order, are Tanya, the person who performed the fox trot, and the one who performed the waltz.
The rest of the paper is organized as follows: in section 2, we describe the representation of a puzzle problem in ASP and delineate how it helps in reasoning; in section 3, we present our novel method for learning to automatically translate a logic problem described in natural language to its ASP counterpart. In section 4, we describe the related works. In section 5, we discuss the detailed experimental evaluation of our system. Finally, section 6 concludes our paper.

Puzzle Representation
Answer Set Programming (ASP) (Baral, 2003; Lifschitz, 1999; Gelfond and Lifschitz, 1991) is used to represent a puzzle and reason over it. This choice is motivated by two important reasons: 1) non-monotonic reasoning may occur in a puzzle (Nagy and Allwein, 2004) and 2) ASP constructs greatly simplify the reasoning module, as we will see in this section. We now briefly and informally describe a part of ASP. For a detailed account of the language, readers are referred to (Baral, 2003).

Answer Set Programming
An answer set program is a collection of rules of the form,

$L_0 \mid \dots \mid L_k \;\text{:-}\; L_{k+1}, \dots, L_m, \text{not } L_{m+1}, \dots, \text{not } L_n.$

where each of the $L_i$'s is a literal in the sense of classical logic. Intuitively, the above rule means that if $L_{k+1}, \dots, L_m$ are true and if $L_{m+1}, \dots, L_n$ can be safely assumed to be false, then at least one of $L_0, \dots, L_k$ must be true. The left-hand side of an ASP rule is called the head and the right-hand side is called the body. A rule with no head is often referred to as a constraint. A rule with an empty body is referred to as a fact and written as,

$L_0 \mid \dots \mid L_k.$

Example
fly(X) :- bird(X), not ab(X).
The above program represents the knowledge that "Most birds fly". If we add the following rule (fact) to the program,
bird(penguin).
the answer set of the program will contain the belief that penguins can fly, {bird(penguin), fly(penguin)}. However, adding one more fact, 'ab(penguin).', to convey that the penguin is an abnormal bird, will change the belief that the penguin can fly, and correspondingly the answer set, {bird(penguin), ab(penguin)}, will not contain the fact fly(penguin).

Choice rule
m {p(X) : q(X)} n.
Rules of this type allow inclusion in the program's answer sets of arbitrary collections S of atoms of the form p(t) such that m ≤ |S| ≤ n and, if p(t) ∈ S, then q(t) belongs to the corresponding answer set.
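The non-monotonic behaviour of the bird example can be illustrated outside ASP. The following is a minimal Python sketch, not an ASP solver: it hard-codes the single rule fly(X) :- bird(X), not ab(X). and computes the resulting set of beliefs from a set of ground facts. The function name `answer_set` and the pair encoding of atoms are assumptions made for illustration only.

```python
def answer_set(facts):
    """Return the beliefs entailed by the bird program for the given facts.

    facts is a set of (predicate, constant) pairs, e.g. ("bird", "penguin").
    Only the rule  fly(X) :- bird(X), not ab(X).  is modelled here.
    """
    derived = set(facts)
    for pred, const in facts:
        # "not ab(X)" succeeds when ab(X) cannot be derived. In this program
        # ab/1 only ever appears as a fact, so a membership test is enough.
        if pred == "bird" and ("ab", const) not in facts:
            derived.add(("fly", const))
    return derived


print(sorted(answer_set({("bird", "penguin")})))
print(sorted(answer_set({("bird", "penguin"), ("ab", "penguin")})))
```

Adding the fact ab(penguin) removes fly(penguin) from the result, which is the withdrawal of a previously held belief that the example describes.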

Representing Puzzle Entities
An (m, n)-puzzle problem contains m categories and n elements in each category. The term 'puzzle entity' is used to refer to any of them. Each category is assigned a unique index, denoted by the predicate cindex/1 (the number after the '/' denotes the arity of the predicate). The predicate etype/2 captures the association between a category and its index. Each element is represented by the element/2 predicate, which connects a category index to its element. The predicate eindex/1 denotes the tuple indices. The entities of the puzzle in Problem.1 are represented using these predicates.
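As a rough illustration of this representation, the sketch below generates cindex/1, etype/2, element/2 and eindex/1 facts as strings from a list of categories. It is written in Python rather than ASP and makes several assumptions: the helper name `entity_facts` and the argument order of etype/2 are hypothetical, and every element name other than hannah, aimee, teri, lynda and 180 is a placeholder, since the full element table of Problem.1 is not reproduced here.

```python
def entity_facts(categories):
    """Build ASP-style facts for a puzzle.

    categories is a list of (category_name, [elements]) in index order.
    """
    n = len(categories[0][1])
    facts = []
    for index, (name, elements) in enumerate(categories, start=1):
        if len(elements) != n:
            raise ValueError("every category must have the same number of elements")
        facts.append(f"cindex({index}).")
        facts.append(f"etype({index}, {name}).")  # argument order is an assumption
        facts.extend(f"element({index}, {e})." for e in elements)
    # One tuple index per element position: eindex(1..n).
    facts.extend(f"eindex({t})." for t in range(1, n + 1))
    return facts


puzzle = [
    ("clients", ["hannah", "aimee", "client_c", "client_d"]),         # last two: placeholders
    ("prices", ["180", "price_b", "price_c", "price_d"]),             # last three: placeholders
    ("masseuses", ["teri", "lynda", "masseuse_c", "masseuse_d"]),     # last two: placeholders
]
for fact in entity_facts(puzzle):
    print(fact)
```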

Representing Solution
The solution to a logic grid puzzle is a set of tuples containing related elements. The tuple/3 predicate captures this tuple membership information of the elements. For example, the fact tuple(2, 1, aimee) states that the element aimee from the category with index 1 is in tuple 2. The rel/m predicate captures all the elements in a tuple for an (m, n)-puzzle and is defined using the tuple/3 predicate.

Domain Constraints
In the proposed approach, the logic grid puzzle problem is solved as a constraint satisfaction problem. Given a puzzle problem, the goal is to enumerate all possible configurations of tuple/3 and select the one which does not violate the constraints specified in the clues. However, 1) each tuple in a logic grid puzzle will contain exactly one element from each category and 2) an element will belong to exactly one tuple. These constraints come from the specification of a puzzle problem and hold irrespective of the problem instance. These domain constraints, along with the enumeration, have an elegant representation in ASP.
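To make the size of this search space concrete, the following Python sketch enumerates every configuration that respects the two domain constraints for an (m, n)-puzzle. It is an illustration only and not the ASP encoding: the function name `configurations` is hypothetical, and pinning the first category to a fixed order (so that relabelling tuples does not create duplicates) is a choice made for this sketch.

```python
from itertools import permutations, product


def configurations(m, n):
    """Yield every assignment that satisfies both domain constraints.

    A configuration is a tuple of m permutations; config[c][t] is the
    0-based position of the element of category c placed in tuple t.
    Using one permutation per category guarantees that each tuple holds
    exactly one element of each category and that each element is used
    in exactly one tuple.
    """
    identity = tuple(range(n))  # category 0 is pinned to remove tuple relabellings
    for rest in product(permutations(range(n)), repeat=m - 1):
        yield (identity,) + rest


print(sum(1 for _ in configurations(3, 4)))  # 24 * 24 = 576
```

For the (3, 4)-puzzle of Problem.1 this leaves 576 candidate configurations, which the clue constraints then have to reduce to one.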

Representing clues
Each clue describes some entities and the relations that hold between them. In its simplest form, the relations will suggest whether the entities are linked together or not. However, the underlying semantics of such relations can be deep, such as the one in clue 5 of Problem.1. There are different ways to express the same relation that holds between entities. For example, in Problem.1, the possessive relation has been used to express the linking between clients and masseuses, and the word "paid" expresses the linking between the clients and the prices. Depending on the puzzle, the phrases that are used to express the relations will vary, and it is crucial to identify their underlying semantics to solve the problems in a systematic way.
In the current version, the reasoning module has knowledge of a selected set of relations and the translation module tries to represent the clue as a conjunction of these relations. All these relations and their underlying meanings are described in Table 1. In this subsection, we describe the representation of a clue in terms of these relations in ASP and show how it is used by the reasoning module. In the next section, we present our approach to automate this translation.
Let us consider the clues and their representation from Problem.1:
[1] Hannah paid more than Teri's client.
The first rule, clue1, evaluates to true (will be in the answer set) if the element from category 1 with value hannah is linked to some element from category 2 which has a higher value than the element from its own category which is linked to an element from category 1 which is linked to teri from category 3. Since the desired solution must satisfy the relations described in the clue, the second ASP rule is added. A rule of this form, which does not have a head, is known as a constraint, and the program must satisfy it to have an answer set. As the reasoning module enumerates all possible configurations, in some cases clue1 will not hold and subsequently those branches will be pruned. Similar constraints are added for all clues. A configuration which satisfies all the clue constraints and the domain constraints described in the previous section will be accepted as the solution to the puzzle. Some more examples are shown below.
[3] Hannah was either the person who paid $180 or Lynda's client.
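The pruning effect of clue constraints can be sketched in Python rather than ASP. The example below encodes clues 1 and 3 as predicates over a candidate configuration and keeps only the configurations that satisfy them. Several things are assumptions made for illustration: all element names marked as placeholders, all prices other than 180, the reading of "either ... or" as an exclusive or, and the function names. Since the remaining clues of Problem.1 are not encoded, more than one configuration survives.

```python
from itertools import permutations

clients = ["hannah", "aimee", "client_c", "client_d"]        # last two are placeholders
prices = [150, 160, 170, 180]                                # all but 180 are placeholders
masseuses = ["teri", "lynda", "masseuse_c", "masseuse_d"]    # last two are placeholders


def clue1(price_of, masseuse_of):
    # Hannah paid more than Teri's client.
    teri_client = next(c for c in clients if masseuse_of[c] == "teri")
    return price_of["hannah"] > price_of[teri_client]


def clue3(price_of, masseuse_of):
    # Hannah was either the person who paid $180 or Lynda's client
    # (read here as an exclusive or, which is an assumption).
    return (price_of["hannah"] == 180) != (masseuse_of["hannah"] == "lynda")


def surviving_configurations(clues):
    """Enumerate configurations and drop those violating any clue."""
    for ps in permutations(prices):
        price_of = dict(zip(clients, ps))
        for ms in permutations(masseuses):
            masseuse_of = dict(zip(clients, ms))
            if all(clue(price_of, masseuse_of) for clue in clues):
                yield price_of, masseuse_of


total = sum(1 for _ in surviving_configurations([]))
kept = sum(1 for _ in surviving_configurations([clue1, clue3]))
print(total, kept)
```

Each clue plays the role of a headless constraint: a configuration in which the clue does not hold is simply never yielded.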

Learning Translation
To automate the translation of a clue to the pair of ASP rules, the translation module needs to identify the entities that are present in the clue, their category and their value, and the underlying interpretations of all the relations that hold between them. Once all the relation instances $\{R_1(arg_1, \dots, arg_{p_1}), \dots, R_q(arg_1, \dots, arg_{p_q})\}$ in the clue are identified, the ASP representation of the clue is generated in the following way:

$clue \;\text{:-}\; R_1(arg_1, \dots, arg_{p_1}), \dots, R_q(arg_1, \dots, arg_{p_q}).$

The entity classification problem for logic grid puzzles poses several challenges. First, there is a wide variety in the set of entities. Entities can be names of objects, times related to some event, numbers, dates, currency, some form of ID, etc. Entities in puzzles are not necessarily nouns; they can be verbs, adjectives, etc. Second, and of paramount importance, the "category" of a puzzle "element" is specific to a puzzle problem. The same element may have a different category in different problems. Also, a constituent in a clue which refers to an entity in a particular problem may not refer to an entity in another problem. To the best of our knowledge, this type of entity classification problem has never been studied before. We formalize this problem in this section and propose one approach to solve it. Next, we discuss the method that is used to extract relations from clues.
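The generation step can be sketched as simple string assembly. The Python helper below takes the relation instances found in a clue and emits the pair of rules. The helper name `clue_rules`, the exact form of the constraint (":- not clueN."), and the argument lists used in the example are assumptions; only the relation names greaterThan and posDiff come from the paper's vocabulary.

```python
def clue_rules(clue_id, relations):
    """Return the two ASP rules for a clue.

    relations is a list of (relation_name, [arguments]) pairs; the clue
    holds when the conjunction of all relation instances holds.
    """
    body = ", ".join(
        f"{name}({', '.join(str(a) for a in args)})" for name, args in relations
    )
    head = f"clue{clue_id}"
    # First rule defines the clue; the second (assumed form) forbids any
    # answer set in which the clue does not hold.
    return [f"{head} :- {body}.", f":- not {head}."]


# Argument lists below are placeholders, not the arities of Table 1.
for rule in clue_rules(1, [("greaterThan", ["hannah", "X"]), ("posDiff", ["X", "Y", 2])]):
    print(rule)
```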

Entity Classification
The entity classification problem is defined as follows: Problem description Given m categories $C_1, \dots, C_m$ and a text T, each category $C_i$, $1 \le i \le m$, contains a collection of elements $E_i$ and an optional textual description $d_i$. The goal is to find the class information of all the constituents in the text T. Each category contributes two classes, where one of them represents the category itself and the other represents an instance of that category. A constituent may also not refer to any category or any instance of it; in that case the class of that constituent is null. So, there are a total of 2m+1 classes and a constituent will take one value from them.
For example, the constituent "clients" in the fourth clue refers to the category $C_1$.

Our approach
We model the Entity Classification problem as a decoding query on a Pairwise Markov Network (Koller and Friedman, 2009; Kindermann et al., 1980; Zhang et al., 2001). A pairwise Markov network over a graph H is associated with a set of node potentials $\{\phi(X_i) : i = 1, \dots, n\}$ and a set of edge potentials $\{\phi(X_i, X_j) : (X_i, X_j) \in H\}$. Each node $X_i \in H$ represents a random variable. Here, each $X_i$ can take a value from the set $\{1, \dots, 2m+1\}$, denoting the class of the corresponding constituent in the text T.
In our implementation, the node potential captures the chances of that node being classified as one of the possible classes without being affected by the given text T, and the edge potentials capture hints from the context in T for classification. After constructing the pairwise Markov network, a decoding query is issued to obtain the configuration that maximizes the joint probability distribution of the network. The proposed approach is inspired by the following two observations: 1) to find the class of a constituent one needs some background knowledge; 2) however, background knowledge is not sufficient on its own, and one also needs to understand the text to properly identify the class of each constituent. For example, let us consider the word "person" in clue 5 of Problem.1. Just skimming through the categories, one can discover that the word "person" is very unlikely to be an instance of the category "prices", which follows from one's knowledge about those constituents. However, proper disambiguation may face an issue here, as there are two different categories of human beings. To properly classify the word "person" it is necessary to go through the text.
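A decoding query of this kind can be illustrated with a brute-force search, which is feasible only for very small networks. The Python sketch below multiplies node and edge potentials for every joint assignment and returns the best one. It is an illustration under assumptions: the function name `map_decode`, the data layout, and the toy potentials are hypothetical, and the inference procedure actually used by the system is not specified here.

```python
from itertools import product


def map_decode(node_pot, edge_pot, edges):
    """Brute-force decoding over a pairwise Markov network.

    node_pot[i][c]          potential of node i taking class c
    edge_pot[(i, j)][a][b]  potential of edge (i, j) with classes a and b
    edges                   list of (i, j) index pairs
    """
    n_nodes = len(node_pot)
    n_classes = len(node_pot[0])
    best, best_score = None, float("-inf")
    for assignment in product(range(n_classes), repeat=n_nodes):
        # Unnormalised joint score: product of all node and edge potentials.
        score = 1.0
        for i, c in enumerate(assignment):
            score *= node_pot[i][c]
        for i, j in edges:
            score *= edge_pot[(i, j)][assignment[i]][assignment[j]]
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


# Toy network: two constituents, three classes (m = 1, so 2m + 1 = 3).
node_pot = [[1.0, 5.0, 2.0], [3.0, 2.0, 1.0]]
edge_pot = {(0, 1): [[1.0, 1.0, 1.0], [1.0, 4.0, 1.0], [1.0, 1.0, 1.0]]}
print(map_decode(node_pot, edge_pot, [(0, 1)]))  # ((1, 1), 40.0)
```

In the toy network the node potentials alone would prefer class 0 for the second constituent, but the edge potential shifts the decision to class 1, which mirrors how context in the text can override background knowledge.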
The following paragraphs describe the construction of the graph H and the algorithm that is used in the computation of the associated sets of node potentials and edge potentials.
Construction of the graph While constructing the graph, we assign a label, L, to each edge in H, which will be used in the edge potential computation. Let $D_G$ denote the dependency graph of the text T obtained from the Stanford dependency parser and $dep(v_1, v_2)$ denote the grammatical relation between $(v_1, v_2) \in D_G$. Then the graph H is constructed as follows:
1. Create a node in H for each constituent $w_j$ in T if $w_j \in D_G$.
2. Add an edge $(X_i, X_j)$ to H if the corresponding constituents $(w_p, w_q)$ are connected in $D_G$. $L(X_i, X_j) := dep(w_p, w_q)$.
3. Add an edge between a pair of nodes $(X_i, X_j)$ if the corresponding words are synonyms. $L(X_i, X_j)$ := synonymy.
4. Create a node for each element and category specified in the puzzle and add an edge from them to others if the corresponding string descriptions are 'same'. In this case, the edges are labeled as exact match.
5. If $(X_i, X_j) \in H$ and $L(X_i, X_j)$ = exact match and both of them refer to a verb, then add more edges $(X'_i, X'_j)$ to H with label spatial symmetry, where $L(X_i, X'_i) = L(X_j, X'_j)$.
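Steps 1 to 4 of the construction can be sketched as follows. This Python example is a simplification under stated assumptions: constituents are identified by their surface strings, dependency edges and synonym pairs are supplied by the caller instead of a parser or WordNet, 'same' is read as case-insensitive string equality, the "entity:" node prefix and the function name `build_graph` are hypothetical, and step 5 (spatial symmetry) is omitted.

```python
def build_graph(constituents, dep_edges, synonyms, entities):
    """Build the labelled graph H as (nodes, {edge: label}).

    constituents  list of words in the clue
    dep_edges     list of (word_a, word_b, grammatical_relation)
    synonyms      set of frozenset({word_a, word_b})
    entities      element and category names given with the puzzle
    """
    nodes = set(constituents)                          # step 1
    edges = {}
    for a, b, relation in dep_edges:                   # step 2
        if a in nodes and b in nodes:
            edges[(a, b)] = relation
    for a in constituents:                             # step 3
        for b in constituents:
            if a < b and frozenset((a, b)) in synonyms:
                edges[(a, b)] = "synonymy"
    for entity in entities:                            # step 4
        node = "entity:" + entity
        nodes.add(node)
        for w in constituents:
            if w.lower() == entity.lower():
                edges[(node, w)] = "exact match"
    return nodes, edges


# The dependency edge below is a hand-written stand-in for parser output.
nodes, edges = build_graph(
    ["Hannah", "paid", "client"],
    [("paid", "Hannah", "nsubj")],
    set(),
    ["hannah", "clients"],
)
print(edges)
```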
Determining Node potentials For each element in the m categories, a set of naive regular-expression based taggers is used to detect its type (for example, "am-pm time"). Each element type maps to a WordNet (Miller, 1995) representative (for example, "time unit#n"). For each constituent w, a similarity score, sim(w, c), is calculated for each class $c \in \{1, \dots, 2m+1\}$ in the following way:
• Class c denotes an instance of some category $C_i$: similarity scores are computed between the textual description of the constituent and both the WordNet representative of $E_i$ and the textual description $d_i$ using the HSO WordNet similarity algorithm (Hirst and St-Onge, 1998). The similarity score, sim(w, c), is chosen to be the maximum of them.
• Class c denotes a category $C_i$: sim(w, c) is assigned the value of the HSO similarity between the textual description of the constituent and $d_i$.
• Class c is null: in this case the similarity is calculated using a formula based on $MAX_{HSO}$, which denotes the maximum similarity score returned by the HSO algorithm, namely 16.
The node potential for each node $X_i \in H$, corresponding to the constituent $w_j$, is then calculated by

$\phi(X_i = c) = 1 + sim(w_j, c), \quad \forall c$

Determining Edge potentials For each edge in the graph H, the edge potential $\phi(X_i, X_j)$ is calculated from probability values. In the training phase, each entity in a clue is tagged with its respective class. The probability values are then calculated from the training dataset using simple counts.
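The node potential formula and the count-based estimation can be sketched in Python. The first function applies φ(X = c) = 1 + sim(w, c) directly. The second is only one possible reading of "simple count": it estimates, for each edge label, the relative frequency of each pair of classes observed on tagged training edges. The exact edge potential formula is not reproduced here, so the function `pair_probabilities`, its input format, and the toy data are assumptions.

```python
from collections import Counter, defaultdict


def node_potentials(similarities):
    """Map each class c to phi(X = c) = 1 + sim(w, c)."""
    return {c: 1.0 + s for c, s in similarities.items()}


def pair_probabilities(tagged_edges):
    """Estimate P(class_i, class_j | edge label) by relative frequency.

    tagged_edges is an iterable of (label, class_i, class_j) triples
    collected from annotated training clues (hypothetical format).
    """
    counts = defaultdict(Counter)
    for label, ci, cj in tagged_edges:
        counts[label][(ci, cj)] += 1
    return {
        label: {pair: k / sum(ctr.values()) for pair, k in ctr.items()}
        for label, ctr in counts.items()
    }


print(node_potentials({"null": 0.0, "instance_of_C1": 16.0}))
training = [("nsubj", 1, 2), ("nsubj", 1, 2), ("nsubj", 3, 2), ("amod", 1, 1)]
print(pair_probabilities(training)["nsubj"])
```

Adding 1 to the similarity keeps every node potential strictly positive, so a class with zero similarity is penalised but never ruled out.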

Learning To Extract Relations
The goal here is to identify all the relations $R(arg_1, \dots, arg_p)$ that are present in a clue, where each relation belongs to the logical vocabulary described in Table 1. This problem is known as complex relation extraction (McDonald et al., 2005; Bach and Badaskar, 2007; Fundel et al., 2007; Zhou et al., 2014). The common approach for solving the complex relation extraction problem is to first find the relation between each pair of entities and then discover the complex relations from the binary ones using the definition of each relation. Figure 1 depicts the scenario. The goal is to identify the relation posDiff(E1, E2, E3), where E1, E2, E3 are constituents having a non-null class value. However, instead of identifying posDiff(E1, E2, E3) directly, the binary relations between each pair of entities are identified first. Once the required set of binary relations is identified, the extraction module infers that posDiff(E1, E2, E3) holds. In a similar manner, a total of 39 binary relations are created for all the relations described in Table 1.
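The assembly of a complex relation from pairwise decisions can be sketched as follows. In this Python example a complex relation is declared by its arity and by the binary label required between each pair of argument positions; a relation instance is reported only when every required binary label has been predicted. The binary label names (posDiff_1_2 and so on), the function name `infer_complex`, and the data layout are hypothetical and are not the 39 labels used by the system.

```python
from itertools import permutations


def infer_complex(entities, binary, definitions):
    """Return complex relation instances supported by binary predictions.

    entities     list of entity identifiers in the clue
    binary       dict mapping (entity_a, entity_b) to a predicted binary label
    definitions  dict mapping relation name to (arity, {(i, j): required_label})
    """
    found = []
    for name, (arity, needed) in definitions.items():
        for args in permutations(entities, arity):
            # Every pair of argument positions must carry its required label.
            if all(binary.get((args[i], args[j])) == label
                   for (i, j), label in needed.items()):
                found.append((name, args))
    return found


definitions = {
    "posDiff": (3, {(0, 1): "posDiff_1_2", (1, 2): "posDiff_2_3", (0, 2): "posDiff_1_3"}),
}
binary = {
    ("E1", "E2"): "posDiff_1_2",
    ("E2", "E3"): "posDiff_2_3",
    ("E1", "E3"): "posDiff_1_3",
}
print(infer_complex(["E1", "E2", "E3"], binary, definitions))
```

A single misclassified pair is enough to lose the whole complex relation, since the conjunction over pairs then fails.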
In the training phase, all the relations and their respective arguments in each clue are given. Using this supervision, we have built a Maximum Entropy based model (Berger et al., 1996; Della Pietra et al., 1997) to classify the relation between a pair of entities present in a clue. Maximum entropy classifiers have been successfully applied in many natural language processing applications (Charniak, 2000; Chieu and Ng, 2002; Ratnaparkhi and others, 1996) and allow the inclusion of various sources of information without necessarily assuming any independence between the features. In this model, the conditional probability distribution of a class c given a context d is given by:

$$P(c \mid d) = \frac{\exp\left(\sum_i \lambda_i f_i(d, c)\right)}{\sum_{c'} \exp\left(\sum_i \lambda_i f_i(d, c')\right)} \tag{1}$$

where the denominator is the normalization term and the parameter $\lambda_i$ corresponds to the weight of the feature $f_i$. Features in a Maximum Entropy model are functions from contexts and classes to the set of real numbers. A detailed description of the model and of the parameter estimation method used, Generalized Iterative Scaling, can be found in (Darroch and Ratcliff, 1972). Table 2 describes the features that are used in the classification task. Here, $path(E_1, E_2)$ denotes all the words that occur in the path(s) connecting $E_1$ and $E_2$ in the dependency graph of the clue.

Feature Set
• Class of $E_1$ and $E_2$.
• All the grammatical relations between the words in $path(E_1, E_2)$.
• All the adjectives and adverbs in $path(E_1, E_2)$.
• POS tags of all the words in $path(E_1, E_2)$.
• All the words that appear in the following grammatical relations with the words in $path(E_1, E_2)$: advmod, amod, cop, det.
• hasNegativeWord = [[∃w ∈ $path(E_1, E_2)$ s.t. w has a neg relation starting with it]].
Table 2: Features used in the classification task.
The relation between each pair of entities in a clue is the one which maximizes the conditional probability in equation (1).
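The classification rule can be sketched numerically. The Python example below computes the maximum entropy distribution over classes for a given context and picks the most probable class. The feature function, the weights, and the two class labels are toy values invented for illustration; in the system the weights would come from Generalized Iterative Scaling, which is not implemented here.

```python
import math


def maxent_distribution(feature_fn, weights, context, classes):
    """Return P(c | context) for every class c under a maximum entropy model.

    feature_fn(context, c) returns the list of feature values f_i(context, c).
    """
    scores = {
        c: sum(w * f for w, f in zip(weights, feature_fn(context, c)))
        for c in classes
    }
    shift = max(scores.values())  # subtracting the max is for numerical stability only
    exp_scores = {c: math.exp(s - shift) for c, s in scores.items()}
    z = sum(exp_scores.values())  # normalization term
    return {c: v / z for c, v in exp_scores.items()}


def classify(feature_fn, weights, context, classes):
    dist = maxent_distribution(feature_fn, weights, context, classes)
    return max(dist, key=dist.get)


def toy_features(context, c):
    # Two indicator features tied to the hasNegativeWord feature (toy example).
    neg = context["hasNegativeWord"]
    return [
        1.0 if neg and c == "label_b" else 0.0,
        1.0 if not neg and c == "label_a" else 0.0,
    ]


classes = ["label_a", "label_b"]   # hypothetical binary relation labels
weights = [2.0, 2.0]               # hypothetical learned weights
context = {"hasNegativeWord": True}
print(maxent_distribution(toy_features, weights, context, classes))
print(classify(toy_features, weights, context, classes))
```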

Missing Entity
In the case of comparative relations in Table 1, such as greaterThan, the basis of the comparison can be hidden. For example, in clue 1 of the example problem, the two entities "Hannah" and "client" are compared on the basis of "price"; however, there is no constituent in the clue which refers to an element from that category. The basis of comparison is hidden in this case and is implied by the word "paid". In the current implementation, the translation module does not handle this case. For puzzles that contain only one category consisting of numeric elements, the translation module goes with the obvious choice. Handling the general case is part of our future work.

Related Work
There has been a significant amount of work on the representation of puzzle problems in a formal language (Gelfond and Kahl, 2014; Baral, 2003; Celik et al., 2009). However, there has not been any work that can automatically solve a logic grid puzzle. The latest work (Baral and Dzifcak, 2012) on this problem assumes that the entities in a clue are given, and the authors manually simplify the sentences for translation. Furthermore, their representation of logic grid puzzles does not consider the category of a variable in the formal representation, i.e. it uses element/1 and tuple/2 predicates, and thus cannot solve puzzles containing more than one numeric category.
In the same work (Baral and Dzifcak, 2012), the authors propose to use a semantic parser to do the translation. This method works well for simple sentences such as "Donna dale does not have green fleece"; however, it faces several challenges while dealing with real-world puzzle sentences. The difficulty arises due to the restrictions enforced in the translation models used by the existing semantic parsers. Traditional semantic parsers (Vo et al., 2015; Zettlemoyer and Collins, 2005) assign meanings to each word in a dictionary and combine the meanings of the words to characterize the complete sentence. A phrase structure grammar formalism such as Combinatory Categorial Grammar (Steedman and Baldridge, 2011; Vo et al., 2015; Collins, 2005) or Context Free Grammar (Aho and Ullman, 1972; Wong and Mooney, 2006) is normally used to obtain the way words combine with each other. In the training phase, the semantic parser learns the meanings of words given a corpus of <sentence, meaning> pairs and stores them in a dictionary. During translation, the semantic parser uses those learned meanings to obtain the meaning of the sentence. Firstly, for the puzzle problems the meaning of the words changes drastically depending on the puzzle. A word may be an entity in one puzzle, but in a different problem it might not be an entity or might belong to a different category altogether. Thus a learned dictionary may not be useful while translating clues in a new puzzle. Secondly, in puzzles relations are normally expressed by phrases. For example, in the clue "The person who played at Eden Gardens played for India", the phrases "played at" and "played for" are used to express two different relations. Thus, using a model that assigns meaning to each word may not be suitable here. Finally, it is difficult to identify the participants of a relation with a parse tree generated following a phrase structure grammar.
For example, consider the parse tree of the clue "The person who trekked for 8 miles started at Bull Creek". Even though, the relation "started at" takes the word 'person' and 'Bull Creek' as its input, it receives the entire phrase "the person who trekked for 8 miles" as its argument along with the other input 'Bull Creek'.
Figure 2: Parse tree of an example sentence in Combinatory Categorial Grammar.
The entity classification problem studied in this research shares many similarities with the Named Entity Recognition (Nadeau and Sekine, 2007; Zhou and Su, 2002) and Word Sense Disambiguation (Stevenson and Wilks, 2003; Sanderson, 1994) tasks. However, our work has a major difference: in the entity classification problem, the class of an entity varies with the problem and does not belong to a known closed set, whereas for the other two problems the possible classes are pre-specified.

Experimental Evaluation
Dataset To evaluate our method we have built a dataset of logic grid puzzles along with their correct solutions. A total of 150 problems were collected from logic-puzzles.org. Out of them, 100 problems are fully annotated with entity and relation information. The remaining 50 puzzles do not have any annotation except their solution. The set of annotated puzzles contains a total of 467 clues, 5687 words, 1681 entities and 862 relations. The set of 50 unannotated puzzles contains a total of 222 clues with 2604 words.
Tasks We evaluate LOGICIA on three tasks: 1) puzzle solving; 2) entity classification; and 3) relation extraction. We use the percentage of correct answers as the evaluation metric for all three tasks. In the case of logic grid puzzle solving, an answer is considered correct if it exactly matches the solution of that puzzle.
Training-Testing Out of the 100 annotated puzzle problems, 50 are used as training samples and the remaining 50 puzzles are used in testing. The set of 50 unannotated puzzles is used solely for the task of testing puzzle solving.

Table 3: Accuracy on 50 annotated puzzle problems in the Test set.
           | Entity classification | Binary relation classification (with annotation) | Binary relation classification (without annotation) | Relation extraction (with annotation) | Relation extraction (without annotation) | Solution
Total      | 1766   | 960    | 960    | 450    | 450    | 50
Correct    | 1502   | 922    | 854    | 410    | 365    | 37
Percentage | 85.05% | 96.04% | 88.95% | 90.90% | 81.11% | 74%

Results Tables 3 and 4 show the efficacy of our approach in solving logic grid puzzles with the selected set of relations. LOGICIA is able to classify the constituents with 85.05% accuracy and is able to solve 71 problems out of the 100 test puzzles. It should be noted that puzzle problems require precise understanding of the text, and to obtain the correct solution of a puzzle problem all the entities and their relations in the puzzle need to be identified. Columns 2 and 3 in Table 3 compare the performance on relation extraction when it is used in conjunction with the entity classification and when it directly uses the annotated entities.
Error Analysis The errors in entity classification fall into two major categories. In the first category, more knowledge of similarity is needed than what is currently obtained from WordNet. Consider, for example, the case where the categories are "class number" and "class size" and the constituent is "20 students". Even though the constituent is closer to "class size", standard WordNet-based similarity methods are unable to provide such information. In the second category, the WordNet similarity of the constituent to one of the classes is quite high due to their position in the WordNet hierarchy; however, with respect to the particular problem the constituent is not an entity. The relation extraction task performs fairly well; however, the binary relation classification task does not jointly consider the relations between all the entities, and because of that, if one of the necessary binary relations of a complex relation is misclassified, the extraction of the entire relation is affected.

Conclusion & Future Work
This paper presents a novel approach for solving logic grid puzzles. To the best of our knowledge, this is the first work that can automatically solve a given logic grid puzzle.
There are several advantages of our approach. The inclusion of knowledge in terms of a vocabulary of relations makes it scalable. Constraints used by other puzzles, such as "Lynda sat on an even numbered position", can be easily integrated into the vocabulary, and the system can then be trained to identify those relations for new puzzles. Also, the proposed approach separates the representation from reasoning. The translation module only identifies the relations and their arguments; it is not aware of the meaning of those relations. The reasoning module, on the other hand, knows the definition of each relation and subsequently prunes those possibilities in which relations appearing in a clue do not hold. This separation of representation from reasoning allows the system to deal with the complex relations that appear in a clue.
There are a few practical and theoretical issues which need to be addressed. One of those is updating the logical vocabulary in a scalable manner. Logic grid puzzles form a wide family of puzzles, and solving them will require more knowledge of relations than what is currently available. Another challenge that needs to be addressed is the computation of similarity between complex concepts such as "size of class" and "20 students". Also, the case of "missing entity" (3.2) needs to be modeled properly. This work is the first step towards further understanding these important issues.