Turning a file of natural language into a syntax tree.
§1. Everything is now set up to read in a text file, break it into sentences, and then hand each one to linguistics in turn to construct syntax diagrams.
We need to tell linguistics which forms of a verb (that is, which combinations of tense, sense and person) we will allow in these sentences: the answer is all of them. (Inform is more restrictive.)
define ALLOW_VERB_IN_ASSERTIONS_LINGUISTICS_CALLBACK Diagramming::allow_in_assertions
int Diagramming::allow_in_assertions(verb_conjugation *vc, int tense, int sense, int person) {
    return TRUE;
}
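The callback arrangement above can be illustrated in plain C. This is only a sketch under assumptions: the macro name ALLOW_VERB_FORM_CALLBACK, both function names, and the restrictive fallback are hypothetical, and the verb_conjugation argument is omitted to keep the example self-contained. It shows the general idea of a client naming its function in a macro, which library code then calls if the macro is defined.

```c
#include <stdbool.h>

/* Hypothetical client-side hook: the client names its function in a macro. */
#define ALLOW_VERB_FORM_CALLBACK client_allow_all

/* The client's policy: permit every tense, sense and person. */
static bool client_allow_all(int tense, int sense, int person) {
    (void) tense; (void) sense; (void) person;
    return true;
}

/* Hypothetical library-side check: call the hook if one is defined,
   otherwise fall back to an invented default (tense 0 only). */
static bool library_allows(int tense, int sense, int person) {
#ifdef ALLOW_VERB_FORM_CALLBACK
    return ALLOW_VERB_FORM_CALLBACK(tense, sense, person);
#else
    (void) sense; (void) person;
    return tense == 0;
#endif
}
```

With the macro defined as here, library_allows accepts any combination; removing the define would leave only the fallback policy in force.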
§2.
parse_node_tree *syntax_tree = NULL;
parse_node_tree *Diagramming::test_diagrams(text_stream *arg, int raw) {
    syntax_tree = SyntaxTree::new();
    <Turn the file into a syntax tree 2.1>;
    <Use the linguistics module on each sentence 2.2>;
    return syntax_tree;
}
§2.1. <Turn the file into a syntax tree 2.1> =
filename *F = Filenames::from_text(arg);
feed_t FD = Feeds::begin();
source_file *sf = TextFromFiles::feed_into_lexer(F, NULL_GENERAL_POINTER);
wording W = Feeds::end(FD);
if (sf == NULL) { PRINT("File has failed to open\n"); return NULL; }
Sentences::break(syntax_tree, W);
- This code is used in §2.
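As a rough illustration of what breaking a run of words into sentences involves, here is a minimal sketch in plain C. It is not the syntax module's algorithm: it assumes the text has already been split into word tokens, that a sentence ends only at a "." token, and that a sentence can be described by an inclusive range of word indices. The type word_range and the function break_sentences are hypothetical names introduced for this example.

```c
#include <string.h>

/* Hypothetical stand-in for a range of words: inclusive indices. */
typedef struct { int first, last; } word_range;

/* Split words[0..n-1] into sentences at "." tokens. Writes at most cap
   ranges into out and returns how many were written. Empty sentences
   (two stops in a row) are skipped. */
static int break_sentences(const char **words, int n, word_range *out, int cap) {
    int count = 0, start = 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(words[i], ".") == 0) {
            if (i > start && count < cap)
                out[count++] = (word_range){ start, i - 1 };
            start = i + 1;
        }
    }
    /* Trailing words with no final stop still form a sentence. */
    if (start < n && count < cap)
        out[count++] = (word_range){ start, n - 1 };
    return count;
}
```

Each range found this way would correspond to one childless sentence node in the rudimentary tree.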
§2.2. <Use the linguistics module on each sentence 2.2> =
SyntaxTree::traverse(syntax_tree, Diagramming::diagram);
if (raw == FALSE) Diagramming::parse_noun_phrases(syntax_tree->root_node);
- This code is used in §2.
§3. The work of the words and syntax modules means that we now have a rudimentary syntax tree, in which each sentence is just a single SENTENCE_NT node without children. We look for these, and apply <sentence>, the most powerful nonterminal from the linguistics module, to them. All being well (i.e., if any sentence structure can be found), this returns a subtree of further nodes, which we graft below the SENTENCE_NT.
void Diagramming::diagram(parse_node *p) {
    if (Node::get_type(p) == SENTENCE_NT) {
        wording W = Node::get_text(p);
        if (<sentence>(W)) {
            parse_node *n = <<rp>>;
            switch (Annotations::read_int(p, linguistic_error_here_ANNOT)) {
                case TwoLikelihoods_LINERROR:
                    Errors::nowhere("sentence has two certainties");
                    break;
                default:
                    SyntaxTree::graft(syntax_tree, n, p);
                    break;
            }
        } else {
            Errors::nowhere("sentence has no primary verb");
        }
    }
}
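To make "grafting below" concrete, here is a small sketch in plain C. It assumes a simplified node with only a label, a next-sibling pointer and a first-child pointer, and it assumes that grafting means attaching the new subtree after any existing children; the struct and both functions are hypothetical and are not the syntax module's own definitions.

```c
#include <stddef.h>

/* Simplified, hypothetical tree node. */
typedef struct node {
    const char *label;
    struct node *next;   /* next sibling, or NULL */
    struct node *down;   /* first child, or NULL */
} node;

/* Attach child (with everything hanging from it) as the last child of parent. */
static void graft(node *child, node *parent) {
    if (parent->down == NULL) { parent->down = child; return; }
    node *last = parent->down;
    while (last->next) last = last->next;
    last->next = child;
}

/* Count the direct children of a node. */
static int count_children(const node *parent) {
    int n = 0;
    for (const node *c = parent->down; c; c = c->next) n++;
    return n;
}
```

Because the sentence node starts with no children, the first graft simply sets its down pointer; later grafts extend the sibling chain.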
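§4. That sorts out the verbs and prepositions, but the noun phrases are not parsed by default: they are simply left as UNPARSED_NOUN_NT nodes. Unless raw output was asked for, the following walks the whole tree and hands each such node to Nouns::recognise.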
void Diagramming::parse_noun_phrases(parse_node *p) {
    for (; p; p = p->next) {
        if (Node::get_type(p) == UNPARSED_NOUN_NT) Nouns::recognise(p);
        Diagramming::parse_noun_phrases(p->down);
    }
}
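The shape of this walk, iterating along siblings and recursing into children, can be tried out in isolation. The sketch below is plain C under assumptions: the node struct and the type tags are simplified stand-ins, and "recognising" a noun is reduced to retagging the node, which is not what Nouns::recognise actually does.

```c
#include <stddef.h>

/* Hypothetical, simplified node types. */
enum node_type { SENTENCE, VERB, UNPARSED_NOUN, PARSED_NOUN };

typedef struct node {
    enum node_type type;
    struct node *next;   /* next sibling, or NULL */
    struct node *down;   /* first child, or NULL */
} node;

/* Visit p and all its later siblings, descending into each node's
   children; returns the number of nodes retagged. */
static int parse_nouns(node *p) {
    int changed = 0;
    for (; p; p = p->next) {
        if (p->type == UNPARSED_NOUN) { p->type = PARSED_NOUN; changed++; }
        changed += parse_nouns(p->down);
    }
    return changed;
}
```

Recursing on p->down for every node, whatever its type, is what lets the walk reach noun nodes nested below other nouns as well as those directly under a verb.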