% link-parser internal command documentation.
%
% The internal help system first displays the hard-coded one-line description
% of each variable (or command) and its current and default values, and then
% the matching text entry from this file.

[graphics]
The meanings of the marked-up displayed words are as follows:
 [word]            Null-linked (unlinked) word
 word[!]           word classified by a regex
 word[!REGEX_NAME] word classified by REGEX_NAME (enable with !morphology=1)
 word[~]           word generated by spell guessing (unknown original word)
 word[&]           word run-on separated by spell guessing
 word[?].POS       word is unknown (POS is found by the parser)
 word.POS          word found in the dictionary as word.POS
 word.#CORRECTION  word is likely a typo - got linked as CORRECTION

For dictionaries that support morphology (enable with !morphology=1):
 word=             A prefix morpheme
 =word             A suffix morpheme
 word.=            A stem

For more details see:
 https://www.abisource.com/projects/link-grammar/dict/

[constituents]
Accepted values are:
 0      Disabled (no constituent tree display)
 1      Treebank-style constituent tree
 2      Flat, bracketed tree [A like [B this B] A]
 3      Flat, treebank-style tree (A like (B this))

[spell]
If zero, spell-guessing corrections and run-on corrections of unknown
words are not performed.  Otherwise, this indicates the number of
spelling-correction guesses per unknown word. The number of run-on
corrections (word splits) of unknown words is not limited when
spell-guessing is enabled.

[width]
The terminal width, used for wrapping the printing of long sentence
diagrams. Normally, this is not needed, as the terminal width is
automatically adjusted when the terminal window is resized.

[verbosity]
The level of descriptive debug messages that will be printed.
Values 1-4 are appropriate for use by the program/library user.  Higher
values are intended for LG dictionary authors and library developers.
For each level, unless otherwise noted, messages of lower verbosity
levels are included.

Some useful values:

 0      No prompt, minimal library messages
 1      Normal verbosity (its messages are included in all higher levels)
 2      Show times of the parsing steps
 3      More info messages
 4      Display data file search and locale setup

In the levels below, the messages of levels 2-4 are not included:

 5-9    Tokenizer and parser debugging
 10-19  Dictionary debugging
The output of these levels may be restricted to particular files and/or
functions that are listed (comma-separated) in the !debug variable.

The following levels are for particular information. The messages of levels
greater than 1 are not included in their output:

 101    Print all the dictionary connectors, along with their length limit
 102    Print all the disjuncts, before and after pruning
 103    Show unsubscripted dictionary words and subscripted ones which
        share the same base word
 104    Memory pool statistics

[morphology]
When False, whole words are displayed, without indicating any
morphological analysis that might have been performed. When True,
morphemes are shown as separate tokens, together with the link types
between them.  See "!help graphics" for additional info on morpheme
markup. The English dictionaries do not do morphological markup,
so this flag has almost no effect on English sentences.

This flag has one side-effect: if set to true, and a word is matched
by a RegEx, then the matching dictionary entry is shown.

[limit]
The maximum number of linkages that are considered for post-processing.
Up to this many linkages are generated; if there are fewer parses than
this limit, then they will all be printed, in deterministic,
cost-ranked order. If there are more parses than this limit, then a
random subset will be printed. The !rand option is used to control
whether this sampling will use a repeatable (deterministic) random
sequence, or not.

[cost-max]
Determines the largest disjunct cost considered during parsing. That is,
only disjuncts with a cost less than this are used during the parse;
higher-cost disjuncts are ignored.  Raising the max allowed cost will
typically produce more parses, although these are (far) less likely
to be correct.

[bad]
When True, also display linkages that are rejected by post-processing,
along with the name of the rule that resulted in the rejection.

This mode is useful when editing the dictionary or the post-processing
rule-set.  The invalid linkages will be printed after the valid ones.

The parser outputs only the linkages found at the stage it had reached
when it found its first valid linkage. For example, if it had reached
null-link stage 2 before finding its first valid linkage, it will also
output the invalid linkages found at null-link stage 2. There is no way
to see invalid linkages found at earlier stages.

[short]
Determines the maximum allowed length for certain connectors. The
intended use is to speed up parsing by not considering very long links
for most connectors, since they are rarely needed in a correct parse.
Setting this too low will prevent valid parses; setting this too high
will slow the system, and occasionally generate unlikely parses.
The limit applies only to those connectors not exempted by the
UNLIMITED-CONNECTORS dictionary entry.

[timeout]
Determines the approximate maximum time (in seconds) that parsing is
allowed to take. If a parse is not found before this time, normal parsing
is halted, and a "panic parse" mode is entered. During the panic parse,
a looser, less restrictive set of parameters is used (primarily, a
larger !cost-max), in an effort to find any parse at all.  Panic mode
can be enabled and disabled with the !panic option.

This option has no effect on the SAT parser (see "!help use-sat").

[memory]
This variable no longer has any effect; it is obsolete.

[null]
When False, only linkages without null links are considered.
When True, the parser tries to find linkages with the minimal
possible number of null links.

[panic]
If enabled, then a "panic-mode" will be entered when a parse cannot be
found within the time limit set by !timeout. When in panic mode, various
parse options are loosened so that a less accurate parse can be found
quickly.

[use-sat]
Use the Boolean-SAT parser instead of the traditional parser. The SAT
parser was an experimental alternative to the traditional parser.

This parser has several limitations, and offers no real advantages over
the traditional parser. In particular, it cannot find linkages with
null-links, and it does not honor the `!timeout` option.

[walls]
Alters the display of parsed sentences (see "!help graphics").
When True, the RIGHT-WALL and LEFT-WALL are always displayed.
When False, they are not displayed if their links are not considered
"interesting" (by a hard-coded criterion in the LG library).

[islands-ok]
This option determines whether or not "islands" of links are allowed.
For example, the following linkage has an island:

linkparser> this sentence is false this sentence is true
No complete linkages found.
Found 16 linkages (8 had no P.P. violations) at null count 1
	Linkage 1, cost vector = (UNUSED=0 DIS= 0.00 LEN=11)

    +----------->WV---------->+
    +------->Wd-------+       |
    |        +--Dsu*c-+--Ss*s-+-Paf-+       +--Dsu*c-+--Ss*s-+--Pa-+
    |        |        |       |     |       |        |       |     |
LEFT-WALL this.d sentence.n is.v false.a this.d sentence.n is.v true.a

[postscript]
Generate postscript output. The generated postscript requires a header
in order to be properly displayed; the header is printed by setting
!ps-header=True.

The postscript output currently malfunctions for sentences longer
than a page width.

[ps-header]
When set, and when !postscript=True is set, then the postscript header
will be printed.

%[cost-model]
%The only allowed value is 1 for now (the source code may need fixes).

[links]
When enabled, this will display each link, one per line, with the
words and connectors at each end of the link. The post-processing
domains are also displayed.

This mode is set to True when the standard input is not a terminal.

[disjuncts]
When True, display the disjuncts that are used for each word, together
with their cost.

[batch]
When True, the program processes sentences in batch-mode. During batch
mode, the usual parse printing is suppressed; only errors are reported.
In batch mode, a leading * in the first column can be used to indicate
a non-grammatical sentence. If such a sentence parses, an error is
printed. Conversely, an error is reported if no parses are found for
a valid sentence.
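
The pass/fail rule above can be sketched in Python. This is only an
illustration of the rule as described here, not the code used by
link-parser; the function name batch_line_is_error and the num_parses
argument are hypothetical, and treating any line that starts with "!" as
an option setting is an assumption of this sketch.

```python
def batch_line_is_error(line, num_parses):
    """Return True if this batch line should be reported as an error.

    line       -- one input line, as it appears in the batch file
    num_parses -- number of parses found for it (hypothetical input)
    """
    if line.startswith("!"):
        # Assumed: option settings such as "!batch" are not sentences.
        return False
    if line.startswith("*"):
        # Marked as non-grammatical: it is an error if it parses.
        return num_parses > 0
    # A sentence expected to be valid: it is an error if nothing parses.
    return num_parses == 0
```

For instance, a line starting with "*" that gets three parses counts as
an error, while an unmarked sentence with three parses does not.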

Batch testing is typically performed by piping a file to the parser;
for example
   link-parser [dictionary name] [arguments] < input-file
or
   cat input-file | link-parser [dictionary name] [arguments]

This flag is then usually placed at the beginning of the input-file
(other options may be specified, as well). Setting the !echo flag
can be useful, as it will echo the input sentence.

Our GitHub repository contains several large batch-files used during
testing and development; for English, the three most important ones
are "corpus-basic.batch", "corpus-fixes.batch" and "corpus-fix-long.batch".
See: https://github.com/opencog/link-grammar/tree/master/data/en

For more details see BATCH-MODE in:
https://www.abisource.com/projects/link-grammar/dict/introduction.html

[echo]
Print the original input sentence. This is primarily useful when working
in !batch mode, which otherwise suppresses output.

This mode is set to True when the standard input is not a terminal.

[rand]
If set to true, then a repeatable random sequence will be used, whenever
a random number is required.  The parser almost never uses random
numbers; currently they are only used in one place: to sample a subset
of linkages, if there are more parses than the linkage limit.
See "!help limit" for info on the linkage limit.

[debug]
This variable is for LG library development.  Its purpose is to limit
the quantity of debug output, of which there may otherwise be too much.
For example:

  $ link-parser -verbosity=6 -debug=flatten_wordgraph,print.c

will only show messages from the `flatten_wordgraph()` function or the
print.c file.

For more details see debug/README.md in the LG library source code
directory.

[test]
This variable is used to enable features that are used for debug
or do not yet have any other variable to control them.  For example,
to show all the linkages without a need to press RETURN, use:

  !test=auto-next-linkage

For more details, see debug/README.md and link-grammar/README.md
in our GitHub repository https://github.com/opencog/link-grammar .

[file]
Read text from this file. The file is assumed to contain sentences
and/or option settings.  It is typically used for reading in batch-mode
files (see "!help batch") but can also be useful in other scripting
situations.

[variables]
Variables can be set as follows:

 !<var>          Toggle the specified Boolean variable.
 !<var>=<val>    Assign that value to that variable.

[wordgraph]
This variable controls displaying the word-graph of the sentence.
The word-graph is a representation of the relations between the sentence
tokens, as set by the library tokenizer before the parsing step.

Its value may be:
 0      Disabled
 1      Default display
 2      Display parent tokens as subgraphs
 3      Use esoteric display flags as set by !test=wg:FLAGS

% FLAGS documentation:
% These flags are defined in wordgraph.h.
% Below, unsplit-word means a token before getting split.
% 1 and 2 mark the flags that are enabled in those modes.
%
% c                 Compact display
% d             1   Display debug labels
% h                 Display hex node numbers (for "dot" command debug)
% l             1,2 Add a legend
% p                 Display back-pointing links
% s             2   Display unsplit-words as subgraphs
% u             1   Display unsplit-word links
% x                 Display using X11 even on Windows (if supported)

[dialect]
This variable allows parsing according to predefined dialects (defined in
the "4.0.dialect" file), by modifying the disjunct set of dictionary words
whose expressions contain symbolic cost specifications - aka "dialect
components". It does that by controlling the cost values of these dialect
components.

The value of this variable consists of comma-separated names, with an
optional cost value after a ":" delimiter (which can be empty).
White space is not allowed.

Names without a cost value are dialect names from the "4.0.dialect" file,
and the dialect components are assigned costs as defined there.
Names with values are dialect components and their values. A missing
value after a ":" delimiter denotes a "very high" cost (to disable
the related disjuncts).

Examples:

  !dialect=irish
  !dialect=irish,headline
  !dialect=instructions,bad-spelling:2.2
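
The value syntax described above can be illustrated with a short Python
sketch. It only mirrors the rules stated in this entry and is not the
library's own parser; the function name parse_dialect_value is
hypothetical, and rejecting an empty name is an assumption of the sketch.

```python
def parse_dialect_value(value):
    """Split a !dialect value into dialect names and component costs."""
    if any(ch.isspace() for ch in value):
        raise ValueError("white space is not allowed in a dialect value")
    dialects = []    # names without ":" (defined in the dialect file)
    components = {}  # component name -> cost; None means "very high"
    for item in value.split(","):
        name, sep, cost = item.partition(":")
        if not name:
            # Assumed: an empty name is treated as an error here.
            raise ValueError("empty name in dialect value")
        if not sep:
            dialects.append(name)
        elif cost == "":
            components[name] = None
        else:
            components[name] = float(cost)
    return dialects, components
```

With the third example above, the sketch returns the dialect list
containing "instructions" and the component "bad-spelling" with a cost
of 2.2.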

[!]
This command is for debugging the dictionary or the library.
It takes a word as an argument, and optionally a regex and/or flags.
It splits the given word into tokens according to the current language,
and for each token it prints its matching dictionary words along with its
expression or disjunct list. The word may include a wildcard * to find
multiple matches, and a subscript can be used to limit the matches to this
subscript only.

Examples ("test.n" is an example word):

Show the expression:
  !!test.n

Show the expression using macro tags:
  !!test.n/m
Each macro tag is followed by its content on the same line.
The other lines are direct expression components (before and after a macro).

Show also low-level memory details of the expression:
  !!test.n/l

Show the disjuncts (without duplicates):
  !!test.n//

Show disjunct connector expression source macros:
  !!test.n//m

The above command is more useful for a single disjunct (1234 is an example
of a disjunct number; see below for the disjunct print format):
  !!test.n/1234/m


Show selected disjuncts according to the supplied string (* and +
are automatically escaped if the string contains no other regex
metacharacters):
  !!test.n/Ds**x+/

Show selected disjuncts according to the supplied regex:
  !!test.n/ Wd-.*<>.*@M\+/
  !!test.n/ J[sk]- D[\w*]+c\-/
Regexes are automatically detected. The r flag forces a regex interpretation,
but it is not needed in normal use.

Show a particular disjunct from the output of !disjuncts:
  !!test.n/Ds**c- Os-/f
The f flag means a full specification of a disjunct. It is most useful
along with the m flag:
  !!test.n/Ds**c- Os-/fm

Search for connectors in any order:
  !!test.n/Os- Ds**c-/a
Unfortunately, combining this with the f flag is not yet supported.

Display all the words that start with "test":
  !!test*

Display all the words that start with "test" and have subscript ".q":
  !!test*.q


A sample output of a disjunct-list display:
  Token "test.n" matches:
    test.n                        8509  disjuncts <en/words/words.n.1-const>

  Token "test.n" disjuncts:
    test.n 4273/4501 disjuncts

    ...
          test.n: [3493]2.600= @AN- @A- Ds**x- <> NM+ R+ Bs+ Bsm+
    ...

In this sample output:
   8509     Number of disjuncts in the dictionary expression.
   4501     Number of disjuncts after applying cost-max.
   4273     Number of disjuncts w/o duplicates.
   3493     Disjunct ordinal number.
   2.600    Disjunct cost.
   =        A separator to enable regex anchoring.
   <>       A separator of the "-" (LHS) and "+" (RHS) connector lists.
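
When post-processing such output in a script, the fields listed above can
be pulled out of a disjunct line with a regular expression. The following
Python sketch assumes the line layout of the sample shown here; the names
DISJUNCT_LINE and parse_disjunct_line are hypothetical and are not part
of link-parser.

```python
import re

# Layout assumed from the sample: WORD: [ORDINAL]COST= LHS <> RHS
DISJUNCT_LINE = re.compile(
    r"^\s*(?P<word>\S+): \[(?P<ordinal>\d+)\](?P<cost>-?\d+\.\d+)=\s*"
    r"(?P<lhs>.*?)\s*<>\s*(?P<rhs>.*)$"
)

def parse_disjunct_line(line):
    """Return the fields of one disjunct line, or None if it does not match."""
    m = DISJUNCT_LINE.match(line)
    if m is None:
        return None
    return {
        "word": m["word"],
        "ordinal": int(m["ordinal"]),
        "cost": float(m["cost"]),
        "lhs": m["lhs"].split(),  # "-" connectors, left of "<>"
        "rhs": m["rhs"].split(),  # "+" connectors, right of "<>"
    }
```

Applied to the sample line, this gives the ordinal 3493, the cost 2.6,
three "-" connectors and four "+" connectors.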

These variables affect the output:
Disjuncts, expressions: !dialect
Disjuncts only:         !cost-max
