RE2C(1)                                                                RE2C(1)

NAME
       re2c - generate fast lexical analyzers for C/C++, Go and Rust

SYNOPSIS
       Note:  This  manual  is  for Rust, but it refers to re2c as the general
       program.

          re2c    [ OPTIONS ] [ WARNINGS ] INPUT
          re2go   [ OPTIONS ] [ WARNINGS ] INPUT
          re2rust [ OPTIONS ] [ WARNINGS ] INPUT

       Input can be either a file or - for stdin.

INTRODUCTION
       re2c works as a preprocessor. It reads the input file (which is usually
       a  program  in  the target language, but can be anything) and looks for
       blocks of code enclosed in special-form comments. The text  outside  of
       these  blocks  is copied verbatim into the output file. The contents of
       the blocks are processed by re2c. It translates them  to  code  in  the
       target language and outputs the generated code in place of the block.

       Here  is  an  example  of a small program that checks if a given string
       contains a decimal number:

          // re2rust $INPUT -o $OUTPUT

          fn lex(s: &[u8]) -> bool {
              let mut cursor = 0;
              /*!re2c
                  re2c:define:YYCTYPE = u8;
                  re2c:define:YYPEEK = "*s.get_unchecked(cursor)";
                  re2c:define:YYSKIP = "cursor += 1;";
                  re2c:yyfill:enable = 0;

                  number = [1-9][0-9]*;

                  number { return true; }
                  *      { return false; }
              */
          }

          fn main() {
              assert!(lex(b"1234\0"));
          }

       In the output everything between /*!re2c and */ has been replaced  with
       the generated code:

          /* Generated by re2c */
          // re2rust $INPUT -o $OUTPUT

          fn lex(s: &[u8]) -> bool {
              let mut cursor = 0;

          {
              #[allow(unused_assignments)]
              let mut yych : u8 = 0;
              let mut yystate : usize = 0;
              'yyl: loop {
                  match yystate {
                      0 => {
                          yych = unsafe {*s.get_unchecked(cursor)};
                          cursor += 1;
                          match yych {
                              0x31 ..= 0x39 => {
                                  yystate = 2;
                                  continue 'yyl;
                              }
                              _ => {
                                  yystate = 1;
                                  continue 'yyl;
                              }
                          }
                      }
                      1 => { return false; }
                      2 => {
                          yych = unsafe {*s.get_unchecked(cursor)};
                          match yych {
                              0x30 ..= 0x39 => {
                                  cursor += 1;
                                  yystate = 2;
                                  continue 'yyl;
                              }
                              _ => {
                                  yystate = 3;
                                  continue 'yyl;
                              }
                          }
                      }
                      3 => { return true; }
                      _ => {
                          panic!("internal lexer error")
                      }
                  }
              }
          }

          }

          fn main() {
              assert!(lex(b"1234\0"));
          }

SYNTAX
       A re2c program consists of a sequence of blocks intermixed with code in
       the target language. There are three main kinds of blocks:

          /*!re2c[:<name>] ... */
                 A global block contains definitions,  configurations,  direc-
                 tives  and  rules.  re2c compiles regular expressions associ-
                 ated with each rule into a  deterministic  finite  automaton,
                 encodes  it  in  the  form of conditional jumps in the target
                 language and replaces the  block  with  the  generated  code.
                 Names  and configurations defined in a global block are added
                 to the global scope and become visible to subsequent  blocks.
                 At  the  start of the program the global scope is initialized
                 with command-line options.  The :<name> part is optional:  if
                 specified,  the name can be used to refer to the block in an-
                 other part of the program.

          /*!local:re2c[:<name>] ... */
                 A local block is like a global block, but the names and  con-
                 figurations  in it have local scope (they do not affect other
                 blocks).

          /*!rules:re2c[:<name>] ... */
                 A rules block is like a local block, but it does not generate
                 any code and is meant to be reused in other blocks. This is a
                 way of sharing code (more details in the reusable blocks sec-
                 tion).

       There are also many auxiliary blocks; see section blocks and directives
       for a full list of them. A block may contain  the  following  kinds  of
       statements:

          <name> = <regular expression>;
                 A  definition binds a name to a regular expression. Names may
                 contain alphanumeric characters and underscore.  The  regular
                 expressions section gives an overview of re2c syntax for reg-
                 ular expressions. Once defined, the name can be used in other
                 regular  expressions and in rules. Recursion in named defini-
                 tions is not allowed, and each name should be defined  before
                 it  is  used.  A  block  inherits  named definitions from the
                 global scope.  Redefining a name that exists in  the  current
                 scope is an error.

          <configuration> = <value>;
                 A  configuration  allows one to change re2c behavior and cus-
                 tomize the generated code. For a full list of  configurations
                 supported  by  re2c see the configurations section. Depending
                 on a particular configuration, the value can be a keyword,  a
                 nonnegative  integer number or a one-line string which should
                 be enclosed in double or single quotes unless it consists  of
                 alphanumeric characters. A block inherits configurations from
                 the global scope and may redefine them or add new ones.  Con-
                 figurations defined inside of a block affect the whole block,
                 even if they appear at the end of it.

          <regular expression> { <code> }
                 A rule binds a regular expression to  a  semantic  action  (a
                 block of code in the target language). If the regular expres-
                 sion matches, the associated semantic action is executed.  If
                 multiple  rules match, the longest match takes precedence. If
                 multiple rules match the same string, the earliest one  takes
                 precedence.  There  are two special rules: the default rule *
                 and the end of input rule $. The default rule  should  always
                 be  defined,  it  has  the  lowest priority regardless of its
                 place in the block, and it matches any code unit (not  neces-
                 sarily  a valid character, see the encoding support section).
                 The end of input rule should be defined if the  corresponding
                 method for handling the end of input is used. If start condi-
                 tions are used, rules have more complex syntax.

          !<directive>;
                 A directive is one of the special predefined statements. Each
                 directive  has a unique purpose. For example, the !use direc-
                 tive merges a rules block  into  the  current  one  (see  the
                 reusable  blocks  section), and the !include directive allows
                 one to include an outer file (see the include files section).
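
       The precedence between rules described above (longest  match  first,
       then the earliest rule on a tie) can be modelled  in  plain  Rust.  The
       following is only a hand-written sketch, not the code re2c generates
       (re2c compiles all rules into a single automaton). The matcher  func-
       tions and select_rule are hypothetical names, and the default  rule  *
       is not modelled:

```rust
/// Models a rule `[a-z]+`: length of the leading run of lowercase letters.
fn match_ident(input: &[u8]) -> Option<usize> {
    let n = input.iter().take_while(|b| b.is_ascii_lowercase()).count();
    if n > 0 { Some(n) } else { None }
}

/// Models a rule `"if"`: matches exactly the keyword.
fn match_if(input: &[u8]) -> Option<usize> {
    if input.starts_with(b"if") { Some(2) } else { None }
}

/// Returns (rule index, match length) of the winning rule, if any.
/// Rules are listed in the order they would appear in the block.
fn select_rule(input: &[u8]) -> Option<(usize, usize)> {
    let rules: [fn(&[u8]) -> Option<usize>; 2] = [match_if, match_ident];
    let mut best: Option<(usize, usize)> = None;
    for (index, rule) in rules.iter().enumerate() {
        if let Some(len) = rule(input) {
            // A strictly longer match wins; on a tie the earlier rule is kept.
            if best.map_or(true, |(_, best_len)| len > best_len) {
                best = Some((index, len));
            }
        }
    }
    best
}
```

       With these two rules the input "if" selects rule 0 (both rules  match
       two bytes, the earlier one wins), while "ifx" selects rule 1 (the
       longer match wins).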

PROGRAM INTERFACE
       The generated code interfaces with the outer program with the help of
       primitives: symbolic names that can be defined as variables, functions
       or macros in the target language (collectively referred to as the
       API). The definition of primitives is left to the user: this gives
       them both freedom in customizing the lexer and responsibility to
       understand how it works. Not all primitives have to be defined, only
       those used by a given program. The manual provides definitions for the
       most popular use cases. For a full list of primitives and their meaning
       see the API primitives section.

       There are two API flavors that define the set  of  primitives  used  by
       re2c:

          Pointer API
                 This  API  is  based on C pointer arithmetic. It was histori-
                 cally the first, and for a long time the only  one.  It  con-
                 sists  of  pointer-like primitives YYCURSOR, YYMARKER, YYCTX-
                 MARKER, YYLIMIT (which are normally defined  as  pointers  of
                 type YYCTYPE*) and YYFILL. This API is enabled by default for
                 C, and it cannot be used with other backends that do not sup-
                 port pointer arithmetic.

          Generic API
                 This API is more flexible. It consists of generic operations
                 and does not assume any particular implementation. The
                 primitives are YYPEEK, YYSKIP, YYBACKUP, YYBACKUPCTX, YYSTAGP,
                 YYSTAGN, YYMTAGP, YYMTAGN, YYRESTORE, YYRESTORECTX,
                 YYRESTORETAG, YYSHIFT, YYSHIFTSTAG, YYSHIFTMTAG, YYLESSTHAN
                 and YYFILL. For the C backend the generic API is enabled
                 with the --api custom option or the re2c:api = custom;
                 configuration; for Go and Rust it is enabled by default. The
                 generic API was added in version 0.14.

       There  are  two  API styles that determine the form in which the primi-
       tives should be defined:

          Free-form
                 Free-form style is enabled with configuration  re2c:api:style
                 =  free-form;. It is the default for Rust.  In this style in-
                 terface primitives should be defined as free-form  pieces  of
                 code  with  interpolated variables of the form @@{var} or op-
                 tionally just @@ if there is a single variable.  The  set  of
                 variables is specific to each primitive.  For example, if the
                 input is a byte slice buffer: &[u8], variables cursor, limit,
                 marker and ctxmarker of type usize represent input positions,
                 and a constant NONE represents invalid position, then the API
                 can be defined as follows:

                     /*!re2c
                       re2c:define:YYPEEK       = "*buffer.get_unchecked(cursor)";
                       re2c:define:YYSKIP       = "cursor += 1;";
                       re2c:define:YYBACKUP     = "marker = cursor;";
                       re2c:define:YYRESTORE    = "cursor = marker;";
                       re2c:define:YYBACKUPCTX  = "ctxmarker = cursor;";
                       re2c:define:YYRESTORECTX = "cursor = ctxmarker;";
                       re2c:define:YYRESTORETAG = "cursor = @@{tag};";
                       re2c:define:YYLESSTHAN   = "limit - cursor < @@{len}";
                       re2c:define:YYSTAGP      = "@@{tag} = cursor;";
                       re2c:define:YYSTAGN      = "@@{tag} = NONE;";
                       re2c:define:YYSHIFT      = "cursor = (cursor as isize + @@{shift}) as usize;";
                       re2c:define:YYSHIFTSTAG  = "@@{tag} = (@@{tag} as isize + @@{shift}) as usize;";
                     */
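
                 The effect of interpolation can be pictured as plain  textual
                 substitution. The sketch below is a simplified model and  not
                 re2c's implementation: the function expand is a  hypothetical
                 helper, it handles a single variable only, and the tag vari-
                 able name yyt1 used in the usage note is illustrative.

```rust
/// Replaces `@@{name}` and the short form `@@` in `template` with `value`.
/// Simplified model: only one variable per template is supported.
fn expand(template: &str, name: &str, value: &str) -> String {
    // `{{` and `}}` are escaped braces, so this builds the text "@@{name}".
    let named = format!("@@{{{}}}", name);
    // Substitute the named form first, then any remaining bare "@@".
    template.replace(&named, value).replace("@@", value)
}
```

                 For example, expanding "@@{tag} = cursor;" with tag  set  to
                 yyt1 gives "yyt1 = cursor;", and expanding the short form
                 "limit - cursor < @@" with len set to 3 gives
                 "limit - cursor < 3".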

          Function-like
                 Function-like    style    is   enabled   with   configuration
                 re2c:api:style = functions;. In this style primitives  should
                 be defined as functions or macros with parentheses, accepting
                 the necessary arguments.  This is well-suited for C macros or
                 Go  closures,  but  it cannot be used with hygienic macros or
                 functions in Rust as the primitives need to access lexer con-
                 text that is not passed as an argument. It is possible to use
                 closures, but the borrow-checker makes it hard  because  dif-
                 ferent  primitives  read  and  modify the same parts of lexer
                 context in a series of interleaved invocations.  As  a  work-
                 around,  one can use Rust Cell type.  For example, if the in-
                 put is a byte slice buffer: &[u8] and  cursor:  Cell::<usize>
                 is  the current input position, then YYPEEK and YYSKIP can be
                 defined as follows:

                     let YYPEEK = || unsafe { *buffer.get_unchecked(cursor.get()) };
                     let YYSKIP = || { cursor.set(cursor.get() + 1); };
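
                 A self-contained sketch of the Cell workaround is shown  be-
                 low. It is hand-written code that drives the two closures di-
                 rectly instead of a generated lexer. The function name
                 count_leading_digits is hypothetical, lowercase closure names
                 stand in for YYPEEK and YYSKIP, and bounds-checked  indexing
                 replaces get_unchecked to keep the sketch free of unsafe code.
                 The buffer is assumed to end with a non-digit sentinel.

```rust
use std::cell::Cell;

/// Counts the leading ASCII digits of `buffer`.
/// Assumes the buffer contains a non-digit byte (for example a NUL sentinel),
/// otherwise the bounds-checked read panics at the end of the slice.
fn count_leading_digits(buffer: &[u8]) -> usize {
    // Cell allows both closures to share the cursor through `&cursor`.
    let cursor = Cell::new(0usize);
    // Plays the role of YYPEEK: read the byte at the current position.
    let yypeek = || buffer[cursor.get()];
    // Plays the role of YYSKIP: advance the current position by one.
    let yyskip = || cursor.set(cursor.get() + 1);

    while yypeek().is_ascii_digit() {
        yyskip();
    }
    cursor.get()
}
```

                 Both closures capture cursor by shared reference, so  inter-
                 leaved calls to them do not conflict in the borrow checker.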

       For the YYFILL definition and instructions on how to customize or
       disable end-of-input checks, see the handling the end of input and
       buffer refilling sections.

OPTIONS
       Some of the  options  have  corresponding  configurations,  others  are
       global  and cannot be changed after re2c starts reading the input file.
       Debug options generally require building re2c in  debug  configuration.
       Internal  options are useful for experimenting with the algorithms used
       in re2c.

       -? --help -h
              Show help message.

       --api --input <default | custom>
              Specify the API used by the generated code to interface with
              user-defined code: default is the API based on pointer
              arithmetic (the default for C), and custom is the generic API
              (the default for Go and Rust).

       --bit-vectors -b
              Optimize conditional jumps using bit masks.  This option implies
              --nested-ifs.

       --case-insensitive
              Treat single-quoted and double-quoted strings  as  case-insensi-
              tive.

       --case-inverted
              Invert  the  meaning of single-quoted and double-quoted strings:
              treat single-quoted strings as case-sensitive and  double-quoted
              strings as case-insensitive.

       --case-ranges
              Collapse consecutive cases in a switch statement  into  a  range
              of the form low ... high. This syntax is a C/C++ language exten-
              sion that is supported by compilers like GCC, Clang and Tcc. The
              main advantage over using single cases is smaller generated code
              and faster generation time, although for some compilers like Tcc
              it also results in smaller binary size.   This  option  is  sup-
              ported only for C.

       --computed-gotos -g
              Optimize  conditional  jumps  using non-standard "computed goto"
              extension (which must be supported by the compiler). re2c gener-
              ates jump tables only in complex cases with a lot of conditional
              branches.  Complexity   threshold   can   be   configured   with
              cgoto:threshold  configuration.  This  option implies --bit-vec-
              tors. It is supported only for C.

       --conditions --start-conditions -c
              Enable support of Flex-like "conditions": multiple  interrelated
              lexers  within  one  block.  This  is an alternative to manually
              specifying different re2c blocks connected with goto or function
              calls.

       --depfile FILE
              Write  dependency  information to FILE in the form of a Makefile
              rule <output-file> : <input-file> [include-file ...].  This  al-
              lows  one  to  track  build  dependencies in the presence of in-
              clude:re2c directives, so that updating include  files  triggers
              regeneration  of  the  output  file.  This option depends on the
              --output option.

       --ebcdic --ecb -e
              Generate a lexer that reads input in EBCDIC encoding.  re2c  as-
              sumes  that  the character range is 0 -- 0xFF and character size
              is 1 byte.

       --empty-class <match-empty | match-none | error>
              Define  the  way  re2c  treats  empty  character  classes.  With
              match-empty (the default) empty class matches empty input (which
              is illogical, but backwards-compatible). With  match-none  empty
              class  always  fails  to match.  With error empty class raises a
              compilation error.

       --encoding-policy <fail | substitute | ignore>
              Define the way re2c treats Unicode surrogates.  With  fail  re2c
              aborts with an error when a surrogate is encountered.  With sub-
              stitute re2c silently replaces surrogates with  the  error  code
              point  0xFFFD.  With ignore (the default) re2c treats surrogates
              as normal code points. The Unicode standard says that standalone
              surrogates  are  invalid,  but real-world libraries and programs
              behave in different ways.
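
              The three policies can be summarized by the following  sketch.
              It is a model of the behavior described above, not  code  taken
              from re2c; the names Policy and apply_policy are hypothetical.
              Surrogates are the code points 0xD800 to 0xDFFF.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Policy {
    Fail,
    Substitute,
    Ignore,
}

/// Applies an encoding policy to one code point.
/// Non-surrogates always pass through unchanged.
fn apply_policy(code_point: u32, policy: Policy) -> Result<u32, String> {
    let is_surrogate = (0xD800..=0xDFFF).contains(&code_point);
    match (is_surrogate, policy) {
        // `ignore` treats surrogates as normal code points.
        (false, _) | (true, Policy::Ignore) => Ok(code_point),
        // `substitute` replaces a surrogate with the error code point.
        (true, Policy::Substitute) => Ok(0xFFFD),
        // `fail` reports an error.
        (true, Policy::Fail) => Err(format!("surrogate U+{:04X}", code_point)),
    }
}
```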

       --flex-syntax -F
              Partial support for Flex syntax: in this mode named  definitions
              don't  need  the  equal  sign and the terminating semicolon, and
              when used they must be surrounded with curly braces. Names with-
              out curly braces are treated as double-quoted strings.

       --header --type-header -t HEADER
              Generate  a  HEADER file. The contents of the file can be speci-
              fied with directives  header:re2c:on  and  header:re2c:off.   If
              conditions  are used the header will have a condition enum auto-
              matically appended to it (unless there  is  an  explicit  condi-
              tions:re2c directive).

       -I PATH
              Add  PATH to the list of locations which are used when searching
              for include files. This option is useful in combination with in-
              clude:re2c  directive.  re2c  looks for FILE in the directory of
              the parent file and in the include locations specified  with  -I
              option.

       --input-encoding <ascii | utf8>
              Specify  the  way  re2c  parses regular expressions.  With ascii
              (the default) re2c handles input as ASCII-encoded: any  sequence
              of  code  units  is  a sequence of standalone 1-byte characters.
              With utf8 re2c handles  input  as  UTF8-encoded  and  recognizes
              multibyte characters.

       --lang <c | go | rust>
              Specify  the  output language. Supported languages are C, Go and
              Rust.  The default is C for re2c, Go  for  re2go  and  Rust  for
              re2rust.

       --location-format <gnu | msvc>
              Specify  location  format  in  messages.  With gnu locations are
              printed as 'filename:line:column: ...'.  With msvc locations are
              printed as 'filename(line,column) ...'.  The default is gnu.
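
              The two prefixes differ only in punctuation, as the  following
              sketch shows. The names LocationFormat and format_location are
              hypothetical, and the message text that follows the prefix  is
              left out.

```rust
enum LocationFormat {
    Gnu,
    Msvc,
}

/// Builds the location prefix of a message in the chosen format.
fn format_location(format: LocationFormat, file: &str, line: u32, column: u32) -> String {
    match format {
        // filename:line:column:
        LocationFormat::Gnu => format!("{}:{}:{}:", file, line, column),
        // filename(line,column)
        LocationFormat::Msvc => format!("{}({},{})", file, line, column),
    }
}
```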

       --loop-switch
              Encode the DFA in the form of a loop over a switch statement.
              Individual states are switch cases. The current state is stored
              in a variable yystate. Transitions between states update yystate
              to the case label of the destination state and continue to the
              head of the loop. This option is always enabled for Rust, as it
              has no goto statement and cannot use the goto/label approach,
              which is the default for the C and Go backends.

       --nested-ifs -s
              Use  nested if statements instead of switch statements in condi-
              tional jumps. This usually results in more efficient  code  with
              non-optimizing compilers.

       --no-debug-info -i
              Do  not output line directives. This may be useful when the gen-
              erated code is stored in a version control system (to avoid huge
              autogenerated  diffs on small changes). This option is on by de-
              fault for Rust, as it does not have line directives.

       --no-generation-date
              Suppress date output in the generated file.

       --no-version
              Suppress version output in the generated file.

       --no-unsafe
              Do not generate unsafe wrapper over YYPEEK (this option is  spe-
              cific  to  Rust).  For  performance  reasons YYPEEK should avoid
              bounds-checking, as  the  lexer  already  performs  end-of-input
              checks  in a more efficient way.  The user may choose to provide
              a safe YYPEEK definition, or a definition that is unsafe only in
              release  builds,  in  which case the --no-unsafe option helps to
              avoid warnings about redundant unsafe blocks.
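
              One possible shape of a definition that is unsafe only in  re-
              lease builds is sketched below. The helper name peek is hypo-
              thetical; YYPEEK could then be defined as a call to it. Because
              the call itself is safe, an extra unsafe wrapper around it would
              be redundant, which is the situation --no-unsafe is meant for.

```rust
/// Reads the byte at `cursor`: bounds-checked in debug builds,
/// unchecked in release builds.
/// The caller must guarantee `cursor < buffer.len()` (for example through
/// the lexer's own end-of-input checks), otherwise release builds have
/// undefined behavior.
fn peek(buffer: &[u8], cursor: usize) -> u8 {
    if cfg!(debug_assertions) {
        // Debug builds: an out-of-bounds read panics.
        buffer[cursor]
    } else {
        // Release builds: skip the bounds check.
        unsafe { *buffer.get_unchecked(cursor) }
    }
}
```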

       --output -o OUTPUT
              Specify the OUTPUT file.

       --posix-captures -P
              Enable submatch extraction with POSIX-style capturing groups.

       --reusable -r
              Deprecated since version 2.2 (reusable blocks are allowed by de-
              fault now).

       --skeleton -S
              Ignore user-defined interface code and generate a self-contained
              "skeleton" program.  Additionally,  generate  input  files  with
              strings  derived  from  the regular grammar and compressed match
              results that are used to verify "skeleton" behavior on  all  in-
              puts.  This  option  is useful for finding bugs in optimizations
              and code generation. This option is supported only for C.

       --storable-state -f
              Generate a lexer which can store its inner state.  This is  use-
              ful  in  push-model lexers which are stopped by an outer program
              when there is not enough input, and then resumed when more input
              becomes available. In this mode users should additionally define
              YYGETSTATE and YYSETSTATE primitives, and variables yych,  yyac-
              cept and state should be part of the stored lexer state.
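
              The push model can be illustrated with a small hand-written
              lexer that keeps its state between calls. This is only a  sketch
              of the idea: it does not use YYGETSTATE or YYSETSTATE, it is not
              what re2c generates, and the names PushLexer, feed  and  numbers
              are hypothetical. The lexer counts decimal numbers in input that
              arrives in arbitrary chunks.

```rust
/// A resumable lexer: `state` survives between calls to `feed`.
struct PushLexer {
    /// 0 = between numbers, 1 = inside a number.
    state: usize,
    /// Count of numbers seen so far (a number ends at the first non-digit).
    numbers: usize,
}

impl PushLexer {
    fn new() -> Self {
        PushLexer { state: 0, numbers: 0 }
    }

    /// Consumes one chunk and returns when it is exhausted.
    /// The next call resumes from the stored state.
    fn feed(&mut self, chunk: &[u8]) {
        for &byte in chunk {
            match (self.state, byte.is_ascii_digit()) {
                // A digit after a separator starts a new number.
                (0, true) => self.state = 1,
                // A non-digit inside a number completes it.
                (1, false) => {
                    self.numbers += 1;
                    self.state = 0;
                }
                // Otherwise stay in the current state.
                _ => {}
            }
        }
    }
}
```

              Feeding the chunks "12", "3 4" and " " one after another counts
              two numbers (123 and 4), although the first number  is  split
              across two chunks.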

       --tags -T
              Enable submatch extraction with tags.

       --ucs2 --wide-chars -w
              Generate  a  lexer  that  reads UCS2-encoded input. re2c assumes
              that the character range is 0 -- 0xFFFF and character size is  2
              bytes.  This option implies --nested-ifs.

       --utf8 --utf-8 -8
              Generate  a  lexer  that reads input in UTF-8 encoding. re2c as-
              sumes that the character range is 0 --  0x10FFFF  and  character
              size is 1 byte.

       --utf16 --utf-16 -x
              Generate  a  lexer  that reads UTF16-encoded input. re2c assumes
              that the character range is 0 -- 0x10FFFF and character size  is
              2 bytes.  This option implies --nested-ifs.

       --utf32 --unicode -u
              Generate  a  lexer  that reads UTF32-encoded input. re2c assumes
              that the character range is 0 -- 0x10FFFF and character size  is
              4 bytes.  This option implies --nested-ifs.
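
              The character size in the encoding options above is the size of
              one code unit; a single code point may take several code units.
              The sketch below uses the Rust standard library to  count  code
              units per code point. The function name code_units is  hypothet-
              ical.

```rust
/// Returns the number of code units needed for `c` in
/// UTF-8 (1-byte units), UTF-16 (2-byte units) and UTF-32 (4-byte units).
fn code_units(c: char) -> (usize, usize, usize) {
    // UTF-32 always needs exactly one code unit per code point.
    (c.len_utf8(), c.len_utf16(), 1)
}
```

              For example, U+0041 needs one unit in every encoding,  U+20AC
              needs three UTF-8 units and one UTF-16 unit, and U+1F600  needs
              four UTF-8 units and two UTF-16 units.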

       --verbose
              Output a short message in case of success.

       --vernum -V
              Show version information in MMmmpp format (major, minor, patch).
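
              A build script might decode this value as sketched  below.  The
              sketch assumes that MMmmpp means two decimal digits per  compo-
              nent; the function name parse_vernum and the  sample  value  in
              the usage note are illustrative.

```rust
/// Splits a six-digit MMmmpp string into (major, minor, patch).
/// Returns None if the input is not exactly six ASCII digits.
fn parse_vernum(vernum: &str) -> Option<(u32, u32, u32)> {
    if vernum.len() != 6 || !vernum.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Byte-range slicing is safe here: the string is known to be ASCII.
    let major = vernum[0..2].parse::<u32>().ok()?;
    let minor = vernum[2..4].parse::<u32>().ok()?;
    let patch = vernum[4..6].parse::<u32>().ok()?;
    Some((major, minor, patch))
}
```

              Under this assumption the string 030100 decodes to major 3, mi-
              nor 1, patch 0.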

       --version -v
              Show version information.

       --single-pass -1
              Deprecated. Does nothing (single pass is the default now).

       --debug-output -d
              Emit  YYDEBUG  invocations in the generated code. This is useful
              to trace lexer execution.

       --dump-adfa
              Debug option: output DFA after tunneling (in .dot format).

       --dump-cfg
              Debug option: output control flow graph  of  tag  variables  (in
              .dot format).

       --dump-closure-stats
              Debug  option: output statistics on the number of states in clo-
              sure.

       --dump-dfa-det
              Debug option: output DFA immediately after  determinization  (in
              .dot format).

       --dump-dfa-min
              Debug option: output DFA after minimization (in .dot format).

       --dump-dfa-tagopt
              Debug  option:  output DFA after tag optimizations (in .dot for-
              mat).

       --dump-dfa-tree
              Debug option: output DFA under construction with  states  repre-
              sented as tag history trees (in .dot format).

       --dump-dfa-raw
              Debug  option:  output  DFA  under  construction  with  expanded
              state-sets (in .dot format).

       --dump-interf
              Debug option: output interference  table  produced  by  liveness
              analysis of tag variables.

       --dump-nfa
              Debug option: output NFA (in .dot format).

       --emit-dot -D
              Instead  of  normal  output generate lexer graph in .dot format.
              The output can be  converted  to  an  image  with  the  help  of
              Graphviz (e.g. something like dot -Tpng -odfa.png dfa.dot).

       --dfa-minimization <moore | table>
              Internal  option:  DFA  minimization algorithm used by re2c. The
              moore option is the Moore algorithm (it is the default). The ta-
              ble  option  is  the  "table filling" algorithm. Both algorithms
              should produce the same DFA up to states relabeling; table fill-
              ing  is simpler and much slower and serves as a reference imple-
              mentation.

       --eager-skip
              Internal option: make the generated lexer advance the input  po-
              sition  eagerly  --  immediately after reading the input symbol.
              This changes the default behavior when the input position is ad-
              vanced lazily -- after transition to the next state. This option
              is implied by --no-lookahead.

       --no-lookahead
              Internal option: use TDFA(0) instead of  TDFA(1).   This  option
              has effect only with --tags or --posix-captures options.

       --no-optimize-tags
              Internal  option: suppress optimization of tag variables (useful
              for debugging).

       --posix-closure <gor1 | gtop>
              Internal option: specify shortest-path algorithm  used  for  the
              construction of epsilon-closure with POSIX disambiguation seman-
              tics: gor1 (the default) stands for  Goldberg-Radzik  algorithm,
              and gtop stands for "global topological order" algorithm.

       --posix-prectable <complex | naive>
              Internal  option:  specify  the  algorithm used to compute POSIX
              precedence table. The complex algorithm computes precedence  ta-
              ble  in one traversal of tag history tree and has quadratic com-
              plexity in the number of TNFA states; it  is  the  default.  The
              naive algorithm has worst-case cubic complexity in the number of
              TNFA states, but it is much simpler  than  complex  and  may  be
              slightly faster in non-pathological cases.

       --stadfa
              Internal  option:  use staDFA algorithm for submatch extraction.
              The main difference with TDFA is that tag operations  in  staDFA
              are placed in states, not on transitions.

       --fixed-tags <none | toplevel | all>
              Internal  option:  specify  whether  the  fixed-tag optimization
              should be applied to all tags (all), none  of  them  (none),  or
              only  those in toplevel concatenation (toplevel). The default is
              all.  "Fixed" tags are those that are  located  within  a  fixed
              distance  to  some other tag (called "base"). In such cases only
              the base tag needs to be tracked, and the value of the fixed tag
              can  be computed as the value of the base tag plus a static off-
              set. For tags that are under alternative  or  repetition  it  is
              also necessary to check if the base tag has a no-match value (in
              that case fixed tag should also be set to no-match, disregarding
              the  offset).  For  tags in top-level concatenation the check is
              not needed, because they always match.
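
              The computation of a fixed tag can be sketched as follows. This
              is a model of the description above rather than generated code:
              the function name fixed_tag, the signed offset parameter and the
              choice of usize::MAX as the no-match value NONE are all assump-
              tions made for illustration.

```rust
/// Assumed no-match value for tags.
const NONE: usize = usize::MAX;

/// Computes a fixed tag from its base tag and a static offset.
/// `needs_check` is true for tags under alternative or repetition,
/// where the base tag may hold the no-match value.
fn fixed_tag(base: usize, offset: isize, needs_check: bool) -> usize {
    if needs_check && base == NONE {
        // The base did not match, so the fixed tag did not match either;
        // the offset is disregarded.
        NONE
    } else {
        (base as isize + offset) as usize
    }
}
```

              For a tag in top-level concatenation needs_check would be false
              and the value is simply the base plus the offset.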

WARNINGS
       Warnings can be individually enabled, disabled and turned into errors.

       -W     Turn on all warnings.

       -Werror
              Turn  warnings  into errors. Note that this option alone doesn't
              turn on any warnings; it only affects those warnings  that  have
              been turned on so far or will be turned on later.

       -W<warning>
              Turn on warning.

       -Wno-<warning>
              Turn off warning.

       -Werror-<warning>
              Turn  on warning and treat it as an error (this implies -W<warn-
              ing>).

       -Wno-error-<warning>
              Don't treat this particular warning as an  error.  This  doesn't
              turn off the warning itself.

       -Wcondition-order
              Warn if the generated program makes implicit assumptions about
              condition numbering. One should use either the --header option
              or the conditions:re2c directive to generate a mapping of
              condition names to numbers and then use the autogenerated
              condition names.

       -Wempty-character-class
              Warn  if a regular expression contains an empty character class.
              Trying to match an empty character  class  makes  no  sense:  it
              should  always  fail.  However, for backwards compatibility rea-
              sons re2c permits empty character classes  and  treats  them  as
              empty  strings.  Use  the --empty-class option to change the de-
              fault behavior.

       -Wmatch-empty-string
              Warn if a rule is nullable (matches an empty  string).   If  the
              lexer  runs  in a loop and the empty match is unintentional, the
              lexer may unexpectedly hang in an infinite loop.
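
              The hang can be reproduced with a hand-written model of a lexer
              loop. In the sketch below match_digits stands in for a nullable
              rule such as [0-9]*, and count_tokens is the calling loop; both
              names are hypothetical. The explicit check for an  empty  match
              is what keeps the loop from spinning forever.

```rust
/// Models a nullable rule `[0-9]*`: may return a match of length 0.
fn match_digits(input: &[u8]) -> usize {
    input.iter().take_while(|b| b.is_ascii_digit()).count()
}

/// Calls the lexer in a loop until the input is consumed.
/// Returns Err(position) if a match consumed nothing at `position`.
fn count_tokens(input: &[u8]) -> Result<usize, usize> {
    let mut cursor = 0;
    let mut tokens = 0;
    while cursor < input.len() {
        let len = match_digits(&input[cursor..]);
        if len == 0 {
            // Without this guard `cursor` would never advance
            // and the loop would not terminate.
            return Err(cursor);
        }
        cursor += len;
        tokens += 1;
    }
    Ok(tokens)
}
```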

       -Wswapped-range
              Warn if the lower bound of a range is  greater  than  its  upper
              bound.  The  default  behavior  is  to  silently  swap the range
              bounds.
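
              The default behavior amounts to the following  normalization,
              shown here as a sketch with a hypothetical function name.  The
              returned flag marks the case in which the warning would apply.

```rust
/// Orders the bounds of a character range.
/// Returns (lower, upper, swapped), where `swapped` is true
/// if the bounds were given in the wrong order.
fn normalize_range(lower: u32, upper: u32) -> (u32, u32, bool) {
    if lower > upper {
        (upper, lower, true)
    } else {
        (lower, upper, false)
    }
}
```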

       -Wundefined-control-flow
              Warn if some input strings cause undefined control flow  in  the
              lexer  (the  faulty  patterns are reported). This is a dangerous
              and common mistake. It can be easily fixed by adding the default
              rule * which has the lowest priority, matches any code unit, and
              always consumes a single code unit.

       -Wunreachable-rules
              Warn about rules that are shadowed by other rules and will never
              match.

       -Wuseless-escape
              Warn  if  a symbol is escaped when it shouldn't be.  By default,
              re2c silently ignores such escapes, but this may as  well  indi-
              cate a typo or an error in the escape sequence.

       -Wnondeterministic-tags
              Warn  if  a  tag  has  n-th degree of nondeterminism, where n is
              greater than 1.

       -Wsentinel-in-midrule
              Warn if the sentinel symbol occurs in the middle of a rule: this
              may cause reads past the end of the buffer, crashes or memory
              corruption in the generated lexer. This warning is only
              applicable if the sentinel method of checking for the end of
              input is used. It is set to an error if the re2c:sentinel
              configuration is used.

BLOCKS AND DIRECTIVES
       Below  is  the  list of re2c directives (syntactic constructs that mark
       the beginning and end of the code that should be  processed  by  re2c).
       Named  blocks were added in re2c version 2.2. They are exactly the same
       as unnamed blocks, except that the name can  be  used  to  reference  a
       block in other parts of the program. More information on each directive
       can be found in the related sections.

       /*!re2c[:<name>] ... */
              A global re2c block with an optional name. The block may contain
              named  definitions, configurations and rules in any order. Named
              definitions and configurations are defined in the global  scope,
              so  they  are  inherited  by  subsequent  blocks. The code for a
              global block is generated at the point where the block is speci-
              fied.

       /*!local:re2c[:<name>] ... */
              A  local re2c block with an optional name. Unlike global blocks,
              definitions and configurations inside of a local block  are  not
              added  into the global scope. In all other respects local blocks
              are the same as global blocks.

       /*!rules:re2c[:<name>] ... */
              A reusable block with an optional name. Rules  blocks  have  the
              same  structure  as local or global blocks, but they do not pro-
              duce any code and they can be reused  multiple  times  in  other
              blocks   with   the  help  of  a  !use:<name>;  directive  or  a
              /*!use:re2c[:<name>] ... */ block. A rules block on its own does
              not  add  any definitions into the global scope. The code for it
              is generated at the point of use.  Prior  to  re2c  version  2.2
              rules blocks required -r --reusable option.

       /*!use:re2c[:<name>] ... */
              A use block that references a previously defined rules block. If
              the name is specified, re2c looks for a rules block  with  this
              name.  Otherwise  the  most recent rules block is used (either a
              named or an unnamed one). A use block can add definitions,  con-
              figurations  and  rules  of its own, which are added to those of
              the referenced rules block. Prior to re2c version 2.2 use blocks
              required -r --reusable option.

       !use:<name>;
              An in-block use directive that merges a previously defined rules
              block with the specified name into the current block. Named def-
              initions,  configurations  and rules of the referenced block are
              added to the current ones. Conflicts between  overlapping  rules
              and configurations are resolved in the usual way: the first rule
              takes priority, and the latest configuration overrides the  pre-
              ceding ones. One exception is the special rules *, $ and <!> for
              which a block-local definition always takes priority. A use  di-
              rective  can  be placed anywhere inside of a block, and multiple
              use directives are allowed.

       /*!max:re2c[:<name1>[:<name2>...]] ... */
              A directive that generates YYMAXFILL  definition.   An  optional
              list  of  block  names specifies which blocks should be included
              when computing YYMAXFILL value (if the list is empty, all blocks
              are included).  By default the generated code is a macro-defini-
              tion for C (#define YYMAXFILL <n>), or a global variable for  Go
              (var YYMAXFILL int = <n>). It can be customized with an optional
              configuration format that  specifies  a  template  string  where
              @@{max}  (or @@ for short) is replaced with the numeric value of
              YYMAXFILL.

       /*!maxnmatch:re2c[:<name1>[:<name2>...]] ... */
              A directive that generates YYMAXNMATCH definition  (it  requires
              -P  --posix-captures  option).   An optional list of block names
              specifies which blocks should be included when computing YYMAXN-
              MATCH value (if the list is empty, all blocks are included).  By
              default the generated code is a macro-definition for C  (#define
              YYMAXNMATCH  <n>),  or a global variable for Go (var YYMAXNMATCH
              int = <n>). It can be customized with an optional  configuration
              format that specifies a template string where @@{max} (or @@ for
              short) is replaced with the numeric value of YYMAXNMATCH.

       /*!stags:re2c[:<name1>[:<name2>...]] ... */,
       /*!mtags:re2c[:<name1>[:<name2>...]] ... */
              Directives  that  specify  a  template piece of code that is ex-
              panded for each s-tag/m-tag variable generated by re2c.  An  op-
              tional  list of block names specifies which blocks should be in-
              cluded when computing the set of tag variables (if the  list  is
              empty, all blocks are included).  There are two optional config-
              urations: format and separator.  Configuration format  specifies
              a  template  string  where @@{tag} (or @@ for short) is replaced
              with the name of each  tag  variable.   Configuration  separator
              specifies  a  piece  of  code  used to join the generated format
              pieces for different tag variables.

       /*!getstate:re2c[:<name1>[:<name2>...]] ... */
              A directive that generates conditional  dispatch  on  the  lexer
              state  (it  requires --storable-state option).  An optional list
              of block names specifies which blocks should be included in  the
              state  dispatch.  The default transition goes to the start label
              of the first block on the list. If the list is empty, all blocks
              are included, and the default transition goes to the first block
              in the file that has a start label.  This directive is incompat-
              ible  with  the  --loop-switch  option  and Rust, as it requires
              cross-block transitions that are unsupported  without  the  goto
              statement.

       /*!conditions:re2c[:<name1>[:<name2>...]] ... */, /*!types:re2c... */
              A  directive  that  generates condition enumeration (it requires
              --conditions option).  An optional list of block names specifies
              which blocks should be included when computing the set of condi-
              tions (if the list is empty, all blocks are included).   By  de-
              fault the generated code is an enumeration YYCONDTYPE. It can be
              customized with optional configurations  format  and  separator.
              Configuration  format specifies a template string where @@{cond}
              (or @@ for short) is replaced with the name of  each  condition,
              and  @@{num} is replaced with a numeric index of that condition.
              Configuration separator specifies a piece of code used  to  join
              the generated format pieces for different conditions.
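
              The interplay of format and separator can be modelled with a
              few lines of plain Rust. This is only a sketch of the
              substitution described above, not re2c's implementation:
              expand_conditions is a hypothetical helper, and numbering
              conditions from zero is an assumption.

```rust
// Hand-written model of the format/separator expansion; not re2c code.
// `expand_conditions` is a hypothetical helper introduced for illustration.
fn expand_conditions(format: &str, separator: &str, conds: &[&str]) -> String {
    conds
        .iter()
        .enumerate()
        .map(|(num, cond)| {
            format
                .replace("@@{cond}", cond)            // long form: condition name
                .replace("@@{num}", &num.to_string()) // numeric index (assumed 0-based)
                .replace("@@", cond)                  // shorthand form: condition name
        })
        .collect::<Vec<_>>()
        .join(separator) // pieces for different conditions are joined
}
```

              With the format "const @@{cond}: usize = @@{num};" and a
              newline separator, two conditions yield two constant
              definitions, one per line.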

       /*!include:re2c <file> */
              This  directive  allows  one  to include <file>, which must be a
              double-quoted file path. The contents of the file are  literally
              substituted  in  place of the directive, in the same way as #in-
              clude works in C/C++. This directive can be used  together  with
              the  --depfile  option  to generate build system dependencies on
              the included files.

       !include <file>;
              This directive is the same as /*!include:re2c <file> */,  except
              that it should be used inside of a re2c block.

       /*!header:re2c:on*/
              This directive marks the start of a header file. Everything
              after it and up to the following /*!header:re2c:off*/ directive
              is processed by re2c and written to the header file specified
              with the -t --type-header option.

       /*!header:re2c:off*/
              This directive marks the end of the header file started with
              /*!header:re2c:on*/.

       /*!ignore:re2c ... */
              A  block  whose contents are ignored and removed from the output
              file.

       %{ ... %}
              A global re2c block in the --flex-support mode. This  is  depre-
              cated and exists for backward compatibility.

API PRIMITIVES
       Here is a list of API primitives that may be used by the generated code
       in order to interface with the outer  program.   Which  primitives  are
       needed depends on multiple factors, including the complexity of regular
       expressions, input representation, buffering, the use of  various  fea-
       tures and so on.  All the necessary primitives should be defined by the
       user in the form of macros, functions, variables, free-form  pieces  of
       code, or any other suitable form.  re2c does not (and cannot) check the
       definitions, so if anything is missing or defined incorrectly the  gen-
       erated code will not compile.

       YYCTYPE
              The  type  of  the  input  characters  (code units).  For ASCII,
              EBCDIC and UTF-8 encodings it should be 1-byte unsigned integer.
              For  UTF-16  or  UCS-2 it should be 2-byte unsigned integer. For
              UTF-32 it should be 4-byte unsigned integer.
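
              As an illustration of how the code unit type follows the
              encoding, the sketch below converts one string into code
              units of each width using only the Rust standard library. The
              function name code_units is hypothetical; in Rust the natural
              choices for YYCTYPE would be u8, u16 and u32.

```rust
// Splits a string into code units of the three widths discussed above.
// `code_units` is a hypothetical helper, not part of re2c.
fn code_units(s: &str) -> (Vec<u8>, Vec<u16>, Vec<u32>) {
    let utf8: Vec<u8> = s.bytes().collect();                     // 1-byte units
    let utf16: Vec<u16> = s.encode_utf16().collect();            // 2-byte units
    let utf32: Vec<u32> = s.chars().map(|c| c as u32).collect(); // 4-byte units
    (utf8, utf16, utf32)
}
```

              The same text generally has a different number of code units
              in each encoding, which is why the lexer must be generated
              for the encoding that matches YYCTYPE.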

       YYCURSOR
              A pointer-like l-value that stores the  current  input  position
              (usually  a pointer of type YYCTYPE*). Initially YYCURSOR should
              point to the first input character. It is advanced by the gener-
              ated code.  When a rule matches, YYCURSOR points to the position
              after the last matched character. It is used only in  C  pointer
              API.

       YYLIMIT
              A  pointer-like  r-value  that  stores the end of input position
              (usually a pointer of type YYCTYPE*). Initially  YYLIMIT  should
              point  to the position after the last available input character.
              It is not changed by the generated code. The lexer compares  YY-
              CURSOR  to YYLIMIT in order to determine if there are enough in-
              put characters left.  YYLIMIT is used only in C pointer API.

       YYMARKER
              A pointer-like l-value (usually a pointer of type YYCTYPE*) that
              stores  the  position  of the latest matched rule. It is used to
              restore the YYCURSOR position if the longer match fails and  the
              lexer needs to roll back. Initialization is not needed. YYMARKER
              is used only in C pointer API.

       YYCTXMARKER
              A pointer-like l-value that stores the position of the  trailing
              context  (usually a pointer of type YYCTYPE*). No initialization
              is needed.  It is used only in C pointer API, and only with  the
              lookahead operator /.

       YYFILL A  generic  API  primitive with one argument len.  YYFILL should
              provide at least len more input characters or fail.  If re2c:eof
              is  used,  then len is always 1 and  YYFILL should always return
              to the calling function; zero return  value  indicates  success.
              If re2c:eof is not used, then YYFILL return value is ignored and
              it should not return on failure. The maximum value of len is YY-
              MAXFILL.   The  definition of YYFILL can be either function-like
              or free-form depending on the API style (see re2c:api:style  and
              re2c:define:YYFILL:naked).

       YYMAXFILL
              An  integral constant equal to the maximum value of the argument
              to YYFILL.  It can be generated with /*!max:re2c*/ directive.

       YYLESSTHAN
              A generic API primitive with one argument len. It should be
              defined as an r-value of boolean type that equals true if and
              only if there are fewer than len input characters left. The
              definition can be either function-like or free-form depending
              on the API style (see re2c:api:style).
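
              A minimal sketch of how YYLESSTHAN and YYFILL might relate,
              written as ordinary Rust methods. The Input type, its fields
              and the pending source of extra characters are assumptions
              made for this example; the return convention of fill follows
              the re2c:eof description of YYFILL (zero means success).

```rust
// Hypothetical buffered input; all names here are illustrative only.
struct Input {
    buf: Vec<u8>,     // characters currently available to the lexer
    pending: Vec<u8>, // stand-in for a file or socket with more input
    cursor: usize,    // current input position
}

impl Input {
    // Model of YYLESSTHAN(len): true iff fewer than `len` characters remain.
    fn less_than(&self, len: usize) -> bool {
        self.buf.len() - self.cursor < len
    }

    // Model of YYFILL(len): try to make `len` characters available.
    // Returns 0 on success and a non-zero value on failure.
    fn fill(&mut self, len: usize) -> i32 {
        while self.less_than(len) && !self.pending.is_empty() {
            let next = self.pending.remove(0);
            self.buf.push(next);
        }
        if self.less_than(len) { 1 } else { 0 }
    }
}
```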

       YYPEEK A generic API primitive with no arguments.  It should be defined
              as  an r-value of type YYCTYPE that is equal to the character at
              the current input position. The definition can be  either  func-
              tion-like   or   free-form  depending  on  the  API  style  (see
              re2c:api:style).

       YYSKIP A generic API primitive with no arguments.   YYSKIP  should  ad-
              vance  the  current input position by one character. The defini-
              tion can be either function-like or free-form depending  on  the
              API style (see re2c:api:style).

       YYBACKUP
              A generic API primitive with no arguments.  YYBACKUP should save
              the current input position, which is later restored  with  YYRE-
              STORE.    The  definition  should  be  either  function-like  or
              free-form depending on the API style (see re2c:api:style).

       YYRESTORE
              A generic API primitive with no arguments.  YYRESTORE should re-
              store the current input position to the value saved by YYBACKUP.
              The definition should be either function-like or  free-form  de-
              pending on the API style (see re2c:api:style).
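
              The roles of YYPEEK, YYSKIP, YYBACKUP and YYRESTORE can be
              seen in a hand-written matcher for the two rules "a" and
              "abc". This is a sketch of the technique, not code generated
              by re2c; the Lexer type and the function longest_match are
              hypothetical, and the input is assumed to end with a zero
              sentinel.

```rust
// Hypothetical lexer state; field and method names are illustrative.
struct Lexer<'a> {
    input: &'a [u8], // assumed to end with a 0 sentinel
    cursor: usize,   // current input position
    marker: usize,   // position saved by backup()
}

impl Lexer<'_> {
    fn peek(&self) -> u8 { self.input[self.cursor] }     // YYPEEK
    fn skip(&mut self) { self.cursor += 1; }             // YYSKIP
    fn backup(&mut self) { self.marker = self.cursor; }  // YYBACKUP
    fn restore(&mut self) { self.cursor = self.marker; } // YYRESTORE
}

// Matches the rules "a" and "abc" at the current position and returns the
// length of the longest match, or 0 if neither rule matches.
fn longest_match(lx: &mut Lexer<'_>) -> usize {
    let start = lx.cursor;
    if lx.peek() != b'a' {
        return 0;
    }
    lx.skip();
    lx.backup(); // "a" has matched: remember the position after it
    if lx.peek() == b'b' {
        lx.skip();
        if lx.peek() == b'c' {
            lx.skip();
            return lx.cursor - start; // "abc" matched
        }
    }
    lx.restore(); // the longer match failed: roll back to the end of "a"
    lx.cursor - start
}
```

              On the input "abx" the matcher first advances past "ab", then
              restores the saved position and reports a match of length 1.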

       YYBACKUPCTX
              A generic API primitive with zero arguments.  YYBACKUPCTX should
              save the current input position as the position of the  trailing
              context,  which  is later restored by YYRESTORECTX.  The defini-
              tion should be either function-like or  free-form  depending  on
              the API style (see re2c:api:style).

       YYRESTORECTX
              A  generic API primitive with no arguments.  YYRESTORECTX should
              restore the trailing context position  saved  with  YYBACKUPCTX.
              The  definition  should be either function-like or free-form de-
              pending on the API style (see re2c:api:style).

       YYRESTORETAG
              A generic API primitive with  one  argument  tag.   YYRESTORETAG
              should  restore  the  trailing  context position to the value of
              tag.  The definition should be either function-like or free-form
              depending on the API style (see re2c:api:style).

       YYSTAGP
              A  generic API primitive with one argument tag, where tag can be
              a pointer or an offset (see submatch extraction section for  de-
              tails).   YYSTAGP  should set tag to the current input position.
              The definition should be either function-like or  free-form  de-
              pending on the API style (see re2c:api:style).

       YYSTAGN
              A generic API primitive with one argument tag, where tag can be
              a pointer or an offset (see submatch extraction section for
              details). YYSTAGN should set tag to a value that represents a
              non-existent input position. The definition should be either
              function-like or free-form depending on the API style (see
              re2c:api:style).
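
              A small sketch of YYSTAGP and YYSTAGN with tags stored as
              offsets. Using usize::MAX as the value for a non-existent
              position is an assumption of this example, as are the names
              NONE, stagp, stagn and tag_after_a.

```rust
// Assumption: tags are offsets and usize::MAX means "no position".
const NONE: usize = usize::MAX;

fn stagp(tag: &mut usize, cursor: usize) { *tag = cursor; } // YYSTAGP
fn stagn(tag: &mut usize) { *tag = NONE; }                  // YYSTAGN

// Hand-written sketch for a pattern like "a" followed by an optional
// tagged "b". The leading "a" is assumed to be present. Returns the tag.
fn tag_after_a(input: &[u8]) -> usize {
    let mut tag = NONE;
    let cursor = 1; // position just past the leading "a"
    if input.get(cursor) == Some(&b'b') {
        stagp(&mut tag, cursor); // the optional part matched here
    } else {
        stagn(&mut tag); // the optional part did not match
    }
    tag
}
```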

       YYMTAGP
              A generic API primitive with one argument tag.   YYMTAGP  should
              append  the current position to the submatch history of tag (see
              the submatch extraction section for details).  The  definition
              should be either function-like or free-form depending on the API
              style (see re2c:api:style).

       YYMTAGN
              A generic API primitive with one argument tag. YYMTAGN should
              append a value that represents a non-existent input position to
              the submatch history of tag (see the submatch extraction
              section for details). The definition can be either
              function-like or free-form depending on the API style (see
              re2c:api:style).
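
              To illustrate the idea of a submatch history, the sketch
              below keeps the history of one m-tag in a plain Vec. This
              representation is a simplifying assumption for the example
              (a real lexer may store histories differently); NONE, mtagp,
              mtagn and field_starts are hypothetical names.

```rust
// Assumption: positions are offsets and usize::MAX means "no position".
const NONE: usize = usize::MAX;

fn mtagp(history: &mut Vec<usize>, cursor: usize) { history.push(cursor); } // YYMTAGP
fn mtagn(history: &mut Vec<usize>) { history.push(NONE); }                  // YYMTAGN

// Hand-written sketch: records the start offset of every comma-separated
// field, the way a repeated tagged subexpression would.
fn field_starts(input: &[u8]) -> Vec<usize> {
    let mut history = Vec::new();
    let mut cursor = 0;
    mtagp(&mut history, cursor); // the first field starts at offset 0
    while cursor < input.len() {
        if input[cursor] == b',' {
            mtagp(&mut history, cursor + 1); // next field starts after ','
        }
        cursor += 1;
    }
    history
}
```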

       YYSHIFT
              A generic API primitive with one argument shift.  YYSHIFT should
              shift the current input position by shift characters (the  shift
              value  may  be  negative).  The  definition  can be either func-
              tion-like  or  free-form  depending  on  the  API   style   (see
              re2c:api:style).

       YYSHIFTSTAG
              A  generic   API  primitive  with  two arguments, tag and shift.
              YYSHIFTSTAG should shift tag  by  shift  characters  (the  shift
              value  may  be  negative).   The  definition can be either func-
              tion-like  or  free-form  depending  on  the  API   style   (see
              re2c:api:style).

       YYSHIFTMTAG
              A  generic  API  primitive  with  two  arguments, tag and shift.
              YYSHIFTMTAG should shift the latest value in the history of  tag
              by shift characters (the shift value may be negative).  The def-
              inition should be either function-like or free-form depending on
              the API style (see re2c:api:style).
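
              The three shift primitives can be sketched as follows, with
              positions stored as offsets. Leaving the "no position" value
              untouched when shifting a tag is an assumption of this
              example, as are the names NONE, shift, shift_stag and
              shift_mtag and the Vec representation of an m-tag history.

```rust
// Assumption: positions are offsets and usize::MAX means "no position".
const NONE: usize = usize::MAX;

// Model of YYSHIFT: move the current position by a signed amount.
fn shift(cursor: &mut usize, by: isize) {
    *cursor = (*cursor as isize + by) as usize;
}

// Model of YYSHIFTSTAG: shift a tag, leaving NONE unchanged (assumption).
fn shift_stag(tag: &mut usize, by: isize) {
    if *tag != NONE {
        *tag = (*tag as isize + by) as usize;
    }
}

// Model of YYSHIFTMTAG: shift only the latest value in the history.
fn shift_mtag(history: &mut Vec<usize>, by: isize) {
    if let Some(last) = history.last_mut() {
        shift_stag(last, by);
    }
}
```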

       YYMAXNMATCH
              An  integral  constant equal to the maximal number of POSIX cap-
              turing  groups  in  a  rule.  It  is  generated  with   /*!maxn-
              match:re2c*/ directive.

       YYCONDTYPE
              The  type  of the condition enum.  It should be generated either
              with the /*!types:re2c*/ directive or the -t  --type-header  op-
              tion.

       YYGETCONDITION
              An  API  primitive with zero arguments.  It should be defined as
              an r-value of type YYCONDTYPE that is equal to the current  con-
              dition identifier. The definition can be either function-like or
              free-form depending on the API  style  (see  re2c:api:style  and
              re2c:define:YYGETCONDITION:naked).

       YYSETCONDITION
              An  API primitive with one argument cond.  The meaning of YYSET-
              CONDITION is to set the current condition  identifier  to  cond.
              The  definition  should be either function-like or free-form de-
              pending on the API style (see re2c:api:style and re2c:define:YY-
              SETCONDITION@cond).
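
              A minimal sketch of condition handling in Rust. The Cond enum
              stands in for YYCONDTYPE and the two methods for
              YYGETCONDITION and YYSETCONDITION; all names, and the
              two-condition comment scanner itself, are assumptions made
              for this example rather than generated code.

```rust
// Stand-in for the generated condition enumeration (YYCONDTYPE).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Cond {
    Init,
    Comment,
}

// Hypothetical lexer that only tracks its current condition.
struct Lexer {
    cond: Cond,
}

impl Lexer {
    fn get_condition(&self) -> Cond { self.cond }              // YYGETCONDITION
    fn set_condition(&mut self, cond: Cond) { self.cond = cond; } // YYSETCONDITION
}

// Consumes one byte and switches condition: '#' starts a comment and a
// newline ends it. Every other byte leaves the condition unchanged.
fn step(lx: &mut Lexer, byte: u8) {
    match lx.get_condition() {
        Cond::Init if byte == b'#' => lx.set_condition(Cond::Comment),
        Cond::Comment if byte == b'\n' => lx.set_condition(Cond::Init),
        _ => {}
    }
}
```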

       YYGETSTATE
              An  API  primitive with zero arguments.  It should be defined as
              an r-value of integer type that is equal to  the  current  lexer
              state. Should be initialized to -1. The definition can be either
              function-like or free-form  depending  on  the  API  style  (see
              re2c:api:style and re2c:define:YYGETSTATE:naked).

       YYSETSTATE
              An API primitive with one argument state.  The meaning of YYSET-
              STATE is to set the current lexer state to state.   The  defini-
              tion  should  be  either function-like or free-form depending on
              the  API  style  (see  re2c:api:style   and   re2c:define:YYSET-
              STATE@state).

       YYDEBUG
              A  debug API primitive with two arguments. It can be used to de-
              bug the generated code (with -d --debug-output option).  YYDEBUG
              should return no value and accept two arguments: state (either a
              DFA state index or -1) and symbol (the current input symbol).

       yych   An l-value of type YYCTYPE that stores the current input charac-
              ter.  User definition is necessary only with -f --storable-state
              option.

       yyaccept
              An l-value of unsigned integral type that stores the  number  of
              the latest matched rule.  User definition is necessary only with
              -f --storable-state option.

       yynmatch
              An l-value of unsigned integral type that stores the  number  of
              POSIX  capturing  groups in the matched rule.  Used only with -P
              --posix-captures option.

       yypmatch
              An array of l-values that are used to hold the tag values corre-
              sponding  to the capturing parentheses in the matching rule. Ar-
              ray length must be at least yynmatch * 2 (usually YYMAXNMATCH  *
              2 is a good choice).  Used only with -P --posix-captures option.
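
              Once a rule has matched, the tag values in yypmatch can be
              turned into slices of the input. The sketch below assumes
              that group i occupies the pair yypmatch[2 * i] (start) and
              yypmatch[2 * i + 1] (end), that the values are offsets, and
              that usize::MAX marks a group that did not participate in
              the match; the helper name group is hypothetical.

```rust
// Returns the text of capturing group `i`, or None if the group does not
// exist in the matched rule or did not participate in the match.
// Assumed layout: start offset at [2 * i], end offset at [2 * i + 1].
fn group<'a>(
    input: &'a [u8],
    yypmatch: &[usize],
    yynmatch: usize,
    i: usize,
) -> Option<&'a [u8]> {
    if i >= yynmatch {
        return None;
    }
    let (start, end) = (yypmatch[2 * i], yypmatch[2 * i + 1]);
    if start == usize::MAX || end == usize::MAX {
        return None; // assumption: usize::MAX means "no position"
    }
    Some(&input[start..end])
}
```

              For the input "key=val" and a rule with two inner groups, an
              array such as [0, 7, 0, 3, 4, 7] with yynmatch equal to 3
              would give "key" for group 1 and "val" for group 2.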

CONFIGURATIONS
       re2c:api, re2c:flags:input
              Same as the --api option.

       re2c:api:sigil
              Specify  the  marker  ("sigil") that is used for argument place-
              holders in the API primitives. The default is @@. A  placeholder
              starts with sigil followed by the argument name in curly braces.
              For example, if sigil is set to $, then placeholders  will  have
              the  form  ${name}. Single-argument APIs may use shorthand nota-
              tion without the name in braces. This option can  be  overridden
              by  options for individual API primitives, e.g.  re2c:define:YY-
              FILL@len for YYFILL.
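
              The placeholder forms described above can be modelled with a
              short Rust function. It is a sketch of the substitution rule
              only, not re2c's implementation; the helper name expand and
              the templates used with it are hypothetical.

```rust
// Replaces the long form <sigil>{<name>} and then the shorthand <sigil>
// with `value`. `expand` is a hypothetical helper for illustration.
fn expand(template: &str, sigil: &str, name: &str, value: &str) -> String {
    let long = format!("{}{{{}}}", sigil, name); // e.g. "@@{len}" or "${len}"
    template
        .replace(&long, value) // long form first, so its sigil is consumed
        .replace(sigil, value) // any remaining sigil is the shorthand form
}
```

              With the default sigil, "fill(@@{len})" and "fill(@@)" both
              expand to the same text; with the sigil set to $, the
              equivalent templates are "fill(${len})" and "fill($)".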

       re2c:api:style
              Specify API style. Possible values are  functions  (the  default
              for  C)  and  free-form (the default for Go and Rust).  In func-
              tions style API primitives are generated with an  argument  list
              in  parentheses  following  the name of the primitive. The argu-
              ments are provided only for autogenerated  parameters  (such  as
              the number of characters passed to YYFILL), but not for the gen-
              eral lexer context, so the primitives behave more like macros in
              C/C++ or closures in Go and Rust.  In free-form style API primi-
              tives do not have a  fixed  form:  they  should  be  defined  as
              strings  containing  free-form  pieces of code with interpolated
              variables of the form @@{var} or @@ (they  correspond  to  argu-
              ments  in function-like style).  This configuration may be over-
              ridden for individual API primitives, see for  example  re2c:de-
              fine:YYFILL:naked configuration for YYFILL.

       re2c:bit-vectors, re2c:flags:bit-vectors, re2c:flags:b
              Same  as  the  --bit-vectors  option,  but  can be configured on
              per-block basis.

       re2c:case-insensitive, re2c:flags:case-insensitive
              Same as the --case-insensitive option, but can be configured  on
              per-block basis.

       re2c:case-inverted, re2c:flags:case-inverted
              Same  as  the  --case-inverted  option, but can be configured on
              per-block basis.

       re2c:case-ranges, re2c:flags:case-ranges
              Same as the --case-ranges  option,  but  can  be  configured  on
              per-block basis.

       re2c:computed-gotos, re2c:flags:computed-gotos, re2c:flags:g
              Same  as  the  --computed-gotos option, but can be configured on
              per-block basis.

       re2c:computed-gotos:threshold, re2c:cgoto:threshold
              If computed goto is used, this configuration specifies the  com-
              plexity  threshold  that  triggers the generation of jump tables
              instead of nested if statements and bitmaps. The  default  value
              is 9.

       re2c:cond:goto
              Specifies  a  piece  of code used for the autogenerated shortcut
              rules :=> in conditions. The default is goto @@;.  The @@ place-
              holder  is  substituted  with condition name (see configurations
              re2c:api:sigil and re2c:cond:goto@cond).

       re2c:cond:goto@cond
              Specifies  the  sigil  used   for   argument   substitution   in
              re2c:cond:goto  definition.  The default value is @@.  Overrides
              the more generic re2c:api:sigil configuration.

       re2c:cond:divider
              Defines the divider for condition blocks.  The default value  is
              /*  ***********************************  */.   Placeholders  are
              substituted with condition name (see re2c:api:sigil and
              re2c:cond:divider@cond).

       re2c:cond:divider@cond
              Specifies   the   sigil   used   for  argument  substitution  in
              re2c:cond:divider definition. The default is @@.  Overrides  the
              more generic re2c:api:sigil configuration.

       re2c:cond:prefix, re2c:condprefix
              Specifies  the prefix used for condition labels.  The default is
              yyc_.

       re2c:cond:enumprefix, re2c:condenumprefix
              Specifies the prefix used for condition  identifiers.   The  de-
              fault is yyc.

       re2c:debug-output, re2c:flags:debug-output, re2c:flags:d
              Same  as  the  --debug-output  option,  but can be configured on
              per-block basis.

       re2c:define:YYBACKUP
              Defines generic API primitive YYBACKUP (see the  API  primitives
              section).

       re2c:define:YYBACKUPCTX
              Defines  generic  API  primitive YYBACKUPCTX (see the API primi-
              tives section).

       re2c:define:YYCONDTYPE
              Defines YYCONDTYPE (see the API primitives section).

       re2c:define:YYCTYPE
              Defines YYCTYPE (see the API primitives section).

       re2c:define:YYCTXMARKER
              Defines API primitive YYCTXMARKER (see the API  primitives  sec-
              tion).

       re2c:define:YYCURSOR
              Defines API primitive YYCURSOR (see the API primitives section).

       re2c:define:YYDEBUG
              Defines API primitive YYDEBUG (see the API primitives section).

       re2c:define:YYFILL
              Defines API primitive YYFILL (see the API primitives section).

       re2c:define:YYFILL@len
              Specifies  the  sigil  used  for argument substitution in YYFILL
              definition.  Defaults  to  @@.   Overrides  the   more   generic
              re2c:api:sigil configuration.

       re2c:define:YYFILL:naked
              Overrides  the more generic re2c:api:style configuration for YY-
              FILL.  Zero value corresponds to free-form API style.

       re2c:define:YYGETCONDITION
              Defines API primitive YYGETCONDITION  (see  the  API  primitives
              section).

       re2c:define:YYGETCONDITION:naked
              Overrides  the  more  generic  re2c:api:style  configuration for
              YYGETCONDITION. Zero value corresponds to free-form API style.

       re2c:define:YYGETSTATE
              Defines API primitive YYGETSTATE (see the  API  primitives  sec-
              tion).

       re2c:define:YYGETSTATE:naked
              Overrides  the  more  generic  re2c:api:style  configuration for
              YYGETSTATE. Zero value corresponds to free-form API style.

       re2c:define:YYLESSTHAN
              Defines generic API primitive YYLESSTHAN (see the API primitives
              section).

       re2c:define:YYLIMIT
              Defines API primitive YYLIMIT (see the API primitives section).

       re2c:define:YYMARKER
              Defines API primitive YYMARKER (see the API primitives section).

       re2c:define:YYMTAGN
              Defines  generic  API  primitive YYMTAGN (see the API primitives
              section).

       re2c:define:YYMTAGP
              Defines generic API primitive YYMTAGP (see  the  API  primitives
              section).

       re2c:define:YYPEEK
              Defines  generic  API  primitive  YYPEEK (see the API primitives
              section).

       re2c:define:YYRESTORE
              Defines generic API primitive YYRESTORE (see the API  primitives
              section).

       re2c:define:YYRESTORECTX
              Defines  generic  API primitive YYRESTORECTX (see the API primi-
              tives section).

       re2c:define:YYRESTORETAG
              Defines generic API primitive YYRESTORETAG (see the  API  primi-
              tives section).

       re2c:define:YYSETCONDITION
              Defines  API  primitive  YYSETCONDITION  (see the API primitives
              section).

       re2c:define:YYSETCONDITION@cond
              Specifies the sigil used for argument substitution in  YYSETCON-
              DITION  definition. The default value is @@.  Overrides the more
              generic re2c:api:sigil configuration.

       re2c:define:YYSETCONDITION:naked
              Overrides the more generic re2c:api:style configuration for  YY-
              SETCONDITION. Zero value corresponds to free-form API style.

       re2c:define:YYSETSTATE
              Defines  API  primitive  YYSETSTATE (see the API primitives sec-
              tion).

       re2c:define:YYSETSTATE@state
              Specifies the sigil used for argument substitution in YYSETSTATE
              definition. The default value is @@.  Overrides the more generic
              re2c:api:sigil configuration.

       re2c:define:YYSETSTATE:naked
              Overrides the more generic re2c:api:style configuration for  YY-
              SETSTATE. Zero value corresponds to free-form API style.

       re2c:define:YYSKIP
              Defines  generic  API  primitive  YYSKIP (see the API primitives
              section).

       re2c:define:YYSHIFT
              Defines generic API primitive YYSHIFT (see  the  API  primitives
              section).

       re2c:define:YYSHIFTMTAG
              Defines  generic  API  primitive YYSHIFTMTAG (see the API primi-
              tives section).

       re2c:define:YYSHIFTSTAG
              Defines generic API primitive YYSHIFTSTAG (see  the  API  primi-
              tives section).

       re2c:define:YYSTAGN
              Defines  generic  API  primitive YYSTAGN (see the API primitives
              section).

       re2c:define:YYSTAGP
              Defines generic API primitive YYSTAGP (see  the  API  primitives
              section).

       re2c:empty-class, re2c:flags:empty-class
              Same  as  the  --empty-class  option,  but  can be configured on
              per-block basis.

       re2c:encoding:ebcdic, re2c:flags:ecb, re2c:flags:e
              Same as the --ebcdic option, but can be configured on  per-block
              basis.

       re2c:encoding:ucs2, re2c:flags:wide-chars, re2c:flags:w
              Same  as  the  --ucs2 option, but can be configured on per-block
              basis.

       re2c:encoding:utf8, re2c:flags:utf-8, re2c:flags:8
              Same as the --utf8 option, but can be  configured  on  per-block
              basis.

       re2c:encoding:utf16, re2c:flags:utf-16, re2c:flags:x
              Same  as  the --utf16 option, but can be configured on per-block
              basis.

       re2c:encoding:utf32, re2c:flags:unicode, re2c:flags:u
              Same as the --utf32 option, but can be configured  on  per-block
              basis.

       re2c:encoding-policy, re2c:flags:encoding-policy
              Same  as  the --encoding-policy option, but can be configured on
              per-block basis.

       re2c:eof
              Specifies the sentinel symbol used with the end-of-input rule $.
              The  default  value  is  -1 ($ rule is not used). Other possible
              values include all valid code units. Only  decimal  numbers  are
              recognized.

       re2c:header, re2c:flags:type-header, re2c:flags:t
              Specifies  the name of the generated header file relative to the
              directory of the output file. Same as the --header option except
              that the file path is relative.

       re2c:indent:string
              Specifies the string used for indentation. The default is a sin-
              gle tab character "\t". Indent string should contain  whitespace
              characters only.  To disable indentation entirely, set this con-
              figuration to an empty string.

       re2c:indent:top
              Specifies the minimum amount of indentation to use. The  default
              value  is  zero. The value should be a non-negative integer num-
              ber.

       re2c:label:prefix, re2c:labelprefix
              Specifies the prefix used for DFA state labels. The  default  is
              yy.

       re2c:label:start, re2c:startlabel
              Controls  the  generation  of  a  block start label. The default
              value is zero, which means that the  start  label  is  generated
              only  if  it  is used. An integer value greater than zero forces
              the generation of start label even if it is unused by the lexer.
              A  string  value also forces start label generation and sets the
              label name to the specified string. This  configuration  applies
              only  to  the current block (it is reset to default for the next
              block).

       re2c:label:yyFillLabel
              Specifies the prefix of YYFILL labels used with re2c:eof and  in
              storable state mode.

       re2c:label:yyloop
              Specifies  the  name of the label marking the start of the lexer
              loop with --loop-switch option. The default is yyloop.

       re2c:label:yyNext
              Specifies the name of the optional label that follows YYGETSTATE
              switch  in  storable state mode (enabled with re2c:state:nextla-
              bel). The default is yyNext.

       re2c:lookahead, re2c:flags:lookahead
              Same as inverted --no-lookahead option, but can be configured on
              per-block basis.

       re2c:nested-ifs, re2c:flags:nested-ifs, re2c:flags:s
              Same  as  the  --nested-ifs  option,  but  can  be configured on
              per-block basis.

       re2c:posix-captures, re2c:flags:posix-captures, re2c:flags:P
              Same as the --posix-captures option, but can  be  configured  on
              per-block basis.

       re2c:tags, re2c:flags:tags, re2c:flags:T
              Same  as  the  --tags option, but can be configured on per-block
              basis.

       re2c:tags:expression
              Specifies the expression used for  tag  variables.   By  default
              re2c generates expressions of the form yyt<N>. This might be in-
              convenient, for example if tag variables are defined  as  fields
              in  a struct. All occurrences of @@{tag} or @@ are replaced with
              the actual tag name. For example, re2c:tags:expression = "s.@@";
              results  in  expressions  of  the form s.yyt<N> in the generated
              code.  See also re2c:api:sigil configuration.

       re2c:tags:prefix
              Specifies the prefix for tag variable names. The default is yyt.

       re2c:sentinel
              Specifies the sentinel symbol used for the  end-of-input  checks
              (when  bounds  checks  are disabled with re2c:yyfill:enable = 0;
              and re2c:eof is not set). This  configuration  does  not  affect
              code  generation:  its purpose is to verify that the sentinel is
              not allowed in the middle of a rule, and ensure that  the  lexer
              won't  read past the end of buffer. The default value is -1 (in
              that case re2c assumes that the sentinel is zero, which  is  the
              most common case). Only decimal numbers are recognized.

       re2c:state:abort
              If set to a positive integer value, changes the default case in
              the YYGETSTATE switch so that it aborts the program, and adds an
              explicit -1 case that contains the transition to the start of
              the block.

       re2c:state:nextlabel
              Controls if the YYGETSTATE switch is followed by a yyNext  label
              (the default value is zero, which corresponds to no label).  Al-
              ternatively  one can use re2c:label:start to generate a specific
              start label, or an explicit getstate:re2c directive to  generate
              the YYGETSTATE switch separately from the lexer block.

       re2c:unsafe, re2c:flags:unsafe
              Same as inverted --no-unsafe option, but can  be  configured  on
              per-block basis.  If set to zero, it suppresses  the  generation
              of unsafe wrappers around YYPEEK. The default is non-zero (wrap-
              pers are generated).  This configuration is specific to Rust.

       re2c:variable:yyaccept
              Specifies the name of the yyaccept variable (see the API  primi-
              tives section).

       re2c:variable:yybm
              Specifies the name of the yybm variable (used for bitmaps).

       re2c:variable:yybm:hex, re2c:yybm:hex
              If set to nonzero, bitmaps for the --bit-vectors option are gen-
              erated in hexadecimal format. The default is zero  (bitmaps  are
              in decimal format).

       re2c:variable:yych
              Specifies  the name of the yych variable (see the API primitives
              section).

       re2c:variable:yych:emit, re2c:yych:emit
              If set to zero, yych definition is not generated.   The  default
              is non-zero.

       re2c:variable:yych:conversion, re2c:yych:conversion
              If set to non-zero, re2c automatically generates a conversion to
              YYCTYPE every time yych is read. The default is zero (no
              conversion).

       re2c:variable:yyctable
              Specifies the name of the yyctable variable (the jump table gen-
              erated for YYGETCONDITION switch with --computed-gotos option).

       re2c:variable:yytarget
              Specifies the name of the yytarget variable.

       re2c:variable:yystable
              Deprecated.

       re2c:variable:yystate
              Specifies the name  of  the  yystate  variable  (used  with  the
              --loop-switch option to store the current DFA state).

       re2c:yyfill:check
              If  set  to  zero, suppresses the generation of pre-YYFILL check
              for the number of input characters (the YYLESSTHAN definition in
              generic  API and the YYLIMIT-based comparison in C pointer API).
              The default is non-zero (generate the check).

       re2c:yyfill:enable
              If set to zero, suppresses the generation  of  YYFILL  (together
              with  the  check). This should be used when the whole input fits
              into one piece of memory (there is no need  for  buffering)  and
              the  end-of-input  checks do not rely on the YYFILL checks (e.g.
              if a sentinel character is used).  Use warnings (-W option)  and
              re2c:sentinel  configuration  to verify that the generated lexer
              cannot read past the end of input.  The default is non-zero (YY-
              FILL is enabled).

       re2c:yyfill:parameter
              If set to zero, suppresses the generation of parameter passed to
              YYFILL.  The parameter is the minimum number of characters  that
              must be supplied.  Defaults to non-zero (the parameter is gener-
              ated).  This  configuration  can  be  overridden  with  re2c:de-
              fine:YYFILL:naked or re2c:api:style.

REGULAR EXPRESSIONS
       re2c uses the following syntax for regular expressions:

       • "foo" case-sensitive string literal

       • 'foo' case-insensitive string literal

       • [a-xyz], [^a-xyz] character class (possibly negated)

       • . any character except newline

       • R \ S difference of character classes R and S

       • R* zero or more occurrences of R

       • R+ one or more occurrences of R

       • R? optional R

       • R{n} repetition of R exactly n times

       • R{n,} repetition of R at least n times

       • R{n,m} repetition of R from n to m times

       • (R)  just  R;  parentheses  are  used  to  override precedence or for
         POSIX-style submatch

       • R S concatenation: R followed by S

       • R | S alternative: R or S

       • R / S lookahead: R followed by S, but S is not consumed

       • name the regular expression defined as name (or literal string "name"
         in Flex compatibility mode)

       • {name}  the  regular expression defined as name in Flex compatibility
         mode

       • @stag an s-tag: saves the last input position at which @stag  matches
         in a variable named stag

       • #mtag an m-tag: saves all input positions at which #mtag matches in a
         variable named mtag

       Character classes and string literals may contain the following  escape
       sequences: \a, \b, \f, \n, \r, \t, \v, \\, octal escapes \ooo and hexa-
       decimal escapes \xhh, \uhhhh and \Uhhhhhhhh.
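
       As a brief illustration of the syntax above, the fragment below defines
       a few named regular expressions. The names (ident, consonant, hex4,
       keyword, comment) are hypothetical and chosen only for this sketch; it
       shows a character class followed by repetition, a class difference, a
       literal with exact repetition, case-insensitive alternatives and a
       negated class.

```rust
/*!re2c
    ident     = [a-zA-Z_][a-zA-Z0-9_]*;
    consonant = [a-z] \ [aeiou];
    hex4      = "0x" [0-9a-fA-F]{4};
    keyword   = 'select' | 'from';
    comment   = "//" [^\n]*;
*/
```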

HANDLING THE END OF INPUT
       One of the main problems for the lexer is to know when to stop.   There
       are a few terminating conditions:

       • the  lexer may match some rule (including default rule *) and come to
         a final state

       • the lexer may fail to match any rule and come to a default state

       • the lexer may reach the end of input

       The first two conditions terminate the lexer in  a  "natural"  way:  it
       comes  to  a state with no outgoing transitions, and the matching auto-
       matically stops. The third condition, end of input,  is  different:  it
       may  happen  in  any  state, and the lexer should be able to handle it.
       Checking for the end of input interrupts the normal lexer workflow  and
       adds  conditional  branches  to  the generated program, therefore it is
       necessary to minimize the number of such checks. re2c  supports  a  few
       different  methods  for handling the end of input. Which one to use de-
       pends on the complexity of regular expressions, the need for buffering,
       performance  considerations  and other factors. Here is a list of meth-
       ods:

       • Sentinel.  This method eliminates the  need  for  the  end  of  input
         checks  altogether.  It  is  simple and efficient, but limited to the
         case when there is a natural "sentinel" character that can never  oc-
         cur  in valid input. This character may still occur in invalid input,
         but it should not be allowed by the regular expressions, except  per-
         haps as the last character of a rule. The sentinel is appended at the
         end of input and serves as a stop signal: when the lexer  reads  this
         character,  it  is either a syntax error or the end of input. In both
         cases the lexer should stop. This method is used if  YYFILL  is  dis-
         abled with re2c:yyfill:enable = 0; and re2c:eof has the default value
         -1.

       • Sentinel with bounds checks.  This method is generic: it  allows  one
         to  handle any input without restrictions on the regular expressions.
         The idea is to reduce the number of end of input checks by performing
         them  only  on  certain characters. Similar to the "sentinel" method,
         one of the characters is chosen as a "sentinel" and appended  at  the
         end  of input. However, there is no restriction on where the sentinel
         may occur (in fact, any character can  be  chosen  for  a  sentinel).
         When  the  lexer  reads  this  character,  it additionally performs a
         bounds check.  If the current position is within  bounds,  the  lexer
         resumes  matching  and  handles  the sentinel as a regular character.
         Otherwise it invokes YYFILL (unless it is disabled). If more input is
         supplied,  the  lexer will rematch the last character and continue as
         if the sentinel wasn't there. Otherwise it must be the  real  end  of
         input,  and  the  lexer  stops. This method is used when re2c:eof has
         non-negative value (it should be set to the numeric value of the sen-
         tinel). YYFILL is optional.

       • Bounds  checks  with  padding.  This method is generic, and it may be
         faster than the "sentinel with bounds checks" method, but it is  also
         more  complex. The idea is to partition DFA states into strongly con-
         nected components (SCCs) and generate a  single  check  per  SCC  for
         enough  characters to cover the longest non-looping path in this SCC.
         This reduces the number of checks, but there is a problem with  short
         lexemes  at the end of input, as the check requires enough characters
         to cover the longest lexeme. This can be fixed by padding  the  input
         with a few fake characters that do not form a valid lexeme suffix (so
         that the lexer cannot match them). The length of  padding  should  be
         YYMAXFILL,  generated  with /*!max:re2c*/. If there is not enough in-
         put, the lexer invokes YYFILL which should supply at  least  the  re-
         quired  number  of  characters or not return.  This method is used if
         YYFILL is enabled and re2c:eof is -1 (this is the default  configura-
         tion).

       • Custom  checks.   Generic API allows one to override basic operations
         like reading a character, which makes  it  possible  to  include  the
         end-of-input  checks  as  part of them.  This approach is error-prone
         and should be used with caution.  To  use  a  custom  method,  enable
         generic  API  with --api custom or re2c:api = custom; and disable de-
         fault bounds checks with re2c:yyfill:enable = 0; or re2c:yyfill:check
         = 0;.
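
       The way the first three methods are selected by configuration can be
       summarized in a small sketch. This is not part of re2c: the function
       eof_method and the enum EofMethod are hypothetical names, and the
       arguments merely mirror the values of re2c:yyfill:enable and re2c:eof.
       Custom checks are not listed, because they are a matter of how the
       generic API primitives are defined rather than of these two settings.

```rust
#[derive(Debug, PartialEq)]
enum EofMethod {
    Sentinel,
    SentinelWithBoundsChecks,
    BoundsChecksWithPadding,
}

// `yyfill_enable` mirrors re2c:yyfill:enable, `eof` mirrors re2c:eof
// (-1 means that the end-of-input rule `$` is not used).
fn eof_method(yyfill_enable: bool, eof: i32) -> EofMethod {
    if eof >= 0 {
        // re2c:eof is set: YYFILL is optional.
        EofMethod::SentinelWithBoundsChecks
    } else if yyfill_enable {
        // The default configuration.
        EofMethod::BoundsChecksWithPadding
    } else {
        // re2c:yyfill:enable = 0; and re2c:eof has the default value.
        EofMethod::Sentinel
    }
}
```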

       The following subsections contain an example of each method.

   Sentinel
       This  example uses a sentinel character to handle the end of input. The
       program counts space-separated words in a null-terminated  string.  The
       sentinel is null: it is the last character of each input string, and it
       is not allowed in the middle of a lexeme by any of the rules  (in  par-
       ticular,  it  is  not  included in character ranges where it is easy to
       overlook). If a null occurs in the middle of a string, it is  a  syntax
       error  and  the lexer will match default rule *, but it won't read past
       the end of  input  or  crash  (use  -Wsentinel-in-midrule  warning  and
       re2c:sentinel  configuration  to  verify  this). Configuration re2c:yy-
       fill:enable = 0; suppresses the generation of bounds checks and  YYFILL
       invocations.

          // re2rust $INPUT -o $OUTPUT

          // Expect a null-terminated string.
          fn lex(s: &[u8]) -> isize {
              let mut cur = 0;
              let mut count = 0;

              'lex: loop {/*!re2c
                  re2c:define:YYCTYPE = u8;
                  re2c:define:YYPEEK  = "*s.get_unchecked(cur)";
                  re2c:define:YYSKIP  = "cur += 1;";
                  re2c:yyfill:enable = 0;

                  *      { return -1; }
                  [\x00] { return count; }
                  [a-z]+ { count += 1; continue 'lex; }
                  [ ]+   { continue 'lex; }
              */}
          }

          fn main() {
              assert_eq!(lex(b"\x00"), 0);
              assert_eq!(lex(b"one two three\x00"), 3);
              assert_eq!(lex(b"f0ur\x00"), -1);
          }
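
       For comparison, a hand-written loop can make the role of the sentinel
       explicit. The sketch below is not re2c output (re2c generates a DFA
       instead) and count_words is a hypothetical name. It assumes the same
       null-terminated input and uses safe indexing rather than get_unchecked,
       so a missing sentinel leads to a panic instead of an out-of-bounds
       read.

```rust
// Hand-written illustration of the sentinel method.
// Assumes that `s` ends with a null byte.
fn count_words(s: &[u8]) -> isize {
    let mut cur = 0;
    let mut count = 0;
    loop {
        match s[cur] {
            // The sentinel stops the lexer.
            0 => return count,
            b'a'..=b'z' => {
                // No bounds check here: the sentinel is not in [a-z],
                // so reading it ends the inner loop.
                while s[cur].is_ascii_lowercase() { cur += 1; }
                count += 1;
            }
            b' ' => {
                while s[cur] == b' ' { cur += 1; }
            }
            // Anything else is a syntax error (the default rule `*`).
            _ => return -1,
        }
    }
}
```

       Called with the inputs from the example above, count_words returns the
       same results as the generated lexer.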

   Sentinel with bounds checks
       This  example uses sentinel with bounds checks to handle the end of in-
       put (this  method  was  added  in  version  1.2).  The  program  counts
       space-separated  single-quoted strings. The sentinel character is null,
       which is specified with re2c:eof = 0; configuration. As in the sentinel
       method,  null is the last character of each input string, but it is al-
       lowed in the middle of a rule (for example, 'aaa\0aa'\0 is valid input,
       but  'aaa\0  is  a  syntax error).  Bounds checks are generated in each
       state that matches an input character,  but  they  are  scoped  to  the
       branch that handles null. Bounds checks are of the form
       YYLIMIT <= YYCURSOR or YYLESSTHAN(1) with generic API. If the check
       condition is true, the lexer has reached the end of input and should
       stop (YYFILL is disabled with re2c:yyfill:enable = 0; as the input
       fits into one buffer, see the YYFILL with sentinel section for an
       example that uses YYFILL). Reaching the end of input opens three
       possibilities: if the lexer is in the initial state it will match the
       end-of-input rule $, otherwise it may fall back to a previously
       matched rule (including default rule *) or go to a default state,
       causing -Wundefined-control-flow.

          // re2rust $INPUT -o $OUTPUT

          // Expect a null-terminated string.
          fn lex(s: &[u8]) -> isize {
              let (mut cur, mut mar) = (0, 0);
              let lim = s.len() - 1; // null-terminator not included
              let mut count = 0;

              'lex: loop {/*!re2c
                  re2c:define:YYCTYPE    = u8;
                  re2c:define:YYPEEK     = "*s.get_unchecked(cur)";
                  re2c:define:YYSKIP     = "cur += 1;";
                  re2c:define:YYBACKUP   = "mar = cur;";
                  re2c:define:YYRESTORE  = "cur = mar;";
                  re2c:define:YYLESSTHAN = "cur >= lim";
                  re2c:yyfill:enable = 0;
                  re2c:eof = 0;

                  str = ['] ([^'\\] | [\\][^])* ['];

                  *    { return -1; }
                  $    { return count; }
                  str  { count += 1; continue 'lex; }
                  [ ]+ { continue 'lex; }
              */}
          }

          fn main() {
              assert_eq!(lex(b"\0"), 0);
              assert_eq!(lex(b"'qu\0tes' 'are' 'fine: \\'' \0"), 3);
              assert_eq!(lex(b"'unterminated\\'\0"), -1);
          }
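
       The idea of checking bounds only when the sentinel is read can also be
       sketched by hand. The code below is an illustration under the same
       assumptions as the example (null-terminated, non-empty input) and is
       not what re2c generates; count_quoted is a hypothetical name.

```rust
// Hand-written illustration of "sentinel with bounds checks".
// Assumes that `s` is non-empty and ends with a null byte.
fn count_quoted(s: &[u8]) -> isize {
    let lim = s.len() - 1; // offset of the terminating null
    let mut cur = 0;
    let mut count = 0;
    loop {
        match s[cur] {
            // Bounds check only on the sentinel: real end of input.
            0 if cur >= lim => return count,
            b' ' => cur += 1,
            b'\'' => {
                cur += 1;
                loop {
                    match s[cur] {
                        // End of input inside a string: unterminated.
                        0 if cur >= lim => return -1,
                        b'\'' => { cur += 1; break; }
                        b'\\' => {
                            cur += 1;
                            if s[cur] == 0 && cur >= lim { return -1; }
                            cur += 1; // skip the escaped character
                        }
                        // Any other byte, including a null in the middle.
                        _ => cur += 1,
                    }
                }
                count += 1;
            }
            // Syntax error (the default rule `*`).
            _ => return -1,
        }
    }
}
```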

   Bounds checks with padding
       This example uses bounds checks with padding to handle the end of input
       (this method is enabled by default). The program counts space-separated
       single-quoted strings. There is a padding of YYMAXFILL null  characters
       appended  at  the  end of input, where YYMAXFILL value is autogenerated
       with /*!max:re2c*/. It is not necessary to use null for padding: any
       characters can be used as long as they do not form a valid lexeme
       suffix (in this example padding should not contain single quotes, as they
       may  be  mistaken  for  a suffix of a single-quoted string). There is a
       "stop" rule that matches the first padding character (null) and  termi-
       nates  the  lexer  (note  that it checks if null is at the beginning of
       padding, otherwise it is a syntax error). Bounds checks  are  generated
       only  in some states that are determined by the strongly connected com-
       ponents of the underlying automaton. Checks have the  form  (YYLIMIT  -
       YYCURSOR) < n or YYLESSTHAN(n) with generic API, where n is the minimum
       number of characters that are needed for the lexer to proceed (it  also
       means  that  the next bounds check will occur in at most n characters).
       If the check condition is true, the lexer has reached the end of  input
       and  will  invoke  YYFILL(n) that should either supply at least n input
       characters or not return. In this example YYFILL always fails and  ter-
       minates  the  lexer with an error (which is fine because the input fits
       into one buffer). See the YYFILL with padding section  for  an  example
       that refills the input buffer with YYFILL.

          // re2rust $INPUT -o $OUTPUT

          /*!max:re2c*/

          fn lex(s: &[u8]) -> isize {
              let mut count = 0;
              let mut cur = 0;
              let lim = s.len() + YYMAXFILL;

              // Copy string to a buffer and add YYMAXFILL zero padding.
              let mut buf = Vec::with_capacity(lim);
              buf.extend(s.iter());
              buf.extend(vec![0; YYMAXFILL]);

              'lex: loop {/*!re2c
                  re2c:define:YYCTYPE    = u8;
                  re2c:define:YYPEEK     = "*buf.get_unchecked(cur)";
                  re2c:define:YYSKIP     = "cur += 1;";
                  re2c:define:YYFILL     = "return -1;";
                  re2c:define:YYLESSTHAN = "cur + @@ > lim";

                  str = ['] ([^'\\] | [\\][^])* ['];

                  [\x00] {
                      // Check that it is the sentinel, not some unexpected null.
                      return if cur == s.len() + 1 { count } else { -1 }
                  }
                  str  { count += 1; continue 'lex; }
                  [ ]+ { continue 'lex; }
                  *    { return -1; }
              */}
          }

          fn main() {
              assert_eq!(lex(b""), 0);
              assert_eq!(lex(b"'qu\0tes' 'are' 'fine: \\'' "), 3);
              assert_eq!(lex(b"'unterminated\\'"), -1);
              assert_eq!(lex(b"'unexpected \0 null"), -1);
          }
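
       The padding step and the bounds check can be looked at in isolation.
       In the sketch below YYMAXFILL is given an assumed value of 3 purely
       for illustration (in a real lexer it is generated by /*!max:re2c*/),
       and pad and less_than are hypothetical helper names; less_than uses
       the same condition as the YYLESSTHAN definition in the example.

```rust
// Assumed value for illustration; normally generated by /*!max:re2c*/.
const YYMAXFILL: usize = 3;

// Copy the input and append YYMAXFILL null bytes of padding.
fn pad(input: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(input.len() + YYMAXFILL);
    buf.extend_from_slice(input);
    buf.resize(input.len() + YYMAXFILL, 0);
    buf
}

// Same condition as YYLESSTHAN = "cur + @@ > lim", with n in place of @@:
// true if fewer than n characters are available starting at `cur`.
fn less_than(cur: usize, lim: usize, n: usize) -> bool {
    cur + n > lim
}
```

       With a two-byte input the padded buffer has five bytes, so a check for
       three characters succeeds at offsets 0 to 2 and triggers YYFILL from
       offset 3 onwards.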

   Custom checks
       This  example  uses  a  custom  end-of-input  handling  method based on
       generic API.  The program counts space-separated single-quoted strings.
       It  is the same as the sentinel with bounds checks example, except that
       the input is not null-terminated (this method can be used if padding is
       not an option, not even a single character). To compensate for the
       absence of a sentinel character at the end of input, YYPEEK is
       redefined to perform a bounds check before it reads the next input
       character. This is inefficient because checks are done very often. If
       the current position is within bounds, YYPEEK returns the real
       character, otherwise it returns a fake sentinel character.

          // re2rust $INPUT -o $OUTPUT

          // Expect a string without terminating null.
          fn lex(s: &[u8]) -> isize {
              let (mut cur, mut mar) = (0, 0);
              let lim = s.len();
              let mut count = 0;

              'lex: loop {/*!re2c
                  re2c:define:YYCTYPE    = u8;
                  re2c:define:YYPEEK     = "if cur < lim {*s.get_unchecked(cur)} else {0}";
                  re2c:define:YYSKIP     = "cur += 1;";
                  re2c:define:YYBACKUP   = "mar = cur;";
                  re2c:define:YYRESTORE  = "cur = mar;";
                  re2c:define:YYLESSTHAN = "cur >= lim";
                  re2c:yyfill:enable = 0;
                  re2c:eof = 0;

                  str = ['] ([^'\\] | [\\][^])* ['];

                  *    { return -1; }
                  $    { return count; }
                  str  { count += 1; continue 'lex; }
                  [ ]+ { continue 'lex; }
              */}
          }

          fn main() {
              assert_eq!(lex(b""), 0);
              assert_eq!(lex(b"'qu\0tes' 'are' 'fine: \\'' "), 3);
              assert_eq!(lex(b"'unterminated\\'"), -1);
          }
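
       The bounds-checked read can also be written with safe slice access.
       The helper below is a sketch only: peek is a hypothetical name, and
       the function is shown on its own rather than as a drop-in YYPEEK
       definition.

```rust
// Safe counterpart of the bounds-checked YYPEEK: `s.get(cur)` performs the
// bounds check, and 0 stands in for the missing sentinel.
fn peek(s: &[u8], cur: usize) -> u8 {
    s.get(cur).copied().unwrap_or(0)
}
```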

BUFFER REFILLING
       The need for buffering arises when the input cannot be mapped in memory
       all at once: either it is too large, or it comes in a streaming fashion
       (like reading from a socket). The usual technique in such cases  is  to
       allocate  a  fixed-sized memory buffer and process input in chunks that
       fit into the buffer. When the current chunk is processed, it  is  moved
       out  and new data is moved in. In practice it is somewhat more complex,
       because lexer state consists not of a single input position, but a  set
       of interrelated positions:

       • cursor:  the  next  input character to be read (YYCURSOR in C pointer
         API or YYSKIP/YYPEEK in generic API)

       • limit: the position after the last available input character (YYLIMIT
         in C pointer API, implicitly handled by YYLESSTHAN in generic API)

       • marker:  the  position  of the most recent match, if any (YYMARKER in
         C pointer API or YYBACKUP/YYRESTORE in generic API)

       • token: the start of the current lexeme (implicit in re2c API,  as  it
         is  not  needed for the normal lexer operation and can be defined and
         updated by the user)

       • context marker: the position of the trailing context (YYCTXMARKER  in
         C pointer API or YYBACKUPCTX/YYRESTORECTX in generic API)

       • tag  variables:  submatch positions (defined with /*!stags:re2c*/ and
         /*!mtags:re2c*/  directives  and  YYSTAGP/YYSTAGN/YYMTAGP/YYMTAGN  in
         generic API)

       Not all these are used in every case, but if used, they must be updated
       by YYFILL. All active positions are contained in  the  segment  between
       token  and  cursor, therefore everything between buffer start and token
       can be discarded, the segment from token up to limit should be moved
       to the beginning of buffer, and the free space at the end of buffer
       should be filled with new data.  In order to avoid frequent YYFILL
       calls  it is best to fill in as many input characters as possible (even
       though fewer characters might suffice to resume the lexer). The details
       of  YYFILL implementation are slightly different depending on which EOF
       handling method is used: the case of EOF rule is somewhat simpler  than
       the  case  of  bounds-checking  with  padding.  Also  note  that  if -f
       --storable-state option is used, YYFILL has slightly  different  seman-
       tics (described in the section about storable state).
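
       The shifting step described above can be sketched independently of any
       input source. The struct Positions and the function shift are
       hypothetical names used only for this illustration; positions are
       offsets into the buffer, and reading new data into the freed space is
       left out.

```rust
// Hypothetical container for the positions described above.
struct Positions { tok: usize, mar: usize, cur: usize, lim: usize }

// Discard everything before `tok`, move `tok..lim` to the buffer start and
// return the amount of free space now available at the end of the buffer.
fn shift(buf: &mut [u8], p: &mut Positions) -> usize {
    let n = p.tok;
    buf.copy_within(p.tok..p.lim, 0);
    p.lim -= n;
    p.cur -= n;
    // The marker may be unused (and thus stale), so avoid a panic on underflow.
    p.mar = p.mar.wrapping_sub(n);
    p.tok = 0;
    buf.len() - p.lim
}
```

       For a ten-byte buffer with the token starting at offset 4 and the
       limit at 10, the call moves six bytes to the front and reports four
       free bytes at the end.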

   YYFILL with sentinel
       If  EOF  rule is used, YYFILL is a function-like primitive that accepts
       no arguments and returns a value which is checked against zero.  YYFILL
       invocation  is  triggered by condition YYLIMIT <= YYCURSOR in C pointer
       API and YYLESSTHAN() in generic API. A non-zero return value means that
       YYFILL  has  failed.  A successful YYFILL call must supply at least one
       character and adjust input positions accordingly. Limit must always  be
       set  to  one after the last input position in buffer, and the character
       at the limit position must be the sentinel symbol specified by re2c:eof
       configuration.  The pictures below show the relative locations of input
       positions in buffer before and after YYFILL call  (sentinel  symbol  is
       marked  with #, and the second picture shows the case when there is not
       enough input to fill the whole buffer).

                         <-- shift -->
                       >-A------------B---------C-------------D#-----------E->
                       buffer       token    marker         limit,
                                                            cursor
          >-A------------B---------C-------------D------------E#->
                       buffer,  marker        cursor        limit
                       token

                         <-- shift -->
                       >-A------------B---------C-------------D#--E (EOF)
                       buffer       token    marker         limit,
                                                            cursor
          >-A------------B---------C-------------D---E#........
                       buffer,  marker       cursor limit
                       token

       Here is an example of a program that reads an input file (named input
       in the code) through a 4096-byte buffer and uses the EOF rule.

          // re2rust $INPUT -o $OUTPUT

          use std::fs::File;
          use std::io::{Read, Write};

          const BUFSIZE: usize = 4096;

          struct State {
              file: File,
              buf: [u8; BUFSIZE],
              lim: usize,
              cur: usize,
              mar: usize,
              tok: usize,
              eof: bool,
          }

          #[derive(PartialEq)]
          enum Fill { Ok, Eof, LongLexeme }

          fn fill(st: &mut State) -> Fill {
              if st.eof { return Fill::Eof; }

              // Error: lexeme too long. In real life could reallocate a larger buffer.
              if st.tok < 1 { return Fill::LongLexeme; }

              // Shift buffer contents (discard everything up to the current token).
              st.buf.copy_within(st.tok..st.lim, 0);
              st.lim -= st.tok;
              st.cur -= st.tok;
              st.mar = st.mar.overflowing_sub(st.tok).0; // may underflow if marker is unused
              st.tok = 0;

              // Fill free space at the end of buffer with new data from file.
              match st.file.read(&mut st.buf[st.lim..BUFSIZE - 1]) { // -1 for sentinel
                  Ok(n) => {
                      st.lim += n;
                      st.eof = n == 0; // end of file
                      st.buf[st.lim] = 0; // append sentinel
                  }
                  Err(why) => panic!("cannot read from file: {}", why)
              }

              return Fill::Ok;
          }

          fn lex(st: &mut State) -> isize {
              let mut count: isize = 0;

              'lex: loop {
                  st.tok = st.cur;
              /*!re2c
                  re2c:define:YYCTYPE    = u8;
                  re2c:define:YYPEEK     = "*st.buf.get_unchecked(st.cur)";
                  re2c:define:YYSKIP     = "st.cur += 1;";
                  re2c:define:YYBACKUP   = "st.mar = st.cur;";
                  re2c:define:YYRESTORE  = "st.cur = st.mar;";
                  re2c:define:YYLESSTHAN = "st.cur >= st.lim";
                  re2c:define:YYFILL     = "fill(st) == Fill::Ok";
                  re2c:eof = 0;

                  str = ['] ([^'\\] | [\\][^])* ['];

                  *    { return -1; }
                  $    { return count; }
                  str  { count += 1; continue 'lex; }
                  [ ]+ { continue 'lex; }
              */}
          }

          fn main() {
              let fname = "input";
              let content = b"'qu\0tes' 'are' 'fine: \\'' ";

              // Prepare input file: a few times the size of the buffer, containing
              // strings with zeroes and escaped quotes.
              match File::create(fname) {
                  Err(why) => panic!("cannot open {}: {}", fname, why),
                  Ok(mut file) => match file.write_all(&content.repeat(BUFSIZE)) {
                      Err(why) => panic!("cannot write to {}: {}", fname, why),
                      Ok(_) => {}
                  }
              };
              let count = 3 * BUFSIZE; // number of quoted strings written to file

              // Reopen input file for reading.
              let file = match File::open(fname) {
                  Err(why) => panic!("cannot read file {}: {}", fname, why),
                  Ok(file) => file,
              };

              // Initialize lexer state: all offsets are at the end of buffer.
              let lim = BUFSIZE - 1;
              let mut st = State {
                  file: file,
                  // Sentinel (at `lim` offset) is set to null, which triggers YYFILL.
                  buf: [0; BUFSIZE],
                  lim: lim,
                  cur: lim,
                  mar: lim,
                  tok: lim,
                  eof: false,
              };

              // Run the lexer.
              assert_eq!(lex(&mut st), count as isize);

              // Cleanup: remove input file.
              match std::fs::remove_file(fname) {
                  Err(why) => panic!("cannot remove {}: {}", fname, why),
                  Ok(_) => {}
              }
          }

   YYFILL with padding
       In  the  default  case  (when  EOF  rule is not used) YYFILL is a func-
       tion-like primitive that accepts a single argument and does not  return
       any  value.  YYFILL invocation is triggered by condition (YYLIMIT - YY-
       CURSOR) < n in C pointer API and YYLESSTHAN(n) in generic API. The  ar-
       gument  passed  to YYFILL is the minimal number of characters that must
       be supplied. If it fails to do so, YYFILL must not return to the  lexer
       (for  that  reason  it is best implemented as a macro that returns from
       the calling function on failure).  In case of a successful YYFILL invo-
       cation  the limit position must be set either to one after the last in-
       put position in buffer, or to the end of YYMAXFILL padding (in case YY-
       FILL  has  successfully  read  at least n characters, but not enough to
       fill the entire buffer). The pictures below show the relative locations
       of input positions in buffer before and after YYFILL invocation (YYMAX-
       FILL padding on the second picture is marked with # symbols).

            <-- shift -->                 <-- need -->
          >-A------------B---------C-----D-------E---F--------G->
          buffer       token    marker cursor  limit

          >-A------------B---------C-----D-------E---F--------G->
                       buffer,  marker cursor               limit
                       token

            <-- shift -->                 <-- need -->
          >-A------------B---------C-----D-------E-F        (EOF)
          buffer       token    marker cursor  limit

          >-A------------B---------C-----D-------E-F###############
                       buffer,  marker cursor                   limit
                       token                        <- YYMAXFILL ->
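
       The shift and padding steps in the pictures can be traced on a small
       in-memory buffer. The sketch below is a simplified stand-in for a real
       YYFILL, not re2c output: the Buf struct, the fill function and the
       constants SIZE and PAD (which plays the role of YYMAXFILL) are hypo-
       thetical names, input comes from a byte slice rather than a file, and
       the marker is left out for brevity.

```rust
const SIZE: usize = 16; // total buffer size (hypothetical value)
const PAD: usize = 4;   // stands in for YYMAXFILL (hypothetical value)

struct Buf<'a> {
    src: &'a [u8],   // input that has not been copied into `buf` yet
    buf: [u8; SIZE],
    tok: usize,
    cur: usize,
    lim: usize,
    eof: bool,
}

// Returns false if `need` more characters cannot be supplied.
fn fill(b: &mut Buf, need: usize) -> bool {
    // After EOF there is nothing to add; if the token starts too close
    // to the buffer start, shifting cannot free `need` bytes.
    if b.eof || b.tok < need { return false; }

    // Shift: discard everything before the current token.
    b.buf.copy_within(b.tok..b.lim, 0);
    b.lim -= b.tok;
    b.cur -= b.tok;
    b.tok = 0;

    // Read: copy as much input as fits, leaving room for the padding.
    let n = std::cmp::min(b.src.len(), SIZE - PAD - b.lim);
    b.buf[b.lim..b.lim + n].copy_from_slice(&b.src[..n]);
    b.src = &b.src[n..];
    b.lim += n;

    // End of input: append PAD zeroes and move the limit past them.
    if n == 0 {
        b.eof = true;
        for i in 0..PAD { b.buf[b.lim + i] = 0; }
        b.lim += PAD;
    }
    true
}

fn main() {
    // All offsets start at the end of the usable part of the buffer,
    // so the first call has nothing to shift and only reads.
    let lim = SIZE - PAD;
    let mut b = Buf { src: b"abcdefghijklmnop", buf: [0; SIZE],
                      tok: lim, cur: lim, lim: lim, eof: false };
    assert!(fill(&mut b, 1));
    assert_eq!((b.tok, b.cur, b.lim), (0, 0, 12));

    // Pretend the lexer consumed 10 characters and starts a new token.
    b.cur = 10; b.tok = 10;
    assert!(fill(&mut b, 3));
    assert_eq!(&b.buf[..b.lim], b"klmnop");

    // Input is exhausted: the next call appends PAD zeroes.
    b.cur = b.lim; b.tok = b.lim;
    assert!(fill(&mut b, 1));
    assert_eq!(&b.buf[..b.lim], &[0u8; PAD]);
    assert!(!fill(&mut b, 1)); // nothing more to supply after EOF
}
```

       After the last successful call the limit points to the end of the
       padding, as in the second pair of pictures.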

       Here is an example of a program that reads a file named input in
       chunks of at most 4096 bytes and uses bounds-checking with padding.

          // re2rust $INPUT -o $OUTPUT

          use std::fs::File;
          use std::io::{Read, Write};

          /*!max:re2c*/
          const BUFSIZE: usize = 4096;

          struct State {
              file: File,
              buf: [u8; BUFSIZE],
              lim: usize,
              cur: usize,
              mar: usize,
              tok: usize,
              eof: bool,
          }

          #[derive(PartialEq)]
          enum Fill { Ok, Eof, LongLexeme }

          fn fill(st: &mut State, need: usize) -> Fill {
              if st.eof { return Fill::Eof; }

              // Error: lexeme too long. In real life can reallocate a larger buffer.
              if st.tok < need { return Fill::LongLexeme; }

              // Shift buffer contents (discard everything up to the current token).
              st.buf.copy_within(st.tok..st.lim, 0);
              st.lim -= st.tok;
              st.cur -= st.tok;
              st.mar = st.mar.overflowing_sub(st.tok).0; // underflows if marker is unused
              st.tok = 0;

              // Fill free space at the end of buffer with new data from file.
              let n = match st.file.read(&mut st.buf[st.lim..BUFSIZE - YYMAXFILL]) {
                  Ok(n) => n,
                  Err(why) => panic!("cannot read from file: {}", why)
              };
              st.lim += n;

              // If read zero characters, this is end of input => add zero padding
              // so that the lexer can access characters at the end of buffer.
              if n == 0 {
                  st.eof = true;
                  for i in 0..YYMAXFILL { st.buf[st.lim + i] = 0; }
                  st.lim += YYMAXFILL;
              }

              return Fill::Ok;
          }

          fn lex(st: &mut State) -> isize {
              let mut count: isize = 0;

              'lex: loop {
                  st.tok = st.cur;
              /*!re2c
                  re2c:define:YYCTYPE    = u8;
                  re2c:define:YYPEEK     = "*st.buf.get_unchecked(st.cur)";
                  re2c:define:YYSKIP     = "st.cur += 1;";
                  re2c:define:YYBACKUP   = "st.mar = st.cur;";
                  re2c:define:YYRESTORE  = "st.cur = st.mar;";
                  re2c:define:YYLESSTHAN = "st.lim - st.cur < @@";
                  re2c:define:YYFILL     = "if fill(st, @@) != Fill::Ok { return -1; }";

                  str = ['] ([^'\\] | [\\][^])* ['];

                  [\x00] {
                      // Check that it is the sentinel, not some unexpected null.
                      return if st.tok == st.lim - YYMAXFILL { count } else { -1 }
                  }
                  str  { count += 1; continue 'lex; }
                  [ ]+ { continue 'lex; }
                  *    { return -1; }
              */}
          }

          fn main() {
              let fname = "input";
              let content = b"'qu\0tes' 'are' 'fine: \\'' ";

              // Prepare input file: a few times the size of the buffer, containing
              // strings with zeroes and escaped quotes.
              match File::create(fname) {
                  Err(why) => panic!("cannot open {}: {}", fname, why),
                  Ok(mut file) => match file.write_all(&content.repeat(BUFSIZE)) {
                      Err(why) => panic!("cannot write to {}: {}", fname, why),
                      Ok(_) => {}
                  }
              };
              let count = 3 * BUFSIZE; // number of quoted strings written to file

              // Reopen input file for reading.
              let file = match File::open(fname) {
                  Err(why) => panic!("cannot read file {}: {}", fname, why),
                  Ok(file) => file,
              };

              // Initialize lexer state: all offsets are at the end of buffer.
              // This immediately triggers YYFILL, as the YYLESSTHAN condition is true.
              let lim = BUFSIZE - YYMAXFILL;
              let mut st = State {
                  file: file,
                  buf: [0; BUFSIZE],
                  lim: lim,
                  cur: lim,
                  mar: lim,
                  tok: lim,
                  eof: false,
              };

              // Run the lexer.
              assert_eq!(lex(&mut st), count as isize);

              // Cleanup: remove input file.
              match std::fs::remove_file(fname) {
                  Err(why) => panic!("cannot remove {}: {}", fname, why),
                  Ok(_) => {}
              }
          }

MULTIPLE BLOCKS
       Sometimes it is necessary to have multiple interrelated lexers (for ex-
       ample, if there is a high-level state machine that transitions  between
       lexer  modes).  This  can  be implemented using multiple connected re2c
       blocks. Another option is to use start conditions.

       The implementation of connections between blocks depends on the  target
       language.  In languages that have goto statement (such as C/C++ and Go)
       one can have all blocks in one function, each of them prefixed  with  a
       label.  Transition from one block to another is a simple goto.  In lan-
       guages that do not have goto (such as Rust) it is necessary  to  use  a
       loop  with  a  switch  on  a  state  variable,  similar  to the yystate
       loop/switch generated by re2c, or else wrap each block  in  a  function
       and use function calls.
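
       The loop-with-switch alternative can be sketched by hand in plain
       Rust. The code below is an illustrative hand-written state machine,
       not code generated by re2c; the Mode enum and the function name
       parse_number are hypothetical, and only binary and decimal numbers are
       handled. Each arm of the outer match plays the role of one block, and
       a transition between blocks is an assignment to mode followed by the
       next loop iteration.

```rust
// Hypothetical lexer modes: one per "block".
enum Mode { Init, Bin, Dec }

fn parse_number(s: &[u8]) -> Option<u32> {
    let mut mode = Mode::Init;
    let mut cur = 0;
    let mut num: u32 = 0;

    loop {
        // `None` stands for the end of input.
        let c = s.get(cur).copied();
        match mode {
            // Initial block: determine the base and dispatch.
            Mode::Init => match c {
                Some(b'0') if s.get(1) == Some(&b'b')
                    && matches!(s.get(2).copied(), Some(b'0' | b'1')) =>
                {
                    cur = 2; // skip the "0b" prefix
                    mode = Mode::Bin;
                }
                Some(b'1'..=b'9') => { mode = Mode::Dec; }
                _ => return None,
            },
            // Binary block: `?` returns None on u32 overflow.
            Mode::Bin => match c {
                Some(d @ b'0'..=b'1') => {
                    num = num.checked_mul(2)?.checked_add((d - b'0') as u32)?;
                    cur += 1;
                }
                _ => return Some(num),
            },
            // Decimal block.
            Mode::Dec => match c {
                Some(d @ b'0'..=b'9') => {
                    num = num.checked_mul(10)?.checked_add((d - b'0') as u32)?;
                    cur += 1;
                }
                _ => return Some(num),
            },
        }
    }
}

fn main() {
    assert_eq!(parse_number(b"0b1101"), Some(13));
    assert_eq!(parse_number(b"1234"), Some(1234));
    assert_eq!(parse_number(b"9999999999"), None); // overflow
    assert_eq!(parse_number(b""), None);
}
```

       Compared to wrapping each block in a function, this keeps all state
       in local variables at the cost of one extra dispatch per iteration.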

       The  example below uses multiple blocks to parse binary, octal, decimal
       and hexadecimal numbers. Each base has its own block. The initial block
       determines  base  and dispatches to other blocks. Common configurations
       are defined in a separate block at the beginning of the  program;  they
       are inherited by the other blocks.

          // re2rust $INPUT -o $OUTPUT

          // Store u32 number in u64 during parsing to simplify overflow handling.
          struct State<'a> {
              str: &'a [u8],
              cur: usize,
              mar: usize,
              num: u64,
          }

          /*!re2c // Common re2c definitions shared between all functions.
              re2c:yyfill:enable = 0;
              re2c:define:YYCTYPE   = u8;
              re2c:define:YYPEEK    = "*st.str.get_unchecked(st.cur)";
              re2c:define:YYSKIP    = "st.cur += 1;";
              re2c:define:YYBACKUP  = "st.mar = st.cur;";
              re2c:define:YYRESTORE = "st.cur = st.mar;";
              re2c:define:YYSHIFT   = "st.cur = (st.cur as isize + @@) as usize;";
          */

          const ERROR: u64 = std::u32::MAX as u64 + 1; // overflow

          macro_rules! maybe { // Convert the number from u64 to optional u32.
              ($n:expr) => { if $n < ERROR { Some($n as u32) } else { None } }
          }

          // Add digit with the given base, checking for overflow.
          fn add(st: &mut State, offs: u8, base: u64) {
              let digit = unsafe { st.str.get_unchecked(st.cur - 1) } - offs;
              st.num = std::cmp::min(st.num * base + digit as u64, ERROR);
          }

          fn parse_u32(s: & [u8]) -> Option<u32> {
              let mut st = State {str: s, cur: 0, mar: 0, num: 0};
          /*!re2c
              '0b' / [01]        { return parse_bin(&mut st); }
              "0"                { return parse_oct(&mut st); }
              "" / [1-9]         { return parse_dec(&mut st); }
              '0x' / [0-9a-fA-F] { return parse_hex(&mut st); }
              *                  { return None; }
          */
          }

          fn parse_bin(st: &mut State) -> Option<u32> {
              'bin: loop {/*!re2c
                  [01] { add(st, 48, 2); continue 'bin; }
                  *    { return maybe!(st.num); }
              */}
          }

          fn parse_oct(st: &mut State) -> Option<u32> {
              'oct: loop {/*!re2c
                  [0-7] { add(st, 48, 8); continue 'oct; }
                  *     { return maybe!(st.num); }
              */}
          }

          fn parse_dec(st: &mut State) -> Option<u32> {
              'dec: loop {/*!re2c
                  [0-9] { add(st, 48, 10); continue 'dec; }
                  *     { return maybe!(st.num); }
              */}
          }

          fn parse_hex(st: &mut State) -> Option<u32> {
              'hex: loop {/*!re2c
                  [0-9] { add(st, 48, 16); continue 'hex; }
                  [a-f] { add(st, 87, 16); continue 'hex; }
                  [A-F] { add(st, 55, 16); continue 'hex; }
                  *     { return maybe!(st.num); }
              */}
          }

          fn main() {
              assert_eq!(parse_u32(b"\0"), None);
              assert_eq!(parse_u32(b"1234567890\0"), Some(1234567890));
              assert_eq!(parse_u32(b"0b1101\0"), Some(13));
              assert_eq!(parse_u32(b"0x7Fe\0"), Some(2046));
              assert_eq!(parse_u32(b"0644\0"), Some(420));
              assert_eq!(parse_u32(b"9999999999\0"), None);
          }

START CONDITIONS
       Start  conditions are enabled with --start-conditions option. They pro-
       vide a way to encode multiple interrelated  automata  within  the  same
       re2c block.

       Each  condition corresponds to a single automaton and has a unique name
       specified by the user and a unique internal number defined by re2c. The
       numbers  are used to switch between conditions: the generated code uses
       YYGETCONDITION and YYSETCONDITION primitives to get the current  condi-
       tion  or set it to the given number. Use /*!conditions:re2c*/ directive
       or the --header option to generate numeric condition identifiers.  Con-
       figuration re2c:cond:enumprefix specifies the generated identifier pre-
       fix.

       In condition mode every rule must be prefixed with a list of comma-sep-
       arated  condition  names in angle brackets, or a wildcard <*> to denote
       all conditions. The rule syntax is extended as follows:

          < cond-list > regexp action
                 A rule that is merged to every condition  on  the  cond-list.
                 It matches regexp and executes the associated action.

          < cond-list > regexp => cond action
                 A  rule  that  is merged to every condition on the cond-list.
                 It matches regexp, sets the current condition to cond and ex-
                 ecutes the associated action.

          < cond-list > regexp :=> cond
                 A  rule  that  is merged to every condition on the cond-list.
                 It matches regexp and immediately transitions to cond  (there
                 is no semantic action).

          <! cond-list > action
                 The  action is prepended to semantic actions of all rules for
                 every condition on the cond-list. This may be used  to  dedu-
                 plicate common code.

          < > action
                 A  rule that is merged to a special entry condition with num-
                 ber zero and name "0". It matches empty string  and  executes
                 the action.

          < > => cond action
                 A  rule that is merged to a special entry condition with num-
                 ber zero and name "0". It matches empty string, sets the cur-
                 rent condition to cond and executes the action.

          < > :=> cond
                 A  rule that is merged to a special entry condition with num-
                 ber zero and name "0". It matches empty  string  and  immedi-
                 ately transitions to cond.

       The  code  re2c  generates  for conditions depends on whether re2c uses
       goto/label approach or loop/switch approach to encode the automata.

       In languages that have goto statement (such as C/C++ and Go) conditions
       are naturally implemented as blocks of code prefixed with labels of the
       form yyc_<cond>, where cond is a condition name (label  prefix  can  be
       changed  with re2c:cond:prefix). Transitions between conditions are im-
       plemented using goto and condition labels. Before all  conditions  re2c
       generates an initial switch on YYGETCONDITION that jumps to the start state
       of the current condition.  The shortcut rules :=>  bypass  the  initial
       switch and jump directly to the specified condition (re2c:cond:goto can
       be used to change the default behavior). The rules  with  semantic  ac-
       tions  do  not automatically jump to the next condition; this should be
       done by the user-defined action code.

       In languages that do not have goto (such as Rust) re2c reuses the  yys-
       tate variable to store condition numbers. Each condition gets a numeric
       identifier equal to the number of its start state, and a switch between
       conditions is no different than a switch between DFA states of a single
       condition. There is no need for a separate  initial  condition  switch.
       (Since  the  same approach is used to implement storable states, YYGET-
       CONDITION/YYSETCONDITION are redundant if both storable states and con-
       ditions are used).

       The program below uses start conditions to parse binary, octal, decimal
       and hexadecimal numbers. There is a single block where  each  base  has
       its  own  condition,  and  the initial condition is connected to all of
       them. User-defined variable cond stores the current  condition  number;
       it is initialized to the number of the initial condition generated with
       /*!conditions:re2c*/.

          // re2rust $INPUT -o $OUTPUT -c

          /*!conditions:re2c*/

          const ERROR: u64 = std::u32::MAX as u64 + 1; // overflow

          // Add digit with the given base, checking for overflow.
          fn add(num: &mut u64, str: &[u8], cur: usize, offs: u8, base: u64) {
              let digit = unsafe { str.get_unchecked(cur - 1) } - offs;
              *num = std::cmp::min(*num * base + digit as u64, ERROR);
          }

          fn parse_u32(str: &[u8]) -> Option<u32> {
              let (mut cur, mut mar) = (0, 0);
              let mut cond = YYC_INIT;
              let mut num = 0u64; // Store number in u64 to simplify overflow checks.

              'lex: loop {/*!re2c
                  re2c:define:YYCTYPE   = u8;
                  re2c:define:YYPEEK    = "*str.get_unchecked(cur)";
                  re2c:define:YYSKIP    = "cur += 1;";
                  re2c:define:YYBACKUP  = "mar = cur;";
                  re2c:define:YYRESTORE = "cur = mar;";
                  re2c:define:YYSHIFT   = "cur = (cur as isize + @@) as usize;";
                  re2c:define:YYGETCONDITION = "cond";
                  re2c:define:YYSETCONDITION = "cond = @@;";
                  re2c:yyfill:enable = 0;

                  <INIT> '0b' / [01]        :=> BIN
                  <INIT> "0"                :=> OCT
                  <INIT> "" / [1-9]         :=> DEC
                  <INIT> '0x' / [0-9a-fA-F] :=> HEX
                  <INIT> * { return None; }

                  <BIN> [01]  { add(&mut num, str, cur, 48, 2);  continue 'lex; }
                  <OCT> [0-7] { add(&mut num, str, cur, 48, 8);  continue 'lex; }
                  <DEC> [0-9] { add(&mut num, str, cur, 48, 10); continue 'lex; }
                  <HEX> [0-9] { add(&mut num, str, cur, 48, 16); continue 'lex; }
                  <HEX> [a-f] { add(&mut num, str, cur, 87, 16); continue 'lex; }
                  <HEX> [A-F] { add(&mut num, str, cur, 55, 16); continue 'lex; }

                  <BIN, OCT, DEC, HEX> * {
                      return if num < ERROR { Some(num as u32) } else { None };
                  }
              */}
          }

          fn main() {
              assert_eq!(parse_u32(b"\0"), None);
              assert_eq!(parse_u32(b"1234567890\0"), Some(1234567890));
              assert_eq!(parse_u32(b"0b1101\0"), Some(13));
              assert_eq!(parse_u32(b"0x7Fe\0"), Some(2046));
              assert_eq!(parse_u32(b"0644\0"), Some(420));
              assert_eq!(parse_u32(b"9999999999\0"), None);
          }

STORABLE STATE
       With --storable-state option re2c generates a lexer that can store  its
       current  state,  return  to the caller, and later resume operations ex-
       actly where it left off. The default mode of operation  in  re2c  is  a
       "pull"  model,  in which the lexer "pulls" more input whenever it needs
       it. This may be unacceptable in cases when the input becomes  available
       piece  by piece (for example, if the lexer is invoked by the parser, or
       if the lexer program communicates via a socket protocol with some other
       program  that  must wait for a reply from the lexer before it transmits
       the next message). Storable state feature is intended exactly for  such
       cases:  it  allows  one to generate lexers that work in a "push" model.
       When the lexer needs more input, it stores its state and returns to the
       caller.  Later,  when  more input becomes available, the caller resumes
       the lexer exactly where it stopped. There are a few  changes  necessary
       compared to the "pull" model:

       • Define YYGETSTATE() and YYSETSTATE(state) primitives.

       • Define yych, yyaccept (if used) and state variables as a part of per-
         sistent lexer state. The state variable should be initialized to -1.

       • YYFILL should return to the outer program instead of trying to supply
         more input. Return code should indicate that lexer needs more input.

       • The  outer  program should recognize situations when lexer needs more
         input and respond appropriately.

       • Optionally use getstate:re2c to generate YYGETSTATE  switch  detached
         from  the  main  lexer.  This only works for languages that have goto
         (not in --loop-switch mode).

       • Use re2c:eof and the sentinel with bounds checks method to handle the
         end of input. Padding-based method may not work because it is unclear
         when to append padding: the current end of input may not be the ulti-
         mate end of input, and appending padding too early may cut off a par-
         tially read greedy lexeme.  Furthermore, due  to  high-level  program
         logic  getting  more input may depend on processing the lexeme at the
         end of buffer (which already is blocked due to the end-of-input  con-
         dition).
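
       The "push" model described by the list above can be illustrated with
       a hand-written resumable lexer. This is a minimal sketch in plain
       Rust, not re2c output: the Lexer struct, its feed and lex methods and
       the two-value Status enum are hypothetical, and the buffer simply
       grows instead of being shifted. The point is that the lexer keeps its
       position and state in a persistent structure and returns to the
       caller when input runs out, even in the middle of a lexeme.

```rust
#[derive(Debug, PartialEq)]
enum Status { Waiting, BadPacket }

// Persistent lexer state: survives across calls to `lex`.
struct Lexer {
    buf: Vec<u8>, // all input received so far (never shrinks in this sketch)
    cur: usize,   // next unread position
    state: isize, // -1: at the start of a packet, 1: inside a packet
    recv: usize,  // number of complete packets seen
}

impl Lexer {
    fn new() -> Self {
        Lexer { buf: Vec::new(), cur: 0, state: -1, recv: 0 }
    }

    // The caller pushes more input whenever it becomes available.
    fn feed(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
    }

    // Consume as much as possible; return instead of waiting for input.
    // Packets have the form [a-z]+[;].
    fn lex(&mut self) -> Status {
        loop {
            let c = match self.buf.get(self.cur) {
                Some(&c) => c,
                // Out of input: `state` and `cur` are kept for the next call.
                None => return Status::Waiting,
            };
            self.cur += 1;
            match (self.state, c) {
                (_, b'a'..=b'z') => self.state = 1,
                (1, b';') => { self.recv += 1; self.state = -1; }
                _ => return Status::BadPacket,
            }
        }
    }
}

fn main() {
    let mut lx = Lexer::new();
    // Packets arrive split at arbitrary points.
    let chunks: [&[u8]; 3] = [b"ze", b"ro;on", b"e;"];
    for chunk in chunks.iter() {
        lx.feed(chunk);
        assert_eq!(lx.lex(), Status::Waiting);
    }
    assert_eq!(lx.recv, 2);

    lx.feed(b"zer0;");
    assert_eq!(lx.lex(), Status::BadPacket);
}
```

       In this sketch the caller decides when the input has ended; a lexer
       generated with re2c:eof reports that through its end-of-input rule.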

       Here is an example of a "push" model lexer that simulates reading pack-
       ets from a socket. The lexer loops until it encounters the end of input
       and returns to the calling function. The calling function provides more
       input by "sending" the next packet and  resumes  lexing.  This  process
       stops when all the packets have been sent, or when there is an error.

          // re2rust $INPUT -o $OUTPUT -f

          use std::fs::File;
          use std::io::{Read, Write};

          const DEBUG: bool = false;
          macro_rules! log {
              ($($fmt:expr)? $(, $args:expr)*) => {
                  if DEBUG { println!($($fmt)? $(, $args)*) }
              }
          }

          // Use a small buffer to cover the case when a lexeme doesn't fit.
          // In real world use a larger buffer.
          const BUFSIZE: usize = 10;

          struct State {
              file: File,
              buf: [u8; BUFSIZE],
              lim: usize,
              cur: usize,
              mar: usize,
              tok: usize,
              state: isize,
          }

          #[derive(Debug, PartialEq)]
          enum Status {End, Ready, Waiting, BadPacket, BigPacket}

          fn fill(st: &mut State) -> Status {
              // Error: lexeme too long. In real life can reallocate a larger buffer.
              if st.tok < 1 { return Status::BigPacket; }

              // Shift buffer contents (discard everything up to the current lexeme).
              st.buf.copy_within(st.tok..st.lim, 0);
              st.lim -= st.tok;
              st.cur -= st.tok;
              st.mar = st.mar.overflowing_sub(st.tok).0; // underflows if marker is unused
              st.tok = 0;

              // Fill free space at the end of buffer with new data.
              match st.file.read(&mut st.buf[st.lim..BUFSIZE - 1]) { // -1 for sentinel
                  Ok(n) => {
                      st.lim += n;
                      st.buf[st.lim] = 0; // append sentinel symbol
                  },
                  Err(why) => panic!("cannot read from file: {}", why)
              }

              return Status::Ready;
          }

          fn lex(st: &mut State, recv: &mut usize) -> Status {
              let mut yych;
              'lex: loop {
                  st.tok = st.cur;
              /*!re2c
                  re2c:eof = 0;
                  re2c:define:YYCTYPE    = "u8";
                  re2c:define:YYPEEK     = "*st.buf.get_unchecked(st.cur)";
                  re2c:define:YYSKIP     = "st.cur += 1;";
                  re2c:define:YYBACKUP   = "st.mar = st.cur;";
                  re2c:define:YYRESTORE  = "st.cur = st.mar;";
                  re2c:define:YYLESSTHAN = "st.cur >= st.lim";
                  re2c:define:YYGETSTATE = "st.state";
                  re2c:define:YYSETSTATE = "st.state = @@;";
                  re2c:define:YYFILL     = "return Status::Waiting;";

                  packet = [a-z]+[;];

                  *      { return Status::BadPacket; }
                  $      { return Status::End; }
                  packet { *recv += 1; continue 'lex; }
              */}
          }

          fn test(packets: Vec<&[u8]>, expect: Status) {
              // Create a "socket" (open the same file for reading and writing).
              let fname = "pipe";
              let mut fw: File = match File::create(fname) {
                  Err(why) => panic!("cannot open {}: {}", fname, why),
                  Ok(file) => file,
              };
              let fr: File = match File::open(fname) {
                  Err(why) => panic!("cannot read file {}: {}", fname, why),
                  Ok(file) => file,
              };

              // Initialize lexer state: `state` value is -1, all offsets are at the end
              // of buffer, the character at `lim` offset is the sentinel (null).
              let lim = BUFSIZE - 1;
              let mut state = State {
                  file: fr,
                  // Sentinel (at `lim` offset) is set to null, which triggers YYFILL.
                  buf: [0; BUFSIZE],
                  cur: lim,
                  mar: lim,
                  tok: lim,
                  lim: lim,
                  state: -1,
              };

              // Main loop. The buffer contains incomplete data which appears packet by
              // packet. When the lexer needs more input it saves its internal state and
              // returns to the caller which should provide more input and resume lexing.
              let mut status;
              let mut send = 0;
              let mut recv = 0;
              loop {
                  status = lex(&mut state, &mut recv);
                  if status == Status::End {
                      log!("done: got {} packets", recv);
                      break;
                  } else if status == Status::Waiting {
                      log!("waiting...");
                      if send < packets.len() {
                          log!("sent packet {}", send);
                          match fw.write_all(packets[send]) {
                              Err(why) => panic!("cannot write to {}: {}", fname, why),
                              Ok(_) => send += 1,
                          }
                      }
                      status = fill(&mut state);
                      log!("queue: '{}'", String::from_utf8_lossy(&state.buf));
                      if status == Status::BigPacket {
                          log!("error: packet too big");
                          break;
                      }
                      assert_eq!(status, Status::Ready);
                  } else {
                      assert_eq!(status, Status::BadPacket);
                      log!("error: ill-formed packet");
                      break;
                  }
              }

              // Check results.
              assert_eq!(status, expect);
              if status == Status::End { assert_eq!(recv, send); }

              // Cleanup: remove input file.
              match std::fs::remove_file(fname) {
                  Err(why) => panic!("cannot remove {}: {}", fname, why),
                  Ok(_) => {}
              }
          }

          fn main() {
              test(vec![], Status::End);
              test(vec![b"zero;", b"one;", b"two;", b"three;", b"four;"], Status::End);
              test(vec![b"zer0;"], Status::BadPacket);
              test(vec![b"goooooooooogle;"], Status::BigPacket);
          }

REUSABLE BLOCKS
       Reusable  blocks are re2c blocks that can be reused any number of times
       and  combined  with  other  re2c  blocks.   They   are   defined   with
       /*!rules:re2c[:<name>]  ...  */ (the <name> is optional). A rules block
       can be used in two contexts: either in a use block, or in a use  direc-
       tive  inside  of another block. The code for a rules block is generated
       at every point of use.

       Use blocks are defined with /*!use:re2c[:<name>] ... */. The <name>  is
       optional;  if not specified, the associated rules block is the most re-
       cent one (whether named or unnamed). A use block can add named  defini-
       tions,  configurations and rules of its own.  An important use case for
       use blocks is a lexer that supports multiple input encodings: the  same
       rules  block is reused multiple times with encoding-specific configura-
       tions (see the example below).

       In-block use directive !use:<name>; can be used from inside of  a  re2c
       block.  It  merges the referenced block <name> into the current one. If
       some of the merged rules and configurations overlap with the previously
       defined  ones,  conflicts  are  resolved in the usual way: the earliest
       rule takes priority, and latest configuration overrides preceding ones.
       The exceptions are the special rules *, $ and (in condition mode) <!>,
       for which a block-local definition overrides any  inherited  ones.  Use
       directive  allows  one to combine different re2c blocks together in one
       block (see the example below).
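
       The conflict resolution for merged rules can be modelled with a toy
       example in plain Rust. This is only an illustration of the priority
       order, not of how re2c works internally: rules are reduced to whole-
       word literals, and the Rules type and the merge and lookup functions
       are hypothetical. Rules from the block used first come first in the
       merged list, the first match wins, and the local default stands in
       for a block-local * rule.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ans { Color, Fish, Dunno }

// A simplified "rules block": literal patterns in priority order.
type Rules = Vec<(&'static str, Ans)>;

// Merge blocks in order of use: rules of earlier blocks come first.
fn merge(blocks: &[&Rules]) -> Rules {
    let mut merged = Rules::new();
    for block in blocks {
        merged.extend(block.iter().copied());
    }
    merged
}

// The earliest matching rule wins; `default` models the local * rule,
// which replaces any default rule of the merged blocks.
fn lookup(rules: &Rules, default: Ans, word: &str) -> Ans {
    rules.iter()
        .find(|(pat, _)| *pat == word)
        .map(|&(_, ans)| ans)
        .unwrap_or(default)
}

fn main() {
    let colors: Rules = vec![
        ("red", Ans::Color), ("salmon", Ans::Color), ("magenta", Ans::Color),
    ];
    let fish: Rules = vec![
        ("haddock", Ans::Fish), ("salmon", Ans::Fish), ("eel", Ans::Fish),
    ];

    // 'fish' is used before 'colors', so its 'salmon' rule takes priority.
    let merged = merge(&[&fish, &colors]);
    assert_eq!(lookup(&merged, Ans::Dunno, "salmon"), Ans::Fish);
    assert_eq!(lookup(&merged, Ans::Dunno, "red"), Ans::Color);
    assert_eq!(lookup(&merged, Ans::Dunno, "what?"), Ans::Dunno);
}
```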

       Named blocks and in-block use directive were added in re2c version 2.2.
       Since  that  version reusable blocks are allowed by default (no special
       option is needed). Before version 2.2 reuse mode was  enabled  with  -r
       --reusable  option.  Before  version  1.2  reusable blocks could not be
       mixed with normal blocks.

   Example of a !use directive
          // re2rust $INPUT -o $OUTPUT

          // This example shows how to combine reusable re2c blocks: two blocks
          // ('colors' and 'fish') are merged into one. The 'salmon' rule occurs
          // in both blocks; the 'fish' block takes priority because it is used
          // earlier. Default rule * occurs in all three blocks; the local (not
          // inherited) definition takes priority.

          #[derive(Debug, PartialEq)]
          enum Ans { Color, Fish, Dunno }

          /*!rules:re2c:colors
              *                            { panic!("ah"); }
              "red" | "salmon" | "magenta" { return Ans::Color; }
          */

          /*!rules:re2c:fish
              *                            { panic!("oh"); }
              "haddock" | "salmon" | "eel" { return Ans::Fish; }
          */

          fn lex(str: &[u8]) -> Ans {
              let (mut cur, mut mar) = (0, 0);
              /*!re2c
                  re2c:yyfill:enable = 0;
                  re2c:define:YYCTYPE   = u8;
                  re2c:define:YYPEEK    = "*str.get_unchecked(cur)";
                  re2c:define:YYSKIP    = "cur += 1;";
                  re2c:define:YYBACKUP  = "mar = cur;";
                  re2c:define:YYRESTORE = "cur = mar;";

                  !use:fish;
                  !use:colors;
                  * { return Ans::Dunno; }  // overrides inherited '*' rules
              */
          }

          fn main() {
              assert_eq!(lex(b"salmon"), Ans::Fish);
              assert_eq!(lex(b"what?"), Ans::Dunno);
          }

   Example of a /*!use:re2c ... */ block
          // re2rust $INPUT -o $OUTPUT --input-encoding utf8

          // This example supports multiple input encodings: UTF-8 and UTF-32.
          // Both lexers are generated from the same rules block, and the use
          // blocks add only encoding-specific configurations.
          /*!rules:re2c
              re2c:yyfill:enable = 0;
              re2c:define:YYPEEK    = "*str.get_unchecked(cur)";
              re2c:define:YYSKIP    = "cur += 1;";
              re2c:define:YYBACKUP  = "mar = cur;";
              re2c:define:YYRESTORE = "cur = mar;";

              "∀x ∃y" { return Some(cur); }
              *       { return None; }
          */

          fn lex_utf8(str: &[u8]) -> Option<usize> {
              let (mut cur, mut mar) = (0, 0);
              /*!use:re2c
                  re2c:encoding:utf8 = 1;
                  re2c:define:YYCTYPE = u8;
              */
          }

          fn lex_utf32(str: &[u32]) -> Option<usize> {
              let (mut cur, mut mar) = (0, 0);
              /*!use:re2c
                  re2c:encoding:utf32 = 1;
                  re2c:define:YYCTYPE = u32;
              */
          }

          fn main() {
              let s8 = vec![0xe2, 0x88, 0x80, 0x78, 0x20, 0xe2, 0x88, 0x83, 0x79];
              assert_eq!(lex_utf8(&s8), Some(s8.len()));

              let s32 = vec![0x2200, 0x78, 0x20, 0x2203, 0x79];
              assert_eq!(lex_utf32(&s32), Some(s32.len()));
          }

SUBMATCH EXTRACTION
       re2c has two options for submatch extraction.

       The first option is -T --tags. With this option one can use standalone
       tags of the form @stag and #mtag, where stag and mtag are arbitrary
       user-defined names. Tags can be used anywhere inside of a regular
       expression; semantically they are just position markers. Tags of the
       form @stag are called s-tags: they denote a single submatch value (the
       last input position where this tag matched). Tags of the form #mtag are
       called m-tags: they denote multiple submatch values (the whole history
       of repetitions of this tag). All tags should be defined by the user as
       variables with the corresponding names. With standalone tags re2c uses
       leftmost greedy disambiguation: submatch positions correspond to the
       leftmost matching path through the regular expression.

       The second option is -P --posix-captures: it enables POSIX-compliant
       capturing groups. In this mode parentheses in regular expressions
       denote the beginning and the end of capturing groups; the whole regular
       expression is group number zero. The number of groups for the matching
       rule is stored in the variable yynmatch, and submatch results are stored
       in the yypmatch array. Both yynmatch and yypmatch should be defined by
       the user, and the size of yypmatch must be at least [yynmatch * 2]. re2c
       provides a directive /*!maxnmatch:re2c*/ that defines YYMAXNMATCH: a
       constant equal to the maximal value of yynmatch among all rules. Note
       that re2c implements POSIX-compliant disambiguation: each subexpression
       matches as long as possible, and subexpressions that start earlier in
       the regular expression have priority over those starting later.
       Capturing groups are translated into s-tags under the hood, therefore
       we use the word "tag" to describe them as well.

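       The layout of yypmatch described above can be sketched in plain Rust.
       This is a minimal illustration, not code generated by re2c: the helper
       name group is hypothetical, and NONE is assumed to be the user-chosen
       value for "no match", as in the examples of this manual.

```rust
// Assumed "no match" marker (a user-side convention, not defined by re2c).
const NONE: usize = usize::MAX;

// Hypothetical helper: return the text of capturing group `i`, or `None`
// if there is no such group or the group did not take part in the match.
fn group<'a>(
    input: &'a [u8],
    yypmatch: &[usize],
    yynmatch: usize,
    i: usize,
) -> Option<&'a [u8]> {
    if i >= yynmatch {
        return None;
    }
    // Group `i` opens at index 2*i and closes at index 2*i + 1.
    let (start, end) = (yypmatch[2 * i], yypmatch[2 * i + 1]);
    if start == NONE || end == NONE {
        None
    } else {
        Some(&input[start..end])
    }
}
```

       For the rule (num) "." (num) ("." num)? [\x00] and the input 1.2.99999
       followed by a NUL byte, the offsets would be [0, 10, 0, 1, 2, 3, 3, 9].
       Group 3 is then the text .99999, which still includes the leading dot.
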
       With both the -P --posix-captures and -T --tags options re2c uses the
       efficient submatch extraction algorithm described in the paper "Tagged
       Deterministic Finite Automata with Lookahead". The overhead of submatch
       extraction in the generated lexer grows with the number of tags; if this
       number is moderate, the overhead is barely noticeable. In the lexer tags are
       implemented using a number of tag variables generated by re2c. There is
       no  one-to-one  correspondence between tag variables and tags: a single
       variable may be reused for different tags, and one tag may require mul-
       tiple  variables to hold all its ambiguous values. Eventually ambiguity
       is resolved, and only one final variable per tag survives. When a  rule
       matches,  all  its  tags are set to the values of the corresponding tag
       variables.  The exact number of tag variables is unknown to  the  user;
       this number is determined by re2c. However, tag variables should be de-
       fined by the user as a part of the lexer state and updated  by  YYFILL,
       therefore  re2c provides directives /*!stags:re2c*/ and /*!mtags:re2c*/
       that can be used to declare, initialize and manipulate  tag  variables.
       These  directives  have  two  optional  configurations:  format = "@@";
       (specifies the template where @@ is substituted with the name  of  each
       tag variable), and separator = ""; (specifies the piece of code used to
       join the generated pieces for different tag variables).

       S-tags support the following operations:

       • save input position to an s-tag: t = YYCURSOR with C pointer API or a
         user-defined operation YYSTAGP(t) with generic API

       • save  default  value  to  an  s-tag: t = NULL with C pointer API or a
         user-defined operation YYSTAGN(t) with generic API

       • copy one s-tag to another: t1 = t2

       M-tags support the following operations:

       • append input position to an m-tag: a user-defined operation
         YYMTAGP(t) with both default and generic API

       • append default value to an m-tag: a user-defined operation YYMTAGN(t)
         with both default and generic API

       • copy one m-tag to another: t1 = t2

       S-tags can be implemented as scalar values (pointers or offsets).
       M-tags need a more complex representation, as they need to store a
       sequence of tag values. The most naive and inefficient representation
       of an m-tag is a list (array, vector) of tag values; a more efficient
       representation is to store all m-tags in a prefix tree represented as
       an array of nodes (v, p), where v is a tag value and p is a pointer to
       the parent node.
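
       The prefix-tree representation can be sketched in plain Rust as shown
       here. The names Node, ROOT, append and history are hypothetical and
       used for illustration only. An m-tag is assumed to be an index into
       the node array, so copying an m-tag means copying a single integer.

```rust
// Assumed marker for the empty history (a user-side convention).
const ROOT: usize = usize::MAX;

// One node of the prefix tree: a tag value and a link to the parent node.
struct Node {
    value: usize,
    parent: usize,
}

// Append `value` to the history identified by `mtag` and return the new
// m-tag. Existing nodes are never modified, so older m-tags stay valid.
fn append(tree: &mut Vec<Node>, mtag: usize, value: usize) -> usize {
    tree.push(Node { value, parent: mtag });
    tree.len() - 1
}

// Collect the history of `mtag` in the order the values were appended.
fn history(tree: &[Node], mut mtag: usize) -> Vec<usize> {
    let mut values = Vec::new();
    while mtag != ROOT {
        values.push(tree[mtag].value);
        mtag = tree[mtag].parent;
    }
    values.reverse();
    values
}
```

       Two histories that fork share their common prefix. For instance,
       append 1 and 4, copy the m-tag, then append 7 to one copy and 9 to the
       other. The histories are then 1, 4, 7 and 1, 4, 9, and the tree holds
       four nodes in total.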

       Here is a simple example of using s-tags  to  parse  semantic  versions
       consisting of three numeric components: major, minor, patch (the latter
       is optional).  See below for a more complex example that uses YYFILL.

          // re2rust $INPUT -o $OUTPUT

          #[derive(Debug, PartialEq)]
          struct SemVer(u32, u32, u32); // version: (major, minor, patch)

          fn s2n(str: &[u8]) -> u32 { // convert a pre-parsed string to a number
              let mut n = 0;
              for i in str { n = n * 10 + *i as u32 - 48; }
              return n;
          }

          fn parse(str: &[u8]) -> Option<SemVer> {
              let (mut cur, mut mar) = (0, 0);

              // User-defined tag variables that are available in semantic action.
              let (t1, mut t2, t3, t4, t5);

              // Autogenerated tag variables used by the lexer to track tag values.
              const NONE: usize = std::usize::MAX;
              /*!stags:re2c format = 'let mut @@{tag} = NONE;'; */

              /*!re2c
                  re2c:define:YYCTYPE     = u8;
                  re2c:define:YYPEEK      = "*str.get_unchecked(cur)";
                  re2c:define:YYSKIP      = "cur += 1;";
                  re2c:define:YYBACKUP    = "mar = cur;";
                  re2c:define:YYRESTORE   = "cur = mar;";
                  re2c:define:YYSTAGP     = "@@{tag} = cur;";
                  re2c:define:YYSTAGN     = "@@{tag} = NONE;";
                  re2c:define:YYSHIFTSTAG = "@@{tag} -= -@@{shift}isize as usize;";
                  re2c:yyfill:enable = 0;
                  re2c:tags = 1;

                  num = [0-9]+;

                  @t1 num @t2 "." @t3 num @t4 ("." @t5 num)? [\x00] {
                      let major = s2n(&str[t1..t2]);
                      let minor = s2n(&str[t3..t4]);
                      let patch = if t5 != NONE {s2n(&str[t5..cur - 1])} else {0};
                      return Some(SemVer(major, minor, patch));
                  }
                  * { return None; }
              */
          }

          fn main() {
              assert_eq!(parse(b"23.34\0"), Some(SemVer(23, 34, 0)));
              assert_eq!(parse(b"1.2.99999\0"), Some(SemVer(1, 2, 99999)));
              assert_eq!(parse(b"1.a\0"), None);
          }

       Here is a more complex example of using s-tags with YYFILL to  parse  a
       file  with  newline-separated semantic versions. Tag variables are part
       of the lexer state, and they are adjusted in YYFILL  like  other  input
       positions.   Note  that it is necessary for s-tags because their values
       are invalidated after shifting buffer contents. It may not be necessary
       in  a  custom implementation where tag variables store offsets relative
       to the start of the input string rather than the buffer, which  may  be
       the case with m-tags.

          // re2rust $INPUT -o $OUTPUT

          use std::fs::File;
          use std::io::{Read, Write};

          const BUFSIZE: usize = 4096;
          const NONE: usize = std::usize::MAX;

          struct State {
              file: File,
              buf: [u8; BUFSIZE],
              lim: usize,
              cur: usize,
              mar: usize,
              tok: usize,
              // Tag variables must be part of the lexer state passed to YYFILL.
              // They don't correspond to tags and should be autogenerated by re2c.
              /*!stags:re2c format = "@@: usize,\n"; */
              eof: bool,
          }

          #[derive(PartialEq)]
          enum Fill { Ok, Eof, LongLexeme }

          #[derive(Debug, PartialEq)]
          struct SemVer(u32, u32, u32); // version: (major, minor, patch)

          fn s2n(str: &[u8]) -> u32 { // convert a pre-parsed string to a number
              let mut n = 0;
              for i in str { n = n * 10 + *i as u32 - 48; }
              return n;
          }

          macro_rules! shift { // ignore overflow, marker and tags may not be set yet
              ($x:expr, $y:expr) => { $x = $x.overflowing_sub($y).0 }
          }

          fn fill(st: &mut State) -> Fill {
              if st.eof { return Fill::Eof; }

              // Error: lexeme too long. In real life could reallocate a larger buffer.
              if st.tok < 1 { return Fill::LongLexeme; }

              // Shift buffer contents (discard everything up to the current token).
              st.buf.copy_within(st.tok..st.lim, 0);
              st.lim -= st.tok;
              st.cur -= st.tok;
              shift!(st.mar, st.tok);
              // Tag variables need to be shifted like other input positions. The check
              // for NONE is only needed if some tags are nested inside of alternative or
              // repetition, so that they can have NONE value.
              /*!stags:re2c format = "if st.@@ != NONE { shift!(st.@@, st.tok); }\n"; */
              st.tok = 0;

              // Fill free space at the end of buffer with new data from file.
              match st.file.read(&mut st.buf[st.lim..BUFSIZE - 1]) {
                  Ok(n) => {
                      st.lim += n;
                      st.eof = n == 0;
                      st.buf[st.lim] = 0;
                  }
                  Err(why) => panic!("cannot read from file: {}", why)
              }

              return Fill::Ok;
          }

          fn parse(st: &mut State) -> Option<Vec::<SemVer>> {
              let mut vers = Vec::new();
              // User-defined local variables that store final tag values.
              // They are different from tag variables autogenerated with `stags:re2c`,
              // as they are set at the end of match and used only in semantic actions.
              let (mut t1, mut t2, mut t3, mut t4);
              'parse: loop {
                  st.tok = st.cur;
              /*!re2c
                  re2c:eof = 0;
                  re2c:define:YYCTYPE     = u8;
                  re2c:define:YYPEEK      = "*st.buf.get_unchecked(st.cur)";
                  re2c:define:YYSKIP      = "st.cur += 1;";
                  re2c:define:YYBACKUP    = "st.mar = st.cur;";
                  re2c:define:YYRESTORE   = "st.cur = st.mar;";
                  re2c:define:YYSTAGP     = "@@{tag} = st.cur;";
                  re2c:define:YYSTAGN     = "@@{tag} = NONE;";
                  re2c:define:YYSHIFTSTAG = "@@{tag} -= -@@{shift}isize as usize;";
                  re2c:define:YYLESSTHAN  = "st.cur >= st.lim";
                  re2c:define:YYFILL      = "fill(st) == Fill::Ok";
                  re2c:tags = 1;
                  re2c:tags:expression = "st.@@";

                  num = [0-9]+;

                  num @t1 "." @t2 num @t3 ("." @t4 num)? [\n] {
                      let major = s2n(&st.buf[st.tok..t1]);
                      let minor = s2n(&st.buf[t2..t3]);
                      let patch = if t4 != NONE {s2n(&st.buf[t4..st.cur - 1])} else {0};
                      vers.push(SemVer(major, minor, patch));
                      continue 'parse;
                  }
                  $ { return Some(vers); }
                  * { return None; }
              */
              }
          }

          fn main() {
              let fname = "input";
              let verstr = b"1.22.333\n";
              let expect = (0..BUFSIZE).map(|_| SemVer(1, 22, 333)).collect();

              // Prepare input file (make sure it exceeds buffer size).
              match File::create(fname) {
                  Err(why) => panic!("cannot open {}: {}", fname, why),
                  Ok(mut file) => match file.write_all(&verstr.repeat(BUFSIZE)) {
                      Err(why) => panic!("cannot write to {}: {}", fname, why),
                      Ok(_) => {}
                  }
              };

              // Reopen input file for reading.
              let file = match File::open(fname) {
                  Err(why) => panic!("cannot read file {}: {}", fname, why),
                  Ok(file) => file,
              };

              // Initialize lexer state.
              let lim = BUFSIZE - 1;
              let mut st = State {
                  file: file,
                  buf: [0; BUFSIZE], // sentinel is set to zero, which triggers YYFILL
                  lim: lim,
                  cur: lim,
                  mar: lim,
                  tok: lim,
                  /*!stags:re2c format = "@@: NONE,\n"; */
                  eof: false,
              };

              // Run the lexer and check results.
              assert_eq!(parse(&mut st), Some(expect));

              // Cleanup: remove input file.
              match std::fs::remove_file(fname) {
                  Err(why) => panic!("cannot remove {}: {}", fname, why),
                  Ok(_) => {}
              }
          }

       Here  is  an  example of using POSIX capturing groups to parse semantic
       versions.

          // re2rust $INPUT -o $OUTPUT

          // Maximum number of capturing groups among all rules.
          /*!maxnmatch:re2c*/

          #[derive(Debug, PartialEq)]
          struct SemVer(u32, u32, u32); // version: (major, minor, patch)

          fn s2n(str: &[u8]) -> u32 { // convert a pre-parsed string to a number
              let mut n = 0;
              for i in str { n = n * 10 + *i as u32 - 48; }
              return n;
          }

          fn parse(str: &[u8]) -> Option<SemVer> {
              let (mut cur, mut mar) = (0, 0);

              // Allocate memory for capturing parentheses (twice the number of groups).
              let yynmatch: usize;
              let mut yypmatch = [0; YYMAXNMATCH*2];

              // Autogenerated tag variables used by the lexer to track tag values.
              const NONE: usize = std::usize::MAX;
              /*!stags:re2c format = 'let mut @@{tag} = NONE;'; */

              /*!re2c
                  re2c:define:YYCTYPE     = u8;
                  re2c:define:YYPEEK      = "*str.get_unchecked(cur)";
                  re2c:define:YYSKIP      = "cur += 1;";
                  re2c:define:YYBACKUP    = "mar = cur;";
                  re2c:define:YYRESTORE   = "cur = mar;";
                  re2c:define:YYSTAGP     = "@@{tag} = cur;";
                  re2c:define:YYSTAGN     = "@@{tag} = NONE;";
                  re2c:define:YYSHIFTSTAG = "@@{tag} -= -@@{shift}isize as usize;";
                  re2c:yyfill:enable = 0;
                  re2c:posix-captures = 1;

                  num = [0-9]+;

                  (num) "." (num) ("." num)? [\x00] {
                      // `yynmatch` is the number of capturing groups
                      assert_eq!(yynmatch, 4);

                      // Even `yypmatch` values are for opening parentheses, odd values
                      // are for closing parentheses, the first group is the whole match.
                      let major = s2n(&str[yypmatch[2]..yypmatch[3]]);
                      let minor = s2n(&str[yypmatch[4]..yypmatch[5]]);
                      let patch = if yypmatch[6] == NONE {0}
                          else {s2n(&str[yypmatch[6] + 1..yypmatch[7]])};

                      return Some(SemVer(major, minor, patch));
                  }
                  * { return None; }
              */
          }

          fn main() {
              assert_eq!(parse(b"23.34\0"), Some(SemVer(23, 34, 0)));
              assert_eq!(parse(b"1.2.99999\0"), Some(SemVer(1, 2, 99999)));
              assert_eq!(parse(b"1.a\0"), None);
          }

       Here is an example of using m-tags to parse a version with  a  variable
       number of components. Tag variables are stored in a trie.

          // re2rust $INPUT -o $OUTPUT

          const NONE: usize = std::usize::MAX;
          const MTAG_ROOT: usize = NONE - 1;

          // An m-tag tree is a way to store histories with an O(1) copy operation.
          // Histories naturally form a tree, as they have common start and fork at some
          // point. The tree is stored as an array of pairs (tag value, link to parent).
          // An m-tag is represented with a single link in the tree (array index).
          type MtagTrie = Vec::<MtagElem>;
          struct MtagElem {
              elem: usize, // tag value
              pred: usize, // index of the predecessor node or root
          }

          // Append a single value to an m-tag history.
          fn add_mtag(trie: &mut MtagTrie, mtag: usize, value: usize) -> usize {
              trie.push(MtagElem{elem: value, pred: mtag});
              return trie.len() - 1;
          }

          // Recursively unwind tag histories and collect version components.
          fn unwind(trie: &MtagTrie, x: usize, y: usize, str: &[u8], ver: &mut Ver) {
              // Reached the root of the m-tag tree, stop recursion.
              if x == MTAG_ROOT && y == MTAG_ROOT { return; }

              // Unwind history further.
              unwind(trie, trie[x].pred, trie[y].pred, str, ver);

              // Get tag values. Tag histories must have equal length.
              assert!(x != MTAG_ROOT && y != MTAG_ROOT);
              let (ex, ey) = (trie[x].elem, trie[y].elem);

              if ex != NONE && ey != NONE {
                  // Both tags are valid string indices, extract component.
                  ver.push(s2n(&str[ex..ey]));
              } else {
                  // Both tags are NONE (this corresponds to zero repetitions).
                  assert!(ex == NONE && ey == NONE);
              }
          }

          type Ver = Vec::<u32>; // unbounded number of version components

          fn s2n(str: &[u8]) -> u32 { // convert a pre-parsed string to a number
              let mut n = 0;
              for i in str { n = n * 10 + *i as u32 - 48; }
              return n;
          }

          fn parse(str: &[u8]) -> Option<Ver> {
              let (mut cur, mut mar) = (0, 0);
              let mut mt: MtagTrie = Vec::new();

              // User-defined tag variables that are available in semantic action.
              let (t1, t2, t3, t4);

              // Autogenerated tag variables used by the lexer to track tag values.
              /*!stags:re2c format = 'let mut @@ = NONE;'; */
              /*!mtags:re2c format = 'let mut @@ = MTAG_ROOT;'; */

              /*!re2c
                  re2c:define:YYCTYPE   = u8;
                  re2c:define:YYPEEK    = "*str.get_unchecked(cur)";
                  re2c:define:YYSKIP    = "cur += 1;";
                  re2c:define:YYBACKUP  = "mar = cur;";
                  re2c:define:YYRESTORE = "cur = mar;";
                  re2c:define:YYSTAGP   = "@@ = cur;";
                  re2c:define:YYSTAGN   = "@@ = NONE;";
                  re2c:define:YYMTAGP   = "@@ = add_mtag(&mut mt, @@, cur);";
                  re2c:define:YYMTAGN   = "@@ = add_mtag(&mut mt, @@, NONE);";
                  re2c:yyfill:enable = 0;
                  re2c:tags = 1;

                  num = [0-9]+;

                  @t1 num @t2 ("." #t3 num #t4)* [\x00] {
                      let mut ver: Ver = Vec::new();
                      ver.push(s2n(&str[t1..t2]));
                      unwind(&mt, t3, t4, str, &mut ver);
                      return Some(ver);
                  }
                  * { return None; }
              */
          }

          fn main() {
              assert_eq!(parse(b"1\0"), Some(vec![1]));
              assert_eq!(parse(b"1.2.3.4.5.6.7\0"), Some(vec![1, 2, 3, 4, 5, 6, 7]));
              assert_eq!(parse(b"1.2.\0"), None);
          }

ENCODING SUPPORT
       It  is  necessary  to understand the difference between code points and
       code units. A code point is a numeric identifier of a  symbol.  A  code
       unit is the smallest unit of storage in the encoded text. A single code
       point may be represented with one or more code units. In a fixed-length
       encoding  all  code points are represented with the same number of code
       units. In a variable-length encoding code  points  may  be  represented
       with  a  different  number of code units.  Note that the "any" rule [^]
       matches any code point, but not necessarily any code unit (the only way
       to  match  any code unit regardless of the encoding is the default rule
       *).  The generated lexer works with a stream of code units: yych stores
       a code unit, and YYCTYPE is the code unit type. Regular expressions, on
       the other hand, are specified in terms of code points. When  re2c  com-
       piles regular expressions to automata it translates code points to code
       units. This is generally not a simple mapping: in  variable-length  en-
       codings  a single code point range may get translated to a complex code
       unit graph.  The following encodings are supported:

       • ASCII (enabled by default). It is a fixed-length encoding  with  code
         space [0-255] and 1-byte code points and code units.

       • EBCDIC  (enabled  with  --ebcdic  or  re2c:encoding:ebcdic).  It is a
         fixed-length encoding with code space [0-255] and 1-byte code  points
         and code units.

       • UCS2   (enabled   with   --ucs2   or  re2c:encoding:ucs2).  It  is  a
         fixed-length encoding with code  space  [0-0xFFFF]  and  2-byte  code
         points and code units.

       • UTF8  (enabled  with  --utf8  or  re2c:encoding:utf8).  It is a vari-
         able-length Unicode encoding. Code unit size is 1 byte.  Code  points
         are represented with 1 -- 4 code units.

       • UTF16  (enabled  with  --utf16 or re2c:encoding:utf16). It is a vari-
         able-length Unicode encoding. Code unit size is 2 bytes. Code  points
         are represented with 1 -- 2 code units.

       • UTF32   (enabled  with  --utf32  or  re2c:encoding:utf32).  It  is  a
         fixed-length Unicode encoding with code space [0-0x10FFFF] and 4-byte
         code points and code units.
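
       The difference between code points and code units can be checked with
       the Rust standard library. The sketch below is independent of re2c; the
       helper names code_units and to_utf32 are hypothetical. It counts the
       code units needed for one code point in UTF8, UTF16 and UTF32, and
       converts a string to UTF32 code units.

```rust
// Number of code units needed to encode `c` in UTF-8, UTF-16 and UTF-32.
// UTF-32 is fixed-length, so it always needs exactly one code unit.
fn code_units(c: char) -> (usize, usize, usize) {
    (c.len_utf8(), c.len_utf16(), 1)
}

// Encode a string as a sequence of UTF-32 code units (one per code point).
fn to_utf32(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}
```

       For the code point U+2200 this gives (3, 1, 1): three UTF8 code units
       (0xe2, 0x88, 0x80), one UTF16 code unit and one UTF32 code unit.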

       Include  file  include/unicode_categories.re  provides re2c definitions
       for the standard Unicode categories.

       Option --input-encoding specifies the source file encoding, which can
       be used to enable Unicode literals in regular expressions. For example,
       --input-encoding utf8 tells re2c that the source file is in UTF8 (it
       differs from --utf8, which sets the input text encoding). Option
       --encoding-policy specifies the way re2c handles Unicode surrogates
       (code points in range [0xD800-0xDFFF]).

       Below is an example of a lexer for UTF8 encoded Unicode identifiers.

          // re2rust $INPUT -o $OUTPUT --utf8

          /*!include:re2c "unicode_categories.re" */

          fn lex(str: &[u8]) -> bool {
              let (mut cur, mut mar) = (0, 0);
              /*!re2c
                  re2c:define:YYCTYPE   = u8;
                  re2c:define:YYPEEK    = "*str.get_unchecked(cur)";
                  re2c:define:YYSKIP    = "cur += 1;";
                  re2c:define:YYBACKUP  = "mar = cur;";
                  re2c:define:YYRESTORE = "cur = mar;";
                  re2c:yyfill:enable = 0;

                  // Simplified "Unicode Identifier and Pattern Syntax"
                  // (see https://unicode.org/reports/tr31)
                  id_start    = L | Nl | [$_];
                  id_continue = id_start | Mn | Mc | Nd | Pc | [\u200D\u05F3];
                  identifier  = id_start id_continue*;

                  identifier { return true; }
                  *          { return false; }
              */
          }

          fn main() {
              assert!(lex("_Ыдентификатор\0".as_bytes()));
          }

INCLUDE FILES
       re2c  allows one to include other files using directive /*!include:re2c
       FILE */ or !include FILE ;, where FILE is a path to the file to be  in-
       cluded.   The first form should be used outside of re2c blocks, and the
       second form allows one to include a file in the middle of a re2c block.
       re2c  looks  for  included files in the directory of the including file
       and in include locations, which can be specified with -I  option.   In-
       clude  directives  in  re2c work in the same way as C/C++ #include: the
       contents of FILE are copy-pasted verbatim in place  of  the  directive.
       Include files may have further includes of their own. Use --depfile op-
       tion to track build dependencies of the output file on  include  files.
       re2c  provides  some  predefined include files that can be found in the
       include/ subdirectory of the project. These files  contain  definitions
       that  can  be useful to other projects (such as Unicode categories) and
       form something like a standard library for re2c.  Below is an example
       of using the include directive.

   Include file 1 (definitions.rs)
          #[derive(Debug, PartialEq)]
          enum Num { Int, Float, NaN }

          /*!re2c
              number = [1-9][0-9]*;
          */

   Include file 2 (extra_rules.re.inc)
          // floating-point numbers
          frac  = [0-9]* "." [0-9]+ | [0-9]+ ".";
          exp   = 'e' [+-]? [0-9]+;
          float = frac exp? | [0-9]+ exp;

          float { return Num::Float; }

   Input file
          // re2rust $INPUT -o $OUTPUT

          /*!include:re2c "definitions.rs" */

          fn lex(str: &[u8]) -> Num {
              let mut cur = 0;
              let mut mar = 0;
              /*!re2c
                  re2c:yyfill:enable = 0;
                  re2c:define:YYCTYPE   = u8;
                  re2c:define:YYPEEK    = "*str.get_unchecked(cur)";
                  re2c:define:YYSKIP    = "cur += 1;";
                  re2c:define:YYBACKUP  = "mar = cur;";
                  re2c:define:YYRESTORE = "cur = mar;";

                  *      { return Num::NaN; }
                  number { return Num::Int; }
                  !include "extra_rules.re.inc";
              */
          }

          fn main() {
              assert_eq!(lex(b"123\0"), Num::Int);
              assert_eq!(lex(b"123.4567\0"), Num::Float);
          }

HEADER FILES
       re2c allows one to generate a header file from the input .re file using
       option -t, --type-header or configuration re2c:flags:type-header and
       directives /*!header:re2c:on*/ and /*!header:re2c:off*/. The first
       directive marks the beginning of the header file, and the second
       directive marks the end of it. Everything between these directives is
       processed by re2c, and the generated code is written to the file
       specified by the -t --type-header option (or stdout if this option was
       not used). An autogenerated header file may be needed in cases when re2c
       is used to generate definitions of constants, variables and structs
       that must be visible from other translation units.

       Here is an example of generating a header file that contains the
       definition of the lexer state with tag variables (the number of
       variables depends on the regular grammar and is unknown to the
       programmer).

   Input file
          // re2rust $INPUT -o $OUTPUT --header lexer/state.rs

          mod lexer;
          use lexer::state::State; // the module is generated by re2c

          /*!header:re2c:on*/
          pub struct State<'a> {
              pub str: &'a [u8],
              pub cur: usize,
              /*!stags:re2c format = "pub @@: usize,"; */
          }
          /*!header:re2c:off*/

          fn lex(st: &mut State) -> usize {
              let t: usize;
              /*!re2c
                  re2c:header = "lexer/state.rs";
                  re2c:yyfill:enable = 0;
                  re2c:define:YYCTYPE = "u8";
                  re2c:define:YYPEEK  = "*st.str.get_unchecked(st.cur)";
                  re2c:define:YYSKIP  = "st.cur += 1;";
                  re2c:define:YYSTAGP = "@@ = st.cur;";
                  re2c:tags = 1;
                  re2c:tags:expression = "st.@@";

                  [a]* @t [b]* { return t; }
              */
          }

          fn main() {
              let mut st = State {
                  str: b"ab\0",
                  cur: 0,
                  /*!stags:re2c format = "@@: 0,"; */
              };
              assert_eq!(lex(&mut st), 1);
          }

   Header file
          /* Generated by re2c */

          pub struct State<'a> {
              pub str: &'a [u8],
              pub cur: usize,
              pub yyt1: usize,
          }

SKELETON PROGRAMS
       With the -S, --skeleton option, re2c ignores all non-re2c code and gen-
       erates a self-contained C program that can be further compiled and exe-
       cuted. The program consists of lexer code and input data. For each con-
       structed DFA (block or condition) re2c generates a standalone lexer and
       two files: an .input file with strings derived from the DFA and a .keys
       file  with  expected  match results. The program runs each lexer on the
       corresponding .input file and compares results with  the  expectations.
       Skeleton programs are very useful for a number of reasons:

       • They can check correctness of various re2c optimizations (the data is
         generated early in the process, before any DFA  transformations  have
         taken place).

       • Generating  a  set of input data with good coverage may be useful for
         both testing and benchmarking.

       • Generating self-contained executable programs allows one to get mini-
         mized test cases (the original code may be large or have a lot of de-
         pendencies).

       The difficulty with generating input data is that for all but the most
       trivial cases the number of possible input strings is too large (even
       if the string length is limited). re2c solves this difficulty by
       generating sufficiently many strings to cover almost all DFA
       transitions. It uses the following algorithm. First, it constructs a
       skeleton of the DFA. For encodings with 1-byte code unit size (such as
       ASCII, UTF-8 and EBCDIC) the skeleton is just an exact copy of the
       original DFA. For encodings with multibyte code units the skeleton is a
       copy of the DFA with certain transitions omitted: namely, re2c takes at
       most 256 code units for each disjoint continuous range that corresponds
       to a DFA transition. The chosen values are evenly distributed and
       include range bounds. Instead of trying to cover all possible paths in
       the skeleton (which is infeasible) re2c generates sufficiently many
       paths to cover all skeleton transitions, and thus trigger the
       corresponding conditional jumps in the lexer. The algorithm
       implementation is limited to ~1Gb of transitions and consumes a
       constant amount of memory (re2c writes data to a file as soon as it is
       generated).
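
       The selection of code units for a transition range can be sketched as
       follows. This only illustrates the idea of picking evenly distributed
       values that include the range bounds. It is not re2c's actual
       implementation: the function name sample_range and the exact rounding
       are assumptions.

```rust
// Hypothetical helper: pick at most `max` values from the inclusive range
// [lo, hi], evenly distributed and always including both bounds.
fn sample_range(lo: u32, hi: u32, max: u32) -> Vec<u32> {
    assert!(lo <= hi && max >= 2);
    let span = (hi - lo) as u64; // number of values in the range minus one
    if span < max as u64 {
        // Small range: it has at most `max` values, so take all of them.
        return (lo..=hi).collect();
    }
    // Large range: interpolate `max` points between the two bounds.
    // 64-bit arithmetic keeps `i * span` from overflowing.
    (0..max as u64)
        .map(|i| lo + (i * span / (max as u64 - 1)) as u32)
        .collect()
}
```

       For example, sample_range(10, 20, 3) yields 10, 15 and 20, while a
       range of 26 values sampled with a limit of 256 is returned in full.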

VISUALIZATION AND DEBUG
       With the -D, --emit-dot option, re2c does not generate  code.  Instead,
       it dumps the generated DFA in DOT format.  One can convert this dump to
       an image of the DFA using Graphviz or another library.  Note that  this
       option  shows the final DFA after it has gone through a number of opti-
       mizations and transformations. Earlier stages can be dumped with  vari-
       ous  debug  options,  such  as --dump-nfa, --dump-dfa-raw etc. (see the
       full list of options).

SEE ALSO
       You can find more information about re2c at the official website:
       http://re2c.org. Similar programs are flex(1), lex(1) and quex
       (http://quex.sourceforge.net).

AUTHORS
       re2c was originally written by Peter Bumbulis in 1993. Since then it
       has been developed and maintained by multiple volunteers; most notably,
       Brian Young, Marcus Boerger, Dan Nuffer and Ulya Trofimovich.

                                                                       RE2C(1)
