Tcl_RegExpMatch(3tcl)       Tcl Library Procedures       Tcl_RegExpMatch(3tcl)

______________________________________________________________________________

NAME
       Tcl_RegExpMatch,  Tcl_RegExpCompile,  Tcl_RegExpExec,  Tcl_RegExpRange,
       Tcl_GetRegExpFromObj, Tcl_RegExpMatchObj,  Tcl_RegExpExecObj,  Tcl_Reg-
       ExpGetInfo - Pattern matching with regular expressions

SYNOPSIS
       #include <tcl.h>

       int
       Tcl_RegExpMatchObj(interp, textObj, patObj)

       int
       Tcl_RegExpMatch(interp, text, pattern)

       Tcl_RegExp
       Tcl_RegExpCompile(interp, pattern)

       int
       Tcl_RegExpExec(interp, regexp, text, start)

       void
       Tcl_RegExpRange(regexp, index, startPtr, endPtr)

       Tcl_RegExp
       Tcl_GetRegExpFromObj(interp, patObj, cflags)

       int
       Tcl_RegExpExecObj(interp, regexp, textObj, offset, nmatches, eflags)

       void
       Tcl_RegExpGetInfo(regexp, infoPtr)

ARGUMENTS
       Tcl_Interp *interp (in)              Tcl  interpreter  to use for error
                                            reporting.  The interpreter may be
                                            NULL  if no error reporting is de-
                                            sired.

       Tcl_Obj *textObj (in/out)            Refers to the value from which  to
                                            get  the  text to search.  The in-
                                            ternal representation of the value
                                            may  be  converted  to a form that
                                            can be efficiently searched.

       Tcl_Obj *patObj (in/out)             Refers to the value from which  to
                                            get a regular expression. The com-
                                            piled regular expression is cached
                                            in the value.

       const char *text (in)                Text  to search for a match with a
                                            regular expression.

       const char *pattern (in)             String in the form  of  a  regular
                                            expression pattern.

       Tcl_RegExp regexp (in)               Compiled regular expression.  Must
                                            have been returned  previously  by
                                            Tcl_GetRegExpFromObj or Tcl_RegEx-
                                            pCompile.

       const char *start (in)               If text is just a portion of  some
                                            other  string, this argument iden-
                                            tifies the beginning of the larger
                                            string.   If it is not the same as
                                            text, then no “^” matches will  be
                                            allowed.

       int index (in)                       Specifies  which range is desired:
                                            0 means the range  of  the  entire
                                            match,  1  or  greater  means  the
                                            range that matched a parenthesized
                                            sub-expression.

       const char **startPtr (out)          The address of the first character
                                            in the range is  stored  here,  or
                                            NULL if there is no such range.

       const char **endPtr (out)            The  address of the character just
                                            after the last one in the range is
                                            stored  here,  or NULL if there is
                                            no such range.

       int cflags (in)                      OR-ed combination of the  compila-
                                            tion    flags    TCL_REG_ADVANCED,
                                            TCL_REG_EXTENDED,   TCL_REG_BASIC,
                                            TCL_REG_EXPANDED,   TCL_REG_QUOTE,
                                            TCL_REG_NOCASE,   TCL_REG_NEWLINE,
                                            TCL_REG_NLSTOP,    TCL_REG_NLANCH,
                                            TCL_REG_NOSUB,  and   TCL_REG_CAN-
                                            MATCH. See below for more informa-
                                            tion.

       int offset (in)                      The character offset into the text
                                            where  matching should begin.  The
                                            value of the offset has no  impact
                                            on  ^  matches.   This behavior is
                                            controlled by eflags.

       int nmatches (in)                    The number of matching  subexpres-
                                            sions  that  should  be remembered
                                            for later use.  If this  value  is
                                            0, then no subexpression match in-
                                            formation will  be  computed.   If
                                            the  value  is -1, then all of the
                                            matching  subexpressions  will  be
                                            remembered.   Any other value will
                                            be taken as the maximum number  of
                                            subexpressions to remember.

       int eflags (in)                      OR-ed combination of the execution
                                            flags      TCL_REG_NOTBOL      and
                                            TCL_REG_NOTEOL. See below for more
                                            information.

       Tcl_RegExpInfo *infoPtr (out)        The address of the location  where
                                            information about a previous match
                                            should be stored by Tcl_RegExpGet-
                                            Info.
______________________________________________________________________________

DESCRIPTION
       Tcl_RegExpMatch determines whether its text argument matches pattern,
       where pattern is interpreted as a regular expression using the rules in
       the re_syntax reference page.  If there is a match then Tcl_RegExpMatch
       returns 1.  If there is no match then Tcl_RegExpMatch returns 0.  If an
       error occurs in the matching process (e.g. pattern is not a valid regu-
       lar expression) then Tcl_RegExpMatch returns -1  and  leaves  an  error
       message  in  the  interpreter result.  Tcl_RegExpMatchObj is similar to
       Tcl_RegExpMatch except it operates on the Tcl values textObj and patObj
       instead of UTF strings.  Tcl_RegExpMatchObj is generally more efficient
       than Tcl_RegExpMatch, so it is the preferred interface.

       Tcl_RegExpCompile, Tcl_RegExpExec, and Tcl_RegExpRange  provide  lower-
       level access to the regular expression pattern matcher.  Tcl_RegExpCom-
       pile compiles a regular expression string into the internal  form  used
       for  efficient  pattern matching.  The return value is a token for this
       compiled form, which can be used in subsequent calls to  Tcl_RegExpExec
       or Tcl_RegExpRange.  If an error occurs while compiling the regular ex-
       pression then Tcl_RegExpCompile returns NULL and leaves an  error  mes-
       sage  in the interpreter result.  Note:  the return value from Tcl_Reg-
       ExpCompile is only valid up to the next call to Tcl_RegExpCompile;   it
       is not safe to retain these values for long periods of time.

       Tcl_RegExpExec executes the regular expression pattern matcher.  It re-
       turns 1 if text contains a range of characters that match regexp, 0  if
       no match is found, and -1 if an error occurs.  In the case of an error,
       Tcl_RegExpExec leaves an error message in the interpreter result.  When
       searching  a  string for multiple matches of a pattern, it is important
       to distinguish between the start of the original string and  the  start
       of  the current search.  For example, when searching for the second oc-
       currence of a match, the text argument might  point  to  the  character
       just  after  the first match;  however, it is important for the pattern
       matcher to know that this is not the start of  the  entire  string,  so
       that  it  does  not allow “^” atoms in the pattern to match.  The start
       argument provides this information by pointing  to  the  start  of  the
       overall  string  containing  text.  Start will be less than or equal to
       text;  if it is less than text then no ^ matches will be allowed.

       Tcl_RegExpRange may be invoked after Tcl_RegExpExec returns;   it  pro-
       vides detailed information about what ranges of the string matched what
       parts of the pattern.  Tcl_RegExpRange returns a pair  of  pointers  in
       *startPtr and *endPtr that identify a range of characters in the source
       string for the most recent call  to  Tcl_RegExpExec.   Index  indicates
       which  of  several ranges is desired: if index is 0, information is re-
       turned about the overall range of characters that  matched  the  entire
       pattern;  otherwise, information is returned about the range of charac-
       ters that matched the index'th parenthesized subexpression  within  the
       pattern.   If  there  is  no  range corresponding to index then NULL is
       stored in *startPtr and *endPtr.

       Tcl_GetRegExpFromObj,  Tcl_RegExpExecObj,  and  Tcl_RegExpGetInfo   are
       value  interfaces  that  provide  the  most  direct  control  of  Henry
       Spencer's regular expression library.  Callers that need to set
       compilation and execution options directly should use these interfaces
       rather than calling the internal regexp functions.  These interfaces
       handle the details of UTF to Unicode translation and improve
       performance through caching in the pattern and string values.

       Tcl_GetRegExpFromObj  attempts  to return a compiled regular expression
       from the patObj.  If the value does not already contain a compiled reg-
       ular  expression  it  will attempt to create one from the string in the
       value and assign it to the internal representation of the patObj.   The
       return  value of this function is of type Tcl_RegExp.  The return value
       is a token for this compiled form, which  can  be  used  in  subsequent
       calls  to  Tcl_RegExpExecObj  or Tcl_RegExpGetInfo.  If an error occurs
       while compiling the regular expression  then  Tcl_GetRegExpFromObj  re-
       turns  NULL and leaves an error message in the interpreter result.  The
       regular expression token can be used as long as the internal  represen-
       tation of patObj refers to the compiled form.  The cflags argument is a
       bit-wise OR of zero or more of the following  flags  that  control  the
       compilation of patObj:

         TCL_REG_ADVANCED
                Compile advanced regular expressions (“ARE”s).  This mode cor-
                responds to the normal regular expression syntax  accepted  by
                the Tcl regexp and regsub commands.

         TCL_REG_EXTENDED
                Compile extended regular expressions (“ERE”s).  This mode cor-
                responds to the regular expression syntax  recognized  by  Tcl
                8.0 and earlier versions.

         TCL_REG_BASIC
                Compile  basic regular expressions (“BRE”s).  This mode corre-
                sponds to the regular expression syntax recognized  by  common
                Unix  utilities  like sed and grep.  This is the default if no
                flags are specified.

         TCL_REG_EXPANDED
                Compile the regular expression (basic, extended, or  advanced)
                using  an expanded syntax that allows comments and whitespace.
                This mode causes non-backslashed non-bracket-expression  white
                space and #-to-end-of-line comments to be ignored.

         TCL_REG_QUOTE
                Compile a literal string, with all characters treated as ordi-
                nary characters.

         TCL_REG_NOCASE
                Compile for matching that ignores  upper/lower  case  distinc-
                tions.

         TCL_REG_NEWLINE
                Compile  for  newline-sensitive matching.  By default, newline
                is a completely ordinary character with no special meaning  in
                either  regular  expressions or strings.  With this flag, “[^”
                bracket expressions and “.”  never match newline, “^”  matches
                an  empty  string  after any newline in addition to its normal
                function, and “$” matches an empty string before  any  newline
                in addition to its normal function.  TCL_REG_NEWLINE is the
                bit-wise OR of TCL_REG_NLSTOP and TCL_REG_NLANCH.

         TCL_REG_NLSTOP
                Compile for partial newline-sensitive matching, with  the  be-
                havior  of “[^” bracket expressions and “.”  affected, but not
                the behavior of “^” and “$”.  In this mode, “[^”  bracket  ex-
                pressions and “.”  never match newline.

         TCL_REG_NLANCH
                Compile  for  inverse partial newline-sensitive matching, with
                the behavior of “^” and “$” (the “anchors”) affected, but  not
                the  behavior  of  “[^”  bracket expressions and “.”.  In this
                mode “^” matches an empty string after any newline in addition
                to its normal function, and “$” matches an empty string before
                any newline in addition to its normal function.

         TCL_REG_NOSUB
                Compile for matching that reports only success or failure, not
                what  was  matched.  This reduces compile overhead and may im-
                prove performance.  Subsequent calls to  Tcl_RegExpGetInfo  or
                Tcl_RegExpRange will not report any match information.

         TCL_REG_CANMATCH
                Compile  for matching that reports the potential to complete a
                partial match given more text (see below).

       Only one  of  TCL_REG_EXTENDED,  TCL_REG_ADVANCED,  TCL_REG_BASIC,  and
       TCL_REG_QUOTE may be specified.

       Tcl_RegExpExecObj  executes the regular expression pattern matcher.  It
       returns 1 if textObj contains a range of characters that match regexp, 0
       if no match is found, and -1 if an error occurs.  In the case of an er-
       ror, Tcl_RegExpExecObj leaves an error message in the  interpreter  re-
       sult.   The nmatches value indicates to the matcher how many subexpres-
       sions are of interest.  If nmatches is 0, then no  subexpression  match
       information  is  recorded,  which may allow the matcher to make various
       optimizations.  If the value is -1, then all of the  subexpressions  in
       the  pattern  are remembered.  If the value is a positive integer, then
       only that number of subexpressions will be remembered.  Matching begins
       at  the  specified  Unicode  character  index  given by offset.  Unlike
       Tcl_RegExpExec, the behavior of anchors is not affected by  the  offset
       value.  Instead the behavior of the anchors is explicitly controlled by
       the eflags argument, which is a bit-wise OR of zero or more of the fol-
       lowing flags:

         TCL_REG_NOTBOL
                The starting character will not be treated as the beginning of
                a line or the beginning of the string, so “^” will  not  match
                there.  Note that this flag has no effect on how “\A” matches.

         TCL_REG_NOTEOL
                The  last  character  in the string will not be treated as the
                end of a line or the end of the string, so “$” will not  match
                there.  Note that this flag has no effect on how “\Z” matches.

       Tcl_RegExpGetInfo  retrieves information about the last match performed
       with a given regular expression regexp.  The infoPtr argument  contains
       a pointer to a structure that is defined as follows:

              typedef struct Tcl_RegExpInfo {
                  int nsubs;
                  Tcl_RegExpIndices *matches;
                  long extendStart;
              } Tcl_RegExpInfo;

       The nsubs field contains a count of the number of parenthesized
       subexpressions within the regular expression.  If the TCL_REG_NOSUB
       flag was used, then this value will be zero.  The matches field points
       to an array of nsubs+1 values that indicate the bounds of each
       subexpression matched.  The first element in the array refers to the
       range matched by the entire regular expression, and subsequent elements
       refer to the parenthesized subexpressions in the order that they appear
       in the pattern.  Each element is a structure that is defined as follows:

              typedef struct Tcl_RegExpIndices {
                  long start;
                  long end;
              } Tcl_RegExpIndices;

       The start and end values are Unicode character indices relative to  the
       offset location within textObj where matching began.  The start index
       identifies the first character of the matched subexpression.   The  end
       index  identifies  the first character after the matched subexpression.
       If the subexpression matched the empty string, then start and end  will
       be  equal.  If the subexpression did not participate in the match, then
       start and end will be set to -1.

       The extendStart field in Tcl_RegExpInfo is only set if the TCL_REG_CAN-
       MATCH  flag  was  used.  It indicates the first character in the string
       where a match could occur.  If a match was found, this will be the same
       as  the beginning of the current match.  If no match was found, then it
       indicates the earliest point at which a match might occur if additional
       text is appended to the string.  If no match is possible even
       with further text, this field will be set to -1.

SEE ALSO
       re_syntax(3tcl)

KEYWORDS
       match, pattern, regular expression, string,  subexpression,  Tcl_RegEx-
       pIndices, Tcl_RegExpInfo

Tcl                                   8.1                Tcl_RegExpMatch(3tcl)

Generated by dwww version 1.14 on Fri Nov 22 20:17:31 CET 2024.