 21.5.1 Regexp Functions
 By default, Guile supports POSIX extended regular expressions.  That
 means that the characters `(', `)', `+' and `?' are special, and must
 be escaped if you wish to match the literal characters.
    This regular expression interface was modeled after that implemented
 by SCSH, the Scheme Shell.  It is intended to be upwardly compatible
 with SCSH regular expressions.
  -- Scheme Procedure: string-match pattern str [start]
      Compile the string PATTERN into a regular expression and compare
      it with STR.  The optional numeric argument START specifies the
      position of STR at which to begin matching.
      `string-match' returns a "match structure" which describes what,
      if anything, was matched by the regular expression (see Match
      Structures).  If STR does not match PATTERN at all,
      `string-match' returns `#f'.
    Two examples of a match follow.  In the first example, the pattern
 matches the four digits in the match string.  In the second, the pattern
 matches nothing.
      (string-match "[0-9][0-9][0-9][0-9]" "blah2002")
      => #("blah2002" (4 . 8))
      (string-match "[A-Za-z]" "123456")
      => #f
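    The optional START argument can be illustrated with a short sketch.
 It assumes the `(ice-9 regex)' module is loaded and uses the accessors
 `match:substring' and `match:start' from Match Structures; the sample
 string is made up for illustration.

```scheme
(use-modules (ice-9 regex))

;; Without START, the search begins at index 0 and finds the first
;; run of digits.
(define m1 (string-match "[0-9]+" "ab12cd345"))
(match:substring m1)   ; => "12"
(match:start m1)       ; => 2

;; With START = 4, the search skips "ab12" and finds the next run.
;; Offsets in the match structure still refer to the whole string.
(define m2 (string-match "[0-9]+" "ab12cd345" 4))
(match:substring m2)   ; => "345"
(match:start m2)       ; => 6
```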
    Each time `string-match' is called, it must compile its PATTERN
 argument into a regular expression structure.  This operation is
 expensive, which makes `string-match' inefficient if the same regular
 expression is used several times (for example, in a loop).  For better
 performance, you can compile a regular expression in advance and then
 match strings against the compiled regexp.
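    The compile-once approach described above might look like the
 following sketch, which uses `make-regexp' and `regexp-exec' (both
 documented in this section).  The list `lines' and the helper
 `find-years' are hypothetical names chosen for illustration.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical input data, for illustration only.
(define lines '("blah2002" "no digits here" "year 1999"))

;; Compile the pattern once, outside the loop.
(define four-digits (make-regexp "[0-9][0-9][0-9][0-9]"))

;; Return the matched text for each line, or #f where nothing matches.
;; The compiled regexp is reused for every element.
(define (find-years lines)
  (map (lambda (line)
         (let ((m (regexp-exec four-digits line)))
           (and m (match:substring m))))
       lines))

(define years (find-years lines))   ; => ("2002" #f "1999")
```

    Calling `string-match' inside the `lambda' would give the same
 result, but would recompile the pattern for every element of the list.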
  -- Scheme Procedure: make-regexp pat . flags
  -- C Function: scm_make_regexp (pat, flags)
      Compile the regular expression described by PAT, and return the
      compiled regexp structure.  If PAT does not describe a legal
      regular expression, `make-regexp' throws a
      `regular-expression-syntax' error.
      The FLAGS arguments change the behavior of the compiled regular
      expression.  The following flags may be supplied:
      `regexp/icase'
           Consider uppercase and lowercase letters to be the same when
           matching.
      `regexp/newline'
           If a newline appears in the target string, then permit the
           `^' and `$' operators to match immediately after or
           immediately before the newline, respectively.  Also, the `.'
           and `[^...]' operators will never match a newline character.
           The intent of this flag is to treat the target string as a
           buffer containing many lines of text, and the regular
           expression as a pattern that may match a single one of those
           lines.
      `regexp/basic'
           Compile a basic ("obsolete") regexp instead of the extended
           ("modern") regexps that are the default.  Basic regexps do
           not consider `|', `+' or `?' to be special characters, and
           require the `{...}' and `(...)' metacharacters to be
           backslash-escaped (see Backslash Escapes).  There are
           several other differences between basic and extended regular
           expressions, but these are the most significant.
      `regexp/extended'
           Compile an extended regular expression rather than a basic
           regexp.  This is the default behavior; this flag will not
           usually be needed.  If a call to `make-regexp' includes both
           `regexp/basic' and `regexp/extended' flags, the one which
           comes last will override the earlier one.
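    The effect of the compile-time flags can be sketched as follows.
 The sample strings and variable names are made up for illustration,
 and the basic-regexp result assumes the underlying POSIX library
 treats a bare `+' as an ordinary character in basic regexps.

```scheme
(use-modules (ice-9 regex))

(define text "first line\nsecond line")

;; Without regexp/newline, `^' matches only at the start of the string.
(define plain (make-regexp "^second"))
(regexp-exec plain text)                      ; => #f

;; With regexp/newline, `^' also matches just after the newline.
(define per-line (make-regexp "^second" regexp/newline))
(match:start (regexp-exec per-line text))     ; => 11

;; Several flags may be passed together.
(define both (make-regexp "^SECOND" regexp/newline regexp/icase))
(match:substring (regexp-exec both text))     ; => "second"

;; Extended (default): `+' is a repetition operator.
(define ext (make-regexp "a+"))
(match:substring (regexp-exec ext "ca+t"))    ; => "a"

;; Basic: `+' is an ordinary character (assumption noted above).
(define bas (make-regexp "a+" regexp/basic))
(match:substring (regexp-exec bas "ca+t"))    ; => "a+"
```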
  -- Scheme Procedure: regexp-exec rx str [start [flags]]
  -- C Function: scm_regexp_exec (rx, str, start, flags)
      Match the compiled regular expression RX against STR.  If the
      optional integer START argument is provided, begin matching from
      that position in the string.  Return a match structure describing
      the results of the match, or `#f' if no match could be found.
      The FLAGS arguments change the matching behavior.  The following
      flags may be supplied:
      `regexp/notbol'
           Operator `^' always fails (unless `regexp/newline' is used).
           Use this when the beginning of the string should not be
           considered the beginning of a line.
      `regexp/noteol'
           Operator `$' always fails (unless `regexp/newline' is used).
           Use this when the end of the string should not be considered
           the end of a line.
      ;; Regexp to match uppercase letters
      (define r (make-regexp "[A-Z]*"))
      ;; Regexp to match letters, ignoring case
      (define ri (make-regexp "[A-Z]*" regexp/icase))
      ;; Search "bob" using regexp r
      (match:substring (regexp-exec r "bob"))
      => ""                  ; only the empty string matches
      ;; Search "Bob" using regexp ri
      (match:substring (regexp-exec ri "Bob"))
      => "Bob"               ; matched case-insensitively
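    The START and FLAGS arguments of `regexp-exec' can be sketched as
 follows.  The example assumes the match-time flag `regexp/notbol' and
 that, without it, `^' may match at the START offset; `rx' and `str'
 are illustrative names.

```scheme
(use-modules (ice-9 regex))

(define rx (make-regexp "^[a-z]+"))
(define str "foo bar")

;; From the start of the string, `^' matches at index 0.
(match:substring (regexp-exec rx str))       ; => "foo"

;; Starting at index 4, `^' is allowed to match at that offset.
(match:substring (regexp-exec rx str 4))     ; => "bar"

;; With regexp/notbol the start offset is not treated as the
;; beginning of a line, so `^' cannot match there.
(regexp-exec rx str 4 regexp/notbol)         ; => #f
```

    Because FLAGS follows START in the argument list, a START value
 (for example 0) must be given whenever a flag is passed.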
  -- Scheme Procedure: regexp? obj
  -- C Function: scm_regexp_p (obj)
      Return `#t' if OBJ is a compiled regular expression, or `#f'
      otherwise.
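    A brief sketch of the predicate: a pattern string is not itself a
 compiled regexp until it has been passed through `make-regexp'.

```scheme
;; A compiled regexp satisfies the predicate.
(define compiled (make-regexp "[0-9]+"))
(regexp? compiled)     ; => #t

;; The pattern string alone does not.
(regexp? "[0-9]+")     ; => #f
```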
    Regular expressions are commonly used to find patterns in one string
 and replace them with the contents of another string.
  -- Scheme Procedure: regexp-substitute port match [item...]
      Write to the output port PORT selected contents of the match
      structure MATCH.  Each ITEM specifies what should be written, and
      may be one of the following arguments:
         * A string.  String arguments are written out verbatim.
         * An integer.  The submatch with that number is written.
         * The symbol `pre'.  The portion of the matched string preceding
           the regexp match is written.
         * The symbol `post'.  The portion of the matched string
           following the regexp match is written.
      The PORT argument may be `#f', in which case nothing is written;
      instead, `regexp-substitute' constructs a string from the
      specified ITEMs and returns that.
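    The two PORT modes can be compared in a short sketch.  The pattern
 and sample string are made up for illustration, and the example
 assumes the `(ice-9 regex)' module is loaded.

```scheme
(use-modules (ice-9 regex))

(define m (string-match "([a-z]+)@([a-z.]+)" "mail: bob@example.org now"))

;; PORT is #f: the assembled text is returned as a string.
(define as-string
  (regexp-substitute #f m 'pre 1 " at " 2 'post))
;; as-string => "mail: bob at example.org now"

;; PORT is an output port: the text is written to the port instead.
;; Here only submatches 1 and 2 are written, without `pre' or `post'.
(define as-output
  (call-with-output-string
    (lambda (port)
      (regexp-substitute port m 1 " at " 2))))
;; as-output => "bob at example.org"
```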
    The following example takes a regular expression that matches a
 standard YYYYMMDD-format date such as `"20020828"'.  The
 `regexp-substitute' call returns a string computed from the information
 in the match structure, consisting of the fields and text from the
 original string reordered and reformatted.
      (define date-regex "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
      (define s "Date 20020429 12am.")
      (define sm (string-match date-regex s))
      (regexp-substitute #f sm 'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
      => "Date 04-29-2002 12am. (20020429)"
  -- Scheme Procedure: regexp-substitute/global port regexp target [item...]
      Similar to `regexp-substitute', but can be used to perform global
      substitutions on TARGET.  Instead of taking a match structure as
      an argument, `regexp-substitute/global' takes two string
      arguments: a REGEXP string describing a regular expression, and a
      TARGET string which should be matched against this regular
      expression.
      Each ITEM behaves as in `regexp-substitute', with the following
      exceptions:
         * A function may be supplied.  When this function is called, it
           will be passed one argument: a match structure for a given
           regular expression match.  It should return a string to be
           written out to PORT.
         * The `post' symbol causes `regexp-substitute/global' to recurse
           on the unmatched portion of TARGET.  This _must_ be supplied
           in order to perform global search-and-replace on TARGET; if
           it is not present among the ITEMs, then
           `regexp-substitute/global' will return after processing a
           single match.
    The example above for `regexp-substitute' could be rewritten as
 follows to remove the `string-match' stage:
      (define date-regex "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
      (define s "Date 20020429 12am.")
      (regexp-substitute/global #f date-regex s
        'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
      => "Date 04-29-2002 12am. (20020429)"
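    The two exceptions listed for `regexp-substitute/global' (a
 procedure as an ITEM, and the role of `post') can be sketched as
 follows.  The patterns, sample strings, and variable names are made up
 for illustration.

```scheme
(use-modules (ice-9 regex))

;; With `post' present, every match is processed in turn.
(define bracketed
  (regexp-substitute/global #f "[0-9]+" "a1 b22 c333"
    'pre "[" 0 "]" 'post))
;; bracketed => "a[1] b[22] c[333]"

;; A procedure item receives the match structure and returns the
;; string to be written.
(define shouted
  (regexp-substitute/global #f "[a-z]+" "hello big world"
    'pre (lambda (m) (string-upcase (match:substring m))) 'post))
;; shouted => "HELLO BIG WORLD"

;; Without `post', processing stops after the first match, and the
;; text following that match is not written.
(define first-only
  (regexp-substitute/global #f "[0-9]+" "a1 b22" 'pre "#"))
;; first-only => "a#"
```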