Command
stex
Purpose
Editor - String extractor
Aliases
stringextractor, strext, stex
Syntax
stex [ <options> ] " [<start_bounder>] ^ <search_string> ^ [<n>] [<end_bounder>] " <input_string>
Options
-c Case insensitive. Case is ignored when searching for the
<search_string>. This option is very useful
when parsing links, emails, tags and commands in web pages.
For example, with this option, the search string "href="
matches any of the instances "HREF=", "HRef=" and "href=".
Without this option, the search is case sensitive.
Either way, the extracted string and <input_string> are
returned in their original case.
-p Preserve the input string. Without this option, when
a part of the input string (called the extraction target,
or just target) is extracted, that part is removed from
the input string. (This is done so that each subsequent
extract command will produce subsequent targets.) With this
option, the original string is left unchanged.
-r Regular expression. The <search_string> is treated as a regular
expression. See help page on RE for syntax of regular expressions.
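The following is a minimal sketch of the -c and -p options, based only on the option descriptions above. The sample text and the variable name s are hypothetical, and the set command is assumed to be available for assigning the sample value.

```
# Hypothetical sample data for illustration.
var str s
set $s = "Visit HREF=index.html or href=about.html"
# Case-sensitive search on a constant: only the lowercase "href=" matches.
stex "^href=^" "Visit HREF=index.html or href=about.html"
# With -c, the first instance found is the uppercase one.
# It is returned in its original case, as HREF=.
stex -c "^href=^" "Visit HREF=index.html or href=about.html"
# With -p, the variable keeps its full value after the extraction.
stex -p "^href=^" $s > null
echo $s
# Without -p, the extracted target is removed from the variable.
stex "^href=^" $s > null
echo $s
```

If the options behave as described, the first echo should print the sample text unchanged, and the second should print it with the lowercase "href=" removed.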
Arguments
<input_string>
The input string on which this command will operate. It
can be specified as a str constant or str variable or an
expression resulting in a str value.
If a str constant is used, we highly recommend using
double quotes around it, such as "John Doe".
Without the double quotes, the spaces in the input string
will produce errors. In case of a str constant or a str
expression, the -p option is assumed.
<n> The instance number. The input string will be searched for this
instance of the target. Instances are counted from 1. If <n> is
not specified, the first instance will be returned. If specified,
<n> must be either a number higher than 0 or the letter l (which indicates
the last instance).
<search_string>
The string to search for. The input string will be searched for
this search string.
The <search_string> must be enclosed in carets (^). The caret
is also known as the "cut here" symbol.
If the <search_string> itself contains a caret (^), an opening square bracket ([),
a closing square bracket (]) or a double quote ("), escape it with a backslash,
as \^, \[, \] or \". See help page on escape for more details.
If the -r option is specified, the <search_string> is assumed to be a regular
expression.
<start_bounder>
<end_bounder>
Each bounder can be absent, the character [ or the character ].
The <start_bounder> appears before the first cut-here symbol or caret (^).
The <end_bounder> appears after the <n>.
The following examples show the role of the bounders.
They assume that the search string is "email:".
"^email:^5"     Extract only the fifth instance of "email:".
"^email:^5["    Extract everything after, but excluding, the fifth instance of "email:".
"^email:^5]"    Extract everything up to and including the fifth instance of "email:".
"[^email:^5"    Extract everything beginning with and including the fifth instance of "email:".
"[^email:^5["   This combination is INVALID.
"[^email:^5]"   Extract only the fifth instance of "email:". This is the same as "^email:^5".
"]^email:^5"    Extract everything up to, but excluding, the fifth instance of "email:".
"]^email:^5["   Extract everything outside the fifth instance of "email:", excluding the instance itself.
"]^email:^5]"   This combination is INVALID.
Stream Input
Stream input is ignored.
Stream Output
The extracted content is added to stream output.
Stream Error
Any errors are listed here.
Description
The command extracts the target string(s) from the input string and writes them to
the stream output or the redirected output target. If <input_string> is a constant or an expression,
it remains unchanged. If <input_string> is a variable and the -p option is not
specified, the target is removed from the variable. If <input_string>
is a variable and the -p option is specified, the variable remains unchanged.
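Because the target is removed from a variable when -p is not given, successive stex commands can walk through a string piece by piece. The following is a minimal sketch of that idea, not a definitive recipe. The variable names s and part and the sample text are hypothetical, the set command is assumed to be available for assigning the sample value, and sen is assumed to return the number of instances found, as its use in the Valid Examples section suggests.

```
# Hypothetical sample data: a comma-separated list.
var str s
var str part
set $s = "red,green,blue"
# Continue while at least one comma remains in $s.
while ( { sen "^,^" $s } <> 0 )
do
    # Extract everything up to and including the first comma.
    # There is no -p option, so the extracted part is removed from $s.
    stex "^,^]" $s > $part
    echo $part
done
# $s now holds whatever followed the last comma.
echo $s
```

With this sample, each pass should take one item and its trailing comma off the front of $s, leaving only the last item when the loop ends.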
The command CAN ALSO BE USED WITH FILES. Simply read the contents
of the file into a str variable using the repro command. The command then
operates on the file contents just as it would on any other string.
The following is an example.
# Assume you want to extract each word from file "My Code.aspx".
var str content
repro "My Code.aspx" > $content
# Do something with $content
.
.
.
# Write the resulting output back to original file (if you so choose).
echo $content > "My Code.aspx"
Restrictions
If the <input_string> is specified as a constant or as a str expression,
the presence or absence of option -p is ignored. A constant never changes its value.
Valid Examples
The following code displays the copyright notice in file mypage.html.
# Get file contents into a str variable.
var str content
repro "mypage.html" > $content
# Is there a copyright notice in this file?
if ( { sen -c "^copyright^" $content } == 0 )
    echo "There is no copyright notice in file mypage.html."
else
do
    # Remove the portion up to (but excluding) "Copyright".
    stex -c "]^Copyright^" $content > null    # We don't want to see output,
                                              # we only want to modify $content.
    # Remove everything from the first newline onward. That leaves just the
    # one line containing the copyright notice. If this line is the last line,
    # nothing is removed.
    stex "[^\n^" $content > null
    # Print the copyright notice.
    echo "The following copyright notice was found in file mypage.html.\n" $content
done
endif
Next, we write the code to show copyright notices in all files in a project.
Assume that the project is in directory C:/myproject and that the files we
are looking for are .html, .aspx and .js files.
# Collect file list in a variable.
var str fileList
lf -rn "*.html" "C:/myproject" > $fileList     # Get .html files.
lf -rn "*.aspx" "C:/myproject" >> $fileList    # Add .aspx files.
lf -rn "*.js" "C:/myproject" >> $fileList      # Add .js files.
# Now we have a complete file list in $fileList.
# Declare the loop variables once, before the loop.
var str file
var str content
# Read files one by one and extract the copyright notice.
while ($fileList <> "")
do
    # Get the next file.
    lex "1" $fileList > $file
    # The following code is a copy of the example above,
    # with mypage.html replaced by $file.
    # Get file contents into a str variable.
    repro $file > $content
    # Is there a copyright notice in this file?
    if ( { sen -c "^copyright^" $content } == 0 )
        echo "There is no copyright notice in file " $file "."
    else
    do
        # Remove the portion up to (but excluding) "Copyright".
        stex -c "]^Copyright^" $content > null    # We don't want to see output,
                                                  # we only want to modify $content.
        # Remove everything from the first newline onward. That leaves just the
        # one line containing the copyright notice. If this line is the last line,
        # nothing is removed.
        stex "[^\n^" $content > null
        # Print the copyright notice.
        echo "The following copyright notice was found in file " $file ".\n" $content
    done
    endif
done
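As a smaller illustration of the bounders and instance numbers described under Arguments, the following sketch applies several valid combinations to one short string. The sample text and the variable name s are hypothetical, and the set command is assumed to be available for assigning the sample value. The results noted in the comments follow from the bounder descriptions and have not been taken from an actual run.

```
# Hypothetical sample data.
var str s
set $s = "red,green,blue"
# -p leaves $s unchanged, so every command below sees the full string.
stex -p "^,^" $s       # Only the first comma:                          ,
stex -p "]^,^" $s      # Up to, but excluding, the first comma:         red
stex -p "^,^]" $s      # Up to and including the first comma:           red,
stex -p "^,^2[" $s     # After, but excluding, the second comma:        blue
stex -p "[^,^l" $s     # From the last comma onward, including it:      ,blue
```

Using -p throughout keeps the commands independent of each other. Without it, each command would operate on what the previous one left behind.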
Invalid Examples
var int i
...
stex "^Zip Code^" $i
This produces an error because variable $i is not a str variable.
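For contrast, a minimal sketch of a call that satisfies the type requirement. The variable name address and the sample text are hypothetical, and the set command is assumed to be available for assigning the sample value.

```
# The input is declared as a str variable, as stex requires.
var str address
set $address = "City: Springfield, Zip Code 12345"
# Extract everything after, but excluding, the first "Zip Code".
stex -p "^Zip Code^[" $address
```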
See Also
systemvar
var
echo
escape
sen
sin
sap
sal
lex
wex
chex
RE