Help Page - SS_ExtractTable
(Some help pages may not display correctly in HTML because they contain sample code, parts of which may be misinterpreted as HTML tags.
All help pages, including this one, are available in biterScripting with the help command.)
Sample Script
SS_ExtractTable
Purpose
Extracts a table from a web page.
Source Code
#####################################################################
# SCRIPT: SS_ExtractTable.txt
#
# This script extracts a table from an html input. It returns a portion of the web page
# that contains the table, starting with <table and ending with </table> .
#
# The html input is passed as input argument/FVA $input to this script.
#
# A web page may have several tables. To extract the correct table, these tables are numbered
# starting at 1. For example, the 12th table starts at the 12th instance of <table in the web
# page. The number of the table to extract is passed to this script as input argument/FVA $number.
# It can be 1, 12, etc. If a number is not specified, 1 is assumed. If $number is higher than the
# number of tables in the web page, 1 is assumed. When $number is 1, the table starting
# at the first instance of <table is extracted. This is the first outermost table.
#
# A web page may contain nested tables. So, if there are any tables inside the table being
# extracted, the inside tables will be removed.
#
# Download this script into directory C:/Scripts to a file named SS_ExtractTable.txt.
# Then call it as below.
#
# var str s1
# cat "C:/mypage.html" > $s1
# script "C:/Scripts/SS_ExtractTable.txt" input($s1) number(12)
#
# The above will produce text output on screen. If you want to save the output to a str variable
# for doing further processing with it, redirect the output as below -
#
# var str s2
# script "C:/Scripts/SS_ExtractTable.txt" input($s1) number(12) > $s2
#
# The script can be edited to meet your requirements more precisely.
#
# If you don't have biterScripting, you can download it from biterscripting.com .
# Install all sample scripts using the following command
#
# script "http://www.biterscripting.com/Download/SS_AllSamples.txt"
#
#####################################################################
var str input # html input
var int number # Table number to extract
# Is the table number specified ?
if ($number <= 0)
    set $number = 1
endif
# Are there that many tables in the input ?
if ( { sen -c "^<table^" $input } < $number )
    set $number = 1
endif
# Remove the portion before the number'th instance of <table .
stex -c ("]^<table^"+makestr(int($number))) $input > null
# For example, if $number is 12, the stex command expects its argument in the form
# ]^<table^12
# The expression ("]^<table^"+makestr(int($number))) produces the argument in that form.
# Extract the portion up to the next </table> . Going forward, we will collect output in a
# separate variable $output.
var str output
stex -c -r "^</table&\>^]" $input >> $output
# Note that, with the -r option, we are passing a regular expression to the above stex command.
# The character > has a special meaning in regular expressions. So, we must escape it with
# a backslash.
# Since the tables may be nested, the immediately following </table> is not always the end of the
# number'th table. This happens if there are tables inside the table being extracted. We want to
# remove these inside tables. To do this, we count the number of instances of <table . If it is 1,
# there are no inside tables. If it is higher than 1, there are inside tables. For each inside
# table, we will remove the innermost <table...</table> pair.
var int count
set $count = { sen -c "^<table^" $output }
while ( $count > 1 )
do
    # Remove the innermost table. The last <table...</table> pair is the innermost table.
    stex -c "[^<table^l" $output > null
    # Get the portion up to the next </table> .
    stex -c -r "^</table&\>^]" $input >> $output
    # Get the count again.
    set $count = { sen -c "^<table^" $output }
done
# We are done. Output the extracted table.
echo $output
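
Illustration (not part of the script)
For readers who want to check the logic outside biterScripting, the same steps can be sketched in Python. This is a rough re-expression under stated assumptions, not the script's implementation: the function name extract_table and both regular expressions are hypothetical, matching is made case-insensitive on the assumption that this is what the -c option does, and an input with no tables returns an empty string, which the script above does not specify.

```python
import re

# Assumed patterns: "<table" opens a table, "</table ... >" closes one.
TABLE_OPEN = re.compile(r"<table", re.IGNORECASE)
TABLE_CLOSE = re.compile(r"</table[^>]*>", re.IGNORECASE)


def take_through_close(text):
    """Split text just after the first </table>; return (head, tail)."""
    match = TABLE_CLOSE.search(text)
    if match is None:
        return text, ""
    return text[:match.end()], text[match.end():]


def extract_table(html, number=1):
    """Return the number'th table of html with nested tables removed."""
    starts = [m.start() for m in TABLE_OPEN.finditer(html)]
    if not starts:
        return ""  # assumption: the script does not define this case
    # Same defaults as the script: fall back to table 1 when the number
    # is missing, not positive, or larger than the number of tables.
    if number <= 0 or number > len(starts):
        number = 1
    # Drop everything before the number'th "<table".
    rest = html[starts[number - 1]:]
    # Collect up to and including the next "</table>".
    output, rest = take_through_close(rest)
    # More than one "<table" in the output means an inner table is present.
    while len(TABLE_OPEN.findall(output)) > 1:
        # Cut from the last "<table" to the end: that is the innermost table.
        last = [m.start() for m in TABLE_OPEN.finditer(output)][-1]
        output = output[:last]
        if not rest:
            break  # malformed input: no closing tag left to read
        # Continue to the next "</table>" in the remaining input.
        piece, rest = take_through_close(rest)
        output += piece
    return output
```

As in the script, each pass of the loop removes one inner table and then reads forward to the next closing tag, so the loop ends when only the outer <table remains in the collected output.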
© 2008-2013, biterScripting.com. All rights reserved.
biterScripting, biterScript, biterBrowser, biterMobile, biterScripting.com, FVA (Forward Variable Assignment) are trademarks of biterScripting.com. Is it biterScripting-compatible ? is a service mark of biterScripting.com.
Explorer, Unix, Windows are trademarks, service marks or other forms of intellectual property of their respective owners.