  Sample Script - SS_ExtractTable

(Some sample scripts, especially web-related ones, may not display correctly in HTML because their code contains HTML tags such as <tr>.

For an accurate text version of this script that can be copied and pasted, see SS_ExtractTable.txt.)



#####################################################################
# SCRIPT: SS_ExtractTable.txt
#
# This script extracts a table from HTML input. It outputs the portion of the web page
# that contains the table, starting with <table and ending with </table> .
#
# The html input is passed as input argument/FVA $input to this script.
#
# A web page may have several tables. To select one, the tables are numbered starting at 1,
# in the order in which their <table tags appear (nested tables included). For example, the
# 12th table starts at the 12th instance of <table in the web page. The number of the table
# to extract is passed to this script as input argument/FVA $number. It can be 1, 12, etc.
# If $number is not specified, or is 0 or negative, 1 is assumed. If $number is higher than
# the number of tables in the web page, 1 is also assumed. When $number is 1, the table
# starting at the first instance of <table is extracted; this is the first outermost table.
#
# A web page may contain nested tables. If the table being extracted contains any tables
# inside it, those inner tables (including their contents) are removed from the output.
#
# Download this script into the directory C:/Scripts as a file named SS_ExtractTable.txt.
# Then call it as below.
#
# var str s1
# cat "C:/mypage.html" > $s1
# script "C:/Scripts/SS_ExtractTable.txt" input($s1) number(12)
#
# The above prints the extracted table on screen. To save the output in a str variable
# for further processing, redirect it as below:
#
# var str s2
# script "C:/Scripts/SS_ExtractTable.txt" input($s1) number(12) > $s2
#
# The script can be edited to meet your requirements more precisely.
#
# If you don't have biterScripting, you can download it from biterscripting.com .
# Install all sample scripts using the following command:
#
# script "http://www.biterscripting.com/Download/SS_AllSamples.txt"
#
#####################################################################

var str input # html input
var int number # Table number to extract

# Is the table number specified ?
if ($number <= 0)
set $number = 1
endif

# Are there that many tables in the input ?
if ( { sen -c "^<table^" $input } < $number )
set $number = 1
endif

# Remove the portion before the number'th instance of <table .
stex -c ("]^<table^"+makestr(int($number))) $input > null

# For example, if $number is 12, the stex command expects an argument of the form
# ]^<table^12
# The expression ("]^<table^"+makestr(int($number))) builds the argument in that form.

# Extract the portion up to the next </table> . Going forward, we will collect output in a
# separate variable $output.
var str output
stex -c -r "^</table&\>^]" $input >> $output

# Note that, with the -r option, we are passing a regular expression to the above stex command.
# The character > has a special meaning in regular expressions, so we must escape it with
# a backslash.

# Since tables may be nested, the next </table> is not always the end of the number'th table.
# This happens when there are tables inside the table being extracted. We want to remove these
# inside tables. To do this, we count the number of instances of <table in $output. If it is 1,
# there are no inside tables. If it is higher than 1, there are inside tables. For each inside
# table, we remove the innermost <table...</table> pair and extend $output to the next </table>.

var int count
set $count = { sen -c "^<table^" $output }
while ( $count > 1 )
do
# Remove the innermost table. Since $output ends with </table>, the portion from the last
# <table to the end of $output is the innermost <table...</table> pair.
stex -c "[^<table^l" $output > null
# Append the portion of $input up to the next </table> .
stex -c -r "^</table&\>^]" $input >> $output
# Get the count again.
set $count = { sen -c "^<table^" $output }
done

# We are done. Output the extracted table.
echo $output
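The table-number check near the top of the script (the two if-blocks on $number) can be expressed as a short Python function. This is only a sketch of the same rule for readers more familiar with Python; it is not part of biterScripting. The function name `choose_table_number` is hypothetical. Treating the <table match as case-insensitive is an assumption about the script's -c option.

```python
import re


def choose_table_number(html: str, number: int) -> int:
    """Apply SS_ExtractTable's fallback rule for the table number (sketch)."""
    # Count every "<table" opening, nested tables included. Case-insensitive
    # matching is an assumption about the script's -c option.
    count = len(re.findall(r"<table", html, flags=re.IGNORECASE))
    # A missing (0), negative, or too-large number falls back to 1,
    # as the two if-blocks in the script do.
    if number <= 0 or number > count:
        return 1
    return number


page = "<TABLE><tr><td>1</td></tr></TABLE><table><tr><td>2</td></tr></table>"
print(choose_table_number(page, 2))   # 2
print(choose_table_number(page, 12))  # 1
print(choose_table_number(page, 0))   # 1
```

Note that "</table" never matches "<table", because the slash sits between "<" and "t". Only opening tags are counted, as in the script's sen call.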

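The extraction step and the nested-table loop can likewise be sketched in Python. This is a hypothetical re-implementation for illustration, not biterScripting code. The names `extract_table` and `take_through_close` and both regular expressions are our own choices. In particular, `</table[^>]*>` stands in for the script's ^</table&\>^ pattern, and matching is assumed to be case-insensitive. Unlike the script, which falls back to table 1, this sketch expects `number` to be valid already and raises ValueError otherwise.

```python
import re

# Hypothetical patterns chosen for this sketch.
OPEN_TAG = re.compile(r"<table", re.IGNORECASE)
CLOSE_TAG = re.compile(r"</table[^>]*>", re.IGNORECASE)


def take_through_close(rest: str) -> tuple[str, str]:
    """Split `rest` just after its first closing </table...> tag."""
    match = CLOSE_TAG.search(rest)
    if match is None:
        # No closing tag left: take the remainder as-is (our own fallback).
        return rest, ""
    return rest[:match.end()], rest[match.end():]


def extract_table(html: str, number: int) -> str:
    """Return the number'th table with any inner tables removed."""
    opens = list(OPEN_TAG.finditer(html))
    if not 1 <= number <= len(opens):
        raise ValueError("number must be between 1 and the count of <table tags")
    # Drop everything before the number'th <table.
    rest = html[opens[number - 1].start():]
    # Take the text up to and including the next </table>.
    output, rest = take_through_close(rest)
    # While output still holds an inner <table, cut from its last <table
    # (the innermost table) to the end, then extend to the next </table>.
    while len(OPEN_TAG.findall(output)) > 1:
        last_open = list(OPEN_TAG.finditer(output))[-1].start()
        output = output[:last_open]
        chunk, rest = take_through_close(rest)
        output += chunk
    return output


nested = ("<table id=outer><tr><td>a<table id=inner><tr><td>b</td></tr></table>"
          "c</td></tr></table>")
print(extract_table(nested, 1))  # <table id=outer><tr><td>ac</td></tr></table>
print(extract_table(nested, 2))  # <table id=inner><tr><td>b</td></tr></table>
```

For table 1, the inner table and its text "b" are removed, and the outer table is extended to its real closing tag. Asking for table 2 returns the inner table on its own. The loop always terminates. Each pass either consumes more of the remaining input or, once the input is exhausted, removes one more <table from the output.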
2008-2014, biterScripting.com. All rights reserved.
biterScripting, biterScript, biterBrowser, biterMobile, biterScripting.com, FVA (Forward Variable Assignment) are trademarks of biterScripting.com. Is it biterScripting-compatible ? is a service mark of biterScripting.com. Explorer, Unix, Windows are trademarks, service marks or other forms of intellectual property of their respective owners.