  Help Page - SS_ExtractTable

( Some help pages may not display correctly in HTML because they contain sample code,
parts of which may be misinterpreted as HTML tags.

All help pages, including this one, are available in biterScripting with the help command. )




Sample Script
SS_ExtractTable

Purpose
Extracts a table from a web page.

Source Code

#####################################################################
# SCRIPT: SS_ExtractTable.txt
#
# This script extracts a table from an html input. It returns a portion of the web page
# that contains the table, starting with <table and ending with </table> .
#
# The html input is passed as input argument/FVA $input to this script.
#
# A web page may have several tables. To extract the correct table, these tables are numbered
# starting at 1. For example, the 12th table starts at the 12th instance of <table in the web
# page. The number of the table to extract is passed to this script as input argument/FVA $number.
# It can be 1, 12, etc. If a number is not specified, 1 is assumed. If $number is higher than the
# number of tables in the web page, 1 is assumed. When $number is 1, the table starting
# at the first instance of <table is extracted. This is the first outermost table.
#
# A web page may contain nested tables. So, if there are any tables inside the table being
# extracted, the inside tables will be removed.
#
# Download this script into directory C:/Scripts to a file named SS_ExtractTable.txt.
# Then call it as below.
#
#     var str s1
#     cat "C:/mypage.html" > $s1
#     script "C:/Scripts/SS_ExtractTable.txt" input($s1) number(12)
#
# The above will produce text output on screen. If you want to save the output to a str variable
# for doing further processing with it, redirect the output as below -
#
#     var str s2
#     script "C:/Scripts/SS_ExtractTable.txt" input($s1) number(12) > $s2
#
# The script can be edited to meet your requirements more precisely.
#
# If you don't have biterScripting, you can download it from biterscripting.com .
# Install all sample scripts using the following command
#
#     script "http://www.biterscripting.com/Download/SS_AllSamples.txt"
#
#####################################################################

var str input       # html input
var int number      # Table number to extract

# Is the table number specified ?
if ($number <= 0)
    set $number = 1
endif

# Are there that many tables in the input ?
if ( { sen -c "^<table^" $input } < $number )
    set $number = 1
endif

# Remove the portion before the number'th instance of <table .
stex -c ("]^<table^"+makestr(int($number))) $input > null
# For example, if $number is 12, the stex command is expecting argument in the form of
#     ]^<table^12
# The expression ("]^<table^"+makestr(int($number))) is producing the argument in that form.

# Extract the portion up to the next </table> . Going forward, we will collect output in a
# separate variable $output.
var str output
stex -c -r "^</table&\>^]" $input >> $output
# Note that, with the -r option, we are passing a regular expression to the above stex command.
# The character > has a special meaning in regular expressions. So, we must escape it with
# a backslash.

# Since the tables may be nested, the immediately following </table> is not always the end of
# the number'th table. This will happen if there are tables inside the table being extracted.
# We want to remove these inside tables. To do this, we count the number of instances of <table .
# If it is 1, there are no inside tables. If it is higher than 1, there are inside tables.
# For each inside table, we will remove the innermost <table...</table> pair.
var int count
set $count = { sen -c "^<table^" $output }
while ( $count > 1 )
do
    # Remove the innermost table. The last <table...</table> pair is the innermost table.
    stex -c "[^<table^l" $output > null

    # Get the portion up to next </table> .
    stex -c -r "^</table&\>^]" $input >> $output

    # Get the count again.
    set $count = { sen -c "^<table^" $output }
done

# We are done. Output the extracted table.
echo $output
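
For readers who want to follow the algorithm outside biterScripting, the steps in the script above can be sketched in Python. This is only an illustrative sketch, not part of the biterScripting samples. The names extract_table and take_through_close are hypothetical. It assumes case-insensitive matching of <table and </table>, returns an empty string when the input has no table, and appends nothing when no further </table> is found. These choices are assumptions and are not taken from the original script.

```python
import re

# Assumption: tags are matched case-insensitively.
OPEN = re.compile(r"<table", re.IGNORECASE)
CLOSE = re.compile(r"</table>", re.IGNORECASE)


def take_through_close(text):
    """Return (text up to and including the first </table>, remainder)."""
    match = CLOSE.search(text)
    if match is None:
        # Assumption: with no closing tag left, nothing is taken.
        return "", text
    return text[:match.end()], text[match.end():]


def extract_table(html, number=1):
    """Return the number'th table of html with any inside tables removed."""
    starts = [m.start() for m in OPEN.finditer(html)]
    if not starts:
        return ""  # Assumption: no table in the input yields empty output.

    # Fall back to table 1 when the number is missing or too high.
    if number <= 0 or number > len(starts):
        number = 1

    # Drop everything before the number'th <table .
    rest = html[starts[number - 1]:]

    # Collect the portion up to and including the next </table> .
    output, rest = take_through_close(rest)

    # More than one <table in the output means an inside table was captured.
    while len(OPEN.findall(output)) > 1:
        # Cut the output at the last <table, which starts the innermost table.
        last_open = [m.start() for m in OPEN.finditer(output)][-1]
        output = output[:last_open]
        # Continue with the portion up to the next </table> .
        piece, rest = take_through_close(rest)
        output += piece

    return output


if __name__ == "__main__":
    page = "<table><tr><td><table>inner</table>kept</td></tr></table>"
    print(extract_table(page, 1))
    # prints <table><tr><td>kept</td></tr></table>
```

As in the script, the text that follows an inside table ("kept" in the example) stays in the output, while the inside table itself is dropped.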

© 2008-2013, biterScripting.com. All rights reserved.
biterScripting, biterScript, biterBrowser, biterMobile, biterScripting.com, FVA (Forward Variable Assignment) are trademarks of biterScripting.com. Is it biterScripting-compatible ? is a service mark of biterScripting.com. Explorer, Unix, Windows are trademarks, service marks or other forms of intellectual property of their respective owners.