  Help Page - SS_WebPageToCSV

(Some help pages may not display correctly in HTML because they contain sample code, parts of which may be misinterpreted as HTML tags.

All help pages, including this one, are also available within biterScripting through the help command.)




Sample Script

SS_WebPageToCSV

Purpose

Extracts a table from a web page and converts it to CSV (Comma Separated Values) format.

Source Code

#####################################################################
# SCRIPT: SS_WebPageToCSV.txt
#
# This script extracts a table from a web page and creates the corresponding CSV
# (Comma Separated Values) output. The CSV output can be saved to a file and
# opened with a spreadsheet program.
#
# The name of the web page is passed to this script as input argument/FVA $page.
# It can be either a web page or a local file, in one of the following forms:
# "http://www.xxx.yyy/.../zzz.html" or "C:/.../file.html". The .html extension
# is only an example; the script accepts any extension, such as .asp, .php, etc.
#
# A web page may have several tables. To extract the correct table, the tables are
# numbered starting at 1. For example, the 12th table starts at the 12th instance of
# <table in the web page. The number of the table to extract is passed to this
# script as input argument/FVA $number. It can be 1, 12, etc.
#
# Download this script into directory C:/Scripts, to a file named SS_WebPageToCSV.txt.
# Then call it as follows.
#
# script "C:/Scripts/SS_WebPageToCSV.txt" page("http://www.xxx.yyy/.../zzz.mmm") number(12)
#
# The above writes the output to the screen. To store the output in a file,
# redirect the script output to a local file, as follows.
#
# script "C:/Scripts/SS_WebPageToCSV.txt" page("http://www.xxx.yyy/.../zzz.mmm") number(12) > "C:/table.csv"
#
# This script uses other biterScripting sample scripts (SS_ExtractTable.txt and
# SS_RemoveTags.txt). To install all sample scripts at once, use the following command.
#
# script "http://www.biterscripting.com/Download/SS_AllSamples.txt"
#
# The above installs all sample scripts in the directory C:/Scripts.
# If you don't have biterScripting, you can download it from biterScripting.com.
#
#####################################################################

var str page    # name of input page or file
var int number  # number of the table to convert to CSV format

# Read the file contents into a variable $html.
var str html, temp_str
cat $page > $html

# Extract table number $number.
script "C:/Scripts/SS_ExtractTable.txt" input($html) number($number) > $html

# Replace all "</tr...>" with "[ROW_SEPARATOR]".
# Get the first instance of </tr...>.
stex -p -r -c "^</tr&\>^" $html > $temp_str
while ( $temp_str <> "" )
do
    # Replace this instance.
    sal -r -c "^</tr&\>^" "[ROW_SEPARATOR]" $html > null
    # Get the next instance of </tr...>.
    stex -p -r -c "^</tr&\>^" $html > $temp_str
done

# Replace all "</td...>" with "[COLUMN_SEPARATOR]".
# Get the first instance of </td...>.
stex -p -r -c "^</td&\>^" $html > $temp_str
while ( $temp_str <> "" )
do
    # Replace this instance.
    sal -r -c "^</td&\>^" "[COLUMN_SEPARATOR]" $html > null
    # Get the next instance of </td...>.
    stex -p -r -c "^</td&\>^" $html > $temp_str
done

# Remove any remaining HTML tags, text between { and }, and text between & and ;.
script "C:/Scripts/SS_RemoveTags.txt" input($html) start_tag("<") end_tag(">") > $html
script "C:/Scripts/SS_RemoveTags.txt" input($html) start_tag("{") end_tag("}") > $html
script "C:/Scripts/SS_RemoveTags.txt" input($html) start_tag("&") end_tag(";") > $html

# Replace any newlines, tabs and commas with a space.
# Get the first instance of a newline, tab or comma.
stex -p -r -c "^(\n\t\,)^" $html > $temp_str
while ( $temp_str <> "" )
do
    # Replace this instance.
    sal -r -c "^(\n\t\,)^" " " $html > null
    # Get the next instance of a newline, tab or comma.
    stex -p -r -c "^(\n\t\,)^" $html > $temp_str
done

# $html now contains table entries only; no HTML tags remain.
var str output
set $output = $html

# Replace ROW_SEPARATORs with newlines and COLUMN_SEPARATORs with tabs or commas.
# Get the first instance of [ROW_SEPARATOR].
stex -p "^[ROW_SEPARATOR]^" $output > $temp_str
while ( $temp_str <> "" )
do
    # Replace this instance.
    sal "^[ROW_SEPARATOR]^" "\n" $output > null
    # Get the next instance of [ROW_SEPARATOR].
    stex -p "^[ROW_SEPARATOR]^" $output > $temp_str
done

# Get the first instance of [COLUMN_SEPARATOR].
stex -p "^[COLUMN_SEPARATOR]^" $output > $temp_str
while ( $temp_str <> "" )
do
    # Replace this instance.
    # Use \t (tab) or , (comma) to separate columns.
    sal "^[COLUMN_SEPARATOR]^" "\t" $output > null
    # Get the next instance of [COLUMN_SEPARATOR].
    stex -p "^[COLUMN_SEPARATOR]^" $output > $temp_str
done

# Write output.
echo $output
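
To experiment with the same technique outside biterScripting, the steps of the script can be approximated in Python. The sketch picks the table that starts at the $number'th "<table". It marks row ends (</tr...>) and cell ends (</td...>), strips the remaining <...> tags, {...} blocks and &...; entities, and turns newlines, tabs and commas into spaces. It then splits on the markers. This is an illustration under assumptions, not a port of biterScripting. The names nth_table, table_to_csv, ROW_SEP and COL_SEP are hypothetical. nth_table only approximates SS_ExtractTable.txt, whose source is not shown on this page, and it ignores nested tables. The clean-up at the end goes beyond what the script does: it collapses repeated spaces, trims cells, and drops the trailing empty field and empty rows.

```python
import re

# Hypothetical marker strings, mirroring the script's placeholders.
ROW_SEP = "[ROW_SEPARATOR]"
COL_SEP = "[COLUMN_SEPARATOR]"


def nth_table(html: str, number: int) -> str:
    """Return the number'th <table ...>...</table> fragment, counting from 1.

    Hypothetical stand-in for SS_ExtractTable.txt: the table starts at the
    number'th "<table" and ends at the first "</table>" after it, so nested
    tables are not handled. Returns "" if there is no such table.
    """
    starts = [m.start() for m in re.finditer(r"<table", html, re.IGNORECASE)]
    if number < 1 or number > len(starts):
        return ""
    start = starts[number - 1]
    end = re.search(r"</table\s*>", html[start:], re.IGNORECASE)
    stop = start + end.end() if end else len(html)
    return html[start:stop]


def table_to_csv(table_html: str, delimiter: str = ",") -> str:
    """Convert one table fragment to delimiter-separated rows."""
    # Mark row ends and cell ends, as the script does for </tr...> and </td...>.
    text = re.sub(r"</tr[^>]*>", ROW_SEP, table_html, flags=re.IGNORECASE)
    text = re.sub(r"</td[^>]*>", COL_SEP, text, flags=re.IGNORECASE)
    # Drop remaining <...> tags, {...} blocks and &...; entities
    # (entities are removed, not decoded, as in the script).
    for pattern in (r"<[^>]*>", r"\{[^}]*\}", r"&[^;]*;"):
        text = re.sub(pattern, "", text)
    # Turn whitespace (including newlines and tabs) and commas into spaces;
    # unlike the script, each run becomes a single space.
    text = re.sub(r"[\s,]+", " ", text)
    # Split on the markers. Unlike the script, trim each cell, drop the empty
    # field left after the last </td> and skip rows with no content.
    lines = []
    for row in text.split(ROW_SEP):
        cells = [cell.strip() for cell in row.split(COL_SEP)]
        if cells and cells[-1] == "":
            cells.pop()
        if any(cells):
            lines.append(delimiter.join(cells))
    return "\n".join(lines)
```

Passing "\t" as the delimiter gives tab-separated output, matching the \t the script uses by default. With the default comma, the result is plain CSV, because no cell text contains a comma any more.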

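The script turns every comma inside a cell into a space, so a comma can never be confused with a column separator. The cost is that values such as "City, State" lose their comma. RFC 4180-style CSV quoting avoids that loss by wrapping any field that contains the delimiter in double quotes. As an alternative technique (not something the script does), here is a minimal Python sketch using the standard-library csv module. It assumes the cell text has already been extracted from the table into a list of rows. rows_to_csv is a hypothetical helper name, and the sample rows are made up.

```python
import csv
import io


def rows_to_csv(rows: list[list[str]]) -> str:
    """Return rows as CSV text, quoting only fields that need it."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL quotes only fields containing special characters such as
    # the delimiter, the quote character or characters of the line terminator;
    # embedded double quotes are doubled. lineterminator="\n" replaces "\r\n".
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [["Name", "City, State"], ["Bob & Co", "Austin, TX"]]  # made-up data
    print(rows_to_csv(sample), end="")
```

With this approach the comma in "City, State" survives. A spreadsheet program reading the file treats the quoted field as a single cell.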
Copyright 2008-2013, biterScripting.com. All rights reserved.
biterScripting, biterScript, biterBrowser, biterMobile, biterScripting.com, FVA (Forward Variable Assignment) are trademarks of biterScripting.com. "Is it biterScripting-compatible?" is a service mark of biterScripting.com. Explorer, Unix, Windows are trademarks, service marks or other forms of intellectual property of their respective owners.