Sample Script - SS_WebPageToCSV

(Some of the sample scripts, especially web-related ones, may not display correctly in HTML because their code contains HTML tags such as <tr>.

For an accurate, copy-and-paste-able text version of this script, see SS_WebPageToCSV.txt.)



#####################################################################
# SCRIPT: SS_WebPageToCSV.txt
#
# This script extracts a table from a web page and creates the corresponding CSV
# (Comma Separated Values) output. The output can be saved to a file, which can
# then be opened with a spreadsheet program.
#
# The name of the web page is passed to this script as the input argument/FVA $page.
# It can be either a web page or a local file, in one of the following forms:
# "http://www.xxx.yyy/.../zzz.html" or "C:/.../file.html". The .html extension is only
# an example; the script accepts any extension, such as .asp or .php.
#
# A web page may have several tables. To identify the correct table, the tables are numbered
# starting at 1. For example, the 12th table starts at the 12th instance of <table in the web
# page. The number of the table to extract is passed to this script as the input argument/FVA
# $number. It can be 1, 12, etc.
#
# Download this script into the directory C:/Scripts as a file named SS_WebPageToCSV.txt,
# then call it as follows:
#
# script "C:/Scripts/SS_WebPageToCSV.txt" page("http://www.xxx.yyy/.../zzz.mmm") number(12)
#
# The above command writes its output to the screen. To store the output in a file,
# redirect it to a local file, as follows:
#
# script "C:/Scripts/SS_WebPageToCSV.txt" page("http://www.xxx.yyy/.../zzz.mmm") number(12) > "C:/table.csv"
#
# This script uses other biterScripting sample scripts (SS_ExtractTable.txt and
# SS_RemoveTags.txt). To install all sample scripts at once, use the following command:
#
# script "http://www.biterscripting.com/Download/SS_AllSamples.txt"
#
# The above command installs all sample scripts in the directory C:/Scripts.
# If you don't have biterScripting, you can download it from biterscripting.com.
#
#####################################################################

var str page # name of input page or file
var int number # number of the table to convert to CSV format.

# Read the file contents into a variable $html.
var str html, temp_str
cat $page > $html

# Extract the number'th table.
script "C:/Scripts/SS_ExtractTable.txt" input($html) number($number) > $html

# Replace all "</tr...>" with "[ROW_SEPARATOR]"
# Get the first instance of </tr...>
stex -p -r -c "^</tr&\>^" $html > $temp_str
while ( $temp_str <> "" )
do
    # Replace this instance.
    sal -r -c "^</tr&\>^" "[ROW_SEPARATOR]" $html > null
    # Get the next instance of </tr...>
    stex -p -r -c "^</tr&\>^" $html > $temp_str
done

# Replace all "</td...>" with "[COLUMN_SEPARATOR]"
# Get the first instance of </td...>
stex -p -r -c "^</td&\>^" $html > $temp_str
while ( $temp_str <> "" )
do
    # Replace this instance.
    sal -r -c "^</td&\>^" "[COLUMN_SEPARATOR]" $html > null
    # Get the next instance of </td...>
    stex -p -r -c "^</td&\>^" $html > $temp_str
done

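# Remove any remaining html tags <...>, brace blocks {...} and entities &...;
# (Use the full path, as for SS_ExtractTable.txt above, so the helper is found from any directory.)
script "C:/Scripts/SS_RemoveTags.txt" input($html) start_tag("<") end_tag(">") > $html
script "C:/Scripts/SS_RemoveTags.txt" input($html) start_tag("{") end_tag("}") > $html
script "C:/Scripts/SS_RemoveTags.txt" input($html) start_tag("&") end_tag(";") > $html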

# Replace any newlines, tabs and commas with a space.
# Get the first instance of newline, tab or comma
stex -p -r -c "^(\n\t\,)^" $html > $temp_str
while ( $temp_str <> "" )
do
    # Replace this instance.
    sal -r -c "^(\n\t\,)^" " " $html > null
    # Get the next instance of newline, tab or comma
    stex -p -r -c "^(\n\t\,)^" $html > $temp_str
done

# $html now has table entries only - no html tags.
var str output
set $output = $html

# Replace ROW_SEPARATORs with newlines and COLUMN_SEPARATORs with tabs or commas.
# Get the first instance of [ROW_SEPARATOR].
stex -p "^[ROW_SEPARATOR]^" $output > $temp_str
while ( $temp_str <> "" )
do
    # Replace this instance.
    sal "^[ROW_SEPARATOR]^" "\n" $output > null
    # Get the next instance of [ROW_SEPARATOR].
    stex -p "^[ROW_SEPARATOR]^" $output > $temp_str
done

# Get the first instance of [COLUMN_SEPARATOR].
stex -p "^[COLUMN_SEPARATOR]^" $output > $temp_str
while ( $temp_str <> "" )
do
    # Replace this instance.
    sal "^[COLUMN_SEPARATOR]^" "," $output > null # Use a comma (,) for CSV output, or \t (tab) for tab-separated output.
    # Get the next instance of [COLUMN_SEPARATOR].
    stex -p "^[COLUMN_SEPARATOR]^" $output > $temp_str
done

# Write output
echo $output
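The script leaves table selection to SS_ExtractTable.txt, whose source is not shown on this page. Going only by the header's description (the nth table starts at the nth instance of "<table"), the selection step might be sketched in Python as below. This is not biterScripting code: extract_table is a hypothetical name, the sketch matches tags case-insensitively (HTML tag names are case-insensitive), and it assumes tables are not nested, since a nested table would end the selection at the inner "</table>".

```python
def extract_table(html: str, number: int) -> str:
    """Return the number'th table (1-based) in html, or "" if there is none.

    Hypothetical helper, not SS_ExtractTable.txt itself: finds the number'th
    occurrence of "<table" and returns the text up to the next "</table>".
    Assumes tables are not nested.
    """
    if number < 1:
        return ""
    lower = html.lower()  # search case-insensitively, slice the original text
    start = -1
    for _ in range(number):
        start = lower.find("<table", start + 1)
        if start == -1:
            return ""  # fewer than `number` tables on the page
    end = lower.find("</table>", start)
    if end == -1:
        return html[start:]  # unterminated table: take the rest of the page
    return html[start:end + len("</table>")]


page = "<p>x</p><table><tr><td>a</td></tr></table><TABLE><tr><td>b</td></tr></TABLE>"
print(extract_table(page, 2))  # prints <TABLE><tr><td>b</td></tr></TABLE>
```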

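For comparison, the steps the script performs after extracting the table (mark row and cell ends, strip the remaining tags, collapse whitespace, join the cells) can be sketched in Python. This is a sketch under stated assumptions, not part of biterScripting: table_to_csv is a hypothetical name, and the regular expressions only approximate the script's "^</tr&\>^" and "^</td&\>^" searches. It also differs from the script on purpose in three ways: it treats </th> as a cell end as well as </td>, it decodes entities such as &amp; with html.unescape instead of deleting them, and it lets Python's csv module quote cells that contain commas instead of replacing those commas with spaces.

```python
import csv
import html
import io
import re

# Assumed patterns approximating the script's "^</tr&\>^" and "^</td&\>^" searches.
ROW_END = re.compile(r"</tr[^>]*>", re.IGNORECASE)
CELL_END = re.compile(r"</t[dh][^>]*>", re.IGNORECASE)  # assumption: </th> also ends a cell
TAG = re.compile(r"<[^>]*>")
BRACED = re.compile(r"\{[^}]*\}")


def table_to_csv(table_html: str) -> str:
    """Convert the markup of a single HTML table to CSV text (hypothetical helper)."""
    rows = []
    for row_html in ROW_END.split(table_html):
        # Text before each cell end is one cell; whatever follows the last cell end is dropped.
        cells = CELL_END.split(row_html)[:-1]
        cleaned = []
        for cell in cells:
            text = TAG.sub("", cell)       # remove remaining <...> tags
            text = BRACED.sub("", text)    # remove {...} runs, as the script does
            text = html.unescape(text)     # decode entities such as &amp; (the script deletes them)
            cleaned.append(" ".join(text.split()))  # collapse newlines, tabs and spaces
        if cleaned:
            rows.append(cleaned)
    out = io.StringIO()
    # csv.writer quotes any cell containing a comma, so commas need not be replaced.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


sample = (
    "<table><tr><th>Name</th><th>City</th></tr>\n"
    "<tr><td>Smith, J.</td><td>New\tYork &amp; Co</td></tr></table>"
)
print(table_to_csv(sample), end="")
# Name,City
# "Smith, J.",New York & Co
```

Splitting on the row and cell end tags takes the place of the script's marker-and-loop approach: each split is a single pass over the text, rather than one search and one replacement per tag.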
2008-2014, biterScripting.com. All rights reserved.
biterScripting, biterScript, biterBrowser, biterMobile, biterScripting.com, FVA (Forward Variable Assignment) are trademarks of biterScripting.com. Is it biterScripting-compatible ? is a service mark of biterScripting.com. Explorer, Unix, Windows are trademarks, service marks or other forms of intellectual property of their respective owners.