  Help Page - SS_WebPageToText

( Some help pages may not display correctly in HTML because they contain sample code,
parts of which may be misinterpreted as HTML tags.

All help pages, including this one, are available in biterScripting with the help command. )




Sample Script

    SS_WebPageToText

Purpose

    Extracts plain text and creates a plain text version from a web page.

Source Code

#####################################################################
# SCRIPT: SS_WebPageToText.txt
#
# This script reads a web page and creates a corresponding plain text
# version. The plain text version can then be stored in a local file
# and used for spell-checking, excerpting, review by the legal
# department, inclusion into legal documents, inclusion into
# requirements/specifications documents, keeping the page lengths
# within limits, or other purposes.
#
# The name of the web page is passed as input argument/FVA $page to this script.
# It can be either a web page or a local file and has one of the following forms -
# "http://www.xxx.yyy/.../zzz.html", "C:/.../file.html" . We are using the extension
# of .html as an example only; the script will accept any extension such as .asp, .php, etc.
#
# Download this script into directory C:/Scripts to a file named SS_WebPageToText.txt.
# Then call it as below.
#
#     script "C:/Scripts/SS_WebPageToText.txt" page("http://www.xxx.yyy/.../zzz.mmm")
#
# The above will produce text output on screen. If you want to store the output in a file,
# simply redirect the script output to a local file, as below -
#
#     script "C:/Scripts/SS_WebPageToText.txt" page("http://www.xxx.yyy/.../zzz.mmm") > "C:/page.txt"
#
# The script can be edited to meet your requirements more precisely.
#
# IMPORTANT: As a sample of producing debugging messages, we have left the
# debugging echo calls in this script. The -e option in these echo commands
# indicates that their output will be written to the standard error stream.
# To not see these debugging statements, simply add the following when calling
# this script -
#     2>null
# Alternatively, you can remove the debugging echo calls.
#
# If you don't have biterScripting, you can download it from biterscripting.com .
#
#####################################################################

var str page    # name of input page or file

# Read the file contents into a variable web_version.
# We will create the text version in variable text_version.
var str web_version, text_version
echo -e "DEBUG Reading page " $page
cat $page > $web_version

# Remove all <...> tags. To do this, we will use the script SS_RemoveTags.
echo -e "DEBUG Removing <...> Tags"
script SS_RemoveTags.txt input($web_version) start_tag("<") end_tag(">") > $web_version

# Remove all {...} tags.
echo -e "DEBUG Removing {...} Tags"
script SS_RemoveTags.txt input($web_version) start_tag("{") end_tag("}") > $web_version

# Remove &...; formatting tags.
echo -e "DEBUG Removing &...; Tags"
script SS_RemoveTags.txt input($web_version) start_tag("&") end_tag(";") > $web_version

# Remove all extra spaces, tabs, etc. We will replace them with one space.
# We will use a temporary str variable for collecting intermediate output.
echo -e "DEBUG Removing extra formatting characters"
var str temp_str
while ( { sen -r "^,^" $web_version } > 0 )
do
    stex -r "]^,^" $web_version > $temp_str
    set $text_version = $text_version + $temp_str + " "
    stex -r "^;^]" $web_version > null    # We will discard this output.
done
# There may be something left in $web_version
set $text_version = $text_version + $web_version

# For easy reading, we will cut lines at every 12 words. (You can change this number of words,
# pass the number of words through an input variable, or format it entirely differently.)
# We will create the formatted version in $formatted_version .
echo -e "DEBUG Inserting line breaks for easy reading"
var str formatted_version
while ( { wen $text_version } > 12 )
do
    wex "12]" $text_version > $temp_str
    set $formatted_version = $formatted_version + $temp_str + "\n"
done
# There may be something left in $text_version
set $formatted_version = $formatted_version + $text_version

# Write out the formatted version.
echo $formatted_version
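
For readers who want to check the script's output against an independent implementation, the three stages above (strip <...>, {...} and &...; spans, collapse whitespace, break lines every 12 words) can be sketched in Python. This is a minimal sketch, not a translation of the biterScripting commands. The names remove_tags and page_to_text are hypothetical. The source of SS_RemoveTags is not shown on this page, so the sketch assumes it deletes everything from each start tag through the next end tag; it substitutes a single space so that neighbouring words do not run together.

```python
import re


def remove_tags(text, start_tag, end_tag):
    """Replace each span from start_tag through the next end_tag with a space.

    Assumption: this approximates what SS_RemoveTags does; its real
    behaviour is not documented on this page.
    """
    pattern = re.escape(start_tag) + r".*?" + re.escape(end_tag)
    return re.sub(pattern, " ", text, flags=re.DOTALL)


def page_to_text(web_version, words_per_line=12):
    """Return a plain text version of an HTML string, wrapped by word count."""
    # Same order as the script: <...> tags, then {...}, then &...; entities.
    for start_tag, end_tag in (("<", ">"), ("{", "}"), ("&", ";")):
        web_version = remove_tags(web_version, start_tag, end_tag)

    # split() with no argument collapses any run of spaces, tabs and newlines.
    words = web_version.split()

    # Cut a line after every words_per_line words.
    lines = [
        " ".join(words[i:i + words_per_line])
        for i in range(0, len(words), words_per_line)
    ]
    return "\n".join(lines)
```

To use it, read the page contents into a string first (from a local file or a download) and pass that string to page_to_text. Like the script, this approach is deliberately crude: a stray "&" or "{" in the page text removes everything up to the next ";" or "}".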

2008-2013, biterScripting.com. All rights reserved.
biterScripting, biterScript, biterBrowser, biterMobile, biterScripting.com, FVA (Forward Variable Assignment) are trademarks of biterScripting.com. Is it biterScripting-compatible ? is a service mark of biterScripting.com. Explorer, Unix, Windows are trademarks, service marks or other forms of intellectual property of their respective owners.