  Help Page - SS_SearchWeb

(Some help pages may not display correctly in HTML because they contain sample code,
parts of which may be misinterpreted as HTML tags.

All help pages, including this one, are available in biterScripting with the help command.)




Sample Script

    SS_SearchWeb

Purpose

    Collects parts of web pages that contain a specified string. Starts at a seed URL.

Source Code

#####################################################################
# SCRIPT: SS_SearchWeb
#
# This script searches the web for a specified search string or a
# Regular Expression.
#
# A seed URL is assigned using FVA (Forward Variable Assignment)
# for str variable $seedURL. The value of $seedURL is of form
# "http://www.abc.def" or "http://www.abc.def/.../page.html" .
#
# The search string or regular expression is passed using FVA for
# str variable $search_string. Note that even though this variable is called
# search_string, you can pass a regular expression through it.
#
# The list of domains to be ignored is passed using FVA for str variable
# $ignore_domains. The format is <domain>|<domain>|<domain> ...
# Each <domain> is in the form "http://www.abc.def" .
#
# All URLs and domains passed to this script need to be in the standard format
# http://www.abc.def or http://www.abc.def/...../page.html
# The http:// part IS REQUIRED. The page name extension can be anything such as
# .html, .aspx, .js, etc.
#
# The script writes the found instances of the <search_string> to a file out.txt.
# This file may then be viewed using any text editor. You can also write the output to a
# file called out.html, and view it with a web browser, but the formatting will not be correct.
#
# NOTE: Depending on the seedURL, this script may take a long time to execute and may collect
# a very large amount of data. Writing large amounts of data to screen can slow down execution
# even further.
#
# This script calls other sample scripts as follows.
#     SS_URLs         Collects URLs from $seedURL and from URLs thus collected.
#     SS_SearchURL    Searches each URL thus found.
#
# This script can be stored and edited as necessary, in a text file
# called SS_SearchWeb.txt . The script can then be called as
#
#     script SS_SearchWeb.txt seedURL("http://www.abc.def/.../page.html") search_string("Computer Training")
#
#####################################################################

var str seedURL             # The seed URL
var str search_string       # String or regular expression to search for
var str ignore_domains      # List of domains to be ignored. Domains are separated by |.
var str URLList             # We collect a list of URLs found along
                            # the way in this variable. We repeatedly
                            # call script SS_SearchURL for each
                            # URL in this list.
var str foundURL            # The URL we are currently processing.
var str processedURLList    # We keep the list of processed URLs in here.
                            # Before processing a new URL, we always check if
                            # we already did that URL before. This way, we do not
                            # go into an infinite loop, because URLs do
                            # refer to each other.

# Add the seedURL to URLList. We will start the list with
# just this one URL.
echo $seedURL >> $URLList

while ($URLList <> "")
do
    # Get the next URL.
    lex "1" $URLList > $foundURL

    # Did we already process this URL ?
    # Create dynamic argument for the sen command.
    var str sen_arg
    set $sen_arg = "^````"+$foundURL+"````^"
    # The resulting value of $sen_arg will look like this: ^````<URL>````^
    # That means we are looking for the exact URL with the "````" before and after.
    # If there is any character before/after the URL which is not `, that's a different
    # URL. This makes sure that, for example, http://www.abc.def does not match http://www.abc.def/xyz.

    if ( { sen -c $sen_arg $processedURLList } == 0 )
    do
        # No, we did not process this URL.

        # Output this URL.
        echo $foundURL >> out.txt

        # Search this URL first.
        script SS_SearchURL.txt URL($foundURL) search_string($search_string) >> out.txt

        # Add URLs found in this URL to our list of URLs to process.
        script SS_URLs.txt URL($foundURL) ignore_domains($ignore_domains) >> $URLList

        # Add this URL to processedURLList. We will add a marker "````" before and after the URL.
        echo ("````"+$foundURL+"````") >> $processedURLList
    done
    endif
done
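
For readers who want to follow the control flow outside biterScripting, the loop above can be sketched in Python. This is only an illustration in a different language, not part of the sample script. The names search_web, already_processed, get_links and search_page are hypothetical: get_links stands in for SS_URLs and search_page for SS_SearchURL, and both are passed in as plain functions so that the sketch performs no network access.

```python
from collections import deque
from typing import Callable, Iterable, List

MARKER = "````"  # same delimiter the script wraps around each processed URL


def already_processed(url: str, processed: str) -> bool:
    """Look for the URL with a marker on both sides, as the sen check does.

    Wrapping the URL in markers prevents http://www.abc.def from matching
    an entry for http://www.abc.def/xyz.
    """
    return (MARKER + url + MARKER) in processed


def search_web(
    seed_url: str,
    get_links: Callable[[str], Iterable[str]],    # hypothetical stand-in for SS_URLs
    search_page: Callable[[str], Iterable[str]],  # hypothetical stand-in for SS_SearchURL
) -> List[str]:
    """Visit each reachable URL once and return the lines the script would write to out.txt."""
    pending = deque([seed_url])  # plays the role of $URLList
    processed = ""               # plays the role of $processedURLList
    output: List[str] = []

    while pending:
        url = pending.popleft()  # take the next URL off the front of the list
        if already_processed(url, processed):
            continue             # skip URLs we have seen, so mutual links cannot loop forever
        output.append(url)                 # output this URL
        output.extend(search_page(url))    # search this URL first
        pending.extend(get_links(url))     # then queue the URLs found in it
        processed += MARKER + url + MARKER + "\n"

    return output
```

The processed list is kept as a single marker-delimited string only to mirror the script. In ordinary Python a set of URLs would be the more natural choice for the same check.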

© 2008-2013, biterScripting.com. All rights reserved.
biterScripting, biterScript, biterBrowser, biterMobile, biterScripting.com, FVA (Forward Variable Assignment) are trademarks of biterScripting.com. Is it biterScripting-compatible ? is a service mark of biterScripting.com. Explorer, Unix, Windows are trademarks, service marks or other forms of intellectual property of their respective owners.