This script (PreFilter.vbs) helps you generate URL.txt files for Public Web Browser (PWB).
It reads its input from a file called PreURL.txt and writes its output to a file called PostURL.txt.

Why needed
==========

You don't need this script to create PWB filters, but it can help you write more bullet-proof
filter entries. Simple filter entries in native PWB can be overly permissive unless you craft
them carefully. For instance, if you want to blacklist everything by default but whitelist
book.com, you might put the following in your PWB URL.txt:

-all
+book.com

However, a sly user can still reach, say, google.com by entering this URL in the PWB address
bar, because the URL contains the text "book.com":
http://google.com?abc=book.com

Similarly, the user could access facebook.com directly, because "facebook.com" contains "book.com":
http://facebook.com

While you can close these gaps manually with regular expressions in URL.txt, doing so is
cumbersome and makes the URL.txt file harder to maintain. This script generates those regular
expressions for you.
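To make the loophole concrete, here is a small Python sketch. It assumes, as the two examples
above suggest, that a plain entry such as +book.com allows any URL containing the text
"book.com"; this is an inference, not a statement from the PWB documentation. The anchored
patterns are hand-written in the style of the PreFilter output shown under "An Example", and the
helper names (naive_allows, anchored_allows) are hypothetical.

```python
import re

# Assumption inferred from the examples above: a plain "+book.com" entry
# allows any URL that contains the text "book.com" anywhere.
def naive_allows(url: str, entry: str = "book.com") -> bool:
    return entry in url

# Anchored patterns written by hand in the style PreFilter generates:
# the host must come right after the scheme and be followed by "/" or the end.
ANCHORED = [
    re.compile(r"^(https?://book\.com/)"),
    re.compile(r"^(https?://book\.com/?)$"),
]

def anchored_allows(url: str) -> bool:
    return any(p.search(url) for p in ANCHORED)

for url in ("http://book.com/", "http://google.com?abc=book.com", "http://facebook.com"):
    print(url, naive_allows(url), anchored_allows(url))
```

The naive substring check lets all three URLs through; the anchored patterns allow only
http://book.com/.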


How to use
==========

Enter your (usually simple) entries in PreURL.txt, then run the script.

After the script runs, copy the contents of PostURL.txt into whichever file PWB uses for URL
filtering (by default, URL.txt).

You may be able to use the "pass-thru" feature described below to create the full contents of
your URL.txt file. If instead you use this script to produce only *some* of the filter entries
for URL.txt, keep your hand-written URL.txt entries separate from the ones generated by
PreFilter. If the two sets are intertwined, you will have a hard time updating URL.txt when you
re-run this script.

Here are the syntax rules for the PreURL.txt file (read all of these!):

a) Start entries with a + or a - to indicate whether the item is to be whitelisted or blacklisted. 
   If you omit this character, a '+' is assumed (allow). 

b) If you want an entry to apply only to http or only to https, add the http:// or https://
   prefix. Otherwise, PreFilter matches both http and https URLs.

c) To also match any subdomain, insert an asterisk before the domain name.
   Note: It doesn't matter whether you include the period after the asterisk.
   Both +*example.com and +*.example.com match both example.com and test.example.com.

d) The only wildcard supported is the asterisk described in (c) above, unless you use pass-thru
   (see (f) below), in which case you can use any wildcards you wish.

e) If a line begins with a straight apostrophe ('), the line is considered a comment and is ignored.
   If a line begins with a semicolon (;) it is considered a comment and is copied to the output file.

f) Pass-thru: If a line begins with a vertical bar (|), the bar is removed and the rest of the
   line is copied verbatim to the output file. Use this when you need lines in your PWB URL.txt
   file that don't fit the common http/https structure this script handles, or when you want
   PreURL.txt to be the "master" source for all of your URL.txt PWB filter entries. If you use
   pass-thru to handle all filter entries, you can change your PWB.ini file to use PostURL.txt
   as the URL file; then, to modify your filters, you only need to edit PreURL.txt and run the
   script.

Note: Unless you are using pass-thru mode (starting the line with a vertical bar), do not use any
regular expression syntax in your input file (other than the asterisk described in (c) above).
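The rules above amount to a line classifier. The following Python sketch is a hypothetical
re-implementation for illustration only, not the actual logic of PreFilter.vbs. The Entry type
and parse_line function are invented names, and treating blank lines as ignorable is an
assumption the rules do not state.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    kind: str                      # "ignore", "comment", "passthru" or "filter"
    text: str = ""                 # text copied to the output (comment / pass-thru)
    allow: bool = True             # rule (a): '+' allows, '-' blocks
    scheme: Optional[str] = None   # rule (b): "http", "https", or None for both
    subdomains: bool = False       # rule (c): leading asterisk
    host: str = ""
    path: str = ""                 # anything after the first "/", without the "/"

def parse_line(line: str) -> Entry:
    line = line.rstrip("\r\n")
    if not line.strip():
        return Entry("ignore")                    # assumption: blank lines are skipped
    if line.startswith("'"):
        return Entry("ignore")                    # rule (e): comment, dropped
    if line.startswith(";"):
        return Entry("comment", text=line)        # rule (e): comment, copied to output
    if line.startswith("|"):
        return Entry("passthru", text=line[1:])   # rule (f): bar removed, rest verbatim
    body = line.strip()
    allow = True                                  # rule (a): '+' is the default
    if body[0] in "+-":
        allow = body[0] == "+"
        body = body[1:]
    scheme = None
    for candidate in ("https", "http"):           # rule (b): optional scheme prefix
        prefix = candidate + "://"
        if body.lower().startswith(prefix):
            scheme, body = candidate, body[len(prefix):]
            break
    subdomains = body.startswith("*")             # rule (c): "*" and "*." are equivalent
    if subdomains:
        body = body[1:].lstrip(".")
    host, _, path = body.partition("/")
    return Entry("filter", allow=allow, scheme=scheme,
                 subdomains=subdomains, host=host, path=path)
```

For example, parse_line("https://*example.com") yields a filter entry with scheme "https",
subdomains set, and host "example.com", while parse_line("|-all") yields a pass-thru entry whose
text is "-all".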

Examples:

+example.co    Will permit: http://example.co, https://example.co/, http://example.co/experiment.html
               but not: http://test.example.co, http://oneexample.co, http://example.com

-*example.com or -*.example.com
               Will reject: http://example.com, https://example.com, http://try.example.com/experiment.html
               but not: http://oneexample.com

https://*example.com
               Will permit: https://example.com, https://example.com/, https://my.example.com/demo.pdf
               but not: http://example.com, https://anotherexample.com

example.com/test
               Will permit: http://example.com/test, http://example.com/test/, http://example.com/test/1.htm
               but not: http://example.com/test.html, http://example.com/testing/123.htm

|-all          Will write "-all" (without quotes) into the output file

|+about:blank  Will write "+about:blank" (without quotes) to the output file


An Example
==========

Suppose you want PWB to accept URLs from any page in the domain 'example.com' or any of its
subdomains. In PreURL.txt, the entry looks like this:

*example.com

After PreFilter.vbs runs, this one line becomes the following four lines in PostURL.txt:

+^(https?://example\.com/)
+^(https?://example\.com/?)$
+^(https?://[A-Za-z0-9\.\-]+\.example\.com/)
+^(https?://[A-Za-z0-9\.\-]+\.example\.com/?)$
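The expansion follows a simple pattern: for each host form, one line allows the host followed
by "/" and anything after it, and one line allows the bare host with an optional trailing "/".
The Python sketch below reproduces that pattern. It is inferred from this sample output and the
Examples list, not taken from PreFilter.vbs, and the expand and matches helpers are hypothetical.
Using re.escape for the host and path is also an assumption: it produces the same text as the
sample for names like example.com, but it may escape other characters (such as "-") differently
from the real script.

```python
import re
from typing import List, Optional

# Subdomain prefix copied from the sample PostURL.txt output above.
SUBDOMAIN = r"[A-Za-z0-9\.\-]+\."

def expand(host: str, path: str = "", allow: bool = True,
           scheme: Optional[str] = None, subdomains: bool = False) -> List[str]:
    sign = "+" if allow else "-"
    proto = scheme if scheme else "https?"          # no prefix: match http and https
    target = re.escape(host) + ("/" + re.escape(path) if path else "")
    bases = [f"{proto}://{target}"]
    if subdomains:
        bases.append(f"{proto}://{SUBDOMAIN}{target}")
    lines = []
    for base in bases:
        lines.append(f"{sign}^({base}/)")           # host (and path) followed by "/..."
        lines.append(f"{sign}^({base}/?)$")         # exactly host (and path), optional "/"
    return lines

def matches(lines: List[str], url: str) -> bool:
    # Drop the leading +/- and test each pattern against the URL.
    return any(re.search(line[1:], url) for line in lines)

print("\n".join(expand("example.com", subdomains=True)))
```

Running the sketch prints the four lines shown above. The same helpers reproduce the behavior
listed under Examples; for instance, expand("example.com", path="test") accepts
http://example.com/test/1.htm but not http://example.com/test.html.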


Credits
=======
Use this software at your own risk. Feel free to edit/distribute it as you wish.
Alan Mandel



