Thursday, April 12, 2007

Finding Database Tables used in ASP files - grep.exe sort.exe uniq.exe

We have a few websites sharing a single database.
We need to split the database, so that each website uses an separate database.

The problem was to find the set of tables used by a single website.

Instead of manually going through all the *.asp files,
I harness the power of Unix command line tools: grep

We use a standard naming convention for all database table names.
We prefix all database table names with "tbl".

Open Windows Command Prompt:
cd C:\websites\www
grep -o -r tbl[A-Za-z0-9]* *

Output:
test\view_tblCustomer.asp:tblCustomer
test\view_tblCustomer.asp:tblCustomer
test\view_tblMember.asp:tblMember
test\view_tblMember.asp:tblMember
test\view_tblMember.asp:tblMember
test\view_tblUser2.asp:tblUser2
test\view_tblUser2.asp:tblUser2

grep command usage:
Usage: grep [OPTION]... PATTERN [FILE] ...
Search for PATTERN in each FILE or standard input.
Example: grep -i 'hello world' menu.h main.c

Regexp selection and interpretation:
-E, --extended-regexp PATTERN is an extended regular expression
-F, --fixed-strings PATTERN is a set of newline-separated strings
-G, --basic-regexp PATTERN is a basic regular expression
-P, --perl-regexp PATTERN is a Perl regular expression
-e, --regexp=PATTERN use PATTERN as a regular expression
-f, --file=FILE obtain PATTERN from FILE
-i, --ignore-case ignore case distinctions
-w, --word-regexp force PATTERN to match only whole words
-x, --line-regexp force PATTERN to match only whole lines
-z, --null-data a data line ends in 0 byte, not newline

Miscellaneous:
-s, --no-messages suppress error messages
-v, --invert-match select non-matching lines
-V, --version print version information and exit
--help display this help and exit
--mmap use memory-mapped input if possible

Output control:
-m, --max-count=NUM stop after NUM matches
-b, --byte-offset print the byte offset with output lines
-n, --line-number print line number with output lines
--line-buffered flush output on every line
-H, --with-filename print the filename for each match
-h, --no-filename suppress the prefixing filename on output
--label=LABEL print LABEL as filename for standard input
-o, --only-matching show only the part of a line matching PATTERN
-q, --quiet, --silent suppress all normal output
--binary-files=TYPE assume that binary files are TYPE
TYPE is 'binary', 'text', or 'without-match'
-a, --text equivalent to --binary-files=text
-I equivalent to --binary-files=without-match
-d, --directories=ACTION how to handle directories
ACTION is 'read', 'recurse', or 'skip'
-D, --devices=ACTION how to handle devices, FIFOs and sockets
ACTION is 'read' or 'skip'
-R, -r, --recursive equivalent to --directories=recurse
--include=PATTERN files that match PATTERN will be examined
--exclude=PATTERN files that match PATTERN will be skipped.
--exclude-from=FILE files that match PATTERN in FILE will be skipped.
-L, --files-without-match only print FILE names containing no match
-l, --files-with-matches only print FILE names containing matches
-c, --count only print a count of matching lines per FILE
-Z, --null print 0 byte after FILE name

Context control:
-B, --before-context=NUM print NUM lines of leading context
-A, --after-context=NUM print NUM lines of trailing context
-C, --context=NUM print NUM lines of output context
-NUM same as --context=NUM
--color[=WHEN],
--colour[=WHEN] use markers to distinguish the matching string
WHEN may be `always', `never' or `auto'.
-U, --binary do not strip CR characters at EOL (MSDOS)
-u, --unix-byte-offsets report offsets as if CRs were not there (MSDOS)

`egrep' means `grep -E'. `fgrep' means `grep -F'.
With no FILE, or when FILE is -, read standard input. If less than
two FILEs given, assume -h. Exit status is 0 if match, 1 if no match,
and 2 if trouble.

Display only Table Names (without FileName):
grep -o -r -h tbl[A-Za-z0-9]* *

Output:
tblCustomer
tblCustomer
tblMember
tblMember
tblMember
tblUser2
tblUser2

Removing duplicate Table Names:
grep -o -r -h tbl[A-Za-z0-9]* * | sort.exe | uniq.exe

Output:
tblCustomer
tblMember
tblUser2

Finding only ASP files:
grep --include=*.asp -o -r tbl[A-Za-z0-9]* *

References:

Tuesday, April 03, 2007

Wusage Rewrite Rules with Regular Expressions

I have edited Wusage's configuration file to rewrite the url before Wusage engine process the url. The Wusage's Rewrite Rules below shows the use of Wusage's Regular Expressions.

Wusage's documentation on the usage of Regular Expressions is not well documented. There were no examples to write Regular Expressions with Replacement using Backreferences.

NOTE: Wusage's regular expression must begin with @

After some trial and error, here is what I have learnt.

Example 1:
URL to match:
/flash/header.swf?rnd=0.24753951619780357


Regular expression:
@^([^\.]*\.swf)\?.*$

Replacement expression:
\1 Returns the entire string matched by the regular expression, e.g. /flash/header.swf?rnd=0.24753951619780357
\2 Returns the first sub-match, e.g. ([^\.]*\.swf), /flash/header.swf

Example 2:
URL to match:
/resize_image.aspx?FilePath=/resource_docs/SEA_Sewar.jpg&width=640


Regular expression:
@^/resize_image\.aspx\?FilePath=([^\.]*\.(jpg|gif|tif|bmp))&.+$

Replacement expression:
\1 Returns the entire string matched by the regular expression, e.g. /resize_image.aspx?FilePath=/resource_docs/SEA_Sewar.jpg&width=640
\2 Returns the first sub-match, e.g. ([^\.]*\.(jpg|gif|tif|bmp)), /resource_docs/SEA_Sewar.jpg

Wusage's config file snippet:
#Rewrite Rules...
rewrite
# /flash/header.swf?rnd=0.24753951619780357
# /flash/header_exhibition.swf?rnd=0.7556053221950332
# /flash/header_learning.swf?rnd=0.0028210814865592404
# /flash/header_visitus.swf?rnd=0.01042118194914482
url @^([^\.]*\.swf)\?.*$ \2

# /resize_image.aspx?FilePath=/resource_docs/SEA_Sewar.jpg&width=640
url @^/resize_image\.aspx\?FilePath=([^\.]*\.(jpg|gif|tif|bmp))&.+$ \2

url /exhibitions/events.asp?* /exhibitions/events.asp
url /exhibitions/eventdetail.asp?* /exhibitions/eventdetail.asp
url /exhibitions/exhibitions.asp?* /exhibitions/exhibitions.asp
url /exhibitions/booking.asp?* /exhibitions/booking.asp
url /exhibitions/thankyou.asp?* /exhibitions/thankyou.asp
url /exhibitions/booking_clear.asp?* /exhibitions/booking_clear.asp
url /exhibitions/ticketdetail.asp?* /exhibitions/ticketdetail.asp

url /booking/login/loginaction.asp?* /booking/login/loginaction.asp
url /booking/payment/add_booking.asp?* /booking/payment/add_booking.asp
url /booking/report/attendence.asp?* /booking/report/attendence.asp
url /booking/report/payment_details.asp?* /booking/report/payment_details.asp
url /booking/report/details.asp?* /booking/report/details.asp
url /booking/payment/details.asp?* /booking/payment/details.asp
url /booking/report/excel_details.asp?* /booking/report/excel_details.asp
url /booking/report/excel_payment_details.asp?* /booking/report/excel_payment_details.asp

url /pressroom/releasesdetail.asp?* /pressroom/releasesdetail.asp
url /pressroom/newsdetail.asp?* /pressroom/newsdetail.asp
url /pressroom/news.asp?* /pressroom/news.asp
url /pressroom/releases.asp?* /pressroom/releases.asp

url /learning/imagepopup.asp?* /learning/imagepopup.asp
url /learning/museumartefacts.asp?* /learning/museumartefacts.asp

url /themuseum/galleries2.asp?* /themuseum/galleries2.asp
end rewrite