Uncommon Techniques for Naoko 4.5
=================================
This is a collection of little-known methods or "nuances" in Proxomitron's
matching language. It is intended for those who are already familiar with
the official help files.
sidki, December 2004 -- last updated October 2010 by JJoe
-------------------------------------------------------------------------------
1 HTTP header field values may be preceded by any amount of whitespace
(RFC 2616, section 4.2). To cover this, $IHDR() and $OHDR() need two spaces
after the colon, because the first space lacks the full "zero to infinite"
whitespace magic.
Examples:
$IHDR(X-Test:( ) foo)
$IHDR(Content-Length:( ) \1)
$OHDR(User-Agent:( ) weird_ua_that_prepends_multi_spaces)
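For comparison, the following Python sketch (not Proxomitron syntax) shows what a header parser has to do under RFC 2616, section 4.2: skip any amount of spaces or tabs between the colon and the value. The function name parse_header_line is hypothetical, and the sketch deliberately ignores obsolete multi-line header folding.

```python
import re

# RFC 2616, section 4.2: a field value MAY be preceded by any amount of
# linear whitespace, though a single space is preferred.
# This sketch handles spaces and tabs on one line only (no line folding).
HEADER_RE = re.compile(r"^([^:\s]+):[ \t]*(.*?)[ \t]*$")


def parse_header_line(line: str) -> tuple[str, str]:
    """Split 'Name:   value' into (name, value), skipping leading whitespace."""
    m = HEADER_RE.match(line)
    if m is None:
        raise ValueError(f"not a header line: {line!r}")
    return m.group(1), m.group(2)


for raw in ("Content-Length: 42", "Content-Length:42", "Content-Length:      42"):
    print(parse_header_line(raw))
```

All three lines yield the same value "42", which is the variation the "( )" in the filters above has to absorb.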
-------------------------------------------------------------------------------
2 A capturing positional variable that directly follows a negated expression
may incorrectly appear to have a value in the expressions and filters that
follow. To avoid this, wrap any positional variable that directly follows a
negated expression in parentheses.
Examples:
(^[a-z])(\0)
(^foo(^bar))(\9)
$TST(foo=(^*bar)(\7))
-------------------------------------------------------------------------------
3 You don't need to match the entire value string in $IHDR(), $OHDR(),
$RESP(), or $URL() commands.
Examples:
$IHDR(Content-Type:( ) image/j)
$OHDR(User-Agent:*opera)
$RESP(2)
^$RESP([345])
$URL(http://www.Shonen.Knife.com/Naoko/M)
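As a rough analogy only, using Python's re module rather than Proxomitron, these commands behave like re.match(), which is anchored at the start of the value but not at the end, rather than like re.fullmatch(). The header values below are made-up samples.

```python
import re

content_type = "image/jpeg"  # sample header value (assumed)
user_agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54"  # sample

# Anchored at the start only, similar to $IHDR(Content-Type:( ) image/j)
prefix_hit = re.match(r"image/j", content_type) is not None
# Requiring the whole value to match fails here
full_hit = re.fullmatch(r"image/j", content_type) is not None
# A leading wildcard reaches further into the value, similar to
# $OHDR(User-Agent:*opera); IGNORECASE is used because the sample says "Opera"
wildcard_hit = re.match(r".*opera", user_agent, re.IGNORECASE) is not None

print(prefix_hit, full_hit, wildcard_hit)  # True False True
```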
-------------------------------------------------------------------------------
4 Variables can be conditionally set by inserting a test.
Examples:
$SET(0=$TST(foo=false)new value) \0 is reset
$SET(0=$TST(foo=true)new value) \0 = "new value"
$SET(test=$TST(foo=false)new value) test retains previous value
$SET(test=new$TST(foo=false) value) test = "new"
$SET(test=new$TST(foo=true) value) test = "new value"
-------------------------------------------------------------------------------
5a Positional variables and meta-chars, which normally aren't expanded until
replacement (except in commands like $ADDLST() or a global $SET()), can also
be expanded in a string test by wrapping them in parens.
Examples:
$TST((\1)=foo)
$TST((my \1\2)=my foobar)
$TST((\p)=/Naoko/Michie/Atsuko/kappa*)
http://$TST((\x))bweb..mysite.com/ matches only if URL command prefix is
present
http://($TST((\x))|)bweb..mysite.com/ matches if URL command prefix may or
may not be present
Note: Direct expansion may leak memory if used in blocklists.
5b "Replace only" commands expand directly in a string test without the need
for parens.
Examples:
$TST($DTM(w))
$TST($DTM(c)=*1)
$SET(1=%66%6F%6F)$TST($UESC(\1)=foo)
-------------------------------------------------------------------------------
6a You can't test a positional variable that has been previously assigned with
$SET(), unless you expand it immediately.
Examples:
$SET(0=foo)$TST(\0=foo) test fails
$SET(0=foo)$TST((\0)=foo) test succeeds
$SET(1=foo)$SET(2=bar)$TST((\1\2)=foobar) test succeeds
6b However, you can reSET a previously assigned positional variable and test
it, as long as you do both in a subexpression that is preceded by an
ampersand.
Examples:
$STOP()(?)\0&$SET(0=)$TST(\0=?) test fails
$STOP()(?)\0$SET(0=)&$TST(\0=?) test succeeds
-------------------------------------------------------------------------------
7 Positional variables can be reused (assign -> use -> reassign) by copying
them into global variables, because a global $SET() expands them immediately.
Examples:
(???)\0$SET(a=\0)(??)\0$SET(b=\0)
((?)\0$SET(a=$GET(a)\0-))+
-------------------------------------------------------------------------------
8 You can test for the absence of a variable.
Examples:
^$TST(\0=*)
(^$TST(foo=*))match_expr
-------------------------------------------------------------------------------
9 You can stop a filter without letting it match as a whole by placing
$STOP() anywhere before the match fails -- $STOP() is always processed when
encountered.
In the examples below, PrxFail$TST() is used to force a match failure, even if
a web page should "accommodate" public Proxomitron filters by containing the
literal text "PrxFail". An empty $TST() never matches, but it is a bit slow;
it is only processed in that latter case.
Examples:
$STOP()$SET(foo=bar)DontMatchMe sets stop, sets foo=bar, match fails
(
$TST(var=true) if var is "true" process match_expr,
|(^$TST(var=true))$STOP()PrxFail$TST() else set stop and fail match
)
match_expr
-------------------------------------------------------------------------------
10 Global variables can be set without letting the filter match as a whole by
placing the $SET() anywhere before the match fails -- global $SETs are
always processed when encountered, and retained even if match fails.
Examples:
$STOP()$SET(activator=1)DontMatchMe
<(tag1$SET(tag1=1)|tag2$SET(tag2=1))PrxFail$TST()
-------------------------------------------------------------------------------
11a Loops -- Limiting expression scopes.
You can use "+" loops to isolate subexpressions, removing their
capability to look ahead.
Example:
Say we want to match a <foo> start tag, but only if the next tag isn't </foo>.
<foo*>(^*</foo>)
... wouldn't work, because "*>" doesn't stop at the first match but is looking
ahead.
<foo[^>]+>(^[^<]+</foo>)
... would work, but [^...] forces inspection of each character.
<foo(*>)+{1}(^(*<)+{1}/foo>)
... does what we want, quickly. "*>" and "*<" no longer look ahead.
11b Avoiding superfluous tests in OR conditions.
Example:
Say we want to match "prefix-possible_suffix ... some_string" and capture
"-possible_suffix" if present.
prefix(-possible_suffix|)\1*some_string
... would cause the filter to attempt the match twice on:
"prefix-possible_suffix ... no_match"
prefix((-possible_suffix)+)\1*some_string
... does what we want.
11c However, +/++ loops remove the uniqueness of the string under test, even if
followed by {1,*}.
If possible, and unless you are only testing the very beginning of a
document or a bounds match, try to start your test string with at least one
unique character (preferably more).
Example:
To test for 100 consecutive asterisks anywhere in a document:
\*\*\*+{98} instead of \*+{100}
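The arithmetic generalizes: a few literal characters plus a loop repeated (count - literal) times still add up to count. The hypothetical Python helper below (not part of Proxomitron) only builds such a pattern string, assuming the character needs a backslash escape as "*" does and that "+{n}" means exactly n repetitions, as in the example above.

```python
def unique_start_run(char: str, count: int, literal: int = 2) -> str:
    """Build a Proxomitron-style pattern for `count` consecutive `char`s.

    The first `literal` copies are written out as plain escaped characters so
    the pattern starts with unique text; the rest go into a "+" loop with a
    fixed repeat count. Hypothetical helper; assumes `char` needs escaping.
    """
    if count <= literal:
        raise ValueError("count must exceed the number of literal characters")
    escaped = "\\" + char  # e.g. the two characters \*
    return escaped * literal + escaped + "+{" + str(count - literal) + "}"


print(unique_start_run("*", 100))  # \*\*\*+{98}
```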
*EOF*