Uncommon Techniques for Naoko 4.5
=================================

This is a collection of methods or "nuances" in Proxomitron's matching
language that are not well known. It is intended for those who are already
familiar with the official help files.

sidki, December 2004 -- last updated October 2010 by JJoe

-------------------------------------------------------------------------------
1
HTTP header fields may start with multiple whitespace characters
(RFC2616 SEC 4.2). To cover this, $IHDR() and $OHDR() need two spaces after
the colon, because the first space lacks the full "zero to infinite" magic.

Examples:
$IHDR(X-Test:( ) foo)
$IHDR(Content-Length:( ) \1)
$OHDR(User-Agent:( ) weird_ua_that_prepends_multi_spaces)

-------------------------------------------------------------------------------
2
A capturing positional variable that directly follows a negated expression
may be incorrectly found to have a value in the expressions and filters that
follow. To cover this, place positional variables that directly follow
negated expressions in parentheses.

Examples:
(^[a-z])(\0)
(^foo(^bar))(\9)
$TST(foo=(^*bar)(\7))

-------------------------------------------------------------------------------
3
You don't need to match the entire value string in $IHDR(), $OHDR(), $RESP(),
or $URL() commands.

Examples:
$IHDR(Content-Type:( ) image/j)
$OHDR(User-Agent:*opera)
$RESP(2)
^$RESP([345])
$URL(http://www.Shonen.Knife.com/Naoko/M)

-------------------------------------------------------------------------------
4
Variables can be conditionally set by inserting a test.

Examples:
$SET(0=$TST(foo=false)new value)        \0 is reset
$SET(0=$TST(foo=true)new value)         \0 = "new value"
$SET(test=$TST(foo=false)new value)     test retains previous value
$SET(test=new$TST(foo=false) value)     test = "new"
$SET(test=new$TST(foo=true) value)      test = "new value"

-------------------------------------------------------------------------------
5a
Positional variables or meta-chars that are not supposed to expand until
replacement - except when used with commands like $ADDLST() or a global
$SET() - can also be expanded in a string test by using parens.

Examples:
$TST((\1)=foo)
$TST((my \1\2)=my foobar)
$TST((\p)=/Naoko/Michie/Atsuko/kappa*)

http://$TST((\x))bweb..mysite.com/
  matches only if URL command prefix is present
http://($TST((\x))|)bweb..mysite.com/
  matches whether or not URL command prefix is present

Note: Direct expansion may leak memory if used in blocklists.

5b
"Replace only" commands expand straight in a string test without need for
parens.

Examples:
$TST($DTM(w))
$TST($DTM(c)=*1)
$SET(1=%66%6F%6F)$TST($UESC(\1)=foo)

-------------------------------------------------------------------------------
6a
You can't test a positional variable that has been previously assigned with
$SET(), unless you expand it immediately.

Examples:
$SET(0=foo)$TST(\0=foo)                      test fails
$SET(0=foo)$TST((\0)=foo)                    test succeeds
$SET(1=foo)$SET(2=bar)$TST((\1\2)=foobar)    test succeeds

6b
However, you can reSET a previously assigned positional variable and test it,
as long as you do both in a subexpression that is preceded by an ampersand.

Examples:
$STOP()(?)\0&$SET(0=)$TST(\0=?)              test fails
$STOP()(?)\0$SET(0=)&$TST(\0=?)              test succeeds

-------------------------------------------------------------------------------
7
Positional variables can be reused (assign -> use -> reassign) with global
variables because they are immediately expanded.

Examples:
(???)\0$SET(a=\0)(??)\0$SET(b=\0)
((?)\0$SET(a=$GET(a)\0-))+

-------------------------------------------------------------------------------
8
You can test for the absence of a variable.

Examples:
^$TST(\0=*)
(^$TST(foo=*))match_expr

-------------------------------------------------------------------------------
9
You can stop a filter without letting it match as a whole by placing $STOP()
anywhere before the match fails -- $STOP() is always processed when
encountered.
In the examples below, PrxFail$TST() is used to force match failure, even if
a web page should "accommodate" to public Proxomitron filters. $TST() never
matches, but is a bit slow and is only processed in the latter scenario.

Examples:
$STOP()$SET(foo=bar)DontMatchMe          sets stop, sets foo=bar, match fails

(
 $TST(var=true)                          if var is "true" process match_expr,
 |(^$TST(var=true))$STOP()PrxFail$TST()  else set stop and fail match
)
match_expr

-------------------------------------------------------------------------------
10
Global variables can be set without letting the filter match as a whole by
placing the $SET() anywhere before the match fails -- global $SETs are always
processed when encountered, and retained even if the match fails.

Examples:
$STOP()$SET(activator=1)DontMatchMe
<(tag1$SET(tag1=1)|tag2$SET(tag2=1))PrxFail$TST()

-------------------------------------------------------------------------------
11a
Loops -- Limiting expression scopes.
You can use "+" loops to isolate subexpressions, removing their capability to
look ahead.

Example:
Say we want to match <foo ...>, but only if the following tag isn't </foo>

<foo*>(^*</foo>)
... wouldn't work, because "*>" doesn't stop at the first match but is
looking ahead.

<foo[^>]+>(^[^<]+</foo>)
... would work, but [^...] forces inspection of each character.

<foo(*>)+{1}(^(*<)+{1}/foo>)
... does what we want, quickly. "*>", "*<" are not looking ahead anymore.

11b
Avoiding superfluous tests in OR conditions.

Example:
Say we want to match "prefix-possible_suffix ...
some_string" and capture "-possible_suffix" if present.

prefix(-possible_suffix|)\1*some_string
... would cause the filter to attempt the match twice on:
"prefix-possible_suffix ... no_match"

prefix((-possible_suffix)+)\1*some_string
... does what we want.

11c
However, +/++ loops remove the uniqueness of the string under test, even if
followed by {1,*}. If possible, and if you aren't just testing the very
beginning of a document or bounds match, try to start your test string with
at least one unique character (preferably more).

Example:
To test for 100 asterisk symbols anywhere in a document:
\*\*\*+{98}   instead of   \*+{100}

*EOF*