I'm trying to remove some parameters using nginx return 301.
For example I want URLs redirected like this:
https://www.example.com/one?wanted=1&unwanted1=1 -> https://www.example.com/one?wanted=1
https://www.example.com/one?unwanted1=1 -> https://www.example.com/one
https://www.example.com/one?unwanted1=1&unwanted2=2&unwanted3=3 -> https://www.example.com/one
...and so on
I did some experiments with this code (note it uses rewrite instead of return):
if ($request_uri ~ "([^\?]*)\?(.*)unwanted=([^&]*)&?(.*)") {
set $args $2$4;
rewrite "^" $scheme://$host$uri permanent;
}
This works well when only one of the unwanted parameters are present.
I tried to repeat it for each parameter but now it doesn't work in an optimum way as it does multiple redirects and does not clean up the "&" when it is no longer needed (only one arg left).
...moreover - I would prefer to use return 301 instead of rewrite as I understand it would be the preferred way in these kind of situations.
Tried to adapt the last line to return 301 but it didn't work out for me.
Any ideas how I can correctly accomplish this?
To remove multiple unwanted arguments in an indeterminate order, you will need to use some form of recursion. Your question contains an example that uses an external redirection loop, but suffers from undesired edge cases.
The edge cases of leaving trailing ? and & can be cleared up using multiple if...return blocks.
For example:
if ($args ~ ^(?:unwanted1|unwanted2)=[^&]*$ ) {
return 301 $uri;
}
if ($args ~ ^(?:unwanted1|unwanted2)=[^&]*(?:&(.*))?$ ) {
return 301 $uri?$1;
}
if ($args ~ ^(.*)&(?:unwanted1|unwanted2)=[^&]*(&.*)?$ ) {
return 301 $uri?$1$2;
}
The above works by treating the two cases when a trailing ? or trailing & occurs, separately with a specially crafted return statement.
The above works by using an external redirection loop.
nginx can also perform internal redirection, and one method to achieve this is to use a rewrite...last from within a location block.
The restriction in this case would be to find the location block which will process the set of URIs that are affected by the unwanted arguments. In the example below, I use the location / block, but your requirements will be affected by the other location blocks within your configuration and the set of URIs affected.
Although you have clearly had success with assigning to the internal $args variable, my answer avoids that technique.
The following example reimplements the previous example using rewrite...last and adds a forth if block to perform the actual return 301.
location / {
if ($args ~ ^(?:unwanted1|unwanted2)=[^&]*$ ) {
rewrite ^ $uri? last;
}
if ($args ~ ^(?:unwanted1|unwanted2)=[^&]*(?:&(.*))?$ ) {
rewrite ^ $uri?$1? last;
}
if ($args ~ ^(.*)&(?:unwanted1|unwanted2)=[^&]*(&.*)?$ ) {
rewrite ^ $uri?$1$2? last;
}
if ($request_uri ~ \?(.*&)?(unwanted1|unwanted2)= ) {
return 301 $uri$is_args$args;
}
try_files $uri $uri/ =404;
}
Note that the example uses only permitted statements within the if blocks as indicated in this document.
Related
I want to parse Scala grammar with flex and bison. But I don't know how to parse the newline token in Scala grammar.
If I parse newline as a token T_NL, Here's the Toy.l for example:
...
[a-zA-Z_][a-zA-Z0-9_]* { yylval->literal = strdup(yy_text); return T_ID; }
\n { yylval->token = T_LN; return T_LN; }
[ \t\v\f\r] { /* skip whitespaces */ }
...
And here's the Toy.y for example:
function_def: 'def' T_ID '(' argument_list ')' return_expression '=' expression T_NL
;
argument_list: argument
| argument ',' argument_list
;
expression: ...
;
return_expression: ...
;
You could see that I have to skip T_NL in all other statements and definitions in Toy.y, which is really boring.
Please educate me with source code example!
This is a clear case where bison push-parsers are useful. The basic idea is that the decision to send an NL token (or tokens) can only be made when the following token has been identified (and, in one corner case, the second following token).
The advantage of push parsers is that they let us implement strategies like this, where there is not necessarily a one-to-one relationship between input lexemes and tokens sent to the parser. I'm not going to deal with all the particularities of setting up a push parser (though it's not difficult); you should refer to the bison manual for details. [Note 1]
First, it's important to read the Scala language description with care. Newline processing is described in section 2.13:
A newline in a Scala source text is treated as the special token “nl” if the three following criteria are satisfied:
The token immediately preceding the newline can terminate a statement.
The token immediately following the newline can begin a statement.
The token appears in a region where newlines are enabled.
Rules 1 and 2 are simple lookup tables, which are precisely defined in the following two paragraphs. There is just one minor exception to rule 2 has a minor exception, described below:
A case token can begin a statement only if followed by a class or object token.
One hackish possibility to deal with that exception would be to add case[[:space:]]+class and case[[:space:]]+object as lexemes, on the assumption that no-one will put a comment between case and class. (Or you could use a more sophisticated pattern, which allows comments as well as whitespace.) If one of these lexemes is recognised, it could either be sent to the parser as a single (fused) token, or it could be sent as two tokens using two invocations of SEND in the lexer action. (Personally, I'd go with the fused token, since once the pair of tokens has been recognised, there is no advantage to splitting them up; afaik, there's no valid program in which case class can be parsed as anything other than case class. But I could be wrong.)
To apply rules one and two, then, we need two lookup tables indexed by token number: token_can_end_stmt and token_cannot_start_stmt. (The second one has its meaning reversed because most tokens can start statements; doing it this way simplifies initialisation.)
/* YYNTOKENS is defined by bison if you specify %token-tables */
static bool token_can_end_stmt[YYNTOKENS] = {
[T_THIS] = true, [T_NULL] = true, /* ... */
};
static bool token_cannot_start_stmt[YYNTOKENS] = {
[T_CATCH] = true, [T_ELSE] = true, /* ... */
};
We're going to need a little bit of persistent state, but fortunately when we're using a push parser, the scanner does not need to return to its caller every time it recognises a token, so we can keep the persistent state as local variables in the scan loop. (That's another advantage of the push-parser architecture.)
From the above description, we can see that what we're going to need to maintain in the scanner state are:
some indication that a newline has been encountered. This needs to be a count, not a boolean, because we might need to send two newlines:
if two tokens are separated by at least one completely blank line (i.e a line which contains no printable characters), then two nl tokens are inserted.
A simple way to handle this is to simply compare the current line number with the line number at the previous token. If they are the same, then there was no newline. If they differ by only one, then there was no blank line. If they differ by more than one, then there was either a blank line or a comment (or both). (It seems odd to me that a comment would not trigger the blank line rule, so I'm assuming that it does. But I could be wrong, which would require some adjustment to this scanner.) [Note 2]
the previous token (for rule 1). It's only necessary to record the token number, which is a simple small integer.
some way of telling whether we're in a "region where newlines are enabled" (for rule 3). I'm pretty sure that this will require assistance from the parser, so I've written it that way here.
By centralising the decision about sending a newline into a single function, we can avoid a lot of code duplication. My typical push-parser architecture uses a SEND macro anyway, to deal with the boilerplate of saving the semantic value, calling the parser, and checking for errors; it's easy to add the newline logic there:
// yylloc handling mostly omitted for simplicity
#define SEND_VALUE(token, tag, value) do { \
yylval.tag = value; \
SEND(token); \
} while(0);
#define SEND(token) do { \
int status = YYPUSH_MORE; \
if (yylineno != prev_lineno) \
&& token_can_end_stmt[prev_token] \
&& !token_cannot_start_stmt[token] \
&& in_new_line_region) { \
status = yypush_parse(ps, T_NL, NULL, &yylloc, &nl_enabled); \
if (status == YYPUSH_MORE \
&& yylineno - prev_lineno > 1) \
status = yypush_parse(ps, T_NL, NULL, &yylloc, &nl_enabled); \
} \
nl_encountered = 0; \
if (status == YYPUSH_MORE) \
status = yypush_parse(ps, token, &yylval, &yylloc, &nl_enabled); \
if (status != YYPUSH_MORE) return status; \
prev_token = token; \
prev_lineno = yylineno; \
while (0);
Specifying the local state in the scanner is extremely simple; just place the declarations and initialisations at the top of your scanner rules, indented. Any indented code prior to the first rule is inserted directly into yylex, almost at the top of the function (so it is executed once per call, not once per lexeme):
%%
int nl_encountered = 0;
int prev_token = 0;
int prev_lineno = 1;
bool nl_enabled = true;
YYSTYPE yylval;
YYLTYPE yylloc = {0};
Now, the individual rules are pretty simple (except for case). For example, we might have rules like:
"while" { SEND(T_WHILE); }
[[:lower:]][[:alnum:]_]* { SEND_VALUE(T_VARID, str, strdup(yytext)); }
That still leaves the question of how to determine if we are in a region where newlines are enabled.
Most of the rules could be handled in the lexer by just keeping a stack of different kinds of open parentheses, and checking the top of the stack: If the parenthesis on the top of the stack is a {, then newlines are enabled; otherwise, they are not. So we could use rules like:
[{([] { paren_stack_push(yytext[0]); SEND(yytext[0]); }
[])}] { paren_stack_pop(); SEND(yytext[0]); }
However, that doesn't deal with the requirement that newlines be disabled between a case and its corresponding =>. I don't think it's possible to handle that as another type of parenthesis, because there are lots of uses of => which do not correspond with a case and I believe some of them can come between a case and it's corresponding =>.
So a better approach would be to put this logic into the parser, using lexical feedback to communicate the state of the newline-region stack, which is what is assumed in the calls to yypush_parse above. Specifically, they share one boolean variable between the scanner and the parser (by passing a pointer to the parser). [Note 3] The parser then maintains the value of this variable in MRAs in each rule which matches a region of potentially different newlinedness, using the parse stack itself as a stack. Here's a small excerpt of a (theoretical) parser:
%define api.pure full
%define api.push-pull push
%locations
%parse-param { bool* nl_enabled; }
/* More prologue */
%%
// ...
/* Some example productions which modify nl_enabled: */
/* The actions always need to be before the token, because they need to take
* effect before the next lookahead token is requested.
* Note how the first MRA's semantic value is used to keep the old value
* of the variable, so that it can be restored in the second MRA.
*/
TypeArgs : <boolean>{ $$ = *nl_enabled; *nl_enabled = false; }
'[' Types
{ *nl_enabled = $1; } ']'
CaseClause : <boolean>{ $$ = *nl_enabled; *nl_enabled = false; }
"case" Pattern opt_Guard
{ *nl_enabled = $1; } "=>" Block
FunDef : FunSig opt_nl
<boolean>{ $$ = *nl_enabled; *nl_enabled = true; }
'{' Block
{ *nl_enabled = $3; } '}'
Notes:
Push parsers have many other advantages; IMHO they are they are the solution of choice. In particular, using push parsers avoids the circular header dependency which plagues attempts to build pure parser/scanner combinations.
There is still the question of multiline comments with preceding and trailing text:
return /* Is this comment
a newline? */ 42
I'm not going to try to answer that question.)
It would be possible to keep this flag in the YYLTYPE structure, since only one instance of yylloc is ever used in this example. That might be a reasonable optimisation, since it cuts down on the number of parameters sent to yypush_parse. But it seemed a bit fragile, so I opted for a more general solution here.
I am using this to find customer name in text file. Names are each on a separate line. I need to find exact name. If searching for Nick specifically it should find Nick only but my code will say found even if only Nickolson is in te list.
On*:text:*!Customer*:#: {
if ($read(system\Customer.txt,$2)) {
.msg $chan $2 Customer found in list! | halt }
else { .msg $chan 4 $2 Customer not found in list. | halt }
}
You have to loop through every matching line and see if the line is an exact match
Something like this
On*:text:*!Custodsddmer*:#: {
var %nick
; loop over all lines that contains nick
while ($read(customer.txt, nw, *nick*, $calc($readn + 1))) {
; check if the line is an exact match
if ($v1 == nick) {
%nick = $v1
; stop the loop because a result is found
break;
}
}
if (%nick == $null) {
.msg $chan 4 $2 Customer not found in list.
}
else{
.msg $chan $2 Customer found in list!
}
You can find more here: https://en.wikichip.org/wiki/mirc/text_files#Iterating_Over_Matches
If you're looking for exact match in a new line separate list, then you can use the 'w' switch without using wildcard '*' character.
From mIRC documentation
$read(filename, [ntswrp], [matchtext], [N])
Scans the file info.txt for a line beginning with the word mirc and
returns the text following the match value. //echo $read(help.txt, w,
*help*)
Because we don't want the wildcard matching, but a exact match, we would use:
$read(customers.txt, w, Nick)
Complete Code:
ON *:TEXT:!Customer *:#: {
var %foundInTheList = $read(system\Customer.txt, w, $2)
if (%foundInTheList) {
.msg # $2 Customer found in list!
}
else {
.msg 4 # $2 Customer not found in list.
}
}
Few remarks on Original code
Halting
halt should only use when you forcibly want to stop any future processing to take place. In most cases, you can avoid it, by writing you code flow in a way it will behave like that without explicitly using halting.
It will also resolve new problems that may arise, in case you will want to add new code, but you will wonder why it isn't executing.. because of the darn now forgotten halt command.
This will also improve you debugging, in the case it will not make you wonder on another flow exit, without you knowing.
Readability
if (..) {
.... }
else { .. }
When considering many lines of codes inside the first { } it will make it hard to notice the else (or elseif) because mIRC remote parser will put on the same identification as the else line also the line above it, which contains the closing } code. You should almost always few extra code in case of readability, especially which it costs new nothing!, as i remember new lines are free of charge.
So be sure the to have the rule of thump of every command in a new line. (that includes the closing bracket)
Matching Text
On*:text:*!Customer*:#: {
The above code has critical problem, and bug.
Critical: Will not work, because on*:text contains no space between on and *:text
Bug: !Customer will match EVERYTHING-BEFORE!customerANDAFTER <NICK>, which is clearly not desired behavior. What you want is :!Customer *: will only match if the first word was !customer and you must enter at least another text, because I've used [SPACE]*.
Let's say I have a URL such as:
http://www.example.com/something.html?abc=one&def=two&unwanted=three
I would like to remove the URL parameter unwanted and keep the rest of the URL in tact and it should look like:
http://www.example.com/something.html?abc=one&def=two
This specific parameter can be anywhere in the URL with respect to other parameters. The redirect should work regardless.
Can this be achieved?
You can achieve that this way
if ($request_uri ~ "([^\?]*)\?(.*)unwanted=([^&]*)&?(.*)") {
set $original_path $1;
set $args1 $2;
set $unwanted $3;
set $args2 $4;
set $args "";
rewrite ^ "${original_path}?${args1}${args2}" permanent;
}
I wanted to do something similar to this and here's the result ammended back to the context of the original question (regex is based on Milos Jovanovic's Answer)
if ($request_uri ~ "([^\?]*)\?(.*)unwanted=([^&]*)&?(.*)") {
set $args $2$4;
rewrite "^" $scheme://$host$uri permanent;
}
this way we only set one variable and if the updated $args is empty we dont have an unwanted ? at the end of the url as the server handles that itself
An other way is to redirect the client.
Of course it would work only for GET or HEAD, not if request has a body (ie: POST PUT PATCH)
BTW: in the specific case of fbclid the second line in map isn't required as the param is always appended to the url
map $request_uri $redirect_fbclid {
"~^(.*)(?:[?&]fbclid=[^&]*)$" $1;
"~^(.*)(?:([?&])fbclid=[^&]*)&(.*)$" $1$2$3;
}
if ( $redirect_fbclid ) {
return 301 $redirect_fbclid;
}
I saw a very manual way of doing this in another post: How do I add a query parameter to a URL?
This doesn't seem very intuitive, but someone there mentioned an easier way to accomplish this using the upcoming "URL scope". Is this feature out yet, and how would I use it?
If you're using the stdlib mixer, you should be able to use the URL scope which provides helper functions for adding, viewing, editing, and removing URL params. Here's a quick example:
$original_url = "http://cuteoverload.com/2013/08/01/buttless-monkey-jams?hi=there"
$new_url = url($original_url) {
log(param("hi"))
param("hello", "world")
remove_param("hi")
}
log($new_url)
Tritium Tester example here: http://tester.tritium.io/9fcda48fa81b6e0b8700ccdda9f85612a5d7442f
Almost forgot, link to docs: http://tritium.io/current (You'll want to click on the URL category).
AFAIK, there's no built-in way of doing so.
I'll post here how I did to append a query param, making sure that it does not get duplicated if already on the url:
Inside your functions/main.ts file, you can declare:
# Adds a query parameter to the URL string in scope.
# The parameter is added as the last parameter in
# the query string.
#
# Sample use:
# $("//a[#id='my_link]") {
# attribute("href") {
# value() {
# appendQueryParameter('MVWomen', '1')
# }
# }
# }
#
# That will add MVwomen=1 to the end of the query string,
# but before any hash arguments.
# It also takes care of deciding if a ? or a #
# should be used.
#func Text.appendQueryParameter(Text %param_name, Text %param_value) {
# this beautiful regex is divided in three parts:
# 1. Get anything until a ? or # is found (or we reach the end)
# 2. Get anything until a # is found (or we reach the end - can be empty)
# 3. Get the remainder (can be empty)
replace(/^([^#\?]*)(\?[^#]*)?(#.*)?$/) {
var('query_symbol', '?')
match(%2, /^\?/) {
$query_symbol = '&'
}
# first, it checks if the %param_name with this %param_value already exists
# if so, we don't do anything
match_not(%2, concat(%param_name, '=', %param_value)) {
# We concatenate the URL until ? or # (%1),
# then the query string (%2), which can be empty or not,
# then the query symbol (either ? or &),
# then the name of the parameter we are appending,
# then an equals sign,
# then the value of the parameter we are appending
# and finally the hash fragment, which can be empty or not
set(concat(%1, %2, $query_symbol, %param_name, '=', %param_value, %3))
}
}
}
The other features you want (remove, modify) can be achieved similarly (by creating a function inside functions/main.ts and leveraging some regex magic).
Hope it helps.
So I'm using a basic formmail script. Within the script I'm using a redirect variable. The value of the redirect is something like:
http://www.mysite.com/NewOLS_GCUK_EN/bling.aspx?BC=GCUK&IBC=CSEE&SIBC=CSEE
When the redirect action happens however, the URL appears in the browser as:
http://www.mysite.com/NewOLS_GCUK_EN/bling.aspx?BC=GCUK&IBC=CSEE&SIBC=CSEE
You can see the & characters are replaced with &
Is there any way to fix this?
Maybe you can edit the script with a string substitution:
$myRedirectURL =~ s/\&/\&/g;
Or perhaps look in the script where the opposite substitution is taking place, and comment out that step.
HTML::Entities's decode_entities could decode this for you:
$redirect_target = decode_entities($redirect_target);
But passing the destination URL as HTTP argument (e.g. hidden form field) is dangerous (as #Sinan Ünür already said in the comments). Better store the target URL within your script and pass a selector from the form:
if ($selector eq 'home') { $target_url = 'http://www.foo.bar/'; }
elsif ($selector eq 'bling') { $target_url = 'http://www.foo.bar/NewOLS_GCUK_EN/bling.aspx'; }
else {
$target_url = 'http://www.foo.bar/default.html'; # Fallback/default value
}
Using a Hash would be shorter:
my %targets = {
home => 'http://www.foo.bar/',
bling => '/NewOLS_GCUK_EN/bling.aspx',
};
$target_url = $targets{$selector} || '/default_feedback_thanks.html';