My gitweb installation works so far, but all generated links that include a querystring, e.g.
host/gitweb?p=somepath&a=summary
are somehow malformed.
First, the ampersand is replaced by a semicolon. When inspecting the html, the link looks like
host/gitweb?p=somepath;a=summary
When klicking the link, the browser escapes the ';' to '%3b', so the url sent to the server looks like
host/gitweb?p=somepath%3ba=summary
The gitweb.cgi does not parse this and displays a 404 error page. When I replace the '%3b' with a ';' or a '&', everything works fine.
How can this be fixed on the server-side?
So far, I have tried to find the line producing the ';' in the urls, which is line 1457
$href .= "?" . join(';', #result) if scalar #result;
replacing the ';' by '&' gives me a malformed xhtml in the browser. Replacing it by '&' forces the browser again to escape the ';' which produces broken urls again.
The issue is kind of hidden (I can view the repositories), if I set the option
$feature{'pathinfo'}{'default'} = [1];
in the gitweb.conf file, but unfortunately, folders containing multiple repositories cannot be displayed, since the respective link uses some query-parameters.
You do not encoded or decoded the query string as a hole.
The param name and value must be encoded and decoded individually and the delimiters of the query string i.e..
= ; &
never get URL encoded or decoded.
You can check the server environment variable $ENV{REQUEST_URI} to see if the server did receive the information encoded like you said. If the browser is sending it then that is the problem and there is nothing you can do in your Perl code to fix that. Because it will just cause more problems down the road in your Perl code.
I somehow solved the problem using a rewrite rule in apache:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^a=project_list%3bpf=(.*)$
RewriteRule (.*) $1?a=project_list;pf=%1 [QSA]
It only works for the project list with a specified path with the pathinfo feature enabled. A holistic approach would be to search for all possible query parameters and to replace them all.
However, this solution is not satisfying, since it does not fix the actual problem of the weird url escaping of gitweb.
Related
I have $stringF. Contained within $stringF is the following (the string is all one line, not word-wrapped as below):
http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=
AFQjCNHWQk0M4bZi9xYO4OY4ZiDqYVt2SA&clid=
c3a7d30bb8a4878e06b80cf16b898331&cid=52779892300270&ei=
H4IAW6CbK5WGhQH7s5SQAg&url=https://abcnews.
go.com/Lifestyle/wireStory/latest-royal-wedding-thousands-streets-windsor-55280649
I want to locate that string and make it look like this:
https://abcnews.go.com/Lifestyle/wireStory/latest-royal-
wedding-thousands-streets-windsor-55280649
Basically I need to use preg_replace to find the following string:
http://news.google.com/news/url?sa= ***SOME UNKNOWN CONTENT*** &url=http
and replace it with the following string:
http
I'm a little rusty with my php, and even rustier with regular expressions, so I'm struggling to figure this one out. My code looks like this:
$stringG = preg_replace('http://news.google.com/news/url?sa=*&url=http','http',$stringH);
except I know I can't use wildcards and I know I need to specially deal with the special characters (colon, forward slash, question mark, and sign, etc). Hoping someone can help me out here.
Also of note is that my $stringF contains multiple instances of such strings, so I need the preg_replace to be not greedy - otherwise it will replace a huge chunk of my string unnecessarily.
PHP has tools for that, no need to use a regex. parse_url to get the components of an url (scheme, host, path, anchor, query, ...) and parse_str to get the keys/values of the query part.
$url = 'http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNHWQk0M4bZi9xYO4OY4ZiDqYVt2SA&clid=c3a7d30bb8a4878e06b80cf16b898331&ci=52779892300270&ei=H4IAW6CbK5WGhQH7s5SQAg&url=https://abcnews.go.com/Lifestyle/wireStory/latest-royal-wedding-thousands-streets-windsor-55280649';
parse_str(parse_url($url, PHP_URL_QUERY), $arr);
echo $arr['url'];
I'm creating a PowerShell script that will assemble an HTTP path from user input. The output has to convert any spaces in the user input to the product specific codes, "%2F".
Here's a sample of the source and the output:
The site URL can be a constant, though a variable would be a better approach for reuse, as used in the program is: /http:%2F%2SPServer/Projects/"
$Company="Company"
$Product="Product"
$Project="The new project"
$SitePath="$SiteUrl/$Company/$Product/$Project"
As output I need:
'/http:%2F%2FSPServer%2FProjects%2FCompany%2FProductF2FThe%2Fnew%2Fproject'
To replace " " with %20 and / with %2F and so on, do the following:
[uri]::EscapeDataString($SitePath)
The solution of #manojlds converts all odd characters in the supplied string.
If you want to do escaping for URLs only, use
[uri]::EscapeUriString($SitePath)
This will leave, e.g., slashes (/) or equal signs (=) as they are.
Example:
# Returns http%3A%2F%2Ftest.com%3Ftest%3Dmy%20value
[uri]::EscapeDataString("http://test.com?test=my value")
# Returns http://test.com?test=my%20value
[uri]::EscapeUriString("http://test.com?test=my value")
For newer operating systems, the command is changed. I had problems with this in Server 2012 R2 and Windows 10.
[System.Net.WebUtility] is what you should use if you get errors that [System.Web.HttpUtility] is not there.
$Escaped = [System.Net.WebUtility]::UrlEncode($SitePath)
The output transformation you need (spaces to %20, forward slashes to %2F) is called URL encoding. It replaces (escapes) characters that have a special meaning when part of a URL with their hex equivalent preceded by a % sign.
You can use .NET framework classes from within Powershell.
[System.Web.HttpUtility]::UrlEncode($SitePath)
Encodes a URL string. These method overloads can be used to encode the entire URL, including query-string values.
http://msdn.microsoft.com/en-us/library/system.web.httputility.urlencode.aspx
I have the following problem: when I try to save the file that contains a semicolon in the name it returns a huge and weird stacktrace of the characters on the page. I've tried to escape, to trim and to replace those semicolons, but the result is still the same. I use the following regex:
$value =~ s/([^a-zA-Z0-9_\-.]|;)/uc sprintf("%%%02x",ord($1))/eg;
(I've even added the |; part separately..)
So, when I open the file to write and call the print function it returns lots of weird stuff, like that:
PK!}�3y�[Content_Types].xml ���/�h9\�?�0���cz��:� �s_����o���>�T�� (it is a huge one, this is just a part of it).
Is there any way I could avoid this?
Thank you in advance!
EDIT:
Just interested - what is the PK responsible of in this string? I mean I can understand that those chars are just contents of the file, but what is PK ? And why does it show the content type?
EDIT 2.0:
I'm uploading the .docx file - when the name doesn't contain the semicolon it works all fine. This is the code for the file saving:
open (QSTR,">", "$dest_file") or die "can't open output file: $qstring_file";
print QSTR $value;
close (QSTR);
EDIT 3.0
This is a .cgi script, that is called after posting some data to the server. It has to save some info about the uploading file to a temp file (name, contents, size) in the manner of key-value pairs. So any file that contains the semicolon causes this error.
EDIT 4.0
Found the cause:
The CGI param function while uploading the params counts semicolon as the delimiter! Is there any way to escape it in the file header?
The PK in file header it means it is compressed ZIP like file, like docx.
One guess: The ; is not valid character in filename at the destination?
Your regexp is not good: (the dot alone is applicable to any character...)
$value =~ s/([^a-zA-Z0-9_\-.]|;)/uc sprintf("%%%02x",ord($1))/eg;
Try this:
#replace evey non valid char to underscore
$value =~ s/([^a-zA-Z0-9_\-\.\;])/_/g;
I'm passing a URL parameter to a form via a get request. I need to URL encode the parameter when the parameter contains a '#' . Otherwise the request fails. Why is this required ? Why do I need to URL encode the '#' parameter but not other text ?
'#' is used in URLs to indicate where a fragment identifier
(bookmarks/anchors in HTML) begins.
The part following the # is never seen by the server. It is generally used for navigation at the client-end.
The following characters need to be encoded in order to be used literally.
When using GET, anything after # (and the # itself) will not be seen by the server.
I am trying to use a web API of a service written in Perl (OTRS).
The data is sent in JSON format.
One of the string values inside the JSON structure contains a pound sign, which in apparently is used as a comment character in JSON.
This results in a parsing error:
unexpected end of string while parsing
JSON string
I couldn't find how to escape the character in order to get the string parsed successfully.
The obvious slash escaping results in:
illegal backslash escape sequence in
string
Any ideas how to escape it?
Update:
The URL I am trying to use looks something like that (simplified but still causes the error):
http://otrs.server.url/otrs/json.pl?User=username&Password=password&Object=TicketObject&Method=ArticleSend&Data={"Subject":"[Ticket#100000] Test Ticket from OTRS"}
Use Uri::escape:
use URI::Escape;
my $safe = uri_escape($url);
See rfc1738 for the list of characters which can be unsafe.
The hash symbol, #, has a special meaning in URLs, not in JSON. Your URL is probably getting truncated at the hash before the remove server even sees it:
http://otrs.server.url/otrs/json.pl?User=username&Password=password&Object=TicketObject&Method=ArticleSend&Data={"Subject":"[Ticket
And that means that the remote server gets mangled JSON in Data. The solution is to URL encode your parameters before pasting them together to form your URL; eugene y tells you how to do this.