negative regex with xidel + garbage-collect function - xidel

I currently use this command to extract URLs from a site
xidel https://www.website.com --extract "//h1//extract(@href, '.*')[. != '']"
This will extract all URLs (.*) but I would like to change this in a way that it would not extract URLs that contain specific strings in their URI path. For example, I would like to extract all URLs, except the ones that contain -text1- and -text2-
Also, xidel has a function called garbage-collect, but it's not clear to me how to use it. It could be
--extract garbage-collect()
or
--extract garbage-collect()[0]
or
x:extract garbage-collect()
or
x"extract garbage-collect()
But these didn't reduce the memory usage when extracting URLs from multiple pages using --follow.

Just noticed this old question. It looks like OP's account is suspended, so I hope the following answer will be helpful for other users.
Let's assume 'test.htm' :
<html>
<body>
<span class="a-text1-u">1</span>
<span class="b-text2-v">2</span>
<span class="c-text3-w">3</span>
<span class="d-text4-x">4</span>
<span class="e-text5-y">5</span>
<span class="f-text6-z">6</span>
</body>
</html>
To extract all "class"-nodes, except the ones that contain "-text1-" and "-text2-":
xidel -s test.htm -e "//span[not(contains(@class,'-text1-') or contains(@class,'-text2-'))]/@class"
#or
xidel -s test.htm -e "//@class[not(contains(.,'-text1-') or contains(.,'-text2-'))]"
c-text3-w
d-text4-x
e-text5-y
f-text6-z
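The same exclusion logic can be sketched outside xidel. The following is a minimal Python illustration, not part of xidel: the names ClassCollector, keep and EXCLUDED are made up for this example, and it only mirrors the not(contains(...) or contains(...)) predicate on the 'test.htm' content above.

```python
from html.parser import HTMLParser

# Substrings to reject, taken from the example above.
EXCLUDED = ("-text1-", "-text2-")


class ClassCollector(HTMLParser):
    """Collect the class attribute of every <span> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            for name, value in attrs:
                if name == "class" and value is not None:
                    self.classes.append(value)


def keep(value, excluded=EXCLUDED):
    # Mirrors not(contains(., a) or contains(., b)) from the XPath predicate.
    return not any(part in value for part in excluded)


html = """<html><body>
<span class="a-text1-u">1</span>
<span class="b-text2-v">2</span>
<span class="c-text3-w">3</span>
<span class="d-text4-x">4</span>
<span class="e-text5-y">5</span>
<span class="f-text6-z">6</span>
</body></html>"""

collector = ClassCollector()
collector.feed(html)
kept = [c for c in collector.classes if keep(c)]
print("\n".join(kept))
```

Adding another string to EXCLUDED corresponds to adding another contains() term inside the not(...).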
xidel has a function called garbage-collect but it's not clear to me how to use these functions.
http://www.benibela.de/documentation/internettools/xpath-functions.html#x-garbage-collect:
x:garbage-collect (0 arguments)
Frees unused memory. Always call it as garbage-collect()[0], or it might garbage collect its own return value
and crash.
So that would be -e "garbage-collect()[0]".

Related

XPath Expression: Optional character?

I would like to find:
<div style="text-align:center;" >
<div style="text-align: center;" >
<div style="text-align:center" >
<div style="text-align: center" >
So an optional space before center and an optional semicolon at the end.
I can do:
//div[@style='text-align:center;' or @style='text-align: center;' or @style='text-align:center' or @style='text-align: center']
But is there a “cleaner” way? And able to take many more optional characters without getting too long?
You can first remove the optional characters (here, space and semicolon) with the translate() function, assuming they aren't used in the required text, and then check whether the result equals the required text alone, e.g. 'text-align:center':
//div[translate(@style, ' ;', '') = 'text-align:center']
Or, when the pattern gets more complex, you can use regex in your XPath via PHP preg_match :
$xp->query("//div[php:function('preg_match', '~text-align:\s*center;*~', string(@style))]");
See full example demonstrating how to call PHP function from XPath in my older post : Get hrefs that match regex expression using PHP & XPath.
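To see how the two approaches behave on the four variants from the question, here is a small Python sketch. It does not run XPath; it only reproduces the string logic (character removal, then a regex). The sample list and the extra 'text-align:left;' entry are assumptions added for illustration, and the regex is applied with fullmatch, which is stricter than the unanchored preg_match pattern above.

```python
import re

# The four variants from the question, plus one value that must not match.
styles = [
    "text-align:center;",
    "text-align: center;",
    "text-align:center",
    "text-align: center",
    "text-align:left;",
]

# Approach 1: delete the optional characters, like translate(@style, ' ;', '').
strip_optional = str.maketrans("", "", " ;")
by_translate = [s for s in styles if s.translate(strip_optional) == "text-align:center"]

# Approach 2: a regex with optional whitespace and an optional trailing semicolon.
pattern = re.compile(r"text-align:\s*center;?")
by_regex = [s for s in styles if pattern.fullmatch(s)]

print(by_translate)
print(by_regex)
```

The character-removal approach also accepts values such as 'text-align:c enter', so the regex is the safer choice when the optional characters could appear elsewhere in the value.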

cURL filling up an HTML form with tcl

I need to write a Tcl program that logs into a web page, fills in some information and retrieves some information.
The page has lots of forms with different types of input: radio/check buttons, text entries, etc., the usual.
I can log into the page and fill in the forms without a problem, but I have to fill in EVERY input of a given form, or else the inputs I didn't specify are saved as empty.
Heres an example:
this is the form:
--- FORM report. Uses POST to URL "/goform/FormUpdateBridgeConfiguration"
Input: NAME="management_ipaddr" (TEXT)
Input: NAME="management_mask" (TEXT)
Input: NAME="upstr_addr_type" VALUE="DHCP" (RADIO)
Input: NAME="upstr_addr_type" VALUE="STATIC" (RADIO)
--- end of FORM
and this is the command i use to fill it up
eval exec curl $params -d upstr_addr_type=STATIC https://$MIP/goform/FormUpdateBridgeConfiguration -o /dev/null 2> /dev/null
where params is:
"\--noproxy $MIP \--connect-timeout 5 \-m 5 \-k \-S \-s \-d \-L \-b Data/curl_cookie_file "
Yes, I know it's horrible, but it is what it is.
In this case I want to change the value of upstr_addr_type to STATIC, but when I submit it I lose the values of management_ipaddr and management_mask.
This is a small example; I have to do this for every form and a gazillion more variables, so it's a real problem for me.
I figure it's a conceptual problem or something like that. I have looked and looked, and tried -F, -X GET, -GET and almost everything else in the cURL manual. Can someone guide me here?
If you know what the values of management_ipaddr and management_mask should be, you can just supply them as extra -d arguments. It probably makes sense to wrap this in a procedure
proc UpdateBridgeConfiguration {management_ipaddr management_mask upstr_addr_type} {
global MIP params
eval exec curl $params \
-d management_ipaddr=$management_ipaddr \
-d management_mask=$management_mask \
-d upstr_addr_type=$upstr_addr_type \
"https://$MIP/goform/FormUpdateBridgeConfiguration" \
-o /dev/null 2> /dev/null
# You ought to replace the first line of the above call with:
# exec curl {*}$params \
# Provided you're not on Tcl 8.4 or before...
}
Like that, you'll find it much easier to get the call correct. (You shouldn't need to specify -X POST for this; it's default behaviour when -d is provided.)
To get the existing values, you'll need to GET them from the right URL (which I can't guess for you) and extract them from the resulting HTML. Which might involve using a regular expression against the retrieved document. This is pretty awful, but it's what you're stuck with sometimes. (You can use tDOM to parse HTML properly — provided it isn't too ill-formed — and then use its XPath support to query for the values correctly, but that's rather more complex and introduces a dependency on an external package.) Knowing what the right RE to use is can be tricky, but it is likely to involve grabbing a copy of the form and doing something vaguely like this:
regexp -nocase {<input type="text" name="management_ipaddr" value="([^<"">]*)"} $formSource -> management_ipaddr
regexp -nocase {<input type="text" name="management_mask" value="([^<"">]*)"} $formSource -> management_mask
While in general it could be encoded all sorts of ways, that's very unlikely for IP addresses or masks! On the other hand, the order of the attributes can vary; you have to customize your RE to what you're really dealing with, not merely what it might be…
The curl invocation will be something like
set formSource [exec curl "http://$MIP/the/right/url/here" {*}$params 2>/dev/null]
It's much simpler when you're not having to send data up and you want to consume the result.
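The overall idea (read the current values, override one, send everything back) can be sketched as follows. This is a Python illustration rather than Tcl, and it makes several assumptions: the form markup in form_source is invented to match the FORM report above, the values in it are placeholders, and InputCollector is a hypothetical helper. It performs no network access; it only builds the POST body that the -d arguments would carry.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class InputCollector(HTMLParser):
    """Collect the current value of each <input> in a form (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        name = a.get("name")
        if name is None:
            return
        kind = (a.get("type") or "text").lower()
        if kind in ("radio", "checkbox"):
            # Only the checked option is submitted by a browser.
            if "checked" in a:
                self.values[name] = a.get("value") or "on"
        else:
            self.values[name] = a.get("value") or ""


# Assumed markup; the real page has to be fetched and inspected.
form_source = """
<form method="post" action="/goform/FormUpdateBridgeConfiguration">
<input type="text" name="management_ipaddr" value="192.0.2.10">
<input type="text" name="management_mask" value="255.255.255.0">
<input type="radio" name="upstr_addr_type" value="DHCP" checked>
<input type="radio" name="upstr_addr_type" value="STATIC">
</form>
"""

collector = InputCollector()
collector.feed(form_source)

fields = dict(collector.values)       # existing values, so nothing is blanked
fields["upstr_addr_type"] = "STATIC"  # the one field we actually want to change
body = urlencode(fields)              # equivalent to one -d name=value per field
print(body)
```

Parsing the attributes this way avoids depending on attribute order, which is the weak point of the regular expression approach.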

mcc function can't return value, why?

I use MATLAB's mcc to create a standalone application (an exe file), then call the exe file from PHP. But I can't get the function's return value; it's always empty! Here is my test example, in an m file:
function result=mysum(in)
if nargin<1
in=[1,2,3];
else
in=str2num(in);
end
result=sum(in);
end
Then I use the command mcc -m mysum.m to create the exe file (I have already configured the MATLAB compiler).
Here is the PHP file:
<html>
<head>
<title>test</title>
</head>
<body>
<?php
exec('F:\myevm\apache\htdocs\shs.exe [2,2,3,3,3] [4,4,4,4,4] 356 1567 1678',$ars);
echo '<br>';
echo $ars[0];
?>
</body>
</script>
</html>
However, $ars[0] is always empty!
I tried to find the answer myself and on the Internet, but failed. Please help, thanks.
Note two things:
You have your function set up to accept a single input argument.
When you run an application from the Windows command line, arguments are passed in as strings.
So if you type mysum 1 (either in MATLAB on the uncompiled program, and I would guess also if you do this from the Windows command line on the compiled program, although I haven't tested this) it will work, giving the answer 1, and if you type mysum [1,2] it will work, giving the answer 3. Note that mysum [1,2] is different from mysum([1,2]), as it is being passed the string '[1,2]', not the array of doubles [1,2].
But if you type mysum 1 2 it will fail, as you are now passing two string input arguments in, and your function is set up to only accept one.
Rewrite your function so that it accepts a variable number of input arguments (take a look at varargin to achieve that), applies str2num in turn to each of the inputs (which will be varargin{1} to varargin{n} if you've used varargin), and then sums them individually.
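The suggested shape (variable number of string arguments, each converted and then summed on its own) looks roughly like this. The sketch is in Python, not MATLAB, purely to show the logic; sum_each is a hypothetical name, and json.loads stands in for str2num under the assumption that the arguments use the comma-separated bracket form shown in the question.

```python
import json


def sum_each(*args):
    """Sum each string argument separately; defaults to [1, 2, 3] with no arguments."""
    if not args:
        return [sum([1, 2, 3])]
    totals = []
    for arg in args:
        parsed = json.loads(arg)  # "[2,2,3]" -> [2, 2, 3], "356" -> 356
        values = parsed if isinstance(parsed, list) else [parsed]
        totals.append(sum(values))
    return totals


# Arguments arrive as strings, exactly as they would from a command line.
results = sum_each("[2,2,3,3,3]", "[4,4,4,4,4]", "356")
print(results)
```

In MATLAB the loop would run over varargin{1} to varargin{n}, with str2num in place of json.loads.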

php gettext include string with phpcode

I'm trying to use gettext to translate the strings in my site.
gettext has no problem detecting strings such as
<? echo _("Donations"); ?>
or
<? echo _("Donate to this site");?>
but obviously we'll usually use code like this in our site
<? echo _("$siteName was developed with one thing in mind"); ?>
Of course in the website, the $siteName is displayed correctly as
My Website was developed with one thing in mind
if we put
$siteName = "My Website";
previously.
My problem is that I'm using Poedit to extract all the strings in my code that need to be translated, and it seems Poedit doesn't extract strings containing PHP variables like the one I described above. So how do I get Poedit to extract strings with PHP code inside them too? Or is there another tool I should use?
One possibility is to use sprintf. Just make sure you keep the percent (%) in the poedit string!
echo sprintf( _("This %s can be translated "), 'string');
Or when using multiple variables
echo vsprintf( _("This %s can be %s"), ['string', 'translated']);
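The reason this works is that the string handed to the translation function is a constant, so an extractor can find it, and the variable is substituted only afterwards. A minimal sketch of the same pattern using Python's gettext module (an analogy, not PHP): NullTranslations is used here so that no catalog is needed and the message comes back unchanged, and site_name is just the sample value from the question.

```python
import gettext

# No catalog is loaded; NullTranslations returns each message unchanged.
_ = gettext.NullTranslations().gettext

site_name = "My Website"

# The msgid is a constant with a placeholder; the value is filled in afterwards.
message = _("%s was developed with one thing in mind") % site_name

# Named placeholders let translators reorder the values if the language needs it.
named = _("%(site)s was developed with one thing in mind") % {"site": site_name}

print(message)
print(named)
```

With the variable interpolated inside the string, as in the question, the text looked up at run time would differ for every site name and could never match a catalog entry.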

zsh filename globbing/substitution

I am trying to create my first zsh completion script, in this case for the command netcfg.
Lame as it may sound, I am stuck at the first hurdle. Disclaimer: I know how to do this crudely, but I am looking for the "ZSH WAY" to do it.
I need to list the files in /etc/network.d, but only the file names, not the directory component, so I do the following.
echo $(ls /etc/network.d/*(.))
/etc/network.d/ethernet-dhcp /etc/network.d/wireless-wpa-config
What I wanted was:
ethernet-dhcp wireless-wpa-config
So I try (excuse my naivety):
echo ${(s/*\/)$(ls /etc/network.d/*(.))}
/etc/network.d/ethernet-dhcp /etc/network.d/wireless-wpa-config
It seems that this doesn't work. I'm sure there must be some clever way of doing this by splitting into an array and getting the last part, but as I say, I'm a complete noob at this.
Any advice gratefully received.
General note: There is no need to use ls to generate the filenames. You might as well use echo some*glob. But if you want to protect the possible embedded newline characters even that is a bad idea. The first example below globs directly into an array to protect embedded newlines. The second one uses printf to generate NUL terminated data to accomplish the same thing without using a variable.
It is easy to do if you are willing to use a variable:
typeset -a entries
entries=(/etc/network.d/*(.)) # generate the list
echo ${entries#/etc/network.d/} # strip the prefix from each one
You can also do it without a variable, but the extra stuff to isolate individual entries is a bit ugly:
# From the inside, to the outside:
# * glob the entries
# * NUL terminate them into a single string
# * split at NUL
# * strip the prefix from each one
echo ${${(0)"$(printf '%s\0' /etc/network.d/*(.))"}#/etc/network.d/}
Or, if you are going to use a subshell anyway (i.e. the command substitution in the previous example), just cd to the directory so it is not part of the glob expansion (plus, you do not have to repeat the directory name):
echo ${(0)"$(cd /etc/network.d && printf '%s\0' *(.))"}
Chris Johnsen's answer is full of useful information about zsh, however it doesn't mention the much simpler solution that works in this particular case:
echo /etc/network.d/*(:t)
This is using the t history modifier as a glob qualifier.
Thanks for your suggestions guys, having done yet more reading of ZSH and coming back to the problem a couple of days later, I think I've got a very terse solution which I would like to share for your benefit.
echo ${$(print /etc/network.d/*(.)):t}
I'm used to seeing basename(1) stripping off directory components; also, you can use echo /etc/network/* to get the file listing without running the external ls program. (Running external programs can slow down completion more than you'd like; I didn't find a zsh-builtin for basename, but that doesn't mean that there isn't one.)
Here's something I hope will help:
haig% for f in /etc/network/* ; do basename $f ; done
if-down.d
if-post-down.d
if-pre-up.d
if-up.d
interfaces
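For comparison outside the shell, the "plain files only, names without the directory" selection can be sketched in Python. This is not a zsh solution and would not be used inside a completion script; plain_file_names is a hypothetical helper, and the example builds a temporary directory with made-up entries instead of reading /etc/network.d.

```python
import tempfile
from pathlib import Path


def plain_file_names(directory):
    """Return the sorted base names of the regular files in a directory."""
    # Note: is_file() follows symlinks, unlike the zsh (.) glob qualifier.
    return sorted(p.name for p in Path(directory).iterdir() if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ethernet-dhcp").write_text("")
    (root / "wireless-wpa-config").write_text("")
    (root / "examples").mkdir()  # a directory, which must be skipped
    names = plain_file_names(root)

print(" ".join(names))
```

Here p.name plays the role of the :t modifier (or basename), and the is_file() test plays the role of the (.) qualifier.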