How can I download Yahoo Groups? - perl

I want to download some Yahoo Groups (files, photos, messages, memberlist) and I've found these scripts:
http://freshmeat.net/projects/grabyahoogroup/
http://sourceforge.net/project/showfiles.php?group_id=62034
I've downloaded ActivePerl and the needed modules from CPAN (nothing fancy; they're very easy to find). I've managed to install them, but when I run the script I get an error after it tells me that I've successfully logged in:
"Use of uninitialized value $cells in pattern match (m//) at yahoogroups_files.pl line 244, line 2."
I'm guessing that Yahoo changed the layout of the page or something, but I'm not able to update the script myself. I'm a newbie when it comes to Perl and to understanding how Yahoo generates its pages; I only know some basic C++. I want to mention that I'm not lazy: I'll try to fix it myself, but I need your help with hints, advice, anything.
PS: I've contacted the author, but he isn't willing to update the scripts.

You would need knowledge in the following fields:
use of an HTML parser
HTTP knowledge (GET/POST/HEAD)
web scraping
I suggest you focus on WWW::Mechanize since it's capable of all these things (and more).
EDIT: another solution (one that doesn't need programming) is this: log in to Yahoo Groups with your browser, store the cookie, and then run wget, passing the stored cookie as a parameter. This way you'll get the task accomplished very fast.
Find your browser's cookies.txt file on your hard drive, and then call wget like this (if I remember the commands correctly):
wget --load-cookies path_to_cookie_file -r -w 60 website
The full man page can be found here
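For anyone who would rather script the same cookie-reuse idea than call wget, here is a minimal Python sketch. It assumes the exported file is in the Netscape cookies.txt format (the same format wget's --load-cookies reads); the function name opener_with_cookies is made up for illustration.

```python
import http.cookiejar
import urllib.request


def opener_with_cookies(cookie_file):
    """Build a urllib opener that sends the cookies stored in cookie_file.

    Assumes cookie_file is a Netscape-format cookies.txt export.
    Returns the opener together with the loaded cookie jar.
    """
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    # Keep session cookies and cookies whose expiry has passed, so the
    # jar mirrors whatever the browser exported.
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


# Hypothetical usage (the path and URL are placeholders):
# opener, jar = opener_with_cookies("path_to_cookie_file")
# html = opener.open("http://groups.yahoo.com/group/some_group/").read()
```

Unlike wget -r, this sketch fetches only the pages you explicitly request, so any crawling and the polite delay between requests (wget's -w 60) would have to be added by hand.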
EDIT2: Another option is to use WebDriver to automate firefox. You can use this article as a guide on how to accomplish this.

By the filename I'm assuming you're using Yahoo Group archiver found here: http://sourceforge.net/projects/grabyahoogroup/
I ran the files script against the SubEthaEdit group and it works great. All of the files downloaded without incident.
Looking at the code it seems to barf while processing an html table in a while loop if $cells is empty.
Considering the code did work when I tested it, it's possible there's something going on with the listing of that group's files. You'll want to try outputting $content and figure out where and why the regular expression on line 243 isn't able to process that HTML.
EDIT: If you don't mind posting the group this is happening with, I'm sure I or someone else here can try it out and troubleshoot it. It's tough to pinpoint what's up when the issue can't be duplicated. Also, try the same group I did and see if it works for you. If it does, something is certainly up with the group you're trying.

Dunno if it will help you, but here's what I did to get the message-download working:
http://sourceforge.net/forum/forum.php?thread_id=3283915&forum_id=209170
(I only used message-download, I didn't look at file-download)

I was tinkering with this a while ago to back up my girlfriend's group messages and files from uni. While debugging the latest scripts I found what seems to be a bug in the group_domain declaration (there's also a group declaration bug in yahoo2maildir.pl from the same project; see $request):
($group_domain) = $url =~ /\/\/(.*?groups.yahoo.com)\//;
In this case, I overwrote the $request variable in sub download_folder(), changing it from
$request = GET "http://$group_domain/group/$group/files$sub_folder/";
to
$request = GET "http://groups.yahoo.com/group/$user_group/files$sub_folder/";
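To see what the group_domain pattern captures for different URLs, here is a small Python sketch using the same regular expression. It only illustrates how the pattern behaves and is not a diagnosis of the script; the sample URLs and the function name extract_group_domain are made up.

```python
import re

# Same pattern as the Perl line above: capture everything between "//"
# and the first "/" that follows "groups.yahoo.com". The dots are left
# unescaped, as in the original script.
GROUP_DOMAIN_RE = re.compile(r"//(.*?groups.yahoo.com)/")


def extract_group_domain(url):
    """Return the captured host, or None when the pattern does not match."""
    match = GROUP_DOMAIN_RE.search(url)
    return match.group(1) if match else None


print(extract_group_domain("http://groups.yahoo.com/group/example/files/"))
print(extract_group_domain("http://tech.groups.yahoo.com/group/example/files/"))
print(extract_group_domain("http://example.com/group/example/files/"))
```

When the pattern does not match, the Perl list assignment leaves $group_domain undefined, much as this sketch returns None, so any URL built from it afterwards will be malformed.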

grabyahoogroup works well in the latest edition, which can be found at the svn repo:
http://grabyahoogroup.svn.sourceforge.net/viewvc/grabyahoogroup/trunk/yahoo_group/
The version at sourceforge.net/projects/grabyahoogroup/files/ HAS BUGS AND DID NOT WORK FOR ME.

I've been looking for a tool that collects messages/conversations from Yahoo Groups!. I finally found this tool that converts your Yahoo! Groups messages into MBOX format after struggling to try to make my own and searching everywhere on the internet.
Download tools
Both of the following are Google Chrome extensions.
Chrome Extension to Download Members posted by Sam Hobbs (2015).
Chrome Application To Download Messages posted by Mark Fletcher (Jan 2016).
Plain string to Base64 binary data
At some point after September 16, 2010 (at least for me), the messages retrieved were no longer plain text but Base64-encoded data (ASCII). This Swiss converter tool lets you read the data as plain text.
Sample content from the MBOX format
VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4=
Sample result after conversion
The quick brown fox jumps over the lazy dog.
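If you would rather decode the message bodies locally than paste them into an online converter, the conversion can be sketched in a few lines of Python. This assumes the decoded bytes are UTF-8 text; real messages may use other charsets, and the function name decode_body is made up.

```python
import base64


def decode_body(encoded):
    """Decode a Base64 message body into text (assumes UTF-8 content)."""
    return base64.b64decode(encoded).decode("utf-8")


sample = "VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4="
print(decode_body(sample))  # prints "The quick brown fox jumps over the lazy dog."
```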

for cause, as of 2019/09
https://github.com/csaftoiu/yahoo-groups-backup

Related

MagnificPopup won't work with Typo3

I would like to use magnific popup for a site which is still in development, but unfortunately nothing happens after the installation and adding the static content in my template.
No matter what I try, no popup comes up.
If someone would like to help me, I will provide access to the site.
Thanks a lot!
First thing to do:
look at your source code: does the 'link' show any sign of 'magnific popup' code?
Added classes? id?
Second: do you see any extra javascript and/or stylesheet that is added by the extension?
If the answer to either of these is no, the extension is not outputting anything. Sounds logical, but it is the first step. Is it a solution? Nope, it means your life just got a bit worse, but hang in there!
If it does show code from the ext, look at your console: are there any JS errors? (If you don't know what the console is, or (even worse) you work with IE, please read about the Chrome console or at least install Firefox with Firebug.)
My best bet would be a JS error...
Is it possible that the ext itself does not work?
Do your PHP error logs tell you that the extension is behaving badly?
Do you see errors in the TypoScript analyser (or whatever it's called, the thing that analyses css_styled_content and other TS spaghetti)?
If not, then no, it is not the ext.
Again, I'm betting my wife, three horses and a barrel of beer on JS errors.
Good luck mate!
PS: If I'm wrong, I'm not sending you my wife by postal service. Loads of trouble last time. Nor the horses, same shizzle...

Perl Dancer send_file Issue with Images

I have a Perl Dancer web application that uses GD to dynamically create images. I am trying to deliver these images to the user as PNG. For example:
package MyApp;
use Dancer ':syntax';
use GD;
...
get '/dynamic_image/:var1/:var2' => sub {
my $im = GD::Image->new(100,100);
my $black = $im->colorAllocate(0,0,0);
my $white = $im->colorAllocate(255,255,255);
$im->rectangle(10,10,90,90,$white);
my $png = $im->png;
return send_file( \$png, content_type => 'image/png', filename => params->{var1}."_".params->{var2}.".png" );
};
However, when accessing the above route, Chrome and Firefox don't seem to know what to do with the image data. If I try to use the route in Lightbox, Chrome complains. For example, when clicking on a link like this:
link
Chrome's console says:
Resource interpreted as Image but transferred with MIME type application/octet-stream: "http://www.example.com/dynamic_image/my/image".
It looks like Dancer is not using content_type correctly. Interestingly, IE8 seems to load the images just fine. Any idea what's going on? I'm currently running it standalone on Windows 7 with Strawberry Perl v5.16.2.
To explain the different behavior with IE: If IE encounters a Content-Type of application/octet-stream, it will attempt to scan the file to determine a more specific MIME type. That behavior is covered more here.
I recommend using the GET command-line tool from Perl's LWP distribution to confirm what's going on. You can try this:
GET -sSe http://www.example.com/dynamic_image/my/image | less
The result should include among other things the Content-Type header. It sounds like you'll find that it says application/octet-stream. This starts to look like an issue with Dancer.
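If the LWP tools are not at hand, the same header check can be sketched in Python. This is only an illustration: the function name content_type_of is made up, and the example URL is the placeholder from the question.

```python
import urllib.request


def content_type_of(url):
    """Fetch url and return the media type from its Content-Type header.

    Note: get_content_type() falls back to "text/plain" when the server
    sends no Content-Type header at all.
    """
    with urllib.request.urlopen(url) as resp:
        return resp.headers.get_content_type()


# Hypothetical usage against the route from the question:
# print(content_type_of("http://www.example.com/dynamic_image/my/image"))
```

If this reports application/octet-stream rather than image/png, the content_type option is not reaching the response.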
You didn't specify what version of Dancer you are using. Older versions did not support the content_type option to send_file(). If you are reading the latest docs on CPAN and expecting them to apply to an older version, there could be some confusion.
It does not seem to be a Dancer problem. There are other environments where it happens too:
Resource interpreted as Document but transferred with MIME type image
After banging my head against this for awhile, I think I can answer my own question. Firefox actually tipped me off to a bug in my own code. Basically, when accessing the dynamically created image in Firefox, it would display a page with the HTTP request info along with the PNG data. I noticed that some debugging text was displayed on the page. It turns out that I left a print in one of the loops that generated the image data (I had used it to verify the image was being built correctly), and that text somehow made it into the "image" itself--which I assume caused Firefox and Chrome to freak out a bit. So this wasn't a Dancer or application bug, but a PEBKAC issue. Thanks for the input, everybody.
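A quick way to catch this kind of corruption is to check that the response body starts with the PNG signature. The Python sketch below assumes you already have the raw response bytes; the function names are made up for illustration.

```python
# Every valid PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data):
    """Return True when data starts with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def png_offset(data):
    """Return the index where the PNG signature starts, or -1 if absent.

    A positive offset means something (stray debug output, for example)
    was written ahead of the image data.
    """
    return data.find(PNG_SIGNATURE)


corrupted = b"debug\n" + PNG_SIGNATURE + b"...image bytes..."
print(looks_like_png(corrupted))  # prints False
print(png_offset(corrupted))      # prints 6
```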

Gtkwebkit, save html to pdf

For the last few days I have been searching for the best and shortest way to convert HTML files to PDF. Since I create my HTML files with a C program and view them through gtkwebkit, which uses cairo, there should be some efficient and direct way to convert the content of the displayed page to PDF from C (I think).
But I can't find any example or pointer on the net.
So far, apart from various virtual printers, I have found only command-line tools that are written in Perl or that depend on Qt, which is not wanted.
Any suggestion, example or advice on getting this functionality from gtkwebkit would be appreciated, or, failing that, something using a tiny C library.
As far as I can tell from reading the documentation (haven't tried it out myself):
Get the main frame with webkit_web_view_get_main_frame().
Create a GtkPrintOperation with gtk_print_operation_new().
Set the export-file property on your print operation to be the name of the PDF you want to export to.
Print the frame with webkit_web_frame_print_full(). Make sure to pass GTK_PRINT_OPERATION_ACTION_EXPORT as the 'action' parameter.
I once wrote some code to accomplish that without opening a window. But then I ran into a problem using that code from multiple threads (in a web server, for example). I did some research and figured out that GTK itself is single-threaded, so I made my code thread-safe by queuing the print operations to the main thread. Anyway, if it helps, check it out: https://github.com/gnudles/wkgtkprinter

Problems with Forms2Go script

I am currently trying to make a form for a website at work. I have created the script in Perl using Forms2Go and have entered the sendmail and bin paths given to me. At first the script wouldn't execute, but the hosts made changes to the server and now it does.
The problem now is that the script executes and takes the user to the thank-you page but doesn't send the form to the e-mail address, which has been tested by the hosting company.
I have a feeling that the sendmail path isn't correct and that this is why the script executes but doesn't send the email. Is there anything else it might be?
Thanks for reading.
Tom
Forms To Go is payware and they do not provide their source code publicly which makes trouble-shooting by the general Internet populace rather difficult. Try their support forum instead. If you're looking for a form mailer that does not suck, install nms TFMail.

Redirect from Web query open agent on Lotus Domino?

Does anyone know a way of redirecting to another web page from a Lotus Domino WebQueryOpen agent? The print statement does not seem to work. A possible workaround would be much appreciated!
Something on that subject can be found here, but it seems kind of flaky, like this link.
Can it be done without JavaScript and major redesign of document form?
Thanks in advance.
You do realize that the 4/5 forum is almost a decade out-of-date, right? If you're using a more recent version of Notes/Domino, I'd check the 6/7 forum (or even the 8 if you're on that release, since it's so different); here's the search results for "webqueryopen redirect"; there are a lot more possible answers.
Notes 6/7 forum results
The simplest answers to your actual question are here and here.
Better to write to the 'location:' header directly, as that avoids JavaScript and meta tags.
This is from one of the responses on the thread you pointed to and should work fine - have the WQO agent simply write to a hidden field on the form. This will do a client-side redirect, so you'll get two trips to the server.
You could use your WQO to set a field on the document called redirect. Set the field to text and hide it from everything. Here's what you put in the field (with your WQO agent):
location.replace('http://www.website.com');
In your HtmlHeadContent, put this formula:
"<script>" + @NewLine + redirect + @NewLine + "</script>"
A line of code in WQO agent does the trick, but note it still loads the page before the redirect:
Call s.Documentcontext.Replaceitemvalue("$$HTMLHead", {<meta http-equiv="REFRESH" content="0;url=http://www.etfos.hr">})
In your agent, simply do the following:
print |[| & requiredURL & |]|
Check whether you really need to open the document, rather than running an agent and passing the params in the Query_String.
I think that a WQO agent cannot redirect to another page, as it will end up redirecting you to the document you either opened or are creating. I may be wrong, but yesterday I was trying to do one thing or the other based on the params I was passing to the OpenForm URL command, and it seems that although the agent runs, the pw.println() command doesn't behave the same as it does in a WQS agent.
If you don't need to open a document, try running an agent and passing the params to it.
The proper way to do it is to have one print statement with the URL you want to redirect to in brackets. This will generate a 302 REDIRECT on the server.
Example code:
%REM
Agent redirect
Trigger: On Schedule - Never
Target: None
Security Level: 1
%END REM
Option Public
Option Declare
Sub Initialize
Print "[http://www.ibm.com]"
End Sub
Some people have suggested Meta Refresh. This is discouraged by the W3C. See the following link:
http://en.wikipedia.org/wiki/Meta_refresh#Usability