Parsing Google's search results - perl

I'm "working" on a data mining project and I've chosen to parse Google search results. Before I actually start, I want to consult you experienced folks.
I did a bit of research on how Google delivers results and analyzed the structure of a result page. That part is fine: I've already figured out the regexes and data structures I'll use.
Along the way I ran into their CAPTCHA because I was searching too fast; oh, the irony. I've also discovered that they limit results to 1000. Is there any way to avoid these obstacles? For the first one, I could slow the rate of URL fetching, or detect the CAPTCHA and have the script wait for my input; that might do it. But what about the other one? Does Google provide some kind of API that I can use as a workaround? I couldn't find one on their code.* page.
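For what it's worth, the "slow down and pause on CAPTCHA" idea can be sketched roughly like this. It is in Python rather than Perl, and it is only an illustration: `fetch_throttled`, `looks_like_captcha`, the 5-second delay and the "HTTP 429 or the word captcha in the body" check are all assumptions of mine, not anything Google documents.

```python
import time
from typing import Callable, Iterable, List, Tuple


def looks_like_captcha(status: int, body: str) -> bool:
    # Assumption: a block page shows up as HTTP 429 or mentions "captcha".
    return status == 429 or "captcha" in body.lower()


def fetch_throttled(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
    delay_seconds: float = 5.0,
    on_captcha: Callable[[str], object] = lambda url: input(
        "CAPTCHA hit at %s, solve it and press Enter " % url
    ),
    sleep: Callable[[float], object] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, waiting between requests and pausing on a CAPTCHA."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # fixed pause between consecutive requests
        status, body = fetch(url)
        while looks_like_captcha(status, body):
            on_captcha(url)  # block until the operator says to continue
            status, body = fetch(url)  # then retry the same URL
        pages.append(body)
    return pages
```

The `fetch` argument is whatever function actually performs the HTTP request and returns `(status, body)`; passing it in keeps the throttling logic separate from the HTTP client.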

There is a Custom Search API.
It returns results in JSON or XML, so you won't even need regexes. However, you do need to pay for more than 100 searches a day.
What exactly are you trying to do? Maybe there is a better way to accomplish it.
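A rough sketch of what using the JSON output could look like, in Python rather than Perl. The endpoint and the `key`, `cx`, `q` and `start` parameters are the ones I know from the Custom Search JSON API; `build_search_url` and `extract_results` are names made up for this example, and the API key and engine ID are placeholders you would have to supply.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, engine_id: str, query: str, start: int = 1) -> str:
    """Build a request URL; `start` is the 1-based index of the first result."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return API_ENDPOINT + "?" + urlencode(params)


def extract_results(response_text: str) -> List[Tuple[str, str]]:
    """Pull (title, link) pairs out of a JSON response body."""
    data = json.loads(response_text)
    # "items" is absent when a query has no results, hence the default.
    return [(item.get("title", ""), item.get("link", ""))
            for item in data.get("items", [])]
```

Fetching the URL is left to whatever HTTP client you prefer; the point is that the result list comes back as structured data, so no page scraping is involved.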

Always look on CPAN first!
https://metacpan.org/pod/REST::Google
If someone hasn't already solved your problem, chances are it's a weird one :-)

Exposing tags to Google Scholar

I want Google Scholar to detect and consider the academic papers on my website. I implemented the tags as required in [http://scholar.google.com/intl/en/scholar/inclusion.html], more than 3 months have passed, and they are still ignoring my metadata. I checked over and over, and nothing happens. Any ideas on how to check this in a realistic way?
Google says some of their refreshes take more than a year. It is dumb to wait a year to notice there was a bug, and then wait another year.
I have similar problems with Google Scholar. Unfortunately, there isn't much you can do. GS support isn't helpful at all, and indexing can take that much time. Sorry.
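One thing you can at least rule out locally is a bug in the tags themselves. Below is a small Python sketch that scans a page for `citation_*` meta tags and reports which of the basic ones are missing. The list in `REQUIRED_TAGS` reflects my reading of the inclusion guidelines (title, at least one author, publication date), so check it against the page linked above; `missing_scholar_tags` is just a name invented for this example. Passing this check says nothing about whether or when Scholar will index the page.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Assumption: the minimum set of tags, as I understand the guidelines.
REQUIRED_TAGS = ("citation_title", "citation_author", "citation_publication_date")


class MetaCollector(HTMLParser):
    """Collects the content of every <meta name="citation_..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and name.startswith("citation_") and content:
            self.found.setdefault(name, []).append(content)


def missing_scholar_tags(html: str) -> List[str]:
    """Return the required tag names that the page does not provide."""
    collector = MetaCollector()
    collector.feed(html)
    return [tag for tag in REQUIRED_TAGS if tag not in collector.found]
```

Run it against the HTML your server actually returns (not the template), since tags injected by JavaScript would not be in that response.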

GAPI Class, Google Analytics API

I am about to start a new project in the Google Analytics API & PHP.
I read that Google Analytics will be deprecating XML v2.3 and v2.4 in 6 months' time, so apparently you will only be able to use v3 and retrieve information in JSON format.
http://analytics.blogspot.com/2011/12/introducing-google-analytics-core.html
My question is the following: does this mean that the GAPI class won't work any longer? Can anyone who has used this class before help me answer this question?
http://code.google.com/p/google-api-php-client/
In that case, are there any alternative PHP classes that do the same thing?
Thanks so much
I've been using GAPI for a while now, and I can say with some confidence that yes, it will break; if not because of XML, then because of some other change Google makes.
Having said that, GAPI is the best solution I have found for PHP. It does break every 6 months to a year, and usually needs one or two lines changed to fix it. But GAPI is pretty popular, so when your clients call to say analytics is throwing errors at them, at least you know you won't be the only dev tearing your hair out.
9 times out of 10, by the time I've hit a problem someone else has already found the fix, which is nice.
There are a few other PHP options out there, but GAPI seems to be the most popular (usually the best way to go, imho).
My approach is to build an analytics summary in the dashboard and provide a link to Google Analytics underneath, so clients can see the full data or go there when GAPI breaks. I have been putting all my sites on the same modular system for a while now. I keep GAPI as a library in my admin layout module, which means I can make the fix once and roll it out to all my sites without too much drama.
In summary, use it but expect it to break; that way you won't be disappointed when it does.
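The "summary plus fallback link" idea looks roughly like this. It is sketched in Python rather than PHP, and it is not GAPI's API: `dashboard_summary`, `fetch_report` and the `ANALYTICS_URL` value are placeholders I made up to show the shape of it.

```python
from typing import Any, Callable, Dict

# Placeholder: wherever you want clients to land for the full reports.
ANALYTICS_URL = "https://analytics.google.com/"


def dashboard_summary(fetch_report: Callable[[], Any]) -> Dict[str, Any]:
    """Return report data if the library call works, otherwise just the link."""
    try:
        report = fetch_report()  # stands in for the call into the analytics library
    except Exception as error:  # deliberately broad: any breakage means "show the link"
        return {"ok": False, "report": None, "link": ANALYTICS_URL, "error": str(error)}
    return {"ok": True, "report": report, "link": ANALYTICS_URL}
```

The dashboard template then renders the summary when `ok` is true and only the link otherwise, so a broken library degrades the page instead of throwing errors at the client.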

Existing app that extracts meaningful data from old e-mails?

I was wondering if there is an application, and if not if it's worth writing one, that can gather meaningful data from old e-mails. I'm thinking things like:
Instructions (that could become "5 steps to..." posts)
Definitions
etc
Any idea? Suggestions? etc?
Well, I can offer the same solution as I did to this post, that is, software like TexLexan or Alchemy API that can find keywords and other summary information. There is also a good list of open source and commercial solutions on this page. It is definitely easier to see if one of those works than to write your own.

UIPickerView and a Giant Contact List?

I'm new to iOS Development and am trying to make an application that essentially sorts through a list of 300 names or so. I've got the Drill-Down part of the application down, aside from the detailView, but am now faced with a challenge.
What I would like to do is have users select from 3 fields with a UIPickerView to come up with shorter lists for every time a user is looking for a person. I'd like to use a .plist, but I also have an XML feed of the information. Before I waste all of my time structuring these data sources, does anybody have a good overview as to how I should approach this?
Also, I've asked some people this question before, and they told me to read up on introductory iOS development topics. I understand the mechanics of development; I just can't ever figure out how to approach a task properly. (I'm working on it!)
Thanks in advance. I'd share an image to help clarify, but my rep isn't high enough.
Snip: It looks like I misread your intention, which makes my earlier comments irrelevant. You want to have the user select one of 3 options to shrink the list, if I'm not mistaken.
Some more questions for you. I take it that this XML feed could change between the times the user loads up the app? Will it only ever grow, or are those 300 or so names loaded once and set for good? I ask because I'm wondering whether Core Data might be useful. You could store your large list locally, save the time of reloading it frequently, and use the built-in NSFetchedResultsController to search your collection of names. I'll keep thinking about it, and once you get a chance to answer these questions we can continue.
I'll check back for an edit or comment and see if I can give you an approach. Also, maybe edit your question with any approach ideas of your own, and we could start from there and refine them if needed.
Edit 2: From the information in the comments, this is one way of doing it that makes sense to me:
Since you seem to be able to control the information you receive from the feed I would set it up to send you only the contacts that need to be added/removed. You could handle this a few ways depending on your deployment intentions but I would go with the following:
Find a way to signal a first time run of the application, and as a result all contacts would be new, and you could populate your list fully with a slightly longer first time setup. Then any further changes could be quickly handled by smaller edits made to the local list.
You would need to set up Core Data for your application, which should be fairly straightforward in your case. After that you can use the built-in NSFetchRequest to run searches that quickly return a narrowed-down list of contacts. The picker itself is just a matter of building the UI, which will require some design on your end, as you are the only one who knows what you are going for there. Depending on the complexity of your app and the functionality you want to include, you could get away with 1-2 views that simply display the contacts in a table, with the picker reloading it when appropriate.
I'm not familiar with the implementation of XML Feeds and receiving data from them, but I have done XML Response parsing into Core Data from a SOAP service before and they shouldn't be terribly different.
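To make the flow concrete, here is the logic of "apply small add/remove updates to a local list, then narrow it by the picker values" written out in plain Python. This is not Core Data or iOS code, just the shape of the idea. The three fields (`department`, `office`, `role`) are invented, since the question doesn't say what the picker columns are, and `apply_delta` and `filter_contacts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Contact:
    contact_id: int
    name: str
    department: str  # hypothetical picker field 1
    office: str      # hypothetical picker field 2
    role: str        # hypothetical picker field 3


def apply_delta(store: Dict[int, Contact],
                added: Iterable[Contact],
                removed_ids: Iterable[int]) -> None:
    """Update the local store in place from a feed of additions and removals."""
    for contact in added:
        store[contact.contact_id] = contact  # insert or overwrite
    for contact_id in removed_ids:
        store.pop(contact_id, None)          # ignore ids we never had


def filter_contacts(store: Dict[int, Contact],
                    department: Optional[str] = None,
                    office: Optional[str] = None,
                    role: Optional[str] = None) -> List[Contact]:
    """Return contacts matching every picker value that is set, sorted by name."""
    wanted = {"department": department, "office": office, "role": role}
    matches = [
        contact for contact in store.values()
        if all(value is None or getattr(contact, field) == value
               for field, value in wanted.items())
    ]
    return sorted(matches, key=lambda contact: contact.name)
```

On first run the store is empty, so the whole feed arrives as "added"; later runs only carry the differences. In the real app the store would be the Core Data model and the filter would be a predicate on a fetch request.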
As for resources to get you started, should you need them, I would recommend the following:
eBooks:
http://www.techotopia.com/index.php/Objective-C_2.0_Essentials
http://www.techotopia.com/index.php/IPhone_iOS_4_Development_Essentials_Xcode_4_Edition
Tutorials:
http://www.raywenderlich.com/
The eBooks I have linked are both absolutely fantastic, and among the few Xcode 4 books I was able to find that seemed to be of usable quality. They both contain clear, easy-to-follow tutorials on simple and more advanced aspects of programming for iOS.
Ray's site is an immensely helpful resource. It has a very active forum for iOS programming and a constantly growing tutorial collection: 4-5 people regularly write new tutorials that the community suggests and votes on every week. It covers some more advanced topics than the books above, and I would recommend looking at it after doing a few walkthroughs or tutorials from the books.
I'll stick around if you have any further questions, otherwise you can send me a notification via these comments, or just post another question and someone is bound to help you out!
-Karoly

I need to print 20,000 Word documents, is there a 3rd party tool that will help me do this or do I need to write custom code?

I need to print 20,000 Word documents. Naturally this is a logistical nightmare. For example: if the power goes out, I need some software that will be able to resume where the printing failed. Also, this is something that needs to be done once a month by our client.
Do I have to write my own code to manage this? (Word Automation)
Or does anyone know of a tool that will help me do this? (Googling has not given me any good options. And I'm willing to pay!)
Outsource the job to a specialist printing company.
Put your 20,000 documents in one folder
Press Ctrl-A to select all
Right-click and choose 'Print'
Note - there are many commercial printing houses which do this very thing. Often they provide an API to send Word or PDF documents. They'll even put the documents in an envelope and put them in the mail. This is how most banks and credit card companies send you your monthly statements.
Since I'm on Windows, I always use AutoIt for automation and/or repetitive tasks. It comes with a bunch of user libraries, including one for Microsoft Word, and it's very nice to work with.
I would try AutoIt: http://www.autoitscript.com/autoit3/
It should be possible to do this in any language which has an OLE capability. Most popular languages do, e.g. I know that Perl does.
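Whichever language you pick, the "resume where it failed" part is mostly bookkeeping. Here is a minimal sketch in Python, assuming you keep a journal file of documents already sent to the printer. `print_batch` and the journal format are my own invention, and `print_document` stands in for whatever actually drives Word (OLE automation or otherwise), which is not shown here.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def print_batch(documents: Iterable[str],
                journal_path: str,
                print_document: Callable[[str], None]) -> List[str]:
    """Print every document not yet listed in the journal; return those printed now."""
    journal = Path(journal_path)
    # Documents recorded by earlier (possibly interrupted) runs.
    done = set(journal.read_text(encoding="utf-8").splitlines()) if journal.exists() else set()
    printed = []
    with journal.open("a", encoding="utf-8") as log:
        for document in documents:
            if document in done:
                continue  # already handled in a previous run
            print_document(document)    # may raise; nothing is recorded in that case
            log.write(document + "\n")  # record only after the print call returns
            log.flush()
            printed.append(document)
    return printed
```

After a crash you simply run the same call again with the same journal. Note the caveats: if the machine dies between the print call and the journal write, that one document can be printed twice, and the journal only knows the job was handed to the printer, not that paper actually came out.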
I don't understand the problem. In a perfect world, all 20k documents would be in the same folder, ready to be printed. Just... print?
Are you referring to the logistical problems? 20k documents can sure be heavy to carry around, but that's not a question for SO. Why would you need custom code for that?
Or do you lack a printer with sufficient capabilities? If your printer is too slow, old or inaccurate, there are companies that handle printing for you.