I'd like to write a simple app to crawl/scrape some web content. The more I delve into the analysis, the more I realize that the instructions I need to give to the app in order to navigate through the site and get the content I wish it to extract, may be very peculiar depending on the target website structure.
The idea was to write some configuration files to define how the app should browse and scrape each specific site, but defining such behaviour could be challenging, unless you write in the configuration file some actual Scala code.
So, the idea is to write some code able to get a scraper object instance reading a file written in the .sbt format and inject some code in it.
First of all, I need to know where to start to achieve such task: what library should I use?
Could it be easier to write some sbt tasks and use sbt as the core of the app instead of writing one from scratch? What should be the limitations in this approach?
I apologize for being so general, but I don't have the slightest idea from where to start. I'd like you to head me to the right direction and post some docs to read.
Consider the app is meant to be a CLI tool, no graphical interface needed, then.
Related
As a process of rebranding, we have a change the names in one of our huge project.
Is there a way that let me refactor my Java code base (rename class and package) programmatically using eclipse/other tools?
Eclipse has a way of persisting refactorings that you apply into an XML file and gives you the possibility to replay this script later, see https://help.eclipse.org/neon/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Fconcepts%2Fconcept-refactoring.htm.
It is not that hard to write this script by hand, especially if you're just renaming code (refactoring within a method-body needs much more fine-grained information).
I was hoping to find a way to automatically generate some code based on existing code.
The actual functionality would be very similar to javadoc or in this case IDLdoc or to automatic get/set functions.
I want to create some generic code based on some already listed parameters.
How do I accomplish this within eclipse?
I think an example would be best, so here is what I would like to accomplish:
keyword1: stuffIdontCareAbout, $;comments
keyword2: otherStuffIdontCareAbout, $;more comments
keyword3: lastStuffIdontCareAbout $
What do I need to do in eclipse so that I can have eclipse quickly parse the above block and output the following for another part of the code?
KEYWORD1=inp_keyword1, KEYWORD2=inp_keyword2, KEYWORD3=inp_keyword3
Thanks
My usual knee-jerk response is to suggest that you use JET as that what it's designed for.
For this specific case, however, you might be better off just writing a simple popup action (use the new plugin project with the popup action template) that parses the properties file (looks simple enough to do in Java) and writes out the target code to another file, the console or, if you're clever, back into an existing file in the right place.
Once you have the plugin generated for you with the template, the rest should be simple Java.
I really like sbt and its extendibility. I'd like to use it as the basis of my own little stand-alone console-like tool. Basically it would have a bunch of tasks and such. I'm thinking something like Lifty, but I'd like to have one command that would launch sbt, load any relevant plugins (whether Lifty or my own), and then present users with my own custom prompt with a limited set of tasks & settings available.
Is this possible without jumping through a lot of hoops?
Yes, it is, and stuff like Play and Akka do exactly that. You might get an inkling on how to do all you need with sbt-extras, or looking at the above-mentioned projects.
I would like to access a Redmine taskbase via a simple text based interface - wondering what the shortest path would be (minimum investment/development).
Right now, this boils down to 2 use cases/phases:
Import a batch of tasks into Redmine from simple, wiki-based, bulletted TODO list, ie. plain text content. This is more of a one-off task, so a quick and dirty solution would be fine.
Later, some smooth two-way synchrosation would be great.. E.g. edit loads of tasks via some friendly plain text (or XML) in an editor, or scripting where I could manipulate all of them with simple text processing; then synchronise with Redmine and commit them back.
Any ideas on the easiest way to achieve these?
I'd prefer an external solution (i.e not touching the server), especially for the one-off import case; something like a neat IDE/editor/client, or a standalone Ruby script (e.g using the RM API).
If an appropriate RM plugin would be available, I would not resist giving it a try (can get root access from our lovely IT support:)..
Current ideas:
Emacs/Org-mode, looks like a great combination of a cool task manager UI and full plain text power. It seems rich enough to capture tags, states as well. This artice looks promising Orgmode and Roundup: Bridging public bugtrackers and local tasklists, although not exactly a perfect match.
org-mode parser in Ruby, could be used in an script with redmine-api access, or - worst case(for me, right now)- in newly developed RM plugin.. This looks like a good start: org-ruby
export RM->XML, process file, import XML->RM... not sure if this is supported?
I guess it's always possible to talk to the DB directly, but I'd prefer to avoid that.
Actually, I'm also interested a similar solution for Bugzilla.
At the simplest level, you could write a RM/Rails plugin that parses an Org-Mode task list, updating corresponding issues in the RM Model.
Equally, you can build a view for Redmine (again as a Rails plugin) to generate an org list of the current (or subset of) issues.
For Bugzilla I think you would be best off using the XML-RPC interface to do your issue comparison/update sync, so you'd have to take a very different approach from Redmine.
If you have any specific questions, please update your question, it's quite broad at the moment.
Update
At the moment, there are a few plugins which will probably help you figure out your solution, for example Nick Boltons xml import and Martin Liu's Redmine CSV Import Plugin but neither of these are going to completely solve the problem for you, just give you some useful starting point.
On the other hand, If you write a script that interacts with Redmine's REST api, you don't need it to be in any specific lanugage, in fact you could do it in Emacs-lisp, if the target users of the script are all Emacs aware, then this might well be the best way to do the job. (it would certainly be the most appealing option to me.)
Maybe this can be useful: https://github.com/fukamachi/redmine-el
I'm evaluating the possibility of developing an Eclipse plugin to modify the source code of some Java files.
The Eclipse plugin should:
add one menu option or context menu option to launch the modification process.
add a key binding
only alter the UI in that way when an editor has been open on a Java file.
the modification process would not open a dialog, or maybe, a very simple one.
the modification process would traverse the AST of the Java file and would modify it.
Considering that we have no experience with Eclipse plugins and we need spend time in reading docs, how much time do you estimate in developing that plugin?
Thanks in advance.
It's really not that difficult at all... I had students in my design patterns class doing it for an assignment (adding/removing javabean getters and setters)
See http://help.eclipse.org/ganymede/topic/org.eclipse.jdt.doc.isv/guide/jdt_api_manip.htm
[EDIT: added the following article reference]
And a great article on it at http://www.eclipse.org/articles/article.php?file=Article-JavaCodeManipulation_AST/index.html (from 2006 -- there may be a few API changes since)
Yes, writing plugins takes a little getting used to, but so does any API.
And you can modify the AST -- see the page I reference above.
(I should note that the above link is from the eclipse help, which can also be accessed via Help->Help Contents inside Eclipse -- there's a lot of good info in there, but it's just a starting point)
You'll probably spend quite some time cursing the complexity of the eclipse plugin system. There are some example plugin development projects that can be very helpful if they cover the area you're working in.
I'd say you're looking at 2-4 days of work, spent mainly getting familiar with the platform - someone with a lot of experience writing eclipse plugins would probably take no more than an hour.
However, your step 5 could be tricky. I don't know how easy it is to access and change the Java AST; my experience is based on developing an editor plugin for an exotic file format rather than Java code.
Well, the four first points are easy to achieve, even by monkey coders that look at the eclipse PDE documentation shipped with Eclipse. These can be achieve in 1 day of work, maybe 2.
The hardest point is really the fifth one and the kind of modification you expect to do. Acting directly on the editor content is simple, accessing the editor internal AST and modifying it is really a bigger challenge and I doubt that it could be achieve in less than a week by unexperimented people (it can take longer, depending of what kind of modification you want to apply).