Run Tika source code from Eclipse - eclipse

I have been using Apache Tika to extract text from different document formats. Now I want it to handle headers, footers and text boxes differently, so I downloaded the Tika source code from GitHub and am trying to make changes to it.
I want to run the Apache Tika source code from Eclipse and debug its execution by passing in an input document. How can I do that? There are so many main classes. Where do I start? I understand it's a Maven project, and I am new to Maven.
And once I have made my changes, how can I create a new jar file?

Take a look at Tika's XHTML output first. It may already mark up headers and footers, in which case you can use the parser API to handle those parts as you wish. If so, follow the API examples and pass your own SAX-style content handler to the parser.
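As a rough illustration of such a handler, here is a minimal sketch that uses only the JDK's SAX classes. It assumes the XHTML wraps headers, footers and text boxes in div elements with class="header", "footer" and "textbox"; those class names are an assumption and should be checked against the XHTML Tika actually produces for your documents. The class name SectionSplitter and the "main" bucket are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects text into buckets named after the nearest
// enclosing <div class="header|footer|textbox">; everything else goes to "main".
class SectionSplitter extends DefaultHandler {
    private final Deque<String> sections = new ArrayDeque<>();
    private final Map<String, StringBuilder> text = new LinkedHashMap<>();

    private String current() {
        return sections.isEmpty() ? "main" : sections.peek();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String cls = atts.getValue("class");
        // Assumed class names; verify them against the real XHTML output.
        boolean special = "div".equals(qName)
                && ("header".equals(cls) || "footer".equals(cls) || "textbox".equals(cls));
        // Children inherit the section of their parent unless they open a new one.
        sections.push(special ? cls : current());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        sections.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.computeIfAbsent(current(), k -> new StringBuilder()).append(ch, start, length);
    }

    // Parses an XHTML string and returns the trimmed text of each section.
    static Map<String, String> split(String xhtml) {
        SectionSplitter handler = new SectionSplitter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XHTML", e);
        }
        Map<String, String> out = new LinkedHashMap<>();
        handler.text.forEach((k, v) -> out.put(k, v.toString().trim()));
        return out;
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><div class=\"header\">Page header</div>"
                + "<p>Main text</p><div class=\"footer\">Page 1</div></body></html>";
        System.out.println(split(xhtml));
    }
}
```

Because it extends DefaultHandler, an instance of the same class could presumably be passed straight to a Tika parser's parse(InputStream, ContentHandler, Metadata, ParseContext) call instead of parsing a string as the demo does.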

Related

How to create xhtml files in an Eclipse plug-in?

Hello, I'm writing a plug-in for Eclipse, and part of its work is to add new xhtml files to a JSF project.
I wonder what the best way to do this is. Is there a particular, recommended API for this case, or do I just have to treat this kind of file as an ordinary one and handle all the contents myself?
There are a number of model-to-text generators (one of them mine, in full disclosure) that can help you if there is some boilerplate you want generated around your structure. An example of this is how Eclipse can generate getters and setters for Java instance variables.
If you're not familiar with those, though, or if the file content is relatively simple, then you might want to just treat your generated xhtml as simple text files and use the basic resource methods to create them.

Talend : Create a Component using java code

I am a new user of Talend Open Studio.
I want to find a way to add a component like tinputfile or tligrow manually with Java code, without the drag-and-drop tools.
Help please.
Thank you very much.
I don't think it's possible (in fact I'm quite sure you cannot).
When you drag&drop components you "generate" Java source code which is compiled later when you build the job (or run in the studio).
How do you expect to change the byte code at run time?
TRF
Yes, it is possible to create your own Talend components. There is a very thorough, multi-part tutorial at http://powerupbi.com/talend/componentCreation_1.html
You can also view the source of existing components for an idea of how they are implemented and set up.
Perhaps study how tInputFile is built, make a copy, and extend it for your purpose.
Alternatively, if all the files have the same schema and reside in the same directory, you may not need a custom component. Instead, create a parent job which gets a list of the files in the directory and loops through each file name. For each file, it would call a sub job to read and process that file.
I have provided all the steps for creating a custom component with Java only. Here is the link to my answer:
Custom component with dynamic configuration like jira, jdbc or azurestorage in talend

change html output doxygen link

I am working on a project that is heavily documented with doxygen.
In a UI I have a list of all the classes available - I would like to be able to open the right documentation page of the class I select. In order to do that I need an easy to read link, so I can dynamically build it and run it.
Is there any way I can control the generated link of the html file? The ones I have right now are impossible to build dynamically.
You could use Doxygen's tag file mechanism for that (see GENERATE_TAGFILE in the config file).
A tag file is an XML file that is reasonably easy to understand and parse. It basically lists all symbols in your project and, for each symbol, the corresponding (relative) URL to the documentation.
So you could parse the tag file from your UI to resolve the links to the doxygen generated documentation in a robust way.
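A minimal sketch of that lookup, using only the JDK's DOM parser. It assumes the tag file lists each class as a compound element with kind="class" and direct child elements name and filename; check a tag file generated by your Doxygen version, since details such as whether filename carries the .html extension may differ. The class name TagFileIndex is made up for the example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: maps class names to their documentation pages.
class TagFileIndex {
    static Map<String, String> classPages(String tagFileXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(tagFileXml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse tag file", e);
        }
        Map<String, String> pages = new LinkedHashMap<>();
        NodeList compounds = doc.getElementsByTagName("compound");
        for (int i = 0; i < compounds.getLength(); i++) {
            Element compound = (Element) compounds.item(i);
            if (!"class".equals(compound.getAttribute("kind"))) {
                continue; // skip files, namespaces, pages, ...
            }
            String name = childText(compound, "name");
            String file = childText(compound, "filename");
            if (name != null && file != null) {
                pages.put(name, file);
            }
        }
        return pages;
    }

    // Reads a direct child only, so <name> elements of nested members are ignored.
    private static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<tagfile><compound kind=\"class\"><name>Foo</name>"
                + "<filename>classFoo.html</filename></compound></tagfile>";
        System.out.println(classPages(xml).get("Foo"));
    }
}
```

The UI would then join the returned relative path with the directory that holds the generated HTML.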

GWT: How do I read an Excel file that is included in my src folder?

I am trying to program in GWT (using Eclipse and the GWT Designer). I would like to be able to take an Excel file that I have already imported into my source folder, read it, and process the data. The data will be both text and numbers, but I am comfortable doing the conversions from String to other types.
I have seen something about RequestBuilder, but I'm not sure how to use that to read Excel. Or, is there another/better way to do this?
I am willing to convert the Excel file into something like a CSV if that is necessary.
You'll probably want to do the processing in your servlet with something like
http://jexcelapi.sourceforge.net/
or
http://poi.apache.org/
I am not sure if this is clear enough to you, but it is not possible to process the Excel file in GWT, at least not directly.
You have to process it on the backend/server.
It can't be done on the client side because, even if you put the Excel file in your source folder, it is not available to the GWT-compiled JavaScript code on the client machine.
If you use Java on your backend/server you can use one of the libraries danb suggested to process it on the server and then use RequestFactory or RPC to transmit it to the client/browser for further processing/displaying.
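If converting the file to CSV is acceptable, as the question suggests, the server-side step can be sketched with the JDK alone. This is only an illustration: the class name CsvRows is made up, and the naive split does not handle quoted fields or embedded commas. For real .xls files, one of the libraries above would take the place of this parsing code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical server-side helper: turns CSV text into rows of cells that
// an RPC service method could return to the GWT client.
class CsvRows {
    static List<List<String>> parse(String csv) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : csv.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            List<String> cells = new ArrayList<>();
            // Limit -1 keeps trailing empty cells; quoted fields are NOT supported.
            for (String cell : line.split(",", -1)) {
                cells.add(cell.trim());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parse("name,qty\nwidget,3\n"));
    }
}
```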

Launching a GWT module when clicking on an XML

Greetings,
I'm looking for a way to launch a GWT module when a user clicks on an XML file and have the module consume the xml data. Ideally I would like to render the XML in a rich manner and would prefer to use GWT controls instead of having to lay it out by hand via xslt + javascript.
I'm supposing one way would be to point the xml to a well known xslt that creates a simple html page that forces a redirect to the gwt module but how would I transfer the xml data to said module to allow for enhanced formatting?
Another way would be to have the process that produces the XML also include the bootstrap GWT module, but that would create multiple bootstrap instances over time and pollute the user's directory.
The use case is that a user would run this app on their local machine which outputs an XML file. If they try and view the xml file in a browser, I'd like to have the GWT module take over and present the data accordingly. I would rather they not have to go to a page and upload the data manually.
Appreciate any ideas on the matter.
TIA
If it's something that runs on the user's machine, I would recommend shipping an executable, or generating a parallel HTML file to present the data. JavaScript run from file:/// will not be able to access the filesystem.
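One way the "parallel HTML file" option could look, assuming the app that writes the XML is Java, is to run the data through an XSLT stylesheet with the JDK's built-in transformer. The report/item structure, the stylesheet and the class name XmlReport are all invented for this sketch; the real XML format is not shown in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical generator: renders <report><item name="...">value</item></report>
// as an HTML table.
class XmlReport {
    private static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"html\"/>"
          + "<xsl:template match=\"/\">"
          + "<html><body><table>"
          + "<xsl:for-each select=\"/report/item\">"
          + "<tr><td><xsl:value-of select=\"@name\"/></td>"
          + "<td><xsl:value-of select=\".\"/></td></tr>"
          + "</xsl:for-each>"
          + "</table></body></html>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static String toHtml(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not render report", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml("<report><item name=\"alpha\">1</item></report>"));
    }
}
```

The resulting string would be written next to the XML file, so the user opens the HTML file instead of the raw XML.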