How to use the Sqoop-generated class in MapReduce?

A Sqoop import generates a Java file containing a class that gives MapReduce code access to the column data of each row. (The Sqoop import was done as text, without the --as-sequencefile option, with one record per line and commas between the columns.)
But how do we actually use it?
I found a public method parse() in this class that takes a Text as input and populates all the members of the class, so to practice I modified the wordcount application to convert a line of text from the TextInputFormat in the mapper into an instance of the class generated by Sqoop. But that causes an "unreported exception com.cloudera.sqoop.lib.RecordParser.ParseError; must be caught or declared to be thrown" error when I call the parse() method.
Can it be done this way, or is a custom InputFormat necessary to populate the class with the data from each record?

OK, this seems obvious once you find out, but as a Java beginner it can take time.
First configure your project:
Just add the Sqoop-generated .java file to your source folder.
I use Eclipse to import it into my class source folder.
Then make sure your project's Java build path is configured correctly:
Add the following jar files under the project's Properties / Java Build Path / Libraries / Add External JARs:
(for Hadoop CDH4+):
/usr/lib/hadoop/hadoop-common.jar
/usr/lib/hadoop-[version]-mapreduce/hadoop-core.jar
/usr/lib/sqoop/sqoop-[sqoop-version]-cdh[cdh-version].jar
Then adapt your MapReduce source code:
First configure the job:
public int run(String[] args) throws Exception
{
    Job job = new Job(getConf());
    job.setJarByClass(YourClass.class);
    job.setMapperClass(SqoopImportMap.class);
    job.setReducerClass(SqoopImportReduce.class);
    // the paths must be wrapped in org.apache.hadoop.fs.Path objects
    FileInputFormat.addInputPath(job, new Path("hdfs_path_to_your_sqoop_imported_file"));
    FileOutputFormat.setOutputPath(job, new Path("hdfs_output_path"));
    // I simply use Text as output for the mapper but it can be any class you designed
    // as long as you implement it as a Writable
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    ...
Now configure your mapper class.
Let's assume your Sqoop-generated Java file is called Sqimp.java
and the table you imported had the following columns: id, name, age.
Your mapper class should look like this:
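public static class SqoopImportMap
        extends Mapper<LongWritable, Text, Text, Text>
{
    // requires: import com.cloudera.sqoop.lib.RecordParser.ParseError;
    @Override
    public void map(LongWritable k, Text v, Context context)
    {
        Sqimp s = new Sqimp();
        try
        {
            // This is where the code generated by Sqoop is used.
            // It parses one line of the imported data into an instance of the generated class,
            // to let you access the data inside the columns easily.
            s.parse(v);
        }
        catch (ParseError pe)
        {
            // parse() declares this checked exception, so it must be caught here
            // (or declared); do something if there is an error.
        }
        try
        {
            // Now the imported data is accessible. The generated fields are private, so use
            // the generated accessors (typically get_<column>(); check Sqimp.java for the
            // exact names and types), e.g.
            if (s.get_age() > 30)
            {
                // submit the selected data to the mapper's output as a key/value pair
                context.write(new Text(String.valueOf(s.get_age())),
                              new Text(String.valueOf(s.get_id())));
            }
        }
        catch (Exception ex)
        {
            // do something about the error
        }
    }
}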
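The compiler error from the question comes from parse() declaring a checked exception. The following is a minimal, Hadoop-free sketch of that situation, assuming a three-column record (id, name, age). DemoParseError, DemoRecord and SqoopParseSketch are hypothetical stand-ins written for illustration; they are not the classes Sqoop generates, and the real parse() also handles quoting and escaping.

```java
// Hypothetical stand-in for com.cloudera.sqoop.lib.RecordParser.ParseError (a checked exception).
class DemoParseError extends Exception {
    DemoParseError(String message, Throwable cause) { super(message, cause); }
}

// Hypothetical stand-in for a Sqoop-generated record class with columns id, name, age.
class DemoRecord {
    private Integer id;
    private String name;
    private Integer age;

    // Fills the fields from one comma-separated line, similar in spirit to the generated parse().
    void parse(String line) throws DemoParseError {
        String[] cols = line.split(",", -1);
        if (cols.length != 3) {
            throw new DemoParseError("expected 3 columns, got " + cols.length, null);
        }
        try {
            id = Integer.valueOf(cols[0].trim());
            name = cols[1];
            age = Integer.valueOf(cols[2].trim());
        } catch (NumberFormatException e) {
            throw new DemoParseError("bad number in: " + line, e);
        }
    }

    Integer get_id() { return id; }
    String get_name() { return name; }
    Integer get_age() { return age; }
}

class SqoopParseSketch {
    // Returns "age<TAB>id" for records with age > 30, or null when the record
    // is filtered out or cannot be parsed.
    static String mapLine(String line) {
        DemoRecord r = new DemoRecord();
        try {
            r.parse(line); // would not compile without this try/catch (or a throws clause)
        } catch (DemoParseError e) {
            return null; // skip malformed records
        }
        return r.get_age() > 30 ? r.get_age() + "\t" + r.get_id() : null;
    }

    public static void main(String[] args) {
        System.out.println(mapLine("7,alice,42"));
        System.out.println(mapLine("8,bob,25"));
        System.out.println(mapLine("garbage"));
    }
}
```

Returning early from the catch block is one possible choice; it avoids reading fields from a record that was never populated.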


RepositoryItem from List&Label .lst file

Currently we are using a WPF application for creating and editing List&Label templates, but we are considering moving to the WebDesigner. Because we use project includes, we need to use the repository mode.
I've been trying to import our existing templates, but I have run into some issues regarding the RepositoryItemDescriptor. To create a RepositoryItem object you have to pass a descriptor to the constructor, but I cannot find any information on how to get it from the generated .lst file.
The data that we have at our disposal are:
TemplateType: List or Form
TemplateData: content of the .lst file (byte[])
IsMainTemplate: bool, is a "project include" or not
File name: name of the .lst file
The RepositoryItem constructor requires: string internalID, string descriptor, string type, DateTime lastModificationUTC.
What I have now is:
public class TemplateBaseModel : RepositoryItem
{
// Properties
// we have our own Ids and modification date, override RepositoryItem properties
public new string InternalID => $"repository://{{{Id}}}";
public DateTime LastModificationUTC => ModifiedOn;
public TemplateBaseModel() : base($"repository://{{{Guid.NewGuid()}}}", /* ?? */, RepositoryItemType.ProjectList.Value, DateTime.Now) { }
public TemplateBaseModel(string internalID, string descriptor, string type, DateTime lastModificationUTC) : base(internalID, descriptor, type, lastModificationUTC) { }
}
In the documentation I can only find what it is (internal metadata that is serialized into a string, and can be edited with the class RepositoryItemDescriptor), but not how it's created or how you can get it, and if I try to debug the example I get (in the CreateOrUpdate() method)#2#PgUAAENoS19QYWNrZWQAeNqd1E1PE1EYxfHfmsTvMAyJEeLY8iKCtpChU5MmvAiOC2NcjDCYmqFtZkaEqF9dXThgsTVGt/fm+Z9zz3lyv3/r2HXlQiFwKVeqDI2NdIVWPdIWCuRGTo2dGRp5ryv0Suq5yKpNoUCllhk5kymMjeS6QtdyldCuHfcs6FgUiQQSqUQgEk3dJY70pF57oS8wURo7N1TIBd64Z0GgY1HfodRA6rXAqVIgdN+SK21tbZlnt4o9J41W2OjNo9Qy72Y421OcVGzvD6R9fQcNcdb7A4WhSm3FQ4GhWu7CimUrt6T5rJvJacruHcruHEosldo38PI3ykjmQi7Qk4ilYoElJ/qOvTJwoi+Z4s33daMeeGDJiyna8szs725+zf6vmz8Tf+71U5WJzGmT/5ncucxHhdoXE6VcJVe6lFsWCGdOQzsCb+ds8I3T6R2+2/qv/ZjNvit0IjcxVhmqjZWuDZpXhHfanE2rKzSQCO0o53Ceamn5rGdTrC3Ws6YtkuiJbYts2LJlXWRbbNWayIbEE7E9sZ4Na9Y91vdVR+vWx9+9pa5NmvwKhVaTzQe5U7WWQqX+R+q+TKV20PxI54ZyZ0I7LmXK5t17PkkcOnSkdKxtT6pwLNbVnava0brt6abP1txGfwD+q8AH, which doesn't help either.
Any idea how to properly create a RepositoryItem from a .lst file, or how to create/get the descriptor?
You should try using the class RepositoryImportUtil from the combit.ListLabel23.Repository namespace. This helper class does all the hard work for you. Given an IRepository interface and the .lst file in place, the required code would be something like
IRepository listLabelRepository = <yourRepository>;
using (ListLabel LL = new ListLabel())
{
LL.FileRepository = listLabelRepository;
using (RepositoryImportUtil importUtil = new RepositoryImportUtil(listLabelRepository))
{
importUtil.ImportProjectFileWithDependencies(LL, @"<PathToRootProject>");
}
}
If this method is not what you require, the helper class has a couple of other methods to help you import existing projects.

How to get Json Api rendering to work with json views in Grails v3.3.3

I have a simple problem and the documentation is not helping me resolve it.
I have created a Grails v3.3.3 demo project and a simple domain class called JsonApiBook, with a 'name' attribute, like this
package ttrestapi
import grails.rest.*
@Resource(uri='/jsonApiBook', formats=['json','xml'])
class JsonApiBook {
static constraints = {
}
String name
}
and marked up the URI, as the documentation says the JSON API rendering only works with domain classes (and not a controller class).
In my bootstrap I have saved an instance of the book to the tables, and I can view that generally.
In my views directory I have created a jsonApiBook folder and two gson files.
A '_jsonApiBook' template like this
import ttrestapi.JsonApiBook
model {
JsonApiBook book
}
json jsonapi.render(book)
which invokes the jsonapi helper object to render the instance.
In the same directory I have created an index.gson like this:
import ttrestapi.Book
model {
List<Book> bookList
}
// We can use template namespace
// method with a Collection.
json tmpl.book(bookList)
When I run the app and use Postman or a browser to render, I get a result, but it's not JSON API compliant (I think it has ignored the template).
So localhost:8080/jsonApiBook just returns the following (it looks like the default layout):
[
{
"id": 1,
"name": "json api book3"
}
]
and localhost:8080/jsonApiBook/1 just returns 'null', which can't be right.
How should I be setting up the JSON views to render JSON API compliant output? This doesn't appear to work correctly.
build.gradle
buildscript {
....
dependencies {
........
classpath "org.grails.plugins:views-gradle:1.2.7"
}
}
--
apply plugin: "org.grails.grails-web"
apply plugin: "org.grails.plugins.views-json"
dependencies {
. . .
compile "org.grails.plugins:views-json:1.2.7"
. . .
}
Domain JsonApiBook.groovy
import grails.rest.Resource
@Resource(uri='/jsonApiBook', formats=['json','xml'])
class JsonApiBook {
String name
static constraints = {
}
}
Bootstrap.groovy
class BootStrap {
def init = { servletContext ->
new JsonApiBook(name: 'first').save(flush:true)
new JsonApiBook(name: 'second').save(flush:true)
new JsonApiBook(name: 'third').save(flush:true)
new JsonApiBook(name: 'fourth').save(flush:true)
new JsonApiBook(name: 'fifth').save(flush:true)
}
def destroy = {
}
}
Created a folder under views called jsonApiBook
Created a template named _jsonApiBook.gson in the jsonApiBook folder
model {
JsonApiBook jsonApiBook
}
json {
name jsonApiBook.name
}
created show.gson under same folder
model {
JsonApiBook jsonApiBook
}
json g.render(template:"jsonApiBook", model:[jsonApiBook:jsonApiBook])
When I run http://localhost:8080/jsonApiBook I get the following: [screenshot of the response]
When I run http://localhost:8080/jsonApiBook/1 I get the following: [screenshot of the response]
Note: I used grails 3.3.3 with h2 memory DB
Hope this helps you
OK, I got to a similar place today on the train. Essentially, convention over configuration is core to what's happening here.
First, the @Resource annotation generates a default RestfulController for you. With this approach the default base template _resourceClassName.gson expects the model variable to have the same name as the resource type, so in my original example, instead of 'book'
import ttrestapi.JsonApiBook
model {
JsonApiBook book
}
json jsonapi.render(book)
it should really read as follows (following convention):
import ttrestapi.JsonApiBook
// By default the variable should have the same name as the class, starting with a lowercase letter
// (it can be different, but then the caller has to change how the
// template is invoked)
model {
JsonApiBook jsonApiBook
}
json jsonapi.render(jsonApiBook)
Then the index.gson should read as modified below:
import ttrestapi.JsonApiBook
// Note: although it is not obvious from the written docs, which use the show action, the
// default expected model variable is <resourceClass>List
model {
List<JsonApiBook> jsonApiBookList
}
// We can use template namespace
// method with a Collection.
json tmpl.jsonApiBook(jsonApiBookList)
If you want to use another variable name in the base template, you have to pass that name when calling the base template from index.gson. For example, say the variable name in the base template was
model {
JsonApiBook myBook...
then when calling this template from index.gson you would put something like this
...
model {
List<JsonApiBook> jsonApiBookList
}
json tmpl.jsonApiBook("myBook", jsonApiBookList)
This invokes the correct template, _jsonApiBook.gson, but takes the model variable from index.gson and binds the values in jsonApiBookList to the myBook variable in the base template.
With the default controller generated by the @Resource annotation, the model variable will always be <resourceClassName>List.
I think the only way to change that is not to use the @Resource annotation on your domain class, but to use the URL mappings configuration to map your URI to a controller. You then have to create the controller yourself by hand and make sure it extends RestfulController. Doing this, you can override the default model variable name by implementing an overridden index() method that explicitly names the model variable you want, making sure that the model variable in index.gson is exactly the same as the one set in your controller.
However, the key point was that I was not following the core convention defaults, so the code as originally built couldn't work and returned null.
When you start out, the documentation isn't absolutely clear about which bits are part of the convention, and the examples (which use show.gson) don't tell you what the default model variable name will be for index.gson (add List to the end), so it's quite easy to get lost.
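The naming convention described in this answer can be sketched in a few lines of plain Java. ModelNameSketch and its methods are hypothetical helpers written for illustration, not part of Grails, and the first-letter rule is a simplification of how property names are derived from class names.

```java
class ModelNameSketch {
    // Default model variable for show.gson and the _<name>.gson template:
    // the class name with its first letter in lowercase.
    static String single(String className) {
        return Character.toLowerCase(className.charAt(0)) + className.substring(1);
    }

    // Default model variable for index.gson: the single name plus "List".
    static String list(String className) {
        return single(className) + "List";
    }

    public static void main(String[] args) {
        System.out.println(single("JsonApiBook"));
        System.out.println(list("JsonApiBook"));
    }
}
```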

Acceleo M2T - Write timestamp into a generated file

I am generating some files using different Acceleo templates defined in a *.mtl file.
At the top of these files I need to write something like:
#-----------------------------------------------------------------------------
# Project automatically generated by XXX at (add timestamp here)
#-----------------------------------------------------------------------------
How could I generate this timestamp dynamically each time I generate the files?
Thanks!
Edit: I solved this as described below.
Just after the module declaration, add query declarations:
[module generate('platform:/resource/qt48_model/qt48_xmlschema.xsd') ]
[comment get timestamp/]
[query public getCurrentTime(c : OclAny) : String =
invoke('org.eclipse.acceleo.qt_test_api.generator.common.GenerationSupport', 'getCurrentTime()', Sequence{}) /]
Then, create a class called GenerationSupport and add a method called getCurrentTime():
package org.eclipse.acceleo.qt_test_api.generator.common;
import java.sql.Timestamp;
public class GenerationSupport {
    public String getCurrentTime() {
        java.util.Date date = new java.util.Date();
        Timestamp ts = new Timestamp(date.getTime());
        return ts.toString();
    }
}
Try something like this:
[query public getCurrentTime(traceabilityContext : OclAny):
String = invoke('yourPackage.YourJavaClass', 'getCurrentTime()', Sequence{})
/]
And in your Java class, declare a method with this functionality:
public String getCurrentTime(){
return customDate;
}
Where "customDate" should be a String in your custom format:
for example new Date().toString(), or a SimpleDateFormat pattern such as MM/dd/yyyy (note that lowercase mm means minutes) or whatever you want.
Please don't forget to add the package that contains this Java class to the exported packages in MANIFEST.MF.
Good luck!
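A formatted timestamp method of the kind described above could look like the sketch below. It assumes Java 8 or later for java.time; the class name TimestampServiceSketch and the pattern are illustrative choices, and for use with invoke() the class would need to be declared public in its own file inside an exported package.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class TimestampServiceSketch {
    // MM = month, mm = minutes; adjust the pattern to the format you need.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");

    // Method to call from the Acceleo query; takes no arguments.
    public String getCurrentTime() {
        return format(LocalDateTime.now());
    }

    // Separated from getCurrentTime() so the formatting can be checked with a fixed date.
    static String format(LocalDateTime time) {
        return FORMAT.format(time);
    }

    public static void main(String[] args) {
        System.out.println(new TimestampServiceSketch().getCurrentTime());
    }
}
```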
You'll have to use what's called a "service".
It's basically just a public method in a class that will return the date as a String, formatted the way you want.
Look at the Acceleo tutorials to see how services are used; everything is there.

GXT Grid export to excel file

I am using GXT.
I have a Grid:
com.sencha.gxt.widget.core.client.grid.Grid<RiaBean> grid;
I want to export this to an Excel file.
I don't want to use external jar files.
Can anybody help?
Your Grid is composed of ColumnConfig<RiaBean,?> instances.
Every ColumnConfig<RiaBean,?> is linked to a ValueProvider<RiaBean,?>. Every ValueProvider<RiaBean,?> has a method getPath(), which is intended to return the path of the displayed element.
Hence, you can easily get the paths of your displayed elements, send them to the server and get the values back by introspection or EL.
For example, let's take this class
public class RIABean{
private String a;
private Integer b;
private Boolean c;
private Integer idFoo;
}
Use an interface which extends PropertyAccess to easily define your ValueProviders. It will also generate the methods getPath() with the accurate value.
public interface RIABeanPropertyAccess extends PropertyAccess<RIABean>{
//The generated getPath() method returns "a"
ValueProvider<RIABean,String> a();
//The generated getPath() method returns "b"
ValueProvider<RIABean,Integer> b();
//The generated getPath() method returns "c"
ValueProvider<RIABean, Boolean> c();
//The generated getPath() method returns "foo.id"
@Path("foo.id")
ValueProvider<RIABean, Integer> idFoo();
}
Create the ColumnModel for your grid:
// GWT.create must be given the PropertyAccess interface, not the bean class
RIABeanPropertyAccess pa = GWT.create(RIABeanPropertyAccess.class);
List<ColumnConfig<RIABean, ?>> listCols = new ArrayList<ColumnConfig<RIABean, ?>>();
listCols.add(new ColumnConfig<RIABean, String>(pa.a(), 100, "Header text for column A"));
listCols.add(new ColumnConfig<RIABean, Integer>(pa.b(), 100, "Header text for column B"));
listCols.add(new ColumnConfig<RIABean, Boolean>(pa.c(), 100, "Header text for column C"));
ColumnModel<RIABean> colModel = new ColumnModel<RIABean>(listCols);
When the user clicks on the "export" button, just iterate over the list of ColumnConfig<RiaBean,?> of your grid to get the path of each of them, and send this list of paths to the server. The server can then use introspection/reflection/EL to get the values corresponding to each path.
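On the server, resolving a dotted path such as "foo.id" by reflection could look like the sketch below. It assumes conventional public getters (getFoo(), getId()); PathResolverSketch, DemoFoo and DemoBean are hypothetical names used only for this illustration.

```java
import java.lang.reflect.Method;

// Hypothetical beans used only to exercise the resolver.
class DemoFoo {
    public Integer getId() { return 42; }
}

class DemoBean {
    public String getA() { return "hello"; }
    public DemoFoo getFoo() { return new DemoFoo(); }
}

class PathResolverSketch {
    // Walks the path one segment at a time, calling get<Segment>() on each intermediate object.
    static Object resolve(Object root, String path) {
        Object current = root;
        for (String part : path.split("\\.")) {
            if (current == null) {
                return null; // an intermediate value was null
            }
            String getter = "get" + Character.toUpperCase(part.charAt(0)) + part.substring(1);
            try {
                Method method = current.getClass().getMethod(getter);
                current = method.invoke(current);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot resolve '" + part + "' in path " + path, e);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(resolve(new DemoBean(), "a"));
        System.out.println(resolve(new DemoBean(), "foo.id"));
    }
}
```

Boolean properties exposed through isXxx() getters are not handled here and would need an extra lookup.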
There is no way to generate the file on the client side. As the server must do it, this is the easiest way I know, and it is what we do in my team.
Finally, ask yourself whether you really need an Excel file or whether a CSV file would be enough. A CSV file can easily be produced without any library and can be opened with Excel.
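Producing the CSV without any library can be sketched as follows. CsvSketch is a hypothetical helper; the quoting follows the common convention of wrapping fields that contain a comma, quote or line break in double quotes and doubling embedded quotes. Depending on the Excel locale, a different separator (for example a semicolon) may be needed.

```java
import java.util.Arrays;
import java.util.List;

class CsvSketch {
    // Quotes a single cell when it contains a separator, a quote or a line break.
    static String escape(Object value) {
        String s = value == null ? "" : value.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Joins the cells of one row with commas and ends the row with CRLF.
    static String line(List<?> cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(cells.get(i)));
        }
        return sb.append("\r\n").toString();
    }

    // Builds the whole document: one header row followed by the data rows.
    static String toCsv(List<String> headers, List<? extends List<?>> rows) {
        StringBuilder sb = new StringBuilder(line(headers));
        for (List<?> row : rows) {
            sb.append(line(row));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList("a", "b", "c");
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("plain", 1, true),
                Arrays.<Object>asList("with, comma", 2, false));
        System.out.print(toCsv(headers, rows));
    }
}
```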

MEF: how to import from an exported object?

I have created a MEF plugin control that I import into my app. Now I want the plugin to be able to import parts from the app. I can't figure out how to set up the catalog in the plugin so that it can find the exports from the app. Can somebody tell me how this is done? Below is my code, which doesn't work when I try to create an AssemblyCatalog with the currently executing assembly.
[Export(typeof(IPluginControl))]
public partial class MyPluginControl : UserControl, IPluginControl
{
[Import]
public string Message { get; set; }
public MyPluginControl()
{
InitializeComponent();
Initialize();
}
private void Initialize()
{
AggregateCatalog catalog = new AggregateCatalog();
catalog.Catalogs.Add(new AssemblyCatalog(Assembly.GetExecutingAssembly()));
CompositionContainer container = new CompositionContainer(catalog);
try
{
container.ComposeParts(this);
}
catch (CompositionException ex)
{
Console.WriteLine(ex.ToString());
}
}
}
You don't need to do this.
Just make sure that the catalog you're using when you import this plugin includes the main application's assembly.
When MEF constructs your type in order to export it (to fulfill the IPluginControl import elsewhere), it will already compose this part for you and, at that point, import the Message string. (You should most likely assign a contract name to that import, or use a custom type of some sort; otherwise it will just import a string, and you can only have a single string export anywhere in your application.)
When MEF composes parts, it finds all types matching the specified type (in this case IPluginControl), instantiates a single object, fills any [Import] requirements for that object (which is why you don't need to compose this in your constructor), then assigns it to any objects importing the type.