Create Scalding Source like TextLine that combines multiple files into single mappers - scala

We have many small files that need combining. In Scalding you can use TextLine to read files as text lines. The problem is that we get one mapper per file, but we want to combine multiple files so that they are processed by a single mapper.
I understand we need to change the input format to an implementation of CombineFileInputFormat, and this may involve using Cascading's CombinedHfs. We cannot work out how to do this, but it should be just a handful of lines of code to define our own Scalding source called, say, CombineTextLine.
Many thanks to anyone who can provide the code to do this.
As a side question, we have some data in S3; it would be great if the given solution also worked for S3 files - I guess that depends on whether CombineFileInputFormat or CombinedHfs works with S3.

You already have the right idea in your question, so here is a possible solution for you.
Create your own input format that extends CombineFileInputFormat and uses your own custom RecordReader. I am showing Java code, but you could easily convert it to Scala if you want.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;
public class CombinedInputFormat<K, V> extends CombineFileInputFormat<K, V> {

    public static class MyKeyValueLineRecordReader implements RecordReader<LongWritable, Text> {
        private final RecordReader<LongWritable, Text> delegate;

        public MyKeyValueLineRecordReader(CombineFileSplit split, Configuration conf, Reporter reporter, Integer idx) throws IOException {
            FileSplit fileSplit = new FileSplit(split.getPath(idx), split.getOffset(idx), split.getLength(idx), split.getLocations());
            delegate = new LineRecordReader(conf, fileSplit);
        }

        @Override
        public boolean next(LongWritable key, Text value) throws IOException {
            return delegate.next(key, value);
        }

        @Override
        public LongWritable createKey() {
            return delegate.createKey();
        }

        @Override
        public Text createValue() {
            return delegate.createValue();
        }

        @Override
        public long getPos() throws IOException {
            return delegate.getPos();
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }

        @Override
        public float getProgress() throws IOException {
            return delegate.getProgress();
        }
    }

    @Override
    public RecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new CombineFileRecordReader(job, (CombineFileSplit) split, reporter, (Class) MyKeyValueLineRecordReader.class);
    }
}
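One practical note that is not part of the original answer: CombineFileInputFormat only limits how much it packs into each combined split once a maximum split size is configured; without one it may merge all eligible blocks (per rack) into very few, very large splits. A minimal sketch of how you could cap it, assuming the CombinedInputFormat above; the 128 MB value is an arbitrary illustration, tune it for your cluster:
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Hypothetical variant of the CombinedInputFormat above that caps the combined split size.
public class SizeCappedCombinedInputFormat extends CombinedInputFormat<LongWritable, Text> {
    public SizeCappedCombinedInputFormat() {
        // setMaxSplitSize is a protected helper inherited from CombineFileInputFormat;
        // splits will not grow beyond this many bytes of input.
        setMaxSplitSize(128L * 1024 * 1024);
    }
}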
Then you need to extend the TextLine class and make it use the input format you just defined (Scala code from now on).
import cascading.scheme.hadoop.TextLine
import cascading.flow.FlowProcess
import org.apache.hadoop.mapred.{OutputCollector, RecordReader, JobConf}
import cascading.tap.Tap
import com.twitter.scalding.{FixedPathSource, TextLineScheme}
import cascading.scheme.Scheme
class CombineFileTextLine extends TextLine {
  override def sourceConfInit(flowProcess: FlowProcess[JobConf], tap: Tap[JobConf, RecordReader[_, _], OutputCollector[_, _]], conf: JobConf) {
    super.sourceConfInit(flowProcess, tap, conf)
    conf.setInputFormat(classOf[CombinedInputFormat[String, String]])
  }
}
Create a scheme for your combined input.
trait CombineFileTextLineScheme extends TextLineScheme {
  override def hdfsScheme = new CombineFileTextLine().asInstanceOf[Scheme[JobConf, RecordReader[_, _], OutputCollector[_, _], _, _]]
}
Finally, create your source class:
case class CombineFileMultipleTextLine(p : String*) extends FixedPathSource(p :_*) with CombineFileTextLineScheme
If you want to use a single path instead of multiple ones, the change to your source class is trivial: accept a single String and pass it straight through to FixedPathSource.
I hope that helps.

This should do the trick: https://wiki.apache.org/hadoop/HowManyMapsAndReduces

Related

@DynamicPropertySource not being invoked (Kotlin, Spring Boot and TestContainers)

I'm trying to define a @TestConfiguration class that is executed once before all integration tests to run a MongoDB TestContainer in Kotlin in a Spring Boot project.
Here is the code:
import org.springframework.boot.test.context.TestConfiguration
import org.springframework.test.context.DynamicPropertyRegistry
import org.springframework.test.context.DynamicPropertySource
import org.testcontainers.containers.MongoDBContainer
import org.testcontainers.utility.DockerImageName
@TestConfiguration
class TestContainerMongoConfig {
    companion object {
        @JvmStatic
        private val MONGO_CONTAINER: MongoDBContainer = MongoDBContainer(DockerImageName.parse("mongo").withTag("latest")).withReuse(true)

        @JvmStatic
        @DynamicPropertySource
        private fun emulatorProperties(registry: DynamicPropertyRegistry) {
            registry.add("spring.data.mongodb.uri", MONGO_CONTAINER::getReplicaSetUrl)
        }

        init { MONGO_CONTAINER.start() }
    }
}
The issue seems to be that the emulatorProperties method is not being called.
The regular flow should be that the container is started and then the properties are set.
The first step happens, the second does not.
I know there is an alternative where I do this configuration in each functional test class, but I don't like it as it adds unnecessary noise to the test classes.
For example, with a Java project that uses Postgres I managed to make it work with the following code:
import javax.sql.DataSource;
import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.utility.DockerImageName;
@TestConfiguration
public class PostgresqlTestContainersConfig {

    static final PostgreSQLContainer POSTGRES_CONTAINER;
    private final static DockerImageName IMAGE = DockerImageName.parse("postgres").withTag("latest");

    static {
        POSTGRES_CONTAINER = new PostgreSQLContainer(IMAGE);
        POSTGRES_CONTAINER.start();
    }

    @Bean
    DataSource dataSource() {
        return DataSourceBuilder.create()
                .username(POSTGRES_CONTAINER.getUsername())
                .password(POSTGRES_CONTAINER.getPassword())
                .driverClassName(POSTGRES_CONTAINER.getDriverClassName())
                .url(POSTGRES_CONTAINER.getJdbcUrl())
                .build();
    }
}
I'm trying to achieve the same thing but in Kotlin and using MongoDB.
Any idea what may be causing @DynamicPropertySource not to be called?
@DynamicPropertySource is part of the Spring Boot test context lifecycle. Since you essentially want to replicate the Java setup, @DynamicPropertySource is not required here. Instead you can follow the Singleton Container pattern and replicate it in Kotlin as well.
Instead of setting the config on the registry, you can set it as a system property and Spring autoconfiguration will pick it up:
init {
    MONGO_CONTAINER.start()
    System.setProperty("spring.data.mongodb.uri", MONGO_CONTAINER.getReplicaSetUrl())
}
I was able to resolve a similar problem in Groovy by having a static method annotated with @DynamicPropertySource directly in the test class (it would probably also work in a superclass).
But I didn't want to copy that code into every test class that needs MongoDB.
I resolved the issue by using an ApplicationContextInitializer instead.
The example is written in Groovy:
class MongoTestContainer implements ApplicationContextInitializer<ConfigurableApplicationContext> {

    static final MongoDBContainer mongoDBContainer = new MongoDBContainer(DockerImageName.parse("mongo:6.0.2"))

    @Override
    void initialize(ConfigurableApplicationContext applicationContext) {
        mongoDBContainer.start()
        def testValues = TestPropertyValues.of("spring.data.mongodb.uri=" + mongoDBContainer.getReplicaSetUrl())
        testValues.applyTo(applicationContext.getEnvironment())
    }
}
To make it complete, in the test class you just need to add @ContextConfiguration(initializers = MongoTestContainer) to activate the context initializer for the test.
You could also create a custom annotation that combines @DataMongoTest with the previous annotation, as sketched below.
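For illustration, such a composed annotation might look roughly like this in Java (the name MongoIntegrationTest is made up, and it assumes the MongoTestContainer initializer above is available):
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import org.springframework.boot.test.autoconfigure.data.mongo.DataMongoTest;
import org.springframework.test.context.ContextConfiguration;

// Composed annotation: tests annotated with @MongoIntegrationTest get both
// the Mongo test slice and the Testcontainers initializer in one go.
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@DataMongoTest
@ContextConfiguration(initializers = MongoTestContainer.class)
public @interface MongoIntegrationTest {
}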
This solution works for me: the method annotated with @DynamicPropertySource sits inside the companion object (also annotated with @JvmStatic), and org.testcontainers.junit.jupiter.Testcontainers is added on the test class.
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.extension.ExtendWith
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.jdbc.DataSourceBuilder
import org.springframework.boot.test.context.TestConfiguration
import org.springframework.context.annotation.Bean
import org.springframework.test.context.ContextConfiguration
import org.springframework.test.context.DynamicPropertyRegistry
import org.springframework.test.context.DynamicPropertySource
import org.springframework.test.context.junit.jupiter.SpringExtension
import org.testcontainers.containers.PostgreSQLContainer
import org.testcontainers.junit.jupiter.Container
import org.testcontainers.junit.jupiter.Testcontainers
import javax.sql.DataSource
@ExtendWith(SpringExtension::class)
@Testcontainers
@TestConfiguration
@ContextConfiguration(classes = [PostgresqlTestContainersConfig::class])
class PostgresqlTestContainersConfig {

    @Autowired
    var dataSource: DataSource? = null

    @Test
    internal fun name() {
        dataSource!!.connection.close()
    }

    @Bean
    fun dataSource(): DataSource? {
        return DataSourceBuilder.create()
            .username(POSTGRES_CONTAINER.getUsername())
            .password(POSTGRES_CONTAINER.getPassword())
            .driverClassName(POSTGRES_CONTAINER.getDriverClassName())
            .url(POSTGRES_CONTAINER.getJdbcUrl())
            .build()
    }

    companion object {
        @JvmStatic
        @Container
        private val POSTGRES_CONTAINER: PostgreSQLContainer<*> = PostgreSQLContainer("postgres:9.6.12")
            .withDatabaseName("integration-tests-db")
            .withUsername("sa")
            .withPassword("sa")

        @JvmStatic
        @DynamicPropertySource
        fun postgreSQLProperties(registry: DynamicPropertyRegistry) {
            registry.add("db.url") { POSTGRES_CONTAINER.jdbcUrl }
            registry.add("db.user") { POSTGRES_CONTAINER.username }
            registry.add("db.password") { POSTGRES_CONTAINER.password }
        }
    }
}

Only apply modifyResponseBody for certain content-type

I am using GatewayFilterSpec.modifyResponseBody (marked as a "BETA" feature) to rewrite JSON payloads. This works well as long as the response payloads are in fact of content type application/json. In my case that is unfortunately not always guaranteed, and I would like to only apply the modifyResponseBody if the response has the Content-Type: application/json header, and otherwise skip the filter. Is this possible with Spring Cloud Gateway, and how would I do it? Thank you.
Now I'm getting this:
org.springframework.web.reactive.function.UnsupportedMediaTypeException: Content type 'text/html' not supported
at org.springframework.web.reactive.function.BodyInserters.lambda$null$11(BodyInserters.java:329)
at java.util.Optional.orElseGet(Optional.java:267)
at org.springframework.web.reactive.function.BodyInserters.lambda$bodyInserterFor$12(BodyInserters.java:325)
Here is a "solution", one that has all sorts of problems:
package my_package;
import org.reactivestreams.Publisher;
import org.springframework.cloud.gateway.filter.GatewayFilter;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.factory.rewrite.ModifyResponseBodyGatewayFilterFactory;
import org.springframework.context.annotation.Primary;
import org.springframework.core.io.buffer.DataBuffer;
import org.springframework.http.codec.ServerCodecConfigurer;
import org.springframework.http.server.reactive.ServerHttpResponse;
import org.springframework.http.server.reactive.ServerHttpResponseDecorator;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;
import static org.springframework.http.MediaType.APPLICATION_JSON;
@Component
@Primary
public class JsonOnlyModifyResponseBodyGatewayFilterFactory extends ModifyResponseBodyGatewayFilterFactory {

    public JsonOnlyModifyResponseBodyGatewayFilterFactory(ServerCodecConfigurer codecConfigurer) {
        super(codecConfigurer);
    }

    @Override
    public GatewayFilter apply(Config config) {
        return new MyModifyResponseGatewayFilter(config);
    }

    public class MyModifyResponseGatewayFilter extends ModifyResponseGatewayFilter {

        MyModifyResponseGatewayFilter(Config config) {
            super(config);
        }

        @Override
        public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
            ServerHttpResponse serverHttpResponse = getServerHttpResponseFromSuper(exchange);
            ServerHttpResponseDecorator responseDecorator = new ServerHttpResponseDecorator(exchange.getResponse()) {
                @Override
                public Mono<Void> writeWith(Publisher<? extends DataBuffer> body) {
                    if (APPLICATION_JSON.isCompatibleWith(getDelegate().getHeaders().getContentType())) {
                        return serverHttpResponse.writeWith(body);
                    }
                    return super.writeWith(body);
                }
            };
            return chain.filter(exchange.mutate().response(responseDecorator).build());
        }

        private ServerHttpResponse getServerHttpResponseFromSuper(ServerWebExchange exchange) {
            ServerHttpResponse[] serverHttpResponse = new ServerHttpResponse[1];
            //noinspection UnassignedFluxMonoInstance
            super.filter(exchange, chain -> {
                serverHttpResponse[0] = chain.getResponse(); // capture the response when the super sets it
                return null;
            });
            return serverHttpResponse[0];
        }
    }
}
The chosen approach is in lieu of just changing a copy of the existing ModifyResponseBodyGatewayFilterFactory. This allows version upgrades of Spring Cloud Gateway to bring in minor changes of ModifyResponseBodyGatewayFilterFactory. But since JsonOnlyModifyResponseBodyGatewayFilterFactory is very dependent on the implementation of ModifyResponseBodyGatewayFilterFactory, this may easily get broken. Another flaw of this solution is that I had to add an @Primary annotation to avoid a "required a single bean, but 2 were found" exception, but that overrides the default factory, which would presumably affect other uses of modifyResponseBody. It's also ugly to call super.filter and not use its result. And so on. So, while this "works", it doesn't, well, fill me with joy.

Custom Event Handling

The issue I am trying to get rid of is the following:
I intend to set up a custom event handling chain as a workaround for JavaFX's lack of action commands.
The issue in particular is that a MenuItem, upon clicking it, still fires an ActionEvent instead of the self-written MilvaLabActionEvent.
The code:
Event class
package jpt.gui.items;
import javafx.event.ActionEvent;
public class MilvaLabActionEvent extends ActionEvent {

    private static final long serialVersionUID = 6757067652205246280L;
    private String actionCommand = "";

    public MilvaLabActionEvent(String actionCommand2) {
        setActionCommand(actionCommand2);
    }

    public MilvaLabActionEvent() {}

    public String getActionCommand() {
        return actionCommand;
    }

    public void setActionCommand(String actioncommand) {
        this.actionCommand = actioncommand;
    }
}
My EventHandler:
package jpt.gui.items;
import javafx.event.EventHandler;
import jpt.MilvaLabGlobal;
import jpt.MilvaLabKonst;
import jpt.handle.MilvaLabDateiHandle;
import jpt.handle.MilvaLabEinHandle;
import jpt.handle.MilvaLabHilfeHandle;
import jpt.handle.MilvaLabMilvaHandle;
import jpt.handle.MilvaLabRvAnwendungHandle;
import jpt.handle.MilvaLabrvTextHandle;
import jpt.log4j.MilvaLabLogger;
public class MilvaLabEventHandler implements EventHandler<MilvaLabActionEvent> {

    @Override
    public void handle(MilvaLabActionEvent event) {
        // the command string of the menu item
        final String sCmd = event.getActionCommand();
        if (sCmd.charAt(0) == 'M') {
            // doing something here
        }
    }
}
The custom MenuItem class I figured out I had to write:
package jpt.gui.items;
import javafx.event.Event;
import javafx.scene.control.MenuItem;
public class MilvaLabMenuItem extends MenuItem {

    private String actionCommand;

    public MilvaLabMenuItem(String sText) {
        this.setText(sText);
    }

    @Override
    public void fire() {
        Event.fireEvent(this, new MilvaLabActionEvent(getActionCommand()));
    }

    public String getActionCommand() {
        return actionCommand;
    }

    public void setActionCommand(String actionCommand) {
        this.actionCommand = actionCommand;
    }
}
And the initialization of the custom MenuItem:
final MilvaLabMenuItem jmi = new MilvaLabMenuItem("I am a menuItem");
jmi.addEventHandler(evtype, new MilvaLabEventHandler());
jmi.setOnAction((event) -> {
    System.out.print("I have fired an ActionEvent!");
});
Well, as of now, I get "I have fired an ActionEvent!" when I click the MilvaLabMenuItem, and nothing else happens (I already looked into this with the debugger).
What I want to happen, obviously, is that the MilvaLabEventHandler is called.
I figured it out again.
I had declared two EventTypes, though only one was necessary.
This helped me find the solution, although it uses Nodes instead of MenuItems:
How to emit and handle custom events?
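For reference, here is a minimal sketch of the single-EventType approach that linked answer describes, assuming the event extends Event directly instead of ActionEvent; the MILVA_ACTION constant is my own illustrative name:
import javafx.event.Event;
import javafx.event.EventType;

public class MilvaLabActionEvent extends Event {

    // a single shared EventType for all MilvaLab action events
    public static final EventType<MilvaLabActionEvent> MILVA_ACTION =
            new EventType<>(Event.ANY, "MILVA_ACTION");

    private final String actionCommand;

    public MilvaLabActionEvent(String actionCommand) {
        super(MILVA_ACTION);
        this.actionCommand = actionCommand;
    }

    public String getActionCommand() {
        return actionCommand;
    }
}
The handler would then be registered against that same type, jmi.addEventHandler(MilvaLabActionEvent.MILVA_ACTION, new MilvaLabEventHandler()), while fire() keeps calling Event.fireEvent(this, new MilvaLabActionEvent(getActionCommand())) exactly as in the code above.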

Getting user data in NewProjectCreationPage in Eclipse Plugin

I have been successful in making a plugin. However, I now need to add some more text boxes to the project creation page to collect user information. I also need to use this information in the auto-generated .php files created in the project directory.
I want to know how I can override the WizardNewProjectCreationPage to add some more text boxes to the already given layout. I am pretty new to plugin development. Here is the code for my custom wizard.
import java.net.URI;
import org.eclipse.core.runtime.CoreException;
import org.eclipse.core.runtime.IConfigurationElement;
import org.eclipse.core.runtime.IExecutableExtension;
import org.eclipse.jface.viewers.IStructuredSelection;
import org.eclipse.jface.wizard.Wizard;
import org.eclipse.jface.wizard.WizardDialog;
import org.eclipse.ui.INewWizard;
import org.eclipse.ui.IWorkbench;
import org.eclipse.ui.dialogs.WizardNewProjectCreationPage;
import org.eclipse.ui.wizards.newresource.BasicNewProjectResourceWizard;
import rudraxplugin.pages.MyPageOne;
import rudraxplugin.projects.RudraxSupport;
public class CustomProjectNewWizard extends Wizard implements INewWizard, IExecutableExtension {

    private WizardNewProjectCreationPage _pageOne;
    protected MyPageOne one;
    private IConfigurationElement _configurationElement;

    public CustomProjectNewWizard() {
        // TODO Auto-generated constructor stub
        setWindowTitle("RudraX");
    }

    @Override
    public void init(IWorkbench workbench, IStructuredSelection selection) {
        // TODO Auto-generated method stub
    }

    @Override
    public void addPages() {
        super.addPages();
        _pageOne = new WizardNewProjectCreationPage("From Scratch Project Wizard");
        _pageOne.setTitle("From Scratch Project");
        _pageOne.setDescription("Create something from scratch.");
        addPage(one);
        addPage(_pageOne);
    }

    @Override
    public boolean performFinish() {
        String name = _pageOne.getProjectName();
        URI location = null;
        if (!_pageOne.useDefaults()) {
            location = _pageOne.getLocationURI();
            System.err.println("location: " + location.toString()); //$NON-NLS-1$
        } // else location == null
        RudraxSupport.createProject(name, location);
        // Add this
        BasicNewProjectResourceWizard.updatePerspective(_configurationElement);
        return true;
    }

    @Override
    public void setInitializationData(IConfigurationElement config,
            String propertyName, Object data) throws CoreException {
        _configurationElement = config;
        // TODO Auto-generated method stub
    }
}
Ask for any other code required. Any help is appreciated. Thank You.
Instead of using WizardNewProjectCreationPage directly, create a new class extending WizardNewProjectCreationPage and override the createControl method to create new controls:
class MyNewProjectCreationPage extends WizardNewProjectCreationPage
{
    @Override
    public void createControl(Composite parent)
    {
        super.createControl(parent);
        Composite body = (Composite) getControl();
        // ... create new controls here
    }
}
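For instance, a rough sketch of what adding an extra text box there could look like; the "Author" field, the getter, and the constructor signature are illustrative additions, not from the original code:
import org.eclipse.swt.SWT;
import org.eclipse.swt.layout.GridData;
import org.eclipse.swt.widgets.Composite;
import org.eclipse.swt.widgets.Label;
import org.eclipse.swt.widgets.Text;
import org.eclipse.ui.dialogs.WizardNewProjectCreationPage;

public class MyNewProjectCreationPage extends WizardNewProjectCreationPage {

    private Text authorText; // extra field the wizard can read later, e.g. in performFinish()

    public MyNewProjectCreationPage(String pageName) {
        super(pageName);
    }

    @Override
    public void createControl(Composite parent) {
        super.createControl(parent);
        Composite body = (Composite) getControl();

        // add a label and a text box below the standard project name/location controls
        Label authorLabel = new Label(body, SWT.NONE);
        authorLabel.setText("Author:");

        authorText = new Text(body, SWT.BORDER);
        authorText.setLayoutData(new GridData(SWT.FILL, SWT.CENTER, true, false));
    }

    public String getAuthor() {
        return authorText.getText();
    }
}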

Reusable Liferay (6.0.6) service

I am trying to implement reusable custom services without using Ext or Service Builder.
I referred to this article: http://www.devatwork.nl/2010/04/implementing-a-reusable-liferay-service-without-ext-or-service-builder/ , but I am confused about how to implement this using Eclipse. Following are the steps that I followed:
- Created a liferay-plugin project within Eclipse.
- Created a package containing CustomServices (interface) and CustomServicesUtil.
- Created a jar file of the package in step 2.
- Placed that jar file in tomcat\lib\ext\
- Then created a package (within the same liferay-plugin project) that includes CustomServicesImpl and CustomServicesBaseImpl
- Defined portlet-spring.xml, service.properties, and modified web.xml (as per the article), and finally deployed the project.
On deployment, the project deploys successfully, but when I try to use the custom methods defined in CustomServicesImpl through CustomServicesUtil.getCustomMethod(), I get the following error:
"java.lang.ClassNotFoundException: com.demo.custom.services.CustomServicesUtil"
I configured the build path to include the customservices.jar file, but it's not working out and still shows the same error. I don't know whether this is the correct way to implement reusable services or not. I tried this so that I can make use of a custom method in one of my projects.
Here is the code for custom services:
CustomServices.java
package com.demo.custom.services;
import com.liferay.portal.model.User;
public interface CustomServices {
    String getCustomName(User user);
}
CustomServicesUtil.java
package com.demo.custom.services;
import com.liferay.portal.model.User;
public class CustomServicesUtil {

    private static CustomServices services;

    public static CustomServices getServices() {
        if (services == null) {
            throw new RuntimeException("Custom Services not set");
        }
        return services;
    }

    public void setServices(CustomServices pServices) {
        services = pServices;
    }

    public static String getCustomName(User user) {
        return getServices().getCustomName(user);
    }
}
CustomServicesBaseImpl.java
package com.demo.custom.services.impl;
import com.demo.custom.services.CustomServices;
import com.liferay.portal.kernel.exception.SystemException;
import com.liferay.portal.service.base.PrincipalBean;
import com.liferay.portal.util.PortalUtil;
public abstract class CustomServicesBaseImpl extends PrincipalBean implements CustomServices {

    protected CustomServices services;

    public CustomServices getServices() {
        return services;
    }

    public void setServices(CustomServices pServices) {
        this.services = pServices;
    }

    protected void runSQL(String sql) throws SystemException {
        try {
            PortalUtil.runSQL(sql);
        } catch (Exception e) {
            throw new SystemException(e);
        }
    }
}
CustomServicesImpl.java
package com.demo.custom.services.impl;
import com.liferay.portal.model.User;
public class CustomServicesImpl extends CustomServicesBaseImpl {

    @Override
    public String getCustomName(User user) {
        if (user == null) {
            return null;
        } else {
            return new StringBuffer().append(user.getFirstName()).append(" ").append(user.getLastName()).toString();
        }
    }
}
Here is the code of controller class of my another portlet, where i am making use of this service.
HelloCustomName.java
package com.test;
import java.io.IOException;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;
import com.demo.custom.services.CustomServicesUtil;
import com.liferay.portal.kernel.util.WebKeys;
import com.liferay.portal.model.User;
import com.liferay.portal.theme.ThemeDisplay;
import com.liferay.util.bridges.mvc.MVCPortlet;
public class HelloCustomName extends MVCPortlet {

    @Override
    public void doView(RenderRequest renderRequest,
            RenderResponse renderResponse) throws IOException, PortletException {
        System.out.println("--doview----");
        ThemeDisplay themeDisplay = (ThemeDisplay) renderRequest.getAttribute(WebKeys.THEME_DISPLAY);
        User user = themeDisplay.getUser();
        String customName = CustomServicesUtil.getCustomName(user); // getting error here
        System.out.println("customName:" + customName);
    }
}
Please point me to how to implement reusable services. Any guidance will be really useful.
Thanks.
To my mind, you don't need the complexity of services. Simply make utility classes and put them into tomcat/lib/ext. Be sure that tomcat/lib/ext is correctly configured in tomcat/conf/catalina.properties, something like this:
common.loader=${catalina.home}/lib/ext/*.jar
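As a rough illustration of that suggestion (the package and class name are made up), a plain utility class mirroring the getCustomName logic above could look like this, built into a jar and dropped into tomcat/lib/ext:
package com.demo.custom.util;

import com.liferay.portal.model.User;

// Plain static helper, no Liferay service plumbing required.
public final class CustomNameUtil {

    private CustomNameUtil() {
    }

    public static String getCustomName(User user) {
        if (user == null) {
            return null;
        }
        return user.getFirstName() + " " + user.getLastName();
    }
}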