Scenario:
A Spring Batch job with 2 steps:
1. A Tasklet which downloads CSV files (file names unknown before runtime) to a directory.
2. A chunk-based step with a Reader which needs to read all of those CSV files.
Challenge:
Since the file names are unknown, we use PathMatchingResourcePatternResolver.getResources() to get the resources.
The returned resources are always of length 0 since there are no files in the directory at bean initialization.
@Bean
Resource[] resources() throws IOException {
    final PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    final Resource[] resources = resolver.getResources("file:" + destinationDir + "/*.csv");
    return resources;
}
Any ideas? Thanks in advance!
You can save the names of the files inside the JobExecutionContext while in the tasklet step, and then use those names to initialize your resources in the chunk step.
More details at the Spring Batch Docs: Configuring a Step.
This late binding in Spring Batch is possible due to Step Scope Beans. You can read more about it here.
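For example, a minimal sketch along those lines (the class name, the "fileNames" key, the String item type and the identity line mapper are all placeholder assumptions to adapt to your job):

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;

@Configuration
public class CsvJobConfig {

    // Tasklet step: download the files, then record their paths in the job execution context.
    @Bean
    public Tasklet downloadTasklet() {
        return (contribution, chunkContext) -> {
            List<String> downloaded = new ArrayList<>(); // fill with the paths of the downloaded CSV files
            chunkContext.getStepContext().getStepExecution()
                    .getJobExecution().getExecutionContext()
                    .put("fileNames", downloaded);
            return RepeatStatus.FINISHED;
        };
    }

    // Chunk step: a step-scoped reader, resolved at step execution time once the files exist.
    @Bean
    @StepScope
    public MultiResourceItemReader<String> csvReader(
            @Value("#{jobExecutionContext['fileNames']}") List<String> fileNames) {
        MultiResourceItemReader<String> reader = new MultiResourceItemReader<>();
        reader.setResources(fileNames.stream()
                .map(FileSystemResource::new)
                .toArray(Resource[]::new));
        reader.setDelegate(new FlatFileItemReaderBuilder<String>()
                .name("csvDelegateReader")
                .lineMapper((line, lineNumber) -> line) // replace with your real line mapping
                .build());
        return reader;
    }
}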
I have all 4 files needed to read from/write to HDFS in my resources folder, and my method to create the HDFS FileSystem object is below.
public static FileSystem getHdfsOnPrem(String coreSiteXml, String hdfsSiteXml, String krb5confLoc, String keyTabLoc) {
    // Set up the configuration object.
    try {
        Configuration config = new Configuration();
        config.addResource(new org.apache.hadoop.fs.Path(coreSiteXml));
        config.addResource(new org.apache.hadoop.fs.Path(hdfsSiteXml));
        config.set("hadoop.security.authentication", "Kerberos");
        config.addResource(krb5confLoc);
        config.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        config.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        System.setProperty("java.security.krb5.conf", krb5confLoc);
        org.apache.hadoop.security.HadoopKerberosName.setConfiguration(config);
        UserGroupInformation.setConfiguration(config);
        UserGroupInformation.loginUserFromKeytab("my_username", keyTabLoc);
        return org.apache.hadoop.fs.FileSystem.get(config);
    }
    catch (Exception ex) {
        ex.printStackTrace();
        return null;
    }
}
It works when I run it locally and pass the below as the paths:
C:\Users\my_username\IdeaProjects\my_project_name\target\scala-2.12\classes\core-site.xml
C:\Users\my_username\IdeaProjects\my_project_name\target\scala-2.12\classes\hdfs-site.xml
C:\Users\my_username\IdeaProjects\my_project_name\target\scala-2.12\classes\krb5.conf
C:\Users\my_username\IdeaProjects\my_project_name\target\scala-2.12\classes\my_username.user.keytab
It runs fine locally, but when I bundle it as a JAR and run it in an environment like Kubernetes, it throws the error below. (Since bundling it as a JAR, I can read the contents of the resource files as a stream, but I need to pass a path to the loginUserFromKeytab method.)
org.apache.hadoop.security.KerberosAuthException: failure to login: for principal: my_username from keytab file:/opt/spark-3.0.0/jars/foo-project-name!/my_username.user.keytab javax.security.auth.login.LoginException: Unable to obtain password from user
Any suggestions/pointers are appreciated.
I suggest you use a JAAS config file instead of writing this code. This helps to remove the security plumbing from your code and externalizes it. "Unable to obtain password" would occur if the user that is running your app doesn't have permission to access the keytab file.
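For reference, a rough sketch of what such a JAAS file might look like (the entry name, principal, realm and keytab path are all placeholders, and the entry name your client library expects may differ), referenced at startup with -Djava.security.auth.login.config=/path/to/jaas.conf:

// jaas.conf -- placeholder values, adjust to your realm and keytab location
com.sun.security.jgss.krb5.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    doNotPrompt=true
    keyTab="/etc/security/keytabs/my_username.user.keytab"
    principal="my_username@EXAMPLE.COM";
};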
I am using this setup to generate a ddl file:
spring.jpa.properties.javax.persistence.schema-generation.create-source=metadata
spring.jpa.properties.javax.persistence.schema-generation.scripts.action=create
spring.jpa.properties.javax.persistence.schema-generation.scripts.create-target=./ddl/schema.sql
The generation is executed via a specific test in Maven validation phase:
@RunWith(SpringRunner.class)
@DataJpaTest
@AutoConfigureTestDatabase(replace = Replace.NONE)
@TestPropertySource(locations = "/ddl.properties")
public class GenerateDDL {

    @Autowired
    private EntityManager em;

    @Test
    public void generateDDL() throws IOException {
        em.close();
        em.getEntityManagerFactory().close();
    }
}
This is working fine, with one problem: the generator does not create a new file but always appends its output to the existing one.
Is there a way or setting to make the generator always create a new file or clean up the old one?
Deleting it within the test would delete it after generation. We also need the file to be committed to Git, so it is not generated under target.
UPDATE
There seems to be no solution within Hibernate (at least until Hibernate 6):
https://hibernate.atlassian.net/browse/HHH-11817
Is there a way to hook into Spring context creation, before the persistence context is created? There I could delete the file.
I was faced with the same problem and found the following way to clean up before schema generation: use a BeanFactoryPostProcessor. A BeanFactoryPostProcessor is called when all bean definitions are loaded, but no beans have been instantiated yet.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.springframework.beans.factory.config.BeanFactoryPostProcessor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Remove auto-generated schema files before JPA generates them.
 * Otherwise new content will be appended to the files.
 */
@Configuration
public class SchemaSupport {

    @Bean
    public static BeanFactoryPostProcessor schemaFilesCleanupPostProcessor() {
        return bf -> {
            try {
                Files.deleteIfExists(Path.of("schema-drop-auto.sql"));
                Files.deleteIfExists(Path.of("schema-create-auto.sql"));
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        };
    }
}
TLDR: Assuming your Spring Boot project includes a version of Hibernate >= 5.5.3, then to configure the schema file(s) to be overwritten (i.e. to disable appending), you would add this entry to your application.properties file:
spring.jpa.properties.hibernate.hbm2ddl.schema-generation.script.append=false
Since Hibernate version 5.5.3 there is a new property available that can be set to disable the appending behaviour. From the Hibernate User Guide:
26.15. Automatic schema generation
hibernate.hbm2ddl.schema-generation.script.append (e.g. true (default value) or false)
For cases where the jakarta.persistence.schema-generation.scripts.action value indicates that schema commands should be written to DDL script file, hibernate.hbm2ddl.schema-generation.script.append specifies if schema commands should be appended to the end of the file rather than written at the beginning of the file. Values are true for appending schema commands to the end of the file, false for writing schema commands at the beginning of the file.
Short example using Gradle and Spring Boot.
Assuming you have a project property defining your environment, and "dev" is the one creating DDL files for Postgres.
Excerpt from application.yml:
spring:
  jpa:
    database-platform: org.hibernate.dialect.PostgreSQLDialect
    database: POSTGRESQL
    properties:
      javax.persistence.schema-generation.database.action: drop-and-create
      javax.persistence.schema-generation.scripts.action: drop-and-create
      javax.persistence.schema-generation.scripts.create-target: ./sql/create-postgres.sql
      javax.persistence.schema-generation.scripts.create-source: metadata
      javax.persistence.schema-generation.scripts.drop-target: ./sql/drop-postgres.sql
      javax.persistence.schema-generation.scripts.drop-source: metadata
Add some code in bootRun task to delete the files:
bootRun {
    def defaultProfile = 'devtest'
    def profile = project.hasProperty("env") ? project.getProperty("env") : defaultProfile
    if (profile == 'dev') {
        delete 'sql/create-postgres.sql'
        delete 'sql/drop-postgres.sql'
    }
    ...
}
I only ever use the schema file for testing and solved the problem by specifying the location of the output file as follows:
spring.jpa.properties.javax.persistence.schema-generation.scripts.create-target=target/test-classes/schema.sql
This works well for Maven but the target directory could be modified for a Gradle build.
I'm just getting started with Spring Batch. I'm working on a command line launcher using AWS Batch with a Docker image, and I'm trying to sort out the job instance naming.
Would it be acceptable to use an @Value for the literal string below in the job builder? Essentially I'm passing in S3 file keys, which will already be unique values, since I have a task that grabs the file before my FlatFileItemReader runs. The goal is to facilitate retries against the job when required due to a failure.
@Bean
public Job jobParametersJob() {
    return jobBuilderFactory.get("PassedInValue")
            .start(step1())
            .next(step2())
            .next(step3())
            .build();
}
I ended up solving this by using a job incrementer that implements JobParametersIncrementer. Note that in my case I'm not currently passing in any job parameters, so this sets them here; my parameters are currently just passed in via environment variables to the Docker container.
public class JobIncrementer implements JobParametersIncrementer {

    @Value("${s3filekey}")
    String s3filekey;

    @Override
    public JobParameters getNext(JobParameters jobParameters) {
        return new JobParametersBuilder().addString("s3filekey", s3filekey).toJobParameters();
    }
}

...then in job configuration...

@Bean
public Job jobParametersJob() {
    return jobBuilderFactory.get("jobParametersJob")
            .incrementer(jobIncrementer)
            .start(step1())
            .next(step2())
            .next(step3())
            .build();
}
How does Spring Batch Admin stop a running job from the UI?
In the Spring Batch Admin online documentation I have read the following lines:
"A job that is executing can be stopped by the user (whether or not it
is launchable). The stop signal is sent via the database and once
detected by Spring Batch in whatever process is running the job, the
job is stopped (status moves from STOPPING to STOPPED) and no further
processing takes place."
Does that mean the Spring Batch Admin UI directly changes the status of the job inside the Spring Batch tables?
UPDATE: I tried executing the below query on the running job:
update batch_job_execution set status="STOPPED" where job_instance_id=19;
The above query updates the row in the DB, but Spring Batch is not able to stop the running job.
If anybody has tried this, please do share the logic here.
You're confusing BatchStatus with ExitStatus.
What you are doing with that SQL is changing the STATUS to STOPPED.
When a job is running you can stop it from the code. In each step iteration, check the status, and if STOPPING is set, signal the step to stop.
Anyway, what you are doing is not elegant. The correct way is explained in Common Batch Patterns -> 11.2 Stopping a Job Manually for Business Reasons:
public class FooProcessor implements ItemProcessor<FooIn, FooOut> {

    public FooOut process(FooIn foo) throws Exception {
        if (sendToStop(foo)) {
            throw new MyStopException("I need to stop: " + foo);
        }
        // do my stuff
        return new FooOut(foo);
    }
}
Another simple way to stop a chunk step is to return null in the reader. This tells the step that there are no more elements to read:
public T read() throws Exception {
    T item = delegate.read();
    if (ifNeedStop(item)) {
        return null; // end the step here
    }
    return item;
}
I investigated the Spring Batch code.
It seems they update both the version and the status in BATCH_JOB_EXECUTION.
This works for me:
update batch_job_execution set status="STOPPED", version=version+1 where job_instance_id=19;
If you look into the jars of Spring Batch Admin, you can see that AbstractStep.java (a Spring Batch class) checks the status of the step and the job from the database.
Based on this status it validates the step before running it.
This works well for all cases except inside a chunk, since the next step is only reached after a large amount of processing. If you want to handle that, you can implement your own listener to check the status (but it will increase DB hits), as sketched below.
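For instance, a rough sketch of such a listener, checking the job's refreshed status via a JobExplorer before each chunk (the class itself and its wiring are assumptions, not code from Spring Batch Admin, and the listener signatures assume Spring Batch 3+):

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.scope.context.ChunkContext;

public class StopStatusChunkListener implements ChunkListener {

    private final JobExplorer jobExplorer;

    public StopStatusChunkListener(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    @Override
    public void beforeChunk(ChunkContext context) {
        StepExecution stepExecution = context.getStepContext().getStepExecution();
        // Re-read the job execution so a STOPPING status written directly to the DB is seen.
        JobExecution refreshed = jobExplorer.getJobExecution(stepExecution.getJobExecutionId());
        if (refreshed != null && refreshed.getStatus() == BatchStatus.STOPPING) {
            // Mark the step for termination; the framework stops it before the next chunk.
            stepExecution.setTerminateOnly();
        }
    }

    @Override
    public void afterChunk(ChunkContext context) {
    }

    @Override
    public void afterChunkError(ChunkContext context) {
    }
}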
Could anyone give some pointers on getting started with ESAPI in a non-web context?
I came up with this little test that validates a string with DefaultValidator.isValidCreditCard, but I got some web-container dependency errors.
The following method is consumed from a Junit Test:
@Override
public ValidationErrorList creditCard(String value) {
    this.value = value;
    ValidationErrorList errorList = new ValidationErrorList();
    try {
        isValid = validator.isValidCreditCard(null, value, false, errorList);
    } catch (Exception ie) {
        System.out.println(">>> CCValidator: [" + value + "] " + ie.getMessage());
    }
    return errorList;
}
This is the error that I get (relevant part); of course, I'm not running in a container:
Attempting to load ESAPI.properties via file I/O.
Attempting to load ESAPI.properties as resource file via file I/O.
Found in 'org.owasp.esapi.resources' directory: C:\foundation\validation\providers\esapi\ESAPI.properties
Loaded 'ESAPI.properties' properties file
Attempting to load validation.properties via file I/O.
Attempting to load validation.properties as resource file via file I/O.
Found in 'org.owasp.esapi.resources' directory: C:\foundation\validation\providers\esapi\validation.properties
Loaded 'validation.properties' properties file
SecurityConfiguration for Encoder.AllowMixedEncoding not found in ESAPI.properties. Using default: false
SecurityConfiguration for Encoder.AllowMixedEncoding not found in ESAPI.properties. Using default: false
javax/servlet/ServletRequest
java.lang.NoClassDefFoundError: javax/servlet/ServletRequest
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.owasp.esapi.util.ObjFactory.make(ObjFactory.java:74)
at org.owasp.esapi.ESAPI.httpUtilities(ESAPI.java:121)
at org.owasp.esapi.ESAPI.currentRequest(ESAPI.java:70)
at org.owasp.esapi.reference.Log4JLogger.log(Log4JLogger.java:434)
...
Calls to other ESAPI.xxx() methods also raise dependency errors.
Any advice on getting started will be appreciated.
Best,
jose
ESAPI has a servlet filter API that requires javax.servlet.ServletRequest to be on the classpath. ESAPI is owned by OWASP, the Open Web Application Security Project; therefore, ESAPI is designed with web applications in mind.
If you're not writing a web application, then it's either a console application or a rich client application. If you don't expect to use it to connect to the outside world, then the main secure practices you really need to worry about are ensuring that you always use safely parameterized queries, and that any data passed into your application from a source that IS connected to the outside world is properly escaped. For that, the only thing you need is OWASP's encoder project.
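For illustration, a minimal sketch of both practices (the table/column names and the HTML output context are just examples, and the encoder comes from OWASP's separate Java Encoder artifact, not ESAPI itself):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.owasp.encoder.Encode;

public class NonWebSecurityExamples {

    // Parameterized query: the user-supplied value is bound, never concatenated into the SQL string.
    static PreparedStatement findAccount(Connection connection, String cardNumber) throws SQLException {
        PreparedStatement ps = connection.prepareStatement(
                "SELECT id, owner FROM accounts WHERE card_number = ?");
        ps.setString(1, cardNumber);
        return ps;
    }

    // Output encoding with the OWASP Java Encoder when writing untrusted data into an HTML report.
    static String toHtml(String untrusted) {
        return Encode.forHtml(untrusted);
    }
}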