Parallel steps are one way to scale a batch job, and they follow the application's business logic: we split the business logic into distinct responsibilities so that independent steps can be executed in parallel flows. This tutorial shows you how to configure parallel steps with Spring Batch.
Related Articles:
– Spring Batch Partition for Scaling & Parallel Processing
– How to start with Spring Batch using Spring Boot – Java Config
I. Technologies for the Spring Batch parallel step tutorial
– Java 1.8
– Maven 3.3.9
– Spring Tool Suite – Version 3.8.1.RELEASE
– Spring Boot
– MySQL Database
II. Overview
1. Structure of project
2. Steps to do
– Create Spring Boot project
– Add dependencies
– Config Batch Job DataSource
– Create a Simple Tasklet Step
– Define Spring Batch Job with XML config
– Create a Job Launch
– Enable Batch Job
– Run & Check Result
III. Practices
1. Create Spring Boot project
Open Spring Tool Suite and, on the main menu, choose File -> New -> Spring Starter Project, then enter the project info. Press Finish and the Spring Boot project will be created.
2. Add dependencies
Add spring-boot-starter-batch, spring-boot-starter-web and mysql-connector-java to pom.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-batch</artifactId>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>
```
3. Config Batch Job DataSource
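Open application.properties and configure spring.datasource for the batch job repository. Setting spring.batch.job.enabled=false stops Spring Boot from running the job automatically at startup, so the job runs only when it is launched through the controller.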
```properties
spring.datasource.url=jdbc:mysql://localhost:3306/testdb
spring.datasource.username=root
spring.datasource.password=12345
spring.batch.job.enabled=false
```
4. Create a Simple Tasklet Step
Create a SimpleStep class that implements the Tasklet interface.
Its workload() method simulates heavy processing by sleeping for 5 seconds.
```java
package com.javasampleapproach.batch.parallelstep.step;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class SimpleStep implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        workload();
        System.out.println("Done");
        // FINISHED tells Spring Batch not to call this tasklet again for the current step
        return RepeatStatus.FINISHED;
    }

    // Simulates heavy processing by sleeping for 5 seconds
    private void workload() throws Exception {
        Thread.sleep(5000);
    }
}
```
5. Define Spring Batch Job with XML config
This is the main part of the tutorial.
We define 4 steps: step_1, step_2, step_3 and step_4. step_1, step_2 and step_3 are designed for parallel processing as 3 flows inside the split split_1.
step_4 starts only after split_1 is done, that is, after step_1, step_2 and step_3 have all completed.
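To make the split/join semantics concrete, here is a plain-JDK analogy (not Spring Batch code, and not how Spring Batch implements a split internally): three sleeping tasks stand in for step_1 to step_3, ExecutorService.invokeAll plays the role of split_1 by blocking until every flow has finished, and step_4 runs only afterwards. The class name SplitDemo and the shortened sleep times are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitDemo {

    // Completion order of the steps; synchronized because the flows finish on different threads
    static final List<String> completed = Collections.synchronizedList(new ArrayList<String>());

    // Stand-in for SimpleStep: simulate the workload with a sleep, then record the step as done
    static Callable<Void> step(final String name, final long workMillis) {
        return () -> {
            Thread.sleep(workMillis);
            completed.add(name);
            return null;
        };
    }

    // Runs "split_1" (three flows in parallel), then "step_4"; returns the elapsed wall time in ms
    static long runJob(long workMillis) throws Exception {
        completed.clear();
        long start = System.nanoTime();
        ExecutorService executor = Executors.newFixedThreadPool(3);
        try {
            List<Callable<Void>> flows = Arrays.asList(
                    step("step_1", workMillis),
                    step("step_2", workMillis),
                    step("step_3", workMillis));
            // invokeAll blocks until every flow has finished: the "join" at the end of the split
            for (Future<Void> result : executor.invokeAll(flows)) {
                result.get(); // rethrows any exception thrown inside a flow
            }
        } finally {
            executor.shutdown();
        }
        // next="step_4": starts only after the whole split is done, on the calling thread
        step("step_4", workMillis).call();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        long elapsed = runJob(500);
        System.out.println(completed + " finished in about " + elapsed + " ms");
    }
}
```

With 500 ms per step, the run takes roughly two workloads (one for the split, one for step_4) rather than four, and step_4 is always the last entry in the completion list.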
Details:
```xml
<beans:beans xmlns="http://www.springframework.org/schema/batch"
    xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/batch
        http://www.springframework.org/schema/batch/spring-batch-3.0.xsd">

    <job id="job">
        <!-- step_1, step_2 and step_3 run in parallel, one flow each -->
        <split id="split_1" task-executor="taskExecutor" next="step_4">
            <flow>
                <step id="step_1">
                    <tasklet ref="taskletStep_1"/>
                </step>
            </flow>
            <flow>
                <step id="step_2">
                    <tasklet ref="taskletStep_2"/>
                </step>
            </flow>
            <flow>
                <step id="step_3">
                    <tasklet ref="taskletStep_3"/>
                </step>
            </flow>
        </split>
        <!-- runs after all flows of split_1 have completed -->
        <step id="step_4">
            <tasklet ref="taskletStep_4"/>
        </step>
    </job>

    <beans:bean id="taskletStep_1" class="com.javasampleapproach.batch.parallelstep.step.SimpleStep" />
    <beans:bean id="taskletStep_2" class="com.javasampleapproach.batch.parallelstep.step.SimpleStep" />
    <beans:bean id="taskletStep_3" class="com.javasampleapproach.batch.parallelstep.step.SimpleStep" />
    <beans:bean id="taskletStep_4" class="com.javasampleapproach.batch.parallelstep.step.SimpleStep" />

    <beans:bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor" />
</beans:beans>
```
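If you prefer Java config over XML, the same job can be sketched with Spring Batch's builder API. This assumes Spring Batch 3.x/4.x, where @EnableBatchProcessing provides JobBuilderFactory and StepBuilderFactory (both are deprecated in Spring Batch 5); the class name ParallelStepJobConfig and its package are hypothetical. It is an alternative to batchjob.xml, not something to load alongside it.

```java
package com.javasampleapproach.batch.parallelstep.config; // hypothetical package

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;

import com.javasampleapproach.batch.parallelstep.step.SimpleStep;

@Configuration
public class ParallelStepJobConfig {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    // Same role as <beans:bean id="taskExecutor" .../> in the XML
    @Bean
    public TaskExecutor taskExecutor() {
        return new SimpleAsyncTaskExecutor();
    }

    // One single-step flow per parallel branch, like each <flow> element in the XML
    private Flow flow(String stepName) {
        Step step = steps.get(stepName).tasklet(new SimpleStep()).build();
        return new FlowBuilder<SimpleFlow>(stepName + "_flow").start(step).build();
    }

    @Bean
    public Job job() {
        // <split id="split_1" task-executor="taskExecutor">
        Flow split1 = new FlowBuilder<SimpleFlow>("split_1")
                .split(taskExecutor())
                .add(flow("step_1"), flow("step_2"), flow("step_3"))
                .build();
        // next="step_4": runs after all three flows have completed
        Step step4 = steps.get("step_4").tasklet(new SimpleStep()).build();
        return jobs.get("job")
                .start(split1)
                .next(step4)
                .build()   // FlowJobBuilder
                .build();  // Job
    }
}
```

If you use this class, remove @ImportResource("classpath:batchjob.xml") from the main class so that only one job bean exists. SimpleAsyncTaskExecutor starts a new thread for every task; a ThreadPoolTaskExecutor (org.springframework.scheduling.concurrent) would instead reuse a bounded set of threads.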
By default, a split uses a SyncTaskExecutor, which executes the flows one after another on the calling thread. To run the flows in parallel, the split needs an asynchronous TaskExecutor, so we set task-executor="taskExecutor" and define:
```xml
<beans:bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor" />
```
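The difference between the two executors can be seen without Spring at all. In this plain-JDK sketch, the SYNC executor mimics SyncTaskExecutor (run the task in the caller's thread) and the ASYNC executor mimics SimpleAsyncTaskExecutor (start a new thread per task); both are analogies written as java.util.concurrent.Executor lambdas, not the Spring classes themselves, and ExecutorDemo is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;

public class ExecutorDemo {

    // Analogue of SyncTaskExecutor: runs each task in the caller's thread, one after another
    static final Executor SYNC = Runnable::run;

    // Analogue of SimpleAsyncTaskExecutor: starts a new thread for every task
    static final Executor ASYNC = task -> new Thread(task).start();

    // Submits three "flows" that each sleep workMillis, waits for all of them, returns elapsed ms
    static long runThreeFlows(Executor executor, long workMillis) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(3);
        long start = System.nanoTime();
        for (int i = 0; i < 3; i++) {
            executor.execute(() -> {
                try {
                    Thread.sleep(workMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await(); // wait for all three flows, like the end of a split
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sync:  " + runThreeFlows(SYNC, 500) + " ms");
        System.out.println("async: " + runThreeFlows(ASYNC, 500) + " ms");
    }
}
```

With the synchronous executor the three flows take about three workloads in total; with the asynchronous one they overlap and take about one.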
6. Create a Job Launch
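Use a controller to launch the batch job. Each request adds a new time parameter, so every launch creates a new JobInstance instead of trying to rerun one that has already completed.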
```java
package com.javasampleapproach.batch.parallelstep.controller;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class JobLauncherController {

    private static final Logger logger = LoggerFactory.getLogger(JobLauncherController.class);

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    Job job;

    @RequestMapping("/launchjob")
    public String handle() throws Exception {
        try {
            // A unique "time" parameter makes every request start a new JobInstance
            JobParameters jobParameters = new JobParametersBuilder()
                    .addLong("time", System.currentTimeMillis())
                    .toJobParameters();
            jobLauncher.run(job, jobParameters);
        } catch (Exception e) {
            logger.error("Job launch failed", e);
        }
        return "Done";
    }
}
```
7. Enable Batch Job
Annotate the Spring Boot main class with @EnableBatchProcessing, and load the XML job definition with @ImportResource("classpath:batchjob.xml").
Detail:
```java
package com.javasampleapproach.batch.parallelstep;

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ImportResource;

@SpringBootApplication
@EnableBatchProcessing
@ImportResource("classpath:batchjob.xml")
public class SpringBatchParallelStepApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringBatchParallelStepApplication.class, args);
    }
}
```
8. Run & Check Result
Build the project with Maven: clean install
Run the project as a Spring Boot App
Send a launch request:
http://localhost:8080/launchjob
Logs
```
2016-10-13 09:27:00.868  INFO 3140 --- [cTaskExecutor-2] o.s.batch.core.job.SimpleStepHandler     : Executing step: [step_2]
2016-10-13 09:27:00.883  INFO 3140 --- [cTaskExecutor-1] o.s.batch.core.job.SimpleStepHandler     : Executing step: [step_1]
2016-10-13 09:27:00.883  INFO 3140 --- [cTaskExecutor-3] o.s.batch.core.job.SimpleStepHandler     : Executing step: [step_3]
Done
Done
Done
2016-10-13 09:27:06.015  INFO 3140 --- [nio-8080-exec-1] o.s.batch.core.job.SimpleStepHandler     : Executing step: [step_4]
Done
2016-10-13 09:27:11.059  INFO 3140 --- [nio-8080-exec-1] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=job]] completed with the following parameters: [{time=1476325617042}] and the following status: [COMPLETED]
```
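The timestamps in the log confirm the parallelism: step_1, step_2 and step_3 start within about 15 ms of each other on three different threads (cTaskExecutor-1 to -3, which appear to be SimpleAsyncTaskExecutor thread names truncated by the log pattern), and step_4 starts on the request thread nio-8080-exec-1 afterwards. The small program below only does the arithmetic with java.time; the timestamps are copied from the log and LogTiming is a hypothetical name.

```java
import java.time.Duration;
import java.time.LocalTime;

public class LogTiming {

    // Milliseconds between two HH:mm:ss.SSS timestamps taken from the same log
    static long millisBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).toMillis();
    }

    public static void main(String[] args) {
        String splitStart = "09:27:00.868"; // first "Executing step: [step_2]"
        String step4Start = "09:27:06.015"; // "Executing step: [step_4]"
        String jobEnd = "09:27:11.059";     // "Job: [FlowJob: [name=job]] completed"

        System.out.println("split_1:   " + millisBetween(splitStart, step4Start) + " ms"); // 5147 ms
        System.out.println("step_4:    " + millisBetween(step4Start, jobEnd) + " ms");     // 5044 ms
        System.out.println("whole job: " + millisBetween(splitStart, jobEnd) + " ms");     // 10191 ms
    }
}
```

So split_1 costs about one 5 s workload instead of three, and the whole job takes about 10 s instead of the 20 s or more that four sequential 5 s workloads would need.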
IV. Source code
Last updated on June 4, 2017.
I really like this tutorial. I implemented a Spring Batch application as a POC, my client liked it, and we deployed it to production. The application uses only one thread.
But we have added more clients and are now getting a huge number of requests. We deployed the application on PCF and created 2 instances of the Spring Batch application, but the problem is that both instances get the same data from the database when reading it.
I am using JdbcCursorItemReader. Can you please suggest a way to prevent duplicate reads across instances? The client wants horizontal scaling, without partitioning or threads.
Hi,
For your system, I recommend redesigning the architecture. You should put a queue in front of your jobs. Then the duplication problem will be resolved,
and your system's architecture will be clearer and more robust.
For the queue implementation, you can choose ActiveMQ, RabbitMQ, or Kafka.
– To get started with ActiveMQ, you can refer to:
How to use Spring JMS with ActiveMQ – JMS Consumer and JMS Producer | Spring Boot
– To get started with RabbitMQ, you can refer to:
RabbitMq – How to create Spring RabbitMq Publish/Subcribe pattern with SpringBoot
RabbitMQ – How to create Spring RabbitMQ Producer/Consumer applications with SpringBoot
– To get started with Kafka, you can refer to:
How to start Spring Apache Kafka Application with SpringBoot Auto-Configuration
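A queue removes the duplicate reads because it hands each message to only one of the consumers competing for it. The sketch below is an in-process analogy using java.util.concurrent.LinkedBlockingQueue: the threads stand in for the PCF instances, and the hypothetical POISON marker exists only to stop the demo. With a real broker the same behaviour comes from an ActiveMQ or RabbitMQ queue with several consumers, or a Kafka consumer group; broker setup is not shown, and the sketch assumes each record is published to the queue exactly once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CompetingConsumers {

    static final String POISON = "__END__"; // hypothetical end-of-stream marker for this demo

    // Puts every record on one shared queue, then lets `instances` consumers drain it.
    // Returns all processed records, across all consumers.
    static List<String> process(List<String> records, int instances) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(records);
        for (int i = 0; i < instances; i++) {
            queue.add(POISON); // one stop marker per consumer, queued after all records
        }
        List<String> processed = Collections.synchronizedList(new ArrayList<String>());
        List<Thread> consumers = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            Thread t = new Thread(() -> {
                try {
                    String record;
                    // take() removes the element, so no two consumers ever get the same record
                    while (!(record = queue.take()).equals(POISON)) {
                        processed.add(record);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "instance-" + i);
            consumers.add(t);
            t.start();
        }
        for (Thread t : consumers) {
            t.join();
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> records = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            records.add("row-" + i);
        }
        List<String> processed = process(records, 2);
        // prints "1000 processed, 1000 distinct"
        System.out.println(processed.size() + " processed, " + new HashSet<>(processed).size() + " distinct");
    }
}
```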
If you want to know more about the architecture, we can discuss it further via email: contact@grokonez.com
Regards,
JSA