Spring Batch Job with Parallel Steps

Parallel steps are one way to scale a batch job, and how you apply them depends on the application's business logic. We split the business logic into distinct responsibilities, and each resulting step can run in its own parallel flow. This tutorial shows how to configure parallel steps with Spring Batch.

Related Articles:
Spring Batch Partition for Scaling & Parallel Processing
How to start with Spring Batch using Spring Boot – Java Config

I. Technologies for the Spring Batch parallel steps tutorial

– Java 1.8
– Maven 3.3.9
– Spring Tool Suite – Version 3.8.1.RELEASE
– Spring Boot
– MySQL Database

II. Overview

1. Structure of project

[Figure: structure of the project]

2. Steps to do

– Create Spring Boot project
– Add dependencies
– Config Batch Job DataSource
– Create a Simple Tasklet Step
– Define Spring Batch Job with XML config
– Create a Job Launch
– Enable Batch Job
– Run & Check Result

III. Practices

1. Create Spring Boot project

Open Spring Tool Suite. On the main menu, choose File->New->Spring Starter Project and enter the project info. Then press Finish, and the Spring Boot project will be created.

2. Add dependencies

Add spring-boot-starter-batch, spring-boot-starter-web and mysql-connector-java to pom.xml:

<dependencies>
	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter</artifactId>
	</dependency>

	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-web</artifactId>
	</dependency>

	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-batch</artifactId>
	</dependency>

	<dependency>
		<groupId>mysql</groupId>
		<artifactId>mysql-connector-java</artifactId>
		<scope>runtime</scope>
	</dependency>

	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-test</artifactId>
		<scope>test</scope>
	</dependency>
</dependencies>

3. Config Batch Job DataSource

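Open application.properties and configure the data source for the batch job repository. Setting spring.batch.job.enabled=false stops Spring Boot from running the job automatically at startup, so the job runs only when we launch it ourselves.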


spring.datasource.url=jdbc:mysql://localhost:3306/testdb
spring.datasource.username=root
spring.datasource.password=12345
spring.batch.job.enabled=false

4. Create a Simple Tasklet Step

Create a SimpleStep class that implements the Tasklet interface.
It defines a workload method that simulates heavy processing by sleeping for 5 seconds.


package com.javasampleapproach.batch.parallelstep.step;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

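// A Tasklet that simulates a long-running unit of work
public class SimpleStep implements Tasklet {

	@Override
	public RepeatStatus execute(StepContribution contribution,
			ChunkContext chunkContext) throws Exception {
		workload();
		System.out.println("Done");
		// FINISHED tells Spring Batch not to call this tasklet again
		return RepeatStatus.FINISHED;
	}

	// Simulates heavy processing by sleeping for 5 seconds
	private void workload() throws Exception {
		Thread.sleep(5000);
	}

}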

5. Define Spring Batch Job with XML config

This is the main part of the tutorial.

[Figure: parallel steps as flows inside a split]

We define four steps: step_1, step_2, step_3 and step_4. The first three are designed for parallel processing: each runs in its own flow inside the split split_1.

When split_1 is done, meaning step_1, step_2 and step_3 have all finished, step_4 is processed.
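
The fork-then-join behaviour described above can be illustrated with a plain-Java analogy. This is only a sketch built on java.util.concurrent, not Spring Batch code: the class SplitFlowSketch and its methods are hypothetical names, and the sleep time is passed in as a parameter so the demo runs quickly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-Java analogy of split_1 followed by step_4 (not Spring Batch code).
class SplitFlowSketch {

    // Runs three "steps" concurrently, waits for all of them, then runs step_4.
    // Returns the step names in the order in which they finished.
    static List<String> runJob(long workMillis) {
        List<String> finished = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            CompletableFuture<?>[] flows = {
                CompletableFuture.runAsync(() -> step("step_1", workMillis, finished), pool),
                CompletableFuture.runAsync(() -> step("step_2", workMillis, finished), pool),
                CompletableFuture.runAsync(() -> step("step_3", workMillis, finished), pool)
            };
            // The "split" completes only when every flow has completed.
            CompletableFuture.allOf(flows).join();
            // step_4 runs afterwards, on the calling thread.
            step("step_4", workMillis, finished);
        } finally {
            pool.shutdown();
        }
        return finished;
    }

    // Simulates a workload by sleeping, then records that the step finished.
    static void step(String name, long workMillis, List<String> finished) {
        try {
            Thread.sleep(workMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        finished.add(name);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> order = runJob(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(order + " finished in about " + elapsedMs + " ms");
    }
}
```

The first three names may appear in any order, but step_4 always comes last. The total time is close to two workloads rather than four.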

Details:

<beans:beans xmlns="http://www.springframework.org/schema/batch"
	xmlns:beans="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/batch
           http://www.springframework.org/schema/batch/spring-batch-3.0.xsd">

	<job id="job">
		<split id="split_1" task-executor="taskExecutor" next="step_4">
			<flow>
				<step id="step_1">
					<tasklet ref="taskletStep_1"/>
				</step>
			</flow>
			<flow>
				<step id="step_2">
					<tasklet ref="taskletStep_2"/>
				</step>
			</flow>
			<flow>
				<step id="step_3">
					<tasklet ref="taskletStep_3"/>
				</step>
			</flow>
		</split>
		<step id="step_4">
			<tasklet ref="taskletStep_4"/>
		</step>
	</job>

	<beans:bean id="taskletStep_1" class="com.javasampleapproach.batch.parallelstep.step.SimpleStep" />
	<beans:bean id="taskletStep_2" class="com.javasampleapproach.batch.parallelstep.step.SimpleStep" />
	<beans:bean id="taskletStep_3" class="com.javasampleapproach.batch.parallelstep.step.SimpleStep" />
	<beans:bean id="taskletStep_4" class="com.javasampleapproach.batch.parallelstep.step.SimpleStep" />
	
	<beans:bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor" />

</beans:beans>

By default, a split uses a SyncTaskExecutor, which runs the flows one after another on the calling thread. Parallel processing requires an asynchronous TaskExecutor, so we set task-executor="taskExecutor" and define the bean as:

<beans:bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor" />
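
To see why the choice of executor matters, here is a small plain-Java analogy. It is a sketch that uses only java.util.concurrent.Executor, not the Spring classes themselves. The class ExecutorChoiceSketch and the thread name flow-worker are hypothetical. One executor runs the task on the caller's thread, as a synchronous executor does, and the other starts a new thread per task, which is how SimpleAsyncTaskExecutor behaves.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;

// Plain-Java analogy (not Spring code) of a synchronous vs. an asynchronous executor.
class ExecutorChoiceSketch {

    // Runs each task on the caller's thread, so tasks execute one after another.
    static Executor callerThread() {
        return Runnable::run;
    }

    // Starts a new thread for every task, so tasks can overlap in time.
    static Executor threadPerTask() {
        return task -> new Thread(task, "flow-worker").start();
    }

    // Submits one task and reports the name of the thread that ran it.
    static String threadUsedBy(Executor executor) {
        String[] name = new String[1];
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(() -> {
            name[0] = Thread.currentThread().getName();
            done.countDown();
        });
        try {
            done.await(); // wait until the task has recorded its thread name
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name[0];
    }

    public static void main(String[] args) {
        System.out.println("caller thread:  " + Thread.currentThread().getName());
        System.out.println("sync executor:  " + threadUsedBy(callerThread()));
        System.out.println("async executor: " + threadUsedBy(threadPerTask()));
    }
}
```

With the synchronous variant the task reports the caller's own thread, so nothing can run in parallel. With the thread-per-task variant it reports a separate worker thread.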

6. Create a Job Launch

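Use a controller to launch the batch job. The timestamp passed as the "time" job parameter makes every launch a new job instance, so the job can be run again after it has completed.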


package com.javasampleapproach.batch.parallelstep.controller;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class JobLauncherController {

	// One shared logger instead of creating a new one on every request
	private static final Logger logger = LoggerFactory.getLogger(JobLauncherController.class);

	@Autowired
	JobLauncher jobLauncher;

	@Autowired
	Job job;

	@RequestMapping("/launchjob")
	public String handle() throws Exception {

		try {
			// A fresh "time" value makes each request a new job instance
			JobParameters jobParameters = new JobParametersBuilder().addLong("time", System.currentTimeMillis())
					.toJobParameters();
			jobLauncher.run(job, jobParameters);
		} catch (Exception e) {
			// Log failures at error level, including the stack trace
			logger.error(e.getMessage(), e);
		}

		return "Done";
	}
}

7. Enable Batch Job

Annotate the main class of the Spring Boot application with @EnableBatchProcessing, and import the XML job definition (batchjob.xml) with @ImportResource.

Details:


package com.javasampleapproach.batch.parallelstep;

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ImportResource;

@SpringBootApplication
@EnableBatchProcessing
@ImportResource("classpath:batchjob.xml")
public class SpringBatchParallelStepApplication {

	public static void main(String[] args) {
		SpringApplication.run(SpringBatchParallelStepApplication.class, args);
	}
}

8. Run & Check Result

Build the project with Maven: clean install
Run the project in mode: Spring Boot App

Send a launch request:
http://localhost:8080/launchjob

Logs:


2016-10-13 09:27:00.868  INFO 3140 --- [cTaskExecutor-2] o.s.batch.core.job.SimpleStepHandler     : Executing step: [step_2]
2016-10-13 09:27:00.883  INFO 3140 --- [cTaskExecutor-1] o.s.batch.core.job.SimpleStepHandler     : Executing step: [step_1]
2016-10-13 09:27:00.883  INFO 3140 --- [cTaskExecutor-3] o.s.batch.core.job.SimpleStepHandler     : Executing step: [step_3]
Done
Done
Done
2016-10-13 09:27:06.015  INFO 3140 --- [nio-8080-exec-1] o.s.batch.core.job.SimpleStepHandler     : Executing step: [step_4]
Done
2016-10-13 09:27:11.059  INFO 3140 --- [nio-8080-exec-1] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=job]] completed with the following parameters: [{time=1476325617042}] and the following status: [COMPLETED]
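
The timestamps in the log above give a rough picture of the gain. The sketch below is a back-of-the-envelope check only. The class LogTimingSketch is a hypothetical name, the sequential figure is an estimate that assumes four steps of 5 seconds each, and the first timestamp is when a step starts executing rather than the exact start of the job.

```java
import java.time.Duration;
import java.time.LocalTime;

// Rough timing check based on the timestamps printed in the log.
class LogTimingSketch {

    // Milliseconds elapsed between two HH:mm:ss.SSS log timestamps.
    static long millisBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).toMillis();
    }

    public static void main(String[] args) {
        // step_1, step_2 and step_3 running in parallel inside split_1
        long splitMs = millisBetween("09:27:00.868", "09:27:06.015");
        // step_4 running alone after the split
        long step4Ms = millisBetween("09:27:06.015", "09:27:11.059");
        // Assumption: run one by one, four 5-second steps would take about 20 s
        long sequentialEstimateMs = 4 * 5000L;

        System.out.println("split_1 (3 steps in parallel): " + splitMs + " ms");
        System.out.println("step_4:                        " + step4Ms + " ms");
        System.out.println("total:                         " + (splitMs + step4Ms) + " ms");
        System.out.println("sequential estimate:           " + sequentialEstimateMs + " ms");
    }
}
```

The split takes about 5.1 seconds for all three steps together and step_4 about 5.0 seconds, so the job finishes in roughly 10.2 seconds instead of an estimated 20 seconds.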

IV. Source code

SpringBatchParallelStep



By grokonez | October 13, 2016.

Last updated on April 25, 2021.




3 thoughts on “Spring Batch Job with Parallel Steps”

  1. I really like this tutorial. I implemented a Spring Batch application as a POC, my client liked it, and we deployed it in production. The application uses only one thread.
    But we added more clients and are now getting a huge number of requests. We deployed the application in PCF and created 2 instances of it, but the problem is that both instances get the same data from the database while reading.
    I am using JdbcCursorItemReader. Can you please suggest a way to prevent duplicate reads across instances? The client wants horizontal scaling, without partitioning or threads.

    1. Hi,

      For your system, I recommend redesigning the architecture. You should put a queue in front of your jobs. Then the duplicate-read problem will be resolved
      and your system’s architecture will be clearer and stronger.

      For the queue implementation, you can choose ActiveMQ, RabbitMQ, or Kafka.

      – To get started with ActiveMQ, see:
      How to use Spring JMS with ActiveMQ – JMS Consumer and JMS Producer | Spring Boot

      – To get started with RabbitMQ, see:
      RabbitMq – How to create Spring RabbitMq Publish/Subcribe pattern with SpringBoot

      RabbitMQ – How to create Spring RabbitMQ Producer/Consumer applications with SpringBoot

      – To get started with Kafka, see:
      How to start Spring Apache Kafka Application with SpringBoot Auto-Configuration

      If you want to know more about the architecture, we can discuss it via email: contact@grokonez.com

      Regards,
      JSA
