Use two different Elasticsearch clusters with Spring Data Elasticsearch

Motivation

In the Spring Data Elasticsearch issue tracker someone recently asked whether it is possible to access two different Elasticsearch clusters from an application that uses Spring Data Elasticsearch. The reason they needed this was that the application should perform a migration - I assume that data from one cluster should be transferred to a second one. In this blog post I will show how this can be set up. In this article I call the one cluster the default cluster and the other the secondary cluster.

Used versions and the sample code

The code was created using the Spring Initializr; the selected version of Spring Boot was 3.1.4. I added the “Spring Data Elasticsearch” (of course) and “Web” dependencies; the latter is not strictly needed here, but I always add it because I normally need it. The version of Spring Data Elasticsearch that this pulls in is 5.1.4.

The code for this example is available at Codeberg.

Prerequisites

I assume that you are familiar with Spring Data Elasticsearch and with configuring the connection to a cluster by providing a configuration class. If not, check the reference documentation. I am not using any Spring Boot autoconfiguration features or values taken from the environment or properties files.

What is needed

To achieve our goal we need

  • two different ElasticsearchOperations beans, one for each cluster
  • a mechanism to create ElasticsearchRepository beans that use the designated ElasticsearchOperations

Providing the ElasticsearchOperations beans

The default connection

For the connection to the default cluster we use the normal way of defining a @Configuration annotated class that derives from org.springframework.data.elasticsearch.client.elc.ElasticsearchConfiguration. We need to implement the method clientConfiguration(), here I show the minimal way to do this by just setting the host and port where the default cluster can be accessed.

ClusterConfiguration.java:


@Configuration(proxyBeanMethods = false)
public class ClusterConfiguration extends ElasticsearchConfiguration {

		@Override
		public ClientConfiguration clientConfiguration() {
				return ClientConfiguration.create("es-primary:9200");
		}

		@Bean
		@Primary
		public ElasticsearchOperations elasticsearchOperations(ElasticsearchConverter elasticsearchConverter, ElasticsearchClient elasticsearchClient) {
				return super.elasticsearchOperations(elasticsearchConverter, elasticsearchClient);
		}
}

The second method to override is the one providing the default ElasticsearchOperations bean. We add no logic here and just call the base class implementation, but we add the @Primary annotation so that whenever a bean of this type is requested without a different qualifier, this one is used.

The secondary connection

For this one we create a second class derived from ElasticsearchConfiguration and specify the necessary host and port like for the default one. But we do not add the @Configuration annotation to this class: if we did, the base class would create beans that conflict with the ones created by our default cluster configuration. We still need some of the logic from the base class, though.

SecondaryClusterConfiguration.java:

// NOTE: no @Configuration here!
public class SecondaryClusterConfiguration extends ElasticsearchConfiguration {

		@Override
		public ClientConfiguration clientConfiguration() {
				return ClientConfiguration.create("es-secondary:9200");
		}
}

When we have this class we add a new method to our configuration of the first cluster:

ClusterConfiguration.java:


@Configuration(proxyBeanMethods = false)
public class ClusterConfiguration extends ElasticsearchConfiguration {

		@Override
		public ClientConfiguration clientConfiguration() {
				return ClientConfiguration.create("es-primary:9200");
		}

		@Bean
		@Primary
		public ElasticsearchOperations elasticsearchOperations(ElasticsearchConverter elasticsearchConverter, ElasticsearchClient elasticsearchClient) {
				return super.elasticsearchOperations(elasticsearchConverter, elasticsearchClient);
		}

		@Bean
		@Qualifier("secondaryCluster")
		public ElasticsearchOperations secondaryCluster(ElasticsearchConverter elasticsearchConverter) {

				var elasticsearchConfiguration = new SecondaryClusterConfiguration();
				var clientConfiguration = elasticsearchConfiguration.clientConfiguration();
				var restClient = elasticsearchConfiguration.elasticsearchRestClient(clientConfiguration);
				var elasticsearchClient = elasticsearchClient(restClient);

				return elasticsearchConfiguration.elasticsearchOperations(elasticsearchConverter, elasticsearchClient);
		}
}

This new method provides the second ElasticsearchOperations bean that is qualified by the name “secondaryCluster”.

With this we already can use both ElasticsearchOperations to access the different clusters. Just inject them like this:

@Autowired
private ElasticsearchOperations primaryClusterOperations;

@Autowired
@Qualifier("secondaryCluster")
private ElasticsearchOperations secondaryClusterOperations;

Each one will use its own connection to the corresponding cluster.
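To illustrate the migration scenario from the motivation, a service could read from the default cluster and write everything to the secondary one. This is only a sketch under my assumptions - the MigrationService class is not part of the sample project, and I assume the Data entity from the project layout shown below:

```java
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.query.Query;
import org.springframework.stereotype.Service;

// Hypothetical service copying all Data documents from the default
// cluster to the secondary one; not part of the sample project.
@Service
public class MigrationService {

		private final ElasticsearchOperations primary;
		private final ElasticsearchOperations secondary;

		// constructor injection: the unqualified parameter receives the
		// @Primary bean, the qualified one the "secondaryCluster" bean
		public MigrationService(ElasticsearchOperations primary,
						@Qualifier("secondaryCluster") ElasticsearchOperations secondary) {
				this.primary = primary;
				this.secondary = secondary;
		}

		public void migrate() {
				// read all documents from the default cluster ...
				var hits = primary.search(Query.findAll(), Data.class);
				// ... and index each of them into the secondary cluster
				for (SearchHit<Data> hit : hits) {
						secondary.save(hit.getContent());
				}
		}
}
```

Note that a match-all query like this is only suitable for small indices; a real migration would probably page or scroll through the data.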

The repositories

Default cluster repositories

The implementations for Spring Data repositories are automatically created on application startup and by default use the bean with the name “elasticsearchTemplate”; that is the name that the org.springframework.data.elasticsearch.client.elc.ElasticsearchConfiguration class assigns to the bean it creates. We do not need any additional configuration here.
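A repository bound to the default cluster therefore needs nothing beyond a plain interface declaration; the PrimaryRepository from the sample project will look something like this (a sketch, not copied verbatim):

```java
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// picked up by the default repository scan, so it is backed by the
// @Primary "elasticsearchTemplate" bean - no extra configuration needed
public interface PrimaryRepository extends ElasticsearchRepository<Data, String> {
}
```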

Secondary cluster repositories

We now need a way to tell Spring Data Elasticsearch that some of our repositories need a different ElasticsearchOperations bean.

To achieve this, we create a new package - I named it secondarycluster - and in this package we put the repository interfaces and one configuration class. The following shows the layout in the sample project:

.
├── BlogSdeMultipleClustersApplication.java
├── ClusterConfiguration.java
├── Data.java
├── PrimaryRepository.java
├── SecondaryClusterConfiguration.java
├── package-info.java
└── secondarycluster
    ├── SecondaryRepository.java
    └── SecondaryRepositoryConfiguration.java

The repository is nothing special:

SecondaryRepository.java:

public interface SecondaryRepository extends ElasticsearchRepository<Data, String> {
}
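For completeness, the Data entity these repositories work on could look like the following minimal sketch - the id and message fields are my assumptions, the sample project may define different ones:

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// minimal entity sketch; the actual fields in the sample project may differ
@Document(indexName = "data")
public class Data {

		@Id
		private String id;

		@Field(type = FieldType.Text)
		private String message;

		// getters and setters omitted for brevity
}
```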

The important file in this package is

SecondaryRepositoryConfiguration.java:


@Configuration
@EnableElasticsearchRepositories(elasticsearchTemplateRef = "secondaryCluster")
public class SecondaryRepositoryConfiguration {
}

This configuration enables the repository scanning in this package and its sub-packages, but specifies the name of the ElasticsearchOperations bean to be used when instantiating these repository interfaces.
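With that in place, the secondary repository can be injected and used like any other Spring Data repository. A sketch (the SecondaryWriter component is made up for illustration; the package name is taken from the exclude filter shown below):

```java
import org.springframework.stereotype.Component;

import com.sothawo.blogsdemultipleclusters.secondarycluster.SecondaryRepository;

// Hypothetical component; every call on the injected repository
// goes through the "secondaryCluster" ElasticsearchOperations.
@Component
public class SecondaryWriter {

		private final SecondaryRepository repository;

		public SecondaryWriter(SecondaryRepository repository) {
				this.repository = repository;
		}

		public Data store(Data data) {
				// saved into the secondary cluster
				return repository.save(data);
		}
}
```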

What’s left to do is to exclude this package from the default repository scan. We achieve this by adding the following to our ClusterConfiguration class:


@Configuration(proxyBeanMethods = false)
@EnableElasticsearchRepositories(excludeFilters = {
				@ComponentScan.Filter(
								type = FilterType.REGEX,
								pattern = "com\\.sothawo\\.blogsdemultipleclusters\\.secondarycluster\\..*"
				)
})
public class ClusterConfiguration extends ElasticsearchConfiguration {
		// code shown above
}

We enable the repository scan that uses the default ElasticsearchOperations, but we exclude the package that contains the repositories that should use the “secondaryCluster” bean.

Summing it up

With just a few configuration classes and adaptations to the default setup we achieved our goal of accessing two different Elasticsearch clusters from one application. Check out the code from https://codeberg.org/sothawo/blog-sde-multiple-clusters ; you can give feedback by mail or on Mastodon.