Implement a rolling index strategy with Spring Data Elasticsearch 4.1

With the release of version 4.1, Spring Data Elasticsearch now supports Elasticsearch index templates. Index templates allow the user to define settings, mappings and aliases for indices that Elasticsearch creates automatically when documents are saved to an index that does not yet exist.

In this blog post I will show how index templates can be used in combination with Spring Data Repository customizations to implement a rolling index strategy where new indices will be created automatically based on the date.

You should be familiar with the basic concepts of Spring Data Repositories and the use of Spring Data Elasticsearch.

As the most popular use case for rolling indexes is storing log entries in Elasticsearch, we will do something similar. Our application will offer an HTTP endpoint where a client can POST a message; this message will be stored in an index named msg-HH-MM, where the index name contains the hour and the minute when the message was received. Normally the name would contain the date, but to be able to see this working without waiting days between inserts, we use a finer-grained naming scheme.

When the user issues a GET request with a search word, the application will search across all indices by using the alias name msg which we will set up as an alias for all the msg-* indices.

Basic setup

The program

The source code for this example is available on GitHub. This project was set up using start.spring.io, selecting a Spring Boot 2.4.0 application with web and spring-data-elasticsearch support and Java version 15.

Note: I make use of Java 15 features like var declarations of variables; these are not required by Spring Data Elasticsearch, you can still use Java 8 if you need to.

Elasticsearch

In order to run this example we need an Elasticsearch cluster. I use version 7.9.3, because that is the version that Spring Data Elasticsearch 4.1 (the version Spring Boot pulls in) is built with. I have downloaded Elasticsearch and have it running on my machine, accessible at http://localhost:9200. Please adjust the setup in the application configuration at src/main/resources/application.yml accordingly.
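For reference, here is a minimal sketch of what that configuration could look like, assuming the standard Spring Boot 2.4 property names and a local single-node cluster (the host and port are assumptions, adjust them to your setup):

```yaml
# src/main/resources/application.yml -- minimal sketch for a local cluster
spring:
  elasticsearch:
    rest:
      uris: http://localhost:9200
```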

Command line client

In order to access our program and to check what is stored in Elasticsearch I use httpie. An alternative would be curl.

The different parts of the application

The entity

The entity we use in this example looks like this:

@Document(indexName = "msg", createIndex = false)
public class Message {
    @Id private String id;

    @Field(type = FieldType.Text)
    private String message;

    @Field(type = FieldType.Date, format = DateFormat.date_hour_minute_second)
    private LocalDateTime timestamp = LocalDateTime.now();

    // getter/setter omitted here for brevity
}

Please note the following points:

  • the index name is set to msg; this will be the alias name that points to all the different indices that will be created. Spring Data Repository methods will use this name without further adaptation. That is fine for reading; we will set up the writing part later.
  • the createIndex argument of the @Document annotation is set to false. We don’t want the application to automatically create an index named msg, as Elasticsearch will create the indices automatically when documents are stored.
  • the properties are explicitly annotated with their types, so that the correct index mapping can be stored in the index template and later be applied automatically to a newly created index.

The index template

To initialize the index template, we use a Spring Component:

@Component
public class TemplateInitializer {

    private static final String TEMPLATE_NAME = "msg-template";
    private static final String TEMPLATE_PATTERN = "msg-*";
    
    private final ElasticsearchOperations operations;

    public TemplateInitializer(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    @Autowired
    public void setup() {

        var indexOps = operations.indexOps(Message.class);

        if (!indexOps.existsTemplate(TEMPLATE_NAME)) {

            var mapping = indexOps.createMapping();

            var aliasActions = new AliasActions().add(
                    new AliasAction.Add(AliasActionParameters.builderForTemplate()
                            .withAliases(indexOps.getIndexCoordinates().getIndexNames())
                            .build())
            );

            var request = PutTemplateRequest.builder(TEMPLATE_NAME, TEMPLATE_PATTERN)
                    .withMappings(mapping)
                    .withAliasActions(aliasActions)
                    .build();

            indexOps.putTemplate(request);
        }
    }
}

This bean class has a method setup() that is annotated with @Autowired. A method with this annotation is executed once when the beans in the Spring ApplicationContext have all been set up, so in the setup() method we can be sure that the injected ElasticsearchOperations instance has been set.

To work with the index templates we need an implementation of the IndexOperations interface which we get from the operations object. We then check if the index template already exists, as this initialization should only be done once.

If the index template does not exist, we first create the index mapping with indexOps.createMapping(). As the indexOps was bound to the Message class when we created it, the annotations from the Message class are used to create the mapping.

The next step is to create an AliasAction that will add an alias to an index when it is created. The name for the alias is retrieved from the Message class with indexOps.getIndexCoordinates().getIndexNames().

We then put the mapping and the alias action, together with a name for the template and the pattern that determines which new indices this template is applied to, into a PutTemplateRequest and send it off to Elasticsearch.
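To make the pattern part concrete: Elasticsearch applies the template to every newly created index whose name matches one of the template's index patterns. The following self-contained sketch imitates that wildcard matching (a simplification for illustration; the class and method names are mine, not Elasticsearch or Spring Data API):

```java
public class TemplatePatternDemo {

    // simplified imitation of Elasticsearch's index pattern matching:
    // a '*' in the pattern matches any (possibly empty) sequence of characters
    static boolean templateApplies(String pattern, String indexName) {
        String regex = pattern.replace("*", ".*"); // wildcard -> regex
        return indexName.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(templateApplies("msg-*", "msg-22-10")); // true
        System.out.println(templateApplies("msg-*", "messages"));  // false
    }
}
```

So any index Elasticsearch creates for a name like msg-22-10 will pick up the template's mapping and alias, while an unrelated index will not.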

The repository

The Spring Data Repository we use is pretty simple:

public interface MessageRepository extends ElasticsearchRepository<Message, String> {

    SearchHits<Message> searchAllBy();

    SearchHits<Message> searchByMessage(String text);
}

It extends ElasticsearchRepository and defines one method to retrieve all messages and a second one to search for text in a message.

The repository customization

We now need to customize the repository, as we want our own methods to be used when saving Message objects to the index; in these methods we will set the correct index name. We do this by defining a new interface CustomMessageRepository. As we want to redefine methods that are already defined in the CrudRepository interface (which our MessageRepository already extends), it is important that our methods have exactly the same signature as the methods from CrudRepository. This is the reason we need to make this interface generic:

public interface CustomMessageRepository<T> {

    <S extends T> S save(S entity);

    <S extends T> Iterable<S> saveAll(Iterable<S> entities);
}

We provide an implementation of this interface in the class CustomMessageRepositoryImpl. The class must have the same name as the interface, suffixed with Impl, so that Spring Data can pick up this implementation:

public class CustomMessageRepositoryImpl implements CustomMessageRepository<Message> {

    final private ElasticsearchOperations operations;

    public CustomMessageRepositoryImpl(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    @Override
    public <S extends Message> S save(S entity) {
        return operations.save(entity, indexName());
    }

    @Override
    public <S extends Message> Iterable<S> saveAll(Iterable<S> entities) {
        return operations.save(entities, indexName());
    }

    public IndexCoordinates indexName() {
        var indexName = "msg-" +
                LocalTime.now().truncatedTo(ChronoUnit.MINUTES).toString().replace(':', '-');
        return IndexCoordinates.of(indexName);
    }
}

We have an ElasticsearchOperations instance injected (no need to annotate this class with @Component; Spring Data detects it by its class name and does the injection). The index name is provided by the indexName() method, which uses the hour and minute of the current time to build an index name matching the pattern msg-HH-MM. A real-life scenario would probably use the date instead of the time, but as we want to test this with different entities and not wait a whole day between inserting them, this is fine for now.
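To illustrate the naming schemes, here is a self-contained sketch of both the minute-based name used in this example and the date-based variant a production setup would more likely use (the class and the daily pattern are illustrative, not part of the example project):

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class RollingIndexNames {

    // minute-based name as used in this example: msg-HH-MM
    static String minuteIndexName(LocalTime time) {
        return "msg-" + time.format(DateTimeFormatter.ofPattern("HH-mm"));
    }

    // daily rolling name for a real-life scenario: msg-yyyy-MM-dd
    static String dailyIndexName(LocalDate date) {
        return "msg-" + date.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
    }

    public static void main(String[] args) {
        System.out.println(minuteIndexName(LocalTime.of(22, 10, 58))); // msg-22-10
        System.out.println(dailyIndexName(LocalDate.of(2020, 11, 17))); // msg-2020-11-17
    }
}
```

Note that with pattern-based formatting, the truncation used in indexName() is not strictly needed, since seconds are simply not rendered by the pattern.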

In the implementations of our save methods, we call the save method of ElasticsearchOperations but provide our own index name, so that the one from the @Document annotation is not used.

A last step we need to do is to have our MessageRepository implement this new repository as well:

public interface MessageRepository extends ElasticsearchRepository<Message, String>, CustomMessageRepository<Message> {

    SearchHits<Message> searchAllBy();

    SearchHits<Message> searchAllByMessage(String text);
}

The controller

And of course we need something to test all this, so here we have a simple controller to store and retrieve messages:

@RestController
@RequestMapping("/messages")
public class MessageController {

    private final MessageRepository repository;

    public MessageController(MessageRepository repository) {
        this.repository = repository;
    }

    @PostMapping
    public Message add(@RequestBody Message message) {
        return repository.save(message);
    }

    @GetMapping
    public SearchHits<Message> messages() {
        return repository.searchAllBy();
    }

    @GetMapping("/{text}")
    public SearchHits<Message> messages(@PathVariable("text") String text) {
        return repository.searchAllByMessage(text);
    }
}

This is just a plain old Spring REST controller with nothing special.

Let’s see it in action

Now let’s start up the program and check what we have (remember, I use httpie as a client).

In the beginning there are no indices:

$ http :9200/_cat/indices
HTTP/1.1 200 OK
content-length: 0
content-type: text/plain; charset=UTF-8

We check out the templates:

$ http :9200/_template/msg-template
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 165
content-type: application/json; charset=UTF-8

{
    "msg-template": {
        "aliases": {
            "msg": {}
        },
        "index_patterns": [
            "msg-*"
        ],
        "mappings": {
            "properties": {
                "message": {
                    "type": "text"
                },
                "timestamp": {
                    "format": "date_hour_minute_second",
                    "type": "date"
                }
            }
        },
        "order": 0,
        "settings": {}
    }
}

The template definition with the mapping and alias definition is there. Now let’s add an entry:

$ http post :8080/messages message="this is the first message"
HTTP/1.1 200
Connection: keep-alive
Content-Type: application/json
Date: Tue, 17 Nov 2020 21:10:59 GMT
Keep-Alive: timeout=60
Transfer-Encoding: chunked

{
    "id": "TwYL2HUBIlu2470f4r6Y",
    "message": "this is the first message",
    "timestamp": "2020-11-17T22:10:58.541117"
}

We see that this message was persisted at 22:10. What about the indices?

$ http :9200/_cat/indices
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 83
content-type: text/plain; charset=UTF-8

yellow open msg-22-10 bFfnss5wR8CuLOmSfJPDDw 1 1 1 0 4.5kb 4.5kb

We have a new index named msg-22-10; let’s check its setup:

$ http :9200/msg-22-10
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 326
content-type: application/json; charset=UTF-8

{
    "msg-22-10": {
        "aliases": {
            "msg": {}
        },
        "mappings": {
            "properties": {
                "_class": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "message": {
                    "type": "text"
                },
                "timestamp": {
                    "format": "date_hour_minute_second",
                    "type": "date"
                }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1605647458601",
                "number_of_replicas": "1",
                "number_of_shards": "1",
                "provided_name": "msg-22-10",
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "uuid": "bFfnss5wR8CuLOmSfJPDDw",
                "version": {
                    "created": "7100099"
                }
            }
        }
    }
}

Let’s add another one:

$ http post :8080/messages message="this is the second message"                                           
HTTP/1.1 200
Connection: keep-alive
Content-Type: application/json
Date: Tue, 17 Nov 2020 21:13:52 GMT
Keep-Alive: timeout=60
Transfer-Encoding: chunked

{
    "id": "UAYO2HUBIlu2470fiL7G",
    "message": "this is the second message",
    "timestamp": "2020-11-17T22:13:52.336695"
}


$ http :9200/_cat/indices
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 112
content-type: text/plain; charset=UTF-8

yellow open msg-22-13 gvs12CQvTOmdvqsQz7k6yw 1 1 1 0 4.5kb 4.5kb
yellow open msg-22-10 bFfnss5wR8CuLOmSfJPDDw 1 1 1 0 4.5kb 4.5kb

So now we have two indices. Let’s get all the entries from our application:

$ http :8080/messages
HTTP/1.1 200
Connection: keep-alive
Content-Type: application/json
Date: Tue, 17 Nov 2020 21:15:57 GMT
Keep-Alive: timeout=60
Transfer-Encoding: chunked

{
    "aggregations": null,
    "empty": false,
    "maxScore": 1.0,
    "scrollId": null,
    "searchHits": [
        {
            "content": {
                "id": "TwYL2HUBIlu2470f4r6Y",
                "message": "this is the first message",
                "timestamp": "2020-11-17T22:10:58"
            },
            "highlightFields": {},
            "id": "TwYL2HUBIlu2470f4r6Y",
            "index": "msg-22-10",
            "innerHits": {},
            "nestedMetaData": null,
            "score": 1.0,
            "sortValues": []
        },
        {
            "content": {
                "id": "UAYO2HUBIlu2470fiL7G",
                "message": "this is the second message",
                "timestamp": "2020-11-17T22:13:52"
            },
            "highlightFields": {},
            "id": "UAYO2HUBIlu2470fiL7G",
            "index": "msg-22-13",
            "innerHits": {},
            "nestedMetaData": null,
            "score": 1.0,
            "sortValues": []
        }
    ],
    "totalHits": 2,
    "totalHitsRelation": "EQUAL_TO"
}

We get both entries. As we are returning SearchHits<Message>, we also get the information in which index each result was found; this is important if you want to edit one of these entries and store it again in its original index.

Let’s sum it up

We have defined and stored an index template that allows us to specify mappings and aliases for automatically created indices. We have set up our application to read from the alias and to write to a dynamically created index name, and so have implemented a rolling index pattern for our Elasticsearch storage, all from within Spring Data Elasticsearch.

I hope you enjoyed this example.

21 thoughts on “Implement a rolling index strategy with Spring Data Elasticsearch 4.1”

  1. Getting an error if I create more than one index and search for a document in it:

    "type":"illegal_argument_exception","reason":"Alias [msg] has more than one indices associated with it"

    • With the setup shown, you do not explicitly create indices. What are the index templates you use, and how do you store and retrieve the data?

  2. Hi,

    Thanks for sharing, a very good tutorial.
    I am looking to search across three different indices based on id.
    Example:
    two indices with simple true/false data and a third one with complex data containing a value and a rating.
    I want to search them based on id.
    Any examples would be a great help.

    thanks,

    • You’d need entities for each index and then search with one of the ElasticsearchOperations.search() methods, passing in the query you need and the IndexCoordinates for the index.

  3. Hello, thanks for the very neat and clear information.
    I did implement your pattern, but when I do a saveAll I get the following message.
    I use Elasticsearch 7.5.0 with Spring Data Elasticsearch 4.1.9.
    It seems that it expects the index to exist beforehand; I don’t see an actual creation of the index in your code.

    org.springframework.data.elasticsearch.BulkFailureException: Bulk operation has failures. Use ElasticsearchException.getFailedDocuments() for detailed messages [{null=[cns_nomenclature_laboratoires-au-11-02] ElasticsearchException[Elasticsearch exception [type=index_not_found_exception, reason=no such index [cns_nomenclature_laboratoires-au-11-02] and [action.auto_create_index] ([.monitoring*,.watches,.triggered_watches,.watcher-history*,.ml*,icd10*]) doesn't match]]}]

    • Of course there is no code in the application to create the index; that’s the whole point of using index templates. You would need a template that uses a pattern like "cns_nomenclature_laboratoires-au-*" – if "11-02" is the dynamic part of the name. That is done in my example in the TemplateInitializer class.

      If you have defined such a template, then the index will be created automatically with the correct settings and mapping.

  4. This is a very good article.
    It helped me a lot on my project to switch to a rolling index. Thank you.
    Because I can’t speak English, I am using an automatic translator to leave a comment on my troubleshooting.

    1. In my project, I also set the document settings when creating a template.
    I am using the Elasticsearch service of AWS, where the default number of primary shards is 5.
    Therefore, if there is no setting in the template, the shards of the dynamically created index are set to 5.
    Since this is not intended, I added the settings to the put template request.

    2. If no index matching the pattern exists initially, an error occurs when searching.
    So I create an index with the name appended with a '-default' postfix.
    This can affect query performance, but I chose it because it is more convenient than handling exceptions in each piece of query logic.

    Actually, this differs from the code I applied, but adding my changes to the existing code results in the following form:

    @Component
    public class TemplateInitializer {

        private static final String TEMPLATE_NAME = "msg-template";
        private static final String TEMPLATE_PATTERN = "msg-*";
        private static final String DEFAULT_INDEX_NAME = "msg-default"; // default index name

        private final ElasticsearchOperations operations;

        public TemplateInitializer(ElasticsearchOperations operations) {
            this.operations = operations;
        }

        @Autowired
        public void setup() {
            var indexOps = operations.indexOps(Message.class);
            if (!indexOps.existsTemplate(TEMPLATE_NAME)) {
                var mapping = indexOps.createMapping();
                var settings = indexOps.createSettings();
                var aliasActions = new AliasActions().add(
                        new AliasAction.Add(AliasActionParameters.builderForTemplate()
                                .withAliases(indexOps.getIndexCoordinates().getIndexNames())
                                .build())
                );
                var request = PutTemplateRequest.builder(TEMPLATE_NAME, TEMPLATE_PATTERN)
                        .withMappings(mapping)
                        .withSettings(settings) // add settings
                        .withAliasActions(aliasActions)
                        .build();
                indexOps.putTemplate(request);

                // create the default index
                IndexOperations defaultIndexOps = operations.indexOps(IndexCoordinates.of(DEFAULT_INDEX_NAME));
                if (!defaultIndexOps.exists()) {
                    defaultIndexOps.create(indexOps.createSettings());
                    defaultIndexOps.putMapping(indexOps.createMapping());
                }
            }
        }
    }

  5. Do you have an idea how to migrate indexes (evolution) when using this approach? We would need to update all indexes created from the template, as well as the template itself. Is there a way to use Spring Data Elasticsearch to manage the evolution of the indexes?

      • Change the entity once it’s in production (from v1 to v2). We use Flyway to handle changes in our JPA entities once they are in production, but I can’t find any equivalent for Spring Data Elasticsearch mappings. Let’s say I need to create a new attribute in the Message class (in your example), or change the datatype from:

        @Field(type = FieldType.Text)
        private String message;

        to:

        @Field(type = FieldType.Nested)
        private ComplexMessage message;

        This would require migration logic to use the existing messages and populate the nested object.

        • As for adding a new property: you should add it to the entity class and then use the IndexOperations.putMapping() method to write the new mapping for the class to the index. This would leave the existing documents as they are, and on retrieval the new property of the existing documents would be returned as null. Take care of this especially when using Kotlin.

          Changing the type of an existing property might be done with Spring Data Elasticsearch by using the update-by-query API with a matching script. Update by query will be available from version 4.2, which is planned for release mid-April 2021.

  6. Thanks for taking the time to write this example. It works very well. I have a couple of questions:
    1) The aliases field is empty. The example creates an index at each POST. When you wrote "We have set up our application to read from the alias", what did you mean, since I can’t see any alias and hence cannot read from them?

    2) Let’s say I want to implement the faking-it pattern described here: (https://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html). I would simply create an index per user by modifying the save and saveAll methods in CustomMessageRepositoryImpl:

    public IndexCoordinates indexName(String name) {
        var indexName = "owner-" + name.toLowerCase(Locale.ROOT);
        return IndexCoordinates.of(indexName);
    }

    • ad 1): The application reads from "msg"; this is specified in the @Document annotation. "msg" is set up as an alias in the index template for every index that is created and matches the pattern "msg-*". The example does not create an index at every POST; a new index is created by Elasticsearch only if the given index name does not yet exist. In the example the index name changes every minute, but normally that would be something like a daily changing pattern.

      ad 2): this would not work with templates, as a template contains the alias to use for a newly created index, whereas you’d need "owner-xxx" to be an alias to your index. You’d need to implement a check whether an index exists when writing and, if not, create it with the mapping and the corresponding alias. Have a look at https://www.sothawo.com/2020/08/use-an-index-name-defined-by-the-entity-to-store-data-in-spring-data-elasticsearch-4-0/ – in the implementation of CustomHotelRepositoryImpl.java I am doing something similar.

      • I used the wrong word (I meant alias instead of index), thanks for straightening this out. I am not looking at creating a separate index for each owner, since I sometimes need to search documents across all owners. So the faking-it pattern would work with your template approach. I have made the changes and it works as expected. Can I ask why we get the following result when getting the aliases (why the aliases attribute is empty)?
        GET {host}/msg/_alias

        {
            "msg-16-18": {
                "aliases": {
                    "msg": {}
                }
            },
            "msg-16-34": {
                "aliases": {
                    "msg": {}
                }
            },

        • I don’t know why these attributes are empty, because I don’t know how the aliases were created. Can it be that they were created before you had it working? When you completely delete the index, are they still empty on newly created aliases?

            • These are the indices from this demo? These are of course empty, because I just define "msg" as an alias for "msg-xx-yy", and that’s what is returned. They would not be empty if there were, for example, filters defined for the alias. I assumed you were talking about aliases in your application.

  7. This is very nice 🙂

    I was using a different approach: one main index and monthly indexes for the same document. Each month I was reindexing from the main index to the monthly indexes, and I had one alias for searching which was added to all indexes. Now I have switched to this feature in 4.1, but I had to do some workarounds.

    E.g. since the index is now created from the template when the first document is indexed, instead of on application startup like before, search and stream queries and methods (findBy*, stream*, count, …) fail with the default implementation because the index has not been created yet. So I had to extend `ElasticsearchRestTemplate` to apply `IndicesOptions` (IndicesOptions.lenientExpandOpen()) to those queries, and to override findById like this:

    @Override
    public Optional<T> findById(ID id) {
        NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.idsQuery().addIds(stringIdRepresentation(id)))
                .withPageable(PageRequest.of(0, 1))
                .build();

        SearchHits<T> hits = operations.search(nativeSearchQuery, getEntityClass(), operations.getIndexCoordinatesFor(getEntityClass()));
        Assert.isTrue(hits.getTotalHits() <= 1, "more than one result found");
        return Optional.ofNullable(hits.getTotalHits() > 0 ? hits.getSearchHit(0).getContent() : null);
    }

    and also to override ElasticsearchRestTemplate.searchForStream like this, because there is no index created yet either (EmptySearchHitsIterator is an empty implementation of SearchHitsIterator; I didn’t want to execute count() before searchForStream to avoid exceptions for the missing scroll id):

    @Override
    public <T> SearchHitsIterator<T> searchForStream(Query query, Class<T> clazz, IndexCoordinates index) {
        applyDefaultIndicesOptions(query);
        try {
            return super.searchForStream(query, clazz, index);
        } catch (IllegalArgumentException e) {
            log.debug("Returning empty iterator - {}", e.getMessage());
            return new EmptySearchHitsIterator<>();
        }
    }

    And now is all fully functional 🙂

    • Couldn’t you check on application startup if the write index exists, and if not use IndexOperations.create() and IndexOperations.putMapping() to set it up, before doing any search operations? Basically the same code that is executed in the SimpleElasticsearchRepository constructor when createIndex is true.

      • Sure I could, but then I would have empty open indexes if no document is indexed into the corresponding index.
        For some documents I use a monthly index, for others daily indexes. I try to keep them open as briefly as possible (depending on how long the data is needed by the application), so that searching on the corresponding alias expands to fewer indexes and, hopefully, my ES cluster performs better under heavy load than with my old implementation (this part is not tested yet; the old implementation was OK, but reindexing by query and deleting by query from the main index every month seemed like a bad idea with the traffic increase).
