Reading different entities from multiple indices with one call using Spring Data Elasticsearch

The problem

In Elasticsearch (the the current version at the time of writing this post is 7.12.1) every index holds exactly one type of data. This type is defined by the index mapping which describes the fields of the index and their types.

Spring Data Elasticsearch (the current version is 4.2) automatically maps between a Java class – the entity type – with its properties and the data in the Elasticsearch index. This means that when searching, the entity type must be specified. In Spring Data Repositories this is the type specified in the repository definition, when using the ElasticsearchOperations interface, the search function required a parameter specifying the class to return.

But Elasticsearch allows it to do a search query across different indices by specifying multiple indices or wildcard indices (with an index pattern like for example http://localhost:9200/index-a,other-ones-*/_search), it will return a list of JSON objects which contains data of the different types. These returned types cannot be automatically be mapped by Spring Data Elasticsearch.

In this post I will show how you can implement this behaviour in your application if needed.

Tools and code

The whole code for this sample project is available on GitHub at https://github.com/sothawo/blog-sde-multiple-entities. The project was created using the Spring initializr to create a Java 16 maven project, the only added dependencies are web and spring-data-elasticsearch. When showing code to access a REST interface – either Elasticsearch or the running application – I use httpie.

The sample scenario

We want to store information about books and additional log entries when a book is inserted or updated. For the books we will have an index named blog-sde-books and for the log entries we use multiple indices named blog-sde-log-yyyy-mm-dd where the actual date is used, meaning that after a couple of days we will have more than one index for log entries. When storing data in the log entries, we will add the id of the corresponding book to the log entry.

We then want to search with a book id in the books index and in all the log indices, but we only want to issue one call to Elasticsearch and retrieve all the values in one step.

The code

The entities

In the example I use two entities to store data in Elasticsearch. The first index is used to store information about books, the entity looks like this:

@Document(indexName = "blog-sde-books")
public class Book {
    @Id private final String id;
    @Field(type = FieldType.Text) private final String author;
    @Field(type = FieldType.Text) private final String title;
    @Field(type = FieldType.Keyword) private final String isbn;

    public Book(String id, String author, String title, String isbn) {
        this.id = id;
        this.author = author;
        this.title = title;
        this.isbn = isbn;
    }

    // getter omitted here
}

This is a standard Spring Data Elasticsearch entity definition. The LogEntry class is pretty simple as well, it just has an additional constructor for easier use and has the constructor that should be used by Spring Data Elasticsearch annotated with @PersistenceConstructor:

@Document(indexName = "blog-sde-log-#{T(java.time.LocalDate).now().toString()}", createIndex = false)
public class LogEntry {
    @Id private final String id;
    @Field(type = FieldType.Keyword) private final String bookId;
    @Field(type = FieldType.Date, format = DateFormat.epoch_millis) private final Instant timestamp;
    @Field(type = FieldType.Text) private final String message;

    @PersistenceConstructor
    public LogEntry(String id, String bookId, Instant timestamp, String message) {
        if (timestamp == null) {
            timestamp = Instant.now();
        }
        this.id = id;
        this.bookId = bookId;
        this.timestamp = timestamp;
        this.message = message;
    }
    
    public LogEntry(String bookId, String message) {
        this(null, bookId, null, message);
    }
    // getter omitted here
}

Note that the createIndex parameter in the @Document annotation is set to false. We need to define an index template so that the index mapping will be assigned automatically to newly created instances when LogEntry records are written on a new day. The indexName is set to a SpEL expression that evaluates the current date and adds it to the name of the index.

The application class

We do this template creation in the application class in a method that is triggered by an ApplicationReadyEvent:

@SpringBootApplication
public class BlogSdeMultipleEntitiesApplication {

    private final ElasticsearchOperations operations;

    public BlogSdeMultipleEntitiesApplication(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    public static void main(String[] args) {
        SpringApplication.run(BlogSdeMultipleEntitiesApplication.class, args);
    }

    @EventListener(ApplicationReadyEvent.class)
    public void initIndexTemplates() {
        var indexOperations = operations.indexOps(LogEntry.class);
        var putTemplateRequest = PutTemplateRequest.builder("blog-sde-log-template", "blog-sde-log-*")
            .withMappings(indexOperations.createMapping())
            .build();
        indexOperations.putTemplate(putTemplateRequest);
    }
}

The repositories

The repository interfaces are pretty straight forward an minimalistic:

public interface BookRepository extends ElasticsearchRepository<Book, String> {
}

public interface LogEntryRepository extends ElasticsearchRepository<LogEntry, String> {
}

The BookController

We add a controller class to be able to store and retrieve books:

@RestController
@RequestMapping("/books")
public class BookController {

    private final BookRepository bookRepository;
    private final LogEntryRepository logEntryRepository;

    public BookController(BookRepository bookRepository, LogEntryRepository logEntryRepository) {
        this.bookRepository = bookRepository;
        this.logEntryRepository = logEntryRepository;
    }

    @PostMapping
    public Book post(@RequestBody Book book) {
        var savedBook = bookRepository.save(book);
        logEntryRepository.save(
            new LogEntry(savedBook.getId(), "saved book with ISBN: " + savedBook.getIsbn())
        );
        return savedBook;
    }

    @GetMapping("/{id}")
    public Book byId(@PathVariable String id) {
        return bookRepository.findById(id).orElseThrow(ResourceNotFoundException::new);
    }
}

When saving a book, the entity is stored in Elasticsearch, after that it’s id – which is assigned by Elasticsearch – is used to create the LogEntry. Check back at the LogEntry definition that the constructor we use here for the LogEntry sets the bookId property, not the id property of the LogEntry. The LogEntry is saved as well before the saved Book entity is sent back to the caller.

Startup and storing a book

After application startup, we have the index for the books and the template for the log entries (remember I use httpie as command line client):

$ http :9200/_cat/indices v==
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 162
content-type: text/plain; charset=UTF-8

health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   blog-sde-books hgtw0geyTA-UCzhlO41edg   1   1          0            0       208b           208b


$ http :9200/_template/blog-sde-log-template
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 178
content-type: application/json; charset=UTF-8

{
    "blog-sde-log-template": {
        "aliases": {},
        "index_patterns": [
            "blog-sde-log-*"
        ],
        "mappings": {
            "properties": {
                "bookId": {
                    "type": "keyword"
                },
                "message": {
                    "type": "text"
                },
                "timestamp": {
                    "format": "epoch_millis",
                    "type": "date"
                }
            }
        },
        "order": 0,
        "settings": {}
    }
}

Now lets store a book and after that check whats in the indices:

$ http post :8080/books author="Josh Long" title="Reactive Spring" isbn="1732910413"
HTTP/1.1 200
Connection: keep-alive
Content-Type: application/json
Date: Sun, 09 May 2021 17:00:21 GMT
Keep-Alive: timeout=60
Transfer-Encoding: chunked

{
    "author": "Josh Long",
    "id": "fhwSUnkBOIbMW1uMlwuZ",
    "isbn": "1732910413",
    "title": "Reactive Spring"
}


$ http :9200/_cat/indices v==
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 210
content-type: text/plain; charset=UTF-8

health status index                   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   blog-sde-log-2021-05-09 Cs0A2aNmSKupZa95ujEMwg   1   1          1            0      5.1kb          5.1kb
yellow open   blog-sde-books          tEJp962YS2u_Xs_3Fc03qw   1   1          1            0      4.8kb          4.8kb


$ http :9200/blog-sde-books/_search
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 286
content-type: application/json; charset=UTF-8

{
    "_shards": {
        "failed": 0,
        "skipped": 0,
        "successful": 1,
        "total": 1
    },
    "hits": {
        "hits": [
            {
                "_id": "fhwSUnkBOIbMW1uMlwuZ",
                "_index": "blog-sde-books",
                "_score": 1.0,
                "_source": {
                    "_class": "com.sothawo.blogsdemultipleentities.Book",
                    "author": "Josh Long",
                    "isbn": "1732910413",
                    "title": "Reactive Spring"
                },
                "_type": "_doc"
            }
        ],
        "max_score": 1.0,
        "total": {
            "relation": "eq",
            "value": 1
        }
    },
    "timed_out": false,
    "took": 1
}


$ http :9200/blog-sde-log-2021-05-09/_search
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 322
content-type: application/json; charset=UTF-8

{
    "_shards": {
        "failed": 0,
        "skipped": 0,
        "successful": 1,
        "total": 1
    },
    "hits": {
        "hits": [
            {
                "_id": "fxwSUnkBOIbMW1uMmgsh",
                "_index": "blog-sde-log-2021-05-09",
                "_score": 1.0,
                "_source": {
                    "_class": "com.sothawo.blogsdemultipleentities.LogEntry",
                    "bookId": "fhwSUnkBOIbMW1uMlwuZ",
                    "message": "saved book with ISBN: 1732910413",
                    "timestamp": "1620579621129.587"
                },
                "_type": "_doc"
            }
        ],
        "max_score": 1.0,
        "total": {
            "relation": "eq",
            "value": 1
        }
    },
    "timed_out": false,
    "took": 4
}

So both entities are in their corresponding index.

Getting to the original problem

After all this setup we now get to the stuff this blog post is all about. Let me first show the call to the application and what it returns before showing how this is done:

$ http :8080/admin/fhwSUnkBOIbMW1uMlwuZ
HTTP/1.1 200
Connection: keep-alive
Content-Type: application/json
Date: Sun, 09 May 2021 17:28:33 GMT
Keep-Alive: timeout=60
Transfer-Encoding: chunked

{
    "books": [
        {
            "content": {
                "author": "Josh Long",
                "id": "fhwSUnkBOIbMW1uMlwuZ",
                "isbn": "1732910413",
                "title": "Reactive Spring"
            },
            "highlightFields": {},
            "id": "fhwSUnkBOIbMW1uMlwuZ",
            "index": "blog-sde-books",
            "innerHits": {},
            "nestedMetaData": null,
            "score": 1.0,
            "sortValues": []
        }
    ],
    "logEntries": [
        {
            "content": {
                "bookId": "fhwSUnkBOIbMW1uMlwuZ",
                "id": "fxwSUnkBOIbMW1uMmgsh",
                "message": "saved book with ISBN: 1732910413",
                "timestamp": "2021-05-09T17:00:21.129587Z"
            },
            "highlightFields": {},
            "id": "fxwSUnkBOIbMW1uMmgsh",
            "index": "blog-sde-log-2021-05-09",
            "innerHits": {},
            "nestedMetaData": null,
            "score": 0.2876821,
            "sortValues": []
        }
    ]
}

We are calling an admin endpoint with the id of the book and are getting back the book and log search hits for this book id. Let’s hav a look at the AdminController:

@RestController
@RequestMapping("/admin")
public class AdminController {

    private final ElasticsearchOperations operations;

    public AdminController(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    @GetMapping("/{id}")
    public AdminData byId(@PathVariable String id) {

        var query = new NativeSearchQueryBuilder()
            .withQuery(queryStringQuery("_id:" + id + " OR bookId:" + id))
            .build();

        var converter = operations.getElasticsearchConverter();
        List<SearchHit<Book>> books = new ArrayList<>();
        List<SearchHit<LogEntry>> logEntries = new ArrayList<>();

        SearchHits<AllDocuments> searchHits = operations.search(query, AllDocuments.class, IndexCoordinates.of("blog-sde-*"));
        searchHits.forEach(searchHit -> {

            var indexName = searchHit.getIndex();
            if (indexName != null) {
                var document = Document.from(searchHit.getContent());
                if (searchHit.getId() != null) {
                    document.setId(searchHit.getId());
                }

                if (indexName.equals("blog-sde-books")) {
                    var book = converter.read(Book.class, document);
                    books.add(searchHit(book, searchHit));
                } else if (indexName.startsWith("blog-sde-log-")) {
                    var logEntry = converter.read(LogEntry.class, document);
                    logEntries.add(searchHit(logEntry, searchHit));
                }
            }
        });

        return new AdminData(books, logEntries);
    }

    private <T> SearchHit<T> searchHit(T content, SearchHit<?> source) {
        return new SearchHit<T>(source.getIndex(),
            source.getId(),
            source.getScore(),
            source.getSortValues().toArray(new Object[0]),
            source.getHighlightFields(),
            source.getInnerHits(),
            source.getNestedMetaData(),
            content);
    }
}

In the byId(String id) method we first build a query (line 14) searching for the given id in either the _id field of an Elasticsearch document (that will return books) or in the bookId field of a document (that will return log entries).

In the lines 18 to 20 we retrieve the converter from the ElasticsearchOperations, which we need to create our real entities and set up the lists for the results.

In line 22 is the single call to Elasticsearch which issues the query against all indices matching the given pattern, which in our case will be all log entry indices and the book index. As Spring Data Elasticsearch needs an entity class for this, we use a helper class called AllDocuments which is just an implementation of a Map<String, Object> and so can retrieve any JSON returned by Elasticsearch, whatever type it may be. I’ll show this class further down.

We then loop over the returned SearchHit instances. For each we get the name of the index where the document was found (line 25). We then convert the AllDocument instance into a Spring Data Elasticsearch Document which the converter needs as input and add the id of the found hit into this Document.

Now we need to determine into which of our entities we want to convert that document. We check the index name and then call the converter.read() method with the appropriate class parameter, and then store the entity in a new SearchHit in the result list in which it belongs.

One could argue that the entity could be automatically determined by checking the index name against the name provided in the @Document annotations. This is true for probably the most cases, but when the index name changes like it does here with the SpEL provided name, this does not work anymore. So we need that custom code in the application.

The missing classes

There are two classes I have not shown. The first is the AllDocuments class, this is a class implementing Map<String, Object> that just delegates all the methods it must implement to an internal LinkedHashMap instance, I just show the first delegating method, have a look at the GitHub project to see the full implementation.

class AllDocuments implements Map<String, Object> {

    private final Map<String, Object> delegate = new LinkedHashMap<>();

    @Override
    public int size() {
        return delegate.size();
    }

    // other methods omitted
}

The other one is the class returned to the user:

public class AdminData {
    private final List<SearchHit<Book>> books;
    private final List<SearchHit<LogEntry>> logEntries;

    AdminData(List<SearchHit<Book>> books, List<SearchHit<LogEntry>> logEntries) {
        this.books = books;
        this.logEntries = logEntries;
    }

    public List<SearchHit<Book>> getBooks() {
        return books;
    }

    public List<SearchHit<LogEntry>> getLogEntries() {
        return logEntries;
    }
}

Summary

This post shows how to read multiple different entities in Spring Data Elasticsearch with a single call to Elasticsearch searching across multiple indices. I think in most cases it may a cleaner design by searching for the different entities in separate class, but if you have the need to do it all in one call this shows how to do it.

You need a helper class and the custom index name to entity resolution, but the rest ist not too hard to do – ok, I admit I knew what to do as I am the maintainer of Spring Data Elasticsearch.

Check out the source from https://github.com/sothawo/blog-sde-multiple-entities and try it out!

Leave a Reply

Your comment will only be visible after moderation. Your email address will not be published.