Reading different entities from multiple indices with one call using Spring Data Elasticsearch

The problem

In Elasticsearch (the current version at the time of writing this post is 7.12.1), every index holds exactly one type of data. This type is defined by the index mapping, which describes the fields of the index and their types.

Spring Data Elasticsearch (the current version is 4.2) automatically maps between a Java class – the entity type with its properties – and the data in the Elasticsearch index. This means that when searching, the entity type must be specified: in Spring Data repositories this is the type specified in the repository definition; when using the ElasticsearchOperations interface, the search functions require a parameter specifying the class to return.

But Elasticsearch allows searching across different indices by specifying multiple indices or wildcard index patterns (for example http://localhost:9200/index-a,other-ones-*/_search). Such a search returns a list of JSON objects containing data of the different types, and these returned types cannot be mapped automatically by Spring Data Elasticsearch.

In this post I will show how you can implement this behaviour in your application if needed.

Tools and code

The whole code for this sample project is available on GitHub at https://github.com/sothawo/blog-sde-multiple-entities. The project was created with the Spring Initializr as a Java 16 Maven project; the only added dependencies are web and spring-data-elasticsearch. When showing code to access a REST interface – either Elasticsearch or the running application – I use httpie.

The sample scenario

We want to store information about books and additional log entries when a book is inserted or updated. For the books we will have an index named blog-sde-books and for the log entries we use multiple indices named blog-sde-log-yyyy-mm-dd where the actual date is used, meaning that after a couple of days we will have more than one index for log entries. When storing data in the log entries, we will add the id of the corresponding book to the log entry.

We then want to search with a book id in the books index and in all the log indices, but we only want to issue one call to Elasticsearch and retrieve all the values in one step.

The code

The entities

In the example I use two entities to store data in Elasticsearch. The first index is used to store information about books, the entity looks like this:

@Document(indexName = "blog-sde-books")
public class Book {
    @Id private final String id;
    @Field(type = FieldType.Text) private final String author;
    @Field(type = FieldType.Text) private final String title;
    @Field(type = FieldType.Keyword) private final String isbn;

    public Book(String id, String author, String title, String isbn) {
        this.id = id;
        this.author = author;
        this.title = title;
        this.isbn = isbn;
    }

    // getter omitted here
}

This is a standard Spring Data Elasticsearch entity definition. The LogEntry class is pretty simple as well; it has an additional constructor for easier use, and the constructor that Spring Data Elasticsearch should use is annotated with @PersistenceConstructor:

@Document(indexName = "blog-sde-log-#{T(java.time.LocalDate).now().toString()}", createIndex = false)
public class LogEntry {
    @Id private final String id;
    @Field(type = FieldType.Keyword) private final String bookId;
    @Field(type = FieldType.Date, format = DateFormat.epoch_millis) private final Instant timestamp;
    @Field(type = FieldType.Text) private final String message;

    @PersistenceConstructor
    public LogEntry(String id, String bookId, Instant timestamp, String message) {
        if (timestamp == null) {
            timestamp = Instant.now();
        }
        this.id = id;
        this.bookId = bookId;
        this.timestamp = timestamp;
        this.message = message;
    }
    
    public LogEntry(String bookId, String message) {
        this(null, bookId, null, message);
    }
    // getter omitted here
}

Note that the createIndex parameter in the @Document annotation is set to false. We need to define an index template so that the index mapping is applied automatically to a newly created index when LogEntry records are written on a new day. The indexName is set to a SpEL expression that evaluates the current date and adds it to the name of the index.
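As a quick standalone illustration (a sketch, not part of the project code), the SpEL expression simply appends the ISO date produced by LocalDate.now().toString() to the fixed prefix:

```java
import java.time.LocalDate;

// Mirrors what the SpEL expression in the @Document annotation evaluates to:
// on 2021-05-09 the effective index name is blog-sde-log-2021-05-09.
public class LogIndexNameDemo {
    public static void main(String[] args) {
        String indexName = "blog-sde-log-" + LocalDate.now(); // implicit toString()
        System.out.println(indexName);
    }
}
```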

The application class

We do this template creation in the application class in a method that is triggered by an ApplicationReadyEvent:

@SpringBootApplication
public class BlogSdeMultipleEntitiesApplication {

    private final ElasticsearchOperations operations;

    public BlogSdeMultipleEntitiesApplication(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    public static void main(String[] args) {
        SpringApplication.run(BlogSdeMultipleEntitiesApplication.class, args);
    }

    @EventListener(ApplicationReadyEvent.class)
    public void initIndexTemplates() {
        var indexOperations = operations.indexOps(LogEntry.class);
        var putTemplateRequest = PutTemplateRequest.builder("blog-sde-log-template", "blog-sde-log-*")
            .withMappings(indexOperations.createMapping())
            .build();
        indexOperations.putTemplate(putTemplateRequest);
    }
}

The repositories

The repository interfaces are pretty straightforward and minimalistic:

public interface BookRepository extends ElasticsearchRepository<Book, String> {
}

public interface LogEntryRepository extends ElasticsearchRepository<LogEntry, String> {
}

The BookController

We add a controller class to be able to store and retrieve books:

@RestController
@RequestMapping("/books")
public class BookController {

    private final BookRepository bookRepository;
    private final LogEntryRepository logEntryRepository;

    public BookController(BookRepository bookRepository, LogEntryRepository logEntryRepository) {
        this.bookRepository = bookRepository;
        this.logEntryRepository = logEntryRepository;
    }

    @PostMapping
    public Book post(@RequestBody Book book) {
        var savedBook = bookRepository.save(book);
        logEntryRepository.save(
            new LogEntry(savedBook.getId(), "saved book with ISBN: " + savedBook.getIsbn())
        );
        return savedBook;
    }

    @GetMapping("/{id}")
    public Book byId(@PathVariable String id) {
        return bookRepository.findById(id).orElseThrow(ResourceNotFoundException::new);
    }
}

When saving a book, the entity is stored in Elasticsearch; after that, its id – which is assigned by Elasticsearch – is used to create the LogEntry. Looking back at the LogEntry definition, note that the constructor we use here sets the bookId property, not the id property of the LogEntry. The LogEntry is saved as well before the saved Book entity is sent back to the caller.

Startup and storing a book

After application startup, we have the index for the books and the template for the log entries (remember I use httpie as command line client):

$ http :9200/_cat/indices v==
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 162
content-type: text/plain; charset=UTF-8

health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   blog-sde-books hgtw0geyTA-UCzhlO41edg   1   1          0            0       208b           208b


$ http :9200/_template/blog-sde-log-template
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 178
content-type: application/json; charset=UTF-8

{
    "blog-sde-log-template": {
        "aliases": {},
        "index_patterns": [
            "blog-sde-log-*"
        ],
        "mappings": {
            "properties": {
                "bookId": {
                    "type": "keyword"
                },
                "message": {
                    "type": "text"
                },
                "timestamp": {
                    "format": "epoch_millis",
                    "type": "date"
                }
            }
        },
        "order": 0,
        "settings": {}
    }
}

Now let's store a book and then check what's in the indices:

$ http post :8080/books author="Josh Long" title="Reactive Spring" isbn="1732910413"
HTTP/1.1 200
Connection: keep-alive
Content-Type: application/json
Date: Sun, 09 May 2021 17:00:21 GMT
Keep-Alive: timeout=60
Transfer-Encoding: chunked

{
    "author": "Josh Long",
    "id": "fhwSUnkBOIbMW1uMlwuZ",
    "isbn": "1732910413",
    "title": "Reactive Spring"
}


$ http :9200/_cat/indices v==
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 210
content-type: text/plain; charset=UTF-8

health status index                   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   blog-sde-log-2021-05-09 Cs0A2aNmSKupZa95ujEMwg   1   1          1            0      5.1kb          5.1kb
yellow open   blog-sde-books          tEJp962YS2u_Xs_3Fc03qw   1   1          1            0      4.8kb          4.8kb


$ http :9200/blog-sde-books/_search
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 286
content-type: application/json; charset=UTF-8

{
    "_shards": {
        "failed": 0,
        "skipped": 0,
        "successful": 1,
        "total": 1
    },
    "hits": {
        "hits": [
            {
                "_id": "fhwSUnkBOIbMW1uMlwuZ",
                "_index": "blog-sde-books",
                "_score": 1.0,
                "_source": {
                    "_class": "com.sothawo.blogsdemultipleentities.Book",
                    "author": "Josh Long",
                    "isbn": "1732910413",
                    "title": "Reactive Spring"
                },
                "_type": "_doc"
            }
        ],
        "max_score": 1.0,
        "total": {
            "relation": "eq",
            "value": 1
        }
    },
    "timed_out": false,
    "took": 1
}


$ http :9200/blog-sde-log-2021-05-09/_search
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 322
content-type: application/json; charset=UTF-8

{
    "_shards": {
        "failed": 0,
        "skipped": 0,
        "successful": 1,
        "total": 1
    },
    "hits": {
        "hits": [
            {
                "_id": "fxwSUnkBOIbMW1uMmgsh",
                "_index": "blog-sde-log-2021-05-09",
                "_score": 1.0,
                "_source": {
                    "_class": "com.sothawo.blogsdemultipleentities.LogEntry",
                    "bookId": "fhwSUnkBOIbMW1uMlwuZ",
                    "message": "saved book with ISBN: 1732910413",
                    "timestamp": "1620579621129.587"
                },
                "_type": "_doc"
            }
        ],
        "max_score": 1.0,
        "total": {
            "relation": "eq",
            "value": 1
        }
    },
    "timed_out": false,
    "took": 4
}

So both entities are in their corresponding index.

Getting to the original problem

After all this setup we now get to the stuff this blog post is all about. Let me first show the call to the application and what it returns before showing how this is done:

$ http :8080/admin/fhwSUnkBOIbMW1uMlwuZ
HTTP/1.1 200
Connection: keep-alive
Content-Type: application/json
Date: Sun, 09 May 2021 17:28:33 GMT
Keep-Alive: timeout=60
Transfer-Encoding: chunked

{
    "books": [
        {
            "content": {
                "author": "Josh Long",
                "id": "fhwSUnkBOIbMW1uMlwuZ",
                "isbn": "1732910413",
                "title": "Reactive Spring"
            },
            "highlightFields": {},
            "id": "fhwSUnkBOIbMW1uMlwuZ",
            "index": "blog-sde-books",
            "innerHits": {},
            "nestedMetaData": null,
            "score": 1.0,
            "sortValues": []
        }
    ],
    "logEntries": [
        {
            "content": {
                "bookId": "fhwSUnkBOIbMW1uMlwuZ",
                "id": "fxwSUnkBOIbMW1uMmgsh",
                "message": "saved book with ISBN: 1732910413",
                "timestamp": "2021-05-09T17:00:21.129587Z"
            },
            "highlightFields": {},
            "id": "fxwSUnkBOIbMW1uMmgsh",
            "index": "blog-sde-log-2021-05-09",
            "innerHits": {},
            "nestedMetaData": null,
            "score": 0.2876821,
            "sortValues": []
        }
    ]
}

We are calling an admin endpoint with the id of the book and are getting back the book and the log search hits for this book id. Let's have a look at the AdminController:

@RestController
@RequestMapping("/admin")
public class AdminController {

    private final ElasticsearchOperations operations;

    public AdminController(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    @GetMapping("/{id}")
    public AdminData byId(@PathVariable String id) {

        var query = new NativeSearchQueryBuilder()
            .withQuery(queryStringQuery("_id:" + id + " OR bookId:" + id))
            .build();

        var converter = operations.getElasticsearchConverter();
        List<SearchHit<Book>> books = new ArrayList<>();
        List<SearchHit<LogEntry>> logEntries = new ArrayList<>();

        SearchHits<AllDocuments> searchHits = operations.search(query, AllDocuments.class, IndexCoordinates.of("blog-sde-*"));
        searchHits.forEach(searchHit -> {

            var indexName = searchHit.getIndex();
            if (indexName != null) {
                var document = Document.from(searchHit.getContent());
                if (searchHit.getId() != null) {
                    document.setId(searchHit.getId());
                }

                if (indexName.equals("blog-sde-books")) {
                    var book = converter.read(Book.class, document);
                    books.add(searchHit(book, searchHit));
                } else if (indexName.startsWith("blog-sde-log-")) {
                    var logEntry = converter.read(LogEntry.class, document);
                    logEntries.add(searchHit(logEntry, searchHit));
                }
            }
        });

        return new AdminData(books, logEntries);
    }

    private <T> SearchHit<T> searchHit(T content, SearchHit<?> source) {
        return new SearchHit<T>(source.getIndex(),
            source.getId(),
            source.getScore(),
            source.getSortValues().toArray(new Object[0]),
            source.getHighlightFields(),
            source.getInnerHits(),
            source.getNestedMetaData(),
            content);
    }
}

In the byId(String id) method we first build a query searching for the given id in either the _id field of an Elasticsearch document (that will return books) or in the bookId field of a document (that will return log entries).

Next we retrieve the converter from the ElasticsearchOperations – we need it to create our real entities – and set up the lists for the results.

Then comes the single call to Elasticsearch, which issues the query against all indices matching the given pattern – in our case all log entry indices and the book index. As Spring Data Elasticsearch needs an entity class for this, we use a helper class called AllDocuments, which is just an implementation of Map<String, Object> and so can hold any JSON returned by Elasticsearch, whatever type it may be. I'll show this class further down.

We then loop over the returned SearchHit instances. For each one we get the name of the index where the document was found. We then convert the AllDocuments instance into a Spring Data Elasticsearch Document, which the converter needs as input, and add the id of the found hit to this Document.

Now we need to determine into which of our entities we want to convert that document. We check the index name, call the converter.read() method with the appropriate class parameter, and store the entity in a new SearchHit in the result list it belongs to.

One could argue that the entity type could be determined automatically by checking the index name against the name provided in the @Document annotations. This is probably true for most cases, but when the index name changes – as it does here with the SpEL-provided name – this does not work anymore. So we need this custom code in the application.
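The index-name dispatch from the controller can be factored into a small reusable helper. This is a sketch with an illustrative name – IndexEntityResolver is not part of the project – using the same plain prefix matching as the if/else above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Maps an index name prefix to the entity class that hits from that index
// should be converted into. Registration order matters: first match wins.
class IndexEntityResolver {
    private final Map<String, Class<?>> prefixToEntity = new LinkedHashMap<>();

    IndexEntityResolver register(String indexNamePrefix, Class<?> entityClass) {
        prefixToEntity.put(indexNamePrefix, entityClass);
        return this;
    }

    Class<?> resolve(String indexName) {
        for (Map.Entry<String, Class<?>> entry : prefixToEntity.entrySet()) {
            if (indexName.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null; // unknown index, the hit would be skipped
    }
}
```

In the controller one would register Book.class for "blog-sde-books" and LogEntry.class for "blog-sde-log-", then pass the resolved class to converter.read(...).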

The missing classes

There are two classes I have not shown. The first is the AllDocuments class, a class implementing Map<String, Object> that delegates all the methods it must implement to an internal LinkedHashMap instance. I only show the first delegating method here; have a look at the GitHub project to see the full implementation.

class AllDocuments implements Map<String, Object> {

    private final Map<String, Object> delegate = new LinkedHashMap<>();

    @Override
    public int size() {
        return delegate.size();
    }

    // other methods omitted
}

The other one is the class returned to the user:

public class AdminData {
    private final List<SearchHit<Book>> books;
    private final List<SearchHit<LogEntry>> logEntries;

    AdminData(List<SearchHit<Book>> books, List<SearchHit<LogEntry>> logEntries) {
        this.books = books;
        this.logEntries = logEntries;
    }

    public List<SearchHit<Book>> getBooks() {
        return books;
    }

    public List<SearchHit<LogEntry>> getLogEntries() {
        return logEntries;
    }
}

Summary

This post shows how to read multiple different entities in Spring Data Elasticsearch with a single call to Elasticsearch searching across multiple indices. I think in most cases it may be a cleaner design to search for the different entities in separate calls, but if you need to do it all in one call, this shows how.

You need a helper class and the custom index-name-to-entity resolution, but the rest is not too hard to do – ok, I admit I knew what to do, as I am the maintainer of Spring Data Elasticsearch.

Check out the source from https://github.com/sothawo/blog-sde-multiple-entities and try it out!

The mystery of the endless Throwable’s cause chain shown in the IntelliJ debugger

A Throwable in Java can have a cause, which in turn is a Throwable as well. This cause can have a cause of its own, resulting in a chain of Throwable objects which we can follow until we reach a Throwable that has no cause. Let me explain this with a small example:

class Scratch {
  public static void main(String[] args) {
    try {
      try {
        int division = divide(4, 0);
      } catch (Exception e) {
        throw new IllegalArgumentException("got exception calling divide", e);
      }
    } catch (Throwable t) {
      while (t != null) {
        System.out.println(t.getClass().getCanonicalName());
        t = t.getCause();
      }
    }
  }

  private static int divide(int dividend, int divisor) {
    return dividend / divisor;
  }
}

In this example we call a function to divide two numbers and pass in a divisor of 0. This of course leads to an ArithmeticException, which we catch and set as the cause of a new IllegalArgumentException. We then catch this new exception and print out the cause chain. Running this program gives the expected result:

java.lang.IllegalArgumentException
java.lang.ArithmeticException

Process finished with exit code 0

Now let's debug this program in IntelliJ, stop at a breakpoint in the while loop, and examine the value of t:

What's this? We see the IllegalArgumentException and then the ArithmeticException. But then the cause of the ArithmeticException seems to be the very same object, creating an endless chain of causes. We did not observe this in the code, so why does the debugger display this information?

To understand this we need to look at the Throwable class and its cause property. The cause of a Throwable can be set either as a constructor argument or by calling the method initCause(Throwable) on an existing throwable. The Javadoc for initCause() states that the method throws an exception if the parameter is the object itself (a throwable cannot be its own cause) or if a cause was already set. So how does the throwable know whether its cause was already set? The trick is that the cause property is initialized to this in the field's initializer:

/**
 * The throwable that caused this throwable to get thrown, or null if this
 * throwable was not caused by another throwable, or if the causative
 * throwable is unknown.  If this field is equal to this throwable itself,
 * it indicates that the cause of this throwable has not yet been
 * initialized.
 *
 * @serial
 * @since 1.4
 */
private Throwable cause = this;

The getCause() method returns null if the cause was not initialized, otherwise the value that was set:

public synchronized Throwable getCause() {
    return (cause==this ? null : cause);
}

And the code to set a cause checks this as well:

public synchronized Throwable initCause(Throwable cause) {
    if (this.cause != this)
        throw new IllegalStateException("Can't overwrite cause with " +
                                        Objects.toString(cause, "a null"), this);
    if (cause == this)
        throw new IllegalArgumentException("Self-causation not permitted", this);
    this.cause = cause;
    return this;
}
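Both guard clauses can be triggered with a few lines; a quick sketch using only the JDK:

```java
// Demonstrates the two exceptions thrown by initCause():
// overwriting an already set cause, and self-causation.
public class InitCauseDemo {
    public static void main(String[] args) {
        // cause was set via the constructor, so it may not be overwritten
        Throwable t = new RuntimeException("boom", new ArithmeticException("/ by zero"));
        try {
            t.initCause(new IllegalStateException("another cause"));
        } catch (IllegalStateException e) {
            System.out.println("cannot overwrite an existing cause");
        }

        // a throwable must not be its own cause
        Throwable u = new RuntimeException("no cause yet");
        try {
            u.initCause(u);
        } catch (IllegalArgumentException e) {
            System.out.println("self-causation not permitted");
        }
    }
}
```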

So when using getCause() to follow the chain, we end at a Throwable that either has no cause set or whose cause was explicitly set to null.

And the debugger? The debugger does not call the getCause() method but uses introspection of the exception and directly reads the value of the cause property. And this is the inspected object itself, leading to the endless chain of causes that is displayed.

Implement a rolling index strategy with Spring Data Elasticsearch 4.1

With the release of version 4.1 Spring Data Elasticsearch now supports the index templates of Elasticsearch. Index templates allow the user to define settings, mappings and aliases for indices that are automatically created by Elasticsearch when documents are saved to a not yet existing index.

In this blog post I will show how index templates can be used in combination with Spring Data Repository customizations to implement a rolling index strategy where new indices will be created automatically based on the date.

You should be familiar with the basic concepts of Spring Data Repositories and the use of Spring Data Elasticsearch.

As the most popular use case for rolling indices is storing log entries in Elasticsearch, we will do something similar. Our application will offer an HTTP endpoint where a client can POST a message; this message will be stored in an index named msg-HH-MM, where the index name contains the hour and minute when the message was received. Normally the name would contain the date, but to be able to watch this working without waiting for days, we need a different naming scheme.

When the user issues a GET request with a search word, the application will search across all indices by using the alias name msg which we will set up as an alias for all the msg-* indices.

Basic setup

The program

The source code for this example is available on GitHub. This project was set up using start.spring.io, selecting a Spring Boot 2.4.0 application with web and spring-data-elasticsearch support and Java version 15.

Note: I make use of newer Java features like declaring local variables with var. This is not necessary for Spring Data Elasticsearch; you can still use Java 8 if you need to.

Elasticsearch

In order to run this example we need an Elasticsearch cluster. I use version 7.9.3 because that's the version that Spring Data Elasticsearch 4.1 – the version Spring Boot pulls in – is built with. I have downloaded Elasticsearch and have it running on my machine, accessible at http://localhost:9200. Please adjust the setup in the application configuration at src/main/resources/application.yml accordingly.

Command line client

In order to access our program and to check what is stored in Elasticsearch I use httpie. An alternative would be curl.

The different parts of the application

The entity

The entity we use in this example looks like this:

@Document(indexName = "msg", createIndex = false)
public class Message {
    @Id private String id;

    @Field(type = FieldType.Text)
    private String message;

    @Field(type = FieldType.Date, format = DateFormat.date_hour_minute_second)
    private LocalDateTime timestamp = LocalDateTime.now();

    // getter/setter omitted here for brevity
}

Please note the following points:

  • the index name is set to msg; this will be the alias name that points to all the different indices that will be created. Spring Data repository methods will use this name without adaptation. This is fine for reading; we will set up the writing part later.
  • the createIndex argument of the @Document annotation is set to false. We don’t want the application to automatically create an index named msg as Elasticsearch will automatically create the indices when documents are stored.
  • the properties are explicitly annotated with their types, so that the correct index mapping can be stored in the index template and later be applied automatically to a newly created index.

The index template

To initialize the index template, we use a Spring Component:

@Component
public class TemplateInitializer {

    private static final String TEMPLATE_NAME = "msg-template";
    private static final String TEMPLATE_PATTERN = "msg-*";
    
    private final ElasticsearchOperations operations;

    public TemplateInitializer(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    @Autowired
    public void setup() {

        var indexOps = operations.indexOps(Message.class);

        if (!indexOps.existsTemplate(TEMPLATE_NAME)) {

            var mapping = indexOps.createMapping();

            var aliasActions = new AliasActions().add(
                    new AliasAction.Add(AliasActionParameters.builderForTemplate()
                            .withAliases(indexOps.getIndexCoordinates().getIndexNames())
                            .build())
            );

            var request = PutTemplateRequest.builder(TEMPLATE_NAME, TEMPLATE_PATTERN)
                    .withMappings(mapping)
                    .withAliasActions(aliasActions)
                    .build();

            indexOps.putTemplate(request);
        }
    }
}

This bean class has a method setup() that is annotated with @Autowired. A method with this annotation is executed once when the beans in the Spring ApplicationContext are all set up, so in the setup() method we can be sure that the injected ElasticsearchOperations instance has been set.

To work with the index templates we need an implementation of the IndexOperations interface which we get from the operations object. We then check if the index template already exists, as this initialization should only be done once.

If the index template does not exist, we first create the index mapping with indexOps.createMapping(). As the indexOps was bound to the Message class when we created it, the annotations from the Message class are used to create the mapping.

The next step is to create an AliasAction that will add an alias to an index when it is created. The name for the alias is retrieved from the Message class with indexOps.getIndexCoordinates().getIndexNames().

We then put the mapping and the alias action in a PutTemplateRequest together with a name for the template and the pattern when this template should be applied and send it off to Elasticsearch.

The repository

The Spring Data Repository we use is pretty simple:

public interface MessageRepository extends ElasticsearchRepository<Message, String> {

    SearchHits<Message> searchAllBy();

    SearchHits<Message> searchAllByMessage(String text);
}

It extends ElasticsearchRepository and defines one method to retrieve all messages and a second one to search for text in a message.

The repository customization

We now need to customize the repository, as we want our own methods to be used when saving Message objects to the index. In these methods we will set the correct index name. We do this by defining a new interface, CustomMessageRepository. As we want to redefine methods that are already defined in the CrudRepository interface (which our MessageRepository already extends), it is important that our methods have exactly the same signature as the methods from CrudRepository. This is the reason we need to make this interface generic:

public interface CustomMessageRepository<T> {

    <S extends T> S save(S entity);

    <S extends T> Iterable<S> saveAll(Iterable<S> entities);
}

We provide an implementation of this interface in the class CustomMessageRepositoryImpl. This must have the same name as the interface with the suffix Impl, so that Spring Data can pick up this implementation:

public class CustomMessageRepositoryImpl implements CustomMessageRepository<Message> {

    final private ElasticsearchOperations operations;

    public CustomMessageRepositoryImpl(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    @Override
    public <S extends Message> S save(S entity) {
        return operations.save(entity, indexName());
    }

    @Override
    public <S extends Message> Iterable<S> saveAll(Iterable<S> entities) {
        return operations.save(entities, indexName());
    }

    public IndexCoordinates indexName() {
        var indexName = "msg-" +
                LocalTime.now().truncatedTo(ChronoUnit.MINUTES).toString().replace(':', '-');
        return IndexCoordinates.of(indexName);
    }
}

We have an ElasticsearchOperations instance injected (no need to annotate this class with @Component; Spring Data detects it by its class name and does the injection). The index name is provided by the indexName() method, which uses the hour and minute of the current time to provide an index name of the pattern msg-HH-MM. A real-life scenario would probably use the date instead of the time, but as we want to test this with different entities without waiting a whole day between inserts, this should be fine for now.
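The name derivation can be tried standalone; this sketch just reproduces the expression from indexName() with a fixed time:

```java
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;

// Reproduces the index name derivation from indexName() with a fixed time:
// truncate to the minute, then replace ':' (not allowed in Elasticsearch
// index names) with '-'.
public class IndexNameDemo {
    public static void main(String[] args) {
        LocalTime receivedAt = LocalTime.of(22, 10, 58);
        String indexName = "msg-" + receivedAt.truncatedTo(ChronoUnit.MINUTES)
                .toString().replace(':', '-');
        System.out.println(indexName); // msg-22-10
    }
}
```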

In the implementations of our save methods, we call the save method of ElasticsearchOperations but provide our own index name, so that the one from the @Document annotation is not used.

A last step we need to do is to have our MessageRepository implement this new repository as well:

public interface MessageRepository extends ElasticsearchRepository<Message, String>, CustomMessageRepository<Message> {

    SearchHits<Message> searchAllBy();

    SearchHits<Message> searchAllByMessage(String text);
}

oops, the controller

And of course we need something to test this all, so here we have a simple controller to store and retrieve messages:

@RestController
@RequestMapping("/messages")
public class MessageController {

    private final MessageRepository repository;

    public MessageController(MessageRepository repository) {
        this.repository = repository;
    }

    @PostMapping
    public Message add(@RequestBody Message message) {
        return repository.save(message);
    }

    @GetMapping
    public SearchHits<Message> messages() {
        return repository.searchAllBy();
    }

    @GetMapping("/{text}")
    public SearchHits<Message> messages(@PathVariable("text") String text) {
        return repository.searchAllByMessage(text);
    }
}

This is just a plain old Spring REST controller with nothing special.

Let’s see it in action

Now let’s start up the program and check what we have (remember, I use httpie as a client).

In the beginning there are no indices:

$ http :9200/_cat/indices
HTTP/1.1 200 OK
content-length: 0
content-type: text/plain; charset=UTF-8

We check out the templates:

$ http :9200/_template/msg-template
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 165
content-type: application/json; charset=UTF-8

{
    "msg-template": {
        "aliases": {
            "msg": {}
        },
        "index_patterns": [
            "msg-*"
        ],
        "mappings": {
            "properties": {
                "message": {
                    "type": "text"
                },
                "timestamp": {
                    "format": "date_hour_minute_second",
                    "type": "date"
                }
            }
        },
        "order": 0,
        "settings": {}
    }
}

The template definition with the mapping and alias definition is there. Now let’s add an entry:

$ http post :8080/messages message="this is the first message"
HTTP/1.1 200
Connection: keep-alive
Content-Type: application/json
Date: Tue, 17 Nov 2020 21:10:59 GMT
Keep-Alive: timeout=60
Transfer-Encoding: chunked

{
    "id": "TwYL2HUBIlu2470f4r6Y",
    "message": "this is the first message",
    "timestamp": "2020-11-17T22:10:58.541117"
}

We see that this message was persisted at 22:10. What about the indices?

$ http :9200/_cat/indices
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 83
content-type: text/plain; charset=UTF-8

yellow open msg-22-10 bFfnss5wR8CuLOmSfJPDDw 1 1 1 0 4.5kb 4.5kb

We have a new index named msg-22-10; let’s check its setup:

$ http :9200/msg-22-10
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 326
content-type: application/json; charset=UTF-8

{
    "msg-22-10": {
        "aliases": {
            "msg": {}
        },
        "mappings": {
            "properties": {
                "_class": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "message": {
                    "type": "text"
                },
                "timestamp": {
                    "format": "date_hour_minute_second",
                    "type": "date"
                }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1605647458601",
                "number_of_replicas": "1",
                "number_of_shards": "1",
                "provided_name": "msg-22-10",
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "uuid": "bFfnss5wR8CuLOmSfJPDDw",
                "version": {
                    "created": "7100099"
                }
            }
        }
    }
}

Let’s add another one:

$ http post :8080/messages message="this is the second message"                                           
HTTP/1.1 200
Connection: keep-alive
Content-Type: application/json
Date: Tue, 17 Nov 2020 21:13:52 GMT
Keep-Alive: timeout=60
Transfer-Encoding: chunked

{
    "id": "UAYO2HUBIlu2470fiL7G",
    "message": "this is the second message",
    "timestamp": "2020-11-17T22:13:52.336695"
}


$ http :9200/_cat/indices
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 112
content-type: text/plain; charset=UTF-8

yellow open msg-22-13 gvs12CQvTOmdvqsQz7k6yw 1 1 1 0 4.5kb 4.5kb
yellow open msg-22-10 bFfnss5wR8CuLOmSfJPDDw 1 1 1 0 4.5kb 4.5kb

So now we have two indices. Let’s get all the entries from our application:

$ http :8080/messages
HTTP/1.1 200
Connection: keep-alive
Content-Type: application/json
Date: Tue, 17 Nov 2020 21:15:57 GMT
Keep-Alive: timeout=60
Transfer-Encoding: chunked

{
    "aggregations": null,
    "empty": false,
    "maxScore": 1.0,
    "scrollId": null,
    "searchHits": [
        {
            "content": {
                "id": "TwYL2HUBIlu2470f4r6Y",
                "message": "this is the first message",
                "timestamp": "2020-11-17T22:10:58"
            },
            "highlightFields": {},
            "id": "TwYL2HUBIlu2470f4r6Y",
            "index": "msg-22-10",
            "innerHits": {},
            "nestedMetaData": null,
            "score": 1.0,
            "sortValues": []
        },
        {
            "content": {
                "id": "UAYO2HUBIlu2470fiL7G",
                "message": "this is the second message",
                "timestamp": "2020-11-17T22:13:52"
            },
            "highlightFields": {},
            "id": "UAYO2HUBIlu2470fiL7G",
            "index": "msg-22-13",
            "innerHits": {},
            "nestedMetaData": null,
            "score": 1.0,
            "sortValues": []
        }
    ],
    "totalHits": 2,
    "totalHitsRelation": "EQUAL_TO"
}

We get both entries. As we are returning SearchHits<Message>, we also get the information in which index each result was found; this is important if you want to edit one of these entries later and store it back in its original index.

Let’s sum it up

We have defined and stored an index template that allows us to specify mappings and aliases for automatically created indices. We have set up our application to read from the alias and to write to a dynamically created index name, and so have implemented a rolling index pattern for our Elasticsearch storage, all from within Spring Data Elasticsearch.

I hope you enjoyed this example.

Search maven from the commandline using httpie and jq

Sometimes I like to know what the latest version of a library in maven central is. As I don’t like having to open a browser tab for search but almost always have a terminal window at hand, I wrote a small script that uses httpie and jq to get the information I want (that’s all one line in the script):

#!/usr/bin/env sh

http search.maven.org/solrsearch/select rows==10000 q=="$1" | jq '.response.docs[] | .timestamp |= (. / 1000 | todate) | (.timestamp)+" "+(.id)+":"+(.latestVersion)' | tr -d '"' | sort -k2

The script retrieves the search results from search.maven.org, setting the limit to 10000 (the default is 10). Then it uses jq to parse the returned JSON: the timestamp field is converted to a readable date and printed along with the maven id and the latest version number. Finally the quotes are removed from the output, which is then sorted by the artifact id.
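
The jq step `. / 1000 | todate` divides the millisecond timestamp by 1000 and formats the resulting epoch seconds as an ISO-8601 date. The equivalent conversion can be sketched in Java (the class and method names here are made up for illustration):

```java
import java.time.Instant;

public class MvnTimestamp {

    // The maven central search API returns timestamps in epoch milliseconds;
    // convert to epoch seconds and format as an ISO-8601 instant, like jq's
    // `. / 1000 | todate`.
    static String toDate(long epochMillis) {
        return Instant.ofEpochSecond(epochMillis / 1000).toString();
    }

    public static void main(String[] args) {
        long millis = Instant.parse("2020-09-16T10:10:50Z").toEpochMilli();
        System.out.println(toDate(millis)); // 2020-09-16T10:10:50Z
    }
}
```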

I saved this script as smvn and made it executable.

A sample run:

$ smvn "springframework elasticsearch"
2020-09-17T14:37:19Z org.springframework.boot:spring-boot-starter-data-elasticsearch:2.3.4.RELEASE
2020-09-16T10:10:50Z org.springframework.data:spring-data-elasticsearch:4.0.4.RELEASE

 

How to use Elasticsearch’s range types with Spring Data Elasticsearch

Elasticsearch allows the data stored in a document to be not only of elementary types but also of range types, see the documentation. With a short example I will explain these range types and how to use them in Spring Data Elasticsearch (the current version being 4.0.3).

For this example we want to be able to answer the question: “Who was president of the United States of America in the year X?”. We will store in Elasticsearch a document describing a president with his name and his term, defined by a range of years with a from and a to value. We will then query the index for documents where this term range contains a given year.

The first thing we need to define is our entity. I named it President:

@Document(indexName = "presidents")
public class President {
    @Id
    private String id;

    @Field(type = FieldType.Text)
    private String name;

    @Field(type = FieldType.Integer_Range)
    private Term term;

    static President of(String name, Integer from, Integer to) {
        return new President(name, new Term(from, to));
    }

    public President() {
    }

    public President(String name, Term term) {
        this(UUID.randomUUID().toString(), name, term);
    }

    public President(String id, String name, Term term) {
        this.id = id;
        this.name = name;
        this.term = term;
    }

    // getter/setter

    static class Term {
        @Field(name = "gte")
        private Integer from;
        @Field(name = "lte")
        private Integer to;

        public Term() {
        }

        public Term(Integer from, Integer to) {
            this.from = from;
            this.to = to;
        }

        // getter/setter
    }
}

There are the standard annotations for a Spring Data Elasticsearch entity like @Document and @Id, but in addition there is the property term which is annotated with @Field(type = FieldType.Integer_Range). This marks it as an integer range property. The Term class is defined as an inner class (not to be confused with the Elasticsearch term concept); it defines the term of a president with the two values from and to. Elasticsearch requires the fields of a range to be named gte and lte, which we achieve by setting these names in the @Field annotations of the Term properties.

The rest is just a basic repository:

public interface PresidentRepository extends ElasticsearchRepository<President, String> {
    SearchHits<President> searchByTerm(Integer year);
}

Here we use a single Integer as value because Elasticsearch does the magic by finding the corresponding entries where the searched value is in the range of the stored documents.
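
The containment semantics that Elasticsearch evaluates server-side for an integer_range field can be illustrated with a self-contained sketch (the class names here are made up for illustration; the real matching of course happens inside Elasticsearch):

```java
public class RangeDemo {

    // A query value matches an integer_range field when it lies between the
    // stored gte and lte bounds (both inclusive).
    record Term(int gte, int lte) {
        boolean contains(int year) {
            return gte <= year && year <= lte;
        }
    }

    public static void main(String[] args) {
        Term bush = new Term(2001, 2009);
        Term obama = new Term(2009, 2017);
        System.out.println(bush.contains(2009));  // true
        System.out.println(obama.contains(2009)); // true - both match 2009
        System.out.println(bush.contains(2010));  // false
    }
}
```

This also explains why the query for 2009 below returns two hits: the year at a term boundary belongs to both ranges.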

And of course we have some controller using it. This controller has one endpoint that loads the presidents since World War II into Elasticsearch and a second one that returns the desired results:

@RestController
@RequestMapping("presidents")
public class PresidentController {

    private final PresidentRepository repository;

    public PresidentController(PresidentRepository repository) {
        this.repository = repository;
    }

    @GetMapping("/load")
    public void load() {
        repository.saveAll(Arrays.asList(
                President.of("Dwight D Eisenhower", 1953, 1961),
                President.of("Lyndon B Johnson", 1963, 1969),
                President.of("Richard Nixon", 1969, 1974),
                President.of("Gerald Ford", 1974, 1977),
                President.of("Jimmy Carter", 1977, 1981),
                President.of("Ronald Reagan", 1981, 1989),
                President.of("George Bush", 1989, 1993),
                President.of("Bill Clinton", 1993, 2001),
                President.of("George W Bush", 2001, 2009),
                President.of("Barack Obama", 2009, 2017),
                President.of("Donald Trump", 2017, 2021)));
    }

    @GetMapping("/term/{year}")
    public SearchHits<President> searchByTerm(@PathVariable Integer year) {
        return repository.searchByTerm(year);
    }
}

See it in action (I am using HTTPie); my application is listening on port 9090:

$ http -b :9090/presidents/term/2009
{
    "aggregations": null,
    "empty": false,
    "maxScore": 1.0,
    "scrollId": null,
    "searchHits": [
        {
            "content": {
                "id": "c3a3a0d0-d835-4a02-a2e8-20cc1c0e9285",
                "name": "George W Bush",
                "term": {
                    "from": 2001,
                    "to": 2009
                }
            },
            "highlightFields": {},
            "id": "c3a3a0d0-d835-4a02-a2e8-20cc1c0e9285",
            "score": 1.0,
            "sortValues": []
        },
        {
            "content": {
                "id": "36416746-ff11-4243-a4f3-a6bb0cff9a93",
                "name": "Barack Obama",
                "term": {
                    "from": 2009,
                    "to": 2017
                }
            },
            "highlightFields": {},
            "id": "36416746-ff11-4243-a4f3-a6bb0cff9a93",
            "score": 1.0,
            "sortValues": []
        }
    ],
    "totalHits": 2,
    "totalHitsRelation": "EQUAL_TO"
}

$ http -b :9090/presidents/term/2000
{
    "aggregations": null,
    "empty": false,
    "maxScore": 1.0,
    "scrollId": null,
    "searchHits": [
        {
            "content": {
                "id": "984fdf87-a7d8-4dc2-b2e8-5dd948065147",
                "name": "Bill Clinton",
                "term": {
                    "from": 1993,
                    "to": 2001
                }
            },
            "highlightFields": {},
            "id": "984fdf87-a7d8-4dc2-b2e8-5dd948065147",
            "score": 1.0,
            "sortValues": []
        }
    ],
    "totalHits": 1,
    "totalHitsRelation": "EQUAL_TO"
}

So just by putting the right types and names into our @Field annotations, we are able to use the range types of Elasticsearch in our Spring Data Elasticsearch application.

Search entities within a geographic distance with Spring Data Elasticsearch 4

A couple of months ago I published the post Using geo-distance sort in Spring Data Elasticsearch 4. In the comments there came up the question “What about searching within a distance?”

Well, this is not supported by query derivation from the method name, but it can easily be done with a custom repository implementation (see the documentation for more information about that).

I updated the example – which is available on GitHub – and will explain what is needed for this implementation. I won’t describe the entity and setup, please check the original post for that.

The custom repository interface

First we need to define a new repository interface that defines the method we want to provide:

public interface FoodPOIRepositoryCustom {

    /**
     * search all {@link FoodPOI} that are within a given distance of a point
     *
     * @param geoPoint
     *     the center point
     * @param distance
     *     the distance
     * @param unit
     *     the distance unit
     * @return the found entities
     */
    List<SearchHit<FoodPOI>> searchWithin(GeoPoint geoPoint, Double distance, String unit);
}

The custom repository implementation

Next we need to provide an implementation; the important point here is that this class is named like the interface with the suffix “Impl”:

public class FoodPOIRepositoryCustomImpl implements FoodPOIRepositoryCustom {

    private final ElasticsearchOperations operations;

    public FoodPOIRepositoryCustomImpl(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    @Override
    public List<SearchHit<FoodPOI>> searchWithin(GeoPoint geoPoint, Double distance, String unit) {

        Query query = new CriteriaQuery(
          new Criteria("location").within(geoPoint, distance.toString() + unit)
        );

        // add a sort to get the actual distance back in the sort value
        Sort sort = Sort.by(new GeoDistanceOrder("location", geoPoint).withUnit(unit));
        query.addSort(sort);

        return operations.search(query, FoodPOI.class).getSearchHits();
    }
}

In this implementation we have an ElasticsearchOperations instance injected by Spring. In the method implementation we build a CriteriaQuery that specifies the distance query we want. In addition we add a GeoDistanceOrder sort to have the actual distance of the found entities in the output. We pass this query to the ElasticsearchOperations instance and return the search result.

Adapt the repository

We need to add the new interface to our FoodPOIRepository definition, which otherwise is unchanged:

public interface FoodPOIRepository extends ElasticsearchRepository<FoodPOI, String>, FoodPOIRepositoryCustom {

    List<SearchHit<FoodPOI>> searchTop3By(Sort sort);

    List<SearchHit<FoodPOI>> searchTop3ByName(String name, Sort sort);
}

Use it in the controller

In the rest controller, there is a new method that uses the distance search:

@PostMapping("/within")
List<ResultData> withinDistance(@RequestBody RequestData requestData) {

    GeoPoint location = new GeoPoint(requestData.getLat(), requestData.getLon());

    List<SearchHit<FoodPOI>> searchHits
        = repository.searchWithin(location, requestData.distance, requestData.unit);

    return toResultData(searchHits);
}

private List<ResultData> toResultData(List<SearchHit<FoodPOI>> searchHits) {
    return searchHits.stream()
        .map(searchHit -> {
            Double distance = (Double) searchHit.getSortValues().get(0);
            FoodPOI foodPOI = searchHit.getContent();
            return new ResultData(foodPOI.getName(), foodPOI.getLocation(), distance);
        }).collect(Collectors.toList());
}

We extract the needed parameters from the requestData that came in, call our repository method and convert the results to our output format.

And that’s it

So with a small custom repository implementation we were able to add the desired functionality to our repository.

mapjfx display problem on Windows 10 seems solved

For some time now there was a bug where the map was not displayed properly on some Windows systems.

It seems this was caused by a bug in the WebView from JavaFX: https://bugs.openjdk.java.net/browse/JDK-8234471. Thanks to https://github.com/vewert and https://github.com/Abu-Abdullah for investigating this.

This issue was fixed with JavaFX 15; I tried it on a virtual machine with Windows 10 and could not reproduce the error anymore.

There is no need to update mapjfx to JavaFX 15 (macOS and *nix are not affected by this bug). If you are on Windows 10, you need to add the following dependency to your application:

<dependency>
    <groupId>org.openjfx</groupId>
    <artifactId>javafx-web</artifactId>
    <version>16-ea+1</version>
</dependency>

I tried 16-ea+1 and 15-ea+8; the JavaFX version should be the same as the one used for the rest of the application.

 

Use an index name defined by the entity to store data in Spring Data Elasticsearch 4.0

When using Spring Data Elasticsearch (I am referencing the current version 4.0.2), normally the name of the index where the documents are stored is taken from the @Document annotation of the entity class – here it’s books:

@Document(indexName="books")
public class Book {
  // ...
}

Recently, in a discussion on a pull request for Spring Data Elasticsearch, someone mentioned that she needed a way to derive the index name from the entity itself, as entities might go to different indices.

In this post I will show how this can be done by using Spring Data Repository customization by providing a custom implementation for the save method. A complete solution would need to customize saveAll and other methods as well, but I will restrict this here to just one method.

The Hotel entity

For this post I will use an entity describing a hotel, with the idea that hotels from different countries should be stored in different Elasticsearch indices. The index name in the annotation is a wildcard name so that when searching for hotels all indices are considered.

Hotel.java

package com.sothawo.springdataelastictest;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.lang.Nullable;

/**
 * @author P.J. Meisch (pj.meisch@sothawo.com)
 */
@Document(indexName = "hotel-*", createIndex = false)
public class Hotel {
    @Id
    @Nullable
    private String id;

    @Field(type = FieldType.Text)
    @Nullable
    private String name;

    @Field(type = FieldType.Keyword)
    @Nullable
    private String countryCode;

    // getter/setter ...
}

The custom repository

We need to define a custom repository interface that defines the methods we want to implement. Since we want to customize the save method, which ElasticsearchRepository inherits from CrudRepository, we need to use the very same method signature including the generics:

CustomHotelRepository.java

package com.sothawo.springdataelastictest;

/**
 * @author P.J. Meisch (pj.meisch@sothawo.com)
 */
public interface CustomHotelRepository<T> {
    <S extends T> S save(S entity);
}

The next class to provide is an implementation of this interface. It is important that the implementation class is named like the interface with an Impl suffix:

CustomHotelRepositoryImpl.java

package com.sothawo.springdataelastictest;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.IndexOperations;
import org.springframework.data.elasticsearch.core.document.Document;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.lang.NonNull;
import org.springframework.lang.Nullable;

import java.util.concurrent.ConcurrentHashMap;

/**
 * @author P.J. Meisch (pj.meisch@sothawo.com)
 */
@SuppressWarnings("unused")
public class CustomHotelRepositoryImpl implements CustomHotelRepository<Hotel> {

    private static final Logger LOG = LoggerFactory.getLogger(CustomHotelRepositoryImpl.class);

    private final ElasticsearchOperations operations;

    private final ConcurrentHashMap<String, IndexCoordinates> knownIndexCoordinates = new ConcurrentHashMap<>();
    @Nullable
    private Document mapping;

    @SuppressWarnings("unused")
    public CustomHotelRepositoryImpl(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    @Override
    public <S extends Hotel> S save(S hotel) {

        IndexCoordinates indexCoordinates = getIndexCoordinates(hotel);
        LOG.info("saving {} to {}", hotel, indexCoordinates);

        S saved = operations.save(hotel, indexCoordinates);

        operations.indexOps(indexCoordinates).refresh();

        return saved;
    }

    @NonNull
    private <S extends Hotel> IndexCoordinates getIndexCoordinates(S hotel) {

        String indexName = "hotel-" + hotel.getCountryCode();
        return knownIndexCoordinates.computeIfAbsent(indexName, i -> {

                IndexCoordinates indexCoordinates = IndexCoordinates.of(i);
                IndexOperations indexOps = operations.indexOps(indexCoordinates);

                if (!indexOps.exists()) {
                    indexOps.create();

                    if (mapping == null) {
                        mapping = indexOps.createMapping(Hotel.class);
                    }

                    indexOps.putMapping(mapping);
                }
                return indexCoordinates;
            }
        );
    }
}

This implementation is a Spring Bean (no need for adding @Component) and so can use dependency injection. Let me explain the code.

The ElasticsearchOperations object that we use to store the entity in the desired index is autowired via constructor injection.

As we want to make sure that the index we write to exists and has the correct mapping, we keep track of the indices we already know in the knownIndexCoordinates map. It is used in the getIndexCoordinates method explained below.

The save method is the actual implementation of the save operation. First we call getIndexCoordinates, which makes sure the index exists. We then pass the resulting indexCoordinates into the save method of the ElasticsearchOperations instance. If we used ElasticsearchOperations.save(hotel), the name from the @Document annotation would be used; but when an IndexCoordinates is passed as second parameter, the index name from this object is used to store the entity. The call to refresh afterwards mirrors the behaviour of the original ElasticsearchRepository.save() method, so we do the same here. If you do not need the immediate refresh, omit this call.

As Spring Data Elasticsearch does not yet support index templates (this will come with version 4.1), the getIndexCoordinates method ensures that the first time an entity is saved to an index, this index is created if necessary and the mapping is written to the newly created index.
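
The caching idiom in getIndexCoordinates can be illustrated with a self-contained sketch (the names here are made up; the creations counter stands in for the indexOps.create() and putMapping() calls):

```java
import java.util.concurrent.ConcurrentHashMap;

public class IndexCache {

    static final ConcurrentHashMap<String, String> knownIndices = new ConcurrentHashMap<>();
    static int creations = 0;

    // computeIfAbsent runs the setup lambda only the first time a given
    // name is seen -- later saves to the same index skip the setup work.
    static String ensureIndex(String countryCode) {
        return knownIndices.computeIfAbsent("hotel-" + countryCode, name -> {
            creations++; // stands in for creating the index and its mapping
            return name;
        });
    }

    public static void main(String[] args) {
        System.out.println(ensureIndex("de")); // hotel-de
        System.out.println(ensureIndex("de")); // hotel-de (cached)
        System.out.println(creations);         // setup ran only once
    }
}
```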

The HotelRepository to use in the application

We now need to combine our custom repository with the ElasticsearchRepository from Spring Data Elasticsearch:

HotelRepository.java

package com.sothawo.springdataelastictest;

import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

/**
 * @author P.J. Meisch (pj.meisch@sothawo.com)
 */
public interface HotelRepository extends ElasticsearchRepository<Hotel, String>, CustomHotelRepository<Hotel> {
    SearchHits<Hotel> searchAllBy();
}

Here we combine the two interfaces and define an additional method that returns all hotels in a SearchHits object.

Use the repository in the code

The only thing that’s left is to use this repository, for example in a REST controller:

HotelController.java

package com.sothawo.springdataelastictest;

import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

/**
 * @author P.J. Meisch (pj.meisch@sothawo.com)
 */
@RestController
@RequestMapping("/hotels")
public class HotelController {

    private final HotelRepository repository;

    public HotelController(HotelRepository repository) {
        this.repository = repository;
    }

    @GetMapping()
    public SearchHits<Hotel> all() {
        return repository.searchAllBy();
    }

    @PostMapping()
    public Hotel save(@RequestBody Hotel hotel) {
        return repository.save(hotel);
    }
}

This is a standard controller which has a HotelRepository instance injected (Spring Data Elasticsearch will create this for us). It looks exactly like it would without our customization; the difference is that the call to save() ends up in our custom implementation.

Conclusion

This post shows how easy it is to provide custom implementations for the methods that are normally provided by Spring Data repositories (not just in Spring Data Elasticsearch) when custom logic is needed.

mapjfx 2.15.0 and 1.33.0 released adding circles and OpenLayers 6.4.2

I just released mapjfx versions 1.33.0 and 2.15.0, they will be available in maven central:

  <dependency>
    <groupId>com.sothawo</groupId>
    <artifactId>mapjfx</artifactId>
    <version>1.33.0</version>
  </dependency>
  <dependency>
    <groupId>com.sothawo</groupId>
    <artifactId>mapjfx</artifactId>
    <version>2.15.0</version>
  </dependency>

1.33.0 is built using Java 8 and 2.15.0 uses Java 11.

Circles can now be added to a map, giving the center coordinates and the radius in meters with custom coloring and transparency, thanks to Hanwoo Kim for this contribution!

The OpenLayers version now is 6.4.2.