Using geo-distance sort in Spring Data Elasticsearch 4

The release of Spring Data Elasticsearch in version 4.0 (see the documentation) brings two new features that now enable users to use geo-distance sorts in repository queries: The first is a new class GeoDistanceOrder and the second is a new return type for repository methods SearchHit<T>. In this post I will show how easy it is to use these classes to answer questions like “Which pubs are the nearest to a given location?”.

The source code

The complete runnable code used for this post is available on GitHub. In order to run the application you will need Java 8 or higher and a running instance of Elasticsearch. If this is not accessible at localhost:9200 you need to set the correct value in the src/main/resources/application.yaml file.

The sample data

For this sample application I use a csv file with POI data from OpenStreetMap that contains POIs in Germany which are categorized as kind of food, like restaurants, pubs, fast-food and more. All together there are 826843 records.

When the application is started, the index in Elasticsearch is created and loaded with the data if it does not yet exist. So the first startup takes a little longer, the progress can be seen on the console. Within the application, these POIs are modelled by the following entity:

@Document(indexName = "foodpois")
public class FoodPOI {
    @Id
    private String id;
    @Field(type = FieldType.Text)
    private String name;
    @Field(type = FieldType.Integer)
    private Integer category;
    private GeoPoint location;
    // constructors, getter/setter left out for brevity
}

The interesting properties for this blog post are the location and the name.

The Repository

In order to search the data we need a Repository Definition:

public interface FoodPOIRepository extends ElasticsearchRepository<FoodPOI, String> {
    List<SearchHit<FoodPOI>> searchTop3By(Sort sort);
    List<SearchHit<FoodPOI>> searchTop3ByName(String name, Sort sort);
}

We have two functions defined, the first we will use to search any POI near a given point, with the second on we can search for the POIs with a name. Defining these methods in the interface is all we need as Spring Data Elasticsearch will under the hood create the implementation for these methods by analyzing the method names and parameters.

In Spring Data Elasticsearch before version 4 we could only get a List<FoodPOI> from a repository method. But now there is the SearchHit<T> class, which not only contains the entity, but also other values like a score, highlights or – what we need here – the sort value. When doing a geo-distance sort, the sort value contains the actual distance of the POI to the value we passed into the search.

The Controller

We define a REST controller, so we can call our application to get the data. The request parameters will come in a POST body that will be mapped to the following class:

public class RequestData {
    private String name;
    private double lat;
    private double lon;
    // constructors, getter/setter ...
}

The result data that will be sent to the client looks like this:

public class ResultData {
    private String name;
    private GeoPoint location;
    private Double distance;

  // constructor, gette/setter ...
}

The controller has just one method:

@RestController
@RequestMapping("/foodpois")
public class FoodPOIController {

    private final FoodPOIRepository repository;

    public FoodPOIController(FoodPOIRepository repository) {
        this.repository = repository;
    }

    @PostMapping("/nearest3")
    List<ResultData> nearest3(@RequestBody RequestData requestData) {

        GeoPoint location = new GeoPoint(requestData.getLat(), requestData.getLon());
        Sort sort = Sort.by(new GeoDistanceOrder("location", location).withUnit("km"));

        List<SearchHit<FoodPOI>> searchHits;

        if (StringUtils.hasText(requestData.getName())) {
            searchHits = repository.searchTop3ByName(requestData.getName(), sort);
        } else {
            searchHits = repository.searchTop3By(sort);
        }

        return searchHits.stream()
            .map(searchHit -> {
                Double distance = (Double) searchHit.getSortValues().get(0);
                FoodPOI foodPOI = searchHit.getContent();
                return new ResultData(foodPOI.getName(), foodPOI.getLocation(), distance);
            }).collect(Collectors.toList());
    }
}

In line 15 we create a Sort object that specifies that Elasticsearch should return the data ordered by the geographical distance to the given value which we take from the request data. Then, depending if we have a name, we call the corresponding method and get back a List<SearchHit<FoodPOI>>.

We then in the lines 27 to 29 extract the information we need from the returned objects and build our result data object.

Check the result

After starting the application we can hit the endpoint. I use curl here and pipe the output through jq to have it formatted:

$curl -X "POST" "http://localhost:8080/foodpois/nearest3" \
     -H 'Content-Type: application/json; charset=utf-8' \
     -d $'{
  "lat": 49.02,
  "lon": 8.4
}'|jq

[
  {
    "name": "Cantina Majolika",
    "location": {
      "lat": 49.0190808,
      "lon": 8.4014792
    },
    "distance": 0.14860088197123017
  },
  {
    "name": "Waldgaststätte FSSV",
    "location": {
      "lat": 49.023578,
      "lon": 8.3954656
    },
    "distance": 0.5173117164589114
  },
  {
    "name": "Hatz",
    "location": {
      "lat": 49.0155358,
      "lon": 8.3975457
    },
    "distance": 0.5276800664204232
  }
]

And the Pubs?

curl -X "POST" "http://localhost:8080/foodpois/nearest3" \
     -H 'Content-Type: application/json; charset=utf-8' \
     -d $'{
  "lat": 49.02,
  "lon": 8.4,
  "name": "pub"
}'|jq
[
  {
    "name": "Scruffy's Irish Pub",
    "location": {
      "lat": 49.0116335,
      "lon": 8.3950194
    },
    "distance": 0.998711100164643
  },
  {
    "name": "Irish Pub “Sean O'Casey's”",
    "location": {
      "lat": 49.0090639,
      "lon": 8.4028365
    },
    "distance": 1.2335132790824628
  },
  {
    "name": "Oxford Pub",
    "location": {
      "lat": 49.0086149,
      "lon": 8.4129781
    },
    "distance": 1.5806674447458173
  }
]

And that’s it

Without even needing to know how these request are sent to Elasticsearch and what Elasticsearch sends back, we can easily use these features in our Spring application. Hope you enjoyed it!