This article is a follow-up to Batch processing with Akka part 2 .
In the third part of this series I deal with error recovering for the case that records are lost during processing or in case the processing of a record takes too long time. The may happen for example when multiple computers are involved in the processing and network problems occur.
To handle such cases, the program checks in regular intervals if records, that have been sent into the system for processing, are processed in time. If this is not true, the records are sent again into the system. The length of the check interval and the time that is used to detect a timeout are adapted automatically during processing.
The error is simulated by the actors that are doing the processing as they are just dropping a configurable part of their messages instead of processing them.
The effort to implement this behavior its not too small, but it’s not too hard either. Care has to be taken that the records are not sent into the system too often, on the other hand the check interval must be short enough to not slow down the processing. But the result is remarkable:
Tested on a computer with 8 cores and a file with 520.000 records:
0% – 26ms
10% – 24ms
25% – 19ms
50% – 20ms
60% – 26ms
75% – 45ms
The measured times show that a drop rate of 0% causes some overhead by checking if records need to be resent. But it’s clear that when dropping 25% of the records and resending the data the system is fastest while correcting the errors. I think this comes from the fact that the actors that are dropping data are faster available for new processing than actors that do their job properly. Only if the drop rate rises above 60% the records have to be resent so often that this leads to a performance loss.