Batch processing with Akka part 2
This article is the follow-up to Batch processing with Akka part 1.
This first implementation does not deal with error recovery; there is no special exception or error handling.
The records are sent into the actor system, where they are processed in parallel and written to the output file at the end. The actor system runs in a single VM; for now there is no load balancing across several computers.
The Akka implementation has the following features:
- The number of messages that are processed simultaneously in the system is limited by a configuration entry. This prevents the system from being flooded with messages.
- As creating objects from CSV lines does not cause a significant processor load, a special dummy processing step has been implemented. In about 50 percent of the cases it calls Thread.sleep(1); in the other 50 percent it calculates the 1000th Fibonacci number. This simulates blocking processing as well as CPU-consuming work.
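The limit on simultaneously processed messages from the first feature can be illustrated without Akka. The following is only a minimal sketch, assuming a plain java.util.concurrent Semaphore and a fixed thread pool as stand-ins for the actor system. The class BoundedInFlight, the method processAll and the parameters maxInFlight and workers are hypothetical names chosen for this illustration; they are not taken from the original program, which reads its limit from a configuration entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical stand-in for the in-flight limit; this is not the article's Akka code.
class BoundedInFlight {

    /**
     * Hands all records to a worker pool, never allowing more than
     * maxInFlight records to be submitted but unfinished at the same time.
     * Returns the highest in-flight count that was observed.
     */
    static int processAll(List<String> records, int maxInFlight, int workers,
                          Consumer<String> work) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (String record : records) {
                permits.acquire(); // the producer blocks here once the limit is reached
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                pool.execute(() -> {
                    try {
                        work.accept(record);
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // frees a slot for the next record
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            pool.shutdownNow();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("record-" + i);
        }
        AtomicInteger done = new AtomicInteger();
        int peak = processAll(records, 8, 4, r -> done.incrementAndGet());
        System.out.println("processed=" + done.get() + ", peak in flight=" + peak);
    }
}
```

The producer is slowed down to the pace of the workers, so the queue of pending records can never grow beyond the configured limit, no matter how large the input file is.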
The same dummy processing is used in a second program that processes the records sequentially, without using threads.
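The dummy processing described above might look roughly like the following sketch. It is an illustration under assumptions, not the code of the original program: the class and method names are hypothetical, the 50/50 split is assumed to be a random choice per record, and BigInteger is used because the 1000th Fibonacci number does not fit into a long.

```java
import java.math.BigInteger;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the dummy processing; names are not from the original program.
class DummyProcessing {

    /** Iteratively computes the n-th Fibonacci number, with F(0) = 0 and F(1) = 1. */
    static BigInteger fibonacci(int n) {
        BigInteger a = BigInteger.ZERO;
        BigInteger b = BigInteger.ONE;
        for (int i = 0; i < n; i++) {
            BigInteger next = a.add(b);
            a = b;
            b = next;
        }
        return a;
    }

    /** Simulates the work for one record: either blocking or CPU-bound. */
    static void process(boolean blocking) {
        if (blocking) {
            try {
                Thread.sleep(1); // simulates blocking work such as I/O
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        } else {
            fibonacci(1000); // simulates CPU-consuming work
        }
    }

    /** Assumption: each record picks one of the two branches at random. */
    static void process() {
        process(ThreadLocalRandom.current().nextBoolean());
    }

    public static void main(String[] args) {
        // Sequential variant: one record after the other, no additional threads.
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            process();
        }
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("1000 records processed sequentially in " + millis + " ms");
    }
}
```

Because the same process() call can be used from a sequential loop and from concurrently running workers, both programs do the same amount of simulated work per record and their run times stay comparable.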
Test runs of the sequential and the Akka-based program with a file of 520,000 records on two different computers yielded the following times:
computer 1: 2 processors:
sequential processing: 7min, 18sec
parallel processing: 1min, 9sec
speed-factor: 6.4
computer 2: 8 processors:
sequential processing: 5min, 27sec
parallel processing: 15sec
speed-factor: 22.4
Observing the threads during processing shows that Akka in its standard configuration uses about three times the number of cores as its thread count, which means about 6 threads on computer 1 and about 24 on computer 2. In this simple program, the number of threads is therefore about the same as the speed factor gained by parallel processing.
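The relation between thread count and speed factor can be checked with a few lines of arithmetic. The sketch below uses hypothetical names and the times as printed above, rounded to full seconds. With these rounded times the factors come out at about 6.3 and 21.8, slightly below the reported 6.4 and 22.4; presumably the reported values were calculated from the unrounded measurements, but that is an assumption.

```java
import java.util.Locale;

// Hypothetical helper for cross-checking the numbers in this article.
class SpeedFactor {

    /** Speed factor = sequential run time divided by parallel run time. */
    static double speedFactor(int sequentialSeconds, int parallelSeconds) {
        return (double) sequentialSeconds / parallelSeconds;
    }

    /** Thread count as observed in the article: three threads per core. */
    static int observedThreads(int cores) {
        return 3 * cores;
    }

    public static void main(String[] args) {
        // computer 1: 2 processors, 7 min 18 s sequential, 1 min 9 s parallel
        System.out.println(String.format(Locale.ROOT, "computer 1: factor %.1f, threads %d",
                speedFactor(7 * 60 + 18, 60 + 9), observedThreads(2)));
        // computer 2: 8 processors, 5 min 27 s sequential, 15 s parallel
        System.out.println(String.format(Locale.ROOT, "computer 2: factor %.1f, threads %d",
                speedFactor(5 * 60 + 27, 15), observedThreads(8)));
    }
}
```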
About program complexity: adopting the actor model means a little more work for modelling the actors and messages compared to the sequential version. But this is more than paid off by the speed gain, and so far load balancing across several computers has not even been taken into account.
A very important point is that the program code does not contain any thread or synchronisation elements. All of this is provided by Akka in the background, which prevents problems that stem from incorrect synchronisation.