Saturday, January 23, 2016

String splitter performance comparison for Java libraries

Context:

Recently, I came across a use case where I had to split a block of byte[] based on some delimiter.
Since this operation would run on every record of a large volume of incoming data, I wanted to evaluate the performance of the possible options. The three prominent choices I came across were:

  • from the JRE: java.lang.String.split()
  • from the Guava library: com.google.common.base.Splitter.on()
  • from the Apache Commons Lang library: org.apache.commons.lang3.StringUtils.split()
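
To make the use case concrete, here is a minimal sketch that uses only the JRE option: it decodes a byte[] block and splits it on a delimiter. The comma delimiter, the UTF-8 encoding, the sample record, and the class name are assumptions made for this example. The Guava and Commons Lang calls appear only as comments, since they need those libraries on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class SplitExample {

    /** Decodes a block of bytes as UTF-8 (an assumption) and splits it on commas. */
    static List<String> splitBlock(byte[] block) {
        String record = new String(block, StandardCharsets.UTF_8);
        // String.split() takes a regular expression; a comma has no special
        // meaning in a regex, so it can be passed as-is.
        return Arrays.asList(record.split(","));
    }

    public static void main(String[] args) {
        byte[] block = "id,name,city".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitBlock(block)); // prints [id, name, city]

        // Roughly equivalent calls in the other two libraries
        // (not compiled here, they need the jars on the classpath):
        //   Guava:        Splitter.on(',').splitToList(record)
        //   Commons Lang: StringUtils.split(record, ',')
    }
}
```

Keep in mind that the three calls are not exact drop-in replacements for each other: String.split() drops trailing empty strings, StringUtils.split() treats adjacent separators as one separator, and Guava's Splitter keeps empty strings unless omitEmptyStrings() is used. Check that the outputs match for your data before comparing timings.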

Note: 

The intention of this blog post is not to point out which library is better or worse than the others. The performance numbers might change from version to version.
However, I want to share the code I used for the comparison so that it can be reused if you need to do a similar evaluation. Always measure the actual performance in your own environment, with the specific versions of the libraries you will be using, instead of blindly following the numbers given in some blog post.


Code: 
Java source code is available at https://gist.github.com/yogidevendra/b696fd85d89b5896b25f
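
If you just want a quick starting point for your own measurements, the following is a rough sketch of a timing loop. It is not the code from the gist. The class name, the method name timeSplitter, the sample record, and the iteration counts are all hypothetical. Only the JRE splitter is wired in so that the sketch runs without extra dependencies; the Guava and Commons Lang variants can be passed in the same way as lambdas.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class SplitterBenchmark {

    /**
     * Applies the splitter to the record `iterations` times and returns the
     * elapsed time in nanoseconds.
     */
    static long timeSplitter(Function<String, List<String>> splitter,
                             String record, int iterations) {
        long tokens = 0; // use the results so the JIT cannot discard the work
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            tokens += splitter.apply(record).size();
        }
        long elapsed = System.nanoTime() - start;
        if (tokens < 0) {
            throw new IllegalStateException("token count overflowed");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String record = "2016-01-23,user42,login,success,10.0.0.1"; // sample data
        Function<String, List<String>> jreSplit = s -> Arrays.asList(s.split(","));

        // Warm up first so the JIT compiler has optimized the code path
        // before the measured run.
        timeSplitter(jreSplit, record, 1_000_000);

        int iterations = 5_000_000;
        long nanos = timeSplitter(jreSplit, record, iterations);
        System.out.printf("String.split(): %.1f ns per call%n",
                (double) nanos / iterations);
    }
}
```

A hand-rolled loop like this only gives a rough indication. For more trustworthy numbers, a dedicated harness such as JMH is the better tool, since it takes care of warm-up, forking, and dead-code elimination for you.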


References:

  1. http://stackoverflow.com/questions/11001330/java-split-string-performances
  2. http://demeranville.com/battle-of-the-tokenizers-delimited-text-parser-performance/
  3. http://thornydev.blogspot.in/2014/10/updated-microbenchmarks-for-java-string.html
  4. http://howtodoinjava.com/2014/06/02/4-ways-to-splittokenize-strings-in-java/
