Friday, March 25, 2016

Apache Apex meetups in Bangalore

A few of my teammates from the Apache Apex team and I will be visiting Bangalore to conduct meetup sessions on 1st and 2nd April 2016.

This will be a good starting point to learn about this next-generation, Hadoop-native big data streaming engine.

Details :

Apache Apex (incubating) is an open source, next-generation big data stream processing engine for developing low-latency, high-throughput, fault-tolerant, easily operable big data applications. It allows you to ingest data from different sources and process it at scale using transformations, pre-built operators or custom business logic. This makes it a platform with a low barrier to entry for writing big data applications.

Agenda :
  • Apache Apex architectural overview, API model and salient features of the platform.
  • Live demo of developing an application from scratch using the Apache Apex APIs and running it in a Hadoop environment.
Prerequisite :
  • Knowledge of backend programming in Java (or any other language).
  • Fundamentals of computer science.
  • Prior knowledge of distributed systems, big data and the Hadoop ecosystem would be a plus, but is not mandatory.
This event is FREE. Anyone interested in the topic can join with a prior RSVP.

The same event will be repeated at different locations and timings. You can attend whichever one is convenient for you. Please RSVP on the respective link for the event that suits you.

Hosted by:





Wednesday, March 2, 2016

Windowing in Apex

Today, I delivered a talk on Windowing in Apex.

Here is the slide deck for this talk: http://www.slideshare.net/DevendraVyavahare/windowing-in-apex




It was part of the Apache Apex and Big Data Ingestion meetup @ ICC Towers, Pune. It was also webcast for overseas attendees.

It was a great experience to interact with a newer crowd. There were lots of questions from the curious audience.

I will also upload the video link once it is available. Till then, you can go through the slides and post any questions or comments you have.

Saturday, February 13, 2016

Introduction to Real-time data processing


Today I conducted a lecture on Introduction to Real-time data processing.

It was part of the Apache Hadoop & Apache Apex workshop organized by the IEEE PICT Student Branch @ PICT, Pune. It was a great pleasure to be there as a speaker. I got a teaching opportunity after a long time. Big thanks to all the organizers, sponsors and participants.

Here is the slide deck for this lecture: http://www.slideshare.net/DevendraVyavahare/batch-processing-vs-real-time-data-processing-streaming



The audience was mainly third-year engineering students from the Computer, IT, and Electronics and Telecom disciplines. I tried to keep it simple enough for beginners to understand. Some of the examples use context from India, but in general this should be a good starting point for beginners.




Saturday, January 23, 2016

String splitter performance comparison for java libraries

Context:

Recently, I came across a use case where I had to split a block of byte[] based on some delimiter.
Since this operation will be done on every record of the incoming data, and the data volume is large, I wanted to evaluate the performance of the possible options. The 3 prominent choices I came across were:

  • from the JRE: java.lang.String.split()
  • from the Guava library: com.google.common.base.Splitter.on()
  • from the apache-commons library: org.apache.commons.lang3.StringUtils.split()
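
To make the use case concrete, here is a minimal sketch of the JRE option applied to a byte[] block. It is only an illustration, not the code from the gist: the class name ByteSplitSketch, the helper splitRecord, the UTF-8 encoding and the sample record are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ByteSplitSketch {

    // Hypothetical helper: decode the block (assumed to be UTF-8 text)
    // and split it on a literal delimiter.
    static List<String> splitRecord(byte[] block, String delimiter) {
        String text = new String(block, StandardCharsets.UTF_8);
        // String.split() interprets its argument as a regex, so the
        // delimiter is quoted to be matched literally.
        // A limit of -1 keeps trailing empty fields.
        return Arrays.asList(text.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        byte[] block = "id|name|city".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitRecord(block, "|")); // prints [id, name, city]
    }
}
```

Note that String.split() treats its argument as a regular expression, which is why the delimiter is quoted here. The Guava and apache-commons options take a plain character or string instead.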

Note: 

The intention of this post is not to point out which library is better or worse than the others. The performance numbers might change from version to version.
But I want to share the code I used for the comparison so that it can be reused if you need to do a similar evaluation. Always measure the actual performance in your own environment, with the specific versions of the libraries you will be using, instead of blindly following the numbers given in some blog post.
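
If you want to try this kind of evaluation quickly, the following is a rough timing sketch that uses only the JDK. It is not the code from the gist: the class SplitTimingSketch, the helpers indexOfSplit and time, the sample record and the iteration counts are assumptions made for this example. The Guava and apache-commons splitters could be plugged in as additional Function candidates if those libraries are on your classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SplitTimingSketch {

    // Hand-rolled splitter on a single char, used here as a second candidate.
    static String[] indexOfSplit(String s, char delim) {
        List<String> out = new ArrayList<>();
        int start = 0;
        int pos;
        while ((pos = s.indexOf(delim, start)) >= 0) {
            out.add(s.substring(start, pos));
            start = pos + 1;
        }
        out.add(s.substring(start)); // last field, possibly empty
        return out.toArray(new String[0]);
    }

    // Applies the splitter `iterations` times and returns elapsed nanoseconds.
    static long time(Function<String, String[]> splitter, String input, int iterations) {
        long fields = 0;
        long begin = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            fields += splitter.apply(input).length;
        }
        long elapsed = System.nanoTime() - begin;
        // Use the accumulated count so the JIT cannot discard the work.
        if (fields < 0) {
            throw new IllegalStateException("field count overflowed");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String record = "2016-01-23,pune,apex,streaming,42,ok";
        Function<String, String[]> jre = s -> s.split(",");
        Function<String, String[]> manual = s -> indexOfSplit(s, ',');

        // Warm-up runs so that JIT compilation does not dominate the measurement.
        time(jre, record, 100_000);
        time(manual, record, 100_000);

        int iterations = 1_000_000;
        System.out.println("String.split (ns): " + time(jre, record, iterations));
        System.out.println("indexOf loop (ns): " + time(manual, record, iterations));
    }
}
```

A loop like this only gives a rough indication. Also note that the two candidates differ on trailing empty fields: String.split() with no limit drops them, while the indexOf loop keeps them. For more reliable numbers, a benchmark harness such as JMH is worth considering.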


Code: 
Java source code is available at https://gist.github.com/yogidevendra/b696fd85d89b5896b25f


References:

  1. http://stackoverflow.com/questions/11001330/java-split-string-performances
  2. http://demeranville.com/battle-of-the-tokenizers-delimited-text-parser-performance/
  3. http://thornydev.blogspot.in/2014/10/updated-microbenchmarks-for-java-string.html
  4. http://howtodoinjava.com/2014/06/02/4-ways-to-splittokenize-strings-in-java/