Michael-noll - michael-noll.com - Michael G. Noll
Latest News:
Using Avro in MapReduce jobs with Hadoop, Pig, Hive 4 Jul 2013 | 11:29 am
Apache Avro is a very popular data serialization format in the Hadoop technology stack. In this article I show code examples of MapReduce jobs in Java, Hadoop Streaming, Pig and Hive that read and/or ...
Understanding the Internal Message Buffers of Storm 22 Jun 2013 | 01:35 am
When you are optimizing the performance of your Storm topologies it helps to understand how Storm’s internal message queues are configured and put to use. In this short article I will explain and illu...
Installing and Running Graphite via RPM and Supervisord 6 Jun 2013 | 07:28 pm
When you are running distributed applications such as Hadoop or Storm a key success factor is gaining meaningful operational insights into what’s happening in your cluster environments. Graphite is p...
Multi-Node Storm Cluster Tutorial Published 28 May 2013 | 04:29 pm
This short blog post is just a reference pointer for subscribers of my RSS feed: I have published a new tutorial about how to run a multi-node, distributed Storm cluster in my Tutorials section. In t...
Reading and Writing Avro Files from the Command Line 17 Mar 2013 | 09:59 pm
Apache Avro is becoming one of the most popular data serialization formats nowadays, and this holds true particularly for Hadoop-based big data platforms because tools like Pig, Hive and of course Had...
Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node 13 Mar 2013 | 09:59 pm
In this article I describe how to install, configure and run a multi-broker Apache Kafka 0.8 (trunk) cluster on a single machine. The final setup consists of one local ZooKeeper instance and three loc...
Bootstrapping a Java project with Gradle, TestNG, Mockito and Cobertura for Eclipse and Jenkins 25 Jan 2013 | 10:59 pm
When starting out with a fresh Java project one of the nuisances you have to deal with is setting up your build and test environment. It’s even more troublesome if you are trying to switch from Maven ...
Implementing Real-Time Trending Topics With A Distributed Rolling Count Algorithm in Storm 18 Jan 2013 | 04:56 pm
A common pattern in real-time data workflows is performing rolling counts of incoming data points, also known as sliding window analysis. A typical use case for rolling counts is identifying trending ...
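A rolling count over a sliding window can be sketched by splitting the window into a fixed number of slots. This is a minimal, hypothetical sketch and not the code from the article. The class and method names (`SlidingWindowCounter`, `incrementCount`, `getCountsThenAdvanceWindow`) are illustrative assumptions. It also assumes single-threaded use, for example inside one Storm bolt task, with the window advanced by some external periodic trigger.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical slot-based sliding-window counter (illustrative only).
 * New counts go into the current "head" slot. Each call to
 * getCountsThenAdvanceWindow() reports totals over all slots, then clears
 * the oldest slot, which becomes the new head.
 */
class SlidingWindowCounter<T> {
    private final Map<T, long[]> slotCounts = new HashMap<>();
    private final int numSlots;
    private int headSlot = 0;

    public SlidingWindowCounter(int numSlots) {
        if (numSlots < 2) {
            throw new IllegalArgumentException("need at least two slots");
        }
        this.numSlots = numSlots;
    }

    /** Counts one occurrence of obj in the current (head) slot. */
    public void incrementCount(T obj) {
        slotCounts.computeIfAbsent(obj, k -> new long[numSlots])[headSlot]++;
    }

    /** Returns per-object totals over the whole window, then advances it by one slot. */
    public Map<T, Long> getCountsThenAdvanceWindow() {
        Map<T, Long> totals = new HashMap<>();
        for (Map.Entry<T, long[]> e : slotCounts.entrySet()) {
            long sum = 0;
            for (long c : e.getValue()) {
                sum += c;
            }
            totals.put(e.getKey(), sum);
        }
        // The slot right after the head holds the oldest counts; they drop out of the window.
        int tailSlot = (headSlot + 1) % numSlots;
        for (long[] counts : slotCounts.values()) {
            counts[tailSlot] = 0;
        }
        // Forget objects whose counts have all fallen to zero.
        slotCounts.values().removeIf(counts -> {
            for (long c : counts) {
                if (c != 0) {
                    return false;
                }
            }
            return true;
        });
        headSlot = tailSlot;
        return totals;
    }
}
```

With `numSlots = 2`, a count recorded before one advance is reported by that advance and the next one, then expires. The slot design keeps each advance proportional to the number of distinct objects rather than the number of events seen.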
Understanding the parallelism of a Storm topology 17 Oct 2012 | 12:53 am
In the past few days I have been test-driving Twitter’s Storm project, which is a distributed real-time data processing platform. One of my findings so far has been that the quality of Storm’s documen...