Text this: Hadoop bottleneck detection algorithm based on information gain