Class Histogram<B extends Comparable<B>>

java.lang.Object
nl.colorize.util.stats.Histogram<B>
Type Parameters:
B - Type of the bins within the histogram.

public class Histogram<B extends Comparable<B>> extends Object
Data structure for histograms, which describe the distribution of a numerical data set. The histogram consists of bins and series.

Bins act as intervals or “buckets” for categorizing values. Bins can be of any type, but must implement the Comparable interface, which is used to determine the order of the bins within the histogram. It is possible to add bins with a frequency of zero, for cases where the histogram needs to depict the entire range of bins.

Series can be used to provide more information on different categories of data that contribute towards the overall frequency within each bin. Series are purely descriptive text labels, and are therefore always of type String. Unlike bins, series do not have an explicit order: they are sorted based on their overall frequency within the data set.
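The ordering rules for bins and series can be illustrated with standard collections. The following is a minimal sketch, not the library's implementation: OrderingSketch and its methods are hypothetical names, the nested-map layout is an assumption, and the alphabetical tie-break for series is also an assumption because the documentation does not say how ties are handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in that mimics the documented ordering rules.
// It is NOT the nl.colorize.util.stats.Histogram implementation.
class OrderingSketch<B extends Comparable<B>> {

    // A TreeMap keeps the bins sorted by their natural (Comparable) order.
    private final Map<B, Map<String, Integer>> data = new TreeMap<>();

    void count(B bin, String series, int value) {
        data.computeIfAbsent(bin, key -> new HashMap<>())
            .merge(series, value, Integer::sum);
    }

    List<B> bins() {
        return new ArrayList<>(data.keySet());
    }

    // Series have no explicit order, so they are sorted by their overall
    // frequency across all bins, largest first. Ties are broken
    // alphabetically (an assumption made for this sketch).
    List<String> series() {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> perSeries : data.values()) {
            perSeries.forEach((name, freq) -> totals.merge(name, freq, Integer::sum));
        }
        List<String> names = new ArrayList<>(totals.keySet());
        names.sort((x, y) -> {
            int byFrequency = Integer.compare(totals.get(y), totals.get(x));
            return byFrequency != 0 ? byFrequency : x.compareTo(y);
        });
        return names;
    }

    static OrderingSketch<Integer> example() {
        OrderingSketch<Integer> sketch = new OrderingSketch<>();
        sketch.count(20, "b", 1);
        sketch.count(10, "a", 1);
        sketch.count(10, "b", 2);
        sketch.count(30, "c", 0); // a bin with a frequency of zero
        return sketch;
    }
}
```

In this sketch, example().bins() yields [10, 20, 30] regardless of insertion order, and example().series() yields [b, a, c] because "b" has the largest overall frequency.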

The following example shows a histogram with multiple bins and multiple series, presented in glorious ASCII art:

   [2]
   [1]     [2]
   [1]     [1]      [1]
   -------------------------------
   0-10    11-20    21-30    31-40
 

This class is not thread-safe. Histogram instances should therefore not be used concurrently from multiple threads.
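If one instance does have to be shared between threads, the caller needs to provide the synchronization. A minimal sketch of one common approach, guarding every access with a single lock, is shown below. A plain HashMap stands in for the histogram so that the example runs without the library, and GuardedCounter is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: all access to the non-thread-safe state goes
// through synchronized methods, so only one thread touches it at a time.
class GuardedCounter {

    // Stand-in for a Histogram instance (an assumption for illustration).
    private final Map<String, Integer> frequencies = new HashMap<>();

    synchronized void count(String bin) {
        frequencies.merge(bin, 1, Integer::sum);
    }

    synchronized int get(String bin) {
        return frequencies.getOrDefault(bin, 0);
    }

    // Starts several threads that all count into the same bin, waits for
    // them to finish, and returns the resulting frequency.
    static int runDemo(int threads, int countsPerThread) {
        GuardedCounter shared = new GuardedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < countsPerThread; j++) {
                    shared.count("0-10");
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        return shared.get("0-10");
    }
}
```

With the lock in place, runDemo(4, 1000) returns 4000. An alternative that avoids locking is to give each thread its own instance and combine the results afterwards.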

  • Constructor Summary

    Constructors
    Constructor
    Description
    Histogram()
    Creates a new histogram that is initially empty.
    Histogram(List<B> initialBins)
    Creates a new histogram that consists of the specified bins.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    count(B bin, String series)
    Adds the specified frequency to this histogram.
    void
    count(B bin, String series, int value)
    Adds the specified frequency to this histogram.
    Map<String,Integer>
    getBinFrequency(B bin)
    Returns a map containing all series and corresponding frequency that exist in the specified bin.
    List<B>
    getBins()
    Returns a list of all bins in this histogram.
    int
    getBinTotal(B bin)
    Returns the total frequency for the specified bin, combining the frequencies of all series that are included in that bin.
    int
    getFrequency(B bin, String series)
    Returns the frequency count for the specified bin and series.
    List<String>
    getSeries()
    Returns a list of all series in this histogram.
    TupleList<B,Integer>
    getSeriesFrequency(String series)
    Returns a list of tuples for all bins in this histogram, with each tuple consisting of the bin and the corresponding frequency for the specified series.
    Map<String,Float>
    getSeriesPercentages()
    Returns a map containing the total frequency for all series in this histogram, but normalized to percentages instead of the absolute numbers.
    int
    getSeriesTotal(String series)
    Returns the total frequency for the specified series, combining all bins in which the series might exist.
    Map<String,Integer>
    getSeriesTotals()
    Returns a map containing the total frequency for all series in this histogram.
    int
    getTotal()
    Returns the combined total frequency of all data in this histogram.
    void
    merge(Histogram<B> other)
    Adds all data from the specified other histogram to this histogram.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Histogram

      public Histogram()
      Creates a new histogram that is initially empty. Bins will be added on-the-fly as data is added to the histogram.
    • Histogram

      public Histogram(List<B> initialBins)
      Creates a new histogram that consists of the specified bins. This can be used in situations where all bins are known up front, or when the histogram must always depict all possible bins.
  • Method Details

    • count

      public void count(B bin, String series)
      Adds the specified frequency to this histogram. The requested bin and/or series are added to this histogram if they do not yet exist.
    • count

      public void count(B bin, String series, int value)
      Adds the specified frequency to this histogram. The requested bin and/or series are added to this histogram if they do not yet exist.
      Throws:
      IllegalArgumentException - when trying to add a negative frequency. Note that adding zero is allowed: this adds the bin if it does not exist yet, without adding a frequency to the bin.
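The argument rules for count can be sketched as follows. This is an illustrative stand-in built on standard collections, not the library's source: CountSketch, hasBin, and frequency are hypothetical names, and the nested-map layout is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in that mimics the documented argument rules.
class CountSketch {

    private final Map<String, Map<String, Integer>> data = new TreeMap<>();

    void count(String bin, String series, int value) {
        // Negative frequencies are rejected, zero is accepted.
        if (value < 0) {
            throw new IllegalArgumentException("Negative frequency: " + value);
        }
        // The bin and series are created on demand. Adding zero registers
        // them without changing any frequency.
        data.computeIfAbsent(bin, key -> new HashMap<>())
            .merge(series, value, Integer::sum);
    }

    boolean hasBin(String bin) {
        return data.containsKey(bin);
    }

    // Returns zero when the bin and/or series do not exist.
    int frequency(String bin, String series) {
        return data
            .getOrDefault(bin, Collections.<String, Integer>emptyMap())
            .getOrDefault(series, 0);
    }
}
```

After calling count("0-10", "a", 0) on a new CountSketch, hasBin("0-10") is true while frequency("0-10", "a") is still 0. Calling count("0-10", "a", -1) throws an IllegalArgumentException.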
    • merge

      public void merge(Histogram<B> other)
      Adds all data from the specified other histogram to this histogram. This includes any bins and/or series that are not yet present in this histogram.
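The merge semantics described above can be sketched with nested maps. This is a hedged illustration rather than the library's code: MergeSketch and mergeInto are hypothetical names, and the representation of a histogram as a map of bins to per-series frequencies is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that mimics the documented merge behavior.
class MergeSketch {

    // Adds every bin, series, and frequency from "other" to "target".
    // Bins and series that are missing in "target" are created.
    static <B extends Comparable<B>> void mergeInto(
            Map<B, Map<String, Integer>> target,
            Map<B, Map<String, Integer>> other) {
        other.forEach((bin, perSeries) -> {
            Map<String, Integer> targetSeries =
                target.computeIfAbsent(bin, key -> new HashMap<>());
            perSeries.forEach((series, freq) ->
                targetSeries.merge(series, freq, Integer::sum));
        });
    }

    static Map<Integer, Map<String, Integer>> example() {
        Map<Integer, Map<String, Integer>> first = new TreeMap<>();
        first.computeIfAbsent(10, key -> new HashMap<>()).put("a", 2);

        Map<Integer, Map<String, Integer>> second = new TreeMap<>();
        second.computeIfAbsent(10, key -> new HashMap<>()).put("a", 1);
        second.computeIfAbsent(20, key -> new HashMap<>()).put("b", 4);

        mergeInto(first, second);
        return first; // {10={a=3}, 20={b=4}}
    }
}
```

The existing frequency for bin 10 and series "a" is increased, while bin 20 and series "b" are added because they were not yet present in the target.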
    • getBins

      public List<B> getBins()
      Returns a list of all bins in this histogram. The bins are sorted based on their natural order, i.e. based on the Comparable interface.
    • getSeries

      public List<String> getSeries()
      Returns a list of all series in this histogram. The series are ordered based on overall frequency, with the largest series becoming the first element in the list.
    • getFrequency

      public int getFrequency(B bin, String series)
      Returns the frequency count for the specified bin and series. Returns zero if the bin and/or series do not exist in this histogram.
    • getBinFrequency

      public Map<String,Integer> getBinFrequency(B bin)
      Returns a map containing all series and corresponding frequency that exist in the specified bin. The iteration order of the map is based on series frequency, with the most common series first. Returns an empty map if no such bin exists.
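A map with a frequency-based iteration order can be produced with a LinkedHashMap. The sketch below assumes that the order is determined by each series' frequency within the bin, which the description does not state explicitly. BinFrequencySketch and orderByFrequency are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: orders one bin's series by frequency.
class BinFrequencySketch {

    static Map<String, Integer> orderByFrequency(Map<String, Integer> perSeries) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(perSeries.entrySet());
        // Highest frequency first.
        entries.sort((x, y) -> Integer.compare(y.getValue(), x.getValue()));

        // A LinkedHashMap iterates in insertion order, which preserves
        // the sorted order established above.
        Map<String, Integer> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries) {
            ordered.put(entry.getKey(), entry.getValue());
        }
        return ordered;
    }

    static Map<String, Integer> example() {
        Map<String, Integer> perSeries = new HashMap<>();
        perSeries.put("a", 1);
        perSeries.put("b", 3);
        perSeries.put("c", 2);
        return orderByFrequency(perSeries); // iterates as b, c, a
    }
}
```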
    • getSeriesFrequency

      public TupleList<B,Integer> getSeriesFrequency(String series)
      Returns a list of tuples for all bins in this histogram, with each tuple consisting of the bin and the corresponding frequency for the specified series. The frequency will be zero if no such series exists.
    • getBinTotal

      public int getBinTotal(B bin)
      Returns the total frequency for the specified bin, combining the frequencies of all series that are included in that bin. Returns zero if no such bin exists in this histogram.
    • getSeriesTotal

      public int getSeriesTotal(String series)
      Returns the total frequency for the specified series, combining all bins in which the series might exist. Returns zero if no such series exists in this histogram.
    • getSeriesTotals

      public Map<String,Integer> getSeriesTotals()
      Returns a map containing the total frequency for all series in this histogram. The iteration order of the map will match getSeries().
    • getSeriesPercentages

      public Map<String,Float> getSeriesPercentages()
      Returns a map containing the total frequency for all series in this histogram, but normalized to percentages instead of the absolute numbers. The iteration order of the map will match getSeries(). Use getSeriesTotals() if you need the absolute numbers.
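The normalization step can be sketched as shown below. Two details are assumptions, since the description does not specify them: percentages are expressed on a 0 to 100 scale, and an empty data set yields 0 for every series. PercentageSketch and toPercentages are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: converts absolute series totals to percentages.
class PercentageSketch {

    static Map<String, Float> toPercentages(Map<String, Integer> totals) {
        int total = 0;
        for (int value : totals.values()) {
            total += value;
        }

        // The LinkedHashMap keeps the iteration order of the input map.
        Map<String, Float> percentages = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            // Assumption: 0-100 scale, and 0 when there is no data at all.
            float percentage = total == 0 ? 0f : entry.getValue() * 100f / total;
            percentages.put(entry.getKey(), percentage);
        }
        return percentages;
    }

    static Map<String, Float> example() {
        Map<String, Integer> totals = new LinkedHashMap<>();
        totals.put("a", 3);
        totals.put("b", 1);
        return toPercentages(totals); // {a=75.0, b=25.0}
    }
}
```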
    • getTotal

      public int getTotal()
      Returns the combined total frequency of all data in this histogram. This number will match both the sum of all bins and the sum of all series.
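The statement that the total matches both sums follows from every frequency belonging to exactly one bin and exactly one series. A minimal sketch that checks this on nested maps is shown below. TotalSketch and its methods are hypothetical names, and the nested-map layout is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: sums the same data per bin and per series.
class TotalSketch {

    static int sumOfBinTotals(Map<Integer, Map<String, Integer>> data) {
        int total = 0;
        for (Map<String, Integer> perSeries : data.values()) {
            int binTotal = 0;
            for (int freq : perSeries.values()) {
                binTotal += freq;
            }
            total += binTotal;
        }
        return total;
    }

    static int sumOfSeriesTotals(Map<Integer, Map<String, Integer>> data) {
        // First collect the total per series across all bins.
        Map<String, Integer> seriesTotals = new HashMap<>();
        for (Map<String, Integer> perSeries : data.values()) {
            perSeries.forEach((series, freq) ->
                seriesTotals.merge(series, freq, Integer::sum));
        }
        int total = 0;
        for (int seriesTotal : seriesTotals.values()) {
            total += seriesTotal;
        }
        return total;
    }

    static Map<Integer, Map<String, Integer>> example() {
        Map<Integer, Map<String, Integer>> data = new TreeMap<>();
        data.computeIfAbsent(10, key -> new HashMap<>()).put("a", 1);
        data.computeIfAbsent(10, key -> new HashMap<>()).put("b", 2);
        data.computeIfAbsent(20, key -> new HashMap<>()).put("a", 3);
        return data;
    }
}
```

For the example data, the bin totals are 3 and 3, the series totals are 4 for "a" and 2 for "b", and both sums equal 6.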