Understand Bin in Vega-Lite

Sometimes the algorithms you learned for data analytics can only be applied to specific types of data.

In this case, data transformation methods can help. They let us transform data into the format we need while keeping the properties of the original data. Binning is one of these transformations: it turns quantitative variables into categorical ones.

What is Binning in Data Visualization?

You can think of binning as a way to group data into intervals. For example, take a collection of math scores ranging from 0 to 100. Binning can split these values into 10 intervals (0-10, 10-20, 20-30, and so on), and each interval is then treated as a category.
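As a rough illustration of the math score example, the sketch below bins a handful of made-up scores into intervals of width 10 and counts how many scores fall into each interval. The score field and the sample values are assumptions for illustration only; the bin parameters used here (step and extent) are described later in this article.

```json
{
    "description": "Sketch only: the score field and its values are invented for illustration.",
    "data": {
        "values": [
            { "score": 8 }, { "score": 35 }, { "score": 42 }, { "score": 47 },
            { "score": 58 }, { "score": 63 }, { "score": 67 }, { "score": 71 },
            { "score": 88 }, { "score": 95 }
        ]
    },
    "mark": "bar",
    "encoding": {
        "x": {
            "bin": { "step": 10, "extent": [0, 100] },
            "field": "score",
            "type": "quantitative"
        },
        "y": { "aggregate": "count" }
    }
}
```

Each bar then stands for one interval. For instance, the scores 42 and 47 both land in the 40-50 interval.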

Binning can also be useful when combining data analytics with LLMs, because it compresses data while keeping its important features, such as the shape of the distribution. Instead of feeding raw data directly into an LLM (for example, 10k rows), you can pass only the binned summary (for example, 10 rows) and often get a comparable answer to distribution-level questions. This saves a lot of tokens and time.
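One possible way to produce such a compact summary inside Vega-Lite is to combine a bin transform with an aggregate transform, so that the data reaching the chart holds roughly one row per bin. This is only a sketch: the output field names hp_start, hp_end and count are hypothetical, and extracting the summarized rows and passing them to an LLM is outside what this spec does.

```json
{
    "description": "Sketch only: hp_start, hp_end and count are hypothetical output field names.",
    "data": { "url": "data/cars.json" },
    "transform": [
        { "filter": "datum.Horsepower != null" },
        { "bin": { "maxbins": 10 }, "field": "Horsepower", "as": ["hp_start", "hp_end"] },
        { "aggregate": [{ "op": "count", "as": "count" }], "groupby": ["hp_start", "hp_end"] }
    ],
    "mark": "bar",
    "encoding": {
        "x": { "bin": "binned", "field": "hp_start", "type": "quantitative", "title": "Horsepower (binned)" },
        "x2": { "field": "hp_end" },
        "y": { "field": "count", "type": "quantitative" }
    }
}
```

The filter step drops rows without a Horsepower value so that they do not form a group of their own. The bin transform and the "binned" setting used here are explained in the sections that follow.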

Binning in Encoding Field Definition

To use binning in Vega-Lite, set the bin property in an encoding field definition:

| Property | Type | Description |
| --- | --- | --- |
| bin | Boolean \| BinParams \| String \| Null | A flag for binning a quantitative field, an object defining binning parameters, or the string "binned" to indicate that the data for the x or y channel are binned before they are imported into Vega-Lite. Default: false |

If true, the default BinParams are applied.
If "binned", the data for the x (or y) channel are already binned. You can map the bin-start field to x (or y) and the bin-end field to x2 (or y2). The scale and axis are formatted in a similar way to binning in Vega-Lite. To adjust the axis ticks based on the bin step, you can also set the axis's tickMinStep property.

For example, the following spec bins the Horsepower field on the x channel:

{
    "data": {
        "url": "data/cars.json"
    },
    "mark": "bar",
    "encoding": {
        "x": {
            "bin": true,
            "field": "Horsepower",
            "type": "quantitative"
        },
        "y": {
            "aggregate": "count"
        }
    }
}

[Figures: the chart without "bin" and the chart with "bin"]

In this example, we bin the x field, and the y axis shows the number of records in each bin.

Examples

Using Ordinal Scale in Histograms

Binned fields are quantitative by default. Setting the type to ordinal instead produces a histogram with an ordinal (discrete) scale:

{
    "data": { "url": "data/movies.json" },
    "mark": "bar",
    "encoding": {
        "x": {
            "bin": true,
            "field": "IMDB Rating",
            "type": "ordinal"
        },
        "y": {
            "aggregate": "count"
        }
    }
}

Binned Color

Binning can also be useful for discretizing color scales. Vega-Lite automatically creates legends with range labels:

{
    "data": { "url": "data/cars.json" },
    "mark": "point",
    "encoding": {
        "x": { "field": "Horsepower", "type": "quantitative" },
        "y": { "field": "Miles_per_Gallon", "type": "quantitative" },
        "color": { "bin": true, "field": "Acceleration" }
    }
}

Use Pre-binned Data

If you already have binned data, set the bin property to "binned". This allows Vega-Lite to render scales and axes as if the binning was performed within Vega-Lite.

{
    "data": {
        "values": [
            { "bin_start": 0, "bin_end": 2, "count": 10 },
            { "bin_start": 2, "bin_end": 4, "count": 20 },
            { "bin_start": 4, "bin_end": 6, "count": 50 },
            { "bin_start": 6, "bin_end": 8, "count": 35 }
        ]
    },
    "mark": "bar",
    "encoding": {
        "x": {
            "bin": "binned",
            "field": "bin_start"
        },
        "x2": { "field": "bin_end" },
        "y": {
            "field": "count",
            "type": "quantitative"
        }
    }
}
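The tickMinStep property mentioned earlier can be added to the same pre-binned spec. The sketch below assumes that a tick every 2 units is wanted, which matches the bin width of the sample data; the data values are the ones from the example above.

```json
{
    "description": "Sketch only: tickMinStep is set to 2 to match the bin width of the sample data.",
    "data": {
        "values": [
            { "bin_start": 0, "bin_end": 2, "count": 10 },
            { "bin_start": 2, "bin_end": 4, "count": 20 },
            { "bin_start": 4, "bin_end": 6, "count": 50 },
            { "bin_start": 6, "bin_end": 8, "count": 35 }
        ]
    },
    "mark": "bar",
    "encoding": {
        "x": {
            "bin": "binned",
            "field": "bin_start",
            "type": "quantitative",
            "axis": { "tickMinStep": 2 }
        },
        "x2": { "field": "bin_end" },
        "y": { "field": "count", "type": "quantitative" }
    }
}
```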

How to Use Bin Transform?

Instead of using the bin property in field definitions, you can perform binning via a bin transform. This method is useful for more complex data manipulation before encoding.

The bin transform in the transform array has the following properties:

| Property | Type | Description |
| --- | --- | --- |
| bin | Boolean \| BinParams | Required. An object indicating bin properties, or simply true for using default bin parameters. |
| field | String | Required. The data field to bin. |
| as | String \| String[] | Required. The output fields at which to write the start and end bin values. This can be either a string or an array of strings with two elements denoting the names of the fields for bin start and bin end respectively. If a single string (e.g., "val") is provided, the end field will be "val_end". |

The following example uses a bin transform:

{
    "data": { "url": "data/movies.json" },
    "transform": [
        {
            "bin": true,
            "field": "IMDB Rating",
            "as": "binned rating"
        }
    ],
    "mark": "bar",
    "encoding": {
        "x": {
            "field": "binned rating",
            "title": "IMDB Rating (binned)",
            "bin": {
                "binned": true,
                "step": 1
            }
        },
        "y": { "aggregate": "count" }
    }
}
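The as property also accepts a two-element array. As a hedged variant of the example above, the sketch below names the start and end fields explicitly and maps them to x and x2. The names rating_start and rating_end are hypothetical; any two field names would do.

```json
{
    "description": "Sketch only: rating_start and rating_end are hypothetical field names.",
    "data": { "url": "data/movies.json" },
    "transform": [
        {
            "bin": true,
            "field": "IMDB Rating",
            "as": ["rating_start", "rating_end"]
        }
    ],
    "mark": "bar",
    "encoding": {
        "x": {
            "bin": "binned",
            "field": "rating_start",
            "type": "quantitative",
            "title": "IMDB Rating (binned)"
        },
        "x2": { "field": "rating_end" },
        "y": { "aggregate": "count" }
    }
}
```

Naming both fields can make later transforms easier to read than relying on the generated "_end" suffix.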

Bin parameters

When bin is true, the default binning properties are used. You can also set bin to a bin definition object, which can have the following properties:

| Property | Type | Description |
| --- | --- | --- |
| anchor | Number | A value in the binned domain at which to anchor the bins, shifting the bin boundaries if necessary to ensure that a boundary aligns with the anchor value. Default value: the minimum bin extent value |
| base | Number | The number base to use for automatic bin determination. Default value: 10 |
| divide | Number[] | Scale factors indicating allowable subdivisions. The default value is [5, 2], which indicates that for base 10 numbers (the default base), the method may consider dividing bin sizes by 5 and/or 2. For example, for an initial step size of 10, the method can check if bin sizes of 2 (= 10/5), 5 (= 10/2), or 1 (= 10/(5*2)) might also satisfy the given constraints. Default value: [5, 2] |
| extent | Array | A two-element ([min, max]) array indicating the range of desired bin values. |
| maxbins | Number | Maximum number of bins. Default value: 6 for row, column and shape channels; 10 for other channels |
| minstep | Number | A minimum allowable step size (particularly useful for integer values). |
| nice | Boolean | If true, attempts to make the bin boundaries use human-friendly boundaries, such as multiples of ten. Default value: true |
| step | Number | An exact step size to use between bins. Note: If provided, options such as maxbins will be ignored. |
| steps | Number[] | An array of allowable step sizes to choose from. |

Examples for Using Bin Parameters

Customizing Max Bins

{
    "data": { "url": "data/movies.json" },
    "mark": "bar",
    "encoding": {
        "x": {
            "bin": { "maxbins": 30 },
            "field": "IMDB Rating"
        },
        "y": { "aggregate": "count" }
    }
}

[Figures: the chart with default bins and the chart with maxbins=30]

Customizing The Step Length

{
  "data": {"url": "data/movies.json"},
  "mark": "bar",
  "encoding": {
    "x": {
      "bin": {"step": 3},
      "field": "IMDB Rating"
    },
    "y": {"aggregate": "count"}
  }
}

[Figures: the chart with default bins and the chart with step=3]
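The parameter table above describes minstep as particularly useful for integer values. As a sketch of that idea, the spec below bins the integer-valued Cylinders field from the cars dataset and asks for a step of at least 1, so that the automatic bin selection does not pick a fractional step. The choice of field and the maxbins value of 20 are assumptions made for illustration.

```json
{
    "description": "Sketch only: Cylinders and maxbins=20 are chosen for illustration.",
    "data": { "url": "data/cars.json" },
    "mark": "bar",
    "encoding": {
        "x": {
            "bin": { "maxbins": 20, "minstep": 1 },
            "field": "Cylinders",
            "type": "quantitative"
        },
        "y": { "aggregate": "count" }
    }
}
```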

FAQs

1. How do I bin values in a range?

You can use the extent parameter of the bin definition. extent is a two-element [min, max] array indicating the desired range of bin values.
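A minimal sketch of such a spec, assuming the movies dataset used earlier and a range of 4 to 8, might look like this. How values outside the range are handled is not covered here, so an explicit filter transform may be needed if they must be excluded.

```json
{
    "description": "Sketch only: bins IMDB Rating with the bin range set to [4, 8].",
    "data": { "url": "data/movies.json" },
    "mark": "bar",
    "encoding": {
        "x": {
            "bin": { "extent": [4, 8] },
            "field": "IMDB Rating",
            "type": "quantitative"
        },
        "y": { "aggregate": "count" }
    }
}
```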

[Figures: the chart with default bins and the chart with extent=[4, 8]]

In this example, we limited the bins to IMDB ratings in the range from 4 to 8.

2. How do I get more refined bins if I have a large amount of data?

You can raise the maxbins parameter so that a large amount of data is split into more, narrower bins:

[Figures: the chart with default bins and the chart with maxbins=30]

In this example, we used a larger maxbins value than the default (10), so the chart shows more detail.