Understand Bin in Vega-Lite
Sometimes the algorithms you have learned for data analytics can only be applied to specific types of data.
In such cases, data transformation can help: it converts data into the format we need while preserving the properties of the original data.
Bin is one of these transformations. It lets us turn quantitative variables into categorical ones.
What is Binning in Data Visualization?
You can think of binning as a way to group data into intervals. For example, take a collection of math scores ranging from 0 to 100. Binning can divide the values into 10 intervals (0-10, 10-20, 20-30, and so on), and each interval is then treated as a category.
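To make the math score example concrete, here is a minimal sketch of a Vega-Lite spec that groups scores into intervals of width 10. The score field and its sample values are invented for illustration, and the bin syntax itself is explained in the sections that follow.

```json
{
  "description": "Illustrative sketch: the score field and its values are made up.",
  "data": {
    "values": [
      {"score": 35}, {"score": 42}, {"score": 47}, {"score": 58},
      {"score": 61}, {"score": 66}, {"score": 69}, {"score": 73},
      {"score": 78}, {"score": 84}, {"score": 91}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {
      "bin": {"step": 10},
      "field": "score",
      "type": "quantitative"
    },
    "y": {"aggregate": "count"}
  }
}
```

Each bar then stands for one interval, and its height is the number of scores that fall into it.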
Binning can also be very useful when you combine data analytics with LLMs, because it compresses data while keeping its important features. Instead of feeding raw data directly into an LLM (say, 10k rows), you can bin the data first and pass only about 10 rows that summarize the distribution, often with a comparable result. This saves the LLM a lot of tokens and time.
Binning in Encoding Field Definition
To use bin in Vega-Lite, you can set the bin property directly in an encoding field definition:
Property | Type | Description |
---|---|---|
bin | Boolean, BinParams, String, or Null | A flag for binning a quantitative field, an object defining binning parameters, or a string ("binned") indicating that the data for the x or y channel are binned before they are imported into Vega-Lite. If true, default BinParams will be applied. If "binned", this indicates that the data for the x (or y) channel are already binned. You can map the bin-start field to x (or y) and the bin-end field to x2 (or y2). The scale and axis will be formatted similar to binning in Vega-Lite. To adjust the axis ticks based on the bin step, you can also set the axis's tickMinStep property. Default: false |
For example, the following spec bins the Horsepower field on the x channel:
{
  "data": {
    "url": "data/cars.json"
  },
  "mark": "bar",
  "encoding": {
    "x": {
      "bin": true,
      "field": "Horsepower",
      "type": "quantitative"
    },
    "y": {
      "aggregate": "count"
    }
  }
}
In this example, we bin the field on the x channel (Horsepower), and the y axis counts the number of records in each bin.
Examples
Using Ordinal Scale in Histograms
While binned fields are quantitative by default, setting the type to ordinal produces a histogram with an ordinal scale, where each bin is treated as a discrete category:
{
  "data": { "url": "data/movies.json" },
  "mark": "bar",
  "encoding": {
    "x": {
      "bin": true,
      "field": "IMDB Rating",
      "type": "ordinal"
    },
    "y": {
      "aggregate": "count"
    }
  }
}
Binned Color
Binning can also be useful for discretizing color scales. Vega-Lite automatically creates legends with range labels:
{
  "data": { "url": "data/cars.json" },
  "mark": "point",
  "encoding": {
    "x": { "field": "Horsepower", "type": "quantitative" },
    "y": { "field": "Miles_per_Gallon", "type": "quantitative" },
    "color": { "bin": true, "field": "Acceleration" }
  }
}
Use Pre-binned Data
If you already have binned data, set the bin property to "binned". This lets Vega-Lite render scales and axes as if the binning had been performed within Vega-Lite.
{
  "data": {
    "values": [
      { "bin_start": 0, "bin_end": 2, "count": 10 },
      { "bin_start": 2, "bin_end": 4, "count": 20 },
      { "bin_start": 4, "bin_end": 6, "count": 50 },
      { "bin_start": 6, "bin_end": 8, "count": 35 }
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {
      "bin": "binned",
      "field": "bin_start"
    },
    "x2": { "field": "bin_end" },
    "y": {
      "field": "count",
      "type": "quantitative"
    }
  }
}
How to Use Bin Transform?
Instead of using the bin property in field definitions, you can perform binning via a bin transform. This method is useful for more complex data manipulation before encoding.
The bin transform in the transform array has the following properties:
Property | Type | Description |
---|---|---|
bin | Boolean or BinParams | Required. An object indicating bin properties, or simply true for using default bin parameters. |
field | String | Required. The data field to bin. |
as | String or String[] | Required. The output fields at which to write the start and end bin values. This can be either a string or an array of strings with two elements denoting the name for the fields for bin start and bin end respectively. If a single string (e.g., "val") is provided, the end field will be "val_end". |
The following example uses a bin transform:
{
  "data": { "url": "data/movies.json" },
  "transform": [
    {
      "bin": true,
      "field": "IMDB Rating",
      "as": "binned rating"
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {
      "field": "binned rating",
      "title": "IMDB Rating (binned)",
      "bin": {
        "binned": true,
        "step": 1
      }
    },
    "y": { "aggregate": "count" }
  }
}
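Because the bin transform writes its results into ordinary fields, later transforms can build on them. The sketch below is one possible way to do that, not the only one: it uses the two-element array form of as and then an aggregate transform to reduce the dataset to one row per bin. The field names rating_start, rating_end, and num_movies are illustrative choices rather than names from the dataset.

```json
{
  "description": "Sketch: rating_start, rating_end and num_movies are illustrative field names.",
  "data": { "url": "data/movies.json" },
  "transform": [
    {
      "bin": true,
      "field": "IMDB Rating",
      "as": ["rating_start", "rating_end"]
    },
    {
      "aggregate": [{ "op": "count", "as": "num_movies" }],
      "groupby": ["rating_start", "rating_end"]
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {
      "bin": "binned",
      "field": "rating_start",
      "type": "quantitative",
      "title": "IMDB Rating (binned)"
    },
    "x2": { "field": "rating_end" },
    "y": { "field": "num_movies", "type": "quantitative" }
  }
}
```

After the two transforms, the data holds one row per bin (start, end, and count) instead of one row per movie. A compact summary of this kind is also what you would pass on when the raw rows are too many to share.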
Bin Parameters
When bin is true, default binning parameters are used. You can also set bin to a bin definition object, which can have the following properties:
Property | Type | Description |
---|---|---|
anchor | Number | A value in the binned domain at which to anchor the bins, shifting the bin boundaries if necessary to ensure that a boundary aligns with the anchor value. Default value: the minimum bin extent value |
base | Number | The number base to use for automatic bin determination (default is base 10). Default value: 10 |
divide | Number[] | Scale factors indicating allowable subdivisions. The default value is [5, 2], which indicates that for base 10 numbers (the default base), the method may consider dividing bin sizes by 5 and/or 2. For example, for an initial step size of 10, the method can check if bin sizes of 2 (= 10/5), 5 (= 10/2), or 1 (= 10/(5*2)) might also satisfy the given constraints. Default value: [5, 2] |
extent | Array | A two-element ([min, max]) array indicating the range of desired bin values. |
maxbins | Number | Maximum number of bins. Default value: 6 for row, column, and shape channels; 10 for other channels |
minstep | Number | A minimum allowable step size (particularly useful for integer values). |
nice | Boolean | If true, attempts to make the bin boundaries use human-friendly boundaries, such as multiples of ten. Default value: true |
step | Number | An exact step size to use between bins. Note: If provided, options such as maxbins will be ignored. |
steps | Number[] | An array of allowable step sizes to choose from. |
Examples for Using Bin Parameters
Customizing Max Bins
{
  "data": { "url": "data/movies.json" },
  "mark": "bar",
  "encoding": {
    "x": {
      "bin": { "maxbins": 30 },
      "field": "IMDB Rating"
    },
    "y": { "aggregate": "count" }
  }
}
Customizing The Step Length
{
  "data": { "url": "data/movies.json" },
  "mark": "bar",
  "encoding": {
    "x": {
      "bin": { "step": 3 },
      "field": "IMDB Rating"
    },
    "y": { "aggregate": "count" }
  }
}
FAQs
1. How do I bin values in a range?
You can use the extent parameter of the bin definition. extent is a two-element [min, max] array indicating the desired range of bin values. For example, setting extent to [4, 8] limits the binned IMDB ratings to the range from 4 to 8.
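As a sketch of the extent approach, the spec below reuses the movies dataset from the earlier examples and sets extent to [4, 8] on the binned x field. The choice of dataset and field is an assumption carried over from those examples.

```json
{
  "description": "Sketch: bins for IMDB Rating are computed over the range 4 to 8.",
  "data": { "url": "data/movies.json" },
  "mark": "bar",
  "encoding": {
    "x": {
      "bin": { "extent": [4, 8] },
      "field": "IMDB Rating"
    },
    "y": { "aggregate": "count" }
  }
}
```

If values outside the range must be excluded from the data entirely, adding an explicit filter transform before binning is the more predictable option.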
2. How do I get more refined bins if I have a large amount of data?
You can increase the maxbins parameter to get finer bins for a large dataset. For example, setting maxbins to 30, well above the default of 10, produces narrower bins and a more detailed chart, as in the Customizing Max Bins example above.