What Are the Different Data Types in Vega-Lite and How to Use Them?

When working with Vega-Lite, it's crucial to understand the types of data you’re dealing with. Each field in your dataset has a type that tells Vega-Lite how to handle it. This type is based on the field's level of measurement. Let's look at each data type (quantitative, temporal, ordinal, nominal, and geojson) and how to use it effectively.

Quantitative Data: What Is It and How to Visualize It?

Quantitative data is all about numbers. Whether you’re plotting sales figures, temperatures, or distances, if it’s numerical, it’s quantitative. Examples include 7.3, 42.0, or 12.1.

Why Use Quantitative Data?

Quantitative data is perfect for showing quantities and their relationships. By default, Vega-Lite includes zero in the scales of quantitative fields on channels such as x, y, and size. That default suits ratio data, where zero means "none", but it can flatten the variation in interval data such as temperatures. Set the scale's zero property to false if the axis should not be forced to start at zero.

Example

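Imagine you have temperature recordings for a working week. Vega-Lite sorts discrete fields in ascending order by default, which would put the day names in alphabetical order, so the x encoding sets "sort": null to keep the days in data order:

{
  "data": {"values": [
    {"day": "Monday", "temperature": 20},
    {"day": "Tuesday", "temperature": 22},
    {"day": "Wednesday", "temperature": 19},
    {"day": "Thursday", "temperature": 24},
    {"day": "Friday", "temperature": 23}
  ]},
  "mark": "line",
  "encoding": {
    "x": {"field": "day", "type": "ordinal", "sort": null},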
    "y": {"field": "temperature", "type": "quantitative"}
  }
}
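
As a minimal sketch of the zero property mentioned above, the same temperature data can be plotted with the y scale no longer forced to start at zero, so the day-to-day variation fills the chart. The "sort": null setting is an addition made here to keep the days in data order.

```json
{
  "data": {"values": [
    {"day": "Monday", "temperature": 20},
    {"day": "Tuesday", "temperature": 22},
    {"day": "Wednesday", "temperature": 19},
    {"day": "Thursday", "temperature": 24},
    {"day": "Friday", "temperature": 23}
  ]},
  "mark": "line",
  "encoding": {
    "x": {"field": "day", "type": "ordinal", "sort": null},
    "y": {
      "field": "temperature",
      "type": "quantitative",
      "scale": {"zero": false}
    }
  }
}
```

Leaving zero out is usually reasonable for interval data like temperatures, while bar charts and other length-based encodings generally read better with the zero baseline kept.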

Temporal Data: Representing Time and Dates

Temporal data deals with dates and times, such as "2015-03-07 12:32:17", "17:01", or even Unix timestamps like 1552199579097.

Why Use Temporal Data?

When your data involves time, such as tracking performance over days or months, temporal types allow you to represent and visualize it accurately.

Example

Visualize monthly sales data:

{
  "data": {"values": [
    {"month": "2021-01", "sales": 30},
    {"month": "2021-02", "sales": 45},
    {"month": "2021-03", "sales": 50}
  ]},
  "mark": "bar",
  "encoding": {
    "x": {"field": "month", "type": "temporal"},
    "y": {"field": "sales", "type": "quantitative"}
  }
}
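
Numeric timestamps can be encoded the same way. The sketch below assumes a hypothetical ts field holding milliseconds since the Unix epoch, with readings one day apart; the values are illustrative only.

```json
{
  "data": {"values": [
    {"ts": 1552199579097, "sales": 30},
    {"ts": 1552285979097, "sales": 45},
    {"ts": 1552372379097, "sales": 50}
  ]},
  "mark": "line",
  "encoding": {
    "x": {"field": "ts", "type": "temporal"},
    "y": {"field": "sales", "type": "quantitative"}
  }
}
```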

Ordinal Data: Ranking Without Measuring Distance

Ordinal data is about order and ranking. An example might be "small", "medium", "large", and "extra-large". Unlike quantitative data, ordinal data does not measure relative differences.

Why Use Ordinal Data?

Use ordinal data when you have a ranked order but don't need to compare the difference between entries.

Example

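Showing product sizes. The explicit sort array preserves the size ranking, because the default ascending sort would order the labels alphabetically:

{
  "data": {"values": [
    {"size": "small", "count": 15},
    {"size": "medium", "count": 30},
    {"size": "large", "count": 45},
    {"size": "extra-large", "count": 10}
  ]},
  "mark": "bar",
  "encoding": {
    "x": {"field": "size", "type": "ordinal", "sort": ["small", "medium", "large", "extra-large"]},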
    "y": {"field": "count", "type": "quantitative"}
  }
}

Casting a Temporal Field as Ordinal

Treat a date-time field as an ordinal field if you want to visualize time units as discrete categories.

{
  "data": {"values": [
    {"month": "2021-01", "sales": 30},
    {"month": "2021-02", "sales": 45},
    {"month": "2021-03", "sales": 50}
  ]},
  "mark": "bar",
  "encoding": {
    "x": {"field": "month", "type": "ordinal"},
    "y": {"field": "sales", "type": "quantitative"}
  }
}
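
When the field holds full dates, a timeUnit can extract the unit that should act as the category. The following sketch assumes a hypothetical date field with ISO date strings; it groups the rows by calendar month and sums the sales in each group.

```json
{
  "data": {"values": [
    {"date": "2021-01-15", "sales": 30},
    {"date": "2021-01-28", "sales": 12},
    {"date": "2021-02-15", "sales": 45},
    {"date": "2021-03-15", "sales": 50}
  ]},
  "mark": "bar",
  "encoding": {
    "x": {"field": "date", "timeUnit": "month", "type": "ordinal"},
    "y": {"field": "sales", "aggregate": "sum", "type": "quantitative"}
  }
}
```

This approach tends to work best for time units with few distinct values, such as month or day of the week.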

Nominal Data: Categories Without Order

Nominal data sorts items into categories without any implicit ranking. Examples include gender, nationality, or music genre.

Why Use Nominal Data?

Use nominal data when the values are distinct categories that cannot be ranked or compared with one another in any meaningful numerical way.

Example

Categorize customer feedback:

{
  "data": {"values": [
    {"feedback": "positive", "count": 110},
    {"feedback": "neutral", "count": 45},
    {"feedback": "negative", "count": 25}
  ]},
  "mark": "bar",
  "encoding": {
    "x": {"field": "feedback", "type": "nominal"},
    "y": {"field": "count", "type": "quantitative"}
  }
}

GeoJSON Data: Map Your Geographical Shapes

GeoJSON data is all about geography. It represents shapes and areas on a map, described using the GeoJSON format.

Why Use GeoJSON Data?

Use GeoJSON data whenever you need to represent geographical entities like countries, states, or even complex polygons.

Example

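Map data visualization. This sketch assumes the file is a GeoJSON FeatureCollection whose features each carry a numeric value property (a placeholder name). The format's property setting tells Vega-Lite to read the features array, and the color field reaches into each feature's properties object:

{
  "data": {
    "url": "path/to/geojson/data",
    "format": {"type": "json", "property": "features"}
  },
  "mark": "geoshape",
  "encoding": {
    "color": {"field": "properties.value", "type": "quantitative"}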
  }
}

FAQ

1. Can I mix different data types in one chart?

Yes, you can mix different data types in a single Vega-Lite chart. Just make sure each field is correctly specified, so Vega-Lite knows how to treat each type.
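
For instance, a single chart can combine temporal, quantitative, and nominal fields. This sketch uses made-up sales figures and a hypothetical region field to draw one colored line per region.

```json
{
  "data": {"values": [
    {"date": "2021-01-15", "region": "North", "sales": 30},
    {"date": "2021-02-15", "region": "North", "sales": 45},
    {"date": "2021-03-15", "region": "North", "sales": 50},
    {"date": "2021-01-15", "region": "South", "sales": 20},
    {"date": "2021-02-15", "region": "South", "sales": 25},
    {"date": "2021-03-15", "region": "South", "sales": 40}
  ]},
  "mark": "line",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "sales", "type": "quantitative"},
    "color": {"field": "region", "type": "nominal"}
  }
}
```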

2. What happens if I don't specify a data type?

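If you omit the type, recent versions of Vega-Lite (5.0 and later) infer it from the rest of the encoding rather than from the data values. For example, a field with an aggregate or bin is treated as quantitative, a field with a timeUnit as temporal, and most other fields as nominal. It's still best to be explicit to avoid any unintended results.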
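
As a sketch of what leaving the type off can look like, assuming Vega-Lite 5 or later: the spec below omits type from both encodings. The timeUnit should lead Vega-Lite to treat the x field as temporal, and the aggregate should lead it to treat the y field as quantitative. The date field and its values are illustrative.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"values": [
    {"date": "2021-01-15", "sales": 30},
    {"date": "2021-02-15", "sales": 45},
    {"date": "2021-03-15", "sales": 50}
  ]},
  "mark": "bar",
  "encoding": {
    "x": {"field": "date", "timeUnit": "yearmonth"},
    "y": {"field": "sales", "aggregate": "sum"}
  }
}
```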

3. How do I handle missing or null values in my data?

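By default, Vega-Lite typically leaves out data points whose quantitative or temporal values are null or NaN rather than raising an error, and the mark's invalid property adjusts this behavior. If you need special handling, you can preprocess your data to fill or remove these values before visualizing.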
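
If you would rather drop incomplete rows explicitly inside the spec, one option is a filter transform with a valid predicate, which keeps only the rows whose field is neither null nor NaN. A minimal sketch using the temperature data, with Wednesday's reading assumed to be missing:

```json
{
  "data": {"values": [
    {"day": "Monday", "temperature": 20},
    {"day": "Tuesday", "temperature": 22},
    {"day": "Wednesday", "temperature": null},
    {"day": "Thursday", "temperature": 24},
    {"day": "Friday", "temperature": 23}
  ]},
  "transform": [
    {"filter": {"field": "temperature", "valid": true}}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "day", "type": "ordinal", "sort": null},
    "y": {"field": "temperature", "type": "quantitative"}
  }
}
```

With the filter in place, the line connects Tuesday directly to Thursday and Wednesday does not appear on the axis.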

Understanding these data types will enable you to create powerful and insightful visualizations with Vega-Lite. Happy charting!