Semantic Types
What Are Semantic Types?
When you upload a dataset, DashAI assigns a semantic type to each column. Semantic types go beyond raw storage formats (e.g., PyArrow int32 or string) to express the ML-meaningful nature of the data: is this column a continuous measurement, a discrete label, free-form text, or a date?
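To make the distinction between storage type and semantic type concrete, here is a minimal sketch of a default mapping from a storage type name to one of the semantic value types in DashAI's hierarchy. It assumes simplified type names such as int32, float64 or string (illustrative names, not exact PyArrow type strings in every case). The function `default_semantic_type` and the mapping table are hypothetical and are not DashAI's actual inference logic, which may also inspect column values.

```python
import re

# Hypothetical table: alphabetic prefix of a storage type name -> semantic
# value type name. Illustrative only; not taken from DashAI's source.
_PREFIX_TO_SEMANTIC = {
    "int": "Integer",
    "uint": "Integer",
    "float": "Float",
    "string": "Text",
    "date": "Date",
    "time": "Time",
    "timestamp": "Timestamp",
    "duration": "Duration",
    "decimal": "Decimal",
    "binary": "Binary",
}


def default_semantic_type(storage_type: str) -> str:
    """Return a default semantic type name for a storage type like 'int32'."""
    # Take the leading letters only, so 'int32' -> 'int' and
    # 'timestamp[s]' -> 'timestamp' (never mistaken for 'time').
    match = re.match(r"[a-z]+", storage_type)
    prefix = match.group(0) if match else ""
    try:
        return _PREFIX_TO_SEMANTIC[prefix]
    except KeyError:
        raise ValueError(f"no default semantic type for {storage_type!r}") from None


print(default_semantic_type("int32"))              # Integer
print(default_semantic_type("timestamp[s]"))       # Timestamp
print(default_semantic_type("decimal128(10, 2)"))  # Decimal
```

A table like this only yields a starting point: the storage type cannot tell whether an int32 column holds a measurement or a discrete label, which is why the semantic type is recorded separately from the storage format.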
This classification drives three critical behaviours throughout the platform:
- Task compatibility — only columns whose types match a task's requirements can be selected as inputs or outputs.
- Converter chaining — converters declare the type they accept and the type they produce, enabling safe preprocessing pipelines.
- Label encoding — categorical output columns are automatically integer-encoded before training and decoded back to string labels after prediction.
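The label-encoding round trip can be sketched as follows. This is a minimal illustration, not DashAI's encoder: the helper names (`fit_label_encoding`, `encode`, `decode`) are hypothetical, and assigning codes in sorted label order is an assumption made here for determinism.

```python
from typing import Dict, List, Sequence, Tuple


def fit_label_encoding(labels: Sequence[str]) -> Tuple[Dict[str, int], List[str]]:
    """Assign one integer code per distinct label, in sorted label order."""
    classes = sorted(set(labels))
    to_code = {label: code for code, label in enumerate(classes)}
    return to_code, classes


def encode(labels: Sequence[str], to_code: Dict[str, int]) -> List[int]:
    """Replace each string label with its integer code (before training)."""
    return [to_code[label] for label in labels]


def decode(codes: Sequence[int], classes: List[str]) -> List[str]:
    """Map integer codes back to string labels (after prediction)."""
    return [classes[code] for code in codes]


train_labels = ["cat", "dog", "cat", "bird"]
to_code, classes = fit_label_encoding(train_labels)
print(encode(train_labels, to_code))  # [1, 2, 1, 0]
print(decode([0, 2], classes))        # ['bird', 'dog']
```

The mapping fitted on the training labels has to be kept alongside the model, since decoding predictions needs the same code-to-label table.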
Type Hierarchy
All semantic types inherit from a common abstract base class, DashAIDataType.
DashAIDataType
├── DashAIValue # abstract parent for all value types
│ ├── Integer # int8, int16, int32, int64 (signed or unsigned)
│ ├── Float # float16, float32, float64
│ ├── Text # string with encoding (default: UTF-8)
│ ├── Date # calendar date (default format: YYYY-MM-DD)
│ ├── Time # time of day (default format: HH:mm:ss)
│ ├── Timestamp # datetime with timezone (default: YYYY-MM-DD HH:mm:ss)
│ ├── Duration # elapsed time with unit (s, ms, us, ns)
│ ├── Decimal # precise decimal (128 or 256-bit, with precision and scale)
│ └── Binary # raw binary data