The Informer model tackles the computational complexity challenges that the vanilla Transformer faces in long-horizon forecasting. The architecture has three distinctive features:
- A ProbSparse self-attention mechanism with $\mathcal{O}(L \log L)$ time and memory complexity.
- A self-attention distilling process that highlights dominant attention scores and halves the input of each cascading layer, so long input sequences are handled efficiently.
- An MLP multi-step decoder that predicts long time-series sequences in a single forward pass rather than step by step.

Its embedding combines three components:
- Encoded autoregressive features obtained from a convolutional network.
- Window-relative positional embeddings derived from harmonic functions.
- Absolute positional embeddings obtained from calendar features.
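As a rough illustration of window-relative positional embeddings built from harmonic functions, the sketch below computes the standard sine/cosine encoding from the original Transformer, indexed by position within the input window. It assumes that formula (base 10000, sines on even dimensions, cosines on odd ones); the helper name `harmonic_position_embedding` is hypothetical and not part of the library.

```python
import math


def harmonic_position_embedding(window_length: int, d_model: int) -> list:
    """Sine/cosine position table for positions 0..window_length-1 (hypothetical helper).

    Even dimensions use sin and odd dimensions use cos; wavelengths grow
    geometrically with the dimension index, controlled by the base 10000.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(window_length):  # position counted from the window start
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # dimension i (even)
            row.append(math.cos(angle))  # dimension i + 1 (odd)
        table.append(row)
    return table


pe = harmonic_position_embedding(window_length=24, d_model=8)
print(len(pe), len(pe[0]))  # 24 8
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Because positions are counted from the start of each window, the same table can be reused for every window of that length. In Informer this positional term is summed with the value and calendar embeddings rather than used on its own.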

1. Informer
Informer
Bases: BaseModel
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h | int | forecast horizon. | required |
| input_size | int | maximum sequence length for truncated backpropagation during training. | required |
| futr_exog_list | str list | future exogenous columns. | None |
| hist_exog_list | str list | historic exogenous columns. | None |
| stat_exog_list | str list | static exogenous columns. | None |
| exclude_insample_y | bool | if True, the model skips the autoregressive features y[t-input_size:t]. | False |
| decoder_input_size_multiplier | float | multiplier for the input size of the decoder. | 0.5 |
| hidden_size | int | units of the embeddings and encoders. | 128 |
| dropout | float | dropout rate used throughout the Informer architecture. | 0.05 |
| factor | int | ProbSparse attention factor. | 3 |
| n_head | int | number of attention heads. | 4 |
| conv_hidden_size | int | channels of the convolutional encoder. | 32 |
| activation | str | activation from [‘ReLU’, ‘Softplus’, ‘Tanh’, ‘SELU’, ‘LeakyReLU’, ‘PReLU’, ‘Sigmoid’, ‘GELU’]. | ‘gelu’ |
| encoder_layers | int | number of encoder layers. | 2 |
| decoder_layers | int | number of layers for the MLP decoder. | 1 |
| distil | bool | whether the Informer encoder uses distilling bottlenecks (ConvLayer) between its layers. | True |
| loss | PyTorch module | instantiated train loss class from the losses collection. | MAE() |
| valid_loss | PyTorch module | instantiated validation loss class from the losses collection. | None |
| max_steps | int | maximum number of training steps. | 5000 |
| learning_rate | float | learning rate in (0, 1). | 0.0001 |
| num_lr_decays | int | number of learning rate decays, evenly distributed across max_steps. | -1 |
| early_stop_patience_steps | int | number of validation iterations before early stopping. | -1 |
| val_check_steps | int | number of training steps between validation loss checks. | 100 |
| batch_size | int | number of different series in each batch. | 32 |
| valid_batch_size | int | number of different series in each validation and test batch; if None, uses batch_size. | None |
| windows_batch_size | int | number of windows to sample in each training batch. | 1024 |
| inference_windows_batch_size | int | number of windows to sample in each inference batch. | 1024 |
| start_padding_enabled | bool | if True, the model pads the beginning of each time series with input_size zeros. | False |
| training_data_availability_threshold | Union[float, List[float]] | minimum fraction of valid data points required for training windows. A single float applies to both insample and outsample; a list of two floats specifies [insample_fraction, outsample_fraction]. The default 0.0 allows windows with only one valid data point. | 0.0 |
| step_size | int | step size between consecutive windows of temporal data. | 1 |
| scaler_type | str | type of scaler used to normalize temporal inputs; see temporal scalers. | ‘identity’ |
| random_seed | int | random seed for the PyTorch initializer and NumPy generators. | 1 |
| drop_last_loader | bool | if True, TimeSeriesDataLoader drops the last non-full batch. | False |
| alias | str | optional custom name of the model. | None |
| optimizer | Subclass of ‘torch.optim.Optimizer’ | optional user-specified optimizer instead of the default choice (Adam). | None |
| optimizer_kwargs | dict | optional parameters for the user-specified optimizer. | None |
| lr_scheduler | Subclass of ‘torch.optim.lr_scheduler.LRScheduler’ | optional user-specified lr_scheduler instead of the default choice (StepLR). | None |
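The two accepted forms of training_data_availability_threshold can be illustrated with a small hypothetical helper that decides whether a window is usable from 0/1 availability masks. The masks, the function names, and the extra rule that a window with no observed points is rejected are assumptions; only the float-versus-list handling and the fraction comparison come from the table above.

```python
from typing import List, Sequence, Tuple, Union


def split_threshold(threshold: Union[float, List[float]]) -> Tuple[float, float]:
    """Return (insample_fraction, outsample_fraction) from a float or a 2-item list."""
    if isinstance(threshold, (int, float)):
        return float(threshold), float(threshold)  # a single value applies to both parts
    if len(threshold) != 2:
        raise ValueError("expected [insample_fraction, outsample_fraction]")
    return float(threshold[0]), float(threshold[1])


def window_is_usable(insample_mask: Sequence[int], outsample_mask: Sequence[int],
                     threshold: Union[float, List[float]] = 0.0) -> bool:
    """Check a window's availability masks (1 = observed, 0 = missing); hypothetical helper."""
    min_in, min_out = split_threshold(threshold)
    frac_in = sum(insample_mask) / len(insample_mask)
    frac_out = sum(outsample_mask) / len(outsample_mask)
    # Assumption: a window with no observed points at all is always rejected.
    has_any = sum(insample_mask) + sum(outsample_mask) > 0
    return has_any and frac_in >= min_in and frac_out >= min_out


print(window_is_usable([0, 0, 0, 1], [0, 0]))            # True: one valid point is enough at 0.0
print(window_is_usable([1, 1, 0, 1], [1, 0], 0.75))      # False: outsample is only 50% observed
print(window_is_usable([1, 1, 0, 1], [1, 0], [0.5, 0.5]))  # True
```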
Informer.fit
Fit method: optimizes the neural network’s weights using the initialization parameters (learning_rate, windows_batch_size, …) and the loss function defined at initialization. Within fit, a PyTorch Lightning Trainer is used that inherits the initialization’s self.trainer_kwargs; to customize its inputs, see PL’s trainer arguments. The method is designed to be compatible with scikit-learn-like classes and, in particular, with the StatsForecast library. By default the model does not save training checkpoints, to save disk space; to enable them, set enable_checkpointing=True in `__init__`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | TimeSeriesDataset | NeuralForecast’s TimeSeriesDataset, see documentation. | required |
| val_size | int | Validation size for temporal cross-validation. | 0 |
| random_seed | int | Random seed for the PyTorch initializer and NumPy generators; overrides the value set in `model.__init__`. | None |
| test_size | int | Test size for temporal cross-validation. | 0 |
Returns:
| Type | Description |
|---|---|
| None | |
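To make the roles of val_size and test_size concrete, the hypothetical helper below splits one series of n_obs observations into consecutive train, validation and test index ranges. It assumes the test segment is the final test_size observations and the validation segment sits immediately before it; this illustrates temporal (non-shuffled) splitting and is not NeuralForecast's internal code.

```python
def temporal_split(n_obs: int, val_size: int = 0, test_size: int = 0) -> dict:
    """Index ranges for train/validation/test on one series (hypothetical helper).

    Assumes the test segment is the last test_size observations and the
    validation segment immediately precedes it, so no future data leaks back.
    """
    if val_size < 0 or test_size < 0 or val_size + test_size >= n_obs:
        raise ValueError("val_size + test_size must leave some training data")
    train_end = n_obs - val_size - test_size
    val_end = n_obs - test_size
    return {
        "train": range(0, train_end),
        "val": range(train_end, val_end),
        "test": range(val_end, n_obs),
    }


splits = temporal_split(n_obs=144, val_size=12, test_size=12)
print({k: (r.start, r.stop) for k, r in splits.items()})
# {'train': (0, 120), 'val': (120, 132), 'test': (132, 144)}
```

Keeping the segments contiguous and ordered in time means validation and test losses are always measured on observations that come after everything the model was trained on.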
Informer.predict
Runs predict_step through the PyTorch Lightning Trainer to produce forecasts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | TimeSeriesDataset | NeuralForecast’s TimeSeriesDataset, see documentation. | required |
| test_size | int | Test size for temporal cross-validation. | None |
| step_size | int | Step size between consecutive windows. | 1 |
| random_seed | int | Random seed for the PyTorch initializer and NumPy generators; overrides the value set in `model.__init__`. | None |
| quantiles | list | Target quantiles to predict. | None |
| h | int | Prediction horizon; if None, uses the model’s fitted horizon. | None |
| explainer_config | dict | Configuration for explanations. | None |
| `**data_module_kwargs` | dict | PL’s TimeSeriesDataModule arguments, see documentation. | |
Returns:
| Type | Description |
|---|---|
| None | |
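The step_size and h parameters together determine where forecasts start. The sketch below lists forecast origins (cutoffs) inside the last test_size observations, spaced step_size apart, keeping only windows whose h-step forecast ends within the series. The helper name and this exact windowing rule are assumptions for illustration, not the library's implementation.

```python
def forecast_origins(n_obs: int, h: int, test_size: int, step_size: int = 1) -> list:
    """Cutoff indices for rolling h-step forecasts over the last test_size points.

    Hypothetical helper: origin t means "use data before t, forecast t..t+h-1",
    and every forecast must end inside the series.
    """
    if not (0 < h <= test_size <= n_obs) or step_size < 1:
        raise ValueError("need 0 < h <= test_size <= n_obs and step_size >= 1")
    first = n_obs - test_size  # first origin is the start of the test segment
    return list(range(first, n_obs - h + 1, step_size))


print(forecast_origins(n_obs=144, h=12, test_size=24, step_size=6))  # [120, 126, 132]
```

Under this rule the number of windows is (test_size - h) // step_size + 1, so step_size=1 gives maximally overlapping forecasts and step_size=h gives non-overlapping ones.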
Usage Example
2. Auxiliary Functions
ConvLayer
Bases: Module
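ConvLayer implements Informer's self-attention distilling step. The Informer paper describes it as $X_{j+1} = \mathrm{MaxPool}(\mathrm{ELU}(\mathrm{Conv1d}(X_j)))$ with a stride-2 max-pool, which halves the sequence length between encoder layers. The plain-Python sketch below follows that recipe on a single channel; the kernel-3 circular convolution and the pool of size 3 with padding 1 are assumptions modelled on common implementations, batch normalization is omitted, and the kernel weights are arbitrary.

```python
import math


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def circular_conv1d(x, kernel):
    """Length-preserving kernel-3 convolution (cross-correlation) with circular padding."""
    if len(kernel) != 3:
        raise ValueError("sketch assumes kernel size 3")
    n = len(x)
    return [sum(kernel[j] * x[(t + j - 1) % n] for j in range(3)) for t in range(n)]


def max_pool1d(x, size=3, stride=2, pad=1):
    """Max pooling with -inf padding: output length floor((n + 2*pad - size) / stride) + 1."""
    padded = [-math.inf] * pad + list(x) + [-math.inf] * pad
    return [max(padded[s:s + size]) for s in range(0, len(padded) - size + 1, stride)]


def distill(x, kernel):
    """One distilling step on a single channel: Conv1d -> ELU -> MaxPool (no batch norm)."""
    return max_pool1d([elu(v) for v in circular_conv1d(x, kernel)])


z = [math.sin(0.3 * t) for t in range(96)]
for _ in range(3):
    z = distill(z, kernel=[0.25, 0.5, 0.25])
    print(len(z))  # prints 48, then 24, then 12
```

Each step keeps roughly half of the positions, so the attention layers that follow work on progressively shorter sequences, which is where the memory savings on long inputs come from.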
ProbAttention
Bases: Module
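The query-selection rule behind ProbSparse attention can be sketched in plain Python. Following the Informer paper, each query gets a sparsity measurement $\bar{M}(q_i, K) = \max_j \frac{q_i k_j^\top}{\sqrt{d}} - \frac{1}{L_K}\sum_j \frac{q_i k_j^\top}{\sqrt{d}}$, and only the top $u \approx \text{factor} \cdot \ln L_Q$ queries (rounded up here) receive full attention; the remaining "lazy" queries receive the mean of V, as in the unmasked case. Assumptions: the measurement is computed over all keys (the paper samples a subset of keys to reach $\mathcal{O}(L \log L)$), so this shows which queries are kept, not the speed-up; the function name is hypothetical and this is not the library's ProbAttention.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prob_sparse_attention(Q, K, V, factor=3):
    """Unmasked ProbSparse-style attention over lists of vectors (illustrative sketch).

    Returns (outputs, active) where active lists the query indices that received
    full attention; every other ("lazy") query gets the mean of V.
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    scores = [[scale * sum(a * b for a, b in zip(q, k)) for k in K] for q in Q]
    # Sparsity measurement per query: max score minus mean score.
    sparsity = [max(row) - sum(row) / len(row) for row in scores]
    # Keep about factor * ln(L_Q) queries (rounded up), capped at L_Q.
    u = min(len(Q), factor * math.ceil(math.log(len(Q))))
    active = sorted(range(len(Q)), key=lambda i: sparsity[i], reverse=True)[:u]
    mean_v = [sum(col) / len(V) for col in zip(*V)]
    outputs = [list(mean_v) for _ in Q]
    for i in active:
        weights = softmax(scores[i])
        outputs[i] = [sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))]
    return outputs, sorted(active)


K = [[1, 0], [0, 1], [-1, 0], [0, -1]] * 2  # 8 keys
Q = [[0, 0], [5, 0], [0, 0], [0, 3], [0, 0], [4, 0], [0, 0], [1, 0]]
V = [[float(j)] for j in range(8)]
outputs, active = prob_sparse_attention(Q, K, V, factor=1)  # u = ceil(ln 8) = 3
print(active)      # [1, 3, 5]
print(outputs[0])  # [3.5], a lazy query receives mean(V)
```

Queries whose score distribution is nearly uniform (here the all-zero queries) contribute little beyond an average of V, which is why skipping them costs little accuracy.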

