[ENH] `DMLForecaster` for causal forecasting with confounder adjustment #8797

XAheli · 2025-09-12T12:36:03Z

Reference Issues/PRs

Fixes #8785

What does this implement/fix? Explain your changes.

Does your contribution introduce a new dependency? If yes, which one?

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

Any other comments?

FYI @marrov @ankurankan

fkiraly

I think we need to allow X to be None. If that happens, simply use only the endogenous forecaster, and ignore the rest of the algorithm. I think that is the natural, "degenerate" special case for this algorithm.

Also please try to ensure that check_estimator passes.
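A minimal sketch of the degenerate `X=None` case requested above. `MeanForecaster` and `SketchDML` are hypothetical names used only for illustration; they are not the classes in this PR, and the full DML path is deliberately left out:

```python
import copy


class MeanForecaster:
    """Toy stand-in for a forecaster: predicts the training mean."""

    def fit(self, y, X=None):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, n_steps, X=None):
        return [self.mean_] * n_steps


class SketchDML:
    """Hypothetical composite showing only the X=None branch."""

    def __init__(self, outcome_forecaster):
        self.outcome_forecaster = outcome_forecaster

    def fit(self, y, X=None):
        self.use_dml_ = X is not None
        if not self.use_dml_:
            # Degenerate case: fit only the outcome (endogenous) forecaster
            # and skip the residualization steps entirely.
            self.outcome_forecaster_ = copy.deepcopy(self.outcome_forecaster)
            self.outcome_forecaster_.fit(y)
            return self
        raise NotImplementedError("full DML path omitted in this sketch")

    def predict(self, n_steps, X=None):
        if not self.use_dml_:
            return self.outcome_forecaster_.predict(n_steps)
        raise NotImplementedError("full DML path omitted in this sketch")


forecast = SketchDML(MeanForecaster()).fit([1.0, 2.0, 3.0]).predict(2)
```

With no exogenous data, the composite behaves exactly like the wrapped outcome forecaster.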

ankurankan · 2025-09-26T14:21:29Z

sktime/forecasting/compose/_dmlf.py

+            )
+
+        X_ex = X[self.exposure_vars].copy()
+        X_conf = X.drop(columns=self.exposure_vars).copy()


I think it should be explicitly mentioned in the documentation that non-exposure variables will be considered as confounders. Treating non-confounding variables as confounders can lead to biased or high-variance predictions.

Please check whether the current code addresses this correctly.
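To make the point above concrete, here is a small sketch of the split in the diff, wrapped in a hypothetical helper `split_exogenous` (not part of the PR). The column names other than `GNP` are made up for illustration:

```python
import pandas as pd


def split_exogenous(X, exposure_vars):
    """Split X into exposure columns and everything else.

    Every column not listed in exposure_vars ends up in X_conf,
    i.e. it is treated as a confounder.
    """
    missing = [col for col in exposure_vars if col not in X.columns]
    if missing:
        raise ValueError(f"exposure_vars not found in X: {missing}")
    X_ex = X[exposure_vars].copy()
    X_conf = X.drop(columns=exposure_vars).copy()
    return X_ex, X_conf


# Illustrative data; "UNEMP" and "POP" are assumed column names.
X = pd.DataFrame({"GNP": [1.0, 2.0], "UNEMP": [3.0, 4.0], "POP": [5.0, 6.0]})
X_ex, X_conf = split_exogenous(X, ["GNP"])
```

Here `UNEMP` and `POP` are both adjusted for as confounders simply because they were not named as exposures, which is the behaviour the documentation should state explicitly.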

ankurankan · 2025-09-26T14:25:12Z

sktime/forecasting/compose/_dmlf.py

+        # Step 4: Fit final versions for prediction
+        self.forecaster_y_final_ = clone(self.forecaster_y)
+        self.forecaster_y_final_.fit(y, X=X_conf, fh=fh)
+
+        self.forecaster_ex_final_ = clone(self.forecaster_ex)
+        self.forecaster_ex_final_.fit(X_ex, X=X_conf, fh=fh)


How are these different from the models fitted earlier? Do we need to do this fitting twice?

How are these different from the models fitted earlier? Do we need to do this fitting twice?

From my understanding (please correct me if I'm wrong), the first fit is used to compute residuals for the causal effect estimation, and the second fit is used for the out-of-sample prediction components. This separation is needed because the residual models need to predict on training indices (in-sample), while the final models need to predict on future indices (out-of-sample). Could using the same fitted model for both lead to overfitting issues?

Agreed. In short, I think it's because of the forecasting horizon: the first fit uses the in-sample indices as fh, and the second fit uses the user-provided fh.
But I would argue that for some forecasters (depending on the internal implementation) this could lead to two models with different weights.
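For readers following this thread, a rough numerical sketch of why the first stage needs in-sample predictions. It assumes plain least squares in place of the forecasters and synthetic i.i.d. data, so it shows only the residual-on-residual idea, not the PR's implementation; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
conf = rng.normal(size=n)                             # confounder
exposure = 0.8 * conf + rng.normal(size=n)            # exposure depends on it
y = 2.0 * exposure + 1.5 * conf + rng.normal(size=n)  # true effect is 2.0


def ols_fit_predict(features, target):
    """Fit least squares with intercept; return in-sample predictions."""
    A = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return A @ coef


# Stage 1: in-sample predictions from the confounder only, then residuals.
res_y = y - ols_fit_predict(conf, y)
res_ex = exposure - ols_fit_predict(conf, exposure)

# Stage 2: regress outcome residuals on exposure residuals.
effect = float(res_ex @ res_y / (res_ex @ res_ex))
```

The residuals only exist on the training index, which is why the first pair of fits has to predict in-sample, while anything used for forecasting has to predict on the user-provided fh.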

ankurankan · 2025-09-26T14:33:46Z

sktime/forecasting/compose/_dmlf.py

+        # Combine intervals (assumes independence)
+        return pred_int_conf + pred_int_causal


I am not sure if this would be as straightforward as adding these two predictions. Depending on the estimator used, both of the quantities would have some posterior distribution, and the final value would depend on the distributions.

Could you check whether the current implementation does this the right way? I tried to reuse some logic from ResidualBoostingForecaster, but it needs a review.
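A small numerical illustration of the concern, assuming both components are normally distributed (an assumption made here only to keep the arithmetic simple; the standard deviations are arbitrary):

```python
from statistics import NormalDist

sd_conf, sd_causal = 3.0, 4.0
z = NormalDist().inv_cdf(0.95)  # upper quantile of a 90% central interval

# Half-width obtained by adding the two interval bounds directly.
naive_half_width = z * sd_conf + z * sd_causal

# Half-width for the sum of two independent normals: variances add,
# so the standard deviation of the sum is sqrt(3^2 + 4^2) = 5.
exact_half_width = z * (sd_conf**2 + sd_causal**2) ** 0.5

ratio = naive_half_width / exact_half_width
```

In this setting the summed bounds are 40% wider than the interval of the sum. For normals, adding bounds is exact only under perfect positive correlation, not under independence as the code comment suggests.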

fkiraly

Added review:

- the first fit of the residual forecasters should be in-sample, so fh should be the same as the index of the y passed
- the probabilistic forecasting methods should shift the probabilistic residual forecast by the point forecast of the original forecaster, rather than adding the probabilistic forecasts. The logic for this is the same as in the ResidualForecaster, from which the utility could be reused (maybe move it to a common location)

Please also address my review on X=None above.
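As a rough sketch of the "shift" idea for interval forecasts: the residual interval keeps its width and is moved by the point forecast. `shift_interval` is a hypothetical helper written for this note and is not the `_add_det_to_proba` utility from the code base:

```python
import numpy as np


def shift_interval(residual_interval, point_forecast):
    """Shift (lower, upper) bounds of a residual interval by a point forecast.

    residual_interval : array-like of shape (n_steps, 2)
    point_forecast    : array-like of shape (n_steps,)
    """
    residual_interval = np.asarray(residual_interval, dtype=float)
    point_forecast = np.asarray(point_forecast, dtype=float)
    # Broadcast the point forecast over the lower and upper columns.
    return residual_interval + point_forecast[:, None]


shifted = shift_interval([[-1.0, 1.0], [-2.0, 2.0]], [10.0, 20.0])
```

Only one probabilistic forecast enters the result, so no assumption about the joint distribution of two forecasts is needed.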

fkiraly · 2025-10-14T18:16:52Z

sktime/forecasting/causal/_dmlf.py

+    >>> fh = [1, 2, 3]
+    >>> dml_forecaster.fit(y_train, X=X_train, fh=fh)
+    DoubleMLForecaster(exposure_vars=['GNP'], outcome_forecaster=NaiveForecaster(),
+                       residual_forecaster=RecursiveTabularRegressionForecaster(estimator=LinearRegression()),


why is this line not failing code formatting?

I don't really know; should it?
Are you expecting the "line >80 chars" check to fail? I don't think it fails for code blocks starting with ">>>".

fkiraly · 2025-10-14T18:17:32Z

sktime/forecasting/compose/_dmlf.py

+    3. Fit the *residual forecaster* on these residuals to estimate the causal
+       effect of the exposure on the outcome.
+
+    The residual forecaster is typically a simple linear model, ensuring


"typically" is not precise, remove or rework

removed in 8d987a4

fkiraly · 2025-10-14T18:18:26Z

sktime/forecasting/causal/_dmlf.py

+        outcome and treatment. If None, defaults to a recursive reduction
+        forecaster wrapping a linear regression model.
+
+    exposure_vars : list of str, optional (default=None)


same here, explain the default

fixed in 8d987a4

fkiraly

Looks great! Also extremely neat that it works with hierarchical forecasters!

A few small requests:

- can we move the new file to a folder forecasting.causal?
- can we add the exact algorithm from the issue to the docstring preamble?
- question: can we think of a way to shorten the forecaster names?
- there is some code duplication with ResidualBoostingForecaster. Should we try to deduplicate by moving common code to a common utility or mixin?

geetu040 · 2025-10-17T13:06:03Z

question: can we think of a way to shorten the forecaster names?

Yeah, I agree the names are a bit long, but I'd still lean toward keeping them for clarity and consistency with DML terminology.
If we do want to shorten them though, a few options could be:

* `outcome_forecaster` → `out_forecaster` or `y_forecaster` (old name)
* `treatment_forecaster` → `treat_forecaster` or `ex_forecaster` (old name)
* `residual_forecaster` → `resid_forecaster` or `res_forecaster` (old name)

We could also drop _forecaster entirely and use _model or _est instead if that feels cleaner.

fkiraly · 2025-10-17T20:55:31Z

Hm, I think "forecaster" is the redundant term, not "outcome" etc. How about: outcome_f, treatment_f, residual_f? Or fcst instead of f to avoid association with "function"?

geetu040 · 2025-10-20T05:05:39Z

This PR is ready for review, failing tests are unrelated. Please take a look at your convenience.
FYI: @fkiraly @ankurankan @marrov

fkiraly · 2025-10-22T19:28:09Z

sktime/forecasting/causal/_dmlf.py

+
+    **Fit procedure**
+
+    1. Split exogenous data ``X`` into exposure variables ``X_exposure`` and


Can you please make this more precise? Which forecaster, which fh; etc. Could you simply transfer my specification to here?

fkiraly · 2025-10-22T19:30:59Z

sktime/forecasting/causal/_dmlf.py

+class DoubleMLForecaster(BaseForecaster):
+    """Double Machine Learning forecaster for causal time-series forecasting.
+
+    Implements the Double Machine Learning (DML) framework [1]_ for time-series,


it is not exactly the algorithm in [1], but an adaptation for time series (by myself) afaik

fkiraly

Looks good to me! Only minor documentation requests.

fix: dmlf test suites

43a3a10

XAheli requested review from benHeid, felipeangelimvieira, fkiraly and yarnabrina as code owners September 12, 2025 12:36

XAheli marked this pull request as draft September 12, 2025 12:36

fix: removed dmlf summary test

fc726e5

fkiraly added this to May - Sep 2025 mentee projects Sep 17, 2025

fkiraly moved this to PR in progress in May - Sep 2025 mentee projects Sep 17, 2025

fkiraly assigned XAheli Sep 17, 2025

fkiraly moved this from PR in progress to PR under review in May - Sep 2025 mentee projects Sep 24, 2025

XAheli marked this pull request as ready for review September 25, 2025 12:53

fkiraly requested changes Sep 25, 2025

jgyasu moved this from PR under review to PR in progress in May - Sep 2025 mentee projects Sep 26, 2025

ankurankan reviewed Sep 26, 2025

fkiraly requested changes Oct 3, 2025

XAheli and others added 10 commits October 9, 2025 23:19

Merge branch 'sktime:main' into feature/DMLF

42a74c4

update tags

521363d

update params and handling X

e4aca65

update fit and predict functions

aea8749

rename DMLForecaster -> DoubleMLForecaster

1b7e0e5

fixes for check_estimator

c395f7c

update for multiindex

2d7864d

rename variables

812ca39

update docs and comments

cca0c32

removing explit tests

40cf6b4

- since they are covered by `check_estimator` - exceptions are removed in code

fkiraly assigned geetu040 and unassigned XAheli Oct 13, 2025

fkiraly moved this from PR in progress to PR under review in May - Sep 2025 mentee projects Oct 13, 2025

fkiraly added the module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting label Oct 14, 2025

geetu040 requested review from ankurankan, fkiraly and marrov and removed request for ankurankan October 14, 2025 16:42

fkiraly changed the title ~~[ENH] Add DMLForecaster for causal forecasting with confounder adjustment~~ [ENH] DMLForecaster for causal forecasting with confounder adjustment Oct 14, 2025

fkiraly reviewed Oct 14, 2025

fkiraly reviewed Oct 14, 2025

fkiraly requested changes Oct 14, 2025

fkiraly moved this from PR under review to PR in progress in May - Sep 2025 mentee projects Oct 16, 2025

geetu040 added 5 commits October 17, 2025 17:07

update docs

8d987a4

move to forecasting.causal

aec0536

move _add_det_to_proba to mixin

3b6a8d5

Merge branch 'main' into feature/DMLF

7aec659

remove ResidualBoostingForecaster._add_det_to_proba

6e71ce5

geetu040 requested a review from fkiraly October 17, 2025 13:06

fix requires-fh tag

d5010d2

geetu040 and others added 5 commits October 18, 2025 09:12

rename component forecasters

4e6b118

Merge remote-tracking branch 'origin/main' into feature/DMLF

98be21d

Merge branch 'main' into pr/8797

7e21915

rename: OosResidualsWrapper -> OosForecaster

a3f1510

Merge remote-tracking branch 'origin/main' into feature/DMLF

be20753

fkiraly reviewed Oct 22, 2025

fkiraly requested changes Oct 22, 2025
