Pre-Diagnostics (`src.diagnostics.pre_diagnostics`)
Overview
`pre_diagnostics.py` runs statistical checks before posterior interpretation: stationarity tests on the target, multicollinearity checks on the regressors, and pairwise transfer-entropy screening. Outputs are written to the `10_pre_diagnostics/` stage folder.
Function Signatures
```python
from src.diagnostics.pre_diagnostics import (
    run_stationarity_tests,
    run_vif_tests,
    run_transfer_entropy,
    run_all_pre_diagnostics,
)
```

```python
run_stationarity_tests(
    data: pd.DataFrame,
    date_col: str,
    cols: list[str],
    *,
    kpss_regression: str = "c",
    kpss_nlags: str | int | None = "auto",
    adf_maxlag: int | None = None,
    adf_regression: str = "c",
    dropna: bool = True,
) -> pd.DataFrame

run_vif_tests(
    data: pd.DataFrame,
    cols: list[str],
    *,
    include_constant: bool = True,
    dropna: str = "pairwise",
) -> pd.DataFrame

run_transfer_entropy(
    data: pd.DataFrame,
    date_col: str,
    x_cols: list[str],
    y_col: str,
    *,
    max_lag: int = 1,
    bins: int = 8,
    permutations: int = 200,
    random_state: int = 42,
    normalize: bool = True,
    dropna: bool = True,
) -> pd.DataFrame

run_all_pre_diagnostics(
    data: pd.DataFrame,
    config: dict[str, Any],
    results_dir: str,
    *,
    stationarity_cols: list[str] | None = None,
    vif_cols: list[str] | None = None,
    te_x_cols: list[str] | None = None,
    te_y_col: str | None = None,
    te_include_controls_in_x: bool = False,
    stationarity_kwargs: dict[str, Any] | None = None,
    vif_kwargs: dict[str, Any] | None = None,
    te_kwargs: dict[str, Any] | None = None,
) -> dict[str, str]
```

Parameters (Orchestrator)
| Parameter | Description |
|---|---|
| `data` | Processed modelling dataframe. |
| `config` | Pipeline configuration; used to infer `date_col`, `target_col`, the media `spend_col` values, and `extra_features_cols`. |
| `results_dir` | Run root directory; `save_csv(...)` routes into `10_pre_diagnostics/`. |
| `stationarity_cols` | Optional override for the ADF/KPSS variables. Default is the target only. |
| `vif_cols` | Optional override for the VIF variables. Default is media + controls. |
| `te_x_cols`, `te_y_col` | Optional overrides for the transfer-entropy direction setup. |
| `te_include_controls_in_x` | Adds controls to the TE X-set when `True`. |
| `*_kwargs` | Extra keyword arguments passed to each underlying diagnostic function. |
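As a rough illustration of how the orchestrator's defaults could be derived from `config`, the sketch below uses a hypothetical helper, `infer_default_columns`, which is not part of `src.diagnostics.pre_diagnostics`. It assumes the config shape shown in this page's usage example (a `media` list whose entries carry `spend_col`). It also assumes that the default TE setup uses the media spend columns as X and the target as Y, which the table above does not state explicitly.

```python
from typing import Any


def infer_default_columns(
    config: dict[str, Any],
    te_include_controls_in_x: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: resolve default column sets from a pipeline config."""
    target = config["target_col"]
    # One spend column per media channel entry.
    spend_cols = [channel["spend_col"] for channel in config.get("media", [])]
    controls = list(config.get("extra_features_cols", []))

    # Assumption: TE screens media spend (X) against the target (Y) by default.
    te_x_cols = spend_cols + controls if te_include_controls_in_x else list(spend_cols)

    return {
        "date_col": config["date_col"],
        "stationarity_cols": [target],        # documented default: target only
        "vif_cols": spend_cols + controls,    # documented default: media + controls
        "te_x_cols": te_x_cols,
        "te_y_col": target,
    }


example_config = {
    "date_col": "DATE",
    "target_col": "revenue",
    "media": [
        {"display_name": "TV", "spend_col": "tv_spend"},
        {"display_name": "Search", "spend_col": "search_spend"},
    ],
    "extra_features_cols": ["price_index", "competitor_sales"],
}

defaults = infer_default_columns(example_config)
with_controls = infer_default_columns(example_config, te_include_controls_in_x=True)
```

Passing explicit `stationarity_cols`, `vif_cols`, `te_x_cols`, or `te_y_col` to `run_all_pre_diagnostics` replaces the corresponding default.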
Artefacts Produced
| Filename | Stage folder | Description |
|---|---|---|
| `stationarity_summary.csv` | `10_pre_diagnostics/` | ADF + KPSS metrics and combined stationarity conclusion by variable. |
| `vif_summary.csv` | `10_pre_diagnostics/` | VIF/tolerance/correlation summary with high-VIF flagging. |
| `transfer_entropy_summary.csv` | `10_pre_diagnostics/` | Pairwise TE(X→Y), TE(Y→X), permutation p-values, and direction label. |
Interpretation Guidance
1. Stationarity (ADF + KPSS)
| ADF result | KPSS result | Conclusion |
|---|---|---|
| Reject H0 (p < 0.05) | Fail to reject H0 (p >= 0.05) | Likely stationary |
| Fail to reject H0 (p >= 0.05) | Reject H0 (p < 0.05) | Likely unit root |
| Other combinations | Other combinations | Inconclusive |
Remediation for likely unit-root behaviour:
- First differencing.
- Detrending.
- Log transform for multiplicative trends.
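The decision table above can be sketched as a small function. ADF tests the null hypothesis of a unit root, while KPSS tests the null hypothesis of stationarity, so the two tests agree when exactly one of them rejects. The function name `stationarity_conclusion`, the returned label strings, and the `first_difference` helper are illustrative assumptions; the labels written to `stationarity_summary.csv` may differ.

```python
def stationarity_conclusion(adf_p: float, kpss_p: float, alpha: float = 0.05) -> str:
    """Combine ADF and KPSS p-values following the table's three outcomes."""
    adf_rejects = adf_p < alpha    # ADF H0: the series has a unit root
    kpss_rejects = kpss_p < alpha  # KPSS H0: the series is stationary

    if adf_rejects and not kpss_rejects:
        return "likely stationary"
    if not adf_rejects and kpss_rejects:
        return "likely unit root"
    # Both reject or neither rejects: the tests disagree.
    return "inconclusive"


def first_difference(values: list[float]) -> list[float]:
    """First remediation option: replace the series with its step-to-step changes."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# A series with a linear trend has constant first differences.
trend = [10.0, 12.0, 14.0, 16.0]
differenced = first_difference(trend)
```

After differencing or detrending, the same two tests can be rerun on the transformed series to confirm that the conclusion has changed.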
2. Variance Inflation Factor (VIF)
| VIF | Severity | Action |
|---|---|---|
| < 5 | Low collinearity | Usually no action. |
| 5 to < 10 | Moderate collinearity | Monitor and stress-test estimates. |
| >= 10 | High collinearity | Combine/remove regressors, or add stronger regularisation structure. |
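For orientation, the VIF of regressor j is 1 / (1 - R_j²), where R_j² comes from regressing that column on the other regressors, and tolerance is its reciprocal, 1 - R_j². The sketch below maps an R² value to a VIF and then to the severity bands in the table. The function names and the returned labels are illustrative assumptions, not the columns of `vif_summary.csv`.

```python
def vif_from_r_squared(r_squared: float) -> float:
    """VIF = 1 / (1 - R^2), with R^2 from regressing one column on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def vif_severity(vif: float) -> str:
    """Map a VIF value to the bands used in the table above."""
    if vif < 5:
        return "low"
    if vif < 10:
        return "moderate"
    return "high"


# Half of a regressor's variance explained by the others gives VIF = 2.
vif_half = vif_from_r_squared(0.5)
# R^2 = 0.75 gives VIF = 4 and a tolerance of 0.25.
vif_three_quarters = vif_from_r_squared(0.75)
tolerance = 1.0 / vif_three_quarters
```

A VIF of 10 corresponds to R² = 0.9, meaning the other regressors explain 90% of that column's variance.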
3. Transfer Entropy (Pairwise, Unconditional)
| Condition | Direction | Interpretation |
|---|---|---|
| TE(X→Y) significant and stronger than TE(Y→X) | x→y | X may contain predictive information for Y. |
| TE(Y→X) significant and stronger than TE(X→Y) | y→x | Reverse predictive direction may dominate. |
| Both significant | bidirectional | Mutual predictive relationship. |
| Neither significant | none | No strong directional signal. |
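One way to turn the table into a labelling rule is sketched below. The table does not say which row wins when both directions are significant and one is stronger, so this sketch makes its own choice: `bidirectional` takes precedence, and a single significant direction is labelled only if it is also the stronger one. The function name `label_direction` and the 0.05 threshold are likewise assumptions rather than the module's documented behaviour.

```python
def label_direction(
    te_xy: float,
    te_yx: float,
    p_xy: float,
    p_yx: float,
    alpha: float = 0.05,
) -> str:
    """Assign a direction label from two TE estimates and their permutation p-values."""
    sig_xy = p_xy < alpha
    sig_yx = p_yx < alpha

    # Assumption: two significant directions are reported as bidirectional,
    # regardless of which estimate is larger.
    if sig_xy and sig_yx:
        return "bidirectional"
    if sig_xy and te_xy > te_yx:
        return "x→y"
    if sig_yx and te_yx > te_xy:
        return "y→x"
    # Covers "neither significant" and a significant but weaker direction.
    return "none"
```

For example, `label_direction(0.30, 0.05, 0.01, 0.40)` falls in the first row of the table, since only TE(X→Y) is significant and it is also the larger estimate.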
Important caveats:
- This implementation is pairwise TE and does not control for confounders.
- Use as an exploratory diagnostic, not a causal identification claim.
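To make "pairwise and unconditional" concrete, here is a minimal plug-in estimator of lag-1 transfer entropy on already-discretised series, with a permutation p-value obtained by shuffling the source. It is a sketch under stated assumptions, not the code behind `run_transfer_entropy`: it skips binning and normalisation, fixes the lag at 1, and uses the (hits + 1) / (permutations + 1) p-value convention, none of which is confirmed by this page. Only X and Y enter the estimate, so a third variable driving both would go undetected.

```python
import math
import random
from collections import Counter


def transfer_entropy(source: list[int], target: list[int]) -> float:
    """Plug-in estimate of TE(source -> target) in bits, lag 1, discrete symbols."""
    if len(source) != len(target):
        raise ValueError("series must have equal length")
    n = len(target) - 1
    triples = Counter()         # (target_next, target_now, source_now)
    now_pairs = Counter()       # (target_now, source_now)
    next_now_pairs = Counter()  # (target_next, target_now)
    now_counts = Counter()      # target_now
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        now_pairs[(y_now, x_now)] += 1
        next_now_pairs[(y_next, y_now)] += 1
        now_counts[y_now] += 1

    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / now_pairs[(y_now, x_now)]
        p_given_own_past = next_now_pairs[(y_next, y_now)] / now_counts[y_now]
        te += p_joint * math.log2(p_given_both / p_given_own_past)
    return te


def permutation_p_value(source, target, permutations=200, seed=42) -> float:
    """Share of shuffled-source estimates at least as large as the observed one."""
    rng = random.Random(seed)
    observed = transfer_entropy(source, target)
    shuffled = list(source)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # destroys the timing between source and target
        if transfer_entropy(shuffled, target) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


# Toy data: y copies x with a one-step delay, so information flows x -> y only.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(500)]
y = [0] + x[:-1]

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
p_xy = permutation_p_value(x, y)
```

On this toy data the X→Y estimate should be close to one bit and the reverse estimate close to zero, which is the asymmetry the direction label is meant to capture.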
Usage Example
```python
import pandas as pd

from src.diagnostics.pre_diagnostics import run_all_pre_diagnostics

# Example only: in production, the V2 driver prepares the processed dataframe.
data = pd.read_csv("processed_input.csv")

config = {
    "date_col": "DATE",
    "target_col": "revenue",
    "media": [
        {"display_name": "TV", "spend_col": "tv_spend"},
        {"display_name": "Search", "spend_col": "search_spend"},
    ],
    "extra_features_cols": ["price_index", "competitor_sales"],
}

paths = run_all_pre_diagnostics(
    data=data,
    config=config,
    results_dir="results/run_20260304_101500",
)

print(paths)
```

V2 Driver Integration
Pre-diagnostics are orchestrated through the V2 driver workflow (`MMMBaseDriverV2` -> `WorkflowExecutor`) rather than through a standalone script entry point.
```python
from src.driver.base import MMMBaseDriverV2

driver = MMMBaseDriverV2(
    config_filename="data-config/demo_config.yml",
    input_filename="data-config/demo_data.csv",
    holidays_filename="data-config/holidays.csv",
    results_filename="results",
)

driver.main()
```

Relationship to Workflow Stages and Gates
- Stage: `10_pre_diagnostics/`.
- These checks are pre-fit quality diagnostics that support gate `g1` interpretation readiness and reduce downstream model risk.
- They are advisory diagnostics; hard machine-readable gate states are emitted later by convergence/calibration diagnostics in `50_diagnostics/`.