Release and platform policy

AlloyGBM 0.12.8 release notes and platform policy.

What’s new in 0.12.8

Feature release on top of v0.12.7. Narrows limitation #4 from docs/limitations.md: the GLM ("poisson", "gamma", "tweedie") and "quantile" objectives now work on GBMRanker and MultiLabelGBMRanker in addition to single-output GBMRegressor. Only the Classifier / multiclass softmax paths still reject these objectives.

GLM and Quantile objectives on ``GBMRanker``. GBMRanker(ranking_objective="poisson" | "gamma" | "tweedie" | "quantile", …) is now accepted. The objectives reuse GBMRegressor’s training path and the artifact-recorded post-transform (the predictor applies exp for GLM objectives), so predictions return on the natural scale. tweedie_variance_power and quantile_alpha are honored.
GLM and Quantile objectives on ``MultiLabelGBMRanker``. Both multi_label_mode="independent" and "joint" accept per-label GLM/quantile objectives, including mixed lists such as ranking_objective=["poisson", "gamma", "tweedie", "quantile"]. In joint mode the GLM exp post-transform is applied on the Python predict surface, and the .alloy bundle (v3) now persists ranking_objective so the post-transform survives a save_model/load_model roundtrip.
Engine. JointObjective gained Poisson / Gamma / Tweedie { variance_power } / Quantile { alpha } variants (delegating to the existing single-output ObjectiveOps impls), plus a joint empirical-quantile leaf-refinement pass (refine_joint_quantile_leaves).
Bug fix. Joint GLM/quantile predictions previously lost the post-transform after save_model/load_model because the bundle did not persist ranking_objective; the v3 metadata now stores and restores it.

What’s new in 0.12.7

Feature and compatibility release on top of v0.12.6. Closes limitation #6 from docs/limitations.md: Quantile regression now fully composes with DART boosting, MorphBoost training, and piecewise-linear (leaf_model="linear") leaves.

Quantile objective compatibility extended. GBMRegressor(objective="quantile") now successfully composes with:
- DART boosting (boosting_mode="dart"): leaf refinement operates correctly on dropped-out residuals.
- MorphBoost (training_mode="morph"): leaf refinement scales intercept updates by MorphBoost per-round shrinkage and depth-based penalty.
- Piecewise-linear leaves (leaf_model="linear"): leaf refinement calculates residual targets by subtracting the linear portion of predictions from training values (correctly walking from root to terminal leaf to accumulate parent-relative delta weights for max_depth >= 2), and only refines the flat leaf intercept (avoiding double-scaling of build-time solved linear slopes).
Linear leaves + quantile numeric test. Added a new robust, multi-feature, max_depth >= 4 numeric test test_quantile_linear_leaves_numeric verifying that linear-leaf quantile regression fits linear relationships significantly better than standard constant-leaf models and that path-level weights accumulate correctly.
Fixed double-scaling blocker on linear leaf weights during quantile leaf refinement. Solved linear weights already carry the appropriate learning rate scale from build time and are now left untouched during intercept refinement.
Aligned MorphBoost shrinkage calculation. Aligned iter_shrinkage calculation in trainer/mod.rs with the authoritative tree builder (tree_build.rs) formula, removing the redundant .max(0.0) clamp to avoid cosmetic divergence.

No artifact format change. Model artifacts written by v0.12.6 load and predict identically under v0.12.7.

What’s new in 0.12.6

Feature release on top of v0.12.5. Closes limitation #3 from docs/limitations.md: SHAP values and interaction values are now supported on multiclass classifiers and multi-output (joint) rankers in addition to single-output regressors.

GBMClassifier.shap_values(X) and GBMClassifier.shap_interaction_values(X) return a list of K arrays — one per class logit. Additivity per class: Σⱼ values[k][i][j] + expected_values[k] ≈ raw_logit_k(rows[i]).
MultiLabelGBMRanker.shap_values(X) and MultiLabelGBMRanker.shap_interaction_values(X) return a list of n_labels arrays — one per output. Joint mode (multi_label_mode="joint") routes through new per-output Rust entry points with full binning-context support; independent mode fans out to per-label GBMRanker.shap_values.
global_importance_from_artifact_bytes now averages over outputs (divides by n_models) so importance magnitudes remain comparable across single-output and multi-output models.
The Rust crate gained four new public entry points: explain_rows_from_artifact_bytes_per_output, explain_rows_from_artifact_bytes_with_binning_per_output, explain_interactions_from_artifact_bytes_per_output, and explain_interactions_from_artifact_bytes_with_binning_per_output. The original single-output entry points keep their existing signature and now error on K>1 artifacts directing callers to the _per_output variants.

Internal refactors. load_artifact_context decomposed into unroll_multiclass, parse_joint_baselines, and unroll_multi_output helpers (orchestrator stays ~45 lines). bindings/python/src/predict.rs split into predict.rs (predictor entry points) and shap_bridge.rs (all 16 SHAP PyO3 wrappers — 8 single-output + 8 _multi). Continues the v0.12.2 / v0.12.3 decomposition pattern.

No artifact format change. Model artifacts written by v0.12.5 load and predict identically under v0.12.6.

What’s new in 0.12.5

Small feature release on top of v0.12.4. Closes the leaf_model="linear" exception on SHAP interaction values that was carved out when interactions originally shipped in v0.11.0.

GBMRegressor.shap_interaction_values(X) now accepts artifacts trained with leaf_model="linear". The row-dependent linear deviation w_j · (x_j − μ_j) is credited to the diagonal of the interaction matrix (the regressor feature’s main effect): standard TreeSHAP interactions run on the constant part of each leaf (intercept + Σⱼ wⱼ·μⱼ), then the per-row deviations are folded onto Φ[j][j] via the same helper that backs PL-leaf shap_values. Full additivity (Σᵢⱼ Φᵢⱼ + E = ŷ) and row-marginal (Σⱼ Φᵢⱼ = φᵢ) hold by construction; the matrix is symmetric and expected_value is unchanged.
Pragmatic caveat: this attribution does not split linear-deviation credit across path-feature × regressor-feature off-diagonals; a faithful PL-leaf interaction decomposition remains an open extension.
Internal refactor: explain_interactions_from_model moved from crates/shap/src/lib.rs to crates/shap/src/tree_shap.rs next to its peer explain_rows_tree_shap. Continues the v0.12.2 SHAP-crate decomposition pattern; no behavioral change.

No artifact format change. Model artifacts written by v0.12.4 load and predict identically under v0.12.5. 644 pytest (v0.12.4 baseline 643 plus the renamed-and-extended linear-leaf interactions test and the new LinearRank × linear-leaves coverage) and 447 cargo (v0.12.4 baseline 445 plus two new shap_interactions_linear_leaves_*_satisfies_additivity tests).

What’s new in 0.12.4

Bugfix release on top of v0.12.3. Two post-merge review findings (issues #48, #49) from the v0.12.2 / v0.12.3 refactor PRs:

GBMRegressor.__module__ now reports its public alloygbm.regressor shim path instead of the private alloygbm._regressor._core implementation module. repr and newly-created pickle payloads no longer leak the internal package layout; old v0.12.3 pickles continue to load.
The joint trainer’s module-level documentation in crates/engine/src/joint/mod.rs is refreshed to reflect the v0.10.x feature parity (DART, GOSS, MorphBoost, DRO, factor neutralization, warm-start, leaf-wise growth, native categorical splits, interaction constraints) that had landed since the original v0.10.0 minimal scope.

No user-facing API changes, no behavioral changes, no new features. Model artifacts written by v0.12.3 load and predict identically under v0.12.4. 643 pytest (the v0.12.3 baseline of 641 plus the two new regression tests for the module-identity fix) and 445 cargo tests pass.

What’s new in 0.12.3

Phases 6–8 of the structural refactor — completing the program. No user-facing API changes, no behavioral changes, no new features. The 6,619-line bindings/python/src/lib.rs (the PyO3 bridge) was decomposed into nine focused submodules plus a slim lib.rs and an extracted tests/ submodule; the 4,909-line bindings/python/alloygbm/regressor.py (the GBMRegressor estimator) was decomposed into a _regressor/ mixin package (_base plus four mixins and a _core shell), with regressor.py reduced to a back-compat shim.

No new objectives, parameters, training modes, or estimator API.
No artifact format changes. Model artifacts written by v0.12.2 load and predict identically under v0.12.3.
from alloygbm.regressor import GBMRegressor and the alloygbm.regressor module name are unchanged; GBMClassifier / GBMRanker subclass GBMRegressor transparently.
Closes the file-decomposition program (issue #44). The 445 cargo + 641 pytest tests held at every refactor commit.

What’s new in 0.12.2

Phase 4 + Phase 5 of the structural refactor. No user-facing API changes, no behavioral changes, no new features. The 3,925-line crates/shap/src/lib.rs was decomposed into eight focused single-responsibility modules; the 5,088-line crates/engine/src/joint.rs was promoted to a crates/engine/src/joint/ subdir with five sibling modules.

No new objectives, parameters, training modes, or estimator API.
No artifact format changes. Model artifacts written by v0.12.1 load and predict identically under v0.12.2; v0.12.2 produces byte-identical artifacts to v0.12.1 from the same training data.
No public Rust API changes. Every pub symbol that resolved at alloygbm_shap::* or alloygbm_engine::joint::* in v0.12.1 still resolves at the same path in v0.12.2 via the pub use re-exports in the SHAP crate’s lib.rs and in joint/mod.rs.
Verified at every commit. All 445 cargo workspace tests and all 641 pytest tests pass unchanged on every one of the 15 refactor commits (9 for the SHAP crate, 6 for the engine joint trainer). Function bodies were moved byte-identically; visibility promotions on private items were limited to the minimum required for sibling-module access (private fn to pub(super) or pub(crate), never past pub(crate)).

After this release, the remaining queued refactor work is the PyO3 binding (Phase 6), the Python regressor (Phase 7), and a cross-cutting verification + CLAUDE.md refresh (Phase 8) — see tracking issue #44. Each ships as its own patch release.

What’s new in 0.12.1

Phase 2 + Phase 3 of the structural refactor. No user-facing API changes, no behavioral changes, no new features. The 4,822-line crates/core/src/lib.rs was decomposed into thirteen focused single-responsibility modules; the 3,987-line crates/backend_cpu/src/lib.rs was decomposed into five sibling modules.

No new objectives, parameters, training modes, or estimator API.
No artifact format changes. Model artifacts written by v0.12.0 load and predict identically under v0.12.1; v0.12.1 produces byte-identical artifacts to v0.12.0 from the same training data.
No public Rust API changes. Every pub symbol that resolved at alloygbm_core::* or alloygbm_backend_cpu::* in v0.12.0 still resolves at the same path in v0.12.1 via the pub use re-exports in each crate’s lib.rs.
Verified at every commit. All 445 cargo workspace tests and all 641 pytest tests pass unchanged on every one of the 18 refactor commits (13 for the core crate, 5 for backend_cpu). Function bodies were moved byte-identically; visibility promotions on private items in backend_cpu were limited to the minimum required for sibling-module access (private fn to pub(crate) fn, never past pub(crate)).

After this release, the remaining queued refactor work is the SHAP crate (Phase 4), the engine joint trainer (Phase 5), the PyO3 binding (Phase 6), the Python regressor (Phase 7), and a cross-cutting verification + CLAUDE.md refresh (Phase 8) — see tracking issue #44. Each ships as its own patch release.

What’s new in 0.12.0

Engine crate refactor. No user-facing API changes, no behavioral changes, no new features. The 15,189-line crates/engine/src/lib.rs monolith was decomposed into 24 focused single-responsibility modules across a new crates/engine/src/ layout and a new crates/engine/src/trainer/ submodule directory. The remaining lib.rs is 101 lines of module declarations and pub use re-exports.

No new objectives, parameters, training modes, or estimator API.
No artifact format changes. Model artifacts written by v0.11.1 load and predict identically under v0.12.0; v0.12.0 produces byte-identical artifacts to v0.11.1 from the same training data.
No public Rust API changes. Every pub symbol that resolved at alloygbm_engine::* in v0.11.1 still resolves at the same path in v0.12.0 via the pub use re-exports in lib.rs.
Verified at every commit. All 207 engine unit tests, all 445 workspace Rust tests, and all 641 pytest tests pass unchanged on every one of the 24 refactor commits. Function bodies were moved byte-identically; visibility promotions were limited to the minimum required by the new module boundary (private fn to pub(crate) fn, never past pub(crate)).

Scope: only crates/engine/src/lib.rs. The other large files (bindings/python/src/lib.rs, crates/engine/src/joint.rs, bindings/python/alloygbm/regressor.py, crates/core/src/lib.rs, crates/backend_cpu/src/lib.rs, crates/shap/src/lib.rs) are untouched and queued for future releases.

What’s new in 0.11.1

Quantile regression. GBMRegressor accepts a new quantile regression objective (objective="quantile") with pinball loss semantics and parameter quantile_alpha (default 0.5, strictly in (0.0, 1.0)).

Empirical Quantile Leaf Refinement: At the end of each round, a custom post-growth leaf refinement step (refine_quantile_leaf_values) is run to replace Newton-Raphson leaf predictions with the actual empirical quantiles of residuals for all rows in each leaf.
Full-dataset refinement: Under row_subsample < 1.0, split-finding runs on the subsampled subset, but leaf refinement uses the entire training set to minimize the estimation variance of the empirical quantile.
Proxy Hessian: Since the pinball loss has a zero second derivative everywhere, a proxy Hessian h_i = w_i (sample weight) is used during split-finding.
Quickselect optimization: The unweighted refinement path uses a fast O(N) quickselect algorithm (select_nth_unstable_by) instead of sorting O(N log N), avoiding performance degradation.
Validation: Gated validation ensures that invalid quantile_alpha settings are only rejected when objective="quantile" is active, leaving non-quantile models unaffected.

Scope limit: Single-output GBMRegressor only. Rejects combinations with DART boosting, MorphBoost, linear leaves (leaf_model="linear"), classification, ranking, and joint multi-output training.

What’s new in 0.11.0

Two small, independent wins in one release.

SHAP interaction values. GBMRegressor.shap_interaction_values(X) returns the (n_rows, n_features, n_features) pairwise SHAP-interaction tensor in O(T · L · D² · M) time. Implements Lundberg et al. (2020) Algorithm 2, ported verbatim from the canonical slundberg/shap C++ reference. Three invariants are pinned by tests: symmetric (Φ_ij == Φ_ji), row-marginal recovers per-feature SHAP (Σ_j Φ_ij == φ_i), and full additivity reconstructs the prediction (Σ_i Σ_j Φ_ij + expected_value == predict(x) within atol = 1e-5 + rtol = 1e-4 · |predict(x)|). Constant-leaf artifacts only; leaf_model="linear" is rejected.

Poisson / Gamma / Tweedie GLM objectives. GBMRegressor accepts three new log-link GLM objectives. All three use weighted-mean-in-log-space initial predictions, Newton-Raphson leaves, and the standard ObjectiveOps machinery. predict() returns exp(raw). Tweedie supports 1 < variance_power < 2 (compound Poisson-gamma) via the new tweedie_variance_power: float = 1.5 constructor kwarg. New deviance metrics in alloygbm.evaluation: poisson_deviance, gamma_deviance, tweedie_deviance(y_true, y_pred, variance_power=p). Target-domain validation raises ValueError before training starts when targets violate the domain (negative y for Poisson/Tweedie, non-positive y for Gamma).

Single-output GBMRegressor only; not on Ranker, Classifier, multiclass, or the joint multi-output ranker.

What’s new in 0.10.6

Closes the last v0.10.4-deferred joint-path follow-up: all three factor neutralization modes now work on the joint multi-output trainer. MultiLabelGBMRanker(multi_label_mode="joint", neutralization=…, factor_exposures=…) supports "pre_target", "per_round_gradient", and "split_penalty" with the same surface as the single-output GBMRegressor / GBMRanker. The joint trainer reaches full feature parity with the single-output path. Default behaviour for every existing user-facing API remains byte-identical to v0.10.5 when neutralization is not opted into.

Three new modes, all activated via the neutralization kwarg:

pre_target — residualize each per-output target through the factor exposures once before training. Requires every per-output objective to be squared_error (the only objective where residualize-target equals residualize-gradient).
per_round_gradient — project each of the K gradient buffers in place every round after computing them. Mirrors the single-output multiclass per-class projection pattern.
split_penalty — subtract a K-output factor-load penalty from each candidate split’s gain. Applies under both tree_growth="level" and tree_growth="leaf".

Three new kwargs admitted by _JOINT_SUPPORTED_KWARGS:

neutralization — "none" (default), "pre_target", "per_round_gradient", or "split_penalty"
factor_neutralization_lambda — ridge regularization on the projector Gram matrix (default 1e-6)
factor_penalty — split_penalty mode’s penalty multiplier (default 0.0 — 0 collapses to standard byte-for-byte)

Plus the factor_exposures= kwarg on fit() (already existed for the independent-mode fallback; now honored on joint too). The PyO3 bridge cross-validates the exposures-vs-config invariant: active config requires exposures, exposures require an active config.

Artifact: new ModelSectionKind::NeutralizationMetadata (kind=14) records the active config in the artifact so joint models are self-describing. Metadata only; prediction never reads it (neutralization is a training-time transformation; the trained leaf values already bake in the projection).

Byte-equivalence: a fit with neutralization='none' (or kind=None, or split_penalty=0) produces byte-identical artifact bytes to a pre-v0.10.6 fit. Pinned by joint_neutralization_inert_configs_match_v0_10_5_byte_for_byte. Composes with MorphBoost (training_mode="morph"), DRO leaves (leaf_solver="dro"), DART boosting, and warm-start.

What’s new in 0.10.5

Closes the joint DRO leaves follow-up from v0.10.4. MultiLabelGBMRanker(multi_label_mode="joint", leaf_solver="dro", dro_radius=…, dro_metric="wasserstein") now applies Wasserstein-distributionally-robust leaf values on the joint multi-output trainer, mirroring GBMRegressor / GBMRanker’s single-output leaf solver. Default behaviour for every existing user-facing API remains byte-identical to v0.10.4 when DRO is not opted into.

Joint DRO leaves: routes the K-output Newton-Raphson leaf step through alloygbm_core::leaf_effective_gradient (the same helper used by single-output GBMRegressor / GBMRanker since v0.6.x). Applied in-build inside build_joint_round_inner’s leaf_values closure and build_joint_round_leafwise’s per-output leaf computation — row indices are already in scope at leaf-computation time. DRO is leaf-only: split-gain dispatch still uses the standard K-output sum-of-XGBoost-gains (multi-output histogram doesn’t carry per-bin grad_sq; adding it would cost ~1.5× joint-round memory — split-time DRO is deferred pending benchmark evidence).

Three new kwargs in _JOINT_SUPPORTED_KWARGS:

leaf_solver — "standard" (default) or "dro"
dro_radius — float ≥ 0; 0.0 collapses to standard byte-for-byte
dro_metric — "wasserstein" (only supported value in v0.10.5)

Works under both tree_growth="level" and tree_growth="leaf", and composes with MorphBoost (training_mode="morph") and DART/GOSS boosting modes. Byte-equivalent to v0.10.4 when lambda_l1 == 0 AND (dro_config.is_none() OR dro_config.radius == 0.0); pinned by joint_dro_radius_zero_matches_standard_byte_for_byte (cargo) and test_joint_dro_radius_zero_byte_equivalent_to_standard (pytest).

Deferred to v0.10.6: joint factor neutralization (neutralization + factor_exposures). Remains in docs/limitations.md Limitation 2 with explicit version marker.

What’s new in 0.10.4

Adds MorphBoost (Kriuk 2025, arXiv:2511.13234) to the joint multi-output trainer used by MultiLabelGBMRanker(multi_label_mode="joint"). This is the first of three deferred items from docs/limitations.md Limitation 2 to ship; DRO leaves landed in v0.10.5 and factor neutralization on the joint trainer is tracked for v0.10.6. Default behaviour for every existing user-facing API remains byte-identical to v0.10.3 when MorphBoost is not opted into.

Joint MorphBoost surface: MultiLabelGBMRanker(multi_label_mode="joint", training_mode="morph", …) now activates MorphBoost on the shared-tree multi-output trainer. Honors the full single-output MorphBoost surface — morph_rate, evolution_pressure, morph_warmup_iters, info_score_weight, depth_penalty_base, balance_penalty, lr_schedule, lr_warmup_frac. Per-iteration LR schedule (constant or warmup-cosine), per-leaf depth penalty (depth_penalty_base ^ (depth/3) where depth = (local_node_id + 1).ilog2()), and per-iteration leaf shrinkage (1 − morph_rate * round/total) all apply uniformly across the K-output leaf values.

Multi-output morph gain: two new helpers in crates/engine/src/shared_histogram.rs — compute_multi_output_split_gain_morph and find_best_multi_output_categorical_split_morph — sum per-output morph gain across the K outputs. Each output uses its own (grad_mean, grad_std) snapshot from MorphState::ema_stats[k]. Per-side row count for the info-gain term is approximated via hess.max(0.0) as u32 (multi-output histogram doesn’t carry exact counts) — exact for objectives where hess ≡ 1 per row, monotone proxy for ranking. Warmup byte-equivalence with the standard K-output gain is guaranteed regardless.

MorphBoost EMA warm-start (continuity, not byte-equivalence): JointWarmStartState.initial_ema_stats: Option<Vec<GradientEmaStats>> re-seeds MorphState::ema_stats on warm-resume so the gradient- statistics smoothing is continuous across the resume boundary — new rounds see the same per-output (mean, std) they would have seen had training never been interrupted. The PyO3 bridge auto-extracts the snapshot from init_artifact_bytes via TrainedModel::from_artifact_bytes(…).morph_metadata.

MorphBoost warm-resume is intentionally NOT byte-equivalent to a fresh longer fit. Per-iteration leaf shrinkage and LR schedule are resolved against the total_iterations horizon at training time; a prior fit with n_estimators=6 baked its first six trees against a 6-round horizon and resuming with n_estimators=4 cannot retroactively re-scale them. The EMA continuity is the practical guarantee. This mirrors the single-output MorphBoost warm-start behavior.

Deferred to v0.10.5 / v0.10.6 (from v0.10.4): joint DRO leaves (leaf_solver="dro") — shipped in v0.10.5 — and joint factor neutralization (neutralization + factor_exposures) — tracked for v0.10.6. See docs/limitations.md Limitation 2.

What’s new in 0.10.3

Closes the four “v0.10.3” follow-ups carved out of the v0.10.2 joint-trainer parity work: native-categorical Python wiring, joint GOSS, joint DART, and joint warm-start. The MultiLabelGBMRanker(multi_label_mode="joint") wrapper now accepts every kwarg the single-output trainer accepts (except MorphBoost / DRO / factor neutralization, which are tracked for v0.10.4). Default behaviour for every existing user-facing API remains byte-identical to v0.10.2 when the new knobs are not opted into.

Joint native-categorical Python wiring: the Rust-level joint native-cat trainer (fit_joint_multi_output_with_categorical + find_best_multi_output_categorical_split) was already in v0.10.2; the PyO3 bridge train_joint_multi_label_ranker now re-bins requested columns to bin_index == category_id before invoking the trainer (mirrors the single-output apply_categorical_encoding_to_training_matrices_multi). The _JOINT_SUPPORTED_KWARGS allow-list re-adds categorical_feature_indices and max_cat_threshold.

Joint GOSS: new select_joint_row_indices_for_round helper inside crates/engine/src/joint.rs mirrors select_row_indices_for_round_multiclass — per-row score is \(s_i = \\sum_k |g_{i,k}|\) across the K per-output gradient buffers (LightGBM multiclass GOSS convention). A single row mask is shared across all K buffers; the amplification factor mutates every per-output gradient/hessian in lockstep so histograms remain unbiased. MultiLabelGBMRanker(multi_label_mode='joint', boosting_mode='goss', goss_top_rate=..., goss_other_rate=...).

Joint DART: dropout/normalize cycle added to fit_joint_inner. One tree per round on the joint trainer simplifies bookkeeping vs. multiclass DART: dart_state.tree_weights has length rounds_completed and dart_round_start_offsets[r] / dart_round_counts[r] collapse to a flat per-round pair. Reuses engine::dart::{select_dropouts, apply_normalization} unchanged. Per-stump tree_weight persists via the existing DartTreeWeights artifact section (kind=11), and JointPredictor is extended with tree_weights: Vec<f32> so each tree’s leaf contribution is multiplied by tree_w at predict time.

Joint warm-start: new JointWarmStartState { baselines, stumps, initial_rounds_completed, initial_dart_tree_weights } + new fit_joint_multi_output_with_warm_start entry point. MultiLabelGBMRanker(multi_label_mode='joint', warm_start=True, init_model=<prior_fit>) cracks open the prior fit’s joint artifact, replays prior stumps onto predictions via the shared walk_tree_into_predictions helper, re-encodes new-round node_id starting at initial_rounds_completed, and (under DART) reconstructs dart_state.tree_weights from per-stump tree_weight. Per-round seeds mix global_round = round + initial_rounds so an N+M warm-resumed fit produces identical RNG draws to a fresh N+M fit on rounds N..N+M.

Deferred to later v0.10.x point releases:

v0.10.4: MorphBoost, DRO, and factor neutralization on the joint path.

What’s new in 0.10.2

Closes the leaf-wise multiclass DART limitation and the first slice of joint-path feature parity (leaf-wise growth, native-categorical, interaction constraints, row/col subsample, min_split_gain). The remaining joint-path features land in v0.10.3 (GOSS, DART, warm-start on joint) and v0.10.4 (MorphBoost, DRO, neutralization on joint). Default behaviour for every existing user-facing API remains byte-identical to v0.10.1 when the new features are not opted into.

Joint trainer core feature parity: engine::joint::fit_joint_multi_output now supports tree_growth="leaf" + max_leaves (via the new build_joint_round_leafwise priority-queue best-first growth), interaction_constraints (reusing the single-output InteractionConstraintIndex), min_split_gain, row_subsample, and col_subsample. All five are exposed through MultiLabelGBMRanker(multi_label_mode="joint") Python surface; _JOINT_SUPPORTED_KWARGS grew to permit min_split_gain, row_subsample, col_subsample, interaction_constraints, tree_growth, max_leaves.

Native-categorical splits on the joint path are partially shipped: the Rust-level find_best_multi_output_categorical_split Fisher-sort helper + fit_joint_multi_output_with_categorical entry point are in place and sound when given bins where bin_index == category_id. The Python surface is intentionally not wired in v0.10.2 because the current bridge bins all features with ContinuousBinningStrategy::Linear which doesn’t preserve that invariant for joint mode — categorical_feature_indices and max_cat_threshold are rejected in joint mode and tracked for v0.10.3.

Leaf-wise multiclass DART: GBMClassifier(boosting_mode="dart") with K ≥ 3 classes now works under tree_growth="leaf" + max_leaves. The v0.10.1 tree_growth='level' restriction in fit_multiclass_iterations_impl was lifted. Per-class dart_round_start_offsets[k] / dart_round_counts[k] bookkeeping is growth-mode-agnostic because it snapshots class_stumps[k].len() around each build_tree_* call. Validation early-stopping DART transition and DART warm-start tree-weight reconstruction work without changes.

Deferred to later v0.10.x point releases (as documented in v0.10.2, now closed):

v0.10.3 shipped: native-cat Python wiring, joint GOSS, joint DART, joint warm-start.
v0.10.4: MorphBoost, DRO, and factor neutralization on the joint path.

What’s new in 0.10.1

Closes the three v0.10.x-deferred limitations from v0.10.0: MultiLabelGBMRanker joint mode Python surface, multiclass softmax + GOSS, and multiclass softmax + DART (including warm-start). Default behaviour for every existing user-facing API remains byte-identical to v0.10.0 when the new features are not opted into.

MultiLabelGBMRanker joint mode (Python surface):

MultiLabelGBMRanker(multi_label_mode="joint") now routes through a new PyO3 entry point (train_joint_multi_label_ranker) and JointPredictorHandle py-class to the v0.10.0 Rust joint trainer engine::joint::fit_joint_multi_output. Default mode is still "independent" (the K-per-label GBMRanker fallback from v0.7.1) — joint is opt-in. Bundle format bumped to v2 with an explicit mode byte; v1 bundles still load as independent.

Multiclass softmax + GOSS:

GBMClassifier(boosting_mode="goss") for K >= 3 classes. Per-row score \(s_i = \\sum_k |g_{i,k}|\) (LightGBM convention) drives a shared sampling mask across all K class gradient buffers; the amplification factor is applied identically to every class’s grad and hess. The multiclass round loop was refactored so the K gradient buffers are pre-computed before sampling.

Multiclass softmax + DART (+ warm-start):

GBMClassifier(boosting_mode="dart") for K >= 3 classes. Per-class prediction vectors get per-round subtract/readd of dropped tree contributions scaled by dart_state.tree_weights. Per-class dart_round_start_offsets / dart_round_counts arrays track the contiguous stump slice each (round, class) tree occupies in class_stumps[k] so dropout subtracts the WHOLE class tree, not just its root stump. After K new trees are built each round they are rescaled to new_w = 1/(n_dropped + 1) and the dropped trees are re-added at their rescaled weights. stump.tree_weight = new_w is stamped on every stump in the new round’s per-class slice. Requires tree_growth="level" in v0.10.1.
MultiClassWarmStartState.initial_dart_tree_weights carries the flat round-major × class-k per-tree weights from the prior fit, so continuation seeds dart_state.tree_weights correctly. The PyO3 bridge reconstructs the per-tree weights by grouping class_stumps[k] by tree_id (decoded from node_id / TREE_NODE_STRIDE) — taking the first stump’s tree_weight per tree group, mirroring the predictor’s apply_dart_tree_weights convention.

Constraints:

Multiclass DART requires tree_growth="level"; leaf-wise dropout indexing across K class trees is tracked as a follow-up.
Joint mode supports level-wise growth, standard boosting, and the built-in squared_error / queryrmse / rank:pairwise / rank:ndcg / rank:xendcg objectives only. Joint-path feature parity (MorphBoost, neutralization, DRO, interaction constraints, leaf-wise, GOSS, DART, warm-start, row_subsample, col_subsample, min_split_gain) is targeted for later v0.10.x releases — see docs/limitations.md.

What’s new in 0.10.0

Infrastructure release: lays the Rust-level foundation for joint multi-output learning and closes the v0.9.0 DART + warm_start follow-up. Default behaviour for every existing user-facing API (GBMRegressor, GBMClassifier, GBMRanker, MultiLabelGBMRanker) remains byte-identical to v0.9.0 — the new MultiOutputLeafValues artifact section is only emitted when the (currently Rust-only) joint trainer produces a model.

DART + warm_start continuation:

GBMRegressor, GBMClassifier, and GBMRanker now accept boosting_mode="dart" + warm_start=True (or fit(..., init_model=prior_model)). The v0.9.0 rejection error is removed.
WarmStartState gains an optional initial_dart_tree_weights field that captures the per-stump tree_weight snapshot from the prior fit. The engine seeds dart_state.tree_weights from this snapshot and pre-populates the round_start_offsets / dart_round_counts arrays from the warm-start tree shapes.
Historical RNG-driven dropped_per_round is intentionally not persisted; new rounds start fresh dropout bookkeeping going forward.

Joint multi-output infrastructure (Rust):

MultiOutputHistogram (crates/engine/src/shared_histogram.rs) accumulates K (grad, hess) pairs per (feature, bin) in one sweep, with subtraction trick and multi-output split-gain helpers.
MultiOutputLeafValues artifact section (kind index 13) stores per-stump K-output leaf values. TrainedStump gains optional multi_output_leaf_values: Option<(Vec<f32>, Vec<f32>)>.
Rust-level joint trainer (crates/engine/src/joint.rs): fit_joint_multi_output runs the full training loop with K per-output objectives (squared_error, queryrmse, rank:pairwise, rank:ndcg, rank:xendcg); JointPredictor decodes the artifact and predicts K outputs per row.
Scope intentionally minimal for v0.10.0: level-wise growth only, no MorphBoost / DRO / neutralization / leaf-wise / native-categorical / GOSS / DART / warm-start on the joint path.

Deferred to v0.10.x:

Python MultiLabelGBMRanker(training_mode="joint") user-facing surface (Rust infrastructure complete; targeted for v0.10.1).
Multiclass softmax + DART / GOSS (engine plumbing into the K-output histogram primitive is targeted for v0.10.1+).
Leaf-wise / MorphBoost / DRO / neutralization on the joint path (feature parity with the single-output trainer is targeted for v0.10.x).

What’s new in 0.9.0

Minor feature release: closes the v0.8.0 DART placeholder (Limitation 2) and resolves the linear-rank predict-path NaN routing bug (Limitation 4). Default behaviour is byte-identical to v0.8.0 on every API surface — the new DartTreeWeights artifact section is only emitted when at least one stump has a non-1.0 weight, which never happens under boosting_mode="standard" (the default) or boosting_mode="goss".

DART boosting mode (Dropouts meet MART):

New boosting_mode="dart" opt-in on GBMRegressor, binary GBMClassifier, and GBMRanker, with four companion parameters: dart_drop_rate (default 0.1), dart_max_drop (default 50), dart_normalize_type ("tree" or "forest", default "tree"), and dart_sample_type ("uniform" or "weighted", default "uniform").
Per-round dropout + normalization cycle lives in a new module crates/engine/src/dart.rs. No new crate dependencies — uses the existing mixed_hash splitmix64 derivative so per-stump drop decisions are deterministic given seed + round index.
Per-stump tree_weight: f32 is plumbed through TrainedStump and persisted via a new DartTreeWeights artifact section (ModelSectionKind index 12). Emitted only when at least one weight diverges from 1.0; pre-v0.9.0 artifacts continue to load with all weights defaulting to 1.0.
The single-output training loop rejects boosting_mode="dart" + warm_start with a clear error (tracked as a v0.10.x follow-up: would require persisting tree_weights and dropped_per_round in WarmStartState).
Multiclass softmax continues to reject boosting_mode != "standard" with a clear error message; per-class gradient scoring during the dropout step is tracked as a v0.10.x follow-up.

NaN routing on the linear-rank predict path (Limitation 4 resolved):

The predict-time quantize helpers in bindings/python/src/lib.rs (quantize_dense_values_linear_inplace_wide, quantize_dense_values_linear_rank_inplace_wide, and the inline loop in predict_dense_quantized_with_summary_bytes) now preserve f32::NAN through the f32 cast instead of casting a finite bin index. The predictor’s existing feature_value.is_nan() -> default_left short-circuit at crates/predictor/src/lib.rs:148 then fires automatically.
LinearLeaf::eval (in alloygbm-core) and LinearLeafCompact::eval (in alloygbm-predictor) now skip NaN regressor features when accumulating the linear sum, so PL-leaf predictions don’t NaN-poison on a w * NaN step.
Pure-linear, pure-quantile, and rank-binning paths now share consistent NaN semantics: missing values always route through the learned default_left direction.

Known limitations carried forward to v0.10.0

Multiclass softmax + DART is still rejected.
DART + warm_start is rejected.
Joint shared-tree multi-label ranking and the K-output shared-histogram engine primitive remain v0.10.0 targets.

What’s new in 0.8.0

Minor feature release: closes the mixed linear-rank SHAP carry-forward from v0.7.4 (Limitation 4) and adds LightGBM-style GOSS sampling as a new opt-in boosting mode. Default behaviour is byte-identical to v0.7.5 on every API surface. The other two original v0.8.0 targets — DART boosting mode and joint shared-tree multi-label ranking — were scope-split out to v0.9.0 and v0.10.0 respectively so this release could ship on a reviewable surface. BoostingMode::Dart is reserved in the API (Python boosting_mode="dart" raises NotImplementedError; the Rust trainer rejects it with a clear error message) so v0.9.0 can land DART training without further TrainParams churn.

GOSS sampling (gradient-based one-side sampling):

New boosting_mode="goss" opt-in on GBMRegressor, GBMClassifier (binary), and GBMRanker, with companion goss_top_rate (default 0.2) and goss_other_rate (default 0.1) parameters. Default boosting_mode="standard" is byte-identical to v0.7.5.
Implements LightGBM’s GOSS algorithm: at the start of each round rows are scored by |gradient|, the top goss_top_rate fraction is kept, goss_other_rate fraction is uniformly sampled from the rest, and the sampled-low rows’ gradient + hessian are multiplied by (1 - goss_top_rate) / goss_other_rate to preserve unbiased histogram statistics.
Reorders the per-round training loop so gradient computation happens before row sampling — required because GOSS scores by gradient magnitude. Standard and DART modes get the same pre-computed gradient buffer and fall back to uniform subsampling.
Multiclass softmax explicitly rejects boosting_mode != "standard" with a clear error message — per-class gradient scoring is tracked as a v0.8.1 follow-up. DART is reserved for the next feature commit on v0.8.0-features and currently raises NotImplementedError in Python.

SHAP strict additivity on the mixed linear-rank binning path (Limitation 4):

When continuous_binning_strategy="linear" triggered per-feature rank-based binning on at least one column (gated by the ALLOYGBM_EXPERIMENT_LINEAR_TAIL_RANK experiment flag), the Python shap_values() flow used to fall back to the legacy quantize-then-walk SHAP path which exempts leaf_model="linear" artifacts from strict additivity.
v0.8.0 adds a new BinningContext::LinearRank variant to crates/shap/src/lib.rs. It carries per-feature sorted unique values, global feature_mins / feature_maxs, and max_data_bin. At the explain_rows_from_model entry point SHAP internally quantizes the raw input rows to bin indices using exactly the same rules as predict_dense_quantized_linear_rank (linear quantize for unflagged features, rank quantize for flagged features, both with round_half_away_from_zero clamped to [0, max_data_bin]) and dispatches the remainder of the path-walker with BinningContext::PreBinned semantics. Both tree traversal and PL-leaf evaluation now operate in the same bin-index space the predictor uses, so strict additivity holds for leaf_model="linear" (and constant leaves stay correct).
The Python _shap_binning_kwargs() helper returns binning_kind="linear_rank" whenever any per-feature rank flag is set; GBMClassifier and GBMRanker inherit the fix from GBMRegressor._shap_binning_kwargs.
Verified by bindings/python/tests/test_shap_linear_rank_strict_additivity.py (architectural contract + strict additivity for both leaf_model="constant" and leaf_model="linear"). Closes Limitation 4.

What’s new in 0.7.5

Bug-fix release. Closes Limitation 5 from v0.7.4 — the pre-existing TreeSHAP polynomial-path additivity drift on trees with a feature appearing more than once on a root-to-leaf path. No user-visible API breakage.

TreeSHAP polynomial-path strict additivity:

The Rust port of TreeSHAP’s polynomial-time algorithm in crates/shap/src/lib.rs::ts_unextend_path was shifting the entire PathElement struct (including pweight) when removing a duplicate feature from the path. This clobbered the pweights that the unwind loop had just carefully recomputed in place. The reference implementation in slundberg/shap (shap/explainers/pytree.py) stores the four path fields as four parallel arrays and only shifts the first three (feature_index, zero_fraction, one_fraction), preserving pweights. Pre-existing in v0.7.3 and earlier; uncovered during v0.7.4 PR #27 review and pinned with an @xfail(strict=True) test at that time pending this v0.7.x follow-up.
The fix shifts the three fields explicitly and leaves pweight alone. Strict additivity now holds end-to-end on the polynomial path.
Coverage: a synthetic full-tree sweep (tree_shap_polynomial_path_matches_brute_force_on_full_trees) covers depths 2-7 × n_features {2,3,5,8,12} including all configurations that force path-duplicate features, asserting polynomial matches brute-force per-feature within 1e-5. The formerly @xfail(strict=True) regression test_strict_additivity_via_tree_shap_polynomial_path in bindings/python/tests/test_shap_pl_strict_additivity.py now passes as a regular test.

Documentation:

docs/limitations.md: Limitation 5 promoted to Resolved.
Other documented v0.7.x follow-ups (mixed linear-rank SHAP path, GOSS+DART, joint multi-label ranking, shared-histogram engine) remain deferred to v0.8.0.

What’s new in 0.7.4

Bug-fix release. Closes the remaining v0.7.x carryover documented in docs/limitations.md for SHAP strict additivity on leaf_model="linear" artifacts. No user-visible API breakage.

SHAP strict additivity for piecewise-linear leaves:

Pre-v0.7.4 distribute_linear_terms_for_row credited the per-feature deviation Σⱼ wⱼ·(xⱼ − μⱼ) only at each tree’s terminal leaf. The predictor accumulates leaf.eval_row(row) at every visited node along the row’s path, so SHAP was uncrediting one Σⱼ wⱼ·(xⱼ − μⱼ) per internal node per tree per row — producing additivity gaps on the order of the predictions themselves (~3.85 on linear-data predictions of magnitude ~10 with n_estimators=100, max_depth=6).
v0.7.4 walks the full row path and credits the linear deviation at every visited leaf. The brute-force Shapley and TreeSHAP polynomial paths share the helper so both get the fix.
The model_has_linear_leaves exemption in verify_additivity is now gated on binning.is_none(), so the predictor-aligned BinningContext callers — i.e. the default Python path for continuous features — get the strict atol + rtol·|predicted| tolerance check.
Coverage: 44 new regression tests in bindings/python/tests/test_shap_pl_strict_additivity.py exercising every binning strategy × max-bin width × lambda_l2 × max_depth × n_estimators combination, plus training_mode="manual" and "morph", interaction_constraints, GBMRanker, GBMClassifier (via the internal Rust check, since the raw margin is not exposed in Python), feature_importances (brute-force exact path), and mixed scalar+linear-leaf artifacts. Strict additivity holds on the default predictor-aligned binning path for any model that dispatches to the brute-force exact Shapley path (distinct_split_feature_count <= MAX_EXACT_SPLIT_FEATURES = 25). Larger models that trigger the polynomial-TreeSHAP path are subject to a pre-existing additivity drift documented as Limitation 5 (also present in v0.7.3 and earlier).

Documentation:

Limitation 4 (new): SHAP on the mixed linear-rank binning path — continuous_binning_strategy="linear" with per-feature rank-based binning falls back to the legacy non-binning SHAP entry point, triggering the leaf_model="linear" exemption. Narrow edge case; deferred to v0.8.0.
Limitation 5 (new): pre-existing TreeSHAP polynomial-path additivity drift on large gradient-trained trees (>= 30 distinct split features, depth >= 6). Uncovered during PR #27 review; investigated but not isolated in minimal Rust reproductions. Coverage pinned by @xfail(strict=True) regression test (test_strict_additivity_via_tree_shap_polynomial_path) so the eventual fix flips the xfail to a regular pass.

Documented for v0.7.x follow-ups (deferred to 0.8.0):

Joint shared-tree multi-label ranking. The current MultiLabelGBMRanker trains K independent per-label rankers under a unified API and is numerically equivalent to training each label separately. Joint shared-tree training lands alongside the v0.8.0 shared-histogram speedup where the architectural change has a real performance story.

What’s new in 0.7.3

Bug-fix release. Closes the four limitations queued in v0.7.2 and clears RUSTSEC-2025-0020. No user-visible API breakage.

SHAP additivity tolerance:

The internal additivity check now uses atol + rtol * |predicted| (atol=1e-5, rtol=1e-4) instead of a fixed 1e-5 absolute bound. Larger explanation batches — feature_importances() over ~1000 rows of California Housing with n_estimators=200 was the public-facing reproducer — no longer raise spurious RuntimeError on healthy leaf_model="constant" artifacts.

SHAP path-walker uses predictor-aligned float thresholds:

New shap::BinningContext (Linear, Quantile, PreBinned) plus four PyO3 entry points (shap_explain_rows_with_binning, shap_global_importance_with_binning, plus dense variants). When a binning context is provided, the path walker compares feature_value < float_threshold (matching the predictor’s convert_bin_thresholds_to_float*) instead of the legacy feature_value <= split.threshold_bin as f32. Eliminates the path-walk vs. predict-path divergence on continuous features for scalar-leaf artifacts.
GBMRegressor, GBMClassifier, and GBMRanker now pass feature mins / maxs / cuts / binning kind into SHAP automatically.

MorphBoost warm-start now persists EMA:

MorphMetadata artifact section bumped to v2 with appended Vec<GradientEmaStats> per class. WarmStartState and MultiClassWarmStartState gain initial_ema_stats: Option<Vec<GradientEmaStats>>. Both single-class and multiclass training loops seed the fresh MorphState.ema_stats from this snapshot, so resuming a MorphBoost-trained model via init_model= no longer restarts the EMA cold.
v1 artifacts decode with empty ema_stats; the engine falls back to MorphState::new cold initialization, preserving prior behaviour for legacy artifacts.

PyO3 0.23 → 0.24 (clears RUSTSEC-2025-0020):

Bumps pyo3 = "0.24" and numpy = "0.24". The bindings were already on the Bound<>-first API — zero source changes needed. deny.toml and .github/workflows/security-audit.yml no longer ignore RUSTSEC-2025-0020.

Limitations documented for the next release:

SHAP additivity for piecewise-linear leaves on continuous features remains exempted from the strict check (linear weights and feature_baseline are still trained in bin space).
Joint shared-tree multi-label boosting is still pending; the MultiLabelGBMRanker wrapper trains K independent per-label rankers.

What’s new in 0.7.2

Documentation, supply-chain, and repo-hygiene release. No user-facing Python API surface changes.

Documentation:

Multiple docs still claimed warm-start was rejected, SHAP required leaf_model="constant", interaction constraints did not exist, or rankers were single-label only after v0.7.1 actually shipped those features. README, docs/user/*.md, the Sphinx mirror under docs/site/source/*.rst, docs/roadmap/current.md, CLAUDE.md, AGENTS.md, and benchmarks/README.md are now consistent with the v0.7.1 surface that actually shipped.
docs/reference/release_checklist.md is now a top-to-bottom operating manual covering version bumps, doc updates, verification, tag/publish, and post-release bookkeeping.
docs/site/source/api.rst now auto-documents MultiLabelGBMRanker (was missing in v0.7.1).
New examples/ directory with 8 self-contained end-to-end scripts.

Repo hygiene & supply chain:

CI now runs the full pytest suite (455 tests) on every PR. v0.7.1 built the wheel and ran a handful of smoke snippets but never invoked pytest bindings/python/tests/ — the Python test suite was not enforced on merge.
Cargo.lock is tracked.
maturin pinned in publish.yml to the same SemVer range declared in pyproject.toml.
cargo-audit + cargo-deny run weekly and on every PR that touches Cargo manifests, configured via the new deny.toml.
Coverage reporting via cargo-llvm-cov + pytest-cov → Codecov.
publish = false on every workspace crate.
New CONTRIBUTING.md, SECURITY.md, GitHub issue / PR / CODEOWNERS / Dependabot configs, .editorconfig, requirements-dev.txt, README badges.

Limitations documented for the next release:

SHAP path-walker still compares against bin-index thresholds (carried over from v0.7.1).
MorphBoost warm-start does not restore the EMA snapshot (carried over from v0.7.1).
MultiLabelGBMRanker trains K independent per-label rankers; joint shared-tree multi-label boosting (carried over from v0.7.1).
NEW: SHAP additivity check has a 1e-5 absolute tolerance that f32 round-off can exceed across larger evaluation samples; loosening to atol + rtol * |predict(x)| is queued.
NEW: pyo3 = 0.23.5 has RUSTSEC-2025-0020; not exploitable in AlloyGBM’s code path. Upgrading to pyo3 0.24+ requires migrating the bindings to the Bound<>-first API.

What’s new in 0.7.1

SHAP for piecewise-linear leaves:

shap_values() now accepts leaf_model="linear" artifacts and returns an interventional decomposition: the path-based TreeSHAP / brute-force machinery attributes each leaf’s “constant part” (intercept + Σ wⱼ·μⱼ_global) while per-leaf row deviations wⱼ · (xⱼ − μⱼ_global) are credited directly to each regressor. Global feature means are persisted in a new FeatureBaseline artifact section so SHAP is self-contained at explain time.

Per-round training diagnostics:

Every estimator exposes diagnostics_per_round_ — a list of dicts containing gradient_l2_norm, gradient_variance, hessian_l2_norm, sampling counts, and (when factor neutralization is active) neutralization_effectiveness = 1 − ‖projₘ‖ / ‖origₘ‖.

Neutralized warm-start:

init_model / warm_start=True with neutralization=* is supported across pre_target, per_round_gradient, and split_penalty provided the caller supplies the same factor_exposures matrix used for the initial fit. Mode, factor_neutralization_lambda, and (for split_penalty) factor_penalty must match; mismatches raise a clear “does not match” error.

Interaction constraints:

LightGBM-compatible interaction_constraints=[[…]] on every estimator. Each group is a set of feature indices; any root-to-leaf path is restricted to splits on features from a single still-active group. Up to 64 groups per fit; enforced through both the level-wise and leaf-wise tree builders.

Multi-label ranking:

New MultiLabelGBMRanker exposes a unified multi-output ranking API. y is shaped (n_rows, n_labels) and predict returns the same shape. Trains one independent GBMRanker per label sharing group / factor_exposures / kwargs, supports per-label ranking_objective lists, and slices eval_set y-columns per label so early stopping and custom eval metrics work end-to-end.

Limitations documented for the next release:

SHAP path-walker still compares feature values against bin-index thresholds; strict additivity is relaxed for PL-leaf artifacts. Tightening this is queued for v0.7.2.
MorphBoost warm-start does not restore the EMA snapshot from the artifact, so resumed training starts EMA cold.
MultiLabelGBMRanker trains K independent per-label rankers. Joint shared-tree multi-label boosting is queued for v0.7.2.

What’s new in 0.7.0

Factor-neutral boosting:

New neutralization parameter on GBMRegressor, GBMClassifier, and GBMRanker, with row-aligned fit-time factor_exposures.
neutralization="per_round_gradient" projects each boosting round’s objective gradients away from user-supplied factors. Multiclass classification projects each class-gradient column independently.
neutralization="pre_target" residualizes the target once before training for built-in squared-error regression. Classification, ranking, custom objectives, and validation sets are rejected for this mode in 0.7.0.
neutralization="split_penalty" also subtracts a factor-load penalty from split gain via factor_penalty. It supports constant leaves, composes with leaf_solver="dro" and training_mode="morph", and rejects leaf_model="linear" in 0.7.0.
Neutralized warm_start and init_model continuation are rejected in 0.7.0 — this restriction was lifted in v0.7.1 with the same-exposures contract documented above.

Benchmarks:

alloygbm_factor_neutral and alloygbm_factor_neutral_dro arms added to benchmarks/run_model_comparison.py.
Benchmark datasets without explicit factors synthesize factor_exposures from the first min(5, n_features) feature columns. These arms are smoke and stability checks, not standalone quality claims, because the synthesized factors are also present as model features.

What’s new in 0.6.0

DRO-style scalar leaves:

New opt-in leaf_solver="dro" parameter on GBMRegressor, GBMClassifier, and GBMRanker. The solver is a fast, closed-form robust Newton update over within-leaf gradient uncertainty.
dro_radius controls the gradient-uncertainty penalty and dro_metric="wasserstein" names the Wasserstein-inspired robust counterpart. This is not a full optimizer over raw feature/target distributions.
leaf_solver="dro" requires leaf_model="constant" and composes with training_mode="morph".
Inference speed is unchanged because robust scalar leaf values are stored directly in the artifact.

What’s new in 0.5.0

Piecewise-linear (PL) tree leaves:

New opt-in leaf_model="linear" parameter on GBMRegressor, GBMClassifier, and GBMRanker. Each leaf stores a small linear model f_s(x) = b_s + Σ α_j x_j (up to 8 regressors per leaf, inherited from the split path’s feature indices; the cap is internal and not user-tunable in v0.5.0). Optimal weights are solved in closed form via the ridge regression α* = -(XᵀHX + λI)⁻¹ Xᵀg, regularised by lambda_l2.
Default leaf_model="constant" preserves all prior behaviour exactly.
New artifact section ModelSectionKind::LinearLeafCoefficients stores per-stump linear leaf data; backward-compatible with v0.4.0 artifacts.
Native-bitset categorical splits (max_cat_threshold > 0) fall back to constant leaves at the categorical split node; descendant numeric leaves use linear leaves normally.
Multi-class softmax fits each per-class tree sequence with linear leaves independently.
leaf_model="linear" composes with training_mode="morph".
SHAP (shap_values, feature_importances) currently raises an error for leaf_model="linear" artifacts; use leaf_model="constant" if you need SHAP.

Performance:

~10× faster convergence on linearly-structured datasets (fewer rounds to reach the same RMSE).
+3.5% RMSE on California Housing and +1.75pp accuracy on Breast Cancer vs constant leaves.
2–8× per-round training overhead from the closed-form Cholesky solve. Recommended lambda_l2 >= 0.01 for weight stability.

Benchmarks:

alloygbm_linear and alloygbm_morph_linear arms added to benchmarks/run_model_comparison.py for all four task types.
New benchmarks/pl_trees_benchmark.py script with convergence-curve and λ-sweep analysis.
Benchmark report committed to docs/benchmarks/pl_trees_v1.md.

What’s new in 0.4.0

MorphBoost mode and SIMD acceleration:

New opt-in adaptive training mode via training_mode="morph", implementing the criterion from Kriuk (2025). Available on GBMRegressor, GBMClassifier, and GBMRanker. See MorphBoost (Adaptive Split Criterion).
New per-iteration learning-rate schedule parameter lr_schedule ("constant" default, "warmup_cosine" available). Independent of training_mode — usable on its own.
Schedule-aware auto early-stopping: when an LR schedule is active, the auto-tuned min_loss_improvement threshold is scaled by current_lr / max_lr, and warmup-phase rounds are tolerated without termination.
Backend SIMD acceleration via the wide crate (safe API; AVX2 / NEON intrinsics underneath, scalar fallback otherwise). Histogram bin-scan and EMA passes are now vectorized; histogram tile sizing is auto-tuned for high-feature workloads.
New benchmark harnesses: benchmarks/morph_report.py, benchmarks/morph_ablation.py, and an enhanced benchmarks/numerai_benchmark.py with MorphBoost arms and a startup build-freshness check.
benchmarks/run_model_comparison.py registers two new arms by default per task type: alloygbm_morph and alloygbm_morph_cosine. New --models flag filters which arms run.

What’s new in 0.3.2

0.3.2 fixes silent zero-tree training in GBMRanker, corrects signature introspection, and adds a real-data ranking benchmark:

GBMRanker training fixes:

The auto training policy’s density-based min_split_gain and min_loss_improvement floors are no longer applied to ranking objectives. Ranking gradients are an order of magnitude smaller than regression/classification gradients; on datasets where row_count * feature_count >= 65 536 these floors were causing training to exit after round 1 with zero trees committed.
The main training loop’s unconditional loss_improvement < 0 early-exit no longer fires for ranking objectives, where round-to-round loss oscillation is expected behaviour.
inspect.signature(GBMRanker.__init__) now returns the full parameter set (ranking_objective plus all GBMRegressor parameters). Previously only three parameters were visible, causing tools that build kwargs via signature introspection to silently train with n_estimators=6.

Diagnostics:

stop_reason_ and rounds_completed_ attributes are now set on all estimators after fit() to surface the engine’s early-stop reason and actual committed round count.

Benchmarks:

Added california_ranking: California Housing reframed as learning-to-rank with geographic grid cells as queries and median_house_value bucketed into 5 graded relevance levels (~44 queries × 468 docs = ~20 595 rows).

What was new in 0.3.1

0.3.1 fixed multiclass prediction and expanded the benchmark suite:

Fixed class_trees threshold conversion so multiclass models predict correctly with continuous float features
Fixed multiclass benchmark argmax label mapping with model.classes_
Added wine_multiclass, digits_multiclass, adult_income, abalone_regression benchmark scenarios
Activated synthetic_multiclass and synthetic_categorical scenarios
Rewrote benchmarks/README.md

What was new in 0.3.0

0.3.0 adds native categorical splits, multi-class classification, and custom objective/metric support:

Native categorical splits:

Fisher-sort categorical split-finding with O(K log K) optimal binary partitions and O(1) bitset prediction
max_cat_threshold parameter controls the maximum category cardinality for native splits (default 0 = disabled, opt-in)
Category-to-ID mappings preserved through pickle, save/load, and params
Full support across GBMRegressor, GBMClassifier, and GBMRanker

Multi-class classification:

GBMClassifier auto-detects K > 2 classes and uses softmax (multinomial cross-entropy) objective with K trees per round
predict_proba returns (n_samples, K) probability matrix

Custom objectives and metrics:

objective=callable for user-defined gradient/hessian computation
eval_metric=callable for custom evaluation metrics with early stopping
higher_is_better protocol for metric direction

What was new in 0.2.0

0.2.0 was a major capability expansion from the regression-only 0.1.x series:

New estimators:

GBMClassifier – binary classification with log-loss objective, predict_proba, sklearn ClassifierMixin
GBMRanker – learning-to-rank with 5 objectives (RankNet, LambdaMART, XE-NDCG, QueryRMSE, YetiRank)

Core improvements:

NaN / missing value support across training and prediction
Sample weight support via fit(..., sample_weight=...)
Group ID support via fit(..., group=...)
Model persistence: pickle, save_model/load_model, artifact export
Feature name capture from pandas DataFrames and other named inputs
sklearn compatibility (BaseEstimator, RegressorMixin, ClassifierMixin, get_params, set_params, score)
min_split_gain exposed as a user parameter

Training enhancements:

Leaf-wise (best-first) tree growth via tree_growth="leaf"
Monotone constraints via monotone_constraints
Feature importance weighting via feature_weights
max_leaves parameter for leaf-budget-oriented training
Warm-starting / incremental training via warm_start=True
Up to 65,535 bins per feature (adaptive u8/u16 storage)
Multiple categorical column support via categorical_feature_indices
Histogram buffer reuse to reduce allocation pressure
Objective-aware training metric tracking (RMSE, log-loss, accuracy, NDCG)

Explanations:

TreeSHAP (polynomial-time exact Shapley values, replaces the 25-feature brute-force method)
SHAP limit raised from 20 to 25 features (for legacy brute-force path), then replaced entirely by TreeSHAP

Metrics:

accuracy – classification accuracy
log_loss – binary cross-entropy
ndcg – normalized discounted cumulative gain (with optional k)

Benchmarks:

Classification scenarios: breast_cancer, synthetic_classification
Ranking scenario: synthetic_ranking
Task-type-aware benchmark runner with per-type metrics and rendering

Validated release surface

For 0.7.1, the intended release surface is:

macOS arm64 wheel
Linux x86_64 manylinux wheel
source distribution

Deferred targets

These are intentionally deferred:

Windows wheels
macOS Intel wheels

Release checklist summary

Before a public release:

confirm package metadata and version
confirm user docs are up to date
confirm CI is green
confirm the built wheel installs in a fresh environment
confirm the publish workflow smoke-tests its wheel artifacts before upload
confirm benchmark messaging stays narrow and defensible