Release and platform policy
AlloyGBM 0.12.8 release notes and platform policy.
What’s new in 0.12.8
Feature release on top of v0.12.7. Narrows limitation #4 from
docs/limitations.md: the GLM ("poisson", "gamma", "tweedie")
and "quantile" objectives now work on GBMRanker and
MultiLabelGBMRanker in addition to single-output GBMRegressor. Only the
Classifier / multiclass softmax paths still reject these objectives.
GLM and Quantile objectives on ``GBMRanker``. GBMRanker(ranking_objective="poisson" | "gamma" | "tweedie" | "quantile", …) is now accepted. The objectives reuse GBMRegressor’s training path and the artifact-recorded post-transform (the predictor applies exp for GLM objectives), so predictions return on the natural scale. tweedie_variance_power and quantile_alpha are honored.
GLM and Quantile objectives on ``MultiLabelGBMRanker``. Both multi_label_mode="independent" and "joint" accept per-label GLM/quantile objectives, including mixed lists such as ranking_objective=["poisson", "gamma", "tweedie", "quantile"]. In joint mode the GLM exp post-transform is applied on the Python predict surface, and the .alloy bundle (v3) now persists ranking_objective so the post-transform survives a save_model / load_model roundtrip.
Engine. JointObjective gained Poisson / Gamma / Tweedie { variance_power } / Quantile { alpha } variants (delegating to the existing single-output ObjectiveOps impls), plus a joint empirical-quantile leaf-refinement pass (refine_joint_quantile_leaves).
Bug fix. Joint GLM/quantile predictions previously lost the post-transform after save_model / load_model because the bundle did not persist ranking_objective; the v3 metadata now stores and restores it.
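The roundtrip fix can be illustrated with a minimal sketch. It assumes a hypothetical JSON metadata layout (the real .alloy v3 format is not shown in these notes) and uses the helper names post_transform, save_bundle, and load_bundle purely for illustration.

```python
import json
import math

# Log-link objectives named in the notes; their raw scores need exp at predict time.
GLM_OBJECTIVES = {"poisson", "gamma", "tweedie"}

def post_transform(raw_row, objectives):
    """Apply exp to GLM outputs; leave other outputs (e.g. quantile) unchanged."""
    return [math.exp(r) if obj in GLM_OBJECTIVES else r
            for r, obj in zip(raw_row, objectives)]

def save_bundle(objectives):
    # Hypothetical metadata layout, not the actual .alloy bundle format.
    return json.dumps({"bundle_version": 3, "ranking_objective": objectives})

def load_bundle(payload):
    return json.loads(payload)["ranking_objective"]

objectives = ["poisson", "gamma", "tweedie", "quantile"]
raw = [0.0, 1.0, -1.0, 2.5]

restored = load_bundle(save_bundle(objectives))
before = post_transform(raw, objectives)
after = post_transform(raw, restored)
```

Because the objective list travels with the bundle, the loaded model can pick the same per-output transform as the in-memory one; without it, the loader has no way to know which outputs need exp.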
What’s new in 0.12.7
Feature and compatibility release on top of v0.12.6. Closes limitation
#6 from docs/limitations.md: Quantile regression now fully composes
with DART boosting, MorphBoost training, and piecewise-linear
(leaf_model="linear") leaves.
Quantile objective compatibility extended. GBMRegressor(objective="quantile") now composes with:
DART boosting (boosting_mode="dart"): leaf refinement operates correctly on dropped-out residuals.
MorphBoost (training_mode="morph"): leaf refinement scales intercept updates by MorphBoost per-round shrinkage and depth-based penalty.
Piecewise-linear leaves (leaf_model="linear"): leaf refinement calculates residual targets by subtracting the linear portion of predictions from training values (correctly walking from root to terminal leaf to accumulate parent-relative delta weights for max_depth >= 2), and only refines the flat leaf intercept (avoiding double-scaling of build-time solved linear slopes).
Linear leaves + quantile numeric test. Added a new robust, multi-feature, max_depth >= 4 numeric test test_quantile_linear_leaves_numeric verifying that linear-leaf quantile regression fits linear relationships significantly better than standard constant-leaf models and that path-level weights accumulate correctly.
Fixed double-scaling blocker on linear leaf weights during quantile leaf refinement. Solved linear weights already carry the appropriate learning rate scale from build time and are now left untouched during intercept refinement.
Aligned MorphBoost shrinkage calculation. Aligned iter_shrinkage calculation in trainer/mod.rs with the authoritative tree builder (tree_build.rs) formula, removing the redundant .max(0.0) clamp to avoid cosmetic divergence.
No artifact format change. Model artifacts written by v0.12.6 load and predict identically under v0.12.7.
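The linear-leaf refinement described above can be sketched for a single leaf. This is a simplified illustration, not the engine's code: it assumes one leaf, one tree, and a nearest-rank quantile definition, and the function names are hypothetical.

```python
import math

def empirical_quantile(values, alpha):
    """Nearest-rank empirical quantile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(alpha * len(ordered)))
    return ordered[rank - 1]

def refine_linear_leaf_intercept(xs, ys, slopes, alpha):
    """Return a refined intercept for one leaf; the slopes are not modified."""
    # Residual target: training value minus the linear portion of the prediction.
    residuals = [y - sum(w * xj for w, xj in zip(slopes, x))
                 for x, y in zip(xs, ys)]
    return empirical_quantile(residuals, alpha)

xs = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ys = [1.0, 3.5, 5.0, 7.5, 9.0]
slopes = [2.0]  # solved at build time, already carrying the learning-rate scale
intercept = refine_linear_leaf_intercept(xs, ys, slopes, alpha=0.5)
```

Only the intercept is returned, which mirrors the point in the notes that slopes are left untouched to avoid double-scaling.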
What’s new in 0.12.6
Feature release on top of v0.12.5. Closes limitation #3 from
docs/limitations.md: SHAP values and interaction values are now
supported on multiclass classifiers and multi-output (joint) rankers in
addition to single-output regressors.
GBMClassifier.shap_values(X) and GBMClassifier.shap_interaction_values(X) return a list of K arrays, one per class logit. Additivity per class: Σⱼ values[k][i][j] + expected_values[k] ≈ raw_logit_k(rows[i]).
MultiLabelGBMRanker.shap_values(X) and MultiLabelGBMRanker.shap_interaction_values(X) return a list of n_labels arrays, one per output. Joint mode (multi_label_mode="joint") routes through new per-output Rust entry points with full binning-context support; independent mode fans out to per-label GBMRanker.shap_values.
global_importance_from_artifact_bytes now averages over outputs (divides by n_models) so importance magnitudes remain comparable across single-output and multi-output models.
The Rust crate gained four new public entry points: explain_rows_from_artifact_bytes_per_output, explain_rows_from_artifact_bytes_with_binning_per_output, explain_interactions_from_artifact_bytes_per_output, and explain_interactions_from_artifact_bytes_with_binning_per_output. The original single-output entry points keep their existing signature and now error on K>1 artifacts, directing callers to the _per_output variants.
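The per-class additivity property can be checked with a few lines of plain Python. The sketch below uses nested lists in place of arrays and hand-made toy numbers; check_per_class_additivity and its tolerance are illustrative assumptions, not part of the library.

```python
def check_per_class_additivity(values, expected_values, raw_logits, tol=1e-6):
    """values[k][i][j] is the SHAP value for class k, row i, feature j."""
    for k, class_values in enumerate(values):
        for i, row in enumerate(class_values):
            reconstructed = sum(row) + expected_values[k]
            if abs(reconstructed - raw_logits[i][k]) > tol:
                return False
    return True

# Toy data: 2 classes, 2 rows, 2 features.
values = [
    [[0.5, -0.25], [0.0, 0.25]],   # class 0
    [[-0.5, 0.25], [0.5, 0.0]],    # class 1
]
expected_values = [0.25, -0.25]
raw_logits = [[0.5, -0.5], [0.5, 0.25]]  # raw_logits[i][k]

ok = check_per_class_additivity(values, expected_values, raw_logits)
```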
Internal refactors. load_artifact_context decomposed into
unroll_multiclass, parse_joint_baselines, and
unroll_multi_output helpers (orchestrator stays ~45 lines).
bindings/python/src/predict.rs split into predict.rs (predictor
entry points) and shap_bridge.rs (all 16 SHAP PyO3 wrappers — 8
single-output + 8 _multi). Continues the v0.12.2 / v0.12.3
decomposition pattern.
No artifact format change. Model artifacts written by v0.12.5 load and predict identically under v0.12.6.
What’s new in 0.12.5
Small feature release on top of v0.12.4. Closes the
leaf_model="linear" exception on SHAP interaction values that was
carved out when interactions originally shipped in v0.11.0.
GBMRegressor.shap_interaction_values(X) now accepts artifacts trained with leaf_model="linear". The row-dependent linear deviation w_j · (x_j − μ_j) is credited to the diagonal of the interaction matrix (the regressor feature’s main effect): standard TreeSHAP interactions run on the constant part of each leaf (intercept + Σⱼ wⱼ·μⱼ), then the per-row deviations are folded onto Φ[j][j] via the same helper that backs PL-leaf shap_values. Full additivity (Σᵢⱼ Φᵢⱼ + E = ŷ) and row-marginal (Σⱼ Φᵢⱼ = φᵢ) hold by construction; the matrix is symmetric and expected_value is unchanged.
Pragmatic caveat: this attribution does not split linear-deviation credit across path-feature × regressor-feature off-diagonals; a faithful PL-leaf interaction decomposition remains an open extension.
Internal refactor: explain_interactions_from_model moved from crates/shap/src/lib.rs to crates/shap/src/tree_shap.rs next to its peer explain_rows_tree_shap. Continues the v0.12.2 SHAP-crate decomposition pattern; no behavioral change.
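The diagonal fold for linear leaves can be sketched as follows. This is an illustration under simplifying assumptions (one row, one leaf, nested lists instead of arrays); fold_linear_deviation is a hypothetical name, not the helper used in the crate.

```python
def fold_linear_deviation(phi, weights, x, mu):
    """Return a copy of phi with w_j * (x_j - mu_j) added to each diagonal entry."""
    folded = [row[:] for row in phi]
    for j, (w, xj, mj) in enumerate(zip(weights, x, mu)):
        folded[j][j] += w * (xj - mj)
    return folded

phi = [[0.5, 0.25], [0.25, -1.0]]  # symmetric interactions from the constant part
weights = [2.0, 0.0]               # leaf slope on feature 0 only
x, mu = [1.5, 3.0], [1.0, 3.0]     # row values and per-feature means

folded = fold_linear_deviation(phi, weights, x, mu)
total_before = sum(map(sum, phi))
total_after = sum(map(sum, folded))
```

Since only diagonal entries change, symmetry is preserved, and the matrix total moves by exactly the summed linear deviation, which is what keeps additivity intact.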
No artifact format change. Model artifacts written by v0.12.4 load and
predict identically under v0.12.5. 644 pytest tests (the v0.12.4 baseline of 643
plus the renamed-and-extended linear-leaf interactions test and the new
LinearRank × linear-leaves coverage) and 447 cargo tests (the v0.12.4 baseline of
445 plus two new shap_interactions_linear_leaves_*_satisfies_additivity
tests) pass.
What’s new in 0.12.4
Bugfix release on top of v0.12.3. Two post-merge review findings (issues #48, #49) from the v0.12.2 / v0.12.3 refactor PRs:
GBMRegressor.__module__ now reports its public alloygbm.regressor shim path instead of the private alloygbm._regressor._core implementation module. repr and newly-created pickle payloads no longer leak the internal package layout; old v0.12.3 pickles continue to load.
The joint trainer’s module-level documentation in crates/engine/src/joint/mod.rs is refreshed to reflect the v0.10.x feature parity (DART, GOSS, MorphBoost, DRO, factor neutralization, warm-start, leaf-wise growth, native categorical splits, interaction constraints) that had landed since the original v0.10.0 minimal scope.
No user-facing API changes, no behavioral changes, no new features. Model artifacts written by v0.12.3 load and predict identically under v0.12.4. 643 pytest (the v0.12.3 baseline of 641 plus the two new regression tests for the module-identity fix) and 445 cargo tests pass.
What’s new in 0.12.3
Phases 6–8 of the structural refactor — completing the program. No
user-facing API changes, no behavioral changes, no new features. The
6,619-line bindings/python/src/lib.rs (the PyO3 bridge) was decomposed
into nine focused submodules plus a slim lib.rs and an extracted
tests/ submodule; the 4,909-line bindings/python/alloygbm/regressor.py
(the GBMRegressor estimator) was decomposed into a _regressor/ mixin
package (_base plus four mixins and a _core shell), with
regressor.py reduced to a back-compat shim.
No new objectives, parameters, training modes, or estimator API.
No artifact format changes. Model artifacts written by v0.12.2 load and predict identically under v0.12.3.
from alloygbm.regressor import GBMRegressor and the alloygbm.regressor module name are unchanged; GBMClassifier / GBMRanker subclass GBMRegressor transparently.
Closes the file-decomposition program (issue #44). The 445 cargo + 641 pytest tests held at every refactor commit.
What’s new in 0.12.2
Phase 4 + Phase 5 of the structural refactor. No user-facing API
changes, no behavioral changes, no new features. The 3,925-line
crates/shap/src/lib.rs was decomposed into eight focused
single-responsibility modules; the 5,088-line
crates/engine/src/joint.rs was promoted to a crates/engine/src/joint/
subdir with five sibling modules.
No new objectives, parameters, training modes, or estimator API.
No artifact format changes. Model artifacts written by v0.12.1 load and predict identically under v0.12.2; v0.12.2 produces byte-identical artifacts to v0.12.1 from the same training data.
No public Rust API changes. Every pub symbol that resolved at alloygbm_shap::* or alloygbm_engine::joint::* in v0.12.1 still resolves at the same path in v0.12.2 via the pub use re-exports in the SHAP crate’s lib.rs and in joint/mod.rs.
Verified at every commit. All 445 cargo workspace tests and all 641 pytest tests pass unchanged on every one of the 15 refactor commits (9 for the SHAP crate, 6 for the engine joint trainer). Function bodies were moved byte-identically; visibility promotions on private items were limited to the minimum required for sibling-module access (private fn to pub(super) or pub(crate), never past pub(crate)).
After this release, the remaining queued refactor work is the PyO3
binding (Phase 6), the Python regressor (Phase 7), and a cross-cutting
verification + CLAUDE.md refresh (Phase 8) — see tracking issue #44.
Each ships as its own patch release.
What’s new in 0.12.1
Phase 2 + Phase 3 of the structural refactor. No user-facing API
changes, no behavioral changes, no new features. The 4,822-line
crates/core/src/lib.rs was decomposed into thirteen focused
single-responsibility modules; the 3,987-line
crates/backend_cpu/src/lib.rs was decomposed into five sibling modules.
No new objectives, parameters, training modes, or estimator API.
No artifact format changes. Model artifacts written by v0.12.0 load and predict identically under v0.12.1; v0.12.1 produces byte-identical artifacts to v0.12.0 from the same training data.
No public Rust API changes. Every pub symbol that resolved at alloygbm_core::* or alloygbm_backend_cpu::* in v0.12.0 still resolves at the same path in v0.12.1 via the pub use re-exports in each crate’s lib.rs.
Verified at every commit. All 445 cargo workspace tests and all 641 pytest tests pass unchanged on every one of the 18 refactor commits (13 for the core crate, 5 for backend_cpu). Function bodies were moved byte-identically; visibility promotions on private items in backend_cpu were limited to the minimum required for sibling-module access (private fn to pub(crate) fn, never past pub(crate)).
After this release, the remaining queued refactor work is the SHAP crate
(Phase 4), the engine joint trainer (Phase 5), the PyO3 binding
(Phase 6), the Python regressor (Phase 7), and a cross-cutting
verification + CLAUDE.md refresh (Phase 8) — see tracking issue #44.
Each ships as its own patch release.
What’s new in 0.12.0
Engine crate refactor. No user-facing API changes, no behavioral changes,
no new features. The 15,189-line crates/engine/src/lib.rs monolith was
decomposed into 24 focused single-responsibility modules across a new
crates/engine/src/ layout and a new crates/engine/src/trainer/
submodule directory. The remaining lib.rs is 101 lines of module
declarations and pub use re-exports.
No new objectives, parameters, training modes, or estimator API.
No artifact format changes. Model artifacts written by v0.11.1 load and predict identically under v0.12.0; v0.12.0 produces byte-identical artifacts to v0.11.1 from the same training data.
No public Rust API changes. Every pub symbol that resolved at alloygbm_engine::* in v0.11.1 still resolves at the same path in v0.12.0 via the pub use re-exports in lib.rs.
Verified at every commit. All 207 engine unit tests, all 445 workspace Rust tests, and all 641 pytest tests pass unchanged on every one of the 24 refactor commits. Function bodies were moved byte-identically; visibility promotions were limited to the minimum required by the new module boundary (private fn to pub(crate) fn, never past pub(crate)).
Scope: only crates/engine/src/lib.rs. The other large files
(bindings/python/src/lib.rs, crates/engine/src/joint.rs,
bindings/python/alloygbm/regressor.py, crates/core/src/lib.rs,
crates/backend_cpu/src/lib.rs, crates/shap/src/lib.rs) are untouched
and queued for future releases.
What’s new in 0.11.1
Quantile regression. GBMRegressor accepts a new quantile regression
objective (objective="quantile") with pinball loss semantics and parameter
quantile_alpha (default 0.5, strictly in (0.0, 1.0)).
Empirical Quantile Leaf Refinement: At the end of each round, a custom post-growth leaf refinement step (refine_quantile_leaf_values) is run to replace Newton-Raphson leaf predictions with the actual empirical quantiles of residuals for all rows in each leaf.
Full-dataset refinement: Under row_subsample < 1.0, split-finding runs on the subsampled subset, but leaf refinement uses the entire training set to minimize the estimation variance of the empirical quantile.
Proxy Hessian: Since the pinball loss has a zero second derivative almost everywhere (it is undefined where the residual is zero), a proxy Hessian h_i = w_i (the sample weight) is used during split-finding.
Quickselect optimization: The unweighted refinement path uses an O(N) quickselect algorithm (select_nth_unstable_by) instead of an O(N log N) sort, avoiding performance degradation.
Validation: Gated validation ensures that invalid quantile_alpha settings are only rejected when objective="quantile" is active, leaving non-quantile models unaffected.
Scope limit: Single-output GBMRegressor only. Rejects combinations with DART
boosting, MorphBoost, linear leaves (leaf_model="linear"), classification,
ranking, and joint multi-output training.
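For readers unfamiliar with pinball loss, the following minimal sketch shows the loss and the gated quantile_alpha check described above. It is an independent illustration rather than the engine's implementation; the function names and the error message are assumptions.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball (quantile) loss for a target quantile alpha in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(y_true)

def validate_quantile_alpha(objective, quantile_alpha):
    """Reject an out-of-range alpha only when the quantile objective is active."""
    if objective == "quantile" and not (0.0 < quantile_alpha < 1.0):
        raise ValueError("quantile_alpha must be strictly inside (0.0, 1.0)")

y = [1.0, 2.0, 3.0, 4.0, 100.0]
median, mean = 3.0, sum(y) / len(y)

# A constant prediction at the empirical median beats the mean under alpha = 0.5.
loss_at_median = pinball_loss(y, [median] * len(y), alpha=0.5)
loss_at_mean = pinball_loss(y, [mean] * len(y), alpha=0.5)
```

The comparison shows why leaf values are refined to empirical quantiles: for a constant prediction, the empirical quantile minimizes pinball loss, while a mean-style estimate is pulled by outliers.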
What’s new in 0.11.0
Two small, independent wins in one release.
SHAP interaction values. GBMRegressor.shap_interaction_values(X)
returns the (n_rows, n_features, n_features) pairwise SHAP-interaction
tensor in O(T · L · D² · M) time. Implements Lundberg et al. (2020)
Algorithm 2, ported verbatim from the canonical slundberg/shap C++
reference. Three invariants are pinned by tests: symmetric
(Φ_ij == Φ_ji), row-marginal recovers per-feature SHAP
(Σ_j Φ_ij == φ_i), and full additivity reconstructs the prediction
(Σ_i Σ_j Φ_ij + expected_value == predict(x) within
atol = 1e-5 + rtol = 1e-4 · |predict(x)|). Constant-leaf artifacts only;
leaf_model="linear" is rejected.
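The three invariants can be expressed as a small checker over one row's interaction matrix. This sketch uses nested lists and toy numbers; reusing the additivity tolerance for the symmetry and row-marginal checks is an assumption made here for brevity, and the function name is hypothetical.

```python
def check_interaction_invariants(phi, shap_values, expected_value, prediction):
    """Return (symmetric, row_marginal, additive) for one row's interaction matrix."""
    n = len(phi)
    tol = 1e-5 + 1e-4 * abs(prediction)
    symmetric = all(abs(phi[i][j] - phi[j][i]) <= tol
                    for i in range(n) for j in range(n))
    row_marginal = all(abs(sum(phi[i]) - shap_values[i]) <= tol
                       for i in range(n))
    additive = abs(sum(map(sum, phi)) + expected_value - prediction) <= tol
    return symmetric, row_marginal, additive

phi = [[0.5, 0.25], [0.25, -1.0]]
shap_values = [0.75, -0.75]
result = check_interaction_invariants(phi, shap_values,
                                      expected_value=2.0, prediction=2.0)
```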
Poisson / Gamma / Tweedie GLM objectives. GBMRegressor accepts
three new log-link GLM objectives. All three use weighted-mean-in-log-space
initial predictions, Newton-Raphson leaves, and the standard ObjectiveOps
machinery. predict() returns exp(raw). Tweedie supports
1 < variance_power < 2 (compound Poisson-gamma) via the new
tweedie_variance_power: float = 1.5 constructor kwarg. New deviance
metrics in alloygbm.evaluation: poisson_deviance, gamma_deviance,
tweedie_deviance(y_true, y_pred, variance_power=p). Target-domain
validation raises ValueError before training starts when targets violate
the domain (negative y for Poisson/Tweedie, non-positive y for Gamma).
Single-output GBMRegressor only; not on Ranker, Classifier,
multiclass, or the joint multi-output ranker.
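The deviance metrics and the target-domain checks can be sketched with the standard unit-deviance formulas. The functions below reuse the metric names from the notes for readability, but they are independent re-implementations: averaging over rows, the absence of sample weights, and the validate_targets helper are assumptions, not the behavior of alloygbm.evaluation.

```python
import math

def poisson_deviance(y_true, y_pred):
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        log_term = y * math.log(y / mu) if y > 0 else 0.0  # y*log(y) -> 0 at y = 0
        total += 2.0 * (log_term - y + mu)
    return total / len(y_true)

def gamma_deviance(y_true, y_pred):
    return sum(2.0 * (math.log(mu / y) + y / mu - 1.0)
               for y, mu in zip(y_true, y_pred)) / len(y_true)

def tweedie_deviance(y_true, y_pred, variance_power=1.5):
    p = variance_power
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch covers 1 < variance_power < 2 only")
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        total += 2.0 * (y ** (2 - p) / ((1 - p) * (2 - p))
                        - y * mu ** (1 - p) / (1 - p)
                        + mu ** (2 - p) / (2 - p))
    return total / len(y_true)

def validate_targets(objective, y):
    """Raise before training when targets fall outside the objective's domain."""
    if objective in ("poisson", "tweedie") and any(v < 0 for v in y):
        raise ValueError(f"{objective} requires non-negative targets")
    if objective == "gamma" and any(v <= 0 for v in y):
        raise ValueError("gamma requires strictly positive targets")

perfect = poisson_deviance([1.0, 2.0], [1.0, 2.0])
mismatch = poisson_deviance([1.0, 2.0], [2.0, 1.0])
```

Each deviance is zero when predictions equal targets and positive otherwise, which makes them convenient sanity checks on natural-scale predictions.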
What’s new in 0.10.6
Closes the last v0.10.4-deferred joint-path follow-up: all three factor
neutralization modes now work on the joint multi-output trainer.
MultiLabelGBMRanker(multi_label_mode="joint", neutralization=…,
factor_exposures=…) supports "pre_target",
"per_round_gradient", and "split_penalty" with the same surface
as the single-output GBMRegressor / GBMRanker. The joint trainer
reaches full feature parity with the single-output path. Default
behaviour for every existing user-facing API remains byte-identical to
v0.10.5 when neutralization is not opted into.
Three new modes, all activated via the neutralization kwarg:
pre_target: residualize each per-output target through the factor exposures once before training. Requires every per-output objective to be squared_error (the only objective where residualize-target equals residualize-gradient).
per_round_gradient: project each of the K gradient buffers in place every round after computing them. Mirrors the single-output multiclass per-class projection pattern.
split_penalty: subtract a K-output factor-load penalty from each candidate split’s gain. Applies under both tree_growth="level" and tree_growth="leaf".
Three new kwargs admitted by _JOINT_SUPPORTED_KWARGS:
neutralization: "none" (default), "pre_target", "per_round_gradient", or "split_penalty"
factor_neutralization_lambda: ridge regularization on the projector Gram matrix (default 1e-6)
factor_penalty: split_penalty mode’s penalty multiplier (default 0.0; 0 collapses to standard byte-for-byte)
Plus the factor_exposures= kwarg on fit() (already existed for the
independent-mode fallback; now honored on joint too). The PyO3 bridge
cross-validates the exposures-vs-config invariant: active config requires
exposures, exposures require an active config.
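The per_round_gradient mode can be pictured as a ridge-regularized projection applied to every gradient buffer. The sketch below is deliberately reduced to a single factor column so it stays dependency-free; the engine's multi-factor Gram-matrix solve is not reproduced, and both function names are hypothetical.

```python
def project_out_factor(gradient, exposure, ridge=1e-6):
    """Remove the component of gradient explained by one factor exposure column."""
    dot_fg = sum(f * g for f, g in zip(exposure, gradient))
    dot_ff = sum(f * f for f in exposure)
    coef = dot_fg / (dot_ff + ridge)  # ridge plays the role of factor_neutralization_lambda
    return [g - coef * f for g, f in zip(gradient, exposure)]

def neutralize_per_round(gradient_buffers, exposure, ridge=1e-6):
    """Project each of the K per-output gradient buffers in place."""
    for buf in gradient_buffers:
        buf[:] = project_out_factor(buf, exposure, ridge)

exposure = [1.0, 1.0, 1.0, 1.0]
buffers = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
neutralize_per_round(buffers, exposure, ridge=0.0)

# After projection, each buffer is orthogonal to the exposure column.
dots = [sum(f * g for f, g in zip(exposure, buf)) for buf in buffers]
```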
Artifact: new ModelSectionKind::NeutralizationMetadata (kind=14)
records the active config in the artifact so joint models are
self-describing. Metadata only; prediction never reads it (neutralization
is a training-time transformation; the trained leaf values already bake
in the projection).
Byte-equivalence: a fit with neutralization='none' (or
kind=None, or split_penalty=0) produces byte-identical artifact
bytes to a pre-v0.10.6 fit. Pinned by
joint_neutralization_inert_configs_match_v0_10_5_byte_for_byte.
Composes with MorphBoost (training_mode="morph"), DRO leaves
(leaf_solver="dro"), DART boosting, and warm-start.
What’s new in 0.10.5
Closes the joint DRO leaves follow-up from v0.10.4.
MultiLabelGBMRanker(multi_label_mode="joint", leaf_solver="dro",
dro_radius=…, dro_metric="wasserstein") now applies
Wasserstein-distributionally-robust leaf values on the joint
multi-output trainer, mirroring GBMRegressor / GBMRanker’s
single-output leaf solver. Default behaviour for every existing
user-facing API remains byte-identical to v0.10.4 when DRO is not
opted into.
Joint DRO leaves:
routes the K-output Newton-Raphson leaf step through
alloygbm_core::leaf_effective_gradient (the same helper used by
single-output GBMRegressor / GBMRanker since v0.6.x). Applied
in-build inside build_joint_round_inner’s leaf_values closure
and build_joint_round_leafwise’s per-output leaf computation — row
indices are already in scope at leaf-computation time. DRO is leaf-only:
split-gain dispatch still uses the standard K-output sum-of-XGBoost-gains
(multi-output histogram doesn’t carry per-bin grad_sq; adding it would
cost ~1.5× joint-round memory — split-time DRO is deferred pending
benchmark evidence).
Three new kwargs in _JOINT_SUPPORTED_KWARGS:
leaf_solver: "standard" (default) or "dro"
dro_radius: float ≥ 0; 0.0 collapses to standard byte-for-byte
dro_metric: "wasserstein" (only supported value in v0.10.5)
Works under both tree_growth="level" and tree_growth="leaf", and
composes with MorphBoost (training_mode="morph") and DART/GOSS
boosting modes. Byte-equivalent to v0.10.4 when lambda_l1 == 0 AND
(dro_config.is_none() OR dro_config.radius == 0.0); pinned by
joint_dro_radius_zero_matches_standard_byte_for_byte (cargo) and
test_joint_dro_radius_zero_byte_equivalent_to_standard (pytest).
Deferred to v0.10.6:
joint factor neutralization (neutralization + factor_exposures).
Remains in docs/limitations.md Limitation 2 with explicit version
marker.
What’s new in 0.10.4
Adds MorphBoost (Kriuk 2025, arXiv:2511.13234) to the joint multi-output
trainer used by MultiLabelGBMRanker(multi_label_mode="joint"). This
is the first of three deferred items from docs/limitations.md
Limitation 2 to ship; DRO leaves landed in v0.10.5 and factor
neutralization on the joint trainer is tracked for v0.10.6. Default
behaviour for
every existing user-facing API remains byte-identical to v0.10.3 when
MorphBoost is not opted into.
Joint MorphBoost surface:
MultiLabelGBMRanker(multi_label_mode="joint", training_mode="morph",
…) now activates MorphBoost on the shared-tree multi-output trainer.
Honors the full single-output MorphBoost surface — morph_rate,
evolution_pressure, morph_warmup_iters, info_score_weight,
depth_penalty_base, balance_penalty, lr_schedule,
lr_warmup_frac. Per-iteration LR schedule (constant or
warmup-cosine), per-leaf depth penalty
(depth_penalty_base ^ (depth/3) where
depth = (local_node_id + 1).ilog2()), and per-iteration leaf
shrinkage (1 − morph_rate * round/total) all apply uniformly across
the K-output leaf values.
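The depth penalty and per-iteration shrinkage formulas quoted above translate directly into a few lines. This is a sketch under one stated assumption: depth / 3 is treated as real-valued division, which the notes do not specify. The function names are illustrative.

```python
def node_depth(local_node_id):
    """Integer log2 of (local_node_id + 1), matching the ilog2 expression above."""
    return (local_node_id + 1).bit_length() - 1

def depth_penalty(depth_penalty_base, local_node_id):
    # Assumption: real-valued depth / 3 (the notes do not say whether it is integer division).
    return depth_penalty_base ** (node_depth(local_node_id) / 3)

def leaf_shrinkage(morph_rate, round_index, total_rounds):
    """Per-iteration shrinkage 1 - morph_rate * round / total."""
    return 1.0 - morph_rate * round_index / total_rounds

depths = [node_depth(i) for i in range(8)]
root_penalty = depth_penalty(0.5, 0)
deep_penalty = depth_penalty(0.5, 7)
mid_shrinkage = leaf_shrinkage(0.5, 5, 10)
```

In the joint trainer both factors multiply all K leaf values of a node uniformly, so the per-output ratios within a leaf are unchanged.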
Multi-output morph gain:
two new helpers in crates/engine/src/shared_histogram.rs —
compute_multi_output_split_gain_morph and
find_best_multi_output_categorical_split_morph — sum per-output
morph gain across the K outputs. Each output uses its own
(grad_mean, grad_std) snapshot from MorphState::ema_stats[k].
Per-side row count for the info-gain term is approximated via
hess.max(0.0) as u32 (multi-output histogram doesn’t carry exact
counts) — exact for objectives where hess ≡ 1 per row, monotone proxy
for ranking. Warmup byte-equivalence with the standard K-output gain
is guaranteed regardless.
MorphBoost EMA warm-start (continuity, not byte-equivalence):
JointWarmStartState.initial_ema_stats: Option<Vec<GradientEmaStats>>
re-seeds MorphState::ema_stats on warm-resume so the gradient-
statistics smoothing is continuous across the resume boundary — new
rounds see the same per-output (mean, std) they would have seen had
training never been interrupted. The PyO3 bridge auto-extracts the
snapshot from init_artifact_bytes via
TrainedModel::from_artifact_bytes(…).morph_metadata.
MorphBoost warm-resume is intentionally NOT byte-equivalent to a fresh
longer fit. Per-iteration leaf shrinkage and LR schedule are resolved
against the total_iterations horizon at training time; a prior fit
with n_estimators=6 baked its first six trees against a 6-round
horizon and resuming with n_estimators=4 cannot retroactively
re-scale them. The EMA continuity is the practical guarantee. This
mirrors the single-output MorphBoost warm-start behavior.
Deferred to v0.10.5 / v0.10.6 (from v0.10.4):
joint DRO leaves (leaf_solver="dro") — shipped in v0.10.5 — and
joint factor neutralization (neutralization + factor_exposures)
— tracked for v0.10.6. See docs/limitations.md Limitation 2.
What’s new in 0.10.3
Closes the four “v0.10.3” follow-ups carved out of the v0.10.2
joint-trainer parity work: native-categorical Python wiring, joint
GOSS, joint DART, and joint warm-start. The
MultiLabelGBMRanker(multi_label_mode="joint") wrapper now accepts
every kwarg the single-output trainer accepts (except MorphBoost / DRO
/ factor neutralization, which are tracked for v0.10.4). Default
behaviour for every existing user-facing API remains byte-identical to
v0.10.2 when the new knobs are not opted into.
Joint native-categorical Python wiring:
the Rust-level joint native-cat trainer
(fit_joint_multi_output_with_categorical +
find_best_multi_output_categorical_split) was already in v0.10.2;
the PyO3 bridge train_joint_multi_label_ranker now re-bins
requested columns to bin_index == category_id before invoking the
trainer (mirrors the single-output
apply_categorical_encoding_to_training_matrices_multi). The
_JOINT_SUPPORTED_KWARGS allow-list re-adds
categorical_feature_indices and max_cat_threshold.
Joint GOSS:
new select_joint_row_indices_for_round helper inside
crates/engine/src/joint.rs mirrors
select_row_indices_for_round_multiclass — per-row score is
\(s_i = \sum_k |g_{i,k}|\) across the K per-output gradient
buffers (LightGBM multiclass GOSS convention). A single row mask is
shared across all K buffers; the amplification factor mutates every
per-output gradient/hessian in lockstep so histograms remain unbiased.
MultiLabelGBMRanker(multi_label_mode='joint', boosting_mode='goss',
goss_top_rate=..., goss_other_rate=...).
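The shared-mask GOSS step can be sketched in plain Python. Two details here are assumptions rather than facts from the notes: the amplification factor (1 − top_rate) / other_rate follows the original GOSS formulation, and the row counts are obtained by truncation. The function names are hypothetical.

```python
import random

def goss_select(grads, top_rate, other_rate, seed=0):
    """grads[k][i] is the gradient of output k for row i.

    Returns (top_rows, sampled_rows, amplification_factor).
    """
    n_rows = len(grads[0])
    # Per-row score: sum of absolute gradients across the K outputs.
    scores = [sum(abs(buf[i]) for buf in grads) for i in range(n_rows)]
    order = sorted(range(n_rows), key=lambda i: scores[i], reverse=True)
    n_top = int(top_rate * n_rows)
    n_other = int(other_rate * n_rows)
    top, rest = order[:n_top], order[n_top:]
    sampled = random.Random(seed).sample(rest, min(n_other, len(rest)))
    factor = (1.0 - top_rate) / other_rate
    return top, sampled, factor

def amplify(buffers, rows, factor):
    """Scale the sampled rows of every gradient/hessian buffer in lockstep."""
    for buf in buffers:
        for i in rows:
            buf[i] *= factor

grads = [[0.1 * i for i in range(10)], [-0.2 * i for i in range(10)]]
hess = [[1.0] * 10, [1.0] * 10]
top, sampled, factor = goss_select(grads, top_rate=0.2, other_rate=0.2)
amplify(grads + hess, sampled, factor)
```

One mask and one factor are applied to all K buffers, which is what keeps the per-output histograms consistent with each other.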
Joint DART:
dropout/normalize cycle added to fit_joint_inner. One tree per
round on the joint trainer simplifies bookkeeping vs. multiclass DART:
dart_state.tree_weights has length rounds_completed and
dart_round_start_offsets[r] / dart_round_counts[r] collapse to
a flat per-round pair. Reuses engine::dart::{select_dropouts,
apply_normalization} unchanged. Per-stump tree_weight persists
via the existing DartTreeWeights artifact section (kind=11), and
JointPredictor is extended with tree_weights: Vec<f32> so each
tree’s leaf contribution is multiplied by tree_w at predict time.
Joint warm-start:
new JointWarmStartState { baselines, stumps,
initial_rounds_completed, initial_dart_tree_weights } + new
fit_joint_multi_output_with_warm_start entry point.
MultiLabelGBMRanker(multi_label_mode='joint', warm_start=True,
init_model=<prior_fit>) cracks open the prior fit’s joint artifact,
replays prior stumps onto predictions via the shared
walk_tree_into_predictions helper, re-encodes new-round
node_id starting at initial_rounds_completed, and (under DART)
reconstructs dart_state.tree_weights from per-stump
tree_weight. Per-round seeds mix
global_round = round + initial_rounds so an N+M warm-resumed fit
produces identical RNG draws to a fresh N+M fit on rounds N..N+M.
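The seed-mixing property can be demonstrated with a toy model of per-round randomness. The mixing function below is hypothetical (the engine's actual mixer is not shown in these notes); only the idea of keying each round's RNG on the global round index is taken from the text.

```python
import random

def round_seed(base_seed, global_round):
    # Hypothetical mixer, used here only to derive a distinct seed per round.
    return (base_seed * 1_000_003 + global_round) & 0xFFFFFFFF

def draws_for_rounds(base_seed, n_rounds, initial_rounds=0):
    """One random draw per round, keyed on global_round = round + initial_rounds."""
    draws = []
    for r in range(n_rounds):
        rng = random.Random(round_seed(base_seed, r + initial_rounds))
        draws.append(rng.random())
    return draws

fresh = draws_for_rounds(base_seed=7, n_rounds=10)                      # fresh N+M fit
resumed = draws_for_rounds(base_seed=7, n_rounds=4, initial_rounds=6)   # resume after N=6
```

Because each round's generator depends only on the base seed and the global round index, the resumed rounds reproduce the tail of the fresh run.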
Deferred to later v0.10.x point releases:
v0.10.4: MorphBoost, DRO, and factor neutralization on the joint path.
What’s new in 0.10.2
Closes the leaf-wise multiclass DART limitation and the first slice of joint-path feature parity (leaf-wise growth, native-categorical, interaction constraints, row/col subsample, min_split_gain). The remaining joint-path features land in v0.10.3 (GOSS, DART, warm-start on joint) and v0.10.4 (MorphBoost, DRO, neutralization on joint). Default behaviour for every existing user-facing API remains byte-identical to v0.10.1 when the new features are not opted into.
Joint trainer core feature parity:
engine::joint::fit_joint_multi_output now supports
tree_growth="leaf" + max_leaves (via the new
build_joint_round_leafwise priority-queue best-first growth),
interaction_constraints (reusing the single-output
InteractionConstraintIndex), min_split_gain, row_subsample,
and col_subsample. All five are exposed through
MultiLabelGBMRanker(multi_label_mode="joint") Python surface;
_JOINT_SUPPORTED_KWARGS grew to permit
min_split_gain, row_subsample, col_subsample,
interaction_constraints, tree_growth, max_leaves.
Native-categorical splits on the joint path are partially shipped:
the Rust-level
find_best_multi_output_categorical_split Fisher-sort helper +
fit_joint_multi_output_with_categorical entry point are in place
and sound when given bins where bin_index == category_id. The
Python surface is intentionally not wired in v0.10.2 because the
current bridge bins all features with
ContinuousBinningStrategy::Linear which doesn’t preserve that
invariant for joint mode — categorical_feature_indices and
max_cat_threshold are rejected in joint mode and tracked for
v0.10.3.
Leaf-wise multiclass DART:
GBMClassifier(boosting_mode="dart") with K ≥ 3 classes now works
under tree_growth="leaf" + max_leaves. The v0.10.1
tree_growth='level' restriction in
fit_multiclass_iterations_impl was lifted. Per-class
dart_round_start_offsets[k] / dart_round_counts[k] bookkeeping
is growth-mode-agnostic because it snapshots class_stumps[k].len()
around each build_tree_* call. Validation early-stopping DART
transition and DART warm-start tree-weight reconstruction work
without changes.
Deferred to later v0.10.x point releases (as documented in v0.10.2, now closed):
v0.10.3 shipped: native-cat Python wiring, joint GOSS, joint DART, joint warm-start.
v0.10.4: MorphBoost, DRO, and factor neutralization on the joint path.
What’s new in 0.10.1
Closes the three v0.10.x-deferred limitations from v0.10.0:
MultiLabelGBMRanker joint mode Python surface, multiclass softmax
+ GOSS, and multiclass softmax + DART (including warm-start). Default
behaviour for every existing user-facing API remains byte-identical
to v0.10.0 when the new features are not opted into.
MultiLabelGBMRanker joint mode (Python surface):
MultiLabelGBMRanker(multi_label_mode="joint") now routes through a new PyO3 entry point (train_joint_multi_label_ranker) and JointPredictorHandle py-class to the v0.10.0 Rust joint trainer engine::joint::fit_joint_multi_output. Default mode is still "independent" (the K-per-label GBMRanker fallback from v0.7.1); joint is opt-in. Bundle format bumped to v2 with an explicit mode byte; v1 bundles still load as independent.
Multiclass softmax + GOSS:
GBMClassifier(boosting_mode="goss") for K >= 3 classes. Per-row score \(s_i = \sum_k |g_{i,k}|\) (LightGBM convention) drives a shared sampling mask across all K class gradient buffers; the amplification factor is applied identically to every class’s grad and hess. The multiclass round loop was refactored so the K gradient buffers are pre-computed before sampling.
Multiclass softmax + DART (+ warm-start):
GBMClassifier(boosting_mode="dart") for K >= 3 classes. Per-class prediction vectors get per-round subtract/readd of dropped tree contributions scaled by dart_state.tree_weights. Per-class dart_round_start_offsets / dart_round_counts arrays track the contiguous stump slice each (round, class) tree occupies in class_stumps[k] so dropout subtracts the WHOLE class tree, not just its root stump. After K new trees are built each round they are rescaled to new_w = 1/(n_dropped + 1) and the dropped trees are re-added at their rescaled weights. stump.tree_weight = new_w is stamped on every stump in the new round’s per-class slice. Requires tree_growth="level" in v0.10.1.
MultiClassWarmStartState.initial_dart_tree_weights carries the flat round-major × class-k per-tree weights from the prior fit, so continuation seeds dart_state.tree_weights correctly. The PyO3 bridge reconstructs the per-tree weights by grouping class_stumps[k] by tree_id (decoded from node_id / TREE_NODE_STRIDE), taking the first stump’s tree_weight per tree group, mirroring the predictor’s apply_dart_tree_weights convention.
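The weight bookkeeping for one class's trees in a DART round can be sketched as below. The new-tree weight 1/(n_dropped + 1) comes from the notes; scaling dropped trees by n_dropped/(n_dropped + 1) is the conventional DART "tree" normalization and is an assumption here, since the notes only say the dropped trees are re-added at rescaled weights. The function name is hypothetical.

```python
def dart_normalize(tree_weights, dropped):
    """Return (updated_weights, new_w) after one DART round for a single class.

    tree_weights: weights of the existing trees for this class.
    dropped: indices of the trees dropped this round.
    """
    n_dropped = len(dropped)
    new_w = 1.0 / (n_dropped + 1)
    # Assumed rescale factor for dropped trees (standard DART "tree" normalization).
    scale = n_dropped / (n_dropped + 1)
    updated = list(tree_weights)
    for t in dropped:
        updated[t] *= scale
    updated.append(new_w)  # the new tree is appended with weight new_w
    return updated, new_w

weights, new_w = dart_normalize([1.0, 1.0, 1.0], dropped=[0, 2])
```

With no trees dropped, new_w is 1.0 and all existing weights are unchanged, so the round behaves like standard boosting.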
Constraints:
Multiclass DART requires tree_growth="level"; leaf-wise dropout indexing across K class trees is tracked as a follow-up.
Joint mode supports level-wise growth, standard boosting, and the built-in squared_error / queryrmse / rank:pairwise / rank:ndcg / rank:xendcg objectives only. Joint-path feature parity (MorphBoost, neutralization, DRO, interaction constraints, leaf-wise, GOSS, DART, warm-start, row_subsample, col_subsample, min_split_gain) is targeted for later v0.10.x releases; see docs/limitations.md.
What’s new in 0.10.0
Infrastructure release: lays the Rust-level foundation for joint
multi-output learning and closes the v0.9.0 DART + warm_start
follow-up. Default behaviour for every existing user-facing API
(GBMRegressor, GBMClassifier, GBMRanker,
MultiLabelGBMRanker) remains byte-identical to v0.9.0 — the new
MultiOutputLeafValues artifact section is only emitted when the
(currently Rust-only) joint trainer produces a model.
DART + warm_start continuation:
GBMRegressor, GBMClassifier, and GBMRanker now accept boosting_mode="dart" + warm_start=True (or fit(..., init_model=prior_model)). The v0.9.0 rejection error is removed.
WarmStartState gains an optional initial_dart_tree_weights field that captures the per-stump tree_weight snapshot from the prior fit. The engine seeds dart_state.tree_weights from this snapshot and pre-populates the round_start_offsets / dart_round_counts arrays from the warm-start tree shapes.
Historical RNG-driven dropped_per_round is intentionally not persisted; new rounds start fresh dropout bookkeeping going forward.
Joint multi-output infrastructure (Rust):
MultiOutputHistogram (crates/engine/src/shared_histogram.rs) accumulates K (grad, hess) pairs per (feature, bin) in one sweep, with subtraction trick and multi-output split-gain helpers.
MultiOutputLeafValues artifact section (kind index 13) stores per-stump K-output leaf values. TrainedStump gains optional multi_output_leaf_values: Option<(Vec<f32>, Vec<f32>)>.
Rust-level joint trainer (crates/engine/src/joint.rs): fit_joint_multi_output runs the full training loop with K per-output objectives (squared_error, queryrmse, rank:pairwise, rank:ndcg, rank:xendcg); JointPredictor decodes the artifact and predicts K outputs per row.
Scope intentionally minimal for v0.10.0: level-wise growth only, no MorphBoost / DRO / neutralization / leaf-wise / native-categorical / GOSS / DART / warm-start on the joint path.
Deferred to v0.10.x:
Python MultiLabelGBMRanker(training_mode="joint") user-facing surface (Rust infrastructure complete; targeted for v0.10.1).
Multiclass softmax + DART / GOSS (engine plumbing into the K-output histogram primitive is targeted for v0.10.1+).
Leaf-wise / MorphBoost / DRO / neutralization on the joint path (feature parity with the single-output trainer is targeted for v0.10.x).
What’s new in 0.9.0
Minor feature release: closes the v0.8.0 DART placeholder
(Limitation 2) and resolves the linear-rank predict-path NaN routing
bug (Limitation 4). Default behaviour is byte-identical to v0.8.0 on
every API surface — the new DartTreeWeights artifact section is
only emitted when at least one stump has a non-1.0 weight, which
never happens under boosting_mode="standard" (the default) or
boosting_mode="goss".
DART boosting mode (Dropouts meet MART):
New boosting_mode="dart" opt-in on GBMRegressor, binary GBMClassifier, and GBMRanker, with four companion parameters: dart_drop_rate (default 0.1), dart_max_drop (default 50), dart_normalize_type ("tree" or "forest", default "tree"), and dart_sample_type ("uniform" or "weighted", default "uniform").
Per-round dropout + normalization cycle lives in a new module crates/engine/src/dart.rs. No new crate dependencies: it uses the existing mixed_hash splitmix64 derivative so per-stump drop decisions are deterministic given seed + round index.
Per-stump tree_weight: f32 is plumbed through TrainedStump and persisted via a new DartTreeWeights artifact section (ModelSectionKind index 12). Emitted only when at least one weight diverges from 1.0; pre-v0.9.0 artifacts continue to load with all weights defaulting to 1.0.
The single-output training loop rejects boosting_mode="dart" + warm_start with a clear error (tracked as a v0.10.x follow-up: would require persisting tree_weights and dropped_per_round in WarmStartState).
Multiclass softmax continues to reject boosting_mode != "standard" with a clear error message; per-class gradient scoring during the dropout step is tracked as a v0.10.x follow-up.
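As a rough illustration of one dropout + normalization round, the sketch below drops each existing tree with probability drop_rate, caps the count at max_drop, and rescales weights. It is not the engine's code: it uses Python's random module instead of mixed_hash, and the k / (k + 1) factor for dropped trees follows the original DART paper, which is an assumption here (these notes only give the new-tree weight 1 / (n_dropped + 1) for the multiclass path).

```python
import random


def dart_round(tree_weights, drop_rate=0.1, max_drop=50, rng=None):
    """Run one illustrative DART round; mutates tree_weights in place.

    Returns (dropped_indices, new_tree_weight).
    """
    rng = rng or random.Random(0)
    # Uniform sampling: every existing tree is dropped with probability drop_rate.
    dropped = [i for i in range(len(tree_weights)) if rng.random() < drop_rate]
    if len(dropped) > max_drop:
        dropped = sorted(rng.sample(dropped, max_drop))
    k = len(dropped)
    new_w = 1.0 / (k + 1)
    for i in dropped:
        # Assumption (DART paper): dropped trees shrink by k / (k + 1).
        tree_weights[i] *= k / (k + 1)
    # The tree fitted this round joins the ensemble at the rescaled weight.
    tree_weights.append(new_w)
    return dropped, new_w


all_dropped = [1.0, 1.0, 1.0]
dropped, new_w = dart_round(all_dropped, drop_rate=1.0)

none_dropped = [1.0, 1.0]
_, plain_w = dart_round(none_dropped, drop_rate=0.0)
```

With no trees dropped the new tree keeps weight 1.0, which matches the note that standard boosting never produces a non-1.0 weight.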
NaN routing on the linear-rank predict path (Limitation 4 resolved):
The predict-time quantize helpers in bindings/python/src/lib.rs (quantize_dense_values_linear_inplace_wide, quantize_dense_values_linear_rank_inplace_wide, and the inline loop in predict_dense_quantized_with_summary_bytes) now preserve f32::NAN through the f32 cast instead of casting a finite bin index. The predictor’s existing feature_value.is_nan() -> default_left short-circuit at crates/predictor/src/lib.rs:148 then fires automatically.
LinearLeaf::eval (in alloygbm-core) and LinearLeafCompact::eval (in alloygbm-predictor) now skip NaN regressor features when accumulating the linear sum, so PL-leaf predictions don’t NaN-poison on a w * NaN step.
Pure-linear, pure-quantile, and rank-binning paths now share consistent NaN semantics: missing values always route through the learned default_left direction.
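The two NaN rules above (route missing values via default_left, and skip NaN regressors in a linear leaf) are easy to state in code. The following is a minimal Python sketch with hypothetical function names; the real logic lives in the Rust predictor and is not reproduced here.

```python
import math


def route_left(feature_value, threshold, default_left):
    """Decide the branch at a split; NaN follows the learned default direction."""
    if math.isnan(feature_value):
        return default_left
    return feature_value < threshold


def linear_leaf_eval(intercept, weights, features):
    """Evaluate a linear leaf, skipping NaN features so the sum stays finite."""
    total = intercept
    for w, x in zip(weights, features):
        if math.isnan(x):
            continue  # w * NaN would poison the whole prediction
        total += w * x
    return total


nan = float("nan")
leaf_value = linear_leaf_eval(1.0, [2.0, 3.0], [0.5, nan])
```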
Known limitations carried forward to v0.10.0
Multiclass softmax + DART is still rejected.
DART + warm_start is rejected.
Joint shared-tree multi-label ranking and the K-output shared-histogram engine primitive remain v0.10.0 targets.
What’s new in 0.8.0
Minor feature release: closes the mixed linear-rank SHAP carry-forward
from v0.7.4 (Limitation 4) and adds LightGBM-style GOSS sampling as a
new opt-in boosting mode. Default behaviour is byte-identical to
v0.7.5 on every API surface. The other two original v0.8.0 targets —
DART boosting mode and joint shared-tree multi-label ranking — were
scope-split out to v0.9.0 and v0.10.0 respectively so this release
could ship on a reviewable surface. BoostingMode::Dart is reserved
in the API (Python boosting_mode="dart" raises
NotImplementedError; the Rust trainer rejects it with a clear error
message) so v0.9.0 can land DART training without further
TrainParams churn.
GOSS sampling (gradient-based one-side sampling):
New boosting_mode="goss" opt-in on GBMRegressor, GBMClassifier (binary), and GBMRanker, with companion goss_top_rate (default 0.2) and goss_other_rate (default 0.1) parameters. Default boosting_mode="standard" is byte-identical to v0.7.5.
Implements LightGBM’s GOSS algorithm: at the start of each round rows are scored by |gradient|, the top goss_top_rate fraction is kept, goss_other_rate fraction is uniformly sampled from the rest, and the sampled-low rows’ gradient + hessian are multiplied by (1 - goss_top_rate) / goss_other_rate to preserve unbiased histogram statistics.
Reorders the per-round training loop so gradient computation happens before row sampling, which is required because GOSS scores by gradient magnitude. Standard and DART modes get the same pre-computed gradient buffer and fall back to uniform subsampling.
Multiclass softmax explicitly rejects boosting_mode != "standard" with a clear error message; per-class gradient scoring is tracked as a v0.8.1 follow-up. DART is reserved for the next feature commit on v0.8.0-features and currently raises NotImplementedError in Python.
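The GOSS sampling step described above can be sketched in pure Python. This is an illustration, not the engine's implementation: it assumes both rates are fractions of the total row count (the LightGBM convention) and uses Python's random module for the uniform draw.

```python
import random


def goss_sample(gradients, hessians, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (row_index, gradient, hessian) triples selected by GOSS."""
    n = len(gradients)
    # Score rows by |gradient|, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(n * top_rate)
    n_other = int(n * other_rate)
    top, rest = order[:n_top], order[n_top:]
    sampled = random.Random(seed).sample(rest, min(n_other, len(rest)))
    # Amplify the sampled low-gradient rows to keep histogram sums unbiased.
    scale = (1.0 - top_rate) / other_rate
    rows = [(i, gradients[i], hessians[i]) for i in top]
    rows += [(i, gradients[i] * scale, hessians[i] * scale) for i in sampled]
    return rows


gradients = [float(i) for i in range(-10, 10)]  # 20 rows
hessians = [1.0] * len(gradients)
rows = goss_sample(gradients, hessians)
```

With the defaults, 20 rows yield 4 top rows kept as-is and 2 sampled rows whose gradient and hessian are scaled by (1 - 0.2) / 0.1 = 8.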
SHAP strict additivity on the mixed linear-rank binning path (Limitation 4):
When continuous_binning_strategy="linear" triggered per-feature rank-based binning on at least one column (gated by the ALLOYGBM_EXPERIMENT_LINEAR_TAIL_RANK experiment flag), the Python shap_values() flow used to fall back to the legacy quantize-then-walk SHAP path which exempts leaf_model="linear" artifacts from strict additivity.
v0.8.0 adds a new BinningContext::LinearRank variant to crates/shap/src/lib.rs. It carries per-feature sorted unique values, global feature_mins / feature_maxs, and max_data_bin. At the explain_rows_from_model entry point SHAP internally quantizes the raw input rows to bin indices using exactly the same rules as predict_dense_quantized_linear_rank (linear quantize for unflagged features, rank quantize for flagged features, both with round_half_away_from_zero clamped to [0, max_data_bin]) and dispatches the remainder of the path-walker with BinningContext::PreBinned semantics. Both tree traversal and PL-leaf evaluation now operate in the same bin-index space the predictor uses, so strict additivity holds for leaf_model="linear" (and constant leaves stay correct).
The Python _shap_binning_kwargs() helper returns binning_kind="linear_rank" whenever any per-feature rank flag is set; GBMClassifier and GBMRanker inherit the fix from GBMRegressor._shap_binning_kwargs.
Verified by bindings/python/tests/test_shap_linear_rank_strict_additivity.py (architectural contract + strict additivity for both leaf_model="constant" and leaf_model="linear"). Closes Limitation 4.
What’s new in 0.7.5
Bug-fix release. Closes Limitation 5 from v0.7.4 — the pre-existing TreeSHAP polynomial-path additivity drift on trees with a feature appearing more than once on a root-to-leaf path. No user-visible API breakage.
TreeSHAP polynomial-path strict additivity:
The Rust port of TreeSHAP’s polynomial-time algorithm in crates/shap/src/lib.rs::ts_unextend_path was shifting the entire PathElement struct (including pweight) when removing a duplicate feature from the path. This clobbered the pweights that the unwind loop had just carefully recomputed in place. The reference implementation in slundberg/shap (shap/explainers/pytree.py) stores the four path fields as four parallel arrays and only shifts the first three (feature_index, zero_fraction, one_fraction), preserving pweights. Pre-existing in v0.7.3 and earlier; uncovered during v0.7.4 PR #27 review and pinned with an @xfail(strict=True) test at that time pending this v0.7.x follow-up.
The fix shifts the three fields explicitly and leaves pweight alone. Strict additivity now holds end-to-end on the polynomial path.
Coverage: a synthetic full-tree sweep (tree_shap_polynomial_path_matches_brute_force_on_full_trees) covers depths 2-7 × n_features {2,3,5,8,12} including all configurations that force path-duplicate features, asserting polynomial matches brute-force per-feature within 1e-5. The formerly @xfail(strict=True) regression test_strict_additivity_via_tree_shap_polynomial_path in bindings/python/tests/test_shap_pl_strict_additivity.py now passes as a regular test.
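To make the pweight issue concrete, here is a small Python sketch of the extend/unwind pair using four parallel lists, modelled on the parallel-array layout the note attributes to the reference implementation. The function names and list layout are illustrative and are not AlloyGBM's Rust code. Removing a middle path element must leave the same pweights as never having added it, which only holds if the final shift skips pweights.

```python
def extend_path(features, zeros, ones, pweights, unique_depth,
                zero_fraction, one_fraction, feature_index):
    """Append one split to the path and update the permutation weights."""
    features[unique_depth] = feature_index
    zeros[unique_depth] = zero_fraction
    ones[unique_depth] = one_fraction
    pweights[unique_depth] = 1.0 if unique_depth == 0 else 0.0
    for i in range(unique_depth - 1, -1, -1):
        pweights[i + 1] += one_fraction * pweights[i] * (i + 1) / (unique_depth + 1)
        pweights[i] = zero_fraction * pweights[i] * (unique_depth - i) / (unique_depth + 1)


def unwind_path(features, zeros, ones, pweights, unique_depth, path_index):
    """Remove the element at path_index, undoing its effect on pweights."""
    one_fraction = ones[path_index]
    zero_fraction = zeros[path_index]
    next_one_portion = pweights[unique_depth]
    for i in range(unique_depth - 1, -1, -1):
        if one_fraction != 0.0:
            tmp = pweights[i]
            pweights[i] = next_one_portion * (unique_depth + 1) / ((i + 1) * one_fraction)
            next_one_portion = tmp - pweights[i] * zero_fraction * (unique_depth - i) / (unique_depth + 1)
        else:
            pweights[i] = pweights[i] * (unique_depth + 1) / (zero_fraction * (unique_depth - i))
    # Shift only the three descriptive fields; pweights were recomputed in place.
    for i in range(path_index, unique_depth):
        features[i] = features[i + 1]
        zeros[i] = zeros[i + 1]
        ones[i] = ones[i + 1]


def new_path(size):
    return [0] * size, [0.0] * size, [0.0] * size, [0.0] * size


# Path root -> feature 0 -> feature 1, then remove feature 0 from the middle.
f, z, o, pw = new_path(3)
extend_path(f, z, o, pw, 0, 1.0, 1.0, -1)
extend_path(f, z, o, pw, 1, 0.5, 1.0, 0)
extend_path(f, z, o, pw, 2, 0.25, 1.0, 1)
unwind_path(f, z, o, pw, 2, 1)

# Reference: the same path built without feature 0.
f2, z2, o2, pw2 = new_path(3)
extend_path(f2, z2, o2, pw2, 0, 1.0, 1.0, -1)
extend_path(f2, z2, o2, pw2, 1, 0.25, 1.0, 1)
```

After the unwind, the first two pweights agree with the reference path; shifting pweights along with the other fields would overwrite the second entry with the stale value from slot 2.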
Documentation:
docs/limitations.md: Limitation 5 promoted to Resolved.
Other documented v0.7.x follow-ups (mixed linear-rank SHAP path, GOSS+DART, joint multi-label ranking, shared-histogram engine) remain deferred to v0.8.0.
What’s new in 0.7.4
Bug-fix release. Closes the remaining v0.7.x carryover documented in
docs/limitations.md for SHAP strict additivity on
leaf_model="linear" artifacts. No user-visible API breakage.
SHAP strict additivity for piecewise-linear leaves:
Pre-v0.7.4 distribute_linear_terms_for_row credited the per-feature deviation Σⱼ wⱼ·(xⱼ − μⱼ) only at each tree’s terminal leaf. The predictor accumulates leaf.eval_row(row) at every visited node along the row’s path, so SHAP was failing to credit one Σⱼ wⱼ·(xⱼ − μⱼ) per internal node per tree per row, producing additivity gaps on the order of the predictions themselves (~3.85 on linear-data predictions of magnitude ~10 with n_estimators=100, max_depth=6).
v0.7.4 walks the full row path and credits the linear deviation at every visited leaf. The brute-force Shapley and TreeSHAP polynomial paths share the helper so both get the fix.
The model_has_linear_leaves exemption in verify_additivity is now gated on binning.is_none(), so the predictor-aligned BinningContext callers (i.e. the default Python path for continuous features) get the strict atol + rtol·|predicted| tolerance check.
Coverage: 44 new regression tests in bindings/python/tests/test_shap_pl_strict_additivity.py exercising every binning strategy × max-bin width × lambda_l2 × max_depth × n_estimators combination, plus training_mode="manual" and "morph", interaction_constraints, GBMRanker, GBMClassifier (via the internal Rust check, since the raw margin is not exposed in Python), feature_importances (brute-force exact path), and mixed scalar+linear-leaf artifacts. Strict additivity holds on the default predictor-aligned binning path for any model that dispatches to the brute-force exact Shapley path (distinct_split_feature_count <= MAX_EXACT_SPLIT_FEATURES = 25). Larger models that trigger the polynomial-TreeSHAP path are subject to a pre-existing additivity drift documented as Limitation 5 (also present in v0.7.3 and earlier).
Documentation:
Limitation 4 (new): SHAP on the mixed linear-rank binning path. continuous_binning_strategy="linear" with per-feature rank-based binning falls back to the legacy non-binning SHAP entry point, triggering the leaf_model="linear" exemption. Narrow edge case; deferred to v0.8.0.
Limitation 5 (new): pre-existing TreeSHAP polynomial-path additivity drift on large gradient-trained trees (>= 30 distinct split features, depth >= 6). Uncovered during PR #27 review; investigated but not isolated in minimal Rust reproductions. Coverage pinned by @xfail(strict=True) regression test (test_strict_additivity_via_tree_shap_polynomial_path) so the eventual fix flips the xfail to a regular pass.
Documented for v0.7.x follow-ups (deferred to 0.8.0):
Joint shared-tree multi-label ranking. The current MultiLabelGBMRanker trains K independent per-label rankers under a unified API and is numerically equivalent to training each label separately. Joint shared-tree training lands alongside the v0.8.0 shared-histogram speedup where the architectural change has a real performance story.
What’s new in 0.7.3
Bug-fix release. Closes the four limitations queued in v0.7.2 and clears RUSTSEC-2025-0020. No user-visible API breakage.
SHAP additivity tolerance:
The internal additivity check now uses atol + rtol * |predicted| (atol=1e-5, rtol=1e-4) instead of a fixed 1e-5 absolute bound. Larger explanation batches (feature_importances() over ~1000 rows of California Housing with n_estimators=200 was the public-facing reproducer) no longer raise spurious RuntimeError on healthy leaf_model="constant" artifacts.
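The combined tolerance is simple to express. The helper below is a sketch with a hypothetical name and signature; only the formula and the two default constants come from the note above.

```python
def additivity_ok(shap_row, base_value, predicted, atol=1e-5, rtol=1e-4):
    """Check that base_value + sum(SHAP values) reproduces the prediction."""
    reconstructed = base_value + sum(shap_row)
    return abs(reconstructed - predicted) <= atol + rtol * abs(predicted)


# A 2e-5 gap fails a fixed 1e-5 bound but passes the relative check at |pred| ~ 3.5.
passes = additivity_ok([1.0, 2.0], 0.5, 3.50002)
fails = additivity_ok([1.0], 0.0, 1.1)
```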
SHAP path-walker uses predictor-aligned float thresholds:
New shap::BinningContext (Linear, Quantile, PreBinned) plus four PyO3 entry points (shap_explain_rows_with_binning, shap_global_importance_with_binning, plus dense variants). When a binning context is provided, the path walker compares feature_value < float_threshold (matching the predictor’s convert_bin_thresholds_to_float*) instead of the legacy feature_value <= split.threshold_bin as f32. Eliminates the path-walk vs. predict-path divergence on continuous features for scalar-leaf artifacts.
GBMRegressor, GBMClassifier, and GBMRanker now pass feature mins / maxs / cuts / binning kind into SHAP automatically.
MorphBoost warm-start now persists EMA:
MorphMetadata artifact section bumped to v2 with appended Vec<GradientEmaStats> per class. WarmStartState and MultiClassWarmStartState gain initial_ema_stats: Option<Vec<GradientEmaStats>>. Both single-class and multiclass training loops seed the fresh MorphState.ema_stats from this snapshot, so resuming a MorphBoost-trained model via init_model= no longer restarts the EMA cold.
v1 artifacts decode with empty ema_stats; the engine falls back to MorphState::new cold initialization, preserving prior behaviour for legacy artifacts.
PyO3 0.23 → 0.24 (clears RUSTSEC-2025-0020):
Bumps pyo3 = "0.24" and numpy = "0.24". The bindings were already on the Bound<>-first API, so zero source changes were needed.
deny.toml and .github/workflows/security-audit.yml no longer ignore RUSTSEC-2025-0020.
Limitations documented for the next release:
SHAP additivity for piecewise-linear leaves on continuous features remains exempted from the strict check (linear weights and feature_baseline are still trained in bin space).
Joint shared-tree multi-label boosting is still pending; the MultiLabelGBMRanker wrapper trains K independent per-label rankers.
What’s new in 0.7.2
Documentation, supply-chain, and repo-hygiene release. No user-facing Python API surface changes.
Documentation:
Multiple docs still claimed warm-start was rejected, SHAP required leaf_model="constant", interaction constraints did not exist, or rankers were single-label only after v0.7.1 actually shipped those features. README, docs/user/*.md, the Sphinx mirror under docs/site/source/*.rst, docs/roadmap/current.md, CLAUDE.md, AGENTS.md, and benchmarks/README.md are now consistent with the v0.7.1 surface that actually shipped.
docs/reference/release_checklist.md is now a top-to-bottom operating manual covering version bumps, doc updates, verification, tag/publish, and post-release bookkeeping.
docs/site/source/api.rst now auto-documents MultiLabelGBMRanker (was missing in v0.7.1).
New examples/ directory with 8 self-contained end-to-end scripts.
Repo hygiene & supply chain:
CI now runs the full pytest suite (455 tests) on every PR. v0.7.1 built the wheel and ran a handful of smoke snippets but never invoked pytest bindings/python/tests/, so the Python test suite was not enforced on merge.
Cargo.lock is tracked.
maturin pinned in publish.yml to the same SemVer range declared in pyproject.toml.
cargo-audit + cargo-deny run weekly and on every PR that touches Cargo manifests, configured via the new deny.toml.
Coverage reporting via cargo-llvm-cov + pytest-cov → Codecov.
publish = false on every workspace crate.
New CONTRIBUTING.md, SECURITY.md, GitHub issue / PR / CODEOWNERS / Dependabot configs, .editorconfig, requirements-dev.txt, README badges.
Limitations documented for the next release:
SHAP path-walker still compares against bin-index thresholds (carried over from v0.7.1).
MorphBoost warm-start does not restore the EMA snapshot (carried over from v0.7.1).
MultiLabelGBMRanker trains K independent per-label rankers; joint shared-tree multi-label boosting is still pending (carried over from v0.7.1).
NEW: SHAP additivity check has a 1e-5 absolute tolerance that f32 round-off can exceed across larger evaluation samples; loosening to atol + rtol * |predict(x)| is queued.
NEW: pyo3 = 0.23.5 has RUSTSEC-2025-0020; not exploitable in AlloyGBM’s code path. Upgrading to pyo3 0.24+ requires migrating the bindings to the Bound<>-first API.
What’s new in 0.7.1
SHAP for piecewise-linear leaves:
shap_values() now accepts leaf_model="linear" artifacts and returns an interventional decomposition: the path-based TreeSHAP / brute-force machinery attributes each leaf’s “constant part” (intercept + Σ wⱼ·μⱼ_global) while per-leaf row deviations wⱼ · (xⱼ − μⱼ_global) are credited directly to each regressor. Global feature means are persisted in a new FeatureBaseline artifact section so SHAP is self-contained at explain time.
Per-round training diagnostics:
Every estimator exposes diagnostics_per_round_, a list of dicts containing gradient_l2_norm, gradient_variance, hessian_l2_norm, sampling counts, and (when factor neutralization is active) neutralization_effectiveness = 1 − ‖projₘ‖ / ‖origₘ‖.
Neutralized warm-start:
init_model / warm_start=True with neutralization=* is supported across pre_target, per_round_gradient, and split_penalty provided the caller supplies the same factor_exposures matrix used for the initial fit. Mode, factor_neutralization_lambda, and (for split_penalty) factor_penalty must match; mismatches raise a clear “does not match” error.
Interaction constraints:
LightGBM-compatible interaction_constraints=[[…]] on every estimator. Each group is a set of feature indices; any root-to-leaf path is restricted to splits on features from a single still-active group. Up to 64 groups per fit; enforced through both the level-wise and leaf-wise tree builders.
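One way to read the "still-active group" rule: a group stays active while it contains every feature already used on the path, and the next split may use any feature from an active group. The sketch below encodes that reading with a hypothetical helper; it is not the tree builders' actual bookkeeping.

```python
def allowed_features(groups, path_features):
    """Return the features a new split may use, given features already on the path."""
    used = set(path_features)
    allowed = set()
    for group in groups:
        members = set(group)
        # A group is still active only if it covers every feature used so far.
        if used <= members:
            allowed |= members
    return allowed


groups = [[0, 1], [1, 2, 3]]
at_root = allowed_features(groups, [])
after_shared = allowed_features(groups, [1])
after_two = allowed_features(groups, [1, 2])
after_zero = allowed_features(groups, [0])
```

Feature 1 belongs to both groups, so splitting on it keeps both active; adding feature 2 afterwards narrows the path to the second group only.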
Multi-label ranking:
New MultiLabelGBMRanker exposes a unified multi-output ranking API. y is shaped (n_rows, n_labels) and predict returns the same shape. Trains one independent GBMRanker per label sharing group / factor_exposures / kwargs, supports per-label ranking_objective lists, and slices eval_set y-columns per label so early stopping and custom eval metrics work end-to-end.
Limitations documented for the next release:
SHAP path-walker still compares feature values against bin-index thresholds; strict additivity is relaxed for PL-leaf artifacts. Tightening this is queued for v0.7.2.
MorphBoost warm-start does not restore the EMA snapshot from the artifact, so resumed training starts EMA cold.
MultiLabelGBMRanker trains K independent per-label rankers. Joint shared-tree multi-label boosting is queued for v0.7.2.
What’s new in 0.7.0
Factor-neutral boosting:
New neutralization parameter on GBMRegressor, GBMClassifier, and GBMRanker, with row-aligned fit-time factor_exposures.
neutralization="per_round_gradient" projects each boosting round’s objective gradients away from user-supplied factors. Multiclass classification projects each class-gradient column independently.
neutralization="pre_target" residualizes the target once before training for built-in squared-error regression. Classification, ranking, custom objectives, and validation sets are rejected for this mode in 0.7.0.
neutralization="split_penalty" also subtracts a factor-load penalty from split gain via factor_penalty. It supports constant leaves, composes with leaf_solver="dro" and training_mode="morph", and rejects leaf_model="linear" in 0.7.0.
Neutralized warm_start and init_model continuation are rejected in 0.7.0; this restriction was lifted in v0.7.1 with the same-exposures contract documented above.
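Projecting gradients "away from" a factor means removing the component of the gradient vector that lies along that factor. The sketch below shows the single-factor case in pure Python, which is a deliberate simplification: the notes describe a factor_exposures matrix and a factor_neutralization_lambda setting, neither of which is modelled here.

```python
def neutralize(gradients, factor):
    """Remove the component of `gradients` along one factor column."""
    ff = sum(f * f for f in factor)
    if ff == 0.0:
        return list(gradients)  # a zero factor has nothing to project out
    beta = sum(g * f for g, f in zip(gradients, factor)) / ff
    return [g - beta * f for g, f in zip(gradients, factor)]


gradients = [1.0, 2.0, 3.0, 4.0]
factor = [1.0, 1.0, -1.0, -1.0]
neutral = neutralize(gradients, factor)
exposure = sum(g * f for g, f in zip(neutral, factor))
```

After projection the gradients have zero dot product with the factor, so a tree fitted to them cannot pick up that exposure through the first-order signal.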
Benchmarks:
alloygbm_factor_neutral and alloygbm_factor_neutral_dro arms added to benchmarks/run_model_comparison.py.
Benchmark datasets without explicit factors synthesize factor_exposures from the first min(5, n_features) feature columns. These arms are smoke and stability checks, not standalone quality claims, because the synthesized factors are also present as model features.
What’s new in 0.6.0
DRO-style scalar leaves:
New opt-in leaf_solver="dro" parameter on GBMRegressor, GBMClassifier, and GBMRanker. The solver is a fast, closed-form robust Newton update over within-leaf gradient uncertainty.
dro_radius controls the gradient-uncertainty penalty and dro_metric="wasserstein" names the Wasserstein-inspired robust counterpart. This is not a full optimizer over raw feature/target distributions.
leaf_solver="dro" requires leaf_model="constant" and composes with training_mode="morph".
Inference speed is unchanged because robust scalar leaf values are stored directly in the artifact.
What’s new in 0.5.0
Piecewise-linear (PL) tree leaves:
New opt-in leaf_model="linear" parameter on GBMRegressor, GBMClassifier, and GBMRanker. Each leaf stores a small linear model f_s(x) = b_s + Σ α_j x_j (up to 8 regressors per leaf, inherited from the split path’s feature indices; the cap is internal and not user-tunable in v0.5.0). Optimal weights are solved in closed form via the ridge regression α* = -(XᵀHX + λI)⁻¹ Xᵀg, regularised by lambda_l2.
Default leaf_model="constant" preserves all prior behaviour exactly.
New artifact section ModelSectionKind::LinearLeafCoefficients stores per-stump linear leaf data; backward-compatible with v0.4.0 artifacts.
Native-bitset categorical splits (max_cat_threshold > 0) fall back to constant leaves at the categorical split node; descendant numeric leaves use linear leaves normally.
Multi-class softmax fits each per-class tree sequence with linear leaves independently.
leaf_model="linear" composes with training_mode="morph".
SHAP (shap_values, feature_importances) currently raises an error for leaf_model="linear" artifacts; use leaf_model="constant" if you need SHAP.
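For a single regressor and no intercept, the ridge solution α* = -(XᵀHX + λI)⁻¹ Xᵀg collapses to a scalar ratio, which makes the role of lambda_l2 easy to see. The sketch below covers only that one-column case (an assumption made for brevity); the engine's multi-regressor Cholesky solve is not reproduced.

```python
def ridge_leaf_weight(x, grad, hess, lambda_l2=0.01):
    """Closed-form weight for one regressor: -(x'g) / (x'Hx + lambda)."""
    xtg = sum(xi * gi for xi, gi in zip(x, grad))
    xthx = sum(hi * xi * xi for xi, hi in zip(x, hess))
    return -xtg / (xthx + lambda_l2)


# Squared error with current prediction 0 and target y = 2x:
# gradient = prediction - y = -2x, hessian = 1.
x = [1.0, 2.0, 3.0]
grad = [-2.0, -4.0, -6.0]
hess = [1.0, 1.0, 1.0]
alpha = ridge_leaf_weight(x, grad, hess)
```

Here xᵀg = -28 and xᵀHx = 14, so α = 28 / 14.01, slightly below the unregularised slope of 2; a larger lambda_l2 shrinks it further.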
Performance:
~10× faster convergence on linearly-structured datasets (fewer rounds to reach the same RMSE).
+3.5% RMSE on California Housing and +1.75pp accuracy on Breast Cancer vs constant leaves.
2–8× per-round training overhead from the closed-form Cholesky solve. Recommended lambda_l2 >= 0.01 for weight stability.
Benchmarks:
alloygbm_linear and alloygbm_morph_linear arms added to benchmarks/run_model_comparison.py for all four task types.
New benchmarks/pl_trees_benchmark.py script with convergence-curve and λ-sweep analysis.
Benchmark report committed to docs/benchmarks/pl_trees_v1.md.
What’s new in 0.4.0
MorphBoost mode and SIMD acceleration:
New opt-in adaptive training mode via training_mode="morph", implementing the criterion from Kriuk (2025). Available on GBMRegressor, GBMClassifier, and GBMRanker. See MorphBoost (Adaptive Split Criterion).
New per-iteration learning-rate schedule parameter lr_schedule ("constant" default, "warmup_cosine" available). Independent of training_mode; usable on its own.
Schedule-aware auto early-stopping: when an LR schedule is active, the auto-tuned min_loss_improvement threshold is scaled by current_lr / max_lr, and warmup-phase rounds are tolerated without termination.
Backend SIMD acceleration via the wide crate (safe API; AVX2 / NEON intrinsics underneath, scalar fallback otherwise). Histogram bin-scan and EMA passes are now vectorized; histogram tile sizing is auto-tuned for high-feature workloads.
New benchmark harnesses: benchmarks/morph_report.py, benchmarks/morph_ablation.py, and an enhanced benchmarks/numerai_benchmark.py with MorphBoost arms and a startup build-freshness check.
benchmarks/run_model_comparison.py registers two new arms by default per task type: alloygbm_morph and alloygbm_morph_cosine. New --models flag filters which arms run.
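A warmup-then-cosine schedule and the matching early-stopping threshold scaling might look like the sketch below. The exact shape AlloyGBM uses is not specified in these notes, so the linear warmup, the cosine decay to zero, and the warmup_rounds argument are all assumptions; only the current_lr / max_lr scaling comes from the text above.

```python
import math


def warmup_cosine_lr(round_index, total_rounds, max_lr, warmup_rounds):
    """Linear warmup to max_lr, then cosine decay over the remaining rounds."""
    if round_index < warmup_rounds:
        return max_lr * (round_index + 1) / warmup_rounds
    progress = (round_index - warmup_rounds) / max(1, total_rounds - warmup_rounds)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def scaled_threshold(base_threshold, current_lr, max_lr):
    """Shrink the min-improvement floor in step with the learning rate."""
    return base_threshold * current_lr / max_lr


schedule = [warmup_cosine_lr(r, 100, 0.1, 10) for r in range(100)]
late_threshold = scaled_threshold(1e-4, schedule[55], 0.1)
```

Scaling the threshold this way keeps late rounds, where the learning rate and therefore the per-round improvement are small, from tripping the stop condition prematurely.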
What’s new in 0.3.2
0.3.2 fixes silent zero-tree training in GBMRanker, corrects signature
introspection, and adds a real-data ranking benchmark:
GBMRanker training fixes:
The auto training policy’s density-based min_split_gain and min_loss_improvement floors are no longer applied to ranking objectives. Ranking gradients are an order of magnitude smaller than regression/classification gradients; on datasets where row_count * feature_count >= 65 536 these floors were causing training to exit after round 1 with zero trees committed.
The main training loop’s unconditional loss_improvement < 0 early-exit no longer fires for ranking objectives, where round-to-round loss oscillation is expected behaviour.
inspect.signature(GBMRanker.__init__) now returns the full parameter set (ranking_objective plus all GBMRegressor parameters). Previously only three parameters were visible, causing tools that build kwargs via signature introspection to silently train with n_estimators=6.
Diagnostics:
stop_reason_ and rounds_completed_ attributes are now set on all estimators after fit() to surface the engine’s early-stop reason and actual committed round count.
Benchmarks:
Added california_ranking: California Housing reframed as learning-to-rank with geographic grid cells as queries and median_house_value bucketed into 5 graded relevance levels (~44 queries × 468 docs = ~20 595 rows).
What was new in 0.3.1
0.3.1 fixed multiclass prediction and expanded the benchmark suite:
Fixed class_trees threshold conversion so multiclass models predict correctly with continuous float features
Fixed multiclass benchmark argmax label mapping with model.classes_
Added wine_multiclass, digits_multiclass, adult_income, abalone_regression benchmark scenarios
Activated synthetic_multiclass and synthetic_categorical scenarios
Rewrote benchmarks/README.md
What was new in 0.3.0
0.3.0 adds native categorical splits, multi-class classification, and
custom objective/metric support:
Native categorical splits:
Fisher-sort categorical split-finding with O(K log K) optimal binary partitions and O(1) bitset prediction
max_cat_threshold parameter controls the maximum category cardinality for native splits (default 0 = disabled, opt-in)
Category-to-ID mappings preserved through pickle, save/load, and params
Full support across GBMRegressor, GBMClassifier, and GBMRanker
Multi-class classification:
GBMClassifier auto-detects K > 2 classes and uses softmax (multinomial cross-entropy) objective with K trees per round
predict_proba returns (n_samples, K) probability matrix
Custom objectives and metrics:
objective=callable for user-defined gradient/hessian computation
eval_metric=callable for custom evaluation metrics with early stopping
higher_is_better protocol for metric direction
What was new in 0.2.0
0.2.0 was a major capability expansion from the regression-only 0.1.x
series:
New estimators:
GBMClassifier – binary classification with log-loss objective, predict_proba, sklearn ClassifierMixin
GBMRanker – learning-to-rank with 5 objectives (RankNet, LambdaMART, XE-NDCG, QueryRMSE, YetiRank)
Core improvements:
NaN / missing value support across training and prediction
Sample weight support via fit(..., sample_weight=...)
Group ID support via fit(..., group=...)
Model persistence: pickle, save_model / load_model, artifact export
Feature name capture from pandas DataFrames and other named inputs
sklearn compatibility (BaseEstimator, RegressorMixin, ClassifierMixin, get_params, set_params, score)
min_split_gain exposed as a user parameter
Training enhancements:
Leaf-wise (best-first) tree growth via tree_growth="leaf"
Monotone constraints via monotone_constraints
Feature importance weighting via feature_weights
max_leaves parameter for leaf-budget-oriented training
Warm-starting / incremental training via warm_start=True
Up to 65,535 bins per feature (adaptive u8/u16 storage)
Multiple categorical column support via categorical_feature_indices
Histogram buffer reuse to reduce allocation pressure
Objective-aware training metric tracking (RMSE, log-loss, accuracy, NDCG)
Explanations:
TreeSHAP (polynomial-time exact Shapley values, replaces the 25-feature brute-force method)
SHAP limit raised from 20 to 25 features (for legacy brute-force path), then replaced entirely by TreeSHAP
Metrics:
accuracy – classification accuracy
log_loss – binary cross-entropy
ndcg – normalized discounted cumulative gain (with optional k)
Benchmarks:
Classification scenarios: breast_cancer, synthetic_classification
Ranking scenario: synthetic_ranking
Task-type-aware benchmark runner with per-type metrics and rendering
Validated release surface
For 0.7.1, the intended release surface is:
macOS arm64 wheel
Linux x86_64 manylinux wheel
source distribution
Deferred targets
These are intentionally deferred:
Windows wheels
macOS Intel wheels
Release checklist summary
Before a public release:
confirm package metadata and version
confirm user docs are up to date
confirm CI is green
confirm the built wheel installs in a fresh environment
confirm the publish workflow smoke-tests its wheel artifacts before upload
confirm benchmark messaging stays narrow and defensible