MorphBoost (Adaptive Split Criterion)
=====================================

MorphBoost is an opt-in training mode in AlloyGBM that augments the standard
gradient-gain split criterion with a normalized information-theoretic term,
plus several round-aware leaf adjustments. Implementation follows the
formulation in Kriuk (2025), MorphBoost.

When To Use It
--------------

MorphBoost tends to help most on:

- Tabular problems with low signal-to-noise ratio (financial residuals,
  Numerai-style returns, etc.) where the standard gain criterion can overfit
  to spurious local-best splits.
- Workloads where you want the model to find structure that a
  pure-gradient-gain learner misses early in training.

It is not a strict upgrade: treat MorphBoost as a configuration to A/B against
``training_mode="auto"`` rather than a default replacement.

How It Works
------------

For every candidate split, the gain is

.. code-block:: text

   gradient_score = standard XGBoost-style gradient gain
   info_score     = normalized information-gain term over the partition
   morph_weight   = tanh(iteration / 20)   # ramps in over training

   gain = (1 - info_score_weight) * gradient_score
        + info_score_weight * info_score * morph_weight
        + optional balance penalty

In addition:

- A per-class EMA over gradient statistics tracks recent training dynamics and
  shapes split selection during evaluation.
- Leaf values are scaled by a depth-based penalty
  (``depth_penalty_base ** (depth / 3)``) and a per-iteration shrinkage
  (``1 - morph_rate * progress``).
- An optional balance penalty discourages highly imbalanced splits.

Enabling It
-----------

Pass ``training_mode="morph"`` to any AlloyGBM estimator. The same parameter
exists on :class:`~alloygbm.GBMRegressor`, :class:`~alloygbm.GBMClassifier`,
and :class:`~alloygbm.GBMRanker`.

.. code-block:: python

   from alloygbm import GBMRegressor

   model = GBMRegressor(
       n_estimators=1200,
       max_depth=6,
       learning_rate=0.05,
       training_mode="morph",
       seed=7,
   )
   model.fit(X_train, y_train)

``training_mode`` accepts ``"auto"`` (default), ``"manual"``, or ``"morph"``.

Parameters
----------

All MorphBoost-related parameters are exposed as top-level keyword arguments on
the estimator; the table below notes any mode-specific behavior.

.. list-table::
   :header-rows: 1
   :widths: 28 12 60

   * - Parameter
     - Default
     - Description
   * - ``morph_rate``
     - ``0.1``
     - Per-iteration leaf shrinkage rate. Range ``[0.0, 1.0]``.
   * - ``evolution_pressure``
     - ``0.2``
     - Strength of EMA-driven gain shaping. Range ``[0.0, 1.0]``.
   * - ``morph_warmup_iters``
     - ``5``
     - Initial rounds for which the morph blend collapses to the pure gradient
       gain.
   * - ``info_score_weight``
     - ``0.3``
     - Mixing weight for the information-theoretic term post-warmup. Range
       ``[0.0, 1.0]``. ``0.0`` disables the info-theoretic term.
   * - ``depth_penalty_base``
     - ``0.9``
     - Base of the leaf depth penalty. Range ``(0.0, 1.0]``. ``1.0`` disables
       the penalty.
   * - ``balance_penalty``
     - ``True``
     - Whether to penalize highly imbalanced splits.
   * - ``lr_schedule``
     - ``"constant"``
     - Per-iteration LR schedule. ``"constant"`` or ``"warmup_cosine"``.
       Independent of ``training_mode``; usable on its own.
   * - ``lr_warmup_frac``
     - ``0.1``
     - Fraction of ``n_estimators`` spent in the linear-warmup phase when
       ``lr_schedule="warmup_cosine"``. Range ``[0.0, 1.0]``. Must be left at
       the default ``0.1`` when ``lr_schedule="constant"``; non-default values
       with a constant schedule raise ``ValueError``.

Learning-Rate Schedules
-----------------------

``lr_schedule`` is independent of ``training_mode``. Two schedules are
supported:

- ``"constant"`` (default): single fixed ``learning_rate`` for all rounds.
- ``"warmup_cosine"``: linear warmup from a small fraction of
  ``learning_rate`` up to ``learning_rate`` over the first
  ``lr_warmup_frac * n_estimators`` rounds, then half-cosine decay to a floor
  of ``0.01 * learning_rate`` over the remainder.

The warmup-cosine schedule is most useful at very low ``learning_rate`` and
high ``n_estimators`` (e.g. ``n_estimators=5000``, ``learning_rate=0.01``).

.. code-block:: python

   model = GBMRegressor(
       n_estimators=5000,
       learning_rate=0.01,
       training_mode="morph",
       lr_schedule="warmup_cosine",
       lr_warmup_frac=0.1,
   )

When a non-constant LR schedule is active, AlloyGBM's auto-stopping logic
becomes schedule-aware: the auto-tuned ``min_loss_improvement`` threshold is
scaled by ``current_lr / max_lr``, and empty / slightly-negative rounds during
the explicit warmup phase do not terminate training.

Combining with piecewise-linear leaves
--------------------------------------

``training_mode="morph"`` composes with ``leaf_model="linear"``. The MorphBoost
gain criterion drives split selection, and each resulting leaf still fits a
closed-form linear model via the ridge solve. Pair with ``lambda_l2 >= 0.01``
for weight stability. See :doc:`estimator` for the full ``leaf_model``
reference.

Combining with DRO leaves
-------------------------

``training_mode="morph"`` also composes with ``leaf_solver="dro"`` when
``leaf_model="constant"``. The robust gradient gain and scalar leaf value are
computed first; MorphBoost then blends the robust gradient gain with its
information score and applies the usual depth/iteration leaf scaling.

Persistence
-----------

Models trained with ``training_mode="morph"`` save and load identically to
auto-mode models: ``pickle``, ``save_model`` / ``load_model``, and raw artifact
export all work without extra steps. The morph configuration is embedded as an
optional artifact section so loaded models predict consistently.
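
Illustrative Sketch
-------------------

The formulas quoted under How It Works and Learning-Rate Schedules can be
tried out in plain Python. The block below is a minimal sketch, not AlloyGBM's
implementation: the function names ``blended_gain``, ``leaf_scale``, and
``warmup_cosine_lr`` are hypothetical, ``progress`` is assumed to be
``iteration / n_estimators``, the two leaf factors are assumed to multiply,
and the warmup start fraction (``start_frac``) is an assumed value because the
text only says "a small fraction". The ``morph_warmup_iters`` gate and the EMA
shaping are left out.

.. code-block:: python

   import math


   def blended_gain(gradient_score, info_score, iteration,
                    info_score_weight=0.3, balance_term=0.0):
       """Blend gradient gain with the information term (formula from the text)."""
       morph_weight = math.tanh(iteration / 20)  # ramps from 0 toward 1
       return ((1 - info_score_weight) * gradient_score
               + info_score_weight * info_score * morph_weight
               + balance_term)


   def leaf_scale(depth, iteration, n_estimators,
                  depth_penalty_base=0.9, morph_rate=0.1):
       """Multiplier for a leaf value: depth penalty times iteration shrinkage."""
       # Assumption: progress is the fraction of boosting rounds completed.
       progress = iteration / n_estimators
       depth_factor = depth_penalty_base ** (depth / 3)
       shrink_factor = 1 - morph_rate * progress
       # Assumption: the two factors are combined by multiplication.
       return depth_factor * shrink_factor


   def warmup_cosine_lr(iteration, n_estimators, learning_rate,
                        lr_warmup_frac=0.1, start_frac=0.1, floor_frac=0.01):
       """Linear warmup to learning_rate, then half-cosine decay to a floor."""
       warmup_rounds = int(lr_warmup_frac * n_estimators)
       if iteration < warmup_rounds:
           # Assumption: warmup starts at start_frac * learning_rate.
           t = iteration / warmup_rounds
           return learning_rate * (start_frac + (1 - start_frac) * t)
       decay_rounds = max(n_estimators - warmup_rounds, 1)
       t = (iteration - warmup_rounds) / decay_rounds
       cosine = 0.5 * (1 + math.cos(math.pi * t))  # goes from 1 down to 0
       floor = floor_frac * learning_rate
       return floor + (learning_rate - floor) * cosine


   if __name__ == "__main__":
       # Inspect how the three quantities evolve over a 5000-round run.
       for it in (0, 250, 500, 2500, 4999):
           print(it,
                 blended_gain(10.0, 2.0, it),
                 leaf_scale(3, it, 5000),
                 warmup_cosine_lr(it, 5000, 0.01))

Setting ``info_score_weight=0.0`` in the sketch reduces the blend to the plain
gradient score, and ``depth_penalty_base=1.0`` removes the depth factor, which
matches the "disables" notes in the parameter table.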