GBMRanker
=========

``GBMRanker`` is the learning-to-rank estimator in AlloyGBM.

Overview
--------

``GBMRanker`` extends ``GBMRegressor`` with ranking-specific objectives. All
ranking objectives require query group identifiers to be passed in ``fit()``.
Data is sorted by group internally.

Quick example
-------------

.. code-block:: python

    from alloygbm import GBMRanker, ndcg

    model = GBMRanker(
        ranking_objective="rank:ndcg",
        learning_rate=0.05,
        max_depth=6,
        n_estimators=300,
        deterministic=True,
        seed=7,
    )
    model.fit(X_train, y_train, group=query_ids_train)
    scores = model.predict(X_test)
    print("NDCG@10:", ndcg(y_test, scores, group=query_ids_test, k=10))

Ranking objectives
------------------

- ``"rank:pairwise"`` -- Pairwise logistic loss (RankNet)
- ``"rank:ndcg"`` -- LambdaMART with NDCG weighting (default)
- ``"rank:xendcg"`` -- Cross-entropy approximation to NDCG
- ``"queryrmse"`` -- Query-grouped RMSE
- ``"yetirank"`` -- YetiRank (stochastic NDCG-weighted pairwise)

As of v0.12.8, ``GBMRanker`` also accepts the regression objectives inherited
from ``GBMRegressor`` via ``ranking_objective=``: ``"poisson"``, ``"gamma"``,
``"tweedie"`` (log-link GLM, ``predict()`` returns ``exp(raw)``), and
``"quantile"`` (pinball loss, ``quantile_alpha`` ∈ (0.0, 1.0)).

Parameters
----------

- ``ranking_objective: str = "rank:ndcg"`` -- the ranking loss function

All other parameters are inherited from ``GBMRegressor``, including
``leaf_solver="dro"`` for robust scalar leaves, ``leaf_model="linear"`` for
piecewise-linear leaves (see :doc:`estimator`), and ``training_mode="morph"``
and the MorphBoost / LR-schedule parameters (``morph_rate``,
``evolution_pressure``, ``morph_warmup_iters``, ``info_score_weight``,
``depth_penalty_base``, ``balance_penalty``, ``lr_schedule``,
``lr_warmup_frac``). See :doc:`morphboost`. ``leaf_model="linear"`` and
``training_mode="morph"`` can be combined.

``boosting_mode="goss"`` with ``goss_top_rate`` / ``goss_other_rate`` and
``boosting_mode="dart"`` with ``dart_drop_rate`` / ``dart_max_drop`` /
``dart_normalize_type`` / ``dart_sample_type`` are both supported on the
ranking objective (see :doc:`estimator` "Boosting mode" for the full
semantics).

Methods
-------

- ``fit(X, y, *, group, eval_set=None, eval_group=None, ...)`` -- trains the
  ranker. ``group`` is required and provides per-row query identifiers.
- ``predict(X)`` -- returns raw relevance scores (higher = more relevant)

Evaluation
----------

.. code-block:: python

    from alloygbm import ndcg

    score = ndcg(y_test, predictions, group=query_ids_test)
    score_at_10 = ndcg(y_test, predictions, group=query_ids_test, k=10)

Group format
------------

The ``group`` parameter accepts per-row group identifiers (e.g. query IDs).
AlloyGBM sorts by group internally, so rows do not need to be pre-sorted.

.. code-block:: python

    # Per-row group IDs (AlloyGBM format)
    group = [0, 0, 0, 1, 1, 2, 2, 2, 2]

Early stopping
--------------

.. code-block:: python

    model = GBMRanker(
        ranking_objective="rank:ndcg",
        n_estimators=2000,
        early_stopping_rounds=50,
    )
    model.fit(
        X_train, y_train,
        group=query_ids_train,
        eval_set=(X_valid, y_valid),
        eval_group=query_ids_valid,
    )

Current scope
-------------

- 5 ranking objectives implemented natively in Rust, plus the 4 inherited
  regression objectives (``poisson``, ``gamma``, ``tweedie``, ``quantile``)
  as of v0.12.8
- Single-label per ``GBMRanker``. For multi-output ranking, see
  :class:`~alloygbm.MultiLabelGBMRanker` (also covered in :doc:`estimator`).
  Joint shared-tree multi-label boosting is deferred to v0.10.0 (paired with
  the K-output shared-histogram primitive).
- Group identifiers must be unsigned integers
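
Because group identifiers must be unsigned integers, query keys stored as
strings (or any other hashable type) need to be encoded before they are passed
as ``group``. The following is a minimal pure-Python sketch of one way to do
that. ``encode_query_ids`` is a hypothetical helper for illustration and is not
part of AlloyGBM; the sample keys are made up.

.. code-block:: python

    def encode_query_ids(raw_ids):
        """Map arbitrary hashable query keys to dense non-negative integers.

        Hypothetical helper (not an AlloyGBM function). IDs are assigned in
        first-seen order, starting at 0.
        """
        codes = {}      # query key -> integer ID
        encoded = []    # per-row integer IDs, same order as the input rows
        for key in raw_ids:
            # setdefault stores len(codes) only when the key is new, so each
            # distinct key gets the next unused integer.
            encoded.append(codes.setdefault(key, len(codes)))
        return encoded, codes


    # Illustrative (made-up) per-row query keys
    raw_query_ids = ["q-17", "q-17", "q-03", "q-99", "q-03"]
    group, mapping = encode_query_ids(raw_query_ids)

The resulting ``group`` list has one entry per row and could then be passed as
``group=`` in ``fit()``. Rows need not be contiguous per query, since sorting
by group happens internally.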