GBMRanker

GBMRanker is the learning-to-rank estimator in AlloyGBM.

Overview

GBMRanker extends GBMRegressor with ranking-specific objectives. All ranking objectives require per-row query group identifiers, passed to fit() through the group argument. Rows are sorted by group internally.

Quick example

from alloygbm import GBMRanker, ndcg

model = GBMRanker(
    ranking_objective="rank:ndcg",
    learning_rate=0.05,
    max_depth=6,
    n_estimators=300,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, group=query_ids_train)  # one query ID per training row

scores = model.predict(X_test)  # raw scores; higher = more relevant
print("NDCG@10:", ndcg(y_test, scores, group=query_ids_test, k=10))

Ranking objectives

"rank:pairwise" – Pairwise logistic loss (RankNet)
"rank:ndcg" – LambdaMART with NDCG weighting (default)
"rank:xendcg" – Cross-entropy approximation to NDCG
"queryrmse" – Query-grouped RMSE
"yetirank" – YetiRank (stochastic NDCG-weighted pairwise)
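To make the difference between "rank:pairwise" and "rank:ndcg" concrete, here is a minimal pure-Python sketch of the textbook formulations for a single query. It is not AlloyGBM's Rust implementation. The helper names (pairwise_logistic_loss, swap_delta_ndcg, lambda_weighted_loss) are hypothetical, and the exponential gain 2^rel - 1 and log2 discount are assumptions. RankNet sums a logistic loss over every pair whose labels differ. LambdaMART scales each pair by how much NDCG would change if the two rows swapped places, and its lambda gradients are the derivatives of this weighted loss with the |ΔNDCG| weights held fixed.

```python
import math

def pairwise_logistic_loss(scores, labels):
    """RankNet-style loss for one query over all pairs where labels[i] > labels[j]."""
    loss = 0.0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                # log(1 + exp(-(s_i - s_j))) shrinks as the more relevant row outscores the other
                loss += math.log1p(math.exp(-(scores[i] - scores[j])))
    return loss

def swap_delta_ndcg(scores, labels, i, j):
    """|Change in NDCG| if rows i and j swapped positions in the current score ordering."""
    order = sorted(range(len(scores)), key=lambda r: scores[r], reverse=True)
    position = {row: pos for pos, row in enumerate(order)}
    gain = lambda rel: 2 ** rel - 1                 # assumed exponential gain
    discount = lambda pos: 1.0 / math.log2(pos + 2)  # pos is 0-based
    ideal = sum(gain(rel) * discount(pos)
                for pos, rel in enumerate(sorted(labels, reverse=True)))
    if ideal == 0:
        return 0.0  # query with no relevant rows: swaps cannot change NDCG
    change = (gain(labels[i]) - gain(labels[j])) * (discount(position[i]) - discount(position[j]))
    return abs(change) / ideal

def lambda_weighted_loss(scores, labels):
    """LambdaMART-style objective: each pair's logistic loss is scaled by its |delta NDCG|."""
    loss = 0.0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                weight = swap_delta_ndcg(scores, labels, i, j)
                loss += weight * math.log1p(math.exp(-(scores[i] - scores[j])))
    return loss
```

The weighting is what makes LambdaMART focus on the top of the list. A mis-ordered pair near rank 1 gets a much larger weight than the same mistake near the bottom, whereas plain pairwise loss treats both equally.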

As of v0.12.8, GBMRanker also accepts, through ranking_objective, the regression objectives inherited from GBMRegressor: "poisson", "gamma" and "tweedie" (log-link GLMs, for which predict() returns exp(raw)), and "quantile" (pinball loss, with quantile_alpha ∈ (0.0, 1.0)).
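The link function and loss mentioned above follow standard definitions. The sketch below shows them in plain Python, assuming quantile_alpha plays the role of alpha. The helper names inverse_log_link and pinball_loss are hypothetical and are not AlloyGBM functions.

```python
import math

def inverse_log_link(raw_score):
    # Log-link GLMs (poisson, gamma, tweedie) model log(mean); predictions map back with exp.
    return math.exp(raw_score)

def pinball_loss(y_true, y_pred, alpha):
    """Quantile (pinball) loss averaged over rows; alpha must lie strictly in (0, 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0.0, 1.0)")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs alpha per unit; over-prediction costs (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(y_true)
```

With alpha = 0.9, under-predicting by 2 costs 1.8 while over-predicting by 2 costs only 0.2. The minimizer is therefore pushed toward the 90th percentile rather than the mean.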

Parameters

ranking_objective: str = "rank:ndcg" – the ranking loss function

All other parameters are inherited from GBMRegressor (see GBMRegressor for details), including:
leaf_solver="dro" for robust scalar leaves
leaf_model="linear" for piecewise-linear leaves
training_mode="morph", together with the MorphBoost and LR-schedule parameters morph_rate, evolution_pressure, morph_warmup_iters, info_score_weight, depth_penalty_base, balance_penalty, lr_schedule and lr_warmup_frac (see MorphBoost (Adaptive Split Criterion))

leaf_model="linear" and training_mode="morph" can be combined.

boosting_mode="goss" (with goss_top_rate / goss_other_rate) and boosting_mode="dart" (with dart_drop_rate / dart_max_drop / dart_normalize_type / dart_sample_type) are both supported with the ranking objectives; see GBMRegressor “Boosting mode” for the full semantics.
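For orientation, the sketch below implements Gradient-based One-Side Sampling as described for LightGBM. It keeps the top_rate fraction of rows with the largest |gradient|, randomly samples an other_rate fraction of all rows from the remainder, and up-weights those samples by (1 - top_rate) / other_rate. Two points are assumptions: that goss_top_rate and goss_other_rate map onto top_rate and other_rate exactly like this, and that sampling is applied per row rather than per query group. The GBMRegressor page remains the authority. goss_sample is a hypothetical helper.

```python
import random

def goss_sample(gradients, top_rate, other_rate, rng):
    """Return (row indices, weights) selected by Gradient-based One-Side Sampling."""
    if not (0.0 < other_rate and top_rate + other_rate <= 1.0):
        raise ValueError("need other_rate > 0 and top_rate + other_rate <= 1")
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]                           # large-gradient rows: always kept
    other = rng.sample(order[n_top:], n_other)    # small-gradient rows: random subset
    # Up-weight sampled small-gradient rows so their total contribution stays roughly unbiased.
    amplify = (1.0 - top_rate) / other_rate
    return top + other, [1.0] * len(top) + [amplify] * len(other)

indices, weights = goss_sample(
    [0.1, -2.0, 0.3, 1.5, -0.05, 0.2, 0.0, 0.4, -0.6, 0.25],
    top_rate=0.2, other_rate=0.1, rng=random.Random(0),
)
```

Here the two rows with the largest |gradient| (indices 1 and 3) are always kept. One more row is drawn at random with weight 8.0, so each tree sees 30% of the data.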

Methods

fit(X, y, *, group, eval_set=None, eval_group=None, ...) – trains the ranker. group is required and provides per-row query identifiers.
predict(X) – returns raw relevance scores (higher = more relevant)
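Because predict() returns one raw score per row, producing a ranked result list is a per-query sort by descending score. For pairwise and listwise losses, only score differences within a query are constrained during training, so comparing raw scores across queries is generally not meaningful. A minimal sketch; rank_within_groups is a hypothetical helper, not part of AlloyGBM.

```python
from collections import defaultdict

def rank_within_groups(scores, group):
    """Return {group_id: [row indices ordered from most to least relevant]}."""
    buckets = defaultdict(list)
    for row, gid in enumerate(group):
        buckets[gid].append(row)
    # Higher score = more relevant, so sort each query's rows by descending score.
    return {gid: sorted(rows, key=lambda r: scores[r], reverse=True)
            for gid, rows in buckets.items()}

ranked = rank_within_groups([0.1, 0.9, 0.5, 0.3, 0.7], group=[0, 0, 0, 1, 1])
```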

Evaluation

from alloygbm import ndcg

score = ndcg(y_test, predictions, group=query_ids_test)
score_at_10 = ndcg(y_test, predictions, group=query_ids_test, k=10)
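As a reference point for reading these numbers, the sketch below computes mean NDCG@k over queries in pure Python. Several conventions here are assumptions and may differ from alloygbm.ndcg: exponential gain 2^rel - 1, the log2 discount, k=None meaning the full list, and skipping queries with no relevant rows. ndcg_at_k and dcg are hypothetical helpers.

```python
import math
from collections import defaultdict

def dcg(relevances):
    # Exponential gain (2^rel - 1) with a log2(rank + 1) discount, rank starting at 1.
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg_at_k(y_true, scores, group, k=None):
    """Mean NDCG@k over queries; k=None scores each query's full list."""
    rows_by_group = defaultdict(list)
    for row, gid in enumerate(group):
        rows_by_group[gid].append(row)
    per_query = []
    for rows in rows_by_group.values():
        ranked = sorted(rows, key=lambda r: scores[r], reverse=True)[:k]
        ideal_dcg = dcg(sorted((y_true[r] for r in rows), reverse=True)[:k])
        if ideal_dcg > 0:  # queries with no relevant rows are skipped (a convention choice)
            per_query.append(dcg([y_true[r] for r in ranked]) / ideal_dcg)
    return sum(per_query) / len(per_query) if per_query else 0.0
```

A perfectly ordered query scores 1.0. Putting an irrelevant row first with k=1 scores 0.0 for that query regardless of what follows.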

Group format

The group parameter accepts per-row group identifiers (e.g. query IDs). AlloyGBM sorts by group internally, so rows do not need to be pre-sorted.

# Per-row group IDs (AlloyGBM format)
group = [0, 0, 0, 1, 1, 2, 2, 2, 2]
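The internal sort described above can be pictured as a stable sort of row indices by group ID, after which each query occupies a contiguous block. The sketch below also converts to and from the "group sizes" layout that some other GBM libraries use, which is handy when porting data. sort_by_group and sizes_to_ids are hypothetical helpers, and AlloyGBM's actual internal ordering is an assumption.

```python
def sort_by_group(group):
    """Stable sort of row indices by group ID, plus the size of each contiguous group."""
    order = sorted(range(len(group)), key=lambda i: group[i])  # sorted() is stable
    sizes = []
    prev = None
    for i in order:
        if group[i] != prev:
            sizes.append(0)
            prev = group[i]
        sizes[-1] += 1
    return order, sizes

def sizes_to_ids(sizes):
    """Expand group sizes (e.g. [3, 2, 4]) into per-row IDs (e.g. [0, 0, 0, 1, 1, 2, 2, 2, 2])."""
    return [gid for gid, size in enumerate(sizes) for _ in range(size)]

order, sizes = sort_by_group([2, 0, 1, 0, 2, 2])  # unsorted per-row IDs
```

Here order is [1, 3, 2, 0, 4, 5] and sizes is [2, 1, 3]. Rows from the same query stay in their original relative order.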

Early stopping

model = GBMRanker(
    ranking_objective="rank:ndcg",
    n_estimators=2000,
    early_stopping_rounds=50,
)
model.fit(
    X_train, y_train,
    group=query_ids_train,
    eval_set=(X_valid, y_valid),
    eval_group=query_ids_valid,
)
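The usual early-stopping rule halts training once the validation metric has gone early_stopping_rounds iterations without improving, and the best iteration is what you would keep. The sketch below applies that rule to a list of per-round validation scores. Which metric AlloyGBM monitors for each ranking objective, whether it is treated as higher-is-better, and whether the best iteration is restored are all assumptions here. early_stopping_iteration is a hypothetical helper.

```python
def early_stopping_iteration(val_scores, patience, higher_is_better=True):
    """Return (best_iteration, stop_iteration) for per-round validation scores."""
    best_iter, best = 0, val_scores[0]
    for it in range(1, len(val_scores)):
        score = val_scores[it]
        improved = score > best if higher_is_better else score < best
        if improved:
            best_iter, best = it, score
        elif it - best_iter >= patience:
            return best_iter, it  # patience exhausted: stop here, keep best_iter
    return best_iter, len(val_scores) - 1  # ran every round without stopping early

best_iter, stop_iter = early_stopping_iteration([0.50, 0.60, 0.65, 0.64, 0.63, 0.62], patience=3)
```

With patience=3 the metric peaks at iteration 2, and training stops at iteration 5 after three rounds without improvement.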

Current scope

5 ranking objectives implemented natively in Rust, plus the 4 inherited regression objectives (poisson, gamma, tweedie, quantile) as of v0.12.8
Single-label per GBMRanker. For multi-output ranking, see MultiLabelGBMRanker (also covered in GBMRegressor). Joint shared-tree multi-label boosting is deferred to v0.10.0 (paired with the K-output shared-histogram primitive).
Group identifiers must be unsigned integers
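If your query keys are strings or other non-integer values, encode them as dense unsigned integers before calling fit(). A minimal sketch using a first-seen mapping; encode_query_ids is a hypothetical helper. numpy.unique(keys, return_inverse=True) gives an equivalent encoding in sorted-key order.

```python
def encode_query_ids(raw_ids):
    """Map arbitrary hashable query keys (e.g. strings) to dense unsigned ints 0, 1, 2, ..."""
    codes = {}
    encoded = []
    for key in raw_ids:
        if key not in codes:
            codes[key] = len(codes)  # assign codes in first-seen order
        encoded.append(codes[key])
    return encoded, codes

group, mapping = encode_query_ids(["q7", "q7", "q2", "q9", "q2"])
```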