# X "For You" Feed Algorithm — Analysis of the May 15, 2026 Release

Source: https://github.com/xai-org/x-algorithm (commit pulled 2026-05-15). Cloned locally to `~/repos/x-algorithm`. License Apache 2.0.

This is xAI's first publication of the post-Twitter-acquisition recommendation stack. It supersedes the March-2023 `twitter/the-algorithm` (Scala/Heavy Ranker) and `twitter/the-algorithm-ml` (Python/torchrec) repos, which are now obsolete in all but name — almost nothing from the old stack survives.

---

## TL;DR

- The whole For-You feed now runs through **one Grok-based transformer (Phoenix)** that does both **retrieval (two-tower)** and **ranking (multi-action prediction)**.
- "We have eliminated every single hand-engineered feature and most heuristics from the system." Heavy Ranker, SimClusters, TwHIN, RealGraph, Earlybird-based ranker — all gone.
- The final per-post score is still a **weighted linear combination of predicted action probabilities**, very much like 2023. The model just got bigger and the input features got simpler.
- New auxiliary services: **Grox** (content-understanding pipeline: spam, "banger" detection, multimodal embeddings, PTOS safety classifiers) and a **brand-safety-aware ads blender**.
- In-network recall is still shallow: Thunder retains only ~6 hours of recent posts from accounts you follow; out-of-network candidates come from learned ANN retrieval, not from social-graph following heuristics.

---

## 1. System Architecture

Five Rust/Python components live in the repo (four services plus a shared pipeline crate):

| Service | Language | Role |
|---|---|---|
| `home-mixer/` | Rust | Orchestrator. Hydrates query → fetches candidates → filters → scores → selects top-K → ad blender. Serves gRPC `ScoredPostsService` + `ForYouFeedServer`. |
| `thunder/` | Rust | In-memory store of recent posts per user, fed by Kafka. Sub-ms lookup for in-network candidates. |
| `phoenix/` | Python/JAX | The ML brain: two-tower retrieval + transformer ranker. Includes a ~3 GB pre-trained mini model. |
| `grox/` | Python | Async content-understanding worker pool (classifiers, embedders, safety, ASR, summarisers). |
| `candidate-pipeline/` | Rust | Generic pipeline crate: `Source`/`Hydrator`/`Filter`/`Scorer`/`Selector`/`SideEffect` traits. |

Request flow (Home Mixer):

```
Query Hydrators → Candidate Sources → Candidate Hydrators → Pre-scoring Filters
   → Scorers (Phoenix → Weighted → AuthorDiversity → OON) → Top-K Selector
   → Post-selection Filters → Ads Blender → FeedItem[]
```

Sources (`home-mixer/sources/`):
- `thunder_source.rs` — in-network posts from accounts you follow
- `phoenix_source.rs` — generic Phoenix retrieval (out-of-network)
- `phoenix_moe_source.rs` — Mixture-of-Experts retrieval variant
- `phoenix_topics_source.rs` — topic-anchored retrieval (new-user / followed topics)
- `tweet_mixer_source.rs` — legacy tweet mixer
- `ads_source.rs`, `who_to_follow_source.rs`, `prompts_source.rs`, `push_to_home_source.rs`, `cached_posts_source.rs`, `scored_posts_source.rs`

---

## 2. Scoring & Ranking — How a Post Becomes a Number

### 2.1 The Weighted Scorer (the part everyone cares about)

`home-mixer/scorers/weighted_scorer.rs:44-70` — the final relevance score is:

```
Final = Σᵢ weightᵢ × P(actionᵢ)
```

with these action probabilities coming from the Phoenix transformer:

| Action (positive) | Weight constant |
|---|---|
| Favorite (like) | `FAVORITE_WEIGHT` |
| Reply | `REPLY_WEIGHT` |
| Retweet (repost) | `RETWEET_WEIGHT` |
| Quote | `QUOTE_WEIGHT` |
| Quoted-tweet click | `QUOTED_CLICK_WEIGHT` |
| Click (open post) | `CLICK_WEIGHT` |
| Profile click | `PROFILE_CLICK_WEIGHT` |
| Photo expand | `PHOTO_EXPAND_WEIGHT` |
| Share | `SHARE_WEIGHT` |
| Share via DM | `SHARE_VIA_DM_WEIGHT` |
| Share via copy link | `SHARE_VIA_COPY_LINK_WEIGHT` |
| Video Quality View | `VQV_WEIGHT` *(only counted if `video_duration_ms > MIN_VIDEO_DURATION_MS`)* |
| Continuous dwell time | `CONT_DWELL_TIME_WEIGHT` |
| Dwell (bucketed) | `DWELL_WEIGHT` |
| Follow author | `FOLLOW_AUTHOR_WEIGHT` |

| Action (negative) | Weight constant |
|---|---|
| Not interested | `NOT_INTERESTED_WEIGHT` |
| Block author | `BLOCK_AUTHOR_WEIGHT` |
| Mute author | `MUTE_AUTHOR_WEIGHT` |
| Report | `REPORT_WEIGHT` |

**The actual numeric values are not in the repo** — `use crate::params as p;` references a `params` module that isn't published. (Same pattern as the 2023 release, where the numbers landed elsewhere and shifted over time via "feature switches". In production these are runtime-tunable.) What *is* fixed by the code:

- **15 positive engagement heads, 4 negative ones.** That's roughly twice as many action heads as the 2023 model, which mostly cared about like, retweet, reply, profile-click, video-50, negative-feedback, report.
- **Video views only count if the video is long enough** (`vqv_weight_eligibility`). Short clips can't gain from the VQV signal — a likely guard against algorithmic incentive to attach a stub video to non-video content.
- **Two dwell signals at once** — a bucketed `dwell_score` and a `dwell_time` continuous prediction. The model both classifies "will dwell" and regresses dwell milliseconds.
- **Share is broken into three sub-actions** — generic share, DM share, copy-link share. Letting weights diverge means xAI can boost "shared to a friend in DM" without also boosting "share button tapped".
- **`offset_score` post-processing** clips negatives and rescales:
  - If sum-of-all-weights is zero, just `max(combined, 0)` (drop negative-only scores).
  - Otherwise: positives get a constant offset; negatives get rescaled by `(combined + NEGATIVE_WEIGHTS_SUM) / WEIGHTS_SUM * NEGATIVE_SCORES_OFFSET` so that even the worst post stays above zero by construction, giving a stable score floor instead of huge negative outliers.
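A minimal sketch of this scoring shape. The weight values, the set of actions, and the exact normaliser in `offset_score` are illustrative guesses — the real constants live in the unpublished `crate::params` module:

```python
# Hypothetical weights -- the real values are NOT in the repo.
POSITIVE_WEIGHTS = {"favorite": 1.0, "reply": 2.0, "share_via_dm": 3.0}
NEGATIVE_WEIGHTS = {"report": -5.0, "block_author": -4.0}

def weighted_score(probs: dict[str, float]) -> float:
    """Final = sum_i weight_i * P(action_i), over all action heads."""
    weights = {**POSITIVE_WEIGHTS, **NEGATIVE_WEIGHTS}
    return sum(w * probs.get(action, 0.0) for action, w in weights.items())

# Derived constants, interpreted from the writeup above (a guess):
NEGATIVE_WEIGHTS_SUM = -sum(NEGATIVE_WEIGHTS.values())              # 9.0
WEIGHTS_SUM = sum(POSITIVE_WEIGHTS.values()) + NEGATIVE_WEIGHTS_SUM  # 15.0
NEGATIVE_SCORES_OFFSET = 1.0  # hypothetical constant

def offset_score(combined: float) -> float:
    """Post-processing: clip/rescale so even the worst post stays >= 0."""
    if WEIGHTS_SUM == 0:
        return max(combined, 0.0)          # drop negative-only scores
    if combined >= 0:
        return combined + NEGATIVE_SCORES_OFFSET  # positives: constant offset
    # negatives: rescaled; combined >= -NEGATIVE_WEIGHTS_SUM by construction
    return (combined + NEGATIVE_WEIGHTS_SUM) / WEIGHTS_SUM * NEGATIVE_SCORES_OFFSET
```

Because the minimum possible `combined` is minus the sum of negative-weight magnitudes, the rescaled branch is non-negative by construction.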

### 2.2 Author Diversity Scorer

`scorers/author_diversity_scorer.rs`. After the weighted score, candidates are re-sorted globally, then walked top-to-bottom; the *n*-th post by the same author gets multiplied by:

```
multiplier(n) = (1 − floor) × decay^n + floor
```

Geometric decay with a floor — the 0th post from author A keeps its full score, the 1st is decayed by `decay`, and the multiplier asymptotes at `floor`. Net effect: a single author can dominate at most a few slots in the feed, no matter how relevant their posts are.
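A sketch of the decay walk, with illustrative `decay`/`floor` values (the real constants are unpublished):

```python
from collections import defaultdict

def apply_author_diversity(candidates, decay=0.5, floor=0.1):
    """candidates: list of (author_id, score).
    Re-sorts globally by score, then walks top-to-bottom; the n-th post
    seen from an author is multiplied by (1 - floor) * decay**n + floor."""
    seen = defaultdict(int)
    out = []
    for author, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        n = seen[author]
        multiplier = (1 - floor) * decay**n + floor
        out.append((author, score * multiplier))
        seen[author] += 1
    return out
```

With `decay=0.5, floor=0.1`, an author's second post keeps 55% of its score, the third 32.5%, converging toward the 10% floor.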

### 2.3 OON (Out-of-Network) Scorer

`scorers/oon_scorer.rs`. Final pass: if `in_network == false`, multiply by `OON_WEIGHT_FACTOR`. This is the single explicit "follow graph beats ML retrieval" knob. The fact that it's a multiplicative penalty rather than additive means well-ranked OON posts can still beat in-network noise, but the bar is uniformly higher.
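The knob reduces to a one-line multiplier (the factor value here is illustrative):

```python
OON_WEIGHT_FACTOR = 0.75  # hypothetical value; the real constant is unpublished

def apply_oon_penalty(score: float, in_network: bool) -> float:
    """Multiplicative penalty for out-of-network candidates."""
    return score if in_network else score * OON_WEIGHT_FACTOR
```

Because the penalty is multiplicative, an out-of-network post scored 2.0 (penalised to 1.5) still beats an in-network post scored 1.0.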

### 2.4 What the model does *not* directly score on

There is **no published code path** for:
- Author "creator score" or follower count → not directly in the weighting; the model has to learn it from history.
- Verification/Subscription → only used as a *filter* signal (`subscription_hydrator`, `ineligible_subscription_filter`); doesn't appear in `weighted_scorer.rs`. Whether Phoenix's hash embeddings encode subscription tier is opaque.
- Anything Elon-specific. Compared to the 2023 leak where a special `author_is_elon` boolean was logged for telemetry, this release has no such hook. (Doesn't mean none exists in the closed weights, but the open code is clean.)
- Recency. Post age is fed to the model as a bucketed embedding (`post_age_embedding_table`, 60-min buckets, max 4800 minutes ≈ 80 hours), but there is no hand-tuned recency decay layered on top.

---

## 3. Phoenix — the ML Stack

### 3.1 Retrieval: Two-Tower

- User tower runs the same transformer as the ranker.
- Candidate tower precomputes a normalised embedding per post in the global corpus.
- Top-K via dot-product / approximate nearest-neighbour.
- Sample artifact: a 537K-post "sports corpus" with pre-computed candidate reps (~1.4 GB embedding tables + ~3 MB transformer weights).

### 3.2 Ranking: Transformer with Candidate Isolation

`phoenix/recsys_model.py` plus the Grok-1 transformer from `phoenix/grok.py`.

The transformer eats a single sequence: **[user_token, history_1, …, history_S, candidate_1, …, candidate_C]**, but with a custom **attention mask**:

- User & history positions: full bidirectional attention.
- Candidate positions: can attend to **user + history + their own self only** — they cannot see each other.

This is the single most important architectural decision and is restated repeatedly in the README:

> "Candidates cannot attend to each other—only to the user context. This ensures the score for a post doesn't depend on which other posts are in the batch, making scores consistent and cacheable."

In other words, the same post must score identically regardless of which other candidates were bundled with it. That kills batching artefacts and lets the serving layer cache scores or compute them in parallel shards.
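The mask can be sketched in NumPy. The token layout and the choice to keep context positions from attending to candidates (necessary for the user representation to be candidate-independent) are inferences from the README, not the exact `recsys_model.py` code:

```python
import numpy as np

def candidate_isolation_mask(num_context: int, num_candidates: int) -> np.ndarray:
    """Boolean [T, T] mask; mask[i, j] = True means position i may attend
    to position j. Layout: [user + history (num_context) | candidates]."""
    T = num_context + num_candidates
    mask = np.zeros((T, T), dtype=bool)
    # User + history: full bidirectional attention among themselves.
    mask[:num_context, :num_context] = True
    # Candidates attend to the full user context...
    mask[num_context:, :num_context] = True
    # ...and to themselves only -- never to each other.
    mask[num_context:, num_context:] = np.eye(num_candidates, dtype=bool)
    return mask
```

Row i of the mask is position i's receptive field: every candidate row is identical except for its own diagonal entry, which is what makes per-candidate scores batch-independent.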

### 3.3 Inputs (per timestep)

Each history and candidate row carries:

- Post hash (multiple — `num_item_hashes`, default 2)
- Author hash (multiple — `num_author_hashes`, default 2)
- Product surface ID (vocab 16 — "where on X did this happen": For-You, profile, search, etc.)
- For history only: a learned **multi-hot action vector** (which engagements the user took on this past post; signed: 2x − 1, so non-actions count as -1)
- Post age bucket (60-min granularity, up to 80h)
- Continuous values for history (e.g. dwell time normalised to 0-1)

User row carries:
- User hash(es)
- Optional **IP hash(es)** — `use_ip_address` flag. Disabled by default in the mini config but the plumbing is there; potentially a coarse geo signal.
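The signed multi-hot action encoding (2x − 1) from the history rows above can be sketched as:

```python
import numpy as np

NUM_ACTIONS = 19  # action-type count from the mini config

def encode_actions(taken: set[int]) -> np.ndarray:
    """Signed multi-hot: actions the user took map to +1, non-actions
    to -1 (2x - 1), so absence of an engagement is itself a signal."""
    multi_hot = np.zeros(NUM_ACTIONS)
    multi_hot[list(taken)] = 1.0
    return 2.0 * multi_hot - 1.0
```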

### 3.4 Outputs

Two heads:
- `logits` of shape `[B, num_candidates, num_actions]` → sigmoid → discrete action probabilities (19 action types in the mini config; the weighted scorer consumes 15 positive and 4 negative).
- `continuous_preds` of shape `[B, num_candidates, num_continuous_actions]` → sigmoid → continuous values (dwell time, etc.).

Mini-config in the published checkpoint:
- emb_dim 128, 4 layers, 4 heads, key size 32, widening 2
- 1 M-row hash tables per user/item/author
- 127 history tokens, 64 candidates per call
- 19 action types, 8 continuous actions

The README explicitly says production is bigger (more layers, wider embeddings) and trained continuously; the open release is a frozen snapshot.

### 3.5 Hash-based embeddings (no learned ID embeddings)

Both user and post IDs are run through *multiple hash functions* and the resulting buckets are looked up in a fixed-size embedding table. Collisions are tolerated; multiple hashes give a Bloom-filter-ish disambiguation. This is how they keep a single embedding table at fixed size (1 M rows in mini, presumably much larger in prod) across billions of users/posts.
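A toy version of the scheme — table size shrunk from the mini config's 1 M rows × 128 dims, and the salted-hash bucketing is an assumption about the mechanism, not the repo's exact code:

```python
import numpy as np

TABLE_ROWS, EMB_DIM, NUM_HASHES = 65_536, 32, 2  # mini config: 1M x 128, 2 hashes
rng = np.random.default_rng(0)
table = rng.normal(size=(TABLE_ROWS, EMB_DIM))   # single fixed-size table

def embed_id(entity_id: int) -> np.ndarray:
    """Hash the ID with several salts into the shared table and sum
    the bucket embeddings. Collisions are tolerated: two IDs only get
    identical representations if they collide under *every* salt."""
    buckets = [hash((salt, entity_id)) % TABLE_ROWS for salt in range(NUM_HASHES)]
    return table[buckets].sum(axis=0)
```

This is why the table can stay at a fixed size across billions of entities: memory is decoupled from the ID space, and the multiple hashes give the Bloom-filter-ish disambiguation described above.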

---

## 4. Hydrators — What X Actually Knows About You at Score Time

`home-mixer/query_hydrators/` — pulled per request:

- `user_action_seq_query_hydrator` — your engagement history (the model's main input).
- `user_features_query_hydrator` — preferences, languages, muted keywords.
- `followed_user_ids`, `followed_starter_packs`, `followed_grok_topics`, `inferred_grok_topics`, `subscribed_user_ids`, `muted_user_ids`, `blocked_user_ids`
- `served_history` — what was already shown
- `impressed_posts` + `impression_bloom_filter` — to avoid re-serving
- `past_request_timestamps` — paging / freshness
- `ip_query_hydrator` — IP address (can flow into the model if `use_ip_address`)
- `user_demographics`, `user_inferred_gender`
- `mutual_follow_query_hydrator` — your MinHash signature of who you follow
- `retrieval_sequence`, `scoring_sequence`, `cached_posts`

`home-mixer/candidate_hydrators/` — per-candidate enrichment:

- `core_data_candidate_hydrator` — text/metadata
- `gizmoduck_hydrator` — author user object (verification, follower counts)
- `engagement_counts_hydrator`, `has_media_hydrator`, `video_duration_candidate_hydrator`, `language_code_hydrator`
- `quote_hydrator`, `in_network_candidate_hydrator`, `following_replied_users_hydrator`
- `mutual_follow_jaccard_hydrator` — MinHash Jaccard similarity between viewer's and candidate-author's follow graphs (the "people you might know are talking about this" signal)
- `blocked_by_hydrator`
- `subscription_hydrator`
- `vf_candidate_hydrator`, `ads_brand_safety_hydrator`, `ads_brand_safety_vf_hydrator`
- `filtered_topics_hydrator`, `tweet_type_metrics_hydrator`

Notable: the **mutual-follow Jaccard via MinHash** is a holdover-style signal — but it's used as a hydrated feature flowing into the model, not as a hand-applied multiplier in `weighted_scorer.rs`. Whether the transformer actually uses it depends on whether prod includes it in the input projection.
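The MinHash idea in miniature — signature length and hashing here are illustrative, not the repo's scheme:

```python
def minhash_signature(follow_set: set[int], num_perm: int = 128) -> list[int]:
    """One minimum per salted hash; each salt approximates a random
    permutation of the universe of user IDs."""
    return [min(hash((salt, uid)) for uid in follow_set)
            for salt in range(num_perm)]

def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching minima estimates |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

The hydrator only has to ship a fixed-size signature per user rather than the full follow list, which is what makes a per-candidate follow-graph-overlap feature cheap at score time.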

---

## 5. Filters

### 5.1 Pre-scoring (`home-mixer/filters/`)

| Filter | Drops |
|---|---|
| `drop_duplicates_filter` | Duplicate post IDs |
| `core_data_hydration_filter` | Failed-to-hydrate candidates |
| `age_filter` | Stale posts (config-driven threshold) |
| `self_tweet_filter` | Your own posts |
| `retweet_deduplication_filter` | Multiple reposts of the same source |
| `ineligible_subscription_filter` | Paywalled content you can't access |
| `previously_seen_posts_filter` + `_backup_filter` | Posts you've already seen (with a backup if the bloom filter fails) |
| `previously_served_posts_filter` | Posts served in the current session |
| `muted_keyword_filter` | Tokenised match against your muted keywords (real tokenizer, not substring — won't trip on "scunthorpe") |
| `author_socialgraph_filter` | Blocked / muted authors |
| `topic_ids_filter` + `new_user_topic_ids_filter` | Topic-based filtering |
| `video_filter` | Video-eligibility checks |

### 5.2 Post-selection (`vf_filter`, `ancillary_vf_filter`, `dedup_conversation_filter`)

`vf_filter` is the final visibility-filtering pass that drops anything where the visibility-filtering subsystem returned `Action::Drop(_)` — deleted, spam, violence/gore, etc. `dedup_conversation_filter` collapses multiple branches of the same reply thread.

---

## 6. Grox — Content Understanding (entirely new vs 2023)

`grox/` is an async task-execution engine that runs classifiers and embedders over freshly-published posts.

### Plans run by `PlanMaster`:
- `PlanInitialBanger` — the "banger screen": does the post look like it has viral potential (topic match + likely engagement). Caches Grok topic catalogue with 1h TTL.
- `PlanPostSafety` — initial safety screening (rules + classifier).
- `PlanSafetyPtos` — Platform Terms Of Service violations, including a "deluxe" recheck path that explicitly re-runs the adult-content classifier even when no adult-content violation was originally flagged.
- `PlanSpamComment` — uses `SpamEapiLowFollowerClassifier`, with follower-count bucketing (≤100, ≤500, ≤1000, >1000) on both the immediate reply target and the root author. Implicit assumption: low-follower replies are spammier; bucket explicitly recorded as a metric.
- `PlanPostEmbeddingV5` / `…WithSummary` (with `_for_reply` variants) — multimodal post embeddings produced via a summariser pipeline (Grok-generated text summary of multimodal content feeds into the embedder). Two embedder versions live side-by-side (`multimodal_post_embedder_v2.py`, `multimodal_post_embedder_v5.py`).
- `PlanReplyRanking` — ranks replies under a post.

### Side note — `task_asr.py`
Automatic speech recognition over video/audio content. The transcripts feed into the multimodal embedder, so videos can be ranked on what was *said* in them, not just frame embeddings.

### Side note — `disable_rules.py`
Centralised place to switch off individual tasks. Suggests rapid-iteration safety controls.

---

## 7. Ads Blender (new module)

`home-mixer/ads/` is the only place ads enter the feed.

### Two blenders, selected via `AdsBlenderType` param:

- `SafeGapAdsBlender` — looks at scored posts, finds **"safe gaps"** (positions where neither the post above nor below has `BrandSafetyVerdict::MediumRisk`), then assigns ads to those gaps according to requested/min spacing. Won't insert an ad sandwiched between two risky posts.
- `PartitionOrganicAdsBlender` — alternative blender (not examined in detail here; partitions the organic posts before inserting ads).

### Ad-side controls (`util.rs`):
- `should_drop_bsr_low` — if an ad has Brand-Safety-Risk-Low / IAS-Low and the adjacent post has `LowRisk` verdict, drop the ad.
- `should_drop_handle` — advertisers can specify @handles their ad must not run next to. Tokenised match against author IDs above/below.
- `should_drop_keyword` — advertisers can specify keywords their ad must not run next to. Uses the same tokeniser as muted-keyword filtering, so this is a proper tokenised match (no substring false positives).
- `MIN_POSTS_FOR_ADS = 5` — no ads if the candidate set is tiny.
- `DEFAULT_SPACING = AdSpacing { requested: 3, min: 2 }` — at most one ad every ~3 organic posts, never closer than 2.
- The last slot in the feed is never an ad (`items.pop()` if the trailing item is an Ad).
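The gap-finding idea can be sketched as follows. Only the "neither neighbour is `MediumRisk`" rule comes from the code; the walk, the string verdicts (the real code uses a `BrandSafetyVerdict` enum), and the spacing logic are assumptions:

```python
def find_safe_gaps(verdicts: list[str], min_spacing: int = 2) -> list[int]:
    """verdicts[i] is the brand-safety verdict of organic post i.
    Returns insertion indices (an ad goes *between* posts i-1 and i)
    that are brand-safe and respect a minimum spacing between ads."""
    gaps, last = [], -min_spacing - 1  # sentinel so the first gap is eligible
    for i in range(1, len(verdicts)):
        above, below = verdicts[i - 1], verdicts[i]
        safe = above != "MediumRisk" and below != "MediumRisk"
        if safe and i - last > min_spacing:
            gaps.append(i)
            last = i
    return gaps
```

A single `MediumRisk` post poisons both adjacent slots, so risky content costs two potential ad positions, not one.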

### Telemetry
Brand-safety verdicts and risk levels are emitted as Prometheus-style counters (`AdsBlender.post_brand_safety_verdict`, `AdsBlender.ad_brand_safety_risk`). Useful for spotting drift in the safety classifier upstream.

---

## 8. Selector — the BlenderSelector

`home-mixer/selectors/blender_selector.rs` partitions FeedItems into `(posts, ads, wtf_modules, prompts, push_to_home)`, runs the chosen ads blender, then injects:
- **Who-To-Follow** module at a configured `WHO_TO_FOLLOW_POSITION` (params)
- **Prompts** at `PROMPTS_POSITION`
- **Push-to-Home** posts (sticky promoted posts) somewhere in the partition logic.

Final output is a fully positioned `Vec<FeedItem>`.

---

## 9. What Changed vs the 2023 Twitter Release

This is largely a *replacement*, not an evolution. Mapping old → new:

| 2023 Twitter | 2026 xAI |
|---|---|
| Heavy Ranker (parallel MLP, hundreds of engineered features) | Single Grok transformer, only hash embeddings + actions + product surface |
| `SimClusters` (sparse community membership) | Gone. Two-tower retrieval replaces it. |
| `TwHIN` graph embeddings | Gone. Hash embeddings learned end-to-end inside Phoenix. |
| `RealGraph` (predicted engagement between users) | Gone explicitly. The transformer ingests engagement *history* directly. |
| Earlybird search-based ranker | Gone. Phoenix retrieval covers the OON corpus. |
| Trust & Safety: heuristic rules + a few classifiers | `grox/` content-understanding pipeline with PTOS, banger, spam, ASR, multimodal embedders. Much heavier ML investment. |
| Author-author social-graph features (mutuals, who-follows-whom) | Reduced to one MinHash-Jaccard hydrator; the transformer is expected to learn the rest from history. |
| Weighted score: ~10 action heads, weights known to include strongly-negative `Reported` weight | 19 action heads (15 positive + 4 negative); weights still external; same linear-combination structure. |
| In-network = Tweetypie + GraphJet | In-network = Thunder (Kafka-fed in-memory store, sub-ms reads) |
| Server: Scala/Finagle | Server: Rust/tonic (gRPC), Python/JAX for ML |
| Heuristics: author diversity, OON penalty, recency boost | Author diversity scorer (geometric decay), OON multiplicative penalty kept; recency moved into the model as a learned post-age embedding |
| Hand-coded ad-injection rules | Brand-safety-aware ad blender with explicit risk gap-finding and advertiser-side keyword/handle exclusions |

The most consequential philosophical shift is **"no hand-engineered features"**. The 2023 stack lived and died on feature engineering across dozens of services; this one collapses almost all of it into a single model fed by hashed IDs and an action sequence. That's a much more "GPT-shaped" architecture for recommendations — and it means the model's behaviour is much harder to inspect or audit from outside than the 2023 stack was.

---

## 10. Interesting Tidbits / Red Flags

- **The actual weights are still closed.** Every `…_WEIGHT` constant is referenced but not published. The structural code is honest about what the system *can* do with weights, but you can't tell whether reports really tank a post or merely nudge it. (In 2023, the leaked weights were extreme: a reported-content weight was something like -369 vs +13.5 for a reply.)
- **The model can ingest your IP** (`use_ip_address` flag, `num_ip_hashes`). Disabled in the released mini config. Whether prod uses it is unclear, but the input projection is wired for it.
- **The "deluxe" PTOS recheck** in `task_safety_ptos_policy.py` explicitly re-injects an `AdultContent` policy check even when no violation was flagged — defence-in-depth against the classifier missing adult content. Suggests this is a known failure mode of the upstream classifier.
- **Banger detection is its own thing.** A classifier specifically trained to detect "this looks viral" runs at ingestion. Topic catalogue cached with a 1h TTL. There is no public list of what counts as a "banger" topic in the open code, but the cache is keyed by topic, so trending-topic boosting is at least possible.
- **Multimodal embeddings are summary-first.** `PostEmbeddingWithSummary` runs the post through a summariser (likely Grok) and embeds the summary plus the raw multimodal content. So for video posts, what Grok *thinks the video is about* becomes part of the retrieval signal — interesting failure mode if Grok mis-summarises.
- **Spam detection is follower-bucketed** at 100 / 500 / 1000 thresholds for both root author and immediate reply parent. New accounts replying to other new accounts get more spam scrutiny than established-to-established replies.
- **Author diversity uses geometric decay, not a hard cap.** Theoretically your favourite author can occupy positions 1, 2, 3… they just decay quickly. Combined with `floor`, very high-quality content from one author can still beat low-quality from many.
- **The model is trained continuously in production**, frozen for release. That means the open checkpoint will get stale; behaviour observers should not over-fit to it.
- **Mutual-Follow Jaccard via MinHash** is the only piece of the old social-graph machinery that visibly survived as a per-candidate feature.
- **`mask_neg_feedback_on_negatives: bool = True`** in `PhoenixModelConfig` — there's a training-time choice to mask negative-feedback labels when the candidate is itself a negative example. Suggests they care about not double-penalising negatives during training.

---

## 11. What's *Not* in the Repo

- The numeric weights of the weighted scorer (`crate::params`).
- The list of `feature switches` / runtime config (`A/B test framework`).
- The production-size model weights (only a mini 128-dim / 4-layer is published).
- The training pipeline (only inference is exposed; `run_pipeline.py` loads a frozen checkpoint).
- The full corpus index — only a 6-hour sports-topic slice (`sports_corpus.npz`) is published.
- The exact Botmaker rule catalogue used by `ads_brand_safety_hydrator`.
- Anything about creator monetisation tiers and how (if at all) they influence ranking. The `subscription_hydrator` only filters paywalled content; whether subscribed-author posts get any score lift is invisible in this release.

---

## 12. How to Run It Locally

```bash
cd ~/repos/x-algorithm/phoenix
unzip artifacts/oss-phoenix-artifacts.zip -d artifacts/
uv sync
uv run run_pipeline.py --artifacts_dir artifacts/oss-phoenix-artifacts
```

That runs **retrieval → ranking** against the 537K-post sports corpus with an example user history of (NFL, NBA, NHL likes/dwells) and prints ranked candidates with per-action engagement probabilities.

To experiment with what the model thinks of *you*: edit `phoenix/artifacts/oss-phoenix-artifacts/example_sequence.json` to seed a different action history (`post_id`, `author_id`, `actions` map keyed by action enum). Action enum from the README: `1=favorite, 4=reply, 5=quote, 6=repost, 11=dwell, 13=video_quality_view`.

---

## Sources / Refs

- `xai-org/x-algorithm` (this analysis) — README.md, `home-mixer/scorers/weighted_scorer.rs`, `phoenix/recsys_model.py`, `phoenix/README.md`, `grox/plans/plan_master.py`, `home-mixer/ads/util.rs`.
- `twitter/the-algorithm` (March 2023, historical) — for the comparison.
- xAI Grok-1 release — the transformer used in Phoenix is ported from Grok-1.
