Fix source-missing auto-merge and remove Pink Floyd examples from prompts

Auto-merge: when the ingest pipeline detects "source file missing", it now checks whether the track already exists in the library by file_hash. If so, it marks the pending entry as 'merged' instead of 'error', which avoids stale error entries for files that were already successfully ingested in a previous run.

Prompts: replaced the Pink Floyd/The Wall/Have a Cigar examples in both normalize.txt and merge.txt with Deep Purple examples. The LLM was using these famous artist/album/track names as fallback output when raw metadata was empty or ambiguous, causing hallucinated metadata like "artist: Pink Floyd, title: Have a Cigar" for completely unrelated tracks.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@@ -4,14 +4,14 @@ You are a music library artist merge assistant. You will receive a list of artis
 
 You will receive a structured list like:
 
-### Artist ID 42: "pink floyd"
-Album ID 10: "the wall" (1979)
-- 01. "In the Flesh?" [track_id=100]
-- 02. "The Thin Ice" [track_id=101]
+### Artist ID 42: "deep purple"
+Album ID 10: "machine head" (1972)
+- 01. "Highway Star" [track_id=100]
+- 02. "Maybe I'm a Leo" [track_id=101]
 
-### Artist ID 43: "Pink Floyd"
-Album ID 11: "Wish You Were Here" (1975)
-- 01. "Shine On You Crazy Diamond (Parts I-V)" [track_id=200]
+### Artist ID 43: "Deep Purple"
+Album ID 11: "Burn" (1974)
+- 01. "Burn" [track_id=200]
 
 ## Your task
 
@@ -20,7 +20,7 @@ Determine if the artists are duplicates and produce a merge plan.
 ## Rules
 
 ### 1. Canonical artist name
-- Use correct capitalization and canonical spelling (e.g., "pink floyd" → "Pink Floyd", "AC DC" → "AC/DC").
+- Use correct capitalization and canonical spelling (e.g., "deep purple" → "Deep Purple", "AC DC" → "AC/DC").
 - If the database already contains an artist with a well-formed name, prefer that exact form.
 - If one artist has clearly more tracks or albums, their name spelling may be more authoritative.
 - Fix obvious typos or casing errors.
@@ -54,7 +54,7 @@ Determine if the artists are duplicates and produce a merge plan.
 
 You MUST respond with a single JSON object, no markdown fences, no extra text:
 
-{"canonical_artist_name": "...", "winner_artist_id": 42, "album_mappings": [{"source_album_id": 10, "canonical_name": "The Wall", "merge_into_album_id": null}, {"source_album_id": 11, "canonical_name": "Wish You Were Here", "merge_into_album_id": null}], "notes": "..."}
+{"canonical_artist_name": "...", "winner_artist_id": 42, "album_mappings": [{"source_album_id": 10, "canonical_name": "Machine Head", "merge_into_album_id": null}, {"source_album_id": 11, "canonical_name": "Burn", "merge_into_album_id": null}], "notes": "..."}
 
 - `canonical_artist_name`: the single correct name for this artist after merging.
 - `winner_artist_id`: the integer ID of the artist whose record survives (must be one of the IDs provided).
@@ -3,10 +3,10 @@ You are a music metadata normalization assistant. Your job is to take raw metada
 ## Rules
 
 1. **Artist names** must use correct capitalization and canonical spelling. Examples:
-- "pink floyd" → "Pink Floyd"
+- "deep purple" → "Deep Purple"
 - "AC DC" → "AC/DC"
-- "Guns n roses" → "Guns N' Roses"
-- "Led zepplin" → "Led Zeppelin" (fix common misspellings)
+- "guns n roses" → "Guns N' Roses"
+- "led zepplin" → "Led Zeppelin" (fix common misspellings)
 - "саша скул" → "Саша Скул" (fix capitalization, keep the language as-is)
 - If the database already contains a matching artist (same name in any case or transliteration), always use the existing canonical name exactly. For example, if the DB has "Саша Скул" and the file says "саша скул" or "Sasha Skul", use "Саша Скул".
 - **Compound artist fields**: When the artist field or path contains multiple artist names joined by "и", "and", "&", "/", ",", "x", or "vs", you MUST split them. The "artist" field must contain ONLY ONE primary artist. All others go into "featured_artists". If one of the names already exists in the database, prefer that one as the primary artist.
@@ -43,12 +43,12 @@ You are a music metadata normalization assistant. Your job is to take raw metada
 - Preserve original language for non-English albums.
 - If the database already contains a matching album under the same artist, use the existing name exactly.
 - Do not alter the creative content of album names (same principle as track titles).
-- **Remastered editions**: A remastered release is a separate album entity, even if it shares the same title and tracks as the original. If the tags or path indicate a remaster (e.g., "Remastered", "Remaster", "REMASTERED" anywhere in tags, filename, or path), append " (Remastered)" to the album name if not already present, and use the year of the remaster release (not the original). Example: original album "The Wall" (1979) remastered in 2011 → album: "The Wall (Remastered)", year: 2011.
+- **Remastered editions**: A remastered release is a separate album entity, even if it shares the same title and tracks as the original. If the tags or path indicate a remaster (e.g., "Remastered", "Remaster", "REMASTERED" anywhere in tags, filename, or path), append " (Remastered)" to the album name if not already present, and use the year of the remaster release (not the original). Example: original album "Paranoid" (1970) remastered in 2009 → album: "Paranoid (Remastered)", year: 2009.
 
 4. **Track titles** must use correct capitalization, but their content must be preserved exactly.
 - Use title case for English titles.
 - Preserve original language for non-English titles.
-- Remove leading track numbers if present (e.g., "01 - Have a Cigar" → "Have a Cigar").
+- Remove leading track numbers if present (e.g., "01 - Smoke on the Water" → "Smoke on the Water").
 - **NEVER remove, add, or alter words, numbers, suffixes, punctuation marks, or special characters in titles.** Your job is to fix capitalization and encoding, not to edit the creative content. If a title contains unusual punctuation, numbers, apostrophes, or symbols — they are intentional and must be kept as-is.
 - If all tracks in the same album follow a naming pattern (e.g., numbered names like "Part 1", "Part 2"), preserve that pattern consistently. Do not simplify or truncate individual track names.
 
@@ -188,8 +188,20 @@ async fn reprocess_pending(state: &Arc<AppState>) -> anyhow::Result<usize> {
                 }
             }
         } else {
-            tracing::error!(id = %pt.id, "Source file missing: {:?}", source);
-            db::update_pending_status(&state.pool, pt.id, "error", Some("Source file missing")).await?;
+            // Source file is gone — check if already in library by hash
+            let in_library: (bool,) = sqlx::query_as(
+                "SELECT EXISTS(SELECT 1 FROM tracks WHERE file_hash = $1)"
+            )
+            .bind(&pt.file_hash)
+            .fetch_one(&state.pool).await.unwrap_or((false,));
+
+            if in_library.0 {
+                tracing::info!(id = %pt.id, "Source missing but track already in library — merging");
+                db::update_pending_status(&state.pool, pt.id, "merged", None).await?;
+            } else {
+                tracing::error!(id = %pt.id, "Source file missing: {:?}", source);
+                db::update_pending_status(&state.pool, pt.id, "error", Some("Source file missing")).await?;
+            }
             continue;
         };
 
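
The decision added in the last hunk can be looked at in isolation. The following is only a minimal sketch, not the code from this commit: `MissingSourceOutcome` and `resolve_missing_source` are hypothetical names, and an in-memory `HashSet` of hashes stands in for the `SELECT EXISTS(...)` lookup against the `tracks` table. It shows the mapping the hunk appears to implement: a missing source whose hash is already in the library becomes 'merged' with no message, and anything else stays 'error' with "Source file missing".

```rust
use std::collections::HashSet;

/// Hypothetical outcome type for a pending entry whose source file is missing.
#[derive(Debug, PartialEq, Eq)]
enum MissingSourceOutcome {
    /// The track's hash is already in the library, so the entry is a duplicate.
    Merged,
    /// The track is unknown to the library, so the entry is a real error.
    Error(&'static str),
}

impl MissingSourceOutcome {
    /// Status string as passed to `update_pending_status` in the diff.
    fn status(&self) -> &'static str {
        match self {
            MissingSourceOutcome::Merged => "merged",
            MissingSourceOutcome::Error(_) => "error",
        }
    }

    /// Optional error message stored alongside the status.
    fn message(&self) -> Option<&'static str> {
        match self {
            MissingSourceOutcome::Merged => None,
            MissingSourceOutcome::Error(msg) => Some(*msg),
        }
    }
}

/// Assumption: `library_hashes` stands in for the `tracks.file_hash` column.
fn resolve_missing_source(
    file_hash: &str,
    library_hashes: &HashSet<String>,
) -> MissingSourceOutcome {
    if library_hashes.contains(file_hash) {
        MissingSourceOutcome::Merged
    } else {
        MissingSourceOutcome::Error("Source file missing")
    }
}
```

Keeping the decision in a pure function like this would make the branch testable without a database. Note that in the actual hunk a failed query falls back to `(false,)`, so a database error takes the 'error' path as well.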