Working with Parsed Tags
This page demonstrates how to work with the structured data returned by the parseTag function in the ally-bcp-47 library.
Basic Parsing
The first step is to parse a language tag into its components:
import { parseTag } from "ally-bcp-47";
// Parse a simple language tag
const tag = parseTag("en-US");
console.log(tag);
// Output:
// {
//   tag: 'en-US',
//   language: 'en',
//   region: 'US'
// }
Accessing Tag Components
Once you have a parsed tag, you can access its components directly:
import { parseTag } from "ally-bcp-47";
const tag = parseTag("zh-Hans-CN");
// Access individual components
console.log(`Language: ${tag.language}`); // Language: zh
console.log(`Script: ${tag.script}`); // Script: Hans
console.log(`Region: ${tag.region}`); // Region: CN
// Handle optional components safely
const simpleTag = parseTag("fr");
console.log(`Language: ${simpleTag.language}`); // Language: fr
console.log(`Script: ${simpleTag.script || "None"}`); // Script: None
console.log(`Region: ${simpleTag.region || "None"}`); // Region: None
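The fallback pattern above can be wrapped in a small helper. The following is a minimal sketch, not part of the library: describeTag is a hypothetical name, and the code assumes that missing components are undefined (or absent) on the parsed object, as the examples above suggest. It uses ?? rather than ||, so only null and undefined trigger the fallback.

```javascript
// Hypothetical helper (not part of ally-bcp-47): formats the optional
// components of a parsed tag. Assumes missing components are undefined.
function describeTag(parsed) {
  const { language, script, region } = parsed;
  return [
    `Language: ${language}`,
    `Script: ${script ?? "None"}`,
    `Region: ${region ?? "None"}`,
  ].join(", ");
}

// Plain objects shaped like the parseTag results shown above
console.log(
  describeTag({ tag: "zh-Hans-CN", language: "zh", script: "Hans", region: "CN" })
);
// Language: zh, Script: Hans, Region: CN
console.log(describeTag({ tag: "fr", language: "fr" }));
// Language: fr, Script: None, Region: None
```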
Working with Variants
Variant subtags are returned as an array:
import { parseTag } from "ally-bcp-47";
const tag = parseTag("de-DE-1901-1996");
if (tag.variants && tag.variants.length > 0) {
  console.log("Variants:");
  tag.variants.forEach((variant) => {
    console.log(`- ${variant}`);
  });
}
// Output:
// Variants:
// - 1901
// - 1996
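A common follow-up is checking whether a tag carries a particular variant. The sketch below assumes the parsed object exposes variants as an array of strings (or omits the property), as shown above. hasVariant is a hypothetical helper name, and the comparison is case-insensitive because BCP 47 subtags are not case-sensitive.

```javascript
// Hypothetical helper (not part of ally-bcp-47): reports whether a parsed
// tag contains the given variant, ignoring case.
function hasVariant(parsed, variant) {
  const wanted = variant.toLowerCase();
  return (parsed.variants ?? []).some((v) => v.toLowerCase() === wanted);
}

// Plain objects shaped like the parseTag results shown above
const german1996 = { language: "de", region: "DE", variants: ["1901", "1996"] };
console.log(hasVariant(german1996, "1996")); // true
console.log(hasVariant(german1996, "fonipa")); // false
console.log(hasVariant({ language: "fr" }, "1996")); // false
```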
Working with Extensions
Extension subtags are organized by singleton in an object:
import { parseTag } from "ally-bcp-47";
const tag = parseTag("en-US-u-ca-gregory-nu-latn-t-m0-iso-i-klingon");
// Unicode extensions (u)
if (tag.extensions && tag.extensions.u) {
  console.log("Unicode extensions:");
  // Assumes the subtags alternate key, value (true for this tag, but not for every -u- extension)
  for (let i = 0; i < tag.extensions.u.length; i += 2) {
    const key = tag.extensions.u[i];
    const value = tag.extensions.u[i + 1] || "";
    console.log(`- ${key}: ${value}`);
  }
}
// Output:
// Unicode extensions:
// - ca: gregory
// - nu: latn
// Transformed content extensions (t)
if (tag.extensions && tag.extensions.t) {
  console.log("Transformed content extensions:");
  console.log(tag.extensions.t);
}
// Output:
// Transformed content extensions:
// ['m0', 'iso']
// Other extensions
if (tag.extensions && tag.extensions.i) {
  console.log("Other extensions (i):");
  console.log(tag.extensions.i);
}
// Output:
// Other extensions (i):
// ['klingon']
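Stepping through the Unicode extension two subtags at a time works for the tag above, but some keys take a multi-subtag value (such as ca-islamic-civil) or no value at all (such as kn). The sketch below is one way to handle those cases, assuming the u extension arrives as a flat array of subtags as shown above. unicodeExtensionToMap is a hypothetical helper name. It relies on the Unicode locale extension syntax, in which keys are exactly two characters long while attributes and value subtags are three to eight.

```javascript
// Hypothetical helper (not part of ally-bcp-47): turns the flat subtag array
// of a -u- extension into attributes plus a key -> value map.
function unicodeExtensionToMap(subtags) {
  const attributes = [];
  const valuesByKey = {};
  let currentKey = null;

  for (const subtag of subtags) {
    if (subtag.length === 2) {
      // Two-character subtags start a new keyword
      currentKey = subtag.toLowerCase();
      valuesByKey[currentKey] = [];
    } else if (currentKey === null) {
      // Longer subtags before the first key are attributes
      attributes.push(subtag);
    } else {
      // Longer subtags after a key belong to that key's value
      valuesByKey[currentKey].push(subtag);
    }
  }

  const keywords = {};
  for (const [key, parts] of Object.entries(valuesByKey)) {
    keywords[key] = parts.join("-");
  }
  return { attributes, keywords };
}

const parsedU = unicodeExtensionToMap(["ca", "islamic", "civil", "nu", "latn", "kn"]);
console.log(parsedU.keywords);
// { ca: 'islamic-civil', nu: 'latn', kn: '' }
```

A key with an empty value, such as kn here, is conventionally read as "true".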
Working with Private Use Subtags
Private use subtags are returned as an array:
import { parseTag } from "ally-bcp-47";
const tag = parseTag("en-x-private1-private2");
if (tag.privateuse && tag.privateuse.length > 0) {
  console.log("Private use subtags:");
  console.log(tag.privateuse);
}
// Output:
// Private use subtags:
// ['private1', 'private2']
Modifying and Reconstructing Tags
You can create modified versions of language tags by parsing, modifying components, and then reconstructing:
import { parseTag, validateLanguageTag } from "ally-bcp-47";
function changeRegion(tag, newRegion) {
  const parsed = parseTag(tag);
  if (!parsed) return null;
  // Create a modified version
  const parts = [];
  parts.push(parsed.language);
  if (parsed.extlang) parts.push(parsed.extlang);
  if (parsed.script) parts.push(parsed.script);
  parts.push(newRegion);
  if (parsed.variants) parts.push(...parsed.variants);
  // Handle extensions and privateuse if needed
  // ...
  const newTag = parts.join("-");
  // Validate the new tag before returning
  const result = validateLanguageTag(newTag);
  return result.isValid ? newTag : null;
}
// Example usage
const americanEnglish = "en-US";
const britishEnglish = changeRegion(americanEnglish, "GB");
console.log(britishEnglish); // en-GB
// More complex example
const americanChinese = "zh-Hans-US";
const chineseChinese = changeRegion(americanChinese, "CN");
console.log(chineseChinese); // zh-Hans-CN
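The changeRegion example leaves extensions and private use subtags out of the rebuilt tag. The sketch below shows one way to reassemble every component from a plain object shaped like the results on this page. buildTag is a hypothetical helper, and the object shape is an assumption: extlang as a string or an array, extensions as an object mapping each singleton to an array of subtags, and privateuse as an array. Extensions are emitted in alphabetical order of their singleton, which matches the canonical ordering in RFC 5646 but may differ from the order in the original tag.

```javascript
// Hypothetical helper (not part of ally-bcp-47): rebuilds a tag string from a
// parsed-tag-shaped object, including extensions and private use subtags.
function buildTag(parsed) {
  const parts = [parsed.language];
  if (parsed.extlang) {
    // Assumption: extlang may be a single string or an array of strings
    parts.push(...(Array.isArray(parsed.extlang) ? parsed.extlang : [parsed.extlang]));
  }
  if (parsed.script) parts.push(parsed.script);
  if (parsed.region) parts.push(parsed.region);
  if (parsed.variants) parts.push(...parsed.variants);

  // Emit extensions sorted by singleton, each followed by its subtags
  const extensions = parsed.extensions ?? {};
  for (const singleton of Object.keys(extensions).sort()) {
    parts.push(singleton, ...extensions[singleton]);
  }

  // Private use subtags always come last, introduced by "x"
  if (parsed.privateuse && parsed.privateuse.length > 0) {
    parts.push("x", ...parsed.privateuse);
  }
  return parts.join("-");
}

const parsedExample = {
  language: "en",
  region: "US",
  extensions: { u: ["ca", "gregory"] },
  privateuse: ["private1"],
};
// Swap the region without losing the trailing components
console.log(buildTag({ ...parsedExample, region: "GB" }));
// en-GB-u-ca-gregory-x-private1
```

As in changeRegion, it is still worth validating the rebuilt string before using it.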
Filtering and Grouping Language Tags
Parsed tags are useful for filtering and grouping:
import { parseTag } from "ally-bcp-47";
// A collection of language tags
const languageTags = [
  "en-US",
  "en-GB",
  "en-AU",
  "fr-FR",
  "fr-CA",
  "fr-BE",
  "es-ES",
  "es-MX",
  "es-AR",
];
// Group tags by language
const groupedByLanguage = {};
languageTags.forEach((tagString) => {
  const tag = parseTag(tagString);
  if (tag) {
    if (!groupedByLanguage[tag.language]) {
      groupedByLanguage[tag.language] = [];
    }
    groupedByLanguage[tag.language].push(tagString);
  }
});
console.log(groupedByLanguage);
// Output:
// {
//   'en': ['en-US', 'en-GB', 'en-AU'],
//   'fr': ['fr-FR', 'fr-CA', 'fr-BE'],
//   'es': ['es-ES', 'es-MX', 'es-AR']
// }
// Filter for a set of regions (here: US, CA, and MX)
const northAmericanTags = languageTags.filter((tagString) => {
  const tag = parseTag(tagString);
  return (
    tag && (tag.region === "US" || tag.region === "CA" || tag.region === "MX")
  );
});
console.log(northAmericanTags);
// Output: ['en-US', 'fr-CA', 'es-MX']
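The grouping loop above can be generalized to any component. The following sketch works on already-parsed objects and assumes each one carries the original string in its tag property, as in the Basic Parsing output. groupParsedBy and the "(none)" bucket label are illustrative choices, not library features.

```javascript
// Hypothetical helper (not part of ally-bcp-47): groups parsed tags by one of
// their components and returns a Map from component value to tag strings.
function groupParsedBy(parsedTags, component) {
  const groups = new Map();
  for (const parsed of parsedTags) {
    // Tags without the component land in a shared "(none)" bucket
    const key = parsed[component] ?? "(none)";
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(parsed.tag);
  }
  return groups;
}

// Plain objects shaped like the parseTag results shown above
const parsedTags = [
  { tag: "en-US", language: "en", region: "US" },
  { tag: "es-US", language: "es", region: "US" },
  { tag: "fr-CA", language: "fr", region: "CA" },
  { tag: "fr", language: "fr" },
];

for (const [region, tags] of groupParsedBy(parsedTags, "region")) {
  console.log(`${region}: ${tags.join(", ")}`);
}
// US: en-US, es-US
// CA: fr-CA
// (none): fr
```

A Map keeps insertion order and avoids collisions with inherited object properties, which is why it is used here instead of a plain object.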
Language Matching
Parsed tags are essential for implementing language matching algorithms:
import { parseTag } from "ally-bcp-47";
// Function to calculate how closely two language tags match
function calculateTagMatch(userTag, availableTag) {
  const userParsed = parseTag(userTag);
  const availableParsed = parseTag(availableTag);
  if (!userParsed || !availableParsed) return 0;
  let score = 0;
  // Match language (most important)
  if (userParsed.language === availableParsed.language) {
    score += 100;
    // Match region if present
    if (
      userParsed.region &&
      availableParsed.region &&
      userParsed.region === availableParsed.region
    ) {
      score += 50;
    }
    // Match script if present
    if (
      userParsed.script &&
      availableParsed.script &&
      userParsed.script === availableParsed.script
    ) {
      score += 30;
    }
    // Match variants if present
    if (userParsed.variants && availableParsed.variants) {
      // Count matching variants
      const userVariantSet = new Set(userParsed.variants);
      const matchingVariants = availableParsed.variants.filter((v) =>
        userVariantSet.has(v)
      );
      score += matchingVariants.length * 10;
    }
  }
  return score;
}
// Example usage
const userPreference = "zh-Hans-CN";
const availableTags = ["zh-CN", "zh-Hans-CN", "zh-Hant-TW", "zh", "en-US"];
// Sort available tags by match score
const rankedTags = availableTags
  .map((tag) => ({
    tag,
    score: calculateTagMatch(userPreference, tag),
  }))
  .sort((a, b) => b.score - a.score);
console.log("Ranked matches for", userPreference);
rankedTags.forEach(({ tag, score }) => {
  console.log(`${tag}: ${score}`);
});
// Output:
// Ranked matches for zh-Hans-CN
// zh-Hans-CN: 180
// zh-CN: 150
// zh-Hant-TW: 100
// zh: 100
// (tied scores keep their input order because Array.prototype.sort is stable)
// en-US: 0
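When two candidates receive the same score, sorting by score alone leaves their relative order dependent on the order of the input list. The sketch below adds explicit tie-breakers so the ranking is deterministic. rankMatches is a hypothetical helper, and preferring the tag with fewer subtags on a tie is a design assumption (a generic tag is less likely to conflict with the user's script or region), not a rule from the library or from BCP 47.

```javascript
// Hypothetical helper: sorts { tag, score } entries by score, then prefers
// the less specific tag (fewer subtags), then falls back to code-unit order.
function rankMatches(scored) {
  const subtagCount = (tag) => tag.split("-").length;
  return [...scored].sort(
    (a, b) =>
      b.score - a.score ||
      subtagCount(a.tag) - subtagCount(b.tag) ||
      (a.tag < b.tag ? -1 : a.tag > b.tag ? 1 : 0)
  );
}

// Scores as produced by calculateTagMatch for the user preference zh-Hans-CN
const scored = [
  { tag: "zh-CN", score: 150 },
  { tag: "zh-Hans-CN", score: 180 },
  { tag: "zh-Hant-TW", score: 100 },
  { tag: "zh", score: 100 },
  { tag: "en-US", score: 0 },
];
console.log(rankMatches(scored).map(({ tag }) => tag).join(", "));
// zh-Hans-CN, zh-CN, zh, zh-Hant-TW, en-US
```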
Fallback Chains
Create fallback chains from parsed tags:
import { parseTag, validateLanguageTag } from "ally-bcp-47";
// Generate a chain of fallback language tags
function getFallbackChain(originalTag) {
  const result = [];
  const parsed = parseTag(originalTag);
  if (!parsed) return result;
  // Add the original tag
  result.push(originalTag);
  const current = { ...parsed };
  // Remove privateuse
  if (current.privateuse) {
    const copy = { ...current };
    delete copy.privateuse;
    // Reconstruct the tag
    const parts = [copy.language];
    if (copy.extlang) parts.push(copy.extlang);
    if (copy.script) parts.push(copy.script);
    if (copy.region) parts.push(copy.region);
    if (copy.variants) parts.push(...copy.variants);
    // Handle extensions if needed
    // ...
    const newTag = parts.join("-");
    if (validateLanguageTag(newTag).isValid && newTag !== originalTag) {
      result.push(newTag);
    }
  }
  // Remove variants
  if (current.variants && current.variants.length) {
    const copy = { ...current };
    delete copy.variants;
    // Reconstruct
    const parts = [copy.language];
    if (copy.extlang) parts.push(copy.extlang);
    if (copy.script) parts.push(copy.script);
    if (copy.region) parts.push(copy.region);
    const newTag = parts.join("-");
    if (validateLanguageTag(newTag).isValid && !result.includes(newTag)) {
      result.push(newTag);
    }
  }
  // Remove region
  if (current.region) {
    const copy = { ...current };
    delete copy.region;
    // Reconstruct
    const parts = [copy.language];
    if (copy.extlang) parts.push(copy.extlang);
    if (copy.script) parts.push(copy.script);
    const newTag = parts.join("-");
    if (validateLanguageTag(newTag).isValid && !result.includes(newTag)) {
      result.push(newTag);
    }
  }
  // Remove script
  if (current.script) {
    const copy = { ...current };
    delete copy.script;
    // Reconstruct
    const parts = [copy.language];
    if (copy.extlang) parts.push(copy.extlang);
    const newTag = parts.join("-");
    if (validateLanguageTag(newTag).isValid && !result.includes(newTag)) {
      result.push(newTag);
    }
  }
  // Remove extlang
  if (current.extlang) {
    const copy = { ...current };
    delete copy.extlang;
    // Reconstruct
    const newTag = copy.language;
    if (validateLanguageTag(newTag).isValid && !result.includes(newTag)) {
      result.push(newTag);
    }
  }
  return result;
}
// Example usage
const tag = "zh-Hans-CN-variant-x-private";
const fallbacks = getFallbackChain(tag);
console.log("Fallback chain:");
fallbacks.forEach((tag) => console.log(`- ${tag}`));
// Output:
// Fallback chain:
// - zh-Hans-CN-variant-x-private
// - zh-Hans-CN-variant
// - zh-Hans-CN
// - zh-Hans
// - zh
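For comparison, the Lookup scheme in RFC 4647 builds a similar chain purely on the string: it removes the last subtag at each step, and any single-character subtag (such as x or an extension singleton) left at the end is removed along with it. The sketch below implements that truncation. It is a different technique from getFallbackChain above: it does not parse or validate the tag, so it assumes the input is already well-formed. truncationFallbacks is a hypothetical name.

```javascript
// Hypothetical helper: RFC 4647-style progressive truncation of a
// well-formed language tag. Does not parse or validate the input.
function truncationFallbacks(tag) {
  const chain = [];
  let subtags = tag.split("-");
  while (subtags.length > 0) {
    chain.push(subtags.join("-"));
    // Drop the last subtag
    subtags = subtags.slice(0, -1);
    // A trailing single-character subtag cannot end a tag, so drop it too
    while (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
      subtags = subtags.slice(0, -1);
    }
  }
  return chain;
}

console.log(truncationFallbacks("zh-Hans-CN-variant-x-private"));
// [
//   'zh-Hans-CN-variant-x-private',
//   'zh-Hans-CN-variant',
//   'zh-Hans-CN',
//   'zh-Hans',
//   'zh'
// ]
```

For this tag the result is the same as the chain above. The two approaches diverge when a private use sequence or an extension has several subtags, because truncation removes them one at a time.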
Conclusion
Parsed language tags provide a structured way to access, manipulate, and reason about BCP-47 language tags. By using the parseTag function, you can:
- Access individual components of a language tag
- Work with complex tags containing variants, extensions, and private use subtags
- Create modified versions of existing tags
- Implement language matching and fallback logic
- Filter and group collections of language tags
For more information, see the parseTag API documentation.