Canonicalization
Canonicalization is the process of converting a language tag to its preferred form according to BCP 47. The ally-bcp-47 library performs it through the canonicalizeTag function, and validateLanguageTag also returns the canonical form of the tag it validates.
What is Canonicalization?
Canonicalization ensures that language tags:
- Follow consistent casing rules
- Use preferred subtags instead of deprecated ones
- Remove redundant information
- Follow a standard order
Basic Canonicalization
The simplest way to canonicalize a tag is to call the canonicalizeTag function:
import { canonicalizeTag } from "ally-bcp-47";
// Case normalization
console.log(canonicalizeTag("en-us")); // 'en-US'
console.log(canonicalizeTag("ZH-hans-cn")); // 'zh-Hans-CN'
// Redundant script removal
console.log(canonicalizeTag("en-Latn-US")); // 'en-US'
// Preferred tag substitution
console.log(canonicalizeTag("i-navajo")); // 'nv'
console.log(canonicalizeTag("zh-cmn-Hans-CN")); // 'zh-Hans-CN'
Case Normalization Rules
BCP 47 tags are case-insensitive, but RFC 5646 defines a conventional case for each subtag type, and canonicalizeTag applies these conventions:
- Language subtags: lowercase (en, not EN)
- Script subtags: titlecase (Latn, not latn or LATN)
- Region subtags: uppercase (US, not us or Us)
- Variant subtags: lowercase
- Extension singletons: lowercase
- Extension values: lowercase
import { canonicalizeTag } from "ally-bcp-47";
// Mixed case tag
const mixedCase = "eN-lATn-uS";
console.log(canonicalizeTag(mixedCase)); // 'en-Latn-US'
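These rules depend on position as well as length: a two-letter subtag is a region only before any singleton, so the "ca" in "-u-ca-gregory" stays lowercase. The sketch below shows one way to apply the conventions from RFC 5646 section 2.1.1. normalizeCase is a hypothetical helper, not an ally-bcp-47 export, and it assumes the input is already well-formed because it performs no validation.

```javascript
// Hypothetical helper sketching BCP 47 case conventions (RFC 5646, section 2.1.1).
// Assumes a well-formed tag; nothing is validated here.
function normalizeCase(tag) {
  let afterSingleton = false; // becomes true once an extension or "x" singleton is seen
  return tag
    .split("-")
    .map((subtag, index) => {
      const lower = subtag.toLowerCase();
      // The language subtag, singletons, and everything after a singleton are lowercase.
      if (index === 0 || afterSingleton || lower.length === 1) {
        if (lower.length === 1) afterSingleton = true;
        return lower;
      }
      // Script subtags (four letters) are titlecase.
      if (/^[a-z]{4}$/.test(lower)) {
        return lower[0].toUpperCase() + lower.slice(1);
      }
      // Region subtags (two letters or three digits) are uppercase.
      if (/^([a-z]{2}|[0-9]{3})$/.test(lower)) {
        return lower.toUpperCase();
      }
      // Extended language and variant subtags are lowercase.
      return lower;
    })
    .join("-");
}
```

For example, normalizeCase("EN-US-U-CA-GREGORY") returns "en-US-u-ca-gregory", and a tag that starts with a singleton, such as "I-NAVAJO", is lowercased throughout.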
Redundant Script Suppression
Some language subtags have a Suppress-Script field in the IANA Language Subtag Registry. It names the script the language is almost always written in, so a script subtag with that value adds no information and can be omitted:
import { canonicalizeTag } from "ally-bcp-47";
// English is implicitly written in Latin script
console.log(canonicalizeTag("en-Latn")); // 'en'
// Japanese has the Suppress-Script value Jpan (Han + Hiragana + Katakana)
console.log(canonicalizeTag("ja-Jpan")); // 'ja'
// Chinese has no Suppress-Script value, so the script subtag is kept
console.log(canonicalizeTag("zh-Hans")); // 'zh-Hans'
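Suppression amounts to a registry lookup. In this sketch, SUPPRESS_SCRIPT is a small hand-copied excerpt (a real implementation would load every language record from the IANA Language Subtag Registry), and dropSuppressedScript is a hypothetical helper, not part of ally-bcp-47. It assumes a case-normalized tag whose script, if present, is the second subtag, meaning there is no extended language subtag.

```javascript
// Hypothetical excerpt of Suppress-Script values from the IANA registry;
// a real implementation would read the full registry instead.
const SUPPRESS_SCRIPT = new Map([
  ["en", "Latn"],
  ["fr", "Latn"],
  ["ja", "Jpan"],
]);

// Removes the script subtag when it equals the language's Suppress-Script value.
// Assumes a well-formed, case-normalized tag with no extended language subtag.
function dropSuppressedScript(tag) {
  const subtags = tag.split("-");
  const [language, second] = subtags;
  if (second !== undefined && SUPPRESS_SCRIPT.get(language) === second) {
    subtags.splice(1, 1); // drop the redundant script subtag
  }
  return subtags.join("-");
}
```

With this excerpt, dropSuppressedScript("en-Latn-US") returns "en-US", while "zh-Hans" and "en-Cyrl" come back unchanged because their scripts are not the suppressed ones.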
Extension Canonicalization
Extensions are reordered by their singleton. Within an extension such as u, keywords are sorted by key, and each key stays paired with its value:
import { canonicalizeTag } from "ally-bcp-47";
// Extensions are reordered by singleton
console.log(canonicalizeTag("en-z-abc-u-ca-gregory")); // 'en-u-ca-gregory-z-abc'
// Keywords are sorted by key; each key keeps its value
console.log(canonicalizeTag("en-u-nu-latn-ca-gregory")); // 'en-u-ca-gregory-nu-latn'
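The two ordering rules can be sketched as follows. canonicalizeExtensions and its helpers are hypothetical, not ally-bcp-47 functions. The sketch assumes a well-formed, lowercase extension section, sorts keywords only inside the u extension (a u key is two characters and its type subtags follow it), leaves other extensions' contents in their original order, and keeps any x private-use section last.

```javascript
// Hypothetical sketch: order extensions by singleton and sort "u" keywords by key.
// Assumes a well-formed tag whose extension section is already lowercase.
function canonicalizeExtensions(tag) {
  const subtags = tag.split("-");
  // Tags that start with a singleton ("x-..." or "i-...") are left alone.
  if (subtags[0].length === 1) return tag;

  const base = [];
  const extensions = []; // entries look like { singleton, values }
  let privateUse = [];
  let current = null;

  for (let i = 0; i < subtags.length; i++) {
    const subtag = subtags[i];
    if (subtag === "x") {
      // Private use always comes last, so it is kept verbatim.
      privateUse = subtags.slice(i);
      break;
    }
    if (subtag.length === 1) {
      current = { singleton: subtag, values: [] };
      extensions.push(current);
    } else if (current) {
      current.values.push(subtag);
    } else {
      base.push(subtag);
    }
  }

  for (const ext of extensions) {
    if (ext.singleton === "u") ext.values = sortUnicodeKeywords(ext.values);
  }
  extensions.sort((a, b) => compareAscii(a.singleton, b.singleton));

  const parts = [...base];
  for (const ext of extensions) parts.push(ext.singleton, ...ext.values);
  return [...parts, ...privateUse].join("-");
}

// Splits "u" subtags into leading attributes (kept in order) and keywords
// (a two-character key followed by its types), then sorts keywords by key.
function sortUnicodeKeywords(values) {
  const attributes = [];
  const keywords = [];
  for (const value of values) {
    if (value.length === 2) keywords.push([value]);
    else if (keywords.length > 0) keywords[keywords.length - 1].push(value);
    else attributes.push(value);
  }
  keywords.sort((a, b) => compareAscii(a[0], b[0]));
  return [...attributes, ...keywords.flat()];
}

// Plain code-unit comparison, so the result does not depend on the runtime locale.
function compareAscii(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}
```

Grouping each key with its types before sorting is what keeps "ca-gregory" and "nu-latn" intact; sorting the raw subtags would separate keys from their values.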
Preferred Value Substitution
Some tags and subtags are deprecated in the IANA registry and have a preferred value that should be used instead:
import { canonicalizeTag } from "ally-bcp-47";
// 'sgn-GR' should be 'gss'
console.log(canonicalizeTag("sgn-GR")); // 'gss'
// 'zh-cmn' should be 'zh'
console.log(canonicalizeTag("zh-cmn")); // 'zh'
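Preferred values come from the Preferred-Value fields of the IANA Language Subtag Registry. The sketch below uses a tiny hypothetical excerpt of those mappings: the i-navajo and sgn-GR entries match the examples in this section, and iw→he and in→id are deprecated language subtags. applyPreferredValues is a hypothetical helper, not part of ally-bcp-47. It checks whole-tag entries first and then replaces only a deprecated primary language subtag.

```javascript
// Hypothetical excerpt of whole-tag Preferred-Value mappings
// (grandfathered and redundant tags), keyed in lowercase.
const TAG_PREFERRED = new Map([
  ["i-navajo", "nv"],
  ["sgn-gr", "gss"],
]);

// Hypothetical excerpt of deprecated primary language subtags.
const LANGUAGE_PREFERRED = new Map([
  ["iw", "he"],
  ["in", "id"],
]);

function applyPreferredValues(tag) {
  // Whole-tag matches are case-insensitive and replace the entire tag.
  const whole = TAG_PREFERRED.get(tag.toLowerCase());
  if (whole !== undefined) return whole;

  // Otherwise replace only a deprecated primary language subtag.
  const [language, ...rest] = tag.split("-");
  const replacement = LANGUAGE_PREFERRED.get(language.toLowerCase());
  return replacement === undefined ? tag : [replacement, ...rest].join("-");
}
```

With these tables, applyPreferredValues("sgn-GR") returns "gss" and applyPreferredValues("iw-IL") returns "he-IL".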
Canonicalization in the Validation Process
When you use validateLanguageTag, the result includes the canonicalized tag:
import { validateLanguageTag } from "ally-bcp-47";
const result = validateLanguageTag("en-us");
console.log(result.tag.tag); // 'en-US'
Programmatically Working with Canonical Forms
You might want to canonicalize tags before storing or comparing them:
import { canonicalizeTag, isValid } from "ally-bcp-47";
function processUserLanguage(userInput) {
  if (!isValid(userInput)) {
    return { error: "Invalid language tag" };
  }
  const canonical = canonicalizeTag(userInput);
  // Store the canonical form (saveToDatabase stands in for your own persistence code)
  saveToDatabase(canonical);
  return { success: true, canonical };
}
// These are functionally equivalent after canonicalization
console.log(canonicalizeTag("zh-cmn-Hans-CN")); // 'zh-Hans-CN'
console.log(canonicalizeTag("zh-Hans-CN")); // 'zh-Hans-CN'
Canonicalization for Comparison
Comparing canonical forms treats tags that differ only in case, in a suppressed script, or in deprecated subtags as equal:
import { canonicalizeTag } from "ally-bcp-47";
function areTagsEquivalent(tag1, tag2) {
  return canonicalizeTag(tag1) === canonicalizeTag(tag2);
}
console.log(areTagsEquivalent("en-us", "en-US")); // true
console.log(areTagsEquivalent("zh-cmn-Hans-CN", "zh-Hans-CN")); // true
console.log(areTagsEquivalent("en-Latn", "en")); // true
console.log(areTagsEquivalent("en-US", "en-GB")); // false
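Because equivalent tags share one canonical form, that form also works as a key for deduplication. In this minimal sketch, dedupeByCanonicalForm is a hypothetical helper, not an ally-bcp-47 export. The canonicalizing function is passed in as a parameter so the sketch stays dependency-free. In practice you would pass canonicalizeTag.

```javascript
// Hypothetical helper: keeps the first occurrence of each distinct canonical
// form, in input order. `canonicalize` maps a tag to its canonical form,
// for example canonicalizeTag from ally-bcp-47.
function dedupeByCanonicalForm(tags, canonicalize) {
  const seen = new Set();
  const result = [];
  for (const tag of tags) {
    const canonical = canonicalize(tag);
    if (!seen.has(canonical)) {
      seen.add(canonical);
      result.push(canonical);
    }
  }
  return result;
}
```

Given the canonical forms shown above, dedupeByCanonicalForm(["en-us", "en-US", "en-GB"], canonicalizeTag) would keep "en-US" once, followed by "en-GB".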
Common Canonicalization Scenarios
Consistent Database Storage
import { canonicalizeTag, isValid } from "ally-bcp-47";
function prepareForStorage(tag) {
  if (!isValid(tag)) {
    throw new Error(`Invalid language tag: ${tag}`);
  }
  return canonicalizeTag(tag);
}
User Input Normalization
import { canonicalizeTag, isValid } from "ally-bcp-47";
function normalizeUserLanguageInput(input) {
  if (!isValid(input)) {
    return {
      isValid: false,
      message: "Please enter a valid language tag (e.g., en-US)",
    };
  }
  return {
    isValid: true,
    original: input,
    normalized: canonicalizeTag(input),
  };
}
API Response Standardization
import { canonicalizeTag } from "ally-bcp-47";
function standardizeApiResponse(data) {
  if (data.languageTag) {
    data.languageTag = canonicalizeTag(data.languageTag);
  }
  if (data.supportedLanguages) {
    data.supportedLanguages = data.supportedLanguages.map((tag) =>
      canonicalizeTag(tag)
    );
  }
  return data;
}
Next Steps
- ADA Compliance - Learn how this library helps with ADA compliance
- API Reference - View the complete API documentation for canonicalization