Language tags in HTML and XML
Terminology
In this article we refer to the value of a language attribute such as fr-CA as a language tag. The fr and CA parts are referred to as subtags when described as parts of a tag. When described as members of an ISO list of languages or countries, fr and CA are referred to as codes.
Language tags can be (and should be) used to indicate the language of text in HTML and XML documents. For HTML 4, language tags are specified with the lang attribute. For XML, language tags are given in the xml:lang attribute. In both cases, language information is inherited along the document hierarchy, i.e. it has to be given only once if the whole document is in one language, and language information nests, i.e. inner attributes override outer attributes.
Language tags are defined in RFC 3066, which obsoletes the older RFC 1766. XML has been updated to use RFC 3066 by an erratum. RFC 3066 is based on ISO 639 two-letter and three-letter language codes, and on ISO 3166 two-letter country codes. RFC 1766 did not include three-letter language codes.
Examples include:
Code         Language                                        Explanation
en           English                                         ISO 639 two-letter language code
mas          Masai                                           ISO 639 three-letter language code
fr-CA        French as used in Canada                        ISO 639 two-letter code with ISO 3166 two-letter country code
en-scouse    English Liverpudlian dialect known as 'Scouse'  ISO 639 two-letter language code with addition, IANA-registered
i-klingon    Klingon                                         IANA-registered language code
x-pig-latin  Pig Latin                                       Unregistered/Experimental
Language tags starting with i- are defined in the IANA registry of language tags. Language tags starting with x- denote experimental tags without guarantee of uniqueness. The list of ISO 639 two-letter and three-letter language codes is provided by the ISO 639-2 Registration Authority (Library of Congress, USA).
According to RFC 3066, for languages with both a two-letter and a three-letter code, the two-letter code must be used. This also solves the problem of those languages that have two different three-letter codes, because all of them also have a two-letter code.
XML now also provides a means to prevent inheritance of language using the empty string, i.e.
xml:lang=""
Essentially, this says: I do not want to associate any language with this information.
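The inheritance and nesting rules, together with the empty-string reset, can be sketched in Python. This is only an illustration under stated assumptions: the function name effective_languages and the sample document are invented here, and the standard-library xml.etree.ElementTree parser is used purely for convenience.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root):
    """Map each element to its effective language tag (hypothetical helper).

    None means that no language is associated with the element.
    """
    result = {}

    def walk(elem, inherited):
        value = elem.get(XML_LANG)
        if value is None:
            lang = inherited      # nothing declared: inherit from the parent
        elif value == "":
            lang = None           # xml:lang="" switches inheritance off
        else:
            lang = value          # an inner attribute overrides the outer one
        result[elem] = lang
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return result


# Invented sample document: French overall, one English paragraph,
# and one element that opts out of any language.
doc = ET.fromstring(
    '<doc xml:lang="fr">'
    '<p id="a">Bonjour</p>'
    '<p id="b" xml:lang="en-GB">Hello</p>'
    '<code id="c" xml:lang="">x = 1</code>'
    '</doc>'
)
langs = {e.get("id", e.tag): lang for e, lang in effective_languages(doc).items()}
print(langs)
```

In this sketch the first paragraph inherits fr from the root, the second overrides it with en-GB, and the code element ends up with no language at all.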
The remainder of this article provides additional detail on how to use language tags.
RFC 3066 rules
RFC 3066 is the standard that defines how to use language tags to identify languages.
A language tag is composed of a primary subtag, followed by zero or more additional subtags, separated by hyphens. The primary subtag represents a language (there are two possible exceptions, i- and x-, which are described below), and any following subtags serve to qualify the dialect or usage of the language. These latter subtags typically represent countries, dialects or scripts.
The following example indicates that a document is written not just in English but in British English, as opposed to, say, US English.
<html lang="en-GB">
Subtags are case insensitive; they can include the letters and digits A to Z, a to z and 0 to 9; and they must be 8 characters or fewer in length.
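As a rough illustration of these syntax rules, the following Python sketch splits a tag on hyphens and checks each subtag. The names split_tag and same_tag are hypothetical, and the check covers only the character and length rules stated above, not whether a subtag is a registered code.

```python
import re

# One subtag: 1 to 8 ASCII letters or digits.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def split_tag(tag):
    """Split a language tag into subtags, rejecting malformed ones."""
    subtags = tag.split("-")
    for subtag in subtags:
        if not SUBTAG.fullmatch(subtag):
            raise ValueError(f"malformed subtag {subtag!r} in {tag!r}")
    return subtags


def same_tag(a, b):
    """Compare two tags without regard to case."""
    return a.lower() == b.lower()


print(split_tag("en-GB"))          # ['en', 'GB']
print(same_tag("en-gb", "EN-GB"))  # True
```

A tag such as en--GB (empty subtag) or en_GB (underscore instead of hyphen) raises ValueError in this sketch.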
Note that the HTML specification still recommends the use of RFC 1766 for identifying language. RFC 3066 is an update of RFC 1766 that supersedes it, and there is a planned erratum in place for the HTML specification, so you should use RFC 3066 despite what the HTML specification currently says.
RFC 3066 merely expands and clarifies the possibilities for specifying languages. If you have been using RFC 1766 you should not need to make any changes to your tags in order to start using RFC 3066.
A proposed successor to RFC 3066 is currently being developed, but it aims to retain backwards compatibility with tags created using RFC 3066.
The primary subtag
All subtags in initial position must be 1, 2 or 3 letters in length. All 2- and 3-letter subtags in this position must be language codes from ISO 639 part 2, which defines codes to represent languages. 1-letter subtags must be one of the prefixes i- or x-, which we will describe later.
Although the codes are case insensitive, they are commonly written in lowercase; this is merely a convention.
Note also that, where ISO offers a choice between 2-letter and 3-letter codes, you should choose the 2-letter one. This ensures that for each language, as far as possible, a unique code is used. Older data using two-letter codes (based on RFC 1766, which did not allow three-letter codes) does not need to be changed. Also, the question of which three-letter code to use is avoided, since the few languages that have two different three-letter codes all have a two-letter code.
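The preference for two-letter codes can be sketched as a simple lookup. The table below is a deliberately tiny sample chosen for illustration, and the function name preferred_primary is hypothetical; a real implementation would load the full ISO 639-2 code list.

```python
# Illustrative sample only: three-letter codes that also have a two-letter code.
THREE_TO_TWO = {
    "fra": "fr", "fre": "fr",  # French has two different three-letter codes
    "deu": "de", "ger": "de",  # so does German
    "eng": "en",
}


def preferred_primary(code):
    """Return the two-letter code when one exists, else the code itself."""
    code = code.lower()
    return THREE_TO_TWO.get(code, code)


print(preferred_primary("fre"))  # fr
print(preferred_primary("fra"))  # fr
print(preferred_primary("mas"))  # mas (Masai has no two-letter code)
```

Both three-letter codes for French collapse to the same two-letter code, which is how the rule sidesteps the question of which three-letter code to use.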
Additional subtags
Subtags can be added to indicate geographic, dialectal, script, or other refinements to the primary (language) subtag. Any number of subtags can follow the primary subtag, although it is unusual to see more than one.
RFC 3066 specifies that any 2-letter subtag in the second position must be an ISO 3166 country code. There are no rules for any third and subsequent subtags that are used.
Two-letter ISO subtags indicating country are commonly written uppercase, but this is only a convention.
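The two casing conventions mentioned in this article (lowercase language codes, uppercase two-letter country codes) could be applied with a small helper like the one below. The function name conventional_case is hypothetical, and because tags are case insensitive the result is purely cosmetic. Other subtags are left untouched, since the article states no convention for them.

```python
def conventional_case(tag):
    """Rewrite a tag using the customary (not required) casing."""
    subtags = tag.split("-")
    out = [subtags[0].lower()]              # primary subtag: lowercase
    for position, subtag in enumerate(subtags[1:], start=1):
        if position == 1 and len(subtag) == 2:
            out.append(subtag.upper())      # two-letter country code: uppercase
        else:
            out.append(subtag)              # no stated convention: keep as written
    return "-".join(out)


print(conventional_case("FR-ca"))    # fr-CA
print(conventional_case("ZH-Hans"))  # zh-Hans
```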
Special primary subtags
RFC 3066 defines a couple of instances where the language tag might not begin with an ISO language code.
The prefix i- is reserved for IANA-registered language tags. Examples include:
i-mingo
i-klingon
i-tao
The prefix x- provides a mechanism for user-defined language tags. The second subtag must be more than one letter long, and must not be one of the following reserved subtags: AA, QM-QZ, XA-XZ, and ZZ. For example:
x-mylanguage
Of course, neither of these approaches should be used to identify a language if the approach based on initial two- or three-letter ISO codes is available. These methods restrict or prevent interoperable language tag recognition.
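A simple way to tell the three kinds of primary subtag apart is sketched below. The function name classify and its return labels are invented for illustration. The sketch looks only at the shape of the primary subtag: it does not consult the ISO or IANA lists, and it does not check the reserved second subtags mentioned above.

```python
def classify(tag):
    """Label a tag by the kind of primary subtag it starts with."""
    primary, _, rest = tag.lower().partition("-")
    if primary in ("i", "x"):
        if not rest:
            return "invalid"                 # a prefix needs a following subtag
        return "iana-registered" if primary == "i" else "user-defined"
    if len(primary) in (2, 3) and primary.isascii() and primary.isalpha():
        return "iso-language"                # shape of an ISO 639 code
    return "invalid"


print(classify("i-klingon"))     # iana-registered
print(classify("x-mylanguage"))  # user-defined
print(classify("fr-CA"))         # iso-language
```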
IANA-registered language tags
It is possible to register language tags with IANA using the submission process described in RFC 3066. These tags can have 3- to 8-letter subtags in the second position.
While the i- prefix is reserved specifically for IANA tags, not all IANA tags begin with it. For example, a number of Chinese dialects have been registered with IANA. These include zh-guoyu, zh-hakka, zh-min, zh-min-nan, zh-wuu, etc.
Registering tags with IANA is better than using user-defined tags because it maximizes the likelihood of interoperability, since the IANA tags are visible to others. On the other hand, IANA tags may be deprecated as new codes are added to the ISO standard. For this reason, there may be some risk to long-term interoperability when using certain IANA-registered tags. This is particularly likely to apply to tags beginning with the i- prefix.
IANA tags that have been deprecated at the time this tutorial was published include no-bok (Norwegian "Book language"; use ISO 639 nb), i-navajo (Navajo; use ISO 639 nv), i-lux (Luxembourgish; use ISO 639 lb), and others.
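Content that still carries deprecated tags can be updated with a lookup table. The sketch below contains only the three replacements named above; the names DEPRECATED and modernize are hypothetical, and the full list of deprecations would have to come from the IANA registry.

```python
# Only the three deprecations mentioned in the text; not a complete list.
DEPRECATED = {
    "no-bok": "nb",    # Norwegian "Book language"
    "i-navajo": "nv",  # Navajo
    "i-lux": "lb",     # Luxembourgish
}


def modernize(tag):
    """Replace a deprecated IANA tag with its ISO 639 code, if one is known."""
    return DEPRECATED.get(tag.lower(), tag)


print(modernize("i-navajo"))  # nv
print(modernize("fr-CA"))     # fr-CA (unchanged)
```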
Some particularly useful tags registered with IANA allow you to specify Traditional vs. Simplified Chinese. In the past it was necessary to distinguish the two by using something like zh-CN (Mainland China) for Simplified Chinese and zh-TW (Taiwan) for Traditional Chinese. Apart from the fact that this is a mislabelling, you could not guarantee that others would recognize these conventions, or even follow them. For example, some people used zh-HK to represent Traditional Chinese. Now IANA makes available the tags zh-Hans and zh-Hant for Simplified and Traditional Chinese, respectively. The following two paragraphs illustrate the use of these tags.
<p lang="zh-Hans">当世界需要沟通时,请用Unicode!</p>
<p lang="zh-Hant">當世界需要溝通時,請用統一碼(Unicode)</p>
It is expected that these tags will persist for the foreseeable future, so it is worth adopting them as soon as possible in order to improve future interoperability.
Matching language tags
According to RFC 3066, the range 'en' should also match the tag 'en-GB'. For example, the following CSS code colors all English text red in browsers that support the :lang pseudo-class.
:lang(en) { color: red; }
In the following code, the text described as lang="en-GB" will be red.
<p lang="fr">En janvier, toutes les boutiques de Londres affichent des panneaux <span lang="en-GB">SALE</span>, mais en fait ces magasins sont bien propres!</p>
On the other hand, given the following CSS declaration,
:lang(en-GB) { color: red; }
the word 'SALE' should not be red in the following code.
<p lang="fr">En janvier, toutes les boutiques de Londres affichent des panneaux <span lang="en">SALE</span>, mais en fait ces magasins sont bien propres!</p>
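The matching behaviour in these two CSS examples amounts to a case-insensitive prefix comparison that respects subtag boundaries. A minimal Python sketch follows; the function name lang_matches is hypothetical, and this is not how any particular browser implements :lang.

```python
def lang_matches(language_range, tag):
    """True if tag equals the range or extends it with further subtags."""
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


print(lang_matches("en", "en-GB"))  # True:  :lang(en) applies to lang="en-GB"
print(lang_matches("en-GB", "en"))  # False: :lang(en-GB) does not apply to lang="en"
```

Appending the hyphen before the prefix test keeps a range such as en from matching an unrelated tag that merely begins with the same letters.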
Note, however, that this is not the case for language negotiation on an Apache server. If you want to be automatically directed to a page example.fr.html and your browser settings only state a preference for 'fr-CA', you will need to add 'fr' to your settings. (See Setting language preferences in a browser.)
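The advice to add 'fr' alongside 'fr-CA' can be expressed as a small transformation of a preference list. This sketch is an assumption-laden illustration of the user-side fix only: the function name with_fallbacks is hypothetical, and it does not model Apache's negotiation algorithm.

```python
def with_fallbacks(preferences):
    """Insert the bare language after each regional tag unless already listed."""
    seen = {tag.lower() for tag in preferences}
    result = []
    for tag in preferences:
        result.append(tag)
        primary = tag.split("-")[0]
        if len(primary) == 1:
            continue                  # i- and x- prefixes are not languages
        if primary.lower() not in seen:
            result.append(primary)
            seen.add(primary.lower())
    return result


print(with_fallbacks(["fr-CA", "en"]))  # ['fr-CA', 'fr', 'en']
```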
Issues with language tags
Although RFC 3066 language tags work well much of the time, there are still some issues:
- Many more codes are needed than those provided by ISO to cover the approximately 6,000 languages of the world.
- They do not cover the need to express general regions; for example, there is still no tag for the generalized Latin-American Spanish that many organizations use to create Spanish content.
- There is some lack of clarity between the use of language tag values for designating language vs. locale. 'Locales' are combinations of language plus geographical region typically used to set such things as date and time defaults in software.
- There is a need, sometimes, to distinguish the script used, in addition to the language. For example, Mongolian might be written in Mongolian script or Cyrillic; Croatian might be written in Latin or Cyrillic; ...
People are currently working on solutions to these issues, including people from ISO TC37, SIL, and W3C. The proposed successor to RFC 3066 is also targeting these issues.
By the way...
- Language tags for HTML were first formally defined in RFC 2070, F. Yergeau, et al., Internationalization of the Hypertext Markup Language. RFC 2070 was incorporated into HTML 4, and has been reclassified as historic.
- Note changes to ISO language codes, in particular those in 1989 (withdrawing iw, in, and ji, replacing them by he, id, and yi, and adding se, iu, ug, and za).
- Unicode provides cross-references to Microsoft and Apple codes.
- Many other W3C and Web-related specifications use language tags:
  - XHTML 1.0, reformulating HTML in terms of XML, which advises using both the HTML lang attribute and the XML xml:lang attribute, with the latter taking precedence in case there should be any differences.
  - HTTP uses language tags in the Accept-Language and Content-Language headers.
  - SMIL and SVG can use language tags in the switch statement.
  - CSS and XSL use language tags for detailed style control.
- Note also that language information can be attached to objects such as images and included audio files.
Further reading
- RFC 3066: Tags for the Identification of Languages, http://www.ietf.org/rfc/rfc3066.txt
- ISO 639: Codes for the Representation of Names of Languages, http://www.loc.gov/standards/iso639-2/langcodes.html
- ISO 3166: Codes for Country Names, http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html
- IANA language tag registry, http://www.iana.org/assignments/language-tags
- Authoring Techniques for XHTML & HTML Internationalization: Specifying the language of content 1.0, http://www.w3.org/TR/i18n-html-tech-lang/
- W3C I18N resource index: Language declarations and language negotiation, http://www.w3.org/International/resource-index.html#lang
Authors: Martin Dürst & Richard Ishida (W3C).
Last update 2005-05-13 20:06 GMT
Copyright © 2005 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements.