Language tags in HTML and XML - World Wide Web ...


Related links: IANA Language Subtag Registry, Subtag search tool, BCP 47

Intended audience: XHTML/HTML coders (using editors or scripting), script developers (PHP, JSP, etc.), schema developers (DTDs, XML Schema, RelaxNG, etc.), XSLT developers, Web project managers, standards implementers, and anyone who needs an overview of how language tags are constructed using BCP 47.

Overview

Terminology

In this article we refer to the value of a language attribute such as fr-CA as a language tag. The fr and CA parts are referred to as subtags when described as parts of a tag. When described as members of an ISO list of languages or countries, fr and CA are referred to as codes.

Language tags are used to indicate the language of text or other items in HTML and XML documents. Use the lang attribute to specify language tags in HTML, and the xml:lang attribute for XML. In both cases, language information is inherited by elements inside the one where the declaration was made, unless one of those elements declares a different language (in the same way).

RFCs are what the IETF calls its specifications. Each RFC has a unique number. Unfortunately, when reading RFC 1766 or RFC 3066, it is not possible to tell that these specifications have been obsoleted and replaced by other specifications.

Language tag syntax is defined by the IETF's BCP 47. BCP stands for 'Best Current Practice', and is a persistent name for a series of RFCs whose numbers change as they are updated. The latest RFC describing language tag syntax is RFC 5646, Tags for the Identification of Languages, and it obsoletes the older RFCs 4646, 3066 and 1766.

You used to find subtags by consulting the lists of codes in various ISO standards, but now you can find all subtags in the IANA Language Subtag Registry. We will describe the new registry below.

Note! If you want step-by-step guidance for choosing a language tag, you should read Choosing a language tag. What follows here provides more of a high-level overview of the syntax and concepts involved in language tags, as described by BCP 47.
Most language tags consist of a two- or three-letter language subtag. Often this is followed by a two-letter or three-digit region subtag. RFC 5646 also allows for a number of additional subtags, where needed. These will be explained briefly in the next section, and include extended language, script, variant, extension and private-use subtags.

The golden rule when creating language tags is to keep the tag as short as possible. Avoid region, script or other subtags except where they add useful distinguishing information. For instance, use ja for Japanese and not ja-JP, unless there is a particular reason that you need to say that this is Japanese as spoken in Japan, rather than elsewhere.

Examples:

Code      Language                                Subtags
en        English                                 language
mas       Masai                                   language
fr-CA     French as used in Canada                language+region
es-419    Spanish as used in Latin America        language+region
zh-Hans   Chinese written with Simplified script  language+script

HTML and XML also provide a means to prevent inheritance of language using the empty string, i.e. xml:lang="". Essentially, this says: I do not want to associate any language with this information.

The remainder of this article provides additional detail on how to construct language tags.

Constructing language tags

Some of the key differences between RFC 5646 and earlier specifications such as RFC 3066 are:

- there is just one place to look for valid subtags, the new IANA registry
- subtags have fixed positions and lengths, which makes for easier matching of language tags
- there is more flexibility around the potential components of a language tag.

RFC 3066 essentially allowed you to compose language tags that were either a language code on its own, a language code plus a country code, or one of a small number of specially registered values in the IANA language tag registry.

RFC 5646 caters for more types of subtag, and allows you to combine them in various ways. While this may appear to make life much more complicated, generally speaking choosing language tags will continue to be a simple matter; however, where you need additional power it will be available to you. In fact, for most people, RFC 5646 should actually make life simpler in a number of ways: for one thing, there is now only one place you need to look for valid subtags.
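Because each type of subtag has a fixed position and a characteristic length, a tag can be taken apart without looking anything up. The Python sketch below is a minimal illustration under simplifying assumptions: it accepts only two- or three-letter primary language subtags, does not check anything against the IANA registry (so it tests the shape of a tag, not its validity), and leaves extension and private-use subtags unparsed in a tail list. The names ParsedTag and parse_tag are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedTag:
    language: str
    extlang: Optional[str] = None
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)
    tail: List[str] = field(default_factory=list)  # extension/private-use subtags, unparsed


def parse_tag(tag: str) -> ParsedTag:
    """Split a language tag into its core subtags using only lengths and positions.

    Sketch only: nothing is checked against the registry.
    """
    parts = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,3}", parts[0]):
        raise ValueError(f"unsupported primary language subtag: {parts[0]!r}")
    result = ParsedTag(language=parts[0].lower())
    i = 1
    # At most one extlang: exactly three letters, directly after the language
    if i < len(parts) and re.fullmatch(r"[A-Za-z]{3}", parts[i]):
        result.extlang = parts[i].lower()
        i += 1
    # At most one script: exactly four letters, written with an initial capital
    if i < len(parts) and re.fullmatch(r"[A-Za-z]{4}", parts[i]):
        result.script = parts[i].title()
        i += 1
    # At most one region: two letters or three digits
    if i < len(parts) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", parts[i]):
        result.region = parts[i].upper()
        i += 1
    # Any number of variants: 5-8 letters/digits, or 4 starting with a digit
    while i < len(parts) and re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", parts[i]):
        result.variants.append(parts[i].lower())
        i += 1
    # Anything left must start with a single-character subtag (a singleton)
    result.tail = parts[i:]
    if result.tail and len(result.tail[0]) != 1:
        raise ValueError(f"unexpected subtag: {result.tail[0]!r}")
    return result
```

For example, parse_tag("zh-Hant-HK") yields language zh, script Hant and region HK, while parse_tag("es-419") recognizes 419 as a numeric region subtag. Checking the shapes in a fixed order is what keeps this simple: a four-letter subtag after the language can only be a script, and a three-digit one can only be a region.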
Although it provides some additional options for identifying common language variations, RFC 5646 includes all of the tags that were previously valid. If you have been using RFC 1766, RFC 3066, or RFC 4646, you do not need to make any changes to your tags.

The list below shows the various types of subtag that are available. We will work our way through these and how they are used in the sections that follow.

language-extlang-script-region-variant-extension-privateuse

The entries in the registry follow certain conventions with regard to upper and lower letter-casing. For example, language subtags are lowercase, alphabetic region subtags are uppercase, and script subtags begin with an initial capital. This is only a convention! When you use these subtags you are free to do as you like, unless you are constrained by the rules of the system you are working with. For HTML and XML language markup, the case should not matter.

Using the subtag registry

As mentioned above, you used to find subtags by consulting the lists of codes in various ISO standards, but now you can find all subtags in one place. The IANA registry looks a little complicated at first, compared to the ISO code lists, but it is easy enough to use once you understand its structure.

The registry is a long text file. To find a language subtag, search the page for the name of that language, in English. If we search for 'French', we find a record that looks like this:

%%
Type: language
Subtag: fr
Description: French
Added: 2005-10-16
Suppress-Script: Latn
%%

Note that the type of this record is language. What you are looking for is the code labeled Subtag, which indicates a value of fr.

You can find other subtags in the same way. For example, to create the tag fr-CA (French as used in Canada), you would next search for Canada, and check that you had found a subtag of type region.

There are, however, some additional things you need to bear in mind when choosing subtags. For example, you should avoid subtags that are described in the registry as redundant or deprecated, and you need to use variant subtags in combination with certain other prescribed subtags. For more information about choosing subtags, read Choosing a language tag.

There is also an unofficial, user-friendly tool for searching the registry.

The following sections will give you more detail about specific subtags.
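The manual search described above can be sketched in Python. This is an illustration under assumptions, not a complete reader for the registry file: SAMPLE_REGISTRY holds only records quoted in this article, values are collected into lists because a field such as Description can repeat, and continuation lines (which in the real file begin with whitespace) are skipped rather than joined. The functions parse_records and find_subtags are hypothetical names.

```python
from typing import Dict, List

# Records copied from this article, in the registry's '%%'-separated format
SAMPLE_REGISTRY = """\
%%
Type: language
Subtag: fr
Description: French
Added: 2005-10-16
Suppress-Script: Latn
%%
Type: region
Subtag: AT
Description: Austria
Added: 2005-10-16
%%
"""


def parse_records(text: str) -> List[Dict[str, List[str]]]:
    """Parse '%%'-separated records of 'Field: value' lines into dictionaries."""
    records = []
    for block in text.split("%%"):
        record: Dict[str, List[str]] = {}
        for line in block.splitlines():
            # Skip blank lines and continuation lines (leading whitespace)
            if ":" not in line or line[:1].isspace():
                continue
            name, value = line.split(":", 1)
            record.setdefault(name.strip(), []).append(value.strip())
        if record:
            records.append(record)
    return records


def find_subtags(records: List[Dict[str, List[str]]], description: str, subtag_type: str) -> List[str]:
    """Return the Subtag of every record with the given Type and an exactly matching Description."""
    return [
        r["Subtag"][0]
        for r in records
        if r.get("Type") == [subtag_type]
        and "Subtag" in r
        and description in r.get("Description", [])
    ]
```

Searching for the Description French with Type language returns ["fr"]. Checking the Type as well as the Description mirrors the advice to confirm that a search for Canada has found a record of type region.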
The primary language subtag

Language subtags: en, ast
Read more in the BCP 47 spec: 2.2.1 Primary Language Subtag; 4.1 Choice of Language Tag; 4.1.1 Tagging Encompassed Languages

All language tags must begin with a primary language subtag. Examples of simple, language-only language tags include:

- en (English)
- ast (Asturian; no two-letter code exists for Asturian in the ISO lists)

These codes come from, and are kept up to date with, ISO 639 language codes. Because RFC 3066 didn't provide a list of valid subtags and just referred users to ISO 639, there was sometimes confusion about how to tag languages when the ISO code lists contained both two-letter and three-letter codes (and sometimes more than one three-letter code). Now all valid subtags are listed in a single IANA registry, which adopts only one value from the ISO lists per language. If a two-letter ISO code is available, this will be the one in the registry. Otherwise the registry will contain one three-letter code. This should make things simpler.

When RFC 5646 was published, over 7,000 new ISO 639-3 three-letter codes were added to the Subtag Registry.

This is an example of the primary language subtag for Spanish, es, in the registry:

%%
Type: language
Subtag: es
Description: Spanish
Description: Castilian
Added: 2005-10-16
Suppress-Script: Latn
%%

Although the codes are case-insensitive, they are commonly written in lowercase; this is merely a convention.

The extended language subtag

Extlang subtags: zh-yue, ar-afb
Read more in the BCP 47 spec: 2.2.2 Extended Language Subtags; 4.1.2 Using Extended Language Subtags

We will refer to extended language subtags as extlang subtags. An extlang subtag must always be preceded by a specific primary language subtag, there can only be one in a language tag, and it must come immediately after the primary language subtag, before any other subtags.

Examples of language tags including extlang subtags are:

- zh-yue (Cantonese Chinese)
- ar-afb (Gulf Arabic)

Language+extlang combinations are provided to accommodate legacy language tag forms; however, there is a single language subtag available for every language+extlang combination. That language subtag should be used rather than the language+extlang combination, where possible. For example, use yue rather than zh-yue for Cantonese, and afb rather than ar-afb for Gulf Arabic, if you can.
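Replacing a language+extlang pair with its single language subtag can be automated using two fields of each extlang record in the registry: Prefix (the language subtag that must precede the extlang) and Preferred-Value (the equivalent language subtag). The sketch below assumes a small hand-copied excerpt of those fields rather than the full registry: the afb values match the registry record for Gulf Arabic, and yue reflects the advice to prefer yue over zh-yue. EXTLANGS and replace_extlang are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt of extlang records: subtag -> (Prefix, Preferred-Value)
EXTLANGS: Dict[str, Tuple[str, str]] = {
    "afb": ("ar", "afb"),  # Gulf Arabic
    "yue": ("zh", "yue"),  # Cantonese
}


def replace_extlang(tag: str) -> str:
    """Rewrite a language+extlang tag (e.g. zh-yue) using the extlang's Preferred-Value."""
    parts: List[str] = tag.split("-")
    if len(parts) < 2:
        return tag
    entry = EXTLANGS.get(parts[1].lower())
    if entry is None:
        return tag  # second subtag is not a known extlang (e.g. a script or region)
    prefix, preferred = entry
    if parts[0].lower() != prefix:
        raise ValueError(f"extlang {parts[1]!r} requires the prefix {prefix!r}")
    # The preferred value replaces both the macrolanguage and the extlang subtags
    return "-".join([preferred] + parts[2:])
```

With this table, replace_extlang("zh-yue") gives yue and replace_extlang("ar-afb-AE") gives afb-AE, while tags without an extlang, such as zh-Hans, pass through unchanged.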
Extlang subtags are always three letters long. Each extlang entry in the registry contains a Prefix field that specifies the language that must precede the extlang subtag. Entries also include a Preferred-Value field that indicates the equivalent language tag.

This is an example of the extlang code for Gulf Arabic, afb, in the registry:

%%
Type: extlang
Subtag: afb
Description: Gulf Arabic
Added: 2009-07-29
Preferred-Value: afb
Prefix: ar
Macrolanguage: ar
%%

Macrolanguages

The primary language subtags used with an extlang subtag are known as macrolanguages, and encompass a number of languages with more specific primary language subtags. The macrolanguage subtag can be used on its own, but unless there is some convention about its meaning in the context where it is used, it is not necessarily precise enough.

For example, zh means Chinese, but it covers many Chinese dialects, often mutually unintelligible. When zh is used on its own, it is usually taken to mean the predominant language in the encompassed range, although this is not explicitly specified in BCP 47. For example, conventionally zh is considered to represent the predominant, Mandarin form of Chinese. Where absolute clarity is needed you can use cmn instead, as long as that doesn't break interoperability. However, if you are using zh to represent a language which is not Mandarin, such as Hakka Chinese, you are better off using the explicit code (in that case, hak).

On the other hand, zh-Hans uses zh in its generic sense. This is a useful way to describe writing in Simplified Chinese, since Chinese tends to be written in the same way, regardless of the dialect of the reader.

The script subtag

Script subtags: zh-Hans, az-Latn
Read more in the BCP 47 spec: 2.2.3 Script Subtag; 4.1 Choice of Language Tag

Examples of language tags including script subtags are:

- zh-Hans (Simplified Chinese)
- az-Latn (Azerbaijani written in Latin script, since Azerbaijani can also be written using the Arabic script)

The script subtag was first introduced in RFC 4646. The subtags come from, and are kept up to date with, the list of ISO 15924 script codes.

Only one script subtag can appear in a language tag, and it must immediately follow the language subtag or any extlang subtag. It is always four letters long.
You should only use script subtags if they are necessary to make a distinction you need. As RFC 4646 co-author Addison Phillips writes, "For virtually any content that does not use a script tag today, it remains the best practice not to use one in the future".

If you specifically want to indicate that content is not written, there is a subtag for that. For example, you could use en-Zxxx to make it clear that an audio recording in English is not written content.

In fact, many language subtag entries in the registry strongly discourage the use of script subtags by including a Suppress-Script field. There is such a field in the Spanish example above, which indicates that Spanish is normally written using Latin script, and so the Latn subtag should normally not be used with es.

This example shows the registry entry for Cyrillic script, Cyrl, used for languages such as Russian:

%%
Type: script
Subtag: Cyrl
Description: Cyrillic
Added: 2005-10-16
%%

Although for common uses of language tags it is not likely that you will need to specify the script, there are one or two situations that have been crying out for it for some time. One such example is Chinese. There are many Chinese dialects, often mutually unintelligible, but these dialects are all written using either Simplified or Traditional Chinese script. People typically want to label Chinese text as either Simplified or Traditional, but until recently there was no way to do so. People had to bend something like zh-CN (meaning Chinese as spoken in China) to mean Simplified Chinese, even in Singapore, and zh-TW (meaning Chinese as spoken in Taiwan) for Traditional Chinese. (Other people, however, use zh-HK for Traditional Chinese.) The availability of zh-Hans and zh-Hant for Chinese written in Simplified and Traditional scripts should improve consistency and accuracy, and is already becoming widely used, although of course you may need to continue to use the old language tags in some cases for consistency.
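The Suppress-Script advice lends itself to a simple check. The Python sketch below drops a script subtag when it only repeats the language's Suppress-Script value. The lookup table holds just the two values quoted in this article's registry records (fr and es, both Latn), and the function assumes there is no extlang subtag, so that the script, if present, is the second subtag. SUPPRESS_SCRIPT and drop_suppressed_script are hypothetical names.

```python
from typing import Dict

# Suppress-Script values from the fr and es registry records quoted in this article
SUPPRESS_SCRIPT: Dict[str, str] = {"fr": "Latn", "es": "Latn"}


def drop_suppressed_script(tag: str) -> str:
    """Remove a script subtag that merely repeats the language's Suppress-Script value.

    Assumes no extlang subtag, so a script subtag would be the second subtag.
    """
    parts = tag.split("-")
    if len(parts) >= 2 and len(parts[1]) == 4 and parts[1].isalpha():
        # Compare in the registry's casing convention (initial capital)
        if SUPPRESS_SCRIPT.get(parts[0].lower()) == parts[1].title():
            del parts[1]
    return "-".join(parts)
```

So drop_suppressed_script("es-Latn-419") gives es-419, while a script subtag that carries real information, as in zh-Hans, is left alone.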
The region subtag

Region subtags: en-GB, es-005, zh-Hant-HK
Read more in the BCP 47 spec: 2.2.4 Region Subtag; 4.1 Choice of Language Tag

Examples of language tags including region subtags are:

- en-GB (British English)
- es-005 (South American Spanish)
- zh-Hant-HK (Traditional Chinese as used in Hong Kong)

The region subtag in RFC 3066 took its values from the ISO 3166 country codes. These two-letter codes are still available from the new registry, but the registry also lists 3-digit UN M.49 region codes. The advantage of these codes is that they can represent more than just countries. For example, localization groups have for some time wanted to label their carefully crafted translations as Latin-American Spanish, rather than the Spanish of any particular country. With RFC 5646 this is possible; the appropriate language tag is es-419.

Only one region subtag can appear in a language tag, and it must appear after the language subtag and any extlang and script subtags. It is a two-letter alphabetic or 3-digit numeric code. You can have a language code immediately followed by a region code, just as you are used to for language tags such as en-US.

Once again, you should only use region subtags if they are necessary to make a distinction you need. Unless you specifically need to highlight that you are talking about Italian as spoken in Italy, you should use the tag it for Italian, and not it-IT. The same goes for any other possible combination.

These examples from the registry show the codes for Austria, AT, and Northern Africa, 015:

%%
Type: region
Subtag: AT
Description: Austria
Added: 2005-10-16
%%
Type: region
Subtag: 015
Description: Northern Africa
Added: 2005-10-16
%%

Variant subtags

Variant subtags: sl-nedis, sl-IT-nedis, de-CH-1901
Read more in the BCP 47 spec: 2.2.5 Variant Subtags; 4.1 Choice of Language Tag

Variant subtags are values used to indicate dialects or script variations not already covered by combinations of language, script and region subtags. Variant subtags must appear after any language, script or region subtags, but script and region subtags do not need to precede them.

It is unlikely that you will need to use variant subtags unless you are working in a specialized area. The following examples may help you understand what these subtags do.
- sl-nedis (the Nadiza dialect of Slovenian)
- sl-rozaj (the Rezijan dialect of Slovenian)
- sl-IT-nedis (the specific variant of the Nadiza dialect of Slovenian that is spoken in Italy)
- de-CH-1901 (the variant of German orthography dating from the 1901 reforms, as seen in Switzerland)

This example from the registry shows the code for the Nadiza dialect of Slovenian, nedis:

%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Added: 2005-10-16
Prefix: sl
%%

In the registry these subtags are tied to a specific language (and possibly additional subtags between this subtag and the primary language subtag) by the 'Prefix' field. The nedis subtag shown above should only be used with Slovenian.

If you need to express a particular dialectal or script nuance that is not currently available, you should propose a variant subtag or subtags for inclusion in the registry using the registration procedure outlined in RFC 5646.

Extension and private-use subtags

Extension subtags: de-DE-u-co-phonebk
Private-use subtags: en-US-x-twain
Read more in the BCP 47 spec: 2.2.6 Extension Subtags; 2.2.7 Private Use Subtags; 4.1 Choice of Language Tag

If you feel you really need to use these subtags, you should read the specification, rather than this article.

Extension and private-use subtags are introduced by a single-character subtag, or 'singleton'. An organization can propose a singleton for an extension. Its intended use must be described by an RFC (IETF specification). The singleton will be added to the registry if it successfully passes a review. The singleton x is reserved for private use. Multiple subtags are allowed after the singleton; however, as for all subtags, they must each be 8 or fewer characters in length.

Extension subtags allow for extensions to the language tag. For example, the extension subtag u has been registered by the Unicode Consortium to add information about language or locale behavior. Many locale identifiers require additional "tailorings" or options for specific values within a language, culture, region, or other variation. This extension provides a mechanism for using these additional tailorings within language tags for general interchange.
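The singleton rules can be illustrated with a short Python sketch that groups the subtags following each singleton. It checks only the general shape (every subtag 1 to 8 ASCII letters or digits, each singleton used at most once and followed by at least one subtag, everything after x treated as private use), lowercases the tag because case is not significant, and ignores what any particular extension, such as u, defines. split_extensions is a hypothetical name.

```python
import re
from typing import Dict, List, Optional, Tuple

SUBTAG = re.compile(r"[a-z0-9]{1,8}")


def split_extensions(tag: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (extensions, private_use) for a tag such as de-DE-u-co-phonebk."""
    parts = tag.lower().split("-")
    extensions: Dict[str, List[str]] = {}
    private_use: List[str] = []
    current: Optional[List[str]] = None  # subtags of the singleton being read
    for i in range(1, len(parts)):  # parts[0] is the primary language subtag
        subtag = parts[i]
        if not SUBTAG.fullmatch(subtag):
            raise ValueError(f"invalid subtag: {subtag!r}")
        if subtag == "x":
            # Everything after x is private use, even one-character subtags
            private_use = parts[i + 1:]
            if not private_use or not all(SUBTAG.fullmatch(p) for p in private_use):
                raise ValueError("x must be followed by subtags of 1 to 8 characters")
            break
        if len(subtag) == 1:
            # A new extension singleton; the longer subtags after it belong to it
            if subtag in extensions:
                raise ValueError(f"singleton {subtag!r} appears twice")
            current = []
            extensions[subtag] = current
        elif current is not None:
            current.append(subtag)
        # Subtags before the first singleton (script, region, ...) are skipped
    if any(not values for values in extensions.values()):
        raise ValueError("each singleton must be followed by at least one subtag")
    return extensions, private_use
```

Here split_extensions("de-DE-u-co-phonebk") returns ({"u": ["co", "phonebk"]}, []), and split_extensions("en-US-x-twain") returns ({}, ["twain"]).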
For example, the following indicates that phonebook collation order should be used by an application, that sorted data in a document is sorted according to this collation, and so on.

de-DE-u-co-phonebk

The u extension is defined in RFC 6067, which points to the Unicode Consortium's Common Locale Data Repository (CLDR) for details on the subtags that follow it. It is not defined by BCP 47.

Private-use subtags do not appear in the subtag registry, and are chosen and maintained by private agreement amongst parties. Because these subtags are only meaningful within private agreements and cannot be used interoperably across the Web, they should be used with great care, and avoided whenever possible.

The following example of a private-use subtag may identify a specific type of US English, but only within a closed community. Outside of that private agreement, its meaning cannot be relied upon.

en-US-x-twain

Grandfathered and redundant tags

Read more in the BCP 47 spec: 2.2.8 Grandfathered and Redundant Registrations

Grandfathered tags are special cases, provided for backwards compatibility. They are tags that were registered before RFC 4646 and that cannot be completely composed from the subtags in the current registry, or do not fit the syntax currently defined for language tags.

Redundant tags are language tags composed of a sequence of subtags and registered before RFC 4646 that can now be formed by combining separate subtags from the current registry. The original registrations remain in the registry mostly 'as a matter of historical curiosity'.

Many grandfathered tags have been superseded by subtags or combinations of subtags in the registry. Such grandfathered tags are now deprecated, and their records usually contain a Preferred-Value field that indicates how you ought to represent that language instead. For instance, the following example of a grandfathered tag indicates that you should use the jbo language subtag instead of art-lojban.
%%
Type: grandfathered
Tag: art-lojban
Description: Lojban
Added: 2001-11-11
Deprecated: 2003-09-02
Preferred-Value: jbo
%%

Matching language tags

Matching different language tags is important for a number of applications. According to BCP 47, en can be said to match en-GB. For example, the following CSS code colors all English text red in browsers that support the :lang pseudo-class.

:lang(en) { color: red; }

In the following code, the text described as lang="en-GB" will be red.

<p lang="fr">En janvier, toutes les boutiques de Londres affichent des panneaux <span lang="en-GB">SALE</span>, mais en fait ces magasins sont bien propres!</p>

On the other hand, given the following CSS declaration,

:lang(en-GB) { color: red; }

the word 'SALE' should not be red in the following code.

<p lang="fr">En janvier, toutes les boutiques de Londres affichent des panneaux <span lang="en">SALE</span>, mais en fait ces magasins sont bien propres!</p>
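The behavior in the two :lang() examples comes down to comparing whole subtags from the start of the tag. The Python sketch below implements the "basic filtering" scheme from RFC 4647, one of the approaches that specification defines: a language range matches a tag if the two are equal, or if the tag begins with the range followed by a hyphen, ignoring case, and the range * matches everything. Browsers' actual :lang() matching may differ in detail, and basic_filter_match is a hypothetical name.

```python
def basic_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: does this language range match this tag?"""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True  # the wildcard range matches any tag
    # Exact match, or a prefix that ends on a subtag boundary
    return t == r or t.startswith(r + "-")


print(basic_filter_match("en", "en-GB"))  # True: a range matches longer tags
print(basic_filter_match("en-GB", "en"))  # False: but not shorter ones
print(basic_filter_match("en", "eng"))    # False: "eng" is not en plus a subtag
```

Requiring the hyphen is the design choice that matters: a plain string-prefix test would wrongly let en match a tag beginning with eng.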

With the availability of additional tags in RFC 5646, matching is a little more complicated. In addition, its companion, RFC 4647 Matching of Language Tags, describes more than one possible approach to matching. Matching will be described in another article.

By the way

Language tags for HTML were first formally defined in RFC 2070, F. Yergeau et al., Internationalization of the Hypertext Markup Language. RFC 2070 was incorporated into HTML 4, and has been reclassified as historic.

Note that there have been changes to ISO language codes. In 1989, iw, in, and ji were withdrawn and replaced by he, id, and yi. More recently, the ISO country code cs, which used to represent Czechoslovakia, was changed to represent Serbia and Montenegro. Such changes can lead to confusion when comparing codes that were assigned to text over a long period. The new IANA subtag registry allows for tags to be deprecated and superseded by new tags, but will never remove or change the meaning of a subtag. It is expected that ISO will also follow a similar policy in the future.

Many other W3C and Web-related specifications use language tags:

- XHTML 1.0 uses language tags in the HTML lang attribute and the XML xml:lang attribute, as well as the hreflang attribute.
- HTTP uses language tags in the Accept-Language and Content-Language headers.
- SMIL and SVG can use language tags in the switch element.
- CSS and XSL use language tags for detailed style control.

Note also that language information can be attached to objects such as images and included audio files.

Further reading

- Getting started? Language on the Web
- Choosing a language tag
- Related links, Authoring HTML & CSS: Language
- Related links, Authoring XML: Language
- Related links, Authoring SVG: Language
- Related links, Developing schemas: Language

