8 Language information and text direction


Contents

8.1 Specifying the language of content: the lang attribute
    8.1.1 Language codes
    8.1.2 Inheritance of language codes
    8.1.3 Interpretation of language codes
8.2 Specifying the direction of text and tables: the dir attribute
    8.2.1 Introduction to the bidirectional algorithm
    8.2.2 Inheritance of text direction information
    8.2.3 Setting the direction of embedded text
    8.2.4 Overriding the bidirectional algorithm: the BDO element
    8.2.5 Character references for directionality and joining control
    8.2.6 The effect of style sheets on bidirectionality

This section of the document discusses two important issues that affect the internationalization of HTML: specifying the language (the lang attribute) and direction (the dir attribute) of text in a document.

8.1 Specifying the language of content: the lang attribute

Attribute definitions

lang = language-code [CI]
    This attribute specifies the base language of an element's attribute values and text content. The default value of this attribute is unknown.

Language information specified via the lang attribute may be used by a user agent to control rendering in a variety of ways. Some situations where author-supplied language information may be helpful include:

- Assisting search engines
- Assisting speech synthesizers
- Helping a user agent select glyph variants for high quality typography
- Helping a user agent choose a set of quotation marks
- Helping a user agent make decisions about hyphenation, ligatures, and spacing
- Assisting spell checkers and grammar checkers

The lang attribute specifies the language of element content and attribute values; whether it is relevant for a given attribute depends on the syntax and semantics of the attribute and the operation involved.

The intent of the lang attribute is to allow user agents to render content more meaningfully based on accepted cultural practice for a given language. This does not imply that user agents should render characters that are atypical for a particular language in less meaningful ways; user agents must make a best attempt to render all characters, regardless of the value specified by lang.

For instance, if characters from the Greek alphabet appear in the midst of English text:

    <P><Q>Her super-powers were the result of γ-radiation,</Q> he explained.</P>

a user agent (1) should try to render the English content in an appropriate manner (e.g., in its handling of the quotation marks) and (2) must make a best attempt to render γ even though it is not an English character.

Please consult the section on undisplayable characters for related information.

8.1.1 Language codes

The lang attribute's value is a language code that identifies a natural language spoken, written, or otherwise used for the communication of information among people. Computer languages are explicitly excluded from language codes.

[RFC1766] defines and explains the language codes that must be used in HTML documents.

Briefly, language codes consist of a primary code and a possibly empty series of subcodes:

    language-code = primary-code ( "-" subcode )*

Here are some sample language codes:

- "en": English.
- "en-US": the U.S. version of English.
- "en-cockney": the Cockney version of English.
- "i-navajo": the Navajo language spoken by some Native Americans.
- "x-klingon": the primary tag "x" indicates an experimental language tag.

Two-letter primary codes are reserved for [ISO639] language abbreviations. Two-letter codes include fr (French), de (German), it (Italian), nl (Dutch), el (Greek), es (Spanish), pt (Portuguese), ar (Arabic), he (Hebrew), ru (Russian), zh (Chinese), ja (Japanese), hi (Hindi), ur (Urdu), and sa (Sanskrit).

Any two-letter subcode is understood to be an [ISO3166] country code.

8.1.2 Inheritance of language codes

An element inherits language code information according to the following order of precedence (highest to lowest):

- The lang attribute set for the element itself.
- The closest parent element that has the lang attribute set (i.e., the lang attribute is inherited).
- The HTTP "Content-Language" header (which may be configured in a server). For example:

      Content-Language: en-cockney

- User agent default values and user preferences.

In this example, the primary language of the document is French ("fr"). One paragraph is declared to be in Spanish ("es"), after which the primary language returns to French. The following paragraph includes an embedded Japanese ("ja") phrase, after which the primary language returns to French.
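    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
       "http://www.w3.org/TR/html4/strict.dtd">
    <HTML lang="fr">
    <HEAD>
    <TITLE>Un document multilingue</TITLE>
    </HEAD>
    <BODY>
    ...Interpreted as French...
    <P lang="es">...Interpreted as Spanish...
    <P>...Interpreted as French again...
    <P>...French text interrupted by <EM lang="ja">some
       Japanese</EM> French begins here again...
    </BODY>
    </HTML>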
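
The language-code grammar and the order of precedence described in this section can be sketched in a few lines of Python. This is a minimal illustration, not part of the specification: the function names `parse_language_code` and `effective_language` are hypothetical, the limit of 1 to 8 ASCII letters per piece is an assumption based on RFC 1766 rather than on this section, and codes are lowercased only because the attribute is case-insensitive ([CI]).

```python
import re
from typing import List, Optional, Sequence, Tuple

# Grammar from section 8.1.1: language-code = primary-code ( "-" subcode )*
# Assumption: each piece is 1 to 8 ASCII letters (taken from RFC 1766).
_CODE_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def parse_language_code(code: str) -> Tuple[str, List[str]]:
    """Split a language code into (primary code, list of subcodes)."""
    if not _CODE_RE.fullmatch(code):
        raise ValueError(f"not a well-formed language code: {code!r}")
    primary, *subcodes = code.lower().split("-")
    return primary, subcodes


def effective_language(
    own_lang: Optional[str],
    ancestor_langs: Sequence[Optional[str]],
    content_language: Optional[str] = None,
    user_agent_default: Optional[str] = None,
) -> Optional[str]:
    """Resolve an element's language, highest precedence first.

    ancestor_langs lists the lang values of the element's ancestors,
    closest parent first; None means the attribute is not set.
    """
    if own_lang:                      # 1. lang on the element itself
        return own_lang
    for lang in ancestor_langs:       # 2. closest ancestor with lang set
        if lang:
            return lang
    if content_language:              # 3. HTTP Content-Language header
        return content_language
    return user_agent_default         # 4. user agent default / preference


if __name__ == "__main__":
    # The Japanese phrase in the example: EM has its own lang attribute.
    print(effective_language("ja", [None, None, "fr"]))
    # The third paragraph: no lang of its own, so it inherits from HTML.
    print(effective_language(None, [None, "fr"]))
    print(parse_language_code("en-US"))
```

Passing the ancestors as a flat list keeps the sketch independent of any particular document tree API; a real user agent would walk its own DOM structure instead.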

Note. Table cells may inherit lang values not from their parent but from the first cell in a span. Please consult the section on alignment inheritance for details.

8.1.3 Interpretation of language codes

In the context of HTML, a language code should be interpreted by user agents as a hierarchy of tokens rather than a single token. When a user agent adjusts rendering according to language information (say, by comparing style sheet language codes and lang values), it should always favor an exact match, but should also consider matching primary codes to be sufficient. Thus, if the lang attribute value of "en-US" is set for the HTML element, a user agent should prefer style information that matches "en-US" first, then the more general value "en".

Note. Language code hierarchies do not guarantee that all languages with a common prefix will be understood by those fluent in one or more of those languages. They do allow a user to request this commonality when it is true for that user.

8.2 Specifying the direction of text and tables: the dir attribute

Attribute definitions

dir = LTR | RTL [CI]
    This attribute specifies the base direction of directionally neutral text (i.e., text that doesn't have inherent directionality as defined in [UNICODE]) in an element's content and attribute values. It also specifies the directionality of tables. Possible values:

    - LTR: Left-to-right text or table.
    - RTL: Right-to-left text or table.

In addition to specifying the language of a document with the lang attribute, authors may need to specify the base directionality (left-to-right or right-to-left) of portions of a document's text, of table structure, etc. This is done with the dir attribute.

The [UNICODE] specification assigns directionality to characters and defines a (complex) algorithm for determining the proper directionality of text. If a document does not contain a displayable right-to-left character, a conforming user agent is not required to apply the [UNICODE] bidirectional algorithm. If a document contains right-to-left characters, and if the user agent displays these characters, the user agent must use the bidirectional algorithm.
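
The condition in the preceding paragraph (whether a document contains any right-to-left character) can be approximated in code. The following is a minimal Python sketch, assuming that "right-to-left character" means a character whose Unicode bidirectional class is R or AL. The function name `contains_rtl` is hypothetical, and the check does not consider whether the user agent can actually display the character.

```python
import unicodedata

# Bidirectional classes of strongly right-to-left characters:
# "R" (e.g., Hebrew letters) and "AL" (Arabic letters).
# Assumption: these two classes are what "right-to-left character" means here.
RTL_CLASSES = {"R", "AL"}


def contains_rtl(text: str) -> bool:
    """Return True if any character in text is strongly right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


if __name__ == "__main__":
    print(contains_rtl("english only"))         # no Hebrew or Arabic letters
    print(contains_rtl("english \u05d0\u05d1"))  # contains Hebrew alef and bet
```

A user agent could use a check of this kind to skip the bidirectional algorithm entirely for documents that are purely left-to-right.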
Although Unicode specifies special characters that deal with text direction, HTML offers higher-level markup constructs that do the same thing: the dir attribute (do not confuse with the DIR element) and the BDO element. Thus, to express a Hebrew quotation, it is more intuitive to write

    <Q lang="he" dir="rtl">...a Hebrew quotation...</Q>

than the equivalent with Unicode references:

    &#x202B;&#x05F4;...a Hebrew quotation...&#x05F4;&#x202C;

User agents must not use the lang attribute to determine text directionality.

The dir attribute is inherited and may be overridden. Please consult the section on the inheritance of text direction information for details.

8.2.1 Introduction to the bidirectional algorithm

The following example illustrates the expected behavior of the bidirectional algorithm. It involves English, a left-to-right script, and Hebrew, a right-to-left script.

Consider the following example text:

    english1 HEBREW2 english3 HEBREW4 english5 HEBREW6

The characters in this example (and in all related examples) are stored in the computer the way they are displayed here: the first character in the file is "e", the second is "n", and the last is "6".

Suppose the predominant language of the document containing this paragraph is English. This means that the base direction is left-to-right. The correct presentation of this line would be:

    english1 2WERBEH english3 4WERBEH english5 6WERBEH
             <------          <------          <------
                H                H                H
    ------------------------------------------------->
                            E

The dotted lines indicate the structure of the sentence: English predominates and some Hebrew text is embedded. Achieving the correct presentation requires no additional markup since the Hebrew fragments are reversed correctly by user agents applying the bidirectional algorithm.

If, on the other hand, the predominant language of the document is Hebrew, the base direction is right-to-left. The correct presentation is therefore:

    6WERBEH english5 4WERBEH english3 2WERBEH english1
            -------> -------> ------->
               E        E        E
    <-------------------------------------------------
                             H

8.2.2 Inheritance of text direction information

Block-level elements inherit the dir attribute, and a nested element may override the inherited value by setting dir itself. For example:

    <HTML dir="RTL">
    <HEAD>
    <TITLE>...a right-to-left title...</TITLE>
    </HEAD>
    ...right-to-left text...
    <P dir="ltr">...left-to-right text...</P>
    <P>...right-to-left text again...</P>
    </HTML>

Inline elements, on the other hand, do not inherit the dir attribute. This means that an inline element without a dir attribute does not open an additional level of embedding with respect to the bidirectional algorithm. (Here, an element is considered to be block-level or inline based on its default presentation. Note that the INS and DEL elements can be block-level or inline depending on their context.)

8.2.3 Setting the direction of embedded text

The [UNICODE] bidirectional algorithm automatically reverses embedded character sequences according to their inherent directionality (as illustrated by the previous examples). However, in general only one level of embedding can be accounted for. To achieve additional levels of embedded direction changes, you must make use of the dir attribute on an inline element.

Consider the same example text as before:

    english1 HEBREW2 english3 HEBREW4 english5 HEBREW6

Suppose the predominant language of the document containing this paragraph is English. Furthermore, the above English sentence contains a Hebrew section extending from HEBREW2 through HEBREW4 and the Hebrew section contains an English quotation (english3). The desired presentation of the text is thus:

    english1 4WERBEH english3 2WERBEH english5 6WERBEH
                     ------->
                        E
             <-----------------------
                        H
    ------------------------------------------------->
                            E

To achieve two embedded direction changes, we must supply additional information, which we do by delimiting the second embedding explicitly. In this example, we use the SPAN element and the dir attribute to mark up the text:

    english1 <SPAN dir="RTL">HEBREW2 english3 HEBREW4</SPAN> english5 HEBREW6

Authors may also use special Unicode characters to achieve multiple embedded direction changes. To achieve left-to-right embedding, surround embedded text with the characters LEFT-TO-RIGHT EMBEDDING ("LRE", hexadecimal 202A) and POP DIRECTIONAL FORMATTING ("PDF", hexadecimal 202C). To achieve right-to-left embedding, surround embedded text with the characters RIGHT-TO-LEFT EMBEDDING ("RLE", hexadecimal 202B) and PDF.
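
The character-based approach can be illustrated with a short Python sketch that wraps a string in an embedding character and the closing POP DIRECTIONAL FORMATTING character. The helper name `embed` and its "LTR"/"RTL" argument convention are hypothetical, chosen only to mirror the values of the dir attribute; the three code points are the ones named above.

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed(text: str, direction: str) -> str:
    """Surround text with an embedding character and a closing PDF.

    direction is "LTR" or "RTL" (case-insensitive), mirroring the dir
    attribute; this naming is an assumption of the sketch.
    """
    direction = direction.upper()
    if direction == "LTR":
        opener = LRE
    elif direction == "RTL":
        opener = RLE
    else:
        raise ValueError(f"direction must be LTR or RTL, got {direction!r}")
    return opener + text + PDF


if __name__ == "__main__":
    # Character-level counterpart of the SPAN dir="RTL" example.
    line = "english1 " + embed("HEBREW2 english3 HEBREW4", "rtl") + " english5 HEBREW6"
    print([hex(ord(ch)) for ch in line if ord(ch) > 0x2000])
```

Because the formatting characters are invisible, printing their code points (as the usage lines do) is a more reliable check than inspecting the rendered string.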
Using HTML directionality markup with Unicode characters. Authors and designers of authoring software should be aware that conflicts can arise if the dir attribute is used on inline elements (including BDO) concurrently with the corresponding [UNICODE] formatting characters. Preferably one or the other should be used exclusively. The markup method offers a better guarantee of document structural integrity and alleviates some problems when editing bidirectional HTML text with a simple text editor, but some software may be more apt at using the [UNICODE] characters. If both methods are used, great care should be exercised to ensure proper nesting of markup and directional embedding or override; otherwise, rendering results are undefined.

8.2.4 Overriding the bidirectional algorithm: the BDO element

Start tag: required, End tag: required

Attribute definitions

dir = LTR | RTL [CI]
    This mandatory attribute specifies the base direction of the element's text content. This direction overrides the inherent directionality of characters as defined in [UNICODE]. Possible values:

    - LTR: Left-to-right text.
    - RTL: Right-to-left text.

Attributes defined elsewhere

- lang (language information)

The bidirectional algorithm and the dir attribute generally suffice to manage embedded direction changes. However, some situations may arise when the bidirectional algorithm results in incorrect presentation. The BDO element allows authors to turn off the bidirectional algorithm for selected fragments of text.

Consider a document containing the same text as before:

    english1 HEBREW2 english3 HEBREW4 english5 HEBREW6

but assume that this text has already been put in visual order. One reason for this may be that the MIME standard ([RFC2045], [RFC1556]) favors visual order, i.e., that right-to-left character sequences are inserted right-to-left in the byte stream. In an email, the above might be formatted, including line breaks, as:

    english1 2WERBEH english3
    4WERBEH english5 6WERBEH

This conflicts with the [UNICODE] bidirectional algorithm, because that algorithm would invert 2WERBEH, 4WERBEH, and 6WERBEH a second time, displaying the Hebrew words left-to-right instead of right-to-left.
The solution in this case is to override the bidirectional algorithm by putting the email excerpt in a PRE element (to conserve line breaks) and each line in a BDO element, whose dir attribute is set to LTR:

    <PRE>
    <BDO dir="LTR">english1 2WERBEH english3</BDO>
    <BDO dir="LTR">4WERBEH english5 6WERBEH</BDO>
    </PRE>

This tells the bidirectional algorithm "Leave me left-to-right!" and would produce the desired presentation:

    english1 2WERBEH english3
    4WERBEH english5 6WERBEH

The BDO element should be used in scenarios where absolute control over sequence order is required (e.g., multi-language part numbers). The dir attribute is mandatory for this element.

Authors may also use special Unicode characters to override the bidirectional algorithm: LEFT-TO-RIGHT OVERRIDE (hexadecimal 202D) or RIGHT-TO-LEFT OVERRIDE (hexadecimal 202E). The POP DIRECTIONAL FORMATTING (hexadecimal 202C) character ends either bidirectional override.

Note. Recall that conflicts can arise if the dir attribute is used on inline elements (including BDO) concurrently with the corresponding [UNICODE] formatting characters.

Bidirectionality and character encoding. According to [RFC1555] and [RFC1556], there are special conventions for the use of "charset" parameter values to indicate bidirectional treatment in MIME mail, in particular to distinguish between visual, implicit, and explicit directionality. The parameter value "ISO-8859-8" (for Hebrew) denotes visual encoding, "ISO-8859-8-i" denotes implicit bidirectionality, and "ISO-8859-8-e" denotes explicit directionality.

Because HTML uses the Unicode bidirectionality algorithm, conforming documents encoded using ISO 8859-8 must be labeled as "ISO-8859-8-i". Explicit directional control is also possible with HTML, but cannot be expressed with ISO 8859-8, so "ISO-8859-8-e" should not be used.

The value "ISO-8859-8" implies that the document is formatted visually, misusing some markup (such as TABLE with right alignment and no line wrapping) to ensure reasonable display on older user agents that do not handle bidirectionality. Such documents do not conform to the present specification. If necessary, they can be made to conform to the current specification (and at the same time will be displayed correctly on older user agents) by adding BDO markup where necessary. Contrary to what is said in [RFC1555] and [RFC1556], ISO-8859-6 (Arabic) is not visual ordering.
8.2.5 Character references for directionality and joining control

Since ambiguities sometimes arise as to the directionality of certain characters (e.g., punctuation), the [UNICODE] specification includes characters to enable their proper resolution. Also, Unicode includes some characters to control joining behavior where this is necessary (e.g., some situations with Arabic letters). HTML 4 includes character references for these characters.

The following DTD excerpt presents some of the directional entities:

    <!ENTITY zwnj CDATA "&#8204;"--=zero width non-joiner-->
    <!ENTITY zwj  CDATA "&#8205;"--=zero width joiner-->
    <!ENTITY lrm  CDATA "&#8206;"--=left-to-right mark-->
    <!ENTITY rlm  CDATA "&#8207;"--=right-to-left mark-->

The zwnj entity is used to block joining behavior in contexts where joining will occur but shouldn't. The zwj entity does the opposite; it forces joining when it wouldn't occur but should. For example, the Arabic letter "HEH" is used to abbreviate "Hijri", the name of the Islamic calendar system. Since the isolated form of "HEH" looks like the digit five as employed in Arabic script (based on Indic digits), the initial form of "HEH" is used to prevent "HEH" from being mistaken for a final digit five in a year. However, there is no following context (i.e., a joining letter) to which the "HEH" can join. The zwj character provides that context.

Similarly, in Persian texts, there are cases where a letter that normally would join a subsequent letter in a cursive connection should not. The character zwnj is used to block joining in such cases.

The other characters, lrm and rlm, are used to force directionality of directionally neutral characters. For example, if a double quotation mark comes between an Arabic (right-to-left) and a Latin (left-to-right) letter, the direction of the quotation mark is not clear (is it quoting the Arabic text or the Latin text?). The lrm and rlm characters have a directional property but no width and no word/line break property. Please consult [UNICODE] for more details.

Mirrored character glyphs. In general, the bidirectional algorithm does not mirror character glyphs but leaves them unaffected. Exceptions are characters such as parentheses (see [UNICODE], table 4-7). In cases where mirroring is desired, for example for Egyptian Hieroglyphs, Greek Boustrophedon, or special design effects, this should be controlled with styles.
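
The four character references discussed in this section, and the mirroring property mentioned in the last paragraph, can be inspected with Python's standard library. This is a minimal sketch for illustration only: the helper name `describe_reference` is hypothetical, and the properties reported come from the Unicode data bundled with the Python interpreter, not from this specification.

```python
import html
import unicodedata
from typing import Tuple


def describe_reference(name: str) -> Tuple[int, str, str]:
    """Return (code point, Unicode name, bidirectional class) for a
    named character reference such as "lrm" (given without & and ;)."""
    char = html.unescape(f"&{name};")
    if len(char) != 1:
        raise ValueError(f"unknown character reference: {name!r}")
    return ord(char), unicodedata.name(char), unicodedata.bidirectional(char)


if __name__ == "__main__":
    for ref in ("zwnj", "zwj", "lrm", "rlm"):
        code_point, uni_name, bidi_class = describe_reference(ref)
        print(f"&{ref}; U+{code_point:04X} {uni_name} (bidi class {bidi_class})")

    # Parentheses are flagged as mirrored in bidirectional text;
    # ordinary letters are not.
    print(unicodedata.mirrored("("), unicodedata.mirrored("a"))
```

The bidirectional classes of lrm and rlm (strong left-to-right and strong right-to-left) are what let them settle the direction of a neighbouring neutral character such as a quotation mark.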
8.2.6 The effect of style sheets on bidirectionality

In general, using style sheets to change an element's visual rendering from block-level to inline or vice-versa is straightforward. However, because the bidirectional algorithm relies on the inline/block-level distinction, special care must be taken during the transformation.

When an inline element that does not have a dir attribute is transformed to the style of a block-level element by a style sheet, it inherits the dir attribute from its closest parent block element to define the base direction of the block.

When a block element that does not have a dir attribute is transformed to the style of an inline element by a style sheet, the resulting presentation should be equivalent, in terms of bidirectional formatting, to the formatting obtained by explicitly adding a dir attribute (assigned the inherited value) to the transformed element.


