8 Language information and text direction

Contents

Specifying the language of content: the lang attribute
    Language codes
    Inheritance of language codes
    Interpretation of language codes
Specifying the direction of text and tables: the dir attribute
    Introduction to the bidirectional algorithm
    Inheritance of text direction information
    Setting the direction of embedded text
    Overriding the bidirectional algorithm: the BDO element
    Character references for directionality and joining control
    The effect of style sheets on bidirectionality

This section of the document discusses two important issues that affect the
internationalization of HTML: specifying the language (the lang attribute)
and direction (the dir attribute) of text in a document.

8.1 Specifying the language of content: the lang attribute

Attribute definitions

lang = language-code [CI]
This attribute specifies the base language of an element's attribute values
and text content. The default value of this attribute is unknown.

Language information specified via the lang attribute may be used by a user
agent to control rendering in a variety of ways. Some situations where
author-supplied language information may be helpful include:

- Assisting search engines
- Assisting speech synthesizers
- Helping a user agent select glyph variants for high quality typography
- Helping a user agent choose a set of quotation marks
- Helping a user agent make decisions about hyphenation, ligatures, and spacing
- Assisting spell checkers and grammar checkers

The lang attribute specifies the language of element content and attribute
values; whether it is relevant for a given attribute depends on the syntax
and semantics of the attribute and the operation involved.

The intent of the lang attribute is to allow user agents to render content
more meaningfully based on accepted cultural practice for a given language.
This does not imply that user agents should render characters that are
atypical for a particular language in less meaningful ways; user agents must
make a best attempt to render all characters, regardless of the value
specified by lang.

For instance, if characters from the Greek alphabet appear in the midst of
English text:
...Interpreted as French again...
...French text interrupted by

Note. Table cells may inherit lang values not from their parent but from the
first cell in a span. Please consult the section on alignment inheritance for
details.

8.1.3 Interpretation of language codes

In the context of HTML, a language code should be interpreted by user agents
as a hierarchy of tokens rather than a single token. When a user agent adjusts
rendering according to language information (say, by comparing style sheet
language codes and lang values), it should always favor an exact match, but
should also consider matching primary codes to be sufficient. Thus, if the
lang attribute value of "en-US" is set for the HTML
element, a user agent should prefer style information that matches "en-US"
first, then the more general value "en".
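
The matching order described above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name match_language and the idea of passing the style sheet's language codes as a list are assumptions made for the example. It tries the full code first and then drops trailing subcodes one at a time, comparing case-insensitively because the attribute is marked [CI].

```python
def match_language(lang_value, available):
    """Return the entry of `available` that best matches `lang_value`, or None.

    An exact (case-insensitive) match wins; otherwise the most specific
    subcode is dropped repeatedly, so "en-US" falls back to "en".
    """
    by_lower = {code.lower(): code for code in available}
    tokens = lang_value.lower().split("-")
    while tokens:
        candidate = "-".join(tokens)
        if candidate in by_lower:
            return by_lower[candidate]
        tokens.pop()  # drop the last subcode and try the more general code
    return None


if __name__ == "__main__":
    print(match_language("en-US", ["fr", "en", "en-US"]))
    print(match_language("en-US", ["fr", "en"]))
```

With both "en" and "en-US" available, the exact code is chosen; with only "en" available, the primary code is accepted as sufficient.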

Note. Language code hierarchies do not guarantee that
all languages with a common prefix will be understood by those fluent in one or
more of those languages. They do allow a user to request this commonality when
it is true for that user.

8.2 Specifying the direction of text and tables: the dir attribute

Attribute definitions

dir = LTR | RTL [CI]
This attribute specifies the base direction of directionally neutral text
(i.e., text that doesn't have inherent directionality as defined in
[UNICODE]) in an element's content and attribute values. It also specifies
the directionality of tables.
Possible values:
- LTR: Left-to-right text or table.
- RTL: Right-to-left text or table.

In addition to specifying the language of a document with the lang
attribute, authors may need to specify the base
directionality (left-to-right or right-to-left) of portions of a
document's text, of table structure, etc. This is done with the dir
attribute.

The [UNICODE] specification assigns directionality to characters and
defines a (complex) algorithm for determining the proper directionality of
text. If a document does not contain a displayable right-to-left character, a
conforming user agent is not required to apply the [UNICODE] bidirectional
algorithm. If a document contains right-to-left characters, and if the user
agent displays these characters, the user agent must use the bidirectional
algorithm.
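
A user agent could decide whether the bidirectional algorithm is needed with a check along the following lines. This is a rough sketch under stated assumptions: the function name contains_rtl is hypothetical, "right-to-left character" is taken to mean a character whose Unicode bidirectional class is R or AL, and no attempt is made to judge whether a character is displayable.

```python
import unicodedata

# Strong right-to-left bidirectional classes: "R" (e.g. Hebrew letters)
# and "AL" (Arabic letters). Treating exactly these two classes as
# "right-to-left characters" is an assumption of this sketch.
RTL_CLASSES = {"R", "AL"}


def contains_rtl(text):
    """Return True if any character in `text` has a strong RTL bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


if __name__ == "__main__":
    print(contains_rtl("plain English text"))
    print(contains_rtl("english \u05e9\u05dc\u05d5\u05dd english"))
```

The first call sees only left-to-right and neutral characters; the second contains Hebrew letters, so the bidirectional algorithm would have to be applied.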

Although Unicode specifies special characters that deal with text direction,
HTML offers higher-level markup constructs that do the same thing: the dir
attribute (do not confuse with the DIR element) and the BDO
element. Thus, to express a Hebrew quotation, it is more intuitive to write
...right-to-left text again...

Inline elements, on the other hand, do not inherit the dir
attribute. This means that an inline element without a dir
attribute does not open an additional level of embedding with
respect to the bidirectional algorithm. (Here, an element is considered to be
block-level or inline based on its default presentation. Note that the INS and DEL
elements can be block-level or inline depending on their context.)

8.2.3 Setting the direction of embedded text

The [UNICODE] bidirectional algorithm automatically reverses embedded
character sequences according to their inherent directionality (as illustrated
by the previous examples). However, in general only one level of embedding can
be accounted for. To achieve additional levels of embedded direction changes,
you must make use of the dir attribute on an inline element.

Consider the same example text as before:
english1 HEBREW2 english3 HEBREW4 english5 HEBREW6

Suppose the predominant language of the document containing this paragraph
is English. Furthermore, the above English sentence contains a Hebrew section
extending from HEBREW2 through HEBREW4 and the Hebrew section contains an
English quotation (english3). The desired presentation of the text is thus:
english1 4WERBEH english3 2WERBEH english5 6WERBEH

To achieve two embedded direction changes, we must supply additional
information, which we do by delimiting the second embedding explicitly. In this
example, we use the SPAN element and the dir attribute to mark up the text:
english1
This tells the bidirectional algorithm "Leave me left-to-right!" and would
produce the desired presentation:
english1 2WERBEH english3 4WERBEH english5 6WERBEH

The BDO element should be used in scenarios where absolute control over
sequence order is required (e.g., multi-language part numbers). The
dir attribute is mandatory for this element.

Authors may also use special Unicode characters to override the
bidirectional algorithm -- LEFT-TO-RIGHT OVERRIDE (hexadecimal 202D) or
RIGHT-TO-LEFT OVERRIDE (hexadecimal 202E). The POP DIRECTIONAL FORMATTING
(hexadecimal 202C) character ends either bidirectional override.
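
For plain text outside of markup, the three characters named above can be applied as in the following sketch. The helper name override_direction and its "ltr"/"rtl" argument are hypothetical, chosen only to mirror the values of the dir attribute; the code points are the ones given in the paragraph above.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING, ends either override


def override_direction(text, direction):
    """Wrap `text` in a Unicode bidirectional override and terminate it."""
    direction = direction.lower()
    if direction == "ltr":
        start = LRO
    elif direction == "rtl":
        start = RLO
    else:
        raise ValueError("direction must be 'ltr' or 'rtl'")
    return start + text + PDF


if __name__ == "__main__":
    wrapped = override_direction("PART-123", "ltr")
    print([f"U+{ord(ch):04X}" for ch in (wrapped[0], wrapped[-1])])
```

Always closing the override with POP DIRECTIONAL FORMATTING keeps it from leaking into the text that follows.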

Note. Recall that conflicts can arise if the dir
attribute is used on inline elements (including BDO) concurrently with the
corresponding [UNICODE] formatting characters.

Bidirectionality and character encoding. According to
[RFC1555] and [RFC1556], there are special conventions for the use of
"charset" parameter values to indicate bidirectional treatment in MIME mail, in
particular to distinguish between visual, implicit, and explicit
directionality. The parameter value "ISO-8859-8" (for Hebrew) denotes visual
encoding, "ISO-8859-8-i" denotes implicit bidirectionality, and "ISO-8859-8-e"
denotes explicit directionality.

Because HTML uses the Unicode bidirectionality algorithm, conforming
documents encoded using ISO 8859-8 must be labeled as "ISO-8859-8-i". Explicit
directional control is also possible with HTML, but cannot be expressed with
ISO 8859-8, so "ISO-8859-8-e" should not be used.

The value "ISO-8859-8" implies that the document is formatted visually,
misusing some markup (such as
TABLE with right alignment and no line wrapping)
to ensure reasonable display on older user agents that do not handle
bidirectionality. Such documents do not conform to the present specification.
If necessary, they can be made to conform to the current specification (and at
the same time will be displayed correctly on older user agents) by adding BDO
markup where necessary. Contrary to what is said in
[RFC1555] and [RFC1556], ISO-8859-6 (Arabic) is not
visual ordering.

8.2.5 Character references for directionality and joining control

Since ambiguities sometimes arise as to the directionality of certain
characters (e.g., punctuation), the [UNICODE] specification
includes characters to enable their proper resolution. Also, Unicode includes
some characters to control joining behavior where this is necessary (e.g., some
situations with Arabic letters). HTML 4 includes character references for these characters.

The following DTD excerpt presents some of the directional entities:

The zwnj entity is used to block joining behavior in contexts
where joining will occur but shouldn't. The zwj entity does the
opposite; it forces joining when it wouldn't occur but should. For example, the
Arabic letter "HEH" is used to abbreviate "Hijri", the name of the Islamic
calendar system. Since the isolated form of "HEH" looks like the digit five as
employed in Arabic script (based on Indic digits), in order to prevent
confusing "HEH" as a final digit five in a year, the initial form of "HEH" is
used. However, there is no following context (i.e., a joining letter) to which
the "HEH" can join. The zwj character provides that context.

Similarly, in Persian texts, there are cases where a letter that normally
would join a subsequent letter in a cursive connection should not. The
character zwnj is used to block joining in such cases.

The other characters, lrm and rlm, are used to
force directionality of directionally neutral characters. For example, if a
double quotation mark comes between an Arabic (right-to-left) and a Latin
(left-to-right) letter, the direction of the quotation mark is not clear (is it
quoting the Arabic text or the Latin text?). The lrm and
rlm characters have a directional property but no width and no word/line
break property. Please consult [UNICODE] for more
details.
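
To see which characters the four references discussed above stand for, one can resolve them with Python's standard library. This is an illustrative sketch, not a reproduction of the DTD: the helper name describe_entity is an assumption, and the reported values come from the html and unicodedata modules.

```python
import html
import unicodedata

ENTITY_NAMES = ("zwnj", "zwj", "lrm", "rlm")


def describe_entity(name):
    """Resolve an entity name to (code point, Unicode name, bidi class)."""
    ch = html.unescape(f"&{name};")
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))


if __name__ == "__main__":
    for name in ENTITY_NAMES:
        print(name, *describe_entity(name))
```

The two joining controls carry the neutral class BN, while lrm and rlm carry the strong classes L and R, which is what lets them settle the direction of an adjacent neutral character such as a quotation mark.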

Mirrored character glyphs. In general, the
bidirectional algorithm does not mirror character glyphs but leaves them
unaffected. Exceptions are characters such as parentheses (see
[UNICODE], table 4-7). In cases where mirroring is desired, for example for
Egyptian Hieroglyphs, Greek boustrophedon, or special design effects, this
should be controlled with styles.

8.2.6 The effect of style sheets on bidirectionality

In general, using style sheets to change an element's visual rendering from
block-level to inline or vice-versa is straightforward. However, because the
bidirectional algorithm relies on the
inline/block-level distinction, special care must be taken during the
transformation.

When an inline element that does not have a dir attribute is transformed to
the style of a block-level element by a style sheet, it inherits the dir
attribute from its closest parent block element to define the base direction of
the block.

When a block element that does not have a dir attribute is transformed to
the style of an inline element by a style sheet, the resulting presentation
should be equivalent, in terms of bidirectional formatting, to the formatting
obtained by explicitly adding a
dir attribute (assigned the inherited value) to
the transformed element.