; TeX output 2011.12.20:1613K [( <DtGGcmr17The7tmathematicsofRAID-6[XQ cmr12H.PreterAnving cmmi12pFirstvrersion20January2004 'LastupSdated20Decemrber2011 93"K`y cmr10RAID-6suppGortslosinganytwodrives.yThewaythisisdoneisbycomputingtwo$"syndromes,UUgenerallyreferred*"V cmbx10PandQ. aH$".Nff cmbx121 cmmi10A8+BG=ABq.f0[t3.="TheUUadditiveidentityelement(0)isrepresentedbyf32ϻW=uf02g;Q8f02gJ=-hf04g #f02g>3ϻW=uf04g;Q8f02gJ=-hf08gf02g>4ϻW=uf08g;Q8f02gJ=-hf10gf02g>5ϻW=uf10g;Q8f02gJ=-hf20gf02g>6ϻW=uf20g;Q8f02gJ=-hf40gf02g>7ϻW=uf40g;Q8f02gJ=-hf80g9="(Note,UUhowever:qǸf02gß8 5N=f1dgG.)aR="F*orUUexample:T-'f8dg#=tAf80g-+8f08g+8f04g+8f01g ##=tAf02g=7詐+8f02gܟ3n/+8f02gܟ2+8f01g׍="Thus:tQA8f8dg=Af02gܟ7n/+Af02gܟ3+Af02gܟ2+A="or,UUequivqalently*,晍ݵA8f8dg=(((Af02gܟ45O)+A)f02g+A)f02gܟ2n/+A`2 K [( +[s12.="R}'aisingոtoapower(repGeatedmultiplicationwiththesamevqalue)iscongruentmoGd255 ="(cardinalityvZofallelementsexceptf00gV).AlsonotethattheexpGonentisanor}'dinary="inte}'germodulo255UU(anelementin' msbm10Z255 uY)asoppGosedtoaGalois eldelement. doA^256d=,f01g8Ad=, AoA^255d=,f01goA^254d=,A^255 uY=qAd=, f01g=qAϲ=*dA^1Abu cmex109 Ab= Ab;L/A6=f00g" +[s13.="Thereareelements(g[ٲ),Jcalledgener}'ators,Iofthe eldsuchthatg[ٟ^ 0ercmmi7ndoGesn'trepeatuntil="theyhaveexhaustedallelementsofthe eldexceptf00g.XBF*ortheLinuxRAID-6 eld="representation,bf02gIòisdsucheagenerator{asisf02gX`nCforanynwhichdisrelativeprime="toUU255.+[s14.="Accordingly*,weucansimplifythisbyuprecomputingafew $"multiplicationUUtables:5܍ϵA~в=<$Վg[ٟ^y@Lxy!wfe5q) (֍g[ٟry@Lx,%+8f01g䕲=g[ٟy@Lx,%8(g[ٟy@Lx+f01gܲ)1י(17)<hB~в=<$ҵg[ٟ^xy!wfe5q) (֍g[ٟry@Lx,%+8f01g䕲=g[ٟx ݞ8(g[ٟy@Lx,%+f01gܲ)1י(18)y3"...qwhichUUonlydepGendonxandy.asopposedtoonthedatabytes.3"TheUUexpressionthenbGecomes:{PDx=A8(P+Pxy Wk)+BQ(Q+Qxy)י(19){3"W*eUUcanthengetDy#ܲfromthepreviousexpression:Dy=(P8+Pxy Wk)+Dxי(20) v$"3costofcomputing $"theQsyndrome.PThebiggestcostisrelatedtothecostofGalois eldmultiplication,which$"doGesn'tmapconvenientlyontostandardCPUahardware,andthereforehastypicallybGeen$"doneUUbytableloGokup.3"T*ableQloGokups,however,areQinherentlyserializing;itwouldbGedesirabletomakeuseof$"theUUwidedatapathsofcurrentCPUs.3"InUUordertodothis,wefactorequation2assuch:Q=((:::Dn1:::)8g+D2|s)g+D1|s)g+D0י(21)̍3"TheqonlyropGerationsinthisisaddition,i.e.BXOR,andmultiplicationbyqg"=f02gG.BThus,$"weonlyneedanecientwaytoimplementmultiplicationbyf02ginordertocomputeQ$"quickly*,UUnotarbitrarymultiplication.3"MultiplicationUUbyf02g*forasinglebytecanbGeimplemetedusingtheCcode:$"uint8_t?c,cc;$"cc?=(c<<1)^((c&0x80)?0x1d:0);{3"Now,wewanttodothisonmultiplebytesinparallel.YAssumeforthemomentweare$"ona32-bitmachine(theextensionto64bitsshouldbGeobvious),andseparatetheseinto$"twoUUparts:$"uint32_t?v,vv;$"vv =?(v<<1)&0xfefefefe;$"vv?^=((v&0x00000080)?0x0000001d:0)+C~((v?&0x00008000)?0x00001d00:0)+C~((v?&0x00800000)?0x001d0000:0)+C~((v?&0x80000000)?0x1d000000:0);`5;K [( 3"The?0xfefefefe>ofthe rststatementmasksanybitsthatgetshiftedintothenextbyte. $"The}Ysecond}XstatementisclearlytoGocomplextobe}Xecientlyexecuted,Zhowever.If}Xwecan$"proGduceUUamaskbasedonthetopbitineachbyte,wecouldjustdo:7n$"uint32_t?v,vv;$"vv =?(v<<1)&0xfefefefe;$"vv?^=MASK(v)&0x1d1d1d1d;3"InUUstandardpGortableC,oneimplemenationofthisMASK()functionlookslike:$"uint32_t?MASK(uint32_tv)$"{.v?&=0x80808080;I/*Extractthetopbits*/.return?(v<<1)-(v>>7);/*OverflowonthetopbitisOK*/$"}3"Theresultis0x00foranybytewiththetopbitclear,P0xffforanybytewiththetop$"bitUUset.qThisisthealgorithmusedinthe leraid6int.uc.3"F*oradditionalspGeedimprovements,itisdesirabletouseanyintegervectorinstruction$"setǼthatǽhappGenstobeǽavqailableonthemachine,VsuchǼasMMXǟorSSE-2onx86,VAltiV*ec$"onPowerPC,etc.LTheseinstructionsetstypicallyhavequirksthatmaymakethemeasier$"orhardertousethantheintegerimplementation,}butusuallyeasier.F*orexample,the$"MMX/SSE-2FinstructionEPCMPGTB2convenientlyimplementsFtheMASK()functionwhencom-$"paringTagainstUzero,andthePADDB6instructionimplementstheshiftandmaskinthe rst$"lineUUoftheopGerationsonvvwhenaddedwithitself.3"Note/thatnoneofthiswillavoid/the/arbitrarymultiplicationsofequations5and19.$"Thus,.in$2-disk-degraded%moGde,performance$willbe%veryslow.NHowever,.itis%expGectedthat$"thatBwillAbGearareoGccurrence,andthatperformanceAwillnotmattersigni cantlyinthat$"case.$"6N cmbx123.1BSp`ecialnotesonPowerPCAltiVecandx86SSSE3uT$"TheFAltivecESIMDvectorinstructionsetEforPowerPChasEaspGecialinstruction, vperm,$"whichUUdoGesaparalleltablelookupusingthebottom vebitsofeachbyteinavector.3"ThiscanbGeusedtohandlearbitraryscalarvectormultiplication(asinequations5and$"19)UUquickly*,bydecompGosingthevector.3"ThisUUdecompGositionissimplyamatterofobservingthat,fromthedistributivelaw:5dVn=♌VaP+8Vb>A8Vn=♌A8VaP+AVb3"F*or3the4decompGositiontowork,(ntherecanonlybGe32pGossiblevqaluesforeachbyte4inVa$"orVbD;theeasiestsuchdecompGositionissimplyVabeingthelowfourbitsandVb\being$"thehighfourbits;sinceadditionisXORthisisavqaliddecompGosition,andthereareonly$"16UUpGossiblevqaluesofeach.3"Thus,ќforeachmultiplication(i.e.:vqalueofA)weneedtosetupapairofvectorregisters,$"onewhichcontains(Azf00gwv,ŵAf01g,ƵAf02g,...<6Af0fg)andonewhichcontains(Azf00gwv,$"A8f10gܲ,UUAf20g,UU...qǵAff0g).3"If^thesevectors^areinv12andv13respGectively*,`andv14setupto^contain(f04g,`f04g,$"...),UUwecancomputev1g A8v0/thisUUway:`6GK [( N"xvsrb v1,?v0,v14 N"xvperm?v2,v12,v12,v0N"xvperm?v1,v13,v13,v1N"xvxor v1,?v2,v13"OnxmostAltivecproGcessors,̥thiswillexecuteinthreecycles.8Notethatwedon'tactually$"needtomaskthetopbitsforthe rstvperm; sincewerepGeatv12twicewee ectivelyignore$"bitUU4,andbits5-7areignoredbythehardwareanyway*.3"TheSSSE3(SupplementalSSE3)extensionstothex86instructionsetincludesaPSHUFB$"instruction,whichcanbGeusedinasimilarway*.xPSHUFBusesbit7asacontrolbit,which$"meansՋthatՊthelowerՋhalfopGerationhastobGemasked;$simplyreplicatingtheinputswillnot$"help.WF*urthermore,since/no0PSRAB instructionexists,one0alsohastomaskthehighhalf.$"Thus,UUasabGove,withxmm14havingthescalarconstantf0fg:N"xmovdqa xmm2,?xmm0N"xpsrawxmm0,?4N"xmovdqa xmm1,?xmm12N"xmovdqa xmm3,?xmm13N"xpandxmm0,?xmm14N"xpandxmm2,?xmm14N"xpshufb xmm1,?xmm0N"xpshufb xmm3,?xmm2N"xpxorxmm1,?xmm3!č$"4P8+P0Q=Dzq+Xzי(24)_;pQ?vw߲=>Q8+Q0Q=g[ٟz0JDzq+g[ٟzXzb=g[ٟz(Dzq+Xz)=g[ٟz8P?י(25)3"By/\assumption,6Xzb6=Dzandthus/]P^?_6=f00gG.eF*urthermore,6sinceg[ٟ^z6=f00gvpforany/\zp, $"Q^?_6=f00gG.3"ThusUUititvqalidtostate:Q?=P?_=g[ٟzי(26)`7WK [( 3"SinceUU0z7noted?abGove,9forthecase?ofacorruptdatadrive,8P^?ʁ6=1f00g,9andQ^?ʀ6=1f00g.1The$"other{pGossiblecasescanbetriviallyshownztoresultinvqariouscombinationswhichinvolve$"P^?9and/orUUQ^?bGeingzero:.gzffVd_G ffP^?  ff/Q^?N fflffrfdͤ ffΟfdNoUUcorruption' ffq=f00g ff@=f00g ffffrͤ ffΟfdPUUdrivecorruptionw~ ffq6=f00g ff@=f00g ffffrͤ ffΟfdQUUdrivecorruptionb ffq=f00g ff@6=f00g ffffrͤ ffΟfdDataUUdrivecorruption͡ ffq6=f00g ff@6=f00g ffffr.qƍ3"or,UUequivqalently:g+ffYG ff P^03 ff0Q^0r ff}fffdͤ ffΟfdNoUUcorruption' ffqP=P^0 ff\sQ=Q^0 ffffͤ ffΟfdPUUdrivecorruptionw~ ffqP6=P^0 ff\sQ=Q^0 ffffͤ ffΟfdQUUdrivecorruptionb ffqP=P^0 ff\sQ6=Q^0 ffffͤ ffΟfdDataUUdrivecorruption͡ ffqP6=P^0 ff\sQ6=Q^0 ffff3"Obviously*,forthecasesofPorQdrivecorruption,justreplacethecorruptdatawith $"the$ recomputedP^0ForQ^09,-respGectively*.aZInthecaseofdatadrivecorruption,-oncethefaulty$"drive,hasbGeenidenti ed,recoverusingthePdriveinthesamewayasaone-diskerasure$"failure.3"ItshouldbGenotedthatalthoughwehaveusedscalarnotationforthecorruptdrive,data$"corruptionisactuallydetectedonabyteZbybytebasis.^Thus,thezeronesstestsshouldbGe$"doneforeachbyte,andz4inequation27reallyshouldbGeavectorresult, z.AUItit,ofcourse,a$"qualityofimplementationissuewhetherornotitispGossibletorecoverfrommultipledrives$"havingUUnon-overlappingcorruptionincorrespGondingsectorsorblocks.3"Finally*,44asawordofcautionitshouldbGenotedthatRAID-6byitselfcannot(inthe$"generalncase)evendetect,4nevermmindrecoverfrom,4dual-diskcorruption.Iftwodisksare$"corrupt0in0thesamebytepGositions,gtheabove0algorithm0will(again,ginthegeneralcase)$"introGduceyadditionaldatacorruptionbycorruptingzathirddrive.3However,Rthefollowing$"probabilisticٗpatternsarelikelytoٖbGeindicativeofsuchmultidiskcorruption,andaqual-$"ityi3implementationshouldtakei4appropriateaction,*suchasi4abGortingratherthanfurther$"corruptingUUdata:3"="zvqaluesUUinconsistentwiththenumbGerofdisks,forexamplez7=136whenn=20.3"="Inconsistent-z\IJvqalueswithinasinglehardwaresector.orbloGck.6OThisdoGesnotapply="toSoGccasionalbyteswithSnocorruption(P^?_=Q^?=f00gG)S{afterall,Tevenastanding="cloGckUUiscorrectonceevery12hours.`8 dK [( $"5 cmmi10 0ercmmi7O \cmmi5K`y cmr10ٓRcmr7u cmex10z!