ELSEVIER
Chemometrics and Intelligent Laboratory Systems 35 (1996) 45-65

Missing data methods in PCA and PLS: Score calculations with incomplete observations

Philip R.C. Nelson, Paul A. Taylor *, John F. MacGregor

Department of Chemical Engineering, McMaster University, Hamilton, ON, Canada, L8S 4L8

Received 8 June 1995; revised 27 November 1995; accepted 11 January 1996

* Corresponding author. Fax: +1 905 521 1350.
0169-7439/96/$15.00 Copyright © 1996 Elsevier Science B.V. All rights reserved.
PII S0169-7439(96)00007-X

Abstract

A very important problem in industrial applications of PCA and PLS models, such as process modelling or monitoring, is the estimation of scores when the observation vector has missing measurements. The alternative of suspending the application until all measurements are available is usually unacceptable. The problem treated in this work is that of estimating scores from an existing PCA or PLS model when new observation vectors are incomplete. Building the model with incomplete observations is not treated here, although the analysis given in this paper provides considerable insight into this problem. Several methods for estimating scores from data with missing measurements are presented, and analysed: a method, termed single component projection, derived from the NIPALS algorithm for model building with missing data; a method of projection to the model plane; and data replacement by the conditional mean. Expressions are developed for the error in the scores calculated by each method. The error analysis is illustrated using simulated data sets designed to highlight problem situations. A larger industrial data set is also used to compare the approaches. In general, all the methods perform reasonably well with moderate amounts of missing data (up to 20% of the measurements). However, in extreme cases where critical combinations of measurements are missing, the conditional mean replacement method is generally superior to the other approaches.

Keywords: PCA; PLS; Missing data; NIPALS algorithm; EM algorithm

1. Introduction

There are many reasons why measurements may be missing from a data set. Missing measurements occur periodically when sensors fail or are taken off-line for routine maintenance. In other situations, measurements are removed from a data set because gross measurement errors occur or samples are simply not collected at the required time. In these cases, the measurements are missed at random times. In other situations, missing measurements occur on a very regular basis. A common example occurs when sensors have different sampling periods.

PCA and PLS have been widely used to develop models from data sets composed of observations on large numbers of highly correlated variables. In many of these situations, particularly those involving industrial processes, missing measurements are a common occurrence. To insist on using only complete data sets when building or applying PCA or PLS models would entail throwing away large amounts of the data. Therefore, it is important that efficient methods for handling missing data be available for analysing and building multivariate models from such data.

Once a model has been built, it can be applied to future process data in inferential control schemes to predict process responses [1,2], or in multivariate statistical process control schemes to monitor and diagnose future process operating performance [3-8]. Since some future multivariate observations will also have missing measurements, these applications would be of limited value unless methods were available to handle missing data.

In this paper we consider the second problem, that of using future multivariate observations with missing data to estimate latent variable scores and to predict responses from an existing PCA or PLS model. We analyse the properties of various algorithms for handling missing data when the underlying model can be assumed to be fixed and known. The additional issues involved in iteratively building these models from data sets with missing measurements will be treated in a subsequent paper.

During model building, where the loading vectors are unknown, missing data is often treated using the NIPALS algorithm which computes one vector at a time [9,10]. Once a model has been built, and the loading vectors defined, missing data can be treated in PCA by simultaneously accounting for their effects in all latent variable dimensions [9,10] by projection to the model plane.

In this paper, we first discuss a method derived from the NIPALS algorithm for model building with missing data. We designate it the single component projection method. We develop expressions for the score estimation error arising from the missing data with this algorithm. This analysis reveals how errors enter and propagate in the single component projection method, thereby providing justification for using simultaneous projection methods and supplying insight into the sources of error that arise during model building with sequential methods. Two approaches which are not limited to considering a single direction at a time are then treated: (i) projection to the model plane and (ii) data replacement using the conditional mean. The mean squared score estimation errors are calculated for each of the methods when applied to simulation examples carefully constructed to accentuate the effects of certain types of errors. Finally, the methods are applied to an industrial set to illustrate how the methods work in practice.

2. Nomenclature

Lowercase bold variables, both Roman and Greek, are column vectors and uppercase ones, matrices. A superscript asterisk indicates a vector or matrix with rows corresponding to missing mea