Pattern Recognition Lecture Slides


Pattern Recognition
Nanyang Technological University
Dr. Shi, Daming, Harbin Engineering University

What is Pattern Recognition
- Classify raw data into the category of the pattern.
- A branch of artificial intelligence concerned with the identification of visual or audio patterns by computers, for example character recognition, speech recognition, face recognition, etc.
- Two categories: syntactic (or structural) pattern recognition and statistical pattern recognition.
- Pattern Recognition = Pattern Classification.

What is Pattern Recognition: Two Phases
- Training phase: training data -> feature extraction -> learning (feature selection, clustering, discriminant function generation, grammar parsing) -> knowledge.
- Recognition phase: unknown data -> feature extraction -> recognition (statistical, structural), guided by the learned knowledge -> results.

Categorisation
- Based on application areas: face recognition, speech recognition, character recognition, etc.
- Based on decision-making approaches: syntactic pattern recognition vs. statistical pattern recognition.

Syntactic Pattern Recognition
Any problem is described with a formal language, and the solution is obtained through grammatical parsing. (In memory of Prof. Fu, King-Sun and Prof. Shu Wenhao.)

Statistical Pattern Recognition
In the statistical approach, each pattern is viewed as a point in a multi-dimensional space. The decision boundaries are determined by the probability distributions of the patterns belonging to each class, which must either be specified or learned.

Scope of the Seminar
- Module 1: Distance-Based Classification
- Module 2: Probabilistic Classification
- Module 3: Linear Discriminant Analysis
- Module 4: Neural Networks for P.R.
- Module 5: Clustering
- Module 6: Feature Selection

Module 1: Distance-Based Classification

Overview
- Distance-based classification is the most common type of pattern recognition technique, and its concepts are a basis for other classification techniques.
- First, a prototype is chosen through training to represent a class; then the distance from an unknown sample to the class is calculated using the prototype.

Classification by Distance
Objects can be represented by vectors in a space. In training, we have labelled samples x_1, ..., x_n for each class; in recognition, an unknown sample x is assigned to the class whose prototype it is closest to under a chosen distance.

Prototype
To find the pattern-to-class distance, we need to use a class prototype (pattern):
(1) Sample Mean. For class c_i, take the mean of its training samples, z_i = (1/N_i) Σ_{x in c_i} x.
(2) Most Typical Sample. Choose z in c_i such that its total distance to the other training samples of c_i is minimized.
(3) Nearest Neighbour. Choose the training sample of c_i nearest to the unknown pattern. Nearest-neighbour prototypes are sensitive to noise and outliers in the training set.
(4) k-Nearest Neighbours. The pattern y is classified in the class most common among its k nearest neighbours from the training samples. k-NN is more robust against noise, but is more computationally expensive. The chosen distance determines how "near" is defined. A minimal sketch of prototype-based and k-NN classification follows below.
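
As an illustration of the prototype ideas above, here is a minimal Python/NumPy sketch (not from the original slides; the function names and toy data are invented for illustration):

```python
import numpy as np

def nearest_mean_classify(x, class_samples):
    """Assign x to the class whose sample-mean prototype is nearest (Euclidean)."""
    means = {c: s.mean(axis=0) for c, s in class_samples.items()}
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

def knn_classify(x, X_train, y_train, k=3):
    """Assign x to the majority class among its k nearest training samples."""
    d = np.linalg.norm(X_train - x, axis=1)      # distances to all training samples
    nearest = y_train[np.argsort(d)[:k]]         # labels of the k nearest
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Toy two-class data (e.g. two features of fish samples)
X = np.array([[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]])
y = np.array([0, 0, 1, 1])
print(nearest_mean_classify(np.array([0.9, 1.1]), {0: X[y == 0], 1: X[y == 1]}))  # -> 0
print(knn_classify(np.array([3.1, 3.0]), X, y, k=3))                              # -> 1
```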

Distance Measures
- The most familiar distance metric is the Euclidean distance: d_E(x, z) = sqrt( Σ_k (x_k - z_k)^2 ).
- Another example is the Manhattan distance: d_M(x, z) = Σ_k |x_k - z_k|.
- Many other distance measures exist.

Minimum Euclidean Distance (MED) Classifier
Assign x to the class c_i whose prototype z_i minimizes ||x - z_i||. Equivalently, since ||x - z_i||^2 = x^T x - 2 z_i^T x + z_i^T z_i, assign x to the class that maximizes the linear discriminant g_i(x) = z_i^T x - (1/2) z_i^T z_i (a numerical check follows below).
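
The equivalence above can be checked numerically; a small sketch with invented toy prototypes (not from the slides):

```python
import numpy as np

def med_class(x, prototypes):
    """MED decision: index of the nearest prototype."""
    return int(np.argmin([np.linalg.norm(x - z) for z in prototypes]))

def med_class_linear(x, prototypes):
    """Same decision via the linear discriminants g_i(x) = z_i.x - 0.5*z_i.z_i."""
    return int(np.argmax([z @ x - 0.5 * z @ z for z in prototypes]))

z = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
x = np.array([1.0, 2.0])
assert med_class(x, z) == med_class_linear(x, z)  # both pick class 0
```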

Decision Boundary
Given a prototype and a distance metric, it is possible to find the decision boundary between classes; it may be a linear boundary or a nonlinear boundary. Decision Boundary = Discriminant Function.
(Figure: linear vs. nonlinear decision boundaries in the lightness-length feature plane.)

Example
Any fish is a vector in the 2-dimensional space of width and lightness: x = (x_1, x_2)^T.
(Figure: fish samples and a decision boundary in the feature plane.)

Summary
- Classification by the distance from an unknown sample to class prototypes.
- Choosing a prototype: sample mean, most typical sample, nearest neighbour, k-nearest neighbours.
- Decision Boundary = Discriminant Function.

Module 2: Probabilistic Classification

Review and Extend

Maximum A Posteriori (MAP) Classifier
Ideally, we want to favour the class with the highest probability for the given pattern: choose the class C_i that maximizes P(C_i | x), where P(C_i | x) is the a posteriori probability of class C_i given x.

Bayesian Classification
Bayes' theorem:
    P(C_i | x) = P(x | C_i) P(C_i) / P(x),
where P(x | C_i) is the class-conditional probability density (p.d.f.), which needs to be estimated from the available samples or otherwise assumed, and P(C_i) is the a priori probability of class C_i.

MAP Classifier
The Bayesian classifier, also known as the MAP classifier, therefore assigns the pattern x to the class with the maximum weighted p.d.f., i.e. the class maximizing P(x | C_i) P(C_i). A minimal numerical sketch follows below.
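
A minimal sketch of the MAP rule with assumed 1-D Gaussian class-conditional densities (the parameter values are invented for illustration):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """1-D Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Assumed class-conditional densities and priors (illustrative values)
params = {"C1": (0.0, 1.0, 0.3), "C2": (2.0, 1.0, 0.7)}  # (mean, std, prior)

def map_classify(x):
    """MAP rule: pick the class maximizing P(x|Ci) * P(Ci); P(x) is a common factor."""
    return max(params, key=lambda c: gauss_pdf(x, *params[c][:2]) * params[c][2])

print(map_classify(0.5))  # -> 'C1': the likelihood advantage outweighs the smaller prior
```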

Accuracy vs. Risk
However, in the real world, life is not just about accuracy. In some cases, a small misclassification may result in a big disaster, for example in medical diagnosis or fraud detection. The MAP classifier is biased towards the most likely class (maximum likelihood classification).

Loss Function
On the other hand, in the case of P(C1) >> P(C2), the lowest error rate can be attained by always classifying as C1. This is also known as the problem of imbalanced training data. A solution is to assign a loss to misclassification, which leads to decisions that weigh the posteriors by those losses.

Conditional Risk
Instead of using the posterior P(C_i | x) alone, we use the conditional risk
    R(a_i | x) = Σ_j L(a_i | C_j) P(C_j | x),
where L(a_i | C_j) is the cost of taking action a_i given class C_j. To minimize the overall risk, choose the action with the lowest conditional risk for the pattern: a* = arg min_i R(a_i | x).

Example
Assume that the amount of fraudulent activity is about 1% of the total credit card activity:
    C1 = Fraud, P(C1) = 0.01;  C2 = No fraud, P(C2) = 0.99.

If losses are equal for misclassification, then the minimum-risk rule reduces to MAP and the classifier always decides "no fraud".

However, losses are probably not the same. Classifying a fraudulent transaction as legitimate leads to direct dollar losses as well as intangible losses (e.g. reputation, hassles for consumers). Classifying a legitimate transaction as fraudulent inconveniences consumers, as their purchases are denied, which could lead to loss of future business. Let's assume that the ratio of loss for "no fraud" to "fraud" is 1 to 50, i.e. a missed fraud is 50 times more expensive than accidentally freezing a card due to legitimate use.

By including the loss function, the decision boundaries change significantly: instead of deciding C1 when P(C1 | x) > P(C2 | x), we decide C1 when the loss-weighted posterior dominates, 50 P(C1 | x) > P(C2 | x). A worked sketch follows below.
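
A minimal worked version of this example (the loss matrix encodes the 1:50 ratio from the slides; the posterior value fed in is invented for illustration):

```python
# L[action][true_class]: cost of taking `action` when `true_class` holds
loss = {"flag":  {"fraud": 0.0,  "no_fraud": 1.0},
        "allow": {"fraud": 50.0, "no_fraud": 0.0}}

def best_action(posterior):
    """Choose the action minimizing R(a|x) = sum_j L(a|Cj) P(Cj|x)."""
    risk = {a: sum(loss[a][c] * posterior[c] for c in posterior) for a in loss}
    return min(risk, key=risk.get)

# A transaction with only a 5% posterior probability of fraud is still flagged,
# because risk(flag) = 0.95 < risk(allow) = 50 * 0.05 = 2.5.
print(best_action({"fraud": 0.05, "no_fraud": 0.95}))  # -> 'flag'
```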

Probability Density Function
Relatively speaking, it is much easier to estimate the a priori probability: simply take P(C_i) = N_i / Σ_k N_k, where N_i is the number of training samples of class C_i. To estimate the p.d.f., we can:
(1) assume a known form of p.d.f. and estimate its parameters, or
(2) estimate a non-parametric p.d.f. from the training samples.

Maximum Likelihood Parameter Estimation
Without loss of generality, we consider a Gaussian density, P(x | C_i) = N(x; mu_i, Sigma_i), with training examples D_i for class C_i and parameter values mu_i, Sigma_i to be identified. We look for the parameters that maximize the likelihood of D_i; the solution is the sample mean and the sample covariance matrix. A short sketch follows below.
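
A minimal sketch of the maximum-likelihood estimates for a Gaussian class-conditional density (toy samples invented for illustration):

```python
import numpy as np

# Toy training examples for one class (rows are samples)
D = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1]])

mu_hat = D.mean(axis=0)             # ML estimate of the mean
diff = D - mu_hat
sigma_hat = diff.T @ diff / len(D)  # ML estimate: the sample covariance matrix
                                    # (note: divides by N, not N-1)
print(mu_hat, sigma_hat, sep="\n")
```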

Density Estimation
If we do not know the specific form of the p.d.f., we need a different, non-parametric density estimation approach that uses variations of histogram approximation.
(1) The simplest density estimation is to use "bins": e.g. in the 1-D case, take the x-axis, divide it into bins of length h, and estimate the probability of a sample falling in each bin from k_N / N, where k_N is the number of the N samples that fall in the bin.
(2) Alternatively, we can take windows of unit volume and apply these windows to each sample; the overlap of the windows defines the estimated p.d.f. This technique is known as Parzen windows or kernels. See the sketch below.
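
A minimal 1-D Parzen-window sketch using a Gaussian kernel (the bandwidth h and the samples are invented for illustration):

```python
import numpy as np

def parzen_pdf(x, samples, h=0.5):
    """Average a Gaussian window of width h centred on every sample."""
    u = (x - samples[:, None]) / h                  # shape (n_samples, n_query)
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)    # kernel values
    return k.mean(axis=0) / h                       # normalized density estimate

samples = np.array([0.0, 0.3, 0.5, 2.0, 2.2])
xs = np.linspace(-1, 3, 5)
print(parzen_pdf(xs, samples))  # density is highest near the two clumps of samples
```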

Summary
- Bayes' theorem.
- Maximum a posteriori classifier (= maximum likelihood classifier when priors are equal).
- Density estimation.

Module 3: Linear Discriminant Analysis

Linear Classifier (1)
A linear classifier implements a discriminant function, or decision boundary, represented by a straight line in the multidimensional space. Given an input x = (x_1, ..., x_m)^T, the decision boundary of a linear classifier is given by the discriminant function
    f(x) = w^T x + b = Σ_{k=1..m} w_k x_k + b,
with weight vector w = (w_1, ..., w_m)^T.

Linear Classifier (2)
The output of the function f(x) for any input depends upon the weight vector and the input vector. For example, the following class definition may be employed:
    If f(x) > 0, then x is a ballet dancer; if f(x) < 0, then x is a rugby player.
See the sketch below.
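
A minimal sketch of such a two-class linear decision rule (the weight vector, bias, and feature vectors are invented for illustration):

```python
import numpy as np

w = np.array([-1.0, 0.5])   # illustrative weight vector
b = 2.0                     # illustrative bias

def classify(x):
    """The sign of the linear discriminant f(x) = w.x + b decides the class."""
    return "ballet dancer" if w @ x + b > 0 else "rugby player"

print(classify(np.array([1.6, 1.0])))  # f = -1.6 + 0.5 + 2.0 =  0.9 > 0 -> ballet dancer
print(classify(np.array([6.0, 2.0])))  # f = -6.0 + 1.0 + 2.0 = -3.0 < 0 -> rugby player
```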

Linear Classifier (3)
    f(x) = w^T x + b
(Figure: the line f(x) = 0 in the (x_1, x_2) plane separates the region where f(x) > 0 from the region where f(x) < 0.)

[...]

Module 5: Clustering

Clustering by Density Estimation
Thresholding a cell image: if the value of a pel (picture element) falls on one side of the threshold, the pel is cytoplasm; on the other side, the pel is nucleus. This is clustering based on density estimation: peaks = cluster centres, valleys = cluster boundaries.

Parameterized Density Estimation
We shall begin with parameterized p.d.f.s, in which the only thing that must be learned is the value of an unknown parameter vector. We make the following assumptions:
- The samples come from a known number c of classes.
- The prior probabilities P(ω_j) for each class are known.
- The forms P(x | ω_j, θ_j), j = 1, ..., c, are known.
- The values of the c parameter vectors θ_1, θ_2, ..., θ_c are unknown.

Mixture Density
- The category labels are unknown, and this density function is called a mixture density:
      P(x | θ) = Σ_{j=1..c} P(x | ω_j, θ_j) P(ω_j),   θ = (θ_1, ..., θ_c)^t,
  where the P(x | ω_j, θ_j) are the component densities and the P(ω_j) are the mixing parameters.
- Our goal is to use samples drawn from this mixture density to estimate the unknown parameter vector θ.
- Once θ is known, we can decompose the mixture into its components and use a MAP classifier on the derived densities. A small numerical sketch follows below.
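
A small illustration of a mixture density (two invented 1-D Gaussian components with mixing parameters 0.4 and 0.6):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

components = [(0.0, 1.0), (4.0, 0.5)]   # (mean, std) of each component (illustrative)
alphas = [0.4, 0.6]                     # mixing parameters, must sum to 1

def mixture_pdf(x):
    """P(x|theta) = sum_j alpha_j * P(x | omega_j, theta_j)."""
    return sum(a * gauss_pdf(x, mu, s) for a, (mu, s) in zip(alphas, components))

print(mixture_pdf(np.array([0.0, 2.0, 4.0])))  # bimodal: high near 0 and near 4
```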

Chinese Ying-Yang Philosophy
Everything in the universe can be viewed as a product of a constant conflict between the opposites Ying and Yang: Ying is negative, female, invisible; Yang is positive, male, visible. The optimal status is reached if Ying and Yang achieve harmony.

Bayesian Ying-Yang Clustering
- We want to find clusters y to partition the input data x.
- x is visible but y is invisible; x decides y in training, but y decides x in running.
- The joint density p(x, y) factorizes two ways: p(x, y) = p(y|x) p(x) (Yang) and p(x, y) = p(x|y) p(y) (Ying).

Bayesian Ying-Yang Harmony Learning (1)
- To minimise the difference between the Ying-Yang pair, use the Kullback-Leibler divergence
      KL(M_Yang, M_Ying) = ∫∫ p(y|x) p(x) ln [ p(y|x) p(x) / ( p(x|y) p(y) ) ] dx dy,
  which, with Gaussian components, becomes
      ∫∫ p(y|x) p(x) ln [ p(y|x) p(x) / ( α_y G(x; m_y, Σ_y) ) ] dx dy.
- To select the optimal model (cluster number): k* = arg min_k J(k), where
      H(k) = -(1/N) Σ_{i=1..N} Σ_{y=1..k} p(y|x_i) ln p(y|x_i),
      J(k) = H(k) + Σ_{y=1..k} α_y ln σ_y - Σ_{y=1..k} α_y ln α_y.

Bayesian Ying-Yang Harmony Learning (2)
Parameter learning uses the EM algorithm (see the sketch below):
- E-step:
      P(y_j | x_i) = α_j G(x_i; m_j, Σ_j) / Σ_{l=1..k} α_l G(x_i; m_l, Σ_l).
- M-step:
      α_j(new) = (1/N) Σ_{i=1..N} P(y_j | x_i),
      m_j(new) = Σ_i P(y_j | x_i) x_i / Σ_i P(y_j | x_i),
      Σ_j(new) = Σ_i P(y_j | x_i) (x_i - m_j(new))(x_i - m_j(new))^T / Σ_i P(y_j | x_i).
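
A compact EM sketch for a 1-D Gaussian mixture implementing the E- and M-steps above (toy data; k = 2 and all numeric values are invented for illustration):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_gmm_1d(x, k=2, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    alpha = np.full(k, 1.0 / k)               # mixing parameters
    mu = rng.choice(x, k, replace=False)      # initial means drawn from the data
    sigma = np.full(k, x.std())               # initial spreads
    for _ in range(iters):
        # E-step: responsibilities P(y_j | x_i)
        r = np.array([a * gauss_pdf(x, m, s) for a, m, s in zip(alpha, mu, sigma)])
        r /= r.sum(axis=0)
        # M-step: re-estimate alpha, mu, sigma from the responsibilities
        n = r.sum(axis=1)
        alpha = n / len(x)
        mu = (r @ x) / n
        sigma = np.sqrt((r * (x - mu[:, None]) ** 2).sum(axis=1) / n)
    return alpha, mu, sigma

x = np.concatenate([np.random.default_rng(1).normal(0, 1, 100),
                    np.random.default_rng(2).normal(5, 0.5, 100)])
print(em_gmm_1d(x))  # mixing weights near 0.5/0.5, means near 0 and 5
```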

Summary
- Clustering by distance: goodness of partitioning, k-means.
- Clustering by density estimation: BYY.

Module 6: Feature Selection

Motivation
Classifier performance depends on a combination of the number of samples, the number of features, and the complexity of the classifier.
- Q1: The more samples, the better?
- Q2: The more features, the better?
- Q3: The more complex, the better?
However, the number of samples is fixed at training time, so both Q2 and Q3 push us to reduce the number of features.

Curse of Dimensionality
If the number of training samples is small relative to the number of features, performance may be degraded, because as the number of features increases, the number of unknown parameters increases accordingly, and the reliability of the parameter estimation decreases.

Occam's Razor
"Plurality should not be posited without necessity" (Pluralitas non est ponenda sine necessitate) - William of Ockham (ca. 1285-1349). To make the system simpler, unnecessary features must be removed.

Feature Selection
In general, we would like the classifier to use a minimum number of dimensions, in order to achieve:
- less computation;
- statistical estimation reliability.
Feature selection: given m measurements, choose the n < m best as features. We require:
- a criterion to evaluate features;
- an algorithm to optimize the criterion.

Criterion
Typically, the interclass distance (normalized by the intraclass distance) is used. For 2 classes, feature i can be scored as
    J_i = |m_i1 - m_i2| / sqrt(s_i1^2 + s_i2^2),
where m_i1 is the mean of the ith feature in class 1 and s_i1 is the scatter (variance) of the ith feature in class 1; for k classes the criterion generalizes over all class pairs. See the sketch below.
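
A minimal sketch of this per-feature criterion, under the assumption that the slide's lost equation was the usual normalized mean difference (toy data invented for illustration):

```python
import numpy as np

def interclass_score(X1, X2):
    """|mean difference| / sqrt(sum of variances), computed per feature."""
    return (np.abs(X1.mean(axis=0) - X2.mean(axis=0))
            / np.sqrt(X1.var(axis=0) + X2.var(axis=0)))

X1 = np.array([[1.0, 5.0], [1.2, 4.0], [0.8, 6.0]])   # class 1 samples
X2 = np.array([[3.0, 5.5], [3.2, 4.5], [2.8, 5.0]])   # class 2 samples
print(interclass_score(X1, X2))  # feature 0 separates the classes far better
```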

Optimizing
- Choosing n features from m measurements, the number of combinations is C(m, n) = m! / (n! (m - n)!).
- Usually an exhaustive comparison is not feasible.
- Some sub-optimal strategies include (see the sketch below):
  - rank features by effectiveness and choose the best;
  - incrementally add features to the set of chosen features;
  - successively add and delete features from the chosen set.
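
A minimal sketch of the second strategy, greedy forward selection, assuming some scoring function evaluate(subset) such as cross-validated accuracy (the scorer and feature values here are illustrative, not from the slides):

```python
def forward_select(all_features, evaluate, n):
    """Greedily grow the chosen set by the single feature that helps most."""
    chosen = []
    for _ in range(n):
        best = max((f for f in all_features if f not in chosen),
                   key=lambda f: evaluate(chosen + [f]))
        chosen.append(best)
    return chosen

# Illustrative scorer: pretend features 2 and 0 are the informative ones.
def evaluate(subset):
    return sum({0: 0.3, 1: 0.05, 2: 0.5, 3: 0.1}.get(f, 0) for f in subset)

print(forward_select([0, 1, 2, 3], evaluate, n=2))  # -> [2, 0]
```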
