Python jieba.posseg Method Code Examples - 純淨天空


This article compiles typical usage examples of the jieba.posseg method in Python, collected from open-source projects.

If you have been wondering how the Python jieba.posseg method works in practice, or what real-world usage of it looks like, the curated code examples below may help.

You can also explore further usage examples of the jieba module that this method belongs to.

Below are 14 code examples of the jieba.posseg method, sorted by popularity by default.

You can upvote the examples you like or find useful; your feedback helps the system recommend better Python code examples.

Example 1: cutfunc (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def cutfunc(sentence, _, HMM=True):
    for w, f in jieba.posseg.cut(sentence, HMM):
        yield w + posdelim + f
```

Author: deepcs233 | Project: jieba_fast | Lines: 5 | Source: __main__.py

Example 2: __init__ (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def __init__(self, idf_path=None):
    self.tokenizer = jieba.dt
    self.postokenizer = jieba.posseg.dt
    self.stop_words = self.STOP_WORDS.copy()
    self.idf_loader = IDFLoader(idf_path or DEFAULT_IDF)
    self.idf_freq, self.median_idf = self.idf_loader.get_idf()
```

Author: deepcs233 | Project: jieba_fast | Lines: 8 | Source: tfidf.py

Example 3: testPosseg (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def testPosseg(self):
    import jieba.posseg as pseg
    for content in test_contents:
        result = pseg.cut(content)
        assert isinstance(result, types.GeneratorType), "Test Posseg Generator error"
        result = list(result)
        assert isinstance(result, list), "Test Posseg error on content: %s" % content
        print(",".join([w.word + "/" + w.flag for w in result]), file=sys.stderr)
    print("testPosseg", file=sys.stderr)
```

Author: deepcs233 | Project: jieba_fast | Lines: 11 | Source: jieba_test.py

Example 4: testPosseg_NOHMM (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def testPosseg_NOHMM(self):
    import jieba.posseg as pseg
    for content in test_contents:
        result = pseg.cut(content, HMM=False)
        assert isinstance(result, types.GeneratorType), "Test Posseg Generator error"
        result = list(result)
        assert isinstance(result, list), "Test Posseg error on content: %s" % content
        print(",".join([w.word + "/" + w.flag for w in result]), file=sys.stderr)
    print("testPosseg_NOHMM", file=sys.stderr)
```

Author: deepcs233 | Project: jieba_fast | Lines: 11 | Source: jieba_test.py

Example 5: text2ner (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def text2ner(text):
    seq, pos, label = [], [], []
    segment = jieba.posseg.cut(text)
    words, flags = [], []
    for seg in segment:
        words.append(seg.word)
        flags.append(seg.flag)
    i = 0
    tag = 'O'
    pre = 0   # whether the preceding token was a '<...>' marker
    sign = 0  # counts consecutive '<...>' markers
    while i < len(words):
        # (the body of this loop was lost when the page was extracted;
        # only the skeleton could be recovered)
        i += 1
    return seq, pos, label
```

Author: baiyyang | Project: medical-entity-recognition | Lines: 40 | Source: predata.py
Example 6: test_segment (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def test_segment():
    """Test disease-name correction."""
    error_sentence_1 = '這個新藥奧美砂坦脂片能治療心絞痛,效果還可以'  # correct form: 奧美沙坦酯片
    print(error_sentence_1)
    print(segment(error_sentence_1))
    import jieba
    print(list(jieba.tokenize(error_sentence_1)))
    import jieba.posseg as pseg
    words = pseg.lcut("我愛北京天安門")  # jieba default mode
    print('old:', words)
    # jieba.enable_paddle()  # enable paddle mode; supported since v0.40, not in earlier versions
    # words = pseg.cut("我愛北京天安門", use_paddle=True)  # paddle mode
    # for word, flag in words:
    #     print('new:', '%s %s' % (word, flag))
```

Author: shibing624 | Project: pycorrector | Lines: 16 | Source: tokenizer_test.py

Example 7: posseg_cut_examples (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def posseg_cut_examples(self, example):
    raw_entities = example.get("entities", [])
    example_posseg = self.posseg(example.text)
    for (item_posseg, start, end) in example_posseg:
        part_of_speech = self.component_config["part_of_speech"]
        for (word_posseg, flag_posseg) in item_posseg:
            if flag_posseg in part_of_speech:
                raw_entities.append({
                    'start': start,
                    'end': end,
                    'value': word_posseg,
                    'entity': flag_posseg
                })
    return raw_entities
```

Author: GaoQ1 | Project: rasa_nlu_gq | Lines: 17 | Source: jieba_pseg_extractor.py

Example 8: posseg (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def posseg(text):
    # type: (Text) -> List[Token]
    result = []
    for (word, start, end) in jieba.tokenize(text):
        pseg_data = [(w, f) for (w, f) in pseg.cut(word)]
        result.append((pseg_data, start, end))
    return result
```

Author: GaoQ1 | Project: rasa_nlu_gq | Lines: 10 | Source: jieba_pseg_extractor.py

Example 9: posseg_cut_examples (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def posseg_cut_examples(self, example):
    raw_entities = example.get("entities", [])
    example_posseg = self.posseg(example.text)
    for (item_posseg, start, end) in example_posseg:
        part_of_speech = self.component_config["part_of_speech"]
        for (word_posseg, flag_posseg) in item_posseg:
            if flag_posseg in part_of_speech:
                raw_entities.append({
                    'start': start,
                    'end': end,
                    'value': word_posseg,
                    'entity': flag_posseg
                })
    return raw_entities
```

Author: weizhenzhao | Project: rasa_nlu | Lines: 16 | Source: jieba_pseg_extractor.py

Example 10: posseg (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def posseg(text):
    # type: (Text) -> List[Token]
    import jieba
    import jieba.posseg as pseg
    result = []
    for (word, start, end) in jieba.tokenize(text):
        pseg_data = [(w, f) for (w, f) in pseg.cut(word)]
        result.append((pseg_data, start, end))
    return result
```

Author: weizhenzhao | Project: rasa_nlu | Lines: 14 | Source: jieba_pseg_extractor.py

Example 11: get_n (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def get_n(sentence):
    words = jieba.posseg.cut(sentence)
    word_list = []
    for word, flag in words:
        if 'n' in flag or flag in ['vn']:
            word_list.append(word)
    return set(word_list)
```

Author: SeanLee97 | Project: chinese_reading_comprehension | Lines: 9 | Source: predict.py

Example 12: posseg (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def posseg(self, sent, standard_name=False, stopwords=None):
    if self.language == 'en':
        from nltk import word_tokenize, pos_tag
        stopwords = set() if stopwords is None else stopwords
        tokens = [word for word in word_tokenize(sent) if word not in stopwords]
        return pos_tag(tokens, tagset='universal')
    else:
        self.standard_name = standard_name
        entities_info = self.entity_linking(sent)
        sent2 = self.decoref(sent, entities_info)
        result = []
        i = 0
        for word, flag in pseg.cut(sent2):
            if word in self.entity_types:
                if self.standard_name:
                    word = entities_info[i][1][0]  # use the linked entity
                else:
                    l, r = entities_info[i][0]  # or use the original text
                    word = sent[l:r]
                flag = entities_info[i][1][1][1:-1]
                i += 1
            else:
                if stopwords and word in stopwords:
                    continue
            result.append((word, flag))
        return result
```

Author: blmoistawinde | Project: HarvestText | Lines: 28 | Source: harvesttext.py

Example 13: synonym_cut (Upvotes: 5)

```python
# Required import: import jieba
# Or: from jieba import posseg
def synonym_cut(sentence, pattern="wf"):
    """Cut the sentence into a synonym vector tag.

    If a word in this sentence was not found in the synonym dictionary,
    it will be marked with the default value of the word segmentation tool.

    Args:
        pattern: 'w' - segmentation, 'k' - single keyword, 't' - keyword list,
            'wf' - segmentation with POS tags, 'tf' - keywords with POS tags.
    """
    # Strip sentence-final punctuation
    sentence = sentence.rstrip(''.join(punctuation_all))
    # Strip sentence-final modal particles
    sentence = sentence.rstrip(tone_words)
    synonym_vector = []
    if pattern == "w":
        synonym_vector = [item for item in jieba.cut(sentence) if item not in filter_characters]
    elif pattern == "k":
        synonym_vector = analyse.extract_tags(sentence, topK=1)
    elif pattern == "t":
        synonym_vector = analyse.extract_tags(sentence, topK=10)
    elif pattern == "wf":
        result = posseg.cut(sentence)
        # synonym_vector = [(item.word, item.flag) for item in result
        #                   if item.word not in filter_characters]
        # Modified 2017-04-27
        for item in result:
            if item.word not in filter_characters:
                if len(item.flag) < 4:
                    item.flag = list(posseg.cut(item.word))[0].flag
                synonym_vector.append((item.word, item.flag))
    elif pattern == "tf":
        result = posseg.cut(sentence)
        tags = analyse.extract_tags(sentence, topK=10)
        for item in result:
            if item.word in tags:
                synonym_vector.append((item.word, item.flag))
    return synonym_vector
```

Author: Decalogue | Project: chat | Lines: 41 | Source: semantic.py

Example 14: extract_tags (Upvotes: 4)

```python
# Required import: import jieba
# Or: from jieba import posseg
def extract_tags(self, sentence, topK=20, withWeight=False, allowPOS=(), withFlag=False):
    """
    Extract keywords from sentence using TF-IDF algorithm.
    Parameter:
        - topK: return how many top keywords. `None` for all possible words.
        - withWeight: if True, return a list of (word, weight);
                      if False, return a list of words.
        - allowPOS: the allowed POS list eg. ['ns', 'n', 'vn', 'v', 'nr'].
                    if the POS of w is not in this list, it will be filtered.
        - withFlag: only works when allowPOS is not empty.
                    if True, return a list of pair(word, weight) like posseg.cut;
                    if False, return a list of words.
    """
    if allowPOS:
        allowPOS = frozenset(allowPOS)
        words = self.postokenizer.cut(sentence)
    else:
        words = self.tokenizer.cut(sentence)
    freq = {}
    for w in words:
        if allowPOS:
            if w.flag not in allowPOS:
                continue
            elif not withFlag:
                w = w.word
        wc = w.word if allowPOS and withFlag else w
        if len(wc.strip()) < 2 or wc.lower() in self.stop_words:
            continue
        freq[w] = freq.get(w, 0.0) + 1.0
    total = sum(freq.values())
    for k in freq:
        kw = k.word if allowPOS and withFlag else k
        freq[k] *= self.idf_freq.get(kw, self.median_idf) / total
    if withWeight:
        tags = sorted(freq.items(), key=itemgetter(1), reverse=True)
    else:
        tags = sorted(freq, key=freq.__getitem__, reverse=True)
    if topK:
        return tags[:topK]
    else:
        return tags
```

Author: deepcs233 | Project: jieba_fast | Lines: 44 | Source: tfidf.py

Note: the jieba.posseg method examples in this article were compiled by 純淨天空 from source-code and documentation platforms such as GitHub/MSDocs. The snippets were selected from contributors' open-source projects; copyright remains with the original authors, and any distribution or use should follow the license of the corresponding project. Do not republish without permission.


