Python jieba.posseg Method Code Examples - 純淨天空
This article collects typical usage examples of the Python jieba.posseg method.
If you have been wondering what jieba.posseg does, how to call it, or what real-world usage looks like, the curated code examples below should help. You can also look further into other usage examples from the jieba module.
Shown below are 14 code examples of the jieba.posseg method, sorted by popularity by default. You can upvote the examples you like or find useful; your votes help the system recommend better Python code examples.
Example 1: cutfunc
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def cutfunc(sentence, _, HMM=True):
        for w, f in jieba.posseg.cut(sentence, HMM):
            yield w + posdelim + f

Developer: deepcs233 | Project: jieba_fast | Lines: 5 | Source: __main__.py
Example 2: __init__
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def __init__(self, idf_path=None):
        self.tokenizer = jieba.dt
        self.postokenizer = jieba.posseg.dt
        self.stop_words = self.STOP_WORDS.copy()
        self.idf_loader = IDFLoader(idf_path or DEFAULT_IDF)
        self.idf_freq, self.median_idf = self.idf_loader.get_idf()

Developer: deepcs233 | Project: jieba_fast | Lines: 8 | Source: tfidf.py
Example 3: testPosseg
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def testPosseg(self):
        import jieba.posseg as pseg
        for content in test_contents:
            result = pseg.cut(content)
            assert isinstance(result, types.GeneratorType), "Test Posseg Generator error"
            result = list(result)
            assert isinstance(result, list), "Test Posseg error on content: %s" % content
            print(",".join([w.word + "/" + w.flag for w in result]), file=sys.stderr)
        print("testPosseg", file=sys.stderr)

Developer: deepcs233 | Project: jieba_fast | Lines: 11 | Source: jieba_test.py
Example 4: testPosseg_NOHMM
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def testPosseg_NOHMM(self):
        import jieba.posseg as pseg
        for content in test_contents:
            result = pseg.cut(content, HMM=False)
            assert isinstance(result, types.GeneratorType), "Test Posseg Generator error"
            result = list(result)
            assert isinstance(result, list), "Test Posseg error on content: %s" % content
            print(",".join([w.word + "/" + w.flag for w in result]), file=sys.stderr)
        print("testPosseg_NOHMM", file=sys.stderr)

Developer: deepcs233 | Project: jieba_fast | Lines: 11 | Source: jieba_test.py
Example 5: text2ner
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def text2ner(text):
        seq, pos, label = [], [], []
        segment = jieba.posseg.cut(text)
        words, flags = [], []
        for seg in segment:
            words.append(seg.word)
            flags.append(seg.flag)
        i = 0
        tag = 'O'
        pre = 0   # whether the preceding token was a <> marker
        sign = 0  # counts runs of consecutive <> markers
        while i  # (the remainder of this example is truncated in the source)
Example 6 (title truncated in the source): paddle-mode POS tagging
Supported from version 0.40 onward; earlier versions do not support it.

    # words = pseg.cut("我愛北京天安門", use_paddle=True)  # paddle mode
    # for word, flag in words:
    #     print('new:', '%s %s' % (word, flag))

Developer: shibing624 | Project: pycorrector | Lines: 16 | Source: tokenizer_test.py
Example 7: posseg_cut_examples
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def posseg_cut_examples(self, example):
        raw_entities = example.get("entities", [])
        example_posseg = self.posseg(example.text)
        for (item_posseg, start, end) in example_posseg:
            part_of_speech = self.component_config["part_of_speech"]
            for (word_posseg, flag_posseg) in item_posseg:
                if flag_posseg in part_of_speech:
                    raw_entities.append({
                        'start': start,
                        'end': end,
                        'value': word_posseg,
                        'entity': flag_posseg
                    })
        return raw_entities

Developer: GaoQ1 | Project: rasa_nlu_gq | Lines: 17 | Source: jieba_pseg_extractor.py
Example 8: posseg
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def posseg(text):
        # type: (Text) -> List[Token]
        result = []
        for (word, start, end) in jieba.tokenize(text):
            pseg_data = [(w, f) for (w, f) in pseg.cut(word)]
            result.append((pseg_data, start, end))
        return result

Developer: GaoQ1 | Project: rasa_nlu_gq | Lines: 10 | Source: jieba_pseg_extractor.py
Example 9: posseg_cut_examples
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def posseg_cut_examples(self, example):
        raw_entities = example.get("entities", [])
        example_posseg = self.posseg(example.text)
        for (item_posseg, start, end) in example_posseg:
            part_of_speech = self.component_config["part_of_speech"]
            for (word_posseg, flag_posseg) in item_posseg:
                if flag_posseg in part_of_speech:
                    raw_entities.append({
                        'start': start,
                        'end': end,
                        'value': word_posseg,
                        'entity': flag_posseg
                    })
        return raw_entities

Developer: weizhenzhao | Project: rasa_nlu | Lines: 16 | Source: jieba_pseg_extractor.py
Example 10: posseg
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def posseg(text):
        # type: (Text) -> List[Token]
        import jieba
        import jieba.posseg as pseg
        result = []
        for (word, start, end) in jieba.tokenize(text):
            pseg_data = [(w, f) for (w, f) in pseg.cut(word)]
            result.append((pseg_data, start, end))
        return result

Developer: weizhenzhao | Project: rasa_nlu | Lines: 14 | Source: jieba_pseg_extractor.py
Example 11: get_n
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def get_n(sentence):
        words = jieba.posseg.cut(sentence)
        word_list = []
        for word, flag in words:
            if 'n' in flag or flag in ['vn']:
                word_list.append(word)
        return set(word_list)

Developer: SeanLee97 | Project: chinese_reading_comprehension | Lines: 9 | Source: predict.py
Example 12: posseg
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def posseg(self, sent, standard_name=False, stopwords=None):
        if self.language == 'en':
            from nltk import word_tokenize, pos_tag
            stopwords = set() if stopwords is None else stopwords
            tokens = [word for word in word_tokenize(sent) if word not in stopwords]
            return pos_tag(tokens, tagset='universal')
        else:
            self.standard_name = standard_name
            entities_info = self.entity_linking(sent)
            sent2 = self.decoref(sent, entities_info)
            result = []
            i = 0
            for word, flag in pseg.cut(sent2):
                if word in self.entity_types:
                    if self.standard_name:
                        word = entities_info[i][1][0]  # use the linked entity name
                    else:
                        l, r = entities_info[i][0]  # or use the original text span
                        word = sent[l:r]
                    flag = entities_info[i][1][1][1:-1]
                    i += 1
                else:
                    if stopwords and word in stopwords:
                        continue
                result.append((word, flag))
            return result

Developer: blmoistawinde | Project: HarvestText | Lines: 28 | Source: harvesttext.py
Example 13: synonym_cut
Likes: 5

    # Required module: import jieba
    # Or: from jieba import posseg
    def synonym_cut(sentence, pattern="wf"):
        """Cut the sentence into a synonym vector tag.

        If a word in this sentence is not found in the synonym dictionary,
        it is tagged with the default POS of the word segmentation tool.

        Args:
            pattern: 'w' - word segmentation, 'k' - single top keyword,
                't' - keyword list, 'wf' - segmentation with POS tags,
                'tf' - keywords with POS tags.
        """
        # Strip sentence-final punctuation
        sentence = sentence.rstrip(''.join(punctuation_all))
        # Strip sentence-final modal particles
        sentence = sentence.rstrip(tone_words)
        synonym_vector = []
        if pattern == "w":
            synonym_vector = [item for item in jieba.cut(sentence) if item not in filter_characters]
        elif pattern == "k":
            synonym_vector = analyse.extract_tags(sentence, topK=1)
        elif pattern == "t":
            synonym_vector = analyse.extract_tags(sentence, topK=10)
        elif pattern == "wf":
            result = posseg.cut(sentence)
            # synonym_vector = [(item.word, item.flag) for item in result
            #                   if item.word not in filter_characters]
            # Modified on 2017.4.27
            for item in result:
                if item.word not in filter_characters:
                    if len(item.flag) < 4:
                        item.flag = list(posseg.cut(item.word))[0].flag
                    synonym_vector.append((item.word, item.flag))
        elif pattern == "tf":
            result = posseg.cut(sentence)
            tags = analyse.extract_tags(sentence, topK=10)
            for item in result:
                if item.word in tags:
                    synonym_vector.append((item.word, item.flag))
        return synonym_vector

Developer: Decalogue | Project: chat | Lines: 41 | Source: semantic.py
Example 14: extract_tags
Likes: 4

    # Required module: import jieba
    # Or: from jieba import posseg
    def extract_tags(self, sentence, topK=20, withWeight=False, allowPOS=(), withFlag=False):
        """
        Extract keywords from sentence using TF-IDF algorithm.
        Parameter:
            - topK: return how many top keywords. `None` for all possible words.
            - withWeight: if True, return a list of (word, weight);
                          if False, return a list of words.
            - allowPOS: the allowed POS list eg. ['ns', 'n', 'vn', 'v', 'nr'].
                        if the POS of w is not in this list, it will be filtered.
            - withFlag: only work with allowPOS is not empty.
                        if True, return a list of pair(word, weight) like posseg.cut
                        if False, return a list of words
        """
        if allowPOS:
            allowPOS = frozenset(allowPOS)
            words = self.postokenizer.cut(sentence)
        else:
            words = self.tokenizer.cut(sentence)
        freq = {}
        for w in words:
            if allowPOS:
                if w.flag not in allowPOS:
                    continue
                elif not withFlag:
                    w = w.word
            wc = w.word if allowPOS and withFlag else w
            if len(wc.strip()) < 2 or wc.lower() in self.stop_words:
                continue
            freq[w] = freq.get(w, 0.0) + 1.0
        total = sum(freq.values())
        for k in freq:
            kw = k.word if allowPOS and withFlag else k
            freq[k] *= self.idf_freq.get(kw, self.median_idf) / total
        if withWeight:
            tags = sorted(freq.items(), key=itemgetter(1), reverse=True)
        else:
            tags = sorted(freq, key=freq.__getitem__, reverse=True)
        if topK:
            return tags[:topK]
        else:
            return tags

Developer: deepcs233 | Project: jieba_fast | Lines: 44 | Source: tfidf.py
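The weighting loop at the heart of Example 14 can be isolated from the class. The sketch below reimplements just the TF-IDF ranking step in plain Python; the toy idf table, the stop-word set, and whitespace tokenization are all assumptions standing in for jieba's real tokenizer and IDF file.

```python
from operator import itemgetter

def tfidf_rank(words, idf, median_idf, stop_words=frozenset(), top_k=3):
    # Term frequency over the token stream, skipping stop words.
    freq = {}
    for w in words:
        if w in stop_words:
            continue
        freq[w] = freq.get(w, 0.0) + 1.0
    total = sum(freq.values())
    # Normalized TF times IDF; unseen words fall back to the median IDF,
    # mirroring the second loop in the extract_tags example above.
    for k in freq:
        freq[k] *= idf.get(k, median_idf) / total
    return sorted(freq.items(), key=itemgetter(1), reverse=True)[:top_k]

# Hypothetical toy corpus statistics.
idf = {"cat": 2.0, "sat": 1.0, "mat": 2.0}
words = "the cat sat on the mat the cat".split()
print(tfidf_rank(words, idf, median_idf=1.5, stop_words={"the", "on"}, top_k=2))
# → [('cat', 1.0), ('mat', 0.5)]
```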
Note: The jieba.posseg method examples in this article were compiled by 純淨天空 from GitHub, MSDocs, and other source-code and documentation platforms. The snippets were selected from open-source projects contributed by various developers; copyright remains with the original authors. Please consult each project's license before using or redistributing the code. Do not reproduce this article without permission.