Difference between revisions of "DIRT Paraphrase Collection"

From ACL Wiki

Latest revision as of 07:43, 8 January 2008

DIRT (Discovery of Inference Rules from Text) is both an algorithm and a resulting knowledge collection created by Dekang Lin and Patrick Pantel at the University of Alberta. The algorithm automatically learns paraphrase expressions from text using the Distributional Hypothesis over paths in dependency trees. A path, extracted from a parse tree, is an expression that represents a binary relationship between two nouns. In short, if two paths tend to link the same sets of words, DIRT hypothesizes that the meanings of the corresponding patterns are similar.
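The intuition that two paths are similar when they link the same sets of words can be illustrated with a small sketch. This is not the published DIRT implementation: the toy triples, the function names, the use of plain Jaccard overlap for each slot, and the geometric mean used to combine the two slots are all assumptions chosen to keep the example short.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: (path, X-slot filler, Y-slot filler).
# A real system would extract these from dependency parses of a large corpus.
TRIPLES = [
    ("X solves Y", "committee", "problem"),
    ("X solves Y", "government", "crisis"),
    ("X solves Y", "engineer", "bug"),
    ("X finds a solution to Y", "committee", "problem"),
    ("X finds a solution to Y", "government", "crisis"),
    ("X finds a solution to Y", "engineer", "dispute"),
    ("X buys Y", "company", "shares"),
    ("X buys Y", "government", "land"),
]


def build_slots(triples):
    """Collect, for every path, the set of words seen in each slot."""
    slots = defaultdict(lambda: {"X": set(), "Y": set()})
    for path, x_word, y_word in triples:
        slots[path]["X"].add(x_word)
        slots[path]["Y"].add(y_word)
    return dict(slots)


def jaccard(a, b):
    """Set overlap in [0, 1]; a simplification, not DIRT's own measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def path_similarity(slots, p1, p2):
    """Combine the two slot overlaps with a geometric mean (an assumption)."""
    sim_x = jaccard(slots[p1]["X"], slots[p2]["X"])
    sim_y = jaccard(slots[p1]["Y"], slots[p2]["Y"])
    return sqrt(sim_x * sim_y)


def most_similar(slots, path):
    """Rank all other paths by similarity to the given path, best first."""
    others = [p for p in slots if p != path]
    return sorted(others, key=lambda p: path_similarity(slots, path, p), reverse=True)


SLOTS = build_slots(TRIPLES)

if __name__ == "__main__":
    for candidate in most_similar(SLOTS, "X solves Y"):
        score = path_similarity(SLOTS, "X solves Y", candidate)
        print(f"{candidate}: {score:.3f}")
```

In this toy data, "X finds a solution to Y" shares most of its slot fillers with "X solves Y" and ranks first, while "X buys Y" shares no Y-slot filler and scores zero. Unweighted set overlap treats every shared word equally, so a frequent filler counts as much as a rare, informative one; a realistic system would need some form of weighting.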

The DIRT knowledge collection is the output of the DIRT algorithm run over a 1 GB set of newspaper text (San Jose Mercury, Wall Street Journal and AP Newswire from the TREC-9 collection). The algorithm extracted 7 million paths (231,000 unique) from the parse trees, and paraphrases were generated from these paths. For example, here are the Top-20 paraphrases of "X solves Y" generated by DIRT:

Y is solved by X, X resolves Y, X finds a solution to Y, X tries to solve Y, X deals with Y, Y is resolved by X, X addresses Y, X seeks a solution to Y, X does something about Y, X solution to Y, Y is resolved in X, Y is solved through X, X rectifies Y, X copes with Y, X overcomes Y, X eases Y, X tackles Y, X alleviates Y, X corrects Y, X is a solution to Y, X makes Y worse, X irons out Y

Acquiring the Resource

The DIRT knowledge collection is available for research purposes by contacting its authors.

Demos

References

Please refer to the following publication when using this resource:

  • Dekang Lin and Patrick Pantel. 2001. Discovery of Inference Rules for Question Answering. Natural Language Engineering 7(4):343-360.

Patents

Discovery of Inference Rules from Text. Dekang Lin and Patrick Pantel. US Patent – A facility for discovering a set of inference rules (or paraphrases) by analyzing a corpus of natural language text.

Authors

Dekang Lin

Patrick Pantel