Hypergraph Format: Difference between revisions

Srush (talk | contribs)

Revision as of 01:43, 7 November 2010

JSON

JSON Description (http://www.json.org/)

Pro:

  • Implementations in every language (often packaged with the language)
  • Human-readable

Con:

  • Space inefficiency
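
As a rough illustration of the points above, here is a minimal sketch that round-trips a toy hypergraph through Python's standard json module and compares the size of its edges against a fixed-width binary packing. The schema (the "nodes", "edges", "head", "tail", and "weight" fields) is hypothetical and not part of any proposed format, and the struct-based packing is only a stand-in for a compact binary encoding, not Protocol Buffers.

```python
import json
import struct

# Hypothetical toy hypergraph: every hyperedge has one head node and a list
# of tail nodes. The field names are assumptions made for this sketch.
hypergraph = {
    "nodes": [
        {"id": 0, "label": "S"},
        {"id": 1, "label": "NP"},
        {"id": 2, "label": "VP"},
    ],
    "edges": [
        {"id": 0, "head": 0, "tail": [1, 2], "weight": 0.5},
    ],
}


def to_json(graph):
    """Serialize the graph to compact JSON text (no extra whitespace)."""
    return json.dumps(graph, separators=(",", ":"))


def from_json(text):
    """Parse JSON text back into plain Python dicts and lists."""
    return json.loads(text)


def edges_to_binary(graph):
    """Pack edges as fixed-width little-endian records (size comparison only)."""
    chunks = []
    for edge in graph["edges"]:
        tail = edge["tail"]
        # id, head, and tail count as unsigned 32-bit ints; weight as a 64-bit float.
        chunks.append(
            struct.pack("<IIId", edge["id"], edge["head"], len(tail), edge["weight"])
        )
        # Tail node ids as unsigned 32-bit ints.
        chunks.append(struct.pack("<%dI" % len(tail), *tail))
    return b"".join(chunks)


encoded = to_json(hypergraph)
decoded = from_json(encoded)

json_edges_size = len(to_json(hypergraph["edges"]).encode("utf-8"))
binary_edges_size = len(edges_to_binary(hypergraph))

print(encoded)
print("JSON edge bytes:", json_edges_size, "binary edge bytes:", binary_edges_size)
```

The JSON text can be read and edited by hand, while the binary records are smaller because field names are not repeated for every edge.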

Protocol Buffers

Protocol Buffer Description

Implementation Sketch (http://github.com/srush/hypergraph)

Pro:

  • Conversion to and from JSON (protobuf-json, http://code.google.com/p/protobuf-json/)
  • Very fast to read (particularly in C++ and Java, hopefully soon in Python)
  • Very space efficient
  • Implementations in every language (although this requires a separate library)

Con:

  • "It's really easy to get up to some of the data size limits that are in place to prevent malicious data from having the PB parser allocate too much memory"
  • "You typically have to create a full hypergraph protocol buffer object before you can serialize it, so you either have to use the PB data structures internally in your code or you have to copy your data structure. While doing this copy, you can end up with two copies of the forest in memory, which is bad for memory usage."

SLF (Standard Lattice Format)

SLF Specification (http://labrosa.ee.columbia.edu/doc/HTKBook21/node257.html)