<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.aclweb.org/aclwiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=KEvang</id>
	<title>ACL Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.aclweb.org/aclwiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=KEvang"/>
	<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/Special:Contributions/KEvang"/>
	<updated>2026-04-10T12:51:55Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.6</generator>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=11032</id>
		<title>Training the C&amp;C Parser</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=11032"/>
		<updated>2015-04-21T12:27:49Z</updated>

		<summary type="html">&lt;p&gt;KEvang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://svn.ask.it.usyd.edu.au/trac/candc C&amp;amp;C Parser] is an advanced statistical parser using the framework of Combinatory Categorial Grammar (CCG). It is quite easy to use with pre-trained models, but creating one&#039;s own models is a slightly different story. Although the software is distributed with a wealth of scripts that should make training easy, differences between systems and dependencies on various libraries make the task of getting the training code to work a bit daunting. The following are detailed step-by-step instructions to replicate the (almost) exact figures reported in Clark&amp;amp;Curran (2007)&amp;lt;ref&amp;gt;Stephen Clark and James Curran (2007): Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models. In &amp;lt;i&amp;gt;Computational Linguistics 33(4)&amp;lt;/i&amp;gt;, http://aclweb.org/anthology-new/J/J07/J07-4004.pdf&amp;lt;/ref&amp;gt; on a single &#039;&#039;&#039;64-bit Ubuntu 12.04&#039;&#039;&#039; machine (which should have multiple cores and at least around 40 GB of main memory). The steps to take on other recent Linux distributions should be very similar.&lt;br /&gt;
&lt;br /&gt;
Please extend the instructions with more detail, helpful hints and notes on other operating systems! They were initially written up by [[User:KEvang|Kilian Evang]] based on instructions from Tim Dawborn; thanks are due to Tim and also to Stephen Clark and James Curran for advice without which I would probably never have gotten it to run.&lt;br /&gt;
&lt;br /&gt;
 # Customize these variables:&lt;br /&gt;
 export CANDC_PREFIX=$HOME&lt;br /&gt;
 export CCGBANK=$HOME/data/CCGbank1.2&lt;br /&gt;
 export TMPDIR=$HOME/tmp # the default /tmp is often on a tiny filesystem&lt;br /&gt;
 export NUMNODES=32&lt;br /&gt;
 export LIB=/usr/lib&lt;br /&gt;
 &lt;br /&gt;
 # Some variables for use below:&lt;br /&gt;
 export CANDC=$CANDC_PREFIX/candc&lt;br /&gt;
 export SCRIPTS=$CANDC/src/scripts/ccg&lt;br /&gt;
 export EXT=$CANDC/ext&lt;br /&gt;
 &lt;br /&gt;
 # Package dependencies:&lt;br /&gt;
 sudo apt-get install g++ gawk libibumad-dev mpich2 subversion&lt;br /&gt;
 &lt;br /&gt;
 # Check out the C&amp;amp;C tools.&lt;br /&gt;
 # You need credentials for that, see&lt;br /&gt;
 # http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Subversion&lt;br /&gt;
 cd $CANDC_PREFIX&lt;br /&gt;
 svn checkout http://svn.ask.it.usyd.edu.au/candc/trunk candc -r 2400&lt;br /&gt;
 &lt;br /&gt;
 # Some patches to fix various problems with the scripts provided:&lt;br /&gt;
 &lt;br /&gt;
 # Use a temp directory different from /tmp since that often doesn&#039;t have enough&lt;br /&gt;
 # space:&lt;br /&gt;
 sed -i -e &amp;quot;s|/tmp|$TMPDIR|&amp;quot; $SCRIPTS/*_model_*&lt;br /&gt;
 &lt;br /&gt;
 # Replace /bin/env with /usr/bin/env&lt;br /&gt;
 sed -i -e &amp;quot;s|/bin/env|/usr/bin/env|&amp;quot; $SCRIPTS/lexicon_features \&lt;br /&gt;
         $SCRIPTS/count_features&lt;br /&gt;
 &lt;br /&gt;
 # Work around non-portable sed -f shebang&lt;br /&gt;
 sed -i -e &#039;s|$SCRIPTS/convert_brackets|sed -f $SCRIPTS/convert_brackets|g&#039; \&lt;br /&gt;
         $SCRIPTS/create_data&lt;br /&gt;
 &lt;br /&gt;
 # TODO patches to make the scripts work with the LDC version of CCGbank should&lt;br /&gt;
 # go here.&lt;br /&gt;
 &lt;br /&gt;
 # Make ext directory&lt;br /&gt;
 mkdir $EXT&lt;br /&gt;
 &lt;br /&gt;
 # Install Boost library (Ubuntu doesn&#039;t seem to have a version that is compiled&lt;br /&gt;
 # against MPICH2).&lt;br /&gt;
 echo &#039;using mpi ;&#039; &amp;gt; ~/user-config.jam # Boost&#039;s build script won&#039;t build MPI&lt;br /&gt;
        # library without this for some reason&lt;br /&gt;
 mkdir $EXT/install&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget https://dl.dropboxusercontent.com/u/5358991/boost_1_53_0.tar.gz # or&lt;br /&gt;
        # get it from Sourceforge&lt;br /&gt;
 tar -xzf boost_1_53_0.tar.gz&lt;br /&gt;
 cd boost_1_53_0&lt;br /&gt;
 ./bootstrap.sh --with-libraries=mpi --prefix=$EXT&lt;br /&gt;
 ./b2 install&lt;br /&gt;
 &lt;br /&gt;
 # Install the ancient MR-MPI version that C&amp;amp;C depends on&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget http://sydney.edu.au/it/~tdaw3088/misc/mrmpi-22Apr09.tbz2 # If this link is&lt;br /&gt;
         # dead, try http://dl.dropbox.com/u/5358991/mrmpi-22Apr09.tbz2&lt;br /&gt;
 tar jxf mrmpi-22Apr09.tbz2&lt;br /&gt;
 cd mrmpi-22Apr09/src&lt;br /&gt;
 make -f Makefile.unix clean&lt;br /&gt;
 make -f Makefile.unix&lt;br /&gt;
 cp *.h $EXT/include&lt;br /&gt;
 cp libmrmpi.a $EXT/lib&lt;br /&gt;
 &lt;br /&gt;
 # Build C&amp;amp;C&lt;br /&gt;
 cd $CANDC&lt;br /&gt;
 make -f Makefile.unix all train bin/generate&lt;br /&gt;
 &lt;br /&gt;
 # Create data&lt;br /&gt;
 # Will only work with CCGbank 1.2 for now, not with LDC version of CCGbank&lt;br /&gt;
 $SCRIPTS/create_data $CCGBANK $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Train the POS tagger and the supertagger:&lt;br /&gt;
 $SCRIPTS/train_taggers working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the supertagger model to ensure its results are sane:&lt;br /&gt;
 $SCRIPTS/cl07_table4 working/&lt;br /&gt;
 &lt;br /&gt;
 # Create the model_hybrid directory and empty config file:&lt;br /&gt;
 mkdir working/model_hybrid&lt;br /&gt;
 touch working/model_hybrid/config&lt;br /&gt;
 &lt;br /&gt;
 # Train a hybrid model:&lt;br /&gt;
 export LD_LIBRARY_PATH=$EXT/lib:$LIB&lt;br /&gt;
 $SCRIPTS/create_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 $SCRIPTS/train_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the parser model:&lt;br /&gt;
 $SCRIPTS/cl07_table7 working/&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;/div&gt;</summary>
		<author><name>KEvang</name></author>
	</entry>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=11031</id>
		<title>Training the C&amp;C Parser</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=11031"/>
		<updated>2015-04-21T12:27:34Z</updated>

		<summary type="html">&lt;p&gt;KEvang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://svn.ask.it.usyd.edu.au/trac/candc C&amp;amp;C Parser] is an advanced statistical parser using the framework of Combinatory Categorial Grammar (CCG). It is quite easy to use with pre-trained models, but creating one&#039;s own models is a slightly different story. Although the software is distributed with a wealth of scripts that should make training easy, differences between systems and dependencies on various libraries make the task of getting the training code to work a bit daunting. The following are detailed step-by-step instructions to replicate the (almost) exact figures reported in Clark&amp;amp;Curran (2007)&amp;lt;ref&amp;gt;Stephen Clark and James Curran (2007): Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models. In &amp;lt;i&amp;gt;Computational Linguistics 33(4)&amp;lt;/i&amp;gt;, http://aclweb.org/anthology-new/J/J07/J07-4004.pdf&amp;lt;/ref&amp;gt; on a single &#039;&#039;&#039;64-bit Ubuntu 12.04&#039;&#039;&#039; machine (which should have multiple cores and at least around 40 GB of main memory). The steps to take on other recent Linux distributions should be very similar.&lt;br /&gt;
&lt;br /&gt;
Please extend the instructions with more detail, helpful hints and notes on other operating systems! They were initially written up by [[User:KEvang|Kilian Evang]] based on instructions from Tim Dawborn; thanks are due to Tim and also to Stephen Clark and James Curran for advice without which I would probably never have gotten it to run.&lt;br /&gt;
&lt;br /&gt;
 # Customize these variables:&lt;br /&gt;
 export CANDC_PREFIX=$HOME&lt;br /&gt;
 export CCGBANK=$HOME/data/CCGbank1.2&lt;br /&gt;
 export TMPDIR=$HOME/tmp # the default /tmp is often on a tiny filesystem&lt;br /&gt;
 export NUMNODES=32&lt;br /&gt;
 export LIB=/usr/lib&lt;br /&gt;
 &lt;br /&gt;
 # Some variables for use below:&lt;br /&gt;
 export CANDC=$CANDC_PREFIX/candc&lt;br /&gt;
 export SCRIPTS=$CANDC/src/scripts/ccg&lt;br /&gt;
 export EXT=$CANDC/ext&lt;br /&gt;
 &lt;br /&gt;
 # Package dependencies:&lt;br /&gt;
 sudo apt-get install g++ gawk libibumad-dev mpich2 subversion&lt;br /&gt;
 &lt;br /&gt;
 # Check out the C&amp;amp;C tools.&lt;br /&gt;
 # You need credentials for that, see&lt;br /&gt;
 # http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Subversion&lt;br /&gt;
 cd $CANDC_PREFIX&lt;br /&gt;
 svn checkout http://svn.ask.it.usyd.edu.au/candc/trunk candc -r 2400&lt;br /&gt;
 &lt;br /&gt;
 # Some patches to fix various problems with the scripts provided:&lt;br /&gt;
 &lt;br /&gt;
 # Use a temp directory different from /tmp since that often doesn&#039;t have enough&lt;br /&gt;
 # space:&lt;br /&gt;
 sed -i -e &amp;quot;s|/tmp|$TMPDIR|&amp;quot; $SCRIPTS/*_model_*&lt;br /&gt;
 &lt;br /&gt;
 # Replace /bin/env with /usr/bin/env&lt;br /&gt;
 sed -i -e &amp;quot;s|/bin/env|/usr/bin/env|&amp;quot; $SCRIPTS/lexicon_features \&lt;br /&gt;
         $SCRIPTS/count_features&lt;br /&gt;
 &lt;br /&gt;
 # Work around non-portable sed -f shebang&lt;br /&gt;
 sed -i -e &#039;s|$SCRIPTS/convert_brackets|sed -f $SCRIPTS/convert_brackets|g&#039; \&lt;br /&gt;
         $SCRIPTS/create_data&lt;br /&gt;
 &lt;br /&gt;
 # TODO patches to make the scripts work with the LDC version of CCGbank should&lt;br /&gt;
 # go here.&lt;br /&gt;
 &lt;br /&gt;
 # Make ext directory&lt;br /&gt;
 mkdir $EXT&lt;br /&gt;
 &lt;br /&gt;
 # Install Boost library (Ubuntu doesn&#039;t seem to have a version that is compiled&lt;br /&gt;
 # against MPICH2).&lt;br /&gt;
 echo &#039;using mpi ;&#039; &amp;gt; ~/user-config.jam # Boost&#039;s build script won&#039;t build MPI&lt;br /&gt;
        # library without this for some reason&lt;br /&gt;
 mkdir $EXT/install&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget https://dl.dropboxusercontent.com/u/5358991/boost_1_53_0.tar.gz # or&lt;br /&gt;
        # get it from Sourceforge&lt;br /&gt;
 tar -xzf boost_1_53_0.tar.gz&lt;br /&gt;
 cd boost_1_53_0&lt;br /&gt;
 ./bootstrap.sh --with-libraries=mpi --prefix=$EXT&lt;br /&gt;
 ./b2 install&lt;br /&gt;
 &lt;br /&gt;
 # Install the ancient MR-MPI version that C&amp;amp;C depends on&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget http://sydney.edu.au/it/~tdaw3088/misc/mrmpi-22Apr09.tbz2 # If this link is&lt;br /&gt;
         # dead, try http://dl.dropbox.com/u/5358991/mrmpi-22Apr09.tbz2&lt;br /&gt;
 tar jxf mrmpi-22Apr09.tbz2&lt;br /&gt;
 cd mrmpi-22Apr09/src&lt;br /&gt;
 make -f Makefile.unix clean&lt;br /&gt;
 make -f Makefile.unix&lt;br /&gt;
 cp *.h $EXT/include&lt;br /&gt;
 cp libmrmpi.a $EXT/lib&lt;br /&gt;
 &lt;br /&gt;
 # Build C&amp;amp;C&lt;br /&gt;
 cd $CANDC&lt;br /&gt;
 make -f Makefile.unix all train bin/generate&lt;br /&gt;
 &lt;br /&gt;
 # Create data&lt;br /&gt;
 # Will only work with CCGbank 1.2 for now, not with LDC version of CCGbank&lt;br /&gt;
 $SCRIPTS/create_data $CCGBANK $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Train the POS tagger and the supertagger:&lt;br /&gt;
 $SCRIPTS/train_taggers working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the supertagger model to ensure its results are sane:&lt;br /&gt;
 $SCRIPTS/cl07_table4 working/&lt;br /&gt;
 &lt;br /&gt;
 # Create the model_hybrid directory and empty config file:&lt;br /&gt;
 mkdir working/model_hybrid&lt;br /&gt;
 touch working/model_hybrid/config&lt;br /&gt;
 &lt;br /&gt;
 # Train a hybrid model:&lt;br /&gt;
 export LD_LIBRARY_PATH=$EXT/lib:$LIB&lt;br /&gt;
 $SCRIPTS/create_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 $SCRIPTS/train_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the parser model:&lt;br /&gt;
 $SCRIPTS/cl07_table7 working/&lt;/div&gt;</summary>
		<author><name>KEvang</name></author>
	</entry>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10245</id>
		<title>Training the C&amp;C Parser</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10245"/>
		<updated>2013-09-10T10:55:08Z</updated>

		<summary type="html">&lt;p&gt;KEvang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://svn.ask.it.usyd.edu.au/trac/candc C&amp;amp;C Parser] is an advanced statistical parser using the framework of Combinatory Categorial Grammar (CCG). It is quite easy to use with pre-trained models, but creating one&#039;s own models is a slightly different story. Although the software is distributed with a wealth of scripts that should make training easy, differences between systems and dependencies on various libraries make the task of getting the training code to work a bit daunting. The following are detailed step-by-step instructions to replicate the (almost) exact figures reported in Clark&amp;amp;Curran (2007)&amp;lt;ref&amp;gt;Stephen Clark and James Curran (2007): Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models. In &amp;lt;i&amp;gt;Computational Linguistics 33(4)&amp;lt;/i&amp;gt;, http://aclweb.org/anthology-new/J/J07/J07-4004.pdf&amp;lt;/ref&amp;gt; on a single &#039;&#039;&#039;64-bit Ubuntu 12.04&#039;&#039;&#039; machine (which should have multiple cores and at least around 40 GB of main memory). The steps to take on other recent Linux distributions should be very similar.&lt;br /&gt;
&lt;br /&gt;
Please extend the instructions with more detail, helpful hints and notes on other operating systems! They were initially written up by [[User:KEvang|Kilian Evang]] based on instructions from Tim Dawborn; thanks are due to Tim and also to Stephen Clark and James Curran for advice without which I would probably never have gotten it to run.&lt;br /&gt;
&lt;br /&gt;
 # Customize these variables:&lt;br /&gt;
 export CANDC_PREFIX=$HOME&lt;br /&gt;
 export CCGBANK=$HOME/data/CCGbank1.2&lt;br /&gt;
 export TMPDIR=$HOME/tmp # the default /tmp is often on a tiny filesystem&lt;br /&gt;
 export NUMNODES=32&lt;br /&gt;
 export LIB=/usr/lib&lt;br /&gt;
 &lt;br /&gt;
 # Some variables for use below:&lt;br /&gt;
 export CANDC=$CANDC_PREFIX/candc&lt;br /&gt;
 export SCRIPTS=$CANDC/src/scripts/ccg&lt;br /&gt;
 export EXT=$CANDC/ext&lt;br /&gt;
 &lt;br /&gt;
 # Package dependencies:&lt;br /&gt;
 sudo apt-get install g++ gawk libibumad-dev mpich2 subversion&lt;br /&gt;
 &lt;br /&gt;
 # Check out the C&amp;amp;C tools.&lt;br /&gt;
 # You need credentials for that, see&lt;br /&gt;
 # http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Subversion&lt;br /&gt;
 cd $CANDC_PREFIX&lt;br /&gt;
 svn checkout http://svn.ask.it.usyd.edu.au/candc/trunk candc -r 2400&lt;br /&gt;
 &lt;br /&gt;
 # Some patches to fix various problems with the scripts provided:&lt;br /&gt;
 &lt;br /&gt;
 # Use a temp directory different from /tmp since that often doesn&#039;t have enough&lt;br /&gt;
 # space:&lt;br /&gt;
 sed -i -e &amp;quot;s|/tmp|$TMPDIR|&amp;quot; $SCRIPTS/*_model_*&lt;br /&gt;
 &lt;br /&gt;
 # Replace /bin/env with /usr/bin/env&lt;br /&gt;
 sed -i -e &amp;quot;s|/bin/env|/usr/bin/env|&amp;quot; $SCRIPTS/lexicon_features \&lt;br /&gt;
         $SCRIPTS/count_features&lt;br /&gt;
 &lt;br /&gt;
 # Work around non-portable sed -f shebang&lt;br /&gt;
 sed -i -e &#039;s|$SCRIPTS/convert_brackets|sed -f $SCRIPTS/convert_brackets|g&#039; \&lt;br /&gt;
         $SCRIPTS/create_data&lt;br /&gt;
 &lt;br /&gt;
 # TODO patches to make the scripts work with the LDC version of CCGbank should&lt;br /&gt;
 # go here.&lt;br /&gt;
 &lt;br /&gt;
 # Make ext directory&lt;br /&gt;
 mkdir $EXT&lt;br /&gt;
 &lt;br /&gt;
 # Install Boost library (Ubuntu doesn&#039;t seem to have a version that is compiled&lt;br /&gt;
 # against MPICH2).&lt;br /&gt;
 echo &#039;using mpi ;&#039; &amp;gt; ~/user-config.jam # Boost&#039;s build script won&#039;t build MPI&lt;br /&gt;
        # library without this for some reason&lt;br /&gt;
 mkdir $EXT/install&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget https://dl.dropboxusercontent.com/u/5358991/boost_1_53_0.tar.gz # or&lt;br /&gt;
        # get it from Sourceforge&lt;br /&gt;
 tar -xzf boost_1_53_0.tar.gz&lt;br /&gt;
 cd boost_1_53_0&lt;br /&gt;
 ./bootstrap.sh --with-libraries=mpi --prefix=$EXT&lt;br /&gt;
 ./b2 install&lt;br /&gt;
 &lt;br /&gt;
 # Install the ancient MR-MPI version that C&amp;amp;C depends on&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget http://sydney.edu.au/it/~tdaw3088/misc/mrmpi-22Apr09.tbz2 # If this link is&lt;br /&gt;
         # dead, try http://dl.dropbox.com/u/5358991/mrmpi-22Apr09.tbz2&lt;br /&gt;
 tar jxf mrmpi-22Apr09.tbz2&lt;br /&gt;
 cd mrmpi-22Apr09/src&lt;br /&gt;
 make -f Makefile.linux clean&lt;br /&gt;
 make -f Makefile.linux&lt;br /&gt;
 cp *.h $EXT/include&lt;br /&gt;
 cp libmrmpi.a $EXT/lib&lt;br /&gt;
 &lt;br /&gt;
 # Build C&amp;amp;C&lt;br /&gt;
 cd $CANDC&lt;br /&gt;
 make -f Makefile.linux all train bin/generate&lt;br /&gt;
 &lt;br /&gt;
 # Create data&lt;br /&gt;
 # Will only work with CCGbank 1.2 for now, not with LDC version of CCGbank&lt;br /&gt;
 $SCRIPTS/create_data $CCGBANK $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Train the POS tagger and the supertagger:&lt;br /&gt;
 $SCRIPTS/train_taggers working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the supertagger model to ensure its results are sane:&lt;br /&gt;
 $SCRIPTS/cl07_table4 working/&lt;br /&gt;
 &lt;br /&gt;
 # Create the model_hybrid directory and empty config file:&lt;br /&gt;
 mkdir working/model_hybrid&lt;br /&gt;
 touch working/model_hybrid/config&lt;br /&gt;
 &lt;br /&gt;
 # Train a hybrid model:&lt;br /&gt;
 export LD_LIBRARY_PATH=$EXT/lib:$LIB&lt;br /&gt;
 $SCRIPTS/create_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 $SCRIPTS/train_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the parser model:&lt;br /&gt;
 $SCRIPTS/cl07_table7 working/&lt;/div&gt;</summary>
		<author><name>KEvang</name></author>
	</entry>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10244</id>
		<title>Training the C&amp;C Parser</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10244"/>
		<updated>2013-09-10T10:54:11Z</updated>

		<summary type="html">&lt;p&gt;KEvang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://svn.ask.it.usyd.edu.au/trac/candc C&amp;amp;C Parser] is an advanced statistical parser using the framework of Combinatory Categorial Grammar (CCG). It is quite easy to use with pre-trained models, but creating one&#039;s own models is a slightly different story. Although the software is distributed with a wealth of scripts that should make training easy, differences between systems and dependencies on various libraries make the task of getting the training code to work a bit daunting. The following are detailed step-by-step instructions to replicate the (almost) exact figures reported in Clark&amp;amp;Curran (2007)&amp;lt;ref&amp;gt;Stephen Clark and James Curran (2007): Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models. In &amp;lt;i&amp;gt;Computational Linguistics 33(4)&amp;lt;/i&amp;gt;, http://aclweb.org/anthology-new/J/J07/J07-4004.pdf&amp;lt;/ref&amp;gt; on a single &#039;&#039;&#039;64-bit Ubuntu 12.04&#039;&#039;&#039; machine (which should have multiple cores and at least around 40 GB of main memory). The steps to take on other recent Linux distributions should be very similar.&lt;br /&gt;
&lt;br /&gt;
Please extend the instructions with more detail, helpful hints and notes on other operating systems! They were initially written up by [[User:KEvang|Kilian Evang]]; thanks are due to Tim Dawborn, Stephen Clark and James Curran for advice without which I would probably never have gotten it to run.&lt;br /&gt;
&lt;br /&gt;
 # Customize these variables:&lt;br /&gt;
 export CANDC_PREFIX=$HOME&lt;br /&gt;
 export CCGBANK=$HOME/data/CCGbank1.2&lt;br /&gt;
 export TMPDIR=$HOME/tmp # the default /tmp is often on a tiny filesystem&lt;br /&gt;
 export NUMNODES=32&lt;br /&gt;
 export LIB=/usr/lib&lt;br /&gt;
 &lt;br /&gt;
 # Some variables for use below:&lt;br /&gt;
 export CANDC=$CANDC_PREFIX/candc&lt;br /&gt;
 export SCRIPTS=$CANDC/src/scripts/ccg&lt;br /&gt;
 export EXT=$CANDC/ext&lt;br /&gt;
 &lt;br /&gt;
 # Package dependencies:&lt;br /&gt;
 sudo apt-get install g++ gawk libibumad-dev mpich2 subversion&lt;br /&gt;
 &lt;br /&gt;
 # Check out the C&amp;amp;C tools.&lt;br /&gt;
 # You need credentials for that, see&lt;br /&gt;
 # http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Subversion&lt;br /&gt;
 cd $CANDC_PREFIX&lt;br /&gt;
 svn checkout http://svn.ask.it.usyd.edu.au/candc/trunk candc -r 2400&lt;br /&gt;
 &lt;br /&gt;
 # Some patches to fix various problems with the scripts provided:&lt;br /&gt;
 &lt;br /&gt;
 # Use a temp directory different from /tmp since that often doesn&#039;t have enough&lt;br /&gt;
 # space:&lt;br /&gt;
 sed -i -e &amp;quot;s|/tmp|$TMPDIR|&amp;quot; $SCRIPTS/*_model_*&lt;br /&gt;
 &lt;br /&gt;
 # Replace /bin/env with /usr/bin/env&lt;br /&gt;
 sed -i -e &amp;quot;s|/bin/env|/usr/bin/env|&amp;quot; $SCRIPTS/lexicon_features \&lt;br /&gt;
         $SCRIPTS/count_features&lt;br /&gt;
 &lt;br /&gt;
 # Work around non-portable sed -f shebang&lt;br /&gt;
 sed -i -e &#039;s|$SCRIPTS/convert_brackets|sed -f $SCRIPTS/convert_brackets|g&#039; \&lt;br /&gt;
         $SCRIPTS/create_data&lt;br /&gt;
 &lt;br /&gt;
 # TODO patches to make the scripts work with the LDC version of CCGbank should&lt;br /&gt;
 # go here.&lt;br /&gt;
 &lt;br /&gt;
 # Make ext directory&lt;br /&gt;
 mkdir $EXT&lt;br /&gt;
 &lt;br /&gt;
 # Install Boost library (Ubuntu doesn&#039;t seem to have a version that is compiled&lt;br /&gt;
 # against MPICH2).&lt;br /&gt;
 echo &#039;using mpi ;&#039; &amp;gt; ~/user-config.jam # Boost&#039;s build script won&#039;t build MPI&lt;br /&gt;
        # library without this for some reason&lt;br /&gt;
 mkdir $EXT/install&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget https://dl.dropboxusercontent.com/u/5358991/boost_1_53_0.tar.gz # or&lt;br /&gt;
        # get it from Sourceforge&lt;br /&gt;
 tar -xzf boost_1_53_0.tar.gz&lt;br /&gt;
 cd boost_1_53_0&lt;br /&gt;
 ./bootstrap.sh --with-libraries=mpi --prefix=$EXT&lt;br /&gt;
 ./b2 install&lt;br /&gt;
 &lt;br /&gt;
 # Install the ancient MR-MPI version that C&amp;amp;C depends on&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget http://sydney.edu.au/it/~tdaw3088/misc/mrmpi-22Apr09.tbz2 # If this link is&lt;br /&gt;
         # dead, try http://dl.dropbox.com/u/5358991/mrmpi-22Apr09.tbz2&lt;br /&gt;
 tar jxf mrmpi-22Apr09.tbz2&lt;br /&gt;
 cd mrmpi-22Apr09/src&lt;br /&gt;
 make -f Makefile.linux clean&lt;br /&gt;
 make -f Makefile.linux&lt;br /&gt;
 cp *.h $EXT/include&lt;br /&gt;
 cp libmrmpi.a $EXT/lib&lt;br /&gt;
 &lt;br /&gt;
 # Build C&amp;amp;C&lt;br /&gt;
 cd $CANDC&lt;br /&gt;
 make -f Makefile.linux all train bin/generate&lt;br /&gt;
 &lt;br /&gt;
 # Create data&lt;br /&gt;
 # Will only work with CCGbank 1.2 for now, not with LDC version of CCGbank&lt;br /&gt;
 $SCRIPTS/create_data $CCGBANK $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Train the POS tagger and the supertagger:&lt;br /&gt;
 $SCRIPTS/train_taggers working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the supertagger model to ensure its results are sane:&lt;br /&gt;
 $SCRIPTS/cl07_table4 working/&lt;br /&gt;
 &lt;br /&gt;
 # Create the model_hybrid directory and empty config file:&lt;br /&gt;
 mkdir working/model_hybrid&lt;br /&gt;
 touch working/model_hybrid/config&lt;br /&gt;
 &lt;br /&gt;
 # Train a hybrid model:&lt;br /&gt;
 export LD_LIBRARY_PATH=$EXT/lib:$LIB&lt;br /&gt;
 $SCRIPTS/create_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 $SCRIPTS/train_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the parser model:&lt;br /&gt;
 $SCRIPTS/cl07_table7 working/&lt;/div&gt;</summary>
		<author><name>KEvang</name></author>
	</entry>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10243</id>
		<title>Training the C&amp;C Parser</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10243"/>
		<updated>2013-09-10T10:46:51Z</updated>

		<summary type="html">&lt;p&gt;KEvang: typo&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://svn.ask.it.usyd.edu.au/trac/candc C&amp;amp;C Parser] is an advanced statistical parser using the framework of Combinatory Categorial Grammar (CCG). It is quite easy to use with pre-trained models, but creating one&#039;s own models is a slightly different story. Although the software is distributed with a wealth of scripts that should make training easy, differences between systems and dependencies on various libraries make the task of getting the training code to work a bit daunting. The following are detailed step-by-step instructions to replicate the (almost) exact figures reported in Clark&amp;amp;Curran (2007)&amp;lt;ref&amp;gt;Stephen Clark and James Curran (2007): Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models. In &amp;lt;i&amp;gt;Computational Linguistics 33(4)&amp;lt;/i&amp;gt;, http://aclweb.org/anthology-new/J/J07/J07-4004.pdf&amp;lt;/ref&amp;gt; on a single &#039;&#039;&#039;64-bit Ubuntu 12.04&#039;&#039;&#039; machine. The steps to take on other recent Linux distributions should be very similar.&lt;br /&gt;
&lt;br /&gt;
Please extend the instructions with more detail, helpful hints and notes on other operating systems! They were initially written up by [[User:KEvang|Kilian Evang]]; thanks are due to Tim Dawborn, Stephen Clark and James Curran for advice without which I would probably never have gotten it to run.&lt;br /&gt;
&lt;br /&gt;
 # Customize these variables:&lt;br /&gt;
 export CANDC_PREFIX=$HOME&lt;br /&gt;
 export CCGBANK=$HOME/data/CCGbank1.2&lt;br /&gt;
 export TMPDIR=$HOME/tmp # the default /tmp is often on a tiny filesystem&lt;br /&gt;
 export NUMNODES=32&lt;br /&gt;
 export LIB=/usr/lib&lt;br /&gt;
 &lt;br /&gt;
 # Some variables for use below:&lt;br /&gt;
 export CANDC=$CANDC_PREFIX/candc&lt;br /&gt;
 export SCRIPTS=$CANDC/src/scripts/ccg&lt;br /&gt;
 export EXT=$CANDC/ext&lt;br /&gt;
 &lt;br /&gt;
 # Package dependencies:&lt;br /&gt;
 sudo apt-get install g++ gawk libibumad-dev mpich2 subversion&lt;br /&gt;
 &lt;br /&gt;
 # Check out the C&amp;amp;C tools.&lt;br /&gt;
 # You need credentials for that, see&lt;br /&gt;
 # http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Subversion&lt;br /&gt;
 cd $CANDC_PREFIX&lt;br /&gt;
 svn checkout http://svn.ask.it.usyd.edu.au/candc/trunk candc -r 2400&lt;br /&gt;
 &lt;br /&gt;
 # Some patches to fix various problems with the scripts provided:&lt;br /&gt;
 &lt;br /&gt;
 # Use a temp directory different from /tmp since that often doesn&#039;t have enough&lt;br /&gt;
 # space:&lt;br /&gt;
 sed -i -e &amp;quot;s|/tmp|$TMPDIR|&amp;quot; $SCRIPTS/*_model_*&lt;br /&gt;
 &lt;br /&gt;
 # Replace /bin/env with /usr/bin/env&lt;br /&gt;
 sed -i -e &amp;quot;s|/bin/env|/usr/bin/env|&amp;quot; $SCRIPTS/lexicon_features \&lt;br /&gt;
         $SCRIPTS/count_features&lt;br /&gt;
 &lt;br /&gt;
 # Work around non-portable sed -f shebang&lt;br /&gt;
 sed -i -e &#039;s|$SCRIPTS/convert_brackets|sed -f $SCRIPTS/convert_brackets|g&#039; \&lt;br /&gt;
         $SCRIPTS/create_data&lt;br /&gt;
 &lt;br /&gt;
 # TODO patches to make the scripts work with the LDC version of CCGbank should&lt;br /&gt;
 # go here.&lt;br /&gt;
 &lt;br /&gt;
 # Make ext directory&lt;br /&gt;
 mkdir $EXT&lt;br /&gt;
 &lt;br /&gt;
 # Install Boost library (Ubuntu doesn&#039;t seem to have a version that is compiled&lt;br /&gt;
 # against MPICH2).&lt;br /&gt;
 echo &#039;using mpi ;&#039; &amp;gt; ~/user-config.jam # Boost&#039;s build script won&#039;t build MPI&lt;br /&gt;
        # library without this for some reason&lt;br /&gt;
 mkdir $EXT/install&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget https://dl.dropboxusercontent.com/u/5358991/boost_1_53_0.tar.gz # or&lt;br /&gt;
        # get it from Sourceforge&lt;br /&gt;
 tar -xzf boost_1_53_0.tar.gz&lt;br /&gt;
 cd boost_1_53_0&lt;br /&gt;
 ./bootstrap.sh --with-libraries=mpi --prefix=$EXT&lt;br /&gt;
 ./b2 install&lt;br /&gt;
 &lt;br /&gt;
 # Install the ancient MR-MPI version that C&amp;amp;C depends on&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget http://sydney.edu.au/it/~tdaw3088/misc/mrmpi-22Apr09.tbz2 # If this link is&lt;br /&gt;
         # dead, try http://dl.dropbox.com/u/5358991/mrmpi-22Apr09.tbz2&lt;br /&gt;
 tar jxf mrmpi-22Apr09.tbz2&lt;br /&gt;
 cd mrmpi-22Apr09/src&lt;br /&gt;
 make -f Makefile.linux clean&lt;br /&gt;
 make -f Makefile.linux&lt;br /&gt;
 cp *.h $EXT/include&lt;br /&gt;
 cp libmrmpi.a $EXT/lib&lt;br /&gt;
 &lt;br /&gt;
 # Build C&amp;amp;C&lt;br /&gt;
 cd $CANDC&lt;br /&gt;
 make -f Makefile.linux all train bin/generate&lt;br /&gt;
 &lt;br /&gt;
 # Create data&lt;br /&gt;
 # Will only work with CCGbank 1.2 for now, not with LDC version of CCGbank&lt;br /&gt;
 $SCRIPTS/create_data $CCGBANK $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Train the POS tagger and the supertagger:&lt;br /&gt;
 $SCRIPTS/train_taggers working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the supertagger model to ensure its results are sane:&lt;br /&gt;
 $SCRIPTS/cl07_table4 working/&lt;br /&gt;
 &lt;br /&gt;
 # Create the model_hybrid directory and empty config file:&lt;br /&gt;
 mkdir working/model_hybrid&lt;br /&gt;
 touch working/model_hybrid/config&lt;br /&gt;
 &lt;br /&gt;
 # Train a hybrid model:&lt;br /&gt;
 export LD_LIBRARY_PATH=$EXT/lib:$LIB&lt;br /&gt;
 $SCRIPTS/create_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 $SCRIPTS/train_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the parser model:&lt;br /&gt;
 $SCRIPTS/cl07_table7 working/&lt;/div&gt;</summary>
		<author><name>KEvang</name></author>
	</entry>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10242</id>
		<title>Training the C&amp;C Parser</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10242"/>
		<updated>2013-09-09T15:45:16Z</updated>

		<summary type="html">&lt;p&gt;KEvang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://svn.ask.it.usyd.edu.au/trac/candc C&amp;amp;C Parser] is an advanced statistical parser using the framework of Combinatory Categorial Grammar (CCG). It is quite easy to use with pre-trained models, but creating one&#039;s own models is a slightly different story. Although the software is distributed with a wealth of scripts that should make training easy, differences between systems and dependencies on various libraries make the task of getting the training code to work a bit daunting. The following are detailed step-by-step instructions to replicate the (almost) exact figures reported in Clark&amp;amp;Curran (2007)&amp;lt;ref&amp;gt;Stephen Clark and James Curran (2007): Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models. In &amp;lt;i&amp;gt;Computational Linguistics 33(4)&amp;lt;/i&amp;gt;, http://aclweb.org/anthology-new/J/J07/J07-4004.pdf&amp;lt;/ref&amp;gt; on a single &#039;&#039;&#039;64-bit Ubuntu 12.04&#039;&#039;&#039; machine. The steps to take on other recent Linux distributions should be very similar.&lt;br /&gt;
&lt;br /&gt;
Please extend the instructions with more detail, helpful hints and notes on other operating systems! They were initially written up by [[User:KEvang|Kilian Evang]]; thanks are due to Tim Dawborn, Stephen Clark and James Curran for advice without which I would probably never have gotten it to run.&lt;br /&gt;
&lt;br /&gt;
 # Customize these variables:&lt;br /&gt;
 export CANDC_PREFIX=$HOME&lt;br /&gt;
 export CCGBANK=$HOME/data/CCGbank1.2&lt;br /&gt;
 export TMPDIR=$HOME/tmp # the default /tmp is often on a tiny filesystem&lt;br /&gt;
 export NUMNODES=32&lt;br /&gt;
 export LIB=/usr/lib&lt;br /&gt;
 &lt;br /&gt;
 # Some variables for use below:&lt;br /&gt;
 export CANDC=$CANDC_PREFIX/candc&lt;br /&gt;
 export SCRIPTS=$CANDC/src/scripts/ccg&lt;br /&gt;
 export EXT=$CANDC/ext&lt;br /&gt;
 &lt;br /&gt;
 # Package dependencies:&lt;br /&gt;
 sudo apt-get install g++ gawk libibumad-dev mpich2 subversion&lt;br /&gt;
 &lt;br /&gt;
 # Check out the C&amp;amp;C tools.&lt;br /&gt;
 # You need credentials for that, see&lt;br /&gt;
 # http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Subversion&lt;br /&gt;
 cd $CANDC_PREFIX&lt;br /&gt;
 svn checkout http://svn.ask.it.usyd.edu.au/candc/trunk candc -r 2400&lt;br /&gt;
 &lt;br /&gt;
 # Some patches to fix various problems with the scripts provided:&lt;br /&gt;
 &lt;br /&gt;
 # Use a temp directory different from /tmp since that often doesn&#039;t have enough&lt;br /&gt;
 # space:&lt;br /&gt;
 sed -i -e &amp;quot;s|/tmp|$TMPDIR|&amp;quot; $SCRIPTS/*_model_*&lt;br /&gt;
 &lt;br /&gt;
 # Replace /bin/env by /usr/bin/env&lt;br /&gt;
 sed -i -e &amp;quot;s|/bin/env|/usr/bin/env|&amp;quot; $SCRIPTS/lexicon_features \&lt;br /&gt;
         $SCRIPTS/count_features&lt;br /&gt;
 &lt;br /&gt;
 # Work around non-portable sed -f shebang&lt;br /&gt;
 sed -i -e &#039;s|$SCRIPTS/convert_brackets|sed -f $SCRIPTS/convert_brackets|g&#039; \&lt;br /&gt;
         $SCRIPTS/create_data&lt;br /&gt;
 &lt;br /&gt;
 # TODO patches to make the scripts work with the LDC version of CCGbank should&lt;br /&gt;
 # go here.&lt;br /&gt;
 &lt;br /&gt;
 # Make ext directory&lt;br /&gt;
 mkdir $EXT&lt;br /&gt;
 &lt;br /&gt;
 # Install Boost library (Ubuntu doesn&#039;t seem to have a version that is compiled&lt;br /&gt;
 # against MPICH2).&lt;br /&gt;
 echo &#039;using mpi ;&#039; &amp;gt; ~/user-config.jam # Boost&#039;s build script won&#039;t build MPI&lt;br /&gt;
        # library without this for some reason&lt;br /&gt;
 mkdir $EXT/install&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget https://dl.dropboxusercontent.com/u/5358991/boost_1_53_0.tar.gz # or&lt;br /&gt;
        # get it from Sourceforge&lt;br /&gt;
 tar -xzf boost_1_53_0.tar.gz&lt;br /&gt;
 cd boost_1_53_0&lt;br /&gt;
 ./bootstrap.sh --with-libraries=mpi --prefix=$EXT&lt;br /&gt;
 ./b2 install&lt;br /&gt;
 &lt;br /&gt;
 # Install the old MR-MPI release (22Apr09) that C&amp;amp;C depends on&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget http://sydney.edu.au/it/~tdaw3088/misc/mrmpi-22Apr09.tbz2 # If this link is&lt;br /&gt;
         # dead, try http://dl.dropbox.com/u/5358991/mrmpi-22Apr09.tbz2&lt;br /&gt;
 tar jxf mrmpi-22Apr09.tbz2&lt;br /&gt;
 cd mrmpi-22Apr09/src&lt;br /&gt;
 make -f Makefile.linux clean&lt;br /&gt;
 make -f Makefile.linux&lt;br /&gt;
 cp *.h $EXT/include&lt;br /&gt;
 cp libmrmpi.a $EXT/lib&lt;br /&gt;
 &lt;br /&gt;
 # Build C&amp;amp;C&lt;br /&gt;
 cd $CANDC&lt;br /&gt;
 make -f Makefile.linux all train bin/generate&lt;br /&gt;
 &lt;br /&gt;
 # Create data&lt;br /&gt;
 # Will only work with CCGbank 1.2 for now, not with LDC version of CCGbank&lt;br /&gt;
 $SCRIPTS/create_data $CCGBANK $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Train the POS tagger and Supertagger:&lt;br /&gt;
 $SCRIPTS/train_taggers working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the supertagger model to ensure its results are sane:&lt;br /&gt;
 $SCRIPTS/cl07_table4 working/&lt;br /&gt;
 &lt;br /&gt;
 # Create the model_hybrid directory and empty config file:&lt;br /&gt;
 mkdir working/model_hybrid&lt;br /&gt;
 touch working/model_hybrid/config&lt;br /&gt;
 &lt;br /&gt;
 # Train a hybrid model:&lt;br /&gt;
 export LD_LIBRARY_PATH=$EXT/lib:$LIB&lt;br /&gt;
 $SCRIPTS/create_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 $SCRIPTS/train_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the parser model:&lt;br /&gt;
 $SCRIPTS/cl07_table7 working/&lt;/div&gt;</summary>
		<author><name>KEvang</name></author>
	</entry>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10241</id>
		<title>Training the C&amp;C Parser</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10241"/>
		<updated>2013-09-09T15:44:28Z</updated>

		<summary type="html">&lt;p&gt;KEvang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://svn.ask.it.usyd.edu.au/trac/candc C&amp;amp;C Parser] is an advanced statistical parser using the framework of Combinatory Categorial Grammar (CCG). It is quite easy to use with pre-trained models, but creating one&#039;s own models is a slightly different story. Although the software is distributed with a wealth of scripts that should make training easy, differences between systems and dependencies on various libraries make the task of getting the training code to work a bit daunting. The following are detailed step-by-step instructions to replicate (almost exactly) the figures reported in Clark &amp;amp; Curran (2007)&amp;lt;ref&amp;gt;Stephen Clark and James Curran (2007): Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models. In &amp;lt;i&amp;gt;Computational Linguistics 33(4)&amp;lt;/i&amp;gt;, http://aclweb.org/anthology-new/J/J07/J07-4004.pdf&amp;lt;/ref&amp;gt; on a single &#039;&#039;&#039;64-bit Ubuntu 12.04&#039;&#039;&#039; machine. The steps to take on other recent Linux distributions should be very similar.&lt;br /&gt;
&lt;br /&gt;
Please extend the instructions with more detail, helpful hints and notes on other operating systems! They were initially written up by [[User:KEvang|Kilian Evang]]; thanks are due to Tim Dawborn, Stephen Clark and James Curran for advice without which I would probably never have gotten it to run.&lt;br /&gt;
&lt;br /&gt;
 # Customize these variables:&lt;br /&gt;
 export CANDC_PREFIX=$HOME&lt;br /&gt;
 export CCGBANK=$HOME/data/CCGbank1.2&lt;br /&gt;
 export TMPDIR=$HOME/tmp # the default /tmp is often on a tiny filesystem&lt;br /&gt;
 export NUMNODES=32&lt;br /&gt;
 export LIB=/usr/lib&lt;br /&gt;
 &lt;br /&gt;
 # Some variables for use below:&lt;br /&gt;
 export CANDC=$CANDC_PREFIX/candc&lt;br /&gt;
 export SCRIPTS=$CANDC/src/scripts/ccg&lt;br /&gt;
 export EXT=$CANDC/ext&lt;br /&gt;
 &lt;br /&gt;
 # Package dependencies:&lt;br /&gt;
 sudo apt-get install g++ gawk libibumad-dev mpich2 subversion&lt;br /&gt;
 &lt;br /&gt;
 # Check out the C&amp;amp;C tools.&lt;br /&gt;
 # You need credentials for that, see&lt;br /&gt;
 # http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Subversion&lt;br /&gt;
 cd $CANDC_PREFIX&lt;br /&gt;
 svn checkout http://svn.ask.it.usyd.edu.au/candc/trunk candc -r 2400&lt;br /&gt;
 &lt;br /&gt;
 # Some patches to fix various problems with the scripts provided:&lt;br /&gt;
 &lt;br /&gt;
 # Use a temp directory different from /tmp since that often doesn&#039;t have enough&lt;br /&gt;
 # space:&lt;br /&gt;
 sed -i -e &amp;quot;s|/tmp|$TMPDIR|&amp;quot; $SCRIPTS/*_model_*&lt;br /&gt;
 &lt;br /&gt;
 # Replace /bin/env by /usr/bin/env&lt;br /&gt;
 sed -i -e &amp;quot;s|/bin/env|/usr/bin/env|&amp;quot; $SCRIPTS/lexicon_features \&lt;br /&gt;
         $SCRIPTS/count_features&lt;br /&gt;
 &lt;br /&gt;
 # Work around non-portable sed -f shebang&lt;br /&gt;
 sed -i -e &#039;s|$SCRIPTS/convert_brackets|sed -f $SCRIPTS/convert_brackets|g&#039; \&lt;br /&gt;
         $SCRIPTS/create_data&lt;br /&gt;
 &lt;br /&gt;
 # TODO patches to make the scripts work with the LDC version of CCGbank should&lt;br /&gt;
 # go here.&lt;br /&gt;
 &lt;br /&gt;
 # Make ext directory&lt;br /&gt;
 mkdir $EXT&lt;br /&gt;
 &lt;br /&gt;
 # Install Boost library (Ubuntu doesn&#039;t seem to have a version that is compiled&lt;br /&gt;
 # against MPICH2).&lt;br /&gt;
 echo &#039;using mpi ;&#039; &amp;gt; ~/user-config.jam # Boost&#039;s build script won&#039;t build MPI&lt;br /&gt;
        # library without this for some reason&lt;br /&gt;
 mkdir $EXT/install&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget https://dl.dropboxusercontent.com/u/5358991/boost_1_53_0.tar.gz # or&lt;br /&gt;
        # get it from Sourceforge&lt;br /&gt;
 tar -xzf boost_1_53_0.tar.gz&lt;br /&gt;
 cd boost_1_53_0&lt;br /&gt;
 ./bootstrap.sh --with-libraries=mpi --prefix=$EXT&lt;br /&gt;
 ./b2 install&lt;br /&gt;
 &lt;br /&gt;
 # Install the old MR-MPI release (22Apr09) that C&amp;amp;C depends on&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget http://sydney.edu.au/it/~tdaw3088/misc/mrmpi-22Apr09.tbz2 # If this link is&lt;br /&gt;
         # dead, try http://dl.dropbox.com/u/5358991/mrmpi-22Apr09.tbz2&lt;br /&gt;
 tar jxf mrmpi-22Apr09.tbz2&lt;br /&gt;
 cd mrmpi-22Apr09/src&lt;br /&gt;
 make -f Makefile.linux clean&lt;br /&gt;
 make -f Makefile.linux&lt;br /&gt;
 cp *.h $EXT/include&lt;br /&gt;
 cp libmrmpi.a $EXT/lib&lt;br /&gt;
 &lt;br /&gt;
 # Build C&amp;amp;C&lt;br /&gt;
 cd $CANDC&lt;br /&gt;
 make -f Makefile.linux all train bin/generate&lt;br /&gt;
 &lt;br /&gt;
 # Create data&lt;br /&gt;
 # Will only work with CCGbank 1.2 for now, not with LDC version of CCGbank&lt;br /&gt;
 $SCRIPTS/create_data $CCGBANK $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Train the POS tagger and Supertagger:&lt;br /&gt;
 $SCRIPTS/train_taggers working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the supertagger model to ensure its results are sane:&lt;br /&gt;
 $SCRIPTS/cl07_table4 working/&lt;br /&gt;
 &lt;br /&gt;
 # Create the model_hybrid directory and empty config file:&lt;br /&gt;
 mkdir working/model_hybrid&lt;br /&gt;
 touch working/model_hybrid/config&lt;br /&gt;
 &lt;br /&gt;
 # Train a hybrid model:&lt;br /&gt;
 export LD_LIBRARY_PATH=$EXT/lib:$LIB&lt;br /&gt;
 $SCRIPTS/create_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 $SCRIPTS/train_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the parser model:&lt;br /&gt;
 $SCRIPTS/cl07_table7 working/&lt;/div&gt;</summary>
		<author><name>KEvang</name></author>
	</entry>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10240</id>
		<title>Training the C&amp;C Parser</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Training_the_C%26C_Parser&amp;diff=10240"/>
		<updated>2013-09-09T15:39:12Z</updated>

		<summary type="html">&lt;p&gt;KEvang: Created page with &amp;quot;The [http://svn.ask.it.usyd.edu.au/trac/candc C&amp;amp;C Parser] is an advanced statistical parser using the framework of Combinatory Categorial Grammar (C&amp;amp;C). It is quite easy to us...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://svn.ask.it.usyd.edu.au/trac/candc C&amp;amp;C Parser] is an advanced statistical parser using the framework of Combinatory Categorial Grammar (CCG). It is quite easy to use with pre-trained models, but creating one&#039;s own models is a slightly different story. Although the software is distributed with a wealth of scripts that should make training easy, differences between systems and dependencies on various libraries make the task of getting the training code to work a bit daunting. The following are terse but detailed step-by-step instructions to replicate (almost exactly) the figures reported in Clark &amp;amp; Curran (2007)&amp;lt;ref&amp;gt;Stephen Clark and James Curran (2007): Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models. In &amp;lt;i&amp;gt;Computational Linguistics 33(4)&amp;lt;/i&amp;gt;, http://aclweb.org/anthology-new/J/J07/J07-4004.pdf&amp;lt;/ref&amp;gt; on a single &#039;&#039;&#039;64-bit Ubuntu 12.04&#039;&#039;&#039; machine. The steps to take on other recent Linux distributions should be very similar.&lt;br /&gt;
&lt;br /&gt;
Please extend the instructions with more detail, helpful hints and notes on other operating systems! They were initially written up by [[User:KEvang|Kilian Evang]]; thanks are due to Tim Dawborn, Stephen Clark and James Curran for advice without which I would probably never have gotten it to run.&lt;br /&gt;
&lt;br /&gt;
 # Customize these variables:&lt;br /&gt;
 export CANDC_PREFIX=$HOME&lt;br /&gt;
 export CCGBANK=$HOME/data/CCGbank1.2&lt;br /&gt;
 export TMPDIR=$HOME/tmp # the default /tmp is often on a tiny filesystem&lt;br /&gt;
 export NUMNODES=32&lt;br /&gt;
 export LIB=/usr/lib&lt;br /&gt;
 &lt;br /&gt;
 # Some variables for use below:&lt;br /&gt;
 export CANDC=$CANDC_PREFIX/candc&lt;br /&gt;
 export SCRIPTS=$CANDC/src/scripts/ccg&lt;br /&gt;
 export EXT=$CANDC/ext&lt;br /&gt;
 &lt;br /&gt;
 # Package dependencies:&lt;br /&gt;
 sudo apt-get install g++ gawk libibumad-dev mpich2 subversion&lt;br /&gt;
 &lt;br /&gt;
 # Check out the C&amp;amp;C tools.&lt;br /&gt;
 # You need credentials for that, see&lt;br /&gt;
 # http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Subversion&lt;br /&gt;
 cd $CANDC_PREFIX&lt;br /&gt;
 svn checkout http://svn.ask.it.usyd.edu.au/candc/trunk candc -r 2400&lt;br /&gt;
 &lt;br /&gt;
 # Some patches to fix various problems with the scripts provided:&lt;br /&gt;
 &lt;br /&gt;
 # Use a temp directory different from /tmp since that often doesn&#039;t have enough&lt;br /&gt;
 # space:&lt;br /&gt;
 sed -i -e &amp;quot;s|/tmp|$TMPDIR|&amp;quot; $SCRIPTS/*_model_*&lt;br /&gt;
 &lt;br /&gt;
 # Replace /bin/env by /usr/bin/env&lt;br /&gt;
 sed -i -e &amp;quot;s|/bin/env|/usr/bin/env|&amp;quot; $SCRIPTS/lexicon_features \&lt;br /&gt;
         $SCRIPTS/count_features&lt;br /&gt;
 &lt;br /&gt;
 # Work around non-portable sed -f shebang&lt;br /&gt;
 sed -i -e &#039;s|$SCRIPTS/convert_brackets|sed -f $SCRIPTS/convert_brackets|g&#039; \&lt;br /&gt;
         $SCRIPTS/create_data&lt;br /&gt;
 &lt;br /&gt;
 # TODO patches to make the scripts work with the LDC version of CCGbank should&lt;br /&gt;
 # go here.&lt;br /&gt;
 &lt;br /&gt;
 # Make ext directory&lt;br /&gt;
 mkdir $EXT&lt;br /&gt;
 &lt;br /&gt;
 # Install Boost library (Ubuntu doesn&#039;t seem to have a version that is compiled&lt;br /&gt;
 # against MPICH2).&lt;br /&gt;
 echo &#039;using mpi ;&#039; &amp;gt; ~/user-config.jam # Boost&#039;s build script won&#039;t build MPI&lt;br /&gt;
        # library without this for some reason&lt;br /&gt;
 mkdir $EXT/install&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget https://dl.dropboxusercontent.com/u/5358991/boost_1_53_0.tar.gz # or&lt;br /&gt;
        # get it from Sourceforge&lt;br /&gt;
 tar -xzf boost_1_53_0.tar.gz&lt;br /&gt;
 cd boost_1_53_0&lt;br /&gt;
 ./bootstrap.sh --with-libraries=mpi --prefix=$EXT&lt;br /&gt;
 ./b2 install&lt;br /&gt;
 &lt;br /&gt;
 # Install the old MR-MPI release (22Apr09) that C&amp;amp;C depends on&lt;br /&gt;
 cd $EXT/install&lt;br /&gt;
 wget http://sydney.edu.au/it/~tdaw3088/misc/mrmpi-22Apr09.tbz2 # If this link is&lt;br /&gt;
         # dead, try http://dl.dropbox.com/u/5358991/mrmpi-22Apr09.tbz2&lt;br /&gt;
 tar jxf mrmpi-22Apr09.tbz2&lt;br /&gt;
 cd mrmpi-22Apr09/src&lt;br /&gt;
 make -f Makefile.linux clean&lt;br /&gt;
 make -f Makefile.linux&lt;br /&gt;
 cp *.h $EXT/include&lt;br /&gt;
 cp libmrmpi.a $EXT/lib&lt;br /&gt;
 &lt;br /&gt;
 # Build C&amp;amp;C&lt;br /&gt;
 cd $CANDC&lt;br /&gt;
 make -f Makefile.linux all train bin/generate&lt;br /&gt;
 &lt;br /&gt;
 # Create data&lt;br /&gt;
 # Will only work with CCGbank 1.2 for now, not with LDC version of CCGbank&lt;br /&gt;
 $SCRIPTS/create_data $CCGBANK $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Train the POS tagger and Supertagger:&lt;br /&gt;
 $SCRIPTS/train_taggers working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the supertagger model to ensure its results are sane:&lt;br /&gt;
 $SCRIPTS/cl07_table4 working/&lt;br /&gt;
 &lt;br /&gt;
 # Create the model_hybrid directory and empty config file:&lt;br /&gt;
 mkdir working/model_hybrid&lt;br /&gt;
 touch working/model_hybrid/config&lt;br /&gt;
 &lt;br /&gt;
 # Train a hybrid model:&lt;br /&gt;
 export LD_LIBRARY_PATH=$EXT/lib:$LIB&lt;br /&gt;
 $SCRIPTS/create_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 $SCRIPTS/train_model_hybrid `pwd` $NUMNODES working/&lt;br /&gt;
 &lt;br /&gt;
 # Evaluate the parser model:&lt;br /&gt;
 $SCRIPTS/cl07_table7 working/&lt;/div&gt;</summary>
		<author><name>KEvang</name></author>
	</entry>
	<entry>
		<id>https://www.aclweb.org/aclwiki/index.php?title=Research&amp;diff=10239</id>
		<title>Research</title>
		<link rel="alternate" type="text/html" href="https://www.aclweb.org/aclwiki/index.php?title=Research&amp;diff=10239"/>
		<updated>2013-09-09T14:52:52Z</updated>

		<summary type="html">&lt;p&gt;KEvang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a list of links to information on research in Computational Linguistics.&lt;br /&gt;
&lt;br /&gt;
* [http://www.aclweb.org/anthology ACL Anthology] - more than 10,000 CL papers&lt;br /&gt;
* [[Bibliographies]]&lt;br /&gt;
* [[Books]]&lt;br /&gt;
* [[Formalisms]]&lt;br /&gt;
* [[Papers]]&lt;br /&gt;
* [[Resources]]&lt;br /&gt;
* [[Searching for papers]]&lt;br /&gt;
* [[Wikipedia articles]] - on topics related to Computational Linguistics&lt;br /&gt;
&lt;br /&gt;
== ACL Wiki articles and tutorials ==&lt;br /&gt;
Write your own article or tutorial!&lt;br /&gt;
&amp;lt;!-- Please keep this list in alphabetical order --&amp;gt;&lt;br /&gt;
* [[Active Learning for NLP]] (stub)&lt;br /&gt;
* [[Computational Lexicology]]&lt;br /&gt;
* [[Computational Morphology]] (stub)&lt;br /&gt;
* [[Computational Phonology]]&lt;br /&gt;
* [[Computational Semantics]]&lt;br /&gt;
* [[Computational Syntax]]&lt;br /&gt;
* [[Constrained Conditional Model]] (stub)&lt;br /&gt;
* [[Dialectometrics]]&lt;br /&gt;
* [[Dialogue Systems]] (stub)&lt;br /&gt;
* [[Distributional Hypothesis]]&lt;br /&gt;
* [[Graph Based Methods]] (stub)&lt;br /&gt;
* [[Information Extraction]] (stub)&lt;br /&gt;
* [[Lexical Acquisition]] (stub)&lt;br /&gt;
* [[Machine Translation]] (stub)&lt;br /&gt;
* [[Natural Language Generation Portal]]&lt;br /&gt;
* [[Natural Language Understanding]] (redirect)&lt;br /&gt;
* [[Multiword Expressions]] (stub)&lt;br /&gt;
* [[Parsing]] (stub)&lt;br /&gt;
* [[Part-of-speech tagging]]&lt;br /&gt;
* [[Question Answering]]&lt;br /&gt;
* [[Semantics]] (stub)&lt;br /&gt;
* [[Speech Processing]]&lt;br /&gt;
* [[Statistical Semantics]]&lt;br /&gt;
* [[Text Categorization]]&lt;br /&gt;
* [[Textual Entailment]]&lt;br /&gt;
* [[Text Summarization]] (stub)&lt;br /&gt;
* [[Training the C&amp;amp;C Parser]]&lt;br /&gt;
* [[Word Sense Disambiguation]]&lt;br /&gt;
&amp;lt;!-- Please keep this list in alphabetical order --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Category:Research|*]]&lt;/div&gt;</summary>
		<author><name>KEvang</name></author>
	</entry>
</feed>