-
Notifications
You must be signed in to change notification settings - Fork 4
NoJa
These are some notes to set up a system using the LOGON architecture for a new langauge pair. This is not officially supported by anyone at the moment.
Disclaimer: These pages are meant to be helpful, but that doesn't mean the authors will always be able to helpfully answer questions.
Contents
- Norwegian Japanese Machine Translation "NoJa"
- Running No/Ja
- Trouble shooting suggestions
- Set Up
- Differences in attributes
We recommend you have at least 3GB of RAM. Even more memory wouldn't hurt.
-
start transfer and generation in one emacs (M-x noja)
-
start a generator server
- trollet
- load ~/logon/dfki/jacy/lkb/script
- index for generator
- start server
-
translate
-
(mt::parse-interactively "overset meg") (C-c r)
-
a parse window appears
-
select the parse (with next/previous) and click transfer
-
select the translation (with next/previous) and click generate
-
the translation should magically popup in a little window
-
You can also run it as a batch with a bit more undocumented setup.
- start afresh from a new emacs
- are the predicates strings or types? (it is easy to confuse the two)
- do you have INPUT and OUTPUT the right way round?
Not all of the bits have been publically released yet (2006-06-09).
-
get a recent LOGON CVS (see LogonInstallation)
- fulfill your licensing requirements
- put your ACL licence in
- delete any bits you shouldn't have
- fulfill your licensing requirements
-
add in norsource
-
add in noja
-
setup various files
- .bashrc
- .emacs
- logon/dot.tsbrc
LOGONROOT=~/logon
if [ -f ${LOGONROOT}/dot.bashrc ]; then
. ${LOGONROOT}/dot.bashrc
fi
;;;
;;; LOGON-specific settings
(defun log ()
(interactive)
(if (getenv "LOGONROOT")
(let ((logon (substitute-in-file-name "$LOGONROOT")))
(if (file-exists-p (format "%s/dot.emacs" logon))
(load (format "%s/dot.emacs" logon) nil t t)))))
(defun jacy ()
(interactive)
;; set up logon
(log)
;; load lisp
(lisp)
;; make the encoding suitable for japanese (EUC-JP)
(japanese)
;; load the common-lisp commands
(insert (format ":ld %s/dot.clinit.cl\n" logon-root))
(fi:inferior-lisp-newline)
;; load the machine translation controller
(insert "(lmt)")
;; load the tsdb settings
(insert (format ":ld %s/dot.tsdbrc\n" logon-root))
(fi:inferior-lisp-newline)
;;set tsdb home and skeleton home
(insert "(tsdb::tsdb :home \"/home/bond/treebank/mrs\")")
(fi:inferior-lisp-newline)
(insert (format
"(tsdb::tsdb :skeleton \"%s/dfki/jacy/tsdb/skeletons\")"
logon-root))
(fi:inferior-lisp-newline)
;; load the grammar
(insert
(format "(read-script-file-aux \"%s/dfki/jacy/lkb/script\")"
logon-root))
(fi:inferior-lisp-newline))
(defun norse ()
(interactive)
;; set up logon
(log)
;; load lisp
(lisp)
;; load the common-lisp commands
(insert (format ":ld %s/dot.clinit.cl\n" logon-root))
(fi:inferior-lisp-newline)
;; load the machine translation controller
(fi:eval-in-lisp "(lmt)")
;; load the tsdb settings
(insert (format ":ld %s/dot.tsdbrc\n" logon-root))
(fi:inferior-lisp-newline)
;;set tsdb home and skeleton home
(insert "(tsdb::tsdb :home \"/home/bond/treebank/norse\")")
(fi:inferior-lisp-newline)
(insert (format
"(tsdb::tsdb :skeleton \"%s/ntnu/norsource/tsdb/skeletons\")"
logon-root))
(fi:inferior-lisp-newline)
;; load the grammar
(insert
(format "(read-script-file-aux \"%s/ntnu/norsource/lkb/scribet\")"
logon-root))
(fi:inferior-lisp-newline))
(defun noja ()
(interactive)
;; set up logon
(log)
;; load lisp
(lisp)
;; load the common-lisp commands
(insert (format ":ld %s/dot.clinit.cl\n" logon-root))
(fi:inferior-lisp-newline)
;; load the machine translation controller
(fi:eval-in-lisp "(lmt)")
;; load the tsdb settings
(insert (format ":ld %s/dot.tsdbrc\n" logon-root))
(fi:inferior-lisp-newline)
;; load the parser
(insert "(tsdb:tsdb :cpu :norse-parse :file t)")
(fi:inferior-lisp-newline)
;; load the transfer grammar
(insert
(format "(read-script-file-aux \"%s/ntnu/noja/lkb/script\")" logon-root))
(fi:inferior-lisp-newline))
;;;
;;; for NoJa (Norsource/Jacy)
;;;
(make-cpu
:host (short-site-name)
:spawn binary
:options (list "-I" base "-qq" "-locale" "no_NO.UTF-8"
"-L" (format nil "~a/ntnu/norse-parse.lisp" %logon%))
:class :norse-parse :name "norse-parse" :grammar "Norsource"
:task '(:parse) :wait wait :quantum quantum)
(in-package :common-lisp-user)
;;
;; make sure we have enough space available
;;
(system:resize-areas :old 256 :new 256)
(let* ((logon (system:getenv "LOGONROOT"))
(lingo (namestring (parse-namestring (format nil "~a/lingo" logon)))))
;;
;; load MK defsystem() and LinGO load-up library first
;;
(load (format nil "~a/lingo/lkb/src/general/loadup" logon))
;;
;; for NorSource, we need (close to) the full scoop
;;
(pushnew :lkb *features*)
(pushnew :mrs *features*)
(pushnew :tsdb *features*)
(pushnew :logon *features*)
(pushnew :slave *features*)
(excl:tenuring
(funcall (intern "COMPILE-SYSTEM" :make) "tsdb")
(funcall
(intern "READ-SCRIPT-FILE-AUX" :lkb)
(format nil "~a/ntnu/norsource/lkb/scribet" logon)))
(set (intern "*MAXIMUM-NUMBER-OF-EDGES*" :lkb) 10000)
(excl:gc :tenure) (excl:gc) (excl:gc t) (excl:gc)
(setf (sys:gsgc-parameter :auto-step) nil)
(set (intern "*TSDB-SEMANTIX-HOOK*" :tsdb) "mrs::get-mrs-string")
(funcall (symbol-function (find-symbol "SLAVE" :tsdb))))
Gain the admiration and respect of many by:
- running things from the DELPH-IN CVS
- running the parser using a cheap client instead of lisp
- setting up the tsdb cpus as a list append... + Japanese grammar overgenerates passive (passive is PASS +, non-passive is boolean) - fix in vpm
Jacy (copying the ERG) has tense on prepositions, while Norsource doesn't. This caused some problems. We solved them, using the variable property mapping, as follows:
Add some types (to mrs.tdl) these mark events for which the entire attribute will be deleted:
ditch-tense := tense.
ditch-mood := mood.
Use these in out.vpm to delete the attribute iff marked with ditch-tense, otherwise to pass the values across.
TENSE : TENSE
ditch-tense => !
* >> *
MOOD : MOOD
ditch-mood => !
* >> *
In the input munging norse.tdl, find all prepositions with a regular expression and then mark them to have tense and mood deleted.
prep_mark_jf := monotonic_mtr &
[ CONTEXT.RELS < [ PRED "~_p_", LBL #h0, ARG0 #e & e ] >,
FILTER.RELS < [ PRED "prep_swap_mark", LBL #h0, ARG0 #e ] >,
OUTPUT.RELS < [ PRED "prep_swap_mark", LBL #h0, ARG0 #e ] >,
FLAGS.EQUAL < #e > ].
specialize_prep_nf & monotonic_mtr &
[ INPUT.RELS < [ LBL #h, ARG0 #e],
[ PRED "prep_swap_mark", LBL #h, ARG0 #e ] >,
OUTPUT.RELS < +copy+ & [ ARG0 #e &
[TENSE ditch-tense,
MOOD ditch-mood] ] > ].
Home | Forum | Discussions | Events