Neural network transition-based dependency parser (in Rust)

dpar

Introduction

dpar is a neural network transition-based dependency parser. The original Go version can be found in the oldgo branch.

Dependencies

Build-time

Run-time

  • TensorFlow

Building dpar

To compile and install dpar, run the following in the main project directory:

cargo install --path dpar-utils

To do a debug build and run unit tests, run cargo build in the main project directory. To generate API documentation, run cargo doc.

Owner
Daniël de Kok
Comments
  • Add updated parser config

    basic-parse.conf doesn't seem to be up to date with the parser anymore. A second config has been added to account for this. Alternatively, basic-parse.conf could also be replaced by extended-parse.conf.

  • Feature name restricted to ASCII_ALPHANUMERIC

    The feature definition file is restricted to ASCII_ALPHANUMERIC feature names. I don't know if that's wanted, but there are probably cases where people (me) have features with underscores or some other non-ascii-alphanumeric chars.

    A quick fix would be to replace ASCII_ALPHANUMERIC in the grammar definition file with a newly defined char rule:

    feature_name = ${ char+ }
    char = _{ !(WHITESPACE | "|" | ":" ) ~ ANY }
    

    Alternatively, a more defensive change to feature_name would explicitly whitelist the additional non-alphanumeric characters:

    feature_name = ${ (ASCII_ALPHANUMERIC | "_" | "-" )+ }
    
  • Convert feature specification parser to pest.

    This removes an external dependency (ragel) for building the feature parser.

    @sebpuetz: I thought I'd add you as the reviewer for this one, since we talked about Pest last week, so I thought you'd have an above-average interest ;).

  • Replace Numberer by Transitions in the transition systems.

    Transitions is a wrapper around Numberer that has several additional properties that are useful for transition tables.

    • It ensures that the identifier 0 is reserved for unknown transitions.
    • It returns the correct length of the transition table, which includes the special identifier 0.
    • The table can be either fresh or frozen. A fresh table automatically adds transitions that are not known. A frozen table returns the special identifier 0 when a transition is not known.

    For future consideration: provide a thaw method as well?
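
The fresh/frozen behaviour described above can be sketched roughly as follows. This is a minimal illustration, not dpar's actual code: the names `TransitionTable`, `fresh`, `freeze`, and `lookup` are hypothetical, and the sketch uses a plain `HashMap` instead of wrapping Numberer.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical transition table: identifier 0 is reserved for unknown
/// transitions, so known transitions are numbered starting from 1.
pub struct TransitionTable<T> {
    ids: HashMap<T, usize>,
    frozen: bool,
}

impl<T: Eq + Hash> TransitionTable<T> {
    /// A fresh table assigns a new identifier to every unseen transition.
    pub fn fresh() -> Self {
        TransitionTable {
            ids: HashMap::new(),
            frozen: false,
        }
    }

    /// After freezing, unseen transitions map to the reserved identifier 0.
    pub fn freeze(&mut self) {
        self.frozen = true;
    }

    /// Return the identifier of a transition, adding it if the table is fresh.
    pub fn lookup(&mut self, transition: T) -> usize {
        if let Some(&id) = self.ids.get(&transition) {
            return id;
        }
        if self.frozen {
            return 0;
        }
        // Identifier 0 is taken, so the first known transition gets 1.
        let id = self.ids.len() + 1;
        self.ids.insert(transition, id);
        id
    }

    /// Table length, counting the reserved identifier 0.
    pub fn len(&self) -> usize {
        self.ids.len() + 1
    }
}
```

A thaw method, if it were added, would only need to reset the `frozen` flag in this sketch.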

  • Support pseudo-projective parsing

    Currently, dpar assumes that all dependencies are projective, even though they are read from the non-projective column.

    Support for pseudo-projective parsing should be added to deal with non-projective structures.
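
As a starting point, the sentences that need such treatment can be found by checking for crossing arcs. The sketch below is only an illustration under stated assumptions: the function name `has_crossing_arcs` and the head-array encoding (head 0 for the artificial root, tokens numbered from 1) are hypothetical and not taken from dpar. It does not perform the pseudo-projective transformation itself.

```rust
/// Returns true if any two dependency arcs cross.
///
/// Assumed encoding: `heads[i]` is the head of token `i + 1`, and head 0
/// is the artificial root placed before the first token.
pub fn has_crossing_arcs(heads: &[usize]) -> bool {
    // Normalize every arc to (left end, right end).
    let arcs: Vec<(usize, usize)> = heads
        .iter()
        .enumerate()
        .map(|(i, &head)| {
            let dependent = i + 1;
            (head.min(dependent), head.max(dependent))
        })
        .collect();

    // Two arcs cross when exactly one end point of the second arc lies
    // strictly inside the span of the first.
    arcs.iter().any(|&(a1, b1)| {
        arcs.iter()
            .any(|&(a2, b2)| a1 < a2 && a2 < b1 && b1 < b2)
    })
}
```

The pairwise comparison is quadratic in sentence length, which is usually acceptable for a preprocessing check.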

  • Reduce memory use during training

    We vectorize all the data before optimizing the graph. This worked fine when we were just storing indices, but now that we store embeddings for embedding layers, memory use is getting out of hand (~30GB on TüBa-D/Z).

    I guess we should generate the batches on the fly instead.
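
One way to generate batches on the fly is a lazy iterator adapter that only materializes one batch at a time. This is a minimal sketch: `Batches` and `batches` are hypothetical names, and the vectorization of each instance is left out.

```rust
/// Hypothetical adapter that groups an instance iterator into batches
/// lazily, so only one batch is held in memory at a time.
pub struct Batches<I: Iterator> {
    instances: I,
    batch_size: usize,
}

impl<I: Iterator> Iterator for Batches<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull at most `batch_size` instances from the underlying iterator.
        let batch: Vec<I::Item> = self
            .instances
            .by_ref()
            .take(self.batch_size)
            .collect();

        if batch.is_empty() {
            None
        } else {
            Some(batch)
        }
    }
}

/// Wrap an instance iterator; the last batch may be smaller.
pub fn batches<I: Iterator>(instances: I, batch_size: usize) -> Batches<I> {
    assert!(batch_size > 0, "batch size must be positive");
    Batches {
        instances,
        batch_size,
    }
}
```

With this shape, the embedding lookup for an instance could happen when its batch is built, rather than for the whole training set up front.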

  • Add input vector for association metrics

    So far, input layers only stored the i32 indices that were lookups into an embedding matrix or a lookup table. A new kind of input vector is now added which makes it possible to use f32 values directly as an input vector. This is convenient for e.g. including association measures like PMI into the training process of the parser. PMIs are retrieved for the parser states that are involved in an attachment.

    Note that the lookup of the PMIs is only a dummy HashMap right now. In the next step, the PMIs will be looked up from a file.
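
A rough sketch of such a real-valued input vector follows. The table key (a pair of head and dependent forms), the `PmiTable` alias, the `pmi_input` function, and the fallback value for unseen pairs are all assumptions made for illustration; the PR does not specify them.

```rust
use std::collections::HashMap;

/// Hypothetical PMI table keyed by (head form, dependent form).
pub type PmiTable = HashMap<(String, String), f32>;

/// Build an f32 input vector with one PMI value per candidate attachment.
/// Pairs that are missing from the table fall back to `default`.
pub fn pmi_input(table: &PmiTable, pairs: &[(&str, &str)], default: f32) -> Vec<f32> {
    pairs
        .iter()
        .map(|&(head, dependent)| {
            table
                .get(&(head.to_string(), dependent.to_string()))
                .copied()
                .unwrap_or(default)
        })
        .collect()
}
```

Unlike the i32 index vectors, the resulting values would be fed to the network as they are, without an embedding lookup.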

  • Add retrieval of addressed values to transition system(s)

    This PR adds functionality to retrieve the addresses of the two tokens between which a dependency relation is established. Depending on the transition system, these are tokens from the stack or buffer.
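
To illustrate how the addresses depend on the transition system, the sketch below uses the textbook definitions of the arc-standard and arc-eager systems. All type and function names are hypothetical, and dpar's own address types and transition systems may differ.

```rust
/// Position of a token in the parser state (0 is the top or front).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Address {
    Stack(usize),
    Buffer(usize),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArcDirection {
    Left,
    Right,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum System {
    ArcStandard,
    ArcEager,
}

/// Returns the (head, dependent) addresses of the attachment that an arc
/// transition would create, following the textbook system definitions.
pub fn attachment_addresses(system: System, direction: ArcDirection) -> (Address, Address) {
    match (system, direction) {
        // Arc-standard attaches the two topmost stack tokens.
        (System::ArcStandard, ArcDirection::Left) => (Address::Stack(0), Address::Stack(1)),
        (System::ArcStandard, ArcDirection::Right) => (Address::Stack(1), Address::Stack(0)),
        // Arc-eager attaches the stack top and the first buffer token.
        (System::ArcEager, ArcDirection::Left) => (Address::Buffer(0), Address::Stack(0)),
        (System::ArcEager, ArcDirection::Right) => (Address::Stack(0), Address::Buffer(0)),
    }
}
```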

  • Add FreezableLookup which replaces several lookup tables.

    This change introduces a single lookup table that replaces:

    • TransitionLookup
    • LookupTable
    • MutableLookupTable

    StoredLookupTable is simplified, but still exists as a wrapper around FreezableLookup to simplify serialization.

    FreezableLookup implements three traits:

    • LookupValue: defines the methods to look up an identifier for a value and vice versa. In contrast to the lookups that are replaced, values are borrowed during lookup and only cloned upon insertion.
    • LookupLen: defines the len method.
    • Lookup: inherits LookupValue and LookupLen and is implemented for any type that implements both traits.

    In my first iteration, I had a single lookup trait. However, this sometimes led to type inference problems when the len method was used. Since len does not use the borrowed type in its signature, the type parameter for Lookup<B> could not be inferred. So, instead, we make the len method part of a trait without type parameters.
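
The trait split can be sketched roughly as follows. Only the trait names come from the description above; the method signatures, the `SimpleLookup` type, and the helper functions are assumptions for illustration, and the reverse lookup (identifier to value) is omitted for brevity.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;

/// Look up the identifier of a value through its borrowed form `B`.
pub trait LookupValue<B: ?Sized> {
    fn lookup(&self, value: &B) -> Option<usize>;
}

/// `len` lives in a trait without type parameters, so calling it never
/// requires inferring `B`.
pub trait LookupLen {
    fn len(&self) -> usize;
}

/// Combined trait, implemented for every type that has both parts.
pub trait Lookup<B: ?Sized>: LookupValue<B> + LookupLen {}
impl<B: ?Sized, L> Lookup<B> for L where L: LookupValue<B> + LookupLen {}

/// Hypothetical stand-in for FreezableLookup (freezing omitted).
pub struct SimpleLookup<T> {
    ids: HashMap<T, usize>,
}

impl<T: Eq + Hash> SimpleLookup<T> {
    pub fn new() -> Self {
        SimpleLookup { ids: HashMap::new() }
    }

    /// Values are borrowed for the lookup and only cloned when inserted.
    pub fn insert<B>(&mut self, value: &B) -> usize
    where
        T: Borrow<B>,
        B: ?Sized + Eq + Hash + ToOwned<Owned = T>,
    {
        if let Some(&id) = self.ids.get(value) {
            return id;
        }
        let id = self.ids.len();
        self.ids.insert(value.to_owned(), id);
        id
    }
}

impl<T, B> LookupValue<B> for SimpleLookup<T>
where
    T: Eq + Hash + Borrow<B>,
    B: ?Sized + Eq + Hash,
{
    fn lookup(&self, value: &B) -> Option<usize> {
        self.ids.get(value).copied()
    }
}

impl<T> LookupLen for SimpleLookup<T> {
    fn len(&self) -> usize {
        self.ids.len()
    }
}

/// Needs only `LookupLen`, so no borrowed type has to be named.
pub fn table_size<L: LookupLen>(table: &L) -> usize {
    table.len()
}

/// Uses the combined trait with `str` as the borrowed type.
pub fn is_known<L: Lookup<str>>(table: &L, value: &str) -> bool {
    table.lookup(value).is_some()
}
```

Had `len` been a method of `Lookup<B>`, a call such as `table.len()` on a `SimpleLookup<String>` would match an impl for every `B` that `String` can be borrowed as, which is the inference problem described above.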

    Fixes #19.

  • Replace error-chain by failure.

    Big boring change: this replaces the error-chain crate with failure, which is now the preferred error handling crate in the Rust community. It also allows us to upgrade to conllx 0.10 without too much wrapping.

  • Rewrite addressed value parser in nom

    Ragel dropped support for all languages outside C/C++/ASM. To change/regenerate the addressed value parser, one has to manually compile an old Ragel version.

    It would be nicer to just switch to nom, which is Rust-native and well-maintained.

  • Replace various lookups by one data structure

    There is a lot of overlap between the transition-specific lookup and feature table lookup. This can be factored out to one class that replaces Numberer.

    Maybe this should be a separate crate, because it is generally useful.
