14 - Syntactic Parsing
ucla | CS 162 | 2024-03-11 02:55
Syntactic Parsing
- syntax is the characteristics of words and their interdependent relations that help with interpretation
- we parse a sentence into a tree from the words (bottom) up
- nonterminal rules: rules that decompose into other rules:
- terminal rules: rules that decompose into a word/token (see the toy grammar after this list):
- natural language has lots of overloading/ambiguity and grammar isn’t known in advance
- evaluation also requires true parsed outputs
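For concreteness, a toy grammar (a hypothetical example, not from the lecture) showing both kinds of rules; "the dog chased the ball" can be parsed bottom-up with it:

```
S   -> NP VP            (nonterminal rule)
NP  -> Det N            (nonterminal rule)
VP  -> V NP             (nonterminal rule)
Det -> "the"            (terminal rule)
N   -> "dog" | "ball"   (terminal rules)
V   -> "chased"         (terminal rule)
```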
Evaluation
- decompose the true (gold) tree into labeled constituents (scoring uses only the rules; POS tags are not considered, but constituents are usually labeled regardless)
- calculate precision and recall and combine them into the balanced F-measure (F1)
- green is predicted, orange is true (each predicted constituent is a tuple of the nonterminal label and the (contiguous) span covered by that nonterminal rule):
F1, Precision, Recall - PROCESS
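A minimal sketch of this computation in Python, assuming each constituent is represented as a (label, start, end) tuple (the function and variable names here are hypothetical):

```python
def parseval(predicted, gold):
    """Labeled precision, recall, and F1 over sets of constituents.

    predicted, gold: sets of (label, start, end) tuples, where the span
    covers token positions [start, end).
    """
    correct = len(predicted & gold)  # constituents matching in both label and span
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# green = predicted, orange = true
pred = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
gold = {("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)}
print(parseval(pred, gold))  # each value is 2/3: 2 of 3 constituents match
```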
Parsing
Grammar Rules
- the set of nonterminal and terminal rules given by natural language
- also the span of each token
Process
on the next pass, we look over all constituents from all previous passes and check whether any of them starts where a current constituent ends (so the two can be joined by a rule)
- complete/optimal but inefficient
Optimization
- tabulate in a chart of the sentence span
- original algo:
- iterative deepening over width: build all width-1 constituents, then width-2, etc. (done by iterating over the diagonals of the chart; remember constituent spans must be contiguous)
- width-1 is diagonal
- width-2:
- width-4:
- to find a width-N constituent given all narrower constituents, we consider the possible length-N spans, split each into two adjacent sub-spans, and check which splits follow a grammar rule:
CKY Algo (Recognizer)
- an algo to check if a sentence is parseable i.e., if a parse tree exists for the sentence
- given the grammar rules, the tabular representation of spans, and the sentence, we determine constituents of length N as diagonals of the table:
- Pseudocode:
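A minimal Python sketch of the recognizer, assuming the grammar is already in Chomsky Normal Form and represented as two rule sets (all names below are hypothetical):

```python
from collections import defaultdict

def cky_recognize(words, terminal_rules, binary_rules, start="S"):
    """Return True if `words` is derivable from `start`.

    terminal_rules: set of (A, word) pairs for rules A -> word
    binary_rules:   set of (A, B, C) triples for rules A -> B C
    """
    n = len(words)
    chart = defaultdict(set)                   # chart[(i, j)] = nonterminals spanning words[i:j]

    # width-1 spans (the diagonal): apply terminal rules
    for i, w in enumerate(words):
        for A, word in terminal_rules:
            if word == w:
                chart[(i, i + 1)].add(A)

    # wider spans, built from narrower ones (iterative deepening over width)
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point between the two sub-spans
                for A, B, C in binary_rules:
                    if B in chart[(i, k)] and C in chart[(k, j)]:
                        chart[(i, j)].add(A)

    return start in chart[(0, n)]

# usage with the toy grammar from earlier
terminal = {("Det", "the"), ("N", "dog"), ("N", "ball"), ("V", "chased")}
binary   = {("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")}
print(cky_recognize("the dog chased the ball".split(), terminal, binary))  # True
```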
Analysis of Algo
- reasonably efficient (polynomial in sentence length), but the number of possible trees grows exponentially with sentence/context window length
- requires Chomsky Normal Form (every nonterminal rule expands to exactly two nonterminals and every terminal rule to a single word, i.e., what we have above)
- this CKY can check grammar
- but also generate entire forest with all ambiguity
- we can analyze ambiguity by counting trees in forest generated
CKY Parser (Probabilistic)
- to mitigate ambiguity (due to same POS) like so:
- many possible parses but some much more probable than others
- assign a weight to each rule and select the lowest-weight tree (NOTE: we use negative log-probability, so lowest weight means highest probability), where a tree's weight is the total weight of all its rules (see the formulation below)
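In symbols (a standard formulation, not copied from the slides): if $P(r)$ is the probability of rule $r$, then

$$w(r) = -\log P(r), \qquad w(T) = \sum_{r \in T} w(r), \qquad \hat{T} = \arg\min_{T} w(T) = \arg\max_{T} \prod_{r \in T} P(r)$$

so picking the lowest-weight tree is the same as picking the most probable one.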
Process
- initialize with the width-1 constituents along the diagonal, the grammar rules with their weights (we want the lowest), and the sentence with its spans:
- then run the iterative deepening over widths; when ambiguity occurs, select the lowest-weight rule that works for that length-N constituent
- the weight of each constituent is the sub-constituent weights plus the rule weight (red from blue and purple):
- then, from root, follow backpointers of lowest weight:
- cull the inferior constituents that we don't need in each span (a sketch of the whole procedure follows this list):
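A minimal sketch of this weighted (Viterbi) CKY procedure, assuming rule weights are negative log-probabilities and the grammar is in CNF (all names hypothetical):

```python
import math
from collections import defaultdict

def viterbi_cky(words, terminal_rules, binary_rules, start="S"):
    """Lowest-weight (most probable) parse with a weighted CKY chart.

    terminal_rules: dict mapping (A, word) -> weight for rules A -> word
    binary_rules:   dict mapping (A, B, C) -> weight for rules A -> B C
    Weights are negative log-probabilities, so lower weight = more probable.
    """
    n = len(words)
    best = defaultdict(lambda: defaultdict(lambda: math.inf))  # best[(i, j)][A] = weight
    back = {}                                                  # backpointers

    # width-1 constituents along the diagonal (terminal rules)
    for i, w in enumerate(words):
        for (A, word), wt in terminal_rules.items():
            if word == w and wt < best[(i, i + 1)][A]:
                best[(i, i + 1)][A] = wt
                back[(i, i + 1, A)] = w

    # wider constituents; keep only the cheapest derivation of each (span, label)
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):                          # split point
                for (A, B, C), wt in binary_rules.items():
                    cand = wt + best[(i, k)][B] + best[(k, j)][C]
                    if cand < best[(i, j)][A]:                 # culls inferior constituents
                        best[(i, j)][A] = cand
                        back[(i, j, A)] = (k, B, C)

    if best[(0, n)][start] == math.inf:                        # no parse found
        return None, math.inf

    def build(i, j, A):
        """Follow backpointers from the root down to rebuild the best tree."""
        entry = back[(i, j, A)]
        if isinstance(entry, str):                             # terminal rule
            return (A, entry)
        k, B, C = entry
        return (A, build(i, k, B), build(k, j, C))

    return build(0, n, start), best[(0, n)][start]
```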
Variations
- keep track of the total probability of all possible parses of a sentence (the Inside algorithm; recurrence below)
- track all possible parses and their probs
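The Inside algorithm reuses the same chart but sums over rules and split points instead of taking the minimum; a standard form of the recurrence (notation mine, with spans $[i, j)$ over tokens $w_{i+1} \dots w_j$):

$$\beta(A, i, i+1) = P(A \to w_{i+1}), \qquad \beta(A, i, j) = \sum_{A \to B\,C}\; \sum_{k=i+1}^{j-1} P(A \to B\,C)\,\beta(B, i, k)\,\beta(C, k, j)$$

and $P(\text{sentence}) = \beta(S, 0, n)$ is the total probability over all parses.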
Training
- count and divide
- featurized training
Penn Treebank
- a database of hand-parsed sentences whose trees supply the terminal and nonterminal rules for our parsers
- eval f1 score on trees
- train using span splits
Count and Divide CKY Probs
- for the true probability of the tree given the sentence, we use the probability chain rule:
- but to estimate it, we make the independence assumption that subtrees are independent of other subtrees given their immediate parent, and apply backoff (written out below)
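Written out in standard PCFG notation (not copied from the slides), the chain rule plus the independence assumption give

$$P(T) = \prod_{r \in T} P(r \mid \text{everything above } r) \;\approx\; \prod_{(A \to \beta) \in T} P(\beta \mid A), \qquad \hat{P}(\beta \mid A) = \frac{\mathrm{count}(A \to \beta)}{\mathrm{count}(A)}$$

where the estimate on the right is exactly count and divide over the treebank.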
Dependency Parsing
- constituent view of syntax (CKY) does not consider discontinuous information or POS tags
- the constituent view also loses lexically grounded selectional preference; instead we may need to create many rules (propagating heads) depending on how the sentence should be interpreted
- this can introduce too much information downstream making probs low and hard to learn
- instead we can remove the lexical tags and redundant terms to derive the dependency tree in-line
Dependency View of Syntax
- no explicit phrase structure
- features how words relate to each other instead of syntactic phrase only
- closer alignment between analyses of structure across different languages
- natural modeling of discontinuous words
- lexically grounded relationships
- loss of expressivity (cant go back to constituency tree from dependency)
- requires definition of head
- may be good to assign labeled edges in dependency tree
- similar to PTB, other treebanks were made for dependency trees: Prague, Universal Dependencies
- issue is determining head to start with, so we split into functional and content heads and usually start with content heads as root
Projectivity
- a dependency tree is projective if every subtree occupies contiguous span and does not cross other subtrees
- non-projective trees are not as common in English (a projectivity check is sketched below)
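A minimal projectivity check, assuming the tree is given as a head array with 1-indexed words and 0 marking the root (the names and the example head indices are purely illustrative):

```python
def is_projective(heads):
    """heads[i] = head index of word i+1 (words are 1-indexed), 0 for the root.

    The tree is projective iff no two arcs cross when the root is drawn as an
    artificial node at position 0.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a1, b1) in enumerate(arcs):
        for a2, b2 in arcs[i + 1:]:
            # two arcs cross if one has exactly one endpoint strictly inside the other
            if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:
                return False
    return True

# "A hearing is scheduled on the issue today": "on the issue" attaches to
# "hearing" across "is scheduled", so the tree is non-projective
heads = [2, 4, 4, 0, 2, 7, 5, 4]
print(is_projective(heads))  # False
```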
Parsing Approaches
- constituency, then project
- build CKY parser then project down to dependency
- but does not handle non-projective situations
- graph based
- Chu-Liu/Edmonds
- find the maximum spanning tree over the graph of possible head-modifier edges of the sentence
- quadratic time
- transition based
- keep a stack of words and work at the top of the stack
- have 3 ops: shift (S), reduce-left (RL), reduce-right (RR)
- e.g., given the ground truth tree (we know we want to root at book)
- we end up with this parsing procedure
- we only reduce-right (RR) AFTER all subtrees of the possible child word have been linked (i.e., all RL operations for that word have been done); a sketch of the loop follows this list
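A minimal sketch of the transition loop, assuming a function next_op(stack, buffer, arcs) that returns "S", "RL", or "RR" (an oracle during training, a classifier at test time). It uses one common convention, where RL attaches the second-from-top word as a dependent of the top and RR attaches the top as a dependent of the second-from-top; the lecture's convention may flip these. All names are hypothetical:

```python
def transition_parse(words, next_op):
    """Parse with shift (S), reduce-left (RL), reduce-right (RR).

    next_op(stack, buffer, arcs) must return "S" only when the buffer is
    non-empty and "RL"/"RR" only when the stack holds at least two words.
    Returns the list of (head, dependent) arcs.
    """
    stack, buffer, arcs = [], list(words), []
    while buffer or len(stack) > 1:
        op = next_op(stack, buffer, arcs)
        if op == "S":                       # shift: push the next word onto the stack
            stack.append(buffer.pop(0))
        elif op == "RL":                    # reduce-left: top becomes head of second-from-top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif op == "RR":                    # reduce-right: second-from-top becomes head of top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs
```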
Training
- input is a sentence -> output is the decision of which op to take (S, RR, RL), using a deterministic algo (an oracle derived from the gold tree at training time)
- thus, we train probabilities over these decisions
- using featurized learning (features: the stack, the buffer, the extracted links, and the most immediate/adjacent versions of each, choosing between the word or its POS tag)
Beam Decoding to prevent bad Greedy
- choose the top k (by accumulated cost) at each step, as in the sketch below
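A minimal sketch of beam decoding over transition sequences, assuming a per-state cost function op_scores and transition helpers apply_op / is_final standing in for the system above (all names hypothetical):

```python
import heapq

def beam_decode(initial_state, op_scores, apply_op, is_final, k=8):
    """Keep the k lowest-cost partial parses at every step instead of the single
    greedy best, so an early mistake can still be recovered.

    op_scores(state) -> {op: cost} with cost = -log P(op | state)
    apply_op(state, op) -> new state; is_final(state) -> True when parsing is done.
    """
    beam = [(0.0, initial_state)]                  # (accumulated cost, state)
    while not all(is_final(s) for _, s in beam):
        candidates = []
        for cost, state in beam:
            if is_final(state):
                candidates.append((cost, state))   # finished parses carry over unchanged
                continue
            for op, op_cost in op_scores(state).items():
                candidates.append((cost + op_cost, apply_op(state, op)))
        beam = heapq.nsmallest(k, candidates, key=lambda c: c[0])
    return min(beam, key=lambda c: c[0])[1]        # lowest total-cost parse
```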