3 - Lexical Semantics
ucla | CS M151B | 2024-01-17 08:02
Table of Contents
- Issues with One-Hot
- Lexical Semantics
- Linguistics
- Naive Word Similarity
- Thesaurus-Based Similarity Algos
Issues with One-Hot
- one-hot encoded vectors are mutually orthogonal across the vocab
- so the dot product of any two distinct words is always 0 and carries no similarity signal
- cannot represent out-of-vocabulary words or n-grams
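A minimal sketch of the orthogonality problem, assuming a hypothetical three-word vocabulary (the words and helper names are illustrative only):

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

vocab = ["cat", "dog", "car"]  # hypothetical toy vocabulary
index = {word: i for i, word in enumerate(vocab)}
vectors = {word: one_hot(i, len(vocab)) for word, i in index.items()}

# Every pair of distinct words scores 0, whether or not the words are related.
print(dot(vectors["cat"], vectors["dog"]))
print(dot(vectors["cat"], vectors["car"]))
# A word only matches itself.
print(dot(vectors["cat"], vectors["cat"]))
# A word outside the vocabulary has no index, so no vector exists for it.
print(index.get("kitten"))
```

“cat” is exactly as far from “dog” as from “car”, so the encoding cannot rank related words above unrelated ones.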
Lexical Semantics
- semantic similarity refers to closeness in meaning in context
Types of similarity algos
- Thesaurus based - hierarchical word closeness
- e.g., WordNet
- Distributional - similarity estimated from how words are used in real-world text (computed on the fly)
Lemmas and Wordform
- lemma (citation form) - the canonical dictionary form that stands for all inflections of a word, e.g., “run” for “runs”, “ran”, “running”
- one lemma can have many meanings (e.g., homonyms)
- wordform - the inflected word as it appears in text
- sense - a discrete representation of one meaning of a word - i.e. the same word form can map to different senses in different contexts
Synsets
- a synset is a synonym set, i.e. a set of near-synonyms
- WordNet represents each sense as a synset paired with a “gloss” (a short definition)
- WordNet links synsets by hypernymy to form a hierarchy
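A rough sketch of how senses can be stored as synsets with glosses. The entries and identifiers below are hypothetical, written in the style of WordNet rather than copied from it:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Synset:
    name: str      # illustrative identifier, not a real WordNet key
    lemmas: tuple  # lemmas that share this sense
    gloss: str     # short definition of the sense

# Hypothetical toy inventory of senses.
SYNSETS = [
    Synset("bank.n.01", ("bank",), "sloping land beside a body of water"),
    Synset("bank.n.02", ("bank", "depository"), "an institution that accepts deposits"),
    Synset("car.n.01", ("car", "automobile"), "a motor vehicle with four wheels"),
]

def senses(lemma):
    """All synsets (senses) that a lemma belongs to."""
    return [s for s in SYNSETS if lemma in s.lemmas]

def are_synonyms(a, b):
    """Two lemmas are synonyms here if they share at least one synset."""
    return any(a in s.lemmas and b in s.lemmas for s in SYNSETS)

for s in senses("bank"):
    print(s.name, "-", s.gloss)
print(are_synonyms("car", "automobile"))
```

One lemma (“bank”) maps to several synsets, while one synset (“car”, “automobile”) groups several lemmas, which is why synonymy is defined over senses rather than words.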
Homonyms
- words that share a form (spelling or pronunciation) but have distinct, unrelated meanings
- Homographs - homonyms with the same spelling
- Homophones - homonyms with the same pronunciation
- cause problems for semantic learning, especially translation
Polysemy
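- one word with multiple related senses (unlike homonyms, whose meanings are unrelated)
- checking for this can be difficult - Zeugma test: conjoin two uses of the word in one sentence; if the result sounds odd, the senses are distinct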
Sense Relations
- synonyms - distinct words with the same or nearly the same meaning in some or all contexts
- synonymy is a relation between senses rather than between individual words
- antonyms - senses that are opposite with respect to contextual meaning
- hyponymy - one sense is a subclass (a more specific case) of another sense/word, e.g., car is a hyponym of vehicle
- hypernymy - one sense is a superclass of another sense/word, e.g., vehicle is a hypernym of car
- together these form an IS-A hierarchy
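The IS-A hierarchy can be sketched as a child-to-parent map. The taxonomy below is a hypothetical toy example, and real resources allow a sense to have more than one hypernym:

```python
# Each key is a hyponym; its value is the direct hypernym (toy data).
HYPERNYM = {
    "poodle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
}

def hypernym_chain(word):
    """Walk upward from `word`, collecting every hypernym up to the root."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def is_a(word, ancestor):
    """True if `ancestor` is reachable from `word` by following hypernym links."""
    return ancestor in hypernym_chain(word)

def hyponyms(word):
    """Direct hyponyms (children) of `word`, sorted for stable output."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == word)

print(hypernym_chain("poodle"))
print(is_a("poodle", "animal"))
print(hyponyms("mammal"))
```

The relation is directional: a poodle IS-A animal, but an animal is not necessarily a poodle.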
Meronym and Holonym
- part-whole relation between senses
- Meronym - the part: wheel is a meronym of car
- Holonym - the whole: car is a holonym of wheel
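Meronymy and holonymy are inverses of the same part-whole relation. A small sketch over hypothetical (part, whole) pairs:

```python
from collections import defaultdict

# Hypothetical (part, whole) pairs.
PART_OF = [("wheel", "car"), ("engine", "car"), ("finger", "hand")]

holonyms = defaultdict(set)  # part  -> wholes that contain it
meronyms = defaultdict(set)  # whole -> parts it contains

# Each pair is stored in both directions, so either lookup is cheap.
for part, whole in PART_OF:
    holonyms[part].add(whole)
    meronyms[whole].add(part)

print(sorted(meronyms["car"]))    # parts of a car
print(sorted(holonyms["wheel"]))  # wholes that a wheel belongs to
```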
Naive Word Similarity
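- synonymy is a binary relation between senses (two senses either are synonyms or are not); similarity is a looser, graded metric, e.g., based on distance
- thesaurus-based methods combine distance in the hypernym hierarchy with similarity between glosses (definitions)
- path-based distance between senses/synsets serves as the similarity score: the shorter the path, the more similar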
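A minimal sketch of path-based similarity over a hypothetical toy hierarchy. It scores a pair as 1 / (1 + number of edges on the shortest path), which is one common formulation; the notes do not fix a specific formula, so treat this choice as an assumption:

```python
# Child -> direct hypernym (hypothetical toy taxonomy).
HYPERNYM = {
    "poodle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}

def distances_up(node):
    """Map `node` and each of its ancestors to the edge count from `node`."""
    dist = {node: 0}
    steps = 0
    while node in HYPERNYM:
        node = HYPERNYM[node]
        steps += 1
        dist[node] = steps
    return dist

def path_length(a, b):
    """Edges on the shortest path between a and b via a shared ancestor, or None."""
    da, db = distances_up(a), distances_up(b)
    common = set(da) & set(db)
    if not common:
        return None  # no shared ancestor: at least one word is outside the hierarchy
    return min(da[n] + db[n] for n in common)

def path_similarity(a, b):
    """Assumed scoring: 1 / (1 + shortest path length); None if no path exists."""
    edges = path_length(a, b)
    return None if edges is None else 1 / (1 + edges)

print(path_similarity("dog", "cat"))         # siblings under "mammal"
print(path_similarity("poodle", "sparrow"))  # only meet at "animal"
print(path_similarity("dog", "unicorn"))     # "unicorn" is not in the resource
```

Siblings score higher than words that only meet near the root, and a word missing from the hierarchy gets no score at all.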
Limitations
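- the measure is only as good as the underlying resource
- subject to missing nuances and concepts/senses
- limited in scope
- hypernymy assumes an “is-a” relation
- works for nouns, but less well for other parts of speech such as verbs and adjectives
- context is not accounted for, the measure is not domain-adaptable, and such a resource is not available for every language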