Coreference Resolution

"Niall Ferguson is prolific, well-paid and a snappy dresser. Stephen Moss hated him - at least until he spent an hour being charmed in the historian's Oxford study."

referent

  • a real-world entity that some piece of text (or speech) refers to. In the example above, the actual professor is the referent.

referring expressions

  • bits of language a speaker uses to perform reference. In the example: "Niall Ferguson", "he", "him".

antecedent

  • the text that first evokes a referent. In the example: "Niall Ferguson".

anaphora

  • the phenomenon of referring to an antecedent.

cataphora

  • the pronoun appears before the expression it corefers with (rare): "Since she lost her dog, Kim bought another."

Pronoun resolution

Identifying the referents of pronouns.
Anaphora resolution: generally only cases that refer to antecedent noun phrases are considered.

Algorithms for coreference resolution

Usually treated as a supervised classification problem.

  • instances: potential pronoun/antecedent pairings
  • class is TRUE/FALSE
  • training data labelled with correct pairings
  • candidate antecedents are all NPs in the current sentence and the preceding 5 sentences (excluding pleonastic pronouns)
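
The setup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Mention` class, the `candidate_pairs` function and the hand-annotated mention list are hypothetical names introduced here, the mention list is simplified (NPs such as "an hour" are left out), and the gold links reflect one reading of the opening example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    sentence: int            # index of the sentence containing the mention
    position: int            # order of the mention within the document
    is_pronoun: bool = False
    is_pleonastic: bool = False


def candidate_pairs(mentions, gold_links=frozenset(), window=5):
    """Return (pronoun, candidate, label) for every NP in the pronoun's
    sentence or in the `window` sentences before it."""
    instances = []
    for pron in mentions:
        if not pron.is_pronoun or pron.is_pleonastic:
            continue
        for cand in mentions:
            if cand is pron or cand.is_pleonastic:
                continue
            distance = pron.sentence - cand.sentence
            if 0 <= distance <= window:
                # TRUE/FALSE class taken from the labelled training data
                label = (pron.position, cand.position) in gold_links
                instances.append((pron, cand, label))
    return instances


# Hand-annotated, simplified mentions from the opening example (assumed).
mentions = [
    Mention("Niall Ferguson", sentence=0, position=0),
    Mention("Stephen Moss", sentence=1, position=1),
    Mention("him", sentence=1, position=2, is_pronoun=True),
    Mention("he", sentence=1, position=3, is_pronoun=True),
    Mention("the historian", sentence=1, position=4),
]
# Assumed gold links: him -> Niall Ferguson, he -> Stephen Moss
gold = {(2, 0), (3, 1)}

instances = candidate_pairs(mentions, gold)
for pron, cand, label in instances:
    print(pron.text, "/", cand.text, "->", label)
```

Candidates later in the same sentence are kept on purpose, so that cataphoric pairings remain possible.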

Hard constraints: Pronoun agreement

  • A little girl is at the door - see what she wants, please?
  • My dog has hurt his foot - he is in a lot of pain.
  • My dog has hurt his foot - it is in a lot of pain.

Complications:

  • I don't know who the new lecturer will be, but I'm sure they'll make changes to the course.
  • The team played really well, but now they are all very tired.
  • Kim and Sandy are asleep: they are very tired.
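
A minimal sketch of an agreement filter, assuming a small hand-written pronoun table. The names `PRONOUNS` and `agrees`, the feature labels and the way antecedents are annotated are all assumptions made for illustration. The table deliberately lets "they" match singular antecedents, which covers the complications above but also over-accepts in other cases.

```python
# Number and gender values each pronoun is compatible with (illustrative).
PRONOUNS = {
    "he":   {"number": {"sg"}, "gender": {"masc"}},
    "she":  {"number": {"sg"}, "gender": {"fem"}},
    "it":   {"number": {"sg"}, "gender": {"neut"}},
    # "they" covers plurals, collectives and singular use for unknown gender
    "they": {"number": {"sg", "pl"}, "gender": {"masc", "fem", "neut"}},
}


def agrees(pronoun, antecedent_number, antecedent_genders):
    """True if the pronoun is compatible with the antecedent's number and
    with at least one of the genders the antecedent could have."""
    feats = PRONOUNS[pronoun.lower()]
    number_ok = antecedent_number in feats["number"]
    gender_ok = bool(feats["gender"] & set(antecedent_genders))
    return number_ok and gender_ok


# "A little girl": singular, feminine
print(agrees("she", "sg", {"fem"}))
print(agrees("he", "sg", {"fem"}))
# "My dog": singular, and an animal can be he, she or it
print(agrees("he", "sg", {"masc", "fem", "neut"}))
print(agrees("it", "sg", {"masc", "fem", "neut"}))
# "Kim and Sandy": plural
print(agrees("they", "pl", {"masc", "fem"}))
```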

Hard constraints: Reflexives

  • John$_i$ cut himself$_i$ shaving. (himself = John; the subscripts indicate coreference)
  • # John$_i$ cut him$_j$ shaving. ($i \neq j$: a very odd sentence)

Reflexive pronouns must be coreferential with a preceding argument of the same verb; non-reflexive pronouns cannot be.
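
The rule can be written as a small check. This is a sketch only: `REFLEXIVES` and `binding_allows` are hypothetical names, and the two boolean inputs are assumed to come from a parser.

```python
REFLEXIVES = {
    "myself", "yourself", "himself", "herself", "itself",
    "ourselves", "yourselves", "themselves",
}


def binding_allows(pronoun, same_verb, antecedent_precedes):
    """Apply the reflexive constraint to one pronoun/candidate pairing.

    same_verb: both are arguments of the same verb.
    antecedent_precedes: the candidate comes before the pronoun.
    """
    preceding_coargument = same_verb and antecedent_precedes
    if pronoun.lower() in REFLEXIVES:
        return preceding_coargument      # reflexives require it
    return not preceding_coargument      # other pronouns forbid it


print(binding_allows("himself", same_verb=True, antecedent_precedes=True))
print(binding_allows("him", same_verb=True, antecedent_precedes=True))
```

For "John cut himself shaving", the pairing himself/John is allowed; for "John cut him shaving", the pairing him/John is ruled out.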

Hard constraints: Pleonastic pronouns

Pleonastic pronouns are semantically empty, and don't refer:

  • It is snowing.
  • It is not easy to think of good examples.
  • It is obvious that Kim snores.
  • It bothers Sandy that Kim snores.
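
Pleonastic uses of "it" are often filtered out with surface patterns before candidate pairs are built. The sketch below is a rough heuristic, not a complete detector: the patterns and verb lists are assumptions chosen to cover the four examples above, and they will misfire on sentences such as "It is happy to see you".

```python
import re

PLEONASTIC_PATTERNS = [
    # weather verbs: "It is snowing"
    re.compile(r"\bit\s+(is|was)\s+(snowing|raining|hailing)\b", re.I),
    # be + adjective + to/that: "It is not easy to ...", "It is obvious that ..."
    re.compile(r"\bit\s+(is|was)\s+(not\s+)?\w+\s+(to|that)\b", re.I),
    # verb + experiencer + that: "It bothers Sandy that ..."
    re.compile(r"\bit\s+(bothers|annoys|surprises|worries)\s+\w+\s+that\b", re.I),
]


def looks_pleonastic(sentence):
    """True if any pattern suggests that "it" is semantically empty here."""
    return any(p.search(sentence) for p in PLEONASTIC_PATTERNS)


for s in [
    "It is snowing",
    "It is not easy to think of good examples.",
    "It is obvious that Kim snores.",
    "It bothers Sandy that Kim snores.",
    "My dog has hurt his foot - it is in a lot of pain.",
]:
    print(looks_pleonastic(s), s)
```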

Soft preferences: Salience

  • Recency: More recent antecedents are preferred. They are more accessible. "Kim has a big car. Sandy has a smaller one. Lee likes to drive it."
  • Grammatical role: Subjects > objects > everything else: "Fred went to the shopping centre with Bill. He bought a CD."
  • Repeated mention: Entities that have been mentioned more frequently are preferred.
  • Parallelism: Entities which share the same role as the pronoun in the same sort of sentence are preferred: "Bill went with Fred to the lecture. Kim went with him to the bar." Here "him" = Fred.
  • Coherence effects: The pronoun resolution may depend on the rhetorical / discourse relation that is inferred. "Bill likes Fred. He has a great sense of humour."
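
One way to see how such preferences interact is to combine them into a single score. The sketch below is purely illustrative: the function name `salience` and all weights are made up for this example, and coherence effects are not modelled at all.

```python
# Illustrative weights only; real systems learn or tune these.
ROLE_WEIGHT = {"subject": 2.0, "object": 1.0, "other": 0.0}


def salience(sentence_distance, role, mention_count, parallel):
    score = -1.0 * sentence_distance        # recency: closer is better
    score += ROLE_WEIGHT[role]              # subjects > objects > everything else
    score += 0.5 * (mention_count - 1)      # repeated mention
    score += 1.0 if parallel else 0.0       # same role as the pronoun
    return score


# "Fred went to the shopping centre with Bill. He bought a CD."
fred = salience(sentence_distance=1, role="subject", mention_count=1, parallel=True)
bill = salience(sentence_distance=1, role="other", mention_count=1, parallel=False)
print("Fred:", fred, "Bill:", bill)
```

With these weights Fred outscores Bill, in line with the grammatical-role preference.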

Features

Cataphoric - Binary: true if the pronoun occurs before the antecedent.
Number agreement - Binary: true if the pronoun is compatible in number with the antecedent.
Gender agreement - Binary: true if the pronoun and the antecedent agree in gender.
Same verb - Binary: true if the pronoun and the candidate antecedent are arguments of the same verb.
Sentence distance - Discrete: {0,1,2,3...}
Grammatical role - Discrete: {subject, object, other} The role of the potential antecedent.
Parallel - Binary: true if the potential antecedent and the pronoun share the same grammatical role.
Linguistic form - Discrete: { proper, definite, indefinite, pronoun }
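
A sketch of how these features might be computed for one pronoun/candidate pairing. The `NP` class, the `extract_features` function and the annotations for the opening example are assumptions made here by hand; in practice the values would come from a parser and a gender/number lexicon.

```python
from dataclasses import dataclass


@dataclass
class NP:
    text: str
    sentence: int   # sentence index
    position: int   # order of the mention within the document
    number: str     # "sg" or "pl"
    gender: str     # "masc", "fem", "neut" or "unknown"
    role: str       # "subject", "object" or "other"
    form: str       # "proper", "definite", "indefinite" or "pronoun"
    verb: str       # verb the NP is an argument of, "" if none


def extract_features(pronoun, cand):
    return {
        "cataphoric": pronoun.position < cand.position,
        "number_agreement": pronoun.number == cand.number,
        "gender_agreement": ("unknown" in (pronoun.gender, cand.gender)
                             or pronoun.gender == cand.gender),
        "same_verb": (bool(pronoun.verb)
                      and pronoun.sentence == cand.sentence
                      and pronoun.verb == cand.verb),
        "sentence_distance": abs(pronoun.sentence - cand.sentence),
        "grammatical_role": cand.role,
        "parallel": pronoun.role == cand.role,
        "linguistic_form": cand.form,
    }


# Hand annotations for "Stephen Moss hated him" (assumed, not parser output).
ferguson = NP("Niall Ferguson", 0, 0, "sg", "masc", "subject", "proper", "is")
moss = NP("Stephen Moss", 1, 1, "sg", "masc", "subject", "proper", "hated")
him = NP("him", 1, 2, "sg", "masc", "object", "pronoun", "hated")

print(extract_features(him, ferguson))
print(extract_features(him, moss))
```

Both candidates agree with "him" in number and gender; the pairing with "Stephen Moss" is the one flagged by the same-verb feature.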

Any classifier can then be applied to these feature vectors, e.g. an SVM or a random forest.

Problems with simple classification model

  • Cannot implement 'repeated mention' effect.
  • Cannot use information from previous links.

The task is not really pairwise: it needs a discourse model in which real-world entities correspond to clusters of referring expressions.
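
One simple way to get from pairwise decisions to entity clusters is to merge linked mentions with a union-find structure. This is a sketch of that idea only, not a method the notes prescribe; `cluster_mentions` is a hypothetical helper, mentions are represented by their text for brevity (so they must be unique), and the links are assumed classifier outputs for the opening example.

```python
def cluster_mentions(mentions, links):
    """Group mentions into entity clusters from pairwise coreference links."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())


mentions = ["Niall Ferguson", "Stephen Moss", "him", "he", "the historian"]
# Assumed positive pairwise decisions
links = [
    ("him", "Niall Ferguson"),
    ("the historian", "Niall Ferguson"),
    ("he", "Stephen Moss"),
]

clusters = cluster_mentions(mentions, links)
for cluster in clusters:
    print(len(cluster), cluster)
```

The size of each cluster gives a mention count per entity, which is the information a purely pairwise classifier lacks for the 'repeated mention' effect.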

End-to-end solution: Neural end-to-end coreference resolution