Providing legal pincite recommendations using language representations
- Date: May 12, 2025
- Time: 04:00 PM (Local Time Germany)
- Speaker: Henrik P. Olsen (University of Copenhagen)
- Room: Basement
Whereas most existing legal information retrieval systems return entire documents,
lawyers and judges often require precise citations to specific paragraphs,
so-called “pinciting”. This article addresses pinciting in the domain of EU law,
specifically in the case law of the Court of Justice of the European Union
(CJEU). We present a machine learning-based system trained on a novel
paragraph-to-paragraph CJEU citation dataset (1978–2021). This system predicts
relevant paragraphs given a citing paragraph or legal proposition, thereby
enabling precise and efficient legal research. We find that the model performs
well, even when extended to the complete corpus of over 670,000 paragraphs. On
test data, the ranker places the cited paragraph, on average, in the second
position. Moreover, through qualitative legal analysis and manual annotation by
domain experts, we demonstrate the model’s ability to identify both exact and
legally analogous citations, even when they are not explicitly referenced in the
source case. Our findings suggest that paragraph-level retrieval systems can enhance
legal research, understanding, and argumentation, offering significant value
for practitioners and scholars alike. We also introduce a novel complexity
measure to analyze the increasing difficulty of legal retrieval tasks over
time.
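
For readers unfamiliar with ranking evaluation, the "second position" figure above can be read as a mean-rank metric: for each citing paragraph, the candidate paragraphs are sorted by model score, and the position of the paragraph that was actually cited is recorded and averaged. The following is a minimal sketch under that assumption. The function names and the toy scores are hypothetical illustrations, not the authors' evaluation code or data.

```python
from statistics import mean


def rank_of_target(scores, target_index):
    """Return the 1-based rank of the target candidate, highest score first.

    Ties are resolved in the target's favour (an assumption of this sketch).
    """
    target_score = scores[target_index]
    # Every candidate scoring strictly higher pushes the target down one place.
    return 1 + sum(1 for s in scores if s > target_score)


def mean_rank(queries):
    """Average the target's rank over (scores, target_index) pairs."""
    return mean(rank_of_target(scores, target) for scores, target in queries)


# Toy data, invented for illustration: each entry holds the model scores for
# the candidate paragraphs and the index of the paragraph actually cited.
toy_queries = [
    ([0.9, 0.2, 0.4], 0),  # cited paragraph has the highest score
    ([0.3, 0.8, 0.5], 2),  # one candidate scores higher
    ([0.7, 0.6, 0.1], 2),  # two candidates score higher
]

print(mean_rank(toy_queries))
```

A mean rank close to 1 means the cited paragraph is usually the top suggestion. In a corpus of hundreds of thousands of candidates, a mean rank of about 2 would mean the correct paragraph typically appears among the first few results.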