Providing legal pincite recommendations using language representations

  • Date: May 12, 2025
  • Time: 04:00 PM (Local Time Germany)
  • Speaker: Henrik P. Olsen (University of Copenhagen)
  • Room: Basement
Whereas most existing legal information retrieval systems return entire documents, lawyers and judges often require precise citations to specific paragraphs, so-called "pinciting". This article addresses pinciting in the domain of EU law, specifically in the case law of the Court of Justice of the European Union (CJEU). We present a machine learning-based system trained on a novel paragraph-to-paragraph CJEU citation dataset (1978–2021). Given a citing paragraph or legal proposition, the system predicts the relevant paragraphs, thereby enabling precise and efficient legal research. We find that the model performs well, even when extended to the complete corpus of over 670,000 paragraphs. On test data, the ranker places the cited paragraph, on average, in the second position. Moreover, through qualitative legal analysis and manual annotation by domain experts, we demonstrate the model's ability to identify both exact and legally analogous citations, even when they are not explicitly referenced in the source case. Our findings suggest that paragraph-level retrieval systems can enhance legal research, understanding, and argumentation, offering significant value for practitioners and scholars alike. We also introduce a novel complexity measure to analyze the increasing difficulty of legal retrieval tasks over time.
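To make the "second position on average" result concrete, the following is a minimal sketch of how a mean-rank figure can be computed for a paragraph ranker. It assumes paragraphs are represented as vectors and scored by cosine similarity; the abstract does not specify the model or scoring function, so the similarity measure, the function names, and the toy vectors are all hypothetical illustrations rather than the system presented in the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed scoring function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_of_cited(query_vec, candidates, cited_id):
    """Return the 1-based position of the cited paragraph when all
    candidates are sorted by descending similarity to the query."""
    ranked = sorted(
        candidates,
        key=lambda pid: cosine(query_vec, candidates[pid]),
        reverse=True,
    )
    return ranked.index(cited_id) + 1


def mean_rank(examples, candidates):
    """Average rank of the truly cited paragraph over all test examples."""
    ranks = [rank_of_cited(q, candidates, cited) for q, cited in examples]
    return sum(ranks) / len(ranks)


# Toy candidate paragraphs (hypothetical embeddings, not real CJEU data).
candidates = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.0, 1.0, 0.0],
    "p3": [0.7, 0.7, 0.0],
}

# Each example pairs a citing-paragraph vector with the id it actually cites.
examples = [
    ([0.9, 0.1, 0.0], "p1"),
    ([0.1, 0.9, 0.0], "p3"),
    ([0.6, 0.8, 0.0], "p3"),
]

ranks = [rank_of_cited(q, candidates, cited) for q, cited in examples]
average = mean_rank(examples, candidates)
```

A mean rank close to 1 means the cited paragraph is usually the top suggestion; a mean rank of 2 over a corpus of several hundred thousand candidates would mean the correct pincite typically appears among the first few results.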