Automatic Linking of Judgements to UK Supreme Court Hearings
Hadeel Saadany
Centre for Translation Studies
University of Surrey
United Kingdom
hadeel.saadany@surrey.ac.uk
Constantin Orăsan
Centre for Translation Studies
University of Surrey
United Kingdom
c.orasan@surrey.ac.uk
Catherine Breslin
Kingfisher Labs Ltd
United Kingdom
Sophie Walker
Just Access
United Kingdom
Abstract
Among the most important archived legal materials in the UK are the Supreme Court's published judgements and the video recordings of court sittings for decided cases. The impact of
Supreme Court published material extends far
beyond the parties involved in any given case as
it provides landmark rulings on arguable points
of law of the greatest public and constitutional
importance. However, the recordings of a case are usually very long, which makes it both time- and effort-consuming for legal professionals to study the critical arguments in the legal deliberations. In this research,
we summarise the second part of a combined research-industrial project for building an automated tool designed specifically to link segments in the judgement text to semantically relevant timespans in the videos of the hearings.
The tool is deployed as a User-Interface (UI) platform that provides better access to justice
by bookmarking the timespans in the videos
which contributed to the final judgement of the
case. We explain how we employ generative AI technology to retrieve the relevant links and
show that the customisation of the GPT text
embeddings to our dataset achieves the best
accuracy for our automatic linking system.
1 Introduction
In the UK, HM Courts & Tribunals Service
(HMCTS) publishes both live and recorded
videos of court hearing sessions across different
jurisdictions. This has been a tradition going back as far as 2008 for different types of courts in the UK. The main objective for this is to improve
public access to, and understanding of, the work of
the courts. The Supreme Court sits above all the UK's separate jurisdictions and, as the final court of
appeal, its decisions contribute to the development
of United Kingdom law. Moreover, the court
hearings crucially aid in new case preparation,
provide guidance for court appeals, help in legal
training and even guide future policy.
However, there are two main obstacles to making
use of this rich material to learn more about
the judicial system and have a better access to
justice. First, the audio/video material for a case typically spans several hours across several days, which makes it both time- and effort-consuming for legal professionals to extract important information
relevant to their needs. Second, the existing need for legal transcriptions, covering 449K cases p.a. in the UK across all courts and tribunals, is largely met by human transcribers (Sturge, 2021). This, of course, makes the recorded material, which is rich in legal arguments relevant to how a judgement is reached, difficult to navigate either in text format or in its original audio-visual format.
In this research, we present a combined research-industrial effort to construct an integrated system
for the automatic navigation of segments in
the media data of UK Supreme Court hearings
based on their semantic relevance to particular
paragraph(s) in the text of the judgement issued
following the hearing. Based on the timing
metadata of the court hearing transcription
segments, we manage to assign bookmarks on the
video sessions and link them to their semantically
relevant paragraphs in the judgement text. The
main objective of the video bookmarking is to provide legal professionals, as well as the general public, with an automatic navigation tool that pins down the arguments and legal precedents presented in the long hearing sessions which are of particular importance to how the judges made their decision on the case. We call our system the Judgement-to-Hearing Automatic Linking (J-HAL) system and we deployed it as a User-Interface platform that can be used by legal professionals, academics and the public.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 492–500
December 6-10, 2023 ©2023 Association for Computational Linguistics
Figure 1 shows a snapshot of the UI we created.
On the left side of the screen, the paragraphs of
the written judgement are displayed. The user can
use the scroll down button to choose a specific
paragraph in the judgement. On the right side,
the timespan in the court hearing video that is
semantically relevant to the legal point mentioned
in the selected judgement paragraph is displayed
along with temporal metadata (session number,
day and time). The user can play the particular
timespan and go back and forth around it as
well as read our tool’s transcription of the speech.
Major stakeholders from the legal domain have shown interest in employing our UI as a bookmarking tool for identifying legally critical minutes in video sessions of Supreme Court hearings, to be used by legal academics and professionals. Our UI is also the subject of a pending patent application with the UK Intellectual Property Office.
In this paper, we explain how we establish the
automatic linking between judgement paragraphs
and video session bookmarks through Information
Retrieval (IR) models. First, in section 2, we briefly
summarise relevant work for IR in the legal domain.
Then, in section 3, we explain our system’s pipeline
and its objectives. In section 4, we show how we
compiled and preprocessed our dataset. Then, in
section 5, we summarise our experiments on this
dataset using a zero-shot IR approach as a pre-fetching stage and how we customise the GPT text
embeddings for optimising our system. Finally,
in section 6, we present our conclusion on the
experiments conducted as well as our future plans
for improving the linking system.
2 Related Work
Recently, there has been increased interest in employing NLP techniques to aid textual processing in the legal domain (Elwany et al., 2019;
Nay, 2021; Mumcuoğlu et al., 2021; Frankenreiter
and Nyarko, 2022). The main focus has been on
legal document summarisation (Shukla et al., 2022;
Hellesoe, 2022), predicting judgements (Aletras
et al., 2016; Trautmann et al., 2022) and contract
preprocessing and generation (Hendrycks et al.,
2021; Dixit et al., 2022). Moreover, NLP methods
for Information Extraction and Textual Entailment
have been extensively used in the domain of legal
NLP to either find an answer to a legal query in
legal documents (Zheng et al., 2021) or to connect
textual data (Rabelo et al., 2020). For example,
Chalkidis et al. (2021) experiment with different IR models to extract the EU and UK legislative acts that an organisation needs to comply with to ensure its regulatory compliance. Their experiments show
that fine-tuning a BERT model on an in-domain
classification task is the best pre-fetcher for their
dataset. Similarly, Kiyavitskaya et al. (2008) use
textual semantic annotation to extract government
regulations in different countries which companies
and software developers are required to comply
with. They show that AI-based IR tools are
effective in reducing the human effort to derive
requirements from regulations. Although there has recently been a significant increase in legal IR
research, the processing and deployment of spoken
court hearings for legal IR has not received the
same attention as understanding and extracting
information from textual legal data. In this research,
we introduce an industrial product that employs
IR tools to automatically connect judgements and
videos of court hearings.
3 Linking Judgements and Case Hearings
The pipeline for automatically connecting
judgements of decided cases to relevant bookmarks
in the court hearing videos consists of two stages.
First, we build a customised speech-to-text
language model and employ NLP methods to
improve the quality of the court hearing video
transcriptions. The objective of this stage is to
obtain a high quality transcript for our automatic
retrieval model in the second stage. The second
stage consists of building an IR system capable of
extracting the best n-links between a paragraph(s)
of the judgement text and the timespans of the
transcribed video sessions of that particular case.
The links are then translated into timestamp
bookmarks in the long videos of each case to be
used in constructing our UI. The first stage of
the pipeline is beyond the scope of the current
paper and is described in (Saadany et al., 2022,
2023). In the following sections, we explain the
methodology we adopt for conducting the second
stage in the system pipeline where the automatic
linking is established between judgement text and
video bookmarks.
Figure 1: User-Interface for Linking Judgement to Bookmarks in Video Court Sessions

In the second stage, we treat the linking of a judgement paragraph to the relevant timespan transcripts of a video session as a text retrieval
task. In NLP, text retrieval is usually divided into
two sub tasks according to the length of the text:
query-to-passage retrieval and query-to-document
retrieval (Xu et al., 2022). For our use case, it is a document-to-document retrieval task: we first transcribe the video sessions with the custom speech-to-text language model developed in stage one, and then segment the judgement into paragraphs, where a paragraph(s) is treated as a query and the transcript of the case is the corpus in which we
search for an answer to that query. It is important
to note that the judgement text and the timespans
of a transcript usually consist of several lines of
text which makes the task more challenging than
the traditional query-to-document IR where the
query is typically short. Another challenge specific to our data is that we attempt to establish a link between texts that belong to two different language registers: written and spoken (Peters, 2003; Matthiessen and Halliday, 2009). Linguistically, the complexity of speech is "choreographic" (Halliday, 2007), where meta-linguistic elements such as intonation, loudness or quietness, pausing, stress, pitch range and gestures communicate semantic connotations. Moreover, spoken language
is characterised by complex sentence structures
with low lexical density (fewer high content words
per clause), whereas written language typically
contains simple sentence structures with high
lexical density (more high content words per
clause) (Halliday, 2007). Thus, the retrieval task in our case is non-traditional, as it needs careful preprocessing and segmentation of the spoken and written datasets to obtain accurate results. In the
following sections, we explain how we compiled
and preprocessed our training dataset.
4 Data Compilation and Processing
For training our IR models, we extracted 7 case
judgements consisting of 1.4M tokens scraped
from the official site of the UK Supreme Court1 .
As for the transcription data, it consisted of 53
hours of video material for the selected cases
obtained from the UK National Archive2 . The
video sessions were transcribed by our custom speech-to-text model, which we trained in stage one of the J-HAL system pipeline. We then ran a number
of preprocessing steps to obtain the best linking
accuracy between a judgement segment and the
relevant timespans in the transcripts.
The main challenge in preprocessing the dataset
was how to segment the judgement text into
semantically cohesive sections that would be
treated as queries in our IR method. We noticed that a Supreme Court judgement is typically structured manually into sections such as:
“Introduction”, “The context”, “Facts of the Case”,
“The Outcome of the Case”, etc. However, after
we carefully scrutinised the dataset, we found that
the naming of sections is not consistent. On the
other hand, the judgement texts are consistently
divided into enumerated paragraphs (typically a
digit(s) followed by a dot). We opted, therefore,
for segmenting the judgement text into windows of
enumerated paragraphs. After experimenting with
different window sizes, the optimum window size
consisted of three enumerated paragraphs with an
average length of 389 tokens per segment. As for
the preprocessing of the transcription, it consisted
mainly of excluding very short timespans since they
were mostly either interjections (e.g. “Yes, sorry,
I’m not following", “I beg your pardon.", etc.) or
reference to logistics of the hearing (e.g. “This
is your paper, isn't it?", "Please turn to the next page.", etc.). We chose to exclude transcription spans of fewer than 50 tokens as an empirical threshold for semantically significant conversation units. For both the judgement and transcript data, we cleaned empty lines and extra spaces but kept punctuation intact, as it is essential in identifying names of cases and legal provisions3.

1 https://www.supremecourt.uk/decided-cases/
2 https://www.nationalarchives.gov.uk/
3 The UK legal system has a unique punctuation style for case names, such as "R v Chief Constable of South Wales [2020] EWCA Civ 1058", which are crucial in understanding legal precedents.

5 Zero-shot Information Retrieval

The ability of an IR system to retrieve the top-N most relevant results is usually assessed by comparing its performance with human-generated similarity labels on a sentence-to-sentence or query-to-document similarity dataset(s) (e.g. Agirre et al., 2014; Boteva et al., 2016; Thakur et al., 2021). In order to create a human-generated evaluation dataset, we needed to assign human annotators to manually check the correct links between judgement segments and the timespans of video hearing transcripts for each of our chosen cases. However, in our use case, this is not feasible: to annotate one Supreme Court case with, for example, 50 judgement segments and 300 timespans of video transcript, the annotators would need to read 50 x 300 judgement-timespan links, which amounts to 15,000 doc-to-doc links per case. To overcome this problem, we adopted a zero-shot IR approach.

Thus, to create a dataset for annotation, we first experimented with different ways to encode the judgement segments and transcription timespans as numeric vectors for a single case in our dataset. We used cosine similarity as our semantic distance metric to extract the 20 closest transcript timespans per judgement segment in vector space. Then, we assigned a human annotator, a postgraduate law student, to evaluate the first 20 links produced by the different models. The annotator compared each judgement segment against each timespan to choose either 'Yes' there is a semantic link or 'No' there is not. The IR models used for our experiments are the following:

A. Frequency-based Methods (keyword search)

Okapi BM25 (Robertson et al., 2009): BM25 is a traditional keyword search based on a bag-of-words scoring function estimating the relevance of a document d to a query q, based on the query terms appearing in d. It is a modified version of the tf-idf function where the ranking scores change based on the length of the document d in words and the average document length in the corpus from which documents are drawn.

B. Embedding-based Methods
Document Similarity with Pooling: We experimented with different pooling methods over the GloVe (Pennington et al., 2014a) pretrained word embeddings. The GloVe vector embeddings are created by unsupervised model training on general domain data (Pennington et al., 2014b). We create vectors for the judgement segments and the transcript spans from the mean, minimum and maximum values of the GloVe embeddings.
Entailment Search: We use embeddings from a pretrained model for textual entailment, which is trained to detect sentence-pair relations, i.e. whether one sentence entails or contradicts the other. We employ Microsoft's MiniLM model (Wang et al., 2020), MiniLM-L6-H384-uncased, fine-tuned on a dataset of 1B sentence pairs. The potential link in this case is whether or not the judgement paragraph(s) entails the particular segment of the video transcript.
Legal BERT: Our dataset comes from the legal
domain which has distinct characteristics such as
specialised vocabulary, particularly formal syntax,
and semantics based on extensive domain-specific
knowledge (Williams, 2007; Haigh, 2018). For
this reason, we employed Legal BERT (Chalkidis
et al., 2020) which is a family of BERT models
for the legal domain pre-trained on 12 GB of
diverse English legal text from several fields (e.g.,
legislation, court cases, contracts). The judgement
text and the video transcript data were converted
into the Legal BERT pretrained word embeddings.
Asymmetric Semantic Search: Asymmetric
similarity search refers to finding similarity
between unequal spans of text, which may be
particularly applicable to our case where the
judgement text may be shorter than the span of
the video transcript. For this purpose, we created
the embeddings using the MS MARCO model
(Hofstätter et al., 2021), which is trained on a large-scale IR corpus of 500k Bing query examples.
GPT Question-answer linking: In this setting, a question-answer linking approach is adopted where the selected judgement text portion is treated as a question, and the segments of the video transcript as potential answers. We use pretrained embeddings obtained from OpenAI's latest text embedding model, text-embedding-ada-002, to find answers in the video timespans for each segment in the judgement, which is treated as a prompt query.

Model       MAP@5  Recall@5  MAP@10  Recall@10  MAP@15  Recall@15
GPT         0.96   0.33      0.89    0.57       0.85    0.77
Entailment  0.87   0.32      0.85    0.55       0.82    0.79
Glove       0.81   0.27      0.77    0.53       0.61    0.78
BM25        0.87   0.29      0.81    0.53       0.78    0.77
Asymmetric  0.94   0.32      0.88    0.54       0.83    0.77
LegalBert   0.83   0.30      0.82    0.55       0.79    0.78

Table 1: Results of Unsupervised IR for Linking Judgements in One Case

Model       MAP@5  Recall@5  MAP@10  Recall@10  MAP@15  Recall@15
GPT         0.691  0.391     0.622   0.657      0.711   0.914
BM25        0.655  0.377     0.612   0.659      0.698   0.902
Entailment  0.615  0.348     0.568   0.611      0.660   0.885
Glove       0.526  0.316     0.506   0.602      0.607   0.884
Asymmetric  0.602  0.347     0.553   0.619      0.664   0.908
LegalBert   0.557  0.326     0.531   0.613      0.632   0.896

Table 2: Results of Unsupervised IR for Linking Judgements in Entire Dataset
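All of the embedding-based methods above share the same retrieval step: encode the judgement segment and the transcript timespans as vectors, then rank the timespans by cosine similarity. A minimal NumPy sketch of that step (the embedding function itself, e.g. a call to text-embedding-ada-002, is abstracted away):

```python
import numpy as np

def rank_timespans(query_vec, span_vecs, top_n=20):
    """Return (index, cosine similarity) pairs for the top_n transcript
    timespans closest to a judgement-segment embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    s = span_vecs / np.linalg.norm(span_vecs, axis=1, keepdims=True)
    sims = s @ q                       # cosine similarity per timespan
    order = np.argsort(-sims)[:top_n]  # highest similarity first
    return [(int(i), float(sims[i])) for i in order]
```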
To assess the performance of each model in comparison to the human judgement, we calculated the Mean Average Precision (MAP), which is the de facto IR metric:

$\mathrm{MAP} = \frac{1}{Q}\sum_{q=1}^{Q} AP(q)$    (1)

where Q is the total number of queries, in our case the judgement segments, and AP(q) is the average precision of a single query q. AP(q) evaluates whether all of the timespans assigned as relevant by the annotator are ranked the highest by the model. We calculated MAP for the first 5, 10, and 15 judgement-timespan pairs.
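Equation (1) can be computed directly from the binary relevance labels of the ranked links. A sketch, noting that this variant normalises AP by the number of relevant hits found within the cut-off, one of several common AP@k conventions:

```python
def average_precision_at_k(relevance, k):
    """AP(q) over the top-k ranked links; `relevance` is a ranked
    list of 1/0 judgements for a single judgement segment (query)."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at each relevant rank
    return score / hits if hits else 0.0

def map_at_k(relevance_per_query, k):
    """Equation (1): mean of AP(q) over all Q judgement segments."""
    return sum(average_precision_at_k(r, k)
               for r in relevance_per_query) / len(relevance_per_query)
```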
As can be seen from Table 1, the GPT model demonstrated the best performance in comparison to the other models. Thus, to create a dataset for annotation for the rest of the cases, we extracted the top 15 links for each judgement-transcript segment according to the cosine similarity scores of the GPT embedding model. We also extracted 5 links with lower ranks (50 to 55) to avoid bias towards the GPT model, and randomly shuffled the 20 links for each judgement-transcript segment. After this processing, the dataset constructed for manual annotation consisted of 3,620 judgement-to-transcript document pairs. The human annotators were again asked to judge whether the extracted timespan transcripts are semantically linked or not linked to the judgement paragraph(s). This was done with a specially designed interface which is similar to the UI presented in Figure 1.4
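The construction of each annotation batch described above (the top 15 GPT-ranked links plus 5 lower-ranked ones, shuffled) can be sketched as follows; the function and the fixed seed are illustrative, not the exact implementation:

```python
import random

def build_annotation_batch(ranked_span_ids, seed=0):
    """Combine the 15 highest-ranked links with 5 links of lower
    rank (50 to 55) to avoid bias towards the GPT model, then
    shuffle them before showing them to the annotator."""
    batch = ranked_span_ids[:15] + ranked_span_ids[50:55]
    random.Random(seed).shuffle(batch)
    return batch
```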
The human annotations were compared to the
results of all the embedding models mentioned
above. As shown in Table 2, the GPT text
embedding model again shows superiority over the
other models. Thus, the approach of treating the
judgement segment as a query and the transcription
of the video sessions as the corpus in which we
try to find the answer gives the best MAP results
for the first 5, 10 and 15 links. However, it should be pointed out that our use case is different from a typical IR task, where the efficacy of the model is evaluated by its ability to return the best links in the very first few hits (optimally hits 1 to 5). In our case, Recall, rather than average precision, is of higher importance. The reason is that the output of the model is used to extract the transcription temporal metadata, which is then used to bookmark the video sessions at the relevant parts, where the UI user can watch or drag the cursor around the bookmark to get more information.
Accordingly, our system’s priority is to extract as
many relevant bookmarks as possible from all the true relevant links in the long video sessions, so that the user can understand how the judgement point is argued in the session.

4 The code for the annotation interface is available at https://github.com/dinel/SimilarityLinksAnnotator.
Table 2 also reveals that BM25's recall performance is on par with GPT's, specifically in retrieving relevant links in the top 10 and 15 hits. Although compute-intensive approaches generalise better, they are significantly slower than BM25, which is a lightweight search model. In building our system, we find that the GPT approach is a midway solution because of its superior performance and speedy API calls for extracting embeddings.
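For reference, the BM25 scoring function described in Section 5 is small enough to sketch from scratch; k1 = 1.5 and b = 0.75 are the usual defaults, and a production system would use an optimised implementation:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Okapi BM25: score each document by query-term overlap,
    normalised by document length relative to the corpus average."""
    N = len(corpus_tokens)
    avgdl = sum(len(d) for d in corpus_tokens) / N
    df = Counter(t for d in corpus_tokens for t in set(d))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores
```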
Figure 2: Cosine Similarity Distribution with Original
GPT Embeddings
5.1 Model Optimisation
To optimise the performance of the best IR model,
we customised the GPT embeddings to be more
domain specific. The GPT embedding model used
for our retrieval is the text-embedding-ada-002,
which was introduced by OpenAI in December
2022 as their state-of-the-art text embedding
model. It is trained on different datasets used
for text search, text similarity, and code search.
In order to customise the GPT embeddings, we follow the OpenAI method for embedding customisation (Sanders, 2023). Thus, we train a classification model on our human-annotated data with the following objective:
$SE_{\min} = \min_{x \in \{-1,\,-0.99,\,\ldots,\,1\}} SE(x)$    (2)

where x is the cosine similarity threshold between the positive and the negative class, which we obtain by sweeping cosine similarity scores from -1 to 1 in steps of 0.01 until we reach the lowest standard error of the mean, SE_min, for the cosine similarity distribution. The output of this training is a matrix M that we multiply by the embedding vector v of each judgement and transcript segment. This multiplication produces customised embeddings which are better adapted to our legal dataset. The graphs in Figures 2 and 3 show that the overlap between the distributions of the cosine similarities for relevant and irrelevant judgement-hearing links improves from 70.5% ± 2.7% with the original GPT embeddings to 73.0% ± 2.6% with the customised embeddings.
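The threshold sweep in Equation (2) follows the OpenAI embedding-customisation recipe. A sketch, under the assumption that SE(x) is the standard error of the accuracy of a threshold classifier at cut-off x, with labels of 1 for a linked pair and -1 otherwise:

```python
import numpy as np

def accuracy_and_se(sims, labels, threshold):
    """Accuracy of classifying a pair as linked when its cosine
    similarity >= threshold, with the standard error of that accuracy."""
    preds = np.where(sims >= threshold, 1, -1)
    acc = float(np.mean(preds == labels))
    se = (acc * (1 - acc) / len(labels)) ** 0.5
    return acc, se

def sweep_thresholds(sims, labels):
    """Sweep x from -1 to 1 in steps of 0.01 and return the
    (threshold, accuracy, standard error) of the best cut-off."""
    results = [(t,) + accuracy_and_se(sims, labels, t)
               for t in np.arange(-1.0, 1.01, 0.01)]
    return max(results, key=lambda r: r[1])  # highest accuracy
```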
The customised embeddings contribute to a more accurate automatic linking between judgement text and video court hearings. An example of the GPT retrieval output which we use as the back-end model for our UI is shown in Appendix A. As per our human annotator's markings, each text colour indicates a legal point presented in the judgement text, and its semantically relevant argument in the hearing transcript is presented in the same colour.

Figure 3: Cosine Similarity Distribution with Customised GPT Embeddings
6 Conclusion
This research presented the second stage of
our pipeline which employs generative AI to
automatically link a judgement text of a decided
case in the UK Supreme Court to its video
hearings. The IR system we provide assists users
in extracting the arguments and information they
may find useful in understanding the particular
case they are studying. The system does not,
however, explicitly return answers to questions
legal professionals may have on a legal precedent.
The UI we provide supports the users in browsing
or filtering the lengthy videos of the court hearing
sessions by searching through hundreds of video
timespans and then provides a set of need-to-watch
bookmarks that are crucial in understanding the
judgement decided for the case. The implications of this tool extend beyond providing practical aid for legal professionals and academics to expanding the public's ability to access information on court proceedings and justice in general. In the
future stages of the project, we aim to expand
our annotated linking dataset and explore the
effectiveness of coupling judgements and video
hearings according to common legal entities such
as articles, legal provisions and names of similar
cases. We also aim to adopt a similar methodology for constructing similar UI tools for other domains, such as bookmarking recorded lecture spans to educational textbooks or linking written documents to recorded meetings in the business sector.
Acknowledgements

We would like to thank and acknowledge the effort exerted by our legal expert team of annotators who took the time to carefully read the dataset and provide relevancy labels.

References

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2014. SemEval-2014 task 10: Multilingual semantic textual similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 81–91.

Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos. 2016. Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective. PeerJ Computer Science, 2:e93.

Vera Boteva, Demian Gholipour, Artem Sokolov, and Stefan Riezler. 2016. A full-text learning to rank dataset for medical information retrieval. In Proceedings of the 38th European Conference on Information Retrieval. http://www.cl.uni-heidelberg.de/~riezler/publications/papers/ECIR2016.pdf.

Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. 2020. LEGAL-BERT: The muppets straight out of law school. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2898–2904, Online. Association for Computational Linguistics.

Ilias Chalkidis, Manos Fergadiotis, Nikolaos Manginas, Eva Katakalou, and Prodromos Malakasiotis. 2021. Regulatory compliance through doc2doc information retrieval: A case study in EU/UK legislation where text similarity has limitations. arXiv preprint arXiv:2101.10726.

Abhishek Dixit, Vipin Deval, Vimal Dwivedi, Alex Norta, and Dirk Draheim. 2022. Towards user-centered and legally relevant smart-contract development: A systematic literature review. Journal of Industrial Information Integration, 26:100314.

Emad Elwany, Dave Moore, and Gaurav Oberoi. 2019. BERT goes to law school: Quantifying the competitive advantage of access to large legal corpora in contract understanding. arXiv preprint arXiv:1911.00473.

Jens Frankenreiter and Julian Nyarko. 2022. Natural language processing in legal tech. Legal Tech and the Future of Civil Justice (David Engstrom ed.).

Rupert Haigh. 2018. Legal English. Routledge.

Michael Alexander Kirkwood Halliday. 2007. Language and Education: Volume 9. A&C Black.

Lui Joseph Hellesoe. 2022. Automatic Domain-Specific Text Summarisation With Deep Learning Approaches. Ph.D. thesis, Auckland University of Technology.

Dan Hendrycks, Collin Burns, Anya Chen, and Spencer Ball. 2021. CUAD: An expert-annotated NLP dataset for legal contract review. arXiv preprint arXiv:2103.06268.

Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, and Allan Hanbury. 2021. Efficiently teaching an effective dense retriever with balanced topic aware sampling. In Proc. of SIGIR.

Nadzeya Kiyavitskaya, Nicola Zeni, Travis D. Breaux, Annie I. Antón, James R. Cordy, Luisa Mich, and John Mylopoulos. 2008. Automating the extraction of rights and obligations for regulatory compliance. In Conceptual Modeling – ER 2008: 27th International Conference on Conceptual Modeling, Barcelona, Spain, October 20-24, 2008, Proceedings 27, pages 154–168. Springer.

Christian M. I. M. Matthiessen and Michael Alexander Kirkwood Halliday. 2009. Systemic Functional Grammar: A First Step into the Theory.

Emre Mumcuoğlu, Ceyhun E. Öztürk, Haldun M. Ozaktas, and Aykut Koç. 2021. Natural language processing in law: Prediction of outcomes in the higher courts of Turkey. Information Processing & Management, 58(5):102684.

John J. Nay. 2021. Natural Language Processing for Legal Texts, pages 99–113. Cambridge University Press. DOI: 10.1017/9781316529683.011.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014a. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014b. GloVe: Global vectors for word representation. Figshare https://nlp.stanford.edu/projects/glove/.

J. Melanie Peters. 2003. The Impact of Tele-Advice on the Community Nurses' Management of Leg Ulcers. University of South Wales (United Kingdom).

J. Rabelo, M. Y. Kim, R. Goebel, M. Yoshioka, Y. Kano, and K. Satoh. 2020. COLIEE 2020: Methods for legal document retrieval and entailment. URL: https://sites.ualberta.ca/~rabelo/COLIEE2021/COLIEE_2020_summary.pdf.

Stephen Robertson, Hugo Zaragoza, et al. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval, 3(4):333–389.

Hadeel Saadany, Constantin Orăsan, and Catherine Breslin. 2022. Better transcription of UK Supreme Court hearings. arXiv preprint arXiv:2211.17094.

Hadeel Saadany, Catherine Breslin, Constantin Orăsan, and Sophie Walker. 2023. Better transcription of UK Supreme Court hearings. In Workshop on Artificial Intelligence for Access to Justice (AI4AJ 2023).

Ted Sanders. 2023. Customizing embeddings. OpenAI https://github.com/openai/openai-cookbook/blob/main/examples/Customizing_embeddings.ipynb.

Abhay Shukla, Paheli Bhattacharya, Soham Poddar, Rajdeep Mukherjee, Kripabandhu Ghosh, Pawan Goyal, and Saptarshi Ghosh. 2022. Legal case document summarization: Extractive and abstractive methods and their evaluation. arXiv preprint arXiv:2210.07544.

Georgina Sturge. 2021. Court statistics for England and Wales. Technical report, House of Commons Library.

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). https://openreview.net/forum?id=wCu6T5xFjeJ.

Dietrich Trautmann, Alina Petrova, and Frank Schilder. 2022. Legal prompt engineering for multilingual legal judgement prediction. arXiv preprint arXiv:2212.02199.

Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020. MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers.

Christopher Williams. 2007. Tradition and Change in Legal English: Verbal Constructions in Prescriptive Texts, volume 20. Peter Lang.

Guangwei Xu, Yangzhao Zhang, Longhui Zhang, Dingkun Long, Pengjun Xie, and Ruijie Guo. 2022. Hybrid retrieval and multi-stage text ranking solution at TREC 2022 deep learning track. TREC 2022 Deep Learning Track.

Lucia Zheng, Neel Guha, Brandon R. Anderson, Peter Henderson, and Daniel E. Ho. 2021. When does pretraining help? Assessing self-supervised learning for law and the CaseHOLD dataset of 53,000+ legal holdings. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, pages 159–168.
Appendix A
An Example of the Automatic Linking of Judgement Segment and Transcription Segments by GPT Embeddings