Lab X - Building A Machine-Learning Annotator With Watson Knowledge Studio
Watson Application
Developer Workshop
Lab02
Watson Knowledge Studio:
Building a Machine-learning
Annotator with Watson
Knowledge Studio
January 2017
Duration: 60 minutes
Prepared by Víctor L. Fandiño | IBM Global Business Partners
Overview
You can use Watson Knowledge Studio (WKS) to create a machine-learning model
that understands the linguistic nuances, meaning, and relationships specific to your
industry or to create a rule-based model that finds entities in documents based on
rules that you define.
To become a subject matter expert in a given industry or domain, Watson must be
trained. You can facilitate the task of training Watson with Watson Knowledge
Studio. With Watson Knowledge Studio you can deliver meaningful insights to users
by deploying a trained model in other Watson cloud-based offerings and cognitive
solutions, including AlchemyLanguage, Watson Discovery service and Watson
Explorer.
Watson Knowledge Studio provides easy-to-use tools for annotating unstructured
domain literature, and uses those annotations to create a custom machine-learning
model that understands the language of the domain. The accuracy of the model
improves through iterative testing, ultimately resulting in an algorithm that can
learn from the patterns that it sees and recognize those patterns in large collections
of new documents.
A diagram in the original guide illustrates how this works (not reproduced here).
Additionally, you can build a rule-based model with Watson Knowledge Studio.
Watson Knowledge Studio provides a rules editor that simplifies the process of
finding and capturing common patterns in your documents as rules. You can then
create a model that recognizes the rule patterns, and deploy it for use in other
services.
Objectives
• Create a project and import a type system
• Add documents and create annotation sets
• Add dictionaries and pre-annotate documents
• Annotate documents and promote them to ground truth
• Train, evaluate, and deploy a machine-learning annotator
• Test the deployed model with AlchemyLanguage
Prerequisites
The Labs Preparation Guide: Getting Started with IBM Watson APIs & SDKs contains
instructions for getting IBM Bluemix and Watson Knowledge Studio accounts. You
will also need Postman for testing the deployed annotator. For this lab, use
the latest version of the Chrome or Firefox web browser. For the best
performance, use a screen resolution of at least 1024x1280.
Note: In this lab you will work in your own Watson Knowledge Studio instance
with the administrator role (ADMIN). That means you are the only member of the
annotator component team; a real project always requires multiple human
annotators.
Creating a project
A project defines all of the resources that are required to create a machine-learning
annotator, including training documents, the type system, dictionaries, and
annotations that are added by human annotators. For more information about
project creation, see Creating a project.
3. Give the project a name. You cannot change the project name later, so
choose a short name that reflects your domain content or the purpose of
the annotator component. You can specify a longer description, which
can be changed later. In this lab, we will name the project “wadwWKS”.
6. In the Project Manager Selection, you have the option to add project
managers to the project (the administrator can add or remove project
managers later by editing the project). Only the names of people that you
have added to the instance appear in the list
7. When you are ready, click Create. The project will be created and you will
be directed to the project Type System configuration. To change the
project description or add or remove project managers later, an
administrator can edit the project.
You will now learn how to import and modify a type system within Watson
Knowledge Studio. You must create or import a type system before you begin any
annotation tasks. See Type systems for more information about this topic.
10. Within your project, click Type System in the banner or the navigation
menu. On the Type System page, click Import
11. Select the en-klue2.zip file from your computer and click Import. The
imported type system is displayed in the table
12. 52 entity types and 2,177 relation types should be imported. You can
browse the type system. You can also edit an entity type. For instance,
locate the MONEY entity type. In the Action section click Edit and in the
Roles column delete the role AWARD. Click Save
After you finish making changes to the type system, you can begin adding
documents to your project. You will now learn how to add documents to a project
in Watson Knowledge Studio that can be annotated by human annotators. See
Adding documents to a project for more information about adding documents.
14. Within your project, click Documents in the banner or the navigation
menu. On the Documents page, click Import Document Set
15. Select the documents-new.csv file from your computer and click Import.
The imported file is displayed in the table. The imported document set
should contain 14 documents. You can click the document set in the table
to browse the content of each document in the set. The documents contain
news about computing technologies and companies.
At this point, as a Project Manager, you are now ready to divide the corpus into
multiple document sets and assign the document sets to different human
annotators. Since you are the only user in the instance, you will create a single
annotation set.
16. Within your project, click Documents in the banner or the navigation
menu
17. Click Create Annotation Sets. The Create Annotation Sets window opens.
By default, this window shows the base set (containing all documents), as
well as fields where you can specify the information for a new annotation
set
18. Select your name in the Annotator list and provide a name for the set.
Notice that you could add more sets (and a human annotator for each
one), which is a more realistic situation in a business environment. In the
case of more than one set, the Overlap field specifies the percentage of
documents in the base set to be included in all of the new sets, so they
can be annotated by all annotators and you can compare the results.
Since you only have one set, the overlap has no effect
The new annotation set is created and now appears in the Annotation Sets tab of the
Documents page.
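To make the Overlap field concrete, here is a small sketch of how an overlap percentage could partition a base set into per-annotator annotation sets. The function name and partitioning strategy are illustrative, not WKS internals:

```python
# Illustrative sketch: share an "overlap" slice of the base set across all
# annotators (so IAA can later be computed), then split the rest round-robin.

def build_annotation_sets(base_docs, annotators, overlap_pct):
    """Return {annotator: documents}, every annotator getting the shared slice."""
    n_overlap = round(len(base_docs) * overlap_pct / 100)
    shared = base_docs[:n_overlap]            # annotated by everyone
    remaining = base_docs[n_overlap:]
    sets = {name: list(shared) for name in annotators}
    for i, doc in enumerate(remaining):       # unique docs, round-robin
        sets[annotators[i % len(annotators)]].append(doc)
    return sets

docs = [f"doc{i}" for i in range(14)]         # 14 docs, as in this lab's corpus
sets = build_annotation_sets(docs, ["alice", "bob"], overlap_pct=50)
```

With a single annotator, as in this lab, every document lands in the one set and the overlap has no effect, matching the note in step 18.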
Adding a dictionary
Dictionaries are used in WKS for pre-annotating text when creating a machine-
learning annotator. You will now learn how to add a dictionary to a project in
Watson Knowledge Studio. For more information about dictionaries, see Adding
dictionaries to a project.
21. Within your project, click Dictionaries in the banner or the navigation
menu
Note: Do not click the Import icon, which is used to import a dictionary that you
want to use as-is. For the lab, you will create a new editable dictionary and then
import terms into it.
23. In the Name field, type “Test dictionary”. Click Save to create the (empty)
dictionary. The new dictionary is created and automatically opened for
editing
24. In the dictionary pane, click Import. In the Import Dictionary Entries
window, select the dictionary-items-organization.csv file from your
computer and then click Import. 24 terms in the file are imported into the
dictionary. Each term represents an organization
25. Click Add Entry to create a new term. An editable row is added at the top
26. In the Surface Forms column, type “IBM” and “International Business
Machines Corporation” on separate lines (when you begin to type a new
surface form, a space is added below for an additional surface form).
Leave the radio button next to IBM selected, indicating that this surface
form is the lemma. In the Part of Speech column, select Noun. Click Save
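The CSV you import in step 24 and the entry you create in step 26 can be sketched programmatically. The column layout below (lemma, part of speech, pipe-separated surface forms) is an assumption for illustration only, not the documented WKS import format:

```python
import csv
import io

# Illustrative sketch of a dictionary entry like the one in step 26.
# NOTE: the header "lemma,poscode,surface" and the pipe-joined surface
# forms are assumptions for illustration, not the real WKS CSV schema.

entry = {
    "lemma": "IBM",                           # the canonical surface form
    "pos": "Noun",
    "surface_forms": ["IBM", "International Business Machines Corporation"],
}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["lemma", "poscode", "surface"])
writer.writerow([entry["lemma"], entry["pos"],
                 "|".join(entry["surface_forms"])])
csv_text = buf.getvalue()
```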
After you create a dictionary, you can use it to speed up human annotation tasks by
pre-annotating the documents.
27. Within your project, click Annotator Component in the banner or the
navigation menu. You can see different ways to pre-annotate documents
28. Under the description of the Dictionary Pre-annotator type, click Create
this type of pre-annotator. The Dictionary Mapping window opens.
29. The list of entity types you previously imported when creating the type
system appears. You now have to associate each dictionary that you want
the dictionary pre-annotator to use, with the entity type that matches the
type of the dictionary terms. You must map at least one dictionary before
you can run the pre-annotator. Map the ORGANIZATION entity type to
the “Test dictionary” dictionary you created previously: Click Edit for the
ORGANIZATION entity type name. Choose the dictionary from the list
30. Click the plus sign beside the dictionary name to add the mapping, and
then click Save
31. Click Create and then select Create & Run from the drop-down menu
32. On the Run Annotator page, click the check boxes to select the document
set that you created earlier in the lab (not including the base set)
The documents in the selected set are pre-annotated using the dictionary annotator
you created. The annotator component is added to the Annotator Component page;
you could later use the same annotator to pre-annotate additional document sets
by clicking Run.
Creating annotation tasks
In this section, you will learn how to use annotation tasks to track the work of
human annotators in Watson Knowledge Studio. For more information about
annotation tasks, see Creating an annotation task.
34. Within your project, click Human Annotation in the banner or the
navigation menu. On the Human Annotation page, click Add Task
35. Specify the details for the task: In the Title field, type “Test”. In the
Deadline field, select a date in the future
37. In the Add Annotation Sets to Task window, click the check boxes to select
the document set you created previously. This specifies that the
document set must be annotated by the assigned human annotators as
part of this task. Remember that for this lab you only have one human
annotator and the corresponding annotation set. In a real scenario, you
will have multiple annotation sets assigned to different human
annotators in your project
39. Click the Test task to open it. You can use this view to track the progress
of human annotation work, calculate the inter-annotator agreement
scores, and view overlapping documents to adjudicate annotation
conflicts
Annotating documents
In this section, you will learn how to use the Ground Truth Editor to annotate
documents in Watson Knowledge Studio. For more information about human
annotation, see Annotation with the Ground Truth Editor.
40. In the Test task you created in the previous section, click Annotate next
to the Annotation Set 1 annotation set. The Ground Truth Editor opens
in a new browser tab, showing you a preview of each document in the
document set
41. Scroll to the “Technology - gmanews.tv” document and click to open it for
annotation. Note that the term “IBM” has already been annotated with
the ORGANIZATION entity type; this annotation was added by the
previous dictionary pre-annotator process. This pre-annotation is correct,
so it does not need to be modified
42. You will now annotate a mention. Click the Mentions icon to begin
annotating mentions. In the document body, select the text “Thomas
Watson”
43. In the list of entity types, click PERSON. The entity type PERSON is applied
to the selected mention
45. Select the “Thomas Watson” and “IBM” mentions (in that order). To
select a mention, click the entity-type label above the text
46. In the list of relation types, click founderOf. The two mentions are
connected with a founderOf relationship
47. Click the Completed option from the menu, and then click the Save icon
to confirm, and then click Close
48. In the list of documents click Submit All to submit the documents for
approval. Once confirmed, you can see that the status of all documents is
Completed
Note: In a real project, you would create many more annotations and complete all
of the documents in the set before submitting.
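Conceptually, the ground truth captured in steps 42 to 46 amounts to typed mention spans plus a typed relation between them. The following sketch shows that shape as plain Python data; it is an illustrative structure, not WKS's internal format:

```python
# Illustrative data shape for the annotations made above: two typed
# mentions (with character offsets) and one typed relation between them.

doc = "Thomas Watson led IBM for four decades."

mentions = [
    {"id": "m1", "text": "Thomas Watson", "type": "PERSON",
     "begin": doc.index("Thomas Watson"),
     "end": doc.index("Thomas Watson") + len("Thomas Watson")},
    {"id": "m2", "text": "IBM", "type": "ORGANIZATION",
     "begin": doc.index("IBM"),
     "end": doc.index("IBM") + len("IBM")},
]

# The founderOf relation from step 46, connecting the two mentions by id.
relations = [{"type": "founderOf", "from": "m1", "to": "m2"}]
```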
50. Back in the Human Annotation Tasks window, click the Refresh button.
You can see now that the Annotation Set 1 is in Submitted status
51. Mark the check box next to Annotation Set 1; Accept and Cancel buttons
appear. Click Accept. You have now promoted the documents in the
annotation set to ground truth.
Note: In a real situation you will have several annotation sets reviewed by different
human annotators. You will have to compare their work to determine whether
different human annotators are annotating overlapping documents consistently. In
that situation, Watson Knowledge Studio calculates inter-annotator agreement
(IAA) scores by examining all overlapping documents in all document sets in the
task, regardless of the status of the document sets. The IAA scores show how
different human annotators annotated mentions, relations, and coreference
chains. It is a good idea to check IAA scores periodically and verify that human
annotators are consistent with each other. For example, one human annotator
could have defined the relation between IBM and Thomas Watson as founderOf and
another as employedBy. The IAA scores will reflect this discrepancy, which you
will have to analyse and discuss with the annotators to adjudicate conflicts;
you can do this in the annotation task. For this simple lab example, you are
the single annotator and only a minimal set of human annotations has been made,
so no conflicts are present and the annotation set status is Completed (the
status would be In Conflict if any overlapping disagreement were detected when
you select and accept the document sets).
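The idea behind an IAA score can be sketched as a simple pairwise agreement over the spans two annotators both labelled. WKS's real computation reports mentions, relations, and coreference chains separately; this minimal sketch only conveys the intuition:

```python
# Minimal pairwise-agreement sketch (not the actual WKS IAA formula):
# the fraction of shared (document, span) keys given the same label.

def pairwise_agreement(ann_a, ann_b):
    """ann_a, ann_b: dicts mapping (doc, span) -> label."""
    shared = set(ann_a) & set(ann_b)
    if not shared:
        return None                            # no overlapping annotations
    agree = sum(1 for key in shared if ann_a[key] == ann_b[key])
    return agree / len(shared)

# The conflict described in the note above: same relation span, two labels.
alice = {("doc1", "Thomas Watson->IBM"): "founderOf"}
bob   = {("doc1", "Thomas Watson->IBM"): "employedBy"}
score = pairwise_agreement(alice, bob)         # 0.0 -> needs adjudication
```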
After you have promoted the documents to ground truth, you can use them to train
the machine-learning annotator. When you create a machine-learning annotator,
you select the document sets that you want to use to train it. You also specify the
percentage of documents that are to be used as training data, test data, and blind
data. Only documents that became ground truth through approval or adjudication
can be used to train the machine-learning annotator.
53. Select the document sets that you want to use for creating a machine-
learning annotator: click the check mark next to Annotation Set 1. Use the
default values for creating your training, test, and blind data, then
click Next. In the next window, accept the option to reuse the current
dictionary mapping and click Train & Evaluate
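The train/test/blind partition described above can be sketched as follows. The 70/23/7 percentages are example values chosen for illustration, not necessarily the WKS defaults:

```python
import random

# Sketch of partitioning ground-truth documents into training, test, and
# blind sets. Percentages here are illustrative example values.

def split_corpus(docs, pct_train=70, pct_test=23, seed=42):
    docs = list(docs)
    random.Random(seed).shuffle(docs)            # deterministic shuffle
    n_train = round(len(docs) * pct_train / 100)
    n_test = round(len(docs) * pct_test / 100)
    return (docs[:n_train],                       # used to train the model
            docs[n_train:n_train + n_test],       # used to evaluate iterations
            docs[n_train + n_test:])              # blind: held back until the end

train, test, blind = split_corpus(range(100))     # 70 / 23 / 7 documents
```

Keeping the blind set untouched during iterative tuning is what makes its scores an honest estimate of performance on new documents.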
Note: Training might take more than ten minutes, or even hours, depending on the
number of human annotations and the total number of words across documents.
54. You are now back to the Annotator Component window where you can
see the training progress of your annotator
55. After the machine-learning annotator is trained, you can export it or you
can view detailed information on its performance by clicking Details. On
the Model Settings tab, you have access to the Train / Test / Blind sets
where you can view the documents that human annotators worked on.
You can click View Decoding Results to see the annotations that the
trained machine-learning annotator created on that same set of
documents
56. On the Statistics tab, you can view details about the precision, recall and
F1 scores for the machine-learning annotator. You can view these scores
for mentions, relations, and coreference chains by using the radio
buttons. You can analyse performance by viewing a summary of statistics
for entity types, relation types, and coreference chains. You can also
analyse statistics that are presented in a confusion matrix. The confusion
matrix helps you compare the annotations that were added by the
machine-learning annotator to the annotations in the ground truth.
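The precision, recall, and F1 scores shown on this tab reduce to the standard definitions, sketched below with made-up counts:

```python
# Standard precision / recall / F1 from annotation counts:
#   true_pos  - annotator's spans that match the ground truth
#   false_pos - spurious spans the annotator added
#   false_neg - ground-truth spans the annotator missed

def precision_recall_f1(true_pos, false_pos, false_neg):
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# e.g. 8 correct ORGANIZATION mentions, 2 spurious, 2 missed (made-up counts):
p, r, f1 = precision_recall_f1(true_pos=8, false_pos=2, false_neg=2)
# p = 0.8, r = 0.8, f1 = 0.8
```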
Note: In this tutorial, you annotated documents with only a single dictionary
for organizations, so the scores you see are 0 or N/A for most entity types.
The numbers are low, but that is expected, because you did almost no human
annotation or correction.
57. On the Versions tab, you can take a snapshot of the annotator and the
resources that were used to create it (except for dictionaries and
annotation tasks). For example, you might want to take a snapshot before
you change the annotator. If the statistics are poorer the next time you
run it, you can promote the older version and delete the version that
returned poorer results. Also, if you want to make your annotator
available to other Watson applications, you must create at least one
version of the annotator. This allows you to deploy one version, while you
continue to improve the current version. The option to deploy does not
appear until you create at least one version
58. On the Versions tab, click Take Snapshot. Provide a description of the
snapshot
59. Once the snapshot has been created, click Deploy in the Action section on
the same row as the snapshot. Select AlchemyLanguage as the service
to deploy the model to and click Next
60. In the Deploy Model window, enter your AlchemyLanguage API key.
This is the same one you used in the previous AlchemyLanguage lab (you
can get it from your IBM Bluemix dashboard, where you should have your
AlchemyLanguage service). Click Deploy
62. You can check the status of the deployment in the Action section next to
the snapshot. Depending on the model, the deployment can take some
time to complete. Once the status is Available, your model is ready to be
used with the AlchemyLanguage methods
After you train a machine-learning annotator, you can use it to pre-annotate new
documents that you add to the corpus and you can make it available to other
Watson applications, like AlchemyLanguage, Watson Discovery service or Watson
Explorer.
See Using the machine-learning model to learn how to deploy your annotators to
these IBM Watson applications.
Now that you have deployed your model, you can test it with AlchemyLanguage.
Verify first that the deployment has finished. We are now going to extract named
entities with the AlchemyLanguage GetRankedNamedEntities method.
63. Open a Postman session. Create a new POST HTTP request with the
following parameters:
Method          POST
Endpoint        https://gateway-a.watsonplatform.net/calls/text/TextGetRankedNamedEntities
apikey          <your AlchemyLanguage API key>
model           <the model ID of your deployed annotator>
outputMode      xml
emotion         1
sentiment       1
knowledgeGraph  1
text            NCR, which counts later IBM Thomas Watson as one of its early
                employees, said its products and services account for more than
                $400 billion in annual commerce and 23 billion consumer self-
                service transactions
64. Verify that in the Body section of the request, the option x-www-form-
urlencoded is selected
65. Click Send to make the request. You should get an XML response listing
the extracted entities (sample output not reproduced here)
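If you prefer a script to Postman, the same request body can be built as follows. This sketch only constructs the x-www-form-urlencoded body without sending it; the apikey and model values are hypothetical placeholders you must replace with your own:

```python
from urllib.parse import urlencode

# Builds the form-encoded body for the TextGetRankedNamedEntities call.
# "YOUR_..." values are placeholders, not real credentials or IDs.

ENDPOINT = ("https://gateway-a.watsonplatform.net"
            "/calls/text/TextGetRankedNamedEntities")

params = {
    "apikey": "YOUR_ALCHEMY_API_KEY",        # placeholder
    "model": "YOUR_DEPLOYED_MODEL_ID",       # placeholder; omit to compare
    "outputMode": "xml",
    "emotion": "1",
    "sentiment": "1",
    "knowledgeGraph": "1",
    "text": ("NCR, which counts later IBM Thomas Watson as one of its early "
             "employees, said its products and services account for more than "
             "$400 billion in annual commerce and 23 billion consumer "
             "self-service transactions"),
}

body = urlencode(params)                     # POST this body to ENDPOINT
```

Dropping the "model" key from params reproduces the comparison in the next step: the generic entities versus those from your custom model.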
67. If you try the same request without the model parameter, you will see
that the extracted entities are much more generic
68. You can try the same exercise now with the AlchemyLanguage
GetTypeRelations method. You will see that Thomas Watson is identified
as founderOf IBM. Try again removing the model parameter and compare
the results
End of Lab