The term "Constructed Language"---often shortened to "ConLang"---is used to refer to any artificially created language that is intended, in principle, to be as expressive as naturally evolved human languages. The latter restriction is important since ConLangs are to be distinguished from artificial languages, such as mathematical symbology, which are artificial languages (at least in the formal sense of language) to be sure, but which are much more limited in the kinds of messages they can convey.
When people think of ConLangs, the first things that come to mind may be languages like Esperanto or Interlingua, which are artificial languages designed to be like natural languages, though with fewer of the grammatical complexities one often finds in real natural languages. ConLangs in this category were intended by their creators to serve as substitutes for naturally evolved languages.
Or, second, one might think of fantasy languages, such as the languages of Tolkien's Middle Earth---Elvish (Quenya) or the language of Mordor. Again, these are artificial languages that nonetheless bear a strong resemblance to natural languages. Quenya, for example, looks very Finno-Ugric: see https://en.wikipedia.org/wiki/Finnish_influences_on_Tolkien.
A third category is invented alien languages, such as Klingon or Vulcan from the Star Trek franchise, which, at least in principle, were designed to be somewhat unlike typical human languages in their construction.
Finally, there is a fourth category of languages that are supposed to be based on "logical" principles, such as LogLan or John Wilkins’ "philosophical language".
In this research, we are interested in discovering to what degree Large Language Models can help in the creation of ConLangs that are much like real human languages in their properties, but are not intended to replace naturally evolved languages. Thus the ConLangs created basically fit into the second category sketched above.
First of all, install the dependencies:

```bash
pip3 install -r requirements.txt
```
For Anthropic (Claude series) models we assume access via AWS. Make sure that your environment recognizes the AWS environment variables. To check this, run `echo $AWS_ACCESS_KEY_ID` and `echo $AWS_SECRET_ACCESS_KEY`, and make sure some value is returned. If not, set the environment variables with `export AWS_ACCESS_KEY_ID=<your access key id>` and `export AWS_SECRET_ACCESS_KEY=<your secret access key>`. If you are using conda (which we use), also consider adding the environment variables to the virtual environment with `conda env config vars set AWS_ACCESS_KEY_ID=<your access key id>`, etc.
For OpenAI and Gemini you will need to pass your API key as a command-line argument (though `eval_morphosyntax.py` reads the OpenAI key from the environment variable `OPEN_AI_API_KEY`).
A good place to start is with the scripts named `controlled_*_claude*.sh` in the subdirectories `modular_experiments` and `modular_experiments/controlled_morphosyntax`.
The basic protocol is to:

- Generate a phonology (phonotactics) for the language.
- Generate the morphosyntax.
- Generate an initial lexicon.
- Generate the orthography.
- Generate an updated lexicon with spellings.
- Generate a handbook --- a short grammar describing the language.
- Optionally, translate more texts into the target language.
An example run for the following language configuration is given below:

- Japanese-like phonology
- Turkish-like morphosyntax
- Cyrillic-based orthography
```bash
. ./modular_experiments/controlled_phonology_claude.sh
. ./modular_experiments/controlled_morphosyntax/claude/turkish.sh
. ./modular_experiment_outputs_controlled/claude/turkish/metascripts/word_order.sh
. ./modular_experiment_outputs_controlled/claude/turkish/metascripts/morphosyntax_0_0_0.sh
. ./modular_experiment_outputs_controlled/claude/turkish/metascripts/morphosyntax_0_1_0.sh
. ./modular_experiment_outputs_controlled/claude/turkish/metascripts/morphosyntax_0_2_0.sh
. ./modular_experiments/controlled_corpus_creation_claude_0.sh
. ./modular_experiments/controlled_orthography_claude.sh
. ./modular_experiments/controlled_corpus_creation_claude_1.sh
. ./modular_experiments/controlled_handbook_claude.sh
```
This workflow does not include instructions on phonotactics/phonology. The example below uses Claude as the LLM and the French-like feature set.
```bash
./modular_experiments/controlled_morphosyntax/claude/french.sh
./modular_experiment_outputs_controlled/claude/french/metascripts/word_order.sh
./modular_experiment_outputs_controlled/claude/french/metascripts/morphosyntax_0_0_0.sh
./modular_experiment_outputs_controlled/claude/french/metascripts/morphosyntax_0_1_0.sh
./modular_experiment_outputs_controlled/claude/french/metascripts/morphosyntax_0_2_0.sh
```
A metascript such as `morphosyntax_0_0_0.sh`, e.g. for French, includes the following shell scripts (for now):
```bash
./modular_experiment_outputs_controlled/claude/french/negation_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/nominal_number_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/definiteness_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/pro_drop_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/adjective_agreement_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/comparative_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/mood_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/tense_aspect_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/person_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/voice_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/relativization_0_0_0/script.sh
./modular_experiment_outputs_controlled/claude/french/infinitive_0_0_0/script.sh
```
To run an evaluation:

```bash
./modular_experiment_outputs_controlled/claude/french/metascripts/evaluation_0_0.sh
```
- After you run `modular_experiment_outputs_controlled/claude/metascripts/word_order.sh`, you get the intermediate output in `modular_experiment_outputs_controlled/claude/word_order_0_{j}/sentence_design_output.txt`.
- The final output can be found in `modular_experiment_outputs_controlled/claude/`.
You can run an experiment across all the languages, including the evaluation step, by running the following command:

```bash
./modular_experiments/controlled_morphosyntax/claude/all_languages.sh
```
If you add a new morphological feature stage, then you should see the corresponding folder (like `number_0_0_0/`) here. Each shell script, for example `inclusive_exclusive_0_0_0/script.sh`, looks like this:
```bash
#!/bin/bash
# Runs the cumulative-morphosyntax stage for the inclusive/exclusive feature.
python3 create/create.py \
  --model="claude" \
  --task="cumulative_morphosyntax" \
  --sample_text="sentence_design_output/grammatical_test_sentences.txt" \
  --open_ai_api_key="" \
  --num_iterations=1 \
  --output="modular_experiment_outputs_controlled/claude/inclusive_exclusive_0_0_0/sentence_design_output.txt" \
  --output_full="modular_experiment_outputs_controlled/claude/inclusive_exclusive_0_0_0/sentence_design_output_full.txt" \
  --user_prompt="prompts/cumulative_morphosyntax/inclusive_exclusive.txt" \
  --user_prompt_dump="modular_experiment_outputs_controlled/claude/inclusive_exclusive_0_0_0/user_prompt.txt" \
  --previous_translation="modular_experiment_outputs_controlled/claude/word_order_0_0/sentence_design_output.txt" \
  --inclusive_exclusive="False"
```
To run this correctly, make sure to have the `user_prompt` file ready.
When you run `create/create.py` with the `cumulative_morphosyntax` task:

- It will call `create/create_lib.py`'s `run_model(client, params, output_path, output_path_full, user_prompt_path, modular_morphosyntax, user_prompt_dump=user_prompt_dump, ...)` function.
- The `run_model()` function first calls `utils/common_utils.py`'s `create_user_prompt(params, user_prompt_path, modular_morphosyntax)` function.
- That function gets the corresponding prompt text file first, and calls the `modular_morphosyntax_prompt()` function to prepare the prompt text via the `load_system_instruction()` function. The comments (`<!-- ... -->`) and the instruction tags (`<INSTRUCTIONS>`, `</INSTRUCTIONS>`) are removed.
- The loaded user prompt contains placeholders for specific morphosyntactic parameters. These placeholders are filled in with the passed morphosyntactic parameters via `jinja`'s `Template(prompt).render(**params)` (see the sketch below).
- So, make sure that the keys in `params` (defined in `morphosyntax_params.py`) correctly correspond to the placeholder variable names in the user prompt.
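For illustration, here is a minimal sketch of that templating step, using a hypothetical prompt string and parameter names (the real prompts live in `prompts/cumulative_morphosyntax/`, and the real keys are defined in `morphosyntax_params.py`):

```python
from jinja2 import Template

# Hypothetical prompt text with two placeholder variables.
prompt = "The language has {{ main_word_order }} word order; negation is {{ negation }}."

# Hypothetical params; the keys must match the placeholder names exactly.
params = {"main_word_order": "SOV", "negation": "a verbal suffix"}

print(Template(prompt).render(**params))
# -> The language has SOV word order; negation is a verbal suffix.
```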
So, if you want to add a new morphological feature, below are the points for modification (a hypothetical code sketch follows this list):

- `prompts/cumulative_morphosyntax/`: In this folder, make a new prompt file for the newly added feature.
- Add the new feature to either the `Syntax` class or the `Morphology` class in `modular_experiments/morphosyntax_params.py`. Also update the sample parameter sets if necessary.
  - You may also add a nested BaseModel value to a key (only one-level nesting is supported for now).
  - If the value is a BaseModel, the corresponding flag value in a command should be a string (see `RELATIVIZATION` defined in `common_utils.py` for an example).
  - The passed string is then re-converted to a dictionary in `cumulative_morphosyntax_params()` in `common_utils.py`, which is loaded in `run_morphosyntax.py`.
- In `modular_experiments/run_morphosyntax.py`:
  - If you are adding a syntactic feature, add the following to `run_word_order_experiment()`:
    - Add a line to load the newly added feature (e.g., `main_word_order = features["main_word_order"]`).
    - Add an argument to the command-line string to pass the newly added feature (e.g., `f' --adj_noun_word_order="{adj_noun_word_order}"'`).
  - If you are adding a morphological feature:
    - Add the stage (i.e., the generation step that processes the morphological feature) to the parameter `STAGE`.
    - Add it to the dictionary `STAGE_FLAGS`. The key should be the stage name in `STAGE`, and the value should be the root name of the prompt file (e.g., if the prompt file name is `stage.txt`, the value should be `stage`).
    - If your new morphological feature is a nested BaseModel, then make sure to reconstruct the original dict structure in `cumulative_morphosyntax_params()`.
- In `utils/common_utils.py`:
  - Add a command-line argument variable for the newly added feature with `flags.DEFINE_string` or `flags.DEFINE_list` (an `absl` feature).
  - Add the new feature and its corresponding file name to `feature_to_file`. It is safe to have the same name in a key and its value.
  - Add the new feature to the return values of `cumulative_morphosyntax_params()` to map the flag value to the params dictionary. This will be called in `create.py`.
- In `modular_experiments/run_handbook.py`, update the features passed to `cmd` in `run_handbook_experiment()`.
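For concreteness, here is a hypothetical sketch of what adding a nested-BaseModel feature to the `Morphology` class might look like. The field names and values below are invented for illustration; the real classes and sample parameter sets live in `modular_experiments/morphosyntax_params.py`.

```python
from pydantic import BaseModel


class InclusiveExclusive(BaseModel):
    # A one-level nested BaseModel value, as described above.
    has_distinction: bool
    marking: str


class Morphology(BaseModel):
    nominal_number: str
    # The newly added feature, nested one level deep:
    inclusive_exclusive: InclusiveExclusive


# On the command line the nested value is passed as a string flag and
# re-converted to a dict in cumulative_morphosyntax_params() in common_utils.py.
params = Morphology(
    nominal_number="singular/plural",
    inclusive_exclusive=InclusiveExclusive(has_distinction=True, marking="pronoun"),
).model_dump()
print(params)
```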
If you add a feature that is not a module (stage) but is processed by an existing module, then you need to:

- define a feature in `modular_experiments/run_morphosyntax.py`;
- add a placeholder in the corresponding existing prompt file.

But:

- You do not have to create a new prompt file;
- You do not have to add a stage to `STAGE` in `modular_experiments/run_morphosyntax.py`.
First, go to `llm/llm.py` and add the new LLM's name to the flag options in `MODEL`.

If the model needs an API key, make sure you have the API key set as an environment variable. To check whether it is already set, you can run `conda env config vars list`. To add it, run `conda env config vars set <API_KEY_NAME>=<your_api_key>`.

Then, add a code block to instantiate an LLM client for the newly added model in the `client()` function of `llm/llm.py`. If it is a new variant of an already added model family (e.g., another Gemini model), then make sure that the existing condition will correctly load the newly added model.

Next, add a code block to `llm_predict()` in `llm/llm.py` to run the LLM inference through the API.

Finally, add the newly added model name to the `--model` argument in the argument parser in `get_args()` in `modular_experiments/run_morphosyntax.py`.
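As a rough sketch of what such code blocks might look like (this is illustrative, not the repository's actual code; it assumes the official `openai` Python client and an invented model name `my_new_model`):

```python
import os

from openai import OpenAI


def client(model: str):
    """Instantiate an API client for the given model name."""
    if model == "my_new_model":  # hypothetical entry added to MODEL
        return OpenAI(api_key=os.environ["MY_NEW_MODEL_API_KEY"])
    raise ValueError(f"Unknown model: {model}")


def llm_predict(model: str, prompt: str) -> str:
    """Run a single inference call through the model's API."""
    if model == "my_new_model":
        response = client(model).chat.completions.create(
            model="gpt-4o",  # the provider-side model identifier
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    raise ValueError(f"Unknown model: {model}")
```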
`modular_experiments/controlled_translation_claude.sh` gives an illustration of how to translate a new text into the target language. This can then be followed by `modular_experiments/controlled_corpus_creation_claude_2.sh`, which will add new words to the lexicon and produce a written corpus of the new translation using the language's orthography. This assumes you have created a language with Japanese-like phonology, Arabic-like morphosyntax, and the Latin script.
- `ERROR: Can't invoke 'us.anthropic.claude-3-5-sonnet-20240620-v1:0'. Reason: Unable to locate credentials.`: This happens when AWS Bedrock cannot find your AWS credentials in the environment variables. Make sure you have your AWS credentials and have set them as environment variables.
- Error in the sample text in the user prompt: the sample text depends on the earlier morphosyntax outputs by Claude. If any of the preceding tasks did not run successfully, you will not get the output. Make sure that the preceding tasks executed successfully.
The structure of the evaluation is as follows:

- Input:
  - input text (str): The input text (English) to translate.
  - features (dict-like): The morphosyntactic features that our desired ConLang has.
- Output:
  - output text (str): The output text as a gloss.
Let's say that the input text is "He is the smartest student in our school." and the input features are the French-like set. Then, the output should be something like "he be-PRES-3SG the student-SING smart-SUP in we-GEN school."
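In code terms, the evaluation I/O amounts to something like the following sketch. The feature keys here are invented for illustration; the actual keys are those defined in `morphosyntax_params.py`.

```python
# Hypothetical illustration of the evaluation input/output described above.
input_text = "He is the smartest student in our school."
features = {  # an illustrative French-like subset
    "main_word_order": "SVO",
    "adjective_noun_order": "noun-adjective",
    "pro_drop": False,
}
expected_gloss = "he be-PRES-3SG the student-SING smart-SUP in we-GEN school."
print(f"{input_text!r} with {features} -> {expected_gloss!r}")
```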
See the subdirectory `handbooks` for a variety of handbooks for different ConLangs created by the system.
Please cite this work as:
Chihiro Taguchi and Richard Sproat. 2025. "IASC: Interactive Agentic System for ConLangs". https://arxiv.org/abs/2510.07591.
Iasc is Irish Gaelic for 'fish' (as is Japanese sakana), which seemed to us appropriate for Sakana.ai.