Module talk:languages/data/2

Edit request

Can someone instate this sort_key for pl the next time they edit?

{
	from = { "[Ąą]", "[Ćć]", "[Ęę]", "[Łł]", "[Ńń]", "[Óó]", "[Śś]", "[Żż]", "[Źź]" },
	to   = {
		"a\244\143\191\191",
		"c\244\143\191\191",
		"e\244\143\191\191",
		"l\244\143\191\191",
		"n\244\143\191\191",
		"o\244\143\191\191",
		"s\244\143\191\191",
		"z\244\143\191\191",
		"z\244\143\191\190"
	}
}

Keφr 08:36, 25 December 2013 (UTC)

Done. - -sche (discuss) 20:22, 26 December 2013 (UTC)
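
A note on the escaped strings in the request above: "\244\143\191\191" is the UTF-8 encoding of U+10FFFF, the highest code point, so an accented letter sorts after its base letter followed by any other character, and ź gets U+10FFFE so that it sorts just before ż. The following is a minimal sketch in plain Lua, assuming byte-wise string comparison. The function name and lookup table are illustrative only; the data module itself uses character-class patterns, which presumably need Unicode-aware matching.

```lua
-- Sketch only: polish_sort_key and the tables are illustrative names,
-- not part of Module:languages.
local LAST = "\244\143\191\191"        -- UTF-8 for U+10FFFF
local BEFORE_LAST = "\244\143\191\190" -- UTF-8 for U+10FFFE

local base = {
	["Ą"] = "a", ["ą"] = "a", ["Ć"] = "c", ["ć"] = "c",
	["Ę"] = "e", ["ę"] = "e", ["Ł"] = "l", ["ł"] = "l",
	["Ń"] = "n", ["ń"] = "n", ["Ó"] = "o", ["ó"] = "o",
	["Ś"] = "s", ["ś"] = "s", ["Ż"] = "z", ["ż"] = "z",
}
local replacement = {}
for letter, plain in pairs(base) do
	replacement[letter] = plain .. LAST
end
replacement["Ź"] = "z" .. BEFORE_LAST
replacement["ź"] = "z" .. BEFORE_LAST

-- Match one UTF-8 encoded character at a time and look it up in the table;
-- characters without an entry are left unchanged.
local function polish_sort_key(word)
	return (word:gsub("[\1-\127\194-\244][\128-\191]*", replacement))
end

local words = { "żaba", "zupa", "źle", "zzz" }
table.sort(words, function(a, b)
	return polish_sort_key(a) < polish_sort_key(b)
end)
print(table.concat(words, " "))
```

With byte-wise comparison the words come out in the order z, ź, ż, which is the Polish alphabetical order the request is after.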

Serbian, Croatian, Bosnian

m["sr"] = {
	names = {"Serbian", "Standard Serbian"},
	type = "regular",
	scripts = {"Latn", "Cyrl"},
	family = "..."}
m["hr"] = {
	names = {"Croatian", "Standard Croatian"},
	type = "regular",
	scripts = {"Latn", "Cyrl"},
	family = "..."}
m["bs"] = {
	names = {"Bosnian", "Standard Bosnian"},
	type = "regular",
	scripts = {"Latn", "Cyrl"},
	family = "..."}

These languages should also be separated from 'sh'. They are completely standardized. Anyway, I don't know what to put in the family codes. --Octahedron80 (talk) 04:09, 6 April 2014 (UTC)

That's one way to do it, but we had a long and very thorough debate and a vote, and decided to do it the other way. You're welcome to disagree with it, but that was the decision made here at Wiktionary. All three are considered sh=Serbo-Croatian, with different forms labeled in the definition line. Look through the entries accessible through Category:Serbo-Croatian language and read WT:ASH to see how we treat them. Chuck Entz (talk) 05:12, 6 April 2014 (UTC)Reply

Edit request (Lithuanian)

Hello. Could anyone add "ũ" in the place shown below? There are terms that contain a diphthong with accentuation, e.g. kraũjas (blood), naũjas (new). --Eryk Kij (talk) 20:31, 1 June 2015 (UTC)

[úù]
Done. --WikiTiki89 20:36, 1 June 2015 (UTC)

Edit request: duplicated name

Data integrity: Sichuan Yi (code "ii") has "Nuosu" listed twice as "otherName". — Jberkel (talk) 12:09, 27 March 2016 (UTC)

Fixed. :) - -sche (discuss) 14:06, 27 March 2016 (UTC)

th & lo

Please add this sort key to 'th':

	sort_key = {
		from = {"([เแโใไ])([ก-ฮ])"},
		to   = {"%2%1"}},

Please add this sort key to 'lo':

	sort_key = {
		from = {"ຼ", "ຽ", "ໜ", "ໝ", "([ເແໂໃໄ])([ກ-ຮ])"},
		to   = {"ລ", "ຍ", "ຫນ", "ຫມ", "%2%1"}},

Then we won't need to use DEFAULTSORT in every entry. These sort orders are not as perfect as those of local dictionaries, but they are better than nothing. --Octahedron80 (talk) 06:47, 15 September 2016 (UTC)
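
For illustration, the effect of the Thai rule can be tried outside the wiki. The following is a minimal sketch in plain Lua: because plain Lua patterns work on bytes rather than characters, the two character ranges are spelled out as UTF-8 byte sequences, and the function name is hypothetical. The module itself presumably applies the pattern with Unicode-aware matching.

```lua
-- Sketch only: thai_sort_key is an illustrative name, not module code.
local PREPOSED_VOWEL = "\224\185[\128-\132]"  -- เ แ โ ใ ไ (U+0E40..U+0E44)
local CONSONANT = "\224\184[\129-\174]"       -- ก..ฮ (U+0E01..U+0E2E)

local function thai_sort_key(word)
	-- Swap each preposed vowel with the consonant that follows it,
	-- mirroring from = "([เแโใไ])([ก-ฮ])", to = "%2%1".
	return (word:gsub("(" .. PREPOSED_VOWEL .. ")(" .. CONSONANT .. ")", "%2%1"))
end

print(thai_sort_key("ไทย"))  -- the consonant ท now comes first
```

The Lao rule works the same way after its four plain substitutions have been applied.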

Edit request

For Punjabi (pa), can the following line be added:

ancestors = {"psu"},

Thanks. —Aryamanarora (मुझसे बात करो) 17:53, 22 May 2017 (UTC)

Done! Eru·tuon 18:55, 22 May 2017 (UTC)

standard characters

Should standard characters for Chinese be included? --Zcreator (talk) 13:08, 31 December 2017 (UTC)

Avar

Can someone correct the Wikipedia article for Avar? It's Avestan right now lol. – Julia (talk) • formerly Gormflaith • 01:19, 27 May 2018 (UTC)

Edit request for Kashmiri (ks)

Hey, I am adding transliteration systems for Kashmiri Perso-Arabic and Devanagari. I need to list at least three modules here. This should do it:

translit_module = "ks-Arab-translit", "ks-Deva-translit", "ks-translit"

— This unsigned comment was added by Sinonquoi (talk • contribs) at 07:58, 7 July 2018 (UTC).

To add multiple modules (one for each script, presumably), we use Module:translit-redirect. But Module:ks-Arab-translit does not transliterate vowels, Module:ks-translit has module errors, Module:ks-Deva-translit does not exist. The modules can be added when they are ready. — Eru·tuon 21:31, 7 July 2018 (UTC)Reply
@AryamanA: The modules were throwing errors and then (when I fixed the module error) not actually transliterating (टंजानिया → टंजानिया in the translations in Tanzania because Module:ks-translit doesn't handle Devanagari). They're not quite ready. — Eru·tuon 00:52, 8 July 2018 (UTC)Reply
@Erutuon: Oh, I thought MOD:ks-translit was handling Devanagari (at least it was when I made it a while ago). Thanks for catching all of that, I prematurely added the module. —AryamanA (मुझसे बात करेंयोगदान) 02:39, 8 July 2018 (UTC)Reply
@AryamanA: Yeah, @Sinonquoi changed it while working on Arabic script. Now the edit history contains stuff that belongs at Module:ks-Deva-translit and Module:ks-Arab-translit. Oh well. — Eru·tuon 03:01, 8 July 2018 (UTC)Reply
@Erutuon: I fixed the ks-Arab module. It should be able to handle most cases that will be thrown at it initially. Do have a look at the testcases page. Sinonquoi (talk) 06:44, 9 July 2018 (UTC)Reply
@Sinonquoi: It looks much better now. I don't know Kashmiri, so I did a Wiktionary search (insource:"Kashmiri" insource:/\|ks\|[؆-ۿ]/) and found some examples that either were definitely not transliterated correctly by the module, or looked incorrect, and added them to the testcases. But others looked like they may have been transliterated correctly (though according to a different system).
What concerns me is that the transliteration system in manual transliterations in the search results is quite different from the one in the module: for instance, macrons rather than the doubled vowel letters. So if I added the module, there would be two very different systems in use, which Wiktionary tries to avoid. Either the module has to change, or the current manual transliterations do. — Eru·tuon 07:44, 9 July 2018 (UTC)Reply
@Erutuon: I looked at those terms you added to the testcases page. They're not written in the proper Kashmiri standard, but rather some sort of a Persian or Urdu standard. Kashmiri marks the long /i:/ on the ye itself unlike what has been done with the word for Kashmir (کٔشِیر) which is a Persian practice. Likewise with the other words which either drop diacritics (علم ریاضی) or again don't conform to the standard (وُونٹھ, where the long /u:/ should be on the second waw). Writing Kashmiri without diacritics is horrendous since there can be five inherent short vowels on a consonant if not marked.
I agree with the inconsistency within the various transliterations. Anything mentioned on Wiktionary or elsewhere is just arbitrary as any other system, and I find most transliteration systems to have issues or to be based on models which don't have much relevance for Kashmiri (The IAST for instance), ergo the practice I've implemented which seems to me a very fit model for Kashmiri and fairly readable too, which however I am free to modify if it persists as a hindrance to the addition of this module. Thanks. Sinonquoi (talk) 08:15, 9 July 2018 (UTC)Reply
@AryamanA reckons Module:ks-Arab-translit is working well enough now, so it is being used. I un-hooked Module:ks-Deva-translit earlier on because it's using a different transliteration system from the other module and the testcases aren't passing. — Eru·tuon 16:03, 9 July 2018 (UTC)Reply
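
On the original request: a comma-separated list after translit_module would not register three modules. Inside a Lua table constructor, only the first string is assigned to the named field and the rest become positional entries. The sketch below shows this, followed by a hypothetical per-script lookup in the spirit of the redirect approach mentioned above. The lookup table and function are assumptions made for illustration; the actual layout of Module:translit-redirect is not shown in this discussion.

```lua
-- What the requested line actually builds inside a language table:
local ks = {
	translit_module = "ks-Arab-translit", "ks-Deva-translit", "ks-translit"
}
print(ks.translit_module)  -- ks-Arab-translit
print(ks[1])               -- ks-Deva-translit
print(ks[2])               -- ks-translit

-- Hypothetical per-script lookup (illustrative names only).
local translit_by_script = {
	ks = { Arab = "ks-Arab-translit", Deva = "ks-Deva-translit" },
}

local function pick_translit_module(lang, script)
	local by_script = translit_by_script[lang]
	return by_script and by_script[script] or nil
end

print(pick_translit_module("ks", "Arab"))  -- ks-Arab-translit
```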

Edit request

Could someone please be so kind as to add the sort_key and standardChars for West Frisian (fy) from fy:Module:languages/data2? Thank you! --PiefPafPier (talk) 16:43, 8 August 2018 (UTC)

@PiefPafPier: Done. — Eru·tuon 18:53, 8 August 2018 (UTC)

@Eru: Can you add translit_module = "translit-redirect", for "ku"? Thanks. --Calak (talk) 10:42, 21 August 2018 (UTC)

otherNames of Malay (ms)

Can someone please remove "Orang Seletar", "Orang Kanaq", "Jakun", "Temuan", which are listed under "otherNames" of the language code "ms" (Malay language)? Those four languages are spoken by the aboriginal people (Orang Asli) of peninsular Malaysia and are not the same as "standard Malay" (the standard register spoken in Brunei, Singapore and Malaysia) or "Malaysian" (the national language of Malaysia). The proposal made in 2016 at Wiktionary:Language treatment requests/Archives/2015-19#Merging Malay, which states that these four languages are "mere dialects of Malay with no written tradition and perfectly mutually intelligible", is incorrect. The four languages are mutually intelligible with one another, but differences do exist between them and standard Malay. KevinUp (talk) 16:37, 27 September 2018 (UTC)

@KevinUp: Done. I wonder, are there language codes that these languages should be placed under? — Eru·tuon 19:46, 27 September 2018 (UTC)
Thanks. The generic language code "msa" (Malayan languages) defined in ISO 639-3 or the more specific "ors", "orn", "jak", "tmw" can be used for these four languages. These languages are actually endangered, with fewer than 25,000 native speakers per language, so it might be a while before we encounter editors interested or proficient in these languages. KevinUp (talk) 23:33, 27 September 2018 (UTC)

Alternative name "Tigrigna" for Tigrinya

Can someone please add otherNames = {"Tigrigna"} to the Tigrinya field (m["ti"])? This is a common alternative spelling of the language's name. Mo-Al (talk) 21:30, 7 December 2018 (UTC)

@Mo-Al: Done. — Eru·tuon 22:16, 7 December 2018 (UTC)

ja-Latn

@Erutuon I was thinking about removing Latn from ja, and subsuming the Latn character set under the Jpan character set. Sometimes the automatic script detection decides that text is Latn just because it has more Latn characters in it, which looks terrible: あああああaaaaaa. I also don't see the need for adding Hira. Do you think this is alright? —Suzukaze-c 02:39, 1 February 2019 (UTC)Reply

@Suzukaze-c: Yeah, the logic of findBestScript doesn't work for Japanese. It probably isn't possible to always detect the correct script code automatically, though "Jpan if any Hani, Hira, or Kana, else Latn if any Latn" would come closer than the current codepoint-count logic.
I guess it doesn't hurt to merge Hira into Jpan because they probably both use the same fonts. It could be argued that hiragana transcriptions of words that would otherwise be spelled with kanji should be tagged as Hira, but style-wise it probably doesn't really matter. And maybe all such words would have a manually supplied script code.
Font-wise, I guess rōmaji should be displayed in half-width characters, which maybe can be achieved by tagging them as Latn, and Latin-script words that are used in actual Japanese text in full-width characters, which is achieved by tagging them as Jpan. In theory that means that Latn should be in the script list for Japanese. But findBestScript probably can't reliably determine what is rōmaji and hence should use Latn even if we give it better logic. I mean, it would have no way to decide that an only Latin-script term is rōmaji or not. So maybe manual input would be required anyway and so it wouldn't really hurt to remove Latn. — Eru·tuon 03:46, 1 February 2019 (UTC)Reply
Module:ja-headword is already hardcoded to use Latn for Rōmaji. I like that better than hardcoding it to always use Jpan.
Jpan indeed already includes Hira.
As for mixing Latn and Jpan, I think that always using Jpan is better, since Japanese fonts are designed to feature Latn text, but not the other way around. —Suzukaze-c 03:56, 1 February 2019 (UTC)Reply
I guess I agree that it's better for some rōmaji to be tagged as Jpan than for some kanji–kana–Latin to not be. (Another way to achieve this is by adding a special case to findBestScript for Japanese.) This would change the script tagging of rōmaji that is used outside of headwords without Latn being manually specified (however many cases there are of that), and would remove Latin from lists (like the list of scripts in Category:Japanese language) that rely on the language data table. — Eru·tuon 04:41, 1 February 2019 (UTC)Reply
Hmm, are there examples of such rōmaji? I can't think of any. —Suzukaze-c 05:01, 1 February 2019 (UTC)Reply
I guess not many. Searching : insource:"m|ja" insource:/\{\{m\|ja\|[a-zA-ZāĀēĒīĪōŌūŪ]/ or : insource:"l|ja" insource:/\{\{l\|ja\|[a-zA-ZāĀēĒīĪōŌūŪ]/ mostly finds stuff with {{m|ja|sc=. So it's a small issue. — Eru·tuon 05:14, 1 February 2019 (UTC)Reply
They all seem to be instances of improper formatting. I tried fixing some of them, but I'm not interested in hunting down every single one. I think I'll go ahead with my plans. —Suzukaze-c 07:10, 1 February 2019 (UTC)Reply
Turns out the change has broken some cases of {{ja-usex}}, because it uses the Jpan character pattern to determine if the second parameter is a translation or a hiragana transcription. But that can probably be fixed in Module:ja-usex. — Eru·tuon 01:04, 4 February 2019 (UTC)Reply
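
To make the two detection rules discussed above concrete, here is a minimal sketch in plain Lua. It is not the code of findBestScript: the code point ranges are a simplified stand-in for the real script character sets, and all function names are hypothetical. It contrasts a majority count with the "Jpan if any Hani, Hira, or Kana, else Latn" rule.

```lua
-- Sketch only: simplified ranges and illustrative names.
-- Decode one UTF-8 encoded character (1 to 4 bytes) to its code point.
local function codepoint(ch)
	local b1, b2, b3, b4 = ch:byte(1, -1)
	if b1 < 0x80 then return b1 end
	if b1 < 0xE0 then return (b1 - 0xC0) * 0x40 + (b2 - 0x80) end
	if b1 < 0xF0 then
		return (b1 - 0xE0) * 0x1000 + (b2 - 0x80) * 0x40 + (b3 - 0x80)
	end
	return (b1 - 0xF0) * 0x40000 + (b2 - 0x80) * 0x1000
		+ (b3 - 0x80) * 0x40 + (b4 - 0x80)
end

local function script_of(cp)
	if (cp >= 0x3040 and cp <= 0x30FF)           -- hiragana and katakana
		or (cp >= 0x4E00 and cp <= 0x9FFF) then  -- common kanji
		return "Jpan"
	elseif (cp >= 0x41 and cp <= 0x5A) or (cp >= 0x61 and cp <= 0x7A)
		or (cp >= 0xC0 and cp <= 0x24F) then     -- ASCII letters, ā, ō, ...
		return "Latn"
	end
	return nil  -- digits, punctuation and everything else are ignored
end

local function count_scripts(text)
	local counts = { Jpan = 0, Latn = 0 }
	for ch in text:gmatch("[\1-\127\194-\244][\128-\191]*") do
		local sc = script_of(codepoint(ch))
		if sc then counts[sc] = counts[sc] + 1 end
	end
	return counts
end

-- Majority rule: whichever script has more characters wins.
local function by_count(text)
	local c = count_scripts(text)
	return c.Latn > c.Jpan and "Latn" or "Jpan"
end

-- Priority rule: any Japanese character makes the text Jpan.
local function by_priority(text)
	local c = count_scripts(text)
	if c.Jpan > 0 then return "Jpan" end
	if c.Latn > 0 then return "Latn" end
	return nil
end

local mixed = ("あ"):rep(5) .. ("a"):rep(6)
print(by_count(mixed), by_priority(mixed))
```

Under the majority rule the mixed string is tagged Latn because it has one more Latin letter than kana, which is the behaviour complained about at the start of this thread; the priority rule tags it Jpan.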

@Fish bowl, Erutuon Sorry for necroposting. Can you explain why Japanese needs Latn at all? Neither ko nor Kore contains Latn, and I don't see Korean having any obvious problems with Latin letters (PC방). -- Huhu9001 (talk) 03:34, 1 April 2023 (UTC)

Do you mean adding/removing Latn to the ja language object, or do you mean including Latn as a part of Jpan? I guess the former is for romaji (it's been like that since the earliest module revision, so I wouldn't really know), while the latter is (as described above) to avoid tagging 51% Latn text as Latn instead of Jpan. —Fish bowl (talk) 03:42, 1 April 2023 (UTC)Reply
The former. I am considering just removing Latn from Japanese altogether, since I don't quite see why it is necessary. None of the romaji in ageru, 𛀁 or 起きてぃ shows any difference with or without Latn in Jpan. -- Huhu9001 (talk) 03:52, 1 April 2023 (UTC)
If everything still works then I don't see any harm in it. —Fish bowl (talk) 01:00, 2 April 2023 (UTC)Reply
@Fish bowl There is actually an issue with this, because it makes it impossible to sort romaji correctly with makeSortKey. Reinstating it won't cause problems for entries like PC-98 (PC-98) either, because {{ja-pos}} will never select Latin. Since it was only removed because it seemed pointless, I've gone ahead and restored it. Theknightwho (talk) 12:17, 26 August 2023 (UTC)Reply
I think possibly one reason was so that in romaji entries, Module:headword would see that the title is Latn and wouldn't add a display title with Jpan, which makes the romaji display in a Japanese-appropriate font in Firefox (even when there might be non-Japanese entries on the page). It doesn't have a visual effect because browsers don't know about our ad-hoc class="Latn". Actually, that is true in Firefox, but apparently not in Chrome (just tried it). Probably the Chromium engine is noticing the ASCII characters and deciding they don't need a special font.
However, it looks like {{head|ja|adjective|sc=Jpan}} does not add a class="Jpan" display title on a Latin-script title (whereas the display title is added on a kanji and hiragana title), so this reason is no longer valid, if we can rely on this continuing to be true. Not sure what changed.
We used to prevent Firefox using a different font by tagging romaji with lang="ja-Latn" in transliterations (though we no longer do that... not sure when that changed), because then the browser knows from the compound language code that it shouldn't use fonts that are designed for kanji, hiragana, and katakana.
Removing Latn from ja and related languages could have effects on how sortkeys are generated. We have sort_key = {Jpan = "Jpan-sortkey"} so previously Latn terms would not be passed to Module:Jpan-sortkey. It might be fine though. — Eru·tuon 01:35, 2 April 2023 (UTC)Reply
Oh, actually Latn was already removed from ja in February 2019, so I'm not sure what the issue is here. — Eru·tuon 01:40, 2 April 2023 (UTC)Reply

Sotho-Tswana

Can we add Proto-Sotho-Tswana as a direct ancestor of Sotho, Tswana, Northern Sotho, etc.? These languages form a genetic grouping within the Bantu family. Smashhoof (talk) 03:46, 23 March 2019 (UTC)

@Smashhoof2: That requires a code for Proto-Sotho-Tswana and for a Sotho-Tswana family. That can be discussed in the Beer Parlour or Wiktionary:Requests for moves, mergers and splits, I think. — Eru·tuon 22:38, 23 March 2019 (UTC)
We don't need a discussion elsewhere for a non-controversial new code, I don't think. Something like bnt-sts should be fine. @-sche Μετάknowledge discuss/deeds 23:56, 23 March 2019 (UTC)
Yeah, bnt-sts & ...-pro works; I'll add the family and proto-language code, if you'd like to take care of adding it as an ancestor of all the relevant languages. - -sche (discuss) 00:05, 24 March 2019 (UTC)
Should be all done now. —Μετάknowledge discuss/deeds 00:13, 24 March 2019 (UTC)
Thanks! — Eru·tuon 00:33, 24 March 2019 (UTC)

Edit request: ancestors of Norwegian Bokmål

I would like to request that Danish (da) be added as an ancestor of Norwegian Bokmål (nb). Unlike the spoken Norwegian language (no) and the Nynorsk standard (nn), Bokmål is not a direct descendant of Old Norse, but rather one of Danish (cf. w:Bokmål). It has gone through far-reaching spelling reforms aiming to “re-norwegianize” it, through which it has received spellings, inflections, vocabulary, syntax etc. from Nynorsk and the spoken dialects, but there is a continuous line going back to 19th century Danish (the w:Riksmål movement has been “faithful” to this relation), from which modern written Danish also developed separately. The common standard was in turn based on the dialect of Copenhagen, which was developed from Old Danish.

For example, it has kept -j- between g/k and front vowels whereas Danish has abolished it, it introduced the letter Å several decades before Danish and it has merged the Danish verb endings -ede and -et into -et (the Middle Norwegian / Nynorsk line yielded -a in both cases). It has also re-introduced voiceless consonants instead of the voiced ones used in certain cases in Danish, but in some cases other sound changes have not been reversed, yielding hybrid forms such as uke (Danish uge) vs. native veke (earlier/dialectal vike, vika, vyku etc.). In other cases, the Danish word has been kept with minimal changes, e.g. lege (Danish and early Bokmål læge), cf. Nynorsk lækjar and dialectal Eastern Norwegian /leːkǝɾ/. There are some common inventions shared between Danish and Eastern Norwegian, e.g. vatn > vann (written vand in Danish) and (partial) reduction of Norse vowels a and u in inflectional endings, but central parts of the Bokmål standard are unmistakably Danish, like the pronouns hva (“what”, E.Norw. (h)å etc.) and hvilken (“which”, E.Norw. (h)okken, (h)åfferen etc.) and the definite plural ending -ene in masculine nouns (E.Norw. -a(n(e))).

Through a failed attempt to merge it with Nynorsk, Bokmål has received the three-gender system present in all native dialects except the one in Bergen (which, through contact with Middle Low German, developed the same two-gender system that is found in Stockholm and Copenhagen), (some) diphthongs and Norwegian syntax and vocabulary (on the other hand, Bokmål influence on Nynorsk was mostly restricted to vocabulary, not syntax and inflections, and is comparable to the Middle Low German influence on the language). It thus has roots in both Danish and New/Middle Norwegian, and this module should reflect that.

An unfortunate consequence of the current state is that I was unable to add the correct etymology of the word hellig. It is not inherited directly from Old Norse heilag as the entry read before. Instead it was inherited from Danish hellig and has been unchanged since the common Danish language used in the 1800s in both countries (native heilag has neither been introduced through the spelling reforms nor actually been used in literature). Using {{inh|nb|da}} produces an error and using {{bor}} instead of {{inh}} gives an incorrect picture of the reality – it was not borrowed, it was there all along.

PS. The “official” dictionary Bokmålsordboka is full of incorrect etymologies like this, due to the fact that it was developed together with the Nynorsk dictionary Nynorskordboka and relies heavily upon an earlier, unpublished draft for a Nynorsk dictionary (“Grunnmanuskriptet”). Wiktionary should not inherit these mistakes – the contemporary, independently developed Bokmål dictionary NAOB almost always gives the correct etymologies. It never says “borrowed from Danish” unless it is a newer loan, it just says “Danish form X, cf. native/Nynorsk Y”. Hått (talk) 23:35, 17 August 2019 (UTC)Reply

@Hått: I don't know enough about Bokmål to evaluate this request. (Not many languages are recorded as having more than one ancestor; see the full list.) This would be better discussed in the beer parlour, where people familiar with Norwegian will see it. — Eru·tuon 00:19, 18 August 2019 (UTC)Reply
I tried starting a discussion there (Wiktionary:Beer_parlour/2019/August#Norwegian_Bokmål:_classification_and_etymology_issues), but failed to draw any significant attention to it. Is there anything else I can try? The Wikipedia article on Bokmål explains how it was developed from Danish (also note the language family tree in the infobox on the right), but I do not know if that is sufficient. Hått (talk) 19:11, 10 October 2019 (UTC)Reply

Japanese parent

Can we make Japanese's ancestor ja-ear? MiguelX413 (talk) 19:18, 28 October 2019 (UTC)

@MiguelX413: Not at the moment; that would break etymology templates because of the way inheritance relationships are set up. I explained this in Wiktionary:Beer parlour/2019/September § Technical considerations: an etymology template "resolves an etymology language to its parent before checking that the first language can inherit from the second". If ja-ear was the ancestor of ja, then given {{inh|ja|ja-ear|...}}, the template would convert ja-ear to ojp and then check whether ojp is the ancestor of ja – nope – and then return a module error. Parents of etymology languages are not the same thing as ancestors of non-etymology languages: they are the language that the etymology language is a subvariety of. So this would require changes to our language infrastructure. — Eru·tuon 21:05, 28 October 2019 (UTC)Reply
@Erutuon: I see, thanks for the explanation! MiguelX413 (talk) 21:10, 28 October 2019 (UTC)
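
The failure described above can be reproduced with a toy model. The following is a minimal sketch in plain Lua, assuming that the etymology-only code ja-ear resolves to ojp (as stated in the explanation) and that ja currently lists ojp as its ancestor. The tables and function names are hypothetical, and the real check presumably walks the whole ancestor chain rather than a single list.

```lua
-- Sketch only: simplified, hypothetical data and function names.
local etym_parent = { ["ja-ear"] = "ojp" }  -- parent as described above

-- Resolve an etymology-only code to the language it is a variety of.
local function resolve(code)
	return etym_parent[code] or code
end

-- True if `source`, once resolved, is listed as an ancestor of `lang`.
local function can_inherit(ancestors, lang, source)
	local resolved = resolve(source)
	for _, ancestor in ipairs(ancestors[lang] or {}) do
		if ancestor == resolved then return true end
	end
	return false
end

local current = { ja = { "ojp" } }      -- assumed current setup
local proposed = { ja = { "ja-ear" } }  -- the requested change

print(can_inherit(current, "ja", "ja-ear"))   -- true
print(can_inherit(proposed, "ja", "ja-ear"))  -- false: "ojp" is not listed
```

In the proposed setup the ancestor list holds the unresolved code while the template compares against the resolved one, so the two never match.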

sa Sanskrit

Kindly add the Takri script as one of the scripts for Sanskrit. It was heavily used to write Sanskrit in the kingdoms of Himachal Pradesh. Nik9hil (talk) 12:23, 25 September 2020 (UTC)

Latin, Sardinian & Venetian diacritics

Over in this discussion a whole two of us came to a consensus that Sardinian and Venetian entries should contain no diacritics - namely, no acutes or graves. I've also discovered that Latin doesn't currently strip acutes, graves and circumflexes. There are other curious spellings in Category:Latin terms by their individual characters worth having a discussion about, but at least the diacritics I mentioned are part of the now-superseded orthography jointly marking length and accent, but only in accented syllables; and the ‹ûm› in deûm is basically a morpheme disambiguation mark meaning "genitive plural" that is written over all of these 2nd-declension -um genitives (yes, they thought it was a contraction). In short, it seems uncontroversial that these should be stripped - which would be very welcome. Brutal Russian (talk) 03:46, 11 March 2021 (UTC)

Pali

Please enable the usage of module pi-translit or translit-redirect for Pali. As no-one else has raised or reiterated any objections, fine-tuning for different writing systems within a script will be done either in the inflection modules or manually. (The inflection module will mostly subcontract that to a special entry point within module pi-translit.) RichardW57 (talk) 20:22, 6 May 2021 (UTC)Reply

@Erutuon, Metaknowledge, AryamanA, Octahedron80, -sche: Pretty please. --RichardW57 (talk) 18:32, 1 June 2021 (UTC)Reply
I think this was discussed before, right? I seem to recall being involved in a discussion about it, with you and Erutuon. But I'm having trouble finding where that discussion was, can you remind me? I'm trying to ascertain whether there was actually consensus, because I know some issues were raised regarding transliterating the same string two different ways depending on whether it was viewed as having inherent vowels or not, and discussing whether it was sensible to handle that in one way vs another. I don't personally want to implement this until I'm sure it's the right / agreed-upon course of action. If one of the other people you pinged knows more about this and thinks the proposed edit is ready for implementation, they should feel free to proceed. - -sche (discuss) 21:37, 1 June 2021 (UTC)
Not really. What was discussed, Module_talk:scripts/data#Aphabetic_Thai_and_Lao, was setting up separate script codes for some Pali writing systems. The mood was against it, so I dealt with the issue by bypassing the standard interface for inflection tables so I could pass in writing system parameters. There are residual locations left, but they can be handled by manually inputting writing system-sensitive transliterations. --RichardW57 (talk) 07:41, 2 June 2021 (UTC)Reply
@-sche: For example, if I need to refer to alphabetic Thai script ยุชฌิ ('he fought'), I need to use manual input to choose between 'yujajhi' (wrong) and 'yujjhi' (correct). But if I am generating an inflection table, the inflection module has to know whether one is using the alphabet or the abugidic writing system, and it then passes that information in through the fuller, alternative transliteration interface function trwo() instead of using tr(). (This does require that the transliteration module be invoked directly from another module.) But if I refer to alphabetic Thai script คัจฉิ (gacchi, 'he went'), the transliterator can detect the writing system and automatically correctly transliterate it as 'gacchi' rather than the incorrect 'gacachi'. Needing manual transliteration will be the exception rather than the rule. --RichardW57m (talk) 08:38, 2 June 2021 (UTC)Reply

Change the ancestor of Norwegian Bokmål to Danish. It’s currently set to Middle Norwegian.

Norwegian Bokmål is used in Norway and seen by many as Norwegian, but from an etymological standpoint, it’s a product of Danish. Some words in Bokmål are of Norwegian origin, but they are few compared to the Danish-derived core. A word like senke, for instance, is inherited from Danish, not borrowed. This is one of many words that can’t possibly come from Middle Norwegian or Old West Norse. --Eiliv (talk) 09:07, 1 June 2021 (UTC)

This was previously proposed at Wiktionary:Beer_parlour/2021/April#List_Norwegian_Bokmål_as_a_descendant_of_Danish_on_etymology_pages and opposed. (I don't think it would be right, either; the spellings of the words were adapted to Danish, and some words were borrowed, but to say the lect overall has a separate ancestor is an even more controversial position than the already controversial idea that the different spellings in Bokmal and Nynorsk vs Riksmal and the Norwegian dialects that are neither Nynorsk nor Bokmal make Norwegian 3+ different languages.) - -sche (discuss) 21:43, 1 June 2021 (UTC)Reply
Not universally opposed, though – far from it. Saying that Bokmål has a separate ancestor is not at all controversial either. In fact, the simplified story that we are told in school is that Knud Knudsen wanted to create a written Norwegian language by adapting the written Danish language to match how Norwegians pronounced it and eventually borrowing elements of the grammar and lexicon from Norwegian, whereas Ivar Aasen instead wanted to create a new one from the Norwegian dialects. The controversial point of view is, if anything, that the contemporary Bokmål still can be considered a dialect of Danish (the standard is after all exceptionally wide, containing a bunch of doublets). That Bokmål has its roots in Danish is just an unquestionable fact that is reflected in the English Wikipedia article as well, and it is quite strange that Wiktionary chooses to ignore it and instead adopt the ahistoric view that Bokmål somehow is a descendant of Middle Norwegian (for which it is impossible to present a coherent argument IMHO). I looked through the public archives of the Ministry of Church and Education last week and found lots of documents on orthographic questions that did not at all cover up the fact that Bokmål/Riksmål is derived from Danish. Just a few weeks ago we had a new language law passed that abolished the term "målform" (language variant) in favour of "språk" (language). The explicit purpose was to reflect that Bokmål and Nynorsk indeed are different languages and not simply variants of the same language (as was claimed when the government's official policy was to merge them). I honestly do not understand the Wiktionarian resistance.
As I have mentioned in a past discussion, Danish spellings in contemporary Bokmål were never borrowed from Danish. They have been there all along, from the time when the language was identical to Danish, i.e. when Danish was the only official written language. The term "borrowed" implies that a word that is now present in the language was not there in the past, but at some point was introduced from an external source. In this case the words predate the language itself, which is often said to have become a separate standard in 1907 (they began to diverge in the late 1800s, but that is beside the point). Thus they cannot be borrowings from Danish, but rather must be inherited.
PS. I have yet to see anybody claim that Riksmål and Bokmål are different languages. It is widely known that these are mostly overlapping standards, where contemporary Riksmål in reality is a subset of contemporary Bokmål. It must also be noted that contemporary Riksmål is not a direct descendant of the standard that was called Riksmål in the early 1900s. It is a privately regulated standard that branched off Bokmål after the official name was changed. The primary goal was to put an end to the attempt to merge Bokmål and Nynorsk, and the Riksmål movement sought to do so by defining a strict, conservative standard and persuading the press and authors to choose it over the official standard. They did not fully return to the 1917 Riksmål, but chose a subset of the 1938 standard and modified it by reintroducing certain 1917 spellings. Hått (talk) 00:29, 2 June 2021 (UTC)Reply

Edit request: Tagalog (Baybayin transliteration)

Can someone add support to the recently created Tagalog transliteration module (for Baybayin)?

	translit_module = "tl-translit",
	override_translit = true,

TagaSanPedroAko (talk) 20:22, 21 November 2021 (UTC)

Added. DTLHS (talk) 21:02, 21 November 2021 (UTC)
Also allow Baybayin transliteration to be overridden where needed. TagaSanPedroAko (talk) 08:50, 23 November 2021 (UTC)

Judeo-Urdu

Is it possible for Judeo-Urdu to be assigned a language code for Wiktionary? There is no ISO code assigned for it. I have created entries for it, but I would like them not to appear mixed in with the actual Urdu lemmas. I believe Hebrew also needs to be appended to the list of Urdu scripts. نعم البدل (talk) 03:04, 9 October 2022 (UTC)

@نعم البدل Thanks for adding coverage of Judeo-Urdu. The statements on pages 15-16 of Aaron D. Rubin, "A Unique Hebrew Glossary from India: An Analysis of Judeo-Urdu", suggest that Judeo-Urdu should be treated as a variety of Urdu:
There is no indication of any distinct Jewish dialect of Hindi/Urdu, and there is no evidence that the dialectal features, colloquialisms, or errors in this text were typical only of Jewish speakers.
Two of the Judeo-Urdu texts of which I am aware are transcriptions of well-known Urdu texts (Indar Sabhā and Laila Majnu).
A number of the lexical items in the glossary, specifically those of Arabic origin, are more common in Urdu than in Hindi, or are found only in Urdu.
Based on the differences from standard Urdu as presented in "A Unique Hebrew Glossary from India: An Analysis of Judeo-Urdu", do you still believe that Judeo-Urdu differs from Urdu to the extent that it needs a separate language code under the language family Category:Hindustani languages similar to the Category:Deccani language? Kutchkutch (talk) 02:31, 21 November 2022 (UTC)Reply
Hi Kutchkutch, no, not at all. I don't wish Judeo-Urdu to be treated separately from Urdu (nor Deccani, for that matter); I merely wanted it to be assigned a language code that is subordinate to Urdu, as is done for Category:Judeo-Arabic. Also, could you please make it so that Judeo-Urdu lemmas don't appear first on Category:Urdu lemmas? I'm not sure how that is done. نعم البدل (talk) 20:45, 21 November 2022 (UTC)
@نعم البدل Thanks for the clarification. What you mean by a language code is an etymology-only code. According to Module:etymology languages/data, etymology-only codes are
used mainly in etymology templates (specifically foreign derivation templates) such as {{der}} and {{cog}}.
When adding this etymology-only code, it might be polite to notify other Urdu users (notifying RonnieSingh, AryamanA, Svartava), but they probably will not have anything to say about this change. Would it be okay to assign the etymology-only code as ur-jud?
The Hebrew script appears before the Arabic script because the Hebrew Unicode block (Appendix:Unicode/Hebrew) covers code points U+0590 to U+05FF, which are lower than those of the Arabic Unicode block, U+0600 to U+06FF (Appendix:Unicode/Arabic). @Erutuon Is it possible to change the sort order of scripts in categories? Kutchkutch (talk) 07:13, 22 November 2022 (UTC)
@Kutchkutch: Unfortunately no. Whatever sortkey is chosen, it is sorted case-insensitively in code point order. — Eru·tuon 09:29, 22 November 2022 (UTC)Reply
Do you know how it works for Category:Arabic lemmas / Category:Judeo-Arabic? نعم البدل (talk) 11:24, 23 November 2022 (UTC)Reply
I don't know what you mean, but to guess, the Category:Judeo-Arabic category link is added to the entries in Category:Judeo-Arabic by {{Judeo-Arabic spelling of}} and {{tlb|ar|Judeo-Arabic}}. If that's not what you're asking, please be more specific. — Eru·tuon 21:49, 23 November 2022 (UTC)Reply
@نعم البدل: Done. — Fenakhay (حيطي · مساهماتي) 07:05, 28 November 2022 (UTC)Reply
Thanks! نعم البدل (talk) 21:43, 30 November 2022 (UTC)
Yep, that's the term! نعم البدل (talk) 11:23, 23 November 2022 (UTC)Reply
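
The code point ordering discussed in this thread can be checked with a short Lua sketch. It assumes well-formed UTF-8 input, and the helper first_codepoint is hypothetical, written only for illustration (it is not part of any Wiktionary module). The two sample letters are Hebrew alef (U+05D0) and Arabic alif (U+0627), written as decimal byte escapes.

```lua
-- Decode the first UTF-8 character of s into its code point.
-- Hypothetical helper for illustration; assumes well-formed UTF-8.
local function first_codepoint(s)
	local b1 = s:byte(1)
	if b1 < 0x80 then
		return b1 -- single-byte (ASCII) character
	end
	local extra, cp
	if b1 >= 0xF0 then
		extra, cp = 3, b1 - 0xF0 -- four-byte sequence
	elseif b1 >= 0xE0 then
		extra, cp = 2, b1 - 0xE0 -- three-byte sequence
	else
		extra, cp = 1, b1 - 0xC0 -- two-byte sequence
	end
	for i = 2, extra + 1 do
		-- each continuation byte contributes its low six bits
		cp = cp * 64 + (s:byte(i) - 0x80)
	end
	return cp
end

local hebrew_alef = "\215\144" -- U+05D0
local arabic_alif = "\216\167" -- U+0627

-- Sort two one-letter entries by the code point of their first character.
local entries = { arabic_alif, hebrew_alef }
table.sort(entries, function(a, b)
	return first_codepoint(a) < first_codepoint(b)
end)

print(entries[1] == hebrew_alef) -- true: the Hebrew letter sorts first
```

Because every code point in the Hebrew block is lower than every code point in the Arabic block, any sort of this kind puts Hebrew-script entries first, whichever letters are involved.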

Sort order for Swedish

I have a request to change the Swedish sort order. I would like to add

	sort_key = {
		from = {"å", "ä", "ö"},
		to   = {"z~", "z¡", "z°"}},

after line 1538. Gabbe (talk) 14:59, 11 October 2022 (UTC)Reply
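
For readers unfamiliar with how such a table behaves, here is a minimal standalone sketch of the effect the requested substitutions would have. The helpers make_sort_key and byte_less are hypothetical stand-ins written for this example, not the functions the language modules actually use. The sketch assumes lowercase UTF-8 input and a plain byte-order comparison, which for UTF-8 matches code point order.

```lua
-- The sort_key table from the request above.
local sort_key = {
	from = {"å", "ä", "ö"},
	to   = {"z~", "z¡", "z°"},
}

-- Hypothetical helper: apply each from/to pair in order.
-- Assumes the input is already lowercase.
local function make_sort_key(text, key)
	for i, pattern in ipairs(key.from) do
		text = text:gsub(pattern, key.to[i])
	end
	return text
end

-- Compare two strings byte by byte (equivalent to code point order for UTF-8).
local function byte_less(a, b)
	for i = 1, math.min(#a, #b) do
		local x, y = a:byte(i), b:byte(i)
		if x ~= y then
			return x < y
		end
	end
	return #a < #b
end

local words = {"öl", "zebra", "äta", "ål", "apa"}
table.sort(words, function(a, b)
	return byte_less(make_sort_key(a, sort_key), make_sort_key(b, sort_key))
end)

print(table.concat(words, " ")) -- apa zebra ål äta öl
```

The replacements work because "~" (U+007E) sorts after every ASCII letter, and "¡" (U+00A1) and "°" (U+00B0) follow it in that order, so å, ä and ö land after z in the Swedish alphabetical sequence.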

Tamil language corrections

The Tamil language table needs to be modified. Middle Tamil's language code 'ta-mid' needs to be changed to 'dra-mta' (in accordance with the proper language code formatting guidance). It has also been made a child of Tamil (ta), which is incorrect; it needs to be changed to the predecessor of Tamil (Proto-Dravidian (dra-pro) -> Old Tamil (oty) -> Middle Tamil -> Tamil (ta)). Emmanuel Asbon (talk) 04:43, 5 November 2022 (UTC)

The reason for both of those is that ta-mid is an etymology-only language (it is not treated as an independent language in terms of L2 headings and is classified as a label under Tamil), and an etymology-only language cannot be the parent of any full language for technical reasons. — SURJECTION / T / C / L / 14:52, 5 November 2022 (UTC)

inlined Kazakh translit

@Theknightwho Was this needed? I suspect it resulted in several memory errors as it significantly increases the module size. Benwing2 (talk) 01:54, 4 March 2023 (UTC)Reply

@Benwing2 I was seeing if it caused anything to start throwing errors after a while (i.e. whether it had a negative impact on memory usage). It did not.
It's the exact inverse of the sortkey module situation you asked me to reverse a few months ago, so I didn't think it was particularly likely anyway. I think that we should probably convert more instances of these, to be honest, as many transliteration modules are poorly written (i.e. inefficient), and separate modules are inherently less efficient anyway when it comes to simple replacements. Theknightwho (talk) 02:05, 4 March 2023 (UTC)Reply
@Theknightwho OK, as long as you're monitoring the memory usage it sounds good to me. Benwing2 (talk) 02:27, 4 March 2023 (UTC)Reply

Old Telugu, Old Malayalam, Proto South Dravidian, Proto North Dravidian and Proto Central Dravidian, Hindustani

Can they get codes? Also, many sources (even Wikipedia) consider Kumarbhag Paharia and Sauria Paharia to be a single language, Malto, so can they be merged into one?

The Kannada script is recent; during the Middle Kannada and Old Kannada periods, the Kadamba script (and southern Brahmi?) was used. AleksiB 1945 (talk) 14:09, 8 March 2023 (UTC)

Tagalog sort key support

Adds support for Tagalog sort key, which will handle Ñ and NG (and their Abecedario forms with tilde on the N or G) as separate characters for sorting.

m["tl"] = {
	"Tagalog",
	34057,
	"phi",
	"Latn, Tglg",
	translit = {Tglg = "tl-translit"},
	override_translit = true,
	entry_name = {Latn = {remove_diacritics = c.grave .. c.acute .. c.circ}},
	standardChars = {
		Latn = "AaBbKkDdEeGgHhIiLlMmNnOoPpRrSsTtUuWwYy",
		c.punc
	},
	sort_key = {
		Latn = "tl-sortkey"
	},
}

TagaSanPedroAko (talk) 07:41, 15 June 2023 (UTC)Reply

@Theknightwho please complete, thanks. TagaSanPedroAko (talk) 10:03, 15 June 2023 (UTC)Reply
@Kutchkutch please add, thanks. TagaSanPedroAko (talk) 18:17, 15 June 2023 (UTC)
@Kutchkutch please follow up, thanks. --TagaSanPedroAko (talk) 21:23, 15 June 2023 (UTC)Reply
@TagaSanPedroAko: Done. Kutchkutch (talk) 22:56, 15 June 2023 (UTC)

sukun for Persian is not deleted

Hi @Benwing2: It looks like sukun for Persian is not deleted from the link but it's listed in the module with other diacritics. Please see diff. It links to چن٘گک with a sukun. Anatoli T. (обсудить/вклад) 08:03, 5 September 2023 (UTC)Reply

That's a ghunna diacritic mark. — Fenakhay (حيطي · مساهماتي) 08:46, 5 September 2023 (UTC)Reply
@Fenakhay, @Benwing2: Thanks and ignore my request! Anatoli T. (обсудить/вклад) 09:07, 5 September 2023 (UTC)Reply
@Atitarev Sorry, this might be because of me. The phonetic Persian spelling uses a ghunna diacritic (not a sukoon) to show a nasal articulated at the same place as the following consonant [n̪][ɲ][ŋ][ɴ][m] (and [ɳ] for Hazaragi). It technically isn't a Persian letter, but I thought it was helpful since it was so specific. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 09:09, 5 September 2023 (UTC)
@Sameerhameedy: Hi. I have indeed copied from the output to {{fa-IPA|čangak}} but failed to see it wasn't a sokun but a ghunna :)
Actually, I wanted to ask about this rule: e.g. {{fa-IPA|ir=tiliwīz`yun}} produces "تِلِویزْیُن" with a sokun over ز but {{fa-IPA|ir=kāmpyū`tir}} produces "کان٘پیوتِر" with no sokun over "پ" with the IPA [kʰɒːm.pᵊ.juː.t̪ʰéɹ] Anatoli T. (обсудить/вклад) 10:24, 5 September 2023 (UTC)Reply
@Atitarev, that's a mistake. I set fa-IPA to break up impossible consonant clusters with an epenthetic vowel, but it's been happening in places where it's not supposed to.
As for the sukoon thing, for some reason, when there are three consecutive consonants, it only puts a sukoon on the first two. I'm not sure why that's happening, but I'll see if I can fix it somehow. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 18:08, 5 September 2023 (UTC)
@Sameerhameedy I suspect the issue with two sukuuns in a row is you have to do the pattern substitution twice in a row, because pattern substitutions won't overlap. Benwing2 (talk) 20:36, 5 September 2023 (UTC)Reply
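
The non-overlapping behaviour mentioned in the last comment can be reproduced in a small standalone sketch. This is not the fa-IPA code: it uses Latin consonants as stand-ins for Persian letters and "." as a stand-in for the sukun mark, and the names mark_once and mark_twice are made up for the example.

```lua
-- Stand-ins for illustration: Latin consonants instead of Persian letters,
-- and "." instead of the sukun mark.
local C = "[bcdfghjklmnpqrstvwxyz]"
local pattern = "(" .. C .. ")(" .. C .. ")"

-- One pass: insert "." between two adjacent consonants.
-- gsub resumes scanning after each match, so in a run of three consonants
-- the second consonant is consumed by the first match and cannot start another.
local function mark_once(s)
	return (s:gsub(pattern, "%1.%2"))
end

-- Two passes: the second pass catches the pairs that the first pass skipped.
local function mark_twice(s)
	return mark_once(mark_once(s))
end

print(mark_once("kampyutir"))  -- kam.pyutir
print(mark_twice("kampyutir")) -- kam.p.yutir
```

In the single pass, "mp" is matched and the scan resumes at "y", so the "py" pair is never examined. After the first pass the remaining unmarked pairs no longer overlap with one another, which is why running the same substitution a second time is enough.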

English sorting

@J3133 FYI, it does sort the character 🅱 properly, but the problem is that the English sortkey only applies to text recognised as Latn, and 🅱 in isolation isn't recognised as Latn as it's not part of that set of characters. I'm not super convinced we should add every rare character used in English entries (for the time being, anyway), as it adds overhead on every page, due to how sortkeys get processed. Some of the really weird ones that were there already are used in L2 headings, as English sorting gets used to determine the correct L2 order; I think there were a few others, though. Theknightwho (talk) 10:00, 19 August 2024 (UTC)Reply

@Theknightwho: The problem is that ⟨🅱⟩ is not sorted under ⟨B⟩ in Category:English terms by their individual characters, which is the category I was trying to fix as I wrote in my edit summary. J3133 (talk) 10:03, 19 August 2024 (UTC)Reply
@J3133 I've done it manually for now. The terms in it should sort automatically, though (e.g. in Category:English lemmas). Theknightwho (talk) 10:15, 19 August 2024 (UTC)Reply
@Theknightwho: I see you have added |sort=B to Category:English terms spelled with 🅱. The ⟨C⟩ ones I added to the module ⟨¢©ᴄ⟩ are not being sorted correctly either. J3133 (talk) 10:14, 19 August 2024 (UTC)Reply
@J3133 Same issue - I've been working on a more general fix for a while now, but it isn't ready yet. Theknightwho (talk) 10:16, 19 August 2024 (UTC)Reply

Edit request: Cornish (kw)

Could someone add ò (o with a grave accent) to the language's remove_diacritics? Reasoning is provided here:

The short equivalent of <oo> may be spelt as <ò> in dictionaries and teaching materials for learners to show that it is pronounced differently from the short equivalent of <o>.

Kernowek Standard uses ù, so this should be unaffected. Người mang giấm (talk) 15:46, 11 September 2024 (UTC)Reply

@Người mang giấm What about ? Theknightwho (talk) 20:49, 11 September 2024 (UTC)Reply
@Theknightwho Huh. I'm surprised there's no mention of it in the Wikipedia article. Are there other words that use it, or is it a French "où" situation? Người mang giấm (talk) 02:17, 12 September 2024 (UTC)Reply
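
As a rough illustration of the request, the sketch below strips a grave accent from o only and leaves ù untouched. It is an assumption-laden stand-in, not the actual remove_diacritics mechanism: it assumes the text is already decomposed (o followed by U+0300), and strip_o_grave is a hypothetical helper name.

```lua
local GRAVE = "\204\128" -- U+0300 COMBINING GRAVE ACCENT, as UTF-8 bytes

-- Hypothetical helper: remove the combining grave only when it follows o or O.
-- Assumes the input is already in decomposed form (NFD).
local function strip_o_grave(text)
	return (text:gsub("([Oo])" .. GRAVE, "%1"))
end

print(strip_o_grave("o" .. GRAVE) == "o")          -- true: accent removed from o
print(strip_o_grave("u" .. GRAVE) == "u" .. GRAVE) -- true: ù is left alone
```

Restricting the substitution to o is what keeps ù distinct; a rule that removed every combining grave would also strip it from ù.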

Edit request: Azerbaijani (az)

It would be better to mention Middle Azerbaijani (or Ajem Turkic) after Old Anatolian Turkish in order to make it more detailed. Slowcuber7 (talk) 09:02, 9 November 2024 (UTC)