Deprecated: Function get_magic_quotes_gpc() is deprecated in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 99

Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 619

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1169

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176
8000 tool_homedir: Fix GetEnv for Windows Unicode builds by jay · Pull Request #7281 · curl/curl · GitHub
Nothing Special   »   [go: up one dir, main page]

Skip to content

Conversation

jay
Copy link
Member
@jay jay commented Jun 21, 2021
  • For Windows Unicode builds retrieve the homedir in UTF-8 encoded
    Unicode instead of ANSI local character encoding.

Fixes #7252
Closes #xxxx


/cc @dEajL3kA

@jay jay added cmdline tool Windows Windows-specific labels Jun 21, 2021
@jay
Copy link
Member Author
jay commented Jun 21, 2021

This could be a problem if homedir is expected in local encoding by some dependencies. For example, homedir() is used to pass the homedir to the ssh library:

curl/src/tool_operate.c

Lines 1755 to 1760 in 6b951a6

char *home = homedir(NULL);
if(home) {
char *file = aprintf("%s/.ssh/known_hosts", home);
if(file) {
/* new in curl 7.19.6 */
result = res_setopt_str(curl, CURLOPT_SSH_KNOWNHOSTS, file);

curl/lib/vssh/libssh.c

Lines 2224 to 2232 in 6b951a6

if(data->set.str[STRING_SSH_KNOWNHOSTS]) {
infof(data, "Known hosts: %s\n", data->set.str[STRING_SSH_KNOWNHOSTS]);
rc = ssh_options_set(ssh->ssh_session, SSH_OPTIONS_KNOWNHOSTS,
data->set.str[STRING_SSH_KNOWNHOSTS]);
if(rc != SSH_OK) {
failf(data, "Could not set known hosts file path");
return CURLE_FAILED_INIT;
}
}

curl/lib/vssh/libssh2.c

Lines 3159 to 3177 in 6b951a6

#ifdef HAVE_LIBSSH2_KNOWNHOST_API
if(data->set.str[STRING_SSH_KNOWNHOSTS]) {
int rc;
sshc->kh = libssh2_knownhost_init(sshc->ssh_session);
if(!sshc->kh) {
libssh2_session_free(sshc->ssh_session);
sshc->ssh_session = NULL;
return CURLE_FAILED_INIT;
}
/* read all known hosts from there */
rc = libssh2_knownhost_readfile(sshc->kh,
data->set.str[STRING_SSH_KNOWNHOSTS],
LIBSSH2_KNOWNHOST_FILE_OPENSSH);
if(rc < 0)
infof(data, "Failed to read known hosts from %s\n",
data->set.str[STRING_SSH_KNOWNHOSTS]);
}
#endif /* HAVE_LIBSSH2_KNOWNHOST_API */

libssh's ssh_options_set
libssh2's libssh2_knownhost_readfile

@vszakats
Copy link
Member
vszakats commented Jun 21, 2021

@jay: You're right, this may indeed be a problem. To resolve this with the least amount of confusion, I think a viable solution is to introduce one variant of homedir() that can be used by callers needing legacy 8-bit encodings (e.g. because they need to pass it along to libs that specifically expect 8-bit encodings, or, they have undefined requirements) and another one (say homedir_utf8()) that can be used wherever we know that an UTF-8 string is needed.

This also generally describes the problem encodings create. For example, last week I decided to update a libcurl wrapper for Windows Unicode support, but instantly bumbed into the problem where I would have to know precisely when various libcurl CURLOPTs, CURLINFOs, functions and structures expect/return a zero-terminated raw string (including legacy 8-bit codepage ones), a binary string (this one is obvious because it comes with a length), an 7-bit ASCII string, an UTF-8 string. Depending on this, I need to either convert the string to/from UTF-8 (from the callee's codepage) or leave it untouched. In many cases it's difficult to decide without knowing the whole called stack and at the moment the documentation misses this information. For now I started with a rough map to assign an encoding requirement with each string input/output.

E.g. I assumed that URLs and local filenames are always UTF-8. But, for example usernames and password can be anything, and this is something libcurl needs to specify. Non-local filenames passed around in MIME are a different category, because here we may need to inform the other party about the encoding used (and vice versa).

When this is defined, even curl tool may need be adapted to allow passing raw strings from the command-line (without any codepage/encoding conversion), for example as hex- or base64-encoded strings. This way any password value could be passed without applying a (potentially unwanted) conversion to them, or sending other values just like a server expects.

Back to GetEnv(): To fix #7252, what also needs to be addressed are all direct curl_getenv() calls made from curl/libcurl. Here, 8-bit strings are returned even though some callers would need UTF-8. This could be solved by a curl_getenv_utf8() which would always make sure to return UTF-8.

$ git grep curl_getenv
curl_ntlm_wb.c:  ntlm_auth_alloc = curl_getenv("CURL_NTLM_WB_FILE");
netrc.c:    char *homea = curl_getenv("HOME"); /* portable environment reader */
url.c:  prox = curl_getenv(proxy_env);
url.c:    prox = curl_getenv(proxy_env);
url.c:    proxy = curl_getenv(envp); /* default proxy to use */
url.c:      proxy = curl_getenv(envp);
url.c:    no_proxy = curl_getenv(p);
url.c:      no_proxy = curl_getenv(p);
vssh/libssh2.c:          char *home = curl_getenv("HOME");
vtls/keylog.c:    keylog_file_name = curl_getenv("SSLKEYLOGFILE");
vtls/vtls.c:  env = env_tmp = curl_getenv("CURL_SSL_BACKEND");

@vszakats
Copy link
Member
vszakats commented Jun 24, 2021

FWIW I've created a Wiki page with the goal to collect string encoding rules for the various libcurl APIs. It is now showing a pretty rough, initial state. This is a result of a quick walkthrough over the ones I used in a particular libcurl wrapper (not exactly the linked one, but an updated, local version) and doing a rough guess about string encodings. And even making a rough guess is non-trivial in a lot of cases:
https://github.com/curl/curl/wiki/WIP:-libcurl-API-and-expected-string-encodings

[ Oh, notice that the UTF-8-ness is mostly a Windows Unicode matter, at the moment at least, but the overall notion of having this defined matters on all platforms equally. ]

@jay jay force-pushed the fix_tool_getenv_for_windows_unicode branch from abd6afa to c1904c7 Compare July 10, 2021 06:03
jay added a commit to jay/curl that referenced this pull request Jul 10, 2021
- For Windows Unicode builds add tool_getenv_utf8 to get Unicode UTF-8
  encoded environment variable values.

- Add tool_getenv_local to getenv in the current locale encoding. This
  is the equivalent of curl_getenv which is now banned in favor of
  tool_getenv.

- Map tool_getenv macro to tool_getenv_utf8 for Windows Unicode and
  tool_getenv_local otherwise.

- Similar to above, split homedir into homedir_utf8, homedir_local
  and a homedir macro which maps to one of the first two.

Background:

Windows does not support UTF-8 locale (or, not really). Because of that
our Windows Unicode builds continue to use the current locale, but
expect Unicode UTF-8 encoded paths for internal use. In other words file
operations by curl or libcurl to open / access / stat a file are
expected to have a Unicode path.

Complicating this is that dependencies, which may or may not be Unicode
builds, may or may not expect UTF-8 encoded Unicode for char * string
pathnames. For example, libcurl can use libssh or libssh2 as an SSH
library and pathnames need to be passed in the local encoding AFAICS.

Prior to this change the curl tool made a best effort to convert
pathnames from the command line to Unicode UTF-8 encoding, but did not
do so for those pathnames retrieved by environment variables via
curl_getenv, which is in the local encoding. This worked without much
incident but was incorrect.

Essentially, for Windows Unicode builds we need access to both local
encoded and UTF-8 encoded pathnames from the environment. Therefore,
curl_getenv is not sufficient. In order to make this obvious in the
code I've banned curl_getenv (via checksrc) in favor of new
tool_getenv_local, tool_getenv_utf8 and tool_getenv and made similar
changes to homedir, as described.

The end result is something like CURLOPT_SSH_KNOWNHOSTS which is passed
to the SSH library and always needs the local encoding (in other words,
even if it's a Windows Unicode build) can now be made by calling
homedir_local (which calls tool_getenv_local). But, if it's a path that
is used internally only then just homedir can be used (which calls
tool_getenv_utf8 for Windows Unicode, and tool_getenv_local otherwise).

Also of note is 765e060, which removed local encoding fallbacks and
is another reason to make these changes.

Fixes curl#7252
Closes curl#7281
@jay
Copy link
Member Author
jay commented Jul 10, 2021

The latest iteration of this I've banned curl_getenv in the tool in favor of tool_getenv which maps to tool_getenv_local or tool_getenv_utf8. Similarly homedir which maps to homedir_local or homedir_utf8. The _utf8 versions are the default versions used in Windows Unicode builds and local otherwise.

The Windows Unicode builds also have access to _local for situations where a string encoded in the current locale needs to be passed to a dependency. For example, the paths set via curlopt to be passed to the SSH and SSL libraries are retrieved via _local. I'm assuming here these dependencies are using the local encoding, and also this is effectively the way we had it before. The wiki page @vszakats created guesses UTF-8 for these paths, but I haven't seen any dependency documentation supporting that.

The commit message contains more detail.

@vszakats
Copy link
Member

@jay: LGTM! The codepage wiki indeed only lists the codepage that "makes sense" in its context (hence the included comment). Mapping these codepages to actual build combination behaviour is a next (likely large) undertaking. libssh2 indeed doesn't have special support for Unicode now, so it needs strings as-is.

@jay jay force-pushed the fix_tool_getenv_for_windows_unicode branch from c1904c7 to dbd9e5b Compare July 13, 2021 07:07
@jay
Copy link
Member Author
jay commented Jul 13, 2021

I redid it again to cover both libcurl and the curl tool. getenv, curl_getenv, curlx_getenv are banned in favor of Curl_getenv which maps to Curl_getenv_utf8 for Windows Unicode and Curl_getenv_local otherwise. The Curl_getenv functions are curlx functions that are compiled in both the lib and tool. File paths that are passed to dependencies are retrieved in the current locale encoding always by calling Curl_getenv_local, regardless of build type.

@jay jay force-pushed the fix_tool_getenv_for_windows_unicode branch from dbd9e5b to 8f0adc3 Compare July 13, 2021 07:16
@jay jay force-pushed the fix_tool_getenv_for_windows_unicode branch from 8f0adc3 to ab6a811 Compare July 13, 2021 07:27
@vszakats
Copy link
Member

Maybe late to note but OpenSSL does support UTF-8 in default Windows builds (for filenames notably), in these cases, it'd be useful to keep strings in UTF-8.

@jay
Copy link
Member Author
jay commented Jul 14, 2021

Maybe late to note but OpenSSL does support UTF-8 in default Windows builds (for filenames notably), in these cases, it'd be useful to keep strings in UTF-8.

OpenSSL falls back to the local encoding like we used to do. In that case though passing UTF-8 ca paths from the tool would be more correct. I'll look into if it's practical to special case it.

@vszakats
Copy link
Member

Yeah, I've noticed (to my horror) that OpenSSL does have a fallback logic. But, UTF-8 is the primary encoding there and without it, it's only possible to pass non-ASCII strings that match your system locale. (This isn't ideal because in shipped software you have no control over the system locale.)

@jay
Copy link
Member Author
jay commented Jul 15, 2021

Yeah, I've noticed (to my horror) that OpenSSL does have a fallback logic. But, UTF-8 is the primary encoding there and without it, it's only possible to pass non-ASCII strings that match your system locale. (This isn't ideal because in shipped software you have no control over the system locale.)

Frankly this is a good tradeoff, and looking at their source it's actually not the logic I had used. (edit: actually, it's pretty similar). They use the current locale (ie it calls fopen) if the string is not UTF-8 encoded. We have libcurl options that may be passed in local encoding and it would be nice not to break anything, but also not cause any security issue. I may backpedal a bit (To your horror I guess..) and submit a PR to fall back the way OpenSSL does. My comments in 765e060 are wrong. Currently in Windows Unicode builds, if the user is passing a string in the current locale it may be valid UTF-8 and treated as such, regardless of any change there is nothing we can do about that except document such and such option requires (or prefers maybe) UTF-8 for those builds.


Edit:

Submitted #7421 to resurrect local encoding fallback.

OpenSSL 1.0.0a added Windows UTF-8 file open support.
openssl/openssl@336d1ee

BoringSSL and LibreSSL forked after that point but the same functions in master are missing that support. Also, I searched for wfopen, CP_UTF8 and utf to see if maybe they somehow did it differently (?) but didn't find anything. So we can assume they removed that code.
https://github.com/google/boringssl/blob/ecc301ca0f77dd5d78de4b22d25713e63d18168a/crypto/bio/file.c#L92
https://github.com/libressl-portable/openbsd/blob/7db46cfb5799731fdf87786bc5bc8877aa113631/src/lib/libcrypto/bio/bss_file.c#L113

Note recent versions of OpenSSL such as 1.1.1 have moved it to openssl_fopen.
https://github.com/openssl/openssl/blob/OpenSSL_1_1_1k/crypto/o_fopen.c#L38

- For Windows Unicode builds add Curl_getenv_utf8 to get Unicode UTF-8
  encoded environment variable values.

- Add Curl_getenv_local to get the environment variable in the current
  locale encoding.

- Map Curl_getenv macro to Curl_getenv_utf8 for Windows Unicode and
  Curl_getenv_local otherwise.

- Ban public getenv, curlx_getenv, and curl_getenv from being called in
  curl/libcurl in favor of Curl_getenv.

- Add functions to check if a environment variable exists:
  Curl_env_exist_utf8, Curl_env_exist_local and macro Curl_env_exist
  which maps to one of the first two.

- Similar to above, split homedir into homedir_utf8, homedir_local
  and a homedir macro which maps to one of the first two.

The Curl_getenv functions are curlx functions that are compiled in both
the lib and tool.

Background:

Windows does not use or support UTF-8 as the current locale (or, not
really). The Windows Unicode builds of curl/libcurl use the current
locale, but expect Unicode UTF-8 encoded paths for internal use such as
open, access and stat.

Prior to this change Windows Unicode builds of the curl tool made a best
effort to convert pathnames from the command line to Unicode UTF-8
encoding, but curl/libcurl did not do so for those pathnames retrieved
by environment variables via getenv/curl_getenv, which returns the local
encoding. This worked without much incident but was incorrect if the
path was going to be opened by curl/libcurl. However, it was correct for
dependencies that always expect paths in the local encoding.

Basically, for Windows Unicode builds we need access to environment
variables in both UTF-8 and current locale encoding. There is more
detail about this issue in the getenv.h comment block.

Also of note is 765e060, which removed local encoding fallbacks and
is another reason to make these changes.

Fixes curl#7252
Closes curl#7281
@jay jay force-pushed the fix_tool_getenv_for_windows_unicode branch from ab6a811 to a1a983c Compare July 16, 2021 07:23
@jay
Copy link
Member Author
jay commented Jul 16, 2021

OpenSSL falls back to the local encoding like we used to do. In that case though passing UTF-8 ca paths from the tool would be more correct. I'll look into if it's practical to special case it.

Ok I did this with a tool global variable ssl_paths_use_utf8, you can see how in tool_libinfo.c changes

@vszakats
Copy link
Member

ssl_paths_use_utf8 is a neat idea! The method may well work for URL strings too.

@jay
Copy link
Member Author
jay commented Jul 19, 2021

I'm second guessing this, I think it may be better to document each string option in Windows Unicode builds should be passed to libcurl in UTF-8 and then in libcurl convert to the current locale. It could be done in Curl_vsetopt when the string option is set or later on when it is passed to the dependency. Otherwise we end up with a mix of having to document for each option which encoding and I think that will make it harder for the user to maintain their code.

vszakats added a commit to curl/curl-for-win that referenced this pull request Jul 20, 2021
On closer inspection, the state of Unicode support in libcurl does not
seem to be ready for production. Existing support extended certain Windows
interfaces to use the Unicode flavour of the Windows API, but that also
meant that the expected encoding/codepage of strings (e.g. local filenames,
URLs) exchanged via the libcurl API became ambiguous and undefined.
Previously all strings had to be passed in the active Windows locale, using
an 8-bit codepage. In Unicode libcurl builds, the expected string encoding
became an undocumented mixture of UTF-8 and 8-bit locale, depending on the
actual API/option, certain dynamic and static "fallback" logic inside
libcurl and even in OpenSSL, while some parts of libcurl kept using 8-bit
strings internally. From the user's perspective this poses an unreasonably
difficult task in finding out how to pass a certain non-ASCII string to a
specific API without unwanted or accidental (possibly lossy) conversions or
other side-effects. Missing the correct encoding may result in unexpected
behaviour, e.g. in some cases not finding files, finding different files,
accessing the wrong URL or passing a corrupt username or password.

Note that these issues may _only_ affect strings with _non-ASCII_ content.

For now the best solution seems to be to revert back to how libcurl/curl
worked for most of its existence and only re-enable Unicode once the
remaining parts of Windows Unicode support are well-understood, ironed out
and documented.

Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully
this period had the benefit to have surfaced some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5
vszakats added a commit to curl/curl-for-win that referenced this pull request Jul 20, 2021
On closer inspection, the state of Unicode support in libcurl does not
seem to be ready for production. Existing support extended certain Windows
interfaces to use the Unicode flavour of the Windows API, but that also
meant that the expected encoding/codepage of strings (e.g. local filenames,
URLs) exchanged via the libcurl API became ambiguous and undefined.
Previously all strings had to be passed in the active Windows locale, using
an 8-bit codepage. In Unicode libcurl builds, the expected string encoding
became an undocumented mixture of UTF-8 and 8-bit locale, depending on the
actual API/option, certain dynamic and static "fallback" logic inside
libcurl and even in OpenSSL, while some parts of libcurl kept using 8-bit
strings internally. From the user's perspective this poses an unreasonably
difficult task in finding out how to pass a certain non-ASCII string to a
specific API without unwanted or accidental (possibly lossy) conversions or
other side-effects. Missing the correct encoding may result in unexpected
behaviour, e.g. in some cases not finding files, finding different files,
accessing the wrong URL or passing a corrupt username or password.

Note that these issues may _only_ affect strings with _non-ASCII_ content.

For now the best solution seems to be to revert back to how libcurl/curl
worked for most of its existence and only re-enable Unicode once the
remaining parts of Windows Unicode support are well-understood, ironed out
and documented.

Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully
this period had the benefit to have surfaced some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5
vszakats added a commit to curl/curl-for-win that referenced this pull request Jul 20, 2021
On closer inspection, the state of Unicode support in libcurl does not
seem to be ready for production. Existing support extended certain Windows
interfaces to use the Unicode flavour of the Windows API, but that also
meant that the expected encoding/codepage of strings (e.g. local filenames,
URLs) exchanged via the libcurl API became ambiguous and undefined.
Previously all strings had to be passed in the active Windows locale, using
an 8-bit codepage. In Unicode libcurl builds, the expected string encoding
became an undocumented mixture of UTF-8 and 8-bit locale, depending on the
actual API/option, certain dynamic and static "fallback" logic inside
libcurl and even in OpenSSL, while some parts of libcurl kept using 8-bit
strings internally. From the user's perspective this poses an unreasonably
difficult task in finding out how to pass a certain non-ASCII string to a
specific API without unwanted or accidental (possibly lossy) conversions or
other side-effects. Missing the correct encoding may result in unexpected
behaviour, e.g. in some cases not finding files, finding different files,
accessing the wrong URL or passing a corrupt username or password.

Note that these issues may _only_ affect strings with _non-ASCII_ content.

For now the best solution seems to be to revert back to how libcurl/curl
worked for most of its existence and only re-enable Unicode once the
remaining parts of Windows Unicode support are well-understood, ironed out
and documented.

Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully
this period had the benefit to have surfaced some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5
vszakats added a commit to curl/curl-for-win that referenced this pull request Jul 20, 2021
On closer inspection, the state of Windows Unicode support in libcurl does
not seem to be ready for production. Existing support extended certain
Windows interfaces to use the Unicode flavour of the Windows API, but that
also meant that the expected encoding/codepage of strings (e.g. local
filenames, URLs) exchanged via the libcurl API became ambiguous and
undefined.

Previously all strings had to be passed in the active Windows locale, using
an 8-bit codepage. In Unicode libcurl builds, the expected string encoding
became an undocumented mixture of UTF-8 and 8-bit locale, depending on the
actual API, build options/dependencies, internal fallback logic based on
runtime auto-detection of passed string, and the result of file operations
(scheduled for removal in 7.78.0). While some parts of libcurl kept using
8-bit strings internally, e.g. when reading the environment.

From the user's perspective this poses an unreasonably complex task in
finding out how to pass (or read) a certain non-ASCII string to (from) a
specific API without unwanted or accidental conversions or other
side-effects. Missing the correct encoding may result in unexpected
behaviour, e.g. in some cases not finding files, reading/writing a
different file, accessing the wrong URL or passing a corrupt username or
password.

Note that these issues may only affect strings with _non-7-bit-ASCII_
content.

For now the least bad solution seems to be to revert back to how
libcurl/curl worked for most of its existence and only re-enable Unicode
once the remaining parts of Windows Unicode support are well-understood,
ironed out and documented.

Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully
this period had the benefit to have surfaced some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5
vszakats added a commit to curl/curl-for-win that referenced this pull request Jul 20, 2021
On closer inspection, the state of Windows Unicode support in libcurl does
not seem to be ready for production. Existing support extended certain
Windows interfaces to use the Unicode flavour of the Windows API, but that
also meant that the expected encoding/codepage of strings (e.g. local
filenames, URLs) exchanged via the libcurl API became ambiguous and
undefined.

Previously all strings had to be passed in the active Windows locale, using
an 8-bit codepage. In Unicode libcurl builds, the expected string encoding
became an undocumented mixture of UTF-8 and 8-bit locale, depending on the
actual API, build options/dependencies, internal fallback logic based on
runtime auto-detection of passed string, and the result of file operations
(scheduled for removal in 7.78.0). While some parts of libcurl kept using
8-bit strings internally, e.g. when reading the environment.

From the user's perspective this poses an unreasonably complex task in
finding out how to pass (or read) a certain non-ASCII string to (from) a
specific API without unwanted or accidental conversions or other
side-effects. Missing the correct encoding may result in unexpected
behaviour, e.g. in some cases not finding files, reading/writing a
different file, accessing the wrong URL or passing a corrupt username or
password.

Note that these issues may only affect strings with _non-7-bit-ASCII_
content.

For now the least bad solution seems to be to revert back to how
libcurl/curl worked for most of its existence and only re-enable Unicode
once the remaining parts of Windows Unicode support are well-understood,
ironed out and documented.

Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully
this period had the benefit to have surfaced some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5
jay added a commit to jay/curl that referenced this pull request Aug 13, 2022
Bug: curl#7252
Reported-by: dEajL3kA@users.noreply.github.com

Ref: curl#7281

Closes #xxxx
2254
@jay
Copy link
Member Author
jay commented Aug 13, 2022

Dead. Adding it to known bugs see #9305. I'm not certain about any of the solutions discussed including this PR which in retrospect seems too complicated. homedir paths may be passed to other libraries that expect the current locale. If this issue comes up again we can reconsider all the options.

@jay jay closed this Aug 13, 2022
jay added a commit that referenced this pull request Aug 16, 2022
Bug: #7252
Reported-by: dEajL3kA@users.noreply.github.com

Ref: #7281

Closes #9305
jquepi pushed a commit to jquepi/curl.1.555 that referenced this pull request Oct 24, 2022
@vszakats vszakats added the Unicode Unicode, code page, character encoding label Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cmdline tool Unicode Unicode, code page, character encoding Windows Windows-specific

Development

Successfully merging this pull request may close these issues.

2 participants

0