Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

powershell - Character-encoding problem with string literal in source code

$logstring = Invoke-Command -ComputerName $filesServer -ScriptBlock {
    param(
        $logstring,
        $grp
    )

    $Klassenbuchordner = "KB " + $grp.Gruppe
    $Gruppenordner = $grp.Gruppe
    $share = $grp.Gruppe
    $path = "D:\Gruppen\$Gruppenordner"

    if ((Test-Path D:\Dozenten\1_Klassenbücher\$Klassenbuchordner) -eq $true)
    {$logstring += "Verzeichnis für Klassenbücher existiert bereits"}
    else {
        mkdir D:\Dozenten\1_Klassenbücher\$Klassenbuchordner
        $logstring += "Klassenbuchordner wurde erstellt!"
    }
} -ArgumentList $logstring, $grp

My goal is to test the existence of a directory and create it on demand.

The problem is that the path contains German letters (umlauts), which aren't seen correctly by the target server.

For instance, the server receives the path "D:\Dozent\1_Klassenb??cher" instead of the expected "D:\Dozent\1_Klassenbücher".

How can I force proper UTF-8 encoding?

1 Reply

Note: Remoting and use of Invoke-Command are incidental to your problem.

Since the problem occurs with a string literal in your source code (...1_Klassenbücher...), the likeliest explanation is that your script file is misinterpreted by PowerShell.

In Windows PowerShell (as opposed to PowerShell Core (v6+)), if your script file is de facto UTF-8-encoded but lacks a BOM, the PowerShell engine will misinterpret any non-ASCII-range characters (such as ü) in the script.[1]
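To see what such a misinterpretation looks like, here is a minimal sketch that decodes the UTF-8 bytes of the folder name as if they were "ANSI". It assumes the active "ANSI" code page is Windows-1252; the variable names are illustrative only.

```powershell
# Build the string from a code point so the demo itself does not depend on
# how this file is encoded (no non-ASCII character in a string literal).
$original = 'Klassenb' + [char]0x00FC + 'cher'

# What a UTF-8 file stores on disk: the u-umlaut becomes two bytes, 0xC3 0xBC.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($original)

# Assumption: the "ANSI" code page in effect is Windows-1252.
$ansi = [System.Text.Encoding]::GetEncoding(1252)

# Decoding those bytes one by one as "ANSI" yields two characters
# where the source had one, so the path no longer matches.
$misread = $ansi.GetString($utf8Bytes)

'{0} chars were read as {1} chars: {2}' -f $original.Length, $misread.Length, $misread
```

The garbled name has one character more than the original, which is why a path built from it points at a folder that does not exist.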

Therefore: Re-save your script as UTF-8 with BOM.
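If you'd rather not re-save the file from an editor, the conversion can also be scripted. The following is a sketch, not a definitive tool: the function name Convert-ToUtf8Bom is made up for illustration, and it assumes the file really is UTF-8 already (with or without BOM).

```powershell
function Convert-ToUtf8Bom {
    param([Parameter(Mandatory)][string] $LiteralPath)

    # .NET methods resolve relative paths against the process directory,
    # not PowerShell's current location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $LiteralPath).ProviderPath

    # Assumption: the content is UTF-8. An existing BOM is detected and
    # dropped while reading, so running this twice does not add a second one.
    $noBom = New-Object System.Text.UTF8Encoding $false
    $text = [System.IO.File]::ReadAllText($fullPath, $noBom)

    # $true makes the encoder write the BOM (bytes EF BB BF) first.
    $withBom = New-Object System.Text.UTF8Encoding $true
    [System.IO.File]::WriteAllText($fullPath, $text, $withBom)
}
```

You would call it as Convert-ToUtf8Bom -LiteralPath .\YourScript.ps1, where the file name is a placeholder. Keep a backup, because the file is overwritten in place.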


Why you should save your scripts as UTF-8 with BOM:

Visual Studio Code and other modern editors create UTF-8 files without BOM by default, which is what causes the problem in Windows PowerShell.

By contrast, the PowerShell ISE creates "ANSI"-encoded[1] files, which Windows PowerShell - but not PowerShell Core - reads correctly.

You can only get away with "ANSI"-encoded files:

  • if your scripts will never be run in PowerShell Core - where all future development effort will go.

  • if your scripts will never run on a machine where a different "ANSI" code page is in effect.

  • if your script doesn't contain characters - e.g., emoji - that cannot be represented with your "ANSI" code page.

Given these limitations, it's safest - and future-proof - to always create PowerShell scripts as UTF-8 with BOM.
(Alternatively, you can use UTF-16, which is always saved with a BOM, but that bloats the file size if you're primarily using ASCII/"ANSI"-range characters, which is likely in PowerShell scripts.)
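To find out which existing scripts are at risk, you can check for files that contain non-ASCII bytes but have no BOM. This is a rough sketch under the assumption that a byte-level check is good enough; the function name Test-NeedsUtf8Bom is hypothetical.

```powershell
function Test-NeedsUtf8Bom {
    param([Parameter(Mandatory)][string] $LiteralPath)

    $fullPath = (Resolve-Path -LiteralPath $LiteralPath).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # Files that already start with a UTF-8 or UTF-16 BOM are unambiguous.
    $hasUtf8Bom = $bytes.Length -ge 3 -and
        $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
    $hasUtf16Bom = $bytes.Length -ge 2 -and (
        ($bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) -or
        ($bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF))
    if ($hasUtf8Bom -or $hasUtf16Bom) { return $false }

    # Without a BOM, any byte above 0x7F is non-ASCII content whose meaning
    # depends on the encoding PowerShell assumes for the file.
    foreach ($b in $bytes) {
        if ($b -gt 0x7F) { return $true }
    }
    return $false
}
```

Pure-ASCII scripts come back as $false, since they read the same under "ANSI" and UTF-8. To scan a folder, something like Get-ChildItem -Recurse -Filter *.ps1 | Where-Object { Test-NeedsUtf8Bom -LiteralPath $_.FullName } should list the candidates.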


How to make Visual Studio Code create UTF-8 files with BOM for PowerShell scripts by default:

Note: The following is still required as of v1.11.0 of the PowerShell extension for VSCode, but note that there's a suggestion on GitHub to make the extension default PowerShell files to UTF-8 with BOM.

Add the following to your settings.json file (from the command palette: press Ctrl+Shift+P, type settings, and select Preferences: Open Settings (JSON)):

"[powershell]": {
  "files.encoding": "utf8bom"
}

Note that the setting is intentionally scoped to PowerShell files only, because you wouldn't want all files to default to UTF-8 with BOM, given that many utilities on Unix platforms neither expect nor know how to handle such a BOM.


[1] In the absence of a BOM, Windows PowerShell defaults to the encoding of the system's current "ANSI" code page, as determined by the legacy system locale; e.g., in Western European cultures, Windows-1252.
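As a quick check of which default applies on a given machine, you can ask .NET for its default encoding. Treat this as an approximation: in Windows PowerShell the value reflects the active "ANSI" code page, whereas in PowerShell Core it reports UTF-8.

```powershell
# Windows PowerShell: the system's "ANSI" code page (e.g. 1252).
# PowerShell Core (v6+): UTF-8 (code page 65001).
$default = [System.Text.Encoding]::Default
'{0} (code page {1})' -f $default.WebName, $default.CodePage
```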

