Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


go - How to parse XML with MACROMAN encoding

I am trying to parse the POM content below, but I get the error: xml: opening charset "MACROMAN": unsupported charset: "MACROMAN". I tried disabling strict parsing by setting decoder.Strict = false, but that didn't help either.

Here is the Go playground link where I'm parsing this particular POM. Any help or references will be appreciated.

<?xml version="1.0" encoding="MACROMAN"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
 <parent>
   <groupId>org.eclipse.vorto</groupId>
   <artifactId>org.eclipse.vorto.parent</artifactId>
   <version>0.10.0.M1</version>
 </parent>
 <modelVersion>4.0.0</modelVersion>
 <artifactId>generators</artifactId>
 
 <name>Eclipse Vorto Code Generators</name>
 <packaging>pom</packaging>
 <modules>
        <module>org.eclipse.vorto.codegen.thingworx</module>
        <module>org.eclipse.vorto.codegen.javabean</module>
        <module>org.eclipse.vorto.codegen.mqtt</module>
        <module>org.eclipse.vorto.codegen.webui</module>
        <module>org.eclipse.vorto.codegen.webdevice</module>
        <module>org.eclipse.vorto.codegen.markdown</module>
        <module>org.eclipse.vorto.codegen.ios</module>
        <module>org.eclipse.vorto.codegen.latex</module>
        <module>org.eclipse.vorto.codegen.bosch.things</module>
        <module>org.eclipse.vorto.codegen.coap</module>
        <module>org.eclipse.vorto.codegen.aws</module>
        <module>org.eclipse.vorto.codegen.lwm2m</module>
        <module>org.eclipse.vorto.codegen.prosystfi</module>
        <module>org.eclipse.vorto.codegen.kura</module>
 </modules>
 
</project>
Question from: https://stackoverflow.com/questions/65645058/how-to-parse-xml-with-macroman-encoding


1 Reply


These are the aliases Go recognizes for the macintosh (Mac OS Roman) encoding, taken from the nameMap table in golang.org/x/text/encoding/htmlindex:

var nameMap = map[string]htmlEncoding{
    // ...
    "csmacintosh":         macintosh,
    "mac":                 macintosh,
    "macintosh":           macintosh,
    "x-mac-roman":         macintosh,
    // ...
}

Since macroman is not among those aliases, the lookup fails. You can supply your own alias list through the decoder's CharsetReader function field by setting

    decoder.CharsetReader = charsetReader

where charsetReader is:

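import (
    "io"
    "strings"

    "golang.org/x/net/html/charset"
    "golang.org/x/text/encoding/charmap"
    "golang.org/x/text/transform"
)

// charsetReader decodes input as Mac OS Roman when label is one of macNames.
// Any other label is passed to charset.NewReaderLabel, which returns an error
// for labels it does not recognize, instead of passing the bytes through
// undecoded.
func charsetReader(label string, input io.Reader) (io.Reader, error) {
    if isCharsetMacintosh(label) {
        return transform.NewReader(input, charmap.Macintosh.NewDecoder()), nil
    }
    return charset.NewReaderLabel(label, input)
}

// macNames lists the labels treated as Mac OS Roman: the four standard
// aliases plus the non-standard "macroman".
var macNames = []string{
    "macroman",
    "csmacintosh",
    "mac",
    "macintosh",
    "x-mac-roman",
}

// isCharsetMacintosh reports whether label names Mac OS Roman, ignoring case.
func isCharsetMacintosh(label string) bool {
    for _, n := range macNames {
        if strings.EqualFold(label, n) {
            return true
        }
    }
    return false
}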

If you need more information, the answers to "Unmarshal an ISO-8859-1 XML input in Go" might be helpful. It also helped to read the source code of the charset.NewReaderLabel function and follow its function calls.

