Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
197 views
in Technique[技术] by (71.8m points)

c# - .Net Parse blob data from MySQL dump file

There is a MySQL dump and I need to get an image from it. I have written a small text parser to parse this file, but I have a problem with the encoding that I cannot solve in any way. Here is a snippet of the MySQL dump: mysql dump

I pasted it as a screenshot because if you copy and paste here, the bytes (0xFF,0xD8,0xFF,0xE0) in the image change to strange characters (???JFIF). Here is the code snippet where I am trying to process the image:

    List<ImageRecord> ImagesList = new List<ImageRecord>();
    private void Parse(byte[] sqlFile)
    {
        var sql = Encoding.UTF8.GetString(sqlFile);
        string strStart = @"INSERT INTO `images` VALUES (";
        string strEnd = @"');"; 
        int Start = sql.IndexOf(strStart, 0) + strStart.Length;
        int End = sql.IndexOf(strEnd, Start);
        var value = sql.Substring(Start, End - Start);
        var valueslist = value.Split("'),('"); 
        foreach (var imagedata in valueslist)
        {
            ImageRecord cfg = new ImageRecord(imagedata);
            this.ImagesList.Add(cfg);
        }
    }
    public class ImageRecord 
    {
        public int id { get; set; }
        public DateTime timestamp { get; set; }
        public string user { get; set; } = String.Empty;
        public byte[] imagedata { get; set; }
        public ImageRecord() { }
        public ImageRecord(string sqlpart) 
        {
            string value = sqlpart;
            if (sqlpart[0] == ''')
                value = value.Substring(1, value.Length - 1);  
            var valueslist = value.Split("', '");
            this.id = Convert.ToInt32(valueslist[0]);
            this.timestamp = Convert.ToDateTime(valueslist[1]);
            this.user = valueslist[2];
            this.imagedata = Encoding.UTF8.GetBytes(valueslist[3]);
        }
    }

I understand that the problem is that I am reading the file as UTF8 and those bytes are converted to characters, but I don't know how to do it differently. I also tried this option: get the position in the document where the image starts, go back to the file in byte representation and get the bytes, but that doesn't work because the position is not the same and the resulting file doesn't start with 0xFF, 0xD8,0xFF, 0xE0. but a little earlier (in the middle of the table description) and the file length does not match what I need. It turns out that I can only navigate through this file if I read it in UTF8, but I need to get the document fragment as it is. In this example, the image is in *.jpg format, but can be in any other format.

question from:https://stackoverflow.com/questions/65950830/net-parse-blob-data-from-mysql-dump-file

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

These are not UTF-8 bytes, so Encoding.UTF8.GetString(sqlFile) won't work; it will corrupt the data.

You will need to process the raw byte[] directly. The format is fairly simple: ' or is escaped with a ; every other byte represents itself. (And the entire BLOB is surrounded by ' bytes.)

Once you've read the file up to the opening ' before the image data, the processing loop would look something like this (pseudocode):

List<byte> output = new();
int pos = currentPos + 1; // first byte after the opening '

// read until the terminating '
while (bytes[pos] != (byte) ''')
{
    // if it's a backslash, skip it and read the next byte instead
    if (bytes[pos] == (byte) '\')
        pos++;

    // copy this byte to the output
    output.Add(bytes[pos]);
    pos++;
}
var imageData = output.ToArray();

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...