Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

mysql - How to fix badly formatted JSON in PHP?

I'm getting a data feed which is in JSON format and the only available format. In PHP, I'm using json_decode to decode the JSON, but it was breaking, and I found out that the JSON was generated in some places with double quotes in a person's nick name. I verified this using: http://jsonformatter.curiousconcept.com

I don't have control over the creation of the data, but I have to deal with this broken format when it occurs. This data after it's parsed will be put into a MySQL TABLE.

For example:

"contact1": "David "Dave" Letterman",

json_decode would return a NULL. If I manually saved the file, and changed it to single quotes around the nickname of Dave, then everything worked.

$json_string = file_get_contents($json_download);
$json_array = json_decode($json_string, true);

How do I fix the broken JSON format in json_string before it gets processed by json_decode? What should be done to pre-process the file, backslash the double quotes of the nickname? Or change them to single quotes? Is it even a good idea to store double quotes like this in MySQL?

I don't know when this might occur with each data feed, so I don't want to just check for contact1 if it has inner double quotes to fix them. Is there a way in PHP to take a line such as the above example, and backslash everything after the colon except the outer double quotes? Thanks!

This is the correct code for as provided by tftd:

<?php
// This:
// "contact1": "David "Dave" Letterman",
// Needs to look like this to be decoded by JSON:
// "contact1": "David "Dave" Letterman",

$data ='"contact1": "David "Dave" Letterman",';
function replace($match){
    $key = trim($match[1]);
    $val = trim($match[2]);

    if($val[0] == '"')
        $val = '"'.addslashes(substr($val, 1, -1)).'"';
    else if($val[0] == "'")
        $val = "'".addslashes(substr($val, 1, -1))."'";

    return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
$json_array = json_decode($preg);
var_dump($json_array);
echo $json_array . "
";
echo $preg . "
";
?>

Here is the output:

string(39) ""contact1": "David "Dave" Letterman","
NULL

"contact1": "David "Dave" Letterman",
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I have a own jsonFixer() function - it works in two steps: removing garbage (for equality of incoherent formatting) and reformatting.

<?php
  function jsonFixer($json){
    $patterns     = [];
    /** garbage removal */
    $patterns[0]  = "/([s:,{}[]])s*'([^:,{}[]]*)'s*([s:,{}[]])/"; //Find any character except colons, commas, curly and square brackets surrounded or not by spaces preceded and followed by spaces, colons, commas, curly or square brackets...
    $patterns[1]  = '/([^s:,{}[]]*){([^s:,{}[]]*)/'; //Find any left curly brackets surrounded or not by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[2]  =  "/([^s:,{}[]]+)}/"; //Find any right curly brackets preceded by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[3]  = "/(}),s*/"; //JSON.parse() doesn't allow trailing commas
    /** reformatting */
    $patterns[4]  = '/([^s:,{}[]]+s*)*[^s:,{}[]]+/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets followed by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[5]  = '/["']+([^"':,{}[]]*)["']+/'; //Find one or more of quotation marks or/and apostrophes surrounding any character except colons, commas, curly and square brackets...
    $patterns[6]  = '/(")([^s:,{}[]]+)(")(s+([^s:,{}[]]+))/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by quotation marks followed by one or more spaces and  one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[7]  = "/(')([^s:,{}[]]+)(')(s+([^s:,{}[]]+))/"; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by apostrophes followed by one or more spaces and  one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[8]  = '/(})(")/'; //Find any right curly brackets followed by quotation marks...
    $patterns[9]  = '/,s+(})/'; //Find any comma followed by one or more spaces and a right curly bracket...
    $patterns[10] = '/s+/'; //Find one or more spaces...
    $patterns[11] = '/^s+/'; //Find one or more spaces at start of string...

    $replacements     = [];
    /** garbage removal */
    $replacements[0]  = '$1 "$2" $3'; //...and put quotation marks surrounded by spaces between them;
    $replacements[1]  = '$1 { $2'; //...and put spaces between them;
    $replacements[2]  = '$1 }'; //...and put a space between them;
    $replacements[3]  = '$1'; //...so, remove trailing commas of any right curly brackets;
    /** reformatting */
    $replacements[4]  = '"$0"'; //...and put quotation marks surrounding them;
    $replacements[5]  = '"$1"'; //...and replace by single quotation marks;
    $replacements[6]  = '\$1$2\$3$4'; //...and add back slashes to its quotation marks;
    $replacements[7]  = '\$1$2\$3$4'; //...and add back slashes to its apostrophes;
    $replacements[8]  = '$1, $2'; //...and put a comma followed by a space character between them;
    $replacements[9]  = ' $1'; //...and replace by a space followed by a right curly bracket;
    $replacements[10] = ' '; //...and replace by one space;
    $replacements[11] = ''; //...and remove it.

    $result = preg_replace($patterns, $replacements, $json);

    return $result;
  }
?>

Example of usage:

<?php
  // Received badly formatted json:
  // {"contact1": "David "Dave" Letterman", price : 30.00, 'details' : "Greatest 'Hits' Album"}
  $json_string = '{"contact1": "David "Dave" Letterman", price : 30.00, 'details' : "Greatest 'Hits' Album"}';
  jsonFixer($json_string);
?>

Will result:

{"contact1": "David "Dave" Letterman", "price" : "30.00", "details" : "Greatest 'Hits' Album"}

Note: this wasn't tested with all possible badly formatted JSON strings but I use on a complex multi level JSON string and is working well until then.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...