I am processing a large CSV file whose fields are enclosed in double quotes. Some of the text descriptions contain unescaped double quotes, which I need to replace with escaped (doubled) double quotes. I have tried the following regex: (?<!^|",)("(?:$[^"])|"(?!,"|$))
It finds the unescaped quotes except when they are followed by a line break. Any help in resolving this would be gratefully received.
I know the CSV is incorrectly formatted, but unfortunately I have no control over how it is produced, so I need to correct the formatting before further processing.
Example:
"Field 1","Field 2","Field 3 "with unescaped quote"
followed by line break","Field 4"
Needs to become:
"Field 1","Field 2","Field 3 ""with unescaped quote""
followed by line break","Field 4"
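To make the failure easy to reproduce, here is a minimal sketch that applies the regex from above to the two physical lines of the example record separately, the way a line-by-line reader would deliver them. The variable names `$pattern` and `$lines` are only for illustration.

```powershell
# The regex from the question, unchanged (single quotes keep the $ literal).
$pattern = '(?<!^|",)("(?:$[^"])|"(?!,"|$))'

# The example record split at its embedded line break,
# i.e. what a line-by-line reader hands to the regex.
$lines = @(
    '"Field 1","Field 2","Field 3 "with unescaped quote"',
    'followed by line break","Field 4"'
)

# Apply the replacement to each physical line on its own.
foreach ($line in $lines) {
    $line -replace $pattern, '""'
}
```

On the first line the quote before `with` is doubled, but the quote after `quote` is left alone: once the line break has been stripped, that quote sits at the end of the string, so `$` matches there and the regex treats it as a field terminator.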
The PowerShell script I'm using is as follows:
# Build the output path: same folder and extension, file name suffixed with a timestamp.
[string]$path = 'C: ...'
[string]$directory = [System.IO.Path]::GetDirectoryName($path);
[string]$strippedFileName = [System.IO.Path]::GetFileNameWithoutExtension($path);
[string]$extension = [System.IO.Path]::GetExtension($path);
[string]$newFileName = $strippedFileName + [DateTime]::Now.ToString("yyyyMMdd-HHmmss") + $extension;
[string]$newFilePath = [System.IO.Path]::Combine($directory, $newFileName);
# Second constructor argument: detect the encoding from byte order marks.
$reader = New-Object 'System.IO.StreamReader'($path, $true);
$regex = [regex] '(?<!^|",)("(?:$[^"])|"(?!,"|$))'
$writer = [System.IO.StreamWriter] $newFilePath;
try {
    # ReadLine() returns one physical line at a time with the line break removed,
    # so the regex never sees a quote together with the line break that follows it.
    while ($null -ne ($line = $reader.ReadLine())) {
        $newline = $line -replace $regex, '""';
        $writer.WriteLine($newline);
    }
}
finally {
    $reader.Close();
    $writer.Close();
}
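For comparison, here is a minimal sketch of one possible alternative: run the replacement over the whole text at once, so that the line break is visible to the lookarounds. This is not a definitive fix. It assumes the file fits in memory, that every field is quoted, that no quotes in the input are already escaped (an existing `""` would be doubled again), and that no field text contains the sequence `","`. The function name `Repair-CsvQuote` and the pattern are hypothetical, not taken from the script above.

```powershell
# Hypothetical helper: doubles every quote that does not look like a field delimiter.
function Repair-CsvQuote {
    param([string]$Text)

    # A quote is treated as a delimiter (and left alone) when it is
    #   - preceded by: start of text, `",`, or a closing quote plus line break
    #   - followed by: `,"`, a line break plus opening quote, or end of text
    $pattern = '(?<!\A|",|"\r?\n)"(?!,"|\r?\n"|\r?\n?\z)'
    return [regex]::Replace($Text, $pattern, '""')
}

# The example record from the question, with an explicit CRLF inside field 3.
$sample = '"Field 1","Field 2","Field 3 "with unescaped quote"' + "`r`n" +
          'followed by line break","Field 4"'

Repair-CsvQuote -Text $sample

# Applied to a file (reads everything into memory, so only for files that fit):
# $text = [System.IO.File]::ReadAllText($path)
# [System.IO.File]::WriteAllText($newFilePath, (Repair-CsvQuote -Text $text))
```

The difference from the line-by-line version is that a quote followed by a line break only counts as the end of a record when the next line starts with a quote. In the sample, the line after the break starts with `followed`, so the quote after `quote` is doubled.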
question from:
https://stackoverflow.com/questions/65951075/powershell-regex-replace-unescaped-double-quote-followed-by-line-break