I have a folder full of text files, and each file looks something like this:
# Mainline
apple
orange
banana
onion #small#
# lineA
orange
banana
watermelon
raisins #packed#
raisins #unpacked#
# lineB
chocolate
nuts
sugar
coffee
# lineC
lemon
honey
carrots
broccoli
All files start with # Mainline, but the order of the other sections varies. Some files are missing lineA, some are missing lineC, some have lineB before lineA, and so on.
I'm trying to extract each section (a line beginning with # plus the lines after it, up to the next such line) into its own file.
For example, file1_mainline would have:
# Mainline
apple
orange
banana
onion #small#
file1_lineA would have
# lineA
orange
banana
watermelon
raisins #packed#
raisins #unpacked#
and so on. I've tried the following:
$file = get-content "filename"
$Mainstring = "# Mainline"
$lineAString = "# lineA"
$lineBString = "# lineB"
$lineCString = "# lineC"
$MainExt = "$Mainstring(.*?)$lineAstring"
$lineAExt = "$lineAstring(.*?)$lineBstring"
$lineBExt = "$lineBstring(.*?)$lineCstring"
$lineCExt = "$lineCstring(.*)"
[regex]::Match($file,$MainExt).Groups[1].value | out-file file1_main.txt
[regex]::Match($file,$lineAExt).Groups[1].value | out-file file1_lineA.txt
[regex]::Match($file,$lineBExt).Groups[1].value | out-file file1_lineB.txt
[regex]::Match($file,$lineCExt).Groups[1].value | out-file file1_lineC.txt
Besides suspecting that there is a simpler approach to all of this, I'm running into the following problems:
- The files come from a Unix subsystem. I'm not sure whether that is the cause, but the line breaks are not preserved in the resulting files.
- The script breaks on files where the sections are not in the expected order.
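Both problems likely have a simple explanation, shown in the minimal sketch below. The sample string, the `Get-Section` helper name, and the header-matching pattern are hypothetical, chosen only for illustration. Without `-Raw`, `Get-Content` returns an array of lines, and when that array is passed where a single string is expected (as in `[regex]::Match($file, ...)`), PowerShell joins the elements with spaces (`$OFS`), so the newlines disappear before the regex ever runs. Reading with `Get-Content -Raw` keeps them. The ordering problem goes away if each match ends at the *next* header line (or the end of the text) rather than at a hard-coded following header.

```powershell
# Hypothetical sample file contents: Unix (LF) line endings, lineB before lineA.
$text = "# Mainline`napple`norange`n# lineB`nchocolate`nnuts`n# lineA`norange`nbanana`n"

# Without -Raw, Get-Content returns one string per line. Passing that array where a
# single string is expected joins the elements with spaces, so the line breaks are lost.
$lines  = $text -split "`n"   # stands in for: Get-Content "filename"
$joined = "$lines"            # roughly what [regex]::Match($file, ...) receives

# With Get-Content -Raw the file is one string and its newlines survive.
# (?s) lets . match newlines; (?m) lets ^ match at the start of every line.
# The lookahead stops each section at the next "# " header or at the end of the text,
# so the order of the sections in the file no longer matters.
function Get-Section([string] $Text, [string] $Header) {
    $pattern = '(?sm)^' + [regex]::Escape($Header) + '\b.*?(?=^# |\z)'
    [regex]::Match($Text, $pattern).Value   # includes the header line itself
}

$sectionA = Get-Section $text '# lineA'   # "# lineA`norange`nbanana`n"
$sectionB = Get-Section $text '# lineB'   # "# lineB`nchocolate`nnuts`n"
```

With a real file, `$text` would come from something like `Get-Content -Raw "filename"`, and each section could be written out with `Set-Content` or `Out-File`.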
I've searched here before, but I can't seem to put together working code. Any help is appreciated.
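A possibly simpler approach, sketched here under stated assumptions rather than as a definitive solution, avoids regex over the whole file: read each file line by line and start a new section whenever a line begins with `#` followed by a name. Because sections are discovered as they appear, missing or reordered sections need no special handling. The function name `Split-Sections` and the `<basename>_<section>.txt` naming scheme are hypothetical.

```powershell
# Hypothetical helper: split one file into per-section files.
# Assumptions: a section starts at any line beginning with "#" followed by a name
# (e.g. "# lineA"); output files are named <input base name>_<section name>.txt.
function Split-Sections {
    param(
        [Parameter(Mandatory)] [string] $Path,    # input text file
        [Parameter(Mandatory)] [string] $OutDir   # existing folder for the output files
    )
    $base = [System.IO.Path]::GetFileNameWithoutExtension($Path)
    # Resolve to a full path: .NET file APIs don't know PowerShell's current location.
    $outDirFull = (Resolve-Path -LiteralPath $OutDir).ProviderPath
    $sections = [ordered]@{}   # section name -> lines of that section, in file order
    $current = $null

    # Get-Content splits on both LF and CRLF, so Unix files are read correctly.
    foreach ($line in Get-Content -LiteralPath $Path) {
        if ($line -match '^#\s*(\S+)') {
            # Header line: start a new section, wherever it appears in the file.
            $current = $Matches[1]
            $sections[$current] = [System.Collections.Generic.List[string]]::new()
        }
        # Lines before the first header (if any) are ignored.
        if ($null -ne $current) { $sections[$current].Add($line) }
    }

    foreach ($name in $sections.Keys) {
        $outPath = Join-Path $outDirFull "${base}_$name.txt"
        # Join with LF and write UTF-8 without BOM to keep Unix-style line breaks.
        [System.IO.File]::WriteAllText($outPath, (($sections[$name] -join "`n") + "`n"))
    }
}
```

To process the whole folder, something like `Get-ChildItem -Path .\input -Filter *.txt -File | ForEach-Object { Split-Sections -Path $_.FullName -OutDir .\output }` should work, where `.\input` and `.\output` are placeholder folders and `.\output` must already exist. Keeping the output folder separate prevents the newly written files from being picked up by the same `Get-ChildItem` call. Note that the section name is taken verbatim from the header, so `# Mainline` produces `file1_Mainline.txt`.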
Question from:
https://stackoverflow.com/questions/65918904/powershell-extract-by-pattern