Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

regex - PowerShell extract by pattern

I have a folder full of text files, and each file looks something like this:

# Mainline
apple
orange
banana
onion #small#

# lineA
orange
banana
watermelon
raisins #packed#
raisins #unpacked#

# lineB
chocolate
nuts
sugar
coffee

# lineC
lemon
honey
carrots
broccoli

All files start with # Mainline, but the order of the other sections varies: some files are missing lineA, some are missing lineC, some have lineB before lineA, and so on.

I'm trying to extract the text under each line that begins with # (up to the next such line) and save each block as its own file.

For example, file1_mainline would contain:

# Mainline
apple
orange
banana
onion #small#

file1_lineA would contain:

# lineA
orange
banana
watermelon
raisins #packed#
raisins #unpacked#

and so on. I've tried this:

$file = get-content "filename"
$Mainstring = "# Mainline"
$lineAString = "# lineA"
$lineBString = "# lineB"
$lineCString = "# lineC"

$MainExt = "$Mainstring(.*?)$lineAstring"
$lineAExt = "$lineAstring(.*?)$lineBstring"
$lineBExt = "$lineBstring(.*?)$lineCstring"
$lineCExt = "$lineCstring(.*)"
[regex]::Match($file,$MainExt).Groups[1].value | out-file file1_main.txt
[regex]::Match($file,$lineAExt).Groups[1].value | out-file file1_lineA.txt
[regex]::Match($file,$lineBExt).Groups[1].value | out-file file1_lineB.txt
[regex]::Match($file,$lineCExt).Groups[1].value | out-file file1_lineC.txt

There may be a simpler way to handle all of this, but with my approach I'm running into two problems:

  1. The files come from a Unix subsystem. I'm not sure whether that's the cause, but the line breaks are not preserved in the resulting files.
  2. The script breaks on files where the sections are not in the expected order.
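Both problems most likely come from how the text reaches the regex. A short sketch, assuming PowerShell 5.1 or later and using made-up sample text in place of a real file: Get-Content without -Raw returns an array of lines, and when that array is passed to [regex]::Match, PowerShell converts it to one string joined with $OFS (a single space by default), which is where the line breaks go. Reading with Get-Content -Raw keeps them, and a pattern that ends each section at the next line starting with "# " (rather than at a hard-coded next header) no longer depends on section order.

```powershell
# Without -Raw, Get-Content returns an array; converting an array to a
# string joins the elements with $OFS (a space), dropping the line breaks
$joined = [string]@('# Mainline', 'apple', 'orange')

# Hypothetical stand-in for: $text = Get-Content 'filename' -Raw
$text = "# Mainline`napple`norange`n`n# lineB`nnuts`nsugar`n"

# (?m): ^ matches at the start of every line
# (?s): . also matches newlines
# Each match starts at a '# name' header and lazily runs up to the next
# header line or the end of the text, so section order does not matter
$pattern = '(?ms)^# (?<name>\S+).*?(?=^# |\z)'
$sections = [regex]::Matches($text, $pattern)

foreach ($m in $sections) {
    # Each $m.Value could be written to e.g. "file1_<name>.txt" with Set-Content
    '{0}: {1} line(s)' -f $m.Groups['name'].Value, ($m.Value.TrimEnd() -split "`n").Count
}
```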

I've searched here before, but I can't manage to put together working code. Any help is appreciated.

Question from: https://stackoverflow.com/questions/65918904/powershell-extract-by-pattern

1 Reply

Why not make it fully generic? Rather than searching for specific blocks, handle the blocks in whatever order they appear. If the input is as you describe, scan through the file line by line, pick out the lines starting with #, and use the rest of that line to build the output filename. Then write every following line to that file until you hit the next # line. Something like this:

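# Split every .txt file in the current folder into one file per '# name' section
foreach ($file in (Get-ChildItem *.txt)) {
    $c = Get-Content $file.FullName
    $filename = $null   # no output file until the first '# ' header is seen
    foreach ($line in $c) {
        # A header line starts a new output file: <basename>_<name><extension>
        if ($line -match '^# (?<name>.*)') {
            $filename = Join-Path $file.DirectoryName "$($file.BaseName)_$($Matches.name)$($file.Extension)"
        }
        # Append the header and every following line to the current output file
        if ($filename) {
            Add-Content $filename $line
        }
    }
}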
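A variant of the same line-by-line approach, sketched under a few assumptions (PowerShell 5.1 or later, a throwaway temp folder, and made-up sample lines standing in for the real files): it collects each section's lines first and then writes every output file once with Set-Content. Because Add-Content appends, running the loop above twice would duplicate lines in the output files, and a second run would also pick up the generated *_name.txt files through *.txt. Overwriting with Set-Content avoids the first issue; writing the output to a separate folder would avoid the second.

```powershell
# Hypothetical sample input in a temporary folder (stands in for the real files)
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$src = Join-Path $dir 'file1.txt'
Set-Content -LiteralPath $src -Value @('# Mainline', 'apple', 'onion #small#', '', '# lineB', 'nuts', 'sugar')

$file = Get-Item -LiteralPath $src

# Section name -> lines of that section (header included), in file order
$sections = [ordered]@{}
$current = $null   # lines before the first '# ' header are ignored

foreach ($line in Get-Content -LiteralPath $file.FullName) {
    if ($line -match '^# (?<name>.*)') {
        $current = $Matches.name
        # A repeated header keeps appending to the same section
        if (-not $sections.Contains($current)) {
            $sections[$current] = [System.Collections.Generic.List[string]]::new()
        }
    }
    if ($null -ne $current) {
        $sections[$current].Add($line)
    }
}

# Write each section exactly once, overwriting output from earlier runs
foreach ($name in $sections.Keys) {
    $out = Join-Path $file.DirectoryName "$($file.BaseName)_$name$($file.Extension)"
    Set-Content -LiteralPath $out -Value $sections[$name].ToArray()
}
```

To run it on real data, replace the sample setup with a loop over Get-ChildItem *.txt, as in the answer's code.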
