regex - PowerShell extract by pattern

Question

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I have a folder full of text files, and each file looks something like this:

# Mainline
apple
orange
banana
onion #small#

# lineA
orange
banana
watermelon
raisins #packed#
raisins #unpacked#

# lineB
chocolate
nuts
sugar
coffee

# lineC
lemon
honey
carrots
broccoli

Every file starts with # Mainline, but the other sections do not appear in a fixed order. Some files are missing lineA, some are missing lineC, some have lineB before lineA, and so on.

I'm trying to see if I can extract the text between each of the lines beginning with # and make them their own file.

i.e., file1_mainline would have

# Mainline
apple
orange
banana
onion #small#

file1_lineA would have

# lineA
orange
banana
watermelon
raisins #packed#
raisins #unpacked#

and so on. I've tried using

$file = get-content "filename"
$Mainstring = "# Mainline"
$lineAString = "# lineA"
$lineBString = "# lineB"
$lineCString = "# lineC"

$MainExt = "$Mainstring(.*?)$lineAstring"
$lineAExt = "$lineAstring(.*?)$lineAstring"
$lineBExt = "$lineBstring(.*?)$lineCstring"
$lineCExt = "$lineCstring(.*)"
[regex]::Match($file,$MainExt).Groups[1].value | out-file file1_main.txt
[regex]::Match($file,$lineAExt).Groups[1].value | out-file file1_lineA.txt
[regex]::Match($file,$lineBstring).Groups[1].value | out-file file1_lineB.txt
[regex]::Match($file,$lineCstring).Groups[1].value | out-file file1_lineC.txt

There may well be a simpler approach to all of this, but with the code above I'm running into the following problems:

The files come from a Unix subsystem. I'm not sure whether that is the cause, but the line breaks are not preserved in the resulting files.
The script breaks on files where the sections are not in the expected order.

I've searched here extensively, but I can't put together working code. Any help is appreciated.

question from:https://stackoverflow.com/questions/65918904/powershell-extract-by-pattern

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:08:52+0000

Why not make it totally generic? Don't search for specific blocks; just handle the blocks as they appear, regardless of order. If the input is as you describe, scan through each file line by line and pick out the lines starting with #, then use the rest of that line to build the output filename. Write every following line to that file until you hit the next # line. Something like this:

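# Split each .txt file in the current folder into one file per "# <name>" block.
foreach ($file in (Get-ChildItem *.txt)) {
    $lines = Get-Content $file.FullName
    $filename = $null
    foreach ($line in $lines) {
        # A header line starts with "# "; the rest of the line names the block.
        if ($line -match '^# (?<name>.*)') {
            # e.g. file1.txt + "# lineA" -> file1_lineA.txt, next to the source file
            $filename = Join-Path $file.DirectoryName "$($file.BaseName)_$($Matches.name.Trim())$($file.Extension)"
        }
        # Append the line (header included) to the current block's file.
        # Add-Content appends, so delete old output files before running again.
        if ($filename) {
            Add-Content $filename $line
        }
    }
}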
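
Both problems from the question can also be handled while staying with regex. Get-Content without -Raw returns an array of lines, and passing that array to [regex]::Match converts it to a single string joined with spaces, which is why the line breaks disappeared. The following is a minimal sketch, not the approach above: it assumes PowerShell 5.0 or later (for Set-Content -NoNewline), and the function name Split-HashBlock and the "split" output folder are hypothetical names chosen for illustration.

```powershell
# Hypothetical helper (not from the answer above): split one file's text into
# its "# <name>" blocks, in whatever order they appear.
function Split-HashBlock {
    param([string]$Text)

    # (?m) makes ^ match at the start of every line; the lookahead (?=# ) splits
    # *before* each header line, so the header stays with its block.
    foreach ($chunk in ($Text -split '(?m)^(?=# )')) {
        # Without (?m), ^ matches only at the start of the chunk.
        if ($chunk -match '^# (?<name>[^\r\n]*)') {
            [pscustomobject]@{ Name = $Matches.name.Trim(); Text = $chunk }
        }
    }
}

# Assumed layout: source .txt files in the current folder, output written to a
# hypothetical "split" subfolder so a second run does not re-read its own output.
$outDir = New-Item -ItemType Directory -Path .\split -Force
foreach ($file in (Get-ChildItem *.txt)) {
    # -Raw returns one string with the original line breaks intact.
    foreach ($block in (Split-HashBlock -Text (Get-Content -Raw $file.FullName))) {
        $target = Join-Path $outDir.FullName "$($file.BaseName)_$($block.Name)$($file.Extension)"
        # -NoNewline avoids appending an extra line break to the block text.
        Set-Content -Path $target -Value $block.Text -NoNewline
    }
}
```

Because the split happens before every line that starts with "# ", a missing or reordered block needs no special handling, and inline markers such as "onion #small#" are left alone since they do not start a line.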
