Your suggested method is probably not a good way to do this. If:
- you have access to grep
- your version of grep supports Perl-compatible regex (PCRE)
- this style of div only wraps your data, not other elements
- the 'data' div does not contain other divs
Then you can use:
(?s)<div style="float:left; padding-top:5px;">.*?</div>
The important parts of this are:
(?s) activates DOTALL, which means that . will match newlines.
.*? matches the contents of the div reluctantly, which means it'll stop at the first </div> it finds.
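To see why the reluctant quantifier matters, here is a small sketch that runs a reluctant and a greedy version of the pattern over the same input and counts the matches. The sample HTML and the variable names are made up for illustration, and it assumes GNU grep built with PCRE support.

```shell
# Hypothetical sample: two data divs (the first spans two lines) with other markup between them.
html='<div style="float:left; padding-top:5px;">one
two</div><p>skip</p><div style="float:left; padding-top:5px;">three</div>'

lazy='(?s)<div style="float:left; padding-top:5px;">.*?</div>'
greedy='(?s)<div style="float:left; padding-top:5px;">.*</div>'

# With -z every match grep prints is terminated by a NUL byte,
# so counting NUL bytes counts the matches.
lazy_count=$(printf '%s' "$html" | grep -Pzo "$lazy" | tr -cd '\0' | wc -c)
greedy_count=$(printf '%s' "$html" | grep -Pzo "$greedy" | tr -cd '\0' | wc -c)

echo "reluctant matches: $lazy_count"
echo "greedy matches: $greedy_count"
```

The reluctant pattern finds each div separately, while the greedy one swallows everything from the first opening tag to the last </div> as a single match.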
To use this, you'll need to activate a few grep options:
grep -Pzo "$PATTERN" file
For these:
-P activates PCRE.
-z replaces newline by NUL as the line terminator, so grep will treat the entire file as a single line.
-o prints only the matching parts.
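After this you'll need to strip off the divs. sed is a good tool for this. Use -E (extended regex), because in sed's default basic regex syntax ? is a literal character, not a quantifier:
sed -E 's|</?div[^>]*>||g'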
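A quick check of the stripping step on a made-up sample line might look like the sketch below. It assumes a sed that accepts -E (GNU and BSD sed both do); without -E the ? is taken literally and the tags are left in place.

```shell
# Hypothetical sample line containing one data div.
sample='<div style="float:left; padding-top:5px;">some data</div>'

# Extended regex: </?div[^>]*> matches both the opening and the closing tag.
stripped=$(printf '%s\n' "$sample" | sed -E 's|</?div[^>]*>||g')

# Basic regex: ? is a literal question mark, so nothing matches.
untouched=$(printf '%s\n' "$sample" | sed 's|</?div[^>]*>||g')

printf 'with -E:    %s\n' "$stripped"
printf 'without -E: %s\n' "$untouched"
```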
If you put all of your files in one directory you can do the joining at the same time:
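grep -hPzo "$PATTERN" *.html | tr '\0' '\n' | sed -E 's|</?div[^>]*>||g' > out.html
Here -h suppresses the file name prefix that grep adds when given several files, and tr turns the NUL terminators that -z puts on the output back into newlines.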
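Putting the pieces together, an end-to-end run could look roughly like this. The file names, their contents and the output name joined.txt are invented for the example, and it again assumes GNU grep with PCRE support and a sed that accepts -E.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical input pages; the div in a.html spans two lines.
printf '%s\n' '<html><body>' \
  '<div style="float:left; padding-top:5px;">alpha' \
  'beta</div>' '<p>ignored</p>' '</body></html>' > a.html
printf '%s\n' '<html><body>' \
  '<div style="float:left; padding-top:5px;">gamma</div>' \
  '</body></html>' > b.html

PATTERN='(?s)<div style="float:left; padding-top:5px;">.*?</div>'

# -h drops the file name prefix, tr turns the NUL terminators into newlines,
# and sed -E removes the opening and closing div tags.
grep -hPzo "$PATTERN" a.html b.html \
  | tr '\0' '\n' \
  | sed -E 's|</?div[^>]*>||g' > joined.txt

cat joined.txt
```

The output goes to a name that does not end in .html, so a later run with *.html would not pick up its own result as input.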