Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
161 views
in Technique[技术] by (71.8m points)

perl - read xml file without any XML module

I am trying to read a XML form using Perl but I can not use any XML modules like XML::Simple, XML::Parse.

It is a simple XML form which has some basic information and a MS Doc attachment. I want to read this XML and download this attached Doc file then print the XML information in the screen.

But I don't know any way how I can do this without a XML module, I heard that XML file can be parse using Data::Dumper but I am not familiar with this module, so not getting how to do this.

Could you please help me on this if there is any way to do this without a XML modules?

Sample XML:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I'd like to re-iterate that this is a BAD IDEA. Because whilst XML looks like plain text - it's isn't plain text. And if you treat it as such, you are creating brittle, unmaintainable and unsupportable code, which may well break one day, because someone changes the XML format in a valid way.

I would strongly suggest that your first port of call is go back to your project, and point out how parsing XML without an XML parser is rather like trying to use a hammer to put screws into a piece of wood. In that it sort of works, but the results are rather shoddy, and frankly it's completely unnecessary because screwdrivers exist and they do the job properly, easily and are widely available.

E.g.

can you tell me how I can print the author, title and price for each book id for the above XML file with a XML module ?

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;
my $twig = XML::Twig -> new -> parsefile ( 'your_file.xml' );
foreach my $book ( $twig -> get_xpath ( '//book' ) ) {
    print join ("
", 
         $book -> att('id'),
         $book -> field('author'),
         $book -> field('title'),
         $book -> field('price'), ),"
----
";
}

However:

Given your very specific sample, you may be able to get away with treating it as 'plain text'. Before you do this, you should point out to your project lead that this is a risky approach - you're putting in screws with a hammer - and therefore creating ongoing risk of support problems, which is trivially resolved by just installing a bit of freely available, open source code.

I am only suggesting this AT ALL because I've had to deal with ludicrously unreasonable similar project demands.

Like this:

#!/usr/bin/env perl
use strict;
use warnings;

while ( <> ) {
   if ( m/<book/ ) { 
       my ( $id ) = ( m/id="(w+)"/ ); 
       print $id,"
";
   }
   if ( m/<author/ ) { 
        my ( $author ) = ( m/>(.*)</ );
        print $author,"
";
   }
}

Now, the reason this doesn't work is your sample above can be perfectly validly formatted as:

<?xml version="1.0"?>
<catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications 
      with XML.</description></book><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book></catalog>

Or

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications 
      with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
  </book>
</catalog>

Or:

<?xml version="1.0"?>
<catalog
><book
id="bk101"
><author
>Gambardella, Matthew</author><title
>XML Developer's Guide</title><genre
>Computer</genre><price
>44.95</price><publish_date
>2000-10-01</publish_date><description
>An in-depth look at creating applications 
      with XML.</description></book><book
id="bk102"
><author
>Ralls, Kim</author><title
>Midnight Rain</title><genre
>Fantasy</genre><price
>5.95</price><publish_date
>2000-12-16</publish_date><description
>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book></catalog>

Or:

<?xml version="1.0"?>

<catalog>
  <book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications 
      with XML.</description></book>
  <book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book>
</catalog>

This is why you have so many comments that say 'use a parser' - from those snippets above, the simplistic example I gave you... will only work on one and break messily on the others.

But the XML::Twig solution handles them all correctly. XML::Twig is freely available on CPAN. (There's other libraries that do the job too just as well). And it's also pre-packaged with a lot of operating systems 'default' repositories.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...