Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

java - Reading a website's contents into a string

I'm currently working on a class that reads the contents of the website at a given URL. I'm just beginning my adventures with java.io and java.net, so I'd like some feedback on my design.

Usage:

TextURL url = new TextURL(urlString);
String contents = url.read();

My code:

package pl.maciejziarko.util;

import java.io.*;
import java.net.*;

public final class TextURL
{
    private static final int BUFFER_SIZE = 1024 * 10;
    private static final int ZERO = 0;
    private final byte[] dataBuffer = new byte[BUFFER_SIZE];
    private final URL urlObject;

    public TextURL(String urlString) throws MalformedURLException
    {
        this.urlObject = new URL(urlString);
    }

    public String read() 
    {
        final StringBuilder sb = new StringBuilder();

        try
        {
            final BufferedInputStream in =
                    new BufferedInputStream(urlObject.openStream());

            int bytesRead = ZERO;

            while ((bytesRead = in.read(dataBuffer, ZERO, BUFFER_SIZE)) >= ZERO)
            {
                sb.append(new String(dataBuffer, ZERO, bytesRead));
            }
        }
        catch (UnknownHostException e)
        {
            return null;
        }
        catch (IOException e)
        {
            return null;
        }

        return sb.toString();
    }

    //Usage:
    public static void main(String[] args)
    {
        try
        {
            TextURL url = new TextURL("http://www.flickr.com/explore/interesting/7days/");
            String contents = url.read();

            if (contents != null)
                System.out.println(contents);
            else
                System.out.println("ERROR!");
        }
        catch (MalformedURLException e)
        {
            System.out.println("Check the URL!");
        }
    }
}

My question is: is this a good way to achieve what I want? Are there any better solutions?

I particularly dislike the statement sb.append(new String(dataBuffer, ZERO, bytesRead)), but I couldn't find another way to express it. Is it good to create a new String on every iteration? I suppose not.
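Besides the extra allocation, converting each chunk to a String separately has a correctness problem: a multi-byte character can be split across two reads, and each half then decodes to garbage. The sketch below illustrates this under stated assumptions: the class name ChunkDecodingDemo, the explicit UTF-8 charset (the question's new String(...) call uses the platform default charset instead), and the tiny 3-byte chunk size are all hypothetical choices made so the split is easy to reproduce.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkDecodingDemo {
    // Mimics the question's loop: every chunk is turned into a String on its own.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Collects all chunks as raw bytes first and decodes them once at the end.
    static String decodeOnce(byte[] data, int chunkSize) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            baos.write(data, off, len);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The string encodes to 4 UTF-8 bytes: 'z', 'o', then two bytes for the accented e.
        byte[] data = "zo\u00e9".getBytes(StandardCharsets.UTF_8);
        // With 3-byte chunks, the two bytes of the accented e land in different chunks.
        System.out.println(decodePerChunk(data, 3).equals("zo\u00e9")); // false
        System.out.println(decodeOnce(data, 3).equals("zo\u00e9"));     // true
    }
}
```

The per-chunk version replaces the split character with U+FFFD replacement characters, while accumulating bytes and decoding once keeps it intact.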

Any other weak points?

Thanks in advance!

1 Reply

Consider using URLConnection instead. You might also want to use IOUtils from Apache Commons IO to make reading the stream into a string easier. For example:

URL url = new URL("http://www.example.com/");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
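// Note: getContentEncoding() returns the Content-Encoding header (e.g. "gzip"), not the charset.
// The charset is a parameter of Content-Type, e.g. "text/html; charset=UTF-8", so parse it from there.
String contentType = con.getContentType();
int idx = contentType == null ? -1 : contentType.toLowerCase().indexOf("charset=");
String encoding = idx < 0 ? "UTF-8" : contentType.substring(idx + "charset=".length()).split(";")[0].trim().replace("\"", "");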
String body = IOUtils.toString(in, encoding);
System.out.println(body);

If you don't want to use IOUtils, I'd probably replace the IOUtils.toString line above with something like:

ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[8192];
int len = 0;
while ((len = in.read(buf)) != -1) {
    baos.write(buf, 0, len);
}
String body = new String(baos.toByteArray(), encoding);
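Putting these pieces together, here is a minimal JDK-only sketch (no Commons IO) that parses the charset out of Content-Type, closes the stream with try-with-resources (the question's code never closes its stream), and decodes the collected bytes once. The class and method names (TextFetcher, charsetOf, readAll, fetch), the UTF-8 fallback, and the 10-second timeouts are illustrative assumptions, not part of the original answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public final class TextFetcher {
    private TextFetcher() {}

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=UTF-8"; returns the fallback if it is absent or unknown.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream into memory and decodes it once, so multi-byte
    // characters are never split across buffer boundaries.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int len;
        while ((len = in.read(buf)) != -1) {
            baos.write(buf, 0, len);
        }
        return new String(baos.toByteArray(), charset);
    }

    // Fetches a URL as text; the timeout values are arbitrary example choices.
    static String fetch(String url) throws IOException {
        URLConnection con = new URL(url).openConnection();
        con.setConnectTimeout(10_000);
        con.setReadTimeout(10_000);
        Charset charset = charsetOf(con.getContentType(), StandardCharsets.UTF_8);
        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = con.getInputStream()) {
            return readAll(in, charset);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("http://www.example.com/"));
    }
}
```

Unlike the question's read(), this version lets IOException propagate instead of returning null, so callers can tell what went wrong. On Java 9 and later, the manual copy loop in readAll could be replaced with in.transferTo(baos).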

...