Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

java - jsoup Malformed URL when reading URLs from a CSV file using OpenCSV

I am getting a "Malformed URL" error from jsoup. If I hardcode the URL into the program it works fine, but if I read a CSV file into a List<String[]> and then loop over the values in the list, it fails. For example, hardcoding http://www.clubmark.org.uk/ works, but the same URL read from the CSV into the List<String[]> fails.

The stack trace is:

Exception in thread "restartedMain" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.boot.devtools.restart.RestartLauncher.run(RestartLauncher.java:49)
Caused by: java.lang.IllegalArgumentException: Malformed URL: http://www.clubmark.org.uk/
    at org.jsoup.helper.HttpConnection.url(HttpConnection.java:131)
    at org.jsoup.helper.HttpConnection.connect(HttpConnection.java:70)
    at org.jsoup.Jsoup.connect(Jsoup.java:73)
    at com.domainModel.DownloadImages.findImages(DownloadImages.java:43)
    at com.workingprojects.WebScraperApplication.main(WebScraperApplication.java:40)
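The message prints a URL that looks valid, which suggests the String jsoup received contains a character that does not print. A minimal sketch, assuming the culprit is an invisible leading U+FEFF (byte order mark) and that jsoup's check fails for the same reason `java.net.URL` does: the hypothetical class `BomUrlDemo` tries to parse the hardcoded URL and the same text with a BOM prepended.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class BomUrlDemo {

    // Returns true if java.net.URL accepts the string, false if it throws
    // MalformedURLException. No network connection is made.
    static boolean isParsableUrl(String candidate) {
        try {
            new URL(candidate);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String clean = "http://www.clubmark.org.uk/";
        // Same text preceded by U+FEFF (a byte order mark), which prints as nothing.
        String withBom = "\uFEFF" + clean;

        System.out.println(clean + " -> " + isParsableUrl(clean));
        System.out.println(withBom + " -> " + isParsableUrl(withBom));
    }
}
```

On most consoles both printed URLs look identical, but only the first reports `true`: with the BOM in front, the text before `:` no longer starts with a letter, so `java.net.URL` finds no protocol and throws `MalformedURLException`.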

My main class is

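package com.workingprojects;

import java.io.IOException;
import java.util.List;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.domain.EntityScan;
import org.springframework.context.annotation.ComponentScan;

import com.domainModel.DownloadImages;
import com.domainModel.ReadCSV;
import com.opencsv.exceptions.CsvException;

@SpringBootApplication
@EntityScan({"com.bootstrap", "com.domainModel"})
@ComponentScan({"com.bootstrap", "com.domainModel"})
public class WebScraperApplication {

    public static void main(String[] args) throws IOException, CsvException {
        SpringApplication.run(WebScraperApplication.class, args);

        DownloadImages downloadImages = new DownloadImages();

        ReadCSV readCSV = new ReadCSV();
        // Backslashes in Windows paths must be escaped in Java string literals
        List<String[]> urls = readCSV.csvReader("C:\\link1.csv");

        // Processes only the first row (i < 1)
        for (int i = 0; i < 1; i++) {
            String[] thisURLObject = urls.get(i);
            String thisURL = thisURLObject[0];
            String status = downloadImages.findImages(thisURL, "C:\\Users\\xxx\\images");
            System.out.println(thisURL + status);
        }

        System.out.println("finished");
    }
}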
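The loop in `main` only ever processes the first row. A sketch of collecting the first cell of every row instead, using a hypothetical helper `firstColumnUrls`. Note that `String.trim()` removes spaces, tabs and line breaks but not U+FEFF, so on its own it does not remove a byte order mark.

```java
import java.util.ArrayList;
import java.util.List;

public class UrlRows {

    // Hypothetical helper: collects the first cell of every CSV row,
    // trimming ordinary whitespace and skipping rows whose first cell is empty.
    static List<String> firstColumnUrls(List<String[]> rows) {
        List<String> urls = new ArrayList<>();
        for (String[] row : rows) {
            if (row.length == 0) {
                continue; // a row with no cells at all
            }
            String cell = row[0].trim();
            if (!cell.isEmpty()) {
                urls.add(cell);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"http://www.clubmark.org.uk/", ""});
        rows.add(new String[] {" http://www.designit-uk.com/ ", ""});
        for (String url : firstColumnUrls(rows)) {
            System.out.println("[" + url + "]");
        }
    }
}
```

In `main`, this would replace the indexed loop with `for (String url : UrlRows.firstColumnUrls(urls)) { ... }`.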

The class that downloads the images, and where the error is thrown, is:

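package com.domainModel;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import lombok.Getter;
import lombok.Setter;
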
public class DownloadImages {

    // The URL of the website
    @Getter @Setter
    private String webSiteURL;

    // The path of the folder that you want to save the images to
    @Getter @Setter
    private String folderPath;

    public String findImages(String webSiteURL, String folderPath) {

        try {
            // Connect to the website and get the HTML
            Document doc = Jsoup.connect(webSiteURL).get();

            // Get all elements with an img tag
            Elements img = doc.getElementsByTag("img");
            System.out.println("Images is " + img.size());

            // Build a folder name from the URL, e.g. "www.clubmark.org.uk/"
            String folderNameWk2 = webSiteURL.replace(".html", "");
            String folderNameWk3 = folderNameWk2.replace("http://", "");

            // Paths.get(parent, child) inserts the separator that plain
            // string concatenation (folderPath + folderNameWk3) would omit
            Path path = Paths.get(folderPath, folderNameWk3);
            Files.createDirectories(path);
            String path1 = path.toString();
            System.out.println("The path is " + path1);

            int counter = 0;

            for (Element el : img) {
                String docName = counter + ".jpeg";

                // For each element, get the absolute src URL
                String src = el.absUrl("src");

                System.out.println("Image Found!");
                System.out.println("src attribute is : " + src);
                getImages(src, path1, docName);

                counter = counter + 1;
            }

        } catch (IOException ex) {
            System.err.println("There was an error");
            System.out.println(ex);
            // Logger.getLogger(DownloadImages.class.getName()).log(Level.SEVERE, null, ex);
        }

        return "complete";
    }



    private void getImages(String src, String folderPath, String docName) throws IOException {

        // Extract the name of the image from the src attribute,
        // ignoring a trailing slash (lastIndexOf can never equal length(),
        // so the trailing slash is at length() - 1)
        String trimmed = src.endsWith("/") ? src.substring(0, src.length() - 1) : src;
        String name = trimmed.substring(trimmed.lastIndexOf("/") + 1);

        System.out.println(name);

        // Open a URL stream and copy it to the output file; try-with-resources
        // closes both streams even if an exception is thrown
        URL url = new URL(src);
        try (InputStream in = url.openStream();
             OutputStream out = new BufferedOutputStream(
                     new FileOutputStream(folderPath + "/" + docName))) {
            for (int b; (b = in.read()) != -1;) {
                out.write(b);
            }
        }
    }

    /**
     * @param webSiteURL
     * @param folderPath
     */
    public DownloadImages(String webSiteURL, String folderPath) {
        super();
        this.webSiteURL = webSiteURL;
        this.folderPath = folderPath;
    }

    /**
     * 
     */
    public DownloadImages() {
        super();
    }
    
    
}
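`findImages` builds the folder name by string replacement and `getImages` derives the file name by hand. An alternative sketch that uses `java.net.URI` to take the host as the folder name and the last path segment as the file name. The class and method names are hypothetical, and it assumes http(s) URLs with a host that are syntactically valid URIs (`URI.create` throws `IllegalArgumentException` otherwise, for example on unencoded spaces).

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImagePaths {

    // Hypothetical helper: uses the host of the site URL as the folder name,
    // so "http://www.clubmark.org.uk/" maps to <baseFolder>/www.clubmark.org.uk
    static Path siteFolder(String baseFolder, String siteUrl) {
        String host = URI.create(siteUrl).getHost();
        return Paths.get(baseFolder, host);
    }

    // Hypothetical helper: returns the last path segment of an image URL,
    // ignoring one trailing slash, e.g. ".../img/logo.png" -> "logo.png"
    static String lastSegment(String src) {
        String path = URI.create(src).getPath();
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(siteFolder("images", "http://www.clubmark.org.uk/"));
        System.out.println(lastSegment("http://www.clubmark.org.uk/img/logo.png"));
    }
}
```

Unlike `replace("http://", "")`, taking the host also handles `https://` URLs and drops the trailing slash.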


And the class that reads the CSV file is:

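package com.domainModel;

import java.io.FileReader;
import java.io.IOException;
import java.util.List;

import com.opencsv.CSVReader;
import com.opencsv.exceptions.CsvException;

public class ReadCSV {

    public List<String[]> csvReader(String fileName) throws IOException, CsvException {
        try (CSVReader reader = new CSVReader(new FileReader(fileName))) {
            List<String[]> r = reader.readAll();
            return r;
        }
    }
}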
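If the CSV was saved as UTF-8 with a byte order mark (Notepad can save in this encoding), then when the file is decoded as UTF-8, Java keeps the BOM as a U+FEFF character at the start of the first cell of the first row. A minimal sketch, assuming that is the cause: a hypothetical `skipBom` helper wraps any `Reader` in a `PushbackReader` and discards one leading U+FEFF.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

public class BomSkippingReader {

    private static final int BOM = '\uFEFF';

    // Wraps a Reader and drops a single leading U+FEFF (byte order mark), if present.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pushback = new PushbackReader(in, 1);
        int first = pushback.read();
        if (first != -1 && first != BOM) {
            pushback.unread(first); // not a BOM: put the character back
        }
        return pushback;
    }

    // Reads everything from a Reader into a String (used here only for the demo).
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        for (int n; (n = reader.read(buf)) != -1; ) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String csv = "\uFEFFhttp://www.clubmark.org.uk/,\nhttp://www.designit-uk.com/,\n";
        String cleaned = readAll(skipBom(new StringReader(csv)));
        System.out.println(cleaned.startsWith("http"));
    }
}
```

In `ReadCSV`, the reader could then be created as `new CSVReader(BomSkippingReader.skipBom(Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)))`. Passing the charset explicitly avoids depending on the platform default charset that `FileReader(String)` uses before Java 18.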


I am reasonably certain the issue is with the format of what I am passing from the list, but when I inspect the values they look like ordinary Strings.
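One way to check whether the value from the list really matches the hardcoded one is to compare them and report every character that is not visible ASCII. A minimal sketch; the class `RevealHidden` and the sample string with a leading U+FEFF are hypothetical stand-ins for the `thisURL` value read from the CSV.

```java
public class RevealHidden {

    // Reports every character outside visible ASCII (0x21-0x7E), so spaces,
    // tabs, a byte order mark (U+FEFF) or a non-breaking space (U+00A0) all show up.
    static String describeHidden(String s) {
        StringBuilder report = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x21 || c > 0x7E) {
                report.append(String.format("index %d: U+%04X%n", i, (int) c));
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String fromCsv = "\uFEFFhttp://www.clubmark.org.uk/"; // hypothetical CSV value
        String hardcoded = "http://www.clubmark.org.uk/";
        System.out.println("equal? " + fromCsv.equals(hardcoded));
        System.out.println("lengths: " + fromCsv.length() + " vs " + hardcoded.length());
        System.out.print(describeHidden(fromCsv));
    }
}
```

For these sample strings it prints `equal? false`, `lengths: 28 vs 27` and `index 0: U+FEFF`, even though both URLs look the same when printed.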

The first two rows of the CSV file:

http://www.clubmark.org.uk/,
http://www.designit-uk.com/,

[Image: the first two rows of the CSV file as shown in Notepad]


1 Reply

Waiting for an expert to answer.


...