Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pagination - GitHub API - Different number of results for jq-filtered response

I am trying to use the GitHub API to build a list of well-documented, open-source Java libraries. To do so, I read through the GitHub API documentation and wrote this simple curl command:

curl -G "https://api.github.com/search/repositories?q=language:Java+stars:%3E=500+library+java+in:readme" > output1.txt

The output is a large JSON file with information about the repositories found; in this example it reported a total of 736 matches. However, the raw output is quite unreadable, so I decided to parse it with jq, which resulted in the following command:

curl -G "https://api.github.com/search/repositories?q=language:Java+stars:%3E=500+library+java+in:readme" \
  | jq '.items[] | {name, description, language, watchers_count, html_url}' > parsedOutput1.txt

After this change, instead of 736 results I got only about 30 repositories, which is not enough for my purposes.

Running the same search, language:java stars:>=500 java library in:readme, in the GitHub search box also gives 736 results. I don't know what I am doing wrong, so any help would be appreciated.
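A quick way to see the mismatch is to compare the response's total_count field, which counts every match, with the number of entries in its items array. A minimal diagnostic sketch, assuming jq is installed; count_check is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper (assumes jq is installed).
# Reads one GitHub search response from stdin and prints the total number of
# matches next to the number of items actually included in that response.
count_check() {
    jq -r '"\(.total_count) matches in total, \(.items | length) in this response"'
}
```

For example, piping the output of curl -s "https://api.github.com/search/repositories?q=language:Java+stars:%3E=500+library+java+in:readme" into count_check shows that, with the default page size, the second number never exceeds 30, however large the first one is.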


1 Reply


It was a pagination problem. As the documentation explains, the API returns only 30 items per request by default, so you need to request every page of results. I was using bash, so my code ended up like this:

    # Request each results page in turn and append its filtered items to the file
    for i in $(seq 1 34); do
        URL="https://api.github.com/search/repositories?q=language:Java+stars:%3E=500+library+java+in:readme&page=$i"
        echo "$URL"
        curl -G "$URL" \
            | jq '.items[] | {name, description, language, watchers_count, html_url}' >> parsedOutput1.txt
    done
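The loop above hard-codes 34 pages. With 736 matches and the default 30 items per page, 736 / 30 rounded up gives 25 pages, which already cover every result, and the total drifts as repositories gain or lose stars. Below is a minimal sketch, assuming curl and jq are installed, that reads total_count from the first response and derives the page count from it. It also raises per_page to 100, the largest page size the API accepts, and caps the total at 1,000 because the Search API only returns the first 1,000 results of any query, so a full run needs at most 10 requests. QUERY, PER_PAGE, FILTER, page_count and fetch_all_pages are illustrative names, not part of the GitHub API.

```shell
#!/usr/bin/env bash
# Sketch: fetch every page of a GitHub repository search without hard-coding
# the page count. Assumes curl and jq are installed; all names are illustrative.

QUERY='language:Java+stars:%3E=500+library+java+in:readme'
PER_PAGE=100   # largest page size the API accepts
FILTER='.items[]? | {name, description, language, watchers_count, html_url}'

# page_count TOTAL PER_PAGE: pages needed to cover TOTAL results, capped at
# 1000 results because the Search API only returns the first 1000 matches.
page_count() {
    local total=$1 per_page=$2 max_results=1000
    if (( total > max_results )); then
        total=$max_results
    fi
    echo $(( (total + per_page - 1) / per_page ))   # ceiling division
}

# fetch_all_pages: prints one compact JSON object per repository.
fetch_all_pages() {
    local base="https://api.github.com/search/repositories?q=${QUERY}&per_page=${PER_PAGE}"
    local first total pages page
    first=$(curl -sS "${base}&page=1") || return 1
    total=$(jq '.total_count // 0' <<< "$first")   # 0 if the field is missing
    pages=$(page_count "$total" "$PER_PAGE")
    jq -c "$FILTER" <<< "$first"                    # page 1 is already in hand
    for (( page = 2; page <= pages; page++ )); do
        curl -sS "${base}&page=${page}" | jq -c "$FILTER"
    done
}
```

Running fetch_all_pages > parsedOutput1.txt writes one repository per line; unlike the >> in the original loop, it also replaces any output left over from a previous run.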

On another note, if you make a lot of requests you should authenticate; otherwise you will soon get the "API rate limit exceeded" error, because unauthenticated clients get a much lower rate limit.
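A minimal sketch of an authenticated request, assuming a personal access token stored in an environment variable named GITHUB_TOKEN; both that variable name and the github_get helper are hypothetical. GitHub accepts the token in an Authorization header, and without it the request is treated as unauthenticated.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for GET requests to the GitHub API.
# Assumes a personal access token in the GITHUB_TOKEN environment variable;
# when it is unset or empty, the request is sent without credentials.
github_get() {
    local url=$1
    local -a headers=(-H "Accept: application/vnd.github+json")
    if [[ -n "${GITHUB_TOKEN:-}" ]]; then
        headers+=(-H "Authorization: Bearer ${GITHUB_TOKEN}")
    fi
    curl -sS "${headers[@]}" "$url"
}
```

For example, github_get "https://api.github.com/rate_limit" | jq '.resources.search' shows how many search requests remain in the current window, and the same helper can stand in for the plain curl call in the loop above.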

