unix - how to access a specific part of data as input to AWK

Question

posted Jan 31, 2022 in Technique by 深蓝 (71.8m points)

Suppose I want to access an online dictionary and look up a specific word. I would like to have only the relevant part of the data, namely the word and its translation, as input to AWK. Any ideas?

In other words, I want only a small portion of the data on my machine. How can I avoid downloading all of it, and so save space and time? Is there any way to do this without downloading all the data to my local machine?

This question is related to my last question here.

Edit 1:

I chose a dictionary as an example because when you look up a word, it is enough to access a specific part of the data; there is no need to process all of it.

I am not an expert in programming, so I was thinking I could modify this answer to make it work (that is why I added the AWK tag again). I don't use any specific OS or tool. This is just a basic idea to see what the possibilities are, so I don't know how I can improve the tags.

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:21:31+0000

awk cannot download. You must download the file and pipe it into a command that terminates as soon as it finds a result:

wget -qqO- http://example.com/path | grep -wim1 "word"

wget -qqO- URL prints nothing except the content of the given URL, which goes to standard output so you can parse it. grep -wim1 "word" finds the first line containing "word" as a whole word and then terminates. If you don't need the match printed, you can use -wiq instead. If the dictionary has one word per line (and nothing else), you're better off with -x instead of -w so that you match "can" in its entirety rather than "can't" (' is a word boundary). Remove the -i if you want a case-sensitive match.
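As a rough illustration of the difference between -w and -x, here is a minimal sketch that runs against a small made-up word list instead of a real download. The list contents and variable names are assumptions for the example only; in practice the input would come from wget -qO- URL. The last command shows one way to do the same early-exit lookup in awk, since the question asked about it.

```shell
# Hypothetical word list; in practice this would be the piped output of
#   wget -qO- URL
list=$(printf '%s\n' apple "can't" can water)

# -w: "can" matches inside "can't" because the apostrophe is a word boundary.
w_match=$(printf '%s\n' "$list" | grep -wim1 "can")

# -x: the whole line must equal the pattern.
x_match=$(printf '%s\n' "$list" | grep -xim1 "can")

# awk version: print the first line that equals the word, then exit at once.
awk_match=$(printf '%s\n' "$list" | awk -v w="can" 'tolower($0) == w { print; exit }')

echo "$w_match / $x_match / $awk_match"
```

With this sample list, -w stops at "can't" while -x and the awk version both stop at "can".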

In the comments, you asked:

it may improve to jump to the start of the "w" character, maybe, so as not to download the whole data from "a" to "w". is it possible? I guess not

Some programs can "resume" downloads and you may be able to play with that, but you'd have to guess where to start. This would be a lot of work and you might seek too far and therefore fail to get a match.
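To make the risk concrete, here is a small sketch that simulates starting at a byte offset using a local file rather than a real server. The word list and the offset of 12 bytes are made up for the example. Against a server that supports range requests, the equivalent fetch would be something like curl -s -r 12- URL, but whether a given dictionary server honours ranges is an assumption you would have to check.

```shell
# Hypothetical sorted word list standing in for the remote dictionary.
words=$(mktemp)
printf '%s\n' apple can "can't" water wax zebra > "$words"

# Skip the first 12 bytes, as a ranged or resumed download would.
# The offset lands mid-line here, so drop the first (partial) line.
partial=$(tail -c +13 "$words" | tail -n +2)

# An entry after the offset is still found.
hit=$(printf '%s\n' "$partial" | grep -xm1 "water")

# An entry before the offset is lost: we seeked too far for "can".
miss=$(printf '%s\n' "$partial" | grep -xm1 "can" || true)

rm -f "$words"
echo "hit=$hit miss=$miss"
```

The second lookup comes back empty even though the word is in the list, which is exactly the failure described above when the guessed offset is too large.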

If you are querying this dictionary more than once, I'd recommend downloading it and saving it so you can query it locally. Even the largest dictionary I know of is only 213MB (compressed, search with zgrep), though I am assuming you're talking about a traditional word list rather than a hash table or other arbitrary data form. Of course, anything larger would take so long to download that you'd only want to do it once.
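A minimal sketch of the download-once, query-locally approach follows. It builds a tiny compressed word list in a temporary directory in place of a real download; the file name dict.gz and its contents are assumptions for the example. The one-time fetch would look something like wget -qO dict.gz URL, assuming the server offers a gzip-compressed file.

```shell
# Hypothetical local copy; in practice, save the download once, e.g.
#   wget -qO dict.gz URL
dir=$(mktemp -d)
printf '%s\n' apple "can't" can water | gzip > "$dir/dict.gz"

# Decompress to a pipe and stop at the first exact, case-insensitive match.
# zgrep wraps this same decompress-and-grep step.
found=$(gzip -dc "$dir/dict.gz" | grep -xim1 "CAN")

rm -rf "$dir"
echo "$found"
```

Every later lookup then reads only the local file, so the network cost is paid once.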

If you really don't want to store it locally, you should probably consider a database rather than a flat file.
