Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

Add an exclude array to an existing awk code

Given the awk code from this accepted answer:

awk '
BEGIN{
  num=split("a the to at in on with and but or",array," ")
  for(i=1;i<=num;i++){
    smallLetters[array[i]]
  }
}
/TITLE/{
  for(i=2;i<=NF;i++){
    if(tolower($i) in smallLetters){
      $i=tolower(substr($i,1,1)) substr($i,2)
    }
    else{
      if($i~/^"/){
        $i=substr($i,1,1) toupper(substr($i,2,1)) substr($i,3)
      }
      else{
        $i=toupper(substr($i,1,1)) substr($i,2)
      }
    }
  }
}
1
'  Input_file

This code capitalizes the words on every line of a file that matches some text, in this case TITLE. The idea is to use it to fix the capitalization of cue sheet files according to three basic rules:

  • Capitalize all words, with the exception of:
    • Lowercase all articles (a, the), prepositions (to, at, in, with), and coordinating conjunctions (and, but, or)
  • Capitalize the first and last word in a title, regardless of part of speech

Now I would like to modify the awk code to add a second array with a list of words to exclude, so that those words are always written exactly as they appear in that array.

This would be very useful for words like McCartney, feat., vs., CD, USA, NYC, etc. Without this exclusion array they are changed to Mccartney, Feat., Vs., Cd, Usa, Nyc, etc. The exclusion should apply even when these words are the first or last word of the TITLE, as explained in the related question.
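The core of such an exclusion list is a lookup keyed by the lowercase form of each word, with the preferred spelling stored as the value, so any casing in the input maps back to the canonical one. A minimal sketch, assuming a POSIX awk; `restore_case` is a hypothetical shell wrapper used only for illustration:

```shell
#!/bin/sh
# restore_case is a hypothetical wrapper: it prints each argument on its
# own line, rewritten to the exclusion-list spelling when it matches.
restore_case() {
  printf '%s\n' "$@" | awk '
  BEGIN {
    # Key each exception by its lowercase form; the value keeps the
    # exact spelling that should be written back.
    n = split("McCartney feat. vs. CD USA NYC", list, " ")
    for (k = 1; k <= n; k++)
      exceptions[tolower(list[k])] = list[k]
  }
  {
    key = tolower($0)
    if (key in exceptions)
      print exceptions[key]
    else
      print $0
  }'
}

restore_case mccartney FEAT. Cd pony
# McCartney
# feat.
# CD
# pony
```

Because the match is done on the lowercase key, "mccartney", "MCCARTNEY" and "McCartney" all come out as "McCartney".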

For example, with an array like this: "McCartney feat. vs. CD USA NYC" the code must convert this:

FILE "Two The Beatles Songs.wav" WAVE
  TRACK 01 AUDIO
    TITLE "dig A pony, Feat. paul mccartney"
    PERFORMER "The Beatles"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    TITLE "From Me to You"
    PERFORMER "The Beatles"
    INDEX 01 03:58:02

Into this:

FILE "Two The Beatles Songs.wav" WAVE
  TRACK 01 AUDIO
    TITLE "Dig a Pony, feat. Paul McCartney"
    PERFORMER "The Beatles"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    TITLE "From Me to You"
    PERFORMER "The Beatles"
    INDEX 01 03:58:02

Instead of this, which is what the current code produces:

FILE "Two The Beatles Songs.wav" WAVE
  TRACK 01 AUDIO
    TITLE "Dig a Pony, Feat. Paul Mccartney"
    PERFORMER "The Beatles"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    TITLE "From Me to You"
    PERFORMER "The Beatles"
    INDEX 01 03:58:02

Thank you.


1 Reply


EDIT: The OP said there could also be words like "a", so the following version handles that case too.

awk '
BEGIN{
  s1="\""    # a literal double quote, re-attached after the lookups
  num=split("McCartney feat. vs. CD USA NYC",array," ")
  for(k=1;k<=num;k++){
     temp=tolower(array[k])
     ignoreLetters[temp]=array[k]
  }
  num=split("a the to at in on with and but or",array," ")
  for(i=1;i<=num;i++){
    smallLetters[array[i]]=array[i]
  }
}
/TITLE/{
  for(i=2;i<=NF;i++){
    front=end=nothing=both=""
    if($i~/^"/ && $i!~/"$/){
      temp=tolower(substr($i,2))
      front=1
    }
    else if($i ~ /^".*"$/){
      temp=tolower(substr($i,2,length($i)-2))
      both=1
    }
    else if($i ~/"$/ && $i!~/^"/){
      temp=tolower(substr($i,1,length($i)-1))
      end=1
    }
    else{
      temp=tolower($i)
      nothing=1
    }
    if(temp in ignoreLetters){
      if(front){
         $i=s1 ignoreLetters[temp]
      }
      else if(end){
         $i=ignoreLetters[temp] s1
      }
      else if(both){
         $i=s1 ignoreLetters[temp] s1
      }
      else if(nothing){
         $i=ignoreLetters[temp]
      }
    }
    else if(temp in smallLetters){
      if(front){
         $i=s1 smallLetters[temp]
      }
      else if(end){
         $i=smallLetters[temp] s1
      }
      else if(nothing){
         $i=smallLetters[temp]
      }
      else if(both){
         $i=s1 smallLetters[temp] s1
      }
    }
    else{
      if($i~/^"/){
        $i=substr($i,1,1) toupper(substr($i,2,1)) substr($i,3)
      }
      else{
        $i=toupper(substr($i,1,1)) substr($i,2)
      }
    }
  }
}
1
'  Input_file
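The front/end/both/nothing flags above can be collapsed into two independent checks: peel off a leading quote, peel off a trailing quote, look up the bare word, then glue the quotes back on. The following is a sketch of that idea, not the answer's code; `fix_word` is a hypothetical name, and only the exclusion list is applied here to keep it short:

```shell
#!/bin/sh
# fix_word is a hypothetical wrapper: it applies the exclusion list to each
# argument while preserving any surrounding double quotes.
fix_word() {
  printf '%s\n' "$@" | awk '
  BEGIN {
    n = split("McCartney feat. vs. CD USA NYC", list, " ")
    for (k = 1; k <= n; k++)
      keep[tolower(list[k])] = list[k]
  }
  {
    w = $0; lead = trail = ""
    # Two independent checks cover the front, end, both and nothing cases.
    if (w ~ /^"/) { lead = "\""; w = substr(w, 2) }
    if (w ~ /"$/) { trail = "\""; w = substr(w, 1, length(w) - 1) }
    key = tolower(w)
    if (key in keep)
      w = keep[key]
    print lead w trail
  }'
}

fix_word '"nyc' 'mccartney"' '"cd"' 'pony,'
# "NYC
# McCartney"
# "CD"
# pony,
```

A word quoted on both sides, such as "cd", passes through both checks, so no separate "both" branch is needed.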


Could you please try the following.

awk '
BEGIN{
  s1="\""    # a literal double quote, re-attached after the lookups
  num=split("McCartney feat. vs. CD USA NYC",array," ")
  for(k=1;k<=num;k++){
     temp=tolower(array[k])
     ignoreLetters[temp]=array[k]
  }
  num=split("a the to at in on with and but or",array," ")
  for(i=1;i<=num;i++){
    smallLetters[array[i]]=array[i]
  }
}
/TITLE/{
  for(i=2;i<=NF;i++){
    front=end=nothing=""
    if($i~/^"/){
      temp=tolower(substr($i,2))
      front=1
    }
    else if($i ~/"$/){
      temp=tolower(substr($i,1,length($i)-1))
      end=1
    }
    else{
      temp=tolower($i)
      nothing=1
    }
    if(temp in ignoreLetters){
      if(front){
         $i=s1 ignoreLetters[temp]
      }
      else if(end){
         $i=ignoreLetters[temp] s1
      }
      else if(nothing){
         $i=ignoreLetters[temp]
      }
    }
    else if(tolower($i) in smallLetters){
      $i=tolower(substr($i,1,1)) substr($i,2)
    }
    else{
      if($i~/^"/){
        $i=substr($i,1,1) toupper(substr($i,2,1)) substr($i,3)
      }
      else{
        $i=toupper(substr($i,1,1)) substr($i,2)
      }
    }
  }
}
1
'  Input_file

The output will be as follows:

FILE "Two The Beatles Songs.wav" WAVE
  TRACK 01 AUDIO
TITLE "Dig a Pony, feat. Paul McCartney"
    PERFORMER "The Beatles"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
TITLE "From Me to You"
    PERFORMER "The Beatles"
    INDEX 01 03:58:02
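The TITLE lines in this output lose their leading spaces because assigning to any field ($i) makes awk rebuild $0 from the fields joined with OFS, which discards the original whitespace. One way to keep it, sketched here under the assumption that the indentation consists only of spaces and tabs, is to save it with match() before the loop and prepend it afterwards. `keep_indent` is a hypothetical name, and its loop is only a stand-in for the real capitalization logic:

```shell
#!/bin/sh
keep_indent() {
  awk '
  /TITLE/ {
    # Save the leading whitespace before any field is modified.
    match($0, /^[ \t]*/)
    indent = substr($0, 1, RLENGTH)
    # Stand-in for the real capitalization loop.
    for (i = 2; i <= NF; i++)
      $i = toupper(substr($i, 1, 1)) substr($i, 2)
    # $0 was rebuilt with OFS, so put the saved indentation back.
    $0 = indent $0
  }
  1'
}

printf '    TITLE dig a pony\n  TRACK 02 AUDIO\n' | keep_indent
#     TITLE Dig A Pony
#   TRACK 02 AUDIO
```

The same three lines (match, substr, prepend) can be dropped into the /TITLE/ block of either version above.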

What the code takes care of:

  • It converts the listed small words (articles, prepositions, conjunctions) to lowercase.
  • It writes the words from the exclusion list exactly as the OP spelled them in the question.
  • It capitalizes the first letter of every remaining field that does NOT fall into either of the categories above.
  • It also handles words that start or end with a double quote ("): it strips the quote first to check whether the word is in one of the user-supplied arrays, then adds it back in its original position.
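The question's third rule (first and last word always capitalized) can also be enforced by position rather than by whether a word carries a quote: on a cue sheet TITLE line, field 2 is the first title word and field NF the last. A minimal sketch, assuming the quoted title is the last thing on the line; `title_case` is a hypothetical name and the exclusion list is left out to keep it short:

```shell
#!/bin/sh
title_case() {
  awk '
  BEGIN {
    n = split("a the to at in on with and but or", list, " ")
    for (k = 1; k <= n; k++)
      small[list[k]]
  }
  /TITLE/ {
    for (i = 2; i <= NF; i++) {
      w = $i; lead = trail = ""
      if (w ~ /^"/) { lead = "\""; w = substr(w, 2) }
      if (w ~ /"$/) { trail = "\""; w = substr(w, 1, length(w) - 1) }
      # Field 2 is the first title word and NF the last; rule 3 says
      # they are capitalized even when they are small words.
      if ((tolower(w) in small) && i != 2 && i != NF)
        w = tolower(w)
      else
        w = toupper(substr(w, 1, 1)) substr(w, 2)
      $i = lead w trail
    }
  }
  1'
}

echo 'TITLE "a day in the life"' | title_case
# TITLE "A Day in the Life"
```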


...