Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

MATLAB: Simple string analysis - Find locations

Here I have an example of a piece of literature that I would like to do a simple analysis on. Notice the different sections:

str =   "Random info - at beginning-man. "+ ...
        "Random info still continues. "+ ...
        "CHAPTER 1. " + ...
        "Random info in middle one, "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence one of verse one, "+ ...
        "This still sentence one of verse one. "+ ...
        "2 This is sentence one of verse two. "+ ...
        "This is sentence two of verse two. "+ ...
        "3 This is sentence one of verse three; "+ ...
        "this still sentence one of verse three. "+ ...
        "CHAPTER 2. " + ...
        "Random info in middle two. "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence four? "+ ...
        "2 This is sentence five, "+ ...
        "3 this still sentence five but verse three!"+ ...
        "Random info at end's end."+ ...
        "Random info still continues. ";

I'm interested in all the data that can be called "Random info in middle", which comes after a chapter name and before the first verse.

I would like to use the function "extractBetween" to extract the information found between "CHAPTER #" and "1"(First Verse).

I know how to use the function "extractBetween", but how can I determine the locations just after "CHAPTER #" and just before "1" (the first verse), for any number of chapters?

In the end I would like a result in which the random information for each chapter is allocated to its own row of a table (the original post showed an image of the desired table here).

I've tried regexp() and findstr(), but without success. Any help will be appreciated. Thanks!

1 Reply

You can use a regular expression with regexp to match the text.

[tokens, matches] = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens', 'match');

% Each element of tokens holds the two captured groups of one match
for k = 1:numel(tokens)
    fprintf('%s\t%s\n', tokens{k}(1), tokens{k}(2));
    % or: fprintf('%s\t%s\n', tokens{k});
end

This will print:

CHAPTER 1   Random info in middle one, Random info still continues. 
CHAPTER 2   Random info in middle two. Random info still continues. 
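
Since the question asks for a table rather than printed lines, the captured tokens could also be collected into one. The following is only a sketch under a few assumptions: the sample text is a shortened stand-in for the str in the question, and the column names Chapter and Info are illustrative and do not come from the original post.

```matlab
% Shortened stand-in for the text in the question (an assumption).
str = "Intro text. " + ...
      "CHAPTER 1. " + ...
      "Random info in middle one, " + ...
      "Random info still continues. " + ...
      "1 This is verse one. " + ...
      "2 This is verse two. " + ...
      "CHAPTER 2. " + ...
      "Random info in middle two. " + ...
      "1 This is sentence four? ";

% Same pattern as in the answer: capture the chapter name and the text
% up to the next "1".
tokens = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens');

% Stack the per-match token pairs into an N-by-2 string array.
parts = string(vertcat(tokens{:}));

% Column names are illustrative only; strtrim drops the trailing space.
T = table(parts(:,1), strtrim(parts(:,2)), ...
          'VariableNames', {'Chapter', 'Info'});
disp(T)
```

The call to string(...) is there so that the stacking step also works if the input text is a character vector, in which case the tokens come back as cell arrays.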

To explain the regular expression (CHAPTER \d)\.\s*(.*?)1:

  • (CHAPTER \d) matches CHAPTER followed by a space and a single digit, and the () brackets surrounding it capture the match in the tokens variable.
  • \. matches the period.
  • \s* matches any whitespace that follows (zero or more characters).
  • (.*?)1 captures any text up to the next 1 in the text. Note the question mark, which makes the match lazy; otherwise it would match all the text up to the last 1 in str.
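
The pattern above stops at the very first 1 it meets and accepts only one-digit chapter numbers, so it can cut the text short if the random info itself contains a 1. One possible variation, sketched below, uses \d+ for the chapter number and \s+1\s to stop only at a standalone 1, and asks regexp for the token positions so that extractBetween can be used as the question intended. The sample text and the variable names pattern, extents, and info are hypothetical, and the sketch still assumes the random info never contains a 1 surrounded by whitespace.

```matlab
% Hypothetical sample: the middle text contains digits and the second
% chapter number has two digits.
str = "CHAPTER 1. Written in 2021, 1st draft. " + ...
      "1 Verse one. 2 Verse two. " + ...
      "CHAPTER 12. Random info in middle two. " + ...
      "1 Verse one again. ";

% \d+ accepts multi-digit chapter numbers; \s+1\s matches only a "1"
% that has whitespace on both sides.
pattern = '(CHAPTER \d+)\.\s*(.*?)\s+1\s';

% One cell per match; each cell is an m-by-2 matrix of [start end]
% character positions, one row per captured token.
extents = regexp(str, pattern, 'tokenExtents');

info = strings(numel(extents), 1);
for k = 1:numel(extents)
    % Row 2 belongs to the second token, the text between the chapter
    % heading and the first verse.
    info(k) = extractBetween(str, extents{k}(2,1), extents{k}(2,2));
end
disp(info)
```

extractBetween with numeric positions includes the characters at both ends, which is why the token extents can be passed to it directly.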


...