Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Haskell: looking for a function like break that includes the last element

This should be easy, but somehow I can't find how to do it. Take the following string: "0@workspace_command-7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"

The section 0@workspace_command can be something else, but it will never contain a '-'. I want a function that returns the following result: ["0@workspace_command-","7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"]. All the functions I have found give me this result instead: ["0@workspace_command","-7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"].


1 Reply

Here is an example of the kind of approach I advocate in my comment above, to show how lightweight it can be. I'll use regex-applicative for my parser combinators; a similar approach (with some tweaks) will work with other combinator libraries. I'll also use the uuid package. So, the boilerplate:

import Data.List
import Data.UUID
import Text.Regex.Applicative
import Text.Regex.Applicative.Common

A custom-tailored type (I picked a name without really knowing what these are for):

data IndexedCmd = IndexedCmd
    { index :: Int
    , command :: String
    , uuid :: UUID
    } deriving (Eq, Ord, Read, Show)

If we can get our hands on one of these, it offers a lot of features that your break version doesn't, and it doesn't require much code to cook up. Here's how we build one. We need to build a parser for UUIDs first:

hexDigitAsChar :: RE Char Char
hexDigitAsChar = psym $ \c -> or
    [ '0' <= c && c <= '9'
    , 'a' <= c && c <= 'f'
    , 'A' <= c && c <= 'F'
    ]

parseUUID :: RE Char UUID
parseUUID = id
    . fmap read
    . sequenceA
    . intercalate [sym '-']
    $ [replicate n hexDigitAsChar | n <- [8,4,4,4,12]]

Once we have that in place, our parser for IndexedCmds is short and sweet:

parseIndexedCmd :: RE Char IndexedCmd
parseIndexedCmd = pure IndexedCmd
    <*> decimal <* sym '@'
    <*> many anySym <* sym '-'
    <*> parseUUID

That's the whole development. It's a bit longer than the other answers, but it also does a lot more, including a lot of work that you would probably want to do anyway even if you had the exact variant of break that you want. For example, it extracts a structured representation of the 0@workspace_command- prefix; and it checks that the UUID is in the right format, a task that is so annoying to do with bare Data.List functions that I would probably tire of writing the code and skip it entirely if I were doing this without parser combinators.
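
For comparison, here is roughly what the exact variant of break from the question could look like using only base. This is a minimal sketch: breakAfter is a name I made up for illustration, not a function from any library, and it returns a pair (like break does) rather than the two-element list shown in the question.

```haskell
-- | Like 'break', but the first element satisfying the predicate is kept
-- at the end of the first component instead of the start of the second.
-- NOTE: 'breakAfter' is a hypothetical name, not a function from base.
breakAfter :: (a -> Bool) -> [a] -> ([a], [a])
breakAfter p xs = case break p xs of
    -- A separator was found: move it onto the end of the prefix.
    (before, sep : after) -> (before ++ [sep], after)
    -- No element satisfied the predicate: behave exactly like 'break'.
    (before, [])          -> (before, [])
```

> breakAfter (== '-') "0@workspace_command-7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"
("0@workspace_command-","7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c")

Note that this does no validation at all: the second component is whatever follows the first dash, UUID or not.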

We can now use match to parse a single string if we want:

> match parseIndexedCmd "0@workspace_command-7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"
Just (IndexedCmd {index = 0, command = "workspace_command", uuid = 7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c})

As a side bonus, we can now even handle commands which have dashes in them, an effect that would be very tedious indeed to replicate using break as our primitive:

> match parseIndexedCmd "0@workspace-command-7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"
Just (IndexedCmd {index = 0, command = "workspace-command", uuid = 7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c})

We can also continue our development and embed this parser in a larger one for entire files full of these strings or as part of some other structured file format.
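
One simple way to handle a whole file, short of writing a larger combinator parser, is to run a per-line parser over every line. The following is a sketch under that assumption; parseLines is a hypothetical helper of my own, not part of regex-applicative, and it is written against any String -> Maybe a function so it only needs base.

```haskell
-- | Apply a per-line parser to every non-empty line of some file contents.
-- On failure, returns the 1-based line number and text of the first line
-- that did not parse.
-- NOTE: 'parseLines' is a hypothetical helper, not a library function.
parseLines :: (String -> Maybe a) -> String -> Either (Int, String) [a]
parseLines parseLine =
    traverse parseOne          -- stops at the first Left
    . filter (not . null . snd) -- skip blank lines, keeping original numbering
    . zip [1 ..]
    . lines
  where
    parseOne (n, l) = maybe (Left (n, l)) Right (parseLine l)
```

With the parser above, the per-line function would be match parseIndexedCmd, for example parseLines (match parseIndexedCmd) <$> readFile "commands.txt", where the file name is only a placeholder.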

