For those who just want an answer, here it is:
ls | sort -n -t _ -k 2 | tail -1
Here's the thought process that led me here.
I'm going to assume the [RANGE] portion could be anything.
Start with what we know.
- Working Directory: /incoming/external/data
- Format of the Files: [RANGE]_[YYYYMMDD].dat
We need to find the most recent [YYYYMMDD] file in the directory, and we need to store that filename.
Available tools (I'm only listing the relevant tools for this problem ... identifying them becomes easier with practice):
I guess we don't need sed, since we can work with the entire output of ls command. Using ls, awk, sort, and tail we can get the correct file like so (bear in mind that you'll have to check the syntax against what your OS will accept):
NEWESTFILE=`ls | awk -F_ '{print $1 $2}' | sort -n -k 2,2 | tail -1`
Then it's just a matter of putting the underscore back in, which shouldn't be too hard.
EDIT: I had a little time, so I got around to fixing the command, at least for use in Solaris.
Here's the convoluted first pass (this assumes that ALL files in the directory are in the same format: [RANGE]_[yyyymmdd].dat). I'm betting there are better ways to do this, but this works with my own test data (in fact, I found a better way just now; see below):
ls | awk -F_ '{print $1 " " $2}' | sort -n -k 2 | tail -1 | sed 's/ /_/'
... while writing this out, I discovered that you can just do this:
ls | sort -n -t _ -k 2 | tail -1
I'll break it down into parts.
ls
Simple enough ... gets the directory listing, just filenames. Now I can pipe that into the next command.
awk -F_ '{print $1 " " $2}'
This is the AWK command. it allows you to take an input line and modify it in a specific way. Here, all I'm doing is specifying that awk should break the input wherever there is an underscord (_). I do this with the -F option. This gives me two halves of each filename. I then tell awk to output the first half ($1), followed by a space (" ")
, followed by the second half ($2). Note that the space was the part that was missing from my initial suggestion. Also, this is unnecessary, since you can specify a separator in the sort command below.
Now the output is split into [RANGE] [yyyymmdd].dat on each line. Now we can sort this:
sort -n -k 2
This takes the input and sorts it based on the 2nd field. The sort command uses whitespace as a separator by default. While writing this update, I found the documentation for sort, which allows you to specify the separator, so AWK and SED are unnecessary. Take the ls and pipe it through the following sort:
sort -n -t _ -k 2
This achieves the same result. Now you only want the last file, so:
tail -1
If you used awk to separate the file (which is just adding extra complexity, so don't do it sheepish), you can replace the space with an underscore again with sed:
sed 's/ /_/'
Some good info here, but I'm sure most people aren't going to read down to the bottom like this.