[CLOSED]Date Stamp - Strip from String.

January 04, 2017, 04:07 PM

Hello.. I have a project where I need to strip a date or date time from a file name. The date/date-time can have multiple formats and appear anywhere within the file name. It is not always proceeded with a unique delimiter. The file name is pretty free form. Is there a function(s) available that will help to accomplish this. I've searched and some appear to want a unique delimiter or specific placement, etc:

Samples of file names:

abc_201612.txt
12-12345_abc_20161201.txt
120116_abc_company.txt
abc_xyz_company_20120601_040801.txt

As you can see the date can be in any format, with our without a time stamp, etc. These are only a few examples.

Any help would be appreaciated.

This message has been edited. Last edited by: Tamra,

January 04, 2017, 04:24 PM

Squatch

Some more details would be helpful.

How will you be automating this project? A computer program or scripting language on a particular operating system? Or using WebFOCUS somehow?

And are you intending to rename these files on whatever medium they are stored?

January 04, 2017, 04:37 PM

isjhb

This will be a report run via Report Caster. The goal of the report is to join 2/3 files using only this file name. The problem being that one file may/may not have some additional date/time information included in the name and the file name in the other file will not have the date information in it.

Currently, the file name fields on all files are striped of the .txt extension or .txt.pgp extension and upper cased - creating a new field: updated_file_nm_a, updated_file_nm_b. Then I need to join the files on these newly created fields. Additional processing needs to be done to this newly defined field to strip out the dates. I'm not trying to rename a file being stored. Basically, it can be just considered a basic string not a file name.

January 04, 2017, 04:50 PM

eric.woerle

So basically, you were told "We will create our file names however we want and you must psychically know which ones go together"... I think I would consult the nearest crystal ball on this one...

I guess the best thing I could think of would be to combine files based on whether they Contained "abc" or some other common word in the file. Other then that, if you are on linux, I would consider maybe creating a shell script that would look for numbers in groups of 6 or 8 and validate whether those numbers constitute a valid date. But something like this sounds about as unrealistic as it gets...

January 04, 2017, 04:52 PM

isjhb

Pretty much exactly what they are wanting. When you talk about combining files based on the whether they 'contain' something else I was trying to figure out how to do that. Currently I'm doing a join so either they match or not. Do you have a sample of how you would go about they other.

January 04, 2017, 05:15 PM

David Briars

Here is an interesting discussion on finding a 'date' in a string on Stack Overflow.

http://stackoverflow.com/quest...rbitrary-date-string

Perhaps, if nothing else, it will generate some ideas.

January 05, 2017, 08:30 AM

Squatch

quote:
Originally posted by isjhb:
When you talk about combining files based on the whether they 'contain' something else I was trying to figure out how to do that.

I'm still not totally sure about what you are trying to accomplish, but if you want to combine files together based on common character strings in the file names, you can do that in WebFOCUS. I did it for a project I worked on early last year.

WebFOCUS can combine individual files to make it appear as if all the data in the individual files is actually one big "virtual" database table. You just need to specify what directory the files are in and what common string of characters the file names share.

January 05, 2017, 08:53 AM

isjhb

As, I said. I have two files. They come from two separate processes. My user is wanting those joined/matched on file name. Problem is on one file there maybe be a data/date-time stamp also within that file name.

Ex:

File 1 Sample:

File Name Date Flag1 Field 1 Field 2, Etc
abc_test_company.txt 01/01/2016 Y 20 West
123-5678_xyz_co_20160101.txt 01/01/2016 N 10 North
201601_def_company.txt 01/01/2016 N 20 North
abc_light_co_20160101_040400.txt 01/01/2016 Y 20 South
12-6789_bank_co20160101.txt 01/01/2016 N 10 West

File 2 Sample:

File Name Date Flag6 Field 3 Field 4, Etc
abc_test_company.txt 01/01/2016 Y 100.00 West
123-5678_xyz_co.txt 01/01/2016 N 100.00 North
def_company.txt 01/01/2016 N 200.00 North
abc_light_co.txt 01/01/2016 Y 200.00 South

So as you can see from the examples above the first 4 files appear to be a match with the exception of the date/date-time that has been used in the file name. I need some way to join/match/merge, etc to get them joined. I need to use something more like a 'CONTAINS' or 'LIKE' but not sure how to do that when trying to join the 2 files. When these 2 files join they appear on different lines because obviously they didn't match.

January 05, 2017, 09:04 AM

Squatch

Perhaps you could create a DEFINE for the file name field and remove all characters that are not upper- or lower-case letters? Then JOIN on that?

I don't know if that is possible, but it is the only thing that comes to mind right now. Obviously, the situation you are in right now is not a good one.

January 05, 2017, 09:19 AM

isjhb

eric.woerle

Do you happen to have a sample of how you would go about combining the files based on the 'CONTAINS' as you suggested. Even once these files are joined, the records are on different lines.

January 05, 2017, 09:56 AM

Francis Mariani

I thought I understood what your problem is, but now I don't think so.

You need to join two files, where the file name is unpredictable? Why should this matter to the JOIN?

Is it the data IN THE FILES that you're having trouble with? If so, then it appears you have a problem.

The cleanest way I can think of to solve this problem is to create hold files and isolate the real file name data from the 'cruddy' file name data. Then the join should be easy.

Do you know from which date onward you might find a date buried in the file name data? Having a few 'known knowns' would be helpful. It looks like you may have two situations: 1) a form of date trails the last underscore; 2) a form of date is at the beginning. It looks like if a date is included in the file name data, it is in YYYYMMDD or YYYYMM format. It looks like the date in the file name data is some form of the date in the date column. If these are the known knowns, then it should be easy to isolate the real file name in the file name data. THEN do the JOIN.

January 05, 2017, 02:40 PM

Francis Mariani

Based on some assumptions, here's some code. The hardest part was removing what looks like time from the file name data. The date conversion might be a bit clumsy. The FILE_NAMEA...FILE_NAMEF DEFINE columns is where the file name data is manipulated, first to remove the date in YYYYMMDD format, then to remove the date in YYYYMM format, then to clean up. The last bit is to remove the time.

Hopefully this gives you a couple of ideas.

SET ASNAMES=ON
SET HOLDLIST=PRINTONLY
SET HOLDFORMAT=ALPHA
-RUN

-* Create the master for file1 ----------------------------------------
APP FI DATAMAST DISK file1.mas
-RUN

-WRITE DATAMAST FILE=FILE1, SUFFIX=FIX, $
-WRITE DATAMAST SEGNAME=SEG1, SEGTYPE=S0, $
-WRITE DATAMAST FIELD=FILE_NAME , ALIAS=FILE_NAME , USAGE=A32, ACTUAL=A32, $
-WRITE DATAMAST FIELD=FILE_DATE1, ALIAS=FILE_DATE1, USAGE=A10, ACTUAL=A10, $
-WRITE DATAMAST FIELD=FLAG1     , ALIAS=FLAG1     , USAGE=A01, ACTUAL=A01, $
-WRITE DATAMAST FIELD=FIELD1    , ALIAS=FIELD1    , USAGE=A10, ACTUAL=A10, $
-WRITE DATAMAST FIELD=FIELD2    , ALIAS=FIELD2    , USAGE=A10, ACTUAL=A10, $



-* Create the data file for file1 --------------------------------------
FILEDEF FILE1 DISK file1.ftm
-RUN

-WRITE FILE1 abc_test_company.txt            01/01/2016Y20        West
-WRITE FILE1 123-5678_xyz_co_20160101.txt    01/01/2016N10        North
-WRITE FILE1 201601_def_company.txt          01/01/2016N20        North
-WRITE FILE1 abc_light_co_20160101_040400.txt01/01/2016Y20        South
-WRITE FILE1 12-6789_bank_co20160101.txt     01/01/2016N10        West



-* Create the master for file2 ----------------------------------------
APP FI DATAMAST DISK file2.mas
-RUN

-WRITE DATAMAST FILE=FILE2, SUFFIX=FIX, $
-WRITE DATAMAST SEGNAME=SEG1, SEGTYPE=S0, $
-WRITE DATAMAST FIELD=FILE_NAME , ALIAS=FILE_NAME , USAGE=A32, ACTUAL=A32, $
-WRITE DATAMAST FIELD=FILE_DATE2, ALIAS=FILE_DATE2, USAGE=A10, ACTUAL=A10, $
-WRITE DATAMAST FIELD=FLAG6     , ALIAS=FLAG6     , USAGE=A01, ACTUAL=A01, $
-WRITE DATAMAST FIELD=FIELD3    , ALIAS=FIELD3    , USAGE=A10, ACTUAL=A10, $
-WRITE DATAMAST FIELD=FIELD4    , ALIAS=FIELD4    , USAGE=A10, ACTUAL=A10, $



-* Create the data file for file2 --------------------------------------
FILEDEF FILE2 DISK file2.ftm
-RUN

-WRITE FILE2 abc_test_company.txt            01/01/2016Y100.00    West
-WRITE FILE2 123-5678_xyz_co.txt             01/01/2016N100.00    North
-WRITE FILE2 def_company.txt                 01/01/2016N200.00    North
-WRITE FILE2 abc_light_co.txt                01/01/2016Y200.00    South



DEFINE FILE FILE1
FILE_DATE_MDYYA/A8MDYY  = EDIT(FILE_DATE1, '99$99$9999');
FILE_DATE_MDYY/MDYY     = FILE_DATE_MDYYA;
FILE_DATE_YYMDA/A8YYMD  = FILE_DATE_MDYY;
FILE_DATE_YYMA/A6YYM    = FILE_DATE_MDYY;

FILE_NAMEA/A32 = STRREP (32, FILE_NAME , 8, FILE_DATE_YYMDA, 0, '', 32, 'A32');
FILE_NAMEB/A32 = STRREP (32, FILE_NAMEA, 6, FILE_DATE_YYMA , 0, '', 32, 'A32');
FILE_NAMEC/A32 = STRREP (32, FILE_NAMEB, 2, '_.', 1, '.', 32, 'A32');
FILE_NAMED/A32 = STRREP (32, FILE_NAMEC, 2, '__', 1, '_', 32, 'A32');
FILE_NAMEE/A32 = IF FILE_NAMED LIKE '\_%' ESCAPE '\' THEN SUBSTR(32, FILE_NAMED, 2, 32, 32, 'A32') ELSE FILE_NAMED;
FILE_NAMEE_REV/A32 = LJUST(32, REVERSE(32, FILE_NAMEE, 'A32'), 'A32');
FILE_NAMEE_PAT/A32 = PATTERN(32, FILE_NAMEE_REV, 'A32' );
SUBS/A11 = SUBSTR(32, FILE_NAMEE_PAT, 1, 11, 11, 'A11');
FILE_NAMEF_REV/A32 = IF SUBSTR(32, FILE_NAMEE_PAT, 1, 11, 11, 'A11') EQ 'aaa.999999_' THEN 'txt.' || SUBSTR(32, FILE_NAMEE_REV, 12, 28, 28, 'A28') ELSE FILE_NAMEE_REV;
FILE_NAMEF/A32 = LJUST(32, REVERSE(32, FILE_NAMEF_REV, 'A32'), 'A32');
END
-RUN



TABLE FILE FILE1
PRINT
FILE_DATE1
FLAG1
FIELD1
FIELD2
-*FILE_NAMEB
-*FILE_NAMEC
-*FILE_NAMED
-*FILE_NAMEE
-*FILE_NAMEE_REV
-*FILE_NAMEE_PAT
-*FILE_NAMEF_REV
-*FILE_NAMEF
SUBS
BY FILE_NAMEF AS 'FILE_NAME'

ON TABLE HOLD AS HFILE1
END
-RUN



MATCH FILE HFILE1
PRINT
FILE_DATE1
FLAG1
FIELD1
FIELD2

BY FILE_NAME
RUN

FILE FILE2
PRINT
FILE_DATE2
FLAG6
FIELD3
FIELD4

BY FILE_NAME

AFTER MATCH HOLD AS HFILE3 OLD-OR-NEW
END
-RUN



TABLE FILE HFILE3
PRINT *
END
-RUN

January 05, 2017, 03:00 PM

David Briars

Oh, so the requirement is to match two files on a field - FileName, and the field is known to have different date formats within it; yet a human would be able to discern that the FileName values are otherwise the same.

@Francis - What is you kept your code basically as it is...use SOUNDEX to get a phonetic value of the string, then at match time, match the phonetic values? Something like strip out all 'dates?numbers?', then get the phonetic value of what is left, then match...or don't even attempt to strip out the 'dates'...

Just a thought, been a while since I've used SOUNDEX.

January 05, 2017, 03:10 PM

Squatch

quote:
Originally posted by David Briars:
@Francis - What if you kept your code basically as it is...use SOUNDEX to get a phonetic value of the string, then at match time, match the phonetic values?

SOUNDEX is the computer programmer's equivalent of the "Hail Mary pass" in American football.

January 05, 2017, 03:20 PM

Francis Mariani

Hmmm, I've never used SOUNDEX... Does it work for Canadian accents?

January 05, 2017, 03:48 PM

Squatch

I think you Canucks are out of luck, eh?

January 05, 2017, 05:20 PM

Francis Mariani

January 05, 2017, 07:41 PM

David Briars

quote:
SOUNDEX is the computer programmer's equivalent of the "Hail Mary pass" in American football.

:-)

SOUNDEX might approximate/definitely would not be completely exact.

Here is a discussion of possible implementation (not WF I know):
http://stackoverflow.com/quest...-approximately-equal

For sure if given the input, certain substrings (dates) could be reliably stripped, then that would be the way to go.

Fun stuff!

This message has been edited. Last edited by: David Briars,

January 12, 2017, 05:05 PM

susannah

ok, if all you need to do is REMOVE the dates, then how about using the STRIP function.
STRIP 1, then 2, then....
you might have to strip the dashes as well.
but not the underscores, it seems you keep those.
or you might have to STRIP in two parts:
STRIP _1 and then STRIP 1,
STRIP _2 and then ..
oh wait, maybe you can't STRIP more than 1 char at a time....
so you gotta use POSIT function 10 times (or more)to see if the string contains '_1' and then re-form the string out of the bit before the position and the bit beginning 2 spots after the position... and keep that up until the POSIT function gives a 0 value, meaning you got 'em all.
...
or... use the STRREP function 10 times to remove
'_1' and '_2' etc. You can replace them with blanks if you have to, and then STRIP out all the blanks at the end of the game. I like this way!

This message has been edited. Last edited by: susannah,