Thursday, May 3, 2012

TIBCO BW Parse CSV with Header & Footer

This will be a quick post to answer a question from a reader on how to parse a CSV file with typical header and footer rows.

The original post is here.

Intuitively, there are 3 parts in such kind of file i.e. the header, the payload and the footer. Each of these parts needs to be parsed differently mainly because they may have either different number of fields, different delimiter, and so forth. As usual, there would be more than one solution to a given problem, the decision will really depend on your circumstances, so pick what you like and discard what you think isn't applicable.

Quick summary of steps.

In TIBCO BW Designer, define the following data formats:

1) Header format
2) Row Format (for the payload)
3) Footer format; and
4) Wholefile (for the entire file, its sole purpose is for row counting)

I created a master XSD (pictured below) to aid the creation of the first 3 data format.

Create a process as follow.

Brief descriptions of the process activities:

1) The first 'Parse Data' is for counting, hence 'ForCounting' is the name. Its sole purpose of existence is to count the total number of rows in the input file.
2) 'Parse Header', as the name suggest, parse the first row, directly from the input file (now specified as an input parameter in the 'start' activity). The assumption is that the first row is always the header.
3) 'Parse Data', parses the payload body of the directly input file. It uses the counter function that counts the number of rows in the output of  'Parse Data' activity. It starts parsing from row #2 and stops at row # [Totalrows -3]. Work the math, it is tricky.
4) 'Parse Footer' as the name suggested parses the last row of the direct file to give you the footer.
5) The 'Render XML' activity assembles all the parts together to form an XML output that complies to a given predefined XSD. See XSD picture above.

Future improvements:

1) Error handling to check if the file is empty so the position index won't go negative
2) Loop optimization for better performance when handling large file
3) To handle multi-row headers/footers, not sure how useful this feature could be :-)
4) Add your polling mechanism...throw in a semaphore mechanism to ensure only completed input files are polled...

That is it. I told you it was quick. Get the entire TIBCO Designer project file here, feel free to play with it and your feedback is always welcomed.

Cheers,

2 comments:

  1. This looks great. Just two one question. What if the first row(header row) is empty and the first column is empty. So your header starts at, say row 3 column 2 and your data starts at row 5 column 2. How do you handle that

    ReplyDelete
  2. I have a csv file which has 2 header I need to ignore some some headers in row1 and some in row 2 .How can that he handled .Also if there is no fixed column positions is it possible to handle in tibco bw

    ReplyDelete