Posted by Esteban A. Maringolo on Jul 26, 2017; 5:05pm
URL: https://forum.world.st/NeoCSV-on-Irregular-Files-tp4956850p4956858.html
2017-07-26 13:04 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
> I agree.
>
> If the file is non-homogeneous, it is no longer CSV by definition.
>
> Holding on to the original stream and creating new readers for each section is one option; another could be to add a #reset method.
>
> The big question is how to know when one section begins/ends.
In my experience I looked for certain delimiters, like a header row
with the field names.
Oil & Gas telemetry instruments generate outputs like that: a
concatenation of several CSVs into one file, maybe even preceded by a
non-CSV header of 10 rows of data.
What I had to do to deal with that was either:
a) Reading it line by line, buffering the whole "segment" until EOF or
the next delimiter is found, or...
b) Pre-scanning the whole file, marking the start and end positions of
each segment, then generating a new readStream with each segment's
contents and passing it to the CSV parser (which neither knows nor cares
about segments).
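
To make option (a) concrete, here is a minimal sketch. It is written in Python rather than Pharo and uses Python's standard csv module, not NeoCSV, so it only illustrates the idea. The function names, the sample data, and the "first field is not numeric" header rule are all assumptions made up for the example; a real file would need its own delimiter test, and a non-CSV preamble would have to be skipped separately.

```python
import csv
import io

def split_segments(lines, is_header):
    """Yield each segment as a list of lines; a new segment starts at every header row."""
    segment = []
    for line in lines:
        # A header row closes the segment buffered so far (if any).
        if is_header(line) and segment:
            yield segment
            segment = []
        segment.append(line)
    # Flush the last segment when the input ends (the EOF case).
    if segment:
        yield segment

def parse_segments(text, is_header):
    """Hand each segment to a plain CSV reader that knows nothing about segments."""
    tables = []
    for segment in split_segments(text.splitlines(keepends=True), is_header):
        reader = csv.reader(io.StringIO("".join(segment)))
        tables.append(list(reader))
    return tables

# Hypothetical rule: a row is a header if its first field is not a number.
def looks_like_header(line):
    first = line.split(",", 1)[0].strip()
    try:
        float(first)
        return False
    except ValueError:
        return True

# Made-up sample: two CSV sections with different columns in one file.
sample = (
    "time,pressure\n"
    "0,101.3\n"
    "1,101.9\n"
    "depth,temp,flow\n"
    "10,54.2,3.1\n"
)

tables = parse_segments(sample, looks_like_header)
```

Each entry in tables is one section, with its header as the first row. Option (b) would follow the same shape, except that the first pass records start and end offsets instead of buffering the lines.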
> NeoCSVReader holds a one char buffer, so you could peek for something, just maybe. Then you could discover the section switches while parsing (a bit like #atEnd is used from #upToEnd, add a #atSectionEnd). But it all depends on your specific format.
It's harder to do if it is char based, instead of "line" based. Or at
least harder to code.
Regards!
Esteban A. Maringolo