[ann] PetitCobol

cbc
Hello.  It gives me great pleasure to announce my COBOL parser.  It is a fixed-format COBOL parser.  I expect it could be extended to handle free-format COBOL, but I have no need for that.

Code is located at:

To invoke the parser, evaluate:
    CobolProg parseCobolCodingForm: <fileName>

This 'parser' actually contains 4 parsers, plus a fair amount of additional logic to prepare the files for the parsers (and to feed the output of earlier parsers to later ones).  The rough outline of what happens:

1) Read the file line by line.  Each line is parsed as a formatted card.
2) Take these cards and format them into sentences.
3) Parse the coding structure.  (Parse it out into the various divisions, and parse out the level 01 data.)
4) Aggregate the structure into segments.
5) Finally, parse the actual code, division by division.
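
As a rough illustration of steps 1 and 2, here is a minimal Python sketch (not PetitCobol's actual Smalltalk code). It assumes the standard COBOL reference format: columns 1-6 sequence number, column 7 indicator, columns 8-11 Area A, columns 12-72 Area B, columns 73-80 identification. The names `Card`, `parse_card` and `to_sentences` are hypothetical, and ending a sentence at any card that ends with a period is a deliberate simplification (continuation lines and literals are ignored).

```python
from dataclasses import dataclass


@dataclass
class Card:
    """One fixed-format source line, split into its column areas."""
    line_no: int    # 1-based line number in the source file
    sequence: str   # columns 1-6
    indicator: str  # column 7
    area_a: str     # columns 8-11
    area_b: str     # columns 12-72
    ident: str      # columns 73-80

    @property
    def is_comment(self) -> bool:
        # '*' and '/' in the indicator column mark comment lines.
        return self.indicator in ("*", "/")

    @property
    def text(self) -> str:
        # Area A and Area B together hold the actual program text.
        return (self.area_a + self.area_b).strip()


def parse_card(line: str, line_no: int) -> Card:
    """Step 1: slice a raw line into card fields (short lines are padded to 80)."""
    padded = line.rstrip("\r\n").ljust(80)
    return Card(line_no, padded[0:6], padded[6], padded[7:11],
                padded[11:72], padded[72:80])


def to_sentences(cards):
    """Step 2 (simplified): join card text until a card ends with a period.

    Comment cards are collected separately, with their line numbers.
    """
    sentences, comments, current = [], [], []
    for card in cards:
        if card.is_comment:
            comments.append((card.line_no, card.text))
            continue
        if not card.text:
            continue  # skip blank cards
        current.append(card.text)
        if card.text.endswith("."):
            sentences.append(" ".join(current))
            current = []
    if current:  # keep an unterminated trailing fragment rather than dropping it
        sentences.append(" ".join(current))
    return sentences, comments
```

Feeding the lines of a file through `parse_card` and then `to_sentences` gives a list of sentences plus a list of `(line number, comment)` pairs, which is roughly the shape of input the later parsing stages would need.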

The parser includes a full AST representation, along with a visitor that you can subclass to help handle the resulting AST.

The parser is not complete: it should parse any fixed-format COBOL program file, but not all commands are implemented.  I have implemented a way to develop the parser iteratively.  It parses each sentence up to the point where it cannot continue; at that point, it parses the rest into a CDJunk (for data division unknowns) or a CobolStatement (for procedure division unknowns).  These nodes point out missing commands (which certainly exist) and possibly incomplete commands (which may exist); a simple visitor over the AST that traps those nodes should find them.
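
The fallback-node idea can be sketched in a few lines of Python. This is an illustration of the technique only, not the PetitCobol classes: `CDJunk` and `CobolStatement` below are stand-ins that merely borrow the names from this post, and `Node`, `Program`, `Move`, `parse_statement`, `Visitor` and `UnknownFinder` are hypothetical.

```python
class Node:
    """Minimal AST node that dispatches to a visitor by class name."""

    def __init__(self, children=()):
        self.children = list(children)

    def accept(self, visitor):
        # Use visit_<ClassName> if the visitor defines it, else the default.
        handler = getattr(visitor, "visit_" + type(self).__name__,
                          visitor.visit_default)
        return handler(self)


class Program(Node):
    pass


class Move(Node):
    """A statement this toy parser understands."""

    def __init__(self, source, target):
        super().__init__()
        self.source, self.target = source, target


class CobolStatement(Node):
    """Stand-in fallback node: raw text of a statement not yet understood."""

    def __init__(self, text):
        super().__init__()
        self.text = text


class CDJunk(Node):
    """Stand-in fallback node: raw text of an unknown data-division entry."""

    def __init__(self, text):
        super().__init__()
        self.text = text


def parse_statement(sentence):
    """Parse what we can; wrap everything else in a fallback node."""
    words = sentence.rstrip(".").split()
    if len(words) == 4 and words[0] == "MOVE" and words[2] == "TO":
        return Move(words[1], words[3])
    return CobolStatement(sentence)


class Visitor:
    def visit_default(self, node):
        # By default, just walk down into the children.
        for child in node.children:
            child.accept(self)


class UnknownFinder(Visitor):
    """Collects the text of every fallback node in the tree."""

    def __init__(self):
        self.unknown = []

    def visit_CobolStatement(self, node):
        self.unknown.append(node.text)

    def visit_CDJunk(self, node):
        self.unknown.append(node.text)


program = Program(parse_statement(s)
                  for s in ["MOVE 1 TO COUNTER.", "EVALUATE TRUE."])
finder = UnknownFinder()
program.accept(finder)
print(finder.unknown)  # the sentences the toy parser could not handle
```

Because unknown input never aborts the parse, each run over a real program yields a to-do list of commands still to implement.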

The parse leaves you with a CobolProg holding the final parsed AST in the variable formattedStructure.  Comments in the code are in the variable comments (along with the line numbers they originated from).  Most of the interim steps are also kept in the CobolProg instance, should you be interested in them.  If not, you can send #cleanup to the instance to get rid of everything but the final AST nodes.

Thanks,
cbc

_______________________________________________
Moose-dev mailing list
[hidden email]
https://www.iam.unibe.ch/mailman/listinfo/moose-dev

Re: [ann] PetitCobol

stepharo
Thanks, Chris. This is great news.
What is the license of the code?

Stef

On 21/7/14 03:47, Chris Cunningham wrote:
Hello.  It gives me great pleasure to announce my COBOL parser. [...]

Re: [ann] PetitCobol

Tudor Girba-2
In reply to this post by cbc
This really is a great piece of news!

Thanks,
Doru


--

"Every thing has its own flow"

Re: [ann] PetitCobol

Stephan Eggermont-3
In reply to this post by cbc
Nice! What image did you use to create it? I find I have some loading issues

Stephan

Re: [ann] PetitCobol

abergel
In reply to this post by cbc
Excellent Chris!!!

Alexandre
-- 
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.



Re: [ann] PetitCobol

cbc
In reply to this post by Stephan Eggermont-3
Well, I didn't try a clean load first; I apologize for that.  This is my second take at the COBOL parser; the first one had rather too much logic in the AST nodes.  Unfortunately, a few remnants were left in the package that cause loading to fail (although you should be able to just continue past the error).

Alternatively, load the latest version now, and also load the Cbc compatibility package (in the same repository), which includes a handful of methods that I used in the code but that are not properly part of the parser itself.

This should now load and parse files in the latest Moose 5 image.

-cbc


On Mon, Jul 21, 2014 at 3:51 AM, Stephan Eggermont <[hidden email]> wrote:
Nice! What image did you use to create it? I find I have some loading issues

Stephan

Re: [ann] PetitCobol

cbc
In reply to this post by stepharo
MIT, as noted on SmalltalkHub.


On Mon, Jul 21, 2014 at 12:54 AM, stepharo <[hidden email]> wrote:
Thanks, Chris. This is great news.
What is the license of the code?

Stef


Re: [ann] PetitCobol

Stephan Eggermont-3
In reply to this post by cbc
Good, now I can try it. How do you deal with copy books?
I really like that you remembered that Cobol should be
compilable in less than 16K of ram. Multiple stages really
help understandability here.

Stephan


Re: [ann] PetitCobol

cbc
Hi.  Copy books: I basically ignore them.  I usually tried to get the listing after the compiler had handled the copybooks (and any macro expansions), which made my life easier.  My take is that this is a parser, not a compiler.  After parsing, other interesting things can happen, but for my work I didn't really need to figure out how to handle copy books.

You might be interested in another little package, PunchedCards, on SqueakMap.
It should be able to show any of the CobolCards from the initial parsing stage as a punched card, ready to be read.  Useful? Probably not, but fun.

It may or may not run in Moose; I haven't tried it yet (mainly because I don't know whether a required piece has shifted in Pharo).


On Tue, Jul 22, 2014 at 2:31 AM, Stephan Eggermont <[hidden email]> wrote:
Good, now I can try it. How do you deal with copy books?
I really like that you remembered that Cobol should be
compilable in less than 16K of ram. Multiple stages really
help understandability here.

Stephan

