Consuming delimiters in PetitParser

hernanmd
Hi all,

I'm writing a parser like the following

( $> asParser , 'gi' asParser , $| asParser ,
  ( #digit asParser plus flatten plusLazy: $| asParser ) ,
        ( (#letter asParser / #digit asParser / #space asParser /
#punctuation asParser ) asParser plus flatten ) ) end

In the attachment you will see my sample input and the resulting parse
collection, whose last element is "|abc". I want #plusLazy: (or
#plusGreedy:?) to consume the $| delimiter so that the result is

#($> 'gi' $| #('648') $| 'abc')

any suggestions?

Cheers,

--
Hernán Morales
Information Technology Manager,
Institute of Veterinary Genetics.
National Scientific and Technical Research Council (CONICET).
La Plata (1900), Buenos Aires, Argentina.
Telephone: +54 (0221) 421-1799.
Internal: 422
Fax: 425-7980 or 421-1799.

_______________________________________________
Pharo-users mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-users

Attachment: fasta1.jpg (12K)
Re: Consuming delimiters in PetitParser

Lukas Renggli
Hi Hernán,

I am not really sure what you want to express with the following parser?

    ( #digit asParser plus flatten plusLazy: $| asParser )

Having multiple quantifications (#optional, #plus, #plusLazy:,
#plusGreedy:, #star, #starLazy:, #starGreedy:, #times:, #min:,
#min:max:) in a row does not really make sense and is likely a bug.
In your example the #plusLazy: will only be able to consume the
receiver once. As you can read in the method comment of #plusLazy: the
argument is not consumed, thus whatever follows will have to consume
it.
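
A minimal sketch of that point, assuming a standard PetitParser image;
the variable names and the input string are only illustrative. Since
#plusLazy: stops in front of its argument without consuming it, the
delimiter has to appear again as the next element of the sequence:

```smalltalk
| digits p |
"Repeat #digit lazily until a $| comes next; the $| itself stays in the input."
digits := (#digit asParser plusLazy: $| asParser) flatten.
"The parser that follows has to consume the delimiter explicitly."
p := (digits , $| asParser , #letter asParser plus flatten) end.
p parse: '648|abc'.   "expected: #('648' $| 'abc')"
```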

If you want to accept multiple numbers like in

    >gi|123|456|789|abc

then you should likely use #separatedBy: like in

   p := ( $> asParser , 'gi' asParser , $| asParser ,
        ( #digit asParser plus flatten separatedBy: $| asParser ) ,
        ($| asParser) ,
        ( (#letter asParser / #digit asParser / #space asParser /
#punctuation asParser ) asParser plus flatten ) ) end.
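
Note that #separatedBy: answers the elements and the separators
interleaved. If only the numbers are wanted, one possible approach
(a sketch, not the only way) is to filter the result in an action
block; the names `number` and `numbers` are made up for illustration:

```smalltalk
| number numbers |
number := #digit asParser plus flatten.
"The result looks like number, $|, number, ... so keep only the odd positions."
numbers := (number separatedBy: $| asParser)
    ==> [ :nodes | (1 to: nodes size by: 2) collect: [ :i | nodes at: i ] ].
numbers parse: '123|456|789'.   "expected: #('123' '456' '789')"
```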

If you just want one number then the simpler

   p := ( $> asParser , 'gi' asParser , $| asParser ,
        ( #digit asParser plus flatten ) ,
        ($| asParser) ,
        ( (#letter asParser / #digit asParser / #space asParser /
#punctuation asParser ) asParser plus flatten ) ) end.

will do.
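
To try the single-number parser in a workspace, something along these
lines should work. The input string is an assumption reconstructed
from the expected result quoted in the question, and printing to the
Transcript is just one way to inspect the outcome:

```smalltalk
| p result |
p := ( $> asParser , 'gi' asParser , $| asParser ,
     #digit asParser plus flatten ,
     $| asParser ,
     ( #letter asParser / #digit asParser / #space asParser /
       #punctuation asParser ) plus flatten ) end.
result := p parse: '>gi|648|abc'.
"A failed parse answers a failure object instead of raising an error."
result isPetitFailure
    ifTrue: [ Transcript show: result message; cr ]
    ifFalse: [ Transcript show: result printString; cr ].
```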

Let us know if this solves your problem.

Lukas


On 8 October 2010 21:29, Hernán Morales Durand <[hidden email]> wrote:

> [...]



--
Lukas Renggli
www.lukas-renggli.ch

Re: Consuming delimiters in PetitParser

hernanmd
Hi Lukas,

Yes, the simpler method worked fine. I was confused because, after
typing the first digit for the #digit parser (that part can contain
many digits but no alphabetic characters), the sticky evaluator told
me "punctuation expected". For now I just need to parse one
identifier.
Thanks!

2010/10/8 Lukas Renggli <[hidden email]>:

> [...]


