Xtreams Parsing


Steffen Märcker
Hi,

If parsing a certain string according to a grammar fails, how do I get the
position of that error? I want to give the user feedback on where to start
looking for mistakes.

Regards, Steffen
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: Xtreams Parsing

Michael Lucas-Smith-2
The short answer is you don't. Instead, where alternates can fail but must not, add a final alternate that is an error node. It must match some content in such a way that you can "continue" the stream. If you do not intend to continue the stream, you can just consume all the characters up to the end at that point. Since you have matched your failure alternative, the actor can announce an error, record its position, etc. You can even use the failure match to attempt a recovery, so that more of the stream can be processed.
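
As an illustration of the error-alternate idea, here is a minimal sketch of a grammar fragment held in a Smalltalk string. It is written in conventional PEG notation, on the assumption that the Xtreams PEG dialect accepts it. The rule names (Statement, Assignment, Call, StatementError) and the semicolon terminator are hypothetical and not taken from the Xtreams sources. How the grammar is handed to the parser, and how the actor reports the position, depends on the Xtreams API and is not shown here.

```smalltalk
"Hypothetical PEG fragment; all rule names are invented for illustration.
 Assignment and Call are assumed to be defined elsewhere in the grammar."
| grammar |
grammar := '
Statement      <- (Assignment / Call / StatementError) ";"
StatementError <- (!";" .)+
'.
"StatementError is tried last, so it only matches when the real alternates
 have failed. It consumes at least one character and stops before the next
 terminator, which lets parsing continue with the following statement.
 The actor for this rule is the place to record the error position."
grammar
```

Using `+` rather than `*` in the error rule means it always consumes input, which matches the advice that the error node must match some content.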

Cheers,
Michael

On Sep 7, 2011, at 8:24 AM, Steffen Märcker wrote:

> Hi,
>
> if parsing a certain string according to a grammar fails, how do I get the
> position of that error? I want to give the user feedback where to start
> looking for mistakes.
>
> Regards, Steffen

Re: Xtreams Parsing

Steffen Märcker
Thanks Michael. Is my observation correct that the PEG grammar includes  
DefinitionError for this purpose?

Ciao, Steffen


On 07.09.2011 at 17:32, Michael Lucas-Smith <[hidden email]> wrote:

> The short answer is you don't. Instead, where alternates can fail but
> must not, add a final alternate that is an error node. It must match
> some content in such a way that you can "continue" the stream. If you
> intend to not continue the stream, at that point you can just consume
> all the characters up to the end. Since you have matched your failure
> alternative, the actor can announce an error, record its position etc.
> You can even use the failure match to try and do a recovery, so that
> more of the stream can be processed.
>
> Cheers,
> Michael

Re: Xtreams Parsing

Michael Lucas-Smith-2
Yep, that's right.

On 2011/9/8, Steffen Märcker <[hidden email]> wrote:
> Thanks Michael. Is my observation correct that the PEG grammar includes
> DefinitionError for this purpose?
>
> Ciao, Steffen



[7.8] readWriteStream position wrong?

Carl Gundel
How is it reasonable to expect the position of a readWriteStream to advance more than 650 characters when writing a 650-character string to it? Is it related to multibyte character sets or something? Is there some trick to setting up random-access file streams for basic ASCII data?

Whatever happened to the principle of least surprise?

-Carl
http://www.libertybasic.com
http://www.runbasic.com

Re: [7.8] readWriteStream position wrong?

Carl Gundel
Oh, I forgot to mention it's a stream on a file.  Also for what it's worth my code works on 7.4 without this weirdness.  Is this a clue?

To create the stream I'm just sending asFilename readWriteStream.  Then I set the position to the beginning of a record and use nextPutAll: to write the record.  It writes more characters than the size of the string.

Thanks,

-Carl

On Sep 10, 2011, at 5:05 PM, Carl Gundel wrote:

> How is it reasonable to expect the position of readWriteStream to advance more than 650 characters when writing a 650 character long string to it? Is it related to multi byte character sets or something? Is there some trick to setting up random access file streams for basic ASCII data?
>
> What ever happened to the principle of least surprise?
>
> -Carl
> http://www.libertybasic.com
> http://www.runbasic.com

Re: [7.8] readWriteStream position wrong?

Carl Gundel
Thanks Eliot,

I tried the following when opening my file but it made no difference.

    (myFilenameString asFilename withEncoding: #UTF_8) readWriteStream

Any ideas?

-Carl

On Sep 10, 2011, at 5:13 PM, Eliot Miranda wrote:

> On Sat, Sep 10, 2011 at 2:05 PM, Carl Gundel <[hidden email]> wrote:
>> How is it reasonable to expect the position of readWriteStream to advance more than 650 characters when writing a 650 character long string to it? Is it related to multi byte character sets or something? Is there some trick to setting up random access file streams for basic ASCII data?
>>
>> What ever happened to the principle of least surprise?
>
> UTF-8
>
> --
> best,
> Eliot




Re: [7.8] readWriteStream position wrong?

Carl Gundel
Yeah, it turns out to be as simple as this: it writes CRLF for every CR in my data. Setting the UTF_8 encoding doesn't seem to fix it. What's the right way to prevent this?

-Carl

On Sep 10, 2011, at 5:47 PM, Carl Gundel wrote:

> Thanks Eliot,
>
> I tried the following when opening my file but it made no difference.
>
>     (myFilenameString asFilename withEncoding: #UTF_8) readWriteStream
>
> Any ideas?
>
> -Carl


Re: [7.8] readWriteStream position wrong?

Steven Kelly
Do you mean your stream is advancing more than 650 bytes for 650 characters? That's actually not so surprising: even a single "Character cr" is worth two bytes on Windows. As Eliot hinted, a non-ASCII character (accented characters, curly quotes, en or em dashes, etc.) will also be encoded into more than one byte in a lot of common encodings. Even an ASCII character can be encoded into more than one byte in some common encodings. 7.8 has a lot more facilities for that kind of thing than 7.4, because these days the assumption that all users of programs are English speakers in the US is rarely valid.
 
By default, VW uses the platform's default encoding and line-end convention. Try stepping into the lower levels of the code to see how these things work, or read the InternationalGuide manual for a brief introduction.
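
To make the character-versus-byte point concrete, here is a minimal workspace sketch that estimates how many bytes a record takes when every CR is written as CRLF. It assumes every other character in the record is encoded as a single byte. The sample record is a placeholder and not Carl's data.

```smalltalk
"Sketch only: the record contents are invented for illustration."
| record crCount |
record := 'first line' , (String with: Character cr) , 'second line'.

"Each CR costs one extra byte when it is written as CRLF."
crCount := record occurrencesOf: Character cr.

Transcript
    showCR: 'characters: ' , record size printString;
    showCR: 'bytes with CR written as CRLF: ' , (record size + crCount) printString
```

A record with many line ends would therefore advance the file position well past its character count, even for plain ASCII data.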
 
HTH,
Steve

 


Re: [7.8] readWriteStream position wrong?

Boris Popov, DeepCove Labs (SNN)

Carl,

 

#lineEndTransparent, but also see #lineEndCR, #lineEndLF, #lineEndCRLF and #lineEndAuto.
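
A minimal sketch of how this might be applied to the record-writing case described earlier in the thread. It assumes #lineEndTransparent is sent to the stream returned by readWriteStream. The file name, record contents, and record offset are placeholders.

```smalltalk
"Sketch only: file name, record and offset are invented for illustration."
| stream record |
record := 'field one' , (String with: Character cr) , 'field two'.
stream := 'records.dat' asFilename readWriteStream.
[stream lineEndTransparent.    "write CR as CR, with no CRLF translation"
 stream position: 0.           "start of the record to overwrite"
 stream nextPutAll: record]
    ensure: [stream close]
```

With line-end translation switched off, the position should advance by the number of characters written, provided the encoding in use maps each of them to one byte.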

 

HTH,

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Carl Gundel
Sent: Saturday, September 10, 2011 6:23 PM
To: VWNC
Subject: Re: [vwnc] [7.8] readWriteStream position wrong?

 

Yeah, it turns out to be as simple as it writing CRLF for every CR in my data.  Setting UTF_8 encoding doesn't seem to fix it.  What's the right way to prevent this?

 

-Carl

 
