Xtreams Parsing

Steffen Märcker

Xtreams Parsing

Hi,

if parsing a certain string according to a grammar fails, how do I get the
position of that error? I want to give the user feedback where to start
looking for mistakes.

Regards, Steffen
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Michael Lucas-Smith-2

Re: Xtreams Parsing

The short answer is you don't. Instead, wherever a set of alternates can fail but must not, add a final alternate that is an error node. It must match some content in such a way that you can "continue" the stream. If you don't intend to continue the stream, you can just consume all the characters up to the end at that point. Since you have matched your failure alternative, the actor can announce an error, record its position, etc. You can even use the failure match to attempt a recovery, so that more of the stream can be processed.

Cheers,
Michael
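
The error-alternate idea can be sketched outside Xtreams. The Python below is only an illustration of the technique, not Xtreams code: the toy grammar (`name=number;` statements), the `parse` function, and the recovery rule (skip to just past the next `;`) are all assumptions made for the example. The first alternate matches a valid statement; the final alternate always matches something, so the "actor" can record where the bad input started and parsing can continue.

```python
import re

# First alternate: a valid statement such as "a=1;" (toy grammar, assumed).
STATEMENT = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*;")
# Final alternate (the error node): consume up to and including the next ';',
# or to the end of input if there is none. It always consumes at least one
# character, so the loop below always makes progress.
ERROR = re.compile(r"[^;]*;?")

def parse(text):
    """Return (assignments, errors); errors are (offset, skipped_text) pairs."""
    assignments, errors = [], []
    pos = 0
    while pos < len(text):
        match = STATEMENT.match(text, pos)
        if match:
            assignments.append((match.group(1), int(match.group(2))))
        else:
            # No regular alternate matched: fall through to the error node.
            match = ERROR.match(text, pos)
            skipped = match.group(0).lstrip()
            if skipped:  # ignore trailing whitespace
                # Record where the offending text starts, then carry on.
                errors.append((match.end() - len(skipped), skipped))
        pos = match.end()
    return assignments, errors

assignments, errors = parse("a=1; b=?; c=3;")
print(assignments)  # [('a', 1), ('c', 3)]
print(errors)       # [(5, 'b=?;')]
```

To stop at the first error instead of recovering, the error alternate would simply consume everything up to the end of the input.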

Steffen Märcker

Re: Xtreams Parsing

Thanks Michael. Is my observation correct that the PEG grammar includes
DefinitionError for this purpose?

Ciao, Steffen

Michael Lucas-Smith-2

Re: Xtreams Parsing

Yep, that's right.

Carl Gundel

[7.8] readWriteStream position wrong?

How is it reasonable to expect the position of readWriteStream to advance more than 650 characters when writing a 650 character long string to it? Is it related to multi byte character sets or something? Is there some trick to setting up random access file streams for basic ASCII data?

What ever happened to the principle of least surprise?

-Carl
http://www.libertybasic.com
http://www.runbasic.com

Carl Gundel

Re: [7.8] readWriteStream position wrong?

Oh, I forgot to mention it's a stream on a file. Also, for what it's worth, my code works on 7.4 without this weirdness. Is this a clue?

To create the stream I'm just sending asFilename readWriteStream. Then I set the position to the beginning of a record and use nextPutAll: to write the record. It writes more characters than the size of the string.

Thanks,

-Carl

Carl Gundel

Re: [7.8] readWriteStream position wrong?

In reply to this post by Carl Gundel

Thanks Eliot,

I tried the following when opening my file but it made no difference.

(myFilenameString asFilename withEncoding: #UTF_8) readWriteStream

Any ideas?

-Carl

On Sep 10, 2011, at 5:13 PM, Eliot Miranda wrote:

> On Sat, Sep 10, 2011 at 2:05 PM, Carl Gundel <[hidden email]> wrote:
>> How is it reasonable to expect the position of readWriteStream to advance more than 650 characters when writing a 650 character long string to it? Is it related to multi byte character sets or something? Is there some trick to setting up random access file streams for basic ASCII data?
>>
>> What ever happened to the principle of least surprise?
>
> UTF-8
>
> --
> best,
> Eliot

Carl Gundel

Re: [7.8] readWriteStream position wrong?

Yeah, it turns out to be as simple as it writing CRLF for every CR in my data. Setting UTF_8 encoding doesn't seem to fix it. What's the right way to prevent this?

-Carl

Steven Kelly

Re: [7.8] readWriteStream position wrong?

In reply to this post by Carl Gundel

Do you mean your stream is advancing >650 bytes for 650 characters? That's actually not so surprising - even just a single "Character cr" is worth two bytes on Windows. As Eliot hinted, a non-ASCII character (accented characters, curly quotes, en or em dash etc.) will also be encoded into more than one byte in a lot of common encodings. Even an ASCII character can be encoded into more than one byte in some common encodings. 7.8 has a lot more facilities for that kind of thing than 7.4, because these days the assumption that all users of programs are English speakers in the US is rarely valid.

By default, VW uses the platform's default encoding and line-end convention. Try stepping into the lower levels of the code to see how these things work, or read the InternationalGuide manual for a brief introduction.

HTH,

Steve
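
The gap between character count and byte count described above is easy to reproduce. The Python sketch below is an illustration only and does not use VisualWorks streams: the helper `written_bytes` and the sample record are assumptions made for the example. It mimics a stream that writes a configurable line-end sequence for every CR and then encodes the result.

```python
def written_bytes(text, encoding="utf-8", line_end="\r\n"):
    """Bytes produced when each CR in text is written as line_end, then encoded."""
    return text.replace("\r", line_end).encode(encoding)

record = "name\rvalue\r"  # 11 characters, two of them CR

# Each CR becomes CRLF, so the file position advances by 13, not 11.
print(len(record), len(written_bytes(record)))       # 11 13

# Writing CR unchanged keeps characters and bytes in step for ASCII data.
print(len(written_bytes(record, line_end="\r")))     # 11

# Non-ASCII characters take more than one byte in UTF-8 ...
print(len("café"), len("café".encode("utf-8")))      # 4 5

# ... and even ASCII takes two bytes per character in UTF-16.
print(len("A".encode("utf-16-le")))                  # 2
```

For fixed-size records, this means offsets have to be computed in bytes as actually written, or the stream has to be configured so that neither line ends nor encoding change the length.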

Boris Popov, DeepCove Labs (SNN)

Re: [7.8] readWriteStream position wrong?

In reply to this post by Carl Gundel

Carl,

Use #lineEndTransparent, but also see #lineEndCR, #lineEndLF, #lineEndCRLF and #lineEndAuto.

HTH,

-Boris
