Scaling Seaside apps (was: About SToR)


Scaling Seaside apps (was: About SToR)

Chris Muller
> > What made you decide to use a session pool rather than have a magma
>
> > session per Seaside session?  Is any of the stuff you've built to
> marry
>
> Currently we have a compromise - we use one magma session per Seaside
> session, but it is allocated from a pool. This means we cut out the
> session creation time, but more importantly - the session is "hot" so
> we
> keep the cached objects of the session.
>
> But the problem with using one Magma session per Seaside session
> (which
> thus remains for us) is that if you have a large persistent domain
> model
> AND you wish to cache quite a bit of it - then you get:
>
> numberOfObjectsInRAM = numberOfSeasideSessions *
> objectsCachedPerSession
>
> So if you have 100 concurrent sessions and 10000 objects cached per
> session you get a million objects. Ouch.

This is just a physical-memory ouch, right?  An internal structure of
only 10000 objects cached per session will keep each session from
getting bogged down with a huge dictionary cache, and it is more
supportive of concurrent processing.  I really think caching
100 thousand objects in one shared session would hurt worse..
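The arithmetic behind Göran's formula is easy to sanity-check. A tiny Python sketch (the 100-session / 10000-object figures are the ones from the thread; everything else here is illustrative, not Magma code):

```python
# Back-of-the-envelope RAM comparison for the two caching strategies
# discussed above. The numbers (100 sessions, 10,000 cached objects)
# come from the thread itself.

def objects_in_ram(seaside_sessions, cached_per_session):
    """Göran's formula: every Seaside session carries its own cache."""
    return seaside_sessions * cached_per_session

per_session_total = objects_in_ram(100, 10_000)  # one cache per session
shared_total = 10_000                            # one shared read-only cache

print(per_session_total)                 # 1000000
print(per_session_total // shared_total) # 100, i.e. 100x more objects resident
```

The 100x factor is exactly the trade-off under discussion: per-session caches multiply resident objects by the number of concurrent sessions.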

Let us clarify that this is a general web-programming ouch or, more
generically, a "three-tier" ouch, rather than anything specific to using
Magma.  No matter what DB is used, you have to choose between sharing
the model between sessions or having each session work on its own copy
of the model (or some hybrid of the two).

Sharing the model requires thread-safety throughout, not to mention
throwing out commit-conflict detection (detecting that one client's
changes affect the db transactions of other clients) and consistent db
views..  Scary!

Has anyone tried the suggested approach to scaling with Magma: using
multiple images, CPUs, servers?  This permits the simple 1:1
application architecture, is ultimately more scalable, and is probably
more economical in the end, because the cost of h/w < the cost of
complicated s/w architectures..

Programming to the simple one-session-per-web-session model permits your
app to scale by simply adding more hardware with absolutely _no
changes_ to the code.

In fact, the only thing you have to do is go into each Seaside
configuration panel and point the "Magma DB location" to the remotely
hosted database rather than a locally hosted one.  This has been
demonstrated via the available "Magma Seaside" demo.

And you don't necessarily need additional hardware to at least *try*
this approach; just running multiple images on the same machine would
prove its viability and leverage any multi-core abilities of that
machine in the process.  I hope you will at least try it out and report
back how it went..

Regards,
 Chris

_______________________________________________
Seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: Scaling Seaside apps (was: About SToR)

Göran Krampe
Hi!

Chris Muller <[hidden email]> wrote:

> > > What made you decide to use a session pool rather than have a magma
> >
> > > session per Seaside session?  Is any of the stuff you've built to
> > marry
> >
> > Currently we have a compromise - we use one magma session per Seaside
> > session, but it is allocated from a pool. This means we cut out the
> > session creation time, but more importantly - the session is "hot" so
> > we
> > keep the cached objects of the session.
> >
> > But the problem with using one Magma session per Seaside session
> > (which
> > thus remains for us) is that if you have a large persistent domain
> > model
> > AND you wish to cache quite a bit of it - then you get:
> >
> > numberOfObjectsInRAM = numberOfSeasideSessions *
> > objectsCachedPerSession
> >
> > So if you have 100 concurrent sessions and 10000 objects cached per
> > session you get a million objects. Ouch.
>
> This is just a physical memory ouch, right?  An internal structure of

Yes.

> only 10000 objects cached per session will keep each session from
> getting bogged down with a huge dictionary cache, not to mention more
> supportive of concurrent processing.  I really think a caching
> 100-thousand objects in one shared session would hurt worse..

Ok. But again, 10000 was just a number out of thin air. Perhaps I want
100000 - I don't know. But I do know that we will need to be able to
support approximately 100 concurrent users.

> Let us clarify that this is a general web programming ouch or, more
> generically, a "three-tier" ouch rather than anything specific to using
> Magma.  No matter what DB is used, you have to choose to share the
> model between sessions or each session works on its own copy of the
> model (or some hybrid of the two).

Well, I presume you could *in theory* have some kind of copy-on-write
mechanism, sharing non-modified objects while still preserving the
principle of each session maintaining its own logical view.
Doesn't GemStone use shadow pages in some kind of copy-on-write scheme,
for example? Not sure, my memory may be wrong.
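The copy-on-write idea floated here can be sketched in a few lines: sessions share one base snapshot of the model and only pay RAM for the objects they actually modify. This is a minimal Python illustration of the concept, with hypothetical names; it is not Magma or GemStone code:

```python
# Minimal copy-on-write view: all sessions share one read-only snapshot;
# each session keeps private copies only of the objects it writes.
# CowView and its methods are illustrative names, not a real API.

class CowView:
    def __init__(self, base):
        self._base = base   # shared snapshot (dict of oid -> object)
        self._local = {}    # this session's modified copies only

    def read(self, oid):
        # prefer the session's own copy; fall back to the shared snapshot
        return self._local.get(oid, self._base[oid])

    def write(self, oid, obj):
        # first write copies into session-local storage ("copy on write")
        self._local[oid] = obj

shared = {1: "apple", 2: "pear"}
a, b = CowView(shared), CowView(shared)
a.write(1, "APPLE")
print(a.read(1), b.read(1))  # APPLE apple -- b still sees the shared value
```

Unmodified objects exist once in RAM no matter how many sessions read them, which is precisely the memory saving being discussed.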

> Sharing the model requires the need for thread-safety throughout, not
> to mention throwing out commit-conflict detection (that one clients
> changes affect the db transactions of other clients) and consistent db
> views..  Scary!

Well, the idea was to *not* perform modifications in the shared Magma
session. They would be performed in separately allocated sessions. And
AFAIK thread safety is not an issue in a read-only model. The same
goes for conflict detection etc - it should still work fine.

But this approach was more of a thought experiment in how we could make
the RAM requirement, say, 50 times lower. :)

> Has anyone tried the suggested approach to scaling with Magma; using
> multiple images, CPUs, servers?  This permits the simple 1:1
> application architecture, is ultimately more scalable and, probably
> more economical in the end, because cost of h/w < cost of complicated
> s/w architectures..

Well, we probably will have to. But we are not there yet. ;)
 

> Programming to the simple, one session per web session, permits your
> app to scale by simply adding more hardware with absolutely _no
> changes_ to the code.
>
> In fact, the only thing you have to do is go into each Seaside
> configuration panel and point the "Magma DB location" to the remotely
> hosted database rather than a locally-hosted one.  This has been
> demonstrated via the "Magma Seaside" demo available..).
>
> And you don't necessarily need additional hardware to at least *try*
> this approach, in fact, just multiple images on the same machine would
> prove its viability and leverage any multi-core abilities of that
> machine in the process.  I hope you will at least try it out and report
> back how it went..

Sure - we don't intend to do any experiments in session management at
this point. I was just "airing a concern" I have. :) We will see.

Btw, the approach of wrapping each request in a commit block (or, as we
do, just an abort before performing the request - we run modifications
to the model in separately started transactions) has a noticeable "feel"
penalty.

Seaside typically does a redirect, so each "click" results in two
http requests - each doing an abort. And even if nothing at all has
changed in Magma (and we are still running Magma in the same image, so
there is no roundtrip involved) it gives a sluggish feeling. By using a
cheap trick (which will not work if we go multi-image) - a transaction
counter in the image - we can avoid the aborts when we know there have
been no transactions since the last abort. This improved the "feel" a
LOT.
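The "cheap trick" can be sketched concretely: keep a global commit counter, and skip the per-request abort when nothing has been committed since this session's last refresh. A Python sketch of the logic only (the class and method names are illustrative, not the actual Magma API), single-image only as noted:

```python
# Skip the expensive per-request abort when a global commit counter
# shows nothing has been committed since this session last refreshed.

class Database:
    def __init__(self):
        self.commit_count = 0   # global, image-wide transaction counter
        self.aborts_done = 0
    def commit(self):
        self.commit_count += 1
    def abort(self):
        self.aborts_done += 1   # the expensive refresh we want to avoid

class WebSession:
    def __init__(self, db):
        self.db = db
        self.seen_commits = db.commit_count
    def before_request(self):
        # only abort (refresh) if someone committed since we last looked
        if self.db.commit_count != self.seen_commits:
            self.db.abort()
            self.seen_commits = self.db.commit_count

db = Database()
s = WebSession(db)
s.before_request()   # nothing committed yet: abort skipped
db.commit()          # some other session commits
s.before_request()   # now the abort actually runs
print(db.aborts_done)  # 1
```

As Göran notes, this breaks down across multiple images because the counter lives in one image's memory.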

I did profile the aborts trying to figure out why they take so long even
when there are no changes - but can't recall right now what it was.

regards, Göran

RE: Scaling Seaside apps (was: About SToR)

Ramon Leon-5
> Btw, the approach of wrapping each request in an commit block
> (or as we do - just an abort before performing the request
> (we run modifications to the model in separately started
> transactions)) has a noticable "feel"
> penalty.
>
> Seaside typically does a redirect so each "click" will result
> in two http requests - each doing an abort. And even if
> nothing at all has changed in Magma (and we are still running
> Magma in the same image even, so there is no roundtrip
> involved) it gives a sluggish feeling. Doing a cheap trick
> (which will not work if we go multi-image) by using a
> transaction counter in the image we can avoid making the
> aborts if we know that there have been no transactions since
> last abort. This improved the "feel" a LOT.
>
> regards, Göran

I abandoned this approach as well; committing on every request seems to have
a huge penalty when using GOODS.  It seems you can't hide transactions
completely.  I just added a commit method on the session that delegates to
the db, and simply call self session commit whenever I feel it necessary.
Response times are far better and snappier than wrapping the entire request
in a commit.
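The shape of this approach, in a hedged Python sketch (illustrative names only; the real thing is a Smalltalk method on the Seaside session delegating to the GOODS db): commit becomes an explicit operation called only where a request actually changed something, instead of a wrapper around every request.

```python
# Explicit commit delegation: the web session exposes commit(), which
# just forwards to the database; most requests never call it.

class Db:
    def __init__(self):
        self.commits = 0
    def commit(self):
        self.commits += 1

class Session:
    def __init__(self, db):
        self.db = db
    def commit(self):
        self.db.commit()   # "self session commit" delegates to the db

db = Db()
session = Session(db)
for _ in range(10):
    pass                   # ordinary read-only requests: no commit at all
session.commit()           # commit only where we actually changed state
print(db.commits)          # 1, versus 10+ with commit-per-request
```

The win is that read-only requests (the common case in Seaside, given its redirect per click) pay no transaction cost at all.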


Re: Scaling Seaside apps (was: About SToR)

Chris Muller
In reply to this post by Göran Krampe
> Well, the idea was to *not* perform modifications in the shared Magma
> session. They would be performed in separate allocated sessions.

Sounds interesting.  How would you "find" the object(s) to be modified
in the other (mutator) session?  An oid lookup, perhaps?

> Btw, the approach of wrapping each request in an commit block (or as
> we
> do - just an abort before performing the request (we run
> modifications
> to the model in separately started transactions)) has a noticable
> "feel"
> penalty.

Did you set refreshPersistentObjectsEvenWhenChangedOnlyByMe: true?
That can kill abort performance, I would try really hard to leave that
off.

Did you know, when you leave that option off, you can:

  - make changes to the model, outside of a transaction
  - send #begin, which will refresh only the objects which were changed
by others, not revert your own changes
  - immediately send #commit, which will commit your changes

This satisfies the "I forgot to begin" use-case, but it may also be a
useful alternative to the performance-killing
refreshPersistentObjectsEvenWhenChangedOnlyByMe mode..
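The semantics Chris describes - edit outside a transaction, #begin refreshes only objects changed by *others* while leaving your edits alone, then #commit writes them - can be modeled in a few lines. This is a pure-Python model of the described behavior, not Magma code; all names are illustrative:

```python
# Model of the begin/commit dance: begin() refreshes non-dirty objects
# from the committed store; locally changed ("dirty") objects survive.

class Store:                            # the committed database state
    def __init__(self, data):
        self.data = dict(data)

class Session:
    def __init__(self, store):
        self.store = store
        self.view = dict(store.data)    # this session's object cache
        self.dirty = set()              # oids changed locally, outside a txn

    def change(self, oid, value):       # an edit made "outside a transaction"
        self.view[oid] = value
        self.dirty.add(oid)

    def begin(self):
        # refresh objects changed by others; leave our dirty edits alone
        for oid, value in self.store.data.items():
            if oid not in self.dirty:
                self.view[oid] = value

    def commit(self):
        for oid in self.dirty:
            self.store.data[oid] = self.view[oid]
        self.dirty.clear()

store = Store({1: "a", 2: "b"})
s = Session(store)
s.change(1, "mine")
store.data[2] = "theirs"   # another session commits a change to oid 2
s.begin()                  # picks up "theirs", keeps "mine"
s.commit()
print(store.data)          # {1: 'mine', 2: 'theirs'}
```

The begin-then-immediately-commit sequence in the list above corresponds to `s.begin(); s.commit()` here: others' changes are folded in first, then the local edits land.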

> I did profile the aborts trying to figure out why they take so long
> even
> when there are no changes - but can't recall right now what it was.

If the profile shows it's in MaTransaction>>#restore then
refreshPersistentObjectsEvenWhenChangedOnlyByMe was turned on; don't do
that.

Otherwise, please post the profile to the Magma list, I'll look at it
promptly.

Regards,
  Chris

Re: Scaling Seaside apps (was: About SToR)

Darius Clarke
Could, should Magma also use Amazon S3
http://www.amazon.com/s3
as a  storage device?

I've not thought through what it would take to optimize for it, but it
might reduce a lot of data/code/image persistency headaches.

Cheers,
Darius

Re: Scaling Seaside apps (was: About SToR)

Chris Muller
Very interesting.  It looks like something that Squeak itself could
benefit from, wrapped in a Stream or Flow interface.

Whether an I/O-intensive application like a DB server could benefit
from that is hard to say; those servers typically want to have close
(read: quick) access to the db files, and I'm pretty sure there would
be performance challenges with remote primitive access.

It might be good for backups though..

--- Darius Clarke <[hidden email]> wrote:

> Could, should Magma also use Amazon S3
> http://www.amazon.com/s3
> as a  storage device?
>
> I've not thought through what it would take to optimize for it, but
> it
> might reduce a lot of data/code/image persistency headaches.
>
> Cheers,
> Darius
>
>


Re: Scaling Seaside apps (was: About SToR)

Chris Muller
In reply to this post by Darius Clarke
Wow, it actually provides "key" access to ByteArrays..  I didn't see
that at first.  Very interesting..!

--- Darius Clarke <[hidden email]> wrote:

> Could, should Magma also use Amazon S3
> http://www.amazon.com/s3
> as a  storage device?
>
> I've not thought through what it would take to optimize for it, but
> it
> might reduce a lot of data/code/image persistency headaches.
>
> Cheers,
> Darius
>
>


RE: Scaling Seaside apps (was: About SToR)

Blanchard, Todd
In reply to this post by Chris Muller
You should realize that S3 provides availability over consistency.  It is quite possible that you can put a chunk of data, ask for it back, and get the previous version due to propagation delays across the replicated store.  Great for backups, not so hot for real-time usage.


RE: Scaling Seaside apps (was: About SToR)

Blanchard, Todd
In reply to this post by Chris Muller
I should also point out that the ability to make torrents makes it a great publishing mechanism.


S3 (was Re: Scaling Seaside apps)

Avi  Bryant

On Aug 1, 2006, at 11:03 AM, Blanchard, Todd wrote:

> I should also point out that the ability to make torrents makes it  
> a great publishing mechanism.

In the Squeak world, a more appropriate use of S3 might be as a
Monticello repository (Colin suggested this to me recently).  Its
architecture and permissions model are probably just about right for
source control.

Avi

S3 (was Scaling Seaside apps)

Colin Putney
In reply to this post by Blanchard, Todd

On Aug 1, 2006, at 2:00 PM, Blanchard, Todd wrote:

> You should realize that S3 provides availability over consistency.  
> It is quite possible that you can put a chunk of data, ask for it  
> back, and get the previous version due to propagation delays across  
> the replicated store.  Great for backups, not so hot for real time  
> usage.

Ah, but it's a very good match for Monticello usage patterns. In  
Monticello, each version of a package is immutable, and identified by  
a UUID. This means that clients don't have to coordinate when  
creating new versions and we can't get "old" versions. A version is  
either available or it isn't; its content can never change.



Re: Scaling Seaside apps (was: About SToR)

Jeremy Shute
In reply to this post by Chris Muller
Another datapoint, for those interested...

I hacked up a quick implementation of some parsing code I have in several
other languages, using GLORP.  The process is single-threaded, but forked
off as a background thread that is monitored from a Seaside "control
panel".

There are several, somewhat structured scrapings from web pages that I
want stored on disk.  This data should be approximately 1GB when the
process finishes.  I wrote a lightweight proxy for GLORP that makes
session access atomic, and everything works like a charm.
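A "lightweight proxy that makes session access atomic" of the kind Jeremy mentions can be sketched as a wrapper that routes every call through one lock. This is a Python sketch of the idea only; the real thing would wrap a GLORP session in Smalltalk, and `AtomicProxy`/`FakeSession` are invented names:

```python
# Serialize all access to a wrapped object through a single lock, so
# concurrent callers can't interleave inside it.
import threading

class AtomicProxy:
    def __init__(self, target):
        self._target = target
        self._lock = threading.Lock()

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr
        def locked(*args, **kwargs):
            with self._lock:    # one caller inside the session at a time
                return attr(*args, **kwargs)
        return locked

class FakeSession:              # stand-in for the database session
    def __init__(self):
        self.count = 0
    def register(self):
        self.count += 1

session = AtomicProxy(FakeSession())
threads = [threading.Thread(target=session.register) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(session.count)   # 50
```

The proxy keeps the calling code oblivious: callers send ordinary messages, and mutual exclusion happens underneath.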

What I was amazed to find was that the Squeak image, with one process
running mind you, is CPU-limited!  (I have tried a variety of different
priorities for the forked process, including the IO priorities.)  It's
been difficult for me to figure out exactly how to count the number of
message sends (looking in the Seaside profiler, I know it's quite
possible); however, looking at the Process panel seems to point the
finger at GLORP, constructing a ton of queries on the fly.  Opening the
task manager and watching bandwidth consumption agrees...  Brief periods
of activity followed by pauses as my program tries to figure out what to
do with the data it pulled.  The running Postgres instance, too, is
sitting there with 5% CPU usage, not breaking a sweat.

GLORP is a dream to work with.  It almost makes those spurious
object-access patterns look free.  :-)  But, if you don't want to store a
whole table in memory and you don't want to go twiddling down the whole
B-tree every time you do an object access, you want a cursor, and I
haven't quite figured out how to get that to work...

On a side note, I achieved 10-12x the throughput with my prototype program
(written in a different language and dumping the serialized representation
to disk), and I have moved on to yet another language to finish the job.
*Sigh*  One day I'll be able to use Squeak.

Jeremy



> Very interesting.  It looks like something that Squeak itself could
> benefit from, wrapped in a Stream or Flow interface.
>
> Whether an I/O intensive application like a DB server could benefit
> from that is hard to say, those servers typically want to have close
> (read: quick) access to the db files, I'm pretty sure there would be
> performance challenges with remote primitive access.
>
> It might be good for backups though..
>
> --- Darius Clarke <[hidden email]> wrote:
>
>> Could, should Magma also use Amazon S3
>> http://www.amazon.com/s3
>> as a  storage device?
>>
>> I've not thought through what it would take to optimize for it, but
>> it
>> might reduce a lot of data/code/image persistency headaches.
>>
>> Cheers,
>> Darius
>>
>>
>


GPG PUBLIC KEY: 0xA2B36CE5


Re: Scaling Seaside apps (was: About SToR)

Yanni Chiu
Jeremy Shute wrote:

> GLORP is a dream to work with.  It almost makes those spurious
> object-access patterns look free.  :-)  But, if you don't want to store a
> whole table in memory and you don't want to go twiddling down the whole
> B-tree every time you do an object access, you want a cursor, and I
> haven't quite figured out how to get that to work...
>
> On a side note, I achieved 10-12x the throughput with my prototype program
> (written in a different language and dumping the serialized representation
> to disk), and I have moved on to yet another language to finish the job.
> *Sigh*  One day I'll be able to use Squeak.

Do you have to have GLORP in the critical path?
If you only have a few tables, maybe coding the SQL directly
is possible. Or, use GLORP for the bulk of your model, but
isolate the performance critical portion of the model in a
separate subsystem, and use custom SQL for that portion.

Maybe GLORP is not appropriate for your data set.
Your use case does not sound ideal for any O-R framework.
Even in Java using Hibernate O-R, they recommend you NOT
use it for bulk data processing. But they do suggest a
workaround suitable for some cases, which is to use a
"report" query. What that does is bypass all the object
instantiation and caching framework needed for O-R
(i.e. you don't need to create an actual object;
you just want the data values to push out a report).

Having said that, unless you use cursors, the postgres
driver will pull the entire result set into memory.
This behaviour is an artifact of the communication protocol
between the postgres server and a client process. However,
the newer version 3 of this protocol does not pull in the
entire data set. I'd be interested to know whether you can
in fact avoid pulling in the entire data set by using cursors,
with the current postgres driver (which implements the version 2
protocol).

Assuming you can get cursors working, I'd be surprised if
you couldn't match the 10-12x increase you got using another
language. Basically, the postgres driver just pulls bytes off
the socket and makes arrays of strings.
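The cursor idea can be put in generic DB-API terms: instead of `fetchall()` materializing the whole result set, stream it in fixed-size batches so client memory stays bounded. A sketch with a stub cursor (the stub stands in for a real driver cursor; with a server-side cursor the unfetched rows stay on the server between batches):

```python
# Stream a query result in bounded batches rather than pulling the
# whole result set into memory at once.

def stream_rows(cursor, batch_size=1000):
    """Yield rows one at a time, pulling batch_size rows per round trip."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch

class StubCursor:                   # stands in for a DB-API cursor
    def __init__(self, rows):
        self._rows = list(rows)
    def fetchmany(self, n):
        batch, self._rows = self._rows[:n], self._rows[n:]
        return batch

cur = StubCursor(range(2500))
total = sum(1 for _ in stream_rows(cur, batch_size=1000))
print(total)   # 2500
```

At no point does the client hold more than one batch of rows, which is the property a 1GB result set needs.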


Re: Re: Scaling Seaside apps (was: About SToR)

Richard Huxton
Yanni Chiu wrote:
> Having said that, unless you use cursors, the postgres
> driver will pull the entire result set into memory.
> This behaviour is an artifact of the communication protocol
> between the postgres server and a client process. However,
> the newer version 3 of this protocol does not pull in the
> entire data set. I'd be interested to know whether you can
> in fact avoid pulling in the entire data set by using cursors,
> with the current postgres driver (which implements version 2
> protocol).

I'm not sure this has changed in v3 of the protocol; PG has always
returned all the rows you request. I certainly can't find any mention of
it here:
   http://www.postgresql.org/docs/8.1/static/protocol-changes.html

As you say, cursors sound like the way to go.

--
   Richard Huxton
   Archonet Ltd

Re: Scaling Seaside apps (was: About SToR)

Yanni Chiu
Richard Huxton wrote:
> I'm not sure this has changed in v3 of the protocol, PG has always
> returned all the rows you request. I certainly can't find any mention of
> it here:
>   http://www.postgresql.org/docs/8.1/static/protocol-changes.html

I looked at implementing the v3 protocol when it was introduced
(maybe 2 or 3 years ago). I recall that it didn't quite make
sense to me that cursors should already work with the v2 protocol,
yet it seemed that the v3 protocol was needed to get partial result
sets. After re-reading the spec, I agree with you - PG does return
the rows you request. So, like you said, cursors are what you need
to avoid filling up your memory with a large result set, and this
should already work with the existing driver.

Now the part that got me confused was "Extended Query" at:
     http://www.postgresql.org/docs/8.1/static/protocol-flow.html#AEN60506
where it says:
     Once a portal exists, it can be executed using an Execute message.
     The Execute message specifies the portal name (empty string denotes
     the unnamed portal) and a maximum result-row count (zero meaning "fetch all rows").

The Extended Query is new in the v3 protocol. That section, and some
other words around message synchronization, led me to conclude that
the protocol had changed a lot. Now, it seems to me that it is probably
just a matter of adding the new message types and altering the state
machine. However, adding the changes to a single state machine may
start to get ugly (i.e. unmanageable).

Do you have any sense of when (or if) the v2 protocol support
on the server side would be discontinued?


Re: Re: Scaling Seaside apps (was: About SToR)

Richard Huxton
Yanni Chiu wrote:

> Richard Huxton wrote:
>> I'm not sure this has changed in v3 of the protocol, PG has always
>> returned all the rows you request. I certainly can't find any mention
>> of it here:
>>   http://www.postgresql.org/docs/8.1/static/protocol-changes.html
>
> I looked at implementing the v3 protocol when it was introduced
> (maybe 2 or 3 years ago). I recall that it didn't quite make
> sense to me that cursors should already work with the v2 protocol,
> yet it seemed that the v3 protocol was needed to get partial result
> sets. After re-reading the spec, I agree with you - PG does return
> the rows you request. So, like you said, cursors is what you need
> to avoid filling up your memory with a large result set, and this
> should work already with the existed driver.
>
> Now the part that got me confused was "Extended Query" at:
>     http://www.postgresql.org/docs/8.1/static/protocol-flow.html#AEN60506
> where it says:
>     Once a portal exists, it can be executed using an Execute message.
>     The Execute message specifies the portal name (empty string denotes
>     the unnamed portal) and a maximum result-row count (zero meaning
> "fetch all rows").
>
> The Extended Query is new in the v3 protocol. That section, and some
> other words around message synchronization led me to conclude that
> the protocol had changed a lot. Now, it seems to me that it is probably
> just a matter of adding the new message types, and altering the state
> machine. However, adding the changes to a single state machine may
> start to get ugly (i.e. unmanagable).

I'd be surprised if it wasn't fairly straightforward to have the state
machine drop back from v3 to v2. The PG developers try to keep it simple
to connect between versions.

> Do you have any sense of when (or if) the v2 protocol support
> on the server side would be discontinued?

I don't think it's being dropped in the next release (8.2), so you're
safe for at least 18 months I'd say.

--
   Richard Huxton
   Archonet Ltd

Re: Re: Scaling Seaside apps (was: About SToR)

Jeremy Shute
In reply to this post by Yanni Chiu
On 8/2/06, Yanni Chiu <[hidden email]> wrote:
> Jeremy Shute wrote:
>
>> GLORP is a dream to work with.  It almost makes those spurious
>> object-access patterns look free.  :-)  But, if you don't want to store a
>> whole table in memory and you don't want to go twiddling down the whole
>> B-tree every time you do an object access, you want a cursor, and I
>> haven't quite figured out how to get that to work...
>>
>> On a side note, I achieved 10-12x the throughput with my prototype program
>> (written in a different language and dumping the serialized representation
>> to disk), and I have moved on to yet another language to finish the job.
>> *Sigh*  One day I'll be able to use Squeak.
>
> Do you have to have GLORP in the critical path?
> If you only have a few tables, maybe coding the SQL directly
> is possible. Or, use GLORP for the bulk of your model, but
> isolate the performance critical portion of the model in a
> separate subsystem, and use custom SQL for that portion.

Yes, it's very much in the critical path.  I'm sorry but I'm still amazed that it can't assemble and ship queries to Postgres as fast as I can get data from a cable modem.  That was a huge shocker -- I've got a weak cable connection on one side and a disk on the other, and rifling through strings and objects in RAM is the issue???

I agree that there are solutions which involve direct-SQL access, and making a mess of otherwise clean code (but a well isolated mess, of course).  I could also simply contribute to GLORP in order to make it better.  I would do this in a second if I thought it would get me from point A to point B, GLORP is great software!

As an off-topic side-note, in order to GET from point A to point B, I addressed the problem by developing a "new to me" paradigm for dealing with data of this type.  So far, I think I did the right thing.  Like SQL (and unlike serialization or the Prevayler approach), multiple programs can get access to the same objects from out-of-core data structures.  But unlike SQL, the indices require ~1 disk seek to get at objects after a cache miss (i.e. hash-based), columns can be in a much more structured format (think of something similar to memcpy for a DOM tree, for instance), etc.



> Having said that, unless you use cursors, the postgres
> driver will pull the entire result set into memory.

Sigh.  I know.  The options seem to be:

* Get the whole result set if it fits in memory.
* Seek the same B-tree nodes over and over again if it doesn't (the root should be cached by the RDBMS, of course).

Cursors would definitely be the answer to this, but I recognize that I am in the minority in my need for them.  Really, I don't think the cursor would fix my 10-12x problem.  For me, it's a matter of bypassing caches and using prepared statements.  But, I wanted to deal in objects, and found a fine way to continue to do that without the overhead of OR mapping.



Assuming you can get cursors working, I'd be surprised if
you couldn't match the 10-12x increase you got using another
language. Basically, the postgres driver just pulls bytes off
the socket and makes arrays of strings.

I'm betting that Squeak is capable of that 10-12x with proper massage.  But the 10-12x will simply match the prototype implementation, which in turn has not been massaged.  (In fact, both implementations are really stupid in that they are SERIAL.)  I have figured out how to use a proxy object to get the GLORP sessions to be thread-safe, but the next barrier will be lock contention as the serial implementation becomes a simple producer/consumer queue.
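For what it's worth, the producer/consumer shape described above can be sketched in a few lines with Squeak's SharedQueue, which serializes access internally; the job blocks are placeholders, not Jeremy's code.

```smalltalk
| queue |
queue := SharedQueue new.
"Consumer: a background process pulls work off the queue one job at a time."
[[queue next value] repeat] forkAt: Processor userBackgroundPriority.
"Producer: the web side enqueues closures instead of running them serially."
queue nextPut: [Transcript show: 'running query 1'; cr].
queue nextPut: [Transcript show: 'running query 2'; cr].
```

Once more than one consumer pulls from the queue, contention shifts to the queue's own lock — the "next barrier" mentioned above.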

I would say that Squeak is currently state-of-the-art in terms of programmer interface -- Seaside and GLORP are basically unrivalled in their design and terseness.  Avi didn't need any of the stuff I mentioned to make a great piece of software.  But, the subject was "scalability", so I wanted to offer myself as a data point.

Jeremy

_______________________________________________
Seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: Re: Scaling Seaside apps (was: About SToR)

Colin Putney

On Aug 3, 2006, at 4:51 PM, Jeremy Shute wrote:

> I agree that there are solutions which involve direct-SQL access,  
> and making a mess of otherwise clean code (but a well isolated  
> mess, of course).  I could also simply contribute to GLORP in order  
> to make it better.  I would do this in a second if I thought it  
> would get me from point A to point B, GLORP is great software!

Hi Jeremy,

You might want to have a look at ROE. It was a little experiment that  
Avi did for creating SQL queries in a nice object-oriented way. See  
the url below for more explanation. The code is in SqueakMap.

http://www.cincomsmalltalk.com/userblogs/avi/blogView?showComments=true&entry=3246121322

Colin

Re: Re: Scaling Seaside apps (was: About SToR)

keith1y
In reply to this post by Jeremy Shute
Jeremy

am I right in understanding that what you are saying is that Squeak is
simply not able to rip through strings as fast as, say, a Perl regex?
Does this mean that there is some overhead in the handling of Strings in
Squeak that could use a look? I can imagine that ByteArrays may be more
efficient, if less useful, than Strings.

Just a thought, but as you talk about all this work with strings and so
forth I am wondering about object creation/deletion overhead.

One example for you. In a UI system that I once used, there were a lot
of Rect objects flying about.  It turns out that in this case extreme
performance improvements could be had by simply reusing one Rect
instance and passing it into the routines that need it. Hundreds of
calculations and operations can all be performed without filling
memory up with instances that are instantly thrown away and then hang
around for extensive garbage collection later.

so, for example:

drawSquare: size
    | w |
    w := Rect new.
    w width: size height: size.
    "do things with w here..."

becomes

drawSquare: size on: aRect
    "note: no new object allocation"
    aRect width: size height: size.
    "do things with aRect here..."

I have used this tactic/pattern on several occasions many years ago, and
I struggle to remember the details of specific instances, but I think
one such instance was in an import routine. I was importing a data table
of alarms that are raised by a piece of telecoms equipment. The input
would have been a raw text file, the output 6000 or so populated objects
with some munging in between. Simply reusing the same object as a buffer
saved a lot of time.

best regards

Keith




Re: Re: Scaling Seaside apps (was: About SToR)

Rick Flower
In reply to this post by Jeremy Shute
Jeremy Shute wrote:

>
> Yes, it's very much in the critical path.  I'm sorry but I'm still
> amazed that it can't assemble and ship queries to Postgres as fast as I
> can get data from a cable modem.  That was a huge shocker -- I've got a
> weak cable connection on one side and a disk on the other, and rifling
> through strings and objects in RAM is the issue???
>
> I agree that there are solutions which involve direct-SQL access, and
> making a mess of otherwise clean code (but a well isolated mess, of
> course).  I could also simply contribute to GLORP in order to make it
> better.  I would do this in a second if I thought it would get me from
> point A to point B, GLORP is great software!

Jeremy --

I plopped a note over on the Glorp mailing list about your cursor
comment (I hope you didn't mind) and got the following reply from Alan
Knight about what happens with cursors & Glorp (he wanted me to post
this since he wasn't able to post directly):

==========================================================================
If you can post, you might mention that Glorp actually does everything
internally in terms of cursors. If you want the result set returned only
part at a time, you can set the query collectionType: to
GlorpCursoredStream, which gives you a stream on the results. However,
that will then depend on the underlying driver's behaviour. I know that
in VW, I've seen complaints the Postgresql driver doesn't do cursors
very effectively - it gets all the results before returning anything.
Other drivers, however, certainly do do cursors.
==========================================================================
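For anyone wanting to try what Alan describes, the setup might look roughly like this (the domain class and query are illustrative, and whether rows actually arrive incrementally depends on the underlying driver, as he notes):

```smalltalk
| query results |
query := Query read: Person where: [:each | each age > 21].
query collectionType: GlorpCursoredStream.   "stream instead of a full collection"
results := session execute: query.
[results atEnd] whileFalse:
    [Transcript show: results next printString; cr].
```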

-- Rick