I have also had that idea for a while, and together with Michael Lucas-Smith
we started to work on something like that on VisualWorks. It's called
Prevayler, and you can find more about it on my wiki:
http://wiki.eranova.si/aida/Prevayler+persistency

I think a Prevayler is quite easily achievable, especially on VisualWorks,
because it supports so-called object immutability: you can set an object
read-only, and when someone tries to change it, an unhandled exception (UHE)
is raised. All a Prevayler needs to do is catch that exception and save the
changes. I don't know if Squeak supports immutability too?

Janko

Bert Freudenberg wrote:
> On 23.10.2006, at 21:30, Philippe Marschall wrote:
>
>> 2006/10/23, Cees de Groot <[hidden email]>:
>>> On 10/23/06, Philippe Marschall <[hidden email]> wrote:
>>> > > > So about 300 Euros?
>>>
>>> [...]
>>>
>>> > 64bit VM?
>>> >
>>> You pay the hosting bills for a new box? ;)
>>
>> I'm willing to pay 2 GB of RAM if that's what is needed to run Pier.
>> That Squeak can't handle this is a Squeak specific limitation that has
>> nothing to do with the point that memory is that cheap.
>> As pointed out numerous times on squeak-dev and disputed by none, all
>> VM related issues can be fixed easily by just fixing the VM. This is
>> no problem since the VM is open source.
>
> If we had a transactional virtual object memory that's continuously
> saved to disk (think OOZE/LOOM), that might be viable. Perhaps with
> Magma you could have almost the same semantics, just be careful what you
> touch. But not with the current object memory. No way. Not if you care
> about the data.
>
> It's not about RAM being cheap or not. It's about designing for the
> common and the worst case. Why you would want to bring in gigabytes of
> data if the working set is just a few megabytes is beyond me.
>
> - Bert -
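For illustration, a rough Smalltalk sketch of the trap-and-journal idea.
Recent Squeak/Pharo images expose immutability via #beReadOnlyObject and
#beWritableObject; the class names WikiPage, ModificationForbidden and
ChangeJournal below are assumptions made up for the example, not an
existing API:

    "Sketch only: protect a page, trap the attempted write, journal it, retry."
    | page |
    page := WikiPage new.
    page beReadOnlyObject.                    "mark the object read-only"
    [ page title: 'Renamed page' ]
        on: ModificationForbidden             "assumed exception raised on the write"
        do: [ :ex |
            ChangeJournal default record: ex. "assumed hook: append the change to a log"
            page beWritableObject.
            ex retry ]

The point is only that the persistence layer never has to be called
explicitly; the attempted write itself triggers the journaling.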
In reply to this post by Bert Freudenberg
2006/10/23, Bert Freudenberg <[hidden email]>:
> On 23.10.2006, at 21:30, Philippe Marschall wrote:
>
> > 2006/10/23, Cees de Groot <[hidden email]>:
> >> On 10/23/06, Philippe Marschall <[hidden email]> wrote:
> >> > > > So about 300 Euros?
> >>
> >> [...]
> >>
> >> > 64bit VM?
> >> >
> >> You pay the hosting bills for a new box? ;)
> >
> > I'm willing to pay 2 GB of RAM if that's what is needed to run Pier.
> > That Squeak can't handle this is a Squeak specific limitation that has
> > nothing to do with the point that memory is that cheap.
> > As pointed out numerous times on squeak-dev and disputed by none, all
> > VM related issues can be fixed easily by just fixing the VM. This is
> > no problem since the VM is open source.
>
> If we had a transactional virtual object memory that's continuously
> saved to disk (think OOZE/LOOM), that might be viable. Perhaps with
> Magma you could have almost the same semantics, just be careful what
> you touch. But not with the current object memory. No way. Not if you
> care about the data.
>
> It's not about RAM being cheap or not. It's about designing for the
> common and the worst case. Why you would want to bring in gigabytes
> of data if the working set is just a few megabytes is beyond me.

The point was just that holding the whole wiki in memory is no problem,
memory- or money-wise. That the VM, as in many other cases, is the real
problem (and I'm quite sure the Java VM would be up to it) is a completely
unrelated issue.

Philippe
In reply to this post by Michael Rueger-6
The process of porting minnow from swiki to pier is likely to happen.
It will just require a little patience, as the needed components are
road-tested and refined. As you can imagine, my own testing process has
been somewhat hindered by image freezes. Now that I have been informed
that Squeak VM 3.6.3 is actually stable, I have been able to create a
pier-wiki with 6200 or more pages for testing purposes.

Initial stats using Pier with PRNullPersistency (i.e. everything in memory):

6200 pages (generated one for each Squeak class from the Squeak sources)
235352 internal links
Adding a page: 100-500 ms
Removing a page: 215 seconds!! (many wikis don't support removing pages anyway)
Total memory = 77 MB

Compare this to minnow: the text data of its 5889 pages is about 30 MB. Of
course the swiki has full page history, and uploaded files too.

Pier-Magma should be able to handle this kind of load, but that remains to
be explicitly tested. Anticipated work to make things workable:

1. Some explicit caching of items that will slow pier-magma down with data
on disk rather than in memory. Removal of pages may be extremely slow
without this.
2. Explicit support for an indexed full-text search which avoids the need
to traverse the whole data tree for a simple search (see the sketch after
this post).
3. Some form of logging of user edits in addition to the default
persistency strategy.

J J wrote:
> I would say go to Pier. I think Keith released some software that you can
> point at the swiki and it will slurp it all up. Am I right Keith?

I haven't written any proper data slurper for minnow. I believe there is
already an importing tool. I have pointed wget at minnow to get the current
set of pages as a test data set. It's about 30 MB or so.

Which leads me to a question. How would Pier handle some random person
running this script?

#!/usr/bin/ruby
for i in 1..5889
  print `wget --user=squeak --password=viewpoints http://minnow.cc.gatech.edu/squeak/#{i}.edit`
end

This would probably create almost 5800 Seaside sessions in a matter of
minutes?

Keith
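On point 2 above, an indexed full-text search essentially means keeping an
inverted index next to the page tree. A rough, hypothetical sketch using
plain Squeak collections (not Pier's API; `pages` stands for whatever
collection of wiki pages is at hand, assumed to answer #title and #text):

    "Build word -> set-of-page-titles once, update it whenever a page is saved."
    | index |
    index := Dictionary new.
    pages do: [ :page |
        page text substrings do: [ :word |
            (index at: word asLowercase ifAbsentPut: [ Set new ])
                add: page title ] ].
    "A search then touches only the index instead of walking the whole tree:"
    index at: 'magma' ifAbsent: [ Set new ]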
In reply to this post by Philippe Marschall
On 23.10.2006, at 22:55, Philippe Marschall wrote:

> 2006/10/23, Bert Freudenberg <[hidden email]>:
>> On 23.10.2006, at 21:30, Philippe Marschall wrote:
>>
>>> 2006/10/23, Cees de Groot <[hidden email]>:
>>>> On 10/23/06, Philippe Marschall <[hidden email]> wrote:
>>>> > > > So about 300 Euros?
>>>>
>>>> [...]
>>>>
>>>> > 64bit VM?
>>>> >
>>>> You pay the hosting bills for a new box? ;)
>>>
>>> I'm willing to pay 2 GB of RAM if that's what is needed to run Pier.
>>> That Squeak can't handle this is a Squeak specific limitation that has
>>> nothing to do with the point that memory is that cheap.
>>> As pointed out numerous times on squeak-dev and disputed by none, all
>>> VM related issues can be fixed easily by just fixing the VM. This is
>>> no problem since the VM is open source.
>>
>> If we had a transactional virtual object memory that's continuously
>> saved to disk (think OOZE/LOOM), that might be viable. Perhaps with
>> Magma you could have almost the same semantics, just be careful what
>> you touch. But not with the current object memory. No way. Not if you
>> care about the data.
>>
>> It's not about RAM being cheap or not. It's about designing for the
>> common and the worst case. Why you would want to bring in gigabytes
>> of data if the working set is just a few megabytes is beyond me.
>
> The point was just that holding the whole wiki in the memory is no
> problem memory or money wise.

No, this was not the point at all. The point was that *even* if you
could have as many Gigabytes of RAM as you want, holding everything
in the image *without* being backed by some permanent storage does
not scale, and therefore is unsuited for real deployment.

> That the vm, like in many other cases too, is the real problem (and
> I'm quite sure the Java VM would be up to it) is a completely
> unrelated issue.

It's news to me that the Java VM supports an object image. Or that
any real-world system on Java would just load a snapshot of *all* its
data and save it *in whole* later - I sincerely doubt that.

- Bert -
> It's news to me that the Java VM supports an object image. Or that any
> real-world system on Java would just load a snapshot of *all* its data
> and save it *in whole* later - I sincerely doubt that.
>
> - Bert -

I wrote a system which held a substantial data set in image with no
problems (this was ST/X). The biggest problem I had was with stdio.h being
limited to 255 open file descriptors in certain situations on Solaris. As a
workaround I had to open 256 dummy file descriptors, then open the 700-1000
file descriptors that I wanted to use, then close the dummy file
descriptors, so as to leave some in the range 0-255 available for those
parts of the system that required them.

Another team tried a similar project in Perl, and another team followed
suit in Java. Last I heard they reimplemented from scratch in C++. The Java
system took a farm of machines to run it. Following this experience I have
no confidence in the Java VM, or associated technologies, being able to run
anything of any size or complexity.

Smalltalk can load a simulation of over 1000 interacting telecoms units,
with a full simulation of all their configurations, cards, alarms, etc.,
and have that simulation running in about 20 seconds - the time it takes to
load a 200-400 MB image, which is not long on a big expensive Sun server
machine. I can't imagine even attempting the same in Java without requiring
a database backend and all of the overhead that that would entail.

Overall Squeak's VM may not be as fast as ST/X, but I think it does a
pretty good job.

Keith
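The descriptor workaround Keith describes looks roughly like this as a
Squeak-flavoured sketch (the original system was ST/X; the file names and
the `fileNames` collection are placeholders, not anything from the post):

    "Burn the low 0-255 descriptors with throwaway files, open the real files
     above that range, then release the low descriptors for stdio-based code."
    | dummies real |
    dummies := (1 to: 256) collect: [ :i |
        FileStream forceNewFileNamed: '/tmp/fd-dummy-', i printString ].
    real := fileNames collect: [ :each | FileStream readOnlyFileNamed: each ].
    dummies do: [ :each | each close ]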
In reply to this post by Michael Rueger-6
2006/10/23, Michael Rueger <[hidden email]>:
> Philippe Marschall wrote:
> > 2006/10/23, Cees de Groot <[hidden email]>:
> >> On 10/23/06, Philippe Marschall <[hidden email]> wrote:
> >> > > > So about 300 Euros?
> >>
> >> [...]
> >>
> >> > 64bit VM?
> >> >
> >> You pay the hosting bills for a new box? ;)
> >
> > I'm willing to pay 2 GB of RAM if that's what is needed to run Pier.
> > That Squeak can't handle this is a Squeak specific limitation that has
> > nothing to do with the point that memory is that cheap.
> > As pointed out numerous times on squeak-dev and disputed by none, all
> > VM related issues can be fixed easily by just fixing the VM. This is
> > no problem since the VM is open source.
>
> So I'm assuming you just volunteered to fix these issues so we can
> switch to Pier in a few weeks?

That would be a waste of time because in ten years we will have a new,
fixed and cool VM.

Philippe
In reply to this post by Bert Freudenberg
2006/10/23, Bert Freudenberg <[hidden email]>:
> On 23.10.2006, at 22:55, Philippe Marschall wrote:
>
>> 2006/10/23, Bert Freudenberg <[hidden email]>:
>>> On 23.10.2006, at 21:30, Philippe Marschall wrote:
>>>
>>>> 2006/10/23, Cees de Groot <[hidden email]>:
>>>>> On 10/23/06, Philippe Marschall <[hidden email]> wrote:
>>>>> > > > So about 300 Euros?
>>>>>
>>>>> [...]
>>>>>
>>>>> > 64bit VM?
>>>>> >
>>>>> You pay the hosting bills for a new box? ;)
>>>>
>>>> I'm willing to pay 2 GB of RAM if that's what is needed to run Pier.
>>>> That Squeak can't handle this is a Squeak specific limitation that has
>>>> nothing to do with the point that memory is that cheap.
>>>> As pointed out numerous times on squeak-dev and disputed by none, all
>>>> VM related issues can be fixed easily by just fixing the VM. This is
>>>> no problem since the VM is open source.
>>>
>>> If we had a transactional virtual object memory that's continuously
>>> saved to disk (think OOZE/LOOM), that might be viable. Perhaps with
>>> Magma you could have almost the same semantics, just be careful what
>>> you touch. But not with the current object memory. No way. Not if you
>>> care about the data.
>>>
>>> It's not about RAM being cheap or not. It's about designing for the
>>> common and the worst case. Why you would want to bring in gigabytes
>>> of data if the working set is just a few megabytes is beyond me.
>>
>> The point was just that holding the whole wiki in the memory is no
>> problem memory or money wise.
>
> No, this was not the point at all. The point was that *even* if you
> could have as many Gigabytes of RAM as you want, holding everything
> in the image *without* being backed by some permanent storage does
> not scale, and therefore is unsuited for real deployment.

Let me quote Michael Rueger:

> IIRC this does not pull in the history. For the SmallWiki port Thomas
> back then wrote an importer that imports everything. The persistency
> also avoids having to keep everything in memory, which with the amount
> of content on Minnow is not practical anyways. I know memory is cheap,
> but not that cheap ;-)

Having no permanent storage (save image doesn't count) is just plain
stupid and therefore is unsuited for real deployment. But this has
nothing to do with holding the whole data in the image. You can save a
page to the filesystem when it was edited or created and still have it
in the image. Pier has hooks for this since before it was called Pier.

Having all the data in RAM scales the same way as having all the data
on disk. Linearly. IIRC Google can hold almost the entire web in RAM.
So there is virtually no limit to that. I know this is not clever. I
just say it is possible and the cost is not excessive (holding Minnow
in RAM, not the web).

>> That the vm, like in many other cases too, is the real problem (and
>> I'm quite sure the Java VM would be up to it) is a completely
>> unrelated issue.
>
> It's news to me that the Java VM supports an object image. Or that
> any real-world system on Java would just load a snapshot of *all* its
> data and save it *in whole* later - I sincerely doubt that.

I was talking about dealing with 2 GB of RAM.

Philippe
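The save-on-edit idea Philippe mentions amounts to a write-through hook. A
minimal sketch, assuming a page answers #title and #text; this is only an
illustration, not Pier's actual persistency hook, and `editedPage` is a
placeholder:

    "Keep the page in the image, but mirror every edit to a file as it happens."
    | savePage |
    savePage := [ :page | | stream |
        stream := FileStream forceNewFileNamed: 'pages/', page title, '.txt'.
        [ stream nextPutAll: page text ] ensure: [ stream close ] ].
    savePage value: editedPage   "editedPage: whatever page was just changed"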
On 24.10.2006, at 21:04, Philippe Marschall wrote:
> Having no permanent storage (save image doesn't count) is just plain
> stupid and therefore is unsuited for real deployment. But this has
> nothing to do with holding the whole data in the image. You can save a
> page to the filesystem when it was edited or created and still have it
> in the image. Pier has hooks for this since before it was called Pier.
>
> Having all the data in RAM scales the same way as having all the data
> on disk. Linearly. IIRC Google can hold almost the entire web in RAM.
> So there is virtually no limit to that. I know this is not clever. I
> just say it is possible and the cost is not excessive (holding Minnow
> in RAM, not the web).

I thought we were having a serious discussion, and not just pointing
fingers at RAM prices. Or pointing to non-existent VM technology, as
you did in another thread.

I stand by my assessment that holding *everything* including all
versions of all pages and also all uploaded files in RAM is just
plain stupid.

- Bert -
> > Having all the data in RAM scales the same way as having all the data
> > on disk. Linearly. IIRC Google can hold almost the entire web in RAM.
> > So there is virtually no limit to that. I know this is not clever. I
> > just say it is possible and the cost is not excessive (holding Minnow
> > in RAM, not the web).
>
> I thought we were having a serious discussion, and not just pointing
> fingers at RAM prices. Or pointing to non-existent VM technology, as
> you did in another thread.

I strongly second Philippe.

The Squeak VM technology will simply die if it is unable to efficiently
address more than 2 GB of data and remains limited to doing its
calculations on only 1 CPU. There are technologies like memory-mapped
files that transparently give an unlimited amount of RAM (if the GC
was a bit smarter ...).

> I stand by my assessment that holding *everything* including all
> versions of all pages and also all uploaded files in RAM is just
> plain stupid.

We are used to being called ridiculous and stupid. No problem.

And yes, we do not keep files in RAM. We store them on the file-system so
that Apache can serve them quickly: reading the file into the image and
pushing it into a socket is way too slow anyway. And yes, Apache caches
often-requested files in RAM.

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
On 24.10.2006, at 22:28, Lukas Renggli wrote:
>> > Having all the data in RAM scales the same way as having all the data
>> > on disk. Linearly. IIRC Google can hold almost the entire web in RAM.
>> > So there is virtually no limit to that. I know this is not clever. I
>> > just say it is possible and the cost is not excessive (holding Minnow
>> > in RAM, not the web).
>>
>> I thought we were having a serious discussion, and not just pointing
>> fingers at RAM prices. Or pointing to non-existent VM technology, as
>> you did in another thread.
>
> I strongly second Philippe.
>
> The Squeak VM technology will simply die, if it is unable to
> efficiently address more than 2 GB of data and process its
> calculations on only 1 CPU. There are technologies like memory-mapped
> files that transparently give an unlimited amount of RAM (if the GC
> was a bit smarter ...)

Sure. That's just irrelevant to the discussion at hand.

>> I stand by my assessment that holding *everything* including all
>> versions of all pages and also all uploaded files in RAM is just
>> plain stupid.
>
> We are used to be called ridiculous and stupid. No problem.

Come on, I wasn't calling you stupid. Below you actually say that you
are not doing what I described - so why are you upset?

And wouldn't you agree that *if* someone would hold, for example, all
uploaded files of a large Wiki in the Squeak image running on the
current VM, that this would be highly unreasonable? I can imagine
systems that allow that, I pointed out ideas for such systems in fact,
but for our immediate problem we need to stick to what we have.

> And yes, we do not keep files in RAM. We store them on the file-system
> so that Apache can serve them quickly: reading the file into the image
> and pushing it into a socket way too slow anyway. And yes, Apache
> caches often requested files in the RAM.

So we are not in disagreement after all.

- Bert -
In reply to this post by Lukas Renggli
On 24-Oct-06, at 1:28 PM, Lukas Renggli wrote:

> The Squeak VM technology will simply die, if it is unable to
> efficiently address more than 2 GB of data and process its
> calculations on only 1 CPU. There are technologies like memory-mapped
> files that transparently give an unlimited amount of RAM (if the GC
> was a bit smarter ...)

Generate your VM with the '64 bit' flag turned on in VMMaker.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: CPM: Change Programmer's Mind
In reply to this post by Bert Freudenberg
2006/10/24, Bert Freudenberg <[hidden email]>:
> On 24.10.2006, at 21:04, Philippe Marschall wrote:
>
> > Having no permanent storage (save image doesn't count) is just plain
> > stupid and therefore is unsuited for real deployment. But this has
> > nothing to do with holding the whole data in the image. You can save a
> > page to the filesystem when it was edited or created and still have it
> > in the image. Pier has hooks for this since before it was called Pier.
> >
> > Having all the data in RAM scales the same way as having all the data
> > on disk. Linearly. IIRC Google can hold almost the entire web in RAM.
> > So there is virtually no limit to that. I know this is not clever. I
> > just say it is possible and the cost is not excessive (holding Minnow
> > in RAM, not the web).
>
> I thought we were having a serious discussion, and not just pointing
> fingers at RAM prices. Or pointing to non-existent VM technology, as
> you did in another thread.

No, that is not the case. Sorry for the misunderstanding.

> I stand by my assessment that holding *everything* including all
> versions of all pages and also all uploaded files in RAM is just
> plain stupid.

I never questioned that one.

Philippe
In reply to this post by keith1y
In my experience, trying to fit such a large DB in RAM can be a problem.
The Squeak VM has proven to be not so kind when the image size grows, even
if we will no longer have the 2 GB limit with the 64-bit VM.

I used Magma years ago, and it was quite slow once the data started to
grow. Even using the collection classes provided for large data sets, the
result was not so impressive. I think Magma has improved, but what about
using a simple relational database for the pages? Squeak has a MySQL
driver... and also one for Postgres...

On 10/23/06, Keith Hodges <[hidden email]> wrote:
> The process of porting minnow from swiki to pier, is likely to happen.
> It will just require a little bit of patience, as the needed components
> are road-tested and refined.

--
"Just Design It"
GG
Software Architect
http://www.objectsroot.com/
In reply to this post by Bert Freudenberg
Bert Freudenberg <[hidden email]> writes:
> I thought we were having a serious discussion, and not just pointing
> fingers at RAM prices. Or pointing to non-existent VM technology, as
> you did in another thread.
>
> I stand by my assessment that holding *everything* including all
> versions of all pages and also all uploaded files in RAM is just
> plain stupid.

I am not sure about just plain stupid, but it's at least a very risky thing
to do. Dealing with such a large image is almost certainly harder than
rewriting it not to need so much memory.

The challenge to the hardware is just the beginning. Squeak's VM is not at
all made for such big images. I saw "funny" GC behavior with my 75 MB
images for Chuck. I'd want to run some experiments before entrusting it to
a 2 GB image. Probably you'd have to code your software carefully w.r.t.
memory management.

The other problem leaping out to me is managing the data over time,
especially when corruption inevitably occurs. Are you ready to open a
Squeak just to debug the data? Are you aware that Squeak can lose images in
some cases? [1] With the real data in simple files or in a database, these
issues are much less risky.

Overall, I am not opposed to jumping ship from ComSwiki. Indeed, it would
be excellent to use wiki software that is maintained by someone highly
motivated. Even given that, however, should we not wait until we have
something that is *already* better than ComSwiki?

-Lex

[1] http://lists.squeakfoundation.org/pipermail/squeak-dev/2001-January/009731.html