Is Squeak/Pharo an appropriate language choice?

Charles Hixson-2
I'm contemplating a project that would benefit greatly by a persistent
memory image, though I'll eventually (in a year or so) need the 64-bit
image, but:
The image will be a lot larger than RAM.  It would include a directed
graph that had an index of a million or so entries, and most nodes
wouldn't be indexed.  So in order to even load it would need to use some
sort of lazy access.  And I'm not even sure that a Dictionary of over a
million items is reasonable.  (Naturally none of the examples address
this problem.)

Additionally, all of my (written) documentation is so old that it
doesn't even discuss multi-processor systems, so I don't know whether
modern Smalltalks make any use of additional available processors.

I'd really like some advice, and possibly some references.  I know that
Smalltalk has the reputation for being slow (yes, I've been reading
about the recent speed-ups), but much of what I'd need to write in any
other language seems like it may already be present in Smalltalk, so if
it would work, I'd like to choose it.  But I won't be able to test this
until the application has been running for quite a while, so it would be
very desirable to know ahead of time.

--
Charles Hixson

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners

Is Squeak/Pharo an appropriate language choice?

Louis LaBrunda
Hi Charles,

You may want to cross post this on the main Squeak news group:
gmane.comp.lang.smalltalk.squeak.general where there are people that are
interested in non-beginner questions.

I don't have enough experience with Squeak/Pharo to give you an answer but
I'm interested in hearing the answer and in what your project is.

Lou


-----------------------------------------------------------
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon
mailto:[hidden email] http://www.Keystone-Software.com

Re: Is Squeak/Pharo an appropriate language choice?

Levente Uzonyi-2
In reply to this post by Charles Hixson-2
On Thu, 31 Oct 2013, Charles Hixson wrote:

> I'm contemplating a project that would benefit greatly by a persistent memory
> image, though I'll eventually (in a year or so) need the 64-bit image, but:
> The image will be a lot larger than RAM.  It would include a directed graph

The current garbage collector is not suitable for large images. GC delays
become noticeable when the image grows over a few hundred MBs. Eliot is
working on a better one, but we won't know how it performs until it's
ready.

I don't see how your image could be a lot larger than RAM. It's
technically possible, but it's pretty likely that it would be too slow to
be practical.

> that had an index of a million or so entries, and most nodes wouldn't be
> indexed.  So in order to even load it would need to use some sort of lazy
> access.  And I'm not even sure that a Dictionary of over a million items is
> reasonable.  (Naturally none of the examples address this problem.)

The performance of Dictionary mainly depends on the implementation of
#hash and #= of the objects you use as keys.
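
The same point holds for hash tables in general. As a rough Python analogue (not Squeak code), the sketch below uses a hypothetical `NodeId` key type whose `__hash__`/`__eq__` pair plays the role of #hash/#=, and a hypothetical `BadNodeId` that keeps a legal but degenerate hash, so every key shares one probe chain. Timing the two suggests that the key methods, more than the entry count, decide how a large table behaves.

```python
import time
from typing import Tuple


class NodeId:
    """Hypothetical dictionary key wrapping an integer id#."""

    __slots__ = ("value",)

    def __init__(self, value: int) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        # Only keys of exactly the same class compare equal.
        return type(other) is type(self) and self.value == other.value

    def __hash__(self) -> int:
        # Equal ids hash equal, and distinct ids spread across buckets.
        return hash(self.value)


class BadNodeId(NodeId):
    """Same equality as NodeId, but a degenerate hash."""

    __slots__ = ()

    def __hash__(self) -> int:
        # Still legal (equal keys hash equal), but every key lands in the
        # same probe chain, so each insert compares against earlier keys.
        return 0


def time_build(key_type: type, n: int) -> Tuple[dict, float]:
    """Build {key_type(i): i for i in range(n)}; return it with elapsed seconds."""
    start = time.perf_counter()
    table = {key_type(i): i for i in range(n)}
    return table, time.perf_counter() - start


if __name__ == "__main__":
    for cls in (NodeId, BadNodeId):
        _, seconds = time_build(cls, 1_000)
        print(f"{cls.__name__}: {seconds:.4f} s for 1,000 inserts")
```

With the constant hash, insert number i compares against roughly i earlier keys, so total build time grows quadratically with the number of entries.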

>
> Additionally, all of my (written) documentation is so old that it doesn't
> even discuss multi-processor systems, so I don't know whether modern
> Smalltalks make any use of additional available processors.

Squeak/Pharo don't support them from a single image. There are
experimental VMs designed for multi-processor systems (RoarVM, HydraVM),
but AFAIK none of them is ready for production use.

>
> I'd really like some advice, and possibly some references.  I know that
> Smalltalk has the reputation for being slow (yes, I've been reading about the
> recent speed-ups), but much of what I'd need to write in any other language
> seems like it may already be present in Smalltalk, so if it would work, I'd
> like to choose it.  But I won't be able to test this until the application
> has been running for quite a while, so it would be very desirable to know
> ahead of time.

It's hard to tell more without knowing more details about the project.


Levente

P.S.: you might want to check out GemStone/S
http://gemtalksystems.com/index.php/products/gemstones/

Re: Is Squeak/Pharo an appropriate language choice?

Charles Hixson-2
I think you *did* answer my questions. In a way that means a lot of
extra work for me.
Too much of what I want to do depends on things that are currently
experimental in Smalltalk.  It sounds like the image can't load lazily,
which would probably be necessary if this were to work at all.  (Yeah,
the 64-bit image could hold enough, but I don't have the RAM to hold it
all, and getting that much RAM is ridiculous, when most of it would be
rolled out most of the time.)

If I'm going to need to use a database, and handle my own rolling in and
out anyway, then Smalltalk isn't a good choice.  And while multiple
processing is only a speed-up thing, that's a pretty important thing in
and of itself.

Gemstone isn't a good choice as I need a FOSS distributable. (Actually,
if I'm reading the web site properly they don't mention what their
license is, and it seems as if their Smalltalk version is Pharo...which
we've already covered.)

FWIW, I'm well aware that I'm trying to run too much program on too
small a system.  I know this implies a massive speed penalty. But that's
true whatever approach I take.  I was hoping that I could avoid doing my
own memory management, and for that Smalltalk appeared the only feasible
choice.  Apparently, however, I'm trying something a bit beyond the
bleeding edge at the current state of the art.

As to more details as to what I'm planning:
So what I'm going to need to do is connect the graph nodes by id#s, and
roll them in from a database and stick them in a dictionary (indexed by
id#, as most of the nodes won't have any other unique and persistent
id).  This is necessary as each node will link to up to around 80 other
nodes, with some of the links being bidirectional, but not dependably
so.  And I'll need another index of "words" which are indexes from
external symbols into nodes.  Doing it this way, most of it can be kept
rolled out most of the time, but there's an obvious speed penalty.  So
I'll need to track which references are stale and roll them out to disk
(or just drop them, if they aren't dirty).  Etc.  Much of this would
have been handled automatically in Smalltalk, but not the automatic roll
out, apparently.  (In Smalltalk I'd use references rather than id#s, in
fact id#s wouldn't have been needed.)
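
A minimal Python sketch of the roll-in/roll-out bookkeeping described above, assuming a plain mutable mapping stands in for the database and that "stale" means "not accessed within `max_age` seconds". `NodeCache`, `CachedNode`, and `roll_out_stale` are hypothetical names. The clock is injectable so staleness can be tested without waiting.

```python
import copy
import time
from dataclasses import dataclass
from typing import Callable, Dict, MutableMapping


@dataclass
class CachedNode:
    data: dict
    dirty: bool = False
    last_access: float = 0.0  # in-memory bookkeeping only, never written out


class NodeCache:
    """Resident nodes keyed by id#, rolled in from and out to a slower store.

    `store` is any mutable mapping standing in for the database (an
    assumption; a real version would wrap an actual database).
    """

    def __init__(self, store: MutableMapping[int, dict],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.store = store
        self.clock = clock
        self.resident: Dict[int, CachedNode] = {}

    def get(self, node_id: int) -> dict:
        entry = self.resident.get(node_id)
        if entry is None:
            # Roll in a private copy so in-memory edits don't leak into the store.
            entry = CachedNode(data=copy.deepcopy(self.store[node_id]))
            self.resident[node_id] = entry
        entry.last_access = self.clock()
        return entry.data

    def mark_dirty(self, node_id: int) -> None:
        self.resident[node_id].dirty = True

    def roll_out_stale(self, max_age: float) -> int:
        """Evict nodes untouched for more than max_age; return how many left memory."""
        now = self.clock()
        stale = [nid for nid, e in self.resident.items()
                 if now - e.last_access > max_age]
        for nid in stale:
            entry = self.resident.pop(nid)
            if entry.dirty:
                self.store[nid] = entry.data  # write back only what changed
            # Clean entries are simply dropped: the store already has them.
        return len(stale)
```
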
I'll probably write the first version in Python (rather than Ruby,
because Doxygen documentation for Python is better than I can generate
for Ruby, though Ruby is in some other ways better). Then, when it's
working, I'll translate it into D or Ada. (Not yet decided, though D has
the inside track.  Ada has wider support, but D is garbage collected and
has variable-sized arrays and built-in hash tables.  Ada currently has a
better interface to databases, but D is improving much more rapidly.
And D program design structures are more similar to those of Python.  Of
course Vala is an outside chance.  But it's been developing quite
slowly.  And Go seems headed in a different direction, even though it
has easier support for concurrency.)

P.S.:  Were Smalltalk suitable I'd be needing to repartition my disk to
give me a much larger virtual memory space.  Currently I'm only set up
for around 1.5 Gigabytes, which should be enough for the first few
months, but would limit what else I could be doing towards the end of
that time.

P.P.S:  I also considered a graph database, Neo4j, but they don't
support enough information on the links...though I could coerce integers
into floating point, the loss of precision was worrying. This isn't a
problem that would show up until the id#s started to get large, but
that's not very reassuring.  Also too much appears to need to be decided
at compile time rather than at run time, and this is a very dynamic
system (or it had better be!).
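
How soon that precision loss bites can be checked directly, assuming the coercion is to a 64-bit IEEE-754 double (which is what Python's `float` is); this sketch says nothing about what Neo4j itself does internally. Every integer up to 2^53 survives the round trip; 2^53 + 1 is the first positive integer that doesn't.

```python
LIMIT = 2 ** 53  # an IEEE-754 double has a 53-bit significand


def survives_as_double(n: int) -> bool:
    """True if the integer n comes back unchanged after a round trip through float."""
    return int(float(n)) == n


print(LIMIT)                             # 9007199254740992
print(survives_as_double(LIMIT))         # True
print(survives_as_double(LIMIT + 1))     # False: it rounds to LIMIT
print(float(LIMIT + 1) == float(LIMIT))  # True
```
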

Thank you for your help, and good reporting of the current state of the
environment.

--
Charles Hixson

Is Squeak/Pharo an appropriate language choice?

Louis LaBrunda
Hi Charles,

>If I'm going to need to use a database, and handle my own rolling in and
>out anyway, then Smalltalk isn't a good choice.  And while multiple
>processing is only a speed-up thing, that's a pretty important thing in
>and of itself.

I think you may need an OODB, you should take a look at Magma
http://wiki.squeak.org/squeak/2665.  You may not need to do as much rolling
in and out on your own as you think.

Lou
-----------------------------------------------------------
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon
mailto:[hidden email] http://www.Keystone-Software.com

Re: Is Squeak/Pharo an appropriate language choice?

Levente Uzonyi-2
In reply to this post by Charles Hixson-2
On Thu, 31 Oct 2013, Charles Hixson wrote:

> I think you *did* answer my questions. In a way that means a lot of extra
> work for me.
> Too much of what I want to do depends on things that are currently
> experimental in Smalltalk.  It sounds like the image can't load lazily, which
> would probably be necessary if this were to work at all.  (Yeah, the 64-bit
> image could hold enough, but I don't have the RAM to hold it all, and getting
> that much RAM is ridiculous, when most of it would be rolled out most of the
> time.)
>
> If I'm going to need to use a database, and handle my own rolling in and out
> anyway, then Smalltalk isn't a good choice.  And while multiple processing is
> only a speed-up thing, that's a pretty important thing in and of itself.

If I understand correctly, Magma[1] and Glorp[2] can both help you with
this project. The former is a pure Smalltalk object database; the latter
is an ORM using PostgreSQL.

>
> Gemstone isn't a good choice as I need a FOSS distributable. (Actually, if
> I'm reading the web site properly they don't mention what their license is,
> and it seems as if their Smalltalk version is Pharo...which we've already
> covered.)

GemStone/S is not open source, but there's a free version with some
resource limitations (1 CPU/16 GB IIRC). They use their own Smalltalk
implementation, but it has no GUI, so they wrote tools in Squeak/Pharo,
which let you develop your code.

>
> FWIW, I'm well aware that I'm trying to run too much program on too small a
> system.  I know this implies a massive speed penalty. But that's true
> whatever approach I take.  I was hoping that I could avoid doing my own
> memory management, and for that Smalltalk appeared the only feasible choice.

Magma and Glorp can both help you with this.


Levente

[1] http://wiki.squeak.org/squeak/2665
[2] http://glorpwiki.wikispaces.com/How+Glorp+Works

Re: Is Squeak/Pharo an appropriate language choice?

Charles Hixson-2
In reply to this post by Louis LaBrunda
On 10/31/2013 01:28 PM, Louis LaBrunda wrote:
> Hi Charles,
>
>> If I'm going to need to use a database, and handle my own rolling in and
>> out anyway, then Smalltalk isn't a good choice.  And while multiple
>> processing is only a speed-up thing, that's a pretty important thing in
>> and of itself.
> I think you may need an OODB, you should take a look at Magma
> http://wiki.squeak.org/squeak/2665.  You may not need to do as much rolling
> in and out on your own as you think.
>
> Lou
> -----------------------------------------------------------
> Louis LaBrunda
> Keystone Software Corp.
> SkypeMe callto://PhotonDemon
> mailto:[hidden email] http://www.Keystone-Software.com

Short answer:
Probably not sufficient.

Long answer (excuse the rambling, I was thinking it through as I wrote it):
If I'm understanding http://wiki.squeak.org/squeak/2639 correctly, which I may not be, I'd still need to recode the entire graph structure to be designed in terms of id#s (keys) rather than direct references.
I.e., I'd need to code it in terms of two collections, one of which would contain keys that, when interpreted, refer back into that same collection.  This does appear to move the plan into the realm of the possible, but at the cost of the advantage I'd hoped Smalltalk would provide: a large persistent image.  When it talked about transparency I thought at first that this wouldn't be necessary, but:

> Magma *can* maintain and quickly "search" large, flat structures, but
> the normal Smalltalk collections such as Bag or OrderedCollection are
> not suitable for this. The contiguous ByteArray records Magma uses to
> store and transport Smalltalk objects would be impractical for a large
> Smalltalk Collection

Seems to mean that the Graph couldn't be stored as something that Magma would recognize as a graph.  So does "Objects are persisted by reachability", though that has other possible interpretations.  But since the graph would contain a very large number of cycles in multiple "dimensions"...  OTOH http://wiki.squeak.org/squeak/2638 on Read Strategies appears to mean that it wouldn't automatically (or rather could be set to not automatically) pull in items that are references within the object being read.
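
The idea of reading an object without automatically pulling in everything it references can be sketched generically in Python as a lazy reference: the link holds only an id# and a loader until it is first followed. This illustrates the general technique only; it is not a description of how Magma implements its read strategies, and `LazyRef` and the loader are hypothetical.

```python
from typing import Any, Callable

_UNLOADED = object()  # sentinel, so a loader may legitimately return None


class LazyRef:
    """A link that holds only an id# until it is first followed."""

    __slots__ = ("node_id", "_loader", "_target")

    def __init__(self, node_id: int, loader: Callable[[int], Any]) -> None:
        self.node_id = node_id
        self._loader = loader  # hypothetical: fetches a node by id# from storage
        self._target = _UNLOADED

    def get(self) -> Any:
        # Load on first use only; later calls reuse the cached object.
        if self._target is _UNLOADED:
            self._target = self._loader(self.node_id)
        return self._target

    def release(self) -> None:
        """Forget the loaded object; the next get() fetches it again."""
        self._target = _UNLOADED
```

Since a node's links can be `LazyRef`s, loading one node touches none of its (up to ~80) neighbours until they are actually followed, and cycles cost nothing extra.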

Again, http://wiki.squeak.org/squeak/5722 , may mean that a class with named variables holding 4 arrays of arrays of length 3 (reference float float) and a few other variables containing things like bools and strings and ints, would be handled without problem.  But note that each of those references is to an item of the same type, and it could include cycles.  So I can't decide WHAT it means.  Do I need to recode the references as id#s? Does that even suffice?  (If it does, then it's still a good deal.  But if I must name each entry separately, it's not a good deal at all, as the number of entries in each of the 4 outer level arrays is highly variable, and though I intend to apply an upper limit, only experiment can determine what a reasonable upper limit is.)

And yet again (if I'm understanding correctly) I'm going to need to violate just about every one of the hints on performance in http://wiki.squeak.org/squeak/2985 .  I'm not sure how much MagmaArray keeps in RAM of things that aren't currently in use.  At one point it sounded like 6 bytes.  This is actually a lot of overhead in this kind of a system.

Additionally, it appears that Magma doesn't have any way to detect that a reference is "stale" (i.e., hasn't been referenced in a long time) and use that to decide to roll it out.  It looks as if this needs to be done by the program...but the time-stamp (and a few other items) mustn't (well, needn't...but I'd certainly need to overwrite it when I read it back in) be included in the items rolled out.  So I need to solve THAT problem.

Magma seems to be a good object database, but I can't see that it makes Smalltalk a desirable choice for this project.  (It may; this could be a documentation problem...either my not understanding it or the information not being clear.)  If I'm going to recode the references into id#s, then either Ruby or Python makes it trivial to turn the object into a string (and to reconstitute it later), and both also make it trivial to leave out any volatile variables.  Perhaps Magma does the latter, but this wasn't clear.
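
A sketch of that Python route, under stated assumptions: a hypothetical `Node` holds four variable-length groups of `(target id#, float, float)` links plus a few scalar fields, roughly the shape described earlier in this message, and `last_access`/`dirty` are treated as the volatile fields left out of the stored string. Because links are id#s, a node that links to itself (a cycle) serializes like any other link.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple

# (target id#, float, float); what the two floats mean is not specified here
Link = Tuple[int, float, float]

VOLATILE_FIELDS = ("last_access", "dirty")  # in-memory bookkeeping, never persisted


@dataclass
class Node:
    node_id: int
    label: str = ""
    active: bool = True
    # Four variable-length groups of links, referenced by id# rather than by object.
    links: List[List[Link]] = field(default_factory=lambda: [[], [], [], []])
    last_access: float = 0.0
    dirty: bool = False

    def to_json(self) -> str:
        record = asdict(self)
        for name in VOLATILE_FIELDS:
            del record[name]  # leave volatile state out of the stored form
        return json.dumps(record)

    @classmethod
    def from_json(cls, text: str) -> "Node":
        record = json.loads(text)
        # JSON has no tuples, so turn each link back into one.
        record["links"] = [[tuple(link) for link in group] for group in record["links"]]
        return cls(**record)  # volatile fields get their defaults
```
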

Definitely a part of my problem is that I don't have a clear image of how I would proceed.  The only examples given were small fragments, extremely useful in clarifying points, but insufficient to yield a larger idea of how to use things.  (E.g., I have no idea how to do Ma Object Serialization, but I may need to implement it anyway.)

Perhaps this is all because I don't really know Smalltalk well...which I assuredly don't.  I was hoping to use Smalltalk to avoid the database problem, trading RAM (including virtual RAM) consumption for capacity, but it looks as if I end up at a database anyway.  And in that case I should use a language that I'm already familiar with.  (I'd really been hoping that the persistent image would be the answer.)  If I do a decomposition I could even get away with using a key-value store.  The only problem is that the id# requires lookup via an indirect reference.  (Is it in the Dictionary?  If not, get it from the database; if it isn't there either, it's a new value.)  Once I do the recoding of references to id#s, the database portion is "trivial, but annoying".  But now I've added thousands of additional indirections per second.  However, IIUC, Magma would be doing that under the hood anyway (as opposed to the image, where it would be handled by hardware memory translation), and if I code it myself, I can put in things like automatically rolling out when it's stale.  (By the way, does "stub" mean remove from memory, or remove from the database?  From context I decided it probably meant remove from memory, but I couldn't tell whether dirty data would be written before being removed from memory, and I couldn't be really sure it wasn't just being deleted.  That needs rephrasing by someone who knows what it's supposed to mean.)
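
The three-step lookup in the parenthetical above (dictionary, then database, then new value) might look like this in Python, with the standard-library `sqlite3` module standing in for the key-value store. The table layout, the `{"links": [...]}` node shape, and the `NodeStore` name are assumptions for illustration. SQLite's `INTEGER PRIMARY KEY` holds signed 64-bit integers, so integer id#s are stored exactly.

```python
import json
import sqlite3
from typing import Dict


class NodeStore:
    """A dict of resident nodes in front of a SQLite key-value table (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS nodes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
        self.resident: Dict[int, dict] = {}

    def lookup(self, node_id: int) -> dict:
        # 1. Is it in the dictionary?
        node = self.resident.get(node_id)
        if node is not None:
            return node
        # 2. If not, get it from the database.
        row = self.db.execute(
            "SELECT body FROM nodes WHERE id = ?", (node_id,)).fetchone()
        # 3. If it isn't there either, it's a new value.
        node = json.loads(row[0]) if row is not None else {"links": []}
        self.resident[node_id] = node
        return node

    def roll_out(self, node_id: int) -> None:
        """Write one resident node to the database and drop it from memory."""
        node = self.resident.pop(node_id)
        with self.db:  # commits the transaction if the block succeeds
            self.db.execute(
                "INSERT OR REPLACE INTO nodes (id, body) VALUES (?, ?)",
                (node_id, json.dumps(node)))
```
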

To me this appears to be, again, not the project that justifies implementation in Smalltalk.  Perhaps if I were already experienced in Smalltalk I wouldn't see things that way, as Magma clearly means that Smalltalk *can* handle doing the project.

Thank you for your suggestion.
-- 
Charles Hixson

Is Squeak/Pharo an appropriate language choice?

Louis LaBrunda
Hi Charles,

I don't know enough about Magma to answer your questions.  I'm really a VA
Smalltalk guy and only play a little with Squeak.  I knew just enough about
Magma to point you to it.  I'm sure there are a lot of Squeakers that know
about Magma and can probably answer your questions but they are probably
not reading this list.  Try re-posting on:
gmane.comp.lang.smalltalk.squeak.general.  There may also be a Magma
specific list but I'm not sure about that.

Before you decide you need a database for sure, maybe you could experiment
with creating a lot of data in your image and see how long it takes to
load/save.  If it isn't too long, then letting the OS page pieces out and
back in when needed might not be too bad?

Also, if you are going to use a database, maybe you could use a (big) hash
for the ids?
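
One possible reading of this suggestion, sketched in Python: derive a fixed-width id# for each "word" (external symbol) from a cryptographic hash, so the same word always maps to the same id# without a counter. This is an assumption about what was meant, and it only fits nodes that have an external name; collisions are very unlikely but must still be detected, which `register` does. Both function names are hypothetical.

```python
import hashlib
from typing import Dict


def word_id(word: str) -> int:
    """Derive a stable id# from an external symbol (a hypothetical scheme).

    hashlib is used instead of the built-in hash(), whose value for str
    changes between interpreter runs unless PYTHONHASHSEED is fixed.
    """
    digest = hashlib.blake2b(word.encode("utf-8"), digest_size=8).digest()
    # Drop one bit so the id fits a signed 64-bit integer column.
    return int.from_bytes(digest, "big") >> 1


def register(index: Dict[int, str], word: str) -> int:
    """Record word under its id#, refusing to silently merge two words on a collision."""
    node_id = word_id(word)
    existing = index.setdefault(node_id, word)
    if existing != word:
        raise ValueError(f"id# collision between {word!r} and {existing!r}")
    return node_id
```
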

Lou

On Thu, 31 Oct 2013 14:53:03 -0700, Charles Hixson
<[hidden email]> wrote:

>On 10/31/2013 01:28 PM, Louis LaBrunda wrote:
>> Hi Charles,
>>
>>> If I'm going to need to use a database, and handle my own rolling in and
>>> out anyway, then Smalltalk isn't a good choice.  And while multiple
>>> processing is only a speed-up thing, that's a pretty important thing in
>>> and of itself.
>> I think you may need an OODB, you should take a look at Magma
>> http://wiki.squeak.org/squeak/2665.  You may not need to do as much rolling
>> in and out on your own as you think.
>>
>> Lou
>> -----------------------------------------------------------
>> Louis LaBrunda
>> Keystone Software Corp.
>> SkypeMe callto://PhotonDemon
>> mailto:[hidden email] http://www.Keystone-Software.com
>Short answer:
>Probably not sufficient.
>
>Long answer (excuse the rambling, I was thinking it through as I wrote it):
>If I'm understanding http://wiki.squeak.org/squeak/2639 correctly, which
>I may not be, I'd still need to recode the entire graph structure to be
>designed in terms of id#s (keys) rather than direct references.
>I.e., I'd need to code it in terms of two collections one of which would
>contain keys that, when interpreted, referenced itself.  This does
>appear to move the plan into the area of the possible, but at the cost
>of the advantage that I'd hoped Smalltalk would provide of a large
>persistent image.  I thought at first when it was talking about
>transparency that this wouldn't be necessary, but:
>
>> Magma *can* maintain and quickly "search" large, flat structures, but
>> the normal Smalltalk collections such as Bag or OrderedCollection are
>> not suitable for this. The contiguous ByteArray records Magma uses to
>> store and transport Smalltalk objects would be impractical for a large
>> Smalltalk Collection
>Seems to mean that the Graph couldn't be stored as something that Magma
>would recognize as a graph.  So does "Objects are persisted by
>reachability", though that has other possible interpretations.  But
>since the graph would contain a very large number of cycles in multiple
>"dimensions"...  OTOH http://wiki.squeak.org/squeak/2638 on Read
>Strategies appears to mean that it wouldn't automatically (or rather
>could be set to not automatically) pull in items that are references
>within the object being read.
>
>Again, http://wiki.squeak.org/squeak/5722 , may mean that a class with
>named variables holding 4 arrays of arrays of length 3 (reference float
>float) and a few other variables containing things like bools and
>strings and ints, would be handled without problem. But note that each
>of those references is to an item of the same type, and it could include
>cycles.  So I can't decide WHAT it means.  Do I need to recode the
>references as id#s? Does that even suffice?  (If it does, then it's
>still a good deal.  But if I must name each entry separately, it's not a
>good deal at all, as the number of entries in each of the 4 outer level
>arrays is highly variable, and though I intend to apply an upper limit,
>only experiment can determine what a reasonable upper limit is.)
>
>And yet again (if I'm understanding correctly) I'm going to need to
>violate just about every one of the hints on performance in
>http://wiki.squeak.org/squeak/2985 .  I'm not sure how much MagmaArray
>keeps in RAM of things that aren't currently in use.  At one point it
>sounded like 6 bytes.  This is actually a lot of overhead in this kind
>of a system.
>
>Additionally, it appears that Magma doesn't have any way to detect that a
>reference is "stale" (i.e., hasn't been referenced in a long time) and
>use that to decide to roll it out.  It looks as if this needs to be done
>by the program...but that time-stamp (and a few other items) mustn't
>(well, needn't...but I sure would need to overwrite it when I read it
>in) itself be included in the items rolled out.  So I need to solve THAT
>problem.
>
>Magma seems to be a good object database, but I can't see that it makes
>Smalltalk a desirable choice for this project.  (It may; this could be a
>documentation problem...either my not understanding it or the
>information not being clear.)  If I'm going to recode the references
>into id#s, then either Ruby or Python make it trivial to turn the object
>into a string (and to reconstitute it later), and they also make it
>trivial to leave out any volatile variables. Perhaps Magma does the
>latter, but this wasn't clear.
>
>Definitely a part of my problem is that I don't have a clear image of
>how I would proceed.  The only examples given were small fragments,
>extremely useful in clarifying points, but insufficient to yield a
>larger idea of how to use things.  (E.g., I have no idea how to do Ma
>Object Serialization, but I may need to implement it anyway.)
>
>Perhaps this is all because I don't really know Smalltalk well...which I
>assuredly don't.  I was hoping to use Smalltalk to avoid the database
>problem, trading RAM (including virtual RAM) consumption for capacity,
>but it looks as if I end up at a database anyway.  And in that case I
>should use a language that I'm already familiar with.  (I'd really been
>hoping that the persistent image would be the answer.)  If I do a
>decomposition I could even get away with using a key-value store.  The
>only problem is that the id# requires lookup via an indirect reference.  
>(Is it in the Directory?  If not, get it from the database, if not, it's
>a new value.)  Once I do the recoding of references to id#s, the
>database portion is "trivial, but annoying". But now I've added
>thousands of additional indirections/second.  However, IIUC, Magma would
>be doing that under the hood anyway (as opposed to the image, which
>would be handled in hardware memory translation), and if I code it, I
>can put in things like automatically rolling out when it's stale.  (By
>the way, does "stub" mean remove from memory, or remove from the
>database?  From context I decided it probably meant remove from memory,
>but I couldn't decide whether dirty data would be written before being
>removed from memory, and I couldn't be really sure it wasn't just being
>deleted.  That needs rephrasing by someone who knows what it's supposed
>to mean.)
>
>To me this appears to be, again, not the project that justifies
>implementation in Smalltalk.  Perhaps if I were already experienced in
>Smalltalk I wouldn't see things that way, as Magma clearly means that
>Smalltalk *can* handle doing the project.
>
>Thank you for your suggestion.
-----------------------------------------------------------
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon
mailto:[hidden email] http://www.Keystone-Software.com
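Charles's fallback design in the quoted message has four parts. References are recoded as id#s. A lookup checks an in-memory dictionary first (his "Directory"), then the key-value store, and otherwise treats the id# as a new value. Finally, the program itself rolls out stale entries, and their access time-stamps are never written out. That design can be sketched in a few lines. What follows is a hypothetical Python illustration, not Magma code. `Node`, `NodeDirectory`, `roll_out_stale` and the plain dict standing in for the store are all assumptions.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical graph record; edges are (neighbor_id, float, float) triples."""
    node_id: int
    edges: list = field(default_factory=list)
    dirty: bool = False  # callers must set this after changing `edges`


class NodeDirectory:
    """In-memory id# -> Node directory in front of a key-value store."""

    def __init__(self, store, clock=time.monotonic):
        self.store = store        # any dict-like key-value store
        self.resident = {}        # id# -> Node currently in memory
        self.last_access = {}     # id# -> time stamp; kept here, never stored
        self.clock = clock

    def get(self, node_id):
        node = self.resident.get(node_id)
        if node is None:
            edges = self.store.get(node_id)
            if edges is None:
                node = Node(node_id, dirty=True)   # unknown id#: a new value
            else:
                node = Node(node_id, list(edges))  # read in from the store
            self.resident[node_id] = node
        self.last_access[node_id] = self.clock()
        return node

    def roll_out_stale(self, max_age):
        """Write back dirty nodes not accessed for `max_age`, then drop them."""
        now = self.clock()
        stale = [i for i, t in self.last_access.items() if now - t > max_age]
        for node_id in stale:
            node = self.resident.pop(node_id)
            if node.dirty:
                self.store[node_id] = list(node.edges)  # persist before dropping
            del self.last_access[node_id]
        return stale
```

Because the time-stamps live in `last_access` rather than on the stored record, nothing volatile reaches the store. Every `get` is still the extra indirection Charles mentions, paid in software rather than by the memory hardware.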


Re: Is Squeak/Pharo an appropriate language choice?

David T. Lewis
Hi Charles,

You can join the squeak-dev list here:
http://lists.squeakfoundation.org/mailman/listinfo/squeak-dev

Chris Muller, the developer of Magma, is very active on that list and I am
sure he will be happy to answer any questions.

For commercial applications you should also be aware of GemStone, which is
a well-regarded Smalltalk object-oriented database that can be used in
conjunction with Squeak or Pharo. It's marketed as a database, but it is
really more like a multi-user Smalltalk with unlimited persistence.

If your background is in the typical sort of Java (or whatever) backed
by a relational database, you may be surprised at how much you can do with a
few hundred megabytes of Smalltalk, using simple image persistence and no
database at all. A little bit of good design goes a long way, and that's a
lot easier to do with Smalltalk (and Magma or GemStone).

> Hi Charles,
>
> I don't know enough about Magma to answer your questions.  I'm really a VA
> Smalltalk guy and only play a little with Squeak.  I knew just enough about
> Magma to point you to it.  I'm sure there are a lot of Squeakers who know
> about Magma and can probably answer your questions, but they are probably
> not reading this list.  Try re-posting on
> gmane.comp.lang.smalltalk.squeak.general.  There may also be a
> Magma-specific list, but I'm not sure about that.
>
> Before you decide for sure that you need a database, maybe you could
> experiment with creating a lot of data in your image and see how long it
> takes to load/save.  If it isn't too long, then the OS paging pieces out
> and back in when needed might not be too bad?
>
> Also, if you are going to use a database, maybe you could use a (big) hash
> for the ids?
>
> Lou
>
> On Thu, 31 Oct 2013 14:53:03 -0700, Charles Hixson
> <[hidden email]> wrote:
>
>>On 10/31/2013 01:28 PM, Louis LaBrunda wrote:
>>> Hi Charles,
>>>
>>>> If I'm going to need to use a database, and handle my own rolling in
>>>> and out anyway, then Smalltalk isn't a good choice.  And while multiple
>>>> processing is only a speed-up thing, that's a pretty important thing
>>>> in and of itself.
>>> I think you may need an OODB; you should take a look at Magma
>>> http://wiki.squeak.org/squeak/2665.  You may not need to do as much
>>> rolling in and out on your own as you think.
>>>
>>> Lou
>>> -----------------------------------------------------------
>>> Louis LaBrunda
>>> Keystone Software Corp.
>>> SkypeMe callto://PhotonDemon
>>> mailto:[hidden email] http://www.Keystone-Software.com
>>Short answer:
>>Probably not sufficient.
>>
>>Long answer (excuse the rambling, I was thinking it through as I wrote it):
>>If I'm understanding http://wiki.squeak.org/squeak/2639 correctly, which
>>I may not be, I'd still need to recode the entire graph structure to be
>>designed in terms of id#s (keys) rather than direct references.
>>I.e., I'd need to code it in terms of two collections one of which would
>>contain keys that, when interpreted, referenced itself.  This does
>>appear to move the plan into the area of the possible, but at the cost
>>of the advantage that I'd hoped Smalltalk would provide of a large
>>persistent image.  I thought at first when it was talking about
>>transparency that this wouldn't be necessary, but:
>>
>>> Magma *can* maintain and quickly "search" large, flat structures, but
>>> the normal Smalltalk collections such as Bag or OrderedCollection are
>>> not suitable for this. The contiguous ByteArray records Magma uses to
>>> store and transport Smalltalk objects would be impractical for a large
>>> Smalltalk Collection
>>Seems to mean that the Graph couldn't be stored as something that Magma
>>would recognize as a graph.  So does "Objects are persisted by
>>reachability", though that has other possible interpretations.  But
>>since the graph would contain a very large number of cycles in multiple
>>"dimensions"...  OTOH http://wiki.squeak.org/squeak/2638 on Read
>>Strategies appears to mean that it wouldn't automatically (or rather
>>could be set to not automatically) pull in items that are references
>>within the object being read.
>>
>>Again, http://wiki.squeak.org/squeak/5722 , may mean that a class with
>>named variables holding 4 arrays of arrays of length 3 (reference float
>>float) and a few other variables containing things like bools and
>>strings and ints, would be handled without problem. But note that each
>>of those references is to an item of the same type, and it could include
>>cycles.  So I can't decide WHAT it means.  Do I need to recode the
>>references as id#s? Does that even suffice?  (If it does, then it's
>>still a good deal.  But if I must name each entry separately, it's not a
>>good deal at all, as the number of entries in each of the 4 outer level
>>arrays is highly variable, and though I intend to apply an upper limit,
>>only experiment can determine what a reasonable upper limit is.)
>>
>>And yet again (if I'm understanding correctly) I'm going to need to
>>violate just about every one of the hints on performance in
>>http://wiki.squeak.org/squeak/2985 .  I'm not sure how much MagmaArray
>>keeps in RAM of things that aren't currently in use.  At one point it
>>sounded like 6 bytes.  This is actually a lot of overhead in this kind
>>of a system.
>>
>>Additionally, it appears that Magma doesn't have any way to detect that a
>>reference is "stale" (i.e., hasn't been referenced in a long time) and
>>use that to decide to roll it out.  It looks as if this needs to be done
>>by the program...but that time-stamp (and a few other items) mustn't
>>(well, needn't...but I sure would need to overwrite it when I read it
>>in) itself be included in the items rolled out.  So I need to solve THAT
>>problem.
>>
>>Magma seems to be a good object database, but I can't see that it makes
>>Smalltalk a desirable choice for this project.  (It may; this could be a
>>documentation problem...either my not understanding it or the
>>information not being clear.)  If I'm going to recode the references
>>into id#s, then either Ruby or Python make it trivial to turn the object
>>into a string (and to reconstitute it later), and they also make it
>>trivial to leave out any volatile variables. Perhaps Magma does the
>>latter, but this wasn't clear.
>>
>>Definitely a part of my problem is that I don't have a clear image of
>>how I would proceed.  The only examples given were small fragments,
>>extremely useful in clarifying points, but insufficient to yield a
>>larger idea of how to use things.  (E.g., I have no idea how to do Ma
>>Object Serialization, but I may need to implement it anyway.)
>>
>>Perhaps this is all because I don't really know Smalltalk well...which I
>>assuredly don't.  I was hoping to use Smalltalk to avoid the database
>>problem, trading RAM (including virtual RAM) consumption for capacity,
>>but it looks as if I end up at a database anyway.  And in that case I
>>should use a language that I'm already familiar with.  (I'd really been
>>hoping that the persistent image would be the answer.)  If I do a
>>decomposition I could even get away with using a key-value store.  The
>>only problem is that the id# requires lookup via an indirect reference.
>>(Is it in the Directory?  If not, get it from the database, if not, it's
>>a new value.)  Once I do the recoding of references to id#s, the
>>database portion is "trivial, but annoying". But now I've added
>>thousands of additional indirections/second.  However, IIUC, Magma would
>>be doing that under the hood anyway (as opposed to the image, which
>>would be handled in hardware memory translation), and if I code it, I
>>can put in things like automatically rolling out when it's stale.  (By
>>the way, does "stub" mean remove from memory, or remove from the
>>database?  From context I decided it probably meant remove from memory,
>>but I couldn't decide whether dirty data would be written before being
>>removed from memory, and I couldn't be really sure it wasn't just being
>>deleted.  That needs rephrasing by someone who knows what it's supposed
>>to mean.)
>>
>>To me this appears to be, again, not the project that justifies
>>implementation in Smalltalk.  Perhaps if I were already experienced in
>>Smalltalk I wouldn't see things that way, as Magma clearly means that
>>Smalltalk *can* handle doing the project.
>>
>>Thank you for your suggestion.
> -----------------------------------------------------------
> Louis LaBrunda
> Keystone Software Corp.
> SkypeMe callto://PhotonDemon
> mailto:[hidden email] http://www.Keystone-Software.com
>
>
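Lou's suggestion to measure before settling on a database is cheap to try. The sketch below is a Python stand-in, not a Squeak image experiment. It builds a synthetic id#-keyed index shaped roughly like the one Charles describes, where each entry holds a few (reference, float, float) triples. It then times a save/load round trip with `pickle`. The shape, sizes and temporary file location are assumptions.

```python
import os
import pickle
import random
import tempfile
import time


def build_index(n, max_edges=8, seed=0):
    """Synthetic id# -> [(neighbor_id, float, float), ...] index (hypothetical shape)."""
    rng = random.Random(seed)
    return {
        node_id: [(rng.randrange(n), rng.random(), rng.random())
                  for _ in range(rng.randint(1, max_edges))]
        for node_id in range(n)
    }


def time_round_trip(index, path=None):
    """Save `index` with pickle, load it back, and return (save_s, load_s, loaded)."""
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "index.pkl")
    start = time.perf_counter()
    with open(path, "wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)
    saved = time.perf_counter()
    with open(path, "rb") as f:
        loaded = pickle.load(f)
    done = time.perf_counter()
    return saved - start, done - saved, loaded
```

Calling `time_round_trip(build_index(1_000_000))` gives a rough feel for the million-entry case, though the numbers depend entirely on the machine. Integer ids are used here instead of direct object references because `pickle` recurses through references, and a deeply linked graph can exceed Python's recursion limit.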



Re: Is Squeak/Pharo an appropriate language choice?

Chris Muller-3
In reply to this post by Charles Hixson-2
Hi Charles, when I saw the description of what you were looking for --

  - an object model that is larger than the size of available RAM
  - transparent access
  - keyword access
  - multi-core access and updates

I wanted to let you know Magma fits that problem domain like a glove.
Whether it can meet your performance requirements -- only you can
decide, but maybe I can at least clarify some of your questions.

> Long answer (excuse the rambling, I was thinking it through as I wrote it):
> If I'm understanding http://wiki.squeak.org/squeak/2639 correctly, which I
> may not be, I'd still need to recode the entire graph structure to be
> designed in terms of id#s (keys) rather than direct references.
> I.e., I'd need to code it in terms of two collections one of which would
> contain keys that, when interpreted, referenced itself.  This does appear to
> move the plan into the area of the possible, but at the cost of the
> advantage that I'd hoped Smalltalk would provide of a large persistent
> image.  I thought at first when it was talking about transparency that this
> wouldn't be necessary, but:

No.  There is no inherent requirement for any object to have ids.
You can give them ids, of course, but ODBMSs, whether GemStone or Magma,
access their objects transparently via direct pointers.

> Magma can maintain and quickly "search" large, flat structures, but the
> normal Smalltalk collections such as Bag or OrderedCollection are not
> suitable for this. The contiguous ByteArray records Magma uses to store and
> transport Smalltalk objects would be impractical for a large Smalltalk
> Collection
>
> Seems to mean that the Graph couldn't be stored as something that Magma
> would recognize as a graph.

I'm not sure what you mean by "recognize as a graph" but I don't think
that's correct.  MagmaCollections are treated the same as regular
collections, except that they can be very large and with increased
concurrency between sessions.

> So does "Objects are persisted by
> reachability", though that has other possible interpretations.  But since
> the graph would contain a very large number of cycles in multiple
> "dimensions"...  OTOH http://wiki.squeak.org/squeak/2638 on Read Strategies
> appears to mean that it wouldn't automatically (or rather could be set to
> not automatically) pull in items that are references within the object being
> read.

ReadStrategies are a performance optimization only.  You should never
use them except in very special cases after observing and diagnosing
slowness.

> Again, http://wiki.squeak.org/squeak/5722 , may mean that a class with named
> variables holding 4 arrays of arrays of length 3 (reference float float) and
> a few other variables containing things like bools and strings and ints,
> would be handled without problem.  But note that each of those references is
> to an item of the same type, and it could include cycles.  So I can't decide
> WHAT it means.  Do I need to recode the references as id#s? Does that even
> suffice?  (If it does, then it's still a good deal.  But if I must name each
> entry separately, it's not a good deal at all, as the number of entries in
> each of the 4 outer level arrays is highly variable, and though I intend to
> apply an upper limit, only experiment can determine what a reasonable upper
> limit is.)

Sorry if I'm having trouble understanding your question here.  Why
would you need to "recode the references as ids"?  ODBMSs preserve
the graph in the exact shape it was committed, including cycles.
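Chris says an object database stores the graph exactly as committed, cycles included, with no id#s in the application model. Python's `pickle` offers a small in-process analogy, because it records shared references and cycles by identity. This only illustrates persistence by reachability under that analogy. It is not how Magma or GemStone work internally, and `make_node` is a hypothetical helper.

```python
import pickle


def make_node(name):
    """A hypothetical graph node: a dict holding direct references to other nodes."""
    return {"name": name, "neighbors": []}


# A small cyclic graph: a -> b -> c -> a, plus a self-loop on b.
a, b, c = make_node("a"), make_node("b"), make_node("c")
a["neighbors"].append(b)
b["neighbors"].extend([c, b])
c["neighbors"].append(a)

# Serializing the root stores everything reachable from it, once each.
blob = pickle.dumps(a)
a2 = pickle.loads(blob)

b2 = a2["neighbors"][0]
c2 = b2["neighbors"][0]
print(c2["neighbors"][0] is a2)   # True: the cycle came back as a cycle
print(b2["neighbors"][1] is b2)   # True: so did the self-loop
```

No ids appear anywhere in the model; the references themselves are what gets preserved.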

> And yet again (if I'm understanding correctly) I'm going to need to violate
> just about every one of the hints on performance in
> http://wiki.squeak.org/squeak/2985 .  I'm not sure how much MagmaArray keeps
> in RAM of things that aren't currently in use.  At one point it sounded like
> 6 bytes.  This is actually a lot of overhead in this kind of a system.

MagmaArrays keep just one "page" of objects in memory at a time.  The
default page size is 125, meaning 125 of the objects it references.  But
you can change that to anything you want as long as it's > 0.
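The paging behaviour can be made concrete with a hypothetical Python sketch of an array that keeps only one page of elements in memory, using the 125-element default Chris mentions. The `load_page` callback stands in for a database read and is an assumption. This is not MagmaArray's actual implementation.

```python
class PagedArray:
    """Read-only array that keeps a single page of elements in memory."""

    def __init__(self, length, load_page, page_size=125):
        if page_size <= 0:
            raise ValueError("page_size must be > 0")
        self.length = length
        self.load_page = load_page   # (page_no, page_size) -> list of elements
        self.page_size = page_size
        self.page_no = None          # which page is resident, if any
        self.page = []
        self.loads = 0               # how many page reads were needed

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        page_no, offset = divmod(index, self.page_size)
        if page_no != self.page_no:
            # Replace the resident page; the previous one becomes garbage.
            self.page = self.load_page(page_no, self.page_size)
            self.page_no = page_no
            self.loads += 1
        return self.page[offset]
```

Memory use is bounded by one page no matter how long the array is. The trade-off is a reload whenever access jumps to a different page.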

> Additionally, it appears that Magma doesn't have any way to detect that a
> reference is "stale" (i.e., hasn't been referenced in a long time) and use
> that to decide to roll it out.  It looks as if this needs to be done by the
> program...but that time-stamp (and a few other items) mustn't (well,
> needn't...but I sure would need to overwrite it when I read it in) itself be
> included in the items rolled out.  So I need to solve THAT problem.

When you said, "hasn't been referenced in a long time" I assume you
meant "hasn't been ACCESSED in a long time".  When you say "roll it
out" I assume you mean remove it from memory so RAM can be recovered?
If so, you should know that Magma only references retrieved objects
via Weak collections.  If your app is no longer referencing them,
they'll get "rolled out" automatically.  If your app is, obviously
they won't.
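The weak-reference arrangement Chris describes can be illustrated with Python's `weakref.WeakValueDictionary`. The session remembers the objects it has materialized, but only for as long as the application itself holds them. `Session`, `Record` and `fetch` are hypothetical names for this sketch, not Magma classes.

```python
import gc
import weakref


class Record:
    """A hypothetical persistent object materialized from the database."""
    def __init__(self, oid, payload):
        self.oid = oid
        self.payload = payload


class Session:
    """Remembers materialized objects only weakly, keyed by object id."""
    def __init__(self, fetch):
        self.fetch = fetch                          # oid -> payload, e.g. a DB read
        self.cache = weakref.WeakValueDictionary()  # oid -> Record, weakly held

    def get(self, oid):
        obj = self.cache.get(oid)
        if obj is None:                             # not resident: read it in
            obj = Record(oid, self.fetch(oid))
            self.cache[oid] = obj
        return obj


session = Session(fetch=lambda oid: f"payload-{oid}")
rec = session.get(42)
assert session.get(42) is rec    # same object while the app still holds it
del rec                          # the app drops its last strong reference
gc.collect()
print(42 in session.cache)       # False: the entry was "rolled out" automatically
```

In CPython the entry disappears as soon as the last strong reference goes away. Other Python implementations may need a collection first, which is why the sketch calls `gc.collect()`.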

> Magma seems to be a good object database, but I can't see that it makes
> Smalltalk a desirable choice for this project.  (It may; this could be a
> documentation problem...either my not understanding it or the information
> not being clear.)  If I'm going to recode the references into id#s, then
> either Ruby or Python make it trivial to turn the object into a string (and
> to reconstitute it later), and they also make it trivial to leave out any
> volatile variables.  Perhaps Magma does the latter, but this wasn't clear.
>
> Definitely a part of my problem is that I don't have a clear image of how I
> would proceed.  The only examples given were small fragments, extremely
> useful in clarifying points, but insufficient to yield a larger idea of how
> to use things.  (E.g., I have no idea how to do Ma Object Serialization, but
> I may need to implement it anyway.)

You could install and experiment with it..?  That's the Smalltalk way.

> Perhaps this is all because I don't really know Smalltalk well...which I
> assuredly don't.  I was hoping to use Smalltalk to avoid the database
> problem, trading RAM (including virtual RAM) consumption for capacity, but
> it looks as if I end up at a database anyway.  And in that case I should use
> a language that I'm already familiar with.  (I'd really been hoping that the
> persistent image would be the answer.)  If I do a decomposition I could even
> get away with using a key-value store.  The only problem is that the id#
> requires lookup via an indirect reference.  (Is it in the Directory?  If
> not, get it from the database, if not, it's a new value.)  Once I do the
> recoding of references to id#s, the database portion is "trivial, but
> annoying". But now I've added thousands of additional indirections/second.
> However, IIUC, Magma would be doing that under the hood anyway (as opposed
> to the image, which would be handled in hardware memory translation), and if
> I code it, I can put in things like automatically rolling out when it's
> stale.  (By the way, does "stub" mean remove from memory, or remove from the
> database?  From context I decided it probably meant remove from memory, but
> I couldn't decide whether dirty data would be written before being removed
> from memory, and I couldn't be really sure it wasn't just being deleted.
> That needs rephrasing by someone who knows what it's supposed to mean.)

#stubOut: is something I, myself, have rarely ever used.  It means
convert the object back to a Proxy.  If there's a chance it has been
changed, then it should only be used right after a commit because it
does NOT imply any writes to the DB.
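The semantics Chris gives for #stubOut: can be sketched hypothetically in Python. The in-memory copy is discarded and turned back into an unloaded proxy, and nothing is written, so any change made since the last commit is lost. The `Proxy` class and the dict standing in for committed database state are assumptions for illustration, not Magma's API.

```python
class Proxy:
    """Stand-in for a persistent object that may or may not be in memory."""

    def __init__(self, db, oid):
        self.db = db          # dict standing in for committed database state
        self.oid = oid
        self.target = None    # None means "stubbed out": nothing in memory

    @property
    def is_stub(self):
        return self.target is None

    def materialize(self):
        if self.target is None:
            self.target = dict(self.db[self.oid])   # copy of the committed state
        return self.target

    def commit(self):
        if self.target is not None:
            self.db[self.oid] = dict(self.target)

    def stub_out(self):
        # Drop the in-memory copy. Nothing is written to `db`, so any change
        # made since the last commit is silently discarded.
        self.target = None
```

This is why stubbing out is only safe right after a commit if the object might have changed.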

> To me this appears to be, again, not the project that justifies
> implementation in Smalltalk.  Perhaps if I were already experienced in
> Smalltalk I wouldn't see things that way, as Magma clearly means that
> Smalltalk *can* handle doing the project.

Ok, good luck.

> Thank you for your suggestion.
>
> --
> Charles Hixson