serializing object graphs / ordering graph anchors

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

serializing object graphs / ordering graph anchors

NorbertHartl
Before I start a utility myself I want to ask if there is such a tool already.

I'm using sixx for object graph serialization. It works quite good but it has some severe limits. Well, the limits are not really sixx dependent but a problem of usual serializers. My problem is the size of the stack such a tool generates. Especially in gemstone this appears to be a problem because the stack is limited per gem.

If I look at the serializers I know the mechanics are simple:

1. take an object and look it up in a table.
2. if the object is in the table it has already been serialized. We take a reference from the table and write it down
3. if it isn't in the table it means the object has not been serialized and the contents need to be written. After that the object with a reference representation is written to the table
4. while writing the content of an object (inst vars) I encounter simple objects (direct string representation) or complex objects (composite). In the latter case we start over from 1.

If the object graph is well interconnected than the odds are high that a stack will grow high until the first reference can be written. With a small pier kernel I already matched a stack size of 1000.

One solution I can think of is to collect and order objects in the graph that cover smaller segments from the overall graph. This way I would construct a sorted collection that is sorted by stack depth (lowest first). The collection will then be serialized leading to a much smaller possible stack sizes. It might be there are cases it can't work but I think for most cases it will do its job. And it will be slow of course.

Does anybody know if such a tool exists?

Norbert


Reply | Threaded
Open this post in threaded view
|

Re: serializing object graphs / ordering graph anchors

Jon Paynter-2
Can you raise the stack limit on your gem processes?
here we have a "standard" smalltalk stack depth of 10k, and it works well for deeply recursive processes.

On Thu, Jun 2, 2011 at 3:38 AM, Norbert Hartl <[hidden email]> wrote:
Before I start a utility myself I want to ask if there is such a tool already.

I'm using sixx for object graph serialization. It works quite good but it has some severe limits. Well, the limits are not really sixx dependent but a problem of usual serializers. My problem is the size of the stack such a tool generates. Especially in gemstone this appears to be a problem because the stack is limited per gem.

If I look at the serializers I know the mechanics are simple:

1. take an object and look it up in a table.
2. if the object is in the table it has already been serialized. We take a reference from the table and write it down
3. if it isn't in the table it means the object has not been serialized and the contents need to be written. After that the object with a reference representation is written to the table
4. while writing the content of an object (inst vars) I encounter simple objects (direct string representation) or complex objects (composite). In the latter case we start over from 1.

If the object graph is well interconnected than the odds are high that a stack will grow high until the first reference can be written. With a small pier kernel I already matched a stack size of 1000.

One solution I can think of is to collect and order objects in the graph that cover smaller segments from the overall graph. This way I would construct a sorted collection that is sorted by stack depth (lowest first). The collection will then be serialized leading to a much smaller possible stack sizes. It might be there are cases it can't work but I think for most cases it will do its job. And it will be slow of course.

Does anybody know if such a tool exists?

Norbert



Reply | Threaded
Open this post in threaded view
|

Re: serializing object graphs / ordering graph anchors

NorbertHartl

Am 02.06.2011 um 20:10 schrieb Jon Paynter:

Can you raise the stack limit on your gem processes?
here we have a "standard" smalltalk stack depth of 10k, and it works well for deeply recursive processes.

It works with 10k as stack size. I just don't know for how long it will be ok. In the worst case in a highly interconnected graph the stack size can be as deep as the number of objects you have. And then 10k objects is not that much. 
Do you have the setting of 10k stack size all the time or just for doing stuff that needs it (it is a setting of the stone not the client gem). I mean the stack size can be cumbersome to adjust but on the other hand it can save you at the same time. For normal operation someone can assume something has gone wrong if the stack size is exceeded. 

So I'm trying to get a gut feeling how to tackle this sorts of problems.

thank you,

Norbert

On Thu, Jun 2, 2011 at 3:38 AM, Norbert Hartl <[hidden email]> wrote:
Before I start a utility myself I want to ask if there is such a tool already.

I'm using sixx for object graph serialization. It works quite good but it has some severe limits. Well, the limits are not really sixx dependent but a problem of usual serializers. My problem is the size of the stack such a tool generates. Especially in gemstone this appears to be a problem because the stack is limited per gem.

If I look at the serializers I know the mechanics are simple:

1. take an object and look it up in a table.
2. if the object is in the table it has already been serialized. We take a reference from the table and write it down
3. if it isn't in the table it means the object has not been serialized and the contents need to be written. After that the object with a reference representation is written to the table
4. while writing the content of an object (inst vars) I encounter simple objects (direct string representation) or complex objects (composite). In the latter case we start over from 1.

If the object graph is well interconnected than the odds are high that a stack will grow high until the first reference can be written. With a small pier kernel I already matched a stack size of 1000.

One solution I can think of is to collect and order objects in the graph that cover smaller segments from the overall graph. This way I would construct a sorted collection that is sorted by stack depth (lowest first). The collection will then be serialized leading to a much smaller possible stack sizes. It might be there are cases it can't work but I think for most cases it will do its job. And it will be slow of course.

Does anybody know if such a tool exists?

Norbert




Reply | Threaded
Open this post in threaded view
|

Re: serializing object graphs / ordering graph anchors

Jon Paynter-2
On Fri, Jun 3, 2011 at 1:24 AM, Norbert Hartl <[hidden email]> wrote:

Am 02.06.2011 um 20:10 schrieb Jon Paynter:

Can you raise the stack limit on your gem processes?
here we have a "standard" smalltalk stack depth of 10k, and it works well for deeply recursive processes.

It works with 10k as stack size. I just don't know for how long it will be ok. In the worst case in a highly interconnected graph the stack size can be as deep as the number of objects you have. And then 10k objects is not that much. 
Do you have the setting of 10k stack size all the time or just for doing stuff that needs it (it is a setting of the stone not the client gem). I mean the stack size can be cumbersome to adjust but on the other hand it can save you at the same time. For normal operation someone can assume something has gone wrong if the stack size is exceeded. 
well based on what ive seen over time, the setting
GEM_MAX_SMALLTALK_STACK_DEPTH = 10000;
is set for all servers, so when we see a stack overflow error, something went wrong and it needs fixing.
 

So I'm trying to get a gut feeling how to tackle this sorts of problems.
And if somebody runs out of stack space, then the burden is placed on the developer to adjust their design or fixes so it doesn't happen.   Having chased a few of these, i know  its possible to fix... but not easy.

Lastly, since 10k is the max allowed there isnt any room to increase it further.

 

thank you,

Norbert

On Thu, Jun 2, 2011 at 3:38 AM, Norbert Hartl <[hidden email]> wrote:
Before I start a utility myself I want to ask if there is such a tool already.

I'm using sixx for object graph serialization. It works quite good but it has some severe limits. Well, the limits are not really sixx dependent but a problem of usual serializers. My problem is the size of the stack such a tool generates. Especially in gemstone this appears to be a problem because the stack is limited per gem.

If I look at the serializers I know the mechanics are simple:

1. take an object and look it up in a table.
2. if the object is in the table it has already been serialized. We take a reference from the table and write it down
3. if it isn't in the table it means the object has not been serialized and the contents need to be written. After that the object with a reference representation is written to the table
4. while writing the content of an object (inst vars) I encounter simple objects (direct string representation) or complex objects (composite). In the latter case we start over from 1.

If the object graph is well interconnected than the odds are high that a stack will grow high until the first reference can be written. With a small pier kernel I already matched a stack size of 1000.

One solution I can think of is to collect and order objects in the graph that cover smaller segments from the overall graph. This way I would construct a sorted collection that is sorted by stack depth (lowest first). The collection will then be serialized leading to a much smaller possible stack sizes. It might be there are cases it can't work but I think for most cases it will do its job. And it will be slow of course.

Does anybody know if such a tool exists?

Norbert





Reply | Threaded
Open this post in threaded view
|

Re: serializing object graphs / ordering graph anchors

Stephan Eggermont-3
In reply to this post by NorbertHartl
Hello Norbert,

That sounds like a silly algorithm. There is no reason you should get a large stack size as
long as you don't try to do this more than one level deep only:

add object to the table
repeat
  find first unwritten object in the table
  write its contents:
    simple objects direct
    for a complex object:
      look it up in the table
        if it doesn't exist, add it to the table
        write its reference
  mark as written
until no unwritten objects in the table

Stephan


On 2 jun 2011, at 12:38, Norbert Hartl wrote:

> Before I start a utility myself I want to ask if there is such a tool already.
>
> I'm using sixx for object graph serialization. It works quite good but it has some severe limits. Well, the limits are not really sixx dependent but a problem of usual serializers. My problem is the size of the stack such a tool generates. Especially in gemstone this appears to be a problem because the stack is limited per gem.
>
> If I look at the serializers I know the mechanics are simple:
>
> 1. take an object and look it up in a table.
> 2. if the object is in the table it has already been serialized. We take a reference from the table and write it down
> 3. if it isn't in the table it means the object has not been serialized and the contents need to be written. After that the object with a reference representation is written to the table
> 4. while writing the content of an object (inst vars) I encounter simple objects (direct string representation) or complex objects (composite). In the latter case we start over from 1.
>
> If the object graph is well interconnected than the odds are high that a stack will grow high until the first reference can be written. With a small pier kernel I already matched a stack size of 1000.
>
> One solution I can think of is to collect and order objects in the graph that cover smaller segments from the overall graph. This way I would construct a sorted collection that is sorted by stack depth (lowest first). The collection will then be serialized leading to a much smaller possible stack sizes. It might be there are cases it can't work but I think for most cases it will do its job. And it will be slow of course.
>
> Does anybody know if such a tool exists?
>
> Norbert
>

Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-project] serializing object graphs / ordering graph anchors

NorbertHartl

Am 03.06.2011 um 20:29 schrieb Stephan Eggermont:

> Hello Norbert,
>
> That sounds like a silly algorithm. There is no reason you should get a large stack size as
> long as you don't try to do this more than one level deep only:
>
> add object to the table
> repeat
>  find first unwritten object in the table
>  write its contents:
>    simple objects direct
>    for a complex object:
>      look it up in the table
>        if it doesn't exist, add it to the table
>        write its reference
>  mark as written
> until no unwritten objects in the table
>
It depends on what you like to call stupid. If it comes to algorithms we are talking about trade-offs to choose. Basically I think the way Sixx is working is fine but it has some troubles in gemstone. My approach keeps the way it is working but resorts the problematic case.
Your approach is reversing referencing order. Thus preventing the ability to stream the result back in. You are exchanging backward references by forward references. This way you need to read in everything until you can reliably resolve references. With this approach you cannot utilize a pull parser which would enable you to read files of arbitrary size. This would like to prevent.
But actually I wasn't looking for the smartest algorithm. I was asking for a tool that already exists.

Norbert

>
>
> On 2 jun 2011, at 12:38, Norbert Hartl wrote:
>
>> Before I start a utility myself I want to ask if there is such a tool already.
>>
>> I'm using sixx for object graph serialization. It works quite good but it has some severe limits. Well, the limits are not really sixx dependent but a problem of usual serializers. My problem is the size of the stack such a tool generates. Especially in gemstone this appears to be a problem because the stack is limited per gem.
>>
>> If I look at the serializers I know the mechanics are simple:
>>
>> 1. take an object and look it up in a table.
>> 2. if the object is in the table it has already been serialized. We take a reference from the table and write it down
>> 3. if it isn't in the table it means the object has not been serialized and the contents need to be written. After that the object with a reference representation is written to the table
>> 4. while writing the content of an object (inst vars) I encounter simple objects (direct string representation) or complex objects (composite). In the latter case we start over from 1.
>>
>> If the object graph is well interconnected than the odds are high that a stack will grow high until the first reference can be written. With a small pier kernel I already matched a stack size of 1000.
>>
>> One solution I can think of is to collect and order objects in the graph that cover smaller segments from the overall graph. This way I would construct a sorted collection that is sorted by stack depth (lowest first). The collection will then be serialized leading to a much smaller possible stack sizes. It might be there are cases it can't work but I think for most cases it will do its job. And it will be slow of course.
>>
>> Does anybody know if such a tool exists?
>>
>> Norbert
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: serializing object graphs / ordering graph anchors

NorbertHartl
In reply to this post by Jon Paynter-2

Am 03.06.2011 um 19:06 schrieb Jon Paynter:

> Lastly, since 10k is the max allowed there isnt any room to increase it further.

Looking at the 2.4 SysAdminGuide on p. 382 I read a maximum of 1000000.


Norbert

Reply | Threaded
Open this post in threaded view
|

Re: serializing object graphs / ordering graph anchors

Jon Paynter-2
On Tue, Jun 7, 2011 at 3:30 AM, Norbert Hartl <[hidden email]> wrote:

Am 03.06.2011 um 19:06 schrieb Jon Paynter:

> Lastly, since 10k is the max allowed there isnt any room to increase it further.

Looking at the 2.4 SysAdminGuide on p. 382 I read a maximum of 1000000.

hrm.. mabye they changed the max value then.
The comments in the config file I read says a range of 1000 to 10,000.