[Data Modeling] approaches to data persistence

sergio_101

[Data Modeling] approaches to data persistence

Hey, all..

I have been working on creating a REST interface using Teapot. In learning how to handle exceptions, I have been following along with the library example.

One of the things I noticed was that, in the library example, they are modeling the data a little differently than I have been.

To persist a list of items (and easily retrieve them), I just gave each object an “id” and stored them in a class variable as an OrderedCollection.

In the library example, I see something I really like: rather than saving an OrderedCollection, they save a Dictionary.

This dictionary maps { id -> object }. That takes the id out of the object (which I really like) and makes the id generation pretty much irrelevant.
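For readers following along, the two layouts described above might be sketched roughly like this in a Pharo playground. This is only an illustration under assumptions: the variable names are made up, and plain Associations and Strings stand in for real model objects; it is not the Teapot library example's actual code.

```smalltalk
"Hypothetical sketch: Associations (id -> data) stand in for objects that carry their own id."
| items byId found |

"Layout 1: every record carries its id; records live in an OrderedCollection."
items := OrderedCollection new.
items add: 1 -> 'first book'.
items add: 2 -> 'second book'.
"Finding a record means scanning until the id matches."
found := (items detect: [:each | each key = 2]) value.

"Layout 2: the Dictionary owns the id -> object mapping; the object needs no id of its own."
byId := Dictionary new.
byId at: 1 put: 'first book'.
byId at: 2 put: 'second book'.
"Finding a record is a direct keyed lookup."
found := byId at: 2.
```

In the second layout the stored object never has to know its own id, which is the property described above.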

My question: is there any performance hit either way once this list grows to tens of thousands of records?

thanks!

----
peace,
sergio
photographer, journalist, visionary

Public Key: http://bit.ly/29z9fG0
#BitMessage BM-NBaswViL21xqgg9STRJjaJaUoyiNe2dV
http://www.Village-Buzz.com
http://www.ThoseOptimizeGuys.com
http://www.coffee-black.com
http://www.painlessfrugality.com
http://www.twitter.com/sergio_101
http://www.facebook.com/sergio101

Ben Coman

Re: [Data Modeling] approaches to data persistence

I was curious, so nothing better than to experiment...

myClass := Object subclass: #AA
    instanceVariableNames: 'id data'
    classVariableNames: ''
    package: 'AAAA'.
myClass compile: 'id: i id := i'.
myClass compile: 'data: d data := d'.

N := 10 raisedTo: 7.
o := OrderedCollection new.
d := Dictionary new.
{ Time millisecondsToRun: [
    1 to: N do: [:id | o add: (AA new id: id; data: 'blahblah')]].
  Time millisecondsToRun: [
    1 to: N do: [:id | d at: id put: (AA new data: 'blahblah')]].
} inspect.
o := nil.
d := nil.
Smalltalk garbageCollect.

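Results in milliseconds, as { OrderedCollection. Dictionary }, for N = 10^5, 10^6 and 10^7:

N=10^5 ==> "#(5 42)"
N=10^6 ==> "#(434 839)"
N=10^7 ==> "#(5733 17208)"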

Slight modification to pre-allocate space to ignore dynamic growth cost...

o := OrderedCollection new: 2 * N.
d := Dictionary new: 2 * N.

N=10^5 ==> "#(7 33)"
N=10^6 ==> "#(411 802)"
N=10^7 ==> "#(5892 15141)"

cheers -ben

Ben Coman

Re: [Data Modeling] approaches to data persistence

Let's also bench Arrays, and be a bit nicer about cleaning up memory...

N := 10 raisedTo: 7.

"Adding totalGCTime before the run and subtracting it afterwards removes GC time from each measurement."
a := Array new: 2 * N.
atime := Smalltalk vm totalGCTime + (Time millisecondsToRun: [
    1 to: N do: [:id | a at: id put: (AA new data: 'blahblah')]]) - Smalltalk vm totalGCTime.
a := nil.
Smalltalk garbageCollect.

o := OrderedCollection new: 2 * N.
otime := Smalltalk vm totalGCTime + (Time millisecondsToRun: [
    1 to: N do: [:id | o add: (AA new id: id; data: 'blahblah')]]) - Smalltalk vm totalGCTime.
o := nil.
Smalltalk garbageCollect.

d := Dictionary new: 2 * N.
dtime := Smalltalk vm totalGCTime + (Time millisecondsToRun: [
    1 to: N do: [:id | d at: id put: (AA new data: 'blahblah')]]) - Smalltalk vm totalGCTime.
d := nil.
Smalltalk garbageCollect.

{atime. otime. dtime} inspect.

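Results in milliseconds, as { Array. OrderedCollection. Dictionary }, three runs for each N:

N=10^5 ==> "#(2 4 13)" "#(2 4 13)" "#(2 5 13)"
N=10^6 ==> "#(30 48 131)" "#(28 48 131)" "#(29 47 128)"
N=10^7 ==> "#(274 470 1313)" "#(259 456 1340)" "#(269 467 1306)"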

So insertions into Dictionaries are roughly three times slower than into an OrderedCollection, and four to six times slower than into Arrays.

Now these are milliseconds, so even at the 100,000 level (about 13 ms for all the insertions) Dictionary performance may be a reasonable tradeoff for other benefits.

cheers -ben
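The GC-time subtraction used in the timings above can also be written out step by step. The following is only a sketch: the block name timeWithoutGC and the sample workload are made up for illustration, while Smalltalk vm totalGCTime and Time millisecondsToRun: are the same calls used in the benchmark.

```smalltalk
"Hypothetical helper: answers the milliseconds spent in aBlock, minus time spent in GC."
| timeWithoutGC a ms |
timeWithoutGC := [:aBlock | | gcBefore elapsed gcAfter |
    gcBefore := Smalltalk vm totalGCTime.
    elapsed := Time millisecondsToRun: aBlock.
    gcAfter := Smalltalk vm totalGCTime.
    "Same arithmetic as gcBefore + elapsed - gcAfter in the one-line version."
    elapsed - (gcAfter - gcBefore)].

"Sample workload (an assumption): fill an Array with small strings."
a := Array new: 100000.
ms := timeWithoutGC value: [
    1 to: a size do: [:id | a at: id put: id printString]].
ms inspect.
```

The one-line form works because Smalltalk evaluates binary messages left to right, so the first totalGCTime is read before the timed block runs and the second one after it.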

sergio_101

Re: [Data Modeling] approaches to data persistence

This is a GREAT answer, a totally Smalltalky answer!

I need to extend this test to include randomly accessing a million records at a time…
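One possible shape for such a random-access test is sketched below. It is an illustration under assumptions, not a measured result: the sizes are kept small so the linear scan finishes quickly, the variable names are made up, and Associations (id -> data) stand in for objects that carry their own id.

```smalltalk
"Hypothetical sketch: time random lookups by id in both structures."
| n lookups ids items byId listTime dictTime |
n := 10000.        "number of stored records (assumed size)"
lookups := 1000.   "number of random lookups (assumed size)"

"Pick the random ids up front so both runs look up the same keys."
ids := (1 to: lookups) collect: [:i | n atRandom].

items := OrderedCollection new: n.
byId := Dictionary new: n.
1 to: n do: [:id |
    items add: id -> 'blahblah'.
    byId at: id put: 'blahblah'].

"OrderedCollection: detect: scans the elements until the id matches."
listTime := Time millisecondsToRun: [
    ids do: [:id | items detect: [:each | each key = id]]].

"Dictionary: at: hashes the key and goes straight to the entry."
dictTime := Time millisecondsToRun: [
    ids do: [:id | byId at: id]].

{listTime. dictTime} inspect.
```

Because detect: walks the collection, the cost of each lookup in the OrderedCollection grows with the number of records, while the Dictionary lookup does not depend on it in the same way. That is the flip side of the slower insertions measured earlier in the thread.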


stepharong

Re: [Data Modeling] approaches to data persistence

In reply to this post by Ben Coman

On Tue, 14 Feb 2017 17:29:18 +0100, Ben Coman <[hidden email]> wrote:

I was curious, so nothing better than to experiment...

I love your attitude!

"The idea of the experiment is no substitute for the experiment" (Alain). :)


Dale Henrichs-3

Re: [Data Modeling] approaches to data persistence

In reply to this post by sergio_101

Sergio,

If you find that your data set grows large enough, keep in mind that you can port your application to GemStone/S[1]. GemStone provides a scalable solution for image-based persistence while preserving smalltalkiness:)

Dale

[1] https://gemtalksystems.com/small-business/gsdevkit/


sergio_101

Re: [Data Modeling] approaches to data persistence

Funny you should mention that.

I ended up removing all the Voyage stuff and just using objects. As soon as I am ready to go live, I am going to use GemStone.

Is there a quick and easy setup for Heroku or DigitalOcean?


Dale Henrichs-3

Re: [Data Modeling] approaches to data persistence

Probably not as simple as it could be, but GsDevKit_home[1] is pretty easy to install (macOS, Ubuntu, Debian, and CentOS installations are available; other Linux-based systems could be added).

Paul DeBruicker has provided Ansible playbooks[2] for installing GsDevKit_home, DaemonTools (gem monitoring), Nginx and a few other useful utilities.

You mentioned that you are using Teapot for REST. Teapot itself hasn't been ported to GemStone, but Zinc _has_, so I assume it would not be a big chore to port Teapot to GemStone ... Ping me (the GLASS list[3] is best) when you are getting close and I'll help where I can ...

I'm pretty jammed up for the next month or so with GemStone 3.4 work, but after that I'll come up for air and start greasing squeaky wheels:) So that will be a good time to ping me:)

Dale

[1] https://github.com/GsDevKit/GsDevKit_home#open-source-development-kit-for-gemstones-64-bit-
[2] https://github.com/GsDevKit/GsDevKit_ansible#administer-gemstone--nginx-on-ubuntu
[3] http://forum.world.st/mailing_list/MailingListOptions.jtp?forum=1460844