Solving some Garbage Collection issues

Re: Is there a way to trace what objects are being read from disk?

Johan Brichau-2

On 07 Jan 2012, at 00:51, Stephan Eggermont wrote:

>
> On 6 jan 2012, at 21:01, Johan Brichau wrote:
>> The server where these stats were taken was not under any other kind of stress (no swapping, 10 of the 20Gb RAM free, no other disk-access process,… ). The iostat/vmstat output also showed that the disks (SAN) are operating at 100% throughput during the page reads.
>
> Is this SAN under your own control or a hosting provider's? With hosting providers it is easy to lose a factor of 100 in throughput compared
> to locally connected disks.


We did some simple tests, but they showed only a marginal difference compared with a locally connected disk (see below). Local SAS storage reached 130 MB/s.
I think they are OK. Am I wrong?

[root@cl1 ~]# date; dd if=/dev/zero of=1GBfile.bin bs=1024 count=$((1024 * 1024)); dd if=/dev/zero of=2GBfile.bin bs=1024 count=$((1024 * 1024)); dd if=/dev/zero of=3GBfile.bin bs=1024 count=$((1024 * 1024)); date; rm -f *GBfile.bin; date;
Thu Oct 13 15:09:03 CEST 2011
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 10.4385 seconds, 103 MB/s
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 9.75272 seconds, 110 MB/s
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 10.3992 seconds, 103 MB/s
Thu Oct 13 15:09:33 CEST 2011
Thu Oct 13 15:09:58 CEST 2011

Re: Is there a way to trace what objects are being read from disk?

Dale Henrichs
In reply to this post by Johan Brichau-2
Johan,

I think I've found confirmation that the PageReads are in fact being performed to repopulate your in-vm working set. And I probably don't need any additional statmon files at this point in time...

If you bring up a graph with the following stats for Topaz761-6 you'll see the evidence:

  DataPageReads
  ObjectsRead
  ObjectsReadInBytes (with PerSecond filter)
  PomGenSizeBytes

While these curves don't exactly match up, they are close enough to get an idea of what's going on. So let's look at the sample at 1:48:41 for each of the curves (it corresponds to the max for each curve):

      2,800 DataPageReads/sec peak
     19,000 ObjectsRead/sec peak
  2,000,000 ObjectsReadInBytes/sec peak

     22,000 DataPages read during PageRead episode
    170,000 ObjectsRead during PageRead episode
 16,000,000 ObjectsReadInBytes total for duration of PageRead episode
 16,000,000 PomGenSizeBytes growth during PageRead episode (starts at 56 bytes)

Since PomGenSizeBytes starts at 56 bytes, it's pretty clear that PomGen was flushed in that gem. Also, since you read 16M bytes of objects and PomGen grows by 16M bytes, it's fairly clear that you're loading up the working set of POM objects for your gem.

The stats for Topaz 10-7 in the "statmonit20120102" file have similar numbers.

You've got 31,250 frames in your cache and you're reading 22,000 DataPages during the PageRead episode. That's a significant percentage of the total cache (22,000/31,250, roughly 70%). Doing an MFC ends up running the whole 6G repository through the SPC, so it's not a surprise that your gems end up hitting disk.

We can also see that you are averaging about 8 objects per data page (170k/22k) and that your objects average about 100 bytes each. In theory you can fit 160 100-byte objects on a data page, so IF you could pack the objects you are reading onto data pages you could reduce the number of page reads by a factor of 20.... GemStone has a feature for clustering data on pages for just this reason, so once you've understood what is being read you might consider taking a look at clustering to reduce the number of page reads required to prime the working set...
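
As a rough illustration of the clustering idea: GemStone supports sending #cluster to a persistent object to request that it be written near its neighbours at the next commit. A minimal sketch, assuming a hypothetical root collection named 'workingSet' (consult the clustering chapter of the Programming Guide for the full protocol in your release):

  "Hypothetical: ask that the members of the working set be written
  onto neighbouring data pages at the next commit."
  workingSet do: [:each | each cluster].
  System commitTransaction.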

Earlier I said that one couldn't trace the objects that are loaded into a vm, but I spoke too soon. If you take a look at System class>>_enableTraceObjectsRead, you'll see that you can create a hidden set that includes all of the POM objects faulted into memory. Read the comment in the method for more information. Also take a look at the methods in the 'Hidden Set Support' protocol for System class as there are a number of options for pulling the objects from a hidden set ...
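
A minimal sketch of turning the trace on (only the selector named above is certain; harvesting the results is left to the method comment and the 'Hidden Set Support' protocol):

  "Enable tracing of POM objects faulted into this VM; they accumulate
  in a hidden set. See the method comment and the 'Hidden Set Support'
  protocol on System class for how to enumerate them afterwards."
  System _enableTraceObjectsRead.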

An alternative is to use System class>>_vmInstanceCounts:. This method returns an Array of Associations containing Class->(instanceCount, byteSize). There's no comment in that method, but you pass in an integer to select the generation for which you want the instance counts:

  0 = all generations
  1 = new gen only
  2 = pom gen only
  3 = old only
  4 = perm only
  5 = code only

You'll want to use `2`. Presumably this is the most interesting info for you, as it is a summary of the objects in POM and it is already in a form that could be popped into the object log for your analysis. You'll want to fire this puppy off as close to the end of processing the first HTTP request to the gem as possible, to get a good read on what's being loaded into POM while you have the high rate of data page reads.
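
A hedged sketch of that call, assuming (per the description above) that each association's value holds the instance count and byte size in that order, and using GsFile class>>gciLogServer: simply as a convenient sink for the report:

  | counts |
  counts := System _vmInstanceCounts: 2.    "2 = pom gen only"
  counts := counts asSortedCollection: [:a :b |
      (a value at: 2) > (b value at: 2)].    "largest byte footprint first"
  counts do: [:assoc |
      GsFile gciLogServer: assoc key name , ': ' ,
          (assoc value at: 1) printString , ' instances, ' ,
          (assoc value at: 2) printString , ' bytes'].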
 
I would think that the same sort of priming of the internal working set goes on when you fire up a fresh gem and hit it with a request, so you might even be able to trigger this operation in a manual topaz session ... You could look at the signature in statmon to see if this theory is true...

There are a couple of things that you could consider doing to address this issue:

  - cut back on the setting for GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE  
    a bit to reduce the refresh hit
  - use a "cache warmer" gem to periodically "refresh the working set"
    to keep the data pages "hot" in the cache while the MFC
    is proceeding
  - re-visit your algorithm to see if there might be some inefficiencies
    in the working set (scanning large collections, etc.)
  - do some clustering to increase the density of the objects used in the
    working set on the data pages ... of course if the collections are
    pretty dynamic, the clustering improvements may not last long...
  - increase the SPC, but since your repository is larger than 2G you might
    not see improvements
  - run your MFC at non-peak times and use a "cache warmer" gem to reload
    the data pages into the cache before peak time starts ...

Dale
----- Original Message -----
| From: "Johan Brichau" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Friday, January 6, 2012 1:30:56 PM
| Subject: Re: [GS/SS Beta] Is there a way to trace what objects are being read from disk?
|
| On 06 Jan 2012, at 21:45, Dale Henrichs wrote:
|
| > Would it be possible to send the statmon files immediately preceding
| > the ones you sent. I'd like to be able to see what's going on
| > between the tail end of the MFC and the high PageRead episode.
|
| Unfortunately, I don't have any stats for those. I always start
| statmonitor manually.
|
| I started a monitor now so I can include that entire time period in
| there.
|
| I'll be back… ;-)
|
| Johan

Re: Is there a way to trace what objects are being read from disk?

Johan Brichau-2
Dale,

Thanks a lot for that detailed analysis.

The strange thing about this is that when I stop the entire stone and fire it back up, the gems respond very fast. I would expect that the working set would have to be loaded into the gems at that time as well, and take equally long? I did some tests this morning (some with cache warming and some not) and the app responded very fast.

So, I would conclude that some application condition might be triggering the load of so many objects? The tracing methods seem promising to allow me to find out what is happening.

Thanks! I hope I can come back soon with some information that might benefit others as well.
Johan



Re: Is there a way to trace what objects are being read from disk?

Dale Henrichs
Johan,

That's interesting info about the lack of delay at startup ... it means that you'll have to work harder to trigger the condition and collect data on it ... 170,000 POM objects to respond to a request seems like a large number, and it is, especially if that many objects aren't needed in the standard cold-startup case ...

I'll poke around a little bit more on Monday to see if there's anything else in the statmon files that points to other kinds of problems ...

Dale


Re: Is there a way to trace what objects are being read from disk?

Johan Brichau-2
Dale,

Getting the objects using the _vmInstanceCounts: method has already proven very insightful:
roughly 80% of the total byte size is consumed by objects of our data model _and_ roughly the entire set of instances gets loaded, while it should only be a fraction.

Now, I have to find out why ;-)

Johan


Re: Is there a way to trace what objects are being read from disk?

Johan Brichau-2
In reply to this post by Dale Henrichs
Hi Dale,

Using the _vmInstanceCounts: method, I found at least one reason why so many objects are loaded.
Although it also happens in the cold startup phase, it is at least one of the causes of the page reads.

It seems that the equality indexes on a certain collection are not working properly in the context of a certain select block.

In the method quoted below, the variable 'aCollection' is a Set on which we defined two equality indexes, on the 'start' and 'end' variables (of type DateAndTime) of the contained objects.
The select block returns all objects that overlap the requested time period (start to end). For some reason, it ends up doing a sequential search in which all objects of the collection get loaded into the gem's POM.

findAllBetween: startDateAndTime and: endDateAndTime
        ^ aCollection select:{:entry | (entry.start <= endDateAndTime) & (entry.end >= startDateAndTime)}

The question I have is: is there any reason that this select block reverts to a sequential search? And is there a way to get notified when it does? It seems to happen silently.

In contrast, the following method correctly employs the index (on the same collection), and only a fraction of the objects in the collection get loaded into the POM when it is executed.

findAllAfter: aDateAndTime
        ^ aCollection select:{:entry | entry.start >= aDateAndTime}

This is the definition of the indexes:

        aCollection createEqualityIndexOn:'start' withLastElementClass: DateAndTime.
        aCollection createEqualityIndexOn:'end' withLastElementClass: DateAndTime.


Johan


Re: Is there a way to trace what objects are being read from disk?

Dale Henrichs
Good catch. I don't think that it is exactly a bug... I will need to look at the code in detail to verify my assumption, but if I'm not mistaken the query expression:

  (entry.start <= endDateAndTime) & (entry.end >= startDateAndTime)

creates two result sets, one for each sub-expression ... therefore, if endDateAndTime and startDateAndTime are close to each other, you still end up with a traversal of the whole result set ... I will agree that it would be better if it didn't :)
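
For intuition only, here is a hypothetical decomposition of how the conjunction appears to get evaluated (subject to the caveat above; this is not the actual evaluator code):

  | overlapsStart overlapsEnd result |
  overlapsStart := aCollection select: {:entry | entry.start <= endDateAndTime}.
  overlapsEnd := aCollection select: {:entry | entry.end >= startDateAndTime}.
  "each clause alone matches most of the collection, so even a small
  requested period ends up faulting in nearly every element"
  result := overlapsStart select: [:each | overlapsEnd includes: each].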

There is some fairly convoluted logic involved in the evaluation of a query with sub-expressions, so I will read some code and see if there is a way to perform this query efficiently or if indeed there is a bug ...

Dale

----- Original Message -----
| From: "Johan Brichau" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Monday, January 9, 2012 3:04:38 AM
| Subject: Re: [GS/SS Beta] Is there a way to trace what objects are being read from disk?
|
| Hi Dale,
|
| Using the _vmInstanceCounts method, I found at least one reason why
| so many objects are loaded.
| Although it also happens in the cold startup phase, it at least is
| one of the causes of the page reads.
|
| It seems that the equality indexes on a certain collection are not
| working properly in the context of a certain select block.
|
| In the method quoted below, the variable 'aCollection' is a Set where
| we defined 2 equality indexes on the 'start' and 'end' variables (of
| type DateAndTime) of the containing objects.
| The select block returns all objects that have an overlap with the
| requested time period (start to end). For some reason, it ends up
| doing a sequential search where all objects of the collection get
| loaded in the gem's pom.
|
| findAllBetween: startDateAndTime and: endDateAndTime
| ^ aCollection select:{:entry | (entry.start <= endDateAndTime) &
| (entry.end >= startDateAndTime)}
|
| The question I have is: is there any reason that this select block
| seems to revert to a sequential search? And is there a way to get
| notified because it seems to happen silently?
|
| In contrast, the following method correctly employs the index (on the
| same collection) and only a fraction of the objects in the
| collection get loaded into the pom when it is executed.
|
| findAllAfter: aDateAndTime
| ^ aCollection select:{:entry | entry.start >= aDateAndTime}
|
| This is the definition of the indexes:
|
| aCollection createEqualityIndexOn:'start' withLastElementClass:
| DateAndTime.
| aCollection createEqualityIndexOn:'end' withLastElementClass:
| DateAndTime.
|
|
| Johan
|
| On 07 Jan 2012, at 02:17, Dale Henrichs wrote:
|
| > Johan,
| >
| > That's interesting info about the lack of delay at startup ... it
| > means that you'll have to work harder to trigger the condition and
| > collect data on it ... 170,000 POM objects to respond to a request
| > seems like a large number and it is if that many objects aren't
| > needed in the standard cold startup case ...
| >
| > I'll poke around a little bit more on Monday to see if there's
| > anything else in the statmon files that point to other kinds of
| > problems ...
| >
| > Dale
| >
| > ----- Original Message -----
| > | From: "Johan Brichau" <[hidden email]>
| > | To: "GemStone Seaside beta discussion"
| > | <[hidden email]>
| > | Sent: Friday, January 6, 2012 5:05:29 PM
| > | Subject: Re: [GS/SS Beta] Is there a way to trace what objects
| > | are being read from disk?
| > |
| > | Dale,
| > |
| > | Thanks a lot for that detailed analysis.
| > |
| > | The strange thing about this is that when I stop the entire stone
| > | and
| > | fire it back up, the gems are responding very fast. I would
| > | expect
| > | that the working set would have to be loaded into the gems as
| > | well
| > | at that time and take equally long? I did some tests this morning
| > | (some with cache warming and some not) and the app responded very
| > | fast.
| > |
| > | So, I would conclude that some application condition might be
| > | triggering the load of so many objects? The tracing methods seem
| > | promising to allow me to find out what is happening.
| > |
| > | thanks! I hope I can come back soon with some information that
| > | might
| > | benefit others as well.
| > | Johan
| > |
| > | On 07 Jan 2012, at 01:45, Dale Henrichs wrote:
| > |
| > | > Johan,
| > | >
| > | > I think I've found confirmation that the PageReads are in fact
| > | > being performed to repopulate your in-vm working set. And I
| > | > probably don't need any additional statmon files at this point
| > | > in
| > | > time...
| > | >
| > | > If you bring up a graph with the following stats for Topaz761-6
| > | > you'll see the evidence:
| > | >
| > | >  DataPageReads
| > | >  ObjectsRead
| > | >  ObjectsReadInBytes (with PerSecond filter)
| > | >  PomGenSizeBytes
| > | >
| > | > While these curves don't exactly match up, they are close
| > | > enough to
| > | > get an idea of what's going on. So lets look at the sample at
| > | > 1:48:41 for each of the curves (corresponds to the max for each
| > | > curve):
| > | >
| > | >      2,800 DataPageReads/sec peak
| > | >     19,000 ObjectsRead/sec peak
| > | >  2,000,000 ObectsReadInBytes/sec peak
| > | >
| > | >     22,000 DataPages read during PageRead episode
| > | >    170,000 ObjectsRead during pageRead episode
| > | > 16,000,000 ObjectsReadInBytes total for duration of PageRead
| > | > episode
| > | > 16,000,000 PomGenSizeBytes growth during PageRead episode
| > | > (starts
| > | > at 56 bytes)
| > | >
| > | > Since PomGenSizeBytes starts at 56 bytes, it's pretty clear
| > | > that
| > | > PomGen was flushed in that gem. Also since you read 16M bytes
| > | > of
| > | > objects and PomGen grows by 16M bytes it's fairly clear that
| > | > you're loading up the working set of POM objects for your gem.
| > | >
| > | > The stats for Topaz 10-7 in the "statmonit20120102" file have
| > | > similar numbers.
| > | >
| > | > You've got 31,250 frames in your cache and you're reading
| > | > 22,000
| > | > DataPages during the PageRead episode. That's a significant
| > | > percentage of the total cache. Doing an MFC ends up running the
| > | > whole 6G repository through the SPC so it's not a surprise that
| > | > your gems end up hitting disk.
| > | >
| > | > We can also see that you are averaging about 8 objects per data
| > | > page (170k/22k) and that your objects are averaging about 100
| > | > bytes/object. In theory you can get 160 100 bytes objects per
| > | > data
| > | > page, so IF you could pack the objects you are reading onto
| > | > data
| > | > pages you could reduce the number of page reads by 20 times....
| > | > GemStone has a featuring for clustering data on pages for just
| > | > this reason, so once you've understood what is being read you
| > | > might consider taking a look at clustering to reduce the number
| > | > of
| > | > page reads required to prime the working set...
| > | >
| > | > Earlier I said that one couldn't trace the objects that are
| > | > loaded
| > | > into a vm, but I spoke too soon. If you take a look at System
| > | > class>>_enableTraceObjectsRead, you'll see that you can create
| > | > a
| > | > hidden set that includes all of the POM objects faulted into
| > | > memory. Read the comment in the method for more information.
| > | > Also
| > | > take a look at the methods in the 'Hidden Set Support' protocol
| > | > for System class as there are a number of options for pulling
| > | > the
| > | > objects from a hidden set ...
| > | >
| > | > An alternative is to use System class>_vmInstanceCounts:. This
| > | > method returns an Array of Associations containing
| > | > Class->(instanceCount, byteSize). No comment in that method,
| > | > but
| > | > you pass in an integer to define for which generation you want
| > | > the
| > | > instance counts:
| > | >
| > | >  0 = all generations
| > | >  1 = new gen only
| > | >  2 = pom gen only
| > | >  3 = old only
| > | >  4 = perm only
| > | >  5 = code only
| > | >
| > | > You'll want to use `2`. Presumably this is the most interesting
| > | > info for you as it is a summary of the objects in POM and it is
| > | > already in a form that could be popped into the object log for
| > | > your analysis. You'll want to fire this puppy off as close the
| > | > the
| > | > end of processing the first http request to the gem as possible
| > | > to
| > | > get a good read on what's being loaded into POM while you have
| > | > the
| > | > high rate of data page reads.
| > | >
| > | > I would think that the same sort of priming the internal
| > | > working
| > | > set goes on when you fire up a fresh gem and hit it with a
| > | > request, so you might even be able to trigger this operation in
| > | > a
| > | > manual topaz session ... You could look at the signature in
| > | > statmon to see if this theory is true...
| > | >
| > | > There are a couple of things that you could consider doing to
| > | > address this issue:
| > | >
| > | >  - cut back on the setting for GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE
| > | >    a bit to reduce the refresh hit
| > | >  - use a "cache warmer" gem to periodically "refresh the
| > | >  working
| > | >  set"
| > | >    to keep the data pages "hot" in the cache while the MFC
| > | >    is proceeding
| > | >  - re-visit your algorithm to see if there might be some
| > | >  inefficiencies
| > | >    in the working set (scanning large collections, etc.)
| > | >  - do some clustering to increase the density of the objects
| > | >  used
| > | >  in the
| > | >    working set on the data pages ... of course if the
| > | >    collections
| > | >    are
| > | >    pretty dynamic, the clustering improvements may not last
| > | >    long...
| > | >  - increase the SPC, but since your repository is larger than
| > | >  2G
| > | >  you might
| > | >    not see improvements
| > | >  - run your MFC at non-peak times and use a "cache warmer" gem
| > | >  to
| > | >  reload
| > | >    the data pages into the cache before peak time starts ...
| > | >
| > | > Dale
| > | > ----- Original Message -----
| > | > | From: "Johan Brichau" <[hidden email]>
| > | > | To: "GemStone Seaside beta discussion"
| > | > | <[hidden email]>
| > | > | Sent: Friday, January 6, 2012 1:30:56 PM
| > | > | Subject: Re: [GS/SS Beta] Is there a way to trace what
| > | > | objects
| > | > | are being read from disk?
| > | > |
| > | > |
| > | > |
| > | > |
| > | > | On 06 Jan 2012, at 21:45, Dale Henrichs wrote:
| > | > |
| > | > |
| > | > | Would it be possible to send the statmon files immediately
| > | > | preceding
| > | > | the ones you sent. I'd like to be able to see what's going on
| > | > | between the tail end of the MFC and the high PageRead
| > | > | episode.
| > | > |
| > | > | Unfortunately, I don't have any stats for those. I always
| > | > | start
| > | > | statmonitor manually.
| > | > |
| > | > |
| > | > | I started a monitor now so I can include that entire
| > | > | timeperiod
| > | > | in
| > | > | there.
| > | > |
| > | > |
| > | > | I'll be back… ;-)
| > | > |
| > | > |
| > | > | Johan
| > |
| > |
|
|

Re: Is there a way to trace what objects are being read from disk?

Dale Henrichs
In reply to this post by Johan Brichau-2
Johan,

I'll send mail tomorrow with a more detailed analysis (I've run out of time this evening). For best performance, you should try using #selectAsStream: as follows:

  | result stream entry |
  result := OrderedCollection new.
  stream := nsc selectAsStream: {:entry | entry.end >= startDateAndTime}.
  [stream atEnd not and: [
        entry := stream next.
        entry start <= endDateAndTime]] whileTrue: [
                result add: entry].

The short explanation is that you use #selectAsStream: to reach the first entry that could possibly satisfy the query (assuming that end > start), and then traverse each entry until the start date is beyond the end date ... this gives you the minimum number of objects traversed ... Of course this is what the query evaluator should do, but our optimization algorithm isn't smart enough ...
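
Folded back into the earlier accessor, the same idea might look like this (a sketch; it inherits the end > start assumption above):

findAllBetween: startDateAndTime and: endDateAndTime
        "Sketch: index-assisted range scan. Walks entries in ascending
        'end' order and stops at the first entry whose start lies past
        the requested end; assumes end > start for every entry."
        | result stream entry |
        result := OrderedCollection new.
        stream := aCollection selectAsStream: {:entry | entry.end >= startDateAndTime}.
        [stream atEnd not and: [
                entry := stream next.
                entry start <= endDateAndTime]] whileTrue: [
                        result add: entry].
        ^ result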

Like I said I'll send more information tomorrow ... If you get a chance, run some tests yourself to verify that this approach improves the page reads (I haven't done a complete analysis quite yet)...

Dale
----- Original Message -----
| From: "Johan Brichau" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Monday, January 9, 2012 3:04:38 AM
| Subject: Re: [GS/SS Beta] Is there a way to trace what objects are being read from disk?
|
| Hi Dale,
|
| Using the _vmInstanceCounts method, I found at least one reason why
| so many objects are loaded.
| Although it also happens in the cold startup phase, it at least is
| one of the causes of the page reads.
|
| It seems that the equality indexes on a certain collection are not
| working properly in the context of a certain select block.
|
| In the method quoted below, the variable 'aCollection' is a Set where
| we defined 2 equality indexes on the 'start' and 'end' variables (of
| type DateAndTime) of the containing objects.
| The select block returns all objects that have an overlap with the
| requested time period (start to end). For some reason, it ends up
| doing a sequential search where all objects of the collection get
| loaded in the gem's pom.
|
| findAllBetween: startDateAndTime and: endDateAndTime
| ^ aCollection select:{:entry | (entry.start <= endDateAndTime) &
| (entry.end >= startDateAndTime)}
|
| The question I have is: is there any reason that this select block
| seems to revert to a sequential search? And is there a way to get
| notified because it seems to happen silently?
|
| In contrast, the following method correctly employs the index (on the
| same collection) and only a fraction of the objects in the
| collection get loaded into the pom when it is executed.
|
| findAllAfter: aDateAndTime
| ^ aCollection select:{:entry | entry.start >= aDateAndTime}
|
| This is the definition of the indexes:
|
| aCollection createEqualityIndexOn:'start' withLastElementClass:
| DateAndTime.
| aCollection createEqualityIndexOn:'end' withLastElementClass:
| DateAndTime.
|
|
| Johan
|
| On 07 Jan 2012, at 02:17, Dale Henrichs wrote:
|
| > Johan,
| >
| > That's interesting info about the lack of delay at startup ... it
| > means that you'll have to work harder to trigger the condition and
| > collect data on it ... 170,000 POM objects to respond to a request
| > seems like a large number and it is if that many objects aren't
| > needed in the standard cold startup case ...
| >
| > I'll poke around a little bit more on Monday to see if there's
| > anything else in the statmon files that point to other kinds of
| > problems ...
| >
| > Dale

Re: Is there a way to trace what objects are being read from disk?

Johan Brichau-2
Dale,

Seems like there is more to GemStone querying than what can be read in the manual ;-)
Thanks for that code snippet. I did not know about selectAsStream: and its implications.

I have found a number of different reasons why objects get loaded that should not. One of those was the query we are talking about.
Now that I can trace which objects get loaded, it's easier to fix the actual read problem.

Preliminary results show that things improve as I fix the problems.
But I'm still puzzled by why the large page reads happen only at seemingly random moments. They get shorter but they are still there.

Johan

On 10 Jan 2012, at 06:49, Dale Henrichs wrote:

> Like I said I'll send more information tomorrow ... If you get a chance, run some tests yourself to verify that this approach improves the page reads (I haven't done a complete analysis quite yet)...


Re: Is there a way to trace what objects are being read from disk?

Dale Henrichs
Johan,

DataReads are happening because the objects aren't in the vm's memory and the data pages aren't already in the cache. If a large chunk of the indexed collection is already in the vm's memory, then only the "missing objects" need to be faulted in, and if a number of the needed DataPages are already in the SPC, then only the "missing DataPages" need to be read ...

It's worth keeping an eye on the ObjectsRead stat as that gives you a measure of the objects faulted in with or without a DataPage read ...

Over time, the POM generations are slowly flushed from the vm's memory, which means that old POM objects are flushed without an MFC event .... The objects that make up a collection and its indexes get flushed when the collection/index is modified and will only be faulted in on demand ...
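
If you want to see which classes are currently populating POM in a gem, a quick workspace sketch along the following lines can help (it uses the _vmInstanceCounts: method discussed earlier in this thread; the sorting and report formatting here are just illustrative):

  | counts report |
  counts := System _vmInstanceCounts: 2.   "2 = pom gen only"
  report := WriteStream on: String new.
  "sort by byte size, assuming each association holds Class->(count, bytes)"
  (counts asSortedCollection: [:a :b | (a value at: 2) > (b value at: 2)])
      do: [:assoc |
          report
              nextPutAll: assoc key name asString;
              nextPutAll: ' -> ';
              nextPutAll: (assoc value at: 1) printString;
              nextPutAll: ' instances, ';
              nextPutAll: (assoc value at: 2) printString;
              nextPutAll: ' bytes';
              nextPut: Character lf ].
  ^report contents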

Since the whole repository doesn't fit in the SPC, DataPages are preempted by the FreeFrame servers to ensure that there are plenty of FreeFrames and the algorithm tends to preempt pages that haven't been accessed recently ...

I am curious whether the query always involves the same start/end date range... different date ranges could cause different parts of the index data structure to be faulted in ...

Finally, there are some trigger points in the query execution algorithm where a different algorithm path is taken based on the size of the result set ... If your queries happen to fall into this range you could see different behavior depending upon a slight change in the size of the result set ... This is the one area that I want to characterize so that I can give you some practical advice...

Dale


Re: Is there a way to trace what objects are being read from disk?

Dale Henrichs
In reply to this post by Dale Henrichs
Johan,

Here's my analysis... Basically I'll be describing the job that is performed by the QueryExecuter class, which handles the evaluation of `{}` select/detect/reject blocks. I'll also assume that each of the paths involved in the expression has an equality index defined for it.

If there is a single predicate in the expression like the following:

  nsc select:{:entry | (entry.start <= endDateAndTime) }

Then the predicate is evaluated using the btree for the appropriate index. The btree is walked applying the predicate until the first element is found that satisfies the predicate. This involves faulting in BTree pages (DataPages) and the DataPages for the entries themselves as they are compared (if the value is encoded in the BTree then the entry DataPages aren't faulted in). Once an entry is found that satisfies the predicate a result set is created and the entries are added to the result set (traversing BTree pages) until the predicate is no longer satisfied. This is the ideal case, since basically only the objects that satisfy the query are faulted into the vm and you can avoid traversing the whole collection.

There are four different types of predicates:

   1.  constant-constant (such as 3 > 2)
   2.  path-constant     (such as 'spouse.age' > 30)
   3.  constant-path     (such as 30 <= 'spouse.age')
   4.  path-path         (such as 'spouse.age' < 'spouse.weight')

constant-constant predicates return the whole collection (virtually no DataPage access) or an empty collection for select and reject, and a single element (minimal DataPage access) or no element for detect.

path-path predicates end up traversing the entire collection, so indexes don't necessarily provide a lot of advantage for these types of queries.

For the remainder of the discussion we'll just cover the case where multiple path-constant and constant-path predicates (identical except for order) are involved in a query, like the query in your original example:

  nsc select:{:entry | (entry.start <= endDateAndTime) &
        (entry.end >= startDateAndTime)}

If two of the predicates are doing a range-style test on the same path term, then those two predicates are reduced to a single predicate that is evaluated in a single pass. This works for an inclusive range (`constant1 <= path term <= constant2`) as well as an exclusive range (`constant1 < path term < constant2`). A multi-predicate query of this shape will still be efficiently evaluated (i.e., btree traversal to the first value satisfying the dual predicate, collecting results until the dual predicate fails). Here's an example of a query that would be optimized:

  nsc select:{:entry | (entry.key <= endDateAndTime) &
        (entry.key >= startDateAndTime)}

After the predicates are optimized (if possible), the predicates are ordered for evaluation. The ordering of predicates is as follows:

  - constant-constant predicates
  - predicates for which an index exists (excluding path-path predicates)
  - identity comparisons
  - equality comparisons
  - remaining predicates

The (non-obvious) side effect of the ordering operation is that the evaluation order of the predicates is basically reversed at best and 'randomized' at worst. Of course "best" and "worst" only apply if you thought that you had explicitly put the predicates in an optimum order.

Once the predicates are ordered, the 'first' predicate is evaluated against the original collection (using indexes if available), producing an 'intermediate result set'.

If there are no more predicates, the 'intermediate result set' is returned as the 'result of the query'. Otherwise, if the number of elements in the 'intermediate result set' is less than the #bruteThreshold (currently 1000), then each of the remaining predicates is applied against each element of the 'intermediate result set', producing the 'result of the query'.

If the number of elements in the 'intermediate result set' is greater than or equal to the #bruteThreshold, then each of the remaining predicates is evaluated against the original collection (using indexes if available) and each successive 'partial result set' is intersected with the 'intermediate result set', producing a new 'intermediate result set'. The final 'intermediate result set' is then returned as the 'result of the query'.

In the worst case the overall number of DataPages read is a function of the size of the 'intermediate result set' for each of the individual predicates while evaluating each predicate against the whole collection, sequentially.
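
To make that flow concrete, here is a rough Smalltalk sketch of the evaluation loop just described (illustrative only -- orderedPredicates, evaluateUsingIndex: and valueFor: are stand-ins, not the real QueryExecuter selectors):

  | intermediate remaining |
  "evaluate the first (ordered) predicate against the original collection,
   using an index if one is available"
  intermediate := self evaluateUsingIndex: (orderedPredicates at: 1).
  remaining := orderedPredicates copyFrom: 2 to: orderedPredicates size.
  remaining do: [:pred |
      intermediate size < self bruteThreshold   "currently 1000"
          ifTrue: [
              "small intermediate set: apply the predicate element by element"
              intermediate := intermediate select: [:each | pred valueFor: each] ]
          ifFalse: [
              "large intermediate set: evaluate against the original collection
               and intersect the partial result set with the intermediate one"
              intermediate := intermediate * (self evaluateUsingIndex: pred) ] ].
  ^intermediate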

#selectAsStream: is a way for you to programmatically optimize the query, with the added advantage that you don't have to traverse the whole result set of any of the predicates to finish your query, since you can stop streaming early.

BTW, you are allowed to have two predicates in a #selectAsStream: query as long as they can be optimized to a single 'dual predicate'...
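
For example, a sketch of a dual-predicate #selectAsStream: query (reusing nsc and the entry.key range shape from the optimized example above):

  | result stream |
  result := OrderedCollection new.
  stream := nsc selectAsStream: {:entry |
      (entry.key >= startDateAndTime) & (entry.key <= endDateAndTime)}.
  [stream atEnd] whileFalse: [ result add: stream next ].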

Hopefully, I've made things a bit clearer about what the indexing system can and cannot do...

Let me know if I've been less than clear in any of my explanations.

Dale


Re: Is there a way to trace what objects are being read from disk?

Wai Lun Mang-2
In reply to this post by Dale Henrichs
You might also want to try this:

findAllBetween: startDateAndTime and: endDateAndTime
    | result1 result2 |
    result1 := aCollection select:{:entry | (entry.start <= endDateAndTime) }.
    result2 := aCollection select:{:entry | (entry.end >= startDateAndTime) }.
    ^result1 * result2

This might be faster than using selectAsStream because it will not be reading any of the objects in the collection if the index structure holds all the information for indexing.

Dale and Norm could explain the details better.

Mang


From: Dale Henrichs <[hidden email]>
To: GemStone Seaside beta discussion <[hidden email]>
Sent: Tuesday, January 10, 2012 1:49 PM
Subject: Re: [GS/SS Beta] Is there a way to trace what objects are being read from disk?

Johan,

I'll send mail tomorrow with a more detailed analysis (I've run out of time this evening). For best performance, you should try using #selectAsStream: as follows:

  | result stream entry |
  result := OrderedCollection new.
  stream := nsc selectAsStream: {:entry | entry.end >= startDateAndTime }.
  [stream atEnd not and: [
        entry := stream next.
        (entry start <= endDateAndTime)]] whileTrue: [
                result add: entry ].

The short explanation is that you use #selectAsStream: to reach the first entry that could possibly satisfy the query (assuming that end > start) and then traverse each entry until the start date is beyond the end date ... this gives you the minimum number of objects traversed ... Of course this is what the query evaluator should do, but our optimization algorithm isn't smart enough ...

Like I said I'll send more information tomorrow ... If you get a chance, run some tests yourself to verify that this approach improves the page reads (I haven't done a complete analysis quite yet)...

Dale

Re: Is there a way to trace what objects are being read from disk?

James Foster-8
I like Mang's approach and I think it takes good advantage of GemStone's efficient handling of sets.

For amusement, I wrote up an alternative that I learned about 25 years ago. You can read about it at http://programminggems.wordpress.com/2012/01/11/indexing-for-a-date-range/.

-James


Re: Is there a way to trace what objects are being read from disk?

Dale Henrichs
In reply to this post by Wai Lun Mang-2
Mang,

Very good point. You are correct that when the values are encoded into the index structure (see RangeEqualityIndex class>>isBasicClass: for the list of classes whose instances are encoded), the objects are not faulted in during index processing ... the result sets are constructed by directly installing the oop of the object into the set without having to read the data page where the object resides ... the set intersect also should just be comparing oops directly, again avoiding the need to access the object itself .... So it's true that you can avoid a significant number of page reads if you can use a class that is directly encoded into the index structure.

It turns out that DateAndTime objects are _not_ encoded in the index structures, so in this case the objects are faulted in while traversing the index structures, and the stream option may still prove advantageous ... especially if the final result set is very small relative to the size of either of the partial result sets.

DateTime instances _are_ encoded in the index structures, so Johan, you might consider converting your DateAndTimes into DateTimes as another technique to reduce page reads ...
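
If you decide to try that, the change could look roughly like this (a sketch: #asDateTime is a hypothetical conversion selector and the setters on your indexed objects are assumed, so check the protocols in your version first):

  "drop the old DateAndTime-based indexes"
  aCollection removeEqualityIndexOn: 'start'.
  aCollection removeEqualityIndexOn: 'end'.
  "convert the stored values (conversion selector is hypothetical)"
  aCollection do: [:each |
      each start: each start asDateTime.
      each end: each end asDateTime ].
  "re-create the indexes with DateTime, which IS encoded in the index structure"
  aCollection createEqualityIndexOn: 'start' withLastElementClass: DateTime.
  aCollection createEqualityIndexOn: 'end' withLastElementClass: DateTime.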

Dale

Re: Is there a way to trace what objects are being read from disk?

Johan Brichau-2
In reply to this post by James Foster-8
James,

Very interesting post! Thanks a lot for writing that up.

I suspect that the solution with the streaming select will be sufficient for us at this point. The collection over which we launch the query accumulates roughly 5000 objects for each year that the application is running. Tests indicated that the streaming solution is very fast and loads approx. 100 objects into the gem's pom (which are the objects we need to have).

Johan


Re: Is there a way to trace what objects are being read from disk?

Johan Brichau-2
In reply to this post by Dale Henrichs
Hi Dale,

That's a very comprehensive write-up and very interesting indeed.

Thanks!
Johan

Re: Is there a way to trace what objects are being read from disk?

Johan Brichau-2
In reply to this post by Dale Henrichs

On 12 Jan 2012, at 19:06, Dale Henrichs wrote:

> DateTime instances _are_ encoded in the index structures, so Johan, you might consider converting your DateAndTimes into DateTimes as another technique to reduce page reads ...

If I understand correctly, that would only be needed when I apply Mang's suggestion?

Or does this apply to _any_ query involving DateAndTime instances?

Mind that changing these values to DateTimes would indeed be feasible in our application because we have already made a special wrapper class around our domain objects to make indexing work. This is because the 'start' and 'end' attributes of the domain objects are not (always) instvars.

Johan

Re: Is there a way to trace what objects are being read from disk?

Dale Henrichs
Johan,

The PageRead efficiency for DateTime applies to all queries.

In actuality, Mang's suggestion and your select query should pretty much behave the same.

I know my writeup was fairly dense, but if you look at what the QueryExecuter does, it basically evaluates each predicate and finds the intersection of the results (which is basically what Mang was suggesting). The QueryExecuter reduces predicates, reorders predicate evaluation and in the case where the result set of the first predicate evaluated is less than 1000, the remaining predicates are applied directly to the result set ...

If you know the optimum order for evaluation (and that order differs from what the QueryExecuter does), then it makes sense to explicitly control the order for each predicate.
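
A sketch of what that explicit control could look like, using your original two predicates (the order here is just an example -- you'd evaluate the most selective predicate first):

  | narrowed |
  "first predicate goes through the index machinery against the full collection"
  narrowed := nsc select: {:entry | entry.start <= endDateAndTime}.
  "the second is then applied to the (ideally much smaller) result set"
  ^narrowed select: {:entry | entry.end >= startDateAndTime}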

The same logic applies to using #selectAsStream: ... in your case the final result set is very small relative to the sizes of either of the partial result sets so the stream approach is advantageous.

The advantage to using DateTime instances is that only the BTree pages will be faulted into memory for the creation of the result sets ... So, if you had used DateTime instead of DateAndTime, we probably wouldn't be talking about PageRead issues right now :)

You _would_ have to use the DateTime instances as part of your domain structures for indexing.

It sounds like #selectAsStream: has given you the performance that you need, so it's probably not necessary to change your data structures.

Dale
