Indexing

5 messages

Indexing

Göran Krampe
Howdy folks!

Ok, I am using indexes and have a question:

My scenario right now (I will probably work around it, but would like to
know how this is meant to work) is that I have two different
MagmaCollections - one holding n objects and the other holding a subset
of those. So the same object occurs in two MagmaCollections.

Now, both collections have indexes. Let's say I navigate using one
collection to object A, call noteOldKeysFor: on it, then change it and
commit.
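
Concretely, the sequence I mean is roughly this (a sketch: the
#detect: navigation and the #status: attribute are placeholders from
my app, and I am writing the #noteOldKeysFor: receiver from memory):

```smalltalk
"Object A is an element of both bigCollection and subsetCollection."
a := bigCollection detect: [:each | each id = someId].
mySession noteOldKeysFor: a.  "capture the old index keys"
a status: #closed.            "change an attribute an index is built on"
mySession commit.             "A ends up reindexed only in bigCollection"
```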

AFAICT MaTransaction>>markRead:using: calls
monitorLargeCollectionChanges:, but the WeakSet in MaTransaction just
keeps growing. I assume it uses identity where perhaps it should use
equality? Looks like a bug; anyway, that is not my issue here...

So the code in captureOldHashesFor: seems to only capture the hashes for
the large collections that are "monitored", but in my case this happens
to only include the collection I navigated through (since I then read
it), but not the other! Is this logic really correct? Or do I need to
"trick" the session into reindexing my other collection too somehow?

The current result seems to be that object A is only reindexed in the
collection I navigated through.  I assume there is currently no way for
Magma to know which collections it should re-index on its own.

regards, Göran

Re: Indexing

Chris Muller
Howdy Göran!

What version of Magma are you using?

> AFAICT MaTransaction>>markRead:using: calls
> monitorLargeCollectionChanges:, but the WeakSet in
> MaTransaction just
> keeps growing, I assume it uses identity instead but
> perhaps should use
> equality? Looks like a bug; anyway that is not my
> issue here...

I'm trying to understand this without having access to
Squeak right now..  not remembering what WeakSet you
may be talking about; I know the MaTransaction has a
'readSet' which is a WeakIdentityKeyDictionary but
what is the name of the variable referencing the
WeakSet?  I don't remember, I'm afraid I'll have to
wait until this weekend to comment, sorry.

> So the code in captureOldHashesFor: seems to only
> capture the hashes for
> the large collections that are "monitored", but in
> my case this happens
> to only include the collection I navigated through
> (since I then read
> it), but not the other! Is this logic really
> correct? Or do I need to
> "trick" the session into reindexing my other
> collection too somehow?

No, this does not sound correct.  All LargeCollections
should be monitored as soon as they're persistent.  If
they're not persistent, changing keys doesn't matter.
But again, I'm talking codeless here..

Brent are you around?  Didn't we just fix a bug
related to this recently?

Sorry you had this problem Göran.  I will investigate
it this weekend and have an answer/fix for you.

> The current result seems to be that object A is only
> reindexed in the
> collection I navigated through.  I assume there is
> currently no way for
> Magma to know which collections it should re-index
> on its own.

Nope, no way.  I tried real hard for Magma to detect
this automatically but eventually concluded it was
impossible without severely affecting performance.

I'll get back to you tomorrow or Sunday..  Be sure to
tell me what version you're using.

 - Chris

Re: Indexing

Göran Krampe
Hi Chris!

Been hectic days here, haven't had time to follow up. But here goes:

Chris Muller <[hidden email]> wrote:
> Howdy Göran!
>
> What version of Magma are you using?

1.0

Btw, I am slightly confused about this. There is MagmaServerLoader,
MagmaTesterLoader and MagmaClientLoader. I presume these three MCs use
MC deps to refer to the latest of all components they consist of. I
understand that server is a superset of client etc. I guess that using
any of these would give me Magma 1.1 - or rather - the latest of all
packages, right?

And then I presume that Magma1.0 is an MC referring to a frozen set of
older snapshots, but does it correspond to MagmaServerLoader or
MagmaClientLoader or ... what?

Anyway, I just loaded Magma1.0-cmm.4 and my app still works. :)

(Cees and others are now using Monticello Configurations, perhaps that
is an option for you too - a config is just a list of specific
snapshots)

> > AFAICT MaTransaction>>markRead:using: calls
> > monitorLargeCollectionChanges:, but the WeakSet in
> > MaTransaction just
> > keeps growing, I assume it uses identity instead but
> > perhaps should use
> equality? Looks like a bug; anyway that is not my
> > issue here...
>
> I'm trying to understand this without having access to
> Squeak right now..  not remembering what WeakSet you
> may be talking about; I know the MaTransaction has a
> 'readSet' which is a WeakIdentityKeyDictionary but
> what is the name of the variable referencing the
> WeakSet?  I don't remember, I'm afraid I'll have to
> wait until this weekend to comment, sorry.

The WeakSet I am referring to is largeCollectionChanges.

In markRead:using: it says at the end:

...
anObject maIsLargeCollection
                ifTrue:
                        [ self monitorLargeCollectionChanges: anObject changes.
                        anObject session: session ].
^anObject

And in MaTransaction>>monitorLargeCollectionChanges: we have:

monitorLargeCollectionChanges: aMaLargeCollectionChanges

        largeCollectionChanges add: aMaLargeCollectionChanges


Ok, so in my app I have MagmaCollections in three different places and
given the number of instances of my domain objects at the moment I
should have this number of MagmaCollections:

        (Q2Model allInstances size * 2) + (Q2Process allInstances size) ==> 9

Note: Q2Model has two MagmaCollection instvars and Q2Process one.

This gives me 9 right now. MagmaCollection allInstances size gives me
378! And MagmaCollectionChanges allInstances size gives 379.

Hmmmmm, ok - now I cleaned out my Magma db directory (it had tons of
older MagmaCollection files - indexes that is - around 370-ish). Now it
looks much better. I still have "twice too many" MagmaCollections in my
image though:

{(Q2Model allInstances size * 2) + (Q2Process allInstances size).
MagmaSession allInstances size.
MagmaCollection allInstances size.
MagmaCollectionChanges allInstances size}
 ===> #(9 2 18 19)

Q2Model allInstances size is 1 - which is my domain model root object.
It has two MagmaCollections. Then I have Q2Process - 7 instances with
one MagmaCollection each. That is the expected number given a single
MagmaSession. The second MagmaSession seems to be an extra internal
session used by Magma (right?) and perhaps that session is for some
reason also materializing the collections - which would explain the
double amount (18 instead of 9). And yes, I have files on disk
indicating 9 MagmaCollections (there are 9 unique numbers used in the
.hdx filenames).

Anyway, this looks "reasonable" and funnily enough my experience that
spurred this email (the index not being updated) seems to have magically
disappeared. It might have been related to these older index files
laying around? Odd.

> > So the code in captureOldHashesFor: seems to only
> > capture the hashes for
> > the large collections that are "monitored", but in
> > my case this happens
> > to only include the collection I navigated through
> > (since I then read
> > it), but not the other! Is this logic really
> > correct? Or do I need to
> > "trick" the session into reindexing my other
> > collection too somehow?
>
> No, this does not sound correct.  All LargeCollections
> should be monitored as soon as they're persistent.  If
> they're not persistent, changing keys doesn't matter.
> But again, I'm talking codeless here..

So a MagmaSession always "knows" all MagmaCollections in a db,
regardless of if they have been navigated and materialized in the
session yet?

> Brent are you around?  Didn't we just fix a bug
> related to this recently?
>
> Sorry you had this problem Göran.  I will investigate
> it this weekend and have an answer/fix for you.

No problem, I haven't had many issues at all with Magma so far - and in
this case it was probably because there were older files in the db dir
(my guess).

Btw, in my app you can actually have the server "build" a separate Magma
db, then download it, unzip and reconnect to it on the clients locally -
so it was nice to see that you are very careful with predicting "odd"
scenarios - because I then stumbled onto this exception (perfectly
correctly - because I needed to reconnect etc) signalled in
MagmaSession>>validateRemoteId:

        "Cannot connect because the repository has been replaced."

:)

> > The current result seems to be that object A is only
> > reindexed in the
> > collection I navigated through.  I assume there is
> > currently no way for
> > Magma to know which collections it should re-index
> > on its own.
>
> Nope, no way.  I tried real hard for Magma to detect
> this automatically but eventually concluded it was
> impossible without severely affecting performance.
>
> I'll get back to you tomorrow or Sunday..  Be sure to
> tell me what version you're using.
>
>  - Chris

Hmmm, let me see now... above you are saying (I guess) that only
monitored MagmaCollections will be reindexed. And AFAICT from the code
the monitored collections are the ones we have materialized in the
session. But above you wrote "All LargeCollections should be monitored
as soon as they're persistent." which seems contradictory.

I am at a loss right now. Right now my app seems to work nicely - but
perhaps that is just because I materialize all these collections in my
sessions right now.

regards, Göran

PS. Very happy with Magma so far. :) And yes, the second demo the other
day went fine and we have a GO for the project! And most likely we will
open source it too.

Re: Indexing

Chris Muller
> Btw, I am slightly confused about this. There is
> MagmaServerLoader,
> MagmaTesterLoader and MagmaClientLoader. I presume
> these three MCs use
> MC deps to refer to the latest of all components
> they consist of. I
> understand that server is a superset of client etc.
> I guess that using
> any of these would give me Magma 1.1 - or rather -
> the latest of all
> packages, right?

Yep.  MagmaClientLoader has just the client packages,
used to ONLY connect to a *remote* server.
MagmaServerLoader includes all the packages in client
plus some extras for the server.  These are needed to
either host a server or connect using #openLocal:.
MagmaTesterLoader includes all in server (and client)
plus a bunch of extras for the test cases.

> And then I presume that Magma1.0 is an MC referring
> to a frozen set of
> older snapshots, but does it correspond to
> MagmaServerLoader or
> MagmaClientLoader or ... what?

Better to think of them not as "older" (because
they're actually newer with these patches this week)
but as "minus the security code".

Rather than create three separate Loader packages for
1.0, I just created one that includes everything (a la
"Magma1.0TesterLoader").  The premise is that soon 1.1
will be the best one to use.  With these last two
fix-updates to 1.0, it is now branched from 1.1 on
SqueakSource.  I'm not planning to release another 1.1
for a few weeks yet.

So this is another reason to stay with 1.0 for now.  I
have merged the fixes into my own local 1.1 but not
planning to commit it to SqueakSource yet until I'm
done with this iteration.

> Anyway, I just loaded Magma1.0-cmm.4 and my app
> still works. :)
>
> (Cees and others are now using Monticello
> Configurations, perhaps that
> is an option for you too - a config is just a list
> of specific
> snapshots)

I have no preference either way other than I really
don't want to have a SqueakSource server running right
now just to use MC-Configs..  When they support
File-based repositories I'll check them out again.

> Ok, so in my app I have MagmaCollections in three
> different places and
> given the number of instances of my domain objects
> at the moment I
> should have this number of MagmaCollections:
>
> (Q2Model allInstances size * 2) + (Q2Process
> allInstances size) ==> 9
>
> Note: Q2Model has two MagmaCollection instvars and
> Q2Process one.
>
> This gives me 9 right now. MagmaCollection
> allInstances size gives me
> 378! And MagmaCollectionChanges allInstances size
> gives 379.

The next time this happens, see how many instances of
MagmaSession you have.  Remember, they all have their
own copy of all the MagmaCollections and changes.

There have been intermittent issues with cleanup of
old sessions over the years, it may be back..  It was
always related to Block/Method contexts holding old
Sessions in one of their (temp-var?) references..
There is a utility method, MagmaSession
class>>#cleanUp, which enumerates all instances of
these contexts and does a fine job of getting rid of
the old ones; print-it to see the before/after instance count.
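
Something like this, print-it style (the explicit garbageCollect is
just my addition to make the after-count honest; I'm writing this
from memory):

```smalltalk
"Print-it: answers the MagmaSession instance count before and after."
| before after |
before := MagmaSession allInstances size.
MagmaSession cleanUp.
Smalltalk garbageCollect.
after := MagmaSession allInstances size.
{before. after}
```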


> Hmmmmm, ok - now I cleaned out my Magma db directory
> (it had tons of
> older MagmaCollection files - indexes that is -
> around 370-ish). Now it
> looks much better.

Now this confuses me.  "Cleaning up" the directory
files alone should have no effect on the number of
instances in the image..  ??

> I still have "twice too many"
> MagmaCollections in my
> image though:
> ...
> That is the expected
> number given a single
> MagmaSession. The second MagmaSession seems to be an
> extra internal
> session used by Magma (right?) and perhaps that
> session is for some
> reason also materializing the collections - which
> would explain the
> double amount (18 instead of 9).

Exactly right.  Magma has a meta-model that is
maintained via its own transaction mechanism.  The
meta-model includes such things as the
class-definitions, the magma-collections and their
indexes, the code-base for the repository, etc.  See
MagmaRepositoryDefinition.  It is the root of the
"meta side".

When a new class-definition or large-collection is
added, the server refreshes its own "internal" session
because it must know about them to do its work
properly.

> > No, this does not sound correct.  All
> LargeCollections
> > should be monitored as soon as they're persistent.
>  If
> > they're not persistent, changing keys doesn't
> matter.
> > But again, I'm talking codeless here..
>
> So a MagmaSession always "knows" all
> MagmaCollections in a db,
> regardless of if they have been navigated and
> materialized in the
> session yet?

Since all the MagmaCollections are part of the
MagmaRepositoryDefinition (the meta root), and this
definition is faulted down and materialized upon
connect, the answer is yes, each connected
MagmaSession always knows all MagmaCollections in a
db.

> Btw, in my app you can actually have the server
> "build" a separate Magma
> db, then download it, unzip and reconnect to it on
> the clients locally -

Wow, can you tell me more about this?  This is
obviously part of the "working offline" function,
right?

This might be painful if you are planning to try to
"merge" the offline work back into the "master" later.

I have planned, for 1.2, an efficient server-to-server
protocol that will allow large chunks of domains to be
transported between repositories without having to go
through the client; and, further, to be able to "sync"
up with the original repository.  I hope to have this
done by summer.

> so it was nice to see that you are very careful with
> predicting "odd"
> scenarios - because I then stumbled onto this
> exception (perfectly
> correctly - because I needed to reconnect etc)
> signalled in
> MagmaSession>>validateRemoteId   :
>
> "Cannot connect because the repository has been
> replaced."
>
> :)

I never imagined anyone would run into that condition
so soon.  :)  Glad you are putting it through some
good paces.

So I gather you discovered you just need to connect
with a new MagmaSession instance instead of trying to
reuse the old one.
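
For anyone else hitting this, the recovery is just a couple of lines
(a sketch; 'q2-mirror' is a placeholder directory name and I'm
recalling the #openLocal: form from memory):

```smalltalk
"The old session is permanently invalid once the repository files
have been replaced - drop it and open a brand new local session."
oldSession := nil.                              "let it be GC'd"
session := MagmaSession openLocal: 'q2-mirror'.
```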

> Hmmm, let me see now... above you are saying (I
> guess) that only
> monitored MagmaCollections will be reindexed. And
> AFAICT from the code
> the monitored collections are the ones we have
> materialized in the
> session. But above you wrote "All LargeCollections
> should be monitored
> as soon as they're persistent." which seems
> contradictory.

This question is hopefully answered now (above).  All
MagmaCollections in the db are monitored as soon as
you connect because they're part of the meta
RepositoryDef.  All newly created ones since the
connect are monitored as soon as they become
persistent via your commit.  Non-persistent
collections with indices do not suffer from key-change
side-effects.

> PS. Very happy with Magma so far. :) And yes, the
> second demo the other
> day went fine and we have a GO for the project! And
> most likely we will
> open source it too.

Fantastic!  Someday I hope my Java-Oracle cohorts will
at least *listen* to an alternative for five minutes
without smirks and ridicule (about which they know
NOTHING).  In the meantime, we spend hours and
hundreds of e-mails every day toiling over
column-lengths, types, slow-BLOBs and CLOBs,
constraint order, naming-abbreviation "standards", DBA
fights, etc. etc.  Blecch!

 - Chris

Re: Indexing

Göran Krampe
Hi Chris!

Chris Muller <[hidden email]> wrote:
[SNIP]
> So this is another reason to stay with 1.0 for now.  I
> have merged the fixes into my own local 1.1 but not
> planning to commit it to SqueakSource yet until I'm
> done with this iteration.

Ok, yes, I will be sticking to 1.0 until there is a compelling reason to
switch for me - and KryptOn is not AFAICT such a reason in this
particular project.

> > Anyway, I just loaded Magma1.0-cmm.4 and my app
> > still works. :)
> >
> > (Cees and others are now using Monticello
> > Configurations, perhaps that
> > is an option for you too - a config is just a list
> > of specific
> > snapshots)
>
> I have no preference either way other than I really
> don't want to have a SqueakSource server running right
> now just to use MC-Configs..  When they support
> File-based repositories I'll check them out again.

Oh, ok. Didn't know that.
 

> > Ok, so in my app I have MagmaCollections in three
> > different places and
> > given the number of instances of my domain objects
> > at the moment I
> > should have this number of MagmaCollections:
> >
> > (Q2Model allInstances size * 2) + (Q2Process
> > allInstances size) ==> 9
> >
> > Note: Q2Model has two MagmaCollection instvars and
> > Q2Process one.
> >
> > This gives me 9 right now. MagmaCollection
> > allInstances size gives me
> > 378! And MagmaCollectionChanges allInstances size
> > gives 379.
>
> The next time this happens, see how many instances of
> MagmaSession you have.  Remember, they all have their
> own copy of all the MagmaCollections and changes.

Right, I am aware of that.

> There have been intermittent issues with cleanup of
> old sessions over the years, it may be back..  It was
> always related to Block/Method contexts holding old
> Sessions in one of their (temp-var?) references..
> There is a utility method, MagmaSession
> class>>#cleanUp, which enumerates all instances of
> these contexts and does a fine job of getting rid of
> the old ones; print-it to see the before/after instance count.

Good advice! I have been battling to get rid of MagmaSessions quite a
bit, you see. It has seemed quite odd to me, but I will try that.
 
> > Hmmmmm, ok - now I cleaned out my Magma db directory
> > (it had tons of
> > older MagmaCollection files - indexes that is -
> > around 370-ish). Now it
> > looks much better.
>
> Now this confuses me.  "Cleaning up" the directory
> files alone should have no effect on the number of
> instances in the image..  ??

No, I actually toasted the whole dir, recreated the db and indexes and
all.
The problem is probably related to the fact that my "fill the db with
stuff" code also creates the indexes (at the same time as I instantiate
the MagmaCollections) so running that code (reinitializing my domain
model) over and over creates more and more index files. And then - when
I close and reopen the db Magma evidently gets a bit confused - that is
my guess.

> > I still have "twice too many"
> > MagmaCollections in my
> > image though:
> > ...
> > That is the expected
> > number given a single
> > MagmaSession. The second MagmaSession seems to be an
> > extra internal
> > session used by Magma (right?) and perhaps that
> > session is for some
> > reason also materializing the collections - which
> > would explain the
> > double amount (18 instead of 9).
>
> Exactly right.  Magma has a meta-model that is
> maintained via its own transaction mechanism.  The
> meta-model includes such things as the
> class-definitions, the magma-collections and their
> indexes, the code-base for the repository, etc.  See
> MagmaRepositoryDefinition.  It is the root of the
> "meta side".

Aha. Nice. And good to know. :)

> When a new class-definition or large-collection is
> added, the server refreshes its own "internal" session
> because it must know about them to do its work
> properly.
>
> > > No, this does not sound correct.  All
> > LargeCollections
> > > should be monitored as soon as they're persistent.
> >  If
> > > they're not persistent, changing keys doesn't
> > matter.
> > > But again, I'm talking codeless here..
> >
> > So a MagmaSession always "knows" all
> > MagmaCollections in a db,
> > regardless of if they have been navigated and
> > materialized in the
> > session yet?
>
> Since all the MagmaCollections are part of the
> MagmaRepositoryDefinition (the meta root), and this
> definition is faulted down and materialized upon
> connect, the answer is yes, each connected
> MagmaSession always knows all MagmaCollections in a
> db.

Ok. Good. Now I have a much better "picture" of how this works. :)

> > Btw, in my app you can actually have the server
> > "build" a separate Magma
> > db, then download it, unzip and reconnect to it on
> > the clients locally -
>
> Wow, you can tell me more about this?  This is
> obviously part of the "working offline" function,
> right?

Indeed. The master server has code to create a separate Magma db, then
does an intricate veryDeepCopy of the model, excluding various parts
depending on the permissions of the user etc, and stores it in the new
db. The db is then zipped up and served out by KomHttpServer as a single
zip file. Then I use external calls to wget and unzip (because I expect
this db to possibly become quite large) from the client to get it down,
unpack etc.

The neat part is that all this is done behind a Seaside UI so the user
simply logs on, chooses a "mirror", presses "download" and voilà - back
to the login screen, but now the client Seaside app has a partial mirror
of the master server db.

> This might be painful if you are planning to try to
> "merge" the offline work back into the "master" later.

Nope, not at all. :) All changes to the domain model are modelled using
the Command pattern - or as I like to call them "transactions" (not to
be confused with Magma transactions of course).

So all modifications to the model are funneled through the top object,
which in turn creates instances of Q2Txn (with concrete subclasses for
each type of change), calls them to do their work, and then stores them
in a MagmaCollection.

So basically I should be able to nuke the model and rebuild it in full
by simply applying all those Q2Txn instances in sequence. Quite
Prevaylerish in style.
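
In pseudo-Smalltalk the replay is just this (the #applyTo: selector
and the txnLog name are hypothetical stand-ins for my actual
protocol):

```smalltalk
"Rebuild the whole model by replaying every stored transaction
in commit order - Prevayler style."
model := Q2Model new.
txnLog do: [:txn | txn applyTo: model].
```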

Now - this model comes into real play in the offline scenario - a client
simply first downloads all "unknown" Q2Txns, applies them (bringing the
local Magma db up to date), then uploads all local Q2Txns to be applied
at the master server.

I have all this working today - the Q2Txn instances are first
"disconnected" (using UUIDs instead of object refs) from the domain
objects, serialized using ReferenceStream and gzipped, then sent over as
a ByteArray using SOAP (which does base64 encoding I think) and then
rematerialized on the other side, reconnected in the new model and
"applied". Works like a charm.

And since I then have real objects for all operations I can attach
specific conflict-handling code to each kind of transaction object. So a
little bit of manual work - but it pays off. And in other ways too -
like having complete logging and traceability of all changes - by
definition.

> I have planned, for 1.2, an efficient server-to-server
> protocol that will allow large chunks of domains to be
> transported between repositories without having to go
> through the client; and, further, to be able to "sync"
> up with the original repository.  I hope to have this
> done by summer.

Ok, sounds like very useful tech for us - but we can't wait for it. :)
But it might come in handy later on.

Our scenario is first a full download of a partial db done on the LAN
and then regular synchs (sending those Q2Txns back and forth) with quite
small data. And since the Q2Txns are only deltas they turn out very small.

[SNIP]
> So I gather you discovered you just need to connect
> with a new MagmaSession instance instead of trying to
> reuse the old one.

Indeed. No problem.

> > Hmmm, let me see now... above you are saying (I
> > guess) that only
> > monitored MagmaCollections will be reindexed. And
> > AFAICT from the code
> > the monitored collections are the ones we have
> > materialized in the
> > session. But above you wrote "All LargeCollections
> > should be monitored
> > as soon as they're persistent." which seems
> > contradictory.
>
> This question is hopefully answered now (above).  All
> MagmaCollections in the db are monitored as soon as
> you connect because they're part of the meta
> RepositoryDef.  All newly created ones since the
> connect are monitored as soon as they become
> persistent via your commit.  Non-persistent
> collections with indices do not suffer from key-change
> side-effects.

Ok. Got it.
 

> > PS. Very happy with Magma so far. :) And yes, the
> > second demo the other
> > day went fine and we have a GO for the project! And
> > most likely we will
> > open source it too.
>
> Fantastic!  Someday I hope my Java-Oracle cohorts will
> at least *listen* to an alternative for five minutes
> without smirks and ridicule (about which they know
> NOTHING).  In the meantime, we spend hours and
> hundreds of e-mails every day toiling over
> column-lengths, types, slow-BLOBs and CLOBs,
> constraint order, naming-abbreviation "standards", DBA
> fights, etc. etc.  Blecch!

Hehe, yes indeed. A sidenote:

I ran a 2-hour workshop yesterday with 8 other employees at Toolkit
(where I work).
It was a "Shock and Awe"-workshop throwing them right into a stripped
version of my customer app - focusing mainly on Seaside but with Magma
inside too of course.

One of the fun parts is that with the Seaside/Magma integration and my
bits and pieces already in place they never ever saw a single line
related to the db.

One pair of developers added instvars in the domain model, created
objects per user object in the model, yaddayadda - and it "just worked".
Even if they actually know a bit about OODBs I still think they were a
bit mesmerized. I mean - hey, they didn't write a single line of code
for it - not even a "commit".

>  - Chris

regards, Göran