[vwnc] Querying Store for historical dependents

Reinout Heeck-2
I am in dire need of a 'historical dependents' query tool for Store.

I whipped up something, but the query is far too slow to be usable.
Can anybody suggest how to improve the following (rather naive) code?


computePackages
   
    allPundles := ((((PropertyRecord
        allVersionsWithName: 'developmentPrerequisites'
        newerThan: (PropertyRecord new timeStamp: 0))
        select:
            [:record |
            record definition
                anySatisfy: [:arr | arr class = Array and: [(arr at: 2) = packageName]]])
        collectAll: [:record | record containingPackages])
        reject: [:ea | ea isNil]) asList.
        ...



Regarding the cruftiness above: our Store DB is ancient and contains data
from somewhat broken Store versions, so:

--the 'newerThan:' keyword is there because some of our PropertyRecords
contain a nil timeStamp field, which breaks the sorting.
  Skipping all versions without a timestamp was acceptable for the job
at hand today; however, more complete solutions would be appreciated.

--the check for Array class is there because some of these records
return a Text as their definition. I am skipping them for now...

--the isNil test is there because (apparently) we have some properties
without an enclosing package. I don't know whether #containingPackages
should ever return nils or whether that indicates our DB is too corrupt.
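A more tolerant variant of the filter above could sort records with a missing timestamp to the front instead of skipping them, and drop the nil packages afterwards. The following is a minimal Python sketch of that logic only, not Store code: `Record`, `by_timestamp` and `dependent_packages` are hypothetical names, records are assumed to be plain objects, and each array-style definition is assumed to hold the package name in its second slot, as the `(arr at: 2)` test implies.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Record:
    """Hypothetical stand-in for a Store PropertyRecord (not the real class)."""
    timestamp: Optional[int]        # None models the corrupt nil timeStamp rows
    definition: Union[list, str]    # list of (type, name) pairs, or an old Text blob
    packages: List[Optional[str]] = field(default_factory=list)


def by_timestamp(record):
    # (False, 0) for missing timestamps: they sort first instead of raising
    return (record.timestamp is not None, record.timestamp or 0)


def dependent_packages(records, package_name):
    result = []
    for record in sorted(records, key=by_timestamp):
        definition = record.definition
        if not isinstance(definition, list):
            continue  # Text definitions are still skipped, as in the original
        # Second slot (index 1) is assumed to hold the prerequisite name
        if any(len(entry) > 1 and entry[1] == package_name for entry in definition):
            # Drop the nil containing packages found in old databases
            result.extend(p for p in record.packages if p is not None)
    return result
```

Sorting on a two-part key means a missing timestamp is never compared with a real one, so no versions have to be thrown away just to keep the sort working.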



I also need a 'historical senders of...' query, but have no idea where
to start with that one.
Any suggestions?



TIA,

Reinout
-------




_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: [vwnc] Querying Store for historical dependents

Alan Knight-2
Use StoreGlorp and try the following. It could probably be made faster by not actually reading the properties and just doing a subselect, but that would be more work. It might give false positives if something matches the package name (e.g. the package is named 'A') and conceivably false negatives if the list of development prerequisites is huge (>255 characters so it doesn't fit in the searchString).  I think the false negatives are unlikely and it's probably fast enough to sort through the false positives in-memory.

packageName := 'StoreForGlorpVWUI'.
properties := session read: StoreProperty where: [:each |
   each timestamp notNil &
  (each name = 'developmentPrerequisites') &
   (each searchString like: ('%', packageName, '%'))].
propertyIds := properties collect: #primaryKey.
session read: StorePackage where: [:each |
        each propertiesRecordDictionary anySatisfy: [:eachProperty |
                eachProperty primaryKey in: propertyIds]].

9 seconds against my local postgres, 2 against our remote Oracle.
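The two failure modes described above (false positives for short names, false negatives once the list outgrows the column) can be illustrated with a small Python model of a LIKE '%name%' test against a truncated column. This is only a sketch: the 255-character limit comes from the thread, but the space-separated layout of the searchString and the helper names `make_search_string` and `like_contains` are assumptions, not Store's actual format.

```python
SEARCH_STRING_LIMIT = 255  # searchString capacity mentioned in the thread


def make_search_string(prereq_names, limit=SEARCH_STRING_LIMIT):
    # Hypothetical layout: names joined by spaces, then cut off at the column size
    return " ".join(prereq_names)[:limit]


def like_contains(search_string, package_name):
    # Same idea as SQL: searchString LIKE '%' || packageName || '%'
    # (a case-sensitive substring test; LIKE wildcards inside the name are ignored)
    return package_name in search_string
```

For example, a package named 'A' matches any searchString containing a capital A, while a prerequisite listed after roughly 255 characters of other names is simply not there to be found.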

(In reply to Reinout Heeck's message of 11:27 AM, 3/24/2009.)


--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk


Re: [vwnc] Querying Store for historical dependents

Reinout Heeck-2
Alan Knight wrote:
> Use StoreGlorp and try the following. It could probably be made faster by not actually reading the properties and just doing a subselect, but that would be more work. It might give false positives if something matches the package name (e.g. the package is named 'A') and conceivably false negatives if the list of development prerequisites is huge (>255 characters so it doesn't fit in the searchString).  I think the false negatives are unlikely and it's probably fast enough to sort through the false positives in-memory.
>  
We had a look at using searchStrings, but they don't even come close:

As it happens, quite a few of our prereq lists range from large (what you
call huge) to very large (because we use lineUps), and a handful are huge
(several hundred entries).

In our case this means that the searchString contains relatively few
entries.
Moreover, we publish our prereqs sorted in the ideal load order, so the
searchString tends to contain the same base packages again and again,
but not the more interesting dependent packages.
These problems are exacerbated in the case of /development/
prerequisites because the searchString has multiple occurrences of the
component type string 'package', which take up quite a bit of those 255
bytes.

So (as far as I can judge) we are pretty much condemned to using the
actual property definitions instead of the searchStrings 'cache' :-(



FYI, here's a histogram:

bag := Bag new.
(PropertyRecord
    allVersionsWithName: 'developmentPrerequisites'
    newerThan: (PropertyRecord new timeStamp: 0))
do: [:record |
    record definition class = Array ifTrue: [bag add: record definition size]].

bag contents associations asSortedCollection asArray collect: [:ass |
    Array with: ass key with: ass value]
--->
#(#(1 889) #(2 837) #(3 674) #(4 522) #(5 379) #(6 381) #(7 356) #(8
273) #(9 212) #(10 158) #(11 158) #(12 144) #(13 121) #(14 96) #(15 75)
#(16 68) #(17 61) #(18 40) #(19 49) #(20 39) #(21 26) #(22 16) #(23 8)
#(24 8) #(25 10) #(26 12) #(27 13) #(28 11) #(29 12) #(30 7) #(31 2)
#(32 4) #(33 7) #(34 4) #(35 4) #(36 10) #(37 7) #(38 3) #(39 7) #(40 6)
#(41 8) #(42 6) #(43 5) #(44 5) #(45 4) #(46 4) #(47 2) #(48 3) #(49 2)
#(50 11) #(51 2) #(52 2) #(53 3) #(54 6) #(55 6) #(56 4) #(57 2) #(58 3)
#(61 10) #(62 4) #(63 3) #(64 7) #(65 6) #(68 1) #(70 1) #(74 1) #(94 2)
#(154 18) #(161 1) #(162 4) #(163 1) #(166 1) #(167 2) #(171 2) #(176 4)
#(177 2) #(188 3) #(199 1) #(203 1) #(241 1) #(286 2) #(292 1) #(302 12)
#(306 3) #(307 19) #(308 17) #(309 14) #(310 1) #(311 58) #(313 2) #(314
3) #(317 3) #(318 6) #(325 1) #(327 4) #(342 1) #(343 2) #(355 1) #(485
1) #(486 2) #(569 1) #(570 1) #(784 1) #(786 2) #(787 1) #(931 2) #(932
2) #(937 1))

With our package naming conventions we get about seven prereqs listed in
the searchString field, hence using searchStrings won't fly :-(
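The "about seven" figure can be reasoned about as simple capacity arithmetic: every entry spends bytes on the component type string as well as on the name. The sketch below is Python and assumes a hypothetical '<type> <name>' serialization with single-space separators; the real searchString layout and the actual name lengths are not given in the thread, and `entries_that_fit` is an illustrative name. Under those assumptions, 25-character names cost 33 characters per entry plus a separator, so only 7 whole entries fit in 255.

```python
SEARCH_STRING_LIMIT = 255  # searchString capacity mentioned in the thread


def entries_that_fit(names, type_tag="package", limit=SEARCH_STRING_LIMIT):
    """Return the prerequisite names whose complete entry fits in the searchString.

    Hypothetical serialization: each entry is '<type_tag> <name>' and entries
    are separated by a single space. Store's real layout may well differ.
    """
    used = 0
    fitted = []
    for name in names:
        entry = f"{type_tag} {name}"
        needed = len(entry) + (1 if fitted else 0)  # separator before all but the first
        if used + needed > limit:
            break  # everything after the cut-off is invisible to a LIKE search
        used += needed
        fitted.append(name)
    return fitted
```

Because the list is published in load order, the entries that survive are the base packages at the front, which is the effect described above.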



> packageName := 'StoreForGlorpVWUI'.
> properties := session read: StoreProperty where: [:each |
>    each timestamp notNil &
>   (each name = 'developmentPrerequisites') &
>    (each searchString like: ('%', packageName, '%'))].
> propertyIds := properties collect: #primaryKey.
> session read: StorePackage where: [:each |
>         each propertiesRecordDictionary anySatisfy: [:eachProperty |
>                 eachProperty primaryKey in: propertyIds]].
>
> 9 seconds against my local postgres, 2 against our remote Oracle.
>  
I wish... :-)
I'm looking at 50 seconds query time with my scheme...

The results are awesome though: this tool really shows our prereq
structure and how its use changed over time; the code is 'talking' to
me again :-).

(In the abstract we all know this already: imagine programming for a
day without using 'senders of', which is somewhat analogous in use to
the above query.)




Thanks a lot!

Reinout
-------


Re: [vwnc] Querying Store for historical dependents

Alan Knight-2
OK, then it really isn't very searchable. The actual property values are stored in blobs, which are either entirely unsearchable (e.g. on Oracle where they're in long raw form) or not easy to search. Don't get me started on the tw_blob table.

I think your best bet is probably just to retrieve all of the development prerequisite property records and either store the information in a separate table which is actually searchable, or just keep them in memory.  So, for example, the rather horribly messy

packageName := 'StoreForGlorpVWUI'.
q := Query read: StoreProperty where: [:each | each name = 'developmentPrerequisites'].
q retrieve: [:each | each primaryKey].
q retrieve: [:each | each definition].
propertiesWithIds := session execute: q.
prereqs := propertiesWithIds select: [:each |
   myPrereqs := each last object.
   (myPrereqs isString | (myPrereqs class == Core.Text)) ifTrue: [
         '*', packageName, '*' match: myPrereqs]
     ifFalse: [myPrereqs anySatisfy: [:eachArray | eachArray includes: packageName]]].
prereqKeys := prereqs collect: [:each | each first].
packages := session read: StorePackage where: [:each | each propertiesRecordDictionary  anySatisfy: [:eachProperty |
                eachProperty primaryKey in: prereqKeys]].

You might be able to get away without some of the rather ugly "is this a string" logic if your database hasn't got as many interesting old bits of junk in it as mine.
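The in-memory filtering step above boils down to one predicate applied to (primary key, definition) pairs. As a hedged illustration outside Smalltalk, here it is in Python; `mentions` and `matching_keys` are hypothetical names, and the substring test only approximates `'*name*' match:` (VisualWorks wildcard and case handling are not reproduced).

```python
def mentions(definition, package_name):
    """Mirror of the select: block: Text-style vs. array-style definitions."""
    if isinstance(definition, str):
        # Old Text/String definitions: plain substring test, roughly '*name*' match:
        return package_name in definition
    # Array-of-arrays definitions: does any inner entry include the name?
    return any(package_name in entry for entry in definition)


def matching_keys(rows, package_name):
    """rows are (primary_key, definition) pairs, like the two retrieve: clauses."""
    return [key for key, definition in rows if mentions(definition, package_name)]
```

The Text branch can also match names that merely contain the package name, so it trades a few false positives for not losing the old records entirely.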

Also, if your goal is to find all the prereqs for lots of things you could do a bulk query that did more than one at a time.
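Both suggestions, a separate searchable table and a bulk query covering several packages, can be sketched together. The following Python example uses an in-memory SQLite database as a stand-in for such a table; the table and column names (`prereq`, `property_id`, `prereq_name`) and the function names are hypothetical, only array-style definitions are indexed, and the name is again assumed to sit in the second slot of each entry.

```python
import sqlite3


def build_prereq_table(rows):
    """rows: (property_id, definition) pairs; only list-style definitions are indexed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prereq (property_id INTEGER, prereq_name TEXT)")
    conn.execute("CREATE INDEX prereq_by_name ON prereq (prereq_name)")
    for property_id, definition in rows:
        if isinstance(definition, list):
            # Hypothetical entry shape: (component_type, name, ...), name at index 1
            conn.executemany(
                "INSERT INTO prereq (property_id, prereq_name) VALUES (?, ?)",
                [(property_id, entry[1]) for entry in definition if len(entry) > 1],
            )
    conn.commit()
    return conn


def dependents_of(conn, package_names):
    """Bulk lookup: a single query answers several package names at once."""
    names = list(package_names)
    if not names:
        return {}
    placeholders = ",".join("?" for _ in names)
    cursor = conn.execute(
        "SELECT prereq_name, property_id FROM prereq "
        f"WHERE prereq_name IN ({placeholders}) "
        "ORDER BY prereq_name, property_id",
        names,
    )
    result = {name: [] for name in names}
    for name, property_id in cursor:
        result[name].append(property_id)
    return result
```

One pass over the retrieved property records fills the table; after that, a single indexed query answers any number of package names, and the resulting property ids could be fed to the `StorePackage` query in the same way as `prereqKeys`.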

(In reply to Reinout Heeck's message of 01:13 PM, 3/24/2009.)



Re: [vwnc] Querying Store for historical dependents

Reinout Heeck-2
Alan Knight wrote:

>
> I think your best bet is probably just to retrieve all of the
> development prerequisite property records and either store the
> information in a separate table which is actually searchable, or just
> keep them in memory.

I was thinking along those lines too.

However, the code you suggested runs an order of magnitude faster than
mine. That suffices, so for the sake of simplicity I'll stick with it
for the moment.



Thanks a million!
I'm really happy with this little tool now :-)

R
-






Re: [vwnc] Querying Store for historical dependents

Reinout Heeck-2

I've made it available in the public repository as the package
SpsDependentPundlesTool.

People who need to manage dependencies across multiple images (or
multiple product versions) might find it useful.



Cheers,

Reinout
-------
