Store blessing comment search

Store blessing comment search

Dave & Vicki Stevenson
In our current project we use a template for the blessing comment when publishing a package version which lists the task ID for the code change, as well as various other information.

Periodically we have to merge all the code for a particular task from a dev branch to the main branch. Especially for tasks completed months earlier and touching many packages, it can be difficult to find all the versions published for a given task in order to ensure everything gets merged properly.

Some time ago in 7.6 I wrote some extensions to Store to allow reading blessing comments for large numbers of pundle versions in a single query, then parsing the comments in Smalltalk looking for references to any of a list of task IDs. I even proposed some changes for Store to speed up finding comments for blessings (see below).

Now we're using 7.9.1 and the Store classes have changed considerably, most notably in that they now use Glorp. I cannot find an easy way to port my 7.6 extensions to Glorp. I could port them using straight SQL rather than Glorp, but before reinventing the wheel I thought I'd ask: has anyone implemented blessing comment search?
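The parsing half of such a search is straightforward once the comments are in hand. A minimal sketch in Python (the task-ID format and sample data here are hypothetical, not our actual template):

```python
import re

def find_versions_for_tasks(comments, task_ids):
    """Given {version_name: blessing_comment}, return the versions whose
    comment mentions any of the given task IDs verbatim."""
    pattern = re.compile("|".join(re.escape(t) for t in task_ids))
    return sorted(v for v, comment in comments.items() if pattern.search(comment))

comments = {
    "MyPackage 42": "Task: TASK-1234\nFixed the frobnicator.",
    "MyPackage 43": "Task: TASK-9999\nUnrelated change.",
    "OtherPkg 7":   "Merged TASK-1234 and TASK-5678 from dev.",
}
print(find_versions_for_tasks(comments, ["TASK-1234"]))
# ['MyPackage 42', 'OtherPkg 7']
```

The hard part, of course, is fetching the comments for large numbers of pundle versions efficiently, which is what the proposal quoted below is about.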

Thanks,

Dave
[hidden email]
On 7/2/2011 11:58 PM, Joachim Geidel wrote:
On 03.07.11 00:37, "Samuel S. Shuster" <[hidden email]> wrote:
-------- Original Message --------
Subject: [vw-dev] slow blob retrieval in Store
Date: Fri, 1 Jul 2011 16:32:16 -0700 (PDT)
From: Dave Stevenson [hidden email]
To: vw-dev [hidden email]


Our Store repository has some package versions with very long blessing comments, thanks to a comment template we employ that references the task that authorized the code change and the modules and classes affected, combined with the merge tool's affinity for appending comments upon comments upon comments. Since Store breaks these large strings into chunks which are later retrieved one chunk per query, merely selecting a version in the published items tool can result in delays of minutes before the comment actually displays. One such version in our repository requires 315 queries just to display the comment. Very annoying.

Chunks of a blob are stored as a singly linked list in the tw_blob table. The table has 3 columns:
    primarykey
    blobtype
    blobdata

blobtype has a handful of normal values, but if the value is negative, it is a 'fakeType', which is really the negated primary key of the nextChunk, for blobs too large to store in a single record. If I store a blob so large that it must be split into 3 chunks, the records might look like:

    key    type    data
    999    -998    'chunk1...'
    998    -997    'chunk2...'
    997       2    'chunk3...'

This blob is referenced by the primary key of its first chunk ("Store.Blob aRecordWithID: 999"). Blob class>>retrieveSources:session: reads record 999, then loops to get the next chunk until it reaches 997, which has a real type, not a fakeType. Then it assembles the chunks into a single object referencing primaryKey 999.
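In other words (a Python simulation of the walk, where each dictionary lookup stands in for one database round trip; this is not the actual Blob class code):

```python
TW_BLOB = {  # primarykey -> (blobtype, blobdata), mirroring the example records
    999: (-998, "chunk1..."),
    998: (-997, "chunk2..."),
    997: (2,    "chunk3..."),
}

def retrieve_blob(first_key):
    """Follow the fakeType links one record at a time, as Store does.
    Returns the assembled data and the number of round trips it cost."""
    queries = 0
    chunks = []
    key = first_key
    while True:
        blobtype, blobdata = TW_BLOB[key]   # one query per chunk
        queries += 1
        chunks.append(blobdata)
        if blobtype >= 0:                   # real type: last chunk reached
            break
        key = -blobtype                     # fakeType: negated key of next chunk
    return "".join(chunks), queries

data, queries = retrieve_blob(999)
print(data, queries)  # chunk1...chunk2...chunk3... 3
```

A 315-chunk comment therefore costs 315 sequential round trips, which is where the minutes-long delays come from.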

For some time now Common Table Expressions have been available, which could answer all the chunks for one or more such linked lists in a single query. For example, the following query answers all the chunks for 3 blobs whose primarykeys are 999, 16876 and 17:

    WITH RECURSIVE selectedBlobs(primarykey, blobtype, blobdata) AS (
            SELECT primarykey, blobtype, blobdata
                FROM tw_blob
                WHERE primarykey IN (999, 16876, 17)
        UNION ALL
            SELECT newChunk.primarykey, newChunk.blobtype, newChunk.blobdata
            FROM selectedBlobs selected, tw_blob newChunk
            WHERE selected.blobtype < 0
                AND -selected.blobtype = newChunk.primarykey)
    SELECT * FROM selectedBlobs
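For anyone who wants to try it, essentially the same query runs against SQLite, which supports WITH RECURSIVE. Here it is exercised from Python against the example records plus one single-chunk blob (table contents are the illustrative data, not a real repository):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tw_blob "
            "(primarykey INTEGER PRIMARY KEY, blobtype INTEGER, blobdata TEXT)")
con.executemany("INSERT INTO tw_blob VALUES (?, ?, ?)", [
    (999, -998, "chunk1..."),
    (998, -997, "chunk2..."),
    (997, 2,    "chunk3..."),
    (17,  1,    "single-chunk blob"),
])

# One round trip fetches every chunk of every requested blob.
rows = con.execute("""
    WITH RECURSIVE selectedBlobs(primarykey, blobtype, blobdata) AS (
            SELECT primarykey, blobtype, blobdata
                FROM tw_blob
                WHERE primarykey IN (999, 17)
        UNION ALL
            SELECT newChunk.primarykey, newChunk.blobtype, newChunk.blobdata
            FROM selectedBlobs selected, tw_blob newChunk
            WHERE selected.blobtype < 0
                AND -selected.blobtype = newChunk.primarykey)
    SELECT primarykey, blobdata FROM selectedBlobs
""").fetchall()
print(sorted(rows))
```

The recursion follows each fakeType link on the server, so all four chunk rows come back from a single query.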

The above query speeds up retrieval of long linked lists because the looping is done on the server instead of the client, eliminating a network round trip per chunk. But it could slow things down slightly overall, because the common case of a single chunk per blob would incur more computational overhead on the server.

I think a better solution would be to add a column 'firstchunk':
    key    type    first    data
    999    -998    999      'chunk1...'
    998    -997    999      'chunk2...'
    997       2    999      'chunk3...'

Now a much simpler and faster query (because no looping is required on the client or the server) can answer all the chunks at once:
    SELECT primarykey, blobtype, blobdata
    FROM tw_blob
    WHERE firstchunk IN (999, 16876, 17)

The chunks answered by the above query would still be assembled into their 3 blobs in Smalltalk, but speed should be vastly improved for the slow cases, and equivalent for the simple cases. Publishing would require obtaining the primary key of the first chunk up front, so that it can be referenced by that chunk and by every other chunk of the blob, but I can't imagine that would produce a significant slowdown. Perhaps best of all, we could retrieve not only all the chunks for one blob at once, but also the chunks for multiple blobs at once, perhaps all the blobs associated with an entire class or package. The network latency saved per chunk would then be saved again for each additional blob.
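To make the proposal concrete, here is a sketch against SQLite; the schema, key values, and client-side assembly are illustrative only, not the actual Store code:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tw_blob
    (primarykey INTEGER PRIMARY KEY, blobtype INTEGER,
     firstchunk INTEGER, blobdata TEXT)""")
con.executemany("INSERT INTO tw_blob VALUES (?, ?, ?, ?)", [
    (999, -998, 999, "chunk1..."),
    (998, -997, 999, "chunk2..."),
    (997, 2,    999, "chunk3..."),
    (17,  1,    17,  "one-chunk blob"),
])

# One flat, non-recursive query fetches every chunk of every requested blob.
rows = con.execute("""
    SELECT firstchunk, primarykey, blobtype, blobdata FROM tw_blob
    WHERE firstchunk IN (999, 17)""").fetchall()

# Reassemble on the client by walking each chain in memory (no more queries).
by_key = {pk: (bt, fc, data) for fc, pk, bt, data in rows}

def assemble(first):
    parts, key = [], first
    while True:
        bt, _, data = by_key[key]
        parts.append(data)
        if bt >= 0:         # real type: last chunk of this blob
            break
        key = -bt           # fakeType link, followed in memory
    return "".join(parts)

blobs = {fc: assemble(fc) for fc in {r[0] for r in rows}}
print(blobs)
```

The linked-list walk still happens, but only over an in-memory dictionary, so the per-chunk network latency disappears entirely.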

Thoughts?   
 
Dave Stevenson
[hidden email]

Great idea!

The outstanding issues are of course not trivial:

For those (and you) who use the Old Store Database Objects, adding a column to a table is nasty, and can blow up older code that does NOT know about it. So, unless we/Cincom want to risk breaking backward compatibility, this is not something we would typically do.

That said, adding a "TW_BlobChunk" table (and "TW_BinaryBlobChunk") would work, although you are then adding a bit of overhead. Since old code knows nothing about the new table, it would simply still work even if the table existed. The new table would have two columns, chunkKey and mainKey (or something like that):

chunkKey    mainKey
999         999
997         999
998         999

(Note: I made the data not sequential to illustrate that it wouldn't matter)

Note: the Old Store Database Objects are, as of 7.8, obsolete, and engineering does not do backward porting of OSDO to new schemas.

                                And So It Goes
                                     Sames
______________________________________________________________________

Samuel S. Shuster [|]
VisualWorks Engineering, Store Project
Smalltalk Enables Success -- What Are YOU Using?


SELECT statements are not impacted by additional columns if they enumerate
all columns explicitly. I don't know if OSDO queries use "SELECT *
FROM ...". If they do, this could still be solved in two ways:

- Rename the tw_blob table, add the new column, and create a writable view
called tw_blob which has the same list of columns as the current tw_blob
table. I don't know if that would work in all databases which can be used
for Store repositories. There are databases which don't support writing
into views. In the case of SQLite, this can be solved by installing
appropriate "INSTEAD OF" triggers.

- Provide a patch for OSDO which replaces "SELECT * FROM tw_blob" with
"SELECT primarykey, blobtype, blobdata FROM tw_blob". That shouldn't be too
complicated, and a simple Smalltalk source code file for people who need
it would be sufficient.
 

AFAICT, there are several other issues:

1) Older versions of Store don't write the information needed, neither in
a new column in tw_blob nor in a new table. In this respect, there's no
difference between the two solutions. This would be a problem for people
who use an older version of VisualWorks to publish code after the
migration of the Store database - which would probably be the majority of
commercial customers. It would also be relevant for Cincom's public
repository. 

I think that this can be solved: When reading data from tw_blob with a new
version of Store, try the new query first, and if it does not find data
for a "firstchunk" key, fall back to the old way of searching. The
fallback strategy would only be needed for data which has been published
with an old version of Store after migrating the database, i.e. for a
small fraction of the data. Of course, I assume that the new column or
table is filled during database migration.

A bonus would be an SQL script which adds the missing information, and
which could be installed as a periodic database job or as part of an
"after update" or "after insert" trigger in database servers which support
this, or run manually from time to time on databases which don't support
jobs or triggers.
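One possible shape for such a backfill uses a recursive CTE to find each chain's head and stamp its key onto every chunk. Shown here against SQLite via Python; the exact syntax will vary by database server, and the column name 'firstchunk' is taken from Dave's proposal:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tw_blob
    (primarykey INTEGER PRIMARY KEY, blobtype INTEGER,
     firstchunk INTEGER, blobdata TEXT)""")
# Rows published by an old Store version: firstchunk is still NULL.
con.executemany("INSERT INTO tw_blob VALUES (?, ?, NULL, ?)", [
    (999, -998, "chunk1..."),
    (998, -997, "chunk2..."),
    (997, 2,    "chunk3..."),
])

con.executescript("""
    -- A chain head is any chunk no other record points to via a fakeType.
    -- Walk each chain from its head, carrying the head's key along.
    WITH RECURSIVE chains(chunk, first) AS (
            SELECT primarykey, primarykey FROM tw_blob
            WHERE primarykey NOT IN
                (SELECT -blobtype FROM tw_blob WHERE blobtype < 0)
        UNION ALL
            SELECT -b.blobtype, c.first
            FROM chains c JOIN tw_blob b ON b.primarykey = c.chunk
            WHERE b.blobtype < 0)
    UPDATE tw_blob SET firstchunk =
        (SELECT first FROM chains WHERE chunk = tw_blob.primarykey)
    WHERE firstchunk IS NULL;
""")
print(con.execute(
    "SELECT primarykey, firstchunk FROM tw_blob ORDER BY primarykey").fetchall())
```

Restricting the UPDATE to `firstchunk IS NULL` is what would make it safe to rerun periodically as a catch-up job.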

2) The new version of Store must be able to tell whether it is accessing a
new or an old Store database, and generate different SQL statements when
reading and writing tw_blob data. I don't know how difficult this would
be, or if it should be supported at all.

3) Store replication would have to cope with replication from an old
schema to a new one and vice versa, and the solution should be robust. It
must not lead to problems when someone replicates data from an old
repository to a new one using an old version of the replicator.

I prefer Dave's idea of adding a column in tw_blob instead of adding a new
table. An additional table also needs an index and a foreign key
constraint with cascading delete option, and additional code in the image
to ensure deletion in databases which don't support cascading delete (I
don't know if this is actually an issue). Also, Store replication might be
easier with just an additional column.


Best regards
Joachim Geidel





_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Re: Store blessing comment search

Terry Raymond

I suspect that there are several organizations that would like to add attributes to packages and be able to search for them.

Has Cincom ever considered adding additional fields to Package, and providing a way to populate them on the publish dialog and use them as a version filter?

Terry

===========================================================

Terry Raymond

Crafted Smalltalk

80 Lazywood Ln.

Tiverton, RI  02878

(401) 624-4517      [hidden email]

===========================================================

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Dave
Sent: Wednesday, February 20, 2013 11:31 AM
To: VWNC
Subject: [vwnc] Store blessing comment search

 

Re: Store blessing comment search

Steven Kelly

Presumably Terry means “attributes for Package Versions” – packages already have attributes. Or maybe “attributes for pundle versions and/or blessings” would be more accurate. In any case, the ability to add attributes to pundle versions/blessings would be most useful. Like Dave, we too use a template for version/blessing comments, and having those ‘fields’ as real widgets would make integrating with bug management systems and producing release notes easier.

 

I wonder also whether the division of labor between version and blessing could be improved. In particular having the comments split over several blessings doesn’t always feel right, and it might be better to be able to have a comment on the actual version (editable subsequently when blessing). The blessing dialog would thus open with the existing comment in it, and that could be left alone (e.g. if just bumping up the blessing level to Released), added to (e.g. if you realise you forgot to mention something), or corrected. There could also be a separate blessing comment, maybe mostly blank but there for adding info where necessary (e.g. if bumping up to Released, to say which product version this was released in).

 

All the best,

Steve

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Terry Raymond
Sent: Wednesday, February 20, 2013 8:34 PM
To: 'VWNC'
Subject: Re: [vwnc] Store blessing comment search

 
