Optimizing Speed of Fetching Source Code


Optimizing Speed of Fetching Source Code

Runar Jordahl
For various internal tools we access source code of methods. These
tools spend 90% - 95% of their time getting the source code strings,
then 5% - 10% of total time processing them. In our image, getting the
source code for all methods takes around 16 seconds. This only covers
fetching source code, not getting the methods.

To see where the time is spent, try evaluating the following snippet:
TimeProfiler profile: [
    (Store.Registry bundleNamed: 'Base VisualWorks') withAllContainedItems
        do: [:pundle |
            pundle isPackage
                ifTrue: [pundle methods do: [:method | method sourceCode]]]]

75.4 [] in Store.MethodDescriptor>>sourceCode
  69.1 Behavior>>sourceCodeAt:
    69.1 Behavior>>sourceCodeForMethod:at:
    (...)
      69.1 BlockClosure>>on:do:
        69.1 [] in XMLSourceFileFormat>>methodSourceAt:in:
          69.1 XML.XMLParser>>parseElement:
            43.7 XML.XMLParser>>on:
            (...)
              43.7 LogicalFilename>>asFilename
                43.7 CharacterArray>>asFilename
                  43.7 Filename class>>named:
                    43.7 PCFilename class>>createInstanceNamed:
                      25.3 NTFSFilename class>>canonicalize:forFileSystemAttributes:
                        25.3 PCFilename class>>canonicalize:forFileSystemAttributes:
                          22.7 OrderedCollection>>do:
                            22.7 [] in PCFilename class>>canonicalize:forFileSystemAttributes:
                              22.7 Filename class>>chop:to:
                                22.7 PCFilename class>>isBadCharacter:

Please note that we do not have any problems with the current
performance; it would just be nice if the speed could be improved
without too much work.

We are now starting to cache source code in the development image. For
our project, with 6000 classes, Seaside and VisualWorks Base, around
100 MB is used by the cache. This speeds up our internal tools that
work on source code.
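The shape of such a cache can be sketched in a workspace (illustrative names only; a production cache would also need invalidation when methods are recompiled):

```smalltalk
"Minimal sketch of an image-side source cache; names are illustrative.
 A real cache must be invalidated when a method is recompiled."
| sourceCache cachedSourceFor |
sourceCache := IdentityDictionary new.
cachedSourceFor := [:method |
    sourceCache at: method ifAbsentPut: [method sourceCode]].
```

The first send per method pays the file-access cost; subsequent sends answer the memoized string.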

I noticed that performance does not improve a lot when upgrading to a
solid-state drive. I assume most source files get cached by the
operating system in RAM.


Kind regards
Runar Jordahl
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: Optimizing Speed of Fetching Source Code

Michael Lucas-Smith
We have a couple of ARs related to this. ObjectStudio even caches filenames -- which could cause problems in theory but in practice hasn't seemed to. There are other options being tossed around in different ARs too. No one solution has emerged as the best answer just yet, but we are looking at it.
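As a sketch, such a filename cache is just a memoization of #asFilename (illustrative names; not the ObjectStudio code). The potential problem is staleness: a cached Filename can outlive changes to the underlying file system.

```smalltalk
"Sketch of a filename cache: memoize Filename instances by string.
 Illustrative only; cached entries can go stale if the file system
 changes underneath them."
| filenameCache cachedAsFilename |
filenameCache := Dictionary new.
cachedAsFilename := [:aString |
    filenameCache at: aString ifAbsentPut: [aString asFilename]].
```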

Cheers,
Michael

On Nov 4, 2011, at 1:07 PM, Runar Jordahl wrote:


Re: Optimizing Speed of Fetching Source Code

Steven Kelly
Looks like over 40% of the time is spent forming filenames from strings. We've also noticed this performance problem, in our case forming filenames for database files. Given that the filenames are already in the correct format, and don't need anything like #{VISUALWORKS} to be converted, it does seem a little odd that VW is so slow.
 
20% of the time is spent building a new encoded stream for every individual character in the filename, to see if it #isBadCharacter:. Surely #chop:to: could make one encoded stream for the whole filename and use that same stream to test all the characters? Or simply optimize the UTF16 case?
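In sketch form, the suggestion is to hoist the stream construction out of the per-character loop (hypothetical helper selectors; the real #chop:to: and encoding machinery differ):

```smalltalk
"Hypothetical sketch: build one encoded stream for the whole filename
 and reuse it for every character test, instead of constructing a new
 stream per character. The helper selectors here are illustrative."
chop: aString to: aLength
    | encoder kept |
    encoder := self encodedStreamForFilenames.   "built once"
    kept := aString reject: [:each | self isBadCharacter: each using: encoder].
    ^kept copyFrom: 1 to: (aLength min: kept size)
```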
 
If we do that, 50% of the time for building a filename is in the primitive #getFileSystemAttributes:. I imagine this could be cached quite easily, based on a simple comparison of the first component of the string. We could clear the cache when the OS announces a new disk: http://msdn.microsoft.com/en-us/library/aa363215.aspx
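Sketched with illustrative names (the flush would hang off the device-change notification in the MSDN link, and the names do not match the actual VW primitive interface):

```smalltalk
"Sketch: cache file-system attributes per volume (the first path
 component), flushing when the OS reports a device change. Names are
 illustrative, not the actual VisualWorks primitive interface."
| attributesCache attributesForVolume flushAttributesCache |
attributesCache := Dictionary new.
attributesForVolume := [:volumeString |
    attributesCache
        at: volumeString asUppercase
        ifAbsentPut: [Filename getFileSystemAttributes: volumeString]].
flushAttributesCache := [attributesCache := Dictionary new].
```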
 
Steve


Re: Optimizing Speed of Fetching Source Code

Holger Guhl
I have observed this during development and support work (AR 60922) and created a couple of solutions, one of them Res98610. With that solution the need to create filenames drops to zero.
Source code retrieval starts at some point with a file, but for accessing the XML-formatted text the file is wrapped as an InputSource. For some purposes it seemed necessary to keep the filename (although the most recent method comments negate that). The filename access seems cheap, and the final implementation in FileConnection>>name (^... fileName asFilename asString) seems harmless. But in this concrete scenario the fileName is a PortableFilename, whose #asFilename is implemented as "self asResolvedString asFilename". The "funny" thing is that the new Filename is converted to a String and never used thereafter.
The Resolution knows that and discards the filename access, so caching filenames is not necessary at all.
The second part of the solution avoids a duplicate file access during class comment retrieval. This is more or less to make the solution complete, since class comment access is less frequent. Nevertheless, source code analysis tools may benefit from the optimization.
The Resolution has not yet been integrated; I guess there is a ton of more pressing things to do. So far I cannot tell about negative side effects. Maybe you want to give it a try...
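As described, the core of the change amounts to short-circuiting the useless round trip in FileConnection>>name (a sketch only; consult Res98610 itself for the exact code):

```smalltalk
"Sketch of the described fix: answer the resolved string directly,
 instead of building a Filename that is immediately converted back
 to a String and discarded."
name
    ^fileName asResolvedString
```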

On 04.11.2011 23:15, Steven Kelly wrote:
Holger Guhl
-- 
Senior Consultant * Certified Scrum Master * [hidden email]
Tel: +49 231 9 75 99 21 * Fax: +49 231 9 75 99 20
Georg Heeg eK Dortmund
Handelsregister: Amtsgericht Dortmund  A 12812


Attachment: Res98610.zip (10K)

Re: Optimizing Speed of Fetching Source Code

Runar Jordahl
I can confirm that Res98610 reduces the time spent in our tools by
around 50%. I note that the resolution is for 7.7, not the 7.8 we use.

It would be great if Cincom could integrate this fix. I am reluctant
to integrate it myself in our project.
Kind regards
Runar

Re: Optimizing Speed of Fetching Source Code

Steven Kelly

That's great, and very useful to us as developers. However, it achieves its speed-up by avoiding a couple of calls to #asFilename, rather than by improving the speed of #asFilename itself. It is thus only useful during development, and not to end users of VW applications. It would be great if the low-hanging fruit in #getFileSystemAttributes: and encoded stream use could be harvested to improve overall performance, or at least performance in the commonest case: Windows and #UTF16.

 

Steve

 


Re: Optimizing Speed of Fetching Source Code

Dave Stevenson
I don't pretend to understand the nuances of either proposed solution, so I'll just ask: do they properly handle links?
 
On Windows I've created links to directories so that I can access them as a subfolder of one drive when they physically reside on another. I don't know whether Windows allows linking between partitions with different file systems (NTFS/FAT32, etc.), but if it does, then any solution should take this into account. I would think the same consideration applies on unixy systems.
 
And are there any repercussions for remote references:
    \\myServer\someFolder\someFile.txt
 
or remote folders mapped as a local drive letter?
 
Dave Stevenson
[hidden email]




Re: Optimizing Speed of Fetching Source Code

Steven Kelly

AFAICS there is no longer any difference in VW between FATFilename and NTFSFilename, and similarly when reading the file system attributes of such a file. If what you are doing with links works now, it would work with caching.

 

Anything to do with networks has so many possible holes that there will obviously be a chance of repercussions. Then again, the gain from caching is also higher there. SMB in Vista and Windows 7 is broken in a variety of new and exciting ways, precisely because Microsoft has tried to implement more aggressive caching (see e.g. http://fox.wikis.com/wc.dll?Wiki~OpportunisticLocking). If Microsoft will only let you get new information once every 10 seconds, there's not much point in VW asking hundreds of times a second.

 

Steve

 

From: Dave Stevenson [mailto:[hidden email]]
Sent: 7. marraskuuta 2011 19:48
To: Steven Kelly; [hidden email]
Subject: Re: [vwnc] Optimizing Speed of Fetching Source Code

 

I don't pretend to understand the nuances of either proposed solution, so I'll just ask if they properly handle links?

 

On Windows I've created link to directories so that I can access them as a subfolder of one drive when they physically reside on another. I don't know if Windows allows linking between partitions with different file systems (ntfs/fat32, etc), but if it does, then any solution should take this into account. I would think the same consideration should be made on unixy systems.

 

And are there any repercussions for remote references:

    \\myServer\someFolder\someFile.txt

 

or remote folders mapped as a local drive letter?
 

Dave Stevenson
[hidden email]

 

 


From: Steven Kelly <[hidden email]>
To: [hidden email]
Sent: Mon, November 7, 2011 5:56:19 AM
Subject: Re: [vwnc] Optimizing Speed of Fetching Source Code

That’s great, and very useful to us as developers. However, it achieves its speed-up by avoiding a couple of calls to #asFilename, rather than by improving the speed of #asFilename itself. It’s thus only useful during development, and not to end users of VW applications. It would be great if the low-hanging fruit in #getFileSystemAttributes: and encoded stream use could be harvested to improve overall performance, or at least performance in the commonest case: Windows and #UTF16.

 

Steve

 

From: Holger Guhl [mailto:[hidden email]]
Sent: 7. marraskuuta 2011 12:13
To: Steven Kelly
Cc: Runar Jordahl; [hidden email]
Subject: Re: [vwnc] Optimizing Speed of Fetching Source Code

 

I have observed this during development and Support work (AR 60922) and created a couple of solutions, one of them Res98610. With that solution the need to create filenames drops to zero.
Source code retrieval starts at some point with a file, but for accessing the XML-formatted text the file is wrapped as an InputSource. For some purposes it seemed necessary to keep the filename (although the most recent method comments negate that). The filename access seems cheap, and the final implementation in FileConnection>>name (^... fileName asFilename asString) seems harmless. But in the concrete scenario the fileName is a PortableFilename, where #asFilename is implemented as "self asResolvedString asFilename". The "funny" thing is that the new Filename is converted to a String and never used thereafter.
The Resolution recognizes that and skips the filename access entirely, so caching filenames is not necessary at all.
The second part of the solution is avoiding a duplicate file access during class comment retrieval. This is more or less to make the solution complete, since class comment access is less frequent. Nevertheless, source code analysis tools may benefit from the optimization.
The Resolution has not yet been integrated. I guess there is a ton of more pressing things to do. So far I cannot tell about negative side effects. Maybe you want to give it a try...
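The shape of the shortcut described above can be sketched as follows. This is a simplified illustration, not the actual Resolution code; only #asFilename, #asString and #asResolvedString come from the post, and the "after" method assumes the fileName responds to #asResolvedString:

```smalltalk
"Before (as described): a Filename is built only to be turned
 straight back into a String."
name
	^self fileName asFilename asString

"After (sketch): answer the resolved string directly, skipping
 Filename creation. #asResolvedString is the PortableFilename
 selector mentioned in the post."
name
	^self fileName asResolvedString
```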

Am 04.11.2011 23:15, schrieb Steven Kelly:

Looks like over 40% of the time is spent forming filenames from strings. We've also noticed this performance problem, in our case forming filenames for database files. Given that the filenames are already in the correct format, and don't need anything like #{VISUALWORKS} to be converted, it does seem a little odd that VW is so slow.

 

20% of the time is spent building a new encoded stream for every individual character in the filename, to see if it #isBadCharacter:. Surely #chop:to: could make one encoded stream for the filename and use the same stream to test all the characters? Or simply optimize the UTF16 case?
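A sketch of the reuse being suggested here. The #isBadCharacter:on: variant is hypothetical, and the encoded-stream construction is schematic; the real #chop:to: and #isBadCharacter: are shaped differently:

```smalltalk
"Current shape (paraphrased from the profile): every character test
 builds its own encoded stream inside #isBadCharacter:."
| badCharacters |
badCharacters := aString select: [:each | self isBadCharacter: each].

"Suggested shape: build one encoded stream up front and pass it to
 each test. #isBadCharacter:on: is a hypothetical variant; the stream
 construction below is schematic."
| encoder badCharacters |
encoder := EncodedStream on: (ReadWriteStream on: String new) encoding: #UTF16.
badCharacters := aString select: [:each | self isBadCharacter: each on: encoder].
```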

 

If we do that, 50% of the time for building a filename is in the primitive #getFileSystemAttributes:. I imagine this could be cached quite easily, based on a simple comparison of the first component of the string. We could clear the cache when the OS announces a new disk: http://msdn.microsoft.com/en-us/library/aa363215.aspx
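The caching idea might look something like the following sketch. All names here except #getFileSystemAttributes: are hypothetical, and the cache variable is assumed to be a class variable initialized elsewhere:

```smalltalk
"Cache file-system attributes per volume, keyed by the first
 path component (the drive), uppercased for case-insensitive lookup."
attributesForVolumeOf: aString
	| volume |
	volume := (aString copyUpTo: $\) asUppercase.
	^FileSystemAttributesCache
		at: volume
		ifAbsentPut: [self getFileSystemAttributes: aString]

"Flush when the OS announces a device change
 (WM_DEVICECHANGE / DBT_DEVICEARRIVAL, per the MSDN link)."
flushFileSystemAttributesCache
	FileSystemAttributesCache := Dictionary new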

 

Steve


From: [hidden email] on behalf of Runar Jordahl
Sent: Fri 04/11/2011 22:07
To: [hidden email]
Subject: [vwnc] Optimizing Speed of Fetching Source Code

For various internal tools we access source code of methods. These
tools spend 90% - 95% of their time getting the source code strings,
then 5% - 10% of total time processing them. In our image, getting the
source code for all methods takes around 16 seconds. This only covers
fetching source code, not getting the methods.

To see where the time is spent, try evaluating the following snippet:
TimeProfiler profile: [
	(Store.Registry bundleNamed: 'Base VisualWorks') withAllContainedItems
		do: [:pundle |
			pundle isPackage
				ifTrue: [pundle methods do: [:method | method sourceCode]]]]

          75.4 [] in Store.MethodDescriptor>>sourceCode
            69.1 Behavior>>sourceCodeAt:
              69.1 Behavior>>sourceCodeForMethod:at:
(...)
                69.1 BlockClosure>>on:do:
                  69.1 [] in XMLSourceFileFormat>>methodSourceAt:in:
                    69.1 XML.XMLParser>>parseElement:
                      43.7 XML.XMLParser>>on:
(...)
                        43.7 LogicalFilename>>asFilename
                          43.7 CharacterArray>>asFilename
                            43.7 Filename class>>named:
                              43.7 PCFilename class>>createInstanceNamed:
                                25.3 NTFSFilename class>>canonicalize:forFileSystemAttributes:
                                  25.3 PCFilename class>>canonicalize:forFileSystemAttributes:
                                    22.7 OrderedCollection>>do:
                                      22.7 [] in PCFilename class>>canonicalize:forFileSystemAttributes:
                                        22.7 Filename class>>chop:to:
                                          22.7 PCFilename class>>isBadCharacter:

Please note that we do not have any problems with the current
performance. It would just be nice if the speed could be improved
without too much work.

We are now starting to cache source code in the development image. For
our project, with 6000 classes, Seaside and VisualWorks Base, around
100 MB is used by the cache. This speeds up our internal tools that
work on source code.

I noticed that performance does not improve a lot when upgrading to a
solid-state drive. I assume most source files get cached by the
operating system in RAM.


Kind regards
Runar Jordahl

  
  




Re: Optimizing Speed of Fetching Source Code

Holger Guhl
In reply to this post by Dave Stevenson-3
VisualWorks does not handle Windows links (Windows shortcut files, *.lnk), and it is not within the scope of the mentioned Resolution to change that.
Remote folders mapped to a local drive letter are handled transparently by Windows, and VisualWorks works fine with them. The Resolution improves performance because it reduces the need to create Filenames and retrieve file system properties.
Unix systems have a reasonable file system where links are handled transparently.

Holger Guhl
-- 
Senior Consultant * Certified Scrum Master * [hidden email]
Tel: +49 231 9 75 99 21 * Fax: +49 231 9 75 99 20
Georg Heeg eK Dortmund
Handelsregister: Amtsgericht Dortmund  A 12812






Re: Optimizing Speed of Fetching Source Code

Niall Ross
In reply to this post by Runar Jordahl
Dear Runar,
    if you change from XML to chunk, what is the effect on times?

(You will have to convert the source, not just change the setting from
XML to chunk.  After changing the setting then, depending on whether it
is pundle source, base source or parcel source you test on, you will
want #condenseChanges, #condenseChangesOntoSources,
#newBaseSourceFileWithoutParcels: and/or republishing of loaded parcels.)
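For readers trying this, a workspace version of those steps might look like the following. Only the selectors come from the post; the receivers and the file name argument are assumptions, so check where these selectors are implemented in your VW version before evaluating:

```smalltalk
"After switching the source format setting from XML to chunk,
 condense so existing source is rewritten in the new format.
 Receivers here are assumptions, not confirmed by the post."
Smalltalk condenseChanges.            "pundle (changes-file) source"
Smalltalk condenseChangesOntoSources. "fold changes into the sources file"

"For base source without parcels, a new sources file is written;
 the file name argument here is purely illustrative."
"Smalltalk newBaseSourceFileWithoutParcels: 'visual.sou'."

"Loaded parcels: republish them so their source is rewritten."
```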

I saw a 10-fold speed up in browsing pundle (i.e. changes-file) source
after converting a very large changes file in this way in VW 7.4.

(Of course, chunk might have its inconveniences in general even if it
suited you here. It is by far the less-used format today, and so less tested.
One can of course have an image converted to chunk and then reset to XML
so that XML will be used for file-out. File-in, of course, recognises the
format in the file - the image setting is irrelevant.)

IIUC, the issue is that the XML reader is much more general than just a
reader of Smalltalk code.  The chunk format reader, by contrast,  is
optimised for Smalltalk source.

                Yours faithfully
                      Niall Ross




Re: Optimizing Speed of Fetching Source Code

Runar Jordahl
Sorry for this late reply. We want to limit changes to the base code
as much as possible, so we did not include Res98610, and we will not
move to another source code format.

We now cache source code in the images where we need quick access to
it. This greatly improves the execution of our code quality tests.
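A minimal sketch of this kind of in-image cache, assuming a class variable for storage. The names are illustrative only, not taken from the actual tool:

```smalltalk
"Lazily cache each method's source string, keyed by the method itself.
 An IdentityDictionary keeps lookups cheap and avoids hashing the source."
cachedSourceFor: aCompiledMethod
	^SourceCache at: aCompiledMethod ifAbsentPut: [aCompiledMethod sourceCode]

"Invalidate wholesale when code changes, e.g. from a recompilation
 notification; a 100 MB cache is cheaper to rebuild than to patch."
flushSourceCache
	SourceCache := IdentityDictionary new
```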

We also use the technique in our interactive tools. One example is
"source code search", which is now instant, like a Google search. I will
publish an update of this tool to the Public Store Repository later.

Runar
blog.epigent.com

