Alignment visualization performance

hernanmd
Hello guys.

I want to visualize DNA sequence alignments in Pharo 8. For this task, most bioinformatics applications set a background color for each letter. But in Pharo the Inspector is too slow to open even for a single small sequence of 1 kb. Consider that there are now about 37k COVID-19 sequences and each genome contains about 30k letters, so displaying, scrolling, and zooming these should be fast.

But have a look at this script, which takes about 6 seconds to open an Inspector. The script uses BioSmalltalk, and the code could certainly be improved, but that is not relevant to my visualization performance problem:

[
    | text attributes |
    " Generate a Text object from a random sequence "
    text := ((BioSequence forAlphabet: BioDNAAlphabet) randomLength: 1000) sequence asText.
    " Setup an array for each nucleotide background color "
    attributes := Array new: text size.
    1 to: text size do: [ :index |
        attributes
            at: index
            put: { TextBackgroundColor color: (BioDNAAlphabet colorMap at: (text at: index)) } ].
    text runs: (RunArray newFrom: attributes).
    text inspect
] timeToRun asString  "'0:00:00:05.911'"

Also, resizing the opened Inspector takes 2-3 seconds to refresh.
You can see the output here: https://imgur.com/a/xUlBeVY

I should say that without the #inspect send the code runs without performance issues: "'0:00:00:00.009'"

So I ran the script again for different sequence sizes:

String streamContents: [ :stream |
    100 to: 2000 by: 100 do: [ :sl |
        stream
            nextPutAll: ([
                | text attributes |
                " Generate a Text object from a random sequence "
                text := ((BioSequence forAlphabet: BioDNAAlphabet) randomLength: sl) sequence asText.
                " Setup an array for each nucleotide background color "
                attributes := Array new: text size.
                1 to: text size do: [ :index |
                    attributes
                        at: index
                        put: { TextBackgroundColor color: (BioDNAAlphabet colorMap at: (text at: index)) } ].
                text runs: (RunArray newFrom: attributes).
                text inspect
            ] timeToRun asString);
            cr ] ]

And these are the results (sequence length, then time to run):

 100  0:00:00:00.147
 200  0:00:00:00.28
 300  0:00:00:00.568
 400  0:00:00:00.993
 500  0:00:00:01.776
 600  0:00:00:02.123
 700  0:00:00:03.111
 800  0:00:00:04.084
 900  0:00:00:04.574
1000  0:00:00:06.192
1100  0:00:00:07.214
1200  0:00:00:07.915
1300  0:00:00:10.382
1400  0:00:00:12.725
1500  0:00:00:12.359
1600  0:00:00:17.357
1700  0:00:00:17.147
1800  0:00:00:20.651
1900  0:00:00:20.392
2000  0:00:00:23.238
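
One way to read these numbers is to look at what happens each time the length doubles. The Playground sketch below is only an illustration: it takes the timings reported for 500, 1000 and 2000 letters, prints the slowdown ratio for each doubling, and estimates the exponent k under the assumption that time grows roughly like length^k. The variable names are made up for this example.

```smalltalk
| timings |
"Seconds copied from the results above for lengths 500, 1000 and 2000."
timings := { 500 -> 1.776. 1000 -> 6.192. 2000 -> 23.238 }.
"For each doubling, print the slowdown ratio and the exponent k in time ~ length^k."
timings overlappingPairsDo: [ :shorter :longer |
	| ratio |
	ratio := longer value / shorter value.
	Transcript
		show: shorter key printString , ' -> ' , longer key printString;
		show: '  ratio ' , (ratio printShowingDecimalPlaces: 2);
		show: '  k ' , ((ratio log: 2) printShowingDecimalPlaces: 2);
		cr ]
```

Linear growth would give a ratio of about 2 per doubling. Here the ratios come out at roughly 3.5 and 3.75, which suggests the cost of opening the Inspector grows close to quadratically with the sequence length.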

At first I thought it was a problem with the Glamour text renderer for Rubric text, but profiling a single pass of the snippet for 2000 letters shows a couple of methods in the Rubric scanner, after some DNU sends, which consume a lot of the time: RubCharacterBlockScanner(RubCharacterBlockScanner) >> characterBlockAtPoint:index:in: and RubCharacterBlockScanner(RubCharacterBlockScanner) >> endOfRun. I attached the full profiler report in case you want to have a look. The summary is:

**Leaves**
37.4% {8800ms} RubCompositionScanner(RubCharacterScanner)>>basicScanCharactersFrom:to:in:rightX:stopConditions:kern:
6.3% {1476ms} Dictionary>>at:ifAbsentPut:
6.1% {1425ms} Context>>unwindComplete
4.6% {1082ms} Semaphore>>criticalReleasingOnError:
4.2% {991ms} Dictionary>>at:ifAbsent:
3.3% {785ms} Context>>aboutToReturn:through:
2.2% {527ms} Context>>resume:through:
2.0% {470ms} ExternalAddress>>isNull
1.8% {421ms} BlockClosure>>on:do:
1.7% {402ms} RubCharacterBlockScanner(RubCharacterScanner)>>setConditionArray:
1.6% {378ms} FreeTypeFace>>validate
1.6% {376ms} Dictionary>>scanFor:
1.5% {364ms} Context>>unwindComplete:
1.5% {344ms} Context>>unwindBlock
1.4% {323ms} Array(SequenceableCollection)>>do:
1.3% {299ms} Dictionary(HashedCollection)>>findElementOrNil:
1.2% {293ms} RunArray>>at:setRunOffsetAndValue:
1.2% {289ms} FreeTypeCache>>atFont:charCode:type:ifAbsentPut:
1.1% {252ms} FreeTypeCacheLinkedList>>moveDown:

**Memory**
old +0 bytes
young -1,485,272 bytes
used -1,485,272 bytes
free +1,485,272 bytes

**GCs**
full 0 totalling 0ms (0.0% uptime)
incr 947 totalling 1,576ms (7.0% uptime), avg 2.0ms
tenures 0
root table 0 overflows
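
For anyone who wants to reproduce a report like this without installing BioSmalltalk, here is a minimal sketch. It makes several assumptions: the color map is hypothetical (the original script uses BioDNAAlphabet colorMap), the random sequence is built from a plain String, and MessageTally is used as the profiler, which may not be the tool that produced the report above.

```smalltalk
| colors sequence text |
"Hypothetical color map; the original script uses BioDNAAlphabet colorMap."
colors := Dictionary new.
colors
	at: $A put: Color green;
	at: $C put: Color blue;
	at: $G put: Color yellow;
	at: $T put: Color red.
"Random 2000-letter DNA string built without BioSmalltalk."
sequence := (String new: 2000) collect: [ :each | 'ACGT' atRandom ].
text := sequence asText.
"Give every letter its own background color attribute."
1 to: text size do: [ :index |
	text
		addAttribute: (TextBackgroundColor color: (colors at: (text at: index)))
		from: index
		to: index ].
"Profile only the opening of the Inspector, not the construction of the Text."
MessageTally spyOn: [ text inspect ]
```

Keeping the Text construction outside the profiled block means the report only covers the rendering path, which is where the time seems to go.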

So my question is: are there any other text rendering backends to try? By backends I mean ones which don't use Rubric.

Cheers,

Hernán


Attachment: Profile_DNABgColoring.txt (38K)

Re: [Lse-pharo4pharo] Alignment visualization performance

hernanmd

On Sat, 22 Feb 2020 at 5:22, Stéphane Ducasse (<[hidden email]>) wrote:
Could you open a bug entry? We will tag it for large images.

On 21 Feb 2020, at 06:40, Hernán Morales Durand <[hidden email]> wrote:

[...]