Hello,
After checking the problem with Guille, we have the hypothesis of the source of the problem. We have seen that accessing Free Type is not thread-safe. Basically, the FTFace holds a structure that is filled up with the glyph and its information. As this structure is part of the Font Face (a font face is a font plus size and attributes), only one request to a glyph can be executed at the time. As we are sharing the same font in many places, you will be starting to see the problem. Also, we have seen that there many accesses to the glyphs outside the UI process. This problem started to appear as we starting to use Calypso (but it is not limited to it), as Calypso uses lazy tabs. The creation of these tabs is done outside the UI process. This is only a hack to test our hypothesis. To fix it correctly we will have to rewrite the drawing of the glyphs and synchronize the access to the glyph data information. Also, we want to do it in a way that does not penalize the processing in the UI process. I have only patched the code that is used by the rendering of morphic, other renderings like Athens or any other user using FT should be correctly rewritten. Once we are sure that synchronizing the access to FT fixes the problem, we will do a real fix not a dirty hack like this. I will be testing this new Image to see if there is an improvement. I will really love you try to use this image and tell me if you still find the problem. There is a PR generating a Pharo8 image (it is called wrong, but... it is a Pharo8) 32 bits ===== https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-32bit-f8c6957.zip 64 bits ===== https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-64bit-f8c6957.zip If somebody is willing to use it in Pharo 7, I can create a PR against Pharo7 to generate patched P7 images. Thanks! -- Pablo Tesone. [hidden email] |
Good find! The fac tthat FT_Face is not thread-safe is documented Hello, |
In reply to this post by tesonep@gmail.com
Thank you P and G.
It is so annoying. Stef > On 10 Apr 2019, at 15:38, [hidden email] wrote: > > Hello, > > After checking the problem with Guille, we have the hypothesis of the > source of the problem. > We have seen that accessing Free Type is not thread-safe. > Basically, the FTFace holds a structure that is filled up with the > glyph and its information. As this structure is part of the Font Face > (a font face is a font plus size and attributes), only one request to > a glyph can be executed at the time. As we are sharing the same font > in many places, you will be starting to see the problem. > > Also, we have seen that there many accesses to the glyphs outside the > UI process. > This problem started to appear as we starting to use Calypso (but it > is not limited to it), as Calypso uses lazy tabs. The creation of > these tabs is done outside the UI process. > > This is only a hack to test our hypothesis. > To fix it correctly we will have to rewrite the drawing of the glyphs > and synchronize the access to the glyph data information. Also, we > want to do it in a way that does not penalize the processing in the UI > process. > > I have only patched the code that is used by the rendering of morphic, > other renderings like Athens or any other user using FT should be > correctly rewritten. > > Once we are sure that synchronizing the access to FT fixes the > problem, we will do a real fix not a dirty hack like this. > > I will be testing this new Image to see if there is an improvement. > > I will really love you try to use this image and tell me if you still > find the problem. > > There is a PR generating a Pharo8 image (it is called wrong, but... it > is a Pharo8) > > 32 bits > ===== > > https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-32bit-f8c6957.zip > > 64 bits > ===== > > https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-64bit-f8c6957.zip > > If somebody is willing to use it in Pharo 7, I can create a PR against > Pharo7 to generate patched P7 images. > > Thanks! > > -- > Pablo Tesone. > [hidden email] > |
In reply to this post by tesonep@gmail.com
Pablo I tried the file you sent and I imported it in the PharoLauncher and I could not handle file.
I got a File class bad argument and after I could not use iceberg. Stef > On 10 Apr 2019, at 15:38, [hidden email] wrote: > > Hello, > > After checking the problem with Guille, we have the hypothesis of the > source of the problem. > We have seen that accessing Free Type is not thread-safe. > Basically, the FTFace holds a structure that is filled up with the > glyph and its information. As this structure is part of the Font Face > (a font face is a font plus size and attributes), only one request to > a glyph can be executed at the time. As we are sharing the same font > in many places, you will be starting to see the problem. > > Also, we have seen that there many accesses to the glyphs outside the > UI process. > This problem started to appear as we starting to use Calypso (but it > is not limited to it), as Calypso uses lazy tabs. The creation of > these tabs is done outside the UI process. > > This is only a hack to test our hypothesis. > To fix it correctly we will have to rewrite the drawing of the glyphs > and synchronize the access to the glyph data information. Also, we > want to do it in a way that does not penalize the processing in the UI > process. > > I have only patched the code that is used by the rendering of morphic, > other renderings like Athens or any other user using FT should be > correctly rewritten. > > Once we are sure that synchronizing the access to FT fixes the > problem, we will do a real fix not a dirty hack like this. > > I will be testing this new Image to see if there is an improvement. > > I will really love you try to use this image and tell me if you still > find the problem. > > There is a PR generating a Pharo8 image (it is called wrong, but... it > is a Pharo8) > > 32 bits > ===== > > https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-32bit-f8c6957.zip > > 64 bits > ===== > > https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-64bit-f8c6957.zip > > If somebody is willing to use it in Pharo 7, I can create a PR against > Pharo7 to generate patched P7 images. > > Thanks! > > -- > Pablo Tesone. > [hidden email] > |
We need a new stable vm... On Wed, Apr 10, 2019 at 4:16 PM ducasse <[hidden email]> wrote: Pablo I tried the file you sent and I imported it in the PharoLauncher and I could not handle file.
|
In reply to this post by ducasse
Ohhh, my fault, it is trying to look up for the commit of the merge.
That will never work... Because the commit is only in the CI. On Wed, Apr 10, 2019 at 4:16 PM ducasse <[hidden email]> wrote: > > Pablo I tried the file you sent and I imported it in the PharoLauncher and I could not handle file. > I got a File class bad argument and after I could not use iceberg. > > Stef > > > On 10 Apr 2019, at 15:38, [hidden email] wrote: > > > > Hello, > > > > After checking the problem with Guille, we have the hypothesis of the > > source of the problem. > > We have seen that accessing Free Type is not thread-safe. > > Basically, the FTFace holds a structure that is filled up with the > > glyph and its information. As this structure is part of the Font Face > > (a font face is a font plus size and attributes), only one request to > > a glyph can be executed at the time. As we are sharing the same font > > in many places, you will be starting to see the problem. > > > > Also, we have seen that there many accesses to the glyphs outside the > > UI process. > > This problem started to appear as we starting to use Calypso (but it > > is not limited to it), as Calypso uses lazy tabs. The creation of > > these tabs is done outside the UI process. > > > > This is only a hack to test our hypothesis. > > To fix it correctly we will have to rewrite the drawing of the glyphs > > and synchronize the access to the glyph data information. Also, we > > want to do it in a way that does not penalize the processing in the UI > > process. > > > > I have only patched the code that is used by the rendering of morphic, > > other renderings like Athens or any other user using FT should be > > correctly rewritten. > > > > Once we are sure that synchronizing the access to FT fixes the > > problem, we will do a real fix not a dirty hack like this. > > > > I will be testing this new Image to see if there is an improvement. > > > > I will really love you try to use this image and tell me if you still > > find the problem. > > > > There is a PR generating a Pharo8 image (it is called wrong, but... it > > is a Pharo8) > > > > 32 bits > > ===== > > > > https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-32bit-f8c6957.zip > > > > 64 bits > > ===== > > > > https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-64bit-f8c6957.zip > > > > If somebody is willing to use it in Pharo 7, I can create a PR against > > Pharo7 to generate patched P7 images. > > > > Thanks! > > > > -- > > Pablo Tesone. > > [hidden email] > > > > > -- Pablo Tesone. [hidden email] |
In reply to this post by Nicolas Cellier
Thanks Nicolas.
You know that I appreciate your work. I would appreciate that Eliot does not send me personal emails to tell that I’m a bad leader :) but at least this is coherent with the violent emails I received in the past - nothing new under the sun. Just anger against me. I cannot do much. I imagine that I cannot avoid the send so I will block the reception :). On Apr 10, 2019, at 7:08 AM, ducasse <[hidden email]> wrote: Why can you not include Nicolas Cellier in your thanks? He provided pertinent information. That you only ever thank members of the Pharo team I find annoying. It is bad leadership.
|
In reply to this post by tesonep@gmail.com
[hidden email]
<[hidden email]> wrote: > If somebody is willing to use it in Pharo 7, I can create a PR against > Pharo7 to generate patched P7 images. Yes, I want to Stephan |
In reply to this post by tesonep@gmail.com
Thanks a lot for actually diving into this, this mysterious issue has been bugging us for a very long time. I do hope you can finally solve this. I am sure many people will be grateful.
Sven > On 10 Apr 2019, at 15:38, [hidden email] wrote: > > Hello, > > After checking the problem with Guille, we have the hypothesis of the > source of the problem. > We have seen that accessing Free Type is not thread-safe. > Basically, the FTFace holds a structure that is filled up with the > glyph and its information. As this structure is part of the Font Face > (a font face is a font plus size and attributes), only one request to > a glyph can be executed at the time. As we are sharing the same font > in many places, you will be starting to see the problem. > > Also, we have seen that there many accesses to the glyphs outside the > UI process. > This problem started to appear as we starting to use Calypso (but it > is not limited to it), as Calypso uses lazy tabs. The creation of > these tabs is done outside the UI process. > > This is only a hack to test our hypothesis. > To fix it correctly we will have to rewrite the drawing of the glyphs > and synchronize the access to the glyph data information. Also, we > want to do it in a way that does not penalize the processing in the UI > process. > > I have only patched the code that is used by the rendering of morphic, > other renderings like Athens or any other user using FT should be > correctly rewritten. > > Once we are sure that synchronizing the access to FT fixes the > problem, we will do a real fix not a dirty hack like this. > > I will be testing this new Image to see if there is an improvement. > > I will really love you try to use this image and tell me if you still > find the problem. > > There is a PR generating a Pharo8 image (it is called wrong, but... it > is a Pharo8) > > 32 bits > ===== > > https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-32bit-f8c6957.zip > > 64 bits > ===== > > https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-64bit-f8c6957.zip > > If somebody is willing to use it in Pharo 7, I can create a PR against > Pharo7 to generate patched P7 images. > > Thanks! > > -- > Pablo Tesone. > [hidden email] > |
So, this morning we have been working on validating this hypothesis and making a meaningful and reproducible test for it. I'm just the messenger here because Pablo who has done most of the job is shy. ## The main story Since we are working with concurrency and non-determinism, we have designed a test that reproduces the case with an accuracy of ~94%. Basically the test renders a Rubric morph in concurrent green threads on a big string, and then we compare that the drawn morphs should be bit identical. We have made some measurements and observed that putting three concurrent processes made the chance of hitting the bug to almost 94%. So running it more than ~5 times makes it highly reproducible. Good thing: this test inside a loop of 10 iterations (to increase the probability of hitting the bug to >~99%) detects the bug in the stock image and not in the one with our patch. Bad thing: this test relies on rubric, we have tried to simplify it (like just doing a collect on the string to fetch the glyph for each character), but simpler code produces a tighter loop which makes it far less reproducible (down from 94 to 33%). Still, there are several questions open: - can we simplify the test? :) - we have to think about a smart way to serialize Freetype access in the image (there are several users) - I'll paste below our math, so if somebody would like to review it would be good too, our probability is a bit rusty. - we have been doing benchmarks with Pablo this morning and adding a mutex does not seem to have any negative impact in rendering. Help is super welcome of course!! The patch that Pablo has done in the PR will just work for the most common cases (at least for the calypso tabs), but it shows that FT* needs some love. Still we would like to be able to backport something like this in Pharo7, but backporting a big Freetype refactor would not be so good. Maybe the mutex is a good enough change for Pharo7? Any more ideas? input? Cheers, Guille and Pablo ## The code The code of the test is the following: ``` | sem text canvases blocky |sem := Semaphore new. text := (String loremIpsum: 25*1024). FreeTypeCache current removeAll. canvases := OrderedCollection new. blocky := [ | canvas | canvas := FormCanvas extent: 1000@1000. canvases add: canvas. (RubScrolledTextMorph new) setText: text; font: StandardFonts codeFont; bounds: (0@0 corner: canvas form extent); fullDrawOn: canvas. sem signal ]. blocky forkAt: 39. blocky forkAt: 39. blocky forkAt: 39. sem wait; wait; wait. self assert: (((canvases at:1) form bits = (canvases at:2) form bits) and: [ ((canvases at:2) form bits = (canvases at:3) form bits) ]) ``` ## The math p := 0.94 asScaledDecimal. hittingTheBug := 0. notHittingTheBug := 1 - hittingTheBug. 5 timesRepeat: [ hittingTheBug := hittingTheBug + (notHittingTheBug * p). notHittingTheBug := 1 - hittingTheBug ]. hittingTheBug. "after 5 iterations" "0.99999969s8" On Wed, Apr 10, 2019 at 8:04 PM Sven Van Caekenberghe <[hidden email]> wrote: Thanks a lot for actually diving into this, this mysterious issue has been bugging us for a very long time. I do hope you can finally solve this. I am sure many people will be grateful.
|
I am not shy, the mail of Guille is perfect, I cannot do it! Thanks a lot for all the explanations! On Thu, Apr 11, 2019 at 3:25 PM Guillermo Polito <[hidden email]> wrote:
Pablo Tesone.
[hidden email] |
Free forum by Nabble | Edit this page |