Re: XML Parser / Pastell

Alexandre Bergel
Note that I am sending this email to the Pharo mailing list. That is
more appropriate, since the Moose mailing list members are also part of
Pharo.

> I have an update I will push in a day or two that extends namespace  
> support to attributes and also preserves the order of attributes.  
> Additionally, the tokenizer has been sped up somewhat. I have also  
> renamed some of the SAX handlers, but if you have a subclass that  
> implements them, they will still be invoked but with a warning that  
> they have been deprecated.

Using deprecation is good.

> I am going back on using Symbols to identify elements and  
> attributes. The performance of such a Symbol-based system is quite  
> erratic; running benchmark1 several times (at a much higher  
> interval), I repeatedly got results in the ~5800 to ~6200 range in a  
> clean image with XML-Support loaded into it from Monticello. Saving  
> the image and running it again, I got results in the range of ~4400  
> to ~4700. There seems to be a performance hit associated with using  
> symbols in this fashion that does not abate until you save the image  
> at least once. Even then my tests seem to indicate that a String-
> based system would be slightly faster. This is probably due to the  
> initial overhead of interning Symbols and the subsequent overhead of  
> looking them up.

Really strange. What makes you think it is related to Symbols?
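
One rough way to check, as a sketch only: time asSymbol on names that
are not interned yet, then on the same names again, and compare both
with plain String copies. The name list, the count of 20000 and the
variable names are assumptions of mine and are not taken from benchmark1.

```smalltalk
"Sketch under assumptions: made-up names, arbitrary count."
| names symbols firstPass secondPass stringPass |
names := (1 to: 20000) collect: [ :i | 'bench-element-', i printString ].
Smalltalk garbageCollect.

"First pass: the names are new, so asSymbol has to intern each one.
 The result is kept so the new Symbols are not garbage collected."
firstPass := [ symbols := names collect: [ :each | each asSymbol ] ] timeToRun.

"Second pass: the Symbols already exist, so asSymbol only looks them up."
secondPass := [ names do: [ :each | each asSymbol ] ] timeToRun.

"Baseline: what a String-based system would do, a plain copy."
stringPass := [ names do: [ :each | each copy ] ] timeToRun.

Array with: firstPass with: secondPass with: stringPass
```

Evaluate it with "print it". If the first number is much larger than the
second, the cost is in interning new Symbols rather than in looking them
up. timeToRun is coarse, so only large differences mean anything.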

Cheers,
Alexandre

>
> ---- On Thu, 04 Mar 2010 12:35:39 -0800 Alexandre Bergel <[hidden email]> wrote ----
>
>> Hi,
>>
>> I exchanged a number of emails with Jaayer and Norbert regarding some
>> improvements of XMLSupport and its port to Gemstone.
>> It may be a bit difficult for people to follow this, but I think it is
>> important not to discuss this privately.
>>
>>>>>>> I already changed
>>>>>>>
>>>>>>> XMLTokenizer>>nextName
>>>>>>> ....
>>>>>>> ^ self fastStreamStringContents: nameBuffer
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>> XMLTokenizer>>nextName
>>>>>>> ....
>>>>>>> ^ (self fastStreamStringContents: nameBuffer) asSymbol
>>>>>>>
>>>>>>> in the gemstone parser to be more consistent.
>>>>>>
>>>>>> Have you noticed any slow down for this?
>>>>>>
>>>>> No, I didn't do any tests. But if internally all names are symbols
>>>>> then I guess converting them while reading is the best way to do it.
>>>>
>>>> I added benchmark1 in XMLParserTest. Really simple way to measure
>>>> progress (or slowdown).
>>>> On my machine, I have:
>>>> XMLParserTest new benchmark1
>>>> => 2097
>>>>
>>>> Adding "(self fastStreamStringContents: nameBuffer) asSymbol"
>>>> increases the benchmark result to 2273
>>>>
>>> I don't believe this ;) you read them as string from the stream. If
>>> they are managed as symbols somehow they need to be converted. If
>>> not at this place then on some other. I would suspect that there are
>>> doubled calls to asSymbol. Could you check the sources?
>>
>> There is indeed a slowdown. I am not sure where it comes from, however.
>> Executing "XMLParserTest new benchmark1" twice does not return the
>> same result. Actually, it increases at each execution! I thought that
>> a garbage collection before running the benchmark would help, but
>> apparently it does not.
>>
>> Calling asSymbol on a symbol should not have a perceptible cost, I believe.
>>
>> Cheers,
>> Alexandre
>>
>>
>>>>> anElement attributes class (I wrote species but that will fail in
>>>>> gemstone too I guess. Hell!)
>>>>
>>>> I just committed with species. Let me know. This is easy to adjust.
>>>>
>>> Ok.
>>>
>>>>>>> The gemstone XML Parser decides somehow to use
>>>>>>> IdentityDictionary internally. I think this should be allowed. I
>>>>>>> just changed Dictionary to IdentityDictionary so the = test
>>>>>>> reflects the right type. If you change it, it clearly fails,
>>>>>>> because Pharo uses Dictionary internally. So you can see that
>>>>>>> it is not a question of using Dictionary or IdentityDictionary
>>>>>>> but a question of the wrongness of using =
>>>>>>
>>>>>> How should XMLNodeTest look to accommodate your situation?
>>>>>>
>>>>> I need to recheck this. The problem is really that the XML Parser
>>>>> creates instances of class Association but { #key->'value' }
>>>>> creates an instance of class SymbolAssociation. That means I would
>>>>> know how to fix the test but I want to understand the implications
>>>>> of all of this. I'll get back to you when I know anything new.
>>>>
>>>> Ok.
>>>>
>>> My mail to the gemstone list led to a ticket about removing class
>>> checks from Association. That would ease the handling a lot.
>>>
>>> Norbert
>>>>>
>>>>>
>>>>>> Alexandre
>>>>>>
>>>>>>>
>>>>>>>>> Here there is an assumption about the allAttributes collection
>>>>>>>>> while using = as comparison. But there is also an assumption
>>>>>>>>> about the order of the content. I changed this to
>>>>>>>>>
>>>>>>>>> self assert: (firstPerson allAttributes includesAllOf:
>>>>>>>>> #(#'first-name' #'employee-number' #'family-name')).
>>>>>>>>> self assert: (firstPerson allAttributeAssociations asArray
>>>>>>>>> includesAllOf: {(#'first-name'->'Bob').
>>>>>>>>> (#'employee-number'->'A0000'). (#'family-name'->'Gates')}).
>>>>>>>>
>>>>>>>> Very right. My mistake. But wouldn't an asSortedCollection do
>>>>>>>> the trick? Don't you test the size of the array?
>>>>>>>>
>>>>>>>>> This is not the best way to do it because the check is only in
>>>>>>>>> one direction but for this test it is ok. Somehow the second
>>>>>>>>> assert fails and I have to check what is going on here.
>>>>>>>>
>>>>>>>> Yeah, my mistake. Sorry. The elements may be differently
>>>>>>>> ordered. Would an asSortedCollection help?
>>>>>>>>
>>>>>>>> I have now granted you an access to the repository. You should
>>>>>>>> be able to directly commit in it.
>>>>>>>>
>>>>>>>> Jaayer, what is your Squeaksource account?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Alexandre
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 28.02.2010, at 02:01, Alexandre Bergel wrote:
>>>>>>>>>
>>>>>>>>>>> Thanks for now. I did a first merge attempt. It will be
>>>>>>>>>>> quite a bit of work. For me the xml parser is an important
>>>>>>>>>>> component. With the newest changes it became biased towards
>>>>>>>>>>> Pharo. There are things like ClassTestCase, Unicode
>>>>>>>>>>> CharacterSet. These are for sure improvements/changes in
>>>>>>>>>>> Pharo you would like to use. But they make porting a lot more
>>>>>>>>>>> difficult. I would be glad if we could find some way to
>>>>>>>>>>> lower the porting barrier. I could put the necessary classes
>>>>>>>>>>> in the squeak compat package in gemstone. But then the xml
>>>>>>>>>>> parser will depend on the squeak package, which I don't like.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Norbert,
>>>>>>>>>>
>>>>>>>>>> XMLParser effectively depends on Squeak-specific classes. I
>>>>>>>>>> wrote a small script that identifies the Squeak classes used in
>>>>>>>>>> XML-Support. Here is the list: LanguageEnvironment, Unicode,
>>>>>>>>>> LocaleID, CharacterSet
>>>>>>>>>>
>>>>>>>>>> I guess that porting the whole multilingual support may not
>>>>>>>>>> be that easy. The xml:lang attribute is used to select the
>>>>>>>>>> proper support. It should be easy for you to ignore it I guess.
>>>>>>>>>>
>>>>>>>>>> CharacterSet seems to be one that has to be ported. It is not
>>>>>>>>>> a big class. It depends on WideCharacterSet. I am not sure
>>>>>>>>>> whether this is useful in your case however.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Alexandre
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
>>>>>>>>>> Alexandre Bergel http://www.bergel.eu
>>>>>>>>>> ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.






_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project