fastest #isUtf8

fastest #isUtf8

Chris Muller-4
For the GraphQL engine, I have to validate String inputs as being valid UTF-8.  Before researching it, I thought I'd check whether anyone has already done it and is willing to share their implementation.

I see we have conversions for this, but I need a boolean response saying whether the input is valid UTF-8, as fast as possible.

Thanks!
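
For reference, the "conversion" route mentioned above can be reduced to a boolean answer by attempting a strict decode and discarding the result. This is only a baseline sketch in Python (chosen for illustration, since the thread contains no Smalltalk code); the function name `is_utf8_via_decode` is hypothetical, and this is probably not the fastest option because it does the full decoding work.

```python
def is_utf8_via_decode(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (hypothetical helper name).

    Baseline approach: run the strict decoder and throw the result away.
    Python's built-in decoder rejects overlong forms, surrogates, and
    truncated sequences.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

A dedicated validator that never builds the decoded string avoids the allocation, which is why a boolean-only check is worth asking for.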




Re: fastest #isUtf8

Jakob Reschke
Hi Chris,

I don't have an implementation ready.
Not Smalltalk, but a few approaches.

Is there any chance to do vectorized computation (using SIMD registers and instructions) from Squeak? Can the JIT compiler generate such code?

Kind regards,
Jakob

On Thu, Jan 23, 2020 at 01:46, Chris Muller <[hidden email]> wrote:
For the GraphQL engine, I have to validate String inputs as being valid UTF-8.  Before researching it, I thought I'd check whether anyone has already done it and is willing to share their implementation.

I see we have conversions for this, but I need a boolean response saying whether the input is valid UTF-8, as fast as possible.

Thanks!





Re: fastest #isUtf8

Torge Husfeldt
Priceless:

Höhrmann’s finite-state machine
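
To make the state-machine idea concrete, here is a minimal sketch of a validate-only check. It is not Höhrmann's table-driven decoder; it is a hand-written loop that enforces the same well-formed byte ranges defined by the UTF-8 specification (no overlong forms, no surrogates, nothing above U+10FFFF). It is written in Python purely for illustration, and the function name `is_utf8` is an assumption, not an existing Squeak selector.

```python
def is_utf8(data: bytes) -> bool:
    """Validate UTF-8 without decoding (hypothetical helper, illustration only)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:                       # ASCII: single byte
            i += 1
            continue
        # Pick the number of continuation bytes and the allowed range of
        # the FIRST continuation byte, which depends on the lead byte.
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:                     # excludes overlong 3-byte forms
            need, lo, hi = 2, 0xA0, 0xBF
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xED:                     # excludes UTF-16 surrogates
            need, lo, hi = 2, 0x80, 0x9F
        elif b == 0xF0:                     # excludes overlong 4-byte forms
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:                     # excludes code points > U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                               # 0x80..0xC1 and 0xF5..0xFF
            return False
        if i + need >= n:                   # sequence truncated at end of input
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):        # remaining continuation bytes
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True
```

A table-driven automaton folds these branches into lookups, which is where the speed of the finite-state approach comes from; the ranges checked are the same.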


On Jan 24, 2020, at 22:15, Jakob Reschke <[hidden email]> wrote:


Hi Chris,

I don't have an implementation ready.
Not Smalltalk, but a few approaches.

Is there any chance to do vectorized computation (using SIMD registers and instructions) from Squeak? Can the JIT compiler generate such code?

Kind regards,
Jakob



Re: fastest #isUtf8

timrowledge
In reply to this post by Jakob Reschke


> On 2020-01-24, at 1:15 PM, Jakob Reschke <[hidden email]> wrote:
>
> Is there any chance to do vectorized computation (using SIMD registers and instructions) from Squeak? Can the JIT compiler generate such code?

Maybe not right now, but it's a code generator, so sure. Just work...

I see from his GitHub code that there is also an ARM example; oh, and it apparently works on x64 too, so win. See https://github.com/cyb70289/utf8/ for the code. Looks to me like a fairly easy plugin target.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: YVR: Branch to Vancouver