how to use indexes to copy a substring out of a string

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

how to use indexes to copy a substring out of a string

Roelof Wobben
Hello,

Still working on AdventOfCode

Im struggeling to see how this can be solved.

Now, a nice string is one with all of the following properties:

  • It contains a pair of any two letters that appears at least twice in the string without overlapping, like xyxy (xy) or aabcdefgaa (aa), but not like aaa (aa, but it overlaps).
  • It contains at least one letter which repeats with exactly one letter between them, like xyx, abcdefeghi (efe), or even aaa.

my game plan was this

1) group the characters and filter everything out which has a count not equal to two

2) find the indexes of the characters and with the indexes use a copy method so I have a substring out of it

3) do the same with the substring so again 1 and 2

but how can I make 2 work.

Roelof


Reply | Threaded
Open this post in threaded view
|

Re: how to use indexes to copy a substring out of a string

Pharo Smalltalk Users mailing list
Hi Roelof

You may have a look at my StringExtensions package at
https://github.com/hernanmd/StringExtensions
I wrote several algorithms for working with substrings like:

https://github.com/hernanmd/StringExtensions/blob/master/repository/StringExtensions.package/String.extension/instance/indexesOfMotif..st
https://github.com/hernanmd/StringExtensions/blob/master/repository/StringExtensions.package/String.extension/instance/indicesOfSubstringOverlaps..st

Some of them used in https://github.com/hernanmd/BioSmalltalk where
you can find also algorithms for k-mer counting and clump finding:

https://github.com/hernanmd/BioSmalltalk/blob/master/repository/BioTools.package/BioSequence.class/instance/clumpFindK.length.times..st
https://github.com/hernanmd/BioSmalltalk/blob/master/repository/BioTools.package/BioSequence.class/instance/kmersCount.mismatches..st

for the Bio repo just consider a Sequence like a more advanced String.

Cheers,

Hernán

El sáb., 29 dic. 2018 a las 13:29, Roelof Wobben (<[hidden email]>) escribió:

>
> Hello,
>
> Still working on AdventOfCode
>
> Im struggeling to see how this can be solved.
>
> Now, a nice string is one with all of the following properties:
>
> It contains a pair of any two letters that appears at least twice in the string without overlapping, like xyxy (xy) or aabcdefgaa (aa), but not like aaa (aa, but it overlaps).
> It contains at least one letter which repeats with exactly one letter between them, like xyx, abcdefeghi (efe), or even aaa.
>
> my game plan was this
>
> 1) group the characters and filter everything out which has a count not equal to two
>
> 2) find the indexes of the characters and with the indexes use a copy method so I have a substring out of it
>
> 3) do the same with the substring so again 1 and 2
>
> but how can I make 2 work.
>
> Roelof
>
>

Reply | Threaded
Open this post in threaded view
|

Re: how to use indexes to copy a substring out of a string

Roelof Wobben
Op 29-12-2018 om 21:05 schreef Hernán Morales Durand via Pharo-users:

Thanks,

I will look at these but I think there are too advanced for Aoc
challenges but maybe I see a way to solve it more easily


Reply | Threaded
Open this post in threaded view
|

Re: how to use indexes to copy a substring out of a string

Ben Coman
In reply to this post by Roelof Wobben


On Sun, 30 Dec 2018 at 00:29, Roelof Wobben <[hidden email]> wrote:
Hello,

Still working on AdventOfCode

I'm struggling to see how this can be solved.

Now, a nice string is one with all of the following properties:

  • It contains a pair of any two letters that appears at least twice in the string without overlapping, like xyxy (xy) or aabcdefgaa (aa), but not like aaa (aa, but it overlaps).
To exclude overlaps, one approach could be to subtract each candidate pair from the string 
and then check the remainder-string for matches.
I'm not sure, but suspect that as the candidate pair progresses through the string, only the 
trailing remainder-string needs matching since the preceding remainder-string has already been checked.

So a Pharo 7 Spotter search for 'pairs' finds 28 implementors 
where SequenceableCollection>>overlappingPairsCollect:
looks quite close to what is required...
  'abcdefg' asArray overlappingPairsCollect: [ :a :b | { a.b } ].  "#(
  #($a $b) 
  #($b $c) 
  #($c $d) 
  #($d $e) 
  #($e $f) 
  #($f $g))" 

Reviewing its implementation...
    SequenceableCollection>>overlappingPairsCollect: aBlock 
| retval |
retval := self species ofSize: self size - 1.
1 to: self size - 1
do: [:i | retval at: i put: (aBlock value: (self at: i) value: (self at: i + 1)) ].
^retval

it can be adapted to provide also a remainder...
    SequenceableCollection>>overlappingPairsRemainderCollect: aBlock 
| retval |
retval := self species ofSize: self size - 1.
1 to: self size - 1
do: [:i | retval at: i put: (aBlock value: (self at: i) value: (self at: i + 1) value: (self allButFirst: i + 1)) ].
^retval

for use like this...
  'abcdefg' asArray  overlappingPairsRemainderCollect: [ :a :b :rem | { a.b.rem } ] "#(
  #($a $b #($c $d $e $f $g)) 
  #($b $c #($d $e $f $g)) 
  #($c $d #($e $f $g)) 
  #($d $e #($f $g)) 
  #($e $f #($g)) 
  #($f $g #()))" 


  • It contains at least one letter which repeats with exactly one letter between them, like xyx, abcdefeghi (efe), or even aaa.
You might adapt "overlappingPairsCollect: aBlock"
into  "overlappingPairsSkip: n collect: aBlock"
but changing  "size -1"  and   "i + 1"
into  "size - n"  and   "i + n"
 
HTH
cheers -ben

P.S. If  #overlappingPairsSkip:collect:  seemed generically useful to push into Pharo,
then the following would be redefined to reuse it...
  SequenceableCollection>>overlappingPairsCollect: aBlock
      ^self overlappingPairsSkip: 1 collect: aBlock

The extra layer of function call doesn't add much overhead.

On the other hand #overlappingPairsRemainderCollect:
does lots of extra copying through #allButFirst:
so would not be so good for reuse by  #overlappingPairsCollect:
Reply | Threaded
Open this post in threaded view
|

Re: how to use indexes to copy a substring out of a string

Pharo Smalltalk Users mailing list
Thanks Ben,

I think you almost give me the answer but I have to do some testing to see how it really works,

Roelof



Op 30-12-2018 om 02:14 schreef Ben Coman:


On Sun, 30 Dec 2018 at 00:29, Roelof Wobben <[hidden email]> wrote:
Hello,

Still working on AdventOfCode

I'm struggling to see how this can be solved.

Now, a nice string is one with all of the following properties:

  • It contains a pair of any two letters that appears at least twice in the string without overlapping, like xyxy (xy) or aabcdefgaa (aa), but not like aaa (aa, but it overlaps).
To exclude overlaps, one approach could be to subtract each candidate pair from the string 
and then check the remainder-string for matches.
I'm not sure, but suspect that as the candidate pair progresses through the string, only the 
trailing remainder-string needs matching since the preceding remainder-string has already been checked.

So a Pharo 7 Spotter search for 'pairs' finds 28 implementors 
where SequenceableCollection>>overlappingPairsCollect:
looks quite close to what is required...
  'abcdefg' asArray overlappingPairsCollect: [ :a :b | { a.b } ].  "#(
  #($a $b) 
  #($b $c) 
  #($c $d) 
  #($d $e) 
  #($e $f) 
  #($f $g))" 

Reviewing its implementation...
    SequenceableCollection>>overlappingPairsCollect: aBlock 
| retval |
retval := self species ofSize: self size - 1.
1 to: self size - 1
do: [:i | retval at: i put: (aBlock value: (self at: i) value: (self at: i + 1)) ].
^retval

it can be adapted to provide also a remainder...
    SequenceableCollection>>overlappingPairsRemainderCollect: aBlock 
| retval |
retval := self species ofSize: self size - 1.
1 to: self size - 1
do: [:i | retval at: i put: (aBlock value: (self at: i) value: (self at: i + 1) value: (self allButFirst: i + 1)) ].
^retval

for use like this...
  'abcdefg' asArray  overlappingPairsRemainderCollect: [ :a :b :rem | { a.b.rem } ] "#(
  #($a $b #($c $d $e $f $g)) 
  #($b $c #($d $e $f $g)) 
  #($c $d #($e $f $g)) 
  #($d $e #($f $g)) 
  #($e $f #($g)) 
  #($f $g #()))" 


  • It contains at least one letter which repeats with exactly one letter between them, like xyx, abcdefeghi (efe), or even aaa.
You might adapt "overlappingPairsCollect: aBlock"
into  "overlappingPairsSkip: n collect: aBlock"
but changing  "size -1"  and   "i + 1"
into  "size - n"  and   "i + n"
 
HTH
cheers -ben

P.S. If  #overlappingPairsSkip:collect:  seemed generically useful to push into Pharo,
then the following would be redefined to reuse it...
  SequenceableCollection>>overlappingPairsCollect: aBlock
      ^self overlappingPairsSkip: 1 collect: aBlock

The extra layer of function call doesn't add much overhead.

On the other hand #overlappingPairsRemainderCollect:
does lots of extra copying through #allButFirst:
so would not be so good for reuse by  #overlappingPairsCollect:

Reply | Threaded
Open this post in threaded view
|

Re: how to use indexes to copy a substring out of a string

Pharo Smalltalk Users mailing list
In reply to this post by Ben Coman
Op 30-12-2018 om 02:14 schreef Ben Coman:
'abcdefg' asArray  overlappingPairsRemainderCollect: [ :a :b :rem | { a.b.rem } ]


Sorry, I do not see how this can work so I can find the answer

qjhvhtzxzqqjkmpb is nice because is has a pair that appears twice (qj)

so if I try it on the code for overlappingPairsRemainderCollect  I see this as answer :

"#($q $j #($h $v $h $t $z $x $z $q $q $j $k $m $p $b))"
"#($j $h #($v $h $t $z $x $z $q $q $j $k $m $p $b))"
"#($h $v #($h $t $z $x $z $q $q $j $k $m $p $b))"
"#($v $h #($t $z $x $z $q $q $j $k $m $p $b))"
"#($h $t #($z $x $z $q $q $j $k $m $p $b))"
"#($t $z #($x $z $q $q $j $k $m $p $b))"
"#($z $x #($z $q $q $j $k $m $p $b))"
"#($x $z #($q $q $j $k $m $p $b))"
"#($z $q #($q $j $k $m $p $b))"
"#($q $q #($j $k $m $p $b))"
"#($q $j #($k $m $p $b))"
"#($j $k #($m $p $b))"
"#($k $m #($p $b))"
"#($m $p #($b))"
"#($p $b #())"

How does help me the output in telling me that there are two pairs that are there two times

Reply | Threaded
Open this post in threaded view
|

Re: how to use indexes to copy a substring out of a string

Ben Coman


On Sun, 30 Dec 2018 at 17:53, Roelof Wobben via Pharo-users <[hidden email]> wrote:
Op 30-12-2018 om 02:14 schreef Ben Coman:
'abcdefg' asArray  overlappingPairsRemainderCollect: [ :a :b :rem | { a.b.rem } ]
This was deliberately only half a solution.  The block above "[ :a :b :rem | { a.b.rem } ]"  
is a placeholder only - to show you what data you would have to work with. You need to adapt it.

 
Sorry, I do not see how this can work so I can find the answer

qjhvhtzxzqqjkmpb is nice because is has a pair that appears twice (qj)

so if I try it on the code for overlappingPairsRemainderCollect  I see this as answer :

"#($q $j #($h $v $h $t $z $x $z $q $q $j $k $m $p $b))"

The line above looks pertinent!! Inside the block you will have...
a ==> $q
b ==> $j
rem ==> #($h $v $h $t $z $x $z $q $q $j $k $m $p $b)

<snip>
"#($p $b #())"

How does help me the output in telling me that there are two pairs that are there two times

You might start with the following block...
   [ :a :b :rem | self halt. rem overlappingPairsCollect: [ :a2 :b2 | ...something... ] ]  
and from in the debugger experiment by inspecting the second half of it.

cheers -ben 
Reply | Threaded
Open this post in threaded view
|

Re: how to use indexes to copy a substring out of a string

Pharo Smalltalk Users mailing list
In reply to this post by Roelof Wobben
Let us approach this as simply as we can.
(1) There are two numbers 1 <= i <= n-3, i+2 <= j <= n-1
    such that (s at: i) = (s at: j) and (s at: i+1) = (s at: j+1)

    Brute force will get us where we want to go:
    String
      hasTwoCharacterNonoverlappingRepeat
        1 to: self size - 3 do: [:i |
          i + 2 to: self size - 1 do: [:j |
            ((self at: i  ) = (self at: j  ) and:
            [(self at: i+1) = (self at: j+1)])
              ifTrue: [^true]]].
        ^false

    This is pretty much how you would do it in C or Java.
    This takes O(n**2) time where n is the size of the string.

    If you want to be *really* clever, you can do something with
    suffix arrays, but if you just want to be a little bit clever,
    you can sort the two-character substrings and their positions
    by just sorting the positions.
    Make an array a of the integers 1..n-1.
    Sort it by viewing each i in the array as the triple
    (s[i],s[i+1],i).  You don't *make* the triples, just do the
    comparisons as if they existed.
    Now a linear scan of the array will find a match if there is one.
    The whole process is O(n.lg n).  It's only worth doing when n is
    large.

(2) Finding xyx.  You did not say whether the middle letter can be the same
    as the outer ones or must be different.  I'm going to suppose that it
    does not matter.

    hasXYX
      3 to: self size do: [:i |
        (self at: i-2) = (self at: i) ifTrue: [^true]].
      ^false

    Again, this is just what you would do in Java or C:
    public static boolean hasXYX(String s) {
        final int n = s.length();
        for (int i = 2; i < n; i++)
            if (s.charAt(i-2) == s.charAt(i)) return true;
        }
        return false;
    }

For what it's worth, in my system this scans a Scrabble dictionary
with 146522 words (1,220,068 bytes), finding 2779 "nice" words,
in 0.18 seconds, so there's not a lot of point in improving it.

In both cases, DON'T copy.  DON'T make substrings.
Just state the problem simply and convert it to obvious code.
Simple boring code is GOOD.


On Sun, 30 Dec 2018 at 05:29, Roelof Wobben <[hidden email]> wrote:
Hello,

Still working on AdventOfCode

Im struggeling to see how this can be solved.

Now, a nice string is one with all of the following properties:

  • It contains a pair of any two letters that appears at least twice in the string without overlapping, like xyxy (xy) or aabcdefgaa (aa), but not like aaa (aa, but it overlaps).
  • It contains at least one letter which repeats with exactly one letter between them, like xyx, abcdefeghi (efe), or even aaa.

my game plan was this

1) group the characters and filter everything out which has a count not equal to two

2) find the indexes of the characters and with the indexes use a copy method so I have a substring out of it

3) do the same with the substring so again 1 and 2

but how can I make 2 work.

Roelof