OT: compressing log files


Eliot Miranda-2
Hi All,

    I've just needed to make sense of a very long log file generated by strace.  The log file is full of entries like:

--- SIGALRM (Alarm clock) @ 0 (0) ---
gettimeofday({1265744804, 491238}, NULL) = 0
sigreturn()                             = ? (mask now [])
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0

and my workspace script reduces these to e.g.

--- SIGALRM (Alarm clock) @ 0 (0) ---
gettimeofday({1265744797, 316183}, NULL) = 0
sigreturn()                             = ? (mask now [])
NEXT 2 LINES REPEAT 715 TIMES
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
gettimeofday({1265744797, 317189}, NULL) = 0
sigreturn()                             = ? (mask now [])


My question is: has anyone looked at this issue in any depth and perhaps come up with something less crude than the script below, possibly even recursive?  That is, the above would ideally be reduced to something like:

NEXT 7 LINES REPEAT 123456 TIMES
--- SIGALRM (Alarm clock) @ 0 (0) ---
gettimeofday({1265744797, 316183}, NULL) = 0
sigreturn()                             = ? (mask now [])
NEXT 2 LINES REPEAT BETWEEN 500 AND 800 TIMES
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
gettimeofday({1265744797, 317189}, NULL) = 0
sigreturn()                             = ? (mask now [])
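
One way to get at the nested form, sketched in Python rather than Smalltalk: collapse the shortest repeating blocks first, then treat each collapsed block as a single item and look for repeats among those, letting two collapsed blocks match even when their counts differ. This is only a sketch under assumptions. `Repeat`, `compress` and `max_period` are illustrative names, lines must match exactly (so the differing gettimeofday timestamps above would have to be normalized first), and the rendering says ITEMS rather than LINES because a nested repeat counts as one item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repeat:
    block: tuple  # items (lines or nested Repeat nodes) forming one repetition
    lo: int       # fewest repetitions seen
    hi: int       # most repetitions seen


def key(item):
    """Comparison key: Repeat nodes match on their block only, ignoring counts."""
    if isinstance(item, Repeat):
        return ("R", tuple(key(x) for x in item.block))
    return ("L", item)


def merge(a, b):
    """Merge two items with equal keys, widening the repeat-count range."""
    if isinstance(a, Repeat):
        block = tuple(merge(x, y) for x, y in zip(a.block, b.block))
        return Repeat(block, min(a.lo, b.lo), max(a.hi, b.hi))
    return a


def collapse_period(items, p):
    """One left-to-right pass collapsing back-to-back repeats of exactly p items."""
    keys = [key(x) for x in items]
    out, i, n = [], 0, len(items)
    while i < n:
        c = 1
        while keys[i + c * p:i + (c + 1) * p] == keys[i:i + p]:
            c += 1
        if c == 1:
            out.append(items[i])
            i += 1
        else:
            block = list(items[i:i + p])
            for r in range(1, c):
                for j in range(p):
                    block[j] = merge(block[j], items[i + r * p + j])
            out.append(Repeat(tuple(block), c, c))
            i += p * c
    return out


def compress(lines, max_period=50):
    """Collapse shortest periods first and restart after every change."""
    items = list(lines)
    changed = True
    while changed:
        changed = False
        for p in range(1, max_period + 1):
            new = collapse_period(items, p)
            if len(new) < len(items):
                items, changed = new, True
                break
    return items


def render(items, indent=0):
    """Turn the nested structure back into indented text."""
    pad = "  " * indent
    out = []
    for it in items:
        if isinstance(it, Repeat):
            times = str(it.lo) if it.lo == it.hi else f"BETWEEN {it.lo} AND {it.hi}"
            out.append(f"{pad}NEXT {len(it.block)} ITEMS REPEAT {times} TIMES")
            out.extend(render(it.block, indent + 1))
        else:
            out.append(pad + it)
    return out
```

Restarting from the shortest period after every change is what lets an inner run collapse before the outer block that contains it is compared. It is quadratic-ish and holds the whole log in memory, so it is a starting point rather than something to run on a huge trace.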



Here's my quick hack that I ran in vw7.7nc:

| f o lines maxrun repeats range |
f := '../Cog/squeak.strace.log' asFilename readStream.
o := 'compressed.log' asFilename writeStream.
lines := OrderedCollection new.
maxrun := 50.
repeats := 0.
range := nil.
[[f atEnd] whileFalse:
    [lines size > maxrun ifTrue:
        [repeats > 0
            ifTrue:
                [1 to: range first - 1 do:
                    [:i| o nextPutAll: (lines at: i); cr].
                 o nextPutAll: 'NEXT '; print: range size; nextPutAll: ' LINES REPEAT ';
                    print: repeats + 1; nextPutAll: ' TIMES'; cr.
                 range do:
                    [:i| o nextPutAll: (lines at: i); cr].
                 lines removeFirst: range last.
                 repeats := 0]
            ifFalse:
                [o nextPutAll: lines removeFirst; cr; flush].
         range := nil].
     lines addLast: (f upTo: Character cr).
     [:exit|
      1 to: lines size do:
        [:i| | line repeat |
         line := lines at: i.
         repeat := lines nextIndexOf: line from: i + 1 to: lines size.
         (repeat ~~ nil
          and: [lines size >= (repeat - i * 2 + i)
          and: [(i to: repeat - 1) allSatisfy: [:j| (lines at: j) = (lines at: j - i + repeat)]]]) ifTrue:
            [repeats := repeats + 1.
             range isNil
                ifTrue: [range := i to: repeat - 1]
                ifFalse:
                    [range = (i to: repeat - 1) ifTrue:
                        [range do: [:ignore| lines removeAtIndex: repeat].
                         exit value]]]]] valueWithExit]]
    ensure: [f close. o close].
repeats
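
For anyone following along outside VisualWorks, the general idea (find a block of lines that repeats back to back, emit it once with a count) can be sketched in Python roughly as follows. This is not a translation of the script above: it reads the whole input as a list instead of streaming through a 50-line window, and `collapse_repeats`, `render` and `max_period` are names invented for the sketch.

```python
from typing import List, Tuple

Run = Tuple[List[str], int]  # (block of lines, number of consecutive repetitions)


def collapse_repeats(lines: List[str], max_period: int = 50) -> List[Run]:
    """Greedily collapse back-to-back repeats of blocks up to max_period lines long."""
    out: List[Run] = []
    i, n = 0, len(lines)
    while i < n:
        best_period, best_count = 1, 1
        for period in range(1, min(max_period, (n - i) // 2) + 1):
            block = lines[i:i + period]
            count = 1
            while lines[i + count * period:i + (count + 1) * period] == block:
                count += 1
            # Prefer the candidate covering the most lines; ties keep the shorter block.
            if count > 1 and count * period > best_count * best_period:
                best_period, best_count = period, count
        out.append((lines[i:i + best_period], best_count))
        i += best_period * best_count
    return out


def render(runs: List[Run]) -> List[str]:
    """Write each repeated block once, preceded by a NEXT ... REPEAT header."""
    text: List[str] = []
    for block, count in runs:
        if count > 1:
            text.append(f"NEXT {len(block)} LINES REPEAT {count} TIMES")
        text.extend(block)
    return text
```

Calling `render(collapse_repeats(lines))` on a list of log lines gives output in the same shape as the reduced sample earlier in this message. Lines that differ only in a timestamp still count as different, so they break a run.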

Forgive the cross post.  I expect deep expertise in each newsgroup posted to.

best
Eliot

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: OT: compressing log files

Matthias Berth-2
Hi Eliot,

maybe RunArray helps here?

From the class comment of RunArray (in Pharo):
My instances provide space-efficient storage of data which tends to be
constant over long runs of the possible indices. Essentially repeated
values are stored singly and then associated with a "run" length that
denotes the number of consecutive occurrences of the value.
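
As a rough illustration of what that comment describes (a Python sketch of plain run-length encoding, not RunArray's actual implementation; the function names are made up):

```python
from itertools import groupby


def run_length_encode(items):
    """Store each run of equal consecutive values once, with its length."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(items)]


def run_length_decode(runs):
    """Expand (value, length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]
```

For example, `run_length_encode(["a", "a", "a", "b", "b", "c"])` stores three pairs instead of six values.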

Cheers

Matthias

2010/2/9 Eliot Miranda <[hidden email]>:

> Hi All,
>     I've just needed to make sense of a very long log file generated by
> strace.  [...]

Re: OT: compressing log files

Nicolas Cellier
The problem is that (#(1 2 1 2 1 2) as: RunArray) does not save any space...
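
To make that concrete outside Smalltalk, here is a small Python sketch (hypothetical helper names, ordinary run-length encoding standing in for RunArray): an alternating sequence has no two equal neighbours, so every run has length 1, whereas grouping the elements into pairs first exposes a single run.

```python
from itertools import groupby


def rle(items):
    """Run-length encode: one (value, length) pair per run of equal neighbours."""
    return [(value, len(list(group))) for value, group in groupby(items)]


def chunk(items, period):
    """Group the sequence into consecutive tuples of `period` elements."""
    return [tuple(items[i:i + period]) for i in range(0, len(items), period)]


data = [1, 2, 1, 2, 1, 2]
plain = rle(data)             # six runs of length 1: nothing saved
paired = rle(chunk(data, 2))  # one run: the pair (1, 2) three times
```

So a run-based structure would only help here if the unit being repeated were a block of lines rather than a single line, which is the part the original script has to discover.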

Nicolas

2010/2/10 Matthias Berth <[hidden email]>:

> Hi Eliot,
>
> maybe RunArray helps here?
>
> From the class comment of RunArray (in Pharo):
> My instances provide space-efficient storage of data which tends to be
> constant over long runs of the possible indices. Essentially repeated
> values are stored singly and then associated with a "run" length that
> denotes the number of consecutive occurrences of the value.
>
> Cheers
>
> Matthias