Hi All,
I've just needed to make sense of a very long log file generated by strace. The log file is full of entries like:

--- SIGALRM (Alarm clock) @ 0 (0) ---
gettimeofday({1265744804, 491238}, NULL) = 0
sigreturn() = ? (mask now [])
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0

and my workspace script reduces these to e.g.

--- SIGALRM (Alarm clock) @ 0 (0) ---
gettimeofday({1265744797, 316183}, NULL) = 0
sigreturn() = ? (mask now [])
NEXT 2 LINES REPEAT 715 TIMES
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
gettimeofday({1265744797, 317189}, NULL) = 0
sigreturn() = ? (mask now [])

My question is: has anyone looked at this issue in any depth and perhaps come up with something not as crude as the below, and possibly even recursive? I.e. the above would ideally be reduced to e.g.

NEXT 7 LINES REPEAT 123456 TIMES
--- SIGALRM (Alarm clock) @ 0 (0) ---
gettimeofday({1265744797, 316183}, NULL) = 0
sigreturn() = ? (mask now [])
NEXT 2 LINES REPEAT BETWEEN 500 AND 800 TIMES
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
ioctl(8, 0x80045530, 0xbfd4fe70) = 0
ioctl(8, 0xc1205531, 0xbfd4fb80) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
gettimeofday({1265744797, 317189}, NULL) = 0
sigreturn() = ? (mask now [])

Here's my quick hack that I ran in vw7.7nc:

| f o lines maxrun repeats range |
f := '../Cog/squeak.strace.log' asFilename readStream.
o := 'compressed.log' asFilename writeStream.
lines := OrderedCollection new.
maxrun := 50.
repeats := 0.
range := nil.
[[f atEnd] whileFalse:
    [lines size > maxrun ifTrue:
        [repeats > 0
            ifTrue:
                [1 to: range first - 1 do:
                    [:i| o nextPutAll: (lines at: i); cr].
                 o nextPutAll: 'NEXT '; print: range size; nextPutAll: ' LINES REPEAT ';
                    print: repeats + 1; nextPutAll: ' TIMES'; cr.
                 range do:
                    [:i| o nextPutAll: (lines at: i); cr].
                 lines removeFirst: range last.
                 repeats := 0]
            ifFalse:
                [o nextPutAll: lines removeFirst; cr; flush].
         range := nil].
     lines addLast: (f upTo: Character cr).
     [:exit|
      1 to: lines size do:
        [:i| | line repeat |
         line := lines at: i.
         repeat := lines nextIndexOf: line from: i + 1 to: lines size.
         (repeat ~~ nil
          and: [lines size >= (repeat - i * 2 + i)
          and: [(i to: repeat - 1) allSatisfy: [:j| (lines at: j) = (lines at: j - i + repeat)]]]) ifTrue:
            [repeats := repeats + 1.
             range isNil
                ifTrue: [range := i to: repeat - 1]
                ifFalse:
                    [range = (i to: repeat - 1) ifTrue:
                        [range do: [:ignore| lines removeAtIndex: repeat].
                         exit value]]]]] valueWithExit]]
    ensure: [f close. o close].
repeats

Forgive the cross post. I expect deep expertise in each newsgroup posted to.

best
Eliot

A bit of a strain on the old garbage collector, but a Bag is good for that kind of analysis:

f := FileStream fileNamed: 'strace.txt'.
lines := Bag new.
[[f atEnd] whileFalse: [lines add: (f upTo: Character lf)]]
    ensure: [f close].
lines sortedCounts inspect

Dave

On Tue, Feb 09, 2010 at 01:34:19PM -0800, Eliot Miranda wrote:
> Hi All,
>
> I've just needed to make sense of a very long log file generated by
> strace. [...]

On Tue, Feb 9, 2010 at 3:44 PM, David T. Lewis <[hidden email]> wrote:
> A bit of a strain on the old garbage collector, but a Bag is good

That doesn't do what I want. That gives the frequency of each line. I want a shortened file that I can browse more easily, where successive runs of multiple lines are compressed down to a single run of the multiple lines marked with a repeat count. Instead of having to wade through pages and pages of the same N lines, there is just one occurrence of those N lines prefixed with a repeat count. So the condensed log preserves the ordering of the events it logs but is much abbreviated.

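For readers who want to experiment with the behaviour described here (consecutive repeats of a block of N lines collapsed to one copy with a repeat count, ordering preserved), the idea can be sketched roughly as follows. This is Python rather than Smalltalk, it is not any poster's code, and the names `collapse_runs`, `format_runs` and the `max_block` limit are assumptions made for illustration only.

```python
def collapse_runs(lines, max_block=25):
    """Collapse consecutive repeats of a block of up to max_block lines.

    Returns (count, block) pairs in the original order. At each position
    the candidate block that covers the most lines wins; on a tie the
    shorter block is kept. Blocks longer than max_block are never found.
    """
    runs = []
    i, n = 0, len(lines)
    while i < n:
        best_count, best_size = 1, 1
        for size in range(1, max_block + 1):
            if i + 2 * size > n:
                break  # no room left for even one repeat of this size
            block = lines[i:i + size]
            count = 1
            # Count how many further copies of the block follow directly.
            while lines[i + count * size:i + (count + 1) * size] == block:
                count += 1
            if count > 1 and count * size > best_count * best_size:
                best_count, best_size = count, size
        runs.append((best_count, lines[i:i + best_size]))
        i += best_count * best_size
    return runs


def format_runs(runs):
    """Render runs in the 'NEXT n LINES REPEAT k TIMES' style of the thread."""
    out = []
    for count, block in runs:
        if count > 1:
            out.append(f"NEXT {len(block)} LINES REPEAT {count} TIMES")
        out.extend(block)
    return out
```

A possible use would be `format_runs(collapse_runs(open(path).read().splitlines()))`, where `path` is whatever log file is being condensed. The sketch holds the whole file in memory and rescans candidate blocks at every position, so it trades speed for simplicity.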
In reply to this post by Eliot Miranda-2
Hi Eliot -
Fun challenge. The problem is very similar to the LZ family of compressors (http://en.wikipedia.org/wiki/LZ77_and_LZ78), which basically find duplicate strings, so something similar is applicable here. My take is this:

"Set up the input. Change this to your file and use #nextLine."
stream := 'AAAABCBCBCABCABCA' readStream.
input := [:buf| stream atEnd ifFalse:[buf add: stream next]].

"Set up the output. Takes the repeat count and the list of elements.
Change this to your desired output format."
output := [:count :list|
    Transcript show: count printString,'x'.
    list do:[:element| Transcript show: element].
    Transcript space.
].

"The matching algorithm itself. It looks ahead for a match of the first
element and if it repeats, simply eats the input until the match ends.
Could be improved in various ways, for example by detecting the longest
match instead of taking the first match (this leads to AABAAB being
output as 2xA 1xB 2xA 1xB instead of 2xAAB)."
lookahead := 20. "twice the length of longest match we can process"
buffer := OrderedCollection new.
[lookahead - buffer size timesRepeat:[input value: buffer].
buffer isEmpty] whileFalse:[
    repeat := 1.
    match := 1.
    "find first occurrence of first element in lookahead buffer"
    [repeat = 1 and:[
        match := buffer indexOf: buffer first startingAt: match+1 ifAbsent:[0].
        match between: 2 and: lookahead // 2]] whileTrue:[
        "see if we have a repeat pattern"
        pattern := buffer copyFrom: 1 to: match-1.
        stop := match+match-2.
        [buffer size >= stop and:[pattern = (buffer copyFrom: match to: stop)]] whileTrue:[
            "Eat the pattern and repeat"
            repeat := repeat + 1.
            buffer removeFirst: match-1.
            match-1 timesRepeat:[input value: buffer].
        ].
        repeat > 1 ifTrue:[
            buffer removeFirst: match-1.
            output value: repeat value: pattern.
        ].
    ].
    repeat = 1 ifTrue:[
        output value: 1 value: (buffer copyFrom: 1 to: 1).
        buffer removeFirst.
    ].
].

For the input string 'AAAABCBCBCABCABCA' this will output 4xA 3xBC 2xABC 1xA.

Just substitute "stream nextLine" for the input and format your output accordingly and that should do the trick.

Cheers,
  - Andreas

Eliot Miranda wrote:
> Hi All,
>
> I've just needed to make sense of a very long log file generated by
> strace. [...]

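The nested form asked for in the original post ("NEXT 7 LINES REPEAT ... TIMES" wrapping "NEXT 2 LINES REPEAT BETWEEN ... AND ... TIMES") could perhaps be approached with two passes of the same repeat detection: one over lines, then one over the resulting runs while ignoring their counts. The following is only a sketch under stated assumptions, written in Python rather than Smalltalk. The names `group_repeats`, `normalise`, `compress_nested` and the `max_block` limit are hypothetical, and it assumes the `gettimeofday` timestamp is the only thing that differs between otherwise identical iterations.

```python
import re


def group_repeats(items, key, max_block=8):
    """Greedy left-to-right grouping of consecutive repeated blocks.

    Returns a list of groups; each group is a list of copies, and each
    copy is a list of items. Copies match when their keys are equal.
    """
    keys = [key(x) for x in items]
    groups, i, n = [], 0, len(items)
    while i < n:
        best_count, best_size = 1, 1
        for size in range(1, max_block + 1):
            if i + 2 * size > n:
                break
            block = keys[i:i + size]
            count = 1
            while keys[i + count * size:i + (count + 1) * size] == block:
                count += 1
            if count > 1 and count * size > best_count * best_size:
                best_count, best_size = count, size
        groups.append([items[i + c * best_size:i + (c + 1) * best_size]
                       for c in range(best_count)])
        i += best_count * best_size
    return groups


def normalise(line):
    # Assumption: only the gettimeofday timestamp varies between iterations.
    return re.sub(r"gettimeofday\(\{\d+, \d+\}", "gettimeofday({...}", line)


def compress_nested(lines, max_block=8):
    lines = [normalise(line) for line in lines]
    # Pass 1: runs of identical line blocks become (count, block) tokens.
    inner = [(len(g), tuple(g[0]))
             for g in group_repeats(lines, lambda l: l, max_block)]
    # Pass 2: runs of token blocks, comparing the blocks but not the counts.
    out = []
    for copies in group_repeats(inner, lambda t: t[1], max_block):
        body = []
        for j, (_, block) in enumerate(copies[0]):
            counts = [copy[j][0] for copy in copies]
            lo, hi = min(counts), max(counts)
            if hi > 1:
                times = str(lo) if lo == hi else f"BETWEEN {lo} AND {hi}"
                body.append(f"NEXT {len(block)} LINES REPEAT {times} TIMES")
            body.extend(block)
        if len(copies) > 1:
            # The outer count includes nested header lines, as in the post.
            out.append(f"NEXT {len(body)} LINES REPEAT {len(copies)} TIMES")
        out.extend(body)
    return out
```

One known limitation of this sketch: an outer iteration shorter than `max_block` lines can be swallowed whole by the first pass, and one longer than `max_block` tokens is never detected by the second, so the limit needs tuning to the log at hand.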