Princeton University |
Computer Science 597d |
|
I have completed a series of optimizations as follows...
gopti3 - implements FRP starting with your opti5 code. No performance improvement.
gopti5 - starting from the orig code, I tried to compact the branches together. I noticed that the limiting factor seemed to be there was a data dependency between iterations based on the gotspace register which determines if a space character was seen in the last iteration. I broke this dependency by creating a separate gotspace register for each iteration. This decreased the cycle count from 289353 (orig) to 210456 (gopti5).
gopti6 - In this optimization I combine FRP (gopti3) with gotspace register renaming (gopti5). This gave an additional performance increase bringing the best count down to 171003.
gopti7 - I noticed that the code wasn't well packed in gopti6 and I thought the limiting factor might be the data dependence formed from the word_count and line_count accumulators. So I duplicated the accumulators and did register renaming to give each iteration it's own accumulator. This didn't help much and actually slowed the loop down from 171003 to 171020.
gopti8 - Startin from gopti7, I thought the limiting factor might be the boolean optimization, since you only implemented that for iters 1-4. So I added the boolean optimizations to iters 5-8. This actually helped quite a bit as it reduced the dependence height of the predicate calculations. The cycle count was brought down to 144722.
I have attached a tar-zip file of the gopti directories. There is a brief README file in each directory showing the cycle count and schedule.
Thanks,
Grant.
orig/wc.HS_im_p.input1.stat:(best_cycle_count 289353.000000)
gopti3/wc.HS_im_p.input1.stat:(best_cycle_count 236751.000000)
gopti5/wc.HS_im_p.input1.stat:(best_cycle_count 210456.000000)
gopti6/wc.HS_im_p.input1.stat:(best_cycle_count 171003.000000)
gopti7/wc.HS_im_p.input1.stat:(best_cycle_count 171020.000000)
gopti8/wc.HS_im_p.input1.stat:(best_cycle_count 144722.000000)
-----------------+---------------------------------------------------------------------------------------------------------------
cb 35 13150.0 | 55
-> 42 0.0 | 56
-> 42 0.0 | 57
-> 42 0.0 | 58
-> 42 1.0 | 59
-> 42 0.0 | 60
-> 42 0.0 | 61
-> 42 0.0 | 62
-> 35 1.0 | 63
-> 42 13148.0 | 64
| 65
| 66
| 67
| 68
| 69
| 70
| 0 81 72 75 73 74 71L - - - - - - - - - -
| 1 94 85 88 87 82 86 80 77 79 76 84L - - - - -
| 2 107 90 98 101 100 95 93 99 89 78 92 97L - - - 83J
| 3 120 103 102 111 114 113 106 108 112 91 105 110L - - - 96J
| 4 132 116 104 123 126 125 124 119 118 115 122L - - - - 109J
| 5 144 128 127 135 138 137 131 136 134L 117 130 - - - - 121J
| 6 147 150 149 156 148 140 143 139 129 142 146L - - - - 133J
| 7 152 159 162 161 155 151 158L 168 160 141 154 - - - - 145J
| 8 163 153 164 166 167 170 171 172 173 174 175 176 177 - - 157J
| 9 165 178 - - - - - - - - - - - - - 169J
| 10 179 - - - - - - - - - - - - - - 180J
-----------------+---------------------------------------------------------------------------------------------------------------
I know there are better schedules out there.... Firm due date is January 19th.