Optimum polynomial


Fork me on GitHub
2017-11-19

Problem 101: Optimum polynomial

Description:

If we are presented with the first k terms of a sequence it is impossible to say with certainty the value of the next term, as there are infinitely many polynomial functions that can model the sequence.

As an example, let us consider the sequence of cube numbers. This is defined by the generating function,

u_n = n^3: 1, 8, 27, 64, 125, 216, ...

Suppose we were only given the first two terms of this sequence. Working on the principle that "simple is best" we should assume a linear relationship and predict the next term to be 15 (common difference 7). Even if we were presented with the first three terms, by the same principle of simplicity, a quadratic relationship should be assumed.

We shall define OP(k, n) to be the nth term of the optimum polynomial generating function for the first k terms of a sequence. It should be clear that OP(k, n) will accurately generate the terms of the sequence for n <= k, and potentially the first incorrect term (FIT) will be OP(k, k+1); in which case we shall call it a bad OP (BOP).

As a basis, if we were only given the first term of sequence, it would be most sensible to assume constancy; that is, for n >= 2, OP(1, n) = u_1.

Hence we obtain the following OPs for the cubic sequence:

OP(1, n) = 1            1,     **1**, 1,      1,           ...
OP(2, n) = 7n-6         1,     8,     **15**,              ...
OP(3, n) = 6n^2-11n+6   1,     8,     27,   **58**,        ...
OP(4, n) = n^3          1,     8,     27,     64,     125, ...

Clearly no BOPs exist for k >= 4.

By considering the sum of FITs generated by the BOPs (indicated in red above), we obtain 1 + 15 + 58 = 74.

Consider the following tenth degree polynomial generating function:

un = 1 - n + n^2 - n^3 + n^4 - n^5 + n^6 - n^7 + n^8 - n^9 + n^10

Find the sum of FITs for the BOPs.


Solution:
vXXXX>1-v########### ###########   #
0XXXX0  1########### ###########   #
2XXXX1  0########### ###########   #                             >$22g1+22p v
3XXXX-v1<########### ###########   #              > g/\37*+22gp:!|>   :2v2: #     <
>p101^>-101-101-199 v###########   #              ^01gg22+*73::-1<  *53<1v\g33%p3<-
v_v#-7:-1:g01<p01++1<###########   #                              ^  < pp>:!#v_:3^1
>#$     10p0p^       ###########   #           >g11gv             _$v  0>|   >$21g^
  >$49*5+4*>:0\:35*%v###########   #>g:1+40p5-vg    >1-:37*+12gg\:^v<  ^1<  >   v
v31p02<v:p/ *53\+*73<###########   #^04p-1g04<62v75<  v:pg22+*7<v11< ^   <
>0p v :>1-\|     >$1v###########   #>`!*\37*v*-1*>$^v < >2gg-\3^>g>:2v2:       <
 v00< ^:<  ># #< ^# <###########   #^g11g04<+7 ^<|! <>37 *+22ggv     1v\g33%p3<-
 >:9+0g30g:20g* 30p*v               >\40g*v*45 >- :::^v<^1+*73\<     p>:!#v_:3^1
 |-+55\+1:\+g01 \p01< >140p>0>:::1\:|:-1\ <`0* ^1<*53$<|!:pg22+<     >|   >$21g^
 >$\8+1p1+:6-6-|>$111p^ v< |<|   -6<>$\11g\^g! >+22gg32 g*\37* ^v53p01<  ^     <
>             v>^>    ^ +* 0 >$40gv^-5:+1p-1<` ^*73::-1<*53$< v*<
   v      *53<>  ^      27 1^    <>8+1g40g11g^ >*\37*+12gp:!| >1-::37*+12gg1v
     >g37*+1#^ v# p24gg2<3 2     ^      >#  v#<^g24gg21+*73: <|:pg21+*73\/g0<
     2>12g1-12v>2gg32p22g^ >p>11g1-12g`!|v  #      p21+1g21< :>$57*22gg11g>1-:v
     2|!`g22g11<p22+1g21<    ^         <> 12g1+22p>11g22g`!| -    v:\gg22+ *73<g
  v <^<       > $p012g`!|!`g210<p21-2g<^ <v21p23gg21+*73g21< 1    >       |>$11^
  $ |!:pg21+*73 \*g2<   >           v ^11<>g37*+22# gg42p35*>^            >^
  3>>1-::37*+12 gg4 ^ >v2:       <       ^$$<     ^                             <
  > 5*>1-::37*+ 22gg3v 1v\g33%p3<-                >v2:       <
 v53$<|:pg22+*7 3\*g2< p>:!#v_:3^1                 1v\g33%p3<-
 *   ^<               2>|   >$21g^                 p>:!#v_:3^1
 >1-:::37*+12gg \37*v ^: g11$<      #              >|   >$21g^
 |:pg21+*73\-gg 22+ < v #   _^ >$57*22gg11g>1-:37*+2 2gg\:v
 >$57*12gg11g>1 -:37*+12gg\:^  |!:pg21+*73\ /g0<v # #     _v>$22g1+22pv
             ^       $< >10p35*>1-::37*+12g g1 ^  ^ 2:g11 $<|!:pg22+*7 3\/g0<
^                                   $ <    ^    <   >10p35* >1-::37*+2 2gg1 ^
>+\,,,23g.@    ^                    $#                                <
*v*p03*g02:g03g2+9:<00p<v *75::-1<+<>11g>1-::37*+\g0\`!#v_:12p35*>1-::37*+12gg0v
6>10p\10g+\:1+\55+-|   0>\g\9+2p:| ^65$<|:              <      $<|:pg21+*73\*-1<
9   v!-6-6:+1p3+8\$<   ^31p02::1$<    #^<                       ^<
7v0$_           ::20p13^              ^              <
:>31p>31g9+1g31g9+3g-!#v_31g9+3g:.55+,23g+23p11g1+11p^
^*48_^#  !-6-5p13:+1g13<
Start
??
Pause
Reset
Output:
Stack:   (0)

Explanation:

This resulted in a nice program, not much data, a good amount of code and there is imho still a fair amount of possibility for more compact program flow.

Now to our algorithm:

First we remember our polynomial as an 11-element array at [9|0]:

[ +1; -1; +1; -1; +1; -1; +1, -1; +1; -1; +1 ]

Then we calculate the real sequence at [9|1]. We only need the first 11 terms, because no later than OP(11, n) we should have found the correct sequence.

seq = [1; 683; 44287; 838861; 8138021; 51828151; 247165843; 954437177; 3138105961; 9090909091; 23775972551 ]

Then we start with k=1 and try guessing the polynomial parameters for the first k terms of our sequence. We generate k equations for the k values we "know" of our resulting sequence, here's an example how this would look for k=4:

t_0 * 1 + t_1 * 1 + t_2 *  1 + t_3 *  1 = 1              (n=1)
t_0 * 1 + t_1 * 2 + t_2 *  4 + t_3 *  8 = 683            (n=2)
t_0 * 1 + t_1 * 3 + t_2 *  9 + t_3 * 27 = 44287          (n=3)
t_0 * 1 + t_1 * 4 + t_2 * 16 + t_3 * 64 = 838861         (n=4)

where we are searching (guessing) for a polynomial of the form:

t_0 * n^0 + t_1 * n^1 + t_2 * n^2 + t_3 * n^3 + ...

Represented in memory are these (up to 11) equations with up to 11 elements as an 2d array (the last element is the result):

[1; 1;  1;  1; 0; 0; 0; 0; 0; 0; 0; 1     ]
[1; 2;  4;  8; 0; 0; 0; 0; 0; 0; 0; 683   ]
[1; 3;  9; 27; 0; 0; 0; 0; 0; 0; 0; 44287 ]
[1; 4; 16; 64; 0; 0; 0; 0; 0; 0; 0; 838861]

The next step is obviously solving the equation system (its linear and we have k variables and k equations, should be possible).
We are using the gaussian elimination method. In the first step (the Forward Elimination) we bring our system into the row echelon form. Theoretically we only need two operations for this: "Multiply an equation by a factor" and "Subtract one equation from another". But the values of our variables can grow in the steps of our algorithm quite large, so after every operation we "simplify" all modified equations. This means we calculate the GCD of all the variables, and the result of an equation and then multiply the equation by 1/gcd. This way we have again the smallest representation of this equation. After all this our equation system should look like this:

[1; 1; 1; 1; 0; 0; 0; 0; 0; 0; 0; 1     ]
[0; 1; 3; 7; 0; 0; 0; 0; 0; 0; 0; 682   ]
[0; 0; 1; 6; 0; 0; 0; 0; 0; 0; 0; 21461 ]
[0; 0; 0; 1; 0; 0; 0; 0; 0; 0; 0; 118008]

Next we do the back substitution step, which is also just a bunch of subtracting one equation from another. The result should look like this:

[1; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; -665807 ]
[0; 1; 0; 0; 0; 0; 0; 0; 0; 0; 0; +1234387]
[0; 0; 1; 0; 0; 0; 0; 0; 0; 0; 0; -686587 ]
[0; 0; 0; 1; 0; 0; 0; 0; 0; 0; 0; +118008 ]

In the actual implementation it can happen here that some rows have -1 as their value at eq[i;i]. To fix this we simply multiply all equations with eq[i;i] in the last step.

From the solved equation system we can now read our current best guesses for t_0..t_11:

[ -665807; +1234387; -686587; +118008; 0; 0; 0; 0; 0; 0; 0 ]

and calculate our (best guess) for the sequence:
(again we only need the first 11 terms. After them we either have found a FIT, or found the correct values for t_0..t_11)

[ 1; 683; 44287; 838861; 3092453; 7513111; 14808883; 25687817; 40857961; 61027363; 86904071 ]

Next we compare this sequence with the correct sequence (that we calculated at the beginning). In this case we find the FIT at term 5 (3092453). That means we remember its value an try the whole process again with k=k+1.

If all 11 terms were equal, we would have found the correct polynomial. We then output the sum of all previous FIT values and terminate our program.

If you're running this program in a good visual interpreter (for example BefunExec) be sure to look at the top part were the equations are stored. It's interesting seeing the gaussian elimination algorithm live in action.


Interpreter steps: 1 825 656
Execution time (BefunExec): 250ms (7.30 MHz)
Program size: 83 x 37
Solution: 37076114526
Solved at: 2017-11-19



made with vanilla PHP and MySQL, no frameworks, no bootstrap, no unnecessary* javascript