Curve Fitting: Adjusting Data Points
Graphs are generated with the HP Prime emulator software.
A Curve Fitting Problem
Problem:
We have an investment project that will cost us 50,000 at the start (insert the currency of your choice). Our analysis gives the following projected cash flows at the beginning of each period:
Period (beginning of) | Cash Flow
1 | -50000
2 | -3000
3 | 9000
4 | 16000
5 | 18000
6 | 21000
What is the projected cash flow at the beginning of period 7?
We could use linear regression, but the data do not fit a straight line well. So we just switch to another regression model, and that’s simple, right? Not quite, since two of the cash flows are negative.
Approach
Most calculators have at least the following curve fitting regressions:
Regression | Model | Transformed | X’ = | Y’ = | True B =
Linear | Y = B + M * X | Y = B + M * X | X | Y | B
Exponential | Y = B * exp(M * X) | ln(Y) = ln(B) + M * X | X | ln(Y) | exp(B)
Logarithmic | Y = B + M * ln(X) | Y = B + M * ln(X) | ln(X) | Y | B
Power | Y = B * X^M | ln(Y) = ln(B) + M * ln(X) | ln(X) | ln(Y) | exp(B)
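Three of the four models are fitted by transforming the data (taking the natural logarithm of X, Y, or both) so that an ordinary linear regression can be run on the transformed values. After the regression returns the slope (M) and y-intercept (B) of the transformed line, the last column converts that intercept into the B of the original curve.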
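The transform-then-regress approach in the table can be sketched in plain Python. This is a minimal illustration under that assumption, not how any particular calculator implements it; the names `linear_fit`, `MODELS`, and `fit` are hypothetical, and the returned correlation `r` is the one for the transformed (straight-line) data.

```python
import math

def linear_fit(xs, ys):
    """Least-squares line y = B + M*x; returns (B, M, r)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = my - m * mx
    r = sxy / math.sqrt(sxx * syy)
    return b, m, r

def identity(v):
    return v

# model name -> (transform for X, transform for Y, fitted intercept -> true B)
MODELS = {
    "linear":      (identity, identity, identity),
    "exponential": (identity, math.log, math.exp),
    "logarithmic": (math.log, identity, identity),
    "power":       (math.log, math.log, math.exp),
}

def fit(model, xs, ys):
    """Fit one of the four models; returns (true B, M, r of transformed data).
    math.log raises ValueError if a value it must transform is not positive."""
    tx, ty, true_b = MODELS[model]
    b, m, r = linear_fit([tx(x) for x in xs], [ty(y) for y in ys])
    return true_b(b), m, r

# Usage with made-up data that follows Y = 3 * X^2 exactly:
xs = [1, 2, 3, 4]
print(fit("power", xs, [3 * x ** 2 for x in xs]))  # approximately (3.0, 2.0, 1.0)
```

Because every model reuses the same `linear_fit`, adding another linearizable model only means adding one row to `MODELS`.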
The statistical calculations operate on real numbers. The natural logarithm of a negative number is not a real number, and ln(0) is undefined, so the transformation step fails whenever an exponential, logarithmic, or power fit meets non-positive data: the exponential fit needs Y > 0, the logarithmic fit needs X > 0, and the power fit needs both.
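Python's standard library shows the same restriction. In this small sketch, `real_log` is a hypothetical helper: `math.log` raises `ValueError` for zero or negative input, while `cmath.log` does return a value for a negative number, but it is complex.

```python
import cmath
import math

def real_log(value):
    """Natural log as a real number, or None when no real result exists."""
    try:
        return math.log(value)
    except ValueError:
        # math.log raises "math domain error" for zero and negative input
        return None

for value in (-3000, 0, 1):
    print(value, real_log(value))   # None, None, then 0.0

# Complex result for a negative input: real part ln(3000), imaginary part pi.
print(cmath.log(-3000))
```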
Still, a straight line is not the best fit for this data, so we want to be able to try the other curves. This calls for an adjustment of the data: if the minimum of the X values, the Y values, or both is zero or negative, we subtract “that minimum minus one” from every value in that set. Conceptually:
If min(X) ≤ 0, then:
P = min(X) -1
X_adj = X – P
If min(Y) ≤ 0, then:
Q = min(Y) -1
Y_adj = Y – Q
Why the “minus 1”?
If we subtracted only the minimum, the lowest adjusted value would be 0. But ln(0) is undefined (ln(x) goes to negative infinity as x approaches 0), and calculators cannot work with that.
If we subtract the minimum minus one, the lowest adjusted value is 1, and ln(1) = 0. Beautiful!
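The adjustment rule translates directly into a small helper. This is a sketch applied to one data set (X or Y) at a time; `shift_for` and `adjust` are hypothetical names.

```python
def shift_for(values):
    """Shift P (or Q) from the rule above: min - 1 when min <= 0, else 0."""
    low = min(values)
    return low - 1 if low <= 0 else 0

def adjust(values):
    """Return (shift, adjusted values). After a shift the smallest value is 1."""
    s = shift_for(values)
    return s, [v - s for v in values]

# Usage with made-up numbers:
print(adjust([-5, 0, 4]))   # (-6, [1, 6, 10])
print(adjust([2, 3, 7]))    # (0, [2, 3, 7])
```

Returning the shift along with the data keeps the number needed to undo the adjustment after the fit.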
Example: Adjusting the Raw Data
Raw data:
X | Y
1 | -50000
2 | -3000
3 | 9000
4 | 16000
5 | 18000
6 | 21000
min(X) = 1 > 0. No adjustment is needed.
min(Y) = -50000 ≤ 0. An adjustment is needed.
Let Q = min(Y) – 1 = -50000 – 1 = -50001
Adjusted data:
X | Y | Y’ = Y – Q | ln(Y’) (fix 3)
1 | -50000 | 1 | 0
2 | -3000 | 47001 | 10.758
3 | 9000 | 59001 | 10.985
4 | 16000 | 66001 | 11.097
5 | 18000 | 68001 | 11.127
6 | 21000 | 71001 | 11.170
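The adjusted table can be reproduced in a few lines of Python. This sketch only recomputes Q, Y’, and ln(Y’) from the raw data above.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
ys = [-50000, -3000, 9000, 16000, 18000, 21000]

# min(X) = 1 > 0, so X is left alone; min(Y) <= 0, so Y is shifted.
q = min(ys) - 1                    # Q = -50001
y_adj = [y - q for y in ys]        # Y' = Y - Q = Y + 50001
ln_y_adj = [math.log(v) for v in y_adj]

for x, y, v, lv in zip(xs, ys, y_adj, ln_y_adj):
    print(f"{x} | {y} | {v} | {lv:.3f}")
```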
Let’s fit the adjusted curve.
Keep in mind that:
Y’ = Y – Q
Y’ = Y – (-50001)
Y’ = Y + 50001
Y = Y’ – 50001
Fitting the Curve
Determine which of the four regressions (linear, exponential, logarithmic, or power) best fits the adjusted data points. Then predict the actual cash flow (Y) for period 7.
Here is the data. We are going to use the data sets X and Y’.
X | Y | Y’ = Y – Q (Q = -50001)
1 | -50000 | 1
2 | -3000 | 47001
3 | 9000 | 59001
4 | 16000 | 66001
5 | 18000 | 68001
6 | 21000 | 71001
Correlation Coefficients:
Model | Correlation (fix 4)
Linear | 0.8477
Exponential | 0.6772
Logarithmic | 0.9520
Power | 0.8290
Calculations were made with an HP 14B calculator. The best fit is the model whose correlation has the largest absolute value: correlations near -1 or +1 indicate a better fit than correlations near 0.
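As a check, the correlation table can be recomputed in Python. This sketch assumes each model's correlation is the ordinary (Pearson) correlation of its transformed pairs: X or ln(X) against Y’ or ln(Y’). The helper `correlation` is hypothetical; its results should agree closely with the table.

```python
import math

def correlation(us, vs):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    return suv / math.sqrt(suu * svv)

xs = [1, 2, 3, 4, 5, 6]
y_adj = [1, 47001, 59001, 66001, 68001, 71001]   # Y' = Y + 50001

ln_x = [math.log(x) for x in xs]
ln_y = [math.log(y) for y in y_adj]

# Each model's r is the correlation of its transformed (X', Y') pairs.
r = {
    "Linear": correlation(xs, y_adj),
    "Exponential": correlation(xs, ln_y),
    "Logarithmic": correlation(ln_x, y_adj),
    "Power": correlation(ln_x, ln_y),
}
for model, value in r.items():
    print(f"{model:<12} {value:.4f}")

# Best fit: largest absolute correlation.
best = max(r, key=lambda name: abs(r[name]))
print("Best fit:", best)
```

Adding a constant to Y does not change a correlation, so the linear r is the same whether Y or Y’ is used; the shift only matters for the models that take ln(Y).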
The best fit is the logarithmic fit. Hence the curve fit is:
Y’ ≈ 9618.6738 + 38498.9034 * ln(X)
Remember that Y’ = Y + 50001. Then:
Y + 50001 ≈ 9618.6738 + 38498.9034 * ln(X)
Y ≈ -40382.3262 + 38498.9034 * ln(X)
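The logarithmic coefficients can be reproduced with ordinary least squares on the pairs (ln X, Y’), which is how the transformation table describes the logarithmic model. This is a plain-Python sketch under that assumption, not the HP 14B's internal algorithm.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
y_adj = [1, 47001, 59001, 66001, 68001, 71001]   # Y' = Y + 50001

# Logarithmic fit = least squares on (ln X, Y').
u = [math.log(x) for x in xs]
n = len(u)
mean_u = sum(u) / n
mean_y = sum(y_adj) / n
s_uu = sum((a - mean_u) ** 2 for a in u)
s_uy = sum((a - mean_u) * (b - mean_y) for a, b in zip(u, y_adj))

m = s_uy / s_uu                  # slope M
b_adj = mean_y - m * mean_u      # intercept for Y'
b_true = b_adj - 50001           # undo the shift: Y = Y' - 50001

print(f"Y' = {b_adj:.4f} + {m:.4f} * ln(X)")
print(f"Y  = {b_true:.4f} + {m:.4f} * ln(X)")
```

Run in double precision, it gives an intercept of about 9618.6738 for Y’ and a slope of about 38498.9034; subtracting 50001 from that intercept gives the -40382.3262 in the equation above. Because the fit uses ln(X) only, shifting Y changes the intercept but not the slope.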
To predict for X = 7:
Y ≈ -40382.3262 + 38498.9034 * ln(7)
Y ≈ 34533.0807
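The prediction is a single evaluation of the back-transformed equation. This Python sketch hard-codes the rounded coefficients B = -40382.3262 and M = 38498.9034 from the equation above, so its result carries the same rounding; the function name `predict_cash_flow` is hypothetical.

```python
import math

# Back-transformed logarithmic fit: Y ≈ B + M * ln(X), rounded coefficients.
B = -40382.3262
M = 38498.9034

def predict_cash_flow(x):
    """Projected cash flow at the beginning of period x (requires x > 0)."""
    return B + M * math.log(x)

print(f"{predict_cash_flow(7):.4f}")
```

Here `predict_cash_flow(7)` evaluates to about 34533.08, matching the hand calculation.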
The projected cash flow at the beginning of period 7 is about 34,533.
Eddie
All original content copyright, © 2011-2024. Edward Shore. Unauthorized use and/or unauthorized distribution for commercial purposes without express and written permission from the author is strictly prohibited. This blog entry may be distributed for noncommercial purposes, provided that full credit is given to the author.