Curve Fitting: Adjusting Data Points
Graphs are generated with the HP Prime emulator software.
A Curve Fitting Problem
Problem:
We have an investment project that will cost us 50,000 at the start (insert the currency of your choice). From our analysis, the projected cash flows at the beginning of each period are:
Period (beginning of) | Cash Flow
1 | -50000
2 | -3000
3 | 9000
4 | 16000
5 | 18000
6 | 21000
What is the projected cash flow at the beginning of period 7?
We could easily use linear regression, but the problem is that the data do not fit a line well. So we can just use another regression model, and that’s simple, right? Not quite, since two of the cash flows are negative.
Approach
Most calculators have at least the following curve-fitting regressions:
Regression | Model | Transformed | X’ = | Y’ = | True B =
Linear | Y = B + M * X | Y = B + M * X | X | Y | B
Exponential | Y = B * exp(M * X) | ln(Y) = ln(B) + M * X | X | ln(Y) | exp(B)
Logarithmic | Y = B + M * ln(X) | Y = B + M * ln(X) | ln(X) | Y | B
Power | Y = B * X^M | ln(Y) = ln(B) + M * ln(X) | ln(X) | ln(Y) | exp(B)
Three of them (exponential, logarithmic, and power) rely on transforming the data so that an ordinary linear regression can be run on the transformed values. After the slope (M) and y-intercept (B) of that regression are calculated, B is converted as shown in the last column to get the correct curve-fitting parameters.
The statistical calculations operate on real numbers. If we try an exponential, logarithmic, or power fit on data containing non-positive numbers, we run into a problem: the transformation takes the natural logarithm (of Y for exponential, of X for logarithmic, of both for power), and the natural logarithm of a non-positive number is not a real number.
Still, the data may fit one of the other regression curves better than a straight line. This calls for an adjustment of the data. In essence, we subtract the minimum value “minus one” from the x data, the y data, or both. Conceptually:
If min(X) ≤ 0, then:
P = min(X) – 1
X_adj = X – P
If min(Y) ≤ 0, then:
Q = min(Y) – 1
Y_adj = Y – Q
Why the “minus 1”?
If we only subtract the minimum, the lowest adjusted amount would be 0. But ln(0) is undefined (ln(x) approaches negative infinity as x approaches 0), which most calculators cannot work with.
If we subtract the minimum minus one, the lowest adjusted amount would be 1, and ln(1) = 0. Beautiful!
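The rule for P and Q can be written as one small helper. This is a sketch; `shift_for_logs` is a hypothetical name, and the same function serves for the X data (giving P) or the Y data (giving Q).

```python
def shift_for_logs(values):
    """Return (shift, adjusted) for the "minimum minus one" rule.

    If min(values) <= 0, shift = min(values) - 1 and the shift is subtracted
    from every value, so the smallest adjusted value becomes 1.
    Otherwise no adjustment is needed and shift = 0.
    """
    lowest = min(values)
    shift = lowest - 1 if lowest <= 0 else 0
    return shift, [v - shift for v in values]


# Hypothetical sample: the minimum is -2, so shift = -3 and every value moves up by 3
print(shift_for_logs([-2, 0, 3]))  # (-3, [1, 3, 6])
print(shift_for_logs([5, 7]))      # (0, [5, 7])
```

Returning the shift together with the adjusted data keeps the amount needed to convert the fitted curve back to the original units.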
Example: Adjusting the Raw Data
Raw data:
X | Y
1 | -50000
2 | -3000
3 | 9000
4 | 16000
5 | 18000
6 | 21000
min(X) = 1 > 0. No adjustment is needed.
min(Y) = -50000 ≤ 0. An adjustment is needed.
Let Q = min(Y) – 1 = -50000 – 1 = -50001
Adjusted data:
X | Y | Y’ = Y – Q | ln(Y’) (fix 3)
1 | -50000 | 1 | 0.000
2 | -3000 | 47001 | 10.758
3 | 9000 | 59001 | 10.985
4 | 16000 | 66001 | 11.097
5 | 18000 | 68001 | 11.127
6 | 21000 | 71001 | 11.170
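As a quick check, this Python sketch rebuilds the adjusted table from the raw data. It hard-codes the example values, applies Q = min(Y) – 1 because min(Y) ≤ 0, leaves X alone because min(X) = 1 > 0, and rounds ln(Y’) to three decimals to mimic fix 3.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
ys = [-50000, -3000, 9000, 16000, 18000, 21000]

q = min(ys) - 1                     # min(Y) = -50000 <= 0, so Q = -50001
ys_adj = [y - q for y in ys]        # Y' = Y - Q = Y + 50001
ln_ys_adj = [round(math.log(v), 3) for v in ys_adj]  # "fix 3"

for x, y, ya, ly in zip(xs, ys, ys_adj, ln_ys_adj):
    print(f"{x} | {y} | {ya} | {ly:.3f}")
```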
Let’s fit a curve to the adjusted data.
Keep in mind that:
Y’ = Y – Q
Y’ = Y – (-50001)
Y’ = Y + 50001
Y = Y’ – 50001
Fitting the Curve
Determine which of the four regressions (linear, exponential, logarithmic, or power) best fits the adjusted data points. Then predict the actual cash flow (Y) for period 7.
Here is the data. We are going to use the data sets X and Y’.
X | Y | Y’ = Y – Q (Q = -50001)
1 | -50000 | 1
2 | -3000 | 47001
3 | 9000 | 59001
4 | 16000 | 66001
5 | 18000 | 68001
6 | 21000 | 71001
Correlation Coefficients:
Model | Correlation (fix 4)
Linear | 0.8477
Exponential | 0.6772
Logarithmic | 0.9520
Power | 0.8290
Calculations were made with an HP 14B calculator. The best fit is the one whose correlation is closest to 1 in absolute value: correlations near -1 or +1 indicate a better fit than correlations near 0.
The best fit is the logarithmic fit. Hence the curve fit is:
Y’ ≈ 9618.6738 + 38498.9034 * ln(X)
Remember that Y’ = Y + 50001. Then:
Y + 50001 ≈ 9618.6738 + 38498.9034 * ln(X)
Y ≈ -40382.3262 + 38498.9034 * ln(X)
To predict for X = 7:
Y ≈ -40382.3262 + 38498.9034 * ln(7)
Y ≈ 34533.0807
The beginning of period 7 will have a projected cash flow of about $34,533.
Eddie
All original content copyright, © 2011-2024. Edward Shore. Unauthorized use and/or unauthorized distribution for commercial purposes without express and written permission from the author is strictly prohibited. This blog entry may be distributed for noncommercial purposes, provided that full credit is given to the author.