Clarisse Chia
03/07/2024, 2:55 AM
i'm trying to get statsforecast up and running on pyspark, but have run into the ModuleNotFoundError: No module named 'fugue' error, despite having installed fugue.
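(A troubleshooting sketch I'm adding here, not official Nixtla guidance: a ModuleNotFoundError on a Spark cluster often means the package was installed into a different interpreter than the one the driver or workers actually run. The helper name below is hypothetical.)

```python
# Check whether 'fugue' is importable by THIS interpreter, and which
# interpreter that is. The path printed should match what PySpark uses
# (PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON on driver and workers).
import importlib.util
import sys

def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the running interpreter."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("fugue importable:", module_available("fugue"))
```

If this prints False on the workers but True locally, the install landed in the wrong environment.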
i was wondering if someone would be willing to help me troubleshoot/chat through what i might not be thinking about. for context, i'm running on python 3.8, pyspark 3.2.1, and scala 2.12
Tung Nguyen
03/07/2024, 10:29 AM
Thiago Vidigal
03/09/2024, 1:07 PM
Makarand Batchu
03/14/2024, 2:19 PM
from statsforecast.models import (
    MSTL
)

# Create a list of models and instantiation parameters
models = [
    MSTL(season_length=[7, 31])
]
Makarand Batchu
03/14/2024, 4:10 PM
Brian Head
03/14/2024, 5:04 PM
Valeriy
03/15/2024, 7:22 PM
Clarisse Chia
03/18/2024, 3:02 PM
i'm using an n-week * 7 days seasonality to try to capture the holiday/special date seasonality effect. below is how i’ve set up the modeling problem (will add example of modeling code setup in thread)
would love advice on the following pieces:
1. how to speed up .forecast(), or more specifically, writing the .forecast() output?
   a. context: it’s currently taking anywhere from 5 to 17 hours, depending on what exogenous features i pass in for the ~200k series, despite the shortened “artificial timeline” (vs. full year’s timelines for each past year)
2. how do i reframe the problem such that i’m able to capture the holiday/special date effect without having to create this “artificial timeline”?
   a. context: it’s working when i’m setting up the timeline to capture holiday/special dates that fall on the same “day of week” every year, but i worry about that same ability for holiday/special dates that do not fall on the same “day of week”
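(A sketch I'm adding on the moving-holiday question: one common alternative to an artificial timeline is to encode each holiday as its own 0/1 exogenous column keyed to its actual calendar date, so it can land on a different weekday each year. The helper, column names, and holiday dates below are my own illustration, not from this thread.)

```python
import pandas as pd

def holiday_flags(dates: pd.Series, holidays: dict) -> pd.DataFrame:
    """One 0/1 indicator column per named holiday, aligned to the actual
    dates, so a holiday may fall on a different day of week each year."""
    out = pd.DataFrame({"ds": dates})
    for name, days in holidays.items():
        out[name] = out["ds"].isin(pd.to_datetime(days)).astype(int)
    return out

# hypothetical holiday calendar spanning two years
holidays = {
    "easter": ["2023-04-09", "2024-03-31"],  # moves across weekdays
    "july4": ["2023-07-04", "2024-07-04"],   # fixed calendar date
}
ds = pd.Series(pd.date_range("2023-01-01", "2024-12-31", freq="D"))
X = holiday_flags(ds, holidays)
```

Columns built this way can be passed as exogenous features alongside the training frame, with the same columns extended into the future horizon.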
thanks in advance!!
Makarand Batchu
03/19/2024, 11:15 AM
i'm trying to understand the prediction_intervals parameter in models of statsforecast. I understand that I have to pass ConformalIntervals, which takes horizon and n_windows, but can someone explain what this all means and how it can be used to help me improve forecasts? And how is it different from when nothing is passed for prediction_intervals?
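(A plain-Python sketch I'm adding of what conformal intervals do, not the statsforecast implementation: re-forecast the last n_windows held-out windows of length h, collect the absolute residuals, and widen the point forecast by an empirical quantile of them. If I recall the library API correctly it is configured via `statsforecast.utils.ConformalIntervals` passed to the model's `prediction_intervals` argument; helper names below are mine.)

```python
def conformal_interval(y, forecast_fn, h, n_windows, level=80):
    """Backtest over n_windows windows of length h, pool the absolute
    residuals, and band the final point forecast by their quantile."""
    residuals = []
    n = len(y)
    for w in range(n_windows):
        cutoff = n - (n_windows - w) * h
        train, valid = y[:cutoff], y[cutoff:cutoff + h]
        preds = forecast_fn(train, h)
        residuals.extend(abs(a - p) for a, p in zip(valid, preds))
    residuals.sort()
    # empirical `level`% quantile of the absolute residuals
    q = residuals[min(len(residuals) - 1, int(len(residuals) * level / 100))]
    point = forecast_fn(y, h)
    return [p - q for p in point], [p + q for p in point]

def naive(train, h):
    # toy forecaster: repeat the last observed value
    return [train[-1]] * h
```

When nothing is passed for prediction_intervals, my understanding is that the statistical models fall back to their usual model-based (parametric) intervals instead of these backtest-derived ones.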
Thank you in advance.
Brian Head
03/21/2024, 1:41 PM
When I run forecast I get the error in the attached screenshot.
Pertinent details:
• I've been starting with samples of data (with seeds for consistency) and then will remove that when ready to scale up to the full dataframe. forecast actually does work with samples under 5% (less than ~75 series with 48 monthly observations for training and 3 for forecasting). But when I increase the frac to 0.05 I get this error.
• Given the error message, I thought it might be an issue with some of the data pulled in after the increase. However, I have done a couple of things I think rule that out
◦ Displayed the data and looked through it. Everything looked fine.
◦ Pulled it back down to regular pandas dataframe and ran everything that way. It works fine then with no errors--even when increasing the sample to 50%.
Before going to our data engineers, I wanted to check if there are any other thoughts or suggestions. They are helpful with many things, but they aren't familiar with Statsforecast, so wanted to rule out any other things before pulling them in.
Thanks for any help you can provide.
Brian Head
03/21/2024, 1:48 PM
Brian Head
03/21/2024, 2:13 PM
Jeff Tackes
03/24/2024, 2:54 AM
i'm getting a flat forecast from ETS. here's my setup:
from statsforecast import StatsForecast
from statsforecast.models import ETS

sf = StatsForecast(
    models=[ETS(season_length=48 * 7)],
    freq="30min",
)
sf.fit(
    ts_train,
    id_col='LCLid',
    time_col='timestamp',
    target_col='energy_consumption',
)
sf.predict(h=48)
My data has enough fluctuation that i would have thought there would be better "movement".
When i run ETS using DARTS, i do not get a flat forecast and get cyclic patterns showing in my forecast.
Additionally, when i run ETS in NIXTLA, it takes several minutes whereas in DARTS it took 26 seconds.
Makarand Batchu
03/25/2024, 2:43 PM
can someone explain h, n_windows and step_size with an example? As it is unclear how to choose these parameter values.
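(These look like the `StatsForecast.cross_validation` parameters: each window forecasts h steps ahead, n_windows is how many backtest windows are run, and step_size is the spacing between consecutive cutoffs. A plain-Python sketch I'm adding of the cutoff arithmetic; the helper name is mine.)

```python
def cv_cutoffs(n_obs, h, n_windows, step_size):
    """Indices where each cross-validation training set ends.
    The last window is scored on the final h observations."""
    last = n_obs - h
    first = last - (n_windows - 1) * step_size
    if first < 1:
        raise ValueError("not enough observations for this configuration")
    return [first + i * step_size for i in range(n_windows)]

# e.g. 100 observations, forecast 12 ahead, 3 windows, slide by 12:
# training sets end at t=64, 76, 88, each scored on the next 12 points
print(cv_cutoffs(100, 12, 3, 12))  # [64, 76, 88]
```

A rule of thumb: set h to the horizon you actually need in production, and pick n_windows/step_size so the windows cover the recent history you care about without overlapping more than you want.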
Thanks in advance!
Brian Head
03/26/2024, 6:55 PM
1. i have a question about the forecast function and forecast_fitted_values function.
   a. For example, on my local laptop the CV and forecast functions run for approximately the same amount of time and the extraction of fitted values takes only a few seconds. However, when using spark in Databricks, the forecast and forecast_fitted_values functions take about 3-4 times as long as the CV. Is that normal behavior? I'm wondering if it might have anything to do with the partitioning.
b. I've read some sources that say there should be 3-4 partitions per core. However, that's not realistic at all for my situation given the resources my team and I have. Is there any other guideline for the number of partitions?
2. I can understand that for non-statistical models I might get slightly different results. However, assuming I've got the exact same data, I should get the same results when training and forecasting with a statistical model no matter the processing type (e.g., local or distributed) and environment (e.g., laptop vs something like Databricks), right?
Clarisse Chia
03/27/2024, 2:53 PM
i'm getting 0 or null forecasts and was wondering what might be going wrong.
the dataset i'm working with is sensitive, but if helpful, below is the simple model setup
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive, AutoARIMA

# configure model
models = [AutoARIMA(season_length=7, nmodels=5, trace=True)]
statsforecast = StatsForecast(
    models=models,
    freq="D",
    fallback_model=SeasonalNaive(season_length=7),
    n_jobs=-1,
)

# forecast
horizon = test_x.select('ds').dropDuplicates().count()
forecast_results = statsforecast.forecast(df=train_set, h=horizon, X_df=test_x)
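(A sanity check I'm adding: a frequent cause of 0/null forecasts with exogenous regressors is a future frame that is missing rows or contains NaNs for part of the horizon. This assumes a pandas version of the future frame; the helper name is mine.)

```python
import pandas as pd

def check_future_exog(X_df: pd.DataFrame, h: int) -> list:
    """Report problems in a future-exogenous frame that should have
    exactly h rows per unique_id and no missing values."""
    problems = []
    counts = X_df.groupby("unique_id").size()
    bad = counts[counts != h]
    if len(bad):
        problems.append(f"{len(bad)} series without exactly {h} future rows")
    n_nan = int(X_df.drop(columns=["unique_id", "ds"]).isna().sum().sum())
    if n_nan:
        problems.append(f"{n_nan} NaN values in exogenous columns")
    return problems
```

Running such a check on the future frame after changing an exogenous variable's assumed future values would quickly rule out alignment/missing-data issues.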
the model has been working quite well until recently, when i changed how one exogenous variable would look in the future forecast (within `test_x`; based on business assumptions)
Valeriy
04/02/2024, 3:50 PM
Valeriy
04/04/2024, 2:19 PM
Jeff Tackes
04/04/2024, 8:44 PM
Clarisse Chia
04/04/2024, 10:29 PM
i'm trying to understand how AutoARIMA(season_length=7) uses the exogenous variables we feed it.
context on model setup:
1. ~4 years of complete daily sales history
2. exogenous variables
   a. covid indicators
   b. day of week indicators
   c. day of week * holiday indicators
      i. idea here is to capture the sales peak for each holiday, especially when a holiday falls on a different day of week each year
      ii. [problem] this is where i notice that when multiple holidays fall really close to each other (e.g., superbowl/st. patricks/easter), the forecasts can output some pretty extreme and unreasonable values
         1. i wonder if the exogenous variables may be multiplicative (rather than additive), causing these extreme values to populate when these indicators fall on the same dates?
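(A toy illustration I'm adding, not statsforecast internals: in a regression-with-ARIMA-errors setup the exogenous regressors enter the prediction linearly, so overlapping 0/1 dummies contribute additively, not multiplicatively. Extreme outputs are, if anything, more likely from nearly collinear overlapping dummies inflating the fitted coefficients; the coefficient values below are made up.)

```python
def linear_effect(betas, dummies):
    """Combined exogenous contribution for one date:
    beta1*x1 + beta2*x2 + ... (additive in the linear predictor)."""
    return sum(b * x for b, x in zip(betas, dummies))

betas = [5.0, 3.0]   # hypothetical fitted coefficients
same_day = [1, 1]    # holiday and day-of-week*holiday dummies both fire
one_only = [1, 0]    # only the first indicator fires
print(linear_effect(betas, same_day))  # 8.0: effects add
print(linear_effect(betas, one_only))  # 5.0
```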
would really appreciate it if folks have any suggestions of what I might be missing!
Valeriy
04/05/2024, 1:59 PM
Valeriy
04/06/2024, 1:04 PM
Vítor Barbosa
04/18/2024, 4:32 PM
Nils de Korte
04/19/2024, 11:07 AM
Abishek
04/21/2024, 6:13 AM
transform(generate_data(20), forecast, partition={"num": 500, "by": "unique_id"}).show()
throws an error. can anyone help?
error:
ERROR ArrowPythonRunner: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/anaconda3/envs/bigdata/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 1225, in main
    eval_type = read_int(infile)
Abishek
04/21/2024, 7:06 PM
Bharath Vishal G
04/22/2024, 6:42 PM
is there anything in StatsForecast that could help me show explainability for StatsForecast models? Appreciate any resources/direction.
Dimitris Floros
04/23/2024, 2:02 AM
regarding df_new in predict: is there something similar in statsforecast?
Jeff Tackes
04/25/2024, 12:20 AM
Yan Liu
04/26/2024, 4:40 AM
would reducing nmodels from 5 to 4 significantly impact training time?
We're training AutoARIMA for 8,000 - 12,000 time series using the AutoARIMA instance specified below, but it takes a very long time, and for some instances we saw refitting (while ARIMA is almost instantaneous)
auto_arima_model = [AutoARIMA(season_length=7, nmodels=5, trace=True)]
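(As I understand it, nmodels caps how many candidate ARIMA specifications the stepwise search may evaluate, so a smaller value means fewer model fits per series, a saving that multiplies across thousands of series. A toy sketch of capping a candidate search, not the statsforecast implementation; names are mine.)

```python
import itertools

def capped_search(candidates, score_fn, nmodels):
    """Evaluate at most `nmodels` candidates; return the best plus the
    number of fits actually performed. Toy stand-in for a search budget."""
    evaluated = 0
    best, best_score = None, float("inf")
    for cand in itertools.islice(candidates, nmodels):
        s = score_fn(cand)
        evaluated += 1
        if s < best_score:
            best, best_score = cand, s
    return best, evaluated

# 18 candidate (p, d, q) orders, but only 5 fits allowed
orders = [(p, d, q) for p in range(3) for d in range(2) for q in range(3)]
best, n_fits = capped_search(orders, lambda o: sum(o), nmodels=5)
print(best, n_fits)  # (0, 0, 0) 5
```

Whether going from 5 to 4 helps noticeably depends on how often the search actually exhausts the budget per series; trace=True output should show how many models each series tries.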