Brandon Rickman
02/07/2025, 1:36 AM

kapa.ai
02/07/2025, 1:37 AM

```python
import pandas as pd

# Read from the cache into a pandas DataFrame:
ohlcv_df = cache["ohlcv_historical_data"].to_pandas()

# Convert 'time_period_start' to datetime format and necessary columns to numeric
ohlcv_df["time_period_start"] = pd.to_datetime(ohlcv_df["time_period_start"])
numeric_columns = [
    "price_open",
    "price_high",
    "price_low",
    "price_close",
    "volume_traded",
    "trades_count",
]
ohlcv_df[numeric_columns] = ohlcv_df[numeric_columns].apply(
    pd.to_numeric, errors="coerce"
)

# Calculate daily price movement
ohlcv_df["daily_movement"] = ohlcv_df["price_close"] - ohlcv_df["price_open"]
```
This approach allows you to perform custom transformations on your data using pandas or any other Python libraries before writing it to your destination. [Extract crypto data from CoinAPI using PyAirbyte](https://airbyte.com/tutorials/extract-crypto-data-from-coinapi-using-pyairbyte)
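To make that pattern concrete, here is a self-contained sketch of the same transformation steps, using made-up sample data in place of `cache["ohlcv_historical_data"].to_pandas()` (the values and the rolling-mean column are illustrative, not from the tutorial):

```python
import pandas as pd

# Hypothetical sample standing in for cache["ohlcv_historical_data"].to_pandas()
ohlcv_df = pd.DataFrame({
    "time_period_start": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "price_open": ["100.0", "102.5", "101.0"],
    "price_close": ["102.5", "101.0", "104.0"],
})

# Same steps as above: parse dates, coerce numeric columns
ohlcv_df["time_period_start"] = pd.to_datetime(ohlcv_df["time_period_start"])
numeric_columns = ["price_open", "price_close"]
ohlcv_df[numeric_columns] = ohlcv_df[numeric_columns].apply(
    pd.to_numeric, errors="coerce"
)

# Derived columns: daily movement, plus an illustrative 2-day rolling mean
ohlcv_df["daily_movement"] = ohlcv_df["price_close"] - ohlcv_df["price_open"]
ohlcv_df["close_2d_mean"] = ohlcv_df["price_close"].rolling(2).mean()
```

Any column derived this way is just an ordinary pandas operation, so the full pandas API is available before you hand the frame off to a destination.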
Brandon Rickman
02/07/2025, 1:39 AM

kapa.ai
02/07/2025, 1:40 AM

`JOB_MAIN_CONTAINER_MEMORY_REQUEST` and `JOB_MAIN_CONTAINER_MEMORY_LIMIT` environment variables to customize memory allocation [Source: Documentation](https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale).

3. Scaling Resources: If you're running Airbyte on Kubernetes, you can scale your resources to handle larger datasets. Airbyte recommends starting with a mid-sized cloud instance (e.g., 4 or 8 cores) and gradually tuning the instance size to your workload [Source: Documentation](https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale).

4. Disk Space: For long-running syncs that produce a lot of data, Airbyte recommends allocating a minimum of 30 GB of disk space per node [Source: Documentation](https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale).

While these points provide some context on how Airbyte handles large data volumes, they don't specifically address how to handle datasets that don't fit into memory when using PyAirbyte. It's possible that PyAirbyte leverages some of these underlying Airbyte mechanisms, but the provided knowledge sources don't contain explicit information about this. For a more accurate and detailed answer specific to PyAirbyte's handling of large datasets, you may need to consult the PyAirbyte documentation or reach out to Airbyte's support for more information.
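As a general-purpose fallback when a dataset won't fit in memory, you can process it in bounded chunks rather than materializing it all at once. Here is a minimal pandas sketch; the file name, column, and chunk size are placeholders, and this is generic pandas, not a PyAirbyte-specific API:

```python
import os
import tempfile

import pandas as pd

# Write a small CSV to stand in for a large exported dataset
path = os.path.join(tempfile.gettempdir(), "ohlcv_sample.csv")
pd.DataFrame({"volume_traded": range(10)}).to_csv(path, index=False)

# Stream the file in fixed-size chunks instead of loading it whole,
# aggregating as we go so peak memory stays bounded by the chunk size
total_volume = 0
for chunk in pd.read_csv(path, chunksize=4):
    total_volume += chunk["volume_traded"].sum()

print(total_volume)  # 45, i.e. the sum of 0..9
```

The same idea applies to any iterative aggregation: only one chunk lives in memory at a time, so the working set is governed by `chunksize` rather than the file size.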
Brandon Rickman
02/07/2025, 1:41 AM

kapa.ai
02/07/2025, 1:41 AM