Lucas Orellana Pérez
02/06/2025, 12:42 PM
`line_items` from HubSpot to BigQuery using Airbyte. The process takes an excessive amount of time to start, even though the number of records is small. Logs indicate that the sync remains idle for at least an hour before actually beginning.
Key Observations
1. Other HubSpot tables sync without issues – This problem is specific to `line_items`.
2. Sync hangs before it officially starts – The workload is created and marked as `RUNNING`, but no actual data transfer occurs for a long time.
3. The logs repeatedly show idle messages, such as:
   `Pool queue size: 0, Active threads: 0`
   This suggests that the process is not actively fetching or writing data.
4. The buffer remains empty, as indicated by:
   `Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0`
5. No explicit errors appear in the logs, meaning there is no immediate failure but rather a stall in execution.
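One way to quantify the stall described in these observations is to scan an exported sync log and count the idle lines. The following is a minimal sketch, assuming the log has been saved as plain text and that idle lines contain the exact strings quoted above. The function name `summarize_idle` and the sample lines are hypothetical and are not taken from real Airbyte output.

```python
import re

# Patterns copied from the messages quoted above. Everything else about the
# log layout (prefixes, timestamps) is an assumption and is ignored here.
POOL_IDLE = re.compile(r"Pool queue size: 0, Active threads: 0")
BUFFER_USED = re.compile(r"Used: (\d+) bytes")


def summarize_idle(lines):
    """Count idle thread-pool messages and reports of an empty buffer."""
    idle_pool = 0
    empty_buffer = 0
    for line in lines:
        if POOL_IDLE.search(line):
            idle_pool += 1
        match = BUFFER_USED.search(line)
        if match and int(match.group(1)) == 0:
            empty_buffer += 1
    return {"idle_pool": idle_pool, "empty_buffer": empty_buffer}


if __name__ == "__main__":
    # Hypothetical sample lines for illustration only.
    sample = [
        "Pool queue size: 0, Active threads: 0",
        "Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0",
        "Pool queue size: 0, Active threads: 0",
    ]
    print(summarize_idle(sample))
```

Running this against the full log and comparing the counts with the total number of lines gives a rough sense of how much of the sync was spent idle.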
Possible Causes Considered
• API rate limits? → Unlikely, since other tables sync fine.
• Permissions? → Not an issue, as full refresh works.
• Schema-related issues? → There might be a problem with how `line_items` is processed.
• Pagination or cursor (`updatedAt`) issues? → Airbyte may be stuck waiting for changes.
Questions for Support
1. Is there a known issue with syncing `line_items` from HubSpot?
2. Could this be related to how Airbyte handles incremental syncs on deeply nested or structured data?
3. Are there specific configurations or workarounds to speed up incremental syncs for this stream?
kapa.ai
02/06/2025, 12:43 PM
`line_items` from HubSpot to BigQuery using Airbyte. While there isn't a specific mention of issues with the `line_items` stream, I can address some of your concerns and provide potential solutions.
1. Known issues with syncing `line_items` from HubSpot:
There is no specific mention of known issues with the `line_items` stream in the provided knowledge sources. However, HubSpot does have some limitations and performance considerations that might be affecting your sync.
2. Handling of incremental syncs on deeply nested or structured data:
The knowledge sources don't provide specific information about how Airbyte handles deeply nested or structured data for HubSpot. However, it's worth noting that since version 1.5.0 of the HubSpot connector, there have been changes in how nested fields are handled. As mentioned in the [HubSpot Source Documentation](https://docs.airbyte.com/integrations/sources/hubspot#limitations--troubleshooting), "Since version 1.5.0, in order to offer users access to nested fields, we also denest the top-level fields into individual fields in the destination."
3. Configurations or workarounds to speed up incremental syncs:
While there isn't a specific solution for your exact issue, here are some suggestions based on the available information:
a. Check API rate limits: Although you mentioned it's unlikely due to other tables syncing fine, it's worth double-checking the API rate limits. As per the [HubSpot Source Documentation](https://docs.airbyte.com/integrations/sources/hubspot#limitations--troubleshooting), HubSpot has different rate limits based on your product tier.
b. Verify permissions: Ensure that the API key or OAuth token has the necessary permissions for accessing the `line_items` data. Some streams require specific permissions, as mentioned in the documentation.
c. Review the cursor field: Make sure the cursor field (`updatedAt`) is properly indexed in your HubSpot account. A similar issue was reported in the [Airbyte Forum](https://discuss.airbyte.io/t/4007) where an unindexed cursor field caused significant delays.
d. Check for large data volumes: If the `line_items` table is significantly larger than other tables, it might be causing the delay. Consider using a more recent `start_date` for the initial sync to reduce the volume of data being processed.
e. Upgrade connector versions: Ensure you're using the latest versions of both the HubSpot source connector and the BigQuery destination connector. Newer versions often include performance improvements and bug fixes.
f. Consider using GCS staging: For large syncs, it's recommended to use GCS staging with BigQuery. This was suggested in an [Airbyte Forum post](https://discuss.airbyte.io/t/298) as a potential solution for slow syncs.
g. Monitor resource usage: Keep an eye on the CPU and memory usage of your Airbyte instance during the sync. If resources are constrained, it could cause delays in processing.
If these suggestions don't resolve the issue, I recommend opening a new issue on the [Airbyte GitHub repository](https://github.com/airbytehq/airbyte) with detailed logs and information about your setup. The Airbyte team can then investigate if there's a specific problem with the `line_items` stream that needs to be addressed.