Slackbot
01/03/2024, 10:33 AM
Elijah Ben Izzy
01/03/2024, 3:18 PM
adapter=
or use with_adapters
and pass it in as *args if you’re using the new driver builder
Should take 5 minutes, although the blog post will make it much simpler; we just made the API public-facing. If you’re having trouble, I’m happy to share the draft of the post, which walks you through this.
Arthur Andres
01/03/2024, 3:48 PM
Elijah Ben Izzy
01/03/2024, 3:49 PM
Arthur Andres
01/03/2024, 6:40 PM
.with_adapter(SimplePythonGraphAdapter(DictResult()))
Arthur Andres
01/03/2024, 6:42 PM
Elijah Ben Izzy
01/03/2024, 7:11 PM
Arthur Andres
01/03/2024, 9:50 PM
import logging
from typing import Any, Dict, Optional

import humanize
import pyarrow as pa
from hamilton.lifecycle import NodeExecutionHook

logger = logging.getLogger(__name__)


class LogTableStatsNodeExecutionHook(NodeExecutionHook):
    """Logs the shape and in-memory size of every pyarrow Table a node returns."""

    def run_before_node_execution(
        self,
        *,
        node_name: str,
        node_tags: Dict[str, Any],
        node_kwargs: Dict[str, Any],
        node_return_type: type,
        task_id: Optional[str],
        **future_kwargs: Any,
    ):
        # Nothing to do before the node runs.
        pass

    def run_after_node_execution(
        self,
        *,
        node_name: str,
        node_tags: Dict[str, Any],
        node_kwargs: Dict[str, Any],
        node_return_type: type,
        result: Any,
        error: Optional[Exception],
        success: bool,
        task_id: Optional[str],
        **future_kwargs: Any,
    ):
        # Only tables are of interest; every other result type is ignored.
        if isinstance(result, pa.Table):
            # Assumes the leading placeholder was meant to carry the node name.
            logger.info(
                f"Table {node_name}: {result.num_rows:_d} * {result.num_columns}"
                f" = {humanize.naturalsize(result.nbytes)}",
            )
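To check what the log line from the hook above is meant to look like without installing Hamilton, pyarrow or humanize, here is a minimal sketch of just the formatting step. `FakeTable` and `natural_size` are hypothetical stand-ins for `pa.Table` and `humanize.naturalsize` (the real function may round or label units differently), and the sketch assumes the message should start with the node name.

```python
from dataclasses import dataclass


@dataclass
class FakeTable:
    """Hypothetical stand-in exposing the three pa.Table attributes the hook reads."""

    num_rows: int
    num_columns: int
    nbytes: int


def natural_size(nbytes: int) -> str:
    """Rough stand-in for humanize.naturalsize, using decimal (1000-based) units."""
    units = ["Bytes", "kB", "MB", "GB", "TB"]
    size = float(nbytes)
    index = 0
    while size >= 1000 and index < len(units) - 1:
        size /= 1000
        index += 1
    if index == 0:
        return f"{nbytes} Bytes"
    return f"{size:.1f} {units[index]}"


def describe_table(node_name: str, table: FakeTable) -> str:
    """Build the message: rows (with _ as thousands separator) * columns = size."""
    return (
        f"Table {node_name}: {table.num_rows:_d} * {table.num_columns}"
        f" = {natural_size(table.nbytes)}"
    )


message = describe_table(
    "trades", FakeTable(num_rows=1_234_567, num_columns=8, nbytes=79_012_288)
)
print(message)
```

The `:_d` format spec is what groups the row count as `1_234_567`; it only takes effect inside an f-string or `str.format`, not in a plain string literal.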
Elijah Ben Izzy
01/03/2024, 10:27 PM
Stefan Krawczyk
01/16/2024, 12:29 AM
Arthur Andres
01/16/2024, 8:29 AM
pyarrow.Table
that I then join together myself.
Arthur Andres
01/16/2024, 3:11 PM
Elijah Ben Izzy
01/16/2024, 4:21 PM
extract_columns
works. I think this is a little less relevant to pyarrow (most people use tables), but might be worth exploring
2. Moving your pyarrow hooks into something like plugins.h_pyarrow
3. Adding a result-builder to do the joining if it’s common logic
Basically, any bespoke piece that you did we might be able to fit into a plugin.
Arthur Andres
01/16/2024, 5:26 PM
Stefan Krawczyk
01/16/2024, 5:37 PM
I guess maybe you’d want to reimplement what you’ve done for pandas, but for pyarrow?
Yep, supporting all “table”-like data types is something we should have.