# general
j
Hello! What is the recommended (prod) way of ingesting batch data without Hadoop? I'm thinking about having a Python component generate Parquet files, copy them to the deep store, and trigger an ingestion: something like the `/ingestFromFile` API endpoint, but prod-compatible (where would segment creation happen in that case? Minion?). Thanks!
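For context, here's roughly the kind of call I mean, as a minimal sketch (assumptions: a local controller on :9000, a table named `myTable_OFFLINE`, and the `requests` library; this endpoint is aimed at small test files rather than prod volumes, which is why I'm asking):

```python
# Minimal sketch of pushing a local Parquet file to the controller's
# /ingestFromFile endpoint. Controller address, table name, and batch
# config values are placeholders for illustration.
import json
import requests

CONTROLLER = "http://localhost:9000"  # assumed controller address

def ingest_from_file(path: str, table: str) -> None:
    params = {
        "tableNameWithType": table,  # e.g. "myTable_OFFLINE"
        # Tell the record reader how to parse the uploaded payload.
        "batchConfigMapStr": json.dumps({"inputFormat": "parquet"}),
    }
    with open(path, "rb") as f:
        resp = requests.post(
            f"{CONTROLLER}/ingestFromFile",
            params=params,
            files={"file": f},
        )
    resp.raise_for_status()
    print(resp.text)

ingest_from_file("batch.parquet", "myTable_OFFLINE")
```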
k
I’m guessing Minion would work well for that use case, but I haven’t tried it. We just run shell scripts to trigger the segment generation job, which uses HDFS for the input (CSV) and output (segment) directories. Then a script executes a “URI push” job, which uses hdfs:// URIs so segments are loaded more efficiently than with a tar push. Note that you need controller and server config files that register HDFS as a valid file system for those URIs.
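Roughly what our scripts do, sketched here as a Python wrapper (all hosts, paths, and the table name below are placeholders; the YAML follows Pinot's batch ingestion job spec, and `SegmentCreationAndUriPush` just combines the generation and URI-push steps into a single job run):

```python
# Hedged sketch of the shell-driven flow: write a batch ingestion job
# spec, then launch it with pinot-admin.sh. Standalone execution is
# assumed; HDFS and controller addresses are placeholders.
import pathlib
import subprocess
import textwrap

JOB_SPEC = textwrap.dedent("""\
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
    # Create segments from the input dir, then push them by URI.
    jobType: SegmentCreationAndUriPush
    inputDirURI: 'hdfs://namenode:8020/data/input/'       # placeholder
    includeFileNamePattern: 'glob:**/*.csv'
    outputDirURI: 'hdfs://namenode:8020/pinot/segments/'  # placeholder
    pinotFSSpecs:
      - scheme: hdfs
        className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    recordReaderSpec:
      dataFormat: 'csv'
      className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
    tableSpec:
      tableName: 'myTable'                                # placeholder
    pinotClusterSpecs:
      - controllerURI: 'http://controller:9000'           # placeholder
    """)

spec_path = pathlib.Path("job-spec.yaml")
spec_path.write_text(JOB_SPEC)

# Equivalent of our shell script: launch the ingestion job.
subprocess.run(
    ["bin/pinot-admin.sh", "LaunchDataIngestionJob",
     "-jobSpecFile", str(spec_path)],
    check=True,
)
```

And for the hdfs:// URIs to resolve at push time, the controller/server configs need HDFS registered as a PinotFS, with something like `pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS`.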
j
I see, thanks for the feedback @User! I'll have a look at using the Minion; otherwise we'll use a shell script as you did 🙂
m
We are working on a solution where Minion can do the ingestion, but it's not ready yet. cc: @User