# hacktoberfest-team-8
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 7:12 PM
    Eva is also really close. I think she is working on dev-eva
  • b

    Ben Marsh

    10/30/2023, 7:21 PM
    ok, my work is slowly pushing to a new branch.
  • b

    Ben Marsh

    10/30/2023, 7:33 PM
    ok, now my work is on the ben-dev branch. src/app/app_precalc_index.py is what I'm trying to get working. You can see the issue I'm having in my notebook, src/etl/Ben_ETL.ipynb, about halfway down in the Document Store section, where I try to run the pipeline.
  • b

    Ben Marsh

    10/30/2023, 7:34 PM
    i initialize the faiss document store, write the documents in, load the index I had previously calculated, but it doesn't seem to recognize the existence of the documents when I run the pipeline.
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 7:34 PM
    ok let me take a look
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:25 PM
    Hello @Ben Marsh, I took a look
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:25 PM
    which of the 2 apps is giving issues?
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:25 PM
    I see
    src/app/app_precalc_index.py
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:25 PM
    and
    src/app/app.py
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:25 PM
    are using 2 different approaches to load the document store
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:26 PM
    from the pipeline's perspective, it seems the app is currently incomplete. I see an extraction script but I don't see the indexing script. In the solution I shared, the team had created one script for each of the steps: extract, index, and q&a.
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:28 PM
    also, index_path and config_path are not specified
    if __name__ == "__main__":
        # Load environment variables (if any)
        openai_key = os.environ['OPENAI_HACKTOBERFEST_KEY']
    
        # Initialize documents
        # documents = initialize_documents('../../data/recipe_docs.csv')
    
        # Initialize document store and retriever
        # document_store, retriever = initialize_faiss_document_store(documents=documents)
        document_store, retriever = preloaded_faiss(index_path=index_path, config_path=)
    
        # Initialize pipeline
        query_pipeline = initialize_rag_pipeline(retriever=retriever, openai_key=openai_key)
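A minimal sketch of one way to supply the two missing arguments, assuming the saved index and its config live on disk as two files. The function name `resolve_index_paths`, the environment variables `FAISS_INDEX_PATH` and `FAISS_CONFIG_PATH`, and the default file names are all hypothetical and are not taken from the repository.

```python
import os
from pathlib import Path


def resolve_index_paths(env=None, default_dir="."):
    """Return (index_path, config_path), failing early if either file is missing.

    The environment variable names and default file names are hypothetical.
    """
    env = os.environ if env is None else env
    base = Path(default_dir)
    index_path = Path(env.get("FAISS_INDEX_PATH", base / "faiss_index.faiss"))
    config_path = Path(env.get("FAISS_CONFIG_PATH", base / "faiss_index.json"))

    # Report every missing artifact at once instead of failing on the first one.
    missing = [str(p) for p in (index_path, config_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing index artifacts: " + ", ".join(missing))
    return str(index_path), str(config_path)
```

The result could then be passed on as `preloaded_faiss(index_path=index_path, config_path=config_path)`. If the project uses Haystack 1.x's FAISSDocumentStore (an assumption, since the thread does not name the library), it may also be worth checking that the SQL database holding the document text was copied along with the index, because the index file itself stores only the vectors.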
  • b

    Ben Marsh

    10/30/2023, 9:29 PM
    so app.py i got working when just loading a sample of 1000 rows of our data and calculating the index. for the app_precalc_index, i was trying to load an index I had saved from running it in google colab (my computer doesn't have a gpu, so calculating the index for the full data set was taking like 30 hrs.)
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:30 PM
    ok so from what I understand, app.py is functional, with the second approach taking longer due to computational issues
  • b

    Ben Marsh

    10/30/2023, 9:32 PM
    yes. in theory app.py would work for the full data set if i comment out the part of the initialize_documents function that takes a sample of the data, but it will take forever on my computer. the second approach was an attempt at a workaround, using colab to calculate the index, then saving that locally and loading it
  • b

    Ben Marsh

    10/30/2023, 9:33 PM
    but loading it doesn't seem to work properly
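Rather than commenting the sampling step in and out, the sample size could be made a parameter. The sketch below is a hypothetical stand-in, not the project's `initialize_documents`: the name `load_rows`, its signature, and the use of the standard `csv` module are assumptions made for illustration.

```python
import csv
import random


def load_rows(csv_path, sample_size=1000, seed=0):
    """Read a CSV into a list of dicts, optionally keeping a random sample.

    Pass sample_size=None to keep the full data set.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    if sample_size is None or sample_size >= len(rows):
        return rows
    # A fixed seed keeps the sample reproducible between runs.
    return random.Random(seed).sample(rows, sample_size)
```

With this shape, the small local app and the full Colab run would differ only in the `sample_size` argument.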
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:33 PM
    I am also a bit confused by the 2 approaches. Here is a second solution (with some similarities) from Eva: https://github.com/btmarsh6/rag-pipeline-chatbot/tree/dev-eva/src/app. This solution is a bit more complete in terms of the packaging of the application and the extraction pipeline.
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:34 PM
    It is confusing to know which solution to take
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:36 PM
    My take from this: given that the goal is to complete an MVP, if your team has found a solution for a smaller subset of the data (100 rows or 1000 rows) and was able to connect the pipeline to a chainlit application, then this smaller application is what gets packaged for deployment.
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:38 PM
    From the requirements perspective, this means:
    1. having finalized extraction scripts
    2. having an indexing pipeline
    3. having an app.py that can read and set up RAG for the smaller subset of the data
    4. having a dockerfile and requirements.txt
    5. packaging these for deployment to ploomber cloud
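A small preflight check along these lines could confirm that the packaging files from the list are present before deploying. It is only a sketch: the helper name `missing_deploy_files` is hypothetical, and the assumption that all three files sit in a single project directory may not match the repository layout.

```python
from pathlib import Path

# File names taken from the requirements list; their location is an assumption.
REQUIRED_FILES = ("app.py", "Dockerfile", "requirements.txt")


def missing_deploy_files(project_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in required if not (root / name).is_file()]
```

An empty list means every listed file was found, while anything else names what still needs to be added.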
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:39 PM
    I can get on a call tomorrow or Wednesday to help with merging the two approaches
  • b

    Ben Marsh

    10/30/2023, 9:41 PM
    Yes, i think a call would be helpful. Could you let me know what time would be best for you?
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:49 PM
    Sounds good. I can meet on Wednesday at 12 PM Pacific Time or later (until 5 pm). Between now and then, can you follow some of the steps that Eva added by incorporating the dockerfile and requirements.txt for the smaller working app? Other steps that she followed included adding a complete download script. I will add you as a reviewer to her PR. Your goal is to merge her work with your work for the smaller working app.
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 9:49 PM
    the larger app with the full data set, split into an indexing pipeline + q&a pipeline, is something we can explore as future work
  • e

    Eva Draganova

    10/30/2023, 10:34 PM
    Hi @Ben Marsh and @Laura Gutierrez Funderburk, I just came back from work. Let me know if you want to meet tonight. I may be available tomorrow evening, after the Halloween candy giving... so much candy to give tomorrow
  • l

    Laura Gutierrez Funderburk

    10/30/2023, 10:38 PM
    Hi @Eva Draganova and @Ben Marsh. I can meet tomorrow evening (anytime after 3 pm PT). Tonight I can meet a bit later (around 9 pm PT).
  • b

    Ben Marsh

    10/30/2023, 10:40 PM
    I could do tonight, though I know that's late for you, Eva. Tomorrow I can meet, but I have something else starting at 6 pm PT.
  • l

    Laura Gutierrez Funderburk

    10/31/2023, 12:54 AM
    Ploomber cloud deployment https://docs.cloud.ploomber.io/en/latest/intro.html
  • l

    Laura Gutierrez Funderburk

    11/16/2023, 9:35 PM
    archived the channel