# antithesis
  • c

    Craig Alfieri

    07/13/2023, 8:57 PM
    Hi @here, thank you again for attending today's session for the readout of Antithesis testing of Venice over the last couple of weeks. Safe to say we're just scratching the surface, and we're looking forward to the potential.
    Items we covered today: Antithesis refresher: Components, Architecture, Workflow, Set-up, Findings.
    POC covered/validated items:
    ✅ Configured Image Builds
    ✅ Antithesis Set-up configured docker-compose.yml
    ✅ Implemented and Configured Workload
    ✅ Antithesis Simulated set-up executed (testing framework, fault injector, etc.)
    ✅ Identified and reproduced potential data durability issues
    ✅ Review and Readout
    Recommended next session: Walkthrough of debugging with Antithesis.
    Open Items:
    • Security considerations/review of Antithesis (what is needed here?)
    • Cadenced session with Amit, timing
    Today's readout recording: https://us06web.zoom.us/rec/share/P9SyKn9Xfr_4GWo1hjRMe8LLJqcri9dFr3MqbtiOfIaAMlzOfK0X0GRi6TT2zmQw.xGFaCMC2pSHF8qNA
    Passcode: fQ$m6ZAz
    Today's presentation deck (PDF):
    Venice POC Status Slides.pdf
  • s

    Saxena Amit

    07/24/2023, 8:38 PM
    Hey team, sorry, I had to bail for an important meeting. Did we come to an understanding on the next steps?
  • c

    Craig Alfieri

    07/25/2023, 9:19 PM
    Hi @here, wanted to thank everyone again for their time yesterday. The recording can be accessed here: https://us06web.zoom.us/rec/share/qsQD2fTY_yBQKWT8P2MNsFqV5DkN-rvhnmqaFVKYZ1MZD9wMMsXSjCULNkoKpmPw.t4aZmo2_WNMMkSUa
    We enjoyed the discussion and would recommend the following two items to dive deeper on in our next session:
    1. Antithesis to demo a repro of the bug found (with the session we ran excluding faults on controller/Kafka)
    2. Antithesis to show how our fault injection works and the state of Venice nodes, sharing data visualizations along the way
    This will help support the two high-value capabilities identified in our last discussion:
    1. Fault injection capabilities
    2. Reproducibility of issues
    This will also provide a practical workflow for the teams to better understand how to remediate once a bug/bad state is found. We can also address top-of-mind questions and open discussion items that have come up in the meantime. We're open to additional topics to explore and will defer to the team on what those are.
    Best,
  • s

    Saxena Amit

    07/25/2023, 9:22 PM
    Yeah, that sounds like a plan. We would also like to understand how reproduction will work, how much control we will have over the process, and how debugging will work.
    👍 1
  • s

    Saxena Amit

    07/25/2023, 9:23 PM
    thanks
    🙌 1
  • c

    Chang Xiao

    07/26/2023, 5:54 PM
    Hi team, questions we have from our reproduction of the messages:
    1. At the point when we see com.linkedin.venice.exceptions.validation.MissingDataException from service_venice-server-1.dc-1.venicedb.io, DC1 errors out and we can no longer query it for data.
    2. We are still able to query DC0 for the dataset. E.g.
    service_venice-client info
    2023-07-24 21:49:23 INFO [VenicePushJob] [main] Specific status: {dc-0=END_OF_PUSH_RECEIVED, dc-1=STARTED}
    333.087
    service_venice-client info
    2023-07-24 21:49:24 INFO [VenicePushJob] [main] Specific status: {dc-0=END_OF_PUSH_RECEIVED, dc-1=STARTED}
    334.193
    service_venice-client info
    2023-07-24 21:49:25 INFO [VenicePushJob] [main] Specific status: {dc-0=END_OF_PUSH_RECEIVED, dc-1=ERROR}
    a. Would we still be able to validate that all data exists just from DC0?
    b. How does the data push in general work between the data centers? In batch-push-job.properties we have venice.discover.urls set as the parent controller.
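For anyone scanning these client logs by hand, the per-data-center status in the `Specific status: {...}` lines above can be pulled out mechanically. The following is a minimal sketch, assuming the log format is exactly as quoted in this message; the helper names `parse_specific_status` and `first_error` are hypothetical and are not part of Venice or Antithesis.

```python
import re

# Matches the "Specific status: {dc-0=..., dc-1=...}" fragment quoted above.
STATUS_RE = re.compile(r"Specific status: \{(?P<body>[^}]*)\}")


def parse_specific_status(line):
    """Return a {data_center: status} dict for one log line, or {} if absent."""
    match = STATUS_RE.search(line)
    if match is None:
        return {}
    statuses = {}
    for pair in match.group("body").split(","):
        dc, sep, status = pair.strip().partition("=")
        if sep:  # skip empty or malformed fragments
            statuses[dc] = status
    return statuses


def first_error(lines):
    """Return (data_center, line) for the first ERROR status seen, else None."""
    for line in lines:
        for dc, status in parse_specific_status(line).items():
            if status == "ERROR":
                return dc, line
    return None


# The three VenicePushJob lines quoted in the message above.
LOG_LINES = [
    "2023-07-24 21:49:23 INFO [VenicePushJob] [main] Specific status: {dc-0=END_OF_PUSH_RECEIVED, dc-1=STARTED}",
    "2023-07-24 21:49:24 INFO [VenicePushJob] [main] Specific status: {dc-0=END_OF_PUSH_RECEIVED, dc-1=STARTED}",
    "2023-07-24 21:49:25 INFO [VenicePushJob] [main] Specific status: {dc-0=END_OF_PUSH_RECEIVED, dc-1=ERROR}",
]

if __name__ == "__main__":
    print(first_error(LOG_LINES))
```

Run against the quoted lines, `first_error` points at dc-1 on the 21:49:25 line, which matches the observation that only DC1 errors out while DC0 has already received end of push.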
  • f

    Felix GV

    07/26/2023, 5:57 PM
    If missing data is detected during a push, then the push errors out and the dataset version associated with the push never comes online (at least in the DC where the error occurred), as you’ve seen. If missing data is detected from nearline writes (i.e. after a push has already ended, or if there was no push at all…) then we only record a metric, but don’t error out, since the dataset version is already online (the show must go on).
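The rule described above (fail the push if data is missing before the version is online, only count a metric afterwards) can be sketched as a small state machine. This is an illustrative model only, not Venice's implementation: `Phase`, `VersionIngestion`, `MissingDataError`, and `missing_data_metric` are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    PUSH = "push"          # batch push still in progress, version not yet online
    NEARLINE = "nearline"  # push has ended, version is serving


class MissingDataError(Exception):
    """Hypothetical stand-in for a missing-data failure during a push."""


@dataclass
class VersionIngestion:
    """Toy model of one dataset version in one data center."""
    phase: Phase = Phase.PUSH
    online: bool = False
    missing_data_metric: int = 0

    def end_of_push(self):
        # Once the push completes, the version comes online and
        # later writes are treated as nearline.
        self.phase = Phase.NEARLINE
        self.online = True

    def on_missing_data(self):
        if self.phase is Phase.PUSH:
            # During a push: error out, so the version never comes online here.
            raise MissingDataError("missing data detected during push")
        # After the push: record a metric only and keep serving.
        self.missing_data_metric += 1


if __name__ == "__main__":
    version = VersionIngestion()
    version.end_of_push()
    version.on_missing_data()
    print(version.online, version.missing_data_metric)
```

In this model a gap detected in DC1 during the push leaves that data center's version offline, while the same gap after `end_of_push()` only bumps the counter.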
  • c

    Craig Alfieri

    07/31/2023, 1:22 PM
    Hi Venice team, would it be possible to look towards early next week (8/7 or 8/8) to regroup on the debugging/reproduction workflow? I would have suggested this week, but we have some team members out of the office on PTO.
  • s

    Saxena Amit

    07/31/2023, 5:59 PM
    Sure. Monday is better than Tuesday. 9-11 or 1-3 should work.
  • c

    Craig Alfieri

    07/31/2023, 10:30 PM
    Thanks Amit! I'll run these avails by our team and see which window lands best.
  • c

    Craig Alfieri

    08/01/2023, 1:29 PM
    Hi Amit, I just checked calendars and it looks like we have the opposite situation on our side (Tuesday is better than Monday). Any chance there is an opening on Wednesday (8/9)?
  • s

    Saxena Amit

    08/01/2023, 10:18 PM
    Wednesdays are no meeting days for us. Thursday after 12 can work as well.
  • s

    Saxena Amit

    08/02/2023, 5:38 PM
    Hi Craig, Thursday 1-2 pm is also good on the 24th. I saw an updated meeting invite for the 24th. Not sure if it's a mistake. These are the times that will work for us: Mon 9-11, 1-3:30. Tue 10:30-11:30, 2-3. Wed not available. Thu 10:30-11:30, 2-4 pm. Fri 9-11:30, 1-4 pm. Also, Friday the 18th is a no-meeting day.
  • c

    Craig Alfieri

    08/02/2023, 5:49 PM
    Thank you Amit. Not sure how/why the invite drifted to the 24th. I moved this to Friday 8/11 from 1-2 EST (10-11 PST). Thank you for your patience while we got that squared away. Looking forward to reconnecting then.
    👍 1
  • c

    Craig Alfieri

    08/11/2023, 5:02 PM
    Hi everyone, we are starting today's session. If you can still make it, here are the VC details:
  • c

    Craig Alfieri

    08/11/2023, 5:02 PM
    https://us06web.zoom.us/j/83981515132?pwd=d1lDdXo2MHpOUElLR2NpVllESytWdz09
  • c

    Craig Alfieri

    10/03/2023, 2:38 PM
    Hi Venice Team, @here
    Hope you are all doing well! We recently released V2 of Antithesis and wanted to share some highlights. If there are any questions on these release notes or areas of interest we want to delve deeper into, please do not hesitate. Thank you again and take great care.
    Antithesis Version 2 Summary
    1. Antithesis Version 2 (V2) is now being rolled out to our customers.
    2. This new version of Antithesis enables us to offer more to customers, and offer it sooner. This is because V2 empowers Antithesis to build new features in response to customer demand far quicker than we could on our old platform; our timeline has moved from months to days.
    Upsides for our Customers
    Reports
    • All reports now give access to log and core dumps, rather than these features having to be built specifically for each customer report.
    • Further, old reports only generated logs at crashes; we now generate logs and core dumps at any point in time, as well as providing the state of each test property at said points in time.
    • All reports contain examples and counter-examples of the selected test properties.
    • These new report features can be used to reduce the massive amount of logs our customers need to store.
    • This can be done by running most or all tests within Antithesis; since we capture such detailed info for each experiment, we can reconstitute a particular test within minutes.
    Performance
    • Each test run will be more effective because:
      ◦ Breadth of search increases.
      ◦ Latency decreases.
    • "SnapshotGC" will keep us from losing test runs due to running into RAM limits.
    New Features
    These are the new features that will immediately enable our Professional Services team to provide greater value to customers.
    • Ability to build powerful command line extensions.
    • We have already used this to create features which give greater detail of activity leading to crashes, or provide detailed system performance graphs.
    • "Multiverse Analysis".
    • We can explore the multiverse of system states created by Antithesis in two powerful ways:
      ◦ We can use test properties or particular combinations of properties to analyze particular system states,
      ◦ or explore further from said states.
    • Our new eventset query language enables members of Antithesis Professional Services to craft better test properties and reports for customers, by making their workflow far more intuitive.
    • We now have the ability to observe an ongoing test and interact with it in real time, running any command.
    • Before, we could only interact with a test after it had completed.
    🎉 1
    😮 1
  • z

    Zac Policzer

    10/03/2023, 6:22 PM
    Interacting with a test is kind of interesting. I wonder if we could rig it up to view the state of zk through the steps
  • f

    Felix GV

    10/03/2023, 6:23 PM
    We could log the entire EV/CV every time it changes I guess, right? We wouldn't want to run that in prod, but it could be an ok mode to enable in this kind of setup.
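The "log the whole thing every time it changes" idea can be sketched generically. This is a minimal sketch under stated assumptions: the state is assumed to be available as a JSON-serializable snapshot (for example, a dict of partition to replica state), and `StateChangeLogger` and `observe` are hypothetical names, not an existing Venice, Helix, or Antithesis API.

```python
import json
import logging

logger = logging.getLogger("state-change")


class StateChangeLogger:
    """Logs a full snapshot only when it differs from the previous one."""

    def __init__(self, name, log=logger):
        self.name = name
        self._log = log
        self._last = None  # serialized form of the last snapshot logged

    def observe(self, snapshot):
        """Log the snapshot if it changed; return True when a line was logged."""
        # sort_keys makes the comparison independent of dict ordering.
        serialized = json.dumps(snapshot, sort_keys=True)
        if serialized == self._last:
            return False
        self._last = serialized
        self._log.info("%s changed: %s", self.name, serialized)
        return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    watcher = StateChangeLogger("EV")
    watcher.observe({"partition_0": "ONLINE"})  # logged (first snapshot)
    watcher.observe({"partition_0": "ONLINE"})  # unchanged, not logged
    watcher.observe({"partition_0": "ERROR"})   # logged (state changed)
```

Logging only on change keeps the volume tolerable in a test setup while still leaving a full snapshot at every transition, which is the property that matters when replaying a failing run.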
  • j

    Jeremy

    10/03/2023, 8:45 PM
    We could use the Zookeeper-cli at points during the test reproduction to inspect. Would that be able to get to the state information you’d be interested in?
  • c

    Chang Xiao

    10/05/2023, 5:15 PM
    Hi team, it was very nice meeting @Saxena Amit, @Min Huang, and @Zac Policzer at the LinkedIn campus, as well as catching up with @Gaojie Liu at the QCon conference venue. Just to summarize some of the next steps we discussed:
    1. The Venice team will use some "bug basher" time to fix the MissingDataException issue we found as part of the POC. Having this fix in place will mark the conclusion of the POC, and we will move towards discussion of a pilot engagement.
    2. During the process of the fix, the Venice team may need to add more debugging, etc. to the code. We are happy to help build off a debugging branch as needed to help this process, and to rerun our tests that find the issue with the additional debugging.
    3. Once a fix is in place, we will rerun and confirm with high confidence that this issue is resolved.
    Attached is the quick presentation I gave onsite; you can find additional debugging information, including jmap core files, in the analysis. Again, please reach out if you need any additional help!
    Venice POC Status Slides (9_30).pdf
    🙌 3
  • c

    Craig Alfieri

    01/24/2024, 4:04 PM
    Hi @here, greetings! It looks as though the MissingDataException is still being hit and was not resolved as originally thought. We tested one last time just to check and came across it. Below is a link to the report if the team wants to do an initial inspection.
    Suggest opening the Environment drop-down first to inspect what was tested, then expanding the red property failure: Never: MissingDataException. The report includes the log messages, a downloadable log dump, and detailed output text.
    https://venice.antithesis.com/report/G9lHBLz23oBiRIpKVJc2wiTZ/3hnxidMTWcWWtiazPpQds2jlE[…]YqrXZaRSs0xzHSYy82efKTOptGu4Q3HN-fbPgpaHzpi3xCQ&debug=true
    If anyone interested has time available, we are happy to provide further guidance.