# whylogs-support
  • powerful-potato-83513

    12/07/2023, 6:52 AM
    Hello
  • powerful-potato-83513

    12/07/2023, 6:54 AM
    Is it possible to use the whylogs profile visualisation in Python outside of the notebook environment (say, in Streamlit)?
  • acoustic-painter-98305

    12/07/2023, 5:11 PM
    Hi @powerful-potato-83513 - Yes, the profile visualizer can output HTML content, which could be displayed in Streamlit or any other place that supports HTML. Would love to see an example of it if you get something working.
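    For instance, a minimal sketch (assuming the visualizer's render methods return an IPython HTML object whose .data attribute holds the raw markup):
    ```python
    # Sketch: render a whylogs profile summary inside a Streamlit app.
    import pandas as pd
    import streamlit.components.v1 as components
    import whylogs as why
    from whylogs.viz import NotebookProfileVisualizer

    df = pd.read_csv("my_data.csv")  # placeholder dataset
    profile_view = why.log(df).view()

    viz = NotebookProfileVisualizer()
    viz.set_profiles(target_profile_view=profile_view)

    html = viz.profile_summary().data  # .data assumed to hold the HTML string
    components.html(html, height=800, scrolling=True)
    ```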
  • cuddly-france-22384

    01/12/2024, 7:37 PM
    Hello, newb here wanting some guidance. I opened an account on WhyLabs to evaluate an LLM. I would like to log my eval results to my WhyLabs dashboard from some analysis that I am running in a Google Colab notebook. I am having trouble using `why.init()` such that the metrics log onto my dashboard. I am using `from langkit.config import check_or_prompt_for_api_keys` to enter my WhyLabs keys and dataset ID. I tried `why.init(session_type='whylabs')`. When I type in my credentials I tried with and without `"..."`. Here's what I get:
    ```
    WARNING:whylogs.api.whylabs.session.session_manager:No api key found in session or configuration, will not be able to send data to whylabs.
    WARNING:whylogs.api.whylabs.session.session_manager:No org id found in session or configuration, will not be able to send data to whylabs.
    ```
  • cuddly-france-22384

    01/12/2024, 7:38 PM
    Appreciate any help! Ty!
  • cuddly-france-22384

    01/12/2024, 7:49 PM
    When I follow this flow
    ```python
    # First, install whylogs with the whylabs extra:
    # pip install -q 'whylogs[whylabs]'

    import pandas as pd
    import os
    import whylogs as why

    os.environ["WHYLABS_API_KEY"] = "YOUR-API-KEY"
    os.environ["WHYLABS_DEFAULT_ORG_ID"] = "YOUR-ORG-ID"
    os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-1"  # Note: the model ID is provided when setting up a model in WhyLabs

    # Point to your local CSV if you have your own data
    df = pd.read_csv("https://whylabs-public.s3.us-west-2.amazonaws.com/datasets/tour/current.csv")

    # Run whylogs on current data and upload to the WhyLabs Platform
    results = why.log(df)
    results.writer("whylabs").write()
    ```
    I get the message
    ```
    Skipping uploading profile to WhyLabs because no name was given with name=
    ```
  • mysterious-solstice-25388

    01/13/2024, 1:36 AM
    That warning message is generated by the `why.log` call and doesn't relate to your explicit use of the writer to upload the profile to WhyLabs. If you run `results.writer("whylabs").write()` in a separate cell, you should see that write succeed without the confusing message. Did you check the dashboards in your WhyLabs account to see if it already worked?
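    For illustration, a minimal two-cell split (the exact return value of write() varies by whylogs version, so treat the final print as something to verify):
    ```python
    import whylogs as why  # df defined as in your snippet above

    # Cell 1: profile the dataframe; the name= notice originates here and is harmless
    results = why.log(df)

    # Cell 2: upload explicitly and inspect the status the writer returns
    status = results.writer("whylabs").write()
    print(status)
    ```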
  • silly-cricket-55450

    01/30/2024, 11:15 AM
    Hi Team, I am currently exploring options for logging profile data, and I'm particularly interested in using InfluxDB for this purpose. I wanted to inquire whether Whylogs has the capability to log profile data directly into InfluxDB. If this feature is available or if there are any plans to support InfluxDB integration in the future, I would greatly appreciate any information or guidance you can provide.
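    whylogs has no built-in InfluxDB writer as far as this thread shows, but the profile view flattens to a pandas frame, so a small bridge is possible. A sketch using the influxdb-client package (URL, token, org, bucket, and df are placeholders):
    ```python
    # Hypothetical bridge, not a built-in whylogs writer: flatten the profile
    # summary and push one point per profiled column into InfluxDB.
    import whylogs as why
    from influxdb_client import InfluxDBClient, Point
    from influxdb_client.client.write_api import SYNCHRONOUS

    summary = why.log(df).view().to_pandas()  # one row of metrics per column

    client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
    write_api = client.write_api(write_options=SYNCHRONOUS)

    for column, row in summary.iterrows():
        point = Point("whylogs_profile").tag("column", str(column))
        for metric, value in row.items():
            # InfluxDB fields must be scalar; skip NaNs and non-numeric entries
            if isinstance(value, (int, float)) and value == value:
                point = point.field(metric, float(value))
        write_api.write(bucket="profiles", record=point)
    ```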
  • lively-apartment-74947

    02/01/2024, 6:26 AM
    Hi team, I am exploring whylogs with Fugue, but I keep getting the error below even though I have increased spark.driver.maxResultSize to 50.0 GiB:
    ```
    24/01/31 15:25:10 ERROR TaskSetManager: Total size of serialized results of 60 tasks (54.1 GiB) is bigger than spark.driver.maxResultSize (50.0 GiB)
    24/01/31 15:25:10 ERROR TaskSetManager: Total size of serialized results of 61 tasks (55.0 GiB) is bigger than spark.driver.maxResultSize (50.0 GiB)
    24/01/31 15:25:10 ERROR TaskSetManager: Total size of serialized results of 62 tasks (55.9 GiB) is bigger than spark.driver.maxResultSize (50.0 GiB)
    24/01/31 15:25:10 ERROR TaskSetManager: Total size of serialized results of 63 tasks (56.8 GiB) is bigger than spark.driver.maxResultSize (50.0 GiB)
    24/01/31 15:25:10 ERROR TaskSetManager: Total size of serialized results of 64 tasks (57.9 GiB) is bigger than spark.driver.maxResultSize (50.0 GiB)
    24/01/31 15:25:10 INFO DAGScheduler: ResultStage 1 (_collect_as_arrow at /env/lib/python3.9/site-packages/fugue_spark/_utils/convert.py:206) failed in 257.895 s due to Job aborted due to stage failure: Total size of serialized results of 56 tasks (50.5 GiB) is bigger than spark.driver.maxResultSize (50.0 GiB)
    ```
    This is my code:
    ```python
    import base64  # added: needed for b64encode below

    # added: fugue_profile comes from whylogs' Fugue integration
    from whylogs.api.fugue import fugue_profile


    def profile_dataframe(transaction_id: str, featureset_id: str, messaging, spark_df) -> None:
        """
        Profile the Spark DataFrame and update the profile via API.

        Parameters:
        - transaction_id (str): The transaction ID for the API request.
        - featureset_id (str): The featureset ID for the API request.
        - messaging: Custom request service.
        - spark_df: The input Spark DataFrame.

        Raises:
        - ValueError: If there is an error during profiling.
        """
        try:
            # Profile the Spark DataFrame using Fugue; since our input is already
            # a Spark df we don't need engine=spark, it is automatically inferred
            dataset_profile_view = fugue_profile(spark_df)

            serialized_profile = dataset_profile_view.serialize()

            # Encode the serialized profile to base64
            output = base64.b64encode(serialized_profile).decode()

            # Update the profile via API
            update_profile_via_api(messaging, transaction_id=transaction_id,
                                   featureset_id=featureset_id,
                                   serialized_profile=output)

        except ValueError as e:
            raise ValueError(f"An error occurred during profiling: {e}")
    ```
    Can you please suggest a better approach to achieve this, if you are aware of one? Specifically, it is failing for datasets like this one: the first column actually comes up as varchar (I am aware it should be int or long, but that is the case with a few datasets; refer to the attached image). The data is fairly small and definitely not bigger than 50 GB. It has 3,210,035,928 records.
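    One thing worth trying (a sketch, not advice from this thread): whylogs' native PySpark integration aggregates profiles on the executors instead of collecting results to the driver, which sidesteps maxResultSize pressure:
    ```python
    # Sketch: profile with whylogs' experimental PySpark API so aggregation
    # happens on the executors rather than on the driver.
    from whylogs.api.pyspark.experimental import collect_dataset_profile_view

    dataset_profile_view = collect_dataset_profile_view(input_df=spark_df)
    serialized_profile = dataset_profile_view.serialize()
    ```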
  • thousands-match-39457

    02/06/2024, 5:51 AM
    Is there a way to retrieve the computed classification metrics (accuracy, F1, AUC, etc.) from the results of this call using the whylogs client library:
    ```python
    results = why.log_classification_metrics(
        df,
        target_column="output_discount",
        prediction_column="output_prediction",
        score_column="output_score",
        log_full_data=True,
    )
    ```
    Here is the example notebook: https://github.com/whylabs/whylogs/blob/mainline/python/examples/integrations/writ[…]ion_Performance_Metrics_to_WhyLabs.ipynb?ref=content.whylabs.ai
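    One starting point (hedged: attribute names may differ across whylogs versions) is that the profile view exposes the logged performance data, e.g. the confusion matrix, rather than precomputed accuracy/F1/AUC values:
    ```python
    # Sketch: pull the model-performance section out of the profile view.
    perf = results.view().model_performance_metrics  # assumed accessor
    print(perf.confusion_matrix)  # assumed attribute; inspect perf to confirm
    ```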
  • lively-apartment-74947

    02/08/2024, 6:58 AM
    Is there an alternative for this data type? Please see the error in the thread.
  • curved-coat-90458

    02/13/2024, 7:32 PM
    I am trying to push LLM monitoring metrics to the platform. I am facing this error even though langkit has been installed: `No module named: langkit.callback.handler`. Could you please help? The langkit version is 0.0.2 and the error seems to be caused by this line: `whylabs = WhyLabsCallbackHandler.from_params()` (edited)
    👀 1
  • silly-cricket-55450

    02/16/2024, 7:02 AM
    Everyone - I'm reaching out about an issue I encountered while attempting to generate a profile output for a JSON payload using whylogs. My aim was to use whylogs to analyze JSON data, but it seems there might be a limitation, or a misunderstanding on my end, regarding its compatibility with JSON input. Here's a brief overview of the problem along with the code snippet I used:
    ```python
    import pandas as pd
    import whylogs as why

    # Simple JSON input data
    json_data = [
        {"deviceId": 373088, "uin": "CV620GVHEG0000007", "deviceType": "AC"},
        {"deviceId": 373089, "uin": "CV620GVHEG0000008", "deviceType": "AC"},
        {"deviceId": 373090, "uin": "CV620GVHEG0000009", "deviceType": "AC"}
    ]

    # Convert JSON strings to a DataFrame
    retail_daily = pd.DataFrame(json_data, columns=['json_string'])

    # Log the data frame
    results = why.log(pandas=retail_daily)

    # Get the Results
    profile = results.profile()

    # Display the profile
    profile.view().to_pandas()
    ```
    Output received:
    ```
    | Column            | cardinality/est | cardinality/lower_1 | cardinality/upper_1 | counts/inf | counts/n | counts/nan | counts/null | distribution/max | distribution/mean | distribution/median | ... | distribution/q_95 | distribution/q_99 | distribution/stddev | type            | types/boolean | types/fractional | types/integral | types/object | types/string | types/tensor |
    |-------------------|-----------------|----------------------|----------------------|------------|-----------|-------------|--------------|------------------|-------------------|---------------------|-----|-------------------|-------------------|----------------------|-----------------|----------------|------------------|----------------|--------------|---------------|--------------|
    | json_string       | 0.0             | 0.0                  | 0.0                  | 0          | 3         | 3           | 3            | NaN              | 0.0               | None                | ... | None              | None              | 0.0                  | SummaryType.COLUMN | 0               | 0                | 0              | 0            | 0             | 0             |
    ```
    Expected profile output: a detailed profile reflecting the characteristics of the JSON data, such as data types, cardinality, counts, and distributions. I would appreciate any insights or guidance on whether whylogs supports JSON input directly for generating profiles. If not, I'd love to know any workarounds or best practices to achieve this. Additionally, if there are any mistakes in my approach or if further clarification is needed, please feel free to let me know.
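    The empty profile above comes from the DataFrame construction rather than from whylogs: pd.DataFrame(json_data, columns=['json_string']) selects a key that none of the dicts contain, so the column is all nulls. A sketch of the fix, letting pandas build one typed column per JSON key:
    ```python
    import pandas as pd
    import whylogs as why

    json_data = [
        {"deviceId": 373088, "uin": "CV620GVHEG0000007", "deviceType": "AC"},
        {"deviceId": 373089, "uin": "CV620GVHEG0000008", "deviceType": "AC"},
        {"deviceId": 373090, "uin": "CV620GVHEG0000009", "deviceType": "AC"},
    ]

    # One column per JSON key (use pd.json_normalize for nested JSON)
    df = pd.DataFrame(json_data)

    results = why.log(df)
    print(results.profile().view().to_pandas())  # real counts and cardinalities
    ```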
  • purple-airplane-15031

    02/20/2024, 6:05 AM
    Hi, the documentation states that the "pyspark implementation is in the experimental phase".
    1. Is this statement still accurate?
    2. Are there currently any issues with pushing whylogs into a Databricks/PySpark environment?
    3. Besides usage statistics, is any data sent outside of the local environment? (HIPAA compliance)
    Thanks.
  • happy-hamburger-71923

    02/21/2024, 3:53 PM
    hello, I stumbled upon the whylogs Git page and found the project really interesting! I'm trying to test it by running the following notebook to integrate it with MLflow: https://github.com/whylabs/whylogs-examples/blob/mainline/python/MLFlow%20Integration%20Example.ipynb However, I'm getting an error at the very start: `AttributeError: module 'whylogs' has no attribute 'get_or_create_session'`
  • happy-hamburger-71923

    02/21/2024, 3:54 PM
    is the notebook out of date, or is there other documentation on linking whylogs with MLflow?
  • happy-hamburger-71923

    02/21/2024, 6:51 PM
    Moreover, in most of the examples whylogs is called directly in Colab without initializing a connection to WhyLabs... however, that doesn't seem possible anymore (this Colab, for instance: https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/basic/Getting_Started.ipynb#scrollTo=3XU5AY8ogr0I). Can you please provide documentation on how to run whylogs directly in code?
  • mysterious-solstice-25388

    02/21/2024, 8:32 PM
    Hello @happy-hamburger-71923 Can you try this example? https://whylogs.readthedocs.io/en/stable/examples/integrations/Mlflow_Logging.html The example repository you referenced is based on the older whylogs 0.7.x and earlier: https://github.com/whylabs/whylogs-examples?tab=readme-ov-file#whylogs-examples
    ✅ 1
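    For reference, a minimal sketch of a current-style (whylogs 1.x) MLflow hand-off that relies only on the generic artifact API:
    ```python
    # Sketch: serialize a whylogs v1 profile and attach it to an MLflow run.
    import mlflow
    import pandas as pd
    import whylogs as why

    df = pd.DataFrame({"a": [1, 2, 3]})  # placeholder data

    with mlflow.start_run():
        profile_view = why.log(df).view()
        profile_view.write("profile.bin")   # serialize the profile to disk
        mlflow.log_artifact("profile.bin")  # attach it to the active run
    ```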
  • happy-hamburger-71923

    02/22/2024, 9:26 AM
    thank you @mysterious-solstice-25388 for your reply, I was able to fully test it and I'm impressed by how easily it can be used and integrated with different frameworks
    🎉 2
  • boundless-easter-47982

    03/05/2024, 11:05 AM
    👋 hello!
  • astonishing-kangaroo-33717

    03/05/2024, 12:22 PM
    Hello everyone, I wish to perform model monitoring of machine learning models in production and get several metrics like drift, precision, recall, AUC, etc., in a distributed setting. I was intrigued by the profiling offered by the whylogs team, which gives out-of-the-box PySpark support. I want to understand how I can use the generated profiles to find drift, precision, recall, and the other monitoring metrics. Is someone already doing this solely with the OSS version?
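    For the drift part specifically, the OSS package can score drift between two profile views (a sketch; the import path follows the whylogs drift examples and may move between versions):
    ```python
    # Sketch: compare a current profile against a reference profile for drift.
    from whylogs.viz.drift.column_drift_algorithms import calculate_drift_scores

    scores = calculate_drift_scores(
        target_view=current_profile.view(),      # placeholder profiles
        reference_view=reference_profile.view(),
        with_thresholds=True,  # include per-column drift categories
    )
    print(scores)  # per-column algorithm, statistic/p-value, and drift category
    ```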
  • incalculable-motorcycle-82302

    04/03/2024, 5:05 AM
    Hello, is there a way to build a constraint that checks `no_missing_values` over two columns (or more) simultaneously? The check should fail only on rows where both columns have missing values.
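    One workaround (a sketch, not a built-in multi-column constraint; columns a and b are placeholders): derive a helper column that is null only when both source columns are null, then apply the standard no_missing_values factory to it:
    ```python
    # combine_first keeps a's value and falls back to b's, so the helper column
    # is null only where BOTH a and b are null.
    import whylogs as why
    from whylogs.core.constraints import ConstraintsBuilder
    from whylogs.core.constraints.factories import no_missing_values

    df["a_or_b"] = df["a"].combine_first(df["b"])

    profile_view = why.log(df).view()
    builder = ConstraintsBuilder(dataset_profile_view=profile_view)
    builder.add_constraint(no_missing_values(column_name="a_or_b"))
    constraints = builder.build()
    print(constraints.generate_constraints_report())
    ```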
  • incalculable-motorcycle-82302

    04/08/2024, 6:52 AM
    Hey, a couple of questions related to segmented profiling:
    • If I want both the overall profile and the segmented profile, do I need to log the dataset twice? Or would it be enough to perform the segmented profiling and then aggregate it into the overall profile? Any code example of how to do this?
    • The segmented profiling seems orders of magnitude slower than the overall profile. Is it due to the large number of segments? Is there any code recommendation to speed up a segmented profile?
  • incalculable-motorcycle-82302

    04/09/2024, 4:14 AM
    Should I partition the Spark dataframe by the columns I am going to use in the segmented profiling to speed up the process? Any tips to speed up segmented profiling?
  • acoustic-painter-98305

    04/09/2024, 2:19 PM
    Hi @incalculable-motorcycle-82302 - when you write segmented profiles to the WhyLabs platform, they contain both the segmented data and the total population for use in dashboards. How many segments do you have? It may impact performance, but not to the degree you are mentioning.
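    For reference, a minimal segmented-logging setup in the OSS API (the segment column name and df are placeholders):
    ```python
    # Sketch: log once with a segmented schema; each segment gets its own profile.
    import whylogs as why
    from whylogs.core.schema import DatasetSchema
    from whylogs.core.segmentation_partition import segment_on_column

    schema = DatasetSchema(segments=segment_on_column("region"))
    results = why.log(df, schema=schema)
    print(results.count)  # number of segments produced
    ```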
  • echoing-orange-31613

    06/18/2024, 3:23 PM
    Hi everyone, I'm facing slow processing times for a DataFrame loaded from BigQuery. The DataFrame has 250 million rows and 42 columns; one column is a unique identifier for each customer, and the other columns are mostly integer columns. Currently it takes 21 minutes to create the profile. I'm hoping to identify the issue and improve the processing speed. The data is loaded into memory. Any suggestions would be helpful!
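    One pattern worth measuring here (a sketch; the chunk and process counts are arbitrary, and df is the in-memory frame): profile chunks in parallel and merge the views, since profile views merge cheaply:
    ```python
    # Sketch: profile dataframe chunks in parallel, then merge the partial views.
    from functools import reduce
    from multiprocessing import Pool

    import numpy as np
    import whylogs as why

    def profile_chunk(chunk):
        return why.log(chunk).view()

    chunks = np.array_split(df, 16)   # arbitrary chunk count
    with Pool(processes=8) as pool:   # arbitrary parallelism
        views = pool.map(profile_chunk, chunks)

    merged_view = reduce(lambda a, b: a.merge(b), views)
    ```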
  • billions-easter-40437

    06/20/2024, 9:44 AM
    Hi, I need support with custom metrics in whylogs: https://whylogs.readthedocs.io/en/latest/examples/advanced/Custom_Metrics.html; the result is in the attached image. Can you explain why we get this result? Also, can you explain the function of columnar_update and merge?
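    Conceptually, columnar_update folds one batch of a column's values into the metric's state, while merge combines two independently accumulated states (e.g. from different partitions) into one. A sketch of both hooks (the PreprocessedColumn accessor names are assumptions to verify against your whylogs version):
    ```python
    from dataclasses import dataclass

    from whylogs.core.metrics.metric_components import IntegralComponent
    from whylogs.core.metrics.metrics import Metric, OperationResult
    from whylogs.core.preprocessing import PreprocessedColumn


    @dataclass(frozen=True)
    class RowCountMetric(Metric):
        n: IntegralComponent

        @property
        def namespace(self) -> str:
            return "row_count"

        def columnar_update(self, data: PreprocessedColumn) -> OperationResult:
            # Called once per batch of column values: fold the batch into the state.
            count = 0
            for values in (data.numpy.ints, data.numpy.floats, data.list.objs):  # assumed accessors
                if values is not None:
                    count += len(values)
            self.n.set(self.n.value + count)
            return OperationResult.ok(count)

        def merge(self, other: "RowCountMetric") -> "RowCountMetric":
            # Called when two partial profiles are combined: counts simply add.
            return RowCountMetric(n=IntegralComponent(self.n.value + other.n.value))

        def to_summary_dict(self, cfg=None):
            return {"n": self.n.value}

        @classmethod
        def zero(cls, config=None) -> "RowCountMetric":
            return cls(n=IntegralComponent(0))
    ```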
  • high-electrician-66573

    06/26/2024, 7:31 AM
    Hi, I need support: I need to remove the "type" metric from the whylogs results. How can I do that?
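    If the goal is just to keep it out of the summary table, a post-hoc filter is the simplest route (a sketch; suppressing the metric at logging time would instead need a custom resolver/schema):
    ```python
    # Sketch: drop the type-related columns from the profile summary dataframe.
    summary = results.profile().view().to_pandas()
    summary = summary.loc[:, ~summary.columns.str.startswith("types/")]
    ```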
  • high-electrician-66573

    06/27/2024, 10:47 AM
    Hi, I need help:
    ```python
    @classmethod
    def zero(cls, config: Optional[MetricConfig] = None) -> "CustomMetric":
        return CustomMetric(
            n_row=IntegralComponent(0),
            missing_rate=FractionalComponent(0.0),
            duplicated_rate=FractionalComponent(0.0),
            n_unique=IntegralComponent(0),
            itype=StringComponent(""),
        )
    ```
    I need itype to be a string; is there any component for str?
  • high-electrician-66573

    06/28/2024, 7:10 AM
    hi, I need support: I am using whylogs with a custom metric, and it works fine in plain whylogs, but when I use fugue_profile it returns the error:
    ```
    whylogs.core.errors.UnsupportedError: Unsupported metric: dx_base_metric
    ```
    I traced it and found that _METRIC_DESERIALIZER_REGISTRY does not contain the new custom metric.
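    A plausible cause (hedged: the helper's exact import path should be checked against your whylogs version) is that the custom metric is registered only in the driver process, while Fugue/Spark workers deserialize profiles in fresh Python processes that never ran the registration. A sketch:
    ```python
    # Sketch: ensure the custom metric lands in the deserializer registry on
    # every executor, not just on the driver.
    from whylogs.core.metrics.metrics import register_metric  # assumed import path
    from my_metrics import DxBaseMetric  # hypothetical module holding the custom metric

    register_metric(DxBaseMetric)  # run at import time of a module shipped to the workers
    ```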