# getting-started
a
@chilly-holiday-80781 @mammoth-bear-12532 I tried to reproduce the MLModel usage shown in the presentation at the July 23rd meeting, but I cannot get the same Training dataset -> MLModel lineage that was presented there. Below you can see screenshots from the `bootstrap_mce.json` in `metadata-ingestion`. I had to add an MLModelGroup to make `scienceModel` selectable in the UI. These are the issues I found with this sample `bootstrap_mce.json`:
1. The `scienceModel` MLModel details page doesn’t have any lineage buttons or any description of connections to training/evaluation datasets (compare it with the view from the demo). The ML Group has a `View Graph` button, so the Model -> Group lineage is visible in the graph.
2. Browse `ML Models` and `ML Groups` don’t show any entities, but these entities are searchable.
3. The MLModel has `TrainingData` and `EvaluationData` defined, but this is not visible in the model details or in the dataset details. I’ve added the `pageViewsHive` dataset to the ingestion sample, but it didn’t show any relation to `scienceModel` in the lineage graph or in the Downstream section of the Lineage details.
I’ve tested it on `v0.8.8`, `v0.8.10`, and `master` from yesterday. I tested both `elasticsearch` and `neo4j` graph storage, with the same results. What have I missed to make it work?
@chilly-holiday-80781 @ambitious-airline-8020 Sample `bootstrap_mce.json` to reproduce it with `./docker/ingestion/ingestion.sh`. This sample is slightly fixed to avoid errors when querying from the DataHub UI (added `mlModelGroup`, added a link to `mlModelGroup` from `mlModel`, and removed `Cost` from `MLModel` because of an NPE in the GraphQL mapper with `costId`).
l
Hello Dmitry - SageMaker's lineage API doesn't yet expose the model to feature table edge. We have asked them for that, and it is probably a Q4 thing for them.
We decided to remove the lineage button on the models page after confirming this. cc @green-football-43791 @early-lamp-41924
The lineage that does exist right now is only from datasets to feature tables. We will add the feature table to model group edge as soon as the SageMaker API supports it.
sorry about the confusion here
a
@loud-island-88694 What about lineage from training/evaluation datasets to the model? This can be defined without feature tables.
a
@abundant-dinner-2901 you mean without any strict relation to SageMaker at all, right?
a
Yes, it is not related to SageMaker, I am integrating it with my custom ETL pipeline
l
ok. will take a deeper look and get back to you
e
@abundant-dinner-2901 To allow browse, you need to ingest a BrowsePaths aspect. You should find it under com.linkedin.pegasus2avro.common.BrowsePaths. It has one field called paths that takes an array of slash-separated paths through which you can reach the entity (usually one). Note that all paths must start with a slash (like /a/b/c, not a/b/c). This will also fix the issue of the lineage button not showing up (being fixed right now).
Unfortunately, we did not hook up trainingData and evaluationData yet. We only hooked up the trainingJobs field in MLModelProperties. It should link to the job that took in a few source datasets and trained the model. In that case, we create lineage between the datasets and the job, and between the job and the model.
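A minimal sketch of what such a browse-path entry might look like when added to an MLModel snapshot in a `bootstrap_mce.json`-style file. The snapshot layout, the urn, the example path, and the helper name `browse_paths_aspect` are assumptions for illustration, not values taken from this thread; only the aspect name and its `paths` field come from the message above.

```python
import json

# Aspect name as given in the message above.
BROWSE_PATHS_KEY = "com.linkedin.pegasus2avro.common.BrowsePaths"


def browse_paths_aspect(paths):
    """Build a BrowsePaths aspect dict (hypothetical helper).

    Rejects any path that does not start with a slash.
    """
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"browse path must start with '/': {path!r}")
    return {BROWSE_PATHS_KEY: {"paths": list(paths)}}


# Assumed snapshot layout with a hypothetical urn and path.
mce = {
    "auditHeader": None,
    "proposedSnapshot": {
        "com.linkedin.pegasus2avro.metadata.snapshot.MLModelSnapshot": {
            "urn": "urn:li:mlModel:(urn:li:dataPlatform:science,scienceModel,PROD)",
            "aspects": [browse_paths_aspect(["/prod/science/scienceModel"])],
        }
    },
    "proposedDelta": None,
}

print(json.dumps(mce, indent=2))
```

The leading-slash check is done up front so that a malformed path fails before anything is written to the ingestion file.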
We have added a task to add lineage edges from the old trainingData and evaluationData aspects as well. Sorry about the confusion
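The dataset -> job -> model chain described above can be sketched in plain Python. This only models the resulting edges; it does not call DataHub. The function name `training_lineage_edges`, the `job_inputs` mapping, and all urns are hypothetical and chosen for illustration.

```python
def training_lineage_edges(model_urn, training_jobs, job_inputs):
    """Return (upstream, downstream) pairs for datasets -> job -> model.

    training_jobs: job urns listed in the model's trainingJobs field.
    job_inputs: hypothetical mapping of job urn -> list of input dataset urns.
    """
    edges = []
    for job_urn in training_jobs:
        # Each source dataset feeds the training job.
        for dataset_urn in job_inputs.get(job_urn, []):
            edges.append((dataset_urn, job_urn))
        # The training job produces the model.
        edges.append((job_urn, model_urn))
    return edges


# Hypothetical urns, for illustration only.
MODEL = "urn:li:mlModel:(urn:li:dataPlatform:science,scienceModel,PROD)"
JOB = "urn:li:dataJob:(urn:li:dataFlow:(airflow,train_flow,PROD),train_task)"
DATASET = "urn:li:dataset:(urn:li:dataPlatform:hive,pageViewsHive,PROD)"

edges = training_lineage_edges(MODEL, [JOB], {JOB: [DATASET]})
for upstream, downstream in edges:
    print(f"{upstream} -> {downstream}")
```

In this sketch the dataset never links to the model directly; it is reachable only through the job, which matches the behaviour described for `trainingJobs`.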
a
@early-lamp-41924 thanks for guidelines!
e
Let us know how it goes!
a
@early-lamp-41924 yes, adding browse paths helped to enable lineage, but it’s pretty limited: it covers only `trainingJobs` / `downstreamJobs`, not datasets as I expected. I also found that it’s not possible to link an `MLModel` to an `MLFeatureTable`; I need to define every `MLFeature` that the `MLModel` uses. That is really hard, because the `MLFeatureTable` is defined in one process, while the `MLModel` is defined in another. Do you have any plans to connect `MLFeatureTable` to `MLModel`?
e
As long as the urns for the features in the MLModel match the urns for the features in the table, it should work. This was a huge debate on our end as well, though: either we give flexibility by connecting models to features directly, or we connect models to feature tables. We are working on making this easier and more intuitive.
a
@early-lamp-41924 the same model can be connected to different sets/tables of features, which should be grouped in some way and used by model for training on different training sets. So I think the current ML entities are missing Dataset - MLFeatureTable, MLFeatureTable - MLModel relations
e
Right. Each ML Feature is bound by a namespace and is stored in an ML Feature Table. Since we have links between dataset - mlfeature and mlfeature - mlmodel, we should be able to infer the feature tables that the ml model depends on. This is something we will work on.
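One way to picture that inference, as a rough sketch rather than anything DataHub does today: match the feature urns listed on the model against the feature urns held by each table. The function name `infer_feature_tables` and every urn below are hypothetical.

```python
def infer_feature_tables(model_feature_urns, table_to_feature_urns):
    """Return the feature tables holding at least one feature used by the model.

    Matching is by exact urn equality, so the urns on the model must be
    identical to the urns listed in the feature table.
    """
    model_features = set(model_feature_urns)
    return sorted(
        table_urn
        for table_urn, feature_urns in table_to_feature_urns.items()
        if model_features & set(feature_urns)
    )


# Hypothetical urns, for illustration only.
tables = {
    "urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,users)": [
        "urn:li:mlFeature:(users,age)",
        "urn:li:mlFeature:(users,country)",
    ],
    "urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,orders)": [
        "urn:li:mlFeature:(orders,total)",
    ],
}
model_features = ["urn:li:mlFeature:(users,age)"]

print(infer_feature_tables(model_features, tables))
```

Because the match is on exact urns, a feature defined under a different namespace in the model's process would not resolve to its table, which is the mismatch risk raised earlier in the thread.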