# ask-anything
j
How does ploomber actually check if something is outdated? I have a function that is ALWAYS considered outdated, and it gives a spurious diff
Is there some way to freeze the file version?
e
it stores the source code of the function and compares it to the current one. it normalizes whitespace and ignores comments. can you show the spurious diff? what do you mean by freezing the file version?
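For context, a minimal sketch of that store-and-compare idea (an illustration, not Ploomber's actual implementation; the `normalize` below is a stand-in for its real normalizer):

```python
import inspect

def normalize(source: str) -> str:
    # stand-in normalizer: strip trailing whitespace per line; Ploomber's
    # real normalizer additionally removes comments and reformats the code
    return "\n".join(line.rstrip() for line in source.splitlines())

def is_outdated(func, stored_source: str) -> bool:
    # a task is outdated when the normalized current source no longer
    # matches the normalized source stored at the last execution
    return normalize(inspect.getsource(func)) != normalize(stored_source)
```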
j
how does it normalize whitespace?
That might be it
uhm sorry
`vector_data = pd .read_csv`
in the original file there is no space between "pd" and ".read_csv"
e
It runs autopep8, but I don't think that's the problem. Let me do some digging; I'll send you some commands you can run to debug it.
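One way to check this by hand is to run autopep8's Python API on the offending line and see whether it touches the odd spacing (a quick sketch; requires `pip install autopep8`):

```python
import autopep8

original = 'vector_data = pd .read_csv("data.csv")\n'
normalized = autopep8.fix_code(original)
# if input and output are identical, autopep8 isn't introducing the spacing
print(normalized == original, repr(normalized))
```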
j
I ran autopep8 and then black, which I use for formatting, and black didn't detect any changes. But `ploomber status` says the code changed
e
Oh, I see what the problem is. This has happened before: I think black changes the quotation marks (autopep8 leaves them alone); I remember someone having this problem when using black. Try skipping black and see if that fixes it. We still have to provide a long-term solution, since black is pretty popular
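To see the quote change black makes, here's a quick sketch using black's Python API (requires `pip install black`):

```python
import black

source = "name = 'ploomber'\n"
# black rewrites single quotes to double quotes, so source cached before
# running black won't match the post-black file character-for-character
print(black.format_str(source, mode=black.Mode()))
# -> name = "ploomber"
```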
j
but when I run autopep8 it also doesn't change anything
I tried `normalize_python` from your codediffer and it returns code with this weird new whitespace, which is different from autopep8's normalization
or maybe you use some specific options?
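A quick way to compare the two normalizers side by side (a sketch; the import path for `normalize_python` is an assumption based on this thread):

```python
import autopep8
from ploomber.codediffer import normalize_python  # assumed module path

snippet = 'vector_data = pd.read_csv("data.csv")\n'
# print both normalized forms to spot where the extra whitespace comes from
print(repr(autopep8.fix_code(snippet)))
print(repr(normalize_python(snippet)))
```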
e
alright, let me take a look at the source code
ok, can you run the task that's always marked as outdated, then execute:
```
ploomber interact
```
then:
```python
# replace 'task-name' with the actual name
print(dag['task-name'].status(return_code_diff=True)['Code diff'])
```
and show me the output?
j
```python
def train(
    product,
    upstream,
    classes_to_use: List[str],
    class_maping: Dict,
    model: str,
    model_parameters: Optional[Dict] = None,
    search_type: Optional[str] = None,
    parameters_search: Optional[Dict] = None,
    cv: int = 5,
    test_size: float = 0.2,
    perform_data_scaling: bool = True,
):
    model_path = str(product["model_path"])
    vector_data = pd .read_csv(str(upstream["train.enrich_tif_metadata"]))
    vector_data = process_labels(vector_data, classes_to_use, class_maping)
    X = vector_data .drop(["labels", "filename"], axis=1).to_numpy()
    y = vector_data["labels"].to_numpy()
    X_train, X_test, y_train, y_test = model_selection .train_test_split(
        X, y, test_size=test_size, random_state=42, stratify=y
    )
    logging .info("used metadata features")
    logging .info(
        [
            col
            for col in vector_data .columns
            if "embedding"not in col and col not in ["labels", "filename"]
        ]
    )
    logging .info("also using embeddings")
    if model_parameters is None:
        model_parameters = {}
    classifier = MODELS[model](**model_parameters)
    if perform_data_scaling:
        classifier = pipeline .make_pipeline(
            preprocessing .StandardScaler(), classifier)
    if search_type in PARAMETER_SEARCH_TYPES .keys():
        logging .info("performing parameter search")
        if parameters_search is None:
            parameters_search = {}
        classifier = PARAMETER_SEARCH_TYPES[search_type](
            classifier, parameters_search, random_state=42, cv=cv
        )
        logging .info("used metadata features")
        logging .info(
            [
                col
                for col in vector_data .columns
                if "embedding"not in col and col not in ["labels", "filename"]
            ]
        )
        classifier .fit(X_train, y_train)
        logging .info(classifier .best_params_)
        classifier = classifier .best_estimator_
    else:
        logging .info("fitting model")
        classifier .fit(X_train, y_train)
    y_pred = classifier .predict(X_test)
    logging .info("classification report")
    logging .info(metrics .classification_report(y_test, y_pred))
    with open(model_path, "wb")as handle:
        pickle .dump(classifier, handle)
```
that's basically the whole function
e
interesting. so that should add `-` and `+` markers to the diff to show what it's detecting, but I don't see any. the whitespace definitely looks weird, though. let me do some debugging
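For reference, the `-`/`+` markers look like this in a unified diff (a sketch using Python's standard difflib; Ploomber may use a different differ internally):

```python
import difflib

stored = 'vector_data = pd.read_csv("data.csv")\n'
current = 'vector_data = pd .read_csv("data.csv")\n'
# removed lines are prefixed with '-', added lines with '+'
print("".join(difflib.unified_diff(
    stored.splitlines(keepends=True),
    current.splitlines(keepends=True),
    fromfile="stored", tofile="current",
)))
```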
j
BTW I first tried this with ploomber 0.15, then updated to 0.19.6, and it's the same
e
yeah, I was expecting that. we haven't changed the code that compares the cached source vs the current one in a while. please try this:
```
ploomber interact
```
then:
```python
# replace 'task-name' with the actual name
dag['task-name'].status()
```
and share the table that appears
were you able to fix this?