Hi, I am sending `identify` events to BigQuery and...
# support
s
Hi, I am sending
identify
events to BigQuery and according to this documentation it should be performing an upsert into the
users
table https://www.rudderstack.com/docs/destinations/warehouse-destinations/warehouse-schema/#identify But it seems to be appending, screenshot just of the
id
column, any ideas?
b
Indeed this is not the expected behavior. Can you share a few more details for your setup? • Are you self-hosting? if yes which rudder-server version you are using? • If not, what’s your workspace id ?
s
Hi, thanks for the quick response, we are self-hosting and we split our servers into a gateway and processor which are both on
rudder-server:31052022.120454
and we also host the transformers which are on
rudder-transformer:23072022.053518
b
This is a very old version, from more than a year ago. It is possible that remains no longer compatible with our control plane server. Or it could be something wrong in that particular version, but I need to check. Is it possible for you to upgrade to one of our latest versions?
s
That’s understandable, I will try to update our dev environment and test, just conscious about moving this to production and what could differ or potentially break, if it is an issue in this version would there be any chance of a patch?
And do you think I need to upgrade the rudder-server version or just the transformers?
b
if it is an issue in this version would there be any chance of a patch?
In the last year, we have had many things, and trying to release a patch for an older version will require significant effort. We can only check if we can modify our control-plane to be compatible with that version. I can not guarantee this.
And do you think I need to upgrade the rudder-server version or just the transformers?
Both will be needed. Is there a particular reason you stick to that version? We can provide help in upgrading, as this would be significant less effort from patching an older version.
s
Ok thanks, significant efforts can also come with risk so I’d choose a full upgrade then
We have just had issues in the past so once we have a working version we lock it down
But I do agree we need to be upgrading more frequently
b
Do you know when this problem first started? Did anything change on your end before that?
s
This is the first time we’re using identify events in BigQuery so I wouldn’t be able to tell unfortunately
I updated the versions to the latest and it partially fixed it - it seems that it still creates more users in the users table if the columns don’t match
@bright-energy-18897 are there any env vars I need to set or something else?
I sent 17 identify events for the same user, some with different columns, and I have 17 entries in the identifies table and 3 entries in the users table
Versions
Copy code
rudder-server:1.9.6
rudder-transformer:1.32.2
b
Hey Garth, sorry for the delay.
it seems that it still creates more users in the users table if the columns don’t match
late me try to reproduce this case 17 entries in identifies table is expected, in users it should be only one with latest/merged information
s
Thanks
Hi @bright-energy-18897 do you have any findings yet?
b
I am not able to reproduce the problem, I am sending identify requires with diffrent traits and only have a single entry in the user table.
@steep-byte-42023, can you share the sequence of requests that resulted in this issue? Also, can you try to reproduce it with our hosted version using a free account ?
s
Using the hosted version could be tricky, let me try get exact steps to reproduce
@bright-energy-18897 I’m going to be sending through a number of screenshots, in the order I did things
Step 1: I deleted the
identifies
and
users
tables
Sent the following event with two traits
One entry appeared in both tables
Sent another event with 3 traits
Users table again
b
I was able to replicate the problem as well
We will need some time to identify and resolve it
s
Great thanks, please keep me updated on the progress
b
Hey Garth, I discussed the problem with the team, and turns out this is expected behavior in BigQuery in order to optimize costs. For users tables, when a new property/trait comes in instead of updating/merging the existing record a new entry will be appended, it will contain the merged data of historical and new events. You can use this query to get the latest entry for each user:
Copy code
SELECT 
  * 
EXCEPT 
  (__row_number) 
FROM 
  (
    SELECT 
      *, 
      ROW_NUMBER() OVER (
        PARTITION BY id 
        ORDER BY 
          loaded_at DESC
      ) AS __row_number 
    FROM 
      `<project>.<db>.users` 
    WHERE 
      _PARTITIONTIME BETWEEN TIMESTAMP_TRUNC(
        TIMESTAMP_MICROS(
          UNIX_MICROS(
            CURRENT_TIMESTAMP()
          ) -60 * 60 * 60 * 24 * 1000000
        ), 
        DAY, 
        'UTC'
      ) 
      AND TIMESTAMP_TRUNC(
        CURRENT_TIMESTAMP(), 
        DAY, 
        'UTC'
      )
  ) 
WHERE 
  __row_number = 1
s
Hi @bright-energy-18897 thanks for the feedback and query, would it be possible for an environment variable to be introduced to disable the optimization?
b
You can set the following config
Warehouse.bigquery.isUsersTableDedupEnabled
to
true
This will start merging instead of appending, updates to users tables
s
Where do I set that?
b
How are you currently configuring your rudder-server ? config.yaml file or env variables ?
s
Env vars
b
RSERVER_WAREHOUSE_BIGQUERY_IS_USERS_TABLE_DEDUP_ENABLED=true
This should be equivalent environment variable
s
Great thanks, will give that a go
That did the trick thanks!
b
great, sorry for the initial confusion. I’ll make sure we make it explicit in our docs.
s
No worries, that’ll be great, thanks for all your help