# ingestion
c
Trying to get the ldap ingest working for a POC of datahub but get the following error.
[ec2-user@ip-10-16-13-173 recipes]$ datahub ingest -c ./ldap_poc.yml
[2021-04-29 04:50:55,724] INFO     {datahub.entrypoints:68} - Using config: {'source': {'type': 'ldap', 'config': {'ldap_server': '<ldap://dc.internal.test.com>', 'ldap_user': 'CN=datahub_ldap,OU=Generic Accounts,DC=internal,DC=test,DC=com', 'ldap_password': 'revmoved', 'base_dn': 'DC=internal,DC=test,DC=com', 'filter': '(objectClass=*)'}}, 'sink': {'type': 'datahub-rest', 'config': {'server': '<http://10.16.13.173:8080>'}}}
Traceback (most recent call last):
  File "/usr/local/bin/datahub", line 8, in <module>
    sys.exit(datahub())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/datahub/entrypoints.py", line 74, in ingest
    pipeline.run()
  File "/usr/local/lib/python3.7/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
    for wu in self.source.get_workunits():
  File "/usr/local/lib/python3.7/site-packages/datahub/ingestion/source/ldap.py", line 130, in get_workunits
    b"inetOrgPerson" in attrs["objectClass"]
TypeError: list indices must be integers or slices, not str
Does the ldap server need to have the port listed as well?
c
Can you share your base_dn and filter?
ah I see it, sec
c
[ec2-user@ip-10-16-13-173 recipes]$ datahub ingest -c ./ldap_poc.yml
[2021-04-29 14:59:49,901] INFO     {datahub.entrypoints:68} - Using config: {'source': {'type': 'ldap', 'config': {'ldap_server': '<ldap://dc.internal.test.com>', 'ldap_user': 'CN=datahub_ldap,OU=Generic Accounts,DC=internal,DC=test,DC=com', 'ldap_password': 'removed', 'base_dn': 'OU=Users,DC=internal,DC=test,DC=com'}}, 'sink': {'type': 'datahub-rest', 'config': {'server': '<http://10.16.13.173:8080>'}}}

Source report:
{'failures': {}, 'warnings': {}, 'workunit_ids': [], 'workunits_produced': 0}
Sink report:
{'failures': [], 'records_written': 0, 'warnings': []}

Pipeline finished successfully
[ec2-user@ip-10-16-13-173 recipes]$
but doesn't show anything that was fetched.
c
Yeah, I think you’re not fetching anything. You can try using ldaps:// instead of ldap:// in the server name. Otherwise I would double-check the user: are you sure it’s cn and not uid?
Other than that, I would play around a bit with the base_dn to make sure it is correct. But I’m not an LDAP expert myself, so I’m not well placed to give the best advice on it 😄
I’m also not sure to what extent spaces in the name are allowed, like ‘Generic Accounts’.
c
Sure, let me play with it a bit and I'll report back.
Still getting the same result. I have played around with the base_dn and moved the user to an OU that doesn't have a space in it. This is an Active Directory server.
I can confirm that data is being returned. I added a print(rdata) to https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/ldap.py on line 127 and was able to see data returned
g
would you mind pasting the output of that print statement?
c
I don't think I can paste that, as it would dump out all of my users. I added a try/except block around that loop.
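for dn, attrs in rdata:
    try:
        # python-ldap returns attribute values as bytes, so compare against
        # bytes literals, as the original ldap.py source does.
        if (
            b"inetOrgPerson" in attrs["objectClass"]
            or b"posixAccount" in attrs["objectClass"]
        ):
            yield from self.handle_user(dn, attrs)
    except (KeyError, TypeError):
        # KeyError: entry has no objectClass; TypeError: attrs is a list, not a dict.
        print(f'Skipping {dn} {attrs}')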
[2021-04-29 17:09:34,307] INFO     {datahub.entrypoints:68} - Using config: {'source': {'type': 'ldap', 'config': {'ldap_server': '<ldap://dc.internal.test.com>', 'ldap_user': 'CN=datahub_ldap, OU=Testing, OU=testUsers, DC=internal, DC=test,DC=com', 'ldap_password': 'removed', 'base_dn': 'DC=internal, DC=test, DC=com', 'filter': '(objectClass=*)'}}, 'sink': {'type': 'datahub-rest', 'config': {'server': '<http://10.16.13.173:8080>'}}}
Skipping None ['<ldap://DomainDnsZones.internal.test.com/DC=DomainDnsZones,DC=internal,DC=test,DC=com>']
Skipping CN=VolumeTable,CN=FileLinks,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={C48548EC-123B-49AD-A693-789ACE661C39},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={06F2631B-1178-4749-9423-BB87C3B58A8C},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={9A0D50A6-F99C-4EBE-B558-8F0245CE8BB5},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={0D3277DA-0CDC-4AEF-A140-1492CC32620C},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={8A91D8E0-DD8F-4712-8442-972B145A4284},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={5BBF5D67-2F3B-45C9-9F20-5D983DA9281C},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={B63A8ED5-75B4-4941-9793-858CD2AC0223},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={61149DD0-AA8C-4C12-B50F-7A05EFCE303A},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={145F662B-A886-4914-8C6C-28D5A7C1BF1A},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={08BBCF2F-B43D-4C30-85CB-35D772EE118E},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={85C183B3-D9E7-462D-92D1-7EB7BCFD1C46},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={3FFB2376-3751-403E-A6EF-6DEBFF260652},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={ED3732F5-AFAF-4D0F-9A62-0F36438D02BD},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={A3A61EBD-ACAC-4820-A8D0-6F1B29C1A861},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={E9696666-3133-4C1C-ABBE-C3D7E8DC5180},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={A7F7BD61-B44F-4136-BB6C-658F258A5AE2},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={40B5209C-8CF4-402F-A500-1DA8CC80744D},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={D52F0189-63AF-4958-8036-E420C31EA83F},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={CF2395C4-774C-4F7C-B772-6BF3AB3CCAF5},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={3EC2727C-2C9A-4A4F-8AEC-008DABC9FC01},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={65484C72-A54D-4FD0-8F2F-7DDDBC165AE5},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={440816C2-CA64-4B5E-A6E0-6862C6720457},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={5AD9DC59-5117-4E3A-B78D-31C74E93A039},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={DD84C1B1-FC14-444E-98BD-537D9C48802E},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={1DD9637F-F952-4FEF-87CF-96B62459E7CA},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN={4E328457-54C0-4689-81FC-95186B2A1FDA},CN=Policies,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecPolicy{72385230-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecISAKMPPolicy{72385231-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNFA{72385232-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNFA{59319BE2-5EE3-11D2-ACE8-0060B0ECCA17},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNFA{594272E2-071D-11D3-AD22-0060B0ECCA17},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNegotiationPolicy{72385233-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecFilter{7238523A-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNegotiationPolicy{59319BDF-5EE3-11D2-ACE8-0060B0ECCA17},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNegotiationPolicy{7238523B-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecFilter{72385235-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecPolicy{72385236-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecISAKMPPolicy{72385237-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNFA{59319C04-5EE3-11D2-ACE8-0060B0ECCA17},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNegotiationPolicy{59319C01-5EE3-11D2-ACE8-0060B0ECCA17},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecPolicy{7238523C-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecISAKMPPolicy{7238523D-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNFA{7238523E-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNFA{59319BF3-5EE3-11D2-ACE8-0060B0ECCA17},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNFA{594272FD-071D-11D3-AD22-0060B0ECCA17},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNegotiationPolicy{7238523F-70FA-11D1-864C-14A300000000},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNegotiationPolicy{59319BF0-5EE3-11D2-ACE8-0060B0ECCA17},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=ipsecNFA{6A1F5C6F-72B7-11D2-ACF0-0060B0ECCA17},CN=IP Security,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=Password Settings Container,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=MicrosoftDNS,CN=System,DC=internal,DC=test,DC=com {}
Skipping CN=NTDS Quotas,DC=internal,DC=test,DC=com {}
Skipping CN=Keys,DC=internal,DC=test,DC=com {}
Skipping CN=TPM Devices,DC=internal,DC=test,DC=com {}

Source report:
{'failures': {}, 'warnings': {}, 'workunit_ids': [], 'workunits_produced': 0}
Sink report:
{'failures': [], 'records_written': 0, 'warnings': []}

Pipeline finished successfully
Still doesn't show it grabbed any data.
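For reference, the `Skipping None [...]` line is consistent with python-ldap returning search referrals as `(None, [url, ...])` tuples, which would explain the original `TypeError`, and the `{}` entries would raise a `KeyError` on `attrs["objectClass"]`. Below is a minimal sketch of a filter that tolerates both shapes. The helper name `iter_user_entries` and the constant `USER_OBJECT_CLASSES` are hypothetical, and including `b"user"` is an assumption about how Active Directory typically labels user objects; it is not something the DataHub source is confirmed to check.

```python
from typing import Any, Iterable, Iterator, Tuple

# Hypothetical set of object classes treated as users. b"user" is an assumption
# for Active Directory; the original source only checks the first two.
USER_OBJECT_CLASSES = {b"inetOrgPerson", b"posixAccount", b"user"}


def iter_user_entries(rdata: Iterable[Tuple[Any, Any]]) -> Iterator[Tuple[str, dict]]:
    """Yield only (dn, attrs) pairs that look like user entries."""
    for dn, attrs in rdata:
        # Referrals have no DN and carry a list of URLs instead of a dict.
        if dn is None or not isinstance(attrs, dict):
            continue
        # Entries without readable attributes come back as an empty dict.
        object_classes = attrs.get("objectClass", [])
        # Attribute values are bytes, so the set above holds bytes literals.
        if USER_OBJECT_CLASSES.intersection(object_classes):
            yield dn, attrs
```

With something like this in place of the bare loop, referrals and attribute-less entries would be skipped silently rather than raising, and whether any users are yielded depends on which object classes the directory actually assigns.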
g
Huh I haven’t seen anything like that before
What active directory system are you using?
c
Microsoft Active Directory, Windows Server 2016
g
got it - is there an easy way for me to spin one up locally?
c
ha that's a great question
c
If that would be too complicated, another way forward could be to mask the sensitive data in the print output and build a simple test around it?
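A rough sketch of what that masking could look like, assuming the search results have the `(dn, attrs)` shape seen in the output earlier in the thread. The function name `mask_entry` and the placeholder values are hypothetical. It keeps `objectClass` values, since those are what the filter depends on, and replaces every other value and the DN itself.

```python
from typing import Any, Tuple


def mask_entry(index: int, entry: Tuple[Any, Any]) -> Tuple[Any, Any]:
    """Return a copy of one search result with identifying data replaced."""
    dn, attrs = entry
    # Keep None DNs (referrals) as None so the shape of the data is preserved.
    masked_dn = None if dn is None else f"CN=masked-{index},DC=example,DC=com"
    if not isinstance(attrs, dict):
        # Referral: a list of URLs; keep the length, hide the hosts.
        return masked_dn, ["ldap://masked.example.com/"] * len(attrs)
    masked_attrs = {}
    for key, values in attrs.items():
        if key == "objectClass":
            # Needed to reproduce the filtering behaviour, and not user-specific.
            masked_attrs[key] = list(values)
        else:
            masked_attrs[key] = [b"***"] * len(values)
    return masked_dn, masked_attrs
```

Something like `print([mask_entry(i, e) for i, e in enumerate(rdata)])` would then produce output that could be shared and pasted into a test fixture without exposing real users.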