# metadata-day22-hackathon
  • u

    user

    05/05/2022, 9:39 AM
    New Hackathon Proposal!
    Topic: Business Glossaries, KPIs, and Metrics: governing the intersection of technical data + business logic through code
    Proposed Hack: Add functionality to a Postgres ingestion script to match tables/columns to business glossary terms. For at least some of our Postgres datasets we have SHACL-based ontologies that refer to SKOS vocabularies for definitions. I'd like to come up with an ingestion script that uses our SHACL ontologies to map ingested data to business glossary terms.
    Looking for Teammates: Maybe
    Slack Contact: Niels Hoffmann
    _View all topics here: https://bit.ly/37JXs65_
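A minimal sketch of the matching step in that proposal, assuming the SHACL shapes have already been parsed (for example with an RDF library) into a plain dict that maps each `sh:path` IRI to the SKOS concept IRI it refers to. The function names, the name-normalisation rule, and the example IRIs are hypothetical; only the `urn:li:glossaryTerm:` prefix follows DataHub's glossary term URN format.

```python
import re

def normalize(name: str) -> str:
    """Lower-case and drop non-alphanumerics so 'customer_id' matches 'customerId'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def local_name(iri: str) -> str:
    """Return the part of an IRI after the last '#' or '/'."""
    return re.split(r"[#/]", iri)[-1]

def map_columns_to_terms(columns, shape_paths):
    """Map Postgres column names to glossary term URNs.

    shape_paths is a hypothetical {sh:path IRI: SKOS concept IRI} dict that is
    assumed to have been extracted from the SHACL shapes beforehand.
    Columns without a matching sh:path are left out of the result.
    """
    concept_by_name = {
        normalize(local_name(path)): concept for path, concept in shape_paths.items()
    }
    mapping = {}
    for column in columns:
        concept = concept_by_name.get(normalize(column))
        if concept is not None:
            # Hypothetical naming choice: reuse the concept's local name as the term name.
            mapping[column] = f"urn:li:glossaryTerm:{local_name(concept)}"
    return mapping

shapes = {
    "http://example.org/schema#customerId": "http://example.org/vocab/CustomerIdentifier",
    "http://example.org/schema#orderDate": "http://example.org/vocab#OrderDate",
}
print(map_columns_to_terms(["customer_id", "order_date", "notes"], shapes))
```

Matching on normalised local names keeps the sketch dependency-free; a real script would more likely match on explicit annotations in the shapes and use whatever term naming scheme the glossary already has.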
  • u

    user

    05/07/2022, 6:52 AM
    New Hackathon Proposal!
    Topic: Business Glossaries, KPIs, and Metrics: governing the intersection of technical data + business logic through code
    Proposed Hack: If we govern Hive schemas in code rather than in the Hive metastore (HMS), the benefits are:
    1. Users can add customized metadata on datasets and columns to represent business logic.
    2. Users get version control for free.
    3. Metadata can go through customized validation before the code is checked in.
    So we plan to let users generate files in a repo from the tables in HMS, then add metadata, validate it, and check in the code. At CI/CD time, the files are ingested into DataHub.
    Looking for Teammates: No
    Slack Contact: Mingzhi Ye
    _View all topics here: https://bit.ly/37JXs65_
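The "customized validation before check-in" step could be a small check that CI runs on each table file in the repo. A minimal sketch, assuming each file has already been loaded into a dict with hypothetical `name`, `description`, and `columns` keys; the layout and the rules are illustrative, not an existing Hive or DataHub format.

```python
def validate_table_spec(spec: dict) -> list:
    """Return a list of problems found in one table spec; empty means it passes."""
    problems = []
    if not spec.get("name"):
        problems.append("table has no name")
    if not str(spec.get("description", "")).strip():
        problems.append("table description is empty")
    columns = spec.get("columns") or []
    if not columns:
        problems.append("table has no columns")
    seen = set()
    for index, column in enumerate(columns):
        name = column.get("name")
        if not name:
            problems.append(f"column #{index} has no name")
            continue
        if name in seen:
            problems.append(f"duplicate column '{name}'")
        seen.add(name)
        if not column.get("type"):
            problems.append(f"column '{name}' has no type")
        # Business-logic rule (hypothetical): every column must be documented.
        if not str(column.get("description", "")).strip():
            problems.append(f"column '{name}' has no description")
    return problems

spec = {
    "name": "sales.orders",
    "description": "One row per customer order.",
    "columns": [
        {"name": "order_id", "type": "bigint", "description": "Primary key."},
        {"name": "amount", "type": "decimal(10,2)"},
    ],
}
for problem in validate_table_spec(spec):
    print(problem)
```

In CI, a non-empty list would fail the build before the files are ingested into DataHub.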
  • u

    user

    05/07/2022, 2:10 PM
    New Hackathon Proposal!
    Topic: Active Metadata: Supercharging data governance practices with metadata change events
    Proposed Hack: Orchestration of metadata across platforms: create a federated metadata management system that lets different tools and systems talk to each other, making data assets across these systems interoperable.
    Looking for Teammates: No
    Slack Contact: MetOrche
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/07/2022, 2:42 PM
    New Hackathon Proposal!
    Topic: Data Products and Data Mesh: Creating a schema-first ecosystem organized around domains
    Proposed Hack: ACAD - A community for a better future!
    Problem:
    - School students today are exceptionally talented, but they lack the right platform to showcase their talent.
    - Although many opportunities exist, students struggle to find the right one because the opportunities are scattered.
    - While searching for the right opportunity, they often get sidetracked and waste a lot of time.
    - Organisations whose target audience is school students also struggle to find them in one place.
    Research:
    - In a preliminary survey, 90 out of 100 students were not clear about their passion because they never got a chance to explore themselves outside their schools.
    - Students don't have a large community to support them and help them explore their talent.
    - Even when they know about competitions and manage to place well, they still have no suitable platform to share and showcase their skills.
    - When an organisation plans an event, it spends a lot of money on advertising but still fails to reach most of its target community.
    Solution:
    - We built a platform ("Acad") that aims to create the biggest school student community in the world.
    - Through this platform, students will become aware of opportunities and broaden their field of view.
    - It will help students showcase their skill sets and increase their reach.
    - It will also help organisations targeting school students get maximum reach for minimum investment.
    Benefits:
    -> Students:
    - Access to the biggest school community out there.
    - Can showcase their skill sets.
    - Can reach out to organisations for opportunities.
    - Can explore with the help of a community to find their dream career.
    - A global community would help them explore a wide range of fields.
    - Always have help with anything from a global community.
    -> Organisations:
    - A platform to make students aware of their events.
    - Would help them attract a larger audience to their events.
    Core Technologies:
    - Frontend (React)
    - Backend (Node.js)
    - Azure (Cloud)
    - MongoDB (Database)
    - JWT
    - ML
    Business Model:
    -> Revenue streams:
    - Website traffic (AdSense)
    - Membership
    - Marketing for organisations
    - Hosting events
    Looking for Teammates: No
    Slack Contact: Shubham Choudhary
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/07/2022, 7:42 PM
    New Hackathon Proposal!
    Topic: Data Products and Data Mesh: Creating a schema-first ecosystem organized around domains
    Proposed Hack: A prototype for ads analytics that can be integrated into LinkedIn
    Looking for Teammates: Yes
    Slack Contact: Wizards
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/07/2022, 8:01 PM
    New Hackathon Proposal!
    Topic: Active Metadata: Supercharging data governance practices with metadata change events
    Proposed Hack: I want to solve a problem related to natural language processing: my system will detect the real-time mood of the person in front of the camera and recommend songs to match that mood.
    Looking for Teammates: Maybe
    Slack Contact: Mohit Hinwar
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/08/2022, 7:00 AM
    New Hackathon Proposal!
    Topic: Active Metadata: Supercharging data governance practices with metadata change events
    Proposed Hack: Our hack, nivigo 2.0, reports the weather for any location, even remote places. It includes voice support and is built with modern web development tools; the whole backend is written in Python using the Flask framework.
    Looking for Teammates: No
    Slack Contact:
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/08/2022, 9:10 AM
    New Hackathon Proposal!
    Topic: Business Glossaries, KPIs, and Metrics: governing the intersection of technical data + business logic through code
    Proposed Hack:
    Problem Statement: Speech Emotion Recognition (SER) on live calls while creating events
    Description: Because of COVID, organizations, ed-tech companies, and tech giants are moving toward hybrid work. As hybrid work becomes mainstream, we spend more time collaborating, learning, and having conversations online. This solves a real-world problem: the system recognizes speech and the speaker's emotions, so that from this data we can understand people's feelings and interact with them better.
    Technologies: Artificial intelligence, machine learning
    Tech Stack: Python, Django
    Looking for Teammates: Maybe
    Slack Contact: ANISH GANDLA
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/08/2022, 9:40 AM
    New Hackathon Proposal!
    Topic: Business Glossaries, KPIs, and Metrics: governing the intersection of technical data + business logic through code
    Proposed Hack: An anti-corruption solution using IT
    Looking for Teammates: No
    Slack Contact: MUZEEBURRAHAMAN
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/08/2022, 11:45 AM
    New Hackathon Proposal!
    Topic: Active Metadata: Supercharging data governance practices with metadata change events
    Proposed Hack: We're planning to build a marketplace where people can buy second-hand goods on an EMI (equated monthly instalment) basis. We're starting with iOS development and will move on to Android afterwards.
    Looking for Teammates: Yes
    Slack Contact:
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/08/2022, 1:15 PM
    New Hackathon Proposal!
    Topic: Business Glossaries, KPIs, and Metrics: governing the intersection of technical data + business logic through code
    Proposed Hack: Education: With the rise of online education, the quality of imparted knowledge has been more or less compromised. Quality could be checked with an automated evaluation method that relies on features similar to a human examiner's judgement when checking and evaluating exams. The system would allow automated setting of questions on a specific topic, with an option to increase or decrease difficulty (measured using MFCC); conduct exams on schedule with proctoring throughout; and automatically compare submitted exams using text summarization and knowledge graphs.
    Looking for Teammates: No
    Slack Contact:
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/08/2022, 1:54 PM
    New Hackathon Proposal!
    Topic: Business Glossaries, KPIs, and Metrics: governing the intersection of technical data + business logic through code
    Proposed Hack: Real-time business metric tracking and fraudulent transaction detection. We will use machine learning and benchmarking to measure and optimize the efficiency of business processes, and XGBoost to detect fraud from various transaction parameters.
    Looking for Teammates: No
    Slack Contact:
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/09/2022, 2:01 AM
    New Hackathon Proposal!
    Topic: Data Products and Data Mesh: Creating a schema-first ecosystem organized around domains
    Proposed Hack: I am passionate about machine learning and data science, and I'm interested in solving data-related problems with a deep learning framework such as TensorFlow or PyTorch.
    Looking for Teammates: No
    Slack Contact: Datacrew
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/09/2022, 8:58 AM
    New Hackathon Proposal!
    Topic: Data Products and Data Mesh: Creating a schema-first ecosystem organized around domains
    Proposed Hack: I am passionate about machine learning and web development, and I'm interested in solving data-related problems.
    Looking for Teammates: No
    Slack Contact: ABHINAV PRAKASH
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/09/2022, 6:23 PM
    New Hackathon Proposal!
    Topic: Business Glossaries, KPIs, and Metrics: governing the intersection of technical data + business logic through code
    Proposed Hack: Companies that want to adopt DataHub as a critical component of a self-service analytics culture need metrics that help data governance teams measure their work. The Analytics dashboard currently shows only platform usage metrics, not metadata coverage. Our goal is to add metadata coverage details to the Analytics dashboard page, answering questions such as:
    * How many tables/views have descriptions?
    * How many tables/views have defined owners?
    * How many data assets are there per tag?
    * How many ML models have technical owners defined?
    * How many tables are failing their data quality tests?
    With this, we expect data teams to better prioritize their work and fill in the missing metadata, driving more platform usage and allowing users to self-serve.
    Looking for Teammates: No
    Slack Contact: @Vinícius Mello
    _View all topics here: https://bit.ly/37JXs65_
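A minimal sketch of how the proposed coverage numbers could be computed, assuming the metadata has already been fetched into simple records. The `Asset` class and its fields are hypothetical; the `TECHNICAL_OWNER` label mirrors the name of one of DataHub's ownership types, but nothing here calls DataHub's API.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Asset:
    """Hypothetical, simplified metadata record; not DataHub's actual schema."""
    name: str
    kind: str  # "table", "view" or "mlModel"
    description: str = ""
    owners: list = field(default_factory=list)  # (owner, ownership type) pairs
    tags: list = field(default_factory=list)
    failing_tests: int = 0

def coverage_report(assets):
    """Answer the coverage questions from the proposal for a list of assets."""
    tables = [a for a in assets if a.kind in ("table", "view")]
    models = [a for a in assets if a.kind == "mlModel"]
    return {
        "tables_and_views": len(tables),
        "with_description": sum(1 for a in tables if a.description.strip()),
        "with_owner": sum(1 for a in tables if a.owners),
        "assets_per_tag": dict(Counter(tag for a in assets for tag in a.tags)),
        "ml_models_with_technical_owner": sum(
            1 for m in models if any(kind == "TECHNICAL_OWNER" for _, kind in m.owners)
        ),
        "failing_quality_tests": sum(1 for a in tables if a.failing_tests > 0),
    }

assets = [
    Asset("orders", "table", "Customer orders", [("alice", "TECHNICAL_OWNER")], ["pii"]),
    Asset("tmp_orders", "view", tags=["pii", "raw"], failing_tests=2),
    Asset("churn", "mlModel", owners=[("bob", "BUSINESS_OWNER")]),
]
print(coverage_report(assets))
```

Dividing each count by `tables_and_views` (or by the number of ML models) would turn these counts into the coverage percentages a dashboard would likely show.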
  • u

    user

    05/10/2022, 3:51 AM
    New Hackathon Proposal!
    Topic: Business Glossaries, KPIs, and Metrics: governing the intersection of technical data + business logic through code
    Proposed Hack: I will tackle the language barrier that keeps Spanish speakers from connecting with the world. Basically, it's English-to-Spanish translation using NLP.
    Looking for Teammates: Maybe
    Slack Contact: Harshit Singh
    _View all topics here: https://bit.ly/37JXs65_
  • u

    user

    05/10/2022, 5:03 PM
    New Hackathon Proposal!
    Topic: Data Products and Data Mesh: Creating a schema-first ecosystem organized around domains
    Proposed Hack: A schema-first approach focuses first on structure, then on content, and finally on interfaces. It requires research to determine how best to formally represent the concepts, relationships, and rules that constitute a particular domain of knowledge. The objective is to establish a framework for planning, designing, and creating content as structured data, so that it can be published efficiently and flexibly to different user interfaces. Structure creates understanding by identifying, contextualising, categorising, and interconnecting the individual data and information elements that belong to a common subject domain. Moreover, by using common structural patterns across organizational and domain boundaries, it is possible to establish meaningful connections to other sources of content on the web.
    Nowadays, the biggest issue we face is delivering emergency services to users in the least time. For example, a person may have a medical emergency, or someone may face harassment. Our application offers a digital solution: it runs as a background application on the system and lets the user connect to the nearest pharmacy or police station with a single tap, with minimal complexity and an optimal strategy. We will use Flutter to develop the demo application and Spark to demonstrate big data usage. The user can connect to the various service domains stored in the application's data queue, which returns the nearest service centre. Our solution will be user-friendly and optimal.
    For delivery, we prioritize the service that can reach the user in the least time, determined by parameters that will be further tuned with machine learning techniques. The problem involves operational data for all safety contacts, including doctors, police, speed-dial contacts, and ambulances, which we then convert to analytical data using ETL (Extract, Transform, Load).
    Code: Integrating a location API lets us run a shortest-distance algorithm for each service, so that the service is delivered on time without any loss. A transportation facility is also provided for the user.
    Data and Metadata: Service data is received after interaction with the system, together with GPS location data for nearby services, with a time parameter of 5 minutes. All the data arrives in a table arranged in priority order of time, such that while extracting and loading data into analytics most ... The speed-dial contacts are filtered to the most recent favourites and then added to the data queue.
    Infrastructure: SaaS will be used as the infrastructure for our schema-first architecture.
    Looking for Teammates: Yes
    Slack Contact:
    _View all topics here: https://bit.ly/37JXs65_
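The "nearest pharmacy or police station" lookup can be sketched with the haversine great-circle distance. This is a swapped-in simplification, not the team's method: it measures straight-line distance instead of calling a location or routing API, and the service records (`name`, `kind`, `lat`, `lon`) and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_service(user_lat, user_lon, services, kind):
    """Return the closest service of the given kind, or None if there is none."""
    candidates = [s for s in services if s["kind"] == kind]
    if not candidates:
        return None
    return min(candidates, key=lambda s: haversine_km(user_lat, user_lon, s["lat"], s["lon"]))

services = [
    {"name": "Pharmacy A", "kind": "pharmacy", "lat": 12.9716, "lon": 77.5946},
    {"name": "Pharmacy B", "kind": "pharmacy", "lat": 12.9352, "lon": 77.6245},
    {"name": "Police Station 1", "kind": "police", "lat": 12.9650, "lon": 77.6000},
]
print(nearest_service(12.9300, 77.6200, services, "pharmacy")["name"])
```

Road distance or travel time from a routing service would usually rank services differently from straight-line distance, so this only approximates "minimum time to user".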
  • u

    user

    05/12/2022, 7:26 PM
    New Hackathon Proposal!
    Topic: Business Glossaries, KPIs, and Metrics: governing the intersection of technical data + business logic through code
    Proposed Hack: Poppin bottles, chasing costly snowflake-dbt-models
    Problem: dbt models are easy to create, but also easy to make expensive. As a platform owner, you need to keep track of your costs.
    Solution: For each dbt model, I will calculate the cost and expose it on the dataset/model page on dbt. This should be possible by using the query logs and connecting each query to its respective model. The results then need to be ingested into DataHub somehow. If possible, the cost would be exposed as a dashboard over time, potentially showing different kinds of insights. However, I do not know much about DataHub; we do not use it at my company today. I have only been following the project for some time, and I hope to contribute this feature somehow.
    Looking for Teammates: Maybe
    Slack Contact: @Kevin Neville
    _View all topics here: https://bit.ly/37JXs65_
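A minimal sketch of the "connect each query to its model" step, assuming each query-log row carries the query text and a precomputed `cost` (deriving that cost, for example from execution time and warehouse size, is left out). It also assumes the queries contain a dbt query comment with a JSON `node_id` field, which dbt's `query-comment` configuration can produce. The row layout, field names, and function names are hypothetical.

```python
import json
import re
from collections import defaultdict

# Assumption: each dbt query carries a JSON comment such as
# /* {"app": "dbt", "node_id": "model.shop.orders"} */ somewhere in its text.
QUERY_COMMENT = re.compile(r"/\*\s*(\{.*?\})\s*\*/", re.DOTALL)

def dbt_node_of(query_text):
    """Return the dbt node_id found in the query comment, or None."""
    match = QUERY_COMMENT.search(query_text)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    return payload.get("node_id") if isinstance(payload, dict) else None

def cost_per_model(query_log):
    """Sum the hypothetical per-query 'cost' field per dbt node, most expensive first."""
    totals = defaultdict(float)
    for row in query_log:
        node = dbt_node_of(row["query_text"])
        if node is not None:  # queries not issued by dbt are ignored
            totals[node] += row["cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

query_log = [
    {"query_text": '/* {"app": "dbt", "node_id": "model.shop.orders"} */ create table orders as select 1', "cost": 2.5},
    {"query_text": '/* {"app": "dbt", "node_id": "model.shop.orders"} */ insert into orders select 2', "cost": 1.0},
    {"query_text": '/* {"app": "dbt", "node_id": "model.shop.customers"} */ select 1', "cost": 0.5},
    {"query_text": "select 1  -- ad hoc query, not from dbt", "cost": 9.0},
]
for node, cost in cost_per_model(query_log):
    print(f"{node}: {cost:.2f}")
```

Each `(node_id, cost)` pair could then be attached to the corresponding dbt model entity during ingestion; how that is stored in DataHub is left open, as in the proposal.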
  • m

    many-family-96756

    05/13/2022, 2:40 AM
    Hi. I saw this on r/dataengineering and decided to try the hackathon.
  • u

    user

    05/16/2022, 9:52 PM
    New Hackathon Proposal!
    Topic: Active Metadata: Supercharging data governance practices with metadata change events
    Proposed Hack: Add upstream/downstream lineage dependencies to a dataset with a single click of a button
    Looking for Teammates: No
    Slack Contact: Salih Can
    _View all topics here: https://bit.ly/37JXs65_
  • b

    brave-pager-62740

    05/17/2022, 9:20 PM
    Hey channel, I ran into an error when trying to quickstart DataHub. It worked yesterday but failed this time. The error message says the datahub-gms container is running but is not healthy. Here is the exception from the container. Please help, thanks!
    21:08:53.842 [main] INFO  o.a.k.clients.producer.KafkaProducer:1182 - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
    21:08:53.842 [main] INFO  o.a.k.clients.producer.KafkaProducer:1182 - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
    21:08:54.068 [main] ERROR o.s.web.context.ContextLoader:313 - Context initialization failed
    org.springframework.boot.context.properties.ConfigurationPropertiesBindException: Error creating bean with name 'configurationProvider': Could not bind properties to 'ConfigurationProvider' : prefix=, ignoreInvalidFields=false, ignoreUnknownFields=true; nested exception is org.springframework.boot.context.properties.bind.BindException: Failed to bind properties under 'telemetry.enabled-server' to boolean
    at org.springframework.boot.context.properties.ConfigurationPropertiesBindingPostProcessor.bind(ConfigurationPropertiesBindingPostProcessor.java:92)
    at org.springframework.boot.context.properties.ConfigurationPropertiesBindingPostProcessor.postProcessBeforeInitialization(ConfigurationPropertiesBindingPostProcessor.java:78)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:440)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1796)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:953)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583)
    at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:401)
    at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:292)
    at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:103)
    at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:1073)
    at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:572)
    at org.eclipse.jetty.server.handler.ContextHandler.contextInitialized(ContextHandler.java:1002)
    at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:746)
    at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:379)
    at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1449)
    at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1414)
    at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:916)
    at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:288)
    at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:524)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
    2022-05-17 21:08:54.069:WARN:oejw.WebAppContext:main: Failed startup of context o.e.j.w.WebAppContext@2d209079{Open source GMS,/,[file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-8055446704998167747/webapp/, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-8055446704998167747/webapp/WEB-INF/lib/swagger-ui-4.10.3.jar!/META-INF/resources],UNAVAILABLE}{file:///datahub/datahub-gms/bin/war.war}
    org.springframework.boot.context.properties.ConfigurationPropertiesBindException: Error creating bean with name 'configurationProvider': Could not bind properties to 'ConfigurationProvider' : prefix=, ignoreInvalidFields=false, ignoreUnknownFields=true; nested exception is org.springframework.boot.context.properties.bind.BindException: Failed to bind properties under 'telemetry.enabled-server' to boolean
    at org.springframework.boot.context.properties.ConfigurationPropertiesBindingPostProcessor.bind(ConfigurationPropertiesBindingPostProcessor.java:92)
    at org.springframework.boot.context.properties.ConfigurationPropertiesBindingPostProcessor.postProcessBeforeInitialization(ConfigurationPropertiesBindingPostProcessor.java:78)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:440)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
    at org.eclipse.jetty.server.Server.start(Server.java:423)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97)
    at org.eclipse.jetty.server.Server.doStart(Server.java:387)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
    at org.eclipse.jetty.runner.Runner.run(Runner.java:519)
    at org.eclipse.jetty.runner.Runner.main(Runner.java:564)
    Caused by: org.springframework.boot.context.properties.bind.BindException: Failed to bind properties under 'telemetry.enabled-server' to boolean
    ... 42 common frames omitted
    Caused by: org.springframework.core.convert.ConversionFailedException: Failed to convert from type [java.lang.String] to type [boolean] for value 'null'; nested exception is java.lang.IllegalArgumentException: A null value cannot be assigned to a primitive type
    at org.springframework.core.convert.support.GenericConversionService.assertNotPrimitiveTargetType(GenericConversionService.java:335)
    at org.springframework.boot.context.properties.bind.Binder.bind(Binder.java:340)
    ... 69 common frames omitted
    Caused by: java.lang.IllegalArgumentException: A null value cannot be assigned to a primitive type
    ... 78 common frames omitted

    at org.springframework.boot.context.properties.ConfigurationPropertiesBindingPostProcessor.postProcessBeforeInitialization(ConfigurationPropertiesBindingPostProcessor.java:78)
    at tLifeCycle.java:73)
    at org.eclipse.jetty.runner.Runner.run(Runner.java:519)
    at org.eclipse.jetty.runner.Runner.main(Runner.java:564)
    Caused by: org.springframework.core.convert.ConversionFailedException: Failed to convert from type [java.lang.String] to type [boolean] for value 'null'; nested exception is java.lang.IllegalArgumentException: A null value cannot be assigned to a primitive type
    at org.springframework.core.convert.support.GenericConversionService.assertNotPrimitiveTargetType(GenericConversionService.java:335)
    at org.springframework.core.convert.support.GenericConversionService.handleResult(GenericConversionService.java:328)
    at org.springframework.core.convert.support.GenericConversionService.convert(GenericConversionService.java:193)