# advice-metadata-modeling
m
<!channel> Hi all! I'm getting on a data panel in a few hours to talk Data, and one of the questions we will be answering is: “What is a Dataset?” I thought it would be a cool idea to poll this community and ask you all, What is a Dataset?? I promise I will share your thoughts in the panel. Send your thoughts on this 🧵 😀
🍿 8
Files in a folder (vote +1 if you like this)
A single file
A table in your warehouse
A table in your operational MySQL db
A dbt model
A Kafka topic
A view (whether materialized or not)
A search index in Elastic
A pipeline defined by a view that materializes it
A microservice API (like OpenAPI)
(End of stream from me) vote on the choices above or add your own 😀
c
A collection of data in some structure, may it live in a google sheet, data warehouse, dataframe etc
plus1 3
teamwork 1
q
@cool-france-3974, so if it's Key:Value structure it's not a dataset? 🤨
teamwork 1
c
haha yes. I tried 😛
g
I forget where I originally found this, but this is the best definition of a dataset in my mind: If we can assume data is:
Data is the raw numbers that we capture according to some agreed to standards. Having consistent standards is very important since having data recorded according to different standards can be extremely problematic. For example, there is the age old question of how long is a piece of string? The answer depends on what measurement standard you are using. If we use the Metric system we may come up with some number. That number can vary depending on if we are using meters, centimeters, or millimeters. Using British Imperial/US Customary units will result in an entirely different number. As such, one of the most important steps in any analytics effort is defining standards we are applying. When doing analytics projects, one of our first tasks is to go through the client's current data structure and normalize that data. In other words, we make sure that all the like things are being measured in the same way.
Then a dataset is a collection of data points. The medium may change, and could be all of the above examples you gave, Shirshanka. There are further abstractions we can create from here: a collection of datasets can help create information, and a collection of information gets us to insights. But that may start to be off topic 🙂
teamwork 1
Quick google search tells me this is the source, in case anyone else is interested in reading the rest: https://online.ben.edu/programs/mba/resources/data-vs-information-vs-insight
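To make the normalization point in the quoted passage concrete, here is a minimal sketch of bringing length measurements recorded under different standards into one unit, so the resulting collection of data points is comparable. The `to_meters` helper, the unit labels, and the sample values are all hypothetical and only for illustration; they are not from the linked source.

```python
# Conversion factors to meters. Unit labels and sample values below are
# illustrative assumptions, not taken from the quoted article.
TO_METERS = {"m": 1.0, "cm": 0.01, "mm": 0.001, "in": 0.0254, "ft": 0.3048}


def to_meters(value: float, unit: str) -> float:
    """Convert a length to meters; raise ValueError for an unknown unit."""
    try:
        return value * TO_METERS[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


# The same piece of string, measured under different standards.
raw = [(2.0, "m"), (200.0, "cm"), (2000.0, "mm"), (78.74, "in")]

# The "dataset": a collection of data points expressed under one standard,
# rounded to millimeter precision.
dataset = [round(to_meters(value, unit), 3) for value, unit in raw]
print(dataset)
```

Once every point is in meters, the medium holding the list (a file, a table, a topic) matters less than the shared standard behind the numbers.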
m
ended up being a pretty cool panel!