Preview of Data
Detailed data is raw transactional data: low-level records that are difficult to analyze on their own. Summarised data, on the other hand, can be presented in a meaningful, structured way; it can be called composed data. For a developer, the data that sits between detailed transactional data and fully composed data is what matters.
The developer needs this intermediate dataset, but it still requires more work before final presentation. Users extract data from these datasets and shape it further according to their requirements. The goal should not be to sort the data into a perfect form that avoids any further manipulation on the front end; the aim should be to reduce the data to datasets.
Designing datasets is a delicate job. The data has to be sufficient for analysis without putting much pressure on the user’s computer. Depending on how wide your dataset is, you have to come up with clear and concise dimensions. Several considerations apply when choosing the dimensions and metrics to include:
- Is the variety of content an edge case or something that will be used frequently? Go with the 80/20 rule: 80% of users generally need 20% of what’s available.
- Every dimension needs to be finite, with a predetermined set of rules and values. For instance, an ever-growing product inventory might be overwhelming as a dimension, whereas grouping products by category works well.
- If possible, aggregate data by date. You can aggregate by year or by quarter as well, but don’t go deeper in your aggregation than that.
- Keep the number of dimensions low, and prefer dimensions with fewer values, because they are easier to work with. For example, if a dataset has 100 rows and you add a dimension that has three values, the dataset can grow to 100 × 3 = 300 rows. The row count keeps compounding with each dimension value you add.
- In multi-dimensional datasets, don’t use summarised measures, because such measures have to be recalculated every time you add something new.
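The points above about date aggregation, row growth, and summarised measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `transactions` list, the field layout, and the function names `quarter_of`, `build_dataset`, and `rollup_by_quarter` are all hypothetical and not taken from any particular tool. It also reflects one reading of the last bullet, namely that storing additive measures (a total and a count) rather than a ready-made average lets the average be derived after any rollup.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactional records: (order date, product category, amount).
transactions = [
    (date(2023, 1, 15), "books", 20.0),
    (date(2023, 2, 3), "books", 40.0),
    (date(2023, 2, 20), "toys", 30.0),
    (date(2023, 5, 9), "books", 10.0),
    (date(2023, 5, 11), "toys", 50.0),
    (date(2023, 6, 30), "toys", 70.0),
]


def quarter_of(d):
    """Return a label such as '2023-Q1' for a date."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def build_dataset(rows):
    """Reduce transactions to one row per (quarter, category).

    Only additive measures (total, count) are stored, so an average can be
    derived later at whatever level the front end rolls up to.
    """
    cells = defaultdict(lambda: {"total": 0.0, "count": 0})
    for day, category, amount in rows:
        cell = cells[(quarter_of(day), category)]
        cell["total"] += amount
        cell["count"] += 1
    return dict(cells)


def rollup_by_quarter(dataset):
    """Collapse the category dimension and derive the average per quarter."""
    totals = defaultdict(lambda: {"total": 0.0, "count": 0})
    for (quarter, _category), cell in dataset.items():
        totals[quarter]["total"] += cell["total"]
        totals[quarter]["count"] += cell["count"]
    return {q: c["total"] / c["count"] for q, c in totals.items()}


dataset = build_dataset(transactions)
quarterly_average = rollup_by_quarter(dataset)

# Upper bound on rows: distinct quarters times distinct categories.
quarters = {q for q, _ in dataset}
categories = {c for _, c in dataset}
max_rows = len(quarters) * len(categories)
```

Here six transactions collapse into at most `max_rows` rows, and that bound is the product of the number of values in each dimension, which is why every extra dimension multiplies the size of the dataset.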
Before trying out any of the above-mentioned data structures, make sure that you properly understand your data. Otherwise, you might make wrong assumptions about it, and data quality needs to be your top priority. Never assume anything in data analysis; if understanding the data means looking things up in data dictionaries, then do that.
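One way to avoid assuming things about the data is to check records against a data dictionary before building a dataset. The sketch below is a minimal illustration, not a standard format: the `DATA_DICTIONARY` layout, its `allowed` and `type` keys, and the `find_violations` helper are all made up for this example.

```python
# Hypothetical data dictionary: allowed values or expected type per field.
DATA_DICTIONARY = {
    "category": {"allowed": {"books", "toys"}},
    "amount": {"type": float},
}


def find_violations(records, dictionary):
    """Return (row index, field, value) for every value the dictionary rejects."""
    violations = []
    for index, record in enumerate(records):
        for field, rule in dictionary.items():
            value = record.get(field)
            if "allowed" in rule and value not in rule["allowed"]:
                violations.append((index, field, value))
            if "type" in rule and not isinstance(value, rule["type"]):
                violations.append((index, field, value))
    return violations


records = [
    {"category": "books", "amount": 20.0},
    {"category": "gadgets", "amount": 15.0},  # category not in the dictionary
    {"category": "toys", "amount": "30"},     # amount stored as text
]

violations = find_violations(records, DATA_DICTIONARY)
```

Any entry in `violations` points to a value that contradicts the documented definition and is worth investigating before it reaches a dataset.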