This article will answer questions related to creating a new Data View.
- Q: What is a Data View?
- A: A Data View is the Citrination way of performing data analysis over one or more data sets.
- Q: On the property annotation step, why do I need to specify whether a given column is Real, Categorical, or an Inorganic Chemical Formula?
- A: The type tells the platform how to interpret the column. Real means that the property has a continuous numerical value. For example, the yield strength of a material is a real-valued property; it can be 100 MPa or 101 MPa. Categorical means that the property can take only a few discrete values. For example, Heusler type is a categorical property, because a Heusler alloy can be a full Heusler, half Heusler, or inverse Heusler. Inorganic Chemical Formula columns are treated differently: the Citrination platform parses each formula and calculates many features from it.
- Q: On the property annotation step, what is the difference between choosing Input, Output, Ignore, and Latent Variable?
- A: Inputs will be used as input features to the machine learning model. Outputs are the properties that you would like the model to predict. Ignore means that you would like to keep track of this property, but don't want to use it as an input or output of the model. Names and sample IDs are often tagged with Ignore. Latent Variables are properties that you would like the model to predict, and which you think could also be useful in predicting other outputs. Latent variables are used to build a hierarchical model: the predictions of the latent variables are used as inputs to help predict the other model outputs. Each model needs at least one Input and at least one Output.
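
The annotation rules above can be sketched in a few lines of Python. This is an illustration only, not the Citrination API: the `Column` class, the `validate` and `model_plan` functions, and the example column names (including the "Hardness" latent variable) are all hypothetical. The sketch checks that at least one Input and one Output are present, then lists which columns would feed the model for each predicted property, with latent variable predictions passed on to the outputs.

```python
from dataclasses import dataclass

# Hypothetical names for illustration; these are not Citrination API identifiers.
TYPES = {"Real", "Categorical", "Inorganic Chemical Formula"}
ROLES = {"Input", "Output", "Ignore", "Latent Variable"}


@dataclass(frozen=True)
class Column:
    name: str
    type: str
    role: str


def validate(columns):
    """Check the annotation rules described in the answers above."""
    for col in columns:
        if col.type not in TYPES:
            raise ValueError(f"unknown type for {col.name}: {col.type}")
        if col.role not in ROLES:
            raise ValueError(f"unknown role for {col.name}: {col.role}")
    roles = {col.role for col in columns}
    if "Input" not in roles or "Output" not in roles:
        raise ValueError("a model needs at least one Input and at least one Output")


def model_plan(columns):
    """Map each predicted column to the features its model would use."""
    validate(columns)
    inputs = [c.name for c in columns if c.role == "Input"]
    latents = [c.name for c in columns if c.role == "Latent Variable"]
    outputs = [c.name for c in columns if c.role == "Output"]
    # Latent variables are predicted from the inputs alone.
    plan = {name: list(inputs) for name in latents}
    # Outputs are predicted from the inputs plus the latent variable predictions.
    for name in outputs:
        plan[name] = inputs + latents
    return plan


# Made-up example data set; the type given to the ignored column is arbitrary.
columns = [
    Column("Sample ID", "Categorical", "Ignore"),
    Column("Formula", "Inorganic Chemical Formula", "Input"),
    Column("Heusler type", "Categorical", "Input"),
    Column("Hardness", "Real", "Latent Variable"),
    Column("Yield strength", "Real", "Output"),
]
plan = model_plan(columns)
```

Here `plan` shows the two-level structure: "Hardness" is predicted from the two inputs, while "Yield strength" is predicted from the same inputs plus the predicted "Hardness". The ignored "Sample ID" column appears in neither feature list.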