Data analysis and statistics

IVL has long experience in analyzing many types of data and can serve as a coach or statistical consultant. We offer support throughout the process, from the planning of experiments, through the collection and analysis of data, to the visualization and presentation of results in an understandable way.

Our specialists are experienced in managing and analyzing environmental and industrial process data. Data sets can range from small tables to very large ones (Big Data) and can come from, for example, surveys, experiments, measurements or databases.

Examples of areas where IVL has performed data analysis:

  • Environmental monitoring - Monitoring programs often generate very extensive and complex data. We perform, for example, trend analysis and receptor modeling. With multivariate receptor modeling we determine what proportion of the particles and metals in air samples originates from road traffic.
  • Land - We model total concentrations of various substances in soil samples to investigate relationships between parameters such as geochemistry and the pollution situation. We also work on sampling strategies and sampling errors for a given sampling area.
  • Information and instructions on the use of statistics - IVL has contributed to all parts related to multivariate data analysis on the website
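Trend analysis of monitoring series is often done with the non-parametric Mann-Kendall test together with Sen's slope estimator. This page does not say which trend methods IVL uses, so the following is only a minimal sketch assuming those two methods. It ignores the corrections for tied values, seasonality and serial correlation that real monitoring data usually need, and the yearly concentrations are invented for illustration.

```python
import math
from statistics import median

def mann_kendall(values):
    """Mann-Kendall trend test: returns (S, z, two-sided p). No tie correction."""
    n = len(values)
    # S = number of increasing pairs minus number of decreasing pairs (i < j)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis of no trend (assumes no ties)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Normal approximation with continuity correction
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p, equal to 2 * (1 - Phi(|z|))
    return s, z, p

def sens_slope(times, values):
    """Sen's slope: median of the slopes between all pairs of observations."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(len(values) - 1)
        for j in range(i + 1, len(values))
        if times[j] != times[i]
    ]
    return median(slopes)

# Invented annual mean concentrations, for illustration only
years = list(range(2010, 2020))
conc = [5.1, 4.9, 4.8, 4.9, 4.5, 4.4, 4.3, 4.4, 4.0, 3.9]

s, z, p = mann_kendall(conc)
slope = sens_slope(years, conc)
print(f"S={s}, z={z:.2f}, p={p:.4f}, Sen's slope={slope:.3f} per year")
```

A negative S with a small p-value points to a decreasing trend, and Sen's slope estimates its size in concentration units per year. Being a median of pairwise slopes, it is robust to a few outlying years.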

Different problems require different approaches and tools

We work with data sets containing anything from a single variable to many, ranging in size from small to very large (Big Data).

Examples of classical statistical methods that we use for one or a few variables:

  • Measures of central tendency (mode, mean, median)
  • Measures of dispersion (variance, standard deviation)
  • Uncertainty in the data (standard errors and confidence intervals)
  • Hypothesis tests (t-test, ANOVA)
  • Correlation analysis
  • Regression analysis

Examples of multivariate statistics that we use for many variables simultaneously:

  • PCA (principal component analysis) to summarize and evaluate the data
  • PLS (Partial Least Squares, also called Projection to Latent Structures) regression to find relationships in data
  • PMF (Positive matrix factorization) or COPREM (Constrained Physical Receptor Model) for factor analysis and receptor modeling, for example to identify and quantify the sources of air pollution.
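To illustrate the idea behind PCA, the sketch below centers a small invented data table and finds the first principal component by power iteration on the covariance matrix. The loading vector shows how the variables combine, and its eigenvalue divided by the total variance is the share of the variation that the component summarizes. It is a minimal, standard-library-only sketch. In practice one would use an established implementation such as `sklearn.decomposition.PCA` and often scale the variables to unit variance first. PLS, PMF and COPREM are not shown.

```python
import math

def center(rows):
    """Subtract each column mean so that every variable has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def covariance(rows):
    """Sample covariance matrix (p x p) of column-centered data."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def mat_vec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def first_component(cov, iters=1000, tol=1e-12):
    """Leading eigenvector (loadings) and eigenvalue (variance) by power iteration."""
    v = [1.0] * len(cov)  # start vector; assumed not orthogonal to the leading eigenvector
    eig = 0.0
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]  # keep unit length
        new_eig = sum(a * b for a, b in zip(w, mat_vec(cov, w)))  # Rayleigh quotient
        converged = abs(new_eig - eig) < tol
        v, eig = w, new_eig
        if converged:
            break
    return v, eig

# Invented data: 6 samples (rows) x 3 correlated variables (columns)
data = [
    [1.0, 2.1, 0.9],
    [2.0, 3.9, 2.1],
    [3.0, 6.2, 2.9],
    [4.0, 8.1, 4.2],
    [5.0, 9.8, 5.1],
    [6.0, 12.2, 5.8],
]

xc = center(data)
cov = covariance(xc)
loading, var1 = first_component(cov)
explained = var1 / sum(cov[i][i] for i in range(len(cov)))  # share of total variance
scores = [sum(x * w for x, w in zip(row, loading)) for row in xc]  # sample coordinates on PC1

print("PC1 loadings:", [round(w, 3) for w in loading])
print(f"PC1 explains {explained:.1%} of the variance")
```

Because the three invented variables rise together, a single component captures nearly all of their variation. This is the kind of summary that lets a large monitoring table be inspected in a few score and loading plots.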

If you would like help analyzing and extracting information from data from industrial processes or from drinking water or wastewater treatment plants, read more on the Industrial modeling page.