I work for a scientific institute and we're analysing air quality in relation to heart disease. My task is to summarize PM2.5 and PM10 readings from sds011 sensors within a specific region.
When it comes to abnormal outliers (PM2.5 > 500 µg/m^3 and PM10 > 200 µg/m^3), what is the general consensus on these numbers, especially when only one of many sensors in the same area reports the high value?
Our own conclusion is that such a reading is merely, as stated, an abnormal outlier and should be dismissed, but could there be other, natural causes? The city I'm analysing is a medium-sized city in Germany.
It is not unusual for a low-cost sensor to show an unexpected value from time to time. If it appears only as a peak (over a short period), the value can be discarded. If it persists over a long period and only on one sensor in the area, that sensor is faulty and should be excluded. In the present case, such high values may be due to fog, but then nearby sensors should report them too.
Regarding the scientific topic itself, Deciphair is also in contact with a German institute about this subject and has done a pre-analysis of a major town in Germany, which you can check on the Deciphair.com website. Maybe we have the same contact!
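The rule of thumb above (short isolated peak: discard the value; long-lasting excess on a single sensor: exclude the sensor) could be sketched roughly as follows. This is only an illustration, not an established method: the function names, the 50% share used to decide that a sensor is faulty, and the sample readings are all hypothetical, and the readings are assumed to be already aligned in time across sensors. The threshold of 500 is the PM2.5 value quoted in the question.

```python
from statistics import median

PM25_THRESHOLD = 500  # PM2.5 threshold quoted in the question above


def isolated_peaks(readings, threshold):
    """Return {sensor_id: [time indices]} where a sensor exceeds `threshold`
    while the median of the other sensors at the same time step does not."""
    n_steps = min(len(values) for values in readings.values())
    flagged = {sensor: [] for sensor in readings}
    for t in range(n_steps):
        for sensor, values in readings.items():
            others = [v[t] for s, v in readings.items() if s != sensor]
            if others and values[t] > threshold and median(others) <= threshold:
                flagged[sensor].append(t)
    return flagged


def verdict(flagged_indices, n_steps, max_share=0.5):
    """Hypothetical decision rule: many isolated excesses suggest a faulty sensor."""
    if not flagged_indices:
        return "ok"
    share = len(flagged_indices) / n_steps
    return "exclude sensor" if share > max_share else "discard peaks"


# Made-up readings from three nearby sensors, aligned in time.
readings = {
    "a": [12, 15, 650, 14],   # one short peak
    "b": [11, 14, 16, 13],    # unremarkable
    "c": [600, 700, 20, 800], # high most of the time, alone in the area
}
flagged = isolated_peaks(readings, PM25_THRESHOLD)
verdicts = {sensor: verdict(idx, 4) for sensor, idx in flagged.items()}
print(flagged)
print(verdicts)
```

If several nearby sensors are high at the same time (as would be expected in fog), the median of the neighbours also exceeds the threshold and nothing is flagged as isolated.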
I'm also working on summarizing PM2.5 and PM10 data from sds011 sensors in Freiburg, Germany, for my master's thesis.
I also have a question about the abnormal outliers: is there any paper showing that values >500 for PM2.5 and >200 for PM10 are wrong and can be discarded? I also sometimes get very high values over short periods, and I don't know above which threshold I should delete a value from my dataset.
Normal values in a typical environment are below 100. However, sds011 sensors may show values above 200, particularly when the atmosphere is foggy. This is the case in this season, and it does not mean that the air is polluted by fine particles, as the sensor is measuring water droplets. You can ignore these peaks by limiting their values to 100; in any case, what is interesting is the average value over a given period (a few hours, a day), and peak values, which are normally short, usually do not significantly impact the average.
There is no specific paper on this point, but you can refer to the dedicated topics on how to handle sds011 sensors on the Deciphair website (see the address in the post above), under the technical tab.
The sensor is working perfectly and the peaks do not correspond to pollution; they are due to a peculiarity of the sds011 sensor, which is sensitive to humidity when there is fog. You just have to disregard this phenomenon by capping the excessive values in your processing at a lower value (<100).
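As a rough illustration of capping before averaging, here is a minimal sketch. The function name `capped_mean` and the sample series are hypothetical; the cap of 100 is the value suggested in this post, not a validated threshold.

```python
from statistics import mean


def capped_mean(values, cap=100):
    """Average the readings after limiting each one to `cap`."""
    return mean(min(v, cap) for v in values)


# Made-up series: 99 ordinary readings and one short peak.
series = [20] * 99 + [500]

raw_average = mean(series)
capped_average = capped_mean(series)
print(raw_average, capped_average)
```

Comparing the two averages on your own data shows how much the peaks actually shift the result for the averaging period you choose.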