R-Script or tips to download and use sensor data

Hello everyone,

I am looking for tips on how to easily download sensor data for a specific country (Belgium) and for specific periods, but I am having great difficulty finding it on https://archive.sensor.community/. I don't properly understand how to find what I need, as there is no real filter for searching the data.

Moreover, I am looking for data for a paper as part of my university studies. So I would like to know if anyone has an R script or something similar that uses the .csv data files to do some basic statistics for my project? For example, something to calculate with and work on in Excel or RStudio once the data is downloaded.

Thanks so much for your help,

Bayens M.

Hi,
shortly after your first mail regarding this question, someone sent this link:


Thanks so much for contributing!
This paper looks very interesting for my project.

I would do:

  1. load the current API data in order to get the list of all the sensors as JSON: https://data.sensor.community/static/v1/data.json

  2. import the data into QGIS: Download QGIS

  3. get a shapefile of Belgium and import it into QGIS

  4. use the intersection algorithm in QGIS to get the sensor IDs in Belgium (if you would rather stay in R, see the sketch after the script below)

  5. write a simple script in Python to download from the archives according to IDs and dates:

    import requests

    # Put the sensor IDs in the list, separated by commas
    sensor_id = []
    # Put the dates in the list in 'YYYY-MM-DD' format, separated by commas
    dates = []
    url_deb = 'https://archive.sensor.community/'

    for n1 in range(0, len(dates)):
        date = dates[n1]
        url_ok = url_deb + date
        r1 = requests.get(url_ok)
        source_code = r1.text

        for n2 in range(0, len(sensor_id)):
            test = 'sensor_' + str(sensor_id[n2]) + '.csv'

            if test in source_code:
                # Recover the file name prefix from the directory listing
                split1 = source_code.split(test)[0]
                split2 = split1.split('<a href="')[-1]
                url_fin = url_ok + '/' + split2 + test
                r2 = requests.get(url_fin)
                data = r2.text
                # The data will be printed to the terminal.
                print(data)
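If you want to skip QGIS entirely, steps 1-4 can be approximated in R by filtering the API snapshot by country. A minimal sketch with jsonlite, assuming the flattened snapshot exposes the country code as `location.country` (check the JSON if the field name differs):

    library(jsonlite)
    library(dplyr)

    # Fetch the current API snapshot (a large JSON file)
    snapshot <- fromJSON("https://data.sensor.community/static/v1/data.json",
                         flatten = TRUE)

    # Keep only Belgian sensors and list their IDs and types
    be_ids <- snapshot %>%
      filter(`location.country` == "BE") %>%
      distinct(`sensor.id`, `sensor.sensor_type.name`)

    print(be_ids)

The resulting IDs can then be fed straight into the download script above.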

I guess your question has been answered to the point of your definition.
But as always, there is a way by foot: get the CSV file, run it through a search for all LAT/LONG values between the most northern, southern, western and eastern points of Belgium, and write each matching line into a new CSV file. Now that the file has a reasonable size, you can import the CSV into the Excel/Access database engine (the data handling is the same; Excel/Access is just the front end).
If you want to do more analysis, SQL may be your partner.
Once you have spotted the first sensors outside the scope, it will be routine to delete those outside the borders.
As I mentioned: it is a way by foot and it creates some 'transpiration', but it is a way.
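For reference, a minimal sketch of that bounding-box filter in R, assuming a monthly archive CSV with `lat`/`lon` columns; the coordinates below are only rough values for Belgium, so sensors just across the border will still slip through and need the manual clean-up described above:

    library(readr)
    library(dplyr)

    # Rough bounding box for Belgium (approximate values; refine as needed)
    lat_min <- 49.5; lat_max <- 51.6
    lon_min <- 2.5;  lon_max <- 6.4

    # The archive CSVs are semicolon-separated and include lat/lon columns
    raw <- read_delim("2023-09_sds011.csv", delim = ";")

    # Keep only rows inside the box
    belgium <- raw %>%
      filter(lat >= lat_min, lat <= lat_max,
             lon >= lon_min, lon <= lon_max)

    write_csv(belgium, "belgium_sensors.csv")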


hi all
I've developed a small ShinyApp to download historical data from sensors. The app also makes a brief analysis of the data using the OpenAir package.
You can find the source code of the app here.
Maybe we can add a country selector and translate it if you find it useful.
Regards

Is it not online anymore?

You should be able to use OpenAir, which was written by David Carslaw at the University of York. See openair | David Carslaw

This is the link that works: https://nanocastro.shinyapps.io/CS_data_retrieval/

Another way:
Download the monthly archive of interest and filter the CSV with csvkit/csvgrep and a regex for the sensors you need.

It looks to me like AirSensor only works with PurpleAir, is that right?

Also OpenAir for the UK only looks at DEFRA sensors as far as I can tell. Unless there is now an import from Sensor Community? The OpenAir manual doesn’t mention Sensor Community at all.

So - is there any R package interface between SensorCommunity and OpenAir? OpenAir seems a very nice R package, which also interfaces with ADMS meteorological models.

You mean the openair R library?
Indeed, the examples use UK data, but you can edit the scripts and download whatever you want.
I used openair to produce some analysis for the city of Marmande in France.


Thanks - so are you saying that SensorCommunity API requests have to be written in Python?

If I tried to translate this into R, would this work with the API?

Tina

That's the kind of thing I need, yes - but I want to build my own dashboard so I can also bring in traffic data, both real-time and historic. Is your R code available as an R package (like OpenAir)? OpenAir worked straight away using that, but I couldn't see how to import from SensorCommunity - it just needs another import routine, written in R, specifically for SensorCommunity. Has anyone done that for OpenAir?

Christina

Yes, I'm happy to do it like that initially. Please could you point me to how to download the monthly archive using R, as I am a newborn baby on this forum?

Christina

In R:

list.of.packages <- c("lubridate","tidyverse", "tidyr","RCurl", "httr","readr","ggplot2","stringi")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

library(lubridate)
library(tidyr)
library(tidyverse)
library(RCurl)
library(httr)
library(readr)
library(stringi)
library(ggplot2)
set_config(config(ssl_verifypeer = 0L))

# Adjust to your own working directory
setwd("/Users/PJ/CODES/R_VGA")

url_dir<-"http://archive.sensor.community"
# Start and end dates of the period to download
inicio <- "2023-02-01"
fin <- "2023-07-28"

#sensor_id=c("76689", "76699", "76701", "76703", "76735", "76737", "77531", "77535", "80342")
#sensor_type=c("sds011", "sds011", "sds011", "sds011", "sds011", "sds011", "sds011", "sds011", "sds011")

# Sensor IDs and the matching sensor types (one entry per sensor)
sensor_id=c("76560", "76685")
sensor_type=c("sds011", "sds011")


inicio_date <- ymd(inicio)
actual<-inicio_date
fin_date <- ymd(fin)

# One entry per day between the start and end dates
dates <- seq(inicio_date, by = "day", length.out = (fin_date-inicio_date)+1)
print(dates)

# Build a test URL for the first sensor on the first day, e.g.
# http://archive.sensor.community/2023-02-01/2023-02-01_sds011_sensor_76560.csv
url_test<-paste(url_dir,"/",actual,"/",actual,"_",sensor_type[1],"_","sensor","_",sensor_id[1],".csv",sep="")
print(url_test)

# Test download: the script assumes this first file exists, since it is
# used below to establish the column structure
x<-GET(url_test)
http_status(x)
if(x$status_code==200){
  print(url_test[1])
  luft_data_fin <- read_delim(url(url_test[1]),delim=";")
  print("ok")
}else{
  print("error")
}

print(luft_data_fin)

# Keep only the column structure, drop the test rows
luft_data_fin <- luft_data_fin[0,]
print(luft_data_fin)

# Loop over all sensors and all days, appending each daily CSV
for (i in 1:length(sensor_id)){
  print(sensor_id[i])
  for (j in 1:length(dates)) {
    print(dates[[j]])
    print(sensor_id[i])
    url_day<-paste(url_dir,"/",dates[[j]],"/",dates[[j]],"_",sensor_type[i],"_","sensor","_",sensor_id[i],".csv",sep="")
    print(url_day)
    x<-GET(url_day)
    http_status(x)
    if(x$status_code==200){
      luft_data <- read_delim(url(url_day[1]),delim=";")
      luft_data_fin<-rbind(luft_data_fin,luft_data)
    }else {
      print("error")
      print(dates[[j]])
    }
  }
}


# Drop everything except sensor_id, timestamp, P1 (PM10) and P2 (PM2.5)
luft_data_fin <- luft_data_fin[,-c(2,3,4,5,8,9,11,12)]

# Write the combined, trimmed data to a single CSV
write_csv(luft_data_fin,file="datasensor_suite2.csv",na="NA",append=FALSE,col_names=TRUE)
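To hand this over to openair (as asked earlier in the thread), the main work is renaming: openair expects a POSIXct column called `date` plus one column per pollutant. A minimal sketch under that assumption, continuing from the `luft_data_fin` data frame above (for the SDS011, P1 is PM10 and P2 is PM2.5):

    library(openair)
    library(lubridate)
    library(dplyr)

    # openair wants a POSIXct column named `date`
    aq <- luft_data_fin %>%
      mutate(date = as_datetime(timestamp)) %>%
      rename(pm10 = P1, pm25 = P2)

    # Standard openair plots on the renamed data
    summaryPlot(select(aq, date, pm10, pm25))
    timePlot(aq, pollutant = c("pm10", "pm25"))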


In Python (via the csvkit command-line tools):

For example, download the September 2023 archive for the SDS011:

https://archive.sensor.community/csv_per_month/2023-09/2023-09_sds011.zip

    pip3 install csvkit
    csvgrep -d ";" -c sensor_id -r "^2492$|^4686$|^10557$" 2023-09_sds011.csv > testnewfin.csv

The regex is made with ^ and $ anchors and | for the list of sensor IDs.
It should filter the huge CSV down to just the sensors of interest.

The monthly archives are listed at http://archive.sensor.community/csv_per_month/

Then you click on the month you want.
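If you would rather do the filtering in R than install csvkit, here is a minimal sketch along the same lines, assuming the unzipped monthly file with its semicolon-separated layout (the file is large, so this needs a few GB of RAM):

    library(readr)
    library(dplyr)

    # The sensors of interest, as in the csvgrep regex above
    ids <- c(2492, 4686, 10557)

    # Monthly archives are semicolon-separated
    monthly <- read_delim("2023-09_sds011.csv", delim = ";")

    # Keep only the rows for those sensor IDs
    filtered <- monthly %>% filter(sensor_id %in% ids)

    write_csv(filtered, "testnewfin.csv")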

hi Tina
the shinyapp is here and the code script is in the github repo. You can strip the download part from there. It should work for 2023 data (for older data you need to change the url!). It uses the OpenAir package and is really easy to adapt for CS data.

hi @pjg
I see some changes in the code, cool :) Is there some place/repo where I can take a look?