OK, without going right through it: I've found myself tracking the coronavirus 'new infections' data in an attempt to establish the direction of travel (it's my belief, incidentally, that we're now below the level at which we locked down), but as ever it's a bit muddy.
I'm using the daily DoH data and running the reports as a 7-day moving/rolling average to smooth out the Tuesday peaks and the Sunday and Monday troughs. That's fine; I'm happy that's the right way of doing it. The problem comes with the raw data report, which can show an increase in positive tests simply because the more people you test, the more cases you detect.
To put this on a level footing, I've introduced a baseline of 10,000 tests. That is to say, divide the number of positives by the number of people tested and multiply by 10,000. That figure then goes into the moving average until it falls out of the window.
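The two steps above, normalising to a 10,000-test baseline and then taking a 7-day trailing average, can be sketched as follows. The daily figures here are made up purely for illustration; only the arithmetic reflects the method described.

```python
# Hypothetical daily figures (not real DoH data): people tested and positives.
daily_tested = [8000, 12000, 9000, 15000, 11000, 7000, 6000, 13000]
daily_positive = [400, 540, 420, 600, 460, 310, 270, 500]

BASELINE = 10_000  # re-express positives per 10,000 tests

# Step 1: positives per 10,000 tested, levelling out day-to-day testing volume.
per_10k = [p / t * BASELINE for p, t in zip(daily_positive, daily_tested)]

def rolling_mean(values, window=7):
    """Trailing average; each day's figure drops out once it is `window` days old."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

# Step 2: 7-day moving average of the normalised series.
smoothed = rolling_mean(per_10k)
```

The first smoothed value only appears once seven days of normalised figures exist, which is why the averaged series is six entries shorter than the raw one.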
What I'm wrestling with, however, is the sample construct. Throughout March the only people tested were those presenting with symptoms or those who had been in contact with someone who'd tested positive. This creates a skewed sample, but it's fine for analytical purposes so long as it's applied consistently, as you still get a trend line. In April (certainly the second half of the month) we began testing key workers, people who weren't reporting symptoms. Naturally the ratio of people testing positive to people tested falls, though that fall might also be attributable to the virus being less prevalent (which probably takes us into the realms of coefficients of determination).
Altering a sample partway through a survey period is, of course, a nightmare.
Is anyone aware of a method in quantitative analysis that could adjust for this?
I'm looking at an output at the moment that has probably inflated March's position because the sample was targeted. If you then project onto the 10,000-tested baseline, you get a higher figure than was probably the case. This in turn means I may be over-estimating the prevalence of Covid-19 from circa March 15th onwards, potentially rendering the whole conclusion, that we're broadly back at our March 21st position, wrong.
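The distortion being described can be shown with made-up numbers: the same projection onto 10,000 tests gives very different figures depending on who was sampled, even before any real change in prevalence. All figures below are invented for illustration.

```python
BASELINE = 10_000

# March-style targeted testing: mostly symptomatic people, so a high hit rate.
targeted_tested, targeted_positive = 5_000, 1_500

# April-style wider testing: asymptomatic key workers included, diluting the hit rate.
wider_tested, wider_positive = 20_000, 2_000

targeted_per_10k = targeted_positive / targeted_tested * BASELINE
wider_per_10k = wider_positive / wider_tested * BASELINE

# The targeted regime projects to 3,000 per 10,000; the wider one to 1,000,
# a three-fold gap produced entirely by the sampling regime.
```

The gap between the two projected figures is the inflation referred to above: the March baseline figures carry the targeted-sample bias, so comparing them directly against later, wider-sample figures overstates how far we've come back down.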