Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
870 views
in Technique[技术] by (71.8m points)

algorithm - Conting events between sequential stages in a process using R

I've been trying to resolve an exercise from a textbook where I am faced with the challenge of count different events between sequential stages of an industrial process.

Information related to the process: A test Subject is made to go through a 3 stage process with stages A, B and C respectively the first one being A, second B and finally C; a test Subject may abandon the process at stages A or B and then start again from point A, each time the process takes place a dataset is created with the IDENTIFICATION of the test subject, the TIMESTAMP in which the stage took place and a unique VISIT_CODE During any stage, a test subject may trigger an "ALERT" and this will be recorded with the TIMESTAMP, ALERT_CODE and test subject IDENTIFICATION.

What is to be calculated: I have to create a code in R to count how many ALERTS where generated by the test subjects between stages A and B, between stages B and C and finally how many ALERTS where generated after C. Please note that a test subject may at some point abandon the process to later on start again from point A.

The textbook gives a HINT: "Take a closer look at the stage a test subject is currently to then determine if the ALERT is generated from stage A and before a stage B and also if the test subject triggers an ALERT after stage B and before stage C, but keep in mind that if a test subject abandons at stage A and triggers an ALERT if the TIMESTAMP of that ALERT is smaller than their next attempt at stage A then it should be credited as an ALERT after a stage A"

as another hint, the textbook reveals that the ALERTS after Stage C where only 1 and that it was trigger by test subject W-6 with ALTER_CODE AYUJ-3915716168. The datasets are:

Stage Process

TableA<-tribble(~STAGE, ~TEST_SUBJECT,~TIMESTAMP,~VISIT_CODE,
"A",    "XYU-1",    "10",   "BKO",
"A",    "XYU-1",    "15",   "JUJD",
"B",    "XYU-1",    "20",   "DUDH",
"A",    "FF-09",    "25",   "KSIWJD",
"B",    "FF-09",    "30",   "AJAKAM",
"C",    "FF-09",    "35",   "ZISKS",
"A",    "UU-89",    "40",   "NNXJD",
"B",    "UU-89",    "45",   "DDUWO",
"A",    "I-44", "50",   "JIWIW",
"A",    "W-6",  "55",   "SHDN",
"B",    "W-6",  "60",   "IWOLS",
"C",    "W-6",  "65",   "JDDD",
"A",    "U-90", "70",   "DJDKSMS",
"B",    "U-90", "75",   "NDJSM",
"A",    "T-87", "80",   "DNDJDK")

Alerts dataset

TableB<-tribble(~TEST_SUBJECT,~TIMESTAMP,~ALERT_CODE,
"XYU-1",    "11",   "AYUJ-151571406",
"XYU-1",    "12",   "AYUJ-487008829",
"XYU-1",    "28",   "AYUJ-211990388",
"FF-09",    "32",   "AYUJ-4177221842",
"W-6",  "56",   "AYUJ-1300211351",
"W-6",  "63",   "AYUJ-3014305494",
"I-44", "67",   "AYUJ-4454800551",
"U-90", "73",   "AYUJ-1079921935",
"U-90", "76",   "AYUJ-3348911727",
"U-90", "79",   "AYUJ-2381219626",
"T-87", "82",   "AYUJ-4778326278",
"W-6",  "89",   "AYUJ-3915716168")

SOLUTION:

The textbook states that the proper solution for this problem is:

Alerts between Stages A & B including alerts from test subjects that abandoned the process in the attempt nth at stage A Alerts between Stages B & C including alerts from test subjects that abandoned the process in the attempt nth at stage B Alerts after stage C
AYUJ-151571406 AYUJ-211990388 AYUJ-3915716168
AYUJ-487008829 AYUJ-3014305494
AYUJ-1300211351 AYUJ-3348911727
AYUJ-1079921935 AYUJ-4177221842
AYUJ-4778326278 AYUJ-2381219626
AYUJ-4454800551
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here is a data.table apprioach, resulting in a list of alerts after a-b-c..

library(data.table)
# Make tables data.table format
setDT(TableA)
setDT(TableB)
# set TiMESTAP to numeric
TableA[, TIMESTAMP := as.numeric(TIMESTAMP)]
TableB[, TIMESTAMP := as.numeric(TIMESTAMP)]
# Create data.table with Stage intervals by test subject
DT.interval <- TableA[, .(start = min(TIMESTAMP)), by = .(TEST_SUBJECT, STAGE)]
# Perform rolling join
TableB[, Stage := DT.interval[TableB, 
                              STAGE, 
                              on = .(TEST_SUBJECT, start = TIMESTAMP), 
                              roll = Inf]][]
# Split alerts by stage
split(TableB[,3:4], by = "Stage")
# $A
#         ALERT_CODE Stage
# 1:  AYUJ-151571406     A
# 2:  AYUJ-487008829     A
# 3: AYUJ-1300211351     A
# 4: AYUJ-4454800551     A
# 5: AYUJ-1079921935     A
# 6: AYUJ-4778326278     A
# 
# $B
#         ALERT_CODE Stage
# 1:  AYUJ-211990388     B
# 2: AYUJ-4177221842     B
# 3: AYUJ-3014305494     B
# 4: AYUJ-3348911727     B
# 5: AYUJ-2381219626     B
# 
# $C
#         ALERT_CODE Stage
# 1: AYUJ-3915716168     C

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...