I have a data set (~200K rows) similar to this mock data:
mydf <- read.csv(url("http://ift.tt/1z3iXk7"), header = TRUE)
I am trying to widen it from its current form into what I fully understand is an untidy, bad schema (external requirements, etc.).
Specifically, I want one row per student, with each repeated field numbered, i.e.:
| StudentID | Major | University | Birthday | EnrollmentDate | CourseID.1 | CourseStartDate.1 | CourseEndDate.1 | CourseDescription.1 | Instructor.1 | Hours.1 | CourseID.2 | CourseStartDate.2 | CourseEndDate.2 | CourseDescription.2 | Instructor.2 | Hours.2 | CourseID.3... (etc) |
|-----------+-----------+------------+----------+----------------+------------+-------------------+-----------------+---------------------+--------------+---------+------------+-------------------+-----------------+---------------------+----------------+---------+---------------------|
| 1 | Economics | Oxford | 4/9/1956 | 9/1/2001 | 100 | 8/15/2014 | 8/15/2014 | Stats With Cats | Charlie Kufs | 3 | 101 | 8/16/2014 | 8/16/2014 | Fun with Cthulhu | James Hatfield | 1 | |
The problem I've run into is that I want the course variables numbered ordinally, i.e. 1, 2, 3, 4, ..., n per student. That is, for each course a student took, the column suffix should reflect the order in which they took the course, not the specific date or course ID.
The reshaping examples I have seen all name the widened columns after the actual value, e.g. EnrollmentDate9/1/2001.
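To make the target concrete, here is a minimal base R sketch of the kind of result I am after. It runs on a made-up three-row stand-in for the real data: the column names are borrowed from the layout above, while the values and the helper columns `StartParsed` and `CourseSeq` are hypothetical. It also assumes that "the order they took the course in" means ascending `CourseStartDate`, which may not hold for the real data.

```r
# Hypothetical miniature of the data: column names follow the desired
# output above, values are made up for illustration.
mydf <- data.frame(
  StudentID       = c(1, 1, 2),
  Major           = c("Economics", "Economics", "History"),
  CourseID        = c(101, 100, 100),
  CourseStartDate = c("8/16/2014", "8/15/2014", "8/15/2014"),
  stringsAsFactors = FALSE
)

# Assumption: course order is defined by start date. Parse the date only
# for sorting, then drop the helper column again.
mydf$StartParsed <- as.Date(mydf$CourseStartDate, format = "%m/%d/%Y")
mydf <- mydf[order(mydf$StudentID, mydf$StartParsed), ]
mydf$StartParsed <- NULL

# Ordinal counter: 1, 2, ..., n within each student.
mydf$CourseSeq <- ave(seq_along(mydf$StudentID), mydf$StudentID,
                      FUN = seq_along)

# Widen: one row per student. Every column that is neither an id nor the
# counter is suffixed with the counter value (CourseID.1, CourseID.2, ...).
wide <- reshape(
  mydf,
  idvar     = c("StudentID", "Major"),
  timevar   = "CourseSeq",
  direction = "wide",
  sep       = "."
)
```

Because the counter, not the date or the course ID, is used as `timevar`, the suffixes come out as .1, .2, ... per student, and students with fewer courses get `NA` in the unused columns. If the existing row order already reflects the order the courses were taken in, the sorting step can presumably be dropped.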
Widening data with ordinal subgroupings