Computer-graded essays?

Hugh
Posts: 2444
Joined: Thu Mar 08, 2007 12:05 pm
Platform: Mac
Location: UK

Thu Apr 19, 2012 8:53 pm Post

PJS wrote:
Hugh wrote:Everyone is always at least 15 per cent wrong.


Corollary 2.b of Sturgeon's Law.

ps


Good grief Phil, you're right :D. However, I'd quite like it if henceforth the Corollary was known as Hugh's Corollary 2.b of Sturgeon's Law.

P.S. Olaf Stapledon's brilliantly prophetic novel Last and First Men is very good on the risks of AI - especially when linked by what he seems to have foreseen (in the 1930s) as the equivalent of the Internet.
'Listen, some quiet night, when you've shirked your work that day. Do you hear
that distant, almost inaudible clicking sound? That's one of your
competitors, working away in the night in
Paris or London or Erie, PA.'

PJS
Posts: 1185
Joined: Sun Jul 22, 2007 5:05 pm
Platform: Mac + Windows
Location: Upstate New York

Thu Apr 19, 2012 9:27 pm Post

Hugh wrote:
PJS wrote:
Hugh wrote:Everyone is always at least 15 per cent wrong.

Corollary 2.b of Sturgeon's Law.

ps

Good grief Phil, you're right. However, I'd quite like it if henceforth the Corollary was known as Hugh's Corollary 2.b of Sturgeon's Law.


It is terse, precise, and unassailable. It deserves its own place in the literature. Simply "Hugh's Corollary," with no external reference.

And Last and First Men? Good grief. Yours must have been as richly mis-spent a youth as my own. (Youth, as I age, grows ever more vast and lovely. Current setting: approximately forty years of sunshine and lollipops.)

Phil
You can't conquer stupid — or cure it — with more stupid.

Siren
Posts: 759
Joined: Mon Mar 12, 2007 11:29 am
Platform: Mac + iOS
Location: U.K.

Thu Apr 19, 2012 9:40 pm Post

PJS wrote:And Last and First Men? Good grief. Yours must have been as richly mis-spent a youth as my own.

I mentioned Hugh's book recommendation to my son a few minutes ago, and apparently (at the ripe old age of 18) he has read it, too. And (he thinks) a sequel. A sci-fi fiend and lost cause, obviously...
Literature & Latte support team

xiamenese
Posts: 4477
Joined: Mon Jan 29, 2007 1:32 am
Platform: Mac
Location: London or Exeter, UK.

Sat Apr 21, 2012 8:29 am Post

pigfender wrote:
xiamenese wrote:I gave her 85%, when the ceiling for a good distinction grade was 75%. Why? Because under exam conditions, she had written better answers than I myself could have done, sitting at my desk and taking my time over it with reference books handy


But she was still 15% wrong?


Well, I see it like this: under the apparently absurd grading system in use, 75% is deemed 100% right, so 85% is akin to the possibility in mathematics (so I understand) of a paper being awarded 110%.

Actually, to me, this is the whole point ... you cannot take an essay, or a translation for that matter, and go through it word by word giving minus points for errors (how many minus points? How does error type 1 stack up against error type 2?) and plus points for every correct bit (same "how many points" issues), and if you, or at least I, try, you/I go mad after about 3 papers. The only way to do it is to work in grades. A typical A/Distinction/First Class answer is 70+% (UK) or 85+% (China — they've recently reduced it from 90+ while at the same time giving instructions not to hand out so many A grades ... some teachers were giving A grades to well over 50% of their students!); a typical B/Credit/Upper Second is 60-69% (75-85%), etc. So you ask yourself, "In my experience, is this an A, a B ...?" Say the answer is "Not good enough for an A, so a B. How good a B? Top of the range, marginal to A? Middle of the range? Just making it into B?" When you've worked that out, you give it an appropriate number.

On that basis, it really doesn't matter where your grade boundaries are placed within the system — 40% as a bare pass for a BA in the UK. The numbers are only there so that a final grade can be worked out across all the courses a student has taken ... they don't get a final result of 66% or 67% to be argued over, they get an Upper Second at BA level or a Credit at MA.

In the UK, every paper is blind double-marked by the teacher and a colleague, who then have to come to an agreement on what the paper is given ... and then a large sample of the papers, based on a specific set of rules (any passes, any fails, any grade borderline cases and a representative sample of the rest), is sent to an external examiner from a different university, whose job it is to check that the internal examiners have been fair and balanced in their marking ... The whole thing is a total nightmare to be involved in, but very few results go askew, and they can be and are argued over at the final board. I don't think relying on an AI solution can match it for fairness of result while allowing exceptional students like mine to get the result they deserve. No one, including the external, queried my 85%.

Couple of examples.

One year, in Linguistics, the two internals disagreed totally: the person who'd taught the course gave all the students good passes; the other failed the lot of them; they couldn't come to an agreement. So they sent all the papers to the external, who sent them straight back with a harsh note saying it was not his job to adjudicate between the internals, and that they had to settle on a mark and then he'd look at them. I can't remember the final outcome ... too long ago.

My first finals exams: If a student got a final average of 68.5 or above they were eligible for upgrading to a first. There was a girl doing Italian and Linguistics, whose final average was 68.4 or 68.3, not in the consideration band. At the pre-final board — no externals present — I put it to the chair that since all but one of her final year assessments were clear firsts and that it was her less stellar performance during her second year that had reduced her to just below the line, she should be considered for a First as clearly her academic ability was on an upward path, etc. I was stamped on firmly by the chair ... she's outside the band, she can't be considered. After the meeting, the course leader in Italian thanked me for bringing that up, as they hadn't picked up on it, and they'd let their external examiner know. At the final board, the Italian external took it up, told the chair in no uncertain terms he shouldn't be so rigid as the whole point of the system was to look for exceptions to the mathematical rule; he was backed by the Linguistics external; she got her First, which is what all the board wanted apart from the chair.

Could such a case be programmed into an AI algorithm? I doubt it, certainly for some time to come ... or maybe DARPA could do it ... after all they have built a remote-controlled mechanical humming-bird ... but the cost would no doubt be prohibitive. I defer to Jaysen.

:)

X
The Scrivenato sometimes known as Mr X.
iMac 27" (late 2015) 10.15.6, 24GB RAM, 512GB SSD
MBP17" (late 2011) 10.13.6, 16GB RAM, 2TB SSD
2017 iPad, iPadOS 14, 128GB, Apple Pencil
Scrivener, Scapple, Nisus Writer Pro, Bookends …

Jaysen
Posts: 6278
Joined: Mon Dec 17, 2007 4:00 am
Platform: Mac + Windows
Location: East-Be-Jesus-Nowhere SC, USA

Sat Apr 21, 2012 1:11 pm Post

Mr X,

Theoretically yes. Some theories behind AI suggest that there should be core "flavors" (what we would think of as personalities) of AI that would self-evolve into (write new) further sub-flavors, eventually leading to flavor A developing a flavor B "offspring". The flavors could be used in tandem with mediation to perform the logical analysis, just like two trained humans.

If the idea that folks want the flavors to write new AI flavors doesn't give you the willies then it might be that the sterile nature of the statement masks the fact that the AI is "giving birth" to new AI variants. We call this "making babies" where I am from. Computers effectively procreating new personalities that would need to compete for computer resources.

I just don't like it.
Jaysen

I have a wife and 2 kids that I can only attribute to a wiggle, a giggle, and the realization that she was out of my league so I might as well be happy with her as a friend. 26 years marriage later, I can't imagine life without her. -Me 10/7/09


xiamenese
Posts: 4477
Joined: Mon Jan 29, 2007 1:32 am
Platform: Mac
Location: London or Exeter, UK.

Sat Apr 21, 2012 3:15 pm Post

Jaysen wrote:Mr X,

Theoretically yes. Some theories behind AI suggest that there should be core "flavors" (what we would think of as personalities) of AI that would self-evolve into (write new) further sub-flavors, eventually leading to flavor A developing a flavor B "offspring". The flavors could be used in tandem with mediation to perform the logical analysis, just like two trained humans.

If the idea that folks want the flavors to write new AI flavors doesn't give you the willies then it might be that the sterile nature of the statement masks the fact that the AI is "giving birth" to new AI variants. We call this "making babies" where I am from. Computers effectively procreating new personalities that would need to compete for computer resources.

I just don't like it.

Nor do I ...

http://www.ted.com/talks/lang/en/susan_ ... temes.html

Also frightening ... she even frightened herself, it seems to me!

X

Hugh
Posts: 2444
Joined: Thu Mar 08, 2007 12:05 pm
Platform: Mac
Location: UK

Sun Apr 22, 2012 12:46 pm Post

Siren wrote:
PJS wrote:And Last and First Men? Good grief. Yours must have been as richly mis-spent a youth as my own.

I mentioned Hugh's book recommendation to my son a few minutes ago, and apparently (at the ripe old age of 18) he has read it, too. And (he thinks) a sequel*. A sci-fi fiend and lost cause, obviously...


So he's trodden the same path as (I believe) the youthful Arthur C. Clarke, Brian Aldiss and Patrick Moore once did. As well as Phil and I.

*Probably Star Maker.

nom
Posts: 1927
Joined: Sun Aug 31, 2008 12:02 am
Platform: Mac + iOS
Location: Melbourne, Australia

Sun Apr 22, 2012 1:53 pm Post

Reading this thread with interest. I have a few thoughts on content, but will save them for a time when I am not falling asleep at the keys. I am posting now to say just one thing in response to Mr X's parenthetical comment to me:

ABOUT TIME! :D
Complete and utter NOMsense.

Sin
Posts: 801
Joined: Wed Mar 02, 2011 4:05 am
Platform: Mac
Location: Georgia

Sun Apr 22, 2012 9:16 pm Post

After seeing what Watson was capable of on Jeopardy, I think we may underestimate the potential of a computer program. This also brings up the argument of subjectivity vs objectivity in grading. Generally speaking, the greater the weight given to subjective elements, the poorer the grader. In my school we have clear rubrics that determine how an essay will be graded. They still allow for creative expression, but the grade itself is determined as objectively as possible. I find that fair as a student, because I know exactly what is expected of me.

The question I have about computer grading is this: Will it be used as a tool to assist teachers, or is it a replacement, so that nobody will bother reading student work anymore? If we are moving in the latter direction, then let's abandon the grading system altogether. Grading fewer papers may sound appealing, but it will only lead to fewer teachers and larger class sizes, which means less interaction. If we can build computers that can do all of the grading for us, then why bother with teachers at all? Cut the middleman out and place the student in front of the computer and a guard at the door.

nom
Posts: 1927
Joined: Sun Aug 31, 2008 12:02 am
Platform: Mac + iOS
Location: Melbourne, Australia

Mon Apr 23, 2012 1:43 am Post

I've been reading this thread with interest. While I don't have a strong opinion (yet, give me time :D ) on computer marking, I did react to this comment:
Content, by which I mean the regurgitation of facts learnt and opinions already established is the key to exam success at these levels.
I think this depends entirely on the subject and the content. Speaking as someone who marks student papers (university undergraduates), I don't want regurgitation of "facts" - I want a clear, logical, well-written "argument". Facts are a dime a dozen, but being able to research, collate and present those facts in an interesting and coherent way to arrive at a new point is a much more valuable skill than simply regurgitating them.

Sin wrote:Grading fewer papers may sound appealing, but it will only lead to fewer teachers and larger class sizes, which means less interaction.

Maybe I'm an optimist, but I would think the opposite. Grading takes a lot of time. We have to prepare the marking rubric, read and mark all the papers (and add meaningful comments) plus cross-mark a selection of papers (including all fails). This costs money. And, while we are marking, we are not available for teaching other classes (it can take several days full-time work to mark all the students' work just for a single assignment - equivalent to half a semester per tutorial class of face-to-face teaching time). If we didn't have to do this, then we could afford up to 50% *more* staff and have smaller class sizes. Now whether the university would actually do this is another matter entirely, but that becomes a question of budget priorities rather than a logical progression. I see no reason why classes would get bigger though.

Another benefit of computer marking is the greater likelihood of detecting plagiarism. I have no doubt that I have missed many examples in the papers I have marked. When marking, we are under tight deadlines to return work to students and grades to admin; we don't have the time to investigate every piece of suspicious text. Hence I tend to prioritise those "most likely" and only work my way through to some of the more subtle instances if time permits. It rarely does.

Having said all that, I'd like to see how well a computer algorithm compares to human markers. Not just on average where I suspect it would do well, but (especially) with the outliers: the "odd" papers, the ones that don't follow the script but work regardless. Or, conversely, the ones that follow the script so tightly they show no original thought whatsoever.

I'd also like to know how a computer algorithm would go about giving feedback. How can it read "intent" behind what was written? Can it provide useful comment on phrasing or arguments? Can it give encouraging feedback as well as pointing out errors and omissions? Can it recognise, highlight and reward use of unexpected but salient references or creative intellectual links between constructs? If it can do all that, I'd hand over my marking in a flash. I suspect, however, that we are still some way from achieving this.

Sin
Posts: 801
Joined: Wed Mar 02, 2011 4:05 am
Platform: Mac
Location: Georgia

Mon Apr 23, 2012 3:22 am Post

nom wrote:
Sin wrote:Grading fewer papers may sound appealing, but it will only lead to fewer teachers and larger class sizes, which means less interaction.

Maybe I'm an optimist, but I would think the opposite. Grading takes a lot of time. We have to prepare the marking rubric, read and mark all the papers (and add meaningful comments) plus cross-mark a selection of papers (including all fails). This costs money. And, while we are marking, we are not available for teaching other classes (it can take several days full-time work to mark all the students' work just for a single assignment - equivalent to half a semester per tutorial class of face-to-face teaching time). If we didn't have to do this, then we could afford up to 50% *more* staff and have smaller class sizes. Now whether the university would actually do this is another matter entirely, but that becomes a question of budget priorities rather than a logical progression. I see no reason why classes would get bigger though.


I can't comment on Australian education, but the cost of education in the US has been a big issue lately, particularly when every government expense is becoming suspect. I believe you acknowledged the point I was trying to make. Grading takes a lot of time. If you eliminate that aspect, then the school will be aware that you have more time on your hands.

I wanted to make some comments about professors no longer reading papers, but hell, I know some grad students who are tasked to grade freshman and sophomore papers. I'm sure I'll have a better perspective if and when I'm on the other side of the desk. The higher you go up the ladder, the more human involvement may be required. Would you have felt comfortable with a machine reading your thesis?

kewms
Posts: 6566
Joined: Fri Feb 02, 2007 5:22 pm
Platform: Mac

Mon Apr 23, 2012 5:36 am Post

Sin wrote:I wanted to make some comments about professors no longer reading papers, but hell, I know some grad students who are tasked to grade freshman and sophomore papers. I'm sure I'll have a better perspective if and when I'm on the other side of the desk. The higher you go up the ladder, the more human involvement may be required. Would you have felt comfortable with a machine reading your thesis?


When the professor is reading freshman and sophomore term papers, he's not reading theses.

Katherine
Scrivener Support Team

Sin
Posts: 801
Joined: Wed Mar 02, 2011 4:05 am
Platform: Mac
Location: Georgia

Mon Apr 23, 2012 6:27 pm Post

kewms wrote:When the professor is reading freshman and sophomore term papers, he's not reading theses.

Katherine


Katherine, forgive me, but you may need to elaborate. My first impression was that you are simply dumping on freshmen and sophomores, which is fine if that was your intent, but I don't have a response.

Or do you mean that if the professor is teaching freshman/sophomore classes, then he is unable to serve on a thesis committee?

If so, I didn't know. However, I wasn't making any distinctions at any level. High school and all the way up. Even publications. There's already a program that determines the probability of a song becoming a hit. http://www.npr.org/templates/story/story.php?storyId=113673324 Why not books? I'm assuming the computer just needs the parameters.

Or do you mean that if the professor is reading fewer undergrad papers, then he would have more time to read theses?

If so, then true. How far does a student climb the ladder before a human actually reads her paper? While it may make professors happy, I can't imagine any student would be thrilled to write a paper that no one will ever read. Ever. At any level. If I were a prospective freshman, that would be the school I would avoid.

kewms
Posts: 6566
Joined: Fri Feb 02, 2007 5:22 pm
Platform: Mac

Mon Apr 23, 2012 7:13 pm Post

Sorry, I guess I was a little terse...

My first point was that professor time is inherently limited. Time spent reading freshman papers takes away from time spent reading theses, and vice versa. So it makes sense for the professor -- or any other highly skilled professional -- to invest their own time in the things that only they can do, and delegate everything else. This is why most business executives don't do their own travel planning, and why writers who can afford to do so often hire research assistants.

So the next question is where do you draw the line? Exactly what *are* the tasks that only the professor can do? At one extreme, I think we can all agree that you don't need a PhD and years of experience to correct basic structural and grammatical errors. At the other extreme, a PhD-level research project probably does require close contact with a highly trained supervisor. The fact that many professors already delegate grading for large survey courses -- in any field -- to graduate students shows where their priorities lie. And I would argue that's exactly as it should be.

If a machine can accurately grade a freshman-level essay, then why not? If freshmen are so poorly prepared that the mere ability to construct a literate argument is what separates good from bad papers, that's the fault of the high school, not the professor or the university.

Katherine
Scrivener Support Team

Sin
Posts: 801
Joined: Wed Mar 02, 2011 4:05 am
Platform: Mac
Location: Georgia

Mon Apr 23, 2012 11:57 pm Post

Thank you for the clarification. I agree with your points as well.

kewms wrote:If a machine can accurately grade a freshman-level essay, then why not? If freshmen are so poorly prepared that the mere ability to construct a literate argument is what separates good from bad papers, that's the fault of the high school, not the professor or the university.


In my experience, I notice my standards drop significantly whenever I've been handed a topic and asked to write an essay within a fixed time (entrance exam, Regent's exam, GACE, etc.). I regress to a three-point, five-paragraph format. I put no thought into it at all, because none is required. I'm just a dog doing tricks. Here is an example of what an essay looks like. Ta-da! I feel sorry for whoever grades my essays during those exams, but I've never had one negatively impact me. I suspect I'm not the only student who does this during standardized testing. I also suspect that's exactly how I would write machine-graded essays.