I've been working on a multi-label email classification model. It's been a frustrating slog, fraught with challenges, including a lack of training data. Labeling emails is labor-intensive and error-prone. Also, I habitually delete certain classes of email as soon as they've outlived their usefulness. I use a CRM-114-based spam filtering system (actually I use two different instances of the same mailreaver config, but that's another story), which is differently frustrating, but I delete spam when it's detected or when it's trained. Fortunately, there's no shortage of incoming spam, so I can collect enough of it, but emails for other, arguably more important labels arrive infrequently. So, those labels need to be excluded, or the small sample sizes wreck the training feedback loop. Currently, I have ten active labels, and even though the point of this is not to be a spam filter, “spam” is one of the labels.

Out of curiosity, I decided to compare the performance of my three different models, and to do so on a neutral corpus (in other words, emails that none of them had ever been trained on). I grabbed the full TREC 2007 corpus and ran inference. The results were unexpected in many ways. For example, the Pearson correlation coefficient between the scores from my older CRM-114 model and my newer CRM-114 model was only about 0.78.
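For anyone who wants to run the same kind of comparison, here is a minimal sketch of the Pearson calculation in plain Python. It is not the script I actually ran. The function name `pearson` and the score lists are made up for illustration, and it assumes each model emits one numeric spam score per message, in the same message order.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance, up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances, up to the same constant factor).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-message scores from two models (not real results).
old_scores = [0.9, 0.1, 0.8, 0.3, 0.7]
new_scores = [0.8, 0.2, 0.9, 0.1, 0.4]
r = pearson(old_scores, new_scores)
print(f"r = {r:.2f}")
```

A coefficient near 1 would mean the two models rank messages almost identically, so it only measures agreement between models, not whether either one is right.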

I was even more surprised by how poorly all three performed. Were they overfit to my email? So, I decided to look at the TREC corpus for the first time, and lo and behold, the first spam-labeled email I checked was something I would definitely train all three models on as non-spam: ham for the CRM-114 models, and an entirely different label for my experimental model.

Posted on 2025-05-28
Tags:

Why can't Debian have embarrassing flamewars like this thread?

Posted on 2021-09-21
Tags:

unmerged /usr is unsupported in bookworm and sid has been feeding bookworm since 2021-08-14,

unmerged /usr is also unsupported in sid since 2021-08-14,

no one using any portion of either bookworm or sid since 2021-08-14 should have any expectation that things should function correctly with unmerged /usr,

∴ anyone using any portion of either bookworm or sid should execute apt install usrmerge or perform its equivalent on or prior to 2021-08-14.

Posted on 2021-08-23
Tags:

Mom,

When you upgrade to bullseye, you need to change your security source from

deb http://security.debian.org/ buster/updates main

to

deb http://security.debian.org/debian-security bullseye-security main

However, that will silently fail to work if you forget to update the file in /etc/apt/preferences.d to add something like this stanza:

Explanation: Debian security
Package: *
Pin: release o=Debian,n=bullseye-security
Pin-Priority: 990

Posted on 2021-08-14
Tags:

Posted on 2021-04-12
Tags:

This story is not true.

In 1971, Atlantic Records released John Prine's eponymous debut album. The third track (on the first side) was a song called “Hello in There”.

In 1972, Atlantic Records released Bette Midler's debut album, The Divine Miss M. The seventh track (the second track on the second side) was a song called “Hello in There”, written by John Prine.

In 1973, Asylum Records released Tom Waits's debut album, Closing Time. The sixth track on that album was a song called “Martha”.

Later that year, Bette Midler's concert tour took her to Radio City Music Hall in New York City, where she performed from December 3 until December 22. On one of those nights, she sang “Hello in There” while a young John Prine sat in the cheap seats, thinking to himself, “Someday I'll be up on that stage.”

In 1977, Tom Waits and Bette Midler, who were dating, each released an album featuring their duet “I Never Talk to Strangers”, written by Waits.

In 1979, Bette Midler, who was no longer dating Waits, performed a version of “Martha” on Saturday Night Live.

On Friday, April 13, 2018, Sturgill Simpson took the stage at Radio City Music Hall, acoustic and solo. After he finished, John Prine appeared, and for the first time in his life on stage at Radio City, he performed “Hello in There”, nearly 45 years after declaring he would.

For his final encore, he played “When I Get to Heaven” while his niece and nephew played kazoo and Brandi Carlile yodeled.

If you received this story through a blog aggregator of some kind and are annoyed because this story is not true, you may find that the administrators are more than eager to be complicit in censorship reactions in response to your complaint.

Posted on 2019-01-22
Tags:

Did dkg certify his new key with something I've certified?

hkt findpaths --keyring ~/.gnupg/pubring.gpg '' \
        2100A32C46F895AF3A08783AF6D3495BB0AE9A02 \
        C4BC2DDB38CCE96485EBE9C2F20691179038E5C6 2>/dev/null
(3,[46,31,257])

(31,0EE5BE979282D80B9F7540F1CCD2ED94D21739E9)
(46,2100A32C46F895AF3A08783AF6D3495BB0AE9A02)
(257,C4BC2DDB38CCE96485EBE9C2F20691179038E5C6)

I (№ 46) have certified № 31 (0EE5BE979282D80B9F7540F1CCD2ED94D21739E9) which has certified № 257 (C4BC2DDB38CCE96485EBE9C2F20691179038E5C6).
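I don't know how hkt searches internally, but the question itself is just path-finding in a directed graph of certifications. Here is a rough sketch using breadth-first search. The `find_path` function and the `certs` dictionary are my own illustration, containing only the two certifications reported above, not a dump of my keyring.

```python
from collections import deque


def find_path(certs, start, goal):
    """Return the shortest chain of certifications from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Follow every certification issued by the last key on this path.
        for certified in certs.get(path[-1], ()):
            if certified not in seen:
                seen.add(certified)
                queue.append(path + [certified])
    return None


ME = "2100A32C46F895AF3A08783AF6D3495BB0AE9A02"
MIDDLE = "0EE5BE979282D80B9F7540F1CCD2ED94D21739E9"
DKG = "C4BC2DDB38CCE96485EBE9C2F20691179038E5C6"

# Maps a certifying key to the keys it has certified (illustrative subset).
certs = {
    ME: [MIDDLE],
    MIDDLE: [DKG],
}

path = find_path(certs, ME, DKG)
print(path)
```

Because breadth-first search visits keys in order of distance from the start, the first path it finds to the goal is one with the fewest hops.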

Posted on 2019-01-19
Tags:

“Are you winning at life?” she asked.

“My marriage has broken down, I'm thirty-seven and bald, and my net worth is negative five thousand euros,” he replied.

“That sounds like you're a perfectly functional adult,” she observed.

“Oh, absolutely,” he said, “but it doesn't mean I'm winning at life. I haven't taken a scalpel to my carotids yet, that's probably the most positive aspect of my life at the moment. Continuing to perfuse my peripheries! Glasgow coma scale fifteen out of fifteen! Not covering the wall in arterial blood!”

“Is the Glasgow coma scale anything like the Bristol stool scale?” she inquired.

“Well, yes,” he replied, “in that it's used by doctors and it's a number.”

Posted on 2018-10-13
Tags:

“Now that I'm 50 and within range of medical disaster, any ideas for a comfortable suicide?” he said. “Leading candidates are sleeping pills or car exhaust. I tried to enlist my kids to do it but they can't, so I guess I have to do it myself if necessary.”

“A large plastic bag over the head supposedly puts you into a dreamy, pleasant stupor before it kills you,” she replied, “or any CO₂ replacement would work. Also an opioid overdose is probably nice.”

Posted on 2018-10-08
Tags:

Sheena plodded down the stairs barefoot, her shiny bunions glinting in the cheap fluorescent light. “My boobs hurt,” she announced.

“That happens every month,” mumbled Luke, not looking up from his newspaper.

“It does not!” she retorted. “I think I'm perimenopausal.”

“At age 29?” he asked skeptically.

“Don't mansplain perimenopause to me!” she shouted.

“Okay,” he said, putting down the paper and walking over to embrace her.

“My boobs hurt,” she whispered.

Posted on 2018-09-16
Tags: