There's a common theme running through these, but I can't tell what it is - Many a mickle maks a muckle — LiveJournal

June 8th, 2005


12:33 pm - There's a common theme running through these, but I can't tell what it is
1. I'm not going to shut up about this for a while: if you enjoy logic puzzles, whether or not you think you're any good at them, register (register! register! register!) for the online qualifying test for the World Puzzle Championships, taking place at 1pm EDT on Saturday 18th June. If you're positively predisposed towards puzzles - if you're enjoying the Su Doku craze - then don't worry about the competition and you'll still enjoy the test.

Practice is going well. Yesterday I resat the 2000 qualifying test which I hadn't looked at for five years. I scored 120 and it would've been 145 but for a copying error. In 2000, I scored 55 on it, with the top British score being only 90 and the top US score being 220. Perhaps having seen the puzzles before helps a lot, but this is pretty good progress. British folk, 55 was enough to get me onto the British team in 2000; if you can score anything like 55 on the 2000 test - that's just four or five puzzles in 2½ hours - then you really are UK team calibre.

2. At least four people on my Friends list make, or have made, at least part of their living by teaching people how to do well on the SAT, GRE or similar college-entry tests. Now this shouldn't be surprising because y'all are damn smart, but would you like to get to know each other? Is there a community where you can hang out, share tips, find employment in the field and so on? (And why don't you use the WPC online qualifier as a test for logical thinking skills?)

3. Through their Summer of Code promotion, Google are providing patronage to students who write software for the good of the world - and, if you're so inclined, you can get paid to work on LiveJournal. I can understand people's concerns about Google's privacy policy, they've buggered up the interface for Google Groups, and Froogle was a bit naff, but in my book Google do so much good for the world that I have lots of love and time for them. Plus they're sponsoring the online qualifying test and the US team for the World Puzzle Championships.

4. According to the stats, 284,376 LJ accounts list the poster as being based in the state of Massachusetts (a state with lots of smart people and puzzle fans, who might enjoy the WPC qualifying test). The number of people in Massachusetts with LJ accounts will be lower than that, because people may well have more than one account, but not as much lower as that alone would suggest, because there will be people in MA whose accounts are not listed as being in MA. Accordingly, let's guess at 200,000 and regard that guess as conservative.

The population of MA is something like 6.2 million. Accordingly, at least 3% of people in MA have a LiveJournal, and it seems likely that at least, ooh, 6%-10% of people in MA know what LiveJournal is. These are tremendously high proportions - LJ is approaching being mainstream! (Based on this post to lj_research.)
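For anyone who wants to check the sums, here is a quick sketch of the arithmetic. The three figures are the ones quoted above; the variable names are made up for illustration, and the "upper bound" line assumes (unrealistically) that every listed account is a distinct person.

```python
# Figures quoted in the post; variable names are illustrative only.
listed_ma_accounts = 284_376   # accounts that list Massachusetts
guessed_ma_users = 200_000     # the post's conservative guess at distinct people
ma_population = 6_200_000      # "something like 6.2 million"

share = guessed_ma_users / ma_population
# Assumption for comparison only: one person per listed account.
upper_share = listed_ma_accounts / ma_population

print(f"conservative guess: {share:.1%}")        # 3.2%
print(f"one person per account: {upper_share:.1%}")  # 4.6%
```

So "at least 3%" follows from the 200,000 guess, with a little room to spare.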

5. Talking of the stats, you might observe that there are 2.2 million LJ accounts registered male, 4.5 million registered female and 2.1 million registered unspecified. Total: 8.8 million. However, there are a total of 7.35 million LJ accounts! What's the discrepancy due to? A vexing puzzle of the sort that you won't find on any online qualifying tests for the World Puzzle Championships.

I asked support and got an answer back quite quickly. It's probably impolite to quote, so you'll have to trust that I'm not misrepresenting the position when I say I was told that the million-and-a-half account discrepancy can be attributed to accounts that have been deleted and possibly purged in the past. Perhaps we can use this figure to estimate some sort of LiveJournal churn percentage. (It is not clear whether those deleted accounts are included in the 284,376 figure quoted above or not.)
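As a rough sketch of the sums in this item: the counts below are the rounded figures from the stats page as quoted above. The "churn" ratio at the end is only one possible definition, and it assumes the whole discrepancy really is deleted or purged accounts, which is not confirmed.

```python
# Account counts in millions, as quoted in the post.
by_gender = {"male": 2.2, "female": 4.5, "unspecified": 2.1}
total_accounts = 7.35

gender_sum = sum(by_gender.values())
discrepancy = gender_sum - total_accounts

print(f"sum of gender counts: {gender_sum:.2f} million")  # 8.80 million
print(f"discrepancy: {discrepancy:.2f} million")          # 1.45 million

# ASSUMPTION: treat the discrepancy as deleted/purged accounts and divide by
# every account ever counted. Other definitions of churn are equally defensible.
churn = discrepancy / gender_sum
print(f"rough churn: {churn:.1%}")
```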

6. The BBC report research suggesting that the difficulty some women have in reaching orgasm may be genetic and hint that it's possible that there might be drug therapy some day which could help those who have found that even the most desired partner (if any) and the best technique are not sufficient. This is entirely cheering news and I hope that some appropriate drugs without hazardous side-effects can be discovered. One would expect that such a drug would be bigger news than Viagra.

However, it does illustrate a double standard in me and I'm worried about this. I am not embarrassed by adverts for Viagra, but should such a drug treatment eventually exist, I can't imagine that adverts for it wouldn't be horribly embarrassing. I don't think this is a double standard of mine along gender lines; it's more that the concept of "do something you used to be able to do" is less embarrassing than the concept of "do something you've never been able to do and you feel you're less of a person because of it" - a similar product for men who've never experienced orgasm would be just as embarrassing. I don't know why I feel this way; perhaps it's because it's closer to a purely hedonistic drug than we have legally yet reached. (ETA: I think I've worked this out. See comment.) Cough cough World Puzzle Championship qualifying test.
Current Mood: puzzling

(29 comments)

Comments:


From: imc
Date: June 15th, 2005 08:00 am (UTC)

Computing the statistics (2/2)

The "Total accounts" figure in the top line is essentially the number of records returned by the query:
SELECT DATE_FORMAT(timecreate, '%Y-%m-%d') AS 'datereg',   
       DATE_FORMAT(NOW(), '%Y-%m-%d') AS 'nowdate',
       UNIX_TIMESTAMP(timeupdate) AS 'timeupdate'
FROM userusage
(where the results are also used to compute the "new accounts by day" and "users updated in last n days" statistics — note that a record is returned for each user even if they never updated their journal). Unfortunately this doesn't tell me whether deleted and purged users are held in the "userusage" database. However, they must be in at least one database because if you try to view their userinfo then LiveJournal tells you they have been deleted and purged. Clearly, users who have been suspended or deleted but not purged must still be in the database, though they have to be filtered out of the results of any directory search. Incidentally, it must certainly include communities, and I suppose it includes syndicated feeds too.

Renames are a slightly different matter. LiveJournal has to keep the old username around because it either pretends the old name has been deleted or forwards you to the new username (depending on what the user chose when they renamed). I can't currently find any "befores and afters", but it looks like you keep your old userid number when you rename, so I guess that your old name has to be assigned a new number. Unless, that is, you've renamed to a name that was deleted and purged. I'm not sure what happens in that case, but it would make sense for the purged entry to be removed entirely from the database (to be replaced by the user who renamed) when that happens. So, the total accounts statistic probably doesn't count the accounts which were deleted, purged, and then replaced by someone else — but this is pure speculation on my part.

The maximum value of userid is stored in the stats database and can be read from the text dump as "size accounts". The above total accounts number is stored as "userinfo total". In the current text dump, we have:
size    accounts        7433711
userinfo        total   7421711
which means that there are 12,000 completely vanished accounts (it surely must be a coincidence that this works out to such a round figure). It is left as an exercise for the reader to speculate whether this could be accounted for by purged-and-renamed journals.
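The subtraction can be reproduced straight from the two quoted lines. This is a minimal sketch that assumes each line of the text dump is three whitespace-separated fields (category, key, value), which is all that can be inferred from the excerpt above.

```python
# The two lines from the stats text dump, as quoted in the comment.
dump = """\
size    accounts        7433711
userinfo        total   7421711
"""

stats = {}
for line in dump.splitlines():
    # ASSUMPTION: three whitespace-separated fields per line.
    category, key, value = line.split()
    stats[(category, key)] = int(value)

vanished = stats[("size", "accounts")] - stats[("userinfo", "total")]
print(vanished)  # 12000
```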

Now the gender information is retrieved on a cluster-by-cluster basis from the "userproplite2" database. I've no idea how the clustering actually works or what this database is (or indeed the exact meaning of the SQL query). However, what happens is that the data for each cluster is saved in the partialstats database, and when this is complete the records from partialstats are summed and placed in stats. The code claims to count every possible value of gender except for '', and according to the text dump it comes up with four possible values: blank (with only one matching account), 'F', 'M' and 'U'.

I don't know whether it's relevant, but the clustered code asks for
  c.clusterid IS NULL OR c.clusterid=?
(where "?" is the cluster under consideration). If there are any records with "c.clusterid IS NULL" then it looks like they'll be counted several times — once for each cluster.
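To illustrate the double-counting worry, here is a toy sketch. It uses Python's built-in sqlite3 rather than LiveJournal's actual MySQL setup, and the table `toy_users`, its columns and its four rows are entirely hypothetical; only the `c.clusterid IS NULL OR c.clusterid=?` condition is taken from the comment.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for whatever the real query joins against.
conn.execute(
    "CREATE TABLE toy_users (userid INTEGER, clusterid INTEGER, gender TEXT)"
)
conn.executemany(
    "INSERT INTO toy_users VALUES (?, ?, ?)",
    [
        (1, 1, "F"),
        (2, 1, "M"),
        (3, 2, "F"),
        (4, None, "U"),  # a user with no cluster assigned
    ],
)

totals = Counter()
for cluster in (1, 2):  # one pass per cluster, then sum the partial results
    rows = conn.execute(
        "SELECT gender, COUNT(*) FROM toy_users c "
        "WHERE c.clusterid IS NULL OR c.clusterid = ? GROUP BY gender",
        (cluster,),
    )
    for gender, n in rows:
        totals[gender] += n

print(dict(totals))          # the single 'U' user is counted twice
print(sum(totals.values()))  # 5, although the table holds only 4 rows
```

Whether the real tables contain any rows with a NULL cluster is unknown, so this only shows that the condition would over-count if they did.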

However, let me speculate on where the extra 1.5 million users (not counting those with blank genders) are coming from. The stats database is never cleared out (I'm assuming this because there are several statistics in the text dump which haven't changed for years and aren't mentioned in the program). Suppose, when they occasionally rename some or all of the clusters, they accidentally leave the stats for the old cluster names in the partialstats database. When the code comes to compute the sum of any clustered statistic, it will include all the out-of-date info for the clusters which no longer exist and thus produce an inflated figure. Of course, I have no idea whether the clusters are referred to by name in the database or by some other identifier which would render my theory invalid, and since I don't have access to the database there is no way to check whether I'm on the right track.
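The stale-cluster theory can be sketched with made-up numbers. Everything here is hypothetical: the row layout, the cluster names and the values are invented purely to show how leftover rows would inflate a sum, and nothing is known about how partialstats is really keyed.

```python
# Hypothetical partialstats rows: (statistic, cluster name, value).
partialstats = [
    ("gender_F", "old_cluster_a", 100),  # assumed leftover from before a rename
    ("gender_F", "new_cluster_a", 100),  # the same users under the new name
    ("gender_F", "cluster_b", 50),
]
current_clusters = {"new_cluster_a", "cluster_b"}

# Summing every row, as the theory supposes the stats code does.
naive_total = sum(value for _, _, value in partialstats)

# Summing only rows for clusters that still exist.
filtered_total = sum(
    value for _, cluster, value in partialstats if cluster in current_clusters
)

print(naive_total, filtered_total)  # 250 150
```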
From: imc
Date: June 15th, 2005 08:01 am (UTC)
…and that's the first time I've ever been told to go back and shorten my comment. :-)
