Jump to content

Wikipedia:Village pump (miscellaneous)

From Wikipedia, the free encyclopedia
(Redirected from Wikipedia:VPM)
 Policy Technical Proposals Idea lab WMF Miscellaneous 
The miscellaneous section of the village pump is used to post messages that do not fit into any other category. Please post on the policy, technical, or proposals sections when appropriate, or at the help desk for assistance. For general knowledge questions, please use the reference desk.

For questions about a wiki that is not the English Wikipedia, please post at m:Wikimedia Forum instead.

Discussions are automatically archived after remaining inactive for 8 days.

« Archives, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82

Meaningful intervals for edit size histogram

[edit]

With T236087 XTools is going to get a histogram of a user's edit sizes soon. This will be a bar chart. For screen real estate reasons, it's max ~12 bars. The idea is that each bar gives the number of edits in a certain size interval. My question is: which intervals do you think we should use? The current code uses 200-width intervals (0-200, 200-400, &c), up to 1800-2000, and lumps the rest into >2000.

The issue with fixed-width intervals is they don't allow much granularity for smaller edits (e.g., separating the +1 typo fix from the +120 paragraph addition). I was thinking also of perhaps something exponential like 0-20, 20-40, 40-80, 80-160, 160-320, 320-640, 640-1280, 1280-2560, >2560. What do you think could be more meaningful to users, and why? Welcoming suggestions. Thanks, — Alien  3
3 3
16:33, 28 April 2025 (UTC)[reply]

Just looking at my most recent mainspace contributions, the <10 typo fix or minor c/e shows up, then from 10-100 there's larger copyedits, adding categories, and formatting tweaks. The adding text+adding source seems to start from perhaps 200. I have a small number of +2000 edits which seem meaningfully distinct from say reverting page blanking vandalism, so I'd put the final bin a bit higher. CMD (talk) 02:32, 29 April 2025 (UTC)[reply]
Thanks for the answer! When you say "higher", where would that be? 3K? 4K? 10K? Just asking for a general order of magnitude. — Alien  3
3 3
09:09, 29 April 2025 (UTC)[reply]
Probably something like 5K or 10K? Maybe someone has an existing histogram this could be based on. CMD (talk) 12:15, 29 April 2025 (UTC)[reply]
What about negatives? A few years ago I looked at my edits (in mainspace) and found that my median change was −3 bytes. —Tamfang (talk) 23:33, 29 April 2025 (UTC)[reply]
This would be in absolute value, i.e. putting -1 with +1. Else it takes twice as much width. We could do both positive and negative, but then we'd have pretty low granularity (could only have about 6 bars on either side). — Alien  3
3 3
05:43, 30 April 2025 (UTC)[reply]
Could you split the bars in two? Top colour is positive and bottom colour is negative. 80.76.122.163 (talk) 08:45, 1 May 2025 (UTC)[reply]
We could, I think. Question would be, what do we do with 0? is it positive or negative? — Alien  3
3 3
09:25, 1 May 2025 (UTC)[reply]
Centered/split? I agree that positive/negative above/below the horizontal axis was also where my mind went immediately. -- Avocado (talk) 22:27, 4 May 2025 (UTC)[reply]
Yup, that's done (see discussion below). Currently the zero is put between the additions and the x-axis in the 0-10 interval, in a separate colour.
Splitting the zero bar (as in half-above and half-below) is not doable with our library without some meh hacks I'd really like to avoid. — Alien  3
3 3
09:49, 5 May 2025 (UTC)[reply]
Centred, obviously. —Tamfang (talk) 19:18, 15 May 2025 (UTC)[reply]
A lot of stuff's happened in the last two weeks, see below (currently looks like this). Centering the zero isn't really doable without some very ugly hacking, though, in the end, so it'll have to stick with the pos. — Alien  3
3 3
19:26, 15 May 2025 (UTC)[reply]


I like the exponential (or semi-log?) better than a straight division. Most of our edits are actually small.
What I really wish is that we could get numbers for changes to readable prose (e.g., not fiddling with whitespace and template formatting). WhatamIdoing (talk) 03:25, 1 May 2025 (UTC)[reply]
Sadly, that's just not doable on a statistical scale. The best possible in reasonable time would be a bit below 100 edits, which is not a lot.
If you're ready to wait something like at least 30 seconds for it, we could make a separate tool that does this.

Update: now looks like this. Other suggestions? — Alien  3
3 3
13:29, 1 May 2025 (UTC)[reply]

The link doesn't work.
Instead of a separate tool (I greedily want all the tools, but would I use it often enough to justify your efforts? I'm not sure, in this case), I wonder if it would be possible to add Special:Tags to non-prose changes. Something like the "Undo" tag, which is calculated later? WhatamIdoing (talk) 03:53, 2 May 2025 (UTC)[reply]
Well, my bad for the link. This one should work.
Adding tags is beyond our capacity (should ask the mw people), but I get the use of it. I'm wondering, though: is a non-prose change a change that changes no prose, or that also changes something that isn't prose? — Alien  3
3 3
05:36, 2 May 2025 (UTC)[reply]
The red/green color choice in the diagram probably needs to be checked for Wikipedia:Manual of Style/Accessibility purposes. Could the red/minus items hang down below the 0 line?
About non-prose changes: I don't want to be bothered with edits like these: [1][2][3][4][5][6]. I do want to see edits like this one: [7] WhatamIdoing (talk) 20:27, 2 May 2025 (UTC)[reply]
Current histogram, after some color tweaking and putting the neg below the 0 line. (Actually, it was the grey that was really problematic for accessibility). — Alien  3
3 3
08:16, 3 May 2025 (UTC)[reply]
That shape is a little easier for me to understand at a glance.
Does the new color scheme work for someone with Red–green color blindness? WhatamIdoing (talk) 22:41, 3 May 2025 (UTC)[reply]
Yes; I checked. Still clearly distinguishable. — Alien  3
3 3
22:55, 3 May 2025 (UTC)[reply]
Thanks. WhatamIdoing (talk) 17:08, 8 May 2025 (UTC)[reply]

Many thanks to everyone for all the input! Will probably go out in the next deployment or two. — Alien  3
3 3
12:29, 9 May 2025 (UTC)[reply]

Alien333, where you say,

For screen real estate reasons, it's max ~12 bars

Please correct me if I am wrong, but I presume this max value is due to the assumption that the bar chart must be displayed with vertical bars, in which case your max value of 12 is reasonable, because the bars would become too narrow or merge if there were a lot more than that, especially in the case of mobile users with much narrower screens.
But is this assumption necessary? I don't think it is. Please see T394066 and this horizontal bar chart demo, in which case the 12 bsr limit goes away. I assume XTools is not using the Chart extension, but the same argument applies. An ideal design imho should be able to handle a param |mode=vertical as opt-in, and flip the chart 90 degrees, or at least, be robust enough and forward-thinking in the initial release not to prevent it from being easily added in a later enhancement. Thanks, Mathglot (talk) 07:21, 15 May 2025 (UTC)[reply]
We do use horizontal bar charts most of the time (cf yearmonth counts).
But look where this goes in the edit counter: in the general stats sections; this would replace the two edit size pie charts. Which are 200px tall.
So using vertical bars would mean either a) making the bars less than 10px tall, which believe me makes them unreadable; or b) forcing everyone to scroll a lot.
So in a nutshell vertical real estate is even more constrained than horizontal real estate; hence the conscious choice to use a vertical bar chart and not a horizontal bar chart.
(Also, for information, changing bar dimension with ChartJS (which we use) is ridiculously easy, so there is zero risk of preventing future updates.)
I would also argue that we have to put some higher limit anyhow, because else we'd be adding a lot of empty bars just to show that the user did one +200K rvv. — Alien  3
3 3
07:31, 15 May 2025 (UTC)[reply]
Even if you do not flip, in response to Tamfang's question about negative values, you said:

this would be in absolute value, i.e. putting -1 with +1. Else it takes twice as much width. We could do both positive and negative, but then we'd have pretty low granularity (could only have about 6 bars on either side).

but that isn't necessarily the case, iiuc. In horizontal mode, if your y-axis 0-byte change value were centered vertically (well, it should be at y=max pos. value + min neg. value / 2) then you could display negative values below the y=0 line with no increase in width, retaining twelve bars, even without flipping to vertical orientation. (edit conflict × 2) Mathglot (talk) 07:45, 15 May 2025 (UTC)[reply]
I'd say try to look at the current output I linked above; it does currently do that in the end :). — Alien  3
3 3
07:48, 15 May 2025 (UTC)[reply]
Imho, the choice is not only between a) and b). Couldn't one collapse the section to minimize scrolling and allow access to the totality of the data? Mobile users (already the majority, iiuc) already have all sections collapsed; I don't see a collapsed section being a huge burden for desktop users to click '[show]', in exchange for the benefit of minimizing scrolling past a long chart. (edit conflict) Mathglot (talk) 08:02, 15 May 2025 (UTC)[reply]
We could give a button for the whole data, I suppose.
Adding an optional full scrollable chart does free us from all real estate concerns, though.
So I don't really see how a horizontal bar chart helps in this case. Plus, the default data does look cramped in a horizontal chart. I don't think making the default data have less bars is an improvement. — Alien  3
3 3
09:40, 15 May 2025 (UTC)[reply]

AI tool to fact-check articles (proof of concept)

[edit]

I have created a proof of concept tool for automating fact-checking of articles against sources using AI. GitHub repository. An OpenAI API key or compatible provider is required (I use BotHub). It is cost-effective; when using gpt-4.1-nano, verification of one 100-word block against a single source (approximately 12,000 characters) costs about 0.1 cent. Functionality:

  1. The program loads the article text from file and all available sources (text files: source1.txt, source2.txt, etc.).
  2. It divides the article into blocks of approximately 100 words, preserving sentences.
  3. For each block and each source:
    • Sends a request to the OpenAI API for correspondence analysis
    • Receives credibility probabilities for each word
  4. Combines results for all blocks and sources
  5. Visualizes the text with color coding based on the obtained probabilities (textmode with all sources combined or GUI allowing to select individual sources)

Installation and usage instructions, along with example screenshots, are available in the README. Bugs are certainly present (almost all code was generated using Anthropic Claude 3.7).

It is also possible to use models hosted locally by installing an OpenAI API compatible LLM server (such as LLaMA.cpp HTTP Server) and directing script to use it with --base_url and --model parameters.

Suggestions and proposals are welcome, but unless submitted as pull requests, they will be reviewed at an indeterminate time. The creation of new tools based on this idea and code is strongly encouraged. Kotik Polosatij (talk) 13:40, 5 May 2025 (UTC)[reply]

Interesting, thanks! -- GreenC 00:56, 9 May 2025 (UTC)[reply]

Papal traffic - one of our busiest hours?

[edit]

In case anyone is curious, I did a bit of digging on yesterday's traffic:

  • On 8 May, the Pope Leo XIV article here was read 13.2 million times ; the Spanish, Italian, German, French and Portuguese made up another 10.9 million. This was 4.5% of all pageviews in the day for English, and as high as 12.9% for the Spanish Wikipedia. (These figures include all traffic from redirect pages)
  • Absolute totals for all Wikipedias are a little trickier. The count for pageviews of the "main article title" was around 15 million on all 93 Wikipedias with articles; the six biggest ones above made up 88.5% of that. So assuming the breakdown between main articles + redirects is in proportion, maybe something like 27 million pageviews overall, including redirects.
  • We went from 23 WPs having an article on him before the announcement, to 93 by midnight UTC, and 113 now. 20 Wikipedias managed to rename their article in the first three minutes (17:14 to 17:17 UTC) and two other projects had created new articles on him by that time.
  • In the hour after the announcement (17:00 to 18:00 UTC), English Wikipedia had around 8.4 million hits on Pope Leo XIV and the redirect titles - around half of those were to Robert Francis Prevost - which represented one third of all pageviews during the hour.
  • It probably represented over 40% of all pageviews, over 3000/second, from 17:14 to 18:00 (assuming that the other traffic was evenly distributed) and while the public data doesn't go lower than hourly, I would be happy betting money that in the first fifteen minutes, it was well over half of our traffic.

I don't know if this was our one-time traffic record, but it must certainly be well up there. Congratulations to everyone who worked on it. Andrew Gray (talk) 21:12, 9 May 2025 (UTC)[reply]

Other contenders: Death and funeral of Pope John Paul II; Death of Michael Jackson. I think the Michael Jackson one maxed out our servers. --Redrose64 🌹 (talk) 22:26, 9 May 2025 (UTC)[reply]
Looks like the death of Michael Jackson in 2009 and the views it generated caused wikitech:Michael Jackson effect, which was solved by our software engineers writing the software mw:PoolCounter, which is now installed on our servers to prevent it from happening again. An interesting bit of technical history. –Novem Linguae (talk) 22:40, 9 May 2025 (UTC)[reply]
Interesting, thankyou - I had somehow forgotten the Jackson case!
That page points to Wikipedia:Article traffic jumps which identifies a handful pushing towards 10m in a day (Kobe Bryant, Matthew Perry, Elizabeth II). Some of these do not include redirects in the count and so are ahead of Leo XIV on purely "single title" data, but I think none are likely to beat the one-day (or one-hour) figure for Leo once redirects are included (and IMO they should be).
I'll see if I can work out what any of these were like as a percentage of traffic - in particular it seems plausible that Steve Jobs might be higher than Leo XIV, with 7.4m views in 2011. Andrew Gray (talk) 22:54, 9 May 2025 (UTC)[reply]
@Andrew Gray: Awhile back I wrote this about the impact of Prince's death on Wikipedia. Forgive the writing -- I'd like to think I'm more concise these days -- but there's probably some useful info in there for you. Ed [talk] [OMT] 20:28, 12 May 2025 (UTC)[reply]
@The ed17 very interesting, thankyou! I had a vague recollection that at some point there had been minute-by-minute hit analysis on a page, but I completely failed to recall what it was about (I thought maybe an election...)
Quickly comparing that to the numbers for the others below - for Prince, the max "clock hour" (1700-1800) was 1.81m hits (very close to the 1.84m for the first 60 min), or about 12% of total enwiki traffic that hour.
Prince had 500 views/second in the first hour, with (per your data) a peak at 810/second. If we assume the same sort of pattern held for the recent traffic, then we have an average of 3000 views/second in the first 3/4 hour, which might imply a peak at somewhere around 5000/sec for the Pope?
It is possible, though, that the traffic in data-served terms was higher for the deaths of people with long-established articles - the Prevost/Leo article was quite short with one image, while Prince had much higher wordcount plus eight images.
I guess it would be a bit cheeky to ask if you could find out if someone could generate that data for the article titles here (Pope Leo XIV & Robert Francis Prevost, plus redirects at Leo XIV, Pope Leo XIIV & Pope Leon XIV), before it gets too old for analytics to be storing it? I think that might be really interesting to do as a comparison to see how the two evolved. But if it's an unreasonably complicated request, no worries :-) Andrew Gray (talk) 22:42, 12 May 2025 (UTC)[reply]
We also did some minute-by-minute stuff for the Super Bowl! (Forgive the formatting in that automatically imported post.)
I've passed along the ask. No guarantees, as I know that team is heavily taxed. :-) Ed [talk] [OMT] 01:00, 13 May 2025 (UTC)[reply]
Amazing, thankyou! Andrew Gray (talk) 18:54, 13 May 2025 (UTC)[reply]
@Andrew Gray: They unfortunately can't displace planned work for this request, but they did suggest that we have hourly data in public dumps. Those are tricky to work with (e.g. the file sizes alone), so the the analytics listserv is available for clarification questions. Ed [talk] [OMT] 14:57, 14 May 2025 (UTC)[reply]
@The ed17 No worries - thanks for asking! I've been using the daily dumps and they're pretty good - it's just that for something where it's so quick-moving as this, it seemed worth checking if the minute-resolution data might be available. Andrew Gray (talk) 22:22, 14 May 2025 (UTC)[reply]

Looking at some recent high-traffic deaths, with a little rounding up added to the global data for redirects (which are relatively rare for stable articles like these ones):

  • Matthew Perry got ~8.8m enwiki hits on 29/10/23, and ~11.8m globally, which would put him at 3.7% of enwiki traffic and 2.1% of global traffic. (Death was reported about midnight UTC)
  • Kobe Bryant got ~9.5m enwiki hits on 26/01/20, and ~15.1m globally, which would put him at 3.4% of enwiki traffic and 2.6% of global traffic. (Death was reported about 1930 UTC)
  • Elizabeth II got ~8.5m enwiki hits on 8/9/22, and ~20m globally, which would put her on 3.2% of enwiki traffic and 3.5% of global traffic. (Death was reported about 1730 UTC)

My rough estimate for the Pope had 4.5% of enwiki and (more tentatively) 4.4% of global traffic in the day, so I think that puts him ahead of all three. Interesting to see, though, the difference between Elizabeth/Leo and Perry/Bryant in terms of English vs global traffic. Peak hour was I think around 3.5m/21% for Bryant, 2.2m/13% for Elizabeth II, and 1.3m/11% for Perry, so again all a bit behind what we saw this week.

  • For Jobs in 2011, we have the problem that a new and more reliable pagecount system came in about a month after his death. From what we do have (which may have errors/omissions), I get ~7.8m enwiki hits over the full day 6/10/11 (counting Steve Jobs & the main redirect at Steve jobs). Total hits for the day were 231.5m for enwiki, so this suggests Jobs was ~3.3% of English Wikipedia traffic that day, maybe a shade higher to account for the other redirects. Jobs's death seems to have been announced about midnight UTC so the affected period covers the full day; for the peak hour (1-2am) it was 10% of all traffic.
  • For Jackson in 2009, with the same caveats, there were ~1.5m hits over the full day 25/6/09 (Michael Jackson + Michael jackson), or 0.6% of total enwiki traffic, but his death was announced only in the last couple of hours of the day so it's not a great comparison. The last two hours of the day had ~7.1% of all enwiki traffic go to the two Jackson page titles, and the last hour had ~12%.

Again, I think the data for the Pope this time around is ahead of both in terms of the share of traffic and the one-hour spike.

In terms of overall sitewide impact, 8 May was a relatively normal day for English Wikipedia in absolute traffic terms - it was busier than usual, especially for a Thursday, but only the fifth busiest this year. However, for Wikimedia as a whole, it was quite a leap, with 613m pageviews - this is the most it has been since 28/1/2024, and the sixth highest since the start of 2021. — Preceding unsigned comment added by Andrew Gray (talkcontribs) 15:47, 10 May 2025 (UTC)[reply]

Graph of traffic to the English Wikipedia highlighting the activity around the announcement of Pope Leo XIV

One more addition: here's the traffic graphed against "all other page hits". It's interesting to see how it clearly seems to be "extra" traffic rather than Wikipedia's existing reader base, which more or less continues unaffected. It's also noticeable that there is an extra few million hits in that hour which isn't accounted for by the main article - some of that is presumably to pages with similar "what just happened" information like 2025 papal conclave or Pope, but I think we're also seeing a decent amount of spillover from people moving onto other pages - which is great. Andrew Gray (talk) 22:49, 12 May 2025 (UTC)[reply]

How long before we hit 7 million articles?

[edit]
5482 to go!

At this writing, there were 6,991,903 articles in the encyclopedia, and as you are reading, there are now 6,994,518. There are 5482 left to go to hit the big 7M! Who will be the lucky one to make the seven millionth edit article?? Mathglot (talk) 07:09, 10 May 2025 (UTC)   pinned[reply]

P.S. If you are sitting here hitting reload to see the number change, you might need to Purge the page instead. While you do that, you can listen to the calming sound of Wikipedia being edited. Mathglot (talk) 08:41, 10 May 2025 (UTC)[reply]
Surely we've hit our 7th million edit! I have a list of notable article topics and I might get to some of them, so I'll try and chip away at a quarter of a percent. CMD (talk) 09:03, 10 May 2025 (UTC)[reply]
Yes, we're up into the region of 1.2 thousand million edits now (specifically, 1,285,971,944). I suspect that Mathglot meant "seven millionth article" when they wrote "seven millionth edit". --Redrose64 🌹 (talk) 13:31, 10 May 2025 (UTC)[reply]
Big 'oops!' on my part. Of course I meant article, thanks for the correction. Someone trout me! Mathglot (talk) 18:36, 10 May 2025 (UTC)[reply]
CMD (talk) 02:31, 11 May 2025 (UTC)[reply]
Gawrsh, thanks; I needed that!   [wipes trout juice and a few silvery scales off chin...]   Mathglot (talk) 02:38, 11 May 2025 (UTC)[reply]
I wonder what % of those articles don't meet the WP:Notability guidelines... Some1 (talk) 14:17, 10 May 2025 (UTC)[reply]
Probably a smaller number than the number of articles that could meet the notability guidelines that don't yet exist, so it should all balance out in some way. CMD (talk) 17:35, 10 May 2025 (UTC)[reply]

Any predictions?

[edit]

Anyone want to take a guess at when it will happen? You'll probably at least qualify for the Barnstar of Arbitrary Achievement, and bragging rights (at least, until we get to 8 million). Cast your bets... Mathglot (talk) 03:56, 11 May 2025 (UTC)[reply]

This error appears sometimes

[edit]

I wonder if this is connected to the "Search is too busy." error I used to get the other day. If it is, then it seems like Wikipedia itself is either being DDoSed or is experiencing a kind of unintentional equivalent from a high amount of readers attempting to look up Pope Leo XIV (or whichever has been getting lots of pageviews lately), which could be a manifestation of the Michael Jackson effect. Thankfully both of these errors are short-lived and infrequent. – MrPersonHumanGuy 19:43, 13 May 2025 (UTC)[reply]

See phab:T393513. The cause of this is unknown, but not AFAIK caused by traffic spikes, as those are handled by the edge caches and don't reach the database. * Pppery * it has begun... 19:50, 13 May 2025 (UTC)[reply]
WP:VPT is a good spot for technical questions. In general, these kinds of error messages are caught by downtime alert tools and are handled invisibly by WMF SREs, without needing to be reported directly by users. –Novem Linguae (talk) 01:43, 14 May 2025 (UTC)[reply]

Concerns Regarding Cross-Wiki Conduct and Tone by Administrator Bedivere

[edit]

Hello community, this is to notify that there is a request for comment on Meta that some users might be affected. You can join the discussion here.

Please do not reply to this message. 📅 05:16, 15 May 2025 (UTC)[reply]

Have editors become free labor for AI techbro oligarchs?

[edit]

Recent news reports say that traffic to AI website ChatGPT has surpassed Wikipedia.org. I used to derive pleasure from providing information to the whole world ... I had no qualms about donating hundreds, even thousands of hours of my time: I did it proudly. It seemed noble.

But now Wikipedia is one of the primary sources of raw data for the AI models. In a couple of years, almost all people will directly ask an AI tool (which will rely heavily on Wikipedia articles) and bypass Wikipedia altogether. It is inevitable; can't stop progress. Granted, the work of WP editors is still (indirectly) helping millions of people around the globe ... even when people go through AI to get the information.

But what bothers me is: the owners/C-suite executives of the AI companies are getting exceedingly wealthy, off the back of free labor from Wikipedia editors. What was once noble, now feels like exploitation. Noleander (talk) 17:28, 15 May 2025 (UTC)[reply]

That is the nature of all such projects. Surely you're not surprised people are actually taking us up on the "even commercially" clause of the CC-BY-SA license we use? RoySmith (talk) 19:26, 15 May 2025 (UTC)[reply]
You don't actually have to be "surprised" to decide that something feels icky to you. WhatamIdoing (talk) 21:33, 15 May 2025 (UTC)[reply]
@RoySmith No, I'm not surprised, just saddened. Sure, WP was always copied & used freely, even by commercial ventures. But the AI companies are massively profitable (Google, Microsoft, Facebook, etc) ... in the past, companies that copied WP for profit seemed marginal, and not exploitive.
Another thing that is changing is that people used to visit the WP web site(s) a lot; but that seems to be declining due to AI (so says the recent internet stats) ... one can image - 5 years in the future - that users never visit the WP web sites, and instead get all of that same info from AI portals. In that scenario: WP is simply raw data for AI, and WP editors are exploited drones.
Much of the pride of being a WP editor will disappear in that scenario, at least for me. Noleander (talk) 23:07, 15 May 2025 (UTC)[reply]
Hard to say if AI summaries will steal 20% of our traffic or 50% or whatever. Or it could be a big nothing burger. Microsoft used to think that tablets would replace most desktop computers too (think Windows 8). Sometimes hype cycles (new technologies) flop pretty spectacularly after the initial hype. –Novem Linguae (talk) 23:38, 15 May 2025 (UTC)[reply]
@Novem Linguae You're right about the hype possibility: Hard to predict what the digital world will be like 10 or 20 years from now.
I enjoy editing WP, it is a hobby. I'm not suggesting that editors should be paid by massively profitable AI companies ... But wouldn't it be nice if the AI companies made some donations to Wikimedia Foundation in recognition of the value of the WP raw data? Noleander (talk) 23:47, 15 May 2025 (UTC)[reply]
I do not think it can just be called hype. Since I'm in college, I can confidently say that no one around me does their assignments the traditional way now. Everyone uses ChatGPT or whatever, even if it is known to hallucinate or spit rubbish sometimes. If this is the confidence with which people are using AI now, and such is their dependency on it, it is extremely difficult to revert back to when there was no AI. And all those tech giants pushing AI summaries over anything else doesn't help either. CX Zoom[he/him] (let's talk • {CX}) 01:58, 16 May 2025 (UTC)[reply]
Anecdotally, I stopped visiting Wikipedia for general reference when the default layout redesign was launched. I find it harder to read and navigate, but I don't care to create an account just for that. I think the change coincided with the rise in popularity of LLMs, so if I'm in any way representative, that might be a significant factor too. I doubt most people care about it as much as I do and most people are probably used to it by now, but maybe it had some effect. 207.11.240.2 (talk) 08:59, 16 May 2025 (UTC)[reply]
Bloggers and commercial sites (some, not all) have been copying from us without attribution for years. What seems to have changed is that search engines now prioritize their own LLMs over WP. Running LLMs is quite expensive, however. This article is six months old, but I suspect the companies pushing LLMs haven't seen a profit yet. Whether they will in the foreseeable future is a question I can't answer. Donald Albury 22:07, 15 May 2025 (UTC)[reply]
@Donald Albury - Isn't it true that the biggest AI companies are Google, Microsoft, Musk, and Facebook? (OpenAI/ChatGPT is partially owned by Microsoft, I believe). Those are huge, profitable companies, and their executives make big $$$$$. Sure, they may stick their AI work into subsidiaries that lose money on paper, but the parent companies continue to be profitable. And the loss-leader AI subsidiaries drive customers to the parent apps/websites, which have ads, etc.
Example: in the future, most questions that people type into Google web site will be run thru Google's AI. I foresee Google's AI using WP as a primary source. So WP editors are working - unpaid - for Google. It is bothersome that most Google employee get paid, but WP editors would not. Noleander (talk) 23:17, 15 May 2025 (UTC)[reply]
But, will they continue to pump money into running LLMs if they do not become profitable? Big companies will pour money into developing products, but if the products do not become profitable within some period, they will cut their losses. So the questions are, when or if will LLMs become profitable to operate, and if they do not become profitable, will one or more companies continue to subsidize them because of other perceived benefits? Donald Albury 02:22, 16 May 2025 (UTC)[reply]
Google (and much of the rest of the word) runs on Linux. Does it bother you that Linux developers don't get paid? If you don't want people to make money off your volunteer efforts, find projects to contribute to which don't allow commercial use. RoySmith (talk) 13:16, 16 May 2025 (UTC)[reply]
You're right: there are many examples of billionaires profiting from the free labor of volunteers. But that doesn't make it fair or ethical. The 1% oligarchs shouldn't be able to hoard 99% of the planet's wealth .... Nothing wrong with pointing that profiteering off free labor is happening here in the context of Wikipedia.
Volunteer scientists around the world for centuries have built-up useful, global knowledge without pay. But were oligarchs routinely profiting from that? Yeah, probably sometimes, but not it was not common.
In addition to the issue of "should AI companies pay WP for its content" is a related issue of "It's kinda sad that visits to WP articles are gradually diminishing as people shift to AI portals".
The same thing happening to WP is also happening with Stack Exchange ... for the past decade a very popular online resource (built by volunteers) for engineers ... but now its web traffic is dropping because its user base is shifting to AI portals. Noleander (talk) 13:42, 16 May 2025 (UTC)[reply]
But now Wikipedia is one of the primary sources of raw data for the AI models. Is it? I mean, that would be a scandal for anyone trying to push such AI as reliable because even Wikipedia says Wikipedia isn't a reliable source. So I'm both surprised and a bit suspicious about that claim. At any rate, it would make more sense for AI to be told to follow all the references cited on Wikipedia and glean from them. Largoplazo (talk) 22:36, 15 May 2025 (UTC)[reply]
@Largoplazo I hope you're right. But I am pretty sure that AI _is_ using WP as a primary source of its data. For the past 2 months I've tried using AI a couple dozen times to find new sources for research I'm working on, and at least 80% of the results are facts (generally correct) that include a "source link" (a kind of AI footnote) pointing to a WP article (often the article I'm working on  :-) Noleander (talk) 23:10, 15 May 2025 (UTC)[reply]
The open source movement in general has pros and cons. People we don't want to use our open source work using our open source work is certainly one of the cons. Won't stop me from editing though. –Novem Linguae (talk) 23:39, 15 May 2025 (UTC)[reply]
Just as an interesting aside, of the court cases challenging whether using copyrighted materials consistitutes fair use, the courts seem to be siding for creators. If this holds, then arguably any AI that has used WP content needs to follow up by including necessary attribution licenses per CC-By, or otherwise seek an exemption license from WMF. Nothings final yet though. Masem (t) 00:00, 16 May 2025 (UTC)[reply]
Yeah, I've been following that legal issue closely. That is a battle between titans: on the one hand Google/Microsoft/Facebook: on the other hand: Hollywood/Music/authors. The "fair use" exception is so broad, who knows how SCOTUS will ultimately rule. I was happy to see a court decision in Australia about 2 years ago where they forced search engines (Google, etc) to pay $$$ to news sources, when the search app was earning massive revenue for merely listing the news articles, and paying nothing for the content.... that at time when newspapers are dying at an alarming rate. Noleander (talk) 00:48, 16 May 2025 (UTC)[reply]
It doesn't have any effect on my ability to write Wikipedia articles or other people's ability to read them, so I don't see why it should make any difference to me. User:Thebiguglyalien/Wikipedia is not about page views. Thebiguglyalien (talk) 🛸 02:22, 16 May 2025 (UTC)[reply]
Most of us who edit Wikipedia were sucked into it while we used to read it. A generation that never visits Wikipedia to read it would not feel the urge or need to edit contents here. CX Zoom[he/him] (let's talk • {CX}) 02:29, 16 May 2025 (UTC)[reply]
Ever since AI exploded, I've started to understand how the editors of Encyclopedia Britannica (hardcopy) must have felt in the 1980s ... wondering if your entire medium will become irrelevant.
I wonder if AI will continue to make lots of mistakes, leading to increased attention to the quality of the raw data (especially WP articles) ... if so, WP will become more important, not to say more often viewed. Noleander (talk) 02:58, 16 May 2025 (UTC)[reply]
On the other hand, the Wikipedia screenshot as a questionable source has been dethroned by llm screenshots, so we've got no longer being the generic lowest common denominator going for us. CMD (talk) 10:25, 16 May 2025 (UTC)[reply]
:-) Donald Albury 15:01, 16 May 2025 (UTC)[reply]
Since I discovered that it is possible to get paid (to the tune of a 5 figure sum in a matter of a few months), for stumping frontier LLMs on a platform that I won't name (but whose clients undoubtedly include OpenAI, Google, Meta etc.) my editing on wikipedia has all but ceased. The latest LLMs are data hungry, they have pretty much exhausted all open sources of information. Polyamorph (talk) 15:09, 16 May 2025 (UTC)[reply]
I think the remarkable aspect isn't that businesses take advantage of free work (they've been doing that forever), but that so many people have been willing to contribute their work for anyone to freely use (which I wouldn't have predicted at Wikipedia's genesis). For Linux, there's a huge network effect that makes it beneficial to its contributors, but there's nothing equivalent for Wikipedia at the scale of its volunteer base. This probably makes Wikipedia vulnerable to disenchantment, and as others have said, losing readers through less prominent positioning in search results affects recruitment of new editors. isaacl (talk) 16:46, 16 May 2025 (UTC)[reply]

Call for Candidates for the Universal Code of Conduct Coordinating Committee (U4C)

[edit]

The results of voting on the Universal Code of Conduct Enforcement Guidelines and Universal Code of Conduct Coordinating Committee (U4C) Charter is available on Meta-wiki.

You may now submit your candidacy to serve on the U4C through 29 May 2025 at 12:00 UTC. Information about eligibility, process, and the timeline are on Meta-wiki. Voting on candidates will open on 1 June 2025 and run for two weeks, closing on 15 June 2025 at 12:00 UTC.

If you have any questions, you can ask on the discussion page for the election. -- in cooperation with the U4C,

Keegan (WMF) (talk) 22:07, 15 May 2025 (UTC)[reply]

Mako Hill on "The Challenge of Peer-Produced Websites "

[edit]

Recent article on the University of Washington website. Unsurprisingly, a fair amount about Wikipedia in there. - Jmabel | Talk 17:03, 16 May 2025 (UTC)[reply]