Any idea why that happens? I get it after long running queries. Sample: quarry:query/86096. If I happen to have the window open, I sometimes see the actual results before, meaning the query was successful. Usually I forget to export it before it disappears.
Talk:Quarry
Hello! It seems to be an internal bug in Quarry. Could you open a bug report on Phabricator for it? Thanks!
Which is more efficient?
- AND NOT ( lt_title = "ABC" ) AND NOT ( lt_title = "XYZ")
- AND lt_title NOT IN ( "ABC", "XYZ")
- AND lt_title <> "ABC" AND lt_title <> "XYZ"
Agree that none is ideal.
The query engine first builds a query plan, then executes the query according to that plan. You can check whether the query plan is the same for all variants (e.g., using the Toolforge SQL Optimizer). If it is, there is no difference in efficiency.
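As a sketch of how to compare the variants yourself (assuming the linktarget table and lt_title column from the snippets above), you can prefix each form with EXPLAIN and compare what MariaDB reports:

```sql
-- Run each variant with EXPLAIN and compare the "type", "key" and
-- "rows" columns of the output. Identical plans mean the optimizer
-- treats the predicates as equivalent.
EXPLAIN
SELECT lt_title
FROM linktarget
WHERE lt_title NOT IN ('ABC', 'XYZ');

EXPLAIN
SELECT lt_title
FROM linktarget
WHERE lt_title <> 'ABC' AND lt_title <> 'XYZ';
```

If both plans show the same key and row estimate, there is nothing to choose between them except readability.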
I prefer NOT IN.
How to specify tables from two different databases (wikidatawiki_p and commonswiki_p)?
I tried
- SELECT * FROM `commonswiki_p`.`pages` LIMIT 1
- USE DATABASE commonswiki_p
- USE commonswiki_p;
to override what's specified in the GUI.
It's not possible since around 2021. See Topic:W6tzj276xib56phf.
They are completely separate DB servers, you cannot make queries across multiple servers in the same query.
Apparently it was possible (see the sample in the topic referenced by Matej), but the feature was later removed.
I found an easier solution, as the gap between Wikidata and Commons is only partial: one table at Commons is updated (wbc_entity_usage), but not the other (page_props): quarry:query/86040
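For readers following along, a sketch of how such a gap can be measured (this is an assumption about what the linked query does, using column names from the commonswiki_p schema): pages that have an entity-usage row but no wikibase_item page prop.

```sql
-- Sketch: Commons pages recorded in wbc_entity_usage that are
-- missing the 'wikibase_item' entry in page_props.
SELECT p.page_id, p.page_title
FROM page p
JOIN wbc_entity_usage eu
  ON eu.eu_page_id = p.page_id
LEFT JOIN page_props pp
  ON pp.pp_page = p.page_id
 AND pp.pp_propname = 'wikibase_item'
WHERE pp.pp_page IS NULL
LIMIT 100;
```

Dropping the LIMIT and wrapping the outer SELECT in COUNT(*) would give the absolute size of the gap.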
"Apparently it was possible " Yes, until the infrastructure ran into scaling problems.
Are there any measures in place to keep the databases in sync? The gap mentioned above is minor in percentage terms (maybe 0.1%), but in absolute numbers 4,600 is a lot.
Is there a way to find Commons files that
- use P625 SDC property
- do not transclude c:Module:Coordinates.
Seems we might need the "Pages with maps" category again.
See also: c:Commons:Bots/Work_requests#Add_missing_Template:Location
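SDC statements such as P625 are not stored in the wiki replicas, so that half of the question would have to come from WDQS or the MediaInfo API. The "does not transclude c:Module:Coordinates" half, though, can be sketched in SQL (assuming the current templatelinks/linktarget schema; namespace 6 is File:, 828 is Module:):

```sql
-- Sketch: File: pages on Commons that do NOT transclude
-- Module:Coordinates. The P625 filter would have to be applied
-- afterwards from an external source (e.g. a WDQS result set).
SELECT p.page_title
FROM page p
WHERE p.page_namespace = 6
  AND NOT EXISTS (
    SELECT 1
    FROM templatelinks tl
    JOIN linktarget lt ON lt.lt_id = tl.tl_target_id
    WHERE tl.tl_from = p.page_id
      AND lt.lt_namespace = 828
      AND lt.lt_title = 'Coordinates'
  )
LIMIT 100;
```

Intersecting this with a list of files holding P625 (from WDQS) would give the candidates for the bot request.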
The enwiki database has been on replag for an entire week now. It should hopefully be fixed in the next week or so.
I was able to complete the SQL query above, as seen in https://quarry.wmcloud.org/history/84807/911494/884555.
However, when I ran the same query again at https://quarry.wmcloud.org/query/84807, I got the error "OperationalError('table resultsets already exists')". How should I fix this error?
I guess that resultsets may be a temporary table produced by my previous SQL query, so I tried to execute the following:
DROP TABLE IF EXISTS resultsets;
However, I got the following error message: "Access denied; you need (at least one of) the SUPER, READ_ONLY ADMIN privilege(s) for this operation".
May anyone help me on this issue?
Thanks.
I just found that the above query has completed now. The issue seems to have resolved itself, at least for now, although I still don't know the reason for the "OperationalError('table resultsets already exists')".
However, my query https://quarry.wmcloud.org/query/84817 has just failed again for the same reason, "OperationalError('table resultsets already exists')".
Does anyone know how to handle "OperationalError('table resultsets already exists')"?
Thanks.
The issue looks to have resolved itself again. Did someone help in the background?
Thanks a lot.
Is there a delay between a tool database being created and it being available in Quarry? It looks like s55926__wishlist_p can not be queried (it was only created today): https://quarry.wmcloud.org/query/11263
As far as I know, tool databases are not public, and so not available in Quarry.
@TheDJ: That used to be the case, but I'd thought that recently (phab:T151158) it had become possible. It's only databases with names ending in _p.
Quarry reads from the mirror of the primary ToolsDB rather than the primary. The mirror is lagged by 15 hours at the moment: https://grafana.wmcloud.org/d/PTtEnEyVk/toolsdb-mariadb?orgId=1&var-server=tools-db-3
Ah that's good to know, thanks! I'll be patient. :-)
Hi, I'm trying to optimize a simple SQL query for pages on commonswiki (table page) which start with a certain string, e.g. SELECT * FROM page WHERE page_title LIKE "Building%" ORDER BY page_title; (see also https://quarry.wmcloud.org/query/83277 for an example with a smaller result set). The SQL Optimizer on Toolforge tells me that this query would use filesort instead of indexes, which causes performance issues ("Query plan 1.1 is using filesort or a temporary table. This is usually an indication of an inefficient query. If you find your query is slow, try taking advantage of available indexes to avoid filesort."). The DB schema documentation tells me that there should be an index (key: page_name_title) on the page_namespace and page_title columns. The MySQL docs tell me that I could use USE INDEX (page_name_title) to force that index. If I add that clause to my query (SELECT * FROM page USE INDEX (page_name_title) WHERE page_title LIKE "Building%" ORDER BY page_title;), the SQL Optimizer complains: "Query error: Key 'page_name_title' doesn't exist in table 'page'". At the Commons village pump, I've learned that the replicas may lack the indices. So I'm not sure if there's a way to optimize such a query on Quarry. I would prefer using Quarry instead of the Toolforge CLI because Quarry allows for linking queries and results.
This query cannot use the index, since you don't have a condition on page_namespace.
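For example, a version that can use the index might look like this (a sketch; it assumes the File: namespace, 6, is the one you want, and that page_name_title is defined on (page_namespace, page_title)):

```sql
-- With page_namespace fixed to a constant, the (page_namespace, page_title)
-- index can serve both the LIKE range scan and the ORDER BY, so no filesort
-- is needed and no USE INDEX hint is required.
SELECT page_id, page_title
FROM page
WHERE page_namespace = 6          -- assumption: File: pages are the target
  AND page_title LIKE 'Building%'
ORDER BY page_title
LIMIT 100;
```

To cover several namespaces, run the query once per namespace (or use an IN list, at the cost of a looser plan).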
Aww - ok :-) - yes, with such a condition, it works like a charm - thanks a lot!
Hi, I am doing some research on sockpuppets. I'm trying to use the SQL database to assist in building my dataset, but queries seem to take a long time.
For instance, this previous query ran in 196 seconds:
https://quarry.wmcloud.org/query/61732
Whereas my identical query has been going for over 9 hours:
https://quarry.wmcloud.org/query/83588
A previous instance was running over a week before I restarted it.
Am I doing something wrong, or are these running times typical?
On another note, I have been looking for complete SQL dumps so I can run my own instance and not cause issues with excessive queries; however, I can only find complete XML dumps (https://dumps.wikimedia.org/enwiki/20240601/).
Is there any way to get a full SQL dump (complete with article revision history)?
Thank you for your help.
Using Toolforge CLI, the query returns 187814 rows in 31.187 sec. Maybe that's too much for Quarry.
Recently, some of my queries get immediately queued and won't run. If I fork the query, then it runs immediately, but now I have a bunch that are marked as queued but not running. Is there a way to kill these old ones, and any ideas why this happens?
Here is an example of a few that are still queued:
https://quarry.wmcloud.org/query/71639
https://quarry.wmcloud.org/query/83591
https://quarry.wmcloud.org/query/83590
Thanks!