Active on Wikimedia Commons and the Dutch Wikipedia. Interested in Monuments, helping new users find their way, automation and batch uploads.
I have a background (and work) in Data Science, Machine Learning and Artificial Intelligence.
User Details
- User Since
- Oct 13 2014, 9:30 PM (530 w, 2 d)
- Availability
- Available
- IRC Nick
- Basvb
- LDAP User
- Unknown
- MediaWiki User
- Basvb [ Global Accounts ]
Feb 26 2018
Aug 21 2017
Aug 8 2017
An interesting test project could be the detection of images that need to be rotated. For these images, getting (endless) training data is trivial: we can assume 99.9% of the images have the correct orientation. We can take all images as the positive class, then rotate an equal number of images and use those as the negative class.
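A minimal sketch of how such training data could be generated, assuming images are available as NumPy arrays. The function name `make_rotation_dataset` and the choice to rotate each image by a random number of quarter turns are illustrative assumptions, not part of an existing tool.

```python
import numpy as np


def make_rotation_dataset(images, rng=None):
    """Build (samples, labels) for a rotation detector.

    Label 1 = image as found (assumed correctly oriented),
    label 0 = the same image rotated by 90, 180 or 270 degrees.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    samples, labels = [], []
    for img in images:
        # Positive class: the image as it is.
        samples.append(img)
        labels.append(1)
        # Negative class: a copy rotated by 1, 2 or 3 quarter turns.
        k = int(rng.integers(1, 4))
        samples.append(np.rot90(img, k))
        labels.append(0)
    return samples, labels
```

Images that look identical after rotation (for example a uniform square) would be mislabelled by this scheme, so a real pipeline would need to filter those out.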
Aug 1 2017
Jun 9 2017
Hi Infobliss, the UI looks like a good start. Am I correct that there is nothing behind the pages yet? Or do I have to login using OAuth first?
May 27 2017
About the https://github.com/infobliss/sibutest/blob/master/NationaalArchief2.py file: nice work, some comments:
May 21 2017
May 13 2017
May 8 2017
I added Pattypan and the GlamWiki toolset. They are currently used a lot for batch uploads, so it is important to learn lessons from them. However, it is also important that the SIBU tool isn't a copy of those, but aims at a different set of images and a different type of uploading.
May 7 2017
Sorry for the difficulties between Google and the University, hopefully we can still see you around (maybe in a next round?). I'll be closing the task, you're always welcome to come and discuss with us.
Thank you very much for your proposal, we ended up selecting another candidate
Thank you very much for your proposal. We had to pick between multiple suitable candidates for 1 place and ended up selecting another candidate. I sent you an email with more information, some good points and some potential points for improvement. I hope to see you around, either at Wikimedia coding in general or in one of the next rounds.
Apr 29 2017
@Kamsuri5: I'll be on IRC tomorrow for quite a bit so we can discuss it more in depth. Some first questions/remarks after a quick look. Is the main part of this at https://github.com/kamsuri/Single-Image-Batch-Upload/blob/master/extractor.py ? Can you explain there a bit what it does? Currently the code has no comments at all and it's a bit difficult to go through. It also looks a lot like: https://gist.github.com/shlomibabluki/5539628 . If you used somebody else's code, you should at least indicate that you did so. For open source software (the SIBU is going to be open source software) we should only reuse open source software from elsewhere.
Apr 20 2017
The message about some of Google's work with Commons images that was recently (re)shared on wikitech-l (4-4-2017; original 11-08-2016 on commons-l) might also be relevant. https://www.youtube.com/watch?v=HgWHeT_OwHc&feature=youtu.be&t=2h1m19s and https://cloud.google.com/blog/big-data/2016/05/explore-the-galaxy-of-images-with-cloud-vision-api
I'm interested in thinking along/working on image recognition (and classification) on Commons. See also T155538.
Apr 19 2017
The textstat package looks like a good idea for English. For other languages it might be a bit more difficult to use. Maybe overall word frequency within a language (Wikipedia version) can be used to determine how complex the terms used in an article are (what percentage of the article consists of top-1000 words, what percentage of top-10000 words, what percentage of top-100000 words, and what percentage falls outside of that).
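The frequency idea could be sketched roughly as follows. This assumes a list of words ranked by frequency for the language is already available (for example derived from a Wikipedia dump). The function name `frequency_band_shares`, the simple `\w+` tokenizer and the choice to make the bands disjoint are all illustrative assumptions.

```python
import re
from collections import Counter


def frequency_band_shares(text, ranked_words, bands=(1000, 10000, 100000)):
    """Return the share of tokens in each frequency band.

    ranked_words: words ordered from most to least frequent.
    Bands are disjoint: a top-1000 word is not counted again as top-10000.
    """
    rank = {word: i + 1 for i, word in enumerate(ranked_words)}
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        label = "outside"
        r = rank.get(token)
        if r is not None:
            for band in bands:
                if r <= band:
                    label = f"top-{band}"
                    break
        counts[label] += 1
    total = len(tokens)
    labels = [f"top-{band}" for band in bands] + ["outside"]
    return {label: (counts[label] / total if total else 0.0) for label in labels}
```

A high share in the "outside" band would then suggest that an article uses relatively rare terms.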
Apr 2 2017
@Kamsuri5: Yes if you do an upload with any of the upload scripts in pywikibot you have to create the full wikitext for the image description yourself. This includes an information template and a license template (template meaning wikitemplate).
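As a rough illustration of what "creating the full wikitext yourself" means, the sketch below builds a description page as a plain string. The helper name `build_description` and all field values are hypothetical; the license template shown is only an example and must match the actual license of the file.

```python
def build_description(description, date, source, author,
                      license_template="cc-by-sa-4.0", lang="en"):
    """Return wikitext with an Information template and a license template."""
    lines = [
        "=={{int:filedesc}}==",
        "{{Information",
        "|description={{" + lang + "|1=" + description + "}}",
        "|date=" + date,
        "|source=" + source,
        "|author=" + author,
        "}}",
        "",
        "=={{int:license-header}}==",
        "{{" + license_template + "}}",
    ]
    return "\n".join(lines)


# Hypothetical example values, for illustration only.
print(build_description("A windmill", "2017-04-02", "Own work", "Example"))
```

The resulting string is what would be handed to the upload script as the file description.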
Hi Poulami, some feedback as requested. It will be short due to my time constraints this weekend and your late submission.
@PoulamiSarkar: A quick heads up: don't forget to set your proposal at the GSOC website to final before the deadline
@djff: A quick heads up: don't forget to set your proposal at the GSOC website to final before the deadline
@Kamsuri5: Please make sure that you have also submitted your proposal to GSOC on their website (https://summerofcode.withgoogle.com); I currently do not see your proposal there. Make sure to also set your proposal as final. I'll try to give some more feedback on the proposals (and image uploads) but this might be after the GSOC/Outreachy deadlines.
Apr 1 2017
@PoulamiSarkar I think working on both of these very different projects together is not a good idea and suggest you decide which of the two you are interested in the most and write a proposal specifically for that project. In writing your proposal please take into account that we are 2 days away from the deadline. Please take a look at some of the other proposals and suggestions for writing a proposal: For example I would expect more information on your proposed timeline and the exact things you would want to implement.
Mar 30 2017
Do you know what the lessons learned were, or the main reasons that this did not happen? I've personally never seen the GWToolset as a tool for uploading one at a time, and I've seen it as too complicated for end users.
and that every person with a Commons account can use that to upload the images from a collection they are interested in.
If everything is ready, then files should be uploaded immediately so that random users have quick access to them. It makes sense to delay the upload only if there is some work left to do and/or the users triggering the upload are experienced and committed users.
There is almost always work to be done after uploading; the question is how much work has to be done and what the quality is without reviewing each file on its own. Currently there are a lot of batch-uploaded files without even any categories or other useful information, making them hard - if not impossible - to find for reusers. A good example of the work to be done, I think, is the Rijksmonumenten upload. I think we were fairly successful in stimulating improvements to the files, with a good workflow for identifying monuments and, with that, adding relevant categories. However, even in such a successful case, just because of the scale of the upload (400,000+ images), there are tens of thousands of images still needing some kind of fix. I've fixed huge parts of that semi-automatically or by hand, but as one of the uploaders I'm simply not able to commit to fixing all of those. So ideally these subsets would have been left out of the batch upload, so that the most useful ones could be hand-picked and fixed.
Thank you very much for your proposal. It looks very good overall. I'll give some pointers as requested.
I understand that the name is confusing. It is, after all, a contradictio in terminis. I think that it is a good idea to move to a more descriptive name if/once the project starts.
Hi @djff, thank you for the overall very good proposal. I'll give some remarks as requested:
We can leave it like this. Yes, you have to submit a proposal to both Outreachy and GSoC. I think you are correct that you have to send in a different proposal for each of those. However, we also have a proposal here on Phabricator, which we use from the Wikimedia side. Personally I think it is better to have 1 proposal here if they are almost the same. It gives a bit more clarity and saves all of the people who have to review the proposal from reading two proposals.
Hi Kamsuri,
@Kamsuri5: I see you created another proposal: T161782. Do you want to have two separate proposals for GSOC and outreachy? Or is T161782: GSOC'17 Proposal for Single Image Batch Upload the new proposal and do I no longer have to look at this proposal? If that is the case please close this (T161649) task.
Mar 29 2017
Hi Kamsuri5, thanks for the nice and thorough proposal. I'll be giving some feedback as requested; hopefully you can use the feedback to make an even better proposal.
Mar 28 2017
@Kamsuri5: First of all, nice work. Some pointers about images 2 and 3 (and the related process step): how do you see this working exactly? Will we generate a list of all of the files the GLAM has, or does the user type in a search query? Some of the GLAMs will have 10,000 to 1,000,000 images, so showing all of them is not an option. Personally I was thinking of letting the user enter the ID for the image. The name for the ID field could be different per GLAM, so we might ask for the "object number" at GLAM 1 and for the "afbeeldingsidentificator" (image ID) at GLAM 2.
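One way to picture the per-GLAM ID field is a small configuration table, sketched below. The class `GlamConfig`, the keys `glam1` and `glam2` and the prompt wording are hypothetical placeholders, not an agreed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlamConfig:
    name: str      # display name of the GLAM
    id_label: str  # what this GLAM calls its image identifier


# Hypothetical entries; real GLAMs and labels would be filled in per institution.
GLAMS = {
    "glam1": GlamConfig(name="GLAM 1", id_label="object number"),
    "glam2": GlamConfig(name="GLAM 2", id_label="afbeeldingsidentificator (image ID)"),
}


def id_prompt(glam_key):
    """Return the text shown next to the ID input field for a given GLAM."""
    cfg = GLAMS[glam_key]
    return f"Enter the {cfg.id_label} of the image at {cfg.name}:"
```

With this shape, supporting a new GLAM means adding one entry rather than changing the input form.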
Mar 27 2017
@djff, aah ok. Please also upload the work you did on the other task as it will also show your abilities.
In NationaalArchief.py I had some issues with pointing to pywikibot.specialbots as well. I used sys.path.append(YOURPATHHERE) in line 4 as a quick fix, although I believe fixing imports should ideally not be done like that.
Hi @djff, good to see that you are interested in the project. You can claim the microtasks you are working on here in Phabricator (if there is not yet a subtask for it, you or I can create it). You can paste the link to your repo in the subtask ticket and in your proposal.
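For reference, the quick fix mentioned above looks roughly like this. The path is a placeholder that has to be replaced with the local pywikibot checkout, and the guard against duplicate entries is an addition for illustration.

```python
import sys

# Placeholder: replace with the directory that contains the pywikibot package.
PYWIKIBOT_DIR = "/path/to/pywikibot-core"

# Quick fix: make the directory importable for this script only.
if PYWIKIBOT_DIR not in sys.path:
    sys.path.append(PYWIKIBOT_DIR)

# After this, the import that failed before would be attempted, e.g.:
# import pywikibot.specialbots
```

Installing pywikibot properly, or setting PYTHONPATH outside the script, would avoid hard-coding a path in the source file.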
Mar 26 2017
@Kapilkd13, @Meghana95, and @tskolm: We've now listed a few micro tasks. I'm curious if you are still interested in pursuing a proposal for this project or have found other interesting projects in the meantime?
Mar 24 2017
@Capt_Swing: Thank you again for your thoughts on the project. The upcoming structured data is a good point to take into consideration. For the past years I have personally tried to use the regular (structured) templates to convey information; these can then be easily transferred into the structured data format. Within the tool we'd have to keep in mind that, likely in a few years, the metadata mappings and some other parts will have to be changed to use structured data directly.
Mar 23 2017
@Aklapper: description and some further info added, hopefully this is all the info needed, if not I'm happy to add the necessary information.
Mar 22 2017
@srishakatux: Is it possible to have two co-mentors? I've talked to both @tom29739 and @zhuyifei1999 who are willing to help mentor the project.
@Capt_Swing: Am I correct in the interpretation that your message was more of an offer to take a look once or twice at the user-perspective than as an intention to co-mentor?