Deprecated: Function get_magic_quotes_gpc() is deprecated in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 99

Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 619

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1169

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176
8000 Update GemTypo to use the -/_ variation detection by rietta · Pull Request #2341 · rubygems/rubygems.org · GitHub
Nothing Special   »   [go: up one dir, main page]

Skip to content

Conversation

rietta
Copy link
Contributor
@rietta rietta commented Apr 30, 2020

This update changes the method used by GemTypo from the Levenshtein distance
comparison with popular Gems with 10 million or more downloads to a simpler
check that compares the proposed Gem with all of its - and _ characters removed
to all existing Gems with their - and _ characters removed.

This comparison is performed completely within PostgreSQL and an index is added
to ensure that a table scan is not necessary. This means that this typo protection
can be enforced against any other already published Gem rather than just popular
ones with 10 million or more downloads.

To support the suitability of this new approach, we conducted an analysis of the malicious gems published in the Reversing Labs data set from February 2020. A rake task was used to use the GemTypo class to check each known malicious gem to for matches. Timing data was collected and written out to a CSV file, which was then loaded into spreadsheet to compute the relevant statistics.

The code for collecting the data was:

 baddies.each do |malicious_gem|
      start_time = Time.now
      typo_check = GemTypo.new(malicious_gem)
      typo_check.protected_typo?
      end_time = Time.now

      puts [malicious_gem, typo_check.protected_gem, end_time - start_time].to_csv
    end

With a sample size of 725 database checks, the timing in seconds for checks resulting in:

  • Min Seconds 0.0087
  • Max Seconds 0.2499
  • Average Seconds 0.1221

Please see this attached XLSX for details.

Malicious Gem Detection Timing Statistics Based on New Method and Reversing Labs List of Baddies.xlsx

My April 2020 analysis of the typo squatting attack in February of 2020 found that
none of the target Gems were protected at the time and the vast majority of the
malicious Gem names would have been prevented by this check if it had been in place
at the time. Please see https://speakerdeck.com/rietta/securing-the-open-source-software-supply-chain
slides 22 and 23 for the publication of my findings at this time.

RubyGems Typo Squat Report

Pie Chart of -_ Confusion

This update changes the method used by GemTypo from the Levenshtein distance
comparison with popular Gems with 10 million or more downloads to a simpler
check that compares the proposed Gem with all of its - and _ characters removed
to all existing Gems with their - and _ characters removed.

This comparison is performed completely within PostgreSQL and an index is added
to ensure that a table scan is not neccesary. This means that this typo protection
can be enforced againstn any other already published Gem rather than just popular
ones with 10 million or more downloads.

My April 2020 analysis of the typo squatting attack in February of 2020 found that
none of the target Gems were protected at the time and the vast majority of the
malicious Gem names would have been prevented by this check if it had been in place
at the time. Please see https://speakerdeck.com/rietta/securing-the-open-source-software-supply-chain
slides 22 and 23 for the publication of my findings at this time.

Fix same name error

This update fixes the error where a gem that is being upgraded was
considered a typo squat. Additionally, I simplified the protected_typo?
method a bit and ensured that only one query was neccesary rather than
two without overfetching data.

Refactor GemTypo tests

Updates to index based on internal code review and stats gathering

Fix broken test

Allow update gem pre-existing Gem that is a typo variant of another

For existing data, where there may be pre-existing typo variants such that
awesome_gem and awesome-gem both already exist. Then skip typo checking if
the current gem name matches either. This does not reduce security in that
other validations already prevent a user from publishing a Gem that has the
exact same name as a Gem already published by another user.

Consider yanked Gems when determining

This update considers the yanked_at status for potential exact name
matches such that a Gem that has at least one not-yanked version would
be considered an exact match. However, a Gem that has all versions yanked
and is also a typo squat of another Gem would be determined to be a
typo squat.

Update GemTypo to use existing case insensitive name index

This update removes the unnecsary downcasing of the target Gem name
and uses the existing UPPER(name) index on the rubygems table to check
for exact name matches.
@rietta
Copy link
Contributor Author
rietta commented May 6, 2020

Any feedback? Concerns?

8000

@rietta
Copy link
Contributor Author
rietta commented May 15, 2020

Is there anything I can do to help address silent concerns about this approach?

While I am not a committer, I am willing to help code review some of the other pull requests as well if that would be helpful.

@sonalkr132
Copy link
Member

Is there anything I can do to help address silent concerns about this approach?

We don't have any personal vendetta against you or this PR 😅 Reviews are hard to come by on rubygems.org in general.
If it makes you feel any better. you should know we did have an internal discussion about adding this and we are willing to try this out, so that's some progress.
We need to verify your observation about the impact of adding index with the data dump, hopefully I will find time for this sometime soon. In the meantime, please remove , limit: 255 addition from schema.rb.
Thank you for your patience.

Running rake db:migrate had added these limits to the schema automatically. This reverts them.
@rietta
Copy link
Contributor Author
rietta commented May 15, 2020

Thanks @sonalkr132! For your comparison purposes, I just tested the creation of the index on a fresh copy of the database dump from 2020-05-15:

SQL

DROP INDEX IF EXISTS dashunderscore_typos_idx;
CREATE INDEX dashunderscore_typos_idx ON rubygems (regexp_replace(upper(name), '[_-]', '', 'g'));

Results

NOTICE:  index "dashunderscore_typos_idx" does not exist, skipping
Query returned successfully with no result in 465 msec.

@simi
Copy link
Contributor
simi commented May 15, 2020

ℹ️ Adding index should not be problem with concurrently strategy since it doesn't lock the table and is safe to do in production without any downtime.

Should be enough to change index definition to CREATE INDEX CONCURRENTLY ....

@rietta
Copy link
Contributor Author
rietta commented May 15, 2020

One open question is is it desirable to consider the user ID such that a user may typo squat on his or own own gem. So if there was an awesome_gem then the same author could publish awesome-gem. This has happened before. I do not believe that it is necessarily desirable and leads to confusion. Of course existing gems that are not malicious should be allowed to continue to be updated.

@rietta
Copy link
Contributor Author
rietta commented May 15, 2020

To get a sense of some of the Gems names that would be blocked, in the PostgreSQL database dump, you can run:

SELECT
	A.id,
	A.name,
	A.created_at,
	A.updated_at,
	B.id AS match_rubygem_id,
	B.name AS match_name,
	B.created_at AS match_created_at,
	B.updated_at AS match_updated_at
FROM rubygems A INNER JOIN rubygems B ON
	A.name <> B.name
		AND regexp_replace(upper(A.name), '[_-]', '', 'g') = regexp_replace(upper(B.name), '[_-]', '', 'g')
		AND A.created_at > B.created_at
ORDER BY updated_at DESC

@rietta
Copy link
Contributor Author
rietta commented May 15, 2020

information_source Adding index should not be problem with concurrently strategy since it doesn't lock the table and is safe to do in production without any downtime.

Should be enough to change index definition to CREATE INDEX CONCURRENTLY ....

Thanks @simi. I just committed this suggested change to the database migration.

@rietta
Copy link
Contributor Author
rietta commented May 15, 2020

Actually @simi, CREATE INDEX CONCURRENTLY seems to have broken the CI build.

PG::ActiveSqlTransaction: ERROR:  CREATE INDEX CONCURRENTLY cannot run inside a transaction block

/home/travis/build/rubygems/rubygems.org/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.2.2/lib/active_record/connection_adapters/postgresql/database_statements.rb:92:in `exec'

/home/travis/build/rubygems/rubygems.org/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.2.2/lib/active_record/connection_adapters/postgresql/database_statements.rb:92:in `block (2 levels) in execute'

/home/travis/build/rubygems/rubygems.org/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.2.2/lib/active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads'

/home/travis/build/rubygems/rubygems.org/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.2.2/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'

/home

@rietta rietta force-pushed the rietta/typo-squatter-protection branch from 87c6454 to e45e77d Compare May 15, 2020 21:51
This update at the suggestion of @simi, asks PostgreSQL to create the new dashunderscore_typos_idx
index in the background without blocking. This will allow the migration to run quickly and cause
no downtime.
@ni3t
Copy link
ni3t commented May 16, 2020

We use https://github.com/adomokos/light-service pretty heavily in production and I know there has been a PR from @adomokos (adomokos/light-service#62) to change it from light-service to light_service, so @rietta's suggestion about an author changing the name of his own gem might be valid. In that case the gem is called LightService, not Light::Service, so the underscore makes more sense from a ruby standard practice perspective.

@adomokos
Copy link

@ni3t, I have not done the rename itself, it's still named light-service, I only brought it up as an issue to gauge interest. Thanks for mentioning it here!

@rietta
Copy link
Contributor Author
rietta commented May 18, 2020

@simi What are your thoughts on the gems by the same user aspect? Worth a tweak or should be left out?

@sonalkr132
Copy link
Member

I wouldn't recommend it. Following was pointed to us, I hope we agree that allowing this is just confusing:
https://rubygems.org/gems/google-cloud-web_risk
https://rubygems.org/gems/google-cloud-webrisk

Perhaps we can update this to match against indexed rubgyems only.

@rietta
Copy link
Contributor Author
rietta commented May 18, 2020

If there is a legitimate need to rename Gems, I would prefer to see an official method that preserves the old name as a symlink/slug rather than adhoc duplication.

assert_equal true, gem_typo.protected_typo?
end
gem_typo = GemTypo.new("delayed_job_active_record")
assert_equal false, gem_typo.protected_typo?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain the difference between return false for exact match and return false for any exact match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test asserts that the owner of the existing delayed_job_active_record Gem can push an update even though there is an existing typo squat delayed-job-active-record that would otherwise block the update.

end
should "return true for three -/_ character change" do
gem_typo = GemTypo.new("delayed-job-active-record")
assert_equal true, gem_typo.protected_typo?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please justify your reasoning of adding two -/_ change and three -/_ character change? I am not sure I understand what additional value are these two adding.

Copy link
Contributor Author
@rietta rietta May 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons. 1) The regex has a 'g' attribute for replace all, but if it didn't it would only replace the first one. Without at least one of these tests that would not fail if the regex was wrong and only replaced the first match. 2) It keeps these tests parallel with the previous tests.

end
should "return true for three -/_ character change" do
gem_typo = GemTypo.new("delayed-job-active-record")
assert_equal true, gem_typo.protected_typo?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add tests for logic in published_exact_name_matches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume this also means changing published_exact_name_matches from being a private method to a public one. That's why I had not written explicit tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this doesn't have to mean published_exact_name_matches to be public. Testing protected_typo? for behavior when yanked gem exists would work as well. Does such an implicit test already exist and I am just not noticing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is the equivalent to

RubyGem.with_versions.where('upper(rubygems.name) = upper(?)', @rubygem_name) 

It is looking for published Gems with the name of our Gem. This is the method that allows grandfathered Gem to actually be published. Otherwise, if it is not indexed and typo squats on another variant it will be blocked.

return false if @rubygem_name.size < GemTypo::SIZE_THRESHOLD

gem_typo_exceptions = GemTypoException.all.pluck(:name)
return false if gem_typo_exceptions.include?(@rubygem_name)

This comment was marked as resolved.

This comment was marked as resolved.

class GemTypo
attr_reader :protected_gem

include Gem::Text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this and require on line 1 can be removed.

if: :needs_name_validation?
validate :blacklist_names_exclusion
# validate :protected_gem_typo, on: :create, unless: -> { Array(validation_context).include?(:typo_exception) }
vali 751D date :protected_gem_typo, on: :create, unless: -> { Array(validation_context).include?(:typo_exception) }

This comment was marked as resolved.

.pluck(:name)
# This SQL query uses an index and thus does not induce a full table
# scan in PostgreSQL. See
# 20200429005140_add_dash_underscore_typo_detection_index_to_rubygems.rb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we want to keep this comment. It doesn't seem to be explaining anything out of ordinary. May add it to commit message?

@rietta
Copy link
Contributor Author
rietta commented Jun 11, 2020

Thanks @DavisC0801 for pushing the fix!

@rietta
Copy link
Contributor Author
rietta commented Jun 12, 2020

@sonalkr132 Thanks. I think this is ready for another look now.


return unless gem_typo.protected_typo?
errors.add :name, "'#{name}' is too close to typo-protected gem: #{gem_typo.protected_gem}"
errors.add :name, "Your gem '#{name}' is too similar to typo-protected gem named '#{gem_typo.protected_gem}'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change type-protected gem to an existing gem. There is no non-typo-protected gem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote error message on branch, thank you for the suggestion!

# assert_equal "There was a problem saving your gem: Name 'test' is too close to typo-protected gem: best", @response.body
# end
# end
context "On POST to create with a protected gem name" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
context "On POST to create with a protected gem name" do
context "On POST to create with an underscore or dash variant of an existing gem" do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added suggestion on-branch, thank you!

refute gem_typo.protected_typo?
end

should "return false for an exact match of a yanked gem so a gem with an identical name can be published in the future" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same as test on line 12.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored tests to test deleted_active_record_gem in order to avoid confusion, test gem was previously named deleted_job_active_record.

end
should "return true for three -/_ character change" do
gem_typo = GemTypo.new("delayed-job-active-record")
assert gem_typo.protected_typo?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test case for delayed-job-activerecord being protected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added on branch, including additional tests for missing -/_.

return true
end
end
match = matched_protected_gem_names.select(:id, :name).first
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we selecting id here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we selecting id here?

For backwards compatibility with any existing code that was using the match to render an error message. It selected the object, so this selects the ID and the name as a minimum set needed to instantiate the model for that purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backwards compatibility with any existing code

I don't think there is any existing code that uses id. If there is such code can you please point it out to me? My issue with using ID here is that it will create confusion in the future about why was this added if not used anywhere.

db/schema.rb Outdated
# It's strongly recommended that you check this file into your version control system.

ActiveRecord::Schema.define(version: 2019_08_31_172741) do
ActiveRecord::Schema.define(version: 2020_04_29_005140) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update this number. we deployed some migrations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated on branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge conflict has been resolved.

@sonalkr132
Copy link
Member

please fix the merge conflict.

@rietta
Copy link
Contributor Author
rietta commented Jun 21, 2020

@sonalkr132 The merge conflict appears to have been resolved.

should respond_with :forbidden
should "not register new gem" do
assert_equal 1, Rubygem.count
assert_equal "There was a problem saving your gem: Name Your gem 'test' is too similar to an existing gem named 't_es-t'", @response.body
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please drop Your gem. Following reads better:

Name 'test' is too similar to an existing gem named 't_es-t'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed on branch, thank you!

Copy link
Member
@sonalkr132 sonalkr132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one minor comment. Otherwise LGTM 👍 We can merge this in a few days if no one else has any further comments.

@rietta
Copy link
Contributor Author
rietta commented Jul 8, 2020

Anything left to do?

@sonalkr132
Copy link
Member

Not really. I wanted to test this locally before merging. Probably tomorrow or the day after it.

@rietta
Copy link
Contributor Author
rietta commented Jul 9, 2020

Thank you for your diligence. It speaks well to your leadership of this project :-)

Copy link
Member
@sonalkr132 sonalkr132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this is not hitting the index added here. please fix that. Thank you.

return true
end
end
match = matched_protected_gem_names.select(:name).first
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of using first here, rails is adding order by and it is not using the intended index:

rubygems_development=# explain analyze SELECT "rubygems"."name" FROM "rubygems" WHERE "rubygems"."indexed" = true AND (regexp_replace(upper(name), '[_-]', '', 'g') = r
egexp_replace(upper('r__d-oc-js--on---'), '[_-]', '', 'g')) ORDER BY "rubygems"."id" ASC limit 1;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..15.43 rows=1 width=18) (actual time=310.622..310.623 rows=1 loops=1)
   ->  Index Scan using rubygems_pkey on rubygems  (cost=0.42..12038.09 rows=802 width=18) (actual time=310.620..310.620 rows=1 loops=1)
         Filter: (indexed AND (regexp_replace(upper((name)::text), '[_-]'::text, ''::text, 'g'::text) = 'RDOCJSON'::text))
         Rows Removed by Filter: 172345
 Planning time: 0.340 ms
 Execution time: 310.659 ms
(6 rows)

rubygems_development=# explain analyze SELECT "rubygems"."name" FROM "rubygems" WHERE "rubygems"."indexed" = true AND (upper(name) != upper('r__d-oc-json---') and
regexp_replace(upper(name), '[_-]', '', 'g') = regexp_replace(upper('r__d-oc-js--on---'), '[_-]', '', 'g'));
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on rubygems  (cost=23.09..1414.05 rows=802 width=14) (actual time=0.048..0.050 rows=1 loops=1)
   Recheck Cond: (regexp_replace(upper((name)::text), '[_-]'::text, ''::text, 'g'::text) = 'RDOCJSON'::text)
   Filter: (indexed AND (upper((name)::text) <> 'R__D-OC-JSON---'::text))
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on dashunderscore_typos_idx  (cost=0.00..22.89 rows=862 width=0) (actual time=0.033..0.033 rows=1 loops=1)
         Index Cond: (regexp_replace(upper((name)::text), '[_-]'::text, ''::text, 'g'::text) = 'RDOCJSON'::text)
 Planning time: 0.299 ms
 Execution time: 0.096 ms
(8 rows)

.pluck(:name)
def matched_protected_gem_names
Rubygem.with_versions.where(
"upper(name) != upper(?) AND regexp_replace(upper(name), '[_-]', '', 'g') = regexp_replace(upper(?), '[_-]', '', 'g')",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upper(name) != upper(?) seems unnecessary. It is needed please add a test case which would fail if I remove this.

sonalkr132 added a commit that referenced this pull request Aug 2, 2020
Using `first` was adding order by and it was not hitting index
added in #2341.

Removed upper(name) != upper(@rubygem_name) as it was redundant when
using `published_exact_name_matches`.

`rubygems/text` was not used since we removed `levenshtein distance`
func.
@sonalkr132
Copy link
Member

merged in 86ab6ab

Thank you for working on this.

@sonalkr132 sonalkr132 closed this Aug 2, 2020
@rietta
Copy link
Contributor Author
rietta commented Aug 7, 2020

Thank you. It was a pleasure to contribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

0